chickadee » cjson

chicken-cjson

Chicken Scheme bindings for the JSON parser cjson. It cannot read from ports and must have the entire JSON object in memory. chicken-cjson only parses JSON, does not serialize.

The string->json procedure returns the same datastrucures as medea's, read-json, so if you're passing that strings already, string->json should be a drop-in replacement:

    
$ printf '{"array":[1,2,3],"null":null}' | csi -R cjson -p '(string->json (read-line))'
((null . null) (array . #(1.0 2.0 3.0)))
$ printf '{"array":[1,2,3],"null":null}' | csi -R medea -p '(read-json (read-line))'
((array . #(1 2 3)) (null . null))

chicken-cjson may offer a significant performance improvement over medea and json, but comes at a price: all data must be availabe as a string. This means you cannot parse JSON coming in from a port directly, and that's why there is no read-json here.

chicken-cjson offers an alternative API which exposes a JSON c-struct as a #<cjson> scheme record, and accompanying procedures like cjson-string. This is for performance reasons and allows you to pick apart JSON objects using lolevel C-functions, without transitioning into the Scheme data-structure. This may be faster but is a lot uglier:

    
$ printf '{"array":[1,2,3],"null":null}' |\
  csi -R cjson -p '(cjson-double (cjson-array-ref (cjson-obj-ref (string->cjson (read-line)) "array") 1))'
2.0

Note that if you know that a number is fixnum, you can use cjson-int instead.

Requirements

None. cjson comes bundled.

TODO

API

(string->cjson string) procedure

Note that that's not string->json! Parses the string and returns a #<cjson> record which holds a c-struct representing the JSON. The returned record has a finalizer attached to it so the underlying c-struct gets freed on garbage collection.

(string->cjson* string) procedure

Like string->cjson but does not attach a finalizer to the #<cjson> record. cjson-free must be explicitly called later on the returned value to avoid memory leaks. This is sometimes faster than attaching finalizers, particularly if there are large numbers of #<cjson> objects.

(cjson->string cjson [pretty-print?]) procedure

Convert the #<cjson> object to its JSON-representation, returned as a string. pretty-print? defaults to true. Note that this can only serialize cjson records, and not scheme objects.

(cjson-schemify cjson) procedure

Convert the #<cjson> object to scheme data-structures. The data-structures are the same as medea's, where array => vector and object => alist.

(string->json string) procedure

Make scheme data-structures of the json data in string using [cjson]:

    
(define (string->json str)
  (let* ((j (string->cjson* str))
         (s (cjson-schemify j)))
    (cjson-free j)
    s))

For string inputs, this should be API-equivalent of medea's (read-json).

(cjson-type cjson) procedure

Pick out the type of a cjson record. Returns a fixnum.

   [variable] cjson/false
   [variable] cjson/true
   [variable] cjson/null
   [variable] cjson/number
   [variable] cjson/string
   [variable] cjson/array
   [variable] cjson/object

Exposes the cjson-type constants.

(cjson-int cjson) procedure
(cjson-double cjson) procedure
(cjson-string cjson) procedure
(cjson-key cjson) procedure

"Unbox" the json value. cjson-type must match like this:

    
(select (cjson-type cjson)
   ((cjson/false) #f)
   ((cjson/true)  #t)
   ((cjson/null) 'null)
   ((cjson/number) (cjson-double cjson))
   ((csjon/string) (cjson-string cjson))
   (else (error "probably a vector or object")))

[procedure] (cjson-array-size cjson)

Return the number of elements in the array. If cjson's type is not an array, this is undefined bahaviour.

(cjson-array-ref cjson index) procedure

Return the element of cjson at position index. index must be a fixnum. Undefined behaviour if cjson is not an array.

(cjson-obj-ref cjson key) procedure

Select field key from cjson. Key must be a string. Undefined behaviour if cjson is not of type cjosn/object.

Performance

The performance characteristics of JSON parsing is mysterious. It is recommended to use medea or json in most usage-cases because they can parse directly off ports and they also serialize. For particular cases, however, there may be significant performace improvements in using cjson.

Sometimes the speedup is negligible:

    
$ (echo '[' ; for i in {0..1000} ; do echo '"str", 1, 2, 3, 4,' ; done ; echo ' 0]') > bigjson
$ time csi -R medea -e '(pp (read-json))' < bigjson >/dev/null

real    0m0.152s
user    0m0.140s
sys     0m0.013s
$ time csi -R cjson -e '(pp (string->json (read-string)))' < bigjson >/dev/null

real    0m0.114s
user    0m0.103s
sys     0m0.010s

Here, having to doing a read-string first isn't great. cjson may shine when you already have the JSON data as a string:

    
$ for i in {0..100000} ; do echo '{"field" : {"id" : 1}}' ; done > jsonlines
$ time csi -R medea -R ports -e '(port-for-each (lambda (line) (pp (alist-ref `field (read-json line)))) read-line)' < jsonlines  > /dev/null

real    0m3.997s
user    0m3.933s
sys     0m0.067s
$ time csi -R cjson -R ports -e '(port-for-each (lambda (line) (pp (alist-ref `field (string->json line)))) read-line)' < jsonlines  > /dev/null

real    0m1.099s
user    0m1.083s
sys     0m0.017s

That's four times faster. In some cases, though, keeping the #<cjson> record and using its combersome API can pay off:

    
$ JSON='{"field" : {"id" : "ID"} , "a":1, "b":2, "c":[1,{"x":{"y":"y"}},3],"d":{"e":[]}}'
$ for i in {0..100000} ; do echo $JSON ; done > jsonlines
$ for f in test*.scm ; do echo ===== $f === ; cat $f ; done
===== test-cjson.scm ===
(use cjson ports)

(port-for-each
 (lambda (line)
   (let ((cjson (string->cjson* line)))
     (print (cjson-string (cjson-obj-ref (cjson-obj-ref cjson "field") "id")))
     (cjson-free cjson)))
 read-line)

===== test-medea.scm ===
(use medea ports)

(port-for-each
 (lambda (line)
   (print (alist-ref 'id (alist-ref 'field (read-json line)))))
   read-line)

$ for f in test*.scm ; do echo $f ; time csi -s $f < jsonlines >/dev/null ; done
test-cjson.scm

real    0m0.541s
user    0m0.523s
sys     0m0.020s
test-medea.scm

real    0m13.870s
user    0m13.717s
sys     0m0.153s
$ time jq '.field.id' < jsonlines  > /dev/null

real    0m0.513s
user    0m0.500s
sys     0m0.017s

In this particular run, chicken-cjson is 25 times faster than medea, and performas about as well as jq. This speedup typically only happens where you are parsing a lot of JSON, but only a small part of that needs to go back into scheme. Also, this only works because we have one JSON object per line, effectively giving us a json-object delimiter.

Contents »