How to handle JSON
2019-07-22
JSON data is ubiquitous, constantly flowing between web services. But when you have a largish blob of the stuff, how do you inspect its structure or quickly extract the piece you need?
Enter the handy little utility `jq`. `jq` is a tool to query, filter, reshape, and otherwise be your JSON Swiss Army knife.
Let's dive into how it works. As our example JSON data we'll be using the list of IP addresses for AWS services that is published at: ip-ranges.amazonaws.com/ip-ranges.json
Here is a sample from the head of the ip-ranges.json file:
{
"syncToken": "1563369545",
"createDate": "2019-07-17-13-19-05",
"prefixes": [
{
"ip_prefix": "18.208.0.0/13",
"region": "us-east-1",
"service": "AMAZON"
},
{
"ip_prefix": "52.95.245.0/24",
"region": "us-east-1",
"service": "AMAZON"
},
One of the simplest things we can do with `jq` is access a property using the dot `.` operator. Let's get the creation date of the file:
jq .createDate ip-ranges.json
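To see what that command prints, here is a minimal sketch run against a tiny stand-in file. The file name sample.json and the temporary directory are assumptions for illustration; the two fields are copied from the excerpt above. By default `jq` emits JSON, so a string keeps its quotes, while the `-r` flag prints it raw.

```shell
# sample.json is a hypothetical, cut-down stand-in for ip-ranges.json
tmp=$(mktemp -d)
cat > "$tmp/sample.json" <<'EOF'
{"syncToken": "1563369545", "createDate": "2019-07-17-13-19-05"}
EOF

# Default output is JSON, so the string is printed with its quotes
quoted=$(jq .createDate "$tmp/sample.json")
echo "$quoted"

# -r (--raw-output) prints the bare string, which is handier in scripts
raw=$(jq -r .createDate "$tmp/sample.json")
echo "$raw"
```

The raw form is usually what you want when the value feeds another shell command.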
Another handy feature is that `jq` pretty-prints its output, so the identity filter `.` can be used alone to prettify a JSON file.
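A small sketch of that round trip, using a made-up one-line document rather than the real file: `.` expands it across several indented lines, and the `-c` flag collapses it back to one.

```shell
# A one-line JSON document (made-up sample data)
compact='{"region":"us-east-1","service":"AMAZON"}'

# The identity filter . passes the input through; jq indents it on output
pretty=$(printf '%s' "$compact" | jq .)
echo "$pretty"

# -c (--compact-output) goes the other way, back to a single line
roundtrip=$(printf '%s' "$pretty" | jq -c .)
echo "$roundtrip"
```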
Let's try something more fun, and more "query"-like. Let's create a list of all AWS regions:
jq '[.prefixes[].region, .ipv6_prefixes[].region] | unique' ip-ranges.json
Let's break that down. The `.prefixes` part is another property access, as in the first sample. `.prefixes` is an array of objects, which we then iterate over with `[]`, which should remind one of JSON's own array syntax. For each object in the array we then pull out the `.region` key. Then things get cooler. We have two sets of regions that we would like to combine, which can be done with the comma `,` operator. That leaves us with a stream of separate values, which can be collected back into a single array by surrounding the entire query with another `[]`.
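The breakdown above can be replayed one step at a time. This is a sketch against a hypothetical miniature file (mini.json, with made-up `ipv6_prefixes` entries that carry only a region), not the real download; the final step adds the `unique` call from the full command.

```shell
# mini.json is a hypothetical, trimmed stand-in for ip-ranges.json
tmp=$(mktemp -d)
cat > "$tmp/mini.json" <<'EOF'
{
  "prefixes": [
    {"ip_prefix": "18.208.0.0/13", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "52.95.245.0/24", "region": "us-east-1", "service": "AMAZON"}
  ],
  "ipv6_prefixes": [
    {"region": "us-west-2"},
    {"region": "us-east-1"}
  ]
}
EOF

# Step 1: iterate one array and pull out a key; emits one string per line
jq '.prefixes[].region' "$tmp/mini.json"

# Step 2: the comma runs both filters and concatenates their output streams
stream=$(jq '.prefixes[].region, .ipv6_prefixes[].region' "$tmp/mini.json")
echo "$stream"

# Step 3: wrapping the query in [] collects the stream into one array
array=$(jq -c '[.prefixes[].region, .ipv6_prefixes[].region]' "$tmp/mini.json")
echo "$array"

# Step 4: unique sorts the array and drops duplicates
regions=$(jq -c '[.prefixes[].region, .ipv6_prefixes[].region] | unique' "$tmp/mini.json")
echo "$regions"
```

Building a filter up like this, one stage per run, is a convenient way to debug longer queries.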
Notice the placement of the single quotes in the statement above: we are not taking the output of `jq` and then using a Unix pipe to send it to `uniq`. Rather, `jq` has its own built-in notion of pipes, along with many useful functions.
Here we are piping our 2,233-element array, counted with
jq '[.prefixes[].region, .ipv6_prefixes[].region] | length' ip-ranges.json
to jq's `unique`, which returns a sorted list of regions with the duplicates removed.
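To make the distinction between the two kinds of pipe concrete, here is a sketch that computes the same result both ways on made-up inline data. The shell version uses `sort -u` rather than bare `uniq`, since `uniq` only drops adjacent duplicates.

```shell
# Made-up sample data with one repeated region
json='{"prefixes":[{"region":"us-west-2"},{"region":"us-east-1"},{"region":"us-west-2"}]}'

# Inside jq: the | sits within the quotes, so it is part of the jq program
inside=$(printf '%s' "$json" | jq -r '[.prefixes[].region] | unique | .[]')
echo "$inside"

# Outside jq: the shell pipes jq's raw output lines to sort -u
outside=$(printf '%s' "$json" | jq -r '.prefixes[].region' | sort -u)
echo "$outside"
```

Both routes give the same two lines here; keeping the work inside `jq` means the result stays structured JSON until the last step.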
Let's do one more: what are all the current IPv4 address ranges for EC2 in us-west-2?
jq '.prefixes[] | select(.service=="EC2" and .region=="us-west-2")
| .ip_prefix' ip-ranges.json
I'll leave interpreting it as an exercise for the reader.
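If you want to reuse that query for other services or regions, one option is to pass the values in with `--arg` instead of hard-coding them. This is only a sketch: the variable names `svc` and `region` are my own, and the data uses made-up documentation address ranges rather than real AWS prefixes.

```shell
# Made-up sample data (documentation address ranges, not real AWS prefixes)
json='{"prefixes":[
  {"ip_prefix":"192.0.2.0/24","region":"us-west-2","service":"EC2"},
  {"ip_prefix":"198.51.100.0/24","region":"us-east-1","service":"EC2"},
  {"ip_prefix":"203.0.113.0/24","region":"us-west-2","service":"AMAZON"}
]}'

# --arg binds a shell string to a jq variable, referenced as $name in the filter.
# The backslash continuation sits outside the single quotes, where the shell honors it.
matches=$(printf '%s' "$json" | jq -r --arg svc EC2 --arg region us-west-2 \
  '.prefixes[] | select(.service == $svc and .region == $region) | .ip_prefix')
echo "$matches"
```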
`jq` is really pretty awesome, but as usual, there is more than one way to do it. If `jq` isn't quite your cup of tea, then there is an entire set of related tools:
- fx Run arbitrary JavaScript on JSON input. Standalone binaries available.
- gron Convert JSON to and from flat, greppable lists of "path=value" statements.
- jid Explore JSON interactively with filtering queries like jq.
- jj Query and modify values in JSON or JSON lines with a key path.
- jl Query and manipulate JSON using a tiny functional language.
- jp (jmespath) Query JSON with JMESPath expressions.
- jshon Create and manipulate JSON using getopt-style command-line options.
- json2 Convert JSON to and from flat, greppable lists of "path=value" statements.
- jsonaxe Create and manipulate JSON with a Python-based DSL. Inspired by jq.
- json Run arbitrary JavaScript on JSON input.
- json-table Convert nested JSON into CSV or TSV for processing in the shell.
- json.tool Validate and pretty-print JSON. This module is part of the standard library of Python 2/3 and is likely to be available wherever Python is installed.
- lobar Explore JSON interactively or process it in batch with a wrapper for lodash.chain(). An alternative to jq with a JavaScript syntax.
- ramda-cli Manipulate JSON with the Ramda functional library, and either LiveScript or JavaScript syntax.
- RecordStream Create, manipulate, and output a stream of records, or JSON objects. Can retrieve records from an SQL database, MongoDB, Atom feeds, XML, and other sources.
- rq Create and manipulate JSON with a DSL inspired by Rust, C and JavaScript. Similar to jq. Supports JSON, YAML and TOML as well as binary formats like Apache Avro and MessagePack.
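Of the tools listed, json.tool is probably the quickest to try, since it needs nothing beyond Python. A minimal sketch, assuming a `python3` executable is on the PATH and using made-up input:

```shell
# Pretty-print a one-line document read from standard input
pretty=$(printf '%s' '{"region":"us-east-1"}' | python3 -m json.tool)
echo "$pretty"

# Validation: truncated input makes json.tool exit with a non-zero status
printf '%s' '{"region":' | python3 -m json.tool >/dev/null 2>&1
status=$?
echo "exit status for invalid input: $status"
```

The exit status makes it usable as a cheap validity check in scripts, though it offers none of jq's querying.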