jsongrep is faster than {jq, jmespath, jsonpath-rust, jql}

Source 1 min read
jsonrustcli-toolsdfasearchperformanceautomata-theory

Summary

jsongrep is a Rust-based JSON query tool that compiles queries into deterministic finite automata (DFAs), enabling single-pass O(n) tree traversal with O(1) per-node transitions and no backtracking. On a ~190 MB dataset it dramatically outperforms jq, jmespath, jsonpath-rust, jaq, and jql in end-to-end benchmarks, combining zero-copy parsing (serde_json_borrow) with DFA-driven subtree pruning.

Key Insight

  • The core idea: JSON documents are trees, and queries over them describe sets of paths through those trees. If the query language is restricted to a regular language (concatenation, alternation, Kleene star, optional), it can be compiled into a DFA before any data is touched.
  • DFA advantage: each node in the JSON tree requires exactly one O(1) table lookup. Non-matching subtrees are pruned immediately without recursion. Tools like jq and jmespath interpret path expressions at each node, which involves recursion stacks and potential backtracking on recursive descent queries (.. or $..).
  • Compilation pipeline: PEG grammar parse to AST, Glushkov’s construction (produces epsilon-free NFA), subset construction to DFA, then single-pass DFS over the JSON tree carrying DFA state.
  • Zero-copy parsing via serde_json_borrow means the parsed tree holds borrowed &str references into the original input buffer rather than allocating new strings, reducing memory overhead significantly on large documents.
  • Trade-off: jsongrep pays a higher upfront query compilation cost (DFA construction is ~10x slower than jmespath’s compile step), but this is amortized instantly on any non-trivial document.
  • Deliberate limitation: jsongrep is a search tool, not a transformation tool. No filters, arithmetic, or string interpolation. It finds values but does not compute new ones.
  • Available as both a CLI binary (jg) and a Rust library crate for embedding in other projects.