Token efficiency

A coding agent that wants to know “where is SegmentWriter used?” greps, then reads every matching file in full to find a few lines. On this repo that’s 12 files and ~22,700 tokens. greplm answers from its index and returns the same 12 files in 474 tokens — 97.9% fewer:

"Where is SegmentWriter used?"

  grep + read whole files →  12 files,  ~22,700 tokens
  greplm search           →  12 files,      474 tokens   ·  97.9% fewer

That’s the whole idea. greplm keeps agents off the “grep, then read whole files” treadmill that burns context. Every query returns compact locations (and, for snippet, an exact slice) instead of file bodies, so the agent pulls in a few lines rather than thousands.
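The percentage in the comparison above is plain arithmetic on the two token counts. A minimal Python sketch follows; the helper name percent_fewer is illustrative and not part of greplm:

```python
def percent_fewer(baseline_tokens: int, returned_tokens: int) -> float:
    """Percentage of tokens saved relative to the grep + read baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (1 - returned_tokens / baseline_tokens) * 100


# Token counts taken from the SegmentWriter example above.
print(f"{percent_fewer(22_700, 474):.1f}% fewer")  # prints "97.9% fewer"
```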

Compact by design on the wire

The payload itself is tuned for an LLM reader, not a human. The MCP server (and --json output by default) emits compact JSON — no pretty-print indentation or per-field newlines. Snippet bodies are a single text blob with one starting line number instead of an array of {line, text} objects, so field names and line numbers are never repeated. On a typical pack that roughly halves the bytes versus indented, per-line JSON — for identical content. Pass --pretty on the CLI when you want indented JSON to read by eye.
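To see why the two choices matter, the sketch below serializes the same snippet both ways and compares sizes. The field names (path, start, text, lines, line) and the file path are assumptions made up for illustration; they are not greplm's actual schema, and the exact ratio depends on the content.

```python
import json

# Hypothetical snippet content and path, for illustration only.
lines = [
    "pub struct SegmentWriter {",
    "    buf: Vec<u8>,",
    "    offset: u64,",
    "}",
]
start = 42

# Verbose shape: one {line, text} object per source line, pretty-printed.
per_line = {
    "path": "src/writer.rs",
    "lines": [{"line": start + i, "text": t} for i, t in enumerate(lines)],
}
verbose = json.dumps(per_line, indent=2)

# Compact shape: a single text blob plus one starting line number,
# serialized without indentation or separator whitespace.
blob = {"path": "src/writer.rs", "start": start, "text": "\n".join(lines)}
compact = json.dumps(blob, separators=(",", ":"))

print(len(verbose.encode("utf-8")), "bytes, indented per-line JSON")
print(len(compact.encode("utf-8")), "bytes, compact blob JSON")
```

Both forms carry the same lines: a reader can recover line N of the blob as start plus its offset within the text.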

greplm tracks the savings automatically. Each query records a grep+read baseline (the full size of the unique files it referenced) alongside the size of the payload it actually returned. The greplm savings command aggregates these into a token estimate, using ≈4 chars/token as a conservative basis:

greplm savings            # rolling 24h / 7d / all-time summary
greplm savings --verbose  # also break down by query kind
greplm savings --json     # machine-readable

  greplm Token Savings
  ================================================================
  Period          Calls   Savings
  ----------------------------------------------------------------
  Last 24h            4   [███████████████░]  ~95.6k tokens (96%)
  Last 7 days         4   [███████████████░]  ~95.6k tokens (96%)
  All time            4   [███████████████░]  ~95.6k tokens (96%)

Stats live in .greplm/savings.jsonl; set GREPLM_NO_SAVINGS=1 to disable recording.
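The estimate described above can be approximated in a few lines. This is a sketch of the idea under stated assumptions, not greplm's implementation: the function names and the returned dictionary are hypothetical, and file size in bytes stands in for the character count.

```python
import os
import tempfile

CHARS_PER_TOKEN = 4  # the ≈4 chars/token basis described above


def estimate_tokens(num_chars: int) -> int:
    """Rough token estimate from a character count."""
    return num_chars // CHARS_PER_TOKEN


def estimate_savings(referenced_paths, payload: str) -> dict:
    """Compare the size of the unique referenced files with the payload size."""
    unique_paths = set(referenced_paths)  # count each file only once
    # File size in bytes is used as an approximation of the character count.
    baseline = estimate_tokens(sum(os.path.getsize(p) for p in unique_paths))
    returned = estimate_tokens(len(payload))
    saved = max(baseline - returned, 0)
    percent = 100 * saved / baseline if baseline else 0.0
    return {"baseline": baseline, "returned": returned,
            "saved": saved, "percent": percent}


# Demo with a throwaway file standing in for a source file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "writer.rs")
    with open(path, "w") as f:
        f.write("x" * 4000)
    # The same file referenced twice still contributes once to the baseline.
    demo = estimate_savings([path, path], "writer.rs:42")
    print(demo)
```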

Benchmarks

To reproduce the efficiency numbers, run the benchmark in bench/. It runs against this repository itself and needs only a release build plus ripgrep — no external corpus, embedding model, or third-party tool:

cargo build --release

# Search efficiency vs the ripgrep + read-whole-files baseline:
python3 bench/run_bench.py

# Context-pack efficiency (budgeted packs vs reading whole files):
python3 bench/context/pack_bench.py

A typical run on this repo shows greplm returning the same files as ripgrep with ~99% fewer tokens for content search and ~89% fewer for context packs. See the benchmark README for the methodology.