---
title: "Corpus search · GrepRAG"
description: "Productivity harness for Claude Code — Odyssey, Corpus, Lore, Inbox, plus the advisor skills that turn them into multitasking agents."
canonical: https://www.greprag.com/corpus/
source: /corpus/
---

Store · Corpus

# Drop in a book.  
Your agent reads it.

Drop in a manuscript, a folder of articles, or any text you want your agent to know. Ask for a passage; get one back.

Your AI quotes from the actual text instead of paraphrasing what it half-remembers. The agent searches with `OR`, quotes, and negations. No embeddings, no vector DB, no monthly bill that scales with collection size.

Position

## No embeddings. Ever.

Embeddings drift. They cost per token at index time and per query at read time. They reward fluency over precision. Worst of all, they hide the why behind a rank.

A grep that finds the right block is a feature you can debug. A nearest-neighbor that mostly finds vibes is a feature you can't.

Ingest

## One store per source. Pick a kind.

Stores are typed at creation. The kind picks the right enrichment style at write time — synonym-heavy for fiction, identifier-aware for code, citation-tuned for reference docs.

$ greprag corpus create \--kind book my-novel
$ greprag corpus create \--kind voice my-tweets
$ greprag corpus create \--kind reference research-notes
$ greprag corpus create \--kind book treatise-on-rhetoric

$ greprag corpus upload my-novel ./manuscript/
\# Uploads. Enrichment drains in background via cron.

A write-side LLM (`gemini-2.5-flash-lite`) produces a synonym sidecar for each substantive node: two or three near-synonyms, the concept names that surface it, the kinds of questions it answers. Stored in `node_enrichments`, indexed via tsvector.

Read

## Pure SQL at query time.

Field-weighted tsvector: content at weight A, heading path at B, enrichment at C. Ranked by `ts_rank_cd`. Neighbors-of-hits get an adjacency boost — when three results land in the same paragraph, all three rank higher because something coherent is in there.

Zero LLM tokens at read. Sub-100ms p50 on a few-thousand-node store.

Query surface

## The agent formulates the query.

GrepRAG receives lexical strings in `websearch_to_tsquery` syntax. `OR` widens; quotes phrase-match; `-word` negates. The agent — sitting on the user's intent — decomposes one fuzzy request into one to three sub-queries and chains probe → narrow → expand.

$ curl https://api.greprag.com/v1/stores/my-novel/query \\
  \-d '{"q": "antagonist confrontation OR \\"first meeting\\"", "limit": 10}'

$ curl https://api.greprag.com/v1/search \# multistore batch \\
  \-d '{"stores": \["paybot", "rfcs"\], "q": "idempotency key retry"}'

Composition

## Batch across stores. One ranked list.

`POST /v1/search` takes N store IDs and a single query. Each store runs independently. Results fuse via reciprocal-rank with per-store normalization. Useful when an answer might live in either your manuscript or your reference notes and you don't want to pick first.

Walk

## Find one hit. Read its neighbors.

Retrieval returns a node. The neighbors of a node are addressable: previous sibling, next sibling, parent section, full chapter. Useful when the right answer is the paragraph after the keyword hit.

$ greprag corpus walk my-novel \--node-id a4f9c2e1 \--direction forward \--limit 5

Use cases

## What people do with it.

### Voice retrieval

Index your writing. Ask "find three samples that read like a calm explainer." The agent quotes one back as the voice anchor for a rewrite.

### Research desk

Drop in every paper, article, and book you've collected. Ask "what did Cialdini say about reciprocity?" and the agent quotes the exact passage instead of paraphrasing.

### Reference desk

Index RFCs, vendor docs, internal standards. The agent searches once and pulls the relevant paragraph instead of paraphrasing what it half-remembers.
