Choosing a configuration
If you’re not sure which Document settings to use, this page tells you in
60 seconds, based on what the docs you’re loading actually look like.
The decision tree
                               Your corpus is…
                                      │
            ┌─────────────────────────┼─────────────────────────┐
            │                         │                         │
    "normal": code,           "structured": a           "synonym-mismatch": HR
    API refs, runbooks,       contract / policy with    FAQs, support tickets
    handbooks, reports,       near-duplicate clauses    where users ask in
    mixed folders             (e.g. governing-law       different words than
                              overrides per region)     the docs use
            │                         │                         │
            ▼                         ▼                         ▼
    Document.from_file(p)     Document.from_file(p,     Document.from_file(p,
    doc.context(q)              retrieval="hybrid",       retrieval="hybrid",
                                model="bge-small")        model="bge-small",
                              doc.context(q,              rerank="cross-encoder")
                                include_heading=True,
                                neighbors=1)

Three recipes cover the practical space.
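The tree can also be written down as a small helper. This is only a sketch: `choose_config` is a hypothetical name, not part of the library, and the keyword arguments are the ones used in the recipes on this page. The tree does not say what to do when a corpus shows both traits, so the sketch assumes the reranker branch wins in that case.

```python
def choose_config(parallel_clauses=False, synonym_mismatch=False):
    """Return (load_kwargs, context_kwargs) for one of the three recipes.

    Hypothetical helper: it only mirrors the decision tree on this page.
    """
    if synonym_mismatch:
        # Assumption: when both traits apply, prefer the reranker branch.
        load = {"retrieval": "hybrid", "model": "bge-small",
                "rerank": "cross-encoder"}
        return load, {}
    if parallel_clauses:
        load = {"retrieval": "hybrid", "model": "bge-small"}
        return load, {"include_heading": True, "neighbors": 1}
    # "normal" corpus: lexical default, no extra arguments.
    return {}, {}


load_kwargs, context_kwargs = choose_config(parallel_clauses=True)
print(load_kwargs)
print(context_kwargs)
```

The two dictionaries would then be splatted into the calls shown in the recipes, for example `Document.from_file(path, **load_kwargs)` and `doc.context(query, **context_kwargs)`.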
The recipes
Default: for most docs
No model download. ~50ms warm queries. Zero ONNX runtime.
Python:

    import redhop

    doc = redhop.Document.from_file("contract.pdf")
    ctx = doc.context("What is the governing law?")
    prompt = ctx.text()   # feed to any LLM
    print(ctx.report)     # see what was retrieved and why

Node.js:

    const { Document } = require("redhop");
    const doc = Document.fromFile("contract.pdf");
    const ctx = doc.context("What is the governing law?");
    const prompt = ctx.text;
    console.log(ctx.report.rendered);

Rust:

    let mut doc = redhop::read_file("contract.pdf")?;
    let ctx = doc.context("What is the governing law?")?;
    let prompt = ctx.text();

When this is right: code, API references, internal docs, runbooks,
financial reports, handbooks, mixed folders (from_folder). The queries
share vocabulary with the answers, which is the case for most technical
and policy content.
Structured docs with parallel clauses
Hybrid + heading-aware retrieval. Adds an ~80MB embedding model download on first run, and warm queries climb to ~150ms. Worth it only if your doc has clauses like “main clause X” and “EU override of clause X” and “Japan override of clause X”. Heading awareness disambiguates them.
Python:

    doc = redhop.Document.from_file(
        "msa.pdf",
        retrieval="hybrid",
        model="bge-small",
    )
    ctx = doc.context(
        "What law applies in the UK?",
        include_heading=True,
        neighbors=1,
    )

Node.js:

    const doc = Document.fromFile("msa.pdf", {
      retrieval: "hybrid",
      model: "bge-small",
    });
    const ctx = doc.context(
      "What law applies in the UK?",
      undefined, // budget: keep default
      1,         // neighbors
      true,      // includeHeading
    );

Rust:

    let mut doc = redhop::read_file_with("msa.pdf", &redhop::LoadOptions {
        retrieval: Some("hybrid".into()),
        model: Some("bge-small".into()),
        ..Default::default()
    })?;
    let ctx = doc.context_with("What law applies in the UK?", &redhop::ContextOptions {
        include_heading: true,
        neighbors: 1,
        ..Default::default()
    })?;

When this is right: legal contracts with regional variations,
multi-jurisdiction policies, vendor security questionnaires with repeated
sub-sections. When it’s wrong: clean single-chapter docs. Adding
neighbors=1 to well-structured chapters can dilute well-targeted
retrieval rather than help it.
Synonym-mismatch corpora
Adds a cross-encoder reranker, which closes the synonym gap (the canonical “employee left” vs “staff terminated” case). Adds ~300MB of model download and 5–10× query latency.
Python:

    doc = redhop.Document.from_file(
        "support_kb.md",
        retrieval="hybrid",
        model="bge-small",
        rerank="cross-encoder",
    )
    ctx = doc.context("why did the worker leave?")

Node.js:

    const doc = Document.fromFile("support_kb.md", {
      retrieval: "hybrid",
      model: "bge-small",
      rerank: "cross-encoder",
    });
    const ctx = doc.context("why did the worker leave?");

When this is right: corpora where queries and answers regularly share no surface words (HR, support FAQs translated from internal phrasing, multilingual content). When it’s wrong: anywhere the lexical default already works, because there the reranker adds latency without recovering anything. Verify that it helps on your corpus before adopting it.
Trade-offs at a glance
| | Lexical default | Hybrid + bge | + cross-encoder rerank |
|---|---|---|---|
| First-run model download | none | ~80MB (bge-small) | + ~300MB (cross-encoder) |
| Warm query latency | ~50ms | ~150ms | ~1000ms |
| Compile-time deps | none | ONNX runtime | ONNX runtime |
| Where it helps | most document QA | regional overrides, parallel sub-sections | synonym-mismatch retrieval |
| Where it hurts | — | adds latency on docs lexical already handles | adds latency without recovering anything unless the failure mode is synonym mismatch |
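To make the cumulative cost of each tier explicit, the table's figures can be tallied as below. This is a rough sketch: the numbers are copied from the table and are approximate, `cumulative_costs` is a hypothetical helper, and it assumes (as the "+" in the last column suggests) that the cross-encoder download comes on top of the bge-small one.

```python
# Approximate figures copied from the trade-off table above.
TIERS = [
    # (name, extra first-run download in MB, warm query latency in ms)
    ("lexical default", 0, 50),
    ("hybrid + bge-small", 80, 150),
    ("hybrid + cross-encoder rerank", 300, 1000),
]


def cumulative_costs(tiers):
    """Return (name, total_download_mb, latency_ms, slowdown_vs_first_tier)."""
    base_latency = tiers[0][2]
    total_mb = 0
    rows = []
    for name, extra_mb, latency_ms in tiers:
        total_mb += extra_mb  # assumption: each tier adds to the previous one
        rows.append((name, total_mb, latency_ms, latency_ms / base_latency))
    return rows


for name, mb, ms, slowdown in cumulative_costs(TIERS):
    print(f"{name:32s} {mb:4d} MB  {ms:5d} ms  {slowdown:4.1f}x")
```

Under these assumptions the full stack costs roughly 380MB of downloads and about 20× the lexical default's warm latency, which is why the reranker is worth adopting only after the synonym-mismatch failure mode has been confirmed.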
Query writing: the part the user controls
The library can only retrieve what your query gives it. Three patterns no config can fix:
1. One-word polysemy queries
'vendor' retrieves the vendor-management section, not the liability cap
(even when both mention vendors). 'settle' can retrieve the
indemnification clause (“settle a claim”) rather than the arbitration
clause (“settle a dispute”), even with a cross-encoder reranker, because
both readings are defensible.
Fix it in the query, not the config: add one disambiguating word.
'liability cap for vendor' correctly finds the cap clause.
'arbitration forum to settle disputes' finds the arbitration clause.
2. Natural-language paraphrase with no shared vocabulary
'How long do I have to cancel and get my money back?' against a
contract that uses “refund” and “termination for convenience” (not
“cancel” or “money back”) can return an empty or weak context across
every tier.
Fix in the query: use the doc’s vocabulary. “What’s the refund
window?” finds the relevant clause immediately. Fix at the config
level (sometimes): retrieval="hybrid" adds a dense embedder that
can match refund to cancel through semantic similarity. Hybrid is a
strict superset of lexical (BM25-tail fallback fills any chunks the
dense pool missed), so you never lose candidates by turning it on. The
cost is the ~80MB embedder download and ~3× warm latency.
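The "strict superset" claim follows from how a tail fallback behaves. The sketch below shows one plausible way to build such a candidate pool; it is not RedHop's actual fusion code, and `merge_with_bm25_tail`, `pool_size`, and the chunk ids are all made up for illustration.

```python
def merge_with_bm25_tail(dense_ranked, bm25_ranked, pool_size):
    """Take the top dense hits, then append BM25 hits the dense pool missed."""
    pool = list(dense_ranked[:pool_size])
    seen = set(pool)
    for chunk_id in bm25_ranked:
        if chunk_id not in seen:  # only add chunks the dense pool lacks
            pool.append(chunk_id)
            seen.add(chunk_id)
    return pool


dense = ["c7", "c2", "c9", "c4"]   # hypothetical dense ranking
bm25 = ["c2", "c5", "c7", "c1"]    # hypothetical lexical ranking
candidates = merge_with_bm25_tail(dense, bm25, pool_size=3)
print(candidates)  # → ['c7', 'c2', 'c9', 'c5', 'c1']
```

Every chunk in the lexical ranking is still in the merged pool, so with a scheme like this, turning on the dense embedder can add candidates but cannot remove any.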
3. Templated queries with heavy boilerplate
If every query in your workload follows a fixed template (“Highlight the parts (if any) of this contract related to X that should be reviewed by a lawyer. Details: …”, “Help me with X, my account is Y, the error is Z”, form-filled queries from a structured UI), the template itself hurts retrieval. BM25 weights each query term by corpus IDF, not by how often the term appears across your query set, so it cannot tell that the template words carry no signal. So the 19 boilerplate words dilute the 5 real signal words, and retention suffers.
This is measured. On CUAD’s fixed 24-word template, stripping the
boilerplate to just <clause name> <details> before calling context()
lifts ≥0.8 retention from 81.3% to 87.7% (n=300, BM25, budget 2,000 tok),
overtaking LlamaIndex’s 86%. Full mechanism and numbers:
CUAD_CLAUSE_EXPANSION (the controlled three-arm run).
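The dilution mechanism can be reproduced on a toy corpus. The sketch below uses a hand-rolled BM25 (Lucene-style IDF, whitespace tokens), not RedHop's scorer, and four invented chunks. Template words such as "lawyer" and "reviewed" are rare in the corpus, so they receive high IDF and pull the query toward the wrong chunk.

```python
import math


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score every chunk against the query with plain BM25."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for d in docs:
        score = 0.0
        for term in set(tokenize(query)):
            tf = d.count(term)
            if tf == 0:
                continue
            # Corpus document frequency: the query set plays no part here.
            df = sum(1 for other in docs if term in other)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def best_chunk(query, chunks):
    """Index of the highest-scoring chunk."""
    scores = bm25_scores(query, chunks)
    return max(range(len(chunks)), key=scores.__getitem__)


CHUNKS = [
    "governing law this agreement is governed by the laws of delaware",
    "each party should have this contract reviewed by a lawyer before signing",
    "payment is due within thirty days of invoice",
    "either party may terminate with ninety days notice",
]
TEMPLATED = ("highlight the parts of this contract related to governing law "
             "that should be reviewed by a lawyer")
STRIPPED = "governing law"

print(best_chunk(TEMPLATED, CHUNKS))  # → 1, the "reviewed by a lawyer" chunk
print(best_chunk(STRIPPED, CHUNKS))   # → 0, the governing-law chunk
```

The corpus is contrived to make the effect visible with four chunks. On real data the effect is a shift in ranking rather than a clean flip, which is what the retention numbers above measure.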
Two paths up the same hill: pick one, don’t combine. Measured on CUAD (CUAD_HYBRID_RERANK):
| path | what you do | retention | latency |
|---|---|---|---|
| One-knob | retrieval="hybrid" (BGE-small embedder) | ~86–88% | ~10 ms/q |
| Best-quality | BM25 default + analyze_query_set → Stripper + Vocabulary chain via doc.context_with_rewrites(...) | 90.7% | ~2.5 ms/q |
Hybrid retrieval reads chunks as semantic content rather than counting tokens, so the boilerplate ratio stops mattering. It substitutes for template stripping by a different mechanism. Running both gives diminishing returns: once one mechanism has fixed the boilerplate dilution, the other adds only +0.3 points. Strip + expand is Pareto-optimal on CUAD (higher retention AND lower latency) but takes the upfront work of writing a stripper and building a synonym dict.
Recommended workflow if you go the best-quality path: detect →
compile → run-through-rewrites → A/B. Every step ships in the public
API. The rewrites compile once (the analyzer pass happens at
construction time, not per query) and run through
doc.context_with_rewrites(query, [stripper, vocab]). The per-stage
audit trail lands on ctx.report.query_rewrites so every transform is
observable:
    import redhop

    # 1. Detect: the analyzer reports the shape of your query set.
    report = redhop.analyze_query_set(my_queries[:300])
    # Cross-workload probe (findings/QUERY_SET_ANALYZER.md):
    #   CUAD     → is_templated=True,  share=0.66, cost="high"
    #   HotpotQA → is_templated=False, share=0.00, cost="none"
    #   MuSiQue  → is_templated=False, share=0.12, cost="none"

    if report.is_templated:
        # 2. Compile the rewrites. Stripper compiles the boilerplate
        #    once via the analyzer. Matching is token-level, so an "of"
        #    stripper does NOT erase the "of" inside "office".
        stripper = redhop.Stripper(report.boilerplate_terms)

        # 3. (optional) Vocabulary. If your workload has a known taxonomy
        #    of "topics" with predictable synonyms (clause types, error
        #    codes, issue categories), compile them once. On CUAD this
        #    lifts retention from 87.7% to 90.7% on top of the Stripper.
        #    Mechanism: workload-curated high-IDF discriminators raise
        #    the BM25 score of the relevant chunk. Opposite mechanism
        #    direction from PRF (which fails on boilerplate-heavy
        #    corpora; see CUAD_PRF_NULL).
        vocab = redhop.Vocabulary({
            # YOUR workload's keys → synonyms. Worked CUAD example in
            # CUAD_CLAUSE_EXPANSION.md.
            "change of control": ["merger", "successor", "acquisition"],
            "non-compete": ["restraint", "non-competition"],
        })

        # 4. Run the chain inside context_with_rewrites; the per-stage
        #    audit lands on ctx.report.query_rewrites automatically.
        doc = redhop.Document.from_text(your_document)
        ctx_a = doc.context(user_query)  # baseline
        ctx_b = doc.context_with_rewrites(user_query, [stripper, vocab])

        # 5. A/B: redhop.evaluate scores both arms deterministically.
        #    No LLM judge; same primitives the Decision Report uses.
        eval_a = redhop.evaluate(user_query, ctx_a, gold_chunks=gold_ids)
        eval_b = redhop.evaluate(user_query, ctx_b, gold_chunks=gold_ids)
        # eval_b.overall - eval_a.overall is the per-query lift.

        # 6. Inspect: every rewrite is observable.
        for rec in ctx_b.report.query_rewrites:
            print(rec.stage, "matched=", rec.matched,
                  "added=", rec.added, "removed=", rec.removed)

The analyzer is conservative by design: HotpotQA and MuSiQue both
register quiet on the probe (is_templated=False), while CUAD fires
(is_templated=True, share 0.66). The analyzer measures the shape of
your queries. It does not promise a specific retention lift. The
+6-point lift was measured directly on CUAD and nowhere else. On a
different templated workload the magnitude depends on how much of your
real signal was being drowned, which is why step 3 matters.
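For intuition about what "the shape of your queries" might mean, here is a minimal sketch of one possible heuristic: treat tokens that occur in nearly every query as boilerplate and report the share of all query tokens they occupy. This is an assumption for illustration only. The page does not describe how `analyze_query_set` works internally, and `boilerplate_profile` and `min_query_fraction` are hypothetical names.

```python
from collections import Counter


def boilerplate_profile(queries, min_query_fraction=0.9):
    """Return (boilerplate_tokens, share_of_all_query_tokens)."""
    tokenized = [q.lower().split() for q in queries]
    # Query-set document frequency: in how many queries each token appears.
    qdf = Counter(tok for toks in tokenized for tok in set(toks))
    threshold = min_query_fraction * len(tokenized)
    boilerplate = {tok for tok, n in qdf.items() if n >= threshold}
    total = sum(len(toks) for toks in tokenized)
    hits = sum(1 for toks in tokenized for tok in toks if tok in boilerplate)
    return boilerplate, hits / total


queries = [
    "help me with login my error is timeout",
    "help me with billing my error is declined",
    "help me with export my error is truncated",
]
terms, share = boilerplate_profile(queries)
print(sorted(terms), round(share, 2))
# → ['error', 'help', 'is', 'me', 'my', 'with'] 0.75
```

A high share says only that the template dominates the query text. Whether stripping it improves retrieval still has to be measured on your corpus.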
For single-doc extraction workloads, also set strategy="raw_topk".
On contract-shape tasks the Auto-routed reasoning_preserving strategy
is solving a multi-hop problem you don’t have, and raw_topk beats it
by ~4 points at every chunk size.
We deliberately do not ship a CUAD-specific strip_template()
helper. Templates are workload-specific, and embedding one into the
library would make the wrong call for the next workload.
Stripper(...) and Vocabulary({...}) take your boilerplate /
synonym dict so the call stays on your side.
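To show what "your boilerplate / synonym dict" amounts to at the query boundary, here is a standalone sketch of the two transforms. These are not the library's Stripper and Vocabulary implementations: `strip_tokens` and `expand` are hypothetical functions, and matching vocabulary keys by substring is a simplification chosen for brevity.

```python
def strip_tokens(query, boilerplate_terms):
    """Drop whole tokens that are boilerplate; text inside other words survives."""
    drop = {t.lower() for t in boilerplate_terms}
    return " ".join(tok for tok in query.split() if tok.lower() not in drop)


def expand(query, vocabulary):
    """Append the synonyms of every vocabulary key that occurs in the query."""
    lowered = query.lower()
    # Simplification: phrase keys are matched as substrings of the query.
    extra = [syn for key, syns in vocabulary.items()
             if key in lowered for syn in syns]
    return " ".join([query, *extra]) if extra else query


stripped = strip_tokens("terms of the office lease", ["of", "the"])
print(stripped)  # → terms office lease

expanded = expand("change of control", {"change of control": ["merger", "successor"]})
print(expanded)  # → change of control merger successor
```

The first call removes the token "of" without touching "office", which is the token-level behaviour the workflow comments describe. The second adds workload-specific discriminators rather than removing anything.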
What about PRF (pseudo-relevance feedback) / query expansion? It was tested twice on RedHop and falsified twice, each time through a different failure mechanism. The dilution win here is subtraction at the query boundary, not addition. See CUAD_PRF_NULL for the mechanism that predicts where unweighted PRF will fail on a new workload.
See also
- Context optimization strategy, when to prune what was retrieved: Tips guide.
- All parameters, the full reference: Options.
- Loaders, every on-ramp (from_text, from_file, from_folder, from_bytes): Loaders.