open-source retrieval & context library

The context layer
that shows its work.

Hand RedHop your documents and a question. It pulls just the sections that matter, hands them to your LLM, and explains every decision — with citations back to the source.

Get started → GitHub

surface: load → ask → read
core: one Rust engine · Python · Node · Rust
runs in-process · no vector DB · no LLM bundled

load → ask → read · ctx.text() · ctx.citations · ctx.report

import redhop

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")

ctx.text()       # prompt, ready for any LLM
ctx.citations    # sources, cited to the page
ctx.report       # what it kept, and why

const { Document } = require("redhop");

const doc = Document.fromFile("contract.pdf");
const ctx = doc.context("What is the governing law?");

ctx.text;        // prompt, ready for any LLM
ctx.citations;   // sources, cited to the page
ctx.report;      // what it kept, and why

use redhop::read_file;

let mut doc = read_file("contract.pdf")?;
let ctx = doc.context("What is the governing law?")?;

ctx.text();      // prompt, ready for any LLM
ctx.citations;   // sources, cited to the page
ctx.report;      // what it kept, and why

−80% prompt tokens · ~1.7ms / query · gold evidence kept

A short, fixed pipeline

you bring the documents · RedHop owns chunking, retrieval, allocation

input · Document(s): PDF · DOCX · PPTX · MD · code · folder

→ stage 1 · Chunking: sentence-budgeted · 128-tok default

→ stage 2 · Retrieval: BM25 · Hybrid · Semantic (no ANN, no vector DB)

→ stage 3 · Allocation: ContextStrategy · size-gated Auto

→ output · in-process · BuiltContext: prompt · citations · Decision Report

no services to run · nothing to wire · a whole PDF parsed, chunked & indexed in a millisecond or two
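
To make stage 1 concrete, here is a minimal sketch of sentence-budgeted chunking: whole sentences are packed into a chunk until the next one would exceed the token budget. This is not RedHop's implementation. The function name chunk_by_sentence_budget, the regex sentence splitter, and whitespace word counts standing in for a real tokenizer are all assumptions made for illustration.

```python
import re


def chunk_by_sentence_budget(text: str, budget: int = 128) -> list[str]:
    """Pack whole sentences into chunks of at most `budget` tokens."""
    # Assumption: a naive split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, used = [], [], 0
    for sentence in sentences:
        # Assumption: whitespace-separated words stand in for tokens.
        n = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and used + n > budget:
            chunks.append(" ".join(current))
            current, used = [], 0
        # A sentence longer than the budget becomes its own chunk.
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "This Agreement is governed by the laws of Delaware. "
    "Notices must be in writing. "
    "Either party may terminate on thirty days notice."
)
chunks = chunk_by_sentence_budget(text, budget=12)
for i, chunk in enumerate(chunks):
    print(i, len(chunk.split()), chunk)
```

Because sentences are never split, a chunk can fall well short of the budget; that is the trade this style of chunking makes to keep each citation pointing at complete sentences.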

Three calls do the work

load → ask → read · each returns a plain object you hand to any LLM

01 · Load

Point at a file or folder

doc = Document.from_file("contract.pdf")

A whole PDF is parsed, chunked and indexed in a millisecond or two. from_folder handles a directory the same way.

02 · Ask

Send a question

ctx = doc.context("…governing law?")

Retrieval, token-budgeting and the assembly decision all happen in-process. No service to call.
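
As a rough picture of what token-budgeting involves, the sketch below greedily keeps the highest-scoring chunks that still fit a budget and records a reason for every decision. It is an illustration under stated assumptions, not RedHop's allocator: the Scored class, the allocate function, the token counts and the 256-token budget are hypothetical, and only the ids and the two top scores echo the sample Decision Report on this page.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    id: int
    score: float
    tokens: int  # hypothetical token count for the chunk


def allocate(chunks: list[Scored], budget: int):
    """Greedy allocation: best score first, skip whatever no longer fits."""
    kept, dropped, used = [], [], 0
    for c in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + c.tokens <= budget:
            kept.append((c.id, "kept: fits budget"))
            used += c.tokens
        else:
            dropped.append((c.id, "dropped: over budget"))
    return kept, dropped, used


candidates = [
    Scored(id=17, score=8.41, tokens=120),
    Scored(id=9, score=5.02, tokens=110),
    Scored(id=31, score=4.10, tokens=128),
    Scored(id=4, score=1.30, tokens=40),
]
kept, dropped, used = allocate(candidates, budget=256)
print(kept, dropped, used)
```

Recording a reason next to each id is what makes the outcome auditable later, which is the idea behind the Decision Report.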

03 · Read

Use the result

ctx.text()       # prompt
ctx.citations    # sources
ctx.report       # decision

One context object carries the prompt, the per-chunk citations, and the Decision Report.

What a query matches on

start lexical · climb only when the words don't line up

BM25 matches words. For keyword-dense documents — contracts, specs, API references, logs — the words in the question are usually the words in the answer, so it needs no model at all. When a query shares no vocabulary with its answer, you climb a tier.

lexical match · the same words

what is the governing law of this agreement?

BM25 ✓ doc says "governing law" → exact term overlap, top hit
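
The lexical match above can be reproduced with a few lines of textbook Okapi BM25. This is a self-contained sketch, not RedHop's scorer: the tokenizer, the parameter values k1=1.5 and b=0.75, and the three sample passages are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric runs are good enough as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    query_terms = set(tokenize(query))
    scores = []
    for terms in tokenized:
        tf = Counter(terms)
        score = 0.0
        for q in query_terms:
            if q not in tf:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            norm = tf[q] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Governing law. This agreement is governed by the laws of Delaware.",
    "Either party may terminate this agreement on thirty days notice.",
    "The staff member was terminated.",
]
query = "what is the governing law of this agreement?"
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The first passage wins because it shares the rare terms "governing" and "law" with the question; the other two overlap only on common words.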

when the words don't line up

“why did the employee leave?” vs doc: “the staff member was terminated”

BM25 ✕ no shared content words (only "the" overlaps), a lexical miss

Semantic ✓ matched by meaning, not terms

A pure-synonym query is exactly when you reach past lexical. RedHop tells you which tier found the hit — on the Decision Report.

the retrieval ladder · climb when you need to

lexical · BM25: default · zero deps · fully offline · the words in Q are the words in A

hybrid · BM25 → dense rerank: bge-small embeds only the pruned pool · scales to a folder

semantic · embed every chunk once · exact cosine: highest recall when Q and A share no words

type-aware · code → BM25 · prose → dense · merged by reciprocal-rank fusion
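
Reciprocal-rank fusion, named in the last rung, merges ranked lists by summing 1 / (k + rank) for each item across lists. The sketch below shows the standard formula with the commonly used constant k = 60. How RedHop builds its per-type rankings is not described here, so the two input lists and the chunk ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each item scores sum(1 / (k + rank)), rank starting at 1."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=lambda chunk_id: fused[chunk_id], reverse=True)


# Hypothetical rankings from a lexical pass and a dense pass.
bm25_ranking = ["c17", "c9", "c31"]
dense_ranking = ["c9", "c4", "c17"]
merged = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(merged)
```

Fusion works on ranks rather than raw scores, so BM25 scores and cosine similarities never need to be put on a common scale.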

the honest edge

RedHop holds the whole corpus in memory and scores it directly — no ANN index, no vector database. That's a scoped choice, not a claim that vectors are obsolete: at millions of chunks, an ANN-scale stack is the right tool. RedHop is the focused context layer for a corpus that fits in memory.
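
"Scores it directly" means exact, brute-force similarity: compare the query vector with every chunk vector and sort. A minimal sketch of that idea follows, in pure Python. The three-dimensional vectors are hand-written toy values, not bge-small embeddings, and the function names are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_exact(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2):
    """Exact search: score every chunk, no approximate index involved."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy vectors standing in for real embeddings.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.2, 0.0]
hits = top_k_exact(query_vec, chunk_vecs, k=2)
print(hits)
```

The cost grows linearly with the number of chunks, which is why this approach suits a corpus that fits in memory and stops making sense at millions of chunks.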

It shows its work

every answer ships a Decision Report — what was kept, dropped, and why

ctx.report · decision report

{
  "kept": 3, "dropped": 41, "budget": "1024 tok",
  "chunks": [
    { "id": 17, "score": 8.41, "cite": "p.12 §14.2",
      "why": "kept · top BM25 · 'governing law'" },
    { "id": 9,  "score": 5.02, "cite": "p.12 §14.1",
      "why": "kept · adjacent · bridge passage" }
  ],
  "dropped_sample": [{ "id": 31,
      "why": "over budget after gold kept" }]
}
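
A report shaped like the sample above is easy to inspect in code. The sketch below parses that JSON with the standard library and prints one line per kept chunk. It assumes the report is available as a JSON string; the actual type of ctx.report in each binding is not specified on this page.

```python
import json

# The sample Decision Report shown above, as a JSON string (assumption).
report_json = """
{
  "kept": 3, "dropped": 41, "budget": "1024 tok",
  "chunks": [
    { "id": 17, "score": 8.41, "cite": "p.12 §14.2",
      "why": "kept · top BM25 · 'governing law'" },
    { "id": 9,  "score": 5.02, "cite": "p.12 §14.1",
      "why": "kept · adjacent · bridge passage" }
  ],
  "dropped_sample": [{ "id": 31,
      "why": "over budget after gold kept" }]
}
"""

report = json.loads(report_json)
total = report["kept"] + report["dropped"]
lines = [f'{c["cite"]}: {c["why"]}' for c in report["chunks"]]

print(f'{report["kept"]} / {total} chunks kept, budget {report["budget"]}')
for line in lines:
    print(line)
```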

3 / 44

chunks kept · the rest dropped to budget

§14.2

per-chunk citation · page + clause

bridge

adjacent passage kept for multi-hop

audit

every rewrite stage lands here too

The same report drives evaluation — no LLM judge, the same engine as runtime, milliseconds per query. So you can see why a chunk made the cut, and measure when a change helps.
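
To show how an evaluation without an LLM judge can work in principle, here is a sketch of a retention metric computed from kept chunk ids alone. The definitions are assumptions, not RedHop's documented metric: retention is taken as the fraction of a query's gold evidence chunks that were kept, and the headline figure as the share of queries at or above a threshold. All ids below are made up.

```python
def retention(gold_ids: set[int], kept_ids: set[int]) -> float:
    """Fraction of gold evidence chunks that survived into the context."""
    if not gold_ids:
        return 1.0  # nothing to retain, counted as fully retained (assumption)
    return len(gold_ids & kept_ids) / len(gold_ids)


def share_at_or_above(runs: list[tuple[set[int], set[int]]], threshold: float = 0.8) -> float:
    """Share of queries whose retention meets the threshold."""
    hits = sum(1 for gold, kept in runs if retention(gold, kept) >= threshold)
    return hits / len(runs)


# Hypothetical (gold ids, kept ids) pairs, one per query.
runs = [
    ({17, 9}, {17, 9, 3}),            # both gold chunks kept
    ({4, 8}, {4}),                    # one of two kept
    ({21}, {21, 22}),                 # single gold chunk kept
    ({5, 6, 7, 8, 10}, {5, 6, 7, 8}), # four of five kept
]
share = share_at_or_above(runs, threshold=0.8)
print(share)
```

Because the metric needs only ids, it can be recomputed after every configuration change at the cost of set intersections.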

Evaluation → Multi-hop / bridge passages →

Tune retrieval, measure the lift

numbers are workload-shaped · measured on the same report · check yours

81%

HotpotQA hybrid ≥0.8 retention — ahead of LangChain 77% & LlamaIndex 67%

+8 / +5

multi-hop retention lead on HotpotQA / MuSiQue (n=300 each)

−80%

prompt tokens on real contracts, gold evidence kept, ~1.7ms / query

CUAD retention · the detect → strip → expand recipe

raw 24-word template: 81.3%
+ Stripper: 87.7%
LlamaIndex · ref: 86.0%
+ Vocabulary: 90.7%

A reproducible, audited workflow — not a retrieval-engine win: the same Stripper applied to LlamaIndex lifts it to 94%. Every rewrite stage lands on the Decision Report as an audit trail.
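
The detect → strip → expand recipe can be pictured as a small query-rewrite function that logs each stage. Everything in this sketch is hypothetical: the template regex, the vocabulary table and the rewrite function are invented for illustration and are not RedHop's Stripper or Vocabulary components.

```python
import re

# Hypothetical boilerplate template wrapped around the real topic.
TEMPLATE = re.compile(
    r'^Highlight the parts \(if any\) of this contract related to "(?P<topic>[^"]+)"',
    re.IGNORECASE,
)

# Hypothetical expansion vocabulary, keyed by lowercase topic.
VOCABULARY = {"governing law": ["governed by", "laws of", "jurisdiction"]}


def rewrite(query: str) -> tuple[str, list[str]]:
    """Return the rewritten query plus an audit trail of the stages applied."""
    audit: list[str] = []
    match = TEMPLATE.match(query)
    if not match:
        return query, audit  # detect: no template, leave the query alone
    audit.append("detect: template matched")
    topic = match.group("topic")
    audit.append(f"strip: kept topic '{topic}'")
    extra = VOCABULARY.get(topic.lower(), [])
    if extra:
        audit.append(f"expand: added {len(extra)} terms")
    return " ".join([topic, *extra]), audit


raw = (
    'Highlight the parts (if any) of this contract related to "Governing Law" '
    "that should be reviewed by a lawyer."
)
rewritten, audit = rewrite(raw)
print(rewritten)
print(audit)
```

Stripping removes words that appear in every query and therefore carry no signal for a lexical scorer; expansion adds terms the document is likely to use.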

Choosing a configuration → Benchmarks →

RedHop vs LangChain vs LlamaIndex

same contract.pdf, same answer — count what you stand up

redhop · load → ask → read

import redhop
from openai import OpenAI

q = "What is the governing law?"
ctx = redhop.Document.from_file("contract.pdf").context(q)
#  parsed, chunked, retrieved, token-budgeted — internally

OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQ: {q}"}],
)

stand up: nothing · parse · chunk · retrieve · token-budget are all internal, and every call hands back a Decision Report.

LangChain · 6 pieces to wire

PyMuPDF loader
text splitter (chunk_size / overlap)
embedding model
FAISS vector store
a retriever
prompt + retrieval chain

six wired pieces · embeddings cost a call per chunk

LlamaIndex · 4 pieces to own

PyMuPDF reader
sentence splitter
vector index
query engine

cleaner than LangChain — still an embed-and-index pipeline you own and pay for

Same answer, without an index to build, embed, or persist. The full head-to-head covers retrieval quality too — and where a dense-first stack wins.

Where RedHop fits

honest positioning · the wins are measured and workload-shaped

Pick RedHop when

You want three small calls (load → ask → read) instead of a framework
You need a Decision Report and per-rewrite audit trail — regulated or debugging-heavy contexts
You're doing multi-hop reasoning where the bridge passage matters
Your corpus fits in memory
Your team ships in Python, Node or Rust and wants one engine, defaults and reports in each

Look elsewhere when

You have millions of chunks and need ANN-scale search infrastructure
Your queries share almost no vocabulary with your documents — a dense-first vector stack fits better
You want a broad connector and agent ecosystem rather than a focused context layer
You want a managed or hosted offering — RedHop is library-only

All comparisons are apples-to-apples, using the same bge-small embedding model. On MuSiQue LangChain still leads narrowly (39% vs 34%). CUAD reaches 90.7% with the Stripper + Vocabulary recipe, but the same Stripper lifts LlamaIndex to 94%: a reproducible, audited workflow, not a retrieval-engine win. RedHop's hybrid tier is currently 2–5× slower than the competitors' hybrid, a known open item. We have not measured against dense-only services at scale. Full head-to-head →

three calls, and you can see every one

Grounded, cited answers — and a report that says exactly what the engine kept, and why.

Get started → Star on GitHub
