Quickstart

RedHop has the same API in three languages. Pick your tab, and the choice follows you down the page.

pip install redhop

One package, no services, no vector DB. Document parsing (PDF/DOCX/PPTX/XLSX) and the optional semantic model are built in.

Point RedHop at a file. It parses, chunks, and indexes it, then hands you back just the context your question needs, which you give to any LLM:

import redhop
from openai import OpenAI
doc = redhop.Document.from_file("contract.pdf") # parse + chunk + index
question = "What is the governing law of this contract?"
ctx = doc.context(question)
# Hand ctx.text() to any provider — no lock-in.
resp = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Use only this context:\n\n{ctx.text()}\n\nQ: {question}"}],
)
print(resp.choices[0].message.content)
print(ctx.report) # the Decision Report ↓
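
Because the context arrives as a plain string, the prompt can be assembled separately from any provider client. The helper below is a minimal sketch, not part of RedHop: `build_messages` is a hypothetical name, and the sample context string is a placeholder standing in for `ctx.text()`.

```python
def build_messages(context_text: str, question: str) -> list[dict]:
    """Wrap retrieved context and a question in a chat-style message list."""
    prompt = f"Use only this context:\n\n{context_text}\n\nQ: {question}"
    return [{"role": "user", "content": prompt}]


# Placeholder text; in real use this would be ctx.text().
messages = build_messages(
    "Governing law: State of New York.",
    "What is the governing law?",
)
print(messages[0]["content"])
```

The resulting list has the same shape as the `messages` argument in the example above, so it can be handed to whichever chat client you already use.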

Every call explains itself, including when RedHop deliberately does nothing:

RedHop Decision Report
══════════════════════
Decision: Auto → passthrough (left the context intact)
Why:
- input is small: 91 tokens ≤ 1500 gate
- under headroom, pruning is measured to be wash-to-harmful
- intervention predicted to add no signal density here
Result:
- kept all retrieved chunks — full evidence preserved
- avoided unnecessary intervention
Economics    retrieved / final tokens, savings, density, retained evidence
Diagnostics  chunks, distractor ratio, second-hop rescues, …
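
One way to picture the gate in this report is as a threshold check on the retrieved token count. The sketch below is illustrative only and is not RedHop's implementation: the function name `auto_decide` and the single-threshold rule are assumptions, while the 1500-token gate and the two outcomes are taken from this page.

```python
def auto_decide(retrieved_tokens: int, gate: int = 1500) -> str:
    """Hypothetical size gate: leave small contexts intact, prune larger ones."""
    if retrieved_tokens < 0:
        raise ValueError("retrieved_tokens must be non-negative")
    # At or under the gate there is headroom, so pruning is skipped.
    return "passthrough" if retrieved_tokens <= gate else "prune"


print(auto_decide(91))    # passthrough
print(auto_decide(4000))  # prune
```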

The decision is also available programmatically:

ctx.report.auto_decision # "passthrough" | "prune"
ctx.report.total_tokens
ctx.report.retained_evidence_ratio
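
These fields make it easy to log what happened on each call. The sketch below assumes `total_tokens` is an integer and `retained_evidence_ratio` is a float between 0 and 1; neither type is confirmed here. `summarize` is a hypothetical helper, and a `SimpleNamespace` stands in for `ctx.report` so the example runs without RedHop installed.

```python
from types import SimpleNamespace


def summarize(report) -> str:
    """Build a one-line log message from the three report fields."""
    if report.auto_decision == "passthrough":
        action = "left context intact"
    else:
        action = "pruned context"
    return (
        f"{action}: {report.total_tokens} tokens, "
        f"{report.retained_evidence_ratio:.0%} of evidence retained"
    )


# Stand-in for ctx.report (values mirror the report shown earlier).
fake_report = SimpleNamespace(
    auto_decision="passthrough",
    total_tokens=91,
    retained_evidence_ratio=1.0,
)
print(summarize(fake_report))
```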

Every selected chunk remembers where it came from, so you can show the model’s evidence trail, not just paste it:

for c in ctx.citations:
    print(c["source"], c["page"]) # e.g. contract.pdf 3 → "from contract.pdf, p.3"
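
Several selected chunks can come from the same page, so a displayed evidence trail usually benefits from de-duplication. The helper below is a sketch under one assumption: each citation is a dict with `"source"` and `"page"` keys, as in the loop above. `evidence_trail` is a hypothetical name, and the sample list stands in for `ctx.citations`.

```python
def evidence_trail(citations) -> list[str]:
    """Turn citation dicts into unique, ordered 'from <source>, p.<page>' labels."""
    seen = set()
    labels = []
    for c in citations:
        key = (c["source"], c["page"])
        if key in seen:
            continue  # skip repeats of the same file and page
        seen.add(key)
        labels.append(f"from {c['source']}, p.{c['page']}")
    return labels


# Stand-in for ctx.citations.
sample = [
    {"source": "contract.pdf", "page": 3},
    {"source": "contract.pdf", "page": 3},
    {"source": "contract.pdf", "page": 7},
]
print(evidence_trail(sample))
# ['from contract.pdf, p.3', 'from contract.pdf, p.7']
```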

Loading a file is the quickest start, but it’s one of several on-ramps, and all return a Document:

# Text you already have (your own parser/OCR, a DB field).
doc = redhop.Document.from_text(open("notes.md").read())
# Already chunked it yourself — wrap each in redhop.Chunk so source/id/metadata travel through.
doc = redhop.Document.from_chunks([
    redhop.Chunk("clause one …", source="msa.pdf", id="c1"),
    redhop.Chunk("clause two …", source="msa.pdf", id="c2"),
])
# A whole folder — one combined index, citations per file.
doc = redhop.Document.from_folder("./docs")
# Bytes from S3 / Azure / GCS / HTTP.
doc = redhop.Document.from_bytes(s3_object_bytes, source="contract.pdf")

See all the loaders, including a persistent, incremental on-disk index over thousands of files.

doc = redhop.Document.from_file(
    "contract.pdf",
    chunk_size=128, # index-time: how the doc is split
    strategy="auto", # size-gated: prune only under dilution
)
ctx = doc.context(query, budget=2000) # query-time: vary freely, no re-indexing

chunk_size is fixed at construction, because it determines how the index is built. The per-query budget is free to vary. Every parameter has a default; see Options for the full list.
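
The split between index-time and query-time settings can be illustrated with a toy index. This is a sketch of the general idea only, not RedHop's retrieval: `ToyIndex`, the term-overlap ranking, and the word count used as a token estimate are all assumptions made for the example.

```python
class ToyIndex:
    """Toy stand-in: chunking happens once, the budget is applied per query."""

    def __init__(self, text: str, chunk_size: int = 128):
        words = text.split()
        # Index-time: split once into chunks of at most chunk_size words.
        self.chunks = [
            " ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)
        ]

    def context(self, query: str, budget: int = 2000) -> list[str]:
        terms = set(query.lower().split())
        # Rank chunks by query-term overlap; ties keep their original order.
        ranked = sorted(
            self.chunks,
            key=lambda ch: -len(terms & set(ch.lower().split())),
        )
        picked, used = [], 0
        for ch in ranked:
            cost = len(ch.split())  # word count as a crude token estimate
            if used + cost > budget:
                continue
            picked.append(ch)
            used += cost
        return picked


index = ToyIndex("alpha beta gamma delta epsilon zeta", chunk_size=2)
print(index.context("delta", budget=2))  # ['gamma delta']
print(index.context("delta", budget=4))  # ['gamma delta', 'alpha beta']
```

Both queries reuse the same three chunks built in the constructor; only the amount of text returned changes with the budget.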

Next: Loaders for every way to get documents in · Overview for the one idea, and how it works · Retrieval options for when BM25 isn’t enough.