Architecture¶
Design principles¶
Most knowledge tools give you an answer and hope it's right. KB tells you how much to trust the answer and why. When sources disagree, it flags the contradiction rather than silently picking one.
Embeddings and search run entirely on your machine. The only external call is to an LLM for synthesis, and that's optional.
Connectors, extractors, embedding models, and LLM providers are all swappable. Adding a new source type or file format doesn't touch core code.
System overview¶
INGESTION QUERY
───────── ─────
┌───────────┐ ┌──────────┐ ┌──────────┐
│ Local │ │ Git │ │Confluence│ ...
│Filesystem │ │ Repos │ │ Slack │
└────┬──────┘ └────┬─────┘ └─────┬────┘
│ │ │
▼ ▼ ▼
┌────────────────────────────────────────┐
│ Connectors │ ┌─────────────────┐
│ Pull content, detect changes (SHA-256)│ │ User Query │
└──────────────────┬─────────────────────┘ └────────┬────────┘
│ │
▼ ▼
┌────────────────────────────────────────┐ ┌──────────────────────────┐
│ Extractors │ │ Multi-Query Expansion │
│ Chunk at semantic boundaries per type │ │ LLM rephrases using │
│ (headings, functions, paragraphs) │ │ corpus vocabulary │
└──────────────────┬─────────────────────┘ │ (optional, needs API) │
│ └────────────┬─────────────┘
▼ │
┌────────────────────────────────────────┐ ▼
│ Enrichment (optional) │ ┌──────────────────────────┐
│ Local LLM annotates chunks with │ │ Embedding │
│ entities and keywords │ │ Embed original │
└──────────────────┬─────────────────────┘ │ + expanded queries │
│ └────────────┬─────────────┘
▼ │
┌────────────────────────────────────────┐ ▼
│ Embedding │ ┌──────────────────────────┐
│ Local model (nomic-embed-text, 768d) │ │ Hybrid Search │
└──────────────────┬─────────────────────┘ │ │
│ │ ┌────────┐ ┌──────────┐ │
▼ │ │Vector │ │ BM25 │ │
┌─────────────────┐ │ │sqlite- │ │ FTS5 │ │
│ │ │ │vec │ │ keyword │ │
│ ┌───────────┐ │ │ └───┬────┘ └────┬─────┘ │
│ │sqlite-vec │ │ │ └─────┬─────┘ │
│ │ (vectors) │ │◄─────────────────│ │ │
│ └───────────┘ │ search │ RRF Merge │
│ ┌───────────┐ │ └────────────┬─────────────┘
│ │ FTS5 │ │ │
│ │(keywords) │ │ ▼
│ └───────────┘ │ ┌──────────────────────────┐
│ │ │ Synthesis (LLM) │
│ SQLite (.db) │ │ or Raw Fragments │
│ │ └────────────┬─────────────┘
└─────────────────┘ │
▼
┌──────────────────────────┐
│ Response │
│ ┌────────────────────┐ │
│ │ Answer + Sources │ │
│ ├────────────────────┤ │
│ │ Confidence Signals │ │
│ │ (fresh/corr/cons/ │ │
│ │ auth → overall) │ │
│ ├────────────────────┤ │
│ │ Contradictions │ │
│ └────────────────────┘ │
└──────────────────────────┘
Ingestion pipeline¶
Connectors¶
Pluggable adapters that pull content from sources. Each source is registered with a type and name. The connector handles authentication, pagination, and change detection.
Ingestion is incremental: unchanged files (by SHA-256 checksum) are skipped. Documents that no longer exist at the source are removed from the database.
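A minimal sketch of this change-detection logic (function and variable names here are illustrative, not KB's actual internals):

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """SHA-256 of the file contents, used as the change-detection key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def plan_ingestion(source_files: dict, stored_checksums: dict):
    """Decide what to ingest, skip, or delete.

    source_files: {path: checksum} for files currently at the source.
    stored_checksums: {path: checksum} recorded in the database.
    """
    changed = {p for p, c in source_files.items() if stored_checksums.get(p) != c}
    unchanged = set(source_files) - changed
    removed = set(stored_checksums) - set(source_files)  # gone at the source
    return changed, unchanged, removed
```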
Supported connectors: local filesystem, Git (GitHub, GitLab, any Git host), Confluence Cloud, Slack, GitHub Wiki. See Connectors for setup details.
Extractors¶
Files are chunked at semantic boundaries based on file type:
| File type | Strategy |
|---|---|
| Markdown (.md) | Split on headings |
| Code (.go, .py, .js, .ts, .jsx, .tsx, .java, .rs, .rb) | Split on function/class boundaries |
| PDF (.pdf) | Text extraction |
| Jupyter (.ipynb) | Cell boundaries |
| Config (.yaml, .yml, .toml, .json, .ini, .conf, .env, .properties) | Logical sections |
| Everything else | Paragraph-based fallback |
Oversized chunks get a fixed-size fallback with configurable overlap (KB_MAX_CHUNK_SIZE, KB_CHUNK_OVERLAP).
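The fixed-size fallback can be sketched as a sliding window (the default values below are illustrative; the real ones come from KB_MAX_CHUNK_SIZE and KB_CHUNK_OVERLAP):

```python
def fixed_size_chunks(text: str, max_size: int = 2000, overlap: int = 200) -> list:
    """Split oversized text into fixed windows; consecutive chunks share
    `overlap` characters so content at a boundary is never lost."""
    if max_size <= overlap:
        raise ValueError("max_size must exceed overlap")
    step = max_size - overlap
    return [text[i:i + max_size] for i in range(0, len(text), step)]
```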
Enrichment (optional)¶
A small local LLM (qwen2.5:0.5b by default) runs over each chunk with a sliding window of neighboring chunks. It appends entity and keyword annotations that improve retrieval without modifying the original text.
Enrichment runs entirely locally, with no external API calls. The enrichment model is pulled automatically on first run.
What enrichment produces¶
The enrichment LLM reads each chunk (plus its neighbors for context) and generates entity names, keywords, and domain terms that are relevant but may not appear verbatim in the text. These annotations are appended to the chunk before embedding. The original chunk text is preserved separately so the raw content is never modified.
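The append-and-preserve step can be sketched as follows (the annotation format shown is hypothetical; KB's actual format is internal):

```python
def enrich_chunk(chunk_text: str, entities: list, keywords: list) -> dict:
    """Append annotations to a copy used for embedding; keep the original intact."""
    annotations = "\n\n[entities: {}]\n[keywords: {}]".format(
        ", ".join(entities), ", ".join(keywords))
    return {
        "original": chunk_text,            # raw content, never modified
        "embed_text": chunk_text + annotations,  # what gets embedded/indexed
    }
```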
Choosing a model¶
The enrichment model is configured via KB_ENRICH_MODEL or the --enrich-model flag.
| Model | Speed | Quality | Memory |
|---|---|---|---|
| qwen2.5:0.5b (default) | Fast | Basic keyword extraction | ~500 MB |
| qwen2.5:3b | Slower | Better entity recognition, more accurate keywords | ~2 GB |
Smaller models are fine for most corpora. Use a larger model if you notice retrieval missing results that should match on entity names or domain-specific terminology.
How enrichment affects retrieval¶
Enriched terms are embedded alongside the chunk text, so they influence both vector similarity and BM25 keyword search. This is especially useful for vocabulary mismatch: if a chunk discusses "k8s pod autoscaling" but the user searches for "Kubernetes horizontal scaling," enrichment can bridge the gap by adding both phrasings.
When to re-enrich¶
Enrichment metadata is stored with each fragment. You need to re-enrich when:
- Changing the enrichment model — different models produce different annotations.
- Changing the prompt version — the annotation format changes.
To re-enrich, either re-ingest with --force or use --re-enrich to update enrichment on existing fragments without re-scanning sources:
```shell
# Re-enrich all fragments with a new model
kb ingest --re-enrich --enrich-model qwen2.5:3b

# Re-enrich only a specific source
kb ingest --re-enrich --source ./my-docs
```
Skipping enrichment¶
Pass --skip-enrichment to kb ingest if you want faster ingestion and don't need the keyword boost. This is useful for quick iteration during development or when your corpus already uses consistent terminology that matches how users search.
Troubleshooting¶
- Ollama OOM with larger models — If Ollama crashes or returns errors during enrichment with qwen2.5:3b or larger, your machine may not have enough memory. Fall back to qwen2.5:0.5b or skip enrichment entirely.
- Slow enrichment on large corpora — Enrichment adds an LLM call per chunk. For large corpora (thousands of files), expect enrichment to take significantly longer than embedding alone. Use --parallel to speed up multi-source ingestion, or --skip-enrichment for the initial ingest and run --re-enrich later.
- Enrichment model not found — KB auto-pulls the enrichment model on first run via kb setup. If this fails (e.g., no internet), pull it manually: ollama pull qwen2.5:0.5b.
Embedding and storage¶
Each chunk is embedded locally (nomic-embed-text by default, 768 dimensions). Vectors are stored in sqlite-vec for similarity search. The raw text is also indexed in an FTS5 table for BM25 keyword search.
Everything lives in a single SQLite database file. No external database infrastructure.
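The keyword side of this dual index can be expressed directly in SQL. A sketch (table and column names here are illustrative, not KB's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Fragment text indexed for BM25 keyword search via FTS5.
con.execute("CREATE VIRTUAL TABLE fragments USING fts5(path, content)")
con.execute("INSERT INTO fragments VALUES (?, ?)",
            ("docs/auth.md", "authentication middleware validates tokens"))
# bm25() returns a rank score; lower is better in SQLite's convention.
rows = con.execute(
    "SELECT path FROM fragments WHERE fragments MATCH ? ORDER BY bm25(fragments)",
    ("authentication",)).fetchall()
# With the sqlite-vec extension loaded, a companion virtual table would hold
# the 768-dimension nomic-embed-text vectors, e.g.:
#   CREATE VIRTUAL TABLE fragment_vecs USING vec0(embedding float[768])
```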
Query pipeline¶
Multi-query expansion¶
When an API key is available, KB does a quick scout retrieval to extract domain vocabulary from the corpus, then asks the LLM to rephrase the query using those terms. This bridges vocabulary mismatch, like when the user says "auth" but the docs say "authentication middleware."
Each expanded query variant is searched independently. Results are merged in the RRF step.
Hybrid search¶
Every query runs through both vector similarity (semantic meaning) and BM25 keyword search (exact term matching). This catches both conceptual matches and precise terminology.
Results from all search paths are merged via Reciprocal Rank Fusion (RRF), which boosts fragments that appear in multiple result lists without requiring score normalization.
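RRF itself is a few lines: each result list contributes 1/(k + rank) per fragment, and the sums are sorted. A minimal sketch (k=60 is the commonly used constant, not necessarily KB's setting):

```python
def rrf_merge(result_lists: list, k: int = 60) -> list:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank_d)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge([
    ["a", "b", "c"],   # e.g. vector results
    ["b", "d", "a"],   # e.g. BM25 results
])
# "a" and "b" appear in both lists, so they outrank "c" and "d"
# without any score normalization between the two search paths.
```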
Synthesis vs raw mode¶
Synthesis mode (default, requires an LLM provider) sends the top fragments to the configured LLM with a system prompt that instructs it to:
- Synthesise a direct answer from the retrieved fragments
- Assess confidence signals across the full context
- Cite specific sources
- Flag contradictions between sources explicitly
Raw mode (no API key needed) returns fragments directly with per-fragment confidence scores computed locally. Useful for debugging retrieval, feeding a separate pipeline, or when no API key is configured.
The trust layer¶
Every response includes a composite trust score built from four independent dimensions.
Confidence signals¶
| Signal | Weight | What it measures |
|---|---|---|
| Freshness | 0.20 | How recently were the sources modified, relative to the corpus age distribution |
| Corroboration | 0.25 | How many independent sources support the answer |
| Consistency | 0.30 | Do the sources agree, or are there contradictions |
| Authority | 0.25 | How authoritative are the source types for this kind of query |
The overall score is a weighted composite of these four signals.
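Using the weights from the table, the composite is a plain weighted sum (a sketch, assuming each signal is normalized to [0, 1]):

```python
WEIGHTS = {"freshness": 0.20, "corroboration": 0.25,
           "consistency": 0.30, "authority": 0.25}

def overall_confidence(signals: dict) -> float:
    """Weighted sum of the four confidence signals, each in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

score = overall_confidence({"freshness": 0.9, "corroboration": 0.8,
                            "consistency": 1.0, "authority": 0.7})
# 0.20*0.9 + 0.25*0.8 + 0.30*1.0 + 0.25*0.7 = 0.855
```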
How confidence is computed¶
In raw mode, confidence is computed per fragment using local heuristics:
- Freshness is scored relative to the corpus age distribution, so a document modified last week scores higher than one modified last year, calibrated to how old the corpus is overall
- Corroboration reflects how many distinct sources contain similar information
- Consistency is based on embedding similarity between fragments about the same topic
- Authority weights source types based on query characteristics (e.g., code repos are more authoritative for implementation questions, Confluence for process questions)
In synthesis mode, the LLM assesses confidence across the full retrieved context, considering cross-fragment agreement, source diversity, and information completeness.
Contradictions¶
When sources disagree, Knowledge Broker flags the contradiction explicitly in the response. The contradictions array contains natural-language descriptions of what the sources disagree about and which sources are involved.
Most knowledge tools silently pick one answer. KB surfaces the disagreement so agents can escalate to a human and humans can figure out which source is actually right.
Using confidence signals¶
Agents can use the overall score to decide how to proceed:
| Score range | Suggested behavior |
|---|---|
| 0.85+ | Answer confidently |
| 0.6–0.85 | Answer with caveats, note uncertainty |
| Below 0.6 | Surface the contradiction or uncertainty to the user |
These thresholds are suggestions. Agents and applications can define their own logic based on the confidence breakdown.
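One way an agent might encode the suggested thresholds (a sketch; the behavior labels are illustrative, and real agents may branch on the per-signal breakdown instead):

```python
def suggested_behavior(overall: float) -> str:
    """Map the overall confidence score to a suggested agent behavior."""
    if overall >= 0.85:
        return "answer"              # answer confidently
    if overall >= 0.6:
        return "answer_with_caveats" # note uncertainty
    return "escalate"                # surface contradiction/uncertainty to user
```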
Configuration¶
KB loads settings from multiple sources (later overrides earlier):
- Defaults — sensible built-in values
- ~/.config/kb/config — persistent user config (respects $XDG_CONFIG_HOME)
- .env in working directory — project-local overrides
- --config <path> — explicit file (useful for server deployments)
- Environment variables — always highest precedence
All config files use KEY=VALUE format. Run kb config to see the resolved values and where each one comes from. See the CLI Reference for the full variable list.
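A minimal example of the KEY=VALUE format (values illustrative; only variables mentioned in this page are shown, see the CLI Reference for the rest):

```ini
# ~/.config/kb/config
KB_ENRICH_MODEL=qwen2.5:3b
KB_MAX_CHUNK_SIZE=2000
KB_CHUNK_OVERLAP=200
```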