# Knowledge Broker — Complete Documentation Source: https://knowledgebroker.dev Generated: 2026-03-19 # Knowledge Broker Your AI agents are guessing at things your org already knows, because the answer is buried across three repos, a Confluence page, and a Slack thread from February. Knowledge Broker is an AI knowledge retrieval engine that searches all of them at once and gives back one answer with sources, confidence scores, and a heads-up when things contradict each other. Run it for your whole org or just on your laptop — either way, AI agents query it over MCP or HTTP, people use the CLI, and nobody needs to already know where to look. ## Why Knowledge Broker Your team's knowledge is scattered across repos, wikis, Confluence, Slack, and local docs. The answer to any question usually exists somewhere, spread across three sources that partially contradict each other. Traditional search finds documents. Knowledge Broker finds answers, tells you how much to trust them, and shows you where sources disagree. It runs on SQLite with local embedding models, no Postgres, no Elasticsearch, no cloud dependencies. One binary, one database file, everything managed automatically. The only external call is to an LLM for answer synthesis, and even that's optional (raw mode does retrieval and confidence scoring entirely locally). The MCP server gives AI agents structured access to the knowledge base with confidence scores they can branch on. When sources disagree, the contradiction is surfaced explicitly so agents and people can act on it. ## What it looks like ```jsonc $ kb query "What database does the inventory service use?" 
{ "answer": "The inventory service uses PostgreSQL (v16 on RDS, r6g.2xlarge).", "confidence": { "overall": 0.93, "breakdown": { "freshness": 0.94, "corroboration": 0.85, "consistency": 1.00, "authority": 0.95 } }, "sources": [ { "source_type": "confluence", "source_name": "ACME", "source_path": "Internal Services" }, { "source_type": "slack", "source_name": "acme", "source_path": "#platform-engineering/2026-03-06" } ], "contradictions": [] } ``` The answer is synthesised from Confluence docs and Slack history. Every response includes a confidence breakdown and source attribution. ## Who it's for Engineering teams that want AI-powered knowledge retrieval across all their repos, docs, and chat history. Platform teams that want to give everyone — and every AI coding agent — access to the same organizational knowledge without each person setting up their own tooling. ## Get started ```bash curl -fsSL https://knowledgebroker.dev/install.sh | sh ``` Install and run your first query in under 5 minutes: [Getting Started](quickstart.md). Then [deploy for your team](deployment.md). ## How it works 1. **[Connectors](connectors.md)** pull content from sources: local filesystem, Git, Confluence, Slack, GitHub Wiki 2. **Extractors** chunk files at semantic boundaries (headings for markdown, functions for code) 3. **Embeddings** convert chunks to vectors locally; raw text is indexed with FTS5 for keyword search 4. **Hybrid search** runs vector similarity and BM25 keyword search, merged via Reciprocal Rank Fusion 5. **[Confidence signals](architecture.md#confidence-signals)** assess trust across four dimensions: freshness, corroboration, consistency, authority 6. **Synthesis** (optional) produces an answer via an LLM, or returns ranked fragments directly in raw mode Read the full [architecture](architecture.md) for details on the trust layer and query pipeline. ## License [BSL 1.1](https://github.com/alecgard/knowledge-broker/blob/main/LICENSE), free to use and self-host. 
Converts to Apache 2.0 after 4 years. --- # Getting Started Install KB and run your first query locally. For shared team setups, see [Team Deployment](deployment.md). ## Install ```bash curl -fsSL https://knowledgebroker.dev/install.sh | sh ``` This downloads the latest `kb` binary for your platform (macOS or Linux) and places it on your PATH. All runtime dependencies are managed automatically on first run. ??? note "Build from source" Requires Go 1.24+: ```bash git clone https://github.com/alecgard/knowledge-broker.git cd knowledge-broker make install ``` `make install` builds the `kb` binary and adds it to your PATH. ## Ingest Point KB at your sources. Descriptions help agents understand what each source contains: ```bash kb ingest --source ./my-project --description "Payment processing service" kb ingest --git https://github.com/acme/platform --description "Platform services" kb ingest --confluence ENGINEERING --description "Engineering wiki" kb ingest --slack C0ABC123DEF --description "Platform engineering channel" ``` KB walks each source, chunks files at semantic boundaries (headings for markdown, functions for code), embeds them locally, and stores everything in a single SQLite database. Ingestion is incremental, so re-running the same command only processes new or changed files. Set this up as a cron job or CI step to keep the knowledge base current. ## Query ### Raw mode (no API key needed) Raw mode runs the full retrieval pipeline (embedding, hybrid search, confidence scoring) entirely locally. No external API key required. ```bash kb query --raw "how does authentication work?" ``` Returns ranked fragments with content, source metadata, and per-fragment confidence scores. ### Synthesis mode (requires an LLM provider) For synthesised answers with cross-fragment confidence assessment and contradiction detection. 
Configure an API key for your preferred provider: ```bash # Save to your persistent config (recommended — survives new shells) mkdir -p ~/.config/kb echo 'ANTHROPIC_API_KEY=sk-ant-...' >> ~/.config/kb/config # Or export for the current session export ANTHROPIC_API_KEY=sk-ant-... ``` Other providers work too: ```bash # OpenAI KB_LLM_PROVIDER=openai OPENAI_API_KEY=sk-... # Local model via Ollama (no API key needed) KB_LLM_PROVIDER=ollama ``` ```bash kb query "how does authentication work?" ``` Returns a natural-language answer with an overall confidence score, source citations, and any contradictions between sources. ### Human-readable streaming ```bash kb query --human "how does authentication work?" ``` Streams the answer to the terminal as it's generated. ## Tell your agents about KB If you use an AI coding agent (Claude Code, Cursor, etc.), add a prompt to your project config telling it when and how to use KB. Without this, agents won't know the knowledge base exists. We provide ready-made prompt templates you can drop into your `CLAUDE.md`, `.cursorrules`, or equivalent — see [Agent prompts](mcp.md#agent-prompts). ## What requires an API key KB works entirely locally out of the box. An LLM provider (Claude, OpenAI, or local via Ollama) unlocks additional capabilities but is never required for core retrieval. | Capability | Local only | With API key | |------------|:-----------:|:------------:| | Ingestion, embedding, hybrid search | :material-check: | :material-check: | | Raw retrieval with confidence signals | :material-check: | :material-check: | | Chunk enrichment (entity/keyword annotations) | :material-check: | :material-check: | | **Multi-query expansion** | | :material-check: | | **Answer synthesis** | | :material-check: | | **Cross-fragment confidence assessment** | | :material-check: | | **Contradiction detection** | | :material-check: | Run `kb config` at any time to see where your settings are coming from. 
See [CLI Reference — Configuration](cli.md#configuration) for the full search path.

## Next steps

- [Deploy for your team](deployment.md) — shared server, HTTP API, remote MCP
- [MCP Server](mcp.md) — connect AI agents to your local or shared KB instance
- [Connect more sources](connectors.md) — Confluence, Slack, GitHub Wiki
- [Understand the trust layer](architecture.md) — how confidence signals work
- [CLI Reference](cli.md) — all commands and flags

---

# Team Deployment

The typical setup: one KB instance runs on a shared server with your org's sources ingested. Developers and AI agents connect to it from their own machines via CLI, HTTP, or MCP.

## Server setup

### 1. Install

On the server:

```bash
curl -fsSL https://knowledgebroker.dev/install.sh | sh
```

### 2. Ingest your org's sources

```bash
kb ingest --git https://github.com/acme/platform --description "Platform services"
kb ingest --confluence ENGINEERING --description "Engineering wiki"
kb ingest --slack C0ABC123DEF --description "Platform engineering channel"
```

Set up a cron job or CI step to re-run ingestion periodically. Only new or changed files are processed.

### 3. Configure synthesis (optional)

Set an API key on the server for answer synthesis. Without one, raw retrieval still works. The recommended approach for servers is a config file:

```bash
# Create a server config file
cat > /etc/kb/config <<'EOF'
ANTHROPIC_API_KEY=sk-ant-...
EOF
```

## Backup and restore

### Backup

Back up the database with `kb backup`. Backups are timestamped files such as `kb-backup-20250115-030000.db`. Pass `--output <path>` to write to a specific location.

### Restore

Restore from a backup file:

```bash
kb restore /path/to/kb-backup-20250115-030000.db
```

This validates that the backup is a valid SQLite database, then prompts for confirmation before overwriting:

```
This will replace the current database at /home/deploy/.local/share/kb/kb.db. Continue? [y/N]
```

Pass `--force` to skip the confirmation prompt (useful in scripts):

```bash
kb restore --force /path/to/backup.db
```

Stop the server before restoring, then restart it afterward.

### Migration between machines

Copy the backup file to the new machine.
The only requirement is that the embedding model matches — same model name and dimensions. If the models differ, re-ingest from your sources instead. ## Architecture ``` ┌─────────────────────────────────┐ │ KB Server │ │ │ │ kb serve │ │ ├── HTTP API (:8080) │ │ ├── MCP SSE (:8082) │ │ └── SQLite DB (kb.db) │ └──────┬──────────────┬───────────┘ │ │ ┌────────────┴──┐ ┌──────┴────────────┐ │ HTTP / CLI │ │ MCP SSE │ │ --remote │ │ │ ├───────────────┤ ├───────────────────┤ │ kb query │ │ Claude Code │ │ kb sources │ │ Cursor │ │ kb ingest │ │ Windsurf │ │ curl / scripts│ │ Any MCP client │ └───────────────┘ └───────────────────┘ ``` --- # Architecture ## Design principles Most knowledge tools give you an answer and hope it's right. KB tells you how much to trust the answer and why. When sources disagree, it flags the contradiction rather than silently picking one. Embeddings and search run entirely on your machine. The only external call is to an LLM for synthesis, and that's optional. Connectors, extractors, embedding models, and LLM providers are all swappable. Adding a new source type or file format doesn't touch core code. ## System overview ``` INGESTION QUERY ───────── ───── ┌───────────┐ ┌──────────┐ ┌──────────┐ │ Local │ │ Git │ │Confluence│ ... 
│Filesystem │ │ Repos │ │ Slack │ └────┬──────┘ └────┬─────┘ └─────┬────┘ │ │ │ ▼ ▼ ▼ ┌────────────────────────────────────────┐ │ Connectors │ ┌─────────────────┐ │ Pull content, detect changes (SHA-256)│ │ User Query │ └──────────────────┬─────────────────────┘ └────────┬────────┘ │ │ ▼ ▼ ┌────────────────────────────────────────┐ ┌──────────────────────────┐ │ Extractors │ │ Multi-Query Expansion │ │ Chunk at semantic boundaries per type │ │ LLM rephrases using │ │ (headings, functions, paragraphs) │ │ corpus vocabulary │ └──────────────────┬─────────────────────┘ │ (optional, needs API) │ │ └────────────┬─────────────┘ ▼ │ ┌────────────────────────────────────────┐ ▼ │ Enrichment (optional) │ ┌──────────────────────────┐ │ Local LLM annotates chunks with │ │ Embedding │ │ entities and keywords │ │ Embed original │ └──────────────────┬─────────────────────┘ │ + expanded queries │ │ └────────────┬─────────────┘ ▼ │ ┌────────────────────────────────────────┐ ▼ │ Embedding │ ┌──────────────────────────┐ │ Local model (nomic-embed-text, 768d) │ │ Hybrid Search │ └──────────────────┬─────────────────────┘ │ │ │ │ ┌────────┐ ┌──────────┐ │ ▼ │ │Vector │ │ BM25 │ │ ┌─────────────────┐ │ │sqlite- │ │ FTS5 │ │ │ │ │ │vec │ │ keyword │ │ │ ┌───────────┐ │ │ └───┬────┘ └────┬─────┘ │ │ │sqlite-vec │ │ │ └─────┬─────┘ │ │ │ (vectors) │ │◄─────────────────│ │ │ │ └───────────┘ │ search │ RRF Merge │ │ ┌───────────┐ │ └────────────┬─────────────┘ │ │ FTS5 │ │ │ │ │(keywords) │ │ ▼ │ └───────────┘ │ ┌──────────────────────────┐ │ │ │ Synthesis (LLM) │ │ SQLite (.db) │ │ or Raw Fragments │ │ │ └────────────┬─────────────┘ └─────────────────┘ │ ▼ ┌──────────────────────────┐ │ Response │ │ ┌────────────────────┐ │ │ │ Answer + Sources │ │ │ ├────────────────────┤ │ │ │ Confidence Signals │ │ │ │ (fresh/corr/cons/ │ │ │ │ auth → overall) │ │ │ ├────────────────────┤ │ │ │ Contradictions │ │ │ └────────────────────┘ │ └──────────────────────────┘ ``` ## Ingestion pipeline ``` 
Source → Connector → Extractor → Enrichment → Embedding → SQLite (sqlite-vec + FTS5) ``` ### Connectors Pluggable adapters that pull content from sources. Each source is registered with a type and name. The connector handles authentication, pagination, and change detection. Ingestion is incremental: unchanged files (by SHA-256 checksum) are skipped. Documents that no longer exist at the source are removed from the database. Supported connectors: local filesystem, Git (GitHub, GitLab, any Git host), Confluence Cloud, Slack, GitHub Wiki. See [Connectors](connectors.md) for setup details. ### Extractors Files are chunked at semantic boundaries based on file type: | File type | Strategy | |-----------|----------| | Markdown (`.md`) | Split on headings | | Code (`.go`, `.py`, `.js`, `.ts`, `.jsx`, `.tsx`, `.java`, `.rs`, `.rb`) | Split on function/class boundaries | | PDF (`.pdf`) | Text extraction | | Jupyter (`.ipynb`) | Cell boundaries | | Config (`.yaml`, `.yml`, `.toml`, `.json`, `.ini`, `.conf`, `.env`, `.properties`) | Logical sections | | Everything else | Paragraph-based fallback | Oversized chunks get a fixed-size fallback with configurable overlap (`KB_MAX_CHUNK_SIZE`, `KB_CHUNK_OVERLAP`). ### Enrichment (optional) A small local LLM (`qwen2.5:0.5b` by default) runs over each chunk with a sliding window of neighboring chunks. It appends entity and keyword annotations that improve retrieval without modifying the original text. Enrichment runs entirely locally, no external API calls. The enrichment model is pulled automatically on first run. #### What enrichment produces The enrichment LLM reads each chunk (plus its neighbors for context) and generates entity names, keywords, and domain terms that are relevant but may not appear verbatim in the text. These annotations are appended to the chunk before embedding. The original chunk text is preserved separately so the raw content is never modified. 
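The append-then-embed flow can be sketched as follows. This is an illustrative stand-in, not KB's actual internals: `annotate`, `embed`, and the storage layout are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class StoredFragment:
    original_text: str   # preserved verbatim; never modified
    enriched_text: str   # original text plus appended annotations; used for indexing
    vector: list         # embedding computed over the enriched text

def annotate(chunk: str, neighbors: list[str]) -> list[str]:
    """Stand-in for the local enrichment LLM: reads the chunk (and its
    neighbors for context) and returns related entities/keywords that
    may not appear verbatim in the text."""
    return ["Kubernetes", "horizontal scaling"]

def embed(text: str) -> list[float]:
    """Stand-in for the local embedding model."""
    return [float(len(text))]  # placeholder vector

def enrich_and_embed(chunk: str, neighbors: list[str]) -> StoredFragment:
    annotations = annotate(chunk, neighbors)
    # Annotations are appended after the chunk, so the original text survives intact.
    enriched = chunk + "\n\n[annotations] " + ", ".join(annotations)
    return StoredFragment(
        original_text=chunk,
        enriched_text=enriched,
        vector=embed(enriched),
    )

fragment = enrich_and_embed("k8s pod autoscaling notes ...", neighbors=[])
```

The point of the separation is that retrieval sees the enriched text while anything displayed to users comes from `original_text`.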
#### Choosing a model The enrichment model is configured via `KB_ENRICH_MODEL` or the `--enrich-model` flag. | Model | Speed | Quality | Memory | |-------|-------|---------|--------| | `qwen2.5:0.5b` (default) | Fast | Basic keyword extraction | ~500 MB | | `qwen2.5:3b` | Slower | Better entity recognition, more accurate keywords | ~2 GB | Smaller models are fine for most corpora. Use a larger model if you notice retrieval missing results that should match on entity names or domain-specific terminology. #### How enrichment affects retrieval Enriched terms are embedded alongside the chunk text, so they influence both **vector similarity** and **BM25 keyword search**. This is especially useful for vocabulary mismatch: if a chunk discusses "k8s pod autoscaling" but the user searches for "Kubernetes horizontal scaling," enrichment can bridge the gap by adding both phrasings. #### When to re-enrich Enrichment metadata is stored with each fragment. You need to re-enrich when: - **Changing the enrichment model** — different models produce different annotations. - **Changing the prompt version** — the annotation format changes. To re-enrich, either re-ingest with `--force` or use `--re-enrich` to update enrichment on existing fragments without re-scanning sources: ```bash # Re-enrich all fragments with a new model kb ingest --re-enrich --enrich-model qwen2.5:3b # Re-enrich only a specific source kb ingest --re-enrich --source ./my-docs ``` #### Skipping enrichment Pass `--skip-enrichment` to `kb ingest` if you want faster ingestion and don't need the keyword boost. This is useful for quick iteration during development or when your corpus already uses consistent terminology that matches how users search. #### Troubleshooting - **Ollama OOM with larger models** — If Ollama crashes or returns errors during enrichment with `qwen2.5:3b` or larger, your machine may not have enough memory. Fall back to `qwen2.5:0.5b` or skip enrichment entirely. 
- **Slow enrichment on large corpora** — Enrichment adds an LLM call per chunk. For large corpora (thousands of files), expect enrichment to take significantly longer than embedding alone. Use `--parallel` to speed up multi-source ingestion, or `--skip-enrichment` for the initial ingest and run `--re-enrich` later. - **Enrichment model not found** — KB auto-pulls the enrichment model on first run via `kb setup`. If this fails (e.g., no internet), pull it manually: `ollama pull qwen2.5:0.5b`. ### Embedding and storage Each chunk is embedded locally (`nomic-embed-text` by default, 768 dimensions). Vectors are stored in **sqlite-vec** for similarity search. The raw text is also indexed in an **FTS5** table for BM25 keyword search. Everything lives in a single SQLite database file. No external database infrastructure. ## Query pipeline ``` Query → Expansion → Embedding → Hybrid Search (vector + BM25) → RRF Merge → Synthesis/Raw ``` ### Multi-query expansion When an API key is available, KB does a quick scout retrieval to extract domain vocabulary from the corpus, then asks the LLM to rephrase the query using those terms. This bridges vocabulary mismatch, like when the user says "auth" but the docs say "authentication middleware." Each expanded query variant is searched independently. Results are merged in the RRF step. ### Hybrid search Every query runs through both **vector similarity** (semantic meaning) and **BM25 keyword search** (exact term matching). This catches both conceptual matches and precise terminology. Results from all search paths are merged via **Reciprocal Rank Fusion** (RRF), which boosts fragments that appear in multiple result lists without requiring score normalization. 
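As a concrete illustration of the merge step, here is a minimal Reciprocal Rank Fusion over two ranked lists. The `k=60` constant is the common default from the RRF literature, not a documented KB setting:

```python
def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists; items ranked high in multiple lists win."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1/(k + rank). Only positions matter,
            # so vector scores and BM25 scores never need normalizing.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["frag_a", "frag_b", "frag_c"]   # semantic search results
bm25_hits = ["frag_b", "frag_d", "frag_a"]     # keyword search results
merged = rrf_merge([vector_hits, bm25_hits])
# frag_a and frag_b appear in both lists, so they outrank frag_c and frag_d
```

With multi-query expansion, each expanded variant contributes its own result list to the same merge.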
### Synthesis vs raw mode **Synthesis mode** (default, requires an LLM provider) sends the top fragments to the configured LLM with a system prompt that instructs it to: - Synthesise a direct answer from the retrieved fragments - Assess confidence signals across the full context - Cite specific sources - Flag contradictions between sources explicitly **Raw mode** (no API key needed) returns fragments directly with per-fragment confidence scores computed locally. Useful for debugging retrieval, feeding a separate pipeline, or when no API key is configured. ## The trust layer Every response includes a composite trust score built from four independent dimensions. ### Confidence signals | Signal | Weight | What it measures | |--------|--------|-----------------| | **Freshness** | 0.20 | How recently were the sources modified, relative to the corpus age distribution | | **Corroboration** | 0.25 | How many independent sources support the answer | | **Consistency** | 0.30 | Do the sources agree, or are there contradictions | | **Authority** | 0.25 | How authoritative are the source types for this kind of query | The **overall** score is a weighted composite: ``` overall = freshness × 0.20 + corroboration × 0.25 + consistency × 0.30 + authority × 0.25 ``` ### How confidence is computed In **raw mode**, confidence is computed per fragment using local heuristics: - **Freshness** is scored relative to the corpus age distribution, so a document modified last week scores higher than one modified last year, calibrated to how old the corpus is overall - **Corroboration** reflects how many distinct sources contain similar information - **Consistency** is based on embedding similarity between fragments about the same topic - **Authority** weights source types based on query characteristics (e.g., code repos are more authoritative for implementation questions, Confluence for process questions) In **synthesis mode**, the LLM assesses confidence across the full retrieved context, 
considering cross-fragment agreement, source diversity, and information completeness.

### Contradictions

When sources disagree, Knowledge Broker flags the contradiction explicitly in the response. The `contradictions` array contains natural-language descriptions of what the sources disagree about and which sources are involved.

Most knowledge tools silently pick one answer. KB surfaces the disagreement so agents can escalate to a human and humans can figure out which source is actually right.

### Using confidence signals

Agents can use the overall score to decide how to proceed:

| Score range | Suggested behavior |
|-------------|-------------------|
| 0.85+ | Answer confidently |
| 0.6–0.85 | Answer with caveats, note uncertainty |
| Below 0.6 | Surface the contradiction or uncertainty to the user |

These thresholds are suggestions. Agents and applications can define their own logic based on the confidence breakdown.

## Configuration

KB loads settings from multiple sources (later overrides earlier):

1. **Defaults** — sensible built-in values
2. **`~/.config/kb/config`** — persistent user config (respects `$XDG_CONFIG_HOME`)
3. **`.env` in working directory** — project-local overrides
4. **`--config <file>`** — explicit file (useful for server deployments)
5. **Environment variables** — always highest precedence

All config files use `KEY=VALUE` format. Run `kb config` to see the resolved values and where each one comes from. See the [CLI Reference](cli.md#configuration) for the full variable list.

---

# Connectors

KB uses pluggable connectors to ingest content from different sources. Each connector scans a source, produces documents, and supports incremental re-ingestion via checksums.

## Local Filesystem

Ingest files from a local directory. Walks the directory tree recursively, respects `.gitignore` patterns.

```bash
kb ingest --source ./path/to/dir
kb ingest --source ./repo-a --source ./repo-b  # multiple directories
```

No configuration needed.
This is the default if no flags are given, so `kb ingest` ingests the current directory. ## Git Clone and ingest a Git repository by URL. Supports public repos directly; private repos authenticate via `KB_GITHUB_TOKEN`, the `gh` CLI, or GitHub device flow. GitLab and other Git hosts are also supported. ```bash kb ingest --git https://github.com/owner/repo kb ingest --git https://github.com/owner/private-repo # uses gh CLI or device flow kb ingest --git https://gitlab.com/owner/repo # uses KB_GITLAB_TOKEN kb ingest --git https://github.com/owner/repo#abc1234 # pin to a specific commit ``` | Variable | Required | Description | |----------|----------|-------------| | `KB_GITHUB_TOKEN` | No | GitHub personal access token for private repos | | `KB_GITLAB_TOKEN` | No | GitLab personal access token for private repos | | `KB_GIT_TOKEN` | No | Generic Git token (works with any host) | | `KB_GITHUB_CLIENT_ID` | No | GitHub OAuth app client ID for device flow auth | For GitHub, if no token is set, KB tries the `gh` CLI's cached token, then falls back to device flow if a client ID is configured. ## Confluence Ingest pages from an Atlassian Confluence Cloud space. Fetches all pages via the REST API with pagination, extracts body content, and supports incremental sync. ```bash kb ingest --confluence ENGINEERING kb ingest --confluence ENGINEERING --confluence PRODUCT # multiple spaces ``` | Variable | Required | Description | |----------|----------|-------------| | `KB_CONFLUENCE_BASE_URL` | Yes | Your Confluence instance URL (e.g., `https://yoursite.atlassian.net`) | | `KB_CONFLUENCE_EMAIL` | Yes | Email address for API authentication | | `KB_CONFLUENCE_TOKEN` | Yes | Atlassian API token ([create one here](https://id.atlassian.com/manage-profile/security/api-tokens)) | The `--confluence` flag takes a **space key** (the short code visible in Confluence URLs, e.g., `ENGINEERING`). Each space is ingested as a separate source. 
Pages that are deleted in Confluence are automatically detected and removed from the knowledge base on re-ingestion. ## Slack Ingest messages from Slack channels. Fetches message history via the Slack Web API. Threaded conversations become individual documents; non-threaded messages are grouped by day. ```bash kb ingest --slack C0ABC123DEF kb ingest --slack C0ABC123DEF --slack C0XYZ789GHI # multiple channels ``` | Variable | Required | Description | |----------|----------|-------------| | `KB_SLACK_TOKEN` | Yes | Bot User OAuth Token (`xoxb-...`) | | `KB_SLACK_WORKSPACE` | No | Workspace name for display (e.g., `acme-org`) | ### Slack app setup 1. Create a Slack app at [api.slack.com/apps](https://api.slack.com/apps) 2. Add the following **Bot Token Scopes** under OAuth & Permissions: - `channels:history` — read message history - `channels:read` — list channels and get channel info - `channels:join` — join public channels (if the bot isn't already a member) 3. Install the app to your workspace 4. Copy the **Bot User OAuth Token** (`xoxb-...`) to `KB_SLACK_TOKEN` 5. Find channel IDs: right-click a channel name in Slack → "View channel details" → the ID is at the bottom By default, Slack ingestion looks back **90 days**. Messages older than that are not fetched. ### How messages are structured - **Threaded conversations** (messages with replies): each thread becomes a single document containing the parent message and all replies - **Non-threaded messages**: grouped into daily documents per channel ## GitHub Wiki Ingest pages from a GitHub repository's wiki. The wiki is a separate Git repo (`{repoURL}.wiki.git`) that KB clones and scans. Page links are rewritten to point to the GitHub wiki web UI. 
```bash kb ingest --wiki https://github.com/owner/repo kb ingest --wiki https://github.com/owner/repo --wiki https://github.com/owner/other-repo ``` | Variable | Required | Description | |----------|----------|-------------| | `KB_GITHUB_TOKEN` | No | Required for private repos | The `--wiki` flag takes the **main repository URL** (not the wiki URL). KB automatically derives the wiki clone URL. Authentication works the same as the Git connector: `KB_GITHUB_TOKEN`, `gh` CLI, or device flow. ## Combining sources All connector flags can be mixed in a single ingest command: ```bash kb ingest \ --source ./local-docs \ --git https://github.com/acme/backend \ --confluence ENGINEERING \ --slack C0ABC123DEF \ --wiki https://github.com/acme/platform ``` Each source is tracked independently. Re-running the same command only processes new or changed content. ## Incremental ingestion All connectors support incremental ingestion via SHA-256 checksums. On each run: 1. KB loads known checksums from the database 2. The connector scans the source and compares against known checksums 3. Only new or changed documents are extracted, embedded, and stored 4. Documents that existed previously but are no longer present are deleted from the database To force a full re-ingestion (ignoring checksums): ```bash kb ingest --confluence ENGINEERING --force ``` ## Re-ingesting all sources To re-ingest all previously registered local sources: ```bash kb ingest --all ``` This is useful after upgrading KB or changing chunking/embedding settings. --- # MCP Server Knowledge Broker includes an [MCP](https://modelcontextprotocol.io) server that any MCP-compatible client can use to query and explore the knowledge base. `kb serve` runs the HTTP API, MCP stdio, and MCP SSE transports in a single process. ## Setup The typical deployment: one KB instance runs on a shared machine with your org's sources already ingested. Developers and agents connect to it via MCP or HTTP. 
### Connecting MCP clients

Each developer adds KB to their MCP client config (Claude Code, Cursor, Windsurf, etc.):

```json
{
  "mcpServers": {
    "knowledge-broker": {
      "command": "/path/to/kb",
      "args": ["serve", "--no-http", "--no-sse"]
    }
  }
}
```

If `kb` is on your PATH, you can use `"command": "kb"` directly. This launches KB as a subprocess via stdio.

### SSE (remote access)

`kb serve` also starts an SSE transport on `:8082` by default, so remote clients can connect without running the binary locally:

```bash
kb serve                    # HTTP on :8080, MCP SSE on :8082
kb serve --mcp-addr :9090   # custom MCP SSE port
```

The SSE endpoint is at `http://<host>:<port>/sse` with messages at `http://<host>:<port>/message`. For HTTPS, put a reverse proxy or tunnel in front.

### Remote ingestion

Team members can push content from their local checkouts to the shared instance:

```bash
# On the server
kb serve --addr :8080

# From a developer's machine
kb ingest --source ./my-repo --remote http://server:8080
```

## Tools

### query

Query the knowledge base and get an answer.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | yes | — | The query to search for |
| `topics` | string | no | — | Comma-separated topics to boost relevance |
| `limit` | number | no | 20 | Max fragments to retrieve |
| `raw` | boolean | no | false | Return raw fragments instead of synthesised answer |
| `sources` | string | no | — | Comma-separated source names to filter results |
| `source_types` | string | no | — | Comma-separated source types to filter results |
| `no_expand` | boolean | no | false | Disable multi-query expansion |

**Synthesis mode (default):** Returns a synthesised answer with confidence signals and source citations. Requires an LLM provider (`ANTHROPIC_API_KEY` for Claude, or configure `KB_LLM_PROVIDER`).

**Raw mode (raw=true):** Returns fragments with content, source metadata, and per-fragment confidence signals. No API key required.
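Agents consuming the tool output can branch on the confidence it reports. This sketch applies the documented signal weights and the suggested thresholds from the trust layer; the function names and behavior labels are illustrative, not part of KB's API:

```python
# Documented weights: overall is a weighted composite of the four signals.
WEIGHTS = {"freshness": 0.20, "corroboration": 0.25, "consistency": 0.30, "authority": 0.25}

def overall_confidence(breakdown: dict[str, float]) -> float:
    """Weighted composite, matching the documented formula."""
    return sum(WEIGHTS[signal] * value for signal, value in breakdown.items())

def agent_behavior(overall: float) -> str:
    """Map the overall score onto the suggested behavior thresholds."""
    if overall >= 0.85:
        return "answer"                 # answer confidently
    if overall >= 0.6:
        return "answer_with_caveats"    # note the uncertainty
    return "surface_uncertainty"        # escalate contradictions to the user

# Hypothetical breakdown from a query response:
breakdown = {"freshness": 0.9, "corroboration": 0.8, "consistency": 1.0, "authority": 0.6}
overall = overall_confidence(breakdown)
```

These are the documented suggestions only; an application can just as easily branch on individual signals, for example treating low `consistency` as a hard stop regardless of the overall score.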
### list-sources List all ingested sources with fragment counts and last sync time. Takes no parameters. Returns an array of sources with `source_type`, `source_name`, `description`, `fragment_count`, and `last_ingest`. ## Prompts ### kb-instructions A prompt that returns instructions teaching the agent when and how to use the knowledge base. Takes no arguments. The response includes: - When to query the knowledge base (missing context, unfamiliar patterns, before making assumptions) - A dynamically generated list of available sources with descriptions and fragment counts - Tips for using synthesis vs raw mode, topics, and source filtering MCP clients that support prompts will show this in their prompt list. Use it to bootstrap agent context without manually writing instructions. ## Agent prompts MCP clients discover KB's tools automatically, but agents won't reach for them unless they know the knowledge base exists. Adding a short prompt to your project config makes the difference between agents that guess and agents that check. ### Claude Code Add to your project's `CLAUDE.md`: ```markdown ## Knowledge base This project is indexed in Knowledge Broker, a shared knowledge base that spans our repos, docs, and internal sources. It's available via MCP as the "knowledge-broker" server (tools: `query` and `list-sources`). Use Knowledge Broker: - Before asking me for context about the codebase, architecture, or how things work - When you encounter unfamiliar patterns, services, or conventions - When you need to understand why something was built a certain way - Instead of grepping across repos for answers that span multiple files Start with `list-sources` to see what's indexed. Use `query` for answers. Check the confidence score — if it's below 0.5, tell me you're uncertain rather than treating the answer as fact. If sources contradict each other, surface both claims with their dates. 
``` ### Codex Add to your project's `AGENTS.md` (also read by Jules, Aider, and other agents that support the format): ```markdown ## Knowledge base This project is indexed in Knowledge Broker, a shared knowledge base available via MCP (server: "knowledge-broker", tools: "query" and "list-sources"). Before making assumptions about the codebase, architecture, or project conventions, use Knowledge Broker's query tool. Start with list-sources to see what's indexed. Check the confidence score in the response — if it's below 0.5, flag the uncertainty. If sources contradict, surface both claims with dates. ``` ### Cursor Add to `.cursor/rules`: ``` This project is indexed in Knowledge Broker, a shared knowledge base available via MCP (server: "knowledge-broker", tools: "query" and "list-sources"). Before making assumptions about the codebase, architecture, or project conventions, use Knowledge Broker's query tool. Start with list-sources to see what's indexed. Check the confidence score in the response — if it's below 0.5, flag the uncertainty. If sources contradict, surface both claims with dates. ``` ### Windsurf Add to `.windsurfrules`: ``` This project is indexed in Knowledge Broker, a shared knowledge base available via MCP (server: "knowledge-broker", tools: "query" and "list-sources"). Before making assumptions about the codebase, architecture, or project conventions, use Knowledge Broker's query tool. Start with list-sources to see what's indexed. Check the confidence score in the response — if it's below 0.5, flag the uncertainty. If sources contradict, surface both claims with dates. ``` ### Generic (any MCP client) The core instruction is the same everywhere. Adapt to your client's prompt format: ``` This project is indexed in Knowledge Broker, a shared knowledge base available via MCP (server: "knowledge-broker"). Use the "query" tool to search for answers about the codebase, architecture, and project conventions before making assumptions. 
Use "list-sources" to discover what's indexed. Pay attention to confidence scores in the response — flag anything below 0.5 as uncertain. When sources contradict each other, surface both claims with their dates so the user can judge which is current. ``` ## Configuration KB loads settings from config files and environment variables. The recommended approach is to save persistent settings to `~/.config/kb/config`: ```bash mkdir -p ~/.config/kb echo 'ANTHROPIC_API_KEY=sk-ant-...' >> ~/.config/kb/config ``` Run `kb config` to see the resolved configuration and where each value comes from. See the [CLI Reference](cli.md#configuration) for the full search path and variable list. ## Typical setup 1. **Deploy KB** on a shared machine. Ingest your org's sources: `kb ingest --confluence ENGINEERING --git https://github.com/org/repo --slack C0ABC123DEF` 2. **Start the server**: `kb serve` 3. **Each developer** adds KB to their MCP client config (see above) 4. Agents call `query` for answers with confidence signals, or `list-sources` to discover what's available 5. The `kb-instructions` prompt bootstraps agent context automatically; no manual prompt engineering needed --- # CLI Reference ## kb ingest Ingest documents from one or more sources into the knowledge base. ```bash kb ingest --source ./path/to/dir kb ingest --git https://github.com/owner/repo kb ingest --confluence ENGINEERING kb ingest --slack C0ABC123DEF kb ingest --wiki https://github.com/owner/repo kb ingest --all ``` | Flag | Description | |------|-------------| | `--source` | Local directory path (repeatable) | | `--git` | Git repository URL (repeatable).
Append `#sha` to pin to a specific commit | | `--confluence` | Confluence space key (repeatable) | | `--slack` | Slack channel ID (repeatable) | | `--wiki` | GitHub Wiki repository URL (repeatable) | | `--all` | Re-ingest all registered local sources | | `--remote` | URL of a remote KB server to push fragments to | | `--description` | Human-readable description of the source (shown to agents) | | `--db` | SQLite database path (default: `kb.db`) | | `--skip-enrichment` | Skip LLM chunk enrichment (faster ingestion) | | `--enrich-model` | Ollama model for chunk enrichment (default: `qwen2.5:0.5b` or `KB_ENRICH_MODEL`) | | `--prompt-version` | Enrichment prompt version: `v1` (full rewrite) or `v2` (append keywords) | | `--re-enrich` | Re-run enrichment on already-ingested chunks, then re-embed | | `--watch` | Watch for file changes and re-ingest automatically (local sources only) | | `--parallel` | Ingest multiple sources in parallel (default: sequential) | | `--force` | Force re-ingestion of all files, ignoring checksums | All connector flags can be combined in a single command. Ingestion is incremental: unchanged files are skipped based on checksums. See [Connectors](connectors.md) for detailed setup instructions per source type. ## kb query Query the knowledge base. ```bash # Raw retrieval (no API key needed) kb query --raw "how does auth work?" # Synthesised answer (requires an LLM provider, Claude by default) kb query "what is the billing retry policy?" # Human-readable streaming kb query --human "how does deployment work?"
# With filters kb query --raw --limit 10 --topics "billing,payments" "retry policy" kb query --raw --source-type git "deployment process" ``` | Flag | Description | |------|-------------| | `--raw` | Return ranked fragments without LLM synthesis | | `--human` | Stream the answer in human-readable format | | `--limit` | Maximum number of fragments to retrieve | | `--topics` | Comma-separated topics to boost relevance | | `--source-type` | Filter by source type (`filesystem`, `git`, `confluence`, `slack`, `github_wiki`) | | `--db` | SQLite database path (default: `kb.db`) | | `--remote` | URL of a remote KB server to query | ## kb chat Start an interactive multi-turn conversation with the knowledge base. Each turn sends the full conversation history to the query engine, so follow-up questions have full context. ```bash # Start a chat session kb chat # With filters kb chat --topics "billing,payments" --source owner/repo # Against a remote server kb chat --remote http://server:8080 ``` Type your question at the `kb> ` prompt. The answer streams to the terminal. Type `exit`, `quit`, or press `Ctrl+C` to end the session. | Flag | Description | |------|-------------| | `--db` | SQLite database path (default: `kb.db`) | | `--limit` | Maximum number of fragments to retrieve | | `--topics` | Comma-separated topics to boost relevance | | `--source` | Filter results to this source name (repeatable) | | `--source-type` | Filter results to this source type (repeatable) | | `--llm` | LLM provider override: `claude`, `openai`, `ollama` | | `--no-expand` | Disable multi-query expansion | | `--remote` | URL of a remote KB server | Example session: ``` $ kb chat Knowledge Broker — interactive chat (type 'exit' or 'quit' to end) kb> how does authentication work? Authentication uses JWT tokens issued by the auth service... --- Confidence: 0.82 --- kb> what happens when a token expires? When a JWT token expires, the client must request a new one... 
--- Confidence: 0.79 --- kb> exit ``` ## kb serve Start the HTTP API and MCP server. Runs the HTTP API, MCP stdio transport, and MCP SSE transport in a single process. ```bash kb serve kb serve --addr :9090 --mcp-addr :9091 ``` | Flag | Default | Description | |------|---------|-------------| | `--addr` | `:8080` | HTTP listen address | | `--mcp-addr` | `:8082` | MCP SSE listen address | | `--db` | `kb.db` | SQLite database path | | `--no-http` | `false` | Disable HTTP API server | | `--no-sse` | `false` | Disable MCP SSE transport | | `--no-stdio` | `false` | Disable MCP stdio transport | Use `--no-*` flags to run only the transports you need: ```bash kb serve # all transports (default) kb serve --no-http --no-sse # stdio only (for MCP client configs) kb serve --no-stdio # HTTP + SSE (headless server deployment) ``` ### Endpoints | Endpoint | Method | Description | |----------|--------|-------------| | `/v1/query` | POST | Query with optional SSE streaming | | `/v1/ingest` | POST | Receive fragments from remote ingestion | | `/v1/sources` | GET | List registered sources | | `/v1/sources` | PATCH | Update source description | | `/v1/sources` | DELETE | Remove a source and its fragments | | `/v1/sources/import` | POST | Import sources from JSON | | `/v1/export` | GET | Export fragment embeddings as JSON | | `/v1/version` | GET | Server version | | `/v1/health` | GET | Health check | | `/metrics` | GET | Prometheus metrics | ### Query request format ```json { "messages": [{"role": "user", "content": "how does auth work?"}], "limit": 20, "mode": "raw", "stream": true, "topics": ["billing", "payments"], "sources": ["my-repo"], "source_types": ["git", "confluence"], "no_expand": false } ``` - Omit `mode` for synthesis (default). Set `"mode": "raw"` for raw retrieval. - Set `"stream": true` for SSE streaming (synthesis mode only). - The `messages` array follows the same format as the Anthropic Messages API. Pass conversation history for multi-turn queries. 
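For scripting against the HTTP API, the request format above can be wrapped in a small client. A minimal Python sketch, assuming a server from `kb serve` on `localhost:8080`; the helper names `build_query_request` and `post_query` are illustrative, not part of KB:

```python
import json
import urllib.request

# Assumed endpoint for a local `kb serve`; adjust host/port to your deployment.
KB_URL = "http://localhost:8080/v1/query"

def build_query_request(question, mode=None, limit=20, topics=None, source_types=None):
    """Assemble a /v1/query payload following the documented request format."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "limit": limit,
    }
    if mode is not None:
        payload["mode"] = mode  # omit entirely for synthesis (the default)
    if topics:
        payload["topics"] = topics
    if source_types:
        payload["source_types"] = source_types
    return payload

def post_query(payload):
    """POST the payload and decode the JSON response (non-streaming)."""
    req = urllib.request.Request(
        KB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_query_request(
    "how does auth work?", mode="raw", source_types=["git", "confluence"]
)
# result = post_query(payload)  # requires a running server
```

Multi-turn queries work the same way: append prior turns to the `messages` array before posting.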
## kb sources Manage registered ingestion sources. All subcommands accept `--remote` to operate on a remote KB server. ### kb sources list List all registered sources with type, name, description, fragment count, and last ingest time. ```bash kb sources list kb sources list --remote http://server:8080 ``` ### kb sources describe Set a description for an existing source. Descriptions appear in `list-sources` results and the `kb-instructions` prompt. ```bash kb sources describe filesystem/my-repo "Payment processing microservice" kb sources describe git/owner/repo "Main backend API" kb sources describe --remote http://server:8080 git/owner/repo "Main backend API" ``` ### kb sources export Export registered sources to a JSON file. ```bash kb sources export sources.json ``` ### kb sources import Import sources from a JSON file. ```bash kb sources import sources.json ``` ### kb sources remove Remove a registered source and all its fragments from the database. ```bash kb sources remove confluence/ENGINEERING kb sources remove --remote http://server:8080 git/owner/repo ``` ## kb export Export fragment embeddings for visualization with TensorBoard Embedding Projector. ```bash kb export --out ./export/ kb export --remote http://server:8080 --out ./export/ ``` Produces `tensors.tsv` and `metadata.tsv` files that can be loaded into the [Embedding Projector](https://projector.tensorflow.org/). ## kb eval Run the evaluation framework to measure retrieval quality. 
```bash make eval # one-command eval kb eval --db eval.db --testset eval/testset.json # manual kb eval --db eval.db --corpus eval/corpus --ingest # ingest corpus first kb eval --db eval.db --json # structured output ``` | Flag | Default | Description | |------|---------|-------------| | `--db` | `kb.db` | Database path | | `--testset` | `eval/testset.json` | Path to test set | | `--corpus` | `eval/corpus` | Path to eval corpus | | `--limit` | `20` | Top-K retrieval limit | | `--ingest` | `false` | Ingest corpus before running eval | | `--json` | `false` | Output structured JSON | | `--skip-enrichment` | `false` | Skip chunk enrichment during ingestion | See [Evaluation](eval.md) for details on metrics, test cases, and extending the eval suite. ## kb cluster Run k-means clustering on fragment embeddings to discover topic groups. ```bash kb cluster ``` ### kb cluster viz Generate an interactive HTML visualization of fragment clusters. ```bash kb cluster viz ``` ## kb setup Verify runtime dependencies and pull required models. Useful for checking everything works before first use, or re-running setup after problems. ```bash kb setup ``` ``` Checking Ollama... running at http://localhost:11434 Checking models... nomic-embed-text... available qwen2.5:0.5b... available Ready. ``` ### kb setup mcp Configure MCP settings for Claude Code or Cursor. ```bash kb setup mcp kb setup mcp --client claude --global kb setup mcp --client cursor --local ``` ## kb version Print the KB version. ```bash kb version kb version --remote http://server:8080 ``` ## kb config Show the resolved configuration: where each value comes from, which config files were loaded, and the current value of every setting (secrets are masked). 
```bash kb config kb config --config /etc/kb/config ``` ``` Config files: ~/.config/kb/config found .env not found --config (not specified) KEY VALUE SOURCE KB_DB kb.db default KB_OLLAMA_URL http://localhost:11434 ~/.config/kb/config ANTHROPIC_API_KEY sk-ant-a**** env ... ``` ## Global flags | Flag | Description | |------|-------------| | `--config` | Path to config file (overrides `.env` and `~/.config/kb/config`) | | `--debug` | Enable debug mode (log all API calls) | | `--no-setup` | Skip automatic runtime management (useful for CI or custom deployments) | ## Configuration KB loads configuration from multiple sources. Later sources override earlier ones: | Precedence | Source | Description | |:----------:|--------|-------------| | 1 (lowest) | Defaults | Built-in defaults | | 2 | `~/.config/kb/config` | Persistent user config (respects `$XDG_CONFIG_HOME`) | | 3 | `.env` in working directory | Project-local overrides (useful during development) | | 4 | `--config <file>` | Explicit file path (useful for server deployments) | | 5 (highest) | Environment variables | Always take precedence | All config files use the same `KEY=VALUE` format (same as `.env`). Run `kb config` to see which source each value comes from.
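The merge order can be illustrated with a short sketch. This is a model of the documented precedence, not KB's actual implementation; `parse_env_file` and `resolve_config` are illustrative names:

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def resolve_config(defaults, user_config, dotenv, explicit, environ):
    """Merge layers low-to-high: each later layer overrides earlier ones."""
    resolved = dict(defaults)
    for layer in (user_config, dotenv, explicit, environ):
        resolved.update(layer)
    return resolved

# A project-local .env overrides the user config; env vars override everything.
cfg = resolve_config(
    defaults={"KB_DB": "kb.db"},
    user_config=parse_env_file("KB_OLLAMA_URL=http://localhost:11434"),
    dotenv=parse_env_file("KB_DB=dev.db"),
    explicit={},
    environ={},
)
# cfg["KB_DB"] == "dev.db"; KB_OLLAMA_URL comes from the user config
```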
### Environment variables | Variable | Default | Description | |----------|---------|-------------| | `KB_DB` | `kb.db` | SQLite database path | | `KB_OLLAMA_URL` | `http://localhost:11434` | Embedding server URL | | `KB_EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model name | | `KB_ENRICH_MODEL` | `qwen2.5:0.5b` | Enrichment model name | | `KB_SKIP_SETUP` | `false` | Skip automatic runtime management | | `KB_LLM_PROVIDER` | `claude` | LLM provider: `claude`, `openai`, or `ollama` | | `ANTHROPIC_API_KEY` | — | API key for Claude (default LLM provider) | | `KB_CLAUDE_MODEL` | `claude-sonnet-4-20250514` | Claude model for synthesis | | `OPENAI_API_KEY` | — | API key for OpenAI | | `KB_LISTEN_ADDR` | `:8080` | Default HTTP listen address | | `KB_MAX_CHUNK_SIZE` | `2000` | Max chunk size in characters | | `KB_CHUNK_OVERLAP` | `150` | Chunk overlap in characters | | `KB_WORKERS` | `4` | Parallel ingestion workers | | `KB_DEFAULT_LIMIT` | `20` | Default fragment retrieval limit | --- # Evaluation Framework A repeatable eval harness for measuring retrieval quality. Run evals before and after changes to chunking, embedding models, or prompts. ## Quick start ```bash make eval ``` This ingests the eval corpus, runs the test set, and prints a summary table. No API key needed. 
## Manual usage ```bash # Ingest the eval corpus into a fresh database kb ingest --source eval/corpus --db eval.db # Run evaluation kb eval --db eval.db # Run with custom options kb eval --db eval.db --testset eval/testset.json --limit 10 --json # Ingest and eval in one step kb eval --db eval.db --corpus eval/corpus --ingest ``` ### Flags | Flag | Default | Description | |------|---------|-------------| | `--db` | `kb.db` | Database path | | `--testset` | `eval/testset.json` | Path to query/answer test set | | `--corpus` | `eval/corpus` | Path to eval corpus directory | | `--limit` | `20` | K value for retrieval (top-K fragments) | | `--ingest` | `false` | Ingest the corpus before running eval | | `--json` | `false` | Output structured JSON instead of a table | ## Metrics ### Retrieval metrics (reported at K=5, K=10, K=20) - **Hit@K** — Was at least one expected source file found in the top-K results? - **Recall@K** — What fraction of the expected source files appeared in the top-K retrieved fragments? - **Precision@K** — What fraction of top-K fragments came from expected source files? - **MRR** — Mean reciprocal rank of the first relevant fragment, averaged across queries. - **Avg confidence** — Mean confidence score across retrieved fragments. - **Avg freshness** — Mean freshness score across retrieved fragments. ### Chunking stats - Total fragment count - Fragments per file - Mean, median, and P95 token length (whitespace-approximated) These track chunking quality over time, so a change that produces 3x more fragments or halves average length is immediately visible. ## Eval corpus Located in `eval/corpus/`.
A set of fictional files about an "Acme Widget Service" designed to exercise different retrieval scenarios: | File | Purpose | |------|---------| | `README.md` | Project overview, features, quick start | | `config.go` | Go config structs and validation | | `api.go` | HTTP API handlers | | `architecture.md` | System design (intentionally contradicts README on some details) | | `runbook.md` | Operational procedures and troubleshooting | | `CHANGELOG.md` | Version history and release notes | | `CONTRIBUTING.md` | Contribution guidelines | | `deploy/kubernetes.yaml` | Kubernetes deployment manifests | | `design-review.md` | Design review discussion | | `incident-review.md` | Incident post-mortem | The corpus is checked into the repo and should not change between eval runs unless you're intentionally updating it. ## Test set Located in `eval/testset.json`. Each entry has: ```json { "id": "q01", "query": "What database does the widget service use?", "expected_sources": ["config.go", "architecture.md"], "reference_answer": "PostgreSQL, configured via DATABASE_URL.", "category": "direct_extraction" } ``` Categories: - **direct_extraction** — single-source factual lookup - **cross_document** — requires information from multiple files - **knowledge_update** — tests whether newer information is preferred - **abstention** — no good answer exists in the corpus - **pronoun_resolution** — requires resolving references across context - **vocabulary_mismatch** — query uses different terms than the corpus ## Extending the eval **Adding queries:** Edit `eval/testset.json`. Include `expected_sources` (filenames that should appear in results) and a `reference_answer` for human comparison. **Adding corpus files:** Add files to `eval/corpus/` and write questions that reference them. Re-run `make eval` to see the impact. 
**Comparing configurations:** Run eval with different embedding models or chunk sizes by changing `KB_EMBEDDING_MODEL` or `KB_MAX_CHUNK_SIZE`, then compare the summary tables. ---
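When comparing configurations, it can help to recompute the retrieval metrics yourself from the JSON output. They are standard IR measures and reduce to a few lines; a minimal sketch with illustrative names, where retrieved fragments are reduced to their source filenames:

```python
def hit_at_k(retrieved, expected, k):
    """1.0 if any expected source appears in the top-k retrieved files."""
    return 1.0 if set(retrieved[:k]) & set(expected) else 0.0

def recall_at_k(retrieved, expected, k):
    """Fraction of expected sources that appear in the top-k."""
    return len(set(retrieved[:k]) & set(expected)) / len(expected)

def precision_at_k(retrieved, expected, k):
    """Fraction of the top-k fragments that come from expected sources."""
    top = retrieved[:k]
    return sum(1 for f in top if f in set(expected)) / len(top)

def reciprocal_rank(retrieved, expected):
    """1/rank of the first relevant fragment; 0.0 if none was retrieved."""
    for rank, f in enumerate(retrieved, start=1):
        if f in expected:
            return 1.0 / rank
    return 0.0

retrieved = ["api.go", "config.go", "README.md", "runbook.md"]
expected = ["config.go", "architecture.md"]
# hit_at_k(..., 4) == 1.0; recall_at_k(..., 4) == 0.5; reciprocal_rank == 0.5
```

MRR is then the mean of `reciprocal_rank` across all queries in the test set.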