cagent with RAG

Enabling RAG in cagent is as simple as adding a rag section to your config. Here's a basic example.

Step 1. Clone the repo

git clone https://github.com/ajeetraina/cagent-rag-demo/
cd cagent-rag-demo

Step 2. Get OpenAI API Key

# Set your OpenAI API key
export OPENAI_API_KEY=your-key-here

Step 3. Verify the YAML file

rag:
  codebase:
    docs: [./src, ./pkg]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./embeddings.db
        limit: 20
        chunking:
          size: 1500
          overlap: 150
      - type: bm25
        database: ./bm25.db
        limit: 15
    results:
      fusion:
        strategy: rrf
        k: 60
      deduplicate: true
      limit: 5

agents:
  root:
    model: openai/gpt-4
    instruction: |
      You are a Go developer. Search the codebase before answering.
      Reference specific files and functions.
    rag: [codebase]

Step 4. Run the agent

cagent run cagent-config.yaml

This config indexes ./src and ./pkg using two strategies:

  • chunked-embeddings converts code into vectors for semantic search (finding "authentication" when you search "login").

  • bm25 does keyword matching (finding exact function names like HandleRequest).

Chunks are 1500 characters with 150-character overlap to preserve context at boundaries. Both strategies run in parallel—embeddings returns up to 20 results, bm25 up to 15. RRF fusion merges them by rank (not score), deduplicates, and returns the top 5. The agent gets a search tool linked to this index and is instructed to search before answering.
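To make the chunking parameters concrete, here is a minimal sketch of fixed-size chunking with overlap (plain Python for illustration, not cagent's actual implementation):

```python
def chunk_text(text: str, size: int = 1500, overlap: int = 150) -> list[str]:
    """Split text into fixed-size chunks; each chunk repeats the last
    `overlap` characters of the previous one, so content that straddles
    a boundary appears intact in at least one chunk."""
    step = size - overlap  # advance 1350 characters per chunk
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 3000)
# chunks start at offsets 0, 1350, and 2700
```

The overlap trades a little index size and duplicate text for better recall: a function signature sitting right at a 1500-character boundary would otherwise be split across two chunks and match neither search.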

How it works

This is a RAG (Retrieval-Augmented Generation) config for cagent. Here's what each section does:

rag: — The knowledge base
docs: [./src, ./pkg]          # Index these two directories (your Go source code)

Two retrieval strategies running in parallel:

| Strategy | How it works |
| --- | --- |
| chunked-embeddings | Splits code into 1500-character chunks (150 overlap), converts them to vectors via OpenAI embeddings, and returns the top 20 semantically similar chunks |
| bm25 | Keyword-based search (like grep, but ranked by term frequency); returns the top 15 matches |
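For intuition about what bm25 is doing, here is a compact sketch of the BM25 scoring formula over a toy tokenized corpus (an illustration of the algorithm, not cagent's implementation):

```python
import math

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query with BM25:
    rare terms weigh more (idf), repeated terms saturate (k1),
    and long documents are penalized (b)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for doc in docs:
        score = 0.0
        for term in query:
            df = sum(term in d for d in docs)           # docs containing the term
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [["handle", "request", "auth"], ["parse", "config"], ["handle", "error"]]
print(bm25_scores(["handle", "request"], docs))  # first doc scores highest
```

This is why bm25 excels at exact identifiers: a query for HandleRequest matches only documents that literally contain that token, whereas embeddings would also surface semantically related but textually different code.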

results: — How results are merged

fusion:
  strategy: rrf    # Reciprocal Rank Fusion — combines both rankings
  k: 60           # Smoothing factor (higher = more equal weighting)
deduplicate: true  # Remove duplicate chunks
limit: 5          # Feed only the best 5 chunks to the LLM

So: 20 embedding results + 15 BM25 results → fused → deduped → top 5 sent to GPT-4.
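The fusion step can be sketched in a few lines: each chunk's fused score is the sum of 1/(k + rank) over every ranked list it appears in, so chunks ranked well by both strategies float to the top. A generic sketch of Reciprocal Rank Fusion (the filenames are made up, and this is not cagent's code):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60, limit: int = 5) -> list[str]:
    """Reciprocal Rank Fusion: score each doc by sum of 1/(k + rank)
    across all ranked lists, then return the top `limit` by score.
    Accumulating into one dict also deduplicates shared results."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]

embeddings = ["auth.go", "login.go", "session.go"]  # up to 20 in the real config
bm25 = ["login.go", "handler.go"]                   # up to 15
print(rrf_fuse([embeddings, bm25]))  # login.go wins: it appears in both lists
```

Because RRF uses only ranks, the incomparable scores of the two strategies (cosine similarity vs. BM25 term weights) never need to be normalized; k=60 dampens the advantage of a single top rank so that appearing in both lists matters more than topping one.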