RAG (Retrieval-Augmented Generation)
RAG enhances LLM responses by automatically including relevant context from your vector databases. This allows the model to provide more accurate, contextual answers based on your specific knowledge base.
How RAG Works
- Query Processing: Your question is converted to embeddings
- Similarity Search: Relevant content is retrieved from the vector database
- Context Injection: Retrieved content is added to your prompt
- Enhanced Response: The LLM responds with knowledge from your data
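In code, these four steps reduce to a few lines of glue. Below is a minimal, self-contained Python sketch of the idea; the bag-of-words embed function and the echoing llm stand-in are toys for illustration, not lc internals:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" so the sketch runs end to end; lc calls a
    # real embedding model (e.g. text-embedding-3-small) at this step.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: math.sqrt(sum(n * n for n in v.values()))
    return dot / ((norm(a) * norm(b)) or 1.0)

def rag_answer(question, chunks, llm):
    q = embed(question)                                # 1. Query Processing
    ranked = sorted(chunks,                            # 2. Similarity Search
                    key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n".join("• " + c for c in ranked[:3])  # 3. Context Injection
    prompt = f"Based on the following relevant information:\n{context}\n\n{question}"
    return llm(prompt)                                 # 4. Enhanced Response

# A stand-in "LLM" that echoes its prompt shows exactly what the model sees:
docs = ["Authentication uses OAuth2 tokens.", "The cache layer is Redis.",
        "Logs go to stdout."]
print(rag_answer("How does authentication work?", docs, llm=lambda p: p))
```

Running the sketch prints the assembled prompt, with the retrieved bullets ahead of the question.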
Quick Start
# 1. Build a knowledge base
lc embed -m text-embedding-3-small -v knowledge -f "docs/*.md"
# 2. Use RAG in direct prompts
lc -v knowledge "How does the authentication system work?"
# 3. Use RAG in interactive chat
lc chat -v knowledge -m gpt-4
RAG in Direct Prompts
The simplest way to use RAG:
# Basic usage
lc -v <database> "your question"
# With specific model
lc -v docs -m gpt-4 "Explain the API endpoints"
# With specific provider
lc -v knowledge -p openai -m gpt-4 "What are the best practices?"
RAG in Interactive Chat
Start a RAG-enhanced chat session:
lc chat -v <database> -m <model>
In this mode:
- Every message is automatically enhanced with relevant context
- The system maintains conversation history
- Context is refreshed for each new message
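A RAG chat session behaves roughly like the loop below: history accumulates across turns, while retrieval runs fresh on every message. The retrieve and llm callables are hypothetical placeholders, not lc APIs:

```python
def chat_loop(retrieve, llm):
    history = []  # conversation history persists across turns
    while True:
        message = input("> ").strip()
        if not message:
            break
        # Context is refreshed for each new message: retrieval runs against
        # the latest question, not the accumulated conversation.
        context = retrieve(message)
        prompt = f"Based on the following relevant information:\n{context}\n\n{message}"
        answer = llm(history + [{"role": "user", "content": prompt}])
        print(answer)
        history.append({"role": "user", "content": message})
        history.append({"role": "assistant", "content": answer})
```

Storing the raw message rather than the context-augmented prompt keeps the transcript compact; that detail is an assumption of this sketch, not documented lc behavior.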
How Context is Selected
The RAG system applies several filters when selecting context (a sketch follows this list):
- Similarity Threshold: Only includes content with >0.3 similarity score
- Relevance Ranking: Most similar content appears first
- Token Limits: Prevents context overflow
- Deduplication: Avoids redundant information
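Here is that selection logic as a Python sketch, assuming a list of (score, text) pairs from the similarity search; the max_chars budget is a crude character-based stand-in for a real token limit:

```python
def select_context(scored_chunks, max_chars=4000, threshold=0.3):
    # scored_chunks: (similarity, text) pairs from the vector search.
    seen, picked, used = set(), [], 0
    for score, text in sorted(scored_chunks, reverse=True):  # relevance ranking
        if score <= threshold:       # similarity threshold: keep > 0.3 only
            break                    # (sorted descending, nothing below qualifies)
        if text in seen:             # deduplication: drop repeated content
            continue
        if used + len(text) > max_chars:  # token limit, approximated by characters
            break
        seen.add(text)
        picked.append(text)
        used += len(text)
    return picked

print(select_context([(0.9, "Auth uses OAuth2."), (0.9, "Auth uses OAuth2."),
                      (0.5, "Tokens expire hourly."), (0.2, "Unrelated note.")]))
# -> ['Auth uses OAuth2.', 'Tokens expire hourly.']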
Building Effective Knowledge Bases
1. Domain-Specific Databases
Create focused databases for better retrieval:
# Technical documentation
lc embed -v tech-docs -f "docs/technical/*.md"
# API reference
lc embed -v api-ref -f "docs/api/*.json"
# Code examples
lc embed -v examples -f "examples/**/*.py"
2. Chunking Strategy
Files are automatically chunked for optimal retrieval:
- Chunk Size: ~1200 characters
- Overlap: 200 characters
- Boundaries: Respects sentences and paragraphs
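A rough Python approximation of this strategy is shown below; the tail-based overlap here is a simplification, since lc's chunker also keeps the overlap aligned to sentence and paragraph boundaries:

```python
import re

def chunk(text, size=1200, overlap=200):
    # Split on sentence endings, then pack sentences into ~size-character
    # chunks, carrying the last `overlap` characters forward for continuity.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > size:
            chunks.append(current)
            current = current[-overlap:]  # overlap: tail of the previous chunk
        current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

# Roughly 1200-character chunks sharing 200-character tails:
print([len(c) for c in chunk("A sentence about the system. " * 300)])
```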
3. Regular Updates
Keep your knowledge base current:
# Re-embed updated files
lc embed -v docs -f "updated-file.md"
# Add new content
lc embed -v docs "New important information"
Advanced RAG Patterns
Multi-Database RAG
Keep specialized databases side by side and query the one that matches each question:
# Create specialized databases
lc embed -v code -f "src/**/*.rs"
lc embed -v docs -f "docs/**/*.md"
lc embed -v issues -f "issues/**/*.txt"
# Query across databases (one at a time)
lc -v code "How is authentication implemented?"
lc -v docs "What does the documentation say about auth?"
Iterative Refinement
Build knowledge incrementally:
# Start with high-level concepts
lc embed -v knowledge "Machine learning is a subset of AI"
# Add specific details
lc embed -v knowledge "Supervised learning uses labeled data"
# Add examples
lc embed -v knowledge "Example: spam detection uses supervised learning"
Context-Aware Prompting
Structure prompts for better RAG results:
# Good: Specific question
lc -v docs "What are the authentication endpoints in the API?"
# Better: Context + question
lc -v docs "In the context of our REST API, what are the authentication endpoints?"
# Best: Structured query
lc -v docs "List all authentication-related API endpoints with their methods and parameters"
RAG Response Format
When using RAG, the system automatically formats context:
Based on the following relevant information:
• [Relevant content 1]
• [Relevant content 2]
• [Relevant content 3]
[Your question]
This helps the LLM understand what information to prioritize.
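If you need to reproduce this format in your own tooling, it is a short helper; build_prompt is an illustrative name, not an lc function:

```python
def build_prompt(question, snippets):
    # Mirrors the format above: bulleted context first, question last.
    bullets = "\n".join(f"• {s}" for s in snippets)
    return f"Based on the following relevant information:\n{bullets}\n\n{question}"

print(build_prompt("How does auth work?",
                   ["Auth uses OAuth2 tokens.", "Tokens expire after one hour."]))
```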
Performance Optimization
1. Model Selection
- Use the same embedding model to build and to query a database
- Smaller models (text-embedding-3-small) are faster
- Larger models provide better semantic understanding
2. Database Size
- Optimal: 100-10,000 documents
- Large databases may slow retrieval
- Consider splitting very large corpora
3. Query Optimization
- Be specific in your questions
- Use domain terminology consistently
- Avoid overly broad queries
Troubleshooting RAG
"No relevant context found"
- Check database content:
lc vectors info <database>
- Try different query terms
- Ensure database has related content
"Context seems unrelated"
- Review the similarity threshold (only content scoring above 0.3 is included)
- Check that the query uses the same embedding model the database was built with
- Consider re-embedding with better chunking
"Response ignores context"
- Ensure model has sufficient context window
- Try a more capable model (e.g., gpt-4)
- Restructure your question
Best Practices
- Start Small: Build focused databases first
- Test Retrieval: Use lc similar to verify quality
- Iterate: Refine your knowledge base based on results
- Monitor: Check which content is being retrieved
- Update: Keep databases current with new information
Example Workflow
# 1. Create a project knowledge base
lc embed -v project -f "README.md"
lc embed -v project -f "docs/*.md"
lc embed -v project -f "src/**/*.py"
# 2. Test retrieval
lc similar -v project "authentication flow"
# 3. Use in development
lc -v project "How do I add a new API endpoint?"
# 4. Interactive problem-solving
lc chat -v project -m gpt-4
> How can I optimize the database queries?
> What's the current caching strategy?
Next Steps
- Vector Database Guide - Managing your knowledge bases
- Embeddings Guide - Creating effective embeddings
- Similarity Search - Testing your retrieval