codegraph-rust

100% Rust GraphRAG implementation with an MCP server for blazingly fast indexing of large codebases and natural-language querying of the resulting graph


CodeGraph

Turn your codebase into a searchable knowledge graph powered by embeddings and LLMs

CodeGraph indexes your source code to a graph database, creates semantic embeddings, and exposes a Model Context Protocol (MCP) server that AI tools (Claude Desktop, LM Studio, etc.) can query for project-aware context.

✨ What you get:

  • 🔍 Semantic code search across your entire codebase
  • 🧠 LLM-powered code intelligence and analysis
  • 📊 Automatic dependency graphs and code relationships
  • ⚡ Fast vector search with FAISS or cloud SurrealDB HNSW (2-5ms query latency)
  • 🔌 MCP server for AI tool integration (stdio today; streamable HTTP planned)
  • ⚙️ Easy-to-use CLI interface
  • ☁️ NEW: Jina AI cloud embeddings with configurable models, dimensions, and reranking
  • 🗄️ NEW: SurrealDB HNSW backend for cloud-native and local vector search
  • 📦 NEW: Node.js NAPI bindings for zero-overhead TypeScript integration
  • 🤖 NEW: Agentic code-agent tools with tier-aware multi-step reasoning

⚠️ Important: MCP Server Architecture Change

FAISS + RocksDB support in the MCP server is deprecated in favor of a SurrealDB-based architecture.

What Changed:

  • MCP server no longer uses FAISS vector search or RocksDB graph storage
  • CLI and SDK continue to support FAISS/RocksDB for local operations
  • NAPI bindings still provide TypeScript access to all features
  • 🆕 MCP code-agent tools now require SurrealDB for graph analysis

Required Setup for Code-Agent Tools:

The new agentic MCP tools (agentic_code_search, agentic_dependency_analysis, etc.) require SurrealDB:

Option 1: Free Cloud Instance (Recommended)

  • Sign up at Surreal Cloud
  • Get a free 1 GB instance - perfect for testing and small projects
  • Configure the connection details via environment variables (see the sketch below)
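
A minimal sketch of the environment variables, assuming the CODEGRAPH__ prefix maps onto the [vector_store] keys shown later in this README (the exact variable names are an assumption; check your configuration layout):

# Hypothetical variable names derived from the [vector_store] keys
export CODEGRAPH__VECTOR_STORE__SURREALDB_URL="wss://your-instance.surreal.cloud"
export CODEGRAPH__VECTOR_STORE__SURREALDB_NAMESPACE="codegraph"
export CODEGRAPH__VECTOR_STORE__SURREALDB_DATABASE="production"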

Option 2: Local Installation

# Install SurrealDB
curl -sSf https://install.surrealdb.com | sh

# Run locally
surreal start --bind 127.0.0.1:3004 --user root --pass root memory
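
To check that the local instance is ready, you can query SurrealDB's HTTP health endpoint on the port bound above:

# A 200 response means the server is up
curl -i http://127.0.0.1:3004/health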

Free Cloud Resources:

  • 🆓 SurrealDB Cloud: 1GB free instance at surrealdb.com/cloud
  • 🆓 Jina AI: 10 million free API tokens at jina.ai for embeddings and reranking

Why This Change:

  • Native graph capabilities: SurrealDB provides built-in graph database features
  • Unified storage: A single database for both vectors and graph relationships, extensible to relational and document use cases
  • Cloud-native: Better support for distributed deployments
  • Reduced complexity: Eliminates custom RocksDB integration layer

See CHANGELOG.md for detailed migration guide.


🎯 Choose Your Setup

Pick the setup that matches your needs:

Option 1: Local Setup (Free, Private) 🏠

Best for: Privacy-conscious users, offline work, no API costs

Providers:

  • Embeddings: ONNX or Ollama
  • LLM: Ollama (Qwen2.5-Coder, CodeLlama, etc.)

Pros: ✅ Free, ✅ Private, ✅ No internet required after setup
Cons: ❌ Slower, ❌ Requires local GPU/CPU resources

→ Jump to Local Setup Instructions


Option 2: LM Studio (Best Performance on Mac) 🚀

Best for: Mac users (Apple Silicon), best local performance

Providers:

  • Embeddings: LM Studio (Jina embeddings)
  • LLM: LM Studio (DeepSeek Coder, etc.)

Pros: ✅ 120 embeddings/sec, ✅ MLX + Flash Attention 2, ✅ Free
Cons: ❌ Mac only, ❌ Requires LM Studio app

→ Jump to LM Studio Setup Instructions


Option 3: Cloud Providers (Best Quality) ☁️

Best for: Production use, best quality, no local models to manage

Providers:

  • Embeddings: Jina (10 million free tokens just for creating an account)
  • LLM: Anthropic Claude or OpenAI GPT-5-*
  • Backend: SurrealDB graph database (free cloud instance up to 1 GB, or run it entirely locally)

Pros: ✅ Best quality, ✅ Fast, ✅ 1M context (sonnet[1m])
Cons: ❌ API costs, ❌ Requires internet, ❌ Data sent to cloud

→ Jump to Cloud Setup Instructions


Option 4: Hybrid (Mix & Match) 🔀

Best for: Balancing cost and quality

Example combinations:

  • Local embeddings (ONNX) + Cloud LLM (OpenAI, Claude, x.ai)
  • LMStudio embeddings + Cloud LLM (OpenAI, Claude, x.ai)
  • Jina AI embeddings + Local LLM (Ollama, LMStudio)

→ Jump to Hybrid Setup Instructions


🛠️ Installation

Prerequisites (All Setups)

# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# 2. Install FAISS (vector search library)
# macOS:
brew install faiss

# Ubuntu/Debian:
sudo apt-get install libfaiss-dev

# Arch Linux:
sudo pacman -S faiss

Local Setup (ONNX + Ollama)

Step 1: Install Ollama and ONNX Runtime

# Install Ollama (macOS/Linux):
curl -fsSL https://ollama.com/install.sh | sh

# Or download from: https://ollama.com/download

# Install ONNX Runtime (macOS), needed for the "onnx" embedding provider:
brew install onnxruntime

Step 2: Pull models

# Download the embedding model (requires the Hugging Face CLI)
hf download qdrant/all-minillm-onnx

# Pull LLM for code intelligence (optional)
ollama pull qwen2.5-coder:14b

Step 3: Build CodeGraph

cd codegraph-rust

# Build with ONNX embeddings and Ollama support
cargo build --release --features "onnx,ollama,faiss"

Step 4: Configure

Create ~/.codegraph/config.toml:

[embedding]
provider = "onnx"  # or "ollama" if you prefer
model = "qdrant/all-minillm-onnx"
dimension = 384

[llm]
enabled = true
provider = "ollama"
model = "qwen2.5-coder:14b"
ollama_url = "http://localhost:11434"

Step 5: Index and run

# Index your project
./target/release/codegraph index /path/to/your/project

# Start MCP server
./target/release/codegraph start stdio

Done! Your local setup is ready.
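
As a quick sanity check that the binary was built with the features you selected, you can list the MCP tools it exposes (the same command appears under Usage below):

./target/release/codegraph tools list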


LM Studio Setup

Step 1: Install LM Studio

Download LM Studio from https://lmstudio.ai and launch it.

Step 2: Download models in LM Studio

  • Embedding model: jinaai/jina-embeddings-v4
  • LLM model (optional): lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF

Step 3: Start LM Studio server

  • In LM Studio, go to "Local Server" tab
  • Click "Start Server" (runs on http://localhost:1234); a quick check is shown below
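
If you want to confirm the server is reachable, LM Studio exposes an OpenAI-compatible API; listing the loaded models is a simple check:

# Should return the models currently served by LM Studio
curl http://localhost:1234/v1/models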

Step 4: Build CodeGraph

cd codegraph-rust

# Build with OpenAI-compatible support (for LM Studio)
cargo build --release --features "openai-compatible,faiss"

Step 5: Configure

Create ~/.codegraph/config.toml:

[embedding]
provider = "lmstudio"
model = "jinaai/jina-embeddings-v4"
lmstudio_url = "http://localhost:1234"
dimension = 2048

[llm]
enabled = true
provider = "lmstudio"
model = "lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF"
lmstudio_url = "http://localhost:1234"

Step 6: Index and run

# Index your project
./target/release/codegraph index /path/to/your/project

# Start MCP server
./target/release/codegraph start stdio

Done! LM Studio setup complete.


Cloud Setup (Anthropic, OpenAI, xAI & Jina AI)

Step 1: Get API keys (Anthropic, OpenAI, xAI, and/or Jina AI, depending on which providers you build with)
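
Keys can go in ~/.codegraph/config.toml (as in the examples below) or be exported as environment variables; these are the variable names referenced throughout this README:

export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export XAI_API_KEY="xai-..."
export JINA_API_KEY="jina_..."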

Step 2: Build CodeGraph with cloud features

cd codegraph-rust

# Build with all cloud providers
cargo build --release --features "anthropic,openai-llm,openai,faiss"

# Or with Jina AI cloud embeddings (Matryoshka dimensions + reranking)
cargo build --release --features "cloud-jina,anthropic,faiss"

# Or with SurrealDB HNSW cloud/local vector backend
cargo build --release --features "cloud-surrealdb,openai,faiss"

Step 3: Run setup wizard (easiest)

./target/release/codegraph-setup

The wizard will guide you through configuration.

Or manually configure ~/.codegraph/config.toml:

For Anthropic Claude:

[embedding]
provider = "jina" # or openai
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var (use openai_api_key when provider = "openai")
dimension = 2048

[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"
anthropic_api_key = "sk-ant-..."  # or set ANTHROPIC_API_KEY env var
context_window = 200000

For OpenAI (with reasoning models):

[embedding]
provider = "jina" # or openai
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var (use openai_api_key when provider = "openai")
dimension = 2048

[llm]
enabled = true
provider = "openai"
model = "gpt-5-codex-mini"
openai_api_key = "sk-..."
max_completion_token = 128000
reasoning_effort = "medium"  # reasoning models: "minimal", "medium", "high"

For Jina AI (cloud embeddings with reranking):

[embedding]
provider = "jina"
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var
dimension = 2048  # or a Matryoshka dimension (1024, 512, 256); adjust the HNSW vector index in schemas/*.surql to match your embedding dimension
jina_enable_reranking = true  # Optional two-stage retrieval
jina_reranking_model = "jina-reranker-v3"

[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"
anthropic_api_key = "sk-ant-..."

For xAI Grok (2M context window, $0.50-$1.50/M tokens):

[embedding]
provider = "openai"  # or "jina"
model = "text-embedding-3-small"
openai_api_key = "sk-..."
dimension = 1536  # text-embedding-3-small outputs 1536 dimensions

[llm]
enabled = true
provider = "xai"
model = "grok-4-fast"  # or "grok-4-turbo"
xai_api_key = "xai-..."  # or set XAI_API_KEY env var
xai_base_url = "https://api.x.ai/v1"  # default, can be omitted
reasoning_effort = "medium"  # Options: "minimal", "medium", "high"
context_window = 2000000  # 2M tokens!

For SurrealDB HNSW (graph database backend with advanced features):

[embedding]
provider = "jina"  # or "openai"
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var (use openai_api_key when provider = "openai")
dimension = 2048

[vector_store]
backend = "surrealdb"  # Instead of "faiss"
surrealdb_url = "ws://localhost:8000"  # or cloud instance
surrealdb_namespace = "codegraph"
surrealdb_database = "production"

[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"

Step 4: Index and run

# Index your project
./target/release/codegraph index /path/to/your/project

# Start MCP server
./target/release/codegraph start stdio

Done! Cloud setup complete.


Hybrid Setup

Mix local and cloud providers to balance cost and quality:

Example: Local embeddings + Cloud LLM

[embedding]
provider = "onnx"  # Free, local
model = "sentence-transformers/all-MiniLM-L6-v2"
dimension = 384

[llm]
enabled = true
provider = "anthropic"  # Best quality for analysis
model = "sonnet[1m]"
anthropic_api_key = "sk-ant-..."

Build with required features:

cargo build --release --features "onnx,anthropic,faiss"

⚙️ Configuration

Quick Configuration

Use the interactive wizard:

cargo build --release --bin codegraph-setup --features all-cloud-providers
./target/release/codegraph-setup

Manual Configuration

Configuration directory: ~/.codegraph/

All configuration files are stored in ~/.codegraph/ in TOML format.

Configuration is loaded from (in order):

  1. ~/.codegraph/default.toml (base configuration)
  2. ~/.codegraph/{environment}.toml (e.g., development.toml, production.toml)
  3. ~/.codegraph/local.toml (local overrides, machine-specific)
  4. ./config/ (fallback for backward compatibility)
  5. Environment variables (CODEGRAPH__* prefix); an example follows
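
For example, assuming nested keys map to double-underscore-separated names under the CODEGRAPH__ prefix (this mapping is an assumption about the config loader):

# Hypothetical overrides for [llm] and [logging] settings
export CODEGRAPH__LLM__PROVIDER="anthropic"
export CODEGRAPH__LOGGING__LEVEL="debug"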

See Configuration Guide for complete documentation.

Full configuration example:

[embedding]
provider = "lmstudio"  # or "onnx", "ollama", "openai"
model = "jinaai/jina-embeddings-v4"
dimension = 2048
batch_size = 64

[llm]
enabled = true
provider = "anthropic"  # or "openai", "ollama", "lmstudio" or "xai"
model = "haiku"
anthropic_api_key = "sk-ant-..."
context_window = 200000
temperature = 0.1
max_completion_token = 4096

[performance]
num_threads = 0  # 0 = auto-detect
cache_size_mb = 512
max_concurrent_requests = 4

[logging]
level = "warn"  # trace, debug, info, warn, error
format = "pretty"  # pretty, json, compact

See .codegraph.toml.example for all options.


🚀 Usage

Basic Commands

# Index a project
codegraph index -r /path/to/project

# Start MCP server (for Claude Desktop, LM Studio, etc.)
codegraph start stdio

# List available MCP tools
codegraph tools list

Note: HTTP transport is not yet implemented with the official rmcp SDK. Use STDIO transport for all MCP integrations.

Using with Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on Mac):

{
  "mcpServers": {
    "codegraph": {
      "command": "/path/to/codegraph",
      "args": ["start", "stdio"],
      "env": {
        "RUST_LOG": "warn"
      }
    }
  }
}

Using with LM Studio

  1. Start CodeGraph MCP server: codegraph start stdio
  2. In LM Studio, enable MCP support in settings
  3. CodeGraph tools will appear in LM Studio's tool palette

📊 Feature Flags Reference

When building, include features for the providers you want to use:

Feature               Providers Enabled                Use Case
onnx                  ONNX embeddings                  Local CPU/GPU embeddings
ollama                Ollama embeddings + LLM          Local models via Ollama
openai                OpenAI embeddings                Cloud embeddings (text-embedding-3-large/small)
openai-llm            OpenAI                           Cloud LLM (gpt-5, gpt-5-codex, gpt-5-codex-mini)
anthropic             Anthropic Claude                 Cloud LLM (Claude 4.5, Haiku 4.5)
openai-compatible     LM Studio, custom providers      OpenAI Responses API compatible
cloud-jina            Jina AI embeddings + reranking   Cloud embeddings with a free tier (SOTA, variable dims)
cloud-surrealdb       SurrealDB HNSW                   Cloud-native graph backend, local or free cloud (up to 1 GB)
cloud                 Jina AI + SurrealDB              All cloud vector & graph features
faiss                 FAISS vector search              Local vector search with RocksDB-persisted graph
all-cloud-providers   All cloud LLM providers          Shortcut for Jina + SurrealDB + Anthropic + OpenAI

Common Build Commands

# Local only (ONNX + Ollama)
cargo build --release --features "onnx,ollama,faiss"

# LM Studio
cargo build --release --features "openai-compatible,faiss"

# Cloud only (Anthropic + OpenAI)
cargo build --release --features "anthropic,openai-llm,openai,faiss"

# Jina AI cloud embeddings + local FAISS
cargo build --release --features "cloud-jina,faiss"

# SurrealDB cloud vector backend
cargo build --release --features "cloud-surrealdb,openai,faiss"

# Full cloud (Jina + SurrealDB + Anthropic)
cargo build --release --features "cloud,anthropic,faiss"

# Everything (local + cloud)
cargo build --release --features "all-cloud-providers,onnx,ollama,cloud,faiss"

⚡ Performance

Speed Metrics (Apple Silicon + LM Studio)

Operation                Performance            Notes
Embedding generation     120 embeddings/sec     LM Studio with MLX
Vector search (local)    2-5 ms latency         FAISS with index caching
Vector search (cloud)    2-5 ms latency         SurrealDB HNSW
Jina AI embeddings       50-150 ms per query    Cloud API call overhead
Jina reranking           80-200 ms for top-K    Two-stage retrieval
Ollama embeddings        ~60 embeddings/sec     About half LM Studio speed

Optimizations (Enabled by Default)

Optimization         Speedup    Memory Cost
FAISS index cache    10-50×     300-600 MB
Embedding cache      10-100×    ~90 MB
Query cache          100×       ~10 MB
Parallel search      2-3×       Minimal

🔧 Troubleshooting

Build Issues

"Could not find library faiss"

# Install FAISS first
brew install faiss  # macOS
sudo apt-get install libfaiss-dev  # Ubuntu

"Feature X is not enabled"

  • Make sure you included the feature flag when building
  • Example: cargo build --release --features "anthropic,faiss"

Runtime Issues

"API key not found"

  • Set environment variable: export ANTHROPIC_API_KEY="sk-ant-..."
  • Or add to config file: anthropic_api_key = "sk-ant-..."

"Model not found"

  • For Ollama: Run ollama pull <model-name> first
  • For LM Studio: Download the model in LM Studio app
  • For cloud: Check your model name matches available models

"Connection refused"

  • LM Studio: Make sure the local server is running
  • Ollama: Check Ollama is running with ollama list
  • Cloud: Check your internet connection

Getting Help

  1. Check docs/CLOUD_PROVIDERS.md for detailed provider setup
  2. See LMSTUDIO_SETUP.md for LM Studio specifics
  3. Open an issue on GitHub with your error message

📦 Node.js Integration (NAPI Bindings)

Zero-Overhead TypeScript Integration

CodeGraph provides native Node.js bindings through NAPI-RS for seamless TypeScript/JavaScript integration:

Key Features:

  • 🚀 Native Performance: Direct Rust-to-Node.js bindings with zero serialization overhead
  • 📘 Auto-Generated Types: TypeScript definitions generated directly from Rust code
  • ⚡ Async Runtime: Full tokio async support integrated with the Node.js event loop
  • 🔄 Hot-Reload Config: Update configuration without restarting your Node.js process
  • 🌐 Dual-Mode Search: Automatic routing between local FAISS and cloud SurrealDB

Installation

Option 1: Direct Install (Recommended)

# Build the addon
cd crates/codegraph-napi
npm install
npm run build

# Install in your project
cd /path/to/your-project
npm install /path/to/codegraph-rust/crates/codegraph-napi

Option 2: Pack and Install

# Build and pack
cd crates/codegraph-napi
npm install
npm run build
npm pack  # Creates codegraph-napi-1.0.0.tgz

# Install in your project
cd /path/to/your-project
npm install /path/to/codegraph-rust/crates/codegraph-napi/codegraph-napi-1.0.0.tgz

API Examples

Semantic Search:

import { semanticSearch } from 'codegraph-napi';

const results = await semanticSearch('find authentication code', {
  limit: 10,
  useCloud: true,      // Use cloud search with automatic fallback
  reranking: true      // Enable Jina reranking (if configured)
});

console.log(`Found ${results.totalCount} results in ${results.searchTimeMs}ms`);
console.log(`Search mode: ${results.modeUsed}`);  // "local" or "cloud"

Configuration Management:

import { getCloudConfig, reloadConfig } from 'codegraph-napi';

// Check cloud feature availability
const config = await getCloudConfig();
console.log('Jina AI enabled:', config.jina_enabled);
console.log('SurrealDB enabled:', config.surrealdb_enabled);

// Hot-reload configuration without restart
await reloadConfig();

Embedding Operations:

import { getEmbeddingStats, countTokens } from 'codegraph-napi';

// Get embedding provider stats
const stats = await getEmbeddingStats();
console.log(`Provider: ${stats.provider}, Dimension: ${stats.dimension}`);

// Count tokens for cost estimation (Jina AI)
const tokens = await countTokens("query text");
console.log(`Token count: ${tokens}`);

Graph Navigation:

import { getNeighbors, getGraphStats } from 'codegraph-napi';

// Get connected nodes
const neighbors = await getNeighbors(nodeId);

// Get graph statistics
const stats = await getGraphStats();
console.log(`Nodes: ${stats.node_count}, Edges: ${stats.edge_count}`);

Build Options

Feature flags for selective compilation:

# Local-only (FAISS, no cloud)
npm run build  # Uses default = ["local"]

# Cloud-only (no FAISS)
npm run build -- --features cloud

# Full build (local + cloud)
npm run build -- --features full

See NAPI README for complete documentation.

🤝 Contributing

We welcome contributions!

# Format code
cargo fmt --all

# Run linter
cargo clippy --workspace --all-targets

# Run tests
cargo test --workspace

Open an issue to discuss large changes before starting.


📄 License

Dual-licensed under MIT and Apache 2.0. See LICENSE-MIT and LICENSE-APACHE for details.


📚 Learn More