codegraph-rust

100% Rust GraphRAG implementation with an MCP server for blazingly fast indexing of large codebases and natural-language querying of the resulting graph


CodeGraph

Turn your codebase into a searchable knowledge graph powered by embeddings and LLMs

CodeGraph indexes your source code to a graph database, creates semantic embeddings, and exposes a Model Context Protocol (MCP) server that AI tools (Claude Desktop, LM Studio, etc.) can query for project-aware context.

✨ What you get:

  • 🔍 Semantic code search across your entire codebase
  • 🧠 LLM-powered code intelligence and analysis
  • 📊 Automatic dependency graphs and code relationships
  • ⚡ Fast vector search with FAISS or cloud SurrealDB HNSW (2-5ms query latency)
  • 🔌 MCP server for AI tool integration (stdio today; streamable HTTP planned)
  • ⚙️ Easy-to-use CLI interface
  • ☁️ NEW: Jina AI cloud embeddings with configurable models, dimensions, and reranking
  • 🗄️ NEW: SurrealDB HNSW backend for cloud-native and local vector search
  • 📦 NEW: Node.js NAPI bindings for zero-overhead TypeScript integration
  • 🤖 NEW: Agentic code-agent tools with tier-aware multi-step reasoning

⚠️ Important: MCP Server Architecture Change

FAISS + RocksDB support in the MCP server is deprecated in favor of a SurrealDB-based architecture.

What Changed:

  • MCP server no longer uses FAISS vector search or RocksDB graph storage
  • CLI and SDK continue to support FAISS/RocksDB for local operations
  • NAPI bindings still provide TypeScript access to all features
  • 🆕 MCP code-agent tools now require SurrealDB for graph analysis

Required Setup for Code-Agent Tools:

The new agentic MCP tools (agentic_code_search, agentic_dependency_analysis, etc.) require SurrealDB:

Option 1: Free Cloud Instance (Recommended)

  • Sign up at Surreal Cloud
  • Get a free 1 GB instance - perfect for testing and small projects
  • Configure the connection details via environment variables (see the sketch below)
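
A minimal sketch of the environment variables, assuming the CODEGRAPH__ prefix maps onto the [vector_store] keys shown later in this README (the exact variable names are an assumption; check your configuration layout):

# Hypothetical variable names derived from the [vector_store] keys
export CODEGRAPH__VECTOR_STORE__SURREALDB_URL="wss://your-instance.surreal.cloud"
export CODEGRAPH__VECTOR_STORE__SURREALDB_NAMESPACE="codegraph"
export CODEGRAPH__VECTOR_STORE__SURREALDB_DATABASE="production"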

Option 2: Local Installation

# Install SurrealDB
curl -sSf https://install.surrealdb.com | sh

# Run locally
surreal start --bind 127.0.0.1:3004 --user root --pass root memory
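
To check that the local instance is ready, you can query SurrealDB's HTTP health endpoint on the port bound above:

# A 200 response means the server is up
curl -i http://127.0.0.1:3004/health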

Free Cloud Resources:

  • 🆓 SurrealDB Cloud: 1GB free instance at surrealdb.com/cloud
  • 🆓 Jina AI: 10 million free API tokens at jina.ai for embeddings and reranking

Why This Change:

  • Native graph capabilities: SurrealDB provides built-in graph database features
  • Unified storage: A single database for both vectors and graph relationships, extensible to relational and document use cases
  • Cloud-native: Better support for distributed deployments
  • Reduced complexity: Eliminates custom RocksDB integration layer

See CHANGELOG.md for detailed migration guide.


🎯 Choose Your Setup

Pick the setup that matches your needs:

Option 1: Local Setup (Free, Private) 🏠

Best for: Privacy-conscious users, offline work, no API costs

Providers:

  • Embeddings: ONNX or Ollama
  • LLM: Ollama (Qwen2.5-Coder, CodeLlama, etc.)

Pros: ✅ Free, ✅ Private, ✅ No internet required after setup
Cons: ❌ Slower, ❌ Requires local GPU/CPU resources

→ Jump to Local Setup Instructions


Option 2: LM Studio (Best Performance on Mac) 🚀

Best for: Mac users (Apple Silicon), best local performance

Providers:

  • Embeddings: LM Studio (Jina embeddings)
  • LLM: LM Studio (DeepSeek Coder, etc.)

Pros: ✅ 120 embeddings/sec, ✅ MLX + Flash Attention 2, ✅ Free
Cons: ❌ Mac only, ❌ Requires LM Studio app

→ Jump to LM Studio Setup Instructions


Option 3: Cloud Providers (Best Quality) ☁️

Best for: Production use, best quality, no local models to manage

Providers:

  • Embeddings: Jina (10 million free tokens just for creating an account)
  • LLM: Anthropic Claude or OpenAI GPT-5-*
  • Backend: SurrealDB graph database (free cloud instance up to 1 GB, or run it entirely locally)

Pros: ✅ Best quality, ✅ Fast, ✅ 1M context (sonnet[1m])
Cons: ❌ API costs, ❌ Requires internet, ❌ Data sent to cloud

→ Jump to Cloud Setup Instructions


Option 4: Hybrid (Mix & Match) 🔀

Best for: Balancing cost and quality

Example combinations:

  • Local embeddings (ONNX) + Cloud LLM (OpenAI, Claude, x.ai)
  • LMStudio embeddings + Cloud LLM (OpenAI, Claude, x.ai)
  • Jina AI embeddings + Local LLM (Ollama, LMStudio)

→ Jump to Hybrid Setup Instructions


🛠️ Installation

Prerequisites (All Setups)

# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# 2. Install FAISS (vector search library)
# macOS:
brew install faiss

# Ubuntu/Debian:
sudo apt-get install libfaiss-dev

# Arch Linux:
sudo pacman -S faiss

Local Setup (ONNX + Ollama)

Step 1: Install Ollama and ONNX Runtime

# Install Ollama (macOS/Linux):
curl -fsSL https://ollama.com/install.sh | sh

# Or download from: https://ollama.com/download

# Install ONNX Runtime (macOS), needed for the "onnx" embedding provider:
brew install onnxruntime

Step 2: Pull models

# Download the embedding model (requires the Hugging Face CLI)
hf download qdrant/all-minillm-onnx

# Pull LLM for code intelligence (optional)
ollama pull qwen2.5-coder:14b

Step 3: Build CodeGraph

cd codegraph-rust

# Build with ONNX embeddings and Ollama support
cargo build --release --features "onnx,ollama,faiss"

Step 4: Configure

Create ~/.codegraph/config.toml:

[embedding]
provider = "onnx"  # or "ollama" if you prefer
model = "qdrant/all-minillm-onnx"
dimension = 384

[llm]
enabled = true
provider = "ollama"
model = "qwen2.5-coder:14b"
ollama_url = "http://localhost:11434"

Step 5: Index and run

# Index your project
./target/release/codegraph index /path/to/your/project

# Start MCP server
./target/release/codegraph start stdio

Done! Your local setup is ready.
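
As a quick sanity check that the binary was built with the features you selected, you can list the MCP tools it exposes (the same command appears under Usage below):

./target/release/codegraph tools list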


LM Studio Setup

Step 1: Install LM Studio

Download LM Studio from https://lmstudio.ai and launch it.

Step 2: Download models in LM Studio

  • Embedding model: jinaai/jina-embeddings-v4
  • LLM model (optional): lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF

Step 3: Start LM Studio server

  • In LM Studio, go to "Local Server" tab
  • Click "Start Server" (runs on http://localhost:1234); a quick check is shown below
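
If you want to confirm the server is reachable, LM Studio exposes an OpenAI-compatible API; listing the loaded models is a simple check:

# Should return the models currently served by LM Studio
curl http://localhost:1234/v1/models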

Step 4: Build CodeGraph

cd codegraph-rust

# Build with OpenAI-compatible support (for LM Studio)
cargo build --release --features "openai-compatible,faiss"

Step 5: Configure

Create ~/.codegraph/config.toml:

[embedding]
provider = "lmstudio"
model = "jinaai/jina-embeddings-v4"
lmstudio_url = "http://localhost:1234"
dimension = 2048

[llm]
enabled = true
provider = "lmstudio"
model = "lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF"
lmstudio_url = "http://localhost:1234"

Step 6: Index and run

# Index your project
./target/release/codegraph index /path/to/your/project

# Start MCP server
./target/release/codegraph start stdio

Done! LM Studio setup complete.


Cloud Setup (Anthropic, OpenAI, xAI & Jina AI)

Step 1: Get API keys (Anthropic, OpenAI, xAI, and/or Jina AI, depending on which providers you build with)
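
Keys can go in ~/.codegraph/config.toml (as in the examples below) or be exported as environment variables; these are the variable names referenced throughout this README:

export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export XAI_API_KEY="xai-..."
export JINA_API_KEY="jina_..."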

Step 2: Build CodeGraph with cloud features

cd codegraph-rust

# Build with all cloud providers
cargo build --release --features "anthropic,openai-llm,openai,faiss"

# Or with Jina AI cloud embeddings (Matryoshka dimensions + reranking)
cargo build --release --features "cloud-jina,anthropic,faiss"

# Or with SurrealDB HNSW cloud/local vector backend
cargo build --release --features "cloud-surrealdb,openai,faiss"

Step 3: Run setup wizard (easiest)

./target/release/codegraph-setup

The wizard will guide you through configuration.

Or manually configure ~/.codegraph/config.toml:

For Anthropic Claude:

[embedding]
provider = "jina" # or openai
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var (use openai_api_key when provider = "openai")
dimension = 2048

[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"
anthropic_api_key = "sk-ant-..."  # or set ANTHROPIC_API_KEY env var
context_window = 200000

For OpenAI (with reasoning models):

[embedding]
provider = "jina" # or openai
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var (use openai_api_key when provider = "openai")
dimension = 2048

[llm]
enabled = true
provider = "openai"
model = "gpt-5-codex-mini"
openai_api_key = "sk-..."
max_completion_token = 128000
reasoning_effort = "medium"  # reasoning models: "minimal", "medium", "high"

For Jina AI (cloud embeddings with reranking):

[embedding]
provider = "jina"
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var
dimension = 2048  # or a Matryoshka dimension (1024, 512, 256); adjust the HNSW vector index in schemas/*.surql to match your embedding dimension
jina_enable_reranking = true  # Optional two-stage retrieval
jina_reranking_model = "jina-reranker-v3"

[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"
anthropic_api_key = "sk-ant-..."

For xAI Grok (2M context window, $0.50-$1.50/M tokens):

[embedding]
provider = "openai"  # or "jina"
model = "text-embedding-3-small"
openai_api_key = "sk-..."
dimension = 1536  # text-embedding-3-small outputs 1536 dimensions

[llm]
enabled = true
provider = "xai"
model = "grok-4-fast"  # or "grok-4-turbo"
xai_api_key = "xai-..."  # or set XAI_API_KEY env var
xai_base_url = "https://api.x.ai/v1"  # default, can be omitted
reasoning_effort = "medium"  # Options: "minimal", "medium", "high"
context_window = 2000000  # 2M tokens!

For SurrealDB HNSW (graph database backend with advanced features):

[embedding]
provider = "jina"  # or "openai"
model = "jina-embeddings-v4"
jina_api_key = "jina_..."  # or set JINA_API_KEY env var (use openai_api_key when provider = "openai")
dimension = 2048

[vector_store]
backend = "surrealdb"  # Instead of "faiss"
surrealdb_url = "ws://localhost:8000"  # or cloud instance
surrealdb_namespace = "codegraph"
surrealdb_database = "production"

[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"

Step 4: Index and run

# Index your project
./target/release/codegraph index /path/to/your/project

# Start MCP server
./target/release/codegraph start stdio

Done! Cloud setup complete.


Hybrid Setup

Mix local and cloud providers to balance cost and quality:

Example: Local embeddings + Cloud LLM

[embedding]
provider = "onnx"  # Free, local
model = "sentence-transformers/all-MiniLM-L6-v2"
dimension = 384

[llm]
enabled = true
provider = "anthropic"  # Best quality for analysis
model = "sonnet[1m]"
anthropic_api_key = "sk-ant-..."

Build with required features:

cargo build --release --features "onnx,anthropic,faiss"

⚙️ Configuration

Quick Configuration

Use the interactive wizard:

cargo build --release --bin codegraph-setup --features all-cloud-providers
./target/release/codegraph-setup

Manual Configuration

Configuration directory: ~/.codegraph/

All configuration files are stored in ~/.codegraph/ in TOML format.

Configuration is loaded from (in order):

  1. ~/.codegraph/default.toml (base configuration)
  2. ~/.codegraph/{environment}.toml (e.g., development.toml, production.toml)
  3. ~/.codegraph/local.toml (local overrides, machine-specific)
  4. ./config/ (fallback for backward compatibility)
  5. Environment variables (CODEGRAPH__* prefix); an example follows
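
For example, assuming nested keys map to double-underscore-separated names under the CODEGRAPH__ prefix (this mapping is an assumption about the config loader):

# Hypothetical overrides for [llm] and [logging] settings
export CODEGRAPH__LLM__PROVIDER="anthropic"
export CODEGRAPH__LOGGING__LEVEL="debug"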

See Configuration Guide for complete documentation.

Full configuration example:

[embedding]
provider = "lmstudio"  # or "onnx", "ollama", "openai"
model = "jinaai/jina-embeddings-v4"
dimension = 2048
batch_size = 64

[llm]
enabled = true
provider = "anthropic"  # or "openai", "ollama", "lmstudio" or "xai"
model = "haiku"
anthropic_api_key = "sk-ant-..."
context_window = 200000
temperature = 0.1
max_completion_token = 4096

[performance]
num_threads = 0  # 0 = auto-detect
cache_size_mb = 512
max_concurrent_requests = 4

[logging]
level = "warn"  # trace, debug, info, warn, error
format = "pretty"  # pretty, json, compact

See .codegraph.toml.example for all options.


🚀 Usage

Basic Commands

# Index a project
codegraph index -r /path/to/project

# Start MCP server (for Claude Desktop, LM Studio, etc.)
codegraph start stdio

# List available MCP tools
codegraph tools list

Note: HTTP transport is not yet implemented with the official rmcp SDK. Use STDIO transport for all MCP integrations.

Using with Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on Mac):

{
  "mcpServers": {
    "codegraph": {
      "command": "/path/to/codegraph",
      "args": ["start", "stdio"],
      "env": {
        "RUST_LOG": "warn"
      }
    }
  }
}

Using with LM Studio

  1. Start CodeGraph MCP server: codegraph start stdio
  2. In LM Studio, enable MCP support in settings
  3. CodeGraph tools will appear in LM Studio's tool palette

📊 Feature Flags Reference

When building, include features for the providers you want to use:

Feature               Providers Enabled                Use Case
onnx                  ONNX embeddings                  Local CPU/GPU embeddings
ollama                Ollama embeddings + LLM          Local models via Ollama
openai                OpenAI embeddings                Cloud embeddings (text-embedding-3-large/small)
openai-llm            OpenAI                           Cloud LLM (gpt-5, gpt-5-codex, gpt-5-codex-mini)
anthropic             Anthropic Claude                 Cloud LLM (Claude 4.5, Haiku 4.5)
openai-compatible     LM Studio, custom providers      OpenAI Responses API compatible
cloud-jina            Jina AI embeddings + reranking   Cloud embeddings with a free tier (SOTA, variable dims)
cloud-surrealdb       SurrealDB HNSW                   Cloud-native graph backend, local or free cloud (up to 1 GB)
cloud                 Jina AI + SurrealDB              All cloud vector & graph features
faiss                 FAISS vector search              Local vector search with RocksDB-persisted graph
all-cloud-providers   All cloud LLM providers          Shortcut for Jina + SurrealDB + Anthropic + OpenAI

Common Build Commands

# Local only (ONNX + Ollama)
cargo build --release --features "onnx,ollama,faiss"

# LM Studio
cargo build --release --features "openai-compatible,faiss"

# Cloud only (Anthropic + OpenAI)
cargo build --release --features "anthropic,openai-llm,openai,faiss"

# Jina AI cloud embeddings + local FAISS
cargo build --release --features "cloud-jina,faiss"

# SurrealDB cloud vector backend
cargo build --release --features "cloud-surrealdb,openai,faiss"

# Full cloud (Jina + SurrealDB + Anthropic)
cargo build --release --features "cloud,anthropic,faiss"

# Everything (local + cloud)
cargo build --release --features "all-cloud-providers,onnx,ollama,cloud,faiss"

⚡ Performance

Speed Metrics (Apple Silicon + LM Studio)

Operation                Performance            Notes
Embedding generation     120 embeddings/sec     LM Studio with MLX
Vector search (local)    2-5 ms latency         FAISS with index caching
Vector search (cloud)    2-5 ms latency         SurrealDB HNSW
Jina AI embeddings       50-150 ms per query    Cloud API call overhead
Jina reranking           80-200 ms for top-K    Two-stage retrieval
Ollama embeddings        ~60 embeddings/sec     About half LM Studio speed

Optimizations (Enabled by Default)

Optimization         Speedup    Memory Cost
FAISS index cache    10-50×     300-600 MB
Embedding cache      10-100×    ~90 MB
Query cache          100×       ~10 MB
Parallel search      2-3×       Minimal

🔧 Troubleshooting

Build Issues

"Could not find library faiss"

# Install FAISS first
brew install faiss  # macOS
sudo apt-get install libfaiss-dev  # Ubuntu

"Feature X is not enabled"

  • Make sure you included the feature flag when building
  • Example: cargo build --release --features "anthropic,faiss"

Runtime Issues

"API key not found"

  • Set environment variable: export ANTHROPIC_API_KEY="sk-ant-..."
  • Or add to config file: anthropic_api_key = "sk-ant-..."

"Model not found"

  • For Ollama: Run ollama pull <model-name> first
  • For LM Studio: Download the model in LM Studio app
  • For cloud: Check your model name matches available models

"Connection refused"

  • LM Studio: Make sure the local server is running
  • Ollama: Check Ollama is running with ollama list
  • Cloud: Check your internet connection

Getting Help

  1. Check docs/CLOUD_PROVIDERS.md for detailed provider setup
  2. See LMSTUDIO_SETUP.md for LM Studio specifics
  3. Open an issue on GitHub with your error message

📦 Node.js Integration (NAPI Bindings)

Zero-Overhead TypeScript Integration

CodeGraph provides native Node.js bindings through NAPI-RS for seamless TypeScript/JavaScript integration:

Key Features:

  • 🚀 Native Performance: Direct Rust-to-Node.js bindings with zero serialization overhead
  • 📘 Auto-Generated Types: TypeScript definitions generated directly from Rust code
  • ⚡ Async Runtime: Full tokio async support integrated with the Node.js event loop
  • 🔄 Hot-Reload Config: Update configuration without restarting your Node.js process
  • 🌐 Dual-Mode Search: Automatic routing between local FAISS and cloud SurrealDB

Installation

Option 1: Direct Install (Recommended)

# Build the addon
cd crates/codegraph-napi
npm install
npm run build

# Install in your project
cd /path/to/your-project
npm install /path/to/codegraph-rust/crates/codegraph-napi

Option 2: Pack and Install

# Build and pack
cd crates/codegraph-napi
npm install
npm run build
npm pack  # Creates codegraph-napi-1.0.0.tgz

# Install in your project
cd /path/to/your-project
npm install /path/to/codegraph-rust/crates/codegraph-napi/codegraph-napi-1.0.0.tgz

API Examples

Semantic Search:

import { semanticSearch } from 'codegraph-napi';

const results = await semanticSearch('find authentication code', {
  limit: 10,
  useCloud: true,      // Use cloud search with automatic fallback
  reranking: true      // Enable Jina reranking (if configured)
});

console.log(`Found ${results.totalCount} results in ${results.searchTimeMs}ms`);
console.log(`Search mode: ${results.modeUsed}`);  // "local" or "cloud"

Configuration Management:

import { getCloudConfig, reloadConfig } from 'codegraph-napi';

// Check cloud feature availability
const config = await getCloudConfig();
console.log('Jina AI enabled:', config.jina_enabled);
console.log('SurrealDB enabled:', config.surrealdb_enabled);

// Hot-reload configuration without restart
await reloadConfig();

Embedding Operations:

import { getEmbeddingStats, countTokens } from 'codegraph-napi';

// Get embedding provider stats
const stats = await getEmbeddingStats();
console.log(`Provider: ${stats.provider}, Dimension: ${stats.dimension}`);

// Count tokens for cost estimation (Jina AI)
const tokens = await countTokens("query text");
console.log(`Token count: ${tokens}`);

Graph Navigation:

import { getNeighbors, getGraphStats } from 'codegraph-napi';

// Get connected nodes
const neighbors = await getNeighbors(nodeId);

// Get graph statistics
const stats = await getGraphStats();
console.log(`Nodes: ${stats.node_count}, Edges: ${stats.edge_count}`);

Build Options

Feature flags for selective compilation:

# Local-only (FAISS, no cloud)
npm run build  # Uses default = ["local"]

# Cloud-only (no FAISS)
npm run build -- --features cloud

# Full build (local + cloud)
npm run build -- --features full

See NAPI README for complete documentation.

🤝 Contributing

We welcome contributions!

# Format code
cargo fmt --all

# Run linter
cargo clippy --workspace --all-targets

# Run tests
cargo test --workspace

Open an issue to discuss large changes before starting.


📄 License

Dual-licensed under MIT and Apache 2.0. See LICENSE-MIT and LICENSE-APACHE for details.


📚 Learn More