/sourcebot

Blazingly fast code search 🏎️ Deployed as a single Docker image 📦 Search million+ lines of code in your GitHub and GitLab repositories 🪄 MIT licensed ✅

Primary language: TypeScript · License: MIT

About

Sourcebot is a fast code indexing and search tool for your codebases. It is built on top of the zoekt indexer, originally authored by Han-Wen Nienhuys and now maintained by Sourcegraph.

Features

  • 💻 One-command deployment: Get started instantly using Docker on your own machine.
  • 🔍 Multi-repo search: Effortlessly index and search through multiple public and private repositories in GitHub or GitLab.
  • Lightning fast performance: Built on top of the powerful Zoekt search engine.
  • 📂 Full file visualization: Instantly view the entire file when selecting any search result.
  • 🎨 Modern web app: Enjoy a sleek interface with features like syntax highlighting, light/dark mode, and vim-style navigation.

You can try out our public hosted demo here!

Getting Started

Get started with a single docker command:

docker run -p 3000:3000 --rm --name sourcebot ghcr.io/sourcebot-dev/sourcebot:latest

Navigate to localhost:3000 to start searching the Sourcebot repo. Want to search your own repos? Check out how to configure Sourcebot.

Configuring Sourcebot

Sourcebot supports indexing and searching through public and private repositories hosted on GitHub and GitLab. This section will guide you through configuring the repositories that Sourcebot indexes.

  1. Create a new folder on your machine that stores your configs and .sourcebot cache, and navigate into it:

    mkdir sourcebot_workspace
    cd sourcebot_workspace
  2. Create a new config following the configuration schema to specify which repositories Sourcebot should index. For example, to index llama.cpp:

    touch my_config.json
    echo '{
        "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/index.json",
        "Configs": [
            {
                "Type": "github",
                "GitHubUser": "ggerganov",
                "Name": "^llama\\.cpp$"
            }
        ]
    }' > my_config.json

    (For more examples, see example-config.json. For additional usage information, see the configuration schema).

  3. Run Sourcebot and point it to the new config you created with the -e CONFIG_PATH flag:

    docker run -p 3000:3000 --rm --name sourcebot -v $(pwd):/data -e CONFIG_PATH=/data/my_config.json ghcr.io/sourcebot-dev/sourcebot:latest
    What does this command do?
    • Pulls and runs the Sourcebot Docker image from ghcr.io/sourcebot-dev/sourcebot:latest.
    • Mounts the current directory (-v $(pwd):/data) so that Sourcebot can persist the .sourcebot cache.
    • Mirrors (clones) llama.cpp at HEAD into .sourcebot/github/ggerganov/llama.cpp.
    • Indexes llama.cpp into a .zoekt index file in .sourcebot/index/.
    • Maps port 3000 on your machine to port 3000 in the container.
    • Starts the web server on port 3000.

    You should see a .sourcebot folder in your current directory. This folder stores a cache of the repositories zoekt has indexed. The HEAD commit of a repository is re-indexed every hour. Indexing private repos? See Providing an access token.

    [!WARNING] Depending on the size of your repo(s), Sourcebot could take a couple of minutes to finish indexing. Sourcebot doesn't currently support displaying indexing progress in real time, so please be patient while it finishes. You can track the progress manually by inspecting the .sourcebot cache in your workspace.

    Using GitLab?

    tl;dr: A GITLAB_TOKEN is required to index GitLab repositories (both private & public). See Providing an access token.

    Currently, the GitLab indexer is restricted to only indexing repositories that the associated GITLAB_TOKEN has access to. For example, if the token has access to foo, bar, and baz repositories, the following config will index all three:

    {
        "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/index.json",
        "Configs": [
            {
                "Type": "gitlab"
            }
        ]
    }

    See Providing an access token.
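
The warning above suggests tracking indexing progress by inspecting the .sourcebot cache. Here is a minimal sketch of one way to do that, assuming the cache layout described earlier (index files ending in .zoekt under .sourcebot/index/). The function name sourcebot_progress is made up for this example and is not part of Sourcebot:

```shell
# Report how many .zoekt index files exist in a Sourcebot cache directory.
# Assumption: index files live in <cache>/index/, as described above.
# sourcebot_progress is a hypothetical helper, not a Sourcebot command.
sourcebot_progress() {
    cache_dir="${1:-.sourcebot}"
    if [ ! -d "$cache_dir" ]; then
        echo "no cache directory at $cache_dir yet"
        return 1
    fi
    index_dir="$cache_dir/index"
    if [ -d "$index_dir" ]; then
        # find prints one line per index file; wc counts them (tr strips padding).
        index_count=$(find "$index_dir" -type f -name '*.zoekt' | wc -l | tr -d ' ')
    else
        index_count=0
    fi
    echo "index files: $index_count"
}
```

Call sourcebot_progress from the workspace directory (or pass the cache path as the first argument). A count of 0 suggests that indexing has not produced any index files yet.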


Providing an access token

How you provide an access token depends on the code hosting platform you're using:

GitHub

To index private repositories, you'll need to generate a GitHub Personal Access Token (PAT) and pass it to Sourcebot. Create a new PAT here and make sure you select the repo scope.

You'll need to pass this PAT each time you run Sourcebot by setting the GITHUB_TOKEN environment variable:

docker run -p 3000:3000 --rm --name sourcebot -e GITHUB_TOKEN=[your-github-token] -e CONFIG_PATH=/data/my_config.json -v $(pwd):/data ghcr.io/sourcebot-dev/sourcebot:latest

GitLab

[!NOTE] An access token is required to index GitLab repositories (both private & public) since the GitLab indexer needs the token to determine which repositories to index. See example-config.json for example usage.

Generate a GitLab Personal Access Token (PAT) here and make sure you select the read_api scope.

You'll need to pass this PAT each time you run Sourcebot by setting the GITLAB_TOKEN environment variable:

docker run -p 3000:3000 --rm --name sourcebot -e GITLAB_TOKEN=[your-gitlab-token] -e CONFIG_PATH=/data/my_config.json -v $(pwd):/data ghcr.io/sourcebot-dev/sourcebot:latest
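
Typing a token directly into the docker run command leaves it in your shell history. One possible alternative, sketched below, is to keep the token in a file and forward it through the environment. This relies on standard Docker behavior: -e NAME with no value passes the calling shell's value of NAME into the container. The helper names export_token and run_sourcebot, and the token file location, are assumptions made for this example:

```shell
# Read a token from a file, strip whitespace, and export it under the given name.
# export_token is a hypothetical helper, not part of Sourcebot.
export_token() {
    var_name="$1"
    token_file="$2"
    [ -r "$token_file" ] || { echo "cannot read $token_file" >&2; return 1; }
    token=$(tr -d '[:space:]' < "$token_file")
    [ -n "$token" ] || { echo "$token_file is empty" >&2; return 1; }
    export "$var_name=$token"
}

# Start Sourcebot, forwarding whichever tokens are exported in this shell.
# Variables that are not set in the shell are simply not set in the container.
run_sourcebot() {
    docker run -p 3000:3000 --rm --name sourcebot \
        -e GITHUB_TOKEN -e GITLAB_TOKEN \
        -e CONFIG_PATH=/data/my_config.json \
        -v "$(pwd)":/data \
        ghcr.io/sourcebot-dev/sourcebot:latest
}
```

For example, with a GitHub token saved in ~/.github-token, run export_token GITHUB_TOKEN ~/.github-token followed by run_sourcebot from your workspace directory.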

Using a self-hosted GitLab / GitHub instance

If you're using a self-hosted GitLab or GitHub instance with a custom domain, there is some additional config required:

GitHub

  1. In your config, add the GitHubURL field to point to your deployment's URL. For example:

    {
        "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/index.json",
        "Configs": [
            {
                "Type": "github",
                "GitHubURL": "https://github.example.com"
            }
        ]
    }
  2. Set the GITHUB_HOSTNAME environment variable to your deployment's hostname. For example:

     docker run -e GITHUB_HOSTNAME=github.example.com /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest

GitLab

  1. In your config, add the GitLabURL field to point to your deployment's URL. For example:

    {
        "$schema": "https://raw.githubusercontent.com/sourcebot-dev/sourcebot/main/schemas/index.json",
        "Configs": [
            {
                "Type": "gitlab",
                "GitLabURL": "https://gitlab.example.com"
            }
        ]
    }
  2. Set the GITLAB_HOSTNAME environment variable to your deployment's hostname. For example:

     docker run -e GITLAB_HOSTNAME=gitlab.example.com /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest
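
In the examples above, the hostname variable is the URL field from the config without its scheme. To keep the two in sync, the hostname can be derived from the URL, as in the sketch below. The helper name url_to_hostname is invented for this example, and it assumes the deployment URL has no port or user-info component:

```shell
# Derive a hostname from a deployment URL by dropping the scheme and any path.
# url_to_hostname is a hypothetical helper; URLs with ports or credentials
# are outside what this sketch handles.
url_to_hostname() {
    host="${1#*://}"    # drop "https://" (or any other scheme)
    host="${host%%/*}"  # drop everything from the first "/" onward
    echo "$host"
}
```

For instance, GITLAB_HOSTNAME="$(url_to_hostname https://gitlab.example.com)" sets the variable to gitlab.example.com, which can then be passed to docker run with -e GITLAB_HOSTNAME.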
     

Build from source

[!NOTE] Building from source is only required if you'd like to contribute. The recommended way to use Sourcebot is to use the pre-built Docker image.

  1. Install Go and Node.js. Note that Node.js version 21.1.0 or later is required.

  2. Install ctags (required by zoekt-indexserver)

    # macOS:
    brew install universal-ctags

    # Linux:
    snap install universal-ctags
  3. Clone the repository with submodules:

    git clone --recurse-submodules https://github.com/sourcebot-dev/sourcebot.git
  4. Run make to build zoekt and install dependencies:

    cd sourcebot
    make

    The zoekt binaries and web dependencies are placed into bin and node_modules respectively.

  5. Create a config.json file at the repository root. See Configuring Sourcebot for more information.

  6. (Optional) Depending on your config.json, you may need to pass an access token to Sourcebot:

    GitHub

    First, generate a personal access token (PAT). See Providing an access token.

    Next, create a text file named .github-token in your home directory and paste the token in it. The file should look like:

    ghp_...

    zoekt will read this file to authenticate with GitHub.

    GitLab

    First, generate a personal access token (PAT). See Providing an access token.

    Next, create a text file named .gitlab-token in your home directory and paste the token in it. The file should look like:

    glpat-...

    zoekt will read this file to authenticate with GitLab.

  7. Start Sourcebot with the command:

    yarn dev

    A .sourcebot directory will be created and zoekt will begin indexing the repositories specified in config.json.

  8. Start searching at http://localhost:3000.
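
Step 1 requires Node.js 21.1.0 or later. Before running make, the installed version can be checked with a sketch like the following. The helper names version_ge and check_node are made up for this example, and the comparison only handles plain numeric versions of up to three components (no pre-release suffixes):

```shell
# Succeed if dotted version $1 is greater than or equal to $2.
# Hypothetical helper; missing components are treated as 0.
version_ge() {
    # Split each version on dots into major, minor, and patch fields.
    IFS=. read -r a1 a2 a3 <<EOF
$1
EOF
    IFS=. read -r b1 b2 b3 <<EOF
$2
EOF
    [ "${a1:-0}" -gt "${b1:-0}" ] && return 0
    [ "${a1:-0}" -lt "${b1:-0}" ] && return 1
    [ "${a2:-0}" -gt "${b2:-0}" ] && return 0
    [ "${a2:-0}" -lt "${b2:-0}" ] && return 1
    [ "${a3:-0}" -ge "${b3:-0}" ]
}

# Check that node is installed and meets the minimum version from step 1.
check_node() {
    command -v node >/dev/null 2>&1 || { echo "node not found" >&2; return 1; }
    node_version=$(node --version)   # prints a "v"-prefixed version string
    version_ge "${node_version#v}" 21.1.0 || {
        echo "Node.js $node_version is older than 21.1.0" >&2
        return 1
    }
}
```

Running check_node prints nothing and returns success when the requirement is met.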

Telemetry

By default, Sourcebot collects anonymized usage data through PostHog to help us improve the performance and reliability of our tool. We do not collect or transmit any information related to your codebase. In addition, all events are sanitized to ensure that no sensitive or identifying details leave your machine. The data we collect includes general usage statistics and metadata such as query performance (e.g., search duration, error rates) to monitor the application's health and functionality. This information helps us better understand how Sourcebot is used and where improvements can be made :)

If you'd like to disable all telemetry, you can do so by setting the environment variable SOURCEBOT_TELEMETRY_DISABLED to 1 in the docker run command:

docker run -e SOURCEBOT_TELEMETRY_DISABLED=1 /* additional args */ ghcr.io/sourcebot-dev/sourcebot:latest

Or if you are building locally, create a .env.local file at the repository root with the following contents:

SOURCEBOT_TELEMETRY_DISABLED=1
NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED=1
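
If you already have a .env.local file, the two settings can be appended without duplicating them. This is a small sketch; the function name disable_telemetry is invented for this example, and it assumes the file, if present, ends with a newline:

```shell
# Append the telemetry settings to an env file unless they are already defined.
# disable_telemetry is a hypothetical helper. It leaves existing definitions
# untouched, so a line that sets either variable to another value must be
# edited by hand.
disable_telemetry() {
    env_file="${1:-.env.local}"
    for name in SOURCEBOT_TELEMETRY_DISABLED NEXT_PUBLIC_SOURCEBOT_TELEMETRY_DISABLED; do
        if ! grep -q "^${name}=" "$env_file" 2>/dev/null; then
            echo "${name}=1" >> "$env_file"
        fi
    done
}
```

Run disable_telemetry at the repository root. Running it a second time makes no further changes.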