Autodoc

⚡ Toolkit for auto-generating codebase documentation using LLMs ⚡

What is this? • Get Started • Community • Contribute

What is this?

Autodoc is an experimental toolkit for auto-generating codebase documentation for git repositories using Large Language Models, like GPT-4 or Alpaca. Autodoc can be installed in your repo in about 5 minutes. It indexes your codebase through a depth-first traversal of all repository contents and calls an LLM to write documentation for each file and folder. These documents can be combined to describe the different components of your system and how they work together.
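The depth-first indexing pass described above can be sketched roughly as follows. This is a minimal illustration, not Autodoc's actual implementation: the `RepoNode` type, the `documentTree` function, and the `summarize` callback (a stand-in for the LLM call) are hypothetical names, and documenting children before their parent folder is an assumption made for this sketch.

```typescript
// Hypothetical in-memory model of a repository tree (not Autodoc's real types).
type RepoNode =
  | { kind: "file"; path: string; content: string }
  | { kind: "folder"; path: string; children: RepoNode[] };

// Stand-in for an LLM call: receives a path plus input text, returns documentation.
type Summarize = (path: string, input: string) => string;

interface DocEntry {
  path: string;
  doc: string;
}

// Depth-first, post-order traversal: every file and subfolder is documented
// before the folder that contains it, so a folder's doc can draw on its children's docs.
function documentTree(node: RepoNode, summarize: Summarize, out: DocEntry[]): string {
  let input: string;
  if (node.kind === "file") {
    input = node.content;
  } else {
    // Recurse into children first and feed their docs to the folder summary.
    input = node.children
      .map((child) => documentTree(child, summarize, out))
      .join("\n");
  }
  const doc = summarize(node.path, input);
  out.push({ path: node.path, doc });
  return doc;
}

// Usage with a fake summarizer that only reports the input length.
const repo: RepoNode = {
  kind: "folder",
  path: "src",
  children: [
    { kind: "file", path: "src/index.ts", content: "export {};" },
    {
      kind: "folder",
      path: "src/cli",
      children: [{ kind: "file", path: "src/cli/run.ts", content: "console.log('hi');" }],
    },
  ],
};
const fakeSummarize: Summarize = (path, input) => `${path}: ${input.length} chars`;
const entries: DocEntry[] = [];
documentTree(repo, fakeSummarize, entries);
console.log(entries.map((e) => e.path));
```

In this sketch the files are visited first and the root folder last, which is what lets each folder-level document summarize everything beneath it.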

The generated documentation lives in your codebase, and travels where your code travels. Developers who download your code can use the doc command to ask questions about your codebase and get highly specific answers with reference links back to code files.

In the near future, documentation will be re-indexed as part of your CI pipeline, so it is always up-to-date.

Status

Autodoc is in the early stages of development. It is functional, but not ready for production use. Things may break, or not work as expected. If you're interested in working on the core Autodoc framework, please see contributing. We would love to have your help!

Examples

Below are a few examples of how Autodoc can be used.

Autodoc - This repository contains documentation for itself, generated by Autodoc. It lives in the .autodoc folder. Follow the instructions here to learn how to query it.
TolyGPT.com - TolyGPT is an Autodoc chatbot trained on the Solana validator codebase and deployed to the web for easy access. In the near future, Autodoc will support a web version in addition to the existing CLI tool.

Get Started

Requirements

Autodoc requires Node v18.0.0 or greater. v19.0.0 or greater is recommended. Make sure you're running the proper version:

$ node -v

Example output:

v19.8.1
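If you want to automate this check, for example in a setup script, the comparison can be sketched as below. The `meetsMinimum` helper is a hypothetical name introduced for illustration and is not part of Autodoc; it assumes plain `major.minor.patch` version strings with an optional leading `v`.

```typescript
// Hypothetical helper: compares a Node-style version string (e.g. "v19.8.1")
// against a minimum "major.minor.patch" requirement.
function meetsMinimum(version: string, minimum: string): boolean {
  const parse = (v: string): number[] =>
    v.replace(/^v/, "").split(".").map((part) => parseInt(part, 10));
  const actual = parse(version);
  const required = parse(minimum);
  // Compare major, then minor, then patch; the first difference decides.
  for (let i = 0; i < 3; i++) {
    const a = actual[i] || 0;
    const r = required[i] || 0;
    if (a !== r) return a > r;
  }
  return true; // versions are equal
}

console.log(meetsMinimum("v19.8.1", "18.0.0"));
console.log(meetsMinimum("v16.20.0", "18.0.0"));
```

In a real Node script, the first argument would typically be `process.version`.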

Install the Autodoc CLI tool as a global NPM module:

$ npm install -g @context-labs/autodoc

This command installs the Autodoc CLI tool that will allow you to create and query Autodoc indexes.

Run doc to see the available commands.

Querying

We'll use the Autodoc repository as an example to demonstrate how querying in Autodoc works.

Clone Autodoc and change directory to get started:

$ git clone https://github.com/context-labs/autodoc.git
$ cd autodoc

Right now Autodoc only supports OpenAI. Make sure you have your OpenAI API key exported in your current session:

$ export OPENAI_API_KEY=<YOUR_KEY_HERE>

To start the Autodoc query CLI, run:

$ doc q

If this is your first time running doc q, you'll get a screen that prompts you to select which GPT models you have access to. Select whichever is appropriate for your level of access. If you aren't sure, select the first option:

You're now ready to query documentation for the Autodoc repository:

This is the core querying experience. It's very basic right now, with plenty of room for improvement. If you're interested in improving the Autodoc CLI querying experience, check out this issue :)

Indexing

Follow the steps below to generate documentation for your own repository using Autodoc.

Change directory into the root of your project:

$ cd $PROJECT_ROOT

Make sure your OpenAI API key is available in the current session:

$ export OPENAI_API_KEY=<YOUR_KEY_HERE>

Run the init command:

$ doc init

You will be prompted to enter the name of your project and its GitHub URL, and to select which GPT models you have access to. If you aren't sure which models you have access to, select the first option. This command generates an autodoc.config.json file in the root of your project to store these values. This file should be checked into git.

Note: Do not skip entering these values or indexing may not work.

Run the index command:

$ doc index

You should see a screen like this:

This screen estimates the cost of indexing your repository. You can also access this screen via the doc estimate command.

For every file in your project, Autodoc calculates the number of tokens in the file based on the file content. The more lines of code, the larger the number of tokens. Using this number, it determines which model to use on a per-file basis, always choosing the cheapest model whose context length supports the number of tokens in the file. If you're interested in helping make model selection configurable in Autodoc, check out this issue.
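The selection rule described above (cheapest model whose context length fits the file) can be sketched like this. The `ModelSpec` type, the `selectModel` function, and the model table are assumptions for illustration: the names, context lengths, and prices are placeholder figures, not values read from Autodoc's source or from current OpenAI pricing.

```typescript
// Illustrative model table. Names, context lengths, and prices are assumptions
// for this sketch, not values taken from Autodoc or from current OpenAI pricing.
interface ModelSpec {
  name: string;
  maxTokens: number; // context length in tokens
  costPer1KTokens: number; // USD, illustrative
}

const MODELS: ModelSpec[] = [
  { name: "gpt-3.5-turbo", maxTokens: 4000, costPer1KTokens: 0.002 },
  { name: "gpt-4", maxTokens: 8000, costPer1KTokens: 0.03 },
  { name: "gpt-4-32k", maxTokens: 32000, costPer1KTokens: 0.06 },
];

// Pick the cheapest model whose context length fits the file.
// Returns null when the file is too large for every model in the table.
function selectModel(tokenCount: number, models: ModelSpec[] = MODELS): ModelSpec | null {
  let best: ModelSpec | null = null;
  for (const model of models) {
    if (tokenCount > model.maxTokens) continue; // file does not fit this model
    if (best === null || model.costPer1KTokens < best.costPer1KTokens) {
      best = model;
    }
  }
  return best;
}

for (const tokens of [1200, 6000, 20000, 50000]) {
  const chosen = selectModel(tokens);
  console.log(tokens, chosen ? chosen.name : "no model fits");
}
```

Under these placeholder figures, any file below the smallest context length lands on the cheapest model, which is the behavior a user would override by only granting access to larger models.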

Note: This naive model selection strategy means that files under ~4000 tokens will be documented using GPT-3.5, which will result in less accurate documentation. We recommend using GPT-4 8K at a minimum. Indexing with GPT-4 results in significantly better output. You can apply for access here.

For large projects, the cost can be several hundred dollars. View OpenAI pricing here.
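To get a feel for how the total adds up, a per-file estimate can be summed as in the sketch below. The `FileEstimate` type and `estimateTotalCost` function are hypothetical names, the prices are placeholder figures, and the calculation only counts input tokens, so treat it as a rough lower bound and not as what `doc estimate` actually computes.

```typescript
// Hypothetical per-file record; prices are illustrative placeholders.
interface FileEstimate {
  path: string;
  tokens: number; // token count of the file's content
  costPer1KTokens: number; // USD price of the model chosen for this file
}

// Sum (tokens / 1000) * price over all files.
// Simplification: ignores the tokens the model generates in its response.
function estimateTotalCost(files: FileEstimate[]): number {
  let total = 0;
  for (const file of files) {
    total += (file.tokens / 1000) * file.costPer1KTokens;
  }
  return total;
}

const sample: FileEstimate[] = [
  { path: "src/index.ts", tokens: 1500, costPer1KTokens: 0.002 },
  { path: "src/big.ts", tokens: 6000, costPer1KTokens: 0.03 },
];
console.log(`Estimated cost: $${estimateTotalCost(sample).toFixed(4)}`);
```

Because the per-token price differs by more than an order of magnitude between models in this example, a handful of large files can dominate the total.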

In the near future, we will support self-hosted models, such as Llama and Alpaca. Read this issue if you're interested in contributing to this work.

When your repository is done being indexed, you should see a screen like this:

You can now query your application using the steps outlined in querying.

Community

There is a small group of us working full time on Autodoc. Join us on Discord, or follow us on Twitter for updates. We'll be posting regularly and continuing to improve the Autodoc application. Want to contribute? Read below.

Contributing

As an open source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infra, or better documentation.

For detailed information on how to contribute, see here.
