/dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

Primary LanguagePythonOtherNOASSERTION

DBRX

DBRX is a large language model trained by Databricks, and made available under an open license. This repository contains the minimal code and examples to run inference, as well as a collection of resources and links for using DBRX.

A reference model code can be found in this repository at modeling_dbrx.py.

Note: this model code is supplied for references purposes only, please see the HuggingFace repository for the official supported version.

Model details

DBRX is a Mixture-of-Experts (MoE) model with 132B total parameters and 36B live parameters. We use 16 experts, of which 4 are active during training or inference. DBRX was pre-trained for 12T tokens of text. DBRX has a context length of 32K tokens.

The following models are open-sourced:

Model Description
DBRX Base Pre-trained base model
DBRX Instruct Finetuned model for instruction following

The model was trained using optimized versions of our open source libraries Composer, LLM Foundry, MegaBlocks and Streaming.

For the instruct model, we used the ChatML format. Please see the DBRX Instruct model card for more information on this.

Quick start

To download the weights and tokenizer, please first visit the DBRX HuggingFace page and accept the license. Note: access to the Base model requires manual approval.

We recommend having at least 320GB of memory to run the model.

Then, run:

pip install -r requirements.txt # Or requirements-gpu.txt to use flash attention on GPU(s)
huggingface-cli login           # Add your Hugging Face token in order to access the model
python generate.py              # See generate.py to change the prompt and other settings

For more advanced usage, please see LLM Foundry (chat script, batch generation script)

If you have any package installation issues, we recommend using our Docker image: mosaicml/llm-foundry:2.2.1_cu121_flash2-latest

Inference

Both TensorRT-LLM and vLLM can be used to run optimized inference with DBRX. We have tested both libraries on NVIDIA A100 and H100 systems. To run inference with 16-bit precision, a minimum of 4 x 80GB multi-GPU system is required.

TensorRT-LLM

DBRX support is being added to TensorRT-LLM library: Pending PR

After merging, instructions to build and run DBRX TensorRT engines will be found at: README

vLLM

Please see the vLLM docs for instructions on how to run DBRX with the vLLM engine.

Finetune

An example script to finetune DBRX can be found in our open source library LLM Foundry

Model card

The model cards can be found at:

Integrations

DBRX is available on the Databricks platform through:

The same tools used to train high quality MoE models such as DBRX are available for Databricks customers. Please reach out to us at https://www.databricks.com/company/contact if you are interested in pre-training, finetuning, or deploying your own DBRX models!

Issues

For issues with model output, or community discussion, please use the Hugging Face community forum (instruct, base)

For issues with LLM Foundry, or any of the underlying training libraries, please open an issue on the relevant GitHub repository.

License

Our model weights and code are licensed for both researchers and commercial entities. The Databricks Open Source License can be found at LICENSE, and our Acceptable Use Policy can be found here.

Related Repository

[1] https://github.com/databricks/megablocks