
NeMo Skills

In this repository we provide a pipeline to improve "skills" of large language models (LLMs). Currently we focus on the ability to solve simple mathematical problems, but more skills are coming (such as coding and table understanding).

Our pipeline consists of 3 steps and can be directly applied to any LLM that is supported in NVIDIA's NeMo Toolkit.

  1. Setup
    • Pick a "student" model that you want to improve. E.g. Mistral-7B.
    • [optionally] Pick a "teacher" model (can also use the student model itself). E.g. Mixtral-8x7B.
    • Choose evaluation benchmarks and training datasets. E.g. GSM8K and MATH.
  2. Generate synthetic data
    • Write a couple of examples of solutions that you want the student LLM to learn. E.g. teach it to use code to solve math problems.
    • Run large-scale generation of diverse solutions on the training datasets, showing your examples in the prompt to the teacher model.
    • Filter the generated solutions based on correctness and quality (a minimal sketch of this filtering step follows the list).
  3. Finetune the student model on the generated dataset
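
To make step 2 concrete, here is a minimal sketch of the correctness filter. The file format, field names, and answer-extraction convention are assumptions for illustration; the actual pipeline scripts in this repo handle many more details (code execution, quality heuristics, deduplication).

```python
import json


def extract_answer(solution: str) -> str:
    """Pull the final answer out of a generated solution.

    Assumes solutions end with a line like 'The answer is: 42';
    the real pipeline's extraction logic is more robust.
    """
    marker = "The answer is:"
    if marker in solution:
        return solution.rsplit(marker, 1)[-1].strip().rstrip(".")
    return ""


def filter_solutions(generations_file: str, output_file: str) -> None:
    """Keep only solutions whose extracted answer matches the ground truth."""
    kept = []
    with open(generations_file) as f:
        for line in f:
            # Assumed schema: {"question", "expected_answer", "generated_solution"}
            sample = json.loads(line)
            if extract_answer(sample["generated_solution"]) == sample["expected_answer"]:
                kept.append(sample)
    with open(output_file, "w") as f:
        for sample in kept:
            f.write(json.dumps(sample) + "\n")
```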

We release a series of OpenMath models improved with this pipeline. They are among the best open models for solving mathematical problems and are currently the only state-of-the-art open models that do not rely on OpenAI for data generation!

| model | GSM8K (greedy) | MATH (greedy) | GSM8K (majority@50) | MATH (majority@50) |
|---|---|---|---|---|
| GPT-4 [1] | 94.4 | 56.2 | - | - |
| GPT-4 + code [2] | 92.9 | 69.7 | - | - |
| OpenMath-CodeLlama-7B (nemo \| HF) | 75.9 | 43.6 | 84.8 | 55.6 |
| OpenMath-Mistral-7B (nemo \| HF) | 80.2 | 44.5 | 86.9 | 57.2 |
| OpenMath-CodeLlama-13B (nemo \| HF) | 78.8 | 45.5 | 86.8 | 57.6 |
| OpenMath-CodeLlama-34B (nemo \| HF) | 80.7 | 48.3 | 88.0 | 60.2 |
| OpenMath-Llama2-70B (nemo \| HF) | 84.7 | 46.3 | 90.1 | 58.3 |
| OpenMath-CodeLlama-70B (nemo \| HF) | 84.6 | 50.7 | 90.8 | 60.4 |
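
The majority@50 columns report accuracy when 50 solutions are sampled per problem and the most common extracted final answer is taken as the prediction. A minimal sketch of such a vote (the answer-extraction step is assumed to have already run):

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most common predicted answer across sampled solutions.

    For majority@50, `answers` holds the final answers extracted from
    50 sampled solutions for a single problem.
    """
    valid = [a for a in answers if a]  # drop solutions where extraction failed
    if not valid:
        return ""
    return Counter(valid).most_common(1)[0][0]
```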

We also release OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model.
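
If you just want to browse the data, a Hugging Face Hub release can be loaded with the `datasets` library. The dataset id and field name below are assumptions; check the Hub for the exact names.

```python
from datasets import load_dataset

# Dataset id is an assumption; check the Hugging Face Hub for the exact name.
ds = load_dataset("nvidia/OpenMathInstruct-1", split="train")
print(ds[0]["question"])  # field name assumed; inspect ds.features to confirm
```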

Please see our paper "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" for more details!

Getting started

Try running inference with our models with just a few commands!
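For example, the HF checkpoints can be tried directly with the `transformers` library. This is a minimal sketch; the model id is an assumption, so pick any "HF" checkpoint from the table above, and see our docs for the prompt format the models were trained with.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id is an assumption; substitute any released "HF" checkpoint.
model_id = "nvidia/OpenMath-Mistral-7B-v0.1-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Question: What is 15% of 80?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```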

We provide all instructions to fully reproduce our results.

If you want to improve your own models or to learn more about our pipeline, read through the relevant docs below.

We also provide a convenient tool for visualizing inference results and analyzing data.

(Demos: tool overview, inference page, and analyze page.)

Supported models and datasets

Any model that is supported by NeMo can be used as a "student". Many popular models are supported, e.g. LLaMA2, CodeLLaMA, Mistral-7B and Mixtral-8x7B. For the "teacher" you can use virtually any openly available LLM, since only inference support is needed.

We currently support the following datasets.

Evaluation:

  • GSM8K
  • MATH

Training:

  • GSM8K
  • MATH

Please check out the evaluation and finetuning sections to learn more!

Paper and Citation

If you find our work useful, please consider citing us!

@article{toshniwal2024openmath,
  title   = {OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset},
  author  = {Shubham Toshniwal and Ivan Moshkov and Sean Narenthiran and Daria Gitman and Fei Jia and Igor Gitman},
  year    = {2024},
  journal = {arXiv preprint arXiv:2402.10176}
}

Disclaimer: This project is strictly for research purposes, and not an official product from NVIDIA.