SentenceVAE

Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context

Primary language: Python | License: MIT


Hongjun An (1,2)*, Yifan Chen (1,2)*, Zhe Sun (1,2)✉ & Xuelong Li (1,2)✉

(1) School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University

(2) Institute of Artificial Intelligence (TeleAI), China Telecom


1. Introduction

Current large language models (LLMs) primarily use next-token prediction for inference, which significantly limits their processing speed. In this paper, we introduce a novel inference methodology, next-sentence prediction, aimed at improving the inference efficiency of LLMs. We present the Sentence Variational Autoencoder (SentenceVAE), which consists of a Sentence Encoder that compresses the multiple tokens of a sentence into a single token, and a Sentence Decoder that reconstructs the sentence from that token.


Fig. 1. The schematic form of SentenceVAE.
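
To make the shape change concrete, here is a toy Python sketch of what "compress multiple tokens into a single token" means: an n × d list of token embeddings becomes one d-dimensional vector. The real Sentence Encoder is a trained network; the mean pooling inside the hypothetical `encode_sentence` helper is only a stand-in, and the Sentence Decoder (which maps the vector back to tokens) is not shown.

```python
from typing import List

Vector = List[float]


def encode_sentence(token_embeddings: List[Vector]) -> Vector:
    """Toy stand-in for the Sentence Encoder.

    Collapses n token vectors (n x d) into one sentence vector (d) by mean
    pooling. The actual SentenceVAE encoder is learned, not a plain mean.
    """
    if not token_embeddings:
        raise ValueError("a sentence needs at least one token")
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]


# Three tokens with 4-dim embeddings become a single 4-dim "sentence token".
sentence = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 1.0, 1.0, 2.0],
]
print(encode_sentence(sentence))  # [2.0, 1.0, 1.0, 2.0]
```

However many tokens the sentence has, the language model downstream sees one position for it.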

By integrating SentenceVAE into the input and output layers of LLMs, we develop Sentence-level LLMs (SLLMs) that employ a sentence-by-sentence inference method.


Fig. 2. (a) The schematic form of published LLMs. (b) The schematic form of SLLMs, which embed SentenceVAEs.

By segmenting the context into sentences, SLLMs preserve the integrity of the original semantic content, which improves accuracy while also boosting inference speed. Moreover, because SLLMs process fewer tokens than previous LLMs for the same context length, they need much less memory for self-attention computation and can therefore handle longer contexts. Extensive experiments on the Wanjuan dataset show that, compared with previous token-by-token methods, the proposed method can accelerate inference by 204 ~ 365%, reduce perplexity (PPL) by 46 ~ 75%, and decrease memory overhead by 86 ~ 91% for the same context length.
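
The "fewer tokens for the same context" argument can be illustrated with a toy Python sketch. It assumes a sentence ends at a punctuation mark followed by whitespace (the demo notes below mention commas and periods; the exact boundary set used by the repository is not stated here) and uses whitespace-separated words in place of real tokenizer tokens. `split_sentences` and `count_positions` are hypothetical helpers, not functions from the repository.

```python
import re

# Assumed boundary set; the README only names commas and periods explicitly.
SENTENCE_END = re.compile(r"(?<=[,.;:!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Cut the text right after each punctuation mark that is followed by whitespace."""
    return [part for part in SENTENCE_END.split(text.strip()) if part]


def count_positions(text: str) -> tuple[int, int]:
    """Return (token-level positions, sentence-level positions).

    Whitespace-separated words stand in for real tokenizer tokens.
    """
    return len(text.split()), len(split_sentences(text))


text = "Hello, world. How are you today? I am fine."
tokens, sentences = count_positions(text)
print(split_sentences(text))  # ['Hello,', 'world.', 'How are you today?', 'I am fine.']
print(tokens, sentences)  # 9 4
# Self-attention scores one entry per pair of positions:
print(tokens ** 2, sentences ** 2)  # 81 16
```

This only conveys the intuition: attention scores grow with the square of the number of positions, so shrinking 9 positions to 4 cuts the score matrix from 81 to 16 entries. The memory figures in the table below are measured values, not derived from this calculation.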

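| Model | Total Params | Average PPL, OPT ↓ | Average PPL, SLLM ↓ | PPL Δ ↓ | Mean output throughput, OPT (toks/s) ↑ | Mean output throughput, SLLM (toks/s) ↑ | Throughput Δ ↑ | Mean GPU memory, OPT (KB/token) ↓ | Mean GPU memory, SLLM (KB/token) ↓ | Memory Δ ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| SLLM-125M-H1 | 214M | 26.75 | 31.68 | +18.4% | 214.57 | 652.78 | +204.2% | 73.15 | 12.03 | -83.6% |
| SLLM-125M-H2 | 226M | 26.75 | 44.60 | +66.7% | 214.57 | 539.80 | +151.6% | 73.15 | 7.08 | -90.3% |
| SLLM-125M-H4 | 250M | 26.75 | 14.32 | -46.5% | 214.57 | 332.12 | +54.8% | 73.15 | 10.00 | -86.3% |
| SLLM-350M-H1 | 429M | 25.18 | 24.84 | -1.4% | 144.33 | 481.39 | +233.5% | 197.59 | 29.98 | -84.8% |
| SLLM-350M-H2 | 450M | 25.18 | 14.81 | -41.2% | 144.33 | 442.23 | +206.4% | 197.59 | 26.78 | -86.4% |
| SLLM-350M-H4 | 492M | 25.18 | 10.17 | -59.6% | 144.33 | 315.61 | +118.7% | 197.59 | 17.73 | -91.0% |
| SLLM-1.3B-H1 | 1.61B | 15.95 | 8.76 | -45.1% | 119.07 | 479.71 | +302.9% | 400.01 | 57.07 | -85.7% |
| SLLM-1.3B-H2 | 1.69B | 15.95 | 3.84 | -75.9% | 119.07 | 553.95 | +365.2% | 400.01 | 55.14 | -86.2% |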
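
The Δ columns are consistent with a plain relative change against the OPT baseline, Δ = (SLLM − OPT) / OPT × 100%. As a quick arithmetic check, this Python snippet reproduces the SLLM-125M-H1 row; `relative_change` is just an illustrative helper.

```python
def relative_change(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


# SLLM-125M-H1 row: (OPT, SLLM) pairs copied from the table.
row = {
    "ppl": (26.75, 31.68),
    "throughput": (214.57, 652.78),
    "memory": (73.15, 12.03),
}
for name, (opt, sllm) in row.items():
    print(f"{name}: {relative_change(opt, sllm):+.1f}%")
# ppl: +18.4%
# throughput: +204.2%
# memory: -83.6%
```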

In addition, we show that SLLMs and SVAEs follow the Scaling Law, which suggests that our method can be extended to larger-scale models.


Fig. 3. Scaling Law of (a) SLLMs and (b) SVAEs.

2. Quick Start

Installation

Step1. Install SentenceVAE from source.

git clone https://github.com/BestAnHongjun/SentenceVAE.git
cd SentenceVAE
pip3 install -e . # or python3 setup.py develop
Prepare OPT models

Step1. Create a folder named model_repo under SentenceVAE to store the OPT series models.

cd SentenceVAE
mkdir -p model_repo

Step2. Enter the model_repo directory and initialize Git LFS.

cd model_repo
git lfs install

Step3. Download the OPT-125M model, used by the SentenceVAE-768 and SLLM-125M series.

git clone https://huggingface.co/facebook/opt-125m

Step4. Download the OPT-350M model, used by the SentenceVAE-1024 and SLLM-350M series.

git clone https://huggingface.co/facebook/opt-350m

Step5. Download the OPT-1.3B model, used by the SentenceVAE-2048 and SLLM-1.3B series.

git clone https://huggingface.co/facebook/opt-1.3b
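
If Git LFS is not available, the same three checkpoints can likely be fetched with the `huggingface_hub` Python package instead. This sketch assumes the demos only need the model files inside model_repo/opt-125m, model_repo/opt-350m and model_repo/opt-1.3b (the folders the git clone commands above create). `plan_downloads` and `download_all` are hypothetical helpers; `snapshot_download(repo_id=..., local_dir=...)` is the library's own download function.

```python
from pathlib import Path

# The same three Hub repositories cloned in Steps 3-5.
OPT_REPOS = ["facebook/opt-125m", "facebook/opt-350m", "facebook/opt-1.3b"]


def plan_downloads(root: str = "model_repo") -> list[tuple[str, Path]]:
    """Map each Hub repo id to the folder `git clone` would create under root."""
    return [(repo_id, Path(root) / repo_id.split("/")[-1]) for repo_id in OPT_REPOS]


def download_all(root: str = "model_repo") -> None:
    """Download every OPT checkpoint into its planned folder (needs network access)."""
    # Imported lazily so plan_downloads() works without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in plan_downloads(root):
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Run `download_all()` from the SentenceVAE directory after `pip3 install huggingface_hub`. Unlike `git clone`, this does not create a `.git` folder, which should not matter if the code only reads the model files.
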
SentenceVAE Demo

Step1. Download a pretrained model from the table below.

| Model | Hidden Size | Hidden Layers | Loss ↓ | PPL ↓ | Download Link |
|---|---|---|---|---|---|
| SVAE-768-H1 | 768 | 1 | 1.339 | 3.605 | ModelScope / OpenXLab |
| SVAE-768-H2 | 768 | 2 | 1.019 | 2.588 | ModelScope / OpenXLab |
| SVAE-768-H4 | 768 | 4 | 0.5598 | 1.649 | ModelScope / OpenXLab |
| SVAE-1024-H1 | 1024 | 1 | 0.9266 | 2.406 | ModelScope / OpenXLab |
| SVAE-1024-H2 | 1024 | 2 | 0.6610 | 1.845 | ModelScope / OpenXLab |
| SVAE-1024-H4 | 1024 | 4 | 0.3704 | 1.384 | ModelScope / OpenXLab |
| SVAE-2048-H1 | 2048 | 1 | 0.5165 | 1.622 | ModelScope / OpenXLab |
| SVAE-2048-H2 | 2048 | 2 | 0.2845 | 1.292 | ModelScope / OpenXLab |
| SVAE-2048-H4 | 2048 | 4 | 0.1270 | 1.115 | ModelScope / OpenXLab |

Step2. Run the demo script in the tools/demo folder, for example:

cd SentenceVAE

python3 tools/demo/demo_svae.py \
    -c config/SVAE/SVAE-768/svae_768_h4.yaml \
    --checkpoint /path/to/pretrained/checkpoint \
    --input "What's your name?"

Arguments:

  • -c, --config: path to the corresponding configuration file; see the config folder for the available files.
  • --checkpoint: path to the checkpoint file you just downloaded.
  • --input: the sentence you want to test.
    • It must be a single sentence ending with a punctuation mark such as a comma or a period; see the paper for the reasons.
    • Currently, only English is supported.
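
To test several sentences in a row, a small Python wrapper can build the documented command line and reject inputs that do not end in punctuation before launching the script. This is a sketch: `build_svae_demo_cmd` and `run_svae_demo` are hypothetical helpers, and the accepted punctuation set is an assumption, since the notes above only name commas and periods explicitly.

```python
import shlex
import subprocess

# Assumed set of sentence-ending marks; the README names commas and periods.
SENTENCE_END = (",", ".", "!", "?", ";", ":")


def build_svae_demo_cmd(config: str, checkpoint: str, sentence: str) -> list[str]:
    """Return the demo_svae.py command using the documented -c/--checkpoint/--input flags."""
    if not sentence.rstrip().endswith(SENTENCE_END):
        raise ValueError(f"input must end with a punctuation mark: {sentence!r}")
    return [
        "python3", "tools/demo/demo_svae.py",
        "-c", config,
        "--checkpoint", checkpoint,
        "--input", sentence,
    ]


def run_svae_demo(config: str, checkpoint: str, sentences: list[str], repo_dir: str = ".") -> None:
    """Run the demo once per sentence from the repository root (repo_dir)."""
    for sentence in sentences:
        cmd = build_svae_demo_cmd(config, checkpoint, sentence)
        print("$", shlex.join(cmd))
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example: `run_svae_demo("config/SVAE/SVAE-768/svae_768_h4.yaml", "/path/to/pretrained/checkpoint", ["What's your name?", "I like apples."], repo_dir="SentenceVAE")`. Passing the arguments as a list avoids shell-quoting problems with apostrophes in the input.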

The model compresses the sentence into a single vector, then decodes that vector back into text. Ideally, the output should match the input exactly.
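
One simple way to judge the round trip is to compare the two strings directly. The hypothetical `reconstruction_report` below returns an exact-match flag plus a word-level similarity computed with Python's `difflib.SequenceMatcher`. This is an illustrative check only, not the metric used in the paper (the table above reports loss and PPL).

```python
from difflib import SequenceMatcher


def reconstruction_report(source: str, output: str) -> dict:
    """Compare the demo input with the decoded output, word by word.

    `exact` is True only for a perfect round trip; `similarity` is difflib's
    ratio over whitespace-separated words (1.0 means identical).
    """
    a, b = source.split(), output.split()
    return {
        "exact": source.strip() == output.strip(),
        "similarity": SequenceMatcher(None, a, b).ratio(),
    }


print(reconstruction_report("What's your name?", "What's your name?"))
# {'exact': True, 'similarity': 1.0}
print(reconstruction_report("What's your name?", "What's your name."))
```

Comparing whole words means a changed punctuation mark counts as a changed word, which is deliberately strict for a reconstruction check.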

SentenceLLM Demo

Notice: Because SFT datasets are usually proprietary and difficult for us to access, all models listed below are pre-trained models, not general-purpose conversational models. Their quality should therefore be judged by PPL (perplexity), not by conversational performance. If you use them as Q&A models, you will likely get gibberish (in fact, even our baseline OPT models produce gibberish in that setting). We recommend fine-tuning these models on your own SFT datasets to explore their potential as general-purpose conversational models.
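
For reference, the standard token-level perplexity is the exponential of the average negative log-likelihood per token: lower is better, and a model that guesses uniformly among k options scores k. The Python sketch below implements that textbook definition. How the repository aggregates PPL for sentence-level models is not described in this README, so treat it as background rather than the project's evaluation code.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """PPL = exp(-(1/N) * sum(log p(token_i | context))), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 1/4 to every token has PPL 4.
print(perplexity([math.log(0.25)] * 8))  # ≈ 4.0
```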

Step1. Download a pretrained model from the table below.

| Model | Download Link |
|---|---|
| SLLM-125M-H1 | ModelScope / OpenXLab |
| SLLM-125M-H2 | ModelScope / OpenXLab |
| SLLM-125M-H4 | ModelScope / OpenXLab |
| SLLM-350M-H1 | ModelScope / OpenXLab |
| SLLM-350M-H2 | ModelScope / OpenXLab |
| SLLM-350M-H4 | ModelScope / OpenXLab |
| SLLM-1.3B-H1 | ModelScope / OpenXLab |
| SLLM-1.3B-H2 | ModelScope / OpenXLab |

Step2. Run the demo script in the tools/demo folder, for example:

cd SentenceVAE

python3 tools/demo/demo_sllm.py \
    -c config/SLLM/SLLM-125m/sllm_125m_h4_all.yaml \
    --checkpoint /path/to/pretrained/checkpoint \
    --input "What's your name?"

Arguments:

  • -c, --config: path to the corresponding configuration file; see the config folder for the available files.
  • --checkpoint: path to the checkpoint file you just downloaded.
  • --input: your input sentence.

3. Tutorials

Work in progress. The following tutorials are still being written:

  • Train Models
  • Eval Models
  • Test Benchmarks

4. Cite SentenceVAE

If you use SentenceVAE in your research, please cite our work by using the following BibTeX entry:

@article{an2024sentencevae,
  title={SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context},
  author={An, Hongjun and Chen, Yifan and Sun, Zhe and Li, Xuelong},
  journal={arXiv preprint arXiv:2408.00655},
  year={2024}
}