Salesforce open-source LLMs with 8k sequence length.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

XGen

Official research release for the family of XGen models (7B) by Salesforce AI Research:

Title: Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length

Authors: Erik Nijkamp, Hiroaki Hayashi, Tian Xie, Congying Xia, Bo Pang, Rui Meng, Wojciech Kryscinski, Lifu Tu, Meghana Bhat, Semih Yavuz, Chen Xing, Jesse Vig, Lidiya Murakhovs'ka, Jason Wu, Yingbo Zhou, Shafiq Rayhan Joty, Caiming Xiong.

Models

Model cards are published on the Hugging Face Hub, for example under the model ID Salesforce/xgen-7b-8k-base used in the sample further down.

Tokenization uses OpenAI's tiktoken package, which can be installed via pip:

pip install tiktoken
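Before loading the tokenizer, it can help to confirm that the package is actually importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of this repository.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate `package_name` (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Print a hint instead of failing later inside the tokenizer loading code.
    if not is_installed("tiktoken"):
        print("tiktoken is missing; run: pip install tiktoken")
```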

The models can be used as auto-regressive samplers as follows:

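import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True lets transformers run the tokenizer code shipped with the model repository.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-base", trust_remote_code=True)
# Load the weights in bfloat16 to reduce memory use.
model = AutoModelForCausalLM.from_pretrained("Salesforce/xgen-7b-8k-base", torch_dtype=torch.bfloat16)
# Tokenize the prompt and return PyTorch tensors.
inputs = tokenizer("The world is", return_tensors="pt")
# max_length counts the prompt tokens plus the newly generated tokens.
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))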
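To make the term "auto-regressive sampler" concrete, here is a toy sketch of the generation loop: each new token is chosen from a distribution conditioned on all tokens so far, then appended, until the sequence reaches `max_length` tokens in total (prompt included, mirroring the `max_length` argument above). Everything in it is an assumption for illustration: `toy_next_token_logits` is a hypothetical stand-in for a model, the 8-token vocabulary is invented, and none of this is XGen's or transformers' actual implementation.

```python
import math
import random

VOCAB_SIZE = 8  # hypothetical toy vocabulary, unrelated to XGen's real one


def toy_next_token_logits(tokens):
    """Hypothetical stand-in for a language model: one logit per vocabulary id."""
    last = tokens[-1]
    # Favor the id that follows the last token (wrapping around the vocabulary).
    return [2.0 if i == (last + 1) % VOCAB_SIZE else 0.0 for i in range(VOCAB_SIZE)]


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_autoregressive(prompt, max_length, rng, greedy=False):
    """Extend a non-empty `prompt` one token at a time until it holds `max_length` tokens."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        probs = softmax(toy_next_token_logits(tokens))
        if greedy:
            # Greedy decoding: always take the most probable token.
            next_token = max(range(VOCAB_SIZE), key=probs.__getitem__)
        else:
            # Sampling: draw the next token at random according to its probability.
            next_token = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    print(sample_autoregressive([0, 1], max_length=6, rng=random.Random(0), greedy=True))
    # prints [0, 1, 2, 3, 4, 5]
```

With `greedy=False` the output varies with the random seed, which is the behavior usually meant by "sampling"; the greedy branch is included only because it gives a deterministic result that is easy to check.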

Citation

@misc{XGen,
  title={Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length},
  author={Erik Nijkamp and Hiroaki Hayashi and Tian Xie and Congying Xia and Bo Pang and Rui Meng and Wojciech Kryscinski and Lifu Tu and Meghana Bhat and Semih Yavuz and Chen Xing and Jesse Vig and Lidiya Murakhovs'ka and Jason Wu and Yingbo Zhou and Shafiq Rayhan Joty and Caiming Xiong},
  howpublished={Salesforce AI Research Blog},
  year={2023},
  url={https://blog.salesforceairesearch.com/xgen}
}