Embformer: An Embedding-Weight-Only Transformer Architecture

This is the official implementation of Embformer: An Embedding-Weight-Only Transformer Architecture.

Getting Started

Install the dependencies from the terminal:

pip install -r requirements.txt

The following code snippet illustrates how to use the model to generate content from given inputs.

from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name = "HighCWu/Embformer-MiniMind-Base-0.1B"
# model_name = "HighCWu/Embformer-MiniMind-Seqlen512-0.1B"
model_name = "HighCWu/Embformer-MiniMind-0.1B"
# model_name = "HighCWu/Embformer-MiniMind-RLHF-0.1B"
# model_name = "HighCWu/Embformer-MiniMind-R1-0.1B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    cache_dir=".cache"
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
    cache_dir=".cache"
)

# prepare the model input
prompt = "请为我讲解“大语言模型”这个概念。"  # "Please explain the concept of 'large language models' to me."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    input_ids=model_inputs['input_ids'],
    attention_mask=model_inputs['attention_mask'],
    max_new_tokens=8192
)
output_ids = generated_ids[0][len(model_inputs['input_ids'][0]):].tolist()

print(tokenizer.decode(output_ids, skip_special_tokens=True))
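
For interactive use, the output can also be streamed token by token with transformers' built-in TextStreamer. This is a minimal sketch, not part of the official example; it reuses the model, tokenizer, and model_inputs prepared above.

from transformers import TextStreamer

# print decoded tokens to stdout as they are generated;
# skip_prompt avoids echoing the input prompt back
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids=model_inputs['input_ids'],
    attention_mask=model_inputs['attention_mask'],
    max_new_tokens=8192,
    streamer=streamer
)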

Training

The training code is adapted from MiniMind and is integrated as the git submodule embformer-minimind.
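
To fetch the training code, initialize the submodule after cloning the repository; the exact training entry points are documented in the submodule's own README and are not spelled out here.

git submodule update --init --recursive
cd embformer-minimind
# follow this submodule's README for the MiniMind-style training scripts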

License Agreement

All our open-weight models are licensed under Apache 2.0. You can find the license files in the respective Hugging Face repositories.

Citation

If you find our work helpful, please consider citing it:

@manual{wu_2025_15736957,
  title        = {Embformer: An Embedding-Weight-Only Transformer Architecture},
  author       = {Wu, Hecong},
  month        = jun,
  year         = 2025,
  doi          = {10.5281/zenodo.15736957},
  url          = {https://doi.org/10.5281/zenodo.15736957}
}