Swan

A Lightweight Language Model Execution Environment Using FPGA

English | 日本語 | 中文

Swan is an open-source software (OSS) project implemented in C++.
Its goal is to run language models efficiently on general-purpose FPGAs using High-Level Synthesis (HLS), which compiles C++ code into hardware designs.

This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited resources.
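To illustrate what HLS-oriented C++ can look like, here is a minimal sketch of a matrix-vector product, the operation that dominates transformer inference. It is not taken from the Swan sources: the function name `matvec` and the pragma placement are assumptions chosen for illustration. The `#pragma HLS PIPELINE` directive is a hint to an HLS compiler; whether the requested initiation interval is achieved depends on the tool and the target device. A regular compiler such as g++ ignores the pragma (possibly with an unknown-pragma warning), so the same code can also be tested on a CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel, not taken from the Swan sources.
// Computes out = W * x, where W is a (rows x cols) row-major matrix.
void matvec(const float* w, const float* x, float* out,
            std::size_t rows, std::size_t cols) {
  for (std::size_t i = 0; i < rows; ++i) {
    float acc = 0.0f;
    for (std::size_t j = 0; j < cols; ++j) {
// Hint for an HLS compiler: try to start one inner-loop iteration per cycle.
#pragma HLS PIPELINE II=1
      acc += w[i * cols + j] * x[j];
    }
    out[i] = acc;
  }
}

// Convenience wrapper for CPU-side testing with std::vector.
// The number of columns is taken from the length of x.
std::vector<float> matvec(const std::vector<float>& w,
                          const std::vector<float>& x, std::size_t rows) {
  assert(rows > 0 && w.size() == rows * x.size());
  std::vector<float> out(rows);
  matvec(w.data(), x.data(), out.data(), rows, x.size());
  return out;
}
```

For example, calling `matvec({1, 2, 3, 4, 5, 6}, {1, 0, -1}, 2)` multiplies a 2x3 matrix by a 3-element vector and returns a 2-element result.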

Features

  • Versatility: Supports common FPGA boards such as the KV260.
  • Extensibility: The source code is written in C++, making customization and extension easy.
  • Lightweight: Designed with the size constraints of language models in mind, using an efficient architecture.

Dependencies

To build and run Swan, the following tools and libraries are required:

  • CMake
  • g++
  • HLS tools (e.g., Vivado HLS)

Clone & Download Weight Files

To clone the Swan repository, run the following command:

$ git clone git@github.com:turingmotors/swan.git
$ cd swan

Download the 15M-parameter model from huggingface.co/karpathy/tinyllamas, along with the tokenizer file:

$ wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin -O model/stories15M.bin
$ wget https://raw.githubusercontent.com/leloykun/llama2.cpp/master/tokenizer.bin -O model/tokenizer.bin
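The stories15M.bin file uses the llama2.c checkpoint format, which begins with a header of seven 32-bit integers followed by the float32 weights. As a quick sanity check after downloading, the header can be read with a few lines of C++. This is a minimal sketch, not Swan's loader: the names `ModelConfig` and `read_config` are hypothetical, and it assumes a little-endian host, matching the byte order the file was written in.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Header fields of a llama2.c checkpoint, in file order.
// Hypothetical struct name; Swan's own types may differ.
struct ModelConfig {
  std::int32_t dim;         // transformer embedding dimension
  std::int32_t hidden_dim;  // feed-forward hidden dimension
  std::int32_t n_layers;    // number of transformer layers
  std::int32_t n_heads;     // number of query heads
  std::int32_t n_kv_heads;  // number of key/value heads
  std::int32_t vocab_size;  // llama2.c uses the sign to flag weight sharing
  std::int32_t seq_len;     // maximum sequence length
};

static_assert(sizeof(ModelConfig) == 7 * sizeof(std::int32_t),
              "header must be seven packed 32-bit integers");

// Reads the header into cfg. Returns false if the file cannot be opened
// or is shorter than the header.
bool read_config(const std::string& path, ModelConfig& cfg) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  in.read(reinterpret_cast<char*>(&cfg), sizeof(cfg));
  return in.gcount() == static_cast<std::streamsize>(sizeof(cfg));
}
```

Calling `read_config("model/stories15M.bin", cfg)` and printing the fields is a simple way to confirm that the download is complete and not an HTML error page.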

Building

FPGA Environment

See the technical blog for details on building Swan in an FPGA environment.

CPU Environment

$ mkdir -p build && cd build
$ cmake ..
$ make && cd ..

Once the build is complete, you can run Swan with the following command:

$ ./build/swan

Command Line Options

Swan supports the following options:

Usage: ./build/swan [options]
Options:
  --weight_path   : Weight file path
  --vocab_path    : Tokenizer file path
  --max_seq       : Maximum sequence length
  --temp          : Temperature for sampling
  --color         : Enable color output
  --log           : Enable log output
  --help, -h      : Show this help message
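The effect of a sampling temperature such as the one set by --temp can be sketched as follows. The logits are divided by the temperature before the softmax, so lower values concentrate probability on the most likely token and higher values flatten the distribution. This is an illustrative sketch and not Swan's implementation: the function names are hypothetical, and treating a temperature of 0 as greedy (argmax) decoding follows llama2.c, which is an assumption about Swan's behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Softmax over logits / temperature. The maximum logit is subtracted
// before exponentiation for numerical stability.
std::vector<float> softmax_with_temperature(const std::vector<float>& logits,
                                            float temperature) {
  assert(!logits.empty() && temperature > 0.0f);
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp((logits[i] - max_logit) / temperature);
    sum += probs[i];
  }
  for (float& p : probs) p /= sum;
  return probs;
}

// Picks the next token index. Assumption: temperature <= 0 means greedy
// decoding (argmax), as in llama2.c.
std::size_t sample_token(const std::vector<float>& logits, float temperature,
                         std::mt19937& rng) {
  assert(!logits.empty());
  if (temperature <= 0.0f) {
    return static_cast<std::size_t>(std::distance(
        logits.begin(), std::max_element(logits.begin(), logits.end())));
  }
  const std::vector<float> probs = softmax_with_temperature(logits, temperature);
  std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
  return dist(rng);
}
```

With logits `{1.0, 3.0, 2.0}`, a temperature of 0 always selects index 1, while larger temperatures make indices 0 and 2 progressively more likely to be drawn.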

Reference Projects

This project is inspired by llama2.c.

License

This project is released under the Apache License 2.0.

Contributions

Contributions to Swan are very welcome. Please submit feedback and suggestions for improvement through Issues and Pull Requests.
Turing Inc. supports the development of Swan.