/slm-rag

A very simple SvelteKit template for chatting with your own fine-tuned transformers models

Primary language: JavaScript. License: MIT.


This repo is a simple template that uses transformers.js, LangChain.js, and Deep Chat to build a demo chat app.

Live Demo

download project

# clone git repository
git clone https://github.com/shizheng-rlfresh/slm-rag.git
# go to the directory and install dependencies
cd slm-rag
npm install

development mode

# start a dev server on localhost
npm run dev -- --open

# build app
npm run build

# preview
npm run preview

model options

  • To import transformers models through transformers.js, you need an .onnx model, e.g., model.onnx (preferably a quantized one, e.g., model_quantized.onnx).

    # create a python virtual environment
    python -m venv .venv
    # activate .venv and install required packages
    source .venv/bin/activate
    pip install -r requirements.txt
    # run the conversion script, replacing <modelid> with your model id
    python -m scripts.convert --quantize --model_id <modelid>

    • Push your custom model to the Hugging Face Hub and structure the repo files as follows, with your converted models placed in an onnx directory. Example hub repo file structure

    • In this demo, we used a custom gpt2-small (124M parameters) fine-tuned on a conversational dataset, oasst2. The model was fine-tuned on an NVIDIA Tesla T4 GPU for 20 epochs.

    // import model from HuggingFace Hub
    import { pipeline } from '@xenova/transformers';
    // for CuteChat Demo, we used our own model
    // (the two pipelines get distinct names, since a const cannot be declared twice)
    const chatPipe = await pipeline('text-generation', 'shi-zheng-qxhs/gpt2_oasst2_curated_onnx');
    // for PDF RAG Demo, we used 'Qwen1.5-0.5B-Chat'
    // Note: this can be a bit slow when running in the browser
    const ragPipe = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');

    You can use either pipeline or model.generate, much as you would with transformers in Python. In chat.js, we use custom functions to process the user input and the model generations; modify them to suit your own needs.
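
As a rough illustration of the kind of input and output processing described above, the sketch below builds a prompt from the chat history and strips the echoed prompt from the generated text. The names buildPrompt and extractReply and the "User:" / "Assistant:" turn labels are assumptions made for this example, not the contents of chat.js. The only library behavior it relies on is that a text-generation pipeline returns an array of objects with a generated_text field, which typically begins with the prompt.

```javascript
// Hypothetical helpers; names and prompt format are illustrative assumptions.

// Turn prior turns plus the new user message into a single prompt string.
// `history` is an array of { role: 'user' | 'ai', text: string }.
function buildPrompt(history, userText) {
  const lines = history.map(
    (m) => `${m.role === 'user' ? 'User' : 'Assistant'}: ${m.text.trim()}`
  );
  lines.push(`User: ${userText.trim()}`);
  lines.push('Assistant:');
  return lines.join('\n');
}

// Pull the newly generated reply out of a text-generation result,
// which has the shape [{ generated_text: '...' }].
function extractReply(output, prompt) {
  const full = output?.[0]?.generated_text ?? '';
  // Drop the echoed prompt if the generated text starts with it.
  const tail = full.startsWith(prompt) ? full.slice(prompt.length) : full;
  // Cut at the point where the model starts inventing a new "User:" turn.
  const stop = tail.indexOf('\nUser:');
  return (stop === -1 ? tail : tail.slice(0, stop)).trim();
}

// Usage with a real pipeline (not executed here), where `generator` is a
// text-generation pipeline from transformers.js:
//   const prompt = buildPrompt(history, 'Hello!');
//   const output = await generator(prompt, { max_new_tokens: 64 });
//   const reply = extractReply(output, prompt);
```

Keeping these helpers free of any library import makes them easy to unit test and to adapt when the fine-tuned model expects a different turn format.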

    • Deep Chat allows a handler in request, which lets you use models imported directly from transformers.js. chat.svelte shows an example of how we handle custom functions, and how we use requestInterceptor and responseInterceptor to process the (user) input and (model-generated) output.
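
A minimal sketch of such a handler follows, assuming Deep Chat calls it with the request body (containing a messages array of objects with a text field) and a signals object whose onResponse method delivers the reply to the chat. The factory name makeHandler and the generateReply parameter are hypothetical and are not taken from chat.svelte.

```javascript
// Hypothetical factory: `generateReply` is any async function (text) => string,
// for example a thin wrapper around a transformers.js pipeline.
function makeHandler(generateReply) {
  // Deep Chat invokes the handler with the request body and a signals object.
  return async function handler(body, signals) {
    try {
      // Treat the most recent message as the user input.
      const messages = body?.messages ?? [];
      const last = messages[messages.length - 1];
      const userText = (last?.text ?? '').trim();
      if (!userText) {
        signals.onResponse({ error: 'Empty message' });
        return;
      }
      const reply = await generateReply(userText);
      signals.onResponse({ text: reply });
    } catch (err) {
      // Surface failures in the chat instead of leaving the request pending.
      signals.onResponse({ error: String(err?.message ?? err) });
    }
  };
}

// Usage (not executed here): pass { handler: makeHandler(myGenerate) }
// as the request property of the Deep Chat element.
```

Passing the generation function in as an argument keeps the handler independent of any particular model, so the same handler can serve both demos.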

    langchain.js does not support LLMs through transformers.js as of now (there are open issues on custom LLMs, though implementing a custom LLM is not hard). In this demo code, we chose to use vectorStore from langchain.js (see ragloader.js) and pipeline from transformers.js (see ragchat.js).

    • The RAG component is implemented in rag.svelte with Deep Chat.
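
To make the split between the vector store and the generation pipeline concrete, here is a minimal sketch of the glue between them. It assumes the vector store exposes similaritySearch(query, k) resolving to documents with a pageContent field, as langchain.js vector stores do. The function name answerWithContext, the generate parameter, and the prompt wording are assumptions for illustration and are not the code in ragloader.js or ragchat.js.

```javascript
// Hypothetical helper; the prompt wording and parameter names are assumptions.
// `vectorStore` needs a similaritySearch(query, k) method;
// `generate` is any async function (prompt) => string.
async function answerWithContext(vectorStore, generate, question, k = 3) {
  // Retrieve the k chunks most similar to the question.
  const docs = await vectorStore.similaritySearch(question, k);
  // Number the chunks so the answer can be traced back to its sources.
  const context = docs.map((d, i) => `[${i + 1}] ${d.pageContent}`).join('\n');
  const prompt =
    'Answer the question using only the context below.\n\n' +
    `Context:\n${context}\n\n` +
    `Question: ${question}\nAnswer:`;
  const answer = await generate(prompt);
  return { prompt, answer, sources: docs };
}
```

With a small model, a low k keeps the prompt short, which matters because generation in the browser can be slow.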