This repo demos a simple template for using transformers.js, LangChain.js, and Deep Chat to create a chat app.
```bash
# clone git repository
git clone https://github.com/shizheng-rlfresh/slm-rag.git
# go to the directory and install dependencies
cd slm-rag
npm install
# start a dev server on localhost
npm run dev -- --open
# build app
npm run build
# preview
npm run preview
```
- To import transformers models through transformers.js, you will need a `.onnx` model, e.g., `model.onnx` (preferably a quantized model, e.g., `model_quantized.onnx`). transformers.js recommends using the conversion script to convert a customized model, e.g., your own pretrained or fine-tuned models.
```bash
# create a python virtual environment
python -m venv .venv
# activate .venv and install required packages
source .venv/bin/activate
pip install -r requirements.txt
# run the conversion script on <modelid>
python -m scripts.convert --quantize --model_id <modelid>
```
- Push your custom model to the hub and structure your Hugging Face repo files as follows, with your converted models enclosed in an `onnx` directory.
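A typical layout looks roughly like this (a sketch; the exact tokenizer and config files depend on your model type):

```
<your-username>/<your-model>/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── generation_config.json
└── onnx/
    ├── model.onnx
    └── model_quantized.onnx
```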
- In this demo, we used a custom gpt2-small (124M params) fine-tuned on a conversational dataset, i.e., oasst2. The model was fine-tuned on an NVIDIA Tesla T4 GPU for 20 epochs.
```js
// import model from HuggingFace Hub
import { pipeline } from '@xenova/transformers';

// for CuteChat Demo, we used our own model
const pipe = await pipeline('text-generation', 'shi-zheng-qxhs/gpt2_oasst2_curated_onnx');

// for PDF RAG Demo, we used 'Qwen1.5-0.5B-Chat'
// note this could be a bit slow running on the web
// const pipe = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');
```
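Once loaded, the pipeline can be called directly. A minimal sketch (the prompt and generation options are illustrative; see the transformers.js docs for the full option list):

```js
// generate a response from the loaded pipeline
const output = await pipe('Hello, how are you?', {
  max_new_tokens: 100,
  temperature: 0.7,
  do_sample: true,
});
// text-generation pipelines return [{ generated_text: '...' }]
console.log(output[0].generated_text);
```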
You can either use `pipeline` or `model.generate` as if using transformers in Python. In `chat.js`, we used custom functions to process the user input and model generations, which can be modified based on your own needs. Deep Chat allows using a `handler` in `request` to serve models imported directly from transformers.js. `chat.svelte` shows an example of how we handled custom functions, as well as how we used `requestInterceptor` and `responseInterceptor` to process the (user) input and (model-generated) output.
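A rough sketch of wiring a transformers.js pipeline into a Deep Chat `handler` (the element lookup and message shape here are assumptions; see `chat.svelte` for the actual implementation):

```js
// attach a custom handler to the <deep-chat> element
const chatElement = document.querySelector('deep-chat');
chatElement.request = {
  handler: async (body, signals) => {
    try {
      // take the latest user message from the request body
      const prompt = body.messages[body.messages.length - 1].text;
      const output = await pipe(prompt, { max_new_tokens: 100 });
      // send the generated text back to the chat UI
      signals.onResponse({ text: output[0].generated_text });
    } catch (err) {
      signals.onResponse({ error: 'generation failed' });
    }
  },
};
```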
LangChain.js does not support LLMs through transformers.js as of now, and there are open issues on custom LLMs (it is not hard to implement one yourself; see the sketch at the end of this section). In this demo code, we chose to use a vectorStore from LangChain.js (see `ragloader.js`) and a `pipeline` from transformers.js (see `ragchat.js`).
- The RAG component is implemented in `rag.svelte` with Deep Chat.
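A minimal sketch of this split, assuming `MemoryVectorStore` with `HuggingFaceTransformersEmbeddings` on the LangChain.js side (module paths follow langchain ~0.1 and may differ in your version; the chunk texts, model ids, and prompt format are illustrative):

```js
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { pipeline } from '@xenova/transformers';

// index document chunks with a transformers.js embedding model
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: 'Xenova/all-MiniLM-L6-v2',
});
const store = await MemoryVectorStore.fromTexts(
  ['first document chunk...', 'second document chunk...'],
  [{ id: 1 }, { id: 2 }],
  embeddings,
);

// retrieve the top chunks for a question and stuff them into the prompt
const docs = await store.similaritySearch('What does the document say?', 2);
const context = docs.map((d) => d.pageContent).join('\n');

// generate an answer with a transformers.js pipeline
const ragPipe = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');
const output = await ragPipe(
  `Context:\n${context}\n\nQuestion: What does the document say?\nAnswer:`,
  { max_new_tokens: 128 },
);
console.log(output[0].generated_text);
```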
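And, since a custom LLM is straightforward to implement, here is a sketch of wrapping a transformers.js pipeline in LangChain.js's `LLM` base class (the class name and defaults are ours, not part of either library):

```js
import { LLM } from '@langchain/core/language_models/llms';
import { pipeline } from '@xenova/transformers';

// a custom LangChain LLM backed by a transformers.js text-generation pipeline
class TransformersJsLLM extends LLM {
  constructor(fields = {}) {
    super(fields);
    this.modelId = fields.modelId ?? 'Xenova/Qwen1.5-0.5B-Chat';
    this.pipe = null;
  }

  _llmType() {
    return 'transformers_js';
  }

  async _call(prompt, _options) {
    // lazily load the pipeline on first use
    if (!this.pipe) {
      this.pipe = await pipeline('text-generation', this.modelId);
    }
    const output = await this.pipe(prompt, { max_new_tokens: 128 });
    return output[0].generated_text;
  }
}

// usage: behaves like any other LangChain.js LLM
// const llm = new TransformersJsLLM();
// const answer = await llm.invoke('Hello!');
```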