llm-mlc

LLM plugin for running models using MLC

Installation

Install this plugin in the same environment as llm.

llm install llm-mlc

You need to install two dependencies manually - mlc-chat-nightly and mlc-ai-nightly - because the installation process differs from one platform to another in a way that is not yet automated.

The steps for this are described in detail on the mlc.ai/package site.

If you are on an Apple Silicon M1/M2 Mac you can run this command:

llm mlc pip install --pre --force-reinstall \
  mlc-ai-nightly \
  mlc-chat-nightly \
  -f https://mlc.ai/wheels

The llm mlc pip command here ensures that pip will run in the same virtual environment as llm itself.
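This matters because a bare pip on your PATH may belong to a different Python environment from the one llm is installed in. The sketch below illustrates the general technique of invoking pip through a specific interpreter. It is an assumption about how such a wrapper can work, not the plugin's actual implementation, and build_pip_command is a hypothetical helper name.

```python
import sys


def build_pip_command(args):
    """Return an argv list that runs pip under the current interpreter.

    Hypothetical helper for illustration only; not part of llm or llm-mlc.
    """
    # "python -m pip" always targets the environment of that python binary,
    # whereas a bare "pip" resolves to whatever comes first on PATH.
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    cmd = build_pip_command(
        [
            "install",
            "--pre",
            "--force-reinstall",
            "mlc-ai-nightly",
            "mlc-chat-nightly",
            "-f",
            "https://mlc.ai/wheels",
        ]
    )
    # Print the command instead of running it, so nothing gets installed.
    print(" ".join(cmd))
```

Run with the Python interpreter that llm uses, this prints a pip command bound to that same environment.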

For other systems, follow the instructions on the mlc.ai/package site.

Finally, run the llm mlc setup command to complete the installation:

llm mlc setup

This will set up Git LFS and use it to download some extra dependencies:

Git LFS is not installed. Should I run 'git lfs install' for you?
Install Git LFS? [y/N]: y
Updated Git hooks.
Git LFS initialized.
Downloading prebuilt binaries...
Cloning into '/Users/simon/Library/Application Support/io.datasette.llm/mlc/dist/prebuilt/lib'...
remote: Enumerating objects: 221, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 221 (delta 59), reused 56 (delta 32), pack-reused 135
Receiving objects: 100% (221/221), 52.06 MiB | 9.13 MiB/s, done.
Resolving deltas: 100% (152/152), done.
Updating files: 100% (60/60), done.
Ready to install models in /Users/simon/Library/Application Support/io.datasette.llm/mlc
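The transcript above starts with a Git LFS check. If you want to know ahead of time whether the git lfs command is present, a minimal sketch along these lines may help. It only tests that git lfs version runs successfully, which may differ from whatever check llm mlc setup performs internally, and git_lfs_available is a hypothetical helper name.

```python
import shutil
import subprocess


def git_lfs_available():
    """Return True if the `git lfs` subcommand can be run on this machine."""
    if shutil.which("git") is None:
        return False  # git itself is missing
    try:
        result = subprocess.run(
            ["git", "lfs", "version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return False
    # git exits non-zero when "lfs" is not a known subcommand.
    return result.returncode == 0


if __name__ == "__main__":
    print("Git LFS available:", git_lfs_available())
```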

Installing models

After installation you will need to download a model using the llm mlc download-model command.

Here's how to download and install Llama 2:

llm mlc download-model Llama-2-7b-chat --alias llama2

This will download around 8GB of content.

You can also use Llama-2-13b-chat (about 15.15GB) or Llama-2-70b-chat (extremely large), both of which are much bigger downloads.
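Given these sizes, it can be worth checking free disk space before starting a download. The sketch below uses the approximate sizes quoted in this section and treats 1GB as 10^9 bytes. The has_room_for helper and the MODEL_SIZES_GB table are hypothetical names for illustration, and the default of checking your home directory is an assumption; pass the directory where your models are actually stored.

```python
import shutil
from pathlib import Path

# Approximate download sizes quoted in this README, in gigabytes.
MODEL_SIZES_GB = {
    "Llama-2-7b-chat": 8.0,
    "Llama-2-13b-chat": 15.15,
}


def has_room_for(model, path=None):
    """Return True if `path` has at least the quoted download size free."""
    # Assumption: fall back to the home directory when no path is given.
    target = Path(path) if path is not None else Path.home()
    free_gb = shutil.disk_usage(target).free / 1e9
    return free_gb >= MODEL_SIZES_GB[model]


if __name__ == "__main__":
    for name in MODEL_SIZES_GB:
        print(name, "fits" if has_room_for(name) else "does not fit")
```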

The -a/--alias option is not required, but it lets you set a shorter alias for the model, which you can then pass to llm -m <alias> instead of the full name.

The download-model command also takes a URL to one of the MLC repositories on Hugging Face.

For example, to install mlc-chat-WizardLM-13B-V1.2-q4f16_1:

llm mlc download-model https://huggingface.co/mlc-ai/mlc-chat-WizardLM-13B-V1.2-q4f16_1

You can see a full list of models you have installed this way using:

llm mlc models

This will also show the name of the model you should use to activate it, e.g.:

MlcModel: mlc-chat-Llama-2-7b-chat-hf-q4f16_1 (aliases: llama2, Llama-2-7b-chat)
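If you want to read this output from a script, the following sketch splits one such line into the model name and its aliases. The pattern is inferred only from the single example line above, so other lines may need a different pattern, and parse_model_line is a hypothetical helper name.

```python
import re

# Pattern inferred from the one example line in this README; the aliases
# part is treated as optional in case a model has none.
LINE_RE = re.compile(
    r"^MlcModel: (?P<name>\S+)(?: \(aliases: (?P<aliases>[^)]*)\))?$"
)


def parse_model_line(line):
    """Return (name, [aliases]) for a matching line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    aliases = match.group("aliases")
    alias_list = [a.strip() for a in aliases.split(",")] if aliases else []
    return match.group("name"), alias_list


if __name__ == "__main__":
    example = (
        "MlcModel: mlc-chat-Llama-2-7b-chat-hf-q4f16_1 "
        "(aliases: llama2, Llama-2-7b-chat)"
    )
    print(parse_model_line(example))
```

Either the full name or any of the listed aliases can then be passed to llm -m.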

Running a prompt through a model

Once you have downloaded and added a model, you can run a prompt like this:

llm -m Llama-2-7b-chat 'five names for a cute pet ferret'

Great! Here are five cute and creative name suggestions for a pet ferret:

Ferbie - a playful and affectionate name for a friendly and outgoing ferret.

Mr. Whiskers - a suave and sophisticated name for a well-groomed and dignified ferret.

Luna - a celestial and dreamy name for a curious and adventurous ferret.

Felix - a cheerful and energetic name for a lively and mischievous ferret.

Sprinkles - a fun and playful name for a happy and energetic ferret with a sprinkle of mischief.

Remember, the most important thing is to choose a name that you and your ferret will love and enjoy!

To send a follow-up prompt that continues the current conversation, use -c:

llm -c 'two more'

Of course! Here are two more cute name ideas for a pet ferret:

Digger - a fun and playful name that suits a pet that loves to dig and burrow, and is also a nod to the ferret's natural instincts as a burrower.

Gizmo - a fun and quirky name that suits a pet with a curious and mischievous personality, and is also a nod to the ferret's playful and inventive nature.

Model options

These options are available for all models. They mostly take a floating point value between 0.0 and 1.0.

-o temperature: A higher temperature encourages more diverse outputs, while a lower temperature produces more deterministic outputs.
-o top_p: At each step, we select tokens from the minimal set that has a cumulative probability exceeding this value.
-o repetition_penalty: Controls the likelihood of the model generating repeated texts.
-o max_gen_len: Takes an integer, which controls the maximum length of the generated text.

Use them like this:

llm -m Llama-2-7b-chat \
  -o temperature 0.5 \
  -o top_p 0.9 \
  -o repetition_penalty 0.9 \
  -o max_gen_len 100 \
  'five names for a cute pet ferret'

The MLC documentation has more details on these options.
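To build some intuition for temperature and top_p, here is a small sketch of how these two settings are commonly applied when picking the next token. It illustrates the general technique on a toy list of scores and is not MLC's implementation; sample_token is a hypothetical function, and treating a temperature of 0 as greedy selection is an assumption of this sketch.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick an index from `logits` using temperature and top-p sampling."""
    if temperature <= 0:
        # Assumption: treat zero temperature as always taking the top score.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature rescales the scores: values below 1 sharpen the
    # distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract peak for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample from the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]
    print(sample_token(toy_logits, temperature=0.5, top_p=0.9))
```

Lower temperature and lower top_p both concentrate choices on the highest-scoring tokens, which is why they tend to produce more deterministic output.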

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-mlc
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests: