LLM plugin for models hosted by LiteLLM
First, install the LLM command-line utility.
Now install this plugin in the same environment as LLM.
llm install llm-litellm
You will need an API key from LiteLLM. You can obtain one here.
You can set that as an environment variable called LITELLM_KEY, or add it to the llm set of saved keys using:
llm keys set litellm
Enter key: <paste key here>
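If you use LLM's Python API rather than the CLI, you can also attach a key directly to a model object. This is a minimal sketch; it assumes the plugin's models resolve their key the same way other LLM plugins do (stored key, environment variable, or an explicit model.key):
import llm

# Any LiteLLM-backed model; the plugin registers them under the litellm/ prefix
model = llm.get_model("litellm/openai/gpt-3.5-turbo")

# Explicitly attach a key instead of relying on `llm keys set litellm`
# or the LITELLM_KEY environment variable
model.key = "<paste key here>"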
To list available models, run:
llm models list
You should see a list that looks something like this:
LiteLLM: litellm/openai/gpt-3.5-turbo
LiteLLM: litellm/anthropic/claude-2
LiteLLM: litellm/meta-llama/llama-2-70b-chat
...
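The same list is available from LLM's Python API. A rough sketch that filters the registered models down to the ones provided by this plugin, assuming the litellm/ prefix shown above:
import llm

# llm.get_models() returns every model registered by installed plugins
for model in llm.get_models():
    if model.model_id.startswith("litellm/"):
        print(model.model_id)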
To run a prompt against a model, pass its full model ID to the -m option, like this:
llm -m litellm/anthropic/claude-2 "Five spooky names for a pet tarantula"
You can set a shorter alias for a model using the llm aliases command like so:
llm aliases set claude litellm/anthropic/claude-2
Now you can prompt Claude using:
cat llm_litellm.py | llm -m claude -s 'write some pytest tests for this'
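Prompting works from LLM's Python API too. A sketch using the model IDs shown above, with the system= argument mirroring the -s option:
import llm

model = llm.get_model("litellm/anthropic/claude-2")

# Equivalent of: llm -m litellm/anthropic/claude-2 "Five spooky names for a pet tarantula"
response = model.prompt("Five spooky names for a pet tarantula")
print(response.text())

# Equivalent of piping a file in with a -s system prompt
code = open("llm_litellm.py").read()
response = model.prompt(code, system="write some pytest tests for this")
print(response.text())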
Images are supported too, for some models:
llm -m litellm/anthropic/claude-3.5-sonnet 'describe this image' -a https://static.simonwillison.net/static/2024/pelicans.jpg
llm -m litellm/anthropic/claude-3-haiku 'extract text' -a page.png
Some LiteLLM models can accept image attachments. Run this command:
llm models --options -q litellm
And look for models that list these attachment types:
Attachment types:
application/pdf, image/gif, image/jpeg, image/png, image/webp
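You can inspect roughly the same information from Python. This sketch assumes each model object exposes the attachment_types set that LLM plugins use to declare which attachments they accept:
import llm

# Print every LiteLLM model that declares at least one attachment type
for model in llm.get_models():
    if model.model_id.startswith("litellm/") and model.attachment_types:
        print(model.model_id, sorted(model.attachment_types))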
You can feed these models images as URLs or file paths, for example:
llm -m litellm/google/gemini-flash-1.5 'describe image' \
-a https://static.simonwillison.net/static/2025/two-pelicans.jpg
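The Python equivalent wraps the image in llm.Attachment, which accepts either a URL or a local file path. A sketch:
import llm

model = llm.get_model("litellm/google/gemini-flash-1.5")

response = model.prompt(
    "describe image",
    attachments=[
        llm.Attachment(url="https://static.simonwillison.net/static/2025/two-pelicans.jpg"),
        # or, for a local file:
        # llm.Attachment(path="page.png"),
    ],
)
print(response.text())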
LLM includes support for schemas, allowing you to control the JSON structure of the output returned by the model.
Some of the models provided by LiteLLM are compatible with this feature; see their full list of structured output models for details.
llm-litellm currently enables schema support for the models in that list. Models vary in the quality of their schema support, so test carefully rather than assuming every model will handle your schema correctly.
llm -m litellm/google/gemini-flash-1.5 'invent 3 cool capybaras' \
--schema-multi 'name,bio'
Output:
{
"items": [
{
"bio": "Chill vibes only. Spends most days floating on lily pads, occasionally accepting head scratches from passing frogs.",
"name": "Professor Fluffernutter"
},
{
"bio": "A thrill-seeker! Capybara extraordinaire known for her daring escapes from the local zoo and impromptu skateboarding sessions.",
"name": "Capybara-bara the Bold"
},
{
"bio": "A renowned artist, creating masterpieces using mud, leaves, and her own surprisingly dexterous paws.",
"name": "Michelangelo Capybara"
}
]
}
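Schemas can also be passed from Python via the schema= argument to model.prompt(). This sketch uses llm.schema_dsl(), LLM's helper for the same concise 'name,bio' syntax; if your LLM version lacks it, a full JSON schema dict passed as schema= should work the same way:
import json
import llm

model = llm.get_model("litellm/google/gemini-flash-1.5")

# schema_dsl("name,bio", multi=True) mirrors the --schema-multi 'name,bio' option
response = model.prompt(
    "invent 3 cool capybaras",
    schema=llm.schema_dsl("name,bio", multi=True),
)
capybaras = json.loads(response.text())
print(capybaras["items"][0]["name"])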
LiteLLM offers comprehensive options for controlling which underlying provider your request is routed to.
You can specify these using the LiteLLM JSON format, then pass that to LLM using the -o provider '{JSON goes here}' option:
llm -m litellm/meta-llama/llama-3.1-8b-instruct hi \
-o provider '{"quantizations": ["fp8"]}'
This specifies that you would like only providers that support fp8 quantization for that model.
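From the Python API, plugin options like this are passed as keyword arguments to prompt(). Whether the provider option expects a dict or a JSON string depends on how the plugin declares it, so treat this as a sketch:
import llm

model = llm.get_model("litellm/meta-llama/llama-3.1-8b-instruct")

# Equivalent of: -o provider '{"quantizations": ["fp8"]}'
response = model.prompt("hi", provider={"quantizations": ["fp8"]})
print(response.text())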
LiteLLM has a partnership with Exa, where prompts sent through any supported model can be augmented with relevant search results from the Exa index - a form of RAG.
Enable this feature using the -o online 1 option:
llm -m litellm/mistralai/mistral-small -o online 1 'key events on march 1st 2025'
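The same option can be set from Python, again as a keyword argument to prompt(); this assumes the plugin exposes it under the name online:
import llm

model = llm.get_model("litellm/mistralai/mistral-small")

# Equivalent of: -o online 1
response = model.prompt("key events on march 1st 2025", online=1)
print(response.text())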
Consult the LiteLLM documentation for current pricing.
The llm models -q litellm command will display all available models, or you can use this command to see more detailed JSON:
llm litellm models
Output starts like this:
- id: latitudegames/wayfarer-large-70b-llama-3.3
name: LatitudeGames: Wayfarer Large 70B Llama 3.3
context_length: 128,000
architecture: text->text Llama3
pricing: prompt $0.7/M, completion $0.7/M
- id: thedrummer/skyfall-36b-v2
name: TheDrummer: Skyfall 36B V2
context_length: 64,000
architecture: text->text Other
pricing: prompt $0.5/M, completion $0.5/M
- id: microsoft/phi-4-multimodal-instruct
name: Microsoft: Phi 4 Multimodal Instruct
context_length: 131,072
architecture: text+image->text Other
pricing: prompt $0.07/M, completion $0.14/M, image $0.2476/K
Add --json to get back JSON instead, which looks like this:
[
{
"id": "microsoft/phi-4-multimodal-instruct",
"name": "Microsoft: Phi 4 Multimodal Instruct",
"created": 1741396284,
"description": "Phi-4 Multimodal Instruct is a versatile...",
"context_length": 131072,
"architecture": {
"modality": "text+image->text",
"tokenizer": "Other",
"instruct_type": null
},
"pricing": {
"prompt": "0.00000007",
"completion": "0.00000014",
"image": "0.0002476",
"request": "0",
"input_cache_read": "0",
"input_cache_write": "0",
"web_search": "0",
"internal_reasoning": "0"
},
"top_provider": {
"context_length": 131072,
"max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null
}
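Because the --json output is plain JSON on stdout, it is easy to post-process. A sketch that shells out to the command and prints each model's ID alongside its prompt price, assuming the llm CLI is on your PATH and using the fields shown above:
import json
import subprocess

# Run the plugin's models command and capture its JSON output
output = subprocess.run(
    ["llm", "litellm", "models", "--json"],
    capture_output=True, text=True, check=True,
).stdout

for model in json.loads(output):
    prompt_price = model["pricing"]["prompt"]
    print(f'{model["id"]}: ${prompt_price}/token (prompt)')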
Add --free for a list of just the models that are available for free.
llm litellm models --free
The llm litellm key command shows you information about your current API key, including rate limits:
llm litellm key
Example output:
{
"label": "sk-or-v1-0fa...240",
"limit": null,
"usage": 0.65017511,
"limit_remaining": null,
"is_free_tier": false,
"rate_limit": {
"requests": 40,
"interval": "10s"
}
}
This will default to inspecting the key you have set using llm keys set litellm or using the LITELLM_KEY environment variable.
You can inspect a different key by passing the key itself - or the name of the key in the llm keys list - as the --key option:
llm litellm key --key sk-xxx
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-litellm
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
llm install -e '.[test]'
To run the tests:
pytest
To update recordings and snapshots, run:
PYTEST_LITELLM_KEY="$(llm keys get litellm)" \
pytest --record-mode=rewrite --inline-snapshot=fix