
A list of free LLM inference resources accessible via API.


Free LLM API resources

This list covers services that provide free access to, or credits for, API-based LLM usage.

Note

Please don't abuse these services, or we might lose them.

Warning

This list explicitly excludes services that are not legitimate (e.g., ones that reverse-engineer an existing chatbot).

Free Providers

Each provider is listed with any provider-wide limits or notes, followed by its models and their per-model limits.
Groq
  Distil Whisper Large v3: 7200 audio-seconds/minute, 2000 requests/day
  Gemma 2 9B Instruct: 14400 requests/day, 15000 tokens/minute
  Gemma 7B Instruct: 14400 requests/day, 15000 tokens/minute
  LLaVA 1.5 7B: 14400 requests/day, 30000 tokens/minute
  Llama 3 70B: 14400 requests/day, 6000 tokens/minute
  Llama 3 70B - Groq Tool Use Preview: 14400 requests/day, 15000 tokens/minute
  Llama 3 8B: 14400 requests/day, 30000 tokens/minute
  Llama 3 8B - Groq Tool Use Preview: 14400 requests/day, 15000 tokens/minute
  Llama 3.1 70B: 14400 requests/day, 20000 tokens/minute
  Llama 3.1 8B: 14400 requests/day, 20000 tokens/minute
  Llama Guard 3 8B: 14400 requests/day, 15000 tokens/minute
  Mixtral 8x7B: 14400 requests/day, 5000 tokens/minute
  Whisper Large v3: 7200 audio-seconds/minute, 2000 requests/day
OpenRouter
  Gemma 2 9B Instruct: 20 requests/minute, 200 requests/day
  Hermes 3 Llama 3.1 405B: 20 requests/minute, 200 requests/day
  Llama 3 8B Instruct: 20 requests/minute, 200 requests/day
  Llama 3.1 8B Instruct: 20 requests/minute, 200 requests/day
  Mistral 7B Instruct: 20 requests/minute, 200 requests/day
  Mythomist 7B: 20 requests/minute, 200 requests/day
  OpenChat 7B: 20 requests/minute, 200 requests/day
  Phi-3 Medium 128k Instruct: 20 requests/minute, 200 requests/day
  Phi-3 Mini 128k Instruct: 20 requests/minute, 200 requests/day
  Qwen 2 7B Instruct: 20 requests/minute, 200 requests/day
  Reflection Llama 3.1 70B: 20 requests/minute, 200 requests/day
  Toppy M 7B: 20 requests/minute, 200 requests/day
  Zephyr 7B Beta: 20 requests/minute, 200 requests/day
Google AI Studio
  Provider notes: Data is used for training when the service is used outside of the UK/CH/EEA/EU.
  Gemini 1.5 Flash: 1000000 tokens/minute, 1500 requests/day, 15 requests/minute
  Gemini 1.5 Flash (Experimental): 1000000 tokens/minute, 1500 requests/day, 5 requests/minute
  Gemini 1.5 Pro: 32000 tokens/minute, 50 requests/day, 2 requests/minute
  Gemini 1.5 Pro (Experimental): 1000000 tokens/minute, 50 requests/day, 2 requests/minute
  Gemini 1.5 Flash-8B (Experimental): 1000000 tokens/minute, 1500 requests/day, 15 requests/minute
  Gemini 1.0 Pro: 32000 tokens/minute, 1500 requests/day, 15 requests/minute
  text-embedding-004: 150 batch requests/minute, 1500 requests/minute, 100 contents/batch
  embedding-001
Google Cloud Vertex AI
  Provider notes: Very stringent payment verification for Google Cloud.
  Llama 3.1 405B Instruct: Llama 3.1 API Service free during preview; 60 requests/minute
  Gemini Flash Experimental: Experimental Gemini model; 10 requests/minute
  Gemini Pro Experimental
glhf.chat (Free Beta)
  Provider notes: Email them for API access.
  Any model on Hugging Face that runs on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8
Cohere
  Provider limits: 20 requests/minute, 1000 requests/month
  Command-R: shares the provider limit
  Command-R+: shares the provider limit
HuggingFace Serverless Inference
  Provider limits: Dynamic rate limits. Limited to models smaller than 10GB, although some popular models are supported even if they exceed 10GB.
  Various open models
OVH AI Endpoints (Free Alpha)
  Provider notes: Token expires every 2 weeks.
  CodeLlama 13B Instruct: 12 requests/minute
  Llama 2 13B Chat: 12 requests/minute
  Llama 3 70B Instruct: 12 requests/minute
  Llama 3 8B Instruct: 12 requests/minute
  Mistral 7B Instruct: 12 requests/minute
  Mixtral 8x22B Instruct: 12 requests/minute
  Mixtral 8x7B Instruct: 12 requests/minute
Cloudflare Workers AI
  Provider limits: 10000 neurons/day. Beta models have unlimited usage. Typically 300 requests/minute for text models.
  Deepseek Coder 6.7B Base (AWQ)
  Deepseek Coder 6.7B Instruct (AWQ)
  Deepseek Math 7B Instruct
  DiscoLM German 7B v1 (AWQ)
  Falcon 7B Instruct
  Gemma 2B Instruct (LoRA)
  Gemma 7B Instruct
  Gemma 7B Instruct (LoRA)
  Hermes 2 Pro Mistral 7B
  Llama 2 13B Chat (AWQ)
  Llama 2 7B Chat (FP16)
  Llama 2 7B Chat (INT8)
  Llama 2 7B Chat (LoRA)
  Llama 3 8B Instruct
  Llama 3 8B Instruct
  Llama 3 8B Instruct (AWQ)
  Llama 3.1 8B Instruct
  Llama 3.1 8B Instruct (AWQ)
  Llama 3.1 8B Instruct (FP8)
  LlamaGuard 7B (AWQ)
  Mistral 7B Instruct v0.1
  Mistral 7B Instruct v0.1 (AWQ)
  Mistral 7B Instruct v0.2
  Mistral 7B Instruct v0.2 (LoRA)
  Neural Chat 7B v3.1 (AWQ)
  OpenChat 3.5 0106
  OpenHermes 2.5 Mistral 7B (AWQ)
  Phi-2
  Qwen 1.5 0.5B Chat
  Qwen 1.5 1.8B Chat
  Qwen 1.5 14B Chat (AWQ)
  Qwen 1.5 7B Chat (AWQ)
  SQLCoder 7B 2
  Starling LM 7B Beta
  TinyLlama 1.1B Chat v1.0
  Una Cybertron 7B v2 (BF16)
  Zephyr 7B Beta (AWQ)
Lambda Labs (Free Preview)
  Provider notes: Free for a limited time.
  Nous Hermes 3 Llama 3.1 405B (FP8)
Mistral (Codestral)
  Provider notes: Currently free to use; subscription is monthly; requires phone number verification.
  Codestral: 30 requests/minute, 2000 requests/day
Cerebras
  Provider notes: Waitlist. Free tier restricted to 8K context.
  Llama 3.1 8B:
    30 requests/minute, 60000 tokens/minute
    900 requests/hour, 1000000 tokens/hour
    14400 requests/day, 1000000 tokens/day
  Llama 3.1 70B:
    30 requests/minute, 60000 tokens/minute
    900 requests/hour, 1000000 tokens/hour
    14400 requests/day, 1000000 tokens/day
SambaNova Cloud
  Llama 3.1 405B
  Llama 3.1 70B
  Llama 3.1 8B
GitHub Models
  Provider notes: Waitlist. Rate limits depend on the Copilot subscription tier.
  AI21-Jamba-Instruct
  Cohere Command R
  Cohere Command R+
  Cohere Embed v3 English
  Cohere Embed v3 Multilingual
  Meta-Llama-3-70B-Instruct
  Meta-Llama-3-8B-Instruct
  Meta-Llama-3.1-405B-Instruct
  Meta-Llama-3.1-70B-Instruct
  Meta-Llama-3.1-8B-Instruct
  Mistral Large
  Mistral Large (2407)
  Mistral Nemo
  Mistral Small
  OpenAI GPT-4o
  OpenAI GPT-4o mini
  OpenAI Text Embedding 3 (large)
  OpenAI Text Embedding 3 (small)
  Phi-3-medium instruct (128k)
  Phi-3-medium instruct (4k)
  Phi-3-mini instruct (128k)
  Phi-3-mini instruct (4k)
  Phi-3-small instruct (128k)
  Phi-3-small instruct (8k)
  Phi-3.5-mini instruct (128k)
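
Several providers above cap both requests and tokens per window, so sustained throughput is set by whichever limit runs out first. The sketch below estimates a steady requests-per-minute rate from a model's listed limits plus an assumed average request size. The function name `sustainable_rpm` and the 1,000-tokens-per-request figure are illustrative assumptions. The calculation ignores bursts and any provider-specific token accounting.

```python
from typing import Optional

MINUTES_PER_DAY = 24 * 60


def sustainable_rpm(
    avg_tokens_per_request: float,
    requests_per_minute: Optional[float] = None,
    requests_per_day: Optional[float] = None,
    tokens_per_minute: Optional[float] = None,
) -> float:
    """Highest steady request rate (requests/minute) that stays under every given limit.

    Hypothetical helper: a limit passed as None is treated as absent.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    candidates = []
    if requests_per_minute is not None:
        candidates.append(requests_per_minute)
    if requests_per_day is not None:
        # A daily quota spread evenly over 1440 minutes.
        candidates.append(requests_per_day / MINUTES_PER_DAY)
    if tokens_per_minute is not None:
        candidates.append(tokens_per_minute / avg_tokens_per_request)
    if not candidates:
        raise ValueError("at least one limit is required")
    return min(candidates)


# Groq, Llama 3 70B (14400 requests/day, 6000 tokens/minute), assuming ~1000 tokens/request.
print(sustainable_rpm(1000, requests_per_day=14400, tokens_per_minute=6000))  # 6.0

# Google AI Studio, Gemini 1.5 Pro (2 requests/minute, 50 requests/day, 32000 tokens/minute).
# About 0.035: here the 50 requests/day cap is what binds.
print(sustainable_rpm(1000, requests_per_minute=2, requests_per_day=50, tokens_per_minute=32000))
```

With ~1,000-token requests, Groq's 14400 requests/day averages 10 requests/minute. However, the 6000 tokens/minute cap on Llama 3 70B allows only about 6.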
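
Some limits above apply over several windows at once. Cerebras, for example, lists 30 requests/minute, 900 requests/hour and 14400 requests/day, each paired with a token cap. One way to stay inside all of them on the client side is a sliding-window log. It records each admitted request's timestamp and token count, and admits a new request only if every window still has room.

This is a minimal sketch with several assumptions:
- `Window` and `MultiWindowLimiter` are hypothetical names.
- Windows are treated as rolling; a provider may instead reset on fixed clock boundaries.
- The caller supplies a token estimate, since output tokens are not known until the response arrives.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class Window:
    """One rolling limit window; None means no cap for that dimension."""
    seconds: float
    max_requests: Optional[int] = None
    max_tokens: Optional[int] = None


class MultiWindowLimiter:
    """Hypothetical client-side limiter that enforces several rolling windows at once."""

    def __init__(self, windows: List[Window], clock: Callable[[], float] = time.monotonic) -> None:
        self.windows = windows
        self.clock = clock
        # (timestamp, tokens) for every request admitted so far.
        self.log: Deque[Tuple[float, int]] = deque()
        self.horizon = max(w.seconds for w in windows)

    def try_acquire(self, tokens: int) -> bool:
        """Admit and record the request if every window has room; otherwise return False."""
        now = self.clock()
        # Entries older than the longest window can no longer affect any limit.
        while self.log and now - self.log[0][0] >= self.horizon:
            self.log.popleft()
        for w in self.windows:
            in_window = [t for ts, t in self.log if now - ts < w.seconds]
            if w.max_requests is not None and len(in_window) >= w.max_requests:
                return False
            if w.max_tokens is not None and sum(in_window) + tokens > w.max_tokens:
                return False
        self.log.append((now, tokens))
        return True


# Cerebras free-tier limits for Llama 3.1 8B/70B, as listed above.
cerebras = MultiWindowLimiter([
    Window(60, max_requests=30, max_tokens=60_000),
    Window(3_600, max_requests=900, max_tokens=1_000_000),
    Window(86_400, max_requests=14_400, max_tokens=1_000_000),
])
print(cerebras.try_acquire(tokens=1_500))  # True: the log is empty, so every window has room
```

A caller that gets `False` would sleep briefly and retry. Because one instance tracks a whole quota, a single shared instance could also model a limit shared across models, such as Cohere's.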
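
The glhf.chat entry says Llama 3.1 405B fits on an A100 node at FP8. Rough arithmetic backs this up: weight memory is about the parameter count times the bytes per parameter. The sketch below makes two assumptions. First, the node is eight 80 GB A100s, matching the ~640GB figure. Second, only weights are counted. The KV cache, activations and runtime overhead need extra headroom, so weights that barely fit do not guarantee a model can be served.

```python
# Approximate storage per parameter for common weight formats (bytes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_gb(num_params: float, dtype: str) -> float:
    """Rough size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: an A100 node of eight 80 GB GPUs, matching the ~640GB figure.
NODE_VRAM_GB = 8 * 80

for dtype in ("fp16", "fp8"):
    size = weight_gb(405e9, dtype)
    print(f"Llama 3.1 405B @ {dtype}: ~{size:.0f} GB of weights, fits: {size < NODE_VRAM_GB}")
# Llama 3.1 405B @ fp16: ~810 GB of weights, fits: False
# Llama 3.1 405B @ fp8: ~405 GB of weights, fits: True
```

The same estimate gives about 14 GB for a 7B model at FP16 (`weight_gb(7e9, "fp16")`). That is above the 10GB Hugging Face serverless limit, assuming the limit is measured on weight size.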

Providers with trial credits

Each entry lists the provider, its trial credits and any requirements, followed by the available models.
Mistral (2-week trial; limits: 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month): Mistral open/proprietary models
Together ($5): Various open models
Fireworks ($1): Various open models
OctoAI ($10): Various open models
Unify ($10, plus $40 after getting in contact with them): Routes to other providers; various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.)
DeepInfra ($1.80): Various open models
NVIDIA NIM (1000 API calls): Various open models
AI21 ($10 for 3 months): Jamba/Jurassic-2
NLP Cloud ($15; requires phone number verification): Various open models
Hyperbolic ($10):
  DeepSeek V2.5
  Hermes 3 Llama 3.1 70B
  Llama 3 70B Instruct
  Llama 3.1 405B Base
  Llama 3.1 405B Base (FP8)
  Llama 3.1 405B Instruct
  Llama 3.1 70B Instruct
  Llama 3.1 8B Instruct
  Pixtral 12B (2409)
  Qwen2-VL 7B Instruct
  Reflection Llama 3.1 70B
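
In the Mistral trial listed above, the monthly token cap binds well within the two-week trial if the per-minute cap is saturated. The arithmetic below uses only the listed figures. It ignores how Mistral actually counts input versus output tokens.

```latex
\frac{1{,}000{,}000{,}000\ \text{tokens}}{500{,}000\ \text{tokens/minute}} = 2000\ \text{minutes} \approx 33.3\ \text{hours},
\qquad
\frac{500{,}000\ \text{tokens/minute}}{60\ \text{requests/minute}} \approx 8333\ \text{tokens/request}
```

The second figure is the average request size above which the tokens/minute limit, rather than the 1 request/second limit, becomes the constraint.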