This is a list of services that provide free access to, or credits towards, API-based LLM usage.

Note: Please don't abuse these services, or we might lose them.

Warning: This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).

| Provider | Provider Limits/Notes | Model Name | Model Limits |
| --- | --- | --- | --- |
| Groq | | Distil Whisper Large v3 | 7200 audio-seconds/minute; 2000 requests/day |
| | | Gemma 2 9B Instruct | 14400 requests/day; 15000 tokens/minute |
| | | Gemma 7B Instruct | 14400 requests/day; 15000 tokens/minute |
| | | LLaVA 1.5 7B | 14400 requests/day; 30000 tokens/minute |
| | | Llama 3 70B | 14400 requests/day; 6000 tokens/minute |
| | | Llama 3 70B - Groq Tool Use Preview | 14400 requests/day; 15000 tokens/minute |
| | | Llama 3 8B | 14400 requests/day; 30000 tokens/minute |
| | | Llama 3 8B - Groq Tool Use Preview | 14400 requests/day; 15000 tokens/minute |
| | | Llama 3.1 70B | 14400 requests/day; 20000 tokens/minute |
| | | Llama 3.1 8B | 14400 requests/day; 20000 tokens/minute |
| | | Llama Guard 3 8B | 14400 requests/day; 15000 tokens/minute |
| | | Mixtral 8x7B | 14400 requests/day; 5000 tokens/minute |
| | | Whisper Large v3 | 7200 audio-seconds/minute; 2000 requests/day |
| OpenRouter | | Gemma 2 9B Instruct | 20 requests/minute; 200 requests/day |
| | | Hermes 3 Llama 3.1 405B | 20 requests/minute; 200 requests/day |
| | | Llama 3 8B Instruct | 20 requests/minute; 200 requests/day |
| | | Llama 3.1 8B Instruct | 20 requests/minute; 200 requests/day |
| | | Mistral 7B Instruct | 20 requests/minute; 200 requests/day |
| | | Mythomist 7B | 20 requests/minute; 200 requests/day |
| | | OpenChat 7B | 20 requests/minute; 200 requests/day |
| | | Phi-3 Medium 128k Instruct | 20 requests/minute; 200 requests/day |
| | | Phi-3 Mini 128k Instruct | 20 requests/minute; 200 requests/day |
| | | Qwen 2 7B Instruct | 20 requests/minute; 200 requests/day |
| | | Reflection Llama 3.1 70B | 20 requests/minute; 200 requests/day |
| | | Toppy M 7B | 20 requests/minute; 200 requests/day |
| | | Zephyr 7B Beta | 20 requests/minute; 200 requests/day |
| Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 1.5 Flash | 1000000 tokens/minute; 1500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1000000 tokens/minute; 1500 requests/day; 5 requests/minute |
| | | Gemini 1.5 Pro | 32000 tokens/minute; 50 requests/day; 2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1000000 tokens/minute; 50 requests/day; 2 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1000000 tokens/minute; 1500 requests/day; 15 requests/minute |
| | | Gemini 1.0 Pro | 32000 tokens/minute; 1500 requests/day; 15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute; 1500 requests/minute; 100 content/batch |
| | | embedding-001 | |
| Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 405B Instruct | Llama 3.1 API Service free during preview; 60 requests/minute |
| | | Gemini Flash Experimental | Experimental Gemini model; 10 requests/minute |
| | | Gemini Pro Experimental | |
| glhf.chat (Free Beta) | Email for API access | Any model on Hugging Face that is runnable on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8 | |
| Cohere | 20 requests/minute; 1000 requests/month | Command-R | Shared Limit |
| | | Command-R+ | Shared Limit |
| HuggingFace Serverless Inference | Dynamic Rate Limits. Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | |
| OVH AI Endpoints (Free Alpha) | Token expires every 2 weeks. | CodeLlama 13B Instruct | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
| Cloudflare Workers AI | 10000 neurons/day. Beta models have unlimited usage. Typically 300 requests/minute for text models. | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | Discolm German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
| Lambda Labs (Free Preview) | Free for a limited time | Nous Hermes 3 Llama 3.1 405B (FP8) | |
| Mistral (Codestral) | Currently free to use, monthly subscription based, requires phone number verification. | Codestral | 30 requests/minute; 2000 requests/day |
| Cerebras | Waitlist. Free tier restricted to 8K context. | Llama 3.1 8B | 30 requests/minute, 60000 tokens/minute; 900 requests/hour, 1000000 tokens/hour; 14400 requests/day, 1000000 tokens/day |
| | | Llama 3.1 70B | 30 requests/minute, 60000 tokens/minute; 900 requests/hour, 1000000 tokens/hour; 14400 requests/day, 1000000 tokens/day |
| SambaNova Cloud | | Llama 3.1 405B | |
| | | Llama 3.1 70B | |
| | | Llama 3.1 8B | |
| GitHub Models | Waitlist. Rate limits dependent on Copilot subscription tier. | AI21-Jamba-Instruct | |
| | | Cohere Command R | |
| | | Cohere Command R+ | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-mini instruct (128k) | |
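
Several providers above combine a per-minute and a per-day request limit (for example, OpenRouter's 20 requests/minute and 200 requests/day). One way to stay under such limits is to throttle on the client side. The following is a minimal sketch, not any provider's official client: `MultiWindowLimiter` and `throttled` are hypothetical names, the limiter only counts requests (not tokens), and a provider's own accounting may differ from this sliding-window approximation.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Hypothetical client-side limiter for several (max_requests, window_seconds) limits."""

    def __init__(self, limits, clock=time.monotonic):
        # limits: iterable of (max_requests, window_seconds) pairs
        self.limits = list(limits)
        self.clock = clock
        self.longest = max(window for _, window in self.limits)
        self.sent = deque()  # timestamps of recorded requests, oldest first

    def wait_time(self):
        """Return how many seconds to wait before the next request (0.0 if allowed now)."""
        now = self.clock()
        # Forget requests that have aged out of the longest window.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # This request must age out before we drop below the limit.
                blocking = recent[-max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self):
        """Record that a request is being sent now."""
        self.sent.append(self.clock())


def throttled(limiter, send, sleep=time.sleep):
    """Sleep until the limiter allows a request, then call send() and return its result."""
    delay = limiter.wait_time()
    if delay > 0:
        sleep(delay)
    limiter.record()
    return send()


# Limits taken from the OpenRouter rows above: 20 requests/minute, 200 requests/day.
openrouter_limiter = MultiWindowLimiter([(20, 60), (200, 86_400)])
```

To use it, wrap each API call, e.g. `throttled(openrouter_limiter, lambda: call_api(prompt))`, where `call_api` stands in for whatever client function you already have.
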

| Provider | Credits | Requirements | Models |
| --- | --- | --- | --- |
| Mistral | 2 weeks; 1 request/second; 500,000 tokens/minute; 1,000,000,000 tokens/month | | Mistral Open/Proprietary Models |
| Together | $5 | | Various open models |
| Fireworks | $1 | | Various open models |
| OctoAI | $10 | | Various open models |
| Unify | $10 (+$40 for getting into contact) | | Routes to other providers, various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.) |
| DeepInfra | $1.80 | | Various open models |
| NVIDIA NIM | 1000 API calls | | Various open models |
| AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
| NLP Cloud | $15 | Phone number verification | Various open models |
| Hyperbolic | $10 | | DeepSeek V2.5 |
| | | | Hermes 3 Llama 3.1 70B |
| | | | Llama 3 70B Instruct |
| | | | Llama 3.1 405B Base |
| | | | Llama 3.1 405B Base (FP8) |
| | | | Llama 3.1 405B Instruct |
| | | | Llama 3.1 70B Instruct |
| | | | Llama 3.1 8B Instruct |
| | | | Pixtral 12B (2409) |
| | | | Qwen2-VL 7B Instruct |
| | | | Reflection Llama 3.1 70B |
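
To get a feel for how the Mistral limits listed above interact (1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month), here is a small worked calculation. It is only a sketch: the variable names are illustrative, and the 30-day month is an assumption rather than something the provider states.

```python
# Limits copied from the Mistral row above.
REQUESTS_PER_SECOND = 1
TOKENS_PER_MINUTE = 500_000
TOKENS_PER_MONTH = 1_000_000_000

# Running flat out at the per-minute token cap, how long does the monthly quota last?
minutes_to_exhaust = TOKENS_PER_MONTH / TOKENS_PER_MINUTE
hours_to_exhaust = minutes_to_exhaust / 60

# Average request size at which the token cap, not the request cap, becomes the bottleneck.
tokens_per_request_at_cap = TOKENS_PER_MINUTE / (REQUESTS_PER_SECOND * 60)

# Sustained rate that spreads the quota evenly over an assumed 30-day month.
sustained_tokens_per_minute = TOKENS_PER_MONTH / (30 * 24 * 60)

print(f"Monthly quota lasts {minutes_to_exhaust:.0f} minutes (~{hours_to_exhaust:.1f} hours) at full speed")
print(f"Token cap binds above ~{tokens_per_request_at_cap:.0f} tokens/request")
print(f"Even pacing: ~{sustained_tokens_per_minute:.0f} tokens/minute")
```

In other words, the monthly token quota, not the per-minute limit, is the constraint that matters for any workload that runs for more than a day or two at full speed.
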