Portkey vs LiteLLM vs OpenRouter: LLM Gateway Comparison 2026
TL;DR
Managing multiple LLM providers — OpenAI, Anthropic, Gemini, Mistral — is complex: different SDKs, different pricing, different reliability. LiteLLM is the open-source Python proxy that gives you one OpenAI-compatible API for 100+ LLMs — self-host it and route anywhere. Portkey is the enterprise-grade AI gateway with production features (semantic caching, guardrails, advanced observability) as a managed service or self-hosted. OpenRouter is the SaaS marketplace model — one API key, access to 200+ models, pay per token with their routing, no infrastructure to manage. For enterprise production teams: Portkey. For self-hosted infrastructure control: LiteLLM. For instant multi-model access with zero setup: OpenRouter.
Key Takeaways
- LiteLLM supports 100+ LLM providers — all via an OpenAI-compatible API (/chat/completions)
- Portkey offers semantic caching — cache similar (not just identical) prompts, reducing costs by up to 40%
- OpenRouter has 200+ models — including models not available via direct API (some fine-tuned, some obscure)
- LiteLLM is fully open source (MIT) — self-host on your own infra with full data control
- Portkey GitHub stars: ~8k — fastest-growing enterprise AI gateway
- All three support fallbacks — if GPT-4o fails, automatically retry with Claude Sonnet
- LiteLLM's proxy adds ~10-20ms latency — acceptable for most use cases
The Multi-LLM Problem
In 2026, using a single LLM provider is risky:
- Rate limits — OpenAI's 429 errors during peak hours kill production apps
- Outages — Any provider can go down; no fallback = 100% downtime
- Cost optimization — Route simple queries to cheaper models (GPT-4o Mini, Claude Haiku)
- SDK fragmentation — OpenAI, Anthropic, and Google each have their own SDKs
- Observability gaps — No unified view of costs, latency, and errors across providers
LLM gateways solve all of this with a single unified API.
OpenRouter: Marketplace for 200+ Models
OpenRouter is the simplest option — sign up, get one API key, access 200+ models. They handle routing, load balancing, and fallbacks. You pay OpenRouter per token at competitive rates.
Zero-Config Setup
import OpenAI from "openai";
// OpenRouter is OpenAI-API-compatible — just change the baseURL
const client = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY,
defaultHeaders: {
"HTTP-Referer": "https://yourapp.com", // Optional: attributes traffic to your app on openrouter.ai
"X-Title": "Your App Name", // Optional: display name in OpenRouter rankings
},
});
// Now use any model via unified API
async function chat(prompt: string, model: string) {
const response = await client.chat.completions.create({
model, // Any of 200+ supported models
messages: [{ role: "user", content: prompt }],
});
return response.choices[0].message.content;
}
// Use GPT-4o
const gpt4Response = await chat("Explain quantum computing", "openai/gpt-4o");
// Use Claude Sonnet
const claudeResponse = await chat("Write a poem", "anthropic/claude-sonnet-4-5");
// Use Gemini 1.5 Pro
const geminiResponse = await chat("Analyze this data", "google/gemini-pro-1.5");
// Use Llama 3.3 (open source, cheap)
const llamaResponse = await chat("Summarize this", "meta-llama/llama-3.3-70b-instruct");
Model Routing and Fallbacks
// OpenRouter falls back through a models list when the primary fails.
// Note: "models" and "route" are OpenRouter extensions to the OpenAI schema —
// the openai SDK's TypeScript types don't include them, hence the cast.
// (Python SDK users pass these via extra_body instead.)
const response = await client.chat.completions.create({
model: "openai/gpt-4o",
messages: [{ role: "user", content: prompt }],
// OpenRouter-specific routing fields
models: [
"openai/gpt-4o",
"anthropic/claude-sonnet-4-5",
"google/gemini-pro-1.5",
],
route: "fallback",
} as any);
Cost Control with Model Selection
// Route based on task complexity to control costs
type TaskType = "simple" | "complex" | "creative";
const MODEL_MAP: Record<TaskType, string> = {
simple: "openai/gpt-4o-mini", // ~$0.15 / 1M input tokens
complex: "openai/gpt-4o", // ~$2.50 / 1M input tokens
creative: "anthropic/claude-sonnet-4-5", // ~$3.00 / 1M input tokens
};
async function intelligentChat(prompt: string, taskType: TaskType) {
const model = MODEL_MAP[taskType];
const response = await client.chat.completions.create({
model,
messages: [{ role: "user", content: prompt }],
});
return { response: response.choices[0].message.content, model };
}
OpenRouter Pricing Overview
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| openai/gpt-4o | $2.50 | $10.00 |
| openai/gpt-4o-mini | $0.15 | $0.60 |
| anthropic/claude-sonnet-4-5 | $3.00 | $15.00 |
| anthropic/claude-haiku-3-5 | $0.80 | $4.00 |
| google/gemini-pro-1.5 | $1.25 | $5.00 |
| meta-llama/llama-3.3-70b | $0.065 | $0.10 |
| mistralai/mistral-large | $2.00 | $6.00 |
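The per-token prices above turn into request costs with simple arithmetic. A small helper makes the routing payoff concrete; the prices are hardcoded snapshots from the table and will drift:

```typescript
// Estimate request cost in USD from per-million-token prices.
// Prices are snapshots from the table above — check openrouter.ai/models for current rates.
const PRICES: Record<string, { input: number; output: number }> = {
  "openai/gpt-4o": { input: 2.5, output: 10.0 },
  "openai/gpt-4o-mini": { input: 0.15, output: 0.6 },
  "meta-llama/llama-3.3-70b": { input: 0.065, output: 0.1 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// A 2,000-token prompt with a 500-token answer:
const big = estimateCost("openai/gpt-4o", 2000, 500); // $0.0100
const small = estimateCost("openai/gpt-4o-mini", 2000, 500); // $0.0006
console.log(`gpt-4o: $${big.toFixed(4)}, mini: $${small.toFixed(4)}`);
```

At these rates GPT-4o Mini is roughly 17x cheaper per request, which is why routing simple tasks to cheaper models dominates gateway cost strategies.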
LiteLLM: Open-Source Universal Proxy
LiteLLM is a Python library AND proxy server that translates any LLM call to OpenAI format. Self-host the proxy and get unified routing, load balancing, cost tracking, and fallbacks — with full data ownership.
LiteLLM Python Library (No Server)
import litellm
# Set provider keys
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
os.environ["GEMINI_API_KEY"] = "..."
# Universal interface — same call for any model
response = litellm.completion(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}]
)
# Claude — identical interface
response = litellm.completion(
model="claude-sonnet-4-5",
messages=[{"role": "user", "content": "Hello"}]
)
# Gemini
response = litellm.completion(
model="gemini/gemini-pro",
messages=[{"role": "user", "content": "Hello"}]
)
LiteLLM Proxy Server Setup
# litellm-config.yaml — define your models and routing
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GEMINI_API_KEY
  # Load-balanced group — repeat the same model_name across deployments
  # and the router spreads traffic between them
  - model_name: best-available
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: best-available
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  routing_strategy: "least-busy"
  num_retries: 3
  timeout: 30

litellm_settings:
  success_callback: ["langfuse"]  # Observability integration
  failure_callback: ["langfuse"]
  cache: true
  cache_params:
    type: "redis"
    host: "redis"
    port: 6379
# Install and start the proxy
pip install 'litellm[proxy]'
litellm --config litellm-config.yaml --port 4000
# Or via Docker
docker run -p 4000:4000 \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v ./litellm-config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:main \
--config /app/config.yaml
Calling LiteLLM Proxy from Node.js
import OpenAI from "openai";
// Point to your LiteLLM proxy — OpenAI-compatible
const client = new OpenAI({
baseURL: "http://localhost:4000/v1", // Your LiteLLM proxy
apiKey: "sk-1234", // Virtual key from LiteLLM
});
// All these use the proxy's routing
const gpt4 = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
});
const claude = await client.chat.completions.create({
model: "claude-sonnet", // Maps to anthropic/claude-sonnet-4-5 via config
messages: [{ role: "user", content: "Hello" }],
});
Fallbacks and Load Balancing
# litellm-config.yaml — fallback configuration
router_settings:
  # Primary → fallback chain
  fallbacks:
    - gpt-4o: ["claude-sonnet", "gemini-pro"]
    - claude-sonnet: ["gpt-4o", "gemini-pro"]
  # Per-error retry counts (LiteLLM's retry_policy fields)
  retry_policy:
    BadRequestErrorRetries: 0      # Don't retry 400s
    RateLimitErrorRetries: 3       # Retry rate limits 3 times
    TimeoutErrorRetries: 2         # Retry timeouts twice
    InternalServerErrorRetries: 3  # Retry 5xx errors
  # Context window fallback — if the prompt exceeds the model's context,
  # retry with a larger-context model from the model_list
  context_window_fallbacks:
    - gpt-4o: ["claude-sonnet"]
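The retry policy maps error classes to retry budgets: client errors never retry, transient errors retry a bounded number of times. The same idea in application code looks roughly like this sketch, where classifying errors by HTTP status is an assumption you would adapt to your SDK's error types:

```typescript
// Per-error retry budgets, mirroring the retry_policy idea above:
// don't retry client errors, retry rate limits and transient failures
// with exponential backoff.
const RETRIES_BY_STATUS: Record<number, number> = {
  400: 0, // Bad request — retrying won't help
  429: 3, // Rate limited — retry with backoff
  503: 3, // Service unavailable — retry
};

async function callWithRetryPolicy<T>(
  fn: () => Promise<T>,
  getStatus: (err: unknown) => number,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      const budget = RETRIES_BY_STATUS[getStatus(err)] ?? 0;
      if (attempt >= budget) throw err; // Budget exhausted — surface the error
      attempt++;
      // Exponential backoff: 100ms, 200ms, 400ms...
      await new Promise((r) => setTimeout(r, 100 * 2 ** (attempt - 1)));
    }
  }
}
```

In practice the gateway does this for you; the sketch only shows why a per-error policy matters — retrying a 400 wastes time, while retrying a 429 usually succeeds.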
Cost Tracking and Budget Limits
# litellm-config.yaml — enable key management (requires a Postgres database)
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
# Per-team virtual keys with budgets are created via the proxy's management API:
# curl -X POST http://localhost:4000/key/generate \
#   -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"key_alias": "team-a", "max_budget": 100.00, "budget_duration": "30d",
#        "models": ["gpt-4o-mini", "claude-haiku"]}'
# Repeat per team — e.g. team-b with a $500 budget and access to
# gpt-4o and claude-sonnet.
Portkey: Enterprise AI Gateway
Portkey is the most feature-rich option — designed for enterprise production with semantic caching, guardrails, advanced observability, and AI config management. Available as managed cloud or self-hosted.
Setup (TypeScript SDK)
npm install portkey-ai
import Portkey from "portkey-ai";
const portkey = new Portkey({
apiKey: process.env.PORTKEY_API_KEY,
virtualKey: process.env.OPENAI_VIRTUAL_KEY, // Provider key managed in Portkey vault
});
// OpenAI-compatible interface
const response = await portkey.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
});
Configs — Define Routing Logic in Dashboard
// Reference a config by ID — routing logic managed in Portkey dashboard
const portkey = new Portkey({
apiKey: process.env.PORTKEY_API_KEY,
config: "pc-production-fallback-abc123", // Config defined in dashboard
});
// Config can include: fallbacks, load balancing, caching, guardrails
// without changing your code
const response = await portkey.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
});
Inline Config (No Dashboard Required)
// Pass a config object directly to the client instead of a dashboard config ID.
// Field names follow Portkey's config schema (snake_case: virtual_key, override_params).
import Portkey from "portkey-ai";
const portkeyWithFallback = new Portkey({
apiKey: process.env.PORTKEY_API_KEY,
config: {
// Fallback chain: try each target in order
strategy: { mode: "fallback" },
targets: [
{ virtual_key: process.env.OPENAI_VIRTUAL_KEY },
{ virtual_key: process.env.ANTHROPIC_VIRTUAL_KEY, override_params: { model: "claude-sonnet-4-5" } },
{ virtual_key: process.env.GEMINI_VIRTUAL_KEY, override_params: { model: "gemini-pro" } },
],
},
});
const response = await portkeyWithFallback.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "What is 2+2?" }],
});
Semantic Caching
// Portkey's semantic cache matches similar prompts, not just identical ones.
// Caching is configured in the gateway config (snake_case max_age, in seconds).
const portkey = new Portkey({
apiKey: process.env.PORTKEY_API_KEY,
config: {
virtual_key: process.env.OPENAI_VIRTUAL_KEY,
cache: {
mode: "semantic",
max_age: 86400, // 24 hours
},
},
});
// First call: "What's the weather like in Paris?" → calls OpenAI, caches result
// Second call: "How's the weather in Paris today?" → semantic match → returns cached result (0 tokens)
// Cache hit rates on production apps typically run 30-50% with semantic matching
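Conceptually, a semantic cache embeds each prompt and serves a cached answer when a new prompt's embedding is close enough to a stored one. The toy sketch below shows the mechanics with a stub bag-of-words embedding; it is an illustration of the idea, not how Portkey is implemented internally:

```typescript
// Toy semantic cache: cosine similarity over embeddings decides cache hits.
// embed() is a deliberately crude stub — real systems use an embedding model.
type Entry = { vector: number[]; response: string };

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  const na = norm(a), nb = norm(b);
  return na === 0 || nb === 0 ? 0 : dot / (na * nb);
}

class SemanticCache {
  private entries: Entry[] = [];
  constructor(
    private embed: (text: string) => number[],
    private threshold = 0.9, // Similarity required for a hit
  ) {}

  get(prompt: string): string | undefined {
    const v = this.embed(prompt);
    return this.entries.find((e) => cosine(e.vector, v) >= this.threshold)?.response;
  }

  set(prompt: string, response: string): void {
    this.entries.push({ vector: this.embed(prompt), response });
  }
}

// Stub embedding: presence of words from a tiny vocabulary
const vocab = ["weather", "paris", "quantum"];
const embed = (t: string) => vocab.map((w) => (t.toLowerCase().includes(w) ? 1 : 0));

const cache = new SemanticCache(embed, 0.8);
cache.set("What's the weather like in Paris?", "Sunny, 18°C");
console.log(cache.get("How's the weather in Paris today?")); // → "Sunny, 18°C" (semantic hit)
console.log(cache.get("Explain quantum computing")); // → undefined (no match)
```

The threshold is the key tuning knob: too low and unrelated prompts share answers; too high and you lose the cost savings over exact-match caching.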
Guardrails
// Portkey guardrails — input/output validation and transformation.
// Guardrails (e.g. prompt-injection detection, PII redaction, toxicity checks)
// are defined in the Portkey dashboard and referenced by ID in a config.
const portkey = new Portkey({
apiKey: process.env.PORTKEY_API_KEY,
config: {
virtual_key: process.env.OPENAI_VIRTUAL_KEY,
// Placeholder IDs — substitute the guardrail IDs from your dashboard
input_guardrails: ["pg-prompt-injection", "pg-pii-redaction"],
output_guardrails: ["pg-toxicity-check"],
},
});
const response = await portkey.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: userInput }],
});
Feature Comparison
| Feature | OpenRouter | LiteLLM | Portkey |
|---|---|---|---|
| Hosting | SaaS only | Self-hosted (OSS) | Cloud or self-hosted |
| Setup time | <5 min | 30-60 min (self-hosted) | 15-30 min |
| Models / providers | 200+ models | 100+ providers | 50+ providers |
| OpenAI compatible API | ✅ | ✅ | ✅ |
| Provider fallbacks | Partial | ✅ | ✅ |
| Load balancing | Basic | ✅ Advanced | ✅ Advanced |
| Semantic caching | ❌ | Basic (Redis) | ✅ Purpose-built |
| Cost tracking | ✅ Basic | ✅ Per-team budgets | ✅ Advanced |
| Guardrails | ❌ | ❌ | ✅ |
| Prompt versioning | ❌ | ❌ | ✅ |
| Data residency | ❌ (US servers) | ✅ Full control | ✅ Self-hosted option |
| Open source | ❌ | ✅ MIT | Partial |
| GitHub stars | ~2k | ~15k | ~8k |
| Free tier | Pay per token | Open source | 10k requests/mo |
| Enterprise support | ❌ | Community | ✅ |
When to Use Each
Choose OpenRouter if:
- You want zero infrastructure setup — one API key and you're done
- You need access to obscure or experimental models not available via direct API
- Cost-per-token routing across 200+ models without management overhead is your goal
- You're building a prototype or side project
Choose LiteLLM if:
- Data ownership and compliance require self-hosted infrastructure
- You want to build on open-source without vendor lock-in
- Your team uses Python (LiteLLM's native language) and you need a flexible proxy
- You need advanced load balancing with per-team budget controls
Choose Portkey if:
- You need semantic caching to reduce LLM costs at scale
- Guardrails for PII detection and prompt injection are required
- You want a managed enterprise gateway with SLA guarantees
- Prompt management and versioning alongside routing matters
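The criteria above can be condensed into a deliberately simplistic picker. The requirement flags are my own shorthand for the bullets, not an official rubric, and the precedence order is a judgment call:

```typescript
// Simplistic gateway picker distilled from the criteria above — a starting
// point, not a substitute for evaluating each tool against real requirements.
interface Requirements {
  selfHosted: boolean;    // Data residency / compliance needs
  guardrails: boolean;    // PII detection, prompt-injection blocking
  semanticCache: boolean; // Cost reduction on similar prompts
  zeroSetup: boolean;     // Prototype / side project, no infra to manage
}

function pickGateway(r: Requirements): "Portkey" | "LiteLLM" | "OpenRouter" {
  if (r.guardrails || r.semanticCache) return "Portkey"; // Only Portkey ships these
  if (r.selfHosted) return "LiteLLM"; // Full open-source data control
  if (r.zeroSetup) return "OpenRouter"; // One key, 200+ models
  return "LiteLLM"; // Reasonable default: open source, no lock-in
}

console.log(pickGateway({ selfHosted: false, guardrails: false, semanticCache: false, zeroSetup: true }));
// → "OpenRouter"
```

Note that guardrails take precedence over self-hosting here because Portkey also offers a self-hosted deployment, so both requirements can be met at once.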
Methodology
Data sourced from GitHub repositories (star counts as of February 2026), official documentation and pricing pages, and community benchmarks from HuggingFace forums and AI engineering blogs. Pricing data for OpenRouter models verified from openrouter.ai/models (February 2026). LiteLLM latency estimates are from official benchmarks and community reports. Feature availability verified against documentation.
Related: Langfuse vs LangSmith vs Helicone for LLM observability, or Mastra vs LangChain.js vs GenKit for AI agent frameworks.