
Portkey vs LiteLLM vs OpenRouter: LLM Gateway Comparison 2026

PkgPulse Team


TL;DR

Managing multiple LLM providers — OpenAI, Anthropic, Gemini, Mistral — is complex: different SDKs, different pricing, different reliability. LiteLLM is the open-source Python proxy that gives you one OpenAI-compatible API for 100+ LLMs — self-host it and route anywhere. Portkey is the enterprise-grade AI gateway with production features (semantic caching, guardrails, advanced observability) as a managed service or self-hosted. OpenRouter is the SaaS marketplace model — one API key, access to 200+ models, pay per token with their routing, no infrastructure to manage. For enterprise production teams: Portkey. For self-hosted infrastructure control: LiteLLM. For instant multi-model access with zero setup: OpenRouter.

Key Takeaways

  • LiteLLM supports 100+ LLM providers — all via an OpenAI-compatible API (/chat/completions)
  • Portkey offers semantic caching — cache similar (not just identical) prompts, reducing costs by up to 40%
  • OpenRouter has 200+ models — including models not available via direct API (some fine-tuned, some obscure)
  • LiteLLM is fully open source (MIT) — self-host on your own infra with full data control
  • Portkey GitHub stars: ~8k — fastest-growing enterprise AI gateway
  • All three support fallbacks — if GPT-4o fails, automatically retry with Claude Sonnet
  • LiteLLM's proxy adds ~10-20ms latency — acceptable for most use cases

The Multi-LLM Problem

In 2026, using a single LLM provider is risky:

  • Rate limits — OpenAI's 429 errors during peak hours kill production apps
  • Outages — Any provider can go down; no fallback = 100% downtime
  • Cost optimization — Route simple queries to cheaper models (GPT-4o Mini, Claude Haiku)
  • SDK fragmentation — OpenAI, Anthropic, and Google each have their own SDKs
  • Observability gaps — No unified view of costs, latency, and errors across providers

LLM gateways solve all of this with a single unified API.
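Without a gateway, every team ends up hand-rolling this logic. A minimal sketch of what provider fallback looks like in application code (the `LlmCall` type and the provider functions are placeholders, not tied to any SDK):

```typescript
// One provider call: takes a prompt, returns the completion text
type LlmCall = (prompt: string) => Promise<string>;

// Try each provider in order; return the first success, rethrow the last failure
async function withFallback(prompt: string, providers: LlmCall[]): Promise<string> {
  let lastError: unknown;
  for (const call of providers) {
    try {
      return await call(prompt);
    } catch (err) {
      lastError = err; // e.g. a 429 or an outage; try the next provider
    }
  }
  throw lastError;
}
```

A gateway moves this loop, plus retries, timeouts, and cost tracking, out of your application code and behind a single endpoint.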


OpenRouter: Marketplace for 200+ Models

OpenRouter is the simplest option — sign up, get one API key, access 200+ models. They handle routing, load balancing, and fallbacks. You pay OpenRouter per token at competitive rates.

Zero-Config Setup

import OpenAI from "openai";

// OpenRouter is OpenAI-API-compatible — just change the baseURL
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": "https://yourapp.com",  // Optional — used for app rankings on openrouter.ai
    "X-Title": "Your App Name",
  },
});

// Now use any model via unified API
async function chat(prompt: string, model: string) {
  const response = await client.chat.completions.create({
    model,  // Any of 200+ supported models
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
}

// Use GPT-4o
const gpt4Response = await chat("Explain quantum computing", "openai/gpt-4o");

// Use Claude Sonnet
const claudeResponse = await chat("Write a poem", "anthropic/claude-sonnet-4-5");

// Use Gemini 1.5 Pro
const geminiResponse = await chat("Analyze this data", "google/gemini-pro-1.5");

// Use Llama 3.3 (open source, cheap)
const llamaResponse = await chat("Summarize this", "meta-llama/llama-3.3-70b-instruct");

Model Routing and Fallbacks

// OpenRouter tries the `models` list in order when the primary fails.
// `models` is an OpenRouter extension field, not part of the OpenAI schema,
// so the SDK's types don't know it (the Node SDK passes unknown body
// fields through at runtime).
const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: prompt }],
  // @ts-expect-error OpenRouter-specific routing field
  models: [
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-5",
    "google/gemini-pro-1.5",
  ],
});

Cost Control with Model Selection

// Route based on task complexity to control costs
type TaskType = "simple" | "complex" | "creative";

const MODEL_MAP: Record<TaskType, string> = {
  simple: "openai/gpt-4o-mini",            // ~$0.15 / 1M input tokens
  complex: "openai/gpt-4o",                // ~$2.50 / 1M input tokens
  creative: "anthropic/claude-sonnet-4-5", // ~$3.00 / 1M input tokens
};

async function intelligentChat(prompt: string, taskType: TaskType) {
  const model = MODEL_MAP[taskType];
  const response = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  return { response: response.choices[0].message.content, model };
}

OpenRouter Pricing Overview

Model                          Input ($/1M)   Output ($/1M)
openai/gpt-4o                    $2.50          $10.00
openai/gpt-4o-mini               $0.15           $0.60
anthropic/claude-sonnet-4-5      $3.00          $15.00
anthropic/claude-haiku-3-5       $0.80           $4.00
google/gemini-pro-1.5            $1.25           $5.00
meta-llama/llama-3.3-70b         $0.065          $0.10
mistralai/mistral-large          $2.00           $6.00
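A request's cost is just tokens times rate. A throwaway helper for sanity-checking bills, with the two GPT-4o rates hard-coded from the table above (prices drift, so treat the numbers as a snapshot):

```typescript
// $ per 1M tokens, copied from the pricing table above (a point-in-time snapshot)
const PRICES: Record<string, { input: number; output: number }> = {
  "openai/gpt-4o": { input: 2.5, output: 10.0 },
  "openai/gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function requestCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`no price entry for ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// e.g. 10k input + 1k output tokens on gpt-4o → $0.035
```

Gateways like Portkey and LiteLLM compute this per request automatically; a helper like this is only for back-of-envelope checks.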

LiteLLM: Open-Source Universal Proxy

LiteLLM is a Python library AND proxy server that translates any LLM call to OpenAI format. Self-host the proxy and get unified routing, load balancing, cost tracking, and fallbacks — with full data ownership.

LiteLLM Python Library (No Server)

import litellm

# Set provider keys
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
os.environ["GEMINI_API_KEY"] = "..."

# Universal interface — same call for any model
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Claude — identical interface (the provider prefix tells LiteLLM where to route)
response = litellm.completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Gemini
response = litellm.completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "Hello"}]
)

LiteLLM Proxy Server Setup

# litellm-config.yaml — define your models and routing
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GEMINI_API_KEY

  # Load-balanced group — multiple deployments share one model_name,
  # and the router spreads traffic across them
  - model_name: best-available
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: best-available
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  routing_strategy: "least-busy"
  num_retries: 3
  timeout: 30

litellm_settings:
  success_callback: ["langfuse"]  # Observability integration
  failure_callback: ["langfuse"]
  cache:
    type: "redis"
    host: "redis"
    port: 6379

# Start the proxy
litellm --config litellm-config.yaml --port 4000

# Or via Docker
docker run -p 4000:4000 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v ./litellm-config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main \
  --config /app/config.yaml

Calling LiteLLM Proxy from Node.js

import OpenAI from "openai";

// Point to your LiteLLM proxy — OpenAI-compatible
const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",  // Your LiteLLM proxy
  apiKey: "sk-1234",  // Virtual key from LiteLLM
});

// All these use the proxy's routing
const gpt4 = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

const claude = await client.chat.completions.create({
  model: "claude-sonnet",  // Maps to anthropic/claude-sonnet-4-5 via config
  messages: [{ role: "user", content: "Hello" }],
});

Fallbacks and Load Balancing

# litellm-config.yaml — fallback configuration
router_settings:
  # Primary → fallback chain
  fallbacks:
    - gpt-4o: ["claude-sonnet", "gemini-pro"]
    - claude-sonnet: ["gpt-4o", "gemini-pro"]

  # Retry on specific errors
  retry_policy:
    BadRequestError: 0     # Don't retry 400s
    RateLimitError: 3      # Retry rate limits 3 times
    TimeoutError: 2        # Retry timeouts twice
    ServiceUnavailableError: 3

  # Context window fallback — if prompt too long, use bigger model
  context_window_fallbacks:
    - gpt-4o: ["gpt-4o-128k", "claude-opus-3-5"]
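The same idea can be approximated client-side before the request ever leaves your app. A rough sketch that picks a model by estimated prompt size (the 4-characters-per-token heuristic and the window sizes here are illustrative, not exact):

```typescript
// Very rough heuristic: ~4 characters per token for English text
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Ordered small-to-large; the window sizes are illustrative placeholders
const WINDOWS: Array<[model: string, contextTokens: number]> = [
  ["gpt-4o-mini", 16_000],
  ["gpt-4o", 128_000],
];

// Pick the first (cheapest) model whose context window fits the prompt
function pickModel(prompt: string): string {
  const needed = estimateTokens(prompt);
  for (const [model, window] of WINDOWS) {
    if (needed < window) return model;
  }
  throw new Error("prompt exceeds every configured context window");
}
```

LiteLLM's `context_window_fallbacks` does the equivalent server-side, after the primary deployment rejects the request.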

Cost Tracking and Budget Limits

# litellm-config.yaml — enable key management with a master key
general_settings:
  master_key: "sk-master"

# Virtual keys are created through the proxy's API (not the config file),
# each with its own budget and model allow-list
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master" \
  -H "Content-Type: application/json" \
  -d '{"max_budget": 100.00, "budget_duration": "30d", "models": ["gpt-4o-mini", "claude-haiku"]}'

Portkey: Enterprise AI Gateway

Portkey is the most feature-rich option — designed for enterprise production with semantic caching, guardrails, advanced observability, and AI config management. Available as managed cloud or self-hosted.

Setup (TypeScript SDK)

npm install portkey-ai

import Portkey from "portkey-ai";

const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: process.env.OPENAI_VIRTUAL_KEY,  // Provider key managed in Portkey vault
});

// OpenAI-compatible interface
const response = await portkey.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

Configs — Define Routing Logic in Dashboard

// Reference a config by ID — routing logic managed in Portkey dashboard
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: "pc-production-fallback-abc123",  // Config defined in dashboard
});

// Config can include: fallbacks, load balancing, caching, guardrails
// without changing your code
const response = await portkey.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

Inline Config (No Dashboard Required)

// Pass the config inline as a plain object — no dashboard needed
const response = await portkey.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "What is 2+2?" }],
  },
  {
    config: {
      // Fallback chain
      strategy: { mode: "fallback" },
      targets: [
        { virtualKey: process.env.OPENAI_VIRTUAL_KEY },
        { virtualKey: process.env.ANTHROPIC_VIRTUAL_KEY, overrideParams: { model: "claude-sonnet-4-5" } },
        { virtualKey: process.env.GEMINI_VIRTUAL_KEY, overrideParams: { model: "gemini-pro" } },
      ],
    },
  }
);

Semantic Caching

// Portkey's unique semantic cache — matches similar prompts, not just identical
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: process.env.OPENAI_VIRTUAL_KEY,
  cache: {
    mode: "semantic",
    maxAge: 86400,  // 24 hours
    forceRefresh: false,
  },
});

// First call: "What's the weather like in Paris?" → calls OpenAI, caches result
// Second call: "How's the weather in Paris today?" → semantic match → returns cached result (0 tokens)
// Cache hit rate on production apps: typically 30-50% with semantic matching
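Conceptually, a semantic cache embeds each prompt and reuses a stored answer when a new prompt lands close enough in vector space. A toy sketch over precomputed embeddings (cosine similarity and the 0.9 threshold here are stand-ins; Portkey's actual matching is internal to the service):

```typescript
type CachedEntry = { embedding: number[]; answer: string };

// Cosine similarity between two equal-length vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Reuse a stored answer when some cached prompt is similar enough
function semanticLookup(
  queryEmbedding: number[],
  cache: CachedEntry[],
  threshold = 0.9,
): string | null {
  for (const entry of cache) {
    if (cosine(queryEmbedding, entry.embedding) >= threshold) return entry.answer;
  }
  return null;
}
```

A production implementation also has to decide when a near-match is safe to serve, which is why hit rates vary so much by workload.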

Guardrails

// Portkey guardrails — input/output validation and transformation
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: process.env.OPENAI_VIRTUAL_KEY,
});

const response = await portkey.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: userInput }],
  },
  {
    config: {
      guardrails: [
        {
          type: "input",
          checks: [
            { id: "portkey.prompt_injection", on_fail: "block" },  // Block prompt injection
            { id: "portkey.pii_detection", on_fail: "anonymize" }, // Anonymize PII
          ],
        },
        {
          type: "output",
          checks: [
            { id: "portkey.toxicity", on_fail: "censor" },         // Censor toxic output
          ],
        },
      ],
    },
  }
);
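At its core, a guardrail check is a predicate or transform over the text plus an action to take on failure. A toy input-side PII scrub, purely illustrative of the idea (the regex is simplistic and is not Portkey's implementation):

```typescript
// Toy PII scrub: mask email-like substrings before the prompt reaches the model
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function anonymizeEmails(input: string): string {
  return input.replace(EMAIL_RE, "[EMAIL]");
}
```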

Feature Comparison

Feature                 OpenRouter        LiteLLM                   Portkey
Hosting                 SaaS only         Self-hosted (OSS)         Cloud or self-hosted
Setup time              <5 min            30-60 min (self-hosted)   15-30 min
Model count             200+ models       100+ providers            50+ providers
OpenAI-compatible API   ✅                ✅                        ✅
Provider fallbacks      Partial           ✅                        ✅
Load balancing          Basic             ✅ Advanced               ✅ Advanced
Semantic caching        ❌                Basic (Redis)             ✅ Purpose-built
Cost tracking           ✅ Basic          ✅ Per-team budgets       ✅ Advanced
Guardrails              ❌                ❌                        ✅
Prompt versioning       ❌                ❌                        ✅
Data residency          ❌ (US servers)   ✅ Full control           ✅ Self-hosted option
Open source             ❌                ✅ MIT                    Partial
GitHub stars            ~2k               ~15k                      ~8k
Free tier               Pay per token     Open source               10k requests/mo
Enterprise support      ❌                Community                 ✅

When to Use Each

Choose OpenRouter if:

  • You want zero infrastructure setup — one API key and you're done
  • You need access to obscure or experimental models not available via direct API
  • Cost-per-token routing across 200+ models without management overhead is your goal
  • You're building a prototype or side project

Choose LiteLLM if:

  • Data ownership and compliance require self-hosted infrastructure
  • You want to build on open-source without vendor lock-in
  • Your team uses Python (LiteLLM's native language) and you need a flexible proxy
  • You need advanced load balancing with per-team budget controls

Choose Portkey if:

  • You need semantic caching to reduce LLM costs at scale
  • Guardrails for PII detection and prompt injection are required
  • You want a managed enterprise gateway with SLA guarantees
  • Prompt management and versioning alongside routing matters

Methodology

Data sourced from GitHub repositories (star counts as of February 2026), official documentation and pricing pages, community benchmarks on HuggingFace forums and AI engineering blogs. Pricing data for OpenRouter models verified from openrouter.ai/models (February 2026). LiteLLM latency estimates from official benchmarks and community reports. Feature availability verified against documentation.


Related: Langfuse vs LangSmith vs Helicone for LLM observability, or Mastra vs LangChain.js vs GenKit for AI agent frameworks.
