Add AI Features to Your App: OpenAI vs Anthropic 2026
TL;DR
Both OpenAI and Anthropic SDKs are excellent — the choice is about model capability, pricing, and your specific use case. OpenAI (GPT-4o) leads on multimodal tasks and ecosystem integrations; Anthropic (Claude 3.5 Sonnet) leads on long context, instruction following, and safety constraints. Both support streaming, tool use, and structured output. Pick OpenAI for embeddings + broad ecosystem; pick Anthropic for long documents and nuanced instruction following.
Key Takeaways
- OpenAI SDK: 12M weekly downloads, broadest ecosystem, GPT-4o for multimodal
- Anthropic SDK: 2M weekly downloads, 200K context window, Claude 3.5 Sonnet leads reasoning benchmarks
- Streaming: both support real-time output (`stream: true` in OpenAI, `messages.stream()` in Anthropic)
- Tool use: both support function calling; the syntax differs but the concepts are identical
- Cost tip: use GPT-4o-mini or Claude 3 Haiku for high-volume simple tasks (10-50x cheaper)
Setup
```bash
# OpenAI
npm install openai

# Anthropic
npm install @anthropic-ai/sdk
```

Both SDKs read their API key from the environment:

```bash
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
Basic Text Generation
```ts
// OpenAI
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Summarize this article in 3 bullets.' },
  ],
  max_tokens: 500,
  temperature: 0.7,
});

console.log(response.choices[0].message.content);

// Token usage
console.log(response.usage); // { prompt_tokens, completion_tokens, total_tokens }
```
```ts
// Anthropic
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 500,
  system: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'Summarize this article in 3 bullets.' },
  ],
});

console.log(response.content[0].text);

// Token usage
console.log(response.usage); // { input_tokens, output_tokens }
```
Streaming Responses
```ts
// OpenAI streaming
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a haiku.' }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```
```ts
// Anthropic streaming
const stream = anthropic.messages.stream({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a haiku.' }],
});

stream.on('text', (text) => {
  process.stdout.write(text);
});

const finalMessage = await stream.finalMessage();
console.log('\nTokens used:', finalMessage.usage);
```
Both providers stream cleanly from Next.js API routes (App Router) through the Vercel AI SDK (`npm install ai`), which provides a unified streaming interface over OpenAI and Anthropic. Note that the older `OpenAIStream` / `StreamingTextResponse` helpers were removed in AI SDK 4; `streamText` with a provider adapter is the current interface:

```ts
// app/api/chat/route.ts
import { streamText } from 'ai'; // Vercel AI SDK
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: openai('gpt-4o'),
    messages,
  });
  return result.toTextStreamResponse();
}
```

Swapping in Anthropic is a one-line change: import `anthropic` from `@ai-sdk/anthropic` and pass `anthropic('claude-3-5-sonnet-20241022')` as the model.
Tool Use (Function Calling)
```ts
// OpenAI tool use
const tools: OpenAI.ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['city'],
      },
    },
  },
];

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What\'s the weather in Tokyo?' }],
  tools,
  tool_choice: 'auto',
});

const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  // args = { city: "Tokyo" }
  const result = await getWeather(args.city); // getWeather: your own implementation

  // Continue the conversation with the tool result
  const followUp = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'What\'s the weather in Tokyo?' },
      response.choices[0].message,
      {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      },
    ],
    tools,
  });
}
```
```ts
// Anthropic tool use (same concept, different syntax)
const tools: Anthropic.Tool[] = [
  {
    name: 'get_weather',
    description: 'Get current weather for a city',
    input_schema: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' },
      },
      required: ['city'],
    },
  },
];

const response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  tools,
  messages: [{ role: 'user', content: 'What\'s the weather in Tokyo?' }],
});

const toolUse = response.content.find(block => block.type === 'tool_use');
if (toolUse && toolUse.type === 'tool_use') {
  const args = toolUse.input as { city: string };
  const result = await getWeather(args.city);

  // Continue with the tool result
  const followUp = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    tools,
    messages: [
      { role: 'user', content: 'What\'s the weather in Tokyo?' },
      { role: 'assistant', content: response.content },
      {
        role: 'user',
        content: [{
          type: 'tool_result',
          tool_use_id: toolUse.id,
          content: JSON.stringify(result),
        }],
      },
    ],
  });
}
```
Structured Output
```ts
// OpenAI: structured output (JSON mode)
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  inStock: z.boolean(),
  tags: z.array(z.string()),
});

const response = await openai.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06', // Must support structured output
  messages: [
    { role: 'user', content: 'Extract product info: Blue Widget, $29.99, available' },
  ],
  response_format: zodResponseFormat(ProductSchema, 'product'),
});

const product = response.choices[0].message.parsed;
// product is typed as z.infer<typeof ProductSchema>
```
```ts
// Anthropic: JSON extraction via prompt
const response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{
    role: 'user',
    content: `Extract product info as JSON: Blue Widget, $29.99, available
Return ONLY valid JSON matching:
{ "name": string, "price": number, "inStock": boolean, "tags": string[] }`,
  }],
});

const json = JSON.parse(response.content[0].text);
// Anthropic follows instructions reliably; this works well in practice
```
Cost Management
```ts
// Model selection by use case + cost
const models = {
  // Simple tasks: 10-50x cheaper
  simple: {
    openai: 'gpt-4o-mini', // $0.15/1M input tokens
    anthropic: 'claude-3-haiku-20240307', // $0.25/1M input tokens
  },
  // Complex reasoning:
  complex: {
    openai: 'gpt-4o', // $2.50/1M input tokens
    anthropic: 'claude-3-5-sonnet-20241022', // $3.00/1M input tokens
  },
  // Long documents (Anthropic shines):
  longContext: {
    anthropic: 'claude-3-5-sonnet-20241022', // 200K context
    openai: 'gpt-4o', // 128K context
  },
};
```
```ts
// Cache frequently-used prompts (Anthropic prompt caching)
const response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: 'You are a coding assistant. Here is the full codebase...',
      cache_control: { type: 'ephemeral' }, // Cache this expensive prefix
    },
  ],
  messages: [{ role: 'user', content: 'Find the bug in auth.ts' }],
});
// First request: full price. Subsequent requests: 90% discount on cached tokens
```
When to Choose Which
| Use Case | Pick | Reason |
|---|---|---|
| Multimodal (images, audio, video) | OpenAI | GPT-4o Vision leads |
| Long documents (100K+ tokens) | Anthropic | 200K context, better instruction following |
| Embeddings / vector search | OpenAI | text-embedding-3 is strong and cheap; Anthropic has no first-party embeddings |
| Strict instruction following | Anthropic | Claude is more precise with complex rules |
| Existing OpenAI integration | OpenAI | No migration needed |
| Agentic workflows | Either | Both have strong tool use; Anthropic slightly more reliable |
| High-volume classification | Haiku / Mini | Both cheap models are excellent |
| Code generation | Anthropic | Claude 3.5 Sonnet leads HumanEval benchmarks |
Understanding the Model Capability Split
The choice between OpenAI and Anthropic is increasingly a choice about which tasks you are optimizing for. In 2026, the model capability landscape has stabilized around clear strengths for each provider. OpenAI's GPT-4o leads on multimodal tasks: vision, audio understanding, image generation through the API, and video frame analysis. Its embedding models (text-embedding-3-large and text-embedding-3-small) remain best-in-class for semantic search and retrieval-augmented generation pipelines, offering strong accuracy at low cost; Anthropic does not ship first-party embedding models at all, instead pointing users to third-party providers.
Anthropic's Claude 3.5 Sonnet consistently leads on instruction-following precision, long-context comprehension, and code generation benchmarks. The 200,000-token context window is not just a larger number — it enables genuinely different workflows: analyzing an entire codebase in a single request, processing full financial reports, or maintaining coherent conversation state across very long user sessions. Claude's training emphasizes careful adherence to system prompts, which matters for applications where the model must reliably avoid certain topics, maintain a specific persona, or follow complex output format requirements.
Rate Limiting and Retry Strategy
Both OpenAI and Anthropic enforce rate limits at multiple levels: requests per minute (RPM), tokens per minute (TPM), and daily request caps. Production applications need retry logic with exponential backoff to handle rate limit responses (HTTP 429). Both SDKs include built-in retry handling, but understanding the parameters matters.
OpenAI's SDK retries up to 2 times by default with exponential backoff starting at 0.5 seconds; you can configure this with `new OpenAI({ maxRetries: 3 })`. Anthropic's SDK likewise defaults to 2 retries, configurable the same way. The important behavioral difference: a RateLimitError (429) should be retried with backoff, while a BadRequestError (400) indicates a problem with your request that retrying won't fix. Both SDKs distinguish these error types in their exception hierarchies.
For high-volume production workloads, a request queue is a better architecture than inline retries. Instead of the API call blocking while it retries, failed requests enter a queue and are retried asynchronously with increasing delays. This prevents cascading slowdowns when you hit rate limits: downstream callers are served from the queue rather than blocking on a retry. Tools like BullMQ (Redis-backed) or Cloudflare Queues integrate naturally with both SDKs for this pattern.
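The pattern can be sketched in-process; a real deployment would back this with BullMQ or Cloudflare Queues so queued retries survive restarts. The `RetryQueue` class below is illustrative, not a library API:

```typescript
// Minimal in-process sketch of the queue pattern: callers get a promise
// immediately, and failed jobs are re-enqueued with growing delays
// instead of blocking the caller on an inline retry loop.
type Job<T> = {
  run: () => Promise<T>;
  resolve: (v: T) => void;
  reject: (e: unknown) => void;
  attempts: number;
};

class RetryQueue {
  private jobs: Job<any>[] = [];
  private draining = false;

  constructor(private maxAttempts = 3, private baseDelayMs = 200) {}

  enqueue<T>(run: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.jobs.push({ run, resolve, reject, attempts: 0 });
      void this.drain();
    });
  }

  private async drain() {
    if (this.draining) return;
    this.draining = true;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      try {
        job.resolve(await job.run());
      } catch (err) {
        job.attempts++;
        if (job.attempts >= this.maxAttempts) {
          job.reject(err);
          continue;
        }
        // Re-enqueue later instead of sleeping inside the drain loop,
        // so other queued jobs are not delayed by this one's backoff
        const delay = this.baseDelayMs * 2 ** job.attempts;
        setTimeout(() => {
          this.jobs.push(job);
          void this.drain();
        }, delay);
      }
    }
    this.draining = false;
  }
}
```

The API call itself (OpenAI or Anthropic) goes inside the function passed to `enqueue`, so rate-limited requests are retried in the background while new callers keep getting promises immediately.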
Prompt Engineering Differences Between Providers
The same prompt produces different quality results on GPT-4o versus Claude 3.5 Sonnet, and the differences are systematic enough to matter in production. Claude responds better to explicit, structured instructions: numbered steps, XML-style tags to delineate sections, and clear statements of what to do and what not to do. Using <thinking> and <answer> tags in your prompt to ask Claude to structure its reasoning before outputting the final answer consistently improves response quality for complex reasoning tasks.
GPT-4o responds well to the standard OpenAI prompt format: system message for persona and constraints, user message for the request. It is less sensitive to XML-style structuring but benefits from examples (few-shot prompting) for complex output formats. For JSON output, GPT-4o's structured output feature (using Zod schemas) is more reliable than prompt-based JSON extraction, because it guarantees well-formed JSON at the model level rather than relying on the model to correctly format its response.
For applications that need to work with both providers, abstracting the prompt format into a template layer helps. The system prompt and core instruction can be shared, but provider-specific optimizations (XML tags for Anthropic, explicit format examples for OpenAI) can be injected by the abstraction layer.
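A minimal sketch of such a template layer; the `renderPrompt` helper and `PromptTemplate` shape are illustrative assumptions, not part of either SDK:

```typescript
// Provider-aware prompt rendering: shared core instruction, with
// XML-style section tags for Anthropic and inline few-shot examples
// for OpenAI injected by the template layer.
type Provider = 'openai' | 'anthropic';

interface PromptTemplate {
  instruction: string;  // shared core instruction
  context?: string;     // document or data to operate on
  examples?: string[];  // few-shot examples (mainly for OpenAI)
}

function renderPrompt(provider: Provider, t: PromptTemplate): string {
  if (provider === 'anthropic') {
    // Claude responds well to XML-style tags that delineate sections
    const ctx = t.context ? `<context>\n${t.context}\n</context>\n` : '';
    return `${ctx}<instructions>\n${t.instruction}\n</instructions>`;
  }
  // GPT-4o: plain instruction plus inline few-shot examples
  const examples = t.examples?.length
    ? `\n\nExamples:\n${t.examples.join('\n')}`
    : '';
  return `${t.instruction}${t.context ? `\n\n${t.context}` : ''}${examples}`;
}
```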
Error Handling in Production
Production AI integrations need robust error handling that distinguishes between failure modes and responds appropriately to each. A RateLimitError means you should back off and retry. An AuthenticationError means your API key is invalid and retrying won't help; alert the team. An InternalServerError from the provider is their infrastructure issue; retry with backoff. A BadRequestError means your request is malformed; check your messages array for invalid content.
Both SDKs provide typed error classes that make this branching straightforward. OpenAI's APIError subclasses include RateLimitError, AuthenticationError, PermissionDeniedError, NotFoundError, ConflictError, UnprocessableEntityError, and InternalServerError. Anthropic's error hierarchy follows the same pattern. Wrapping your API calls in structured error handling that routes each error type to the appropriate response (retry, alert, reject) is more maintainable than catching generic errors and inspecting status codes manually.
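A sketch of this routing, keyed on HTTP status codes as a stand-in for the SDKs' typed error classes (real code would branch on `instanceof OpenAI.RateLimitError` and friends; `routeApiError` and `withBackoff` are hypothetical helpers):

```typescript
// Route failures by type rather than a generic catch-all.
type ErrorAction = 'retry' | 'alert' | 'reject';

function routeApiError(status: number): ErrorAction {
  if (status === 429) return 'retry';                   // rate limited: back off, retry
  if (status === 401 || status === 403) return 'alert'; // bad key / permissions: page the team
  if (status >= 500) return 'retry';                    // provider-side failure: retry with backoff
  return 'reject';                                      // 400 etc.: the request itself is wrong
}

// Retry wrapper with exponential backoff, applied only to retryable failures
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const action = routeApiError(err?.status ?? 0);
      if (action !== 'retry' || attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```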
One production failure mode that both providers share: context length errors. When your messages array exceeds the model's context window, you receive an error rather than a truncated response. Applications that accumulate conversation history need context window management — either truncating the oldest messages, using a summarization step to compress history, or switching to a larger-context model when the conversation grows long. Building this management as an explicit utility function in your AI integration layer prevents hard-to-debug context overflow errors in production.
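One way to sketch such a utility, using a rough characters-per-token heuristic in place of a real tokenizer (the `fitToContext` helper and its chars/4 estimate are illustrative assumptions; production code would use an actual tokenizer):

```typescript
// Trim conversation history to a token budget before each API call:
// keep the system message, then keep turns from newest to oldest.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

const approxTokens = (text: string) => Math.ceil(text.length / 4);

function fitToContext(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  let budget = maxTokens - system.reduce((n, m) => n + approxTokens(m.content), 0);
  const kept: ChatMessage[] = [];
  for (let i = rest.length - 1; i >= 0; i--) { // walk from newest to oldest
    const cost = approxTokens(rest[i].content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```

The same shape also supports the other strategies mentioned above: replace the `break` with a summarization call, or fall through to a larger-context model when too much history would be dropped.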
Cost Attribution and Monitoring
AI API costs scale with token usage, and in production applications, understanding which features consume the most tokens is essential for cost optimization. Both SDKs return token usage in every response (response.usage in OpenAI, response.usage in Anthropic), but aggregating this into useful cost dashboards requires application-level instrumentation.
The recommended pattern is to create a wrapper around your AI SDK calls that logs token usage with context: which feature invoked the call, which model was used, which user triggered it (anonymized if necessary), and the input/output token counts. Storing this in a database or logging platform lets you build cost-per-feature dashboards, identify expensive operations, and set alerts when daily costs exceed thresholds.
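A minimal sketch of such a wrapper, with an in-memory log and example prices standing in for a real database and pricing config (`recordUsage` and `costByFeature` are hypothetical helpers; check current provider pricing before using figures like these):

```typescript
// Log token usage per feature and aggregate into cost-per-feature totals.
interface UsageRecord {
  feature: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// $/1M tokens as [input, output]; example figures only
const PRICES: Record<string, [number, number]> = {
  'gpt-4o-mini': [0.15, 0.6],
  'claude-3-haiku-20240307': [0.25, 1.25],
};

const usageLog: UsageRecord[] = [];

function recordUsage(feature: string, model: string, inputTokens: number, outputTokens: number) {
  const [inPrice, outPrice] = PRICES[model] ?? [0, 0];
  const costUsd = (inputTokens * inPrice + outputTokens * outPrice) / 1_000_000;
  usageLog.push({ feature, model, inputTokens, outputTokens, costUsd });
}

function costByFeature(): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of usageLog) totals[r.feature] = (totals[r.feature] ?? 0) + r.costUsd;
  return totals;
}
```

In a real system, `recordUsage` would be called from the wrapper with the `usage` object each SDK returns, and would write to a database or logging platform instead of an array.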
Anthropic's prompt caching feature deserves specific cost monitoring: the first request with a cacheable prefix charges full price, subsequent requests charge 10% for the cached portion. The cache_creation_input_tokens and cache_read_input_tokens fields in response.usage let you measure how much caching is actually occurring. If you expect caching to reduce costs by 80% for a particular workflow but usage shows low cache hit rates, the cache is not persisting between requests as expected — often because the cache TTL (5 minutes for ephemeral caching) is shorter than your request interval.
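A small sketch of that measurement, assuming the usage fields named above and that `input_tokens` counts only uncached input (the `cacheHitRate` helper is hypothetical):

```typescript
// Fraction of input tokens served from the prompt cache across requests.
interface AnthropicUsage {
  input_tokens: number;                  // uncached input tokens
  cache_creation_input_tokens?: number;  // tokens written to cache this request
  cache_read_input_tokens?: number;      // tokens served from cache this request
}

function cacheHitRate(usages: AnthropicUsage[]): number {
  let read = 0;
  let total = 0;
  for (const u of usages) {
    const created = u.cache_creation_input_tokens ?? 0;
    const fromCache = u.cache_read_input_tokens ?? 0;
    read += fromCache;
    total += u.input_tokens + created + fromCache;
  }
  return total === 0 ? 0 : read / total;
}
```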
The Vercel AI SDK as a Provider-Agnostic Layer
One practical approach to avoiding SDK lock-in is using the Vercel AI SDK (ai package) as a unified interface over both OpenAI and Anthropic. The AI SDK provides a consistent streamText, generateText, and streamObject API that works with both providers via adapter packages (@ai-sdk/openai, @ai-sdk/anthropic). Switching providers becomes a one-line change in the provider configuration.
The AI SDK's streamText function returns an async iterable over text delta chunks, normalized to the same format regardless of whether you are using OpenAI's or Anthropic's underlying streaming format. Its generateObject and streamObject functions handle structured output generation with Zod schema validation, using OpenAI's structured output feature when available and falling back to prompt-based JSON extraction for Anthropic. This abstraction trades some provider-specific optimization capability for portability and simpler integration code, which is often the right tradeoff for teams building product features rather than AI infrastructure.
Compare AI/LLM package health and download trends on PkgPulse.
See also: AI Development Stack for JavaScript 2026, Ollama.js vs OpenAI SDK 2026, and Vercel AI SDK vs LangChain.js: Which AI SDK in 2026?
See the live comparison
View openai vs. anthropic on PkgPulse →