OpenAI Chat Completions vs Responses API vs Assistants API 2026
TL;DR
OpenAI offers three distinct API surfaces for building AI-powered applications, each targeting a different level of abstraction. Chat Completions is the foundational stateless API — one request, one response, you manage conversation history yourself; it's the most flexible, best understood, and the right default for most production apps. The Responses API (launched early 2025) is OpenAI's new unified API that adds built-in server-side conversation state, multi-turn chaining, and direct tool result handling with a cleaner DX than raw Completions — it's the stated future direction for OpenAI's API surface. The Assistants API is the high-level managed stateful agent API — persistent Threads, file search, Code Interpreter, vector store integration, all server-side; it removes boilerplate but trades control for convenience and tends to cost more overall due to managed-state and tool overhead. For simple chat or inference: Chat Completions. For multi-step agentic flows with conversation state: Responses API. For document Q&A or code execution with minimal code: Assistants API.
Key Takeaways
- Chat Completions is stateless — you send the full message history every request, you own persistence
- Responses API maintains state server-side — previous_response_id chains turns without resending history
- Assistants API manages Threads — persistent conversation objects with a fully OpenAI-managed lifecycle
- Responses API supports built-in tool calls — web search, file search, computer use as first-class tools
- Assistants API has Code Interpreter — runs Python sandboxed, generates charts, processes files
- Chat Completions is cheapest — no state overhead, pay only for tokens in/out
- Responses API supersedes Assistants API patterns — OpenAI has said Assistants will be deprecated once Responses reaches feature parity, converging on Responses as the primary stateful API
API Architecture Overview
Chat Completions Responses API Assistants API
───────────────── ───────────────── ─────────────────
Stateless Stateful Stateful + Managed
You own history Server state chain Server Threads/Runs
Raw tool results Built-in tools Code Interpreter
No file search Built-in file search File Search tool
Standard streaming Streaming events Streaming runs
Cheapest Mid-cost Most expensive
Chat Completions: The Stateless Foundation
Chat Completions is the workhorse API — every OpenAI model is available, every feature is supported, and you have complete control.
Installation
npm install openai
Basic Request
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is the capital of France?" },
],
temperature: 0.7,
max_tokens: 500,
});
console.log(response.choices[0].message.content);
// "The capital of France is Paris."
Multi-Turn Conversation (Manual History)
import OpenAI from "openai";
import type { ChatCompletionMessageParam } from "openai/resources";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// You manage the conversation array
const messages: ChatCompletionMessageParam[] = [
{ role: "system", content: "You are a concise coding assistant." },
];
async function chat(userMessage: string): Promise<string> {
// Add user message
messages.push({ role: "user", content: userMessage });
const response = await client.chat.completions.create({
model: "gpt-4o",
messages,
temperature: 0.3,
});
const assistantMessage = response.choices[0].message.content ?? "";
// Add assistant response to history
messages.push({ role: "assistant", content: assistantMessage });
return assistantMessage;
}
// Usage
await chat("How do I reverse a string in JavaScript?");
await chat("Can you show me a one-liner version?"); // Full history sent again
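Because the full array is resent on every turn, token usage grows with conversation length. A common mitigation is trimming older turns before each request — a minimal sketch (the Msg type and the turn-count heuristic are illustrative; production code often trims by token count with a tokenizer instead):

```typescript
// Minimal message shape, mirroring the SDK's ChatCompletionMessageParam.
type Msg = { role: "system" | "user" | "assistant"; content: string };

// Keep the system prompt plus the most recent `maxTurns` user/assistant pairs.
// Trimming by message count is a simplification — trimming by token count
// is more precise but requires a tokenizer.
function trimHistory(messages: Msg[], maxTurns: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxTurns * 2)];
}
```

Call trimHistory(messages, 10) before each create call to cap payload size. Note that trimmed context is simply lost, so facts worth keeping should be re-stated or summarized into the system prompt.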
Streaming
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Next.js API Route with streaming
export async function POST(req: Request) {
const { messages } = await req.json();
const stream = await client.chat.completions.create({
model: "gpt-4o",
messages,
stream: true,
});
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta?.content;
if (delta) {
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ delta })}\n\n`));
}
}
controller.enqueue(encoder.encode("data: [DONE]\n\n"));
controller.close();
},
});
return new Response(readable, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
},
});
}
Tool Calling (Function Calls)
import OpenAI from "openai";
import type { ChatCompletionTool } from "openai/resources";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const tools: ChatCompletionTool[] = [
{
type: "function",
function: {
name: "get_weather",
description: "Get current weather for a city",
parameters: {
type: "object",
properties: {
city: { type: "string", description: "City name" },
unit: { type: "string", enum: ["celsius", "fahrenheit"] },
},
required: ["city"],
},
},
},
{
type: "function",
function: {
name: "search_web",
description: "Search the web for current information",
parameters: {
type: "object",
properties: {
query: { type: "string" },
},
required: ["query"],
},
},
},
];
// Tool execution dispatcher
async function executeTool(name: string, args: Record<string, unknown>): Promise<string> {
if (name === "get_weather") {
const { city } = args as { city: string; unit?: string };
// Call your weather API
return JSON.stringify({ temp: 22, condition: "sunny", city });
}
if (name === "search_web") {
// Call your search API
return JSON.stringify({ results: ["Result 1", "Result 2"] });
}
return JSON.stringify({ error: "Unknown tool" });
}
// Agentic loop
async function runWithTools(userMessage: string): Promise<string> {
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: "user", content: userMessage },
];
while (true) {
const response = await client.chat.completions.create({
model: "gpt-4o",
messages,
tools,
tool_choice: "auto",
});
const choice = response.choices[0];
messages.push(choice.message); // Add assistant message with tool calls
if (choice.finish_reason === "stop") {
return choice.message.content ?? "";
}
if (choice.finish_reason === "tool_calls") {
// Execute all tool calls in parallel
const toolResults = await Promise.all(
(choice.message.tool_calls ?? []).map(async (toolCall) => {
const result = await executeTool(
toolCall.function.name,
JSON.parse(toolCall.function.arguments)
);
return {
role: "tool" as const,
tool_call_id: toolCall.id,
content: result,
};
})
);
messages.push(...toolResults);
}
}
}
const answer = await runWithTools("What's the weather in Tokyo and what's trending on HN today?");
Structured Output
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ArticleSchema = z.object({
title: z.string(),
summary: z.string(),
tags: z.array(z.string()),
readingTime: z.number(),
keyPoints: z.array(z.string()),
});
const response = await client.beta.chat.completions.parse({
model: "gpt-4o",
messages: [
{
role: "user",
content: "Analyze this article and extract metadata: [article text]",
},
],
response_format: zodResponseFormat(ArticleSchema, "article"),
});
const article = response.choices[0].message.parsed;
// Fully typed: { title: string, summary: string, tags: string[], ... }
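One caveat with the parse helper: when the model refuses a request (for example on safety grounds), parsed is null and the refusal text lands in message.refusal, so it's worth guarding before use. A small sketch — the requireParsed helper is this article's, not part of the SDK:

```typescript
// Throws when the model refused instead of producing parsed output.
// `parsed` and `refusal` both come from response.choices[0].message.
function requireParsed<T>(parsed: T | null, refusal: string | null | undefined): T {
  if (parsed === null) {
    throw new Error(`Model refused to answer: ${refusal ?? "no reason given"}`);
  }
  return parsed;
}

// Usage with the response above:
// const article = requireParsed(
//   response.choices[0].message.parsed,
//   response.choices[0].message.refusal
// );
```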
Responses API: Stateful Multi-Turn
The Responses API (launched 2025) is OpenAI's new primary API — it handles conversation state server-side and includes built-in tools like web search and file search.
Basic Request
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// First turn — no previous_response_id
const response = await client.responses.create({
model: "gpt-4o",
input: "What is the capital of France?",
instructions: "You are a helpful geography assistant.",
});
console.log(response.output_text);
// "The capital of France is Paris."
console.log(response.id); // "resp_01ABC..." — use for next turn
Multi-Turn (Server-Side State)
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Turn 1
const turn1 = await client.responses.create({
model: "gpt-4o",
input: "My name is Sarah and I'm building a React app.",
instructions: "Remember context about the user throughout our conversation.",
});
// Turn 2 — reference previous response, no need to resend history
const turn2 = await client.responses.create({
model: "gpt-4o",
input: "What's a good state management library for my project?",
previous_response_id: turn1.id, // Server looks up prior context
});
// Turn 3
const turn3 = await client.responses.create({
model: "gpt-4o",
input: "Can you show me an example?",
previous_response_id: turn2.id,
});
console.log(turn3.output_text);
// Zustand example tailored to Sarah's React context — no history payload sent
Built-In Web Search
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await client.responses.create({
model: "gpt-4o",
input: "What are the most popular npm packages released in February 2026?",
tools: [{ type: "web_search_preview" }],
});
console.log(response.output_text);
// Response includes web search results synthesized into answer
// Access the raw search results
for (const item of response.output) {
if (item.type === "web_search_call") {
console.log("Searched:", item.status);
}
if (item.type === "message") {
console.log("Answer:", item.content[0].text);
}
}
Built-In File Search
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// First, create a vector store and upload files
const vectorStore = await client.vectorStores.create({
name: "Documentation",
});
await client.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {
files: [
new File(["Your product documentation content..."], "docs.txt", {
type: "text/plain",
}),
],
});
// Query with file search
const response = await client.responses.create({
model: "gpt-4o",
input: "How do I configure authentication in the product?",
tools: [
{
type: "file_search",
vector_store_ids: [vectorStore.id],
},
],
});
console.log(response.output_text);
// Answer synthesized from uploaded documentation
Streaming with Responses API
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const stream = await client.responses.create({
model: "gpt-4o",
input: "Write a detailed explanation of React's reconciliation algorithm.",
stream: true,
});
for await (const event of stream) {
if (event.type === "response.output_text.delta") {
process.stdout.write(event.delta);
}
if (event.type === "response.completed") {
console.log("\n\nDone. Response ID:", event.response.id);
}
}
Custom Tool Calling
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await client.responses.create({
model: "gpt-4o",
input: "What's the current price of AAPL stock?",
tools: [
{
type: "function",
name: "get_stock_price",
description: "Get the current stock price for a ticker symbol",
parameters: {
type: "object",
properties: {
ticker: { type: "string", description: "Stock ticker symbol" },
},
required: ["ticker"],
},
},
],
tool_choice: "auto",
});
// Handle tool call
for (const item of response.output) {
if (item.type === "function_call") {
const { ticker } = JSON.parse(item.arguments) as { ticker: string };
const price = await fetchStockPrice(ticker); // Your implementation
// Submit tool result and continue
const finalResponse = await client.responses.create({
model: "gpt-4o",
previous_response_id: response.id,
input: [
{
type: "function_call_output",
call_id: item.call_id,
output: JSON.stringify({ price, ticker }),
},
],
});
console.log(finalResponse.output_text);
}
}
async function fetchStockPrice(ticker: string): Promise<number> {
return 185.42; // Placeholder — call your market data provider here
}
Assistants API: Managed Stateful Agents
Assistants API manages Threads, Runs, and built-in tools (Code Interpreter, File Search) fully server-side. Best for document Q&A and code execution use cases with minimal custom infrastructure.
Creating an Assistant
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Create once, reuse by ID
const assistant = await client.beta.assistants.create({
name: "Data Analyst",
instructions:
"You are a data analyst. Use code interpreter to analyze data, create charts, and answer questions about datasets.",
model: "gpt-4o",
tools: [
{ type: "code_interpreter" },
{ type: "file_search" },
],
});
console.log("Assistant ID:", assistant.id);
// "asst_01ABC..." — store this, don't recreate each time
Thread Lifecycle
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ASSISTANT_ID = "asst_01ABC..."; // From creation
// Create a Thread (persists server-side)
const thread = await client.beta.threads.create();
// Add a message to the Thread
await client.beta.threads.messages.create(thread.id, {
role: "user",
content: "Analyze this sales data and identify trends.",
});
// Run the Assistant on the Thread
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: ASSISTANT_ID,
});
if (run.status === "completed") {
const messages = await client.beta.threads.messages.list(thread.id);
const lastMessage = messages.data[0];
if (lastMessage.content[0].type === "text") {
console.log(lastMessage.content[0].text.value);
}
}
// Next turn — same Thread, conversation continues
await client.beta.threads.messages.create(thread.id, {
role: "user",
content: "Now create a bar chart of the top 5 products.",
});
const run2 = await client.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: ASSISTANT_ID,
});
File Upload and Code Interpreter
import OpenAI from "openai";
import { createReadStream } from "fs";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Upload a CSV file for analysis
const file = await client.files.create({
file: createReadStream("sales-data.csv"),
purpose: "assistants",
});
const ASSISTANT_ID = "asst_01ABC...";
// Create thread with file attachment
const thread = await client.beta.threads.create({
messages: [
{
role: "user",
content: "Analyze this CSV file. What are the top 3 revenue months?",
attachments: [
{
file_id: file.id,
tools: [{ type: "code_interpreter" }],
},
],
},
],
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: ASSISTANT_ID,
});
if (run.status === "completed") {
const messages = await client.beta.threads.messages.list(thread.id);
for (const message of messages.data.reverse()) {
if (message.role === "assistant") {
for (const content of message.content) {
if (content.type === "text") {
console.log(content.text.value);
}
// Code Interpreter can generate image files (charts)
if (content.type === "image_file") {
console.log("Chart generated:", content.image_file.file_id);
// Download with client.files.content(content.image_file.file_id)
}
}
}
}
}
File Search (Vector Store)
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Upload documentation files
const vectorStore = await client.vectorStores.create({
name: "Product Docs",
});
await client.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {
files: [
new File(["Authentication guide..."], "auth.txt", { type: "text/plain" }),
new File(["API reference..."], "api-reference.txt", { type: "text/plain" }),
],
});
// Create assistant with file search
const assistant = await client.beta.assistants.create({
name: "Support Bot",
instructions: "Answer questions using the product documentation. Always cite sources.",
model: "gpt-4o",
tools: [{ type: "file_search" }],
tool_resources: {
file_search: {
vector_store_ids: [vectorStore.id],
},
},
});
// Query
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
role: "user",
content: "How do I set up OAuth with your API?",
});
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
assistant_id: assistant.id,
});
const messages = await client.beta.threads.messages.list(thread.id);
const reply = messages.data[0].content[0];
if (reply.type === "text") {
console.log(reply.text.value);
// Includes citations with file names and quote snippets
console.log("Citations:", reply.text.annotations);
}
Streaming Runs
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ASSISTANT_ID = "asst_01ABC...";
const THREAD_ID = "thread_01ABC..."; // Existing thread
await client.beta.threads.messages.create(THREAD_ID, {
role: "user",
content: "Summarize our conversation so far.",
});
const stream = client.beta.threads.runs.stream(THREAD_ID, {
assistant_id: ASSISTANT_ID,
});
stream
.on("textDelta", (delta) => {
process.stdout.write(delta.value ?? "");
})
.on("toolCallDelta", (delta) => {
if (delta.type === "code_interpreter") {
if (delta.code_interpreter?.input) {
process.stdout.write(delta.code_interpreter.input);
}
}
})
.on("end", () => {
console.log("\nRun complete");
});
await stream.finalRun();
Feature Comparison
| Feature | Chat Completions | Responses API | Assistants API |
|---|---|---|---|
| State management | ❌ You manage | ✅ Server-side chain | ✅ Persistent Threads |
| Conversation history | Manual (resend all) | previous_response_id | Automatic |
| Web search | ❌ Custom tools only | ✅ Built-in | ❌ Custom tools only |
| File search | ❌ Custom only | ✅ Vector stores | ✅ File Search tool |
| Code Interpreter | ❌ | ❌ | ✅ Python sandbox |
| Streaming | ✅ | ✅ | ✅ |
| Structured output | ✅ (response_format) | ✅ | ✅ (response_format on runs) |
| Tool calling | ✅ | ✅ | ✅ |
| All models | ✅ | ✅ | ✅ |
| Cost | Lowest | Mid | Highest |
| Maturity | GA (2023) | GA (2025) | Beta (2023, deprecation announced) |
| Rate limits | Standard | Standard | Thread-scoped |
| Token limit | Context window | Context window | Context window |
| File processing | No | Yes (file search) | Yes (Code Interp + FS) |
| Weekly API usage | Dominant | Growing | Stable |
When to Use Each
Choose Chat Completions if:
- Building a stateless API (classify, summarize, extract) — no multi-turn needed
- You need maximum model flexibility and control
- Custom state storage in your own DB (Redis, Postgres)
- Cost optimization is critical — no overhead from managed state
- Existing codebase built on Chat Completions patterns
- Need response_format with json_schema for guaranteed structure
Choose Responses API if:
- Multi-turn chat where OpenAI managing state is preferable
- Need built-in web search without third-party integration
- Document Q&A using file search with simpler setup than Assistants
- Building new projects — this is OpenAI's current recommended primary API
- Agentic flows where you want easier tool result submission
Choose Assistants API if:
- Document Q&A with many files and automatic retrieval
- Code Interpreter use cases: data analysis, chart generation, math
- Long-lived persistent Threads across multiple user sessions
- Support chatbot or copilot where OpenAI handles full conversation lifecycle
- You want to avoid managing vector stores and file search plumbing yourself
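The decision criteria above compress into a rough routing sketch — the type names and orderings here are this article's heuristics, not an official OpenAI recommendation:

```typescript
type Requirements = {
  multiTurn: boolean;       // conversation state across requests
  codeInterpreter: boolean; // sandboxed Python / chart generation
  builtInSearch: boolean;   // web or file search without custom plumbing
};

// Rough heuristic mirroring the bullets above.
function chooseApi(req: Requirements): "chat_completions" | "responses" | "assistants" {
  if (req.codeInterpreter) return "assistants"; // only surface with a Python sandbox
  if (req.multiTurn || req.builtInSearch) return "responses"; // server state + built-in tools
  return "chat_completions"; // stateless default: cheapest, most control
}
```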
Methodology
Data sourced from OpenAI API documentation (platform.openai.com/docs), OpenAI Responses API announcement (early 2025), npm weekly download statistics for the openai package as of February 2026, OpenAI developer forum discussions, and practical benchmarks comparing API latency and cost across the three surfaces. Pricing data from OpenAI pricing page as of February 2026.
Related: Gemini API vs Claude API vs Mistral API for cross-provider LLM comparisons, or Vercel AI SDK vs LangChain vs LlamaIndex for AI orchestration frameworks.