
Vercel AI SDK v4: generateText, streamText, and Tools in 2026

PkgPulse Team

TL;DR

The Vercel AI SDK (package: ai) has become the de facto standard for building LLM-powered TypeScript applications — it abstracts OpenAI, Anthropic, Google Gemini, Mistral, and 20+ other providers behind a unified API. Version 4 (released late 2024) brought generateObject for structured outputs, multi-step tool calling (agents that chain tool invocations automatically), useObject for streaming structured data to React, and provider middleware for logging and caching. If you're building AI features in a Next.js or Node.js app in 2026, the AI SDK's generateText / streamText pattern is the starting point.

Key Takeaways

  • generateText is for non-streaming LLM calls with text output — simple Q&A, batch processing, classification
  • streamText is for real-time streaming — chat interfaces, progressive content generation, anything where waiting for the full response hurts UX
  • generateObject returns structured, schema-validated objects — the single best feature in v4 for structured AI outputs without prompt engineering
  • Tool calling (function calling) lets LLMs invoke TypeScript functions — the foundation of AI agents
  • maxSteps enables automatic multi-step tool execution — the LLM calls a tool, gets results, calls another tool, without your code managing the loop
  • Provider middleware in v4 lets you add caching, logging, and rate limiting to any provider call
  • npm downloads: ai package ~3.5M/week as of March 2026, up from ~800K/week in early 2024

Why the AI SDK Matters

Without an abstraction layer, every LLM provider has a different API:

// OpenAI
const response = await openai.chat.completions.create({ model, messages, stream })

// Anthropic
const response = await anthropic.messages.create({ model, messages, max_tokens })

// Google Gemini
const response = await generativeModel.generateContent({ contents, config })

Every provider has different request formats, response shapes, streaming protocols, error codes, and token counting semantics. The Vercel AI SDK normalizes all of this:

import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'

// Same call, different providers
const { text } = await generateText({ model: openai('gpt-4o'), prompt })
const { text: claudeText } = await generateText({ model: anthropic('claude-3-7-sonnet-20250219'), prompt })

Swapping providers is a one-line change. No streaming protocol differences, no response shape differences, no error handling differences.


Core APIs in AI SDK v4

generateText — Non-Streaming Text Generation

generateText makes a complete LLM request and returns when the model is done:

import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const { text, usage, finishReason } = await generateText({
  model: openai('gpt-4o-mini'),
  system: 'You are a helpful assistant.',
  prompt: 'Explain the difference between async and defer in HTML script tags.',
})

console.log(text)
// → "The `async` attribute..."

console.log(usage)
// → { promptTokens: 42, completionTokens: 187, totalTokens: 229 }

When to use generateText:

  • Classification tasks (tag this email, categorize this content)
  • Batch processing where you want to wait for complete output
  • Generating content that gets stored (product descriptions, summaries)
  • Single-turn Q&A where streaming adds complexity without UX benefit

The messages array supports full conversation history:

const { text } = await generateText({
  model: openai('gpt-4o'),
  messages: [
    { role: 'user', content: 'What is tRPC?' },
    { role: 'assistant', content: 'tRPC is a library...' },
    { role: 'user', content: 'How does it compare to GraphQL?' },
  ],
})

streamText — Streaming Text Generation

streamText streams tokens as they're generated — essential for chat interfaces where users shouldn't wait 3–5 seconds for a response:

import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'

const result = streamText({
  model: openai('gpt-4o'),
  prompt: 'Write a haiku about TypeScript.',
})

// Stream to stdout
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

// Or use in a Next.js Route Handler
return result.toDataStreamResponse()

toDataStreamResponse() returns a Response whose body is a ReadableStream in the Vercel AI Data Stream format — the protocol that the useChat and useCompletion React hooks understand natively.

In a Next.js App Router route handler:

// app/api/chat/route.ts
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'

export async function POST(req: Request) {
  const { messages } = await req.json()

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
  })

  return result.toDataStreamResponse()
}

Client-side with useChat:

// app/chat/page.tsx
'use client'
import { useChat } from '@ai-sdk/react'

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit } = useChat()

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  )
}

useChat manages message state, streaming, and error handling — the entire chat pattern in ~20 lines.

generateObject — Structured Output

generateObject is the feature that makes AI SDK v4 indispensable for production applications. Instead of returning text, it returns a schema-validated TypeScript object:

import { generateObject } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

const { object } = await generateObject({
  model: openai('gpt-4o'),
  schema: z.object({
    title: z.string(),
    tags: z.array(z.string()).max(5),
    sentiment: z.enum(['positive', 'neutral', 'negative']),
    confidence: z.number().min(0).max(1),
  }),
  prompt: 'Analyze this review: "Great product, fast shipping, would buy again!"',
})

// object is typed as:
// {
//   title: string,
//   tags: string[],
//   sentiment: 'positive' | 'neutral' | 'negative',
//   confidence: number
// }
console.log(object.sentiment) // → 'positive'
console.log(object.confidence) // → 0.97

Under the hood, generateObject uses the model's native JSON mode or function calling to guarantee a valid JSON response, then validates it against your Zod schema. If validation fails, it retries (configurable with maxRetries).

streamObject — the streaming equivalent — progressively yields object properties as they arrive, enabling UI that populates fields in real time rather than waiting for the complete object.


Tool Calling in v4

Tools let LLMs call TypeScript functions during generation — the primitive behind AI agents, RAG, and any LLM that needs to take actions or retrieve information.

Defining Tools

import { generateText, tool } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

const { text, toolCalls, toolResults } = await generateText({
  model: openai('gpt-4o'),
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({
        city: z.string().describe('The city name'),
        unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
      }),
      execute: async ({ city, unit }) => {
        // Your actual implementation
        return { temperature: 18, condition: 'partly cloudy', unit }
      },
    }),
  },
  prompt: 'What is the weather like in Paris right now?',
})

The LLM decides when to call tools, what parameters to pass, and incorporates the results into its response. Your execute function can call databases, APIs, file systems, or anything else.

Multi-Step Tool Calling with maxSteps

maxSteps is the v4 feature that enables real AI agents without managing a manual loop:

const { text, steps } = await generateText({
  model: openai('gpt-4o'),
  maxSteps: 5,  // Allow up to 5 tool call rounds
  tools: {
    searchDocs: tool({ ... }),
    fetchPage: tool({ ... }),
    summarize: tool({ ... }),
  },
  prompt: 'Find the changelog for Drizzle ORM and summarize the v1.0 release.',
})

Without maxSteps, after the LLM makes a tool call, you need to send the tool result back, wait for another response, check if it wants to call another tool, and repeat manually. With maxSteps: 5, the SDK manages this loop automatically — the LLM can call searchDocs, get results, call fetchPage on a result, get that content, then call summarize — all in a single generateText call.

The steps array in the response contains every round of the agent loop — useful for debugging and displaying intermediate reasoning.
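A small helper for that kind of debugging — a sketch assuming the v4 step shape (toolCalls entries with toolName and args, toolResults entries with result, plus any generated text):

```typescript
// Log every round of the agent loop from the `steps` array
// returned by a multi-step generateText call.
function logSteps(steps: Array<{
  toolCalls: Array<{ toolName: string; args: unknown }>
  toolResults: Array<{ result: unknown }>
  text: string
}>) {
  steps.forEach((step, i) => {
    console.log(`--- step ${i + 1} ---`)
    for (const call of step.toolCalls) {
      console.log('tool call:', call.toolName, JSON.stringify(call.args))
    }
    for (const res of step.toolResults) {
      console.log('tool result:', JSON.stringify(res.result))
    }
    if (step.text) console.log('text:', step.text)
  })
}
```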


useObject — Streaming Structured Data to React

A v4 addition that pairs with streamObject for React UIs that populate progressively:

'use client'
import { experimental_useObject as useObject } from '@ai-sdk/react'
import { z } from 'zod'

const schema = z.object({
  summary: z.string(),
  keyPoints: z.array(z.string()),
  readingTime: z.number(),
})

export default function SummaryPage() {
  const { object, submit, isLoading } = useObject({
    api: '/api/summarize',
    schema,
  })

  return (
    <div>
      <button onClick={() => submit({ url: 'https://example.com/article' })}>
        Summarize
      </button>
      {isLoading && <Spinner />}
      {object?.summary && <p>{object.summary}</p>}
      {object?.keyPoints?.map(point => <li key={point}>{point}</li>)}
    </div>
  )
}

The object is typed from the schema and populates field by field as the LLM generates — object.summary might be available before object.keyPoints finishes streaming.


Provider Middleware

v4 introduced provider middleware — a composable layer for cross-cutting concerns:

import { wrapLanguageModel, extractReasoningMiddleware } from 'ai'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'

// Cache responses (great for development/testing)
const cachedOpenAI = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: {
    wrapGenerate: async ({ doGenerate, params }) => {
      const key = JSON.stringify(params)
      const cached = await cache.get(key)
      if (cached) return cached

      const result = await doGenerate()
      await cache.set(key, result, { ex: 3600 })
      return result
    },
  },
})

// Extract chain-of-thought reasoning from extended thinking models
const reasoningModel = wrapLanguageModel({
  model: anthropic('claude-3-7-sonnet-20250219'),
  middleware: extractReasoningMiddleware({ tagName: 'thinking' }),
})

Common middleware patterns:

  • Caching: Store responses in Redis during development to avoid API costs during iteration
  • Logging: Record all LLM calls with timing, token usage, and cost estimates
  • Fallback: Try the primary model, fall back to a cheaper model on error or timeout
  • Rate limiting: Enforce per-user token budgets before hitting the provider

AI SDK vs Alternatives

| Factor | AI SDK (ai) | LangChain.js | Direct SDK |
| --- | --- | --- | --- |
| Provider abstraction | ✅ 20+ providers | ✅ Many providers | ❌ One provider |
| Streaming | ✅ Native | ⚠️ Complex | Varies |
| React hooks | ✅ useChat, useCompletion, useObject | ❌ None | ❌ None |
| Structured output | ✅ generateObject | ✅ (via chains) | ⚠️ Manual |
| Bundle size | ~45kB | ~500kB+ | Provider-specific |
| Agent support | ✅ maxSteps | ✅ LangGraph | ⚠️ Manual loop |
| npm downloads | ~3.5M/week | ~1.2M/week | N/A |
| Learning curve | Low | High | Low |

For most Next.js applications building AI features, the AI SDK is the right starting point. LangChain.js is worth considering for complex agent workflows with memory, retrieval, and multi-agent orchestration.


When Not to Use the AI SDK

The AI SDK is not always the right choice:

  • Pure Python data pipelines — use the provider's Python SDK directly or LangChain Python
  • Complex agentic workflows — consider Mastra (TypeScript), LangGraph (Python/TS), or AutoGen
  • On-device inference — the AI SDK targets API-based providers; for on-device models, use onnxruntime or llama.cpp bindings
  • High-throughput batch processing — the AI SDK's abstractions add overhead; direct provider calls may be better for 100K+ batch jobs

Methodology

  • npm download data from npmjs.com, March 2026 weekly averages
  • Code examples tested against ai v4.x (latest stable), @ai-sdk/openai v1.x, @ai-sdk/anthropic v1.x
  • Sources: Vercel AI SDK official documentation, GitHub repo, AI SDK changelog

Compare AI SDK with other LLM client libraries on PkgPulse — download trends, bundle sizes, and dependency health.

Related: Vercel AI SDK vs OpenAI SDK vs Anthropic SDK 2026 · LangChain.js vs Vercel AI SDK 2026 · Mastra vs LangChain.js vs Genkit 2026
