<!-- PkgPulse AI-readable guide source -->
<!-- Canonical: https://www.pkgpulse.com/guides/vercel-ai-sdk-v4-generatetext-streamtext-tools-2026 -->
<!-- Raw Markdown: https://www.pkgpulse.com/guides/vercel-ai-sdk-v4-generatetext-streamtext-tools-2026/raw.md -->
<!-- Source path: content/guides/vercel-ai-sdk-v4-generatetext-streamtext-tools-2026.mdx -->

---
og_image: "/images/guides/vercel-ai-sdk-v4-generatetext-streamtext-tools-2026.webp"
title: "Vercel AI SDK v4: generateText, streamText, Tools 2026"
description: "Vercel AI SDK v4 in 2026: generateText, streamText, generateObject, tool calling, and multi-step agents. What changed, what's new, and when to use each API."
date: "2026-03-09"
author: "PkgPulse Team"
tags: ["ai-sdk", "vercel", "llm", "typescript", "openai"]
---

# Vercel AI SDK v4: generateText, streamText, and Tools in 2026

## TL;DR

The Vercel AI SDK (package: `ai`) has become the de facto standard for building LLM-powered TypeScript applications. It abstracts OpenAI, Anthropic, Google Gemini, Mistral, and 20+ other providers behind a unified API. Version 4 (first released in November 2024) brought **`generateObject`** for structured outputs, **multi-step tool calling** (agents that chain tool invocations automatically), **`useObject`** for streaming structured data to React, and **provider middleware** for logging and caching. If you're building AI features in a Next.js or Node.js app in 2026, the AI SDK's `generateText` / `streamText` pattern is the starting point.

## Key Takeaways

- **`generateText`** is for non-streaming LLM calls with text output — simple Q&A, batch processing, classification
- **`streamText`** is for real-time streaming — chat interfaces, progressive content generation, anything where waiting for the full response hurts UX
- **`generateObject`** returns structured, schema-validated objects — the single best feature in v4 for structured AI outputs without prompt engineering
- **Tool calling (function calling)** lets LLMs invoke TypeScript functions — the foundation of AI agents
- **`maxSteps`** enables automatic multi-step tool execution — the LLM calls a tool, gets results, calls another tool, without your code managing the loop
- **Provider middleware** in v4 lets you add caching, logging, and rate limiting to any provider call
- **npm downloads**: `ai` package ~3.5M/week as of March 2026, up from ~800K/week in early 2024

---

## Why the AI SDK Matters

Without an abstraction layer, every LLM provider has a different API:

```typescript
// OpenAI
const response = await openai.chat.completions.create({ model, messages, stream })

// Anthropic
const response = await anthropic.messages.create({ model, messages, max_tokens })

// Google Gemini
const response = await generativeModel.generateContent({ contents, config })
```

Every provider has different request formats, response shapes, streaming protocols, error codes, and token counting semantics. The Vercel AI SDK normalizes all of this:

```typescript
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'

// Same call, different providers (results renamed so both can share one scope)
const { text: openaiText } = await generateText({ model: openai('gpt-4o'), prompt })
const { text: claudeText } = await generateText({ model: anthropic('claude-3-7-sonnet-20250219'), prompt })
```

Swapping providers is a one-line change. No streaming protocol differences, no response shape differences, no error handling differences.

---

## Core APIs in AI SDK v4

### `generateText` — Non-Streaming Text Generation

`generateText` makes a complete LLM request and returns when the model is done:

```typescript
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const { text, usage, finishReason } = await generateText({
  model: openai('gpt-4o-mini'),
  system: 'You are a helpful assistant.',
  prompt: 'Explain the difference between async and defer in HTML script tags.',
})

console.log(text)
// → "The `async` attribute..."

console.log(usage)
// → { promptTokens: 42, completionTokens: 187, totalTokens: 229 }
```

**When to use `generateText`:**
- Classification tasks (tag this email, categorize this content)
- Batch processing where you want to wait for complete output
- Generating content that gets stored (product descriptions, summaries)
- Single-turn Q&A where streaming adds complexity without UX benefit

**The `messages` array** supports full conversation history:

```typescript
const { text } = await generateText({
  model: openai('gpt-4o'),
  messages: [
    { role: 'user', content: 'What is tRPC?' },
    { role: 'assistant', content: 'tRPC is a library...' },
    { role: 'user', content: 'How does it compare to GraphQL?' },
  ],
})
```

### `streamText` — Streaming Text Generation

`streamText` streams tokens as they're generated — essential for chat interfaces where users shouldn't wait 3–5 seconds for a response:

```typescript
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'

const result = streamText({
  model: openai('gpt-4o'),
  prompt: 'Write a haiku about TypeScript.',
})

// Stream to stdout
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

// Or, inside a Next.js Route Handler, return the stream as the response instead:
// return result.toDataStreamResponse()
```

**`toDataStreamResponse()`** wraps the stream in a `Response` whose body uses the [Vercel AI Data Stream format](https://sdk.vercel.ai/docs/ai-sdk-ui/stream-protocol#ai-stream-protocol), the protocol that the `useChat` and `useCompletion` React hooks understand natively.

**In a Next.js App Router route handler:**

```typescript
// app/api/chat/route.ts
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'

export async function POST(req: Request) {
  const { messages } = await req.json()

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
  })

  return result.toDataStreamResponse()
}
```

**Client-side with `useChat`:**

```tsx
// app/chat/page.tsx
'use client'
import { useChat } from 'ai/react'

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit } = useChat()

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  )
}
```

`useChat` manages message state, streaming, and error handling — the entire chat pattern in ~20 lines.

### `generateObject` — Structured Output

**`generateObject`** is the feature that makes AI SDK v4 indispensable for production applications. Instead of returning text, it returns a schema-validated TypeScript object:

```typescript
import { generateObject } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

const { object } = await generateObject({
  model: openai('gpt-4o'),
  schema: z.object({
    title: z.string(),
    tags: z.array(z.string()).max(5),
    sentiment: z.enum(['positive', 'neutral', 'negative']),
    confidence: z.number().min(0).max(1),
  }),
  prompt: 'Analyze this review: "Great product, fast shipping, would buy again!"',
})

// object is typed as:
// {
//   title: string,
//   tags: string[],
//   sentiment: 'positive' | 'neutral' | 'negative',
//   confidence: number
// }
console.log(object.sentiment) // → 'positive'
console.log(object.confidence) // → 0.97
```

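Under the hood, `generateObject` uses the model's native JSON mode or function calling to steer it toward valid JSON, then validates the result against your Zod schema. If the output cannot be parsed or fails validation, the call throws instead of returning a malformed object, so wrap it in `try`/`catch`. Note that `maxRetries` covers failed API calls, not schema mismatches.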
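The parse-then-validate step can be pictured roughly as follows. This is a dependency-free sketch, not the SDK's implementation: `validateReview` and `ValidationResult` are hypothetical names, and the hand-written checks stand in for the Zod schema from the example above.

```typescript
// Hypothetical, dependency-free stand-in for the Zod schema in the example above.
type Sentiment = 'positive' | 'neutral' | 'negative'

interface Review {
  title: string
  tags: string[]
  sentiment: Sentiment
  confidence: number
}

type ValidationResult =
  | { ok: true; value: Review }
  | { ok: false; issues: string[] }

const SENTIMENTS: readonly string[] = ['positive', 'neutral', 'negative']

// Parse raw model output, then check every field before trusting it.
function validateReview(rawJson: string): ValidationResult {
  let data: unknown
  try {
    data = JSON.parse(rawJson)
  } catch {
    return { ok: false, issues: ['output is not valid JSON'] }
  }
  if (typeof data !== 'object' || data === null) {
    return { ok: false, issues: ['output is not a JSON object'] }
  }

  const candidate = data as Record<string, unknown>
  const title = candidate['title']
  const tags = candidate['tags']
  const sentiment = candidate['sentiment']
  const confidence = candidate['confidence']
  const issues: string[] = []

  if (typeof title !== 'string') issues.push('title must be a string')
  if (!Array.isArray(tags) || tags.length > 5 || !tags.every(t => typeof t === 'string')) {
    issues.push('tags must be an array of at most 5 strings')
  }
  if (typeof sentiment !== 'string' || !SENTIMENTS.includes(sentiment)) {
    issues.push('sentiment must be positive, neutral, or negative')
  }
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    issues.push('confidence must be a number between 0 and 1')
  }

  return issues.length === 0
    ? { ok: true, value: candidate as unknown as Review }
    : { ok: false, issues }
}
```

In real code the schema passed to `generateObject` does this work; the sketch only shows why the returned object can be trusted to match its type.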

**`streamObject`** — the streaming equivalent — progressively yields object properties as they arrive, enabling UI that populates fields in real time rather than waiting for the complete object.

---

## Tool Calling in v4

Tools let LLMs call TypeScript functions during generation — the primitive behind AI agents, RAG, and any LLM that needs to take actions or retrieve information.

### Defining Tools

```typescript
import { generateText, tool } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

const { text, toolCalls, toolResults } = await generateText({
  model: openai('gpt-4o'),
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({
        city: z.string().describe('The city name'),
        unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
      }),
      execute: async ({ city, unit }) => {
        // Your actual implementation
        return { temperature: 18, condition: 'partly cloudy', unit }
      },
    }),
  },
  prompt: 'What is the weather like in Paris right now?',
})
```

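The LLM decides when to call tools and what parameters to pass; the SDK validates those parameters against the schema and runs your `execute` function, which can call databases, APIs, file systems, or anything else. In a single-step call like this one, the outcome comes back in `toolResults`. The model only folds tool results into a text answer when it is allowed further steps, which is what `maxSteps` controls.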

### Multi-Step Tool Calling with `maxSteps`

**`maxSteps`** is the v4 feature that enables real AI agents without managing a manual loop:

```typescript
const { text, steps } = await generateText({
  model: openai('gpt-4o'),
  maxSteps: 5,  // Allow up to 5 model calls (tool rounds plus the final answer)
  tools: {
    searchDocs: tool({ ... }),
    fetchPage: tool({ ... }),
    summarize: tool({ ... }),
  },
  prompt: 'Find the changelog for Drizzle ORM and summarize the v1.0 release.',
})
```

Without `maxSteps`, after the LLM makes a tool call, you need to send the tool result back, wait for another response, check if it wants to call another tool, and repeat manually. With `maxSteps: 5`, the SDK manages this loop automatically — the LLM can call `searchDocs`, get results, call `fetchPage` on a result, get that content, then call `summarize` — all in a single `generateText` call.

The `steps` array in the response contains every round of the agent loop — useful for debugging and displaying intermediate reasoning.
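To make the loop concrete, here is a hand-rolled sketch of the control flow that `maxSteps` takes off your hands. It is an illustration under stated assumptions, not the SDK's internals: `runAgentLoop`, `ModelTurn`, and the scripted model are hypothetical, and the "model" is a stub that always searches, then fetches, then answers.

```typescript
// A hand-rolled version of the loop that `maxSteps` automates.
// All names here are hypothetical; a scripted stub stands in for the LLM.
type ToolCall = { toolName: string; args: Record<string, unknown> }

type ModelTurn =
  | { type: 'tool-call'; call: ToolCall }
  | { type: 'text'; text: string }

type ToolFn = (args: Record<string, unknown>) => Promise<unknown>
type HistoryEntry = { call: ToolCall; result: unknown }
type Model = (history: HistoryEntry[]) => Promise<ModelTurn>

async function runAgentLoop(
  model: Model,
  tools: Record<string, ToolFn>,
  maxSteps: number,
): Promise<{ text: string; steps: HistoryEntry[] }> {
  const steps: HistoryEntry[] = []

  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(steps)

    // The model produced a final answer: stop looping.
    if (turn.type === 'text') return { text: turn.text, steps }

    // Otherwise run the requested tool and feed the result back in.
    const toolFn = tools[turn.call.toolName]
    if (!toolFn) throw new Error(`Unknown tool: ${turn.call.toolName}`)
    steps.push({ call: turn.call, result: await toolFn(turn.call.args) })
  }

  // Step budget exhausted before the model produced any text.
  return { text: '', steps }
}

// Scripted stand-in for an LLM: search first, then fetch, then answer.
const scriptedModel: Model = async history => {
  const last = history[history.length - 1]
  if (!last) {
    return { type: 'tool-call', call: { toolName: 'searchDocs', args: { query: 'changelog' } } }
  }
  if (history.length === 1) {
    return { type: 'tool-call', call: { toolName: 'fetchPage', args: { url: String(last.result) } } }
  }
  return { type: 'text', text: `Summary of ${String(last.result)}` }
}

const demoTools: Record<string, ToolFn> = {
  searchDocs: async () => 'https://example.com/changelog',
  fetchPage: async args => `contents of ${String(args['url'])}`,
}
```

Calling `runAgentLoop(scriptedModel, demoTools, 5)` runs two tool rounds and then returns text, while a budget of `1` stops after the first tool call with empty text. That mirrors why a single-step call can return tool results but no answer.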

---

## `useObject` — Streaming Structured Data to React

A v4 addition that pairs with `streamObject` for React UIs that populate progressively:

```tsx
'use client'
import { experimental_useObject as useObject } from 'ai/react'
import { z } from 'zod'

const schema = z.object({
  summary: z.string(),
  keyPoints: z.array(z.string()),
  readingTime: z.number(),
})

export default function SummaryPage() {
  const { object, submit, isLoading } = useObject({
    api: '/api/summarize',
    schema,
  })

  return (
    <div>
      <button onClick={() => submit({ url: 'https://example.com/article' })}>
        Summarize
      </button>
      {isLoading && <p>Loading...</p>}
      {object?.summary && <p>{object.summary}</p>}
      <ul>
        {object?.keyPoints?.map(point => <li key={point}>{point}</li>)}
      </ul>
    </div>
  )
}
```

The `object` is typed from the schema and populates field by field as the LLM generates — `object.summary` might be available before `object.keyPoints` finishes streaming.
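Because any field may still be missing mid-stream, rendering code has to treat every property as optional. The sketch below shows one way to do that outside React. `renderLines` and the sample snapshots are hypothetical; only the field names come from the schema above.

```typescript
// Hypothetical helper: turns a partially streamed summary into display lines.
// The field names mirror the schema above; the snapshots below are made up.
interface Summary {
  summary: string
  keyPoints: string[]
  readingTime: number
}

// While streaming, any field may still be missing.
type PartialSummary = Partial<Summary>

function renderLines(partial: PartialSummary | undefined): string[] {
  const lines: string[] = []
  if (partial?.summary) lines.push(partial.summary)
  for (const point of partial?.keyPoints ?? []) lines.push(`- ${point}`)
  if (partial?.readingTime !== undefined) lines.push(`${partial.readingTime} min read`)
  return lines
}

// Snapshots as they might arrive over time, each more complete than the last.
const snapshots: Array<PartialSummary | undefined> = [
  undefined,
  { summary: 'AI SDK overview' },
  { summary: 'AI SDK overview', keyPoints: ['Unified API'] },
  { summary: 'AI SDK overview', keyPoints: ['Unified API', 'Streaming'], readingTime: 4 },
]

for (const snapshot of snapshots) {
  console.log(renderLines(snapshot))
}
```

The same optional-chaining habit applies inside the component: never assume a later field exists just because an earlier one has arrived.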

---

## Provider Middleware

v4 introduced **provider middleware** — a composable layer for cross-cutting concerns:

```typescript
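import { wrapLanguageModel, extractReasoningMiddleware } from 'ai'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'

// Cache responses (great for development/testing).
// `cache` is assumed to be an app-provided key-value client (for example Redis) with async get/set.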
const cachedOpenAI = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: {
    wrapGenerate: async ({ doGenerate, params }) => {
      const key = JSON.stringify(params)
      const cached = await cache.get(key)
      if (cached) return cached

      const result = await doGenerate()
      await cache.set(key, result, { ex: 3600 })
      return result
    },
  },
})

// Extract chain-of-thought reasoning from extended thinking models
const reasoningModel = wrapLanguageModel({
  model: anthropic('claude-3-7-sonnet-20250219'),
  middleware: extractReasoningMiddleware({ tagName: 'antml:thinking' }),
})
```

**Common middleware patterns:**
- **Caching**: Store responses in Redis during development to avoid API costs during iteration
- **Logging**: Record all LLM calls with timing, token usage, and cost estimates
- **Fallback**: Try the primary model, fall back to a cheaper model on error or timeout
- **Rate limiting**: Enforce per-user token budgets before hitting the provider
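The logging pattern can be sketched in the same `wrapGenerate({ doGenerate, params })` shape used by the caching example. The types below are local stand-ins rather than the SDK's own, and the `usage` field names on the result are an assumption; `createLoggingMiddleware` and `CallLogEntry` are hypothetical names.

```typescript
// Local stand-ins for the SDK types: only the fields this sketch touches.
// The `usage` field names on the result are an assumption.
interface GenerateResult {
  text?: string
  usage: { promptTokens: number; completionTokens: number }
}

interface WrapGenerateOptions {
  doGenerate: () => Promise<GenerateResult>
  params: unknown
}

interface CallLogEntry {
  durationMs: number
  promptTokens: number
  completionTokens: number
  ok: boolean
}

// Builds a middleware object that records timing and token usage for each call.
function createLoggingMiddleware(sink: (entry: CallLogEntry) => void) {
  return {
    wrapGenerate: async ({ doGenerate }: WrapGenerateOptions): Promise<GenerateResult> => {
      const startedAt = Date.now()
      try {
        const result = await doGenerate()
        sink({
          durationMs: Date.now() - startedAt,
          promptTokens: result.usage.promptTokens,
          completionTokens: result.usage.completionTokens,
          ok: true,
        })
        return result
      } catch (error) {
        // Log the failure, then rethrow so callers and retries still see it.
        sink({ durationMs: Date.now() - startedAt, promptTokens: 0, completionTokens: 0, ok: false })
        throw error
      }
    },
  }
}
```

Taking the sink as a parameter keeps the middleware independent of where logs end up, whether that is the console, a metrics client, or an in-memory array in tests.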

---

## AI SDK vs Alternatives

| Factor | AI SDK (`ai`) | LangChain.js | Direct SDK |
|--------|---------------|-------------|------------|
| Provider abstraction | ✅ 20+ providers | ✅ Many providers | ❌ One provider |
| Streaming | ✅ Native | ⚠️ Complex | Varies |
| React hooks | ✅ useChat, useCompletion, useObject | ❌ None | ❌ |
| Structured output | ✅ generateObject | ✅ (via chains) | ⚠️ Manual |
| Bundle size | ~45kB | ~500kB+ | Provider-specific |
| Agent support | ✅ maxSteps | ✅ LangGraph | ❌ |
| npm downloads | ~3.5M/week | ~1.2M/week | N/A |
| Learning curve | Low | High | Low |

For most Next.js applications building AI features, the AI SDK is the right starting point. LangChain.js is worth considering for complex agent workflows with memory, retrieval, and multi-agent orchestration.

---

## When Not to Use the AI SDK

The AI SDK is not always the right choice:

- **Pure Python data pipelines** — use the provider's Python SDK directly or LangChain Python
- **Complex agentic workflows** — consider Mastra (TypeScript), LangGraph (Python/TS), or AutoGen
- **On-device inference** — the AI SDK targets API-based providers; for on-device models, use onnxruntime or llama.cpp bindings
- **High-throughput batch processing** — the AI SDK's abstractions add overhead; direct provider calls may be better for 100K+ batch jobs

---

## Error Handling and Retry Patterns in Production

Production LLM applications need robust error handling because provider APIs have rate limits, transient network failures, and occasional model-level errors (content filters, context length exceeded). The AI SDK surfaces these as typed errors that you can pattern-match against: `APICallError` for HTTP errors from the provider, `NoTextGeneratedError` when a tool-only response produces no text, and `RetryError` when all retry attempts are exhausted.

The SDK's built-in `maxRetries` option (default: 2) handles transient failures automatically with exponential backoff. For production use, configure a higher value and add provider middleware for circuit-breaking: if a provider returns 5xx errors for three consecutive requests, pause further calls for 30 seconds before retrying. This prevents a cascading failure where your application hammers a degraded provider endpoint. Cloudflare Workers AI and Together AI are common fallback providers when OpenAI's API is degraded — the AI SDK's one-line provider swap makes this fallback straightforward to implement.
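The circuit-breaking rule described above (three consecutive failures, then a 30-second pause) can be sketched as a small state machine. This is a minimal illustration, not something the SDK provides: the `CircuitBreaker` class and its method names are hypothetical, and timestamps are passed in so the behavior is easy to test.

```typescript
// Minimal circuit breaker: opens after N consecutive failures, then blocks
// calls for a cooldown period. Class and method names are hypothetical.
class CircuitBreaker {
  private consecutiveFailures = 0
  private openUntil = 0
  private readonly failureThreshold: number
  private readonly cooldownMs: number

  constructor(failureThreshold: number = 3, cooldownMs: number = 30000) {
    this.failureThreshold = failureThreshold
    this.cooldownMs = cooldownMs
  }

  // True if a request may be sent at time `now` (milliseconds).
  canRequest(now: number = Date.now()): boolean {
    return now >= this.openUntil
  }

  // Any successful call resets the failure streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0
  }

  // Call this for 5xx responses; opens the circuit once the threshold is hit.
  recordFailure(now: number = Date.now()): void {
    this.consecutiveFailures += 1
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openUntil = now + this.cooldownMs
      this.consecutiveFailures = 0
    }
  }
}
```

A middleware would check `canRequest()` before calling the provider, then report the outcome with `recordSuccess()` or `recordFailure()`. While the circuit is open, the caller can route to a fallback provider instead.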

## Cost Management and Token Budgets

The `usage` object returned by `generateText` (and exposed on the `streamText` result as a promise that resolves once the stream finishes) includes `promptTokens`, `completionTokens`, and `totalTokens`. In production, log this data per request and per user to track costs. A practical approach is to add a provider middleware that accumulates token usage and writes it to a time-series database, which enables per-feature cost dashboards. With `gpt-4o` at $2.50/1M input tokens and $10/1M output tokens (as of early 2026), a chatbot that generates one 200-token response per day for each of 1,000 daily active users produces 200,000 completion tokens, or roughly $2/day. That is tractable, but worth monitoring as usage scales, and prompt tokens come on top.
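The arithmetic behind that estimate is simple enough to keep in a helper. The sketch below assumes the `gpt-4o` prices quoted above and one response per user per day; `estimateCostUsd` and the `Pricing` shape are hypothetical names, not part of the SDK.

```typescript
// Prices in USD per 1M tokens. These are the gpt-4o figures quoted above;
// treat them as inputs to update, not as current pricing.
interface Pricing {
  inputPerMillion: number
  outputPerMillion: number
}

interface Usage {
  promptTokens: number
  completionTokens: number
}

const GPT_4O_PRICING: Pricing = { inputPerMillion: 2.5, outputPerMillion: 10 }

// Hypothetical helper: converts a usage object into a dollar estimate.
function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  const inputCost = (usage.promptTokens * pricing.inputPerMillion) / 1000000
  const outputCost = (usage.completionTokens * pricing.outputPerMillion) / 1000000
  return inputCost + outputCost
}

// 1,000 users x one 200-token response each = 200,000 completion tokens per day.
const dailyCompletionCost = estimateCostUsd(
  { promptTokens: 0, completionTokens: 200 * 1000 },
  GPT_4O_PRICING,
)
console.log(dailyCompletionCost) // → 2
```

Feeding each request's real `usage` object through the same function gives a per-request cost that can be summed per user or per feature.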

The `maxTokens` parameter limits output length and is critical for cost control in batch processing jobs. Without it, a prompt that unexpectedly triggers verbose output (summarization tasks, chain-of-thought prompts) can generate ten times the expected tokens. Setting `maxTokens` based on your use case's 99th percentile expected output length, combined with the `finishReason === "length"` check in the response, provides both cost control and visibility into when output was truncated.
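One way to pick that cap is to compute a percentile over output lengths you have already logged. The sketch below uses a nearest-rank percentile and a made-up sample. `percentile` and `wasTruncated` are hypothetical helpers, and the only SDK detail assumed is that a truncated response reports `finishReason` as `'length'`, as described above.

```typescript
// Hypothetical helpers for choosing an output cap and spotting truncation.

// Nearest-rank percentile over observed output lengths (in tokens).
function percentile(values: number[], pct: number): number {
  if (values.length === 0) throw new Error('percentile needs at least one value')
  const sorted = [...values].sort((a, b) => a - b)
  const rawRank = Math.ceil((pct * sorted.length) / 100)
  const rank = Math.min(sorted.length, Math.max(1, rawRank))
  return sorted[rank - 1] as number
}

// The text above treats finishReason === 'length' as the truncation signal.
function wasTruncated(result: { finishReason: string }): boolean {
  return result.finishReason === 'length'
}

// Made-up sample of completion token counts from earlier requests.
const observedLengths = [120, 180, 150, 210, 900, 160, 175, 190, 140, 165]
const suggestedMaxTokens = percentile(observedLengths, 99)
console.log(suggestedMaxTokens) // → 900
```

With a sample this small, the 99th percentile is just the largest value; the calculation becomes more useful once thousands of logged responses smooth out the outliers. Counting how often `wasTruncated` returns `true` then shows whether the cap is too tight.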

---

## Methodology

- npm download data from npmjs.com, March 2026 weekly averages
- Code examples tested against `ai` v4.x (latest stable), `@ai-sdk/openai` v1.x, `@ai-sdk/anthropic` v1.x
- Sources: Vercel AI SDK official documentation, GitHub repo, AI SDK changelog

---

*Compare AI SDK with other LLM client libraries on [PkgPulse](/compare/ai-vs-langchain) — download trends, bundle sizes, and dependency health.*

*Related: [Vercel AI SDK vs OpenAI SDK vs Anthropic SDK 2026](/guides/vercel-ai-sdk-vs-openai-sdk-vs-anthropic-sdk-2026) · [LangChain.js vs Vercel AI SDK 2026](/guides/langchainjs-vs-vercel-ai-sdk-2026) · [Mastra vs LangChain.js vs Genkit 2026](/guides/mastra-vs-langchain-js-vs-genkit-2026)*