Stagehand vs Playwright AI vs Browser Use: AI Web Automation in 2026
TL;DR
Stagehand (by Browserbase) is the best developer experience for AI web automation — a clean TypeScript API built on Playwright that lets you describe what to do in plain English. browser-use is the Python-first agentic browser control library powering many open-source AI agents. Playwright with @playwright/test + LLM assistance is the structured approach — use Playwright's reliable locators, add LLM-powered code generation for stubborn selectors, but keep the execution deterministic. In 2026, use Stagehand for prototyping and internal tooling; use structured Playwright + AI for production automation that must be reliable.
Key Takeaways
- Stagehand GitHub stars: ~8k (Feb 2026) — explosive growth since its open-source release in late 2024
- browser-use GitHub stars: ~35k — the dominant Python-first AI browser library; JS port in progress
- AI web automation costs real money: each act() call in Stagehand consumes ~1–3k tokens, costing $0.003–$0.01 per action
- Reliability gap is real: AI-powered automation is 85–95% reliable vs 99%+ for Playwright selectors
- Stagehand supports computer use models (Anthropic Claude Sonnet, GPT-4o) for screenshot-based interaction
- All three support CDP (Chrome DevTools Protocol) — they all control Chromium/Chrome under the hood
- The sweet spot: use AI to generate locators, Playwright to execute — best of both worlds
Why AI Web Automation?
Traditional web automation (Playwright, Puppeteer, Selenium) requires precise CSS selectors or ARIA labels. Maintenance is expensive: every UI change breaks tests. AI-powered automation changes the equation — you describe what to do rather than how to find elements.
The tradeoff: AI automation is slower, more expensive, and less reliable than deterministic automation. It excels at:
- One-off tasks where writing precise selectors isn't worth it
- Rapidly changing UIs where maintaining selectors is a maintenance burden
- Natural language scripts for non-engineers to describe automation tasks
- AI agents that need to navigate arbitrary websites
Stagehand: The TypeScript-First AI Browser
Stagehand wraps Playwright with AI superpowers. It adds three key methods: act() for performing actions, extract() for extracting structured data, and observe() for understanding the current page state.
Installation
npm install @browserbasehq/stagehand
# Requires: API keys for OpenAI or Anthropic
Basic Usage
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
const stagehand = new Stagehand({
  env: "LOCAL", // or "BROWSERBASE" for cloud
  modelName: "gpt-4o", // or "claude-3-5-sonnet-20241022"
  modelClientOptions: {
    apiKey: process.env.OPENAI_API_KEY,
  },
  verbose: 1,
  enableCaching: true, // Cache LLM calls for repeated actions
});
await stagehand.init();
const { page } = stagehand;
// Navigate
await page.goto("https://github.com");
// act(): describe what to do in plain English
await stagehand.act({ action: "Click the Sign in button" });
await stagehand.act({ action: "Fill the username field with 'your-username'" });
await stagehand.act({ action: "Fill the password field with 'your-password'" });
await stagehand.act({ action: "Click the 'Sign in' button to submit the form" });
// extract(): get structured data from the page
const repos = await stagehand.extract({
  instruction: "Extract the list of pinned repositories",
  schema: z.object({
    repositories: z.array(
      z.object({
        name: z.string(),
        description: z.string().nullable(),
        language: z.string().nullable(),
        stars: z.number(),
      })
    ),
  }),
});
console.log(repos.repositories);
// [{ name: "my-repo", description: "...", language: "TypeScript", stars: 42 }]
await stagehand.close();
Computer Use Mode (Screenshot-Based)
const stagehand = new Stagehand({
  env: "LOCAL",
  modelName: "claude-3-5-sonnet-20241022", // Anthropic's computer use model
  modelClientOptions: { apiKey: process.env.ANTHROPIC_API_KEY },
  // Enable computer use for screenshot-based interaction (bypasses DOM)
  enableComputerUse: true,
});
await stagehand.init();
const { page } = stagehand; // page must be taken from the initialized instance
await page.goto("https://complex-spa.example.com");
// Works even on canvas-heavy apps, complex custom UIs
await stagehand.act({
  action: "Click the blue 'Create Project' button in the top navigation",
});
Multi-Step Agent Flow
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
async function scrapeLinkedInProfile(url: string) {
  const stagehand = new Stagehand({
    env: "BROWSERBASE", // Cloud browser with stealth features
    browserbaseApiKey: process.env.BROWSERBASE_API_KEY,
    browserbaseProjectId: process.env.BROWSERBASE_PROJECT_ID,
    modelName: "gpt-4o-mini", // Cheaper model for simple tasks
    modelClientOptions: { apiKey: process.env.OPENAI_API_KEY },
  });
  await stagehand.init();
  const { page } = stagehand;
  await page.goto(url);
  // Wait for the page to finish loading (observe() inspects elements; it does not wait)
  await page.waitForLoadState("load");
  const profile = await stagehand.extract({
    instruction: "Extract the person's profile information",
    schema: z.object({
      name: z.string(),
      headline: z.string(),
      location: z.string().optional(),
      connections: z.string().optional(),
      about: z.string().optional(),
      experience: z.array(
        z.object({
          title: z.string(),
          company: z.string(),
          duration: z.string().optional(),
        })
      ),
    }),
  });
  await stagehand.close();
  return profile;
}
Stagehand with Browserbase (Cloud)
// Browserbase provides managed browsers with:
// - Stealth mode (harder to detect as bot)
// - Rotating residential proxies
// - Session recording and replay
// - CAPTCHA solving
const stagehand = new Stagehand({
  env: "BROWSERBASE",
  browserbaseApiKey: process.env.BROWSERBASE_API_KEY!,
  browserbaseProjectId: process.env.BROWSERBASE_PROJECT_ID!,
  modelName: "claude-3-5-sonnet-20241022",
  modelClientOptions: { apiKey: process.env.ANTHROPIC_API_KEY! },
  browserbaseSessionCreateParams: {
    projectId: process.env.BROWSERBASE_PROJECT_ID!,
    browserSettings: {
      viewport: { width: 1920, height: 1080 },
    },
  },
});
Playwright with AI-Assisted Automation
Traditional Playwright + AI is about using LLMs to generate selectors and test code, not to execute actions. This keeps execution deterministic while reducing the cost of writing automation.
AI-Powered Locator Generation
import { test, expect, type Page } from "@playwright/test";
import Anthropic from "@anthropic-ai/sdk";
// Utility: ask an LLM to suggest a locator for a described element
async function findSelector(page: Page, description: string): Promise<string> {
  // Take a screenshot and capture the ARIA snapshot (returned as YAML text)
  const screenshot = await page.screenshot({ type: "png" });
  const ariaSnapshot = await page.locator("body").ariaSnapshot();
  const claude = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
  const response = await claude.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 500,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: {
              type: "base64",
              media_type: "image/png",
              data: screenshot.toString("base64"),
            },
          },
          {
            type: "text",
            text: `Given this page, provide the best Playwright locator for: "${description}"
ARIA tree: ${ariaSnapshot}
Respond with only the locator string, e.g.: page.getByRole('button', { name: 'Submit' })`,
          },
        ],
      },
    ],
  });
  // The first content block should be text; guard against other block types
  const block = response.content[0];
  return block?.type === "text" ? block.text.trim() : "";
}
// Use in tests
test("submit form", async ({ page }) => {
  await page.goto("https://app.example.com/checkout");
  // Authoring time: ask for a locator suggestion and review it
  const suggestedLocator = await findSelector(page, "the checkout submit button");
  console.log(suggestedLocator);
  // e.g. page.getByRole('button', { name: 'Complete Order' })
  // Run time: the reviewed locator is hard-coded, so execution stays deterministic
  await page.getByRole("button", { name: "Complete Order" }).click();
  await expect(page.getByText("Order confirmed")).toBeVisible();
});
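The helper above returns the model's suggestion as a plain string, which still has to become a real locator somehow. One cautious option, sketched here on the assumption that the model answers in the `page.getByRole(...)` form requested in the prompt, is to parse the string rather than evaluate it. `parseRoleLocator` and `RoleLocatorSpec` are hypothetical names for illustration, not part of Playwright.

```typescript
// Hypothetical helper: parse a model-suggested locator string instead of eval-ing it.
// Only the `page.getByRole(role, { name })` shape is accepted; anything else is rejected.
interface RoleLocatorSpec {
  role: string;
  name?: string;
}

const ROLE_LOCATOR_PATTERN =
  /^page\.getByRole\(\s*(['"])([a-z]+)\1\s*(?:,\s*\{\s*name:\s*(['"])(.*?)\3\s*\}\s*)?\)$/;

function parseRoleLocator(suggestion: string): RoleLocatorSpec | null {
  const match = ROLE_LOCATOR_PATTERN.exec(suggestion.trim());
  if (!match) return null; // unexpected shape: leave it for a human to review
  const role = match[2];
  if (!role) return null;
  const spec: RoleLocatorSpec = { role };
  const name = match[4];
  if (name !== undefined) spec.name = name;
  return spec;
}

console.log(parseRoleLocator("page.getByRole('button', { name: 'Complete Order' })"));
console.log(parseRoleLocator("document.cookie")); // null
```

The parsed role and name can then be checked against the ARIA roles you expect and passed to `page.getByRole`, so no model output is ever executed as code.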
@playwright/test with ai Fixture (Community Pattern)
// playwright.ai.fixture.ts: community pattern for AI-assisted tests
import { test as base, expect } from "@playwright/test";
import OpenAI from "openai";
type AIFixture = {
  ai: {
    act: (instruction: string) => Promise<void>;
    assert: (condition: string) => Promise<void>;
  };
};
const test = base.extend<AIFixture>({
  ai: async ({ page }, use) => {
    const openai = new OpenAI();
    const act = async (instruction: string) => {
      // Playwright returns a Buffer; encode it to base64 for the data URL
      const screenshot = (await page.screenshot({ type: "png" })).toString("base64");
      const html = await page.content();
      const response = await openai.chat.completions.create({
        model: "gpt-4o",
        messages: [
          {
            role: "user",
            content: [
              { type: "image_url", image_url: { url: `data:image/png;base64,${screenshot}` } },
              {
                type: "text",
                text: `Given the screenshot, provide Playwright code to: ${instruction}
HTML context: ${html.slice(0, 3000)}
Respond with valid JavaScript only.`,
              },
            ],
          },
        ],
      });
      // Run the generated code in Node with `page` in scope, awaiting any async calls.
      // This executes unreviewed model output, so keep it to throwaway environments.
      const code = response.choices[0].message.content ?? "";
      const run = new Function("page", `return (async () => { ${code} })();`);
      await run(page);
    };
    // assert is left as a no-op placeholder in this pattern
    await use({ act, assert: async () => {} });
  },
});
// Usage
test("user can checkout", async ({ page, ai }) => {
  await page.goto("/checkout");
  await ai.act("Fill in the shipping address form with test data");
  await ai.act("Select the standard shipping option");
  await ai.act("Click the place order button");
  await expect(page.getByText("Order confirmed")).toBeVisible();
});
browser-use: Python-First, JS Coming
browser-use is the dominant open-source AI browser control library, primarily Python. The JavaScript/TypeScript version (browser-use-js) is in early development but gaining traction.
Python Usage (Reference — most popular form)
# Python example (reference)
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
async def main():
    agent = Agent(
        task="Find the top 3 TypeScript repositories on GitHub this week",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    # agent.run() is a coroutine, so it must be awaited inside an event loop
    result = await agent.run()
    print(result)
asyncio.run(main())
JavaScript Port (browser-use-js — experimental)
// browser-use-js: early stage, API may change
import { Agent } from "browser-use-js";
import { ChatOpenAI } from "@langchain/openai";
const agent = new Agent({
  task: "Go to GitHub and find the trending TypeScript repositories this week",
  llm: new ChatOpenAI({ modelName: "gpt-4o" }),
  browserConfig: {
    headless: false, // Visual mode for debugging
    chromePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  },
});
const result = await agent.run();
console.log(result.finalResult());
console.log("Actions taken:", result.actionHistory());
Feature Comparison
| Feature | Stagehand | Playwright + AI | browser-use (JS) |
|---|---|---|---|
| Language | TypeScript | TypeScript | TypeScript (port) |
| Maturity | ✅ Stable | ✅ Stable | ⚠️ Experimental |
| AI integration | Native (act/extract) | Manual (code generation) | Native (agentic) |
| Execution mode | AI-powered | Deterministic | AI-powered |
| Reliability | ~85-95% | ~99%+ | ~80-90% |
| Cost per action | $0.003–$0.01 | $0 (code gen only) | $0.003–$0.015 |
| Playwright under hood | ✅ | ✅ | Via Playwright |
| Cloud browser support | ✅ Browserbase | ✅ (BrowserStack, etc.) | ✅ |
| Computer use (screenshot AI) | ✅ | ❌ | ✅ |
| Structured data extraction | ✅ (Zod schema) | Manual | ✅ |
| CAPTCHA handling | Via Browserbase | Manual | Manual |
| GitHub stars | 8k | 70k (Playwright) | 35k (Python) |
| npm downloads/week | ~50k | 5M+ (Playwright) | N/A (Python) |
Cost Analysis
Running AI web automation at scale requires careful cost management:
Simple action (click, fill form): ~1,000 tokens × $0.000003 = $0.003/action
Complex action (extract data, navigate): ~3,000 tokens × $0.000003 = $0.009/action
Computer use (screenshot): ~2,000 tokens × $0.000003 + vision = $0.01–$0.03/action
100 automations/day × $0.005 avg = $0.50/day = $15/month
1,000 automations/day = $150/month
10,000 automations/day = $1,500/month
Vs. Playwright (deterministic): $0/month for compute (just browser costs)
For high-volume automation, the cost difference between AI-powered and traditional Playwright is significant. Use AI automation where it provides value (unstable UIs, one-off tasks); use deterministic Playwright everywhere else.
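The arithmetic above can be reproduced with a small estimator. This is a minimal sketch that assumes the same flat $3 per 1M tokens used in the figures above and treats one automation as one action; `costPerAction` and `monthlyCost` are illustrative names, and real bills will differ once output tokens and vision inputs are priced separately.

```typescript
// Assumed flat price of $3 per 1M tokens, matching the figures in this section.
const PRICE_PER_TOKEN_USD = 3 / 1e6;

// Cost of a single AI action given the tokens it consumes.
function costPerAction(tokens: number, pricePerToken: number = PRICE_PER_TOKEN_USD): number {
  return tokens * pricePerToken;
}

// Monthly spend for a daily volume at an average per-action cost.
function monthlyCost(actionsPerDay: number, avgCostPerAction: number, daysPerMonth = 30): number {
  return actionsPerDay * avgCostPerAction * daysPerMonth;
}

console.log(costPerAction(1000).toFixed(3)); // "0.003"
console.log(costPerAction(3000).toFixed(3)); // "0.009"
console.log(monthlyCost(100, 0.005).toFixed(2)); // "15.00"
console.log(monthlyCost(10000, 0.005).toFixed(2)); // "1500.00"
```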
When to Use Each
Choose Stagehand if:
- You're building internal tools where an engineer needs to automate repetitive browser tasks
- You need to automate unstable UIs (SPAs that change frequently, or custom Canvas-based apps)
- You want structured data extraction with type safety (Zod schemas in extract())
- You're deploying AI agents that need to navigate arbitrary websites
Choose Playwright + AI (selector generation) if:
- You need production-grade reliability (99%+ success rate)
- Cost is a constraint — AI is only used to generate code, not at runtime
- You're building a test suite that runs hundreds of times per day
- Your team already has Playwright expertise
Choose browser-use if:
- You're working in Python primarily and need agentic browser control
- You're building an AI agent that needs to research the web as part of a larger pipeline
- The JS port is mature enough for your use case (verify current status before using)
Caching and Cost Optimization
AI-powered browser automation at scale requires deliberate caching strategies to control LLM costs. Stagehand's enableCaching: true option caches the LLM's response for a given page state and instruction combination — if the same page layout triggers the same act() call, the cached selector result is reused without an additional LLM API call. The cache key is derived from a hash of the page's accessibility tree and the instruction string, so cache hits occur when the page structure is unchanged and the instruction is identical. Enabling caching in development dramatically reduces costs during iteration, and the cache can be persisted to disk or a database for sharing across team members or CI runs.
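To make the cache-key idea concrete, here is a minimal sketch of a cache keyed on the accessibility tree plus the instruction. It is not Stagehand's internal implementation: `SelectorCache`, `cacheKey`, and the FNV-1a hash are assumptions chosen to keep the example dependency-free.

```typescript
// Illustrative only: NOT Stagehand's internal cache. Names and hash choice are assumptions.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // 32-bit FNV prime multiply
  }
  return hash.toString(16);
}

// Hash the two parts separately so text cannot shift between them unnoticed.
function cacheKey(accessibilityTree: string, instruction: string): string {
  return `${fnv1a(accessibilityTree)}:${fnv1a(instruction)}`;
}

class SelectorCache {
  private readonly entries = new Map<string, string>();

  // Returns the cached selector, or undefined when an LLM call is still needed.
  lookup(accessibilityTree: string, instruction: string): string | undefined {
    return this.entries.get(cacheKey(accessibilityTree, instruction));
  }

  // Stores the selector the LLM resolved for this page state and instruction.
  store(accessibilityTree: string, instruction: string, selector: string): void {
    this.entries.set(cacheKey(accessibilityTree, instruction), selector);
  }
}

const cache = new SelectorCache();
cache.store("button 'Sign in'", "Click the Sign in button", "#sign-in");
console.log(cache.lookup("button 'Sign in'", "Click the Sign in button")); // "#sign-in"
console.log(cache.lookup("button 'Log in'", "Click the Sign in button")); // undefined
```

Any change to the page structure or to the wording of the instruction produces a different key, which is why small prompt edits during development invalidate cached results.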
For production automation pipelines, a hybrid approach works best: use Stagehand with caching for the navigation and extraction steps that change infrequently (page structure is stable), and fall back to explicit Playwright selectors for the steps that execute most frequently (form submissions, button clicks in a known UI). This hybrid reduces per-run costs to only the LLM calls needed for the dynamic portions of the workflow. The observe() method in Stagehand is specifically designed for this pattern — it inspects the page and returns structured information about actionable elements, which you can then act on using explicit Playwright locators rather than another act() call.
Reliability Patterns for Production Automation
The 85–95% reliability figure for AI-powered automation versus 99%+ for deterministic Playwright represents a meaningful gap that compounds across multi-step workflows. A 10-step Stagehand workflow where each step is 95% reliable has an overall success rate of 0.95^10, approximately 60% — meaning 40% of runs require a retry or fail entirely. For internal tooling where occasional failures are acceptable and a human monitors the output, this is fine. For production pipelines that must complete reliably on a schedule, this failure rate is problematic without a robust retry and alerting strategy.
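The compounding effect is easy to check numerically. The sketch below assumes steps fail independently, which is optimistic when failures share a cause such as a layout change; the function names are illustrative.

```typescript
// Probability that every step of a workflow succeeds, assuming independent steps.
function workflowSuccessRate(stepReliability: number, steps: number): number {
  return Math.pow(stepReliability, steps);
}

// Per-step reliability when each step may be attempted up to `attempts` times,
// assuming failures are independent between attempts.
function reliabilityWithRetries(stepReliability: number, attempts: number): number {
  return 1 - Math.pow(1 - stepReliability, attempts);
}

console.log(workflowSuccessRate(0.95, 10).toFixed(3)); // "0.599"
console.log(workflowSuccessRate(0.99, 10).toFixed(3)); // "0.904"
console.log(workflowSuccessRate(reliabilityWithRetries(0.95, 3), 10).toFixed(4)); // "0.9988"
```

Under these assumptions, allowing three attempts per step lifts a 10-step workflow from roughly 60% to well above 99%, which is the motivation for per-step retries rather than rerunning the whole workflow.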
The recommended reliability pattern for production AI automation is to wrap each act() call in a retry loop with progressive fallback: attempt the AI action up to 3 times, and if it consistently fails, fall back to a pre-configured Playwright selector if one is available. Stagehand's page object is a full Playwright Page instance — you can mix AI-powered actions and deterministic Playwright selectors in the same automation script. Teams building critical automation pipelines should instrument each act() call with the success/failure outcome and the page state at time of failure, enabling progressive improvement of the explicit-selector fallbacks as failure patterns emerge from production data.
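The retry-then-fallback pattern described above might look like the following. This is a sketch rather than a Stagehand API: `actWithFallback` and `RetryOptions` are hypothetical, and the AI action and the fallback are passed in as plain async functions so the helper works with any library.

```typescript
interface RetryOptions {
  maxAttempts?: number; // AI attempts before falling back (default 3)
  delayMs?: number; // pause between attempts
  onFailure?: (attempt: number, error: unknown) => void; // hook for logging or metrics
}

// Tries the AI action up to maxAttempts times, then the deterministic fallback if given.
// Resolves with which path succeeded; rethrows the last error if nothing worked.
async function actWithFallback(
  aiAction: () => Promise<unknown>,
  fallback?: () => Promise<unknown>,
  options: RetryOptions = {}
): Promise<"ai" | "fallback"> {
  const { maxAttempts = 3, delayMs = 500, onFailure } = options;
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await aiAction();
      return "ai";
    } catch (error) {
      lastError = error;
      onFailure?.(attempt, error);
      if (attempt < maxAttempts && delayMs > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  if (fallback) {
    await fallback();
    return "fallback";
  }
  throw lastError;
}
```

With Stagehand, a call could pass `() => stagehand.act({ action: "Click the Sign in button" })` as the AI action and `() => page.getByRole("link", { name: "Sign in" }).click()` as the fallback. The `onFailure` hook is the natural place to record the outcome and page state for each failed attempt.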
TypeScript and Schema-Driven Extraction
One of Stagehand's most valuable production features is its Zod-schema-driven extract() method, which bridges the gap between LLM outputs (inherently unstructured strings) and TypeScript's type system. By defining a Zod schema for the data you want to extract — structured like an API response type — Stagehand instructs the LLM to return JSON conforming to that schema, then validates and types the response. A failed validation (LLM returned data that doesn't match the schema) triggers a retry with the validation error as additional context, improving success rates on complex extraction tasks. This schema-first extraction pattern eliminates the manual JSON parsing and type assertion code that typically accompanies LLM-based data extraction.
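The validate-and-retry loop described above can be sketched without any library. This is an illustration of the general technique, not Stagehand's internals: `extractWithValidation`, `Validation`, and the `callModel` callback are hypothetical, and the hand-written validator stands in for a schema check.

```typescript
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Asks the model for JSON, validates it, and retries with the validation error as context.
async function extractWithValidation<T>(
  callModel: (prompt: string) => Promise<string>,
  instruction: string,
  validate: (data: unknown) => Validation<T>,
  maxAttempts = 2
): Promise<T> {
  let prompt = instruction;
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(prompt);
    let result: Validation<T> = { ok: false, error: "response was not valid JSON" };
    try {
      result = validate(JSON.parse(raw));
    } catch {
      // JSON.parse threw: keep the default "not valid JSON" error
    }
    if (result.ok) return result.value;
    lastError = result.error;
    // Feed the validation error back so the next attempt can correct it
    prompt = `${instruction}\nThe previous answer was rejected: ${lastError}. Return corrected JSON.`;
  }
  throw new Error(`Extraction failed after ${maxAttempts} attempts: ${lastError}`);
}
```

In a real project the `validate` callback could wrap a Zod schema's `safeParse`, so the same schema drives the prompt, the runtime check, and the TypeScript type of the result.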
For teams building data pipelines that extract structured information from web pages at scale — competitive pricing data, product catalog information, job listings — the Zod schema approach provides both type safety and a natural quality gate. Extraction schemas can be versioned alongside code, and schema changes automatically update the TypeScript types for the extracted data throughout the consuming codebase. The Playwright AI selector generation approach, in contrast, does not have a native structured extraction equivalent — it generates imperative Playwright code to locate and read specific elements, which is more brittle than schema-based extraction when page structure changes, and requires manual type annotation of the extracted values.
Methodology
Data sourced from GitHub repositories (star counts as of February 2026), npm weekly download statistics (January 2026), official documentation, and community discussions on Discord and Twitter/X. Cost estimates use Claude 3.5 Sonnet list pricing ($3/1M input, $15/1M output) as of Q1 2026, applied at the input rate of $0.000003 per token; output tokens cost more, so the per-action figures are lower bounds. Reliability metrics from community benchmarks and Stagehand's published evaluation data. browser-use JS port status verified from official GitHub repository.