got-scraping vs Crawlee vs puppeteer-extra: Advanced Web Scraping in Node.js (2026)
TL;DR
Crawlee is the full web scraping framework from Apify — request queuing, automatic retries, proxy rotation, browser pool management, and both HTTP and browser-based crawlers in one toolkit. got-scraping is got with anti-bot defenses — it generates realistic browser-like HTTP headers, mimics browser TLS fingerprints, and rotates header profiles automatically, so many sites can be scraped without a browser. puppeteer-extra is Puppeteer with plugins — stealth mode to bypass bot detection, ad blocking, reCAPTCHA solving, and other plugin extensions. In 2026: Crawlee for production scraping pipelines, got-scraping for fast HTTP-based scraping, puppeteer-extra for browser automation with stealth.
Key Takeaways
- Crawlee: ~50K weekly downloads — full framework, queue management, proxy rotation, Apify
- got-scraping: ~30K weekly downloads — HTTP scraping with realistic headers, TLS fingerprinting
- puppeteer-extra: ~200K weekly downloads — Puppeteer + stealth plugin, anti-detection
- got-scraping is for HTTP requests — fast, no browser overhead, works for many sites
- puppeteer-extra controls a real browser — handles JavaScript-rendered pages, stealth mode
- Crawlee combines both approaches — use HTTP crawlers or browser crawlers as needed
The Anti-Bot Challenge
Modern websites detect scrapers via:
🔍 HTTP headers — missing or wrong User-Agent, Accept, Accept-Language
🔍 TLS fingerprint — Node.js has a different TLS fingerprint than Chrome
🔍 JavaScript challenges — Cloudflare, PerimeterX, DataDome
🔍 Browser fingerprint — headless Chrome has detectable properties
🔍 Rate limiting — too many requests too fast
🔍 IP reputation — datacenter IPs flagged as bots
Solutions:
got-scraping → Fixes HTTP headers + TLS fingerprint
puppeteer-extra → Fixes browser fingerprint + JS challenges
Crawlee → Framework that orchestrates both approaches
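To make the header problem concrete, here is a sketch contrasting what a bare Node.js HTTP client sends with a browser-like profile. The values are illustrative only; got-scraping generates and varies real profiles per request:

```typescript
// What a bare Node.js HTTP client typically sends (illustrative):
const nodeDefaultHeaders: Record<string, string> = {
  accept: "*/*",
  "user-agent": "got (https://github.com/sindresorhus/got)",
}

// A browser-like profile adds the headers anti-bot systems check for:
function browserLikeHeaders(chromeMajor: number): Record<string, string> {
  return {
    "user-agent": `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/${chromeMajor}.0.0.0 Safari/537.36`,
    accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "sec-ch-ua": `"Chromium";v="${chromeMajor}", "Google Chrome";v="${chromeMajor}"`,
    "sec-fetch-mode": "navigate",
  }
}
```

A detector that sees the first header set (no Sec-Ch-Ua, no Sec-Fetch-*, a library User-Agent) can flag the request before looking at anything else.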
got-scraping
got-scraping — HTTP scraping with realistic headers:
Basic usage
import { gotScraping } from "got-scraping"

// Makes requests that look like a real browser:
const response = await gotScraping({
  url: "https://example.com/products",
  // Automatically generates realistic headers:
  // User-Agent, Accept, Accept-Language, Accept-Encoding
  // Sec-Ch-Ua, Sec-Fetch-* headers (Chrome-like)
})

console.log(response.body) // HTML content

// With proxy:
const response2 = await gotScraping({
  url: "https://example.com/api/data",
  proxyUrl: "http://proxy:8080",
  responseType: "json",
})
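When scraping a JSON API as in the second request above, it pays to validate the payload shape before using it. A minimal type guard for a hypothetical product record (the fields here are assumptions, not any real API's schema):

```typescript
// Hypothetical shape of a scraped product record:
interface Product {
  name: string
  price: number
}

// Runtime check that an unknown JSON value matches the expected shape:
function isProduct(value: unknown): value is Product {
  return (
    typeof value === "object" && value !== null &&
    typeof (value as Product).name === "string" &&
    typeof (value as Product).price === "number"
  )
}
```

Filtering a scraped array with `isProduct` keeps malformed or changed API responses from crashing downstream code silently.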
Header generation
import { gotScraping } from "got-scraping"
// got-scraping generates different realistic headers each time:
const response = await gotScraping({
  url: "https://example.com",
  headerGeneratorOptions: {
    browsers: ["chrome", "firefox"], // Mimic Chrome or Firefox
    devices: ["desktop"], // Desktop headers
    operatingSystems: ["windows", "macos"],
    locales: ["en-US"],
  },
})
// Example generated headers:
// User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...
// Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
// Accept-Language: en-US,en;q=0.9
// Sec-Ch-Ua: "Chromium";v="122", "Google Chrome";v="122"
// Sec-Fetch-Mode: navigate
Scraping with Cheerio
import { gotScraping } from "got-scraping"
import * as cheerio from "cheerio"
async function scrapeProducts(url: string) {
  const { body } = await gotScraping({ url })
  const $ = cheerio.load(body)
  const products = $(".product-card").map((_, el) => ({
    name: $(el).find(".product-name").text().trim(),
    price: $(el).find(".product-price").text().trim(),
    url: $(el).find("a").attr("href"),
  })).get()
  return products
}
// Fast — no browser needed:
const products = await scrapeProducts("https://example.com/products")
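One gotcha in the example above: `attr("href")` often returns a relative path. Resolving it against the page URL with Node's built-in URL class (no extra dependency) keeps the scraped links usable:

```typescript
// Resolve a possibly-relative href against the page it was scraped from.
function resolveUrl(href: string | undefined, pageUrl: string): string | undefined {
  if (!href) return undefined
  try {
    return new URL(href, pageUrl).href
  } catch {
    return undefined // malformed href — skip it rather than crash
  }
}

// Relative paths and absolute URLs both resolve correctly:
resolveUrl("/products/42", "https://example.com/products")
// → "https://example.com/products/42"
resolveUrl("https://cdn.example.com/a", "https://example.com")
// → "https://cdn.example.com/a"
```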
When got-scraping isn't enough
got-scraping works for:
✅ Server-rendered HTML pages
✅ REST APIs with anti-bot headers
✅ Sites that check User-Agent and headers
✅ High-volume scraping (fast — no browser)
got-scraping fails for:
❌ JavaScript-rendered content (SPAs)
❌ Cloudflare challenge pages
❌ Sites requiring browser fingerprinting
❌ Interactive elements (login forms, infinite scroll)
→ Use puppeteer-extra or Crawlee with browser crawler
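A common pattern is to try the cheap HTTP path first and escalate to a browser only when the response looks like a bot challenge. A minimal heuristic sketch (the status codes and body markers are illustrative, not an exhaustive list):

```typescript
// Decide whether a response likely needs a real browser to proceed.
function needsBrowser(status: number, body: string): boolean {
  // Challenge pages typically come back as 403 or 503:
  if (status === 403 || status === 503) return true
  // Common challenge-page markers (illustrative, not exhaustive):
  const markers = ["cf-challenge", "just a moment", "checking your browser"]
  const lower = body.toLowerCase()
  return markers.some((m) => lower.includes(m))
}
```

A scraper built this way sends most traffic through got-scraping and only pays the browser cost on the minority of URLs that demand it.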
puppeteer-extra
puppeteer-extra — Puppeteer with plugins:
Stealth plugin
import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"
// Add stealth plugin — hides headless Chrome indicators:
puppeteer.use(StealthPlugin())
const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()
// Now headless Chrome looks like a real browser:
await page.goto("https://bot-detection-site.com")
// Stealth plugin patches:
// ✅ navigator.webdriver → false
// ✅ Chrome runtime properties present
// ✅ Correct WebGL vendor/renderer
// ✅ Plugin array not empty
// ✅ Language and timezone consistent
// ✅ iframe contentWindow access
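The patched properties above are exactly what simple detector scripts probe. A simplified sketch of such a check (in practice it runs inside the page, e.g. via page.evaluate, against the real navigator object):

```typescript
// Simplified stand-in for the navigator properties a detector inspects:
interface NavigatorProps {
  webdriver?: boolean
  pluginCount: number
  languages: readonly string[]
}

// Illustrative bot check: vanilla headless Chrome trips all three signals,
// while the stealth plugin patches each of them.
function looksHeadless(nav: NavigatorProps): boolean {
  return Boolean(nav.webdriver) || nav.pluginCount === 0 || nav.languages.length === 0
}
```

Real detectors (Cloudflare, DataDome, etc.) combine dozens of such signals, which is why the stealth plugin patches a whole battery of properties rather than just `navigator.webdriver`.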
Scraping with stealth
import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"
puppeteer.use(StealthPlugin())
async function scrapeJSRenderedPage(url: string) {
  const browser = await puppeteer.launch({ headless: true })
  try {
    const page = await browser.newPage()
    await page.goto(url, { waitUntil: "networkidle0" })
    // Wait for dynamic content:
    await page.waitForSelector(".product-card")
    // Extract data from JavaScript-rendered page:
    return await page.evaluate(() => {
      return Array.from(document.querySelectorAll(".product-card")).map((el) => ({
        name: el.querySelector(".name")?.textContent?.trim(),
        price: el.querySelector(".price")?.textContent?.trim(),
      }))
    })
  } finally {
    // Always close the browser, even if navigation or the selector times out:
    await browser.close()
  }
}
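Even with stealth, request timing matters (see "Rate limiting" in the detection list above). A small jittered-delay helper, using only Node's standard library, avoids the perfectly uniform pacing that gives scripts away:

```typescript
import { setTimeout as sleep } from "node:timers/promises"

// Pick a random delay in [minMs, maxMs). Uniform gaps between requests
// are themselves a bot signal, so we add jitter.
function jitterMs(minMs: number, maxMs: number): number {
  return minMs + Math.floor(Math.random() * (maxMs - minMs))
}

async function politeDelay(minMs = 500, maxMs = 2000): Promise<void> {
  await sleep(jitterMs(minMs, maxMs))
}
```

Call `await politeDelay()` between page visits in standalone got-scraping or puppeteer scripts; Crawlee manages pacing for you through its concurrency controls.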
Plugins ecosystem
import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"
import AdblockerPlugin from "puppeteer-extra-plugin-adblocker"
import RecaptchaPlugin from "puppeteer-extra-plugin-recaptcha"
// Stealth — bypass bot detection:
puppeteer.use(StealthPlugin())
// Ad blocker — faster page loads, less noise:
puppeteer.use(AdblockerPlugin({ blockTrackers: true }))
// reCAPTCHA solver (requires 2captcha API key):
puppeteer.use(RecaptchaPlugin({
  provider: { id: "2captcha", token: "YOUR_2CAPTCHA_KEY" },
}))
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto("https://example.com/login")
// Automatically solves reCAPTCHA if present:
const { solved } = await page.solveRecaptchas()
Crawlee
Crawlee — full scraping framework:
HTTP crawler (fast)
import { CheerioCrawler, Dataset } from "crawlee"

const crawler = new CheerioCrawler({
  maxRequestsPerCrawl: 100,
  maxConcurrency: 10,
  async requestHandler({ request, $, enqueueLinks, log }) {
    log.info(`Processing ${request.url}`)
    // Extract data with Cheerio:
    const title = $("h1").text()
    const products = $(".product").map((_, el) => ({
      name: $(el).find(".name").text().trim(),
      price: $(el).find(".price").text().trim(),
    })).get()
    // Store results:
    await Dataset.pushData({ url: request.url, title, products })
    // Follow links:
    await enqueueLinks({
      globs: ["https://example.com/products/**"],
    })
  },
})
await crawler.run(["https://example.com/products"])
Browser crawler (JavaScript-rendered)
import { PlaywrightCrawler, Dataset } from "crawlee"

const crawler = new PlaywrightCrawler({
  maxConcurrency: 5,
  headless: true,
  async requestHandler({ page, request, enqueueLinks, log }) {
    log.info(`Processing ${request.url}`)
    // Wait for JavaScript content:
    await page.waitForSelector(".product-card")
    // Extract from rendered page:
    const products = await page.evaluate(() =>
      Array.from(document.querySelectorAll(".product-card")).map((el) => ({
        name: el.querySelector(".name")?.textContent?.trim(),
        price: el.querySelector(".price")?.textContent?.trim(),
      }))
    )
    await Dataset.pushData({ url: request.url, products })
    // Follow pagination:
    await enqueueLinks({ selector: ".pagination a" })
  },
})
await crawler.run(["https://example.com/products"])
Proxy rotation
import { CheerioCrawler, ProxyConfiguration } from "crawlee"
const proxyConfig = new ProxyConfiguration({
  proxyUrls: [
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
  ],
  // Or use Apify proxy:
  // apifyProxyGroups: ["RESIDENTIAL"],
})

const crawler = new CheerioCrawler({
  proxyConfiguration: proxyConfig,
  // Automatically rotates proxies per request
  // Retries with different proxy on failure
  async requestHandler({ request, $ }) {
    // Each request uses a different proxy
  },
})
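Crawlee rotates proxies for you; with got-scraping or puppeteer-extra you would do this by hand. The "Manual" entries in the comparison table below boil down to something like this round-robin sketch (proxy URLs are placeholders):

```typescript
// Round-robin over a fixed proxy list: each call returns the next proxy URL.
function makeProxyRotator(proxyUrls: string[]): () => string {
  let i = 0
  return () => proxyUrls[i++ % proxyUrls.length]
}

const nextProxy = makeProxyRotator([
  "http://proxy1:8080",
  "http://proxy2:8080",
])

nextProxy() // → "http://proxy1:8080"
nextProxy() // → "http://proxy2:8080"
nextProxy() // → "http://proxy1:8080"
```

Pass `nextProxy()` as got-scraping's `proxyUrl` per request; Crawlee's version additionally tracks proxy health and retires failing proxies, which is real work a hand-rolled rotator skips.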
Queue management and retries
import { CheerioCrawler } from "crawlee"
const crawler = new CheerioCrawler({
  maxRequestRetries: 3, // Retry failed requests 3 times
  maxConcurrency: 10, // 10 parallel requests
  maxRequestsPerCrawl: 1000, // Stop after 1000 requests
  requestHandlerTimeoutSecs: 60, // Timeout per request
  // Automatic retry with backoff:
  // 1st retry: almost immediate
  // 2nd retry: after a few seconds
  // 3rd retry: after a longer wait, possibly through a different proxy
  async requestHandler({ request, $ }) {
    // Process page
  },
  // In Crawlee v3, the error arrives as a second argument:
  async failedRequestHandler({ request }, error) {
    console.error(`Failed after retries: ${request.url}`, error.message)
  },
})
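The retry timing described in the comments above follows the familiar exponential-backoff shape. A sketch of the idea (Crawlee's exact delays differ; this only illustrates the pattern):

```typescript
// Exponential backoff: the delay doubles with each retry, capped at maxMs.
function backoffDelayMs(retry: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** (retry - 1), maxMs)
}

backoffDelayMs(1) // → 1000
backoffDelayMs(2) // → 2000
backoffDelayMs(3) // → 4000
backoffDelayMs(10) // → 30000 (capped)
```

The cap matters: without it, a stubborn URL on its tenth retry would wait over eight minutes instead of a bounded thirty seconds.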
Feature Comparison
| Feature | got-scraping | puppeteer-extra | Crawlee |
|---|---|---|---|
| HTTP scraping | ✅ | ❌ (browser) | ✅ (Cheerio) |
| Browser scraping | ❌ | ✅ | ✅ (Playwright) |
| Anti-bot headers | ✅ | Via stealth | ✅ (got-scraping) |
| Browser stealth | ❌ | ✅ (plugin) | ✅ (built-in) |
| Proxy rotation | Manual | Manual | ✅ (built-in) |
| Request queue | ❌ | ❌ | ✅ |
| Auto retries | Via got | Manual | ✅ (built-in) |
| Concurrency control | Manual | Manual | ✅ (built-in) |
| Data storage | Manual | Manual | ✅ (Dataset) |
| Link following | Manual | Manual | ✅ (enqueueLinks) |
| reCAPTCHA solving | ❌ | ✅ (plugin) | Via plugin |
| Weekly downloads | ~30K | ~200K | ~50K |
When to Use Each
Use got-scraping if:
- Scraping server-rendered HTML (no JavaScript needed)
- Need high-volume, fast HTTP scraping
- Want realistic headers without a full browser
- Simple scraping scripts with Cheerio for parsing
Use puppeteer-extra if:
- Need to bypass sophisticated bot detection
- Scraping JavaScript-rendered pages (SPAs)
- Need reCAPTCHA solving or ad blocking
- Want a full browser with stealth capabilities
- Already using Puppeteer and need anti-detection
Use Crawlee if:
- Building a production scraping pipeline
- Need queue management, retries, and proxy rotation built-in
- Want to switch between HTTP and browser crawlers as needed
- Scraping at scale with concurrency control
- Using Apify cloud for deployment
Methodology
Download data from npm registry (weekly average, February 2026). Feature comparison based on got-scraping v4.x, puppeteer-extra v3.x, and Crawlee v3.x.