
got-scraping vs Crawlee vs puppeteer-extra: Advanced Web Scraping in Node.js (2026)

PkgPulse Team

TL;DR

Crawlee is the full web scraping framework from Apify — request queuing, automatic retries, proxy rotation, browser pool management, and both HTTP and browser-based crawlers in one toolkit. got-scraping is got with anti-bot headers — generates realistic browser-like HTTP headers, TLS fingerprinting, automatic header rotation for scraping without a browser. puppeteer-extra is Puppeteer with plugins — stealth mode to bypass bot detection, ad blocking, reCAPTCHA solving, and other plugin extensions. In 2026: Crawlee for production scraping pipelines, got-scraping for fast HTTP-based scraping, puppeteer-extra for browser automation with stealth.

Key Takeaways

  • Crawlee: ~50K weekly downloads — full framework, queue management, proxy rotation, Apify
  • got-scraping: ~30K weekly downloads — HTTP scraping with realistic headers, TLS fingerprinting
  • puppeteer-extra: ~200K weekly downloads — Puppeteer + stealth plugin, anti-detection
  • got-scraping is for HTTP requests — fast, no browser overhead, works for many sites
  • puppeteer-extra controls a real browser — handles JavaScript-rendered pages, stealth mode
  • Crawlee combines both approaches — use HTTP crawlers or browser crawlers as needed

The Anti-Bot Challenge

Modern websites detect scrapers via:
  🔍 HTTP headers — missing or wrong User-Agent, Accept, Accept-Language
  🔍 TLS fingerprint — Node.js has a different TLS fingerprint than Chrome
  🔍 JavaScript challenges — Cloudflare, PerimeterX, DataDome
  🔍 Browser fingerprint — headless Chrome has detectable properties
  🔍 Rate limiting — too many requests too fast
  🔍 IP reputation — datacenter IPs flagged as bots
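Of these, rate limiting is the one you mitigate the same way in every tool: space your requests out. A minimal throttle sketch that computes how long to wait before each request (the 1-second gap is an assumption; tune it per target site):

```typescript
// Minimal request throttle: enforces a minimum gap between requests.
// The 1000 ms default is an assumption — tune it per target site.
function createThrottle(minGapMs = 1000) {
  let nextAllowed = 0
  // Returns how long the caller should wait before firing now (in ms).
  return function waitTimeFor(nowMs: number): number {
    const wait = Math.max(0, nextAllowed - nowMs)
    nextAllowed = Math.max(nowMs, nextAllowed) + minGapMs
    return wait
  }
}

const throttle = createThrottle(1000)
throttle(0)    // → 0: first request fires immediately
throttle(200)  // → 800: wait until the 1 s gap has passed
throttle(1500) // → 500: next slot was at 2000 ms
```

Crawlee handles this for you via `maxConcurrency` and its autoscaled pool; with got-scraping or puppeteer-extra you implement the pacing yourself.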

Solutions:
  got-scraping    → Fixes HTTP headers + TLS fingerprint
  puppeteer-extra → Fixes browser fingerprint + JS challenges
  Crawlee         → Framework that orchestrates both approaches

got-scraping

got-scraping — HTTP scraping with realistic headers:

Basic usage

import { gotScraping } from "got-scraping"

// Makes requests that look like a real browser:
const response = await gotScraping({
  url: "https://example.com/products",
  // Automatically generates realistic headers:
  // User-Agent, Accept, Accept-Language, Accept-Encoding
  // Sec-Ch-Ua, Sec-Fetch-* headers (Chrome-like)
})

console.log(response.body) // HTML content

// With proxy:
const response2 = await gotScraping({
  url: "https://example.com/api/data",
  proxyUrl: "http://proxy:8080",
  responseType: "json",
})

Header generation

import { gotScraping } from "got-scraping"

// got-scraping generates different realistic headers each time:
const response = await gotScraping({
  url: "https://example.com",
  headerGeneratorOptions: {
    browsers: ["chrome", "firefox"],     // Mimic Chrome or Firefox
    devices: ["desktop"],                 // Desktop headers
    operatingSystems: ["windows", "macos"],
    locales: ["en-US"],
  },
})

// Example generated headers:
// User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...
// Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
// Accept-Language: en-US,en;q=0.9
// Sec-Ch-Ua: "Chromium";v="122", "Google Chrome";v="122"
// Sec-Fetch-Mode: navigate

Scraping with Cheerio

import { gotScraping } from "got-scraping"
import * as cheerio from "cheerio"

async function scrapeProducts(url: string) {
  const { body } = await gotScraping({ url })
  const $ = cheerio.load(body)

  const products = $(".product-card").map((_, el) => ({
    name: $(el).find(".product-name").text().trim(),
    price: $(el).find(".product-price").text().trim(),
    url: $(el).find("a").attr("href"),
  })).get()

  return products
}

// Fast — no browser needed:
const products = await scrapeProducts("https://example.com/products")

When got-scraping isn't enough

got-scraping works for:
  ✅ Server-rendered HTML pages
  ✅ REST APIs with anti-bot headers
  ✅ Sites that check User-Agent and headers
  ✅ High-volume scraping (fast — no browser)

got-scraping fails for:
  ❌ JavaScript-rendered content (SPAs)
  ❌ Cloudflare challenge pages
  ❌ Sites requiring browser fingerprinting
  ❌ Interactive elements (login forms, infinite scroll)
  → Use puppeteer-extra or Crawlee with browser crawler
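A common pattern is to try the cheap HTTP path first and escalate to a browser only when the response looks like a challenge page. A heuristic sketch — the status codes and HTML markers below are illustrative examples, not a complete detection list:

```typescript
// Heuristic check: does this response look like an anti-bot challenge page?
// The markers are illustrative examples, not an exhaustive list.
function looksLikeChallenge(statusCode: number, body: string): boolean {
  if (statusCode === 403 || statusCode === 503) return true
  const markers = [
    "cf-challenge",     // Cloudflare challenge form
    "Just a moment...", // Cloudflare interstitial title
    "_px",              // PerimeterX script prefix
  ]
  return markers.some((m) => body.includes(m))
}

// Usage sketch: try gotScraping first, escalate on a hit.
// const { statusCode, body } = await gotScraping({ url })
// if (looksLikeChallenge(statusCode, body)) {
//   // re-queue this URL for a browser-based crawler instead
// }
```

This keeps the fast HTTP path as the default and reserves expensive browser sessions for the pages that actually need them.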

puppeteer-extra

puppeteer-extra — Puppeteer with plugins:

Stealth plugin

import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"

// Add stealth plugin — hides headless Chrome indicators:
puppeteer.use(StealthPlugin())

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

// Now headless Chrome looks like a real browser:
await page.goto("https://bot-detection-site.com")

// Stealth plugin patches:
// ✅ navigator.webdriver → false
// ✅ Chrome runtime properties present
// ✅ Correct WebGL vendor/renderer
// ✅ Plugin array not empty
// ✅ Language and timezone consistent
// ✅ iframe contentWindow access

Scraping with stealth

import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"

puppeteer.use(StealthPlugin())

async function scrapeJSRenderedPage(url: string) {
  const browser = await puppeteer.launch({ headless: true })
  const page = await browser.newPage()

  await page.goto(url, { waitUntil: "networkidle0" })

  // Wait for dynamic content:
  await page.waitForSelector(".product-card")

  // Extract data from JavaScript-rendered page:
  const products = await page.evaluate(() => {
    return Array.from(document.querySelectorAll(".product-card")).map((el) => ({
      name: el.querySelector(".name")?.textContent?.trim(),
      price: el.querySelector(".price")?.textContent?.trim(),
    }))
  })

  await browser.close()
  return products
}

Plugins ecosystem

import puppeteer from "puppeteer-extra"
import StealthPlugin from "puppeteer-extra-plugin-stealth"
import AdblockerPlugin from "puppeteer-extra-plugin-adblocker"
import RecaptchaPlugin from "puppeteer-extra-plugin-recaptcha"

// Stealth — bypass bot detection:
puppeteer.use(StealthPlugin())

// Ad blocker — faster page loads, less noise:
puppeteer.use(AdblockerPlugin({ blockTrackers: true }))

// reCAPTCHA solver (requires 2captcha API key):
puppeteer.use(RecaptchaPlugin({
  provider: { id: "2captcha", token: "YOUR_2CAPTCHA_KEY" },
}))

const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto("https://example.com/login")

// Automatically solves reCAPTCHA if present:
const { solved } = await page.solveRecaptchas()

Crawlee

Crawlee — full scraping framework:

HTTP crawler (fast)

import { CheerioCrawler, Dataset } from "crawlee"

const crawler = new CheerioCrawler({
  maxRequestsPerCrawl: 100,
  maxConcurrency: 10,

  async requestHandler({ request, $, enqueueLinks, log }) {
    log.info(`Processing ${request.url}`)

    // Extract data with Cheerio:
    const title = $("h1").text()
    const products = $(".product").map((_, el) => ({
      name: $(el).find(".name").text().trim(),
      price: $(el).find(".price").text().trim(),
    })).get()

    // Store results:
    await Dataset.pushData({ url: request.url, title, products })

    // Follow links:
    await enqueueLinks({
      globs: ["https://example.com/products/**"],
    })
  },
})

await crawler.run(["https://example.com/products"])

Browser crawler (JavaScript-rendered)

import { Dataset, PlaywrightCrawler } from "crawlee"

const crawler = new PlaywrightCrawler({
  maxConcurrency: 5,
  headless: true,

  async requestHandler({ page, request, enqueueLinks, log }) {
    log.info(`Processing ${request.url}`)

    // Wait for JavaScript content:
    await page.waitForSelector(".product-card")

    // Extract from rendered page:
    const products = await page.evaluate(() =>
      Array.from(document.querySelectorAll(".product-card")).map((el) => ({
        name: el.querySelector(".name")?.textContent?.trim(),
        price: el.querySelector(".price")?.textContent?.trim(),
      }))
    )

    await Dataset.pushData({ url: request.url, products })

    // Follow pagination:
    await enqueueLinks({ selector: ".pagination a" })
  },
})

await crawler.run(["https://example.com/products"])

Proxy rotation

import { CheerioCrawler, ProxyConfiguration } from "crawlee"

const proxyConfig = new ProxyConfiguration({
  proxyUrls: [
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
  ],
  // Or use Apify proxy:
  // apifyProxyGroups: ["RESIDENTIAL"],
})

const crawler = new CheerioCrawler({
  proxyConfiguration: proxyConfig,
  // Automatically rotates proxies per request
  // Retries with different proxy on failure

  async requestHandler({ request, $ }) {
    // Each request uses a different proxy
  },
})
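At its core, rotation is a round-robin walk over the proxy list. A simplified sketch of the idea — Crawlee's actual selection also ties proxies to sessions and deprioritizes proxies that recently failed:

```typescript
// Simplified round-robin proxy rotation — the core idea behind
// ProxyConfiguration, minus session affinity and failure tracking.
function createProxyRotation(proxyUrls: string[]) {
  let i = 0
  return function nextProxy(): string {
    const proxy = proxyUrls[i % proxyUrls.length]
    i += 1
    return proxy
  }
}

const next = createProxyRotation([
  "http://proxy1:8080",
  "http://proxy2:8080",
  "http://proxy3:8080",
])
next() // → "http://proxy1:8080"
next() // → "http://proxy2:8080"
```

With got-scraping or puppeteer-extra you would wire a helper like this into each request yourself; Crawlee does it per request automatically.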

Queue management and retries

import { CheerioCrawler } from "crawlee"

const crawler = new CheerioCrawler({
  maxRequestRetries: 3,           // Retry failed requests 3 times
  maxConcurrency: 10,              // 10 parallel requests
  maxRequestsPerCrawl: 1000,       // Stop after 1000 requests
  requestHandlerTimeoutSecs: 60,   // Timeout per request

  // Automatic retry with backoff:
  // 1st retry: immediate
  // 2nd retry: after a few seconds
  // 3rd retry: after more seconds + different proxy

  async requestHandler({ request, $ }) {
    // Process page
  },

  // Note: the error is passed as a second argument, not on the context:
  async failedRequestHandler({ request }, error) {
    console.error(`Failed after retries: ${request.url}`, error.message)
  },
})
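The retry schedule sketched in the comments above follows the usual exponential-backoff shape. A generic helper for illustration — the base and cap values are assumptions, and Crawlee's internal timings differ:

```typescript
// Exponential backoff with a cap — the general shape of retry delays.
// Base (500 ms) and cap (30 s) are illustrative, not Crawlee's values.
function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt)
}

backoffMs(0)  // → 500: delay before the 1st retry
backoffMs(1)  // → 1000
backoffMs(2)  // → 2000
backoffMs(10) // → 30000: capped
```

Doubling the delay on each attempt backs off quickly from a struggling (or blocking) server, while the cap keeps worst-case waits bounded.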

Feature Comparison

Feature               got-scraping   puppeteer-extra   Crawlee
HTTP scraping         ✅             ❌ (browser)      ✅ (Cheerio)
Browser scraping      ❌             ✅                ✅ (Playwright)
Anti-bot headers      ✅             Via stealth       ✅ (got-scraping)
Browser stealth       ❌             ✅ (plugin)       ✅ (built-in)
Proxy rotation        Manual         Manual            ✅ (built-in)
Request queue         ❌             ❌                ✅
Auto retries          Via got        Manual            ✅ (built-in)
Concurrency control   Manual         Manual            ✅ (built-in)
Data storage          Manual         Manual            ✅ (Dataset)
Link following        Manual         Manual            ✅ (enqueueLinks)
reCAPTCHA solving     ❌             ✅ (plugin)       Via plugin
Weekly downloads      ~30K           ~200K             ~50K

When to Use Each

Use got-scraping if:

  • Scraping server-rendered HTML (no JavaScript needed)
  • Need high-volume, fast HTTP scraping
  • Want realistic headers without a full browser
  • Simple scraping scripts with Cheerio for parsing

Use puppeteer-extra if:

  • Need to bypass sophisticated bot detection
  • Scraping JavaScript-rendered pages (SPAs)
  • Need reCAPTCHA solving or ad blocking
  • Want a full browser with stealth capabilities
  • Already using Puppeteer and need anti-detection

Use Crawlee if:

  • Building a production scraping pipeline
  • Need queue management, retries, and proxy rotation built-in
  • Want to switch between HTTP and browser crawlers as needed
  • Scraping at scale with concurrency control
  • Using Apify cloud for deployment

Methodology

Download data from npm registry (weekly average, February 2026). Feature comparison based on got-scraping v4.x, puppeteer-extra v3.x, and Crawlee v3.x.

Compare web scraping and automation tools on PkgPulse →
