TL;DR
metascraper is the most complete metadata extraction library — rule-based, handles Open Graph, Twitter Cards, JSON-LD, oEmbed, and falls back gracefully through multiple sources. open-graph-scraper is the most popular and straightforward — focused on Open Graph tags with optional Twitter Card support, easy to use. unfurl.js is a full oEmbed + Open Graph extractor with TypeScript types out of the box. For link preview features (like Slack/Discord): metascraper. For simple OG tag extraction: open-graph-scraper. For oEmbed support (YouTube, Twitter embed codes): unfurl.js.
Key Takeaways
- metascraper: ~90K weekly downloads — rule-based, 30+ rules, Open Graph + JSON-LD + oEmbed + fallbacks
- open-graph-scraper: ~200K weekly downloads — most popular, focused on OG tags, easy API
- unfurl.js: ~20K weekly downloads — TypeScript-first, oEmbed + Open Graph + Twitter Cards
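- All three work from the page HTML: open-graph-scraper and unfurl.js fetch it for you, while metascraper parses HTML you fetch yourself (open-graph-scraper also accepts pre-fetched HTML)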
- Rate limiting matters — implement caching to avoid hammering external sites
- For production: cache results in Redis or DB — don't re-scrape on every request
What Metadata Gets Extracted
Open Graph (og:*) tags — set by web publishers:
og:title "React vs Vue in 2026"
og:description "A data-driven comparison of download trends..."
og:image "https://pkgpulse.com/og/react-vs-vue.png"
og:url "https://pkgpulse.com/compare/react-vs-vue"
og:type "article" | "website" | "video" | etc.
Twitter Card tags (twitter:*):
twitter:card "summary_large_image"
twitter:title "React vs Vue"
twitter:image "https://..."
JSON-LD (structured data):
{ "@type": "Article", "headline": "...", "image": [...] }
oEmbed — embed codes for YouTube, Twitter, Instagram, etc.:
{ type: "video", html: "<iframe>...</iframe>", title: "..." }
Fallbacks (when og:* not set):
<title> tag, <meta name="description">, first <img> on page
metascraper
metascraper — rule-based metadata extraction:
Setup (rule-based plugins)
import got from "got"
import metascraper from "metascraper"
import metascraperTitle from "metascraper-title"
import metascraperDescription from "metascraper-description"
import metascraperImage from "metascraper-image"
import metascraperUrl from "metascraper-url"
import metascraperAuthor from "metascraper-author"
import metascraperDate from "metascraper-date"
// Compose your scraper with the rules you need:
const scraper = metascraper([
  metascraperTitle(), // og:title, twitter:title, <title>
  metascraperDescription(), // og:description, meta[description], first paragraph
  metascraperImage(), // og:image, twitter:image, first img
  metascraperUrl(), // og:url, canonical link, href
  metascraperAuthor(), // JSON-LD author, meta[author]
  metascraperDate(), // og:published_time, JSON-LD datePublished
])
// Fetch and extract:
async function extractMetadata(url: string) {
  const { body: html, url: finalUrl } = await got(url)
  const metadata = await scraper({ html, url: finalUrl })
  return metadata
}
const meta = await extractMetadata("https://pkgpulse.com/blog/react-vs-vue")
// {
//   title: "React vs Vue in 2026: A Data-Driven Comparison",
//   description: "Compare React and Vue download trends, health scores...",
//   image: "https://pkgpulse.com/og/react-vs-vue.png",
//   url: "https://pkgpulse.com/blog/react-vs-vue",
//   author: "PkgPulse Team",
//   date: "2026-03-01T00:00:00.000Z",
// }
Available rules
// metascraper plugins — install only what you need:
// npm install metascraper-title metascraper-description metascraper-image
import metascraperTitle from "metascraper-title"
import metascraperDescription from "metascraper-description"
import metascraperImage from "metascraper-image"
import metascraperUrl from "metascraper-url"
import metascraperAuthor from "metascraper-author"
import metascraperDate from "metascraper-date"
import metascraperPublisher from "metascraper-publisher" // Site name
import metascraperReadability from "metascraper-readability" // Article content
import metascraperLang from "metascraper-lang" // Page language
import metascraperVideo from "metascraper-video" // og:video, video elements
import metascraperAudio from "metascraper-audio" // og:audio
import metascraperIframe from "metascraper-iframe" // oEmbed iframe
import metascraperTwitter from "metascraper-twitter" // Twitter-specific
import metascraperYoutube from "metascraper-youtube" // YouTube oEmbed
import metascraperSpotify from "metascraper-spotify" // Spotify oEmbed
Custom rule
import metascraper from "metascraper"
// Write a custom rule for site-specific metadata:
const metascraperPkgPulse = () => ({
  packageName: [
    // Rule 1: Try custom meta tag first:
    ({ htmlDom: $ }) => $("meta[name='pkg:name']").attr("content"),
    // Rule 2: Fall back to og:title parsing:
    ({ htmlDom: $ }) => {
      const title = $("meta[property='og:title']").attr("content")
      return title?.match(/^(\S+) vs/)?.[1] // Extract first package name
    },
  ],
})
const scraper = metascraper([
  metascraperPkgPulse(),
  // ... other rules
])
open-graph-scraper
open-graph-scraper — straightforward OG extraction:
Basic usage
import ogs from "open-graph-scraper"
// Simple API — one function:
const { result, error } = await ogs({ url: "https://pkgpulse.com" })
if (error) {
  console.error("Failed to scrape:", error)
} else {
  console.log(result.ogTitle) // "PkgPulse — npm Package Health"
  console.log(result.ogDescription) // "Compare npm packages..."
  console.log(result.ogImage) // [{ url: "https://...", type: "image/png" }]
  console.log(result.ogUrl) // "https://pkgpulse.com"
  console.log(result.ogSiteName) // "PkgPulse"
  console.log(result.twitterCard) // "summary_large_image"
  console.log(result.twitterTitle) // "PkgPulse"
}
With custom options
import ogs from "open-graph-scraper"
// Fetch by URL with custom fetch options:
const { result } = await ogs({
  url: "https://example.com",
  fetchOptions: {
    headers: {
      "User-Agent": "MyApp/1.0 LinkPreviewBot",
      Accept: "text/html",
    },
    signal: AbortSignal.timeout(5000), // 5 second timeout
  },
})
// Or pass pre-fetched HTML (no network request).
// url and html are mutually exclusive: set one or the other, not both.
const { result: fromHtml } = await ogs({
  html: "<html>...</html>",
})
Link preview API endpoint
import express from "express"
import NodeCache from "node-cache"
import ogs from "open-graph-scraper"
const app = express()
// Rate limiting + caching are essential here:
const cache = new NodeCache({ stdTTL: 3600 }) // 1 hour cache
app.get("/api/link-preview", async (req, res) => {
  const url = req.query.url as string
  if (!url) {
    return res.status(400).json({ error: "url required" })
  }
  // Validate URL and allow only http(s) schemes:
  try {
    const { protocol } = new URL(url)
    if (protocol !== "http:" && protocol !== "https:") throw new Error("Unsupported scheme")
  } catch {
    return res.status(400).json({ error: "Invalid URL" })
  }
  // Check cache:
  const cached = cache.get(url)
  if (cached) return res.json(cached)
  try {
    const { result } = await ogs({
      url,
      fetchOptions: {
        headers: { "User-Agent": "LinkPreviewBot/1.0" },
        signal: AbortSignal.timeout(8000),
      },
    })
    const preview = {
      title: result.ogTitle ?? result.twitterTitle ?? null,
      description: result.ogDescription ?? result.twitterDescription ?? null,
      image: result.ogImage?.[0]?.url ?? result.twitterImage?.[0]?.url ?? null,
      url: result.ogUrl ?? url,
      siteName: result.ogSiteName ?? null,
    }
    cache.set(url, preview)
    res.json(preview)
  } catch (err) {
    // ogs rejects on network errors, timeouts, and pages it cannot parse
    res.status(500).json({ error: "Failed to fetch preview" })
  }
})
unfurl.js
unfurl.js — TypeScript-first, oEmbed + Open Graph:
Basic usage
import { unfurl } from "unfurl.js"
// Returns strongly-typed metadata:
const metadata = await unfurl("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
// {
//   title: "Rick Astley - Never Gonna Give You Up",
//   description: "...",
//   open_graph: { title: "...", images: [...] },
//   twitter_card: { title: "..." },
//   oEmbed: {
//     type: "video",
//     title: "Rick Astley - Never Gonna Give You Up",
//     html: "<iframe width='459' height='344' src='https://www.youtube.com/embed/...'...",
//     width: 459,
//     height: 344,
//   },
// }
// oEmbed gives you the embed iframe HTML — useful for YouTube, Twitter, Instagram, etc.
// For regular websites (no oEmbed):
const pkgMeta = await unfurl("https://pkgpulse.com")
// {
//   title: "PkgPulse — npm Package Health",
//   open_graph: { title: "...", description: "...", images: [...] },
//   description: "...",
// }
TypeScript types
import { unfurl, Metadata } from "unfurl.js"
// Strongly typed return value:
const meta: Metadata = await unfurl("https://pkgpulse.com/blog/react-vs-vue")
// Access typed fields:
const ogImages: Array<{ url: string; width?: number; height?: number }> =
  meta.open_graph?.images ?? []
const oEmbedHtml: string | undefined = meta.oEmbed?.html
// Type narrowing:
if (meta.oEmbed?.type === "video") {
  console.log("Video embed:", meta.oEmbed.html)
  console.log("Video width:", meta.oEmbed.width)
}
Feature Comparison
| Feature | metascraper | open-graph-scraper | unfurl.js |
|---|---|---|---|
| Open Graph | ✅ | ✅ | ✅ |
| Twitter Cards | ✅ | ✅ | ✅ |
| JSON-LD | ✅ | ✅ | ✅ |
| oEmbed | ✅ | ⚠️ Limited | ✅ Excellent |
| Custom rules | ✅ | ❌ | ❌ |
| Readability (content) | ✅ (plugin) | ❌ | ❌ |
| TypeScript | ✅ | ✅ | ✅ (best) |
| Modularity | ✅ Per-plugin | ❌ Monolithic | ❌ |
| Bundle size | Modular | ~200KB | ~150KB |
| ESM | ✅ | ✅ | ✅ |
| Pre-fetched HTML | ✅ | ✅ | ❌ (URL only) |
When to Use Each
Choose metascraper if:
- Building a full link preview system (like Slack or Notion)
- You need fallback chains — try og:, then JSON-LD, then raw HTML
- Custom rule needed for specific sites
- Want modular installation (only include rules you use)
Choose open-graph-scraper if:
- Simple OG tag extraction is all you need
- Quick setup with minimal configuration
- Most popular → most Stack Overflow answers and examples
Choose unfurl.js if:
- TypeScript types are important — unfurl has the best typings
- YouTube/Twitter/Vimeo oEmbed support (embed codes) is needed
- Simplest API with good oEmbed handling
Security Considerations for URL Metadata Scrapers
Accepting arbitrary URLs from users and fetching them on your server introduces Server-Side Request Forgery (SSRF) vulnerabilities. An attacker can supply internal URLs like http://169.254.169.254/latest/meta-data/ (AWS instance metadata), http://localhost:6379 (Redis), or http://10.0.0.1/admin to probe your internal network. open-graph-scraper and unfurl.js perform the HTTP fetch themselves, and metascraper parses HTML fetched by whatever HTTP client you choose, so SSRF protection must be implemented at the application level before a user-supplied URL reaches any of them.
The minimum SSRF mitigation is URL validation against an allowlist of acceptable schemes and a denylist of private IP ranges. Accept only http:// and https://, and reject file://, ftp://, gopher://, and any other scheme. Resolve the hostname to an IP address and reject the request if the resolved IP is in the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback (127.0.0.0/8), or link-local (169.254.0.0/16). The ssrf-req-filter npm package implements this check as a Node.js HTTP agent, so it fits HTTP clients that accept a custom agent; whether it can be passed through open-graph-scraper's fetchOptions depends on the fetch implementation used by the version you run, so verify that before relying on it.
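The checks described above can be sketched with Node's built-in dns and net modules. This is a minimal illustration, not a complete defense: isBlockedIPv4 and assertSafeUrl are hypothetical helper names, and refusing every IPv6 address is a simplifying assumption made here to keep the example short.

```typescript
import { lookup } from "node:dns/promises"
import { isIP } from "node:net"

// Hypothetical helper: true when an IPv4 address must not be fetched.
function isBlockedIPv4(ip: string): boolean {
  if (isIP(ip) !== 4) return true // not a well-formed IPv4 address: refuse
  const [a, b] = ip.split(".").map(Number)
  return (
    a === 0 || // "this network"
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback 127.0.0.0/8
    (a === 169 && b === 254) || // link-local 169.254.0.0/16
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  )
}

// Hypothetical helper: validates the scheme, then every address the host resolves to.
async function assertSafeUrl(rawUrl: string): Promise<URL> {
  const url = new URL(rawUrl) // throws on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Blocked scheme: ${url.protocol}`)
  }
  const addresses = await lookup(url.hostname, { all: true })
  for (const { address, family } of addresses) {
    // Assumption for this sketch: IPv6 targets are refused outright.
    if (family !== 4 || isBlockedIPv4(address)) {
      throw new Error(`Blocked address: ${address}`)
    }
  }
  return url
}
```

Call assertSafeUrl before handing the URL to a scraper. Because the check and the later fetch resolve DNS separately, a hostname can still change its answer between the two lookups, so an agent-level filter or an egress firewall remains worth adding on top.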
A complementary mitigation is implementing a strict Content-Type check on the HTTP response. Your scraper should only process text/html responses — fetching a PDF, image, or binary executable and passing it to a metadata parser is wasteful and potentially dangerous. Setting a response size limit (reject responses larger than 2MB) prevents memory exhaustion from servers that return unexpectedly large documents. Rate limiting by requesting IP on your link preview endpoint prevents a single user from using your server as an unrestricted HTTP proxy to external URLs.
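One way to apply the Content-Type and size checks is to fetch the page yourself and pass the resulting HTML to the scraper. The sketch below assumes Node 18+ with the global fetch API; fetchHtml and isHtmlContentType are hypothetical names, the 2MB cap comes from the paragraph above, and the 8 second timeout mirrors the endpoint example earlier in this article.

```typescript
const MAX_BYTES = 2 * 1024 * 1024 // 2MB cap, as suggested in the text

// Hypothetical helper: accepts only text/html, ignoring charset parameters.
function isHtmlContentType(header: string | null): boolean {
  if (!header) return false
  const mediaType = header.split(";")[0].trim().toLowerCase()
  return mediaType === "text/html"
}

// Hypothetical helper: downloads a page, enforcing type and size limits.
async function fetchHtml(url: string, maxBytes: number = MAX_BYTES): Promise<string> {
  const res = await fetch(url, { signal: AbortSignal.timeout(8000) })
  if (!res.ok) throw new Error(`HTTP ${res.status}`)
  if (!isHtmlContentType(res.headers.get("content-type"))) {
    throw new Error("Not an HTML response")
  }
  if (!res.body) return ""

  // Stream the body so an oversized response is abandoned early
  // instead of being buffered in full.
  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  let received = 0
  let html = ""
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    received += value.byteLength
    if (received > maxBytes) {
      await reader.cancel()
      throw new Error("Response too large")
    }
    html += decoder.decode(value, { stream: true })
  }
  return html + decoder.decode() // flush any buffered bytes
}
```

The returned string can then go to scraper({ html, url }) for metascraper or ogs({ html }) for open-graph-scraper. Counting bytes while streaming matters because a Content-Length header can be missing or wrong.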
Caching Strategies for Production Link Preview Services
None of these libraries include caching — they fetch and parse a URL on every call. In a production link preview endpoint serving a social feed or a chat application, re-scraping the same URL on every request will trigger bot detection, rate limiting, and unnecessary latency. A Redis-based cache with a 24-hour TTL is the standard pattern: key on the normalized URL, store the extracted metadata as JSON, and serve from cache on repeat hits.
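A rough sketch of that pattern follows. To avoid assuming a particular Redis client API, it is written against a small PreviewStore interface that is an assumption of this example; any client with get and set-with-expiry can be adapted to it. normalizeUrl and cachedPreview are likewise hypothetical names.

```typescript
// Assumed storage interface; adapt your Redis client (or a Map in tests) to it.
interface PreviewStore {
  get(key: string): Promise<string | null>
  set(key: string, value: string, ttlSeconds: number): Promise<void>
}

const DAY_SECONDS = 24 * 60 * 60

// Normalize so trivially different spellings of a URL share one cache entry.
function normalizeUrl(raw: string): string {
  const url = new URL(raw) // lowercases the scheme and host
  url.hash = "" // fragments never reach the server
  url.searchParams.sort() // make query parameter order irrelevant
  return url.toString()
}

// Cache-aside: serve from the store, otherwise scrape and remember the result.
async function cachedPreview<T>(
  store: PreviewStore,
  rawUrl: string,
  scrape: (url: string) => Promise<T>,
  ttlSeconds: number = DAY_SECONDS,
): Promise<T> {
  const key = `preview:${normalizeUrl(rawUrl)}`
  const hit = await store.get(key)
  if (hit !== null) return JSON.parse(hit) as T
  const fresh = await scrape(rawUrl)
  await store.set(key, JSON.stringify(fresh), ttlSeconds)
  return fresh
}
```

With this normalization, https://Example.com/post?b=2&a=1#comments and https://example.com/post?a=1&b=2 map to the same key. Whether to also strip tracking parameters is a product decision left out of the sketch.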
The more nuanced problem is negative caching. When a URL fails to return valid Open Graph tags (a private GitHub repo, a 404 page, an anti-scraping protected site), you don't want to hammer it on every cache miss. Cache the failure result too — a { error: true, fetchedAt: Date } object with a shorter TTL (1-6 hours) — so repeat failures serve the cached error response rather than retrying. open-graph-scraper's error response is simple enough to serialize directly; metascraper returns null for missing fields, so a failed scrape is just an object where every field is null; unfurl throws on network failures, so you need a try/catch wrapper to normalize errors into a cacheable shape.
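Negative caching can be sketched as a wrapper that turns any scrape call into a cacheable entry plus a TTL. The names CacheEntry, toCacheEntry, and isEmptyMetadata are hypothetical, and the one hour failure TTL is just one value picked from the range mentioned above.

```typescript
type CacheEntry<T> =
  | { ok: true; data: T; fetchedAt: string }
  | { ok: false; error: true; fetchedAt: string }

const SUCCESS_TTL_SECONDS = 24 * 60 * 60 // 24 hours
const FAILURE_TTL_SECONDS = 60 * 60 // 1 hour, an assumed value within 1-6 hours

// metascraper-style results: treat "every field is null" as a failed scrape.
function isEmptyMetadata(meta: Record<string, unknown>): boolean {
  return Object.values(meta).every((v) => v === null || v === undefined)
}

// Runs the scrape and normalizes success, emptiness, and thrown errors
// into one serializable shape with the TTL it should be cached for.
async function toCacheEntry<T>(
  scrape: () => Promise<T>,
  isEmpty: (data: T) => boolean = () => false,
): Promise<{ entry: CacheEntry<T>; ttlSeconds: number }> {
  const fetchedAt = new Date().toISOString()
  try {
    const data = await scrape()
    if (isEmpty(data)) {
      return { entry: { ok: false, error: true, fetchedAt }, ttlSeconds: FAILURE_TTL_SECONDS }
    }
    return { entry: { ok: true, data, fetchedAt }, ttlSeconds: SUCCESS_TTL_SECONDS }
  } catch {
    // unfurl and ogs both reject on network failures; cache that outcome too.
    return { entry: { ok: false, error: true, fetchedAt }, ttlSeconds: FAILURE_TTL_SECONDS }
  }
}
```

Storing fetchedAt as an ISO string rather than a Date keeps the entry stable across JSON serialization.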
Rate limiting from the upstream site's perspective is also important. Many sites block scrapers after a few hundred requests per day from the same IP. Using a User-Agent header that identifies your bot ("MyApp/1.0 +https://myapp.com/bot") is both courteous and practically necessary — some sites return different content to known bot agents. For high-volume preview services, rotating request origins via a proxy or respecting X-Robots-Tag: noindex signals reduces the chance of getting blocked.
TypeScript Integration and Typed Metadata Shapes
Type safety in metadata extraction prevents a class of runtime bugs where your code assumes result.ogImage is a string but receives an array, or assumes result.ogTitle is always defined when it may be undefined for sites without OG tags.
unfurl.js has the strongest TypeScript integration of the three. Its Metadata type is a comprehensive interface with optional fields matching the open standard: open_graph?.images is typed as Array<{ url: string; type?: string; width?: number; height?: number }>, and oEmbed?.html is string | undefined. This specificity means accessing a deeply nested property like the first image URL is fully typed with optional chaining, and TypeScript's strict mode will catch cases where you forget to handle the undefined branch before passing the value to a function expecting a string.
open-graph-scraper's TypeScript types are less complete. The result object is typed, but many fields are string | undefined, which does not distinguish a field that is absent from one that is present but empty. The ogImage field is typed as OgImage[] (always an array) in some versions and as OgImage | OgImage[] in others, so defensive access such as result.ogImage?.[0]?.url is advisable whichever version you use. metascraper returns a plain object from scraper({ html, url }) with keys matching the plugins you included. Its types are provided per plugin package, so you have to assemble the return type manually for full inference.
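A small normalizer can absorb the array-or-object difference so the rest of the application sees one shape. OgImageLike below is a local stand-in type written for this sketch, not the library's exported type, and firstImageUrl is a hypothetical helper.

```typescript
// Local stand-in for the library's image type (an assumption of this sketch).
interface OgImageLike {
  url: string
  type?: string
  width?: number | string
  height?: number | string
}

// Accepts either shape and returns the first usable image URL, or null.
function firstImageUrl(
  ogImage: OgImageLike | OgImageLike[] | undefined,
): string | null {
  if (!ogImage) return null
  const first = Array.isArray(ogImage) ? ogImage[0] : ogImage
  return first?.url ?? null
}
```

Calling firstImageUrl(result.ogImage) at the boundary keeps version-specific typing out of UI code, and returning null rather than undefined makes the value safe to serialize as JSON.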
Handling Metadata Quality Across Different Sites
Real-world Open Graph data is inconsistent. A well-maintained tech site returns properly formatted og:image URLs with absolute paths and correctly sized images. A legacy CMS might return relative image URLs (/images/logo.png), no og:description, or a broken og:url that doesn't match the canonical URL. Each library handles this differently, and the differences matter in production.
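Relative image paths are the easiest of these problems to fix in application code, since the WHATWG URL constructor resolves a path against a base URL. The helper name resolveImageUrl is hypothetical, and rejecting non-http(s) results is an assumption about what a preview card should display.

```typescript
// Resolves a possibly relative image path against the page URL.
// Returns null when the value is missing, unparseable, or not http(s).
function resolveImageUrl(
  image: string | null | undefined,
  pageUrl: string,
): string | null {
  if (!image) return null
  try {
    const resolved = new URL(image, pageUrl)
    const isWeb = resolved.protocol === "http:" || resolved.protocol === "https:"
    return isWeb ? resolved.href : null
  } catch {
    return null // malformed image value or malformed base URL
  }
}

resolveImageUrl("/images/logo.png", "https://example.com/blog/post")
// "https://example.com/images/logo.png"
```

Absolute URLs pass through unchanged, so the helper can be applied to every image value regardless of which library produced it.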
metascraper applies its waterfall rules to gracefully degrade: if og:image isn't present, it tries twitter:image, then the first <img> tag in the article body above a minimum pixel dimension. The result is that metascraper almost always returns something for image, even for pages with no explicit OG tags. This makes it the best choice when you're building a link preview that must show an image for a broad range of URLs.
open-graph-scraper returns what's in the meta tags with minimal fallback logic. If og:image isn't set, result.ogImage is undefined. This is predictable but means your UI needs to handle missing fields gracefully — showing a placeholder image or a text-only preview. For a product that focuses on well-structured content sites (tech blogs, news outlets), this is acceptable. For a general-purpose URL previewer, the missing-field handling shifts to your application code.
unfurl's TypeScript types make missing-field handling explicit at the type level — meta.open_graph?.images?.[0]?.url is optional chaining at every step, forcing you to handle undefined at compile time rather than discovering it at runtime. This type-driven approach to incomplete data is unfurl's clearest advantage over the other two libraries for TypeScript projects.
Methodology
Download data from npm registry (weekly average, February 2026). Feature comparison based on metascraper v5.x, open-graph-scraper v6.x, and unfurl.js v2.x.
In 2026, open-graph-scraper is the most widely adopted choice for simple OG tag extraction, metascraper is the choice when you need intelligent metadata normalization across sites that don't publish OG tags, and unfurl.js is the best fit for chat and social applications that need complete link preview cards including player metadata.