franc vs langdetect vs cld3: Language Detection in JavaScript (2026)
TL;DR
franc is the most popular JavaScript language detection library: pure JavaScript, works in browsers and Node.js, and ships in three builds (franc-min with 82 languages for smaller bundles, franc with 187, and franc-all with 400+). langdetect is a port of Google's language detection library; it is accurate on longer texts and designed for Node.js. @google-cloud/language and cld3 (compiled to WASM) offer Google's production-grade detection but require more setup. For browser-compatible language detection, use franc. For server-side detection with high accuracy, use langdetect or cld3. For short texts (tweets, comments), every detector struggles; franc-min is usually fine.
Key Takeaways
- franc: ~400K weekly downloads; 187 languages (400+ in franc-all), browser + Node.js, ESM-native, configurable
- langdetect: ~50K weekly downloads; port of Google's LangDetect, probabilistic, Node.js
- cld3 / @langion/cld3: WASM-compiled Compact Language Detector v3, Google's production algorithm
- All language detectors struggle with short texts (<50 chars), code snippets, and mixed-language text
- franc's francAll returns relative scores (the best candidate is always 1), so judge ambiguity by the gap to the runner-up
- For production apps, consider server-side detection with langdetect or cld3 for better accuracy
franc
franc — pure JavaScript language detection:
Basic usage
import { franc } from "franc"
// Or: import { franc } from "franc-min" // Fewer languages, smaller bundle
// Detect language:
franc("Hello, how are you?") // "eng" (English)
franc("Bonjour, comment allez-vous?") // "fra" (French)
franc("Guten Morgen, wie geht es Ihnen?") // "deu" (German)
franc("こんにちは、お元気ですか?") // "jpn" (Japanese)
franc("你好,你好吗?") // "cmn" (Mandarin Chinese)
franc("مرحبا كيف حالك؟") // "arb" (Arabic)
// Returns ISO 639-3 codes (3-letter codes, not ISO 639-1 2-letter codes)
// "eng" not "en", "fra" not "fr", "deu" not "de"
// Convert to ISO 639-1 if needed. The "iso-639-3" package exports an iso6393To1
// lookup table; for a handful of languages a manual map is enough:
const lang3 = franc("Hello world") // "eng"
const iso1Map: Record<string, string> = { eng: "en", fra: "fr", deu: "de", jpn: "ja" }
const lang1 = iso1Map[lang3] ?? lang3 // Falls back to the 3-letter code if unmapped
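Because franc returns ISO 639-3 codes while many web APIs expect ISO 639-1, a small conversion helper is often useful. The following is a minimal sketch, not part of franc: `ISO3_TO_ISO1` and `toIso1` are hypothetical names, and the table only covers a few languages.

```typescript
// Hypothetical lookup table from ISO 639-3 (as returned by franc) to ISO 639-1.
// Deliberately small; extend it with the languages your app supports.
const ISO3_TO_ISO1 = new Map<string, string>([
  ["eng", "en"], ["fra", "fr"], ["deu", "de"], ["spa", "es"],
  ["por", "pt"], ["ita", "it"], ["nld", "nl"], ["rus", "ru"],
  ["jpn", "ja"], ["kor", "ko"], ["cmn", "zh"], ["arb", "ar"],
])

// Returns a 2-letter code, or null when the input is "und" or not in the table.
function toIso1(iso3: string): string | null {
  if (iso3 === "und") return null
  return ISO3_TO_ISO1.get(iso3) ?? null
}

console.log(toIso1("fra")) // "fr"
console.log(toIso1("und")) // null
```

Returning null instead of echoing the 3-letter code makes the "unknown" case explicit, so callers cannot accidentally pass a 639-3 code where a 639-1 code is expected.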
Confidence scores
import { francAll } from "franc"
// Get all candidates with scores, best first:
const results = francAll("Hello, how are you?")
// [
// ["eng", 1], // English: the best match always scores 1
// ["sco", 0.8], // Scots (illustrative score)
// ["nob", 0.5], // Norwegian Bokmål (illustrative score)
// ...
// ]
// Scores are relative to the best candidate, so the top score is always 1.
// To gauge confidence, measure the gap between the top two candidates:
function detectLanguage(text: string, minMargin = 0.05): string | null {
  const [best, runnerUp] = francAll(text)
  if (!best || best[0] === "und") {
    return null // Too short, or no supported script found
  }
  const margin = best[1] - (runnerUp?.[1] ?? 0)
  return margin >= minMargin ? best[0] : null // ISO 639-3 code, or null if ambiguous
}
detectLanguage("Hi") // null (below the default minLength of 10, so franc reports "und")
detectLanguage("Bonjour tout le monde, comment allez-vous ?") // Likely "fra"
// minMargin = 0.05 is an arbitrary starting point; tune it on your own data
Configuration options
import { franc, francAll } from "franc"
// Limit to specific languages (improves accuracy when domain is known):
franc("Hello world", { only: ["eng", "fra", "deu", "spa"] })
// "eng" (only considers English, French, German, Spanish)
// Exclude certain languages:
franc("Hello world", { ignore: ["sco", "nob"] })
// "eng" (doesn't confuse with Scots or Norwegian)
// Minimum text length (default: 10):
franc("Hi", { minLength: 0 }) // Attempt even with very short text
franc("Hi", { minLength: 10 }) // Returns "und" (undetermined) for short text
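One way to know the domain in advance is to use the languages the client already advertises. The sketch below builds an allow-list for franc's `only` option from an `Accept-Language` header. `allowListFromHeader` and `ISO1_TO_ISO3` are hypothetical names, the table is intentionally small, and q-values are ignored for simplicity.

```typescript
// Hypothetical table from ISO 639-1 primary subtags to the ISO 639-3 codes franc uses.
const ISO1_TO_ISO3 = new Map<string, string>([
  ["en", "eng"], ["fr", "fra"], ["de", "deu"], ["es", "spa"],
  ["pt", "por"], ["it", "ita"], ["nl", "nld"], ["ja", "jpn"],
])

// Turns "en-US,en;q=0.9,fr;q=0.8" into ["eng", "fra"].
// Unknown languages are dropped; duplicates are removed; header order is kept.
function allowListFromHeader(header: string): string[] {
  const codes: string[] = []
  for (const part of header.split(",")) {
    const tag = (part.split(";")[0] ?? "").trim().toLowerCase() // "en-us"
    const primary = tag.split("-")[0] ?? "" // "en"
    const iso3 = ISO1_TO_ISO3.get(primary)
    if (iso3 && !codes.includes(iso3)) codes.push(iso3)
  }
  return codes
}

const only = allowListFromHeader("en-US,en;q=0.9,fr;q=0.8")
console.log(only) // ["eng", "fra"]
// Assumed usage: franc(text, only.length > 0 ? { only } : {})
```

Falling back to an empty options object when the list is empty avoids restricting franc to zero candidate languages.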
franc-min vs franc vs franc-all
// franc ships in three variants; import from only one of them:
// franc-min: 82 languages, ~540KB (best for browsers):
import { franc } from "franc-min"
// franc: 187 languages, ~1.5MB (more coverage):
import { franc } from "franc"
// franc-all: 400+ languages (most comprehensive):
import { franc } from "franc-all"
// For browser apps, use franc-min; the bundle size difference is significant
// For server-side: franc (187 languages) is fine
Content moderation use case
import { franc } from "franc-min"
interface UserContent {
id: string
text: string
expectedLanguage: string // ISO 639-1: "en", "fr", etc.
}
const iso1ToIso3: Record<string, string> = {
en: "eng", fr: "fra", de: "deu", es: "spa",
pt: "por", it: "ita", nl: "nld", ja: "jpn",
zh: "cmn", ar: "arb", ru: "rus", ko: "kor",
}
function validateContentLanguage(content: UserContent): boolean {
  const expected = iso1ToIso3[content.expectedLanguage]
  if (!expected) {
    return true // No mapping for this language, so it cannot be validated
  }
  const detected = franc(content.text, { minLength: 20 })
  if (detected === "und") {
    return true // Too short to determine, so let it through
  }
  return detected === expected
}
langdetect
langdetect — Google's LangDetect algorithm for Node.js:
Basic usage
import langdetect from "langdetect"
// Note: langdetect returns ISO 639-1 (2-letter) codes, with a region suffix for Chinese
// Detect the single most likely language:
langdetect.detectOne("Hello, how are you?")
// "en"
langdetect.detectOne("Bonjour, comment allez-vous?")
// "fr"
langdetect.detectOne("这是一段中文文本")
// "zh-cn"
// Detect with probabilities (all candidates, most likely first):
langdetect.detect("Hello, how are you?")
// [
// { lang: "en", prob: 0.9999 },
// ...
// ]
Compared to franc accuracy
// langdetect performs better on longer texts (50+ words)
// franc holds up better on short sentences (5-10 words), but not on single words
// Both struggle with mixed-language content and code
// Test on very short text:
import { franc } from "franc-min"
import langdetect from "langdetect"
const shortText = "Hello"
franc(shortText) // "und" (below the default minLength of 10)
franc(shortText, { minLength: 1 }) // Often wrong on a single word, e.g. "sco"
langdetect.detectOne(shortText) // "en" (usually correct)
// Test on longer text:
const paragraph = "This is a longer text that contains multiple sentences in English."
franc(paragraph) // "eng" ✓
langdetect.detectOne(paragraph) // "en" ✓
// langdetect is probabilistic: it runs multiple trials internally
// More accurate for longer texts due to statistical approach
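Accuracy claims like these are easiest to check on your own data. The sketch below scores any set of detectors against labeled samples. `Detector`, `Sample`, and `scoreDetectors` are hypothetical names, and the stub detectors stand in for franc or langdetect so the example runs without dependencies.

```typescript
// A detector is any function that maps text to a language code.
// Real libraries would be wrapped to fit this signature.
type Detector = (text: string) => string

interface Sample {
  text: string
  expected: string // Must use the same code system the detector returns
}

// Returns the fraction of samples each detector labels correctly.
function scoreDetectors(
  detectors: Record<string, Detector>,
  samples: Sample[],
): Record<string, number> {
  const scores: Record<string, number> = {}
  for (const [name, detect] of Object.entries(detectors)) {
    const correct = samples.filter((s) => detect(s.text) === s.expected).length
    scores[name] = samples.length > 0 ? correct / samples.length : 0
  }
  return scores
}

// Stub detectors (not real libraries) so the sketch is self-contained.
const stubs: Record<string, Detector> = {
  alwaysEnglish: () => "en",
  byAccent: (text) => (/[éèàç]/.test(text) ? "fr" : "en"),
}

const samples: Sample[] = [
  { text: "This is a short English sentence.", expected: "en" },
  { text: "Voilà une phrase française.", expected: "fr" },
]

console.log(scoreDetectors(stubs, samples)) // { alwaysEnglish: 0.5, byAccent: 1 }
```

When comparing franc with langdetect, normalize both outputs to one code system first, since franc returns ISO 639-3 and langdetect returns ISO 639-1.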
cld3 (Compact Language Detector)
CLD3: Google's production language detection (note that node-cld wraps the older CLD2):
// @langion/cld3 — WASM build of Google's CLD3:
import cld3 from "@langion/cld3"
await cld3.ready() // Wait for WASM initialization
const result = cld3.findLanguage("Hello, how are you?")
// {
// language: "en",
// probability: 0.9999...,
// isReliable: true,
// proportion: 1.0,
// }
// Find top 3 languages (for mixed-language text):
const results = cld3.findTopNMostFreqLangs("Hello world, Bonjour monde!", 3)
// [
// { language: "en", probability: 0.6, isReliable: true },
// { language: "fr", probability: 0.3, isReliable: false },
// ]
// CLD3 is the most accurate for production use cases
// but requires WASM setup and is larger than franc/langdetect
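For mixed-language text, the per-language results usually need filtering before use. The sketch below keeps only candidates that are flagged reliable and meet a minimum probability. The `DetectedLanguage` shape is copied from the result objects shown above and is an assumption about the binding in use; `reliableLanguages` is a hypothetical helper.

```typescript
// Assumed result shape, based on the example output above rather than a documented type.
interface DetectedLanguage {
  language: string
  probability: number
  isReliable: boolean
}

// Keeps candidates that are flagged reliable and meet the probability floor,
// ordered from most to least likely.
function reliableLanguages(
  results: DetectedLanguage[],
  minProbability = 0.5,
): string[] {
  return results
    .filter((r) => r.isReliable && r.probability >= minProbability)
    .sort((a, b) => b.probability - a.probability)
    .map((r) => r.language)
}

const mixed: DetectedLanguage[] = [
  { language: "en", probability: 0.6, isReliable: true },
  { language: "fr", probability: 0.3, isReliable: false },
]

console.log(reliableLanguages(mixed)) // ["en"]
```

An empty result is a useful signal in its own right: it means no language cleared the bar and the text should be treated as undetermined.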
Feature Comparison
| Feature | franc | langdetect | cld3 |
|---|---|---|---|
| Language count | 187 (400+ in franc-all) | 55 | 107 |
| Short text | ⚠️ Weak | ⚠️ Weak | ✅ Better |
| Browser support | ✅ | ❌ | ✅ (WASM) |
| Bundle size | ~540KB (min) | ~2MB | ~8MB (WASM) |
| ISO codes | 639-3 | 639-1 | 639-1 |
| Confidence score | ✅ | ✅ | ✅ |
| ESM | ✅ | ❌ | ✅ |
| TypeScript | ✅ | ✅ @types | ✅ |
| No binary deps | ✅ | ✅ | ✅ (WASM) |
| Accuracy (long text) | Good | Very good | Excellent |
When to Use Each
Choose franc if:
- Browser compatibility required (React, Vue, Svelte apps)
- You need broad language coverage (400+ languages with franc-all)
- Lightweight detection (franc-min for browser bundles)
- ESM-first codebase
Choose langdetect if:
- Server-side Node.js only (no browser)
- You need the probabilistic accuracy of Google's original algorithm
- Text is typically 50+ words (langdetect shines with longer text)
Choose cld3 if:
- Production apps requiring Google-grade accuracy
- You can accept the WASM bundle overhead (~8MB)
- Mixed-language text detection is important
Handle edge cases:
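// All detectors struggle with these cases, so handle them gracefully:
import { francAll } from "franc"
// Returns true when detection should be skipped and "und" reported instead
function isUndetectable(text: string): boolean {
  // 1. Very short text: don't trust detection
  if (text.trim().length < 20) return true
  // 2. All-caps or all numbers:
  if (/^[A-Z0-9\s]+$/.test(text)) return true
  return false
}
// 3. Mixed language (code-switching):
// Consider breaking into sentences first
// 4. Code/technical content:
// Package names, URLs, and code snippets often produce wrong results
// Strip them before detecting
// 5. Ambiguity threshold (francAll's top score is always 1, so compare with the runner-up):
function isAmbiguous(text: string, minMargin = 0.05): boolean {
  const [best, runnerUp] = francAll(text)
  return !best || best[0] === "und" || best[1] - (runnerUp?.[1] ?? 0) < minMargin
}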
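Cases 3 and 4 can be combined into a small preprocessing step: strip noisy tokens, split into sentences, then detect each sentence separately. The sketch below is one possible approach under simple assumptions. `stripNoise`, `splitSentences`, and `detectPerSentence` are hypothetical helpers, the regular expressions are rough heuristics, and the detector is passed in as a function so any library can be plugged in.

```typescript
type Detector = (text: string) => string

// Removes content that tends to confuse detectors: inline code, URLs,
// and @mentions or #hashtags. Whitespace is collapsed afterwards.
function stripNoise(text: string): string {
  return text
    .replace(/`[^`]*`/g, " ") // inline code
    .replace(/https?:\/\/\S+/g, " ") // URLs
    .replace(/[@#]\w+/g, " ") // mentions and hashtags
    .replace(/\s+/g, " ")
    .trim()
}

// Splits after Latin sentence punctuation followed by whitespace,
// or directly after CJK sentence punctuation.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+|(?<=[。！？])/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
}

// Counts how many sentences were detected as each language.
// Sentences shorter than minLength, or detected as "und", are skipped.
function detectPerSentence(
  text: string,
  detect: Detector,
  minLength = 20,
): Map<string, number> {
  const counts = new Map<string, number>()
  for (const sentence of splitSentences(stripNoise(text))) {
    if (sentence.length < minLength) continue
    const lang = detect(sentence)
    if (lang === "und") continue
    counts.set(lang, (counts.get(lang) ?? 0) + 1)
  }
  return counts
}

// Stub detector (not a real library) so the sketch runs on its own.
const stub: Detector = (s) => (/\b(le|la|est|une)\b/i.test(s) ? "fra" : "eng")

const input =
  "Check `npm install franc` at https://example.com for details today. " +
  "La documentation est disponible en ligne pour tout le monde."

console.log(stripNoise(input))
console.log(Array.from(detectPerSentence(input, stub))) // [["eng", 1], ["fra", 1]]
```

Counting per sentence rather than returning a single label keeps the mixed-language signal visible, so the caller can decide whether one language dominates.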
Methodology
Download data from npm registry (weekly average, February 2026). Accuracy comparisons based on community benchmarks and documentation for franc v6.x, langdetect v1.x, and @langion/cld3 v1.x.