franc vs langdetect vs cld3: Language Detection in JavaScript (2026)

·PkgPulse Team

TL;DR

franc is the most popular JavaScript language detection library: pure JavaScript, runs in browsers and Node.js, and ships in three sizes (franc-min with 82 languages, franc with 187, franc-all with 414). langdetect is a port of Google's language detection algorithm; it is accurate on longer texts and designed for Node.js. @google-cloud/language and cld3 (compiled to WASM) offer Google's production-grade detection but require more setup. For browser-compatible detection, use franc. For server-side detection with high accuracy, use langdetect or cld3. For short texts (tweets, comments), every detector struggles; franc-min is usually adequate as long as you treat ambiguous results as undetermined.

Key Takeaways

  • franc: ~400K weekly downloads, 187 languages (414 in franc-all), browser + Node.js, ESM-native, configurable
  • langdetect: ~50K weekly downloads — port of Google's LangDetect, probabilistic, Node.js
  • cld3 / @langion/cld3: WASM-compiled Compact Language Detector v3 — Google's production algorithm
  • All language detectors struggle with: short texts (<50 chars), code snippets, mixed-language text
  • francAll returns relative scores (the best candidate always scores 1), so gauge confidence by its lead over the runner-up
  • For production apps: consider server-side with langdetect or cld3 for better accuracy

franc

franc — pure JavaScript language detection:

Basic usage

import { franc } from "franc"
// Or: import { franc } from "franc-min"  // Fewer languages, smaller bundle

// Detect language:
franc("Hello, how are you?")    // "eng" (English)
franc("Bonjour, comment allez-vous?")   // "fra" (French)
franc("Guten Morgen, wie geht es Ihnen?")  // "deu" (German)
franc("こんにちは、お元気ですか?")  // "jpn" (Japanese)
franc("你好,你今天过得怎么样?")  // "cmn" (Mandarin Chinese); inputs under 10 characters return "und" by default
franc("مرحبا كيف حالك؟")  // "arb" (Arabic)

// Returns ISO 639-3 codes (3-letter codes, not ISO 639-1 2-letter codes)
// "eng" not "en", "fra" not "fr", "deu" not "de"

// Convert to ISO 639-1 if needed, using a lookup table
// (the iso-639-3 package exports a complete iso6393To1 map):
const lang3 = franc("Hello world")  // "eng"
const iso1Map: Record<string, string> = { eng: "en", fra: "fr", deu: "de", jpn: "ja" }
const lang1 = iso1Map[lang3] ?? lang3

Confidence scores

import { francAll } from "franc"

// Get all candidates with scores, best first:
const results = francAll("Hello, how are you?")
// [
//   ["eng", 1.0],   // English: the top candidate always scores 1
//   ["sco", 0.8],   // Scots: scores are relative to the best match
//   ["nob", 0.5],   // Norwegian Bokmål
//   ...
// ]

// Because the top score is always 1, a threshold on it filters nothing.
// Measure confidence as the lead over the runner-up instead
// (0.05 is an illustrative default; tune it on your own data):
function detectLanguage(text: string, minMargin = 0.05): string | null {
  const [best, runnerUp] = francAll(text)

  if (!best || best[0] === "und") {
    return null  // Too short, or nothing to match against
  }

  const margin = best[1] - (runnerUp ? runnerUp[1] : 0)
  return margin >= minMargin ? best[0] : null  // ISO 639-3 code, or null if too close to call
}

detectLanguage("Hi")                    // null (below minLength, so franc reports "und")
detectLanguage("Bonjour tout le monde") // "fra" if it leads the runner-up by at least minMargin

Configuration options

import { franc, francAll } from "franc"

// Limit to specific languages (improves accuracy when domain is known):
franc("Hello world", { only: ["eng", "fra", "deu", "spa"] })
// "eng" (only considers English, French, German, Spanish)

// Exclude certain languages:
franc("Hello world", { ignore: ["sco", "nob"] })
// "eng" (doesn't confuse with Scots or Norwegian)

// Minimum text length (default: 10):
franc("Hi", { minLength: 0 })   // Attempt even with very short text
franc("Hi", { minLength: 10 })  // Returns "und" (undetermined) for short text
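
When the candidate languages come from the request rather than a hard-coded list, the `only` option can be derived from an `Accept-Language` header. The following is a minimal sketch, not part of franc: `onlyFromAcceptLanguage` and the small `ISO1_TO_ISO3` table are hypothetical helpers introduced for illustration.

```typescript
// Hypothetical lookup table (ISO 639-1 -> ISO 639-3); extend it for your audience.
const ISO1_TO_ISO3: Record<string, string> = {
  en: "eng", fr: "fra", de: "deu", es: "spa", pt: "por", ja: "jpn",
}

// Turn "fr-CH, fr;q=0.9, en;q=0.8, *;q=0.5" into ["fra", "eng"].
// Quality values are ignored because `only` is an unordered allow-list.
function onlyFromAcceptLanguage(header: string): string[] {
  const codes: string[] = []
  header.split(",").forEach((part) => {
    // Drop the quality value (";q=0.9") and the region subtag ("-CH").
    const tag = (part.split(";")[0] || "").trim().toLowerCase()
    const primary = tag.split("-")[0] || ""
    // Own-property check so keys like "constructor" are never treated as codes.
    if (!Object.prototype.hasOwnProperty.call(ISO1_TO_ISO3, primary)) return
    const iso3 = ISO1_TO_ISO3[primary] as string
    if (codes.indexOf(iso3) === -1) codes.push(iso3)
  })
  return codes
}
```

If the returned list is empty (unknown or missing header), falling back to an unrestricted `franc(text)` call is probably the safer choice than passing an empty `only` array.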

franc-min vs franc vs franc-all

// franc ships multiple variants:

// franc-min: 82 languages, ~540KB (best for browsers):
import { franc } from "franc-min"

// franc: 187 languages, ~1.5MB (more coverage):
import { franc } from "franc"

// franc-all: 414 languages (most comprehensive):
import { franc } from "franc-all"

// Use one import per module; the three are shown together only for comparison
// For browser apps, use franc-min: the bundle size difference is significant
// For server-side: franc (187 languages) is fine

Content moderation use case

import { franc } from "franc-min"

interface UserContent {
  id: string
  text: string
  expectedLanguage: string  // ISO 639-1: "en", "fr", etc.
}

const iso1ToIso3: Record<string, string> = {
  en: "eng", fr: "fra", de: "deu", es: "spa",
  pt: "por", it: "ita", nl: "nld", ja: "jpn",
  zh: "cmn", ar: "arb", ru: "rus", ko: "kor",
}

function validateContentLanguage(content: UserContent): boolean {
  const expected = iso1ToIso3[content.expectedLanguage]
  if (!expected) {
    return true  // No mapping for this language, so it cannot be validated: let through
  }

  const detected = franc(content.text, { minLength: 20 })
  if (detected === "und") {
    return true  // Too short to determine: let through
  }

  return detected === expected
}
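
To run the same check over a queue of posts, the detector can be passed in as a function, which keeps the logic testable without loading any language data. This is a sketch under assumptions: `flagMismatches`, `Post`, and `Detect` are hypothetical names, and `detect` is expected to behave like franc (an ISO 639-3 code, or "und" when undetermined).

```typescript
// Any function that maps text to an ISO 639-3 code ("und" when undetermined).
type Detect = (text: string) => string

interface Post {
  id: string
  text: string
  expected: string // ISO 639-3, e.g. "eng"
}

// Return the ids of posts whose detected language differs from the expected one.
// "und" counts as a pass, mirroring validateContentLanguage above.
function flagMismatches(posts: Post[], detect: Detect): string[] {
  return posts
    .filter((post) => {
      const detected = detect(post.text)
      return detected !== "und" && detected !== post.expected
    })
    .map((post) => post.id)
}
```

With franc this might be called as `flagMismatches(posts, (text) => franc(text, { minLength: 20 }))`.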

langdetect

langdetect — Google's LangDetect algorithm for Node.js:

Basic usage

import langdetect from "langdetect"
// Note: langdetect uses ISO 639-1 (2-letter codes) by default

// Detect the single most likely language:
langdetect.detectOne("Hello, how are you?")
// "en"

langdetect.detectOne("Bonjour, comment allez-vous?")
// "fr"

langdetect.detectOne("这是一段中文文本")
// "zh-cn"

// Detect with probabilities (all candidates, most likely first):
langdetect.detect("Hello, how are you?")
// [
//   { lang: "en", prob: 0.9999 },
//   { lang: "af", prob: 0.0000... },  // Afrikaans
//   ...
// ]
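
Because langdetect reports regional variants such as "zh-cn", it can help to fold them into a base language before comparing against other detectors. The sketch below works on a list of `{ lang, prob }` candidates like the one shown above; `baseLanguage`, `mergeByBaseLanguage`, and `topLanguage` are hypothetical helpers, and the 0.9 default is an illustrative cutoff rather than a library setting.

```typescript
interface Candidate {
  lang: string
  prob: number
}

// "zh-cn" -> "zh", "zh-tw" -> "zh", "en" -> "en"
function baseLanguage(code: string): string {
  return (code.split("-")[0] || code).toLowerCase()
}

// Sum the probabilities of regional variants and sort best-first.
function mergeByBaseLanguage(candidates: Candidate[]): Candidate[] {
  const totals: Record<string, number> = {}
  for (const c of candidates) {
    const base = baseLanguage(c.lang)
    totals[base] = (totals[base] || 0) + c.prob
  }
  return Object.keys(totals)
    .map((lang) => ({ lang, prob: totals[lang] || 0 }))
    .sort((a, b) => b.prob - a.prob)
}

// Most probable base language, or null when nothing reaches minProb.
function topLanguage(candidates: Candidate[], minProb = 0.9): string | null {
  const best = mergeByBaseLanguage(candidates)[0]
  return best !== undefined && best.prob >= minProb ? best.lang : null
}
```

Merging before thresholding matters when probability mass is split between variants: two candidates at roughly 0.5 each would both fail a 0.9 cutoff on their own.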

Compared to franc accuracy

// langdetect performs better on longer texts (50+ words)
// franc performs better on very short texts (5-10 words)
// Both struggle with mixed-language content and code

// Test on short text:
import { franc } from "franc-min"
import langdetect from "langdetect"

const shortText = "Hello"
franc(shortText)                           // "und" (below the default minLength of 10)
franc(shortText, { minLength: 1 })         // may misfire on very short text (e.g. "sco")
langdetect.detectOne(shortText)            // "en" (usually correct)

// Test on longer text:
const paragraph = "This is a longer text that contains multiple sentences in English."
franc(paragraph)                           // "eng" ✓
langdetect.detectOne(paragraph)            // "en" ✓

// langdetect is probabilistic — runs multiple trials internally
// More accurate for longer texts due to statistical approach
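
Accuracy claims like these depend heavily on the text being classified, so it is worth measuring on your own samples. A minimal harness might look like the following; `Sample`, `Detector`, and `accuracy` are hypothetical names, and the detector is injected so the same function can score any library.

```typescript
interface Sample {
  text: string
  expected: string // the label, in whatever code standard the detector returns
}

type Detector = (text: string) => string

// Fraction of samples for which the detector returns the expected code.
// Returns 0 for an empty sample list to avoid dividing by zero.
function accuracy(samples: Sample[], detect: Detector): number {
  if (samples.length === 0) return 0
  let correct = 0
  for (const sample of samples) {
    if (detect(sample.text) === sample.expected) correct += 1
  }
  return correct / samples.length
}
```

In practice `detect` would wrap a real library call and normalize its output to one code standard, so that franc (ISO 639-3) and langdetect (ISO 639-1) are scored against the same labels. Bucketing samples by length before scoring shows where each library starts to become reliable.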

cld3 (Compact Language Detector)

CLD3, Google's Compact Language Detector v3, for production language detection (node-cld, by contrast, wraps the older CLD2):

// @langion/cld3 — WASM build of Google's CLD3:
import cld3 from "@langion/cld3"

await cld3.ready()  // Wait for WASM initialization

const result = cld3.findLanguage("Hello, how are you?")
// {
//   language: "en",
//   probability: 0.9999...,
//   isReliable: true,
//   proportion: 1.0,
// }

// Find top 3 languages (for mixed-language text):
const results = cld3.findTopNMostFreqLangs("Hello world, Bonjour monde!", 3)
// [
//   { language: "en", probability: 0.6, isReliable: true },
//   { language: "fr", probability: 0.3, isReliable: false },
// ]

// CLD3 is the most accurate for production use cases
// but requires WASM setup and is larger than franc/langdetect
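
For mixed-language text, the per-language results usually need filtering before they are acted on. The sketch below assumes results shaped like the objects in the example above (`language`, `probability`, `isReliable`); that shape is taken from this article rather than verified against the package, and `reliableLanguages` and its 0.5 default are hypothetical.

```typescript
// Assumed result shape, copied from the example output above.
interface LanguageResult {
  language: string
  probability: number
  isReliable: boolean
}

// Keep only languages the detector marks as reliable and that clear a probability floor.
function reliableLanguages(results: LanguageResult[], minProbability = 0.5): string[] {
  return results
    .filter((r) => r.isReliable && r.probability >= minProbability)
    .map((r) => r.language)
}
```

Requiring both conditions is a conservative design choice: a result can carry a high probability and still be flagged unreliable, and either signal alone may be too permissive for moderation or routing decisions.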

Feature Comparison

Feature | franc | langdetect | cld3
Language count | 187 (414 in franc-all) | 55 | 107
Short text | ⚠️ Weak | ⚠️ Weak | ✅ Better
Browser support | ✅ | ❌ | ✅ (WASM)
Bundle size | ~540KB (min) | ~2MB | ~8MB (WASM)
ISO codes | 639-3 | 639-1 | 639-1
Confidence score | ✅ (relative) | ✅ | ✅
ESM | ✅ |  |
TypeScript |  | ✅ @types |
No binary deps | ✅ |  | ✅ (WASM)
Accuracy (long text) | Good | Very good | Excellent

When to Use Each

Choose franc if:

  • Browser compatibility required (React, Vue, Svelte apps)
  • You need broad language coverage (187 languages, 414 with franc-all)
  • Lightweight detection (franc-min for browser bundles)
  • ESM-first codebase

Choose langdetect if:

  • Server-side Node.js only (no browser)
  • You need the probabilistic accuracy of Google's original algorithm
  • Text is typically 50+ words (langdetect shines with longer text)

Choose cld3 if:

  • Production apps requiring Google-grade accuracy
  • You can accept the WASM bundle overhead (~8MB)
  • Mixed-language text detection is important

Handle edge cases:

// All detectors struggle with these cases — handle gracefully:

// 1. Very short text:
if (text.length < 20) return "und"  // Don't trust detection

// 2. All-caps or all numbers:
if (/^[A-Z0-9\s]+$/.test(text)) return "und"

// 3. Mixed language (code-switching):
// Consider breaking into sentences first

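// 4. Code/technical content:
// Package names, URLs, and code snippets usually produce wrong results
// Strip them before detecting

// 5. Confidence threshold:
// francAll's top score is always 1, so check its lead over the runner-up
// (0.05 is an illustrative cutoff; tune it on your own data)
const [best, runnerUp] = francAll(text)
if (runnerUp && best[1] - runnerUp[1] < 0.05) return "und"  // Too close to call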
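
Cases 3 and 4 above can be handled with a small preprocessing step that runs before any detector. This is a rough sketch: `stripNoise` and `splitSentences` are hypothetical helpers, and the regular expressions are deliberately simple, so they will miss some inputs (bare domains, unusual punctuation).

```typescript
// Remove content that carries no natural-language signal.
function stripNoise(text: string): string {
  return text
    .replace(/`{3}[\s\S]*?`{3}/g, " ")   // fenced code blocks
    .replace(/`[^`]*`/g, " ")            // inline code
    .replace(/https?:\/\/\S+/g, " ")     // URLs
    .replace(/@[A-Za-z0-9_]+/g, " ")     // @mentions
    .replace(/\s+/g, " ")                // collapse whitespace left behind
    .trim()
}

// Split on sentence-ending punctuation (Latin and CJK) so each sentence
// can be detected on its own in code-switched text.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?。！？]+[.!?。！？]*/g) || []
  return parts.map((part) => part.trim()).filter((part) => part.length > 0)
}
```

Running `stripNoise` first matters because the dots inside URLs would otherwise be treated as sentence boundaries. Sentences that end up shorter than the detector's minimum length are better reported as "und" than guessed.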

Methodology

Download data from npm registry (weekly average, February 2026). Accuracy comparisons based on community benchmarks and documentation for franc v6.x, langdetect v1.x, and @langion/cld3 v1.x.
