LLM Token Counting in JavaScript: gpt-tokenizer vs js-tiktoken vs @dqbd/tiktoken 2026
TL;DR
For token counting in JavaScript, js-tiktoken wins for universal compatibility (3M+ weekly downloads, works on every platform), gpt-tokenizer wins for bundle size and small-text speed (fastest pure-JS implementation), and @dqbd/tiktoken wins for large-text throughput on Node.js backends (3–6× faster via WASM). If you're counting Claude tokens, skip all three — use Anthropic's official token-counting API.
Key Takeaways
- js-tiktoken is the most popular choice with 3M+ weekly downloads and zero-config edge compatibility
- gpt-tokenizer is the fastest pure-JS tokenizer for small texts (1.05 µs/iter), with the smallest bundle
- @dqbd/tiktoken handles large-text batch processing best via WASM (421ms vs 1,006ms pure JS)
- Accurate Claude token counting requires the Anthropic API — no open-source tokenizer exists for Claude
- o200k_base (GPT-4o) has replaced cl100k_base as the standard for new OpenAI projects
- Token counting is now critical infrastructure: quadratic attention scaling means a 100K-token context can cost ~50× more compute than a 10K-token one
Why Token Counting Matters More Than Ever in 2026
Every JavaScript developer building LLM-powered features eventually hits the same problem: a prompt silently gets truncated, an API call fails with a context-length error, or the billing dashboard shows costs 3× higher than expected.
Token counting solves all three.
LLMs don't process text in characters or words — they use tokens, subword units produced by byte-pair encoding (BPE). "JavaScript" might be a single token. "antidisestablishmentarianism" might be six. The same 1,000-word essay could be 800 tokens in GPT-4o and 1,100 tokens in Claude — and you're billed for every single one.
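The mechanics are easy to see in miniature. The sketch below is purely illustrative (it is not any real tokenizer's merge table; production BPE applies a fixed, pre-trained merge list): it performs a single merge step, fusing the most frequent adjacent pair of symbols into one unit.

```typescript
// One BPE merge step, for illustration only. Real tokenizers replay a
// pre-trained list of merges; the core operation per merge looks like this.
function bpeStep(symbols: string[]): string[] {
  // Count adjacent pairs
  const counts = new Map<string, number>()
  for (let i = 0; i < symbols.length - 1; i++) {
    const pair = symbols[i] + '\u0000' + symbols[i + 1]
    counts.set(pair, (counts.get(pair) ?? 0) + 1)
  }
  // Pick the most frequent pair (require at least two occurrences)
  let best: string | null = null
  let bestCount = 1
  for (const [pair, c] of counts) {
    if (c > bestCount) { best = pair; bestCount = c }
  }
  if (best === null) return symbols // nothing repeats: no merge to do
  // Merge every occurrence of that pair into a single symbol
  const [a, b] = best.split('\u0000')
  const out: string[] = []
  for (let i = 0; i < symbols.length; ) {
    if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
      out.push(a + b)
      i += 2
    } else {
      out.push(symbols[i])
      i += 1
    }
  }
  return out
}

console.log(bpeStep('banana'.split(''))) // [ 'b', 'an', 'an', 'a' ]
```

Repeated merge steps like this are why common words collapse to one token while rare words stay split into several.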
At scale, this matters enormously. Processing 1M prompts per day at 300 tokens each costs roughly $750/day in input tokens on GPT-4o ($2.50 per 1M input tokens). Just a 20% reduction through smarter prompt construction saves tens of thousands of dollars annually. And because attention cost grows roughly quadratically with context length, 100K tokens in a single call can cost on the order of 50× more compute than 10K tokens — not just 10× more.
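As a sanity check on those figures, here is the input-side arithmetic using GPT-4o's $2.50 per 1M input tokens (the same price used in the cost-estimation pattern later in this article):

```typescript
// Back-of-envelope daily input cost: volume × tokens ÷ 1M × price-per-1M
const promptsPerDay = 1_000_000
const tokensPerPrompt = 300
const inputPricePer1MTokens = 2.5 // USD, GPT-4o input

const dailyInputCost =
  (promptsPerDay * tokensPerPrompt / 1_000_000) * inputPricePer1MTokens
console.log(dailyInputCost) // 750

// A 20% prompt-size reduction, annualized (rounded to whole dollars):
const annualSavings = Math.round(dailyInputCost * 0.2 * 365)
console.log(annualSavings) // 54750
```

Output tokens (billed at a higher rate) come on top of this, so the real savings from tighter prompts are usually larger.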
Three JavaScript libraries handle the tokenization problem for OpenAI-compatible models. Here's how they stack up in 2026.
The Contenders
gpt-tokenizer — Fastest Pure-JS Implementation
npm: gpt-tokenizer | weekly downloads: ~80K | bundle size: ~50KB | latest: 3.4.0
gpt-tokenizer is a pure-JavaScript port of tiktoken that has quietly become the speed leader for small-text tokenization. Since v2.4.0, it benchmarks at 1.05 µs/iter for short texts — faster than WASM implementations because it avoids initialization overhead.
npm install gpt-tokenizer
import { encode, decode, encodeChat } from 'gpt-tokenizer'
// Simple token counting
const tokens = encode('Hello, world! How many tokens is this?')
console.log(tokens.length) // 9
// Chat message tokenization (accounts for special tokens)
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is the capital of France?' }
]
const chatTokens = encodeChat(messages, 'gpt-4o')
console.log(chatTokens.length) // includes message overhead tokens
The encodeChat() function is a significant advantage — it correctly accounts for the <|im_start|> and <|im_end|> special tokens that OpenAI adds around each message, which naive character-based counting misses entirely.
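That per-message overhead can also be reasoned about directly. A rough rule of thumb (matching what these chat helpers implement, and ignoring the handful of tokens the role names themselves take): about 4 wrapper tokens per message plus about 3 to prime the assistant's reply.

```typescript
// Rough chat-overhead estimate. contentTokens are per-message token counts
// obtained from any tokenizer. Treat this as an approximation, not a spec.
function chatTokenEstimate(contentTokens: number[]): number {
  const perMessageOverhead = 4 // <|im_start|>{role}\n ... <|im_end|>\n
  const replyPrime = 3         // <|im_start|>assistant priming
  return contentTokens.reduce((sum, n) => sum + n + perMessageOverhead, replyPrime)
}

console.log(chatTokenEstimate([12, 8])) // 12 + 8 + 2×4 + 3 = 31
```

This is why a chat with many short messages costs noticeably more than one long message with the same total content.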
Supported encodings: r50k_base, p50k_base, cl100k_base (GPT-4), o200k_base (GPT-4o), o200k_harmony (GPT-5 preview)
Best for: Browser apps, Vercel Edge Functions, any environment where bundle size and startup time matter.
js-tiktoken — The Universal Standard
npm: js-tiktoken | weekly downloads: 3.0M+ | bundle size: ~200KB | latest: 1.0.21
js-tiktoken is a pure-JavaScript port of OpenAI's tiktoken, maintained in the same repository as @dqbd/tiktoken. It's the de facto standard for JavaScript token counting — not because it's the fastest or smallest, but because it works everywhere without configuration.
npm install js-tiktoken
import { encodingForModel } from 'js-tiktoken'
// Get encoder for a specific model (note: camelCase, unlike the Python/WASM APIs)
const encoder = encodingForModel('gpt-4o')
const tokens = encoder.encode('How many tokens in this sentence?')
console.log(tokens.length) // 7
// Count tokens for a chat prompt with overhead
function countChatTokens(messages: Array<{ role: string; content: string }>) {
  const encoder = encodingForModel('gpt-4o')
  let count = 3 // every reply is primed with <|im_start|>assistant
  for (const message of messages) {
    count += 4 // <|im_start|>{role}\n{content}<|im_end|>\n
    count += encoder.encode(message.role).length
    count += encoder.encode(message.content).length
  }
  return count
}
One detail that trips people up: js-tiktoken is pure JavaScript, so there is no encoder.free() to call and no WASM memory to leak. (With @dqbd/tiktoken, forgetting free() is a common production gotcha.)
js-tiktoken also works identically on Cloudflare Workers and Vercel Edge without any WASM configuration — edge runtimes being the primary reason teams choose it over @dqbd/tiktoken.
Best for: Universal applications, Cloudflare Workers, projects that need OpenAI-compatible counts everywhere without WASM setup.
@dqbd/tiktoken — WASM Performance Leader
npm: @dqbd/tiktoken | weekly downloads: ~130K | bundle size: ~1.2MB | latest: 1.0.22
@dqbd/tiktoken provides WebAssembly bindings to the original Rust tiktoken implementation. For large-text processing on Node.js backends, it's 3–6× faster than pure-JavaScript alternatives.
npm install @dqbd/tiktoken
import { get_encoding, encoding_for_model } from '@dqbd/tiktoken'
const enc = get_encoding('cl100k_base')
// Process a large document
const largeDoc = getLargeDocument() // imagine 10,000 words
const tokens = enc.encode(largeDoc)
console.log(`Document: ${tokens.length} tokens`)
// Batch processing multiple documents
const docs = getDocumentBatch() // array of strings
const tokenCounts = docs.map(doc => enc.encode(doc).length)
enc.free() // Free WASM memory when done
For edge environments (Cloudflare Workers, Vercel Edge), use the lite variant:
import { Tiktoken } from '@dqbd/tiktoken/lite'
import cl100k_base from '@dqbd/tiktoken/encoders/cl100k_base.json'
// Manually load encoder data (avoids bundling all encoders)
const enc = new Tiktoken(
  cl100k_base.bpe_ranks,
  cl100k_base.special_tokens,
  cl100k_base.pat_str
)
The WASM binary accounts for the larger bundle size. In constrained environments like Cloudflare Workers (1MB limit), @dqbd/tiktoken/lite strips down to just the encoding you need.
Benchmark comparison (large text ~10K chars):
| Library | Time | Method |
|---|---|---|
| @dqbd/tiktoken | 421ms | WASM |
| tiktoken (WASM) | 452ms | WASM |
| js-tiktoken | 1,006ms | Pure JS |
| gpt-tokenizer | ~950ms | Pure JS |
Best for: Node.js backends processing large documents, batch tokenization pipelines, RAG chunking systems.
Encoding Schemes: Which One Do You Need?
The "encoding" is the vocabulary and merge rules that define how text becomes tokens. Using the wrong encoding produces different (incorrect) token counts.
| Encoding | Models | Vocab Size | When to Use |
|---|---|---|---|
| p50k_base | Legacy text-davinci models | 50K | Legacy code only — avoid |
| cl100k_base | GPT-4 Turbo, GPT-3.5-turbo, text-embedding-3 | 100K | Still widely used |
| o200k_base | GPT-4o, GPT-4o mini | 200K | New projects using GPT-4o |
| o200k_harmony | GPT-5 preview | 200K+ | Emerging — check model docs |
The shift from cl100k_base to o200k_base doubles the vocabulary size, which improves multilingual efficiency significantly. Chinese, Japanese, and Arabic text tokenizes with 20–40% fewer tokens under o200k_base compared to cl100k_base. For English-only projects, the difference is minimal.
Don't hardcode encoding names in production code. Use the encoding_for_model() function, which automatically resolves the correct encoding:
// ✅ Correct — resolves encoding automatically
const enc = encoding_for_model('gpt-4o') // → o200k_base
// ❌ Fragile — breaks when model encoding changes
const enc = get_encoding('cl100k_base')
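One case encoding_for_model may not cover out of the box is fine-tuned model names (OpenAI's ft:gpt-4o-mini:org::id naming scheme). A hedged fallback sketch, using only the model-to-encoding pairs already listed in the table above and refusing to guess for unknown models:

```typescript
// Prefix-based encoding resolver for the models this article covers.
// Order matters: 'gpt-4o' must be checked before 'gpt-4'.
const BASE_ENCODINGS: Array<[prefix: string, encoding: string]> = [
  ['gpt-4o', 'o200k_base'],
  ['gpt-4', 'cl100k_base'],
  ['gpt-3.5-turbo', 'cl100k_base'],
]

function resolveEncoding(model: string): string | null {
  // Fine-tuned models look like "ft:<base-model>:<org>::<id>"
  const base = model.startsWith('ft:') ? model.split(':')[1] : model
  for (const [prefix, encoding] of BASE_ENCODINGS) {
    if (base === prefix || base.startsWith(prefix + '-')) return encoding
  }
  return null // unknown model: don't guess an encoding
}

console.log(resolveEncoding('ft:gpt-4o-mini:acme::abc123')) // o200k_base
console.log(resolveEncoding('gpt-4-turbo')) // cl100k_base
```

Returning null (and failing loudly) beats silently counting with the wrong vocabulary, since a wrong encoding produces confidently wrong numbers.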
Counting Tokens for Claude, Gemini, and Other Models
Here's the hard truth: none of these JavaScript libraries count Claude or Gemini tokens accurately.
Anthropic uses a proprietary tokenizer. There's no open-source port. The same prompt might be 400 tokens in GPT-4 and 520 tokens in Claude — and character counting will be wrong for both.
For Claude, use the official token counting API:
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic()
const response = await client.messages.countTokens({
model: 'claude-opus-4-5',
system: 'You are a helpful assistant.',
messages: [{ role: 'user', content: 'How many tokens is this message?' }]
})
console.log(response.input_tokens) // exact count
This endpoint is free to call (separate rate limit from message generation) and supports system prompts, tools, images, and PDFs. Use it to pre-validate before sending expensive requests.
For Gemini, use the countTokens method from the official Google AI SDK:
import { GoogleGenerativeAI } from '@google/generative-ai'
const genAI = new GoogleGenerativeAI(API_KEY)
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' })
const countResult = await model.countTokens('Your prompt here')
console.log(countResult.totalTokens)
Never estimate cross-model token counts by extrapolation. A 10% error in token count estimation can mean the difference between a prompt fitting in the context window and getting silently truncated.
Edge Runtime Support: What Works Where
| Library | Cloudflare Workers | Vercel Edge | Browser | Node.js |
|---|---|---|---|---|
| gpt-tokenizer | ✅ | ✅ | ✅ | ✅ |
| js-tiktoken | ✅ | ✅ | ✅ | ✅ |
| @dqbd/tiktoken (full) | ⚠️ (>1MB limit) | ✅ | ⚠️ | ✅ |
| @dqbd/tiktoken/lite | ✅ | ✅ | ✅ | ✅ |
Cloudflare Workers have a strict 1MB script size limit. The full @dqbd/tiktoken package (including all encoder data and WASM binary) exceeds this. Use the /lite variant or switch to js-tiktoken or gpt-tokenizer for Workers.
A common pattern for edge-aware token counting:
// token-counter.ts — works everywhere
async function countTokens(text: string, model = 'gpt-4o'): Promise<number> {
// Use gpt-tokenizer for universal compatibility
const { encodeChat } = await import('gpt-tokenizer')
return encodeChat([{ role: 'user', content: text }], model).length
}
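Since the same system prompt is typically counted on every request, a small memoization layer on top of whichever counter you pick avoids re-encoding identical text. Here countFn is any counting function you inject (a deliberate abstraction for this sketch, not part of any of these libraries' APIs):

```typescript
// Cache token counts by exact text. Tokenization is deterministic for a
// given encoding, so identical strings always yield identical counts.
function makeCachedCounter(countFn: (text: string) => number) {
  const cache = new Map<string, number>()
  return (text: string): number => {
    const cached = cache.get(text)
    if (cached !== undefined) return cached
    const count = countFn(text)
    cache.set(text, count)
    return count
  }
}

// Usage with a crude stand-in counter (swap in encode(...).length for real use):
const count = makeCachedCounter((t) => t.split(/\s+/).length)
console.log(count('hello world')) // 2 (computed)
console.log(count('hello world')) // 2 (served from cache)
```

In long-running processes, bound the cache (e.g. with an LRU policy) so it doesn't grow with every unique user message.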
Production Patterns
Pattern 1: Truncate to Context Window
import { encodingForModel, type TiktokenModel } from 'js-tiktoken'
function truncateToContextWindow(
  text: string,
  maxTokens: number,
  model: TiktokenModel = 'gpt-4o'
): string {
  const encoder = encodingForModel(model)
  const tokens = encoder.encode(text)
  if (tokens.length <= maxTokens) {
    return text
  }
  // Truncate the token array and decode back to text
  // (js-tiktoken's decode returns a string directly — no TextDecoder needed)
  return encoder.decode(tokens.slice(0, maxTokens))
}
Pattern 2: API Cost Estimation Before Request
import { encode } from 'gpt-tokenizer'
const PRICING = {
  'gpt-4o': { input: 2.50, output: 10.00 }, // per 1M tokens
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
}
function estimateCost(
  prompt: string,
  expectedOutputTokens: number,
  model: keyof typeof PRICING
): number {
  const inputTokens = encode(prompt).length
  const prices = PRICING[model]
  const inputCost = (inputTokens / 1_000_000) * prices.input
  const outputCost = (expectedOutputTokens / 1_000_000) * prices.output
  return inputCost + outputCost // in USD
}
// Example: warn if prompt would cost > $0.01 per call
const cost = estimateCost(systemPrompt + userInput, 500, 'gpt-4o')
if (cost > 0.01) {
console.warn(`Expensive prompt: ~$${cost.toFixed(4)} per call`)
}
Pattern 3: RAG Chunking with Token Awareness
import { get_encoding } from '@dqbd/tiktoken'
const enc = get_encoding('cl100k_base')
function chunkDocument(
  text: string,
  maxChunkTokens = 512,
  overlapTokens = 50
): string[] {
  const tokens = enc.encode(text)
  const chunks: string[] = []
  const decoder = new TextDecoder()
  let start = 0
  while (start < tokens.length) {
    const end = Math.min(start + maxChunkTokens, tokens.length)
    chunks.push(decoder.decode(enc.decode(tokens.slice(start, end))))
    if (end === tokens.length) break // done — don't re-chunk the tail forever
    start = end - overlapTokens // overlap for context continuity
  }
  return chunks
}
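For capacity planning it helps to know how many chunks this sliding window produces. With chunk size C and overlap O, the window advances C − O tokens per step, so the count follows directly (this closed form mirrors the intended behavior of the loop above):

```typescript
// Chunk count for an N-token document, chunk size C, overlap O (requires C > O).
// First chunk covers C tokens; each later step adds C − O new tokens; the
// final partial window still counts as a chunk.
function expectedChunkCount(
  totalTokens: number,
  chunkTokens = 512,
  overlapTokens = 50
): number {
  if (totalTokens <= chunkTokens) return 1
  const stride = chunkTokens - overlapTokens
  return 1 + Math.ceil((totalTokens - chunkTokens) / stride)
}

console.log(expectedChunkCount(1000)) // 3 (tokens 0–511, 462–973, 924–999)
console.log(expectedChunkCount(10_000)) // 22
```

Multiplying the chunk count by your embedding model's per-token price gives a quick estimate of what indexing a corpus will cost.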
Which Library Should You Choose?
Choose gpt-tokenizer if:
- You're building browser-based apps or need minimal bundle size
- Your prompts are typically short (< 500 tokens)
- You need the encodeChat() convenience function
- You're deploying to Vercel Edge Functions
Choose js-tiktoken if:
- You need guaranteed compatibility across all environments without config
- You're deploying to Cloudflare Workers (no WASM hassle)
- You want the de facto standard port of OpenAI's tiktoken
- 3M+ weekly downloads means excellent ecosystem compatibility
Choose @dqbd/tiktoken if:
- You're building server-side RAG pipelines or document chunking
- You process large texts (10K+ tokens per request)
- Performance is critical for batch operations
- You don't need edge runtime support (or can use the /lite variant)
Use Anthropic/Google APIs if:
- You're building on Claude or Gemini
- You need accurate token counts for non-OpenAI models
- Cost: Anthropic's counting endpoint is free (rate limited)
Methodology
- npm download data sourced from npmjs.com and Socket.dev (March 2026)
- Benchmark data from the compare-tokenizers and tiktoken-bench repositories
- Encoding information from OpenAI's official tiktoken documentation
- Claude tokenization approach from Anthropic's token counting API docs
- Edge runtime compatibility verified against Cloudflare Workers and Vercel Edge documentation
Want to see real token usage data across npm packages? Check out PkgPulse's package comparison tool for live npm stats, health scores, and download trends.