LLM Token Counting in JavaScript: gpt-tokenizer vs js-tiktoken vs @dqbd/tiktoken 2026
TL;DR
For token counting in JavaScript, js-tiktoken wins for universal compatibility (3M+ weekly downloads, works on every platform), gpt-tokenizer wins for bundle size and small-text speed (fastest pure-JS implementation), and @dqbd/tiktoken wins for large-text throughput on Node.js backends (3–6× faster via WASM). If you're counting Claude tokens, skip all three — use Anthropic's official token-counting API.
Key Takeaways
- js-tiktoken is the most popular choice with 3M+ weekly downloads and zero-config edge compatibility
- gpt-tokenizer is the fastest pure-JS tokenizer for small texts (1.05 µs/iter), with the smallest bundle
- @dqbd/tiktoken handles large-text batch processing best via WASM (421ms vs 1,006ms pure JS)
- Accurate Claude token counting requires the Anthropic API — no open-source tokenizer exists for Claude
- o200k_base (GPT-4o) has replaced cl100k_base as the standard for new OpenAI projects
- Token counting is now critical infrastructure: quadratic attention scaling means a 100K-token context can cost ~50× more compute than a 10K-token one
Why Token Counting Matters More Than Ever in 2026
Every JavaScript developer building LLM-powered features eventually hits the same problem: a prompt silently gets truncated, an API call fails with a context-length error, or the billing dashboard shows costs 3× higher than expected.
Token counting solves all three.
LLMs don't process text in characters or words — they use tokens, subword units produced by byte-pair encoding (BPE). "JavaScript" might be a single token. "antidisestablishmentarianism" might be six. The same 1,000-word essay could be 800 tokens in GPT-4o and 1,100 tokens in Claude — and you're billed for every single one.
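The mechanics are easy to see in miniature. The sketch below is purely illustrative (it is not any real tokenizer's merge table; production BPE applies a fixed, pre-trained merge list): it performs a single merge step, fusing the most frequent adjacent pair of symbols into one unit.

```typescript
// One BPE merge step, for illustration only. Real tokenizers replay a
// pre-trained list of merges; the core operation per merge looks like this.
function bpeStep(symbols: string[]): string[] {
  // Count adjacent pairs
  const counts = new Map<string, number>()
  for (let i = 0; i < symbols.length - 1; i++) {
    const pair = symbols[i] + '\u0000' + symbols[i + 1]
    counts.set(pair, (counts.get(pair) ?? 0) + 1)
  }
  // Pick the most frequent pair (require at least two occurrences)
  let best: string | null = null
  let bestCount = 1
  for (const [pair, c] of counts) {
    if (c > bestCount) { best = pair; bestCount = c }
  }
  if (best === null) return symbols // nothing repeats: no merge to do
  // Merge every occurrence of that pair into a single symbol
  const [a, b] = best.split('\u0000')
  const out: string[] = []
  for (let i = 0; i < symbols.length; ) {
    if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
      out.push(a + b)
      i += 2
    } else {
      out.push(symbols[i])
      i += 1
    }
  }
  return out
}

console.log(bpeStep('banana'.split(''))) // [ 'b', 'an', 'an', 'a' ]
```

Repeated merge steps like this are why common words collapse to one token while rare words stay split into several.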
At scale, this matters enormously. Processing 1M prompts per day at 300 tokens each costs roughly $750/day in input tokens on GPT-4o ($2.50 per 1M input tokens). Just a 20% reduction through smarter prompt construction saves tens of thousands of dollars annually. And because attention cost grows roughly quadratically with context length, 100K tokens in a single call can cost on the order of 50× more compute than 10K tokens — not just 10× more.
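As a sanity check on those figures, here is the input-side arithmetic using GPT-4o's $2.50 per 1M input tokens (the same price used in the cost-estimation pattern later in this article):

```typescript
// Back-of-envelope daily input cost: volume × tokens ÷ 1M × price-per-1M
const promptsPerDay = 1_000_000
const tokensPerPrompt = 300
const inputPricePer1MTokens = 2.5 // USD, GPT-4o input

const dailyInputCost =
  (promptsPerDay * tokensPerPrompt / 1_000_000) * inputPricePer1MTokens
console.log(dailyInputCost) // 750

// A 20% prompt-size reduction, annualized (rounded to whole dollars):
const annualSavings = Math.round(dailyInputCost * 0.2 * 365)
console.log(annualSavings) // 54750
```

Output tokens (billed at a higher rate) come on top of this, so the real savings from tighter prompts are usually larger.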
Three JavaScript libraries handle the tokenization problem for OpenAI-compatible models. Here's how they stack up in 2026.
The Contenders
gpt-tokenizer — Fastest Pure-JS Implementation
npm: gpt-tokenizer | weekly downloads: ~80K | bundle size: ~50KB | latest: 3.4.0
gpt-tokenizer is a pure-JavaScript port of tiktoken that has quietly become the speed leader for small-text tokenization. Since v2.4.0, it benchmarks at 1.05 µs/iter for short texts — faster than WASM implementations because it avoids initialization overhead.
npm install gpt-tokenizer
import { encode, decode, encodeChat } from 'gpt-tokenizer'
// Simple token counting
const tokens = encode('Hello, world! How many tokens is this?')
console.log(tokens.length) // 9
// Chat message tokenization (accounts for special tokens)
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is the capital of France?' }
]
const chatTokens = encodeChat(messages, 'gpt-4o')
console.log(chatTokens.length) // includes message overhead tokens
The encodeChat() function is a significant advantage — it correctly accounts for the <|im_start|> and <|im_end|> special tokens that OpenAI adds around each message, which naive character-based counting misses entirely.
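That per-message overhead can also be reasoned about directly. A rough rule of thumb (matching what these chat helpers implement, and ignoring the handful of tokens the role names themselves take): about 4 wrapper tokens per message plus about 3 to prime the assistant's reply.

```typescript
// Rough chat-overhead estimate. contentTokens are per-message token counts
// obtained from any tokenizer. Treat this as an approximation, not a spec.
function chatTokenEstimate(contentTokens: number[]): number {
  const perMessageOverhead = 4 // <|im_start|>{role}\n ... <|im_end|>\n
  const replyPrime = 3         // <|im_start|>assistant priming
  return contentTokens.reduce((sum, n) => sum + n + perMessageOverhead, replyPrime)
}

console.log(chatTokenEstimate([12, 8])) // 12 + 8 + 2×4 + 3 = 31
```

This is why a chat with many short messages costs noticeably more than one long message with the same total content.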
Supported encodings: r50k_base, p50k_base, cl100k_base (GPT-4), o200k_base (GPT-4o), o200k_harmony (GPT-5 preview)
Best for: Browser apps, Vercel Edge Functions, any environment where bundle size and startup time matter.
js-tiktoken — The Universal Standard
npm: js-tiktoken | weekly downloads: 3.0M+ | bundle size: ~200KB | latest: 1.0.21
js-tiktoken is a pure-JavaScript port of OpenAI's tiktoken, maintained in the same repository as @dqbd/tiktoken. It's the de facto standard for JavaScript token counting — not because it's the fastest or smallest, but because it works everywhere without configuration.
npm install js-tiktoken
import { encodingForModel } from 'js-tiktoken'
// Get encoder for a specific model (note: camelCase, unlike the Python/WASM APIs)
const encoder = encodingForModel('gpt-4o')
const tokens = encoder.encode('How many tokens in this sentence?')
console.log(tokens.length) // 7
// Count tokens for a chat prompt with overhead
function countChatTokens(messages: Array<{ role: string; content: string }>) {
  const encoder = encodingForModel('gpt-4o')
  let count = 3 // every reply is primed with <|im_start|>assistant
  for (const message of messages) {
    count += 4 // <|im_start|>{role}\n{content}<|im_end|>\n
    count += encoder.encode(message.role).length
    count += encoder.encode(message.content).length
  }
  return count
}
One detail that trips people up: js-tiktoken is pure JavaScript, so there is no encoder.free() to call and no WASM memory to leak. (With @dqbd/tiktoken, forgetting free() is a common production gotcha.)
js-tiktoken also works identically on Cloudflare Workers and Vercel Edge without any WASM configuration — edge runtimes being the primary reason teams choose it over @dqbd/tiktoken.
Best for: Universal applications, Cloudflare Workers, projects that need OpenAI-compatible counts everywhere without WASM setup.
@dqbd/tiktoken — WASM Performance Leader
npm: @dqbd/tiktoken | weekly downloads: ~130K | bundle size: ~1.2MB | latest: 1.0.22
@dqbd/tiktoken provides WebAssembly bindings to the original Rust tiktoken implementation. For large-text processing on Node.js backends, it's 3–6× faster than pure-JavaScript alternatives.
npm install @dqbd/tiktoken
import { get_encoding, encoding_for_model } from '@dqbd/tiktoken'
const enc = get_encoding('cl100k_base')
// Process a large document
const largeDoc = getLargeDocument() // imagine 10,000 words
const tokens = enc.encode(largeDoc)
console.log(`Document: ${tokens.length} tokens`)
// Batch processing multiple documents
const docs = getDocumentBatch() // array of strings
const tokenCounts = docs.map(doc => enc.encode(doc).length)
enc.free() // Free WASM memory when done
For edge environments (Cloudflare Workers, Vercel Edge), use the lite variant:
import { Tiktoken } from '@dqbd/tiktoken/lite'
import cl100k_base from '@dqbd/tiktoken/encoders/cl100k_base.json'
// Manually load encoder data (avoids bundling all encoders)
const enc = new Tiktoken(
  cl100k_base.bpe_ranks,
  cl100k_base.special_tokens,
  cl100k_base.pat_str
)
The WASM binary accounts for the larger bundle size. In constrained environments like Cloudflare Workers (1MB limit), @dqbd/tiktoken/lite strips down to just the encoding you need.
Benchmark comparison (large text ~10K chars):
| Library | Time | Method |
|---|---|---|
| @dqbd/tiktoken | 421ms | WASM |
| tiktoken (WASM) | 452ms | WASM |
| js-tiktoken | 1,006ms | Pure JS |
| gpt-tokenizer | ~950ms | Pure JS |
Best for: Node.js backends processing large documents, batch tokenization pipelines, RAG chunking systems.
Encoding Schemes: Which One Do You Need?
The "encoding" is the vocabulary and merge rules that define how text becomes tokens. Using the wrong encoding produces different (incorrect) token counts.
| Encoding | Models | Vocab Size | When to Use |
|---|---|---|---|
| p50k_base | Legacy text-davinci models | 50K | Legacy code only — avoid |
| cl100k_base | GPT-4 Turbo, GPT-3.5-turbo, text-embedding-3 | 100K | Still widely used |
| o200k_base | GPT-4o, GPT-4o mini | 200K | New projects using GPT-4o |
| o200k_harmony | GPT-5 preview | 200K+ | Emerging — check model docs |
The shift from cl100k_base to o200k_base doubles the vocabulary size, which improves multilingual efficiency significantly. Chinese, Japanese, and Arabic text tokenizes with 20–40% fewer tokens under o200k_base compared to cl100k_base. For English-only projects, the difference is minimal.
Don't hardcode encoding names in production code. Use the encoding_for_model() function, which automatically resolves the correct encoding:
// ✅ Correct — resolves encoding automatically
const enc = encoding_for_model('gpt-4o') // → o200k_base
// ❌ Fragile — breaks when model encoding changes
const enc = get_encoding('cl100k_base')
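One case encoding_for_model may not cover out of the box is fine-tuned model names (OpenAI's ft:gpt-4o-mini:org::id naming scheme). A hedged fallback sketch, using only the model-to-encoding pairs already listed in the table above and refusing to guess for unknown models:

```typescript
// Prefix-based encoding resolver for the models this article covers.
// Order matters: 'gpt-4o' must be checked before 'gpt-4'.
const BASE_ENCODINGS: Array<[prefix: string, encoding: string]> = [
  ['gpt-4o', 'o200k_base'],
  ['gpt-4', 'cl100k_base'],
  ['gpt-3.5-turbo', 'cl100k_base'],
]

function resolveEncoding(model: string): string | null {
  // Fine-tuned models look like "ft:<base-model>:<org>::<id>"
  const base = model.startsWith('ft:') ? model.split(':')[1] : model
  for (const [prefix, encoding] of BASE_ENCODINGS) {
    if (base === prefix || base.startsWith(prefix + '-')) return encoding
  }
  return null // unknown model: don't guess an encoding
}

console.log(resolveEncoding('ft:gpt-4o-mini:acme::abc123')) // o200k_base
console.log(resolveEncoding('gpt-4-turbo')) // cl100k_base
```

Returning null (and failing loudly) beats silently counting with the wrong vocabulary, since a wrong encoding produces confidently wrong numbers.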
Counting Tokens for Claude, Gemini, and Other Models
Here's the hard truth: none of these JavaScript libraries count Claude or Gemini tokens accurately.
Anthropic uses a proprietary tokenizer. There's no open-source port. The same prompt might be 400 tokens in GPT-4 and 520 tokens in Claude — and character counting will be wrong for both.
For Claude, use the official token counting API:
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic()
const response = await client.messages.countTokens({
model: 'claude-opus-4-5',
system: 'You are a helpful assistant.',
messages: [{ role: 'user', content: 'How many tokens is this message?' }]
})
console.log(response.input_tokens) // exact count
This endpoint is free to call (separate rate limit from message generation) and supports system prompts, tools, images, and PDFs. Use it to pre-validate before sending expensive requests.
For Gemini, use the countTokens method from the official Google AI SDK:
import { GoogleGenerativeAI } from '@google/generative-ai'
const genAI = new GoogleGenerativeAI(API_KEY)
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' })
const countResult = await model.countTokens('Your prompt here')
console.log(countResult.totalTokens)
Never estimate cross-model token counts by extrapolation. A 10% error in token count estimation can mean the difference between a prompt fitting in the context window and getting silently truncated.
Edge Runtime Support: What Works Where
| Library | Cloudflare Workers | Vercel Edge | Browser | Node.js |
|---|---|---|---|---|
| gpt-tokenizer | ✅ | ✅ | ✅ | ✅ |
| js-tiktoken | ✅ | ✅ | ✅ | ✅ |
| @dqbd/tiktoken (full) | ⚠️ (>1MB limit) | ✅ | ⚠️ | ✅ |
| @dqbd/tiktoken/lite | ✅ | ✅ | ✅ | ✅ |
Cloudflare Workers have a strict 1MB script size limit. The full @dqbd/tiktoken package (including all encoder data and WASM binary) exceeds this. Use the /lite variant or switch to js-tiktoken or gpt-tokenizer for Workers.
A common pattern for edge-aware token counting:
// token-counter.ts — works everywhere
async function countTokens(text: string, model = 'gpt-4o'): Promise<number> {
// Use gpt-tokenizer for universal compatibility
const { encodeChat } = await import('gpt-tokenizer')
return encodeChat([{ role: 'user', content: text }], model).length
}
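Since the same system prompt is typically counted on every request, a small memoization layer on top of whichever counter you pick avoids re-encoding identical text. Here countFn is any counting function you inject (a deliberate abstraction for this sketch, not part of any of these libraries' APIs):

```typescript
// Cache token counts by exact text. Tokenization is deterministic for a
// given encoding, so identical strings always yield identical counts.
function makeCachedCounter(countFn: (text: string) => number) {
  const cache = new Map<string, number>()
  return (text: string): number => {
    const cached = cache.get(text)
    if (cached !== undefined) return cached
    const count = countFn(text)
    cache.set(text, count)
    return count
  }
}

// Usage with a crude stand-in counter (swap in encode(...).length for real use):
const count = makeCachedCounter((t) => t.split(/\s+/).length)
console.log(count('hello world')) // 2 (computed)
console.log(count('hello world')) // 2 (served from cache)
```

In long-running processes, bound the cache (e.g. with an LRU policy) so it doesn't grow with every unique user message.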
Production Patterns
Pattern 1: Truncate to Context Window
import { encodingForModel, type TiktokenModel } from 'js-tiktoken'
function truncateToContextWindow(
  text: string,
  maxTokens: number,
  model: TiktokenModel = 'gpt-4o'
): string {
  const encoder = encodingForModel(model)
  const tokens = encoder.encode(text)
  if (tokens.length <= maxTokens) {
    return text
  }
  // Truncate the token array and decode back to text
  // (js-tiktoken's decode returns a string directly — no TextDecoder needed)
  return encoder.decode(tokens.slice(0, maxTokens))
}
Pattern 2: API Cost Estimation Before Request
import { encode } from 'gpt-tokenizer'
const PRICING = {
  'gpt-4o': { input: 2.50, output: 10.00 }, // per 1M tokens
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
}
function estimateCost(
  prompt: string,
  expectedOutputTokens: number,
  model: keyof typeof PRICING
): number {
  const inputTokens = encode(prompt).length
  const prices = PRICING[model]
  const inputCost = (inputTokens / 1_000_000) * prices.input
  const outputCost = (expectedOutputTokens / 1_000_000) * prices.output
  return inputCost + outputCost // in USD
}
// Example: warn if prompt would cost > $0.01 per call
const cost = estimateCost(systemPrompt + userInput, 500, 'gpt-4o')
if (cost > 0.01) {
console.warn(`Expensive prompt: ~$${cost.toFixed(4)} per call`)
}
Pattern 3: RAG Chunking with Token Awareness
import { get_encoding } from '@dqbd/tiktoken'
const enc = get_encoding('cl100k_base')
function chunkDocument(
  text: string,
  maxChunkTokens = 512,
  overlapTokens = 50
): string[] {
  const tokens = enc.encode(text)
  const chunks: string[] = []
  const decoder = new TextDecoder()
  let start = 0
  while (start < tokens.length) {
    const end = Math.min(start + maxChunkTokens, tokens.length)
    chunks.push(decoder.decode(enc.decode(tokens.slice(start, end))))
    if (end === tokens.length) break // done — don't re-chunk the tail forever
    start = end - overlapTokens // overlap for context continuity
  }
  return chunks
}
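For capacity planning it helps to know how many chunks this sliding window produces. With chunk size C and overlap O, the window advances C − O tokens per step, so the count follows directly (this closed form mirrors the intended behavior of the loop above):

```typescript
// Chunk count for an N-token document, chunk size C, overlap O (requires C > O).
// First chunk covers C tokens; each later step adds C − O new tokens; the
// final partial window still counts as a chunk.
function expectedChunkCount(
  totalTokens: number,
  chunkTokens = 512,
  overlapTokens = 50
): number {
  if (totalTokens <= chunkTokens) return 1
  const stride = chunkTokens - overlapTokens
  return 1 + Math.ceil((totalTokens - chunkTokens) / stride)
}

console.log(expectedChunkCount(1000)) // 3 (tokens 0–511, 462–973, 924–999)
console.log(expectedChunkCount(10_000)) // 22
```

Multiplying the chunk count by your embedding model's per-token price gives a quick estimate of what indexing a corpus will cost.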
Which Library Should You Choose?
Choose gpt-tokenizer if:
- You're building browser-based apps or need minimal bundle size
- Your prompts are typically short (< 500 tokens)
- You need the encodeChat() convenience function
- You're deploying to Vercel Edge Functions
Choose js-tiktoken if:
- You need guaranteed compatibility across all environments without config
- You're deploying to Cloudflare Workers (no WASM hassle)
- You want the de facto standard port of OpenAI's tiktoken
- 3M+ weekly downloads means excellent ecosystem compatibility
Choose @dqbd/tiktoken if:
- You're building server-side RAG pipelines or document chunking
- You process large texts (10K+ tokens per request)
- Performance is critical for batch operations
- You don't need edge runtime support (or can use the /lite variant)
Use Anthropic/Google APIs if:
- You're building on Claude or Gemini
- You need accurate token counts for non-OpenAI models
- Cost: Anthropic's counting endpoint is free (rate limited)
Methodology
- npm download data sourced from npmjs.com and Socket.dev (March 2026)
- Benchmark data from the compare-tokenizers and tiktoken-bench repositories
- Encoding information from OpenAI's official tiktoken documentation
- Claude tokenization approach from Anthropic's token counting API docs
- Edge runtime compatibility verified against Cloudflare Workers and Vercel Edge documentation
Want to see real token usage data across npm packages? Check out PkgPulse's package comparison tool for live npm stats, health scores, and download trends.