DataLoader vs p-batch vs graphql-batch: Batching & Deduplication (2026)
TL;DR
DataLoader (from Meta/Facebook) is the standard solution for the N+1 problem in GraphQL: it batches multiple individual loads within a single event loop tick into one bulk request, and deduplicates repeated loads of the same key. p-batch is a generic promise-based batching utility that works outside GraphQL contexts. graphql-batch is a Ruby gem (from Shopify), not a JS library; in the JavaScript ecosystem the equivalent is DataLoader itself. In 2026: use DataLoader in any GraphQL resolver, always create a new DataLoader instance per request (so caches stay per-user), and combine it with Prisma's findMany for maximum efficiency.
Key Takeaways
- dataloader: ~4M weekly downloads — Meta's N+1 solution, per-tick batching, built-in deduplication + in-memory cache
- p-batch: ~500K weekly downloads — generic batching, not GraphQL-specific, useful for REST or non-resolver contexts
- N+1 problem: 1 list query + 100 per-row author queries = 101 queries → DataLoader: 1 batch query for all users
- DataLoader batches within a single event loop tick — multiple loader.load(id) calls get batched automatically
- Always create DataLoader per request (not global) — the cache should not persist across users
- DataLoader's built-in cache prevents duplicate fetches within the same request
The N+1 Problem
// Without DataLoader — N+1 queries:
const query = `
  query {
    posts {      # 1 query to get 100 posts
      title
      author {   # 100 queries — one per post author!
        name
      }
    }
  }
`
// Resolver that causes N+1:
const resolvers = {
  Post: {
    author: async (post) => {
      // Called once PER POST — 100 individual DB queries!
      return db.user.findUnique({ where: { id: post.authorId } })
    },
  },
}
// Database log:
// SELECT * FROM posts;               -- 1 query
// SELECT * FROM users WHERE id = 1;  -- then 100 queries, one per post:
// SELECT * FROM users WHERE id = 2;
// SELECT * FROM users WHERE id = 1;  -- ← duplicate!
// ...
// Total: 101 queries for a single GraphQL request
DataLoader
DataLoader — batch and cache async data fetching:
How it works
Event loop tick:
1. post.author resolver calls loader.load(userId) for user 1
2. post.author resolver calls loader.load(userId) for user 2
3. post.author resolver calls loader.load(userId) for user 1 (duplicate!)
...all within the same tick...
4. End of tick: DataLoader collects all unique keys: [1, 2, 3, ...]
5. Calls batchFn([1, 2, 3, ...]) → single DB query
6. Resolves each individual promise with its result
7. Duplicate keys get the same cached result
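The mechanism above can be sketched in a few lines of TypeScript. This is a simplified illustration, not the real DataLoader internals (the real library schedules batches with process.nextTick and handles per-key errors); `TinyLoader` is a made-up name for the sketch:

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>

class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = []
  private cache = new Map<K, Promise<V>>()

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    const cached = this.cache.get(key)
    if (cached) return cached // deduplication: repeat keys share one promise
    const promise = new Promise<V>((resolve) => {
      if (this.queue.length === 0) {
        // First load in this tick: schedule one flush to run after all
        // synchronous loads have enqueued their keys.
        queueMicrotask(() => this.flush())
      }
      this.queue.push({ key, resolve })
    })
    this.cache.set(key, promise)
    return promise
  }

  private async flush() {
    const batch = this.queue
    this.queue = []
    // One batch call for every key collected during the tick:
    const values = await this.batchFn(batch.map((item) => item.key))
    batch.forEach((item, i) => item.resolve(values[i]))
  }
}
```

Calling `load(1)`, `load(2)`, `load(1)` in the same tick yields a single `batchFn([1, 2])` call, with the duplicate key served from the cache.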
Basic usage
import DataLoader from "dataloader"
import { PrismaClient, type User } from "@prisma/client"
const db = new PrismaClient()
// Define a batch function — receives array of keys, returns array of values in same order:
const userLoader = new DataLoader<number, User | null>(async (userIds) => {
  // One bulk query for all IDs:
  const users = await db.user.findMany({
    where: { id: { in: [...userIds] } },
  })
  // CRITICAL: Return results in the SAME ORDER as the input keys
  // (DataLoader requires 1:1 correspondence: keys[i] → results[i])
  const userMap = new Map(users.map((u) => [u.id, u]))
  return userIds.map((id) => userMap.get(id) ?? null)
})
// Load individual items — calls made in the same tick get batched.
// NOTE: awaiting each load() sequentially would dispatch a batch per await;
// use Promise.all so all loads are enqueued in one tick:
const [user1, user2, user1again] = await Promise.all([
  userLoader.load(1), // Doesn't query yet
  userLoader.load(2), // Doesn't query yet
  userLoader.load(1), // Duplicate key — will use cache
])
// End of tick → one batch query: SELECT ... WHERE id IN (1, 2)
// user1again === user1 (same reference — deduped from cache)
Per-request context pattern (correct way)
// IMPORTANT: Create new DataLoader per request — not global!
// If global: user A's request can see user B's cached data (security issue)
// context.ts — create loaders per request:
import DataLoader from "dataloader"
import type { PrismaClient, User, Post, Comment } from "@prisma/client"

interface Context {
  db: PrismaClient
  loaders: {
    user: DataLoader<number, User | null>
    post: DataLoader<number, Post | null>
    comment: DataLoader<number, Comment[]>
  }
}

function createContext(db: PrismaClient): Context {
  return {
    db,
    loaders: {
      user: new DataLoader<number, User | null>(async (ids) => {
        const users = await db.user.findMany({
          where: { id: { in: [...ids] } },
        })
        const map = new Map(users.map((u) => [u.id, u]))
        return ids.map((id) => map.get(id) ?? null)
      }),
      post: new DataLoader<number, Post | null>(async (ids) => {
        const posts = await db.post.findMany({
          where: { id: { in: [...ids] } },
        })
        const map = new Map(posts.map((p) => [p.id, p]))
        return ids.map((id) => map.get(id) ?? null)
      }),
      // One-to-many loader (comments for a post):
      comment: new DataLoader<number, Comment[]>(async (postIds) => {
        const comments = await db.comment.findMany({
          where: { postId: { in: [...postIds] } },
        })
        // Group by postId:
        const byPost = new Map<number, Comment[]>()
        for (const comment of comments) {
          const list = byPost.get(comment.postId) ?? []
          list.push(comment)
          byPost.set(comment.postId, list)
        }
        return postIds.map((id) => byPost.get(id) ?? [])
      }),
    },
  }
}
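The by-id loaders repeat the same fetch-then-align logic. A small factory can remove that duplication; `makeByIdBatchFn` is a hypothetical helper (not part of dataloader), shown here for any entity with a numeric `id`:

```typescript
// Hypothetical helper: builds a DataLoader-compatible batch function from a
// bulk fetcher, aligning results to the input key order (null for misses).
function makeByIdBatchFn<T extends { id: number }>(
  fetchByIds: (ids: number[]) => Promise<T[]>,
): (ids: readonly number[]) => Promise<(T | null)[]> {
  return async (ids) => {
    const rows = await fetchByIds([...ids])
    const byId = new Map(rows.map((r) => [r.id, r]))
    // Results must line up 1:1 with the input keys:
    return ids.map((id) => byId.get(id) ?? null)
  }
}
```

With it, the user loader could become `new DataLoader(makeByIdBatchFn((ids) => db.user.findMany({ where: { id: { in: ids } } })))`.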
Using loaders in resolvers
// resolvers.ts — resolvers use context.loaders, not direct DB calls:
const resolvers = {
  Query: {
    posts: async (_: unknown, __: unknown, ctx: Context) => {
      return ctx.db.post.findMany({ take: 100 })
    },
  },
  Post: {
    // This resolver is called 100 times for 100 posts:
    author: async (post: Post, _: unknown, ctx: Context) => {
      // DataLoader batches all 100 calls into 1 query!
      return ctx.loaders.user.load(post.authorId)
    },
    comments: async (post: Post, _: unknown, ctx: Context) => {
      // Also batched — all comment lookups for all posts in one query:
      return ctx.loaders.comment.load(post.id)
    },
  },
  Comment: {
    author: async (comment: Comment, _: unknown, ctx: Context) => {
      // Deduplication: multiple comments by same author share cache:
      return ctx.loaders.user.load(comment.authorId)
    },
  },
}
Integration with graphql-yoga / Apollo Server
import { createSchema, createYoga } from "graphql-yoga"
import { createServer } from "http"

const schema = createSchema({ typeDefs, resolvers })
const yoga = createYoga({
  schema,
  context: async () => {
    // New loaders per request — no data leakage between requests:
    return createContext(db)
  },
})
createServer(yoga).listen(4000)

// Apollo Server v4 equivalent:
import { ApolloServer } from "@apollo/server"
import { expressMiddleware } from "@apollo/server/express4"

const server = new ApolloServer({ typeDefs, resolvers })
app.use("/graphql", expressMiddleware(server, {
  context: async ({ req }) => createContext(db),
}))
Cache control
import DataLoader from "dataloader"

const userLoader = new DataLoader<number, User | null>(batchFn, {
  // Disable cache if you want fresh data every call within same request:
  cache: false,
  // Custom cache map (e.g., LRU for long-lived loaders):
  // cacheMap: new LRUCache({ max: 1000 }),
  // Max batch size (avoid huge IN clauses):
  maxBatchSize: 100,
  // Batch scheduler — default is process.nextTick, can customize:
  batchScheduleFn: (callback) => setTimeout(callback, 10), // 10ms window
})

// Manual cache operations:
userLoader.prime(1, existingUser) // Seed cache without calling batchFn
userLoader.clear(1)               // Invalidate specific key
userLoader.clearAll()             // Clear entire cache
p-batch
p-batch — generic promise batching:
Basic usage
import pBatch from "p-batch"

// Generic batch processor — works for any async operation:
const processInBatches = pBatch(
  async (items: number[]) => {
    // Process all items at once:
    const results = await db.user.findMany({ where: { id: { in: items } } })
    const map = new Map(results.map((r) => [r.id, r]))
    return items.map((id) => map.get(id) ?? null)
  },
  { maxBatchSize: 50 }
)

// Individual calls get batched:
const [user1, user2, user3] = await Promise.all([
  processInBatches(1),
  processInBatches(2),
  processInBatches(3),
])
Compared to DataLoader
// DataLoader: automatic tick-based batching + deduplication + cache
// p-batch: configurable batching (time window or size), no built-in cache
// p-batch is useful for non-GraphQL batching:
// - Rate-limited API calls
// - Bulk event processing
// - Batching writes (not just reads)
// Example: batch API calls to avoid rate limiting:
const fetchUserFromApi = pBatch(
  async (userIds: string[]) => {
    const response = await fetch(`/api/users?ids=${userIds.join(",")}`)
    const data = await response.json()
    return userIds.map((id) => data.users.find((u: User) => u.id === id) ?? null)
  },
  {
    maxBatchSize: 50,
    maxWait: 50, // Wait up to 50ms to accumulate a batch
  }
)
Feature Comparison
| Feature | DataLoader | p-batch |
|---|---|---|
| Built-in cache (deduplication) | ✅ | ❌ |
| Tick-based auto-batching | ✅ | ❌ (time-based) |
| Max batch size | ✅ | ✅ |
| GraphQL-specific | Designed for GraphQL | Agnostic |
| TypeScript | ✅ | ✅ |
| prime() / clear() cache API | ✅ | ❌ |
| Custom cache map | ✅ | ❌ |
| Weekly downloads | ~4M | ~500K |
When to Use Each
Choose DataLoader if:
- Building GraphQL resolvers and want to solve the N+1 problem
- Need automatic per-tick batching with no extra setup
- Want built-in deduplication so duplicate keys use cached results
- Working with any GraphQL server (Apollo, graphql-yoga, mercurius, etc.)
Choose p-batch if:
- Batching outside of GraphQL (REST API clients, event processing)
- Need time-window batching rather than event-loop-tick batching
- Batching writes or side effects (not just reads)
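Time-window batching of this kind can also be hand-rolled in a few lines. The sketch below is generic and assumes nothing about p-batch's actual API; `createTimedBatcher` and its option names are my own:

```typescript
// Generic time-window batcher (sketch): collect calls for up to maxWait ms
// (or until maxBatchSize keys queue up), then run one batch.
function createTimedBatcher<K, V>(
  batchFn: (keys: K[]) => Promise<V[]>,
  opts: { maxWait?: number; maxBatchSize?: number } = {},
): (key: K) => Promise<V> {
  const { maxWait = 50, maxBatchSize = 100 } = opts
  let pending: { key: K; resolve: (v: V) => void; reject: (e: unknown) => void }[] = []
  let timer: ReturnType<typeof setTimeout> | null = null

  async function flush() {
    if (timer) clearTimeout(timer)
    timer = null
    const batch = pending
    pending = []
    try {
      const values = await batchFn(batch.map((p) => p.key))
      batch.forEach((p, i) => p.resolve(values[i]))
    } catch (err) {
      batch.forEach((p) => p.reject(err))
    }
  }

  return (key) =>
    new Promise<V>((resolve, reject) => {
      pending.push({ key, resolve, reject })
      if (pending.length >= maxBatchSize) {
        void flush() // Size cap hit: dispatch immediately
      } else if (!timer) {
        timer = setTimeout(flush, maxWait) // First call opens the window
      }
    })
}
```

Unlike DataLoader's tick-based scheduling, this trades a few milliseconds of latency for larger batches, which is often the right call for rate-limited APIs.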
DataLoader best practices:
// ✅ DO: Create per request
const context = { loaders: createLoaders(db) }

// ❌ DON'T: Create globally
const globalLoader = new DataLoader(batchFn) // Cache persists across users!

// ✅ DO: Return results in same order as keys
const batchFn = async (ids) => {
  const results = await db.user.findMany({ where: { id: { in: ids } } })
  const map = new Map(results.map(r => [r.id, r]))
  return ids.map(id => map.get(id) ?? null) // Ordered!
}

// ❌ DON'T: Return unsorted results
const badBatchFn = async (ids) => {
  return db.user.findMany({ where: { id: { in: ids } } }) // DB may return rows in a different order!
}

// ✅ DO: Use prime() to seed cache from list queries
const posts = await db.post.findMany({ take: 100 })
for (const post of posts) {
  ctx.loaders.post.prime(post.id, post) // A later loader.load(post.id) hits the cache — no re-fetch
}
Methodology
Download data from npm registry (weekly average, February 2026). Feature comparison based on dataloader v2.x and p-batch v3.x.