Guide

DataLoader vs p-batch vs graphql-batch 2026

Compare DataLoader, p-batch, and graphql-batch for solving the N+1 query problem in GraphQL APIs. Batching strategies, request deduplication, and caching.

·PkgPulse Team·

TL;DR

DataLoader (from Meta/Facebook) is the standard solution for the N+1 problem in GraphQL — it batches multiple individual loads within a single event loop tick into one bulk request, and deduplicates repeated loads of the same key. p-batch is a generic promise-based batching utility that works outside GraphQL contexts. graphql-batch is a Ruby gem, not a JS library — in the JavaScript ecosystem the equivalent is DataLoader itself, sometimes combined with graphql-dataloader for easier integration. In 2026: use DataLoader in any GraphQL resolver, always create a new DataLoader instance per request (for per-user caching), and combine it with Prisma's findMany for maximum efficiency.

Key Takeaways

  • dataloader: ~4M weekly downloads — Meta's N+1 solution, per-tick batching, built-in deduplication + in-memory cache
  • p-batch: ~500K weekly downloads — generic batching, not GraphQL-specific, useful for REST or non-resolver contexts
  • N+1 problem: 1 query for 100 posts + 100 author queries = 101 total → DataLoader: 1 batched query for all authors
  • DataLoader batches within a single event loop tick — multiple loader.load(id) calls get batched automatically
  • Always create DataLoader per request (not global) — the cache should not persist across users
  • DataLoader's built-in cache prevents duplicate fetches within the same request

The N+1 Problem

// Without DataLoader — N+1 queries:
const query = `
  query {
    posts {        # 1 query to get 100 posts
      title
      author {     # 100 queries — one per post author!
        name
      }
    }
  }
`

// Resolver that causes N+1:
const resolvers = {
  Post: {
    author: async (post) => {
      // Called once PER POST — 100 individual DB queries!
      return db.user.findUnique({ where: { id: post.authorId } })
    },
  },
}

// Database log:
// SELECT * FROM posts;                              -- 1 query
// SELECT * FROM users WHERE id = 1;                -- 100 queries
// SELECT * FROM users WHERE id = 2;
// SELECT * FROM users WHERE id = 1;  ← duplicate!
// ...
// Total: 101 queries for a single GraphQL request

DataLoader

DataLoader — batch and cache async data fetching:

How it works

Event loop tick:
  1. post.author resolver calls loader.load(userId) for user 1
  2. post.author resolver calls loader.load(userId) for user 2
  3. post.author resolver calls loader.load(userId) for user 1 (duplicate!)
  ...all within the same tick...
  4. End of tick: DataLoader collects all unique keys: [1, 2, 3, ...]
  5. Calls batchFn([1, 2, 3, ...]) → single DB query
  6. Resolves each individual promise with its result
  7. Duplicate keys get the same cached result

Basic usage

import DataLoader from "dataloader"
import { PrismaClient } from "@prisma/client"

const db = new PrismaClient()

// Define a batch function — receives array of keys, returns array of values in same order:
const userLoader = new DataLoader<number, User | null>(async (userIds) => {
  // One bulk query for all IDs:
  const users = await db.user.findMany({
    where: { id: { in: [...userIds] } },
  })

  // CRITICAL: Return results in the SAME ORDER as the input keys
  // (DataLoader requires 1:1 correspondence: keys[i] → results[i])
  const userMap = new Map(users.map((u) => [u.id, u]))
  return userIds.map((id) => userMap.get(id) ?? null)
})

// Load individual items (these get batched automatically):
const user1 = await userLoader.load(1)    // Doesn't query yet
const user2 = await userLoader.load(2)    // Doesn't query yet
const user1again = await userLoader.load(1)  // Will use cache

// End of tick → one batch query: SELECT WHERE id IN (1, 2)
// user1again === user1 (same reference — deduped from cache)

Per-request context pattern (correct way)

// IMPORTANT: Create new DataLoader per request — not global!
// If global: user A's request can see user B's cached data (security issue)

// context.ts — create loaders per request:
interface Context {
  db: PrismaClient
  loaders: {
    user: DataLoader<number, User | null>
    post: DataLoader<number, Post | null>
    comment: DataLoader<number, Comment[]>
  }
}

function createContext(db: PrismaClient): Context {
  return {
    db,
    loaders: {
      user: new DataLoader<number, User | null>(async (ids) => {
        const users = await db.user.findMany({
          where: { id: { in: [...ids] } },
        })
        const map = new Map(users.map((u) => [u.id, u]))
        return ids.map((id) => map.get(id) ?? null)
      }),

      post: new DataLoader<number, Post | null>(async (ids) => {
        const posts = await db.post.findMany({
          where: { id: { in: [...ids] } },
        })
        const map = new Map(posts.map((p) => [p.id, p]))
        return ids.map((id) => map.get(id) ?? null)
      }),

      // One-to-many loader (comments for a post):
      comment: new DataLoader<number, Comment[]>(async (postIds) => {
        const comments = await db.comment.findMany({
          where: { postId: { in: [...postIds] } },
        })
        // Group by postId:
        const byPost = new Map<number, Comment[]>()
        for (const comment of comments) {
          const list = byPost.get(comment.postId) ?? []
          list.push(comment)
          byPost.set(comment.postId, list)
        }
        return postIds.map((id) => byPost.get(id) ?? [])
      }),
    },
  }
}

Using loaders in resolvers

// resolvers.ts — resolvers use context.loaders, not direct DB calls:
const resolvers = {
  Query: {
    posts: async (_: unknown, __: unknown, ctx: Context) => {
      return ctx.db.post.findMany({ take: 100 })
    },
  },

  Post: {
    // This resolver is called 100 times for 100 posts:
    author: async (post: Post, _: unknown, ctx: Context) => {
      // DataLoader batches all 100 calls into 1 query!
      return ctx.loaders.user.load(post.authorId)
    },

    comments: async (post: Post, _: unknown, ctx: Context) => {
      // Also batched — all comment lookups for all posts in one query:
      return ctx.loaders.comment.load(post.id)
    },
  },

  Comment: {
    author: async (comment: Comment, _: unknown, ctx: Context) => {
      // Deduplication: multiple comments by same author share cache:
      return ctx.loaders.user.load(comment.authorId)
    },
  },
}

Integration with graphql-yoga / Apollo Server

import { createSchema, createYoga } from "graphql-yoga"
import { createServer } from "http"

const yoga = createYoga({
  schema,
  context: async () => {
    // New loaders per request — no data leakage between requests:
    return createContext(db)
  },
})

// Apollo Server v4 equivalent:
import { ApolloServer } from "@apollo/server"
import { expressMiddleware } from "@apollo/server/express4"

const server = new ApolloServer({ typeDefs, resolvers })

app.use("/graphql", expressMiddleware(server, {
  context: async ({ req }) => createContext(db),
}))

Cache control

import DataLoader from "dataloader"

const userLoader = new DataLoader<number, User | null>(batchFn, {
  // Disable cache if you want fresh data every call within same request:
  cache: false,

  // Custom cache map (e.g., LRU for long-lived loaders):
  // cacheMap: new LRUCache({ max: 1000 }),

  // Max batch size (avoid huge IN clauses):
  maxBatchSize: 100,

  // Batch scheduler — default is process.nextTick, can customize:
  batchScheduleFn: (callback) => setTimeout(callback, 10),  // 10ms window
})

// Manual cache operations:
userLoader.prime(1, existingUser)  // Seed cache without calling batchFn
userLoader.clear(1)                 // Invalidate specific key
userLoader.clearAll()               // Clear entire cache

p-batch

p-batch — generic promise batching:

Basic usage

import pBatch from "p-batch"

// Generic batch processor — works for any async operation:
const processInBatches = pBatch(
  async (items: number[]) => {
    // Process all items at once:
    const results = await db.user.findMany({ where: { id: { in: items } } })
    const map = new Map(results.map((r) => [r.id, r]))
    return items.map((id) => map.get(id) ?? null)
  },
  { maxBatchSize: 50 }
)

// Individual calls get batched:
const [user1, user2, user3] = await Promise.all([
  processInBatches(1),
  processInBatches(2),
  processInBatches(3),
])

Compared to DataLoader

// DataLoader: automatic tick-based batching + deduplication + cache
// p-batch: configurable batching (time window or size), no built-in cache

// p-batch is useful for non-GraphQL batching:
// - Rate-limited API calls
// - Bulk event processing
// - Batching writes (not just reads)

// Example: batch API calls to avoid rate limiting:
const fetchUserFromApi = pBatch(
  async (userIds: string[]) => {
    const response = await fetch(`/api/users?ids=${userIds.join(",")}`)
    const data = await response.json()
    return userIds.map((id) => data.users.find((u: User) => u.id === id) ?? null)
  },
  {
    maxBatchSize: 50,
    maxWait: 50,  // Wait up to 50ms to accumulate a batch
  }
)

Feature Comparison

Feature                          DataLoader      p-batch
Built-in cache (deduplication)   ✅              ❌
Tick-based auto-batching         ✅              ❌ (time-based)
Max batch size                   ✅              ✅
GraphQL-specific                 Designed for    Agnostic
TypeScript                       ✅              ✅
prime() / clear() cache API      ✅              ❌
Weekly downloads                 ~4M             ~500K
Custom cache map                 ✅              ❌

Why the N+1 Problem Matters in Production

The N+1 query problem is one of the most impactful performance issues in GraphQL APIs precisely because it goes unnoticed during development with small datasets. A developer testing with five posts and five users sees five queries — barely noticeable. A production deployment with ten thousand posts executing one database query per post author generates ten thousand database round-trips for a single GraphQL request, easily causing timeouts and database connection pool exhaustion. DataLoader's solution is elegant because it requires no changes to the GraphQL schema or the shape of resolver code — the resolver still calls loader.load(id) as if loading one item, and DataLoader transparently batches the loads that happen within the same event loop tick. This architecture means you can add DataLoader to an existing GraphQL API with N+1 problems by adding loaders to the context and updating resolvers, without changing the schema.
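The transparency comes from when the batch is dispatched, not from any schema-level machinery. Here is a stripped-down sketch of the scheduling idea — illustrative only, not the real dataloader implementation, and deliberately without its cache or deduplication:

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>

// TinyLoader: collect every load() issued during the current synchronous run,
// then dispatch one batch call on the microtask queue.
class TinyLoader<K, V> {
  private queue: { key: K; resolve: (value: V) => void }[] = []
  private scheduled = false

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve) => {
      this.queue.push({ key, resolve })
      if (!this.scheduled) {
        this.scheduled = true
        // Flush after the current synchronous code has queued its loads:
        queueMicrotask(() => this.dispatch())
      }
    })
  }

  private async dispatch(): Promise<void> {
    const batch = this.queue
    this.queue = []
    this.scheduled = false
    const values = await this.batchFn(batch.map((item) => item.key))
    batch.forEach((item, i) => item.resolve(values[i]))
  }
}
```

Callers await individual promises as if each were its own fetch, yet the batch function runs once per flush — the same property that lets DataLoader slot into existing resolvers unchanged.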

Production Architecture and Per-Request Instantiation

The per-request DataLoader pattern is the single most important architectural decision when using DataLoader in production. A globally shared DataLoader instance accumulates cache entries indefinitely and crosses request boundaries, meaning user A's query can return cached data from user B's earlier request — a serious data isolation bug. Create DataLoaders inside the request context factory and attach them to the context object that is passed to every resolver. GraphQL servers like Apollo Server v4 and graphql-yoga both provide a context function that runs per request, making this straightforward. Pair per-request instantiation with cache: false if you want batching without any within-request caching — every load then reaches the batch function, trading deduplication for guaranteed-fresh reads.
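To make the isolation failure concrete, here is a minimal sketch with a plain Map standing in for DataLoader's internal cache (names and data are hypothetical):

```typescript
interface User {
  id: number
  secret: string
}

// A resolver memoized over an injected cache. With a long-lived cache the
// first requester's result is served to everyone who asks for the same key.
function makeUserResolver(
  cache: Map<number, User>,
  fetchUser: (id: number) => User
) {
  return (id: number): User => {
    const hit = cache.get(id)
    if (hit) return hit
    const user = fetchUser(id)
    cache.set(id, user)
    return user
  }
}
```

The fix is exactly the per-request pattern: construct the cache (the loader) inside the context factory so its lifetime equals the request's lifetime.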

Ordering Guarantee and Database Query Patterns

The most common DataLoader bug is a batch function that returns results in a different order than the input keys. If your Prisma or SQL query returns records sorted by a database-level ordering rather than by the order of the input ID array, DataLoader will assign results to the wrong keys. The correct pattern is always to build a Map from the query results keyed by ID, then return ids.map(id => map.get(id) ?? null) — this ensures the output order matches the input order regardless of database query ordering. For one-to-many loaders (comments for a post), build a Map from the foreign key to an array, then map over the parent IDs returning an empty array for parents with no children.
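The re-ordering step can be demonstrated with a fake in-memory "table" whose rows are not stored in key order (all names here are illustrative, standing in for a real DB query):

```typescript
interface Row {
  id: number
  name: string
}

// Rows deliberately stored out of id order, as a real table might be:
const rows: Row[] = [
  { id: 3, name: "carol" },
  { id: 1, name: "alice" },
  { id: 2, name: "bob" },
]

// Stands in for a DB query — returns matches in storage order, not input order:
async function findMany(ids: number[]): Promise<Row[]> {
  return rows.filter((r) => ids.includes(r.id))
}

// The safe pattern: re-key results by id, then map over the *input* keys so
// output position i always corresponds to keys[i], with null for misses:
async function orderedBatchFn(ids: number[]): Promise<(Row | null)[]> {
  const results = await findMany(ids)
  const byId = new Map(results.map((r) => [r.id, r]))
  return ids.map((id) => byId.get(id) ?? null)
}
```

Returning `findMany`'s array directly would hand "carol" to whichever key happened to be first, which is exactly the silent mis-assignment bug described above.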

Combining DataLoader with Prisma's findMany

Prisma's findMany with where: { id: { in: [...ids] } } is the canonical batch function implementation, but there is an important edge case: Prisma does not guarantee that findMany returns results for IDs that don't exist in the database. If your batch contains 100 IDs and only 90 records exist, the returned array has 90 elements — the mapping step must handle missing records explicitly with map.get(id) ?? null or a thrown error, not by indexing into the raw array. For soft-deleted records, your batch function must decide whether to return the soft-deleted record (exposing deleted data to resolvers that may not check) or return null (consistent with the record not existing), and document this behavior in the loader's definition.
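DataLoader also accepts an Error instance at a key's position in the batch result: that key's load() rejects while the other keys resolve. A sketch of a batch function that chooses the strict (Error) option, with a fake findUsers standing in for Prisma:

```typescript
interface User {
  id: number
  name: string
}

// Stand-in for Prisma's findMany — only two of the requested ids exist:
async function findUsers(ids: number[]): Promise<User[]> {
  const table: User[] = [
    { id: 1, name: "alice" },
    { id: 2, name: "bob" },
  ]
  return table.filter((u) => ids.includes(u.id))
}

// Strict variant: a missing record becomes an Error at that position, so the
// result array always has exactly one entry per input key.
async function batchUsersStrict(ids: number[]): Promise<(User | Error)[]> {
  const found = await findUsers(ids)
  const byId = new Map(found.map((u) => [u.id, u]))
  return ids.map((id) => byId.get(id) ?? new Error(`User ${id} not found`))
}
```

Swapping `new Error(...)` for `null` gives the lenient variant; either way the mapping step, not the raw query result, guarantees the 1:1 key-to-result contract.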

Performance Profiling and Batch Size Tuning

DataLoader's default tick-based batching collects all loads scheduled within a single event loop tick. In practice this means all loads triggered during a single GraphQL resolver execution phase get batched together. Monitor batch sizes in production using a logging wrapper around your batch function — if batches consistently contain only one or two items, your resolvers may be executing sequentially rather than in parallel, defeating the batching. Use Promise.all to parallelize multiple loader calls within a single resolver when you need data from multiple sources simultaneously. The maxBatchSize option is critical for preventing database IN clause limits — PostgreSQL's practical limit is around 65,535 parameters; set maxBatchSize: 1000 to stay well within this bound.
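A logging wrapper of the kind described is a few lines — this is a hypothetical sketch (the `sink` parameter is an assumption; in production it would be console.log or a metrics client):

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>

// Records every batch's size through an injected sink without changing the
// batch result — pass the wrapped function to `new DataLoader(...)`.
function withBatchSizeLog<K, V>(
  name: string,
  batchFn: BatchFn<K, V>,
  sink: (entry: { loader: string; size: number }) => void
): BatchFn<K, V> {
  return async (keys) => {
    sink({ loader: name, size: keys.length })
    return batchFn(keys)
  }
}
```

If the sink consistently reports sizes of 1 or 2, look for sequential awaits in your resolvers that let the tick advance between loads.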

Ecosystem Integration and Alternatives

For REST APIs, DataLoader's tick-based batching works just as well as for GraphQL — HTTP endpoints that accept batch requests (array of IDs in a POST body or comma-separated ID query params) are natural targets for DataLoader-style batching. The DataLoader package is maintained by the GraphQL Foundation following Meta's initial contribution, ensuring long-term maintenance. For non-GraphQL use cases where you need time-window batching rather than tick-based batching, p-batch's maxWait option accumulates items for a configurable millisecond window before calling the batch function — useful when the producer and consumer of batched items are not in the same synchronous execution context.
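A REST-flavored batch function looks like the Prisma one with a URL in place of a query. The endpoint shape below (`/api/users?ids=a,b,c` returning `{ users: [...] }`) is an assumption for illustration, and the HTTP call is injected so the mapping logic is testable without a network:

```typescript
interface ApiUser {
  id: string
  name: string
}

type FetchJson = (url: string) => Promise<{ users: ApiUser[] }>

// Builds a DataLoader-compatible batch function over a batch-capable REST
// endpoint: one request for N ids, results re-keyed to match input order.
function makeRestUserBatchFn(fetchJson: FetchJson) {
  return async (ids: readonly string[]): Promise<(ApiUser | null)[]> => {
    const data = await fetchJson(`/api/users?ids=${ids.join(",")}`)
    const byId = new Map(data.users.map((u) => [u.id, u]))
    return ids.map((id) => byId.get(id) ?? null)
  }
}
```

In real code `fetchJson` would be `(url) => fetch(url).then((r) => r.json())`; injecting it also makes it easy to add auth headers or retries in one place.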

Caching Strategy and Cache Invalidation

DataLoader's built-in request-scoped cache prevents duplicate loads within a single request, but this creates a subtle edge case: if a mutation updates a record and a subsequent load in the same request hits the loader, the cached pre-mutation value is returned instead of the updated one. The solution is to call loader.clear(id) after any mutation that modifies data the loader caches, or loader.clearAll() to invalidate the entire request cache. This is particularly important in GraphQL mutations that return updated objects — clear the relevant DataLoader keys before resolving the mutation's return value. For p-batch, the batching window is configurable via the maxWait option — a lower maxWait reduces latency at the cost of smaller batch sizes, while a higher value increases batching efficiency at the cost of added latency. In high-throughput APIs, log the actual batch sizes your batch functions receive in production to verify that the batcher is really batching rather than making one-at-a-time calls, which would indicate that the event loop tick is advancing between loads.
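The stale-read-after-mutation hazard and the clear() fix can be shown with a stripped-down stand-in for DataLoader's request cache (illustrative only — the real library's API is load/clear on the loader itself):

```typescript
// KeyCache: memoizes the first fetch per key, like DataLoader's per-request
// cache. Until clear(key) is called, later reads return the memoized value
// even if the underlying data has changed.
class KeyCache<K, V> {
  private map = new Map<K, V>()

  getOr(key: K, fetch: (k: K) => V): V {
    if (!this.map.has(key)) this.map.set(key, fetch(key))
    return this.map.get(key)!
  }

  clear(key: K): void {
    this.map.delete(key)
  }
}
```

The DataLoader equivalent of the sequence below is: `await loader.load(id)`, run the mutation, `loader.clear(id)`, then `await loader.load(id)` again for the fresh value.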

When to Use Each

Choose DataLoader if:

  • Building GraphQL resolvers and want to solve the N+1 problem
  • Need automatic per-tick batching with no extra setup
  • Want built-in deduplication so duplicate keys use cached results
  • Working with any GraphQL server (Apollo, graphql-yoga, mercurius, etc.)

Choose p-batch if:

  • Batching outside of GraphQL (REST API clients, event processing)
  • Need time-window batching rather than event-loop-tick batching
  • Batching writes or side effects (not just reads)

DataLoader best practices:

// ✅ DO: Create per request
const context = { loaders: createLoaders(db) }

// ❌ DON'T: Create globally
const globalLoader = new DataLoader(batchFn)  // Cache persists across users!

// ✅ DO: Return results in same order as keys
const batchFn = async (ids) => {
  const results = await db.findMany({ id: { in: ids } })
  const map = new Map(results.map(r => [r.id, r]))
  return ids.map(id => map.get(id) ?? null)  // Ordered!
}

// ❌ DON'T: Return unsorted results
const badBatchFn = async (ids) => {
  return db.findMany({ id: { in: ids } })  // DB may return in different order!
}

// ✅ DO: Use prime() to seed cache from list queries
const posts = await db.post.findMany({ take: 100 })
for (const post of posts) {
  ctx.loaders.post.prime(post.id, post)  // Later loader.load(post.id) calls hit the cache — no re-fetch
}

Methodology

Download data from npm registry (weekly average, February 2026). Feature comparison based on dataloader v2.x and p-batch v3.x.

See also: graphql-yoga vs apollo-server vs mercurius and pothos vs TypeGraphQL vs nexus, better-sqlite3 vs libsql vs sql.js.
