
Guide

DuckDB vs ClickHouse vs QuestDB 2026

Compare DuckDB, ClickHouse, and QuestDB for analytical workloads: OLAP queries, time-series, Node.js integration, and which analytical database fits your project.

PkgPulse Team

TL;DR: DuckDB is the embedded analytical database — runs in-process, reads Parquet/CSV/JSON directly, SQL on local files with zero infrastructure. ClickHouse is the column-oriented OLAP powerhouse — handles billions of rows, real-time ingestion, materialized views, and sub-second aggregation queries at scale. QuestDB is the time-series specialist — SQL with time-series extensions, high-throughput ingestion via InfluxDB Line Protocol, and optimized for ordered data. In 2026: DuckDB for local analytics and embedded OLAP, ClickHouse for large-scale analytical workloads, QuestDB for time-series and IoT data.

Key Takeaways

  • DuckDB: Embedded (in-process), zero infrastructure. Reads Parquet, CSV, JSON, Arrow natively. SQL with window functions, CTEs, lateral joins. Best for local analytics, data engineering pipelines, and embedding analytics in applications
  • ClickHouse: Distributed column store, petabyte-scale. MergeTree engine family, materialized views, real-time aggregation. Best for dashboards, log analytics, and any workload with billions of rows
  • QuestDB: Time-series optimized, column-based. SQL with SAMPLE BY, LATEST ON, ASOF JOIN. InfluxDB Line Protocol ingestion. Best for metrics, financial data, IoT, and any time-ordered dataset

DuckDB — Embedded Analytical Database

DuckDB runs in-process with zero infrastructure — think SQLite for analytics. It reads Parquet, CSV, and JSON files directly and executes complex OLAP queries locally.

Basic Setup and Queries

import duckdb from "duckdb";

// In-memory database
const db = new duckdb.Database(":memory:");
const conn = db.connect();

// Query Parquet files directly — no import step
conn.all(
  `SELECT
    product_category,
    COUNT(*) as order_count,
    SUM(amount) as total_revenue,
    AVG(amount) as avg_order_value
  FROM read_parquet('s3://analytics-bucket/orders/*.parquet')
  WHERE order_date >= '2026-01-01'
  GROUP BY product_category
  ORDER BY total_revenue DESC
  LIMIT 10`,
  (err, rows) => {
    if (err) throw err;
    console.table(rows);
  }
);

// Query CSV files with automatic type detection
conn.all(
  `SELECT * FROM read_csv_auto('data/users.csv') LIMIT 5`,
  (err, rows) => console.table(rows)
);

// Query JSON files
conn.all(
  `SELECT * FROM read_json_auto('data/events.json') LIMIT 5`,
  (err, rows) => console.table(rows)
);

Advanced Analytics

// Window functions for running totals and rankings
conn.all(`
  WITH daily_sales AS (
    SELECT
      date_trunc('day', order_date) as day,
      product_category,
      SUM(amount) as daily_revenue
    FROM read_parquet('orders.parquet')
    GROUP BY 1, 2
  )
  SELECT
    day,
    product_category,
    daily_revenue,
    SUM(daily_revenue) OVER (
      PARTITION BY product_category
      ORDER BY day
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cumulative_revenue,
    RANK() OVER (
      PARTITION BY day
      ORDER BY daily_revenue DESC
    ) as daily_rank
  FROM daily_sales
  ORDER BY day DESC, daily_rank
`, (err, rows) => console.table(rows));

// Pivot tables
conn.all(`
  PIVOT (
    SELECT
      date_part('month', order_date) as month,
      product_category,
      SUM(amount) as revenue
    FROM read_parquet('orders.parquet')
    WHERE order_date >= '2026-01-01'
    GROUP BY 1, 2
  )
  ON product_category
  USING SUM(revenue)
  ORDER BY month
`, (err, rows) => console.table(rows));

Persistent Database and Exports

// Persistent database on disk
const db = new duckdb.Database("analytics.duckdb");
const conn = db.connect();

// Create tables from external sources
conn.run(`
  CREATE TABLE orders AS
  SELECT * FROM read_parquet('s3://bucket/orders/*.parquet')
`);

// Create indexes for frequently filtered columns
conn.run(`CREATE INDEX idx_orders_date ON orders(order_date)`);

// Export query results to Parquet
conn.run(`
  COPY (
    SELECT product_category, COUNT(*) as cnt, SUM(amount) as total
    FROM orders
    GROUP BY product_category
  ) TO 'summary.parquet' (FORMAT PARQUET)
`);

// Export to CSV
conn.run(`
  COPY orders TO 'orders_backup.csv' (FORMAT CSV, HEADER)
`);

// Attach multiple databases
conn.run(`ATTACH 'users.duckdb' AS users_db`);
conn.all(`
  SELECT o.*, u.name, u.email
  FROM orders o
  JOIN users_db.users u ON o.user_id = u.id
`, (err, rows) => console.table(rows));

S3 and Remote Data

// Configure S3 access
conn.run(`SET s3_region = 'us-east-1'`);
conn.run(`SET s3_access_key_id = '${process.env.AWS_ACCESS_KEY_ID}'`);
conn.run(`SET s3_secret_access_key = '${process.env.AWS_SECRET_ACCESS_KEY}'`);

// Query S3 Parquet files with glob patterns
conn.all(`
  SELECT
    date_trunc('hour', timestamp) as hour,
    COUNT(*) as request_count,
    AVG(response_time_ms) as avg_latency,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY response_time_ms) as p99
  FROM read_parquet('s3://logs-bucket/2026/03/**/*.parquet')
  WHERE status_code >= 500
  GROUP BY 1
  ORDER BY 1 DESC
`, (err, rows) => console.table(rows));

// Query remote HTTP files
conn.all(`
  SELECT * FROM read_csv_auto(
    'https://data.example.com/public/dataset.csv'
  ) LIMIT 100
`, (err, rows) => console.table(rows));

ClickHouse — Column-Oriented OLAP at Scale

ClickHouse is the high-performance column-oriented database for real-time analytical queries across billions of rows.

Node.js Client Setup

import { createClient } from "@clickhouse/client";

const client = createClient({
  url: "http://localhost:8123",
  username: "default",
  password: process.env.CLICKHOUSE_PASSWORD,
  database: "analytics",
});

// Create a table with MergeTree engine
await client.command({
  query: `
    CREATE TABLE IF NOT EXISTS events (
      event_id UUID DEFAULT generateUUIDv4(),
      timestamp DateTime64(3),
      user_id UInt64,
      event_type LowCardinality(String),
      properties String, -- JSON stored as String
      amount Decimal64(2),
      country LowCardinality(String)
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(timestamp)
    ORDER BY (event_type, timestamp, user_id)
    TTL timestamp + INTERVAL 1 YEAR
  `,
});

High-Throughput Ingestion

// Batch insert — millions of rows per second
await client.insert({
  table: "events",
  values: events.map((e) => ({
    timestamp: e.timestamp,
    user_id: e.userId,
    event_type: e.type,
    properties: JSON.stringify(e.properties),
    amount: e.amount,
    country: e.country,
  })),
  format: "JSONEachRow",
});

// Stream insert for large datasets: `values` accepts a Readable stream
import { Readable } from "stream";

await client.insert({
  table: "events",
  values: Readable.from(eventBatches, { objectMode: true }),
  format: "JSONEachRow",
});

// Async insert for fire-and-forget writes (higher throughput)
await client.insert({
  table: "events",
  values: events,
  format: "JSONEachRow",
  clickhouse_settings: {
    async_insert: 1,
    wait_for_async_insert: 0, // Don't wait for the server-side flush
  },
});

Analytical Queries

// Real-time dashboard query — sub-second on billions of rows
const result = await client.query({
  query: `
    SELECT
      toStartOfHour(timestamp) AS hour,
      event_type,
      count() AS event_count,
      uniq(user_id) AS unique_users,
      sum(amount) AS total_amount,
      quantile(0.99)(amount) AS p99_amount
    FROM events
    WHERE timestamp >= now() - INTERVAL 24 HOUR
    GROUP BY hour, event_type
    ORDER BY hour DESC, event_count DESC
  `,
  format: "JSONEachRow",
});

const rows = await result.json();
console.table(rows);

// Funnel analysis
const funnel = await client.query({
  query: `
    SELECT
      level,
      count() AS users,
      round(users / first_value(users) OVER (ORDER BY level) * 100, 2) AS conversion_pct
    FROM (
      SELECT
        user_id,
        windowFunnel(86400)(
          timestamp,
          event_type = 'page_view',
          event_type = 'add_to_cart',
          event_type = 'checkout',
          event_type = 'purchase'
        ) AS level
      FROM events
      WHERE timestamp >= now() - INTERVAL 7 DAY
      GROUP BY user_id
    )
    GROUP BY level
    ORDER BY level
  `,
  format: "JSONEachRow",
});

Materialized Views

// Real-time aggregation with materialized views
await client.command({
  query: `
    -- Target table for aggregated data
    CREATE TABLE IF NOT EXISTS hourly_metrics (
      hour DateTime,
      event_type LowCardinality(String),
      country LowCardinality(String),
      event_count AggregateFunction(count),
      unique_users AggregateFunction(uniq, UInt64),
      total_amount AggregateFunction(sum, Decimal64(2))
    )
    ENGINE = AggregatingMergeTree()
    PARTITION BY toYYYYMM(hour)
    ORDER BY (hour, event_type, country)
  `,
});

await client.command({
  query: `
    -- Materialized view auto-aggregates on insert
    CREATE MATERIALIZED VIEW IF NOT EXISTS mv_hourly_metrics
    TO hourly_metrics
    AS SELECT
      toStartOfHour(timestamp) AS hour,
      event_type,
      country,
      countState() AS event_count,
      uniqState(user_id) AS unique_users,
      sumState(amount) AS total_amount
    FROM events
    GROUP BY hour, event_type, country
  `,
});

// Query the materialized view — instant results
const metrics = await client.query({
  query: `
    SELECT
      hour,
      event_type,
      countMerge(event_count) AS events,
      uniqMerge(unique_users) AS users,
      sumMerge(total_amount) AS revenue
    FROM hourly_metrics
    WHERE hour >= now() - INTERVAL 7 DAY
    GROUP BY hour, event_type
    ORDER BY hour DESC
  `,
  format: "JSONEachRow",
});

QuestDB — Time-Series Database

QuestDB is optimized for time-series data with SQL extensions for time-based queries, high-throughput ingestion, and columnar storage.

Ingestion via InfluxDB Line Protocol

import net from "net";

// InfluxDB Line Protocol over TCP — fastest ingestion path
const socket = net.connect({ host: "localhost", port: 9009 });

// Send metrics in ILP format
function sendMetric(
  measurement: string,
  tags: Record<string, string>,
  fields: Record<string, number>,
  timestamp: bigint // nanoseconds
) {
  const tagStr = Object.entries(tags)
    .map(([k, v]) => `${k}=${v}`)
    .join(",");
  const fieldStr = Object.entries(fields)
    .map(([k, v]) => `${k}=${v}`)
    .join(",");

  socket.write(`${measurement},${tagStr} ${fieldStr} ${timestamp}\n`);
}

// Send API metrics
sendMetric(
  "api_requests",
  { endpoint: "/api/users", method: "GET", status: "200" },
  { duration_ms: 45.2, response_bytes: 1024 },
  BigInt(Date.now()) * 1000000n // Convert ms to ns
);

// Batch sending for high throughput
const batch = metrics
  .map(
    (m) =>
      `api_requests,endpoint=${m.endpoint},method=${m.method} ` +
      `duration_ms=${m.duration},status_code=${m.status} ` +
      `${BigInt(m.timestamp) * 1000000n}`
  )
  .join("\n");

socket.write(batch + "\n");
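The sendMetric helper above writes tag and field values verbatim, but the InfluxDB Line Protocol requires escaping commas, spaces, and equals signs in measurement names, tag keys, and tag values. A minimal escaping sketch (these helpers are illustrative, not part of any official client):

```typescript
// Escaping rules from the InfluxDB Line Protocol: measurement names
// escape commas and spaces; tag keys/values and field keys also escape
// the equals sign. Illustrative helpers, not an official client.
function escapeMeasurement(name: string): string {
  return name.replace(/,/g, "\\,").replace(/ /g, "\\ ");
}

function escapeTag(value: string): string {
  return value.replace(/,/g, "\\,").replace(/ /g, "\\ ").replace(/=/g, "\\=");
}

// Build one fully escaped ILP line
function ilpLine(
  measurement: string,
  tags: Record<string, string>,
  fields: Record<string, number>,
  timestampNs: bigint
): string {
  const tagStr = Object.entries(tags)
    .map(([k, v]) => `${escapeTag(k)}=${escapeTag(v)}`)
    .join(",");
  const fieldStr = Object.entries(fields)
    .map(([k, v]) => `${escapeTag(k)}=${v}`)
    .join(",");
  return `${escapeMeasurement(measurement)},${tagStr} ${fieldStr} ${timestampNs}`;
}
```

Without escaping, a tag value like /api/users,v2 would be parsed as two tags and silently corrupt the row.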

REST API Queries

// QuestDB REST API for SQL queries
async function queryQuestDB(sql: string) {
  const response = await fetch(
    `http://localhost:9000/exec?query=${encodeURIComponent(sql)}`,
    { method: "GET" }
  );
  return response.json();
}

// Basic aggregation
const hourly = await queryQuestDB(`
  SELECT
    timestamp,
    endpoint,
    count() as request_count,
    avg(duration_ms) as avg_latency,
    max(duration_ms) as max_latency
  FROM api_requests
  WHERE timestamp > dateadd('d', -1, now())
  SAMPLE BY 1h
  ALIGN TO CALENDAR
`);

console.table(hourly.dataset);

Time-Series SQL Extensions

// SAMPLE BY — time-based downsampling (QuestDB's killer feature)
const downsampled = await queryQuestDB(`
  SELECT
    timestamp,
    endpoint,
    count() as requests,
    avg(duration_ms) as avg_ms,
    approx_percentile(duration_ms, 0.99) as p99_ms
  FROM api_requests
  WHERE timestamp > dateadd('h', -24, now())
  SAMPLE BY 15m
  FILL(NULL)
  ALIGN TO CALENDAR
`);

// LATEST ON — get the most recent value per series
const latest = await queryQuestDB(`
  SELECT * FROM sensor_readings
  WHERE timestamp > dateadd('h', -1, now())
  LATEST ON timestamp PARTITION BY sensor_id
`);

// ASOF JOIN — join time-series by closest timestamp
const correlated = await queryQuestDB(`
  SELECT
    t.timestamp,
    t.symbol,
    t.price as trade_price,
    q.bid,
    q.ask,
    t.price - q.bid as spread
  FROM trades t
  ASOF JOIN quotes q ON (t.symbol = q.symbol)
  WHERE t.timestamp > dateadd('h', -1, now())
`);

// LT JOIN — join where right timestamp is strictly less than left
const withPrevious = await queryQuestDB(`
  SELECT
    a.timestamp,
    a.value as current_value,
    b.value as previous_value,
    a.value - b.value as delta
  FROM metrics a
  LT JOIN metrics b ON (a.sensor_id = b.sensor_id)
  WHERE a.timestamp > dateadd('d', -7, now())
`);

Table Creation and Partitioning

// Create a partitioned table optimized for time-series
await queryQuestDB(`
  CREATE TABLE IF NOT EXISTS api_requests (
    timestamp TIMESTAMP,
    endpoint SYMBOL CAPACITY 256 INDEX,
    method SYMBOL CAPACITY 8,
    status_code SHORT,
    duration_ms DOUBLE,
    response_bytes LONG,
    user_id LONG,
    region SYMBOL CAPACITY 32
  ) TIMESTAMP(timestamp) PARTITION BY DAY WAL
  DEDUP UPSERT KEYS(timestamp, endpoint, user_id)
`);

// SYMBOL type = dictionary-encoded strings (like ClickHouse LowCardinality)
// WAL = Write-Ahead Log for concurrent readers/writers
// PARTITION BY DAY = one partition per day for efficient time range queries
// DEDUP UPSERT = handle duplicate events

// Detach old partitions for archival
await queryQuestDB(`
  ALTER TABLE api_requests DETACH PARTITION
  WHERE timestamp < dateadd('M', -6, now())
`);

Grafana Integration

// QuestDB has native Grafana support via PostgreSQL wire protocol
// grafana.ini datasource config:
// type: postgres
// host: questdb:8812
// database: qdb
// user: admin

// Dashboard query with Grafana variables
const grafanaQuery = `
  SELECT
    timestamp as time,
    endpoint,
    avg(duration_ms) as avg_latency,
    count() as requests
  FROM api_requests
  WHERE $__timeFilter(timestamp)
    AND endpoint = '$endpoint'
  SAMPLE BY $__interval
  FILL(NULL)
`;

Feature Comparison

| Feature | DuckDB | ClickHouse | QuestDB |
| --- | --- | --- | --- |
| Architecture | Embedded (in-process) | Distributed server | Server (single-node + cloud) |
| Storage | Columnar | Columnar (MergeTree) | Columnar (time-partitioned) |
| Deployment | Library / no server | Self-hosted or cloud | Self-hosted or cloud |
| Scale | Single machine (TBs) | Petabytes (distributed) | Single node (TBs) |
| Ingestion | File reads (Parquet/CSV) | Batch + streaming | ILP + REST + PostgreSQL |
| Query Language | SQL (PostgreSQL-compatible) | SQL (ClickHouse dialect) | SQL (PostgreSQL + extensions) |
| Time-Series | Standard SQL | DateTime functions | SAMPLE BY, LATEST ON, ASOF JOIN |
| Materialized Views | ❌ | ✅ (real-time) | ❌ |
| Window Functions | ✅ (full) | ✅ (full) | ✅ (basic) |
| JSON Support | ✅ (native) | ✅ (JSON functions) | Limited |
| S3/Remote Files | ✅ (native) | ✅ (s3 table function) | ❌ |
| Node.js Client | duckdb npm | @clickhouse/client | REST API / pg driver |
| Concurrency | Single-writer | High concurrency | WAL (concurrent R/W) |
| Replication | ❌ | ✅ (built-in) | Cloud only |
| License | MIT | Apache 2.0 | Apache 2.0 |
| Ideal Workload | Ad-hoc analytics, ETL | Dashboards, log analytics | Metrics, IoT, financial |

When to Use Each

Choose DuckDB if:

  • You need analytics with zero infrastructure (embedded in your app)
  • You're querying Parquet, CSV, or JSON files directly from S3 or local disk
  • Your data fits on a single machine (up to low TBs)
  • You're building data pipelines or ETL workflows
  • You want SQLite-like simplicity for analytical queries

Choose ClickHouse if:

  • You have billions of rows and need sub-second query performance
  • Real-time dashboards and monitoring are your primary use case
  • You need materialized views for automatic pre-aggregation
  • You require distributed architecture across multiple nodes
  • Funnel analysis, cohort queries, and complex aggregations are important

Choose QuestDB if:

  • Your data is time-series (metrics, IoT sensors, financial ticks)
  • You need SAMPLE BY for natural time-based downsampling
  • ASOF JOIN and LATEST ON are critical for time-series correlation
  • High ingestion throughput via InfluxDB Line Protocol matters
  • You want SQL (not InfluxQL or PromQL) for time-series queries

Production Deployment and Infrastructure Considerations

Running ClickHouse in production requires meaningful infrastructure investment. The typical starting point is a three-node cluster with 32GB RAM each for workloads in the tens of billions of rows — smaller setups work but lose ClickHouse's strongest characteristic, which is distributed query execution across replicas. ClickHouse Cloud (the managed offering) abstracts away node management but introduces egress costs that can surprise teams doing data export or cross-region replication at scale. QuestDB's production deployment is simpler because it runs as a single binary with no external dependencies, but this also means it lacks built-in replication — the cloud product (QuestDB Cloud) handles replication, while self-hosted deployments rely on filesystem snapshots for backup. DuckDB's production story is intentionally minimal: it runs in-process with the application, which means database availability is tied to application availability. For persistent data, DuckDB writes to a single .duckdb file with a WAL for crash safety, but it has no replication or high-availability primitive by design. Teams that outgrow single-machine DuckDB typically move their data to cloud object storage (S3 or GCS as Parquet files) and continue querying it remotely.
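The migrate-to-Parquet exit path mentioned above is mostly a matter of emitting COPY ... TO statements for each table. A hypothetical helper that generates them (the table list and S3 prefix are assumptions for illustration):

```typescript
// Hypothetical helper: emit DuckDB COPY statements that snapshot each
// table to Parquet files under an object-storage prefix.
function snapshotStatements(tables: string[], s3Prefix: string): string[] {
  return tables.map(
    (t) => `COPY ${t} TO '${s3Prefix}/${t}.parquet' (FORMAT PARQUET)`
  );
}
```

Each generated statement can be passed to conn.run() on a DuckDB connection whose S3 credentials have already been configured.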

Security and Access Control

The three databases take fundamentally different approaches to security. ClickHouse has the most mature RBAC system — users, roles, row-level security, and column-level access control are all first-class features, making it suitable for multi-tenant analytics where different business units should see different data slices. IP-based allow lists and SSL mutual TLS are available for network security. QuestDB's security model is simpler: authentication is handled at the HTTP API and PostgreSQL wire protocol level, with user management added relatively recently. For sensitive environments, teams typically put QuestDB behind a reverse proxy that handles authentication. DuckDB, running embedded, inherits the security of the process it runs in — there is no network listener and no authentication layer by default, which is secure for single-user local analytics but requires careful thought when embedding in a multi-tenant application. For shared DuckDB access, the MotherDuck cloud extension provides access control at the cloud layer.
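The reverse-proxy pattern for QuestDB can be sketched with Node's built-in http module. This is a minimal illustration assuming a bearer token checked in front of QuestDB's HTTP port; the token and port are assumptions, not QuestDB features, and this is not production-grade middleware:

```typescript
import http from "node:http";

// Hypothetical bearer-token gate in front of QuestDB's HTTP API.
// QuestDB itself listens on 9000; clients talk to this proxy instead.
const QUESTDB_URL = "http://localhost:9000";
const TOKEN = process.env.PROXY_TOKEN ?? "dev-token"; // assumed env var

function isAuthorized(header: string | undefined): boolean {
  return header === `Bearer ${TOKEN}`;
}

const proxy = http.createServer(async (req, res) => {
  if (!isAuthorized(req.headers.authorization)) {
    res.writeHead(401).end("unauthorized");
    return;
  }
  // Relay the request to QuestDB and return its JSON response
  const upstream = await fetch(`${QUESTDB_URL}${req.url}`);
  res.writeHead(upstream.status, { "content-type": "application/json" });
  res.end(await upstream.text());
});

// proxy.listen(9001); // start in front of QuestDB on port 9001
```

In practice teams reach for nginx or a cloud load balancer for this job; the sketch just shows where the authentication check sits.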

TypeScript and Node.js Integration Patterns

Node.js integration quality varies significantly. The official @clickhouse/client is actively maintained by the ClickHouse team, ships with TypeScript types, and supports both streaming and batch query patterns — it is genuinely production-quality. The DuckDB Node.js client is functional but still uses a callback-based API at its core; the duckdb-async wrapper on npm provides promise wrappers, and a newer official WASM build (@duckdb/duckdb-wasm) enables DuckDB in browser environments. QuestDB lacks a dedicated Node.js client and relies on its PostgreSQL wire protocol (using the pg package) or HTTP REST API. The REST API is simple and sufficient for most use cases, but it lacks prepared statement support and connection pooling in the traditional sense. For type-safe SQL queries in Node.js across all three databases, tools like Kysely and Drizzle are beginning to add support, though coverage is strongest for the PostgreSQL wire protocol databases.
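The callback-to-promise gap can also be closed without an extra dependency. A minimal wrapper sketch, assuming the conn.all(sql, callback) shape used throughout this guide:

```typescript
// Minimal promise wrapper over a callback-style connection.
// Assumes the (err, rows) callback convention of the duckdb npm package.
type Callback<T> = (err: Error | null, rows: T[]) => void;

interface CallbackConn {
  all<T>(sql: string, cb: Callback<T>): void;
}

function queryAsync<T>(conn: CallbackConn, sql: string): Promise<T[]> {
  return new Promise((resolve, reject) => {
    conn.all<T>(sql, (err, rows) => (err ? reject(err) : resolve(rows)));
  });
}
```

Usage: `const rows = await queryAsync(conn, "SELECT 42 AS answer");` — the duckdb-async package does essentially this for the whole API surface.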

Ecosystem Context and Competitive Landscape

The analytical database space has moved remarkably fast since 2023. DuckDB went from a research project to the backbone of the "local-first analytics" movement, with embedded use in data science notebooks, Observable, and even browser applications via its WASM build. ClickHouse has benefited from the observability space's explosive growth — companies building internal developer portals, monitoring systems, and log analytics have found ClickHouse's materialized views and query performance unmatched for their specific needs. QuestDB competes in the time-series segment against InfluxDB, TimescaleDB, and newer entrants like GreptimeDB and TDengine. Its SQL-native approach differentiates it from InfluxQL and Flux-based systems, and the growing adoption of InfluxDB Line Protocol as a universal time-series ingestion standard benefits QuestDB since it speaks ILP natively. The broader trend is that analytical workloads that previously went to a dedicated data warehouse (Snowflake, BigQuery) are increasingly handled by embedded or single-binary systems for latency and cost reasons.

Choosing the Right Tool for Local vs. Distributed Workloads

The sharpest distinction between these three databases is the local-vs-distributed axis. DuckDB is explicitly designed to be embedded — it has no network listener, no authentication layer, and no replication. This is not a limitation but a deliberate design choice that makes it suitable for embedding analytics directly in application code without infrastructure overhead. ClickHouse and QuestDB both operate as server processes with network endpoints, authentication, and the operational overhead that implies. Before evaluating which server-side analytical database to deploy, teams should honestly assess whether DuckDB running against Parquet files on object storage covers their needs. Many analytics workloads that teams assume require a dedicated OLAP database can be handled by DuckDB reading S3 Parquet files, especially when combined with a data transformation pipeline that pre-aggregates data into Parquet snapshots. The infrastructure cost savings and operational simplicity of this approach are substantial compared to running a ClickHouse cluster.
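The Parquet-on-object-storage pattern works best when files are laid out in date-partitioned paths so DuckDB only touches the keys it needs. A hypothetical glob builder for a year/month layout (the bucket structure here is an assumption, not a convention DuckDB imposes):

```typescript
// Build DuckDB read_parquet() glob patterns for an assumed
// s3://bucket/prefix/YYYY/MM/*.parquet layout.
function monthlyGlobs(
  bucket: string,
  prefix: string,
  start: { year: number; month: number },
  months: number
): string[] {
  const globs: string[] = [];
  let { year, month } = start;
  for (let i = 0; i < months; i++) {
    const mm = String(month).padStart(2, "0");
    globs.push(`s3://${bucket}/${prefix}/${year}/${mm}/*.parquet`);
    month++;
    if (month > 12) {
      month = 1;
      year++;
    }
  }
  return globs;
}

// DuckDB's read_parquet accepts a list of files or globs
function readParquetList(globs: string[]): string {
  return `read_parquet([${globs.map((g) => `'${g}'`).join(", ")}])`;
}
```

Passing only the months a query actually needs keeps DuckDB from listing the entire bucket before pruning.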

Methodology

Feature comparison based on DuckDB v1.x, ClickHouse 24.x, and QuestDB 8.x documentation as of March 2026. Benchmarks reference official published results. Code examples use official Node.js clients and REST APIs. Evaluated on: query performance characteristics, ingestion methods, time-series capabilities, scalability, and developer experience with Node.js.

See also: How to Set Up Drizzle ORM with Next.js (2026 Guide), How to Migrate from Mongoose to Prisma, and Mongoose vs Prisma in 2026: MongoDB vs SQL-First
