DuckDB vs ClickHouse vs QuestDB: Analytical Databases Compared (2026)

PkgPulse Team

TL;DR: DuckDB is the embedded analytical database — runs in-process, reads Parquet/CSV/JSON directly, SQL on local files with zero infrastructure. ClickHouse is the column-oriented OLAP powerhouse — handles billions of rows, real-time ingestion, materialized views, and sub-second aggregation queries at scale. QuestDB is the time-series specialist — SQL with time-series extensions, high-throughput ingestion via InfluxDB Line Protocol, and optimized for ordered data. In 2026: DuckDB for local analytics and embedded OLAP, ClickHouse for large-scale analytical workloads, QuestDB for time-series and IoT data.

Key Takeaways

  • DuckDB: Embedded (in-process), zero infrastructure. Reads Parquet, CSV, JSON, Arrow natively. SQL with window functions, CTEs, lateral joins. Best for local analytics, data engineering pipelines, and embedding analytics in applications
  • ClickHouse: Distributed column store, petabyte-scale. MergeTree engine family, materialized views, real-time aggregation. Best for dashboards, log analytics, and any workload with billions of rows
  • QuestDB: Time-series optimized, column-based. SQL with SAMPLE BY, LATEST ON, ASOF JOIN. InfluxDB Line Protocol ingestion. Best for metrics, financial data, IoT, and any time-ordered dataset

DuckDB — Embedded Analytical Database

DuckDB runs in-process with zero infrastructure — think SQLite for analytics. It reads Parquet, CSV, and JSON files directly and executes complex OLAP queries locally.

Basic Setup and Queries

import duckdb from "duckdb";

// In-memory database
const db = new duckdb.Database(":memory:");
const conn = db.connect();

// Query Parquet files directly — no import step
conn.all(
  `SELECT
    product_category,
    COUNT(*) as order_count,
    SUM(amount) as total_revenue,
    AVG(amount) as avg_order_value
  FROM read_parquet('s3://analytics-bucket/orders/*.parquet')
  WHERE order_date >= '2026-01-01'
  GROUP BY product_category
  ORDER BY total_revenue DESC
  LIMIT 10`,
  (err, rows) => {
    console.table(rows);
  }
);

// Query CSV files with automatic type detection
conn.all(
  `SELECT * FROM read_csv_auto('data/users.csv') LIMIT 5`,
  (err, rows) => console.table(rows)
);

// Query JSON files
conn.all(
  `SELECT * FROM read_json_auto('data/events.json') LIMIT 5`,
  (err, rows) => console.table(rows)
);

Advanced Analytics

// Window functions for running totals and rankings
conn.all(`
  WITH daily_sales AS (
    SELECT
      date_trunc('day', order_date) as day,
      product_category,
      SUM(amount) as daily_revenue
    FROM read_parquet('orders.parquet')
    GROUP BY 1, 2
  )
  SELECT
    day,
    product_category,
    daily_revenue,
    SUM(daily_revenue) OVER (
      PARTITION BY product_category
      ORDER BY day
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cumulative_revenue,
    RANK() OVER (
      PARTITION BY day
      ORDER BY daily_revenue DESC
    ) as daily_rank
  FROM daily_sales
  ORDER BY day DESC, daily_rank
`, (err, rows) => console.table(rows));

// Pivot tables
conn.all(`
  PIVOT (
    SELECT
      date_part('month', order_date) as month,
      product_category,
      SUM(amount) as revenue
    FROM read_parquet('orders.parquet')
    WHERE order_date >= '2026-01-01'
    GROUP BY 1, 2
  )
  ON product_category
  USING SUM(revenue)
  ORDER BY month
`, (err, rows) => console.table(rows));

Persistent Database and Exports

// Persistent database on disk
const db = new duckdb.Database("analytics.duckdb");
const conn = db.connect();

// Create tables from external sources
conn.run(`
  CREATE TABLE orders AS
  SELECT * FROM read_parquet('s3://bucket/orders/*.parquet')
`);

// Optional ART index for point lookups — DuckDB's built-in zonemaps already
// accelerate range filters, so explicit indexes are rarely needed for analytics
conn.run(`CREATE INDEX idx_orders_date ON orders(order_date)`);

// Export query results to Parquet
conn.run(`
  COPY (
    SELECT product_category, COUNT(*) as cnt, SUM(amount) as total
    FROM orders
    GROUP BY product_category
  ) TO 'summary.parquet' (FORMAT PARQUET)
`);

// Export to CSV
conn.run(`
  COPY orders TO 'orders_backup.csv' (FORMAT CSV, HEADER)
`);

// Attach multiple databases
conn.run(`ATTACH 'users.duckdb' AS users_db`);
conn.all(`
  SELECT o.*, u.name, u.email
  FROM orders o
  JOIN users_db.users u ON o.user_id = u.id
`, (err, rows) => console.table(rows));

S3 and Remote Data

// Configure S3 access (remote reads use the httpfs extension; recent
// DuckDB builds autoload it, but loading explicitly is harmless)
conn.run(`
  INSTALL httpfs; LOAD httpfs;
  SET s3_region = 'us-east-1';
  SET s3_access_key_id = '${process.env.AWS_ACCESS_KEY_ID}';
  SET s3_secret_access_key = '${process.env.AWS_SECRET_ACCESS_KEY}';
`);

// Query S3 Parquet files with glob patterns
conn.all(`
  SELECT
    date_trunc('hour', timestamp) as hour,
    COUNT(*) as request_count,
    AVG(response_time_ms) as avg_latency,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY response_time_ms) as p99
  FROM read_parquet('s3://logs-bucket/2026/03/**/*.parquet')
  WHERE status_code >= 500
  GROUP BY 1
  ORDER BY 1 DESC
`, (err, rows) => console.table(rows));

// Query remote HTTP files
conn.all(`
  SELECT * FROM read_csv_auto(
    'https://data.example.com/public/dataset.csv'
  ) LIMIT 100
`, (err, rows) => console.table(rows));

ClickHouse — Column-Oriented OLAP at Scale

ClickHouse is the high-performance column-oriented database for real-time analytical queries across billions of rows.

Node.js Client Setup

import { createClient } from "@clickhouse/client";

const client = createClient({
  url: "http://localhost:8123",
  username: "default",
  password: process.env.CLICKHOUSE_PASSWORD,
  database: "analytics",
});

// Create a table with MergeTree engine
await client.command({
  query: `
    CREATE TABLE IF NOT EXISTS events (
      event_id UUID DEFAULT generateUUIDv4(),
      timestamp DateTime64(3),
      user_id UInt64,
      event_type LowCardinality(String),
      properties String, -- JSON stored as String
      amount Decimal64(2),
      country LowCardinality(String)
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(timestamp)
    ORDER BY (event_type, timestamp, user_id)
    TTL timestamp + INTERVAL 1 YEAR
  `,
});

High-Throughput Ingestion

// Batch insert — millions of rows per second
await client.insert({
  table: "events",
  values: events.map((e) => ({
    timestamp: e.timestamp,
    user_id: e.userId,
    event_type: e.type,
    properties: JSON.stringify(e.properties),
    amount: e.amount,
    country: e.country,
  })),
  format: "JSONEachRow",
});

// Stream insert for large datasets — @clickhouse/client accepts a
// Node.js Readable (in object mode for JSON formats) as `values`
import { Readable } from "stream";

const stream = new Readable({ objectMode: true, read() {} });
const insertDone = client.insert({
  table: "events",
  values: stream,
  format: "JSONEachRow",
});

for await (const batch of eventBatches) {
  for (const row of batch) stream.push(row);
}
stream.push(null); // Signal end of stream
await insertDone;

// Async insert for fire-and-forget (higher throughput) — the server
// buffers rows and flushes in the background
await client.insert({
  table: "events",
  values: events,
  format: "JSONEachRow",
  clickhouse_settings: {
    async_insert: 1,
    wait_for_async_insert: 0, // Don't wait for the server-side flush
  },
});
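Inserts over the network can fail transiently (timeouts, restarts, brief overload). A hedged retry sketch with exponential backoff — `withRetry` is a generic helper of our own, not a `@clickhouse/client` feature, and the defaults are illustrative:

```typescript
// Retry a transient-failure-prone async operation with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt === retries) break;
      // Back off 100ms, 200ms, 400ms, ... between attempts
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastErr;
}
```

Usage: `await withRetry(() => client.insert({ table: "events", values, format: "JSONEachRow" }));`. Note that retrying inserts assumes they are safe to repeat — deduplication on the ClickHouse side (or idempotent keys) prevents double-counting.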

Analytical Queries

// Real-time dashboard query — sub-second on billions of rows
const result = await client.query({
  query: `
    SELECT
      toStartOfHour(timestamp) AS hour,
      event_type,
      count() AS event_count,
      uniq(user_id) AS unique_users,
      sum(amount) AS total_amount,
      quantile(0.99)(amount) AS p99_amount
    FROM events
    WHERE timestamp >= now() - INTERVAL 24 HOUR
    GROUP BY hour, event_type
    ORDER BY hour DESC, event_count DESC
  `,
  format: "JSONEachRow",
});

const rows = await result.json();
console.table(rows);

// Funnel analysis
const funnel = await client.query({
  query: `
    SELECT
      level,
      count() AS users,
      round(users / first_value(users) OVER (ORDER BY level) * 100, 2) AS conversion_pct
    FROM (
      SELECT
        user_id,
        windowFunnel(86400)(
          timestamp,
          event_type = 'page_view',
          event_type = 'add_to_cart',
          event_type = 'checkout',
          event_type = 'purchase'
        ) AS level
      FROM events
      WHERE timestamp >= now() - INTERVAL 7 DAY
      GROUP BY user_id
    )
    GROUP BY level
    ORDER BY level
  `,
  format: "JSONEachRow",
});

Materialized Views

// Real-time aggregation with materialized views
await client.command({
  query: `
    -- Target table for aggregated data
    CREATE TABLE IF NOT EXISTS hourly_metrics (
      hour DateTime,
      event_type LowCardinality(String),
      country LowCardinality(String),
      event_count AggregateFunction(count),
      unique_users AggregateFunction(uniq, UInt64),
      total_amount AggregateFunction(sum, Decimal64(2))
    )
    ENGINE = AggregatingMergeTree()
    PARTITION BY toYYYYMM(hour)
    ORDER BY (hour, event_type, country)
  `,
});

await client.command({
  query: `
    -- Materialized view auto-aggregates on insert
    CREATE MATERIALIZED VIEW IF NOT EXISTS mv_hourly_metrics
    TO hourly_metrics
    AS SELECT
      toStartOfHour(timestamp) AS hour,
      event_type,
      country,
      countState() AS event_count,
      uniqState(user_id) AS unique_users,
      sumState(amount) AS total_amount
    FROM events
    GROUP BY hour, event_type, country
  `,
});

// Query the materialized view — instant results
const metrics = await client.query({
  query: `
    SELECT
      hour,
      event_type,
      countMerge(event_count) AS events,
      uniqMerge(unique_users) AS users,
      sumMerge(total_amount) AS revenue
    FROM hourly_metrics
    WHERE hour >= now() - INTERVAL 7 DAY
    GROUP BY hour, event_type
    ORDER BY hour DESC
  `,
  format: "JSONEachRow",
});

QuestDB — Time-Series Database

QuestDB is purpose-built for time-series workloads: columnar storage partitioned by time, high-throughput ingestion, and SQL extensions for time-based queries.

Ingestion via InfluxDB Line Protocol

import net from "net";

// InfluxDB Line Protocol over TCP — fastest ingestion path
const socket = net.connect({ host: "localhost", port: 9009 });

// Send metrics in ILP format
function sendMetric(
  measurement: string,
  tags: Record<string, string>,
  fields: Record<string, number>,
  timestamp: bigint // nanoseconds
) {
  const tagStr = Object.entries(tags)
    .map(([k, v]) => `${k}=${v}`)
    .join(",");
  const fieldStr = Object.entries(fields)
    .map(([k, v]) => `${k}=${v}`)
    .join(",");

  socket.write(`${measurement},${tagStr} ${fieldStr} ${timestamp}\n`);
}

// Send API metrics
sendMetric(
  "api_requests",
  { endpoint: "/api/users", method: "GET", status: "200" },
  { duration_ms: 45.2, response_bytes: 1024 },
  BigInt(Date.now()) * 1000000n // Convert ms to ns
);

// Batch sending for high throughput
const batch = metrics
  .map(
    (m) =>
      `api_requests,endpoint=${m.endpoint},method=${m.method} ` +
      `duration_ms=${m.duration},status_code=${m.status} ` +
      `${BigInt(m.timestamp) * 1000000n}`
  )
  .join("\n");

socket.write(batch + "\n");
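One caveat the snippets above gloss over: Line Protocol requires backslash-escaping of commas, spaces, and equals signs in tag keys and values (and of commas and spaces in measurement names), or lines are misparsed. A minimal escaping sketch — `ilpLine` and `escapeTag` are illustrative names of our own:

```typescript
// Backslash-escape the characters ILP treats as delimiters in tags
function escapeTag(value: string): string {
  return value.replace(/([,= ])/g, "\\$1");
}

// Build one ILP line: measurement,tags fields timestamp
function ilpLine(
  measurement: string,
  tags: Record<string, string>,
  fields: Record<string, number>,
  timestampNs: bigint
): string {
  const tagStr = Object.entries(tags)
    .map(([k, v]) => `${escapeTag(k)}=${escapeTag(v)}`)
    .join(",");
  const fieldStr = Object.entries(fields)
    .map(([k, v]) => `${escapeTag(k)}=${v}`)
    .join(",");
  // Measurement names need commas and spaces escaped (but not '=')
  const name = measurement.replace(/([, ])/g, "\\$1");
  return `${name},${tagStr} ${fieldStr} ${timestampNs}`;
}
```

String field values (not shown here) additionally need double quotes with `"` and `\` escaped inside them.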

REST API Queries

// QuestDB REST API for SQL queries
async function queryQuestDB(sql: string) {
  const response = await fetch(
    `http://localhost:9000/exec?query=${encodeURIComponent(sql)}`,
    { method: "GET" }
  );
  return response.json();
}

// Basic aggregation
const hourly = await queryQuestDB(`
  SELECT
    timestamp,
    endpoint,
    count() as request_count,
    avg(duration_ms) as avg_latency,
    max(duration_ms) as max_latency
  FROM api_requests
  WHERE timestamp > dateadd('d', -1, now())
  SAMPLE BY 1h
  ALIGN TO CALENDAR
`);

console.table(hourly.dataset);
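Note that `/exec` returns rows as positional arrays in `dataset`, with column names and types in a separate `columns` descriptor. A small helper to zip them into keyed objects — `toObjects` is an illustrative name of our own:

```typescript
// Shape of the QuestDB /exec JSON response (the fields we use)
interface ExecResponse {
  columns: { name: string; type: string }[];
  dataset: unknown[][];
}

// Zip positional rows with column names into keyed objects
function toObjects(res: ExecResponse): Record<string, unknown>[] {
  return res.dataset.map((row) => {
    const obj: Record<string, unknown> = {};
    res.columns.forEach((col, i) => {
      obj[col.name] = row[i];
    });
    return obj;
  });
}
```

This makes downstream code independent of column order: `toObjects(hourly)[0].avg_latency` instead of remembering index positions.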

Time-Series SQL Extensions

// SAMPLE BY — time-based downsampling (QuestDB's killer feature)
const downsampled = await queryQuestDB(`
  SELECT
    timestamp,
    endpoint,
    count() as requests,
    avg(duration_ms) as avg_ms,
    approx_percentile(duration_ms, 0.99) as p99_ms
  FROM api_requests
  WHERE timestamp > dateadd('h', -24, now())
  SAMPLE BY 15m
  FILL(NULL)
  ALIGN TO CALENDAR
`);

// LATEST ON — get the most recent value per series
const latest = await queryQuestDB(`
  SELECT * FROM sensor_readings
  WHERE timestamp > dateadd('h', -1, now())
  LATEST ON timestamp PARTITION BY sensor_id
`);

// ASOF JOIN — join time-series by closest timestamp
const correlated = await queryQuestDB(`
  SELECT
    t.timestamp,
    t.symbol,
    t.price as trade_price,
    q.bid,
    q.ask,
    t.price - q.bid as spread
  FROM trades t
  ASOF JOIN quotes q ON (t.symbol = q.symbol)
  WHERE t.timestamp > dateadd('h', -1, now())
`);

// LT JOIN — join where right timestamp is strictly less than left
const withPrevious = await queryQuestDB(`
  SELECT
    a.timestamp,
    a.value as current_value,
    b.value as previous_value,
    a.value - b.value as delta
  FROM metrics a
  LT JOIN metrics b ON (a.sensor_id = b.sensor_id)
  WHERE a.timestamp > dateadd('d', -7, now())
`);

Table Creation and Partitioning

// Create a partitioned table optimized for time-series
await queryQuestDB(`
  CREATE TABLE IF NOT EXISTS api_requests (
    timestamp TIMESTAMP,
    endpoint SYMBOL CAPACITY 256 INDEX,
    method SYMBOL CAPACITY 8,
    status_code SHORT,
    duration_ms DOUBLE,
    response_bytes LONG,
    user_id LONG,
    region SYMBOL CAPACITY 32
  ) TIMESTAMP(timestamp) PARTITION BY DAY WAL
  DEDUP UPSERT KEYS(timestamp, endpoint, user_id)
`);

// SYMBOL type = dictionary-encoded strings (like ClickHouse LowCardinality)
// WAL = Write-Ahead Log for concurrent readers/writers
// PARTITION BY DAY = one partition per day for efficient time range queries
// DEDUP UPSERT = handle duplicate events

// Detach old partitions for archival
await queryQuestDB(`
  ALTER TABLE api_requests DETACH PARTITION
  WHERE timestamp < dateadd('M', -6, now())
`);

Grafana Integration

// QuestDB has native Grafana support via PostgreSQL wire protocol
// grafana.ini datasource config:
// type: postgres
// host: questdb:8812
// database: qdb
// user: admin

// Dashboard query with Grafana variables — SAMPLE BY handles the time
// bucketing itself, so Grafana's $__timeGroup macro isn't needed
const grafanaQuery = `
  SELECT
    timestamp as time,
    endpoint,
    avg(duration_ms) as avg_latency,
    count() as requests
  FROM api_requests
  WHERE $__timeFilter(timestamp)
    AND endpoint = '$endpoint'
  SAMPLE BY $__interval
  FILL(NULL)
`;

Feature Comparison

| Feature | DuckDB | ClickHouse | QuestDB |
| --- | --- | --- | --- |
| Architecture | Embedded (in-process) | Distributed server | Server (single-node + cloud) |
| Storage | Columnar | Columnar (MergeTree) | Columnar (time-partitioned) |
| Deployment | Library / no server | Self-hosted or cloud | Self-hosted or cloud |
| Scale | Single machine (TBs) | Petabytes (distributed) | Single node (TBs) |
| Ingestion | File reads (Parquet/CSV) | Batch + streaming | ILP + REST + PostgreSQL |
| Query Language | SQL (PostgreSQL-compatible) | SQL (ClickHouse dialect) | SQL (PostgreSQL + extensions) |
| Time-Series | Standard SQL | DateTime functions | SAMPLE BY, LATEST ON, ASOF JOIN |
| Materialized Views | ❌ | ✅ (real-time) | ❌ |
| Window Functions | ✅ (full) | ✅ (full) | ✅ (basic) |
| JSON Support | ✅ (native) | ✅ (JSON functions) | Limited |
| S3/Remote Files | ✅ (native) | ✅ (s3 table function) | ❌ |
| Node.js Client | duckdb npm | @clickhouse/client | REST API / pg driver |
| Concurrency | Single-writer | High concurrency | WAL (concurrent R/W) |
| Replication | ❌ | ✅ (built-in) | Cloud only |
| License | MIT | Apache 2.0 | Apache 2.0 |
| Ideal Workload | Ad-hoc analytics, ETL | Dashboards, log analytics | Metrics, IoT, financial |

When to Use Each

Choose DuckDB if:

  • You need analytics with zero infrastructure (embedded in your app)
  • You're querying Parquet, CSV, or JSON files directly from S3 or local disk
  • Your data fits on a single machine (up to low TBs)
  • You're building data pipelines or ETL workflows
  • You want SQLite-like simplicity for analytical queries

Choose ClickHouse if:

  • You have billions of rows and need sub-second query performance
  • Real-time dashboards and monitoring are your primary use case
  • You need materialized views for automatic pre-aggregation
  • You require distributed architecture across multiple nodes
  • Funnel analysis, cohort queries, and complex aggregations are important

Choose QuestDB if:

  • Your data is time-series (metrics, IoT sensors, financial ticks)
  • You need SAMPLE BY for natural time-based downsampling
  • ASOF JOIN and LATEST ON are critical for time-series correlation
  • High ingestion throughput via InfluxDB Line Protocol matters
  • You want SQL (not InfluxQL or PromQL) for time-series queries

Methodology

Feature comparison based on DuckDB v1.x, ClickHouse 24.x, and QuestDB 8.x documentation as of March 2026. Benchmarks reference official published results. Code examples use official Node.js clients and REST APIs. Evaluated on: query performance characteristics, ingestion methods, time-series capabilities, scalability, and developer experience with Node.js.
