
pgvector vs Qdrant vs Weaviate: Vector Databases for JavaScript 2026

· PkgPulse Team


TL;DR

Vector databases store and search embeddings — the numerical representations that power semantic search, RAG (Retrieval-Augmented Generation), and recommendation systems. pgvector is the pragmatic choice — add vector search to your existing PostgreSQL database using a Postgres extension. Qdrant is the purpose-built vector database — written in Rust for performance, it offers the best ANN (Approximate Nearest Neighbor) search speed and the richest filtering capabilities. Weaviate has the most complete AI integration — built-in vectorization modules (OpenAI, Cohere, HuggingFace), GraphQL query interface, and multi-modal support. If you already use Postgres: pgvector. For production vector search with complex filters: Qdrant. For schema-first AI-native applications: Weaviate.

Key Takeaways

  • pgvector adds <5ms overhead to existing Postgres queries — no new infrastructure required
  • Qdrant can handle 1M+ vectors with low-millisecond query latency at scale
  • Weaviate's vectorizer modules can generate embeddings automatically without OpenAI SDK calls
  • pgvector is not a vector database — it's a Postgres extension; for large-scale ANN search, purpose-built DBs win
  • Qdrant GitHub stars: ~22k — fastest-growing standalone vector database
  • All three support HNSW algorithm — the standard for accurate, fast ANN search
  • pgvector v0.5 added HNSW — previously only IVFFlat, now competitive with standalone vector DBs

Vector databases solve one core problem: finding semantically similar items rather than exact matches.

Traditional search: WHERE title LIKE '%machine learning%' — misses "ML", "deep learning", "neural networks"

Vector search: Encode "machine learning" as an embedding → find documents with similar vectors → returns semantically related content even without keyword match.
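A toy sketch of what "similar vectors" means: cosine similarity compares the direction of two embeddings, with 1 meaning identical and 0 meaning unrelated. This is illustrative only; databases use ANN indexes rather than pairwise comparison.

```typescript
// Cosine similarity between two embeddings of the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // ≈ 1 (same direction)
console.log(cosineSimilarity([1, 0], [0, 1]));       // 0 (orthogonal)
```

This is the same measure pgvector's `<=>` operator is built on (as a distance, `1 - similarity`).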

Use cases:

  • RAG — retrieve relevant context from a knowledge base for LLM answers
  • Semantic search — find products/articles/documents by meaning, not keywords
  • Recommendations — find similar items based on user behavior vectors
  • Duplicate detection — find near-duplicate content at scale

pgvector: Vector Search in PostgreSQL

pgvector adds a vector data type and similarity search operators to PostgreSQL. If your app already uses Postgres, this is the lowest-friction path to vector search.

Installation

# Via Docker
docker run -d \
  --name pgvector \
  -e POSTGRES_PASSWORD=mypassword \
  -p 5432:5432 \
  pgvector/pgvector:pg16

# Or enable on existing Postgres with pgvector installed
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

Schema Setup

-- Create a table with a vector column
CREATE TABLE documents (
  id          BIGSERIAL PRIMARY KEY,
  content     TEXT NOT NULL,
  embedding   vector(1536),  -- OpenAI text-embedding-3-small dimensions
  metadata    JSONB,
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Or IVFFlat (older, less accurate but sometimes faster for very large datasets)
-- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
--   WITH (lists = 100);

Node.js with Drizzle ORM

// db/schema.ts — Drizzle + pgvector
import { pgTable, bigserial, text, jsonb, customType } from "drizzle-orm/pg-core";

const vector = customType<{ data: number[]; driverData: string }>({
  dataType(config) {
    return `vector(${config?.dimensions ?? 1536})`;
  },
  toDriver(value: number[]) {
    return JSON.stringify(value);
  },
  fromDriver(value: string) {
    return JSON.parse(value);
  },
});

export const documents = pgTable("documents", {
  id: bigserial("id", { mode: "number" }).primaryKey(),
  content: text("content").notNull(),
  embedding: vector("embedding", { dimensions: 1536 }),
  metadata: jsonb("metadata"),
});

// lib/vector-search.ts — semantic search with pgvector
import OpenAI from "openai";
import { drizzle } from "drizzle-orm/postgres-js";
import postgres from "postgres";
import { sql } from "drizzle-orm";

const openai = new OpenAI();
const client = postgres(process.env.DATABASE_URL!);
const db = drizzle(client);

// Generate embedding for a query
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding;
}

// Insert a document with its embedding
async function indexDocument(content: string, metadata: object) {
  const embedding = await embed(content);

  await db.execute(sql`
    INSERT INTO documents (content, embedding, metadata)
    VALUES (${content}, ${JSON.stringify(embedding)}::vector, ${JSON.stringify(metadata)})
  `);
}

// Semantic search — find similar documents
async function semanticSearch(query: string, limit = 5) {
  const queryEmbedding = await embed(query);

  const results = await db.execute(sql`
    SELECT
      id,
      content,
      metadata,
      1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `);

  return results; // the postgres-js driver returns rows directly (no .rows property)
}

// Hybrid search — combine keyword + semantic search
async function hybridSearch(query: string, limit = 5) {
  const queryEmbedding = await embed(query);

  // RRF (Reciprocal Rank Fusion) hybrid search
  const results = await db.execute(sql`
    WITH semantic AS (
      SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS rank
      FROM documents
      ORDER BY rank
      LIMIT 50
    ),
    keyword AS (
      SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(to_tsvector('english', content), plainto_tsquery('english', ${query})) DESC) AS rank
      FROM documents
      WHERE to_tsvector('english', content) @@ plainto_tsquery('english', ${query})
      ORDER BY rank
      LIMIT 50
    )
    SELECT
      d.id, d.content, d.metadata,
      COALESCE(1.0/(60 + s.rank), 0) + COALESCE(1.0/(60 + k.rank), 0) AS score
    FROM documents d
    LEFT JOIN semantic s ON d.id = s.id
    LEFT JOIN keyword k ON d.id = k.id
    WHERE s.id IS NOT NULL OR k.id IS NOT NULL
    ORDER BY score DESC
    LIMIT ${limit}
  `);

  return results; // the postgres-js driver returns rows directly (no .rows property)
}
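The SQL above scores documents with Reciprocal Rank Fusion: each ranking contributes 1/(k + rank) per document, with k = 60 as the conventional damping constant. The same fusion can be sketched in application code:

```typescript
// Reciprocal Rank Fusion: each ranked list of document IDs contributes
// 1/(k + rank) to a document's score; scores are summed across lists.
function rrfFuse(rankings: Array<string[]>, k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      const rank = i + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// A document ranked highly in both lists beats one ranked highly in only one.
const fused = rrfFuse([
  ["doc-a", "doc-b", "doc-c"], // semantic ranking
  ["doc-a", "doc-c"],          // keyword ranking
]);
console.log([...fused.entries()].sort((x, y) => y[1] - x[1])[0][0]); // "doc-a"
```

Doing the fusion in SQL (as above) avoids a round trip, but this form is handy when the two rankings come from different systems.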

Qdrant: Purpose-Built Vector Search

Qdrant is a standalone vector database written in Rust, designed for production-scale similarity search. It supports complex filtering, payload indexing, and multiple vector spaces per document.

Installation

# Docker
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant:latest

# Node.js client
npm install @qdrant/js-client-rest

Collection Setup

import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ url: "http://localhost:6333" });

// Create a collection
await client.createCollection("documents", {
  vectors: {
    size: 1536,         // OpenAI text-embedding-3-small
    distance: "Cosine",
    hnsw_config: {
      m: 16,
      ef_construct: 100,
      full_scan_threshold: 10000,
    },
  },
  // Indexing payload fields for fast filtering
  optimizers_config: {
    indexing_threshold: 20000,
  },
});

// Create payload index for efficient filtering
await client.createPayloadIndex("documents", {
  field_name: "source",
  field_schema: "keyword",
});

await client.createPayloadIndex("documents", {
  field_name: "created_at",
  field_schema: "datetime",
});

Insert and Search

import OpenAI from "openai";
import { QdrantClient } from "@qdrant/js-client-rest";

const openai = new OpenAI();
const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Batch insert with embeddings (note: Qdrant point IDs must be
// unsigned integers or UUID strings)
async function indexDocuments(docs: Array<{ id: string; content: string; source: string }>) {
  const embeddings = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: docs.map((d) => d.content),
  });

  await qdrant.upsert("documents", {
    wait: true,
    points: docs.map((doc, i) => ({
      id: doc.id,
      vector: embeddings.data[i].embedding,
      payload: {
        content: doc.content,
        source: doc.source,
        created_at: new Date().toISOString(),
      },
    })),
  });
}

// Semantic search with filtering
async function search(query: string, source?: string) {
  const queryEmbedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });

  const results = await qdrant.search("documents", {
    vector: queryEmbedding.data[0].embedding,
    limit: 5,
    with_payload: true,
    // Payload filter — combine vector search with metadata filtering
    filter: source
      ? {
          must: [{ key: "source", match: { value: source } }],
        }
      : undefined,
    score_threshold: 0.7,  // Only return high-similarity results
  });

  return results.map((r) => ({
    score: r.score,
    content: (r.payload as any).content,
    source: (r.payload as any).source,
  }));
}

Multi-Vector Search (Sparse + Dense)

// Qdrant supports hybrid search natively with sparse + dense vectors
// Dense: semantic embedding (OpenAI, Cohere)
// Sparse: BM25/SPLADE keyword weights

// Collection with both vector types
await client.createCollection("hybrid-docs", {
  vectors: {
    dense: { size: 1536, distance: "Cosine" },
  },
  sparse_vectors: {
    sparse: {},  // BM25 sparse vectors
  },
});

// Search with prefetch fusion
const results = await client.query("hybrid-docs", {
  prefetch: [
    {
      query: denseEmbedding,
      using: "dense",
      limit: 20,
    },
    {
      query: { indices: sparseIndices, values: sparseValues },
      using: "sparse",
      limit: 20,
    },
  ],
  query: { fusion: "rrf" },  // Reciprocal Rank Fusion
  limit: 5,
  with_payload: true,
});
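The sparseIndices and sparseValues above are placeholders for a sparse encoder's output. A minimal sketch of producing that { indices, values } shape from raw tokens, with simple hashing standing in for a real BM25/SPLADE encoder:

```typescript
// Build a Qdrant-style sparse vector from tokens. A real pipeline would use
// BM25 term weights or a learned sparse model (e.g. SPLADE); here we just
// hash each token to an index and use its raw count as the value.
function toSparseVector(tokens: string[]): { indices: number[]; values: number[] } {
  const counts = new Map<string, number>();
  for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);

  const indices: number[] = [];
  const values: number[] = [];
  for (const [token, count] of counts) {
    // FNV-1a hash -> 32-bit unsigned index
    let h = 0x811c9dc5;
    for (let i = 0; i < token.length; i++) {
      h ^= token.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    indices.push(h);
    values.push(count);
  }
  return { indices, values };
}

// "machine" appears twice, so its entry carries value 2
const sparse = toSparseVector("machine learning and machine vision".split(" "));
```

Hash collisions are possible with this scheme; it is only meant to show the data shape Qdrant expects.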

Weaviate: Schema-First AI Integration

Weaviate has built-in vectorizer modules — instead of calling OpenAI separately and storing embeddings yourself, you configure which model to use and Weaviate handles vectorization automatically.

Installation

# Docker with OpenAI vectorizer
docker run -d \
  --name weaviate \
  -p 8080:8080 \
  -e OPENAI_APIKEY=$OPENAI_API_KEY \
  -e ENABLE_MODULES="text2vec-openai,generative-openai" \
  -e DEFAULT_VECTORIZER_MODULE="text2vec-openai" \
  cr.weaviate.io/semitechnologies/weaviate:latest

npm install weaviate-client

Schema and Collection Setup

import weaviate, { WeaviateClient } from "weaviate-client";

const client: WeaviateClient = await weaviate.connectToLocal();

// Define collection with auto-vectorization
await client.collections.create({
  name: "Document",
  vectorizers: weaviate.configure.vectorizer.text2VecOpenAI({
    model: "text-embedding-3-small",
  }),
  generative: weaviate.configure.generative.openAI({
    model: "gpt-4o",
  }),
  properties: [
    { name: "content", dataType: weaviate.configure.dataType.TEXT },
    { name: "source", dataType: weaviate.configure.dataType.TEXT },
    { name: "category", dataType: weaviate.configure.dataType.TEXT },
    { name: "createdAt", dataType: weaviate.configure.dataType.DATE },
  ],
});

Insert and Search

const documents = client.collections.get("Document");

// Insert — Weaviate auto-vectorizes using the configured model
await documents.data.insertMany([
  {
    content: "Machine learning is a subset of artificial intelligence",
    source: "textbook",
    category: "AI",
  },
  {
    content: "Neural networks are inspired by biological brains",
    source: "paper",
    category: "AI",
  },
]);

// Semantic search — no need to generate embeddings yourself
const results = await documents.query.nearText(["deep learning concepts"], {
  limit: 5,
  returnMetadata: ["distance"],
  filters: documents.filter.byProperty("category").equal("AI"),
});

results.objects.forEach((obj) => {
  console.log(obj.properties.content);
  console.log("Distance:", obj.metadata?.distance);
});
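For cosine-based collections, Weaviate also exposes a certainty metric that rescales cosine distance (which ranges over [0, 2]) into [0, 1]. A minimal sketch of that conversion:

```typescript
// Weaviate certainty for cosine distance: (2 - distance) / 2.
// Distance 0 (identical vectors) -> certainty 1; distance 2 -> certainty 0.
function distanceToCertainty(distance: number): number {
  return (2 - distance) / 2;
}

console.log(distanceToCertainty(0));   // 1 (identical vectors)
console.log(distanceToCertainty(0.4)); // ≈ 0.8
```

This is useful when you want a threshold expressed as "at least X similar" rather than in raw distance units.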

Generative Search (RAG Built-In)

// Weaviate generative search — retrieve + generate in one query
const ragResult = await documents.generate.nearText(
  ["What is machine learning?"],
  {
    singlePrompt: "Explain this in simple terms: {content}",
  },
  { limit: 3 }
);

ragResult.objects.forEach((obj) => {
  console.log("Generated:", obj.generated);
  console.log("Source:", obj.properties.content);
});

Feature Comparison

| Feature | pgvector | Qdrant | Weaviate |
|---|---|---|---|
| Type | Postgres extension | Standalone vector DB | Standalone vector DB |
| New infra needed | ❌ (uses existing PG) | ✅ | ✅ |
| Auto-vectorization | ❌ | ❌ | ✅ Built-in |
| HNSW support | ✅ v0.5+ | ✅ | ✅ |
| Hybrid search | Manual SQL | ✅ Native | ✅ Native |
| Complex filtering | SQL WHERE | ✅ JSON filter | ✅ GraphQL |
| Multi-tenancy | Schemas/tables | ✅ Collections | ✅ Tenants |
| Generative search | ❌ | ❌ | ✅ Built-in |
| Node.js client | Via pg/postgres.js | ✅ Official | ✅ Official |
| Self-hostable | ✅ | ✅ | ✅ |
| Managed cloud | Via Neon/Supabase | ✅ Qdrant Cloud | ✅ Weaviate Cloud |
| Scale to 1M+ vectors | Possible (with tuning) | ✅ Excellent | ✅ Good |
| GitHub stars | ~14k | ~22k | ~12k |

When to Use Each

Choose pgvector if:

  • You already run PostgreSQL and don't want new infrastructure
  • Your vector dataset is under 1M documents (pgvector scales reasonably)
  • You need to JOIN vector search results with relational data (native in Postgres)
  • Your team knows SQL and prefers one database for everything

Choose Qdrant if:

  • Your vector dataset exceeds 1M documents or you need consistently low-millisecond query times
  • You need complex payload filtering combined with vector search
  • You're building a production search system and need fine-grained performance tuning
  • Multiple vector spaces per document (multi-modal: text + image vectors) are needed

Choose Weaviate if:

  • Auto-vectorization (no external embedding calls) simplifies your pipeline
  • Built-in generative search (RAG without extra orchestration) fits your use case
  • GraphQL query interface fits your team's preference
  • You're building multi-modal search (text + images + video)

Methodology

Data sourced from GitHub repositories (star counts as of February 2026), official benchmarks (ANN-Benchmarks, Qdrant benchmarks suite), and community performance reports. pgvector performance benchmarks from the pgvector GitHub repository and community testing. npm weekly download statistics from npmjs.com (January 2026).


Related: Langfuse vs LangSmith vs Helicone for LLM observability in RAG pipelines, or Mastra vs LangChain.js vs GenKit for AI frameworks that use vector search.
