Vector Databases in Production: Choosing Between PostgreSQL pgvector and Purpose-Built Solutions

We had a working prototype. Semantic search was returning relevant results. Then our lead engineer dropped the question nobody wanted to hear: “Cool, but what happens when we have 50x more data?”

We'd been dodging that conversation for weeks. Keep storing vectors inside PostgreSQL using pgvector, or spin up a dedicated vector database like Pinecone? We stuck with pgvector. Still running it in production a year later. But the answer for your team might look completely different. This post walks through how we made that call, with actual numbers from our vacation rental management platform.

What We Built and Why Vector Search Mattered

VRMS is rental management software. It pushes listings to Airbnb and Booking.com, handles pricing, tracks reservations, manages guest conversations, and generates owner payouts - the full operational stack for vacation rental managers. We architected it as multi-tenant from day one. Each client gets their own API layer, database, and background workers. No shared resources, no cross-tenant data leakage.

The semantic search piece lives inside the guest communication inbox. Guests message things like “Where is parking? Is it free?” or “What's the WiFi password? Can I check in early?” Our system has to understand what they're actually asking, pull the right details from that specific property's knowledge base, and draft something accurate.

Keyword matching breaks down immediately with real guest messages. Someone writes “Where do I leave my car?” and expects parking information back, even though they never typed the word “parking.” Vector embeddings solve that problem.

How We Implemented pgvector

Every chunk of property information gets converted into a 1536-dimensional vector through OpenAI's text-embedding-3-small model. Listing descriptions. Amenity lists. House rules. Check-in instructions. WiFi credentials. Room layouts. Background workers regenerate embeddings whenever property managers touch their listings.
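
If you haven't seen it, the generation step is a few lines against the OpenAI API. A minimal sketch, assuming the openai Python package (v1+) and an API key in the environment; embed_chunks is an illustrative name, not our production code:

```python
# Embedding sketch: batch text chunks through text-embedding-3-small.
# Assumes the openai package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Convert text chunks (house rules, amenities, etc.) into
    1536-dimensional vectors."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunks,  # the endpoint accepts a batch of strings
    )
    return [item.embedding for item in response.data]
```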

Storage happens in PostgreSQL with the pgvector extension. HNSW indexing handles the similarity search. We partition by property, so each listing maintains its own knowledge base. Keeps retrieval tight and contextually accurate.
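
Here's roughly what that looks like in schema terms, assuming psycopg 3 and pgvector 0.5 or later. Table and column names are illustrative, not our production schema:

```python
# Minimal schema sketch: one chunk table per tenant database,
# filtered by property_id.
import psycopg

ddl_statements = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS kb_chunks (
        id           bigserial PRIMARY KEY,
        property_id  bigint NOT NULL,
        content      text NOT NULL,
        content_hash text NOT NULL,  -- used later for delta updates
        embedding    vector(1536) NOT NULL
    )
    """,
    # HNSW over cosine distance; m and ef_construction here are the
    # common starting points, not our tuned values.
    """
    CREATE INDEX IF NOT EXISTS kb_chunks_embedding_idx
        ON kb_chunks USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """,
]

with psycopg.connect("dbname=tenant_db") as conn:
    for stmt in ddl_statements:
        conn.execute(stmt)
```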

The flow works like this: guest message arrives, system categorizes intent (WiFi? parking? availability?), converts the message to an embedding, runs cosine similarity against that property's vectors, grabs the top matches, combines them with conversation history, and feeds everything to GPT-4. The operator sees a draft response they can tweak and send.
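
A stripped-down version of the retrieval step, reusing the embedding helper above. The real pipeline adds intent categorization and conversation history before anything reaches GPT-4:

```python
# Retrieval sketch. <=> is pgvector's cosine-distance operator;
# a top_k of 5 is an arbitrary example, not our tuned value.
def retrieve_context(conn, message: str, property_id: int, top_k: int = 5):
    query_vec = embed_chunks([message])[0]
    vec_literal = "[" + ",".join(map(str, query_vec)) + "]"
    return conn.execute(
        """
        SELECT content, 1 - (embedding <=> %s::vector) AS similarity
        FROM kb_chunks
        WHERE property_id = %s          -- per-property knowledge base
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (vec_literal, property_id, vec_literal, top_k),
    ).fetchall()
```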

Compound questions work fine. “What's the WiFi password and where do I park?” pulls both topics and merges them into one reply. No extra logic required.

The Numbers That Actually Matter

Our first ingestion pipeline was embarrassingly slow. Importing a large legacy Airbnb account took over seven days. Thousands of listings, tons of historical reservation data, the works. We tore it apart and rebuilt with parallel processing, chunking, and delta updates.

New import time: three to four hours. Manual intervention dropped by roughly 95 percent. Throughput jumped about 100x.
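
The delta-update idea is simple enough to sketch: hash every chunk and only re-embed what actually changed. This is a simplified illustration; the production pipeline adds batching, parallel workers, and cleanup of removed chunks:

```python
# Delta-update sketch. Relies on the content_hash column from the
# schema sketch above; deletes are omitted for brevity.
import hashlib

def sync_property_chunks(conn, property_id: int, chunks: list[str]):
    known = {row[0] for row in conn.execute(
        "SELECT content_hash FROM kb_chunks WHERE property_id = %s",
        (property_id,),
    ).fetchall()}

    hashed = [(c, hashlib.sha256(c.encode()).hexdigest()) for c in chunks]
    changed = [(c, h) for c, h in hashed if h not in known]
    if not changed:
        return  # nothing changed: zero OpenAI spend for this edit

    vectors = embed_chunks([c for c, _ in changed])  # earlier sketch
    for (chunk, chash), vec in zip(changed, vectors):
        conn.execute(
            "INSERT INTO kb_chunks (property_id, content, content_hash, embedding) "
            "VALUES (%s, %s, %s, %s::vector)",
            (property_id, chunk, chash, "[" + ",".join(map(str, vec)) + "]"),
        )
```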

Search latency sits comfortably under 100ms for typical property knowledge bases. HNSW indexing does the heavy lifting there. And because each tenant runs on their own database instance, one client hammering queries never slows down another.

The ceiling shows up around a few hundred thousand vectors per tenant. Past that point, query times climb unless you add memory or start partitioning. Our per-property design sidesteps this for now. A system with one massive shared embedding corpus would hit the wall much sooner.
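
If we ever did need to break up one big corpus inside a single database, native Postgres partitioning is the obvious lever: each partition gets its own, smaller HNSW graph. Purely hypothetical for us, since our tenants are isolated at the database level:

```python
# Hash-partitioning sketch for a large shared corpus.
import psycopg

statements = [
    """CREATE TABLE kb_chunks_part (
           id bigserial,
           property_id bigint NOT NULL,
           content text NOT NULL,
           embedding vector(1536) NOT NULL,
           PRIMARY KEY (id, property_id)
       ) PARTITION BY HASH (property_id)""",
    *[f"""CREATE TABLE kb_chunks_part_{i} PARTITION OF kb_chunks_part
          FOR VALUES WITH (MODULUS 8, REMAINDER {i})""" for i in range(8)],
    # creating the index on the parent cascades to every partition
    "CREATE INDEX ON kb_chunks_part USING hnsw (embedding vector_cosine_ops)",
]

with psycopg.connect("dbname=shared_db") as conn:
    for stmt in statements:
        conn.execute(stmt)
```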

Costs You Won't Find in the Pricing Calculator

Embedding generation is the line item that goes to OpenAI. Delta updates keep us from reprocessing entire knowledge bases on every property edit, which keeps the API spend predictable month to month.

Vector storage itself costs almost nothing. A 1536-dimensional embedding is a few kilobytes. The actual expense hides in HNSW index overhead and keeping enough RAM allocated for fast retrieval.
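
Quick back-of-envelope on that: 1536 float4 dimensions works out to roughly 6 KB per vector, so even 200,000 vectors is only about 1.2 GB of raw data. The HNSW graph typically adds the same order of magnitude again (more at higher m values), and the whole structure wants to sit in RAM. That's where the bill actually lands.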

Going with a managed vector database would mean monthly service fees plus some degree of vendor lock-in. You'd get automatic scaling and cross-region replication without lifting a finger. We chose to keep everything in-house. More control, lower direct costs. Also more 2am Slack messages when something goes sideways.

Why pgvector Fit Our Constraints

Three factors drove the decision.

  1. Our embeddings needed to sit right next to relational data. Property vectors have to join against pricing tables, availability calendars, reservation records. Running queries across two different databases would mean building sync logic, dealing with eventual consistency, and doubling our failure modes. Not worth it. The query sketch after this list shows what staying in one database buys us.
  2. We already had per-tenant database isolation baked into the architecture. VRMS gives each client their own PostgreSQL instance. Adding vector storage to that existing pattern felt obvious. Spinning up a separate vector database would have meant either replicating the isolation model somewhere else or compromising a design we'd already battle-tested.
  3. Engineering bandwidth, of course. Our team isn't infinite. One database means one backup strategy, one monitoring setup, one credential rotation process. That simplicity compounds. We felt it every time we shipped a feature without coordinating across multiple data stores.
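
To make the first point concrete, here's the kind of single-round-trip query that co-location enables: similarity ranking and live relational data in the same transaction. The availability and pricing tables are illustrative stand-ins (assume one current pricing row per property), not our real schema:

```python
# Why co-location matters, in one query.
def retrieve_with_rates(conn, property_id: int, vec_literal: str):
    return conn.execute(
        """
        SELECT k.content,
               p.nightly_rate,
               (SELECT min(a.start_date)
                FROM availability a
                WHERE a.property_id = k.property_id
                  AND a.start_date >= current_date) AS next_open_date
        FROM kb_chunks k
        JOIN pricing p USING (property_id)
        WHERE k.property_id = %s
        ORDER BY k.embedding <=> %s::vector
        LIMIT 5
        """,
        (property_id, vec_literal),
    ).fetchall()
```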

Where pgvector Hits Its Limits

If your workload needs to search millions of embeddings under heavy concurrent load, pgvector will buckle. Same if you need sub-10ms latency globally with automatic regional failover.

Pinecone, Weaviate, Qdrant, Milvus. These purpose-built systems exist precisely for those scenarios. They ship with specialized indexing, managed scaling, and hybrid search that combines vector similarity with keyword scoring and rich metadata filtering. pgvector can filter with ordinary SQL, but it doesn't compete on the rest of those fronts yet.

Teams planning to swap embedding models frequently or experiment across different vector backends will find dedicated databases more accommodating too. Migrations between vector stores tend to be simpler than wrestling with PostgreSQL extension upgrades.

Figuring Out What Your System Needs

Map your data model first. If embeddings require transactional consistency with relational tables, pgvector deserves a hard look. If vectors live independently, the coupling argument loses steam.

Project your scale trajectory honestly. Tens of thousands of vectors per tenant? pgvector handles that without drama. Millions per tenant with cross-tenant search patterns? Budget for Pinecone or similar.

Test the entire path. Benchmark ingestion through retrieval through user response. Synthetic query tests miss the bottlenecks that show up during multi-turn conversations and concurrent writes.
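
A minimal harness for that kind of test might look like the sketch below, reusing the retrieval helper from earlier. The percentile reporting is the part that matters; the workload generation here is deliberately naive:

```python
# End-to-end benchmark sketch: times embed + retrieve together, the
# way a guest message actually flows.
import statistics
import time

def benchmark(conn, messages: list[str], property_id: int) -> None:
    latencies = []
    for msg in messages:
        start = time.perf_counter()
        retrieve_context(conn, msg, property_id)
        latencies.append((time.perf_counter() - start) * 1000)
    qs = statistics.quantiles(latencies, n=100)  # 99 cut points
    print(f"p50={qs[49]:.1f}ms p95={qs[94]:.1f}ms p99={qs[98]:.1f}ms")
```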

And be realistic about ops capacity. A unified stack simplifies everything until the moment it doesn't. Index tuning, memory optimization, sharding. Someone on your team has to own that as data grows.

Keeping Production Stable

Vector search needs observability like any other critical path. We run a three-stage setup.

Instrumentation lives inside the application code. Query latencies, embedding generation times, result counts, GPT-4 response durations. Everything gets logged.
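
In practice that's a thin wrapper around the hot path. A sketch with illustrative field names:

```python
# Instrumentation sketch: one structured log line per query. Any
# JSON-over-stdout shipper can feed these into Elasticsearch.
import json
import logging
import time

log = logging.getLogger("vector_search")

def timed_retrieve(conn, message: str, property_id: int):
    start = time.perf_counter()
    rows = retrieve_context(conn, message, property_id)
    log.info(json.dumps({
        "event": "vector_search",
        "property_id": property_id,
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        "result_count": len(rows),
    }))
    return rows
```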

Ingestion pushes that data into Elasticsearch, with Kibana for visualization. Datadog or Grafana would work just as well.

Analysis happens through dashboards built for the questions we actually ask. Latency trending upward? We catch it before users complain. Ingestion jobs dying silently? Alerts fire.

Automated health checks cover the rest. Sync delays, channel connectivity failures, missing listing data. Issues get flagged before they cascade into something customers notice.
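
One of the simpler checks, sketched below: flag properties whose listings changed but whose embeddings haven't caught up. It assumes updated_at timestamps on both tables; the 15-minute grace window is an arbitrary example:

```python
def stale_embedding_check(conn) -> list[int]:
    """Return property_ids whose listing edits haven't been re-embedded."""
    rows = conn.execute(
        """
        SELECT l.property_id
        FROM listings l
        LEFT JOIN (
            SELECT property_id, max(updated_at) AS embedded_at
            FROM kb_chunks
            GROUP BY property_id
        ) k USING (property_id)
        WHERE l.updated_at > COALESCE(k.embedded_at, '-infinity')
          AND l.updated_at < now() - interval '15 minutes'
        """
    ).fetchall()
    return [r[0] for r in rows]
```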

How Devslane Approaches These Projects

We've shipped semantic search for clients across vacation rentals, fintech, and enterprise SaaS. Some systems run pgvector. Others run Pinecone or Weaviate. The right choice depends entirely on constraints.

If you're weighing options, we can run a structured evaluation against your actual data and traffic. Define benchmarks. Test real performance. Map out a migration path if the initial choice turns out wrong.

Our team augmentation model embeds engineers directly into your existing workflows. A two-week risk-free trial gives you space to validate architecture decisions before committing further.

Building something new? Fixing something old? We take it from data model through implementation and set up the observability layer so you're not flying blind in production.

Frequently Asked Questions

What is pgvector and how does it compare to dedicated vector databases?

pgvector is a PostgreSQL extension that adds vector data types and similarity search to your existing database. Dedicated systems like Pinecone and Weaviate do nothing but store and query embeddings. pgvector shines when you need vectors tightly coupled with relational data. Dedicated databases win on massive scale and managed infrastructure.

How many vectors can pgvector handle before performance degrades?

Depends on memory, index config, and query load. We run stable sub-100ms searches with tens of thousands of vectors per tenant. Push past a few hundred thousand and you'll need more RAM or partitioning. Millions of vectors usually means it's time for a dedicated store.

What embedding model works best with pgvector for semantic search?

OpenAI's text-embedding-3-small hits a good balance at 1536 dimensions. text-embedding-3-large gives better accuracy at 3072 dimensions but costs more and uses more storage. Open-source options like sentence-transformers eliminate the API dependency if that matters to you.

How does HNSW indexing improve vector search performance in PostgreSQL?

HNSW builds a graph structure for approximate nearest neighbor search. Instead of comparing your query against every single vector (slow), it navigates graph layers to find similar vectors fast. You trade some accuracy for dramatic speed gains. Queries that would take seconds with brute force finish in milliseconds.
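
In pgvector, that trade-off is tunable per session via hnsw.ef_search (the default is 40):

```python
# Higher ef_search explores more of the graph: better recall,
# slower queries.
conn.execute("SET hnsw.ef_search = 100")
```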

Can pgvector support multi-tenant applications with data isolation?

Absolutely. Use separate schemas, separate databases, or row-level security depending on how strict you need isolation. We run dedicated PostgreSQL instances per tenant for complete separation. Schema-based approaches work too but need more careful index and resource management.
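
For the row-level-security route (not what we do, but a reasonable shared-database variant), the sketch looks like this. It assumes a tenant_id column and a custom app.tenant_id setting on each connection:

```python
# RLS sketch for a shared multi-tenant table. Hypothetical for us,
# since VRMS isolates at the database level instead.
for stmt in (
    "ALTER TABLE kb_chunks ENABLE ROW LEVEL SECURITY",
    """CREATE POLICY tenant_isolation ON kb_chunks
       USING (tenant_id = current_setting('app.tenant_id')::bigint)""",
):
    conn.execute(stmt)
# each tenant connection then runs: SET app.tenant_id = '42'
```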

What are the main cost differences between pgvector and managed vector databases?

pgvector adds minimal marginal cost to existing PostgreSQL. You pay for storage and memory. Managed services charge per vector, per query, per feature. Small implementations might run $50 to $200 monthly on Pinecone versus nearly free with pgvector. At scale, managed services cost thousands but eliminate ops burden.

How do you handle embedding updates when source data changes?

Background workers watch for source changes and regenerate affected embeddings. Delta processing avoids recomputing everything on every edit. We batch generation during low-traffic periods and prioritize time-sensitive updates. PostgreSQL transactions keep vectors consistent with source records.

What observability metrics matter most for production vector search?

Query latency at p50, p95, p99. Embedding generation time and queue depth. Index size growth. Memory utilization. Alert on ingestion failures and anomalous query patterns. Together these metrics show system health and flag bottlenecks before users feel them.