Semantic Search Selection Report
1. Overview
This report evaluates semantic search solutions for the BANA POS system. The goal is to upgrade the existing keyword-based Typesense search to support semantic understanding — enabling intent-based queries, synonym matching, multilingual search (Vietnamese + English), product recommendations, and "more like this" features.
Current State: Typesense v2.1.0 with Kafka CDC (Debezium) sync from PostgreSQL. 6 indexed collections: products, merchants, organizers, categories, devices, sale-channels.
Target State: Hybrid search combining keyword (BM25) + semantic (vector/embedding) search across all collections with excellent Vietnamese + English support.
Evaluation Date: February 2026
2. Current Search Architecture
2.1. Data Flow
2.2. Indexed Collections
| Collection | Entity | Query By | Multilingual |
|---|---|---|---|
| products | Product | identifier, info.name.en, info.name.vi, info.description.en, info.description.vi | Yes |
| merchants | Merchant | name.en, name.vi, slug, identifier, status, type, currency | Yes |
| organizers | Organizer | name.en, name.vi, description.en, description.vi | Yes |
| categories | Category | name.en, name.vi | Yes |
| devices | Device | name, identifier, status | No |
| sale-channels | SaleChannel | name.en, name.vi | Yes |
2.3. Infrastructure Context
| Component | Version | Status | Relevance to Semantic Search |
|---|---|---|---|
| PostgreSQL | 18 Alpine | Running | Source of truth. pgvector extension available |
| Typesense | 2.1.0 | Running | Already deployed. Supports native vector search |
| Redis | Latest | Running | Cache (DB 0), BullMQ (DB 0), Pub/Sub (DB 0), WebSocket (DB 4). Redis 8+ supports vector search |
| Kafka + Debezium | 3.x + 2.4 | Running | CDC pipeline for real-time sync |
| Drizzle ORM | 0.45.1 | Running | Native pgvector support (vector(), cosineDistance()) |
| Bun | >= 1.3.2 | Running | Runtime. Near-100% Node.js compatibility |
| BullMQ | 5.14.3 | Running | Queue infrastructure (finance, payment). Reusable for embedding pipeline |
2.4. Capabilities Gap
| Capability | Current (Keyword) | Target (Semantic) |
|---|---|---|
| Exact match (SKU, barcode) | Yes | Yes (keyword path) |
| Typo tolerance | Yes | Yes |
| Synonym understanding | No | Yes |
| Intent-based queries | No | Yes |
| "More like this" | No | Yes |
| Natural language queries | No | Yes |
| Vietnamese semantic understanding | No | Yes |
| Product recommendations | No | Yes |
| Cross-language search (en query → vi results) | No | Yes |
3. Vector Database Evaluation
3.1. Candidates
| # | Solution | Type | Language | License | GitHub Stars | Status in BANA |
|---|---|---|---|---|---|---|
| 1 | Typesense | Search engine + Vector | C++ | GPL-3 | 22k+ | Already deployed |
| 2 | PostgreSQL pgvector | Extension | C | PostgreSQL | 14k+ | DB already deployed |
| 3 | Redis Stack | In-memory DB + Vector | C | RSALv2 | 67k+ | Already deployed |
| 4 | Qdrant | Purpose-built vector DB | Rust | Apache-2 | 23k+ | Not deployed |
| 5 | Meilisearch | Search engine + Vector | Rust | MIT | 49k+ | Not deployed |
| 6 | Weaviate | Vector DB + Vectorizers | Go | BSD-3 | 14k+ | Not deployed |
| 7 | LanceDB | Embedded vector DB | Rust | Apache-2 | 5k+ | Not deployed |
| 8 | Milvus/Zilliz | Distributed vector DB | Go/C++ | Apache-2 | 40k+ | Not deployed |
| 9 | Elasticsearch | Search engine + Vector | Java | SSPL/ELv2 | 72k+ | Not deployed |
| 10 | OpenSearch | Search engine + Vector | Java | Apache-2 | 10k+ | Not deployed |
| 11 | Pinecone | Managed vector DB | - | Proprietary | N/A | Not deployed |
| 12 | ChromaDB | Embedding DB | Python/Rust | Apache-2 | 18k+ | Not deployed |
| 13 | Vespa | Search platform | Java/C++ | Apache-2 | 6k+ | Not deployed |
3.2. Feature Matrix
| Feature | Typesense | pgvector | Redis Stack | Qdrant | Meilisearch | Weaviate | Milvus | Elasticsearch |
|---|---|---|---|---|---|---|---|---|
| Vector Index (HNSW) | Yes | Yes | Yes | Yes | Yes (Arroy) | Yes | Yes | Yes (Lucene) |
| Hybrid Search | Native (rank fusion) | Manual (SQL) | KNN + filter | Dense+sparse fusion | semanticRatio | Alpha-blend | Native | Retriever API |
| Built-in Embeddings | Yes (S-BERT, E5) | No (BYO) | No (BYO) | No (BYO) | Yes (OpenAI, HF) | Yes (20+ modules) | No (BYO) | Yes (ELSER) |
| External Embedding API | OpenAI, PaLM | N/A | N/A | N/A | OpenAI, Cohere, HF | OpenAI, Cohere, HF | N/A | OpenAI, HF |
| BYO Vectors | Yes (float[]) | Yes (vector) | Yes (float32/64) | Yes | Yes | Yes | Yes | Yes |
| Metadata Filtering | Full | Full (SQL) | 10 attrs max | Full (payload) | Faceted | Full (where) | Full | Full |
| Faceted Search | Yes | No | No | No | Yes | No | No | Yes |
| Multi-tenancy | Filter-based | RLS/WHERE | Tag-based | Tiered shards | Tenant tokens | Shard-per-tenant | DB/collection/partition | Index-per-tenant |
| TypeScript SDK | Excellent | Drizzle native | Existing client | Official REST | Good | v3 client | Node.js SDK | Official |
| Bun Compatibility | Confirmed | Confirmed | Confirmed | Expected | Expected | Expected | Expected | Expected |
| Self-hosted Docker | Yes | Extension | Yes | Yes | Yes | Yes (Compose) | Yes (+ etcd) | Yes |
| Cloud-only | No | No | No | No | No | No | No | No |
3.3. Performance Benchmarks
| Solution | RPS (1M vectors) | P50 Latency | P99 Latency | Memory (100K x 1024d) | Insert Rate |
|---|---|---|---|---|---|
| Typesense | ~5K-10K | <10ms | <50ms | ~600MB + 2-6GB model | Real-time (CDC) |
| pgvector | ~471 QPS (99% recall) | <50ms | <100ms | ~600MB (HNSW) | Real-time (SQL) |
| Redis Stack | ~3.4x Qdrant | ~3ms | <10ms | ~600MB (in-memory) | ~15K vec/sec |
| Qdrant | High | ~31ms | ~37ms | ~600MB (quantizable to ~60MB) | ~50K vec/sec |
| Meilisearch | ~5K-10K | <50ms | <100ms | ~35x data size (RAM!) | Batch |
| Weaviate | ~1.7x less than Redis | ~10ms | <50ms | ~3GB per 1M vectors | Moderate |
| Milvus | Highest open-source | <10ms | <30ms | ~600MB | ~50K vec/sec |
| Elasticsearch | ~5x faster than OpenSearch | <50ms | <100ms | Requires 128-256GB RAM cluster | Moderate |
3.4. Cost Analysis (Self-Hosted, Monthly)
| Solution | New Infrastructure | RAM Overhead | Operational Complexity | Monthly Cost |
|---|---|---|---|---|
| Typesense (upgrade) | None | +2-6GB (embedding model) | None (already running) | $0 |
| pgvector | None (extension) | +600MB (HNSW index) | Minimal (ALTER TABLE) | $0 |
| Redis Stack (upgrade to v8+) | None | +600MB (vector index) | Minimal (existing Redis) | $0 |
| Qdrant | 1 Docker container | 2-8GB | Moderate (new service) | $50-150 |
| Meilisearch | 1 Docker container | 4-16GB (heavy!) | Moderate | $50-200 |
| Weaviate | Docker Compose (multi-container) | 8-16GB | High (modules, config) | $100-300 |
| Milvus | Docker + etcd + MinIO | 8-16GB | High (distributed) | $150-400 |
| Elasticsearch | Docker cluster (3+ nodes) | 128-256GB | Very High (JVM, ops) | $300-1000+ |
3.5. Critical Disqualifiers for BANA
| Candidate | Disqualifier | Severity |
|---|---|---|
| Elasticsearch | Requires 128-256GB RAM cluster. JVM-based. Massive operational overhead. Overkill for POS search. | CRITICAL |
| OpenSearch | Same as Elasticsearch. 5x slower than ES for vector search. | CRITICAL |
| Vespa | No TypeScript/JavaScript SDK. XML-based config. Internet-scale complexity, completely overkill. | CRITICAL |
| Pinecone | Cloud-only, no self-hosted option. Consumption-based pricing (expensive at scale). Vendor lock-in. | HIGH |
| ChromaDB | Single-node only. In-memory HNSW (10M x 1536d = 60GB RAM). No multi-tenancy. Not production-ready at scale. | HIGH |
| Milvus | Requires etcd + MinIO for distributed mode. Disproportionate complexity for POS scale (<1M products). | MEDIUM |
| Weaviate | Multi-container Docker Compose. 8-16GB RAM. Complex module configuration. Steeper learning curve. | MEDIUM |
| Meilisearch | Single-node architecture. RAM usage = 35x data size. No horizontal scaling. | MEDIUM |
3.6. Scoring Matrix
Weighted scoring (1-10, higher = better for BANA):
| Criteria (Weight) | Typesense | pgvector | Redis Stack | Qdrant | Meilisearch | Weaviate | Milvus |
|---|---|---|---|---|---|---|---|
| Integration Fit (25%) | 10 | 10 | 9 | 5 | 4 | 4 | 3 |
| Hybrid Search (20%) | 9 | 6 | 7 | 8 | 8 | 9 | 8 |
| TypeScript/Bun SDK (15%) | 10 | 10 | 9 | 7 | 8 | 7 | 6 |
| Operational Cost (10%) | 10 | 10 | 9 | 7 | 7 | 6 | 5 |
| Performance (10%) | 8 | 8 | 10 | 9 | 7 | 7 | 9 |
| Multi-tenancy (5%) | 7 | 9 | 6 | 8 | 7 | 9 | 10 |
| Scalability (5%) | 7 | 7 | 6 | 9 | 5 | 8 | 10 |
| Vietnamese Support (10%) | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Weighted Total | 9.05 | 8.45 | 8.30 | 6.90 | 6.35 | 6.35 | 5.95 |
3.7. Finalist Analysis
Typesense Vector Upgrade (Score: 9.05) — WINNER
Why #1: Already deployed with CDC sync. Adding semantic search = adding an embed field to existing collection schemas. Zero new infrastructure, zero new services.
Strengths:
- Native rank fusion hybrid search with configurable
alphaparameter (0 = pure keyword, 1 = pure semantic) - Built-in embedding models:
ts/all-MiniLM-L12-v2,ts/e5-small,ts/distiluse-base-multilingual-cased-v2 - External API support: OpenAI, Google PaLM natively supported
- BYO pre-computed embeddings via
float[]fields - Existing Kafka CDC pipeline continues working unchanged
- Typesense
query_bysupports mixing text fields and embedding fields in a single query rerank_hybrid_matches: truefor improved result quality- Cost: $0 additional
Weaknesses:
- Built-in multilingual model (
distiluse) is older, Vietnamese quality is moderate - Embedding model loading requires 2-6GB additional RAM
- Not designed for billion-vector scale (adequate for POS at <10M products)
How hybrid search works in Typesense:
{
"q": "laptop gaming",
"query_by": "name_en,name_vi,description_en,embedding",
"prefix": "true,true,true,false",
"vector_query": "embedding:([], id: *, alpha: 0.3)",
"filter_by": "status:=active && merchantId:=12345"
}PostgreSQL pgvector (Score: 8.45) — COMPLEMENT
Why #2: Zero new infrastructure. Drizzle ORM has native pgvector support with vector() column type and distance functions (cosineDistance(), l2Distance(), innerProduct()).
Strengths:
- Zero new services — just enable the pgvector extension on existing PostgreSQL
- Drizzle ORM integration is seamless:typescript
import { vector, index } from 'drizzle-orm/pg-core'; const products = pgTable('products', { embedding: vector('embedding', { dimensions: 1024 }), }); // Query db.select().from(products).orderBy(cosineDistance(products.embedding, queryVector)).limit(10); - Full SQL filtering combined with vector search (merchantId, organizerId, category, etc.)
halfvectype for 50% storage savings with no recall loss- HNSW index for sub-100ms queries at <10M vectors
- pgvectorscale achieves 471 QPS at 99% recall on 50M vectors
- Real-time indexing (vectors indexed on INSERT/UPDATE)
Weaknesses:
- No built-in rank fusion — hybrid search (keyword + vector) must be implemented at application level
- No built-in embedding generation (BYO only)
- HNSW index consumes significant memory for large datasets
Best for: Product recommendations, "more like this", analytical similarity queries, cross-service vector lookups.
Redis Stack Vector Search (Score: 8.30) — REAL-TIME
Why #3: Already running Redis for cache, BullMQ, and pub/sub. Redis 8+ includes vector search natively (previously required Redis Stack module).
Strengths:
- 3ms P50 latency — fastest vector search among all candidates
- Already deeply integrated (caching, queues, pub/sub, WebSocket)
- Uses existing Redis client (
node-redis) - Supports HNSW, FLAT, and SVS-VAMANA indexes
- KNN + pre-filter on TAG, TEXT, NUMERIC, GEO fields
- In-memory performance ideal for real-time POS checkout suggestions
Weaknesses:
- Maximum 10 attributes per vector index
- Everything in-memory — RAM cost grows linearly
- No built-in embedding generation
- Limited hybrid search (no true rank fusion)
Best for: Real-time POS similarity lookups ("customers also bought"), checkout suggestions, ultra-low-latency search cache.
4. Embedding Model Evaluation
4.1. Critical Requirement: Vietnamese + English
BANA uses i18n fields ({ en: string, vi: string }) for products, merchants, organizers, and categories. The embedding model must handle both Vietnamese and English effectively. Poor Vietnamese support is a disqualifier.
4.2. Candidates
| # | Model | Provider | Dimensions | Parameters | Max Tokens | Languages | Type |
|---|---|---|---|---|---|---|---|
| 1 | BGE-M3 | BAAI | 1024 | 568M | 8,192 | 100+ | Open-source |
| 2 | jina-embeddings-v3 | Jina AI | 1024 (32-1024) | 570M | 8,192 | 89 | API + Self-host |
| 3 | Typesense distiluse-multilingual | Typesense built-in | 512 | ~135M | 512 | Multilingual | Built-in |
| 4 | Cohere Embed v4 | Cohere | 1536 (configurable) | - | 128K | 100+ | API only |
| 5 | OpenAI text-embedding-3-small | OpenAI | 1536 (configurable) | - | 8,191 | Multilingual | API only |
| 6 | multilingual-e5-large | Microsoft | 1024 | 560M | 512 | 100+ | Open-source |
| 7 | gte-multilingual-base | Alibaba | 768 | ~300M | 8,192 | 70+ | Open-source |
| 8 | Google gemini-embedding-001 | 3072 (configurable) | - | 8,192 | 100+ | API | |
| 9 | nomic-embed-text-v2-moe | Nomic AI | 768 | 475M (305M active) | 8,192 | ~100 | Open-source |
| 10 | Voyage voyage-3.5 | Voyage AI | 1024 | - | 32K | Multilingual | API |
| 11 | OpenAI text-embedding-3-large | OpenAI | 3072 (configurable) | - | 8,191 | Multilingual | API only |
| 12 | mxbai-embed-large | Mixedbread AI | 1024 | 335M | 512 | English-focused | Open-source |
| 13 | all-MiniLM-L6-v2 | Sentence-Transformers | 384 | 22M | 512 | English only | Open-source |
| 14 | AWS Titan V2 | AWS Bedrock | 1024 | - | 8,192 | Multilingual | API |
4.3. Vietnamese Language Quality
| Model | Vietnamese Explicit Support | MTEB Multilingual Rank | Vietnamese Quality Assessment |
|---|---|---|---|
| jina-embeddings-v3 | Top 30 languages | High | Excellent |
| BGE-M3 | 100+ languages (trained on 170+ lang data) | High | Excellent |
| Cohere Embed v4 | 100+ languages | #1 multilingual | Good |
| multilingual-e5-large | Strong on Mr. TyDi multilingual | Good | Good |
| gte-multilingual-base | 70+ languages | SOTA multilingual | Good |
| gemini-embedding-001 | 100+ languages | #1 MTEB multilingual | Good |
| nomic-embed-text-v2-moe | ~100 languages | Good | Moderate |
| OpenAI text-embedding-3-small/large | Multilingual but not optimized | Good | Moderate |
| Typesense distiluse-multilingual | Multilingual (older model) | Moderate | Moderate |
| Voyage voyage-3.5 | Multilingual (not highlighted) | Good | Moderate |
| mxbai-embed-large | English-focused | N/A | Poor |
| all-MiniLM-L6-v2 | English only | N/A | Poor |
4.4. Cost Analysis
| Model | 100K docs (~50M tokens) | 1M docs (~500M tokens) | 10M docs (~5B tokens) | Self-Hosted Option |
|---|---|---|---|---|
| BGE-M3 (self-hosted) | $0 (infra only) | $0 (infra only) | $0 (infra only) | Yes (Ollama, TEI) |
| Typesense built-in | $0 | $0 | $0 | Yes (built-in) |
| multilingual-e5-large | $0 (infra only) | $0 (infra only) | $0 (infra only) | Yes (Ollama, TEI) |
| gte-multilingual-base | $0 (infra only) | $0 (infra only) | $0 (infra only) | Yes (HF, TEI) |
| OpenAI 3-small | $1.00 | $10.00 | $100.00 | No |
| OpenAI 3-large | $6.50 | $65.00 | $650.00 | No |
| Cohere Embed v4 | $6.00 | $60.00 | $600.00 | No |
| gemini-embedding-001 | $7.50 (free tier available) | $75.00 | $750.00 | No |
| Voyage voyage-3.5 | $3.00 (first 200M free) | $18.00 | $300.00 | No |
| jina-embeddings-v3 | Token-based (free trial 10M) | Token-based | Token-based | Partial |
Assumes ~500 tokens per product document (title + description + category in both en + vi).
Self-Hosted Infrastructure Cost:
| Setup | Monthly Cost | Throughput | Best For |
|---|---|---|---|
| Ollama on CPU (4-core, 8GB) | ~$30-50/month VPS | ~50-100 docs/sec | Small-medium catalog |
| HuggingFace TEI on CPU | ~$50-80/month VPS | ~100-200 docs/sec | Medium catalog |
| HuggingFace TEI on GPU (T4) | ~$150-300/month | ~500-2,000 docs/sec | Large catalog |
4.5. Self-Hosting Options for TypeScript/Bun
| Runtime | Models | Bun Compatible | Latency | Best For |
|---|---|---|---|---|
| Ollama (Docker) | bge-m3, nomic-embed, mxbai-embed | Yes (REST API) | 10-30ms/embed | Production. Simple Docker, model management, REST API |
| HuggingFace TEI (Docker) | Any HF model (bge-m3, e5, gte) | Yes (REST API) | 5-20ms/embed | Production. Token batching, Flash Attention, Prometheus metrics |
| Transformers.js (in-process) | Smaller models (<150M params) | Yes (v4) | 20-50ms/embed | Prototyping. CPU only, large models are slow |
| ONNX Runtime (native addon) | ONNX-exported models | Partial | <10ms/embed (int8) | Performance-critical. Complex setup |
4.6. Embedding Strategy for BANA Products
// Recommended: Concatenate relevant fields for embedding
function createProductEmbeddingText(product: TProduct): string {
const parts = [
// Primary: product name (both languages, highest weight)
product.name?.en, product.name?.vi,
// Secondary: description
product.description?.en, product.description?.vi,
// Tertiary: structured attributes
product.category?.name?.en, product.category?.name?.vi,
product.merchant?.name?.en,
// Variant info
...product.variants?.map(v => `${v.name?.en || ''} ${v.name?.vi || ''}`),
// Identifiers (for exact match fallback)
product.sku, product.barcode,
].filter(Boolean);
return parts.join(' | ');
}Key principles:
- Most important fields (name) first — models weight early tokens more
- Separator (
|) between fields to prevent semantic bleeding - Both language versions for cross-language search
- Keep total under 8,192 tokens for BGE-M3 (product data is typically <200 tokens)
- Hash the embedding text; skip re-embedding if hash unchanged
4.7. Scoring Matrix
| Criteria (Weight) | BGE-M3 | jina-v3 | TS built-in | Cohere v4 | OpenAI 3-small | e5-multi | gte-multi |
|---|---|---|---|---|---|---|---|
| Vietnamese Quality (25%) | 9 | 10 | 6 | 8 | 6 | 8 | 8 |
| English Quality (15%) | 8 | 8 | 6 | 9 | 8 | 7 | 8 |
| Self-Hosted Option (15%) | 10 | 8 | 10 | 3 | 3 | 9 | 9 |
| Cost Efficiency (15%) | 10 | 7 | 10 | 6 | 9 | 10 | 10 |
| Bun/TS Compatibility (10%) | 8 | 8 | 10 | 8 | 9 | 7 | 7 |
| Dimensions/Storage (10%) | 8 | 9 | 7 | 9 | 9 | 8 | 7 |
| Context Length (10%) | 9 | 9 | 7 | 10 | 8 | 6 | 9 |
| Weighted Total | 9.00 | 8.60 | 7.55 | 7.15 | 6.90 | 8.05 | 8.15 |
4.8. Embedding Model Verdict
BGE-M3 (Score: 9.00) — WINNER
- 100+ languages including Vietnamese, trained on 170+ language data
- Triple retrieval: Dense + Sparse (BM25-like) + ColBERT multi-vector in one model
- 1024 dimensions, 8,192 token context, 568M parameters
- Self-hosted via Ollama:
ollama pull bge-m3(1.2GB), REST API atlocalhost:11434 - Also via HuggingFace TEI: Production-grade Docker serving with Prometheus metrics
- Cost: $0 per token (self-hosted), ~$30-80/month infrastructure
jina-embeddings-v3 (Score: 8.60) — RUNNER-UP
- Vietnamese explicitly in top 30 best-performing languages (best Vietnamese quality)
- 89 languages, 570M parameters, 1024 dimensions (configurable 32-1024)
- Task-specific LoRA adapters: retrieval, text-matching, classification
- Caveat: API pricing is token-based; self-hosting options are more limited
Typesense Built-in distiluse-multilingual (Score: 7.55) — QUICK START
- Zero configuration — just add
embedfield to collection schema - 512 dimensions, multilingual
- Vietnamese quality: Moderate (older model, not specifically optimized)
- Best for: Phase 1 quick start, validate hybrid search before investing in better models
5. Architecture Evaluation
5.1. Candidate Architectures
Option A: Typesense-Only (Simplest)
| Aspect | Assessment |
|---|---|
| Effort | 1-2 weeks |
| New infrastructure | None |
| Monthly cost | $0 additional |
| Vietnamese quality | Moderate (distiluse-multilingual) |
| Hybrid search quality | Excellent (native rank fusion) |
| Search latency | <50ms |
Option B: Typesense + Ollama BGE-M3 (Best Balance)
| Aspect | Assessment |
|---|---|
| Effort | 3-4 weeks |
| New infrastructure | 1 Docker container (Ollama, ~2GB RAM) |
| Monthly cost | ~$30-80 (Ollama on existing server or small VPS) |
| Vietnamese quality | Excellent (BGE-M3, 100+ languages) |
| Hybrid search quality | Excellent (Typesense rank fusion with high-quality embeddings) |
| Search latency | <50ms search + 10-30ms embedding (cached: 5ms) |
Option C: Typesense + pgvector + Ollama (Full Power)
| Aspect | Assessment |
|---|---|
| Effort | 4-6 weeks |
| New infrastructure | 1 Docker container (Ollama) + pgvector extension |
| Monthly cost | ~$30-80 |
| Capabilities | Hybrid search + recommendations + "more like this" + SQL analytics |
| Search latency | <50ms search, <100ms recommendations |
Option D: Cloud API Only (Lowest Effort)
| Aspect | Assessment |
|---|---|
| Effort | 1 week |
| New infrastructure | None |
| Monthly cost | $5-50 API costs (scales with data volume) |
| Vietnamese quality | Good (OpenAI) to Excellent (Cohere v4) |
| Vendor dependency | High (API key, rate limits, pricing changes) |
5.2. Architecture Scoring
| Criteria (Weight) | Option A | Option B | Option C | Option D |
|---|---|---|---|---|
| Vietnamese Quality (25%) | 6 | 10 | 10 | 7 |
| Integration Effort (20%) | 10 | 7 | 5 | 10 |
| Operational Cost (15%) | 10 | 8 | 8 | 6 |
| Feature Completeness (15%) | 6 | 8 | 10 | 6 |
| Data Sovereignty (10%) | 10 | 10 | 10 | 3 |
| Search Quality (10%) | 7 | 9 | 9 | 8 |
| Maintenance (5%) | 10 | 7 | 6 | 9 |
| Weighted Total | 7.85 | 8.55 | 8.15 | 7.05 |
6. Hybrid Search Design
6.1. How Hybrid Search Works
Hybrid search runs keyword (BM25) and semantic (vector) searches in parallel, then fuses results using Reciprocal Rank Fusion (RRF):
RRF_score(document) = SUM( 1 / (rank_i + k) )Where rank_i is the document's position in each result list, and k is a smoothing constant (typically 60).
Typesense implements this natively via the alpha parameter:
alpha: 0.0= pure keyword searchalpha: 0.3= 70% keyword, 30% semantic (recommended start)alpha: 0.5= equal blendalpha: 1.0= pure semantic search
6.2. When Semantic Search is WORSE Than Keyword
This is critical for a POS system:
| Query Type | Best Search Mode | Why |
|---|---|---|
SKU codes (SKU-2847-B) | Keyword only | Semantic may confuse similar codes |
Barcodes (8935049001234) | Keyword only | Exact match required |
Model numbers (iPhone 15 Pro Max 256GB) | Keyword-dominant | Semantic may confuse similar models |
Price lookups (price < 500000) | Filter only | Not a search problem |
Natural language (cheap wireless headphones) | Hybrid | Semantic understands intent |
Synonym queries (running shoes → sneakers) | Semantic-dominant | Keyword won't match synonyms |
Vietnamese queries (tai nghe khong day) | Hybrid | Semantic handles Vietnamese well |
6.3. Query Classification Strategy
6.4. Re-ranking (Optional, Phase 3+)
Two-stage retrieval improves quality by 20-35% but adds 200-500ms latency:
- Stage 1 (Retrieve): Typesense hybrid search returns top 50-100 candidates (~20ms)
- Stage 2 (Re-rank): Cross-encoder scores query + each candidate pair for precise relevance
| Re-ranker | Latency (top 20) | Quality | Cost |
|---|---|---|---|
| Cohere Rerank 4 | ~200ms | Best | $2/1K queries |
| BGE-reranker-v2-m3 | ~150ms | Very good, multilingual | Free (self-hosted) |
| ms-marco-MiniLM-L-6-v2 | ~100ms | Good | Free (self-hosted) |
Recommendation: Skip re-ranking initially. Add it in Phase 3+ only if search quality metrics indicate a need.
7. Data Pipeline Design
7.1. Embedding Pipeline (Option B Architecture)
7.2. Embedding Invalidation
Fields that trigger re-embedding vs. fields that don't:
| Trigger Re-embedding | Skip Re-embedding |
|---|---|
| Product name (en, vi) | Price changes |
| Product description (en, vi) | Stock quantity |
| Category assignment | Sort order |
| Brand / key attributes | Internal flags |
| Variant names | Status (filtered, not embedded) |
Implementation: SHA-256 hash of embedding-relevant fields. Compare hash on CDC event; skip re-embedding if unchanged.
7.3. BullMQ Queue Design
Follows the existing BANA BullMQ pattern (same as mq-pay and finance):
Queue: embedding:generation:P01, P02, P03 (3 partitions for load distribution)
Queue: embedding:indexing:P01, P02, P03
Concurrency: 10 jobs per partition (configurable)
Retry: Exponential backoff, max 3 retries
Dead Letter Queue: embedding:generation:failed8. Memory Estimation
8.1. HNSW Index Memory Formula
Memory = 1.1 * (4 * dimensions + 8 * m) * num_vectors bytesWhere m is HNSW max bi-directional links (typically 16).
8.2. BANA Projections
| Scale | Products | Dimensions | HNSW Memory | Typesense Total (+ model) | pgvector Total |
|---|---|---|---|---|---|
| Small | 10,000 | 1024 (BGE-M3) | ~60 MB | ~2.1 GB (+ 2GB model) | ~60 MB |
| Medium | 100,000 | 1024 | ~600 MB | ~2.6 GB (+ 2GB model) | ~600 MB |
| Large | 1,000,000 | 1024 | ~6 GB | ~8 GB (+ 2GB model) | ~6 GB |
| Small (halfvec) | 100,000 | 1024 | ~300 MB | N/A | ~300 MB |
Note: These estimates are for the products collection only. Multiply by ~1.3x for all 6 collections (other collections are smaller).
9. Decision
9.1. Selected: Option B — Typesense + Ollama BGE-M3 (Score: 8.55)
Selected Vector Database: Typesense (vector upgrade to existing deployment) Selected Embedding Model: BGE-M3 via Ollama (self-hosted) Architecture: Typesense hybrid search with pre-computed BGE-M3 embeddings via BullMQ pipeline
9.2. Rationale
- Leverages existing infrastructure — Typesense already deployed with CDC sync. No new database services required.
- Best Vietnamese + English support — BGE-M3 handles 100+ languages including Vietnamese at excellent quality. Self-hosted means no API dependency.
- Zero per-token costs — Self-hosted embedding generation. Predictable infrastructure cost (~$30-80/month for Ollama).
- Production-proven hybrid search — Typesense rank fusion combines keyword + semantic search natively. Configurable
alphaparameter for tuning. - Fits existing BullMQ pattern — Embedding pipeline uses the same queue architecture as finance/payment packages (3 partitions, configurable concurrency).
- Data sovereignty — All data and models stay on your servers. No external API calls for embeddings.
- Incremental migration — Can start with Phase 1 (Typesense built-in model) in 1-2 weeks, then upgrade to BGE-M3 in Phase 2.
9.3. Implementation Phases
| Phase | Duration | Description | Impact |
|---|---|---|---|
| Phase 1 | Week 1-2 | Add Typesense built-in embeddings (ts/distiluse-multilingual) to product collection. Enable hybrid search with alpha: 0.3. Test with existing CDC pipeline. Query classification for SKU vs natural language. | Immediate semantic search. $0 cost. |
| Phase 2 | Week 3-4 | Deploy Ollama + BGE-M3 Docker container. Build BullMQ embedding pipeline (generation + indexing). Replace built-in model with pre-computed BGE-M3 float[] vectors. Tune alpha parameter. | Superior Vietnamese search quality. |
| Phase 3 | Week 5-6 | Extend to all 6 collections. Add query embedding cache (Redis). Implement search analytics (CTR, zero-results rate). Add "related products" using vector similarity. | Full semantic search across all entities. |
| Phase 4 | Future | Add pgvector for SQL-level recommendations. Add cross-encoder re-ranking for high-value queries. Implement "did you mean?" suggestions. Fine-tune embedding model if needed. | Advanced features. |
9.4. Cost Summary
| Component | Status | Monthly Cost |
|---|---|---|
| Typesense (existing) | Already running | $0 |
| Ollama BGE-M3 (Docker on existing server or $30 VPS) | New | $0-30 |
| Redis embedding cache (existing Redis DB) | Already running | $0 |
| BullMQ workers (existing Bun infrastructure) | Reuse pattern | $0 |
| Kafka CDC pipeline (existing) | Already running | $0 |
| Total | $0-30/month |
10. PostgreSQL Full-Text Search Addendum
10.1. Research Motivation
PostgreSQL offers built-in Full-Text Search (FTS) capabilities, and several extensions promise BM25 scoring within PostgreSQL itself. This addendum evaluates whether PostgreSQL-native search could replace or complement the Typesense + Ollama BGE-M3 architecture selected in Section 9.
Key questions investigated:
- Can PostgreSQL provide BM25 scoring natively?
- How well does PostgreSQL FTS handle Vietnamese text?
- How does Drizzle ORM integrate with PostgreSQL FTS?
- Should PostgreSQL FTS replace or complement Typesense?
10.2. BM25 Extensions Comparison
Native PostgreSQL FTS uses ts_rank (a TF-IDF variant), not BM25. Several extensions add BM25 scoring:
| Extension | BM25 | License | WAL Safety | Vietnamese | Drizzle ORM | Production Status |
|---|---|---|---|---|---|---|
| Native FTS | No (TF-IDF) | PostgreSQL | Yes | Custom config | Full | Decades-mature |
| ParadeDB pg_search | Yes | AGPL-3.0 | No (community) | ICU only | Raw SQL only | Active ($12M funded) |
| Timescale pg_textsearch | Yes | PostgreSQL | Unknown | Unknown | Raw SQL only | Preview only |
| VectorChord-BM25 | Yes | Apache-2.0 | Unknown | Unknown | Raw SQL only | Very new |
| PGroonga | No (Groonga) | PostgreSQL | Yes | Native | Raw SQL only | 10+ years mature |
| pg_trgm | No (similarity) | PostgreSQL | Yes | Works with unaccent | Full | Decades-mature |
10.2.1. ParadeDB pg_search — SKIP
ParadeDB builds BM25 search into PostgreSQL via the Tantivy engine (Rust, Lucene-inspired). Performance is 20-1000x faster than native FTS.
Critical concerns for BANA:
- AGPL-3.0 license — If you modify pg_search and serve it over a network (SaaS/POS), you may need to release modifications. Requires legal review for commercial use.
- No WAL support in community edition — PostgreSQL crash = full BM25 index rebuild. Enterprise edition adds WAL but requires paid license.
- One BM25 index per table — Cannot create multiple BM25 indexes with different configurations on the same table.
- No Vietnamese tokenizer — Tantivy offers ICU tokenization (Unicode word boundaries) but no linguistically-aware Vietnamese compound word segmentation.
- No Drizzle ORM integration — All queries via raw SQL (
db.execute(sql\...`)`).
10.2.2. Timescale pg_textsearch — WATCH
This is the extension originally asked about. Developed by Timescale with a PostgreSQL-compatible license (permissive). However, it is in preview status — not suitable for production deployment yet. Worth monitoring for future GA release.
10.2.3. PGroonga — CONSIDER
PGroonga (Groonga-based, v4.0.5) is the only extension that natively supports Vietnamese out of the box, handling all languages including CJK and Vietnamese without custom configuration. Available on Supabase as a built-in extension. Trade-off: adds Groonga as an external dependency.
10.3. Vietnamese Search in PostgreSQL
10.3.1. The Vietnamese Challenge
Vietnamese text presents unique challenges for search:
| Challenge | Example | Impact |
|---|---|---|
| Tonal diacritics | à á ả ã ạ are all variants of a | Users may search without diacritics |
| Double-composed characters | ế = circumflex + acute (2 marks) | Character normalization required |
| Compound words | "Hà Nội" = 2 syllables, 1 word | Space-based tokenization breaks compound words |
| No built-in PG config | Vietnamese not in Snowball stemmers | Custom text search configuration needed |
10.3.2. The unaccent Solution
Since PostgreSQL 11, the default unaccent.rules includes all Vietnamese characters:
CREATE EXTENSION IF NOT EXISTS unaccent;
-- Test: all Vietnamese diacritics handled
SELECT unaccent('cà phê điện thoại'); -- → 'ca phe dien thoai'
SELECT unaccent('Hà Nội thủ đô'); -- → 'Ha Noi thu do'Characters handled: ắ ằ ẳ ẵ ặ → a, ế ề ể ễ ệ → e, ớ ờ ở ỡ ợ → o, ứ ừ ử ữ ự → u, đ → d, and all tonal combinations.
Important: unaccent() is STABLE (not IMMUTABLE), so an immutable wrapper is required for indexes and generated columns:
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS $$
SELECT public.unaccent('public.unaccent', $1);
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT;10.3.3. Custom Vietnamese Text Search Configuration
CREATE TEXT SEARCH CONFIGURATION vietnamese (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION vietnamese
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH unaccent, simple;This enables Vietnamese FTS where "ca phe" matches "cà phê" via diacritic normalization.
10.3.4. Vietnamese Word Segmentation Tools
| Tool | Language | Speed | PostgreSQL Extension? |
|---|---|---|---|
| CocCoc Tokenizer | C++ | Very fast | No |
| Underthesea | Python | Good | No |
| VnCoreNLP | Java | Good | No |
| PGroonga/Groonga | C | Fast | Yes |
None of the Vietnamese NLP tokenizers have native PostgreSQL extensions (except PGroonga). For compound word segmentation, application-layer preprocessing is required.
10.3.5. ICU Accent-Insensitive Collation (PostgreSQL 12+)
CREATE COLLATION vi_accent_insensitive (
provider = icu,
deterministic = false,
locale = 'vi-u-ks-level1'
);This provides accent-insensitive comparisons at the database level but cannot be used with LIKE patterns (only equality =) and has a performance penalty.
10.4. Drizzle ORM Integration
10.4.1. Custom tsvector Type
Drizzle ORM does not have native tsvector support. A custom type definition is required:
import { customType } from 'drizzle-orm/pg-core';
export const tsvector = customType<{ data: string }>({
dataType() {
return 'tsvector';
},
});10.4.2. Schema with Generated Column + GIN Index
import { SQL, sql } from 'drizzle-orm';
import { index, pgTable, text, bigint } from 'drizzle-orm/pg-core';
export const products = pgTable(
'products',
{
id: bigint('id', { mode: 'bigint' }).primaryKey(),
nameVi: text('name_vi').notNull(),
nameEn: text('name_en'),
nameSearch: tsvector('name_search')
.generatedAlwaysAs(
(): SQL =>
sql`setweight(to_tsvector('vietnamese',
f_unaccent(coalesce(${products.nameVi}, ''))), 'A') ||
setweight(to_tsvector('english',
coalesce(${products.nameEn}, '')), 'B')`,
),
},
(t) => [
// GIN index for full-text search
index('idx_products_name_search').using('gin', t.nameSearch),
// GIN trigram index for fuzzy search
index('idx_products_name_vi_trgm').using(
'gin',
sql`f_unaccent(lower(${t.nameVi}))`.op('gin_trgm_ops'),
),
],
);Key constraint: The generation expression must use only IMMUTABLE functions. Both to_tsvector() (with explicit regconfig 'vietnamese') and f_unaccent() (our custom wrapper) are immutable.
10.4.3. Query Patterns with Drizzle
// Full-text search with ranking
const query = sql`plainto_tsquery('vietnamese', f_unaccent(${searchTerm}))`;
const results = await db
.select({
id: products.id,
nameVi: products.nameVi,
rank: sql<number>`ts_rank(${products.nameSearch}, ${query})`.as('rank'),
})
.from(products)
.where(sql`${products.nameSearch} @@ ${query}`)
.orderBy(desc(sql`ts_rank(${products.nameSearch}, ${query})`));
// Fuzzy trigram search (typo-tolerant fallback)
const fuzzy = await db
.select({
id: products.id,
sim: sql<number>`similarity(
f_unaccent(lower(${products.nameVi})),
f_unaccent(lower(${term}))
)`.as('sim'),
})
.from(products)
.where(sql`f_unaccent(lower(${products.nameVi})) % f_unaccent(lower(${term}))`)
.orderBy(desc(sql`similarity(...)`));10.4.4. Migration in BANA (IGNIS Pattern)
// src/migrations/processes/search-0001-setup-vietnamese-fts.ts
export const process: TMigrationProcess = {
name: 'search-0001-setup-vietnamese-fts',
handler: async ({ connector }) => {
await connector.execute(sql`CREATE EXTENSION IF NOT EXISTS unaccent`);
await connector.execute(sql`CREATE EXTENSION IF NOT EXISTS pg_trgm`);
await connector.execute(sql`
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS $$
SELECT public.unaccent('public.unaccent', $1);
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
`);
await connector.execute(sql`
CREATE TEXT SEARCH CONFIGURATION IF NOT EXISTS vietnamese (COPY = simple)
`);
await connector.execute(sql`
ALTER TEXT SEARCH CONFIGURATION vietnamese
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH unaccent, simple
`);
},
};10.5. Practical Architecture: Multi-Layer Search
For BANA, the recommended approach is multi-layer search with graceful degradation:
| Layer | Engine | Best For | Latency |
|---|---|---|---|
| Primary | Typesense (hybrid) | Product catalog, high-volume search, faceted search | <50ms |
| Complement | PostgreSQL FTS | Real-time consistent queries, inline CRUD search, small collections | 10-50ms |
| Fallback | PostgreSQL pg_trgm | Fuzzy/typo-tolerant search, Vietnamese diacritic-insensitive | 10-50ms |
10.6. Impact on Original Decision
The original recommendation (Option B: Typesense + Ollama BGE-M3) remains unchanged. PostgreSQL FTS is a valuable complement, not a replacement.
| Aspect | Typesense (Primary) | PostgreSQL FTS (Complement) |
|---|---|---|
| BM25 ranking | Built-in | No (TF-IDF only via ts_rank) |
| Semantic/Vector search | Built-in (with BGE-M3 embeddings) | No (requires pgvector separately) |
| Vietnamese diacritics | ICU tokenization | unaccent extension (full support) |
| Typo tolerance | Built-in | Via pg_trgm extension |
| Faceted search | Native | No |
| Drizzle ORM integration | N/A (separate service) | Full (generated columns, GIN indexes) |
| Real-time consistency | CDC delay (~1-5s) | Instant (same transaction) |
| Operational cost | Already deployed ($0) | Already deployed ($0) |
When to use PostgreSQL FTS over Typesense:
- Sale order search within a single merchant context (real-time consistency matters)
- Finance transaction search (small dataset, transactional queries)
- Inline search filters within existing CRUD controllers
- Fallback when Typesense is temporarily unavailable
BM25 extensions verdict:
- ParadeDB: Skip (AGPL license + no WAL in community edition)
- pg_textsearch: Watch (promising PostgreSQL license, but still preview)
- PGroonga: Consider for best Vietnamese support (if you need compound word segmentation)
- Native FTS + pg_trgm: Use now ($0 cost, full Drizzle integration)
11. Comparison with Alternatives Not Selected
| Alternative | Why Not Selected |
|---|---|
| Elasticsearch/OpenSearch | 128-256GB RAM cluster. JVM operational overhead. Overkill for POS scale. |
| Vespa | No TypeScript SDK. XML-based config. Internet-scale complexity. |
| Pinecone | Cloud-only. No self-hosted. Consumption pricing grows with usage. Vendor lock-in. |
| ChromaDB | Single-node. 60GB RAM for 10M vectors. No multi-tenancy. Not production-ready. |
| Milvus | Requires etcd + MinIO. Disproportionate complexity for <1M products. |
| Weaviate | Multi-container Compose. 8-16GB RAM minimum. Steeper learning curve. |
| Meilisearch | Single-node. RAM = 35x data size. No horizontal scaling. |
| Pure Cloud API (OpenAI/Cohere) | Ongoing per-token costs. API dependency. Rate limits. Less control. |
12. Related Documentation
| Document | Description |
|---|---|
| Search | Search library identity card + catalog |
| Search — Architecture | CDC pipeline + query path |
| Search — Domain Model | Typesense collections |
| Commerce Package | Product search integration (host) |
| Core Database | PostgreSQL schemas and models |
13. References
Vector Databases
- Typesense Vector Search Documentation
- Typesense Semantic Search Guide
- pgvector GitHub
- Drizzle ORM pgvector Integration
- pgvector 0.8.0 on Aurora PostgreSQL
- Redis Vector Search
- Redis Benchmarking Results
- Qdrant Vector Database
- Milvus Multi-Tenancy
- Weaviate Hybrid Search
- Meilisearch AI-Powered Search
Embedding Models
- BGE-M3 on HuggingFace
- Jina Embeddings v3
- OpenAI Embeddings
- Cohere Embed v4
- Gemini Embedding
- Voyage AI Embeddings
- VN-MTEB Vietnamese Benchmark
- Ollama Embedding Models
- HuggingFace TEI
- Transformers.js v4
PostgreSQL Full-Text Search
- PostgreSQL Documentation: Full Text Search
- PostgreSQL Documentation: unaccent
- PostgreSQL Documentation: pg_trgm
- ParadeDB: BM25 in PostgreSQL
- ParadeDB Documentation
- PGroonga: Multilingual Full Text Search
- PGroonga on Supabase
- Vietnamese FTS on PostgreSQL (blog.tuando.me)
- Vietnamese PostgreSQL FTS Configuration (GitHub Gist)
- CocCoc Vietnamese Tokenizer
- Drizzle ORM: PostgreSQL Full-Text Search
- Drizzle ORM: Full-Text Search with Generated Columns
- Drizzle ORM: Indexes & Constraints
- PostgreSQL Accent-Insensitive Collations
- Postgres Text Search: Full Text vs Trigram