Skip to content

Semantic Search Selection Report

1. Overview

This report evaluates semantic search solutions for the BANA POS system. The goal is to upgrade the existing keyword-based Typesense search to support semantic understanding — enabling intent-based queries, synonym matching, multilingual search (Vietnamese + English), product recommendations, and "more like this" features.

Current State: Typesense v2.1.0 with Kafka CDC (Debezium) sync from PostgreSQL. 6 indexed collections: products, merchants, organizers, categories, devices, sale-channels.

Target State: Hybrid search combining keyword (BM25) + semantic (vector/embedding) search across all collections with excellent Vietnamese + English support.

Evaluation Date: February 2026

2. Current Search Architecture

2.1. Data Flow

2.2. Indexed Collections

CollectionEntityQuery ByMultilingual
productsProductidentifier, info.name.en, info.name.vi, info.description.en, info.description.viYes
merchantsMerchantname.en, name.vi, slug, identifier, status, type, currencyYes
organizersOrganizername.en, name.vi, description.en, description.viYes
categoriesCategoryname.en, name.viYes
devicesDevicename, identifier, statusNo
sale-channelsSaleChannelname.en, name.viYes

2.3. Infrastructure Context

ComponentVersionStatusRelevance to Semantic Search
PostgreSQL18 AlpineRunningSource of truth. pgvector extension available
Typesense2.1.0RunningAlready deployed. Supports native vector search
RedisLatestRunningCache (DB 0), BullMQ (DB 0), Pub/Sub (DB 0), WebSocket (DB 4). Redis 8+ supports vector search
Kafka + Debezium3.x + 2.4RunningCDC pipeline for real-time sync
Drizzle ORM0.45.1RunningNative pgvector support (vector(), cosineDistance())
Bun>= 1.3.2RunningRuntime. Near-100% Node.js compatibility
BullMQ5.14.3RunningQueue infrastructure (finance, payment). Reusable for embedding pipeline

2.4. Capabilities Gap

CapabilityCurrent (Keyword)Target (Semantic)
Exact match (SKU, barcode)YesYes (keyword path)
Typo toleranceYesYes
Synonym understandingNoYes
Intent-based queriesNoYes
"More like this"NoYes
Natural language queriesNoYes
Vietnamese semantic understandingNoYes
Product recommendationsNoYes
Cross-language search (en query → vi results)NoYes

3. Vector Database Evaluation

3.1. Candidates

#SolutionTypeLanguageLicenseGitHub StarsStatus in BANA
1TypesenseSearch engine + VectorC++GPL-322k+Already deployed
2PostgreSQL pgvectorExtensionCPostgreSQL14k+DB already deployed
3Redis StackIn-memory DB + VectorCRSALv267k+Already deployed
4QdrantPurpose-built vector DBRustApache-223k+Not deployed
5MeilisearchSearch engine + VectorRustMIT49k+Not deployed
6WeaviateVector DB + VectorizersGoBSD-314k+Not deployed
7LanceDBEmbedded vector DBRustApache-25k+Not deployed
8Milvus/ZillizDistributed vector DBGo/C++Apache-240k+Not deployed
9ElasticsearchSearch engine + VectorJavaSSPL/ELv272k+Not deployed
10OpenSearchSearch engine + VectorJavaApache-210k+Not deployed
11PineconeManaged vector DB-ProprietaryN/ANot deployed
12ChromaDBEmbedding DBPython/RustApache-218k+Not deployed
13VespaSearch platformJava/C++Apache-26k+Not deployed

3.2. Feature Matrix

FeatureTypesensepgvectorRedis StackQdrantMeilisearchWeaviateMilvusElasticsearch
Vector Index (HNSW)YesYesYesYesYes (Arroy)YesYesYes (Lucene)
Hybrid SearchNative (rank fusion)Manual (SQL)KNN + filterDense+sparse fusionsemanticRatioAlpha-blendNativeRetriever API
Built-in EmbeddingsYes (S-BERT, E5)No (BYO)No (BYO)No (BYO)Yes (OpenAI, HF)Yes (20+ modules)No (BYO)Yes (ELSER)
External Embedding APIOpenAI, PaLMN/AN/AN/AOpenAI, Cohere, HFOpenAI, Cohere, HFN/AOpenAI, HF
BYO VectorsYes (float[])Yes (vector)Yes (float32/64)YesYesYesYesYes
Metadata FilteringFullFull (SQL)10 attrs maxFull (payload)FacetedFull (where)FullFull
Faceted SearchYesNoNoNoYesNoNoYes
Multi-tenancyFilter-basedRLS/WHERETag-basedTiered shardsTenant tokensShard-per-tenantDB/collection/partitionIndex-per-tenant
TypeScript SDKExcellentDrizzle nativeExisting clientOfficial RESTGoodv3 clientNode.js SDKOfficial
Bun CompatibilityConfirmedConfirmedConfirmedExpectedExpectedExpectedExpectedExpected
Self-hosted DockerYesExtensionYesYesYesYes (Compose)Yes (+ etcd)Yes
Cloud-onlyNoNoNoNoNoNoNoNo

3.3. Performance Benchmarks

SolutionRPS (1M vectors)P50 LatencyP99 LatencyMemory (100K x 1024d)Insert Rate
Typesense~5K-10K<10ms<50ms~600MB + 2-6GB modelReal-time (CDC)
pgvector~471 QPS (99% recall)<50ms<100ms~600MB (HNSW)Real-time (SQL)
Redis Stack~3.4x Qdrant~3ms<10ms~600MB (in-memory)~15K vec/sec
QdrantHigh~31ms~37ms~600MB (quantizable to ~60MB)~50K vec/sec
Meilisearch~5K-10K<50ms<100ms~35x data size (RAM!)Batch
Weaviate~1.7x less than Redis~10ms<50ms~3GB per 1M vectorsModerate
MilvusHighest open-source<10ms<30ms~600MB~50K vec/sec
Elasticsearch~5x faster than OpenSearch<50ms<100msRequires 128-256GB RAM clusterModerate

3.4. Cost Analysis (Self-Hosted, Monthly)

SolutionNew InfrastructureRAM OverheadOperational ComplexityMonthly Cost
Typesense (upgrade)None+2-6GB (embedding model)None (already running)$0
pgvectorNone (extension)+600MB (HNSW index)Minimal (ALTER TABLE)$0
Redis Stack (upgrade to v8+)None+600MB (vector index)Minimal (existing Redis)$0
Qdrant1 Docker container2-8GBModerate (new service)$50-150
Meilisearch1 Docker container4-16GB (heavy!)Moderate$50-200
WeaviateDocker Compose (multi-container)8-16GBHigh (modules, config)$100-300
MilvusDocker + etcd + MinIO8-16GBHigh (distributed)$150-400
ElasticsearchDocker cluster (3+ nodes)128-256GBVery High (JVM, ops)$300-1000+

3.5. Critical Disqualifiers for BANA

CandidateDisqualifierSeverity
ElasticsearchRequires 128-256GB RAM cluster. JVM-based. Massive operational overhead. Overkill for POS search.CRITICAL
OpenSearchSame as Elasticsearch. 5x slower than ES for vector search.CRITICAL
VespaNo TypeScript/JavaScript SDK. XML-based config. Internet-scale complexity, completely overkill.CRITICAL
PineconeCloud-only, no self-hosted option. Consumption-based pricing (expensive at scale). Vendor lock-in.HIGH
ChromaDBSingle-node only. In-memory HNSW (10M x 1536d = 60GB RAM). No multi-tenancy. Not production-ready at scale.HIGH
MilvusRequires etcd + MinIO for distributed mode. Disproportionate complexity for POS scale (<1M products).MEDIUM
WeaviateMulti-container Docker Compose. 8-16GB RAM. Complex module configuration. Steeper learning curve.MEDIUM
MeilisearchSingle-node architecture. RAM usage = 35x data size. No horizontal scaling.MEDIUM

3.6. Scoring Matrix

Weighted scoring (1-10, higher = better for BANA):

Criteria (Weight)TypesensepgvectorRedis StackQdrantMeilisearchWeaviateMilvus
Integration Fit (25%)101095443
Hybrid Search (20%)9678898
TypeScript/Bun SDK (15%)101097876
Operational Cost (10%)101097765
Performance (10%)88109779
Multi-tenancy (5%)79687910
Scalability (5%)77695810
Vietnamese Support (10%)7777777
Weighted Total9.058.458.306.906.356.355.95

3.7. Finalist Analysis

Typesense Vector Upgrade (Score: 9.05) — WINNER

Why #1: Already deployed with CDC sync. Adding semantic search = adding an embed field to existing collection schemas. Zero new infrastructure, zero new services.

Strengths:

  • Native rank fusion hybrid search with configurable alpha parameter (0 = pure keyword, 1 = pure semantic)
  • Built-in embedding models: ts/all-MiniLM-L12-v2, ts/e5-small, ts/distiluse-base-multilingual-cased-v2
  • External API support: OpenAI, Google PaLM natively supported
  • BYO pre-computed embeddings via float[] fields
  • Existing Kafka CDC pipeline continues working unchanged
  • Typesense query_by supports mixing text fields and embedding fields in a single query
  • rerank_hybrid_matches: true for improved result quality
  • Cost: $0 additional

Weaknesses:

  • Built-in multilingual model (distiluse) is older, Vietnamese quality is moderate
  • Embedding model loading requires 2-6GB additional RAM
  • Not designed for billion-vector scale (adequate for POS at <10M products)

How hybrid search works in Typesense:

json
{
  "q": "laptop gaming",
  "query_by": "name_en,name_vi,description_en,embedding",
  "prefix": "true,true,true,false",
  "vector_query": "embedding:([], id: *, alpha: 0.3)",
  "filter_by": "status:=active && merchantId:=12345"
}

PostgreSQL pgvector (Score: 8.45) — COMPLEMENT

Why #2: Zero new infrastructure. Drizzle ORM has native pgvector support with vector() column type and distance functions (cosineDistance(), l2Distance(), innerProduct()).

Strengths:

  • Zero new services — just enable the pgvector extension on existing PostgreSQL
  • Drizzle ORM integration is seamless:
    typescript
    import { vector, index } from 'drizzle-orm/pg-core';
    const products = pgTable('products', {
      embedding: vector('embedding', { dimensions: 1024 }),
    });
    // Query
    db.select().from(products).orderBy(cosineDistance(products.embedding, queryVector)).limit(10);
  • Full SQL filtering combined with vector search (merchantId, organizerId, category, etc.)
  • halfvec type for 50% storage savings with no recall loss
  • HNSW index for sub-100ms queries at <10M vectors
  • pgvectorscale achieves 471 QPS at 99% recall on 50M vectors
  • Real-time indexing (vectors indexed on INSERT/UPDATE)

Weaknesses:

  • No built-in rank fusion — hybrid search (keyword + vector) must be implemented at application level
  • No built-in embedding generation (BYO only)
  • HNSW index consumes significant memory for large datasets

Best for: Product recommendations, "more like this", analytical similarity queries, cross-service vector lookups.

Redis Stack Vector Search (Score: 8.30) — REAL-TIME

Why #3: Already running Redis for cache, BullMQ, and pub/sub. Redis 8+ includes vector search natively (previously required Redis Stack module).

Strengths:

  • 3ms P50 latency — fastest vector search among all candidates
  • Already deeply integrated (caching, queues, pub/sub, WebSocket)
  • Uses existing Redis client (node-redis)
  • Supports HNSW, FLAT, and SVS-VAMANA indexes
  • KNN + pre-filter on TAG, TEXT, NUMERIC, GEO fields
  • In-memory performance ideal for real-time POS checkout suggestions

Weaknesses:

  • Maximum 10 attributes per vector index
  • Everything in-memory — RAM cost grows linearly
  • No built-in embedding generation
  • Limited hybrid search (no true rank fusion)

Best for: Real-time POS similarity lookups ("customers also bought"), checkout suggestions, ultra-low-latency search cache.

4. Embedding Model Evaluation

4.1. Critical Requirement: Vietnamese + English

BANA uses i18n fields ({ en: string, vi: string }) for products, merchants, organizers, and categories. The embedding model must handle both Vietnamese and English effectively. Poor Vietnamese support is a disqualifier.

4.2. Candidates

#ModelProviderDimensionsParametersMax TokensLanguagesType
1BGE-M3BAAI1024568M8,192100+Open-source
2jina-embeddings-v3Jina AI1024 (32-1024)570M8,19289API + Self-host
3Typesense distiluse-multilingualTypesense built-in512~135M512MultilingualBuilt-in
4Cohere Embed v4Cohere1536 (configurable)-128K100+API only
5OpenAI text-embedding-3-smallOpenAI1536 (configurable)-8,191MultilingualAPI only
6multilingual-e5-largeMicrosoft1024560M512100+Open-source
7gte-multilingual-baseAlibaba768~300M8,19270+Open-source
8Google gemini-embedding-001Google3072 (configurable)-8,192100+API
9nomic-embed-text-v2-moeNomic AI768475M (305M active)8,192~100Open-source
10Voyage voyage-3.5Voyage AI1024-32KMultilingualAPI
11OpenAI text-embedding-3-largeOpenAI3072 (configurable)-8,191MultilingualAPI only
12mxbai-embed-largeMixedbread AI1024335M512English-focusedOpen-source
13all-MiniLM-L6-v2Sentence-Transformers38422M512English onlyOpen-source
14AWS Titan V2AWS Bedrock1024-8,192MultilingualAPI

4.3. Vietnamese Language Quality

ModelVietnamese Explicit SupportMTEB Multilingual RankVietnamese Quality Assessment
jina-embeddings-v3Top 30 languagesHighExcellent
BGE-M3100+ languages (trained on 170+ lang data)HighExcellent
Cohere Embed v4100+ languages#1 multilingualGood
multilingual-e5-largeStrong on Mr. TyDi multilingualGoodGood
gte-multilingual-base70+ languagesSOTA multilingualGood
gemini-embedding-001100+ languages#1 MTEB multilingualGood
nomic-embed-text-v2-moe~100 languagesGoodModerate
OpenAI text-embedding-3-small/largeMultilingual but not optimizedGoodModerate
Typesense distiluse-multilingualMultilingual (older model)ModerateModerate
Voyage voyage-3.5Multilingual (not highlighted)GoodModerate
mxbai-embed-largeEnglish-focusedN/APoor
all-MiniLM-L6-v2English onlyN/APoor

4.4. Cost Analysis

Model100K docs (~50M tokens)1M docs (~500M tokens)10M docs (~5B tokens)Self-Hosted Option
BGE-M3 (self-hosted)$0 (infra only)$0 (infra only)$0 (infra only)Yes (Ollama, TEI)
Typesense built-in$0$0$0Yes (built-in)
multilingual-e5-large$0 (infra only)$0 (infra only)$0 (infra only)Yes (Ollama, TEI)
gte-multilingual-base$0 (infra only)$0 (infra only)$0 (infra only)Yes (HF, TEI)
OpenAI 3-small$1.00$10.00$100.00No
OpenAI 3-large$6.50$65.00$650.00No
Cohere Embed v4$6.00$60.00$600.00No
gemini-embedding-001$7.50 (free tier available)$75.00$750.00No
Voyage voyage-3.5$3.00 (first 200M free)$18.00$300.00No
jina-embeddings-v3Token-based (free trial 10M)Token-basedToken-basedPartial

Assumes ~500 tokens per product document (title + description + category in both en + vi).

Self-Hosted Infrastructure Cost:

SetupMonthly CostThroughputBest For
Ollama on CPU (4-core, 8GB)~$30-50/month VPS~50-100 docs/secSmall-medium catalog
HuggingFace TEI on CPU~$50-80/month VPS~100-200 docs/secMedium catalog
HuggingFace TEI on GPU (T4)~$150-300/month~500-2,000 docs/secLarge catalog

4.5. Self-Hosting Options for TypeScript/Bun

RuntimeModelsBun CompatibleLatencyBest For
Ollama (Docker)bge-m3, nomic-embed, mxbai-embedYes (REST API)10-30ms/embedProduction. Simple Docker, model management, REST API
HuggingFace TEI (Docker)Any HF model (bge-m3, e5, gte)Yes (REST API)5-20ms/embedProduction. Token batching, Flash Attention, Prometheus metrics
Transformers.js (in-process)Smaller models (<150M params)Yes (v4)20-50ms/embedPrototyping. CPU only, large models are slow
ONNX Runtime (native addon)ONNX-exported modelsPartial<10ms/embed (int8)Performance-critical. Complex setup

4.6. Embedding Strategy for BANA Products

typescript
// Recommended: Concatenate relevant fields for embedding
function createProductEmbeddingText(product: TProduct): string {
  const parts = [
    // Primary: product name (both languages, highest weight)
    product.name?.en, product.name?.vi,
    // Secondary: description
    product.description?.en, product.description?.vi,
    // Tertiary: structured attributes
    product.category?.name?.en, product.category?.name?.vi,
    product.merchant?.name?.en,
    // Variant info
    ...product.variants?.map(v => `${v.name?.en || ''} ${v.name?.vi || ''}`),
    // Identifiers (for exact match fallback)
    product.sku, product.barcode,
  ].filter(Boolean);

  return parts.join(' | ');
}

Key principles:

  • Most important fields (name) first — models weight early tokens more
  • Separator (|) between fields to prevent semantic bleeding
  • Both language versions for cross-language search
  • Keep total under 8,192 tokens for BGE-M3 (product data is typically <200 tokens)
  • Hash the embedding text; skip re-embedding if hash unchanged

4.7. Scoring Matrix

Criteria (Weight)BGE-M3jina-v3TS built-inCohere v4OpenAI 3-smalle5-multigte-multi
Vietnamese Quality (25%)91068688
English Quality (15%)8869878
Self-Hosted Option (15%)108103399
Cost Efficiency (15%)10710691010
Bun/TS Compatibility (10%)88108977
Dimensions/Storage (10%)8979987
Context Length (10%)99710869
Weighted Total9.008.607.557.156.908.058.15

4.8. Embedding Model Verdict

BGE-M3 (Score: 9.00) — WINNER

  • 100+ languages including Vietnamese, trained on 170+ language data
  • Triple retrieval: Dense + Sparse (BM25-like) + ColBERT multi-vector in one model
  • 1024 dimensions, 8,192 token context, 568M parameters
  • Self-hosted via Ollama: ollama pull bge-m3 (1.2GB), REST API at localhost:11434
  • Also via HuggingFace TEI: Production-grade Docker serving with Prometheus metrics
  • Cost: $0 per token (self-hosted), ~$30-80/month infrastructure

jina-embeddings-v3 (Score: 8.60) — RUNNER-UP

  • Vietnamese explicitly in top 30 best-performing languages (best Vietnamese quality)
  • 89 languages, 570M parameters, 1024 dimensions (configurable 32-1024)
  • Task-specific LoRA adapters: retrieval, text-matching, classification
  • Caveat: API pricing is token-based; self-hosting options are more limited

Typesense Built-in distiluse-multilingual (Score: 7.55) — QUICK START

  • Zero configuration — just add embed field to collection schema
  • 512 dimensions, multilingual
  • Vietnamese quality: Moderate (older model, not specifically optimized)
  • Best for: Phase 1 quick start, validate hybrid search before investing in better models

5. Architecture Evaluation

5.1. Candidate Architectures

Option A: Typesense-Only (Simplest)

AspectAssessment
Effort1-2 weeks
New infrastructureNone
Monthly cost$0 additional
Vietnamese qualityModerate (distiluse-multilingual)
Hybrid search qualityExcellent (native rank fusion)
Search latency<50ms

Option B: Typesense + Ollama BGE-M3 (Best Balance)

AspectAssessment
Effort3-4 weeks
New infrastructure1 Docker container (Ollama, ~2GB RAM)
Monthly cost~$30-80 (Ollama on existing server or small VPS)
Vietnamese qualityExcellent (BGE-M3, 100+ languages)
Hybrid search qualityExcellent (Typesense rank fusion with high-quality embeddings)
Search latency<50ms search + 10-30ms embedding (cached: 5ms)

Option C: Typesense + pgvector + Ollama (Full Power)

AspectAssessment
Effort4-6 weeks
New infrastructure1 Docker container (Ollama) + pgvector extension
Monthly cost~$30-80
CapabilitiesHybrid search + recommendations + "more like this" + SQL analytics
Search latency<50ms search, <100ms recommendations

Option D: Cloud API Only (Lowest Effort)

AspectAssessment
Effort1 week
New infrastructureNone
Monthly cost$5-50 API costs (scales with data volume)
Vietnamese qualityGood (OpenAI) to Excellent (Cohere v4)
Vendor dependencyHigh (API key, rate limits, pricing changes)

5.2. Architecture Scoring

Criteria (Weight)Option AOption BOption COption D
Vietnamese Quality (25%)610107
Integration Effort (20%)107510
Operational Cost (15%)10886
Feature Completeness (15%)68106
Data Sovereignty (10%)1010103
Search Quality (10%)7998
Maintenance (5%)10769
Weighted Total7.858.558.157.05

6. Hybrid Search Design

6.1. How Hybrid Search Works

Hybrid search runs keyword (BM25) and semantic (vector) searches in parallel, then fuses results using Reciprocal Rank Fusion (RRF):

RRF_score(document) = SUM( 1 / (rank_i + k) )

Where rank_i is the document's position in each result list, and k is a smoothing constant (typically 60).

Typesense implements this natively via the alpha parameter:

  • alpha: 0.0 = pure keyword search
  • alpha: 0.3 = 70% keyword, 30% semantic (recommended start)
  • alpha: 0.5 = equal blend
  • alpha: 1.0 = pure semantic search

6.2. When Semantic Search is WORSE Than Keyword

This is critical for a POS system:

Query TypeBest Search ModeWhy
SKU codes (SKU-2847-B)Keyword onlySemantic may confuse similar codes
Barcodes (8935049001234)Keyword onlyExact match required
Model numbers (iPhone 15 Pro Max 256GB)Keyword-dominantSemantic may confuse similar models
Price lookups (price < 500000)Filter onlyNot a search problem
Natural language (cheap wireless headphones)HybridSemantic understands intent
Synonym queries (running shoessneakers)Semantic-dominantKeyword won't match synonyms
Vietnamese queries (tai nghe khong day)HybridSemantic handles Vietnamese well

6.3. Query Classification Strategy

6.4. Re-ranking (Optional, Phase 3+)

Two-stage retrieval improves quality by 20-35% but adds 200-500ms latency:

  1. Stage 1 (Retrieve): Typesense hybrid search returns top 50-100 candidates (~20ms)
  2. Stage 2 (Re-rank): Cross-encoder scores query + each candidate pair for precise relevance
Re-rankerLatency (top 20)QualityCost
Cohere Rerank 4~200msBest$2/1K queries
BGE-reranker-v2-m3~150msVery good, multilingualFree (self-hosted)
ms-marco-MiniLM-L-6-v2~100msGoodFree (self-hosted)

Recommendation: Skip re-ranking initially. Add it in Phase 3+ only if search quality metrics indicate a need.

7. Data Pipeline Design

7.1. Embedding Pipeline (Option B Architecture)

7.2. Embedding Invalidation

Fields that trigger re-embedding vs. fields that don't:

Trigger Re-embeddingSkip Re-embedding
Product name (en, vi)Price changes
Product description (en, vi)Stock quantity
Category assignmentSort order
Brand / key attributesInternal flags
Variant namesStatus (filtered, not embedded)

Implementation: SHA-256 hash of embedding-relevant fields. Compare hash on CDC event; skip re-embedding if unchanged.

7.3. BullMQ Queue Design

Follows the existing BANA BullMQ pattern (same as mq-pay and finance):

Queue: embedding:generation:P01, P02, P03  (3 partitions for load distribution)
Queue: embedding:indexing:P01, P02, P03

Concurrency: 10 jobs per partition (configurable)
Retry: Exponential backoff, max 3 retries
Dead Letter Queue: embedding:generation:failed

8. Memory Estimation

8.1. HNSW Index Memory Formula

Memory = 1.1 * (4 * dimensions + 8 * m) * num_vectors bytes

Where m is HNSW max bi-directional links (typically 16).

8.2. BANA Projections

ScaleProductsDimensionsHNSW MemoryTypesense Total (+ model)pgvector Total
Small10,0001024 (BGE-M3)~60 MB~2.1 GB (+ 2GB model)~60 MB
Medium100,0001024~600 MB~2.6 GB (+ 2GB model)~600 MB
Large1,000,0001024~6 GB~8 GB (+ 2GB model)~6 GB
Small (halfvec)100,0001024~300 MBN/A~300 MB

Note: These estimates are for the products collection only. Multiply by ~1.3x for all 6 collections (other collections are smaller).

9. Decision

9.1. Selected: Option B — Typesense + Ollama BGE-M3 (Score: 8.55)

Selected Vector Database: Typesense (vector upgrade to existing deployment) Selected Embedding Model: BGE-M3 via Ollama (self-hosted) Architecture: Typesense hybrid search with pre-computed BGE-M3 embeddings via BullMQ pipeline

9.2. Rationale

  1. Leverages existing infrastructure — Typesense already deployed with CDC sync. No new database services required.
  2. Best Vietnamese + English support — BGE-M3 handles 100+ languages including Vietnamese at excellent quality. Self-hosted means no API dependency.
  3. Zero per-token costs — Self-hosted embedding generation. Predictable infrastructure cost (~$30-80/month for Ollama).
  4. Production-proven hybrid search — Typesense rank fusion combines keyword + semantic search natively. Configurable alpha parameter for tuning.
  5. Fits existing BullMQ pattern — Embedding pipeline uses the same queue architecture as finance/payment packages (3 partitions, configurable concurrency).
  6. Data sovereignty — All data and models stay on your servers. No external API calls for embeddings.
  7. Incremental migration — Can start with Phase 1 (Typesense built-in model) in 1-2 weeks, then upgrade to BGE-M3 in Phase 2.

9.3. Implementation Phases

PhaseDurationDescriptionImpact
Phase 1Week 1-2Add Typesense built-in embeddings (ts/distiluse-multilingual) to product collection. Enable hybrid search with alpha: 0.3. Test with existing CDC pipeline. Query classification for SKU vs natural language.Immediate semantic search. $0 cost.
Phase 2Week 3-4Deploy Ollama + BGE-M3 Docker container. Build BullMQ embedding pipeline (generation + indexing). Replace built-in model with pre-computed BGE-M3 float[] vectors. Tune alpha parameter.Superior Vietnamese search quality.
Phase 3Week 5-6Extend to all 6 collections. Add query embedding cache (Redis). Implement search analytics (CTR, zero-results rate). Add "related products" using vector similarity.Full semantic search across all entities.
Phase 4FutureAdd pgvector for SQL-level recommendations. Add cross-encoder re-ranking for high-value queries. Implement "did you mean?" suggestions. Fine-tune embedding model if needed.Advanced features.

9.4. Cost Summary

ComponentStatusMonthly Cost
Typesense (existing)Already running$0
Ollama BGE-M3 (Docker on existing server or $30 VPS)New$0-30
Redis embedding cache (existing Redis DB)Already running$0
BullMQ workers (existing Bun infrastructure)Reuse pattern$0
Kafka CDC pipeline (existing)Already running$0
Total$0-30/month

10. PostgreSQL Full-Text Search Addendum

10.1. Research Motivation

PostgreSQL offers built-in Full-Text Search (FTS) capabilities, and several extensions promise BM25 scoring within PostgreSQL itself. This addendum evaluates whether PostgreSQL-native search could replace or complement the Typesense + Ollama BGE-M3 architecture selected in Section 9.

Key questions investigated:

  • Can PostgreSQL provide BM25 scoring natively?
  • How well does PostgreSQL FTS handle Vietnamese text?
  • How does Drizzle ORM integrate with PostgreSQL FTS?
  • Should PostgreSQL FTS replace or complement Typesense?

10.2. BM25 Extensions Comparison

Native PostgreSQL FTS uses ts_rank (a TF-IDF variant), not BM25. Several extensions add BM25 scoring:

ExtensionBM25LicenseWAL SafetyVietnameseDrizzle ORMProduction Status
Native FTSNo (TF-IDF)PostgreSQLYesCustom configFullDecades-mature
ParadeDB pg_searchYesAGPL-3.0No (community)ICU onlyRaw SQL onlyActive ($12M funded)
Timescale pg_textsearchYesPostgreSQLUnknownUnknownRaw SQL onlyPreview only
VectorChord-BM25YesApache-2.0UnknownUnknownRaw SQL onlyVery new
PGroongaNo (Groonga)PostgreSQLYesNativeRaw SQL only10+ years mature
pg_trgmNo (similarity)PostgreSQLYesWorks with unaccentFullDecades-mature

10.2.1. ParadeDB pg_search — SKIP

ParadeDB builds BM25 search into PostgreSQL via the Tantivy engine (Rust, Lucene-inspired). Performance is 20-1000x faster than native FTS.

Critical concerns for BANA:

  • AGPL-3.0 license — If you modify pg_search and serve it over a network (SaaS/POS), you may need to release modifications. Requires legal review for commercial use.
  • No WAL support in community edition — PostgreSQL crash = full BM25 index rebuild. Enterprise edition adds WAL but requires paid license.
  • One BM25 index per table — Cannot create multiple BM25 indexes with different configurations on the same table.
  • No Vietnamese tokenizer — Tantivy offers ICU tokenization (Unicode word boundaries) but no linguistically-aware Vietnamese compound word segmentation.
  • No Drizzle ORM integration — All queries via raw SQL (db.execute(sql\...`)`).

10.2.2. Timescale pg_textsearch — WATCH

This is the extension originally asked about. Developed by Timescale with a PostgreSQL-compatible license (permissive). However, it is in preview status — not suitable for production deployment yet. Worth monitoring for future GA release.

10.2.3. PGroonga — CONSIDER

PGroonga (Groonga-based, v4.0.5) is the only extension that natively supports Vietnamese out of the box, handling all languages including CJK and Vietnamese without custom configuration. Available on Supabase as a built-in extension. Trade-off: adds Groonga as an external dependency.

10.3. Vietnamese Search in PostgreSQL

10.3.1. The Vietnamese Challenge

Vietnamese text presents unique challenges for search:

ChallengeExampleImpact
Tonal diacriticsà á ả ã ạ are all variants of aUsers may search without diacritics
Double-composed charactersế = circumflex + acute (2 marks)Character normalization required
Compound words"Hà Nội" = 2 syllables, 1 wordSpace-based tokenization breaks compound words
No built-in PG configVietnamese not in Snowball stemmersCustom text search configuration needed

10.3.2. The unaccent Solution

Since PostgreSQL 11, the default unaccent.rules includes all Vietnamese characters:

sql
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Test: all Vietnamese diacritics handled
SELECT unaccent('cà phê điện thoại');  -- → 'ca phe dien thoai'
SELECT unaccent('Hà Nội thủ đô');      -- → 'Ha Noi thu do'

Characters handled: ắ ằ ẳ ẵ ặa, ế ề ể ễ ệe, ớ ờ ở ỡ ợo, ứ ừ ử ữ ựu, đd, and all tonal combinations.

Important: unaccent() is STABLE (not IMMUTABLE), so an immutable wrapper is required for indexes and generated columns:

sql
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS $$
  SELECT public.unaccent('public.unaccent', $1);
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT;

10.3.3. Custom Vietnamese Text Search Configuration

sql
CREATE TEXT SEARCH CONFIGURATION vietnamese (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION vietnamese
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part
  WITH unaccent, simple;

This enables Vietnamese FTS where "ca phe" matches "cà phê" via diacritic normalization.

10.3.4. Vietnamese Word Segmentation Tools

ToolLanguageSpeedPostgreSQL Extension?
CocCoc TokenizerC++Very fastNo
UndertheseaPythonGoodNo
VnCoreNLPJavaGoodNo
PGroonga/GroongaCFastYes

None of the Vietnamese NLP tokenizers have native PostgreSQL extensions (except PGroonga). For compound word segmentation, application-layer preprocessing is required.

10.3.5. ICU Accent-Insensitive Collation (PostgreSQL 12+)

sql
CREATE COLLATION vi_accent_insensitive (
  provider = icu,
  deterministic = false,
  locale = 'vi-u-ks-level1'
);

This provides accent-insensitive comparisons at the database level but cannot be used with LIKE patterns (only equality =) and has a performance penalty.

10.4. Drizzle ORM Integration

10.4.1. Custom tsvector Type

Drizzle ORM does not have native tsvector support. A custom type definition is required:

typescript
import { customType } from 'drizzle-orm/pg-core';

export const tsvector = customType<{ data: string }>({
  dataType() {
    return 'tsvector';
  },
});

10.4.2. Schema with Generated Column + GIN Index

typescript
import { SQL, sql } from 'drizzle-orm';
import { index, pgTable, text, bigint } from 'drizzle-orm/pg-core';

export const products = pgTable(
  'products',
  {
    id: bigint('id', { mode: 'bigint' }).primaryKey(),
    nameVi: text('name_vi').notNull(),
    nameEn: text('name_en'),
    nameSearch: tsvector('name_search')
      .generatedAlwaysAs(
        (): SQL =>
          sql`setweight(to_tsvector('vietnamese',
                f_unaccent(coalesce(${products.nameVi}, ''))), 'A') ||
              setweight(to_tsvector('english',
                coalesce(${products.nameEn}, '')), 'B')`,
      ),
  },
  (t) => [
    // GIN index for full-text search
    index('idx_products_name_search').using('gin', t.nameSearch),
    // GIN trigram index for fuzzy search
    index('idx_products_name_vi_trgm').using(
      'gin',
      sql`f_unaccent(lower(${t.nameVi}))`.op('gin_trgm_ops'),
    ),
  ],
);

Key constraint: The generation expression must use only IMMUTABLE functions. Both to_tsvector() (with explicit regconfig 'vietnamese') and f_unaccent() (our custom wrapper) are immutable.

10.4.3. Query Patterns with Drizzle

typescript
// Full-text search with ranking
const query = sql`plainto_tsquery('vietnamese', f_unaccent(${searchTerm}))`;
const results = await db
  .select({
    id: products.id,
    nameVi: products.nameVi,
    rank: sql<number>`ts_rank(${products.nameSearch}, ${query})`.as('rank'),
  })
  .from(products)
  .where(sql`${products.nameSearch} @@ ${query}`)
  .orderBy(desc(sql`ts_rank(${products.nameSearch}, ${query})`));

// Fuzzy trigram search (typo-tolerant fallback)
const fuzzy = await db
  .select({
    id: products.id,
    sim: sql<number>`similarity(
      f_unaccent(lower(${products.nameVi})),
      f_unaccent(lower(${term}))
    )`.as('sim'),
  })
  .from(products)
  .where(sql`f_unaccent(lower(${products.nameVi})) % f_unaccent(lower(${term}))`)
  .orderBy(desc(sql`similarity(...)`));

10.4.4. Migration in BANA (IGNIS Pattern)

typescript
// src/migrations/processes/search-0001-setup-vietnamese-fts.ts
export const process: TMigrationProcess = {
  name: 'search-0001-setup-vietnamese-fts',
  handler: async ({ connector }) => {
    await connector.execute(sql`CREATE EXTENSION IF NOT EXISTS unaccent`);
    await connector.execute(sql`CREATE EXTENSION IF NOT EXISTS pg_trgm`);
    await connector.execute(sql`
      CREATE OR REPLACE FUNCTION f_unaccent(text)
      RETURNS text AS $$
        SELECT public.unaccent('public.unaccent', $1);
      $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
    `);
    await connector.execute(sql`
      CREATE TEXT SEARCH CONFIGURATION IF NOT EXISTS vietnamese (COPY = simple)
    `);
    await connector.execute(sql`
      ALTER TEXT SEARCH CONFIGURATION vietnamese
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH unaccent, simple
    `);
  },
};

For BANA, the recommended approach is multi-layer search with graceful degradation:

LayerEngineBest ForLatency
PrimaryTypesense (hybrid)Product catalog, high-volume search, faceted search<50ms
ComplementPostgreSQL FTSReal-time consistent queries, inline CRUD search, small collections10-50ms
FallbackPostgreSQL pg_trgmFuzzy/typo-tolerant search, Vietnamese diacritic-insensitive10-50ms

10.6. Impact on Original Decision

The original recommendation (Option B: Typesense + Ollama BGE-M3) remains unchanged. PostgreSQL FTS is a valuable complement, not a replacement.

AspectTypesense (Primary)PostgreSQL FTS (Complement)
BM25 rankingBuilt-inNo (TF-IDF only via ts_rank)
Semantic/Vector searchBuilt-in (with BGE-M3 embeddings)No (requires pgvector separately)
Vietnamese diacriticsICU tokenizationunaccent extension (full support)
Typo toleranceBuilt-inVia pg_trgm extension
Faceted searchNativeNo
Drizzle ORM integrationN/A (separate service)Full (generated columns, GIN indexes)
Real-time consistencyCDC delay (~1-5s)Instant (same transaction)
Operational costAlready deployed ($0)Already deployed ($0)

When to use PostgreSQL FTS over Typesense:

  • Sale order search within a single merchant context (real-time consistency matters)
  • Finance transaction search (small dataset, transactional queries)
  • Inline search filters within existing CRUD controllers
  • Fallback when Typesense is temporarily unavailable

BM25 extensions verdict:

  • ParadeDB: Skip (AGPL license + no WAL in community edition)
  • pg_textsearch: Watch (promising PostgreSQL license, but still preview)
  • PGroonga: Consider for best Vietnamese support (if you need compound word segmentation)
  • Native FTS + pg_trgm: Use now ($0 cost, full Drizzle integration)

11. Comparison with Alternatives Not Selected

AlternativeWhy Not Selected
Elasticsearch/OpenSearch128-256GB RAM cluster. JVM operational overhead. Overkill for POS scale.
VespaNo TypeScript SDK. XML-based config. Internet-scale complexity.
PineconeCloud-only. No self-hosted. Consumption pricing grows with usage. Vendor lock-in.
ChromaDBSingle-node. 60GB RAM for 10M vectors. No multi-tenancy. Not production-ready.
MilvusRequires etcd + MinIO. Disproportionate complexity for <1M products.
WeaviateMulti-container Compose. 8-16GB RAM minimum. Steeper learning curve.
MeilisearchSingle-node. RAM = 35x data size. No horizontal scaling.
Pure Cloud API (OpenAI/Cohere)Ongoing per-token costs. API dependency. Rate limits. Less control.
DocumentDescription
SearchSearch library identity card + catalog
Search — ArchitectureCDC pipeline + query path
Search — Domain ModelTypesense collections
Commerce PackageProduct search integration (host)
Core DatabasePostgreSQL schemas and models

13. References

Vector Databases

Embedding Models

Architecture & Patterns

Proprietary and Confidential. Unauthorized copying, distribution, or use of this software is strictly prohibited.