Semantic Search Selection Report

1. Overview

This report evaluates semantic search solutions for the BANA POS system. The goal is to upgrade the existing keyword-based Typesense search to support semantic understanding - enabling intent-based queries, synonym matching, multilingual search (Vietnamese + English), product recommendations, and "more like this" features.

Current State: Typesense v2.1.0 with Kafka CDC (Debezium) sync from PostgreSQL. 6 indexed collections: products, merchants, organizers, categories, devices, sale-channels.

Target State: Hybrid search combining keyword (BM25) + semantic (vector/embedding) search across all collections with excellent Vietnamese + English support.

Evaluation Date: February 2026

2. Current Search Architecture

2.1. Data Flow

2.2. Indexed Collections

Collection	Entity	Query By	Multilingual
products	Product	identifier, info.name.en, info.name.vi, info.description.en, info.description.vi	Yes
merchants	Merchant	name.en, name.vi, slug, identifier, status, type, currency	Yes
organizers	Organizer	name.en, name.vi, description.en, description.vi	Yes
categories	Category	name.en, name.vi	Yes
devices	Device	name, identifier, status	No
sale-channels	SaleChannel	name.en, name.vi	Yes

2.3. Infrastructure Context

Component	Version	Status	Relevance to Semantic Search
PostgreSQL	18 Alpine	Running	Source of truth. pgvector extension available
Typesense	2.1.0	Running	Already deployed. Supports native vector search
Redis	Latest	Running	Cache (DB 0), BullMQ (DB 0), Pub/Sub (DB 0), WebSocket (DB 4). Redis 8+ supports vector search
Kafka + Debezium	3.x + 2.4	Running	CDC pipeline for real-time sync
Drizzle ORM	0.45.2	Running	Native pgvector support (`vector()`, `cosineDistance()`)
Bun	>= 1.3.8	Running	Runtime. Near-100% Node.js compatibility
BullMQ	5.80.8	Running	Queue infrastructure (finance, payment). Reusable for embedding pipeline

2.4. Capabilities Gap

Capability	Current (Keyword)	Target (Semantic)
Exact match (SKU, barcode)	Yes	Yes (keyword path)
Typo tolerance	Yes	Yes
Synonym understanding	No	Yes
Intent-based queries	No	Yes
"More like this"	No	Yes
Natural language queries	No	Yes
Vietnamese semantic understanding	No	Yes
Product recommendations	No	Yes
Cross-language search (en query → vi results)	No	Yes

3. Vector Database Evaluation

3.1. Candidates

#	Solution	Type	Language	License	GitHub Stars	Status in BANA
1	Typesense	Search engine + Vector	C++	GPL-3	22k+	Already deployed
2	PostgreSQL pgvector	Extension	C	PostgreSQL	14k+	DB already deployed
3	Redis Stack	In-memory DB + Vector	C	RSALv2	67k+	Already deployed
4	Qdrant	Purpose-built vector DB	Rust	Apache-2	23k+	Not deployed
5	Meilisearch	Search engine + Vector	Rust	MIT	49k+	Not deployed
6	Weaviate	Vector DB + Vectorizers	Go	BSD-3	14k+	Not deployed
7	LanceDB	Embedded vector DB	Rust	Apache-2	5k+	Not deployed
8	Milvus/Zilliz	Distributed vector DB	Go/C++	Apache-2	40k+	Not deployed
9	Elasticsearch	Search engine + Vector	Java	SSPL/ELv2	72k+	Not deployed
10	OpenSearch	Search engine + Vector	Java	Apache-2	10k+	Not deployed
11	Pinecone	Managed vector DB	-	Proprietary	N/A	Not deployed
12	ChromaDB	Embedding DB	Python/Rust	Apache-2	18k+	Not deployed
13	Vespa	Search platform	Java/C++	Apache-2	6k+	Not deployed

3.2. Feature Matrix

Feature	Typesense	pgvector	Redis Stack	Qdrant	Meilisearch	Weaviate	Milvus	Elasticsearch
Vector Index (HNSW)	Yes	Yes	Yes	Yes	Yes (Arroy)	Yes	Yes	Yes (Lucene)
Hybrid Search	Native (rank fusion)	Manual (SQL)	KNN + filter	Dense+sparse fusion	semanticRatio	Alpha-blend	Native	Retriever API
Built-in Embeddings	Yes (S-BERT, E5)	No (BYO)	No (BYO)	No (BYO)	Yes (OpenAI, HF)	Yes (20+ modules)	No (BYO)	Yes (ELSER)
External Embedding API	OpenAI, PaLM	N/A	N/A	N/A	OpenAI, Cohere, HF	OpenAI, Cohere, HF	N/A	OpenAI, HF
BYO Vectors	Yes (float[])	Yes (vector)	Yes (float32/64)	Yes	Yes	Yes	Yes	Yes
Metadata Filtering	Full	Full (SQL)	10 attrs max	Full (payload)	Faceted	Full (where)	Full	Full
Faceted Search	Yes	No	No	No	Yes	No	No	Yes
Multi-tenancy	Filter-based	RLS/WHERE	Tag-based	Tiered shards	Tenant tokens	Shard-per-tenant	DB/collection/partition	Index-per-tenant
TypeScript SDK	Excellent	Drizzle native	Existing client	Official REST	Good	v3 client	Node.js SDK	Official
Bun Compatibility	Confirmed	Confirmed	Confirmed	Expected	Expected	Expected	Expected	Expected
Self-hosted Docker	Yes	Extension	Yes	Yes	Yes	Yes (Compose)	Yes (+ etcd)	Yes
Cloud-only	No	No	No	No	No	No	No	No

3.3. Performance Benchmarks

Solution	RPS (1M vectors)	P50 Latency	P99 Latency	Memory (100K x 1024d)	Insert Rate
Typesense	~5K-10K	<10ms	<50ms	~600MB + 2-6GB model	Real-time (CDC)
pgvector	~471 QPS (99% recall)	<50ms	<100ms	~600MB (HNSW)	Real-time (SQL)
Redis Stack	~3.4x Qdrant	~3ms	<10ms	~600MB (in-memory)	~15K vec/sec
Qdrant	High	~31ms	~37ms	~600MB (quantizable to ~60MB)	~50K vec/sec
Meilisearch	~5K-10K	<50ms	<100ms	~35x data size (RAM!)	Batch
Weaviate	~1.7x less than Redis	~10ms	<50ms	~3GB per 1M vectors	Moderate
Milvus	Highest open-source	<10ms	<30ms	~600MB	~50K vec/sec
Elasticsearch	~5x faster than OpenSearch	<50ms	<100ms	Requires 128-256GB RAM cluster	Moderate

3.4. Cost Analysis (Self-Hosted, Monthly)

Solution	New Infrastructure	RAM Overhead	Operational Complexity	Monthly Cost
Typesense (upgrade)	None	+2-6GB (embedding model)	None (already running)	$0
pgvector	None (extension)	+600MB (HNSW index)	Minimal (ALTER TABLE)	$0
Redis Stack (upgrade to v8+)	None	+600MB (vector index)	Minimal (existing Redis)	$0
Qdrant	1 Docker container	2-8GB	Moderate (new service)	$50-150
Meilisearch	1 Docker container	4-16GB (heavy!)	Moderate	$50-200
Weaviate	Docker Compose (multi-container)	8-16GB	High (modules, config)	$100-300
Milvus	Docker + etcd + MinIO	8-16GB	High (distributed)	$150-400
Elasticsearch	Docker cluster (3+ nodes)	128-256GB	Very High (JVM, ops)	$300-1000+

3.5. Critical Disqualifiers for BANA

Candidate	Disqualifier	Severity
Elasticsearch	Requires 128-256GB RAM cluster. JVM-based. Massive operational overhead. Overkill for POS search.	CRITICAL
OpenSearch	Same as Elasticsearch. 5x slower than ES for vector search.	CRITICAL
Vespa	No TypeScript/JavaScript SDK. XML-based config. Internet-scale complexity, completely overkill.	CRITICAL
Pinecone	Cloud-only, no self-hosted option. Consumption-based pricing (expensive at scale). Vendor lock-in.	HIGH
ChromaDB	Single-node only. In-memory HNSW (10M x 1536d = 60GB RAM). No multi-tenancy. Not production-ready at scale.	HIGH
Milvus	Requires etcd + MinIO for distributed mode. Disproportionate complexity for POS scale (<1M products).	MEDIUM
Weaviate	Multi-container Docker Compose. 8-16GB RAM. Complex module configuration. Steeper learning curve.	MEDIUM
Meilisearch	Single-node architecture. RAM usage = 35x data size. No horizontal scaling.	MEDIUM

3.6. Scoring Matrix

Weighted scoring (1-10, higher = better for BANA):

Criteria (Weight)	Typesense	pgvector	Redis Stack	Qdrant	Meilisearch	Weaviate	Milvus
Integration Fit (25%)	10	10	9	5	4	4	3
Hybrid Search (20%)	9	6	7	8	8	9	8
TypeScript/Bun SDK (15%)	10	10	9	7	8	7	6
Operational Cost (10%)	10	10	9	7	7	6	5
Performance (10%)	8	8	10	9	7	7	9
Multi-tenancy (5%)	7	9	6	8	7	9	10
Scalability (5%)	7	7	6	9	5	8	10
Vietnamese Support (10%)	7	7	7	7	7	7	7
Weighted Total	9.05	8.45	8.30	6.90	6.35	6.35	5.95

3.7. Finalist Analysis

Typesense Vector Upgrade (Score: 9.05) - WINNER

Why #1: Already deployed with CDC sync. Adding semantic search = adding an embed field to existing collection schemas. Zero new infrastructure, zero new services.

Strengths:

Native rank fusion hybrid search with configurable alpha parameter (0 = pure keyword, 1 = pure semantic)
Built-in embedding models: ts/all-MiniLM-L12-v2, ts/e5-small, ts/distiluse-base-multilingual-cased-v2
External API support: OpenAI, Google PaLM natively supported
BYO pre-computed embeddings via float[] fields
Existing Kafka CDC pipeline continues working unchanged
Typesense query_by supports mixing text fields and embedding fields in a single query
rerank_hybrid_matches: true for improved result quality
Cost: $0 additional

Weaknesses:

Built-in multilingual model (distiluse) is older, Vietnamese quality is moderate
Embedding model loading requires 2-6GB additional RAM
Not designed for billion-vector scale (adequate for POS at <10M products)

How hybrid search works in Typesense:

json

{
  "q": "laptop gaming",
  "query_by": "name_en,name_vi,description_en,embedding",
  "prefix": "true,true,true,false",
  "vector_query": "embedding:([], id: *, alpha: 0.3)",
  "filter_by": "status:=active && merchantId:=12345"
}

PostgreSQL pgvector (Score: 8.45) - COMPLEMENT

Why #2: Zero new infrastructure. Drizzle ORM has native pgvector support with vector() column type and distance functions (cosineDistance(), l2Distance(), innerProduct()).

Strengths:

Zero new services - just enable the pgvector extension on existing PostgreSQL

Drizzle ORM integration is seamless:

typescript

import { vector, index } from 'drizzle-orm/pg-core';
const products = pgTable('products', {
  embedding: vector('embedding', { dimensions: 1024 }),
});
// Query
db.select().from(products).orderBy(cosineDistance(products.embedding, queryVector)).limit(10);

Full SQL filtering combined with vector search (merchantId, organizerId, category, etc.)
halfvec type for 50% storage savings with no recall loss
HNSW index for sub-100ms queries at <10M vectors
pgvectorscale achieves 471 QPS at 99% recall on 50M vectors
Real-time indexing (vectors indexed on INSERT/UPDATE)

Weaknesses:

No built-in rank fusion - hybrid search (keyword + vector) must be implemented at application level
No built-in embedding generation (BYO only)
HNSW index consumes significant memory for large datasets

Best for: Product recommendations, "more like this", analytical similarity queries, cross-service vector lookups.

Redis Stack Vector Search (Score: 8.30) - REAL-TIME

Why #3: Already running Redis for cache, BullMQ, and pub/sub. Redis 8+ includes vector search natively (previously required Redis Stack module).

Strengths:

3ms P50 latency - fastest vector search among all candidates
Already deeply integrated (caching, queues, pub/sub, WebSocket)
Uses existing Redis client (node-redis)
Supports HNSW, FLAT, and SVS-VAMANA indexes
KNN + pre-filter on TAG, TEXT, NUMERIC, GEO fields
In-memory performance ideal for real-time POS checkout suggestions

Weaknesses:

Maximum 10 attributes per vector index
Everything in-memory - RAM cost grows linearly
No built-in embedding generation
Limited hybrid search (no true rank fusion)

Best for: Real-time POS similarity lookups ("customers also bought"), checkout suggestions, ultra-low-latency search cache.

4. Embedding Model Evaluation

4.1. Critical Requirement: Vietnamese + English

BANA uses i18n fields ({ en: string, vi: string }) for products, merchants, organizers, and categories. The embedding model must handle both Vietnamese and English effectively. Poor Vietnamese support is a disqualifier.

4.2. Candidates

#	Model	Provider	Dimensions	Parameters	Max Tokens	Languages	Type
1	BGE-M3	BAAI	1024	568M	8,192	100+	Open-source
2	jina-embeddings-v3	Jina AI	1024 (32-1024)	570M	8,192	89	API + Self-host
3	Typesense distiluse-multilingual	Typesense built-in	512	~135M	512	Multilingual	Built-in
4	Cohere Embed v4	Cohere	1536 (configurable)	-	128K	100+	API only
5	OpenAI text-embedding-3-small	OpenAI	1536 (configurable)	-	8,191	Multilingual	API only
6	multilingual-e5-large	Microsoft	1024	560M	512	100+	Open-source
7	gte-multilingual-base	Alibaba	768	~300M	8,192	70+	Open-source
8	Google gemini-embedding-001	Google	3072 (configurable)	-	8,192	100+	API
9	nomic-embed-text-v2-moe	Nomic AI	768	475M (305M active)	8,192	~100	Open-source
10	Voyage voyage-3.5	Voyage AI	1024	-	32K	Multilingual	API
11	OpenAI text-embedding-3-large	OpenAI	3072 (configurable)	-	8,191	Multilingual	API only
12	mxbai-embed-large	Mixedbread AI	1024	335M	512	English-focused	Open-source
13	all-MiniLM-L6-v2	Sentence-Transformers	384	22M	512	English only	Open-source
14	AWS Titan V2	AWS Bedrock	1024	-	8,192	Multilingual	API

4.3. Vietnamese Language Quality

Model	Vietnamese Explicit Support	MTEB Multilingual Rank	Vietnamese Quality Assessment
jina-embeddings-v3	Top 30 languages	High	Excellent
BGE-M3	100+ languages (trained on 170+ lang data)	High	Excellent
Cohere Embed v4	100+ languages	#1 multilingual	Good
multilingual-e5-large	Strong on Mr. TyDi multilingual	Good	Good
gte-multilingual-base	70+ languages	SOTA multilingual	Good
gemini-embedding-001	100+ languages	#1 MTEB multilingual	Good
nomic-embed-text-v2-moe	~100 languages	Good	Moderate
OpenAI text-embedding-3-small/large	Multilingual but not optimized	Good	Moderate
Typesense distiluse-multilingual	Multilingual (older model)	Moderate	Moderate
Voyage voyage-3.5	Multilingual (not highlighted)	Good	Moderate
mxbai-embed-large	English-focused	N/A	Poor
all-MiniLM-L6-v2	English only	N/A	Poor

4.4. Cost Analysis

Model	100K docs (~50M tokens)	1M docs (~500M tokens)	10M docs (~5B tokens)	Self-Hosted Option
BGE-M3 (self-hosted)	$0 (infra only)	$0 (infra only)	$0 (infra only)	Yes (Ollama, TEI)
Typesense built-in	$0	$0	$0	Yes (built-in)
multilingual-e5-large	$0 (infra only)	$0 (infra only)	$0 (infra only)	Yes (Ollama, TEI)
gte-multilingual-base	$0 (infra only)	$0 (infra only)	$0 (infra only)	Yes (HF, TEI)
OpenAI 3-small	$1.00	$10.00	$100.00	No
OpenAI 3-large	$6.50	$65.00	$650.00	No
Cohere Embed v4	$6.00	$60.00	$600.00	No
gemini-embedding-001	$7.50 (free tier available)	$75.00	$750.00	No
Voyage voyage-3.5	$3.00 (first 200M free)	$18.00	$300.00	No
jina-embeddings-v3	Token-based (free trial 10M)	Token-based	Token-based	Partial

Assumes ~500 tokens per product document (title + description + category in both en + vi).

Self-Hosted Infrastructure Cost:

Setup	Monthly Cost	Throughput	Best For
Ollama on CPU (4-core, 8GB)	~$30-50/month VPS	~50-100 docs/sec	Small-medium catalog
HuggingFace TEI on CPU	~$50-80/month VPS	~100-200 docs/sec	Medium catalog
HuggingFace TEI on GPU (T4)	~$150-300/month	~500-2,000 docs/sec	Large catalog

4.5. Self-Hosting Options for TypeScript/Bun

Runtime	Models	Bun Compatible	Latency	Best For
Ollama (Docker)	bge-m3, nomic-embed, mxbai-embed	Yes (REST API)	10-30ms/embed	Production. Simple Docker, model management, REST API
HuggingFace TEI (Docker)	Any HF model (bge-m3, e5, gte)	Yes (REST API)	5-20ms/embed	Production. Token batching, Flash Attention, Prometheus metrics
Transformers.js (in-process)	Smaller models (<150M params)	Yes (v4)	20-50ms/embed	Prototyping. CPU only, large models are slow
ONNX Runtime (native addon)	ONNX-exported models	Partial	<10ms/embed (int8)	Performance-critical. Complex setup

4.6. Embedding Strategy for BANA Products

typescript

// Recommended: Concatenate relevant fields for embedding
function createProductEmbeddingText(product: TProduct): string {
  const parts = [
    // Primary: product name (both languages, highest weight)
    product.name?.en, product.name?.vi,
    // Secondary: description
    product.description?.en, product.description?.vi,
    // Tertiary: structured attributes
    product.category?.name?.en, product.category?.name?.vi,
    product.merchant?.name?.en,
    // Variant info
    ...product.variants?.map(v => `${v.name?.en || ''} ${v.name?.vi || ''}`),
    // Identifiers (for exact match fallback)
    product.sku, product.barcode,
  ].filter(Boolean);

  return parts.join(' | ');
}

Key principles:

Most important fields (name) first - models weight early tokens more
Separator (|) between fields to prevent semantic bleeding
Both language versions for cross-language search
Keep total under 8,192 tokens for BGE-M3 (product data is typically <200 tokens)
Hash the embedding text; skip re-embedding if hash unchanged

4.7. Scoring Matrix

Criteria (Weight)	BGE-M3	jina-v3	TS built-in	Cohere v4	OpenAI 3-small	e5-multi	gte-multi
Vietnamese Quality (25%)	9	10	6	8	6	8	8
English Quality (15%)	8	8	6	9	8	7	8
Self-Hosted Option (15%)	10	8	10	3	3	9	9
Cost Efficiency (15%)	10	7	10	6	9	10	10
Bun/TS Compatibility (10%)	8	8	10	8	9	7	7
Dimensions/Storage (10%)	8	9	7	9	9	8	7
Context Length (10%)	9	9	7	10	8	6	9
Weighted Total	9.00	8.60	7.55	7.15	6.90	8.05	8.15

4.8. Embedding Model Verdict

BGE-M3 (Score: 9.00) - WINNER

100+ languages including Vietnamese, trained on 170+ language data
Triple retrieval: Dense + Sparse (BM25-like) + ColBERT multi-vector in one model
1024 dimensions, 8,192 token context, 568M parameters
Self-hosted via Ollama: ollama pull bge-m3 (1.2GB), REST API at localhost:11434
Also via HuggingFace TEI: Production-grade Docker serving with Prometheus metrics
Cost: $0 per token (self-hosted), ~$30-80/month infrastructure

jina-embeddings-v3 (Score: 8.60) - RUNNER-UP

Vietnamese explicitly in top 30 best-performing languages (best Vietnamese quality)
89 languages, 570M parameters, 1024 dimensions (configurable 32-1024)
Task-specific LoRA adapters: retrieval, text-matching, classification
Caveat: API pricing is token-based; self-hosting options are more limited

Typesense Built-in distiluse-multilingual (Score: 7.55) - QUICK START

Zero configuration - just add embed field to collection schema
512 dimensions, multilingual
Vietnamese quality: Moderate (older model, not specifically optimized)
Best for: Phase 1 quick start, validate hybrid search before investing in better models

5. Architecture Evaluation

5.1. Candidate Architectures

Option A: Typesense-Only (Simplest)

Aspect	Assessment
Effort	1-2 weeks
New infrastructure	None
Monthly cost	$0 additional
Vietnamese quality	Moderate (distiluse-multilingual)
Hybrid search quality	Excellent (native rank fusion)
Search latency	<50ms

Option B: Typesense + Ollama BGE-M3 (Best Balance)

Aspect	Assessment
Effort	3-4 weeks
New infrastructure	1 Docker container (Ollama, ~2GB RAM)
Monthly cost	~$30-80 (Ollama on existing server or small VPS)
Vietnamese quality	Excellent (BGE-M3, 100+ languages)
Hybrid search quality	Excellent (Typesense rank fusion with high-quality embeddings)
Search latency	<50ms search + 10-30ms embedding (cached: 5ms)

Option C: Typesense + pgvector + Ollama (Full Power)

Aspect	Assessment
Effort	4-6 weeks
New infrastructure	1 Docker container (Ollama) + pgvector extension
Monthly cost	~$30-80
Capabilities	Hybrid search + recommendations + "more like this" + SQL analytics
Search latency	<50ms search, <100ms recommendations

Option D: Cloud API Only (Lowest Effort)

Aspect	Assessment
Effort	1 week
New infrastructure	None
Monthly cost	$5-50 API costs (scales with data volume)
Vietnamese quality	Good (OpenAI) to Excellent (Cohere v4)
Vendor dependency	High (API key, rate limits, pricing changes)

5.2. Architecture Scoring

Criteria (Weight)	Option A	Option B	Option C	Option D
Vietnamese Quality (25%)	6	10	10	7
Integration Effort (20%)	10	7	5	10
Operational Cost (15%)	10	8	8	6
Feature Completeness (15%)	6	8	10	6
Data Sovereignty (10%)	10	10	10	3
Search Quality (10%)	7	9	9	8
Maintenance (5%)	10	7	6	9
Weighted Total	7.85	8.55	8.15	7.05

6. Hybrid Search Design

6.1. How Hybrid Search Works

Hybrid search runs keyword (BM25) and semantic (vector) searches in parallel, then fuses results using Reciprocal Rank Fusion (RRF):

RRF_score(document) = SUM( 1 / (rank_i + k) )

Where rank_i is the document's position in each result list, and k is a smoothing constant (typically 60).

Typesense implements this natively via the alpha parameter:

alpha: 0.0 = pure keyword search
alpha: 0.3 = 70% keyword, 30% semantic (recommended start)
alpha: 0.5 = equal blend
alpha: 1.0 = pure semantic search

6.2. When Semantic Search is WORSE Than Keyword

This is critical for a POS system:

Query Type	Best Search Mode	Why
SKU codes (`SKU-2847-B`)	Keyword only	Semantic may confuse similar codes
Barcodes (`8935049001234`)	Keyword only	Exact match required
Model numbers (`iPhone 15 Pro Max 256GB`)	Keyword-dominant	Semantic may confuse similar models
Price lookups (`price < 500000`)	Filter only	Not a search problem
Natural language (`cheap wireless headphones`)	Hybrid	Semantic understands intent
Synonym queries (`running shoes` → `sneakers`)	Semantic-dominant	Keyword won't match synonyms
Vietnamese queries (`tai nghe khong day`)	Hybrid	Semantic handles Vietnamese well

6.3. Query Classification Strategy

6.4. Re-ranking (Optional, Phase 3+)

Two-stage retrieval improves quality by 20-35% but adds 200-500ms latency:

Stage 1 (Retrieve): Typesense hybrid search returns top 50-100 candidates (~20ms)
Stage 2 (Re-rank): Cross-encoder scores query + each candidate pair for precise relevance

Re-ranker	Latency (top 20)	Quality	Cost
Cohere Rerank 4	~200ms	Best	$2/1K queries
BGE-reranker-v2-m3	~150ms	Very good, multilingual	Free (self-hosted)
ms-marco-MiniLM-L-6-v2	~100ms	Good	Free (self-hosted)

Recommendation: Skip re-ranking initially. Add it in Phase 3+ only if search quality metrics indicate a need.

7. Data Pipeline Design

7.1. Embedding Pipeline (Option B Architecture)

7.2. Embedding Invalidation

Fields that trigger re-embedding vs. fields that don't:

Trigger Re-embedding	Skip Re-embedding
Product name (en, vi)	Price changes
Product description (en, vi)	Stock quantity
Category assignment	Sort order
Brand / key attributes	Internal flags
Variant names	Status (filtered, not embedded)

Implementation: SHA-256 hash of embedding-relevant fields. Compare hash on CDC event; skip re-embedding if unchanged.

7.3. BullMQ Queue Design

Follows the existing BANA BullMQ pattern (same as mq-pay and finance):

Queue: embedding:generation:P01, P02, P03  (3 partitions for load distribution)
Queue: embedding:indexing:P01, P02, P03

Concurrency: 10 jobs per partition (configurable)
Retry: Exponential backoff, max 3 retries
Dead Letter Queue: embedding:generation:failed

8. Memory Estimation

8.1. HNSW Index Memory Formula

Memory = 1.1 * (4 * dimensions + 8 * m) * num_vectors bytes

Where m is HNSW max bi-directional links (typically 16).

8.2. BANA Projections

Scale	Products	Dimensions	HNSW Memory	Typesense Total (+ model)	pgvector Total
Small	10,000	1024 (BGE-M3)	~60 MB	~2.1 GB (+ 2GB model)	~60 MB
Medium	100,000	1024	~600 MB	~2.6 GB (+ 2GB model)	~600 MB
Large	1,000,000	1024	~6 GB	~8 GB (+ 2GB model)	~6 GB
Small (halfvec)	100,000	1024	~300 MB	N/A	~300 MB

Note: These estimates are for the products collection only. Multiply by ~1.3x for all 6 collections (other collections are smaller).

9. Decision

9.1. Selected: Option B - Typesense + Ollama BGE-M3 (Score: 8.55)

Selected Vector Database: Typesense (vector upgrade to existing deployment) Selected Embedding Model: BGE-M3 via Ollama (self-hosted) Architecture: Typesense hybrid search with pre-computed BGE-M3 embeddings via BullMQ pipeline

9.2. Rationale

Leverages existing infrastructure - Typesense already deployed with CDC sync. No new database services required.
Best Vietnamese + English support - BGE-M3 handles 100+ languages including Vietnamese at excellent quality. Self-hosted means no API dependency.
Zero per-token costs - Self-hosted embedding generation. Predictable infrastructure cost (~$30-80/month for Ollama).
Production-proven hybrid search - Typesense rank fusion combines keyword + semantic search natively. Configurable alpha parameter for tuning.
Fits existing BullMQ pattern - Embedding pipeline uses the same queue architecture as finance/payment packages (3 partitions, configurable concurrency).
Data sovereignty - All data and models stay on your servers. No external API calls for embeddings.
Incremental migration - Can start with Phase 1 (Typesense built-in model) in 1-2 weeks, then upgrade to BGE-M3 in Phase 2.

9.3. Implementation Phases

Phase	Duration	Description	Impact
Phase 1	Week 1-2	Add Typesense built-in embeddings (`ts/distiluse-multilingual`) to product collection. Enable hybrid search with `alpha: 0.3`. Test with existing CDC pipeline. Query classification for SKU vs natural language.	Immediate semantic search. $0 cost.
Phase 2	Week 3-4	Deploy Ollama + BGE-M3 Docker container. Build BullMQ embedding pipeline (generation + indexing). Replace built-in model with pre-computed BGE-M3 `float[]` vectors. Tune `alpha` parameter.	Superior Vietnamese search quality.
Phase 3	Week 5-6	Extend to all 6 collections. Add query embedding cache (Redis). Implement search analytics (CTR, zero-results rate). Add "related products" using vector similarity.	Full semantic search across all entities.
Phase 4	Future	Add pgvector for SQL-level recommendations. Add cross-encoder re-ranking for high-value queries. Implement "did you mean?" suggestions. Fine-tune embedding model if needed.	Advanced features.

9.4. Cost Summary

Component	Status	Monthly Cost
Typesense (existing)	Already running	$0
Ollama BGE-M3 (Docker on existing server or $30 VPS)	New	$0-30
Redis embedding cache (existing Redis DB)	Already running	$0
BullMQ workers (existing Bun infrastructure)	Reuse pattern	$0
Kafka CDC pipeline (existing)	Already running	$0
Total		$0-30/month

10. PostgreSQL Full-Text Search Addendum

10.1. Research Motivation

PostgreSQL offers built-in Full-Text Search (FTS) capabilities, and several extensions promise BM25 scoring within PostgreSQL itself. This addendum evaluates whether PostgreSQL-native search could replace or complement the Typesense + Ollama BGE-M3 architecture selected in Section 9.

Key questions investigated:

Can PostgreSQL provide BM25 scoring natively?
How well does PostgreSQL FTS handle Vietnamese text?
How does Drizzle ORM integrate with PostgreSQL FTS?
Should PostgreSQL FTS replace or complement Typesense?

10.2. BM25 Extensions Comparison

Native PostgreSQL FTS uses ts_rank (a TF-IDF variant), not BM25. Several extensions add BM25 scoring:

Extension	BM25	License	WAL Safety	Vietnamese	Drizzle ORM	Production Status
Native FTS	No (TF-IDF)	PostgreSQL	Yes	Custom config	Full	Decades-mature
ParadeDB pg_search	Yes	AGPL-3.0	No (community)	ICU only	Raw SQL only	Active ($12M funded)
Timescale pg_textsearch	Yes	PostgreSQL	Unknown	Unknown	Raw SQL only	Preview only
VectorChord-BM25	Yes	Apache-2.0	Unknown	Unknown	Raw SQL only	Very new
PGroonga	No (Groonga)	PostgreSQL	Yes	Native	Raw SQL only	10+ years mature
pg_trgm	No (similarity)	PostgreSQL	Yes	Works with unaccent	Full	Decades-mature

10.2.1. ParadeDB pg_search - SKIP

ParadeDB builds BM25 search into PostgreSQL via the Tantivy engine (Rust, Lucene-inspired). Performance is 20-1000x faster than native FTS.

Critical concerns for BANA:

AGPL-3.0 license - If you modify pg_search and serve it over a network (SaaS/POS), you may need to release modifications. Requires legal review for commercial use.
No WAL support in community edition - PostgreSQL crash = full BM25 index rebuild. Enterprise edition adds WAL but requires paid license.
One BM25 index per table - Cannot create multiple BM25 indexes with different configurations on the same table.
No Vietnamese tokenizer - Tantivy offers ICU tokenization (Unicode word boundaries) but no linguistically-aware Vietnamese compound word segmentation.
No Drizzle ORM integration - All queries via raw SQL (db.execute(sql\...`)`).

10.2.2. Timescale pg_textsearch - WATCH

This is the extension originally asked about. Developed by Timescale with a PostgreSQL-compatible license (permissive). However, it is in preview status - not suitable for production deployment yet. Worth monitoring for future GA release.

10.2.3. PGroonga - CONSIDER

PGroonga (Groonga-based, v4.0.5) is the only extension that natively supports Vietnamese out of the box, handling all languages including CJK and Vietnamese without custom configuration. Available on Supabase as a built-in extension. Trade-off: adds Groonga as an external dependency.

10.3. Vietnamese Search in PostgreSQL

10.3.1. The Vietnamese Challenge

Vietnamese text presents unique challenges for search:

Challenge	Example	Impact
Tonal diacritics	`à á ả ã ạ` are all variants of `a`	Users may search without diacritics
Double-composed characters	`ế` = circumflex + acute (2 marks)	Character normalization required
Compound words	"Hà Nội" = 2 syllables, 1 word	Space-based tokenization breaks compound words
No built-in PG config	Vietnamese not in Snowball stemmers	Custom text search configuration needed

10.3.2. The `unaccent` Solution

Since PostgreSQL 11, the default unaccent.rules includes all Vietnamese characters:

sql

CREATE EXTENSION IF NOT EXISTS unaccent;

-- Test: all Vietnamese diacritics handled
SELECT unaccent('cà phê điện thoại');  -- → 'ca phe dien thoai'
SELECT unaccent('Hà Nội thủ đô');      -- → 'Ha Noi thu do'

Characters handled: ắ ằ ẳ ẵ ặ → a, ế ề ể ễ ệ → e, ớ ờ ở ỡ ợ → o, ứ ừ ử ữ ự → u, đ → d, and all tonal combinations.

Important: unaccent() is STABLE (not IMMUTABLE), so an immutable wrapper is required for indexes and generated columns:

sql

CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS $$
  SELECT public.unaccent('public.unaccent', $1);
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT;

10.3.3. Custom Vietnamese Text Search Configuration

sql

CREATE TEXT SEARCH CONFIGURATION vietnamese (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION vietnamese
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part
  WITH unaccent, simple;

This enables Vietnamese FTS where "ca phe" matches "cà phê" via diacritic normalization.

10.3.4. Vietnamese Word Segmentation Tools

Tool	Language	Speed	PostgreSQL Extension?
CocCoc Tokenizer	C++	Very fast	No
Underthesea	Python	Good	No
VnCoreNLP	Java	Good	No
PGroonga/Groonga	C	Fast	Yes

None of the Vietnamese NLP tokenizers have native PostgreSQL extensions (except PGroonga). For compound word segmentation, application-layer preprocessing is required.

10.3.5. ICU Accent-Insensitive Collation (PostgreSQL 12+)

sql

CREATE COLLATION vi_accent_insensitive (
  provider = icu,
  deterministic = false,
  locale = 'vi-u-ks-level1'
);

This provides accent-insensitive comparisons at the database level but cannot be used with LIKE patterns (only equality =) and has a performance penalty.

10.4. Drizzle ORM Integration

10.4.1. Custom tsvector Type

Drizzle ORM does not have native tsvector support. A custom type definition is required:

typescript

import { customType } from 'drizzle-orm/pg-core';

export const tsvector = customType<{ data: string }>({
  dataType() {
    return 'tsvector';
  },
});

10.4.2. Schema with Generated Column + GIN Index

typescript

import { SQL, sql } from 'drizzle-orm';
import { index, pgTable, text, bigint } from 'drizzle-orm/pg-core';

export const products = pgTable(
  'products',
  {
    id: bigint('id', { mode: 'bigint' }).primaryKey(),
    nameVi: text('name_vi').notNull(),
    nameEn: text('name_en'),
    nameSearch: tsvector('name_search')
      .generatedAlwaysAs(
        (): SQL =>
          sql`setweight(to_tsvector('vietnamese',
                f_unaccent(coalesce(${products.nameVi}, ''))), 'A') ||
              setweight(to_tsvector('english',
                coalesce(${products.nameEn}, '')), 'B')`,
      ),
  },
  (t) => [
    // GIN index for full-text search
    index('idx_products_name_search').using('gin', t.nameSearch),
    // GIN trigram index for fuzzy search
    index('idx_products_name_vi_trgm').using(
      'gin',
      sql`f_unaccent(lower(${t.nameVi}))`.op('gin_trgm_ops'),
    ),
  ],
);

Key constraint: The generation expression must use only IMMUTABLE functions. Both to_tsvector() (with explicit regconfig 'vietnamese') and f_unaccent() (our custom wrapper) are immutable.

10.4.3. Query Patterns with Drizzle

typescript

// Full-text search with ranking
const query = sql`plainto_tsquery('vietnamese', f_unaccent(${searchTerm}))`;
const results = await db
  .select({
    id: products.id,
    nameVi: products.nameVi,
    rank: sql<number>`ts_rank(${products.nameSearch}, ${query})`.as('rank'),
  })
  .from(products)
  .where(sql`${products.nameSearch} @@ ${query}`)
  .orderBy(desc(sql`ts_rank(${products.nameSearch}, ${query})`));

// Fuzzy trigram search (typo-tolerant fallback)
const fuzzy = await db
  .select({
    id: products.id,
    sim: sql<number>`similarity(
      f_unaccent(lower(${products.nameVi})),
      f_unaccent(lower(${term}))
    )`.as('sim'),
  })
  .from(products)
  .where(sql`f_unaccent(lower(${products.nameVi})) % f_unaccent(lower(${term}))`)
  .orderBy(desc(sql`similarity(...)`));

10.4.4. Migration in BANA (IGNIS Pattern)

typescript

// src/migrations/processes/search-0001-setup-vietnamese-fts.ts
export const process: TMigrationProcess = {
  name: 'search-0001-setup-vietnamese-fts',
  handler: async ({ connector }) => {
    await connector.execute(sql`CREATE EXTENSION IF NOT EXISTS unaccent`);
    await connector.execute(sql`CREATE EXTENSION IF NOT EXISTS pg_trgm`);
    await connector.execute(sql`
      CREATE OR REPLACE FUNCTION f_unaccent(text)
      RETURNS text AS $$
        SELECT public.unaccent('public.unaccent', $1);
      $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
    `);
    await connector.execute(sql`
      CREATE TEXT SEARCH CONFIGURATION IF NOT EXISTS vietnamese (COPY = simple)
    `);
    await connector.execute(sql`
      ALTER TEXT SEARCH CONFIGURATION vietnamese
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH unaccent, simple
    `);
  },
};

10.5. Practical Architecture: Multi-Layer Search

For BANA, the recommended approach is multi-layer search with graceful degradation:

Layer	Engine	Best For	Latency
Primary	Typesense (hybrid)	Product catalog, high-volume search, faceted search	<50ms
Complement	PostgreSQL FTS	Real-time consistent queries, inline CRUD search, small collections	10-50ms
Fallback	PostgreSQL pg_trgm	Fuzzy/typo-tolerant search, Vietnamese diacritic-insensitive	10-50ms

10.6. Impact on Original Decision

The original recommendation (Option B: Typesense + Ollama BGE-M3) remains unchanged. PostgreSQL FTS is a valuable complement, not a replacement.

Aspect	Typesense (Primary)	PostgreSQL FTS (Complement)
BM25 ranking	Built-in	No (TF-IDF only via `ts_rank`)
Semantic/Vector search	Built-in (with BGE-M3 embeddings)	No (requires pgvector separately)
Vietnamese diacritics	ICU tokenization	`unaccent` extension (full support)
Typo tolerance	Built-in	Via `pg_trgm` extension
Faceted search	Native	No
Drizzle ORM integration	N/A (separate service)	Full (generated columns, GIN indexes)
Real-time consistency	CDC delay (~1-5s)	Instant (same transaction)
Operational cost	Already deployed ($0)	Already deployed ($0)

When to use PostgreSQL FTS over Typesense:

Sale order search within a single merchant context (real-time consistency matters)
Finance transaction search (small dataset, transactional queries)
Inline search filters within existing CRUD controllers
Fallback when Typesense is temporarily unavailable

BM25 extensions verdict:

ParadeDB: Skip (AGPL license + no WAL in community edition)
pg_textsearch: Watch (promising PostgreSQL license, but still preview)
PGroonga: Consider for best Vietnamese support (if you need compound word segmentation)
Native FTS + pg_trgm: Use now ($0 cost, full Drizzle integration)

11. Comparison with Alternatives Not Selected

Alternative	Why Not Selected
Elasticsearch/OpenSearch	128-256GB RAM cluster. JVM operational overhead. Overkill for POS scale.
Vespa	No TypeScript SDK. XML-based config. Internet-scale complexity.
Pinecone	Cloud-only. No self-hosted. Consumption pricing grows with usage. Vendor lock-in.
ChromaDB	Single-node. 60GB RAM for 10M vectors. No multi-tenancy. Not production-ready.
Milvus	Requires etcd + MinIO. Disproportionate complexity for <1M products.
Weaviate	Multi-container Compose. 8-16GB RAM minimum. Steeper learning curve.
Meilisearch	Single-node. RAM = 35x data size. No horizontal scaling.
Pure Cloud API (OpenAI/Cohere)	Ongoing per-token costs. API dependency. Rate limits. Less control.

Document	Description
Search	Search library identity card + catalog
Search - Architecture	CDC pipeline + query path
Search - Domain Model	Typesense collections
Commerce Package	Product search integration (host)
Core Database	PostgreSQL schemas and models

Providers

Invoice Types

Semantic Search Selection Report ​

1. Overview ​

2. Current Search Architecture ​

2.1. Data Flow ​

2.2. Indexed Collections ​

2.3. Infrastructure Context ​

2.4. Capabilities Gap ​

3. Vector Database Evaluation ​

3.1. Candidates ​

3.2. Feature Matrix ​

3.3. Performance Benchmarks ​

3.4. Cost Analysis (Self-Hosted, Monthly) ​

3.5. Critical Disqualifiers for BANA ​

3.6. Scoring Matrix ​

3.7. Finalist Analysis ​

Typesense Vector Upgrade (Score: 9.05) - WINNER ​

PostgreSQL pgvector (Score: 8.45) - COMPLEMENT ​

Redis Stack Vector Search (Score: 8.30) - REAL-TIME ​

4. Embedding Model Evaluation ​

4.1. Critical Requirement: Vietnamese + English ​

4.2. Candidates ​

4.3. Vietnamese Language Quality ​

4.4. Cost Analysis ​

4.5. Self-Hosting Options for TypeScript/Bun ​

4.6. Embedding Strategy for BANA Products ​

4.7. Scoring Matrix ​

4.8. Embedding Model Verdict ​

BGE-M3 (Score: 9.00) - WINNER ​

jina-embeddings-v3 (Score: 8.60) - RUNNER-UP ​

Typesense Built-in distiluse-multilingual (Score: 7.55) - QUICK START ​

5. Architecture Evaluation ​

5.1. Candidate Architectures ​

Option A: Typesense-Only (Simplest) ​

Option B: Typesense + Ollama BGE-M3 (Best Balance) ​

Option C: Typesense + pgvector + Ollama (Full Power) ​

Option D: Cloud API Only (Lowest Effort) ​

5.2. Architecture Scoring ​

6. Hybrid Search Design ​

6.1. How Hybrid Search Works ​

6.2. When Semantic Search is WORSE Than Keyword ​

6.3. Query Classification Strategy ​

6.4. Re-ranking (Optional, Phase 3+) ​

7. Data Pipeline Design ​

7.1. Embedding Pipeline (Option B Architecture) ​

7.2. Embedding Invalidation ​

7.3. BullMQ Queue Design ​

8. Memory Estimation ​

8.1. HNSW Index Memory Formula ​

8.2. BANA Projections ​

9. Decision ​

9.1. Selected: Option B - Typesense + Ollama BGE-M3 (Score: 8.55) ​

9.2. Rationale ​

9.3. Implementation Phases ​

9.4. Cost Summary ​

10. PostgreSQL Full-Text Search Addendum ​

10.1. Research Motivation ​

10.2. BM25 Extensions Comparison ​

10.2.1. ParadeDB pg_search - SKIP ​

10.2.2. Timescale pg_textsearch - WATCH ​

10.2.3. PGroonga - CONSIDER ​

10.3. Vietnamese Search in PostgreSQL ​

10.3.1. The Vietnamese Challenge ​

10.3.2. The unaccent Solution ​

10.3.3. Custom Vietnamese Text Search Configuration ​

10.3.4. Vietnamese Word Segmentation Tools ​

10.3.5. ICU Accent-Insensitive Collation (PostgreSQL 12+) ​

10.4. Drizzle ORM Integration ​

10.4.1. Custom tsvector Type ​

10.4.2. Schema with Generated Column + GIN Index ​

10.4.3. Query Patterns with Drizzle ​

10.4.4. Migration in BANA (IGNIS Pattern) ​

10.5. Practical Architecture: Multi-Layer Search ​

10.6. Impact on Original Decision ​

11. Comparison with Alternatives Not Selected ​

12. Related Documentation ​

13. References ​

Vector Databases ​

Semantic Search Selection Report

1. Overview

2. Current Search Architecture

2.1. Data Flow

2.2. Indexed Collections

2.3. Infrastructure Context

2.4. Capabilities Gap

3. Vector Database Evaluation

3.1. Candidates

3.2. Feature Matrix

3.3. Performance Benchmarks

3.4. Cost Analysis (Self-Hosted, Monthly)

3.5. Critical Disqualifiers for BANA

3.6. Scoring Matrix

3.7. Finalist Analysis

Typesense Vector Upgrade (Score: 9.05) - WINNER

PostgreSQL pgvector (Score: 8.45) - COMPLEMENT

Redis Stack Vector Search (Score: 8.30) - REAL-TIME

4. Embedding Model Evaluation

4.1. Critical Requirement: Vietnamese + English

4.2. Candidates

4.3. Vietnamese Language Quality

4.4. Cost Analysis

4.5. Self-Hosting Options for TypeScript/Bun

4.6. Embedding Strategy for BANA Products

4.7. Scoring Matrix

4.8. Embedding Model Verdict

BGE-M3 (Score: 9.00) - WINNER

jina-embeddings-v3 (Score: 8.60) - RUNNER-UP

Typesense Built-in distiluse-multilingual (Score: 7.55) - QUICK START

5. Architecture Evaluation

5.1. Candidate Architectures

Option A: Typesense-Only (Simplest)

Option B: Typesense + Ollama BGE-M3 (Best Balance)

Option C: Typesense + pgvector + Ollama (Full Power)

Option D: Cloud API Only (Lowest Effort)

5.2. Architecture Scoring

6. Hybrid Search Design

6.1. How Hybrid Search Works

6.2. When Semantic Search is WORSE Than Keyword

6.3. Query Classification Strategy

6.4. Re-ranking (Optional, Phase 3+)

7. Data Pipeline Design

7.1. Embedding Pipeline (Option B Architecture)

7.2. Embedding Invalidation

7.3. BullMQ Queue Design

8. Memory Estimation

8.1. HNSW Index Memory Formula

8.2. BANA Projections

9. Decision

9.1. Selected: Option B - Typesense + Ollama BGE-M3 (Score: 8.55)

9.2. Rationale

9.3. Implementation Phases

9.4. Cost Summary

10. PostgreSQL Full-Text Search Addendum

10.1. Research Motivation

10.2. BM25 Extensions Comparison

10.2.1. ParadeDB pg_search - SKIP

10.2.2. Timescale pg_textsearch - WATCH

10.2.3. PGroonga - CONSIDER

10.3. Vietnamese Search in PostgreSQL

10.3.1. The Vietnamese Challenge

10.3.2. The `unaccent` Solution

10.3.3. Custom Vietnamese Text Search Configuration

10.3.4. Vietnamese Word Segmentation Tools

10.3.5. ICU Accent-Insensitive Collation (PostgreSQL 12+)

10.4. Drizzle ORM Integration

10.4.1. Custom tsvector Type

10.4.2. Schema with Generated Column + GIN Index

10.4.3. Query Patterns with Drizzle

10.4.4. Migration in BANA (IGNIS Pattern)

10.5. Practical Architecture: Multi-Layer Search

10.6. Impact on Original Decision

11. Comparison with Alternatives Not Selected

12. Related Documentation

13. References

Vector Databases