ADR-0001. Multi-tenancy isolation tiers (Pool / Bridge / Silo)

Field	Value
Status	Draft (Proposed)
Date	2026-05-22
Deciders	Phat Nguyen
Scope	Cross-cutting - `@nx/core` datasource, all services, deployment
Supersedes	-
Product context	Multi-Tenancy Strategy (PRD)

Context

Problem

BANA runs a single shared database (nx_seller) with a single static connection pool per service (PostgresCoreDataSource calls new Pool() once). Tenants are separated only by merchantId / organizerId filters applied in repository queries - i.e. the Pool model.
We need to support stronger isolation (own DB, own stack) for some Orgs while keeping Pool cheap for the majority. We must also be able to move an Org between models, including the hard direction: Silo → Pool consolidation.

Trigger

Long-term deployment & operations planning. Tenant count and contract diversity are growing; committing to a single global model now would be expensive to undo later.

Current state (AS-IS)

Aspect	Today
DB isolation	One shared `nx_seller`, one static pool/service
Tenant column	`merchantId` (primary), `organizerId` (parent) on ~52 tables
Isolation enforcement	Application-level query filters; no RLS, no schema-per-tenant
Tenant resolution	JWT claims (`organizers[]`, `merchants[]`) → request context filter
Routing	Token-based; not subdomain/host-based
IDs	Snowflake (globally unique) via `IdGenerator`
Deployment	K8s on VNPAY Cloud (Kustomize, separate staging/prod clusters); Traefik gateway

Decision

Adopt a hybrid, tiered model where the isolation level is a per-Org runtime attribute, not a platform-wide constant. The isolation unit is the Organizer (Org).

Tier	Services	Database	Default for
POOL	Shared	Shared `nx_seller`, `organizerId` filter	All Orgs (default)
BRIDGE	Shared	One DB per Org	Orgs needing data isolation
SILO	Dedicated stack	One DB per Org	Enterprise / on-prem

Three platform mechanisms make this possible - all land in @nx/core, not in business code:

Tenant Registry - table orgId → { isolationTier, datasourceRef }. Every Org defaults to POOL.
Connection Resolver - evolve PostgresCoreDataSource from one static pool into a per-tenant pool resolver that reads the registry and caches connections. This is the single blocking change; everything else builds on it.
Snowflake WorkerId allocator - centrally assign worker/node IDs so independently-running silos never mint colliding IDs (formalizes the existing APP_ENV_NODE_ID convention, see payment ADR-0002).
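
As a rough illustration of how the registry and resolver could fit together, here is a minimal in-memory sketch. All names in it (`TenantRegistry`, `ConnectionResolver`, `PoolLike`, `SHARED_DATASOURCE`) are hypothetical and not taken from `@nx/core`. The registry is a plain map rather than a table, and the pool factory is injected so the sketch does not depend on a database driver.

```typescript
// Hypothetical sketch: none of these names are taken from @nx/core.
type IsolationTier = "POOL" | "BRIDGE" | "SILO";

interface TenantRecord {
  isolationTier: IsolationTier;
  datasourceRef: string; // assumed to be a key into datasource configuration
}

// Minimal stand-in for a real connection pool object.
interface PoolLike {
  end(): void;
}

const SHARED_DATASOURCE = "nx_seller";
const DEFAULT_RECORD: TenantRecord = {
  isolationTier: "POOL",
  datasourceRef: SHARED_DATASOURCE,
};

class TenantRegistry {
  private readonly records = new Map<string, TenantRecord>();

  set(orgId: string, record: TenantRecord): void {
    this.records.set(orgId, record);
  }

  // Orgs without an explicit entry default to POOL on the shared database.
  get(orgId: string): TenantRecord {
    return this.records.get(orgId) ?? DEFAULT_RECORD;
  }
}

class ConnectionResolver<P extends PoolLike> {
  private readonly pools = new Map<string, P>();
  private readonly registry: TenantRegistry;
  private readonly createPool: (datasourceRef: string) => P;

  constructor(registry: TenantRegistry, createPool: (datasourceRef: string) => P) {
    this.registry = registry;
    this.createPool = createPool;
  }

  // Pools are cached per datasource, so all POOL Orgs share one pool.
  resolve(orgId: string): P {
    const { datasourceRef } = this.registry.get(orgId);
    let pool = this.pools.get(datasourceRef);
    if (pool === undefined) {
      pool = this.createPool(datasourceRef);
      this.pools.set(datasourceRef, pool);
    }
    return pool;
  }

  // Close and drop a cached pool, e.g. after a registry flip.
  evict(datasourceRef: string): void {
    const pool = this.pools.get(datasourceRef);
    if (pool !== undefined) {
      pool.end();
      this.pools.delete(datasourceRef);
    }
  }
}
```

In a real service the injected factory would presumably construct the actual Postgres pool for the given `datasourceRef`. Caching by datasource rather than by Org keeps the number of pools equal to the number of databases, not the number of tenants.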

Migration directionality

Direction	Difficulty	Mechanism
POOL → SILO (extract)	Easy	Filtered copy by `organizerId` (logical replication with a publication row filter on PostgreSQL 15+, or `COPY (SELECT ... WHERE ...) TO`), cutover, flip registry
SILO → POOL (merge)	Hard but enabled	Import preserving Snowflake IDs, verify no orphan FKs, flip registry, lock old silo
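
The "verify no orphan FKs" step of a merge can be pictured as a set-membership check over the imported rows. The sketch below is only an illustration under assumed names: `ChildRow` and `findOrphans` are hypothetical, IDs are modelled as strings, and a real check would more likely run as SQL against the target database.

```typescript
// Hypothetical row shape: real tables carry their own foreign-key columns.
interface ChildRow {
  id: string; // Snowflake ID serialized as a string
  parentId: string | null; // foreign key to a parent row, if any
}

// Returns the child rows whose parentId is set but absent from the known parent IDs.
function findOrphans(children: ChildRow[], parentIds: Iterable<string>): ChildRow[] {
  const known = new Set(parentIds);
  return children.filter((row) => row.parentId !== null && !known.has(row.parentId));
}

const importedParents = ["1001", "1002"];
const importedChildren: ChildRow[] = [
  { id: "2001", parentId: "1001" },
  { id: "2002", parentId: null },
  { id: "2003", parentId: "9999" }, // parent was never imported
];

const orphans = findOrphans(importedChildren, importedParents);
console.log(orphans.map((row) => row.id)); // [ '2003' ]
```

A non-empty result would be a reason to stop before flipping the registry.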

Why merge is safe here: Snowflake IDs are globally unique as long as worker IDs never overlap, so importing a silo's rows into the shared tables causes no primary-key collisions. Such collisions are what make merging auto-increment-keyed databases nearly impossible. Per-merchant voucher sequences are scoped by merchantId, so human-readable numbers don't collide across Orgs either.
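
To make the uniqueness argument concrete, the sketch below composes IDs from a timestamp, a worker ID, and a sequence number. The 41/10/12 bit split is an assumption (the classic Snowflake layout); the layout used by `IdGenerator` may differ. `composeSnowflake`, `extractWorkerId`, and `WorkerIdAllocator` are hypothetical names used only for illustration.

```typescript
// Assumed bit layout (classic Snowflake); the real IdGenerator layout may differ.
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_WORKER_ID = (1n << WORKER_BITS) - 1n; // 1023
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

// Packs timestamp, worker ID, and sequence into one integer.
function composeSnowflake(timestampMs: bigint, workerId: bigint, sequence: bigint): bigint {
  if (workerId < 0n || workerId > MAX_WORKER_ID) {
    throw new RangeError("workerId out of range");
  }
  if (sequence < 0n || sequence > MAX_SEQUENCE) {
    throw new RangeError("sequence out of range");
  }
  return (
    (timestampMs << (WORKER_BITS + SEQUENCE_BITS)) |
    (workerId << SEQUENCE_BITS) |
    sequence
  );
}

// Recovers the worker ID bits from a composed ID.
function extractWorkerId(id: bigint): bigint {
  return (id >> SEQUENCE_BITS) & MAX_WORKER_ID;
}

// Hypothetical central allocator: each node name gets exactly one worker ID.
class WorkerIdAllocator {
  private readonly assigned = new Map<string, bigint>();
  private next = 0n;

  allocate(nodeName: string): bigint {
    const existing = this.assigned.get(nodeName);
    if (existing !== undefined) {
      return existing; // idempotent per node
    }
    if (this.next > MAX_WORKER_ID) {
      throw new Error("worker ID space exhausted");
    }
    const id = this.next;
    this.next += 1n;
    this.assigned.set(nodeName, id);
    return id;
  }
}
```

Two nodes that mint an ID in the same millisecond with the same sequence number still produce different values, because their worker ID bits differ. That holds only while no two nodes share a worker ID, which is why allocation has to be central.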

Consequences

Pros	Cons
Isolation becomes a per-Org dial, not a rewrite	Connection Resolver adds complexity to the data layer hot path
New Orgs onboard instantly (POOL default)	Need a tenant-aware connection cache (lifecycle, eviction)
Snowflake IDs make Silo→Pool merge feasible	WorkerId allocation must be centrally governed or merges break
Business/repository code is unchanged across tiers	Schema migrations must fan out across N databases (Bridge/Silo)
Fits existing K8s/Kustomize + Traefik topology	Bridge/Silo provisioning needs automation before they scale
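
The migration fan-out listed among the cons can be sketched as two small steps: derive the distinct set of databases from the registry, then apply the migration to each and record the outcome per database. The names `migrationTargets`, `fanOut`, and `MigrationResult` are hypothetical, and the runner is synchronous only to keep the sketch short; a real runner would be asynchronous.

```typescript
type IsolationTier = "POOL" | "BRIDGE" | "SILO";

interface TenantRecord {
  orgId: string;
  isolationTier: IsolationTier;
  datasourceRef: string;
}

interface MigrationResult {
  datasourceRef: string;
  ok: boolean;
  error?: string;
}

// Each distinct datasource appears once, in first-seen order, so the shared
// database is migrated a single time however many POOL Orgs point at it.
function migrationTargets(records: TenantRecord[]): string[] {
  return Array.from(new Set(records.map((record) => record.datasourceRef)));
}

// Applies the migration to every target and reports per-database outcomes.
// A failure on one database does not stop the remaining ones.
function fanOut(targets: string[], migrate: (datasourceRef: string) => void): MigrationResult[] {
  return targets.map((datasourceRef): MigrationResult => {
    try {
      migrate(datasourceRef);
      return { datasourceRef, ok: true };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { datasourceRef, ok: false, error };
    }
  });
}
```

Collecting results instead of aborting on the first error is one possible design choice; it leaves a clear list of databases that still need the migration.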

Alternatives Considered

Option	Pros	Cons	Why rejected
Stay Pool-only	Simplest ops	Can't isolate, can't go on-prem	Loses enterprise/compliance customers
Go Silo-only	Max isolation	Highest cost, slow onboarding, expensive merges	Wrong for mass-market SMB POS
Pick one global tier now	Simple decision	Lock-in; migration tax paid later under pressure	Premature; needs are still being explored
Subdomain/host-based tenant routing	Common SaaS pattern	Requires reworking client URL + token model	Token-based resolution already works; no need
Postgres RLS instead of app filters	DB-enforced isolation	Large migration of every query path	Out of scope for this decision; tracked as open question

Open questions

Target market (SMB vs enterprise) - leaning hybrid, still exploring.
Compliance / data-residency / on-prem mandates - undetermined; would make SILO + portable Helm chart mandatory.
Is BRIDGE a permanent tier or only transitional to SILO?
Adopt RLS or keep application-level filtering for POOL?

Done when

[ ] Tenant Registry exists; every Org has an isolationTier (default POOL).
[ ] PostgresCoreDataSource resolves a connection per tenant context from the registry.
[ ] Snowflake worker IDs are centrally allocated (no two nodes share one).
[ ] A documented runbook exists for split (Pool→Silo) and merge (Silo→Pool).
[ ] Status promoted from Draft → Accepted once market & compliance questions are answered.

References

Multi-Tenancy Strategy (PRD)
packages/core/src/datasources/postgres-core.datasource.ts - the static pool to evolve
packages/core/src/utilities/request.utility.ts - current tenant-context extraction
Payment ADR-0002 - Snowflake NODE_ID partitioning precedent
AWS SaaS Lens - Pool / Bridge / Silo isolation patterns
