ADR-0001. Multi-tenancy isolation tiers (Pool / Bridge / Silo)
| Field | Value |
|---|---|
| Status | Draft (Proposed) |
| Date | 2026-05-22 |
| Deciders | Phat Nguyen, Architecture |
| Scope | Cross-cutting — @nx/core datasource, all services, deployment |
| Supersedes | — |
| Product context | Multi-Tenancy Strategy (PRD) |
Context
Problem
- BANA runs a single shared database (`nx_seller`) with a single static connection pool per service (`PostgresCoreDataSource` calls `new Pool()` once). Tenants are separated only by `merchantId`/`organizerId` filters applied in repository queries, i.e. the Pool model.
- We need to support stronger isolation (own DB, own stack) for some Orgs while keeping Pool cheap for the majority, and be able to move an Org between models, including the hard direction: Silo → Pool consolidation.
Trigger
- Long-term deployment & operations planning. Tenant count and contract diversity are growing; committing to a single global model now would be expensive to undo later.
Current state (AS-IS)
| Aspect | Today |
|---|---|
| DB isolation | One shared nx_seller, one static pool/service |
| Tenant column | merchantId (primary), organizerId (parent) on ~52 tables |
| Isolation enforcement | Application-level query filters; no RLS, no schema-per-tenant |
| Tenant resolution | JWT claims (organizers[], merchants[]) → request context filter |
| Routing | Token-based; not subdomain/host-based |
| IDs | Snowflake (globally unique) via IdGenerator |
| Deployment | K8s on VNPAY Cloud (Kustomize, separate staging/prod clusters); Traefik gateway |
Decision
Adopt a hybrid, tiered model where isolation level is a per-Org runtime attribute, not a platform-wide constant. The isolation unit is the Organizer.
| Tier | Services | Database | Default for |
|---|---|---|---|
| POOL | Shared | Shared nx_seller, organizerId filter | All Orgs (default) |
| BRIDGE | Shared | One DB per Org | Orgs needing data isolation |
| SILO | Dedicated stack | One DB per Org | Enterprise / on-prem |
Three platform mechanisms make this possible — all land in @nx/core, not in business code:
- Tenant Registry: a table mapping `orgId → { isolationTier, datasourceRef }`. Every Org defaults to `POOL`.
- Connection Resolver: evolve `PostgresCoreDataSource` from one static pool into a per-tenant pool resolver that reads the registry and caches connections. This is the single blocking change; everything else builds on it.
- Snowflake WorkerId allocator: centrally assign worker/node IDs so independently running silos never mint colliding IDs (formalizes the existing `APP_ENV_NODE_ID` convention, see payment ADR-0002).
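The first two mechanisms can be sketched together. This is a minimal illustration under stated assumptions, not the `@nx/core` design: `TenantRecord`, `PoolLike`, `TenantConnectionResolver`, and the pool-factory callback are hypothetical names, and the registry is an in-memory map standing in for the registry table.

```typescript
// Hypothetical names throughout; none of these are the actual @nx/core API.
type IsolationTier = "POOL" | "BRIDGE" | "SILO";

interface TenantRecord {
  isolationTier: IsolationTier;
  datasourceRef: string; // assumed to be a lookup key, not a raw connection string
}

// Minimal stand-in for a connection pool object.
interface PoolLike {
  end(): Promise<void>;
}

class TenantConnectionResolver<P extends PoolLike> {
  private readonly pools = new Map<string, P>();
  private readonly registry: ReadonlyMap<string, TenantRecord>;
  private readonly sharedRef: string;
  private readonly createPool: (datasourceRef: string) => P;

  constructor(
    registry: ReadonlyMap<string, TenantRecord>,
    sharedRef: string,
    createPool: (datasourceRef: string) => P,
  ) {
    this.registry = registry;
    this.sharedRef = sharedRef;
    this.createPool = createPool;
  }

  // Unregistered Orgs and POOL Orgs share one pool; BRIDGE/SILO Orgs get their own.
  resolve(orgId: string): P {
    const record = this.registry.get(orgId);
    const ref =
      record === undefined || record.isolationTier === "POOL"
        ? this.sharedRef
        : record.datasourceRef;

    let pool = this.pools.get(ref);
    if (pool === undefined) {
      pool = this.createPool(ref); // created lazily, once per datasource
      this.pools.set(ref, pool);
    }
    return pool;
  }
}
```

Because the resolver keys its cache by datasource rather than by Org, every POOL Org reuses the same shared pool, so the default tier costs no extra connections.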
Migration directionality
| Direction | Difficulty | Mechanism |
|---|---|---|
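| POOL → SILO (extract) | Easy | Filtered copy by organizerId (logical replication with row filters on PostgreSQL 15+, or per-table COPY (SELECT ...) TO exports; pg_dump has no row-level --where filter), cutover, flip registry |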
| SILO → POOL (merge) | Hard but enabled | Import preserving Snowflake IDs, verify no orphan FKs, flip registry, lock old silo |
Why merge is safe here: Snowflake IDs are globally unique as long as no two nodes share a worker ID, so importing a silo's rows into shared tables causes no PK collision. PK collisions are exactly what make auto-increment-based merges nearly impossible. Per-merchant voucher sequences are scoped by merchantId, so human-readable numbers don't collide across Orgs either.
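The uniqueness argument can be illustrated with a short sketch. It assumes the classic Snowflake layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence) and a made-up epoch; the layout used by `IdGenerator` may differ, and `composeSnowflake`, `workerIdOf`, and `WorkerIdAllocator` are hypothetical names.

```typescript
// Assumed layout: 41-bit timestamp | 10-bit worker | 12-bit sequence.
// The real IdGenerator layout and epoch may differ.
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_WORKER_ID = (1n << WORKER_BITS) - 1n; // 1023
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095
const EPOCH_MS = 1577836800000n; // 2020-01-01T00:00:00Z, a hypothetical custom epoch

function composeSnowflake(timestampMs: bigint, workerId: bigint, sequence: bigint): bigint {
  if (timestampMs < EPOCH_MS) throw new RangeError("timestamp precedes epoch");
  if (workerId < 0n || workerId > MAX_WORKER_ID) throw new RangeError("workerId out of range");
  if (sequence < 0n || sequence > MAX_SEQUENCE) throw new RangeError("sequence out of range");
  return (
    ((timestampMs - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)) |
    (workerId << SEQUENCE_BITS) |
    sequence
  );
}

// Extracts the worker ID bits back out of an ID.
function workerIdOf(id: bigint): bigint {
  return (id >> SEQUENCE_BITS) & MAX_WORKER_ID;
}

// Hypothetical central allocator: each node name gets a worker ID nobody else holds.
class WorkerIdAllocator {
  private readonly leases = new Map<string, bigint>(); // nodeName -> workerId

  allocate(nodeName: string): bigint {
    const existing = this.leases.get(nodeName);
    if (existing !== undefined) return existing; // idempotent per node
    const taken = new Set(this.leases.values());
    for (let candidate = 0n; candidate <= MAX_WORKER_ID; candidate++) {
      if (!taken.has(candidate)) {
        this.leases.set(nodeName, candidate);
        return candidate;
      }
    }
    throw new Error("worker ID space exhausted");
  }
}
```

Two nodes that mint an ID in the same millisecond with the same sequence number still produce different IDs, because the worker bits differ. That holds only while allocation stays central, which is why the allocator is a precondition for merging.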
Consequences
| Pros | Cons |
|---|---|
| Isolation becomes a per-Org dial, not a rewrite | Connection Resolver adds complexity to the data layer hot path |
| New Orgs onboard instantly (POOL default) | Need a tenant-aware connection cache (lifecycle, eviction) |
| Snowflake IDs make Silo→Pool merge feasible | WorkerId allocation must be centrally governed or merges break |
| Business/repository code is unchanged across tiers | Schema migrations must fan out across N databases (Bridge/Silo) |
| Fits existing K8s/Kustomize + Traefik topology | Bridge/Silo provisioning needs automation before they scale |
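The cache lifecycle and eviction concern listed above could be handled with a bounded, least-recently-used pool cache. The sketch below is one possible shape, not a decided design: `PoolCache`, `ClosablePool`, and the `maxPools` limit are hypothetical, and a production version would also need to drain in-flight queries before closing a pool.

```typescript
// Hypothetical stand-in for a pool that can be shut down.
interface ClosablePool {
  end(): Promise<void>;
}

class PoolCache<P extends ClosablePool> {
  // Map preserves insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, P>();
  private readonly maxPools: number;

  constructor(maxPools: number) {
    if (maxPools < 1) throw new RangeError("maxPools must be at least 1");
    this.maxPools = maxPools;
  }

  get size(): number {
    return this.entries.size;
  }

  has(datasourceRef: string): boolean {
    return this.entries.has(datasourceRef);
  }

  getOrCreate(datasourceRef: string, create: () => P): P {
    const cached = this.entries.get(datasourceRef);
    if (cached !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.entries.delete(datasourceRef);
      this.entries.set(datasourceRef, cached);
      return cached;
    }
    const pool = create();
    this.entries.set(datasourceRef, pool);
    if (this.entries.size > this.maxPools) this.evictOldest();
    return pool;
  }

  private evictOldest(): void {
    const oldestKey = this.entries.keys().next().value;
    if (oldestKey === undefined) return;
    const oldest = this.entries.get(oldestKey);
    this.entries.delete(oldestKey);
    // Close in the background and ignore close errors in this sketch.
    void oldest?.end().catch(() => {});
  }
}
```

In practice the shared pool would probably be pinned outside such a cache, since evicting it would affect every POOL Org at once.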
Alternatives Considered
| Option | Pros | Cons | Why rejected |
|---|---|---|---|
| Stay Pool-only | Simplest ops | Can't isolate, can't go on-prem | Loses enterprise/compliance customers |
| Go Silo-only | Max isolation | Highest cost, slow onboarding, expensive merges | Wrong for mass-market SMB POS |
| Pick one global tier now | Simple decision | Lock-in; migration tax paid later under pressure | Premature; needs are still being explored |
| Subdomain/host-based tenant routing | Common SaaS pattern | Requires reworking client URL + token model | Token-based resolution already works; no need |
| Postgres RLS instead of app filters | DB-enforced isolation | Large migration of every query path | Out of scope for this decision; tracked as open question |
Open questions
- Target market (SMB vs enterprise) — leaning hybrid, still exploring.
- Compliance / data-residency / on-prem mandates — undetermined; would make SILO + portable Helm chart mandatory.
- Is BRIDGE a permanent tier or only transitional to SILO?
- Adopt RLS or keep application-level filtering for POOL?
Done when
- [ ] Tenant Registry exists; every Org has an `isolationTier` (default `POOL`).
- [ ] `PostgresCoreDataSource` resolves a connection per tenant context from the registry.
- [ ] Snowflake worker IDs are centrally allocated (no two nodes share one).
- [ ] A documented runbook exists for split (Pool→Silo) and merge (Silo→Pool).
- [ ] Status promoted from Draft → Accepted once market & compliance questions are answered.
References
- Multi-Tenancy Strategy (PRD)
- `packages/core/src/datasources/postgres-core.datasource.ts`: the static pool to evolve
- `packages/core/src/utilities/request.utility.ts`: current tenant-context extraction
- Payment ADR-0002: Snowflake `NODE_ID` partitioning precedent
- AWS SaaS Lens: Pool / Bridge / Silo isolation patterns