ADR-0001. Multi-tenancy isolation tiers (Pool / Bridge / Silo)

| Field | Value |
|---|---|
| Status | Draft (Proposed) |
| Date | 2026-05-22 |
| Deciders | Phat Nguyen, Architecture |
| Scope | Cross-cutting: @nx/core datasource, all services, deployment |
| Supersedes | |
| Product context | Multi-Tenancy Strategy (PRD) |

Context

Problem

  • BANA runs a single shared database (nx_seller) with a single static connection pool per service (PostgresCoreDataSource calls new Pool() once). Tenants are separated only by merchantId / organizerId filters applied in repository queries — i.e. the Pool model.
  • We need to support stronger isolation (own DB, own stack) for some Orgs while keeping Pool cheap for the majority, and we must be able to move an Org between models, including the hard direction: Silo → Pool consolidation.

Trigger

  • Long-term deployment & operations planning. Tenant count and contract diversity are growing; committing to a single global model now would be expensive to undo later.

Current state (AS-IS)

| Aspect | Today |
|---|---|
| DB isolation | One shared nx_seller, one static pool per service |
| Tenant column | merchantId (primary), organizerId (parent) on ~52 tables |
| Isolation enforcement | Application-level query filters; no RLS, no schema-per-tenant |
| Tenant resolution | JWT claims (organizers[], merchants[]) → request context filter |
| Routing | Token-based; not subdomain/host-based |
| IDs | Snowflake (globally unique) via IdGenerator |
| Deployment | K8s on VNPAY Cloud (Kustomize, separate staging/prod clusters); Traefik gateway |
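
To make the AS-IS enforcement concrete, the following is a minimal sketch of application-level tenant filtering. The names `TenantClaims`, `SqlFragment`, and `buildTenantFilter`, and the `orders` table, are hypothetical and chosen for illustration; the real extraction logic lives in request.utility.ts and the repositories, and may look different.

```typescript
// Hypothetical shape of the tenant claims carried by the JWT.
interface TenantClaims {
  organizers: string[];
  merchants: string[];
}

// A parameterized SQL fragment plus the values bound to its placeholders.
interface SqlFragment {
  text: string;
  values: unknown[];
}

// Builds a WHERE fragment that restricts rows to the caller's merchants.
// Throws when the token grants no merchants, so a missing claim can never
// degrade into an unfiltered query.
function buildTenantFilter(claims: TenantClaims, firstParamIndex = 1): SqlFragment {
  if (claims.merchants.length === 0) {
    throw new Error("token grants access to no merchants");
  }
  return {
    text: `"merchantId" = ANY($${firstParamIndex})`,
    values: [claims.merchants],
  };
}

// Usage: every repository query has to splice the fragment in by hand.
const filter = buildTenantFilter({ organizers: ["org-1"], merchants: ["m-1", "m-2"] });
const query = `SELECT * FROM orders WHERE ${filter.text}`;
```

Because each query path must remember to apply the fragment, isolation in this model rests on code discipline rather than on anything the database enforces.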

Decision

Adopt a hybrid, tiered model where isolation level is a per-Org runtime attribute, not a platform-wide constant. The isolation unit is the Organizer.

| Tier | Services | Database | Default for |
|---|---|---|---|
| POOL | Shared | Shared nx_seller, organizerId filter | All Orgs (default) |
| BRIDGE | Shared | One DB per Org | Orgs needing data isolation |
| SILO | Dedicated stack | One DB per Org | Enterprise / on-prem |

Three platform mechanisms make this possible — all land in @nx/core, not in business code:

  1. Tenant Registry — table orgId → { isolationTier, datasourceRef }. Every Org defaults to POOL.
  2. Connection Resolver — evolve PostgresCoreDataSource from one static pool into a per-tenant pool resolver that reads the registry and caches connections. This is the single blocking change; everything else builds on it.
  3. Snowflake WorkerId allocator — centrally assign worker/node IDs so independently-running silos never mint colliding IDs (formalizes the existing APP_ENV_NODE_ID convention, see payment ADR-0002).
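
The first two mechanisms can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: `TenantRegistry`, `ConnectionResolver`, and `PoolLike` are hypothetical names, the registry is held in memory rather than in a table, and `PoolLike` stands in for a real connection pool. The cache is keyed by `datasourceRef`, so all POOL Orgs share one pool, and a simple LRU bound hints at the eviction concern.

```typescript
type IsolationTier = "POOL" | "BRIDGE" | "SILO";

interface TenantRecord {
  isolationTier: IsolationTier;
  datasourceRef: string; // key into datasource configuration (assumed format)
}

// Stand-in for a real connection pool; hypothetical interface.
interface PoolLike {
  readonly dsn: string;
  end(): void;
}

const SHARED_REF = "nx_seller";

class TenantRegistry {
  private readonly records = new Map<string, TenantRecord>();

  set(orgId: string, record: TenantRecord): void {
    this.records.set(orgId, record);
  }

  // Orgs without an explicit record default to the shared POOL datasource.
  lookup(orgId: string): TenantRecord {
    return this.records.get(orgId) ?? { isolationTier: "POOL", datasourceRef: SHARED_REF };
  }
}

class ConnectionResolver {
  // Map keeps insertion order, which gives a simple LRU:
  // re-insert on hit, evict the first key when full.
  private readonly cache = new Map<string, PoolLike>();
  private readonly registry: TenantRegistry;
  private readonly createPool: (datasourceRef: string) => PoolLike;
  private readonly maxPools: number;

  constructor(registry: TenantRegistry, createPool: (datasourceRef: string) => PoolLike, maxPools = 50) {
    this.registry = registry;
    this.createPool = createPool;
    this.maxPools = maxPools;
  }

  resolve(orgId: string): PoolLike {
    const { datasourceRef } = this.registry.lookup(orgId);
    const cached = this.cache.get(datasourceRef);
    if (cached) {
      this.cache.delete(datasourceRef);
      this.cache.set(datasourceRef, cached); // mark as most recently used
      return cached;
    }
    if (this.cache.size >= this.maxPools) {
      const oldestRef = this.cache.keys().next().value;
      if (oldestRef !== undefined) {
        this.cache.get(oldestRef)?.end(); // close the least recently used pool
        this.cache.delete(oldestRef);
      }
    }
    const pool = this.createPool(datasourceRef);
    this.cache.set(datasourceRef, pool);
    return pool;
  }
}

// Usage: two POOL Orgs share one pool; the SILO Org gets its own.
const registry = new TenantRegistry();
registry.set("org-enterprise", { isolationTier: "SILO", datasourceRef: "silo_org_enterprise" });

let poolsCreated = 0;
const resolver = new ConnectionResolver(registry, (ref) => {
  poolsCreated += 1;
  return { dsn: ref, end: () => {} };
});

const sharedA = resolver.resolve("org-small-1");
const sharedB = resolver.resolve("org-small-2");
const dedicated = resolver.resolve("org-enterprise");
```

A production resolver would likely pin the shared pool so it is never evicted and would drain in-flight queries before closing a pool; both are left out here for brevity.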

Migration directionality

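| Direction | Difficulty | Mechanism |
|---|---|---|
| POOL → SILO (extract) | Easy | Filtered copy by organizerId (logical replication with a publication row filter on PostgreSQL 15+, or a per-table COPY of a filtered SELECT, since pg_dump cannot filter rows), cutover, flip registry |
| SILO → POOL (merge) | Hard but enabled | Import preserving Snowflake IDs, verify no orphan FKs, flip registry, lock old silo |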
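
The verification step of a merge can be sketched as a pre-flight check over the rows exported from the silo. This is an in-memory illustration with hypothetical names (`ImportedRow`, `MergeReport`, `checkMerge`); a real runbook would more likely express the same two checks as SQL anti-joins against a staging schema.

```typescript
interface ImportedRow {
  id: string;        // Snowflake ID, kept as a string to avoid 53-bit precision loss
  parentId?: string; // optional foreign key to a parent row
}

interface MergeReport {
  pkCollisions: string[]; // incoming IDs that already exist in the shared table
  orphanRows: string[];   // incoming rows whose parent is missing after the merge
}

// existingIds: primary keys already present in the shared table.
// validParentIds: parent keys that will exist after the merge (shared plus imported).
function checkMerge(
  existingIds: Set<string>,
  validParentIds: Set<string>,
  incoming: ImportedRow[],
): MergeReport {
  const report: MergeReport = { pkCollisions: [], orphanRows: [] };
  for (const row of incoming) {
    if (existingIds.has(row.id)) {
      report.pkCollisions.push(row.id);
    }
    if (row.parentId !== undefined && !validParentIds.has(row.parentId)) {
      report.orphanRows.push(row.id);
    }
  }
  return report;
}

// Usage: one clean row, one PK collision, one orphan.
const report = checkMerge(
  new Set(["100", "101"]),
  new Set(["900"]),
  [
    { id: "200", parentId: "900" },
    { id: "101", parentId: "900" },
    { id: "201", parentId: "999" },
  ],
);
const safeToFlipRegistry = report.pkCollisions.length === 0 && report.orphanRows.length === 0;
```

The registry flip would only proceed when both lists are empty.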

Why merge is safe here: Snowflake IDs are globally unique, so importing a silo's rows into shared tables causes no primary-key collision, provided every node minted its IDs under a distinct worker ID. Auto-increment keys lack this property, which is why merging auto-increment-based databases is nearly impossible. Per-merchant voucher sequences are scoped by merchantId, so human-readable numbers don't collide across Orgs either.
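
The uniqueness argument depends on worker IDs never being shared, which a small sketch can illustrate. The bit layout below (10-bit worker, 12-bit sequence, timestamp in the remaining high bits) is the classic Snowflake layout and is an assumption; the layout used by IdGenerator may differ. `WorkerIdAllocator` and `composeId` are hypothetical names.

```typescript
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_WORKER_ID = (1n << WORKER_BITS) - 1n;  // 1023
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

// Packs timestamp, worker ID, and sequence into one integer (assumed layout).
function composeId(timestampMs: bigint, workerId: bigint, sequence: bigint): bigint {
  if (workerId < 0n || workerId > MAX_WORKER_ID) throw new RangeError("workerId out of range");
  if (sequence < 0n || sequence > MAX_SEQUENCE) throw new RangeError("sequence out of range");
  return (timestampMs << (WORKER_BITS + SEQUENCE_BITS)) | (workerId << SEQUENCE_BITS) | sequence;
}

// Central allocator: each node name gets a stable worker ID that no other node holds.
class WorkerIdAllocator {
  private readonly leases = new Map<string, bigint>();

  allocate(nodeName: string): bigint {
    const existing = this.leases.get(nodeName);
    if (existing !== undefined) return existing;
    // IDs are never released in this sketch, so the lease count is the next free ID.
    const next = BigInt(this.leases.size);
    if (next > MAX_WORKER_ID) throw new Error("worker ID space exhausted");
    this.leases.set(nodeName, next);
    return next;
  }
}

// Usage: a shared-stack node and a silo node mint an ID in the same
// millisecond with the same sequence number, and the IDs still differ.
const allocator = new WorkerIdAllocator();
const poolWorker = allocator.allocate("shared-stack-0");
const siloWorker = allocator.allocate("silo-org-enterprise-0");

const sameInstant = 1000n;
const idFromPool = composeId(sameInstant, poolWorker, 0n);
const idFromSilo = composeId(sameInstant, siloWorker, 0n);
```

If two nodes were ever given the same worker ID, the two calls above would return the same value, which is exactly the collision a later Silo → Pool merge could not tolerate.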

Consequences

| Pros | Cons |
|---|---|
| Isolation becomes a per-Org dial, not a rewrite | Connection Resolver adds complexity to the data layer hot path |
| New Orgs onboard instantly (POOL default) | Need a tenant-aware connection cache (lifecycle, eviction) |
| Snowflake IDs make Silo→Pool merge feasible | WorkerId allocation must be centrally governed or merges break |
| Business/repository code is unchanged across tiers | Schema migrations must fan out across N databases (Bridge/Silo) |
| Fits existing K8s/Kustomize + Traefik topology | Bridge/Silo provisioning needs automation before they scale |
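
The migration fan-out cost can be made concrete with a short sketch. It assumes a registry snapshot is available as a list of entries; `RegistryEntry`, `distinctDatasources`, and `fanOutMigration` are hypothetical names, and the runner is synchronous only for brevity (a real one would be asynchronous and call the actual migration tool).

```typescript
type IsolationTier = "POOL" | "BRIDGE" | "SILO";

interface RegistryEntry {
  orgId: string;
  isolationTier: IsolationTier;
  datasourceRef: string;
}

interface FanOutResult {
  applied: string[];
  failed: { datasourceRef: string; reason: string }[];
}

// Collapses the registry into the set of databases that need the migration.
// All POOL Orgs map to the single shared datasource.
function distinctDatasources(entries: RegistryEntry[], sharedRef: string): string[] {
  const refs = new Set<string>([sharedRef]);
  for (const entry of entries) {
    refs.add(entry.isolationTier === "POOL" ? sharedRef : entry.datasourceRef);
  }
  return Array.from(refs);
}

// Applies the migration to every datasource and records failures instead of
// stopping, so one unreachable silo does not hide the state of the others.
function fanOutMigration(refs: string[], applyMigration: (ref: string) => void): FanOutResult {
  const result: FanOutResult = { applied: [], failed: [] };
  for (const ref of refs) {
    try {
      applyMigration(ref);
      result.applied.push(ref);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      result.failed.push({ datasourceRef: ref, reason });
    }
  }
  return result;
}

// Usage: four Orgs collapse to three databases; one of them is unreachable.
const entries: RegistryEntry[] = [
  { orgId: "org-1", isolationTier: "POOL", datasourceRef: "nx_seller" },
  { orgId: "org-2", isolationTier: "POOL", datasourceRef: "nx_seller" },
  { orgId: "org-3", isolationTier: "BRIDGE", datasourceRef: "bridge_org_3" },
  { orgId: "org-4", isolationTier: "SILO", datasourceRef: "silo_org_4" },
];
const refs = distinctDatasources(entries, "nx_seller");
const outcome = fanOutMigration(refs, (ref) => {
  if (ref === "silo_org_4") throw new Error("connection refused");
});
```

The failure list is what an operator would act on: databases left on the old schema version need a retry before the next release assumes the new schema everywhere.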

Alternatives Considered

| Option | Pros | Cons | Why rejected |
|---|---|---|---|
| Stay Pool-only | Simplest ops | Can't isolate, can't go on-prem | Loses enterprise/compliance customers |
| Go Silo-only | Max isolation | Highest cost, slow onboarding, expensive merges | Wrong for mass-market SMB POS |
| Pick one global tier now | Simple decision | Lock-in; migration tax paid later under pressure | Premature; needs are still being explored |
| Subdomain/host-based tenant routing | Common SaaS pattern | Requires reworking client URL + token model | Token-based resolution already works; no need |
| Postgres RLS instead of app filters | DB-enforced isolation | Large migration of every query path | Out of scope for this decision; tracked as open question |

Open questions

  • Target market (SMB vs enterprise) — leaning hybrid, still exploring.
  • Compliance / data-residency / on-prem mandates — undetermined; would make SILO + portable Helm chart mandatory.
  • Is BRIDGE a permanent tier or only transitional to SILO?
  • Adopt RLS or keep application-level filtering for POOL?

Done when

  • [ ] Tenant Registry exists; every Org has an isolationTier (default POOL).
  • [ ] PostgresCoreDataSource resolves a connection per tenant context from the registry.
  • [ ] Snowflake worker IDs are centrally allocated (no two nodes share one).
  • [ ] A documented runbook exists for split (Pool→Silo) and merge (Silo→Pool).
  • [ ] Status promoted from Draft → Accepted once market & compliance questions are answered.

References

  • Multi-Tenancy Strategy (PRD)
  • packages/core/src/datasources/postgres-core.datasource.ts — the static pool to evolve
  • packages/core/src/utilities/request.utility.ts — current tenant-context extraction
  • Payment ADR-0002 — Snowflake NODE_ID partitioning precedent
  • AWS SaaS Lens — Pool / Bridge / Silo isolation patterns
