Data Layer

The data layer runs on the stateful node pool in the VNPAY Cloud Kubernetes cluster. All workloads use the csi-sc-vnpaycloud StorageClass (Cinder CSI on OpenStack) with allowVolumeExpansion: true, meaning PVCs can be expanded online without downtime.

Architecture Overview

Component Summary

Component	Type	Instances	Image	Namespace	Storage
PostgreSQL	StatefulSet (primary + replica)	2 (1 primary + 1 replica)	`postgres:17-alpine`	`nx-persistent`	20Gi each
PgBouncer	Deployment	1	`ghcr.io/icoretech/pgbouncer-docker:1.24.1`	`nx-persistent`	-
Redis	StatefulSet (cluster mode)	3 masters, 0 replicas	`redis:8-alpine`	`nx-broker`	5Gi each
Kafka	StatefulSet (KRaft + SASL)	3 brokers	`apache/kafka:4.1.1`	`nx-broker`	10Gi each
Typesense	StatefulSet (Raft cluster)	3	`typesense/typesense:30.1`	`nx-search`	5Gi each
Debezium	Deployment	1	`debezium/connect:3.0.0.Final`	`nx-search`	-

PostgreSQL

Architecture: Primary + Replica with PgBouncer

PostgreSQL runs as two separate StatefulSets - one primary (read-write) and one replica (read-only, hot standby) - with a PgBouncer Deployment in front for connection pooling. The replica streams WAL from the primary using native PostgreSQL streaming replication.

Primary StatefulSet

The init container handles first-run database initialization: it creates the nx_seller_operator app user and the nx_seller_core database, enables the pg_stat_statements and pgcrypto extensions, creates repl_user for streaming replication, and writes the pg_hba.conf entries for replication and SCRAM authentication.

StatefulSet: nx-postgresql-primary

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-postgresql-primary
  namespace: nx-persistent
spec:
  serviceName: nx-postgresql-primary-headless
  replicas: 1
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 70
        runAsGroup: 70
        fsGroup: 70
        seccompProfile:
          type: RuntimeDefault
      initContainers:
        - name: init-primary
          image: postgres:17-alpine
          # Creates: nx_seller_operator user, nx_seller_core DB,
          # pg_stat_statements + pgcrypto extensions, repl_user for replication,
          # pg_hba.conf entries for replication + SCRAM auth
      containers:
        - name: postgresql
          image: postgres:17-alpine
          ports:
            - containerPort: 5432
          args:
            - -c
            - wal_level=logical
            - -c
            - max_wal_senders=10
            - -c
            - max_replication_slots=10
            - -c
            - hot_standby=on
            - -c
            - shared_buffers=512MB
            - -c
            - effective_cache_size=1536MB
            - -c
            - work_mem=8MB
            - -c
            - maintenance_work_mem=128MB
            - -c
            - max_connections=100
            - -c
            - random_page_cost=1.1
            - -c
            - effective_io_concurrency=200
            - -c
            - log_statement=ddl
            - -c
            - log_min_duration_statement=1000
            - -c
            - password_encryption=scram-sha-256
            - -c
            - timezone=Asia/Ho_Chi_Minh
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 20Gi
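
The memory flags in the manifest above can be sanity-checked against the container limit. The sketch below is a back-of-the-envelope estimate, not PostgreSQL's real accounting: it assumes one work_mem allocation per connection, although a single query may use several. The function name is illustrative.

```bash
# Rough peak memory estimate for the primary, in MiB.
# Inputs are copied from the StatefulSet args; the formula is a simplifying assumption.
pg_memory_budget_mib() {
  local shared_buffers="$1" work_mem="$2" max_connections="$3" maintenance_work_mem="$4"
  echo $(( shared_buffers + work_mem * max_connections + maintenance_work_mem ))
}

budget=$(pg_memory_budget_mib 512 8 100 128)
limit=4096   # container memory limit of 4Gi, in MiB
echo "estimated peak: ${budget}MiB of ${limit}MiB limit"
```

Under these assumptions the estimate is above the 1Gi request but well below the 4Gi limit.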

Replica StatefulSet

On first run, the replica's init container clones the primary with pg_basebackup. The replica then starts in hot standby mode and streams WAL continuously from the primary, which keeps replication lag near zero.

StatefulSet: nx-postgresql-replica

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-postgresql-replica
  namespace: nx-persistent
spec:
  serviceName: nx-postgresql-replica-headless
  replicas: 1
  template:
    spec:
      initContainers:
        - name: init-replica
          image: postgres:17-alpine
          # pg_basebackup from primary on first run, standby.signal on restart
      containers:
        - name: postgresql
          image: postgres:17-alpine
          args:
            - -c
            - hot_standby=on
            - -c
            - hot_standby_feedback=on
            - -c
            - shared_buffers=512MB
            - -c
            - max_connections=100
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 20Gi
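
Replication lag can be expressed in bytes as the distance between two WAL positions (LSNs). The helper below is a minimal sketch of that subtraction; the function names and the sample LSNs are made up for illustration.

```bash
# Convert a PostgreSQL LSN such as 0/3000060 (two hex halves) to a byte offset.
lsn_to_bytes() {
  local hi="${1%%/*}" lo="${1##*/}"
  echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Lag in bytes: primary write position minus replica replay position.
wal_lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Sample values for illustration only.
wal_lag_bytes 0/3000060 0/3000000
```

The two inputs would come from SELECT pg_current_wal_lsn() on the primary and SELECT pg_last_wal_replay_lsn() on the replica. PostgreSQL's own pg_wal_lsn_diff() performs the same subtraction server-side.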

PgBouncer Connection Pooling

PgBouncer runs as a Deployment in front of the primary. Backend services connect to nx-pgbouncer:5432 (mapped to PgBouncer's internal port 6432). It runs in transaction pooling mode with a max_client_conn of 200 and default_pool_size of 20.

PgBouncer configuration (from ConfigMap)

ini

# PgBouncer configuration (from ConfigMap)
[databases]
nx_seller_core = host=nx-postgresql-primary.nx-persistent.svc.cluster.local port=5432 dbname=nx_seller_core

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
max_db_connections = 50

The init container generates userlist.txt from Kubernetes secrets (app user + superuser). The service exposes port 5432 externally, mapping to 6432 internally - so backend services use the same port they would for a direct PostgreSQL connection.
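
As an illustration of the userlist.txt step, the sketch below prints entries in PgBouncer's auth file format, one quoted user/password pair per line. It is not the actual init script: the environment variable names are hypothetical, and it assumes embedded double quotes are escaped by doubling them.

```bash
# Print one userlist.txt entry in the form "user" "password".
# Assumption: a double quote inside a field is written as two double quotes.
userlist_entry() {
  local user="$1" password="$2"
  user=$(printf '%s' "$user" | sed 's/"/""/g')
  password=$(printf '%s' "$password" | sed 's/"/""/g')
  printf '"%s" "%s"\n' "$user" "$password"
}

# APP_PASSWORD and SUPERUSER_PASSWORD are hypothetical names for values the
# init container would read from the two Secrets; defaults keep the sketch runnable.
userlist_entry nx_seller_operator "${APP_PASSWORD:-example-app-pw}"
userlist_entry postgres "${SUPERUSER_PASSWORD:-example-super-pw}"
```

Redirecting the output to the file PgBouncer's auth_file setting points at would complete the step.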

Services

Service	DNS	Port	Purpose
`nx-pgbouncer`	`nx-pgbouncer.nx-persistent.svc.cluster.local`	5432	Pooled connection to primary (used by all backend services)
`nx-postgresql-primary`	`nx-postgresql-primary.nx-persistent.svc.cluster.local`	5432	Direct primary (used by PgBouncer, port-forward)
`nx-postgresql-replica`	`nx-postgresql-replica.nx-persistent.svc.cluster.local`	5432	Direct replica (read-only, future read splitting)
`nx-postgresql-primary-headless`	(pod DNS)	5432	StatefulSet stable DNS
`nx-postgresql-replica-headless`	(pod DNS)	5432	StatefulSet stable DNS

Default Connection Path

Backend services connect to nx-pgbouncer.nx-persistent.svc.cluster.local:5432 (pooled). This is configured in shared-config.yaml as APP_ENV_POSTGRES_HOST. Direct primary/replica services exist for admin access and future read/write splitting.

Users & Secrets

User	Purpose	Secret
`postgres`	Superuser	`nx-postgresql-superuser-secret`
`nx_seller_operator`	Application user (CREATEDB)	`nx-postgresql-app-secret`
`repl_user`	Streaming replication	`nx-postgresql-replication-secret`

Database Schemas

BANA uses 14 schemas in the nx_seller_core database:

Schema	Services
`public`	identity, commerce (shared tables)
`helpdesk`	helpdesk
`pricing`	pricing
`allocation`	commerce (stock allocation)
`inventory`	inventory
`finance`	finance
`ledger`	ledger
`payment`	payment
`sale`	sale
`invoice`	invoice
`identity`	identity
`tax`	taxation
`licensing`	licensing
`outreach`	outreach

Port-Forward for Local Access

bash

# From infrastructure/deployments/staging/ directory:
./kc port-forward svc/nx-postgresql-primary 5432:5432 -n nx-persistent

# From project root:
infrastructure/deployments/staging/kc port-forward svc/nx-postgresql-primary 5432:5432 -n nx-persistent

# Then connect locally:
psql -h localhost -U nx_seller_operator -d nx_seller_core

Redis

Cluster Mode (3 Masters, 0 Replicas)

Redis runs as a 3-node cluster with 16384 hash slots distributed across 3 master nodes. No replicas are configured (staging does not need per-master redundancy). The cluster provides data sharding across nodes.

StatefulSet

StatefulSet: nx-redis

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-redis
  namespace: nx-broker
spec:
  serviceName: nx-redis-headless
  replicas: 3
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        fsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: redis
          image: redis:8-alpine
          command:
            - redis-server
            - --requirepass
            - $(REDIS_PASSWORD)
            - --masterauth
            - $(REDIS_PASSWORD)
            - --cluster-enabled
            - "yes"
            - --cluster-config-file
            - /data/nodes.conf
            - --cluster-node-timeout
            - "5000"
            - --appendonly
            - "yes"
            - --appendfsync
            - everysec
            - --maxmemory
            - 256mb
            - --maxmemory-policy
            - noeviction
          ports:
            - containerPort: 6379
              name: redis
            - containerPort: 16379
              name: cluster-bus
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nx-redis-secret
                  key: REDIS_PASSWORD
          resources:
            requests:
              cpu: 100m
              memory: 384Mi
            limits:
              cpu: "1"
              memory: 512Mi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 5Gi

Cluster Initialization

After all 3 pods are running, a one-time Job creates the cluster:

Job: nx-redis-cluster-init

yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: nx-redis-cluster-init
  namespace: nx-broker
spec:
  ttlSecondsAfterFinished: 600
  backoffLimit: 3
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node.kubernetes.io/pool: default
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: cluster-init
          image: redis:8-alpine
          command:
            - sh
            - -c
            - |
              set -e
              export REDISCLI_AUTH="$REDIS_PASSWORD"
              HEADLESS="nx-redis-headless.nx-broker.svc.cluster.local"
              NODES=""
              for i in 0 1 2; do
                HOST="nx-redis-${i}.${HEADLESS}"
                until [ "$(redis-cli -h "$HOST" ping 2>/dev/null)" = "PONG" ]; do sleep 2; done
                NODES="${NODES} ${HOST}:6379"
              done
              echo "yes" | redis-cli --cluster create ${NODES} --cluster-replicas 0
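
Once the Job has finished, the result can be checked from the output of redis-cli cluster info. The function below is a sketch that reads that output on stdin and tests three fields; the function name is illustrative and the sample input is hand-written.

```bash
# Succeed only if the cluster reports state ok, all 16384 slots assigned,
# and exactly 3 known nodes.
cluster_is_healthy() {
  local info
  info=$(tr -d '\r')   # strip carriage returns in case lines end in CRLF
  printf '%s\n' "$info" | grep -qx 'cluster_state:ok' &&
  printf '%s\n' "$info" | grep -qx 'cluster_slots_assigned:16384' &&
  printf '%s\n' "$info" | grep -qx 'cluster_known_nodes:3'
}

# Hand-written sample input piped through the check.
printf 'cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_known_nodes:3\r\n' |
  cluster_is_healthy && echo "cluster healthy"
```

Against the real cluster, the input would be the output of redis-cli -h nx-redis-0.nx-redis-headless.nx-broker.svc.cluster.local cluster info, with REDISCLI_AUTH exported as in the Job.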

PodSecurity Restricted

To satisfy the restricted Pod Security Standard enforced on all namespaces, the init Job's pod must set runAsNonRoot: true and seccompProfile.type: RuntimeDefault, and its container must set allowPrivilegeEscalation: false and capabilities.drop: [ALL].

Services

Service	DNS	Port	Purpose
`nx-redis-headless`	`nx-redis-headless.nx-broker.svc.cluster.local`	6379, 16379	Headless - pod-level DNS for cluster nodes
`nx-redis`	`nx-redis.nx-broker.svc.cluster.local`	6379, 16379	ClusterIP - load-balanced (initial discovery)

Individual pod addresses used by backend services:

nx-redis-0.nx-redis-headless.nx-broker.svc.cluster.local:6379
nx-redis-1.nx-redis-headless.nx-broker.svc.cluster.local:6379
nx-redis-2.nx-redis-headless.nx-broker.svc.cluster.local:6379

Backend Connection Config

Backend services connect in cluster mode via shared-config.yaml:

Cache

yaml

# Cache
APP_ENV_CACHE_REDIS_MODE: cluster
APP_ENV_CACHE_REDIS_CLUSTER_NODES: "nx-redis-0.nx-redis-headless.nx-broker.svc.cluster.local:6379,nx-redis-1.nx-redis-headless.nx-broker.svc.cluster.local:6379,nx-redis-2.nx-redis-headless.nx-broker.svc.cluster.local:6379"

# BullMQ
APP_ENV_BULLMQ_REDIS_MODE: cluster
APP_ENV_BULLMQ_REDIS_CLUSTER_NODES: "nx-redis-0....:6379,nx-redis-1....:6379,nx-redis-2....:6379"

# WebSocket
APP_ENV_WEBSOCKET_REDIS_MODE: cluster
APP_ENV_WEBSOCKET_REDIS_CLUSTER_NODES: "nx-redis-0....:6379,nx-redis-1....:6379,nx-redis-2....:6379"

Redis Cluster vs. Logical Databases

Redis Cluster mode does not support SELECT <db> (logical databases). All data lives in DB 0, partitioned by hash slots across the 3 masters. Key prefixes (e.g., cache:, bullmq:, ws:) are used instead of separate databases.
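
To make the partitioning concrete, the sketch below computes the hash slot of a key the way Redis Cluster defines it: CRC16 of the key, modulo 16384. It is a simplified illustration that handles ASCII keys only and ignores hash tags (the {…} syntax); the function names are illustrative.

```bash
# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), of an ASCII string.
crc16() {
  local s="$1" crc=0 rest c byte j
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    byte=$(printf '%d' "'$c")
    crc=$(( crc ^ (byte << 8) ))
    j=0
    while [ "$j" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      j=$(( j + 1 ))
    done
    s=$rest
  done
  echo "$crc"
}

# Hash slot of a key: CRC16 modulo the 16384 slots of the cluster.
key_slot() { echo $(( $(crc16 "$1") % 16384 )); }

key_slot 123456789   # prints 12739
```

Because the whole key is hashed, keys sharing a prefix such as cache: still spread across all three masters. CLUSTER KEYSLOT <key> returns the same value from a live node.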

Kafka

KRaft StatefulSet with SASL/SCRAM-SHA-512

Apache Kafka 4.1.1 runs in KRaft mode (no ZooKeeper) with 3 combined broker+controller nodes. Backend services authenticate via SASL/SCRAM-SHA-512 on the CLIENT listener.

Listeners

Listener	Port	Protocol	Purpose
`INTERNAL`	29092	PLAINTEXT	Inter-broker communication (secured by NetworkPolicy)
`CLIENT`	9092	SASL_PLAINTEXT (SCRAM-SHA-512)	Backend services connect here
`CONTROLLER`	9093	PLAINTEXT	KRaft quorum consensus

StatefulSet

The init container generates server.properties dynamically (setting node.id from the pod ordinal), removes lost+found from ext4 PVCs (Kafka rejects it), and runs kafka-storage.sh format on first boot.

StatefulSet: nx-kafka

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-kafka
  namespace: nx-broker
spec:
  serviceName: nx-kafka-headless
  replicas: 3
  podManagementPolicy: Parallel
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      initContainers:
        - name: kafka-init-config
          image: apache/kafka:4.1.1
          # Generates server.properties, removes lost+found,
          # runs kafka-storage.sh format on first run
      containers:
        - name: kafka
          image: apache/kafka:4.1.1
          command:
            - /opt/kafka/bin/kafka-server-start.sh
            - /config/server.properties
          env:
            - name: KAFKA_OPTS
              value: "-Djava.security.auth.login.config=/etc/kafka-jaas/kafka_server_jaas.conf"
          ports:
            - containerPort: 29092
              name: internal
            - containerPort: 9092
              name: client
            - containerPort: 9093
              name: controller
          resources:
            requests:
              cpu: 500m
              memory: 1280Mi
            limits:
              cpu: "1.5"
              memory: 2Gi
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
            - name: config
              mountPath: /config
            - name: jaas
              mountPath: /etc/kafka-jaas
              readOnly: true
      volumes:
        - name: config
          emptyDir: { sizeLimit: 8Mi }
        - name: jaas
          secret:
            secretName: nx-kafka-jaas
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 10Gi

Key Configuration (from generated `server.properties`)

properties

# Replication
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
default.replication.factor=3
min.insync.replicas=2
num.partitions=3

# Retention: 7 days or 1 GiB per partition, whichever is reached first
log.retention.hours=168
log.retention.bytes=1073741824
log.segment.bytes=1073741824

# Auto-create topics
auto.create.topics.enable=false
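
The retention and replication settings above translate into a disk footprint per broker. The arithmetic below is a rough upper bound for one topic created with the defaults, under a stated assumption: a partition can hold up to log.retention.bytes of closed segments plus one active segment of log.segment.bytes. The function name is illustrative.

```bash
# Rough per-broker disk bound for one topic, in GiB.
# Assumption: each partition replica holds at most retention + one active segment.
topic_disk_per_broker_gib() {
  local partitions="$1" replication_factor="$2" brokers="$3" retention_gib="$4" segment_gib="$5"
  local replicas_per_broker=$(( partitions * replication_factor / brokers ))
  echo $(( replicas_per_broker * (retention_gib + segment_gib) ))
}

# 3 partitions, replication factor 3, 3 brokers, 1 GiB retention, 1 GiB segments.
topic_disk_per_broker_gib 3 3 3 1 1
```

Under that assumption a single busy topic could occupy up to about 6 GiB of each broker's 10Gi volume, which is worth keeping in mind when adding topics or changing retention.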

SASL/SCRAM User Initialization

A one-time Job creates the SCRAM-SHA-512 user nx.staging. It connects through the INTERNAL listener, which is PLAINTEXT and therefore usable before any SASL credentials exist:

Job: nx-kafka-scram-init

yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: nx-kafka-scram-init
  namespace: nx-broker
spec:
  template:
    spec:
      containers:
        - name: scram-init
          image: apache/kafka:4.1.1
          command:
            - sh
            - -c
            - |
              BOOTSTRAP="nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:29092"
              # Wait for Kafka, then:
              /opt/kafka/bin/kafka-configs.sh \
                --bootstrap-server "$BOOTSTRAP" \
                --alter \
                --add-config "SCRAM-SHA-512=[iterations=8192,password=${KAFKA_SASL_PASSWORD}]" \
                --entity-type users \
                --entity-name nx.staging

The JAAS config is stored in Secret nx-kafka-jaas and mounted at /etc/kafka-jaas/kafka_server_jaas.conf. The broker loads it through the java.security.auth.login.config JVM option set in KAFKA_OPTS.

Services

Service	DNS	Ports	Notes
`nx-kafka-headless`	`nx-kafka-headless.nx-broker.svc.cluster.local`	29092, 9092, 9093	Headless, `publishNotReadyAddresses: true` (critical for KRaft bootstrap)
`nx-kafka`	`nx-kafka.nx-broker.svc.cluster.local`	29092, 9092	ClusterIP - load-balanced

publishNotReadyAddresses

The headless service must have publishNotReadyAddresses: true. Without it, pods cannot discover each other during initial KRaft bootstrap because they are not yet "ready".

Backend Connection Config

From shared-config.yaml

yaml

# From shared-config.yaml
APP_ENV_KAFKA_BROKERS: nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:9092,nx-kafka-1.nx-kafka-headless.nx-broker.svc.cluster.local:9092,nx-kafka-2.nx-kafka-headless.nx-broker.svc.cluster.local:9092
APP_ENV_KAFKA_SASL_ENABLE: "true"
APP_ENV_KAFKA_SASL_MECHANISM: SCRAM-SHA-512
APP_ENV_KAFKA_SASL_USERNAME: nx.staging
# Password in nx-shared-secret: APP_ENV_KAFKA_SASL_PASSWORD
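
For manual checks with the Kafka command-line tools, the same settings can be expressed as a client properties file. The sketch below prints one for the CLIENT listener; the function name and the placeholder password are illustrative, and the real password lives in nx-shared-secret.

```bash
# Print Kafka client properties for SASL_PLAINTEXT with SCRAM-SHA-512.
kafka_client_properties() {
  local username="$1" password="$2"
  cat <<EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="${username}" password="${password}";
EOF
}

# The default value is a placeholder so the sketch runs without the Secret.
kafka_client_properties nx.staging "${APP_ENV_KAFKA_SASL_PASSWORD:-changeme}"
```

Saved to a file, the output can be passed to tools such as kafka-topics.sh with --command-config, together with --bootstrap-server pointing at port 9092 of a broker.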

Typesense

3-Node Raft Cluster

Typesense runs as a 3-node Raft cluster for high availability. Writes go to the elected leader and replicate to followers; reads are served by any node. The cluster tolerates 1 node down (quorum = 2/3).

Typesense holds all data in memory and replicates the full dataset to every node - it does not shard, so each node must fit the whole index in RAM. The index is derived: if lost entirely it re-indexes from PostgreSQL via CDC, so the cluster is for availability, not durability.

Property	Value
Nodes	3 (1 leader + 2 followers)
Failure tolerance	1 node (quorum 2/3)
API port	8108
Peering port (Raft)	8107 - internal only, never published
Peer discovery	`nodes` file, one `<peering-host>:8107:8108` per node
Recovery	self-heal (leader re-streams to a blank node) or re-index from PostgreSQL

Peer discovery uses a nodes file mounted from a ConfigMap (nx-typesense-nodes) listing all pod FQDNs. The governing Service is headless (clusterIP: None, publishNotReadyAddresses: true) so each pod gets a stable DNS name for Raft peering before it is Ready. --reset-peers-on-error lets the cluster self-heal when pod IPs change. Pods spread across hosts via required pod anti-affinity (needs ≥3 stateful nodes).

StatefulSet: nx-typesense (3 replicas)

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-typesense
  namespace: nx-search
spec:
  serviceName: nx-typesense          # headless governing service
  replicas: 3
  podManagementPolicy: Parallel
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: typesense
              topologyKey: kubernetes.io/hostname
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: typesense
          image: typesense/typesense:30.1
          args:
            - --data-dir=/data
            - --api-port=8108
            - --peering-port=8107
            - --nodes=/etc/typesense/nodes
            - --api-key=$(TYPESENSE_API_KEY)
            - --enable-cors
            - --reset-peers-on-error
          ports:
            - containerPort: 8108
              name: http
            - containerPort: 8107
              name: peering
          env:
            - name: TYPESENSE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: nx-typesense-secret
                  key: TYPESENSE_API_KEY
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
          volumeMounts:
            - name: data
              mountPath: /data
            - name: nodes
              mountPath: /etc/typesense
              readOnly: true
      volumes:
        - name: nodes
          configMap:
            name: nx-typesense-nodes
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 5Gi

The nx-typesense-nodes ConfigMap holds the Raft peer list (one line, comma-separated):

nx-typesense-0.nx-typesense.nx-search.svc.cluster.local:8107:8108,nx-typesense-1.nx-typesense.nx-search.svc.cluster.local:8107:8108,nx-typesense-2.nx-typesense.nx-search.svc.cluster.local:8107:8108
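
The peer list follows directly from the StatefulSet name, its headless Service, the namespace, and the replica count. The helper below is a sketch that derives it; the function name is illustrative.

```bash
# Build a comma-separated peer list for a StatefulSet behind a headless Service.
# The fifth argument is appended to each pod FQDN, e.g. ":8107:8108" for Typesense.
statefulset_peers() {
  local name="$1" service="$2" namespace="$3" replicas="$4" suffix="$5"
  local i=0 out=""
  while [ "$i" -lt "$replicas" ]; do
    out="${out}${out:+,}${name}-${i}.${service}.${namespace}.svc.cluster.local${suffix}"
    i=$(( i + 1 ))
  done
  echo "$out"
}

statefulset_peers nx-typesense nx-typesense nx-search 3 :8107:8108
```

The same helper reproduces the Redis node list when called as statefulset_peers nx-redis nx-redis-headless nx-broker 3 :6379.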

Service

Service	DNS	Ports	Notes
`nx-typesense` (headless)	`nx-typesense-{0,1,2}.nx-typesense.nx-search.svc.cluster.local`	8108 (http), 8107 (peering)	Governs the StatefulSet; gives stable per-pod DNS

Client configuration

Clients list all three pod FQDNs so the Typesense client load-balances and fails over automatically:

bash

APP_ENV_TYPESENSE_NODES="http:nx-typesense-0.nx-typesense.nx-search.svc.cluster.local:8108,http:nx-typesense-1.nx-typesense.nx-search.svc.cluster.local:8108,http:nx-typesense-2.nx-typesense.nx-search.svc.cluster.local:8108"

Develop (Docker Compose)

Develop mirrors the same topology as a 3-node Docker Compose cluster: containers nx-typesense-1/2/3, host ports 38118/38128/38138, peering port 8107, a shared nodes file mounted by absolute path, and image typesense/typesense:31.0.rc3. Because it runs on a single host, it provides Raft replication and rolling restarts but not host-failure HA.

To verify quorum, run curl -s localhost:38118/debug -H 'x-typesense-api-key: <key>' against each node (ports 38118, 38128, 38138). Exactly one node should report {"state":1} (leader) and the other two {"state":4} (followers).
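
The quorum rule can be written as a small check. The sketch below takes the state value reported by each node and succeeds only for one leader plus two followers; the function name is illustrative, and collecting the values from the three /debug responses is left to the caller.

```bash
# Succeed only when the given states contain exactly one leader (1)
# and exactly two followers (4).
quorum_ok() {
  local leaders=0 followers=0 state
  for state in "$@"; do
    case $state in
      1) leaders=$(( leaders + 1 )) ;;
      4) followers=$(( followers + 1 )) ;;
    esac
  done
  [ "$leaders" -eq 1 ] && [ "$followers" -eq 2 ]
}

quorum_ok 4 1 4 && echo "quorum healthy"
```

Any other combination, such as two leaders or a node in a different state, makes the function return a non-zero status.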

Debezium

CDC Connector

Debezium runs as a Kafka Connect instance in nx-search, streaming change data capture (CDC) events from PostgreSQL to Kafka topics. This enables real-time search index updates in Typesense.

Deployment

Property	Value
Image	`debezium/connect:3.0.0.Final`
Namespace	`nx-search`
Replicas	1
Port	8083

The nx-debezium-init Job registers the PostgreSQL source connector after the Debezium instance is ready.
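
As an illustration of what such a registration involves, the sketch below prints a connector definition for the Kafka Connect REST API. It is not the actual Job: the connector name, topic.prefix, and the choice of database user are hypothetical placeholders, and only the hostnames and database name come from this page.

```bash
# Print a Debezium PostgreSQL source connector definition as JSON.
# "nx-postgres-source" and the "nx" topic prefix are hypothetical values.
connector_json() {
  local db_user="$1" db_password="$2"
  cat <<EOF
{
  "name": "nx-postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "nx-postgresql-primary.nx-persistent.svc.cluster.local",
    "database.port": "5432",
    "database.user": "${db_user}",
    "database.password": "${db_password}",
    "database.dbname": "nx_seller_core",
    "topic.prefix": "nx"
  }
}
EOF
}

# Placeholder credentials keep the sketch runnable.
connector_json example_user example_password
```

The JSON would be sent with an HTTP POST to http://nx-debezium.nx-search.svc.cluster.local:8083/connectors using Content-Type: application/json. The sketch points at the primary directly rather than PgBouncer, on the assumption that logical replication needs a direct connection.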

Service

Service	DNS	Port
`nx-debezium`	`nx-debezium.nx-search.svc.cluster.local`	8083

Network Policies

All namespaces have default-deny-all (both ingress and egress) with explicit allow rules.

Deny Defaults

Applied to: nx-backend, nx-app, nx-persistent, nx-broker, nx-search, nx-internal.

NetworkPolicy: default-deny-all

yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Allow Rules

Policy	Namespace	Direction	From/To	Ports
`allow-dns`	All 6 namespaces	Egress	`kube-system`	53 (UDP+TCP)
`allow-egress-to-postgresql`	`nx-backend`	Egress	`nx-persistent`	5432
`allow-egress-to-broker`	`nx-backend`	Egress	`nx-broker`	6379, 16379, 9092, 29092
`allow-egress-to-search`	`nx-backend`	Egress	`nx-search`	8108
`allow-egress-to-backend`	`nx-backend`	Egress	`nx-backend`	3000 (inter-service)
`allow-egress-to-s3`	`nx-backend`	Egress	External (0.0.0.0/0 excl. private)	443 (commerce, ledger only)
`allow-from-backend`	`nx-broker`	Ingress	`nx-backend` + `nx-broker`	6379, 16379, 9092, 29092, 9093
`allow-broker-internal`	`nx-broker`	Egress	`nx-broker`	6379, 16379, 9092, 29092, 9093
`allow-pg-replication`	`nx-persistent`	Both	PG pods within `nx-persistent`	5432
`allow-pgbouncer-to-primary`	`nx-persistent`	Egress	PgBouncer → PG primary	5432
`allow-pgbouncer-from-backend`	`nx-persistent`	Ingress	`nx-backend` → PgBouncer	6432
`allow-from-backend`	`nx-persistent`	Ingress	`nx-backend`	5432, 6432
`allow-from-backend`	`nx-search`	Ingress	`nx-backend`	8108
`allow-search-internal`	`nx-search`	Both	Within `nx-search`	8083, 8108
`egress-nx-search`	`nx-search`	Egress	`nx-persistent`, `nx-broker`	5432, 9092
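
To show what one of these rules looks like, the sketch below generates an allow-dns manifest for each of the six namespaces. It is an approximation, not the deployed policy: selecting kube-system through the kubernetes.io/metadata.name label is an assumption, and the function name is illustrative.

```bash
# Print an allow-dns NetworkPolicy for one namespace.
# Assumption: kube-system is matched by its automatic metadata.name label.
allow_dns_policy() {
  local namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}

for ns in nx-backend nx-app nx-persistent nx-broker nx-search nx-internal; do
  allow_dns_policy "$ns"
  echo "---"
done
```

The combined output is a multi-document YAML stream that could be piped to ./kc apply -f - after review.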

Backup & Recovery

Service	Method	Frequency	Retention	Recovery
PostgreSQL	Streaming replication (replica)	Continuous WAL	Online replica	Promote replica
Redis	AOF persistence (`appendonly yes`, `appendfsync everysec`)	Continuous	On-disk	Restart from AOF
Kafka	Log retention	Automatic	7 days / 1GB per partition	Replay from retained logs
Typesense	Re-index from PostgreSQL	On-demand	N/A	Re-index

Staging Data Strategy

Staging data can be recreated from migration seeds. The primary+replica setup provides read availability and a warm standby, but there is no off-cluster backup to S3 at the staging level. For production-grade backup (CNPG with barman, S3 WAL archiving, PITR), see the Production section below.

Volume Expansion

All PVCs use StorageClass csi-sc-vnpaycloud which has allowVolumeExpansion: true.

For StatefulSets (Redis, Kafka, Typesense, PostgreSQL)

bash

# Example: expand Redis PVC from 5Gi to 10Gi
# From infrastructure/deployments/staging/ directory:
./kc patch pvc data-nx-redis-0 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
./kc patch pvc data-nx-redis-1 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
./kc patch pvc data-nx-redis-2 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'

# From project root:
infrastructure/deployments/staging/kc patch pvc data-nx-redis-0 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'

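The Cinder CSI driver supports online expansion - no pod restart needed. Patching the PVCs does not change the StatefulSet's volumeClaimTemplates (that field is immutable), so pods added by a later scale-up still get the original size.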
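
The per-PVC commands can be generated rather than typed out. The sketch below builds the PVC names and the patch body, then prints the commands as a dry run instead of executing them; the function names are illustrative.

```bash
# PVC names created from a volumeClaimTemplate: <template>-<statefulset>-<ordinal>.
pvc_names() {
  local template="$1" statefulset="$2" replicas="$3" i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${template}-${statefulset}-${i}"
    i=$(( i + 1 ))
  done
}

# Patch body requesting a new size.
resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}\n' "$1"
}

# Dry run: print the commands rather than running them.
for pvc in $(pvc_names data nx-redis 3); do
  echo ./kc patch pvc "$pvc" -n nx-broker -p "'$(resize_patch 10Gi)'"
done
```

Removing the echo runs the patches for real. Afterwards, ./kc get pvc -n nx-broker shows the new capacity once the resize completes.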

Resource Summary

Staging (stateful node pool)

Workload	Pods	CPU req	CPU lim	Mem req	Mem lim	Storage
PG Primary	1	500m	2	1Gi	4Gi	20Gi
PG Replica	1	250m	1	512Mi	2Gi	20Gi
PgBouncer	1	100m	500m	256Mi	512Mi	-
Redis (x3)	3	300m	3	1.125Gi	1.5Gi	15Gi
Kafka (x3)	3	1.5	4.5	3.75Gi	6Gi	30Gi
Typesense (x3)	3	600m	3	1.5Gi	6Gi	15Gi
Debezium	1	250m	1	512Mi	2Gi	-
Total	13	3.5	15	8.625Gi	22Gi	100Gi

Production (Planned)

Not Yet Deployed

The production data layer is planned but not yet deployed. The following describes the target architecture. Actual manifests will be created when production infrastructure is provisioned.

Component	Planned Production Setup
PostgreSQL	CloudNativePG operator (1 primary + 2 replicas), PgBouncer sidecar, continuous WAL archiving to S3, 30d retention, PITR
Redis	Redis Sentinel or Cluster with replicas (1 master + 2 replicas per shard)
Kafka	Strimzi operator (3 brokers, rack-aware, declarative KafkaTopic/KafkaUser CRDs, Cruise Control)
Typesense	3-node Raft cluster with automatic leader election
Failover	Automatic: <30s for PG (CNPG), <15s for Redis (Sentinel), built-in for Kafka (ISR)
Backups	Continuous WAL + daily base to S3 (PG), AOF + RDB to S3 (Redis), log retention + replication (Kafka)
