Data Layer

The data layer runs on the stateful node pool of the VNPAY Cloud Kubernetes cluster. Every persistent volume uses the csi-sc-vnpaycloud StorageClass (Cinder CSI on OpenStack), which sets allowVolumeExpansion: true, so PVCs can be expanded online without downtime.

Architecture Overview

Component Summary

| Component | Type | Instances | Image | Namespace | Storage |
|---|---|---|---|---|---|
| PostgreSQL | StatefulSet (primary + replica) | 2 (1 primary + 1 replica) | postgres:17-alpine | nx-persistent | 20Gi each |
| PgBouncer | Deployment | 1 | ghcr.io/icoretech/pgbouncer-docker:1.24.1 | nx-persistent | - |
| Redis | StatefulSet (cluster mode) | 3 masters, 0 replicas | redis:7-alpine | nx-broker | 5Gi each |
| Kafka | StatefulSet (KRaft + SASL) | 3 brokers | apache/kafka:4.1.1 | nx-broker | 10Gi each |
| Typesense | StatefulSet | 1 | typesense/typesense:27.1 | nx-search | 5Gi |
| Debezium | Deployment | 1 | debezium/connect:3.0.0.Final | nx-search | - |

PostgreSQL

Architecture: Primary + Replica with PgBouncer

PostgreSQL runs as two separate StatefulSets, one primary (read-write) and one replica (read-only hot standby), with a PgBouncer Deployment in front of the primary for connection pooling. The replica streams WAL from the primary using native PostgreSQL streaming replication.

Primary StatefulSet

The init container handles first-run database initialization: it creates the nx_seller_operator application user and the nx_seller_core database, enables the pg_stat_statements and pgcrypto extensions, creates repl_user for streaming replication, and adds pg_hba.conf entries for replication and SCRAM authentication.
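
The first-run steps described above could be sketched roughly as follows. This is an illustration only: the actual init script is not shown on this page, and the function name, the database owner, and the psql variable names app_password and repl_password are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: prints the first-run SQL for the primary.
# Passwords are passed as psql variables (:'name') instead of being inlined.
# Note: pg_stat_statements only records data when the library is also
# listed in shared_preload_libraries.
render_init_sql() {
  cat <<'SQL'
CREATE ROLE nx_seller_operator LOGIN CREATEDB PASSWORD :'app_password';
CREATE DATABASE nx_seller_core OWNER nx_seller_operator;
CREATE ROLE repl_user LOGIN REPLICATION PASSWORD :'repl_password';
\connect nx_seller_core
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
SQL
}

# Possible usage (APP_PASSWORD and REPL_PASSWORD are assumed to come from the secrets):
#   render_init_sql | psql -v ON_ERROR_STOP=1 -U postgres \
#     -v app_password="$APP_PASSWORD" -v repl_password="$REPL_PASSWORD"
```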

StatefulSet: nx-postgresql-primary
yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-postgresql-primary
  namespace: nx-persistent
spec:
  serviceName: nx-postgresql-primary-headless
  replicas: 1
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 70
        runAsGroup: 70
        fsGroup: 70
        seccompProfile:
          type: RuntimeDefault
      initContainers:
        - name: init-primary
          image: postgres:17-alpine
          # Creates: nx_seller_operator user, nx_seller_core DB,
          # pg_stat_statements + pgcrypto extensions, repl_user for replication,
          # pg_hba.conf entries for replication + SCRAM auth
      containers:
        - name: postgresql
          image: postgres:17-alpine
          ports:
            - containerPort: 5432
          args:
            - -c
            - wal_level=logical
            - -c
            - max_wal_senders=10
            - -c
            - max_replication_slots=10
            - -c
            - hot_standby=on
            - -c
            - shared_buffers=512MB
            - -c
            - effective_cache_size=1536MB
            - -c
            - work_mem=8MB
            - -c
            - maintenance_work_mem=128MB
            - -c
            - max_connections=100
            - -c
            - random_page_cost=1.1
            - -c
            - effective_io_concurrency=200
            - -c
            - log_statement=ddl
            - -c
            - log_min_duration_statement=1000
            - -c
            - password_encryption=scram-sha-256
            - -c
            - timezone=Asia/Ho_Chi_Minh
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 20Gi

Replica StatefulSet

The replica init container clones the primary with pg_basebackup on first run. The pod then starts in hot standby mode and streams WAL continuously from the primary, which keeps replication lag low.
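
A minimal sketch of that init logic, assuming the data directory is passed as an argument and the replication password is supplied through PGPASSWORD. The function names and the PRIMARY_HOST variable are hypothetical; the real script may differ.

```shell
#!/bin/sh
# Hypothetical sketch of the replica init logic.
PRIMARY_HOST="${PRIMARY_HOST:-nx-postgresql-primary.nx-persistent.svc.cluster.local}"

# An initialized data directory contains a PG_VERSION file.
needs_basebackup() {
  [ ! -f "$1/PG_VERSION" ]
}

init_replica() {
  pgdata=$1
  if needs_basebackup "$pgdata"; then
    # -R writes standby.signal and primary_conninfo;
    # -X stream copies the WAL generated while the backup runs.
    pg_basebackup -h "$PRIMARY_HOST" -p 5432 -U repl_user -D "$pgdata" -R -X stream
  else
    # Data already present: make sure the server starts as a standby.
    touch "$pgdata/standby.signal"
  fi
}
```

It would be called once per pod start, for example as init_replica "$PGDATA".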

StatefulSet: nx-postgresql-replica
yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-postgresql-replica
  namespace: nx-persistent
spec:
  serviceName: nx-postgresql-replica-headless
  replicas: 1
  template:
    spec:
      initContainers:
        - name: init-replica
          image: postgres:17-alpine
          # pg_basebackup from primary on first run, standby.signal on restart
      containers:
        - name: postgresql
          image: postgres:17-alpine
          args:
            - -c
            - hot_standby=on
            - -c
            - hot_standby_feedback=on
            - -c
            - shared_buffers=512MB
            - -c
            - max_connections=100
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 20Gi

PgBouncer Connection Pooling

PgBouncer runs as a Deployment in front of the primary. Backend services connect to nx-pgbouncer:5432, and the Service maps that port to PgBouncer's listen port 6432. PgBouncer runs in transaction pooling mode with max_client_conn = 200 and default_pool_size = 20.

ini
# PgBouncer configuration (from ConfigMap)
[databases]
nx_seller_core = host=nx-postgresql-primary.nx-persistent.svc.cluster.local port=5432 dbname=nx_seller_core

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
max_db_connections = 50

The init container generates userlist.txt from Kubernetes secrets (app user + superuser). The Service exposes port 5432 and forwards it to container port 6432, so backend services use the same port they would for a direct PostgreSQL connection.
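
As an illustration of the userlist.txt step, the sketch below formats entries in PgBouncer's "user" "password" line format. The function name, the target path, and the variables APP_PASSWORD and SUPERUSER_PASSWORD are assumptions, not taken from the actual init container.

```shell
#!/bin/sh
# Hypothetical sketch: formats one userlist.txt entry as "user" "password".
# Double quotes inside a value are escaped by doubling them.
userlist_entry() {
  user=$(printf '%s' "$1" | sed 's/"/""/g')
  pass=$(printf '%s' "$2" | sed 's/"/""/g')
  printf '"%s" "%s"\n' "$user" "$pass"
}

# Possible usage (variable names and path are assumed):
#   {
#     userlist_entry nx_seller_operator "$APP_PASSWORD"
#     userlist_entry postgres "$SUPERUSER_PASSWORD"
#   } > /etc/pgbouncer/userlist.txt
```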

Services

| Service | DNS | Port | Purpose |
|---|---|---|---|
| nx-pgbouncer | nx-pgbouncer.nx-persistent.svc.cluster.local | 5432 | Pooled connection to primary (used by all backend services) |
| nx-postgresql-primary | nx-postgresql-primary.nx-persistent.svc.cluster.local | 5432 | Direct primary (used by PgBouncer, port-forward) |
| nx-postgresql-replica | nx-postgresql-replica.nx-persistent.svc.cluster.local | 5432 | Direct replica (read-only, future read splitting) |
| nx-postgresql-primary-headless | (pod DNS) | 5432 | StatefulSet stable DNS |
| nx-postgresql-replica-headless | (pod DNS) | 5432 | StatefulSet stable DNS |

Default Connection Path

Backend services connect to nx-pgbouncer.nx-persistent.svc.cluster.local:5432 (pooled). This is configured in shared-config.yaml as APP_ENV_POSTGRES_HOST. Direct primary/replica services exist for admin access and future read/write splitting.

Users & Secrets

| User | Purpose | Secret |
|---|---|---|
| postgres | Superuser | nx-postgresql-superuser-secret |
| nx_seller_operator | Application user (CREATEDB) | nx-postgresql-app-secret |
| repl_user | Streaming replication | nx-postgresql-replication-secret |

Database Schemas

BANA uses 7 schemas in the nx_seller_core database:

| Schema | Services |
|---|---|
| public | identity, commerce (shared tables) |
| pricing | pricing |
| allocation | commerce (stock allocation) |
| inventory | inventory |
| finance | finance, ledger |
| payment | payment |
| sale | sale |

Port-Forward for Local Access

bash
# From infrastructure/deployments/staging/ directory:
./kc port-forward svc/nx-postgresql-primary 5432:5432 -n nx-persistent

# From project root:
infrastructure/deployments/staging/kc port-forward svc/nx-postgresql-primary 5432:5432 -n nx-persistent

# Then connect locally:
psql -h localhost -U nx_seller_operator -d nx_seller_core

Redis

Cluster Mode (3 Masters, 0 Replicas)

Redis runs as a 3-node cluster in which the 16384 hash slots are split across 3 master nodes, so data is sharded across them. No replicas are configured because staging does not need per-master redundancy; while a master is down, the slots it owns are unavailable until it restarts.
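
To make the slot mapping concrete, here is a small sketch of how Redis Cluster assigns a key to one of the 16384 slots: CRC16 (XMODEM variant) of the key, modulo 16384, with hash-tag handling. It is an illustrative reimplementation in shell, not code used by the deployment; on a live cluster, CLUSTER KEYSLOT returns the same number.

```shell
#!/bin/sh
# Sketch of Redis Cluster key-to-slot mapping: CRC16 (XMODEM) mod 16384.

crc16() {
  crc=0
  # od prints each byte of the input as an unsigned decimal number.
  for byte in $(printf '%s' "$1" | od -An -v -tu1); do
    crc=$(( crc ^ (byte << 8) ))
    i=0
    while [ "$i" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      i=$(( i + 1 ))
    done
  done
  echo "$crc"
}

keyslot() {
  key=$1
  # Hash tag: if the key has a non-empty {...} section, only that part is hashed.
  case $key in
    *"{"*"}"*)
      rest=${key#*"{"}
      tag=${rest%%"}"*}
      [ -n "$tag" ] && key=$tag
      ;;
  esac
  echo $(( $(crc16 "$key") % 16384 ))
}

keyslot foo                     # prints 12182
keyslot '{user1000}.following'  # same slot as: keyslot user1000
```

Keys that must be touched together in one multi-key command need to share a slot, which is what the hash-tag form is for.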

StatefulSet

StatefulSet: nx-redis
yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-redis
  namespace: nx-broker
spec:
  serviceName: nx-redis-headless
  replicas: 3
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        fsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: redis
          image: redis:7-alpine
          command:
            - redis-server
            - --requirepass
            - $(REDIS_PASSWORD)
            - --masterauth
            - $(REDIS_PASSWORD)
            - --cluster-enabled
            - "yes"
            - --cluster-config-file
            - /data/nodes.conf
            - --cluster-node-timeout
            - "5000"
            - --appendonly
            - "yes"
            - --appendfsync
            - everysec
            - --maxmemory
            - 256mb
            - --maxmemory-policy
            - noeviction
          ports:
            - containerPort: 6379
              name: redis
            - containerPort: 16379
              name: cluster-bus
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nx-redis-secret
                  key: REDIS_PASSWORD
          resources:
            requests:
              cpu: 100m
              memory: 384Mi
            limits:
              cpu: "1"
              memory: 512Mi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 5Gi

Cluster Initialization

After all 3 pods are running, a one-time Job creates the cluster:

Job: nx-redis-cluster-init
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nx-redis-cluster-init
  namespace: nx-broker
spec:
  ttlSecondsAfterFinished: 600
  backoffLimit: 3
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node.kubernetes.io/pool: default
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: cluster-init
          image: redis:7-alpine
          command:
            - sh
            - -c
            - |
              set -e
              export REDISCLI_AUTH="$REDIS_PASSWORD"
              HEADLESS="nx-redis-headless.nx-broker.svc.cluster.local"
              NODES=""
              for i in 0 1 2; do
                HOST="nx-redis-${i}.${HEADLESS}"
                until [ "$(redis-cli -h "$HOST" ping 2>/dev/null)" = "PONG" ]; do sleep 2; done
                NODES="${NODES} ${HOST}:6379"
              done
              echo "yes" | redis-cli --cluster create ${NODES} --cluster-replicas 0

PodSecurity Restricted

The init Job must set runAsNonRoot: true, seccompProfile.type: RuntimeDefault, allowPrivilegeEscalation: false, and capabilities.drop: [ALL] to satisfy the restricted Pod Security Standard enforced on all namespaces.
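
A crude pre-apply check for these fields might look like the sketch below. It is a plain text search, not a YAML parser, and it does not replace admission control. The function name is hypothetical, and the check for allowPrivilegeEscalation: false is included because the upstream restricted profile requires it.

```shell
#!/bin/sh
# Hypothetical text-based check for the fields the restricted profile expects.
check_restricted() {
  manifest=$1
  status=0
  for needle in 'runAsNonRoot: true' 'type: RuntimeDefault' 'allowPrivilegeEscalation: false'; do
    if ! grep -q -- "$needle" "$manifest"; then
      echo "missing: $needle"
      status=1
    fi
  done
  # capabilities.drop may be written inline ([ALL]) or as a block list (- ALL).
  if ! grep -Eq '(\[|- *)"?ALL"?' "$manifest"; then
    echo "missing: capabilities.drop: [ALL]"
    status=1
  fi
  return "$status"
}
```

For example, check_restricted job.yaml (an assumed file name) prints each missing field and returns non-zero if any is absent.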

Services

| Service | DNS | Port | Purpose |
|---|---|---|---|
| nx-redis-headless | nx-redis-headless.nx-broker.svc.cluster.local | 6379, 16379 | Headless: pod-level DNS for cluster nodes |
| nx-redis | nx-redis.nx-broker.svc.cluster.local | 6379, 16379 | ClusterIP: load-balanced (initial discovery) |

Individual pod addresses used by backend services:

nx-redis-0.nx-redis-headless.nx-broker.svc.cluster.local:6379
nx-redis-1.nx-redis-headless.nx-broker.svc.cluster.local:6379
nx-redis-2.nx-redis-headless.nx-broker.svc.cluster.local:6379

Backend Connection Config

Backend services connect in cluster mode via shared-config.yaml:

yaml
# Cache
APP_ENV_CACHE_REDIS_MODE: cluster
APP_ENV_CACHE_REDIS_CLUSTER_NODES: "nx-redis-0.nx-redis-headless.nx-broker.svc.cluster.local:6379,nx-redis-1.nx-redis-headless.nx-broker.svc.cluster.local:6379,nx-redis-2.nx-redis-headless.nx-broker.svc.cluster.local:6379"

# BullMQ
APP_ENV_BULLMQ_REDIS_MODE: cluster
APP_ENV_BULLMQ_REDIS_CLUSTER_NODES: "nx-redis-0....:6379,nx-redis-1....:6379,nx-redis-2....:6379"

# WebSocket
APP_ENV_WEBSOCKET_REDIS_MODE: cluster
APP_ENV_WEBSOCKET_REDIS_CLUSTER_NODES: "nx-redis-0....:6379,nx-redis-1....:6379,nx-redis-2....:6379"

Redis Cluster vs. Logical Databases

Redis Cluster mode does not support SELECT <db> (logical databases). All data lives in DB 0, partitioned by hash slots across the 3 masters. Key prefixes (e.g., cache:, bullmq:, ws:) are used instead of separate databases.


Kafka

KRaft StatefulSet with SASL/SCRAM-SHA-512

Apache Kafka 4.1.1 runs in KRaft mode (no ZooKeeper) with 3 combined broker+controller nodes. Backend services authenticate via SASL/SCRAM-SHA-512 on the CLIENT listener.

Listeners

| Listener | Port | Protocol | Purpose |
|---|---|---|---|
| INTERNAL | 29092 | PLAINTEXT | Inter-broker communication (secured by NetworkPolicy) |
| CLIENT | 9092 | SASL_PLAINTEXT (SCRAM-SHA-512) | Backend services connect here |
| CONTROLLER | 9093 | PLAINTEXT | KRaft quorum consensus |

StatefulSet

The init container generates server.properties dynamically (setting node.id from the pod ordinal), removes lost+found from ext4 PVCs (Kafka rejects it), and runs kafka-storage.sh format on first boot.
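
The ordinal-to-node.id step could be sketched as below. This assumes a static controller quorum expressed through controller.quorum.voters; the generated file may configure the quorum differently, and all function names here are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: derive per-node KRaft settings from the StatefulSet pod name.
HEADLESS="nx-kafka-headless.nx-broker.svc.cluster.local"

# nx-kafka-2 -> 2
node_id_from_hostname() {
  echo "${1##*-}"
}

# Static quorum string: <id>@<pod DNS>:9093 for each of the N nodes.
quorum_voters() {
  count=$1
  voters=""
  i=0
  while [ "$i" -lt "$count" ]; do
    voters="${voters:+$voters,}${i}@nx-kafka-${i}.${HEADLESS}:9093"
    i=$(( i + 1 ))
  done
  echo "$voters"
}

render_node_properties() {
  echo "node.id=$(node_id_from_hostname "$1")"
  echo "process.roles=broker,controller"
  echo "controller.quorum.voters=$(quorum_voters 3)"
}
```

Inside the init container this might be used as render_node_properties "$(hostname)" >> /config/server.properties.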

StatefulSet: nx-kafka
yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-kafka
  namespace: nx-broker
spec:
  serviceName: nx-kafka-headless
  replicas: 3
  podManagementPolicy: Parallel
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      initContainers:
        - name: kafka-init-config
          image: apache/kafka:4.1.1
          # Generates server.properties, removes lost+found,
          # runs kafka-storage.sh format on first run
      containers:
        - name: kafka
          image: apache/kafka:4.1.1
          command:
            - /opt/kafka/bin/kafka-server-start.sh
            - /config/server.properties
          env:
            - name: KAFKA_OPTS
              value: "-Djava.security.auth.login.config=/etc/kafka-jaas/kafka_server_jaas.conf"
          ports:
            - containerPort: 29092
              name: internal
            - containerPort: 9092
              name: client
            - containerPort: 9093
              name: controller
          resources:
            requests:
              cpu: 500m
              memory: 1280Mi
            limits:
              cpu: "1.5"
              memory: 2Gi
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
            - name: config
              mountPath: /config
            - name: jaas
              mountPath: /etc/kafka-jaas
              readOnly: true
      volumes:
        - name: config
          emptyDir: { sizeLimit: 8Mi }
        - name: jaas
          secret:
            secretName: nx-kafka-jaas
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 10Gi

Key Configuration (from generated server.properties)

properties
# Replication
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
default.replication.factor=3
min.insync.replicas=2
num.partitions=3

# Retention
# 7 days
log.retention.hours=168
# 1 GB per partition
log.retention.bytes=1073741824
log.segment.bytes=1073741824

# Auto-create topics
auto.create.topics.enable=false
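
Because auto-creation is disabled, every topic has to be created explicitly before producers use it. A minimal sketch, assuming it runs from a pod that can reach the INTERNAL listener; the helper name and the topic name in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a topic with the cluster defaults shown above.
KAFKA_TOPICS="${KAFKA_TOPICS:-/opt/kafka/bin/kafka-topics.sh}"
BOOTSTRAP="${BOOTSTRAP:-nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:29092}"

create_topic() {
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" \
    --create --if-not-exists \
    --topic "$1" \
    --partitions 3 \
    --replication-factor 3
}

# Possible usage:
#   create_topic example.events
```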

SASL/SCRAM User Initialization

A one-time Job creates the SCRAM-SHA-512 credentials for user nx.staging. It connects through the INTERNAL listener, which is PLAINTEXT and therefore needs no SASL credentials of its own:

Job: nx-kafka-scram-init
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nx-kafka-scram-init
  namespace: nx-broker
spec:
  template:
    spec:
      containers:
        - name: scram-init
          image: apache/kafka:4.1.1
          command:
            - sh
            - -c
            - |
              BOOTSTRAP="nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:29092"
              # Wait for Kafka, then:
              /opt/kafka/bin/kafka-configs.sh \
                --bootstrap-server "$BOOTSTRAP" \
                --alter \
                --add-config "SCRAM-SHA-512=[iterations=8192,password=${KAFKA_SASL_PASSWORD}]" \
                --entity-type users \
                --entity-name nx.staging

The JAAS config is stored in the Secret nx-kafka-jaas and mounted at /etc/kafka-jaas/kafka_server_jaas.conf. The KAFKA_OPTS environment variable passes its path to the JVM via -Djava.security.auth.login.config.

Services

| Service | DNS | Ports | Notes |
|---|---|---|---|
| nx-kafka-headless | nx-kafka-headless.nx-broker.svc.cluster.local | 29092, 9092, 9093 | Headless, publishNotReadyAddresses: true (critical for KRaft bootstrap) |
| nx-kafka | nx-kafka.nx-broker.svc.cluster.local | 29092, 9092 | ClusterIP, load-balanced |

publishNotReadyAddresses

The headless service must have publishNotReadyAddresses: true. Without it, pods cannot discover each other during initial KRaft bootstrap because they are not yet "ready".

Backend Connection Config

yaml
# From shared-config.yaml
APP_ENV_KAFKA_BROKERS: nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:9092,nx-kafka-1.nx-kafka-headless.nx-broker.svc.cluster.local:9092,nx-kafka-2.nx-kafka-headless.nx-broker.svc.cluster.local:9092
APP_ENV_KAFKA_SASL_ENABLE: "true"
APP_ENV_KAFKA_SASL_MECHANISM: SCRAM-SHA-512
APP_ENV_KAFKA_SASL_USERNAME: nx.staging
# Password in nx-shared-secret: APP_ENV_KAFKA_SASL_PASSWORD
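
For ad-hoc checks with the Kafka CLI tools against the CLIENT listener, the same SASL settings can be expressed as a client properties file. The sketch below is an illustration; the function name and the /tmp path in the usage lines are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: render client.properties for the SASL_PLAINTEXT CLIENT listener.
render_client_properties() {
  user=$1
  password=$2
  cat <<EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="$user" password="$password";
EOF
}

# Possible usage:
#   render_client_properties nx.staging "$APP_ENV_KAFKA_SASL_PASSWORD" > /tmp/client.properties
#   /opt/kafka/bin/kafka-topics.sh \
#     --bootstrap-server nx-kafka.nx-broker.svc.cluster.local:9092 \
#     --command-config /tmp/client.properties --list
```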

Typesense

Single Instance

Typesense runs as a single-pod StatefulSet. This is sufficient for staging because Typesense data can be re-indexed from PostgreSQL if it is lost.

StatefulSet: nx-typesense
yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-typesense
  namespace: nx-search
spec:
  serviceName: nx-typesense
  replicas: 1
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: typesense
          image: typesense/typesense:27.1
          args:
            - --data-dir=/data
            - --api-port=8108
            - --api-key=$(TYPESENSE_API_KEY)
            - --enable-cors
          ports:
            - containerPort: 8108
              name: http
          env:
            - name: TYPESENSE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: nx-typesense-secret
                  key: TYPESENSE_API_KEY
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 5Gi

Service

| Service | DNS | Port |
|---|---|---|
| nx-typesense | nx-typesense.nx-search.svc.cluster.local | 8108 |

Debezium

CDC Connector

Debezium runs as a Kafka Connect instance in nx-search, streaming change data capture (CDC) events from PostgreSQL to Kafka topics. This enables real-time search index updates in Typesense.

Deployment

| Property | Value |
|---|---|
| Image | debezium/connect:3.0.0.Final |
| Namespace | nx-search |
| Replicas | 1 |
| Port | 8083 |

The nx-debezium-init Job registers the PostgreSQL source connector after the Debezium instance is ready.
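
Registration goes through the Kafka Connect REST API. The sketch below shows what such a call could look like; the connector name, topic prefix, and function names are hypothetical, and the real Job may set additional options (for example table filters or a replication slot name).

```shell
#!/bin/sh
# Hypothetical sketch of connector registration via the Kafka Connect REST API.
CONNECT_URL="${CONNECT_URL:-http://nx-debezium.nx-search.svc.cluster.local:8083}"

# $1 = database user, $2 = database password.
# Values are inserted verbatim, so they must not contain JSON special characters.
connector_json() {
  cat <<EOF
{
  "name": "nx-postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "nx-postgresql-primary.nx-persistent.svc.cluster.local",
    "database.port": "5432",
    "database.user": "$1",
    "database.password": "$2",
    "database.dbname": "nx_seller_core",
    "topic.prefix": "nx"
  }
}
EOF
}

register_connector() {
  connector_json "$1" "$2" | curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data @- "$CONNECT_URL/connectors"
}
```

The sketch points at the primary Service directly rather than at PgBouncer, which matches the 5432 egress rule from nx-search to nx-persistent.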

Service

| Service | DNS | Port |
|---|---|---|
| nx-debezium | nx-debezium.nx-search.svc.cluster.local | 8083 |

Network Policies

All namespaces have default-deny-all (both ingress and egress) with explicit allow rules.

Deny Defaults

Applied to: nx-backend, nx-app, nx-persistent, nx-broker, nx-search, nx-internal.

NetworkPolicy: default-deny-all
yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Allow Rules

| Policy | Namespace | Direction | From/To | Ports |
|---|---|---|---|---|
| allow-dns | All 6 namespaces | Egress | kube-system | 53 (UDP+TCP) |
| allow-egress-to-postgresql | nx-backend | Egress | nx-persistent | 5432 |
| allow-egress-to-broker | nx-backend | Egress | nx-broker | 6379, 16379, 9092, 29092 |
| allow-egress-to-search | nx-backend | Egress | nx-search | 8108 |
| allow-egress-to-backend | nx-backend | Egress | nx-backend | 3000 (inter-service) |
| allow-egress-to-s3 | nx-backend | Egress | External (0.0.0.0/0 excl. private) | 443 (commerce, ledger only) |
| allow-from-backend | nx-broker | Ingress | nx-backend + nx-broker | 6379, 16379, 9092, 29092, 9093 |
| allow-broker-internal | nx-broker | Egress | nx-broker | 6379, 16379, 9092, 29092, 9093 |
| allow-pg-replication | nx-persistent | Both | PG pods within nx-persistent | 5432 |
| allow-pgbouncer-to-primary | nx-persistent | Egress | PgBouncer → PG primary | 5432 |
| allow-pgbouncer-from-backend | nx-persistent | Ingress | nx-backend → PgBouncer | 6432 |
| allow-from-backend | nx-persistent | Ingress | nx-backend | 5432, 6432 |
| allow-from-backend | nx-search | Ingress | nx-backend | 8108 |
| allow-search-internal | nx-search | Both | Within nx-search | 8083, 8108 |
| egress-nx-search | nx-search | Egress | nx-persistent, nx-broker | 5432, 9092 |

Backup & Recovery

| Service | Method | Frequency | Retention | Recovery |
|---|---|---|---|---|
| PostgreSQL | Streaming replication (replica) | Continuous WAL | Online replica | Promote replica |
| Redis | AOF persistence (appendonly yes, appendfsync everysec) | Continuous | On-disk | Restart from AOF |
| Kafka | Log retention | Automatic | 7 days / 1GB per partition | Replay from retained logs |
| Typesense | Re-index from PostgreSQL | On-demand | N/A | Re-index |

Staging Data Strategy

Staging data can be recreated from migration seeds. The primary+replica setup provides read availability and a warm standby, but there is no off-cluster backup to S3 at the staging level. For production-grade backup (CNPG with barman, S3 WAL archiving, PITR), see the Production section below.


Volume Expansion

All PVCs use StorageClass csi-sc-vnpaycloud which has allowVolumeExpansion: true.

For StatefulSets (Redis, Kafka, Typesense, PostgreSQL)

bash
# Example: expand Redis PVC from 5Gi to 10Gi
# From infrastructure/deployments/staging/ directory:
./kc patch pvc data-nx-redis-0 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
./kc patch pvc data-nx-redis-1 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
./kc patch pvc data-nx-redis-2 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'

# From project root:
infrastructure/deployments/staging/kc patch pvc data-nx-redis-0 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'

The Cinder CSI driver supports online expansion — no pod restart needed.
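
When a StatefulSet has several replicas, the per-PVC patches can be wrapped in a loop. A minimal sketch, assuming the PVC naming pattern data-<statefulset>-<ordinal> used above; the helper name and the KC variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: patch every PVC of a StatefulSet to a new size.
# KC defaults to the ./kc wrapper used elsewhere on this page.
KC="${KC:-./kc}"

# usage: expand_pvcs <statefulset> <replicas> <namespace> <size>
expand_pvcs() {
  sts=$1
  replicas=$2
  ns=$3
  size=$4
  i=0
  while [ "$i" -lt "$replicas" ]; do
    "$KC" patch pvc "data-${sts}-${i}" -n "$ns" \
      -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"${size}\"}}}}"
    i=$(( i + 1 ))
  done
}

# Possible usage:
#   expand_pvcs nx-kafka 3 nx-broker 20Gi
#   ./kc get pvc -n nx-broker   # confirm the new capacity
```

Patching the PVCs does not change the StatefulSet's volumeClaimTemplates, which are immutable, so a pod added later by scaling up would still get a volume of the original size.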


Resource Summary

Staging (stateful node pool)

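| Workload | Pods | CPU req | CPU lim | Mem req | Mem lim | Storage |
|---|---|---|---|---|---|---|
| PG Primary | 1 | 500m | 2 | 1Gi | 4Gi | 20Gi |
| PG Replica | 1 | 250m | 1 | 512Mi | 2Gi | 20Gi |
| PgBouncer | 1 | 100m | 500m | 256Mi | 512Mi | - |
| Redis (x3) | 3 | 300m | 3 | 1.125Gi | 1.5Gi | 15Gi |
| Kafka (x3) | 3 | 1.5 | 4.5 | 3.75Gi | 6Gi | 30Gi |
| Typesense | 1 | 200m | 1 | 512Mi | 2Gi | 5Gi |
| Debezium | 1 | 250m | 1 | 512Mi | 2Gi | - |
| Total | 11 | 3.1 | 13 | 7.625Gi | 18Gi | 90Gi |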
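
As a quick cross-check of the memory-request column, the per-workload requests can be summed in Mi (384Mi per Redis pod and 1280Mi per Kafka pod, from the manifests above). The function names are illustrative.

```shell
#!/bin/sh
# Sums the memory requests in Mi: PG primary, PG replica, PgBouncer,
# 3 x Redis, 3 x Kafka, Typesense, Debezium.
mem_request_total_mi() {
  echo $(( 1024 + 512 + 256 + 3 * 384 + 3 * 1280 + 512 + 512 ))
}

mi_to_gi() {
  awk -v mi="$1" 'BEGIN { printf "%.3f\n", mi / 1024 }'
}

mem_request_total_mi                  # prints 7808
mi_to_gi "$(mem_request_total_mi)"
```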

Production (Planned)

Not Yet Deployed

The production data layer is planned but not yet deployed. The following describes the target architecture. Actual manifests will be created when production infrastructure is provisioned.

| Component | Planned Production Setup |
|---|---|
| PostgreSQL | CloudNativePG operator (1 primary + 2 replicas), PgBouncer sidecar, continuous WAL archiving to S3, 30d retention, PITR |
| Redis | Redis Sentinel or Cluster with replicas (1 master + 2 replicas per shard) |
| Kafka | Strimzi operator (3 brokers, rack-aware, declarative KafkaTopic/KafkaUser CRDs, Cruise Control) |
| Typesense | 3-node Raft cluster with automatic leader election |
| Failover | Automatic: <30s for PG (CNPG), <15s for Redis (Sentinel), built-in for Kafka (ISR) |
| Backups | Continuous WAL + daily base to S3 (PG), AOF + RDB to S3 (Redis), log retention + replication (Kafka) |

Proprietary and Confidential. Unauthorized copying, distribution, or use of this software is strictly prohibited.