Data Layer
The data layer runs on the stateful node pool in the VNPAY Cloud Kubernetes cluster. All workloads use the csi-sc-vnpaycloud StorageClass (Cinder CSI on OpenStack) with allowVolumeExpansion: true, meaning PVCs can be expanded online without downtime.
Architecture Overview
Component Summary
| Component | Type | Instances | Image | Namespace | Storage |
|---|---|---|---|---|---|
| PostgreSQL | StatefulSet (primary + replica) | 2 (1 primary + 1 replica) | postgres:17-alpine | nx-persistent | 20Gi each |
| PgBouncer | Deployment | 1 | ghcr.io/icoretech/pgbouncer-docker:1.24.1 | nx-persistent | - |
| Redis | StatefulSet (cluster mode) | 3 masters, 0 replicas | redis:7-alpine | nx-broker | 5Gi each |
| Kafka | StatefulSet (KRaft + SASL) | 3 brokers | apache/kafka:4.1.1 | nx-broker | 10Gi each |
| Typesense | StatefulSet | 1 | typesense/typesense:27.1 | nx-search | 5Gi |
| Debezium | Deployment | 1 | debezium/connect:3.0.0.Final | nx-search | - |
PostgreSQL
Architecture: Primary + Replica with PgBouncer
PostgreSQL runs as two separate StatefulSets — one primary (read-write) and one replica (read-only, hot standby) — with a PgBouncer Deployment in front for connection pooling. The replica streams WAL from the primary using native PostgreSQL streaming replication.
Primary StatefulSet
The init container handles first-run database initialization: creates the nx_seller_operator app user, the nx_seller_core database, enables pg_stat_statements and pgcrypto extensions, and creates the repl_user for streaming replication.
StatefulSet: nx-postgresql-primary
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-postgresql-primary
  namespace: nx-persistent
spec:
  serviceName: nx-postgresql-primary-headless
  replicas: 1
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 70
        runAsGroup: 70
        fsGroup: 70
        seccompProfile:
          type: RuntimeDefault
      initContainers:
        - name: init-primary
          image: postgres:17-alpine
          # Creates: nx_seller_operator user, nx_seller_core DB,
          # pg_stat_statements + pgcrypto extensions, repl_user for replication,
          # pg_hba.conf entries for replication + SCRAM auth
      containers:
        - name: postgresql
          image: postgres:17-alpine
          ports:
            - containerPort: 5432
          args:
            # Note: pg_stat_statements only collects data when it is also listed in
            # shared_preload_libraries (on the command line or in postgresql.conf).
            - -c
            - wal_level=logical   # needed for Debezium logical decoding; also covers streaming replication
            - -c
            - max_wal_senders=10
            - -c
            - max_replication_slots=10
            - -c
            - hot_standby=on
            - -c
            - shared_buffers=512MB
            - -c
            - effective_cache_size=1536MB
            - -c
            - work_mem=8MB
            - -c
            - maintenance_work_mem=128MB
            - -c
            - max_connections=100
            - -c
            - random_page_cost=1.1
            - -c
            - effective_io_concurrency=200
            - -c
            - log_statement=ddl
            - -c
            - log_min_duration_statement=1000   # log statements slower than 1 s
            - -c
            - password_encryption=scram-sha-256
            - -c
            - timezone=Asia/Ho_Chi_Minh
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 4Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 20Gi
Replica StatefulSet
On first run, the replica's init container runs pg_basebackup against the primary, and the replica then enters hot standby mode. It streams WAL continuously from the primary, which keeps replication lag low. You can check the lag on the primary in the pg_stat_replication view.
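Here is a minimal sketch of the lag arithmetic. PostgreSQL prints an LSN as two hexadecimal 32-bit halves, `X/Y`, which encode the WAL byte position `(X << 32) | Y`. The gap between the primary's `pg_current_wal_lsn()` and the replica's `pg_last_wal_replay_lsn()` is therefore a plain subtraction, the same calculation `pg_wal_lsn_diff()` performs server-side. The LSN values below are made-up samples.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000148' to an absolute WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL generated on the primary that the replica has not replayed yet."""
    return parse_lsn(primary_lsn) - parse_lsn(replay_lsn)


if __name__ == "__main__":
    # Hypothetical readings: pg_current_wal_lsn() on the primary,
    # pg_last_wal_replay_lsn() on the replica.
    print(replication_lag_bytes("1/A000", "0/FFFFF000"))  # 45056
```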
StatefulSet: nx-postgresql-replica
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-postgresql-replica
  namespace: nx-persistent
spec:
  serviceName: nx-postgresql-replica-headless
  replicas: 1
  template:
    spec:
      initContainers:
        - name: init-replica
          image: postgres:17-alpine
          # pg_basebackup from primary on first run, standby.signal on restart
      containers:
        - name: postgresql
          image: postgres:17-alpine
          args:
            - -c
            - hot_standby=on
            - -c
            - hot_standby_feedback=on   # tell the primary which rows standby queries still need
            - -c
            - shared_buffers=512MB
            - -c
            - max_connections=100   # a standby needs max_connections >= the primary's value
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]   # required on every PVC spec
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 20Gi
PgBouncer Connection Pooling
storage: 20GiPgBouncer Connection Pooling
PgBouncer runs as a Deployment in front of the primary. Backend services connect to nx-pgbouncer:5432 (mapped to PgBouncer's internal port 6432). It runs in transaction pooling mode with a max_client_conn of 200 and default_pool_size of 20.
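The pool settings cap how many server connections PgBouncer can open, and that number is what counts against the primary's `max_connections=100`. The sketch below is a rough worst case using the ConfigMap values that follow. Each database/user pair gets `default_pool_size` plus `reserve_pool_size` connections, and `max_db_connections` caps the total for the database. It counts two users, the app user and the superuser in the userlist described below. It also assumes PostgreSQL's default `superuser_reserved_connections=3` is unchanged.

```python
def max_server_connections(users: int, default_pool_size: int,
                           reserve_pool_size: int, max_db_connections: int) -> int:
    """Worst-case PgBouncer -> PostgreSQL connections for one database."""
    # Each (database, user) pool may open default_pool_size connections,
    # plus reserve_pool_size extra ones when clients have been waiting.
    per_user = default_pool_size + reserve_pool_size
    # max_db_connections caps the database's total across all users.
    return min(users * per_user, max_db_connections)


def remaining_on_primary(max_connections: int, pooled: int,
                         superuser_reserved: int = 3) -> int:
    """Connection slots left on the primary for direct (non-pooled) clients."""
    return max_connections - superuser_reserved - pooled


if __name__ == "__main__":
    pooled = max_server_connections(users=2, default_pool_size=20,
                                    reserve_pool_size=5, max_db_connections=50)
    print(pooled)                             # 50
    print(remaining_on_primary(100, pooled))  # 47
    print(200 / pooled)                       # 4.0 clients per server connection at max_client_conn
```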
# PgBouncer configuration (from ConfigMap)
[databases]
nx_seller_core = host=nx-postgresql-primary.nx-persistent.svc.cluster.local port=5432 dbname=nx_seller_core
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
max_db_connections = 50
The init container generates userlist.txt from the Kubernetes Secrets (app user and superuser). The nx-pgbouncer Service listens on port 5432 and forwards to PgBouncer's port 6432, so backend services use the same port they would use for a direct PostgreSQL connection.
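PgBouncer's auth file holds one `"user" "password"` pair per line, and any double quote inside a field is written twice. The sketch below shows how such a file could be generated. It assumes the init container reads plaintext passwords from the two Secrets; the real init script may do this differently, and the function names and sample credentials are hypothetical.

```python
from typing import Dict


def userlist_line(user: str, password: str) -> str:
    """Format one PgBouncer auth_file entry, doubling embedded double quotes."""
    def quote(value: str) -> str:
        return '"' + value.replace('"', '""') + '"'
    return f"{quote(user)} {quote(password)}"


def render_userlist(credentials: Dict[str, str]) -> str:
    """Render every entry, one per line, as PgBouncer expects."""
    return "".join(userlist_line(u, p) + "\n" for u, p in credentials.items())


if __name__ == "__main__":
    # Hypothetical values; in the cluster these would come from
    # nx-postgresql-app-secret and nx-postgresql-superuser-secret.
    print(render_userlist({"nx_seller_operator": "app-pass", "postgres": 'su"pass'}), end="")
```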
Services
| Service | DNS | Port | Purpose |
|---|---|---|---|
| nx-pgbouncer | nx-pgbouncer.nx-persistent.svc.cluster.local | 5432 | Pooled connection to primary (used by all backend services) |
| nx-postgresql-primary | nx-postgresql-primary.nx-persistent.svc.cluster.local | 5432 | Direct primary (used by PgBouncer, port-forward) |
| nx-postgresql-replica | nx-postgresql-replica.nx-persistent.svc.cluster.local | 5432 | Direct replica (read-only, future read splitting) |
| nx-postgresql-primary-headless | (pod DNS) | 5432 | StatefulSet stable DNS |
| nx-postgresql-replica-headless | (pod DNS) | 5432 | StatefulSet stable DNS |
Default Connection Path
Backend services connect to nx-pgbouncer.nx-persistent.svc.cluster.local:5432 (pooled). This is configured in shared-config.yaml as APP_ENV_POSTGRES_HOST. Direct primary/replica services exist for admin access and future read/write splitting.
Users & Secrets
| User | Purpose | Secret |
|---|---|---|
| postgres | Superuser | nx-postgresql-superuser-secret |
| nx_seller_operator | Application user (CREATEDB) | nx-postgresql-app-secret |
| repl_user | Streaming replication | nx-postgresql-replication-secret |
Database Schemas
BANA uses 7 schemas in the nx_seller_core database:
| Schema | Services |
|---|---|
| public | identity, commerce (shared tables) |
| pricing | pricing |
| allocation | commerce (stock allocation) |
| inventory | inventory |
| finance | finance, ledger |
| payment | payment |
| sale | sale |
Port-Forward for Local Access
# From infrastructure/deployments/staging/ directory:
./kc port-forward svc/nx-postgresql-primary 5432:5432 -n nx-persistent
# From project root:
infrastructure/deployments/staging/kc port-forward svc/nx-postgresql-primary 5432:5432 -n nx-persistent
# Then connect locally:
psql -h localhost -U nx_seller_operator -d nx_seller_core
Redis
Cluster Mode (3 Masters, 0 Replicas)
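Redis runs as a 3-node cluster. Its 16384 hash slots are split across 3 master nodes, so each master holds about a third of the keyspace. No replicas are configured, because staging does not need per-master redundancy. The trade-off is availability: if one master is down, its slots are not served. With Redis's default cluster-require-full-coverage yes, the whole cluster stops accepting queries until that master returns.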
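The sketch below shows how keys map onto the three masters. Redis Cluster assigns a key to slot `CRC16(key) mod 16384`, using the XMODEM variant of CRC16. `redis-cli --cluster create` with three masters and no replicas gives each master one contiguous range: 0-5460, 5461-10922 and 10923-16383. `split_slots` approximates redis-cli's even split. `slot_for_plain_key` handles keys without `{hash tags}` only. The sketch does not determine which pod ended up with which range; `CLUSTER SHARDS` (or the older `CLUSTER SLOTS`) on a live node is authoritative.

```python
from typing import List, Tuple

SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def slot_for_plain_key(key: str) -> int:
    """Hash slot of a key that contains no {hash tag}."""
    return crc16_xmodem(key.encode()) % SLOTS


def split_slots(masters: int) -> List[Tuple[int, int]]:
    """Contiguous slot range per master, approximating redis-cli's even split."""
    per_node = SLOTS / masters
    ranges: List[Tuple[int, int]] = []
    first, cursor = 0, 0.0
    for i in range(masters):
        last = int(cursor + per_node - 1 + 0.5)  # round half up (values are positive)
        if last > SLOTS - 1 or i == masters - 1:
            last = SLOTS - 1  # the last master takes whatever remains
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


def owner(slot: int, ranges: List[Tuple[int, int]]) -> int:
    """Index of the range (i.e. which master) that contains the slot."""
    for index, (lo, hi) in enumerate(ranges):
        if lo <= slot <= hi:
            return index
    raise ValueError(f"slot {slot} is not covered")


if __name__ == "__main__":
    ranges = split_slots(3)
    print(ranges)                    # [(0, 5460), (5461, 10922), (10923, 16383)]
    slot = slot_for_plain_key("foo")
    print(slot, owner(slot, ranges))  # 12182 2
```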
StatefulSet
StatefulSet: nx-redis
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-redis
  namespace: nx-broker
spec:
  serviceName: nx-redis-headless
  replicas: 3
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        fsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: redis
          image: redis:7-alpine
          command:
            - redis-server
            - --requirepass
            - $(REDIS_PASSWORD)   # expanded by Kubernetes from the env var below
            - --masterauth
            - $(REDIS_PASSWORD)
            - --cluster-enabled
            - "yes"
            - --cluster-config-file
            - /data/nodes.conf    # cluster topology; must survive restarts, so it lives on the PVC
            - --cluster-node-timeout
            - "5000"
            - --appendonly
            - "yes"
            - --appendfsync
            - everysec
            - --maxmemory
            - 256mb
            - --maxmemory-policy
            - noeviction          # reject writes at the limit instead of evicting keys (recommended for BullMQ)
          ports:
            - containerPort: 6379
              name: redis
            - containerPort: 16379
              name: cluster-bus
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nx-redis-secret
                  key: REDIS_PASSWORD
          volumeMounts:
            - name: data
              mountPath: /data    # AOF files and nodes.conf
          resources:
            requests:
              cpu: 100m
              memory: 384Mi
            limits:
              cpu: "1"
              memory: 512Mi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 5Gi
Cluster Initialization
After all 3 pods are running, a one-time Job creates the cluster:
Job: nx-redis-cluster-init
apiVersion: batch/v1
kind: Job
metadata:
  name: nx-redis-cluster-init
  namespace: nx-broker
spec:
  ttlSecondsAfterFinished: 600
  backoffLimit: 3
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        node.kubernetes.io/pool: default
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: cluster-init
          image: redis:7-alpine
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: [ALL]
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nx-redis-secret
                  key: REDIS_PASSWORD
          command:
            - sh
            - -c
            - |
              set -e
              # redis-cli reads the password from REDISCLI_AUTH (keeps it off the command line)
              export REDISCLI_AUTH="$REDIS_PASSWORD"
              HEADLESS="nx-redis-headless.nx-broker.svc.cluster.local"
              NODES=""
              # Wait until each pod answers PING, collecting host:port pairs
              for i in 0 1 2; do
                HOST="nx-redis-${i}.${HEADLESS}"
                until [ "$(redis-cli -h "$HOST" ping 2>/dev/null)" = "PONG" ]; do sleep 2; done
                NODES="${NODES} ${HOST}:6379"
              done
              # 3 masters, no replicas; "yes" accepts the proposed slot layout
              echo "yes" | redis-cli --cluster create ${NODES} --cluster-replicas 0
PodSecurity Restricted
The init Job must set runAsNonRoot: true, seccompProfile.type: RuntimeDefault, allowPrivilegeEscalation: false and capabilities.drop: [ALL]. These are required by the restricted Pod Security Standard, which is enforced on all namespaces.
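These fields can be checked mechanically before a manifest is applied. The sketch below works on a parsed pod spec, given as a plain dict as you would get from loading the YAML. It checks only four controls: the restricted-profile fields discussed above, plus allowPrivilegeEscalation: false, which the restricted profile also requires. It is not a full Pod Security Standard evaluator, the function name is hypothetical, and the admission controller remains authoritative.

```python
from typing import Any, Dict, List


def restricted_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """Check four restricted-profile controls on every (init) container of a pod spec."""
    problems: List[str] = []
    pod_sc = pod_spec.get("securityContext", {})
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext", {})
        # runAsNonRoot and seccompProfile may be set on the pod and inherited.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile.type must be RuntimeDefault or Localhost")
        # These two can only be set per container.
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
    return problems


if __name__ == "__main__":
    # Pod-level fields as in the nx-redis-cluster-init Job; container has no securityContext yet.
    job_pod = {
        "securityContext": {"runAsNonRoot": True, "runAsUser": 999,
                            "seccompProfile": {"type": "RuntimeDefault"}},
        "containers": [{"name": "cluster-init"}],
    }
    print(restricted_violations(job_pod))  # two container-level violations
    job_pod["containers"][0]["securityContext"] = {
        "allowPrivilegeEscalation": False, "capabilities": {"drop": ["ALL"]}}
    print(restricted_violations(job_pod))  # []
```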
Services
| Service | DNS | Port | Purpose |
|---|---|---|---|
| nx-redis-headless | nx-redis-headless.nx-broker.svc.cluster.local | 6379, 16379 | Headless: pod-level DNS for cluster nodes |
| nx-redis | nx-redis.nx-broker.svc.cluster.local | 6379, 16379 | ClusterIP: load-balanced (initial discovery) |
Individual pod addresses used by backend services:
nx-redis-0.nx-redis-headless.nx-broker.svc.cluster.local:6379
nx-redis-1.nx-redis-headless.nx-broker.svc.cluster.local:6379
nx-redis-2.nx-redis-headless.nx-broker.svc.cluster.local:6379
Backend Connection Config
Backend services connect in cluster mode via shared-config.yaml:
# Cache
APP_ENV_CACHE_REDIS_MODE: cluster
APP_ENV_CACHE_REDIS_CLUSTER_NODES: "nx-redis-0.nx-redis-headless.nx-broker.svc.cluster.local:6379,nx-redis-1.nx-redis-headless.nx-broker.svc.cluster.local:6379,nx-redis-2.nx-redis-headless.nx-broker.svc.cluster.local:6379"
# BullMQ
APP_ENV_BULLMQ_REDIS_MODE: cluster
APP_ENV_BULLMQ_REDIS_CLUSTER_NODES: "nx-redis-0....:6379,nx-redis-1....:6379,nx-redis-2....:6379"
# WebSocket
APP_ENV_WEBSOCKET_REDIS_MODE: cluster
APP_ENV_WEBSOCKET_REDIS_CLUSTER_NODES: "nx-redis-0....:6379,nx-redis-1....:6379,nx-redis-2....:6379"
Redis Cluster vs. Logical Databases
Redis Cluster mode does not support logical databases: SELECT <db> with any index other than 0 is rejected. All data lives in DB 0, partitioned by hash slot across the 3 masters, and key prefixes (e.g., cache:, bullmq:, ws:) separate the workloads instead.
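Because keys are spread by slot, a multi-key command or Lua script only works when all of its keys hash to the same slot. If a key contains a non-empty `{...}` section (a hash tag), Redis Cluster hashes only the text inside the first one. This is why queue libraries such as BullMQ document a braced prefix for cluster mode. The sketch below extracts the part of a key that Redis actually hashes. The key names are illustrative, not necessarily the ones BANA's services use.

```python
def hashed_part(key: str) -> str:
    """Return the substring Redis Cluster feeds to CRC16 for this key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Only a non-empty {...} counts as a hash tag.
        if end != -1 and end > start + 1:
            return key[start + 1:end]
    return key


def same_slot(*keys: str) -> bool:
    """True when the keys share a hashed part, which guarantees the same slot."""
    return len({hashed_part(k) for k in keys}) == 1


if __name__ == "__main__":
    print(hashed_part("bullmq:{orders}:wait"))                          # orders
    print(same_slot("bullmq:{orders}:wait", "bullmq:{orders}:active"))  # True
    print(same_slot("cache:user:1", "cache:user:2"))                    # False
```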
Kafka
KRaft StatefulSet with SASL/SCRAM-SHA-512
Apache Kafka 4.1.1 runs in KRaft mode (no ZooKeeper) with 3 combined broker+controller nodes. Backend services authenticate via SASL/SCRAM-SHA-512 on the CLIENT listener.
Listeners
| Listener | Port | Protocol | Purpose |
|---|---|---|---|
| INTERNAL | 29092 | PLAINTEXT | Inter-broker communication (secured by NetworkPolicy) |
| CLIENT | 9092 | SASL_PLAINTEXT (SCRAM-SHA-512) | Backend services connect here |
| CONTROLLER | 9093 | PLAINTEXT | KRaft quorum consensus |
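To test the CLIENT listener with Kafka's CLI tools (for example `kafka-topics.sh --command-config`), a client needs SASL settings that match it. The sketch below renders such a client properties file. The property names are standard Kafka client settings; the function name and the sample password are hypothetical. For simplicity it rejects passwords that would need escaping in JAAS or properties syntax.

```python
def client_properties(username: str, password: str) -> str:
    """Render client settings for the SASL_PLAINTEXT / SCRAM-SHA-512 CLIENT listener."""
    if any(ch in password for ch in '"\\'):
        # Quotes and backslashes would need escaping; out of scope for this sketch.
        raise ValueError("password contains characters this sketch does not escape")
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    lines = [
        "security.protocol=SASL_PLAINTEXT",
        "sasl.mechanism=SCRAM-SHA-512",
        f"sasl.jaas.config={jaas}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(client_properties("nx.staging", "example-password"), end="")
```

Saved as client.properties, the file can be used from a pod that NetworkPolicy allows to reach the brokers, e.g. `kafka-topics.sh --bootstrap-server nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:9092 --command-config client.properties --list`.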
StatefulSet
The init container generates server.properties dynamically (setting node.id from the pod ordinal), removes lost+found from ext4 PVCs (Kafka rejects it), and runs kafka-storage.sh format on first boot.
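The sketch below shows the per-pod part of that generation step. It is written in Python for readability; the actual init container presumably uses a shell script inside the apache/kafka image. It derives node.id from the StatefulSet ordinal in the pod hostname and builds the listener settings from the ports in the table above. The property names are standard KRaft broker settings. Using a static `controller.quorum.voters` list is an assumption about how this quorum is configured, and the function names are hypothetical.

```python
HEADLESS = "nx-kafka-headless.nx-broker.svc.cluster.local"
REPLICAS = 3


def ordinal(hostname: str) -> int:
    """StatefulSet pods are named <statefulset>-<ordinal>, e.g. nx-kafka-2."""
    return int(hostname.rsplit("-", 1)[1])


def pod_fqdn(i: int) -> str:
    """Stable per-pod DNS name provided by the headless Service."""
    return f"nx-kafka-{i}.{HEADLESS}"


def server_properties(hostname: str) -> str:
    node_id = ordinal(hostname)
    # Every pod is both broker and controller, so all three vote in the quorum.
    voters = ",".join(f"{i}@{pod_fqdn(i)}:9093" for i in range(REPLICAS))
    me = pod_fqdn(node_id)
    props = {
        "process.roles": "broker,controller",
        "node.id": str(node_id),
        "controller.quorum.voters": voters,
        "listeners": "INTERNAL://:29092,CLIENT://:9092,CONTROLLER://:9093",
        "advertised.listeners": f"INTERNAL://{me}:29092,CLIENT://{me}:9092",
        "listener.security.protocol.map":
            "INTERNAL:PLAINTEXT,CLIENT:SASL_PLAINTEXT,CONTROLLER:PLAINTEXT",
        "inter.broker.listener.name": "INTERNAL",
        "controller.listener.names": "CONTROLLER",
        "sasl.enabled.mechanisms": "SCRAM-SHA-512",
        "log.dirs": "/var/lib/kafka/data",  # the PVC mount; lost+found must be gone
    }
    return "".join(f"{k}={v}\n" for k, v in props.items())


if __name__ == "__main__":
    print(server_properties("nx-kafka-1"), end="")
```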
StatefulSet: nx-kafka
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-kafka
  namespace: nx-broker
spec:
  serviceName: nx-kafka-headless
  replicas: 3
  podManagementPolicy: Parallel   # start all pods together so the KRaft quorum can form
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      initContainers:
        - name: kafka-init-config
          image: apache/kafka:4.1.1
          # Generates server.properties, removes lost+found,
          # runs kafka-storage.sh format on first run
      containers:
        - name: kafka
          image: apache/kafka:4.1.1
          command:
            - /opt/kafka/bin/kafka-server-start.sh
            - /config/server.properties
          env:
            - name: KAFKA_OPTS
              value: "-Djava.security.auth.login.config=/etc/kafka-jaas/kafka_server_jaas.conf"
          ports:
            - containerPort: 29092
              name: internal
            - containerPort: 9092
              name: client
            - containerPort: 9093
              name: controller
          resources:
            requests:
              cpu: 500m
              memory: 1280Mi
            limits:
              cpu: "1.5"
              memory: 2Gi
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
            - name: config
              mountPath: /config
            - name: jaas
              mountPath: /etc/kafka-jaas
              readOnly: true
      volumes:
        - name: config
          emptyDir: { sizeLimit: 8Mi }
        - name: jaas
          secret:
            secretName: nx-kafka-jaas
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 10Gi
Key Configuration (from generated server.properties)
# Replication
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
default.replication.factor=3
min.insync.replicas=2
num.partitions=3
# Retention
# 168 hours = 7 days
log.retention.hours=168
# 1 GiB (1073741824 bytes) per partition
log.retention.bytes=1073741824
log.segment.bytes=1073741824
# Auto-create topics
auto.create.topics.enable=false
SASL/SCRAM User Initialization
A one-time Job creates the SCRAM-SHA-512 credential for user nx.staging. It connects through the INTERNAL listener, which is PLAINTEXT, so the Job itself needs no SASL credentials:
Job: nx-kafka-scram-init
apiVersion: batch/v1
kind: Job
metadata:
  name: nx-kafka-scram-init
  namespace: nx-broker
spec:
  template:
    spec:
      containers:
        - name: scram-init
          image: apache/kafka:4.1.1
          command:
            - sh
            - -c
            - |
              BOOTSTRAP="nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:29092"
              # Wait for Kafka, then:
              /opt/kafka/bin/kafka-configs.sh \
                --bootstrap-server "$BOOTSTRAP" \
                --alter \
                --add-config "SCRAM-SHA-512=[iterations=8192,password=${KAFKA_SASL_PASSWORD}]" \
                --entity-type users \
                --entity-name nx.staging
The JAAS config is stored in Secret nx-kafka-jaas and mounted at /etc/kafka-jaas/kafka_server_jaas.conf. The JVM loads it through the -Djava.security.auth.login.config option set in KAFKA_OPTS.
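For `SCRAM-SHA-512=[iterations=8192,password=...]`, Kafka does not store the password itself. It stores an RFC 5802 salted verifier: a random salt, the iteration count, a StoredKey and a ServerKey. The sketch below derives those values with Python's standard library to illustrate the mechanism. Kafka's admin client and broker perform this derivation themselves when kafka-configs.sh runs, so nothing here needs to run in the cluster. The sketch assumes an ASCII password (SASLprep normalisation is skipped), and its 32-byte salt length is an arbitrary choice.

```python
import base64
import hashlib
import hmac
import os
from typing import Dict, Optional


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def scram_sha512_credential(password: str, iterations: int = 8192,
                            salt: Optional[bytes] = None) -> Dict[str, str]:
    """Derive an RFC 5802 SCRAM verifier (SHA-512 variant) from an ASCII password."""
    salt = os.urandom(32) if salt is None else salt
    # SaltedPassword = Hi(password, salt, i), i.e. PBKDF2 with HMAC-SHA-512
    salted = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha512).digest()
    stored_key = hashlib.sha512(client_key).digest()  # StoredKey = H(ClientKey)
    server_key = hmac.new(salted, b"Server Key", hashlib.sha512).digest()
    return {
        "salt": _b64(salt),
        "iterations": str(iterations),
        "stored_key": _b64(stored_key),
        "server_key": _b64(server_key),
    }


if __name__ == "__main__":
    print(scram_sha512_credential("example-password"))
```

The broker can verify a client's proof against StoredKey, and prove its own identity with ServerKey, without ever holding the plaintext password.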
Services
| Service | DNS | Ports | Notes |
|---|---|---|---|
| nx-kafka-headless | nx-kafka-headless.nx-broker.svc.cluster.local | 29092, 9092, 9093 | Headless, publishNotReadyAddresses: true (critical for KRaft bootstrap) |
| nx-kafka | nx-kafka.nx-broker.svc.cluster.local | 29092, 9092 | ClusterIP: load-balanced |
publishNotReadyAddresses
The headless service must have publishNotReadyAddresses: true. Without it, pods cannot discover each other during initial KRaft bootstrap because they are not yet "ready".
Backend Connection Config
# From shared-config.yaml
APP_ENV_KAFKA_BROKERS: nx-kafka-0.nx-kafka-headless.nx-broker.svc.cluster.local:9092,nx-kafka-1.nx-kafka-headless.nx-broker.svc.cluster.local:9092,nx-kafka-2.nx-kafka-headless.nx-broker.svc.cluster.local:9092
APP_ENV_KAFKA_SASL_ENABLE: "true"
APP_ENV_KAFKA_SASL_MECHANISM: SCRAM-SHA-512
APP_ENV_KAFKA_SASL_USERNAME: nx.staging
# Password in nx-shared-secret: APP_ENV_KAFKA_SASL_PASSWORD
Typesense
Single Instance
Typesense runs as a single-pod StatefulSet. For staging this is sufficient — Typesense data can be re-indexed from PostgreSQL if lost.
StatefulSet: nx-typesense
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nx-typesense
  namespace: nx-search
spec:
  serviceName: nx-typesense
  replicas: 1
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: stateful
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: typesense
          image: typesense/typesense:27.1
          args:
            - --data-dir=/data
            - --api-port=8108
            - --api-key=$(TYPESENSE_API_KEY)   # expanded by Kubernetes from the env var below
            - --enable-cors
          ports:
            - containerPort: 8108
              name: http
          env:
            - name: TYPESENSE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: nx-typesense-secret
                  key: TYPESENSE_API_KEY
          volumeMounts:
            - name: data
              mountPath: /data   # matches --data-dir
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: csi-sc-vnpaycloud
        resources:
          requests:
            storage: 5Gi
Service
| Service | DNS | Port |
|---|---|---|
| nx-typesense | nx-typesense.nx-search.svc.cluster.local | 8108 |
Debezium
CDC Connector
Debezium runs as a Kafka Connect instance in nx-search, streaming change data capture (CDC) events from PostgreSQL to Kafka topics. This enables real-time search index updates in Typesense.
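The sketch below shows what a consumer of these topics might do with each event. It assumes Debezium's default change-event envelope, JSON with `before`, `after` and `op`, where `op` is `c` (create), `u` (update), `d` (delete) or `r` (snapshot read). It accepts the value either wrapped in `schema`/`payload` or unwrapped. Mapping events to a Typesense upsert or delete, and using an `id` primary-key column as the document id, are hypothetical; BANA's actual indexer is not described here. With PostgreSQL's default REPLICA IDENTITY, a delete event's `before` carries only the key columns, which is enough for this mapping.

```python
import json
from typing import Optional, Tuple


def to_index_action(raw_value: Optional[str]) -> Optional[Tuple[str, dict]]:
    """Map one Debezium change event to ('upsert', doc) or ('delete', {'id': ...})."""
    if raw_value is None:
        return None  # tombstone emitted after a delete; nothing to index
    event = json.loads(raw_value)
    payload = event.get("payload", event)  # unwrap when the converter includes schemas
    op = payload["op"]
    if op in ("c", "u", "r"):
        row = payload["after"]
        # Typesense document ids are strings.
        return "upsert", {**row, "id": str(row["id"])}
    if op == "d":
        return "delete", {"id": str(payload["before"]["id"])}
    return None  # e.g. 't' (truncate): not handled in this sketch


if __name__ == "__main__":
    print(to_index_action(json.dumps({"op": "c", "before": None, "after": {"id": 7, "name": "Tea"}})))
```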
Deployment
| Property | Value |
|---|---|
| Image | debezium/connect:3.0.0.Final |
| Namespace | nx-search |
| Replicas | 1 |
| Port | 8083 |
The nx-debezium-init Job registers the PostgreSQL source connector after the Debezium instance is ready.
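Registration is a POST of a JSON body to Kafka Connect's REST API at `/connectors` on port 8083. The sketch below builds such a body. The property keys are standard Debezium PostgreSQL connector settings, and `pgoutput` is PostgreSQL's built-in logical decoding plugin. The connector name, topic prefix, slot, publication and table filter are hypothetical. Pointing `database.hostname` at the direct primary Service is an assumption, consistent with the primary's wal_level=logical and the egress-nx-search policy allowing 5432 to nx-persistent.

```python
import json


def connector_registration(user: str, password: str) -> str:
    """JSON body for POST /connectors on the Kafka Connect REST API (port 8083)."""
    body = {
        "name": "nx-seller-core-cdc",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": "nx-postgresql-primary.nx-persistent.svc.cluster.local",
            "database.port": "5432",
            "database.user": user,  # needs the REPLICATION attribute
            "database.password": password,
            "database.dbname": "nx_seller_core",
            "topic.prefix": "nx",  # hypothetical; topics become nx.<schema>.<table>
            "slot.name": "nx_debezium",  # hypothetical; counts toward max_replication_slots=10
            "publication.name": "nx_debezium",  # hypothetical publication name
            "table.include.list": r"inventory\..*",  # hypothetical: all tables in one schema
        },
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(connector_registration("cdc_user", "example-password"))
```

The body could then be posted with, for example, `curl -X POST -H 'Content-Type: application/json' --data @body.json http://nx-debezium.nx-search.svc.cluster.local:8083/connectors`.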
Service
| Service | DNS | Port |
|---|---|---|
| nx-debezium | nx-debezium.nx-search.svc.cluster.local | 8083 |
Network Policies
All namespaces have default-deny-all (both ingress and egress) with explicit allow rules.
Deny Defaults
Applied to: nx-backend, nx-app, nx-persistent, nx-broker, nx-search, nx-internal.
NetworkPolicy: default-deny-all
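apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}   # selects every pod in the namespace
  policyTypes:      # no ingress/egress rules are listed, so both directions are denied
    - Ingress
    - Egress
Allow Rules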
| Policy | Namespace | Direction | From/To | Ports |
|---|---|---|---|---|
| allow-dns | All 6 namespaces | Egress | kube-system | 53 (UDP+TCP) |
| allow-egress-to-postgresql | nx-backend | Egress | nx-persistent | 5432 |
| allow-egress-to-broker | nx-backend | Egress | nx-broker | 6379, 16379, 9092, 29092 |
| allow-egress-to-search | nx-backend | Egress | nx-search | 8108 |
| allow-egress-to-backend | nx-backend | Egress | nx-backend | 3000 (inter-service) |
| allow-egress-to-s3 | nx-backend | Egress | External (0.0.0.0/0 excl. private) | 443 (commerce, ledger only) |
| allow-from-backend | nx-broker | Ingress | nx-backend + nx-broker | 6379, 16379, 9092, 29092, 9093 |
| allow-broker-internal | nx-broker | Egress | nx-broker | 6379, 16379, 9092, 29092, 9093 |
| allow-pg-replication | nx-persistent | Both | PG pods within nx-persistent | 5432 |
| allow-pgbouncer-to-primary | nx-persistent | Egress | PgBouncer → PG primary | 5432 |
| allow-pgbouncer-from-backend | nx-persistent | Ingress | nx-backend → PgBouncer | 6432 |
| allow-from-backend | nx-persistent | Ingress | nx-backend | 5432, 6432 |
| allow-from-backend | nx-search | Ingress | nx-backend | 8108 |
| allow-search-internal | nx-search | Both | Within nx-search | 8083, 8108 |
| egress-nx-search | nx-search | Egress | nx-persistent, nx-broker | 5432, 9092 |
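Every namespace starts from default-deny-all in both directions. A connection therefore works only when the source namespace allows it as egress and the destination namespace allows it as ingress. The sketch below models that rule with a few rows of the table above. It works at namespace level only; real policies also match pod labels and protocols. In common CNI implementations, NetworkPolicy ports match the destination pod's port after Service translation. Traffic to nx-pgbouncer:5432 therefore arrives at PgBouncer on 6432. It is worth checking the deployed allow-egress-to-postgresql rule, which the table lists with 5432 only, against that flow.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Allow(NamedTuple):
    namespace: str         # namespace the policy is applied in
    direction: str         # "ingress" or "egress"
    peers: FrozenSet[str]  # namespaces on the other end of the connection
    ports: FrozenSet[int]  # destination pod ports


# Namespace-level subset of the Allow Rules table (pod selectors ignored).
RULES = [
    Allow("nx-backend", "egress", frozenset({"nx-persistent"}), frozenset({5432})),
    Allow("nx-backend", "egress", frozenset({"nx-broker"}), frozenset({6379, 16379, 9092, 29092})),
    Allow("nx-broker", "ingress", frozenset({"nx-backend", "nx-broker"}),
          frozenset({6379, 16379, 9092, 29092, 9093})),
    Allow("nx-persistent", "ingress", frozenset({"nx-backend"}), frozenset({5432, 6432})),
]


def flow_allowed(src: str, dst: str, pod_port: int, rules: Iterable[Allow] = RULES) -> bool:
    """Under default-deny-all, both the egress side and the ingress side must allow the flow."""
    rules = list(rules)
    egress = any(r.namespace == src and r.direction == "egress"
                 and dst in r.peers and pod_port in r.ports for r in rules)
    ingress = any(r.namespace == dst and r.direction == "ingress"
                  and src in r.peers and pod_port in r.ports for r in rules)
    return egress and ingress


if __name__ == "__main__":
    print(flow_allowed("nx-backend", "nx-broker", 9092))      # True
    print(flow_allowed("nx-backend", "nx-persistent", 5432))  # True (direct primary)
    print(flow_allowed("nx-backend", "nx-persistent", 6432))  # False: no egress rule for 6432
```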
Backup & Recovery
| Service | Method | Frequency | Retention | Recovery |
|---|---|---|---|---|
| PostgreSQL | Streaming replication (replica) | Continuous WAL | Online replica | Promote replica |
| Redis | AOF persistence (appendonly yes, appendfsync everysec) | Continuous | On-disk | Restart from AOF |
| Kafka | Log retention | Automatic | 7 days / 1GB per partition | Replay from retained logs |
| Typesense | Re-index from PostgreSQL | On-demand | N/A | Re-index |
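The "1GB per partition" retention in the Kafka row is a deletion threshold, not a hard cap. Kafka deletes only whole, closed segments and never the active one. With log.segment.bytes equal to log.retention.bytes (both 1 GiB), a partition replica can therefore grow to roughly 2 GiB before its oldest segment is removed. With replication factor 3 on 3 brokers, every broker holds a replica of every partition. The back-of-the-envelope sketch below estimates how many partitions fit on one broker's 10Gi PVC under those settings. It ignores index files and the 7-day time limit, which can delete data sooner.

```python
GIB = 1024 ** 3


def peak_partition_bytes(retention_bytes: int, segment_bytes: int) -> int:
    """Rough peak size of one partition replica under size-based retention.

    Only closed segments are deleted, so up to retention_bytes of closed data
    can coexist with an active segment that grows to segment_bytes.
    """
    return retention_bytes + segment_bytes


def full_partitions_per_pvc(pvc_bytes: int, retention_bytes: int, segment_bytes: int,
                            brokers: int, replication_factor: int) -> int:
    """Partitions that can all reach peak size before one broker's PVC is full."""
    # Each broker stores replication_factor / brokers of all partition replicas
    # (every partition when replication_factor == brokers).
    peak = peak_partition_bytes(retention_bytes, segment_bytes)
    return pvc_bytes * brokers // (peak * replication_factor)


if __name__ == "__main__":
    print(full_partitions_per_pvc(10 * GIB, GIB, GIB, brokers=3, replication_factor=3))  # 5
```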
Staging Data Strategy
Staging data can be recreated from migration seeds. The primary+replica setup provides read availability and a warm standby, but there is no off-cluster backup to S3 at the staging level. For production-grade backup (CNPG with barman, S3 WAL archiving, PITR), see the Production section below.
Volume Expansion
All PVCs use StorageClass csi-sc-vnpaycloud which has allowVolumeExpansion: true.
For StatefulSets (Redis, Kafka, Typesense, PostgreSQL)
# Example: expand Redis PVC from 5Gi to 10Gi
# From infrastructure/deployments/staging/ directory:
./kc patch pvc data-nx-redis-0 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
./kc patch pvc data-nx-redis-1 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
./kc patch pvc data-nx-redis-2 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
# From project root:
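infrastructure/deployments/staging/kc patch pvc data-nx-redis-0 -n nx-broker -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
The Cinder CSI driver supports online expansion, so no pod restart is needed. However, a StatefulSet's volumeClaimTemplates cannot be changed in place. Any PVC created later from the template, for example after scaling up, still requests the original size.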
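Each replica has its own PVC, named `<claim template>-<StatefulSet>-<ordinal>` (e.g. data-nx-redis-0), so the patch has to be repeated for every ordinal. The small sketch below generates those commands for any StatefulSet in this document. It only builds strings and never contacts the cluster. The `./kc` prefix assumes you run the output from infrastructure/deployments/staging/, and the function name is hypothetical.

```python
import json
import shlex
from typing import List


def expand_commands(statefulset: str, namespace: str, replicas: int,
                    new_size: str, claim: str = "data") -> List[str]:
    """One `kc patch pvc` command per StatefulSet ordinal."""
    patch = json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}},
                       separators=(",", ":"))
    return [
        f"./kc patch pvc {claim}-{statefulset}-{i} -n {namespace} -p {shlex.quote(patch)}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for cmd in expand_commands("nx-kafka", "nx-broker", 3, "20Gi"):
        print(cmd)
```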
Resource Summary
Staging (stateful node pool)
| Workload | Pods | CPU req | CPU lim | Mem req | Mem lim | Storage |
|---|---|---|---|---|---|---|
| PG Primary | 1 | 500m | 2 | 1Gi | 4Gi | 20Gi |
| PG Replica | 1 | 250m | 1 | 512Mi | 2Gi | 20Gi |
| PgBouncer | 1 | 100m | 500m | 256Mi | 512Mi | - |
| Redis (x3) | 3 | 300m | 3 | 1.125Gi | 1.5Gi | 15Gi |
| Kafka (x3) | 3 | 1.5 | 4.5 | 3.75Gi | 6Gi | 30Gi |
| Typesense | 1 | 200m | 1 | 512Mi | 2Gi | 5Gi |
| Debezium | 1 | 250m | 1 | 512Mi | 2Gi | - |
| Total | 11 | 3.1 | 13 | 7.625Gi | 18Gi | 90Gi |
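The totals can be recomputed from the per-workload rows by normalising Kubernetes quantities: `m` means millicores, and memory uses binary units, with Mi converted to Gi. The sketch below handles only the suffixes that appear in this table. Memory requests add up to 7.625Gi.

```python
from fractions import Fraction
from typing import List, Tuple

# Suffixes used in the table: m = millicores; Mi/Gi = binary memory units (reported in Gi).
UNITS = {"m": Fraction(1, 1000), "Mi": Fraction(1, 1024), "Gi": Fraction(1)}

# (workload, CPU req, CPU lim, mem req, mem lim), copied from the table rows
ROWS: List[Tuple[str, str, str, str, str]] = [
    ("PG Primary", "500m", "2", "1Gi", "4Gi"),
    ("PG Replica", "250m", "1", "512Mi", "2Gi"),
    ("PgBouncer", "100m", "500m", "256Mi", "512Mi"),
    ("Redis (x3)", "300m", "3", "1.125Gi", "1.5Gi"),
    ("Kafka (x3)", "1.5", "4.5", "3.75Gi", "6Gi"),
    ("Typesense", "200m", "1", "512Mi", "2Gi"),
    ("Debezium", "250m", "1", "512Mi", "2Gi"),
]


def quantity(value: str) -> Fraction:
    """Parse '500m', '1.5', '384Mi', '1.125Gi' exactly (CPU in cores, memory in Gi)."""
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return Fraction(value[: -len(suffix)]) * factor
    return Fraction(value)


def totals(rows: List[Tuple[str, str, str, str, str]]) -> Tuple[Fraction, ...]:
    """Column sums for CPU req, CPU lim, mem req, mem lim."""
    return tuple(sum((quantity(row[col]) for row in rows), Fraction(0)) for col in range(1, 5))


if __name__ == "__main__":
    print([float(t) for t in totals(ROWS)])  # [3.1, 13.0, 7.625, 18.0]
```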
Production (Planned)
Not Yet Deployed
The production data layer is planned but not yet deployed. The following describes the target architecture. Actual manifests will be created when production infrastructure is provisioned.
| Component | Planned Production Setup |
|---|---|
| PostgreSQL | CloudNativePG operator (1 primary + 2 replicas), PgBouncer sidecar, continuous WAL archiving to S3, 30d retention, PITR |
| Redis | Redis Sentinel or Cluster with replicas (1 master + 2 replicas per shard) |
| Kafka | Strimzi operator (3 brokers, rack-aware, declarative KafkaTopic/KafkaUser CRDs, Cruise Control) |
| Typesense | 3-node Raft cluster with automatic leader election |
| Failover | Automatic: <30s for PG (CNPG), <15s for Redis (Sentinel), built-in for Kafka (ISR) |
| Backups | Continuous WAL + daily base to S3 (PG), AOF + RDB to S3 (Redis), log retention + replication (Kafka) |