Workloads

This page documents every service running in the BANA cluster: replicas, resources, health checks, Snowflake ID assignment, and production HA features.

Backend Services

All backend services share a common pattern:

Image: bcr.bana.com.vn/nx-<service>:<tag>
Base image: oven/bun:1.3.10-alpine
Port: 3000 (internal)
Health check path: /v1/api/<service>/health
Restart policy: Always
Namespace: nx-backend

Staging vs Production

Aspect	Staging	Production
Node selector	`node.kubernetes.io/pool: default`	`node.kubernetes.io/pool: app`
PodDisruptionBudget	None	Yes (HA services)
HPA	None	Yes (critical services)
topologySpreadConstraints	None	Yes (HA services)
podAntiAffinity	Soft (preferred) - all services	Hard (required) - HA services
Replicas	Minimal	Higher for HA services

Deployment Template

Deployment: nx-<service>

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nx-<service>
  namespace: nx-backend
  labels:
    app.kubernetes.io/name: <service>
    app.kubernetes.io/part-of: bana
    app.kubernetes.io/component: backend
spec:
  replicas: <count>
  selector:
    matchLabels:
      app.kubernetes.io/name: <service>
  template:
    metadata:
      labels:
        app.kubernetes.io/name: <service>
    spec:
      nodeSelector:
        node.kubernetes.io/pool: default  # staging: default, production: app
      initContainers:
        - name: wait-for-identity
          image: busybox:1.37
          command: ['sh', '-c', 'until wget -qO- http://nx-identity.nx-backend.svc.cluster.local:3000/v1/api/identity/health; do sleep 2; done']
      containers:
        - name: <service>
          image: bcr.bana.com.vn/nx-<service>:<tag>
          ports:
            - containerPort: 3000
          envFrom:
            - configMapRef:
                name: nx-<service>-config
            - secretRef:
                name: nx-<service>-secret
          resources:
            requests:
              cpu: <cpu-req>
              memory: <mem-req>
            limits:
              cpu: <cpu-lim>
              memory: <mem-lim>
          readinessProbe:
            httpGet:
              path: /v1/api/<service>/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /v1/api/<service>/health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 30
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /v1/api/<service>/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 36  # 5s × 36 = 180s max startup time
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 15"]  # Allow endpoint deregistration
          imagePullPolicy: IfNotPresent  # staging; production: Always
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      terminationGracePeriodSeconds: 45  # 15s preStop + 30s app shutdown
      automountServiceAccountToken: false
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi

Soft Anti-Affinity (All Deployments)

All backend deployments now include soft pod anti-affinity to spread pods across nodes when possible. This applies to staging and production:

yaml

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: <service>
          topologyKey: kubernetes.io/hostname

In staging (2 default nodes), this is a best-effort spread. In production, HA services additionally use hard anti-affinity (requiredDuringSchedulingIgnoredDuringExecution) to guarantee distribution.

Why startupProbe + preStop + terminationGracePeriod?

startupProbe: Gives slow-starting services (identity JWKS init, Kafka connection) up to 180s to become ready without being killed by the liveness probe.
preStop sleep 15: When a pod starts terminating, Kubernetes needs time to remove it from Service endpoints. The 15s sleep delays SIGTERM so that new requests stop arriving and in-flight requests can complete before the application begins shutting down.
terminationGracePeriodSeconds: 45: 15s preStop + 30s for the application to gracefully drain connections and shut down.

INFO

The wait-for-identity init container is present on all services except identity itself. Identity is the IssuerApplication (JWKS issuer); all others are VerifierApplications that need identity's public keys to validate tokens.

Production HA Features

For HA services (identity, sale, payment-api, signal), production adds:

PodDisruptionBudget

PodDisruptionBudget: nx-{service}-pdb

yaml

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nx-<service>-pdb
  namespace: nx-backend
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: <service>

HorizontalPodAutoscaler

HorizontalPodAutoscaler: nx-{service}-hpa

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nx-<service>-hpa
  namespace: nx-backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nx-<service>
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Topology & Anti-Affinity (Production Only)

yaml

spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: <service>
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: <service>
                topologyKey: kubernetes.io/hostname
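
The Staging vs Production table and the soft anti-affinity note both describe hard (required) anti-affinity for production HA services. A minimal sketch of that form, assuming the same label selector and topology key as the block above. Unlike the preferred form it takes no weight, and a pod stays Pending when no eligible node is free:

```yaml
# Sketch only: hard anti-affinity for an HA service (assumed to mirror the soft rule).
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: <service>
        topologyKey: kubernetes.io/hostname
```

With this rule the number of schedulable replicas is capped by the number of nodes in the pool, which matters when the HPA scales toward maxReplicas.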

Service Specifications

identity

The JWKS issuer - must start first. No wait-for-identity init container.

Property	Staging	Production
Replicas	1	2 (HPA: 2-5)
CPU req/lim	200m / 2	200m / 2
Mem req/lim	256Mi / 1Gi	256Mi / 1Gi
Health path	`/v1/api/identity/health`	`/v1/api/identity/health`
HA	Yes	Yes (PDB + HPA + topology)
Snowflake range	10-19	10-19
PriorityClass	`nx-high`	`nx-high`
Middlewares	`rate-limit-auth`, `circuit-breaker`, `security-headers`	`rate-limit-auth`, `circuit-breaker`, `security-headers`

yaml

env:
  - name: SNOWFLAKE_MACHINE_ID
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['snowflake-id']
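
The fieldRef above reads a pod annotation named snowflake-id through the Downward API. A minimal sketch of where that annotation could be set, assuming it lives on the pod template and takes the first ID of identity's range (10). How the value is varied per replica is not shown in this page:

```yaml
# Sketch only: pod template annotation consumed by the fieldRef above.
spec:
  template:
    metadata:
      annotations:
        snowflake-id: "10"  # assumed value: first ID in the identity range 10-19
```

Every pod created from the same template carries the same annotation value, so a second replica would need its annotation set by some other mechanism to avoid sharing a machine ID.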

commerce

Property	Value
Replicas	1
CPU req/lim	200m / 2
Mem req/lim	256Mi / 1Gi
Health path	`/v1/api/commerce/health`
HA	No
Snowflake range	20-29
Middlewares	`rate-limit`, `circuit-breaker`, `security-headers`

sale

Property	Staging	Production
Replicas	1	2 (HPA: 2-5)
CPU req/lim	200m / 2	200m / 2
Mem req/lim	256Mi / 1Gi	256Mi / 1Gi
Health path	`/v1/api/sale/health`	`/v1/api/sale/health`
HA	Yes	Yes (PDB + HPA + topology)
Snowflake range	30-39	30-39
PriorityClass	`nx-high`	`nx-high`
Middlewares	`rate-limit`, `circuit-breaker`, `security-headers`	`rate-limit`, `circuit-breaker`, `security-headers`

finance

Property	Value
Replicas	1
CPU req/lim	200m / 2
Mem req/lim	256Mi / 1Gi
Health path	`/v1/api/finance/health`
HA	No
Snowflake range	40-49

inventory

Property	Value
Replicas	1
CPU req/lim	200m / 2
Mem req/lim	256Mi / 1Gi
Health path	`/v1/api/inventory/health`
HA	No
Snowflake range	50-59

ledger

Property	Value
Replicas	1
CPU req/lim	200m / 2
Mem req/lim	384Mi / 1Gi
Health path	`/v1/api/ledger/health`
HA	No
Snowflake range	60-69

pricing

Property	Value
Replicas	1
CPU req/lim	200m / 2
Mem req/lim	320Mi / 1Gi
Health path	`/v1/api/pricing/health`
HA	No
Snowflake range	70-79

Payment (2 Deployments, 1 Image)

Payment uses a single container image deployed two ways via the APP_MODE environment variable.

payment-api

Property	Staging	Production
Replicas	1	2 (HPA: 2-4)
CPU req/lim	200m / 2	200m / 2
Mem req/lim	256Mi / 1Gi	256Mi / 1Gi
Health path	`/v1/api/payment/health`	`/v1/api/payment/health`
HA	Yes	Yes (PDB + HPA + topology)
Snowflake ID	80	80-84
Extra env	`APP_ENV_MQ_PAY_MODE=api`	`APP_MODE=api`
PriorityClass	`nx-high`	`nx-high`
Middlewares	`rate-limit`, `circuit-breaker`, `security-headers`	`rate-limit`, `circuit-breaker`, `security-headers`

Webhook rewriting is configured in Traefik's file-based dynamic config:

yaml

# Rewrites hook.staging.bana.com.vn/v1/api/* -> /v1/api/payment/*
- name: payment-webhook
  match: Host(`hook.staging.bana.com.vn`)
  priority: 100
  middlewares:
    - name: payment-add-prefix
    - name: security-headers
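
The payment-add-prefix middleware itself is not shown on this page. One way to express the documented rewrite in Traefik's file provider format is replacePathRegex, because the payment segment is inserted after /v1/api rather than prepended to the whole path. This is a sketch under that assumption, not the actual middleware definition:

```yaml
# Sketch only: /v1/api/<rest> -> /v1/api/payment/<rest>
http:
  middlewares:
    payment-add-prefix:
      replacePathRegex:
        regex: ^/v1/api/(.*)
        replacement: /v1/api/payment/$1
```

A plain addPrefix middleware would instead produce /v1/api/payment/v1/api/..., so it only fits if the prefix is meant to be prepended to the full incoming path.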

payment-worker

Property	Value
Replicas	1
CPU req/lim	100m / 1
Mem req/lim	256Mi / 1Gi
HA	No
Snowflake ID	90
Extra env	`APP_ENV_MQ_PAY_MODE=worker`
PriorityClass	`nx-low`
Liveness	Process check (`kill -0 1`), not HTTP
Traefik	Disabled (no Service/IngressRoute)
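
Because payment-worker exposes no HTTP endpoint to Traefik, its liveness check is a process check. A minimal sketch of such a probe, assuming the same period and failure threshold as the shared Deployment template:

```yaml
# Sketch only: exec-based liveness probe for the worker.
livenessProbe:
  exec:
    # kill -0 sends no signal; it only checks that PID 1 still exists.
    command: ["sh", "-c", "kill -0 1"]
  periodSeconds: 30      # assumed, copied from the backend template
  failureThreshold: 3    # assumed, copied from the backend template
```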

Signal (Dual-Route)

Property	Staging	Production
Replicas	1	2 (HPA: 2-5)
CPU req/lim	100m / 1	100m / 1
Mem req/lim	256Mi / 512Mi	256Mi / 512Mi
Health path	`/v1/api/signal/health`	`/v1/api/signal/health`
HA	Yes	Yes (PDB + HPA + topology)
Snowflake range	90-99	90-99
PriorityClass	`nx-high`	`nx-high`

INFO

Signal routing (REST API with middleware and WebSocket without rate limiting) is configured in Traefik's file-based dynamic config, not via IngressRoute CRDs.
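
A dual-route setup of this kind could look roughly as follows in Traefik's file provider format. The router names, the WebSocket path /v1/api/signal/ws, and the middleware list on the REST router are hypothetical, since the page does not list them. The backend URL follows the Service DNS pattern used elsewhere on this page:

```yaml
# Sketch only: one backend, two routers with different middleware chains.
http:
  routers:
    signal-rest:
      rule: PathPrefix(`/v1/api/signal`)
      middlewares:          # assumed chain, mirroring other backend services
        - rate-limit
        - circuit-breaker
        - security-headers
      service: signal
    signal-ws:
      # Hypothetical WebSocket path; the higher priority makes it win over signal-rest.
      rule: PathPrefix(`/v1/api/signal/ws`)
      priority: 200
      middlewares:
        - security-headers  # no rate limiting on long-lived connections
      service: signal
  services:
    signal:
      loadBalancer:
        servers:
          - url: http://nx-signal.nx-backend.svc.cluster.local:3000
```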

Traefik API Gateway

Traefik runs as a Deployment in the nx-internal namespace. It acts as an API gateway for backend routing, not as an ingress controller. External traffic enters through nginx-ingress in nx-internal, which forwards to Traefik.

Deployment: nx-traefik

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nx-traefik
  namespace: nx-internal
  labels:
    app.kubernetes.io/name: traefik
    app.kubernetes.io/part-of: bana
    app.kubernetes.io/component: gateway
spec:
  replicas: 1  # staging: 1, production: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik
  template:
    metadata:
      labels:
        app.kubernetes.io/name: traefik
    spec:
      nodeSelector:
        node.kubernetes.io/pool: default  # staging: default, production: app
      containers:
        - name: traefik
          image: traefik:v3.6
          args:
            - --api.dashboard=true
            - --api.insecure=true
            - --ping=true  # serves /ping on the traefik entrypoint, used by the readiness probe
            - --entrypoints.web.address=:8000
            - --entrypoints.traefik.address=:8080
            - --providers.file.directory=/etc/traefik/dynamic
            - --providers.file.watch=true
            - --log.format=json
            - --log.level=INFO
            - --metrics.prometheus=true
            - --metrics.prometheus.addEntryPointsLabels=true
            - --metrics.prometheus.addServicesLabels=true
            - --accesslog=true
            - --accesslog.format=json
          ports:
            - name: web
              containerPort: 8000
            - name: traefik
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: "1"
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /ping
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nx-traefik
  namespace: nx-internal
spec:
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: web
      port: 80
      targetPort: 8000
      protocol: TCP
    - name: traefik
      port: 8080
      targetPort: 8080
      protocol: TCP
  type: ClusterIP

TIP

In staging, Traefik runs as a single replica in nx-internal on a default node. In production, it runs 2 replicas on app nodes with podAntiAffinity to spread across hosts.
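
The production differences described in the tip can be sketched as an overlay on the Deployment above. Whether production uses the soft or the hard form of podAntiAffinity is not stated, so the preferred form is an assumption here:

```yaml
# Sketch only: production overrides for nx-traefik.
spec:
  replicas: 2
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: app
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:  # assumed soft rule
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: traefik
                topologyKey: kubernetes.io/hostname
```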

Snowflake ID Assignment

Each service gets a dedicated range of machine IDs to prevent ID collisions across replicas.

Service	Range	Pod 0	Pod 1
identity	10-19	10	-
commerce	20-29	20	-
sale	30-39	30	-
finance	40-49	40	-
inventory	50-59	50	-
ledger	60-69	60	-
pricing	70-79	70	-
payment-api	80-84	80	-
payment-worker	90	90	-
signal	90-99	90	-

Assignment is done via the pod ordinal index derived from the pod name:

yaml

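env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
# Kubernetes does not evaluate shell in env values, so the ID is computed in
# the container start command. Requires a pod name ending in an ordinal.
# <base> and <entrypoint> are placeholders.
command:
  - sh
  - -c
  - 'export SNOWFLAKE_MACHINE_ID=$(( ${POD_NAME##*-} + <base> )) && exec <entrypoint>'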

For Deployments (not StatefulSets), use a ConfigMap init script or the Downward API with a sidecar to compute the ID.

Frontend Services

All frontend services use nginx:1.27-alpine serving static assets in the nx-app namespace.

Service	Path	Port	CPU req/lim	Mem req/lim
client	`/client`	8080	50m/500m	64Mi/256Mi
bo	`/bo`	8080	50m/500m	64Mi/256Mi
sale-renderer	`/sale`	8080	50m/500m	64Mi/256Mi
overture	`/`	8080	50m/500m	64Mi/256Mi
wiki	`/wiki`	8080	50m/500m	64Mi/256Mi

Deployment: nx-client

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nx-client
  namespace: nx-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: client
  template:
    metadata:
      labels:
        app.kubernetes.io/name: client  # must match spec.selector.matchLabels
    spec:
      nodeSelector:
        node.kubernetes.io/pool: default  # staging: default, production: app
      containers:
        - name: client
          image: bcr.bana.com.vn/nx-client:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 10
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: default.conf
              readOnly: true
      volumes:
        - name: nginx-config
          configMap:
            name: nx-client-nginx

INFO

Frontend images are built by CI/CD and pushed to bcr.bana.com.vn/nx-<app>:<tag>. The static assets are baked into the image during the CI build step.
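
The nx-client Deployment mounts default.conf from a ConfigMap named nx-client-nginx, whose contents are not shown on this page. A minimal sketch of what it might contain, assuming the assets sit in nginx's default web root and the app is a single-page application:

```yaml
# Sketch only: hypothetical contents for the mounted nginx config.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nx-client-nginx
  namespace: nx-app
data:
  default.conf: |
    server {
      listen 8080;
      # Assumed asset location; the real image layout may differ.
      root /usr/share/nginx/html;
      index index.html;

      location / {
        # Fall back to index.html for client-side routes (assumed SPA).
        try_files $uri $uri/ /index.html;
      }
    }
```

Listening on 8080 matches the containerPort and the readiness probe in the Deployment above.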

Service Objects

Every Deployment gets a corresponding ClusterIP Service:

Service: nx-<service>

yaml

apiVersion: v1
kind: Service
metadata:
  name: nx-<service>
  namespace: nx-backend
spec:
  selector:
    app.kubernetes.io/name: <service>
  ports:
    - port: 3000      # backend
      targetPort: 3000
      protocol: TCP
  type: ClusterIP

Frontend Services use port 8080 instead of 3000. Backend DNS resolves as nx-<service>.nx-backend.svc.cluster.local. Frontend DNS resolves as nx-<service>.nx-app.svc.cluster.local.
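
For comparison, the frontend variant of the same Service could be sketched as follows, using the nx-client Deployment as the example. The name, namespace, selector label, and port are taken from that Deployment:

```yaml
# Sketch only: ClusterIP Service for the nx-client frontend.
apiVersion: v1
kind: Service
metadata:
  name: nx-client
  namespace: nx-app
spec:
  selector:
    app.kubernetes.io/name: client
  ports:
    - port: 8080      # frontend
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```

This Service resolves as nx-client.nx-app.svc.cluster.local.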
