Workloads

Every service running in the BANA cluster, with full specs for replicas, resources, health checks, Snowflake ID assignment, and production HA features.

Backend Services

All backend services share a common pattern:

  • Image: bcr.bana.com.vn/nx-<service>:<tag>
  • Base image: oven/bun:1.3-alpine
  • Port: 3000 (internal)
  • Health check path: /v1/api/<service>/health
  • Restart policy: Always
  • Namespace: nx-backend

Staging vs Production

| Aspect | Staging | Production |
| --- | --- | --- |
| Node selector | node.kubernetes.io/pool: default | node.kubernetes.io/pool: app |
| PodDisruptionBudget | None | Yes (HA services) |
| HPA | None | Yes (critical services) |
| topologySpreadConstraints | None | Yes (HA services) |
| podAntiAffinity | Soft (preferred), all services | Hard (required), HA services |
| Replicas | Minimal | Higher for HA services |

Deployment Template

Deployment: nx-<service>
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nx-<service>
  namespace: nx-backend
  labels:
    app.kubernetes.io/name: <service>
    app.kubernetes.io/part-of: bana
    app.kubernetes.io/component: backend
spec:
  replicas: <count>
  selector:
    matchLabels:
      app.kubernetes.io/name: <service>
  template:
    metadata:
      labels:
        app.kubernetes.io/name: <service>
    spec:
      nodeSelector:
        node.kubernetes.io/pool: default  # staging: default, production: app
      initContainers:
        - name: wait-for-identity
          image: busybox:1.36
          command: ['sh', '-c', 'until wget -qO- http://nx-identity.nx-backend.svc.cluster.local:3000/v1/api/identity/health; do sleep 2; done']
      containers:
        - name: <service>
          image: bcr.bana.com.vn/nx-<service>:<tag>
          ports:
            - containerPort: 3000
          envFrom:
            - configMapRef:
                name: nx-<service>-config
            - secretRef:
                name: nx-<service>-secret
          resources:
            requests:
              cpu: <cpu-req>
              memory: <mem-req>
            limits:
              cpu: <cpu-lim>
              memory: <mem-lim>
          readinessProbe:
            httpGet:
              path: /v1/api/<service>/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /v1/api/<service>/health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 30
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /v1/api/<service>/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 36  # 5s × 36 = 180s max startup time
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 15"]  # Allow endpoint deregistration
          imagePullPolicy: IfNotPresent  # staging; production: Always
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      terminationGracePeriodSeconds: 45  # 15s preStop + 30s app shutdown
      automountServiceAccountToken: false
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi

Soft Anti-Affinity (All Deployments)

All backend deployments now include soft pod anti-affinity to spread pods across nodes when possible. This applies to staging and production:

yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: <service>
          topologyKey: kubernetes.io/hostname

In staging (2 default nodes), this is a best-effort spread. In production, HA services additionally use hard anti-affinity (requiredDuringSchedulingIgnoredDuringExecution) to guarantee distribution.

Why startupProbe + preStop + terminationGracePeriod?
  • startupProbe: Gives slow-starting services (identity JWKS init, Kafka connection) up to 180s to become ready without being killed by the liveness probe.
  • preStop sleep 15: When a pod starts terminating, its removal from Service endpoints takes a few seconds to propagate. The 15s sleep delays SIGTERM so that new requests stop arriving before the application begins shutting down.
  • terminationGracePeriodSeconds: 45: 15s preStop + 30s for the application to gracefully drain connections and shut down.

INFO

The wait-for-identity init container is present on all services except identity itself. Identity is the IssuerApplication (JWKS issuer); all others are VerifierApplications that need identity's public keys to validate tokens.

Production HA Features

For HA services (identity, sale, payment-api, signal), production adds:

PodDisruptionBudget

PodDisruptionBudget: nx-{service}-pdb
yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nx-<service>-pdb
  namespace: nx-backend
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: <service>

HorizontalPodAutoscaler

HorizontalPodAutoscaler: nx-{service}-hpa
yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nx-<service>-hpa
  namespace: nx-backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nx-<service>
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Topology & Anti-Affinity (Production Only)

yaml
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: <service>
      affinity:
        podAntiAffinity:
          # Hard (required) anti-affinity: at most one pod of the service per node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: <service>
              topologyKey: kubernetes.io/hostname

Service Specifications

identity

The JWKS issuer — must start first. No wait-for-identity init container.

| Property | Staging | Production |
| --- | --- | --- |
| Replicas | 1 | 2 (HPA: 2–5) |
| CPU req/lim | 200m / 2 | 200m / 2 |
| Mem req/lim | 256Mi / 1Gi | 256Mi / 1Gi |
| Health path | /v1/api/identity/health | /v1/api/identity/health |
| HA | Yes | Yes (PDB + HPA + topology) |
| Snowflake range | 10–19 | 10–19 |
| PriorityClass | nx-high | nx-high |
| Middlewares | rate-limit-auth, circuit-breaker, security-headers | rate-limit-auth, circuit-breaker, security-headers |

yaml
env:
  - name: SNOWFLAKE_MACHINE_ID
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['snowflake-id']
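
The fieldRef above reads a pod annotation named snowflake-id, so that annotation has to exist on the pod template. A minimal fragment showing where it could be set, assuming the value 10 (the first ID in identity's range) is assigned statically:

```yaml
# Sketch only: the value "10" is an assumption taken from the start of
# identity's Snowflake range (10–19). Fragment, not a complete Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nx-identity
  namespace: nx-backend
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/name: identity
      annotations:
        snowflake-id: "10"   # exposed to the container as SNOWFLAKE_MACHINE_ID
```

Every replica of a Deployment shares the same pod template, so a static annotation gives all replicas the same ID. Distinct per-replica IDs need one of the mechanisms described under Snowflake ID Assignment.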

commerce

| Property | Value |
| --- | --- |
| Replicas | 1 |
| CPU req/lim | 200m / 2 |
| Mem req/lim | 256Mi / 1Gi |
| Health path | /v1/api/commerce/health |
| HA | No |
| Snowflake range | 20–29 |
| Middlewares | rate-limit, circuit-breaker, security-headers |

sale

| Property | Staging | Production |
| --- | --- | --- |
| Replicas | 1 | 2 (HPA: 2–5) |
| CPU req/lim | 200m / 2 | 200m / 2 |
| Mem req/lim | 256Mi / 1Gi | 256Mi / 1Gi |
| Health path | /v1/api/sale/health | /v1/api/sale/health |
| HA | Yes | Yes (PDB + HPA + topology) |
| Snowflake range | 30–39 | 30–39 |
| PriorityClass | nx-high | nx-high |
| Middlewares | rate-limit, circuit-breaker, security-headers | rate-limit, circuit-breaker, security-headers |

finance

| Property | Value |
| --- | --- |
| Replicas | 1 |
| CPU req/lim | 200m / 2 |
| Mem req/lim | 256Mi / 1Gi |
| Health path | /v1/api/finance/health |
| HA | No |
| Snowflake range | 40–49 |

inventory

| Property | Value |
| --- | --- |
| Replicas | 1 |
| CPU req/lim | 200m / 2 |
| Mem req/lim | 256Mi / 1Gi |
| Health path | /v1/api/inventory/health |
| HA | No |
| Snowflake range | 50–59 |

ledger

| Property | Value |
| --- | --- |
| Replicas | 1 |
| CPU req/lim | 200m / 2 |
| Mem req/lim | 384Mi / 1Gi |
| Health path | /v1/api/ledger/health |
| HA | No |
| Snowflake range | 60–69 |

pricing

| Property | Value |
| --- | --- |
| Replicas | 1 |
| CPU req/lim | 200m / 2 |
| Mem req/lim | 320Mi / 1Gi |
| Health path | /v1/api/pricing/health |
| HA | No |
| Snowflake range | 70–79 |

Payment (2 Deployments, 1 Image)

Payment uses a single container image deployed two ways via the APP_MODE environment variable.
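
A minimal sketch of how the two Deployments could differ only in that variable. The Deployment names, container names, and the image name nx-payment are assumptions based on the nx-<service> naming pattern, and the variable name follows the sentence above (the tables below also list APP_ENV_MQ_PAY_MODE, so the exact name should be checked against the real manifests):

```yaml
# Sketch only: names and image are assumed, not taken from the cluster.
# Deployment nx-payment-api (container fragment)
containers:
  - name: payment-api
    image: bcr.bana.com.vn/nx-payment:<tag>
    env:
      - name: APP_MODE
        value: api       # serves HTTP traffic
---
# Deployment nx-payment-worker (container fragment)
containers:
  - name: payment-worker
    image: bcr.bana.com.vn/nx-payment:<tag>   # same image as payment-api
    env:
      - name: APP_MODE
        value: worker    # background processing, no HTTP routing
```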

payment-api

| Property | Staging | Production |
| --- | --- | --- |
| Replicas | 1 | 2 (HPA: 2–4) |
| CPU req/lim | 200m / 2 | 200m / 2 |
| Mem req/lim | 256Mi / 1Gi | 256Mi / 1Gi |
| Health path | /v1/api/payment/health | /v1/api/payment/health |
| HA | Yes | Yes (PDB + HPA + topology) |
| Snowflake ID | 80 | 80–84 |
| Extra env | APP_ENV_MQ_PAY_MODE=api | APP_MODE=api |
| PriorityClass | nx-high | nx-high |
| Middlewares | rate-limit, circuit-breaker, security-headers | rate-limit, circuit-breaker, security-headers |

Webhook rewriting is configured in Traefik's file-based dynamic config:

yaml
# Rewrites hook.staging.bana.com.vn/v1/api/* -> /v1/api/payment/*
- name: payment-webhook
  match: Host(`hook.staging.bana.com.vn`)
  priority: 100
  middlewares:
    - name: payment-add-prefix
    - name: security-headers
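
The router above references a payment-add-prefix middleware without showing its definition. One way the described rewrite could be expressed in Traefik's file provider is a replacePathRegex middleware. The body below is an assumption about that definition, not the cluster's actual config:

```yaml
# Sketch only: the middleware body is assumed; only its name comes from
# the router definition above.
http:
  middlewares:
    payment-add-prefix:
      replacePathRegex:
        regex: "^/v1/api/(.*)"
        replacement: "/v1/api/payment/$1"
```

With this definition, a request for /v1/api/callback (a hypothetical path) would reach the backend as /v1/api/payment/callback.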

payment-worker

| Property | Value |
| --- | --- |
| Replicas | 1 |
| CPU req/lim | 100m / 1 |
| Mem req/lim | 256Mi / 1Gi |
| HA | No |
| Snowflake ID | 90 |
| Extra env | APP_ENV_MQ_PAY_MODE=worker |
| PriorityClass | nx-low |
| Liveness | Process check (kill -0 1), not HTTP |
| Traefik | Disabled (no Service/IngressRoute) |
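
The process-based liveness check listed for payment-worker could be written as an exec probe. A minimal sketch, assuming the timing values from the shared Deployment template also apply to the worker:

```yaml
# Sketch only: timing values are copied from the shared template and may
# differ for the real worker.
livenessProbe:
  exec:
    # Signal 0 delivers nothing; it only checks that PID 1 still exists.
    command: ["sh", "-c", "kill -0 1"]
  initialDelaySeconds: 15
  periodSeconds: 30
  failureThreshold: 3
```

An exec probe fits here because the worker exposes no HTTP endpoint for the kubelet to call.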

Signal (Dual-Route)

| Property | Staging | Production |
| --- | --- | --- |
| Replicas | 1 | 2 (HPA: 2–5) |
| CPU req/lim | 100m / 1 | 100m / 1 |
| Mem req/lim | 256Mi / 512Mi | 256Mi / 512Mi |
| Health path | /v1/api/signal/health | /v1/api/signal/health |
| HA | Yes | Yes (PDB + HPA + topology) |
| Snowflake range | 90–99 | 90–99 |
| PriorityClass | nx-high | nx-high |

INFO

Signal routing (REST API with middleware and WebSocket without rate limiting) is configured in Traefik's file-based dynamic config, not via IngressRoute CRDs.
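
A sketch of what such a dual-route setup might look like in Traefik's file provider. The WebSocket path /v1/api/signal/ws, the router names, and the middleware lists are assumptions for illustration; the referenced middlewares are assumed to be defined elsewhere in the same dynamic config:

```yaml
# Sketch only: paths, router names, and middleware lists are hypothetical.
http:
  routers:
    signal-api:
      rule: "PathPrefix(`/v1/api/signal`)"
      entryPoints:
        - web
      middlewares:
        - rate-limit
        - circuit-breaker
        - security-headers
      service: signal
    signal-ws:
      rule: "PathPrefix(`/v1/api/signal/ws`)"
      priority: 100          # evaluated before the broader REST route
      entryPoints:
        - web
      middlewares:
        - security-headers   # no rate limiting on long-lived connections
      service: signal
  services:
    signal:
      loadBalancer:
        servers:
          - url: "http://nx-signal.nx-backend.svc.cluster.local:3000"
```

Both routers point at the same backend Service; only the middleware chain differs.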

Traefik API Gateway

Traefik runs as a Deployment in the nx-internal namespace. It acts as an API gateway for backend routing, not as an ingress controller. External traffic enters through nginx-ingress in nx-internal, which forwards to Traefik.

Deployment: nx-traefik
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nx-traefik
  namespace: nx-internal
  labels:
    app.kubernetes.io/name: traefik
    app.kubernetes.io/part-of: bana
    app.kubernetes.io/component: gateway
spec:
  replicas: 1  # staging: 1, production: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik
  template:
    metadata:
      labels:
        app.kubernetes.io/name: traefik
    spec:
      nodeSelector:
        node.kubernetes.io/pool: default  # staging: default, production: app
      containers:
        - name: traefik
          image: traefik:v3.6
          args:
            - --api.dashboard=true
            - --api.insecure=true
            - --entrypoints.web.address=:8000
            - --entrypoints.traefik.address=:8080
            - --providers.file.directory=/etc/traefik/dynamic
            - --providers.file.watch=true
            - --log.format=json
            - --log.level=INFO
            - --metrics.prometheus=true
            - --metrics.prometheus.addEntryPointsLabels=true
            - --metrics.prometheus.addServicesLabels=true
            - --accesslog=true
            - --accesslog.format=json
            - --ping=true  # enables /ping on the traefik entrypoint, used by the readiness probe
          ports:
            - name: web
              containerPort: 8000
            - name: traefik
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: "1"
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /ping
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nx-traefik
  namespace: nx-internal
spec:
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: web
      port: 80
      targetPort: 8000
      protocol: TCP
    - name: traefik
      port: 8080
      targetPort: 8080
      protocol: TCP
  type: ClusterIP

TIP

In staging, Traefik runs as a single replica in nx-internal on a default node. In production, it runs 2 replicas on app nodes with podAntiAffinity to spread across hosts.
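
The production differences described above could be expressed as the following overrides on the nx-traefik Deployment. This is a sketch: the source does not say whether production uses the preferred or the required form of podAntiAffinity, so the soft form is assumed here:

```yaml
# Sketch only: soft anti-affinity is an assumption.
spec:
  replicas: 2
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/pool: app
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: traefik
                topologyKey: kubernetes.io/hostname
```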

Snowflake ID Assignment

Each service gets a dedicated range of machine IDs to prevent ID collisions across replicas.

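| Service | Range | Pod 0 | Pod 1 |
| --- | --- | --- | --- |
| identity | 10–19 | 10 | |
| commerce | 20–29 | 20 | |
| sale | 30–39 | 30 | |
| finance | 40–49 | 40 | |
| inventory | 50–59 | 50 | |
| ledger | 60–69 | 60 | |
| pricing | 70–79 | 70 | |
| payment-api | 80–84 | 80 | |
| payment-worker | 90 | 90 | |
| signal | 90–99 | 90 | |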

Assignment is done via the pod ordinal index derived from the pod name:

yaml
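env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
# Kubernetes does not evaluate shell substitutions in env values, so the ID
# is computed in the entrypoint. Assumes a StatefulSet pod name ending in
# "-<ordinal>"; <entrypoint> stands for the image's normal start command.
command: ["sh", "-c"]
args:
  - |
    export SNOWFLAKE_MACHINE_ID=$(( ${POD_NAME##*-} + <base> ))
    exec <entrypoint>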

Deployment pod names end in a random suffix rather than an ordinal, so for Deployments (not StatefulSets), use a ConfigMap init script or the Downward API with a sidecar to compute the ID.

Frontend Services

All frontend services use nginx:1.27-alpine serving static assets in the nx-app namespace.

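| Service | Path | Port | CPU req/lim | Mem req/lim |
| --- | --- | --- | --- | --- |
| client | /client | 8080 | 50m/500m | 64Mi/256Mi |
| bo | /bo | 8080 | 50m/500m | 64Mi/256Mi |
| sale-renderer | /sale | 8080 | 50m/500m | 64Mi/256Mi |
| overture | / | 8080 | 50m/500m | 64Mi/256Mi |
| wiki | /wiki | 8080 | 50m/500m | 64Mi/256Mi |
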
Deployment: nx-client
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nx-client
  namespace: nx-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: client
  template:
    metadata:
      labels:
        app.kubernetes.io/name: client  # must match spec.selector.matchLabels
    spec:
      nodeSelector:
        node.kubernetes.io/pool: default  # staging: default, production: app
      containers:
        - name: client
          image: bcr.bana.com.vn/nx-client:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 10
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: default.conf
              readOnly: true
      volumes:
        - name: nginx-config
          configMap:
            name: nx-client-nginx
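
The Deployment above mounts default.conf from the nx-client-nginx ConfigMap. A minimal sketch of what that ConfigMap might contain: only the ConfigMap name, the default.conf key, and port 8080 come from the manifest above, while the server block itself is an assumption:

```yaml
# Sketch only: the nginx server block is hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nx-client-nginx
  namespace: nx-app
data:
  default.conf: |
    server {
      listen 8080;
      root /usr/share/nginx/html;   # default docroot of the nginx image
      location / {
        # Fall back to index.html for client-side routes (assumed behaviour)
        try_files $uri $uri/ /index.html;
      }
    }
```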

INFO

Frontend images are built by CI/CD and pushed to bcr.bana.com.vn/nx-<app>:<tag>. The static assets are baked into the image during the CI build step.

Service Objects

Every Deployment except payment-worker (which has no Service) gets a corresponding ClusterIP Service:

Service: nx-<service>
yaml
apiVersion: v1
kind: Service
metadata:
  name: nx-<service>
  namespace: nx-backend
spec:
  selector:
    app.kubernetes.io/name: <service>
  ports:
    - port: 3000      # backend
      targetPort: 3000
      protocol: TCP
  type: ClusterIP

Frontend Services use port 8080 instead of 3000. Backend DNS resolves as nx-<service>.nx-backend.svc.cluster.local. Frontend DNS resolves as nx-<service>.nx-app.svc.cluster.local.
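
For comparison, a frontend Service following the same template might look like this for the client app, with only the namespace and port changed as described above:

```yaml
# Sketch for the client frontend, derived from the backend Service template.
apiVersion: v1
kind: Service
metadata:
  name: nx-client
  namespace: nx-app
spec:
  selector:
    app.kubernetes.io/name: client
  ports:
    - port: 8080      # frontend
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```

This Service would resolve as nx-client.nx-app.svc.cluster.local.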
