
Operations

1. Deployment

| Property | Value |
|---|---|
| Image | `registry/nx-seller-payment:<tag>` |
| Container port | 3000 |
| External port | 31040 |
| Default mode | FULL (snowflake node ID 4) |
| Production mode | API (snowflake node ID 8) + N× WORKER (snowflake node ID 91+i) |
| Replicas (FULL) | 1 (dev only) |
| Replicas (API) | 1+ (scale by traffic) |
| Replicas (WORKER) | 1+ (scale by queue depth) |
| Resources (request / limit) | CPU 200m / 1; memory 512Mi / 1Gi |
| Liveness probe | `GET /v1/api/payment/healthz` |
| Readiness probe | `GET /v1/api/payment/readyz` |
| Migration | Run as a separate Job before rollout (`RUN_MODE=migrate`) |
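The node-ID scheme in the table can be sketched as follows. This is a minimal sketch, assuming a Twitter-style snowflake layout (millisecond timestamp, 10-bit node ID, 12-bit sequence) and a hypothetical custom epoch; the service's real generator may pack bits differently. `nodeIdFor` and `snowflake` are illustrative names, not functions from the codebase. The sketch shows why two WORKER pods sharing a node ID can mint identical IDs.

```typescript
// Illustrative sketch only: assumes a Twitter-style snowflake layout.
type RunMode = "FULL" | "API" | "WORKER";

// Node IDs from the deployment table; workerIndex is 0-based, so the first worker gets 91.
function nodeIdFor(mode: RunMode, workerIndex = 0): number {
  if (mode === "FULL") return 4;
  if (mode === "API") return 8;
  return 91 + workerIndex; // WORKER
}

// Hypothetical custom epoch in ms; the real service's epoch is not documented here.
const EPOCH_MS = 1700000000000n;

// Packs (timestamp - epoch) << 22 | nodeId << 12 | sequence.
function snowflake(timestampMs: bigint, nodeId: number, sequence: number): bigint {
  if (nodeId < 0 || nodeId > 1023) throw new RangeError("nodeId must fit in 10 bits");
  if (sequence < 0 || sequence > 4095) throw new RangeError("sequence must fit in 12 bits");
  return ((timestampMs - EPOCH_MS) << 22n) | (BigInt(nodeId) << 12n) | BigInt(sequence);
}

// Two workers misconfigured with the same node ID mint their first ID in the same millisecond.
const now = 1700000000123n;
const workerA = snowflake(now, nodeIdFor("WORKER", 0), 0);
const workerB = snowflake(now, nodeIdFor("WORKER", 0), 0); // should have used index 1 (node 92)
console.log(workerA === workerB); // true: duplicate primary key
```

Within a single millisecond, the node ID is the only field that keeps two pods' IDs apart, which is why each WORKER needs its own `APP_ENV_NODE_ID`.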

Traefik routing labels

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.payment.rule=PathPrefix(`/v1/api/payment`)"
  - "traefik.http.services.payment.loadbalancer.server.port=3000"
```

WORKER pods don't expose a REST API, so they need no Traefik labels.

Required infrastructure

| Dependency | Why |
|---|---|
| PostgreSQL | Configuration, WebhookConfig, and MQ-Pay tables |
| Redis | BullMQ scheduler and confirmation queues; WebSocket emitter |
| `@nx/identity` reachable | JWKS verification |
| VN Pay reachable | Provider integration |
| Webhook subscribers reachable | At minimum, sale must accept webhook deliveries |

2. Observability

| Signal | Source | Where to look |
|---|---|---|
| Logs | stdout | `kubectl logs deploy/payment-api` or `kubectl logs deploy/payment-worker` |
| Health | `/healthz`, `/readyz` | Gateway portal |
| OpenAPI live spec | `GET /v1/api/payment/doc/openapi.json` | Gateway portal |
| BullMQ queue depth | Redis | bull-board / Burrow |
| Webhook dispatch errors | Logs | `grep WebhookDispatcherService` |

Key log fields

| Field | Source | Notes |
|---|---|---|
| `mode` | env `APP_ENV_MQ_PAY_MODE` | Logged on boot |
| `transactionId` / `attemptId` | Per event | Trace the per-payment lifecycle |
| `webhookConfigId` | Dispatch logs | Which subscriber received the event |
| `paymentProvider` | Event payload | `VNPAY_QR_MMS`, etc. |
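To follow one payment through the logs, filter on `transactionId`. The sketch below assumes the service writes one JSON object per line using the field names listed above (the `msg` field is a hypothetical extra); `traceTransaction` is an illustrative helper, not part of the service. Its input would be the output of `kubectl logs -n <ns> deploy/payment-worker`.

```typescript
// Assumption: each log line is a JSON object carrying the key fields above.
interface PaymentLogRecord {
  mode?: string;
  transactionId?: string;
  attemptId?: string;
  webhookConfigId?: string;
  paymentProvider?: string;
  msg?: string; // hypothetical message field
  [key: string]: unknown;
}

// Returns the records for one transaction, in log order, skipping non-JSON lines.
function traceTransaction(rawLogs: string, transactionId: string): PaymentLogRecord[] {
  const records: PaymentLogRecord[] = [];
  for (const line of rawLogs.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // plain-text lines such as startup banners
    try {
      const record = JSON.parse(trimmed) as PaymentLogRecord;
      if (record.transactionId === transactionId) records.push(record);
    } catch {
      // Truncated or otherwise invalid JSON: ignore the line.
    }
  }
  return records;
}
```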

3. Security

| Concern | Mitigation |
|---|---|
| Provider credential leak | Stored AES-256-GCM-encrypted in `Configuration.credential`; column hidden from CRUD; key from a K8s secret |
| Encryption key consistency | `APP_ENV_APPLICATION_SECRET` MUST be identical across all pods (API + WORKER) |
| IPN forgery | Provider signature verified in `@nx/mq-pay` `PaymentVerificationService` |
| Webhook tampering | `WebhookConfig.signingMethod` (`HMAC_SHA256`); subscribers verify the HMAC of timestamp + eventType + body |
| AuthN | JWT, verified locally against `@nx/identity` JWKS |
| AuthZ | Casbin via `PolicyDefinition` |
| Network policy | Cilium: allow gateway, sale, identity, Redis, Postgres, and VN Pay (egress) |
| Secret rotation | Rotating `APP_ENV_APPLICATION_SECRET` requires re-encrypting all `Configuration.credential` rows (manual procedure) |
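A subscriber-side check of `HMAC_SHA256` signatures might look like the sketch below. It assumes the signed string is the plain concatenation `timestamp + eventType + body` with no separator, a hex-encoded signature, a Unix-seconds timestamp, and a per-subscriber shared secret; this runbook does not specify those details, so confirm them against the dispatcher before relying on this. `signWebhook`, `verifyWebhook`, and the 300-second replay window are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed signing input: timestamp + eventType + body, no separator, hex digest.
function signWebhook(secret: string, timestamp: string, eventType: string, body: string): string {
  return createHmac("sha256", secret).update(timestamp + eventType + body).digest("hex");
}

function verifyWebhook(
  secret: string,
  timestamp: string,
  eventType: string,
  body: string,
  signatureHex: string,
  maxSkewSeconds = 300, // hypothetical replay window
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  // Reject stale or future-dated deliveries so a captured request can't be replayed later.
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > maxSkewSeconds) return false;

  const expected = Buffer.from(signWebhook(secret, timestamp, eventType, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Comparing with `timingSafeEqual` rather than `===` avoids leaking, through response timing, how many leading bytes of a forged signature matched.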

4. Runbook

4.1 Alert classes

| Alert | Trigger | Check | Fix | Escalate |
|---|---|---|---|---|
| PaymentHighErrorRate | 5xx > 5% over 5m | Logs; identify the failing endpoint | Roll back if there was a recent deploy | On-call backend |
| PaymentQueueLag | BullMQ pending > 1000 | Redis bull-board | Scale WORKER replicas | On-call SRE |
| PaymentWorkerStarvation | WORKER pods show 0 active jobs but the queue has work | Check that snowflake IDs are unique per worker (a collision shows up as no jobs processed) | Fix `APP_ENV_NODE_ID` per worker | On-call SRE |
| PaymentWebhookDispatchFailures | Webhook delivery error rate > 10% | `WebhookDispatcherService` logs | Check subscriber health (sale, etc.) | On-call backend |
| PaymentProviderUnreachable | VN Pay returns 5xx | Logs + provider status page | Wait for the provider; switch to an alternate provider if one is configured | On-call backend |
| PaymentEncryptionMismatch | Spike in "decrypt failed" logs | Check `APP_ENV_APPLICATION_SECRET` consistency across pods | Redeploy with a consistent secret | On-call SRE (HIGH priority) |
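The `PaymentEncryptionMismatch` alert follows from how AES-256-GCM behaves: decryption authenticates the ciphertext, so a pod holding a different key does not silently produce garbage; it fails outright. The sketch below illustrates this under two hypothetical assumptions: the 32-byte key is the SHA-256 digest of `APP_ENV_APPLICATION_SECRET`, and `Configuration.credential` stores `base64(iv | authTag | ciphertext)`. The service's real key derivation and storage format may differ, and `encryptCredential` / `decryptCredential` are illustrative names.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Hypothetical key derivation: SHA-256 of the application secret gives a 32-byte key.
function deriveKey(applicationSecret: string): Buffer {
  return createHash("sha256").update(applicationSecret).digest();
}

// Hypothetical storage format: base64(iv[12] | authTag[16] | ciphertext).
function encryptCredential(plaintext: string, applicationSecret: string): string {
  const iv = randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", deriveKey(applicationSecret), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptCredential(stored: string, applicationSecret: string): string {
  const raw = Buffer.from(stored, "base64");
  const iv = raw.subarray(0, 12);
  const authTag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(applicationSecret), iv);
  decipher.setAuthTag(authTag);
  // final() throws if the key differs or the data was modified: the "decrypt failed" case.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The same property explains the manual rotation procedure: rows written under the old key stay unreadable under the new one until they are re-encrypted.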

4.2 Common operations

| Operation | Command |
|---|---|
| Tail API logs | `kubectl logs -n <ns> -f deploy/payment-api` |
| Tail worker logs | `kubectl logs -n <ns> -f deploy/payment-worker` |
| Run migrations | `kubectl run payment-migrate -n <ns> --image=...:tag --env="RUN_MODE=migrate" --rm -i --restart=Never` (`--rm` requires an attached session, hence `-i`) |
| List subscribed webhooks | `SELECT id, name, url, event_types, status FROM "WebhookConfig" WHERE deleted_at IS NULL;` |
| Replay a stuck transaction | Via the MQ-Pay admin UI, or in the DB: `UPDATE "Transaction" SET status = 'NEW' WHERE id = '...';` (requires senior approval) |
| Inspect encrypted configs | `SELECT id, code, principal_type, principal_id, environment FROM "Configuration" WHERE "group" = 'INTEGRATION';` (`group` is a reserved word and must be quoted; do not select `credential`) |

4.3 Recovery scenarios

| Scenario | Recovery |
|---|---|
| WORKER pod crashes mid-job | BullMQ re-delivers the job; processing is idempotent at the MQ-Pay layer |
| API pod crashes during transaction create | Sale receives a 5xx and retries the create; MQ-Pay deduplicates by sale's transaction reference |
| Webhook subscriber down | `WebhookDispatcherService` retries up to `WebhookConfig.metadata.maxRetries` times, then marks the delivery failed. The manual replay tool fetches the transaction state and re-dispatches |
| Encryption secret rotated incorrectly | Workers can't decrypt provider credentials, so all transactions fail. Roll back the env change immediately. |
| Snowflake ID collision (two WORKER pods with the same NODE_ID) | Duplicate primary keys in MQ-Pay tables cause INSERT errors. Verify each worker has a unique `APP_ENV_NODE_ID` (91, 92, …) |
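The API-crash recovery relies on creates being idempotent per sale transaction reference. A minimal in-memory sketch of that contract follows; `TransactionStore`, `createOrGet`, and the field names are hypothetical. A real implementation would more likely enforce uniqueness with a database constraint on the reference column, since API pods do not share memory.

```typescript
// Hypothetical model; only the 'NEW' status appears in this runbook.
interface PaymentTransaction {
  id: string;
  saleReference: string;
  amount: number;
  status: string;
}

class TransactionStore {
  private readonly bySaleReference = new Map<string, PaymentTransaction>();
  private nextId = 1;

  // Returns the existing transaction when sale retries with the same reference,
  // so a retry after an API pod crash does not create a second payment.
  createOrGet(saleReference: string, amount: number): { tx: PaymentTransaction; created: boolean } {
    const existing = this.bySaleReference.get(saleReference);
    if (existing) return { tx: existing, created: false };
    const tx: PaymentTransaction = { id: `tx-${this.nextId++}`, saleReference, amount, status: "NEW" };
    this.bySaleReference.set(saleReference, tx);
    return { tx, created: true };
  }
}
```

The `created` flag lets the caller return the same response on a retry as on the original request, which keeps the retry safe from sale's point of view.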

5. Cross-Service Runbook

Proprietary and Confidential. Unauthorized copying, distribution, or use of this software is strictly prohibited.