Operations
1. Deployment
| Property | Value |
|---|---|
| Image | registry/nx-seller-payment:<tag> |
| Container Port | 3000 |
| External Port | 31040 |
| Default mode | FULL (snowflake 4) |
| Production mode | 1× API (snowflake 8) + N× WORKER (snowflake 91+i) |
| Replicas (FULL) | 1 (dev only) |
| Replicas (API) | 1+ (scale by traffic) |
| Replicas (WORKER) | 1+ (scale by queue depth) |
| Resources (req/lim) | 200m / 1 CPU, 512Mi / 1Gi memory |
| Live probe | GET /v1/api/payment/healthz |
| Ready probe | GET /v1/api/payment/readyz |
| Migration | run as separate Job before rollout (RUN_MODE=migrate) |
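
The snowflake numbering in the table (4 for FULL, 8 for API, 91+i for WORKER) can be written out as a small helper. This is a minimal sketch, assuming worker pods carry an ordinal suffix in their hostname (as StatefulSet pods do, for example `payment-worker-2`). The function name `node_id_for` and the hostname pattern are illustrative assumptions, not part of the service.

```python
import re

# Values taken from the deployment table.
FULL_NODE_ID = 4
API_NODE_ID = 8
WORKER_BASE_NODE_ID = 91


def node_id_for(mode: str, hostname: str = "") -> int:
    """Return the snowflake node ID for a pod (hypothetical helper)."""
    if mode == "FULL":
        return FULL_NODE_ID
    if mode == "API":
        return API_NODE_ID
    if mode == "WORKER":
        # Assumption: the hostname ends in "-<ordinal>", as with StatefulSet pods.
        match = re.search(r"-(\d+)$", hostname)
        if match is None:
            raise ValueError(f"no ordinal suffix in hostname {hostname!r}")
        return WORKER_BASE_NODE_ID + int(match.group(1))
    raise ValueError(f"unknown mode {mode!r}")
```

The result would be exported as `APP_ENV_NODE_ID` before the process starts, so that worker 0 gets 91, worker 1 gets 92, and so on.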
Traefik routing labels
```yaml
labels:
- "traefik.enable=true"
- "traefik.http.routers.payment.rule=PathPrefix(`/v1/api/payment`)"
- "traefik.http.services.payment.loadbalancer.server.port=3000"
```

WORKER pods don't expose REST, so they need no Traefik labels.
Required infrastructure
| Dependency | Why |
|---|---|
| PostgreSQL | Configuration, WebhookConfig, MQ-Pay tables |
| Redis | BullMQ for scheduler + confirmation queues; WS emitter |
| @nx/identity reachable | JWKS verification |
| VN Pay reachable | Provider integration |
| Webhook subscribers reachable | At least sale must accept webhook deliveries |
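
A pre-rollout check that each dependency accepts TCP connections can catch network-policy mistakes early. The sketch below is only a connectivity probe under assumed host/port inputs; it does not authenticate or speak the PostgreSQL or Redis protocols, and `is_reachable` is a hypothetical helper name.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling it once per row of the table (PostgreSQL, Redis, identity, VN Pay, each webhook subscriber) gives a quick pass/fail list before the pods are rolled out.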
2. Observability
| Signal | Source | Where to look |
|---|---|---|
| Logs | stdout | kubectl logs deploy/payment-{api,worker} |
| Health | /healthz, /readyz | Gateway portal |
| OpenAPI live spec | GET /v1/api/payment/doc/openapi.json | Gateway portal |
| BullMQ queue depth | Redis | bull-board / Burrow |
| Webhook dispatch errors | logs | grep WebhookDispatcherService |
Key log fields
| Field | Source | Notes |
|---|---|---|
| mode | env APP_ENV_MQ_PAY_MODE | log on boot |
| transactionId / attemptId | per-event | trace per-payment lifecycle |
| webhookConfigId | dispatch logs | which subscriber received |
| paymentProvider | event payload | VNPAY_QR_MMS / etc. |
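
To follow one payment through its lifecycle, the `transactionId` field can be used as a filter over the log stream. The sketch below assumes the service writes one JSON object per line to stdout; that log format is an assumption, and `lines_for_transaction` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def lines_for_transaction(lines: Iterable[str], transaction_id: str) -> Iterator[dict]:
    """Yield parsed log records whose transactionId matches.

    Assumes JSON-lines logs; anything that is not a JSON object is skipped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. stack traces or plain-text startup banners
        if isinstance(record, dict) and record.get("transactionId") == transaction_id:
            yield record
```

Passing `sys.stdin` as `lines` lets the function consume the output of `kubectl logs` through a pipe.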
3. Security
| Concern | Mitigation |
|---|---|
| Provider credential leak | Stored AES-256-GCM encrypted in Configuration.credential; column hidden from CRUD; key from K8s secret |
| Encryption key consistency | APP_ENV_APPLICATION_SECRET MUST be identical across all pods (API + WORKER) |
| IPN forgery | Provider signature verified in @nx/mq-pay PaymentVerificationService |
| Webhook tampering | WebhookConfig.signingMethod (HMAC_SHA256) — subscribers verify HMAC of timestamp + eventType + body |
| AuthN | JWT (verified locally) |
| AuthZ | Casbin via PolicyDefinition |
| Network policy | Cilium — allow gateway + sale + identity + Redis + Postgres + VN Pay (egress) |
| Secret rotation | Rotating APP_ENV_APPLICATION_SECRET requires re-encrypting all Configuration.credential rows (manual procedure) |
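
On the subscriber side, the webhook tampering check amounts to recomputing the HMAC and comparing it in constant time. The following is a minimal sketch, assuming the three parts are joined with `.` and the signature is sent hex-encoded; the delimiter, the encoding, and the function names are assumptions, so the dispatcher's actual layout must be confirmed before relying on it.

```python
import hashlib
import hmac


def sign(secret: bytes, timestamp: str, event_type: str, body: bytes) -> str:
    """Hex HMAC-SHA256 over timestamp + eventType + body (assumed layout)."""
    # Assumption: parts are joined with "."; the real dispatcher may differ.
    message = timestamp.encode() + b"." + event_type.encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: str, event_type: str,
           body: bytes, received_signature: str) -> bool:
    """Return True if the received signature matches the recomputed one."""
    expected = sign(secret, timestamp, event_type, body)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The body must be the raw request bytes as received; re-serializing parsed JSON can change whitespace or key order and make a valid signature fail.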
4. Runbook
4.1 Alert classes
| Alert | Trigger | Check | Fix | Escalate |
|---|---|---|---|---|
| PaymentHighErrorRate | 5xx >5% over 5m | logs, identify endpoint | rollback if recent deploy | on-call backend |
| PaymentQueueLag | BullMQ pending >1000 | Redis bull-board | scale WORKER replicas | on-call SRE |
| PaymentWorkerStarvation | WORKER pods 0 active jobs but queue has work | check unique snowflake IDs (collision = no jobs) | fix APP_ENV_NODE_ID per worker | on-call SRE |
| PaymentWebhookDispatchFailures | webhook delivery error rate >10% | logs WebhookDispatcherService | check subscriber health (sale, etc.) | on-call backend |
| PaymentProviderUnreachable | VN Pay 5xx | logs + provider status page | wait for provider; switch to alternate if configured | on-call backend |
| PaymentEncryptionMismatch | "decrypt failed" log spike | check APP_ENV_APPLICATION_SECRET consistency across pods | redeploy with consistent secret | on-call SRE (HIGH) |
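
The three numeric triggers in the table can be expressed as plain threshold checks, which is handy when validating alert rules against sample data. This is an illustrative sketch only: the `Metrics` fields and `firing_alerts` are hypothetical names, and the real alerts would normally live in the monitoring system rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int          # total HTTP requests in the 5-minute window
    server_errors: int     # 5xx responses in the same window
    queue_pending: int     # BullMQ pending jobs
    webhook_attempts: int  # webhook deliveries attempted
    webhook_failures: int  # webhook deliveries that errored


def firing_alerts(m: Metrics) -> list[str]:
    """Return the names of alerts whose numeric trigger is exceeded."""
    alerts = []
    # Guard against division by zero when there was no traffic.
    if m.requests and m.server_errors / m.requests > 0.05:
        alerts.append("PaymentHighErrorRate")
    if m.queue_pending > 1000:
        alerts.append("PaymentQueueLag")
    if m.webhook_attempts and m.webhook_failures / m.webhook_attempts > 0.10:
        alerts.append("PaymentWebhookDispatchFailures")
    return alerts
```

All comparisons are strict, matching the ">" in the Trigger column, so a value sitting exactly on a threshold does not fire.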
4.2 Common operations
| Operation | Command |
|---|---|
| Tail API logs | kubectl logs -n <ns> -f deploy/payment-api |
| Tail worker logs | kubectl logs -n <ns> -f deploy/payment-worker |
| Run migrations | kubectl run payment-migrate -n <ns> --image=...:tag --env="RUN_MODE=migrate" --rm --attach --restart=Never |
| List subscribed webhooks | SELECT id, name, url, event_types, status FROM "WebhookConfig" WHERE deleted_at IS NULL; |
| Replay a stuck transaction | Via MQ-Pay admin UI or DB: UPDATE "Transaction" SET status = 'NEW' WHERE id = '...'; (requires senior approval) |
| Inspect encrypted configs | SELECT id, code, principal_type, principal_id, environment FROM "Configuration" WHERE "group" = 'INTEGRATION'; (do not select credential) |
4.3 Recovery scenarios
| Scenario | Recovery |
|---|---|
| WORKER pod crashes mid-job | BullMQ re-delivers; idempotent at MQ-Pay layer |
| API pod crashes during transaction create | Sale receives 5xx; sale retries the create — MQ-Pay deduplicates by sale's transaction reference |
| Webhook subscriber down | WebhookDispatcherService retries up to WebhookConfig.metadata.maxRetries, then gives up. Once the subscriber is back, the manual replay tool fetches the transaction state and re-dispatches it |
| Encryption secret rotated incorrectly | Worker can't decrypt provider creds → all transactions fail. Roll back env change immediately. |
| Snowflake ID collision (two WORKER pods same NODE_ID) | Duplicate primary keys in MQ-Pay tables → INSERT errors. Verify each worker has unique APP_ENV_NODE_ID (91, 92, …) |
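
Checking for a snowflake ID collision comes down to grouping pods by their `APP_ENV_NODE_ID` and reporting any ID that appears more than once. The sketch below assumes the pod-to-ID mapping has already been collected (for example from each pod's environment); `find_node_id_collisions` is a hypothetical helper.

```python
from collections import defaultdict


def find_node_id_collisions(pod_node_ids: dict[str, int]) -> dict[int, list[str]]:
    """Map each duplicated node ID to the sorted list of pods sharing it."""
    pods_by_id: dict[int, list[str]] = defaultdict(list)
    for pod, node_id in pod_node_ids.items():
        pods_by_id[node_id].append(pod)
    # Keep only IDs claimed by more than one pod.
    return {nid: sorted(pods) for nid, pods in pods_by_id.items() if len(pods) > 1}
```

An empty result means every pod has a unique ID; any entry names the pods whose `APP_ENV_NODE_ID` needs to be fixed.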