Operations

1. Deployment

| Property | Value |
| --- | --- |
| Image | registry/nx-seller-commerce:<tag> |
| Container Port | 3000 |
| External Port | 31020 |
| Snowflake ID | 2 |
| Roles | api (HTTP) + worker (CDC consumer + SyncProductWorker) via APP_ENV_WORKERS |
| Replicas (default) | 1 (dev) / 2+ (staging+) |
| Migration mode | RUN_MODE=migrate job before rollout; on-boot for dev |
| Live probe | GET /v1/api/commerce/healthz |
| Ready probe | GET /v1/api/commerce/readyz |
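
As a rough illustration of how the probe and migration rows above might translate into Kubernetes manifests, the sketch below points the two health endpoints at container port 3000 and runs migrations as a one-off Job with RUN_MODE=migrate. The Job name, pod labels, probe timings, and backoffLimit are assumptions made for illustration, not values taken from the real manifests; `<tag>` is left as the placeholder used in the table.

```yaml
# Sketch only: labels, timings, and the Job name are assumed, not the project's real manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: commerce              # matches "deploy/commerce" used elsewhere on this page
spec:
  replicas: 2                 # 1 for dev, 2+ for staging and above
  selector:
    matchLabels:
      app: commerce           # assumed label
  template:
    metadata:
      labels:
        app: commerce
    spec:
      containers:
        - name: commerce
          image: registry/nx-seller-commerce:<tag>
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /v1/api/commerce/healthz
              port: 3000
            periodSeconds: 10   # assumed timing
          readinessProbe:
            httpGet:
              path: /v1/api/commerce/readyz
              port: 3000
            periodSeconds: 5    # assumed timing
---
apiVersion: batch/v1
kind: Job
metadata:
  name: commerce-migrate      # hypothetical name
spec:
  backoffLimit: 0             # assumed: fail fast instead of retrying a migration
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry/nx-seller-commerce:<tag>
          env:
            - name: RUN_MODE
              value: migrate
```

Running the Job to completion before changing the Deployment image corresponds to the "job before rollout" mode. The Job would presumably also need the same database environment as the service, which is omitted here.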

Traefik routing labels

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.commerce.rule=PathPrefix(`/v1/api/commerce`)"
  - "traefik.http.services.commerce.loadbalancer.server.port=3000"
```

Required infrastructure

| Dependency | Why |
| --- | --- |
| PostgreSQL | Primary datastore (public + allocation schemas) |
| Cache Redis | Authorization cache + footer-summary cache |
| BullMQ Redis | SYNC_PRODUCT_QUEUE (separate connection) |
| WebSocket Redis | WebSocketEmitter fanout |
| Debezium / Kafka Connect | Tails commerce WAL → CDC topics (the integration seam) |
| Typesense | Search engine (via @nx/search) |
| Minio | Media storage (via @nx/asset) |
| @nx/identity reachable | JWKS verification on every JWT |

The API role can serve traffic without the worker role; CDC sync to search runs only where APP_ENV_WORKERS enables it.

2. Observability

| Signal | Source | Where to look |
| --- | --- | --- |
| Logs | stdout (IGNIS structured logger, key: %s) | kubectl logs deploy/commerce / Loki |
| Health | GET /v1/api/commerce/healthz, /readyz | Gateway portal |
| OpenAPI live spec | GET /v1/api/commerce/doc/openapi.json | Gateway portal explorer |
| Metrics | Traefik gateway (Prometheus scrape) | Grafana (gateway dashboard) |
| CDC lag | Debezium / Kafka Connect metrics | Connect REST + Kafka consumer-group lag |

Key log fields

| Field | Source | Notes |
| --- | --- | --- |
| requestId | header X-Request-Id | propagated cross-service |
| userId | JWT subject | |
| merchantId | request scope | role-based filtering |
| productId / primaryProductId | aggregate + sync flows | critical for sync trace |
| queueName / job id | BullMQ logs | SYNC_PRODUCT_QUEUE |

Useful log queries

| Question | Query |
| --- | --- |
| Product sync worker failures | level=error AND SyncProductWorker |
| Aggregate transaction rollbacks | level=error AND (ProductCreateService OR ProductUpdateService) |
| Kafka producer connect issues | Disconnected from broker |
| Encryption failures | level=error AND EncryptService |

3. Security

| Concern | Mitigation |
| --- | --- |
| AuthN | JWT (ES256, JWKS from identity); BASIC for service-to-service |
| AuthZ | Role-based filtering (SUPER_ADMIN/ADMIN full, OWNER by createdBy, EMPLOYEE via PolicyDefinition, default empty); cached in authorization Redis |
| Credential encryption | AES-256-GCM (EncryptService, APP_ENV_APPLICATION_SECRET); masked display; decrypt internal-only |
| Secrets | K8s Secret mounted as env (APP_ENV_DB_URL, APP_ENV_APPLICATION_SECRET, SASL, Redis) |
| Soft-delete | deletedAt on all entities; deletion guarded by DeletionPolicyService |
| CDC trust | Debezium reads WAL with a dedicated replication role; topics are internal-only |
| Network policy | Cilium: allow only gateway + Postgres + Redis(×3) + identity + Minio + Typesense + Kafka |
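
To make the network-policy row more concrete, here is a minimal sketch of what an allowlist could look like as a CiliumNetworkPolicy. It is not the deployed policy: the policy name, every label selector, and the PostgreSQL and Kafka ports are assumptions, and only two of the listed egress dependencies are shown (the Redis instances, identity, Minio, and Typesense would follow the same pattern).

```yaml
# Sketch only: the policy name, all label selectors, and the egress ports are assumptions.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: commerce-allowlist        # hypothetical name
spec:
  endpointSelector:
    matchLabels:
      app: commerce               # assumed pod label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: traefik          # assumed gateway label
      toPorts:
        - ports:
            - port: "3000"        # container port from the deployment table
              protocol: TCP
  egress:
    - toEndpoints:
        - matchLabels:
            app: postgres         # assumed label
      toPorts:
        - ports:
            - port: "5432"        # PostgreSQL default port, assumed unchanged
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            app: kafka            # assumed label
      toPorts:
        - ports:
            - port: "9092"        # common Kafka listener port, assumed
              protocol: TCP
```

Selectors written this way match endpoints in the policy's own namespace, so dependencies running elsewhere would need a namespace label in the selector. Once any egress rule applies to a pod, other egress is denied by default, so a real policy would most likely also need to allow DNS lookups.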

4. Runbook

4.1 Alert classes

| Alert | Trigger | Check | Fix | Escalate |
| --- | --- | --- | --- | --- |
| CommerceHighErrorRate | 5xx >5% over 5m | kubectl logs deploy/commerce \| grep level=error | identify failing endpoint; rollback recent deploy | on-call backend |
| CommerceSyncWorkerBacklog | SYNC_PRODUCT_QUEUE depth rising | BullMQ queue depth | check BullMQ Redis health; raise concurrency | on-call backend |
| CommerceCdcLag | Debezium consumer lag growing | Kafka Connect status + consumer-group lag | restart connector; check WAL slot | on-call SRE |
| CommerceAggregateRollbacks | aggregate TX errors rising | grep ProductCreateService/MerchantService | inspect request payload; check FK/slug conflicts | on-call backend |
| CommerceKafkaProducerDown | producer connect errors | broker health, SASL creds | (low impact: producer idle; fix brokers) | on-call SRE |
| CommerceTaxInfoStale | TaxInfo not updating after merchant tax edit | check Merchant CDC → invoice consumer | replay Debezium offset; verify invoice consumer | on-call backend + invoice team |
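
For reference, the CommerceHighErrorRate trigger (5xx above 5% over 5 minutes) could be expressed as a Prometheus alerting rule over the Traefik gateway metrics, roughly as sketched below. This is an illustration rather than the rule actually deployed: the `service=~"commerce.*"` matcher, the `for` duration, and the severity label are assumptions, and the real Traefik service label depends on the provider in use.

```yaml
# Sketch only: the service matcher, hold time, and severity label are assumptions.
groups:
  - name: commerce
    rules:
      - alert: CommerceHighErrorRate
        expr: |
          sum(rate(traefik_service_requests_total{service=~"commerce.*", code=~"5.."}[5m]))
            /
          sum(rate(traefik_service_requests_total{service=~"commerce.*"}[5m]))
            > 0.05
        for: 2m                   # assumed hold time before the alert fires
        labels:
          severity: page          # assumed routing label
        annotations:
          summary: "commerce 5xx ratio above 5% over 5m"
```

The expression divides the 5xx request rate by the total request rate for the same service, so the alert tracks the error ratio rather than the absolute error count.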

4.2 Common operations

| Operation | Command |
| --- | --- |
| Tail logs | kubectl logs -n <ns> -f deploy/commerce |
| Run migrations manually | kubectl exec -it deploy/commerce -- bun run migrate |
| Inspect sync queue | BullMQ admin / Redis LLEN/KEYS on SYNC_PRODUCT_QUEUE |
| Re-trigger CDC snapshot | Kafka Connect connector restart / incremental snapshot |
| Audit provider integration | SELECT code, status FROM "Configuration" WHERE group='...' (credential ciphertext, never plaintext) |

4.3 Recovery scenarios

| Scenario | Recovery |
| --- | --- |
| Worker crash before sync enqueue | aggregate already committed; re-run sync by re-issuing the aggregate update or enqueuing manually |
| CDC consumer down | Debezium retains offset; restart connector and events replay from the WAL slot |
| Search index drift | trigger Debezium incremental snapshot to re-seed Typesense |
| Inventory missing default location | merchant CDC replay; confirm inventory consumer healthy |
| Wrong masked credential | re-create provider integration (cannot decrypt-and-show; rotating APP_ENV_APPLICATION_SECRET invalidates all encrypted credentials) |

5. Cross-Service Runbook

For incidents spanning multiple services, see the central runbook/ directory.

Proprietary and Confidential. Unauthorized copying, distribution, or use of this software is strictly prohibited.