Operations
1. Deployment
| Property | Value |
|---|---|
| Image | registry/nx-seller-commerce:<tag> |
| Container Port | 3000 |
| External Port | 31020 |
| Snowflake ID | 2 |
| Roles | api (HTTP) + worker (CDC consumer + SyncProductWorker) via APP_ENV_WORKERS |
| Replicas (default) | 1 (dev) / 2+ (staging+) |
| Migration mode | RUN_MODE=migrate job before rollout; on-boot for dev |
| Live probe | GET /v1/api/commerce/healthz |
| Ready probe | GET /v1/api/commerce/readyz |
Traefik routing labels
```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.commerce.rule=PathPrefix(`/v1/api/commerce`)"
  - "traefik.http.services.commerce.loadbalancer.server.port=3000"
```
Required infrastructure
| Dependency | Why |
|---|---|
| PostgreSQL | Primary datastore (public + allocation schemas) |
| Cache Redis | Authorization cache + footer-summary cache |
| BullMQ Redis | SYNC_PRODUCT_QUEUE (separate connection) |
| WebSocket Redis | WebSocketEmitter fanout |
| Debezium / Kafka Connect | Tails commerce WAL → CDC topics (the integration seam) |
| Typesense | Search engine (via @nx/search) |
| Minio | Media storage (via @nx/asset) |
| @nx/identity reachable | JWKS verification on every JWT |
The API role can serve traffic without the worker role; CDC sync to search runs only where
APP_ENV_WORKERS enables it.
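The role split could be resolved once at process start. A minimal sketch, assuming APP_ENV_WORKERS holds a comma-separated list of worker names, that an unset RUN_MODE means "serve", and that every non-migrate process also serves the API. None of these are confirmed by this page; `resolveRoles` and the worker names in the usage note are hypothetical.

```typescript
// Hypothetical helper: the real entrypoint and the APP_ENV_WORKERS format may differ.
type Env = Record<string, string | undefined>;

interface RolePlan {
  runMigrationsOnly: boolean; // RUN_MODE=migrate: run migrations, then exit
  serveApi: boolean;          // HTTP API on the container port
  workers: string[];          // worker names enabled for this process
}

function resolveRoles(env: Env): RolePlan {
  // Migration job: no API, no workers.
  if (env.RUN_MODE === "migrate") {
    return { runMigrationsOnly: true, serveApi: false, workers: [] };
  }
  // Assumed format: comma-separated names, whitespace tolerated, empty entries dropped.
  const workers = (env.APP_ENV_WORKERS ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return { runMigrationsOnly: false, serveApi: true, workers };
}
```

With `resolveRoles({ APP_ENV_WORKERS: "cdc, sync-product" })` the plan enables two workers alongside the API; with an empty environment it serves the API only.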
2. Observability
| Signal | Source | Where to look |
|---|---|---|
| Logs | stdout (IGNIS structured logger, key: %s) | kubectl logs deploy/commerce / Loki |
| Health | GET /v1/api/commerce/healthz, /readyz | Gateway portal |
| OpenAPI live spec | GET /v1/api/commerce/doc/openapi.json | Gateway portal explorer |
| Metrics | Traefik gateway (Prometheus scrape) | Grafana — gateway dashboard |
| CDC lag | Debezium / Kafka Connect metrics | Connect REST + Kafka consumer-group lag |
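Consumer-group lag is the gap between each partition's log end offset and the group's committed offset, summed over partitions. A minimal sketch of that arithmetic, plus a simple "is it growing" check over successive samples; the `PartitionOffsets` shape and both function names are illustrative, and the offsets would come from whatever Kafka tooling is already in use.

```typescript
// Illustrative types: offsets are assumed to be fetched elsewhere.
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // latest offset written to the partition
  committedOffset: number; // last offset committed by the consumer group
}

// Total lag across partitions; negative gaps (stale reads) are clamped to zero.
function totalLag(partitions: PartitionOffsets[]): number {
  return partitions.reduce(
    (sum, p) => sum + Math.max(0, p.logEndOffset - p.committedOffset),
    0,
  );
}

// True when there are at least two samples and each one exceeds the previous.
function isLagGrowing(lagSamples: number[]): boolean {
  if (lagSamples.length < 2) return false;
  for (let i = 1; i < lagSamples.length; i++) {
    if (lagSamples[i] <= lagSamples[i - 1]) return false;
  }
  return true;
}
```

A single high reading is usually less informative than the trend, which is why the second helper looks at a series rather than one value.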
Key log fields
| Field | Source | Notes |
|---|---|---|
| requestId | header X-Request-Id | propagated cross-service |
| userId | JWT subject | - |
| merchantId | request scope | role-based filtering |
| productId / primaryProductId | aggregate + sync flows | critical for sync trace |
| queueName / job id | BullMQ logs | SYNC_PRODUCT_QUEUE |
Useful log queries
| Question | Query |
|---|---|
| Product sync worker failures | level=error AND SyncProductWorker |
| Aggregate transaction rollbacks | level=error AND (ProductCreateService OR ProductUpdateService) |
| Kafka producer connect issues | Disconnected from broker |
| Encryption failures | level=error AND EncryptService |
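The queries in the table are written for a log backend, but the same filters are easy to apply to exported entries. A minimal sketch, assuming entries have already been parsed into objects with `level` and `message` fields (the entry shape is an assumption; the optional fields follow the key log fields table):

```typescript
// Assumed shape of an already-parsed log entry.
interface LogEntry {
  level: string;
  message: string;
  requestId?: string;
  productId?: string;
}

// Equivalent of `level=error AND (A OR B ...)`.
function errorsMentioning(entries: LogEntry[], ...needles: string[]): LogEntry[] {
  return entries.filter(
    (entry) =>
      entry.level === "error" &&
      needles.some((needle) => entry.message.includes(needle)),
  );
}

// Group entries by requestId so one failing request can be traced across lines.
function groupByRequestId(entries: LogEntry[]): Map<string, LogEntry[]> {
  const groups = new Map<string, LogEntry[]>();
  for (const entry of entries) {
    const key = entry.requestId ?? "(no requestId)";
    const bucket = groups.get(key) ?? [];
    bucket.push(entry);
    groups.set(key, bucket);
  }
  return groups;
}
```

For example, `errorsMentioning(entries, "ProductCreateService", "ProductUpdateService")` mirrors the aggregate-rollback query.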
3. Security
| Concern | Mitigation |
|---|---|
| AuthN | JWT (ES256, JWKS from identity); BASIC for service-to-service |
| AuthZ | Role-based filtering (SUPER_ADMIN/ADMIN full, OWNER by createdBy, EMPLOYEE via PolicyDefinition, default empty); cached in authorization Redis |
| Credential encryption | AES-256-GCM (EncryptService, APP_ENV_APPLICATION_SECRET); masked display; decrypt internal-only |
| Secrets | K8s Secret mounted as env (APP_ENV_DB_URL, APP_ENV_APPLICATION_SECRET, SASL, Redis) |
| Soft-delete | deletedAt on all entities; deletion guarded by DeletionPolicyService |
| CDC trust | Debezium reads WAL with a dedicated replication role; topics are internal-only |
| Network policy | Cilium — allow only gateway + Postgres + Redis(×3) + identity + Minio + Typesense + Kafka |
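The AuthZ row describes a deny-by-default filter keyed on role. A minimal sketch of that decision, not the service's actual implementation: `Actor`, `ScopedRow`, and `visibleTo` are hypothetical names, and `allowedMerchantIds` is a stand-in for whatever a PolicyDefinition grants an employee.

```typescript
// Hypothetical shapes for illustration only.
interface Actor {
  userId: string;
  role: string;
  allowedMerchantIds?: string[]; // stand-in for PolicyDefinition grants (assumption)
}

interface ScopedRow {
  id: string;
  merchantId: string;
  createdBy: string;
}

function visibleTo<T extends ScopedRow>(actor: Actor, rows: T[]): T[] {
  switch (actor.role) {
    case "SUPER_ADMIN":
    case "ADMIN":
      return rows; // full access
    case "OWNER":
      return rows.filter((row) => row.createdBy === actor.userId);
    case "EMPLOYEE": {
      const allowed = new Set(actor.allowedMerchantIds ?? []);
      return rows.filter((row) => allowed.has(row.merchantId));
    }
    default:
      return []; // unknown roles see nothing
  }
}
```

Returning an empty list for unrecognized roles keeps a misconfigured or new role from widening access by accident.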
4. Runbook
4.1 Alert classes
| Alert | Trigger | Check | Fix | Escalate |
|---|---|---|---|---|
| CommerceHighErrorRate | 5xx >5% over 5m | kubectl logs deploy/commerce \| grep level=error | identify failing endpoint; roll back recent deploy | on-call backend |
| CommerceSyncWorkerBacklog | SYNC_PRODUCT_QUEUE depth rising | BullMQ queue depth | check BullMQ Redis health; raise concurrency | on-call backend |
| CommerceCdcLag | Debezium consumer lag growing | Kafka Connect status + consumer-group lag | restart connector; check WAL slot | on-call SRE |
| CommerceAggregateRollbacks | aggregate TX errors rising | grep ProductCreateService/MerchantService | inspect request payload; check FK/slug conflicts | on-call backend |
| CommerceKafkaProducerDown | producer connect errors | broker health, SASL creds | low impact (producer idle); fix brokers | on-call SRE |
| CommerceTaxInfoStale | TaxInfo not updating after merchant tax edit | check Merchant CDC → invoice consumer | replay Debezium offset; verify invoice consumer | on-call backend + invoice team |
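The CommerceHighErrorRate trigger (5xx >5% over 5m) is a ratio over a sliding window. In practice the rule would live in the alerting system; the sketch below only illustrates the threshold semantics, with hypothetical names and an in-memory sample list standing in for gateway metrics.

```typescript
// Hypothetical sample shape; real data would come from gateway metrics.
interface RequestSample {
  timestampMs: number;
  status: number;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Fraction of 5xx responses among samples inside (nowMs - windowMs, nowMs].
function errorRate(
  samples: RequestSample[],
  nowMs: number,
  windowMs: number = FIVE_MINUTES_MS,
): number {
  const recent = samples.filter(
    (s) => s.timestampMs > nowMs - windowMs && s.timestampMs <= nowMs,
  );
  if (recent.length === 0) return 0; // no traffic: nothing to alert on
  const errors = recent.filter((s) => s.status >= 500 && s.status <= 599).length;
  return errors / recent.length;
}

// Strictly greater than 5%, matching the ">5%" wording of the trigger.
function shouldFireHighErrorRate(samples: RequestSample[], nowMs: number): boolean {
  return errorRate(samples, nowMs) > 0.05;
}
```

Samples older than the window are ignored, so a burst of errors stops contributing once it ages out.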
4.2 Common operations
| Operation | Command |
|---|---|
| Tail logs | kubectl logs -n <ns> -f deploy/commerce |
| Run migrations manually | kubectl exec -it deploy/commerce -- bun run migrate |
| Inspect sync queue | BullMQ admin / Redis LLEN/KEYS on SYNC_PRODUCT_QUEUE |
| Re-trigger CDC snapshot | Kafka Connect connector restart / incremental snapshot |
| Audit provider integration | SELECT code, status FROM "Configuration" WHERE "group"='...' (credentials are stored as ciphertext, never plaintext) |
4.3 Recovery scenarios
| Scenario | Recovery |
|---|---|
| Worker crash before sync enqueue | aggregate already committed; re-run sync by re-issuing aggregate update or manual enqueue |
| CDC consumer down | Debezium retains offset; restart connector — events replay from WAL slot |
| Search index drift | trigger Debezium incremental snapshot to re-seed Typesense |
| Inventory missing default location | merchant CDC replay; confirm inventory consumer healthy |
| Wrong masked credential | re-create the provider integration (stored credentials cannot be decrypted and shown; rotating APP_ENV_APPLICATION_SECRET invalidates all stored credentials) |
5. Cross-Service Runbook
For incidents spanning multiple services, see the central runbook/.
6. Related Pages
- Configuration
- API Events — CDC topic list for replay
- Integration — sister-service network
- Decisions