Operations
1. Deployment
| Property | Value |
|---|---|
| Image | registry/outreach:<tag> |
| Replicas (default) | 1 (low traffic; stateless API) |
| Resources (req/lim) | small — CPU/memory per cluster baseline |
| HPA target | CPU (optional; traffic is bursty on campaigns) |
| Probes | GET /healthz (live), GET /readyz (ready) |
| Snowflake worker ID | 10 (set via APP_ENV_NODE_ID) |
| Migration mode | bun run migrate:dev / boot job; seeds are idempotent (alwaysRun) |
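Because seeds are marked alwaysRun, every migration run re-applies them. That is only safe if applying the same seed twice leaves the data unchanged. The sketch below shows that property under stated assumptions. It assumes the Casbin permissions from the Security section are stored as (role, resource, action) tuples. The `seedPolicies` helper and the resource/action values are hypothetical and are not taken from the service's actual seed file.

```typescript
// Hypothetical policy row (role, resource, action), in the spirit of Casbin "p" rules.
type Policy = readonly [role: string, resource: string, action: string];

// Roles come from the Security table; resources and actions are illustrative only.
const SEED_POLICIES: Policy[] = [
  ["OWNER", "inquiries", "write"],
  ["EMPLOYEE", "inquiries", "read"],
  ["CASHIER", "subscribers", "read"],
];

// Stable key so the same tuple always maps to the same entry.
const keyOf = (p: Policy): string => p.join(":");

// Insert-if-absent: running this any number of times yields the same store contents.
function seedPolicies(store: Map<string, Policy>, seeds: Policy[]): number {
  let inserted = 0;
  for (const p of seeds) {
    const k = keyOf(p);
    if (!store.has(k)) {
      store.set(k, p);
      inserted++;
    }
  }
  return inserted;
}

const store = new Map<string, Policy>();
const firstRun = seedPolicies(store, SEED_POLICIES); // inserts every row
const secondRun = seedPolicies(store, SEED_POLICIES); // inserts nothing
console.log(firstRun, secondRun, store.size); // 3 0 3
```

Against Postgres, the same effect usually comes from a unique constraint plus `INSERT ... ON CONFLICT DO NOTHING`. That combination is what makes an always-run seed harmless.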
Traefik labels
```yaml
labels:
- "traefik.enable=true"
- "traefik.http.routers.outreach.rule=PathPrefix(`/v1/api/outreach`)"
- "traefik.http.services.outreach.loadbalancer.server.port=3000"
```
Multiple replicas are safe for WebSocket delivery: the emitter is Redis-backed, so any replica can broadcast to all subscribers. The Snowflake worker ID (10) must, however, be unique per running process. When scaling beyond one replica, give each process its own APP_ENV_NODE_ID.
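The uniqueness requirement follows from how Snowflake IDs are built: the worker ID is one of the bit fields. Two processes sharing it can therefore emit the same ID in the same millisecond. This would also apply to the Snowflake-based unsubscribeToken. The sketch below assumes the classic Twitter layout of 41-bit timestamp, 10-bit worker ID, and 12-bit sequence, along with a made-up custom epoch. The service's real layout and epoch are not documented here.

```typescript
// Assumed layout (classic Twitter Snowflake): 41-bit timestamp | 10-bit worker ID | 12-bit sequence.
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_WORKER_ID = (1n << WORKER_BITS) - 1n; // 1023
// Hypothetical custom epoch; not the service's real value.
const EPOCH_MS = 1_700_000_000_000n;

function makeSnowflake(timestampMs: bigint, workerId: bigint, sequence: bigint): bigint {
  if (timestampMs < EPOCH_MS) throw new RangeError("timestamp before epoch");
  if (workerId < 0n || workerId > MAX_WORKER_ID) throw new RangeError("workerId out of range");
  if (sequence < 0n || sequence >= 1n << SEQUENCE_BITS) throw new RangeError("sequence out of range");
  return ((timestampMs - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)) | (workerId << SEQUENCE_BITS) | sequence;
}

// Recover the worker ID field from an ID.
function workerIdOf(id: bigint): bigint {
  return (id >> SEQUENCE_BITS) & MAX_WORKER_ID;
}

// Two replicas both running with APP_ENV_NODE_ID=10, first ID in the same millisecond:
const now = 1_700_000_123_456n;
const replicaA = makeSnowflake(now, 10n, 0n);
const replicaB = makeSnowflake(now, 10n, 0n);
console.log(replicaA === replicaB); // true: a duplicate ID

// A distinct node ID keeps IDs apart even for the same millisecond and sequence.
const replicaC = makeSnowflake(now, 11n, 0n);
console.log(replicaA === replicaC, Number(workerIdOf(replicaA)), Number(workerIdOf(replicaC))); // false 10 11
```

The per-process sequence counter only prevents collisions within one process. Nothing in the ID itself can separate two processes that share a worker ID.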
2. Observability
| Signal | Source | Where to look |
|---|---|---|
| Logs | stdout (structured key-value) | kubectl logs <pod> / Loki |
| Metrics | Prometheus /metrics (if enabled) | Grafana |
| Traces | no-op (no tracer wired) | — |
| Health | GET /healthz, GET /readyz | Gateway portal |
| API spec | GET /v1/api/outreach/doc/openapi.json, /doc (Scalar) | Browser |
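The two health endpoints answer different questions. /healthz (liveness) should fail only when the process itself is wedged. /readyz (readiness) gates traffic on the dependencies the service needs. The sketch below shows that split. It assumes readiness depends on Postgres and Redis, the same dependencies the network policy allows. `DependencyStatus` and `probe` are hypothetical names, not the service's code.

```typescript
// Hypothetical snapshot of dependency health, e.g. refreshed by periodic pings.
type DependencyStatus = { postgres: boolean; redis: boolean };

type ProbeKind = "live" | "ready";

// Returns the HTTP status and body a probe handler would send.
function probe(kind: ProbeKind, deps: DependencyStatus): { status: number; body: string } {
  if (kind === "live") {
    // Liveness: if this code runs, the process is responsive; deliberately ignores dependencies.
    return { status: 200, body: "ok" };
  }
  // Readiness: report every dependency that is currently down.
  const failing = Object.entries(deps)
    .filter(([, up]) => !up)
    .map(([name]) => name);
  return failing.length === 0
    ? { status: 200, body: "ready" }
    : { status: 503, body: `not ready: ${failing.join(", ")}` };
}

const redisDown: DependencyStatus = { postgres: true, redis: false };
console.log(probe("live", redisDown).status); // 200
console.log(probe("ready", redisDown).status); // 503
console.log(probe("ready", redisDown).body); // not ready: redis
```

A common choice is to keep Redis and Postgres out of the liveness check. With that choice, a dependency outage takes pods out of rotation instead of triggering restarts across the whole deployment.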
Key log fields
| Field | Source | Notes |
|---|---|---|
| requestId | header X-Request-Id | Propagated cross-service |
| inquiryId | WS notify | Logged as `Inquiry SENT \| inquiryId: %s \| rooms: %j` |
| rooms | WS notify | Target observation rooms |
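The sketch below shows how these fields could appear together on one line. It reuses the caller's X-Request-Id and expands the WS notify format string, where `%s` prints a string and `%j` prints JSON. The UUID fallback when the header is missing, the `requestId=` prefix, and the room names are assumptions for illustration. The real logger's field layout may differ.

```typescript
import { format } from "node:util";
import { randomUUID } from "node:crypto";

// Reuse the incoming X-Request-Id so one request can be followed across services.
// Minting a UUID when the header is absent is an assumption of this sketch.
function requestIdFrom(headers: Headers): string {
  return headers.get("x-request-id") ?? randomUUID();
}

// WS notify message format, as listed in the table above.
const INQUIRY_SENT = "Inquiry SENT | inquiryId: %s | rooms: %j";

// Hypothetical key-value prefix followed by the formatted message.
function inquirySentLine(requestId: string, inquiryId: string, rooms: string[]): string {
  return `requestId=${requestId} ` + format(INQUIRY_SENT, inquiryId, rooms);
}

const headers = new Headers({ "X-Request-Id": "req-42" });
const line = inquirySentLine(requestIdFrom(headers), "7312", ["inquiry:7312", "owner:1"]);
console.log(line);
// requestId=req-42 Inquiry SENT | inquiryId: 7312 | rooms: ["inquiry:7312","owner:1"]
```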
3. Security
| Concern | Mitigation |
|---|---|
| AuthN | JWT (ES256, JWKS from identity) on CRUD + statistics |
| Public endpoints | /inquiries/submit, /subscribers/subscribe, /subscribers/unsubscribe are unauthenticated by design |
| Abuse / spam | No built-in rate limit — rely on Traefik middleware + WAF in front of public submit/subscribe |
| Unsubscribe token | Snowflake unsubscribeToken, hidden from API responses; only a valid token can deactivate a subscription (see ADR-0001) |
| AuthZ | Casbin permissions seeded for OWNER/EMPLOYEE/CASHIER |
| Secrets | K8s Secret mounted as env (DB, Redis password) |
| TLS | Terminated at gateway |
| Network policy | Cilium — allow gateway + identity JWKS + Redis + Postgres |
| Soft-delete | deletedAt; no hard-delete by default |
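The unsubscribe row combines two rules:

- The token never leaves the service in API responses.
- Deactivation succeeds only when the presented token matches an active, non-deleted subscriber.

The in-memory sketch below illustrates both rules. The `Subscriber` shape, `toPublic`, and `unsubscribe` are hypothetical names, and ADR-0001 remains the authority on the real design.

```typescript
// Hypothetical persistence shape; the token is a Snowflake stored as a string.
interface Subscriber {
  id: string;
  email: string;
  active: boolean;
  unsubscribeToken: string;
  deletedAt: Date | null; // soft-delete marker
}

type PublicSubscriber = Omit<Subscriber, "unsubscribeToken">;

// Strip the token before anything is serialised into an API response.
function toPublic(s: Subscriber): PublicSubscriber {
  const { unsubscribeToken, ...rest } = s;
  void unsubscribeToken; // deliberately discarded
  return rest;
}

// Unauthenticated endpoint logic: only an exact token match deactivates a subscription.
function unsubscribe(subscribers: Subscriber[], token: string): boolean {
  const match = subscribers.find(
    (s) => s.active && s.deletedAt === null && s.unsubscribeToken === token,
  );
  if (!match) return false; // unknown or already-used token: no state change
  match.active = false;
  return true;
}

const subs: Subscriber[] = [
  { id: "1", email: "a@example.com", active: true, unsubscribeToken: "7311234567890123456", deletedAt: null },
];
const leaked = JSON.stringify(toPublic(subs[0])).includes("unsubscribeToken");
const wrongToken = unsubscribe(subs, "0000000000000000000");
const firstUse = unsubscribe(subs, "7311234567890123456");
const replay = unsubscribe(subs, "7311234567890123456");
console.log(leaked, wrongToken, firstUse, replay); // false false true false
```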
4. Runbook
4.1 Alert classes
| Alert | Trigger | Check | Fix | Escalate |
|---|---|---|---|---|
| outreachHighErrorRate | 5xx > 5% of requests over 5m | logs level=error | inspect DB / Redis connectivity | on-call backend |
| outreachSubmitFlood | submit RPS spike | Traefik metrics | enable/raise rate limit at gateway | on-call SRE |
| outreachWSNotReady | repeated "Socket event service not ready" warnings | Redis health | restart pod / check Redis | on-call backend |
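The outreachHighErrorRate trigger is a ratio over a sliding window, not a raw 5xx count. The 5xx responses divided by all responses in the last five minutes must exceed 5%. The sketch below works through that arithmetic. It assumes cumulative counters sampled at scrape time, sorted by timestamp and with no counter resets. The `Sample` shape and helper names are hypothetical and do not reflect the service's actual metric names.

```typescript
// One sample per scrape: cumulative counters, assumed sorted by tMs and never reset.
interface Sample {
  tMs: number; // scrape timestamp in milliseconds
  total: number; // cumulative requests served
  errors5xx: number; // cumulative 5xx responses
}

const WINDOW_MS = 5 * 60 * 1000; // 5m, from the alert definition
const THRESHOLD = 0.05; // 5%, from the alert definition

// Ratio of 5xx to all requests over the trailing window, computed from counter deltas.
function errorRatio(samples: Sample[], nowMs: number): number {
  const inWindow = samples.filter((s) => s.tMs >= nowMs - WINDOW_MS && s.tMs <= nowMs);
  if (inWindow.length < 2) return 0; // not enough points to take a delta
  const first = inWindow[0];
  const last = inWindow[inWindow.length - 1];
  const requests = last.total - first.total;
  return requests > 0 ? (last.errors5xx - first.errors5xx) / requests : 0;
}

const shouldFire = (samples: Sample[], nowMs: number): boolean => errorRatio(samples, nowMs) > THRESHOLD;

// 1,000 requests in the window, 70 of them 5xx: 70 / 1000 = 7% > 5%, so the alert fires.
const burst: Sample[] = [
  { tMs: 0, total: 1_000, errors5xx: 10 },
  { tMs: 300_000, total: 2_000, errors5xx: 80 },
];
// Same traffic, 30 errors: 3%, below threshold.
const calm: Sample[] = [
  { tMs: 0, total: 1_000, errors5xx: 10 },
  { tMs: 300_000, total: 2_000, errors5xx: 40 },
];
console.log(errorRatio(burst, 300_000), shouldFire(burst, 300_000)); // 0.07 true
console.log(errorRatio(calm, 300_000), shouldFire(calm, 300_000)); // 0.03 false
```

Using a ratio means that a quiet period with a handful of errors pages as readily as a busy one, while a campaign spike with a normal error share does not page.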
4.2 Common operations
| Operation | Command |
|---|---|
| Tail logs | kubectl logs -n <ns> -f deploy/outreach |
| Re-run seeds | run migration (seeds alwaysRun, idempotent) — operator-run only |
| Verify spec live | open /v1/api/outreach/doc |
| Spam triage | mark offending inquiries CANCELLED via admin CRUD |
No Kafka replay and no BullMQ queue exist for this service — there is nothing to drain or replay.
5. Related Pages
- Configuration
- /runbook/: central runbook for cross-service incidents
- Decisions