Operations
1. Deployment
| Property | Value |
|---|---|
| Base image | debian:12-slim (standalone Bun binary) |
| Container | e.g. dev-nx-signal |
| Internal port | 3000 (external 31090) |
| Network | nx-network (external, shared) |
| Replicas | horizontally scalable (state is in Redis + DB, not the pod) |
| Snowflake ID | 9 |
| Migration mode | run-on-boot seeds (alwaysRun) — permissions + role-permissions only |
| Health | GET /v1/api/signal/health (public, from VerifierApplication) |
Volumes
| Host Path | Container Path | Purpose |
|---|---|---|
| packages/signal/dist/bin | /app/bin | Compiled binary (bun run rebuild) |
| packages/signal/resources | /app/resources | Banner, test client HTML |
| infrastructure/deployments/develop/signal/start.sh | /app/start.sh | Startup script |
Traefik — dual routing
Signal exposes two routers sharing one backend (port 3000):
| Router | Rule | Middleware | Why |
|---|---|---|---|
| signal-rest | PathPrefix(/v1/api/signal) | rate-limit, circuit-breaker, security-headers | Standard protection for short-lived HTTP requests |
| signal-ws | PathPrefix(/stream) | none | Persistent WS connections break under request-rate limiting / circuit-breaking |
```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.signal-rest.rule=PathPrefix(`/v1/api/signal`)"
  - "traefik.http.routers.signal-rest.middlewares=rate-limit@file,circuit-breaker@file,security-headers@file"
  - "traefik.http.routers.signal-ws.rule=PathPrefix(`/stream`)"
  - "traefik.http.services.signal.loadbalancer.server.port=3000"
```

The separate WS router (no middleware) is the key deployment invariant. See ADR-0003 and ADR-0001.
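The dual-routing invariant can be checked mechanically. The TypeScript sketch below models the two routers from the table and approximates Traefik's PathPrefix as a plain string-prefix test. `Router`, `routers`, `matchRouter`, and `wsRouterHasNoMiddleware` are hypothetical names for illustration only; they are not part of Signal or of Traefik.

```typescript
// Hypothetical model of the two routers described above; not a Traefik API.
interface Router {
  name: string;
  pathPrefix: string;
  middlewares: string[];
}

const routers: Router[] = [
  {
    name: "signal-rest",
    pathPrefix: "/v1/api/signal",
    middlewares: ["rate-limit", "circuit-breaker", "security-headers"],
  },
  { name: "signal-ws", pathPrefix: "/stream", middlewares: [] },
];

// Returns the first router whose prefix matches, or undefined when none does.
// PathPrefix is approximated with startsWith (an assumption of this sketch).
function matchRouter(path: string): Router | undefined {
  return routers.find((r) => path.startsWith(r.pathPrefix));
}

// The deployment invariant: WebSocket traffic must carry no middleware.
function wsRouterHasNoMiddleware(): boolean {
  const router = matchRouter("/stream");
  return (
    router !== undefined &&
    router.name === "signal-ws" &&
    router.middlewares.length === 0
  );
}
```

A check like this could run in CI against the rendered labels, so that a middleware added to signal-ws by mistake fails the build rather than breaking live connections.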
Redis bus
Signal's WebSocketServerHelper and every publisher's WebSocketEmitter must point at the same Redis instance/cluster (APP_ENV_WEBSOCKET_REDIS_*):
| Service | Component | Role |
|---|---|---|
| Signal | WebSocketServerHelper | subscribe + publish (full server) |
| Sale / Payment | WebSocketEmitter | publish only |
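Because a mismatch here fails silently (publishers write to a bus nobody reads), it may help to compare the relevant variables across services before a rollout. The sketch below is one possible check, assuming each service's environment is available as a plain key/value record; `Env`, `REDIS_PREFIX`, and `redisConfigMismatches` are hypothetical names, and no specific variable suffix is assumed.

```typescript
type Env = Record<string, string | undefined>;

// Prefix taken from the text above; the individual suffixes are not assumed.
const REDIS_PREFIX = "APP_ENV_WEBSOCKET_REDIS_";

// Returns the names of Redis-bus variables whose values differ between two
// services' environments, including variables set on only one side.
function redisConfigMismatches(a: Env, b: Env): string[] {
  const keys = new Set(
    Object.keys(a)
      .concat(Object.keys(b))
      .filter((k) => k.startsWith(REDIS_PREFIX)),
  );
  return Array.from(keys)
    .filter((k) => a[k] !== b[k])
    .sort();
}
```

An empty result means the two environments agree on every variable under the prefix; any returned name is a candidate cause for publishers and Signal talking to different Redis targets.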
Multi-instance fan-out
| Scenario | Behavior |
|---|---|
| Local delivery | Target client on this instance → delivered directly (no Redis round-trip) |
| Remote delivery | Target on another instance → publish to Redis, owning instance delivers |
| Broadcast / room | Published to Redis; every instance delivers to its local matching clients |
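The three rows above amount to a small routing decision on the sending side plus a local filter on the receiving side. The following sketch illustrates that logic under simplifying assumptions; `Target`, `LocalClient`, `chooseRoute`, and `localRecipients` are hypothetical names and do not reflect the actual WebSocketServerHelper internals.

```typescript
type Target =
  | { kind: "client"; clientId: string }
  | { kind: "room"; room: string }
  | { kind: "broadcast" };

interface LocalClient {
  id: string;
  rooms: ReadonlySet<string>;
}

// Sending side: deliver directly only when the single target client is
// connected to this instance; everything else goes through Redis.
function chooseRoute(target: Target, local: LocalClient[]): "local" | "redis" {
  if (target.kind === "client") {
    const { clientId } = target;
    if (local.some((c) => c.id === clientId)) return "local";
  }
  return "redis";
}

// Receiving side: each instance delivers only to its own matching clients.
function localRecipients(target: Target, local: LocalClient[]): string[] {
  switch (target.kind) {
    case "client": {
      const { clientId } = target;
      return local.filter((c) => c.id === clientId).map((c) => c.id);
    }
    case "room": {
      const { room } = target;
      return local.filter((c) => c.rooms.has(room)).map((c) => c.id);
    }
    case "broadcast":
      return local.map((c) => c.id);
  }
}
```

Under this model, an instance with no matching clients simply returns an empty list for a Redis message, which is why adding replicas does not change delivery semantics.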
2. Observability
| Telemetry | Source | Where to look |
|---|---|---|
| Logs | stdout (structured key-value) | kubectl logs / Loki |
| Health | GET /v1/api/signal/health | Gateway portal / Traefik check (30s) |
| OpenAPI | GET /v1/api/signal/doc, /explorer | Scalar UI |
| Traces | none (no-op) | — |
Key log lines
| Area | What it tells you |
|---|---|
| Kafka consumer | topic / partition / offset per message; broker connect/disconnect |
| Notification push | topic / room / recipientId per WS emit; warns when emitter not ready |
| Handshake | warns on missing/invalid public key or unsupported auth type |
3. Security
| Concern | Mitigation |
|---|---|
| AuthN (REST) | JWT + Basic, verified via remote JWKS |
| AuthN (WS) | type=Bearer JWT verified by JWKSVerifierTokenService at handshake |
| AuthZ | WebSocketClient.* permissions on client-mgmt routes; /notifications scoped by JWT subject |
| E2E encryption | Mandatory ECDH P-256 + AES-256-GCM on /stream (requireEncryption: true); per-client ephemeral keys (forward secrecy), deleted on disconnect |
| Plaintext exceptions | only connected and error events |
| TLS | wss:// terminated at Traefik in production |
| Public endpoints | /health, /doc, /openapi.json, /explorer, and GET /socket/websocket/clients/status |
| Room ACL | currently passthrough (accepts all rooms) — merchant-scoped ACL is a known TODO |
| Network | internal port 3000 not host-exposed; access only via Traefik |
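To make the E2E encryption row concrete, here is a minimal sketch of an ephemeral ECDH P-256 exchange followed by AES-256-GCM, using Node's `node:crypto` module. The key-derivation step is an assumption: the text above does not say how the shared secret becomes an AES key, so HKDF-SHA-256 with a placeholder `INFO` string is used here. `INFO`, `Sealed`, `deriveKey`, `encryptMessage`, `decryptMessage`, and the sample event payload are all hypothetical and do not describe Signal's wire format.

```typescript
import {
  createCipheriv,
  createDecipheriv,
  createECDH,
  hkdfSync,
  randomBytes,
} from "node:crypto";

// Placeholder HKDF context string (assumption); real deployments configure their own.
const INFO = "example-ecdh-info";

interface Sealed {
  iv: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

// Derive a 32-byte AES-256 key from the raw ECDH secret (HKDF is an assumption).
function deriveKey(sharedSecret: Buffer, info: string): Buffer {
  return Buffer.from(hkdfSync("sha256", sharedSecret, Buffer.alloc(0), info, 32));
}

function encryptMessage(key: Buffer, plaintext: string): Sealed {
  const iv = randomBytes(12); // 96-bit nonce; must never repeat under one key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptMessage(key: Buffer, sealed: Sealed): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag); // final() throws if the tag does not verify
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

// Handshake: each side creates an ephemeral key pair and swaps public keys.
// Node names the P-256 curve "prime256v1".
const client = createECDH("prime256v1");
const server = createECDH("prime256v1");
const clientPublicKey = client.generateKeys();
const serverPublicKey = server.generateKeys();

// Both sides arrive at the same key without it ever crossing the wire.
const clientKey = deriveKey(client.computeSecret(serverPublicKey), INFO);
const serverKey = deriveKey(server.computeSecret(clientPublicKey), INFO);

const sealed = encryptMessage(serverKey, JSON.stringify({ event: "notification" }));
const opened = decryptMessage(clientKey, sealed);
```

Because the key pairs exist only for the lifetime of one connection, discarding them on disconnect means a later key compromise cannot decrypt earlier sessions, which is the forward-secrecy property the table refers to. Both sides must use the same derivation parameters, or every decryption fails at tag verification.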
4. Runbook
4.1 Alert classes
| Alert | Trigger | Check | Fix | Escalate |
|---|---|---|---|---|
| signalConsumerStalled | no offset progress on signal.activity-notification | consumer logs / lag | restart pod; verify brokers + SASL | on-call backend |
| signalWsEmitterNotReady | repeated "WebSocket emitter not ready" warnings | Redis connectivity | check APP_ENV_WEBSOCKET_REDIS_*, Redis health | on-call SRE |
| signalHandshakeRejects | spike in handshake rejections | identity JWKS reachability, clock skew | verify identity URL + APP_ENV_WEBSOCKET_ECDH_INFO parity | on-call backend |
| signalDuplicateNotifications | duplicate rows after redelivery | consumer offset/commit logs | de-dup query; mitigate (no built-in dedup; see ADR-0002) | on-call backend |
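Since Signal has no built-in dedup (ADR-0002), one possible mitigation for signalDuplicateNotifications is to key each consumed message by its topic, partition, and offset, which the consumer already logs. The sketch below shows the idea with an in-memory set; `ConsumedMessage` and `RedeliveryGuard` are hypothetical names, and this is not something the service currently does.

```typescript
interface ConsumedMessage {
  topic: string;
  partition: number;
  offset: string;
}

// Hypothetical in-memory guard against broker redelivery of the same record.
class RedeliveryGuard {
  private readonly seen = new Set<string>();

  // Returns true the first time a (topic, partition, offset) triple is seen,
  // false on every later occurrence.
  firstSeen(msg: ConsumedMessage): boolean {
    const key = `${msg.topic}:${msg.partition}:${msg.offset}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

An in-memory set does not survive a pod restart and is not shared between replicas, so a durable variant would more likely enforce the same key as a unique constraint in the database. Neither approach catches manual replays: re-publishing to the topic produces new offsets, so those duplicates still need the de-dup query.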
4.2 Common operations
| Operation | Command |
|---|---|
| Tail logs | kubectl logs -n <ns> -f deploy/signal |
| Check WS readiness | curl <host>/v1/api/signal/socket/websocket/clients/status |
| Inspect connected clients | GET /socket/websocket/clients (JWT + permission) |
| Replay notifications | re-publish to signal.activity-notification (note: creates duplicate rows) |
| Reseed permissions | run signal migrations (idempotent) — operator-run only |
5. Related Pages
- Configuration
- Encryption
- /runbook/: central runbook for cross-service incidents
- Decisions