Operations
1. Deployment
| Property | Value |
|---|---|
| Image | registry/nx-seller-identity:<tag> |
| Container Port | 3000 |
| External Port | 31010 |
| Snowflake ID | 1 |
| Replicas (default) | 1 (dev) / 2+ (staging+) |
| Resources (req/lim) | 200m / 1 CPU, 512Mi / 1Gi memory |
| Migration mode | RUN_MODE=migrate job before rollout |
| Live probe | GET /v1/api/identity/healthz |
| Ready probe | GET /v1/api/identity/readyz |
| JWKS endpoint | GET /jw-certs (public, MUST be reachable from all sister services) |
Traefik routing labels
```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.identity.rule=PathPrefix(`/v1/api/identity`) || Path(`/jw-certs`)"
  - "traefik.http.services.identity.loadbalancer.server.port=3000"
```

The `/jw-certs` path is intentionally outside `/v1/api/identity/` so sisters can hit it without the API base path.
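
The routing rule above can be read as a predicate on the request path. The following is a minimal Python sketch of that reading, assuming `PathPrefix` behaves as a plain string-prefix match and `Path` as an exact match. It is a simplified model for reasoning about which requests reach this service, not Traefik's matcher; `routes_to_identity` is a hypothetical helper name.

```python
API_PREFIX = "/v1/api/identity"
JWKS_PATH = "/jw-certs"


def routes_to_identity(path: str) -> bool:
    """Simplified model of: PathPrefix(`/v1/api/identity`) || Path(`/jw-certs`)."""
    # Assumption: prefix match is a plain string prefix, Path is an exact match.
    return path.startswith(API_PREFIX) or path == JWKS_PATH


if __name__ == "__main__":
    for candidate in ("/v1/api/identity/healthz", "/jw-certs", "/jw-certs/extra"):
        print(candidate, routes_to_identity(candidate))
```

Under these assumptions, `/jw-certs` matches only as an exact path, so nothing else is exposed outside the API base path.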
Required infrastructure
| Dependency | Why |
|---|---|
| PostgreSQL | Primary datastore (schema identity + shared public.Configuration) |
| Redis | OTP state + auth cache + BullMQ for mail queue |
| SMTP | Email delivery — service starts without it but mail flows fail |
| VN Pay SMS | SMS delivery; as with SMTP, the service starts without it but SMS flows fail |
| JWKS keypair (env / secret) | Service refuses to start without it |
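
The table distinguishes a hard dependency (the JWKS keypair, which blocks startup) from soft ones (SMTP and SMS, which only break their own flows). A hedged sketch of a startup check along those lines follows. The environment variable names `JWKS_PRIVATE_KEY_PATH`, `SMTP_HOST`, and `SMS_API_KEY` are hypothetical placeholders, not the service's actual configuration keys, and the table does not say how a missing PostgreSQL or Redis is handled at startup, so they are left out.

```python
from typing import List, Mapping, Tuple


def check_startup_config(env: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Return (fatal, warnings) for a given environment mapping.

    All variable names below are hypothetical, chosen for illustration only.
    """
    fatal: List[str] = []
    warnings: List[str] = []
    if not env.get("JWKS_PRIVATE_KEY_PATH"):
        fatal.append("JWKS keypair missing: refuse to start")
    if not env.get("SMTP_HOST"):
        warnings.append("SMTP not configured: mail flows will fail")
    if not env.get("SMS_API_KEY"):
        warnings.append("SMS provider not configured: SMS flows will fail")
    return fatal, warnings


if __name__ == "__main__":
    fatal, warnings = check_startup_config({"JWKS_PRIVATE_KEY_PATH": "/secrets/jwks.pem"})
    print("fatal:", fatal)
    print("warnings:", warnings)
```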
2. Observability
| Signal | Source | Where to look |
|---|---|---|
| Logs | stdout (IGNIS structured logger) | kubectl logs deploy/identity / Loki |
| Health | /healthz, /readyz | Gateway portal |
| OpenAPI live spec | GET /v1/api/identity/doc/openapi.json | Gateway portal |
| Metrics | Traefik gateway | Grafana |
| JWKS check | GET /jw-certs | Manual / synthetic monitor (every 1m) |
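
For the JWKS synthetic monitor, a status code alone can miss an empty or malformed key set. The sketch below shows one way such a check could validate the response; it takes the status and body as arguments so it stays independent of any HTTP client. Requiring `kid` on every key is an assumption based on this document's use of `kid` for rotation; the JWK format itself only mandates `kty`.

```python
import json


def jwks_is_healthy(status: int, body: str) -> bool:
    """True if the response is HTTP 200 with at least one key carrying kty and kid."""
    if status != 200:
        return False
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return False
    keys = doc.get("keys") if isinstance(doc, dict) else None
    if not isinstance(keys, list) or not keys:
        return False
    # Assumption: every published key must expose both kty and kid.
    return all(isinstance(k, dict) and k.get("kty") and k.get("kid") for k in keys)


if __name__ == "__main__":
    sample = '{"keys": [{"kty": "RSA", "kid": "example-kid"}]}'
    print(jwks_is_healthy(200, sample))
```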
Key log fields
| Field | Source | Notes |
|---|---|---|
| requestId | header X-Request-Id | Cross-service correlation |
| userId | JWT subject | Per-request |
| identifier.scheme | sign-in flow | Email vs phone vs username |
| otp.namespace | OTP service | verify-email / phone-auth / forgot-password |
| kid | JWT header | Key rotation tracking |
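
These fields are mainly useful for filtering logs during an incident. The sketch below assumes log lines are `key=value` pairs with optional double-quoted values, which is only a guess suggested by the `grep level=error` check in the runbook; the real IGNIS logger output may use a different format (for example JSON), in which case the parsing step would change. `parse_log_line` and `lines_for_request` are hypothetical helpers.

```python
import shlex
from typing import Dict, Iterable, List


def parse_log_line(line: str) -> Dict[str, str]:
    """Parse an assumed key=value log line into a dict (quoted values allowed)."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def lines_for_request(lines: Iterable[str], request_id: str) -> List[str]:
    """Keep only the lines whose requestId field equals request_id."""
    return [ln for ln in lines if parse_log_line(ln).get("requestId") == request_id]


if __name__ == "__main__":
    logs = [
        'level=error requestId=req-1 userId=42 msg="sign in failed"',
        "level=info requestId=req-2 kid=key-a",
    ]
    print(lines_for_request(logs, "req-1"))
```

Note that `shlex.split` raises `ValueError` on unbalanced quotes, so a real tool would need to guard against truncated lines.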
3. Security
| Concern | Mitigation |
|---|---|
| JWKS rotation | New kid published; old key remains valid until expiry; sisters fetch on miss |
| Private signing key | K8s Secret, mounted file path; never in env-text |
| Password storage | Bun.password hash (argon2-style) |
| OTP brute-force | Hashed code + max 5 attempts + 10–15min lockout + 60s resend cooldown + daily quota |
| Identifier enumeration | OTP request returns same response shape regardless of identifier existence |
| AuthN | BASIC only on internal endpoints; public /auth/* flows have rate limit (gateway) |
| AuthZ | Casbin via PolicyDefinition |
| TLS | Terminated at Nginx; traffic from Nginx to Traefik to the service is plaintext |
| Network policy | Cilium — allow gateway + sister services + SMTP + Redis + Postgres |
| Mail/SMS providers | Credentials encrypted (AES-256-GCM) in Configuration.credential |
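
The OTP brute-force row combines an attempt cap, a lockout, and a resend cooldown. The following in-memory sketch illustrates how those three limits interact; it is not the service's implementation, which keeps OTP state in Redis. The lockout is set to 10 minutes (the lower bound of the documented 10–15 min range), the daily quota is omitted, and `OtpState`, `can_resend`, and `record_failed_attempt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 10 * 60  # assumption: lower bound of the documented range
RESEND_COOLDOWN_SECONDS = 60


@dataclass
class OtpState:
    failed_attempts: int = 0
    locked_until: float = 0.0
    last_sent_at: Optional[float] = None


def can_resend(state: OtpState, now: float) -> bool:
    """A new code may be sent only after the cooldown has elapsed."""
    return state.last_sent_at is None or now - state.last_sent_at >= RESEND_COOLDOWN_SECONDS


def record_failed_attempt(state: OtpState, now: float) -> bool:
    """Register a wrong code; return True if the identifier is locked."""
    if now < state.locked_until:
        return True  # still locked, do not count further attempts
    state.failed_attempts += 1
    if state.failed_attempts >= MAX_ATTEMPTS:
        state.locked_until = now + LOCKOUT_SECONDS
        state.failed_attempts = 0
        return True
    return False
```

Passing `now` explicitly keeps the logic deterministic, which makes the lockout boundaries easy to test.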
4. Runbook
4.1 Alert classes
| Alert | Trigger | Check | Fix | Escalate |
|---|---|---|---|---|
| IdentityHighErrorRate | 5xx >5% over 5m | kubectl logs ... \| grep level=error | identify failing endpoint | on-call backend |
| IdentityJWKSDown | /jw-certs returns non-200 | curl JWKS | restart pod; check signing key mount | on-call SRE, HIGH PRIORITY (cascades to all sisters) |
| IdentityMailFailures | mail send error rate spike | Nodemailer errors | verify SMTP creds; check provider | on-call backend |
| IdentitySMSFailures | SMS send error rate spike | MQSMSComponent errors | verify VN Pay creds; check provider | on-call backend |
| IdentityOTPSpike | OTP request rate >Nx baseline | application log | check for credential-stuffing; tighten rate limit | on-call security |
| IdentitySignInFailures | sign-in failure rate >5% | log | identify bad-actor IPs; check legit-user pattern | on-call backend |
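
The percentage triggers above (5xx >5%, sign-in failure rate >5%) reduce to a ratio compared against a threshold over a window. A minimal sketch of that arithmetic, assuming the counts for the window are already aggregated elsewhere; `error_rate` and `should_fire` are hypothetical names and this is not the actual alerting rule definition.

```python
def error_rate(errors: int, total: int) -> float:
    """Fraction of failed requests in the window; 0.0 when there is no traffic."""
    return errors / total if total else 0.0


def should_fire(errors: int, total: int, threshold: float = 0.05) -> bool:
    """Fire only when the rate is strictly above the threshold (e.g. >5%)."""
    return error_rate(errors, total) > threshold


if __name__ == "__main__":
    # 12 failures out of 200 requests in the 5-minute window is 6%.
    print(should_fire(12, 200))
```

Treating an empty window as a 0% rate avoids a division by zero and keeps the alert quiet when there is no traffic at all; whether silence should itself alert is a separate decision.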
4.2 Common operations
| Operation | Command |
|---|---|
| Tail logs | kubectl logs -n <ns> -f deploy/identity |
| Run migrations | kubectl exec -n <ns> -it deploy/identity -- bun run migrate |
| Verify JWKS | curl -s <base>/jw-certs \| jq .keys[0] |
| Inspect a user | SELECT * FROM "User" WHERE id = '...'; |
| Check policy definitions for a user | SELECT * FROM "PolicyDefinition" WHERE subject_id = '<userId>' AND subject_type = 'User'; |
| Reset OTP lockout for a user | Delete Redis keys <namespace>:lock:<identifier> |
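
To reset a lockout, the keys to delete follow the `<namespace>:lock:<identifier>` pattern from the table. The sketch below builds those key names for the three OTP namespaces listed under key log fields. Whether the identifier is stored raw, normalized, or hashed in the key is not stated here, so the helper takes it as given; `otp_lock_key` and `lock_keys_for` are hypothetical names.

```python
from typing import List

# Namespaces as listed for the otp.namespace log field.
OTP_NAMESPACES = ("verify-email", "phone-auth", "forgot-password")


def otp_lock_key(namespace: str, identifier: str) -> str:
    """Build a lock key following the <namespace>:lock:<identifier> pattern."""
    if namespace not in OTP_NAMESPACES:
        raise ValueError(f"unknown OTP namespace: {namespace}")
    return f"{namespace}:lock:{identifier}"


def lock_keys_for(identifier: str) -> List[str]:
    """All lock keys that may exist for one identifier, one per namespace."""
    return [otp_lock_key(ns, identifier) for ns in OTP_NAMESPACES]


if __name__ == "__main__":
    for key in lock_keys_for("user@example.com"):
        print(key)
```

Printing the keys first and deleting them in a separate step gives the operator a chance to confirm the identifier before changing Redis state.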
4.3 Recovery scenarios
| Scenario | Recovery |
|---|---|
| Service crash | All in-flight requests fail; OTP state in Redis survives |
| Mail/SMS provider outage | OTP state queued in Redis; user re-requests after provider recovery |
| Redis outage | OTP flows fail (fail closed: HTTP 503); auth cache disabled but JWT verify still works |
| Postgres outage | All endpoints fail; sister services can still verify existing JWTs (sisters cache JWKS) |
| Lost signing key | Catastrophic — all existing JWTs become unverifiable; rotate keypair + force all users to re-login |
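
Several of these scenarios depend on sister services caching JWKS and refetching only when they meet an unknown `kid`. The following sketch shows that fetch-on-miss pattern under simplifying assumptions: the fetcher is injected and synchronous, keys are plain dicts with a `kid` entry, and no expiry or refetch throttling is modeled. `JwksCache` is a hypothetical class, not code from any sister service.

```python
from typing import Callable, Dict, List, Optional

Jwk = Dict[str, str]


class JwksCache:
    """Caches keys by kid and refetches the key set once per lookup on a miss."""

    def __init__(self, fetch_keys: Callable[[], List[Jwk]]) -> None:
        self._fetch_keys = fetch_keys  # e.g. a function that GETs /jw-certs
        self._by_kid: Dict[str, Jwk] = {}

    def get(self, kid: str) -> Optional[Jwk]:
        key = self._by_kid.get(kid)
        if key is None:
            # Unknown kid: possibly a rotated key, so refresh the cached set.
            for jwk in self._fetch_keys():
                self._by_kid[jwk["kid"]] = jwk
            key = self._by_kid.get(kid)
        return key


if __name__ == "__main__":
    cache = JwksCache(lambda: [{"kid": "key-a", "kty": "RSA"}])
    print(cache.get("key-a"))
    print(cache.get("unknown-kid"))
```

Because every unknown `kid` triggers a fetch, a production version would likely rate-limit refetches so that tokens with random `kid` values cannot be used to hammer `/jw-certs`.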