Operations

1. Deployment

| Property | Value |
| --- | --- |
| Image | `registry/nx-seller-identity:<tag>` |
| Container port | 3000 |
| External port | 31010 |
| Snowflake ID | 1 |
| Replicas (default) | 1 (dev) / 2+ (staging+) |
| Resources (req/lim) | 200m / 1 CPU, 512Mi / 1Gi memory |
| Migration mode | `RUN_MODE=migrate` job before rollout |
| Live probe | `GET /v1/api/identity/healthz` |
| Ready probe | `GET /v1/api/identity/readyz` |
| JWKS endpoint | `GET /jw-certs` (public, MUST be reachable from all sister services) |

Traefik routing labels

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.identity.rule=PathPrefix(`/v1/api/identity`) || Path(`/jw-certs`)"
  - "traefik.http.services.identity.loadbalancer.server.port=3000"
```

The /jw-certs path is intentionally outside /v1/api/identity/ so sisters can hit it without the API base path.
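Since sister services depend on this endpoint and fetch keys on a `kid` miss, the consuming side can be sketched as a small cache keyed by `kid`. This is a minimal sketch, not the sisters' actual implementation: `JwksCache`, `JwksFetcher`, and the 30-second refetch cooldown are hypothetical names and values chosen for illustration, and the fetcher is injected so the logic stays independent of any HTTP client.

```typescript
// Minimal JWK shape; real keys carry more fields (n, e, alg, ...).
interface Jwk {
  kid: string;
  kty: string;
  [field: string]: unknown;
}

// Hypothetical fetcher: a real one would GET <base>/jw-certs and parse the JSON body.
type JwksFetcher = () => Promise<{ keys: Jwk[] }>;

class JwksCache {
  private keys = new Map<string, Jwk>();
  private lastFetchMs: number | null = null;
  private readonly fetchJwks: JwksFetcher;
  private readonly minRefetchIntervalMs: number;
  private readonly now: () => number;

  constructor(
    fetchJwks: JwksFetcher,
    minRefetchIntervalMs: number = 30_000, // assumed cooldown, not from the source
    now: () => number = Date.now,
  ) {
    this.fetchJwks = fetchJwks;
    this.minRefetchIntervalMs = minRefetchIntervalMs;
    this.now = now;
  }

  // Returns the key for `kid`, refetching the key set on a cache miss.
  async getKey(kid: string): Promise<Jwk | undefined> {
    const cached = this.keys.get(kid);
    if (cached !== undefined) return cached;

    const t = this.now();
    const fetchedRecently =
      this.lastFetchMs !== null && t - this.lastFetchMs < this.minRefetchIntervalMs;
    if (fetchedRecently) return undefined; // unknown kid; avoid hammering /jw-certs

    this.lastFetchMs = t;
    const { keys } = await this.fetchJwks();
    const next = new Map<string, Jwk>();
    for (const k of keys) next.set(k.kid, k);
    this.keys = next;
    return this.keys.get(kid);
  }
}
```

The cooldown is a design choice: without it, tokens carrying a bogus `kid` could force one request to `/jw-certs` per incoming call.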

Required infrastructure

| Dependency | Why |
| --- | --- |
| PostgreSQL | Primary datastore (schema `identity` + shared `public.Configuration`) |
| Redis | OTP state + auth cache + BullMQ for mail queue |
| SMTP | Email delivery; the service starts without it, but mail flows fail |
| VN Pay SMS | SMS delivery; as with SMTP, the service starts without it, but SMS flows fail |
| JWKS keypair (env / secret) | Service refuses to start without it |

2. Observability

| Signal | Source | Where to look |
| --- | --- | --- |
| Logs | stdout (IGNIS structured logger) | `kubectl logs deploy/identity` / Loki |
| Health | `/healthz`, `/readyz` | Gateway portal |
| OpenAPI live spec | `GET /v1/api/identity/doc/openapi.json` | Gateway portal |
| Metrics | Traefik gateway | Grafana |
| JWKS check | `GET /jw-certs` | Manual / synthetic monitor (every 1m) |
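The JWKS synthetic check can assert more than the status code. The sketch below validates a response that has already been fetched; `checkJwksResponse` is a hypothetical helper, and requiring a string `kid` on every key is an assumption based on this service using `kid` for rotation tracking (the JWK format itself treats `kid` as optional).

```typescript
interface JwksCheckResult {
  ok: boolean;
  reason?: string;
}

// Validates the status code and parsed JSON body of a GET /jw-certs response.
function checkJwksResponse(status: number, body: unknown): JwksCheckResult {
  if (status !== 200) {
    return { ok: false, reason: `unexpected status ${status}` };
  }
  if (typeof body !== "object" || body === null) {
    return { ok: false, reason: "body is not a JSON object" };
  }
  const keys = (body as { keys?: unknown }).keys;
  if (!Array.isArray(keys) || keys.length === 0) {
    return { ok: false, reason: "keys array missing or empty" };
  }
  for (const key of keys) {
    const candidate = key as { kid?: unknown; kty?: unknown } | null;
    // Assumption: every published key must carry a string kid and kty.
    if (typeof candidate?.kid !== "string" || typeof candidate?.kty !== "string") {
      return { ok: false, reason: "key without string kid/kty" };
    }
  }
  return { ok: true };
}
```

A monitor would call this once per minute with the response it received and page on any result where `ok` is false.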

Key log fields

| Field | Source | Notes |
| --- | --- | --- |
| `requestId` | header `X-Request-Id` | Cross-service correlation |
| `userId` | JWT subject | Per-request |
| `identifier.scheme` | sign-in flow | Email vs phone vs username |
| `otp.namespace` | OTP service | `verify-email` / `phone-auth` / `forgot-password` |
| `kid` | JWT header | Key rotation tracking |

3. Security

| Concern | Mitigation |
| --- | --- |
| JWKS rotation | New `kid` published; old key remains valid until expiry; sisters fetch on miss |
| Private signing key | K8s Secret, mounted file path; never in env-text |
| Password storage | Bun.password hash (argon2-style) |
| OTP brute-force | Hashed code + max 5 attempts + 10–15min lockout + 60s resend cooldown + daily quota |
| Identifier enumeration | OTP request returns same response shape regardless of identifier existence |
| AuthN | BASIC only on internal endpoints; public `/auth/*` flows are rate-limited at the gateway |
| AuthZ | Casbin via `PolicyDefinition` |
| TLS | Terminated at Nginx; traffic from Nginx to Traefik to the service is plaintext |
| Network policy | Cilium: allow gateway + sister services + SMTP + Redis + Postgres |
| Mail/SMS providers | Credentials encrypted (AES-256-GCM) in `Configuration.credential` |
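The OTP brute-force limits in the table can be read as a small state machine. The sketch below is illustrative only: the state shape and function names are hypothetical, the lockout uses the lower bound of the 10–15 minute range as an assumption, and code hashing and the daily quota are left out.

```typescript
interface OtpState {
  failedAttempts: number;
  lockedUntilMs: number | null;
  lastSentAtMs: number | null;
}

const MAX_ATTEMPTS = 5; // from the table above
const LOCKOUT_MS = 10 * 60_000; // assumption: lower bound of the 10–15 min range
const RESEND_COOLDOWN_MS = 60_000; // from the table above

const initialOtpState: OtpState = {
  failedAttempts: 0,
  lockedUntilMs: null,
  lastSentAtMs: null,
};

function isLocked(state: OtpState, nowMs: number): boolean {
  return state.lockedUntilMs !== null && nowMs < state.lockedUntilMs;
}

// A new code may be sent only outside a lockout and after the resend cooldown.
function canResend(state: OtpState, nowMs: number): boolean {
  if (isLocked(state, nowMs)) return false;
  return state.lastSentAtMs === null || nowMs - state.lastSentAtMs >= RESEND_COOLDOWN_MS;
}

// Returns the next state after a wrong code; the fifth failure starts a lockout.
function recordFailedAttempt(state: OtpState, nowMs: number): OtpState {
  if (isLocked(state, nowMs)) return state; // attempts during lockout are ignored
  const failedAttempts = state.failedAttempts + 1;
  if (failedAttempts >= MAX_ATTEMPTS) {
    return { ...state, failedAttempts: 0, lockedUntilMs: nowMs + LOCKOUT_MS };
  }
  return { ...state, failedAttempts, lockedUntilMs: null };
}
```

Keeping the functions pure (state in, state out) makes the policy easy to test separately from wherever the state is actually stored.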

4. Runbook

4.1 Alert classes

| Alert | Trigger | Check | Fix | Escalate |
| --- | --- | --- | --- | --- |
| IdentityHighErrorRate | 5xx >5% over 5m | `kubectl logs ... \| grep level=error` | identify failing endpoint | on-call backend |
| IdentityJWKSDown | `/jw-certs` returns non-200 | curl JWKS | restart pod; check signing key mount | on-call SRE, HIGH PRIORITY (cascades to all sisters) |
| IdentityMailFailures | mail send error rate spike | Nodemailer errors | verify SMTP creds; check provider | on-call backend |
| IdentitySMSFailures | SMS send error rate spike | MQSMSComponent errors | verify VN Pay creds; check provider | on-call backend |
| IdentityOTPSpike | OTP request rate >Nx baseline | application log | check for credential-stuffing; tighten rate limit | on-call security |
| IdentitySignInFailures | sign-in failure rate >5% | log | identify bad-actor IPs; check legit-user pattern | on-call backend |
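To make the `IdentityHighErrorRate` trigger concrete, the sketch below evaluates "5xx >5% over 5m" over a list of request samples. In practice this rule would most likely live in the metrics stack rather than in application code; `RequestSample` and both function names are hypothetical.

```typescript
interface RequestSample {
  timestampMs: number;
  status: number;
}

const WINDOW_MS = 5 * 60_000; // 5m, from the alert table
const ERROR_RATE_THRESHOLD = 0.05; // 5%, from the alert table

// Fraction of 5xx responses among samples inside the trailing window.
function errorRate(
  samples: RequestSample[],
  nowMs: number,
  windowMs: number = WINDOW_MS,
): number {
  const recent = samples.filter(
    (s) => s.timestampMs > nowMs - windowMs && s.timestampMs <= nowMs,
  );
  if (recent.length === 0) return 0; // no traffic means no error rate to alert on
  const errors = recent.filter((s) => s.status >= 500 && s.status <= 599).length;
  return errors / recent.length;
}

function highErrorRateFires(samples: RequestSample[], nowMs: number): boolean {
  return errorRate(samples, nowMs) > ERROR_RATE_THRESHOLD;
}
```

The comparison is strict, matching the ">5%" wording: a window sitting exactly at the threshold does not fire.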

4.2 Common operations

| Operation | Command |
| --- | --- |
| Tail logs | `kubectl logs -n <ns> -f deploy/identity` |
| Run migrations | `kubectl exec -n <ns> -it deploy/identity -- bun run migrate` |
| Verify JWKS | `curl -s <base>/jw-certs \| jq '.keys[0]'` |
| Inspect a user | `SELECT * FROM "User" WHERE id = '...';` |
| Check policy definitions for a user | `SELECT * FROM "PolicyDefinition" WHERE subject_id = '<userId>' AND subject_type = 'User';` |
| Reset OTP lockout for a user | Delete Redis keys `<namespace>:lock:<identifier>` |
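For the lockout reset, the keys to delete follow the `<namespace>:lock:<identifier>` pattern. The helper below builds them for one identifier; it assumes the namespaces are the three `otp.namespace` values listed under key log fields and that the identifier appears verbatim in the key (no hashing or normalization), neither of which is confirmed here.

```typescript
// Assumed to match the otp.namespace values listed under "Key log fields".
const OTP_NAMESPACES = ["verify-email", "phone-auth", "forgot-password"] as const;
type OtpNamespace = (typeof OTP_NAMESPACES)[number];

// Builds one lock key in the <namespace>:lock:<identifier> format.
function lockKey(namespace: OtpNamespace, identifier: string): string {
  return `${namespace}:lock:${identifier}`;
}

// All lock keys for an identifier, one per namespace.
function lockKeysFor(identifier: string): string[] {
  return OTP_NAMESPACES.map((ns) => lockKey(ns, identifier));
}
```

The resulting list can be passed to a single Redis `DEL` call.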

4.3 Recovery scenarios

| Scenario | Recovery |
| --- | --- |
| Service crash | All in-flight requests fail; OTP state in Redis survives |
| Mail/SMS provider outage | OTP state queued in Redis; user re-requests after provider recovery |
| Redis outage | OTP flows fail (fail closed: HTTP 503); auth cache disabled but JWT verify still works |
| Postgres outage | All endpoints fail; sister services can still verify existing JWTs (they cache JWKS) |
| Lost signing key | Catastrophic: all existing JWTs become unverifiable; rotate keypair + force all users to re-login |

5. Cross-Service Runbook

Proprietary and Confidential. Unauthorized copying, distribution, or use of this software is strictly prohibited.