Gateway Resilience

1. Overview

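Traefik provides four built-in resilience patterns (circuit breaker, rate limiting, retry, and active health checks) without plugins or custom code. Circuit breaker, rate limiting, and health checks are configured today; retry is available but not yet enabled on any service.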

Source: packages/gateway/config/dynamic/middlewares.yml (circuit breaker, rate limiting), infrastructure/deployments/develop/*/docker-compose.yml (health checks)

2. Resilience Flow

3. Circuit Breaker

Configuration

Source: packages/gateway/config/dynamic/middlewares.yml (lines 44–49)

yaml
circuit-breaker:
  circuitBreaker:
    # 5xx-ratio term currently commented out in source:
    # expression: "ResponseCodeRatio(500, 600, 0, 600) > 0.30 || NetworkErrorRatio() > 0.10 || LatencyAtQuantileMS(95.0) > 3000"
    expression: "NetworkErrorRatio() > 0.10 || LatencyAtQuantileMS(95.0) > 3000"
    checkPeriod: 5s
    fallbackDuration: 15s
    recoveryDuration: 30s

Expression Breakdown

| Condition | Trigger | Active? |
|---|---|---|
| NetworkErrorRatio() > 0.10 | More than 10% of requests result in network errors | Yes |
| LatencyAtQuantileMS(95.0) > 3000 | P95 latency exceeds 3 seconds | Yes |
| ResponseCodeRatio(500, 600, 0, 600) > 0.30 | More than 30% of responses are 5xx errors | Commented out in source |

The expression uses OR logic — any single active condition triggers the circuit breaker. The 5xx-ratio term is present but commented out in middlewares.yml.
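The OR logic of the active expression can be sketched in a few lines of Python. This is only an illustration of how the two thresholds combine; the function and constant names are hypothetical, and the metric values are assumed to be supplied by the caller rather than measured by Traefik.

```python
# Thresholds copied from the active expression in middlewares.yml.
NETWORK_ERROR_RATIO_MAX = 0.10
P95_LATENCY_MS_MAX = 3000


def should_trip(network_error_ratio: float, p95_latency_ms: float) -> bool:
    """Mirror the OR logic: any single condition being true trips the breaker."""
    return (
        network_error_ratio > NETWORK_ERROR_RATIO_MAX
        or p95_latency_ms > P95_LATENCY_MS_MAX
    )


if __name__ == "__main__":
    print(should_trip(0.02, 800))   # healthy on both metrics
    print(should_trip(0.15, 800))   # network error ratio above 10%
    print(should_trip(0.02, 4500))  # P95 latency above 3 seconds
```

Both comparisons are strict, so a network error ratio of exactly 0.10 or a P95 of exactly 3000 ms does not trip the breaker.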

Timing Parameters

| Parameter | Value | Description |
|---|---|---|
| checkPeriod | 5s | How often the expression is evaluated |
| fallbackDuration | 15s | How long the circuit stays open before attempting recovery |
| recoveryDuration | 30s | How long the recovery phase lasts before fully closing |

State Machine

| State | Behavior |
|---|---|
| Closed | Normal operation: requests pass through |
| Open | All requests fail fast with 503 for 15 seconds |
| Recovering | Limited requests allowed to test recovery over 30 seconds |
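The transitions between these states can be modeled with a small Python class. This is a simplified sketch, not Traefik's implementation: the class and method names are hypothetical, time is passed in explicitly as seconds, the partial traffic admitted while recovering is not modeled, and the rule that a failure during recovery reopens the circuit is an assumption about Traefik's behavior.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    RECOVERING = "recovering"


class CircuitBreakerModel:
    """Simplified model of the three states; `now` is a timestamp in seconds."""

    def __init__(self, fallback_duration: float = 15.0, recovery_duration: float = 30.0):
        self.fallback_duration = fallback_duration
        self.recovery_duration = recovery_duration
        self.state = State.CLOSED
        self.entered_at = 0.0  # time at which the current state began

    def _enter(self, state: State, now: float) -> None:
        self.state = state
        self.entered_at = now

    def tick(self, now: float, tripped: bool) -> State:
        """Advance the model; `tripped` is the result of evaluating the expression."""
        elapsed = now - self.entered_at
        if self.state is State.CLOSED:
            if tripped:
                self._enter(State.OPEN, now)
        elif self.state is State.OPEN:
            if elapsed >= self.fallback_duration:
                self._enter(State.RECOVERING, now)
        elif self.state is State.RECOVERING:
            if tripped:
                # Assumption: a failure during recovery reopens the circuit.
                self._enter(State.OPEN, now)
            elif elapsed >= self.recovery_duration:
                self._enter(State.CLOSED, now)
        return self.state

    def status_for_request(self) -> int:
        """Open fails fast with 503; other states are shown as passing through."""
        return 503 if self.state is State.OPEN else 200
```

With the configured values, a breaker that trips at t=5 stays open until t=20, recovers until t=50, and then closes if the expression stays false throughout.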

Available Expression Functions

| Function | Description |
|---|---|
| NetworkErrorRatio() | Ratio of network errors to total requests |
| ResponseCodeRatio(from, to, dividedByFrom, dividedByTo) | Ratio of response codes in ranges |
| LatencyAtQuantileMS(quantile) | Latency at given quantile (0-100) |
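To make the arguments of these functions concrete, the sketch below computes comparable values from plain Python lists. It is an illustration only: the helper names are hypothetical, the ranges are assumed to be inclusive of the lower bound and exclusive of the upper bound, and the quantile uses a simple nearest-rank method rather than whatever estimator Traefik uses internally.

```python
import math


def network_error_ratio(network_errors: int, total_requests: int) -> float:
    """Share of requests that ended in a network error."""
    return network_errors / total_requests if total_requests else 0.0


def response_code_ratio(codes, frm, to, div_from, div_to) -> float:
    """Count of codes in [frm, to) divided by count of codes in [div_from, div_to)."""
    numerator = sum(1 for c in codes if frm <= c < to)
    denominator = sum(1 for c in codes if div_from <= c < div_to)
    return numerator / denominator if denominator else 0.0


def latency_at_quantile_ms(latencies_ms, quantile: float) -> float:
    """Nearest-rank latency for a quantile given on a 0-100 scale."""
    if not latencies_ms:
        return 0.0
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(quantile * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    codes = [200] * 6 + [404, 500, 502, 503]
    # Three 5xx responses out of ten in total.
    print(response_code_ratio(codes, 500, 600, 0, 600))
    print(latency_at_quantile_ms(list(range(100, 2100, 100)), 95.0))
```

In this reading, ResponseCodeRatio(500, 600, 0, 600) is the share of 5xx responses among all responses, which is why the commented-out term compares it against 0.30.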

4. Rate Limiting

Global Rate Limit

Source: packages/gateway/config/dynamic/middlewares.yml (lines 53–59)

Applied to most services — 200 requests/second per client IP:

yaml
rate-limit:
  rateLimit:
    average: 200
    burst: 400
    sourceCriterion:
      ipStrategy:
        depth: 1

Auth-Specific Rate Limit

Source: packages/gateway/config/dynamic/middlewares.yml (lines 62–69)

Stricter limit for authentication endpoints — 30 requests/minute per client IP:

yaml
rate-limit-auth:
  rateLimit:
    average: 30
    burst: 60
    period: 1m
    sourceCriterion:
      ipStrategy:
        depth: 1

Per-IP Rate Limiting

Both rate limiters set sourceCriterion.ipStrategy.depth: 1, which tells Traefik to take the client IP from the X-Forwarded-For header, counting one position from the right. This is necessary because Traefik sits behind Nginx: without this setting, Traefik would key on the connection's remote address, every request would appear to come from Nginx's IP, and all clients would share a single rate limit bucket.
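The effect of the depth setting can be illustrated with a short Python helper. This is a sketch under stated assumptions: the function name is hypothetical, depth is assumed to count positions from the right of X-Forwarded-For, Nginx is assumed to append the address of the client that connected to it, and the addresses are documentation-range examples.

```python
from typing import Optional


def client_ip_at_depth(x_forwarded_for: str, depth: int) -> Optional[str]:
    """Return the entry `depth` positions from the right, or None if out of range."""
    ips = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    if depth < 1 or depth > len(ips):
        return None
    return ips[-depth]


if __name__ == "__main__":
    # Header as seen by Traefik after Nginx appended the connecting client's address.
    header = "198.51.100.9, 203.0.113.7"
    print(client_ip_at_depth(header, 1))  # rightmost entry, appended by Nginx
    print(client_ip_at_depth(header, 2))  # one hop further left, client-supplied
```

Under these assumptions, depth 1 selects the entry Nginx appended, so anything a client places in the header itself sits further left and does not affect which bucket the request lands in.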

Rate Limit Response

When a client exceeds its rate limit, Traefik rejects the request with 429 Too Many Requests.
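How average, burst, and period interact can be sketched with a token bucket, the general technique this kind of limiter is based on. The class below is a simplified model rather than Traefik's code: the names are hypothetical, time is passed in explicitly as seconds, and a real deployment would keep one bucket per client IP.

```python
class TokenBucket:
    """Refills at average/period tokens per second, holding at most `burst` tokens."""

    def __init__(self, average: float, burst: int, period_s: float = 1.0):
        self.rate = average / period_s  # tokens added per second
        self.capacity = float(burst)
        self.tokens = float(burst)      # the bucket starts full
        self.updated_at = 0.0

    def allow(self, now: float) -> int:
        """Return the HTTP status for a request arriving at time `now`."""
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 200
        return 429


if __name__ == "__main__":
    # Values from rate-limit-auth: 30 requests per minute with a burst of 60.
    auth = TokenBucket(average=30, burst=60, period_s=60.0)
    statuses = [auth.allow(0.0) for _ in range(61)]
    print(statuses.count(200), statuses.count(429))
```

In this model the auth limiter absorbs a burst of 60 requests from one IP, rejects the next one, and then admits one further request every two seconds.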

5. Active Health Checks

Source: Docker labels in infrastructure/deployments/develop/*/docker-compose.yml

Traefik periodically checks service health endpoints:

yaml
# Docker label per service
- "traefik.http.services.identity.loadbalancer.healthcheck.path=/v1/api/identity/health"
- "traefik.http.services.identity.loadbalancer.healthcheck.interval=30s"

| Parameter | Value | Description |
|---|---|---|
| path | /v1/api/<service>/health | Health check endpoint (IGNIS built-in) |
| interval | 30s | Check every 30 seconds |

All 7 backend services (identity, commerce, sale, finance, inventory, payment, signal) have health checks configured at 30-second intervals. If a service fails health checks, Traefik stops routing traffic to it until it recovers.
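Since every service follows the same label pattern, the full set of labels can be generated rather than written by hand. The Python sketch below is illustrative only: the helper name is hypothetical, and it assumes that each Traefik service name matches the backend name in the way the identity example does.

```python
from typing import List

# The seven backend services listed in this section.
SERVICES = ["identity", "commerce", "sale", "finance", "inventory", "payment", "signal"]


def healthcheck_labels(service: str, interval: str = "30s") -> List[str]:
    """Build the two health check labels for one service."""
    prefix = f"traefik.http.services.{service}.loadbalancer.healthcheck"
    return [
        f"{prefix}.path=/v1/api/{service}/health",
        f"{prefix}.interval={interval}",
    ]


if __name__ == "__main__":
    for name in SERVICES:
        for label in healthcheck_labels(name):
            print(f'- "{label}"')
```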

6. Retry (Per-Service)

Retries can be configured per-service via Docker labels. No service currently configures retry — this is available for future use:

yaml
# Example: retry up to 3 times for commerce service
- "traefik.http.middlewares.retry-commerce.retry.attempts=3"
- "traefik.http.middlewares.retry-commerce.retry.initialinterval=100ms"
- "traefik.http.routers.commerce.middlewares=retry-commerce@docker,rate-limit@file,circuit-breaker@file,security-headers@file"
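
The wait times implied by attempts and initialinterval can be sketched as a capped exponential backoff. This is a rough model, not Traefik's implementation, and it rests on several assumptions: attempts is treated as the total number of tries, the wait doubles after each try, the wait is capped at twice the initial interval, and no random jitter is applied. The function name is hypothetical.

```python
from typing import List


def backoff_schedule_ms(attempts: int, initial_interval_ms: float) -> List[float]:
    """Waits between consecutive tries, doubling and capped at 2x the initial interval."""
    cap = 2 * initial_interval_ms
    waits = []
    wait = initial_interval_ms
    # One wait separates each pair of consecutive tries.
    for _ in range(max(0, attempts - 1)):
        waits.append(min(wait, cap))
        wait *= 2
    return waits


if __name__ == "__main__":
    # Values from the commerce example: attempts=3, initialinterval=100ms.
    print(backoff_schedule_ms(3, 100))
```

Under these assumptions the commerce example adds at most 300 ms of waiting before the request finally fails, which is worth weighing against the 3000 ms P95 threshold of the circuit breaker.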

| Document | Description |
|---|---|
| Gateway Overview | Identity card + service catalog |
| Middlewares | Full middleware definitions from source |
| Observability | Prometheus metrics, access logs, Grafana |
| Operations | Deploy, runbook, alert classes |
| Decisions | ADRs |

Proprietary and Confidential. Unauthorized copying, distribution, or use of this software is strictly prohibited.