Gateway Observability

1. Overview

Traefik provides built-in observability through Prometheus metrics, structured JSON access logs, and a real-time dashboard. No custom code or additional exporters are needed.

Source: packages/gateway/config/traefik.yml (metrics + access log config), packages/gateway/config/dynamic/middlewares.yml (dashboard routers), infrastructure/deployments/develop/monitoring/docker-compose.yml (Prometheus + Grafana)

2. Observability Stack

3. Prometheus Metrics

Configuration

Source: packages/gateway/config/traefik.yml (lines 37-43)

yaml

metrics:
  prometheus:
    entryPoint: traefik
    addEntryPointsLabels: true
    addRoutersLabels: true
    addServicesLabels: true

The traefik entrypoint listens on :8080 internally, mapped to host port 30100.
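The port relationship above can be sketched as two fragments. This is a minimal illustration, not a copy of the repository files: the compose service name `gateway` is hypothetical, and `dev-nx-gateway` is assumed to be the container name because it is the hostname used in the scrape target.

```yaml
# traefik.yml (static configuration): assumed shape of the entrypoint definition
entryPoints:
  traefik:
    address: ":8080"   # internal port that serves /metrics and the dashboard
---
# docker-compose.yml: assumed shape of the port mapping (service name is hypothetical)
services:
  gateway:
    container_name: dev-nx-gateway
    ports:
      - "30100:8080"   # host port 30100 -> container port 8080
```

Containers on the same Docker network reach the entrypoint on port 8080 directly; only clients on the host use port 30100.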

Scrape Configuration

Source: infrastructure/deployments/develop/monitoring/config/prometheus/prometheus.yml

yaml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "traefik"
    static_configs:
      - targets: ["dev-nx-gateway:8080"]
        labels:
          instance: "gateway"

  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Prometheus scrapes Traefik metrics every 15 seconds via the Docker network (dev-nx-gateway:8080). It also monitors itself at localhost:9090.

Available Metrics

Metric	Type	Description
`traefik_entrypoint_requests_total`	Counter	Total requests per entrypoint
`traefik_entrypoint_request_duration_seconds`	Histogram	Request duration per entrypoint
`traefik_router_requests_total`	Counter	Total requests per router (service)
`traefik_router_request_duration_seconds`	Histogram	Request duration per router
`traefik_service_requests_total`	Counter	Total requests per backend service
`traefik_service_request_duration_seconds`	Histogram	Duration per backend service
`traefik_service_open_connections`	Gauge	Current open connections per service
`traefik_service_server_up`	Gauge	Health status per backend server (1=up, 0=down)
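The histogram metrics above are only as precise as their bucket boundaries. Traefik's default buckets are 0.1, 0.3, 1.2, and 5.0 seconds, so a quantile that falls between 1.2 s and 5.0 s is interpolated within one wide bucket. If finer latency resolution is needed, the `buckets` option can be set explicitly. The boundaries below are illustrative assumptions, not values taken from the repository:

```yaml
metrics:
  prometheus:
    entryPoint: traefik
    addEntryPointsLabels: true
    addRoutersLabels: true
    addServicesLabels: true
    # Hypothetical bucket boundaries in seconds; tune them to the latencies you care about.
    buckets:
      - 0.05
      - 0.1
      - 0.25
      - 0.5
      - 1.0
      - 2.0
      - 5.0
```

Each additional bucket adds one time series per label combination, so a longer list trades storage and scrape size for accuracy.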

Example Prometheus Queries

txt

# Request rate per service (last 5 minutes)
sum(rate(traefik_service_requests_total[5m])) by (service)

# Error rate (5xx responses)
sum(rate(traefik_service_requests_total{code=~"5.."}[5m]))
/ sum(rate(traefik_service_requests_total[5m]))

# P99 latency per service (aggregate buckets by le and service before taking the quantile)
histogram_quantile(0.99, sum(rate(traefik_service_request_duration_seconds_bucket[5m])) by (le, service))

# Unhealthy backends
traefik_service_server_up == 0
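If these queries are used on dashboards or in alerts, they could be precomputed as Prometheus recording rules. The sketch below is one possible layout: the group name and the recorded series names are hypothetical and simply follow the common `level:metric:operation` naming convention. The latency rule sums the buckets by `le` and `service` so that the quantile is computed per service.

```yaml
groups:
  - name: gateway-recording   # hypothetical group name
    rules:
      # Request rate per service over the last 5 minutes
      - record: service:traefik_service_requests:rate5m
        expr: sum by (service) (rate(traefik_service_requests_total[5m]))

      # Fraction of requests answered with a 5xx status, per service
      - record: service:traefik_service_requests_errors:ratio_rate5m
        expr: >
          sum by (service) (rate(traefik_service_requests_total{code=~"5.."}[5m]))
          / sum by (service) (rate(traefik_service_requests_total[5m]))

      # P99 latency per service
      - record: service:traefik_service_request_duration_seconds:p99_5m
        expr: >
          histogram_quantile(0.99,
            sum by (le, service) (rate(traefik_service_request_duration_seconds_bucket[5m])))
```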

4. Structured Access Logs

Configuration

Source: packages/gateway/config/traefik.yml (lines 29-35)

yaml

accessLog:
  format: json
  fields:
    headers:
      names:
        Authorization: drop
        Cookie: drop

Log Format

Each request produces a JSON log entry:

json

{
  "level": "info",
  "msg": "",
  "ClientAddr": "192.168.1.100:45678",
  "ClientHost": "192.168.1.100",
  "Duration": 145000000,
  "RequestMethod": "GET",
  "RequestPath": "/v1/api/commerce/products",
  "OriginStatus": 200,
  "ServiceName": "commerce@docker",
  "RouterName": "commerce@docker",
  "time": "2025-01-20T10:00:00Z"
}

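Duration is reported in nanoseconds, so 145000000 in the example corresponds to 145 ms. The sensitive Authorization and Cookie headers are explicitly set to drop, so their values never appear in log output.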
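The header handling can be made more explicit with Traefik's `defaultMode` settings, which accept `keep`, `drop`, or `redact`. The fragment below is a sketch under assumptions: the `User-Agent` and `X-Request-Id` entries are hypothetical examples of headers a team might want to keep and are not part of the current configuration.

```yaml
accessLog:
  format: json
  fields:
    # Keep all standard fields (this is Traefik's default for fields)
    defaultMode: keep
    headers:
      # Drop every header unless listed below (this is Traefik's default for headers)
      defaultMode: drop
      names:
        # Hypothetical additions for illustration
        User-Agent: keep
        X-Request-Id: keep
        # Explicit drops stay in place as a safeguard if defaultMode is ever changed
        Authorization: drop
        Cookie: drop
```

Using `redact` instead of `drop` keeps the header name in the log entry while masking its value, which can help confirm that a header was present.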

Application Logs

Source: packages/gateway/config/traefik.yml (lines 25-27)

yaml

log:
  level: INFO
  format: json

Traefik application logs (startup, configuration changes, errors) are also in JSON format at INFO level.

5. Traefik Dashboard

Static Configuration

Source: packages/gateway/config/traefik.yml (lines 7-8)

yaml

api:
  dashboard: true

The dashboard is enabled, but api.insecure is not set, so Traefik does not expose it automatically. Instead, routers defined through the file provider expose it behind authentication.

Dashboard Routers

Source: packages/gateway/config/dynamic/middlewares.yml (lines 8-23)

yaml

http:
  routers:
    dashboard:
      rule: "PathPrefix(`/api`) || PathPrefix(`/dashboard`)"
      entryPoints:
        - traefik
      service: api@internal
      middlewares:
        - dashboard-auth
    dashboard-redirect:
      rule: "Path(`/`)"
      entryPoints:
        - traefik
      service: api@internal
      middlewares:
        - redirect-to-dashboard
        - dashboard-auth
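The two middlewares referenced by these routers are not reproduced on this page. A minimal sketch of what they could look like follows; it assumes a `basicAuth` middleware and a `redirectRegex` middleware, and the regex, replacement, and hash placeholder are illustrative rather than copied from middlewares.yml.

```yaml
http:
  middlewares:
    dashboard-auth:
      basicAuth:
        users:
          # Placeholder entry: generate a real one with `htpasswd -nB nx.eventry`
          - "nx.eventry:<bcrypt-hash>"
    redirect-to-dashboard:
      redirectRegex:
        # Assumed pattern: match the bare host root and send it to the dashboard path
        regex: "^(https?://[^/]+)/?$"
        replacement: "${1}/dashboard/"
```

The trailing slash in `/dashboard/` matters, because Traefik serves the dashboard UI under that exact path.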

Access

The dashboard is available at http://localhost:30100 (mapped from internal port 8080). It is protected by HTTP Basic Authentication (user: nx.eventry).

Dashboard Shows

All registered routers and their rules
Backend services and their health status
Middleware chains applied to each router
Real-time request metrics

6. Grafana Integration

Container Configuration

Source: infrastructure/deployments/develop/monitoring/docker-compose.yml

Property	Value
Image	`grafana/grafana:11.5.2`
Host Port	39300 (internal :3000)
Admin User	`admin`
Admin Password	`admin`
Sign Up	Disabled
Default Dashboard	`traefik-overview.json`
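The properties in this table map onto standard Grafana environment variables. The compose fragment below is a sketch of how they could be expressed, not a copy of the actual file: the service name and the volume paths are assumptions derived from the directory layout referenced on this page.

```yaml
services:
  grafana:   # hypothetical service name
    image: grafana/grafana:11.5.2
    ports:
      - "39300:3000"
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: admin
      GF_USERS_ALLOW_SIGN_UP: "false"
      GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH: /var/lib/grafana/dashboards/traefik-overview.json
    volumes:
      # Assumed mounts, relative to the monitoring directory
      - ./config/grafana/provisioning:/etc/grafana/provisioning:ro
      - ./config/grafana/dashboards:/var/lib/grafana/dashboards:ro
```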

Prometheus Data Source

Source: infrastructure/deployments/develop/monitoring/config/grafana/provisioning/datasources/prometheus.yml

yaml

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://dev-nx-prometheus:9090
    isDefault: true
    editable: false

Dashboard Provisioning

Source: infrastructure/deployments/develop/monitoring/config/grafana/provisioning/dashboards/dashboards.yml

yaml

providers:
  - name: "BANA"
    orgId: 1
    folder: "BANA"
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false

A pre-built traefik-overview.json dashboard is provisioned automatically from infrastructure/deployments/develop/monitoring/config/grafana/dashboards/traefik-overview.json.

7. Prometheus Container Configuration

Source: infrastructure/deployments/develop/monitoring/docker-compose.yml

Property	Value
Image	`prom/prometheus:v3.2.1`
Host Port	39090 (internal :9090)
Retention	30 days (`--storage.tsdb.retention.time=30d`)
Lifecycle API	Enabled (`--web.enable-lifecycle`)
Config	`/etc/prometheus/prometheus.yml`
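In compose terms, the table above could translate into something like the following. This is a sketch under assumptions: the service name and the volume path are hypothetical, and `dev-nx-prometheus` is taken from the Grafana data source URL. Overriding `command` replaces the image's default arguments, so `--config.file` is listed explicitly.

```yaml
services:
  prometheus:   # hypothetical service name
    image: prom/prometheus:v3.2.1
    container_name: dev-nx-prometheus
    ports:
      - "39090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
      - "--web.enable-lifecycle"
    volumes:
      # Assumed mount, relative to the monitoring directory
      - ./config/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
```

With the lifecycle API enabled, a configuration change can be applied without a restart by sending an HTTP POST to http://localhost:39090/-/reload.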

8. Recommended Alert Rules

The following Prometheus alert rules are recommended for production monitoring. These rules are not currently deployed; add them to a Prometheus alerting rules file when ready:

yaml

groups:
  - name: gateway
    rules:
      - alert: HighErrorRate
        expr: >
          sum(rate(traefik_service_requests_total{code=~"5.."}[5m]))
          / sum(rate(traefik_service_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High 5xx error rate on gateway"

      - alert: HighLatency
        expr: >
          histogram_quantile(0.99,
            sum by (le, service) (rate(traefik_service_request_duration_seconds_bucket[5m]))
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency exceeds 2 seconds"

      - alert: ServiceDown
        expr: traefik_service_server_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Backend service {{ $labels.service }} is down"
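To load rules like these, Prometheus needs a `rule_files` entry in prometheus.yml. The fragment below is a sketch: the rules file path is hypothetical, and the file would also have to be mounted into the container.

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s   # alert expressions are evaluated at this interval

# Hypothetical location for the rule group; adjust to wherever the file is mounted
rule_files:
  - /etc/prometheus/rules/gateway.yml
```

Without an `alerting` section pointing at an Alertmanager, firing alerts appear only in the Prometheus UI and are not routed anywhere. Running `promtool check rules` against the file before reloading catches syntax errors early.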

9. Related Pages

Document	Description
Gateway Overview	Identity card + service catalog
Operations	Deploy, runbook, alert classes
Middlewares	Full middleware definitions from source
Resilience	Circuit breaker states, health checks, retry
Decisions	ADRs
