Gateway Observability
1. Overview
Traefik provides built-in observability through Prometheus metrics, structured JSON access logs, and a real-time dashboard. No custom code or additional exporters are needed.
Source: packages/gateway/config/traefik.yml (metrics + access log config), packages/gateway/config/dynamic/middlewares.yml (dashboard routers), infrastructure/deployments/develop/monitoring/docker-compose.yml (Prometheus + Grafana)
2. Observability Stack
3. Prometheus Metrics
Configuration
Source: packages/gateway/config/traefik.yml (lines 37–43)
metrics:
  prometheus:
    entryPoint: traefik
    addEntryPointsLabels: true
    addRoutersLabels: true
    addServicesLabels: true

The traefik entrypoint listens on :8080 internally, mapped to host port 30100.
Scrape Configuration
Source: infrastructure/deployments/develop/monitoring/config/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: "traefik"
    static_configs:
      - targets: ["dev-nx-gateway:8080"]
        labels:
          instance: "gateway"
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Prometheus scrapes Traefik metrics every 15 seconds via the Docker network (dev-nx-gateway:8080). It also monitors itself at localhost:9090.
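To make the scrape target more concrete, the sketch below parses a few lines of the Prometheus text exposition format, the format Traefik serves on its metrics endpoint, and sums traefik_service_requests_total per service. The sample lines, the identity@docker service name, and the helper name requests_per_service are illustrative assumptions, not output captured from this deployment.

```python
import re
from collections import defaultdict

# Hypothetical sample in the Prometheus text exposition format; values are made up.
SAMPLE = """\
# TYPE traefik_service_requests_total counter
traefik_service_requests_total{code="200",method="GET",protocol="http",service="commerce@docker"} 120
traefik_service_requests_total{code="500",method="GET",protocol="http",service="commerce@docker"} 3
traefik_service_requests_total{code="200",method="POST",protocol="http",service="identity@docker"} 40
traefik_service_open_connections{service="commerce@docker"} 2
"""

# metric_name{label="value",...} sample_value
LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def requests_per_service(text: str) -> dict[str, float]:
    """Sum traefik_service_requests_total over all other labels, per service."""
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None or match["name"] != "traefik_service_requests_total":
            continue
        labels = dict(LABEL.findall(match["labels"]))
        totals[labels["service"]] += float(match["value"])
    return dict(totals)


totals = requests_per_service(SAMPLE)
```

The regular expressions cover only the simple label syntax shown in the sample. For real scrapes, a dedicated parser is the safer choice.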
Available Metrics
| Metric | Type | Description |
|---|---|---|
| traefik_entrypoint_requests_total | Counter | Total requests per entrypoint |
| traefik_entrypoint_request_duration_seconds | Histogram | Request duration per entrypoint |
| traefik_router_requests_total | Counter | Total requests per router (service) |
| traefik_router_request_duration_seconds | Histogram | Request duration per router |
| traefik_service_requests_total | Counter | Total requests per backend service |
| traefik_service_request_duration_seconds | Histogram | Duration per backend service |
| traefik_service_open_connections | Gauge | Current open connections per service |
| traefik_service_server_up | Gauge | Health status per backend server (1=up, 0=down) |
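The counters in this table only ever increase, so dashboards and alerts work with their rate of change rather than the raw value. The following is a minimal sketch of that arithmetic for two samples taken one scrape interval (15 s) apart. The sample values and function names are hypothetical, and Prometheus's own rate() additionally extrapolates to the window boundaries, which this sketch does not attempt.

```python
def per_second_rate(earlier: float, later: float, interval_s: float) -> float:
    """Average per-second increase of a counter between two samples."""
    if later < earlier:
        # A counter that went down was reset (for example, Traefik restarted);
        # treat the later value as the increase since the reset.
        return later / interval_s
    return (later - earlier) / interval_s


def error_ratio(rate_5xx: float, rate_all: float) -> float:
    """Share of requests answered with a 5xx status; 0.0 when there is no traffic."""
    return rate_5xx / rate_all if rate_all > 0 else 0.0


# Hypothetical counter values from two scrapes 15 seconds apart.
total_rate = per_second_rate(1200, 1500, 15)  # all requests
error_rate = per_second_rate(30, 45, 15)      # requests with code=~"5.."
ratio = error_ratio(error_rate, total_rate)
```

With these numbers the gateway handles 20 requests per second, one of which fails, for an error ratio of 5%.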
Example Prometheus Queries
# Request rate per service (last 5 minutes)
sum(rate(traefik_service_requests_total[5m])) by (service)
# Error rate (5xx responses)
sum(rate(traefik_service_requests_total{code=~"5.."}[5m]))
/ sum(rate(traefik_service_requests_total[5m]))
# P99 latency per service
histogram_quantile(0.99, sum(rate(traefik_service_request_duration_seconds_bucket[5m])) by (le, service))
# Unhealthy backends
traefik_service_server_up == 0
4. Structured Access Logs
Configuration
Source: packages/gateway/config/traefik.yml (lines 29–35)
accessLog:
  format: json
  fields:
    headers:
      names:
        Authorization: drop
        Cookie: drop

Log Format
Each request produces a JSON log entry:
{
  "level": "info",
  "msg": "",
  "ClientAddr": "192.168.1.100:45678",
  "ClientHost": "192.168.1.100",
  "Duration": 145000000,
  "RequestMethod": "GET",
  "RequestPath": "/v1/api/commerce/products",
  "OriginStatus": 200,
  "ServiceName": "commerce@docker",
  "RouterName": "commerce@docker",
  "time": "2025-01-20T10:00:00Z"
}

The sensitive Authorization and Cookie headers are dropped from log output. Duration is reported in nanoseconds, so the 145000000 above corresponds to 145 ms.
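Because every entry is a single JSON object, the access log can be processed with any JSON parser. The sketch below reads one entry shaped like the example and converts Duration to milliseconds. The function name summarize and the choice of fields are illustrative assumptions.

```python
import json

# One access log entry, rebuilt from the example in this section.
SAMPLE_LINE = json.dumps({
    "level": "info",
    "msg": "",
    "ClientAddr": "192.168.1.100:45678",
    "ClientHost": "192.168.1.100",
    "Duration": 145000000,
    "RequestMethod": "GET",
    "RequestPath": "/v1/api/commerce/products",
    "OriginStatus": 200,
    "ServiceName": "commerce@docker",
    "RouterName": "commerce@docker",
    "time": "2025-01-20T10:00:00Z",
})


def summarize(line: str) -> dict:
    """Extract the fields most useful for a quick latency or error review."""
    entry = json.loads(line)
    return {
        "service": entry["ServiceName"],
        "method": entry["RequestMethod"],
        "path": entry["RequestPath"],
        "status": entry["OriginStatus"],
        # Traefik reports Duration in nanoseconds.
        "duration_ms": entry["Duration"] / 1_000_000,
    }


summary = summarize(SAMPLE_LINE)
```

Applied to a log stream line by line, the same function gives a quick way to spot slow or failing routes without going through Prometheus.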
Application Logs
Source: packages/gateway/config/traefik.yml (lines 25–27)
log:
  level: INFO
  format: json

Traefik application logs (startup, configuration changes, errors) are also in JSON format at INFO level.
5. Traefik Dashboard
Static Configuration
Source: packages/gateway/config/traefik.yml (lines 7–8)
api:
  dashboard: true

The dashboard is enabled, but api.insecure is not set, so it is not exposed directly on the entrypoint. Routing and authentication are handled by routers defined in the file provider.
Dashboard Routers
Source: packages/gateway/config/dynamic/middlewares.yml (lines 8–23)
http:
  routers:
    dashboard:
      rule: "PathPrefix(`/api`) || PathPrefix(`/dashboard`)"
      entryPoints:
        - traefik
      service: api@internal
      middlewares:
        - dashboard-auth
    dashboard-redirect:
      rule: "Path(`/`)"
      entryPoints:
        - traefik
      service: api@internal
      middlewares:
        - redirect-to-dashboard
        - dashboard-auth

Access
The dashboard is available at http://localhost:30100 (host port mapped to internal port 8080) and is protected by HTTP Basic Authentication (user: nx.eventry).
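The same credentials protect the Traefik API under /api, so it can be queried from scripts as well as from the browser. The following sketch builds an authenticated request for the /api/overview endpoint without sending it. The function name overview_request is hypothetical, and the password has to be supplied by the caller because it is not part of this page.

```python
import base64
import urllib.request

DASHBOARD_URL = "http://localhost:30100"  # host port mapped to the traefik entrypoint


def overview_request(user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the Traefik API overview using HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{DASHBOARD_URL}/api/overview",
        headers={"Authorization": f"Basic {token}"},
    )


# "example-password" is a placeholder, not the real credential.
request = overview_request("nx.eventry", "example-password")
```

Passing the request to urllib.request.urlopen sends it. A 401 response most likely means the credentials do not match the dashboard-auth middleware.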
Dashboard Shows
- All registered routers and their rules
- Backend services and their health status
- Middleware chains applied to each router
- Real-time request metrics
6. Grafana Integration
Container Configuration
Source: infrastructure/deployments/develop/monitoring/docker-compose.yml
| Property | Value |
|---|---|
| Image | grafana/grafana:11.5.2 |
| Host Port | 39300 (internal :3000) |
| Admin User | admin |
| Admin Password | admin |
| Sign Up | Disabled |
| Default Dashboard | traefik-overview.json |
Prometheus Data Source
Source: infrastructure/deployments/develop/monitoring/config/grafana/provisioning/datasources/prometheus.yml
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://dev-nx-prometheus:9090
    isDefault: true
    editable: false

Dashboard Provisioning
Source: infrastructure/deployments/develop/monitoring/config/grafana/provisioning/dashboards/dashboards.yml
providers:
  - name: "BANA"
    orgId: 1
    folder: "BANA"
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false

A pre-built traefik-overview.json dashboard is provisioned automatically from infrastructure/deployments/develop/monitoring/config/grafana/dashboards/traefik-overview.json.
7. Prometheus Container Configuration
Source: infrastructure/deployments/develop/monitoring/docker-compose.yml
| Property | Value |
|---|---|
| Image | prom/prometheus:v3.2.1 |
| Host Port | 39090 (internal :9090) |
| Retention | 30 days (--storage.tsdb.retention.time=30d) |
| Lifecycle API | Enabled (--web.enable-lifecycle) |
| Config | /etc/prometheus/prometheus.yml |
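With the lifecycle API enabled, Prometheus reloads its configuration on an HTTP POST to /-/reload, so changes to prometheus.yml do not require a container restart. The sketch below builds that request against the host port from the table. The function name reload_request is hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request

PROMETHEUS_URL = "http://localhost:39090"  # host port from the table above


def reload_request(base_url: str = PROMETHEUS_URL) -> urllib.request.Request:
    """Build the POST request that asks Prometheus to reload its configuration."""
    # The endpoint is only served when Prometheus runs with --web.enable-lifecycle.
    return urllib.request.Request(f"{base_url}/-/reload", data=b"", method="POST")


request = reload_request()
```

Sending it with urllib.request.urlopen(request) should return HTTP 200 once the new configuration has been loaded. An invalid configuration is rejected and the previous one stays active.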
8. Recommended Alert Rules
The following Prometheus alert rules are recommended for production monitoring. They are not currently deployed; add them to a Prometheus alerting rules file when ready:
groups:
  - name: gateway
    rules:
      - alert: HighErrorRate
        expr: >
          sum(rate(traefik_service_requests_total{code=~"5.."}[5m]))
          / sum(rate(traefik_service_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High 5xx error rate on gateway"
      - alert: HighLatency
        expr: >
          histogram_quantile(0.99,
            rate(traefik_service_request_duration_seconds_bucket[5m])
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency exceeds 2 seconds"
      - alert: ServiceDown
        expr: traefik_service_server_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Backend service {{ $labels.service }} is down"

9. Related Pages
| Document | Description |
|---|---|
| Gateway Overview | Identity card + service catalog |
| Operations | Deploy, runbook, alert classes |
| Middlewares | Full middleware definitions from source |
| Resilience | Circuit breaker states, health checks, retry |
| Decisions | ADRs |