Import the ready-made Grafana dashboard
An importable Grafana dashboard ships in the repo at docs/grafana/declaragent-fleet-dashboard.json. It aggregates every Prometheus counter + gauge + histogram the runtime exports through 0.7.6 into three rows — MCP health, audit + SIEM, rate limits + dispatch — and lets you filter by server_id / agent / source from the top of the page.
This recipe is the five-minute path from "zero dashboards" to "full fleet at a glance".
Prerequisites
- Grafana ≥ 10.
- A Prometheus data source already wired in Grafana.
- Prometheus scraping at least one Declaragent host's /metrics endpoint.
Step 1 — Expose /metrics
declaragent up -d binds /metrics on 127.0.0.1:9464 by default. Override with DECLARAGENT_METRICS_PORT (set to 0 to disable).
To scrape from a remote Prometheus, bind to 0.0.0.0 and require auth (agent.yaml#controlPlane.bind + an OIDC/OAuth2 block — see /reference/control-plane). Don't expose :9464 to the internet without auth.
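A minimal sketch of what that looks like in agent.yaml — only controlPlane.bind is confirmed by this page; the auth block's field names are assumptions, so check /reference/control-plane for the real schema:

```yaml
# agent.yaml — sketch; field names under auth are illustrative
controlPlane:
  bind: 0.0.0.0:9464          # expose /metrics beyond loopback
  auth:                        # hypothetical OIDC block — verify against the reference
    oidc:
      issuer: https://idp.example.com
      audience: declaragent-metrics
```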
Step 2 — Prometheus scrape config
```yaml
# prometheus.yml
scrape_configs:
  - job_name: declaragent
    static_configs:
      - targets:
          - declaragent-host-1:9464
          - declaragent-host-2:9464
          - declaragent-host-3:9464
    metrics_path: /metrics
    scrape_interval: 15s
```
On Kubernetes, the rendered chart emits a ServiceMonitor by default (gate with declaragent fleet render --no-servicemonitor if you don't run the Prometheus Operator) — no static config needed.
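For orientation, the emitted object is shaped roughly like the sketch below — the name, labels, and port are illustrative, not the chart's actual output; inspect the rendered manifests for the real values:

```yaml
# Sketch of a ServiceMonitor like the one the chart renders (names/labels assumed)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: declaragent
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: declaragent
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
```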
OTel Collector users: point the scrape at the collector's Prometheus exporter port (:9464 by convention), not every CLI host.
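A minimal Collector pipeline sketch for that setup, assuming metrics arrive over OTLP and are re-exported via the stock prometheus exporter:

```yaml
# otel-collector config sketch — re-exposes received metrics on :9464
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  prometheus:
    endpoint: 0.0.0.0:9464
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```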
Step 3 — Import
Grafana UI
- Dashboards → Import → Upload JSON file.
- Pick docs/grafana/declaragent-fleet-dashboard.json.
- Select your Prometheus data source when prompted for DS_PROMETHEUS.
- The dashboard lands under the declaragent tag.
Grafana HTTP API
```shell
curl -X POST http://admin:admin@grafana:3000/api/dashboards/import \
  -H 'Content-Type: application/json' \
  -d "$(jq '{
    dashboard: .,
    overwrite: true,
    inputs: [{"name":"DS_PROMETHEUS","type":"datasource","pluginId":"prometheus","value":"Prometheus"}]
  }' docs/grafana/declaragent-fleet-dashboard.json)"
```

Note the endpoint: /api/dashboards/import resolves the DS_PROMETHEUS input; the plain /api/dashboards/db endpoint ignores inputs.
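If jq isn't handy, the same payload can be assembled in Python; a sketch with a stand-in dashboard dict (load the real file in practice) and the POST left commented out:

```python
import json

# Stand-in for json.load(open("docs/grafana/declaragent-fleet-dashboard.json"))
dashboard = {"title": "Declaragent fleet", "tags": ["declaragent"]}

# Same shape the jq pipeline above produces for the import endpoint.
payload = {
    "dashboard": dashboard,
    "overwrite": True,
    "inputs": [{
        "name": "DS_PROMETHEUS",
        "type": "datasource",
        "pluginId": "prometheus",
        "value": "Prometheus",
    }],
}

body = json.dumps(payload)
# POST it with your HTTP client of choice, e.g.:
# requests.post("http://grafana:3000/api/dashboards/import",
#               headers={"Content-Type": "application/json"}, data=body)
print(sorted(payload))  # → ['dashboard', 'inputs', 'overwrite']
```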
grafana-operator ConfigMap (GitOps)
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: declaragent-fleet-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  declaragent.json: |-
    # contents of docs/grafana/declaragent-fleet-dashboard.json
```
Commit this alongside the rendered chart and ArgoCD / Flux reconciles both.
What the three rows show
| Row | Signal | Panels |
|---|---|---|
| 1 · MCP health | Supervisor restarts, drain duration, circuit state, rate-limit rejects | mcp_server_restarts_total, mcp_server_circuit_state, mcp_server_circuit_open_total, mcp_server_drain_duration_ms, mcp_server_rate_limited_total |
| 2 · Audit + SIEM | Back-pressure active/paused, adaptive batch interval, export throughput, chain lag | declaragent_audit_backpressure_{active,paused_total,backlog_ms}, declaragent_audit_batch_{interval_ms,rows}, declaragent_audit_export_{acked_total,failures_total,last_seq} |
| 3 · Rate limits + dispatch | Provider + per-tool waits, source throughput, DLQ depth | declaragent_provider_rate_limit_{waits,wait_ms}, declaragent_tool_rate_limit_waits_total, source_messages_{received,processed,dlq}, source_inflight |
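To eyeball the same signals in Explore before importing, queries in the spirit of the panels look like the sketch below — the exact panel expressions live in the dashboard JSON, and the source grouping label is an assumption:

```promql
# Restart rate per MCP server (row 1)
sum by (server_id) (rate(mcp_server_restarts_total[5m]))

# Servers whose circuit is currently open (row 1)
mcp_server_circuit_state > 0

# DLQ growth per source (row 3) — "source" label assumed
sum by (source) (rate(source_messages_dlq_total[10m]))
```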
Recommended alerts
Load these as Prometheus alerting rules and route them through your Alertmanager stack. Thresholds are starting points; tune them to your SLO.
```yaml
groups:
  - name: declaragent
    rules:
      - alert: DeclaragentSiemBackpressure
        expr: rate(declaragent_audit_backpressure_paused_total[5m]) > 0
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "SIEM export falling behind on {{ $labels.instance }}"
          runbook: "https://docs.declaragent.dev/cookbook/siem-audit-export"
      - alert: DeclaragentMcpCircuitOpen
        expr: sum by (server_id) (mcp_server_circuit_state) > 0
        for: 2m
        labels: { severity: critical }
      - alert: DeclaragentDispatchDlqGrowing
        expr: rate(source_messages_dlq_total{kind="dispatch"}[10m]) > 0
        for: 10m
        labels: { severity: warning }
      - alert: DeclaragentToolRateLimitChoked
        expr: rate(declaragent_tool_rate_limit_waits_total[5m]) > 10
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Tool {{ $labels.tool }} repeatedly rate-limited"
```
Distributed tracing
The same counters flow through OTel when OTEL_EXPORTER_OTLP_ENDPOINT is set — see Trace a request in Grafana for the end-to-end span recipe that pairs with this dashboard.