When I Stopped Uploading Grafana Screenshots
I wanted Claude to query our Grafana dashboards during incident analysis without me having to export PNGs and upload them manually. The incident that pushed me over the edge was a pressure-line controller outage on our packaging floor, where three HMIs were throwing stale telemetry, FortiGate logs on FortiOS 7.4.3 showed nothing obvious, and our Ubuntu 22.04 jump box had six browser tabs open while production supervisors waited for an answer.
My first MCP tool schema used generic parameter names like query and target. Claude guessed the wrong meaning for both during a live incident. I meant target as a Grafana dashboard UID, while Claude treated it as a host target and pulled metrics for the wrong VLAN segment. That was my mistake, and it made the response look confident while being operationally useless.
Names matter.
After that, I rebuilt the schema with ugly but precise names: grafana_dashboard_uid, promql_expression, relative_time_range_minutes, and max_datapoints. The descriptions spelled out exactly what each parameter represented and included one good example plus one boundary condition. I do not treat MCP schemas as API decoration anymore; I treat them as the control surface between language and infrastructure.
In our environment, the difference became measurable. Incident diagnosis time with Claude-plus-MCP dropped from 18 minutes to 6 minutes for known failure patterns, mostly because I stopped translating screenshots into words and started letting the model ask narrowly scoped questions against live telemetry. My opinion is simple: vague schemas are worse than no automation because they create wrong answers at machine speed.
Design Tool Schemas Like Runbooks With Guardrails
I write MCP tool definitions the same way I write firewall change notes for my team: explicit nouns, constrained inputs, and no room for heroic interpretation. Claude is good at choosing between well-described actions, but I have watched it make plausible assumptions when I gave it parameters that sounded interchangeable.
- I use parameter names that include the source system, such as
grafana_folder_uid. - I describe units directly, such as minutes, seconds, bytes, or percent.
- I prefer enums when the environment has fixed choices, such as
prod,qa, andlab. - I include operational warnings in descriptions when a query can be expensive.
- I return structured JSON with timestamps, labels, values, and the dashboard URL.
Ambiguity burns time.
For our manufacturing network, I also avoid letting a tool accept free-form infrastructure names unless the backend validates them against our inventory. A model that invents a switch name is annoying in a chat window, but a model that asks Grafana for a non-existent plant segment during a containment call wastes attention I cannot spare.
I currently build these servers with Python 3.11 and the MCP Python SDK because our security tooling already has Python wrappers for Grafana, FortiManager, NetBox, and internal certificate checks. I considered Node.js 22 for the first version, but our on-call engineers were already more comfortable debugging Python virtual environments on Ubuntu 22.04. My opinion: choose the runtime the operations team can repair at 2:00 a.m., not the runtime that looks cleaner in a demo.
Wire Grafana Authentication Without Teaching Claude Secrets
The Grafana tool runs under a service account with read-only access to a small set of dashboards, not my personal token and not a global admin token. I store the token in an environment file owned by root, loaded by systemd, and passed into the MCP process as GRAFANA_TOKEN. Claude never sees the token, and the tool never echoes request headers back into the result.
from mcp.server.fastmcp import FastMCP
import os
import httpx
mcp = FastMCP("plant-grafana")
@mcp.tool()
async def query_grafana_panel(
grafana_dashboard_uid: str,
promql_expression: str,
relative_time_range_minutes: int,
max_datapoints: int = 300
) -> dict:
"""
Query a production Grafana dashboard panel using Prometheus data.
grafana_dashboard_uid must be a real dashboard UID from approved plant dashboards.
relative_time_range_minutes is the lookback window in minutes.
"""
headers = {"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"}
payload = {
"queries": [{"expr": promql_expression, "refId": "A"}],
"from": f"now-{relative_time_range_minutes}m",
"to": "now",
"maxDataPoints": max_datapoints,
}
async with httpx.AsyncClient(timeout=10.0) as client:
response = await client.post(
f"{os.environ['GRAFANA_URL']}/api/ds/query",
headers=headers,
json=payload,
)
response.raise_for_status()
return response.json()
Keep secrets boring.
What I didn’t expect was how much the return shape mattered. When I returned raw Grafana responses, Claude spent tokens explaining Grafana internals. When I returned a trimmed object with series_name, latest_value, min_value, max_value, threshold_hint, and dashboard_url, it started giving answers my team could act on during triage.
I also log every tool call with timestamp, user workstation, dashboard UID, and PromQL expression, but I do not log the model’s full reasoning or any secrets. In regulated manufacturing environments, observability for automation is not optional. My opinion: an MCP server that cannot be audited does not belong near production telemetry.
Separate Resources From Actions Before Things Get Messy
I use MCP resources for relatively stable context and tools for live actions. Our approved dashboard catalog, VLAN-to-line mappings, maintenance windows, and escalation contacts fit better as resources because Claude can read them without executing a query. Live Grafana reads, FortiGate session lookups, and certificate expiration checks fit better as tools because they perform current-state work.
This split kept our implementation cleaner than I expected. A resource called plant://dashboards/packaging gives Claude the approved dashboard UIDs and labels. The tool then requires one of those UIDs before it queries Grafana. That small pattern cut down accidental exploration and made the model ask for data from places my team had already blessed.
Boundaries are design.
I also learned to keep resources short. A giant resource that dumps every dashboard, every panel, and every PromQL expression turns Claude into a search engine with a memory problem. I would rather expose five focused resources than one enormous blob that looks convenient and behaves unpredictably.
For our environment, resources are documentation with structure, while tools are operational verbs with controls. I think teams get into trouble when they blur that line because the demo works either way. Production does not care about the demo; production cares about repeatable behavior under pressure.
You may also find this useful: Check out our guide on Python Network Config Backup: Automating Multi-Vendor Device Snapshots for more practical tips.
Test Locally Before Claude Desktop Touches It
I test every MCP server locally before I connect it to Claude Desktop. I run schema inspection, a mocked Grafana response, a real read-only query, and a failure test with a revoked token. That last one matters because a tool that fails with a vague stack trace teaches Claude nothing useful and leaves the engineer reading logs during an incident.
On my Ubuntu 22.04 workstation, I keep a tiny harness that calls the tool with known parameters and compares the result shape against saved JSON. I am not trying to unit-test Grafana; I am trying to prove that my contract to Claude is stable. Python 3.11, pytest, and a few recorded responses have been enough.
Bad errors spread.
I make failures explicit: dashboard_not_allowed, grafana_timeout, invalid_time_range, and empty_series. Claude handles those much better than a generic exception string. During one compressor alarm review, an empty_series response led it to check whether the sensor had stopped reporting instead of claiming the metric was normal.
My preference is to test MCP tools like operator interfaces, not backend libraries. If the model can choose the right tool, pass valid parameters, receive a clean error, and explain the operational meaning, the tool is ready for a controlled rollout. Anything less is still a lab experiment.
Run MCP Servers Like Real Production Services
I run our MCP servers as systemd services with locked-down users, pinned dependencies, and explicit restart behavior. The Grafana server uses stdio transport for Claude Desktop, while shared team utilities run behind controlled wrappers on hardened Ubuntu 22.04 hosts. I pin package versions, including the MCP SDK version, because surprise dependency changes have no place in incident response.
The service file is intentionally plain: a dedicated user, a virtual environment path, an environment file, Restart=on-failure, and journald logging. I also set resource limits so a bad query cannot chew through CPU on a jump host. That is not glamorous work, but it is the work that keeps automation from becoming another fragile dependency.
Ship the boring parts.
We review tool schemas in the same change process as firewall objects and monitoring rules. My team checks whether names are specific, descriptions are operational, allowed values are constrained, and outputs are small enough for a model to reason over. The review takes minutes, and it catches the exact class of mistakes that hurt me during that first Grafana incident.
I am convinced MCP becomes useful for ops only when I treat it as an interface contract, not a clever chat extension. Claude can help me move faster, but only after I give it tools that describe our environment with the same discipline we expect from our network diagrams, firewall policies, and runbooks.
Further Reading: For more in-depth information, refer to the official Fortinet Documentation.

