Building an MCP Server That Connects Claude to Your Firewall Management System

When Firewall Policy Review Became an MCP Problem

I needed Claude to cross-reference firewall policy hit counts against our change log without manually exporting CSVs every time. On our manufacturing network, that meant pulling FortiGate policy data from FortiOS 7.4.3, matching it against approved change tickets, and explaining which rules looked stale, noisy, or suspiciously untouched.

Before I built the MCP server, my weekly policy audit took 45 minutes of manual work. After I wired Claude to a read-only MCP server, the same audit ran in under 5 minutes. That was not theoretical automation. That was time back before the first production maintenance window of the day.

I was wrong at first.

My first version exposed a write tool before I had tested the read tools. Claude used it in an unexpected chain during a complex analysis query, and although the tool failed because my FortiGate API token lacked write permission, the design mistake was mine. I had trusted intent instead of enforcing capability boundaries, and in security engineering that is a bad trade.

The better architecture was boring by design: Claude could ask questions, retrieve structured firewall state, compare it to change records, and generate findings. It could not modify a policy, disable a rule, add an address object, or commit anything. I now believe read-only is the only sane starting point for AI-assisted firewall administration.

How I Split MCP Tools, Resources, and Prompts

In our environment, I treated MCP tools as live actions, resources as stable context, and prompts as repeatable analyst workflows. That distinction kept the server understandable for my team because every exposed capability had a plain operational purpose. A tool could fetch FortiGate policies. A resource could expose the current audit standard. A prompt could tell Claude how we classify unused rules.
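The split is easy to see with two plain functions. In the MCP Python SDK, functions like these would be registered with FastMCP's `@mcp.resource(...)` and `@mcp.prompt()` decorators. They are left undecorated here so the sketch runs without the SDK. The audit-standard values and the ticket pattern are hypothetical placeholders, not a real checklist.

```python
import json

# Hypothetical audit standard; real content would come from the team's
# approved firewall review checklist.
AUDIT_STANDARD = {
    "stale_after_days": 90,
    "ticket_pattern": r"CHG-\d+",
    "broad_services": ["ALL", "ALL_TCP", "ALL_UDP"],
}


def audit_standard_resource() -> str:
    """Stable context (a resource): the audit standard serialized as JSON."""
    return json.dumps(AUDIT_STANDARD, indent=2, sort_keys=True)


def unused_rule_prompt(vdom: str) -> str:
    """Repeatable workflow (a prompt): how to classify unused rules."""
    days = AUDIT_STANDARD["stale_after_days"]
    return (
        f"Audit VDOM '{vdom}'. Call list_firewall_policies and the hit-count tool. "
        f"Classify a policy as unused if it has zero hits or no traffic in {days} days. "
        "For each finding, cite the policy ID and the tool result that supports it. "
        "Do not recommend a change without a matching change ticket."
    )


if __name__ == "__main__":
    print(audit_standard_resource())
    print(unused_rule_prompt("root"))
```

Keeping the threshold in the resource and referencing it from the prompt means the classification rule changes in one place.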

I ran the first production-style test on Ubuntu 22.04 with Python 3.11 and the MCP Python SDK, then connected Claude Desktop to the server over stdio. The server exposed six read-only tools: list policies, get policy details, get hit counts, list address objects, list service objects, and fetch change-log entries from our internal export database.

Small scope wins.

The important design choice was to make Claude compose observations rather than compose firewall operations. If Claude needed to know whether a policy was risky, it had to call read tools, compare fields, and explain the reasoning. It never received an imperative tool like update_policy or disable_rule. My team could inspect the transcript and see exactly which facts drove the answer.

  • Tools returned JSON with policy ID, source, destination, service, action, schedule, comments, and hit counters.
  • Resources held our naming rules, zone map, and approved firewall review checklist.
  • Prompts defined repeatable audits for stale rules, broad services, and missing ticket references.
  • API credentials used a FortiGate read-only admin profile with no configuration permissions.
  • Logs captured tool name, arguments, response size, latency, and Claude session ID.
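The logging bullet can be sketched as a decorator around each async tool. This is an illustration, not production code: the logger name, the `SESSION_ID` context variable (assumed to be set by whatever handles the client session), and the stand-in `list_address_objects` body are all hypothetical. With FastMCP, the decorator would sit beneath `@mcp.tool()`. `functools.wraps` keeps the original signature visible to `inspect.signature`, but it is worth confirming the generated tool schema.

```python
import asyncio
import contextvars
import functools
import inspect
import json
import logging
import time

log = logging.getLogger("mcp.audit")
# Hypothetical: the code that accepts a client session sets this value.
SESSION_ID = contextvars.ContextVar("session_id", default="unknown")


def audited(tool_fn):
    """Log tool name, arguments, response size, latency, and session ID per call."""

    @functools.wraps(tool_fn)
    async def wrapper(*args, **kwargs):
        bound = inspect.signature(tool_fn).bind(*args, **kwargs)
        bound.apply_defaults()  # log effective values, including defaults
        record = {
            "tool": tool_fn.__name__,
            "arguments": dict(bound.arguments),
            "session_id": SESSION_ID.get(),
        }
        start = time.perf_counter()
        try:
            result = await tool_fn(*args, **kwargs)
            record["response_bytes"] = len(json.dumps(result, default=str))
            return result
        except Exception as exc:
            # Failed calls are logged too; they are often the interesting ones.
            record["error"] = type(exc).__name__
            raise
        finally:
            record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
            log.info(json.dumps(record, default=str))

    return wrapper


@audited
async def list_address_objects(vdom: str = "root") -> list[dict]:
    # Stand-in for a real FortiGate read.
    return [{"name": "plc-subnet", "subnet": "10.20.0.0/24"}]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    SESSION_ID.set("demo-session")
    asyncio.run(list_address_objects(vdom="root"))
```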

I prefer this split because it mirrors how a real security engineer works: gather facts, compare against policy, then make a recommendation.

Implementing FortiGate Reads in Python

The Python layer was thinner than I expected. Most of the work was not connecting to the FortiGate REST API; it was normalizing the firewall response into data Claude could reason about without guessing. FortiOS 7.4.3 returns plenty of nested fields, and raw API output carries device-specific quirks that make analysis noisier than needed.

What I didn’t expect was how much better Claude behaved when each tool returned fewer fields with stronger names. Instead of dumping every policy attribute, I returned the fields my team actually uses during audits. That made the model less likely to chase irrelevant metadata and more likely to ask for a second tool call when it needed detail.

from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("fortigate-readonly")

FORTIGATE_BASE_URL = "https://fw-mfg-core01.example.local"
# Placeholder: load the read-only token from your secrets vault at startup; never hardcode it.
API_TOKEN = "read_only_token_from_vault"

async def fortigate_get(path: str, params: dict | None = None) -> dict:
    """Issue a GET against the FortiOS REST API. This helper never sends writes."""
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    # Verify TLS against the firewall's CA bundle instead of disabling verification.
    async with httpx.AsyncClient(verify="/etc/ssl/certs/fw-ca.pem", timeout=10) as client:
        response = await client.get(f"{FORTIGATE_BASE_URL}{path}", headers=headers, params=params)
        response.raise_for_status()
        return response.json()

@mcp.tool()
async def list_firewall_policies(vdom: str = "root") -> list[dict]:
    """List firewall policies in a VDOM, returning only audit-relevant fields."""
    data = await fortigate_get("/api/v2/cmdb/firewall/policy", {"vdom": vdom})
    policies = []
    for item in data.get("results", []):
        # FortiOS returns object references as [{"name": ...}]; flatten them to name lists.
        policies.append({
            "policy_id": item.get("policyid"),
            "name": item.get("name"),
            "source": [x.get("name") for x in item.get("srcaddr", [])],
            "destination": [x.get("name") for x in item.get("dstaddr", [])],
            "service": [x.get("name") for x in item.get("service", [])],
            "action": item.get("action"),
            "status": item.get("status"),
            "comments": item.get("comments", "")
        })
    return policies

if __name__ == "__main__":
    # FastMCP uses the stdio transport by default, which Claude Desktop launches directly.
    mcp.run()

The code stayed simple because the permissions did the heavy lifting. The API account could read configuration and monitor counters, but FortiGate would reject writes even if my MCP code drifted later. I still do not treat application logic as a substitute for infrastructure-level guardrails.
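The same normalization approach applies to hit counts, which come from the monitor API rather than the configuration API. A minimal sketch, assuming each row returned by `/api/v2/monitor/firewall/policy` includes `policyid`, `hit_count`, and `last_used` as epoch seconds. Field names can vary between firmware versions, so verify them against real output first.

```python
from datetime import datetime, timezone


def normalize_hit_counts(monitor_results: list[dict]) -> list[dict]:
    """Reduce policy-monitor rows to the counter fields an audit actually uses.

    Assumes each row carries ``policyid``, ``hit_count``, and ``last_used``
    (epoch seconds); confirm the field names against your firmware's output.
    """
    rows = []
    for item in monitor_results:
        last_used = item.get("last_used")
        rows.append({
            "policy_id": item.get("policyid"),
            "hit_count": int(item.get("hit_count") or 0),
            # Treat a missing or zero timestamp as "never matched traffic".
            "last_used_utc": (
                datetime.fromtimestamp(last_used, tz=timezone.utc).isoformat()
                if last_used
                else None
            ),
        })
    return rows


if __name__ == "__main__":
    sample = [
        {"policyid": 7, "hit_count": 0},
        {"policyid": 12, "hit_count": 4812, "last_used": 1700000000, "bytes": 90210},
    ]
    for row in normalize_hit_counts(sample):
        print(row)
```

Converting timestamps to ISO 8601 in UTC spares Claude from interpreting raw epoch integers when it compares rule activity with change-ticket dates.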

Why My Tool Schemas Got Stricter

After the write-tool scare, I tightened every schema. I validated VDOM names against an allow-list. I forced policy IDs to integers. I rejected broad text search unless the query length was between 3 and 80 characters. I also blocked any parameter that looked like a path, URL, shell fragment, or raw FortiGate API endpoint.
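Those checks translate into small, explicit validators. This is a sketch under stated assumptions: the VDOM names are hypothetical, and the suspicious-input pattern is one conservative choice that will also reject legitimate CIDR searches such as `10.0.0.0/8`.

```python
import re

ALLOWED_VDOMS = ("root", "mfg-ot")  # hypothetical allow-list
# Characters and sequences that suggest a path, URL, shell fragment, or raw
# API endpoint. Deliberately strict: it also rejects CIDR-style "/".
SUSPICIOUS = re.compile(r"[/\\;&|`$<>]|\.\.")


def validate_vdom(vdom: str) -> str:
    """Allow-list check; an empty string fails here instead of reaching the API."""
    if vdom not in ALLOWED_VDOMS:
        raise ValueError(f"VDOM {vdom!r} is not in the allow-list")
    return vdom


def validate_policy_id(value: object) -> int:
    """Force policy IDs to non-negative integers; accept plain digit strings."""
    if isinstance(value, bool):  # bool is a subclass of int
        raise ValueError("policy_id must be an integer")
    if isinstance(value, int):
        pid = value
    elif isinstance(value, str) and value.isascii() and value.isdigit():
        pid = int(value)
    else:
        raise ValueError("policy_id must be an integer")
    if pid < 0:
        raise ValueError("policy_id must be non-negative")
    return pid


def validate_search(query: str) -> str:
    """Bound free-text search to 3-80 characters and reject suspicious input."""
    query = query.strip()
    if not 3 <= len(query) <= 80:
        raise ValueError("search query must be between 3 and 80 characters")
    if SUSPICIOUS.search(query):
        raise ValueError("search query looks like a path, URL, or shell fragment")
    return query
```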

This felt excessive until Claude made a perfectly reasonable analysis request that would have passed an empty VDOM string into a lower layer. The API would have fallen back to a default my team did not expect. That is exactly the kind of quiet failure that creates bad audit evidence.

Validation is design.

For MCP servers in security environments, schemas are not just developer ergonomics. They are part of the control surface. I want Claude to have enough freedom to ask useful questions, but I do not want it inventing operational scope. A strict tool contract makes the model more useful because it forces ambiguity to appear early.

I also added structured error messages. If Claude requested a disabled VDOM, the tool returned a short explanation and the list of allowed VDOMs. If it requested too many policy details at once, the tool told it to batch by policy ID. Those errors improved the analysis flow without giving Claude broader access, which is the right kind of convenience.

Testing With Claude Desktop Before Production

I tested the server locally with Claude Desktop before I let anyone on my team use it against our shared firewall inventory. My desktop config pointed to the Python 3.11 virtual environment, the MCP server process, and a staging token that could only read from a lab FortiGate running FortiOS 7.4.3.
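Claude Desktop reads MCP servers from a JSON file (`claude_desktop_config.json`) whose `mcpServers` map gives each server a `command`, its `args`, and an optional `env`. To keep the examples in Python, this sketch builds that structure. Every path and the `FORTIGATE_TARGET` variable are hypothetical.

```python
import json


def desktop_server_entry(python_bin: str, server_script: str, env: dict[str, str]) -> dict:
    """Build an mcpServers entry that launches the server over stdio."""
    return {
        "mcpServers": {
            "fortigate-readonly": {
                "command": python_bin,  # the venv's interpreter, not the system one
                "args": [server_script],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    entry = desktop_server_entry(
        "/home/analyst/venvs/mcp311/bin/python",  # hypothetical venv path
        "/home/analyst/fortigate_mcp/server.py",  # hypothetical script path
        {"FORTIGATE_TARGET": "lab"},  # hypothetical: selects the staging firewall
    )
    print(json.dumps(entry, indent=2))
```

Pointing `command` at the virtual environment's interpreter is what ties the server to Python 3.11 and the installed SDK rather than whatever Python is first on the system path.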

The first test set was intentionally repetitive. I asked Claude to list policies with zero hits, find rules missing ticket IDs, compare policy comments against our change log, and explain whether any broad outbound rules had recent traffic. Then I checked the server logs line by line to confirm every answer came from expected tool calls.
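Part of that log review can be mechanized. A minimal sketch, assuming the audit log holds one JSON object per line with `tool` and `session_id` keys (a hypothetical format): for each scripted question, compare the tools the session actually called with the tools the answer should have needed.

```python
import json


def check_session(log_lines: list[str], session_id: str, expected_tools: set[str]) -> dict:
    """Compare the tools a session called with the tools a test expected."""
    called = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # a malformed line raises: better than silently skipping
        if record.get("session_id") == session_id:
            called.append(record.get("tool"))
    return {
        "called": called,
        "unexpected": sorted(set(called) - expected_tools),
        "missing": sorted(expected_tools - set(called)),
    }
```

An empty `unexpected` list does not prove the answer is right, but a non-empty one is an immediate reason to read that transcript closely.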

Trust the transcript.

Only after that did I connect it to our production read-only account. Even then, I kept the response size capped, logged every request, and reviewed the first week of sessions with another engineer. Claude was good at correlating stale policy names with change records, but it still needed firm boundaries around what counted as evidence.
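A response cap works best when it truncates at item boundaries and says so. This sketch assumes a hypothetical 64,000-byte budget measured on compact JSON. The `truncated` flag and counts let Claude ask for the next batch instead of reasoning over a silently shortened list.

```python
import json

MAX_RESPONSE_BYTES = 64_000  # hypothetical cap


def cap_response(items: list[dict], max_bytes: int = MAX_RESPONSE_BYTES) -> dict:
    """Keep whole items while the compact JSON list stays within max_bytes."""
    kept: list[dict] = []
    size = 2  # the enclosing "[" and "]"
    for item in items:
        item_size = len(json.dumps(item, separators=(",", ":")).encode("utf-8"))
        if kept:
            item_size += 1  # the "," separating it from the previous item
        if size + item_size > max_bytes:
            break
        kept.append(item)
        size += item_size
    return {
        "items": kept,
        "truncated": len(kept) < len(items),
        "returned": len(kept),
        "total": len(items),
    }
```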

My opinion is simple: if an MCP server touches security infrastructure, production testing starts after transcript review, not after the first successful demo.

Extend the Pattern Across Security Operations

Once the firewall MCP server worked, the next obvious targets were our SIEM and vulnerability scanner. I used the same rule: start read-only, expose narrow tools, normalize the output, and make Claude cite the tool results that supported its recommendation. For our SIEM, that meant event-count lookups and saved-search status. For vulnerability management, that meant asset exposure, scanner age, and exception records.
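One way to keep the cite-your-evidence rule consistent across the firewall, SIEM, and scanner servers is a shared result envelope. This is a hypothetical sketch: the field names and the 12-character evidence ID are illustrations, not a standard. The ID is derived only from the source, tool, query, and data, so identical reads produce the same citation regardless of when they ran.

```python
import hashlib
import json
from datetime import datetime, timezone


def with_provenance(source: str, tool: str, query: dict, data: object) -> dict:
    """Wrap a read result so every finding can cite where its facts came from."""
    # Hash the content (not the timestamp) so repeated identical reads match.
    fingerprint = json.dumps(
        {"source": source, "tool": tool, "query": query, "data": data},
        sort_keys=True,
        default=str,
    )
    return {
        "evidence_id": hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:12],
        "source": source,  # e.g. "fortigate", "siem", "vuln-scanner"
        "tool": tool,
        "query": query,
        "retrieved_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "data": data,
    }
```

A prompt can then require every recommendation to list the `evidence_id` values it relied on, which makes transcript review a lookup rather than a reconstruction.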

I would not connect Claude directly to remediation workflows yet. I want analyst review between finding and action, especially in a manufacturing facility where a firewall change can break a line controller, a label printer, or a vendor tunnel that nobody remembers until production stops. AI can shorten the investigative path, but it should not own the blast radius.

That boundary matters.

The best MCP servers I have built feel like disciplined junior analysts with perfect access to approved read models and no ability to improvise outside them. They are fast, consistent, and useful because I removed the dangerous parts first. In my view, that is not a limitation of the approach. That is the reason it belongs in security operations.

The safest first MCP server is the one that helps Claude see clearly without letting it touch the controls.
