AI Agents That Control Your Network: What You Should and Should Not Automate

Putting AI Agents Near the Control Plane

My team debated for two weeks whether to let our AI agent push BGP route changes autonomously. We settled on a human-approval gate for any routing change. The debate started after a night-shift incident in which a supplier VPN dropped, FortiGate logs on FortiOS 7.4.3 showed asymmetric return traffic, and Claude correctly identified that a route-map adjustment would probably restore the path faster than our runbook would.

I wanted speed. Our network engineers wanted control. Our plant operations team wanted one thing only: no automation experiment taking down a production cell at 2:00 a.m. We already had a Python 3.11 service running on Ubuntu 22.04 that collected firewall, switch, DHCP, DNS, and NetFlow context, so the question was not whether the agent could reason over the environment. It could. The question was where I trusted it to touch the control plane.

I was wrong about ACLs first.

An early prototype that could modify ACLs autonomously made a logically sound change from incorrect input: it removed a deny rule that should have stayed, because the log data had a timestamp bug. The agent saw what looked like stale traffic, matched it against a cleanup rule, and generated a technically valid change. The flaw was ours. We fed it bad evidence and gave it too much authority.

My opinion now is blunt: an AI agent belongs close to the control plane for visibility, but not inside the control plane for irreversible decisions.

Sorting Actions by Reversibility and Blast Radius

We stopped talking about “AI network automation” as one bucket and started classifying actions by reversibility and blast radius. A DHCP lease lookup is not a firewall policy deletion. A switchport description update is not a BGP prepend. A FortiOS 7.4.3 address object rename might be annoying to reverse, while a route leak can punish every production VLAN attached to the wrong upstream.

Our useful categories became simple:

  • Read-only actions: inventory, config collection, log correlation, route table inspection, and packet counter review.
  • Reversible low-risk actions: temporary monitoring rules, ticket updates, interface descriptions, and non-routing metadata.
  • Reversible operational actions: disabling a quarantine rule, expiring a temporary object, or rolling back a known prior change.
  • Human-gated actions: firewall policy edits, NAT changes, routing changes, VPN phase changes, and production VLAN moves.
  • Blocked actions: credential rotation without break-glass review, destructive deletes, and any change lacking a rollback path.

That list ended a lot of vague arguments. We automated 15 production tasks and kept 8 tasks human-gated, with exact gating criteria tied to change type, affected site, maintenance window, and rollback test status. Before this model, our average low-risk investigation handoff took 42 minutes. After it, the same class of incident took 11 minutes because the agent gathered evidence before a human opened the ticket.
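One way to picture the classification is as a small lookup that fails toward the human gate. The sketch below is illustrative only: the change-type names, the `Action` fields, and the `classify` function are assumptions made up for this example, not the service's real schema, and it ignores site, maintenance window, and rollback test status.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE_LOW_RISK = "reversible_low_risk"
    REVERSIBLE_OPERATIONAL = "reversible_operational"
    HUMAN_GATED = "human_gated"
    BLOCKED = "blocked"


# Hypothetical change-type names, grouped to mirror the categories above.
READ_TYPES = {"inventory", "config_collection", "log_correlation",
              "route_inspection", "counter_review"}
LOW_RISK_TYPES = {"monitoring_rule", "ticket_update",
                  "interface_description", "metadata"}
OPERATIONAL_TYPES = {"quarantine_disable", "temp_object_expiry",
                     "rollback_prior_change"}


@dataclass(frozen=True)
class Action:
    change_type: str
    has_rollback: bool = True
    destructive: bool = False
    break_glass_reviewed: bool = False


def classify(action: Action) -> Tier:
    # Reads change nothing, so they never need a rollback path.
    if action.change_type in READ_TYPES:
        return Tier.READ_ONLY
    # Blocked conditions win over every other write category.
    if action.destructive or not action.has_rollback:
        return Tier.BLOCKED
    if action.change_type == "credential_rotation" and not action.break_glass_reviewed:
        return Tier.BLOCKED
    if action.change_type in LOW_RISK_TYPES:
        return Tier.REVERSIBLE_LOW_RISK
    if action.change_type in OPERATIONAL_TYPES:
        return Tier.REVERSIBLE_OPERATIONAL
    # Firewall, NAT, routing, VPN, VLAN moves, and anything unrecognized
    # default to the human gate rather than to automation.
    return Tier.HUMAN_GATED
```

The design choice worth noting is the last line: an unknown change type lands in the human-gated tier, so adding a new capability to the agent never silently widens what it can execute.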

Speed is useful only when the boundary is boring.

What I did not expect was how much the classification helped operators trust the agent. Nobody had to guess whether Claude would “just fix it.” The agent read everything, recommended changes, and executed only actions we had proven reversible in our own environment. That made the automation feel less like a junior admin with root access and more like a tireless analyst with a short leash.

My opinion is that reversibility beats intelligence as the first design requirement for network agents.

Making Human Gates Hard to Rubber-Stamp

Human approval can become theater. I have seen change windows where a tired engineer clicks approve because the tool output looks confident and the alert channel is noisy. We avoided that by making the approval packet small, specific, and slightly annoying in the right places. Every gated change includes observed evidence, proposed commands, expected impact, rollback commands, affected assets, and the reason the agent could not self-execute.

The approval screen also forces the reviewer to choose one risk label and type a short reason. That sounds bureaucratic, but it changes behavior. When I have to write “approving BGP local-pref change for supplier WAN failover based on confirmed route loss,” I slow down enough to check the route table twice.
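The approval packet and the reviewer's decision can both be treated as plain data and validated before anything reaches the executor. This is a minimal sketch under stated assumptions: the field names, the `RISK_LABELS` set, and the `MIN_REASON_WORDS` threshold are hypothetical, chosen only to illustrate the idea of rejecting an empty or one-word approval.

```python
from dataclasses import dataclass

RISK_LABELS = {"low", "medium", "high"}  # assumed label set
MIN_REASON_WORDS = 5                     # assumed threshold for a real reason


@dataclass(frozen=True)
class ApprovalPacket:
    observed_evidence: list[str]
    proposed_commands: list[str]
    expected_impact: str
    rollback_commands: list[str]
    affected_assets: list[str]
    self_execute_blocked_because: str


@dataclass(frozen=True)
class ReviewerDecision:
    reviewer: str
    risk_label: str
    reason: str


def validate_decision(packet: ApprovalPacket, decision: ReviewerDecision) -> list[str]:
    """Return a list of problems; an empty list means the approval is usable."""
    problems = []
    if not packet.observed_evidence:
        problems.append("packet has no observed evidence")
    if not packet.rollback_commands:
        problems.append("packet has no rollback commands")
    if decision.risk_label not in RISK_LABELS:
        problems.append("reviewer did not choose a valid risk label")
    if len(decision.reason.split()) < MIN_REASON_WORDS:
        problems.append("reviewer reason is too short")
    return problems
```

Returning a list of problems instead of a bare boolean lets the approval screen tell the reviewer exactly what is missing.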

Friction has a job.

For our Python 3.11 automation service, the gate is just data. The agent produces a signed change request, not a CLI session. A separate executor validates the schema, checks policy, verifies approval, and runs the command only if the request still matches live state. We built the state check because network conditions move fast, and an approval based on ten-minute-old data is not approval anymore.

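def can_execute(change):
    """Decide whether the executor may run a validated change request."""
    # current_state_hash() is the service helper that fingerprints live
    # device state (not shown). Stale evidence voids any approval.
    if change["live_state_hash"] != current_state_hash(change["targets"]):
        return False

    # Gated change types need human approval and a verified rollback.
    if change["risk"] in {"routing", "firewall_policy", "vpn_phase"}:
        return bool(change["approved_by_human"] and change["rollback_verified"])

    # Everything else self-executes only when reversible with low blast radius.
    return bool(change["reversible"] and change["blast_radius"] == "low")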
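The function above relies on two things it does not show: a signature on the request and a hash of live state. The article does not say how either is implemented, so the sketch below is an assumption throughout. It uses HMAC-SHA256 over canonical JSON for the signature and SHA-256 over the same canonical form for the state fingerprint. The function names and the shape of the state dictionary are hypothetical, and `state_hash` takes already-collected state rather than fetching it from devices.

```python
import hashlib
import hmac
import json


def canonical_bytes(obj) -> bytes:
    # Sorted keys and fixed separators make the encoding order-independent,
    # so the same data always produces the same bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def state_hash(target_states: dict) -> str:
    """Fingerprint collected device state (assumed to be JSON-serializable)."""
    return hashlib.sha256(canonical_bytes(target_states)).hexdigest()


def sign_request(request: dict, key: bytes) -> str:
    """Sign a change request with a shared secret key (an assumed scheme)."""
    return hmac.new(key, canonical_bytes(request), hashlib.sha256).hexdigest()


def verify_request(request: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_request(request, key), signature)
```

Because the live-state hash is a field inside the signed request, an approver signs off on the commands and on the network conditions they were written for at the same time.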

We also require a second reviewer when the change touches redundant paths at the same time. One firewall rule is one thing. Two WAN routers and a core switch stack are another. The agent can still prepare the work, but my team owns the final blast-radius call, and I like that line exactly where it is.
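The second-reviewer rule can be sketched as a set intersection against known redundancy groups. The group names and device names here are hypothetical, and treating "two or more members of one group" as the trigger is an assumption about how "touches redundant paths at the same time" would be encoded.

```python
# Hypothetical redundancy groups: devices that back each other up.
REDUNDANCY_GROUPS = {
    "wan-edge": {"wan-rtr-1", "wan-rtr-2"},
    "core": {"core-sw-1", "core-sw-2"},
}


def required_approvals(targets: set[str]) -> int:
    """Two approvals if a change hits both sides of any redundant pair."""
    for members in REDUNDANCY_GROUPS.values():
        if len(targets & members) >= 2:
            return 2
    return 1


def has_enough_approvals(targets: list[str], approvers: list[str]) -> bool:
    # De-duplicate approvers so one person cannot approve twice.
    return len(set(approvers)) >= required_approvals(set(targets))
```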

Logging AI Changes Like Evidence

Audit logging for AI-generated changes has to capture more than the final command. If the log only says an object was changed, I cannot reconstruct why the agent recommended it, what evidence it used, or whether the human approved the same version that executed. We log the prompt context hash, source data timestamps, model response ID, proposed diff, approval identity, executor identity, pre-check result, post-check result, and rollback status.

The timestamp bug from the ACL prototype made this personal. We now reject any recommendation built from inconsistent time sources. Firewall logs, syslog collectors, NetFlow records, and ticket events must agree within a defined tolerance before the agent can recommend a change. In our environment, that tolerance is 90 seconds for operational security decisions and 5 minutes for reporting-only tasks.
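A tolerance check like this can be small. The sketch below assumes each source reports a timezone-aware timestamp for the same reference event and that "agree" means the spread between the earliest and latest of those timestamps stays within the tolerance. The function name and the source labels are illustrative.

```python
from datetime import datetime, timedelta

# Tolerances from the text: 90 seconds for operational security decisions,
# 5 minutes for reporting-only tasks.
TOLERANCE = {
    "operational": timedelta(seconds=90),
    "reporting": timedelta(minutes=5),
}


def sources_agree(stamps_by_source: dict[str, datetime], purpose: str) -> bool:
    """Return True if all sources fall within the tolerance for this purpose.

    All datetimes must be timezone-aware; Python raises TypeError when
    naive and aware values are compared.
    """
    if not stamps_by_source:
        return False  # no evidence is not agreement
    stamps = list(stamps_by_source.values())
    return max(stamps) - min(stamps) <= TOLERANCE[purpose]
```

An empty input returns `False` on purpose: a recommendation with no timestamped evidence should be rejected, not waved through.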

Logs are design, not paperwork.

We store AI change records in an append-only index and link them to our normal change tickets. The agent does not get a private history. It lives inside the same evidence chain as a human engineer, which means a post-incident review can compare what Claude saw, what I approved, what the executor ran, and what the network did afterward. That level of detail felt excessive until the first time we used it to prove an AI-recommended DNS block was unrelated to a PLC timeout.
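The article does not describe how the append-only index is built. One common way to make tampering detectable is a hash chain, where each entry commits to the one before it. The sketch below uses that technique as a stand-in, with an in-memory list in place of a real index. The function names are hypothetical, and the record is assumed to be a JSON-serializable dictionary of audit fields such as the proposed diff and approval identity.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list[dict], record: dict) -> dict:
    """Append a record whose hash covers both its content and its predecessor."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "entry_hash": _entry_hash(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

A chain like this does not stop someone with write access from rewriting the whole log, so in practice the latest hash would also need to be anchored somewhere the agent and executor cannot modify.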

My opinion is that an AI network agent without forensic-grade logging is a production liability, even if every demo looks clean.

Updating the Governance Model as the Network Changes

Our governance framework is not a policy PDF that sits untouched. We review it every month, and we revise it after any near miss, major topology change, FortiOS upgrade, new supplier connection, or plant network segmentation project. FortiOS 7.4.3 behavior, Ubuntu 22.04 collector timing, and Python 3.11 parsing libraries are all real implementation details, so governance has to track software reality instead of pretending architecture diagrams are enough.

The metric that convinced our skeptics was not theoretical. We ran automated network changes with zero outages for 90 days using the read-recommend-human-approve model. During that period, Claude recommended firewall object cleanup, stale VPN tunnel reviews, switchport documentation fixes, DNS block candidates, and routing diagnostics. It executed reversible actions, and humans approved anything that could strand production traffic.

That number mattered.

We still argue about the boundary. I expect that. Some actions that are human-gated today may become automated after enough rollback tests and incident-free repetitions. Some actions may move the other way after a vendor upgrade changes behavior. The framework is valuable because it gives my team a disciplined way to adjust permissions without turning each new use case into a philosophical debate.

My current rule is simple: Claude can read everything, recommend almost anything, and execute only what we can reverse quickly with verified state. For irreversible changes, my team stays in the loop. I do not see that as distrust of AI. I see it as respect for the network carrying our production floor.

The safest network agent is the one that knows more than it can change.