Autonomous Terraform planning with Claude AI infrastructure review

Autonomous Terraform Planning With Claude: AI Reviews Before Humans Approve

Feed Terraform Plans Into Claude Before Review

A plan that opened port 3389 to 0.0.0.0/0 on a production Windows server got approved by two engineers who were rushing through 15 PRs on a Friday afternoon. I was one of the people who saw the pull request and assumed the network change had already been checked by the application owner. That assumption was wrong, and it bothered me because we had the tools to catch it before the apply ever reached our production maintenance window.

Our environment is not exotic. We run Terraform 1.8 from GitHub Actions, deploy from Ubuntu 22.04 runners, use Python 3.11 for glue scripts, and manage a mix of AWS security groups, Azure networking, FortiOS 7.4.3 firewall policy exports, and Windows Server workloads that support manufacturing systems. Human review worked when we had five infrastructure PRs a week. It did not work when cloud requests, plant network segmentation, and vendor access changes started piling up.

So I put Claude in front of the approval step. The workflow is simple: generate a machine-readable Terraform plan, strip noise that does not affect security, send the remaining change set to Claude with a narrow review prompt, and post the result back to the PR before any engineer clicks approve.

Rushing is a control failure.

terraform init -input=false
terraform plan -out=tfplan.binary -input=false
terraform show -json tfplan.binary > tfplan.json
python3.11 scripts/extract_security_changes.py tfplan.json > claude-review-input.json
python3.11 scripts/claude_terraform_review.py claude-review-input.json > review.md

I do not ask Claude to understand our entire infrastructure. I ask it to read the plan like a tired security engineer who never gets tired. That is the job. If a rule opens RDP, SSH, database ports, management interfaces, or unrestricted egress in a way that changes exposure, I want a comment before the PR moves.

Write A Narrow Security Prompt That Respects The Plan

My first version was bad. My AI reviewer flagged every single plan change as a potential risk in the first version — alert fatigue set in by day three and engineers started ignoring the output. That was my mistake, not the model’s. I gave Claude a vague instruction to “review for security issues,” and it responded like a nervous auditor with unlimited red ink.

The prompt improved when I made it smaller. I told Claude to ignore tag changes, description-only changes, harmless variable reshuffling, and resource replacements that did not alter reachable network paths or privileged access. I also gave it the exact categories we cared about in our plant environment, because a manufacturing facility has a different risk profile than a SaaS-only stack.

  • Public inbound access to administrative ports such as 22, 3389, 5985, and 5986
  • New trust relationships between production, vendor, and engineering networks
  • Security group rules that widen CIDR ranges or remove source restrictions
  • IAM changes that introduce wildcard actions or cross-account access
  • Firewall policy changes that bypass FortiOS 7.4.3 inspection paths

Narrow beat clever.

What I didn’t expect was how much better Claude became when I told it what not to review. The useful output did not come from a bigger context window or a more dramatic prompt. It came from removing the temptation to comment on everything. In our environment, a quiet reviewer that blocks one dangerous change is more valuable than a loud reviewer that narrates every harmless diff.

Separate Blocking Findings From Warnings

I split Claude’s output into three severity levels: block, warn, and note. A block means the PR cannot be approved without a human security exception or a code change. A warning means the owner should explain the intent in the PR. A note is informational, mostly there for traceability when we later ask why a certain path changed.

This classification mattered because infrastructure teams will tolerate automation only if it respects their time. If Claude blocks a tag rename or a subnet name cleanup, the system loses credibility fast. If Claude blocks 3389 from 0.0.0.0/0 on a production Windows server, nobody argues about whether the bot was too aggressive.

I made blocking criteria brutally specific. Public administrative access blocks. New inbound database exposure blocks. New wildcard IAM actions on production roles block. FortiOS policy changes that bypass inspection for plant-to-cloud traffic block. Broad egress gets a warning unless it crosses a sensitive boundary, because some vendor appliances still behave like it is 2009 and require ugly outbound rules.

Automation needs teeth.

The before and after changed how our team behaved. Security-critical findings in Terraform PRs increased from 2 per quarter during manual review to 6 caught automatically before apply. That does not mean our engineers got worse. It means our old process had blind spots, especially when reviewers were tired, distracted, or too familiar with a module to read the diff with suspicion.

Post The Review Where Engineers Already Decide

I did not build a separate dashboard. We already make approval decisions inside GitHub, so Claude’s review had to land as a PR comment with enough structure for a human to act quickly. The comment includes the severity, resource address, exact changed field, why it matters, and a suggested remediation. If it cannot point to a Terraform resource, it does not get to block.

The GitHub Action runs after plan generation and before the normal approval gate. On internal repositories, we store the prompt template with the code so reviewers can inspect it like any other control. That transparency helped adoption because nobody wanted an invisible reviewer making unexplained judgments about production infrastructure.

You may also find this useful: Check out our guide on Python Network Config Backup: Automating Multi-Vendor Device Snapshots for more practical tips.

gh pr comment "$PR_NUMBER" \
  --repo "$GITHUB_REPOSITORY" \
  --body-file review.md

if grep -q "SEVERITY: BLOCK" review.md; then
  echo "Claude found blocking Terraform security issues"
  exit 1
fi

The comment format is intentionally boring. I want resource names, affected ports, old and new CIDRs, and the short reason a change is risky. I do not want motivational language, policy lectures, or generic cloud security advice. Engineers in a plant outage window need facts, not a sermon.

Put the warning at the point of approval.

One detail that made the system better was including both the raw Terraform address and the module path. Our modules are reused across plant sites, and a security group name alone can be ambiguous. Seeing module.prod_windows.module.network.aws_security_group_rule.rdp in the comment made it obvious which owner needed to respond.

Tune False Positives Until People Trust The Bot

After the first noisy week, I treated false positives like production bugs. Every ignored comment got reviewed. If Claude warned on harmless tag changes, I changed the extraction script. If it flagged an internal CIDR as public because the context was missing, I added our approved RFC1918 ranges and plant network blocks to the prompt. If it gave vague advice, I tightened the output schema.

The biggest improvement came from comparing before and after values instead of sending entire resources. Claude does not need the whole Terraform universe to decide that 10.44.0.0/16 becoming 0.0.0.0/0 on port 22 is bad. It needs the old value, the new value, the protocol, the port, the environment label, and the resource address.

I also stopped asking for best practices. That phrase invites essays. I ask for a decision: block, warn, note, or no finding. Then I ask for one sentence of reasoning and one concrete fix. Opinionated output is easier to review, easier to disagree with, and easier to improve over time.

Trust is earned in silence.

My current prompt tells Claude to prefer no finding when the security impact is unclear. That sounds backwards for a security team, but it made the blocking findings sharper. I would rather miss a low-grade warning than train engineers to scroll past the review. The system works because when Claude speaks now, the comment usually deserves attention.

I still require human approval. Claude does not know that a vendor is on-site, that a temporary maintenance bridge was discussed in a change meeting, or that a PLC historian migration has a weekend exception. But Claude does not get bored, does not skim after the tenth PR, and does not assume someone else checked the obvious exposure. For Terraform security review, that trade is worth making.

Further Reading: For more in-depth information, refer to the official Fortinet Documentation.

The best AI reviewer on my team is the one that never gets context-blind to an open management port.