S3 Bucket Policy Mistakes That Exposed Data: A Post-Incident Analysis

Reconstructing the 72-Hour Exposure

A data analytics bucket containing 18 months of customer usage data was publicly accessible for 72 hours due to a misconfigured bucket policy. I found it on a Tuesday morning while reviewing Security Hub findings before our manufacturing shift-change meeting. It was the kind of quiet routine check that usually surfaces a few noisy medium-severity findings and one or two stale exceptions.

Our environment was not exotic. We ran AWS CLI 2.15.22 from Ubuntu 22.04 jump hosts, Python 3.11 for internal validation scripts, Terraform 1.6.6 for infrastructure changes, and FortiOS 7.4.3 on the edge firewalls protecting our plant network. The S3 bucket held exported analytics from production applications, and the data was supposed to stay available only to a small reporting role used by our business intelligence stack.

The mistake was plain once I stopped looking for something clever. One of our developers set a bucket policy to allow s3:GetObject from * for testing and pushed the change to production through a CI/CD pipeline with no policy review step. I first assumed an ACL drifted open after a migration job. I was wrong, and that wrong assumption cost me the first hour of response.

That hour bothered me.

After enabling Block Public Access at the account level and adding bucket policy review to the CI/CD pipeline, public exposure incidents dropped to zero in the following 18 months. Before that, we had three public exposure incidents in nine months, including this one. I do not treat S3 policy review as paperwork anymore; I treat it as a production safety interlock.

Reading S3 Authorization Without Guessing

I had to rebuild the access decision from the ground up because S3 permissions are easy to misread during an incident. In our case, object ACLs were not granting access, bucket ACLs were not the path, and IAM identity policies were not the issue. The bucket policy alone made every object reachable because the statement allowed public principals to call s3:GetObject on the object ARN pattern.

My practical evaluation order during response is simple: I check explicit denies first, then public access block settings, then bucket policy, then access point policy, then ACLs, then identity-based permissions. AWS has more nuance behind the scenes, but that sequence keeps my team from chasing ghosts when minutes matter and plant leadership is asking whether customer data left our control.

aws s3api get-public-access-block --bucket analytics-prod-usage
aws s3api get-bucket-policy-status --bucket analytics-prod-usage
aws s3api get-bucket-policy --bucket analytics-prod-usage --query Policy --output text | jq .
aws s3api get-bucket-acl --bucket analytics-prod-usage

The dangerous policy looked like a small testing shortcut, which is why it slipped past human attention in a busy sprint. The statement used "Principal": "*", "Action": "s3:GetObject", and "Resource": "arn:aws:s3:::analytics-prod-usage/*". No condition limited source VPC endpoint, organization ID, account, or role. That was the whole failure.
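For contrast, here is the dangerous statement rebuilt from the three elements above, next to a scoped alternative. This is a sketch, not our production policy: the role name and VPC endpoint ID passed to scoped_statement are hypothetical placeholders, the "Version" value is the standard policy language version, and the aws:SourceVpce condition assumes the reporting stack reaches S3 through a VPC endpoint.

```python
import json

# Rebuilt from the incident description: public principal, GetObject, all objects.
DANGEROUS_STATEMENT = {
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::analytics-prod-usage/*",
}


def scoped_statement(account_id: str, role_name: str, vpce_id: str) -> dict:
    """Same read permission, limited to one role and one VPC endpoint."""
    return {
        "Sid": "ReportingRoleReadViaVpce",
        "Effect": "Allow",
        # Hypothetical reporting role; replaces the anonymous "*" principal.
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::analytics-prod-usage/*",
        # Requests must also arrive through the named VPC endpoint.
        "Condition": {"StringEquals": {"aws:SourceVpce": vpce_id}},
    }


def bucket_policy(*statements: dict) -> str:
    """Wrap statements in a policy document ready for put-bucket-policy."""
    return json.dumps({"Version": "2012-10-17", "Statement": list(statements)}, indent=2)
```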

Small JSON can cause big damage.

I now prefer reading S3 exposure as a combined result, not as separate ACL and policy trivia. A private ACL does not save a bucket from a public bucket policy, and a locked-down IAM role does not matter when anonymous access is granted directly at the resource. My opinion is blunt: S3 authorization is manageable only when I force myself to prove the effective access path.
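One way to force that combined view is to list every public path instead of stopping at the first one found. A minimal sketch, assuming the parsed output of get-bucket-acl and the IsPublic flag from get-bucket-policy-status as inputs; the AllUsers and AuthenticatedUsers URIs are the standard S3 group grantees, while the function name is illustrative. It deliberately does not model RestrictPublicBuckets or access point policies.

```python
# Standard S3 grantee URIs for "everyone" and "any authenticated AWS account".
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_access_paths(acl: dict, policy_is_public: bool,
                        ignore_public_acls: bool = False) -> list[str]:
    """Return every path that makes the bucket public, not just the first."""
    paths: list[str] = []
    if policy_is_public:
        paths.append("bucket policy")
    # With IgnorePublicAcls enabled, S3 disregards public ACL grants.
    if not ignore_public_acls:
        for grant in acl.get("Grants", []):
            uri = grant.get("Grantee", {}).get("URI")
            if uri in PUBLIC_GROUP_URIS:
                group = uri.rsplit("/", 1)[-1]
                paths.append(f"ACL grant {grant.get('Permission')} to {group}")
    return paths
```

A private ACL paired with a public policy still returns "bucket policy", which is exactly the case that fooled me in the first hour.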

Using Block Public Access Where It Actually Matters

What I didn’t expect was how much our older AWS accounts differed from newer ones. Some newer accounts already had public access blocks set in places that made exposure harder. Our older manufacturing analytics account did not have account-level S3 Block Public Access enabled, so a bad bucket policy could still become effective after deployment.

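Bucket-level Block Public Access helps, but I do not consider it enough for a production estate with multiple teams and pipelines, because a deployment role that can change bucket settings can also turn the bucket-level block off. Account-level settings give my team a higher-leverage guardrail: with all four enabled, S3 blocks new public ACLs, ignores existing public ACLs, blocks public bucket policies, and restricts access to buckets that already have public policies, across every bucket in the account. A bucket-level setting cannot loosen the account-level block, so any public exception has to be designed deliberately rather than slipped in one bucket at a time.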
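Enabling the account-level block is a single API call. The sketch assumes a boto3 S3 Control client is passed in (boto3.client("s3control")) and reads the configuration back so the stored result can be logged; the function name and the read-back step are my illustration, not a required pattern.

```python
# All four account-level Block Public Access switches turned on.
ACCOUNT_BPA = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def enforce_account_bpa(s3control_client, account_id: str) -> dict:
    """Apply account-level Block Public Access and return what S3 stored.

    s3control_client is expected to be boto3.client("s3control"); it is
    injected so the function can be exercised without AWS credentials.
    """
    s3control_client.put_public_access_block(
        AccountId=account_id,
        PublicAccessBlockConfiguration=dict(ACCOUNT_BPA),
    )
    # Read the setting back so the caller can verify and log it.
    response = s3control_client.get_public_access_block(AccountId=account_id)
    return response["PublicAccessBlockConfiguration"]
```

From the CLI, the equivalent is aws s3control put-public-access-block --account-id 111122223333 --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true, with the placeholder account ID replaced by your own.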

I enable account-level Block Public Access in every production AWS account.
I require a documented exception for any bucket that must serve public content.
I tag approved public buckets with an owner, expiry date, and business justification.
I monitor drift with AWS Config managed rules and Security Hub controls.
I block CI/CD promotion when a policy contains public principals without approved conditions.
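The exception tags are only useful if something checks them. This sketch assumes hypothetical tag keys public-owner, public-expiry (an ISO date), and public-justification, and converts the TagSet list returned by get-bucket-tagging into a dictionary before validating it.

```python
from datetime import date

# Hypothetical tag keys required on an approved public bucket.
REQUIRED_TAGS = ("public-owner", "public-expiry", "public-justification")


def tags_from_tagset(tag_set: list[dict[str, str]]) -> dict[str, str]:
    """Convert the TagSet list from get-bucket-tagging into a plain dict."""
    return {tag["Key"]: tag["Value"] for tag in tag_set}


def exception_problems(tags: dict[str, str], today: date) -> list[str]:
    """List reasons an exception is invalid; an empty list means it passes."""
    problems = [f"missing tag {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    expiry = tags.get("public-expiry")
    if expiry:
        try:
            # An expired exception is treated the same as no exception.
            if date.fromisoformat(expiry) < today:
                problems.append(f"exception expired on {expiry}")
        except ValueError:
            problems.append(f"unparseable expiry {expiry!r}")
    return problems
```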

The account-level setting changed the failure mode. A bad policy could still be committed to the repository, but S3 would reject the public bucket policy at deployment time instead of making the bucket publicly reachable. That difference mattered because our pipeline mistake was not eliminated on day one; our blast radius shrank while we fixed process, ownership, and review gaps.

Guardrails beat reminders.

I also learned to avoid treating public content as a casual exception. If a team needs public assets, I push for CloudFront with origin access control, separate buckets, narrow deployment roles, and explicit logging. Direct public S3 access is rarely the cleanest answer in my environment, and I think older accounts deserve the most suspicion.
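For public assets, the bucket policy that grants CloudFront read access through origin access control follows the pattern AWS documents: the CloudFront service principal, s3:GetObject, and an AWS:SourceArn condition pinned to one distribution. The helper below only renders that JSON; the bucket name, account ID, and distribution ID are inputs you supply, and the function name is hypothetical.

```python
import json


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> str:
    """Render a read-only bucket policy for one CloudFront distribution using OAC."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            # The CloudFront service, not "*", is the principal.
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Only requests on behalf of this distribution are allowed.
            "Condition": {"StringEquals": {
                "AWS:SourceArn": f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}",
            }},
        }],
    }
    return json.dumps(policy, indent=2)
```

Because the grant goes to a service principal scoped to a single distribution rather than to everyone, the bucket itself does not need to be public and can stay behind account-level Block Public Access.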


Detecting Public Buckets Before Someone Else Does

Our detection path used AWS Config, Security Hub, CloudTrail, and a small Python 3.11 script that compared expected bucket posture against live policy status. AWS Config gave us managed rule coverage, Security Hub gave us a single queue for triage, and CloudTrail showed exactly when the policy changed and which CI/CD role made the call.
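The core of that comparison can be sketched in a few lines. This assumes an inventory that maps bucket names to whether they are approved to be public, and injects the live lookup as a callable so the logic can be tested offline; the names are illustrative, not our actual script. A fuller version would also flag buckets missing from the inventory.

```python
from typing import Callable


def posture_drift(expected_public: dict[str, bool],
                  live_is_public: Callable[[str], bool]) -> dict[str, tuple[bool, bool]]:
    """Return {bucket: (expected, live)} for every bucket whose public status drifted."""
    drift: dict[str, tuple[bool, bool]] = {}
    for bucket, expected in sorted(expected_public.items()):
        live = live_is_public(bucket)
        if live != expected:
            drift[bucket] = (expected, live)
    return drift
```

In a live run, the callable could wrap boto3's get_bucket_policy_status and read PolicyStatus.IsPublic; buckets with no policy at all may raise a NoSuchBucketPolicy error there, which the wrapper should treat as not public.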

I enabled checks for public read access, public write access, public ACLs, and public bucket policies. I also tuned alerts so production analytics buckets paged the on-call security engineer while lower-risk development buckets created tickets. That distinction mattered because our plant operates continuously, and alert fatigue during night shifts creates real operational risk.
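The paging split can be expressed as a small routing rule. This assumes a hypothetical environment tag on each bucket; treating untagged or unrecognized buckets as page-worthy is my assumption for a fail-closed default, not something the incident required.

```python
def route_finding(bucket_tags: dict[str, str]) -> str:
    """Return 'page' or 'ticket' for a public-bucket finding (hypothetical tag scheme)."""
    environment = bucket_tags.get("environment", "").strip().lower()
    if environment == "development":
        # Lower-risk buckets go to the queue instead of waking someone up.
        return "ticket"
    # Production, plus anything untagged or unrecognized, pages on-call.
    return "page"
```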

Noise is also a vulnerability.

The useful finding was not just “bucket is public.” The useful finding tied together bucket name, account ID, policy status, change actor, pipeline run, and last modified time. Once those fields appeared in the same ticket, my team could move from detection to containment without opening five consoles and asking three application owners for context.
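The enriched ticket can be assembled from the CloudTrail record for the policy change. The fields eventTime, eventName, recipientAccountId, userIdentity.arn, and requestParameters.bucketName come from the CloudTrail record format; deriving the pipeline run from the role session name is an assumption that only holds if the pipeline sets its run ID as the session name.

```python
def build_ticket(event: dict, policy_is_public: bool) -> dict:
    """Collect the triage fields for one PutBucketPolicy CloudTrail record."""
    actor = event.get("userIdentity", {}).get("arn", "unknown")
    # Assumption: the pipeline uses its run ID as the STS role session name,
    # which is the last path segment of an assumed-role ARN.
    run_id = actor.rsplit("/", 1)[-1] if ":assumed-role/" in actor else None
    return {
        "bucket": event.get("requestParameters", {}).get("bucketName"),
        "account_id": event.get("recipientAccountId"),
        "policy_is_public": policy_is_public,
        "change_actor": actor,
        "pipeline_run": run_id,
        "api_call": event.get("eventName"),
        "last_modified": event.get("eventTime"),
    }
```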

My strongest preference is to make detection boring. I want AWS Config to catch drift, Security Hub to route it, EventBridge to notify, and CloudTrail to prove the change path. A clever dashboard is less valuable than a finding that reaches the right engineer with enough evidence to act immediately.
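On the EventBridge side, a rule can match the CloudTrail events for bucket policy and ACL changes. The pattern below uses the standard aws.s3 source and "AWS API Call via CloudTrail" detail type and assumes CloudTrail management events are enabled; other event names can be added to the list. The simple_match helper is a simplified local check of exact-value matching only, not EventBridge's full matching semantics.

```python
def s3_exposure_event_pattern() -> dict:
    """EventBridge pattern for S3 policy and ACL changes recorded by CloudTrail."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutBucketPolicy", "PutBucketAcl"],
        },
    }


def simple_match(pattern: dict, event: dict) -> bool:
    """Approximate the exact-value subset of EventBridge matching for local tests."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested objects must match recursively.
            if not isinstance(event[key], dict) or not simple_match(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True
```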

Put Policy Review Into The Pipeline

The durable fix was not a meeting or a stern message in Slack. We added policy review to the CI/CD pipeline so S3 policy changes had to pass automated checks before production deployment. Our pipeline now rejects public principals unless the policy includes an approved constraint such as an aws:PrincipalOrgID condition, a specific AWS account, a CloudFront origin access control grant tied to a known distribution, or an exception file signed by security.
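The approved-constraint rule can be sketched as a per-statement check. This assumes the pipeline already knows the statement has a public principal; it accepts aws:PrincipalOrgID or aws:PrincipalAccount under StringEquals, or a bucket listed in a security exception set. Verifying the signature on the exception file is out of scope, and the key list is illustrative rather than complete. Counting only StringEquals matters, because StringNotEquals on the same key would allow everyone outside the organization.

```python
# Illustrative, not exhaustive: keys that narrow a public principal to our org or one account.
APPROVED_KEYS = {"aws:principalorgid", "aws:principalaccount"}


def approved_constraint_keys(statement: dict) -> set[str]:
    """Approved condition keys found under StringEquals in one statement."""
    string_equals = statement.get("Condition", {}).get("StringEquals", {})
    # IAM condition key names are case-insensitive, so compare in lowercase.
    return {key.lower() for key in string_equals} & APPROVED_KEYS


def public_statement_permitted(statement: dict, bucket: str,
                               security_exceptions: set[str]) -> bool:
    """Decide whether a statement with a public principal may ship."""
    if bucket in security_exceptions:
        # Hypothetical: populated from the signed exception file after
        # its signature has been verified elsewhere.
        return True
    return bool(approved_constraint_keys(statement))
```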

I keep the control simple because complicated policy linting becomes shelfware. The pipeline parses JSON, identifies statements touching s3:GetObject, s3:PutObject, s3:DeleteObject, and bucket administration actions, then fails the build when public access appears without a known pattern. Human review still exists, but automation catches the easy failures every time.
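The linter itself can stay small. A sketch, assuming the policy file path is handed to main and a hypothetical watch list covering object reads, writes, and deletes, plus s3:PutBucket* and s3:DeleteBucket* as a stand-in for bucket administration. For brevity it treats any Condition block as a known pattern and ignores NotPrincipal and NotAction, both of which a production gate should handle.

```python
import json
import sys
from fnmatch import fnmatchcase

# Hypothetical watch list, lowercased because IAM action names are case-insensitive.
WATCHED_ACTIONS = ["s3:getobject", "s3:putobject", "s3:deleteobject",
                   "s3:putbucket*", "s3:deletebucket*"]


def as_list(value) -> list:
    """Policy elements may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def touches_watched_action(actions) -> bool:
    for action in (a.lower() for a in as_list(actions)):
        for watched in WATCHED_ACTIONS:
            # Check both directions so "s3:*" in a policy and "s3:putbucket*"
            # in the watch list both count as overlap.
            if fnmatchcase(action, watched) or fnmatchcase(watched, action):
                return True
    return False


def is_public(principal) -> bool:
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return "*" in as_list(principal.get("AWS", []))
    return False


def lint(policy_text: str) -> list[str]:
    """Return failures for public, watched, unconditioned Allow statements."""
    policy = json.loads(policy_text)
    failures = []
    for index, stmt in enumerate(as_list(policy.get("Statement", []))):
        if (stmt.get("Effect") == "Allow"
                and is_public(stmt.get("Principal"))
                and touches_watched_action(stmt.get("Action", []))
                and not stmt.get("Condition")):
            failures.append(f"statement {index} ({stmt.get('Sid', 'no Sid')}) grants public access")
    return failures


def main(argv: list[str]) -> int:
    """CI exit status: 0 when the policy file is clean, 1 when any statement fails."""
    with open(argv[0], encoding="utf-8") as handle:
        problems = lint(handle.read())
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

In CI, a step would call sys.exit(main([policy_path])) so a nonzero exit fails the build before promotion.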

The computer gets the first vote.

We also changed ownership. Application teams still own their buckets, but my security team owns the policy guardrails and exception criteria. That split works because developers can move quickly inside known rails while security keeps responsibility for exposure risk. I do not want every S3 policy change routed through a committee; I want unsafe patterns stopped before production.

The incident left me with a hard opinion: account-level Block Public Access is the control I enable before I debate tooling maturity, dashboard design, or policy education. Training helps, reviews help, and detection helps, but the best control is the one that turns a rushed testing shortcut into a failed deployment instead of a 72-hour exposure.

I trust S3 safety most when a bad policy cannot become a public incident.