The Audit That Found 340 Keys
A security audit found 340 service account keys across 12 GCP projects. The oldest had been active for 4 years without a single rotation. I remember staring at the export in our conference room while a production line dashboard refreshed behind me. Our manufacturing facility had treated those keys like plumbing: invisible until something leaked.
My first assumption was wrong. I thought most of the keys would belong to old automation jobs nobody used anymore, but many were tied to active build pipelines, vendor integrations, plant-floor reporting scripts, and local developer workflows. We were running FortiOS 7.4.3 at the edge, Ubuntu 22.04 on several internal runners, and Python 3.11 for most of our cloud inventory scripts, yet our credential discipline looked like something from a much older era.
Permanent keys age badly.
I started by pulling every user-managed key across every project and joining it with IAM bindings, key creation time, last-authentication signals where available, repository search results, and owner metadata from our internal CMDB. That gave my team a map we could argue with, which was better than the vague comfort of believing we had a policy.
My opinion is simple: if I cannot explain why a service account key exists, who owns it, where it runs, and when it expires, that key should not survive the next review cycle.
Inventory Keys Across Every Project
We used the gcloud CLI first because I wanted repeatable evidence before building anything fancier. The first pass was crude, but it exposed projects where nobody expected to find credentials, including a retired quality-reporting project that still had three active keys attached to a service account with BigQuery read access.
# List every user-managed key in every project the caller can see.
for PROJECT in $(gcloud projects list --format="value(projectId)"); do
  echo "project=${PROJECT}"
  for SA in $(gcloud iam service-accounts list \
      --project "${PROJECT}" \
      --format="value(email)"); do
    # --managed-by user skips the Google-managed keys that rotate on their own.
    gcloud iam service-accounts keys list \
      --iam-account "${SA}" \
      --project "${PROJECT}" \
      --managed-by user \
      --format="csv[no-heading](name,validAfterTime,validBeforeTime)"
  done
done
After that, I moved the collection into a Python 3.11 job so we could normalize timestamps, tag owners, and send the output to a dashboard. I cared less about a pretty report and more about making the age of every key impossible to ignore during the weekly security review.
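The normalization step can be sketched roughly as follows. This is a minimal illustration, not the job we actually ran: the function name `parse_key_rows` and the output field names are made up for the example, and it assumes the `name` column holds the full resource name in the form `projects/PROJECT/serviceAccounts/EMAIL/keys/KEY_ID`.

```python
import csv
import io
from datetime import datetime, timezone


def parse_key_rows(csv_text, now=None):
    """Turn name,validAfterTime,validBeforeTime rows into dicts with an age in days."""
    now = now or datetime.now(timezone.utc)
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skips blank lines and the "project=..." echo lines
        name, valid_after, valid_before = row
        parts = name.split("/")
        if len(parts) != 6:
            continue  # not the resource-name shape this sketch assumes
        # Timestamps arrive as RFC 3339 strings such as 2021-03-01T12:00:00Z.
        created = datetime.fromisoformat(valid_after.replace("Z", "+00:00"))
        records.append({
            "project": parts[1],
            "service_account": parts[3],
            "key_id": parts[5],
            "created": created,
            "age_days": (now - created).days,
            "valid_before": valid_before,
        })
    return records
```

Passing `now` explicitly keeps the age calculation reproducible when the same export is reviewed on different days.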
The dashboard changed the conversation.
We sorted by age, privilege, project criticality, and known usage. The worst keys were not always the oldest. One 14-month-old key belonged to a service account with access to production storage buckets used by our maintenance analytics platform, while a 4-year-old key only touched a dead test project. Age mattered, but blast radius mattered more.
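One way to express "blast radius first, age second" is a sort key like the one below. The 0 to 3 scales for privilege and criticality are hypothetical placeholders, not the scoring we used, and the sketch leaves out known usage to stay short.

```python
from dataclasses import dataclass


@dataclass
class KeyRisk:
    key_id: str
    age_days: int
    privilege: int    # hypothetical scale: 0 = no access, 3 = production data
    criticality: int  # hypothetical scale: 0 = dead project, 3 = production


def review_order(keys):
    """Sort keys so the largest blast radius comes first, with age as the tiebreaker."""
    return sorted(
        keys,
        key=lambda k: (k.privilege * k.criticality, k.age_days),
        reverse=True,
    )
```

With this ordering, a 14-month-old key on production storage outranks a 4-year-old key on a dead test project, which matches how we prioritized the review.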
- Every key needed an owner, system name, and business purpose.
- Every key older than 90 days needed an exception or removal plan.
- Every privileged key needed a replacement design before deletion.
- Every unknown key was treated as suspicious until proven otherwise.
- Every new key request had to document why Workload Identity Federation would not work.
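The rules above translate naturally into a check that runs against each inventory record. This is a sketch under assumptions: the record is a plain dict, and every field name (`owner`, `system`, `purpose`, `exception`, `removal_plan`, `privileged`, `replacement_design`, `new_request`, `federation_blocker`) is invented for illustration.

```python
MAX_AGE_DAYS = 90


def review_findings(key):
    """Return the review rules a key record violates (an empty list means it passes)."""
    findings = []
    missing = [field for field in ("owner", "system", "purpose") if not key.get(field)]
    if missing:
        # Unknown keys are treated as suspicious until proven otherwise.
        findings.append("suspicious: missing " + ", ".join(missing))
    if key.get("age_days", 0) > MAX_AGE_DAYS and not (
        key.get("exception") or key.get("removal_plan")
    ):
        findings.append("older than 90 days with no exception or removal plan")
    if key.get("privileged") and not key.get("replacement_design"):
        findings.append("privileged key with no replacement design")
    if key.get("new_request") and not key.get("federation_blocker"):
        findings.append("new key request does not explain why federation will not work")
    return findings
```

Returning a list of findings rather than a single pass or fail flag makes it easier to show owners exactly what is missing.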
I do not like credential inventories that stop at counting objects; I like inventories that create pressure to delete things.
Replace Long-Lived Keys With Workload Identity Federation
Workload Identity Federation became the escape route for systems that had been carrying JSON keys only because nobody wanted to redesign authentication. For our GitHub Actions workflows, external CI runners, and a few vendor-managed jobs, we replaced static key files with short-lived credentials issued through trusted identity providers.
What I did not expect was how quickly developers accepted the change once we removed the local friction. I expected a fight over convenience, but the better workflow was actually easier: no downloaded JSON file, no secret pasted into a repository setting, no emergency Slack message asking who had the latest key.
Less secret handling meant fewer meetings.
The implementation detail that mattered most was scoping. We created separate workload identity pools for CI, vendor integrations, and internal automation instead of treating federation as one giant trust bucket. Then we used attribute conditions so a repository, branch, or external identity could only impersonate the specific service account it needed.
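To make the scoping concrete, here is a small sketch of the two strings involved for a GitHub Actions provider: the CEL attribute condition on the provider, and the `principalSet` member that gets bound to a service account. The helper names are hypothetical, the second helper assumes the provider maps `attribute.repository` from `assertion.repository`, and the repository, pool, and project values in any call are placeholders.

```python
def github_attribute_condition(repository, ref=None):
    """Build a CEL condition that accepts tokens from one repository, optionally one ref."""
    for value in (repository, ref):
        # Refuse characters that could break out of the quoted CEL string.
        if value is not None and ("'" in value or "\\" in value):
            raise ValueError(f"unsafe character in {value!r}")
    clauses = [f"assertion.repository == '{repository}'"]
    if ref is not None:
        clauses.append(f"assertion.ref == '{ref}'")
    return " && ".join(clauses)


def repository_principal_set(project_number, pool_id, repository):
    """Build the IAM member that represents every identity from one repository in a pool."""
    return (
        "principalSet://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/attribute.repository/{repository}"
    )
```

Generating these strings from reviewed inputs, instead of typing them by hand per pipeline, is one way to keep each pool's trust boundary narrow and auditable.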
For local development, we pushed Application Default Credentials combined with service account impersonation instead of handing out service account keys. That required better documentation, but it was a trade I would make again because local convenience had been one of the biggest sources of sprawl.
After a 90-day remediation project, the key count dropped from 340 to 47, with all 47 under rotation, and Workload Identity Federation replaced 280 key-based authentications. That was not just a cleaner graph; it was 280 fewer places where a forgotten JSON file could become an incident.
My view now is blunt: federation should be the default path, and a user-managed service account key should feel like an exception that needs a written defense.
Find Keys Hiding In Repositories
The ugliest part of the work was repository scanning. Developers had created service account keys for local development and committed them to repositories, and 12 had been pushed to public GitHub repos. Some were old. Some were still valid. None of that felt theoretical when I saw private keys sitting in commit history.
We scanned our internal Git platform, GitHub organizations, archived ZIP exports, build artifacts, and developer documentation folders. I looked for obvious JSON fields like private_key_id, client_email, and type: service_account, but I also checked older branches because manufacturing environments accumulate forgotten migration work the way control cabinets accumulate unlabeled cables.
History keeps secrets longer than people do.
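A content check for those fields might look like the sketch below. It is an illustration rather than the scanner we ran: the function name is made up, and it only inspects a single blob of text, so something else would have to feed it file contents from every branch and commit.

```python
import json
import re

REQUIRED_FIELDS = {"type", "private_key_id", "client_email"}
TYPE_PATTERN = re.compile(r'"type"\s*:\s*"service_account"')


def looks_like_service_account_key(text):
    """Return True when text appears to contain a service account key file."""
    try:
        doc = json.loads(text)
    except ValueError:
        doc = None
    if isinstance(doc, dict):
        # Whole-file match: valid JSON with the telltale fields.
        return doc.get("type") == "service_account" and REQUIRED_FIELDS.issubset(doc)
    # Fallback for keys pasted into docs, scripts, or config templates.
    return bool(TYPE_PATTERN.search(text)) and '"private_key_id"' in text
```

The fallback path will produce some false positives, which is acceptable when every hit is reviewed by a person before a key is disabled.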
When we found a match, we did not debate intent. We disabled the key, opened an incident ticket, checked access logs, notified the service owner, and rotated dependent workflows. For the 12 public GitHub exposures, we treated them as compromised even if there was no obvious abuse, because public exposure removes the luxury of optimism.
The hardest conversations were not technical. A few engineers felt blamed for a pattern the organization had tolerated for years. I made a point of saying that the system had trained people to take the fastest path, and then we changed the path.
I think secret scanning is not a cleanup activity; it is a control that should run before every merge and keep running after the repository is archived.
Automate Rotation And Expiration Enforcement
We kept 47 keys because some systems were not ready for federation, including a legacy vendor connector and two plant-floor data collection services that could not support impersonation yet. I did not love that, but security work in a manufacturing facility has to respect uptime, maintenance windows, and the fact that not every vendor moves at cloud speed.
For the remaining keys, we enforced rotation with policy and automation. The Python 3.11 job created replacement keys, stored them in our approved secret manager path, notified the owner, waited for a validation signal, and then disabled the old key before deletion. Ubuntu 22.04 runners handled the scheduled job, and our FortiOS 7.4.3 logging path gave us network-side confirmation when old integrations stopped calling out.
Rotation without deletion is theater.
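The ordering of the rotation steps is the part worth pinning down, so here is a skeleton with every external action injected as a callable. All names are hypothetical, nothing here calls a real cloud or secret manager API, and a production job would persist state between runs and leave a gap between disabling and deleting a key.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RotationSteps:
    create_key: Callable[[], Tuple[str, bytes]]   # returns (new key id, key material)
    store_secret: Callable[[str, bytes], None]
    notify_owner: Callable[[str, str], None]
    is_validated: Callable[[str], bool]
    disable_key: Callable[[str], None]
    delete_key: Callable[[str], None]


def rotate(old_key_id, steps):
    """Replace a key, touching the old one only after the new one is validated."""
    new_key_id, material = steps.create_key()
    steps.store_secret(new_key_id, material)
    steps.notify_owner(old_key_id, new_key_id)
    if not steps.is_validated(new_key_id):
        return "awaiting-validation"  # the old key keeps working
    steps.disable_key(old_key_id)     # disable first so a mistake is reversible
    steps.delete_key(old_key_id)
    return "rotated"
```

Injecting the steps keeps the sequence testable with fakes, which matters when the real actions can break a plant-floor integration.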
We also added deny-by-default review gates. New key creation generated an alert. Keys without owner tags generated an escalation. Keys approaching 75 days triggered reminders, and anything past 90 days required a documented exception reviewed by my team. That policy made people uncomfortable for about two weeks, which was a fair price for changing behavior.
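Those gates reduce to a small decision function. The sketch below is one reading of the policy, assuming reminders start at day 75 and the exception requirement starts after day 90; the function name and return labels are invented for illustration.

```python
REMINDER_DAYS = 75
MAX_AGE_DAYS = 90


def rotation_gate(age_days, has_owner_tag=True, has_exception=False):
    """Map a key's age and metadata to the action the review gate should take."""
    if not has_owner_tag:
        return "escalate"  # ownerless keys escalate regardless of age
    if age_days > MAX_AGE_DAYS:
        return "exception-on-file" if has_exception else "exception-required"
    if age_days >= REMINDER_DAYS:
        return "remind"
    return "ok"
```

Checking the owner tag before the age means a fresh but ownerless key still gets attention.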
The before-and-after metric still sits in our quarterly risk deck: 340 keys before, 47 keys after, with all 47 under rotation and 280 authentications moved to Workload Identity Federation. I like that metric because it describes risk removed, not just work completed.
My opinion is that rotation is acceptable only for the credentials I cannot eliminate yet.
Keep Deleting Credentials You No Longer Need
The project changed how I look at cloud identity. I used to think of service account keys as small implementation details, but I now treat each one like a standing exception to our preferred security model. If a key exists forever, it behaves like a password nobody has to type and nobody remembers to question.
We still review the remaining 47 keys every month. Some will disappear when vendors modernize. Some will take longer because plant systems have certification and downtime constraints. I can live with that as long as the exceptions are visible, owned, rotated, and shrinking.
The direction matters.
The best result was not the lower key count by itself. The best result was that developers stopped asking for JSON keys as the first answer. They asked how to impersonate a service account, how to configure federation, or how to run a local test without creating a credential that might outlive the project.
I do not believe service account key sprawl is a GCP problem as much as an engineering habit problem, and I trust habits only when the tooling makes the safer path the easier path.

