Find the Cluster-Admin Bindings Before They Find Me
An audit of a production EKS cluster found 23 cluster-admin bindings, 18 of them for service accounts that ran batch jobs requiring only read access to specific namespaces. I was sitting in our manufacturing facility’s security room, between FortiGate logs from FortiOS 7.4.3 and a Grafana panel showing overnight job failures, when the RBAC export landed in my terminal. My first reaction was that the scanner had duplicated results. It had not.
My team had treated Kubernetes access like a small operational problem, not like identity control around production workloads that could touch recipes, supplier files, and line telemetry. The cluster was running Kubernetes 1.29 on EKS, our review scripts used Python 3.11, and the jump host was Ubuntu 22.04. None of that mattered once I saw how many subjects could bypass namespace boundaries completely.
That was too much trust.
I made the wrong first assumption. I thought cluster-admin had been granted mostly to humans during break-glass troubleshooting. The real pattern was worse: cluster-admin was the path of least resistance whenever an application needed any elevated access. We added it to unblock ourselves and never removed it. That is how temporary access becomes architecture, and I do not like architecture that depends on memory.
Audit Cluster-Admin Bindings With Kubectl
I start with the raw bindings because I want the plain truth before a dashboard cleans it up. ClusterRoleBinding objects are the obvious target because they attach ClusterRoles to users, groups, and service accounts across the whole cluster. RoleBinding objects can also reference a ClusterRole, granting its permissions inside that binding's namespace, so I check both. I care less about pretty output and more about whether a service account can mutate nodes, secrets, admission objects, or workloads outside its namespace.
```shell
# ClusterRoleBindings that grant cluster-admin across the whole cluster
kubectl get clusterrolebinding -o json \
  | jq -r '.items[]
      | select(.roleRef.name=="cluster-admin")
      | [.metadata.name,
         (.subjects[]? | "\(.kind):\(.namespace // "-"):\(.name)")]
      | @tsv'

# RoleBindings that reference the cluster-admin ClusterRole inside one namespace
kubectl get rolebinding --all-namespaces -o json \
  | jq -r '.items[]
      | select(.roleRef.kind=="ClusterRole" and .roleRef.name=="cluster-admin")
      | [.metadata.namespace, .metadata.name,
         (.subjects[]? | "\(.kind):\(.namespace // "-"):\(.name)")]
      | @tsv'
```
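jq filters are awkward to unit-test, so here is a minimal Python 3.11 sketch of the same check that runs against saved `kubectl ... -o json` output. The helper names (`kubectl_json`, `subject_label`, `cluster_admin_rows`) are illustrative, not part of any tool. Unlike the jq version, it puts `*` in the namespace column for ClusterRoleBindings so both result sets share one layout.

```python
import json
import subprocess
from typing import Any


def kubectl_json(*args: str) -> dict[str, Any]:
    """Run kubectl with -o json and parse the result (requires kubectl on PATH)."""
    out = subprocess.run(
        ["kubectl", *args, "-o", "json"], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)


def subject_label(subject: dict[str, Any]) -> str:
    # Same Kind:namespace:name shape as the jq filter; '-' when no namespace is set.
    return f"{subject.get('kind')}:{subject.get('namespace') or '-'}:{subject.get('name')}"


def cluster_admin_rows(crbs: dict[str, Any], rbs: dict[str, Any]) -> list[tuple[str, ...]]:
    """Return (namespace, binding, subjects...) rows for every cluster-admin grant."""
    rows: list[tuple[str, ...]] = []
    for item in crbs.get("items", []):
        if item["roleRef"]["name"] == "cluster-admin":
            # ClusterRoleBindings are cluster-wide, so '*' fills the namespace column.
            rows.append(("*", item["metadata"]["name"],
                         *(subject_label(s) for s in item.get("subjects") or [])))
    for item in rbs.get("items", []):
        ref = item["roleRef"]
        if ref["kind"] == "ClusterRole" and ref["name"] == "cluster-admin":
            rows.append((item["metadata"]["namespace"], item["metadata"]["name"],
                         *(subject_label(s) for s in item.get("subjects") or [])))
    return rows
```

On a live cluster, `cluster_admin_rows(kubectl_json("get", "clusterrolebinding"), kubectl_json("get", "rolebinding", "--all-namespaces"))` returns the rows, and `"\t".join(row)` reproduces the TSV output.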
What I did not expect was how boring the names were. Nothing said “danger.” I saw batch-runner, report-sync, maintenance-job, and metrics-exporter. That made the issue easier to miss because the names sounded operationally harmless, while the permissions were effectively root across the Kubernetes API.
Boring names hide sharp edges.
My opinion is simple: every cluster-admin binding should have a ticket, an owner, an expiration date, and a reason that still sounds valid when read aloud six months later.
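One way to make the ticket, owner, expiration, and reason checkable is to store them as annotations on the binding. The sketch below assumes hypothetical annotation keys under `rbac.example.com/`; Kubernetes defines no such convention, so substitute whatever your team adopts.

```python
from datetime import date
from typing import Any

# Hypothetical annotation keys; Kubernetes does not define these.
REQUIRED = ("rbac.example.com/ticket", "rbac.example.com/owner",
            "rbac.example.com/expires", "rbac.example.com/reason")


def binding_problems(binding: dict[str, Any], today: date) -> list[str]:
    """List governance gaps for one cluster-admin binding."""
    annotations = binding.get("metadata", {}).get("annotations") or {}
    problems = [f"missing {key}" for key in REQUIRED if not annotations.get(key)]
    expires = annotations.get("rbac.example.com/expires")
    if expires:
        try:
            # Expiry is assumed to be an ISO date such as 2024-01-31.
            if date.fromisoformat(expires) < today:
                problems.append(f"expired on {expires}")
        except ValueError:
            problems.append(f"unparseable expiry {expires!r}")
    return problems
```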
Choose Namespace Roles Over Cluster-Wide Power
The fix was not clever. Most of the 18 service accounts needed get, list, and watch against configmaps, pods, and jobs in one namespace. A few needed to create jobs in the same namespace. None needed to update ClusterRoles, read secrets everywhere, patch deployments in other production cells, or touch admission controllers. We replaced broad ClusterRoleBinding objects with Role and RoleBinding objects scoped to the namespace where each batch job actually ran.
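A namespaced read-only Role and its RoleBinding for one of those batch jobs might look like the following sketch. The `-read` naming scheme is an assumption for illustration; the API groups are the real ones (`""` for the core group holding pods and configmaps, `batch` for jobs).

```python
import json
from typing import Any


def read_only_role(namespace: str, service_account: str) -> list[dict[str, Any]]:
    """Build a namespaced Role plus RoleBinding for a read-only batch job."""
    role_name = f"{service_account}-read"  # hypothetical naming convention
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            # configmaps and pods live in the core API group, written as "".
            {"apiGroups": [""], "resources": ["configmaps", "pods"],
             "verbs": ["get", "list", "watch"]},
            # Jobs belong to the batch API group.
            {"apiGroups": ["batch"], "resources": ["jobs"],
             "verbs": ["get", "list", "watch"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": role_name},
        "subjects": [{"kind": "ServiceAccount", "name": service_account, "namespace": namespace}],
    }
    return [role, binding]
```

One option for applying it is to wrap both objects in a `v1` `List`, serialize with `json.dumps`, and pipe the result to `kubectl apply -f -`. Job creators would get a separate Role that adds `create` on `jobs`, keeping the read and write grants reviewable on their own.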
- I mapped each service account to its namespace, deployment, and owning team.
- I captured observed API verbs from audit logs before changing permissions.
- I created namespace Roles for read-only jobs and separate write Roles for job creators.
- I removed cluster-admin only after validating the next scheduled run.
- I kept one documented break-glass group outside application service accounts.
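Capturing observed verbs can start as a simple aggregation over Kubernetes audit events. This sketch assumes the events have been exported as one JSON object per line (for example, from the EKS control-plane audit log in CloudWatch Logs) and reads the standard `user.username`, `verb`, and `objectRef` fields. The function name `observed_access` is illustrative.

```python
import json
from collections import defaultdict
from typing import Iterable

SA_PREFIX = "system:serviceaccount:"


def observed_access(lines: Iterable[str]) -> dict[str, set[tuple[str, str, str, str]]]:
    """Map 'namespace:name' of each service account to (namespace, apiGroup, resource, verb)."""
    seen: dict[str, set[tuple[str, str, str, str]]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        user = event.get("user", {}).get("username", "")
        ref = event.get("objectRef")
        # Only service-account calls against API objects matter here.
        if not user.startswith(SA_PREFIX) or not ref:
            continue
        seen[user[len(SA_PREFIX):]].add((
            ref.get("namespace", "-"),
            ref.get("apiGroup", ""),  # the core group is reported without apiGroup
            ref.get("resource", ""),
            event.get("verb", ""),
        ))
    return dict(seen)
```

Because the result is a set, duplicate events from different audit stages collapse into one entry per access pattern.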
The before-and-after was clean: once all 18 service account cluster-admin bindings were replaced with scoped roles, the blast radius of a compromised job pod shrank from the entire cluster to a single namespace. That metric mattered more than the binding count because it described what an attacker could actually do after landing inside a job container.
Scope beats hope.
I prefer Roles by default and ClusterRoles only when the resource itself is cluster-scoped or the operational need is genuinely cross-namespace. Anything else feels like giving a forklift key to someone who asked for a cabinet badge.
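That default can be expressed as a small decision helper. The resource set below is a partial, illustrative list of cluster-scoped resources; `kubectl api-resources --namespaced=false` gives the authoritative list for a given cluster. Non-resource URLs are included because only a ClusterRole can grant them.

```python
from typing import Any

# Partial, illustrative set of cluster-scoped resources.
CLUSTER_SCOPED = {
    "nodes", "namespaces", "persistentvolumes", "storageclasses",
    "clusterroles", "clusterrolebindings", "customresourcedefinitions",
    "validatingwebhookconfigurations", "mutatingwebhookconfigurations",
}


def recommended_kind(rules: list[dict[str, Any]], cross_namespace: bool = False) -> str:
    """Return 'Role' unless the rules touch cluster-scoped objects or need several namespaces."""
    for rule in rules:
        # Strip subresources such as "nodes/proxy" down to the parent resource.
        parents = {r.split("/")[0] for r in rule.get("resources", [])}
        if parents & CLUSTER_SCOPED or rule.get("nonResourceURLs"):
            return "ClusterRole"
    return "ClusterRole" if cross_namespace else "Role"
```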
Use Review Tools Without Outsourcing Judgment
I use kubectl-who-can when I need to answer a direct question fast: who can delete secrets, create pods, impersonate users, or update clusterrolebindings? It is excellent for reversing the view from “what does this subject have” to “who can perform this dangerous action.” That mattered in our environment because some old group bindings were not obvious from the application manifests.
I also use rbac-tool for broader inspection because it makes excessive access easier to compare across namespaces. I ran it from an Ubuntu 22.04 admin workstation, checked output into a restricted evidence folder, and compared weekly snapshots. The tooling did not replace review; it gave my team a repeatable way to ask better questions during change control.
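Comparing weekly snapshots works with any stable export. This sketch does not parse rbac-tool output; it assumes snapshots are built from `kubectl get ... -o json` binding lists and flattened into (namespace, binding, roleRef, subject) tuples, so two weeks can be diffed with plain set arithmetic. The names `grants`, `compare`, `dump`, and `load` are illustrative.

```python
import json
from typing import Any

Grant = tuple[str, str, str, str]  # (namespace or "*", binding, roleRef, subject)


def grants(crbs: dict[str, Any], rbs: dict[str, Any]) -> set[Grant]:
    """Flatten ClusterRoleBinding and RoleBinding lists into one set of grants."""
    out: set[Grant] = set()
    for items, namespaced in ((crbs.get("items", []), False), (rbs.get("items", []), True)):
        for b in items:
            ns = b["metadata"]["namespace"] if namespaced else "*"
            ref = f"{b['roleRef']['kind']}/{b['roleRef']['name']}"
            for s in b.get("subjects") or []:
                subject = f"{s['kind']}:{s.get('namespace') or '-'}:{s['name']}"
                out.add((ns, b["metadata"]["name"], ref, subject))
    return out


def compare(previous: set[Grant], current: set[Grant]) -> dict[str, list[Grant]]:
    """Report grants that appeared or disappeared between two snapshots."""
    return {"added": sorted(current - previous), "removed": sorted(previous - current)}


def dump(snapshot: set[Grant]) -> str:
    # Sorted output keeps the evidence files stable and easy to diff by eye.
    return json.dumps(sorted(snapshot), indent=2)


def load(text: str) -> set[Grant]:
    return {tuple(row) for row in json.loads(text)}
```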
The tool is not the control.
My preferred review questions are direct. Can this service account create pods? Can it read secrets? Can it impersonate another subject? Can it update RBAC resources? Can it write outside its namespace? If the answer is yes, I want the workload owner to explain the business process, not just the YAML. My view is that RBAC review should feel slightly uncomfortable, because excessive access usually survives in places where nobody wants to slow down and ask why.
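Those questions can be answered mechanically before the human conversation starts. This sketch assumes a subject's effective rules have already been collected as (scope, rule) pairs, where scope is a namespace or `*` for a cluster-wide grant; the helper names are hypothetical. It only flags; the workload owner still has to explain the business process.

```python
from typing import Any

WRITE_VERBS = ("create", "update", "patch", "delete", "deletecollection")
RBAC_RESOURCES = ("roles", "clusterroles", "rolebindings", "clusterrolebindings")


def allows(rule: dict[str, Any], verb: str, group: str, resource: str) -> bool:
    def hit(values: list[str], wanted: str) -> bool:
        return "*" in values or wanted in values
    return (hit(rule.get("verbs", []), verb) and hit(rule.get("apiGroups", []), group)
            and hit(rule.get("resources", []), resource))


def review(grants: list[tuple[str, dict[str, Any]]], home_namespace: str) -> dict[str, bool]:
    """Answer the standard review questions for one subject."""
    def can(checks: list[tuple[str, str, str]]) -> bool:
        return any(allows(rule, verb, group, resource)
                   for _scope, rule in grants
                   for verb, group, resource in checks)

    return {
        "create pods": can([("create", "", "pods")]),
        "read secrets": can([(v, "", "secrets") for v in ("get", "list", "watch")]),
        "impersonate": can([("impersonate", "", r) for r in ("users", "groups", "serviceaccounts")]),
        "update RBAC": can([(v, "rbac.authorization.k8s.io", r)
                            for v in ("create", "update", "patch", "delete", "bind", "escalate")
                            for r in RBAC_RESOURCES]),
        # Any write verb granted outside the home namespace (including cluster-wide) counts.
        "write outside namespace": any(
            scope != home_namespace and bool(set(rule.get("verbs", [])) & {"*", *WRITE_VERBS})
            for scope, rule in grants),
    }
```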
Build RBAC Review Into CI/CD
Once the cleanup finished, I did not want to depend on another annual audit. We added a CI/CD gate that scans Kubernetes manifests before they reach the cluster. The gate fails any merge request that introduces cluster-admin bindings for service accounts, references wildcard verbs without an approved label, or binds sensitive ClusterRoles outside our platform namespace. The first version was a Python 3.11 script because my team could read and maintain it without adding another control plane dependency.
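A gate with those three rules could be sketched as below. It is not the team's actual script: it assumes manifests were already parsed into dictionaries upstream (for example with PyYAML's `safe_load_all`), and the platform namespace name, approval label key, and sensitive-role set are hypothetical placeholders. Each finding names the binding, roleRef, and subject so it can be posted to the pull request as-is.

```python
from typing import Any

PLATFORM_NAMESPACE = "platform"  # hypothetical platform namespace
APPROVAL_LABEL = "rbac.example.com/wildcard-approved"  # hypothetical label key
SENSITIVE_CLUSTER_ROLES = {"cluster-admin", "admin", "edit"}  # illustrative set


def gate(manifests: list[dict[str, Any]]) -> list[str]:
    """Return one human-readable finding per policy violation; empty means pass."""
    findings: list[str] = []
    for m in manifests:
        kind = m.get("kind")
        meta = m.get("metadata", {})
        name, ns = meta.get("name", "?"), meta.get("namespace", "-")
        if kind in ("Role", "ClusterRole"):
            wildcard = any("*" in rule.get("verbs", []) for rule in m.get("rules") or [])
            if wildcard and (meta.get("labels") or {}).get(APPROVAL_LABEL) != "true":
                findings.append(f"{kind} {ns}/{name}: wildcard verbs without {APPROVAL_LABEL}=true")
        elif kind in ("RoleBinding", "ClusterRoleBinding"):
            ref = m.get("roleRef", {})
            role = f"{ref.get('kind')}/{ref.get('name')}"
            for s in m.get("subjects") or []:
                subject = f"{s.get('kind')}:{s.get('namespace') or '-'}:{s.get('name')}"
                # A RoleBinding lives in its own namespace; for a ClusterRoleBinding
                # the subject's namespace (None for users and groups) is used instead.
                where = ns if kind == "RoleBinding" else s.get("namespace")
                if role == "ClusterRole/cluster-admin" and s.get("kind") == "ServiceAccount":
                    findings.append(f"{kind} {ns}/{name}: cluster-admin for {subject}")
                elif (ref.get("kind") == "ClusterRole"
                      and ref.get("name") in SENSITIVE_CLUSTER_ROLES
                      and where != PLATFORM_NAMESPACE):
                    findings.append(f"{kind} {ns}/{name}: {role} bound to {subject} "
                                    f"outside {PLATFORM_NAMESPACE}")
    return findings
```

In CI, the job would print the findings and exit non-zero when the list is not empty.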
The gate is intentionally blunt. If a team needs cluster-admin, my team reviews the request and records the exception. That does not block real emergencies because our break-glass process still exists, but it stops quiet permission creep from arriving through normal deployment pipelines. I would rather have one noisy failed build than one silent production privilege expansion.
Friction has a job.
We also made the check visible in pull requests, so application owners see the exact subject, roleRef, and namespace that triggered the failure. That small detail changed the conversation from “security blocked me” to “this binding grants too much.” In my opinion, the best RBAC control is boring, automatic, and close to the code path where the risky permission is created.
Keep Cluster-Admin Rare
I now treat cluster-admin like emergency access, not application access. My team still has production realities: batch jobs fail, vendors send rushed patches, line systems need maintenance windows, and nobody wants a security process that stops manufacturing output. Even so, a service account that runs a scheduled report does not need the Kubernetes equivalent of unrestricted sudo.
The durable answer is not just removing 18 bad bindings. It is changing the default instinct. When a workload needs access, I start with the narrow namespace Role and add verbs only when logs or tests prove they are required. When a platform component needs broader access, I document why the scope cannot be smaller. When someone asks for cluster-admin, I assume the request is incomplete until the reason is precise.
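"Add verbs only when logs prove they are required" also works in reverse: anything granted but never observed is a candidate for removal. This sketch assumes observed access has already been reduced to (apiGroup, resource, verb) triples, for example by aggregating audit logs; wildcard rules cannot be compared this way, so they are reported as a flag instead. The function name is illustrative.

```python
from typing import Any

Access = tuple[str, str, str]  # (apiGroup, resource, verb)


def unused_grants(rules: list[dict[str, Any]], observed: set[Access]) -> tuple[list[Access], bool]:
    """Return granted-but-unobserved triples and whether any wildcard was present."""
    granted: set[Access] = set()
    has_wildcard = False
    for rule in rules:
        # Expand each rule into concrete (group, resource, verb) combinations.
        for group in rule.get("apiGroups", []):
            for resource in rule.get("resources", []):
                for verb in rule.get("verbs", []):
                    if "*" in (group, resource, verb):
                        has_wildcard = True
                    else:
                        granted.add((group, resource, verb))
    return sorted(granted - observed), has_wildcard
```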
Least privilege is maintenance work.
I do not believe Kubernetes RBAC fails because the model is weak. I believe it fails because busy teams use powerful roles as lubricant, then move on after the deployment turns green. That was my mistake too, and the only fix I trust is a recurring review backed by CI/CD enforcement, audit logs, and a low tolerance for convenience permissions.

