Automated Vulnerability Scanning Pipeline: From Git Push to Remediation Ticket

The Production Image That Changed My Scan Design

We kept finding CVSS 9.x vulnerabilities in production container images that had been running for months. None of them had been in our scan baseline at deployment. My team found the first one during a FortiOS 7.4.3 firewall rule review tied to an internal manufacturing app, not during a clean security audit, which made the whole thing more uncomfortable. The container had passed our build pipeline, shipped to Ubuntu 22.04 nodes, and quietly aged into a known-bad state while our dashboards still showed green.

My first assumption was wrong. I assumed a clean build-time scan meant the deployed image stayed clean enough until the next release. Our vulnerability scan ran at build time but not against the running images, so CVEs published after deployment stayed invisible until the next build cycle. In a plant environment where some support services only changed once a quarter, that blind spot was unacceptable.

We had Python 3.11 automation scripts, GitLab CI, Trivy 0.49.1, JIRA Cloud, and a small security team that already had too many manual checks. I did not need another dashboard that someone had to remember to read. I needed the scan result to hit the engineering workflow before the risk became background noise.

Green was lying.

Wire Trivy Into The Merge Path

I started by making Trivy part of the pre-merge and image-build path, because that was the fastest place to stop obvious mistakes. We scanned file systems before container build, scanned the final image after build, and stored the JSON output as a pipeline artifact. I kept the first gate narrow: OS packages, application dependencies, and secrets. I did not want the team arguing about theoretical findings while a critical OpenSSL package sat in the image.

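stages:
  - scan
  - build    # the build job itself is not shown here
  - verify

trivy_premerge:
  stage: scan
  image:
    name: aquasec/trivy:0.49.1
    # The image's default entrypoint is the trivy binary; clear it so
    # GitLab can run the script lines in a shell.
    entrypoint: [""]
  script:
    # Exit code 1 fails the job on HIGH or CRITICAL findings in the repository.
    - trivy fs --exit-code 1 --severity HIGH,CRITICAL --format json --output trivy-fs.json .
  artifacts:
    # Keep the JSON report even when the gate fails.
    when: always
    paths:
      - trivy-fs.json

trivy_image:
  stage: verify
  image:
    name: aquasec/trivy:0.49.1
    entrypoint: [""]
  script:
    # Scan the image that the build stage pushed for this commit.
    - trivy image --exit-code 1 --severity HIGH,CRITICAL --format json --output trivy-image.json registry.local/apps/workorder-api:$CI_COMMIT_SHA
  artifacts:
    when: always
    paths:
      - trivy-image.json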

The practical benefit was speed. When a developer introduced a vulnerable library, we caught it before merge, while the dependency decision was still fresh in their head. A code review comment that says “bump this base image from python:3.11.6-slim to python:3.11.8-slim” is easy to handle. A production incident ticket three months later is not.
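As a rough illustration of how the stored JSON artifact can become that kind of review comment, the sketch below pulls the HIGH and CRITICAL entries out of a Trivy report and prints one line per finding. The function names and the sample report are hypothetical, and the field names (Results, Vulnerabilities, PkgName, InstalledVersion, FixedVersion, Severity) are assumed to match the JSON that Trivy 0.49.x writes.

```python
GATED_SEVERITIES = {"HIGH", "CRITICAL"}


def gated_findings(report: dict) -> list[dict]:
    """Return the HIGH/CRITICAL vulnerabilities from a parsed Trivy JSON report."""
    findings = []
    for result in report.get("Results") or []:
        # Trivy omits the Vulnerabilities key when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in GATED_SEVERITIES:
                findings.append({
                    "target": result.get("Target", ""),
                    "id": vuln.get("VulnerabilityID", ""),
                    "package": vuln.get("PkgName", ""),
                    "installed": vuln.get("InstalledVersion", ""),
                    "fixed": vuln.get("FixedVersion") or None,
                    "severity": vuln["Severity"],
                })
    return findings


def review_lines(findings: list[dict]) -> list[str]:
    """Format each finding as a short line suitable for a review comment."""
    lines = []
    for f in findings:
        fix = f["fixed"] or "no fixed version listed"
        lines.append(
            f"{f['severity']} {f['id']}: {f['package']} {f['installed']} "
            f"-> {fix} ({f['target']})"
        )
    return lines


# Hypothetical sample shaped like a Trivy report; in the pipeline this
# would come from json.load() on trivy-fs.json or trivy-image.json.
sample_report = {
    "Results": [
        {
            "Target": "requirements.txt",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "examplelib",
                 "InstalledVersion": "1.0.0", "FixedVersion": "1.0.1",
                 "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "otherlib",
                 "InstalledVersion": "2.0.0", "Severity": "LOW"},
            ],
        },
        {"Target": "clean-target"},
    ]
}

for line in review_lines(gated_findings(sample_report)):
    print(line)
```

The gate itself still belongs to Trivy's exit code; a helper like this only makes the failure easier to read in the merge request.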

I prefer hard pre-merge gates for CVSS 7.0 and higher because ambiguity creates drift. A warning-only pipeline sounds collaborative until the tenth exception becomes the standard operating model.

Separate Build-Time And Runtime Risk

The missing piece was runtime re-evaluation. Build-time scanning answered, “What did we ship?” Runtime scanning answered, “What are we still running now that the threat data changed?” Those are different questions, and treating them as the same question was my mistake.

We added a weekly job that pulled the running image digests from our Kubernetes clusters, matched them against the registry, and rescanned by digest rather than tag. That detail mattered because a tag can be repointed at a different image, while a digest identifies exactly the image that is running. The job ran from an Ubuntu 22.04 runner with read-only registry access and used a Python 3.11 script to normalize cluster inventory before Trivy scanned each digest.
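The inventory normalization could look something like the sketch below. It is not the script described above, only a minimal version under stated assumptions: the input is the parsed output of `kubectl get pods --all-namespaces -o json`, each container status carries an imageID of the form `repo@sha256:...` (sometimes with a scheme prefix such as `docker-pullable://`), and the function names are hypothetical.

```python
import re

# Optional scheme prefix, then "repo@sha256:<64 hex characters>".
IMAGE_ID_RE = re.compile(
    r"(?:[a-z][a-z0-9+.-]*://)?(?P<repo>[^@\s]+)@(?P<digest>sha256:[0-9a-f]{64})$"
)


def running_digests(pod_list: dict) -> dict[str, set[str]]:
    """Map each 'repo@sha256:...' reference to the 'namespace/pod' names using it."""
    inventory: dict[str, set[str]] = {}
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        owner = f"{meta.get('namespace', 'default')}/{meta.get('name', '')}"
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            match = IMAGE_ID_RE.match(status.get("imageID", ""))
            if match:  # skip containers that report no registry digest
                ref = f"{match['repo']}@{match['digest']}"
                inventory.setdefault(ref, set()).add(owner)
    return inventory


def trivy_command(image_ref: str, output_path: str) -> list[str]:
    """Build the argument list for scanning one digest-pinned image."""
    return ["trivy", "image", "--format", "json", "--output", output_path, image_ref]


# Hypothetical two-pod inventory that shares a single image digest.
digest = "sha256:" + "a" * 64
pods = {
    "items": [
        {"metadata": {"namespace": "plant", "name": "workorder-api-1"},
         "status": {"containerStatuses": [
             {"imageID": f"docker-pullable://registry.local/apps/workorder-api@{digest}"}]}},
        {"metadata": {"namespace": "plant", "name": "workorder-api-2"},
         "status": {"containerStatuses": [
             {"imageID": f"registry.local/apps/workorder-api@{digest}"}]}},
    ]
}

for ref, owners in running_digests(pods).items():
    print(" ".join(trivy_command(ref, "rescan.json")), "# used by", sorted(owners))
```

Collapsing pods onto digests first means each unique image is scanned once per run, no matter how many replicas are using it.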

What I didn’t expect was how many findings were not caused by bad deployments. They were caused by time. A clean image on Monday became a risky image after a Thursday CVE publication, and nothing in our old process noticed because no one had rebuilt the service.

Time is an attack surface.

After adding runtime image scanning with weekly re-evaluation, average time to discover new CVEs in production dropped from 47 days to 6 days. That single number changed how our operations team looked at vulnerability management. We stopped treating scans as release paperwork and started treating them as production telemetry.

My opinion is simple: if I only scan at build time, I am managing compliance evidence, not production risk.

Turn Findings Into JIRA Work

Once runtime scans produced reliable findings, I connected them to JIRA. I avoided dumping every CVE into the queue because that would have trained everyone to ignore the automation. The script grouped findings by service, image digest, package name, installed version, fixed version, and CVSS score. One service owner got one ticket per actionable package family, not twenty duplicate alerts from twenty pods.

  • I included the image digest so engineering could reproduce the exact scan target.
  • I included the vulnerable package and installed version so remediation did not start with guesswork.
  • I included the fixed version when Trivy provided one.
  • I mapped each ticket to the owning application team from our service catalog.
  • I added production exposure context from namespace and cluster labels.
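The grouping step can be sketched roughly as follows. This is a simplified illustration, not the original script: it assumes each finding is already a flat dictionary with the hypothetical keys shown, and it treats "package family" as simply the package name plus installed version.

```python
from collections import defaultdict


def group_findings(findings: list[dict]) -> list[dict]:
    """Collapse per-pod findings into one ticket candidate per
    (service, image digest, package, installed version)."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for f in findings:
        key = (f["service"], f["image_digest"], f["package"], f["installed_version"])
        groups[key].append(f)

    tickets = []
    for (service, digest, package, installed), items in groups.items():
        tickets.append({
            "service": service,
            "image_digest": digest,
            "package": package,
            "installed_version": installed,
            # Fixed version is included only when the scanner reported one.
            "fixed_versions": sorted({f["fixed_version"] for f in items if f.get("fixed_version")}),
            "cve_ids": sorted({f["cve_id"] for f in items}),
            "max_cvss": max(f["cvss"] for f in items),
            "pods": sorted({f["pod"] for f in items}),
        })
    return tickets


# Hypothetical input: the same finding reported by twenty pods.
sample_findings = [
    {"service": "workorder-api", "image_digest": "sha256:" + "a" * 64,
     "package": "openssl", "installed_version": "3.0.0", "fixed_version": "3.0.1",
     "cve_id": "CVE-0000-0001", "cvss": 9.8, "pod": f"workorder-api-{n}"}
    for n in range(20)
]

tickets = group_findings(sample_findings)
print(len(tickets), "ticket candidate(s) covering", len(tickets[0]["pods"]), "pods")
```

Keeping the pod list inside the ticket preserves the exposure context without multiplying the tickets.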

The JIRA title format was deliberately boring: “CRITICAL CVE in workorder-api image digest sha256:…” Boring titles made search and reporting easier. The description carried the detail, including Trivy output, package path, CVSS vector, and the expected remediation path. We also added a label for “runtime-rescan” so we could separate delayed CVE discovery from pipeline failures.

The ticket is the control.

I also built idempotency into the JIRA creation step. If the same vulnerable package on the same image digest already had an open ticket, the automation updated the existing ticket instead of creating noise. If a later scan showed the issue fixed, the automation commented with the clean scan evidence and moved the ticket to verification. That closed the loop without forcing my team to babysit a queue.
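One way to get that idempotency is to derive a stable key for each finding and compare it with the keys already attached to open tickets. The sketch below shows only the decision logic and makes no JIRA calls; the key format, the function names, and the idea of storing the key on the ticket (for example as a label) are assumptions for illustration.

```python
import hashlib


def dedupe_key(service: str, image_digest: str, package: str) -> str:
    """Stable identifier for 'this package on this digest for this service'."""
    raw = "|".join((service, image_digest, package))
    return "vuln-" + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def plan_actions(open_tickets: dict[str, str], current_keys: set[str]) -> list[tuple[str, str]]:
    """Decide what to do with each finding.

    open_tickets maps a dedupe key to the key of an existing open ticket.
    current_keys holds the dedupe keys produced by the latest scan.
    """
    actions = []
    for key in sorted(current_keys):
        if key in open_tickets:
            actions.append(("update", open_tickets[key]))   # refresh the existing ticket
        else:
            actions.append(("create", key))                 # no open ticket yet
    for key, issue in sorted(open_tickets.items()):
        if key not in current_keys:
            actions.append(("verify", issue))               # finding gone: comment and move on
    return actions


# Hypothetical state: one known finding, one new finding, one fixed finding.
digest = "sha256:" + "a" * 64
known = dedupe_key("workorder-api", digest, "openssl")
new = dedupe_key("workorder-api", digest, "zlib")
fixed = dedupe_key("workorder-api", digest, "curl")

for action in plan_actions({known: "SEC-1", fixed: "SEC-2"}, {known, new}):
    print(action)
```

Because the key depends only on service, digest, and package, rerunning the same scan yields the same plan, which is what keeps the queue quiet.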

I would rather have fewer tickets that engineers trust than a perfect vulnerability export that everyone hates.

Define CVSS Policy Without Pretending It Is Math

Our first policy draft tried to be too clever. We used CVSS, exploit maturity, internet exposure, package type, and environment labels. It looked mature, but it slowed every decision. In a manufacturing facility, I need predictable rules that work during a maintenance window when the line is down and everyone wants an answer now.

We landed on a sharper model. CVSS 9.0 and higher created an urgent remediation ticket and paged the app owner during business hours. CVSS 7.0 through 8.9 blocked merge in CI and created tracked production tickets during runtime scans. CVSS below 7.0 stayed visible in reports unless a known exploited vulnerability flag appeared. That last exception mattered because a medium CVSS with active exploitation can be more relevant than a theoretical high.
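Written as code, a policy like this fits in one short function. The sketch below is an illustration only: the action names are hypothetical, and two details are assumptions rather than part of the policy as stated, namely that a 9.0+ finding in CI simply blocks the merge, and that a known-exploited finding below 7.0 is handled like the 7.0 to 8.9 band.

```python
def decide(cvss: float, context: str, known_exploited: bool = False) -> list[str]:
    """Return the actions for one finding.

    context is "ci" for pre-merge scans or "runtime" for weekly rescans.
    """
    if context not in ("ci", "runtime"):
        raise ValueError(f"unknown context: {context!r}")
    if not 0.0 <= cvss <= 10.0:
        raise ValueError(f"CVSS score out of range: {cvss}")

    if cvss >= 9.0:
        if context == "ci":
            return ["block_merge"]  # assumption: CI only blocks, it does not page
        return ["urgent_ticket", "page_owner_business_hours"]

    # Assumption: a known-exploited flag lifts a lower score into this band.
    if cvss >= 7.0 or known_exploited:
        return ["block_merge"] if context == "ci" else ["tracked_ticket"]

    return ["report_only"]


for score, ctx, kev in [(9.8, "runtime", False), (7.5, "ci", False),
                        (7.5, "runtime", False), (5.0, "runtime", False),
                        (5.0, "runtime", True)]:
    print(score, ctx, kev, decide(score, ctx, kev))
```

A function this small can be read aloud during a maintenance window, which is most of the point of keeping the policy blunt.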

Policy should be blunt enough to use.

We also treated compensating controls honestly. A service isolated behind FortiOS 7.4.3 segmentation still needed remediation, but the SLA differed from a service exposed through a partner VPN. Segmentation changed priority; it did not erase the finding. I have seen too many teams use “internal only” as a permission slip to do nothing.

My strongest view after running this pipeline is that blocking should happen closest to the developer, while tracking should happen closest to production. Mixing those two jobs creates a system that is both noisy and late.

Keep The Feedback Loop Close To The Fix

The pipeline worked because it put each finding where it could still be acted on. Pre-merge Trivy scans caught risky dependencies before review ended. Image scans caught bad base layers before deployment. Runtime rescans caught new CVEs after deployment without waiting for the next release. JIRA turned the result into accountable work instead of another security spreadsheet.

I still tune thresholds, and I still review exceptions manually when a production service has operational constraints. The difference is that my team now argues from current evidence. We know which digest is running, which CVE applies, which fixed package exists, and who owns the service. That is a better conversation than asking why a vulnerability sat for months in a container nobody remembered to rebuild.

Minutes beat months.

The key insight has held up in every review since: a vulnerability discovered at code review takes minutes to fix; the same vulnerability discovered post-production takes weeks and sometimes a breach investigation. I do not want vulnerability management to depend on memory, quarterly rebuilds, or someone scrolling through scanner output after lunch. I want the pipeline to keep looking after the deployment, because production keeps changing even when the code does not.

My opinion is that runtime scanning is no longer optional for containerized manufacturing systems. If the image is still running, the risk is still moving.

The scan that matters most is the one that runs after everyone thinks the release is finished.
