Supply Chain Security on a Kubernetes Cluster I Definitely Broke Multiple Times
Software supply chain attacks have been a known problem since at least SolarWinds and Log4Shell. But in the age of AI, the blast radius has gotten significantly larger.
Take the LiteLLM compromise in March 2026. LiteLLM is a Python package downloaded 3.4 million times per day – it acts as a unified gateway to OpenAI, Anthropic, Azure AI, and every other major LLM provider. Two versions (1.82.7 and 1.82.8) were trojanized through a cascading attack that started with a compromised Trivy GitHub Action. The malicious payload did three things: harvested credentials from over 50 categories (SSH keys, AWS/GCP/Azure tokens, Kubernetes secrets, LLM API keys), performed lateral movement across Kubernetes clusters by spinning up privileged pods on every node, and installed a persistent backdoor that polled a C2 server every 50 minutes. Because LiteLLM sits between application code and every model API, compromising it handed attackers the API keys for multiple LLM providers simultaneously – one package, all keys.
A month later, in April 2026, the Bitwarden CLI npm package was compromised. 250,000 monthly downloads. The malicious version systematically harvested Azure, AWS, GCP, GitHub, npm, and SSH credentials, then weaponized the stolen GitHub tokens to extract more secrets from repositories.
What both attacks have in common: the victim organizations had no reliable way to know what was actually running inside their dependencies. If you’re not generating SBOMs, signing them, and verifying them at deployment time – you’re flying blind.
That’s what this project is. An end-to-end supply chain security pipeline for a Kubernetes cluster – admission control, automated SBOM generation, cryptographic attestation with Cosign, and a vulnerability enrichment module that pulls CVE data from three different sources to give you a real risk profile per package. Built it from scratch. Broke things constantly. Had fun.
One thing worth mentioning upfront: the README and the occasional inline comment were written with LLM assistance. Everything else – the architecture decisions, the code, the debugging, the tooling choices – was done entirely by me. No “what’s the best practice here?” prompts, no asking an AI to design the system. The whole point was to think through it myself, make my own calls, and actually understand what I was building. I wanted to be the one connecting the dots, not just shipping whatever Claude recommended.
What’s an SBOM and Why Should You Care?
SBOM stands for Software Bill of Materials. It’s exactly what it sounds like – a manifest of every library, package, and dependency that went into building a piece of software. Think of it like a nutrition label, but for your container image.
Why does this matter? Because container images are black boxes by default. You pull nginx:latest, deploy it, and pray. An SBOM tells you:
What exact packages are inside
What versions they’re running
Which ones have known CVEs (once you match that list against a vulnerability database)
The CycloneDX format is what I used here – it’s one of the two main SBOM standards (the other being SPDX). JSON output, machine-readable, and Trivy generates it natively.
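To give a feel for the format, here is a minimal sketch that reads a CycloneDX JSON SBOM and lists each component's name, version, and Package URL (purl). The inline document is a hand-written, heavily trimmed example, not real Trivy output; actual SBOMs carry many more fields, but the components array is the part the enrichment step later in this post works from.

```python
import json

# Hand-written, trimmed CycloneDX document for illustration only.
EXAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "flask", "version": "3.0.3",
     "purl": "pkg:pypi/flask@3.0.3"},
    {"type": "library", "name": "openssl", "version": "3.0.13-1",
     "purl": "pkg:deb/debian/openssl@3.0.13-1"},
    {"type": "operating-system", "name": "debian", "version": "12.5"}
  ]
}
"""


def list_packages(sbom: dict) -> list[tuple[str, str, str]]:
    """Return (name, version, purl) for every component that has a purl."""
    if sbom.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX document")
    packages = []
    for component in sbom.get("components", []):
        purl = component.get("purl")
        if purl:  # OS entries and some metadata components carry no purl
            packages.append((component["name"], component.get("version", ""), purl))
    return packages


for name, version, purl in list_packages(json.loads(EXAMPLE_SBOM)):
    print(f"{name:10} {version:12} {purl}")
```

The purl is what makes the SBOM useful downstream: it is the key you hand to a vulnerability database.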
The Architecture (High Level)
There are three layers to this project:
Admission Control – A Kubernetes validating webhook that intercepts pod and deployment creation requests and enforces requirements before resources are admitted to the cluster.
Supply Chain Pipeline – A GitHub Actions CI/CD pipeline that builds the webhook container image, generates a CycloneDX SBOM using Trivy, cryptographically signs and attaches the SBOM to the image using Cosign, and auto-commits the artifact back to the repository.
Vulnerability Intelligence – A Python module that enriches SBOM package data with CVE information from OSV, exploit probability scores from FIRST EPSS, and base severity scores from NVD CVSS – producing a consolidated risk profile per package.
Part 1: The Admission Webhook
The webhook is a Python Flask app served by Gunicorn over TLS on port 443. Kubernetes sends it AdmissionReview JSON objects whenever a pod or deployment creation request hits the API server. The webhook reads it, applies the validation logic, and returns allowed: true or allowed: false with a rejection message.
Here’s the shape of a real request the API server sends (simplified):
{
  "kind": "AdmissionReview",
  "apiVersion": "admission.k8s.io/v1",
  "request": {
    "uid": "7e8214a0-b076-4aeb-88d0-c677912f6a1a",
    "kind": { "group": "apps", "version": "v1", "kind": "Deployment" },
    "operation": "CREATE",
    "object": {
      "spec": {
        "template": {
          "spec": {
            "containers": [{
              "name": "my-app",
              "image": "my-image:latest",
              "env": [{ "name": "LABEL", "value": "deployment" }]
            }]
          }
        }
      }
    }
  }
}
And the response the webhook sends back:
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "7e8214a0-b076-4aeb-88d0-c677912f6a1a",
    "allowed": true,
    "status": { "message": "deployment label exists" }
  }
}
The uid in the response must match the uid from the request; that’s how the API server knows which request the decision applies to. If you get this wrong, the API server rejects the response as invalid and treats the call as a webhook failure, so the failurePolicy decides what happens next.
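Building the response is mostly a matter of echoing the right fields back. A minimal sketch of such a helper; the function name is hypothetical and not taken from the project’s code. It copies the uid from the request and echoes the request’s apiVersion, so a v1 review gets a v1 response.

```python
from typing import Any


def admission_response(review: dict[str, Any], allowed: bool, message: str) -> dict[str, Any]:
    """Wrap an allow/deny decision in an AdmissionReview response."""
    return {
        # Echo the request's apiVersion so v1 requests get v1 responses
        "apiVersion": review.get("apiVersion", "admission.k8s.io/v1"),
        "kind": "AdmissionReview",
        "response": {
            # Must match the request uid or the API server rejects the response
            "uid": review["request"]["uid"],
            "allowed": allowed,
            "status": {"message": message},
        },
    }
```

Fed the sample request above with allowed=True and the message "deployment label exists", this produces the sample response shown.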
How Kubernetes Routes to the Webhook
The magic happens in a ValidatingWebhookConfiguration resource. You register your webhook server with the API server, tell it which resource types to intercept (pods, deployments), tell it how to reach the webhook (here, a Service reference rather than a raw URL), and give it a CA bundle so it can verify the TLS cert.
Here’s the actual manifest:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validating-webhook-config
webhooks:
  - admissionReviewVersions: ["v1", "v1beta1"]
    name: validate.default.svc.cluster.local
    failurePolicy: Fail
    sideEffects: None
    clientConfig:
      caBundle: <base64-encoded-CA-cert>
      service:
        name: validate
        namespace: default
        path: /validate
        port: 443
    rules:
      - operations: ["CREATE"]
        apiGroups: ["apps", ""]
        apiVersions: ["v1"]
        resources: ["deployments", "pods"]
A few things worth noting here. failurePolicy: Fail means that if the webhook is unreachable, the admission request is denied, so if your webhook pod crashes, nothing gets deployed until it comes back up. That includes the webhook’s own replacement pod, which is itself a pod CREATE; an objectSelector that skips the webhook’s own pods avoids that deadlock. caBundle is the base64-encoded CA cert that the API server uses to verify your webhook’s TLS cert. And rules is where you declare what operations and resource types trigger the webhook: in this case, CREATE operations on deployments and pods.
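The caBundle value is just the CA certificate’s whole PEM file (header lines included) run through base64. A small standard-library sketch of filling in the placeholder; the function names and the ca.crt / manifest file layout are assumptions about how you template the config, not the project’s actual tooling.

```python
import base64
from pathlib import Path

PLACEHOLDER = "<base64-encoded-CA-cert>"


def ca_bundle(pem: bytes) -> str:
    """Base64-encode a PEM CA certificate for clientConfig.caBundle."""
    if b"-----BEGIN CERTIFICATE-----" not in pem:
        raise ValueError("expected a PEM-encoded certificate")
    # One unwrapped line. GNU `base64` wraps at 76 columns by default,
    # which is an easy way to end up with a broken caBundle.
    return base64.b64encode(pem).decode("ascii")


def render_manifest(template: str, pem: bytes) -> str:
    """Replace the placeholder in the webhook config with the real bundle."""
    if PLACEHOLDER not in template:
        raise ValueError("placeholder not found in manifest")
    return template.replace(PLACEHOLDER, ca_bundle(pem))


def render_from_files(ca_path: str, manifest_path: str) -> str:
    """Hypothetical file layout: read the CA cert and manifest template from disk."""
    return render_manifest(Path(manifest_path).read_text(), Path(ca_path).read_bytes())
```

Something like print(render_from_files("ca.crt", "webhook-config.yaml")) piped into kubectl apply -f - would then register the webhook with the real bundle.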
When a user runs kubectl apply -f my-deployment.yaml, the request hits the API server, gets routed to your webhook before it’s written to etcd, and only lands in the cluster if your webhook signs off on it.
User applies Pod or Deployment manifest
        |
        v
Kubernetes API Server intercepts CREATE request
        |
        v
Routes to ValidatingWebhookConfiguration
        |
        v
HTTPS POST -> validate.default.svc:443/validate
        |
        v
Webhook Pod (Flask + Gunicorn)
  - Extracts env variables from first container
  - Checks for required LABEL value
  - Returns AdmissionReview response
        |
   _____|_____
  |           |
Allowed     Denied
  |           |
Resource    Request
created     rejected
The validation logic itself is simple on purpose: the webhook checks for a required LABEL environment variable in the submitted pod spec. That’s the hook. In a real deployment you’d swap this out for image signature verification, SBOM attestation checks, whatever policy you care about. The infrastructure is the point.
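The check boils down to finding the first container (the path differs between a Pod and a Deployment) and reading its LABEL env var. A minimal sketch of that logic, not the project’s actual code: ALLOWED_LABELS, first_container, and check_label are hypothetical names, and the accepted value set is an assumption. In the real webhook, something like this would be called from the Flask /validate route with the request’s JSON body, and its result wrapped in an AdmissionReview response.

```python
from typing import Any, Optional

# Hypothetical policy: which LABEL values the webhook accepts.
ALLOWED_LABELS = {"deployment"}


def first_container(obj: dict[str, Any], kind: str) -> Optional[dict[str, Any]]:
    """Return the first container spec of a Pod or Deployment object."""
    if kind == "Deployment":
        # Deployments nest the pod spec under spec.template.spec
        pod_spec = obj.get("spec", {}).get("template", {}).get("spec", {})
    else:
        # Pods carry containers directly under spec
        pod_spec = obj.get("spec", {})
    containers = pod_spec.get("containers") or []
    return containers[0] if containers else None


def check_label(review: dict[str, Any]) -> tuple[bool, str]:
    """Decide whether to admit the object in an AdmissionReview request."""
    request = review["request"]
    container = first_container(request.get("object") or {}, request["kind"]["kind"])
    if container is None:
        return False, "no containers found"
    # Entries using valueFrom instead of value count as unset here
    env = {e["name"]: e.get("value") for e in container.get("env") or []}
    label = env.get("LABEL")
    if label in ALLOWED_LABELS:
        return True, f"{label} label exists"
    return False, "LABEL env var missing or not an accepted value"
```

Run against the sample Deployment request earlier in this section, this returns (True, "deployment label exists"), matching the sample response.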
The TLS Problem
Here’s where things got annoying. The Kubernetes API server will only call your webhook over HTTPS, and it verifies the cert against the caBundle you provide in the ValidatingWebhookConfiguration. So you need:
A CA keypair
A server cert signed by that CA
The CA cert base64-encoded and embedded in your webhook config
The server cert and key stored in a Kubernetes Secret and mounted into the webhook pod
I generated all of this with OpenSSL. The important part is getting the Subject Alternative Names right in csr.conf: they need to match the Kubernetes service DNS name (validate.default.svc) exactly, or the API server will refuse to talk to your webhook and everything will break silently.
Here’s the csr.conf:
[ req ]
default_bits = 2048
prompt = no
default_md = sha256
req_extensions = req_ext
distinguished_name = dn
[ dn ]
C = US
ST = NY
L = NY
CN = 192.168.0.109
[ req_ext ]
subjectAltName = @alt_names
[ alt_names ]
DNS.1 = validate
DNS.2 = validate.default
DNS.3 = validate.default.svc
DNS.4 = validate.default.svc.cluster
DNS.5 = validate.default.svc.cluster.local
IP.1 = 192.168.0.109
[ v3_ext ]
authorityKeyIdentifier=keyid,issuer:always
basicConstraints=CA:FALSE
keyUsage=keyEncipherment,dataEncipherment
extendedKeyUsage=serverAuth,clientAuth
subjectAltName=@alt_names
Strictly speaking, the API server only needs validate.default.svc: when clientConfig points at a service, it connects to <service>.<namespace>.svc and checks the cert against that name. The other entries cover the shorter and longer forms the same service answers to (validate from inside the namespace, validate.default.svc.cluster.local as the fully qualified name), so listing all five means you won’t be chasing a SAN mismatch if something reaches the webhook by a different name. Replace the IP with your node’s actual IP, or remove it if you don’t need direct IP access.
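Since the five names follow a fixed pattern (every dot-prefix of the fully qualified service name), a tiny generator keeps the [ alt_names ] section in sync with whatever service and namespace you deploy to. A sketch; the function name is hypothetical, and cluster.local is the default cluster domain, which is an assumption about your cluster.

```python
def alt_names(service: str, namespace: str, cluster_domain: str = "cluster.local",
              ips: tuple[str, ...] = ()) -> str:
    """Render an OpenSSL [ alt_names ] section for a Kubernetes Service.

    Lists every dot-prefix of <service>.<namespace>.svc.<cluster_domain>,
    shortest first, followed by any IP entries.
    """
    parts = f"{service}.{namespace}.svc.{cluster_domain}".split(".")
    names = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    lines = ["[ alt_names ]"]
    lines += [f"DNS.{n} = {name}" for n, name in enumerate(names, start=1)]
    lines += [f"IP.{n} = {ip}" for n, ip in enumerate(ips, start=1)]
    return "\n".join(lines)


print(alt_names("validate", "default", ips=("192.168.0.109",)))
```

With the values used in this post, it prints the same [ alt_names ] block as the csr.conf above.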
I know this because it broke silently the first time. Resources were being admitted without going through the webhook at all, and nothing in kubectl output told me why. I had to exec into the webhook pod and manually send a test AdmissionReview request with curl to confirm whether the server was even receiving anything. It wasn’t. Then I pulled the pod logs: no incoming requests, no TLS errors, nothing. The API server had simply stopped routing to the webhook because the cert SAN (Subject Alternative Name, the field in a TLS certificate that lists the DNS names and IPs the cert is valid for) didn’t match the service DNS name, and it failed open rather than loudly. (Failing open is what failurePolicy: Ignore does; under failurePolicy: Fail, the same mismatch shows up as a “failed calling webhook” error on the request instead.)
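The curl test described above can also be scripted with Python’s standard library, which has the advantage of verifying the server cert against your own CA and the hostname the way the API server does. A sketch under assumptions: the helper names, URL, and ca.crt path are hypothetical, and it has to run somewhere that can resolve the service name (for example, another pod in the cluster).

```python
import json
import ssl
import urllib.request
import uuid


def build_review(kind: str, obj: dict) -> dict:
    """Build a minimal AdmissionReview request like the API server sends."""
    group = "apps" if kind == "Deployment" else ""
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": str(uuid.uuid4()),
            "kind": {"group": group, "version": "v1", "kind": kind},
            "operation": "CREATE",
            "object": obj,
        },
    }


def send_review(url: str, review: dict, cafile: str, timeout: float = 5.0) -> dict:
    """POST the review over HTTPS, trusting only the given CA.

    create_default_context() also checks the URL's hostname against the
    cert's SANs, so a SAN mismatch fails here instead of silently.
    """
    context = ssl.create_default_context(cafile=cafile)
    req = urllib.request.Request(
        url,
        data=json.dumps(review).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, context=context, timeout=timeout) as resp:
        return json.load(resp)
```

With a bad SAN, send_review("https://validate.default.svc/validate", build_review("Pod", {...}), "ca.crt") raises a urllib.error.URLError whose reason is an ssl.SSLCertVerificationError; with a good one, the returned response.uid should equal the uid you sent.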
The private keys and certs are in .gitignore. The webhook-secret.yaml is also not committed: it contains the base64-encoded TLS private key and gets generated locally at deploy time. Don’t commit your TLS private keys. Don’t do it.
Part 2: The CI/CD Supply Chain Pipeline
This is where it gets more interesting. Every push to admissions-controller/** triggers a three-job GitHub Actions pipeline:
Job 1: Build
Docker Buildx builds the webhook image and pushes it to Docker Hub. The key detail is capturing the immutable image digest as a job output so downstream jobs can reference the exact image that was built:
jobs:
  docker-build-and-push:
    runs-on: ubuntu-latest
    outputs:
      digest: ${{ steps.build-and-push.outputs.digest }}
    steps:
      - uses: docker/build-push-action@v6
        id: build-and-push
        with:
          push: true
          tags: liquidbread0/sbom-validating-webhook:validatingWebhookImage
Why digest instead of tag? Because tags are mutable: latest today is not latest tomorrow. The digest is the SHA-256 hash of the image manifest, which itself references every layer by hash, so a digest always points at exactly the same image content.
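The same idea can be enforced as a check: a reference is pinned only if it ends in @sha256: followed by 64 hex characters. A sketch of such a check, with hypothetical helper names; it is not something the project does today, but it is the kind of test an admission webhook could apply to every container image.

```python
import re

# <name>[:<tag>]@sha256:<64 lowercase hex characters>
PINNED = re.compile(r"(?P<name>[^@\s]+)@sha256:(?P<hex>[0-9a-f]{64})")


def is_pinned(image: str) -> bool:
    """True if the image reference is pinned to a sha256 digest."""
    return PINNED.fullmatch(image) is not None


def split_digest(image: str) -> tuple[str, str]:
    """Split 'repo[:tag]@sha256:...' into ('repo[:tag]', 'sha256:...')."""
    match = PINNED.fullmatch(image)
    if match is None:
        raise ValueError(f"{image!r} is not pinned by digest")
    return match["name"], "sha256:" + match["hex"]
```

nginx:latest fails the check; the $TAGS@$DIGEST reference the pipeline hands to Cosign in Job 3 passes it.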
Job 2: SBOM Generation
Trivy scans the image and outputs a CycloneDX SBOM in JSON format. The job then auto-commits the SBOM back to the repo and uploads it as a workflow artifact for the next job to consume:
sbom-generation-docker-image:
  runs-on: ubuntu-latest
  needs: docker-build-and-push
  permissions:
    contents: write
  steps:
    - uses: actions/checkout@v4  # git-auto-commit-action needs a checked-out repo
    - uses: aquasecurity/trivy-action@v0.36.0
      with:
        # Scan the exact digest Job 1 pushed, not the mutable tag
        image-ref: "liquidbread0/sbom-validating-webhook@${{ needs.docker-build-and-push.outputs.digest }}"
        scan-type: image
        format: cyclonedx
        output: data/validatingWebhookImage-sbom.json
    - uses: actions/upload-artifact@v4
      with:
        name: sbom
        path: data/validatingWebhookImage-sbom.json
    - uses: stefanzweifel/git-auto-commit-action@v5
      with:
        commit_message: "Auto-update webhook sbom file"
        file_pattern: data/validatingWebhookImage-sbom.json
The permissions: contents: write is required for the auto-commit step; without it the job silently fails to push. Took me longer than I’d like to admit to figure out why the commit wasn’t landing.
Job 3: Cosign Attestation
Cosign signs the SBOM and attaches the attestation to the image at its exact digest in the Docker Hub registry. The digest from Job 1 is passed in via needs:
sbom-attest-attach:
  runs-on: ubuntu-latest
  needs: [docker-build-and-push, sbom-generation-docker-image]
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: sbom
    - uses: sigstore/cosign-installer@v4.1.0
    - name: Sign the SBOM and attach it to the image
      env:
        TAGS: liquidbread0/sbom-validating-webhook:validatingWebhookImage
        DIGEST: ${{ needs.docker-build-and-push.outputs.digest }}
        COSIGN_PRIV_KEY: ${{ secrets.COSIGN_PRIV_KEY }}
        COSIGN_PASSWORD: ${{ secrets.COSIGN_PASSWORD }}
      run: |
        cosign attest --key env://COSIGN_PRIV_KEY \
          --type cyclonedx \
          --predicate validatingWebhookImage-sbom.json \
          $TAGS@$DIGEST
What does “attaching an attestation” actually mean? Cosign pushes a separate artifact to the registry alongside the image: a signed envelope (DSSE) wrapping an in-toto statement whose subject is the image digest and whose predicate is the SBOM. Anyone with the public key can verify it:
cosign verify-attestation \
  --key cosign-keys/cosign.pub \
  --type cyclonedx \
  liquidbread0/sbom-validating-webhook:validatingWebhookImage@<digest>
If the signature doesn’t verify, you know the SBOM was tampered with or the image wasn’t signed by the expected key.
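When verification succeeds, cosign verify-attestation prints the verified envelopes as JSON on stdout, one per line. Assuming that output shape (a DSSE envelope whose base64 payload is an in-toto statement with predicateType https://cyclonedx.org/bom and the SBOM embedded as a JSON object in predicate, as with cosign 2.x), getting the SBOM back out is a decode step. A sketch with hypothetical function names; the subprocess wrapper assumes cosign is on PATH, and the envelope used to exercise the parser is synthetic.

```python
import base64
import json
import subprocess

CYCLONEDX_PREDICATE = "https://cyclonedx.org/bom"


def sboms_from_output(stdout: str) -> list[dict]:
    """Extract CycloneDX predicates from verify-attestation stdout lines."""
    sboms = []
    for line in stdout.splitlines():
        if not line.strip():
            continue
        envelope = json.loads(line)
        # The DSSE payload is a base64-encoded in-toto statement
        statement = json.loads(base64.b64decode(envelope["payload"]))
        if statement.get("predicateType") == CYCLONEDX_PREDICATE:
            sboms.append(statement["predicate"])
    return sboms


def verified_sboms(image: str, pubkey: str) -> list[dict]:
    """Run cosign; raises CalledProcessError if verification fails."""
    result = subprocess.run(
        ["cosign", "verify-attestation", "--key", pubkey, "--type", "cyclonedx", image],
        capture_output=True, text=True, check=True,
    )
    return sboms_from_output(result.stdout)
```

Because check=True turns a failed verification into an exception, an empty or missing SBOM never gets mistaken for a verified one.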
The full pipeline looks like this:
Push to admissions-controller/**
        |
        v
Job 1: Build and push Docker image -> capture digest
        |
        v
Job 2: Trivy scans image -> CycloneDX SBOM (JSON)
       SBOM auto-committed to data/ in repo
        |
        v
Job 3: Cosign signs SBOM with private key
       Attestation attached to image digest in registry
Part 3: SBOM + Vulnerability Enrichment
Having an SBOM is step one. Actually understanding what’s in it is step two.
The enrichment module reads the CycloneDX SBOM and, for each package, queries three external sources:
| Source | What It Gives You |
|---|---|
| OSV (api.osv.dev) | CVEs and security advisories matched by Package URL (PURL, a standardized identifier for packages across ecosystems) |
| FIRST EPSS (api.first.org) | A probability score (0–1) that a CVE will see exploitation activity in the wild within the next 30 days |
| NVD CVSS (services.nvd.nist.gov) | Base severity score for each CVE |
The output is an EnrichedCVEwithEPSS record per vulnerability: package identity, CVE ID, EPSS probability, and CVSS base score in a single data model.
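A standard-library sketch of what the three lookups can look like. The endpoints and response fields follow the public OSV v1 query API, the FIRST EPSS v1 API, and the NVD CVE 2.0 API, but they are worth checking against each API’s docs. The EnrichedCVE record, the function names, and the choice of CVSS v3.1 as “the” base score are assumptions, not the project’s actual EnrichedCVEwithEPSS model. The parsers are kept separate from the HTTP calls so they can be tested against canned responses.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

OSV_QUERY = "https://api.osv.dev/v1/query"
EPSS_URL = "https://api.first.org/data/v1/epss?cve={cve}"
NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0?cveId={cve}"


@dataclass
class EnrichedCVE:  # hypothetical stand-in for EnrichedCVEwithEPSS
    purl: str
    cve_id: str
    epss: Optional[float]
    cvss: Optional[float]


def cve_ids(osv_response: dict) -> list[str]:
    """OSV ids may be GHSA/PYSEC/etc.; CVE ids appear as the id or an alias."""
    found = []
    for vuln in osv_response.get("vulns", []):
        for ident in [vuln["id"], *vuln.get("aliases", [])]:
            if ident.startswith("CVE-") and ident not in found:
                found.append(ident)
    return found


def epss_score(epss_response: dict) -> Optional[float]:
    """EPSS returns the probability as a string inside data[0].epss."""
    data = epss_response.get("data") or []
    return float(data[0]["epss"]) if data else None


def cvss_base(nvd_response: dict) -> Optional[float]:
    """Take the first CVSS v3.1 base score NVD reports, if any."""
    for vuln in nvd_response.get("vulnerabilities", []):
        metrics = vuln["cve"].get("metrics", {}).get("cvssMetricV31", [])
        if metrics:
            return float(metrics[0]["cvssData"]["baseScore"])
    return None


def _get_json(url: str, body: Optional[dict] = None) -> dict:
    """GET, or POST when a JSON body is given."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def enrich(purl: str) -> list[EnrichedCVE]:
    """Query OSV for a purl, then EPSS and NVD for each CVE it reports."""
    osv = _get_json(OSV_QUERY, {"package": {"purl": purl}})
    return [
        EnrichedCVE(
            purl=purl,
            cve_id=cve,
            epss=epss_score(_get_json(EPSS_URL.format(cve=cve))),
            cvss=cvss_base(_get_json(NVD_URL.format(cve=cve))),
        )
        for cve in cve_ids(osv)
    ]
```

NVD rate-limits unauthenticated clients aggressively, so a real version would want an API key, backoff, or caching before looping over a whole SBOM.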
Why EPSS? Because CVSS alone is a terrible triage tool. A CVSS 9.8 vulnerability in a package that nobody has ever actually exploited is not the same threat as a CVSS 7.2 with active exploit code in the wild. EPSS gives you the “is anyone actually using this?” signal. Pair it with CVSS and you get something useful.
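One way to turn that into a triage order, as a sketch: sort by EPSS first and use CVSS only to break ties, with missing scores sorted last. The Finding type and triage function are hypothetical, the scores are invented to mirror the 9.8-versus-7.2 example, and the ordering rule is one reasonable choice rather than a standard.

```python
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    cve_id: str
    epss: Optional[float]  # probability of exploitation, 0-1
    cvss: Optional[float]  # base severity, 0-10


def triage(findings: list[Finding]) -> list[Finding]:
    """Highest exploit likelihood first; CVSS breaks ties; unknowns sort last."""
    return sorted(
        findings,
        key=lambda f: (f.epss if f.epss is not None else -1.0,
                       f.cvss if f.cvss is not None else -1.0),
        reverse=True,
    )


findings = [
    Finding("CVE-A", epss=0.002, cvss=9.8),  # severe on paper, rarely exploited
    Finding("CVE-B", epss=0.91, cvss=7.2),   # lower score, actively exploited
    Finding("CVE-C", epss=None, cvss=5.0),   # no EPSS data yet
]
for f in triage(findings):
    print(f.cve_id, f.epss, f.cvss)
```

The 7.2 with a high exploitation probability comes out ahead of the 9.8, which is exactly the distinction CVSS alone can’t make.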
Right now this module is a work in progress. It’ll get there.
What I Learned
A few things worth calling out:
Immutable image digests matter. You should never be deploying by tag in a security-sensitive environment. A tag can be repointed at a different image at any time. A digest is a cryptographic commitment. Build your pipelines around digests.
TLS in Kubernetes is annoying. The SAN matching requirement trips up basically everyone the first time. Get the service DNS name right (<service>.<namespace>.svc) and base64-encode the CA cert correctly. When in doubt, run openssl verify on the server cert against your CA (with -verify_hostname set to the service DNS name, so the SAN gets checked too) before you ever try to deploy.
Signing without a verification step at admission is half a solution. Right now the attestation gets generated and pushed to the registry, but the admission webhook doesn’t actually verify it before letting pods in. The logical next step is wiring Cosign verification into the webhook itself – reject any image that doesn’t have a valid SBOM attestation from the expected key. That closes the loop.
Auto-committing artifacts from CI is surprisingly useful. Having the SBOM auto-committed to the repo on every build means you always know what was in each image, without having to dig through workflow artifacts. Free audit trail.
What’s Next
A few things I want to add:
Cosign verification in the webhook – reject images without valid SBOM attestations at admission time. This is the thing that actually enforces supply chain policy.
Finish the enrichment module – wire up the full CVE enrichment pipeline end to end.
Policy-as-code – integrate something like OPA Gatekeeper so the validation logic is declarative and auditable rather than hardcoded in Flask.
The full repo is here: sbom-admission-controller
If you’re building anything on Kubernetes and you’re not thinking about what’s inside your images – what packages, what versions, what CVEs – you’re flying blind. SBOMs aren’t a silver bullet, but they’re the baseline. Start there.