Supply chain attacks are all the rage these days. While they’re not a new part of the threat landscape, they are growing in popularity among more sophisticated threat actors, and they can create significant system-wide disruption, expense, and loss of confidence across multiple organizations, sectors, or regions. The compromise of Codecov’s Bash Uploader script is one of the latest such attacks. While much is still unknown about the full impact of this incident on organizations around the world, it’s been another wake up call for the world that cybersecurity problems are getting more complex by the day.

This blog post is meant to provide the security community with defensive knowledge and techniques to protect against supply chain attacks involving continuous integration (CI) systems, such as Jenkins, Bamboo, etc., and version control systems, such as GitHub, GitLab, etc. It covers prevention techniques — for software suppliers and consumers — as well as detection and response techniques in the form of a playbook.

It has been co-developed by our Information Security, Security Research, and Managed Detection & Response teams. We believe one of the best ways for organizations to close their security achievement gap and outpace attackers is by openly sharing knowledge about ever-evolving security best practices.

Defending CI systems and source code repositories from similar supply chain attacks

Below are some of the security best practices defenders can use to prevent, detect, and respond to incidents like the Codecov compromise.

Figure 1: High-level overview of known Codecov supply chain compromise stages

Prevention techniques

Provide and perform integrity checks for executable code

If you're a software consumer

Use collision-resistant checksum hashes, such as with SHA-256 and SHA-512, provided by your vendor to validate checksums for all executable files or code they provide. Likewise, verify digital signatures for all executable files or code they provide.

If either of these integrity checks fail, notify your vendor ASAP as this could be a sign of compromised code.

If you're a software supplier

Provide collision-resistant hashes, such as with SHA-256 and SHA-512, and store checksums out-of-band from their corresponding files (i.e. make it so that an attacker has to successfully carry out two attacks to compromise your code: one against the system hosting your checksum data and another against your content delivery systems). Provide users with easy-to-use instructions, including sample code, for performing checksum validation.

Additionally, digitally sign all executable code using tamper-resistant code signing frameworks such as in-toto and secure software update frameworks such as The Update Framework (TUF) (see DataDog’s blog post about using these tools for reference). Simply signing code with a private key is insufficient since attackers have demonstrated ways to compromise static signing keys stored on servers to forge authentic digital signatures.

Relevant for the following Codecov compromise attack stages:

  • Customers’ CI jobs dynamically load Bash Uploader

Version control third-party software components

Store and load local copies of third-party components in a version control system to track changes over time. Only update them after comparing code differences between versions, performing checksum validation, and authenticating digital signatures.

Relevant for the following Codecov compromise attack stages:

  • Bash Uploader script modified and replaced in GCS

Implement egress filtering

Identify trusted internet-accessible systems and apply host-based or network-based firewall rules to only allow egress network traffic to those trusted systems. Use specific IP addresses and fully qualified domain names whenever possible, and fallback to using IP ranges, subdomains, or domains only when necessary.

Relevant for the following Codecov compromise attack stages:

  • Environment variables, including creds, exfiltrated

Implement IP address safelisting

While zero-trust-networking (ZTN) doubts have been cast on the effectiveness of network perimeter security controls, such as IP address safelisting, they are still one of the easiest and most effective ways to mitigate attacks targeting internet routable systems. IP address safelisting is especially useful in the context of protecting service account access to systems when ZTN controls like hardware-backed device authentication certificates aren’t feasible to implement.

Popular source code repository services, such as GitHub, provide this functionality, although it may require you to host your own server or, if using their cloud hosted option, have multiple organizations in place to host your private repositories separately from your public repositories.

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos
  • Bash Uploader script modified and replaced in GCS

Apply least privilege permissions for CI jobs using job-specific credentials

For any credentials a CI job uses, provide a credential for that specific job (i.e. do not reuse a single credential across multiple CI jobs). Only provision permissions to each credential that are needed for the CI job to execute successfully: no more, no less. This will shrink the blast radius of a credential compromise

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Use encrypted secrets management for safe credential storage

If you absolutely cannot avoid storing credentials in source code, use cryptographic tooling such as AWS KMS and the AWS Encryption SDK to encrypt credentials before storing them in source code. Otherwise, store them in a secrets management solution, such as Vault, AWS Secrets Manager, or GitHub Actions Encrypted Secrets (if you’re using GitHub Actions as your CI service, that is).

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Block plaintext secrets from code commits

Implement pre-commit hooks with tools like git-secrets to detect and block plaintext credentials before they’re committed to your repositories.

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Use automated frequent service account credential rotation

Rotate credentials that are used programmatically (e.g. service account passwords, keys, tokens, etc.) to ensure that they're made unusable at some point in the future if they’re exposed or obtained by an attacker.

If you’re able to automate credential rotation, rotate them as frequently as hourly. Also, create two “credential rotator” credentials that can both rotate all service account credentials and rotate each other. This ensures that the credential that is used to rotate other credentials is also short lived.

Relevant for the following Codecov compromise attack stages:

  • Creds used to access source code repos

Detection techniques

While we strongly advocate for adopting multiple layers of prevention controls to make it harder for attackers to compromise software supply chains, we also recognize that prevention controls are imperfect by themselves. Having multiple layers of detection controls is essential for catching suspicious or malicious activity that you can’t (or in some cases shouldn’t) have prevention controls for.

Identify Dependencies

You’ll need these in place to create detection rules and investigate suspicious activity:

  1. Process execution logs, including full command line data, or CI job output logs
  2. Network logs (firewall, network flow, etc.), including source and destination IP address
  3. Authentication logs (on-premise and cloud-based applications), including source IP and identity/account name
  4. Activity audit logs (on-premise and cloud-based applications), including source IP and identity/account name
  5. Indicators of compromise (IOCs), including IPs, commands, file hashes, etc.

Ingress from atypical IP addresses or regions

Whether or not you’re able to implement IP address safelisting for accessing certain systems/environments, use an IP address safelist to detect when atypical IP addresses are accessing critical systems that should only be accessed by trusted IPs.

Relevant for the following Codecov compromise attack stages:

  • Bash Uploader script modified and replaced in GCS
  • Creds used to access source code repos

Egress to atypical IP addresses or regions

Whether or not you’re able to implement egress filtering for certain systems/environments, use an IP address safelist to detect when atypical IP addresses are being connected to.

Relevant for the following Codecov compromise attack stages:

  • Environment variables, including creds, exfiltrated

Environment variables being passed to network connectivity processes

It’s unusual for a system’s local environment variables to be exported and passed into processes used to communicate over a network (curl, wget, nc, etc.), regardless of the IP address or domain being connected to.

Relevant for the following Codecov compromise attack stages:

  • Environment variables, including creds, exfiltrated

Response techniques

The response techniques outlined below are, in some cases, described in the context of the IOCs that were published by Codecov. Dependencies identified in “Detection techniques” above are also dependencies for response steps outlined below.

Data exfiltration response steps: CI servers

Identify and contain affected systems and data

  1. Search CI systems’ process logs, job output, and job configuration files to identify usage of compromised third-party components (in regex form). This will identify potentially affected CI systems that have been using the third-party component that is in scope. This is useful for getting a full inventory of potentially affected systems and examining any local logs that might not be in your SIEM
curl (-s )?https://codecov.io/bash

2. Search for known IOC IP addresses in regex form (based on RegExr community pattern)

(.*79\.135\.72\.34|178\.62\.86\.114|104\.248\.94\.23|185\.211\.156\.78|91\.194\.227\.*|5\.189\.73\.*|218\.92\.0\.247|122\.228\.19\.79|106\.107\.253\.89|185\.71\.67\.56|45\.146\.164\.164|118\.24\.150\.193|37\.203\.243\.207|185\.27\.192\.99\.*)

3. Search for known IOC command line pattern(s)

curl -sm 0.5 -d "$(git remote -v)

4. Create forensic image of affected system(s) identified in steps 1 - 3

5. Network quarantine and/or power off affected system(s)

6. Replace affected system(s) with last known good backup, image snapshot, or clean rebuild

7. Analyze forensic image and historical CI system process, job output, and/or network traffic data to identify potentially exposed sensitive data, such as credentials

Search for malicious usage of potentially exposed credentials

  1. Search authentication and activity audit logs for IP address IOCs
  2. Search authentication and activity audit logs for potentially compromised account events originating from IP addresses outside of organization’s known IP addresses
  3. This could potentially uncover new IP address IOCs

Unauthorized access response steps: source code repositories

Clone full historical source code repository content

Note: This content is based on git-based version control systems

Version control systems such as git coincidentally provide forensics-grade information by virtue of them tracking all changes over time. In order to be able to fully search all data from a given repository, certain git commands must be run in sequence.

  1. Set git config to get full commit history for all references (branches, tags), including pull requests, and clone repositories that need to be analyzed (*nix shell script)
git config --global remote.origin.fetch '+refs/pull/*:refs/remotes/origin/pull/*'
# Space delimited list of repos to clone
declare -a repos=("repo1" "repo2" "repo3")
git_url="https://myGitServer.biz/myGitOrg"
# Loop through each repo and clone it locally
for r in ${repos[@]}; do
echo "Cloning $git_url/$r"
git clone "$git_url/$r"
done

2. In the same directory where repositories were cloned from the step above, export full git commit history in text format for each repository. List git committers at top of each file in case they need to be contacted to gather context (*nix shell script)

git fetch --all
for dir in *(/) ; do
(rm $dir.commits.txt
cd $dir
git fetch --all
echo "******COMMITTERS FOR THIS REPO********" >> ../$dir.commits.txt
git shortlog -s -n >> ../$dir.commits.txt
echo "**************************************" >> ../$dir.commits.txt
git log --all --decorate --oneline --graph -p >> ../$dir.commits.txt
cd ..)
done

a. Note: the below steps can be done with tools such as Atom, Visual Studio Code, and Sublime Text and extensions/plugins you can install in them.

If performing manual reviews of these commit history text files, create copies of those files and use the regex below to find and replace git’s log graph formatting that prepends each line of text

^(\|\s*)*(\+|-|\\|/\||\*\|*)*\s*

b. Then, sort the text in ascending or descending order and de-duplicate/unique-ify it. This will make it easier to manually parse.

Search for binary files and content in repositories

Exporting commit history to text files does not export data from any binary files (e.g. ZIP files, XLSX files, etc.). In order to thoroughly analyze source code repository content, binary files need to be identified and reviewed.

  1. Find binary files in folder containing all cloned git repositories based on file extension (*nix shell script)
find -E . -regex '.*\.(jpg|png|pdf|doc|docx|xls|xlsx|zip|7z|swf|atom|mp4|mkv|exe|ppt|pptx|vsd|rar|tiff|tar|rmd|md)'

2. Find binary files in folder containing all cloned git repositories based on MIME type (*nix shell script)

find . -type f -print0 | xargs -0 file --mime-type | grep -e image/jpg -e image/png -e application/pdf -e application/msword -e application/vnd.openxmlformats-officedocument.wordprocessingml.document -e application/vnd.ms-excel -e application/vnd.openxmlformats-officedocument.spreadsheetml.sheet -e application/zip -e application/x-7z-compressed -e application/x-shockwave-flash -e video/mp4 -e application/vnd.ms-powerpoint -e application/vnd.openxmlformats-officedocument.presentationml.presentation -e application/vnd.visio -e application/vnd.rar -e image/tiff -e application/x-targ

3. Find encoded binary content in commit history text files and other text-based files

grep -ir "url(data:" | cut -d\) -f1
grep -ir "base64" | cut -d\" -f1

Search for plaintext credentials: passwords, API keys, tokens, certificate private keys, etc.

  1. Search commit history text files for known credential patterns using tools such as TruffleHog and GitLeaks
  2. Search binary file contents identified in Search for binary files and content in repositories for credentials

Search logs for malicious access to discovered credentials

  1. Follow steps from Data exfiltration response (Searching for malicious usage of potentially exposed credentials) using logs from systems associated with credentials discovered in Search for plaintext credentials

New findings about attacker behavior from Project Sonar

We are fortunate to have a tremendous amount of data at our fingertips thanks to Project Sonar which conducts internet-wide surveys across more than 70 different services and protocols to gain insights into global exposure to common vulnerabilities. We analyzed data from Project Sonar to see if we could gain any additional context about the IP address IOCs associated with Codecov’s Bash Uploader script compromise. What we found was interesting, to say the least:

  • The threat actor set up the first exfiltration server (178.62.86[.]114) on or about February 1, 2021
  • Historical DNS records from remotly[.]ru and seasonver[.]ru have, and continue to, point to this server
  • The threat actor configured a simple HTTP redirect on the exfiltration server to about.codecov.io to avoid detection
    { "http_code": 301, "http_body": "", "server": "nginx", "alt-svc": "clear", "location": "http://about.codecov.io/", "via": "1.1 google" }
  • The redirect was removed from the exfiltration server on or before February 22, 2021, presumably by the server owner having detected these changes
  • The threat actor set up new infrastructure (104.248.94[.]23) that more closely mirrored Codecov’s GCP setup as their new exfiltration server on or about March 7, 2021
    { "http_code": 301, "http_body": "", "server": "envoy", "alt-svc": "clear", "location": "http://about.codecov.io/", "via": "1.1 google" }
  • The new exfiltration server was last seen on April 1, 2021

We hope the content in this blog will help defenders prevent, detect, and respond to these types of supply chain attacks going forward.