Detecting Malicious PyPi Packages With Semgrep

2022-11-18

Authors: Andrew Krug, Ellen Wang


Abstract

Software packages are a juicy target for attackers to compromise. They allow malicious actors to access machines and production environments to steal sensitive data or perform cryptojacking. In the last few months alone, multiple malicious Python packages have been reported to steal credentials from their victims and were subsequently removed. In the worst case, these packages are an attractive target for advanced threat actors to gain access to victims to steal intellectual property or carry out nation-state objectives, as seen in CodeCov and SUNBURST.

What makes a “bad” package? How can we identify software packages that look malicious? In this talk, we start by showcasing some real-world malicious PyPI packages and the techniques they use to spread and execute code in victims’ environments. We then discuss how we use Semgrep, a static analysis tool designed for vulnerability detection, to scan the source code of PyPI packages and identify suspicious patterns characteristic of malware. Finally, we demonstrate the concept by dissecting malicious PyPI packages we found in the wild.

Introduction
- Explanation of the SLSA threat model, with a focus on dependencies
- Short history of malicious PyPI packages
- Why it’s a real problem: most existing tools look for previously detected malware and cannot identify never-before-seen malicious software
- Problem statement: how to identify malicious packages at scale?

Techniques used by PyPI malware (illustrated with real-world examples)
- Quick explanation of the data analyzed to find techniques: 30-40 packages removed from PyPI
- Explanation of the most common patterns found in malware:
  - Initial access: typosquatting, compromising the maintainer account, compromising the maintainer email domain
  - Execution: using a setup script, hooking a function, evaluating dynamic code
  - Exfiltration: using URL shorteners, stealing environment variables, using an unusual domain extension
  - Goal: cryptomining, stealing credentials

Writing Semgrep rules to catch malicious PyPI packages
- Quick intro to Semgrep (30s)
- Semgrep taint analysis mode
- Explanation of detection heuristics created:
  - Execution of base64-encoded strings
  - Exfiltration over HTTP of sensitive information
  - Download and execution of an executable file
  - Executing commands in setup.py
- Putting it all together in a CLI
- Results overview: real-world malicious packages we caught and false positive rate

Conclusion
- Brief summary
- Future work: running it at scale and continuously in AWS Lambda
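
To make the "execution of base64-encoded strings" heuristic from the outline concrete, here is a minimal sketch of the idea. It is not a Semgrep rule and not the authors' tooling: it swaps in Python's standard `ast` module to flag calls where `exec` or `eval` receives a base64-decoded value. The helper names (`_call_name`, `find_base64_exec`) and the sample input are hypothetical and exist only for illustration.

```python
import ast

# Illustration only: a stand-in written with Python's ast module,
# not a Semgrep rule and not the tooling described in the talk.
DYNAMIC_EXEC = {"exec", "eval"}
DECODERS = {"b64decode", "urlsafe_b64decode", "decodebytes"}


def _call_name(node: ast.Call) -> str:
    """Return the rightmost name of the called function, e.g. 'b64decode'."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def find_base64_exec(source: str) -> list[int]:
    """Return line numbers where exec/eval is passed a base64-decoded value."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and _call_name(node) in DYNAMIC_EXEC):
            continue
        # Search each argument subtree for a base64 decoding call.
        for arg in node.args:
            if any(
                isinstance(inner, ast.Call) and _call_name(inner) in DECODERS
                for inner in ast.walk(arg)
            ):
                hits.append(node.lineno)
                break
    return sorted(hits)


# Hypothetical sample: the source is only parsed, never executed.
sample = (
    "import base64\n"
    "exec(base64.b64decode('cHJpbnQoMSk=').decode())\n"
    "print('harmless')\n"
)
print(find_base64_exec(sample))  # [2]
```

This sketch only matches a decode call nested directly inside the `exec`/`eval` arguments. A payload that is decoded into a variable first and executed later would be missed, which is the kind of data flow a taint analysis is meant to follow.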

Materials: