In Need of 'Pair' Review: Vulnerable Code Contributions by GitHub Copilot

Conference: Black Hat USA 2022

2022-08-10

Summary

The presentation examines GitHub Copilot, an AI code-generation tool, and its potential impact on software security. The study measures how often Copilot generates code containing common weaknesses and vulnerabilities, and how a slight change in the prompt can affect the security of the generated code.

Copilot is an AI tool developed by GitHub that generates code from a provided prompt.
The study measures how often Copilot's suggestions contain common weaknesses and vulnerabilities, and how sensitive the results are to small prompt changes.
The study paired Copilot with security scanning tools to check the generated code.
The study found that Copilot tends to generate functionally correct code, but that the code is not always secure.
Changing a single comment in the prompt can significantly affect the code Copilot generates.
The study also tested Copilot on Verilog, a hardware description language, and found that it can generate security-relevant hardware bugs.

The presenter also challenged Copilot to generate Verilog, a hardware description language. Hardware designs are still code, and they can contain security-relevant bugs. The study found that Copilot can generate such bugs, which adds another dimension to its impact on security.

Abstract

On June 29, 2021, GitHub announced and released their newest tool, 'Copilot' - an 'AI-based Pair Programmer', a deep learning model trained over vast quantities of open-source GitHub code. However, we humans wrote most of that code. And much of it isn't great. It has bugs, it contains dated coding practices, and many repositories even contain dangerously insecure code. Given the vast quantity of garbage code that Copilot has learned from, is it reasonable to trust the code suggestions that it generates? In this talk, we demonstrate that GitHub Copilot is susceptible to writing vulnerabilities along multiple axes, from SQL injections to buffer overflows, use-after-free to cryptographic issues. We try different languages - C, Python, and even Verilog, where we show it also generates hardware bugs (when it can generate hardware at all). Overall, we tried 89 different scenarios for Copilot, generating 1,689 suggestions, and found approximately 40% to be vulnerable.
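To make the SQL injection class mentioned in the abstract concrete, here is a minimal Python sketch. It is not taken from the talk or from Copilot output; the table, the function names, and the payload are hypothetical and chosen only for illustration. It contrasts a query built by string concatenation with a parameterized query, using the standard-library sqlite3 module.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_vulnerable(conn, name):
    # Vulnerable pattern: user input is concatenated into the SQL text,
    # so a quote character in `name` can change the query's structure.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Safer pattern: the `?` placeholder passes `name` as data, not as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # classic injection string
    print("vulnerable:", lookup_vulnerable(conn, payload))
    print("safe:", lookup_safe(conn, payload))
```

With the payload above, the concatenated query matches every row, while the parameterized query treats the payload as an ordinary (non-matching) name. Both functions are functionally correct for benign input, which is why this kind of weakness is easy to miss without a security-focused review.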
