A recent study finds that software engineers who use code-generating AI systems are more likely to cause security vulnerabilities in the apps they develop. The paper, co-authored by a team of researchers affiliated with Stanford, highlights the potential pitfalls of code-generating systems as vendors like GitHub start marketing them in earnest.
“Code-generating systems are currently not a replacement for human developers,” Neil Perry, a PhD candidate at Stanford and the lead co-author on the study, told TechCrunch in an email interview. “Developers using them to complete tasks outside of their own areas of expertise should be concerned, and those using them to speed up tasks that they are already skilled at should carefully double-check the outputs and the context that they are used in in the overall project.”
The Stanford study looked specifically at Codex, the AI code-generating system developed by San Francisco-based research lab OpenAI. (Codex powers Copilot.) The researchers recruited 47 developers — ranging from undergraduate students to industry professionals with decades of programming experience — to use Codex to complete security-related problems across programming languages including Python, JavaScript and C.
Codex was trained on billions of lines of public code to suggest additional lines of code and functions given the context of existing code. The system surfaces a programming approach or solution in response to a description of what a developer wants to accomplish (e.g. “Say hello world”), drawing on both its knowledge base and the current context.
According to the researchers, the study participants who had access to Codex were more likely to write incorrect and “insecure” (in the cybersecurity sense) solutions to programming problems compared to a control group. More concerning still, they were also more likely than the control group to say that their insecure answers were secure.
Megha Srivastava, a postgraduate student at Stanford and the second co-author on the study, stressed that the findings aren’t a complete condemnation of Codex and other code-generating systems. For one, the study participants didn’t have security expertise that might’ve enabled them to better spot code vulnerabilities. That aside, Srivastava believes that code-generating systems are reliably helpful for tasks that aren’t high risk, like exploratory research code, and that their coding suggestions could improve with fine-tuning.
“Companies that develop their own [systems], perhaps further trained on their in-house source code, may be better off as the model may be encouraged to generate outputs more in-line with their coding and security practices,” Srivastava said.
So how might vendors like GitHub prevent security flaws from being introduced by developers using their code-generating AI systems? The co-authors have a few ideas, including a mechanism to “refine” users’ prompts to be more secure — akin to a supervisor looking over and revising rough drafts of code. They also suggest that developers of cryptography libraries ensure their default settings are secure, as code-generating systems tend to stick to default values that aren’t always free of exploits.
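To make the point about defaults concrete, here is a minimal Python sketch of a helper whose default settings are meant to be safe when a caller (or a code-generating system) passes nothing extra. It is an illustration only, not code from the study: the function names `hash_password` and `verify_password` are hypothetical, and the iteration count and salt length are assumed values, not recommendations made by the researchers.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Assumed values for illustration; pick them according to current guidance.
DEFAULT_ITERATIONS = 600_000
SALT_BYTES = 16


def hash_password(
    password: str,
    *,
    salt: Optional[bytes] = None,
    iterations: int = DEFAULT_ITERATIONS,
) -> Tuple[bytes, bytes]:
    """Derive a password hash with PBKDF2-HMAC-SHA256.

    Calling this with no optional arguments still yields a fresh random
    salt and a high iteration count, so the default path is the safe one.
    """
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)  # cryptographically strong randomness
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(
    password: str,
    salt: bytes,
    expected: bytes,
    *,
    iterations: int = DEFAULT_ITERATIONS,
) -> bool:
    """Recompute the hash and compare it in constant time."""
    _, digest = hash_password(password, salt=salt, iterations=iterations)
    return hmac.compare_digest(digest, expected)
```

A caller who writes only `hash_password("hunter2")` gets a salted, slow hash without having to know which parameters matter, which is the behavior the co-authors’ suggestion is aiming for.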
“AI assistant code generation tools are a really exciting development and it’s understandable that so many people are eager to use them. These tools bring up problems to consider moving forward, though … Our goal is to make a broader statement about the use of code generation models,” Perry said. “More work needs to be done on exploring these problems and developing techniques to address them.”
To Perry’s point, introducing security vulnerabilities isn’t code-generating AI systems’ only flaw. At least a portion of the code on which Codex was trained is under a restrictive license; users have been able to prompt Copilot to generate code from Quake, code snippets in personal codebases and example code from books like “Mastering JavaScript” and “Think JavaScript.” Some legal experts have argued that Copilot could put companies and developers at risk if they were to unwittingly incorporate copyrighted suggestions from the tool into their production software.
GitHub’s attempt at rectifying this is a filter, first introduced to the Copilot platform in June, that checks code suggestions, together with about 150 characters of surrounding code, against public GitHub code and hides suggestions if there’s a match or “near match.” But it’s an imperfect measure. Tim Davis, a computer science professor at Texas A&M University, found that even with the filter enabled, Copilot emitted large chunks of his copyrighted code, including all attribution and license text.
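For a rough sense of how a filter like this could work, here is a toy Python sketch. It is not GitHub’s implementation, which the description above does not detail. The function names, the in-memory `public_corpus` and the choice to approximate a “near match” by ignoring whitespace differences are all assumptions made for illustration; only the 150-character figure comes from the description above.

```python
import re
from typing import Sequence

WINDOW = 150  # "about 150 characters", per the description above


def normalize(code: str) -> str:
    """Collapse runs of whitespace so formatting changes don't hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def should_hide(
    suggestion: str,
    surrounding: str,
    public_corpus: Sequence[str],
    window: int = WINDOW,
) -> bool:
    """Return True if any `window`-character slice of the surrounding code
    plus the suggestion appears in a public document (whitespace ignored)."""
    candidate = normalize(surrounding + suggestion)
    if len(candidate) < window:
        return False  # too short to compare at this window size
    normalized_docs = [normalize(doc) for doc in public_corpus]
    for start in range(len(candidate) - window + 1):
        chunk = candidate[start:start + window]
        if any(chunk in doc for doc in normalized_docs):
            return True
    return False
```

Even this simplified version shows why such a check is hard to make airtight: renaming a variable or reordering a few lines is enough to slip past a substring comparison.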
“[For these reasons,] we largely express caution toward the use of these tools to replace educating beginning-stage developers about strong coding practices,” Srivastava added.