GitHub Copilot has steered software engineers at the Australia and New Zealand Banking Group (ANZ Bank) toward improved productivity and code quality, and the test drive was enough for the finance house to deploy the generative AI programming assistant in production workflows.
From mid-June, 2023 through the end of July that year, the Melbourne-based ANZ Bank conducted an internal trial of GitHub Copilot that involved 100 of the firm’s 5,000 engineers.
The six-week trial, consisting of two weeks of preparation and four weeks of code challenges, sought to examine how participants felt about using GitHub Copilot with Microsoft Visual Studio Code and to measure the impact the AI-based system had on programmers’ productivity, code quality, and software security.
The experiment’s findings have been documented in a report with a title that could use a little more finesse: “The Impact of AI Tool on Engineering at ANZ Bank, An Empirical Study on GitHub Copilot within Corporate Environment.”
Co-authored by Sayan Chatterjee, cloud architect at ANZ, and Louis Liu, engineering AI and data analytics capability area lead at ANZ, the report cites several prior studies about programming productivity with Copilot.
An ACM/IEEE study on programming with AI help suggested robo-assistance was more of a trade-off: It found that Copilot generated more code, although the quality of software generated was worse than human-built software.
ANZ Bank sought to conduct its own evaluation, citing the potential benefit of AI on productivity while also acknowledging that the technology “raises inherent risks, uncertainties and unintentional consequences regarding intellectual property, data security and privacy.”
Those risks – highlighted by the ongoing copyright lawsuit against GitHub, Microsoft, and OpenAI over Copilot – aren’t addressed in the study, except as a nod to regulatory compliance.
“Prior to starting the experiment, risks related to intellectual property, data security and privacy were assessed in conjunction with ANZ’s legal and security teams to arrive at a set of guidelines,” it said.
The bank’s experiment examined Copilot’s effect on developer sentiment and productivity, as well as on code quality and security. It required participating software engineers, cloud engineers, and data engineers to tackle six algorithmic coding challenges per week using Python. Those in the control group were not allowed to use Copilot but were allowed to search the internet or use Stack Overflow.
“The group that had access to GitHub Copilot was able to complete their tasks 42.36 percent faster than the control group participants,” the report says. “…The code produced by Copilot participants contained fewer code smells and bugs on average, meaning it would be more maintainable and less likely to break in production.”
Both of these results were deemed statistically significant. As for security, the experiment was inconclusive.
“The experiment could not generate meaningful data which would measure code security,” the report says. “However, the data suggest that Copilot did not introduce any major security issues into the code.”
This may have been due to the nature of the challenges, which were designed to be short enough that participants could complete them along with their usual daily work. As such, the submitted challenges were fairly short and didn’t leave a lot of room for bugs, the report notes.
In terms of sentiment, those using Copilot felt positive about the experience, though not strongly so.
“They felt it helped them review and understand existing code, create documentation, and test their code; they felt it allowed them to spend less time debugging their code and reduced their overall development time; and they felt the suggestions it provided were somewhat helpful, and aligned well with their project’s coding standards,” the report says.
One intriguing finding is that Copilot was the most useful to the most experienced programmers.
“Assessment of productivity based on Python proficiency found Copilot was beneficial to participants for all skill levels but was most helpful for those who were ‘Expert’ Python programmers,” the study says, adding that the AI helper provided the most improvement (in terms of time saved) on hard tasks.
While observing that the mildly positive endorsements from participants indicate that Copilot can be improved further, the report nonetheless endorsed putting Copilot into production workflows at the bank.
“As of the writing of this paper, GitHub Copilot has already seen significant adoption within the organization, with over 1,000 users using it in their workflows,” the report concludes, adding that a broader investigation of Copilot’s productivity impact is underway. ®