Distinguished doctors have publicly criticized Google and others for making grand claims about AI research and then not sharing the source code and models to let others replicate and verify the experiments.
In January, a team led by Google Brain’s Scott Mayer McKinney published a paper in Nature boasting that its artificial intelligence was better than human doctors at spotting breast cancer in mammograms. The claims were widely reported in the mainstream press. Now top doctors have complained in an article published this week in Nature that the Googlers haven’t backed up their claims with usable evidence.
“On paper and in theory, the McKinney et al study is beautiful,” said Dr Benjamin Haibe-Kains, senior scientist at Canada’s Princess Margaret Cancer Centre and first author of the article. “But if we can’t learn from it then it has little to no scientific value.”
“Without the computer code and fitted model, it will be very difficult to build on their work,” Haibe-Kains told The Register.
“Replicating their model is not impossible but it will take months without any guarantee that the newly generated model will even be close to theirs even with access to all the data they used for training. The Devil is in the details.”
For example, information on the system’s hyperparameters and the training pipeline was not included in the paper. Researchers should publish the relevant source code so that claims can be verified and tested more easily, Haibe-Kains said. It’s just good science to do so.
As well as Haibe-Kains, 22 other experts from top institutions – including the University of Toronto, Stanford University School of Medicine, MIT, Brigham and Women’s Hospital, and the Massive Analysis Quality Control Society, a group dedicated to reproducible science – put their names to the article. Haibe-Kains and his colleagues said Google’s “publication of insufficiently documented research does not meet the core requirements underlying scientific discovery.”
“Merely textual descriptions of deep-learning models can hide their high level of complexity. Nuances in the computer code may have marked effects on the training and evaluation of results, potentially leading to unintended consequences. Therefore, transparency in the form of the actual computer code used to train a model and arrive at its final set of parameters is essential for research reproducibility,” they wrote.
Releasing the source code is not difficult, they said, recommending sites such as GitHub, GitLab, or Bitbucket. There’s now a tab on papers hosted by the pre-print service arXiv that links to their associated source code, too.
It’s true that deploying the models on actual systems is trickier, though there is software that can make that process easier, such as Docker, Code Ocean, Gigantum, and Colaboratory.
“It’s important to state that this is early stage research,” a Google spokesperson told The Register. It appears the web giant doesn’t want its source code to be released until it’s gone through a QA process due to the medical nature of the project: “We intend to subject our software to extensive testing prior to its use in a clinical environment, working alongside patients, providers and regulators to ensure efficacy and safety,” the spokesperson said.
It’s not just the internet giant
Haibe-Kains said this problem of withheld code isn’t specific to Google; many scientific papers on the uses of AI written by all sorts of teams lack the material to recreate their experiments. “Researchers are more incentivized to publish their findings rather than spend time and resources ensuring their study can be replicated,” he said.
“Journals are vulnerable to the ‘hype’ of AI and may lower the standards for accepting papers that don’t include all the materials required to make the study reproducible – often in contradiction to their own guidelines.”
Holding back crucial details, such as the source code used to create the machine-learning software, in research is detrimental to scientific progress, and prevents the algorithms from being tested in the real world in clinical settings.
The McKinney team hit back, politely, at the doctors’ article in a response published in Nature, thanking the experts for their “thoughtful contribution.”
“We agree that transparency and reproducibility are paramount for scientific progress,” they wrote. “In keeping with this principle, the largest data source used in our publication is available to the academic community.”
Yet they will not publish the code to their algorithms, and claimed that most of the components in the model are open to the public already, many of them released by Google itself. There are also other concerns.
“Because liability issues surrounding artificial intelligence in healthcare remain unresolved, providing unrestricted access to such technologies may place patients, providers, and developers at risk,” the Googlers stated. “In addition, the development of impactful medical technologies must remain a sustainable venture to promote a vibrant ecosystem that supports future innovation. Parallels to hardware medical devices and pharmaceuticals may be useful to consider in this regard.”
Haibe-Kains told El Reg he isn’t surprised Google has decided to not publish the code, despite numerous pleas: “They have been given the opportunity once with the publication of their original study, and a second time with the publication of our article. They have not seized these opportunities, it is, therefore, clear that they do not want to share their computer code.”
It’s possible that Google might be holding back the code for commercial reasons. By keeping it to itself, the ad giant has the upper hand in pushing forward clinical trials and developing a product that can be sold to healthcare providers.
“There is nothing wrong with this, but it has little to do with science per se, as no new knowledge outside Google is being generated and shared to advance research at large,” Haibe-Kains told us. “There is a darker possibility that I prefer not to believe in: it does not want anybody to scrutinize its code because it is concerned that its model is not stable or that there might be hidden biases or confounding factors that would invalidate the model’s prediction.
“This would not be the first time that subsequent analyses reveal such limitations or errors and that is exactly why we scientists should always be transparent.” ®