Under pressure science opens up
Like society, science is under the spell of the COVID-19 pandemic. At the end of 2019 the virus was still unknown territory. Since the beginning of 2020, however, research on the behaviour of the virus, on treatments for COVID-19 and on the development of a vaccine has increased enormously. We observe a growing willingness among researchers to share code and data and to collaborate in international communities. “Under pressure science opens up”.
In this blog post we present some good and some not-so-good use cases of sharing research code and data. We show how more collaboration led to new opportunities, but also that sharing without defining the conditions for sharing can damage science.
The examples used in this blog post were presented in a WDCC webinar on September 23, 2020.
Sharing of research code: context is key
One of the reports of the COVID-19 response team led by Neil Ferguson of Imperial College London forms the basis of the British government's COVID-19 strategy. After pressure from society and the scientific community, the team published the research code of the model on which the findings of the research and the advice to the government were based. The model was published on GitHub.
“How can we trust horrible code for country strategy?”
At first there was tremendous criticism of the code, since it was messily organized and badly documented. For the research team this criticism was bitter: after all, the code did its academic work, and years of laborious research formed its foundation. The code had been reviewed and tested by many peers. It also became clear that part of the criticism came from programmers rather than researchers. These programmers simply looked at the code on its own, rather than at the full context (i.e. including the research report).
As a result of all the (negative) publicity the code received overwhelming attention, which turned out to be very fruitful. On GitHub, the code was cleaned, documented and improved by experts from different fields (including experts from outside Imperial College). Through this collaboration the code yielded better predictions of COVID-19 behaviour, and the advice based on those predictions became more accurate.
When a subject is (politically) sensitive, researchers can be reluctant to share their research code. This use case, however, shows the importance of sharing. In the end, sharing improved both the code and the predictions. Transparency is important for public trust.
Science and technology should be available for all
A group of about 20 American researchers linked to MIT and Harvard, led by the prominent genetics researcher George Church, strives for maximum transparency in its research on a “do-it-yourself” nasal spray vaccine for COVID-19. For this research the group relies heavily on published open data from earlier coronavirus research (such as on SARS and MERS). For the researchers it is only logical that the data from their own research is also published openly. Ingredients, recipe and production methods are freely available under a CC-BY-SA-4.0 license, accompanied by a disclaimer. This way of doing research does not come with guarantees, but it is meant to inspire other researchers to openly publish code, data and other results, and thus to advance science and society.
Using company data: trust is not enough
In May 2020 The Lancet published an article on the use of hydroxychloroquine (originally meant for malaria treatment) as medication for the treatment of COVID-19 patients. The data on which the research and findings were based originated from a small company that collects and analyses health care data. The researchers in this project had never worked at the same research institute, nor had they ever published together before. One of the co-authors was a staff member of the company and promised insight into the data. Soon after publication, important scientific questions were raised about the truthfulness of the data and the findings. The Lancet started an independent investigation, but opening up the data (even for peer review) would have violated the confidentiality agreement between the company and its clients. The peer review process was thus hindered, and the article was withdrawn by The Lancet two weeks after publication. The incidental collaboration between the researchers ended.
The case shows the importance of having a data sharing agreement in place before data from third parties is used in research. This is even more important when collaborating with commercial partners, but it also holds between academics: trust is needed in a collaboration, but it is not enough.
All three use cases show that opening up code or data is important and contributes to collaboration, transparency and society's trust in science. The examples also show that proper licensing and data sharing agreements are crucial for scientific integrity.
To sum up the lessons learned from the COVID-19 cases:
- Sharing code openly creates an opportunity to improve both the code itself and its documentation.
- Sharing code without the context (documentation or report) is not enough. Context is key.
- Transparency in research that affects society increases societal trust in science.
- When sharing code or data, always include the proper reuse license.
- When using data from third parties, make sure that you have a data sharing agreement that defines what you may and may not do with the data.
- There are many facilities for code sharing that provide version control, so that you can see who made changes and who contributed. GitHub is one of these facilities.
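Two of these lessons, sharing context and including a reuse license, can be checked before a code repository is published. The snippet below is a minimal sketch in Python, not an official WUR or WDCC tool. The function name `missing_sharing_files` and the file names it looks for (README, LICENSE, LICENCE, COPYING) are assumptions made for this illustration.

```python
from pathlib import Path

# Hypothetical file-name stems chosen for this sketch; adjust to your own conventions.
CONTEXT_STEMS = {"readme"}
LICENSE_STEMS = {"license", "licence", "copying"}


def missing_sharing_files(repo_dir):
    """Return which kinds of file ('context', 'license') are missing from repo_dir."""
    # Compare names without extension and ignoring case, so README.md and readme.txt both count.
    stems = {p.stem.lower() for p in Path(repo_dir).iterdir() if p.is_file()}
    missing = []
    if not stems & CONTEXT_STEMS:
        missing.append("context")
    if not stems & LICENSE_STEMS:
        missing.append("license")
    return missing
```

Calling `missing_sharing_files(".")` in the top-level folder of a project returns an empty list when both a README and a license file are present. The check only looks at file names: it cannot tell whether the documentation is sufficient or whether the chosen license is appropriate.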
What is the WUR vision on sharing code and data?
In the WUR strategic plan (2019-2022) there is a clear choice for sharing data as open as possible, as closed as needed (we share data in a FAIR manner). Reasons to restrict access may include the use of personal data and commercial or strategic interests. If you use data from third parties and intend to publish, always make a data sharing agreement with every party that delivers data before starting the research.
WUR has its own Git instance, where you can develop and share your code.
The Wageningen Data Competence Center
Within WUR, the Wageningen Data Competence Center is your first entry point for questions on Data Science, Research Data Infrastructure or Data Management. Please contact us when you have questions on Data or Code Sharing, licensing or data sharing agreements. Check our updates on the Data@WUR intranet group or our internet pages, or contact the Data Steward in your group for more information.
For all questions you may contact: firstname.lastname@example.org