Publishing Reproducible Research: A Model for Transparency
An Inside Look at SIIM’s Decision to Make a Big Change to its Standards
The Board of Directors of the Society for Imaging Informatics in Medicine (SIIM) has decided to promote the publication of reproducible research in the Journal of Digital Imaging (JDI). In the past, JDI, like many other scientific publications, has published the results of research that was peer reviewed but not necessarily independently tested and proven. In support of the current culture promoting transparency and openness in research, however, the editors of JDI are committed to publishing reproducible research in imaging informatics for medicine.
JDI is doing this by adopting the Center for Open Science’s Transparency and Openness Promotion (TOP) guidelines, “a community-driven effort to align scientific ideals with practices,” first published in 2015.¹ The TOP guidelines consist of eight standards, each with three levels of increasing rigor, and they can be adopted individually or collectively.
The editors of JDI have initially decided to adopt the guidelines pertaining to citation, data transparency, and analytic methods (code) transparency. This means that computational articles involving code and data must indicate whether the authors will make their data available to other researchers and where it will be available, and the authors must make their program code and documentation available to others through a trusted digital repository. In addition, all program code, data, and other methods reported must be cited and listed in the reference section.
A Closer Look at the Board’s Motivation
So why would a journal that has been publishing for many years make such a costly change? Committing to the publication of reproducible research improves both the journal’s reputation and author satisfaction.
A recent survey of more than 1,500 researchers conducted by Nature indicated that more than 70 percent of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own (Nature. 2016 May 26;533(7604):452-4). Yet although more than 70 percent of researchers had failed to reproduce another scientist’s experiments, fewer than 20 percent reported that they had ever been contacted by another researcher unable to reproduce their work. Respondents to the survey were also asked to rate 11 different approaches to improving reproducibility in science; “journals enforcing standards” finished with the lowest rank, but still won a 69-percent endorsement. This is a revealing statistic that tends to support the practice of enforcing reproducibility standards for journal publications.
Also, a 2007 PLOS ONE study found a significant increase in citation count when a study’s dataset was made publicly available (PLoS One. 2007 Mar 21;2(3):e308). Clinical trials that shared their microarray data were cited about 70 percent more often than clinical trials that did not. However, the time investment required to release data can be more than an investigator is willing to make, and the investigator may not even have the resources available to document the data, de-identify it, store it, and release it. In addition, there is the possibility that results and conclusions can be challenged by additional analysis, or that future research could discover different relationships and undermine the planned research of the original author.
Change Brings Challenges
In spite of the difficulties of developing and enforcing reproducibility standards for journal publication, the editors of JDI are committed to the concepts of transparency and openness in scientific publications that involve computation. For JDI, this translates into a commitment to require the release of the algorithms or code used to manipulate medical images, open access to de-identified data in the form of medical images, and appropriate citation of data, program code, and other methods. All data sets and program code should be cited in the text and listed in the reference section, and references for data sets and program code should include a persistent identifier such as a Digital Object Identifier (DOI).
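As an illustration only (the authors, data set, and DOI shown here are hypothetical), a data-set entry in the reference section might read:

Doe J, Roe R. Annotated chest radiograph collection, version 1.2 [data set]. 2016. https://doi.org/10.xxxx/example-dataset.

Because a DOI is persistent, such a citation continues to resolve even if the hosting repository later reorganizes its web pages.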
When possible, JDI plans to use GitHub as a repository for the code used in articles published in the journal. Another option would be to share the final trained model so that the research could be replicated by other researchers. Although the details of the code- and model-sharing requirement are still being finalized, GitHub, a widely used code-hosting platform, is well known to many programmers in the field of imaging informatics in medicine and will require only an initial set-up and a SIIM committee to oversee the repositories of code or models.
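To make the model-sharing option concrete, the sketch below shows one minimal way an author might package a trained model for release. It is only a sketch: JDI has not specified a format, and the model object and file name here are hypothetical stand-ins. Publishing a checksum alongside the artifact lets other researchers verify that the file they download is the one described in the article.

```python
# A minimal packaging sketch (an assumed workflow, not JDI policy):
# serialize a trained model and compute a SHA-256 checksum that can
# be published alongside the download link.
import hashlib
import pickle

def export_model(model, path: str) -> str:
    """Serialize the model to `path` and return its SHA-256 checksum."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Stand-in "model"; any picklable object would work the same way.
model = {"weights": [0.12, -0.53, 0.97], "threshold": 0.5}
checksum = export_model(model, "classifier_v1.pkl")
print("Publish with the repository link:", checksum)
```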
GitHub’s value as a repository is that it also maintains version control and is open to collaboration. A GitHub pull request allows code to be shared and viewed; it also lets a collaborator propose changes to the code and request a review from others before the changes are merged into the repository. A collaborator can even invite feedback from other teams.
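For readers less familiar with the mechanism, a pull request can also be opened programmatically through GitHub’s REST API. The sketch below uses Python’s requests library; the repository, branch names, and access token are all hypothetical placeholders.

```python
# Opening a pull request via GitHub's REST API
# (POST /repos/{owner}/{repo}/pulls). All names below are placeholders.
import requests

def open_pull_request(owner, repo, token, head, base, title):
    """Propose merging branch `head` into `base`; returns the PR URL."""
    resp = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        headers={"Authorization": f"token {token}",
                 "Accept": "application/vnd.github+json"},
        json={"title": title, "head": head, "base": base},
    )
    resp.raise_for_status()
    return resp.json()["html_url"]

url = open_pull_request("siim-committee", "example-study-code",
                        "YOUR_TOKEN", "collaborator:fix-preprocessing",
                        "main", "Fix image preprocessing bug")
print("Review requested at:", url)
```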
Data sharing is a more complicated issue. Since many of the articles published in JDI use data sets drawn from individual institutions, sharing that data requires not only creating a location for releasing it, but also de-identifying it to remove protected health information (PHI) and navigating the legal complications of data ownership and copyright. Many institutions will not allow the release of such data.
Since most of the images used in JDI-published research are in DICOM format, tools such as PixelMed DicomCleaner can be used to de-identify images. To be safe, however, a DICOM expert should review all images placed in the public domain to be sure all PHI has been removed. Digital medical images are very large, so the repository that holds and releases images for research must be robust and actively managed. This implies a commitment of money and personnel time by a journal to support data sharing and transparency. The TOP guidelines only require that a researcher indicate that the “materials used to conduct the research are clearly documented and maximally available to any researcher for purposes of reproducing the results or replicating the procedure.” This implies that the responsibility for releasing data sets falls on the researcher, who may not have the capability of safely releasing private health information or may not be allowed to release the data at all. Of course, if the researcher uses data from a public repository, the only requirement is to document which data were used.
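As a rough illustration of what tag-level de-identification involves, the sketch below uses the open-source pydicom library (an assumption; PixelMed DicomCleaner, mentioned above, is a separate Java tool), blanks a small illustrative subset of PHI-bearing tags, and strips private elements. A real release should follow a full DICOM confidentiality profile and, as noted above, be reviewed by an expert.

```python
# A minimal de-identification sketch using pydicom (assumed library).
# The tag list is illustrative only, not a complete PHI profile.
import pydicom

PHI_TAGS = [
    "PatientName", "PatientID", "PatientBirthDate",
    "PatientAddress", "ReferringPhysicianName",
    "InstitutionName", "AccessionNumber",
]

def deidentify(src_path: str, dst_path: str) -> None:
    """Blank common PHI elements and remove vendor-private tags."""
    ds = pydicom.dcmread(src_path)
    for keyword in PHI_TAGS:
        if keyword in ds:
            ds.data_element(keyword).value = ""  # blank but keep the element
    ds.remove_private_tags()  # private tags often hide PHI
    ds.save_as(dst_path)

deidentify("chest_ct.dcm", "chest_ct_deid.dcm")  # hypothetical file names
```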
Questions Remain, but the Movement is Growing
The publication of reproducible research can benefit a journal, its authors, and the promotion of science in general. The current culture of openness supports the concepts of code and data sharing and full disclosure of the scientific methods used in a research project. The challenge for JDI and other journals lies in the ability to enforce and support the standards of reproducibility through policy changes and the implementation of tools that researchers can use.
There is a cost associated with the management of this data. Who will pay for it? There is some advantage to publishers in paying for this openness, but once data and code are released, open access to the articles themselves will be expected next. Even so, more than 2,900 journals and organizations have already become signatories of the TOP guidelines, so these obstacles are not stopping anyone from moving forward. The movement is growing.
Janice Honeyman-Buck, PhD, is editor-in-chief of the Journal of Digital Imaging, the peer-reviewed journal of the Society for Imaging Informatics in Medicine.
References:
1. Transparency and Openness Promotion Guidelines. Center for Open Science. https://cos.io/our-services/top-guidelines/. Accessed August 1, 2017.