Abstract
We read with great interest the proposal made by Gary Collins and Karel Moons [1] to develop a version of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement that is specific to machine learning (ML), to be known as TRIPOD-ML [2]. We agree that the understandable excitement around ML-enabled technologies should not override the need for robust scientific evaluation, and that ML prediction model studies should adopt established guidance for reporting.
Demonstrating a good level of accuracy in ML-based predictive models is a major step in the validation process. However, the true clinical impact of such models, as interventions, will depend on their demonstrating superior patient outcomes in randomised clinical trials. As of July, 2019, 368 clinical trials in the field of artificial intelligence (AI) or ML were registered on ClinicalTrials.gov. As with studies of predictive models, AI interventions (both diagnostic and therapeutic) are associated with specific challenges that are insufficiently addressed by the existing Consolidated Standards of Reporting Trials (CONSORT) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) statements [3]. For example, elements that require detailed and specific reporting include the infrastructure of the trial setting and its ability to administer an ML intervention in real time; the inclusion and exclusion criteria at both the participant level and the input-data level; the interaction between an algorithm and humans that constitutes the intervention, together with its downstream effects; and strategies involving adaptive ML technologies that have the potential to continuously improve in performance.
To address these challenges, we are preparing extensions of the CONSORT and SPIRIT statements, CONSORT-AI and SPIRIT-AI, which will focus specifically on clinical trials in which the intervention includes an ML or other AI component. CONSORT-AI [4] and SPIRIT-AI will be complementary to the TRIPOD-ML extension proposed by Collins and Moons [1], which addresses the validation stage of a predictive AI model. Together, these standards can ensure that the same level of reporting transparency is maintained as ML models are taken from observational studies to prospective clinical trials.