Evidence-Based Data Science — We Aren’t There Yet

--

Part I

Most of us in medicine are familiar with the concept of evidence-based medicine (EBM), which is based on standards developed by several international organizations. According to Johns Hopkins Medicine:

“Evidence-based medicine is the integration of best research evidence with clinical expertise and patient values. Evidence-based medicine is an interdisciplinary approach which uses techniques from science, engineering, biostatistics and epidemiology, such as meta-analysis, decision analysis, risk-benefit analysis, and randomized controlled trials.”

There is a hierarchy of evidence: we know, for example, that correlation does not equal causation and that observational studies are a less robust form of evidence than randomized controlled trials (RCTs). Publication standards have existed for many years for observational studies (STROBE), RCTs (CONSORT), and systematic reviews and meta-analyses (PRISMA). Higher-rated medical journals require adherence to these standards.

Given the speed at which machine learning (ML) and artificial intelligence (AI) have emerged on the scene, the question is: are we adopting known standards for ML and AI? The evidence suggests the answer is currently no, and that is the theme of this blog. I will discuss some of the standards and what recent articles in the literature have reported.

There has been a dramatic proliferation of medical predictive models in the past decade. Because machine learning commonly focuses on predictive analytics, we need to first look at existing standards. Let’s start with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. This collaborative initiative generated a 22-item checklist to be followed as a publication guideline. Most of the items are generic and straightforward, although the checklist does mandate how missing data and model validation are handled and reported. A systematic review of diagnostic studies using machine learning, published in March 2020, looked at TRIPOD adherence for the 28 articles reviewed; none of them adhered to or even mentioned the standards. Importantly, TRIPOD-ML and TRIPOD-AI are in the works, so more specific and stringent guidelines are likely to be available soon.
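
To make the missing-data and validation items concrete, here is a minimal sketch (Python with scikit-learn) of one way such items can be satisfied: the imputation step lives inside the modeling pipeline, so it is re-fit on each training fold during internal validation rather than leaking information from held-out data. The dataset, variables, and parameter choices below are hypothetical illustrations, not part of the TRIPOD statement.

```python
# Minimal, hypothetical sketch: imputation inside the pipeline is re-fit on
# each training fold, so internal validation does not leak information from
# the held-out fold. All data here are simulated.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                          # hypothetical predictors
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # hypothetical binary outcome
X[rng.random(X.shape) < 0.1] = np.nan                  # ~10% of values go missing

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # the strategy used should be reported
    ("clf", LogisticRegression(max_iter=1000)),
])

# Internal validation: cross-validated discrimination (AUC)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {auc.mean():.2f} (SD {auc.std():.2f})")
```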

Another valuable set of guidelines is the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: the CHARMS Checklist. While this checklist was written for published systematic reviews, it would in reality be valuable for any ML or AI article. The checklist consists of thirty-five items organized into eleven sections. Of note, the sections on model development, validation, and performance describe in detail the optimal manner in which to conduct predictive analytics, yet ironically it is in these sections that so many articles fall short of the mark. These details and more will be discussed in Part 2 of this series.
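
As an illustration of the kind of performance reporting such checklists look for, the sketch below computes discrimination (AUC), a Brier score, and calibration data on a held-out set. Everything here is simulated and hypothetical; it is not code from CHARMS or from any published model.

```python
# Hypothetical sketch of basic performance reporting on a held-out set:
# discrimination (AUC), overall accuracy of probabilities (Brier score),
# and the data behind a calibration plot. All data are simulated.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
logit = X @ np.array([1.0, -0.5, 0.8, 0.0])
y = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
prob = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

print("AUC (discrimination):", round(roc_auc_score(y_test, prob), 3))
print("Brier score:", round(brier_score_loss(y_test, prob), 3))
observed, predicted = calibration_curve(y_test, prob, n_bins=10)  # calibration plot data
for o, p in zip(observed, predicted):
    print(f"predicted {p:.2f}  observed {o:.2f}")
```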

In the last several years, multiple articles have suggested that AI is not only equivalent to physicians for image interpretation but sometimes superior. Unfortunately, these articles are only now being critically appraised. Many of the AI studies reported were proofs of concept and did not adhere to existing standards. Several recent articles make it clear that the overly optimistic studies published since 2016 were significantly flawed.

In Part 2, we will discuss some of the specific reasons why articles on ML and AI fall short of evidence-based data science.

--

Robert (Bob) Hoyt MD FACP ABPM-CI FAMIA