
In the past few years, artificial intelligence (AI) has become synonymous with deep learning, which is based on artificial neural networks (ANNs) with multiple layers. In healthcare, the most common use of AI is image recognition, particularly in cardiology, pathology, radiology, and ophthalmology. The most common algorithmic approach for image recognition is the convolutional neural network (CNN).

To date, creating a CNN has required a data scientist with advanced mathematical and programming skills and expertise in frameworks such as TensorFlow and PyTorch.



It is difficult to find patient-level data of sufficient size for research, modeling, or software development. This is largely due to HIPAA concerns and the overall lack of interoperability in the US healthcare system.

Synthetic data has potential in those areas, but much of the generated data is non-medical. For example, the R and Python programming languages can generate non-medical datasets for supervised and unsupervised learning. Generative adversarial networks (GANs) can also generate synthetic non-medical data.
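As a rough illustration of the kind of non-medical synthetic data a language like Python can produce, here is a minimal sketch that uses only the standard library. The function name make_synthetic_dataset, the column names x1 and x2, and the cluster centers are all hypothetical choices made for this example; they do not come from any particular package.

```python
import random


def make_synthetic_dataset(n_rows=100, seed=42):
    """Return a list of rows drawn from two Gaussian clusters.

    Hypothetical helper for illustration only. Each row has two numeric
    features and a binary label, so it can serve a supervised task as-is
    or an unsupervised task once the label is dropped.
    """
    rng = random.Random(seed)  # seeded so the data is reproducible
    rows = []
    for i in range(n_rows):
        label = i % 2  # alternate labels to get balanced classes
        center = 0.0 if label == 0 else 3.0  # arbitrary cluster centers
        rows.append({
            "x1": rng.gauss(center, 1.0),
            "x2": rng.gauss(center, 1.0),
            "label": label,
        })
    return rows


dataset = make_synthetic_dataset()

# For an unsupervised (clustering) exercise, keep only the features.
features_only = [(row["x1"], row["x2"]) for row in dataset]
```

Data like this is handy for practicing modeling techniques, but it carries none of the structure of real patient records, which is the limitation the paragraph above points to.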

In 2017, the MITRE Corporation developed the SyntheticMass project, which emulated the healthcare data of the residents of Massachusetts. There are more than 1 million…



While there are multiple excellent commercial data science platforms available (Dataiku, Databricks, DataRobot, etc.), they are expensive and not open to public collaboration. There are only a few platforms that are free or low cost and align with the Open Data and Open Science movements. The examples that come to mind are Harvard Dataverse, the Open Science Framework (OSF), and Data World. This article will discuss Data World and its many unique features.

Data World is a public benefit corporation, located in Austin, Texas, that launched in 2016. Data World is an online platform where participants can find data or…


Part 2


In part 1 of this series, I addressed the problem that published machine learning (ML) and artificial intelligence (AI) studies often do not adhere to existing evidence-based guidelines and, as a result, are often considered to be of low quality. This is a significant issue given the proliferation of predictive analytical studies. For example, almost 800 predictive studies have been reported on cardiovascular disease alone. I mentioned several guidelines, such as TRIPOD and CHARMS, that outline how these studies should be conducted and reported. There are also “risk of bias” guidelines, such as the Prediction model Risk Of Bias ASsessment Tool (PROBAST).


Part 1


Most of us in medicine are familiar with the concept of evidence-based medicine (EBM), which is based on standards developed by several international organizations. According to Johns Hopkins Medicine:

“Evidence-based medicine is the integration of best research evidence with clinical expertise and patient values. Evidence-based medicine is an interdisciplinary approach which uses techniques from science, engineering, biostatistics and epidemiology, such as meta-analysis, decision analysis, risk-benefit analysis, and randomized controlled trials.”

There is a hierarchy of evidence such that we know, for example, that correlation does not equal causation and observational studies are a less robust form of evidence…



Machine learning has increased in popularity in the US, as evidenced by the Google Trends trendline over the last five years.



In the data-centric world we live in, we need many tools in our data science tool kit. The kit should include expertise in spreadsheets, statistics, databases, and machine learning. Math and statistics are the backbone of data analysis, machine learning, and artificial intelligence, yet they are difficult to learn and even more difficult to retain. Other obstacles to learning and using statistical packages include the cost of commercial packages such as SPSS, SAS, and Stata, and the fact that many statistics courses demand longhand calculation of statistical methods.

Enter jamovi (written lower-case). This is a free open-source statistical package…



Creating a new textbook is a complex process, requiring collaboration and commitment by everyone involved. It is clearly different from writing a work of fiction, where you often don’t have co-authors and don’t need citations and references. It is also quite common to have multiple tables and images in every textbook chapter, along with a table of contents, foreword, preface, and an index. For many authors, the only consideration is finding a commercial publisher to review, edit, print, and market the textbook. However, there are other options.

For my first textbook, Health Informatics Practical Guide, I opted to…


Google Dataset Search was launched in September 2018 with the goal of creating a searchable public data repository. The search engine indexes data repositories on the web based on their metadata and, to date, includes millions of datasets from a variety of sources. It relies on schema.org (https://schema.org/), an open standard for organizing metadata. Anyone can contribute datasets to this engine, but they must follow the schema.org guidelines. Further details regarding contributing data can be found here.
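To make the idea of schema.org metadata more concrete, here is a minimal sketch of the kind of JSON-LD description a dataset page might carry. It uses the schema.org Dataset type and a handful of its properties (name, description, url, license, keywords). The helper name build_dataset_jsonld, the example.org URL, and the dataset itself are hypothetical, and this is not necessarily the complete set of fields that Google Dataset Search expects.

```python
import json


def build_dataset_jsonld(name, description, url, license_url, keywords):
    """Build a schema.org Dataset description as a plain dictionary.

    Hypothetical helper for illustration; the keys are schema.org
    Dataset properties, the values are supplied by the caller.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": keywords,
    }


# All values below are made up for the example.
markup = build_dataset_jsonld(
    name="Example clinic visit counts",
    description="Hypothetical monthly visit counts used to illustrate dataset markup.",
    url="https://example.org/datasets/clinic-visits",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    keywords=["example", "healthcare"],
)

# JSON-LD is typically embedded in a web page inside a script tag.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```

A page that embeds a block like this gives a crawler a structured description of the dataset rather than leaving it to infer the details from free text.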

Below is a diagram of how the dataset search engine works. Using schema.org standards…


It is widely accepted that data science is an in-demand profession in all industries, including healthcare. The unfortunate reality is that there are not enough data scientists to meet the needs of the various fields. Data science is a complex field, demanding programming skills, database management, statistics, higher mathematics, machine learning, soft skills, and domain expertise. Most data scientists have advanced degrees; 49 percent hold a Master’s degree, and 41 percent hold a PhD.

Because the shortage of data scientists is expected to persist for the foreseeable future, alternatives have been suggested. Gartner and others have proposed the new role of the “citizen…

Robert Hoyt MD FACP ABPM-CI FAMIA

Dr. Hoyt is a physician data scientist who is also an author and editor of several books. His most recent textbook is Data Preparation and Exploration.
