You mention missing data, but I didn't see any code to find missing values, only null values. This dataset is riddled with missing data. An insulin level of zero or a triceps thickness of zero means the measurement wasn't done, not that the actual value is zero. By the same token, if zero is listed for pregnancies, does that mean the question was not asked or that the woman has never been pregnant? Nobody understands the diabetes pedigree column. Personally, as a physician data scientist, I don't use this dataset for those reasons.
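To make the point about zeros concrete, here is a minimal sketch of the kind of check I would expect. It assumes pandas and the column names `Pregnancies`, `Insulin`, and `SkinThickness`, which may differ in your copy of the file, and the rows are toy values made up for illustration, not real records. Which columns treat zero as "not measured" is also an assumption you would have to justify clinically.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are NOT taken from the real dataset.
df = pd.DataFrame({
    "Pregnancies": [0, 2, 5, 1],
    "Insulin": [0, 94, 0, 168],
    "SkinThickness": [35, 0, 0, 23],
})

# Assumption: a zero in these columns means "not measured".
# Pregnancies is left out because zero there is ambiguous.
zero_means_missing = ["Insulin", "SkinThickness"]

# A plain null check finds nothing, because the gaps are coded as 0.
null_counts = df.isnull().sum()

# Count the zeros that are really hidden missing values.
hidden_missing = (df[zero_means_missing] == 0).sum()

# Recode those zeros as NaN so later steps treat them as missing.
cleaned = df.copy()
cleaned[zero_means_missing] = cleaned[zero_means_missing].replace(0, np.nan)
missing_after = cleaned.isnull().sum()

print(null_counts)
print(hidden_missing)
print(missing_after)
```

Only after a recoding step like this does `isnull()` report the gaps, and any imputation or row dropping should be decided from there.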


Dr. Hoyt is a physician data scientist who is also an author and editor of several books. His most recent textbook is Data Preparation and Exploration.
