You mention missing data, but I didn't see any code that looks for missing values, only for nulls. This dataset is riddled with missing data recorded as zeros. An insulin level of zero or a triceps thickness of zero means the measurement wasn't done, not that the actual value was zero. By the same token, if zero is listed for pregnancies, does that mean the question was not asked or that the woman has never been pregnant? Nobody understands the diabetic pedigree column. Personally, as a physician data scientist, I don't use this dataset for those reasons.
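
For anyone who wants to check this themselves, here is a rough sketch of counting zeros as missing with pandas. The column names (Insulin, SkinThickness, Pregnancies) are my assumption based on the commonly shared CSV, and the four rows are made up for illustration, not taken from the real data. I left Pregnancies out of the conversion on purpose, since a zero there is ambiguous.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; column names are assumed, not confirmed by the notebook.
df = pd.DataFrame({
    "Pregnancies": [0, 2, 5, 0],
    "Insulin": [0, 94, 0, 168],
    "SkinThickness": [35, 0, 0, 23],
})

# Columns where a zero is not physiologically plausible,
# so a zero most likely means "not measured".
zero_means_missing = ["Insulin", "SkinThickness"]

# A plain null check reports nothing, because the gaps are stored as 0.
null_counts = df.isnull().sum()
print(null_counts)

# Count the zeros in the suspect columns instead.
zero_counts = (df[zero_means_missing] == 0).sum()
print(zero_counts)

# Recode those zeros as NaN so later steps treat them as missing.
cleaned = df.copy()
cleaned[zero_means_missing] = cleaned[zero_means_missing].replace(0, np.nan)
missing_counts = cleaned.isnull().sum()
print(missing_counts)
```

After the recode, isnull() picks up the gaps in the two measurement columns, and the Pregnancies zeros stay untouched until someone decides what they mean.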