Robert (Bob) Hoyt MD FACP ABPM-CI FAMIA

48 Followers


Published in

MLearning.ai

·Pinned

Maximizing Orange for Data Science Education — Part 1

What is Orange? Orange is a free educational data science platform created by computational biologists at the University of Ljubljana, Slovenia. Python is the computation engine in the background, although users neither see code nor need to program to execute functions. It’s fast, intuitive, and comprehensive. It is client-based…

Data Science

7 min read



Feb 5

Synthetic Tabular Data Created by AI

Robert E. Hoyt, David Patrishkoff

Introduction: Synthetic data is artificial data generated with AI and other methods when too little real-world data is available to train a predictive model, or when privacy is a concern. Synthetic data is only intended to expand the training dataset and is…

10 min read



Jul 30, 2022

Explainable Models — Unlock the Black Box

Introduction: We live in an era in which machine learning and artificial intelligence can generate a myriad of predictions. One difficulty, however, is whether we can explain how a model came to its conclusions. Can we explain this to non-data scientists? Is it scientifically correct? …

Black Box

6 min read



Published in

MLearning.ai

·Jul 26, 2022

Imbalanced Datasets -What Are Possible Solutions?

Are Imbalanced Datasets Common and Challenging? A major challenge in fields such as fraud detection and medicine is that the category, class, or outcome you are interested in is often a small minority. For example, financial fraud constitutes about 3% of inquiries, which means that 97% are not fraudulent. Similarly…
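To make the 3% versus 97% split concrete, here is a minimal Python sketch. The data are hypothetical (1,000 made-up inquiries, 30 of them fraudulent) and the "model" simply predicts the majority class every time. It is only meant to illustrate why accuracy alone can mislead on imbalanced data, not to reproduce anything from the article.

```python
# Hypothetical data: 1,000 inquiries, 3% fraudulent (the figure quoted above).
labels = [1] * 30 + [0] * 970          # 1 = fraud, 0 = not fraud

# A naive "model" that always predicts the majority class (not fraud).
predictions = [0] * len(labels)

# Accuracy: share of all inquiries classified correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: share of the actual fraud cases that the model caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
actual_positives = sum(labels)
recall = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}")
print(f"recall   = {recall:.2f}")
```

The model scores 97% accuracy while catching none of the fraud cases, which is why minority-class metrics such as recall usually matter more in this setting.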

Data

6 min read



Published in

MLearning.ai

·Jul 23, 2022

Maximizing Orange for Data Science Education — Part 2

In part 1 of this series, I provided an overview of the data mining platform Orange, which focuses on data science education. In part 2, I will focus on some unique Orange educational features users should find helpful. …

Data Science

7 min read



Dec 2, 2021

Bit.io

It’s about time a web database system was designed for usability

Finding a relational database for business or education sounds easy but, in many instances, it is not. There may be connection, configuration, or installation challenges when creating a new database. Charges for use may be difficult to understand. …

4 min read



Sep 2, 2021

Data Science👨‍💻: Introduction to Orange Tool Part-2

Manthan Bhikadiya 💡


I enjoyed reading your series of articles on Orange. My question is how to reduce data leakage. It is clear how to split the data into train and test sets with Data Sampler, but what if you want to impute, normalize the data, or pick the top 5 relevant predictors? I get the impression that you connect the Preprocess widget directly to Test and Score, which also receives the train and test data from the Data Sampler widget. Do you believe this is the correct workflow to minimize data leakage? Thanks.
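The leakage concern in this comment can be sketched in plain Python, outside of Orange. The example below is an illustration under stated assumptions: the numbers are invented, `fit_scaler` and `transform` are hypothetical helper names, and it does not describe how any Orange widget behaves. The idea is that preprocessing parameters are learned from the training split only and then reused unchanged on the test split.

```python
from statistics import mean, stdev

# Hypothetical single-feature dataset, already split into train and test.
train = [4.0, 6.0, 8.0, 10.0, 12.0]
test = [5.0, 20.0]

def fit_scaler(values):
    """Learn normalization parameters (mean, sample std) from these values only."""
    return mean(values), stdev(values)

def transform(values, mu, sigma):
    """Apply previously learned parameters; nothing is re-estimated here."""
    return [(v - mu) / sigma for v in values]

# Leakage-free: fit on the training split, reuse the parameters on the test split.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)

# Leaky alternative for comparison: statistics computed on train + test together,
# so information about the test rows influences the preprocessing step.
leaky_mu, leaky_sigma = fit_scaler(train + test)

print(f"train-only mean = {mu:.2f}, leaky mean = {leaky_mu:.2f}")
```

The same pattern applies to imputation and feature selection: whatever is estimated from data should be estimated from the training rows alone.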

1 min read



Apr 9, 2021

Diabetes Classification Model with SVM and KNN models

Amit Chauhan


You mention missing data, but I didn't see any code to find missing values, just null values. This dataset is riddled with missing data. An insulin level of zero or a triceps thickness of zero means the measurement wasn't done, not that the actual value was zero. By the same token, if zero is listed for pregnancies, does that mean the question was not asked or that the woman has never been pregnant? Nobody understands the diabetic pedigree column. Personally, as a physician data scientist, I don't use this dataset for those reasons.
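A minimal Python sketch of the point about zeros, under stated assumptions: the rows and the column names (`insulin`, `triceps`, `pregnancies`) are hypothetical stand-ins, not the dataset's actual contents or headers. A plain null check would report no missing values here, so zeros in columns where zero is implausible are converted to `None` before counting.

```python
# Hypothetical rows in the style of the diabetes dataset discussed above.
rows = [
    {"pregnancies": 0, "insulin": 0,   "triceps": 35},
    {"pregnancies": 2, "insulin": 94,  "triceps": 0},
    {"pregnancies": 1, "insulin": 0,   "triceps": 0},
    {"pregnancies": 0, "insulin": 168, "triceps": 23},
]

# Assumption: a zero in these columns means "not measured".
# Pregnancies is left untouched because a zero there is ambiguous.
ZERO_MEANS_MISSING = ("insulin", "triceps")

def mark_missing(row):
    """Replace implausible zeros with None so they count as missing."""
    return {
        key: (None if key in ZERO_MEANS_MISSING and value == 0 else value)
        for key, value in row.items()
    }

# What a naive null check sees before any recoding.
nulls_before = sum(value is None for row in rows for value in row.values())

cleaned = [mark_missing(row) for row in rows]
missing_counts = {
    col: sum(row[col] is None for row in cleaned) for col in ZERO_MEANS_MISSING
}

print("nulls before recoding:", nulls_before)
print("missing after recoding:", missing_counts)
```

Which columns belong in `ZERO_MEANS_MISSING` is a domain judgment, which is the comment's underlying point.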

1 min read



Dec 18, 2020

Microsoft Lobe: Image Recognition Made Simple

In the past few years, artificial intelligence (AI) has become synonymous with deep learning, which is based on artificial neural networks (ANNs) with multiple layers. In healthcare, the most common use of AI is image recognition, particularly in cardiology, pathology, radiology, and ophthalmology. …

Image Classification

5 min read



Dec 15, 2020

Synthea: Do-It-Yourself Data

It is difficult to find patient-level data of sufficient size for research, modeling, or software development. This is largely due to HIPAA concerns and the overall lack of interoperability in the US healthcare system. Synthetic data has potential in those areas but much of the generated data is non-medical. For…

Synthetic Data

5 min read



Bob is a physician data scientist and author of several textbooks found at https://www.informaticseducation.org and https://nocodedatascience.net
