No-Code Data Science: Part 1

Are You Ready?

Robert Hoyt MD FACP FAMIA

David Patrishkoff MS LSSMBB

Introduction

Over approximately the past five years, alternatives to coding in R and Python have appeared. While programming has been the lingua franca (cross-cultural common language) of most graduate data science programs, it is also an obstacle for many who have neither the aptitude nor the time to learn and retain coding skills.

Background of Low-Code Data Science

First came the low-code alternatives that expedited programming by performing advanced functions with fewer lines of code. Many of these early programs fit into the category of AutoML; in other words, they automated machine learning. Examples of these programs are DataViz, PyCaret, Pandas-profiling, and MLBox. There were also data science programs or platforms considered to be low-code, such as Google Cloud AutoML and H2O AutoML. These programs excel in data preprocessing and machine learning model creation. While most low-code and no-code programs focus on facilitating data preparation and machine learning, there is also a no-code movement in artificial intelligence.
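To make the AutoML idea concrete, here is a minimal sketch of what such tools automate: trying several candidate models and keeping the one with the best cross-validated score. It uses scikit-learn as a stand-in and is not how PyCaret, MLBox, or H2O AutoML are implemented; the choice of candidate models, the 5-fold setting, and the variable names are our own assumptions for illustration.

```python
# Minimal AutoML-style sketch (illustrative assumption, not any tool's real API):
# score several candidate models automatically and keep the best one.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate models: an arbitrary selection chosen for this example
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=1),
    "random_forest": RandomForestClassifier(random_state=1),
}

# Mean AUC over 5 cross-validation folds for every candidate
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Report the candidates from best to worst and name the winner
best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean AUC = {score:.3f}")
print("Best model:", best_name)
```

Dedicated AutoML packages typically go further than this loop, for example by also tuning hyperparameters, but the principle of replacing many manual steps with a few lines is the same.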

Another major category of low-code data science is data science conducted with large language models (LLMs), such as ChatGPT, Claude 2, Perplexity AI, and others. In reality, LLMs can be a blend of low-code and no-code data science. Users can generate Python or R scripts to perform a variety of functions. The script can then be copied and pasted into any integrated development environment (IDE), such as a Jupyter Notebook. On the other hand, LLMs can summarize and analyze datasets and generate reports without any coding. They are useful as data science assistants to answer detailed questions, but intermediate knowledge of data science is needed to leverage these technologies intelligently.

Much of what is written about the low-code and no-code movements focuses on their application in software development. We will not discuss this aspect and refer readers to a recent Medium article. [1]

No-Code Data Science — Visual Programming Options

With the advent of visual programming came true no-code data science. Instead of writing code, the user connects widgets or operators to accomplish the same thing. Examples of visual programming platforms are RapidMiner, KNIME, and Orange. In the case of Orange, Python is the computational engine in the background, but no code is seen by the user. Orange is open-source and free, as is the KNIME Analytics Platform.

Orange is a unique data science platform due to its ease of use and depth. It was created by the University of Ljubljana in Slovenia and is updated on a regular basis. It has an extensive library with over 200 widgets to perform multiple functions in data processing, data exploration, visualization, modeling, supervised and unsupervised learning, time series forecasting, survival analysis, geolocation, image analysis, and text mining. There are also additional modules that can be downloaded, for example, bioinformatics and spectroscopy. We have published a series on Orange on Medium: Maximizing Orange for Data Science Education — Part 1 and Maximizing Orange for Data Science Education — Part 2.

Below is a comparison between Orange and Python programming. In Figure 1, a simple classification model was created in a few seconds using Orange. The following model performance metrics were reported for both approaches: AUC, accuracy, F1 score, precision, recall, MCC, and specificity.

Figure 1 Orange workflow to create a classification model

Here is an example of Python code for creating a classification model using random forest as the classifier and calculating the same performance metrics.

```python
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    roc_auc_score, accuracy_score, f1_score, precision_score,
    recall_score, matthews_corrcoef, confusion_matrix,
)

# Load the dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Create the random forest classifier
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = rf.predict(X_test)
y_prob = rf.predict_proba(X_test)[:, 1]

# Calculate the evaluation metrics
auc = roc_auc_score(y_test, y_prob)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
mcc = matthews_corrcoef(y_test, y_pred)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
specificity = tn / (tn + fp)

# Print the evaluation metrics
print("AUC:", auc)
print("Accuracy:", accuracy)
print("F1 score:", f1)
print("Precision:", precision)
print("Recall:", recall)
print("MCC:", mcc)
print("Specificity:", specificity)
```

The example above illustrates that the Orange visual programming approach is intuitive, uncomplicated, and doesn’t require programming expertise.
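For readers who want to see where these numbers come from, every metric in the list except AUC can be derived from the four cells of the confusion matrix (true positives, false positives, false negatives, and true negatives). The sketch below shows the standard formulas in plain Python. The function name `metrics_from_counts` and the counts passed to it are hypothetical, chosen only for illustration, and the sketch assumes none of the denominators is zero.

```python
import math


def metrics_from_counts(tp, fp, fn, tn):
    """Return threshold-based classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    recall = tp / (tp + fn)           # share of actual positives found (sensitivity)
    specificity = tn / (tn + fp)      # share of actual negatives found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: ranges from -1 to 1
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not results from the model above
example = metrics_from_counts(tp=85, fp=5, fn=4, tn=49)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

AUC is the exception because it is computed from the predicted probabilities across all thresholds rather than from a single confusion matrix.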

No-Code Data Science — Other Options

Here are some other examples of no-code data analysis tools:

Tableau: Allows users to combine, analyze, and visualize data from multiple sources without any coding expertise.

Microsoft Power BI: A cloud-based business analytics service that provides interactive visualizations and business intelligence capabilities.

Google Looker Studio: A free data visualization tool that allows users to create interactive dashboards and reports with data from multiple sources.

Sisense: Enables users to create interactive dashboards, generate reports, and derive insights to make data-driven decisions.

CastorDoc: A no-code data tool that cleans and transforms data, creates visualizations, and generates reports. [2–8]

JASP: A free open-source project supported by the University of Amsterdam that supports frequentist and Bayesian statistical analysis, as well as machine learning analysis.

The no-code movement has also impacted many other areas, not specifically related to data science. For example, you can create a website without knowing HTML by using software such as Wix, Webflow, or Bubble.

No-Code Data Science Advantages

No-code data science has several advantages, including:

Accessibility: No-code data analysis tools make data analysis capabilities accessible to a broader audience. This empowers individuals throughout an organization to independently explore and analyze data, promoting a data-driven culture across the business.

Lower learning curve: No-code data analysis applications take less time to learn than a programming language such as R or Python.

Cost savings: There can be financial savings if the no-code approach saves time or reduces the need to employ a data scientist.

Automation: No-code data platforms can automate tedious tasks, freeing up individuals to focus on more innovative work.

Robust preprocessing: The best no-code analysis programs offer the same robust data processing, data visualizations, and model optimization techniques as Python programming.

Democratization of AI and ML: No-code platforms democratize AI and ML, reducing dependence on data scientists and empowering organizations to respond quickly to emerging trends and customer demands.

Overall, no-code data science tools can help organizations build a culture of data-driven decision-making, increase operational efficiency, and reduce costs. [9–13]

No-Code Data Science Disadvantages

No-code data science has many advantages, but it also has some potential disadvantages, including:

Limited flexibility and customization: Perhaps the most significant drawback of no-code data science is its limited flexibility in comparison to traditional coding.

Learning curve: Despite no-code data science platforms having lower learning curves than traditional coding, there is still a learning curve for users new to the platforms.

Integration and scalability constraints: No-code data science platforms may be difficult to integrate with other systems or to scale, which can be a disadvantage for businesses that depend on such integration.

It must be pointed out that although visual programming is much faster than standard programming, data preprocessing, exploration, and visualization still cannot be skipped. Those steps are critical for understanding the data and preparing it for modeling.
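As a rough illustration of what those steps involve when written as code, the sketch below explores and then rescales the same breast cancer dataset used earlier. It assumes pandas is installed alongside scikit-learn, and the specific checks (missing values, class balance, summary statistics, standardization) are our own selection of common first steps, not a complete checklist.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Load the data as a pandas DataFrame: feature columns plus a "target" column
df = load_breast_cancer(as_frame=True).frame

# Exploration: size, missing values, class balance, and summary statistics
n_rows, n_cols = df.shape
missing_per_column = df.isna().sum()
class_balance = df["target"].value_counts(normalize=True)
summary = df.describe().T[["mean", "std", "min", "max"]]

print(f"Rows: {n_rows}, columns: {n_cols}")
print("Columns with missing values:", int((missing_per_column > 0).sum()))
print("Class balance:")
print(class_balance)
print(summary.head())

# Preprocessing: put all features on a common scale (mean 0, variance 1).
# In a real workflow the scaler would be fitted on the training split only.
features = df.drop(columns="target")
scaled = StandardScaler().fit_transform(features)
print("Scaled feature matrix shape:", scaled.shape)
```

In Orange, the equivalent checks are done by connecting widgets rather than writing these lines, but the questions being asked of the data are the same.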

In Part 2, we will discuss the creation of our textbook, No-Code Data Science, which leverages Orange and the free open-source statistical package JASP to teach comprehensive data science.

Conclusions

By removing the need for coding, entry into data science becomes notably more accessible for a larger group of interested professionals. This enables a wider array of data enthusiasts and smaller organizations to venture into the digital realm, alleviating the pressure on traditional data scientists to shoulder the entirety of an organization’s data science expansion endeavors. By involving a more diverse group of professionals in the data science process, we can foster greater trust and cultivate deeper collaborations with established data scientists.

Citations:

[1] https://medium.com/@nawaz_s/unlocking-the-future-of-software-development-the-low-code-and-no-code-revolution-d89b42775419

[2] https://sqlpad.io/tutorial/best-no-code-data-analytics-tools/
[3] https://www.softr.io/no-code/types-of-no-code-tools
[4] https://www.forbes.com/sites/bernardmarr/2022/12/12/the-10-best-examples-of-low-code-and-no-code-ai/?sh=24a5688e74b5
[5] https://userguiding.com/blog/no-code-tools/
[6] https://webflow.com/blog/no-code-apps
[7] https://www.castordoc.com/blog/what-are-the-no-code-data-tools
[8] https://unitedtraining.com/resources/blog/no-code-low-code-data-analysis-tools
[9] https://pathmonk.com/what-is-no-code-data-analysis-and-why-you-need-it/
[10] https://www.revealbi.io/blog/benefits-of-low-code-no-code-bi-solutions
[11] https://www.yellowfinbi.com/blog/benefits-of-low-code-no-code-bi-solutions
[12] https://northwest.education/insights/careers/5-pros-and-cons-of-no-code-development/
[13] https://northwest.education/insights/careers/advantages-of-no-code-ai-and-ml/
