TabPFN: A New Way to Analyze Data
Supervised learning is a machine learning approach that predicts an outcome from data in which the label or outcome is already known. If you are trying to predict a categorical binary outcome, such as lived/died or admitted/not admitted, that is a classification model. If the outcome you are trying to predict is numerical, such as length of stay or life expectancy, that is a regression model.
A myriad of algorithms are used in predictive analytics to learn the patterns in data that drive classification and regression models. Some have statistical origins, such as logistic and linear regression, and have been around for a long time. Machine learning algorithms such as random forest and XGBoost are newer, although even the latter was introduced in 2014. What has happened since then? Enter the Tabular Prior-data Fitted Network (TabPFN), highlighted in the article "Accurate predictions on small data with a tabular foundation model," published in Nature on January 8, 2025.
Revolutionary Approach to Tabular Data
TabPFN introduces a revolutionary approach to handling tabular data (data that fits into a table or spreadsheet) for classification and regression tasks. In contrast to traditional methods that must be trained from scratch on each dataset, TabPFN uses a pre-trained transformer (a deep neural network) that makes predictions in a single forward pass and requires no hyperparameter tuning. It is a foundation model, meaning it is capable of a variety of tasks. It does not require the data to be split into training and testing sets, and the overall process is fast. It is intended for small to medium-sized datasets with up to 10,000 rows and 500 columns. TabPFN is distributed as a Python package, so it is easy to install.
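For readers who want to try it, the snippet below is a minimal sketch of fitting a TabPFN classifier, assuming the scikit-learn-style fit/predict interface exposed by the tabpfn package; the dataset and the held-out split are used here only to illustrate and evaluate a prediction, not as part of TabPFN itself.

```python
# Minimal sketch, assuming the scikit-learn-style interface of the tabpfn
# package (pip install tabpfn); the example dataset is for illustration only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
# The held-out split here is only so we can score the predictions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

clf = TabPFNClassifier()        # no hyperparameter tuning required
clf.fit(X_train, y_train)       # predictions come from a single forward pass
proba = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))
```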
Putting TabPFN to the Test
According to the article in Nature, this method equals or exceeds current machine learning algorithms, but how about a head-to-head comparison? I tested TabPFN on three classification tasks and one regression task. I used Vizly, an AI-enabled data analytics platform where you can select the programming language and the large language model. For these scenarios, I used Python and GPT-4o.
Adult Income dataset. I asked TabPFN and XGBoost to create a classification model on this dataset to predict income > 50K and report AUC, accuracy, F1 score, precision, recall, specificity, and execution time. In the case of XGBoost, I split the data into 75% train/25% test, used 10-fold cross-validation, and used grid search to optimize the algorithm. No preprocessing was undertaken for either approach. File size = 5.3 MB. In this example, XGBoost was associated with higher accuracy, F1 score, recall, and specificity, but TabPFN was associated with a higher AUC and precision.
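As a rough illustration of the XGBoost side of this comparison, the sketch below shows a 75/25 split with a grid search using 10-fold cross-validation; the file name, target column, and hyperparameter grid are illustrative assumptions, not the exact settings used in this experiment.

```python
# Sketch of the XGBoost comparison pipeline; assumes the CSV has already been
# numerically encoded. File name, target column, and grid are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from xgboost import XGBClassifier

df = pd.read_csv("adult_income.csv")        # hypothetical file name
X = df.drop(columns=["income_gt_50k"])      # hypothetical target column
y = df["income_gt_50k"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

param_grid = {
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
}
grid = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                    param_grid, cv=10, scoring="roc_auc")
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Test AUC:", grid.score(X_test, y_test))
```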
Framingham Heart Study. I asked TabPFN and XGBoost to create a classification model on this dataset to predict the 10-year risk of coronary heart disease and report AUC, accuracy, F1 score, precision, recall, specificity, and execution time. In the case of XGBoost, I split the data into 75% train/25% test and used grid search to optimize the algorithm. No preprocessing was undertaken for either approach. File size = 192 KB. This dataset has missing data and a class imbalance, as only 15% of subjects developed heart disease at 10 years. The two approaches performed similarly, and both reflect the known class imbalance, with low recall and high specificity.
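Because specificity has no built-in scikit-learn scorer, a small helper like the one below (illustrative, not the exact code used in these experiments) can compute the full set of reported metrics from a fitted binary classifier's predictions.

```python
# Illustrative helper for the reported metrics; specificity is derived from
# the confusion matrix since scikit-learn has no built-in scorer for it.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

def report_metrics(y_true, y_pred, y_proba):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "AUC": roc_auc_score(y_true, y_proba),
        "Accuracy": accuracy_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall (sensitivity)": recall_score(y_true, y_pred),
        "Specificity": tn / (tn + fp),
    }
```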
Heart Disease Cleveland. A classification model was created on this dataset to predict heart disease and report AUC, accuracy, F1 score, precision, recall, specificity, and execution time on test data. All patients had multiple tests, including cardiac catheterization and stress testing. I first ran the model using TabPFN in Vizly with no preprocessing. For the multiple other classification algorithms noted in the table below, I split the data into 75% train/25% test and used 10-fold cross-validation along with grid search. No preprocessing was undertaken for either approach. File size = 24 KB. This dataset has only three missing data points and no class imbalance. For this dataset, logistic regression was the overall winner, and all algorithms executed more quickly than TabPFN.
Boston Housing Dataset. The outcome is the median home value, a numerical value, so this is a regression model. For the algorithms used, I requested that the performance metrics be reported on the test data as MSE, RMSE, MAE, R squared, and execution time. For non-TabPFN algorithms, I requested that the data be split into 75% training and 25% testing, along with 10-fold cross-validation. File size = 38 KB. In this example, TabPFN performed the best, followed by XGBoost.
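For the regression case, a minimal sketch along the following lines would cover the reported metrics, assuming the tabpfn package also exposes a scikit-learn-style TabPFNRegressor; the file name and target column are illustrative.

```python
# Minimal regression sketch, assuming a scikit-learn-style TabPFNRegressor in
# the tabpfn package; file and column names are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from tabpfn import TabPFNRegressor

df = pd.read_csv("boston_housing.csv")        # hypothetical file name
X = df.drop(columns=["medv"])                 # medv = median home value (target)
y = df["medv"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

reg = TabPFNRegressor()
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

mse = mean_squared_error(y_test, pred)
print({"MSE": mse,
       "RMSE": np.sqrt(mse),
       "MAE": mean_absolute_error(y_test, pred),
       "R2": r2_score(y_test, pred)})
```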
Conclusions
This rudimentary evaluation of TabPFN suggests it performs very well and often equals or exceeds XGBoost, one of the most competitive algorithms used for predictive analytics. It is worth noting that TabPFN had the slowest execution times, which could be an issue with larger datasets. The authors clearly state that TabPFN is indicated for datasets with fewer than 10,000 rows and fewer than 500 columns, which is adequate for many research and educational projects.
It is not yet known how well this new deep learning method handles common data challenges, such as class imbalance, multicollinearity, missing data, and outliers. I look forward to future academic publications addressing these questions.