
TabPFN


Developer(s): Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter, Leo Grinsztajn, Klemens Flöge, Oscar Key & Sauraj Gambhir[1]
Initial release: September 16, 2023[2][3]
Written in: Python[3]
Operating system: Linux, macOS, Microsoft Windows[3]
Type: Machine learning
License: Apache License 2.0
Website: github.com/PriorLabs/TabPFN

    TabPFN (Tabular Prior-data Fitted Network) is a machine learning model that uses a transformer architecture for supervised classification and regression tasks on small to medium-sized tabular datasets, e.g., up to 10,000 samples.[1] The model is known for high predictive performance on small-dataset benchmarks and for its meta-learning approach, which builds on prior-data fitted networks.[4]

    Overview

    TabPFN was first developed in 2022; TabPFN v2 was published in Nature in 2025 by Hollmann and co-authors.[1] The source code is published on GitHub under a modified Apache License and on PyPI.[5]

    TabPFN v1 was introduced in a 2022 pre-print and presented at ICLR 2023.[2] Prior Labs, founded in 2024, aims to commercialize TabPFN.[6]

    TabPFN supports classification, regression and generative tasks,[1] and its TabPFN-TS extension adds time series forecasting.[7]

    Pre-training

    TabPFN addresses challenges in modeling tabular data[8][9] by building on Prior-Data Fitted Networks,[10] using a transformer pre-trained on synthetic tabular datasets.[2][4]

    It is pre-trained once on around 130 million synthetic datasets generated using Structural Causal Models or Bayesian Neural Networks, simulating real-world data characteristics such as missing values and noise.[1] This enables TabPFN to process new datasets in a single forward pass, adapting to the input without retraining.[2] The model's transformer encoder processes features and labels by alternating attention across rows and columns, capturing relationships within the data.[11] TabPFN v2, an updated version, handles numerical and categorical features as well as missing values, and supports tasks such as regression and synthetic data generation.[1]
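    The alternating row and column attention described above can be illustrated with a minimal sketch. It assumes each table cell already has an embedding vector and uses plain dot-product attention with identity query, key and value projections. The function names are hypothetical and not taken from the TabPFN code base, and learned projections, multiple heads, residual connections and the masking between training and test rows are all omitted.

```python
import math


def self_attention(vectors):
    """Single-head dot-product self-attention over a list of equal-length vectors.

    Simplification: queries, keys and values are the input vectors themselves.
    """
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        m = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out


def alternating_attention_layer(cells):
    """cells[r][c] is the embedding of the cell in row (sample) r, column (feature) c."""
    n_rows, n_cols = len(cells), len(cells[0])
    # Step 1: within each row, cells attend to each other across features.
    cells = [self_attention(row) for row in cells]
    # Step 2: within each column, cells attend to each other across samples.
    columns = [self_attention([cells[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    # Transpose back to row-major layout.
    return [[columns[c][r] for c in range(n_cols)] for r in range(n_rows)]


# A toy table with 2 rows, 3 columns and 2-dimensional cell embeddings.
table = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.2, 0.1], [0.3, 0.9], [1.0, 1.0]],
]
mixed = alternating_attention_layer(table)
```

    The output has the same shape as the input, so layers of this kind can be stacked; after both steps each cell has received information from its own row and its own column.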

    TabPFN's pre-training exclusively uses synthetically generated datasets, avoiding benchmark contamination and the costs of curating real-world data.[2] TabPFN v2 was pre-trained on approximately 130 million such datasets, each serving as a "meta-datapoint".[1]

    The synthetic datasets are primarily drawn from a prior distribution embodying causal reasoning principles, using Structural Causal Models (SCMs) or Bayesian Neural Networks (BNNs). Random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures. The process generates diverse datasets that simulate real-world imperfections like missing values, imbalanced data and noise. During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by running a neural network forward pass.[1]
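    As a rough illustration of this process, the sketch below draws one small classification dataset from a randomly generated structural causal model and then splits it into labelled context rows and query rows whose targets are hidden. It is a toy example under stated assumptions, not the prior used to train TabPFN: the function names, edge probability, tanh nonlinearity, noise scale and sign-based labelling are all hypothetical choices made for brevity.

```python
import math
import random


def sample_scm_dataset(n_rows=64, n_nodes=6, n_features=4, missing_rate=0.1, seed=0):
    """Draw one synthetic table from a randomly generated structural causal model."""
    rng = random.Random(seed)
    # Random DAG: node j may only depend on earlier nodes i < j, so no cycles occur.
    # The low edge probability (an assumption) favours simpler causal structures.
    parents = {j: [i for i in range(j) if rng.random() < 0.3] for j in range(n_nodes)}
    weights = {(i, j): rng.gauss(0.0, 1.0) for j in parents for i in parents[j]}
    # The last node is the target; features are a random subset of the other nodes.
    target = n_nodes - 1
    feature_nodes = sorted(rng.sample(range(target), n_features))

    X, y = [], []
    for _ in range(n_rows):
        values = []
        for j in range(n_nodes):
            # Each node is a noisy nonlinear function of its parents.
            signal = sum(weights[(i, j)] * values[i] for i in parents[j])
            values.append(math.tanh(signal) + rng.gauss(0.0, 0.5))
        # Simulate missing values by blanking out feature cells at random.
        row = [math.nan if rng.random() < missing_rate else values[f] for f in feature_nodes]
        X.append(row)
        y.append(1 if values[target] > 0 else 0)
    return X, y


def split_context_query(X, y, n_context):
    """Split one dataset into labelled context rows and query rows with hidden targets."""
    context = list(zip(X[:n_context], y[:n_context]))
    query_X = X[n_context:]
    hidden_y = y[n_context:]  # only used to score predictions during pre-training
    return context, query_X, hidden_y


X, y = sample_scm_dataset(n_rows=50, seed=1)
context, query_X, hidden_y = split_context_query(X, y, n_context=40)
```

    In pre-training, a model would receive `context` and `query_X`, predict the hidden targets, and be scored against `hidden_y`; repeating this over many freshly sampled datasets is what makes each dataset act as a single training example.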

    Because TabPFN is pre-trained, it does not require the costly hyperparameter optimization that other deep learning methods typically need.[11]

    Applications

    TabPFN has been investigated for applications in domains such as time series forecasting,[7] chemoproteomics,[12] insurance risk classification,[13] medical diagnostics,[14][15][16][17] metagenomics,[18] wildfire propagation modeling,[19] and others.

    References

    1. Hollmann, N.; Müller, S.; Purucker, L. (2025). "Accurate predictions on small data with a tabular foundation model". Nature. 637 (8045): 319–326. Bibcode:2025Natur.637..319H. doi:10.1038/s41586-024-08328-6. PMC 11711098. PMID 39780007.
    2. Hollmann, Noah (2023). TabPFN: A transformer that solves small tabular classification problems in a second. International Conference on Learning Representations (ICLR).
    3. Python Package Index (PyPI) - tabpfn. https://pypi.org/project/tabpfn/
    4. McCarter, Calvin (May 7, 2024). "What exactly has TabPFN learned to do? | ICLR Blogposts 2024". iclr-blogposts.github.io. Retrieved 2025-06-22.
    5. PriorLabs/TabPFN, Prior Labs, 2025-06-22, retrieved 2025-06-23.
    6. Kahn, Jeremy (5 February 2025). "AI has struggled to analyze tables and spreadsheets. This German startup thinks its breakthrough is about to change that". Fortune.
    7. "TabPFN Time Series". GitHub.
    8. Shwartz-Ziv, Ravid; Armon, Amitai (2022). "Tabular data: Deep learning is not all you need". Information Fusion. 81: 84–90. arXiv:2106.03253. doi:10.1016/j.inffus.2021.11.011.
    9. Grinsztajn, Léo; Oyallon, Edouard; Varoquaux, Gaël (2022). Why do tree-based models still outperform deep learning on typical tabular data?. Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22). pp. 507–520.
    10. Müller, Samuel (2022). Transformers can do Bayesian inference. International Conference on Learning Representations (ICLR).
    11. McElfresh, Duncan C. (8 January 2025). "The AI tool that can interpret any spreadsheet instantly". Nature. 637 (8045): 274–275. Bibcode:2025Natur.637..274M. doi:10.1038/d41586-024-03852-x. PMID 39780000.
    12. Offensperger, Fabian; Tin, Gary; Duran-Frigola, Miquel; Hahn, Elisa; Dobner, Sarah; Ende, Christopher W. am; Strohbach, Joseph W.; Rukavina, Andrea; Brennsteiner, Vincenth; Ogilvie, Kevin; Marella, Nara; Kladnik, Katharina; Ciuffa, Rodolfo; Majmudar, Jaimeen D.; Field, S. Denise; Bensimon, Ariel; Ferrari, Luca; Ferrada, Evandro; Ng, Amanda; Zhang, Zhechun; Degliesposti, Gianluca; Boeszoermenyi, Andras; Martens, Sascha; Stanton, Robert; Müller, André C.; Hannich, J. Thomas; Hepworth, David; Superti-Furga, Giulio; Kubicek, Stefan; Schenone, Monica; Winter, Georg E. (26 April 2024). "Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells". Science. 384 (6694): eadk5864. Bibcode:2024Sci...384k5864O. doi:10.1126/science.adk5864. PMID 38662832.
    13. Chu, Jasmin Z. K.; Than, Joel C. M.; Jo, Hudyjaya Siswoyo (2024). "Deep Learning for Cross-Selling Health Insurance Classification". 2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST). pp. 453–457. doi:10.1109/GECOST60902.2024.10475046. ISBN 979-8-3503-5790-5.
    14. Alzakari, Sarah A.; Aldrees, Asma; Umer, Muhammad; Cascone, Lucia; Innab, Nisreen; Ashraf, Imran (December 2024). "Artificial intelligence-driven predictive framework for early detection of still birth". SLAS Technology. 29 (6): 100203. doi:10.1016/j.slast.2024.100203. PMID 39424101.
    15. El-Melegy, Moumen; Mamdouh, Ahmed; Ali, Samia; Badawy, Mohamed; El-Ghar, Mohamed Abou; Alghamdi, Norah Saleh; El-Baz, Ayman (21 June 2024). "Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning". Bioengineering. 11 (7): 635. doi:10.3390/bioengineering11070635. PMC 11274351. PMID 39061717.
    16. Karabacak, Mert; Schupper, Alexander; Carr, Matthew; Margetis, Konstantinos (August 2024). "A machine learning-based approach for individualized prediction of short-term outcomes after anterior cervical corpectomy". Asian Spine Journal. 18 (4): 541–549. doi:10.31616/asj.2024.0048. PMC 11366553. PMID 39113482.
    17. Liu, Yanqing; Su, Zhenyi; Tavana, Omid; Gu, Wei (June 2024). "Understanding the complexity of p53 in a new era of tumor suppression". Cancer Cell. 42 (6): 946–967. doi:10.1016/j.ccell.2024.04.009. PMC 11190820. PMID 38729160.
    18. Perciballi, Giulia; Granese, Federica; Fall, Ahmad; Zehraoui, Farida; Prifti, Edi; Zucker, Jean-Daniel (10 October 2024). Adapting TabPFN for Zero-Inflated Metagenomic Data. Table Representation Learning Workshop at NeurIPS 2024.
    19. Khanmohammadi, Sadegh; Cruz, Miguel G.; Perrakis, Daniel D.B.; Alexander, Martin E.; Arashpour, Mehrdad (September 2024). "Using AutoML and generative AI to predict the type of wildfire propagation in Canadian conifer forests". Ecological Informatics. 82. doi:10.1016/j.ecoinf.2024.102711.


    This article "TabPFN" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:TabPFN. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.