Foundation model
In artificial intelligence, foundation models are machine learning models such as BERT[1], GPT-3[2], or DALL-E[3][4] that are "pre-trained" in a task-agnostic way on large-scale data and later "fine-tuned" for multiple downstream tasks.[5] As of 2022, foundation models are increasingly the state-of-the-art choice in natural language processing, computer vision, and other domains.[5]
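The pre-train-then-fine-tune paradigm can be sketched in a few lines of code. The example below uses the Hugging Face transformers library and the publicly released bert-base-uncased checkpoint as an illustrative tooling choice (these are assumptions, not part of the cited papers): a model pre-trained on unlabeled text is loaded and given a new task-specific classification head, which would then be fine-tuned on a small labeled downstream dataset.

```python
# Minimal sketch of the pretrain/fine-tune paradigm, assuming the
# Hugging Face transformers library (illustrative, not the cited papers' code).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Weights produced by task-agnostic pre-training on large unlabeled corpora.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # pre-trained foundation model
    num_labels=2,         # fresh head for one specific downstream task
)

# Fine-tuning would update these weights on a small labeled dataset
# (e.g. sentiment classification) with an ordinary training loop.
inputs = tokenizer("Foundation models transfer to many tasks.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, 2): one score per downstream class
```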
The concept of foundation models was introduced in 2021 in a position paper by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), co-authored by over 120 leading machine learning researchers, including American computer scientist and executive Fei-Fei Li.[5]
Natural language processing, in particular, has been profoundly affected by foundation models.[5] In 2018, Google AI researchers introduced BERT[1], a transformer-based self-supervised language model pre-trained on a corpus of unlabeled data extracted from the BooksCorpus (800 million words) and English Wikipedia (2,500 million words). Since its introduction, BERT fine-tuned on downstream tasks has achieved state-of-the-art performance on a number of natural language processing and natural language understanding benchmarks[1][5], including GLUE (General Language Understanding Evaluation), SQuAD (Stanford Question Answering Dataset) v1.1 and v2.0, and SWAG (Situations With Adversarial Generations). In 2020, OpenAI researchers introduced GPT-3, a transformer-based autoregressive language model with 175 billion parameters that performs many tasks in a few-shot setting without task-specific fine-tuning.[2] In computer vision, foundation models have advanced the state of the art in both traditional vision tasks and visual synthesis.[3][5]
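The self-supervised pre-training signal described above can be illustrated with a short sketch (again assuming the Hugging Face transformers library and the bert-base-uncased checkpoint as an example setup): a pre-trained BERT is asked to recover a masked token, the objective it learned from unlabeled text without any human annotation.

```python
# Sketch of BERT-style self-supervised pre-training: the model predicts the
# token hidden behind [MASK], so no manual labels are required.
# (Assumes the Hugging Face transformers library as an illustrative choice.)
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("Foundation models are pre-trained on [MASK] data."):
    print(prediction["token_str"], round(prediction["score"], 3))
```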
References
1. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2019-05-24). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805 [cs.CL].
2. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini (2020-07-22). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
3. Ramesh, Aditya; Pavlov, Mikhail; Goh, Gabriel; Gray, Scott; Voss, Chelsea; Radford, Alec; Chen, Mark; Sutskever, Ilya (2021-02-26). "Zero-Shot Text-to-Image Generation". arXiv:2102.12092 [cs.CV].
4. Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela; Clark, Jack; Krueger, Gretchen (2021-02-26). "Learning Transferable Visual Models From Natural Language Supervision". arXiv:2103.00020 [cs.CV].
5. Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine; Brunskill, Emma; Brynjolfsson, Erik (2021-08-16). "On the Opportunities and Risks of Foundation Models". arXiv:2108.07258 [cs.LG].
This article "Foundation model" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Foundation model. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.
This page exists already on Wikipedia. |