Foundation model

In contrast, adapting an existing foundation model for a specific task or using it directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.

[3][4] Beyond text, foundation models have been developed across a range of modalities—including DALL-E and Flamingo[5] for images, MusicGen[6] for music, and RT-2[7] for robotic control.

Foundation models are also being developed for fields like astronomy,[8] radiology,[9] genomics,[10] music,[11] coding,[12] time-series forecasting,[13] mathematics,[14] and chemistry.

The United States' definitions are the only ones that refer to the size of a foundation model, and they differ on how large that size must be.

Beyer and Eshoo's definition also specifies that foundation models must achieve a level of performance high enough to be a potential danger.

All definitions agree that foundation models must be trained on a broad range of data with potential applications in many domains.

Advances in computer parallelism (e.g., CUDA-enabled GPUs), new developments in neural network architecture (e.g., the Transformer), and the increased use of training data with minimal supervision all contributed to the rise of foundation models.

[23] Relative to most prior work on deep learning, these language models demonstrated the potential of training on much larger web-sourced datasets using self-supervised objectives (e.g. predicting the next word in a large corpus of text).
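
In its simplest form, this next-word (autoregressive) objective trains the model parameters \(\theta\) to assign high probability to each token given the tokens before it. A minimal sketch of the corresponding loss over a token sequence \(x_1, \ldots, x_T\) (the notation is illustrative and not tied to any particular model) is

\[ \mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}\left(x_t \mid x_1, \ldots, x_{t-1}\right) \]

Minimizing this loss over a large corpus requires no human labels, since the "label" at each position is simply the next token in the text.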

"[26] These "dangerous capabilities" stem from the accidental or intentional misuse of such models, which in conjunction with their powerful nature can lead to severe harms.

Due to their adaptability to a wide range of use-cases, foundation models are sometimes considered to be examples of general-purpose AI.

General-purpose AI systems are often characterized by large size, opacity, and potential for emergence, all of which can create unintended harms.

[17] Currently, the Transformer architecture is the de facto choice for building foundation models across a range of modalities.
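
At the core of the standard Transformer is scaled dot-product attention. For query, key, and value matrices \(Q\), \(K\), and \(V\) with key dimension \(d_k\), it can be written as

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \]

This operation lets every position in a sequence draw on information from every other position, which helps explain why the same architecture can be reused across text, images, and other modalities once inputs are represented as sequences of tokens.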

[35] In general, the training objectives for foundation models promote the learning of broadly useful representations of data.

[37] Training foundation models often runs the risk of violating user privacy, as private data can be disclosed, collected, or used in ways beyond the stated scope.

The average foundation model is too large to run within a single accelerator's memory, and the initial training process requires a large and expensive amount of computational resources.

GPUs are the most common choice of compute hardware for machine learning because of their large memory capacity and high computational throughput.

Acquiring a sufficient number of GPUs with the requisite compute efficiency is a challenge for many foundation model developers, and one that has become a growing dilemma in the field.

In particular, a model's scale is defined by training compute, dataset size, and the number of parameters, all of which exhibit a power-law relationship with end performance.
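
A commonly cited empirical form of this relationship expresses the loss \(L\) as a power law in each scale factor when the others are held fixed. The sketch below is schematic; the constants and exponents are placeholders whose values vary from study to study:

\[ L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C} \]

Here \(N\) is the number of parameters, \(D\) the dataset size, \(C\) the training compute, and \(N_c\), \(D_c\), \(C_c\), \(\alpha_N\), \(\alpha_D\), \(\alpha_C\) are empirically fitted constants.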

A variety of methods (e.g. prompting, in-context learning, fine-tuning, LoRA) provide different tradeoffs between the costs of adaptation and the extent to which models are specialized.
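
As a concrete example, low-rank adaptation (LoRA) freezes a pretrained weight matrix and trains only a small low-rank update added to it. The sketch below illustrates the idea in NumPy; the dimensions, variable names, and the lora_forward helper are illustrative assumptions rather than any particular library's API.

```python
# Illustrative sketch of low-rank adaptation (LoRA) for one linear layer.
import numpy as np

d, k, r = 512, 512, 8                    # layer dimensions; low rank r << min(d, k)
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d, k))         # pretrained weight matrix, kept frozen
A = rng.standard_normal((r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable low-rank factor, initialized to zero

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Apply the frozen weight plus the low-rank update B @ A to input x."""
    return x @ (W0 + B @ A).T

x = rng.standard_normal((1, k))
y = lora_forward(x)                      # matches the frozen layer exactly until B and A are trained
```

Because only A and B (roughly r(d + k) parameters) are updated, adaptation is far cheaper than fine-tuning every entry of W0, at the cost of a more constrained form of specialization.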

Traditionally, foundation models are evaluated relative to each other through standardized task benchmarks like MMLU,[42] MMMU,[43] HumanEval,[44] and GSM8K.

[51] Since foundation models' utility depends both on their own general capabilities and on the performance of fine-tuned applications, evaluation must cover both.

[52] Foundation models' general capabilities allow them to fulfill a unique role in the AI ecosystem,[53] fueled by many upstream and downstream technologies.

These upstream technologies include data providers (e.g. Scale AI,[55] Surge[56]) and compute providers (e.g. Amazon Web Services, Google Cloud, Microsoft Azure).

As the size and scope of foundation models grow, larger quantities of internet-scraped data become necessary, increasing the likelihood of including biased or toxic data.

[59] To address this issue of low-quality data that arose with unsupervised training, some foundation model developers have turned to manual filtering.

People can then access these applications to serve their various needs, allowing one foundation model to power many downstream uses and reach a wide audience.

Investment in computing capabilities to train larger AI models has rapidly increased.[57]