An energy-based model (EBM) (also called Canonical Ensemble Learning or Learning via Canonical Ensemble – CEL and LCE, respectively) is an application of the canonical ensemble formulation of statistical physics to learning from data.
The approach prominently appears in generative artificial intelligence.
EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models.
An EBM learns the characteristics of a target dataset and generates a similar but larger dataset.
Energy-based generative neural networks[1][2] are a class of generative models that aim to learn explicit probability distributions of data in the form of energy-based models whose energy functions are parameterized by modern deep neural networks.
Boltzmann machines are a special form of energy-based models with a specific parametrization of the energy.
For a given input $x$, the model defines an energy $E_\theta(x)$ such that the Boltzmann distribution $P_\theta(x) = \exp(-E_\theta(x)) / Z(\theta)$ gives the probability (density) of $x$. Since the normalization constant $Z(\theta) = \int_x \exp(-E_\theta(x))\,dx$ (also known as the partition function) depends on the Boltzmann factors of all possible inputs $x$, it cannot be easily computed or reliably estimated during training simply using standard maximum likelihood estimation.
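For concreteness, the following is a minimal sketch (the architecture and names such as EnergyNet are illustrative assumptions, not taken from the cited papers) of an energy function parameterized by a small neural network. The Boltzmann factor $\exp(-E_\theta(x))$ of any particular input is cheap to evaluate, while $Z(\theta)$ would require aggregating it over all possible inputs:

```python
# Sketch: a neural-network energy function and its (unnormalized) Boltzmann factor.
# The normalized density p_theta(x) = exp(-E_theta(x)) / Z(theta) cannot be
# evaluated directly because Z(theta) integrates over every possible input.
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),          # scalar energy per input
        )

    def forward(self, x):                  # x: (batch, dim)
        return self.net(x).squeeze(-1)     # E_theta(x), shape (batch,)

energy = EnergyNet()
x = torch.randn(4, 2)                      # four example inputs
boltzmann_factors = torch.exp(-energy(x))  # exp(-E_theta(x)), unnormalized
```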
The gradient of the log-likelihood of a single training example $x$ with respect to the parameters is
$$\frac{\partial \log p_\theta(x)}{\partial \theta} = \mathbb{E}_{x' \sim p_\theta}\!\left[\frac{\partial E_\theta(x')}{\partial \theta}\right] - \frac{\partial E_\theta(x)}{\partial \theta}.$$
The expectation in this formula can be approximately estimated by drawing samples $x'$ from the model distribution using Markov chain Monte Carlo (MCMC).[4] Early energy-based models, such as the 2003 Boltzmann machine by Hinton, estimated this expectation via blocked Gibbs sampling.
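Assuming a batch of samples $x'$ drawn from the current model by some MCMC scheme, the expectation can be replaced by a sample average. The sketch below (an illustration of the estimator under that assumption, reusing the hypothetical EnergyNet above, not code from the referenced works) obtains the gradient by differentiating a surrogate loss:

```python
# Sketch: Monte Carlo estimate of the maximum-likelihood gradient.
# grad log p_theta(x) = E_{x'~p_theta}[grad E_theta(x')] - grad E_theta(x),
# so minimizing E_theta(data) - E_theta(model samples) follows that gradient.
import torch

def mle_surrogate_loss(energy, x_data, x_model):
    """Surrogate whose gradient matches the negated log-likelihood gradient.

    x_data:  batch of training examples.
    x_model: batch of samples drawn from the current model, e.g. by MCMC.
    """
    return energy(x_data).mean() - energy(x_model).mean()

# usage (with an energy network and an MCMC sampler assumed to exist):
# loss = mle_surrogate_loss(energy, x_data, x_model)
# loss.backward()            # gradients with respect to theta
# optimizer.step()
```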
Newer approaches make use of more efficient Stochastic Gradient Langevin Dynamics (LD), drawing samples using:[5]
$$x'_{k+1} = x'_k - \frac{\alpha}{2}\frac{\partial E_\theta(x'_k)}{\partial x'_k} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \alpha).$$
A replay buffer of past values $x'_k$ is used with LD to initialize the sampling.[4] The parameters $\theta$ of the neural network are therefore trained in a generative manner via MCMC-based maximum likelihood estimation:[6] the learning process follows an "analysis by synthesis" scheme, where within each learning iteration, the algorithm samples the synthesized examples from the current model by a gradient-based MCMC method (e.g., Langevin dynamics or Hybrid Monte Carlo), and then updates the parameters $\theta$ based on the difference between the training examples and the synthesized ones – see the gradient formula above.
This process can be interpreted as an alternating mode-seeking and mode-shifting process, and also has an adversarial interpretation.[7][8] Essentially, the model learns an energy function $E_\theta$ that assigns low energies to correct (observed) configurations and higher energies to incorrect ones.
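The Langevin update above can be implemented directly with automatic differentiation. The following sketch (hyperparameters, buffer size and function names are illustrative assumptions, following the scheme described in [4][5] rather than any particular released implementation) draws synthesized examples by short-run Langevin dynamics and keeps a replay buffer of past chain states to initialize new chains:

```python
# Sketch: short-run Langevin dynamics for drawing samples x' from p_theta,
# with a replay buffer of past chain states used to initialize new chains.
import torch

def langevin_samples(energy, x_init, steps=60, step_size=0.01):
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(steps):
        e = energy(x).sum()
        grad_x, = torch.autograd.grad(e, x)
        noise = torch.randn_like(x)
        # x_{k+1} = x_k - (step_size / 2) * dE/dx + sqrt(step_size) * noise
        x = (x - 0.5 * step_size * grad_x
             + step_size ** 0.5 * noise).detach().requires_grad_(True)
    return x.detach()

# Replay buffer: start most chains from previous samples, a few from noise.
buffer = torch.randn(1000, 2)

def init_from_buffer(batch_size, reinit_prob=0.05):
    idx = torch.randint(0, buffer.shape[0], (batch_size,))
    x0 = buffer[idx].clone()
    fresh = torch.rand(batch_size) < reinit_prob      # occasionally restart
    x0[fresh] = torch.randn(int(fresh.sum()), 2)
    return x0, idx

# x0, idx = init_from_buffer(64)
# x_model = langevin_samples(energy, x0)
# buffer[idx] = x_model          # write the new samples back into the buffer
```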
The term "energy-based models" was first coined in a 2003 JMLR paper[9] where the authors defined a generalisation of independent components analysis to the overcomplete setting using EBMs.
Other early work on EBMs proposed models that represented energy as a composition of latent and observable variables.
EBMs demonstrate useful properties:[4] On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM generated high-quality images relatively quickly. It was able to generalize to out-of-distribution datasets, outperforming flow-based and autoregressive models. It was also relatively resistant to adversarial perturbations, performing better than models explicitly trained against such perturbations for classification.[4] Target applications include natural language processing, robotics and computer vision.[10][11]
The model has been generalized to various domains to learn distributions of videos[7][2] and 3D voxels. Further applications include data recovery (e.g., recovering videos with missing pixels or image frames,[7] 3D super-resolution,[4] etc.) and data reconstruction (e.g., image reconstruction and linear interpolation[14]).
EBMs compete with techniques such as variational autoencoders (VAEs), generative adversarial networks (GANs) or normalizing flows.
Joint energy-based models (JEM), proposed in 2020 by Grathwohl et al., allow any classifier with softmax output to be interpreted as an energy-based model.
The key observation is that such a classifier is trained to predict the conditional probability
$$p_\theta(y \mid x) = \frac{e^{f_\theta(x)[y]}}{\sum_{y'} e^{f_\theta(x)[y']}},$$
where $f_\theta(x)[y]$ is the logit corresponding to class $y$. Without any change to the logits, it was proposed to reinterpret them as describing a joint probability density
$$p_\theta(x, y) = \frac{e^{f_\theta(x)[y]}}{Z(\theta)}$$
with unknown partition function $Z(\theta)$ and energy $E_\theta(x, y) = -f_\theta(x)[y]$. By marginalization, we obtain the unnormalized density
$$p_\theta(x) = \sum_y p_\theta(x, y) = \frac{\sum_y e^{f_\theta(x)[y]}}{Z(\theta)} =: \frac{\exp(-E_\theta(x))}{Z(\theta)},$$
therefore $E_\theta(x) = -\log \sum_y e^{f_\theta(x)[y]}$, so that any classifier with softmax output can be used to define an energy function $E_\theta(x)$.
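As an illustration of this construction (a sketch of the JEM reinterpretation, not the authors' released code; the classifier architecture is an arbitrary placeholder), the energy of an input is simply the negative log-sum-exp of the classifier's logits:

```python
# Sketch: deriving an energy function from an ordinary softmax classifier.
# With logits f_theta(x)[y], JEM sets E_theta(x) = -logsumexp_y f_theta(x)[y],
# so that p_theta(x) is proportional to sum_y exp(f_theta(x)[y]).
import torch
import torch.nn as nn

classifier = nn.Sequential(          # any classifier producing class logits
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 10),              # 10 class logits
)

def jem_energy(x):
    logits = classifier(x)                       # shape (batch, num_classes)
    return -torch.logsumexp(logits, dim=-1)      # E_theta(x), shape (batch,)

x = torch.randn(8, 784)
print(jem_energy(x))                 # lower energy = higher unnormalized p(x)
# p_theta(y | x) is unchanged: it is still softmax(logits), because the
# partition function cancels in the conditional.
```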