Extreme learning machine

Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of the hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned.

In most cases, the output weights of the hidden nodes are learned in a single step, which essentially amounts to learning a linear model.
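As a worked illustration of this single step (using the notation $\mathbf{H}$, $\mathbf{T}$ and $\boldsymbol{\beta}$ introduced in the algorithm description below; this is the standard least-squares reading of ELM rather than a quotation from the article), fitting the output weights reduces to one linear solve because the network output is linear in them:

```latex
% The network output is linear in the output weights \beta:
%   f(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\,\beta .
% Given the hidden-layer output matrix H and the target matrix T,
% the output weights follow from a single least-squares solve:
\boldsymbol{\beta} \;=\; \arg\min_{\boldsymbol{\beta}} \,\lVert \mathbf{H}\boldsymbol{\beta} - \mathbf{T} \rVert^{2}
\;=\; \mathbf{H}^{\dagger}\mathbf{T},
% where H^{\dagger} denotes the Moore-Penrose pseudoinverse of H.
```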

The name "extreme learning machine" (ELM) was given to such models by Guang-Bin Huang, who originally proposed them for networks with any type of nonlinear piecewise continuous hidden nodes, including biological neurons and different types of mathematical basis functions.[4]

According to some researchers, these models are able to produce good generalization performance and learn thousands of times faster than networks trained using backpropagation.[5]

The literature also shows that these models can outperform support vector machines in both classification and regression applications.

One significant achievement of this line of research was the rigorous proof of the universal approximation and classification capabilities of ELM.

It has been shown that SVM provides suboptimal solutions compared to ELM, and that ELM can provide a white-box kernel mapping, implemented by ELM random feature mapping, instead of the black-box kernel used in SVM.

PCA and NMF can be considered special cases of ELM in which linear hidden nodes are used.

Additionally, since 2011 significant biological studies have been carried out that support certain ELM theories.[18][19][20]

From 2017 onwards, to overcome the low-convergence problem during training, approaches based on LU decomposition, Hessenberg decomposition and QR decomposition with regularization have begun to attract attention.[21][22][23] In 2017, the Google Scholar Blog published a list of "Classic Papers: Articles That Have Stood The Test of Time", which included papers on ELM.[25][26][27]

Given a single hidden layer of ELM, suppose that the output function of the $i$-th hidden node is $h_i(\mathbf{x}) = G(\mathbf{a}_i, b_i, \mathbf{x})$, where $\mathbf{a}_i$ and $b_i$ are the parameters of the $i$-th hidden node. The output function of the ELM for single hidden layer feedforward networks (SLFN) with $L$ hidden nodes is

$$f_L(\mathbf{x}) = \sum_{i=1}^{L} \beta_i h_i(\mathbf{x}) = \mathbf{h}(\mathbf{x})\,\boldsymbol{\beta},$$

where $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_L]^{\mathsf T}$ is the vector of output weights and $\mathbf{h}(\mathbf{x}) = [h_1(\mathbf{x}), \ldots, h_L(\mathbf{x})]$ is the hidden layer output mapping. Given $N$ training samples, the hidden layer output matrix $\mathbf{H}$ and the target matrix $\mathbf{T}$ are

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}(\mathbf{x}_1) \\ \vdots \\ \mathbf{h}(\mathbf{x}_N) \end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^{\mathsf T} \\ \vdots \\ \mathbf{t}_N^{\mathsf T} \end{bmatrix}.$$

Generally speaking, ELM is a kind of regularized neural network, but with non-tuned hidden layer mappings (formed by either random hidden nodes, kernels or other implementations). Its objective function is

$$\text{Minimize: } \lVert \boldsymbol{\beta} \rVert_p^{\sigma_1} + C \lVert \mathbf{H}\boldsymbol{\beta} - \mathbf{T} \rVert_q^{\sigma_2},$$

where $\sigma_1 > 0$, $\sigma_2 > 0$ and $p, q \in \{0, \tfrac{1}{2}, 1, 2, \ldots, +\infty\}$; different choices of these parameters yield different learning algorithms.
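As an illustration not spelled out above, the most commonly used special case takes $\sigma_1 = \sigma_2 = p = q = 2$, which reduces the objective to ridge regression on the random features and admits a closed-form solution:

```latex
% Ridge-regression special case (\sigma_1 = \sigma_2 = p = q = 2):
%   minimize  \|\beta\|_2^2 + C\,\|H\beta - T\|_2^2 .
% Setting the gradient with respect to \beta to zero gives
\boldsymbol{\beta} \;=\; \Bigl(\mathbf{H}^{\mathsf T}\mathbf{H} + \tfrac{1}{C}\,\mathbf{I}\Bigr)^{-1}\mathbf{H}^{\mathsf T}\mathbf{T}.
```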

As a special case, the simplest ELM training algorithm learns a model of the form (for a single-hidden-layer sigmoid neural network)

$$\hat{\mathbf{Y}} = \mathbf{W}_2 \, \sigma(\mathbf{W}_1 \mathbf{x}),$$

where $\mathbf{W}_1$ is the matrix of input-to-hidden-layer weights, $\sigma$ is an activation function, and $\mathbf{W}_2$ is the matrix of hidden-to-output-layer weights: $\mathbf{W}_1$ is filled with random values, and $\mathbf{W}_2$ is estimated in a single least-squares step, typically via the Moore-Penrose pseudoinverse.
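A minimal Python sketch of this special case follows; the function names, the Gaussian initialization and the use of numpy.linalg.pinv are illustrative assumptions rather than details taken from the article (a bias term is omitted for brevity).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, Y, n_hidden, rng=np.random.default_rng(0)):
    """Train a single-hidden-layer ELM: random W1, one least-squares step for W2."""
    # Step 1: fill the input-to-hidden weights W1 with random values (never trained).
    W1 = rng.normal(size=(X.shape[1], n_hidden))
    # Step 2: compute the hidden-layer outputs and solve for W2 in a single
    # least-squares step via the Moore-Penrose pseudoinverse.
    H = sigmoid(X @ W1)
    W2 = np.linalg.pinv(H) @ Y
    return W1, W2

def elm_predict(X, W1, W2):
    return sigmoid(X @ W1) @ W2

# Tiny usage example on random data (shapes only, no claim about accuracy).
X = np.random.rand(100, 5)
Y = np.random.rand(100, 1)
W1, W2 = elm_fit(X, Y, n_hidden=50)
Y_hat = elm_predict(X, W1, W2)
```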

Because of its different learning algorithm implementations for regression, classification, sparse coding, compression, feature learning and clustering, multiple ELMs have been used to form multi-hidden-layer networks, deep learning or hierarchical networks.[16][17][28]

A hidden node in ELM is a computational element, which need not be considered a classical neuron.[12]

Both universal approximation and classification capabilities[6][1] have been proved for ELM in the literature.

In particular, Guang-Bin Huang and his team spent almost seven years (2001-2008) on the rigorous proof of ELM's universal approximation capability.

Universal approximation capability: if tuning the parameters of the hidden nodes could make SLFNs approximate any target function $f(\mathbf{x})$, then the hidden node parameters can instead be randomly generated according to any continuous probability distribution, and

$$\lim_{L \to \infty} \Bigl\lVert \sum_{i=1}^{L} \beta_i h_i(\mathbf{x}) - f(\mathbf{x}) \Bigr\rVert = 0$$

holds with probability one for appropriate output weights $\boldsymbol{\beta}$. A wide range of nonlinear piecewise continuous functions $G(\mathbf{a}, b, \mathbf{x})$ can be used as the output functions of the hidden nodes.

The black-box character of neural networks in general, and of extreme learning machines (ELM) in particular, is one of the major concerns that deters engineers from applying them in safety-critical automation tasks.[29][30]

One approach to this concern focuses on the incorporation of continuous constraints into the learning process of ELMs,[31][32] which are derived from prior knowledge about the specific task.

This is reasonable because, in many application domains, machine learning solutions have to guarantee safe operation.

These studies revealed that the special form of ELM, with its functional separation and linear read-out weights, is particularly well suited for the efficient incorporation of continuous constraints in predefined regions of the input space.
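To illustrate why the linear read-out makes this efficient (the formulation, variable names and the use of the cvxpy library below are assumptions for illustration, not the method of the cited studies): since the prediction $\mathbf{h}(\mathbf{x})\boldsymbol{\beta}$ is linear in the output weights, a constraint such as non-negativity of the output on chosen points of a predefined input region becomes a linear constraint on $\boldsymbol{\beta}$, and training remains a convex problem.

```python
import numpy as np
import cvxpy as cp

# Assumed inputs (shapes are illustrative):
#   H  : (N, L) hidden-layer outputs on the training inputs
#   T  : (N,)   training targets
#   Hc : (M, L) hidden-layer outputs at points of a predefined input region
#               where prior knowledge says the prediction must be >= 0
rng = np.random.default_rng(0)
N, M, L = 200, 50, 30
H, T, Hc = rng.normal(size=(N, L)), rng.normal(size=N), rng.normal(size=(M, L))

beta = cp.Variable(L)
ridge = 1e-2  # regularization strength (illustrative value)
objective = cp.Minimize(cp.sum_squares(H @ beta - T) + ridge * cp.sum_squares(beta))
constraints = [Hc @ beta >= 0]  # continuous constraint, linear in beta
cp.Problem(objective, constraints).solve()
beta_constrained = beta.value  # read-out weights respecting the constraint
```

Because only the read-out weights are constrained while the random hidden layer is untouched, the constrained fit stays a small convex problem regardless of how the hidden features were generated.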

There are two main complaints from the academic community concerning this work: the first is about "reinventing and ignoring previous ideas", and the second is about "improper naming and popularizing", as shown in some debates in 2008 and 2015.[33]

In particular, it was pointed out in a letter[34] to the editor of IEEE Transactions on Neural Networks that the idea of using a hidden layer connected to the inputs by random untrained weights had already been suggested in the original papers on RBF networks in the late 1980s; Guang-Bin Huang replied by pointing out subtle differences.[35]

In a 2015 paper,[1] Huang responded to complaints about his invention of the name ELM for already-existing methods, complaining of "very negative and unhelpful comments on ELM in neither academic nor professional manner due to various reasons and intentions" and an "irresponsible anonymous attack which intends to destroy harmony research environment", arguing that his work "provides a unifying learning platform" for various types of neural nets,[1] including hierarchical structured ELM.[28]

In 2015, Huang also gave a formal rebuttal to what he considered "malign and attack".