The generalized functional linear model (GFLM) is an extension of the generalized linear model (GLM) that allows one to regress univariate responses of various types (continuous or discrete) on functional predictors, which are typically random trajectories generated by a square-integrable stochastic process.
As in the GLM, a link function relates the expected value of the response variable to a linear predictor, which in the case of the GFLM is obtained by forming the scalar product of the random predictor function $X$ with a smooth parameter function $\beta$.
Applications of GFLM include classification and discrimination of stochastic processes and functional data.[1]
A key aspect of GFLM is estimation and inference for the smooth parameter function $\beta$,
which is usually obtained by dimension reduction of the infinite dimensional functional predictor.
A common method is to expand the predictor function in an orthonormal basis of $L^2$, the Hilbert space of square-integrable functions, with simultaneous expansion of the parameter function in the same basis.
This representation is then combined with a truncation step to reduce the contribution of the parameter function
in the linear predictor to a finite number of regression coefficients.
Functional principal component analysis (FPCA) that employs the Karhunen–Loève expansion is a common and parsimonious approach to accomplish this.
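As an illustration of the FPCA step, the following minimal sketch computes eigenfunctions and principal component scores for curves observed on a common, equally spaced grid. The function name `fpca_scores`, the Riemann-sum quadrature, and the simulated toy curves are assumptions made for this example; sparse or irregularly sampled data would need a smoothing-based approach instead.

```python
import numpy as np

def fpca_scores(X, t, n_components):
    """FPCA for curves sampled on a common, equally spaced grid.

    X : (n_curves, n_grid) array of function values
    t : (n_grid,) grid of evaluation points
    Returns the mean curve, leading eigenvalues, eigenfunctions and scores.
    """
    dt = t[1] - t[0]                          # quadrature weight (Riemann sum)
    mean = X.mean(axis=0)
    Xc = X - mean                             # centered predictor curves
    cov = Xc.T @ Xc / X.shape[0]              # sample covariance surface on the grid
    evals, evecs = np.linalg.eigh(cov * dt)   # discretized covariance operator
    order = np.argsort(evals)[::-1][:n_components]
    evals = evals[order]
    phi = evecs[:, order].T / np.sqrt(dt)     # rescale so that sum(phi_j**2) * dt == 1
    scores = Xc @ phi.T * dt                  # approximates the integral of Xc(t) * phi_j(t)
    return mean, evals, phi, scores

# Toy data (simulated, for illustration only): two random-amplitude components plus noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
X = (rng.normal(0.0, 2.0, (200, 1)) * np.sin(2 * np.pi * t)
     + rng.normal(0.0, 1.0, (200, 1)) * np.cos(2 * np.pi * t)
     + rng.normal(0.0, 0.05, (200, t.size)))

mean, evals, phi, scores = fpca_scores(X, t, n_components=2)
print(scores.shape)
```

The columns of `scores` then play the role of the finite-dimensional covariates in the truncated linear predictor.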
The Akaike information criterion (AIC) can be used for selecting the number of included components.
Minimization of cross-validation prediction errors is another criterion often used in classification applications.
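A minimal sketch of AIC-based selection of the number of components is given below. To keep it short, it assumes a Gaussian response fitted by least squares, and it uses simulated stand-ins for the principal component scores. The helper `gaussian_aic`, the parameter count, and all numerical values are illustrative assumptions rather than part of any particular published procedure.

```python
import numpy as np

def gaussian_aic(scores, y, p):
    """AIC (up to an additive constant) of a Gaussian linear model on the first p scores."""
    n = y.size
    Z = np.column_stack([np.ones(n), scores[:, :p]])   # intercept + first p scores
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ coef) ** 2)
    n_params = p + 2                                   # intercept, p slopes, error variance
    return n * np.log(rss / n) + 2 * n_params

rng = np.random.default_rng(1)
n, max_p = 300, 6
# Hypothetical scores: independent, with decaying standard deviations
sd = 1.0 / np.arange(1, max_p + 1)
scores = rng.normal(size=(n, max_p)) * sd
# Toy response that depends on the first two components only
y = 1.0 + 2.0 * scores[:, 0] - 3.0 * scores[:, 1] + rng.normal(0.0, 0.3, n)

aic = {p: gaussian_aic(scores, y, p) for p in range(1, max_p + 1)}
best_p = min(aic, key=aic.get)
print(best_p)
```

For a non-Gaussian response the same loop applies, with the residual sum of squares replaced by minus twice the maximized log-likelihood of the fitted GLM.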
Once the dimension of the predictor process has been reduced, the simplified linear predictor allows one to use GLM and quasi-likelihood estimation techniques to obtain estimates of the finite-dimensional regression coefficients, which in turn provide an estimate of the parameter function $\beta$.
The predictor functions $X(t)$, $t \in T$, are typically square-integrable stochastic processes on a real interval $T$.
The response $Y$ is typically a real-valued random variable, which may be either continuous or discrete.
The link function is a smooth invertible function that relates the conditional mean of the response to the linear predictor.
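To implement the necessary dimension reduction, the centered predictor process $X^c(t) = X(t) - E[X(t)]$ and the parameter function $\beta(t)$ are expanded in an orthonormal basis $\{\rho_j\}$ of $L^2$, with coefficients $\varepsilon_j$ and $\beta_j$ respectively, and the linear predictor is truncated to its first $p$ components.
For a differentiable link function with bounded first derivative, the approximation error of the $p$-truncated model is controlled by the omitted tail $\sum_{j>p} \beta_j \varepsilon_j$ of the linear predictor.
A heuristic motivation for the truncation strategy derives from the fact that, by the Cauchy–Schwarz inequality, $E\big(\sum_{j>p} \beta_j \varepsilon_j\big)^2 \le \sum_{j>p} \beta_j^2 \, \sum_{j>p} E(\varepsilon_j^2) \to 0$ as $p \to \infty$.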
As in the GLM, the resulting estimating (score) equation is solved using iterative methods such as Newton–Raphson (NR), Fisher scoring (FS), or iteratively reweighted least squares (IWLS) to obtain estimates of the regression coefficients.
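The iterative fitting step can be sketched as follows for a binary response, assuming the logit link (for which NR, FS and IWLS coincide). The function `iwls_logistic`, the simulated scores, and the "true" coefficients are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def iwls_logistic(Z, y, n_iter=25, tol=1e-8):
    """Logistic regression by IWLS; Z holds an intercept column plus the truncated scores."""
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ coef                                   # truncated linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))                  # conditional mean under the logit link
        mu = np.clip(mu, 1e-10, 1.0 - 1e-10)             # guard against zero weights
        w = mu * (1.0 - mu)                              # Bernoulli variance function
        z = eta + (y - mu) / w                           # working response
        new_coef = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * z))
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef

rng = np.random.default_rng(2)
n = 500
scores = rng.normal(size=(n, 2))                 # stand-in for the first two component scores
Z = np.column_stack([np.ones(n), scores])
true_coef = np.array([0.5, 1.0, -1.0])           # illustrative values only
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-Z @ true_coef)))

coef_hat = iwls_logistic(Z, y)
mu_hat = 1.0 / (1.0 + np.exp(-Z @ coef_hat))
print(coef_hat)
```

Multiplying the estimated slopes by the corresponding basis functions and summing gives the estimate of the parameter function on the grid.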
Functional linear regression, one of the most useful tools of functional data analysis, is an example of GFLM in which the response variable is continuous and is often assumed to have a normal distribution.
When the response variable has binary outcomes, i.e., 0 or 1, the distribution is usually chosen as Bernoulli, and then the conditional mean is the success probability $\mu = P(Y = 1 \mid X)$.
Another special case of GFLM occurs when the outcomes are counts, so that the distribution of the responses is assumed to be Poisson.
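The three special cases differ only in how the linear predictor is mapped to the conditional mean and in the variance function. The sketch below assumes the canonical choices (identity, logit and log), which the text does not prescribe; the names `FAMILIES` and `conditional_moments` are made up for this example.

```python
import numpy as np

# Mean function (maps the linear predictor to the conditional mean) and
# variance function for each special case, under the canonical link (an assumption).
FAMILIES = {
    "gaussian": {"mean": lambda eta: eta,
                 "variance": lambda mu: np.ones_like(mu)},
    "bernoulli": {"mean": lambda eta: 1.0 / (1.0 + np.exp(-eta)),
                  "variance": lambda mu: mu * (1.0 - mu)},
    "poisson": {"mean": lambda eta: np.exp(eta),
                "variance": lambda mu: mu},
}

def conditional_moments(alpha, coef, scores, family):
    """Conditional mean and variance implied by a truncated linear predictor."""
    eta = alpha + scores @ coef
    fam = FAMILIES[family]
    mu = fam["mean"](eta)
    return mu, fam["variance"](mu)

# Two hypothetical score vectors, both giving a linear predictor of exactly zero
scores = np.array([[0.0, 0.0], [1.0, -1.0]])
coef = np.array([0.5, 0.5])
for name in FAMILIES:
    mu, var = conditional_moments(0.0, coef, scores, name)
    print(name, mu, var)
```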
Extensions of GFLM have been proposed for cases where there are multiple predictor functions.[2]
Another generalization, semiparametric quasi-likelihood regression (SPQR),[1] considers the situation where the link and variance functions are unknown and are estimated non-parametrically from the data.
This situation can also be handled by single or multiple index models, using for example Sliced Inverse Regression (SIR).
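In general, estimation in the functional generalized additive model (FGAM) requires combining IWLS with backfitting.
However, if the expansion coefficients are obtained as functional principal components, then in some cases (for example, when the predictor process is Gaussian) they will be independent, in which case backfitting is not needed, and one can use popular smoothing methods for estimating the unknown parameter functions.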
A popular data set that has been used for a number of analyses in functional data analysis consists of the number of eggs laid daily until death by 1000 Mediterranean fruit flies (medflies for short).[1][2]
The red curves belong to flies that will lay fewer than the median number of remaining eggs after age 25, while the blue curves belong to flies that will lay more than the median number of remaining eggs after age 25.
A related problem, classifying medflies as long-lived or short-lived using the initial egg-laying trajectories as predictors and the subsequent longevity of the flies as the response, has been studied with the GFLM.[1]