In statistics, DFFIT and DFFITS ("difference in fit(s)") are diagnostics meant to show how influential a point is in a linear regression, first proposed in 1980.
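DFFIT is the change in the predicted value for a point when that point is left out of the regression:

$$\text{DFFIT}_i = \hat{y}_i - \hat{y}_{i(i)}$$

where $\hat{y}_i$ and $\hat{y}_{i(i)}$ are the predictions for point $i$ with and without point $i$ included in the regression.

DFFITS is the Studentized DFFIT, where Studentization is achieved by dividing by the estimated standard deviation of the fit at that point:

$$\text{DFFITS}_i = \frac{\text{DFFIT}_i}{s_{(i)}\sqrt{h_{ii}}}$$

where $s_{(i)}$ is the standard error estimated without the point in question, and $h_{ii}$ is the leverage of the point.

DFFITS also equals the product of the externally Studentized residual ($t_{i(i)}$) and the leverage factor ($\sqrt{h_{ii}/(1-h_{ii})}$):[2]

$$\text{DFFITS}_i = t_{i(i)}\sqrt{\frac{h_{ii}}{1-h_{ii}}}$$

Thus, for low-leverage points, DFFITS is expected to be small, whereas as the leverage approaches 1 the distribution of the DFFITS value widens without bound.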
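The two descriptions of DFFITS (a Studentized leave-one-out change in fit, and the externally Studentized residual times a leverage factor) can be checked against each other numerically. The following is a minimal sketch using NumPy, assuming a full-rank design matrix that already contains an intercept column. The function names `dffits` and `dffits_brute` are illustrative and do not come from any library.

```python
import numpy as np

def dffits(X, y):
    """DFFITS via the closed form: externally Studentized residual
    times sqrt(h / (1 - h)). X must have full column rank."""
    n, p = X.shape
    # Leverages are the diagonal of the hat matrix; with X = QR they are
    # the squared row norms of Q.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                      # ordinary residuals
    sse = e @ e
    # Error variance estimated with point i deleted, without refitting.
    s2_loo = (sse - e**2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_loo * (1.0 - h))   # externally Studentized residuals
    return t * np.sqrt(h / (1.0 - h))

def dffits_brute(X, y):
    """DFFITS from the definition: refit without point i, then Studentize
    the change in the fitted value at point i."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid_i = y[keep] - X[keep] @ beta_i
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))
        dffit = X[i] @ beta - X[i] @ beta_i
        out[i] = dffit / (s_i * np.sqrt(h[i]))
    return out
```

On any well-conditioned data set the two functions should agree to floating-point precision. The closed form avoids the n separate refits.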
For a perfectly balanced experimental design (such as a factorial design or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points.
This means that the DFFITS values will be distributed (in the Gaussian case) as $\sqrt{p/(n-p)} \approx \sqrt{p/n}$ times a t variate.
Therefore, the authors suggest investigating those points with DFFITS greater than $2\sqrt{p/n}$ in absolute value.
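As an illustration of the balanced-design argument, the sketch below builds a replicated two-level factorial design, confirms that every leverage equals p/n, and applies a cutoff of twice the square root of p/n to the absolute DFFITS values. The helper names `dffits_cutoff` and `flag_influential` are hypothetical, and the choice of a replicated 2×2 design is only an example.

```python
import itertools
import numpy as np

def dffits_cutoff(n, p):
    """Rule-of-thumb cutoff 2 * sqrt(p / n) for |DFFITS|."""
    return 2.0 * np.sqrt(p / n)

def flag_influential(dffits_values, p):
    """Return indices whose |DFFITS| exceeds the cutoff, plus the cutoff."""
    d = np.asarray(dffits_values, dtype=float)
    cutoff = dffits_cutoff(d.size, p)
    return np.flatnonzero(np.abs(d) > cutoff), cutoff

# Replicated 2x2 factorial design: intercept plus two +/-1 factors (n = 8, p = 3).
runs = np.array(list(itertools.product([-1.0, 1.0], repeat=2)) * 2)
X = np.column_stack([np.ones(len(runs)), runs])
n, p = X.shape

# Diagonal of the hat matrix X (X'X)^{-1} X'; every entry should equal p / n.
leverages = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)
```

With equal leverages, the leverage factor is the same constant for every point, which is why a single cutoff can be applied to the whole data set.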
Although the raw values resulting from the equations are different, Cook's distance and DFFITS are conceptually identical and there is a closed-form formula to convert one value to the other.
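One way to write the conversion is that Cook's distance equals the squared DFFITS multiplied by the ratio of the leave-one-out error variance to the full-sample error variance, divided by the number of parameters p. The sketch below computes both measures independently from the residuals and leverages so that the identity can be checked numerically. It assumes a full-rank design matrix, and the function names are illustrative rather than taken from a library.

```python
import numpy as np

def influence_measures(X, y):
    """Return Cook's distance, DFFITS, s^2 and the leave-one-out s_(i)^2."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)                         # leverages
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                 # ordinary residuals
    sse = e @ e
    s2 = sse / (n - p)                               # full-sample error variance
    s2_loo = (sse - e**2 / (1.0 - h)) / (n - p - 1)  # variance without point i
    cooks = e**2 * h / (p * s2 * (1.0 - h) ** 2)
    dffits = e * np.sqrt(h) / (np.sqrt(s2_loo) * (1.0 - h))
    return cooks, dffits, s2, s2_loo

def cooks_from_dffits(dffits, s2, s2_loo, p):
    """Convert DFFITS to Cook's distance: D = DFFITS^2 * s_(i)^2 / (p * s^2)."""
    return dffits**2 * s2_loo / (p * s2)
```

Because the conversion squares DFFITS, the sign of the influence is lost when moving to Cook's distance.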
This led to a variety of quantitative measures, including DFFIT and DFBETA.