Zero-inflated model

In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that allows for frequent zero-valued observations.

Zero-inflated models are commonly used in the analysis of count data, such as the number of visits a patient makes to the emergency room in one year, or the number of fish caught in one day in one lake.[2]

Other examples of count data are the number of hits recorded by a Geiger counter in one minute, patient days in the hospital, goals scored in a soccer game,[3] and the number of episodes of hypoglycemia per year for a patient with diabetes.

Hilbe [3] notes that "Poisson regression is traditionally conceived of as the basic count model upon which a variety of other count models are based."

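In Hilbe's description of a Poisson model, the random variable $y$ is the count response and the parameter $\lambda$ (lambda) is its mean, which is also called the rate or intensity parameter. In the statistical literature, $\lambda$ is also written as $\mu$ (mu) when referring to Poisson and traditional negative binomial models.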

In some data, the number of zeros is greater than would be expected using a Poisson distribution or a negative binomial distribution.

Data with such an excess of zero counts are described as zero-inflated.[4]

Example histograms of zero-inflated Poisson distributions with mean

of 0.2 or 0.5 are shown below, based on the R program ZeroInflPoiDistPlots.R from Bilder and Laughlin.

As the examples above show, zero-inflated data can arise as a mixture of two distributions.[7]

In the statistical literature, different authors may use different names to distinguish zeros from the two distributions.

Some authors describe zeros generated by the first (binary) distribution as "structural" and zeros generated by the second (count) distribution as "random".[7]

Other authors use the terminology "immune" and "susceptible" for the binary and count zeros, respectively.[1]

One well-known zero-inflated model is Diane Lambert's zero-inflated Poisson model, which concerns a random event containing excess zero-count data in unit time.[8]

For example, the number of insurance claims within a population for a certain type of risk would be zero-inflated by those people who have not taken out insurance against the risk and thus are unable to claim.

The zero-inflated Poisson (ZIP) model mixes two zero-generating processes.

The first process generates only zeros.

The second process is governed by a Poisson distribution that generates counts, some of which may be zero.
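The two-process description can be turned into a small simulation. The sketch below is illustrative only: the names `sample_zip` and `sample_poisson`, the parameter names `pi` (probability that the zero-only process is chosen) and `lam` (Poisson mean), and the values 0.3 and 2.0 are assumptions made here. Poisson draws use Knuth's multiplication method, which is adequate for small means.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_zip(pi, lam, rng):
    """Draw one zero-inflated Poisson variate."""
    if rng.random() < pi:
        return 0                      # zero from the binary (zero-only) process
    return sample_poisson(lam, rng)   # count process; this may also yield 0

rng = random.Random(0)                # fixed seed so the run is repeatable
draws = [sample_zip(0.3, 2.0, rng) for _ in range(10_000)]
zero_share = draws.count(0) / len(draws)
print(f"simulated share of zeros:   {zero_share:.3f}")
print(f"theoretical share of zeros: {0.3 + 0.7 * math.exp(-2.0):.3f}")
```

The simulated zeros come from both branches, which is why the observed share exceeds the Poisson-only value $e^{-2} \approx 0.135$.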

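The mixture distribution is described as follows:

$$\Pr(Y = 0) = \pi + (1 - \pi) e^{-\lambda}$$

$$\Pr(Y = y_i) = (1 - \pi) \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}, \qquad y_i = 1, 2, 3, \ldots$$

where the outcome variable $y_i$ takes any non-negative integer value, $\lambda$ is the expected count of the Poisson process, and $\pi$ is the probability of an extra (structural) zero.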
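As a numerical illustration, the sketch below evaluates the ZIP probability mass function in its standard form, assuming a zero-inflation probability `pi` and a Poisson mean `lam` (names and values chosen here). Zero has probability $\pi + (1-\pi)e^{-\lambda}$ and each positive count has $(1-\pi)$ times its Poisson probability. It then checks the well-known moments, mean $(1-\pi)\lambda$ and variance $\lambda(1-\pi)(1+\pi\lambda)$, against a truncated sum.

```python
import math

def zip_pmf(y, pi, lam):
    """P(Y = y) for a zero-inflated Poisson distribution."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    inflation = pi if y == 0 else 0.0   # the extra mass sits only at zero
    return inflation + (1 - pi) * poisson

pi, lam = 0.3, 2.0
probs = [zip_pmf(y, pi, lam) for y in range(60)]   # the tail beyond 59 is negligible for lam = 2
mean = sum(y * p for y, p in enumerate(probs))
var = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2

print("total probability:", sum(probs))
print("mean:", mean, "vs (1 - pi) * lam =", (1 - pi) * lam)
print("variance:", var, "vs lam * (1 - pi) * (1 + pi * lam) =", lam * (1 - pi) * (1 + pi * lam))
```

Because the variance exceeds the mean whenever $\pi > 0$, a ZIP distribution is overdispersed relative to a Poisson distribution with the same mean.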

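The maximum likelihood estimator[10] of the Poisson mean $\lambda$ can be found by solving the following equation:

$$m \left(1 - e^{-\hat{\lambda}}\right) = \hat{\lambda} \left(1 - \frac{n_0}{n}\right)$$

where $m$ is the sample mean and $n_0/n$ is the observed proportion of zeros.

A closed form solution of this equation is given by[11]

$$\hat{\lambda} = W_0\left(-s e^{-s}\right) + s$$

with $W_0$ being the main branch of Lambert's W-function[12] and

$$s = \frac{m}{1 - n_0/n}.$$

Alternatively, the equation can be solved by iteration.

The maximum likelihood estimator of the zero-inflation probability $\pi$ is given by

$$\hat{\pi} = 1 - \frac{m}{\hat{\lambda}}.$$

In 1994, Greene considered the zero-inflated negative binomial (ZINB) model.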
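Returning to the ZIP model, the iterative route to the maximum likelihood estimates can be sketched as follows. Writing $m$ for the sample mean, $n_0/n$ for the observed proportion of zeros and $s = m / (1 - n_0/n)$, the estimating equation $m(1 - e^{-\lambda}) = \lambda (1 - n_0/n)$ is equivalent to $\lambda = s(1 - e^{-\lambda})$. The code applies a plain fixed-point iteration to that form. This is one possible iteration, not necessarily the scheme used in the cited references, and the function name `zip_mle` and the sample data are hypothetical.

```python
import math

def zip_mle(counts, tol=1e-12, max_iter=10_000):
    """Return (lam_hat, pi_hat) for i.i.d. ZIP data via fixed-point iteration."""
    n = len(counts)
    m = sum(counts) / n                        # sample mean
    positive_share = 1 - counts.count(0) / n   # 1 - n0/n
    if positive_share == 0:
        raise ValueError("all counts are zero; lambda cannot be estimated")
    s = m / positive_share                     # mean of the positive counts
    if s <= 1:
        raise ValueError("every positive count equals 1; no positive solution")
    lam = s                                    # start above the root; iterates decrease toward it
    for _ in range(max_iter):
        new_lam = s * (1 - math.exp(-lam))
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam, 1 - m / new_lam

# Hypothetical data: 50 zeros out of 100 observations.
counts = [0] * 50 + [1] * 10 + [2] * 15 + [3] * 15 + [4] * 10
lam_hat, pi_hat = zip_mle(counts)
print(f"lambda_hat = {lam_hat:.4f}, pi_hat = {pi_hat:.4f}")
```

The iteration converges quickly when $\hat{\lambda}$ is not close to zero and slows down as $s$ approaches 1. The resulting $\hat{\pi}$ is unconstrained and can be negative when the data contain fewer zeros than a Poisson distribution would produce. If SciPy is available, `scipy.special.lambertw` evaluates the principal branch $W_0$ needed for the closed form.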

Hall adapted Lambert's methodology to an upper-bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model.

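If the count data $Y$ are such that the probability of zero is larger than the probability of a nonzero value, namely $\Pr(Y = 0) > 0.5$, then $Y$ obeys a discrete pseudo compound Poisson distribution.

Indeed, let $G(z) = \sum_{n=0}^{\infty} \Pr(Y = n) z^n$ be the probability generating function of $Y$. If $p_0 = \Pr(Y = 0) > 0.5$, then $|G(z)| \ge p_0 - \sum_{i=1}^{\infty} p_i = 2 p_0 - 1 > 0$ for $|z| \le 1$, and it follows that $G(z)$ has the probability generating function of the discrete pseudo compound Poisson distribution.

A discrete random variable $Y$ satisfying the probability generating function characterization

$$G(z) = \exp\left(\sum_{k=1}^{\infty} \alpha_k \lambda (z^k - 1)\right), \qquad |z| \le 1$$

has a discrete pseudo compound Poisson distribution with parameters $(\lambda_1, \lambda_2, \ldots) = (\alpha_1 \lambda, \alpha_2 \lambda, \ldots)$, where $\sum_{k=1}^{\infty} \alpha_k = 1$, $\sum_{k=1}^{\infty} |\alpha_k| < \infty$, $\alpha_k \in \mathbb{R}$ and $\lambda > 0$.

When all the $\alpha_k$ are non-negative, it is the discrete compound Poisson distribution (non-Poisson case) with overdispersion property.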
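The parameters of such a representation can be explored numerically. Writing $\log G(z) = c_0 + \sum_{k \ge 1} c_k z^k$ for the logarithm of the probability generating function $G$, the coefficients $c_k$ play the role of the parameters $\alpha_k \lambda$. They follow from the probabilities $p_n$ by the recurrence $n p_n = \sum_{k=1}^{n} k c_k p_{n-k}$, obtained from $G'(z) = G(z) (\log G)'(z)$. The sketch below is an illustration under these assumptions; the function names and the ZIP parameters (0.4 and 1.0) are chosen here and do not come from the cited sources.

```python
import math

def log_pgf_coefficients(pmf, order):
    """Coefficients c_1..c_order of log G(z), computed from pmf[0..order]."""
    p0 = pmf[0]
    c = [math.log(p0)]   # c_0 = log P(Y = 0)
    for n in range(1, order + 1):
        # Solve n * p_n = sum_{k=1..n} k * c_k * p_{n-k} for c_n.
        acc = n * pmf[n] - sum(k * c[k] * pmf[n - k] for k in range(1, n))
        c.append(acc / (n * p0))
    return c[1:]

def zip_pmf(y, pi, lam):
    """P(Y = y) for a zero-inflated Poisson distribution."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return (pi if y == 0 else 0.0) + (1 - pi) * poisson

# ZIP example with P(Y = 0) of about 0.62, which exceeds 0.5.
pmf = [zip_pmf(y, 0.4, 1.0) for y in range(9)]
coeffs = log_pgf_coefficients(pmf, 8)
for k, c_k in enumerate(coeffs, start=1):
    print(k, round(c_k, 5))
```

In this example the fourth coefficient is negative, so the representation is "pseudo" compound Poisson rather than compound Poisson. For an ordinary Poisson distribution the same routine returns the mean as the first coefficient and zeros for the rest.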

Histogram of a zero-inflated Poisson distribution