Count data that have more zeros than expected under the underlying probability distribution of counts can be modeled with a zero-inflated distribution. In GENMOD, the underlying distribution can be either Poisson or negative binomial. See Lambert (1992), Long (1997), and Cameron and Trivedi (1998) for more information about zero-inflated models. The population is considered to consist of two types of individuals. The first type gives Poisson or negative binomial distributed counts, which might contain zeros. The second type always gives a zero count. Let $\lambda$ be the mean of the underlying distribution and $\omega$ be the probability that an individual is of the second type. The parameter $\omega$ is called here the zero-inflation probability; it is the probability of zero counts in excess of the frequency predicted by the underlying distribution. You can request that the zero-inflation probability be displayed in an output data set with the PZERO keyword. The probability distribution of a zero-inflated Poisson random variable $Y$ is given by

$$
\Pr(Y = y) =
\begin{cases}
\omega + (1-\omega)\,e^{-\lambda}, & y = 0 \\
(1-\omega)\,\dfrac{\lambda^{y} e^{-\lambda}}{y!}, & y = 1, 2, \ldots
\end{cases}
$$
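and the probability distribution of a zero-inflated negative binomial random variable $Y$ is given by

$$
\Pr(Y = y) =
\begin{cases}
\omega + (1-\omega)\,(1 + k\lambda)^{-1/k}, & y = 0 \\
(1-\omega)\,\dfrac{\Gamma(y + 1/k)}{\Gamma(y+1)\,\Gamma(1/k)}\,\dfrac{(k\lambda)^{y}}{(1 + k\lambda)^{y + 1/k}}, & y = 1, 2, \ldots
\end{cases}
$$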
where k is the negative binomial dispersion parameter.
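The two probability functions above translate directly into code. The following Python sketch is illustrative only and is not GENMOD's implementation; the function names `zip_pmf` and `zinb_pmf` are hypothetical, and the negative binomial term is evaluated on the log scale with `math.lgamma` for numerical stability.

```python
import math


def zip_pmf(y: int, lam: float, omega: float) -> float:
    """Pr(Y = y) for a zero-inflated Poisson with mean lam > 0 and zero-inflation probability omega."""
    if y < 0 or lam <= 0 or not 0 <= omega < 1:
        raise ValueError("need y >= 0, lam > 0, 0 <= omega < 1")
    # Poisson probability lam**y * exp(-lam) / y!, computed on the log scale
    poisson = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
    # Structural zeros (second type of individual) add omega only at y = 0
    return (omega if y == 0 else 0.0) + (1 - omega) * poisson


def zinb_pmf(y: int, lam: float, k: float, omega: float) -> float:
    """Pr(Y = y) for a zero-inflated negative binomial with mean lam, dispersion k > 0,
    and zero-inflation probability omega."""
    if y < 0 or lam <= 0 or k <= 0 or not 0 <= omega < 1:
        raise ValueError("need y >= 0, lam > 0, k > 0, 0 <= omega < 1")
    inv_k = 1.0 / k
    # log of Gamma(y + 1/k) / (Gamma(y + 1) Gamma(1/k)) * (k lam)^y / (1 + k lam)^(y + 1/k)
    log_nb = (
        math.lgamma(y + inv_k) - math.lgamma(y + 1) - math.lgamma(inv_k)
        + y * math.log(k * lam) - (y + inv_k) * math.log1p(k * lam)
    )
    return (omega if y == 0 else 0.0) + (1 - omega) * math.exp(log_nb)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration
    for y in range(4):
        print(y, round(zip_pmf(y, 2.0, 0.3), 4), round(zinb_pmf(y, 2.0, 0.5, 0.3), 4))
```

As $k \to 0$ the negative binomial term approaches the Poisson term, so `zinb_pmf` with a very small `k` should agree closely with `zip_pmf` for the same $\lambda$ and $\omega$.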
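You can model the parameters $\omega_i$ and $\lambda_i$ in GENMOD with the regression models

$$
\begin{aligned}
h(\omega_i) &= \mathbf{z}_i'\boldsymbol{\gamma} \\
g(\lambda_i) &= \mathbf{x}_i'\boldsymbol{\beta}
\end{aligned}
$$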
where $h$ is one of the binary link functions: logit, probit, or complementary log-log. The link function $h$ is the logit link by default, or the link function specified in the ZEROMODEL statement. The link function $g$ is the log link by default, or the link function specified in the MODEL statement, for both the Poisson and the negative binomial. The covariates $\mathbf{z}_i$ for observation $i$ are determined by the model specified in the ZEROMODEL statement, and the covariates $\mathbf{x}_i$ are determined by the model specified in the MODEL statement. The regression parameters $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ are estimated by maximum likelihood.
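Given values for $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$, the two regression equations yield $\omega_i$ and $\lambda_i$ by inverting the links. The Python sketch below assumes the default log link for $g$ and lets $h$ be any of the three binary links; the names `INV_LINKS` and `zero_inflated_params` are hypothetical, each covariate vector is assumed to carry a leading 1.0 for the intercept, and the code only evaluates the models. It does not perform GENMOD's maximum likelihood estimation.

```python
import math
from statistics import NormalDist

# Inverse link functions for h: each maps a linear predictor to a probability in (0, 1)
INV_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": lambda eta: NormalDist().cdf(eta),
    "cloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),
}


def linear_predictor(coefs, covariates):
    """Inner product of a coefficient vector with a covariate vector."""
    if len(coefs) != len(covariates):
        raise ValueError("coefficient and covariate vectors differ in length")
    return sum(c * v for c, v in zip(coefs, covariates))


def zero_inflated_params(z_i, gamma, x_i, beta, h="logit"):
    """Return (omega_i, lambda_i) for one observation.

    omega_i = h^{-1}(z_i' gamma) for the chosen binary link h;
    lambda_i = exp(x_i' beta), assuming the default log link g.
    """
    if h not in INV_LINKS:
        raise ValueError(f"unsupported link {h!r}")
    omega_i = INV_LINKS[h](linear_predictor(gamma, z_i))
    lambda_i = math.exp(linear_predictor(beta, x_i))
    return omega_i, lambda_i


if __name__ == "__main__":
    # Hypothetical covariates (intercept, one regressor) and coefficients
    print(zero_inflated_params([1.0, 2.0], [-1.0, 0.5], [1.0, 2.0], [0.2, 0.4]))
```

With a zero linear predictor, the logit and probit links both give $\omega_i = 0.5$, while the complementary log-log link gives $1 - e^{-1}$; this asymmetry is the main practical difference between the three choices of $h$.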
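The mean and variance of $Y$ for the zero-inflated Poisson are given by

$$
\begin{aligned}
E(Y) &= \mu = (1-\omega)\lambda \\
\operatorname{Var}(Y) &= \mu + \frac{\omega}{1-\omega}\,\mu^{2}
\end{aligned}
$$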
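and for the zero-inflated negative binomial by

$$
\begin{aligned}
E(Y) &= \mu = (1-\omega)\lambda \\
\operatorname{Var}(Y) &= \mu + \frac{\omega + k}{1-\omega}\,\mu^{2}
\end{aligned}
$$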
You can request that the mean of Y be displayed for each observation in an output data set with the PRED keyword.
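The closed-form moments follow from treating $Y$ as a two-component mixture: with probability $\omega$ it is a structural zero, and otherwise it follows the underlying distribution with mean $\lambda$ and variance $\lambda$ (Poisson) or $\lambda + k\lambda^2$ (negative binomial, under the parameterization used above). The Python sketch below, with hypothetical function names, computes the moments both from the mixture definition and from the closed forms so that the two can be compared.

```python
def mixture_moments(lam: float, base_var: float, omega: float):
    """Mean and variance of Y when Y = 0 with probability omega and otherwise
    follows a base count distribution with mean lam and variance base_var."""
    mean = (1 - omega) * lam
    # E(Y^2): structural zeros contribute nothing; the base part contributes Var + mean^2
    second_moment = (1 - omega) * (base_var + lam ** 2)
    return mean, second_moment - mean ** 2


def zip_moments(lam: float, omega: float):
    """Closed-form mean and variance of the zero-inflated Poisson."""
    mu = (1 - omega) * lam
    return mu, mu + omega / (1 - omega) * mu ** 2


def zinb_moments(lam: float, k: float, omega: float):
    """Closed-form mean and variance of the zero-inflated negative binomial."""
    mu = (1 - omega) * lam
    return mu, mu + (omega + k) / (1 - omega) * mu ** 2


if __name__ == "__main__":
    lam, k, omega = 2.0, 0.5, 0.3  # hypothetical values for illustration
    print(mixture_moments(lam, lam, omega), zip_moments(lam, omega))
    print(mixture_moments(lam, lam + k * lam ** 2, omega), zinb_moments(lam, k, omega))
```

For $\lambda = 2$, $\omega = 0.3$, and $k = 0.5$, both routes give a mean of about 1.4, with variances of about 2.24 (zero-inflated Poisson) and 3.64 (zero-inflated negative binomial). The mean $\mu$ is the quantity the PRED keyword refers to, and in both models the variance exceeds $\mu$ whenever $\omega > 0$.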