Spike-and-slab regression

[6] Conditional on a predictor being included in the regression, we specify a prior distribution for the model coefficient corresponding to that variable (β).

A common choice at this step is a normal prior with zero mean and a large variance calculated based on

[8] All steps of the described algorithm are repeated thousands of times using the Markov chain Monte Carlo (MCMC) technique.

As a result, we obtain a posterior distribution of γ (variable inclusion in the model), β (regression coefficient values) and the corresponding prediction of y.
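The repeated sampling described above can be sketched as a small Gibbs sampler. This is a minimal illustration under stated assumptions, not the algorithm of any particular paper or package: it uses an independent N(0, tau2) slab for each coefficient (a simplification swapped in for the slab variance mentioned earlier), an inverse-gamma prior on the noise variance, and the hypothetical names spike_slab_gibbs, tau2, a and b, none of which come from the source.

```python
import numpy as np

def spike_slab_gibbs(X, y, pi=0.5, tau2=10.0, a=1.0, b=1.0,
                     n_iter=2000, burn_in=500, seed=0):
    """Toy Gibbs sampler for spike-and-slab regression (illustrative only).

    Assumed model (not from the source text):
      gamma_j ~ Bernoulli(pi)
      beta_j | gamma_j = 1 ~ N(0, tau2),  beta_j = 0 if gamma_j = 0
      sigma2 ~ InverseGamma(a, b)
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(p, dtype=int)
    sigma2 = float(np.var(y))
    col_ss = np.sum(X ** 2, axis=0)      # x_j' x_j for every column
    resid = y.copy()                     # y - X @ beta, with beta = 0
    prior_log_odds = np.log(pi) - np.log1p(-pi)
    gamma_draws, beta_draws = [], []

    for it in range(n_iter):
        for j in range(p):
            # Residual with predictor j's current contribution added back.
            resid += X[:, j] * beta[j]
            z = X[:, j] @ resid
            # Conditional posterior of beta_j given inclusion: N(m, v).
            v = 1.0 / (col_ss[j] / sigma2 + 1.0 / tau2)
            m = v * z / sigma2
            # Log Bayes factor for inclusion, with beta_j integrated out.
            log_bf = 0.5 * np.log(v / tau2) + 0.5 * m ** 2 / v
            log_odds = np.clip(prior_log_odds + log_bf, -700.0, 700.0)
            prob = 1.0 / (1.0 + np.exp(-log_odds))
            gamma[j] = int(rng.random() < prob)
            beta[j] = rng.normal(m, np.sqrt(v)) if gamma[j] else 0.0
            resid -= X[:, j] * beta[j]
        # Noise variance from its inverse-gamma full conditional.
        rss = resid @ resid
        sigma2 = 1.0 / rng.gamma(a + n / 2.0, 1.0 / (b + rss / 2.0))
        if it >= burn_in:
            gamma_draws.append(gamma.copy())
            beta_draws.append(beta.copy())

    # Posterior inclusion probabilities and posterior mean coefficients.
    return np.mean(gamma_draws, axis=0), np.mean(beta_draws, axis=0)
```

Averaging the saved draws of γ gives an estimate of each predictor's posterior inclusion probability, and averaging the draws of β gives posterior mean coefficients; predictions of y would follow by multiplying each β draw by the design matrix.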

In the absence of such knowledge, some reasonable default values can be used; to quote Scott and Varian (2013): "For the analyst who prefers simplicity at the cost of some reasonable assumptions, useful prior information can be reduced to an expected model size, an expected R², and a sample size ν determining the weight given to the guess at R²."[6]

Some researchers suggest the following default values: R² = 0.5, ν = 0.01, and π = 0.5 (parameter of a prior Bernoulli distribution).
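As a rough illustration of how these three defaults could be turned into concrete prior settings, the sketch below maps π to an expected model size and maps the expected R² and ν to an inverse-gamma prior on the residual variance. The mapping used here (a variance guess of (1 − R²) times the sample variance of y, weighted as ν prior observations) is an assumption about the parametrisation, and the function name default_prior_settings and the dictionary keys are hypothetical.

```python
import numpy as np

def default_prior_settings(y, n_predictors, expected_r2=0.5, nu=0.01, pi=0.5):
    """Translate the quoted defaults into prior hyperparameters (a sketch).

    Assumption: the residual-variance guess is (1 - R^2) * var(y), and nu acts
    as a prior sample size, giving sigma^2 ~ InverseGamma(nu/2, nu*guess/2).
    """
    s2_y = float(np.var(y, ddof=1))            # sample variance of the response
    sigma2_guess = (1.0 - expected_r2) * s2_y  # variance left unexplained
    return {
        # With a common inclusion probability pi, the expected model size is pi * P.
        "expected_model_size": pi * n_predictors,
        "sigma2_guess": sigma2_guess,
        "invgamma_shape": nu / 2.0,
        "invgamma_scale": nu * sigma2_guess / 2.0,
    }

settings = default_prior_settings(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), n_predictors=10)
```

With π = 0.5 every predictor is a priori as likely to be in the model as out of it, and a small ν such as 0.01 means the guess at R² carries very little weight relative to the data.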