[1][2] Although commonly thought of in terms of causal relationships, the concept of an interaction can also describe non-causal associations (then also called moderation or effect modification).
Interactions are often considered in the context of regression analyses or factorial experiments.
The presence of interactions can have important implications for the interpretation of statistical models.
In exploratory statistical analyses it is common to use products of original variables as the basis of testing whether interaction is present with the possibility of substituting other more realistic interaction variables at a later stage.
When there are more than two explanatory variables, several interaction variables are constructed, with pairwise-products representing pairwise-interactions and higher order products representing higher order interactions.
[3] A simple setting in which interactions can arise is a two-factor experiment analyzed using Analysis of Variance (ANOVA).
The following table shows one possible situation: In this example, there is no interaction between the two treatments — their effects are additive.
In contrast, if the following average responses are observed then there is an interaction between the treatments — their effects are not additive.
The table of means on the left, below, shows a quantitative interaction — treatment A is beneficial both when B is given, and when B is not given, but the benefit is greater when B is not given (i.e. when A is given alone).
In its simplest form, the assumption of treatment unit additivity states that the observed response yij from experimental unit i when receiving treatment j can be written as the sum yij = yi + tj.
Therefore, by contraposition, a necessary condition for unit treatment additivity is that the variance is constant.
[citation needed] The property of unit treatment additivity is not invariant under a change of scale,[citation needed] so statisticians often use transformations to achieve unit treatment additivity.
[8] In many cases, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.
Kempthorne's use of unit treatment additivity and randomization is similar to the design-based analysis of finite population survey sampling.
In recent years, it has become common[citation needed] to use the terminology of Donald Rubin, which uses counterfactuals.
However, unit treatment additivity implies that the cumulative distribution functions F1 and F2 for the two groups satisfy F2(y) = F1(y − τ), as long as the assignment of individuals to groups 1 and 2 is independent of all other factors influencing y (i.e. there are no confounders).
If one wishes to predict a person's height based only on the person's religion and occupation, a simple additive model, i.e., a model without interaction, would add to an overall average height an adjustment for a particular religion and another for a particular occupation.
Statistically, the presence of an interaction between categorical variables is generally tested using a form of analysis of variance (ANOVA).
If one or more of the variables is continuous in nature, however, it would typically be tested using moderated multiple regression.
Genichi Taguchi contended[11] that interactions could be eliminated from a system by appropriate choice of response variable and transformation.
could be formed explicitly by multiplying two (or more) variables, or implicitly using factorial notation in modern statistical packages such as Stata.
Regression approaches to interaction modeling are very general because they can accommodate additional predictors, and many alternative specifications or estimation strategies beyond ordinary least squares.
Robust, quantile, and mixed-effects (multilevel) models are among the possibilities, as is generalized linear modeling encompassing a wide range of categorical, ordered, counted or otherwise limited dependent variables.
The graph depicts an education*politics interaction, from a probability-weighted logit regression analysis of survey data.
As a second example, consider a clinical trial on the interaction between stroke severity and the efficacy of a drug on patient survival.
The line for the severe stroke group is flat, indicating that, among these patients, there is no difference in survival between the drug and placebo treatments.
Is the yield of good cookies affected by the baking temperature and time in the oven?
From the graph and the data, it is clear that the lines are not parallel, indicating that there is an interaction.
That is, the second ANOVA model explicitly performs a hypothesis test for interaction.
In the ANOVA model that ignores interaction, neither temperature nor time has a significant effect on yield (p=0.91), which is clearly the incorrect conclusion.