Logical interaction
In statistics, an “interaction” describes a situation in which the effect of one variable A on a variable Y depends on the state of a second variable B.
For instance, in a model of Y that contains the product A.B of the two variables,
the term A.B is an “interaction”.
A logical interaction is a generalization of this notion, for instance “A and B”, “A or B”, or “A exclusive-or B”.
History
The mathematical notion of “logical interaction”, conceived as a generalization of the “interaction” that arises in the design of experiments, was introduced at the end of the 1990s.[1] First used in data analysis (iconography of correlations), it has since found a field of application in linear regression.[2][3]
Concept of interaction
The notion of interaction should not be confused with that of correlation. We speak of an "interaction effect" when a variable to be explained Y is conditioned by the "coupling" of two explanatory variables A and B.
In the following example, Y is almost uncorrelated with A and with B, but Y is strongly negatively correlated with the product A.B. Indeed, Y has high values when A.B has low values:
Trial | A | B | A.B | Y |
---|---|---|---|---|
Trial 1 | −1 | −1 | 1 | 10 |
Trial 2 | −1 | 1 | −1 | 21 |
Trial 3 | 1 | −1 | −1 | 19 |
Trial 4 | 1 | 1 | 1 | 9 |
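The correlations mentioned above can be checked directly from the four trials. The following is a minimal sketch in plain Python; the helper name `pearson` is an illustrative choice and does not come from the source.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# The four trials of the table above.
A = [-1, -1, 1, 1]
B = [-1, 1, -1, 1]
Y = [10, 21, 19, 9]
AB = [a * b for a, b in zip(A, B)]  # product column A.B

r_A = pearson(A, Y)
r_B = pearson(B, Y)
r_AB = pearson(AB, Y)

print(f"r(A, Y)   = {r_A:+.3f}")   # about -0.141
print(f"r(B, Y)   = {r_B:+.3f}")   # about +0.047
print(f"r(A.B, Y) = {r_AB:+.3f}")  # about -0.989
```

The correlations of Y with A and with B are small but not exactly zero, while the correlation with A.B is close to −1.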
A special case of a data table
The above table is sometimes referred to as a "2-level full factorial design of experiments". Indeed, each explanatory variable has only 2 levels (weak and strong), and all combinations are considered, namely:
- A weak and B weak,
- A weak and B strong,
- A strong and B weak,
- A strong and B strong.
The explained variable Y is also called the "response" of the experiment.
This is a special case of the "full k-level factorial design of experiments".
In a “full factorial design”, the variables A, B and A.B are orthogonal, i.e. their pairwise correlations are zero.
The full factorial design is itself a special case of the design of experiments, in which the explanatory variables A and B are controlled in a reasoned manner to obtain the maximum amount of information about their influence on Y in the minimum number of trials.
Finally, a design of experiments is itself a special case of a data table; in a general data table, the explanatory variables are not necessarily controlled.
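The orthogonality of A, B and A.B in a 2-level full factorial design can be verified numerically. The sketch below builds the design with `itertools.product`; coding the levels as −1 and +1 follows the table above, and the helper `dot` is introduced only for illustration.

```python
from itertools import product

# All combinations of two factors at two levels (weak = -1, strong = +1).
levels = (-1, 1)
runs = list(product(levels, repeat=2))

A = [a for a, _ in runs]
B = [b for _, b in runs]
AB = [a * b for a, b in runs]

def dot(x, y):
    """Dot product of two equal-length sequences."""
    return sum(u * v for u, v in zip(x, y))

# Each column sums to zero, so a zero dot product between two columns
# is equivalent to a zero correlation between them.
column_sums = [sum(A), sum(B), sum(AB)]
dots = {"A.B col vs A": dot(AB, A), "A.B col vs B": dot(AB, B), "A vs B": dot(A, B)}

print(runs)
print(column_sums)
print(dots)
```

All three column sums and all three dot products are zero, which is what "orthogonal" means here.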
Generalization to arbitrary arrays
The notion of logical interaction, which is introduced below, applies to data tables in general, with quantitative and/or qualitative variables (provided that the latter use Boolean coding). When the variables A and B do not have the same unit, how can the product A.B be calculated so that it keeps a physical meaning?
We have to reduce them to "a common unit of evaluation". The custom is to standardize the variables A and B before calculating the cross product A.B. (Standardized variables have a zero mean and a standard deviation equal to one.) In these new units, our table becomes:
Trial | A | B | A.B | Y |
---|---|---|---|---|
Trial 1 | −0.866 | −0.866 | 0.866 | 10 |
Trial 2 | −0.866 | 0.866 | −0.866 | 21 |
Trial 3 | 0.866 | −0.866 | −0.866 | 19 |
Trial 4 | 0.866 | 0.866 | 0.866 | 9 |
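The values ±0.866 in this table can be reproduced as follows. This is a sketch under two assumptions inferred from the numbers rather than stated in the text: the standard deviation is the sample one (divisor n − 1), and the product of the standardized variables is itself standardized. The helper name `standardize` is illustrative.

```python
import statistics

def standardize(values):
    """Center on the mean and divide by the sample standard deviation (n - 1)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

A = [-1, -1, 1, 1]
B = [-1, 1, -1, 1]

zA = standardize(A)
zB = standardize(B)
# Product of the standardized variables, standardized again (assumption).
zAB = standardize([a * b for a, b in zip(zA, zB)])

rounded_A = [round(v, 3) for v in zA]
rounded_B = [round(v, 3) for v in zB]
rounded_AB = [round(v, 3) for v in zAB]

print(rounded_A)   # [-0.866, -0.866, 0.866, 0.866]
print(rounded_B)   # [-0.866, 0.866, -0.866, 0.866]
print(rounded_AB)  # [0.866, -0.866, -0.866, 0.866]
```

Without the second standardization, the product column would contain ±0.75 instead of ±0.866.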
Physical interpretation of the A.B product
The physical interpretation of the product of two variables of the same unit, such as length and width, is easy (it is an area).
But what does the effect on Y of the product A.B mean when the two variables originally had different units and have been standardized?
- Figure 1: A on the x-axis, B on the y-axis, with the corresponding values of Y. The response variable Y is weak if A and B are both weak, or if A and B are both strong.
- Figure 2: variation of Y as a function of A, for weak B (in red) and for strong B (in blue). Y therefore varies differently with A depending on whether B is weak or strong.
- Figure 3: variation profiles, following the sequence of trials: Y mainly looks like "A * B". In other words, Y is positively correlated with "A * B" and negatively correlated with A.B.
These figures show that Y is strong if “A is weak and B is strong”, or if “A is strong and B is weak”.
In other words, the operation "A * B" = −A.B corresponds to the "exclusive or" of logic.
Figure 1 represented the “exclusive or” in the case where the variables A and B are discrete, with two levels.
If the variables A and B are continuous, we obtain figure 4, characterized by “mountains” (in red) when A is strong and B is weak, or when A is weak and B is strong. Otherwise, there are “valleys” (in blue).
- Figure 4: response surface of the variable A * B
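The contrast described for Figure 2 (Y varies differently with A depending on the level of B) can be read directly from the four trials. The sketch below computes the change in Y when A goes from weak to strong, separately for each level of B; the function name `effect_of_A` is illustrative.

```python
# (A, B, Y) for the four trials of the two-level table.
trials = [(-1, -1, 10), (-1, 1, 21), (1, -1, 19), (1, 1, 9)]

def effect_of_A(b_level):
    """Change in Y when A goes from -1 to +1, with B held at b_level."""
    y_by_a = {a: y for a, b, y in trials if b == b_level}
    return y_by_a[1] - y_by_a[-1]

effect_weak_B = effect_of_A(-1)   # 19 - 10
effect_strong_B = effect_of_A(1)  # 9 - 21

print(effect_weak_B)    # 9
print(effect_strong_B)  # -12
```

The effect of A changes sign between weak B and strong B, which is the signature of the "exclusive or" pattern.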
Concept of "logical interaction"
Since the artificial variable “A * B” = −A.B corresponds to the “exclusive or” of logic, it is natural to also be interested in a “logical interaction” that is much more frequent in physics, namely the logical “and”: “A&B”.
In the case of 2-level variables, the “A&B” column will have the following values (strong value only if A and B are strong):
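Trial | A | B | A.B | A*B | A&B | Y |
---|---|---|---|---|---|---|
Trial 1 | −1 | −1 | 1 | −1 | −1 | 10 |
Trial 2 | −1 | 1 | −1 | 1 | −1 | 21 |
Trial 3 | 1 | −1 | −1 | 1 | −1 | 19 |
Trial 4 | 1 | 1 | 1 | −1 | 1 | 9 |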
And, in the general case of continuous variables, we have the following figure:
The following figures show other "logical interactions", whose descriptions are given below and whose mathematical formulas can be found in the references. Note that "A + B", which is not, strictly speaking, an interaction, has been included to show the difference with "A&B".
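For the two-level case, the A*B and A&B columns of the table above can be derived from A and B. In this sketch, A*B is computed as −A.B as stated in the text, while `min` is used as a stand-in for the logical "and"; that choice is an assumption that only reproduces the two-level column, and the formulas for continuous variables given in the references are not reproduced here.

```python
A = [-1, -1, 1, 1]
B = [-1, 1, -1, 1]

def is_strong(v):
    """Read a two-level value as a Boolean: +1 is strong, -1 is weak."""
    return v > 0

AB = [a * b for a, b in zip(A, B)]            # ordinary product A.B
A_xor_B = [-p for p in AB]                    # "A * B" = -A.B
A_and_B = [min(a, b) for a, b in zip(A, B)]   # stand-in for "A&B" (assumption)

# Cross-check both columns against Boolean logic on the levels.
for a, b, x, n in zip(A, B, A_xor_B, A_and_B):
    assert (x == 1) == (is_strong(a) != is_strong(b))
    assert (n == 1) == (is_strong(a) and is_strong(b))

print(A_xor_B)  # [-1, 1, 1, -1]
print(A_and_B)  # [-1, -1, -1, 1]
```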
Meaning of logical interaction symbols
f(A, B) | Meaning | The Y response is strong when ... |
---|---|---|
A * B | A or-exclusive B | ... A is strong and B weak, or A is weak and B strong |
A ^ B | A or B | ... A is strong or B is strong |
A ^ −B | A or not B | ... A is strong or B is weak |
A&B | A and B | ... A and B are strong |
A & −B | A and not B | ... A is strong and B is weak |
A]B | A if B | ... A is strong if B is strong |
A]−B | A if not B | ... A is strong if B is weak |
A}B | A if mean B | ... A is strong if B is medium |
A{B | A medium if B | ... A is medium if B is strong |
A{−B | A medium if not B | ... A is medium if B is weak |
A'B | neither A nor B (broad sense) | ... neither A nor B is extreme (they are average) |
A!B | neither A nor B (strict sense) | ... neither A nor B is extreme (they are strictly average) |
A # B | A like B | ... A varies like B |
A + B | "A plus B" | ... the sum of A and B (standardized) is strong |
A−B | "A minus B" | ... the difference of A and B (standardized) is strong |
"A&B" or "A]B" response surfaces, much simpler than "A * B", are also more frequent in practice. They often allow better fitting models.
Example of the application of logical interactions in a prediction model
Consider the following data:
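Trial | A | B | C | D | E | Y |
---|---|---|---|---|---|---|
e1 | 7 | 7 | 1 | 4 | 2 | 1.304 |
e2 | 8 | 5 | 6 | 5 | 5 | 17.052 |
e3 | 3 | 4 | 3 | 8 | 8 | 2.123 |
e4 | 5 | 2 | 8 | 3 | 6 | 12.618 |
e5 | 4 | 6 | 2 | 2 | 7 | 2.723 |
e6 | 2 | 3 | 5 | 1 | 1 | 1.733 |
e7 | 1 | 8 | 7 | 6 | 4 | 1.119 |
e8 | 6 | 1 | 4 | 7 | 3 | 6.955 |
e9 | 5 | 5 | 5 | 5 | 5 | 7.774 |
e10 | 1 | 8 | 1 | 1 | 8 | 2.381 |
e11 | 8 | 1 | 8 | 1 | 1 | 20.424 |
e12 | 1 | 8 | 1 | 8 | 1 | 0.959 |
e13 | 1 | 1 | 8 | 1 | 8 | −1.616 |
e14 | 8 | 1 | 1 | 8 | 1 | 0.485 |
e15 | 8 | 8 | 8 | 8 | 8 | 23.039 |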
We will compare a classical regression model of Y with a model that can include logical interactions.
The goodness of fit of the models will be evaluated by:
→ Q2: the R2 obtained when a model fitted on a training set is applied to a test set.
→ F-test: the ratio of the fraction explained by the model to the residual fraction.
We will use forward selection of the terms of the model, which we will write in decreasing order of importance, each term explaining the residual left unexplained by the previous terms. We stop adding terms when the standard error of prediction (SEP) no longer decreases.
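The criteria listed above can be sketched as small functions. The definitions below are the usual textbook ones and are assumptions, not the formulas of the cited software: the F statistic is normalized by degrees of freedom, SEP is taken as the root mean squared prediction error, and Q2 is simply `r_squared` applied to predictions on a test set. The data `y_obs` and `y_pred` are made up for illustration.

```python
import math

def r_squared(y, y_hat):
    """1 - SS_res / SS_tot; applied to a test set this gives Q2."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, n_terms):
    """R2 penalized for the number of model terms (intercept excluded)."""
    n = len(y)
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - n_terms - 1)

def f_statistic(y, y_hat, n_terms):
    """Explained fraction over residual fraction, each per degree of freedom."""
    n = len(y)
    r2 = r_squared(y, y_hat)
    return (r2 / n_terms) / ((1 - r2) / (n - n_terms - 1))

def sep(y, y_hat):
    """Root mean squared prediction error (one common definition of SEP)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Hypothetical observations and predictions of a one-term model.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y_obs, y_pred)
r2a = adjusted_r_squared(y_obs, y_pred, n_terms=1)
f_value = f_statistic(y_obs, y_pred, n_terms=1)
sep_value = sep(y_obs, y_pred)
print(round(r2, 4), round(r2a, 4), round(f_value, 1), round(sep_value, 4))
```

In a forward selection loop, these quantities would be recomputed after each candidate term is added, and the loop would stop once the SEP no longer decreases.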
Model 1, without logical interactions
Y = −6.904 + 1.589 A + 14.44 A.C + 1.391 C + 2.613 C.D
R2a = 0.995 Q2 = 0.992 F = 715.3 SEP = 0.8412
Model 2, with logical interactions
Y = 6.605 + 29.91 A&C + 3.923 B]−D
R2a = 0.999 Q2 = 0.998 F = 5887 SEP = 0.3357
Model 2 includes two terms instead of four. Parsimonious models are simple models with great explanatory and predictive power; they explain the data with a minimum number of predictor variables.
Model 2 is easier to interpret (“A&C”: Y increases if A and C are strong simultaneously; "B]−D": the residual of Y not explained by the first term increases with B if D is weak).
R2a, Q2 and F have increased. The SEP has decreased.
Note: in a regression equation, the values of the predictors' coefficients depend on the units in which the interactions are expressed. For example, if A is in m/s and B in degrees, in which unit should A.B be expressed? In the product, A and B are standardized, and the product itself is standardized. Instead of standardizing, another possible unit is the “variable-instant correlation”. However, regardless of the interaction unit, R2a, Q2, F and SEP remain the same.
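The remark that coefficients depend on the unit while the fit statistics do not can be illustrated with a one-predictor regression. This is a sketch under the assumption that the change of unit is a linear rescaling of the predictor (here the factor 0.866 from the standardized table); the “variable-instant correlation” unit is not reproduced. The helper names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Coefficient of determination of the fitted line."""
    intercept, slope = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((v - (intercept + slope * u)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    return 1 - ss_res / ss_tot

Y = [10, 21, 19, 9]
AB_raw = [1, -1, -1, 1]                # product column in the -1/+1 coding
AB_std = [0.866 * v for v in AB_raw]   # same column in standardized units

_, slope_raw = fit_line(AB_raw, Y)
_, slope_std = fit_line(AB_std, Y)
r2_raw = r_squared(AB_raw, Y)
r2_std = r_squared(AB_std, Y)

print(slope_raw, slope_std)  # the coefficient changes with the unit
print(r2_raw, r2_std)        # the quality of fit does not
```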
See also
Non-postulated multiple regression models.
References
1. M. Lesty. "Une nouvelle approche dans le choix des régresseurs de la régression multiple en présence d'interactions et de colinéarités." La Revue de Modulad, n°22, pp. 41–77, January 1999 (PDF, in French).
2. J.C. Laguerre, I. Douiri-Bédoui, C. Chireux, D. Marier, P. Jacolot, C. Jouquand, F.J. Tessier, K. Woodward, P. Gadonna-Widehem. "The iconographic correlation (CORICO) method, a new approach for the optimization of microwave cooking processes: application for cooking fish." EFFOST Annual Meeting, Bologna, Italy, November 2013.
3. Celine Jouquand, Frederic J. Tessier, Julien Bernard, David Marier, Ken Woodward, Philippe Jacolot, Pascale Gadonna-Widehem, Jean-Claude Laguerre. "Optimization of microwave cooking of beef burgundy in terms of nutritional and organoleptic properties." LWT - Food Science and Technology. 60 (1): 271–276. January 2015. doi:10.1016/j.lwt.2014.07.038.
This article "Logical interaction" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Logical interaction. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.