Logical interaction

In statistics, an “interaction” describes a situation in which the effect of one variable A on a variable Y depends on the state of a second variable B.

For instance, in the model

Y = c + a . A + b . B + d . (A . B) + error

the term A.B is an “interaction”.
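To make the role of the coefficient d concrete, the following sketch evaluates the deterministic part of the model for hypothetical coefficient values (c, a, b and d are chosen arbitrarily for illustration and do not come from any data in this article; the error term is omitted). It suggests that the change in Y for a unit increase in A equals a + d.B, and therefore depends on B.

```python
def predict(A, B, c=0.0, a=1.0, b=2.0, d=3.0):
    """Deterministic part of Y = c + a.A + b.B + d.(A.B).

    The default coefficients are hypothetical, chosen only for illustration.
    """
    return c + a * A + b * B + d * (A * B)


def effect_of_A(B):
    """Change in Y when A goes from 0 to 1, with B held fixed."""
    return predict(1, B) - predict(0, B)


effect_low_B = effect_of_A(-1)   # a + d * (-1) = -2.0
effect_high_B = effect_of_A(1)   # a + d * (+1) = 4.0
print(effect_low_B, effect_high_B)
```

With d = 0 the two effects would be identical, which is the case without interaction.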

A logical interaction is a generalization of this notion of interaction, for instance “A and B”, “A or B”, “A exclusive-or B”.

History

The mathematical notion of “logical interaction”, conceived as a generalization of the “interaction” used in the design of experiments, was introduced at the end of the 1990s.^[1] First used in data analysis (iconography of correlations), it has since found a field of application in linear regression.^[2]^[3]

Concept of interaction

The notion of interaction should not be confused with that of correlation. We speak of an "interaction effect" when the variable to be explained, Y, depends on the "coupling" of two explanatory variables A and B.

In the following example, Y is almost uncorrelated with A and with B, but Y is strongly negatively correlated with the product A.B. Indeed, Y has high values when A.B has low values:

	A	B	A.B	Y
Trial 1	−1	−1	1	10
Trial 2	−1	1	−1	21
Trial 3	1	−1	−1	19
Trial 4	1	1	1	9
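The claim about the correlations can be checked numerically. The sketch below is a plain-Python illustration (the helper name pearson is ours) that computes the Pearson correlation of Y with A, with B and with the product A.B for the four trials of the table.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


A = [-1, -1, 1, 1]
B = [-1, 1, -1, 1]
Y = [10, 21, 19, 9]
AB = [a * b for a, b in zip(A, B)]  # the interaction column A.B

r_A, r_B, r_AB = pearson(A, Y), pearson(B, Y), pearson(AB, Y)
print(round(r_A, 3), round(r_B, 3), round(r_AB, 3))  # -0.141 0.047 -0.989
```

The correlations with A and B alone are close to zero (about −0.14 and 0.05), while the correlation with A.B is about −0.99.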

A special case of a data table

The above table is sometimes referred to as a "2-level full factorial design of experiments". Indeed, each explanatory variable has only 2 levels (weak and strong), and all cases are considered, namely:

* A weak and B weak,

* A weak and B strong,

* A strong and B weak,

* A strong and B strong.

The explained variable Y is also called the "response" of the experiment.

This is a special case of the "full k-level factorial design of experiments".

In a “full factorial design”, the variables A, B and A.B are orthogonal, i.e. their pairwise correlations are zero.
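As an illustration of this orthogonality, the sketch below builds the 2-level full factorial design for two factors (coded −1 for weak and +1 for strong, a coding assumed here to match the earlier table) and checks that the columns A, B and A.B have zero sums and zero pairwise dot products.

```python
from itertools import product

# All combinations of two factors at levels -1 (weak) and +1 (strong).
runs = list(product([-1, 1], repeat=2))
A = [a for a, _ in runs]
B = [b for _, b in runs]
AB = [a * b for a, b in runs]


def dot(x, y):
    """Dot product of two equally long sequences."""
    return sum(u * v for u, v in zip(x, y))


# Every column sums to zero, so a zero dot product means zero correlation.
column_sums = [sum(A), sum(B), sum(AB)]
dot_products = [dot(A, B), dot(A, AB), dot(B, AB)]
print(column_sums, dot_products)  # [0, 0, 0] [0, 0, 0]
```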

The full factorial design is itself a special case of the design of experiments, in which the explanatory variables A and B are controlled in a reasoned manner to obtain the maximum amount of information about their influence on Y in the minimum number of trials.

Finally, the design of experiments is a special case of data tables, in which the explanatory variables are not necessarily controlled.

Generalization to arbitrary arrays

The notion of logical interaction, introduced below, applies to data tables in general, with quantitative and/or qualitative variables (provided that the latter use Boolean coding). When the variables A and B do not have the same unit, how can the product A.B be calculated so that it keeps a physical meaning?

The variables must be brought to "a common unit of evaluation". The custom is to standardize the variables A and B before calculating the cross product A.B, which is then standardized in turn (standardized variables have a zero mean and a standard deviation equal to one). In these new units, our table becomes:

	A	B	A.B	Y
Trial 1	−0.866	−0.866	0.866	10
Trial 2	−0.866	0.866	−0.866	21
Trial 3	0.866	−0.866	−0.866	19
Trial 4	0.866	0.866	0.866	9
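The values ±0.866 in this table are consistent with dividing by the sample standard deviation (denominator n − 1). The article does not state which denominator is used, so this is an assumption of the sketch below. The sketch standardizes A and B, forms their product, and standardizes the product in turn.

```python
from statistics import mean, stdev


def standardize(values):
    """Center on the mean and divide by the sample standard deviation (n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


A = [-1, -1, 1, 1]
B = [-1, 1, -1, 1]

zA = standardize(A)
zB = standardize(B)
# The raw product of the standardized columns is +/-0.75; it is standardized again.
zAB = standardize([a * b for a, b in zip(zA, zB)])

print([round(v, 3) for v in zA])   # [-0.866, -0.866, 0.866, 0.866]
print([round(v, 3) for v in zB])   # [-0.866, 0.866, -0.866, 0.866]
print([round(v, 3) for v in zAB])  # [0.866, -0.866, -0.866, 0.866]
```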

Physical interpretation of the A.B product

The physical interpretation of the product of two variables of the same unit, such as length and width, is easy (it is an area).

But what does the effect on Y of the product A.B mean when the two variables originally had different units and have been standardized?


Figure 1: A on the x-axis, B on the y-axis, with the corresponding values of Y. The response Y is weak if A and B are both weak, or if A and B are both strong.

Figure 2:

• in red: variation of Y as a function of A, for weak B;

• in blue: variation of Y as a function of A, for strong B.

Y therefore varies differently with A depending on whether B is weak or strong.

Figure 3: variation profiles over the sequence of trials: Y mostly follows "A * B". In other words, Y is positively correlated with "A * B" and negatively correlated with A.B.

These figures show that Y is strong if “A is weak and B is strong”, or if “A is strong and B is weak”.

In other words, the operation "A * B" = −A.B corresponds to the "exclusive or" of logic.

Figure 1 represents the “exclusive or” in the case where the variables A and B are discrete, with two levels.

If the variables A and B are continuous, we obtain figure 4, characterized by “mountains” (in red) where A is strong and B weak, or where A is weak and B strong. Elsewhere, there are “valleys” (in blue).

Figure 4: response surface of the variable A * B

Concept of "logical interaction"

Since the artificial variable “A * B” = −A.B corresponds to the “exclusive or” of logic, it is natural to also consider a “logical interaction” that is much more frequent in physics, namely the logical “and”: “A&B”.

In the case of 2-level variables, the “A&B” column will have the following values (strong value only if A and B are strong):

	A	B	A.B	A*B	A&B	Y
Trial 1	−1	−1	1	−1	−1	10
Trial 2	−1	1	−1	1	−1	21
Trial 3	1	−1	−1	1	−1	19
Trial 4	1	1	1	−1	1	9
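For 2-level variables, the three interaction columns of this table can be rebuilt directly from A and B. The sketch below assumes the −1/+1 coding of the table; the variable names are ours.

```python
A = [-1, -1, 1, 1]
B = [-1, 1, -1, 1]

# A.B: ordinary product of the two coded columns.
product_AB = [a * b for a, b in zip(A, B)]

# A*B: "exclusive or", defined in the text as -A.B.
xor_AB = [-a * b for a, b in zip(A, B)]

# A&B: strong (+1) only when both A and B are strong, weak (-1) otherwise.
and_AB = [1 if (a == 1 and b == 1) else -1 for a, b in zip(A, B)]

print(product_AB)  # [1, -1, -1, 1]
print(xor_AB)      # [-1, 1, 1, -1]
print(and_AB)      # [-1, -1, -1, 1]
```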

And, in the general case of continuous variables, we have the following figure:

Figure 5: "A and B" response surface

The following figures show other "logical interactions", the description of which will be found below, and the mathematical formulas in references. Note that "A + B", which is not, strictly speaking, an interaction, has been placed there to show the difference with "A&B".

[Figure: response surfaces of other logical interactions]

Meaning of logical interaction symbols

f (A, B)	Meaning	The Y response is strong when ...
A * B	A exclusive-or B	... A is strong and B is weak, or A is weak and B is strong
A ^ B	A or B	... A is strong or B is strong
A ^ -B	A or not B	... A is strong or B is weak
A&B	A and B	... A and B are strong
A & -B	A and not B	... A is strong and B is weak
A]B	A if B	... A is strong if B is strong
A]−B	A if not B	... A is strong if B is weak
A}B	A if B medium	... A is strong if B is medium
A{B	A medium if B	... A is medium if B is strong
A{−B	A medium if not B	... A is medium if B is weak
A'B	neither A nor B (broad sense)	... neither A nor B is extreme (both are average)
A!B	neither A nor B (strict sense)	... neither A nor B is extreme (both are strictly average)
A # B	A like B	... A varies like B
A + B	"A plus B"	... the sum of A and B (standardized) is strong
A−B	"A minus B"	... the difference of A and B (standardized) is strong
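For the purely Boolean rows of this table, the 2-level case can be sketched with elementary operations. The min/max encodings below are an illustrative assumption that holds only for variables coded −1 (weak) and +1 (strong); they are not the formulas for continuous variables, which are given in the references.

```python
from itertools import product

# Two-level encodings (assumption: -1 = weak, +1 = strong).
OPERATORS = {
    "A * B": lambda a, b: -a * b,       # exclusive or
    "A ^ B": lambda a, b: max(a, b),    # or
    "A ^ -B": lambda a, b: max(a, -b),  # or not
    "A&B": lambda a, b: min(a, b),      # and
    "A & -B": lambda a, b: min(a, -b),  # and not
}

# Rows are ordered as (A, B) = (-1, -1), (-1, 1), (1, -1), (1, 1).
table = {
    name: [f(a, b) for a, b in product([-1, 1], repeat=2)]
    for name, f in OPERATORS.items()
}

for name, values in table.items():
    print(f"{name:7s}", values)
```

Each printed row is +1 exactly in the situations listed in the third column of the table.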

"A&B" or "A]B" response surfaces, much simpler than "A * B", are also more frequent in practice. They often allow better fitting models.

Example of the application of logical interactions in a prediction model

Consider the following data:

	A	B	C	D	E	Y
e1	7	7	1	4	2	1.304
e2	8	5	6	5	5	17.052
e3	3	4	3	8	8	2.123
e4	5	2	8	3	6	12.618
e5	4	6	2	2	7	2.723
e6	2	3	5	1	1	1.733
e7	1	8	7	6	4	1.119
e8	6	1	4	7	3	6.955
e9	5	5	5	5	5	7.774
e10	1	8	1	1	8	2.381
e11	8	1	8	1	1	20.424
e12	1	8	1	8	1	0.959
e13	1	1	8	1	8	−1.616
e14	8	1	1	8	1	0.485
e15	8	8	8	8	8	23.039

We will compare a classical regression model of Y with a model that can include logical interactions.

The goodness of fit of the models will be evaluated by:

→ R2a: the adjusted R-squared.

→ Q2: the R2 obtained when the model fitted on a training set is applied to a test set.

→ F: the F statistic, i.e. the ratio of the mean square explained by the model to the residual mean square.
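As a reminder of how two of these indicators are computed, the sketch below derives R2, the adjusted R2 and the F statistic from observed and fitted values, using the standard formulas for a least-squares model with an intercept. The function name and the small data set are hypothetical and serve only to illustrate the arithmetic; they are unrelated to the data above.

```python
def fit_statistics(y, y_hat, n_terms):
    """Return (R2, adjusted R2, F) for observed y and fitted y_hat.

    n_terms is the number of predictor terms, not counting the intercept.
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)            # total sum of squares
    ss_res = sum((v - f) ** 2 for v, f in zip(y, y_hat))  # residual sum of squares
    dof_res = n - n_terms - 1                             # residual degrees of freedom
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / dof_res
    f_stat = ((ss_tot - ss_res) / n_terms) / (ss_res / dof_res)
    return r2, r2_adj, f_stat


# Hypothetical observed and fitted values, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, r2_adj, f_stat = fit_statistics(y, y_hat, n_terms=1)
print(round(r2, 3), round(r2_adj, 3), round(f_stat, 1))
```

Applying the same R2 formula to predictions made on held-out trials would give a quantity in the spirit of Q2 as defined above.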

We will use forward selection of the model terms, which we will write in decreasing order of importance: each term explains the residual left unexplained by the previous terms. We stop adding terms when the standard error of prediction (SEP) no longer decreases.
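The selection procedure can be sketched as follows. This is a minimal illustration, not the software used for the models below: the article does not define how the SEP is computed, so the sketch uses the residual standard error of an ordinary least-squares fit as a stand-in, and it is run on the small 4-trial table from the beginning of the article. All function names are ours.

```python
import math


def solve(M, v):
    """Solve the linear system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit with intercept; returns (coefficients, residual standard error)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((y[i] - sum(beta[j] * X[i][j] for j in range(p))) ** 2 for i in range(n))
    return beta, math.sqrt(rss / (n - p))


def forward_selection(candidates, y):
    """Add, one at a time, the term that most reduces the error; stop when none does."""
    selected = []
    _, best_err = fit_ols([], y)  # intercept-only model
    # Keep at least one residual degree of freedom for the next fit.
    while len(selected) < len(candidates) and len(y) - len(selected) - 2 > 0:
        trials = {}
        for name in candidates:
            if name not in selected:
                cols = [candidates[s] for s in selected] + [candidates[name]]
                trials[name] = fit_ols(cols, y)[1]
        best = min(trials, key=trials.get)
        if trials[best] >= best_err:
            break  # the error no longer decreases
        selected.append(best)
        best_err = trials[best]
    return selected


candidates = {"A": [-1, -1, 1, 1], "B": [-1, 1, -1, 1], "A.B": [1, -1, -1, 1]}
Y = [10, 21, 19, 9]
selected = forward_selection(candidates, Y)
print(selected)  # ['A.B', 'A']
```

On this table the interaction A.B enters first, in line with its strong correlation with Y. A different error criterion (for example a cross-validated one) could stop earlier or later.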

Model 1, without logical interactions

Y = -6.904 + 1.589 A + 14.44 A.C + 1.391 C + 2.613 C.D

        R2a = 0.995      Q2 = 0.992      F = 715.3      SEP = 0.8412

Model 2, with logical interactions

Y = 6.605 + 29.91 A&C + 3.923 B]-D

        R2a = 0.999      Q2 = 0.998      F = 5887      SEP = 0.3357

Model 2 includes two terms instead of four, so it is more parsimonious. Parsimonious models are simple models with great explanatory and predictive power: they explain the data with a minimum number of predictor variables.

Model 2 is also easier to interpret. “A&C”: Y increases if A and C are strong simultaneously. "B]-D": the residual of Y not explained by the first term increases with B if D is weak.

Compared with Model 1, R2a, Q2 and F have increased, and the SEP has decreased.

Note: in a regression equation, the values of the coefficients depend on the units in which the interactions are expressed. For example, if A is in m/s and B in degrees, the product A.B has no natural unit. For this reason, A and B are standardized before the product is computed, and the product itself is then standardized. Instead of standardizing, another possible unit is the “variable-instant correlation”. However, regardless of the unit chosen for the interaction, R2a, Q2, F and SEP remain the same.

References


[1] M. Lesty, "Une nouvelle approche dans le choix des régresseurs de la régression multiple en présence d'interactions et de colinéarités", La Revue de Modulad, n°22, pp. 41–77, January 1999 (PDF, in French).

[2] J.C. Laguerre, I. Douiri-Bédoui, C. Chireux, D. Marier, P. Jacolot, C. Jouquand, F.J. Tessier, K. Woodward, P. Gadonna-Widehem, "The iconographic correlation (CORICO) method, a new approach for the optimization of microwave cooking processes: application for cooking fish", EFFOST Annual Meeting, Bologna, Italy, November 2013.

[3] Celine Jouquand, Frederic J. Tessier, Julien Bernard, David Marier, Ken Woodward, Philippe Jacolot, Pascale Gadonna-Widehem, Jean-Claude Laguerre, "Optimization of microwave cooking of beef burgundy in terms of nutritional and organoleptic properties", LWT - Food Science and Technology, 60 (1): 271–276, January 2015. doi:10.1016/j.lwt.2014.07.038.
