Channel: Interpreting sub-sample analysis when coefficient signs are opposite - Cross Validated

Answer by BenP for Interpreting sub-sample analysis when coefficient signs are opposite


In an analysis of variance class I explained this with tables, one containing cell means of the dependent and one containing the corresponding cell frequencies:

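[Image: two tables for Control and Treatment across Gr1 to Gr4, one with the cell means of the dependent and one with the cell frequencies]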

For each of the four Gr groups the treatment "causes" a decrease of 1 point in the mean of the dependent. However, due to the unbalanced frequencies, the overall Treatment mean is 0.5 higher than the Control mean and thus opposite to what is seen within each of the four Gr(oups).
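The reversal can be checked with a few lines of arithmetic. This is only a sketch: the table itself is an image, so the cell means used here (2, 3, 4, 5 for Control and 1, 2, 3, 4 for Treatment) are reconstructed from the numbers quoted further down in this answer and should be read as an assumption. The frequencies are the ones mentioned in the text.

```python
from fractions import Fraction

# Cell means for Gr1..Gr4 (reconstructed from the text; assumption, the table is an image).
means = {
    "Control":   [2, 3, 4, 5],
    "Treatment": [1, 2, 3, 4],
}
# Cell frequencies for Gr1..Gr4, as quoted in the answer.
freqs = {
    "Control":   [10, 2, 2, 2],
    "Treatment": [2, 2, 2, 10],
}

def weighted_mean(cell_means, cell_freqs):
    """Overall row mean: cell means weighted by their cell frequencies."""
    total = sum(m * n for m, n in zip(cell_means, cell_freqs))
    return Fraction(total, sum(cell_freqs))

# Treatment minus Control within each Gr group.
within_diffs = [t - c for t, c in zip(means["Treatment"], means["Control"])]

# Overall (frequency-weighted) mean per arm.
overall = {arm: weighted_mean(means[arm], freqs[arm]) for arm in means}

print(within_diffs)                                      # [-1, -1, -1, -1]
print(float(overall["Control"]))                         # 2.75
print(float(overall["Treatment"]))                       # 3.25
print(float(overall["Treatment"] - overall["Control"]))  # 0.5
```

Within every group the difference is -1, yet the frequency-weighted overall difference comes out at +0.5, purely because the two rows weight the groups differently.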

With balanced frequencies, say each cell containing n=4 (1/4 of the total per row), the overall Treatment mean of 2.5 would lie exactly in the middle of the four Treatment cell means, and the overall Control mean would be 3.5. Hence, the overall difference between Treatment and Control would then be equal to the difference within Gr1, Gr2, etc., namely a decrease of 1 point.

This example is nothing new of course, just showing in a different way what was already said in the earlier comments. The question to be answered is: is it meaningful to control for Gr? Looking per Gr group, as the question says, is (more or less) similar to controlling for Gr in a regression or ANOVA model. In an ANOVA the four Gr(oups) could act as a second factor; the interaction Treatment*Gr would be zero in the example. Also, the estimated marginal means (EMM) of Treatment and Control would be equal to 2.5 and 3.5, neutralizing the influence of the different frequencies on the marginal means.
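A small sketch of the two claims in this paragraph, under the assumption that the EMMs give each Gr level equal weight (which is how they behave in this example) and using cell means reconstructed from the text (assumption, since the table is an image): the marginal means become the plain averages of the cell means, and the interaction is zero when the Treatment effect is identical in every Gr group.

```python
from fractions import Fraction

# Cell means for Gr1..Gr4 (reconstructed from the text; assumption).
control_means = [2, 3, 4, 5]
treatment_means = [1, 2, 3, 4]

def unweighted_mean(cell_means):
    """Average of the cell means with equal weight per Gr level."""
    return Fraction(sum(cell_means), len(cell_means))

# Equal-weight marginal means, i.e. what balanced cells (n=4 each) would give.
emm_control = unweighted_mean(control_means)
emm_treatment = unweighted_mean(treatment_means)

# Treatment effect per Gr group; a single distinct value means no interaction.
effects = [t - c for t, c in zip(treatment_means, control_means)]
no_interaction = len(set(effects)) == 1

print(float(emm_treatment), float(emm_control))  # 2.5 3.5
print(float(emm_treatment - emm_control))        # -1.0
print(no_interaction)                            # True
```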

So, is controlling for Gr necessary or desired? If there is some causal influence of Gr on the dependent, then the answer is 'yes', I would say.

However ... the nice answer of Civilstat about mediation by the Gr(oups) offers a different view. I will try to explain such mediation for my example data.

Suppose that the composition of the four Gr(oups) is influenced by the treatment. It could be that the treatment causes more people to "prefer" Gr4 over Gr1. Prefer may not be the right word here, as there may be all kinds of reasons why more people belong to Gr4 after the treatment. The point is that the treatment causes the "move" away from Gr1 and into Gr4. So, this change is not coincidental or manipulated by the researcher, but it is a truly causal influence of the treatment. We then have the following causal chain:

[Image: causal diagram, Treatment → Gr4 → Dependent, with a curved arrow for the direct effect of Treatment on the Dependent]

Instead of using the entire variable Gr as the mediating variable, in the above diagram I used only Gr4, or rather the proportion of people belonging to Gr4. The reason is as follows. The Gr(oup) composition changes as a consequence of the treatment, but only for the proportions of people in Gr1 and Gr4. The proportion of people in Gr4 is 2/16 for Control and 10/16 for Treatment, so a shift of +8/16. We can also say that the proportion in Gr1 changes by -8/16. If the proportions in Gr2 and Gr3 do NOT change, then the (absolute) proportional shift away from Gr1 goes together with the same proportional shift into Gr4. In the chain therefore only Gr4 is given as mediator, with +8/16 as the regression effect of the Treatment on Gr4. The regression effect of Gr4 on the dependent is +3, because we only need to look at the move from Gr1 into Gr4 and these groups differ by +3 in their means, as shown within each row of the upper left table containing means. In total then the mediated or indirect influence from Treatment on the Dependent can be calculated as 8/16 * 3 = +1.5. The direct effect of Treatment on the dependent is -1, as is also shown, within columns, in the upper left table and also along the curved arrow in the causal diagram. The total influence of Treatment now is equal to the sum of the direct and indirect effect of Treatment, -1 + 1.5 = +0.5.
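The path arithmetic above can be written out as a short sketch. The variable names `a`, `b`, `direct`, `indirect` and `total` are my own labels for the two paths and their combination, and the Gr1 and Gr4 means (2 and 5 in the Control row) are reconstructed from the text rather than read off the table image.

```python
from fractions import Fraction

n_per_arm = 16                       # people per row (Control or Treatment)
gr4_control, gr4_treatment = 2, 10   # Gr4 frequencies quoted in the text

# Path a: effect of Treatment on the proportion in Gr4 (+8/16).
a = Fraction(gr4_treatment - gr4_control, n_per_arm)

# Path b: effect of moving from Gr1 to Gr4 on the dependent.
# Gr1 and Gr4 means within the Control row (reconstructed; assumption).
gr1_mean, gr4_mean = 2, 5
b = gr4_mean - gr1_mean

direct = -1                  # within-group Treatment effect
indirect = a * b             # mediated effect via the shift into Gr4
total = direct + indirect    # overall Treatment minus Control difference

print(float(a), b)       # 0.5 3
print(float(indirect))   # 1.5
print(float(total))      # 0.5
```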

Another, maybe simpler, method to calculate the indirect influence of Treatment on the dependent is as follows. Going from Control to Treatment the Gr(oups) composition changes. We could now ask: how does Gr composition alone affect the dependent, i.e. when keeping Treatment constant or when controlling for Treatment? Knowing this change in the dependent shows us the influence of Treatment on the dependent as far as it is caused by the changing Gr(oups) composition only. Or: it shows us the indirect influence via Gr composition. Suppose we apply the Gr frequencies (2, 2, 2, 10) of Treatment to the data of the Control group ... this would produce a mean of (2*2 + 2*3 + 2*4 + 10*5) / 16 = 68/16 = 4.25. So, this different Gr composition would produce a mean of 4.25, whereas with Gr composition (10, 2, 2, 2) the Control mean was 2.75 (see table). That is, the Gr composition shift alone leads to an increase of 4.25 - 2.75 = +1.5; this 1.5 increase is purely caused by the shift in composition of the four Gr groups, which is one of the things that change when comparing Control and Treatment! So here we (again) have the indirect or mediated influence of Treatment.
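This reweighting can be sketched as follows. The Control cell means (2, 3, 4, 5) follow from the calculation just given. The Treatment cell means (1, 2, 3, 4) are an assumption derived from the 1-point decrease within each group. The last step, which recovers the direct effect by holding the Treatment composition fixed, is my own addition to show that the two parts add up to the total.

```python
from fractions import Fraction

control_means = [2, 3, 4, 5]      # Gr1..Gr4, from the calculation in the text
treatment_means = [1, 2, 3, 4]    # Gr1..Gr4, assumed: Control minus 1
control_freqs = [10, 2, 2, 2]
treatment_freqs = [2, 2, 2, 10]

def weighted_mean(cell_means, cell_freqs):
    """Mean of the cell means, weighted by the given frequencies."""
    total = sum(m * n for m, n in zip(cell_means, cell_freqs))
    return Fraction(total, sum(cell_freqs))

observed_control = weighted_mean(control_means, control_freqs)      # 44/16
shifted_control = weighted_mean(control_means, treatment_freqs)     # 68/16
observed_treatment = weighted_mean(treatment_means, treatment_freqs)

# Composition changes, Treatment held constant (at Control): indirect effect.
indirect = shifted_control - observed_control
# Treatment changes, composition held constant (at Treatment's): direct effect.
direct = observed_treatment - shifted_control

print(float(shifted_control))      # 4.25
print(float(indirect))             # 1.5
print(float(direct))               # -1.0
print(float(indirect + direct))    # 0.5
```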

And finally, the question at issue again is: which influence to report, -1 or +0.5? In case of mediation, both would be relevant. And if the Gr(oups) frequencies are in no way caused by the Treatment, and the Gr(oups) have an indisputable influence on the dependent, it seems more plausible to me to consider -1 as the "true" Treatment effect.

