Heterogeneity #4, Deviation from the type or essential trajectory

Statistical analysis involves more than t-tests and their generalizations; correlation and regression are another mainstay. Here, however, the emphasis lies more on prediction than on variation, as if the line or curve of prediction captured the *essential trajectory* of the data (McLaughlin 1989), generalizing the emphasis that t-tests place on types. (Everyone knows that correlation is not causation, but most of us interpret regressions in a causal spirit.) The following excerpt from Taylor (2008; see http://bit.ly/osTjQ3) highlights an alternative view of correlation and regression that keeps our attention on the variation (also discussed in a series of posts):

Consider the concept of a regression line as a best predictor line. To predict one measurement from another is to hint at, or to invite, causal interpretation. Granted, if we have the additional information that the second measurement follows the first in time (as is the case for offspring and parental traits), a causal interpretation in the opposite direction is ruled out. But there is nothing about the association between correlated variables, whether temporally ordered or not, that requires it to be assessed in terms of how well the first predicts the second (let alone whether the predictions provide insight about the causal process). After all, although this is rarely made clear to statistics students, the correlation is not only the slope of the regression line when the two measurements are scaled to have equal spread; it also measures how tightly the cloud of points is packed around the line of slope 1 (or slope -1 for a negative correlation). Technically, when both measurements are scaled to have a standard deviation of 1, the average of the squared perpendicular distance from the points to the line of slope 1 or -1 is equal to 1 minus the absolute value of the correlation (Weldon 2000). This means that the larger the correlation, the tighter the packing. This tightness-of-packing view of correlation affords no priority to one measurement over the other. Whereas the typical emphasis in statistical analysis on prediction often fosters causal thinking, a non-directional view of correlation reminds us that additional knowledge always has to be brought in if the patterns in data are used to support causal claims or hypotheses.
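The identity attributed to Weldon (2000) in the excerpt can be checked numerically. The following is a minimal sketch in Python using only the standard library. The simulated data, the function names (`standardize`, `pearson_r`, `mean_sq_perp_distance`), and the choice of the population standard deviation are assumptions made for illustration and are not part of the excerpt.

```python
import math
import random


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(zx, zy):
    """Correlation of two lists that are already standardized."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def mean_sq_perp_distance(zx, zy, slope):
    """Average squared perpendicular distance to y = slope * x (slope is +1 or -1)."""
    # Distance from (a, b) to y = s*x is |b - s*a| / sqrt(1 + s^2); here s^2 = 1.
    return sum((b - slope * a) ** 2 / 2 for a, b in zip(zx, zy)) / len(zx)


random.seed(0)  # illustrative simulated data, not from any real study
x = [random.gauss(0, 1) for _ in range(500)]
y = [-0.6 * a + random.gauss(0, 1) for a in x]  # negatively correlated

zx, zy = standardize(x), standardize(y)
r = pearson_r(zx, zy)
slope = 1 if r >= 0 else -1  # line of slope -1 for a negative correlation
d = mean_sq_perp_distance(zx, zy, slope)

print(f"r = {r:.4f}")
print(f"mean squared perpendicular distance = {d:.4f}")
print(f"1 - |r| = {1 - abs(r):.4f}")
```

The last two printed values should agree up to floating-point error. Swapping `zx` and `zy` in the call to `mean_sq_perp_distance` leaves the result unchanged, which is the sense in which this view gives neither measurement priority.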

[Postscript: The tightness-of-packing view of regression for continuous variables can be extended to multivariate associations through Principal Component Analysis, factor analysis, etc. The well-known difficulty of interpreting principal components or factors can be flipped on its head: What causal assumptions about *independent* variables (i.e., independently modifiable variables) enter into interpretations of conventional regression analysis?]
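One way to see the link to Principal Component Analysis in the two-variable case: for standardized measurements, the first principal axis is the line through the origin that minimizes the average squared perpendicular distance to the points. The sketch below is an illustration under assumptions (simulated data, a hypothetical helper `mean_sq_perp_distance_at_angle`, and a coarse scan over whole-degree angles rather than a call to a PCA routine). It suggests that, for positively correlated data, this best-fitting line is the line of slope 1.

```python
import math
import random


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def mean_sq_perp_distance_at_angle(zx, zy, theta):
    """Average squared perpendicular distance to the line through the origin at angle theta."""
    # A unit normal to that line is (-sin(theta), cos(theta)).
    s, c = math.sin(theta), math.cos(theta)
    return sum((-s * a + c * b) ** 2 for a, b in zip(zx, zy)) / len(zx)


random.seed(1)  # illustrative simulated data
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.8 * a + random.gauss(0, 1) for a in x]  # positively correlated
zx, zy = standardize(x), standardize(y)

# Scan candidate lines at 0, 1, ..., 179 degrees and keep the tightest fit.
angles = [math.radians(deg) for deg in range(0, 180)]
best_theta = min(angles, key=lambda t: mean_sq_perp_distance_at_angle(zx, zy, t))
best_deg = round(math.degrees(best_theta))

print(f"best-fitting line is at {best_deg} degrees")
```

A line at 45 degrees has slope 1, so neither variable is singled out as the predictor. With more than two variables the same minimization yields the first principal component, whose interpretation is less obvious.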

*(continuing a series of posts—see first post; see next post)*
