The previous posts noted that, for a set of items each measured on two variables, the correlation coefficient can be read in two equivalent ways: as the slope of the regression line (when both variables are standardized) and as the tightness of packing of points around the line of best fit. How then to generalize that equivalence?
One way is to see that the line of best fit is the first principal component. With both variables standardized, the variation (technically, the variance) in the direction of that line is equal to 1 + the correlation (or, for negative correlations, 1 – the correlation), leaving the variation away from (i.e., perpendicular to) the line equal to 1 – the correlation (or, for negative correlations, 1 + the correlation). If many variables are measured for the set of items, then a series of principal component lines can be derived, each having along it the maximum amount of the variation not in the direction of the preceding lines in the series (which means that each line is perpendicular to all the preceding ones).
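For readers who like to check such claims numerically, here is a minimal sketch of the two-variable case. It assumes both variables are standardized (mean 0, variance 1), in which case the two principal directions are the diagonals (1, 1) and (1, –1). The data are simulated and the function names are made up for the illustration; nothing here comes from the earlier posts.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale values to mean 0 and (population) variance 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def variance_along_diagonals(x, y):
    """Return (r, variance along the (1, 1) diagonal, variance along (1, -1))."""
    zx, zy = standardize(x), standardize(y)
    # For standardized variables the correlation is the mean of the products.
    r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
    # Project each point onto the two unit-length diagonal directions.
    along_sum = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
    along_diff = [(a - b) / math.sqrt(2) for a, b in zip(zx, zy)]
    return r, statistics.pvariance(along_sum), statistics.pvariance(along_diff)


# Simulated data (an assumption for illustration): y is partly driven by x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.6 * a + 0.8 * random.gauss(0, 1) for a in x]

r, var_sum, var_diff = variance_along_diagonals(x, y)
print("correlation:", round(r, 4))
print("variance along (1, 1):", round(var_sum, 4), "vs 1 + r:", round(1 + r, 4))
print("variance along (1, -1):", round(var_diff, 4), "vs 1 - r:", round(1 - r, 4))
```

The two variances always add up to 2, the total variance of two standardized variables. Flipping the sign of `y` flips the sign of the correlation, and the larger share of the variance moves to the other diagonal, which is the negative-correlation case mentioned above.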
Where, in the two-variable case, the line of best fit captured the association between the two variables as well as possible, the series of principal components captures the association among the multiple variables as well as possible. Where, in the two-variable case, we can think about the corresponding regression lines (lines plural because we can predict the horizontal variable from the vertical as well as the usual way around) in terms of prediction, we can envisage multiple regression lines predicting any one variable in terms of the others, or a subset of variables in terms of the others not in the first subset.
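The multi-variable picture can be sketched in the same spirit. The example below assumes NumPy is available and uses simulated data for four made-up variables; the variable names and the choice of the first variable as the one to predict are illustrative assumptions only. It derives the series of principal components from the correlation matrix, then fits a multiple regression predicting one standardized variable from the other three.

```python
import numpy as np

# Simulated items measured on four variables (an assumption for illustration):
# three share a common influence, the fourth is unrelated noise.
rng = np.random.default_rng(0)
n_items = 400
shared = rng.normal(size=n_items)
data = np.column_stack([
    shared + 0.5 * rng.normal(size=n_items),
    shared + 1.0 * rng.normal(size=n_items),
    -shared + 0.7 * rng.normal(size=n_items),
    rng.normal(size=n_items),
])

# Standardize each variable, then form the correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0)
corr = z.T @ z / n_items

# Principal components are the eigenvectors of the correlation matrix.
# eigh returns eigenvalues in ascending order, so reverse to get the series
# from most to least variance.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order]  # one component per column

# Multiple regression: predict the first variable from the other three.
# No intercept is needed because every standardized variable has mean 0.
target = z[:, 0]
predictors = z[:, 1:]
coefs, *_ = np.linalg.lstsq(predictors, target, rcond=None)
residual = target - predictors @ coefs
r_squared = 1 - residual.var() / target.var()

print("variance along each component:", np.round(eigenvalues, 3))
print("regression coefficients:", np.round(coefs, 3))
print("share of variance predicted:", round(float(r_squared), 3))
```

The variances along the components sum to the number of variables (here 4), just as 1 + correlation and 1 – correlation sum to 2 in the two-variable case. Predicting a subset of variables rather than a single one would follow the same pattern, with a two-dimensional `target`.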
We could, however, just think directly about the relationships among the variables as seen in the patterns captured by the principal components… (to be continued).