A NYTimes article on predicting effective tweets provides some nice examples of abstract points I often make about statistical analysis. Patterns found in data are often formulated as predictive regression equations that look like, and are talked about as if they were, statements about causes (even though “we all know that correlation is not causation”). Such patterns suggest policies or practices that might get enacted. When patterns are seen or used that way, two things can go wrong: what caused the patterns can be misunderstood, or the underlying dynamics that generated the original data, and thus the patterns, can be altered.
Changing a variable that is highly predictive may have no effect. For example, we may find the number of employees formatting their résumés is a good predictor of a company’s bankruptcy. But stopping this behavior hardly seems like a fruitful strategy for fending off creditors.
That is, enforcing a rule that employees not format their résumés would not stop an impending bankruptcy.
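The résumé point can be made concrete with a toy simulation. This is only a sketch under assumptions of my own, not anything from the article: a hidden variable (financial distress) drives both résumé formatting and bankruptcy, and every probability and name below (`P_DISTRESS`, `simulate`, `ban_resumes`, and so on) is invented for illustration.

```python
import random

# All probabilities are made-up illustrative assumptions, not estimates.
P_DISTRESS = 0.2                       # share of companies in hidden financial distress
P_BANKRUPT = {True: 0.5, False: 0.02}  # bankruptcy chance, keyed by distress
P_RESUMES = {True: 0.8, False: 0.1}    # chance of heavy résumé formatting, keyed by distress


def simulate(n_companies=10_000, ban_resumes=False, seed=0):
    """Return a list of (formatting, bankrupt) pairs, one per company."""
    rng = random.Random(seed)
    companies = []
    for _ in range(n_companies):
        distressed = rng.random() < P_DISTRESS
        # Bankruptcy depends only on distress, never on résumé activity.
        bankrupt = rng.random() < P_BANKRUPT[distressed]
        wants_to_format = rng.random() < P_RESUMES[distressed]
        # The ban suppresses the behavior but leaves distress untouched.
        formatting = wants_to_format and not ban_resumes
        companies.append((formatting, bankrupt))
    return companies


def bankruptcy_rate(companies):
    return sum(bankrupt for _, bankrupt in companies) / len(companies)


observed = simulate()
rate_with = bankruptcy_rate([c for c in observed if c[0]])
rate_without = bankruptcy_rate([c for c in observed if not c[0]])
rate_observed = bankruptcy_rate(observed)
rate_banned = bankruptcy_rate(simulate(ban_resumes=True))

print(f"bankruptcy rate, résumé formatting seen:     {rate_with:.3f}")
print(f"bankruptcy rate, résumé formatting not seen: {rate_without:.3f}")
print(f"overall rate without a ban: {rate_observed:.3f}")
print(f"overall rate with a ban:    {rate_banned:.3f}")
```

In this toy world résumé formatting is a strong predictor of bankruptcy, yet the overall bankruptcy rate is identical with and without the ban, because the ban changes the predictor and not the distress behind it. After the ban, the predictor also carries no information at all, which is the second problem noted above: acting on a pattern can alter the dynamics that produced it.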
[T]he tweet predictor [developed using large data sets] finds that longer tweets are more likely to be retweeted. It seems unlikely that you should therefore write longer tweets. The old adage that “less is more” is, if anything, truer in this medium. Instead, length is probably a good predictor because longer tweets have more content. So the lesson is not “make your tweets longer” but “have more content,” which is far harder to do.
The example implies that, even if a variable in a predictive equation derived from data analysis looks like a plausible cause, there are grounds to explore what the underlying dynamics are. (I would add that heterogeneous underlying dynamics may result in the same predictive variable.) Two more examples:
But once an algorithm [the tweet predictor] finds those things that draw attention and starts exploiting them, their value erodes. When few people do something, it catches the eye; when everyone does it, it is ho-hum. Calling a food “artisanal” was eye-catching, until it became so common that we’re not far away from an artisanal plunger. In the Twitter example, the use of the words “retweet” or “please” were predictive. But if everyone starts asking you to “Share this article. Please,” will it continue to work?
[C]omputer scientists [Backstrom et al.] predict which posts on Facebook generate many comments. One of the most predictive variables is the time it takes for the first comment to arrive: If the first comment arrives quickly, then the post is likely to generate many more comments in the future. This helps Facebook decide which posts to show you. But it does not help anyone to write a highly commented post. It says: “Want to write a post people like [enough to comment on]? Well, write one that people like!”