Draft of a discussion paper prepared for a 1.5-day visit to Iowa State to learn more about the dynamic graphical approaches to data analysis developed by the research group of Dianne Cook.
For this session, I will give a brief introduction, then participants take turns, say 5 minutes each, to relate how the paper intersects with or stimulates their own thinking (while the author stays quiet, listening). I join in at the end. This approach means that the emphasis is on participants teasing out their own thinking more than on digging into what the author thinks.
My interest in visual exploration of large data sets traces back to my studies and first research job in multivariate “pattern analysis” in ecology and agriculture (in Australia in the mid-1970s). Conversations among the plant breeders I worked with were lively when they saw the plots I generated for them, and much less so when I showed them analyses of variance and other numerical output. Although I have since strayed from my quantitative roots (I am now more of a sociologist and philosopher of science than a data analyst), I remain very interested in ways that people push the limits of conventional quantitative methods.
The theme of people addressing or suppressing heterogeneity runs through my current studies of what researchers do (or don’t do) in social epidemiology, population health, and quantitative genetics. In this vein, I see the various tools of interactive and dynamic graphics for data analysis as ways to address heterogeneity, in the sense of teasing apart homogeneous components of a (heterogeneous) mixture so that separate kinds of explanations can be formulated for the separate components. Traditional statistical analysis allows itself to be confounded by the mixture of patterns or structure in a given data set. In this spirit, Cook and Swayne (2007, 13) quote Buja (1996) approvingly: “Non-discovery is the failure to identify meaningful structure… [T]he fear of non-discovery should be at least as great as the fear of false discovery.”
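To make the idea of a summary being confounded by a mixture concrete, here is a minimal sketch in Python. The two groups and their values are invented for illustration (they come from no real data set), and the helper `pearson` is my own, not a function from any graphics or statistics package. Within each hypothetical group the two variables rise together, yet a single correlation computed over the pooled data points the other way.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Two hypothetical homogeneous components of a heterogeneous mixture.
# Within each group, y increases with x.
group_a = {"x": [1, 2, 3, 4], "y": [10, 11, 12, 13]}
group_b = {"x": [6, 7, 8, 9], "y": [2, 3, 4, 5]}

# Correlation within each component taken separately.
r_a = pearson(group_a["x"], group_a["y"])
r_b = pearson(group_b["x"], group_b["y"])

# Correlation when the mixture is analyzed as if it were one population.
r_pooled = pearson(group_a["x"] + group_b["x"], group_a["y"] + group_b["y"])

print(f"within group A: {r_a:+.2f}")
print(f"within group B: {r_b:+.2f}")
print(f"pooled:         {r_pooled:+.2f}")
```

A scatterplot with the points colored or brushed by group would show the two clusters at a glance. The pooled coefficient alone gives no hint that they exist.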
Our discussion may shed light on why the issue of heterogeneity is not explicitly named in discussions of exploratory data analysis by interactive graphics. If my characterization of this enterprise makes sense to you, how does your experience with exploratory data analysis and interactive graphics help you think about the range of meanings of heterogeneity? Table 1 presents my current taxonomy of heterogeneities. The vignettes that follow illustrate some of the meanings and sketch some implications. This paper by no means circumscribes the issues you might bring to the topic of heterogeneity and data analysis. Nor do I presume that the vignettes resonate with your day-to-day concerns. Yet I do hope some of these thoughts-from-an-outsider stimulate discussion in which specialists in representation and analysis of data provide deeper accounts of the conceptual and practical issues, correct my presentation when necessary, and help me learn more (continued next post).
Buja, A. (1996). “Interactive graphical methods in the analysis of customer panel data: Comment.” Journal of Business & Economic Statistics 14(1): 128-129.
Cook, D. and D. F. Swayne (2007). Interactive and Dynamic Graphics for Data Analysis. New York: Springer.
 The P.I. for this first research job, Don Byth, was a 1965 Iowa State Ph.D.
 These vignettes are extracted or adapted from publications, blogposts, notes to students, unpublished drafts, and a proposal for a book on heterogeneity in the biomedical sciences.