I know we have talked a bit before about statistical significance in the data. It’s the kind of thing that keeps us researchers in business. Drop a term like “statistical significance” at a party full of researchers, and watch our eyes light up with excitement. If you thought we were nerds before, just wait!
But because we have not talked about this kind of stuff in a while, I would like to revisit the topic. Today we take a look at what differences mean in the data. If data is collected randomly, then differences that we may find between various groups (such as racial groups, gender, client groups, etc.) should be real. However, unless the gap is large, say 30 percentage points between the groups in a response to a particular question or behavior, it can be difficult to know whether the differences you are seeing are “real.”
You see, every sample has a “margin of error.” You know what that is, because you see it all the time in the political polls you are bombarded with as of late. If you don’t, read about it here.
Unless you have a census, researchers must deal with margins of error, and there are acceptable levels of error we are comfortable with, say +/- 5 percentage points. These exist because there is a possibility that the differences we see between the groups (like the percentage of voters that will choose a particular candidate for president) occur by mere chance. Refined sampling techniques, like the ones we use, are designed to minimize this possibility. But again, unless we survey every possible case in a targeted population, this possibility is present. A full census is very costly, and a well-drawn random sample gives high precision at a fraction of the cost.
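For those who like to see the arithmetic, here is a minimal sketch of how a margin of error can be computed for a percentage. It assumes a simple random sample and uses the standard normal approximation at 95% confidence. The function name and the sample sizes are my own illustration, not figures from any particular study.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Approximate margin of error for a sample proportion.

    p          -- observed proportion, between 0 and 1
    n          -- number of respondents (assumed to be a simple random sample)
    confidence -- confidence level, 0.95 by default
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)


# Hypothetical sample sizes; p = 0.5 is the worst case (widest margin).
for n in (100, 400, 1000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>4}: +/- {moe * 100:.1f} percentage points")
```

Under these assumptions, roughly 400 respondents give a margin close to +/- 5 percentage points, and the margin shrinks only slowly as the sample grows.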
Let’s get a little bit more specific.
Say we see differences between males and females in how difficult they think college will be. A recent IQS study showed that 55% of African American male adults believe that college will be difficult for high school students. Only 23% of females believe this. Now, we can probably tell that this difference is real based on the magnitude of the spread (32 percentage points). We don’t really need a statistical test to tell us this.
But let’s look at another example, one that may not be so clear. The same study revealed that 41% of white males said that everyone should get a college degree, compared to 48% of white females. A difference of only 7 percentage points is less clear. Perhaps the difference is real, or perhaps it is occurring because of that possibility of chance due to sampling error that we just discussed.
To ease your anticipation as you sit on the edge of your seats, I can tell you that the difference was indeed “statistically significant.” In other words, it is unlikely to be a fluke of sampling. There is in fact a difference in opinion between these two groups. But is the difference practical? In other words, is the difference so great as to warrant different marketing campaigns directed toward men and women to raise perceptions of importance for a college education? Are the additional costs justified? The answer is probably no. Sometimes the differences, while real, are too subtle to justify developing different strategies to address the problem(s).
Thus, there is a difference between “statistical” and “practical” significance. Statistical differences are real and meaningful from a data standpoint. They help guide researchers in finding insights in the data. But if you are on the receiving end of the analysis, perhaps these differences are not always practical. It is often the researcher’s charge to help in delineating between the two.
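If you are curious what sits behind a verdict like “statistically significant,” one common choice for comparing two percentages is a two-proportion z-test. The sketch below is only an illustration of that kind of test, not necessarily the one used in the study. The group sizes are hypothetical, since the actual sample sizes are not given here.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two sample proportions.

    Returns (z, p_value). Assumes two independent simple random samples
    that are large enough for the normal approximation to hold.
    """
    # Pooled proportion under the assumption of no real difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Chance of a gap at least this large if the groups truly agree.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 41% versus 48%, with HYPOTHETICAL group sizes (not taken from the study).
for n in (100, 500):
    z, p_value = two_proportion_z_test(0.41, n, 0.48, n)
    print(f"n = {n} per group: z = {z:.2f}, p-value = {p_value:.3f}")
```

With these made-up sizes, the same 7-point gap falls below the conventional 0.05 cutoff at 500 respondents per group but not at 100 per group. That is why the size of a gap alone cannot tell you whether it is real, and why a significant result still leaves the question of whether it is practical.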
For more information on this topic, be sure to read our white paper on the matter. It will give you a deeper understanding of differences that are revealed in data analysis.