A recent study carried out by our company and The Civil Rights Project for Jefferson County Public Schools came under fire because of a misconception common among those who don’t fully understand the power of random sampling. Without going into a long, drawn-out discussion of what the study entailed, the project aimed to understand the Louisville community’s perceptions of the student assignment plan and the diversity goals it seeks to accomplish. Perhaps the methods would not have come under such scrutiny had the findings been less controversial, but regardless, the methods did indeed come under attack.

But if we take a moment to understand the science behind sampling methods, and realize that it is not voodoo magic, then I think the community can begin to focus on the real issues the study uncovered. To put it simply, sampling is indeed science. Without going into the theory of probability and the numerous mathematical assessments used to test the validity of a sample, we can say that a random sample, so long as the laws of probability and nature hold true, and some tear in the fabric of the universe has not occurred, is certainly representative of any population it attempts to embody.

Let us first understand why this is so. When I taught statistics and probability to undergrads during my days as an instructor, I found I needed to keep this explanation simple – not because my students lacked the intelligence to fully understand it, but because probability theory can get a little sticky, and keeping the examples simple seemed to work best. Imagine we have a coin – a fair coin that is not weighted in any way (aside from a screw-up from the Treasury, in which case your coin could be worth a bundle of cash). We all know this example. If you flip it, you have a 50-50 chance of getting a particular side of that coin. In essence, that is the law of probability (the simplest of many).
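
If you would rather see the coin than take my word for it, here is a minimal sketch in Python. It is my own illustration, not anything from the study: the function name heads_fraction and the seed value are made up for the example, and the "coin" is just Python's built-in random number generator. The share of heads wanders a bit for a handful of flips and settles close to one half as the number of flips grows.

```python
import random


def heads_fraction(num_flips, seed=None):
    """Flip a simulated fair coin num_flips times; return the share of heads."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    # rng.random() is uniform on [0, 1), so "< 0.5" is a 50-50 event
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, heads_fraction(n, seed=42))
```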

Random sampling works the same way. While there are various methods for sampling a population randomly, Simple Random Sampling is the easiest and most commonly used. To put it simply, each member of a population is assigned a unique value, and a random number generator picks values within a defined range (say 1 to 1,000,000). Each member of that population has an equal chance of being selected. These chosen members become the lucky ones who serve as a true representation of the population. They are not “chosen” in the sense that they get to drink the Kool-Aid and ascend beyond, but they are chosen to speak on behalf of an entire population. Pretty cool, huh?!
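
The procedure just described can be sketched in a few lines of Python. Treat this as an illustration under my own assumptions rather than the exact tool used in any particular study: the function name simple_random_sample, the population of 1,000,000 numbered members, and the sample of 500 are all hypothetical. The standard library's random.sample draws without replacement, so no member can be picked twice and every member has the same chance of being picked.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct member IDs drawn from 1..population_size."""
    rng = random.Random(seed)
    # Members are numbered 1..population_size; sample() draws without
    # replacement, giving every ID an equal chance of selection.
    return rng.sample(range(1, population_size + 1), sample_size)


if __name__ == "__main__":
    chosen = simple_random_sample(1_000_000, 500, seed=2024)
    print(len(chosen), "members selected; first five IDs:", chosen[:5])
```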

These samples are representative because, well, probability tells us that they are. I could spend pages and pages of your precious, valuable time discussing why this is the case, but that discussion would undoubtedly put you to sleep. However, this is why not every person in a population needs to be surveyed. And it is a great cost-saving measure when you only have to sample, say, 500 people to represent a much larger population. Here I could bore you again with monotonic relationships and exponential sampling benefits, but I will not do that. (You can thank me later.)
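
For anyone who prefers a demonstration to a lecture, here is a small simulation in Python. Everything in it is invented for illustration: a made-up population of 1,000,000 people of whom 62% are "satisfied", a hypothetical helper called estimate_share, and an arbitrary seed. It asks only 500 randomly chosen members and compares their answer with the truth for the whole population.

```python
import random


def estimate_share(population, sample_size, seed=None):
    """Estimate the share of 1s in population from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # drawn without replacement
    return sum(sample) / sample_size


if __name__ == "__main__":
    # Hypothetical population: 1 = satisfied, 0 = not satisfied.
    # The 62% figure is made up for this example.
    population = [1] * 620_000 + [0] * 380_000
    true_share = sum(population) / len(population)
    estimate = estimate_share(population, 500, seed=7)
    print("true share:", true_share)
    print("estimate from 500 people:", estimate)
```

Run it with different seeds and the estimate moves around a little, but it typically lands within a few percentage points of the true figure.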

Now for the real bang! Say you want to measure satisfaction with city services in a small city of 50,000 people. In order to have a representative sample, all you need is a sample of 382 people (with a 5% margin of error at the standard 95% confidence level). Now, say that you want to do the same study, only on the entire city of Louisville, with a population of nearly 1.5 million. What size sample do you think you need? Are you ready for this? The number is 385! Wow. Only three more randomly selected residents are needed for a population 30 times greater. The beauty of sampling, and the wonders of monotonic relationships! More on that later. You can play around with all sorts of sample size calculators (or do it by longhand, if you dare). I suggest this site.
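
If you do dare to do it by hand, here is one way the arithmetic can go, sketched in Python. I am assuming the textbook formula for estimating a proportion with a finite population correction, a 95% confidence level (z = 1.96), and the most conservative guess about the population (p = 0.5). Online calculators may differ in their details, but these assumptions reproduce the 382 and 385 quoted above. The function name required_sample_size is my own.

```python
import math


def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    Assumptions (not stated in the post): z = 1.96 for 95% confidence and
    p = 0.5, the value that gives the largest, most cautious sample size.
    """
    # Size needed if the population were effectively infinite
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it slightly to account for the finite population
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)  # always round up to a whole person


if __name__ == "__main__":
    print(required_sample_size(50_000))     # 382
    print(required_sample_size(1_500_000))  # 385
```

Notice that the population only appears in the correction step, which is why a 30-fold jump in population barely moves the answer.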

Of course, if you want a smaller margin of error (in essence, if you want to be more confident that your sample truly reflects your population), you need a larger sample. But I’ll post a discussion on margins of error and confidence levels another day. I leave you now to ponder the brilliance of statistics!!