The New York Times recently ran a story on how a group of sociologists from Cornell University used Twitter to track people’s moods throughout the day, week, and month.  Using tweets from millions of Twitter accounts, they discovered that the emotional tone of these tweets follows a pattern over time.  The researchers claim that this is evidence of a broader biological rhythm of human emotion that holds regardless of culture.  Of course, these findings are not without criticism, as the study relies on a rather subjective means of analyzing emotional tone.  More broadly, these criticisms apply to social media as a platform for research in general rather than to the Cornell study specifically.  Without fully understanding the limits of that platform, researchers can become susceptible to selection bias in their samples.

But before going further into selection bias, let’s talk a bit about why social media is such a goldmine for data.  Facebook and Twitter have hundreds of millions of users, and even against the entirety of the world’s population (just under 7 billion), this is a sizable share.  Marketers have already begun using information posted on personal accounts to direct expansive and valuable product campaigns, and researchers are only now beginning to understand the potential these sources have for their own work.

Let’s take Facebook as an example.  There are around 800 million active users on Facebook worldwide.  And Facebook users are more than forthcoming when it comes to posting personal information, such as favorite music, foods, and movies, as well as the places they plan to visit.  On top of this, they tend to throw in the kitchen sink in their status updates.  Think of the vast amount of data that can be extracted from what people post online.

Even though social sites such as Facebook and Twitter have been around for years now, the technology to analyze these sources of data is only now beginning to catch up.  Text analytics software now has the capacity to handle such large amounts of data and meaningfully ascribe value to what is being said in qualitative responses, such as Facebook and Twitter posts.  We will cover the capabilities of text analytics in more detail in our next post.

But how does this relate to selection bias?  Selection bias occurs when a sample is drawn from a population without regard to who that sample may be excluding.  If you are familiar with statistical sampling, then you know that samples must be taken randomly from the population in order for that sample to be statistically representative of the population.  But social media introduces a whole other quandary of selection bias: even if the Cornell researchers, as an example, randomly selected their sample of Twitter users, selection bias would still occur.  The problem does not lie in the fact that they used Twitter as their population source.  The problem is that their findings neglect the fact that there are entire segments of the world population that do not use or have no access to social media.

The researchers took information that they gathered from Twitter, a social media platform, and stated conclusions about all human beings.  How is this a problem?  It’s a problem because of who uses social media: typically younger, more affluent individuals.  So how can we know for certain that all humans undergo these uniform emotional swings, including third-world citizens who don’t have access to social media or the elderly who haven’t overcome the technological divide?  The simple answer is that we can’t.  And that is why selection bias is a problem here.
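
To make the point concrete, here is a minimal simulation sketch in Python.  Everything in it is a made-up assumption for illustration, not data from the Cornell study: the function name `simulate`, the 30% share of social media users, and the average "mood scores" of 0.6 for users and 0.4 for non-users are all hypothetical.  It compares a random sample drawn only from users against a random sample drawn from everyone.

```python
import random
from statistics import mean


def simulate(seed=0, population_size=100_000, user_share=0.3,
             user_mean=0.6, nonuser_mean=0.4, sample_size=5_000):
    """Compare a users-only random sample with a whole-population random sample.

    All parameters are hypothetical values chosen for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # Build a toy population: each person is (uses_social_media, mood_score).
    population = []
    for _ in range(population_size):
        is_user = rng.random() < user_share
        center = user_mean if is_user else nonuser_mean
        population.append((is_user, rng.gauss(center, 0.1)))

    everyone = [score for _, score in population]
    users = [score for is_user, score in population if is_user]

    # Both samples are perfectly random; they differ only in the group
    # they are drawn from.
    users_sample = rng.sample(users, sample_size)
    full_sample = rng.sample(everyone, sample_size)

    return {
        "true_population_mean": mean(everyone),
        "users_only_sample_mean": mean(users_sample),
        "whole_population_sample_mean": mean(full_sample),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed numbers, the true population average is about 0.46 by construction (0.3 × 0.6 + 0.7 × 0.4).  The whole-population sample lands near that value, while the users-only sample lands near 0.6, no matter how carefully it was randomized.  Random selection within the wrong group does not remove the bias.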

Regardless, if researchers take the time to understand the population they are analyzing and account for its limits, then the sky is the limit for the new knowledge that is only now beginning to be extracted from this rapidly growing means of discourse: the social media sphere.