Social media as a research tool – The danger of selection bias

Date: October 6, 2011 | Shawn Herbig | News

The New York Times recently ran a story on how a group of sociologists from Cornell University used Twitter to track people’s moods throughout the day, week, and month.  Using tweets from millions of Twitter accounts, they discovered that the emotional tone of these tweets follows a pattern over time.  The researchers claim that this is evidence of a broader biological rhythm of human emotion that is independent of culture.  Of course, these findings are not without criticism, since the study relies on a rather subjective means of analyzing emotional tone.  More broadly, though, these criticisms apply to social media as a research platform in general, rather than specifically to this Cornell study.  Without fully understanding its limits, researchers can become susceptible to selection bias in their samples.

But before going further into selection bias, let’s talk a bit about why social media is such a goldmine for data.  Facebook and Twitter have hundreds of millions of users, and considering the entirety of the world’s population (just under 7 billion), this is a sizable representation.  Marketers have already begun using information posted on personal accounts to direct expansive and valuable product campaigns, and researchers are only now beginning to understand the potential these sources hold for their own work.

Let’s take Facebook as an example.  There are around 800 million active users on Facebook worldwide.  And subscribers to Facebook are more than forthcoming when it comes to posting personal information, such as favorite music, foods, and movies, as well as the places they plan to visit.  On top of this, they tend to throw in the kitchen sink in their status updates.  Think of the vast amount of data that can be extracted from what people post online.

Even though social sites such as Facebook and Twitter have been around for years now, the technology to analyze these sources of data is only now beginning to catch up.  Text analytics software now has the capacity to handle such large amounts of data and meaningfully ascribe value to what is being said in qualitative responses, such as Facebook and Twitter posts.  More on the capabilities of text analytics will follow in our next post.

But how does this relate to selection bias?  Selection bias occurs when a sample is drawn from a population without regard to who that sample may be excluding.  If you are familiar with statistical sampling, then you know that samples must be taken randomly from the population in order for the sample to be statistically representative of that population.  But social media introduces a whole other quandary of selection bias, because even if the Cornell researchers, as an example, randomly selected their sample of Twitter users, selection bias still occurs.  The problem does not lie in the fact that they used Twitter as their population source – the problem is that their findings neglect the fact that entire segments of the world population do not use, or have no access to, social media.

The researchers took information gathered from Twitter, a social media platform, and stated conclusions about all human beings.  How is this a problem?  It’s a problem because of who uses social media – typically younger, more affluent individuals.  How, then, can we know for certain that all humans undergo these uniform emotional swings – including third-world citizens who don’t have access to social media, or the elderly who haven’t crossed the technological divide?  The simple answer is that we can’t.  And that is why selection bias is a problem here.
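
To see how this plays out, here is a minimal sketch in Python with entirely invented numbers: suppose social media users are a minority of the population and, on average, report slightly happier moods.  Even a perfectly random sample of Twitter users then overestimates the mood of the population as a whole.

```python
# Illustrative only: the 30% usage rate and mood scores are assumptions,
# not real data. The point is the gap between the two means.
import random

random.seed(42)

# Hypothetical population: each person has a mood score and a flag
# for whether they use Twitter.
population = []
for _ in range(100_000):
    on_twitter = random.random() < 0.3           # assume 30% use Twitter
    # Assume (for illustration) Twitter users skew slightly happier.
    mood = random.gauss(0.6 if on_twitter else 0.4, 0.1)
    population.append((mood, on_twitter))

true_mean = sum(m for m, _ in population) / len(population)

# A perfectly random sample drawn from Twitter users only:
tweeters = [m for m, t in population if t]
sample = random.sample(tweeters, 1_000)
sample_mean = sum(sample) / len(sample)

print(f"population mean mood: {true_mean:.3f}")
print(f"Twitter-sample mean:  {sample_mean:.3f}")
```

The sampling step itself is flawless – it is random – yet the estimate is off, because the frame being sampled from (Twitter users) is not the population the conclusion is about (all people).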

But regardless, if researchers take the time to understand the population they are analyzing and account for its limits, then the sky is the limit for the new knowledge only now beginning to be extracted from this rapidly growing means of discourse – the social media sphere.


Straw Polls – Should we listen to them?

Date: September 22, 2011 | Shawn Herbig | News

As a researcher, I absolutely love election season.  While I could say that the reason for this is that I am simply living up to my obligations as a citizen (partly true), the real reason I enjoy it so is because of all the polls that are released.  And because so many polls are released, it can become difficult to decipher which ones are good and which ones are political nonsense.  That is what makes it interesting for a researcher!

There has been a lot of talk in the recent Republican primary race about straw polls.  And each of these polls seems to declare a different victor.  Mitt Romney won the New Hampshire poll; Rep. Ron Paul won both the Washington, D.C. and California polls; Herman Cain won the Arizona poll; and Michele Bachmann was victorious in the Iowa poll.  So many polls, so many different winners.  This raises the question: what exactly are straw polls, and should we as potential voters listen to them?

Let’s begin with the first question – what is a straw poll?  There are two broad categories of polling: scientific and unscientific.  Scientific polling uses random sampling controls so that the results from the sample drawn are statistically representative of the population.  Previous posts have discussed this in greater detail.  Unscientific polling, on the other hand, has no systematic sampling controls in place that would allow for representation of a population.  Historically, many straw polls in the United States have been political in nature and are usually fielded during election season by a particular political party.  The very name “straw poll” hints at their character – the idiom is thought to come from holding a piece of straw in the air to determine which direction the wind is blowing.
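
Here is a quick sketch, with made-up numbers, of what makes scientific sampling work: a modest random sample, drawn without systematic exclusions, lands close to the true population value, within a predictable margin of error.

```python
# Illustrative only: the population size and 52% support figure are
# invented to show how a random sample tracks the true value.
import random
import math

random.seed(7)

# Hypothetical population of 1,000,000 voters, 52% favoring candidate A.
population = [1] * 520_000 + [0] * 480_000

n = 1_000
sample = random.sample(population, n)
estimate = sum(sample) / n

# Standard 95% margin of error for a proportion at this sample size:
margin = 1.96 * math.sqrt(estimate * (1 - estimate) / n)

print(f"true support:    52.0%")
print(f"sample estimate: {estimate:.1%} \u00b1 {margin:.1%}")
```

A thousand random respondents pin down a million-voter population to within about three percentage points – that predictability is exactly what an unscientific poll gives up.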

Most straw polls are very targeted, very narrow surveys of opinion.  Their main purpose is to take a “snapshot” of general opinion at a particular point in time.  This seems valid enough, but the difference between scientific polls and straw polls lies in the methodology.  Most straw polls use a somewhat unorthodox form of convenience sampling, and the selection bias associated with it can be extreme.

It is hard to assign a broad methodology to all straw polls (as each one is different in its own right), but many of them, such as the Ames Straw Poll in Iowa, have candidates attract voters to cast a vote for whoever they believe should be the Republican nominee.  If that sounds like political grandstanding, it’s because, to some degree, it is.  These polls run on somewhat of an “honor system” whereby anyone can vote (within the parameters), which opens up a whole argument regarding their validity.

This brings us to our second question – should we pay any heed to the results of these polls?  I listed many of the recent straw polls and their victors above.  There have been many polls, and there have been many different winners.  To answer this question, we only need to look at the candidates themselves.  And they certainly place weight on these polls.  Tim Pawlenty dropped out of the Republican primary because of the lack of support the Iowa poll showed for his campaign.  Entire strategies are formulated based on the results of straw polls, because these polls expose the weaknesses of particular candidates.  For this reason, candidates are perhaps wise to pay close attention to what the polls are telling them.

However, are they good predictors of ultimate outcomes?  In answering this question, we are reminded of the 1936 presidential election.  The Literary Digest conducted its own straw poll, which showed Franklin Delano Roosevelt being defeated by a large majority.  We all know this was not the case, and the reason for this catastrophic miscalculation (it led to the downfall of the Digest) lay in the methodology of the poll, which is the main criticism of any straw poll.  The Digest administered the poll through its mailing list, which was built from motor vehicle registries and telephone books.  The problem?  It was the Great Depression – many Americans were too poor to own a car or telephone, so a large sector of the population was neglected in this poll (selection bias at its finest), the very sector that was more likely to vote for FDR and his economic reforms.
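
The Digest’s mistake can be sketched in a few lines of Python, using invented numbers for illustration: if car and telephone owners lean away from FDR and the mailing list only reaches them, even an enormous sample calls the election backwards.

```python
# Illustrative only: the ownership rate and support figures are
# assumptions, not historical data. The mechanism is the point.
import random

random.seed(1936)

population = []
for _ in range(200_000):
    owns_car_or_phone = random.random() < 0.35   # assume a minority owned either
    # Assumed (illustrative) FDR support rates by ownership:
    p_fdr = 0.40 if owns_car_or_phone else 0.75
    votes_fdr = random.random() < p_fdr
    population.append((votes_fdr, owns_car_or_phone))

true_fdr = sum(v for v, _ in population) / len(population)

# The "mailing list": only people with a car or telephone.
reachable = [v for v, owns in population if owns]
poll = random.sample(reachable, 10_000)          # a huge sample, still biased
poll_fdr = sum(poll) / len(poll)

print(f"actual FDR support:  {true_fdr:.1%}")
print(f"straw-poll estimate: {poll_fdr:.1%}")
```

Note that the sample is ten thousand strong – sheer size does nothing to fix a frame that excludes the wrong people.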

The point of this post is this: take what you hear from these straw polls with a grain of salt.  They do little to predict outcomes, but they can be very valuable to the candidates themselves in adjusting and fine-tuning their campaigns.  Although a vast difference exists between most straw polls and scientific research, it can be surprisingly easy to confuse the reliability of the two.  Knowing how to digest the results of research, both good and bad, will help you avoid unsettling surprises.
