Using Text Analytics as a Research Tool

Date: February 14, 2012 | Shawn Herbig | News

Imagine you are handed every letter to the editor of The New York Times from 1984 through 2011, and you are asked to find every reference to the word “immigration,” take a count, and determine the underlying sentiment or pattern of thought regarding the term.

And you have 12 minutes to do it.

Yeah, right.

But just as it has revolutionized every other industry, technology has provided automated solutions and tools that can do just that, and in more like 12 seconds, with text analytics.

In essence, the tool takes large bodies of text and finds not just counts of terms; through creative key-term combinations and simultaneous searches, the sentiment behind those terms can be measured and evaluated as well.
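The counting side of this can be sketched with nothing more than Python's standard library. The corpus below is an invented toy stand-in (these are not real letters to the editor), but the pattern is the same at any scale:

```python
from collections import Counter
import re

# Toy stand-in for a corpus of letters to the editor, keyed by year.
letters = {
    1984: ["Immigration policy needs reform.", "The economy worries me."],
    1990: ["I support immigration.", "Immigration debates continue."],
    2011: ["Immigration remains divisive."],
}

def term_counts_by_year(corpus, term):
    """Count case-insensitive occurrences of a term in each year's letters."""
    counts = Counter()
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for year, texts in corpus.items():
        counts[year] = sum(len(pattern.findall(text)) for text in texts)
    return dict(counts)

print(term_counts_by_year(letters, "immigration"))
```

Running this over 27 years of real letters instead of three toy ones is exactly the "12 seconds instead of 12 minutes" point: the loop doesn't care how large the corpus is.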

This works especially well for all the social network activity out there right now, which is only going to grow over time, because the thoughts, feelings, and observations posted on the Internet are all in the form of searchable text. New data is generated every day, in enormous amounts, and it holds rich, measurable information that can help companies keep a pulse on their reputation and on the positive or negative sentiment out there regarding their products and services.

Text analytics can also be used internally, alongside surveys and open-ended responses, to measure employee satisfaction, management effectiveness, and reactions to change, and to discover anything that needs to be discovered. The possibilities are truly endless.

We’ve done a lot of text analysis for our clients (and sometimes for a little fun), and it’s always interesting to see not only how attitudes change around a certain topic or theme, but how language itself changes. We know grammar geeks and linguists who love using text analysis to see what has happened to the language in the last 10, 20, or even 100 years.

A New Development in Text Analytics

An interesting new development in this already relatively new field is the anonymous versus non-anonymous reaction: what people say, and how they say it, when they believe their comments are anonymous, as on many newspaper websites.

Many news and blog sites now require people to log in as members, or to use their Facebook profiles, before commenting on articles and posts. The interesting aspect here is how the tone, the positivity or negativity, the aggressiveness, and the civility of comments change when individuals know their identity will be visible. This is another layer of text analytics that can open up whole worlds of research data…and will.


Text Analytics – Beginning to really hear what has been said

Date: November 4, 2011 | Shawn Herbig | News

Recently, I posted on the benefits and pitfalls of using social media as a tool for research.  While it is certainly attractive to use such avenues because of the seemingly limitless sample pool (800 million Facebook users alone), researchers must be cognizant that any sample drawn from social media users will skew toward younger, Western, and more affluent cases.  I argued that the Cornell researchers measuring patterns of emotion across humanity were not cognizant of this, or at least overlooked it.

In this particular post, however, I would like to speak more on how new technologies are allowing social media research to become more meaningful and available.  Let’s take Facebook as a convenient example.  If you are a marketer, for instance, and would like to understand basic demographics such as the age and gender of a particular group, there is no real challenge there.  Many social media sites offer such insights.  However, what if you wanted to better understand what is being said about a particular product?  This may present much more of a challenge, especially if your product is being distributed internationally or even inter-regionally in the U.S.

Enter text analytics.  Only recently has technology caught up with the fast-paced social media world to better analyze and understand what is being said.  The crux of Facebook and Twitter, for instance, is feeds and tweets – what people are saying about their lives, experiences, and the like.  This is what makes being involved in such discourse so appealing: the status update serves as our personal soapbox.  We can easily, without fear of stage fright or retaliation, say what we feel – about politics, about the things we buy, and about our daily moods.  Think how valuable this is from a product assessment point of view.  And now we have the analytical means to assess it.

The capabilities of text analytics vary greatly and, as with anything, are multifarious in function.  The true value, however, lies in its ability to quickly quantify qualitative data.  This is how the researchers at Cornell University analyzed emotions.  Such software enables the researcher to assign a sentiment value to a group of comments (such as comments about mood and emotional tone, or about a product).

As technology becomes smarter and more capable, so does a researcher’s reach.  Of course, before text analytics came on the scene we could certainly analyze and quantify qualitative feedback, but it took longer and left more room for individual error.  And in the case of social media, where millions of data points are waiting to be mined, manual analysis was simply not feasible.

Let’s take a look at some of the benefits that have been gained as a result of text analytics software:

1. All voices can be heard.  Traditional methods of analyzing qualitative data relied on researchers manually reading through comments and subjectively picking up on emerging themes.  While good researchers could perform this task exceedingly well, it was time-consuming and left important ideas open to being missed.  Analytical software uses a series of algorithms that pick up on consistent themes, thereby reducing this potential error.  Thus, the voice of your customer, or whomever you are trying to gauge, will not go unheard.

2. Textual data sources are good for picking up nuances.  Quantitative measures cannot capture the specific reasons why someone is unhappy or why they are satisfied.  That is why in-depth research employs both quantitative and qualitative measures.  Comments allow you to pick up on key nuances that may be driving particular opinions.
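The theme-detection step described above can be approximated, in its simplest form, by plain word frequencies once common filler words are removed. This is a deliberately minimal sketch, not any vendor's actual algorithm; the comments and stopword list are invented:

```python
from collections import Counter
import re

# Invented customer comments for illustration.
comments = [
    "The price is too high for what you get",
    "Great service but the price keeps going up",
    "Friendly service and fast delivery",
    "Delivery was late and the price is high",
]

# A tiny hand-made stopword list; real tools ship much larger ones.
STOPWORDS = {"the", "is", "too", "for", "what", "you", "get", "but", "great",
             "and", "was", "keeps", "going", "up", "high", "fast", "late",
             "friendly"}

def top_themes(texts, n=3):
    """Return the n most frequent non-stopword terms across all comments."""
    words = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                words[word] += 1
    return [term for term, _ in words.most_common(n)]

print(top_themes(comments))  # "price" surfaces as the dominant theme
```

Production systems layer stemming, phrase detection, and statistical models on top of this, but the core idea, letting recurring terms surface themselves instead of a reader hunting for them, is the same.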

Here at IQS Research, our text analytics also employs something known as sentiment, which allows for a better understanding not only of what is being said, but also of whether what is being said is positive or negative in nature.  If a common theme is “cost,” this enables us to understand whether what is being said about cost is good or bad.
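To make the cost example concrete, here is a toy lexicon-based sketch of sentiment around a single theme. The word lists, comments, and scoring rule are invented for illustration; they are not IQS Research's actual sentiment engine:

```python
import re

# Tiny invented sentiment lexicons.
POSITIVE = {"reasonable", "worth", "fair"}
NEGATIVE = {"expensive", "high", "overpriced"}

comments = [
    "The cost is reasonable and worth it",
    "Way too expensive, the cost is high",
    "Love the colors on the new model",
    "Cost seems fair to me",
]

def theme_sentiment(texts, theme):
    """For comments mentioning the theme, score +1 per positive word
    and -1 per negative word; skip comments off-theme."""
    results = []
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        if theme in words:
            score = (sum(w in POSITIVE for w in words)
                     - sum(w in NEGATIVE for w in words))
            results.append((text, score))
    return results

for comment, score in theme_sentiment(comments, "cost"):
    print(score, comment)
```

Note that the third comment never mentions cost, so it contributes nothing to the theme's score; that filtering is what separates "what is being said about cost" from overall sentiment.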

Keeping these capabilities in mind helps us better understand the true value text analytics provides when analyzing millions and millions of comments across social media.  As our forums for thought open to new audiences and expand to all areas of the globe, our capability to measure those audiences must expand as well.  And that is why researchers and consumers of research alike are now beginning to understand the full value of nontraditional methods of analyzing data.  How else could we measure something like human emotion on a platform as open and wide as Twitter?


Social media as a research tool – The danger of selection bias

Date: October 6, 2011 | Shawn Herbig | News

The New York Times recently ran a story on how a group of sociologists from Cornell University used Twitter to track people’s moods throughout the day, week, and month.  Using tweets from millions of Twitter accounts, they discovered that the emotional tones of these tweets follow a pattern over time.  The researchers claim this is evidence of a broader biological rhythm of human emotion that holds irrespective of culture.  Of course, these findings are not without criticism, as they rely on a rather subjective means of analyzing emotional tone.  On a broader note, the criticisms relate more to social media as a platform for research in general than to this Cornell study specifically.  Without fully understanding the limits, researchers can become susceptible to selection bias in their samples.

But before going further into selection bias, let’s talk a bit about why social media is such a goldmine for data.  Facebook and Twitter have hundreds of millions of users; considering the entirety of the world’s population (just under 7 billion), this is a sizable representation.  Marketers have already begun using information posted on personal accounts to direct expansive and valuable product campaigns, and researchers are only now beginning to understand the potential these sources hold for their own work.

Let’s take Facebook as an example.  There are around 800 million active users on Facebook worldwide.  And subscribers to Facebook are more than forthcoming when it comes to posting personal information, such as favorite music, foods, and movies, as well as the places they plan to visit.  On top of this, they tend to throw in the kitchen sink in their status updates.  Think of the vast amount of data that can be extracted from what people post online.

Even though social sites such as Facebook and Twitter have been around for years now, the technology to analyze these sources of data is only now beginning to catch up.  Text analytics software now has the capacity to handle such large amounts of data and meaningfully ascribe value to qualitative responses such as Facebook and Twitter posts.  More on the capabilities of text analytics will follow in our next post.

But how does this relate to selection bias?  Selection bias occurs when a sample is drawn from a population without regard to who that sample may be excluding.  If you are familiar with statistical sampling, then you know that samples must be taken randomly from the population in order for the sample to be statistically representative of that population.  But social media introduces a whole other quandary of selection bias: even if the Cornell researchers, as an example, randomly selected their sample of Twitter users, selection bias still occurs.  The problem does not lie in the fact that they used Twitter as their population source; the problem is that their findings neglect the fact that entire segments of the world population do not use, or have no access to, social media.
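This point, that a perfectly random draw from the wrong frame still inherits the frame's skew, can be shown with a small simulation. All the numbers here are invented for illustration: a fake "mood" score that declines with age, and social media membership that skews young:

```python
import random

random.seed(42)

# Invented population of 100,000 people: mood declines with age, and
# social media membership (age < 40) is a crude stand-in for the access divide.
population = []
for _ in range(100_000):
    age = random.randint(15, 90)
    mood = 70 - 0.3 * age + random.gauss(0, 5)
    population.append((mood, age < 40))

def mean(values):
    return sum(values) / len(values)

all_moods = [mood for mood, _ in population]            # everyone
frame = [mood for mood, on_sm in population if on_sm]   # social media users only

# A perfectly random sample... drawn only from the frame.
sample = random.sample(frame, 1000)

print(f"population mean mood: {mean(all_moods):.1f}")
print(f"sample mean mood:     {mean(sample):.1f}")  # skews noticeably higher
```

The sampling step itself is flawless; the bias enters one line earlier, when the frame silently drops everyone over 40. That is exactly the gap between "random sample of Twitter users" and "random sample of humanity."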

The researchers took information gathered from Twitter, a social media platform, and stated conclusions about all human beings.  How is this a problem?  It’s a problem because of who uses social media: typically younger, more affluent individuals.  How, then, can we know for certain that all humans undergo these uniform emotional swings, including third-world citizens who don’t have access to social media or the elderly who haven’t crossed the technological divide?  The simple answer is that we can’t.  And that is why selection bias is a problem here.

But regardless, if researchers take the time to understand the population they analyze and account for its limits, then the sky is the limit for the new knowledge only now beginning to be extracted from this rapidly growing means of discourse: the social media sphere.
