Introducing the United States Social Sentiment Index
Here in Boston, one of the first things we do in the morning is take a look at the weather map. Temperature, precipitation, and the like can vary a lot from day to day, if not hour to hour! The map helps us get a sense of what things are going to be like when we walk outside, as well as how things are looking for our friends and family around the country.
What if it were possible to get up in the morning and see an emotional weather map of the country as well? We all know that, depending on the time of day, day of the week, and even time of year, our collective mood (on average) can vary a lot. Not only that, but things like weather, the results of sporting events, and local happenings can also make the mood of certain geographic areas vary a lot. A single shot or field goal attempt, or a random tree falling in the street and causing a major traffic backup, can affect the collective mood of a community.
In 2009, some colleagues of mine over at Northeastern University noticed that, with the help of something called Natural Language Processing, or NLP, data from the social media site Twitter could provide a way to map out the emotional weather, or sentiment, of geographic regions and even non-geographic “communities” (more on that in a bit). By looking at billions and billions of Tweets, messages that contain text as well as information on time and location, it was possible to map out mood patterns over time across locations in the USA and around the globe. We got some nice press and decided to form a company around this idea.
Since then, we have taken our time to delve deeply into the Twitter data, which has grown exponentially as Twitter has become a global social media platform. Collecting, storing, and retrieving this data in order to perfect our sentiment measurement system was a big challenge, as was the unstructured nature of the data. Unstructured means that, unlike surveys and other ways of assessing people’s mood, our tool does not ask questions but instead gleans emotional content from a large sample of the messages out there on Twitter. That makes it much harder to extract a signal from a typical message, since not many people Tweet “my sentiment is 20% lower this hour #sentiment95.3”! So we used techniques from a research area often referred to as Deep Learning to build a tool that measures sentiment as accurately as possible from each and every message. To learn more about how we did that, click here (link to Bjarke paper).
In addition to making sure our tool was sensitive to sentiment changes and specific, in that it was consistent over time, we also looked into how our sentiment measure and other tools could be linked to, and even predict, real-world phenomena, making our tool not simply “cool” but useful. To help with that, we sought out partners who could help us figure out how to put our tool to use outside of academia. We were fortunate to build a partnership with IHS (now IHS-Markit) and Dow Jones, Inc., publishers of the Wall Street Journal.
All our work over the past few years has led up to today, when we are proud to announce the launch of the Wall Street Journal-IHS United States Social Sentiment Index. This index provides daily sentiment for the United States from over 5 million messages that are localized by time, place (state), and even gender! The data is updated hourly, so it’s possible to see in real time how the mood of the country is changing. Our historical library of over 5 years of data allows for comparison of sentiment today with sentiment on the same date in 2012. Furthermore, we have done research to show how this data can be used to forecast economic measures of interest, including consumer confidence and volatility in the stock market.
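For readers who like to see the arithmetic, here is a minimal sketch of how message-level sentiment scores could be rolled up into an index by day and state, and how a day could be compared with the same date in an earlier year. It is an illustration only, not the pipeline behind the index: the sample messages, the score scale, and the function names sentiment_index and year_over_year are all made up for this example, and a plain average stands in for whatever weighting a real index would use.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical scored messages: (timestamp, state, gender, score).
# The score scale (-1 = very negative, +1 = very positive) is an assumption.
messages = [
    (datetime(2017, 3, 1, 8, 15), "MA", "F", 0.5),
    (datetime(2017, 3, 1, 8, 40), "MA", "M", -0.25),
    (datetime(2017, 3, 1, 9, 5), "CA", "F", 0.25),
    (datetime(2012, 3, 1, 8, 30), "MA", "M", 0.125),
    (datetime(2012, 3, 1, 10, 0), "MA", "F", 0.375),
]


def sentiment_index(messages, key):
    """Average the scores of all messages that share the same bucket key."""
    buckets = defaultdict(list)
    for ts, state, gender, score in messages:
        buckets[key(ts, state, gender)].append(score)
    return {bucket: mean(scores) for bucket, scores in buckets.items()}


def year_over_year(index, day, state, base_year):
    """Difference between a day's index and the same date in base_year.

    Assumes both dates are present in the index and that `day` is not
    February 29 (date.replace would fail for a non-leap base year).
    """
    return index[(day, state)] - index[(day.replace(year=base_year), state)]


# Daily index per state, and an hourly index for the whole country.
daily_by_state = sentiment_index(messages, lambda ts, state, gender: (ts.date(), state))
hourly_national = sentiment_index(
    messages, lambda ts, state, gender: ts.replace(minute=0, second=0, microsecond=0)
)

today = datetime(2017, 3, 1).date()
change = year_over_year(daily_by_state, today, "MA", 2012)
print(f"MA sentiment vs. same date in 2012: {change:+.3f}")
```

Swapping the key function is all it takes to slice the same messages by hour, by state, or by gender, which is the idea behind the breakdowns described above.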
Over the next weeks and months we will be posting a number of cool findings from our data, as well as giving some insight into how our technology works. Follow us on Facebook and Twitter to hear about these releases and other news.
Where do we go from here? Community tracking is one area. I mentioned before the idea of following geography-independent “communities”. We know that not all Cubs fans live in Chicago and not all stock traders live in New York, though we might love to know the mood of these groups when the Cubs win the World Series or the stock market takes a dive. Using tools that we helped invent in the area known as social network analysis, we can find those communities and follow them just as we might follow people from Connecticut or California. We are hard at work identifying those specific communities, which will enrich our data with an ever-increasing number of indices.
One other thing we are eager to do is partner with researchers and others who think they might have a use for data such as ours. We all come from academia and believe that there are many applications of our data that we have not even considered yet. We are eager to share our historical data with people whenever possible and to build a global community of users who can help put our tool to good use.
Looking forward to hearing from you!