Friday, July 8, 2011
Building Adly's Twitter Analytics
If you are a regular reader of my blog, you will know that since joining Adly, I've been a little busy... Together, we have designed and built a real-time search platform, a content and location targeted ad network, and this morning we announced the public release of our Twitter Audience Analytics Platform, Adly Analytics.
This post will discuss some of the tools and methods that we used to pull and process the data to turn it into what it is today. As discussed in our press release and on TechCrunch, we have 8 tabs (12 views) worth of information that represent some of the ways that Adly influencers can view their audience data: Top Mentions, Also Follows, Top Followers, Follower Growth, Gender, City, State, and Country. In each section below, I will describe some of the techniques we use to gather and process this information.
Before I dig into each of the specific tabs, I wanted to give a brief overview of the technology we use to make everything happen. There will be more detail to come on this front, and I am in the process of writing some technical whitepapers, but for now here is the big picture.
The main technical tool at our disposal is our custom Twitter Spider. Much as Googlebot crawls much of the web, our spider crawls different parts of Twitter. On the more technical side of things, our spider communicates with Twitter's servers using their API.
Each different kind of data that we fetch requires a different type of spider, and each type of data is stored in one or more different formats. The underlying technology is actually very straightforward; we use ActiveMQ as a coarse-grained task queue (one of our senior engineers, Eric Van Dewoestine, who is behind Eclim, wrote our custom Python bindings about a year ago for our Ad Network), Redis as our general high-speed data store, and a custom task processor written in Python to spider, process, and store the data back into Redis.
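To make the queue, processor, and store pipeline above concrete, here is a minimal sketch of the shape of such a task processor. It is not our production code: a standard-library `queue.Queue` stands in for ActiveMQ, a plain dict stands in for Redis, and the task format, the `fetch_user` spider, and the user name are all made up for illustration.

```python
import json
import queue

# Stand-ins (assumptions): a Queue plays the role of the ActiveMQ task queue
# and a dict plays the role of Redis, so the sketch runs without any servers.
task_queue = queue.Queue()
store = {}


def fetch_user(screen_name):
    """Hypothetical spider: a real one would call Twitter's API here."""
    fake_api = {"example_user": {"followers_count": 1234}}  # made-up data
    return fake_api.get(screen_name, {"followers_count": 0})


# One handler per kind of data, mirroring "a different type of spider"
# for each type of data we fetch.
HANDLERS = {"fetch_user": fetch_user}


def process_one():
    """Pop one task, run the matching spider, and store the result."""
    task = json.loads(task_queue.get())
    result = HANDLERS[task["type"]](task["screen_name"])
    store["user:" + task["screen_name"]] = json.dumps(result)
    task_queue.task_done()


task_queue.put(json.dumps({"type": "fetch_user", "screen_name": "example_user"}))
process_one()
print(store["user:example_user"])
```

Keeping the tasks coarse-grained (one message per user or per list to fetch) keeps the queue itself simple; the finer-grained work happens inside each handler.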
Let's look at a few of the tabs (you can see them all on Adly.com/Analytics):
Top Mentions
The first tab, Top Mentions, is intended to help you discover the most influential people who are @mentioning or retweeting you. We pull this information directly from Twitter and filter it down to the most influential people who are already interacting with you.
Follower Growth
The data behind Follower Growth feeds many other parts of our system. Generally, any time we receive user information from Twitter (sometimes we get it as part of the call, as in the case of Top Mentions), we check whether the user is one we have determined to be influential (in the Adly network, in a set of the most influential Twitter users, etc.). If so, we update the current count of their followers and place that user in a list of users whose followers we want to pull. Over time, we fetch the full listing of followers for everyone we have determined to be influential, and combine all of these lists to find new users whose information we do not yet have. Some of these users will themselves turn out to be influential, which helps us further develop our listing of influential people, find more followers, etc.
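The bookkeeping described above can be sketched roughly as follows. This is an illustration under assumptions, not our actual code: the in-memory sets and the deque stand in for data we would keep in Redis, the `INFLUENCE_THRESHOLD` rule is a hypothetical stand-in for however influence is really decided, and every user name and number is made up.

```python
from collections import deque

influential = {"example_star"}  # seed set of influential users (made-up)
known_users = set()             # users whose information we already have
follower_counts = {}            # screen_name -> list of (day, count) samples
to_crawl = deque()              # users whose follower lists we want to pull
queued = set()                  # avoids queueing the same user twice

INFLUENCE_THRESHOLD = 10000     # hypothetical cut-off, for illustration only


def record_user(user, day):
    """Handle one user record received from any API call."""
    name = user["screen_name"]
    known_users.add(name)
    if user["followers_count"] >= INFLUENCE_THRESHOLD:
        influential.add(name)
    if name in influential:
        # Follower Growth is just this series of samples over time.
        follower_counts.setdefault(name, []).append((day, user["followers_count"]))
        if name not in queued:
            queued.add(name)
            to_crawl.append(name)


def new_users(follower_lists):
    """Combine follower lists and return users we have no information for."""
    seen = set().union(*follower_lists.values())
    return seen - known_users


record_user({"screen_name": "example_star", "followers_count": 120}, day=1)
record_user({"screen_name": "example_star", "followers_count": 150}, day=2)
record_user({"screen_name": "fan_1", "followers_count": 3}, day=2)

samples = follower_counts["example_star"]
print(samples[-1][1] - samples[0][1])                  # growth between samples
print(new_users({"example_star": {"fan_1", "fan_2"}}))  # users still to fetch
```

The users returned by `new_users` would be fed back through `record_user` once fetched, which is how the set of influential users keeps growing.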
Also Follows
Once we have the full listing of followers for any two influencers, we can calculate how many followers they share. For example, 24% of @JetBlue's followers also follow @50Cent, but only 8% of @50Cent's followers follow @JetBlue. Then again, @50Cent has over 12 times the number of followers, so his followers do tend to follow @JetBlue far more than is typical. This gives both @JetBlue and @50Cent the opportunity to discover brands and influencers that have something in common with them.
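With both follower lists in hand, the overlap itself is a set intersection. Here is a minimal sketch on toy data; the `lift` function is one plausible reading of "more than is typical" (the share among one account's followers divided by the share among all users), not necessarily the exact measure we use, and the ids and `total_users` figure are made up.

```python
def overlap(followers_a, followers_b):
    """Return (share of A's followers who follow B, share of B's who follow A)."""
    shared = len(followers_a & followers_b)
    return shared / len(followers_a), shared / len(followers_b)


def lift(followers_a, followers_b, total_users):
    """How much more often A's followers follow B than a random user does."""
    share_among_a = len(followers_a & followers_b) / len(followers_a)
    baseline = len(followers_b) / total_users
    return share_among_a / baseline


# Toy follower id sets (made-up): a small account and a larger one.
small = {1, 2, 3, 4}
large = {3, 4, 5, 6, 7, 8, 9, 10}

print(overlap(small, large))  # → (0.5, 0.25)
print(lift(small, large, total_users=100))
```

The two percentages differ simply because the denominators differ, which is why the comparison against a baseline is the more telling number when one account is much larger than the other.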
Top Followers
As with Also Follows, since we have the full listing of followers on hand, and because we also have the public user information for all of those followers, we can easily determine things like the fact that Justin Bieber (@justinbieber) and Ashton Kutcher (@aplusk) are @charliesheen's two biggest followers. This helps you discover the biggest influencers who are interested in what you say, and with whom you may want to start interacting.
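Finding the biggest followers is then a top-N selection over the stored profiles. A rough sketch, assuming profiles are dicts with a `followers_count` field (as in Twitter's public user data) and using made-up users:

```python
import heapq


def top_followers(follower_ids, profiles, n=2):
    """Return the n followers with the most followers of their own."""
    # Skip followers whose public profile we have not fetched yet.
    known = (profiles[f] for f in follower_ids if f in profiles)
    return heapq.nlargest(n, known, key=lambda p: p["followers_count"])


# Toy profile store (made-up users and counts).
profiles = {
    1: {"screen_name": "big_fan", "followers_count": 9000},
    2: {"screen_name": "small_fan", "followers_count": 40},
    3: {"screen_name": "huge_fan", "followers_count": 250000},
}

for profile in top_followers([1, 2, 3, 4], profiles):
    print(profile["screen_name"], profile["followers_count"])
```

`heapq.nlargest` avoids sorting the whole follower list, which matters when an account has millions of followers and only a handful are shown.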
Gender, City, State, and Country
As with Top Followers, we again have the full listing of followers on hand, but here we've pre-processed the user information to determine the gender and location of Twitter users around the world. We use this information to give both absolute figures, such as "62% of Kim Kardashian's followers are women" or "6.8% of Fox News' followers are in Texas", and relative ones, such as "Kim Kardashian has 1.3 times as many women following her as expected" or "Charlie Sheen has 3.2 times as many followers in Ireland as expected".
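Both kinds of figures come from the same counts. The sketch below assumes each follower has already been labeled with a gender or location (or `None` when unknown), and that "expected" means the share among all Twitter users we know about; both are assumptions made for illustration, and the labels are toy data.

```python
from collections import Counter


def breakdown(values):
    """Share of each label, ignoring users whose label is unknown (None)."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def versus_expected(follower_values, baseline_values):
    """Ratio of each label's share among followers to its baseline share."""
    observed = breakdown(follower_values)
    expected = breakdown(baseline_values)
    return {k: observed[k] / expected[k] for k in observed if k in expected}


followers = ["F", "F", "F", "M", None]  # toy labels for one influencer
everyone = ["F", "M", "F", "M"]         # toy labels for all known users

print(breakdown(followers))                  # → {'F': 0.75, 'M': 0.25}
print(versus_expected(followers, everyone))  # → {'F': 1.5, 'M': 0.5}
```

The same two functions work unchanged for city, state, and country, since only the labels differ.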
For instance, @SnoopDogg has one of the most diverse and globally representative followings we've ever seen on Twitter. He has fans all around the world, and in many cities and states one might not expect to find a lot of Twitter users.
Beyond being interesting, this is useful business intelligence for Snoop and his managers. Understanding who his mega fans are, and where they are most concentrated, can be helpful in planning tours, personal appearances, PR and more to really bring his Twitter following to life in a meaningful way.
Anyway, I am headed out for some much-needed R&R, so I will come back to this topic later with more technical discussion and some graphics to illustrate how we created the new service.
If you want to check it out for yourself, leave a comment below with your Twitter @name, and I will be sure to get you in the queue.