Social media analysis is valuable for studying anything from fashion trends to friendships. However, few systems can analyze the estimated 7 billion tweets posted globally every day. TAGHREED is the first system for efficient and scalable querying, analysis, and visualization of billions of tweets. It powers GIS analyses, such as breaking tweets down by user geolocation or tweet language, as well as non-GIS analyses. TAGHREED can also provide real-time updates on ongoing topics and events that are trending on social media.
TAGHREED consists of five main components. The indexer efficiently digests incoming tweets with high arrival rates into lightweight memory-resident indices. The flush manager transfers memory-resident indices to disk. On memory failure, the recovery manager restores the system state from replicated copies. The query engine generates an optimal query plan, which is executed by efficient retrieval techniques that provide query responses on the order of milliseconds despite the large data set. The visualizer allows end users to issue a wide variety of spatio-temporal queries and to explore the data interactively.
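To make the interplay between the indexer, the flush manager, and the query engine concrete, the following is a minimal sketch and not TAGHREED's actual implementation. It assumes a uniform latitude/longitude grid as the index structure, JSON files as on-disk segments, and a simple count threshold as the flush policy. The class name `TweetIndex`, its parameters, and the tweet fields (`id`, `lat`, `lon`, `ts`, `text`) are all hypothetical. Replication and recovery are not modeled.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


class TweetIndex:
    """Toy grid index with a memory-resident part and flushed disk segments.

    Hypothetical illustration only; the grid, the JSON segment format, and
    the count-based flush policy are assumptions, not TAGHREED's design.
    """

    def __init__(self, disk_dir, cell_deg=1.0, max_in_memory=1000):
        self.disk_dir = Path(disk_dir)
        self.cell_deg = cell_deg            # grid cell size in degrees
        self.max_in_memory = max_in_memory  # flush threshold (tweet count)
        self.memory = defaultdict(list)     # (row, col) -> tweets in that cell
        self.in_memory_count = 0
        self.segments = 0                   # number of segments written to disk

    def _cell(self, lat, lon):
        # Floor division maps a coordinate to its grid cell, including negatives.
        return (int(lat // self.cell_deg), int(lon // self.cell_deg))

    def insert(self, tweet):
        """Indexer: digest one incoming tweet into the in-memory index."""
        self.memory[self._cell(tweet["lat"], tweet["lon"])].append(tweet)
        self.in_memory_count += 1
        if self.in_memory_count >= self.max_in_memory:
            self.flush()

    def flush(self):
        """Flush manager: move the in-memory index to a new disk segment."""
        if self.in_memory_count == 0:
            return
        segment = {f"{r},{c}": tweets for (r, c), tweets in self.memory.items()}
        path = self.disk_dir / f"segment_{self.segments}.json"
        path.write_text(json.dumps(segment))
        self.segments += 1
        self.memory.clear()
        self.in_memory_count = 0

    def query(self, min_lat, max_lat, min_lon, max_lon, t_start, t_end):
        """Query engine: spatio-temporal range query over memory and disk."""
        r0, c0 = self._cell(min_lat, min_lon)
        r1, c1 = self._cell(max_lat, max_lon)
        cells = [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

        def matches(t):
            return (min_lat <= t["lat"] <= max_lat
                    and min_lon <= t["lon"] <= max_lon
                    and t_start <= t["ts"] <= t_end)

        results = []
        # Only cells overlapping the query box are inspected, in memory first.
        for cell in cells:
            results.extend(t for t in self.memory.get(cell, []) if matches(t))
        # Then the same cells are looked up in every flushed segment.
        for i in range(self.segments):
            path = self.disk_dir / f"segment_{i}.json"
            segment = json.loads(path.read_text())
            for r, c in cells:
                results.extend(t for t in segment.get(f"{r},{c}", []) if matches(t))
        return sorted(results, key=lambda t: t["ts"])


with tempfile.TemporaryDirectory() as tmp:
    index = TweetIndex(tmp, max_in_memory=2)
    index.insert({"id": 1, "lat": 40.7, "lon": -74.0, "ts": 100, "text": "hello"})
    index.insert({"id": 2, "lat": 40.8, "lon": -73.9, "ts": 200, "text": "world"})  # triggers a flush
    index.insert({"id": 3, "lat": 51.5, "lon": -0.1, "ts": 300, "text": "elsewhere"})
    hits = index.query(40.0, 41.0, -75.0, -73.0, 0, 250)
    print([t["id"] for t in hits])  # prints [1, 2]
```

In this sketch the query consults both the memory-resident index and the flushed segments, so results are unaffected by whether a tweet has already been moved to disk. A production system would presumably keep disk segments indexed rather than re-reading whole files on each query.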