The exponential growth of social media and social network sites, such as Facebook and Twitter, and their potential impact on real-world politics have increasingly attracted the attention of scholars in recent years. Broadly speaking, I think we can identify four main areas of research in this respect. The first area links social media with collective action. For example, scholars have studied how social networks have been used to organize demonstrations and revolts during the “Arab Spring,” to engage individuals in mobilizations, and to build social movements and political parties, such as the Pirate Party in Sweden and Germany or the Italian Movimento 5 Stelle, which use the web to set the party line and select candidates. The second approach investigates the possibility for the web to become an “uncoerced public sphere.” Thus far, several authors have debated the potential of the new media to act as a “Habermasian public sphere.” While some authors have suggested that the Internet and social media are potential sources of direct democracy, which may contribute to increasing the responsiveness and accountability of real-world politics, others have strongly criticized this same idea (see Hindman, 2009). The third (large) stream of research in the literature on social media adopts a more “political supply-side” approach, analyzing how the Internet and the diffusion of social media have affected the content of electoral campaigning and the political communication of candidates and parties. While some of the initial hopes for e-democracy have gone unfulfilled, the Internet still provides new opportunities for electoral campaigning, which enable politicians to engage with the wider public.
The diffusion of social media, however, also makes it possible to delve into the web to explore and track the political and electoral preferences of citizens. It is this latter (fourth) area of study that I want to discuss further here.
More recently, scholars have started to explore social media as a device to assess the popularity of politicians, to track the political alignment of social media users, and to compare citizens’ political preferences expressed online with those reported by polls. Analyzing social media during an electoral campaign can indeed be very interesting for a number of reasons. Besides being cheaper and faster than traditional surveys, social media analysis can monitor an electoral campaign on a daily (or even hourly) basis. Consequently, it becomes possible to nowcast a campaign, that is, to track trends in real time and to capture any sudden changes (so-called “momentum”) in public opinion, for example in the aftermath of a TV debate, faster than is possible through traditional polls. Some scholars, however, go even further, claiming that analyzing social media allows a reliable forecast of the final result. This is quite fascinating, as forecasting an election is one of the few exercises in social science where an independent measure of the outcome that a model is trying to predict is clearly and indisputably available, i.e., the vote share of candidates (and/or parties) at the ballot box.
To reach this aim, however, at least two challenges need to be overcome. Some months ago, while attending a conference, I heard a speaker arguing that Giuseppe Civati had won the primary election of the Italian Democratic Party, at least on Twitter. The speaker justified this statement by asserting that all the people the speaker was following on Twitter were posting messages in favor of Civati. After collecting and analyzing, through Voices from the Blogs, almost 600,000 tweets that discussed the primary election and were posted in the three weeks leading up to the day of the election, I can confidently say that this was not the case. In fact, Civati was the third (and therefore, the last) candidate in terms of declared support on Twitter, clearly behind Matteo Renzi as well as Gianni Cuperlo. This example warns us against the risk of political homophily and selective exposure, which is always present notwithstanding the promise of a virtual world where everyone can freely connect with anyone else (Colleoni et al., 2014). Moreover, drawing a random sample from Internet Big Data is extremely complex, more so than with traditional surveys. There is nothing like a comprehensive phone list of the entire Internet community to which the standard sampling techniques could be applied. In addition, no reliable information about the individual traits of social media users is currently accessible, which makes a stratified sample unfeasible. However, unlike traditional surveys, where we have to rely on a sample precisely because analyzing the universe is unattainable, when we talk about social media the entire universe is, in principle, available, at least the universe of public posts. Reaching such a “universe,” though, poses a technical and/or a financial problem.
Regarding the technical problem, to be able to download all the public tweets, posts, and mentions published on the Internet on a given topic, you need to rely on an efficient crawler; moreover, you need very good programming skills to build such a crawler. Regarding the financial problem, a researcher can purchase such data from a firehose provider on the market; however, this is normally (quite) expensive.
Both problems are clearly far from being irrelevant, but they are only the initial challenges confronting researchers. Imagine if you could collect the “universe.” The difficult part would only just begin: how does one analyze such a large amount of data? How would one extract politically significant meaning from the data?
In this respect, relying on a proper assessment method matters, and this is clearly a statistical/methodological problem. For example, is it enough to count the volume of data related to candidates or parties to try to predict their final electoral result? Let’s go back once again to the example of the Italian primary election, but this time, we will analyze the 2012 center-left election: in November 2012, Matteo Renzi had approximately 73,000 mentions on Twitter (i.e., posts that contained the word “Renzi”), while Pierluigi Bersani reached approximately 26,000 mentions. According to these numbers, Renzi should have beaten Bersani comfortably, with roughly 74% of the two candidates’ combined mentions; however, Bersani won at the polls with a 10-point margin in the first round (and over 20 points in the second round). Of course, this should not be that surprising. Indeed, the number of mentions is indicative only of the notoriety (positive or negative alike) of a politician, not of his or her popularity or (potential) support (at least online). We recently conducted a meta-analysis on 80 social media-based electoral forecasts published over the last few years, covering diverse countries, such as the United States, Italy, France, Spain, Germany, and Singapore (Ceron et al., 2014). Our aim was to ascertain the reasons that could explain the accuracy of an electoral forecast. The results of the analysis show the crucial role played by the method adopted to analyze social media. Supervised and Aggregated Sentiment Analysis (that is, techniques that exploit human coding in their process and focus on the estimation of the aggregated distribution of the opinions, rather than on the individual classification of each single text)1 increases the accuracy of the forecasts by 5%, compared with forecasts based on the volume of data or on naïve techniques of Sentiment Analysis mainly based on ontological dictionaries.
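The arithmetic behind the Renzi/Bersani example can be made explicit with a short sketch. It uses only the approximate figures quoted above and, as a simplifying assumption of mine, treats the contest as a two-candidate race; the function and variable names are purely illustrative.

```python
# Approximate mention counts quoted in the text (November 2012).
mentions = {"Renzi": 73_000, "Bersani": 26_000}


def mention_shares(counts):
    """Return each candidate's share of the total mentions, in percent."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


shares = mention_shares(mentions)

# Margin implied by raw volume (Renzi minus Bersani), in percentage points.
implied_margin = shares["Renzi"] - shares["Bersani"]

# First-round margin as reported in the text: Bersani ahead by about 10 points.
# Assumption: the "10% margin" is read as 10 percentage points.
reported_margin = -10.0

# Gap between the volume-based "forecast" and the reported outcome.
error = abs(implied_margin - reported_margin)

print(f"Renzi share of mentions: {shares['Renzi']:.1f}%")
print(f"Implied margin: {implied_margin:+.1f} points")
print(f"Distance from reported margin: {error:.1f} points")
```

Under these assumptions, a forecast based on mention volume alone misses the reported first-round margin by more than 50 percentage points, which is the sense in which volume captures notoriety rather than support.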
Interestingly, the analysis also reveals that, overall, social media is a better predictor of election outcomes under electoral systems based on proportional representation, under systems in which voters cast their vote for a specific candidate rather than choosing a party list, and when the share of Internet users within a country increases. This last finding brings us to our final consideration.
When social media is analyzed in an attempt to nowcast or forecast politics, a traditional puzzle arises. The socio-economic traits of social media users do not exactly match the actual demographics of the whole population: people on social media are generally younger (although the percentage of elderly people is rapidly increasing), more highly educated, concentrated in urban areas, and more politically active. However, do we need a representative sample when, for example, 22% of voters spontaneously declared their voting behavior on social network sites, as happened during the U.S. Presidential campaign? Perhaps the sheer magnitude of data available on social media, i.e., the “wisdom of crowds,” may compensate for this partly unrepresentative information. For a crowd to be wise, it needs to be diverse, independent, and possess decentralized decision-making procedures. This is something that is usually attained in the world of the Internet.
Moreover, to produce an accurate forecast, we should be more worried about the distribution of political preferences on the web. Previous (albeit quite dated) analyses showed that left-leaning citizens are over-represented, though only marginally. We clearly need more (updated) analyses in this regard.2 Accordingly, one way to improve social media forecasts would be to develop an appropriate set of weights based on the representativeness of certain groups of users or, even better, on the political preferences of social media users, provided this type of information is available (and reliable). Nonetheless, some of the potential bias that arises from social media analysis may be softened in the medium (short?) term with the increase in social network usage.
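To give a concrete idea of what such a set of weights might look like, here is a minimal sketch of a simple post-stratification adjustment. It is only one possible approach, not the method used in the studies cited here, and every number in it is hypothetical: the two groups, their shares, and their levels of support are invented for illustration.

```python
# Hypothetical groups: all figures below are illustrative, not real data.
groups = [
    {"name": "under 35", "sample_share": 0.60,
     "population_share": 0.30, "support": 0.55},
    {"name": "35 and over", "sample_share": 0.40,
     "population_share": 0.70, "support": 0.40},
]


def group_weights(groups):
    """Weight each group by population share divided by sample share."""
    return {g["name"]: g["population_share"] / g["sample_share"]
            for g in groups}


def raw_support(groups):
    """Support as observed online, where over-represented groups dominate."""
    return sum(g["sample_share"] * g["support"] for g in groups)


def weighted_support(groups):
    """Support after reweighting each group to its population share."""
    w = group_weights(groups)
    num = sum(g["sample_share"] * w[g["name"]] * g["support"] for g in groups)
    den = sum(g["sample_share"] * w[g["name"]] for g in groups)
    return num / den


raw = raw_support(groups)
weighted = weighted_support(groups)

print(f"Weights: {group_weights(groups)}")
print(f"Raw online support: {raw:.3f}")
print(f"Reweighted support: {weighted:.3f}")
```

In this invented example the younger group is twice as common online as in the population, so its weight falls below one and the estimated support moves toward the level found in the older group. The adjustment is only as good as the group information behind it, which, as noted above, is often unavailable or unreliable for social media users.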
Finally, although the social media population is so far not always representative of a country’s citizenry, there are still some doubts about whether such bias affects the predictive power of social media analysis. Indeed, the latter (the predictive power of social media analysis) does not necessarily require the former (i.e., representativeness) in order to be effective. This can happen, for example, if we assume that Internet users act like opinion makers who are able to influence (and thus often anticipate) the preferences of a wider audience, including those of the broader media ecosystem. The same applies if social media discussions are able to reproduce the (more general) public opinion of a broad section of the community.
In sum, despite the well-known limits and challenges faced by social media analysis, there are reasons to be optimistic about the potential of sentiment analysis to become (if it is not already) a useful supplement/complement to traditional offline polls.
1 For a discussion of the different methods available to analyze social media texts, see Ceron et al. (2013).
2 For a first step in this regard, see Vaccari et al. (2013).
- Ceron, A., Curini L., and Iacus S.M. (2014), Using social media to forecast electoral results. A meta-analysis, Working Paper, Department of Economics, Management and Quantitative Methods at Università degli Studi di Milano. URL: http://services.bepress.com/unimi/statistics/art62/
- Ceron, A., Curini L., and Iacus S.M. (2013), Social Media e Sentiment Analysis. L’evoluzione dei fenomeni sociali attraverso la Rete, Milan: Springer.
- Colleoni, E., Rozza A., and Arvidsson A. (2014), ‘Echo chamber or public sphere? Predicting political orientation and measuring political homophily in Twitter using big data’, Journal of Communication, 64(2): 317–332.
- Gayo-Avello, D. (2013), ‘A meta-analysis of state-of-the-art electoral prediction from Twitter data’, Social Science Computer Review, 31: 649–679.
- Hindman, M. (2009), The Myth of Digital Democracy, Princeton, NJ: Princeton University Press.
- Vaccari, C., Valeriani A., Barberá P., Bonneau R., Jost J.T., Nagler J., and Tucker J. (2013), ‘Social media and political communication. A survey of Twitter users during the 2013 Italian general election’, Rivista Italiana di Scienza Politica, 43(3): 381–410.