Automated analysis of the US presidential elections using Big Data and network analysis

In all mature democracies, political elections are the arena of an enormous mobilisation of material and symbolic resources, the latter defined as the strategic use of words, images and concepts to persuade or influence public opinion. The American presidential elections are among the most interesting campaigns because of the sheer amount of resources deployed and the influence of the United States on global governance.

This study presents a large-scale analysis of mass media coverage of the 2012 US presidential elections, combining automated corpus linguistics methods and network analysis to obtain a network representation of the campaign's entire news coverage. Mapping the full extent to which an electoral campaign is represented by media, offline and online, poses a formidable challenge for researchers, given the large amount of data and the multitude of sources available in advanced democracies. Moreover, where full coverage has been analysed, the core method used so far has been traditional content analysis (in either its automatic or manual coding variant), which allows researchers to identify the most salient issues in each candidate's campaign.

The traditional lines of investigation of the social sciences, such as the ideological position of candidates, their political communication strategies and the social representations of the election in the media, remain salient, but they require new conceptual and methodological approaches. We propose a Big Data approach based on automated information extraction.

Our study rests on the automated analysis of 130,213 news articles related to the US presidential elections from 719 news outlets, using state-of-the-art Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques to extract information about the key actors and their relations in the media narrative of the US elections. These relations take the form of subject-verb-object (SVO) triplets extracted by a parser, and are organised as a network, or semantic graph, which can then be analysed with mathematical tools. This approach goes well beyond traditional word-association networks, producing not only directed links, but links whose semantic nature is retained and understood. Much of the following analysis would not be possible without these features, which ultimately result from the use of a parser to identify noun phrases and verbs to form SVO triplets. An example of one such triplet is 'Obama criticised Romney'.
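The step from SVO triplets to a semantic graph can be illustrated with a minimal sketch. The triplets and the `build_semantic_graph` helper below are hypothetical illustrations, not the authors' actual pipeline: we assume a parser has already emitted (subject, verb, object) tuples, and show how organising them as a directed graph whose edges keep the verb preserves the semantic nature of each link, unlike a plain word-association network.

```python
from collections import defaultdict

# Hypothetical SVO triplets, as a dependency parser might emit them from
# campaign coverage; the sentences are invented for illustration only.
triplets = [
    ("Obama", "criticised", "Romney"),
    ("Romney", "attacked", "Obama"),
    ("Obama", "defended", "healthcare"),
]

def build_semantic_graph(triplets):
    """Organise SVO triplets as a directed graph.

    Each edge is labelled with its verb, so links are directed semantic
    relations ('Obama criticised Romney'), not mere co-occurrences.
    """
    graph = defaultdict(list)  # subject -> [(verb, object), ...]
    for subj, verb, obj in triplets:
        graph[subj].append((verb, obj))
    return dict(graph)

graph = build_semantic_graph(triplets)
print(graph["Obama"])
# [('criticised', 'Romney'), ('defended', 'healthcare')]
```

In a real pipeline the edge labels would feed the mathematical analysis directly, for instance by grouping verbs into semantic classes (criticism, endorsement) before computing network statistics.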

The article is divided into four sections: first, we discuss the theoretical framework that informed our study, stating the importance of applying a semantic graph approach to texts; second, we outline the novel methodology employed to automate the analysis of news media content; third, we present our findings; finally, we conclude by reflecting on both the theoretical and methodological aspects of studying the network of actors and actions.