Big Data, Algorithms, and the Princely County of Gorizia and Gradisca

An ambitious piece of big-data research brings back to life a forgotten corner of old Europe. By digitising 47,000 pages of old newspapers, researchers at the University of Bristol have investigated political and cultural features of the Princely County of Gorizia and Gradisca, a lost world that was a region of the Austro-Hungarian Empire and is now split between Italy and Slovenia.

The study, which combines big data with history and library sciences, involved three data scientists with different backgrounds: two computer scientists and one historian. The article has appeared in the international journal “Historical Methods”.

Nello Cristianini, Thomas Lansdall-Welfare & Gaetano Dato (2018)
Large-scale content analysis of historical newspapers in the town of Gorizia 1873–1914,
Historical Methods: A Journal of Quantitative and Interdisciplinary History,
DOI: 10.1080/01615440.2018.1443862

This study is an example of the growing field of Digital Humanities, an area of scholarship where the University of Bristol has played a pioneering role, particularly in the massive-scale analysis of media content in the past few years.

The project leader, professor of Artificial Intelligence Nello Cristianini, said:

“The territory of Gorizia is a unique place, the northern-most point of the Mediterranean world, and the place where Germanic, Slavic and Latin civilizations have been meeting for centuries. Its multilingual and multi-ethnic character makes it an ideal place to start developing tools that we can apply to wider European areas. And the historical time before WW1 is an incredibly interesting time to investigate, a time of rapid change, with problems and anxieties that sound very familiar to the modern ear”.

Using microfilms from the Biblioteca Statale Isontina, the researchers digitised 47,000 pages of two Italian-language newspapers covering the period 1873–1914, obtaining 110 million words. These were then combined with comparable Slovenian-language newspapers, already digitised by the Slovenian Digital Library, for a total of 180 million words. A human scholar would have needed 8 years just to read them.

 


The Bristol group will hold a workshop on Digital Humanities in Windsor in June: http://thinkbig.enm.bris.ac.uk/dh-css-workshop

This project was partly funded by the ERC Advanced Grant ThinkBIG, which explores applications and implications of big-data technologies.

 

Media Coverage:

http://ilpiccolo.gelocal.it/tempo-libero/2018/05/04/news/nuove-informazioni-da-180-milioni-di-parole-1.16793125

http://ilpiccolo.gelocal.it/tempo-libero/2018/05/04/news/cosi-a-gorizia-arrivo-il-primo-telefono-1.16793332
