DIGISURVOR Team - University of Manchester
Rachel Gibson (PI):
Professor of Political Science, Cathie Marsh Institute for Social Research [CMIST]
rachel.gibson@manchester.ac.uk
Marta Cantijoch:
Senior Lecturer in Politics, Department of Politics
marta.cantijoch@manchester.ac.uk
Alexandru Cernat:
Associate Professor in Social Statistics, Cathie Marsh Institute for Social Research [CMIST]
alexandru.cernat@manchester.ac.uk
Riza Batista-Navarro:
Senior Lecturer in Text Mining, Department of Computer Science
riza.batista@manchester.ac.uk
Conor Gaughan:
Postdoctoral Research Associate, Cathie Marsh Institute for Social Research [CMIST]
conor.gaughan@manchester.ac.uk
Project Abstract
This project will demonstrate the conceptual and methodological value, and the challenges, of producing anonymised and standardised variables from survey respondents’ digital trace data (DTD). We will do this using existing YouGov datasets: two collected in the US in 2020 and 2024, and a third collected in the UK in 2022. The US datasets link individual survey responses to respondents’ Twitter/X feeds, and the UK dataset links them to respondents’ browsing history. All three datasets were designed to address research questions about the effects of digital media consumption and exposure on citizen attitudes and behaviours. The project will proceed in three main stages. First, we will identify a range of new anonymised variables that can be created from the DTD to address important new substantive questions about the impact of web and social media content on individuals’ political engagement. We will also specify a set of more methodologically oriented variables, extracted from the observational trace data, that can be used to validate the survey responses. Second, having identified the range of ‘ideal’ variables that could be generated, we will select a subset of them to show how they can be operationalised and to discuss the technical challenges faced in doing so, focusing particularly on comparing Twitter/X data to browser data. We will select this subset by rating the variables on two core criteria: utility and scientific value, and ease of computation. In the final stage, we will reflect on the ethical issues raised by linking survey data with digital trace data, and on the key ‘take homes’ that our research has identified for future projects of this type to consider prior to data collection.