Digital Trace Data

Introduction to digital trace data: Quality, ethics, and analysis. Utrecht University

Updated at: 20 August 2025 (Javier Garcia-Bernardo, Laura Boeschoten, and Thijs Carrière)

Group project guidelines

The group project is a central part of the course. It is designed to give you hands-on experience with digital trace data and to let you apply the knowledge you have gained in the course to a real-world problem. In the practical, you will work in groups to collect text data, label it using Natural Language Processing models, discuss the errors and biases that you encounter in the data, and interpret the results in light of those errors.

The slides used in the first lab are available here.

Timeslots for the feedback moments (see dates and location on MyTimetable)

Practical information

Assignment 1: Errors in data collection (30% of the grade)

In the first assignment you will develop a research question, collect text data using one of the methods explained in the lectures/labs (data donation, plug-ins, scraping, APIs), and identify and discuss the errors that you anticipate and encounter when collecting data to answer it.

The outcome is a short report (<1,000 words excluding references and potential figures) that you will submit in week 5 (see weekly schedule). You will receive feedback from the lecturers during the feedback sessions (weeks 3 and 4).

Please find the template and the rubric of assignment 1 here.

Steps:

Grading:

Assignment 2: Errors in data labeling and moving past errors

In the second assignment you will further analyze the data collected in the first assignment to answer the RQ. For this you will use a Natural Language Processing model of your choice to label the data, which you will learn about in week 5. These types of models can predict properties of a text. For example, they can predict whether the text contains hate speech, polarized content, negative language, or specific personality traits. You will discuss the biases that you encounter in the labeling process, and how you will move past the errors in data collection and data labeling.
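As a rough illustration of how labeling errors could be quantified, the sketch below compares model labels against a small hand-labeled sample and reports precision, recall, and accuracy for one class. The function name, the label values ("hate", "other"), and the example data are hypothetical and not part of the course materials; it assumes you have manually labeled a subset of your texts.

```python
from collections import Counter


def labeling_error_report(manual, predicted, positive):
    """Compare model labels with manual labels for one class of interest.

    manual and predicted are equal-length lists of label strings;
    positive is the label treated as the class of interest.
    """
    if len(manual) != len(predicted):
        raise ValueError("manual and predicted must have the same length")

    counts = Counter()
    for truth, guess in zip(manual, predicted):
        if guess == positive and truth == positive:
            counts["tp"] += 1  # model and human agree on the positive class
        elif guess == positive:
            counts["fp"] += 1  # model says positive, human says otherwise
        elif truth == positive:
            counts["fn"] += 1  # model missed a positive case
        else:
            counts["tn"] += 1  # both agree the text is not positive

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = len(manual)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        # None marks a metric that is undefined for this sample
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "accuracy": (tp + tn) / total if total else None,
    }


# Hypothetical example: six texts labeled by hand and by a model
manual_labels = ["hate", "other", "other", "hate", "other", "hate"]
model_labels = ["hate", "hate", "other", "other", "other", "hate"]

report = labeling_error_report(manual_labels, model_labels, positive="hate")
print(report)
```

Low precision means the model over-labels the class, and low recall means it misses cases. Running the same comparison separately per subgroup of your data (for example per platform or per language) is one way to check whether the errors are systematic rather than random.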

The outcome is a final presentation (10-15 minutes + 5-10 minutes of Q&A) that you will give in week 8 (see weekly schedule). Please submit the final presentation using the upload link before 23:59 on the day of the deadline (see weekly schedule). Name your slides groupX_presentation.pdf/pptx.

Please find the template and the rubric of assignment 2 here.

You will receive feedback from the lecturers during the feedback sessions (weeks 6 and 7).

Steps:

Grading: