Digital Trace Data

Introduction to digital trace data: Quality, ethics, and analysis. Utrecht University

Updated at: 1 September 2024 (Javier Garcia-Bernardo and Laura Boeschoten)

Group project guidelines

The group project is a central part of the course. The project is designed to give you hands-on experience with digital trace data, and to apply the knowledge you have gained in the course to a real-world problem. In the practical you will work in groups to collect text data, label it using Natural Language Processing models, discuss the errors and biases that you encounter in the data, and interpret the results in light of the errors.

The slides used in the first lab are available here.

Timeslots for the Wednesday feedback sessions (see dates and location on MyTimetable)

Group     Timeslot      Instructors
Uncia     13:15-13:30   Thijs + Javier
Tigris    13:35-13:50   Thijs + Javier
Lynx      13:55-14:10   Thijs + Javier
Onca      14:15-14:30   Thijs + Javier
Pardus    14:45-15:00   Thijs + Laura
Cheetah   15:05-15:20   Thijs + Laura
Caracal   15:25-15:40   Thijs + Laura
Leo       15:45-16:00   Thijs + Laura

Practical information

Assignment 1: Errors in data collection (30% of the grade)

In the first assignment you will develop a research question, collect text data using one of the methods explained in the lectures/labs (data donation, plug-ins, scraping, APIs), and identify and discuss the errors that you anticipate and encounter when collecting data to answer it.
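As a minimal sketch of how encountered errors could be documented, the snippet below audits a small batch of collected records for three common problems: empty text, missing timestamps, and exact duplicate texts. The record fields (`id`, `text`, `timestamp`) and the sample data are hypothetical; your own data will have its own structure and its own error types.

```python
from collections import Counter

# Hypothetical records, standing in for scraped or API-collected posts.
records = [
    {"id": 1, "text": "Great service today", "timestamp": "2024-09-02"},
    {"id": 2, "text": "", "timestamp": "2024-09-02"},
    {"id": 3, "text": "Great service today", "timestamp": "2024-09-03"},
    {"id": 4, "text": "Train was late again", "timestamp": None},
]


def audit(records):
    """Count simple data-quality problems in a list of record dicts."""
    issues = Counter()
    seen_texts = set()
    for record in records:
        text = (record.get("text") or "").strip()
        if not text:
            # Nothing to analyze, so skip the remaining checks.
            issues["empty_text"] += 1
            continue
        if record.get("timestamp") is None:
            issues["missing_timestamp"] += 1
        key = text.lower()
        if key in seen_texts:
            issues["duplicate_text"] += 1
        seen_texts.add(key)
    return issues


report = audit(records)
for issue, count in sorted(report.items()):
    print(f"{issue}: {count} of {len(records)} records")
```

Counts like these only cover errors that are visible in the collected data; errors such as who is missing from the data altogether still need to be argued in the report.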

The outcome is a short report (fewer than 1,000 words, excluding references and any figures) that you will submit in week 5 (see weekly schedule). You will receive feedback from the lecturers during the feedback sessions in weeks 3 and 4.

Please find the template and the rubric of assignment 1 here.

Steps:

Grading:

Assignment 2: Errors in data labeling and moving past errors

In the second assignment you will further analyze the data collected in the first assignment to answer your research question. To do this, you will label the data with a Natural Language Processing model of your choice; these models are covered in week 5. These types of models predict properties of a text. For example, they can predict whether the text contains hate speech, polarized content, negative language, or signs of specific personality traits. You will discuss the biases that you encounter in the labeling process, and how you will move past the errors in both data collection and data labeling.
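To illustrate how labeling errors can be quantified, the sketch below compares automatic labels against a small hand-labeled sample and computes precision and recall. The keyword-based `toy_label` function is a deliberately crude stand-in for a real NLP model, and the texts, keyword list, and manual labels are all invented for illustration.

```python
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful"}  # hypothetical lexicon


def toy_label(text):
    """Return 1 if the text contains a negative keyword, else 0."""
    tokens = text.lower().split()
    return int(any(token.strip(".,!?") in NEGATIVE_WORDS for token in tokens))


# (text, manual label) pairs; 1 = negative language, 0 = not negative
sample = [
    ("I hate waiting for the bus", 1),
    ("Not bad at all, quite good!", 0),
    ("This is disappointing", 1),
    ("Lovely weather today", 0),
    ("Terrible service.", 1),
]

predicted = [toy_label(text) for text, _ in sample]
manual = [label for _, label in sample]

# Compare automatic labels with the manual ones.
tp = sum(p == 1 and m == 1 for p, m in zip(predicted, manual))
fp = sum(p == 1 and m == 0 for p, m in zip(predicted, manual))
fn = sum(p == 0 and m == 1 for p, m in zip(predicted, manual))

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy sample the negation in "Not bad at all" produces a false positive, and "This is disappointing" is missed because none of its words are in the lexicon. Systematic errors of this kind are what the assignment asks you to identify and discuss.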

The outcome is a final presentation (10-15 minutes + 5-10 minutes of Q&A) that you will give in week 8 (see weekly schedule). Please submit the final presentation using the upload link before 23:59 on October 24th. Name your file presentation_groupX.pdf or presentation_groupX.pptx.

Please find the template and the rubric of assignment 2 here.

You will receive feedback from the lecturers during the feedback sessions (weeks 7 and 8).

Steps:

Grading: