Updated at: 19 August 2025 (Javier Garcia-Bernardo, Laura Boeschoten, and Thijs Carrière)

Course manual

Introduction to digital trace data provides students with foundational knowledge in digital behavioral data, focusing on collecting, analyzing, and interpreting such data. The course emphasizes various data types and methodologies, and the role that data and algorithmic biases play in reinforcing inequalities.

Over eight weeks, students delve into the critical analysis of digital behavioral data. Weekly lectures cover theoretical and methodological background, while practical sessions involve hands-on data collection and analysis. Topics include social media data, web scraping, APIs, data donation, survey data, data quality frameworks, and ethical considerations in data collection and analysis.

Prerequisites: Participants should be familiar with data analysis software (preferably Python, or another programming language or statistical package such as R, JASP, Stata, SPSS, or SAS).

General course information

The course (7.5 EC) is structured over eight weeks as follows:

1 lecture per week. Before the lecture, students are expected to read the assigned readings.
1 practical per week. The practicals are hands-on sessions where students work on collecting, analyzing, and interpreting digital trace data. Make sure to bring a laptop to the practicals. Questions about a practical can be discussed in the next practical.
1 group project (60% of the grade) with two assignments (30% each). For more information, read the project guidelines.
1 final exam (40% of the grade) on Remindo, composed of multiple choice and open questions.

Attendance is mandatory. If you are unable to attend a lecture or practical session, please inform the course coordinator in advance.

If you miss a session (e.g., due to sickness), you should catch up in the regular way: do the readings, go through the lecture slides, do the practicals, ask your peers if you have questions, and (after the above) ask the lab teacher for further explanation.

To pass the course, you need to:

Participate in the group project. We will create the groups for the group project during the first practical. If you miss that practical, you will not be able to participate in the group project and will fail the course.
Attend all four feedback sessions of the group project (see the project guidelines).
Have a grade above 5.5 in all components (the two project assignments and the final exam).
Have a final grade greater than or equal to 5.5. The final grade is based on the group project and the final exam. The group project consists of two assignments, each worth 30% of the final grade. The final exam is worth 40% of the final grade.

Resit: If the final grade is between 4.0 and 5.4, students may resit the exam. The resit will replace the grade from the exam. To have the right to the resit you need to attend over 75% of the practicals (i.e., you can only miss one).
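
The grade arithmetic described above can be illustrated with a minimal Python sketch. It is not an official grade calculator. The function names are hypothetical, the grade is left unrounded because the manual does not state a rounding rule, and reading "between 4.0 and 5.4" as "at least 4.0 and below 5.5" is an assumption. The sketch also leaves out the attendance, feedback-session, and per-component requirements.

```python
# Weights stated in this manual: two assignments at 30% each, final exam at 40%.
W_ASSIGNMENT = 0.30
W_EXAM = 0.40
PASS_MARK = 5.5


def final_grade(assignment_1: float, assignment_2: float, exam: float) -> float:
    """Weighted final grade, unrounded (the manual gives no rounding rule)."""
    return W_ASSIGNMENT * assignment_1 + W_ASSIGNMENT * assignment_2 + W_EXAM * exam


def may_resit(grade: float, practicals_missed: int) -> bool:
    """Resit eligibility.

    ASSUMPTION: "between 4.0 and 5.4" is read as 4.0 <= grade < 5.5.
    At most one practical may be missed.
    """
    return 4.0 <= grade < PASS_MARK and practicals_missed <= 1


# Worked example: 0.3 * 7.0 + 0.3 * 8.0 + 0.4 * 6.0 = 6.9
print(f"{final_grade(7.0, 8.0, 6.0):.1f}")
```

For example, assignment grades of 6.0 and 6.0 with an exam grade of 4.0 give 0.3 × 6.0 + 0.3 × 6.0 + 0.4 × 4.0 = 5.2, which falls in the resit range if no more than one practical was missed.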

Fraud and plagiarism

Plagiarism and fraud are serious academic offenses. Plagiarism is defined as the use of another person’s work without proper acknowledgment. This includes copying and pasting text from generative AI, the internet, books, or other students. If you use text from another source, you must put it in quotation marks and provide a citation. If you do not, you are committing plagiarism. Fraud is defined as the use of dishonest methods to gain an unfair advantage. This includes copying another student’s work, submitting work that is not your own, or submitting the same work for two different courses. If you commit fraud or plagiarism, you will fail the course. If you are not sure what constitutes plagiarism or fraud, please see the [UU Fraud and plagiarism policy](https://students.uu.nl/en/practical-information/policies-and-procedures/fraud-and-plagiarism).

Use of generative AI (Scenario B of the UU GenAI index)

You may use GenAI to prepare the work you hand in. The tasks that count as preparatory for this assignment are listed below. You may NOT use GenAI for the assignment that you hand in, with the exception of copy-editing. You may use AI tools to assist you in generating code that results in reproducible data sets.

The use of generative AI (e.g., ChatGPT) in the group assignment is allowed only for the following cases:

Creating code to download and analyze data, or to explain code.
Labeling data.
Copy-editing text (i.e., making the text more readable, but not changing the content).

The use of generative AI must be clearly indicated in the assignment, including a link to the full conversation with the tool (either using the Share button in the top-right corner, or exporting the conversation to an online document).

The materials in this course are generated by FSBS teaching staff, who hold the copyright. The intellectual property belongs to Utrecht University.

⚠️ Warning: There is no information in these materials that exceeds legal use of copyrighted materials in academic settings, or that should not be part of the public domain.

You may use all content in this course—excluding staff names and datasets—and submit it as input to GenAI tools, provided that the content is not used for further training of the model.

If you do not know how to prevent the use of the content for further training of the model, you should not use any course materials as input for the AI tool. The same holds if you are not absolutely certain that the content is not used for further training of the model.

Who to ask what

General questions about the course: Email course coordinator (Javier (before October 1st) or Laura (after October 1st))
Questions about the lectures: Email lecturer (Laura or Javier)
Questions about the practicals or group project (including grading): Email Thijs

Course objectives and learning outcomes:

The course aims to provide students with foundational knowledge in digital behavioral data, focusing on collecting, analyzing, and interpreting such data.

At the end of the course:

Students develop fundamental knowledge and understanding of digital data collection and analysis
Students apply their knowledge in a multi-disciplinary context to contemporary problems
Students are able to judge how data and algorithmic biases can affect study results
Students are able to determine the most effective research method(s) to address a research problem
Students are capable of autonomous and responsible scholarly self-development
Students are able to identify and critically evaluate ethical dilemmas
Students are able to present research findings and insights to specialist and non-specialist audiences clearly and unambiguously in English

Required readings (see also weekly schedule)

During the course, we will use the following readings:

Books:

Bit by Bit: Social Research in the Digital Age by Matthew J. Salganik
- Chapter 1: 1.1 - 1.4
- Chapter 2: 2.1 - 2.5
- Chapter 3: 3.1 - 3.4 & 3.6
- Chapter 6
Data collection with Wearables, Apps and Sensors by Florian Keusch, Bella Struminskaya, Stephanie Eckman, & Heidi Guyer
- Chapter 1
- Chapter 4
Data feminism by Catherine D’Ignazio and Lauren F. Klein
- Introduction
- Chapter 1
Big data and social science by Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter and Julia Lane
- Chapter 2
- Chapter 3.1 - 3.4
- Chapter 7.1 - 7.5, 7.7.2, 7.9
- Chapter 11.1 - 11.6

Digital Trace Data