Digital Trace Data


Introduction to digital trace data: Quality, ethics, and analysis. Utrecht University

Updated at: 19 August 2025 (Javier Garcia-Bernardo, Laura Boeschoten, and Thijs Carrière)

Course manual

Introduction to digital trace data provides students with foundational knowledge of digital behavioral data, focusing on collecting, analyzing, and interpreting such data. The course emphasizes various data types and methodologies, as well as the role that data and algorithmic biases play in reinforcing inequalities.

Over eight weeks, students delve into the critical analysis of digital behavioral data. Weekly lectures cover theoretical and methodological background, while practical sessions involve hands-on data collection and analysis. Topics include social media data, web scraping, APIs, data donation, survey data, data quality frameworks, and ethical considerations in data collection and analysis.

Prerequisites: Participants should be familiar with data analysis software (preferably Python, or another programming language or statistical package, e.g., R, JASP, Stata, SPSS, or SAS).

General course information

The course (7.5 EC) is structured over eight weeks as follows:

Attendance is mandatory. If you are unable to attend a lecture or practical session, please inform the course coordinator in advance.

If you miss a session (e.g., due to sickness), catch up in the usual way: read the readings, go through the lecture slides, do the practicals, ask your peers if you have questions, and only after that, ask the lab teacher for further explanation.

To pass the course, you need:

Resit: If your final grade is between 4.0 and 5.4, you may resit the exam. The resit grade replaces the original exam grade. To be eligible for the resit, you must attend more than 75% of the practicals (i.e., you may miss at most one).

Fraud and plagiarism

Plagiarism and fraud are serious academic offenses. Plagiarism is defined as the use of another person’s work without proper acknowledgment. This includes copying and pasting text from generative AI tools, the internet, books, or other students. If you use text from another source, you must put it in quotation marks and provide a citation; otherwise, you are committing plagiarism. Fraud is defined as the use of dishonest methods to gain an unfair advantage. This includes copying another student’s work, submitting work that is not your own, or submitting the same work for two different courses. If you commit fraud or plagiarism, you will fail the course. If you are not sure what constitutes plagiarism or fraud, please see the [UU Fraud and plagiarism policy](https://students.uu.nl/en/practical-information/policies-and-procedures/fraud-and-plagiarism).

Use of generative AI (Scenario B of the UU GenAI index)

You may use GenAI to prepare the work you hand in. The tasks that count as preparatory for this assignment are listed below. You may NOT use GenAI for the work that you hand in, with the exception of copy-editing. You may use AI tools to help you generate code that produces reproducible data sets.

The use of generative AI (e.g., ChatGPT) in the group assignment is allowed only in the following cases:

The use of generative AI must be clearly indicated in the assignment, including a link to the full conversation with the tool (either using the Share button in the top-right corner, or exporting the conversation to an online document).

The materials in this course are generated by FSBS teaching staff, who hold the copyright. The intellectual property belongs to Utrecht University.

⚠️ Warning: There is no information in these materials that exceeds legal use of copyrighted materials in academic settings, or that should not be part of the public domain.

You may use all content in this course—excluding staff names and datasets—and submit it as input to GenAI tools, provided that the content is not used for further training of the model.

If you do not know how to prevent the use of the content for further training of the model, you should not use any course materials as input for the AI tool. The same holds if you are not absolutely certain that the content is not used for further training of the model.

Who to ask what

Course objectives and learning outcomes:

The course aims to provide students with foundational knowledge of digital behavioral data, focusing on collecting, analyzing, and interpreting such data.

At the end of the course:

Required readings (see also weekly schedule)

During the course, we will use the following readings:

Books:

Articles: