Teaching Data Feminism to Data Science Students

The SDS Capstone is a one-semester course focused on using data in ethical ways to collaboratively solve real-world problems. While the capstone has been offered numerous times, I just finished leading the course for the first time. What a semester to do anything, let alone attempt to really stretch myself beyond my content comfort zone! But this blog post isn’t about the challenges of switching to remote teaching mid-semester; it’s about incorporating the new book Data Feminism (by Catherine D’Ignazio and Lauren F. Klein) into the course.

Data Feminism

First a Little Context

In the SDS capstone, students work in groups with external partners/sponsors, each of whom brings a data project for the group to work on. The students are usually all senior SDS majors. In addition to working with partners/sponsors, a large portion of the class is dedicated to introducing and interrogating ideas around data ethics and data justice. This is in line with the SDS major learning goals:

  • Identify and work with a wide variety of data types (including, but not limited to, categorical, numerical, text, spatial and temporal) and formats (e.g. CSV, XML, JSON, relational databases, audio, video, etc.).
  • Extract meaningful information from data sets that have a variety of sizes and formats.
  • Fit and interpret statistical models, including but not limited to linear regression models. Use models to make predictions, and evaluate the efficacy of those models and the accuracy of those predictions.
  • Understand the strengths and limits of different research methods for the collection, analysis and interpretation of data. Be able to design studies for various purposes.
  • Attend to and explain the role of uncertainty in inferential statistical procedures.
  • Read and understand data analyses used in research reports. Contribute to the data analysis portion of a research project in at least one applied discipline.
  • Compute with data in at least one high-level programming language, as evidenced by the ability to analyze a complex data set.
  • Work in multiple languages and computational environments.
  • Convey quantitative information in written, oral and graphical forms of communication to both technical and nontechnical audiences.
  • Assess the ethical implications to society of data-based research, analyses, and technology in an informed manner. Use resources, such as professional guidelines, institutional review boards, and published research, to inform ethical responsibilities.

As a program, SDS is working to weave the ethics learning goal throughout the curriculum, and the capstone course helps anchor this goal in a culminating experience.

A little self-disclosure: I am not a data ethics expert. I am trained as a biostatistician and epidemiologist, have completed many human subjects trainings, and am very familiar with IRBs. I have thought about, read about, and listened to discussions of how people who use data can affect people’s lives. But I have a lot of room to grow when thinking about data ethics. I am also a white trans man in a tenure-track (TT) position who has two parents with PhDs. Just like everyone else, my perspective is rooted in my particular experiences. While I try to educate myself, I don’t always do a great job of perceiving other people’s needs, concerns, and perspectives. Data Feminism calls this the privilege hazard.

Setting my Intentions

In my syllabus I wrote:

The ability to analyze data (that is, to take data, summarize, visualize, and investigate relationships from data to discover useful information) is incredibly powerful. How is this power used? Who wields this power, and who is subjected to this power? We will investigate these questions to identify how to use data in ethically responsible and just ways.

By the end of the course, I wanted students to be closer to creating their own standards of data ethics and data justice, putting those standards into action, and communicating why and how those standards should be enacted. I wanted students to move from passively accepting ideas of what is and isn’t ethical, or what is and isn’t just, toward defining these ideas for themselves. My reasoning is that as students go out into their new roles after graduation, they are going to encounter situations that we could not anticipate. They need to adapt to these situations and be able to define for themselves how data ethics and data justice should be applied. They need to be ready to be thoughtful, reflective leaders in data ethics and justice.

Creating a Structure

Don’t let my collection of crystals sitting next to my desk in my office fool you. I am not someone who does well when the expectation is to ~go with the flow~. That idea makes me break out in a nervous sweat and/or hives. I am more of a go-with-a-very-detailed-and-thought-out-plan-that-is-broken-into-discrete-components kind of person. Preparing to teach this class was keeping me up at night. I had never taken a class like this, I didn’t feel like I was an expert in data ethics (or GitHub, or Scrum, but those are other blog posts), and I was worried I would mess up this class, which is supposed to be the cherry on top of these students’ college experience. So I wasn’t putting any pressure on myself at all.

Three things that helped me get my footing:

  1. My colleague Ben Baumer had taught the capstone multiple times before and he was very free with his materials. Thanks Ben!
  2. All of my SDS colleagues (including Ben) were open to talking over ideas. Thanks Ben, Albert Kim, Katie Kinnaird, and Randi Garcia!
  3. Having a book (rather than needing to cobble together readings) was an incredible help!

For the data ethics / data justice component of the course I ended up with this plan:

  • Tuesday week 1: students are assigned to read chapter 1 of Data Feminism and to write a blog post in response to the reading, due Tuesday week 2.
  • Tuesday week 2: students are assigned to respond to at least 2 of their classmates’ blog posts with comments by Thursday week 2. They are also assigned to read the next chapter (chapter 2) of Data Feminism, and to post about that reading by Tuesday week 3.
  • Thursday week 2: discussion of chapter 1 of Data Feminism in class.
  • This pattern continues until we are finished with the book.
  • I also assigned a long-form blog post for the end of the semester that could be about anything, as long as it addressed data ethics in an in-depth way.
  • Throughout the semester, students are working in groups of 4 or 5 with external partners on a data science project. At the end of the semester the students work in their groups to write a final paper on their project which includes a data ethics section where they discuss the ethical considerations that were a part of their project experience.

Here’s what a typical Thursday class looks like:

  1. Teams meet for their scrum standup for 10 minutes.
  2. Individual free write for 5 minutes: I project an excerpt from the chapter we are discussing, and students individually do a free write to get them thinking about the chapter. They do not share the writing.
  3. In groups of 4 or 5: I present a second excerpt or figure from the chapter and ask students to discuss in their groups how it connects to the main idea of the chapter, what questions they have about the excerpt or figure, and how it relates to data ethics and data justice.
  4. As a whole class: students report back from the groups, and we create a mind map of the chapter, where concepts or questions are written on the whiteboard and related concepts are connected with a line. The goal here is to make connections explicit and to organize ideas about the chapter.
  5. A class activity that brings in materials or ideas from outside the textbook that relate to the chapter. I usually tried to make these generative exercises in which the students create meaning for themselves.
  6. Wrap up the class, remind them about due dates, upcoming assignments, etc.

Example Activity: A Data Ethics Inclusive Workflow

With these activities, I wanted students to move from searching the book or their instructors for answers to their data ethics / data justice questions toward navigating their own path to data ethics and data justice.

Ultimately, students will encounter difficult conundrums for which there is no accepted “right answer”. They will need to chart their own courses. With these activities I hoped to get them more comfortable in that role.

Alright, enough preamble, let’s get into it! One such activity came after we discussed the need for context in data, which is addressed in chapter 6 of Data Feminism, “The Numbers Don’t Speak for Themselves.”

For the activity, I introduce Tidy Tuesday, an offshoot of the R for Data Science Online Learning Community, which supports people learning from the freely available R for Data Science book. [Side note: these are fantastic resources for students to know about, so if nothing else, students will come away from the class with more knowledge of the R community and ways to engage.]

As described in the README of the Tidy Tuesday GitHub repo, Tidy Tuesday is:

a weekly data project aimed at the R ecosystem. As this project was borne out of the R4DS Online Learning Community and the R for Data Science textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem. However, any code-based methodology is welcome – just please remember to share the code used to generate the results.

Every Monday, data (and other helpful files) are posted to the TidyTuesday repo, and anyone can use that data to create visualizations and try out the tidyverse to explore the data. Then, participants share their work and results amongst this friendly community.

Chapter 6 of Data Feminism gets into the pros and cons of open data, and the need for thinking about the context of the data:

  • How/Why/When was the data collected?
  • What data is missing?
  • Who is being centered? Who is being marginalized?

Next, I share with the students the Tidy Tuesday data set and description for that week. In this case it was week 9 of 2020: measles vaccinations. We talk about these questions and think of other questions that would be good to ask of the data. Once we have these questions, I show a data analysis workflow visualization from the introduction of R for Data Science:

[Figure: data science workflow diagram from the introduction of R for Data Science (data-science-explore)]

I ask the class to take a minute and look this over. Have they seen something similar before? What is this visual communicating? I ask the class what they notice and what they wonder. How does this visualization relate to what we have been reading about in Data Feminism?

Next, I give the students a task to complete in their groups: starting with this original workflow, what would they add to create a workflow that incorporates the ideas we have been discussing from Data Feminism? The students work in groups to create workflows that include tangible steps relating to data ethics and data justice considerations. Then I have the students go around and look at what the other groups created. Lastly, we discuss what they noticed and liked about the other groups’ workflows. In future classes, I would give each group a chance to revise its workflow after seeing the other groups’ ideas.

I was extremely impressed with the workflows the students designed. They were able to transition from discussing and critiquing Data Feminism to creating an action plan for future projects.

What I Would Change…

The book Data Feminism wasn’t yet available when I started the semester in January 2020, but the authors had made a draft available online for open peer review, so we used the draft. This had a few benefits: it was free, the students got to learn about peer review, and it was interesting to get a behind-the-scenes look at how a book is written. The downside was that the students were reading a product that was still being revised. In the future I will assign the published version of the book, which is completely reorganized, has many improvements over the draft, and costs about $30.

As the semester went on, students began to question in their blog posts the choice to learn about data ethics from a book written by two white women. I was glad that students brought this up, and I incorporated this feedback into the weekly assignments. While the text includes many examples from people whose identities do not overlap with those of the authors, those examples are still filtered through the authors’ perspectives and experiences. I began supplementing (or sometimes substituting) readings with primary source material created by BIPOC data scientists, and sometimes with TED Talks as well. I still think the book, despite its faults, has a lot of value and is at an appropriate level for advanced undergraduates. Going forward, I want to address this head-on at the beginning of the semester, and I will assign readings every week from people with experiences and identities different from those of the book’s authors. I am also actively looking for other texts to use in addition to Data Feminism.

I learned a great deal over the course of the semester from reading the book and the students’ blog posts. I realized that I could stand to learn a lot more about Black feminist thought, intersectionality, standpoint theory, and other topics touched on in Data Feminism. This is fun for me. I am going to be reading up on these topics this summer. Books I plan to read include:

As far as the work of administering the class goes, I made my typical new-class-prep mistake and bit off more than I could chew, especially where the blog posts were concerned. Students wrote wonderful blog posts, and in turn I wrote very long, in-depth responses as part of their grading and feedback. Sometimes my responses were longer than their blog posts! This ultimately was not sustainable alongside all of the other work and commitments I had going on this semester. I tried to keep it up, but I just wasn’t able to. In the future I will give an in-depth response to the first blog post and then tell the students that they can pick one later blog post that they would like me to respond to in detail; otherwise I will give more cursory responses to each post, along with a longer response that is shared with the whole class.

Would I Do This Again?

Yes! Using Data Feminism gave me the structure and content I needed to broach conversations that I would otherwise not have felt comfortable starting. There is always room for improvement, but for a first attempt at teaching this content, it felt like a success. We will see if my student evaluations corroborate that 🙂
