Aggregation & Re-identification

Overview

Teaching: 10 min
Exercises: 10 min

Questions

If de-identified data can be re-identified, and anonymization is hard to guarantee, how can we protect our patrons’ privacy?

Objectives

Summarize and aggregate data about individual patrons into data about the population

Evaluate the risk of re-identification by looking at the size of the smallest sub-populations described
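The two objectives above can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's own code: the `patrons` records, the attribute names, and the helper functions `aggregate_counts` and `smallest_group` are all hypothetical. It counts how many patrons share each combination of attributes, then reports the smallest group.

```python
from collections import Counter

# Hypothetical de-identified patron records: (age_group, home_branch).
# These values are invented for illustration only.
patrons = [
    ("18-29", "Central"),
    ("18-29", "Central"),
    ("18-29", "Eastside"),
    ("30-49", "Central"),
    ("30-49", "Central"),
    ("30-49", "Central"),
    ("50+", "Eastside"),
    ("50+", "Eastside"),
]


def aggregate_counts(records):
    """Count how many records share each combination of attributes."""
    return Counter(records)


def smallest_group(counts):
    """Return the (group, size) pair with the fewest members."""
    return min(counts.items(), key=lambda item: item[1])


counts = aggregate_counts(patrons)
group, size = smallest_group(counts)
print(f"Smallest group: {group} with {size} patron(s)")
```

In this made-up data the smallest group contains a single patron, so the aggregated row for that group still describes one individual. Small groups like this are where the re-identification risk is likely to be highest.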

FIXME

Key Points

Data aggregation is the process of combining data in such a way that it no longer refers to specific individuals, but rather reveals insight about groups within the population.

Data that is both de-identified and aggregated can still be valuable for analysis while posing less risk to the privacy of our patrons.
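As a rough sketch of how aggregated data can stay useful for analysis, the example below summarizes per-patron rows into one row per group. The `records` data, the field names, and the `summarize_by_group` helper are assumptions made for illustration; they do not come from the lesson.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical de-identified records: (home_branch, checkouts_this_year).
# These values are invented for illustration only.
records = [
    ("Central", 12),
    ("Central", 30),
    ("Central", 18),
    ("Eastside", 5),
    ("Eastside", 9),
]


def summarize_by_group(rows):
    """Collapse per-patron rows into one summary per group."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    # Keep only the group size and the average, not the individual values.
    return {
        group: {"patrons": len(values), "mean_checkouts": mean(values)}
        for group, values in by_group.items()
    }


summary = summarize_by_group(records)
for branch, stats in summary.items():
    print(branch, stats)
```

The summary still answers a question such as "which branch has more active borrowers?" without keeping a row for any single patron.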

Protecting Patron Privacy Pythonically
