This lesson is still being designed and assembled (Pre-Alpha version)

Protecting Patron Privacy Pythonically: Glossary

Key Points

Introduction
  • First key point. Brief Answer to questions. (FIXME)

Importing Data with Pandas
  • Use import to load a library and make it available to your own code.

  • Use the help() function to see the built-in documentation for a library.

  • Import data into Python with the pandas library.

  • The DataFrame info() method displays a summary of your imported data.
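
The points above can be sketched in a few lines. This is a minimal illustration, not the lesson's own exercise: the CSV text, the column names, and the variable name circulation are hypothetical, and the data is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd  # "import" loads the library; "pd" is the conventional alias

# Hypothetical sample data standing in for a real CSV export.
csv_text = """patron_id,branch,checkouts
1001,Main,12
1002,East,3
1003,Main,7
"""

# read_csv accepts a file path or any file-like object.
circulation = pd.read_csv(io.StringIO(csv_text))

# info() prints the column names, non-null counts, and data types.
circulation.info()

# help(pd.read_csv) would print the built-in documentation for read_csv.
```

With a real export, the io.StringIO(...) argument would be replaced by the path to the CSV file.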

Working With Data
  • Data imported using the pandas library is organized into a powerful structure called a DataFrame.

  • DataFrames have many useful features for putting data to use.
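
As a rough sketch of a few of those features, the example below selects a column, filters rows, sorts, and totals a column. The data and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical circulation counts; the column names are assumptions.
circulation = pd.DataFrame({
    "branch": ["Main", "East", "Main", "West"],
    "item_type": ["book", "dvd", "book", "book"],
    "checkouts": [12, 3, 7, 5],
})

# Select a single column (returned as a pandas Series).
branches = circulation["branch"]

# Keep only the rows that satisfy a condition.
busy = circulation[circulation["checkouts"] > 5]

# Sort rows by a column, largest first.
ranked = circulation.sort_values("checkouts", ascending=False)

# Summarize a column.
total_checkouts = circulation["checkouts"].sum()
```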

PII and Other Risky Data
  • Personally Identifiable Information (PII) is of two types.

  • In a library context, PII 1 is information about a patron (e.g. name, date of birth, library barcode).

  • PII 2 is information about a patron's activities, and other information that can be linked back to a patron (e.g. search history, circulation records, access to electronic resources).

  • By making connections within a pool of data, it is possible to identify specific patrons and their activities.

  • Limiting the data we collect and how long we keep it can help mitigate these risks.
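
The last two points can be illustrated with a small sketch. The tables, column names, and the retention cutoff date are all hypothetical, chosen only to show how a shared barcode links activity to a named patron and how trimming the data reduces that link.

```python
import pandas as pd

# Hypothetical PII 1: information about patrons.
patrons = pd.DataFrame({
    "barcode": ["B001", "B002"],
    "name": ["Patron A", "Patron B"],
})

# Hypothetical PII 2: activity that can be linked back to a patron.
loans = pd.DataFrame({
    "barcode": ["B001", "B002", "B001"],
    "title": ["Title X", "Title Y", "Title Z"],
    "returned": pd.to_datetime(["2020-01-15", "2023-06-01", "2023-06-20"]),
})

# Linking: joining on the shared barcode ties each title to a named patron.
linked = loans.merge(patrons, on="barcode")

# Minimizing: discard loans returned before an assumed retention cutoff,
# and drop the column that links a loan to a patron.
cutoff = pd.Timestamp("2023-01-01")
minimized = loans.loc[loans["returned"] >= cutoff].drop(columns=["barcode"])
```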

Parsing Data with Functions
  • Write functions to efficiently run code you want to reuse.

  • Functions can make use of other functions - those you import from libraries, as well as those you write yourself.

  • Well-written and tested functions can reliably do things that might be hard to accomplish by hand.
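
A minimal sketch of these ideas follows. The function names birth_year and age_bracket are hypothetical. The first function uses a function imported from the standard library, and the second reuses the first. The age calculation is deliberately coarse: it ignores whether the birthday has already passed in the given year.

```python
from datetime import date


def birth_year(date_of_birth):
    """Return the year from an ISO date string such as '1985-04-12'."""
    return date.fromisoformat(date_of_birth).year


def age_bracket(date_of_birth, as_of_year):
    """Return a ten-year age bracket such as '30-39', reusing birth_year()."""
    age = as_of_year - birth_year(date_of_birth)
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


# A few quick checks act as simple tests of the functions.
assert birth_year("1985-04-12") == 1985
assert age_bracket("1985-04-12", 2024) == "30-39"
```

Replacing an exact date of birth with a bracket like this is one way a function can reduce the detail held about a patron.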

De-identification
  • De-identification is the process of removing or obscuring PII, such that the remaining information does not identify an individual.

  • De-identified information can be re-identified, given access to the right information (e.g. the algorithm or pseudonym used for de-identification or sufficient data from other sources about the patrons in the original data).

  • Anonymization is the process of de-identifying information in such a way that it cannot be re-identified, usually by means of statistical disclosure limitation techniques.

  • Due to continuous advances in computation technology, full anonymity is difficult (some would say impossible) to guarantee.
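
One common way to obscure an identifier is to replace it with a keyed hash (a pseudonym). The sketch below assumes that technique for illustration only. It is not necessarily the method this lesson uses, and the key, records, and function name are hypothetical. It also shows why such data can be re-identified by anyone who holds the key.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately and securely.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(barcode, key=SECRET_KEY):
    """Return a 16-character pseudonym derived from a barcode with HMAC-SHA256."""
    digest = hmac.new(key, barcode.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Hypothetical circulation records.
records = [
    {"barcode": "B001", "title": "Title X"},
    {"barcode": "B002", "title": "Title Y"},
    {"barcode": "B001", "title": "Title Z"},
]

# De-identify: keep the activity, replace the barcode with a pseudonym.
deidentified = [
    {"patron": pseudonymize(r["barcode"]), "title": r["title"]} for r in records
]

# Re-identify: with the key and a list of barcodes, the mapping can be rebuilt.
lookup = {pseudonymize(b): b for b in ["B001", "B002"]}
recovered = lookup[deidentified[0]["patron"]]
```

The same barcode always maps to the same pseudonym, so a patron's records stay linked to each other. That keeps the data useful for analysis, but it is also part of what makes re-identification possible.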

Aggregation & Re-identification
  • Data aggregation is the process of combining data in such a way that it no longer refers to specific individuals, but rather reveals insight about groups within the population.

  • Data which is both de-identified and aggregated can still be valuable for analysis while posing less risk to the privacy of our patrons.
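
As a sketch of aggregation, the example below counts loans per group instead of keeping one row per loan. The data, the column names, and the minimum group size of 2 are assumptions made for illustration. Dropping small groups is one simple disclosure-limitation step, not a guarantee of anonymity.

```python
import pandas as pd

# Hypothetical de-identified loan records: no names or barcodes remain.
loans = pd.DataFrame({
    "branch": ["Main", "Main", "East", "Main", "East", "West"],
    "age_bracket": ["30-39", "30-39", "20-29", "40-49", "20-29", "30-39"],
})

# Aggregate: one row per (branch, age_bracket) group with a loan count.
summary = (
    loans.groupby(["branch", "age_bracket"])
    .size()
    .reset_index(name="loan_count")
)

# Very small groups can still point to an individual, so drop them
# before sharing. The threshold here is an assumed value.
MIN_GROUP_SIZE = 2
published = summary[summary["loan_count"] >= MIN_GROUP_SIZE]
```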

Glossary

FIXME