This lesson is still being designed and assembled (Pre-Alpha version)

Protecting Patron Privacy Pythonically

Do you handle patron data?
Would you like to, but the fear of compromising your patrons’ privacy provokes paralysis?
Do you handle it all the time and wonder what’s the BFD about privacy anyway, it’s just a book checkout, rite?
Would you like to have confidence that you are managing your patron data responsibly, so that you can do something useful with it?

If you answered “yes” to any of these questions, this workshop is for you. In it, we will discuss and implement strategies for de-identifying and anonymizing personally identifiable information (PII) within patron data. We will do this by simulating the Transform step of an Extract, Transform, Load (ETL) workflow. We will use the Python programming language because of its easy setup, gentle learning curve, useful libraries for this purpose, and popularity for ETL applications. However, the concepts and techniques covered can be ported to other environments and languages as needed. While experienced programmers are welcome, this workshop is intended to be useful to you even if you have minimal programming experience.
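To give a rough sense of what a Transform step that de-identifies patron data might look like, here is a minimal sketch. It is not the workshop's own code: the records, the field names (`patron_id`, `name`, `zip`, `item`), the `SECRET_KEY` value, and the helper functions are all hypothetical, and it uses only the Python standard library so that it runs without extra installation (the lesson itself works with Pandas).

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the dataset.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical records, standing in for the output of an Extract step.
extracted = [
    {"patron_id": "P001", "name": "Ada Example", "zip": "12345", "item": "Book A"},
    {"patron_id": "P002", "name": "Bo Sample", "zip": "12399", "item": "Book B"},
    {"patron_id": "P001", "name": "Ada Example", "zip": "12345", "item": "Book C"},
]


def pseudonymize(value, key=SECRET_KEY):
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex characters."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def transform(record):
    """Drop the name, pseudonymize the patron ID, and coarsen the ZIP code to three digits."""
    return {
        "patron_key": pseudonymize(record["patron_id"]),
        "zip3": record["zip"][:3],
        "item": record["item"],
    }


# The Transform step: apply the same function to every extracted record.
transformed = [transform(r) for r in extracted]

for row in transformed:
    print(row)
```

In this sketch the same patron always maps to the same `patron_key`, so checkouts can still be counted per patron without storing the name or original ID. Note that this is de-identification rather than anonymization: anyone holding the key, or enough outside information, may still be able to re-identify a patron.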

This will be a hands-on workshop, so you should bring a laptop on which you have installed, or can install, the relevant software. Instructions will be provided in advance of the workshop, and support will be available on the day in case you need it. To get the most out of the material, you should be comfortable with programmatic thinking. Familiarity with fundamental programming concepts (variable assignment, looping, etc.) would also be helpful. If you’re not sure whether this is the right workshop for you, we’d be happy to talk before you register.

Under Design

These materials are being developed for their debut as a preconference workshop at Code4Lib19. Thereafter, they will be offered to the Library Carpentry community for further development and use as a lesson; thus the Carpentries chrome and formatting. As a disclaimer, they have yet to be reviewed, revised, vetted, and/or approved as part of the Library Carpentry curriculum. Any mistakes or errors within the content should be understood as the responsibility of the authors alone.

Prerequisites

  1. Learners need to understand what files and directories are, what a working directory is, and how to start a Python interpreter from a terminal window.

  2. Learners must install Anaconda or have a working Python 3 environment with which they are familiar before the class starts.

    Please see the setup instructions for details.

Schedule

       Setup
           Download files required for the lesson
00:00  1. Introduction
           Key question (FIXME)
00:15  2. Importing Data with Pandas
           How can I use a programming library in Python?
           How can I learn how to use a library that I have imported?
           How can I import data into Python?
           How can I see my data once I’ve imported it?
00:45  3. Working With Data
           How can I work with my data once I’ve imported it?
01:15  4. PII and Other Risky Data
           What data could pose a risk to our patrons’ privacy?
           Where might risky data be found?
           What are some things we can do to limit these risks?
01:35  5. Parsing Data with Functions
           How can I execute the same code on different data?
01:55  6. De-identification
           How can I remove PII from library data?
02:15  7. Aggregation & Re-identification
           If de-identified data can be re-identified, and anonymization is hard to guarantee, how can we protect our patrons’ privacy?
02:35  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.