Protecting Patron Privacy Pythonically: Learner Profiles

Learner Profiles

Alex is an inveterate librarian. She has had to work hard to keep her skillset current. Fortunately, she enjoys learning. Unfortunately, there never seems to be quite enough time to study all she wants to learn while keeping up with her duties, particularly in the area of computing skills. She handles a fair amount of data, usually in the form of spreadsheets, that feeds a number of reports she generates on a regular basis to show how her department is meeting the needs of its patrons. She is sure that there are more efficient ways of doing this, but so far she has been stuck doing everything by hand. Library Carpentry will teach Alex how to clean her data efficiently using OpenRefine, how to generate the reports (and filter out sensitive patron data) using Python, and how to collaborate with her colleagues to share and improve her scripts using Git.

Jean is an early-career software engineer working in an academic library. They are skilled in programmatic thinking, but the transition to libraryland has introduced them to new programming languages, more open source collaboration, and the entirely new domain of library science. While they are confident in their core skills, all this novelty can be overwhelming and can sometimes leave them feeling a little like an imposter. They wish they could spare the time and expense of library school, as they feel a better foundation in library science would benefit them, but there’s little financial incentive to do so, so instead they pick up what they can, where they can. Some of Library Carpentry’s curriculum is therefore a review (basic shell, Git, and Python), but there’s enough new material (OpenRefine) and enough familiar material taught from a new slant (some of the Python libraries, library data) to make it seem a good use of their time.

After completing the Patron Privacy lesson, learners will be able to find Personally Identifiable Information (PII) in the data they handle. They will be able to de-identify, anonymize, and aggregate such data to construct valuable datasets that protect their patrons’ privacy. They will then be able to check their work by evaluating the risk of re-identification. They will be able to use several Python libraries that help them do this work efficiently, and will have the tools necessary to select new ones for future challenges.
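
To make these outcomes a little more concrete, here is a minimal sketch of what de-identifying, aggregating, and checking a small dataset might look like. It is not taken from the lesson itself: the records, the field names (patron_id, name, age, zip, branch), the secret key, and the helper functions are all hypothetical, and it uses only the Python standard library rather than any particular library the lesson may teach. Note that keyed hashing of an identifier is pseudonymization rather than full anonymization, so the last step counts how many distinct patrons share each combination of quasi-identifiers as a rough, k-anonymity-style check.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical circulation records; every field name and value is invented.
RECORDS = [
    {"patron_id": "P001", "name": "A. Reader", "age": 34, "zip": "97201", "branch": "Central"},
    {"patron_id": "P002", "name": "B. Borrower", "age": 37, "zip": "97205", "branch": "Central"},
    {"patron_id": "P003", "name": "C. Browser", "age": 62, "zip": "97214", "branch": "East"},
    {"patron_id": "P001", "name": "A. Reader", "age": 34, "zip": "97201", "branch": "Central"},
]

# Assumption: the key is stored separately from the data and never shared.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(patron_id, key=SECRET_KEY):
    """Replace an identifier with a keyed hash (a pseudonym, not anonymity)."""
    digest = hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record):
    """Drop the name, pseudonymize the ID, and coarsen age and ZIP code."""
    return {
        "patron": pseudonymize(record["patron_id"]),
        "age_band": age_band(record["age"]),
        "zip3": record["zip"][:3],
        "branch": record["branch"],
    }


def aggregate_by_branch(rows):
    """Count records per branch, the kind of figure a report might need."""
    return Counter(row["branch"] for row in rows)


def smallest_group(rows, quasi_identifiers=("age_band", "zip3")):
    """Return the smallest number of distinct patrons sharing the same
    quasi-identifier values; a small number suggests re-identification risk."""
    patrons_per_group = {}
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        patrons_per_group.setdefault(key, set()).add(row["patron"])
    return min(len(patrons) for patrons in patrons_per_group.values())


clean = [deidentify(r) for r in RECORDS]
counts = aggregate_by_branch(clean)
k = smallest_group(clean)
```

In this invented sample, the one patron in the 60-69 age band is alone in their group, so k comes out as 1. That is the kind of result that would prompt further generalization or suppression before a dataset is shared.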