The Data Privacy Scientist, Claire McKay Bowen
I’m a principal investigator for many different projects that tie into our tagline at the Urban Institute, “Elevating the debate on social and economic policy.” The Urban Institute is a nonprofit public policy research institute that provides data and evidence for data-informed decision-making. We work in all kinds of policy—health policy, justice policy, tax policy. We have a dozen or so different centers focused on domestic public policy issues.
Often an entity such as the Internal Revenue Service or the U.S. Census Bureau has a data set that is used for public policy discussions but contains confidential information. These entities must carefully consider what information to release without violating any individual record’s privacy. Some people think there is a way to do this that perfectly preserves privacy and maintains accuracy. But that’s impossible, because privacy and information naturally oppose one another. The more you know about somebody, the less privacy they have. So how do you make these kinds of trade-offs in a way that respects individual privacy but enables evidence-based public policy? That’s my area of specialty.
Recently, I’ve been investigating how to implement a privacy definition called differential privacy. Differential privacy is a complex and nonintuitive concept, but here’s a kind of high-level overview.
Everybody has their own definition of privacy and risk. For a long time, decisions about what information should be considered private were made ad hoc by the people releasing the data, the data stewards. They had to assume how someone might attack the data (are they looking for a specific person or a group of people?) and with what information and computing power. Differential privacy basically throws that way of thinking out the window, or at least starts with a clean slate, because trying to predict how somebody might attack data is extremely difficult.
Differential privacy basically says that you have to think of the worst-case scenario. You have to consider that somebody may have any possible version of the data you’re trying to release and any future versions—you have to think of the universe of all possible versions of this data set. And then you have to protect privacy in that context.
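To make that a bit more concrete, here’s a minimal sketch of the simplest mechanism that satisfies differential privacy, the Laplace mechanism: you publish a count only after adding random noise scaled so that adding or removing any one person barely changes what gets released. The function and parameter names below are just for illustration, not how any particular agency implements this.

```python
import numpy as np

def dp_count(records, epsilon):
    """Release a differentially private count of `records`.

    Adding or removing one person changes the true count by at most 1
    (the sensitivity), so Laplace noise with scale 1/epsilon is enough
    to mask any single individual's presence in the data.
    """
    true_count = len(records)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Smaller epsilon means more noise and stronger privacy;
# larger epsilon means a more accurate but less private count.
people = ["person_" + str(i) for i in range(1000)]
print(dp_count(people, epsilon=0.5))  # close to 1000, but noisy
```

The epsilon knob is exactly the privacy-versus-accuracy trade-off I mentioned earlier: the more accurate the released number, the less protection any one person gets.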
Most of my day-to-day work is assessing and communicating about methods that satisfy differential privacy. Although I’m always working in privacy, I get to work on different data sets for different areas of public policy all the time. I never get bored! The statistician John W. Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.”
I have bachelor’s degrees in physics and math and a PhD in statistics. I went into physics because I wanted to know how the world worked. After working in several different labs, my then boyfriend (now spouse) pointed out that I seemed to like the statistics side of research. I realized he was right. When I entered graduate school in statistics, my advisor told me that she didn’t care what application I worked on as long as my research involved Bayesian statistics. That left the door wide open, so I looked at research funding opportunities to see what was there and what interested me. This led me to the field of data privacy and confidentiality, and specifically, differential privacy applications. I realized that it was a big area with a big impact and that the work was really interesting. I kept winning funding to work on applying differential privacy to data to expand access, and here I am.
My physics background has served me well. In physics you learn to be a critical thinker, to use what you do know to solve what you don’t know. Physics is also a very broad field—as an undergrad you’re trying to take in an enormous range of topics and think about all the different ways you might tackle a problem. It also works to my advantage that I came from a very different background than most of my colleagues. Even though the field of expanding access to valuable data through data privacy and confidentiality methods affects everybody, computer science dominates the discussion. It’s been helpful to have a different perspective.
If I were to give just one piece of career advice to students, I’d tell them that if you aren’t constrained by finances, family obligations, or other commitments, you should optimize for happiness. Easier said than done, I know, but this is the life that you live. You might as well enjoy it as much as possible.