If you had all of the data in the world at your hands, what question would you ask first?
That's today's big question, and my guest is Dr. Emma Pierson.
Emma is an assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech, and a computer science field member at Cornell University with a secondary joint appointment as an assistant professor of population health sciences at Weill Cornell Medical College. Sure. Why not?
Emma has published a number of game-changing papers, and we talk about those today and how they all tie together. She's written for the New York Times, 538, The Atlantic, Washington Post, Wired, all my favorites, and has been named to the MIT Technology Review 35 Innovators under 35 list, and the Forbes 30 under 30 in Science list.
Her team's work has helped unlock answers and solutions to some of our biggest, most lingering and also sometimes most urgent questions.
Emma and her team work diligently to develop data science and machine learning methods to study two vital, huge interlocking areas: inequality and healthcare.
-----------
Have feedback or questions? Tweet us, or send a message to questions@importantnotimportant.com
New here? Get started with our fan favorite episodes at importantnotimportant.com/podcast.
-----------
INI Book Club:
Links:
Follow us:
Find out more about our guests here: https://www.importantnotimportant.com/guest-stats
Advertise with us: https://www.importantnotimportant.com/sponsors
Mentioned in this episode:
Quinn: [00:00:00] If you had all of the data in the world at your hands, what question would you answer first? That's today's big question, and my guest is Dr. Emma Pierson. Emma is an assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech, and a computer science field member at Cornell University with a secondary joint appointment as an assistant professor of population health sciences at Weill Cornell Medical College.
Sure. Why not? Emma has published a number of game-changing papers, and we talk about those today and how they all tie together. She's written herself for the New York Times, 538, The Atlantic, Washington Post, Wired, all my favorites, and Emma's been named understandably to the MIT Technology Review 35 Innovators under 35 list, and the Forbes 30 under 30 in Science list.
Now, I wanted to talk with her so bad because her work, her team's [00:01:00] work, has helped unlock answers and solutions to some of our biggest, most lingering and also sometimes most urgent questions. Emma and her team work diligently to develop data science and machine learning methods to study two vital, huge interlocking areas, inequality and healthcare.
We've talked quite a bit about those on this show, but there's much more coming. This is Important, Not Important. My name is Quinn Emmett, and this is Science for People Who Give A Shit. In these weekly conversations like my one with Emma today, I take a deep dive with an incredible human working on the front lines of the future, and sometimes it meshes with the past to build a radically better today and tomorrow for everyone.
Along the way, we're gonna discover some tips, strategies, and stories you can use to get involved to become more effective for yourself, your family, your city, your company, and our world. Let's go talk to Emma.[00:02:00]
Emma, welcome to the show. Thanks for coming on today.
Emma Pierson: Thank you.
Quinn: Yeah, so I try to do my research on my guests and their work as much as I can. Again, I'm a very ancient liberal arts major, so it's very much mile wide, inch deep, though after a hundred and almost sixty of these, the inch gets a little deeper each time here and there. But I was reading something, somewhere in my research, about all of the things you have published over the past 10 years and all the writing you've done and all these problems you have applied your work to, and I thought that there was a quote that was really interesting, which is, for all the power, especially this week and going forward, of machine learning or artificial intelligence.
You said that you specifically use it to try to answer old questions. Can you tell me a little bit about what that means to you and in general, if that were something that was standardized in some way. What are old questions to you?
Emma Pierson: I think what I meant by that was I'm trying to answer deeply human questions that you can [00:03:00] explain to your grandmother, and that in fact I do explain to my grandmother.
Quinn: The ultimate test.
Emma Pierson: Exactly. My grandma happens to have a PhD or two, so she's perhaps unusual as 90-year-olds go. But what I mean by that is, we're not studying technical questions that only nerds in basements care about, right? We might be pursuing them using cutting-edge methods and terabyte-scale data, but ultimately the questions we're trying to answer are things like: how much are rich people and poor people mixing in cities? Or, are the police racially biased in whom they search? Or, why are Black patients with knee pain experiencing greater pain than apparently similar white patients?
So these are easy to explain and easy to think about, even though the methods you pursue them with might be advanced AI methods.
Quinn: So I want to come back to your grandmother in just a moment. But it almost seems as if this is like low hanging fruit, right? Like you don't have to reach to try to find these questions, obviously.
And we try to [00:04:00] spin things in a very future-positive way here as much as we can, but don't skip over the hard stuff, obviously. Enormous systemic problems should bring great opportunities, as long as we can work towards reputable ways and measurable ways to achieve those, or at least work towards them.
But I'm curious of, and again, I tried to understand the scope of your work and the particulars of it as much as I could, but I wonder if you can talk a little bit about, of all of those, because there's very many to choose from among what you've done and what's out there available to you. It's a potpourri of old problems.
How many of them do you feel like we're able to try to understand for the first time now, with some of the tools at your disposal and your team, versus ones we have tried to answer before and maybe not done a good job? Or are you trying to really get it over the hump and say, no, look, these are principles people have had before, but here is the data? Or are they entirely new ones where we go, hey, listen, for the first time we can really measure this stuff?
So I wonder if you can talk a little bit about that.
Emma Pierson: Yeah, I think that's a great question. I think [00:05:00] in some cases, for example, let's take police discrimination. It would be incredibly arrogant and false to pretend that we are the first people who have looked at this question. Numerous prior bodies of work attest to this being an issue.
But you know what we were able to do, I think is look at it on a scale that had previously been difficult. So specifically what we did in that project was we demanded data from police departments spanning the entire United States and collected that data into a single standardized database.
And then we analyzed it using machine learning methods that were only possible to run at that scale due to a number of mathematical innovations. And so it very much was building on an enormous amount of prior work, as well as the lived experiences of minority drivers. But it was doing so on a scale that had previously been difficult, due in part to the fact that police data in the United States is just, it's a mess, basically. So yeah. In other cases, I think you can do stuff that really is qualitatively new and difficult to [00:06:00] do previously. So I'll give you an example of this. When we were trying to understand disparities in pain between white and Black patients with knee pain, a central focus of that study was: can we train an artificial intelligence model to predict, from an X-ray of the patient's knee, their level of pain, and maybe pick up on features that the doctor is missing but that are disproportionately affecting minority patients? So can we use AI to find signal in the image that the doctor is missing? You could not do this more than about five to ten years ago; the technology just was not there. Yeah. We didn't have the AI models that were capable of doing this thing. So it's intrinsically a very modern project that would be difficult to do before. All science builds on prior science in some way, but I think that's an example of something qualitatively new.
Quinn: When we analyze these things, or when we seek to analyze them, or hear about a problem that makes you go, oh, that's interesting, or, what the hell's going on with that? Which often the best science comes from, right? The, what the hell? What is this? It seems like [00:07:00] with, again, whether we're analyzing policing, or pain, or pregnancy, or all these different things, college admissions, mortgages, whatever, all this different stuff, which historically we've tackled pretty recently, or are trying to, it seems like there's three prongs that have come along in an increasingly powerful and structured way, but also we're finding that they're messy, and correct me if I'm wrong. It seems to be the actual computing power, the chips to be able to do these sorts of things; the underlying foundational data, which, like you said, police data is a mess;
Obamacare put a bunch of money into electronic health records, and 12 years later we're getting there, it's not great; and then it's the algorithms themselves. And of course those come from us, the whole alignment problem idea. So where do you find that you are seeing the most opportunities open up on those fronts? Or do you feel like, okay, the technology is good to go, here's where I need better data, or, we need to really think about how we're gonna write these algorithms?
Emma Pierson: I think that's a very nice what, I don't know, trichotomy or something like that.
Quinn: It's super [00:08:00] simplified. Of course.
Emma Pierson: Yeah. No. But I think that's a good way to think about why we're seeing such unprecedented gains. Yeah, part of it is hardware, part of it is algorithms, and part of it is the increasing availability of data. I don't know that I would necessarily prioritize any one of those three as particularly important.
I would say there are certainly cases where data plus simple models is on its own sufficient. There are cases where you can improve on the state of the art using your laptop, using regression models that were invented, I don't know, hundreds of years ago or something like this.
And then there are other cases where making progress really depends on having state-of-the-art hardware and state-of-the-art algorithms. Anything in deep learning is gonna be an example of that. Deep learning being a branch of machine learning, which is very useful and powerful for certain types of data.
For example, anything to do with text, anything to do with images. In other cases, really old school modeling and relatively old school hardware will get you a long way.
Quinn: How often are you starting from scratch and how much are you trying to build on those or stand on the shoulders of maybe some of that work that's been done?
Or does that not exist? I don't [00:09:00] really know what I'm talking about.
Emma Pierson: I would say I always do the simplest thing that doesn't seem completely stupid.
Quinn: My God, if I could just like literally frame that on the wall for my children and like that would solve a lot of problems.
Emma Pierson: Yeah. I just think with math, apparently still waters get deep incredibly quickly. And I really think, you can have two variables in a scatter plot and confuse yourself about what's going on if you think hard enough about it. And so I try and keep things as simple as I can. That said, there's unavoidable complexity in modeling a very complicated world.
Like you'll be like, ah, that's not quite right, there's some aspect of the data I'm not capturing here, and it keeps you up at night. You start with the simplest possible thing, and then you find the complexity becomes unavoidable. And I think that's the general theme of my work.
Like build on past work where you can and then innovate where you can.
Quinn: Could you talk me through in a specific example where you worked in that way? Because I am curious, I try to frame everything here as future positive, action oriented as we put it, this whole realm of what can [00:10:00] we do?
What are the measurable, reputable things we can do? And so a big part of that is really thinking through how do we approach problems and how do we approach the solutions or action part of this work, right? It's easy to say let's find things that'll help us be less racist or less sexist.
Or both or all the above. Again, I'm curious about like how you hypothesize a problem, explore the data, and come out on the other side. When and where that does lead to maybe a mechanism of some sort of action.
Emma Pierson: So maybe take the policing data as an example. Okay so the original goal of that project was like, let's see if the police are biased in whom they search after a stop.
So they stop a driver, and then they search some drivers; is there racial bias in whom they're searching? Now, there are very simple first-pass approaches you can use for this problem. For example, you can look at the fraction of drivers they search, break that down by race, and you see massive disparities, where they're more likely to search Black drivers than white drivers.
Now, someone looking at that might say okay, fine, but maybe some drivers are just more likely to be carrying illegal things, and that's why the police are more likely to search them. So it's not [00:11:00] actually biased on the part of the police. It's rather an example of broader systemic disparities.
So then you might say, okay, fine. Let's not look at how likely they are to search drivers; let's look at how likely those searches are to find anything. And so the intuition would be: if searches of Black drivers are finding something only 10% of the time, but searches of white drivers are finding something 90% of the time, it suggests the police are searching Black drivers on the basis of less evidence.
Whereas they're searching white drivers only if they're acting really sketchy. That, again, is a relatively simple method, it's just a fraction, but it's intuitively quite revealing. Turns out there are statistical issues with that method as well. So then you can get into more complex methods.
But I guess the point being, you're starting with the simple things. Like you're starting with fractions, basic descriptive statistics. That's gonna get you a long way to understanding basic aspects of the problem. Then you can build in complexity iteratively, I think.
The broad point being that like there's a cost to increased complexity as academics, we're strongly incentivized to do fancy maths. But the problem is the fancier the math gets like the [00:12:00] harder it is to implement, the more likely you've made a mistake or missed some subtle thing. This is one reason I really am a fan of simple descriptive statistics.
While they might be an imperfect description of what in fact is happening, you can at least be confident that like you understand them, you didn't mess something up, et cetera. And I think that's like pretty simple, you know that's important.
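The two descriptive checks Emma walks through, search rates and hit rates broken down by race, can be sketched in a few lines. This is a toy illustration with made-up stop records, not the actual analysis or data schema:

```python
from collections import defaultdict

# Toy stop records: (driver_race, was_searched, contraband_found).
stops = [
    ("white", False, False),
    ("white", True,  True),
    ("white", False, False),
    ("black", True,  False),
    ("black", True,  True),
    ("black", False, False),
]

def rates(stops):
    n_stops = defaultdict(int)     # stops per group
    n_searched = defaultdict(int)  # searches per group
    n_hits = defaultdict(int)      # searches that found contraband
    for race, searched, contraband in stops:
        n_stops[race] += 1
        if searched:
            n_searched[race] += 1
            if contraband:
                n_hits[race] += 1
    # Check 1: what fraction of stopped drivers get searched?
    search_rate = {r: n_searched[r] / n_stops[r] for r in n_stops}
    # Check 2: of the drivers who WERE searched, how many were carrying
    # anything? A lower hit rate suggests searches launched on weaker
    # evidence -- the intuition Emma describes.
    hit_rate = {r: n_hits[r] / n_searched[r] for r in n_searched}
    return search_rate, hit_rate

search_rate, hit_rate = rates(stops)
print(search_rate)  # in this toy data, Black drivers are searched more often
print(hit_rate)     # but those searches find contraband less often
```

In the real project, the same simple fractions were computed over millions of standardized stop records, and, as Emma notes, the statistical caveats of the hit-rate comparison are what motivate the more complex models.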
Quinn: I wanna hold on that idea of complexity for a moment, because again, because this is, we're broad here.
It's not, there's not everything there's a specific prism of what we try to tackle and understand and help folks with sort of the make or break things. To put it lightly. It's interesting because you've got climate modeling on the one hand, which everybody gets it now, but you talk to folks like Dr. Kate Marvel, who for a long time was at NASA Atmospheric Science, trying to do this kind of stuff and going everyone's asking for projections. Where are we? Are we accounting for everything? What are the levers we're pulling? And it's the most complicated, massive data you could ever imagine.
It seems like simplification is where you get in trouble. But at the same time, so much of, as I've tried to understand better, AI and machine learning and [00:13:00] deep learning and things like that. When you step outside the, again, the very specific prism of this computer is teaching itself how to play checkers or Go, or chess or whatever, when you're getting a little more broad, especially when people are involved and people are involved at every level, data becomes so important and it also becomes pretty controversial, right?
Like we see with all these tools that are coming out every day now. 50 tools a day, some people are going, okay, but where is the data coming from? Who is providing it? What is it made up of? Did they have permission, et cetera. All these different things, which are all open-ended, complicated questions in itself.
So I'm curious with something like, and you hinted at local police data is not so reputable. I wonder if we can go back to the beginning of that project and say, you tell me like what are the questions you asked before you start to go try to get data? Is it, what kind of data do we need? Or what does it need to look like?
What sort of standardized data do we need? Where do we get it? What qualifies as reputable or useful to us? Or are those two different questions? I'm so curious because it is such an important part of how [00:14:00] we're gonna conduct these things and use these tools.
Emma Pierson: It's a great question. The data collection effort of the project was led, crucially, by data journalists who had done a lot of work with policing data in the past. Specifically, Cheryl Phillips, a journalist whose work on policing has won multiple Pulitzer Prizes. So she knows a lot about policing data, about how you file the public records requests and stuff like that.
She has been instrumental, and I think the project would never have been done at as high a quality without her. And I think, broadly, domain experts: if you're working with medical data, you want a clinician; if you're working with legal data, you want a lawyer.
This is just a broad and recurring theme, because the problem is that being technically smart is not enough to anticipate all the weirdness that pops up in police data. It's somewhat helpful, I think; certainly when we were looking at police data, if you're being a conscientious data scientist, there's stuff you can see by eye that can pop out at you.
But then other stuff you wouldn't necessarily anticipate. So here's an example. The Texas Police [00:15:00] Department was systematically mis-recording drivers with obviously Hispanic names as white. They were doing so to improve their numbers. That's, I don't know, how do you know to look at the name and cross-reference it against the race? It turns out that was exposed by some pioneering journalists down in Texas who had looked into it. So then we as data scientists were able to replicate that analysis and confirm that yes, it was an issue in our data. But it's hard to anticipate all these things ex ante, right? Data can be weird in a billion different ways. Having people who have worked carefully and deeply with the particular type of data you're working with is enormously important. The other reason those people are important, by the way, is that they help you ask the important questions of the data.
So in machine learning, we love to just predict things, but it turns out that a lot of things, predicting them doesn't really matter. And doing a good job of predicting them doesn't matter. And so having domain experts to be like, here are the questions you should be answering, it's invaluable.
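The mis-recording check Emma describes can be sketched as a simple consistency test: flag records whose surname is on a list of common Hispanic surnames but whose recorded race is "white". The surname list and record fields below are illustrative assumptions, not the actual analysis:

```python
# Hypothetical list of common Hispanic surnames (the real check would
# use a much larger reference list, e.g. from Census data).
COMMON_HISPANIC_SURNAMES = {"garcia", "martinez", "rodriguez", "hernandez"}

# Toy stop records with the two fields this check needs.
records = [
    {"surname": "Garcia",   "recorded_race": "white"},
    {"surname": "Smith",    "recorded_race": "white"},
    {"surname": "Martinez", "recorded_race": "hispanic"},
    {"surname": "Lee",      "recorded_race": "asian"},
]

def flag_suspicious(records):
    """Return records where surname and recorded race disagree."""
    return [
        r for r in records
        if r["surname"].lower() in COMMON_HISPANIC_SURNAMES
        and r["recorded_race"] == "white"
    ]

print(flag_suspicious(records))  # only the Garcia/white record is flagged
```

The point of the anecdote stands: a check like this is trivial to run, but you only think to run it if a domain expert tells you this failure mode exists.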
Quinn: That is always one of my favorite questions is to ask a domain expert look, I've tried to do my homework here. As I always say to people, I try to get to almost 301 level, which is a little [00:16:00] beyond fake until you make it. But also treading in dangerous waters because it becomes very clear that you're pulling at a string that the sweater's very large.
But I do love asking what questions didn't I ask here? And often they can be much more nuanced and require years of expertise or education, but sometimes they're incredibly obvious just from someone who's been doing the work every day and going no, this is what you're missing. No, this is what you need to start with. And I imagine that's gotta be pretty humbling at times, right?
Emma Pierson: Yeah. Humbling and terrifying, I think. I place late-night calls to my sister, who's a medical student, and her husband, who's in training to be a doctor now, just to ask them basic dumb medical questions, and no one will make you feel dumb like your little sister, right?
But yeah, human biology is amazingly complicated and you really, you can't work it out from first principles.
Quinn: Nothing is funnier than my 10 year old the other day, who's like my friends and I invented this. I'm like, no, you didn't. That's been around for 40 years. Stop.
My wife is a incredibly hardworking, talented, successful [00:17:00] screenwriter and producer. And, but at the same time, we're very straightforward to them that questions are what matter, and there is no dumb questions. And it's one of my favorite things to, again, talk to someone like yourself who's just like in a completely different stratosphere and just ask the dumbest questions because for me it's so helpful because otherwise I'm not gonna understand the rest of the conversation.
But also out there somewhere among our listeners is somebody going, I really wish someone would give me the lowest-common-denominator version of this so I can go on the ride with them. It's like in a movie where someone explains the one plot point and they never talk about it again. It lets the audience go, got it.
They needed to do the bank robbery because X. Great. Let's move on. But that's great. I love that even with everything that you have covered, you still call your sister in the middle of the night and go, what on earth is happening in this X-ray? I wonder if you could talk a little bit, because you wrote very candidly in the New York Times about your family story with data, as it were. I wonder if you could start with your grandmother and her two PhDs and give me the story. Your family seems very ready, with a sister who's a medical student, her husband a doctor, and your grandma with two [00:18:00] PhDs, but I'm just so curious how that influenced you as you went on that journey.
Emma Pierson: Yeah. I think, when I say AI is a deeply human enterprise. Certainly for me, it has been, I've always been a math nerd since I was a little kid. I dressed up as a chess board when I was eight, so I've been consistently geeky.
Quinn: Nailing it. Nailing it.
Emma Pierson: Oh, yes. Very cool. But I think when I was 12 and my mom was 45, she was diagnosed with breast cancer. She's fine now, she recovered, but that's a striking experience to have as a little kid. And what my parents didn't tell me at the time was that, in fact, the reason she had developed cancer at such a young age was a genetic mutation that runs through my family and confers an abnormally high risk of many cancers, including breast and ovarian, but also other cancers as well.
And she got it from her dad who ultimately died from cancer a couple years later as well. So when I was 21, I found out I carried the [00:19:00] mutation myself, it's called the BRCA1 mutation, and Angelina Jolie is the much more famous carrier of the mutation. I think that has profoundly influenced my interest in AI and specifically the applications of it.
I went to college as a physics major, and I was intending to use AI to study galaxies. And when I actually did that, what I discovered basically was I liked the AI part and I didn't like the galaxies part.
Quinn: What's your problem with galaxies?
Emma Pierson: Galaxies are fine, but they're megaparsecs away.
And sorry to be clear, I'm very glad some people are studying galaxies. It's really, I think, a matter of intellectual taste and personal history. No hate to the cosmologists. Some of my best friends are cosmologists. That's not really true. And in particular, I think, I have great respect for doctors, but I also as a patient have quite a visceral anger at the inadequacy of our medical knowledge and decision making. I've spent more than my fair share of time, I think, in hospitals and like hospitals are fascinating places, but they're also deeply suboptimal places.
Quinn: Yeah. Even the best of them. [00:20:00]
Emma Pierson: Even the best of them. And I think, the other thing is my interest specifically is in health equity and certainly, medical decision making becomes particularly inadequate and medical knowledge particularly incomplete, as you've noted when it comes to minority populations. Including women, racial minorities, underserved populations of all kinds.
There's the intellectual awareness that there's a lot of low hanging fruit here due to the suboptimality of decision making, but there is also the visceral and emotional awareness that like, this is not okay.
Quinn: My wife and I have been very lucky recently to have some experiences with the Mayo Clinic, and like you said, nowhere is perfect, but it is a little bit like visiting Star Trek for a brief period, both in the soft power and the hard part: how it's institutionally built and literally how the buildings are constructed. The way you go about the whole thing, the way doctors are compensated, the way it's value-driven. It also makes you go, why is this not the thing for everyone everywhere?
What are we doing? But it's a little bit like, and I was talking to my daughter who has decided she's a tree activist, which [00:21:00] is very exciting. She wrote a letter to President Biden and got a letter back.
Emma Pierson: Oh, that's awesome.
Quinn: Oh my God, she's so excited. We were doing some contacts.
We've begun puzzling, because it's a really nice way for all of us to calm down at night. And we've had a successive variety of puzzles that are like, incredible women through history, and this and that. One of them had everyone from Marie Curie on.
And under each of their names was a little tag with what they actually worked on, their field or whatever it might have been. And many of them said, activist. And she, she's eight, said, what's an activist? And I said some of this and this. And she goes, oh, I'm a tree activist.
And I was like, hell yeah. Fuck. Yeah. That's awesome. It's really frustrating to experience something so tremendous and not go I don't understand, but my point to her was like, you can't just fight for trees. Like you also have to try to change the system because it's like saying yeah, we should have a hundred percent clean energy.
Of course, but politics are complicated. We've gotta fix those as well. It's a two front fight here. So that was my point with the Mayo side, again, it's very easy to feel [00:22:00] like, oh my God, I'm being taken care of. But also what the hell are we doing? Like, how could you not be angry in that situation to understand that we only got in because a friend was able to get us in there.
That's crazy. No one can experience that level of care. It's crazy. Or any level of care.
Emma Pierson: Yeah. Yeah. Certainly, yeah. I have my own corresponding experience because I live in New York and I go for cancer checkups at Memorial Sloan Kettering, which is, there is no better place to get cancer care.
I feel this sense, I go to the hospital and take a little picture of the sign and text it to my friends: oh, I'm here. And they're like, what? Who cares? But anyway, so there's definitely a fangirl aspect. I will say, even there, though, you see the boundaries of our medical knowledge. It's, can you screen for these cancers I'm at high risk of? And the answer's, kind of, for some of them, no, not really. We just don't know how to do that yet. And I'd like to be part of the, okay, now we know how to do that.
Quinn: It is a hell of a thing. We've experienced some things lately, and I think a lot of folks, millions of folks dealing with something like Long Covid, are experiencing what a lot of folks who are immunocompromised or [00:23:00] have recurring pain experience, which is that the process of elimination is both great and difficult, because hopefully you cross off some pretty rough stuff early on,
stuff we have clear black-or-white tests for. But on the other hand, it makes you go, okay, but what the hell is it? And it's pretty hard to believe in knowledge and science and all these tools and all these incredible things we can do, for the best of the best to go, yeah, I don't know. And to say, yeah, we can't really figure that out.
Maybe it's this. And then you talk to someone else and they go, not sure. And you go, you're it. That's it. This is the top. And again, it's this feeling of, like you said, then you dial it back down to this work you did. And I wanna dig into this more, about pain. I've tried to cover here Black maternal health, and maternal health in general in the United States, which is an atrocity. Black maternal health, which is a nightmare, and how even women like Serena Williams and Beyonce just suffer so greatly. It doesn't matter your wealth or your fame, apparently. But just these two opposite sides of the puzzle. When I was reading your pain paper, it seemed, correct me if I'm wrong, but was [00:24:00] it somewhat referencing the scale, when you go in the doctor's office and it's like, point to the face of what your pain feels like? Is that part of the association when we're trying to say, oh, from this X-ray, it's maybe this level of pain?
Can you explain that a little more? Because that feels like one of those problems where I go, I do not know where to start with this. Like, how does one try to understand that?
Emma Pierson: The actual pain scale we ask the model to predict is what's called the KOOS pain score. This is used for knees specifically. So it's not like that generic thing.
You basically ask patients: how much pain do you feel when, like, walking up the stairs, when doing various activities at various levels of strenuousness?
Quinn: Okay. So it's specific and standardized.
Emma Pierson: Yes. Yeah. It's for knees specifically.
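For context, KOOS (the Knee injury and Osteoarthritis Outcome Score) asks patients to rate each item from 0 (no difficulty) to 4 (extreme difficulty), and a subscale is normalized to a 0 to 100 scale where 100 means no knee problems. The item wording and subscale composition come from the published instrument; this is just a minimal sketch of the normalization step:

```python
def koos_subscale_score(item_responses):
    """Normalize one KOOS subscale to 0-100 (100 = no knee problems).

    item_responses: list of 0-4 answers, one per questionnaire item.
    """
    mean_raw = sum(item_responses) / len(item_responses)
    # Raw 0-4 mean is rescaled and inverted so higher = better.
    return 100 - mean_raw * 100 / 4

print(koos_subscale_score([0, 0, 0, 0]))  # no reported problems -> 100.0
print(koos_subscale_score([4, 4, 4, 4]))  # worst on every item  -> 0.0
print(koos_subscale_score([2, 2, 2, 2]))  # midpoint             -> 50.0
```

A standardized score like this is what makes the study's prediction task well defined: the model is trained to predict a single number per knee, rather than a free-form pain description.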
Quinn: Gotcha. And then how do you get into that side of the data? I've been talking to a lot of folks about what a nightmare clinical trials are, for, again, a huge variety of very complicated reasons, versus somewhere like the UK, which, again, imperfect, but data collection is opt-out when you're born, for the NHS, which [00:25:00] turns out solves a lot of problems.
Again, very imperfect. But we saw in Covid, like they were able to run some trials very quickly on data they just inherently had that we would have an impossible time getting in for a thousand different reasons. But I'm curious knowing that and how fragmented it is and how unstandardized it is even now with Epic having whatever, 80% of the market.
Again, like the police data, do you start to approach them and go, I need this specific sort of thing to try to answer this kind of question? Because it's a mess.
Emma Pierson: Yeah. Medical data in the United States is a mess, as you note, for a number of reasons. One is that it's highly decentralized and people don't wanna share it. We have very restrictive privacy laws, which can make it hard to share. There are enormous institutional barriers to doing so, there are financial barriers, reasons people don't want to give you their data, et cetera, et cetera. In the particular case of the pain study, the data comes from an NIH-funded study; the data is essentially publicly available, and it was enormously important to our research.
And another reason it was really important is because the authors of the study, to their credit, had gone to specific lengths to collect a racially and [00:26:00] socioeconomically diverse data set, which was crucially important for what we were trying to do. We were trying to study racial disparities in pain, and without a data set like that, the study would not have been possible.
Like we needed x-rays linked to pain and to various other measures of knee health. And it had all those things and it was racially diverse. And I think broadly there is a push in the United States to make larger data sets more easily available to more folks including by my collaborators.
That's gonna be enormously important. Medicine is definitely an area where making large data sets available to these models is gonna be a crucial aspect of progress. It's not just about the size of the data set, by the way. It's also having diversity: what hospital was it taken at?
What demographic groups was it taken from? Because otherwise what happens is you end up with a model that was trained at this one Boston hospital on all white people or whatever, and then it gets deployed much more broadly. That stuff happens.
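The failure mode Emma describes, a model trained at one hospital on one demographic and then deployed broadly, is often caught by the simple discipline of breaking evaluation metrics out by subgroup rather than reporting one overall number. A minimal sketch with hypothetical toy data:

```python
from collections import defaultdict

def error_by_group(y_true, y_pred, groups):
    """Mean absolute error of a model's predictions, broken out by
    group (e.g. source hospital or demographic group), so that good
    average performance can't hide poor performance on one subgroup."""
    buckets = defaultdict(list)
    for yt, yp, g in zip(y_true, y_pred, groups):
        buckets[g].append(abs(yt - yp))
    return {g: sum(errs) / len(errs) for g, errs in buckets.items()}

# Toy example: the model looks fine on hospital A but fails on hospital B.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 1.0, 2.0]
hospital = ["A", "A", "B", "B"]
print(error_by_group(y_true, y_pred, hospital))  # → {'A': 0.0, 'B': 2.0}
```

The overall MAE here is 1.0, which sounds tolerable; the per-group breakdown shows the error is entirely concentrated in hospital B.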
Quinn: Sure. And I've been trying to understand sort of the bottlenecks, the systemic bottlenecks, whether intentional or not, around so much of the clinical trial stuff.
And it seems again, traditionally for a variety of reasons, it turns out we'll have [00:27:00] tested medicine on a very limited dataset, a not diverse dataset. And again that's for a thousand other reasons, but we have to do better there. But it was interesting doing some reading on some folks, and I'm forgetting the gentleman's name, I'll send it to you later.
Talking about maybe we're asking too much of these clinical trials, like trying to find every data point out of these things, and it makes it hard to recruit. It makes it incredibly expensive. These things are unbelievable, and in ones where an Apple Watch isn't gonna get it done yet, because that's still incredibly limited and unproven.
It's not an FDA device, it just seems like we could do more if we actually simplified what we're looking for. Am I understanding that right?
Emma Pierson: Do you mean in the case of getting data from clinical trials?
Quinn: I think that's the idea of getting data from clinical trials, where they say it's so hard to get these off the ground because everyone starts adding, we need to try to factor in these 10 things and these comorbidities and this lifestyle thing and all this.
And they're like, but that's not actually the question we're trying to answer, and so it's like the enemy of done is perfect or whatever it is.
Emma Pierson: [00:28:00] Whatever the saying is. Yeah. Yeah. It's a good question. I actually have not really worked with data from clinical trials, so I'm less well-placed to comment on that.
But definitely I think, with respect to machine learning, having more actual clinical trials of the efficacy of algorithms would be great in rolling these things out and being sure that they actually work.
Quinn: It comes down to this idea, and again, my wife talks about it: making a movie involves hundreds of people, if not thousands of people.
And now you've got all these executives involved and this much money and egos and all this. And part of the reason I think she's successful, besides being the greatest human alive, is she always tries to point people towards the point: our goal is to get the movie made.
Everything else has to answer to that. And that's one of the reasons I really love Mariana Mazzucato. She wrote this book, oh gosh, Mission Economy, I think it's called. And the whole point is, we need a very clear, measurable outcome, and we reverse engineer everything else to that.
It's put somebody on the moon and bring 'em home. That is very transparent and measurable. And everything else, your teams, your decisions, your budgets, everything else. If paint color is not gonna do that, land 'em there and get 'em home. It's [00:29:00] outta here. That's it. And obviously that can be simplistic, but the whole like get the movie made is like everything's gotta go towards that.
And I understand, you feel this way about that, or the budget is this and that, but it seems that way for the clinical trials too. If we're not running these because we're making them too complicated, we're not getting what we need out of them, and more people are gonna suffer, and the whole system's gonna cost us more.
That seems to be applicable when, like the incredible super spreader research you did with the mobile phone data and things like that. These are actionable things we need to try to undertake here and we have to do this work.
Emma Pierson: Yeah, I definitely think being single-minded about, what are the decisions I'm going to seek to affect, and how can I do so in a rigorous way? I think that's crucial.
Quinn: And at the same time, like we have to be more broadly inclusive of these things. It's a complicated one. I wonder if you can talk a little bit about that one, the mobile phone data one and the mobility one. Was it early 2020 you did that? Late 2020? It feels like 600 years ago.
Emma Pierson: I know. We started that work in about March, April 2020. And this is actually work led by my partner, who I would [00:30:00] say, in terms of single-mindedness, is a force of nature in that regard. So yeah, basically, the project seeks to model the spread of COVID-19 using large-scale data on human mobility.
So obviously mobility profoundly affects how the disease spreads, right? People in the same room, bad things happen. So what we did was we took cell phone data tracking how people flow from neighborhoods to places like, people from the Bronx go to this bar or something like this.
But then imagine that multiplied by literally a billion. So a network with 5.4 billion hourly edges. Big networks. We use machine learning to infer those networks, tracking the flows of people over time. And then what you do is you put an epidemiological model on top of those networks basically modeling, if you have all these people from neighborhood A and neighborhood B meeting in this bar, and this is how big the bar is, and this is the fraction of people who are sick from each of those two neighborhoods.
How would you expect disease transmission to occur in that bar? Okay. Then you do that a gajillion times to forecast the spread of the disease hour by hour. And once [00:31:00] you have a model that can do that and we fit that model from very early Covid data, right? Because that's when we were doing the work. March to May or something like this.
Then you can ask a whole bunch of questions like, what might happen if you reopened restaurants? Like, why are poorer neighborhoods getting infected at higher rates? What mobility patterns are giving rise to those?
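For readers who want the shape of this in code: below is a toy, heavily simplified metapopulation SEIR step on a mobility network, loosely in the spirit of the model described here. The rate constants, data structures, and transmission formula are illustrative assumptions for the sketch, not the actual fitted implementation:

```python
def step(state, visits, poi_area, beta_poi=0.002, sigma=1 / 96, gamma=1 / 84):
    """One hour of a toy metapopulation SEIR model on a mobility network.

    state:    {neighborhood: (S, E, I, R)} compartment counts
    visits:   {(neighborhood, poi): visitors this hour}
    poi_area: {poi: floor area of the place}
    All rate constants are illustrative, not fitted values.
    """
    # Infectious "density" each place accumulates from its visitors this hour.
    poi_inf = {}
    for (nbhd, poi), n in visits.items():
        S, E, I, R = state[nbhd]
        pop = S + E + I + R
        poi_inf[poi] = poi_inf.get(poi, 0.0) + n * (I / pop if pop else 0.0)

    new = {nbhd: list(c) for nbhd, c in state.items()}

    # Susceptible visitors get exposed in proportion to infectious density.
    for (nbhd, poi), n in visits.items():
        S, E, I, R = state[nbhd]
        pop = S + E + I + R
        if pop == 0:
            continue
        density = poi_inf[poi] / poi_area.get(poi, 1.0)
        new_exposed = min(S, beta_poi * density * n * (S / pop))
        new[nbhd][0] -= new_exposed  # S -> E
        new[nbhd][1] += new_exposed

    # Within-neighborhood progression: E -> I -> R (explicit Euler, hourly).
    for nbhd, (S, E, I, R) in state.items():
        new[nbhd][1] -= sigma * E
        new[nbhd][2] += sigma * E - gamma * I
        new[nbhd][3] += gamma * I

    return {nbhd: tuple(c) for nbhd, c in new.items()}


state = {"A": (990.0, 0.0, 10.0, 0.0), "B": (1000.0, 0.0, 0.0, 0.0)}
visits = {("A", "bar"): 50, ("B", "bar"): 50}
after = step(state, visits, {"bar": 100.0})
# Neighborhood B, with zero initial infections, picks up a small exposed
# fraction purely through shared visits to the bar.
```

Run hour by hour over inferred visit matrices, this skeleton is what lets counterfactual questions, like reopening restaurants, become nothing more than changes to the `visits` input.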
Quinn: It is such an invaluable time to do that work. The n was everyone on Earth. What seems lost, and again, there's been such a hubbub, understandably, about the stories we tell ourselves and each other now about what happened the past three years and the decisions that were made, is to understand that it became infinitely more complex as it went on. Because the second people started getting infected and not dying,
And the second people started getting this shot or that shot or this many shots, the factors became innumerable almost at that point. So the research you did is really so important. I wanna talk though for a second, because I wanna understand and help folks understand. One of my favorite publications is The Markup.
I'm not sure if you're familiar with their work. They do incredible journalism. [00:32:00] They talk a lot about geolocation data and data brokering, and what a nightmare it is, and how we basically don't have any privacy laws. And things like that. How do you go about finding that data, but also using it, first of all making sure it's anonymized, but also using it in an ethical way, for the light side, not the dark side? Are there places that are able to do that?
I wanna seek to understand that, because again, there could be students or researchers out there who are going, hey, I'm trying to do the right thing, but at the same time I wanna make sure I'm not breaking a bunch of laws or going down the privacy rabbit hole.
Emma Pierson: Yep. I would say first, on geolocation data, and to be clear on what that is, that's data on exactly where you go.
So each data point would be like a latitude, longitude, time point inferred from your cell phone, right? Yeah. I think the broad way that's regulated in the United States is a shit show. It is embarrassing. This is a major problem. This needs to be improved. There needs to be better regulation of this data.
And I think a lot of what is currently legal to do, probably should not be legal to do. I was reading a story in the Washington Post today about how like the Catholic church was tracking the [00:33:00] whereabouts of gay priests. And it's like, what the hell? What, are you serious?
So as a social scientist I'm conflicted because on the one hand I have deep disagreements with the way this data is collected and regulated. On the other hand, it is clearly invaluable for a lot of really important health and social science questions. So you're always fighting this tension in this battle.
What I would say is, in terms of how I reconcile it: for one thing, a lot of this data that you get is in an aggregated form, which is to say it doesn't tell you, like, Emma Pierson went to the grocery store and then she went to the hospital, et cetera. Rather, it tells you on March 8th, 50 people from this neighborhood went to this hospital.
Which is a lot less sensitive. And that in fact is how we do all the COVID work. It's all in aggregated data and we make no attempt to trace individuals and I don't even know if you could, if you wanted to. I've also done work on individual level mobility data and there it is much more sensitive and I am much more conflicted.
And the way I think about it is, it's always a conflict between how important is the question you're seeking to answer, and is there another means via which you could answer [00:34:00] it? I'm definitely ambivalent about it, and it's actually something I talk to my colleagues a lot about.
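The aggregation Emma describes, individual pings collapsed into neighborhood-to-place counts with no individual identifiers retained, can be sketched in a few lines. The field names and the minimum-count threshold here are hypothetical:

```python
from collections import Counter

def aggregate_pings(pings):
    """Collapse individual-level location pings into aggregated flows:
    counts of (date, home_neighborhood, place), dropping very small cells
    in a k-anonymity-style suppression step so rare trips can't single
    anyone out. Field names are hypothetical."""
    flows = Counter()
    for p in pings:
        flows[(p["date"], p["home_neighborhood"], p["place"])] += 1
    # Suppress cells below a minimum count (threshold of 2 chosen for the toy).
    return {k: v for k, v in flows.items() if v >= 2}

pings = [
    {"date": "2020-03-08", "home_neighborhood": "Bronx-12", "place": "hospital"},
    {"date": "2020-03-08", "home_neighborhood": "Bronx-12", "place": "hospital"},
    {"date": "2020-03-08", "home_neighborhood": "Bronx-12", "place": "bar"},
]
print(aggregate_pings(pings))
# → {('2020-03-08', 'Bronx-12', 'hospital'): 2}  (the singleton 'bar' cell is dropped)
```

The output is exactly the "on March 8th, 50 people from this neighborhood went to this hospital" form: useful for epidemiology, far less sensitive than the raw trace.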
Quinn: It's interesting, because that last point of, how important is the question we're trying to answer. And the byproduct of that is, how many people are historically or currently affected by this, or, with something like COVID, we just, everyone, we need to know this. But also, is there any other way to go about this?
And it goes back to the beginning of our conversation, which is we're able to do a lot of this work for the first time, and it's both very tempting then to use this data because it's so powerful and so encompassing. But at the same time, it's also tempting to go holy shit, we might be able to answer this question for the first time.
And thus help some folks, or point towards some ways where these inequities are structurally designed, or where we didn't know, we just didn't know, and look, we could fix this. It's kinda like when Atul Gawande wrote The Checklist Manifesto and you go, if only we had known then that literally making checklists could save lives in hospitals, type of thing.
But it brings me back to the health side, because again, so many of the stories, and I try to shout it from the rooftops, about, oh, the [00:35:00] Instagram ad you got for the mental health thing, just sign into it and some therapist will talk to you. Oh, but also they were selling your data. And it's like, goddammit, again, this data could be so useful, but at the same time, there have to be better regulations about this.
Like, it's crazy, because we're either hindering these tools or we're letting these tools straight outta the box with zero trust around them whatsoever. It's just, anyways, that's my rant. It's incredibly frustrating.
Emma Pierson: No, it is frustrating. I will say, on a more optimistic note, I think there are data collection efforts which are exemplar in their efforts to protect, privacy and be meaningful about consent.
So an example of this would be, there's something called the All of Us Research Program. It's run by the NIH. I was using the data as a researcher, and then I was so impressed by their consent and privacy processes that I actually went and signed up for it as a participant, just to, I don't know. And so now they have my DNA, and I guess my PhD students have my DNA, which now that I think about it might get a little weird.
But anyway, I trust those students. Which is to say, I think there are people who are seeking to do better. And I agree with you, the people who [00:36:00] abuse the power and granularity of these data sets without adequate user consent, they're poisoning the well for the rest of us, in addition to committing massive privacy violations in and of themselves.
Quinn: Yeah. And again, it almost seems like such a huge opportunity for us, especially in this late COVID stage, to look around, both comprehensively, more broadly with all of us, but also sector by sector, and for people to do good, to say, hey, we're gonna collect this data, but we're gonna be incredibly transparent about as much as we can, and this is who we're licensing it to, and this is what it's used for, and all these things.
Because again, like we can do these things, we can do incredible things for the first time, and we need to do these things for the first time, even if they fail or they don't come up with what we thought, which is just how science works. But we've gotta start setting examples of how it can be used in an actionable way, but also in the most reputable way humanly possible, [00:37:00] right?
Because right now it's a little frustrating to see that trending in the other direction, but maybe later I would love any examples you could send of policy shops or whoever it is that are collecting this sort of thing and sharing it in a reputable way. That's really interesting.
That's really helpful. Here's a question for you. With everything going on, and this has been written about a lot, and I want to talk a little bit about your team, if you could. It's easier to say, AI made this decision, we don't know, it's a black box, the algorithm, that third piece, right?
The chips are done, we've talked about data, so the algorithms. It's easy to say the algorithm identified Black people as chimps or whatever it might be, or Bing was crazy and tried to break up a guy's marriage. Whatever it is, they're written by people, right? And so we bring who we are to these things, and that's why it obviously matters so much who is at these companies. Is there a chief liberal arts person at these companies saying, should we do this? Who's gonna be affected? Who gets to make these decisions? Who gets to make the decisions about who's making [00:38:00] the decisions? How do you build a team and choose collaborators, knowing that is such a vital part of the process, knowing that you are really responsible for not just choosing the data, if you can find the data you need, but executing on it? And who gets to make those calls?
Emma Pierson: Yeah, I think it's crucial. It's very clear, at least very clear to me, both from my own anecdotal experience and from seeing the research field, that people's life backgrounds influence the data science problems they choose to pursue and how they pursue them.
And perhaps a canonical example of this is Joy Buolamwini, who observed this computer vision system didn't recognize her face unless she wore a white mask, right? Because it didn't perform well on darker-skinned women. And then she goes on to do a series of seminal and world-changing studies on this, which really have influenced the way major tech companies are marketing and producing these products.
The seed of that project was intrinsically her own personal experience. So I think it is an absolutely crucial [00:39:00] aspect, and now that I've been afforded a small measure of power in academia, not very much, but slightly more than a PhD student, I do absolutely try to recruit teams that are more diverse,
in many ways, I think, than has historically been possible in academia. I think I'm still not doing anywhere near as good a job at that as I'd like to, frankly. To make a particular comment, I think it's particularly bad that many of the teams I've worked on, where we were working on questions of racial disparities, did not have a lot of Black or Hispanic co-authors.
That's a direct consequence of the lack of diversity in academia, but it clearly undermines the work, I think, and that's something that in particular we need to seek to remedy. But to the extent that I can choose collaborators on projects, I do try and diversify the teams that work on these things.
The other thing I think, as I mentioned earlier in this conversation, is really important: you need people with specific domain expertise in the domain you're working in. Like, right now I'm in the market for a geneticist. So if any of your listeners [00:40:00] happen to be geneticists, I need a geneticist, because I don't know anything about genetics.
This problem occurs constantly. So I think both in terms of sort of diversity of life experience and background, that's crucial to me. But also diversity of like academic background and expertise in the specific area. Both of those are instrumental to doing good work and historically computer science research has not been particularly good at either of those things.
Quinn: Yeah, I hope we can keep doing better on that front, but like you said, so much of it is a structural pipeline problem, right? You can look around and not find these folks. But also it's because they're not in academia. Okay, why aren't they in academia? We had conversations with a couple doctors at Johns Hopkins who did some really interesting work. They're surgeons, but also do studies on how Black heart transplant survivors fare compared to the rest.
And it's easy to say only 5% of doctors are Black. And that's part of the problem. But it also makes you go boy, that is a very small number. And when we talk about pain and when we talk about maternal health outcomes and all these other things, that's gotta be part of the discussion is like, why not?
What [00:41:00] is going on with medical schools? What is going on with obviously college and all these different things that we keep really shooting ourselves in the foot and continually hurting these populations. It's not like we can magically make these candidates appear.
Emma Pierson: I think it is. It's on us not to just be like, oh yeah, that's a pity, but that's upstream of me. There are decisions that we can personally make, and advocacy we can personally do, and there's a reason I'm making these comments on your podcast, for example. So I think, yes, these issues are systemic, but at the same time, I think we do have efficacy and ability to change these things a tiny bit at a time.
And I think it's incumbent on us to do so for sort of the quality of the work itself and the rigor of the work itself.
Quinn: All these incoming calls from folks over the past couple years, whether titans of industry or school teachers or whatever: what can I do? It was very easy early in Covid, if you weren't doing the work you were doing, for the answer to be
nothing. Stay home. It's very easy to feel impotent in that situation. It's very easy to feel impotent and frustrated, or like, fuck it then, if the answers are, well, it's systemic. And so I settled on [00:42:00] this idea of, all you can do is all you can do, and it's this version of control what you can control, but the idea being, it's two parts.
The first part, all you can do, really needs to be, like you said, if you have any sort of agency or power whatsoever, like you said, as you're starting to gather power there, use it for all you've got, because that is really gonna be your sphere of influence. And these things are additive, right?
The gatekeeping around climate action, for instance, is infuriating. People are like, individual actions get people hooked. And others are like, no, those don't matter, it's the systemic stuff. Jesus Christ, we're not getting anywhere. We're social beings. There's plenty of studies that say if you have solar on your roof, I'm gonna look at it and be like, shit, maybe I should get that.
That's pretty cool. It works. Like, use what power you do have, and it really does matter whether you're running a lab, or, one of my best friends works at a research hospital in southwestern Virginia, and his entire job for the past few years has been trying to get people to stop coming into the emergency room so often, right? Because it's costly to them, it's costly to the system, it clogs up beds. But it also makes you step back and [00:43:00] go, why do they keep coming back? And part of the answer is they don't take their medicine. They leave. The doctor tells them, hey Jeff, you've been here three times in the past six months. Take these pills.
You don't have to come back. But they won't get the pills, this and this. But he's trying to use what he can there. It can seem like small potatoes in the general scheme of things, but it is applicable everywhere, and hopefully we can build on those things. Before we get to the last couple questions, I do have to ask: what are your white whales, if there are any? And what are the specific obstacles there?
Emma Pierson: What’s a white whale? Sorry, I don't know the idiom.
Quinn: Oh, white whales. I'm so sorry. Moby Dick.
Emma Pierson: See, I'm not a liberal arts major.
Quinn: Thank God. Jesus. There's enough. Oh my God, please don't. What are either the problems or the questions that are out there, that you're like, one of these days I'm coming for you. And either you're missing a team member, a geneticist. Like you just hinted at, or you're missing data where you've been like, that's not good enough yet or, I dunno where to get it. But are there any lingering big ones out there that you're like, when I get there, I'm gonna try to tackle this one? Is there anything sticking out? Maybe you can't talk about them, but I'm [00:44:00] curious.
Emma Pierson: So I would say, in my field of medical machine learning broadly, I think the central and critical challenge is how do we go beyond "we have algorithms which are very good at making predictions" to "okay, so now fewer cancer patients are dying too young."
Like, we need to close the gap between "in theory this is predictively useful" and "okay, patient outcomes are actually better." I think that is the central overriding challenge, which our field is on the cusp of confronting. In terms of problems that personally compel me, I think I'm viscerally compelled by certain problems in women's and maternal health that I didn't do as much work on in my PhD as I wish I could have.
Those problems include, I'm very interested in a variety of cancers that disproportionately affect women, including breast and ovarian cancer. And I've always been very compelled by intimate partner violence, and that's a problem I've just started to work on. It's one of those problems where you feel that Darth Vader-like rage about it, or maybe Darth Vader doesn't get angry, but whatever, like the Sith, a hundred percent. Anyway, [00:45:00] sorry. Sorry. Moving on.
Quinn: No, never apologize for a Darth Vader reference. I get it.
Emma Pierson: But then I think also problems around maternal health, like you mentioned, the striking racial disparities around maternal health, which are outrageous.
They're enormous, they're unconscionable. And stuff around the enormous prevalence of stillbirths in the United States, which is shockingly high. And if you actually think about what kind of experience that is, problems like those are particularly emotionally compelling to me.
Quinn: What is missing for you to be able to tackle those beside the bandwidth?
Emma Pierson: Often it is data issues. That is one issue, like how do you get access to the data? And then I think the other problem can often be, what actually is the right angle on this problem for a machine learning person to work on?
For example, stillbirths are an enormously upsetting problem. The question is, what actually is the predictive target that one should look at? Why might we believe that there is signal here? How might that signal inform decision making? These are very cold, very analytical [00:46:00] questions, but they're crucial questions if you're gonna actually work on this.
Quinn: That's all very fair. This has been fantastic. I could badger you for hours, but you have world problems to solve here, so I won't abuse that by any stretch. I've got a last couple questions I ask everybody, if you don't mind, and then we'll get you outta here, Emma. When was the first time in your life, and it could have been last week, though I doubt it, or maybe you dressed up as a chess board when you were eight, when you felt, either yourself or your team or your family together, whatever it might've been, the power of change or the power to move the needle on something, where you were like, oh shit, I can do this, or I did this? However small it might have been, but it was the hook for you?
Emma Pierson: When I started using AI algorithms, I got a little bit struck by lightning, like when I was a little kid. The feeling of wielding an AI algorithm, there's like a thunderbolt, sort of Zeus feeling, and I think I'm still a little bit hooked on that feeling, which maybe I had for the first time when I was, I don't know, 17 or 18 years old.
Quinn: That's how I feel using literally basic Excel. If I'm able to get the cells to [00:47:00] work, I get the average thing to work and the number comes out, I'm like, this is amazing.
Emma Pierson: See, I'm still not at that stage with Excel, frankly. So there you go.
Quinn: Yeah, I think it's fine. I think you're gonna be okay.
Emma, who is someone in your life that has positively impacted your work in the past six months?
Emma Pierson: Can I say my girlfriend?
Quinn: Are you kidding me? I just talked about my wife for 45 minutes. Of course you can say your girlfriend.
Emma Pierson: Yeah. I don't know. She's also a computer scientist, and she's just better at a lot of things than I am, frankly. She makes all my work better. She asks questions I don't think to ask. She makes prettier figures than I do. I don't know. Yeah, she's really good at her job and I'm constantly grateful for her feedback.
Quinn: I love that.
Where would we be without our partners? I truly, I joke to my children, and maybe this is too dark, but I'm like, God, I wake up in the morning, I look over, and I'm like, mom's still here? That's so great. And they're like, what? What's your deal, buddy? And I'm like, no, she's amazing. But that's the whole thing, right?
It's like, we are so lucky if, in a [00:48:00] relationship, whether it's friends or partners or whatever, you both feel like the lucky one, right? I'd like to think that my wife still feels the same way. We'll see. But I know I do, so I'll take it. I don't know.
The other day they were like, Hey, so is mom still like into the beard? Or how come she lets you keep that? And I was like, guys, I don't think she cares. I think we're well past that. Like that used to be like on the list of things. Now the list is much longer and more annoying. Anyways. Emma, what is a book you have read in the past year or so that's either opened your mind, maybe a topic you hadn't considered before or actually changed your thinking in some way?
We've got a whole list up on bookshop that people love to check out.
Emma Pierson: This is squarely in my field of expertise, and that's why I'm so impressed by it. It's The Alignment Problem by Brian Christian. Oh, good. It's nice to read someone who's just a qualitatively much better writer than you are.
And the way he explains complex technical topics is formidable. [00:49:00] It's really pretty dazzling, I think.
Quinn: It was formidable to me, as like a moron from the outside. I can't even imagine for you, like, it's exceptional.
Emma Pierson: No, he's just really good at what he does. Yeah.
Quinn: Exceptional writing, like anything I tell my children, takes a lot of practice. We aren't necessarily always born that way. But it both makes me so impressed, super jealous, and really bummed at times, but also just thankful to exist. But yeah, that book changed a lot for me.
That's where I come at these questions from, of why it matters so much who is doing the building blocks of the teams and the algorithms, and choosing the data and what's in the data, and all of those different things. Because again, it's us. That's the thing. It always comes back to us.
Emma, where can our listeners follow your work? Should you choose, support it in any way? Maybe not. Just read things. How can they keep up with all things, Emma.
Emma Pierson: To the extent that they want to, you can follow me on Twitter at two plus two make five, and that has a link to my website as well.
Quinn: Okay. Rock and roll. I cannot thank you enough for your time today. This was [00:50:00] truly wonderful. Like I said, I feel like I could badger you forever, and I'm so thankful for the work you're doing. Reading your list of papers, you see how much you've been cited in this and this, but for me it's easy to get greedy and look at it and go, I hope she does this next, and maybe she'll do this, and maybe she can do this.
Because it's great to do nuanced stuff in a very small field that nobody cares about, and that's all great too. But it's another thing to have someone who's constantly like, here are the things that matter, and I'm gonna go after them, because I have the tools and I think I can build the team to do that.
And that's tremendously inspiring, so thank you.
Emma Pierson: No, thanks so much for taking the time. Great questions. This was a pleasure.
Quinn: Oh, you're very kind. That's it. What an incredible conversation. Important, Not Important is hosted by me, Quinn Emmett. It is produced by Willow Beck, edited by Anthony Luciani and the team at Exela.
Music is by Tim Blaine. You can read our critically acclaimed newsletter and get notified about new podcast conversations at importantnotimportant.com. We've got fantastic t-shirts, [00:51:00] hoodies, coffee and stuff like that. Not coffee, coffee mugs. Coffee could be fun. At our store, same website. I'm on Twitter at Quinn Emmett or important imp.
I'm also on LinkedIn. As always, you can send us feedback, guest suggestions, recipes, anything like that at questions@importantnotimportant.com. Thanks so much.