“She told me the topic was really boring, but that you made it kind of interesting,” the woman said when I asked her to be honest about what our mutual acquaintance had said after attending a class I’d taught on writing a data management plan. This was not the first time I’d heard something like this. The fact is, I’m pretty damn passionate and excited about a topic that most people find slightly less boring than watching paint dry: data. Now, I’m not going to try to convince you that data is not nerdy. It is. Very nerdy. I have never claimed to be cool, and this is probably one of my least cool interests. However, I think I have some very good reasons for finding data rather interesting.
I remember pretty much the exact moment when I realized the very interesting potential that lives in data. I was in library school, taking a class in the biomedical engineering department about medical knowledge representation, and we spent the whole quarter talking about the very complicated issue of representing the clinical data around a very specific disease (glioblastoma multiforme, or GBM, a type of brain cancer). It’s very difficult with this disease, as with many others, to arrange and organize the data about even a single patient in such a way that a clinician can make sense of it. There’s genetic data, vital signs data, drug dosing data, imaging data, lab report data, doctor’s subjective notes, patient’s subjective reports of their symptoms, and tons of other stuff, and it all shifts and changes over time as the disease progresses or recedes. Is there any way to build a system that could present this data in a manageable way, so that a clinician could view meaningful trends that might provide insight into the course of the disease and help improve treatment? Disappointingly, at least for now, the answer seems to be no, not really.
But the moment that I really knew that I wanted to work with this stuff was when we were talking about personalized medicine and genetic data. In the case of GBM, as with many other diseases, certain medicines work very well on some patients, but fail almost completely in others. Many factors could play into this, but there’s likely a large genetic component to why this happens. Given enough data about the patients in whom these drugs worked and in whom they didn’t, then, could we potentially figure out in advance which drug could help someone? Extrapolating from that, if we have enough health data about enough different patients, aren’t there endless puzzles we could solve just by examining the patterns that would emerge from getting enough information into a system that could make it comprehensible?
Perhaps that’s oversimplifying it, but I do think it’s fair to conceive of data as pure, unrefined knowledge. When I look at a dataset, I don’t see a bunch of numbers or some random collection of information. I imagine what potential lives within that data just waiting to be uncovered by the careful observation of some astute individual or a program that can pick out the patterns that no human could ever catch. To me, raw data represents the final frontier of wild, untamed knowledge just waiting to be understood and explained, and to someone like me who is really in love with knowledge above all, that’s a pretty damn cool thing.
Yes, I know that writing a data management plan or figuring out what kind of metadata to use for a dataset is pretty boring. I’m not denying that. But sometimes you have to do some boring stuff to make cool things happen. You have to get your oil changed if you want your Bugatti Veyron to do 0 to 60 in 2.5 seconds (I mean, I’m assuming those things have to get oil changes?). You have to do the math to make sure your flight pattern is right if you want to shoot a rocket into space. And you can’t find out all the cool secrets that live in your dataset if it’s a messy pile of papers sitting on your desk. So the way I see it, my job is to make data management as easy and as interesting as possible so that the people who have the data will be able to unlock the secrets that are waiting for them. So spread the word, my fellow data nerds. Let’s make data management as cool as regular oral hygiene. 😉