Who Am I? The Identity Crisis of the Librarian/Informationist/Data Scientist

More and more lately, I’m asked the question “what do you do?” This is a surprisingly difficult question to answer.  Often, how I answer depends on who’s asking – is it someone who really cares or needs to know? – and how much detail I feel like going into at the moment when I’m asked.  When I’m asked at conferences, as I was quite a bit at FORCE2016, I try to be as explanatory as possible without getting pedantic, boring, or long-winded.  My answer in those scenarios goes something like “I’m a data librarian – I do a lot of instruction on data science, like R and data visualization, and data management.”  When I’m asked in more social contexts, I hardly even bother explaining.  Depending on my mood and the person who’s asking, I’ll usually say something like data scientist, medical librarian, or, if I really don’t feel like talking about it, just librarian.  It’s hard to know how to describe yourself when you have a job title that is pretty obscure: Research Data Informationist.  I would venture to guess that 99% of my family, friends, and even work colleagues have little to no idea what I actually spend my days doing.

In some regards, that’s fine.  Does it really matter if my mom and dad know what it means that I’ve taught hundreds of scientists R? Not really (they’re still really proud, though!).  Do I care if my date has a clear understanding of what a data librarian does?  Not really.  Do I care if a random person I happen to chat with while I’m watching a hockey game at my local gets the nuances of the informationist profession?  Absolutely not.

On the other hand, there are often times that I wish I had a somewhat more scrutable job title.  When I’m talking to researchers at my institution, I want them to know what I do because I want them to know when to ask me for help.  I want them to know that the library has someone like me who can help with their data science questions, their data management needs, and so on.  I know it’s not natural to think “library” when the question is “how do I get help with finding data” or “I need to learn R and don’t know where to start” or “I’d like to create a data visualization but I have no idea how to do it” or any of the other myriad data-related issues I or my colleagues could address.

The “informationist” term is one that has a clear definition and a history within the realm of medical librarianship, but I feel like it has almost no meaning outside of our own field.  I can’t even count the number of weird variations I’ve heard on that title – informaticist, informationalist, informatist, and many more.  It would be nice to get to the point that researchers understood what an informationist is and how we can help them in their work, but I just don’t see that happening in the near future.

So what do we do to make our contributions and expertise and status as potential collaborators known?  What term can we call ourselves to make our role clear?  Librarian doesn’t really do it, because I think people have a very stereotypical and not at all correct view of what librarians do, and it doesn’t capture the data informationist role at all.  Informationist doesn’t do it, because no one has any clue what that means.  I’ve toyed with calling myself a data scientist, and though I do think that label fits, I have some reservations about using that title, probably mostly driven by a terrible case of imposter syndrome.

What’s in a name?  A lot, I think.  How can data librarians, informationists, library-based data scientists, whatever you want to call us, communicate our role, our expertise, our services, to our user communities?  Is there a better term for people who are doing this type of work?

Some ponderings on #force2016 and open data

I’m attending FORCE2016, which is my first FORCE11 conference after following this movement (or group?) for a while, and I have to say, this is one interesting, thought-provoking conference.  I haven’t been blogging in a while, but I felt inspired to get a few thoughts down after the first day of FORCE2016:

  • I love the interdisciplinarity of this conference, and to me, that’s what makes it a great conference to attend.  In our “swag bag,” we were all given a “passport” and could earn extra tickets for getting signatures of attendees from different disciplines and geographic locations.  While free drinks are of course a great incentive, I think the fact that we have so many diverse attendees at this conference is a draw on its own.  I love that we are getting researchers, funders, publishers, librarians, and so many other stakeholders at the table, and I can’t think of another conference where I’ve seen this many different types of people from this many countries getting involved in the conversation.
  • I actually really love that there are so few concurrent sessions.  Obviously, fewer concurrent sessions means fewer voices joining the official conversation, but I think this is a small enough conference that there are ways to be involved, active, and vocal without necessarily being an invited speaker.  While I love big conferences like MLA, I always feel pulled in a million different directions – sometimes literally, like last year when I was scheduled to present papers at two different sessions during the same time period.  I feel more engaged at a conference when I’m seeing mostly the same content as others.  We’re all on the same page and we can have better conversations.  I also feel more engaged in the Twitter stream.  I’m not trying to follow five, ten, or more tweet streams at once from multiple sessions.  Instead, I’m seeing lots of different perspectives and ideas and feedback on one single session.  I like us all being on the same page.

Now, those are some positives, but I do have to bring it down with one negative from this conference, and that is that I think it’s hard to constructively talk about how to encourage sharing and open science when you have a whole conference full of open science advocates.  I do not in any way want to disparage anyone because I have a lot of respect for many of the participants in the session I’m talking about, but I was a little disappointed in the final session today on data management.  I loved the idea of an interactive session (plus I heard there would be balloons and chocolate, so, yeah!) and also the idea of debate on topics in data sharing and management, since that’s my jam.  I did debate in high school, so I can recognize the difficulty but also the usefulness of having to argue for a position with which you strongly disagree.  There’s real value in spending some time thinking about why people hold positions that are in opposition to your strongly held position.  And yeah, this was the last session of a long day, and it was fun, and it had popping of balloons, and apparently some chocolate, and whatnot, but I am a little disappointed at what I see as a real missed opportunity to spend some time really discussing how we can address some of the arguments against data sharing and data management.  Sure, we all laughed at the straw men that were being thrown out there by the teams who were being called upon to argue in favor of something that they (and all of us, as open science advocates) strongly disagreed with.  But I think we really lost an opportunity to spend some time giving serious thought to some of the real issues that researchers who are not open science advocates actually raise.  Someone in that session mentioned the open data excuses bingo page (you can find it here if you haven’t seen it before).  Again, funny, but SERIOUSLY I have actually had real researchers say ALL of these things, except for the thing about terrorists.
I will reiterate that I know and respect a lot of people involved with that session and I’m not trying to disparage them in any way, but I do hope we can give some real thought to some of the issues that were brought up in jest today.  Some of these excuses, or complaints, or whatever, are actual, strongly-held beliefs of many, many researchers.  The burden is on us, as open science advocates, to demonstrate why data sharing, data management, and the like are tenable positions and in fact the “correct” choice.

Okay, off my soap box!  I’m really enjoying this conference, having a great time reconnecting with people I’ve not seen in years, and making new connections.  And Portland!  What a great city. 🙂

Radical Reuse: Repurposing Yesterday’s Data for Tomorrow’s Discoveries

I’ve been invited to be a speaker at this evening’s Health 2.0 STAT meetup at Bethesda’s Barking Dog, alongside some pretty awesome scientists with whom I’ve been collaborating on some interesting research projects.  This invitation is a good step toward my ridiculously nerdy goal of one day being invited to give a TED talk.  My talk, entitled “Radical Reuse: Repurposing Yesterday’s Data for Tomorrow’s Discoveries,” will briefly outline my view of data sharing and reuse, including what I view as five key factors in enabling data reuse.  Since I have only five minutes for this talk, obviously I’ll be hitting only some highlights, so I decided to write this blog post to elaborate on the ideas in that talk.

First, let’s talk about the term “radical reuse.”  I borrow this term from the realm of design, where it refers to taking discarded objects and giving them new life in some context far removed from their original use.  For some nice examples (and some cool craft ideas), check out this Pinterest board devoted to the topic.  For example, shipping pallets are built to fulfill the specific purpose of providing a base for goods in transport.  The person assembling that shipping pallet, the person loading it on to a truck, the person unpacking it, and so on, use it for this specific purpose, but a very creative person might see that shipping pallet and realize that they can make a pretty cool wine rack out of it.

The very same principle is true of scientific research data.  Most often, a researcher collects data to test some specific hypothesis, often under the auspices of funding that was earmarked to address a particular area of science.  Maybe that researcher will go on to write an article that discusses the significance of this data in the context of that research question.  Or maybe that data will never be published anywhere because they represent negative or inconclusive findings (for a nice discussion of this publication bias, see Ben Goldacre’s 2012 TED talk).  Whatever the outcome, the usefulness of the dataset need not end when the researcher who gathered the data is done with it.  In fact, that data may help answer a question that the original researcher never even conceived, perhaps in an entirely different realm of science.  What’s more, the return on investment in that data increases when it can be reused to answer novel questions, science moves more quickly because the process of data gathering need not be repeated, and therapies potentially make their way into practice more quickly.

Unfortunately, science as it is practiced today does not particularly lend itself to this kind of radical reuse.  Datasets are difficult to find, hard to get from researchers who “own” them, and often incomprehensible to those who would seek to reuse them.  Changing how researchers gather, use, and share data is no trivial task, but to move toward an environment that is more conducive to data sharing, I suggest that we need to think about five factors:

  • Description: if you manage to find a dataset that will answer your question, it’s unlikely that the researcher who originally gathered that data is going to stand over your shoulder and explain the ins and outs of how the data were gathered, what the variables or abbreviations mean, or how the machine was calibrated when the data were gathered.  I recently helped some researchers locate data about influenza, and one of the variables was patient temperature.  Straightforward enough.  Except the researchers asked me to find out how temperature had been obtained – oral, rectal, tympanic membrane – since this affects the reading.  I emailed the contact person, and he didn’t know.  He gave me someone else to talk to, who also didn’t know.  I was never able to hunt down the answer to this fairly simple question, which is pretty problematic.  To the extent possible, data should be thoroughly described, particularly using standardized taxonomies, controlled vocabularies, and formal metadata schemas that will convey the maximum amount of information possible to potential data re-users or other people who have questions about the dataset.
  • Discoverability: when you go into a library, you don’t see a big pile of books just lying around and dig through the pile hoping you’ll find something you can use.  Obviously this would be ridiculous; chances are you’d throw up your hands in dismay and leave before you ever found what you were looking for.  Librarians catalog books, shelve them in a logical order, and put the information into a catalog that you can search and browse in a variety of ways so that you can find just the book you need with a minimal amount of effort.  And why shouldn’t the same be true of data?  One of the services I provide as a research data informationist is assisting researchers in locating datasets that can answer their questions.  I find it to be a very interesting part of my job, but frankly, I don’t think you should have to ask a specialist in order to find a dataset, any more than I think you should have to ask a librarian to go find a book on the shelf for you.  Instead, we need to create “catalogs” that empower users to search existing datasets for themselves.  Databib, which I describe as a repository of repositories, is a good first step in this direction – you can use it to at least hopefully find a data repository that might have the kind of data you’re looking for, but we need to go even further and do a better job of cataloging well-described datasets so researchers can easily find them.
  • Dissemination: sometimes when I ask researchers about data sharing, the look of horror they give me is such that you’d think I’d asked them whether they’d consider giving up their firstborn child.  And to be fair, I can understand why researchers feel a sense of ownership about their data, which they have probably worked very hard to gather.  To be clear, when I talk about dissemination and sharing, I’m not suggesting that everyone upload their data to the internet for all the world to access.  Some datasets have confidential patient information, some have commercial value, some even have biosecurity implications, like H5N1 flu data that a federal advisory committee advised be withheld out of fear of potential bioterrorism.  Making all data available to anyone, anywhere is neither feasible nor advisable.  However, the scientific and academic communities should consider how to increase the incentives and remove the barriers to data sharing where appropriate, such as by creating the kind of data catalogs I described above, raising awareness about appropriate methods for data citation, and rewarding data sharing in the promotion and tenure process.
  • Digital Infrastructure: okay, this is normally called cyberinfrastructure, but I had this whole “words starting with the letter D” thing going and I didn’t want to ruin it. 🙂  If we want to do data sharing properly, we need to build the tools to manage, curate, and search it.  This might seem trivial – I mean, if Google can return 168 million web pages about dogs for me in 0.36 seconds, what’s the big deal with searching for data?  I’m not an IT person, so I’m really not the right person to explain the details of this, but as a case in point, consider the famed Library of Congress Twitter collection.  The Library of Congress announced that they would start collecting everything ever tweeted since Twitter started in 2006.  Cool, huh?  Only problem is, at least as of January 2013, LC couldn’t provide access to the tweets because they lacked the technology to allow such a huge dataset to be searched.  I can confirm that this was true when I contacted them in March or April of 2013 to ask about getting tweets with a specific hashtag that I wanted to use to conduct some research on the sociology of scientific data sharing, and they turned me down for this reason.  Imagine the logistical problems that would arise with even bigger, more complex datasets, like those associated with genome wide association studies.
  • Data Literacy: Back in my library school days, my first ever library job was at the reference desk at UCLA’s Louise M. Darling Biomedical Library.  My boss, Rikke Ogawa, who trained me to be an awesome medical librarian, emphasized that when people came and asked questions at the reference desk, this was a teachable moment.  Yes, you could just quickly print out the article the person needed because you knew PubMed inside and out, but the better thing to do was turn that swiveling monitor around and show the person how to find the information.  You know, the whole “give a man a fish and he’ll eat for a day, teach a man to fish and he’ll eat for a lifetime” thing.  The same is true of finding, using, and sharing data.  I’m in the process of conducting a survey about data practices at NIH, and almost 80% of the respondents have never had any training in data management.  Think about that for a second.  In one of the world’s most prestigious biomedical research institutions 80% of people have never been taught how to manage data.  Eighty per cent.  If you’re not as appalled by that as I am, well, you should be.  Data cannot be used to its fullest if the next generation of scientists continues with the kind of makeshift, slapdash data practices I often encounter in labs today.  I see the potential for more librarians to take positions like mine, focusing on making data better, but that doesn’t mean that scientists shouldn’t be trained in at least the basics of data management.
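
To make the “Description” point a little more concrete, here is a minimal sketch of what a machine-readable data dictionary might look like, along with a tiny check that flags under-documented variables.  Everything in it is an assumption for illustration: the variable names, the field names (label, unit, method), and the helper function are hypothetical and not drawn from any particular metadata standard or from the influenza dataset I mentioned.

```python
# Hypothetical data dictionary for a flu dataset. The variable names and the
# field names below are illustrative only, not taken from a real standard.
REQUIRED_FIELDS = ("label", "unit", "method")

data_dictionary = {
    "temp": {
        "label": "Patient body temperature",
        "unit": "degrees Celsius",
        "method": "oral",  # the kind of detail that is easy to lose
    },
    "onset": {
        "label": "Date of symptom onset",
        "unit": "ISO 8601 date",
        # "method" is deliberately left out to show what the check catches
    },
}


def missing_documentation(dictionary, required=REQUIRED_FIELDS):
    """Return {variable: [missing fields]} for under-documented variables."""
    gaps = {}
    for variable, entry in dictionary.items():
        # A field counts as missing if it is absent or empty.
        missing = [field for field in required if not entry.get(field)]
        if missing:
            gaps[variable] = missing
    return gaps


print(missing_documentation(data_dictionary))
```

Running a check like this before sharing a dataset would surface the gap (here, the undocumented collection method for the onset variable) while the person who knows the answer is still around to ask.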

So that’s my data sharing manifesto.  What I propose is not the kind of thing that can be accomplished with a few quick changes.  It’s a significant paradigm shift in the way that data are collected and science is practiced.  Change is never easy and rarely embraced right away, but in the end, we’re often better for having challenged ourselves to do better than we’ve been doing.  Personally, I’m thrilled to be an informationist and librarian at this point in history, and I look forward to fondly reminiscing about these days in our data-driven future. 🙂

the sweet scent of Actinomycetes (or why rain smells good)

This morning I walked out the door and caught a whiff of something I don’t smell often in Los Angeles – actinomycetes!

You know that “rain smell” that you can detect, especially on a day when it hasn’t rained in a while?  That’s actinomycetes.  It’s a kind of bacteria that lives in the soil, and when it rains, the water hitting the ground aerosolizes the bacteria, creating that distinctive rain smell.  So the next time you catch a whiff of the lovely, fresh scent of rain, don’t forget that it’s actually tiny liquid droplets of dirt bacteria entering your nose. 🙂

Why Data Management is Cool (Sort Of)

“She told me the topic was really boring, but that you made it kind of interesting,” the woman said when I asked her to be honest about what our mutual acquaintance had said after attending a class I’d taught on writing a data management plan.  This is not the first time I’d heard something like this.  The fact is, I’m pretty damn passionate and excited about a topic that most people find slightly less boring than watching paint dry: data.  Now, I’m not going to try to convince you that data is not nerdy.  It is.  Very nerdy.   I have never claimed to be cool, and this is probably one of my least cool interests.  However, I think I have some very good reasons for finding data rather interesting.

I remember pretty much the exact moment when I realized the very interesting potential that lives in data.  I was in library school and taking a class in the biomedical engineering department about medical knowledge representation, and we were spending the whole quarter talking about the very complicated issue of representing the clinical data around a very specific disease (glioblastoma multiforme or GBM, a type of brain cancer).  It’s very difficult with this disease, as with many others, to arrange and organize the data just about a single patient in such a way that a clinician can make sense of it.  There’s genetic data, vital signs data, drug dosing data, imaging data, lab report data, doctor’s subjective notes, patient’s subjective reports of their symptoms, and tons of other stuff, and it all shifts and changes over time as the disease progresses or recedes.  Is there any way to build a system that could present this data in any sort of a manageable way to allow a clinician to view meaningful trends that might provide insight into the course of disease that could help improve treatment?  Disappointingly, at least for now, the answer seems to be no, not really.

But the moment that I really knew that I wanted to work with this stuff was when we were talking about personalized medicine and genetic data.  In the case of GBM, as with many other diseases, certain medicines work very well on some patients, but fail almost completely in others.  Many factors could play into this, but there’s likely a large genetic component for why this should be.  Given enough data about the patients in whom these drugs worked and in whom they didn’t, then, could we potentially figure out in advance which drug could help someone?  Extrapolating from that, if we have enough health data about enough different patients, aren’t there endless puzzles we could solve just by examining the patterns that would emerge by getting enough information into a system that could make it comprehensible?

Perhaps that’s oversimplifying it, but I do think it’s fair to conceive of data as pure, unrefined knowledge.  When I look at a dataset, I don’t see a bunch of numbers or some random collection of information.  I imagine what potential lives within that data just waiting to be uncovered by the careful observation of some astute individual or a program that can pick out the patterns that no human could ever catch.  To me, raw data represents the final frontier of wild, untamed knowledge just waiting to be understood and explained, and to someone like me who is really in love with knowledge above all, that’s a pretty damn cool thing.

Yes, I know that writing a data management plan or figuring out what kind of metadata to use for a dataset is pretty boring.  I’m not denying that.  But sometimes you have to do some boring stuff to make cool things happen.  You have to get your oil changed if you want your Bugatti Veyron to do 0 to 60 in 2.5 seconds (I mean, I’m assuming those things have to get oil changes?).  You have to do the math to make sure your flight pattern is right if you want to shoot a rocket into space.  And you can’t find out all the cool secrets that live in your dataset if it’s a messy pile of papers sitting on your desk.  So the way I see it, my job is to make data management as easy and as interesting as possible so that the people who have the data will be able to unlock the secrets that are waiting for them.  So spread the word, my fellow data nerds.  Let’s make data management as cool as regular oral hygiene.  😉

Reading the Great Books of Science

It’s been ages since I posted here, and I can’t let all the blog readers down, can I?!  I’ve been up to all sorts of fantastically nerdy things lately, which have kept me rather too busy for blogging, and which I will probably report on here in due time.  For now, let’s talk science and books, which as we all know, are two of my favorite things (the other top contenders for my favorite things being dogs, champagne, and Paris).

One of the many perks of working at a major research institution is that really awesome people come speak here.  Case in point: a few weeks back, I had the opportunity to attend a Q&A session with James Watson, as in Watson and Crick, as in discoverers of the double helix structure of DNA.  True story: the Q&A ended at the exact same time as I had to be across campus for the start of a class, so I knew I was going to have to leave early.  When I told this to one of my bosses, who was also attending, she said, “you’re going to get up and walk out while James Watson is talking?”  And indeed, that is exactly what I did. 🙂

However, before I left, one of the things Watson had to say that struck me was regarding what he referred to as “the great books.”  I forget exactly how he put it, but he said that he had appreciated his schooling for exposing him to these great books, which had helped shape his thinking.  This statement reminded me of a blog post I’d recently read about Carl Sagan’s reading list, written in his own hand and excerpted from his papers, now held by the Library of Congress.  As the blog post I’d read eloquently puts it, is it possible to “reverse engineer” a great mind by following in that thinker’s literary footsteps?

I’m sure it’s not so simple as that, but in any case, I decided that I would like to add to my already completely ridiculous collection of to-read books by creating my own “great books of science” library.  Based on my research into what one might currently consider the important books in science (at least for the non-scientist), I’ve started my library with the following titles:

  1. Charles Darwin – The Origin of Species
  2. Richard Dawkins – The Selfish Gene
  3. Stephen Hawking – A Brief History of Time
  4. Matt Ridley – Genome: The Autobiography of a Species in 23 Chapters
  5. Carl Sagan – Cosmos

So far, I’m about 1/3 of the way into Genome, which I really enjoy (but I did also just start Haruki Murakami’s The Wind-Up Bird Chronicle, the reading of which has become a near-obsession that currently occupies almost all of my free time).  It’s a nice overview of evolution and genetics, though perhaps a little bit less technical than I would have liked, but certainly an enjoyable read.

So, dear blog readers, as you can see, my list is at present by no means comprehensive.  What would you add to a library of the “great books” of science?  Let me know in the comments so I can add to my Amazon wish list. 🙂

More Neuroscience Awesomeness and A Challenge for Librarians!

In my neuroscience class, we’ve now moved away from developmental neuroscience and into what I find way more interesting and the real reason I wanted to take the course: molecular neuroscience.  For the next three weeks, we’ll be learning how nerves communicate with each other.  Mostly this is through different channels that send stuff like ions and neurotransmitters in and out of cells.  We had a guest speaker who specializes in genetic neurological diseases, and she focused her talk specifically on what are called “channelopathies.”  That is, genetic diseases in which symptoms are caused by problems with these nerve channels.  Some of these problems are common – for example, many types of migraine are caused by channelopathies – but some are rare and super bizarre.

Here’s one of the rare and super bizarre ones the lecturer told us about: periodic paralysis is a condition in which the patient becomes temporarily but completely paralyzed, and then afterwards, they’re totally fine.  The paralysis can be brought on by all different kinds of things – stress, excitement, etc.  The lecturer told us about a really strange case of familial periodic paralysis that was found in a large family in Ireland.  Genetically, it’s autosomal dominant, meaning that if one parent has it, the children have a 50% chance of developing it.  So as one would expect, about half of this family is affected.  The trigger for this particular familial periodic paralysis is overeating.  The lecturer said “think of the gatherings this family must have.  They all get together and eat a big meal, and then half of them are paralyzed!”  Can you imagine, half of a family falling over paralyzed after dinner and then getting up and going home a few hours later like nothing ever happened?  Wouldn’t that make for some awkward family reunions?  Since the condition isn’t dangerous, I think it’s okay if we laugh a little bit at that image, right?  (Obviously familial periodic paralysis is not funny, and I’m definitely not making fun of it.  But don’t you have to admit that you’re wondering how different your family gatherings might have been had half of you been paralyzed for a while after dinner?)

This family and their condition intrigued me so much that as soon as I got home, I went to PubMed to see if I could find the case in the literature (I really can’t help it…I’m a librarian), but my searching has turned up nothing so far.  Therefore I am challenging the medical librarians out there to find me a case report.  If you find it, you will win….I don’t know, honor and glory.  🙂  So to run down again, here’s what we know:

  • autosomal dominant
  • channelopathy (I think she said on the potassium ion channel, which would make sense because I found lots of cases of hyperkalemic periodic paralysis)
  • familial periodic paralysis
  • overeating
  • probably an Irish family (the lecturer did specify Irish, but as every librarian knows, people often misremember these kinds of details, so probably best not to rely on this particular piece of information)

Alright, go! 🙂

(And by the way, if no one finds this within a week, I’ll email the guest lecturer and ask, but let’s try to save me the embarrassment of having to compose that bizarre email, shall we?)

Talking the Talk: Why Research Informationists Should Go to Class!

On a recent evening, I found myself wondering about neurotransmitters (like you do).  I had sort of a vague idea of how they worked, but it occurred to me that, as the liaison librarian to the departments of all brain-y things at UCLA (neurology, neuroscience, psychiatry, psychology, etc.), I’d probably be doing myself a favor if I learned a little bit more about these areas.  Thus it was that I came to enroll in Neuroscience 101B, an undergrad course in developmental and molecular neuroscience – that is, how the nervous system is formed during gestation, and how neurotransmitters and other molecular signalling methods work in the adult.  I had to contact the professor to get special permission to join the course, and he said I was welcome to freely attend the lectures if I wanted (as it’s huge and they don’t take roll), but I could also officially enroll, which would require that I take the three exams and complete weekly, page-long critical responses to recent articles in the field.  I thought to myself, “if I just audit this class, things will get busy during the quarter like they always do, and I’ll stop going.  But if I actually enroll and have to earn a grade, I have real incentive to learn this.”  So I decided to actually enroll, and I’m so glad I did – I’m only two and a half weeks in, but I can already see how taking this class is going to be so helpful to me as a librarian and research informationist.  Already I have started to get some benefits:

  1. Learn their language.  A mere sampling of the words and phrases that have entered my vocabulary in just two and a half weeks: ligand, rostral/caudal, filopodia, membrane diffusible, notochord, presynaptic compartment.  No, I did not make any of that up, and yes, I can define all of it.  In short, I am learning to speak the language of neuroscience.
  2. Learn their experimental methods.  Thanks to this class, I now know what two-photon microscopy is.  I know the exact procedure by which one creates a cranial window for imaging neurons via a craniotomy (don’t look it up.  Trust me.  It involves dental adhesive and super glue and it’s not at all pleasant).  I can explain several different experimental methods for examining neuronal activity, as well as various reasons why one would want to examine neuronal activity in the first place.  Understanding the how and why of the science makes such a huge difference in being able to understand the how and why of their research methods.  Obviously, for a research informationist, this is key.
  3. Learn the big names in the field.  Though I live in LA, I’m not one to name drop. 🙂  However, I will say this about neuroscience, in my experience of it: you are going to learn to recognize the people who did the big experiments (and it’s probably true of other fields as well).  For one thing, you can’t help but know them because there’s stuff named after them (see for example the Cajal-Retzius cell and the interstitial cell of Cajal, both named for an evidently reclusive Spanish Nobel Prize winner who spent hours and hours of his life dyeing nerves to study them and thus ended up discovering tons of stuff).  But even when there’s not something named after the researcher, you still learn who did the experiment, and I get the feeling this kind of thing might even be on the exam.  I appreciate that about the field – credit where credit is due, right?  More importantly, it’s interesting to learn the big names who are currently doing research in the field, particularly when those big names happen to be on my campus and publishing in Nature and such.  When I hear those things in lecture, that is something I definitely file away for later.
  4. Learn about the department (and have them learn about me).  When I contacted the professor to ask to take the course and told him why, I have a feeling that was probably the first time he even knew he had a liaison librarian.  Now, not only do he and the other two class professors know I’m here, but they also know that I’m interested in what they do.  Plus, I’m learning all sorts of things about the department (such as the fact that they have TONS of seminars and lectures I’d never heard about) as well as things about the student experience, so I have more of a context in which to understand the kind of research assistance these students might need.
  5. Learn fantastic trivia for more interesting conversations.  Okay, not an entirely serious reason, but a nice side effect of the course.  For example, did you know that in a rat, each whisker is connected to a single neuron?  I assume the same is true for dogs, so now I like to bug Ophelia by touching a single whisker and wondering which neuron it’s setting off.  (I explained to her that it’s for science, but she still seems annoyed by it.)  Or how about that there are proteins and neurotransmitters with names like Sonic hedgehog, Dickkopf (means big head in German!), and Frizzled?

All of this is important to me because I love working with researchers and I feel like I can more legitimately sit at the table now, so to speak.  Obviously the knowledge I’m getting from one undergrad survey class is hardly enough to get me up to speed on something so complex as the nervous system, but at least now I feel like I understand all of those brain-y departments better, especially in terms of the research they’re conducting.

I do want to emphasize that I don’t think a degree in a science field is necessary for a research informationist or other librarian who is interested in working with clinical or basic science researchers.  Some of the best science/medical librarians I know have liberal arts degrees: political science, English, philosophy, etc.  Regardless of your educational background, though, I think the best science librarians are those who are able to learn how to adapt to the field and learn the language and culture of the science they work with.  Like different regions of the United States, each scientific field has its own dialect and “regional” traditions and practices.  If you don’t know how to operate in that language and tradition, you are pretty obviously an outsider.  But…if you want to slip in amongst them…it’s easy enough to do so if you have a little knowledge.  Taking a class is not necessarily for everyone.  I don’t know many adult professional people who would voluntarily spend their weekend studying for a neuroscience exam (I’m lame, I know, but look, I really want an A), but for those librarians who can manage it, I can’t speak highly enough of the experience.  Fortunately for those who are not quite as insane (er, ambitious) as I am, there are other ways of gaining knowledge too, like checking out Data Curation Profiles, going to open lectures and grand rounds, talking to researchers about their work, and, erm, reading Wikipedia.  🙂

Of course I say all this now, but I might be singing a different tune after my first exam this Monday. Now, if you’ll excuse me, I have to go remind myself about the three different mechanisms by which synaptic topography is modeled in the developing nervous system.

Cool Science: Crowdsourcing Big Data

Anyone who knows me at all knows I really like data.  It’s a tremendously nerdy interest, but I find data really fascinating, I guess in part because I love the idea that there is some great knowledge that’s hidden in the numbers, just waiting for someone to come along and dig it out.  What’s very cool is that we live in an age when technology allows us to generate massive amounts of data.  For example, the Large Hadron Collider generates more than 25 petabytes a year in data, which is more than 70 terabytes a day.  A DAY.  Some data analysis can be done by computers, but some of it really has to be done by people.  Plus, some studies really rely on the ability to gather data from massive groups of people in order to get an adequate sample from various groups to prove what you’re trying to show.  To solve these and other “big data” problems, some very smart and cool research groups have jumped on the crowdsourcing bandwagon and are having people from around the world get online and help solve the problems of data gathering and analysis.  Here are some cool projects I’ve heard about.

Eyewire: a group of researchers working on retinal connectomes at MIT found a fascinating way to get people to help with their data analysis – turn it into a game.  They have a good wiki that explains the project in depth, but the gist of it is that these researchers have microscopic scans of neurons from the retina.  Neurons are a huge tangled mess, so their computers could figure out how some of them fit together, but it really takes an actual person to go in and figure out what’s connected and what’s not.  So this team turned it into this 3D puzzle/game thing that’s really hard to explain unless you try it.  You go through a tutorial to learn how to use the system, and then you’re turned loose to start mapping neurons!  It’s not like the most compelling game I’ve ever played or something I’d spend hours doing, but it is interesting, and it helps neuroscience, so that’s pretty cool.

Small World of Words: this study aims to better understand human speech and how we subconsciously create networks of associations among words.  To do so, they set up a game to gather word associations from native and non-native English speakers.  Again, I wouldn’t necessarily call this a game in the sense of “woohoo, we’re having so much fun!” but it is kind of interesting to see what your brain comes up with when you’re given a set of random words.  (Plus it’s perhaps a little telling of your own psychological state if you really think about the words you’re coming up with.)  It takes like 2 minutes to do, and again, it’s contributing to science!  Also, according to their website, they are making their dataset publicly available, which as a research informationist/data librarian I wholeheartedly endorse.

Foldit: I haven’t played this yet, so I can’t speak to how fun it is (or boring), but it sounds similar to Eyewire in the sense of being a puzzle in which the players are helping to map a structure – in this case proteins.  Proteins are long chains of amino acids, but they fold up in certain ways that determine their function.  Knowing more about this folding structure makes it possible to create better drugs and understand the pathology of diseases.  For example, one of the things this project is looking at is proteins that are crucial for HIV to replicate itself within the human body.  Better understanding of the structure of these proteins could help contribute to drugs to treat HIV and AIDS.

So I encourage you to go play some games for science!  Do it now!  And if you’re at work and someone tries to stop you, just politely explain that you’re not playing a game – you’re curing AIDS.  🙂

The Librarian’s First Dataset: A Treatise on Incredible Nerdiness

I must preface this post by saying that, if you didn’t know already, I’m a huge nerd.  The biggest.  There’s nothing I’m more passionate about than knowledge and learning, and this has often earned me very perplexed looks from people who probably think I’m crazy.  In this post, I’m going to wax poetic about knowledge and reveal the depths of my geekiness.  However, I’m guessing if you’re here reading this blog, this is probably not going to come as any sort of a surprise to you.

For the last few weeks, I’ve been working on planning a research data management class.  Working with researchers on their data is hands-down my favorite part of my job.  I adore science and the best part of being a medical librarian/research informationist is that I get to work with all different researchers and hear about all sorts of fascinating things.  Sometimes I regret that I didn’t get a science degree, but mostly I’m okay with it because this job allows me to get my hands into all sorts of different things and never have to choose a specialty. Talking to researchers is fascinating.  However, the more I talk to them, the more I realize that a lot of them really have no idea what they’re doing when it comes to data management.  These are brilliant people, to be sure, but the way they handle their data makes me cringe.  They’ve never been trained to do it properly, but as a librarian, I have that training.  Part of what I do is helping people with their data, but I also believe in the adage about giving a man a fish versus teaching him to fish.  I’m one librarian in a huge research enterprise.  As much as I’d like to, there’s no way I could possibly reach everyone to personally help them figure out their data.  So one of the things I decided to do to help mitigate the fact that I can’t be in eight million places at once is to offer a class on research data management.

Because I work in the field of medicine, in which everything must be evidence-based, of course I wasn’t satisfied just to offer a class and hope people liked it.  I am a data librarian, so I decided that I should probably gather some data!  My plan was to devise a pre-test that people would take before the class, then a follow-up post-test.  Obviously the goal was that they wouldn’t know the answers to the questions on the pre-test, and then they would after the class.  I spent weeks agonizing over how best to assess this.  I’ve had very, very preliminary training in devising assessment instruments, but mostly I was just kind of taking a shot in the dark when I came up with my pre-test.  I changed the questions a million times, but I finally came up with something that I thought would probably work.

Today, our office manager sent out the reminder email about tomorrow’s class to those who had RSVP’d.  The email contained a link to the survey and a brief explanation of why I was asking people to complete it.  It was a short survey, took only a couple minutes to complete, but I had this sinking feeling that everyone would ignore it.  Because of IRB (Institutional Review Board) requirements, I had emphasized in the email that people weren’t required to take the survey in order to take the class.  I figured people would see that and just ignore the survey, but I was keeping my fingers crossed.  I was on the train to the airport in San Francisco on my way back to Los Angeles when I saw that the email had gone out.

So now, allow me to set the scene for one of the nerdiest moments of my life.  I had gotten to the airport and had some time to kill before my flight, so I was sitting in a wine bar getting something to eat (and drink of course).  I ordered a glass of Champagne (yeah, that’s how I roll) and pulled out my laptop.  I was logging on when the Champagne arrived.  I pulled up the survey site.  The email had only gone out maybe an hour or so earlier, so I wasn’t expecting any responses yet.  But when I logged on, you know what I found?  Almost EVERY SINGLE PERSON who had registered for the class had taken the survey!  When I saw the number of responses, I made an audible, astonished gasp, and several people in the restaurant turned and looked at me.  I refrained from getting up from my seat and jumping up and down in excitement, though this is what I would have done if I had been alone. 🙂

Not only did people respond to my survey, but they responded exactly as I hoped they would.  I won’t go into detail here, since obviously I’m going to attempt to publish all of this in a peer-reviewed journal.  🙂  But essentially, these pre-test results reveal that, as I had suspected, these people really need a lot of help with this stuff and don’t have a lot of knowledge of the many awesome resources out there.  Hopefully that will all change tomorrow when I teach this class.

So that is the story of how I came to have my very own research dataset.  This is incredibly heartening for me.  For one thing, I’ve always felt like I really ought to have more hands-on experience working with data if I’m going to teach it.  My dataset is super tiny compared to the datasets I help researchers with, but this is a good start.  More importantly, I am so excited that this actually worked.  I’ve been wanting to move forward with additional research in this area, but I wasn’t entirely sure if it was worthwhile, since I basically only had anecdotal evidence to suggest this kind of thing was needed, and there have been a few naysayers whose words weighed heavily on my mind.  I’ve worked really hard on all of this, and it’s been exhausting, especially with having to work around sort of a crazy travel schedule.  But now it feels like things are all falling into place.  All those little ideas I’ve had floating around in my mind about additional research I’d like to do feel a little more feasible now.  So it’s an exciting time for me career-wise.  Now that I’m a little more assured that I know what I’m doing, I have some good ideas about how to move forward. I’ve got a hunger for data and research now and I need more. 🙂

So yeah, again, probably news to no one, but I’m a huge nerd.  Now, in celebration, I’m going to order a second glass of Champagne to enjoy in the hour before I have to catch my flight.  Cheers!