Can you hack it? On librarian-ing at hackathons

I had the great pleasure of spending the last few days working on a team at the latest NCBI hackathon.  I think this is the sixth hackathon I’ve been involved in, but this is the first time I’ve actually been a participant, i.e. a “hacker.”  Prior to working on these events, I’d heard a little bit about hackathons, mostly in the context of competitive hackathons – a bunch of teams compete against each other to find the “best” solution to some common problem, usually with the winning team receiving some sort of cash prize.  This approach can lead to successful and innovative solutions to problems in a short time frame.  However, the so-called NCBI-style hackathons that I’ve been involved in over the last couple years involve multiple teams each working on their own individual challenge over a period of three days. There are no winners, but in my experience, everyone walks away having accomplished something, and some very promising software products have come out of these hackathons.  For more specifics about the how and why of this kind of hackathon, check out the article I co-authored with several participants and the mastermind behind the hackathons, Ben Busby of NCBI.

As I said, this was the first hackathon that I’ve actually been involved in as a participant on a team, but I’ve had a lot of fun doing some librarian-y type “consulting” for five other hackathons before this, and it’s an experience I can highly recommend for any information professional who is interested in seeing science happen in real time.  There’s something very exciting about watching groups of people from different backgrounds, with different expertise, most of whom have never met each other before, get together on a Monday morning with nothing but an often very vague idea, and end up on Wednesday afternoon with working software that solves a real and significant biomedical research problem.  Not only that, but most of the groups manage to get pretty far along on writing a draft of a paper by that time, and several have gone on to publish those papers, with more on their way out (see the F1000Research Hackathons channel for some good examples).

As motivated and talented as all these hackathon participants are, as you can imagine, it takes a lot of organizational effort and background work to make something like this successful.  A lot of that work needs to be done by someone with a lot of scientific and computing expertise.  However, if you are a librarian who is reading this, I’m here to tell you that there are some really exciting opportunities to be involved with a hackathon, even if you are completely clueless when it comes to writing code.  In the past five hackathons, I’ve sort of functioned as an embedded informationist/librarian, doing things like:

  • basic lit searching for paper introductions and generally locating background information.  These aren’t formal papers that require an extensive or systematic lit review, but it’s useful for a paper to provide some context for why the problem is significant.  The hackers have a ton of work to fit into three days, so it’s silly to have them spend their limited time on lit searching when a pro librarian can jump in and likely use their expertise to find things more easily anyway.
  • manuscript editing and scholarly communication advice.  Anyone who has worked with co-authors knows that it takes some work to make the paper sound cohesive, and not like five or six people’s papers smushed together.  Having someone like a librarian with editing experience to help make that happen can be really helpful.  Plus, many librarians have relevant expertise in scholarly publishing, especially useful since hackathon participants are often students and earlier career researchers who haven’t had as much experience with submitting manuscripts.  They can benefit from advice on things like citation management and handling the submission process.  Also, I am a strong believer in having a knowledgeable non-expert read any paper, not just hackathon papers.  Often writers (and I absolutely include myself here) are so deeply immersed in their own work that they make generous assumptions about what readers will know about the topic.  It can be helpful to have someone who hasn’t been involved with the project from the start take a look at the manuscript and point out where additional background or explanation would make the paper easier to understand.
  • consulting on information seeking behavior and giving user feedback.  Most of the hackathons I’ve worked on have had teams made up of all different types of people – biologists, programmers, sys admins, other types of scientists.  They are all highly experienced and brilliant people, but most have a particular perspective related to their specific subject area, whereas librarians often have a broader perspective based on our interactions with lots of people from different subject areas.  I often find myself thinking of how other researchers I’ve met might use a tool in other ways, potentially ones the hackathon creators didn’t necessarily intend.  Also, at least at the hackathons I’ve been at, some of the tools have definite use cases for librarians – for example, tools that involve novel ways of searching or visualizing MeSH terms or PubMed results.  Having a librarian on hand to give feedback about how the tool will work can be useful for teams with that kind of a scope.

I think librarians can bring a lot to hackathons, and I’d encourage all hackathon organizers to think about engaging librarians in the process early on.  But it’s not a one-way street – there’s a lot for librarians to gain from getting involved in a hackathon, even tangentially.  For one thing, seeing a project go from idea to reality in three days is interesting and informative.  When I first started working with hackathons, I didn’t have that much coding experience, and I certainly had no idea how software was actually developed.  Even just hanging around hackathons gave me so much of a better understanding, and as an informationist who supports data science, that understanding is very relevant.  Even if you’re not involved in data science per se, if you’re a biomedical librarian who wants to gain a better understanding of the science your users are engaged in, being involved in a hackathon will be a highly educational experience.  I hadn’t really realized how much I had learned by working with hackathons until a librarian friend asked me for some advice on genomic databases. I responded by mentioning how cool it was that ClinVar would tell you about pathogenic variants, including their location and type (insertion, deletion, etc), and my friend was like, what are you even talking about, and that was when it occurred to me that I’ve really learned a lot from hackathons!  And hey, if nothing else, there tends to be pizza at these events, and you can never go wrong with pizza.

I’ll end this post by reiterating that these hackathons aren’t about competing against each other, but there are awards given for certain “exemplary” achievements.  Never one to shy away from a little friendly competition, I hoped I might be honored for some contribution this time around, and I’m pleased to say I was indeed recognized. 🙂

There is a story behind this, but trust me when I say it’s true, I’m the absolute worst at darts.

So you think you can code

I’ve been thinking about many ideas lately dealing with data and data science (this is, I’m sure, not news to anyone).  I’ve also had several people encourage me to pick my blog back up, and I’ve recently made my den into a cute and comfy little office, so, why not put all this together and resume blogging with a little post about my thoughts on data!  In particular, in this post I’m going to talk about coding.

Early on in my library career when I first got interested in data, I was talking to one of my first bosses and told her I thought I should learn R, which is essentially a scripting language, very useful for data processing, analysis, statistics, and visualization.  She gave me a sort of dubious look, and even as I said it, I was thinking in my head, yeah, I’m probably not going to do that.  I’m no computer scientist.  Fast forward a few years, and not only have I actually learned R, it’s probably the single most important skill in my professional toolbox.

Here’s the thing – you don’t have to be a computer scientist to code, especially in R.  It’s actually remarkably straightforward, once you get over the initial strangeness of it and get a feel for the syntax.  I started offering R classes around the beginning of this year and I call my introductory classes “Introduction to R for Non-programmers.”  I had two reasons for selecting this name: one, I had only been using R for less than a year myself and didn’t (and still don’t) consider myself an expert.  When I started thinking about getting up in front of a room of people and teaching them to code, I had horrifying visions of experienced computer scientists calling me out on my relative lack of expertise, mocking my class exercises, or correcting me in front of everyone.  So, I figured, let’s set the bar low. 🙂  More importantly, I wanted to emphasize that R is approachable!  It’s not scary!  I can learn it, you can learn it.  Hell, young children can (and do) learn it.  Not only that, but you can learn it from one of a plethora of free resources without ever cracking a book or spending a dime.  All it takes is a little time, patience, and practice.

The payoff?  For one thing, you can impress your friends with your nerdy awesome skills!  (Or at least that’s what I keep telling myself.)  If you work with data of any kind, you can simplify your work, because using R (or other scientific programming languages) is faaaaar more efficient than using other point and click tools like Excel.  You can create super awesome visualizations, do crazy data analysis in a snap, and work with big huge data sets that would break Excel.  And you can do all of this for free!  If you’re a research and/or medical librarian, you will also make yourself an invaluable resource to your user community.  I believe that I could teach an R class every day at my library and there would still be people showing up.  We regularly have waitlists of 20 or more people.  Scientists are starting to catch on to all the reasons I’ve mentioned above, but not all of them have the time or inclination to use one of the free online resources.  Plus, since I’m a real human person who knows my users and their research and their data, I know what they probably want to do, so my classes are more tailored to them.

I was introduced to Hadley Wickham yesterday, who is a pretty big deal in the R world, as he created some very important R packages (kind of like apps), and my friend and colleague who introduced me said, “this is Lisa; she is our prototypical data scientist librarian.”  I know there are other librarian coders out there because I’m on mailing lists with some of them, but I’m not currently aware of any other data librarians or medical librarians who know R.  I’m sure there are others and I would be very interested in knowing them.  And if it is fair to consider me a “prototype,” I wonder how many other librarians will be interested in becoming data scientist librarians.  I’m really interested in hearing from the librarians reading this – do you want to code?  Do you think you can learn to code?  And if not, why not?

Radical Reuse: Repurposing Yesterday’s Data for Tomorrow’s Discoveries

I’ve been invited to be a speaker at this evening’s Health 2.0 STAT meetup at Bethesda’s Barking Dog, alongside some pretty awesome scientists with whom I’ve been collaborating on some interesting research projects.  This invitation is a good step toward my ridiculously nerdy goal of one day being invited to give a TED talk.  My talk, entitled “Radical Reuse: Repurposing Yesterday’s Data for Tomorrow’s Discoveries,” will briefly outline my view of data sharing and reuse, including what I view as five key factors in enabling data reuse.  Since I have only five minutes for this talk, obviously I’ll be hitting only some highlights, so I decided to write this blog post to elaborate on the ideas in that talk.

First, let’s talk about the term “radical reuse.”  I borrow this term from the realm of design, where it refers to taking discarded objects and giving them new life in some context far removed from their original use.  For some nice examples (and some cool craft ideas), check out this Pinterest board devoted to the topic.  For example, shipping pallets are built to fulfill the specific purpose of providing a base for goods in transport.  The person assembling that shipping pallet, the person loading it on to a truck, the person unpacking it, and so on, use it for this specific purpose, but a very creative person might see that shipping pallet and realize that they can make a pretty cool wine rack out of it.

The very same principle is true of scientific research data.  Most often, a researcher collects data to test some specific hypothesis, often under the auspices of funding that was earmarked to address a particular area of science.  Maybe that researcher will go on to write an article that discusses the significance of this data in the context of that research question.  Or maybe that data will never be published anywhere because it represents negative or inconclusive findings (for a nice discussion of this publication bias, see Ben Goldacre’s 2012 TED talk).  Whatever the outcome, the usefulness of the dataset need not end when the researcher who gathered the data is done with it.  In fact, that data may help answer a question that the original researcher never even conceived of, perhaps in an entirely different realm of science.  What’s more, the return on investment in that data increases when it can be reused to answer novel questions, science moves more quickly because the process of data gathering need not be repeated, and therapies potentially make their way into practice more quickly.

Unfortunately, science as it is practiced today does not particularly lend itself to this kind of radical reuse.  Datasets are difficult to find, hard to get from researchers who “own” them, and often incomprehensible to those who would seek to reuse them.  Changing how researchers gather, use, and share data is no trivial task, but to move toward an environment that is more conducive to data sharing, I suggest that we need to think about five factors:

  • Description: if you manage to find a dataset that will answer your question, it’s unlikely that the researcher who originally gathered that data is going to stand over your shoulder and explain the ins and outs of how the data were gathered, what the variables or abbreviations mean, or how the machine was calibrated when the data were gathered.  I recently helped some researchers locate data about influenza, and one of the variables was patient temperature.  Straightforward enough.  Except the researchers asked me to find out how temperature had been obtained – oral, rectal, tympanic membrane – since this affects the reading.  I emailed the contact person, and he didn’t know.  He gave me someone else to talk to, who also didn’t know.  I was never able to hunt down the answer to this fairly simple question, which is pretty problematic.  To the extent possible, data should be thoroughly described, particularly using standardized taxonomies, controlled vocabularies, and formal metadata schemas that will convey the maximum amount of information possible to potential data re-users or other people who have questions about the dataset.
  • Discoverability: when you go into a library, you don’t see a big pile of books just lying around and dig through the pile hoping you’ll find something you can use.  Obviously this would be ridiculous; chances are you’d throw up your hands in dismay and leave before you ever found what you were looking for.  Librarians catalog books, shelve them in a logical order, and put the information into a catalog that you can search and browse in a variety of ways so that you can find just the book you need with a minimal amount of effort.  And why shouldn’t the same be true of data?  One of the services I provide as a research data informationist is assisting researchers in locating datasets that can answer their questions.  I find it to be a very interesting part of my job, but frankly, I don’t think you should have to ask a specialist in order to find a dataset, any more than I think you should have to ask a librarian to go find a book on the shelf for you.  Instead, we need to create “catalogs” that empower users to search existing datasets for themselves.  Databib, which I describe as a repository of repositories, is a good first step in this direction – you can at least use it to find a data repository that might have the kind of data you’re looking for, but we need to go even further and do a better job of cataloging well-described datasets so researchers can easily find them.
  • Dissemination: sometimes when I ask researchers about data sharing, the look of horror they give me is such that you’d think I’d asked them whether they’d consider giving up their firstborn child.  And to be fair, I can understand why researchers feel a sense of ownership about their data, which they have probably worked very hard to gather.  To be clear, when I talk about dissemination and sharing, I’m not suggesting that everyone upload their data to the internet for all the world to access.  Some datasets have confidential patient information, some have commercial value, some even have biosecurity implications, like H5N1 flu data that a federal advisory committee advised be withheld out of fear of potential bioterrorism.  Making all data available to anyone, anywhere is neither feasible nor advisable.  However, the scientific and academic communities should consider how to increase the incentives and remove the barriers to data sharing where appropriate, such as by creating the kind of data catalogs I described above, raising awareness about appropriate methods for data citation, and rewarding data sharing in the promotion and tenure process.
  • Digital Infrastructure: okay, this is normally called cyberinfrastructure, but I had this whole “words starting with the letter D” thing going and I didn’t want to ruin it. 🙂  If we want to do data sharing properly, we need to build the tools to manage, curate, and search it.  This might seem trivial – I mean, if Google can return 168 million web pages about dogs for me in 0.36 seconds, what’s the big deal with searching for data?  I’m not an IT person, so I’m really not the right person to explain the details of this, but as a case in point, consider the famed Library of Congress Twitter collection.  The Library of Congress announced that they would start collecting everything ever tweeted since Twitter started in 2006.  Cool, huh?  Only problem is, at least as of January 2013, LC couldn’t provide access to the tweets because they lacked the technology to allow such a huge dataset to be searched.  I can confirm that this was true when I contacted them in March or April of 2013 to ask about getting tweets with a specific hashtag that I wanted to use to conduct some research on the sociology of scientific data sharing, and they turned me down for this reason.  Imagine the logistical problems that would arise with even bigger, more complex datasets, like those associated with genome-wide association studies.
  • Data Literacy: Back in my library school days, my first ever library job was at the reference desk at UCLA’s Louise M. Darling Biomedical Library.  My boss, Rikke Ogawa, who trained me to be an awesome medical librarian, emphasized that when people came and asked questions at the reference desk, this was a teachable moment.  Yes, you could just quickly print out the article the person needed because you knew PubMed inside and out, but the better thing to do was turn that swiveling monitor around and show the person how to find the information.  You know, the whole “give a man a fish and he’ll eat for a day, teach a man to fish and he’ll eat for a lifetime” thing.  The same is true of finding, using, and sharing data.  I’m in the process of conducting a survey about data practices at NIH, and almost 80% of the respondents have never had any training in data management.  Think about that for a second.  In one of the world’s most prestigious biomedical research institutions 80% of people have never been taught how to manage data.  Eighty per cent.  If you’re not as appalled by that as I am, well, you should be.  Data cannot be used to its fullest if the next generation of scientists continues with the kind of makeshift, slapdash data practices I often encounter in labs today.  I see the potential for more librarians to take positions like mine, focusing on making data better, but that doesn’t mean that scientists shouldn’t be trained in at least the basics of data management.
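
To make the Description factor a little more concrete, here is a minimal sketch in Python of a machine-readable data dictionary, plus a check that flags variables whose description is incomplete.  Everything in it is an assumption made up for illustration: the variable names (patient_temp, age_years), the required fields (label, unit, method), and the helper function undocumented are hypothetical and don’t come from any real dataset or metadata standard.  The point is just that a missing measurement method, like the one in the influenza example, is the kind of gap that can be caught before a dataset is ever shared.

```python
# Hypothetical data dictionary for an influenza dataset.
# All variable names, fields, and values are invented for illustration.
data_dictionary = {
    "patient_temp": {
        "label": "Patient temperature",
        "unit": "degrees Celsius",
        "method": None,  # oral, rectal, or tympanic? Nobody wrote it down.
    },
    "age_years": {
        "label": "Patient age",
        "unit": "years",
        "method": "self-reported at intake",
    },
}


def undocumented(dictionary, required=("label", "unit", "method")):
    """Return (variable, field) pairs whose required description is missing or empty."""
    gaps = []
    for variable, description in dictionary.items():
        for field in required:
            if not description.get(field):
                gaps.append((variable, field))
    return gaps


gaps = undocumented(data_dictionary)
for variable, field in gaps:
    print(f"{variable}: no '{field}' recorded")
```

A real project would more likely use an established metadata schema or controlled vocabulary than a hand-rolled dictionary like this one, but even a simple check along these lines makes an undocumented variable visible while the person who knows the answer is still around to ask.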

So that’s my data sharing manifesto.  What I propose is not the kind of thing that can be accomplished with a few quick changes.  It’s a significant paradigm shift in the way that data are collected and science is practiced.  Change is never easy and rarely embraced right away, but in the end, we’re often better for having challenged ourselves to do better than we’ve been doing.  Personally, I’m thrilled to be an informationist and librarian at this point in history, and I look forward to fondly reminiscing about these days in our data-driven future. 🙂

A Week in the Life: Tuesday

Tonight, your friendly research informationist almost didn’t get around to posting a blog because I just now finished getting caught up on some work (but to be fair, there were a lot of interruptions from the resident pup, who never gets tired of playing Squirrelly or chasing the ball, even when mom is working).  However, I promised a full week of updates, and I’m not about to stop after only one day.  So, for those inquiring minds who want to know, here’s what I got up to today.

  1. Attended the weekly meeting for my department, which is called Research, Instruction, and Collection Services.  Basically we catch each other up on the various goings-on in our department.  Though there are only 6 of us, we are all crazy busy fiends, so it’s nice to have an hour a week in which we find out what everyone is up to.
  2. Gave an orientation and overview of library services to first year students in the psychology graduate program.  It was a small group, but they were very interested in what I had to say, which is always nice, and had lots of questions.
  3. Went to a meeting about the UCLA Library’s Affordable Courseware Initiative, which is a program in which we’re offering grants to professors who update their course syllabi to offer free/open access/low cost alternatives to textbooks and other paid course materials.  Rather shockingly and disconcertingly, the price of college textbooks has risen 812% since 1978.  By comparison, the consumer price index has risen around 250%.  With tuition also increasing significantly in the last few years, particularly in California, students are being hit pretty hard financially.  This initiative is designed to help mitigate some of those costs.  A similar program at UMass Amherst resulted in $750,000 savings for students from a $20,000 initial investment, which is a pretty good ROI if you ask me.  So it will be interesting to see how this all goes at UCLA.
  4. I’m the chair of the committee for speakers for the Medical Library Group of Southern California and Arizona/Northern California and Nevada Medical Library Group Joint Meeting that is coming up in July, so today I worked on getting together some information and sending some emails for that.
  5. Continued more work on NIH Public Access Policy as described yesterday.  Every time I send an email to the NIH Manuscript Submission System help desk, I feel like starting it “hello, it’s ME AGAIN!!!”  But the nice thing about doing this work is that people are genuinely happy to have the help and the results are pretty immediate.
  6. Continued the work on the NCBI course as described yesterday.
  7. Answered a gazillion more emails.
  8. Finished some ordering for my public health funds (yay!), but I still have a lot to do on my other stuff.
  9. The whole department cornered one of our coworkers who was celebrating a birthday today and sang Happy Birthday to him.  🙂
  10. Filled out paperwork for upcoming travel, of which there is quite a bit.  I never knew librarians traveled so much, but I have been on the road pretty often this year.  I think between September 2012 and August 2013, I will have taken about 12 business trips.  And there is a LOT of paperwork that goes along with all of it.  But I’m super lucky to be able to go to some very interesting meetings and take some very cool courses.

A Week in the Life of a Research Informationist: Monday

So recently my job title changed from Health and Life Sciences Librarian to Research Informationist, which is pretty cool, except that now instead of people assuming I spend my day shelving books and thinking about the Dewey Decimal System, they basically have no idea what it is I do.  I’m pretty sure my friends and family have absolutely no idea what I do for a living.  In fact, I’m not sure my co-workers even really know for sure.  One of my colleagues suggested I ought to write about what a research informationist does, and since I haven’t blogged here in ages, I thought this would be a good time to spread the word of what a research informationist is/does.  Right around the time I thought I should write this blog series, another research informationist, the lovely and talented Sally Gore, beat me to it by writing about it on her blog.   But hey, you can never have too many research informationists talking about their awesome jobs, right?

With that, I give you the activities of my Monday.

  1. I spent a lot of time helping several people trying to figure out the NIH Public Access Policy.  To vastly simplify, I would summarize the policy by saying if you get NIH grant money, you have to make your articles that come out of that funding available in PubMed Central (PMC), the open access repository of the National Library of Medicine.  In truth, the policy and the myriad different things you have to do to comply with it are quite complex.  NIH has recently announced that they would start enforcing the policy by delaying grant renewals to researchers who aren’t in compliance, so this means that I’m getting a lot of calls from people who are having to catch up on five years’ worth of article submissions.  In theory, I like this policy and I think it’s really important in getting medical literature to clinicians and researchers who wouldn’t be able to afford it otherwise, but in practice, it’s really confusing for people because there are so many different ways you can comply and also lots of ways things can go wrong.  I would like for it to be a lot easier for researchers to get their work into PMC so they and their staff don’t have to spend a lot of time freaking out about this.  However, in the meantime, I help a lot of people who need to figure this stuff out and in so doing have become more of an expert on the policy than I ever wanted to.
  2. I’m working on a couple of search strategies for researchers who are writing systematic reviews.  These are articles that essentially summarize the body of literature on a particular question.  This is nice because a busy clinician can then just read one article instead of having to go find the hundreds or thousands that are relevant to the question. Plus, when you gather a lot of data and consider it all together, you can get a better sense of what’s really going on than if you just had a small sample.  However, identifying all of the relevant literature is pretty challenging, so it’s useful to have a librarian/research informationist help out as an “expert searcher” or as I like to think of it, a “PubMed whisperer.”  Putting these searches together is pretty time-consuming, plus I help the researchers manage the workflow of analyzing the articles that my searches turn up.  So today I helped out some of the researchers I’m working with on those articles, including getting them set up with Mendeley, a very cool citation management program.
  3. I’m a member of the Medical Library Group of Southern California and Arizona and the chair of their blog committee, so today I had to do some work with getting some entries up on the blog.
  4. Another one of my responsibilities is collection development, or buying stuff for the departments to whom I am the liaison librarian, which include public health, psychology, and some others.  I’ve been so busy that I’ve kind of been putting off my ordering, so I have to find a lot of stuff to buy in the next couple weeks.  You’d think getting to spend lots of money on books would be great, but it is less so when it’s in the context of work.  Plus, I can never find exactly what I want.  For example, my public health students ask a lot of questions about two fairly obscure and relatively specific topics: water consumption and usage in the context of health care, and food deserts (urban areas where it’s hard to find healthy food so people end up eating junk food and whatever they can get at convenience stores).  So I wanted to buy some books that would help them out with this, but it’s harder than you’d think!  This project will be carried over to tomorrow.
  5. I’m taking a very cool online/in-person course called Librarian’s Guide to NCBI.  The course covers some bioinformatics tools that are particularly relevant to people doing work in genetics and molecular biology.  As a research informationist, I think it’s important to be able to provide a high level of specialized assistance to researchers, so learning more about these tools is essentially adding some more stuff to my toolbox. I did the first week’s module today (although it’s the second week, so I’m already behind).  Most of the material in this first lecture was stuff I pretty much already knew, but I played around a little bit with some of the tools and searched around a bit in NLM’s Gene database.
  6. I manage our four library school graduate students who work on our reference desk, and today we had our monthly training session.  There’s really a lot you need to know to work at the reference desk of a busy biomedical library, and these students do a fantastic job, but the learning is never really over.
  7. Email.  I answered a gazillion emails.  The email never ends.

I did some other random stuff, but that’s the main stuff I did today.  Phew.  🙂

The Librarian’s First Dataset: A Treatise on Incredible Nerdiness

I must preface this post by saying that, if you didn’t know already, I’m a huge nerd.  The biggest.  There’s nothing I’m more passionate about than knowledge and learning, and this has often earned me very perplexed looks from people who probably think I’m crazy.  In this post, I’m going to wax poetic about knowledge and reveal the depths of my geekiness.  However, I’m guessing if you’re here reading this blog, this is probably not going to come as any sort of a surprise to you.

For the last few weeks, I’ve been working on planning a research data management class.  Working with researchers on their data is hands-down my favorite part of my job.  I adore science and the best part of being a medical librarian/research informationist is that I get to work with all different researchers and hear about all sorts of fascinating things.  Sometimes I regret that I didn’t get a science degree, but mostly I’m okay with it because this job allows me to get my hands into all sorts of different things and never have to choose a specialty. Talking to researchers is fascinating.  However, the more I talk to them, the more I realize that a lot of them really have no idea what they’re doing when it comes to data management.  These are brilliant people, to be sure, but the way they handle their data makes me cringe.  They’ve never been trained to do it properly, but as a librarian, I have that training.  Part of what I do is helping people with their data, but I also believe in the adage about giving a man a fish versus teaching him to fish.  I’m one librarian in a huge research enterprise.  As much as I’d like to, there’s no way I could possibly reach everyone to personally help them figure out their data.  So one of the things I decided to do to help mitigate the fact that I can’t be in eight million places at once is to offer a class on research data management.

Because I work in the field of medicine, in which everything must be evidence-based, of course I wasn’t satisfied just to offer a class and hope people liked it.  I am a data librarian, so I decided that I should probably gather some data!  My plan was to devise a pre-test that people would take before the class, then a follow-up post-test.  Obviously the goal was that they wouldn’t know the answers to the questions on the pre-test, and then they would after the class. I spent weeks agonizing over how best to assess this. I’ve had very, very preliminary training in devising assessment instruments, but mostly I was just kind of taking a shot in the dark when I came up with my pre-test. I changed the questions a million times, but I finally came up with something that I thought would probably work.

Today, our office manager sent out the reminder email about tomorrow’s class to those who had RSVP’d.  The email contained a link to the survey and a brief explanation of why I was asking people to complete it.  It was a short survey, took only a couple minutes to complete, but I had this sinking feeling that everyone would ignore it.  Because of IRB (Institutional Review Board) requirements, I had emphasized in the email that people weren’t required to take the survey if they wanted to do the class.  I figured people would see that and just ignore the survey, but I was keeping my fingers crossed.  I was on the train to the airport in San Francisco on my way back to Los Angeles when I saw that the email had gone out.

So now, allow me to set the scene for one of the nerdiest moments of my life.  I had gotten to the airport and had some time to kill before my flight, so I was sitting in a wine bar getting something to eat (and drink of course).  I ordered a glass of Champagne (yeah, that’s how I roll) and pulled out my laptop.  I was logging on when the Champagne arrived.  I pulled up the survey site.  The email had only gone out maybe an hour or so earlier, so I wasn’t expecting any responses yet.  But when I logged on, you know what I found?  Almost EVERY SINGLE PERSON who had registered for the class had taken the survey!  When I saw the number of responses, I made an audible, astonished gasp, and several people in the restaurant turned and looked at me.  I refrained from getting up from my seat and jumping up and down in excitement, though this is what I would have done if I had been alone. 🙂

Not only did people respond to my survey, but they responded exactly as I hoped they would.  I won’t go into detail here, since obviously I’m going to attempt to publish all of this in a peer-reviewed journal.  🙂  But essentially, these pre-test results reveal that, as I had suspected, these people really need a lot of help with this stuff and don’t have a lot of knowledge of the many awesome resources out there.  Hopefully that will all change tomorrow when I teach this class.

So that is the story of how I came to have my very own research dataset.  This is incredibly heartening for me.  For one thing, I’ve always felt like I really ought to have more hands-on experience working with data if I’m going to teach it.  My dataset is super tiny compared to the datasets I help researchers with, but this is a good start.  More importantly, I am so excited that this actually worked.  I’ve been wanting to move forward with additional research in this area, but I wasn’t entirely sure if it was worthwhile, since I basically only had anecdotal evidence to suggest this kind of thing was needed, and there have been a few naysayers whose words weighed heavily on my mind.  I’ve worked really hard on all of this, and it’s been exhausting, especially with having to work around sort of a crazy travel schedule.  But now it feels like things are all falling into place.  All those little ideas I’ve had floating around in my mind about additional research I’d like to do feel a little more feasible now.  So it’s an exciting time for me career-wise.  Now that I’m a little more assured that I know what I’m doing, I have some good ideas about how to move forward. I’ve got a hunger for data and research now and I need more. 🙂

So yeah, again, probably news to no one, but I’m a huge nerd.  Now, in celebration, I’m going to order a second glass of Champagne to enjoy in the hour before I have to catch my flight.  Cheers!

Data Literacy Instruction: Training the Next Generation of Researchers

Drowning in data? A librarian can help! (Image by Cjangaritas (Own work) [CC-BY-SA-3.0], via Wikimedia Commons)

This post was originally published on Data Pub, a blog on data publication, sharing, citation, and more from the California Digital Library’s University of California Curation Center.

In my previous life as an English professor, every semester I looked forward to the information literacy instruction that our librarian did for my classes.  I always learned something new, and, even better, my students no longer tried to cite Wikipedia as a source in their research papers.  Now that I’m a health and life sciences librarian, the tables are turned, and I’m the one responsible for making sure that my patrons are equipped to locate and use the information they need.  When it comes to the people I work with in the sciences, often the information they need is not an article or a book, but a dataset.  As a result, I am one of many librarians starting to think about best practices for providing data literacy instruction.

According to the National Forum on Information Literacy, information literacy is “the ability to know when there is a need for information, to be able to identify, locate, evaluate, and effectively use that information for the issue or problem at hand.”  The American Library Association has outlined a list of Information Literacy Competency Standards for Higher Education.  So far, a similar list of competencies for data literacy instruction has not been defined, but the general concepts are the same – researchers need to know how to locate data, evaluate it, and use it.  More importantly, as data creators themselves, they need to know how to make their datasets available and useful not just to their own research group, but to others.

Fortunately, a number of groups around the country are working on developing data literacy curricula.  Teams from Purdue University, Stanford University, the University of Minnesota, and the University of Oregon have received a grant from the Institute of Museum and Library Services (IMLS) to “develop a training program in data information literacy for graduate students who will become the next generation of scientists.”  Results and resources will eventually be available on their project website.  Also working under the auspices of an IMLS grant, a team from University of Massachusetts Medical School and Worcester Polytechnic Institute has developed a set of seven curricular modules for teaching data literacy.  Their curriculum centers on teaching researchers what they would need to know to complete a data management plan as required by the National Science Foundation (NSF) and several other major grant funders.

All of the work that these other institutions have done is a fantastic start, but at my institution, the researchers and students are very busy, and not likely to commit to a seven-session data literacy program.  Nonetheless, it’s still important that they learn how to manage, preserve, and share their data, not only because many funders now require it, but also because it’s the right thing to do as a member of the scientific community.  Thus, my challenge has been to design a one-off session that would be applicable across a variety of scientific (and perhaps even social science) fields.  In order to do so, I’ve started with my own list of core competencies for data literacy instruction, including:

  • understanding the “data life cycle” and the importance of sharing and preservation across the entire life cycle, especially for rare or unique datasets
  • knowing how to write a data management plan that will fulfill the requirements of funders like NSF
  • making appropriate choices about file forms and formats (such as by choosing open rather than proprietary standards)
  • keeping data organized and discoverable using file naming standards and appropriate metadata schema
  • planning for long-term, secure storage of data
  • promoting sharing by publishing datasets and assigning persistent identifiers like DOIs
  • awareness of data as scholarly output that should be considered in the context of promotion and tenure

Does this list cover everything a researcher would need to know to effectively manage their data?  Almost certainly not, but as with any single session, my goal is to introduce learners to the major issues and let them know that the library has the expertise to assist them with the more complicated issues that will inevitably arise.  Supporting the data needs of researchers is a daunting task, but librarians already have much of the knowledge and skills to provide this assistance – we simply need to adapt our knowledge of information structures and best practices to this burgeoning area.

As research becomes increasingly data-driven, libraries will be doing a great service to individuals and the research community as a whole by helping to create researchers who are good data stewards.  Like my formerly Wikipedia-dependent students, many of our researchers are still taking shortcuts when it comes to handling their data because they simply don’t know any better.  It’s up to librarians and other information professionals to ensure that the valuable research that is going on at our institutions remains available for future generations of researchers.

Frontier Librarians: Information Professionals in the Digital World

Taken in 1976, this photo illustrates a librarian filing computer tapes in the LA public library's computer facility. Image from UCLA's Digital Library's LA Times Photograph collection.

Yesterday, I waxed poetic about the role libraries have played in my life, though I knew so little about what librarians actually did.  In much the same way, I think most of my patrons don’t know what I do, either.  In fact, my family and friends don’t really know what I do.  When it comes right down to it, when I got into library school, even I wasn’t entirely sure what a librarian did. The answer to that, as I said, is that we actually do a lot of different things, most of which the average person would probably not associate with librarians. In any case, this whole train of thought started with a friend asking me how being a librarian has changed since the Internet, and I intend to answer that now.

First, there’s a lot more information out there these days, and it’s a lot easier for people to get direct access to it.  This is great in a lot of ways – now anyone can walk into their public library and hop on the Internet to find pretty much anything they’re interested in.  Of course, there are still some people who don’t have Internet access, and not everyone has the digital literacy skills to navigate the web even if they do have access, but at least for me as an academic librarian, I can generally assume that my patrons are fully capable of getting on Google and finding what they think they need.  The problem is that what you find on Google is not necessarily what you actually need.  Let’s put it this way: would you want your doctor to decide how to treat your condition based on an article he’d just found on Google?  I don’t think I would, but that’s exactly how a lot of doctors find information and they see absolutely no problem with it.  As someone who’s an expert in this (or at least a young, burgeoning expert), I know how to find reliable information and it’s not that hard for me to teach people how to do it.  The problem is that people are busy, and if they’re really convinced that they already know how to find what they need, they’re not going to come spend even an hour to hear what I’ve got to say.

Compare this to the days when there were computers and networks in the library, but they weren’t yet there for the patrons.  I love talking to librarians who have been in the field for a while about how searching worked in the 1980s and 90s.  Rather than having to do their best to figure it out themselves, patrons told librarians what they were looking for, and the librarians found it.  For some databases, you paid per search, so you couldn’t just keep adjusting your search strategy until you found what you were looking for – you had to know how to word your search the first time around (I would be ashamed to show these librarians my PubMed search history – I sometimes play around with it and run searches in tons of different ways just to see how little changes give me different results). And compare this to the days before there were computers at all, and when you had a reference question, the librarian had to know which book contained the information you needed.  I know where to find things in the sense that I know which databases or resources would have the information, but I can’t imagine having to know what all the books in my library contained.

Obviously, it would not be the quickest system, having a whole university’s or hospital’s worth of patrons going to the librarians every time they wanted to find something, and then the librarians taking turns on a computer that must have been terribly slow since I doubt they even had dial-up access by then.  Now you can get on Google and find something in the blink of an eye.  Never mind that it may be crap information.  If you want a demonstration of this, Google “Martin Luther King” and take a look at the fourth result down, and tell me you’re not bothered by the fact that something so out-there is the number four hit (my awesome colleague does this at the medical students’ orientation and everyone is always surprised).

This method for judging the usefulness of medical information appears in many medical texts:

usefulness of medical information = (relevance × validity) / work


Work refers to the amount of time or effort the person had to put in to find the information.  So that means that a really good article that takes a long time to find is as useful to a doctor as a crappy article he or she found really easily.  Or maybe even less useful, if it takes long enough.  Knowing that, my goal as a librarian is to help people learn how to find articles with a higher level of relevance and validity while still expending the minimum effort possible.  Sure, it would be great if people would think to consult the librarian every time they had a serious question they needed to answer, but I know that’s not going to happen, and frankly, I wouldn’t have enough time to answer all those questions anyway.  What seems most logical to me is teaching people how to use tools well so they will hopefully know what to do when I’m not there to help them. In that sense, I don’t think my job is that different from the librarians of the pre-Internet era – they too were trying to teach people how to connect with knowledge.  A big difference now, though, is it’s a harder sell to people whose information seeking skills are barely passable, yet who think of themselves as being perfectly awesome researchers because they don’t even realize all the stuff that’s out there that they’re missing.
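
If you’d like to see the usefulness formula in action, here is a tiny Python sketch.  The scores are entirely made up for illustration: I’m assuming relevance and validity are rated on a 0-to-1 scale and that work is measured in minutes of searching, none of which comes from the medical texts themselves.  The function name is my own invention, too.

```python
def usefulness(relevance: float, validity: float, work: float) -> float:
    """Usefulness = (relevance x validity) / work, as in the formula above."""
    if work <= 0:
        raise ValueError("work must be a positive amount of time or effort")
    return (relevance * validity) / work


# Hypothetical numbers, chosen only to illustrate the trade-off.
# A really good article that took an hour of frustrated searching:
great_but_slow = usefulness(relevance=0.9, validity=0.9, work=60)

# A so-so article that turned up after five minutes on Google:
mediocre_but_fast = usefulness(relevance=0.5, validity=0.4, work=5)

# The same really good article, found in ten minutes with a better search:
great_and_quick = usefulness(relevance=0.9, validity=0.9, work=10)

print(f"great but slow:    {great_but_slow:.4f}")
print(f"mediocre but fast: {mediocre_but_fast:.4f}")
print(f"great and quick:   {great_and_quick:.4f}")
```

With these invented numbers, the so-so article scores higher than the great one that took an hour to find, and the great article only wins once the search time comes down.  That is the whole argument for teaching search skills: relevance and validity stay high while the work shrinks.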

It’s a daunting task, but luckily, along with the challenges of the Internet come the tools by which we can also reach people.  I can’t be with a patron at 3 am when they’re writing their paper, but the video tutorial I make or the web page I write or the research guides I post can be.  I can’t put up paper announcements that are going to be seen by every person who might be interested in a class I’m offering, but Tweeting, blogging, Facebooking, and emailing that info will reach a lot more people than signs ever could.  People that I could never possibly get an appointment with will usually at least answer my emails. I guess, then, it all kind of evens out.  The technology makes our job more challenging, but it gives us the tools to meet that challenge.  It throws up roadblocks, but gives us new shortcuts to go the other way around.

I could say something about the democratization of information thanks to the Internet, but I think that’s a discussion for another day.

The Researcher’s Guide to Making the Most of Your Librarian

I bet this is what you think of when you hear "librarian," but the 21st century academic librarian does a lot more than shelving books, and is one of the most valuable research tools out there. Image attribution: David Rees (1943—), Environmental Protection Agency derivative work: Andrzej 22 Public domain, via Wikimedia Commons

The way I see it, if you’re a researcher, your librarian should be your best friend.  Maybe I’m biased, but I think that, no matter what field you’re in, you are doing yourself a favor if you get to know your librarian. If you don’t know who your librarian is, or (gasp) don’t even know where your library is, read on to find out how to make your life and research easier, and then stop what you’re doing and meet your librarian!

When I meet researchers who haven’t worked much with librarians, I can tell what they’re thinking.  They consider me a person to call when their library card isn’t working, their electronic access to a journal article is down, or they want to contest a fine.  I know that’s kind of what most people think librarians do, but in fact, I have nothing to do with any of that and I couldn’t actually answer any of those questions for you (although I could point you in the right direction).  To be honest, I went into library school kind of thinking that this was what librarians did, too.  I remember worrying that I might have to memorize the Dewey Decimal System (which, by the way, I also know very little about, as it’s not used in most academic or medical libraries).

As it turns out, librarians are experts in a lot more than just how books and journals are arranged.  I didn’t end up learning the Dewey Decimal System in library school, but I did learn some of the librarian-y things you’d expect, like how to conduct a reference interview, about information-seeking behaviors, how to do information literacy instruction, and the like.  However, I also learned about database construction, user experience design and information architecture, grant-writing, metadata standards, data curation and management, and a ton of other things that make librarians invaluable assets to researchers.

In my job, I work with researchers in many capacities – assisting with search strategies for literature searches, helping them figure out how to use citation management software like EndNote and Mendeley, and yes, sometimes helping people when electronic access to journals breaks.  I teach people how to find information more easily, or to put it another way, where to look for what you want (hint: it’s not Google) and how to word your search so that the results will be what you’re looking for and you won’t have to sift through 20 pages of crap articles to get to the one you want. Sometimes researchers come to me after spending several frustrating hours trying unsuccessfully to find something, and I can find it in under ten minutes.  Searching is a skill, and it’s not one that most people learn, unless they go to library school or get a librarian to teach it to them.  Of course there’s a lot I’m also doing behind the scenes, like selecting resources to purchase and fighting for open access and against things like the Research Works Act.

One of the things that I find most interesting in my interactions with researchers is helping them with their data.  I think a lot of researchers still don’t realize that the library (at least this is true at UCLA) is equipped to help with NSF data management plans, data management, storage and preservation of data, and the like.  Sometimes I sit down with researchers and look at their data sets and point out things they could do or change to make that data set not only useful for other people, assuming the data will be shared, but also things that will make it easier for the original researcher.  If you’re a researcher working with any sort of data, from a simple little Excel spreadsheet up to some massive data set, there are probably things that you could be doing better with it, and a librarian could help you with that.

Now that you know about some of the hidden talents of the librarian and you want to get yours working for you, here’s how to do it:

  • Find out who your librarian is.  In many academic libraries, librarians are assigned liaison areas, so figure out who covers your area.  This person will be knowledgeable about the kinds of resources people in your field use, and will almost certainly be able to teach you some tricks for using those resources more efficiently.
  • Meet or email your librarian.  Many librarians are introverts, so they’re not necessarily the kind of people who are going to be showing up and being vocal all over the place, but most of the librarians I know love hearing from patrons and are happy to help.
  • Let your librarian know what you’re researching and what you’re interested in. I certainly can’t speak for all librarians, but I remember the patrons I help, and when I run across an article or resource that seems relevant to their search, I email it to them.
  • Ask your librarian about data services on your campus.  Here at UCLA, we have tons of cool services that can make people’s research lives so much easier, but a lot of researchers have no idea any of this stuff exists, much less how to use it.
  • When you’re going to start a new research project, consult your librarian early in the process.  Chances are good that he or she will have some ideas that will save you lots of time and trouble.  The help a librarian can give you will leave you more time to work on your actual research rather than doing something like formatting citations, and wouldn’t you rather be working on your research?

So there you go.  Well, what are you still doing here?  Go talk to your librarian! 🙂

Surviving the Times: #HLTH, Data and Keeping Librarians Relevant

The library world has been disturbed to hear news of layoffs from Harvard’s Library. What can we do as librarians to help ourselves and our field?  (Image by Joseph Williams (originally posted to Flickr as Harvard) [CC-BY-2.0])

An academic friend of mine sent me a Facebook message this evening.  He’d heard a rumor about something terrible going on at the Harvard Libraries – surely it couldn’t be true that Harvard had fired all of its librarians?

Well, no.  Not exactly.  The outlook is still grim as additional details roll in, but the truth is not quite as bad as initial reports would have it seem.  What I’m hearing, primarily from Chris Bourg’s very informative blog post, is that layoffs will affect technical services (like cataloging and metadata librarians), preservation, and access services, but not collection development, reference, or special collections, although I gather the situation isn’t necessarily looking super promising for these librarians, either.

Of course, there’s much wringing of hands over this news, as well there should be.  If something like this is happening at Harvard, with its deep pockets, what’s going to happen at struggling public institutions like the UCs, especially with a $100 million budget cut recently handed down from the state?  Librarians should be concerned, but I think hearing news like this serves as a great reminder that we need to be proactive in finding new roles for ourselves, as people seem to increasingly feel that they can “just Google it” and that they don’t need us.  What can we do to convince them otherwise?

My feeling is that we need to do more to demonstrate the library’s value as more than just a place where you go to check out books.  More importantly, we need to demonstrate the librarian’s value as more than just someone who you email when your electronic access to a journal isn’t working.  Most of us hold master’s degrees, and those of us who don’t draw on valuable work expertise.  We know how to do a whole lot more than just tell people how to use the printer.  With our knowledge of information systems, metadata, needs assessments, technology, and tons of other stuff we know a lot about, we are invaluable campus resources.  It’s important that we make ourselves vocal and let people know about that.

I’ve been doing a lot of outreach to faculty that my library hasn’t been very connected with in the recent past, which involves me wrangling half an hour with a busy faculty member who is probably only seeing me to be polite.  When I got to go to a faculty meeting, I introduced myself and said how pleased I was to be there, and the chair looked at me and said, “this won’t take very long, right?”  I’ve gone into these meetings knowing that I had a very short window of opportunity to prove to these people that time spent talking to me and time given to me to stand in front of their classes was not going to be wasted time.  I did end up winning over that chair and her department – when I was walking out the door after my half hour was up, I heard one of them say, “I wish we could just talk about all of that for the rest of the meeting.”  So what did I say to win them over?

Rather than guess what I thought they might find most interesting and try to lead with that, I took the rather optimistic approach of basically listing off services we offered and other things that I’m able to talk on knowledgeably, and keeping an eye on them to see what they reacted to.  There were a couple of trends in all of the meetings I’ve had lately.  The one that surprised me a little: citation management software.  Everybody seems to know that citation management software is a great time saver, but no one seems to know how to use it.  When they heard that I did, they were thrilled.  The one that didn’t surprise me: data.

Scientists are inundated with data and are, on the whole, given no training in how to handle it.  Librarians, however, have a great deal of expertise in handling data.  No matter what your focus or specialty as a librarian, you probably have some sort of special knowledge that would make a scientist very happy to sit down with you and talk data.  Most significantly, those librarians in the groups affected at Harvard – cataloging and metadata, preservation, and access services – probably have the most valuable knowledge as it relates to data.  As someone who is increasingly working with scientists on data issues, I can say with absolute certainty that we will need these librarians and their knowledge and experience.  So it will be a real shame if they all get fired.

I feel terrible for all of the librarians at Harvard.  I can’t imagine how nerve-wracking it will be for these poor people to go back to work with this kind of thing hanging over their heads.  I’ll be keeping an eye on Twitter (#hlth) and I encourage my fellow librarians to do the same.  More importantly, though, I challenge my fellow librarians to do something tomorrow to fight against this tide: sign up for a continuing education class to learn a new skill, make an appointment with a faculty member you’ve never spoken to, schedule a workshop to teach something new to your patrons (I give you permission to steal from me and go with citation management software).  Whatever you do, make sure that your patrons (and more importantly, your chancellor or dean or whoever) know that the library is about a lot more than just checking out books.