Data Literacy Instruction: Training the Next Generation of Researchers

Drowning in data? A librarian can help! (Image by Cjangaritas (Own work) [CC-BY-SA-3.0], via Wikimedia Commons)

This post was originally published on Data Pub, a blog on data publication, sharing, citation, and more from the California Digital Library’s University of California Curation Center.

In my previous life as an English professor, every semester I looked forward to the information literacy instruction that our librarian did for my classes.  I always learned something new, and, even better, my students no longer tried to cite Wikipedia as a source in their research papers.  Now that I’m a health and life sciences librarian, the tables are turned, and I’m the one responsible for making sure that my patrons are equipped to locate and use the information they need.  When it comes to the people I work with in the sciences, often the information they need is not an article or a book, but a dataset.  As a result, I am one of many librarians starting to think about best practices for providing data literacy instruction.

According to the National Forum on Information Literacy, information literacy is “the ability to know when there is a need for information, to be able to identify, locate, evaluate, and effectively use that information for the issue or problem at hand.”  The American Library Association has outlined a list of Information Literacy Competency Standards for Higher Education.  So far, a similar list of competencies for data literacy instruction has not been defined, but the general concepts are the same – researchers need to know how to locate data, evaluate it, and use it.  More importantly, as data creators themselves, they need to know how to make their datasets available and useful not just to their own research group, but to others.

Fortunately, a number of groups around the country are working on developing data literacy curricula.  Teams from Purdue University, Stanford University, the University of Minnesota, and the University of Oregon have received a grant from the Institute of Museum and Library Services (IMLS) to “develop a training program in data information literacy for graduate students who will become the next generation of scientists.”  Results and resources will eventually be available on their project website.  Also working under the auspices of an IMLS grant, a team from University of Massachusetts Medical School and Worcester Polytechnic Institute has developed a set of seven curricular modules for teaching data literacy.  Their curriculum centers on teaching researchers what they would need to know to complete a data management plan as required by the National Science Foundation (NSF) and several other major grant funders.

All of the work that these other institutions have done is a fantastic start, but at my institution, the researchers and students are very busy, and not likely to commit to a seven-session data literacy program.  Nonetheless, it’s still important that they learn how to manage, preserve, and share their data, not only because many funders now require it, but also because it’s the right thing to do as a member of the scientific community.  Thus, my challenge has been to design a one-off session that would be applicable across a variety of scientific (and perhaps even social science) fields.  In order to do so, I’ve started with my own list of core competencies for data literacy instruction, including:

  • understanding the “data life cycle” and the importance of sharing and preservation across the entire life cycle, especially for rare or unique datasets
  • knowing how to write a data management plan that will fulfill the requirements of funders like NSF
  • making appropriate choices about file forms and formats (such as by choosing open rather than proprietary standards)
  • keeping data organized and discoverable using file naming standards and appropriate metadata schema
  • planning for long-term, secure storage of data
  • promoting sharing by publishing datasets and assigning persistent identifiers like DOIs
  • recognizing data as scholarly output that should be considered in the context of promotion and tenure
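
To make the file naming and metadata items a little more concrete, here is a minimal sketch in Python.  Everything in it is an assumption made up for illustration: the naming pattern (project, description, ISO date, zero-padded version), the function names, the metadata keys, and the sample project are hypothetical, not a standard or anything a funder requires.  Your own field or repository may well call for a formal metadata schema instead.

```python
import json
import re
from datetime import date


def build_filename(project, description, collected, version, extension):
    """Build a file name from a hypothetical pattern: project_description_date_vNN.ext"""
    def slug(text):
        # Lowercase the text and collapse anything that is not a letter or digit into a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(description)}_{collected.isoformat()}_v{version:02d}.{extension}"


def build_metadata(filename, title, creator, collected):
    """Return a minimal descriptive record; the keys are illustrative, not a formal schema."""
    return {
        "file": filename,
        "title": title,
        "creator": creator,
        "date_collected": collected.isoformat(),
        "format": filename.rsplit(".", 1)[-1],
    }


# Hypothetical example data, saved as CSV (an open format rather than a proprietary one).
collected = date(2012, 3, 15)
name = build_filename("Soil Study", "pH readings", collected, 2, "csv")
record = build_metadata(name, "Soil pH readings, plot A", "Example Researcher", collected)

print(name)
print(json.dumps(record, indent=2))
```

The point of a pattern like this is that files sort sensibly by project and date, and the small record stored next to each file tells a future reader (or a future you) what the file contains without having to open it.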

Does this list cover everything a researcher would need to know to effectively manage their data?  Almost certainly not, but as with any single session, my goal is to introduce learners to the major issues and let them know that the library has the expertise to assist them with the more complicated issues that will inevitably arise.  Supporting the data needs of researchers is a daunting task, but librarians already have much of the knowledge and skills to provide this assistance – we simply need to adapt our knowledge of information structures and best practices to this burgeoning area.

As research becomes increasingly data-driven, libraries will be doing a great service to individuals and the research community as a whole by helping to create researchers who are good data stewards.  Like my formerly Wikipedia-dependent students, many of our researchers are still taking shortcuts when it comes to handling their data because they simply don’t know any better.  It’s up to librarians and other information professionals to ensure that the valuable research that is going on at our institutions remains available for future generations of researchers.

The Researcher’s Guide to Making the Most of Your Librarian

I bet this is what you think of when you hear "librarian," but the 21st century academic librarian does a lot more than shelving books, and is one of the most valuable research tools out there. Image attribution: David Rees (1943–), Environmental Protection Agency; derivative work: Andrzej 22 [Public domain], via Wikimedia Commons

The way I see it, if you’re a researcher, your librarian should be your best friend.  Maybe I’m biased, but I think that, no matter what field you’re in, you are doing yourself a favor if you get to know your librarian. If you don’t know who your librarian is, or (gasp) don’t even know where your library is, read on to find out how to make your life and research easier, and then stop what you’re doing and meet your librarian!

When I meet researchers who haven’t worked much with librarians, I can tell what they’re thinking.  They consider me a person to call when their library card isn’t working, their electronic access to a journal article is down, or they want to contest a fine.  I know that’s kind of what most people think librarians do, but in fact, I have nothing to do with any of that and I couldn’t actually answer any of those questions for you (although I could point you in the right direction).  To be honest, I went into library school kind of thinking that this was what librarians did, too.  I remember worrying that I might have to memorize the Dewey Decimal System (which, by the way, I also know very little about, as it’s not used in most academic or medical libraries).

As it turns out, librarians are experts in a lot more than just how books and journals are arranged.  I didn’t end up learning the Dewey Decimal System in library school, but I did learn some of the librarian-y things you’d expect, like how to conduct a reference interview, about information-seeking behaviors, how to do information literacy instruction, and the like.  However, I also learned about database construction, user experience design and information architecture, grant-writing, metadata standards, data curation and management, and a ton of other things that make librarians invaluable assets to researchers.

In my job, I work with researchers in many capacities – assisting with search strategies for literature searches, helping them figure out how to use citation management software like EndNote and Mendeley, and yes, sometimes helping people when electronic access to journals breaks.  I teach people how to find information more easily, or to put it another way, where to look for what you want (hint: it’s not Google) and how to word your search so that the results will be what you’re looking for and you won’t have to sift through 20 pages of crap articles to get to the one you want. Sometimes researchers come to me after spending several frustrating hours trying unsuccessfully to find something, and I can find it in under ten minutes.  Searching is a skill, and it’s not one that most people learn, unless they go to library school or get a librarian to teach it to them.  Of course there’s a lot I’m also doing behind the scenes, like selecting resources to purchase and fighting for open access and against things like the Research Works Act.

One of the things that I find most interesting in my interactions with researchers is helping them with their data.  I think a lot of researchers still don’t realize that the library (at least this is true at UCLA) is equipped to help with NSF data management plans, data management, storage and preservation of data, and the like.  Sometimes I sit down with researchers and look at their data sets and point out things they could do or change to make that data set not only useful for other people, assuming the data will be shared, but also things that will make it easier for the original researcher.  If you’re a researcher working with any sort of data, from a simple little Excel spreadsheet up to some massive data set, there are probably things that you could be doing better with it, and a librarian could help you with that.

Now that you know about some of the hidden talents of the librarian and you want to get yours working for you, here’s how to do it:

  • Find out who your librarian is.  In many academic libraries, librarians are assigned liaison areas, so figure out who covers your area.  This person will be knowledgeable about the kinds of resources people in your field use, and will almost certainly be able to teach you some tricks for using those resources more efficiently.
  • Meet or email your librarian.  Many librarians are introverts, so they’re not necessarily the kind of people who are going to be showing up and being vocal all over the place, but most of the librarians I know love hearing from patrons and are happy to help.
  • Let your librarian know what you’re researching and what you’re interested in. I certainly can’t speak for all librarians, but I remember the patrons I help, and when I run across an article or resource that seems relevant to their search, I email it to them.
  • Ask your librarian about data services on your campus.  Here at UCLA, we have tons of cool services that can make people’s research lives so much easier, but a lot of researchers have no idea any of this stuff exists, much less how to use it.
  • When you’re going to start a new research project, consult your librarian early in the process.  Chances are good that he or she will have some ideas that will save you lots of time and trouble.  The help a librarian can give you will leave you more time to work on your actual research rather than doing something like formatting citations, and wouldn’t you rather be working on your research?

So there you go.  Well, what are you still doing here?  Go talk to your librarian! 🙂

Link Roundup: Open Science

German scientists, being all science-y with beakers and chemicals

If you’re an American taxpayer, you are funding the research of scientists around the country.  In return, you’re getting cures for your illnesses, more accurate weather reports, and tons of other stuff that comes about as a result of the US research endeavor.  This is nothing new.

What is new is the fact that you, sitting there at your computer, can get access to a lot of this science.  Some of it, you can read for free by accessing it from an open access content source like PubMed Central or Public Library of Science (PLoS). More often, though, this work is published in a scientific journal that costs a lot.  You can buy access to these articles for usually around $30 a pop, which is more than I usually pay for a book, much less a single article.  Probably most people aren’t going to pay that.  The bigger question is, should people even be asked to pay that?  If I’ve already paid for this research with my tax dollars, am I not entitled to read the results of that research?

This is the question that drives the concept of open access.  Large federal funders like the National Institutes of Health and the National Science Foundation require that you make your work open access if you’re getting funding from them.  As a librarian, I’m very much in favor of open access.  I think that making knowledge freely available betters society and creates more opportunities for researchers to collaborate on projects that will further the greater good.  Also, because I’m perhaps a bit idealistic, I have a little bit of a problem with publishers making millions off of articles that were entirely funded by my tax money, but that’s for another post.

(By the way, lest you think that open access is going to put publishers out of business, you don’t have to worry for them.  If I’m an author whose NIH grant funding means that my article has to be made freely available online, the publisher is just going to charge me, the author, to publish my work in their journal.  These open access fees often come to several thousands of dollars, so the publishers are still making a pretty penny.)

I assume if you’re here it’s because you like reading and learning and perhaps you’d like to read and learn more about this, so with that in mind, here is a list of articles that I have found of interest lately on the concept of open science.  The federal funding issue is one part of this; as you will see, these links deal with the concept of openness in science more broadly.  Enjoy!

  • Shrimp on treadmills, laundry-folding robots, and the problem of ridiculing research
    You’ve probably heard of the Ig Nobel Awards, which, um, “honor” scientists doing “improbable” research.  In other words, they make fun of people who are working on what sound like really stupid research projects, like making a bra that converts into a gas mask or figuring out the minimum air density of wasabi necessary to wake a sleeping person, thereby facilitating the invention of a wasabi-spraying fire alarm (I know I’d rather be wakened by being doused in wasabi than having to hear some shrill alarm, right?).  It’s easy to laugh at these projects, except as Liz Borkowski points out in this article, even experiments that sound absurd can have practical applications.  When members of Congress start mocking scientific studies that they don’t understand under the guise of protecting the taxpayers from silly spending, we risk losing out on important government funding that supports a great deal of the very important research that goes on in the US today.
  • U.S. Says Details Of Flu Experiments Should Stay Secret (or opt for the official NIH Press Statement on the NSABB Review of H5N1 Research)
    As we all know from watching movies like Contagion, bird flu is the terrifying pandemic that will eventually kill us all.  Some researchers have done some research into the likelihood of this situation by studying what sorts of genetic changes to the virus would make it easier for the illness to pass between humans (right now, you’ve got to get it from a bird).  Now, the US government would like the researchers to kindly keep quiet about their research because of fears that bioterrorists could use this knowledge to weaponize the virus.  I can see the point of their concerns, but the scientific community argues that this knowledge needs to be shared so that others can build upon this initial research, hopefully getting us closer to finding a cure or learning how to prevent the spread of the disease.  I can see both sides, but at least for now, the researchers are respecting the request, although the journal Science seems to be considering moving forward with publishing one of the articles.
  • Acceptance of CC-NC has sold readers and authors seriously short
    Open science expert Peter Murray-Rust discusses why licensing open access articles in PubMed Central as CC-NC rather than CC-BY is “a disaster.”  CC stands for Creative Commons, which is an organization dedicated to creating the legal and technical infrastructure necessary to facilitate sharing and openness on the Internet.  There are a number of different CC licenses one may apply to their work that specify what others can and cannot do with that work.  I won’t get into the technical details of what all of these different licenses do, but Murray-Rust nicely explains why the difference is important.  With authors paying thousands of dollars for their work to be “open access,” it’s important that the access is really as open as we might expect.

Image info: Deutsche Fotothek [CC-BY-SA-3.0-de], via Wikimedia Commons

The Human Genome – What It Means to You (If You Want to Know, That Is)

DNA double helix*

Lately I’ve had the great honor to work with a researcher who is involved in the development of what will likely become a major weapon against disease: personalized medicine. My extremely simplistic explanation of what that means goes like this: many diseases, particularly cancers, are genetically based.  That is, mutations on a certain gene can cause you to be predisposed to develop a certain kind of cancer.  For example, genetic researchers have identified BRCA-1 and BRCA-2, breast cancer suppressor genes that prevent tumor development.  Genetic mutations in these two genes have been linked to the development of breast and ovarian cancer.

The practical application of this is that which gene is affected, and the way it is affected, can influence the way that you respond to medications and treatments for the condition.  That is, if you had a mutation on BRCA-1 as opposed to BRCA-2, you might respond better to Drug A than Drug B.  This science is very preliminary, but we’re learning more and more about how genetic factors relate to which treatments will be effective and which diagnostic tests will be accurate.  The researcher I know has told me that we’re a good ten years away from this becoming a part of regular medical practice, but at some point, it’s theoretically possible that you could receive targeted drugs designed to treat your specific illness.  Incidentally, Steve Jobs had his pancreatic cancer sequenced, but it was too late for him.

There’s still a lot we don’t know about how the human genome really works, but we have a lot of data.  This is part of the reason that it’s so exciting for me as a librarian to be involved in data curation.  In library school, I took a class in the biomedical engineering department called Medical Knowledge Representation.  For me, what it really boiled down to is this: right now, we have a lot of data, but we don’t quite know how to tease out real knowledge from that.  We can look at two different patients and see that one responds well to a given treatment and the other does not.  We have their tissue samples and their genetic info, but at this point, we haven’t quite got the know-how to get to the correlations between the genetic factors and the treatment successes.  In a way, the answers are there, but we don’t quite know how to read them yet.  As a librarian, I can help scientists preserve their data in a way that will facilitate it being used in ways that will aid in the discovery of cancer cures, once we have a greater understanding of how exactly the human genome works.  That’s pretty awesome.

As I’ve said, we’re still several steps away from having personalized cancer cures. However, there is a lot we do know, and there’s a lot that anyone with 99 bucks to spare can find out about his or her own genetic secrets, through a service called 23andMe.  It goes a little something like this: you spit in a tube and mail it back with your $99, and then in 6 to 8 weeks, you get to learn all about your genome.  You can find out more about your genetic ancestry and learn what percentage Neanderthal you are.  You can learn which diseases you’re at risk for out of a list of 116, including Alzheimer’s, a handful of cancers, Creutzfeldt-Jakob Disease (a variant of which is linked to Mad Cow Disease), and even the dreaded Restless Leg Syndrome.  You can find out whether you’re likely to respond to 20 different drugs for everything from hypertension to depression.  You can find out what eye color you’re likely to have, in case you don’t have a mirror, I suppose.

I’ve looked at what the site can offer, and I guess at this point in time, I’m unsure whether I’d want this service.  I’ve known about 23andMe for a while, but I recently came across some marketing they’re doing for the holiday season, I guess – 23 reasons to give their service as a gift.  My question is, would I want this as a gift?  Okay, so doing this can tell me that I’m at increased risk for, say, breast cancer or cirrhosis of the liver.  However, at this point in time, as far as I know, there’s absolutely nothing that I can gain from this information.  We haven’t come far enough that I could say, aha, I’m going to develop breast cancer – here’s what I can do to stop it!  Plus, just because I have the gene doesn’t mean I’m guaranteed to develop the disease – it might happen tomorrow, in ten years, or never.  So would I really want to know?  And also, would you want to give someone the gift of knowing that they could develop some horrific disease at any time?

I’m not in any way trying to downplay the importance of 23andMe.  For one thing, the samples they get from their subscribers get incorporated into research on the human genome, so in some ways, you’re adding to the scientific endeavor of curing human ills if you pony up the $99 for this, in addition to getting some good info for yourself.  There are some practical applications to this, like knowing you are likely to respond to caffeine or, more importantly, certain prescription medications.  However, would I want this as an unsolicited gift?  Probably not.  Do I want to know myself about my chances for developing whatever disease?  Honestly?  I don’t really think so.  Unless this warning came with some sort of practical advice – my god, you’re going to get Mad Cow Disease unless you start eating an orange every day! – I just see this as something else to worry about pointlessly.  And I already have plenty of those things.

Whether you want to know or not, of course, chances are good personalized medicine will be part of your future, given the trends in research right now.  Remember, you heard it here first. 🙂

*By Spiffistan (Own work) [Public domain], via Wikimedia Commons