In my previous life as an English professor, every semester I looked forward to the information literacy instruction that our librarian did for my classes. I always learned something new, and, even better, my students no longer tried to cite Wikipedia as a source in their research papers. Now that I’m a health and life sciences librarian, the tables are turned, and I’m the one responsible for making sure that my patrons are equipped to locate and use the information they need. When it comes to the people I work with in the sciences, often the information they need is not an article or a book, but a dataset. As a result, I am one of many librarians starting to think about best practices for providing data literacy instruction.
According to the National Forum on Information Literacy, information literacy is “the ability to know when there is a need for information, to be able to identify, locate, evaluate, and effectively use that information for the issue or problem at hand.” The American Library Association has outlined a list of Information Literacy Competency Standards for Higher Education. So far, a similar list of competencies for data literacy instruction has not been defined, but the general concepts are the same – researchers need to know how to locate data, evaluate it, and use it. More importantly, as data creators themselves, they need to know how to make their datasets available and useful not just to their own research group, but to others.
Fortunately, a number of groups around the country are working on developing data literacy curricula. Teams from Purdue University, Stanford University, the University of Minnesota, and the University of Oregon have received a grant from the Institute of Museum and Library Services (IMLS) to “develop a training program in data information literacy for graduate students who will become the next generation of scientists.” Results and resources will eventually be available on their project website. Also working under the auspices of an IMLS grant, a team from the University of Massachusetts Medical School and Worcester Polytechnic Institute has developed a set of seven curricular modules for teaching data literacy. Their curriculum centers on teaching researchers what they need to know to complete a data management plan as required by the National Science Foundation (NSF) and several other major grant funders.
All of the work that these other institutions have done is a fantastic start, but at my institution, the researchers and students are very busy and not likely to commit to a seven-session data literacy program. Nonetheless, it’s still important that they learn how to manage, preserve, and share their data, not only because many funders now require it, but also because it’s the right thing to do as a member of the scientific community. Thus, my challenge has been to design a one-off session that would be applicable across a variety of scientific (and perhaps even social science) fields. In order to do so, I’ve started with my own list of core competencies for data literacy instruction, including:
- understanding the “data life cycle” and the importance of sharing and preservation across the entire life cycle, especially for rare or unique datasets
- knowing how to write a data management plan that will fulfill the requirements of funders like NSF
- making appropriate choices about file forms and formats (such as by choosing open rather than proprietary standards)
- keeping data organized and discoverable using file naming standards and appropriate metadata schema
- planning for long-term, secure storage of data
- promoting sharing by publishing datasets and assigning persistent identifiers like DOIs
- recognizing data as scholarly output that should be considered in the context of promotion and tenure
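To make the file naming item in the list above a little more concrete, here is a minimal sketch of what a naming standard might look like in practice. The convention itself (project, short description, collection date, version) and the names `build_name` and `follows_convention` are my own illustrative assumptions, not a standard that any funder or repository prescribes; a research group would adapt the pattern to its own needs.

```python
import re
from datetime import date

# Hypothetical convention (an assumption for illustration only):
#   <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def build_name(project, description, collected, version, ext="csv"):
    """Assemble a file name that follows the hypothetical convention."""
    # Lowercase the description and collapse anything that is not a
    # letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{project.lower()}_{slug}_{collected:%Y%m%d}_v{version:02d}.{ext}"


def follows_convention(filename):
    """Return True if the file name matches the hypothetical pattern."""
    return NAME_PATTERN.match(filename) is not None


name = build_name("soilph", "Field Site 3", date(2013, 4, 9), 2)
print(name)                                       # soilph_field-site-3_20130409_v02.csv
print(follows_convention(name))                   # True
print(follows_convention("Final data (2).xlsx"))  # False
```

The point is less the specific pattern than the habit: a name that records what the file is, when the data were collected, and which version it is stays meaningful long after the person who created it has moved on.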
Does this list cover everything a researcher would need to know to effectively manage their data? Almost certainly not, but as with any single session, my goal is to introduce learners to the major issues and let them know that the library has the expertise to assist them with the more complicated issues that will inevitably arise. Supporting the data needs of researchers is a daunting task, but librarians already have much of the knowledge and many of the skills needed to provide this assistance – we simply need to adapt our knowledge of information structures and best practices to this burgeoning area.
As research becomes increasingly data-driven, libraries will be doing a great service to individuals and the research community as a whole by helping to create researchers who are good data stewards. Like my formerly Wikipedia-dependent students, many of our researchers are still taking shortcuts when it comes to handling their data because they simply don’t know any better. It’s up to librarians and other information professionals to ensure that the valuable research that is going on at our institutions remains available for future generations of researchers.