Flip flop: my failed experiment with flipped classroom R instruction

I don’t know if this terminology is common outside of library circles, but it seems like the “flipped classroom” has been all the rage in library instruction lately.  The idea is that learners do some work before coming to the session (like reading something or watching a video lecture), and then the in-person time is spent on activities, group exercises, and other hands-on work.  As someone who is always keen to try something new and exciting, I decided to see what would happen if I tried out the flipped classroom model for my R classes.

Actually, teaching R this way makes a lot of sense.  Especially if you don’t have any experience, there’s a lot of baseline knowledge you need before you can really do anything interesting.  You’ve got to learn a lot of terminology, how the syntax of R works, boring things like what a data frame is and why it matters.  That could easily be covered before class to save the in-person time for the more hands-on aspects.  I’ve also noticed a lot of variability in how much people know coming into classes.  Some people are pretty tech savvy when they arrive, maybe even with some experience in another programming language.  Other people have difficulty understanding how to open a file.  It’s hard to figure out how to pace a class when you’ve got people from all over that spectrum of expertise.  On the other hand, curriculum planning would be much easier if you knew everyone was starting out with a certain baseline of knowledge that you could build on.

The other reason I wanted to try this is just the time factor.  I’m busy, really busy.  My library’s training room is also hard to book because we offer so many classes.  The people I teach are busy.  I teach my basic introduction to R course as a 3-hour session, and though I’d really rather make it 4 hours, even finding a 3-hour window when I and the room are both available and people are likely to be able to attend is difficult.  Plus, it would be nice if there were some way to deliver this instruction that wasn’t so time-intensive for me.  I love teaching R – it’s probably my favorite thing I do in my job, and I’d estimate I’ve taught close to 500 researchers how to code.  I generally spend around 9 hours a month teaching R, plus another 4-6 hours on prep, administrative work, and all the other things that have to get done to make a class function.  That’s a lot of time, and though I don’t at all mind doing it, I’d definitely be interested in any way to streamline that work without having a negative impact on the experience of learning R from me.

For all these reasons, I decided to experiment with a flipped classroom model for my introduction to R class.  I had grand plans of making a series of short video tutorials covering bite-sized pieces of learning R.  There would be a bunch of them, but they’d be about 5 minutes each.  I arranged for the library to get Adobe Captivate, which is very cool video tutorial software, and those tutorials are going to be so awesome when I eventually make them.  However, I had already scheduled the class for today, February 28, and I hadn’t gotten around to making them yet.  Fortunately, I had a recording of a previous Intro to R class I’d taught, so I chopped the relevant parts of it into smaller pieces and made a YouTube playlist that served as the pre-class work for this session, probably about two and a half hours total.

At the end of last week, I had 42 people either signed up or on the waitlist.  I think I made the class description pretty clear – that this session was only an hour, but you did have to do stuff before you got there.  I sent out an email with the link to the playlist, reminding people that they would be lost in class if they didn’t watch the videos.  Even so, yesterday morning, the last of the videos had only 8 views, and I knew at least two of those were from me checking that it worked.  So I sent out another email, once again imploring them to watch the videos before they came to class and to please cancel their registration and sign up for a regular R class if this video thing wasn’t for them.

By the time I taught the class this afternoon, 20 people had canceled their registration.  Of the remaining 22, 5 showed up.  It quickly became apparent to me that none of those 5 had watched the videos.  I knew no one was going to answer honestly if I asked who had watched them, so I started by telling them to read a CSV file into a data frame.  That request is pretty fundamental, and also pretty much the first thing I covered in the videos, so when I was met with a lot of blank stares, I knew this experiment had failed.  I did my best to cover what I could in an hour, but that’s not much, so instead of this being a cool, interactive class where people ended up feeling empowered and ready to go write code, I got the sense those people left bewildered, feeling like they had wasted an hour.  One guy who had come in 10 minutes late came up to me after class and was like, “so this is a programming language?  What can you do with it?”  And I kind of looked at him like…whaaaat?  It turned out he hadn’t even registered for the class to begin with, much less done any of the pre-class work – he had been in the library, saw me teaching, and apparently thought it looked interesting, so he decided to wander in.
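For anyone who hasn’t sat through one of my classes, that opening exercise is nothing more exotic than the following – a minimal sketch with a made-up file name and made-up columns, not the actual dataset I hand out in class:

```r
# Read a CSV file into a data frame (the file name here is hypothetical)
surveys <- read.csv("surveys.csv")

# A few quick ways to get oriented once the data are loaded
head(surveys)   # peek at the first few rows
str(surveys)    # structure: column names and types
nrow(surveys)   # number of rows (observations)
```

It’s one line of code plus a few sanity checks, but it’s the gateway to everything else we do in the session, which is why the blank stares told me all I needed to know.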

I felt disappointed by this failed experiment, but I’m not one to give up at the first sign of failure, so I’ve been thinking about how I could make this system work.  It could just be that this model is not suited to the people in the setting where I teach.  I am a lot like them – a busy working professional who knows this stuff is useful and worth learning, but who struggles to find the time – and I think about what it would take for me to do the pre-class work.  If the videos were decent quality, I think I’d do it, but honestly the chances are 50-50 that I’d actually find the time.  So maybe this model just isn’t made for my community.

Before I give up on this experiment entirely, though, I’d love to hear from anyone who has tried this kind of approach with adult learners.  Did it work, did it not?  What went well and what didn’t?  And of course, being the data queen that I am, I intend to collect some data.  I’m working on a modified class evaluation for those 5 brave souls who did come, to get their feedback on the pre-class work model, and I’m also planning to send a survey to the other 38 people who didn’t come to see what I can learn from them.  Data to the rescue of the flipped class!

Delocalizing data – a data sharing conundrum

This week I’ve been reading the second half of Sergio Sismondo’s An Introduction to Science and Technology Studies, and I’ve found myself interested in the question of the universality of scientific knowledge and data.  Here’s a single sentence that I think captures the scope of the problem: “scientific and engineering research is textured at the local level, that it is shaped by professional cultures and interplays of interests, and that its claims and products result from thoroughly social processes” (168).  That is to say, the output of a scientific experiment is not some sort of universal truth – rather, data are the record of a manipulation of nature at a given time in a given place by a given person, highly contextualized and far from universally applicable.

I was in my kitchen the other day, baking a mushroom pot pie, after reading Chapter 10, specifically the section on “Tinkering, Skills, and Tacit Knowledge.”  That section describes the difficulties researchers were having in recreating a certain type of laser, even when they had written documentation from the original creators, even when they had sufficient technical expertise to do so, even when they had all the proper tools – in fact, even when they themselves had already built one, they found it difficult to build a second laser.  As I was pulling my pie out of the oven, I was thinking about the tacit knowledge involved in baking – how I know what exactly is meant when the instructions say I should bake till the crust is “golden brown,” how I make the decision to use fresh thyme instead of the chipotle peppers the recipe called for because I don’t like too much heat, how I know that my oven tends to run a little cold so I should set the temperature 10 degrees higher than called for by the recipe.  Just having a recipe isn’t enough to get a really tasty mushroom pot pie out of the oven, just as having a research article or other scientific documentation isn’t enough to get success out of an experiment.

These problems raise some obvious issues around reproducibility, which is a huge focus of concern in science at the moment.  Scientific instruments are hopefully a little more standardized than my old apartment oven that runs cold, but you’d be surprised how much variation exists in scientific research.  Reproducibility is especially a problem when the researcher is herself the instrument, as in certain types of qualitative research.  Focus group or interview research is usually conducted using a script, so theoretically anyone could pick up the script and use it to do an interview, but a highly experienced researcher knows how to go off-script in appropriate ways to get the needed information, asking probing questions or guiding a participant back from a tangent.

More relevant to my own research: if we think about data not as representations of some sort of universal truth, but as the results of an experiment conducted within a potentially complex local and social context, can shared data be meaningfully reused?  How do we filter out the noise and get to some sort of ground truth when it comes to data, or can we at all?  Part of the question I really want to address in my dissertation is what barriers exist to reusing shared data, and I think this is a huge one.  Some of the problem can be addressed by standards, or “formal objectivity” (140).  However, as Sismondo notes, standards are themselves localized and tied to social processes.  Between different scientific fields, the same data point may be measured using vastly different techniques, and within a lab, the equipment you purchase often has a huge impact on how your data are collected and stored.  Maybe we can standardize to an extent within certain communities of practice, but can we really hope to get everyone in the world on the same page when it comes to standards?

If we can’t standardize, then maybe we can at least document.  If I measured in inches but your analysis needs lengths in centimeters, that’s okay, as long as you know I measured in inches and convert the data before doing your analysis.  That seems fairly obvious, but how do I know what I need to document to fully contextualize the data for someone else to use?  Is it important that I took the measurement on a Tuesday at 4 pm, that the temperature outside was 80 degrees with 70% humidity, that I used a ruler rather than a tape measure, that the ruler was made of plastic rather than wood?  I could go on and on.  How much documentation is enough, and who decides?
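To make the inches-to-centimeters point concrete, here’s a tiny sketch in R (the language I teach); the numbers and variable names are invented for illustration, but the point stands: the conversion is trivial only if the unit is actually documented somewhere the reuser can find it.

```r
# Hypothetical shared measurements, recorded in inches.
# Without documentation, a reuser has no way to know whether these
# values are inches, centimeters, or something else entirely.
lengths_in <- c(12.0, 18.5, 24.3)

# Once the unit is known, converting for an analysis that expects
# centimeters is the easy part:
lengths_cm <- lengths_in * 2.54
lengths_cm
#> [1] 30.480 46.990 61.722
```

The arithmetic is the easy part; knowing that a conversion is needed at all is what depends entirely on documentation.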

The concepts of reproducibility, standardization, and documentation are nothing new, but the idea of data being inextricably caught up in local and social contexts does get me thinking about the feasibility of reusing shared data.  I don’t think data sharing is going to stop – there are enough funders and journals on board with requiring data sharing that I think researchers should expect it to be part of their scientific work going forward.  The question, then, is what the utility of this shared data is.  Is it just useful for transparency, to document and prove the claims made in the published articles?  Or can we figure out ways to surmount data’s limited context and make it more broadly usable in other settings?  Are there certain fields that are more likely to achieve that formal objectivity than others, and therefore certain fields where data reuse may be more appropriate, or at least easier, than others?  I think this requires further thought.  Good thing I have a few years to spend thinking about it!

 

Who owns science? Some musings on structural functionalism

This week I’ve been reading Sergio Sismondo’s An Introduction to Science and Technology Studies, which has given me a lot to think about in terms of theoretical backgrounds for understanding how science creates knowledge.  In fact, it’s almost given me too much to think about.  There are so many different theoretical bases brought into the mix here, and I can see the relative merits of each, so I find myself wondering how to make sense of it all, but also what it means to adopt a theoretical underpinning as a social scientist.  Is it like a religion, where you accept one and only one dogma, and all parts of it, to the exclusion of all others?  Or is it more like a buffet, where you pick a little bit of the things that seem appealing to you and leave behind the things that don’t catch your eye?  I’m hoping it’s the latter, and I’m going to go on that assumption until the theory police tell me I can’t do it. 🙂  So, on that assumption, here are some ideas I’ve put on my plate from Sismondo’s buffet.

Structural Functionalism and Mertonian Norms

My favorite theoretical framework from this reading was structural functionalism, and in particular, Robert Merton’s four guiding norms.  Structural functionalism, as I understand it, argues that society is composed of institutional structures that function based on guiding norms and customs.  Merton suggests that science is one such institution, the primary goal of which is “the extension of certified knowledge” (23).  Merton also outlined four norms of behavior that guide scientific practice, suggesting that those who follow them will be rewarded and those who violate them will be punished.  The norms are universalism (that the same criteria should be used to evaluate scientific claims regardless of the race, gender, etc. of the person making them), communism (that scientific knowledge belongs to everyone), disinterestedness (that scientists place the good of the scientific community ahead of their own personal gain), and organized skepticism (that the community should not believe new ideas until they have been convincingly proven).

Of those four norms, communism and disinterestedness speak the most to my interest in data sharing and reuse.  Communism seems the most obviously related.  It’s very interesting to think about which parts of science are typically thought to belong to the community and which are thought to be privately owned.  For example, the Supreme Court unanimously ruled in 2013 that naturally occurring human genes could not be patented, a decision that seems in line with Merton’s communism norm.  On the other hand, plenty of scientific ideas can be and are patented.  And while many scientific journals are becoming open access and making their articles freely available, many more work on a subscription model, suggesting that the ideas shared within are available for common consumption – if you are willing and able to pay the fee.

Although this example comes from an entirely different realm than science, thinking about these ideas reminded me of the case of the artist Anish Kapoor, who purchased the exclusive artistic rights to the world’s “blackest black” paint so that no other artist can use it.  In retaliation, another artist created the “pinkest pink” paint and made it available for sale – to any artist except Anish Kapoor.  While this episode is somewhat entertaining, it does bring up some interesting ideas about ownership in communities that are generally dedicated to the common good.  Art and science are very different, but they’re also quite alike in some ways that are very relevant to the work I’m doing.  Both are activities carried out by individuals for their own reasons (artistic expression, scientific curiosity) and for the common good (to share beauty with the world, to further scientific knowledge).  We are outraged when we hear of a rich artist laying exclusive claim to the raw materials of art so that no one else can use them.  It feels somehow petty, and it also seems like a disservice not just to the art world, but to all of us.  What could others be creating for us if they had access to that black?  I don’t know that we feel the same outrage when we hear of a scientist trying to lay exclusive claim to data.  Of course this isn’t a perfect analogy – a big part of the work of science is gathering or creating the data in the first place, which complicates the concept of ownership.  Still, I think there are some interesting ideas here to explore about how scientists think about common ownership of science – not just the ideas, but the data as well.

I started out this entry saying I was going to dip into some other theories – I have some things to say about social constructionism and actor-network theory, but now I’ve spent a long time going on and on about art and science and this is getting a bit long, so I think I’ll stop here for today. 🙂

 

On data, knowledge, and theory

As I’ve mentioned on this blog before, I recently started a PhD program at the University of Maryland’s iSchool, focusing on scientific researchers’ data reuse practices.  There’s been a great deal of attention lately on encouraging, and even requiring, researchers to share their data, but less work has been done on how researchers actually make use of that shared data (or whether they do at all).  This semester, I’m doing an independent study with my advisor, Dr. Andrea Wiggins, with the aim of better understanding the theoretical background for this problem.  I have the good fortune of working in a job that involves interacting with researchers on data questions on a pretty much daily basis, so I have plenty of opportunity to observe actual practices, but I have less background in the theoretical frameworks for contextualizing and understanding why these things happen, so that’s my goal this semester!  I’ve picked out several readings and am going to write weekly reflections on what I’ve read and thought, and since I have this blog, I figured, why not inflict all this on you, my readers, as well? 🙂

This week I read Paul Davidson Reynolds’ Primer in Theory Construction, which breaks down the research process and explores the scientific method and all its component parts.  It is described as being designed “for those who have already studied one or more of the social, behavioral, or natural sciences, but have no formal introduction to the way theories are constructed, stated, tested, and connected together to form a scientific body of knowledge.”  While I was reading it, I often found myself thinking, “well, yeah, obviously…” but with a little more distance, it occurred to me that it really is useful to stop and consider why research is done the way it is and what we can actually determine using data, inference, and logic.

One of the things I was thinking about as I read this book was how we make the jump from data to knowledge, and how to operationalize terms like “data” and “knowledge.”  The NIH’s big data initiative is called Big Data to Knowledge, but what exactly does it mean to translate “big data” to “knowledge”?  How do we define “big data” (as opposed to small data?) and “knowledge”?  Are the ways that big data become knowledge different from the ways non-big data become knowledge?  There are some good definitions of big data out there, but how do we define “knowledge” in the scientific, and particularly biomedical, realm?

Thinking about how researchers use data by breaking things down to their most basic level is a little different from how I’ve approached things before, but it actually makes good sense.  I’d suggest that the barriers to reuse of shared data are:

  • technological: there aren’t good tools for easily getting and reusing the data, or the data are poor quality or hard to find
  • social: incentive structures of science often do not reward research that reuses data – take a look at the concept of #researchparasites
  • educational: reusing data involves a different skill set that most researchers aren’t taught

However, I never really thought about one of the most fundamental social factors, which is how researchers in a field conceptualize data and how those data are transformed into knowledge.  Are there fundamental differences between the data I gather and data someone else gathers and I reuse?  Obviously if I gather my own data, I know more about their context, quality, and provenance.  If I reuse someone’s shared data, I don’t know how careful they were when collecting them, or other important things I might need to know about how the data were collected in order to reuse them meaningfully.  For example, I once worked with a researcher on locating a clinical dataset for reuse, and once we got the dataset, the researcher asked how patient temperature had been measured – oral, axillary, rectal?  I got back in touch with the original data owner, and they didn’t know – the person who could have answered that question had moved on to a new position.  That detail mattered to the methods of the researcher I was working with, so they couldn’t use the dataset.  The sorts of things that seem like minor details can actually make a big difference, but there’s really no way of knowing that unless you know how a research field works with and understands data.

Some things – like knowing how temperature was measured – are probably pretty specific to a narrow field, or even just a particular research method, and it’s probably not possible to know all of the intricacies of the many fields that comprise biomedical research.  However, I think there are also likely other fundamental qualities of data that would apply more broadly across many research fields, and perhaps that would be a useful approach to this question.