Flip flop: my failed experiment with flipped classroom R instruction

I don’t know if this terminology is common outside of library circles, but it seems like the “flipped classroom” has been all the rage in library instruction lately.  The idea is that learners do some work before coming to the session (like read something or watch a video lecture), and then the in-person time is spent on doing more activities, group exercises, etc.  As someone who is always keen to try something new and exciting, I decided to see what would happen if I tried out the flipped classroom model for my R classes.

Actually, teaching R this way makes a lot of sense.  Especially if you don’t have any experience, there’s a lot of baseline knowledge you need before you can really do anything interesting.  You’ve got to learn a lot of terminology, how the syntax of R works, boring things like what a data frame is and why it matters.  That could easily be covered before class to save the in person time for the more hands-on aspects.  I’ve also noticed a lot of variability in terms of how much people know coming into classes.  Some people are pretty tech savvy when they arrive, maybe even have some experience with another programming language.  Other people have difficulty understanding how to open a file.  It’s hard to figure out how to pace a class when you’ve got people from all over that spectrum of expertise.  On the other hand, curriculum planning would be much easier if you could know that everyone is starting out with a certain set of knowledge and build off of it.

The other reason I wanted to try this is just the time factor.  I’m busy, really busy.  My library’s training room is also hard to book because we offer so many classes.  The people I teach are busy.  I teach my basic introduction to R course as a 3-hour session, and though I’d really rather make it 4 hours, even finding a 3-hour window when I and the room are both available and people are likely to be able to attend is difficult.  Plus, it would be nice if there was some way to deliver this instruction that wasn’t so time-intensive for me.  I love teaching R – it’s probably my favorite thing I do in my job and I’d estimate I’ve taught close to 500 researchers how to code.  I generally spend around 9 hours a month teaching R, plus another 4-6 hours doing prep, administrative stuff, and all the other things that have to get done to make a class function.  That’s a lot of time, and though I don’t at all mind doing it, I’d definitely be interested in any sort of way I could streamline that work without having a negative impact on the experience of learning R from me.

For all these reasons, I decided to experiment with trying out a flipped classroom model for my introduction to R class.  I had grand plans of making a series of short video tutorials that covered bite-sized pieces of learning R.  There would be a bunch of them, but they’d be about 5 minutes each.  I arranged for the library to get Adobe Captivate, which is very cool video tutorial software, and these tutorials are going to be so awesome when I get around to making them.  However, I had already scheduled the class for today, February 28, and I hadn’t gotten around to making them yet.  Fortunately, I had a recording of a previous Intro to R class I’d taught, so I chopped the relevant parts of that up into smaller pieces and made a YouTube playlist that served as my pre-class work for this session, probably about two and a half hours total.

I had 42 people were either signed up or on the waitlist at the end of last week.  I think I made the class description pretty clear – that this session was only an hour, but you did have to do stuff before you got there.  I sent out an email with the link to the video reminding people that they would be lost in class if they didn’t watch this stuff.  Even so, yesterday morning, the last of the videos had only 8 views, and I knew at least two of those were from me checking the video to make sure it worked.  So I sent out another email, once again imploring them to watch the videos before they came to class and to please cancel their registration and sign up for a regular R class if this video thing wasn’t for them.

By the time I taught the class this afternoon, 20 people had canceled their registration.  Of the remaining 22, 5 showed up.  Of the 5 that showed up, it quickly became apparent to me that none of them had watched the videos.  I knew no one was going to answer honestly if I asked who had watched them, so I started by telling them to read in the CSV file to a data frame.  This request is pretty fundamental, and also pretty much the first thing I covered in the videos, so when I was met with a lot of blank stares, I knew this experiment had pretty much failed.  I did my best to cover what I could in an hour, but that’s not much, so instead of this being a cool, interactive class where people ended up feeling empowered and ready to go write code, I got the feeling those people left feeling bewildered and like they wasted an hour.  One guy who had come in 10 minutes late came up to me after class and was like, “so this is a programming language?  What can you do with it?”  And I kind of looked at him like….whaaaat?  It turned out he hadn’t even registered for the class to begin with, much less done any of the pre-class work – he had been in the library and saw me teaching and apparently thought it looked interesting so he decided to wander in.

I felt disappointed by this failed experiment, but I’m not one to give up at the first sign of failure, so I’ve been thinking about how I could make this system work.  It could just be that this model is not suited to people in the setting where I teach.  I am similar to them – a busy, working professional who knows this is useful and I should learn it, but it’s hard to find the time – and I think about what it would take for me to do the pre-class work.  If I had the time and the videos were decent enough quality, I think I’d do it, but honestly chances are 50-50 that I’d be able to find the time.  So maybe this model just isn’t made for my community.

Before I give up on this experiment entirely, though, I’d love to hear from anyone who has tried this kind of approach for adult learners.  Did it work, did it not?  What went well and what didn’t?  And of course, being the data queen that I am, I intend to collect some data.  I’m working on a modified class evaluation for those 5 brave souls who did come to get some feedback on the pre-class work model, and I’m also planning on sending a survey out to the other 38 people who didn’t come to see what I can find out from them.  Data to the rescue of the flipped class!

A Silly Experiment in Quantifying Death (and Doing Better Code)

Doesn’t it seem like a lot of people died in 2016?  Think of all the famous people the world lost this year.  It was around the time that Alan Thicke died a couple weeks ago that I started thinking, this is quite odd; uncanny, even.  Then again, maybe there was really nothing unusual about this year, but because a few very big names passed away relatively young, we were all paying a little more attention to it.  Because I’m a data person, I decided to do a rather silly thing, which was to write an R script that would go out and collect a list of celebrity deaths, clean up the data, and then do some analysis and visualization.

You might wonder why I would spend my limited free time doing this rather silly thing.  For one thing, after I started thinking about celebrity deaths, I really was genuinely curious about whether this year had been especially fatal or if it was just an average year, maybe with some bigger names.  More importantly, this little project was actually a good way to practice a few things I wanted to teach myself.  Probably some of you are just here for the death, so I won’t bore you with a long discussion of my nerdy reasons, but if you’re interested in R, Github, and what I learned from this project that actually made it quite worth while, please do stick around for that after the death discussion!

Part One: Celebrity Deaths!

To do this, I used Wikipedia’s lists of deaths of notable people from 2006 to present. This dataset is very imperfect, for reasons I’ll discuss further, but obviously we’re not being super scientific here, so let’s not worry too much about it. After discarding incomplete data, this left me with 52,185 people.  Here they are on a histogram, by year.

year_plotAs you can see, 2016 does in fact have the most deaths, with 6,640 notable people’s deaths having been recorded as of January 3, 2017. The next closest year is 2014, when 6,479 notable people died, but that’s a full 161 people less than 2016 (which is only a 2% difference, to be fair, but still).  The average number of notable people who died yearly over this 11-year period, was 4,774, and the number of people that died in 2016 alone is 40% higher than that average.  So it’s not just in my head, or yours – more notable people died this year.

Now, before we all start freaking out about this, it should be noted that the higher number of deaths in 2016 may not reflect more people actually dying – it may simply be that more deaths are being recorded on Wikipedia. The fairly steady increase and the relatively low number of deaths reported in 2006 (when Wikipedia was only five years old) suggests that this is probably the case.  I do not in any way consider Wikipedia a definitive source when it comes to vital statistics, but since, as I’ve mentioned, this project was primarily to teach myself some coding lessons, I didn’t bother myself too much about the completeness or veracity of the data.  Besides likely being an incomplete list, there are also some other data problems, which I’ll get to shortly.

By the way, in case you were wondering what the deadliest month is for notable people, it appears to be January:

month_plotObviously a death is sad no matter how old the person was, but part of what seemed to make 2016 extra awful is that many of the people who died seemed relatively young. Are more young celebrities dying in 2016? This boxplot suggests that the answer to that is no:

age_plotThis chart tells us that 2016 is pretty similar to other years in terms of the age at which notable people died. The mean age of death in 2016 was 76.85, which is actually slightly higher than the overall mean of 75.95. The red dots on the chart indicate outliers, basically people who died at an age that’s significantly more or less than the age most people died at in that year. There are 268 in 2016, which is a little more than other years, but not shockingly so.

By the way, you may notice those outliers in 2006 and 2014 where someone died at a very, very old age. I didn’t realize it at first, butWikipedia does include some notable non-humans in their list. One is a famous tree that died in an ice storm at age 125 and the other a tortoise who had allegedly been owned by Charles Darwin, but significantly outlived him, dying at age 176.  Obviously this makes the data and therefore this analysis even more suspect as a true scientific pursuit.  But we had fun, right? 🙂

By the way, since I’m making an effort toward doing more open science (if you want to call this science), you can find all the code for this on my Github repository.  And that leads me into the next part of this…

Part Two: Why Do This?

I’m the kind of person who learns best by doing.  I do (usually) read the documentation for stuff, but it really doesn’t make a whole lot of sense to me until I actually get in there myself and start tinkering around.  I like to experiment when I’m learning code, see what happens if I change this thing or that, so I really learn how and why things work. That’s why, when I needed to learn a few key things, rather than just sitting down and reading a book or the help text, I decided to see if I could make this little death experiment work.

One thing I needed to learn: I’m working with a researcher on a project that involves web scraping, which I had kind of played with a little, but never done in any sort of serious way, so this project seemed like a good way to learn that (and it was).  Another motivator: I’m going to be participating in an NCBI hackathon next week, which I’m super excited about, but I really felt like I needed to beef up my coding skills and get more comfortable with Github.  Frankly, doing command line stuff still makes me squeamish, so in the course of doing this project, I taught myself how to use RStudio’s Github integration, which actually worked pretty well (I got a lot out of Hadley Wickham’s explanation of it).  This death project was fairly inconsequential in and of itself, but since I went to the trouble of learning a lot of stuff to make it work, I feel a lot more prepared to be a contributing member of my hackathon team.

I wrote in my post on the open-ish PhD that I would be more amenable to sharing my code if I didn’t feel as if it were so laughably amateurish.  In the past, when I wrote code, I would just do whatever ridiculous thing popped into my head that I thought my work, because, hey, who was going to see it anyway?  Ever since I wrote that open-ish PhD post, I’ve really approached how I write code differently, on the assumption that someone will look at it (not that I think anyone is really all that interested in my goofy death analysis, but hey, it’s out there in case someone wants to look).

As I wrote this code, I challenged myself to think not just of a way, any way, to do something, but the best, most efficient, and most elegant way.  I learned how to write good functions, for real.  I learned how to use the %>%, (which is a pipe operator, and it’s very awesome).  I challenged myself to avoid using for loops, since those are considered not-so-efficient in R, and I succeeded in this except for one for loop that I couldn’t think of a way to avoid at the time, though I think in retrospect there’s another, more efficient way I could write that part and I’ll probably go back and change it at some point.  In the past, I would write code and be elated if it actually worked.  With this project, I realized I’ve reached a new level, where I now look at code and think, “okay, that worked, but how can I do it better?  Can I do that in one line of code instead of three?  Can I make that more efficient?”

So while this little project might have been somewhat silly, in the end I still think it was a good use of my time because I actually learned a lot and am already starting to use a lot of what I learned in my real work.  Plus, I learned that thing about Darwin’s tortoise, and that really makes the whole thing worth it, doesn’t it?

So you think you can code

I’ve been thinking about many ideas lately dealing with data and data science (this is, I’m sure, not news to anyone).  I’ve also had several people encourage me to pick my blog back up, and I’ve recently made my den into a cute and comfy little office, so, why not put all this together and resume blogging with a little post about my thoughts on data!  In particular, in this post I’m going to talk about coding.

Early on in my library career when I first got interested in data, I was talking to one of my first bosses and told her I thought I should learn R, which is essentially a scripting language, very useful for data processing, analysis, statistics, and visualization.  She gave me a sort of dubious look, and even as I said it, I was thinking in my head, yeah, I’m probably not going to do that.  I’m no computer scientist.  Fast forward a few years later, and not only have I actually learned R, it’s probably the single most important skill in my professional toolbox.

Here’s the thing – you don’t have to be a computer scientist to code, especially in R.  It’s actually remarkably straightforward, once you get over the initial strangeness of it and get a feel for the syntax.  I started offering R classes around the beginning of this year and I call my introductory classes “Introduction to R for Non-programmers.”  I had two reasons for selecting this name: one, I had only been using R for less than a year myself and didn’t (and still don’t) consider myself an expert.  When I started thinking about getting up in front of a room of people and teaching them to code, I had horrifying visions of experienced computer scientists calling me out on my relative lack of expertise, mocking my class exercises, or correcting me in front of everyone.  So, I figured, let’s set the bar low. 🙂  More importantly, I wanted to emphasize that R is approachable!  It’s not scary!  I can learn it, you can learn it.  Hell, young children can (and do) learn it.  Not only that, but you can learn it from one of a plethora of free resources without ever cracking a book or spending a dime.  All it takes is a little time, patience, and practice.

The payoff?  For one thing, you can impress your friends with your nerdy awesome skills!  (Or at least that’s what I keep telling myself.)  If you work with data of any kind, you can simplify your work, because using R (or other scientific programming languages) is faaaaar more efficient than using other point and click tools like Excel.  You can create super awesome visualizations, do crazy data analysis in a snap, and work with big huge data sets that would break Excel.  And you can do all of this for free!  If you’re a research and/or medical librarian, you will also make yourself an invaluable resource to your user community.  I believe that I could teach an R class every day at my library and there would still be people showing up.  We regularly have waitlists of 20 or more people.  Scientists are starting to catch on to all the reasons I’ve mentioned above, but not all of them have the time or inclination to use one of the free online resources.  Plus, since I’m a real human person who knows my users and their research and their data, I know what they probably want to do, so my classes are more tailored to them.

I was being introduced to Hadley Wickham yesterday, who is a pretty big deal in the R world, as he created some very important R packages (kind of like apps), and my friend and colleague who introduced me said, “this is Lisa; she is our prototypical data scientist librarian.”  I know there are other librarian coders out there because I’m on mailing lists with some of them, but I’m not currently aware of any other data librarians or medical librarians who know R.  I’m sure there are others and I would be very interested in knowing them.  And if it is fair to consider me a “prototype,” I wonder how many other librarians will be interested in becoming data scientist librarians.  I’m really interested in hearing from the librarians reading this – do you want to code?  Do you think you can learn to code?  And if not, why not?