A Silly Experiment in Quantifying Death (and Doing Better Code)

Doesn’t it seem like a lot of people died in 2016?  Think of all the famous people the world lost this year.  It was around the time that Alan Thicke died a couple weeks ago that I started thinking, this is quite odd; uncanny, even.  Then again, maybe there was really nothing unusual about this year, but because a few very big names passed away relatively young, we were all paying a little more attention to it.  Because I’m a data person, I decided to do a rather silly thing, which was to write an R script that would go out and collect a list of celebrity deaths, clean up the data, and then do some analysis and visualization.

You might wonder why I would spend my limited free time doing this rather silly thing.  For one thing, after I started thinking about celebrity deaths, I was genuinely curious about whether this year had been especially fatal or if it was just an average year, maybe with some bigger names.  More importantly, this little project was actually a good way to practice a few things I wanted to teach myself.  Probably some of you are just here for the death, so I won’t bore you with a long discussion of my nerdy reasons, but if you’re interested in R, GitHub, and what I learned from this project that actually made it quite worthwhile, please do stick around for that after the death discussion!

Part One: Celebrity Deaths!

To do this, I used Wikipedia’s lists of deaths of notable people from 2006 to present. This dataset is very imperfect, for reasons I’ll discuss further, but obviously we’re not being super scientific here, so let’s not worry too much about it. After discarding incomplete data, this left me with 52,185 people.  Here they are on a histogram, by year.

[Figure: year_plot]

As you can see, 2016 does in fact have the most deaths, with 6,640 notable people’s deaths having been recorded as of January 3, 2017. The next closest year is 2014, when 6,479 notable people died, but that’s a full 161 fewer than in 2016 (which is only about a 2% difference, to be fair, but still).  The average number of notable people who died each year over this 11-year period was 4,744, and the number who died in 2016 alone is 40% higher than that average.  So it’s not just in my head, or yours – more notable people died this year.
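If you'd like to check that arithmetic yourself, here is a minimal sketch. It is written in Python purely for illustration (the project itself was done in R), it uses only the figures quoted in this post, and it assumes the 52,185 total covers exactly the 11 years from 2006 through 2016.

```python
# Figures quoted in the post (not re-scraped from Wikipedia).
total_deaths = 52_185   # assumption: covers exactly 2006 through 2016
n_years = 11
deaths_2016 = 6_640
deaths_2014 = 6_479

# Yearly average over the whole period
yearly_mean = total_deaths / n_years

# How far 2016 sits above the runner-up year and above the average
gap_vs_2014 = deaths_2016 - deaths_2014
pct_vs_2014 = 100 * gap_vs_2014 / deaths_2014
pct_vs_mean = 100 * (deaths_2016 - yearly_mean) / yearly_mean

print(f"yearly mean: {yearly_mean:,.0f}")
print(f"2016 vs 2014: {gap_vs_2014} more deaths ({pct_vs_2014:.1f}%)")
print(f"2016 vs yearly mean: {pct_vs_mean:.0f}% higher")
```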

Now, before we all start freaking out about this, it should be noted that the higher number of deaths in 2016 may not reflect more people actually dying – it may simply be that more deaths are being recorded on Wikipedia. The fairly steady increase and the relatively low number of deaths reported in 2006 (when Wikipedia was only five years old) suggest that this is probably the case.  I do not in any way consider Wikipedia a definitive source when it comes to vital statistics, but since, as I’ve mentioned, this project was primarily to teach myself some coding lessons, I didn’t bother myself too much about the completeness or veracity of the data.  Besides likely being an incomplete list, there are also some other data problems, which I’ll get to shortly.

By the way, in case you were wondering what the deadliest month is for notable people, it appears to be January:

[Figure: month_plot]

Obviously a death is sad no matter how old the person was, but part of what seemed to make 2016 extra awful is that many of the people who died seemed relatively young. Are more young celebrities dying in 2016? This boxplot suggests that the answer to that is no:

[Figure: age_plot]

This chart tells us that 2016 is pretty similar to other years in terms of the age at which notable people died. The mean age of death in 2016 was 76.85, which is actually slightly higher than the overall mean of 75.95. The red dots on the chart indicate outliers, basically people who died at an age that’s significantly more or less than the age most people died at in that year. There are 268 in 2016, which is a little more than other years, but not shockingly so.
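For anyone curious how a boxplot decides which points count as outliers, the usual convention is to flag anything more than 1.5 times the interquartile range below the first quartile or above the third. The sketch below assumes the chart follows that convention, is written in Python rather than the R used for the project, and runs on a made-up list of ages (`sample_ages` is hypothetical, not the Wikipedia data).

```python
from statistics import quantiles

def boxplot_outliers(ages):
    """Return the ages outside the usual 1.5 * IQR boxplot fences."""
    # Quartiles of the data; the middle value (the median) is not needed here
    q1, _, q3 = quantiles(ages, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [age for age in ages if age < low_fence or age > high_fence]

# Hypothetical ages at death, including one very old tortoise
sample_ages = [22, 61, 68, 70, 74, 77, 79, 81, 84, 88, 93, 176]
print(boxplot_outliers(sample_ages))
```

Note that the exact fences depend on how the quartiles are computed, so different tools can disagree slightly about which borderline points get flagged.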

By the way, you may notice those outliers in 2006 and 2014 where someone died at a very, very old age. I didn’t realize it at first, but Wikipedia does include some notable non-humans in its lists. One is a famous tree that died in an ice storm at age 125 and the other a tortoise who had allegedly been owned by Charles Darwin, but significantly outlived him, dying at age 176.  Obviously this makes the data and therefore this analysis even more suspect as a true scientific pursuit.  But we had fun, right? 🙂

By the way, since I’m making an effort toward doing more open science (if you want to call this science), you can find all the code for this in my GitHub repository.  And that leads me into the next part of this…

Part Two: Why Do This?

I’m the kind of person who learns best by doing.  I do (usually) read the documentation for stuff, but it really doesn’t make a whole lot of sense to me until I actually get in there myself and start tinkering around.  I like to experiment when I’m learning code, see what happens if I change this thing or that, so I really learn how and why things work. That’s why, when I needed to learn a few key things, rather than just sitting down and reading a book or the help text, I decided to see if I could make this little death experiment work.

One thing I needed to learn: I’m working with a researcher on a project that involves web scraping, which I had kind of played with a little, but never done in any sort of serious way, so this project seemed like a good way to learn that (and it was).  Another motivator: I’m going to be participating in an NCBI hackathon next week, which I’m super excited about, but I really felt like I needed to beef up my coding skills and get more comfortable with GitHub.  Frankly, doing command line stuff still makes me squeamish, so in the course of doing this project, I taught myself how to use RStudio’s GitHub integration, which actually worked pretty well (I got a lot out of Hadley Wickham’s explanation of it).  This death project was fairly inconsequential in and of itself, but since I went to the trouble of learning a lot of stuff to make it work, I feel a lot more prepared to be a contributing member of my hackathon team.

I wrote in my post on the open-ish PhD that I would be more amenable to sharing my code if I didn’t feel as if it were so laughably amateurish.  In the past, when I wrote code, I would just do whatever ridiculous thing popped into my head that I thought might work, because, hey, who was going to see it anyway?  Ever since I wrote that open-ish PhD post, I’ve really approached how I write code differently, on the assumption that someone will look at it (not that I think anyone is really all that interested in my goofy death analysis, but hey, it’s out there in case someone wants to look).

As I wrote this code, I challenged myself to think not just of a way, any way, to do something, but the best, most efficient, and most elegant way.  I learned how to write good functions, for real.  I learned how to use %>% (which is a pipe operator, and it’s very awesome).  I challenged myself to avoid using for loops, since those are considered not-so-efficient in R, and I succeeded except for one for loop that I couldn’t think of a way to avoid at the time.  In retrospect, I think there’s another, more efficient way I could write that part, and I’ll probably go back and change it at some point.  In the past, I would write code and be elated if it actually worked.  With this project, I realized I’ve reached a new level, where I now look at code and think, “okay, that worked, but how can I do it better?  Can I do that in one line of code instead of three?  Can I make that more efficient?”

So while this little project might have been somewhat silly, in the end I still think it was a good use of my time because I actually learned a lot and am already starting to use a lot of what I learned in my real work.  Plus, I learned that thing about Darwin’s tortoise, and that really makes the whole thing worth it, doesn’t it?

Friday Fun Paper: Down the Dark Road of Carrot Addiction

Be careful – your next salad might be the one that starts you down the dark path of carrot addiction. (By Kander, via Wikimedia Commons)

Last week there was no Friday Fun Paper because I was off in Cape Cod gallivanting as well as attending the National Library of Medicine Bioinformatics course at the Woods Hole Marine Biological Laboratory.  I met some very interesting people and learned a ton, and I can highly recommend this experience to anyone interested in applying technology to medicine.

At the end of the week, I boarded the plane back to Los Angeles, and found myself seated next to a woman who was probably in her mid-60s.  About halfway through the flight, she pulled a small bag of baby carrots out of the back of her seat pocket, set them on her tray table, looked at them for several minutes, and then put them away again.  I did not subsequently see the carrots, and I’m fairly certain that she did not eat them at any point in the flight.  This was strange enough in itself, but it also struck me because it reminded me of a very strange and fascinating article I read several years ago by Mary Roach, one of my favorite science writers.

Published several years ago in Salon, Roach’s article “Turning Orange” elucidates the curious phenomenon of carrot addiction.  Yes, this is a real thing.  Roach interviewed several carrot addicts, including one who had not been able to travel for many years because of her carrot addiction – she had to have her carrots cooked in a special way and eat them immediately after they were cooked, so she couldn’t go on a long flight or road trip because she wouldn’t have access to the carrots.  When her out-of-state daughter was going to get married, she braved the flight, but had to have her daughter waiting at the airport with the carrots as soon as she got off the flight.  It occurred to me that I might be sitting next to this woman, but since her carrots were raw in their original packaging and she never ate them, I think it’s unlikely.

A quick search of the medical literature¹ reveals that the subject of carrot addiction has been explored by one R. Kaplan in the Australian and New Zealand Journal of Psychiatry (30.5) in an article titled simply “Carrot Addiction.”  The mechanism by which people become addicted to carrots, as far as I’m aware, remains unknown, though there are two theories.  First, some carrot addicts develop their carrot addiction while they’re quitting smoking, suggesting that it’s sort of an oral fixation substitute.  Second, some people may actually become physically addicted to the beta carotene in carrots. Some people end up eating so many carrots that their skin actually turns orange from the beta carotene.  Not even kidding.

So now, next time you see someone eating carrots, I bet you’re going to wonder, aren’t you?  Is this just a casual carrot eater, or are you dealing with a full-on carrot addict?

1.  In case you’re interested, after some playing around, my PubMed search string was

("Behavior, Addictive"[Mesh] OR addict*) AND ("Daucus carota"[Mesh] OR "carrot" NOT "Card Arranging Reward Responsivity Objective Test")

The bit on the end about the card arranging test is because I was getting lots of articles about this test (abbreviated CARROT) being used with people who had other addiction issues.
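If you'd rather run this search from a script than from the PubMed website, one option is NCBI's E-utilities ESearch endpoint. The Python sketch below only builds the request URL (it doesn't send anything), with the search string typed using straight quotes. The helper name `build_esearch_url` and the `retmax` value of 20 are just illustrative choices.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The search string from the footnote, with plain straight quotes
search_string = (
    '("Behavior, Addictive"[Mesh] OR addict*) AND '
    '("Daucus carota"[Mesh] OR "carrot" NOT '
    '"Card Arranging Reward Responsivity Objective Test")'
)

def build_esearch_url(term, retmax=20):
    """Build (but do not send) an ESearch request for PubMed."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    # urlencode takes care of escaping the quotes, brackets, and spaces
    return f"{ESEARCH_URL}?{urlencode(params)}"

url = build_esearch_url(search_string)
print(url)
```

Pasting the printed URL into a browser should return the matching PubMed IDs as JSON.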