If data sharing is difficult, what can it tell us? An Actor-Network Theory approach

In my ongoing adventures in science and technology studies readings, this week I’ve been reading The Social Construction of Technological Systems.  Strictly speaking, it diverges a little from my interests: it focuses more on the development of technologies than on the laboratory and clinical science I’m interested in. I’m still glad I read it, though, because it sparked some ideas that I think could be worth pursuing.

The portions of the collection that I read were rooted in social constructivist theory (as you might guess from the title of the book), specifically Actor-Network Theory (ANT).  The preface to the 25th anniversary edition explores some developments in the field since the original edition, including “posthuman” approaches that consider nonhuman actants within social systems (xxv).  Scientific researchers operate within a complex system. This is partly because scientific research is itself often complicated, but also because science happens within a social system involving grant funding, scholarly articles, citations, and so on.  Data play several important roles in that system: they are the raw product of scientific research, they serve as evidence for scientific claims, and, now that data sharing is becoming expected in many fields, they have become something of a commodity.  Because ANT allows actants to be nonhuman, I think it would be reasonable to consider data an actant in the social network of scientific research, and potentially one of the more interesting parts of that network, even more so than the humans.

The other avenue this collection sent my mind down had to do with data repositories.  At the start of the chapter “Society in the Making: The Study of Technology as a Tool for Sociological Analysis,” Michel Callon argues that “the study of technology itself can be transformed into a sociological tool of analysis” (77).  In short, he argues that technological systems are created by what he calls “engineer-sociologists”: the designers or creators of a technology, who have had to turn themselves into sociologists, studying the intended users in order to develop technologies that will meet their needs.  If this is true, then these technologies should be able to tell us something about their intended users.

This chapter got me thinking about some of the systems that are in place for data sharing, like some of the major data repositories.  I won’t name any names, but there are a couple of very well-known data repositories that people often complain to me about when it comes to submitting their data.  In some labs, researchers have mentioned that only one person knows how to submit the data, and everyone else has to bug that person because they can’t figure out how to do it properly.  I’ve read some of the help documentation for these repositories, and those people weren’t complaining for nothing.  Many of these systems are a big pain: opaque in many of their requirements and onerous to use. Yet many researchers are specifically required to deposit their data there because of grant or journal requirements.

So if we take Callon’s approach and view the system as a tool for sociological analysis, what does it say about the state of data sharing that some of these repositories are so difficult to use?  I can think of a few possibilities:

  • that the engineers haven’t been in close contact with the users, so they’ve built a system that doesn’t actually meet their users’ needs;
  • that the needs of the system administrators (good quality data with a minimal amount of effort on their part) are directly at odds with the needs of the data submitters (also a minimal amount of effort on their part) and the administrators’ needs won out;
  • that the engineers are aware of issues but there just isn’t money/time/resources to make the system easier to use.

Another possibility is that sharing data isn’t really much of a priority for most researchers, so they go along with a hard-to-use system because it’s not worth the trouble to try to change it.  It’s a bit like how I feel about the DMV: dealing with it is a huge pain, but I only have to go there once every few years, so I’m not about to start a campaign to reform it, especially when there are bigger problems our elected officials should be dealing with.  Maybe sharing your data in some of these systems is like that: an annoyance you put up with because you have to.

This is all pure speculation on my part, but I do think it’s an interesting approach to take.  It would be worthwhile to sit down with some of the people who built or currently run some of these systems and get the story on why things are the way they are.
