Delocalizing data – a data sharing conundrum

This week I’ve been reading the second half of Sergio Sismondo’s An Introduction to Science and Technology Studies, and I have found myself interested in the question of the universality of scientific knowledge and data.  One sentence, I think, captures the scope of the problem: “scientific and engineering research is textured at the local level, that it is shaped by professional cultures and interplays of interests, and that its claims and products result from thoroughly social processes” (168).  That is to say, the output of a scientific experiment is not some sort of universal truth – rather, data are the record of a manipulation of nature at a given time in a given place by a given person, highly contextualized and far from universally applicable.

I was in my kitchen the other day, baking a mushroom pot pie, after reading Chapter 10, specifically the section on “Tinkering, Skills, and Tacit Knowledge.”  That section describes the difficulties researchers were having in recreating a certain type of laser, even when they had written documentation from the original creators, even when they had sufficient technical expertise to do so, even when they had all the proper tools – in fact, even when they themselves had already built one, they found it difficult to build a second laser.  As I was pulling my pie out of the oven, I was thinking about the tacit knowledge involved in baking – how I know what exactly is meant when the instructions say I should bake till the crust is “golden brown,” how I make the decision to use fresh thyme instead of the chipotle peppers the recipe called for because I don’t like too much heat, how I know that my oven tends to run a little cold so I should set the temperature 10 degrees higher than called for by the recipe.  Just having a recipe isn’t enough to get a really tasty mushroom pot pie out of the oven, just as having a research article or other scientific documentation isn’t enough to get success out of an experiment.

These problems raise some obvious issues around reproducibility, which is a huge focus of concern in science at the moment.  Scientific instruments are presumably a little more standardized than my old apartment oven that runs cold, but you’d be surprised how much variation exists in scientific research.  Reproducibility is especially a problem when the researcher is herself the instrument, as in certain types of qualitative research.  Focus group or interview research is usually conducted using a script, so theoretically anyone could pick up the script and use it to do an interview, but a highly experienced researcher knows how to go off-script in appropriate ways to get the needed information, asking probing questions or guiding a participant back from a tangent.

More relevant to my own research: if we think about data not as representations of some sort of universal truth, but as the results of an experiment conducted within a potentially complex local and social context, can shared data be meaningfully reused?  How do we filter out the noise and get to some sort of ground truth when it comes to data, or can we at all?  Part of the question that I really want to address in my dissertation is what barriers exist to reusing shared data, and I think this is a huge one.  Some of the problem can be addressed by standards, or “formal objectivity” (140).  However, as Sismondo notes, standards are themselves localized and tied to social processes.  Between different scientific fields, the same data point may be measured using vastly different techniques, and within a lab, the equipment you purchase often has a huge impact on how your data are collected and stored.  Maybe we can standardize to an extent within certain communities of practice, but can we really hope to get everyone in the world on one page when it comes to standards?

If we can’t standardize, then maybe we can at least document.  If I measured in inches but your analysis needs length input in centimeters, that’s okay, as long as you know I measured in inches and you convert the data before doing your analysis.  That seems fairly obvious, but how do I know what I need to document to fully contextualize the data for someone else to use?  Is it important that I took the measurement on a Tuesday at 4 pm, that the temperature outside was 80 degrees with 70% humidity, that I used a ruler rather than a tape measure, that the ruler was made of plastic rather than wood?  I could go on and on.  How much documentation is enough, and who decides?
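
The inches-to-centimeters case is simple enough to sketch in a few lines of Python.  This is only an illustration, not a real metadata standard: the `Measurement` record, its fields, and the `to_centimeters` helper are all hypothetical names made up for the example.  The point is that the conversion is only possible because the unit travels with the number.

```python
from dataclasses import dataclass

CM_PER_INCH = 2.54  # exact by international definition


@dataclass
class Measurement:
    """A hypothetical shared data point that carries some of its own context."""
    value: float
    unit: str                    # documented unit: "in" or "cm"
    instrument: str = "unknown"  # optional context, e.g. "plastic ruler"


def to_centimeters(m: Measurement) -> float:
    """Convert a documented measurement to centimeters.

    Refuses to guess when the unit is missing or unrecognized.
    """
    if m.unit == "cm":
        return m.value
    if m.unit == "in":
        return m.value * CM_PER_INCH
    raise ValueError(f"undocumented or unsupported unit: {m.unit!r}")


# I measured in inches; your analysis needs centimeters.
shared = Measurement(value=10.0, unit="in", instrument="plastic ruler")
length_cm = to_centimeters(shared)
```

Even this toy record shows the limits of documentation: it has a slot for the unit and the instrument, but nothing for the day of the week or the humidity, and whoever designed the record decided that in advance.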

The concepts of reproducibility, standardization, and documentation are nothing new, but the idea of data being inextricably caught up in local and social contexts does get me thinking about the feasibility of reusing shared data.  I don’t think data sharing is going to stop – there are enough funders and journals on board with requiring data sharing that I think researchers should expect that data sharing will be part of their scientific work going forward.  The question then is what the utility of these shared data is.  Are they just useful for the transparency of published articles, to document and prove the claims made in those publications?  Or can we figure out ways to surmount data’s limited context and make them more broadly usable in other settings?  Are there certain fields that are more likely to achieve that formal objectivity than others, and therefore certain fields where data reuse may be more appropriate or at least easier than others?  I think this requires further thought.  Good thing I have a few years to spend thinking about it!

