Open Badges Redux: A Few Years On, How’s the Evidence Looking?
A couple of years ago, I took shots at the way a small, uncontrolled 2016 study of badges to encourage open science practices at a journal was done, reported, spun, and then hyped far and wide. How small? There were about 4 open badges a month across a 2-year period.
The hype? Badges are a “simple” and “low cost” way to have a “dramatic” effect on data sharing. The hallmarks of a “sounds too good to be true” claim, eh?
In fact, although the article didn’t report it, the badges were one of a suite of concurrent interventions designed to improve the rigor of articles accepted at the journal, and some were very intensive – anything but simple and low cost. And their introduction was accompanied by a drop in articles published (from 25 articles a month in the preceding 2-year period to 19). (Data here.)
The marketing hype continues, though, and the promoters, the Open Science Framework (OSF), report that 66 journals now use their badges. There has been plenty of time and opportunity to gather and report evidence, hasn’t there? So let’s revisit.
What’s been happening at the journals in the study? Psychological Science (PS) is the journal that introduced the badges and other measures. I had gathered long-term publication data on PS and 3 of the 4 comparator journals, and I’ve updated it. (I had excluded one of them because it only started publishing in 2013. It has also since added open badges.) At the end of 2016, one of the comparator journals had overtaken the annual publication rate of PS: in 2019, all of them did. (Data notes below this post.)
Psychological Science (red) & comparator journals: PubMed records 2002 to 2019
The outgoing editor of the journal, D. Stephen Lindsay, reported this month that submissions to the journal have also declined: from 2,700 in the peak year (2011) to around 1,700 a year for the last 3 years. The journal’s impact factor declined in 2018. The interventions at the journal clearly came at great cost in several ways. Lindsay took over the editorship in July 2015 and extended the editorial practices intended to improve scientific rigor. The journal has invested in 6 statistical advisors, for example. Kudos!
At the end of the study period, the rate of open data badges was 39%: among research articles in the print issues July–December 2019, I counted a rate of 43%. Lindsay reports some fluctuation over time, with just over 60% for the full year. Only 8% of the 64 articles I counted for that half year were open access. As I have argued, a study report is data, too. The badges at this journal, then, give a misleading impression of how open the work is.
My previous conclusion was that the data are consistent with the journal repelling potential authors who would not have open data, and attracting those who would. With a substantially shrinking denominator, that could account for the size of the change. (See the additional discussion and data here.) I think Lindsay’s data on submission rates strengthen the case for this explanation.
Could badges alone be enough to get people to make data open that they otherwise would not have, and on a large scale? Because that’s the OSF’s claim:
Implementing these badges dramatically increases the rate of data sharing.
To establish that, we ideally need controlled studies of the badges intervention alone. And because badges are so public, and data-sharers simply moving from one journal to another doesn’t increase sharing overall, we also need to see behavior studied across a field, taking into account funders’ policies on open data. Obviously, if a funder mandated sharing, a journal’s badge down the line can’t take credit for the sharing behavior. Other types of study would be relevant, too, particularly wide-ranging data from all 66 journals with badges, and studies of researchers’ attitudes to badges and data sharing.
Here’s what I found.
There seemed to be no planned evaluation of the wide use of OSF badges.
In May 2017, a relevant systematic review was published by Anisa Rowhani-Farid and colleagues. The OSF-related study of PS was the only study of badges they could find. Two of the review’s authors went on to study a form of badging at the journal Biostatistics (2018). This was another observational study, comparing the journal to one that didn’t use any kind of badging (Statistics in Medicine). In 2009, Biostatistics had begun prominently adding a letter to articles to signify sharing: D for data and C for code. If both were tested for reproducibility, the article got an R for reproducible.
Rowhani-Farid and Barnett randomly selected 30 articles a year from each journal from 2006 to 2013, 480 papers altogether. They had calculated they needed only 19 articles to detect the level of increase seen in the study of PS. They wrote:
It is clear that data availability and probability of sharing were greater over time in Biostatistics than in the control journal, Statistics in Medicine, but the probability of sharing data at Biostatistics was still low, at well below 0.25. Interestingly an increase in data sharing at Biostatistics took place before badges were introduced at the journal…After the introduction of badges at Biostatistics, the probability of data sharing increased 3.9 times. This prevalence ratio might seem like a large increase but on an absolute scale it is only a 7.6% increase in the rate of data sharing, which is much lower than the 37.9% effect of badges at Psychological Science…Badges did not appear to have an effect on code sharing.
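How can a 3.9-fold increase be only a small absolute change? A back-of-the-envelope sketch helps. It assumes, which the quote doesn’t state, that the 3.9 prevalence ratio and the 7.6 percentage-point difference describe the same pair of before-and-after sharing rates, labeled here p₀ and p₁. The paper’s figures come from model estimates, so the implied rates below are only illustrative:

```latex
% p_0 = probability of data sharing before badges, p_1 = after (illustrative labels)
\frac{p_1}{p_0} = 3.9, \qquad p_1 - p_0 = 0.076
\;\Longrightarrow\;
p_0 = \frac{0.076}{3.9 - 1} \approx 0.026, \qquad p_1 = 3.9\,p_0 \approx 0.102
```

When the starting rate is tiny, even a big multiplier leaves sharing at around 1 article in 10, which fits the authors’ point that it stayed “well below 0.25.”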
They noted Biostatistics also had a change of editor in the year their evaluation started. The new editor was Roger Peng, a reproducibility enthusiast. In 2011, he wrote that we needed infrastructure and culture change to increase sharing of data and code: it didn’t seem to me he was relying on badging to change practice.
There’s a saying in biomedicine about randomized controlled studies: studies with enthusiasm have no controls, and studies with controls have no enthusiasm. Apparent effects from observational studies can melt away with more rigorous research. That’s partly down to the greater rigor itself. But it can also be because observational studies that found an effect were more likely to get published (or get attention). If a few of those 66 journals went to print reporting an impact, for example, we wouldn’t know if all the rest had looked and found no impact.
A case in point. Here’s a set of slides for a pilot study that hasn’t been reported in a journal, at least not yet. Christine Hurrell and colleagues, inspired by the PS study, explored introducing badges to encourage depositing articles in a repository. Badges wouldn’t be enough of an incentive, they concluded, as long as depositing an article took around 11 minutes. Right. Well, that’s substantially less effort than preparing a data set for release when release hadn’t been planned.
Effort, and concern about losing credit if others publish more work from your data before you do, are the elephants in this room, not the lack of a badge.
Lisa Federer and colleagues cite 2 studies that found up to 25% of researchers would avoid publishing in a journal that required data sharing. Lindsey Harper and Youngseek Kim studied psychologists’ attitudes to open badges and data sharing after the PS study, but they got only a 12% response rate. In that study, people for whom data sharing meant a lot of work weren’t particularly impressed by the lure of a badge.
The weight of disincentives that a badge has to compete with is summed up in a qualitative study with high-energy physicists. Sebastian Feger & co reported that even getting extra citations was only considered a mild incentive: “It’s more motivating to start a new analysis, other than spending time encoding things…”
Daniel Nüst and colleagues write about what seems to me a much better case for badges: badges as a way to search for data across systems. (They point out that the Association for Computing Machinery has badges, too. More on that here.) But that’s not how they’re being implemented or advocated by OSF.
What could be more effective than a badge? A couple of other studies in psychology looked at introducing a data sharing policy. Tom Hardwicke & co found sharing bumped up at the journal Cognition from 25% to 78% after introduction of a policy. Based on another observational study of journals in psychology, Michèle Nuijten & co wrote “We noted that journal policy on sharing data seems highly effective” (see figure 4 in that paper). That underscores why introducing badges as part of a package isn’t going to tell you much about badges.
Bottom line here: we still don’t know whether badges result in data being shared that otherwise wouldn’t have been. A combination of making data sharing easier and obligatory seems to be what it takes for culture change at scale. “Science” is behavior, as Emma Norris and Brian O’Connor write, and we already know how hard that is to change. Depressingly, the open badges story is a reminder that the use of hyper-biased research for advocacy is another form of behavior that’s hard to change.
What could possibly go wrong?
[Update, 31 December 2019: Added a link to information about the badges at the Association for Computing Machinery – thanks to Melissa Rethlefsen via Twitter.]
Disclosures: When I wrote the original post, my day job was with a public agency that maintains major literature and data repositories (National Center for Biotechnology Information at the National Library of Medicine/National Institutes of Health). I don’t work there any more. I have had a long relationship with PLOS, including several years as an academic editor of PLOS Medicine, and serving in the human ethics advisory group of PLOS One, as well as blogging here at its Blog Network. PLOS does not use the OSF badges. I have been a user of the OSF, but as an R user, I now prefer GitHub.
The original posts:
Bias in Open Science Advocacy: The Case of Article Badges for Data Sharing
What’s Open, What’s Data? What’s Proof, What’s Spin?
Absolutely Maybe posts tagged open science.
Data notes on PubMed records by journal:
There were additional 2016 records for Psych Sci when I did the search, so I updated the data for all journals from 2016 onward. The new data, including the PubMed search terms, are here. The searches were done on 29 December 2019, so the 2019 data may not be complete.
The cartoon and photos of badges are my own (CC BY-NC-ND license). (More cartoons at Statistically Funny and on Tumblr.)