Can We Science Our Way out of the Reproducibility Crisis?

June 27, 2018 Hilda Bastian Reproducibility

Many studies are so thin on details, they’re unverifiable, unusable, or both. Many are too small, badly designed, or otherwise unreliable – and then the results are misinterpreted, the validity exaggerated. Many aren’t published, especially if the results aren’t favorable. It’s the scale of these problems, compounding over the years, that constitutes a reproducibility crisis.

Weak science, harmful policies, and counterproductive work practices burrowed us into this hole. It’s all fueled by unexamined assumptions, cherry-picked data, and anecdote-driven beliefs – and even the way we discuss and try to tackle non-reproducibility can be like that. It’s the opposite of the way scientists are meant to approach the world.

We need to science our way through this – not just with more rigorous methods in research and reporting it, but with evidence-based policies and evaluation as well. Good intentions and stabs in the semi-dark aren’t going to cut it.

Take the issue of publication bias. Thanks to meta-research – research on research – we know our health knowledge base is skewed because desirable results are more likely to be published. People initially assumed the fault lay with journals only wanting to publish positive or particularly interesting results. But studying the problem showed that the responsibility lies overwhelmingly on the scientists’ side. You can’t solve a problem if you misdiagnose the cause.

You can’t get complacent with solutions that are totally on point, either. Even when you have an obvious and widely accepted fix, you can’t assume you can relegate the problem to history.

In the late 1980s, systematic reviews with standardized critiques of key elements of clinical trials took off. One of the problems we faced was critical missing information about what the researchers had done. Without it, you couldn’t judge how reliable the trials’ results were. It was so bad that you often couldn’t even be sure what had happened to all of the people who had entered the trial.

So CONSORT was created in the mid-1990s. It’s a set of guidelines for what critical elements of trials need to be reported, along with a flowchart tracking what happened to all the people recruited into the trial. Health journals started to adopt CONSORT as their expected standard. And if you move in certain circles, CONSORT is the unquestioned norm.

Yet well over a decade later, even in journals that adopted CONSORT, most trials still weren’t reporting key elements. Things improved. But having accepted guidelines wasn’t enough. When you try to assess the methodological quality of individual trials, your assessments are still riddled with question marks. Here’s a real example, and a comparatively good one: 5 trials, all post-CONSORT, assessed on 4 criteria. (This comes from my post on understanding data in meta-analysis.)

*Green plus sign is a definite “yes”, a red minus sign is a definite “no”, and a yellow question means the information wasn’t properly reported.*

Our standards keep getting more demanding, too. In clinical effectiveness research, I think it’s almost like a methodological arms race. To me, that’s another critical reason for being extremely rigorous about research into research methods and policies. If we add burdens to scientific studies without commensurate improvements in reliability, verifiability, ethics, or usability, we’re making things worse.

Then there’s the issue of opportunity costs, when we go down a futile road instead of pursuing something else that might have paid off. I think that’s happening, for example, when people put their eggs in the basket of double-blind peer review as the strategy to fight social biases in journal peer review: end of discussion. That lets journals off the hook from the tougher things they should be doing to address these problems directly. And it might make things worse anyway. (I’ve written about this here, here, and here.)

We got so far down that particular dead-end road with the combination of assumptions, anecdotes, and cherry-picking data I spoke of at the start of this post. Another example, I think, is when a 2016 paper advocating badges at journals to promote reproducible research practices went skidding way off the rails. (More on that here and here.)

It’s often the way, though, isn’t it?, that enthusiasm for a position is in almost inverse proportion to the strength of evidence for it… even for scientists. Enthusiasm can be a powerful biasing mechanism, too. That truly sucks, because effecting serious change in a vast enterprise like “science” takes a colossal amount of energy. We need it to be coupled with scientific rigor. The energy for reform is an important community asset we can’t afford to waste.

~~~~

The cartoons are my own (CC BY-NC-ND license). (More cartoons at Statistically Funny and on Tumblr.)

Discussion

Anonymous says:

August 16, 2018 at 12:00 am

You wrote:

“We need to science our way through this – not just with more rigorous methods in research and reporting it, but with evidence-based policies and evaluation as well. Good intentions and stabs in the semi-dark aren’t going to cut it.”

&

“Another example, I think, is when a 2016 paper advocating badges at journals to promote reproducible research practices went skidding way off the rails. (More on that here and here.)

It’s often the way, though, isn’t it?, that enthusiasm for a position is in almost inverse proportion to the strength of evidence for it… even for scientists. Enthusiasm can be a powerful biasing mechanism, too. ”

I would like to add that I think there may be a danger in (solely) depending on, and emphasizing, evidence-based decision making. I think anything that gets measured or investigated has the potential to present a skewed picture (intentionally or unintentionally). Just take a look at the links you provided concerning the paper about the benefits of the “badges”.

Another (possible) danger is that “meta-scientific research” can be used as a way to avoid really engaging with criticism of proposed improvements, and then simply to implement them because you want to. You can say things like “we will have to investigate and see how X will influence Y”, thereby bypassing every criticism while (to the casual listener/reader) still sounding like you mean well. I feel this is already being done.

As someone mentioned in the following discussion about the evidence presented for the potential benefits of “open science”: “Sometimes you do things based on principle, not evidence.”

http://andrewgelman.com/2018/08/06/lets-open-evidence-benefits-open-science/

Besides principles, other things may also be important to base decisions on. In light of this, I am (still) puzzled by (the implementation of) “Registered Reports”, where journals and/or authors can leave the pre-registration information out of the paper as if it’s no big deal, and as if nothing had happened in the past decade to raise awareness of editors/journals/authors misrepresenting evidence and/or the research process in their papers. See here:

http://andrewgelman.com/2017/09/08/much-backscratching-happy-talk-junk-science-gets-share-reputation-respected-universities/#comment-560672

Jean Pierre says:

October 6, 2018 at 12:00 am

Indeed, we are facing a big reproducibility crisis, and the worst thing we could do is look the other way. It is time to start flagging those high-impact pieces of work that are simply not reproducible: http://reproducescience.com/
