Can We Science Our Way out of the Reproducibility Crisis?
Many studies are so thin on detail that they’re unverifiable, unusable, or both. Many are too small, badly designed, or otherwise unreliable – and then their results get misinterpreted and their validity exaggerated. Many aren’t published at all, especially when the results aren’t favorable. It’s the scale of these problems, compounding over the years, that constitutes a reproducibility crisis.
Weak science, harmful policies, and counterproductive work practices dug us into this hole. It’s all fueled by unexamined assumptions, cherry-picked data, and anecdote-driven beliefs – and even the way we discuss and try to tackle non-reproducibility can be like that. It’s the opposite of the way scientists are meant to approach the world.
We need to science our way through this – not just with more rigorous methods for doing and reporting research, but with evidence-based policies and evaluation as well. Good intentions and stabs in the semi-dark aren’t going to cut it.
Take the issue of publication bias. Thanks to meta-research – research on research – we know our health knowledge base is skewed because desirable results are more likely to be published. People initially assumed the fault lay with journals only wanting to publish positive or particularly interesting results. But studying the problem showed that the responsibility lies overwhelmingly on the scientists’ side: most unpublished studies are never even submitted to a journal. You can’t solve a problem if you misdiagnose the cause.
You can’t get complacent with solutions that are totally on point, either. Even when you have an obvious and widely accepted fix, you can’t assume you can relegate the problem to history.
In the late 1980s, systematic reviews with standardized critiques of key elements of clinical trials took off. One of the problems we faced was that critical information about what the researchers had done was missing. Without it, you couldn’t judge how reliable the trials’ results were. It was so bad that you often couldn’t even be sure what had happened to all of the people who had entered the trial.
So CONSORT was created in the mid-1990s. It’s a set of reporting guidelines – a checklist of the critical elements of a trial that need to be reported, plus a flow diagram tracking what happened to all the people recruited into it. Health journals started to adopt CONSORT as their expected standard. And if you move in certain circles, CONSORT is the unquestioned norm.
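To make the flow diagram idea concrete, here’s a minimal sketch of the participant accounting it asks for. Everything in it – the stage names, the numbers, the little checking function – is hypothetical, just to show why the accounting matters:

```python
# Hypothetical participant flow for a two-arm trial -- made-up numbers,
# purely to illustrate the kind of accounting a CONSORT flow diagram demands.
flow = {
    "assessed for eligibility": 240,
    "excluded before randomization": 40,
    "randomized": 200,
    "allocated to intervention": 100,
    "allocated to control": 100,
    "lost to follow-up (intervention)": 7,
    "lost to follow-up (control)": 5,
    "analyzed (intervention)": 93,
    "analyzed (control)": 95,
}

def unaccounted_for(flow):
    """Return a list of gaps where participants go missing between stages."""
    problems = []
    if flow["assessed for eligibility"] - flow["excluded before randomization"] != flow["randomized"]:
        problems.append("numbers from eligibility to randomization don't add up")
    for arm in ("intervention", "control"):
        expected = flow[f"allocated to {arm}"] - flow[f"lost to follow-up ({arm})"]
        analyzed = flow[f"analyzed ({arm})"]
        if analyzed != expected:
            problems.append(f"{arm} arm: {expected - analyzed} participants unaccounted for")
    return problems

print(unaccounted_for(flow) or "everyone accounted for")
```

If a trial report can’t fill in numbers like these, readers are left guessing how much attrition – and how much potential bias – is hiding in the results.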
Yet well over a decade later, even in journals that adopted CONSORT, most trials still weren’t reporting key elements. Things improved, but having accepted guidelines wasn’t enough. When you try to assess the methodological quality of individual trials, your assessments are still riddled with question marks. Here’s a real example that’s comparatively good: five trials, all post-CONSORT, assessed on four criteria. (This comes from my post on understanding data in meta-analysis.)
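The figure itself is in that post, but as a rough stand-in, this is the shape such an assessment grid takes – hypothetical trials and judgments here, not the real ones:

```python
# Hypothetical quality-assessment grid -- made-up trials and judgments, only to
# show the structure (and the question marks) such an assessment ends up with.
criteria = [
    "random sequence generation",
    "allocation concealment",
    "blinding of outcome assessment",
    "incomplete outcome data addressed",
]

# "yes" = adequately done/reported, "no" = not, "?" = the report doesn't say.
assessments = {
    "Trial A": ["yes", "?",   "yes", "yes"],
    "Trial B": ["yes", "yes", "?",   "?"],
    "Trial C": ["?",   "?",   "no",  "yes"],
    "Trial D": ["yes", "?",   "yes", "?"],
    "Trial E": ["?",   "no",  "?",   "yes"],
}

unclear = sum(row.count("?") for row in assessments.values())
total = len(criteria) * len(assessments)
print(f"{unclear} of {total} judgments are question marks ({unclear / total:.0%})")
```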
Our standards keep getting more demanding, too. In clinical effectiveness research, I think it’s almost like a methodological arms race. To me, that’s another critical reason for being extremely rigorous about research into research methods and policies. If we add burdens to scientific studies without commensurate improvements in reliability, verifiability, ethics, or usability, we’re making things worse.
Then there’s the issue of opportunity costs, when we go down a futile road instead of pursuing something else that might have paid off. I think that’s happening, for example, when people put all their eggs in the basket of double-blind peer review as the strategy for fighting social biases in journal peer review: end of discussion. That lets journals off the hook from the tougher things they should be doing to address these problems directly. And it might make things worse anyway. (I’ve written about this here, here, and here.)
We got so far down that particular dead-end road because of the combination of assumptions, anecdotes, and cherry-picked data I spoke of at the start of this post. Another example, I think, is the way a 2016 paper advocating badges at journals to promote reproducible research practices went skidding way off the rails. (More on that here and here.)
It’s often the way, though, isn’t it, that enthusiasm for a position is in almost inverse proportion to the strength of the evidence for it… even for scientists. Enthusiasm can be a powerful biasing mechanism, too. That truly sucks, because effecting serious change in a vast enterprise like “science” takes a colossal amount of energy. We need that energy to be coupled with scientific rigor. Reforming energy is an important community asset we can’t afford to waste.
~~~~
The cartoons are my own (CC BY-NC-ND license). (More cartoons at Statistically Funny and on Tumblr.)