Study Preregistration and Avoiding the Methods Fetish/Demonization Trap

March 22, 2024 Hilda Bastian Bias Reproducibility

A bulging smug study is saying "I'm bigger!" A cute small study sitting on a data plan says, "But I think I'm better!" (Cartoon by Hilda Bastian.)

This month, there was a thought-provoking 2-day workshop about preregistering studies, held at the Royal Society in London. The theme was “promises and pitfalls.” While there was some data, mostly there were lots of theories and claims to consider. It was a terrific meeting, with speakers and participants generally grappling with the tough task of being realistic. That said, watching extremely “pro” and “anti” advocates engage with each other’s arguments was enlightening, too.

Brian Nosek listed a range of potential benefits for registering some details or full study plans ahead of time:

Transparency of research process – to facilitate evaluation and self-correction;
Existence of research – to identify publication bias and for other meta-estimation;
Reliability of research – to reduce questionable research practices and improve estimation;
Planning of research – to improve quality, efficiency, and planning of research;
Others: avoiding redundancy, participant enrolment, fraud deterrence, better documentation, and public commitment to do the research.

Not all of these apply to every type of preregistration, which ranges from registering minimal data through to registered reports, where detailed methods and data analysis plans are peer reviewed, and potentially accepted for publication in principle before results are in. Further, there’s a bunch of dependencies that affect whether or not a potential benefit could even theoretically be realized by a particular preregistration practice.

Take, for example, registration of clinical trials. Mandates pushed the registration of minimal details to routine practice, and the data is mostly very easy to find and download. As Isabelle Boutron showed in her talk, that has had an enormous impact on the measure’s major goals, which fall under the category of “existence of research” in Nosek’s list.

Trial registers power what Boutron called “collective accountability” now that metascientists, regulators, and others can track whether or not the results of planned trials have been published. Trial registers fuel a lot of metascience that informs the field and evolution of its methods. And those systematic reviewers who search trial registers aren’t as much in the dark about the possible extent of publication bias on the questions they’re studying – and can, sometimes at least, chase down unpublished data to improve their meta-analyses.

However, those benefits are meaningful because the practice is essentially ubiquitous. Where preregistration is uncommon, or even rare, it can’t provide broad accountability across topics, or solid knowledge about publication bias.

On the other hand, the minimal data that’s standard in trial registries can’t put much of a dent in other problems – reducing questionable research practices, for example. Nicholas DeVito showed that even with the minimal requirements in registries, the data quality is often inadequate. The potential benefits of what is there aren’t fully realized either. For example, most systematic reviewers don’t search trial registries. In the worst example he cited, only 12% of systematic reviews included a search for studies in registries. (That was anesthesiology.)

Those issues – quality of preregistrations, and following through on the next steps that can realize their potential – are a major obstacle to preregistration achieving the aspirations of the methods at scale. Dedicated effort by individuals and journals, though, can result in improved research for them. We heard lots of testimonials about the research quality payoffs from the level of discipline involved in preparing detailed protocols and data analysis plans. The potential for reducing researcher bias by spelling hypotheses and expectations out in full preregistration was, I think, captured by E.J. Wagenmakers when he said, “It attends me to the fact that I was wrong.”

How often is that ideal reached, though, and what does garden-variety preregistration achieve, other than investing intensive time in the planning stage of research projects? Does that extra time pay off in the long run? And what could possibly go wrong?

Outside of fields where preregistration is well-established, like clinical trials and systematic reviews in biomedicine, there’s not a lot of evidence. Across the workshop, results of several studies in psychology were referred to. The report of the most recent one, by Olmo van den Akker and colleagues, includes a good discussion of their own and previous studies assessing the respective quality of studies that were or were not preregistered. They point out that several factors other than simply their preregistration make these studies different to others in psychology. For example, the researchers who choose these unusual methods self-select and may be systematically different. Therefore, they argue, “causal claims about the effect of preregistration on the proportion of positive results or effect size are difficult to make.”

At the Royal Society workshop, Abel Brodeur also discussed a study he undertook with colleagues, analyzing signs of p-hacking in preregistered and unregistered randomized trials in economics. This study underscores one of the complexities van den Akker raised. Brodeur’s analysis suggested there was no difference between studies that were preregistered and those that weren’t – but studies with full prospective data analysis plans did seem to have less p-hacking.

In 2021, Aline Claesen and colleagues published an analysis of reported and unreported deviations from preregistered research plans in the first 27 studies to get the preregistration badge in Psychological Science. (The 27 studies were reported within 23 articles.) Deviations, they stress, aren’t inherently a problem, especially when it becomes clear the original plan had mistakes, or was sub-optimal in some respects. They should, however, be reported. Claesen & co determined that only 3 of the 27 studies either had no deviations, or reported them all. Deviations in the Claesen study included very minor variations, and points that are highly contestable.

However, some of the deviations were critical to avoiding the problems preregistration aspires to prevent. Most of the studies – 14 of them – reported analyses that hadn’t been preregistered, and only 3 of them disclosed this. And 5 study reports didn’t include results of all the preregistered analyses, with 2 of them disclosing or explaining this. Claesen discussed a pair of similar studies in other social science fields that had similar results.

None of this was surprising to me, coming from the biomedical field where preregistration is well-established. We had published protocols for Cochrane systematic reviews, for example, from day 1 in the early 1990s. These were always a form of registered report, though that terminology wasn’t around back then. And for years, I worked with public agencies that not only preregister, but also have major public consultations on draft protocols. Yet, even with preregistration expected as standard, we don’t lower the guards. Within systematic reviews, we still expect trials to be assessed for biased results reporting, regardless of registration. And in assessing the quality of systematic reviews, we expect to check whether there are significant unreported deviations from the protocol.

What surprises me is when people treat preregistration as if it’s proof of superiority, without further critique. I recently criticized a systematic review in psychology as an example of this, and why it’s such a problem. I raised this in the panel I was in at the end of the Royal Society workshop, as an example of fetishization of preregistration – endowing the act of preregistration alone with awesome powers it doesn’t have. Inflated valuation of it is one of the current downsides of the use of preregistration in psychology, for example, but it’s an avoidable one.

After attending this workshop, I think demonization of preregistration is a bigger problem than hype in the social sciences. Coming from outside that community, I hadn’t appreciated its extent and influence. The flurry of straw man arguments is quite something, and it muddies the discussion of genuine potential pitfalls, diverting people’s energy to defensiveness. Moin Syed wrote about this in his excellent post about the workshop:

With preregistration, and open science practices in general, folks seem to feel free to fire off half-baked criticisms, raising issues that have already been fully addressed. This was on full display at the meeting, with comments about how preregistration prohibits exploratory analysis (false), that researchers are “locked in” to their preregistration plan (false), that preregistration prohibits optional stopping of data collection within a frequentist framework (false), that preregistration may not be appropriate for public datasets (false), that preregistration’s primary use is for replication studies (false), and so on. These are all just false claims, and it is disappointing to see them trotted out again and again.
Moin Syed, Preregistration: More Promises than Pitfalls (2024)

I’d add a couple more that had me mentally shaking my head in frustration over and over during some presentations. Firstly, arguments based on the claim that people have to pre-select only a single outcome or model. And secondly, dismissing preregistration entirely just because it doesn’t entirely, on its own, solve a particular problem. Sheesh! As Fiona Fidler pointed out, editorial interventions typically have trivial or even negligible impacts on the complex problem of research quality. Even if preregistration only has a modest effect, it would be more than pulling its weight.

In Syed’s post, he wrote, “There was a stark difference at the meeting between people who did and did not consider the possibility of researcher bias.” I agree with this, too. Syed digs into the argument about whether predetermined versus post hoc reasoning is inherently meaningful. I agree with him again: To argue that there’s no real difference is to ignore the systemic distortion of bodies of evidence that results from researcher bias.

An example of this came in Chris Donkin’s talk. He argued: “Replacing one arbitrary reason for an analysis (i.e., the size of a p-value) with another (i.e., I said it beforehand) doesn’t solve the fundamental issue.” Neither of these is arbitrary. Indeed, choosing an outcome to report based on the size of the p-value is the very opposite of arbitrary, and we know it has a pernicious effect. Pre-specifying outcome analyses with transparent reporting could ameliorate this. It just needs to make a worthwhile difference; it doesn’t have to be a panacea.

Fortunately, the workshop participants didn’t expend all their energy batting down arguments based on a caricature of preregistration. There was plenty of discussion of journals’ perspectives, with several journals represented among the participants, as well as whether there are some types of study where preregistration isn’t likely to have potential benefits.

Several serious potential pitfalls emerged, as well:

What Wagenmakers called “model myopia” – the opposite of cherry-picking, this is testing and reporting the results from only a single statistical model: Could preregistration increase its incidence?
Could preregistration increase the risk of false-negatives, curtailing potentially valuable lines of enquiry?
Could people feel unnecessarily constrained from deviating from their plans, and thus limit the potential for discovery?
Simine Vazire discussed the power imbalance for many researchers submitting a study for a registered report. When journals reject a study with “negative” results, the outcome can be publication bias if the study remains unpublished. Could rejections of research plans submitted for a registered report lead to people not doing studies at all?

The last part of the meeting concentrated on registered reports, and the results-independent editorial review and decision-making at the heart of it. Anne Scheel introduced us to interesting work she’s doing with colleagues on risk perspectives for researchers considering going this route. The incentive structure for registered reports, she pointed out, has trade-offs in the current context, where highly prestigious journals don’t offer it. If you get an in-principle acceptance from a journal before you run the study, and you don’t have “spectacular” results, you end up ahead, as you get your publication without having to invest a lot of effort late in the piece – potentially at more than one journal, too. However, if you have “spectacular” results, you could have a good chance of scoring a spot in a very high status journal.

I’m going to end with a highlight of the workshop. Chris Chambers introduced a registered report model 11 years ago that’s now been adopted by more than 350 journals. He listed several limitations of it, though, including the time that peer review of the research plan adds to the project, and that it’s not well-suited to programmatic research with multiple potential publications from one plan. Critically, it leaves the power over the practice, model, and its development in the hands of journals, not the academic community.

Enter Registered Reports 2.0, Peer Community in Registered Reports. This isn’t a journal. It’s an academic community that de-couples peer review from journal publication. It is, Chambers said, “managed, controlled, and owned by the community.” It’s free to authors, and peer reviews are published (with or without names), linked to preprints and eventual journal publications.

PCI has a distributed funding model – relatively small donations from a number of sources distribute the load and protect the model from the impact of a major funder pulling out. The list is here.

If you want to go the RR route with them, you give a short summary of your plans a couple of months before the Stage 1 report (the full research plan) is ready, and schedule peer review. By scheduling peer review in advance for each stage, the goal is to crunch down the process into days, not months. The website doesn’t seem to have data on actual turnaround and journal publications. PCI RR only launched in April 2021, though, so it’s very early days for a preregistration initiative.

There are about 3 dozen “PCI RR-friendly” journals that commit to offering in-principle acceptance to any research that gets this far if the authors want to publish with them – including Collabra: Psychology, Cortex, and Royal Society Open Science (listed here). There are also some “PCI RR-interested” journals, including PLOS Biology and Nature Human Behaviour, that get notified of new PCI RR recommendations. They might contact authors to offer publication, with or without their own additional peer review (listed here).

Chambers had begun his talk asking us to “imagine a publication utopia.” If scheduled, free-to-user, community-controlled peer review, recognized widely by journals, can work at scale, this model would get pretty close to many people’s utopia. This is going to be interesting!

You can follow PCI RR on Mastodon @pcirr@spore.social and on BlueSky @pci-regreports@bsky.social

Point them to Chris Chambers’ talk if you would like to encourage a journal to become PCI RR-curious (that’s not a thing – I made that designation up!). Here are FAQs for journals.

The recording of the Royal Society workshop is on YouTube: Day 1, Day 2.

~~~~

Disclosures: I received travel and participation support from the Royal Society to attend the March meeting on preregistration, in London.

The cartoon is my own (CC BY-NC-ND license). (More cartoons at Statistically Funny.)
