This is my fifth annual research roundup about journal peer review – and we’re still only averaging about one randomized trial a year. This year, there’s just one, too. It’s the biggest one yet on whether or not to conceal authors’ names from peer reviewers, so it’s a particularly welcome addition. As important as this new evidence is, though, it also highlights why we need more if we’re to get a good handle on this critical part of science. Let’s get stuck straight in…
- The largest trial to date of not naming authors to peer reviewers (“double-blind review”) was published. The authors concluded that there was a substantial reduction in prestige bias.
- Language models can be trained to identify authors in 70-90% of papers based on the text and references. In most cases, the first 512 words, starting from the abstract, are enough text.
- Politeness in peer reviews could theoretically be measured computationally – if so, automated politeness checks might be feasible.
- There are several soundly-based free online training courses for peer reviewers. There isn’t enough evidence, though, on whether training could substantially improve the quality of peer review.
- There are several schools of thought about why, and therefore how, peer review should be improved, and there are some important tensions between them.
- Other interesting studies
1. The largest trial to date of not naming authors to peer reviewers (“double-blind review”) was published. The authors concluded that there was a substantial reduction in prestige bias.
This is a very important trial – it dwarfs all the previous randomized trials on this question. It was run at an ecology journal, Functional Ecology, and that makes it the first large randomized trial in a non-medical journal. It definitely shifts the needle on what we know, because it’s the first time a large trial has shown a substantial effect of not naming authors.
I have somewhat less confidence than the authors in the strength of the evidence, and am unsure how big the effect was. Plus I’m doubtful about how much the result translates to the chances of being published at that journal, and how generalizable the results are to other journals. I wrote a detailed post about why – and how the results compare to trials from medical journals.
The context varies a lot from journal to journal – both in how much bias comes from editors, and how feasible it is to conceal the identity of authors from peer reviewers. That means we need quite a few good quality trials at a range of journals to get clarity on this question. We don’t have that. With only 3 trials in the last decade, and only 3 large trials at all, we’re not likely to have strong enough evidence on this critical question any time soon.
Charles W. Fox and colleagues (2023). Double-blind peer review affects reviewer ratings and editor decisions at an ecology journal.
2. Language models can be trained to identify authors in 70-90% of papers based on the text and references. In most cases, the first 512 words, starting from the abstract, are enough text.
[Machine learning study]
The authors of this study developed a dataset of over 2 million preprints from arXiv and tested a language model on subsets of it, large and small. In the largest, the model correctly attributed authors for 73% of papers. In smaller subsets, the success rate reached over 90%.
Self-citation was a big give-away, and 11% of citations were of the authors’ own work. But the machines can figure it out even without the self-citations.
The authors of this study found that the first 512 words were enough text to provide robust author attribution: “We believe that this is because the abstract and introduction often express the authors’ creative identity together with the research field. These personalized characteristics enable identification of the authors.”
Leonard Bauersfeld and colleagues (2023). Cracking double-blind review: Authorship attribution with deep learning.
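To make the attribution idea concrete, here’s a deliberately crude sketch: a standard-library-only bag-of-words matcher over the first 512 words, standing in for the deep learning model the paper actually used. The author names and text snippets are invented for illustration.

```python
# Illustrative sketch only: a crude bag-of-words matcher over the first
# 512 words, standing in for the deep-learning model in Bauersfeld et al.
# Author names and texts below are invented for illustration.
from collections import Counter
import math

def first_n_words(text, n=512):
    """Truncate a manuscript to its first n words (abstract onwards)."""
    return text.split()[:n]

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    num = sum(a[w] * b[w] for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def attribute(manuscript, known_papers):
    """Guess which known author's prior text the manuscript most resembles."""
    vec = Counter(first_n_words(manuscript.lower()))
    return max(known_papers, key=lambda a: cosine(vec, Counter(first_n_words(known_papers[a].lower()))))

known = {
    "ecologist_a": "population dynamics of beetles in fragmented forest habitats",
    "ml_author_b": "deep neural networks for large-scale image classification benchmarks",
}
print(attribute("we study beetle population dynamics in forest habitats", known))  # ecologist_a
```

Even this toy version hints at why the opening text is so identifying: vocabulary and research field alone narrow the field dramatically, before references or self-citations are considered.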
3. Politeness in peer reviews could theoretically be measured computationally – if so, automated politeness checks might be feasible.
[Training dataset for machine learning]
“Language,” Bharti and colleagues write, “relates to power and status.” Considering politeness, too, could be an important part of understanding power dynamics in peer review, as it is for other workplace topics. To advance this, they developed an annotated dataset that can be used as a training set in language models. It’s called PolitePEER.
The sources for the sentences they curated came from a diverse range of sciences, and included information processing sources, Publons, and ShitMyReviewersSay. The training set of data included 2,500 sentences for which the tone had been annotated by 1 of 4 annotators:
- Highly impolite
- Impolite
- Neutral
- Polite
- Highly polite
This is a very early stage of research – too soon to know how useful a path this might be. Most of the dataset was classified as neutral – so much so that the authors ended up re-using multiple copies of the same sentences to shore up other categories. Although they included a range of sources, the pool was still very limited. And the dataset was developed and evaluated in the same data source.
The authors also pointed out that some of the current models you can use this data with weren’t particularly good at detecting impolite sentences when a polite word like please or sorry was included in a derogatory remark – or when there was insinuation or sarcasm in long sentences and passages.
A bonus for other researchers in this paper is their description of other available major datasets for studying peer review.
Prabhat Kumar Bharti and colleagues (2023). PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews.
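As a toy illustration of why naive politeness checks fail in exactly the way the authors describe, here’s a hypothetical keyword-lexicon scorer (nothing like the PolitePEER models) that is fooled when a polite word appears inside a derogatory remark:

```python
# Hypothetical keyword-lexicon politeness scorer - NOT the PolitePEER
# models, just a toy showing the failure mode the authors describe.
POLITE = {"please", "thank", "thanks", "appreciate", "kindly"}
IMPOLITE = {"lazy", "sloppy", "nonsense", "incompetent", "garbage"}

def politeness_score(sentence):
    """+1 per distinct polite keyword, -1 per distinct impolite keyword."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return len(words & POLITE) - len(words & IMPOLITE)

# A polite marker inside a derogatory remark partly cancels it out:
print(politeness_score("Please stop wasting our time with this sloppy nonsense."))  # -1
```

The sentence is plainly rude, but “please” drags the score toward neutral – the same trap the authors found in models trained on their data, and one reason sarcasm and insinuation remain hard.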
4. There are several soundly-based free online training courses for peer reviewers. There isn’t enough evidence, though, on whether training could substantially improve the quality of peer review.
[2 systematic reviews]
The first review on this topic was by Willis and colleagues. They searched for online training materials for scholarly peer review, and evaluated the quality of the training materials. The authors rated only 4 of the 20 as high quality. All 4 are free online:
- Natural sciences: a course from Nature.
- General: a course from the publisher Elsevier.
- Biomedicine: a course from the Cochrane Collaboration’s Eyes and Vision Group.
- Clinical trials: COBPeer, a CONSORT-based course.
The second review by Hesselberg and colleagues covered peer reviewer training for journals and for grants. They found 8 randomized trials for journal peer review. However, only 3 of the 8 interventions tested were what most of us would call formal training. None of them tested the courses listed above.
There were at most small improvements in peer review quality in the trials of those 3 interventions. That’s not enough evidence to be sure about the potential of training or mentoring – either for review quality or other outcomes.
Jessie Willis and colleagues (2023). Limited online training opportunities exist for scholarly peer reviewers.
Jan-Ole Hesselberg and colleagues (2023). Reviewer training for improving grant and journal peer review.
5. There are several schools of thought about why, and therefore how, peer review should be improved, and there are some important tensions between them.
Based on their previous work, Waltman and colleagues suggest there are 4 schools of thought among those seeking to improve peer review:
- The Quality & Reproducibility school;
- The Democracy & Transparency school;
- The Equity & Inclusion school; and
- The Efficiency & Incentives school.
These differing perspectives, while generally complementary, can lead to different, and sometimes conflicting, priorities in peer review reform.
For example, the Quality & Reproducibility school is focused on evaluating and improving research quality and reproducibility, leading to interest in interventions like statistical peer review and peer reviewer training.
The Democracy & Transparency school is focused on making evaluation of research more democratic and transparent, leading to interest in interventions like open peer review and soundness-only peer review.
The Equity & Inclusion school is focused on making the evaluation of research more equitable and inclusive, leading to an interest in interventions like increasing reviewer diversity and anonymity in peer review.
The Efficiency & Incentives school is focused on improving efficiency and incentives for peer reviewers, leading to an interest in interventions like portable peer review and reviewer recognition.
There’s a lot of complementarity, say the authors, but there are tensions, too. An obvious one is the tension over openness versus anonymity in peer review, with concerns about impact on fairness and review quality. Other tensions could arise, for example, between the goals of diversity in peer review and improved efficiency.
The authors make the case that, “To improve peer review, the ideas and ambitions of all four schools need serious consideration.” The peer review system, they argue, is interconnected with other developments in the research system, and a wide variety of stakeholders. Action to improve it “should be based on a rigorous evidence-informed understanding of the peer review system,” they say. And that means more study and experimentation with “new forms of peer review.”
Ludo Waltman and colleagues (2023). How to improve scientific peer review: Four schools of thought.
Other interesting studies
- About 11% of over 45,000 open peer reviews were signed in a study of reviews from the MDPI journals. The authors’ conclusion: “Signed reviews tend to be 15% longer (perhaps to be more careful or polite) but gave similar decisions to anonymous reviews.”
- In last year’s post, I included several abstracts of studies that were presented at the Peer Review Congress. Several have now been published – notably, a pair of randomized trials that found reminding peer reviewers about items from reporting guidelines didn’t improve completeness of reporting (Speich and colleagues, 2023).
- More on the uneven playing field of peer review: A study of over 200,000 submissions to 60 single-anonymous journals published by Institute of Physics Publishing found that peer reviewers were more likely to give a positive review when the authors were from the same country as they were. And authors from wealthier countries were more likely to have peer reviewers from their own country assigned to their manuscripts.
- There’s a report from EQUAP2, a project evaluating the quality assurance process in scholarly publishing. This one is a survey of more than 3,200 scientists from German and Swiss universities and research institutions.
- Based on a survey of 354 participants (44% from Canada), the global cost of peer review was estimated at US$6 billion in 2020. An analysis I picked in my “5 things we learned” for 2021 concluded it was more than a couple of billion dollars in 2020 (Aczel and colleagues, 2021).
You can keep up with my work via my free newsletter, Living With Evidence.
This is the 7th post of a series on peer review research – that started with a couple of catch-ups on peer review research milestones from 1945 to 2018:
Disclosures: I’ve had a variety of editorial roles at multiple journals across the years, including having been a member of the ethics committee of the BMJ, and being on the editorial board of PLOS Medicine for a time, and PLOS ONE‘s human ethics advisory group. I wrote a chapter of the second edition of the BMJ‘s book, Peer Review in Health Sciences. I have done research on post-publication peer review, subsequent to a previous role I had, as Editor-in-Chief of PubMed Commons (a discontinued post-publication commenting system for PubMed). I have been advising on some controversial issues for The Cochrane Library, a systematic review journal which I helped establish, and for which I was an editor for several years.