5 Things We Learned About Journal Peer Review in 2025

Back in 2019 I wrote a couple of posts summarizing what we had learned from research about peer review at journals. Since then, I’ve done an annual research roundup to keep up with the field. This is the ninth of those posts. (The posts in this series are tagged here.)
Peer review is still under-researched (Vendé 2025). The research is patchy, too, with experimental studies disproportionately done in medical journals. So we’re a long way from solid answers to a lot of pressing questions. Each year when I do these posts, though, I find a lot of food for thought. Here are my 5 topics from 2025’s research, with a summary for each, linking to more detail about the studies:
- The use of “AI” in the science publication system is causing massive problems, but some positive uses in peer review have been suggested.
- Offering an honorarium to peer reviewers might increase acceptance rates, but there still isn’t much evidence on this question.
- A pair of studies added to what we know about the critical influence of editors’ biases and the review process.
- Description of peer reviewers’ use of an equity/diversity/inclusion checkbox at medical journals suggests this practice is worth exploring further.
- Having a librarian or information specialist as a co-author of a systematic review seems to reduce the study’s risk of bias more than having one as a peer reviewer.
1. The use of “AI” in the science publication system is causing massive problems, but some positive uses in peer review have been suggested.
[A descriptive study and a comparison of LLM prompts]
A deluge of fully- or partially-AI-generated articles is straining the already-stretched seams at preprint servers and journals. That appears to be affecting computer science most dramatically so far, with a report of 20% AI-generated submissions at a conference that publishes proceedings. The arXiv no longer accepts computer science reviews, and has changed its endorsement policy to try to stem “the flood of low-quality, non-scientific submissions.”
This technology is adding hallucinated references to the problems that can plague manuscripts, and it is super-charging paper mills and the generation of fake authors and peer reviewers. It is also helping paper mills bypass plagiarism detectors and generate AI images (Albakina 2025).
Meanwhile, using this technology to help produce manuscripts is dramatically boosting the number of them researchers can produce (Kusumegi 2025—accessible discussion here). As long as using this technology remains affordable, hyperproduction is likely to escalate. As Albakina and colleagues point out, the system can’t go on like this: The incentives in academia have to be changed.
If the advent of this technology does precipitate that change in culture, that would be a great outcome. There are other potential positive contributions it could make. At the 2025 International Peer Review Congress, Isaac Kohane argued that machines may soon be able to produce higher quality peer reviews than a majority of current reviewers. Given how inadequate so much peer review is, that’s not saying much. He also predicts “Reviews for reproducibility and accuracy are highly likely to become ubiquitous comprehensive processes rather than the artisanal passion project of a few. Gamesmanship of citation networks as currently practiced will become easily unraveled.”
A couple of studies stood out for me, showing uses of LLMs with the potential to support the improvement of peer review.
The first, by Lan Jiang and colleagues, used the technology to compare human annotations of adherence to reporting guidelines for 100 pairs of randomized trial protocol and results reports (200 articles) with annotations generated by GPT-4o prompts. The reporting guidelines are SPIRIT for RCT protocols, and CONSORT for RCT results.
The researchers used F1 score to compare the results: That’s the harmonic mean of precision and recall, which ranges from 0 to 1, with a higher score being better. They compared the LLM’s results with a baseline of the most common response for each question in the training set. The LLM scored over 0.8 for each measure reported. This could, they argue, “support peer review workflows,” though they describe the work as at an early stage of development. If these checks aren’t being done, or checks are only cursory, this could improve the status quo. But if it replaced high quality human assessment, it might be a downgrade. At this early stage, though, it sounds more like an add-on activity.
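For readers who want the metric made concrete, here’s a minimal sketch in Python of how an F1 score is calculated; the function and the one-item example data are my own illustration, not the authors’ code.

```python
# Minimal sketch of the F1 metric (not the study's code).
# Precision = true positives / predicted positives; recall = true positives / actual positives;
# F1 is their harmonic mean, ranging from 0 (worst) to 1 (best).

def f1_score(y_true, y_pred, positive="Yes"):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    pred_pos = sum(1 for p in y_pred if p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    if pred_pos == 0 or actual_pos == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / actual_pos
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: human annotations vs LLM answers for one reporting-guideline item
human = ["Yes", "No", "Yes", "Yes", "No"]
llm   = ["Yes", "No", "No", "Yes", "No"]
print(f1_score(human, llm))  # 0.8
```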
The second study aimed to see if an LLM could help identify a potential pool of peer reviewers with more gender and geographic diversity. Teixeira fed the titles and abstracts of 50 articles randomly selected from 5 high impact medical journals into GPT-4o, then randomly used one of 2 prompts to get a list of 20 experts in the field. One of the prompts included a specification for “a balanced representation of gender and geographical diversity, including scientists from low- and middle-income countries.” Tests for LLM consistency were run. The author then assigned gender and country of birth to the scientists listed where possible, along with their current institutional affiliation, their publications in PubMed, and their h-index.
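To make the two prompt conditions concrete, here’s a rough Python sketch of how such a comparison could be set up with the OpenAI SDK. Apart from the quoted diversity clause, the prompt wording, names, and settings are my assumptions, not the author’s protocol.

```python
# Rough sketch of the kind of prompt pairing described in the study.
# Exact wording (other than the quoted diversity clause) and API settings are assumptions.
from openai import OpenAI

client = OpenAI()

BASE_PROMPT = (
    "Based on the title and abstract below, suggest 20 scientists who would be "
    "suitable expert peer reviewers for this manuscript.\n\nTitle: {title}\n\nAbstract: {abstract}"
)
DIVERSITY_ADDON = (
    " Ensure a balanced representation of gender and geographical diversity, "
    "including scientists from low- and middle-income countries."
)

def suggest_reviewers(title: str, abstract: str, diversity: bool) -> str:
    # Build the prompt, appending the diversity specification for one of the two arms.
    prompt = BASE_PROMPT.format(title=title, abstract=abstract)
    if diversity:
        prompt += DIVERSITY_ADDON
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```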
Without the diversity prompt, the scientists on the lists were 68% male, and 95% were affiliated with institutions in high-income countries. With the prompt, male scientists decreased to 49% and high-income-country affiliated decreased to 42%. There was no difference in the measures of publishing record.
Lan Jiang and colleagues (2025). Leveraging Large Language Models for assessing the adherence of randomized controlled trial publications to reporting guidelines.
André L. Teixeira (2025). AI in peer review: can artificial intelligence be an ally in reducing gender and geographical gaps in peer review? A randomized trial.
2. Offering an honorarium to peer reviewers might increase acceptance rates, but there still isn’t much evidence on this question.
[Non-randomized trial]
Cotton and colleagues ran a trial of honoraria as an incentive for peer reviewers at a single specialist medical journal, Critical Care Medicine. It’s not clear why they didn’t run a randomized trial. Instead, they alternated between offering and not offering payment to reviewers in fortnightly blocks of manuscripts. Although the handling editors weren’t informed of whether a manuscript was in the paid group or not, the investigators acknowledged in their protocol that an “editor could likely determine the current condition.” The editors published their trial in their own journal rather than an independent one, too, which is also unfortunate.
There were 715 reviewer invitations sent in this trial, with 414 including an offer of $250 (58%). The primary outcome was the proportion of invitations that resulted in a submitted peer review. That was higher in the paid group: 49.8% versus 42.2% in the unpaid group (but they don’t report confidence intervals). The rate of acceptance of invitations to peer review was 52.7% with an offer of an honorarium, versus 47.8% in the control group. The authors concluded there was no difference in quality of peer reviews, and those in the honorarium group came in on average a day faster.
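Since no confidence intervals are reported, here’s a back-of-the-envelope Wald interval for that difference in submission rates, sketched in Python from the reported group sizes and percentages; it’s my own rough calculation, not a result from the trial.

```python
# Back-of-the-envelope Wald interval for the difference in submitted-review rates,
# using the group sizes and percentages reported in the paper (my calculation, not the authors').
from math import sqrt

n_paid, p_paid = 414, 0.498   # invitations with a $250 offer; 49.8% returned a review
n_ctrl, p_ctrl = 301, 0.422   # invitations without an offer; 42.2% returned a review

diff = p_paid - p_ctrl
se = sqrt(p_paid * (1 - p_paid) / n_paid + p_ctrl * (1 - p_ctrl) / n_ctrl)
low, high = diff - 1.96 * se, diff + 1.96 * se
print(f"difference = {diff:.1%}, approx 95% CI ({low:.1%}, {high:.1%})")
# With these inputs, the interval only just excludes zero.
```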
According to a recent review of peer review experiments, this is only the second trial of paying peer reviewers (Boudreau 2026). The first was reported by Chetty and colleagues in 2014, and it was a randomized trial at an economics journal. One of the incentives to meet the reviewing deadline tested in that trial was a $100 Amazon gift card, and it had an impact on how quickly reviews were submitted. The authors found that “tenured professors are less sensitive to deadlines and cash incentives than untenured referees.” Editors at the journal Biology Open have been evaluating a rapid editorial decision system with paid peer reviewers, but not as a randomized trial and without an unpaid comparison group in the rapid system (Gorelick 2025).
The heated debate about whether peer reviewers should be paid or not will continue without much evidence to inform it. More discussion of this here.
Christopher S. Cotton and colleagues (2025). Effect of monetary incentives on peer review acceptance and completion: A quasi-randomized interventional trial. [Protocol]
3. A pair of studies added to what we know about the critical influence of editors’ biases and the review process.
[A descriptive study and a before-and-after study]
Back in 2017 I wrote a post about the fractured logic of focusing on masking peer reviewers to reduce social biases in journal processes when most of the opportunities for rejecting or advancing manuscripts lie with editors, from desk rejections to choosing peer reviewers and final decisions. “The power here, on balance, lies with editors,” I wrote: “They might be the principal beneficiaries of hidden editorial processes, too.” Along with the frequent failure of attempts to anonymize authors and peer reviewers, I think editors’ biases and decisions explain why masked peer review hasn’t been shown to have a major impact on fairness at journals. It was a strong argument in theory: who sits in editors’ chairs is pivotal, along with interventions directed at them. But there was very little data to go on.
We got some more peeks into this black box from a couple of abstracts at the quadrennial International Peer Review Congress in 2025, and I look forward to seeing more details of this work. The first comes from an analysis of metadata for 110,303 evaluations of 5 years’ worth of submissions to Science and Science Advances. A de-identified dataset will be made available to other researchers for further analyses, too, which is exciting. I’m not sure how much data about the editors themselves will be included. The researchers concluded that the strong associations between publication at these journals and the prestige and geography of authors “are primarily attributable (via a mediation analysis) to the influence of the editor, even as the tenor of advice from outside experts correlates strongly with the final editorial decision.”
The second abstract is a before-and-after study of masking an editor-in-chief to authors’ identities for their first screening assessment of manuscript submissions at a single journal (the Journal of the American Academy of Child and Adolescent Psychiatry). The process was not otherwise changed: The EIC screens on a few criteria in round 1, with the options of rejecting, transferring the submission to another journal, or going on to round 2. Round 2 is a detailed reading of the full manuscript as submitted, together with the authors’ cover letter. Round 2 has the same options as round 1 screening, with the additional option of assigning the manuscript to an action editor (and presumably, then, peer review and the possibility of acceptance).
When the editor was masked to the authors’ identities, the rate of manuscripts ultimately sent to action editors didn’t change. Masking did, however, shift many more decisions on to round 2: only 27% of submissions went to round 2 when the authors’ identities were known, whereas 48% did when the authors weren’t named. Authors’ identity was clearly a decision heuristic.
Nicholas LaBerge and colleagues (2025). Manuscript characteristics associated with editorial review and peer review outcomes at Science and Science Advances.
Douglas K. Novins and colleagues (2025). Editor initial manuscript review: A masked pilot study.
4. Description of peer reviewers’ use of an equity/diversity/inclusion checkbox at medical journals suggests this practice is worth exploring further.
[Descriptive study—mixed method analysis]
A new checkbox was added to the peer review form for the 13 JAMA Network journals in March 2023. Peer reviewer instructions encourage reviewers to check the box when they see equity, diversity, and/or inclusion concerns in a manuscript, and to explain those concerns in a confidential comment to the editors. These are not the first medical journals to introduce this practice. Medical Education introduced it in 2022. The goal is to increase the chances that peer reviewers look for, and draw editors’ attention to, EDI concerns, and thereby reduce the publication of problematic articles and problematic aspects within articles that are published.
“Problematic” casts a wide net here, from the slightly wince-inducing to the truly egregious. The authors of the study I’m discussing here gave an example of a retracted study in a surgical journal from 2020 to illustrate the problem. It didn’t ring a bell, so I had a look. The retraction notice is short, and worth reading to appreciate what appalling judgment peer reviewers and editors can have, and how sorely interventions are needed that could counter prejudices and biases in authors/researchers/editors. (There was an outcry about this study on Twitter, #MedBikini—here’s a post about it at Retraction Watch.)
Michael O. Mensah and colleagues published an evaluation of JAMA’s EDI checkbox practice—in a JAMA journal, unfortunately. I say that because there is an inherent conflict of interest in editors editing evaluations of their own practices, and there is no shortage of journals with expertise in research on editorial practice. This is, as far as I know, the second evaluation of this practice, along with Medical Education’s—which was also published by the journal in-house (Hauer 2024).
Mensah and colleagues found that the checkbox was not commonly used, although at more than 5% of over 39,000 manuscripts, that still amounted to 2,075 uses. Less than half of those included confidential comments related to EDI. Hauer and colleagues, in contrast, reported the checkbox was used for two-thirds of manuscripts at their journal.
Neither the Mensah nor the Hauer evaluation addresses the key questions of impact on editors and publications, and neither had a comparison group. Neither describes the diversity of the peer reviewers (with the exception of geographical region, reported in Hauer’s study). Yet the salience of this intervention is likely to depend heavily on the diversity of expertise and life experience among peer reviewers (and editors). Considering the studies of this intervention has underscored for me the critical importance of equity, diversity, and inclusion in editorial and peer review roles. And reading the retraction notice of that “MedBikini” fiasco, created in 2020 by senior surgeons in the US, underscores how disgraceful it is that universities and institutions there have been dismantling DEI-related policies and studies.
Michael O. Mensah and colleagues (2025). Equity, diversity, and inclusion concerns from JAMA Network peer reviewers.
5. Having a librarian or information specialist as a co-author of a systematic review seems to reduce the study’s risk of bias more than having one as a peer reviewer.
[Randomized trial]
A critical aspect of higher quality and lower risk of bias in systematic reviews is the adequacy of the search for studies, and how well it is reported. Rethlefsen and colleagues ran a trial at 3 of the BMJ’s medical journals to test whether adding a librarian or information specialist to the peer reviewers for systematic reviews could improve those reviews. They recruited peer reviewers via the Librarian Peer Reviewer Database. That sometimes required so many invitations that it made me wonder whether it was a limiting factor in this trial. Peer reviewers don’t generally spend an enormous amount of time on the task, so specialist subject interest and expertise may be key.
The primary outcomes for the trial were the quality of reporting and risk of bias in the versions of the systematic review manuscripts after the first revision, which is the first opportunity to see an impact of peer reviewers’ input. A secondary outcome was the rate of rejection of manuscripts after the first round of peer review. In addition, the authors analyzed whether or not librarian/information specialist (LIS) involvement in the review—named as co-authors or only acknowledged—had an impact on the primary outcomes.
There were 400 manuscripts in this study, 168 of which were rejected. The manuscripts had been submitted in 2023, and 166 had been revised and resubmitted by the time the study was closed for analysis, which was enough to be able to detect a 15% improvement in the primary outcomes.
Adding an LIS peer reviewer did not result in the 15% improvement set for the primary outcomes: The difference was 4.4% in favor of the LIS peer review group (95% CI: −2.0%, 10.7%). There was a difference in the number of manuscripts rejected at first decision [98 vs 70 in the control group, a 13.8% difference (95% CI: 3.9%, 23.8%)]. Helping weed out systematic reviews with inadequate search strategies is an important contribution.
Perhaps a reason for the limited impact was the involvement of LIS experts in the systematic reviews themselves. The authors found that was a predictor of more adequate reporting of search strategies and a lower risk of bias in related parts of the reviews. For example, a low risk of bias in search terms and structures was more likely with an LIS named author (OR 4.0; 1.3-12.0).
Only 10% of the reviews had named LIS authors, and almost another quarter mentioned LIS involvement in their acknowledgements. I would have thought the rate would be higher in reviews submitted to these journals. Sigh. My main takeaway from this study was that there is a longer way to go than I had realized in getting enough expertise into systematic review author groups. Systematic reviews really need both LIS and statistical expertise to be reliable. These are both stretched resources, but in my experience, systematic reviewers over-estimating their own expertise in those areas accounts for a lot of this problem.
Melissa L. Rethlefsen and colleagues (2025). Improving peer review of systematic reviews and related review types by involving librarians and information specialists as methodological peer reviewers: a randomised controlled trial.
~~~~
You can keep up with my work via my free newsletter, Living With Evidence.

This is the 9th post of a series on peer review research, starting with a couple of catch-ups on peer review research milestones from 1945 to 2018:
All posts tagged “Peer Review”
Disclosures: I’ve had a variety of editorial roles at multiple journals across the years, including having been a member of the ethics committee of the BMJ, and being on the editorial board of PLOS Medicine for a time, and PLOS ONE’s human ethics advisory group. I wrote a chapter of the second edition of the BMJ’s book, Peer Review in Health Sciences. I have done research on post-publication peer review, subsequent to a previous role as Editor-in-Chief of PubMed Commons (a discontinued post-publication commenting system for PubMed). Up to early 2025, I had been advising on some controversial issues for The Cochrane Library, a systematic review journal which I helped establish, and for which I was an editor for several years. I peer reviewed several abstracts for the 2025 International Peer Review Congress. And I know well the authors of 2 of the studies I highlighted this year (the one on leveraging LLMs for adherence to RCT reporting guidelines, and the one on including librarians/info specialists in systematic reviews).
The cartoons are my own (CC BY-NC-ND license). (More cartoons at Statistically Funny.)