The High Risk Methods of a New Systematic Review of HPV Vaccines

February 28, 2020 Hilda Bastian Evidence Health Prevention

Back in 2018, the authors of a then-unpublished systematic review fired a spin- and error-laced salvo at the Cochrane review of the HPV vaccines. The authors were critical of the core of the Cochrane review’s methodology – reviewing only published articles about trials and leaving out unpublished clinical study reports (CSRs) in the first version. They did the opposite in their own review: they left out the published articles (except for a couple of follow-up studies). That was quite an experimental approach, and a risky one at that. Now that review has been published (Jørgensen 2020).

Does the new review challenge the Cochrane review’s conclusion that the vaccine reduced the risk of precursors of cervical cancer? No.

The authors had criticized the Cochrane review for missing trials because they only searched for ones with published articles. Does the new review include all the trials? No, because CSRs were only available for about half of them.

One of the authors, Peter Gøtzsche, has been sounding a safety alarm based on this review, for example here and here and here, and on German TV:

We did find serious harms with the HPV vaccines, so we were able to show that there are more serious neurological harms with HPV vaccines than in the control groups.

Does the review back up these strong claims? The answer to that is no, too.

After all that furore, I think we remain essentially where the Cochrane review concluded we were in the first place.

The new review is by Lars Jørgensen, Peter Gøtzsche, and Tom Jefferson. I saw it before it was published because I was invited to write the commentary published alongside it and its accompanying methods paper. I’ll raise the main points I addressed in that commentary here, as well as other points I couldn’t fit in the space. Let’s start with an overview of the 2 reviews.

Cochrane review vs new Jørgensen review

Characteristic	Arbyn (Cochrane)	Jørgensen
Type of trials	Phase II & III	Phase II, III, & IV
Gender	Female only	Female & male
Search cut-off date	June 2017	June 2017
Number of trials	26	22
Number of participants	73,428 female	79,102 female; 16,568 male
Estimated number of eligible or potentially eligible participants missing	ca 5,300 [1]	25,000+ [2]

According to Cochrane’s rapid audit of its review after Jørgensen et al’s 2018 critique, by relying only on published articles, the Cochrane review was missing 5 eligible trials, with 5,267 female participants. (That’s my source for [1] in the table above.) The data for those women, they calculated, did not have a substantial impact on the conclusions. The review is in the process of being updated, including, presumably, with CSR data.

Meanwhile, there are 22 trials in the Jørgensen review, but they weren’t able to get CSRs for another 23 trials potentially eligible for their broader scope. They do not know how many participants there were in 2 of those trials, but there were more than 25,000 participants in the other 21. (That’s my source for [2] in the table above.) The authors acknowledge this high proportion of missing data means their more marginal findings are vulnerable to being overturned if all the trials could be added.
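To put the two missing-data figures on a common scale, here is a minimal arithmetic sketch using only the participant counts quoted in this post. The "share missing" framing and the function name are illustrative assumptions, not calculations from either review, and the 2 reviews have different scopes, so the comparison is rough.

```python
# Participant counts are the ones quoted in this post; the "share missing"
# measure itself is an illustrative assumption, not taken from either review.
def share_missing(included: int, missing: int) -> float:
    """Fraction of known eligible (or potentially eligible) participants not in a review."""
    return missing / (included + missing)

# Cochrane: 73,428 included, 5,267 missing according to Cochrane's audit.
cochrane = share_missing(included=73_428, missing=5_267)

# Jørgensen: 79,102 female + 16,568 male included; "more than 25,000" missing,
# so using 25,000 gives a lower bound on the share.
jorgensen = share_missing(included=79_102 + 16_568, missing=25_000)

print(f"Cochrane: {cochrane:.1%} missing")
print(f"Jørgensen: at least {jorgensen:.1%} missing")
```

On these figures, the share missing from the Jørgensen review is roughly 3 times the share missing from the Cochrane review, and that is before counting the 2 trials of unknown size.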

The upshot here is that while Jørgensen and colleagues got a lot of attention for their claim that the Cochrane review had a missing trial problem, missing trials appear to be a far bigger problem for their own review than Cochrane’s. (By the way, the editors of the journal that published the critique of the Cochrane review have yet to correct its serious errors, although they had said they would issue corrections/clarifications quickly. You can read the backstory here. [Update: I pinged the journal about this on Twitter after posting this.])

The authors of these reviews disagree about the reliability of the trials. But the major point of contention between these 2 reviews is the conclusion about serious neurological adverse events.

The Jørgensen review comes to conclusions about potential vaccine harms based on post hoc exploratory outcome analyses of adverse event data. The abstract says, “The HPV vaccines increased serious nervous system disorders”. Working out exactly what they did is enough to give you a bad headache – which, as it happens, could classify as a serious neurological disorder in this review. And if you lost sleep over it, too, you could count as 2 people with serious neurological disorders. If that sounds odd, well, it is. We have to go back to the beginning of the Jørgensen review to make sense of this.

Missing data and the authors’ plan B

The authors published a protocol for their review (great!), but it was very thin on detail (not so great). This was its section on missing data:

Dealing with missing data: We have a comprehensive strategy for dealing with data that are missing at the trial level (i.e., we plan to obtain clinical study reports of unpublished trials), and at the outcome level (clinical study reports generally include comprehensive data on all planned outcomes). The purpose of this review is to provide as complete a picture as possible of a trial programme, without reliance on the published literature.

In other words, their plan for handling missing data was to ensure there wasn’t any. But their method, CSRs only, made that aspiration unreachable: in the end, there was a lot of missing data. It’s not just missing data from all the trials for which they got no CSRs. None of the CSRs they got were complete and unredacted.

What’s more, there is at least one critical systematic bias in what they were able to gather. They got 65% of the CSRs for one company’s vaccine (17 out of 26 trials) – a vaccine that can protect from only 2 types of HPV and has been discontinued. Whereas they got only 33% of the CSRs for vaccines by another company (7 out of 21 trials), almost all of which can protect against more virus strains. That imbalance is reflected in the number of trial participants too: there were 66,000 in the trials they included for the discontinued brand, and only 31,000 for the others.

When they were faced with such patchy data, they developed plan B on the question of the risk of possible harms to individuals: post hoc exploratory outcome analyses.

You can’t place as much confidence in post hoc analyses as ones that were planned before the data came in. (See my explainer on why the risk of bias is higher.) So this adds another layer of unreliability on top of that caused by the missing data.

The amendments to their protocol were posted online. (That’s great.) They wrote that the CSRs didn’t have the basic details they had hoped for about adverse events. For example, they couldn’t tell how soon the events happened in relation to vaccination – a key bit of information to come to any conclusion about whether the event was related to vaccination. And identifiers for people were redacted, so they couldn’t tell how often adverse events were happening to the same person. And that brings us to the next critical problem with the analyses and conclusions:

How can you state the risk of being affected when you don’t know how many people were affected?

Well, you can’t. So the method they used was not only post hoc, it wasn’t a standard use of statistical tests either. They substituted the number of events in the mathematical formulas where the numbers of people are supposed to go – but still used the language of the risk of being harmed, and then calculated an NNH from that (number needed to harm – how many people had to be vaccinated for one person to be harmed). As I wrote in my commentary,

Both of these statistics unambiguously require knowing how many individuals were affected by harms as a proportion of all individuals – data that the authors did not have. You cannot know the risk of being harmed, if you do not know how many people were harmed.

The authors drew the line at continuing this method of calculation in cases where events were so common that the numerators eventually exceeded denominators. The results of meta-analysis in these circumstances, they wrote, would be “nonsensical”. But the respective size of the data points is not what compromises these analyses. The problem is doing calculations with data points other than those the formulas require.
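A toy example may make the events-versus-people problem concrete. This is a minimal sketch with invented numbers, not data from the review: each arm is a hypothetical list of how many adverse events each participant had, and the function names are made up for illustration.

```python
# Toy data, invented for illustration only: adverse events recorded per participant.
vaccine_arm = [0] * 994 + [1] * 3 + [3] * 3   # 1,000 people: 6 affected, 12 events
control_arm = [0] * 995 + [1] * 4 + [2] * 1   # 1,000 people: 5 affected, 6 events

def risk_by_people(events_per_person):
    """Risk as the formulas intend it: affected people / all people."""
    affected = sum(1 for n in events_per_person if n > 0)
    return affected / len(events_per_person)

def risk_by_events(events_per_person):
    """What you get if event counts are put where people counts belong."""
    return sum(events_per_person) / len(events_per_person)

def nnh(risk_treated, risk_control):
    """Number needed to harm: 1 / absolute risk difference."""
    return 1 / (risk_treated - risk_control)

nnh_people = nnh(risk_by_people(vaccine_arm), risk_by_people(control_arm))
nnh_events = nnh(risk_by_events(vaccine_arm), risk_by_events(control_arm))

print(f"NNH counting people: {nnh_people:.0f}")
print(f"NNH counting events: {nnh_events:.0f}")
```

With these made-up numbers, counting people gives an NNH of 1,000, while counting events gives an NNH of about 167, because a few people with several events each are treated as if they were several people. The size and direction of the distortion depend on the data, which is exactly what cannot be checked when identifiers are redacted.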

The effect of this was a risk that rates of adverse events were inflated. And that added a third layer of unreliability to the adverse events data. They argued it was justifiable for serious neurological disorders, because (a) the numbers were relatively small so they believed it was unlikely that individuals had more than a single event, (b) they believed the events are likely to be under-reported, and (c) they believed that because the control injections were usually not saline placebos, they could have been causing adverse events and that could cancel out any over-counting. All that is speculative, though, and there is no way of knowing from the data.

There was, in the end, no increased risk of serious adverse events overall, or of any one type of serious adverse event. But even though they only had exploratory analyses, with all the layers of uncertainty around them that we discussed, they concluded vaccines caused serious neurological harms. This group was a collection of events, largely headaches. As I explained in my commentary,

…they did not know how many separate individuals experienced them. So if a person had a headache bad enough to interfere with their normal activity as well as dizziness that affected them as badly, or they had disturbed sleep (or all three), then that one person would be counted as two (or three) people with serious neurological harms.

Then there’s the multiple testing …

A fourth layer of unreliability came from what’s called multiple testing, which the authors acknowledged adds uncertainty to their findings. They ran statistical significance testing for a large number of events, and when you do that in a large dataset, your chances of coming up with false positive hits shoot up. (My explainer here.) It’s the source of a lot of the fluke results that never turn out to mean anything in healthcare research.
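A rough way to see why false positives shoot up: if every test is run at the conventional 0.05 threshold and there is truly no effect anywhere, the chance of at least one "significant" result grows quickly with the number of tests. The sketch below assumes the tests are independent, which is a simplification, and the numbers of tests are arbitrary examples rather than counts from the review.

```python
def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n independent tests
    when no real effect exists (assumes independence, a simplification)."""
    return 1 - (1 - alpha) ** n_tests

# Arbitrary example numbers of tests, not counts from the review.
for n in (1, 10, 50, 100):
    print(f"{n:>3} tests: {chance_of_any_false_positive(n):.0%} chance of a fluke hit")
```

Under these assumptions, 10 tests already carry about a 40% chance of at least one fluke, and 100 tests make one close to certain.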

What about the specific “harms judged as definitely associated” with some rare neurological conditions?

Their conclusions about “harms definitely associated with” 2 rare neurological conditions (POTS and CRPS) have another layer of unreliability. POTS (postural orthostatic tachycardia syndrome) and CRPS (complex regional pain syndrome) each involve a collection of symptoms which, on their own, are almost never caused by one of these rare conditions. The review’s findings aren’t based on diagnoses of these conditions. The authors based their decision on one of those post hoc (“plan B”) methods. Here’s how I explain it in the commentary:

They collected every unique term used for any recorded adverse event, and put them into an Excel sheet. They asked a single clinician to code those she considered definitely associated with POTS or CRPS. The result, as the authors point out, included conditions “that do not align well with the diagnostic criteria of POTS or CRPS”, like constipation. Coded “definitely associated” was a very long list of symptoms including many kinds of common pain, conditions including food poisoning, and having tests including chest x-rays, blood tests, and ultrasounds. There did not have to be a cluster of them. These events are exceedingly more likely not to be associated with POTS or CRPS than they are to be a signal of a rare neurological condition.

With all those provisos about uncertainty in mind, what was the “risk” they calculated for experiencing the post hoc category of serious neurological disorders during the trials? It was well under 1 out of 1,000 (0.6 per 1,000). That has to be weighed against lowering the risk of cervical cancer, which, according to National Cancer Institute estimates, is the cause of death for 2 out of every 1,000 women in the US. (In Denmark, the review’s authors’ country, cervical cancer was the cause of death for more than 3 out of every 1,000 women who died in 2017.)

And then there is the anxiety and physical harm to women having tests and procedures after abnormal pap smears: about 2 million women a year have abnormal pap smear results in the US. Cervical cancer, and efforts to find and treat it early, lead to a colossal burden of anxiety, pain, suffering, and loss.

How big might the vaccine’s benefit be in cancer prevention? When cancer does develop after HPV infection, it usually takes a long time, so it was always going to take years to know how HPV vaccination affected cancer. Since June 2017, the cut-off date for data for both these reviews, we’ve been getting signs that the risk of cervical cancer is reducing in high-vaccination communities. With vaccination happening at younger ages than it was on average in the trials, the benefit could be greater in high-vaccination communities than it was in the trials. What about the trials? Women in the control groups were offered vaccination after the trials ended, which complicates long-term study of the women in the trials. However, in December 2017 a preliminary report of a drop in cancer for women in a trial follow-up study was published.

Fingers crossed that the benefit is higher than in the trials. And fingers crossed, too, that this new systematic review with such shaky foundations doesn’t set off the kind of safety scare that has seen HPV vaccination rates drop dramatically in some countries.

How could a causal claim as serious as this, based on such problematic analyses, reach publication in an important journal? For one thing, I think many people running biomedical journals and peer reviewing for them just aren’t attuned to their heightened responsibilities in an era where this kind of information is weaponized. (I’ve written about that here.) This review could be a poster child for another likely contributor though.

When statisticians are indispensable for meta-analyses

Statistical knowledge can be high in non-statistician systematic reviewers/meta-analysts. But so can intellectual overreach. Statistical peer review is one of very few things that have been shown to improve the quality of biomedical publications generally.

None of the 3 authors of the Jørgensen review are statisticians. The lead author was a PhD student, and the second author was his supervisor. That meant the second author chose the examiners for the PhD. None of the people he chose was a statistician either. (They are listed here and here.) There are 2 people the authors acknowledge for help, but neither of them appears to be a statistician.

When the manuscript arrived at the journal, the handling editor was not a statistician, and it wasn’t sent to a statistician for peer review. I was invited to write the commentary after it was already accepted – and I’m not a statistician either (although I did consult one when I was writing my commentary, acknowledged below).

There aren’t enough statisticians specializing in meta-analyses to co-author and review every single systematic review, although some journals for sure do it. There aren’t enough statisticians for peer review in general. However, it seems to me that there are at least 2 cases when statisticians are needed as co-authors of meta-analyses, and at the very least, as reviewers/statistical consultants for them at journals: when meta-analysts are using unconventional methods, and when they make an unusual claim of public health significance. This review did both.

What about the implications for the CSRs vs published articles debate?

Jørgensen & co made extraordinary efforts to get CSRs – and to their great credit, they managed to accumulate nearly 60,000 pages’ worth. However, they did not receive even one complete and unredacted CSR. Not one. And in a methods paper accompanying the new systematic review, they acknowledge that doing systematic reviews without journal articles risks missing data. But what about the risks of relying only on papers, as the Cochrane review did?

Unfortunately their methods paper doesn’t help as much as it could have, because they only compared data from CSRs with a single paper or a single trial registry entry per trial. And of course, good systematic reviews based on journal publications don’t select just a single source – they aim to include all publications on each trial. Follow-up papers don’t always have additional data or information, but sometimes even trial authors’ replies to letters to the editor do.

Jørgensen and his colleagues concluded that the CSR data made no important difference to meta-analyses based on a paper alone. I found 4 other studies that compared CSR analyses with those from journal publications only, and the results were mixed (cited in my commentary). From what we know, you can’t assume that a meta-analysis in a systematic review based on journal publications alone is unreliable. The value of CSRs appears to lie in data that isn’t included in journal publications at all, and that can make them very important.

On the other hand, these authors spent several years trying to get these CSRs together. It shouldn’t be that hard, of course. Hopefully it will get much, much easier, but I’m not sure the case has been made for exploding the workload and timeline of every systematic review pursuing hard-to-get CSRs. CSRs will sometimes be critical to a question, but not always.

One of the authors, Gøtzsche, has claimed that their systematic review is “much more reliable than the Cochrane review as we based it on clinical study reports and not on publications”. Yet, the conclusion in the jointly-authored paper is that a systematic review still needs publications – and this one certainly needed them. The source material is not the only factor that makes a review reliable or unreliable, in any event: sound methods and drawing reasonable inferences are necessary to reliability too.

At least in this particular instance, the claim of superiority based on using CSR data is time-limited. The authors of the 2018 Cochrane review had written that they would be seeking out CSRs in the next phase of their review. Those authors now have the benefit of having had a competing review team and a Cochrane audit that has pointed out weaknesses and identified missing trials and data for them. Here’s hoping they take full advantage of it.

~~~~

My thanks to Thomas Lumley (Professor of Biostatistics at the University of Auckland) and Jenny Doust (Clinical Professor at the University of Queensland and Bond University), with whom I consulted on issues I raised in my commentary. And my thanks, too, to the peer reviewer of my commentary. (The views and any errors in the commentary or this post are mine.)

(Since the Cochrane review, 2 systematic reviews other than Jørgensen & co’s were published too – the scope of all 4 reviews is summarized below this post.)

This is the 7th (!!) in a series of posts at this blog about the unfolding events and related issues that began with the publication of a critique of the Cochrane review of clinical trials of the HPV vaccine to prevent cervical cancer in 2018. The first critiqued that critique. The second looked at what we know about whether the vaccines are working in the community as would be expected from the trials. The third goes into a concurrent crisis that unfolded at the Cochrane Collaboration: that post included discussing responses to the critique. The fourth post discussed extremism and anti-industry bias. The fifth discussed journals’ responsibilities in vaccine debates. And the 6th addressed journal editors’ call for feedback on the need to correct the original critique they published of the Cochrane review.

My timeline and analysis of the conflict between Peter Gøtzsche and the Cochrane Collaboration, and why I don’t think it’s about the HPV vaccines review, is at my personal website.

Disclosures: I was invited to write the commentary accompanying the publication of the Jørgensen et al systematic review on the HPV vaccines. I led the development of a fact sheet and evaluation of evidence on HPV vaccine for consumers in 2009 for Germany’s national evidence agency, the Institute for Quality and Efficiency in Healthcare (IQWiG), where I was the head of the health information department. We based our advice on this 2007 systematic review including 6 trials with 40,323 women, and an assessment of those trials. The findings were similar to those of the 2018 Cochrane review. I have no financial or other professional conflicts of interest in relation to the HPV vaccine. My personal interest in understanding the evidence about the HPV vaccine is as a grandmother (of a boy and a girl).

I am one of the members of the founding group of the Cochrane Collaboration and was the coordinating editor of a Cochrane review group for 7 years, and coordinator of its Consumer Network for many years. I have often butted heads with the Cochrane Collaboration (most recently as a co-signatory to this letter in the BMJ). I am no longer a member, although I occasionally contribute peer review on methods. This year, after having been very critical of the Cochrane review on exercise therapy and ME/CFS (here and here), I was appointed by Cochrane to lead its independent advisory group into the update of that Cochrane review. I have also both collaborated, and butted heads on the subject of bias and evidence, with 2 of the 3 authors of the Copenhagen critique and Jørgensen review (Tom Jefferson and Peter Gøtzsche), and have defended the Cochrane Collaboration’s decision to expel Gøtzsche from its membership in 2018.

I was invited to speak at Evidence Live, and my participation was supported by the organizers, a partnership between the BMJ and the Centre for Evidence-Based Medicine (CEBM) at the University of Oxford’s Nuffield Department of Primary Care Health Sciences – the director of the CEBM is the editor of BMJ EBM. I blog from time to time at BMJ Blogs. Between 2011 and 2018, I worked on PubMed projects at the National Center for Biotechnology Information (NCBI), which is part of the US National Institutes of Health. I recently submitted my doctoral dissertation on shifting evidence and how that affects the validity of systematic reviews.

Summary of 4 recent systematic reviews

The Cochrane review is by Marc Arbyn and colleagues, and was published in May 2018. It focuses solely on vaccinating girls and women, preventing cervical cancer and the lesions which can turn into cervical cancer, and adverse events. Their search for studies had a cut-off date in June 2017, and they only included phase II and III randomized trials. There are 26 trials included in the review, with 73,428 female participants, based only on published articles. It is in the process of being updated: the current review says an update will include adding data from other sources, like clinical study reports (CSRs).

The next systematic review was quite different. Published in June 2019, by Mélanie Drolet and colleagues, the review includes 65 population-level studies from 14 countries and around 60 million people (female and male), looking not at trials, but at comparisons of HPV infection, cervical cancer-related outcomes, and/or warts before and after vaccination was introduced. This study had a cut-off date in October 2018.

The third systematic review is another review of trials. It was by Claire Rees and colleagues, and was published in January 2020. Like the Cochrane review, they studied phase II and III randomized trials, but only for 2 vaccines (brand names Gardasil and Cervarix), with a cut-off date for searches in July 2018. Unlike both the Cochrane and Jørgensen reviews, this one doesn’t pass basic quality standards for systematic reviews (for example, it does not include an analysis of the risk of bias or methodological quality of the trials). They broke down the trials a little differently, including just under 68,000 women. They did not look at adverse events, and only considered CIN3+ as a cervical cancer precursor, even though CIN2+ is the level where treatment is generally recommended, and is typically used for trials and systematic reviews. (Explainer here.)

The CSR-based Jørgensen systematic review was published in February 2020. It covers phase II to IV randomized trials in females and males, with any HPV-related outcomes, as well as adverse events. Their cut-off date for data was in June 2017 – the same as the Cochrane one. There are 22 trials in this review, with 79,102 females and 16,568 males. They didn’t get CSRs, though, for another 23 trials that were potentially eligible. They do not know how many participants there were in 2 of those trials, but there were more than 25,000 participants in the other 21. The authors acknowledge the high proportion of missing data means their more marginal findings are vulnerable to being overturned if all the trials are included.

The cartoons are my own (CC BY-NC-ND license). (More cartoons at Statistically Funny and on Tumblr.)