
PLOS BLOGS Absolutely Maybe

The Limits of Those Reviews of Masks For All and What We Do Know

French anti-tuberculosis poster, date unknown

Have you noticed that you see so many pictures of people wearing face masks during the 1918-20 influenza pandemic, but you don’t for people with tuberculosis (TB) in the same era? There is a story here about a controversy that I think makes the one over evidence in the Covid-19 #masks4all debate easier to understand.

It starts with a sanitary engineer, his protégé medical student, and a landmark study in Baltimore published in 1959. The medical student, Riley, called his mentor, Wells, “an eccentric genius”. Wells’ studies had convinced him that people got infected with TB via dried residue of TB bacteria suspended in the air. But he knew others wouldn’t believe it unless there was extraordinary evidence.

So they built an experimental TB ward, with airtight control. They built chambers for TB-free guinea pigs, venting air from the TB ward or uncontaminated air to them. The experiment went on for 2 years, and a research assistant, Cretyl Mills, carried it through. Riley reported:

I was astounded at the details [Mills] had recorded. She not only kept records of the monthly tuberculin tests, as expected, but also ancillary data that now became important. She knew where in the exposure chamber every infected guinea pig was housed and where in the lungs every tubercle was located… And, of course, she found no infections in the control chamber receiving disinfected air.

On the other hand, an average of 3 of the guinea pigs exposed to the ward’s air got infected every month. (Mills eventually got TB, too.)

In 1979, the CDC recommended people with TB wear surgical masks to prevent transmission. But it was controversial because there was no strong proof masks would make a difference. After multi-drug resistant tuberculosis became such a risk, discussions about masks and TB in hospitals looked a lot like the ones we’re having about masks and Covid-19 now. The balance of the debate shifted in 2012, when Ashwin Dharmadhikari and colleagues in South Africa reported on their experiment, inspired by the Baltimore one all those years before.

They built a 6-bed experimental ward for people with multi-drug resistant TB in eMalahleni, east of Pretoria in South Africa. They built 2 identical chambers for guinea pigs. (There are pictures of the wards and guinea pig chambers in Baltimore and eMalahleni here.)

Over 12 weeks, 1 guinea pig chamber had air vented from the ward when patients wore surgical masks from 7 am to 7 pm (except when eating, sleeping, or taking medication). The guinea pigs in the other chamber breathed ward air only on days patients weren’t wearing masks at all. The difference in TB transmission?

Sixty-nine of 90 control guinea pigs (76.6%; 95% confidence interval [CI], 68–85%) became infected, compared with 36 of 90 intervention guinea pigs (40%; 95% CI, 31–51%), representing a 56% (95% CI, 33–70.5%) decreased risk of TB transmission when patients used masks.
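
To make the arithmetic in that quote concrete, here is a minimal sketch that recomputes the two infection proportions from the counts given, along with a crude (unadjusted) relative risk reduction. The variable and function names are illustrative assumptions, not anything from the paper. Note that the crude reduction comes out near 48%, not the reported 56%. This post doesn't describe how the authors derived their estimate, so the gap presumably reflects a different analysis method, and that is an assumption here.

```python
# Counts taken from the quoted result; all names below are illustrative.
CONTROL_INFECTED, CONTROL_TOTAL = 69, 90            # ward air, patients unmasked
INTERVENTION_INFECTED, INTERVENTION_TOTAL = 36, 90  # ward air, patients masked


def risk(infected: int, total: int) -> float:
    """Proportion of guinea pigs that became infected."""
    return infected / total


def crude_relative_risk_reduction(risk_intervention: float, risk_control: float) -> float:
    """1 minus the crude risk ratio (no adjustment, no time-to-event modeling)."""
    return 1 - risk_intervention / risk_control


control_risk = risk(CONTROL_INFECTED, CONTROL_TOTAL)
intervention_risk = risk(INTERVENTION_INFECTED, INTERVENTION_TOTAL)
crude_rrr = crude_relative_risk_reduction(intervention_risk, control_risk)

if __name__ == "__main__":
    print(f"Control risk:                  {control_risk:.1%}")
    print(f"Intervention risk:             {intervention_risk:.1%}")
    print(f"Crude relative risk reduction: {crude_rrr:.1%}")
```

The point of the sketch is only that the headline proportions follow directly from the counts. The size of the reduction depends on the analysis chosen.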

Dharmadhikari & co concluded:

[Surgical] face masks are unlikely to adequately protect those who wear them from acquiring TB infection because they almost always have significant leaks at the mask–skin interface…

When worn by patients with TB, on the other hand, we believe that simple surgical masks, rather than the much more expensive N95 respirators, are sufficient to reduce the extent to which patients emit infectious particles…Moreover, whereas a better face seal can reduce leakage around the filtration piece of a respirator during inspiration, it is unlikely to substantially resist the air pressure generated by cough…

[W]e believe that surgical masks are optimally used short term, for symptomatic patients in waiting rooms, during transport, and in other temporary situations.

You won’t find this eMalahleni study in most reviews of the evidence about face masks and respiratory illness. And several issues incorporated in this story help explain why different pictures and positions emerge in the outbreak of reviews of evidence on face masks and respiratory infection. Let’s unpack why.

Cartoon of dueling meta-analysts

I think we can actually leave aside the issue of different interpretations of the same evidence, because the reviews aren’t really on the same page. And given we know that with Covid-19, you don’t have to be symptomatic to be infectious, I’m not going to differentiate between patients wearing masks and mass use of masks for this bit.

Major recent reviews relevant to non-healthcare worker mask-wearing ended up with different results for 5 main reasons related to their design and conduct:

  1. Disease scope: they were limited to different groups of respiratory illnesses. There have been studies in a wide variety of them: mild to serious viral respiratory infections, and bacterial ones like TB and pertussis (whooping cough). Some reviews had a wider disease scope, but their searches didn’t cover every respiratory illness for which studies could have been eligible (like measles).
  2. “Setting” scope: they were limited in this too, looking for evidence on the effects of masks only in disease outbreaks, for example. There are also studies of masks in and/or around hospitalized people at risk of getting or transmitting infections, like people with TB, cystic fibrosis, or who have had stem cell transplants.
  3. The type of studies they reviewed: some were only considering randomized trials in humans. Others also considered non-randomized comparative studies, or other direct but non-comparative studies. Others considered indirect evidence too – like mechanistic studies: those simulating coughs with and without masks, or getting infected people to cough or talk in laboratory-controlled situations and measuring what escapes masks of different types. Back in 2016, a systematic review found 23 of them (Smith 2016), and there have been more, including some with people infected with SARS-CoV-2 (the Covid-19-causing virus). Another example of indirect evidence is that eMalahleni study where humans wore the masks, but the infection outcomes were in guinea pigs.
  4. They may or may not have explicitly considered source control separately from protection from disease. Source control means reducing the risk of transmitting the disease to someone else, when you may not even know you are infected.
  5. Issues related to the methodological rigor and conduct of the review.

The concepts of strength and directness of evidence are important in that list of issues. A powerful, well-conducted, randomized trial on reducing infection rates in the Covid-19 pandemic would be strong and direct evidence. You can also have strong indirect evidence, and weak direct evidence (for example, if randomized trials are small and/or poorly designed/conducted). (There’s a system to GRADE all this up and down that’s relevant if you want to read more about this: it’s used in Cochrane and WHO systematic reviews.)

Given we have to look at evidence beyond studies about Covid-19, what’s the argument for limiting the disease scope for reviews to influenza only, or influenza and coronaviruses? I don’t know. The authors don’t seem to me to be making a strong case for excluding studies on masks for other diseases transmitted by droplets.

But let’s look at issue 5: the methodological rigor and conduct of the reviews. Because that’s critical here, too. Whichever review you look at, you are only going to be seeing part of the evidence picture. And there can be gaps even within the scope of the review itself. The result, I believe, is to systematically underestimate the overall body of evidence on masks and the transmission of respiratory infections.

For example, below this post I have a table showing which randomized trials possibly relevant to the scope of 3 systematic reviews in 2020 were included in each of them. Between them, there were 11 trials, but you would only see all of them in 1 of the reviews. (Note: I didn’t try to find more trials, so I can’t vouch for there being only 11.)

These are those 3 systematic reviews, along with a fourth that is TB-specific:

  • Brainard 2020 [preprint] (all respiratory diseases, with 11/11 randomized trials – but its scope isn’t just trials, and it’s missing other studies)
  • Jefferson 2020 [preprint] (all viral respiratory diseases, with 8/11 viral respiratory disease trials)
  • Xiao 2020 (pandemic influenza, with 9/11 of the trials)
  • WHO 2019 (tuberculosis only)

The preprints are reviews that have not been published in journals yet, and both need careful editorial/peer review. There are also several influential non-systematic reviews, and the problems of incomplete evidence and other issues of rigor get intense with these.

I think the combination of scope and evidence gaps, and often a kind of “draft manuscript” status, mean you have to be careful about all the reviews, except the WHO one on TB. And you have to be particularly careful about the non-systematic reviews, or commentaries based on them.

Some examples of what I mean. Non-systematic reviews are sometimes including case reports of infected people wearing masks on an airplane, where other passengers didn’t get infected. If we knew for sure that airplanes were the source of clusters of infections, that would make sense. But it’s not clear that they are, and it needs a systematic review of case reports and other studies of transmission on airplanes. (See the Brainard review, which looked for studies – I don’t know if that’s all there is though.)

The rapid reviews and preprints also seem particularly prone to error, which isn’t surprising when taking shortcuts on such a complex topic. Check out the very serious concerns raised about the Howard 2020 review by Martin Goodson on the Royal Statistical Society’s data science blog.

So what did I conclude about the gaps from looking closely at all of them?

If the review doesn’t assess the weight of evidence for source control, separately from the weight of evidence for self-protection from infection, it’s going to be misleading. That’s because the most direct trial evidence of benefit from using masks is stronger for source control.

If the review focuses only on randomized trials, then it’s going to be missing a body of comparative studies from a 2008/2009 pandemic and, critically, from the 2002/2003 SARS epidemic. And that’s where the weight of most direct comparative evidence about mask use lies.

Thirdly, if the review doesn’t include a systematic review of indirect studies – and I could find no up-to-date one that does – then we’re losing valuable insight into the complexity of potential benefit and harm, and important information about the capacity of different types of material to block droplet transmission (and inhalation). The mechanistic evidence doesn’t all show the same thing, by a long way – see for example the Brosseau 2020 review. But we need to turn to the indirect mechanistic evidence to consider which kinds of cloth/home-made masks might get closer to the capacity for blocking droplets that surgical masks have.

The WHO review on TB gives you an idea, I think, of roughly where we would land if there was a review of all direct and indirect evidence. While the evidence was weak, they said it was enough to justify coming to a strong conclusion in favor of masks for infected people to protect people around them, in some circumstances at least. Pretty much what the authors in the story I led this post with concluded (not surprisingly).

How much have masks contributed in the Covid-19 pandemic so far, or how much could they? I haven’t tried to keep up with that: it’s a fast-moving target at the moment. Modeling studies have suggested that if most people wore face masks, it could contribute to a reduction in a community-wide infection rate (Yan 2018, Javid 2020 [preprint]). But there is too much we still don’t know about this to be certain.

Photo of sewing a face mask

If we had a strong systematic review of all the evidence, I think we would be putting more energy into pointing people to great resources on how to use masks, than we would into arguing about whether masks do more good than harm. People are using them, and trials show that with minimal education at least, they can do more good than harm. So the teaching part here is critical, just as it was with hand-washing and showing people what distance they should keep from others in public.

I argued in WIRED that there have been demands for methodological purity in evidence about masks for the general public that aren’t applied to other measures – or to masks for healthcare workers, for that matter.

But as the debate unfolds, a couple of things have been really striking to me. One is obviously the issue of the mask shortage, and the lack of preparedness of countries outside some in Asia for this predictable pandemic problem. The difference between rich, supposedly well-prepared countries outside Asia and places like Taiwan is stunning on this.

To see how Taiwan was already making sure in January there would be cheap reliable supplies of surgical masks for the public, check out Jason Wang’s article or his fascinating talk at Stanford. Planning to address the supply of masks for the public was a feature of a prominent call for action by Leung and colleagues at the beginning of March, too. The rest of us are starting from way behind, and getting on top of supply problems is now a well-known burning issue.

The second frustration is symbolized to me by this photo:

Photo of street car during 1918 pandemic
Person without mask denied entry to street car in Seattle, 1918 (via Wikimedia Commons)

Consider the risk to people working on that streetcar, or needing to spend a lot of time on it going to and from work. An awful lot of “non-healthcare-workers” can be at as high, or even higher, risk of getting and transmitting infections than some healthcare workers are. What we call “the frontline” is snaking in and around us everywhere. I don’t think we can afford to disregard the evidence for a serious risk mitigation strategy.

This post started with a story about TB. It’s a good place to end, too, because of the central role masks now play in TB anti-stigma campaigns. As many are pointing out, if the risk of spreading infection is high, wearing a mask is also a gesture of de-stigmatizing community solidarity. And we need a lot of those gestures to get us through this.


Update: I didn’t do a search for systematic reviews on masks when I wrote this post. The following rapid/systematic reviews identified after I wrote this post bring the total to 15 in 2020 (as of 9 June 2020) – not including reviews only comparing types of masks to each other:

Disclosures: I wrote an opinion piece about masks in WIRED. I was a member of the GRADE Working Group when it was developing guidance on rating evidence and recommendations, and methods for going from evidence to recommendations. I’m one of the founders of the Cochrane Collaboration and participated in the development of its methods, and I’ve studied them as part of my PhD (currently under examination). My PhD supervisors are both co-authors of the Jefferson 2020 systematic review (Paul Glasziou and Chris Del Mar) referred to in this post, and I have butted heads with the lead author on other issues in the past. I have had some conversation with Glasziou on the subject on Twitter, but have had no private discussions about masks with any of the authors. I have had public agreements and disagreements with Trish Greenhalgh on a variety of issues.

[Update 5 May 2020] The original post included a claim that Greenhalgh’s commentary had an error, which I’ve deleted. It read:

For example, Greenhalgh’s commentary (2020-b) states (emphasis the author’s): “randomised controlled trial evidence, in relation to source control, is entirely absent”. That’s an error: the Macintyre 2016 trial, for example, is specifically about source control. [Update: it’s going to be corrected – great!] (Check out the Brainard and Xiao reviews for a discussion of the breakdown of protection vs source control in other trials.)

Her statement was actually qualified earlier in the paragraph that quote came from, to specify trials showing masks for source control protect people “in the community”. All the trials of source control in the Brainard and Xiao reviews were of source control in households or tents. Apologies to Trish Greenhalgh for my error!

[Update 7 May 2020] Added note about Gupta 2020 systematic review.

[Update 17 May 2020] Added note about Mondal 2020 and the McMaster Health Forum evidence profile.

[Update 9 June 2020] Updated list of systematic reviews.

The cartoon in this post is my own (CC BY-NC-ND license). (More cartoons at Statistically Funny and on Tumblr.)

The anti-tuberculosis poster at the top of this post is from the Ministère de la santé publique, France, date unknown, via U.S. National Library of Medicine (NLM).

The photo of someone sewing a face mask is by Tadeáš Bednarz via Wikimedia Commons.

The photo of a person denied entry to a street car in Seattle in the 1918 pandemic is via Wikimedia Commons, photographer and subjects unknown.

Randomized trials in 3 systematic reviews from 2020



Brainard Jefferson Xiao**
Alfelali 2019
Aiello 2010
Aiello 2012
Barasheed 2014
Canini 2010
Cowling 2008  ✓
Larson 2010
Macintyre 2009
Macintyre 2016
Simmerman 2011
Suess 2012
TOTAL 11/11 8/11 9/11

* These are only the trials included in at least one of these reviews. None includes the tuberculosis study at the start of this post or trials in people with cystic fibrosis or whooping cough (if there are any).

** This review was limited to influenza pandemics and non-healthcare settings: both the “missing” trials were influenza in non-healthcare settings, but I didn’t try to establish whether or not they were eligible.
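
As a rough illustration of what the TOTAL row means in practice, the following sketch turns the three totals into coverage fractions and counts of trials a reader of each review would not see. It uses only the totals stated above (11/11, 8/11, 9/11). The dictionary and function names are assumptions made for illustration, and it says nothing about which specific trials each review left out.

```python
# Totals come from the TOTAL row of the table; names here are illustrative.
TOTAL_TRIALS = 11
TRIALS_INCLUDED = {"Brainard": 11, "Jefferson": 8, "Xiao": 9}


def coverage(included: int, total: int = TOTAL_TRIALS) -> float:
    """Fraction of the pooled trials that a review included."""
    return included / total


def missing(included: int, total: int = TOTAL_TRIALS) -> int:
    """Number of pooled trials a reader of this review would not see."""
    return total - included


coverage_by_review = {name: coverage(n) for name, n in TRIALS_INCLUDED.items()}
missing_by_review = {name: missing(n) for name, n in TRIALS_INCLUDED.items()}
complete_reviews = [name for name, n in TRIALS_INCLUDED.items() if n == TOTAL_TRIALS]

if __name__ == "__main__":
    for name in TRIALS_INCLUDED:
        print(f"{name}: {coverage_by_review[name]:.0%} coverage, "
              f"{missing_by_review[name]} trial(s) not included")
    print("Reviews with every trial:", ", ".join(complete_reviews))
```

Counting this way treats all 11 trials as equally relevant to every review, which the second footnote cautions may not hold for reviews with a narrower scope.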

  1. Hilda it’s a fascinating story of facts that we have to consider when taking an evidence-based decision. Thank you for putting this into a layman’s or ordinary doctor’s language.
    My question is at this point in time we may not be able to satisfy all aspects of perfect evidence-based study but its important to know the limitations and strengths.
    This is lacking in most the so-called evidence-based studies.

  2. The best piece I’ve read on masks and the challenges of weighing the evidence. Thank you!

  3. very curious as to what height the air was extracted from the ward. given that masks change airflow pattern of exhalations, it should be significant. if the height was around the ceiling or floor it should give less weight to the study (as particles up high should fall past face height whereas particles around floor level are less likely to be breathed in), whereas if it was extracted from face level then even more weight should be given to the findings.
