A Meta-Analyst View of Polling Averages and Alternatives
Polls haven’t been great at reflecting how electoral college votes will stack up in the last 2 US Presidential elections. In 2016, a Clinton win seemed pretty sure. In 2020, polls over-estimated Biden’s margin by so much that it was judged the worst polling error in decades.
In response to those polling errors, polling aggregators and forecasters modified their methods, often substantially – and so did lots of pollsters. They tweak along the way, too. We can’t know what chance they have at getting close this time. The forecasters are aiming at an awfully small target: The last 2 Presidential elections were decided by just tens of thousands of votes in a few states.
Polling was only developed in the 1930s, and poll aggregation decades later. With Presidential elections only every four years, there aren’t many of those major data points to work with. And there have only been 2 previous elections with the MAGA version of the Republican Party.
The previous 1 or 2 elections are an important feature both in determining the sampling and weighting of respondents for polls, and in the modeling of “polling averages” and forecasts. The first 2 go-arounds with MAGA, pollsters were operating with pre-MAGA assumptions about voters. They’re still not sure how to account for the MAGA impact. Perhaps it remains under-estimated. On the other hand, what if, while they are trying so hard to cram more MAGA voters into their polls and reckoning, the propulsive energy of the MAGA wave is fading? It wouldn’t take much to overshoot the mark, and end up having under-sampled Democratic voters this time round.
Those 2 elections had other major features making them even more unprecedented: The first woman candidate of a major party in 2016, for example. Then for 2020, a pandemic and major change in ease of voting by mail (and resulting turnout). And now this one comes with assassination attempts, a late change in the top of the Democratic ticket, the first candidate who is a woman of color, and the first one who is an insurrectionist and a felon – against a backdrop of MAGA abortion bans, including in some battleground states, and escalating MAGA suppression of voting in some.
Massive servings of “unprecedented” are clearly problematic for any system based on precedents.
Even knowing all that, I couldn’t resist looking at “averages” frequently. I wanted to at least have my head around the technicalities of these statistical models from a meta-analyst perspective, and make a choice about which to click on. So I dug around in the literature and under the hood of several of the popular online aggregators to get a handle on a couple of questions: Are any of them likely to have an edge over others? And what about alternative ways to gauge potential voting shifts?
I quickly stumbled over several limitations of these models, though. Consider the potential for unusual turnout and voting swings among women, for example. I didn’t find a feminist analysis of election poll aggregation models. Going by the publicly available details, aggregators didn’t seem to address gender beyond taking women’s population share into account, along with their likelihood of voting in previous elections. The historical margins might not hold this time, though. If the gender gap widens along with an increase in women’s turnout, that could shift the odds for Democrats.
The gender gap has been driven by Black women, and has been estimated to hover in the 7-12% range (between 2008 and 2020). This year might end up higher than that. A couple of polls in swing states from one pollster in September this year put it at around 16%; a national one from a different pollster had a 33% gender gap. I haven’t seen an aggregation of the gender gap in voting intention. (ABC News’ 538 publishes a polling average on favorability, but without a gender breakdown.)
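A note on what those percentages mean, since the gap gets calculated in different ways: some report the difference between women’s and men’s support for one candidate, others the difference between their margins, which roughly doubles the number. Here’s a tiny illustration with invented crosstab numbers (not from any of the polls mentioned above):

```python
# Invented poll crosstab: candidate support by gender (percent)
women = {"dem": 56, "rep": 40}
men   = {"dem": 44, "rep": 52}

# Definition 1: gap in support for one candidate
support_gap = women["dem"] - men["dem"]  # 12 points

# Definition 2: gap between the candidates' margins (gives a bigger number)
margin_gap = (women["dem"] - women["rep"]) - (men["dem"] - men["rep"])  # 24 points

print(f"Support gap: {support_gap} points; margin gap: {margin_gap} points")
```

So a “16%” and a “33%” gender gap may be closer than they look, depending on which definition each pollster was using.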
There could be several issues driving a gendered response and turnout. But let’s consider reproductive freedom. On top of widespread concern about this, several of the battleground states have introduced bans on abortion at various early stages of pregnancy, and some Presidential and Senate battlegrounds have abortion-related referenda on the ballot.
New voter registrations are an alternative source of data that could foreshadow whether or not women’s turnout will be different this time. They could be an indicator of momentum in an election that might not be captured by polls. Indeed, new voters were identified as one of the potential contributors to 2020’s polling error.
States release this data at different times, so you can’t get a consistent picture in real time. There’s a free online dashboard of this data now at TargetSmart, which is a wonderful resource. Beware of the lead chart though: A sidebar suggests that recent data for many states either isn’t yet available, or hasn’t been uploaded.
In mid-September, TargetSmart reported that 38 states had updated their files since Harris moved to the top of the Democratic ticket, including four battlegrounds. They estimated that women account for “nearly 55%” of new registrants in that time, with major surges in Black women and young people registering. The trend was similar or even higher in the battlegrounds.
Later in September, they reported that there had been a surge of Hispanic registrations in the week after Harris’ hat went in the ring. They estimated over 26,000 new Hispanic registrants in the battlegrounds plus Texas and Florida – predominantly under 30, especially women.
So now, with a rough idea of the odds against getting a precise and accurate answer from current methods, let’s look at the online poll aggregators. These are the main types of methods they are based on:
- Simple combination of very recent polls that really is “averaging.”
- Poll results combined with other data, adjusted in a statistical model. Results of older polls stay in the mix for a while, but are phased out over time. National results tend to be fed into state-level calculations as well. These show current estimates of people’s voting intentions, with a trendline charting the “average” across time.
- Forecasting using polling along with other data in a statistical model, running multiple simulations to calculate the probability of particular election results. As election day approaches, polls come thick and fast, and the other data gradually phases out of the model. Again, results of older polls stay in the model for a while, but phase out over time, and national trends feed into state-level forecasts as well. (There’s a rough sketch of both of these model-based approaches after this list.)
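To make the second and third approaches more concrete, here’s a minimal sketch in Python. None of this is how any particular aggregator actually does it (the polls, the decay rate, and the error assumption are all invented for illustration), but it shows the two basic moves: down-weighting older polls into an “average”, and simulating many elections to turn that average into a win probability.

```python
import random

# Hypothetical national polls: (days before election day, Dem %, Rep %)
polls = [(20, 49.0, 47.5), (12, 48.0, 48.5), (5, 49.5, 47.0), (2, 48.5, 48.0)]

def polling_average(polls, half_life_days=10.0):
    """Time-decayed 'polling average': newer polls get more weight.
    The half-life (how quickly old polls fade) is a made-up choice here."""
    dem_sum = rep_sum = weight_sum = 0.0
    for days_out, dem, rep in polls:
        weight = 0.5 ** (days_out / half_life_days)
        dem_sum += weight * dem
        rep_sum += weight * rep
        weight_sum += weight
    return dem_sum / weight_sum, rep_sum / weight_sum

def win_probability(dem_avg, rep_avg, error_sd=3.0, n_sims=10_000):
    """Crude forecasting step: simulate many elections in which the true margin
    is the polling-average margin plus a normally distributed polling error."""
    margin = dem_avg - rep_avg
    wins = sum(random.gauss(margin, error_sd) > 0 for _ in range(n_sims))
    return wins / n_sims

dem_avg, rep_avg = polling_average(polls)
print(f"'Polling average': D {dem_avg:.1f}%, R {rep_avg:.1f}%")
print(f"Simulated Dem win probability: {win_probability(dem_avg, rep_avg):.0%}")
```

Even this toy version shows why a near-tied average turns into a probability hovering around 50%, and why the size of the error you assume around the polls matters at least as much as the polls themselves.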
There aren’t many in the first category – Real Clear Politics is the main one. Most of the poll aggregators are in the second category, charting the national popular vote as well as battleground (or all) states. Even though the jargon for these is “polling averages,” it’s statistical modeling. The forecasters in the third category post that type of “polling average” too.
There are just a few major aggregator and forecaster teams posting frequent online updates, and there has been some flux in “who’s who” here.
The first major one was FiveThirtyEight, started by Nate Silver for the 2008 Presidential election (Obama’s first victory). In 2023, ABC News took over the brand and re-named it 538. Silver kept the rights to his model, though, and is now operating it as Silver Bulletin, with the forecast and much other data behind a paywall.
Meanwhile, G. Elliott Morris moved over to 538. He was part of the team with statistician Andrew Gelman that developed the election forecasting model for The Economist. There is now a fair bit of convergence in methods between 538 (which is freely accessible) and The Economist model (which is mostly behind a paywall, although the model itself is open). Gelman is the Columbia University statistics professor who writes the StatModeling blog: You can dig into lots of his posts on the technical aspects of all this there.
In the end, I focused on those 3 polling aggregations/forecasts, as well as another 2 polling aggregations – from the Upshot at the New York Times, and the Washington Post (both behind paywalls). Others fell by the wayside on quality or transparency grounds. (For Steve Kornacki fans: I tried to figure out the source of the “polling average” he/NBC mentions, but couldn’t identify it.)
It’s difficult to compare the methods of the 5 aggregators because of incomplete and inconsistent reporting of methods and modifications.
I’ll start with a summary of assessments I found in the scientific literature made by third parties of these aggregators’ results close to election time – including comparisons to the simple polling average from Real Clear Politics (RCP). It’s a bit grim:
- Barnett (2023) compared FiveThirtyEight and RCP for the 2020 election. The conclusion: “FiveThirtyEight only marginally outperformed RCP.”
- Thomas (2021) evaluated The Economist model for the 2020 election. They concluded that “The probabilities were too certain of a Biden win because the polls were too certain of a Biden win.” They also concluded that polls have become less reliable in recent years. The reality, they wrote, is “that much of forecasting is still luck.”
- Wright (2018) included FiveThirtyEight, RCP, and Upshot at The New York Times in their comparison of aggregators. FiveThirtyEight, they concluded, only had “somewhat better correlations” than RCP at a state level in 2008 and 2012. For the 2 forecasters in 2016, “FiveThirtyEight and Upshot provided respective probabilities of 29% and 15% for a Trump victory, and for these models the outcome cannot be considered a rare or highly surprising event.” Both forecasters predicted Trump would lose, though, in 5 key states that he went on to win.
Wright also concluded that “the averaging necessary for robust state estimates may have made the prediction sites insensitive to a late change in the national component of the polls… Thus we propose that much of the bias may have arisen from the aggregation methods, not the individual polls.” They also concluded, “poll aggregation models often do not adequately correct for the effect” of undecided voters and third party candidates.
That issue of handling undecided voters arose in other types of articles in the literature too. It felt a bit akin to the problems of handling missing clinical trial data in meta-analyses. Liu (2021) pointed out that the number of undecided voters during an election campaign can be higher than the margin between the candidates, which makes how they are handled especially important earlier on. They concluded that this could have made a determinative difference to models for the 2016 election. Bon (2019) pointed to this as a problem at the level of individual polls as well, and suggested it could have been a problem for FiveThirtyEight.
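As a rough illustration of why that matters, with made-up numbers: a 2-point lead with 8% undecided can flip purely on how the undecideds are assumed to break.

```python
# Invented poll: 47% Dem, 45% Rep, 8% undecided
dem, rep, undecided = 47.0, 45.0, 8.0

def allocate(dem, rep, undecided, dem_share):
    """Split the undecided voters between the candidates in a chosen proportion."""
    return dem + undecided * dem_share, rep + undecided * (1 - dem_share)

for dem_share in (0.50, 0.35, 0.25):  # even split, leaning Rep, breaking heavily Rep
    d, r = allocate(dem, rep, undecided, dem_share)
    print(f"{dem_share:.0%} of undecideds to Dem -> D {d:.1f}%, R {r:.1f}%, margin {d - r:+.1f}")
```

The nominal 2-point “lead” swings from +2 to -2 without a single decided voter changing their mind, which is the kind of movement Wright and Liu argue the aggregation models did not adequately correct for.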
While I was digging into this literature, I was also collecting data from the 5 aggregators every few days. There wasn’t really a lot of difference. After all that, analyzing differences between the aggregators’ methods came to feel pointless.
I decided the best way to choose between them was how they show results, especially how clearly they convey the level of uncertainty. That made 538 the resounding winner. It displays the uncertainty and number of polls clearly for both national and state results. And the aggregation of favorability ratings that I mentioned earlier is an advantage to me, too. On top of that, it’s the only one with no paywall restriction.
I think Morris from 538 summarizes the limited results these models are showing well. He wrote that polls “point to a close race today,” but while it’s probabilistically close, that “does not mean that big wins aren’t also possible.” Yup. It could be very close… or not.
This exercise has led to a big reduction in my consumption of polling aggregations! I’ll still look at 538 from time to time, though. It’s better than paying attention to conflicting individual polls popping up around the place. That would just let my confirmation bias run amok, finding reasons to dismiss results that stress me out.
Note: I embarked on this exercise after writing about a frustration with media coverage of polls in my newsletter at the beginning of September.
You can keep up with my work at my newsletter, Living With Evidence. And I’m active on Mastodon: @hildabast@mastodon.online and on Bluesky @hildabast.bsky.social
~~~~
Disclosures: I live in Australia, and I am not a US citizen. If I were, I would be a registered Democrat. I lived in the US during 2 Presidential elections (2012 and 2016).
The cartoon is my own (CC BY-NC-ND license). (More cartoons at Statistically Funny.)