How to Forecast the Most Unpredictable Election Season in Decades

“Where’s the model, Nate?”

Search the dark corners of election Twitter, or at least the mentions of any Nate Silver tweet, and you will see people demanding to see his presidential election model. Though the model first appeared only 12 years ago, its release has become a key moment in election coverage. In fact, is there even an election happening if Silver isn’t predicting who’s going to win it?

Now the 2020 model has arrived, and it has Joe Biden as a 71% likely winner—almost exactly the same as Hillary Clinton’s number at this time in 2016. But everything else has changed: Not only is there the obvious fact that we’re barreling toward an election with a raging pandemic, sweeping protests for racial justice, and tremendous economic upheaval, but the polling and data science world is facing serious scrutiny after predicting the wrong winner in 2016. (Of course, as Silver has pointed out many times, the 538 model was more optimistic about Trump’s chances than both traditional media and many of his peers.)

This year, the 538 model once again finds itself at odds with some parts of the political analysis world. One major competitor, The Economist, gives Biden significantly higher odds of taking the presidency—89%. When I asked Silver about the divergence he explained that, “funnily enough, our election-day forecast looks a lot like The Economist’s prediction. I think that would be 97% of the popular vote. Ninety-two percent Electoral College.” The difference, according to Silver, is how 538 deals with the fact that there are still just under three months left until Election Day. While Silver concedes that polling so far has been incredibly consistent—“It's been literally I think the most stable that a polling average has ever been”—he cautions that “stability predicts stability weakly.” For evidence, he points out that “the Democratic primary polling average was incredibly stable for months and months, and all of a sudden, Biden collapses after Iowa and then rises as fast as any candidate ever has in history after South Carolina.”

Put all that together and Silver sees a race where “if Biden moved from 70 to 92 over the course of a couple of months, that wouldn't surprise me. Most of the uncertainty right now is because it's August, and both the economic data and the news cycle are crazy. But there are fewer things that would predict incredible uncertainty on Election Day itself.” This contrasts with 2016, where societal conditions were relatively calm and a lot of the uncertainty in Silver’s model stemmed from the high number of undecided voters and strong third-party candidates.

The contrast between the two elections captures the strengths and challenges of modeling presidential election results. On the one hand, it’s possible to take two vastly different sets of circumstances, synthesize them, and spit out a number that lets a nervous voter compare Joe Biden’s chances with Hillary Clinton’s. But that also flattens the context of those elections. Two thousand sixteen was a volatile race in a relatively normal world—the polls ebbed and flowed, with people changing their minds up until the last possible minute. This election has been remarkably stable, even as the world seems in chaos. Some modelers read that stability as a sign that the race will remain this way. Silver just isn’t convinced: Heck, even 2016, he points out, wasn’t as crazy as some other recent races, “like 1988, where Dukakis went from 10 points ahead to losing by almost 10, or ’92, where Perot gets in and out of the race and things are kind of haywire. Two thousand sixteen was actually kind of quite average in some ways.”

So, while everybody agrees that Biden’s current position is strong, and that if he holds this position come November he’s likely to win, there is still plenty of disagreement over how likely that is to come to pass. In this interview, Silver talks about the differences between 2020 and 2016; how the 538 model accounts for factors like the COVID pandemic and voter suppression; and why its methodology is superior to that of competitors.

GQ: There’s only one place to start: What does your 2020 model say?

Nate Silver: So there'll be more polls between now [GQ spoke with Silver on August 7] and Wednesday morning. As of right now, I think it shows Biden 72% to win, Trump at 28% [by the time of the model's release on August 12, Biden's chances shifted slightly to 71%]. We've been monitoring it for a week or so and reexamining a few things, and Biden has been bouncing around between 70% and 75%. If there are a bunch of big national polls that come out on Sunday or something, then it could shift a little bit. And it seems to like that range, which is ironically pretty close to where Clinton was at the end of 2016.

Since you brought up Clinton, one thing I'd note is that 2016 was really volatile—both the polling and your model picked up a lot of ups and downs. Has it been the same this time around?

We haven't run it all the way back [to see what the model would say about earlier phases of this election season]. If you go back to March or whatever, there’d be different economic data with COVID that would shift things. But going back to Memorial Day, I think Biden has always been in the 70% to 80% range. It's actually tightened a teensy tiny bit. And also the economic data has gotten a little bit better. So we actually would have been a little bit higher on Biden before.

In 2016 you released your presidential election model at the end of June, and in 2012 it was the beginning of that month. Was it a conscious decision to release the model later this year?

COVID first hit in March, and that coincided with Bernie Sanders basically bowing out. In March and April, either I was in a weird COVID fugue state or writing about COVID, so I didn't get much done on the model. And then I began working on it in May, and the general sense was that there's no reason to be rushing out to market. There's no reason, in an abnormal time, that people had to see these probabilities, or that that was necessarily the best way to conceive of the election.

Also we looked at a lot of things like, how are we accounting for additional uncertainties because of COVID? We spent a lot more time, for example, on the economic component, because you can potentially have a V-shaped recovery, and what the economy looks like in July or August may not really resemble what it looks like in November. We also like to look at the number of full-width headlines in the New York Times as a way to quantify additional uncertainty when there is more news. Things could get worse and worse, as they have been recently, or there could be a vaccine or whatever else. Thinking through that took a fair amount of time. And editorially, we think this product works best in the stretch run of the campaign. We also got our polling averages out in mid-June, which is a preliminary version of the model. You can still use that polling average, which we think is a better product than its competitors. It's more transparent, it's more inclusive. It's smarter about not going for phantom swings. So it was a two-stage process.

I think that 72% number for Biden will be received very differently than it would be if the 2016 election was not fresh in everybody's minds. To what degree do you think ahead to how your model and its numbers are going to be received?

No matter what our number was this year, I think it was going to be kind of a fraught and complicated event. I suppose it wound up in a place where I think it's a bit easier to narrate the conclusions. Like, hey, actually, this thing is not in the bag, right? If we wound up with Biden at 89% or whatever, then you were probably narrating a story about, well, here are the reasons why this is not like 2016.

Every presidential election for which you’ve released a model has had a very different narrative cast. I do wonder if that plays into your thinking, not as you build the model, but as you're thinking about how to present the information to the world and how it’s going to be received. At this point the 538 model is an event that might shift the gravity of discussion of the election.

I do try to think about it. In some ways the pandemic makes it feel less important, and so maybe because of that I spend less time worrying about how it will be received. But it's also a little bit hard to pinpoint what the conventional wisdom is. If you look at betting markets, they have Biden at around 58%, 60%, which seems, I think, too low. If you read political Twitter, it seems like the conventional wisdom has flipped from a toss-up to Biden is dominating. And then the other models all have Biden very high. So it's a little bit hard to figure out what the average person thinks.

I think I saw that you put something into this iteration of the model to account for shifting ballot access laws or regulations in various states. Is that correct?

That's right. What I'm looking at is, there is a group of academics who have updated an ease-of-voting index since 1996, or calculated it back to ’96. So it turns out that when voting is easier, then turnout increases and turnout becomes more Democratic. And so states like Michigan, where they've gotten Democratic leadership in place since 2018 and taken a lot of steps to make voting easier, it's not necessarily a surprise that Biden is doing well in the polls. Conversely, it may be harder for Biden to get over the hump in states like Texas and Georgia, which are putting restrictions in.

We also looked at primaries that were held post-COVID. And it looked to us like the turnout was harder to predict in those primaries. Sometimes it was way up. Sometimes it was way down. And it's complicated because you had the Democratic primary ending, the presidential primary ending. And so the incentives are different, but the turnout in those races was kind of all over the place. And we can go back and look and say, okay, what happens in a state where turnout becomes hard to predict? Does that also make your polling average more prone toward error? And the answer is yes. And so we think that, in addition to the fact that polls might or might not move, there's probably going to be around 20% more error on Election Day, because pollsters may or may not be able to pinpoint what the electorate looks like in a world where voting is different than it was before. With that said, if Biden’s up eight points on Election Day, then...
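To make that concrete, here is a rough sketch of how an extra 20% of Election Day polling error plays out, using a normal approximation and made-up numbers standing in for whatever 538 actually uses:

```python
from statistics import NormalDist

# Illustrative assumptions, not FiveThirtyEight's parameters: the leader's
# polling margin and a baseline standard deviation of Election Day polling
# error, both in percentage points.
margin = 8.0           # leader's margin in the polling average
baseline_error = 4.0   # assumed sd of Election Day polling error

def win_probability(margin, error_sd):
    """Chance the true margin is positive, if polling error is roughly
    normally distributed around zero."""
    return 1 - NormalDist(mu=margin, sigma=error_sd).cdf(0.0)

print(f"baseline error : {win_probability(margin, baseline_error):.1%}")

# "around 20% more error on Election Day" because turnout is hard to predict
inflated_error = baseline_error * 1.2
print(f"inflated error : {win_probability(margin, inflated_error):.1%}")
```

Even with these invented inputs, the extra turnout-driven error shaves a couple of points off the leader's chances, which is the directional effect Silver is describing.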

Do you have a sense of how much an adjustment like that moves the model? I assume you're doing constant adjustments as you're building the model out.

There are things we looked at, but it all didn't really make much difference. Most of the uncertainty is still in the fact that there is time for the race to move between August and November. The VP hasn't been picked [Biden chose Kamala Harris as his running mate yesterday]. The conventions haven't begun, the debates haven't happened. Maybe intuitively, it might feel very baked in, but it's not the job of the model to reflect people's intuition.

I think the best argument for why Trump would lose is that we're in the middle of a pandemic that he totally screwed up, and things won't get better enough by November for people to forgive Trump, and he was already behind anyway. That's not really a narrative having to do with the model per se, right? The model will say, okay, well, he's down eight, but it's August and things are kind of crazy. And so where's that wind up? Well, 70% or 80% Biden and 20%–30% Trump. That's all the model is really doing.

You can have as much confidence as you want, but it shouldn't come from a polling average in August, because the polling average in August historically is only so accurate. And yes, it's been more accurate if you look back at the past few election cycles, although 2016 was a bit of a problem. But are we willing to just rely on three or four data points and totally ignore any impact from COVID? We don't think that reflects robust modeling. It may turn out that COVID makes polling more accurate because more people are at home answering people's calls, because you have voting early, therefore you lock in the right number earlier. But you can't know that ahead of time. And to the extent we can, we want this model to reflect real-world uncertainties.

Now, there are still some assumptions baked in. Are we trying to account for irregularities in how the vote is actually counted? Not necessarily. One thing the model can be is a benchmark for what things look like in ordinary circumstances. If we have those conservative assumptions, Biden is 99.95% likely to win New York state, right? And then if somehow absentee ballots are impounded in New York state and Trump claims victory, that provides some type of a benchmark to look at. But I do think one risk, if you were to have a forecast at 90% Biden—even though that isn't 100%, and I would be the first to point that out—you do start to see people say, well, either Biden wins or the election was rigged. You start to hear commentators inch toward that. And I think that's not really true. There are plenty of precedents for big swings, like Michael Dukakis being way up in ’88. I think we need to have a lot of humility about how we're going to measure the economy. For example, you're looking at some models using third-quarter GDP, and third-quarter GDP might be plus eight or something. And so that would show a Trump landslide.

There is this tension between having a model that's built to have causal explanations and a model that just says, here's the stuff that we can show. A lot of times when people look at your model or Nate Cohn’s, they're looking at it through a causal lens—why does it say this? Do you view the job of the website as delivering the information in as limited a fashion as possible, so that people are not taking away more than you're putting out there with the percentages?

All the numbers are there. But they're dressed up now in visualizations that are a little bit more approachable and intuitive and also, in some ways, surprising. You want people to pause and stop and think about what they're actually seeing. And they're actually seeing uncertainty, they're seeing different versions of the universe that are all plausible at this stage. And so some people say hey, this is not really a true prediction. It’s just kind of handicapping or something, laying out a table of odds based on current conditions. I think that’s not the worst way to look at it.

When you're doing data journalism, you're trying to quantify uncertainty. And when people are reading your data journalism, they just want to know what's going to happen. There is this consumption divide that you have to figure out a way to bridge, which I think is always a challenge.

Yeah, it's weird. Obviously, we felt that tension in 2016.

Two thousand twelve, too, I’d imagine? In the other direction?

Yeah. But in 2016, if you asked people, what do you think the chances are that Clinton will win, they would have said 90% or 95% or something, right? Based on the tone of coverage from cable news and the Times and the Post and the Journal. I don't know what people would have said, but that's probably why, when you see the 538 landing page for the forecast, you won't see a ton of numbers at first. You’ll see a bunch of maps, and you have to scroll down and you can see the numbers and a very kind of clever ball swarm, as we call it. So we do take seriously the fact that some people misinterpreted the forecast in 2016.

I also think that this has become a debate where some things have been memory-holed and forgotten about. A lot of people in 2016 were actually saying, you have Trump way too high, you know, what are you doing? We think there's a lot of uncertainty in the real world about how things could turn out. And there’s a flip side too, which is that you could have Biden win by 12 points or something. We will probably have Biden with maybe a 10% chance of winning South Carolina, where he might be zero in some other forecast.

Nothing clarifies the difference between the people making data journalism predictions and the people consuming them better than 2016, when you and the model got yelled at for being too bullish on Trump right up until the moment when he won, at which point everybody blamed you for saying Hillary Clinton was gonna win.

It wasn't super fun, but you know, it's also amazing that 538 got so much credit in 2012 when it was a pretty boring and steady race. I think our model still gave Romney, like, an 8% chance, but now we think actually the pre-2016 versions of the model were a little bit overconfident. Probably, in retrospect, Romney should have had a 20% chance on Election Day and not eight or whatever. We really took a hard look at the confidence of our model as a result of the primary in 2016, where we didn’t really build a model until later on, and then what we built was half-assed. As a pundit, I got way out in front of the idea that Trump's not going to win this nomination.

I think that encouraged us in 2016 to be really rigorous about making sure we're spending a lot of time breaking down different types of uncertainty and how that plays in. And so there were changes in the model between 2012 and 2016 that in some ways are bigger than the ones between 2016 and this year. This year it's more like, let's think about the COVID stuff and a few bells and whistles, like changes in voting laws, that we've always in the background thought about trying to account for. But none of these changes are motivated by the fact that Trump won when he was a 70/30 underdog. I mean, that's pretty normal. If we go back and rerun 2016 using our 2020 code, and the additional data point for 2016, it still winds up with Clinton in that 70% range.

I think when you're building a model, you really want to see what it would have said for past elections, but you want a “reasonable answer,” quote unquote, for everything. You don't necessarily want the right answer. Because the right answer means that you're trying to overfit the model and you're fighting the last war. So to some extent, I don't care what its final forecast would have been in 2016 as long as it’s somewhere in the right vicinity between Clinton 60% and Clinton 85%. If we had designed our model in such a way that Trump had been favored in 2016, then we would have done something really wrong. Clinton was ahead in the polls. Not by a lot, but she was ahead in the polls, and the question is, given the margin of her leads in the swing states plus the uncertainties involved, what was the chance polling would not correctly point toward the winner in 2016? It was actually decently high because it was pretty close and there was high uncertainty, high undecideds and so forth, high volatility.

This year is a little bit different, where it's certainly not close right now. But between the fact that there’s still a long time to go and that Trump tends to win the close elections because of the Electoral College, and a little bit of additional uncertainty on the mechanics of voting on Election Day itself, we don't know. What if mail ballots mean that turnout goes way up and it's way more Democratic? That's possible too. There definitely can be errors in both directions. You know, you have a big split between the traditional live-caller polls, which sometimes could be 10, 12, 14 points in favor of Biden. And then you'll have online polls and robo polls showing a tighter race.

I know you account for the different kinds of polling, but it does seem as we talk right now that there haven't been a lot of high-quality live polls recently. And those seem to be something that moves the average. You then need to decide, is the poll movement reflecting people changing their opinions and answering pollsters’ questions differently, or is it simply reflecting the mixture of polls in the average changing?

There's a balance between weighting toward the highest-quality polls and weighting toward all polls. I think actually, in previous incarnations, it did only weight toward live-caller polls. And we decided that is too aggressive an assumption because the live polls are usually more accurate. But there are cases like 2016 where they are not, and so it hedges just a little bit. I think there are really three tiers, right? There's the online polls, there's the live-caller polls, and there's the robo polls that are often very kind of scammy and crappy. They often won't call enough cell phones, they have really Republican-leaning samples, and some of them just kind of seem to think, hey, there's a market to always show optimistic polls for Trump. If I were making subjective calls, then maybe I wouldn't include some of those polls. But I think I do better when I don't make subjective calls.
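Here is a minimal sketch of that kind of tiered weighting. The three tiers mirror the groups Silver describes (live-caller, online, robo/IVR), but the weights and the example polls are invented for illustration and are not 538's pollster ratings:

```python
# Minimal sketch of tier-based poll weighting, with made-up numbers.
TIER_WEIGHTS = {"live": 1.0, "online": 0.7, "ivr": 0.4}

polls = [
    ("live", 10.0),    # (poll type, Biden margin in points), all invented
    ("online", 7.0),
    ("ivr", 3.0),
]

def weighted_average(polls, weights):
    """Quality-weighted average: better methodologies count for more, but
    lower-tier polls still pull the number a little, which is the hedge
    against live-caller polls being the ones that miss (as in 2016)."""
    total = sum(weights[kind] * margin for kind, margin in polls)
    return total / sum(weights[kind] for kind, _ in polls)

print(f"weighted average: {weighted_average(polls, TIER_WEIGHTS):+.1f}")
```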

I know a lot of people depend on the Real Clear Politics average, and yours does diverge from it to some degree. And G. Elliot Morris at The Economist has not only his averages, but also a model that's been out for a little while now. What is it that you’re doing differently from them?

Part of it is that we have decisions we set up in a systematic way. RCP has some degree of manually choosing which polls it wants to include or not include in ways that, to me, don't obviously correlate with quality or any other particular benchmark. And G. Elliot Morris also says, well, this poll isn't doing education weighting, or this is a robo poll, so we're not going to include it. I want to make as few subjective decisions as possible. I want a good set of rules that generalize well, and we're pretty strict about following those rules. Sometimes a pollster will write us and say, hey, we intended our poll to be interpreted this way, and you're doing it that way. Well, I'm sorry, but we actually have a set of rules and they're here on the site. And we have to be consistent.

A lot of the time you're doing back-testing [feeding historical data into the model in order to compare it to known outcomes] and trying to figure out how to set an equilibrium with the parameters used in the model, such that it picks up on real change and avoids noise. You have this couple-of-months-long process of working on a model, and probably three weeks of that was just running simulations all the way back to 1968. There’s a lot of time spent on tweaking one thing a little bit, running a simulation, and coming back in an hour and 20 minutes to see if that worsened or improved things. There are dangers to being both too aggressive and too conservative. And obviously, different polls have different house effects. So when a Rasmussen Reports poll comes along, it shouldn't be a surprise if it only shows Biden up three and everyone else shows plus eight—that reflects the typical house effect. And so our numbers are generally more stable than RCP’s.
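The house-effect adjustment Silver mentions could be sketched like this, with the Rasmussen lean and the consensus numbers as assumptions drawn from his example rather than 538's actual estimates:

```python
# Minimal sketch of a house-effect adjustment, with made-up numbers. A
# pollster's house effect is how much it typically differs from the
# consensus; removing it keeps one habitually Trump-friendly (or
# Biden-friendly) firm from jerking the average around.
HOUSE_EFFECTS = {
    # pollster -> typical lean vs. consensus, in points (assumed values)
    "Rasmussen Reports": -5.0,   # assumed to run ~5 points more Trump-friendly
    "Generic Pollster": 0.0,
}

def adjusted_margin(pollster, raw_margin):
    """Poll margin after stripping out the pollster's typical lean."""
    return raw_margin - HOUSE_EFFECTS.get(pollster, 0.0)

# A Rasmussen poll showing Biden +3 when everyone else shows +8 is, after
# adjustment, roughly in line with the consensus rather than a real shift.
print(adjusted_margin("Rasmussen Reports", 3.0))   # -> 8.0
```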

G. Elliot Morris is around 91% in favor of Biden, something in that range. And he was out early and loud with a very firm forecast.

Yeah, they're at 90%. Today, 98% chance of Biden winning the popular vote. They have a couple of things going on. One is they assume that because of polarization, polling drift and error will be lower, which I think is reasonable. Our model uses that as one of the metrics it looks at, but I think you can't just ignore the additional uncertainty brought about by COVID and other news events. And to me, The Economist weights very strongly toward a prior [the assumptions that a predictive model makes before adjusting for data]. And that prior says that Trump is toast, which is different than our prior, which is much more like 50/50 still. And so between those two things, they're quite confident the polls won't move much, and you wouldn’t expect them to move anyway because the economy is so bad that Trump is bound to lose. I am not necessarily convinced. It's not just that polls could move. It's a question of, like, how well can pollsters predict turnout when the mechanics of voting have really changed?
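The disagreement over priors comes down to simple arithmetic. A rough sketch, with every margin and weight invented purely for illustration:

```python
# Minimal sketch of blending a fundamentals prior with the polling average.
# The mechanism is what Silver describes; none of the numbers are real.
def blended_margin(poll_margin, prior_margin, prior_weight):
    """Weighted combination of the polling average and the prior."""
    return prior_weight * prior_margin + (1 - prior_weight) * poll_margin

poll_margin = 8.0   # Biden +8 in the polling average (illustrative)

# An Economist-style prior that already has the incumbent losing, weighted
# heavily, barely moves the number:
print(blended_margin(poll_margin, prior_margin=6.0, prior_weight=0.5))   # 7.0

# A roughly even-money prior, weighted lightly, pulls it down a bit more
# and leaves more room for Trump:
print(blended_margin(poll_margin, prior_margin=0.0, prior_weight=0.2))   # 6.4
```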

You said before that there are dangers to being too sensitive or not sensitive enough. Are you talking about the dangers of accurately reflecting the world that polling is picking up on?

What we optimize for is a snapshot, not a prediction. It's a first step. So we try to see what settings best predict new polls that will come out in the next two weeks.

You do that with the average of the polling as well as the final model?

Right, and the model will take that average and it can then hedge further. There are a couple of ways that it hedges. One is that we think the prior is more conservative than other models that people are using, so that dampens any polling swing by 20 or 25%. The other thing is that when there are major events—conventions, debates, VP picks—those can cause sharp swings and then revert to the mean. So the model says, okay, the start of the conventions is in a week and a half? Take a snapshot of the polling average from before the conventions and then weight that for some period of time until the convention bounce stabilizes. I think it clarifies things to say, what's a snapshot and then what's a forecast? They differ, and our estimate of what would happen in the election in November is not the same as what would happen in an election two days after the Democratic Convention.
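A stylized version of those two hedges, with the damping factor, dates, and settling window as assumptions rather than the model's real settings:

```python
from datetime import date

# Minimal sketch of the two hedges described above: dampen swings in the
# polling average toward a prior, and during a convention window lean on the
# pre-convention snapshot so a temporary bounce doesn't drive the forecast.
# Every specific number and date here is an assumption.
DAMPING = 0.25                        # "dampens any polling swing by 20 or 25%"
CONVENTION_START = date(2020, 8, 17)  # assumed start of the convention window
BOUNCE_SETTLE_DAYS = 14               # assumed time for a bounce to fade

def forecast_margin(today, current_avg, pre_convention_avg, prior_margin):
    # Hedge 1: shrink the current average toward the prior.
    dampened = (1 - DAMPING) * current_avg + DAMPING * prior_margin
    # Hedge 2: inside the convention window, blend back toward the snapshot
    # taken before the convention, phasing the bounce in slowly.
    days_in = (today - CONVENTION_START).days
    if 0 <= days_in < BOUNCE_SETTLE_DAYS:
        settle = days_in / BOUNCE_SETTLE_DAYS   # 0 = ignore the bounce entirely
        dampened = settle * dampened + (1 - settle) * pre_convention_avg
    return dampened

# A Biden +11 bounce five days into the convention gets read as roughly +8.
print(forecast_margin(date(2020, 8, 22), current_avg=11.0,
                      pre_convention_avg=8.0, prior_margin=2.0))
```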

Is this how you replace the 538 nowcast [a much-derided feature of the 2016 model that predicted what would happen if the election were held that day]? Where you try to separate out what we're looking at now from what you project will happen on Election Day?

The problem with the nowcast was that it’s a little bit of a messy concept philosophically. Because if the election were held today, a lot of things would have been different, right? So it's getting rid of the nowcast and saying our polling averages basically are the equivalent of that. Putting probabilities on things wound up just confusing people.

When I look at forecasting models, I can mentally break it down into two parts. There's part one, which is: Right now, given the information we have access to, what is the state of the world? And then there is part two, which is: What do we project that to be going into the future? When you get into part two, it seems to me that the line becomes fuzzier between what is rigorous data analysis and what is assumption. Is that a line that you find yourself up against?

For sure, and I think especially this year, because 2020 is obviously in many respects an outlier economically and in the news cycle. So, how much do you trust the default settings from whatever prior you've been using? We try to stick to the way we've done things in the past as much as possible. This year it's the same six economic variables, but we include a forecasting component so it recognizes that you may be looking at how the economy changes between now and November. We also extended the analysis back to 1880 to capture a fuller range of economic conditions. So you get the Great Depression. You get various panics and stuff like that.

Let’s focus on the economic variables for a second. When you build that all out, how much more precision do you feel like you're adding into the model?

Not that much. The problem is we have had robust state polling basically since 1972, which is not that large a number of cases. If you look at how the economic index performed from 1972 onward, it actually is fairly predictive. But we punish it because we think that it would not have been as predictive if we really hadn’t known the results ahead of time. When I originally designed the system in 2011, the economic index data covered, I think, 1968 through 2008. That means that we’re building a model using the data from 1968–2008 and then seeing how well it performs on the elections outside of that window. And the answer is it tells you a little bit, but not nearly as much as it told you about 1968–2008. Because of that, we intentionally penalize the impact of the economic data prior in the model.
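The penalty Silver describes amounts to an out-of-sample test: fit the economy-to-margin relationship on one window of elections, see how much worse it predicts elections outside that window, and discount the economic prior accordingly. A minimal sketch with fabricated placeholder data:

```python
# The (economic index, incumbent-party margin) pairs below are fabricated
# placeholders purely so the code runs; the point is the gap between
# in-sample and out-of-sample error, which tells you how much to discount
# the economic prior.
fit_window = [   # elections inside the window the relationship was fit on
    (1.2, 4.0), (-0.5, -2.0), (0.8, 3.0), (2.0, 8.0), (-1.5, -6.0),
]
holdout = [      # elections outside that window
    (0.5, -1.0), (1.0, 0.5), (-0.8, -5.0),
]

def fit_slope(data):
    """Least-squares slope through the origin: margin ~ slope * index."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def rmse(data, slope):
    """Root-mean-square prediction error of that simple model."""
    return (sum((y - slope * x) ** 2 for x, y in data) / len(data)) ** 0.5

slope = fit_slope(fit_window)
print(f"in-sample RMSE:     {rmse(fit_window, slope):.2f}")
print(f"out-of-sample RMSE: {rmse(holdout, slope):.2f}")   # much larger
```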

It sounds like, to the extent that you're making judgment calls, you're using them to make the model less sure of itself.

I think when you build models like this, there are a lot of things you optimize for that probably result in some mild overfitting [when a model fits the existing data too well, it can be a sign that the model is too focused on explaining what happened in the past and not enough on predicting the future]. And so given that you kind of unconsciously make a lot of those decisions, it's usually good to put a little bit of a finger on the scale toward having more uncertainty. I was a little surprised when we first turned it on this year. I thought that Biden would be higher. But, you know, sometimes it's good when a model doesn't exactly match your intuition.

It's the old Bill James line, right? If you build something and it spits out 10 names, you want eight of them to be names that you expected. Because if it's 10, it's not adding anything to your knowledge base. And if it's three, well, it’s maybe wrong.

Yeah. If you look back to 1972, there are a lot of elections since then where things move by six or seven or eight points or more between early August and Election Day. The reason I say six or seven points is, although Biden is up by more like eight points in the national average, he's up by six or seven points in the tipping-point states. So what's the chance of a six- or seven-point polling error? I think what the model is saying is that if you combine the movement that could occur before Election Day and then error on Election Day itself, it’s still 28% or so [for Trump]. Now, to be clear, if Biden holds his current numbers, then it would shoot up to probably around 90%, low 90s by Election Day, which is still a long way from 99% or something.
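The arithmetic behind numbers like those can be approximated by stacking two independent sources of error, drift before Election Day and polling error on Election Day itself, on top of the current lead. The standard deviations below are illustrative guesses, and the real model uses fatter-tailed distributions plus an Electoral College adjustment, so this only gestures at the shape of the calculation:

```python
from statistics import NormalDist

# All standard deviations below are illustrative guesses, in points.
lead = 7.0        # Biden's lead in the tipping-point states
drift_sd = 8.0    # assumed sd of movement between August and Election Day
error_sd = 5.0    # assumed sd of polling error on Election Day itself

def prob_lead_survives(lead, *sds):
    """Chance a lead of this size holds up, treating each error source as
    independent and normal (so their variances add)."""
    total_sd = sum(s ** 2 for s in sds) ** 0.5
    return 1 - NormalDist(mu=lead, sigma=total_sd).cdf(0.0)

print(f"today, drift + Election Day error: {prob_lead_survives(lead, drift_sd, error_sd):.0%}")
print(f"Election Day, error only:          {prob_lead_survives(lead, error_sd):.0%}")
```

With these made-up inputs the lead survives roughly three quarters of the time today and low 90s on Election Day, which is the pattern Silver sketches: most of the remaining risk is time, not the polls themselves.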

But, obviously, there's some theories that say, hey, because of increased polarization, because there are fewer swing voters and there aren't many undecideds, the average has not been very volatile this year. There's more polling. There are all these perfectly reasonable theories that explain why variance, and what we call drift, between now and Election Day would be smaller than usual. And that’s a counterweight against COVID, and the economy has never posted numbers like this, and shit’s crazy, right? There's an unprecedented amount of news. And so we wind up basically assuming that variance will be as high as it is, on average, in the historical data set, instead of assuming that we've solved this problem. In 2016, you'd have wound up saying the variance is probably somewhere more in the middle, because there were so many undecided voters and third-party voters in 2016, and that year the polling average was volatile. This year there are not many undecideds and no real third party—sorry, Kanye. The polling average has been stable, but we just can't ignore the elephant in the room, which is COVID.

We're in a world now where COVID has happened and shit is unquestionably weird. You always have to account for the possibility of unusual circumstances, but does the fact that we’re already way off the normal path impact how you go about building the model this time around?

I think when you wind up with an edge case, then you want to kind of think through, am I doing things in ways that generalize well, that’d also handle this case well? Like, if you’re working on an NBA metric and it has Giannis way lower than you would think, you might examine something and say, okay, well, why is he good?

Right. Is he good in a way that breaks the model, and that's what makes him good, or is he good in a way that the model should account for and thus we should adjust the model?

Right. And ideally, like, oh, you adjusted it. Now it also handles Kevin Durant better, another player that for some reason we're underrating. And so you're looking and saying, okay, can I come up with generalizations? Certainly it seems that if the economy is kind of going haywire, where you have millions unemployed and millions become reemployed and GDP falls by 36% annualized or whatever, then economic uncertainty is high. And you can measure that. But all we're really saying is, if you go back and look at the history of elections for which we have rigorous polling, an eight-point lead in the popular vote in August translates into Biden winning the popular vote 80-something percent of the time, 82% or something. However, because Trump has an Electoral College advantage, it’s 72%, which in some ways is still fairly high.

This interview has been edited and condensed.
