Are coronavirus statistics reliable?

What’s happening

As the coronavirus has spread throughout the world, checking in on statistics has become something of a daily ritual for many people. Tracking the number of cases and deaths in a specific country or region offers a sense of how quickly the virus is proliferating, or in some cases being contained.

The data is incredibly important. It lets lawmakers and public health officials know if containment measures are working and how many resources need to be sent to hospitals. The figures also inform models predicting the severity and duration of the epidemic.

Despite the intense interest in this data, statistical models can sometimes create more confusion than clarity. Predictions of the number of deaths in the U.S. range from 100,000 to as many as 2.2 million. Expert estimates of the virus’s mortality rate (the percentage of people who contract the virus and die from it) have ranged from as high as 3.4 percent to less than 1 percent. The gap between those figures represents millions more possible deaths.

The wide variance in these figures raises questions about how accurate coronavirus data really is and whether the numbers may be so flawed that they’re not worth paying attention to, let alone using as a basis for public health policy.

Why there’s debate

A handful of voices in right-wing media have speculated, without evidence, that coronavirus figures in the U.S. are being inflated to make President Trump look bad. Most experts, however, say it’s more likely that the number of people who have contracted the virus is dramatically higher than official counts of confirmed cases.

In many places, only the most severe cases ever get tested, which means a potentially large number of people who have manageable symptoms or no symptoms at all never show up in the data. The silver lining of this shortcoming is that the fatality rate among all infected people could be much lower than official statistics suggest. At the same time, experts believe some deaths from COVID-19 may be missing from the counts.

Reporting issues could also be skewing the data. Each state in the U.S. has its own methods of releasing its numbers, with varying information included. Global statistics also rely on figures that are prone to miscounts, either because of errors or because leaders are suppressing the numbers in their countries.

Health experts stress that even flawed statistics can be valuable in combating the virus. When several data sets are considered together rather than individually, they can help offset the flaws in any one set of numbers. Others argue that it doesn’t matter much whether the numbers are imperfect as long as they convince people to follow health guidelines like social distancing and hand washing, as they have in many places.

What’s next

As the volume of data grows day after day, so does our understanding of the virus, experts say. But even with improved statistics and the benefit of time, some caution, we may never have a definitive count of the virus’s true toll.


Numbers of known cases are meaningless without widespread testing

“The data collected so far on how many people are infected and how the epidemic is evolving are utterly unreliable. ... We don’t know if we are failing to capture infections by a factor of three or 300.” — John P.A. Ioannidis, Stat

Lack of statistical clarity is hampering the response to the virus

“What we’re experiencing now is the fog of pandemic. The officials tracking COVID-19 are swimming in statistics: infection rates, case-fatality ratios, economic data. But in these early stages of the fight against the coronavirus, these figures each have their own particular limitations.” — Derek Thompson, Atlantic

The fatality rate may be much lower than testing numbers suggest

“If the number of actual infections is much larger than the number of cases — orders of magnitude larger — then the true fatality rate is much lower as well. That’s not only plausible but likely based on what we know so far.” — Eran Bendavid and Jay Bhattacharya, Wall Street Journal

We have enough information to make the broad public health decisions we need to make

“We need to make policy decisions and clinical decisions now. You can’t say, ‘Let’s wait a month until we have the data.’” — Infectious disease specialist Jeremy Farrar to Nature

Increased testing may make it look like the virus is spreading more rapidly than it really is

“Another thing happening is that massively expanded testing is revealing more cases, making it appear that cases are shooting up dramatically when in fact the number of tests may be what is spiking.” — Editorial, Providence Journal

It’s possible that more people are dying than we know about

“Lot of discussion about how we're missing positive cases and therefore the fatality rate is lower than we think, but much less discussion if we're missing Covid fatalities.” — MSNBC host Chris Hayes

Data from other countries may be manipulated for political reasons

“The numbers that the World Health Organisation publishes, the numbers that journalists and governments around the world refer to, are contaminated with politics. They are not useless, they tell us something, but they paint a skewed picture.” — Michael Meyer-Resende, EU Observer

Lack of uniform reporting guidelines makes state-to-state data a mess

“The lack of federally mandated rules on reporting positive cases and deaths reflects an ongoing problem with major disaster responses in the US — the CDC’s guidelines and recommendations don’t add up to a cohesive national system, because states can choose if and when they use those recommendations.” — Nidhi Prakash and Ellie Hall, BuzzFeed News

The fatality rate varies too much to make any broad inferences

“There is no single ‘fatality rate’ — there are many. The fatality rate for the United States is going to differ from the fatality rate in a country where, say, diabetes is less prevalent. The same could be said for the rates within the U.S. — if the virus spreads in a metro area with many elderly residents, the fatality rate calculated there will be higher than if the epicenter was in a city that skewed younger.” — Maggie Koerth, Laura Bronner and Jasmine Mithani, FiveThirtyEight

Even flawed data sets have value if their methods are consistent

“If different nations have different standards and conditions, they at least generate a consistent curve if those standards and conditions are stable across time.” — NYU professor Lisa Gitelman to CNN

The data is getting better over time

“As we learn more about the fatality rates for different populations of people, though, it will hopefully help governments make the best decisions about public health policies — as well as the best allocation of resources to protect and treat the most vulnerable.” — Katherine Harmon Courage, Vox

Health officials know not to put too much stock into imperfect data

“It’s a lesson that endures throughout medicine: Look at the big picture, not a single piece of data. Triangulate on the truth, using all the sources of information you have, no matter how good a single test. And don’t be shy about questioning a conclusion that doesn’t fully fit the facts.” — Harlan M. Krumholz, New York Times
