New research based on machine learning methods designed to avoid bias has statistically confirmed widely reported data suggesting that racially segregated counties in the United States have disproportionately high rates of COVID-19 infection and death, reinforcing that such areas may be in greater need of vaccines and other resources.
Gerard Torrats-Espinosa, a data scientist at Columbia University, verified anecdotal findings and less comprehensive data reports by building and analyzing statistical models covering 2,174 counties that together hold nearly 96% of the U.S. population. His findings, set to be published Feb. 16 in Proceedings of the National Academy of Sciences, were released early by the journal because of their urgency.
The results suggest that, per 100,000 residents, a county that is more racially segregated than average has an additional 105 COVID-19 infections and four additional COVID-19 deaths. They also indicate that in counties with greater segregation between Black and white residents, the Black mortality rate is 8% higher than in less segregated counties.
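Because these figures are rates, their practical size depends on a county's population. The short Python sketch below scales them for a hypothetical county of 250,000 residents and applies the 8% figure as a relative increase to a made-up baseline Black mortality rate of 150 per 100,000. The population, the baseline rate and the helper names are assumptions for illustration, not numbers or code from the study.

```python
# Hypothetical illustration, not from the study: scale the reported
# per-100,000 differences to a county with a made-up population.
EXTRA_INFECTIONS_PER_100K = 105  # reported difference for more-segregated counties
EXTRA_DEATHS_PER_100K = 4        # reported difference for more-segregated counties


def excess_counts(population: int) -> tuple[float, float]:
    """Return (extra infections, extra deaths) implied for a county of this size."""
    scale = population / 100_000
    return EXTRA_INFECTIONS_PER_100K * scale, EXTRA_DEATHS_PER_100K * scale


def relative_increase(base_rate: float, pct: float = 8.0) -> float:
    """Apply a relative increase of pct percent (not percentage points) to a rate."""
    return base_rate * (1 + pct / 100)


infections, deaths = excess_counts(250_000)
print(f"extra infections: {infections}, extra deaths: {deaths}")  # 262.5 and 10.0
# A hypothetical baseline Black mortality rate of 150 per 100,000 becomes about 162.
print(f"{relative_increase(150.0):.1f} per 100,000")
```

Since the study's estimates are average associations across counties, numbers scaled this way are rough orders of magnitude, not predictions for any particular county.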
“I think it tells us where we should be doing more outreach, where we should be doing more testing, and that we should be more proactive in vaccinating and in making sure communities have the resources they need, to stay at home when they have to,” Torrats-Espinosa told The Academic Times.
He hopes that rigorous statistical evidence that communities largely inhabited by racial minorities are being disproportionately affected will help policymakers decide where to direct vaccines and other resources to curb the spread of the virus.
"In the early stages of the pandemic, you could hear in the news or in conversations with scholars that it seems like the pandemic is impacting the most disadvantaged communities, specifically in places where African Americans and Hispanics concentrate," Torrats-Espinosa said. "But, there was no formal test or serious approach to try to put some numbers to the intuition."
To address this gap in the literature, Torrats-Espinosa began by compiling a list of 50 county characteristics that could affect COVID-19 infection and mortality. He included every relevant variable he could think of, aiming to cover all the ground and head off the kind of pushback some earlier reports had received. The factors ranged from employment in essential jobs and health risk factors to air pollution and political views.
“I said, 'I am going to use a statistical method that is transparent and honest, that will tie my hands behind my back and not let me pick any of the controls myself. I will let the algorithm tell me which ones are the controls I should be fitting into a regression,'" Torrats-Espinosa said. "That's why I chose the machine learning approach."
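He used double lasso regression, a machine learning method that runs two lasso regressions over the 50 candidate factors, one to find those that predict COVID-19 outcomes and one to find those that predict segregation, and then includes every factor selected by either step as a control in a final regression.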
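To make the two-step idea concrete, here is a minimal pure-Python sketch of post-double-selection lasso (the procedure usually meant by "double lasso," introduced by Belloni, Chernozhukov and Hansen) run on simulated data. It is not the study's code: the six candidate controls, the fixed penalty lam=0.05, the simulated coefficients and the helper names (lasso_select, double_lasso) are all assumptions for illustration, and applied work typically chooses the penalty from the data rather than fixing it.

```python
import random


def soft_threshold(rho: float, lam: float) -> float:
    """Lasso shrinkage: move rho toward zero by lam, stopping at zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def lasso_select(x_cols, y, lam, n_iter=50):
    """Fit a lasso by coordinate descent; return indices of nonzero coefficients."""
    n, p = len(y), len(x_cols)
    z = [standardize(c) for c in x_cols]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residuals with every coefficient at zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            zj, bj = z[j], beta[j]
            # Correlation of column j with the residual that excludes j's own term.
            rho = sum(zj[i] * (resid[i] + zj[i] * bj) for i in range(n)) / n
            new = soft_threshold(rho, lam)  # valid because each column has mean square 1
            if new != bj:
                for i in range(n):
                    resid[i] -= zj[i] * (new - bj)
                beta[j] = new
    return {j for j in range(p) if beta[j] != 0.0}


def ols(cols, y):
    """Least squares via the normal equations and Gaussian elimination."""
    k, n = len(cols), len(y)
    a = [[sum(cols[r][i] * cols[c][i] for i in range(n)) for c in range(k)] for r in range(k)]
    b = [sum(cols[r][i] * y[i] for i in range(n)) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, k))) / a[r][r]
    return coef


def double_lasso(y, d, x_cols, lam=0.05):
    """Keep controls that predict y OR d, then regress y on d plus those controls."""
    keep = sorted(lasso_select(x_cols, y, lam) | lasso_select(x_cols, d, lam))
    ones = [1.0] * len(y)
    coef = ols([ones, d] + [x_cols[j] for j in keep], y)
    return coef[1], keep  # coefficient on the treatment d


# Simulated, hypothetical data: 6 candidate controls; the true effect of d on y is 2.0.
rng = random.Random(0)
n, p = 1000, 6
x = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
d = [0.8 * x[0][i] + 0.8 * x[1][i] + rng.gauss(0, 1) for i in range(n)]
y = [2.0 * d[i] + x[0][i] + x[2][i] + rng.gauss(0, 1) for i in range(n)]

effect, keep = double_lasso(y, d, x)
naive = ols([[1.0] * n, d], y)[1]  # ignores all controls
print(f"selected controls: {keep}")
print(f"double lasso estimate: {effect:.2f}, naive estimate: {naive:.2f}")
```

Here x[0] drives both the treatment and the outcome, so leaving it out pushes the naive estimate above 2.0, while the double-selection estimate lands close to the true value. The second lasso exists because a confounder can be strongly tied to the treatment yet only weakly to the outcome, and selecting controls on the outcome alone could miss it.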
Torrats-Espinosa tested whether the results could be biased by running sensitivity analyses, which assess how strong an unmeasured factor would need to be to confound the relationship between racial segregation and COVID-19 outcomes. The analyses indicated that such a factor is very unlikely to exist.
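The article does not say which sensitivity method the study used. One widely used approach is the robustness value of Cinelli and Hazlett (2020), which asks how much of the remaining variation in both the treatment and the outcome a hidden confounder would have to explain to erase an estimate. The Python sketch below computes it from a regression's t-statistic and residual degrees of freedom; the inputs (t = 4, 2,000 degrees of freedom) are invented for illustration and are not the study's numbers.

```python
import math


def robustness_value(t_stat: float, dof: int, q: float = 1.0) -> float:
    """Robustness value (Cinelli & Hazlett, 2020).

    Share of residual variance an unobserved confounder would need to explain
    in BOTH the treatment and the outcome to shrink the estimate by the
    fraction q (q = 1 means all the way to zero).
    """
    f = q * abs(t_stat) / math.sqrt(dof)  # partial Cohen's f of the treatment, scaled by q
    return 0.5 * (math.sqrt(f ** 4 + 4 * f ** 2) - f ** 2)


# Hypothetical inputs: a t-statistic of 4 with 2,000 residual degrees of freedom.
rv = robustness_value(4.0, 2000)
print(f"robustness value: {rv:.3f}")
```

With these made-up inputs the value is about 0.086, meaning a confounder would have to explain roughly 8.6% of the residual variance of both segregation and the COVID-19 outcome to wipe out the estimate; larger t-statistics yield larger robustness values.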
The final results showed a strong association between racial segregation and COVID-19 infection and mortality, even after accounting for the selected controls, with minority residents of those communities especially affected.
Torrats-Espinosa believes the results can be explained by a combination of three factors: the prevalence of pre-existing health conditions among minorities, the higher share of minorities employed as essential workers and the population density of minority-heavy communities.
And the impact of these disparities is not limited to those communities.
In short, members of minority groups are more likely to contract the virus, and because they often interact mostly with members of their own racial group, they are also likely to pass it on to others in that already-vulnerable group. Then, once they have unavoidable contact with people outside their racial group, the virus continues to spread to people of other racial backgrounds.
“It creates this almost perfect storm, where you have a group of people who are more vulnerable to COVID-19 because of pre-existing health conditions, and at the same time you have them congregating in very small, densely populated parts of a city," Torrats-Espinosa explained. "The virus almost keeps reproducing itself."
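The mechanism described above can be illustrated with a toy two-group SIR epidemic model, which is not part of the study. In the minimal Python sketch below, group 0 stands in for a densely populated, more exposed community (higher transmission rate beta), 90% of contacts happen within one's own group, and the infection starts only in group 0. Every parameter value and the function name are invented for illustration.

```python
def two_group_sir(beta, gamma, within, seed, days):
    """Discrete-time SIR model for two equal-sized groups with assortative mixing."""
    s = [1.0 - seed, 1.0]  # susceptible fractions; infection seeded only in group 0
    i = [seed, 0.0]        # infectious fractions
    for _ in range(days):
        new = []
        for g in (0, 1):
            other = 1 - g
            # A share `within` of contacts is with one's own group, the rest with the other group.
            exposure = within * i[g] + (1 - within) * i[other]
            new.append(beta[g] * s[g] * exposure)
        for g in (0, 1):
            recovered = gamma * i[g]
            s[g] -= new[g]
            i[g] += new[g] - recovered
    return [1.0 - s[0], 1.0 - s[1]]  # cumulative share ever infected in each group


# Hypothetical parameters: group 0 is denser and more exposed (higher beta).
attack = two_group_sir(beta=(0.5, 0.25), gamma=0.2, within=0.9, seed=0.001, days=365)
print([round(a, 2) for a in attack])
```

With these made-up parameters, group 0 ends up with a much larger share infected, yet group 1 is not spared: the minority of cross-group contacts carries the virus over, after which it keeps circulating there too.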
Another new study published in PNAS by Indiana University researchers suggests that the pandemic is widening socioeconomic inequality, hurting disadvantaged groups even more.
For minority populations, this cycle likely fuels further spread of the virus by recreating the conditions that raised their infection rates in the first place. For instance, rising unemployment may push more people into essential-worker jobs, which carry greater exposure.
The Indiana University study noted that its results were consistent with “the Matthew effect,” which can be summed up by the adage “the rich get richer and the poor get poorer.”
The study "Using machine learning to estimate the effect of racial segregation on COVID-19 mortality in the United States," published in the Feb. 16 edition of Proceedings of the National Academy of Sciences of the United States of America, was authored by Gerard Torrats-Espinosa, Columbia University.