COVID-19 Vaccinations in the United States: A Study

Project Description

Introduction

It’s no secret that COVID-19 has dramatically changed our lives over the course of the past year. It’s only recently, as vaccines have started to become available, that society has begun to return to normal.

When we sat down to discuss what to study, we were all drawn to the dramatic impact of vaccinations. And although different aspects of the issue, ranging from differential state effects, to political correlations, to adaptations over time, caught our attention, it was clear that we wanted to study vaccination data. We thought we’d turn to the United States, as our previous project had a more international focus.

Once we’d settled on a topic, we began to look for data sources. In the end, we ended up with six datasets. The first two focused on state level vaccination data, in California and Tennessee respectively. In a similar vein, there was a dataset that contained information about Tennessee on a county-wide level. The next two were more general, containing basic information about doses administered over time: one was from the CDC, one was from our world in data. Finally, we used a dataset from the MIT Elections Lab to learn about past political results.

From there, we moved to analysis. That work was broken into three main sections. They are all easily accessible via tabs on the left-hand side of your screen. Enjoy!

Comparison: CA and TN

An issue central to policy response to the COVID-19 pandemic is how to efficiently, equitably, and safely administered vaccinations among subpopulations of a given country’s citizens. Especially given the race-related events, movements, and justice and injustices in the past year, equitable vaccine distribution among racial groups is a very important, highly studied topic. Below, we will analyze vaccine administration by race in California and Tennessee, two historically partisan states, to see how well racially just policies and public health hopes are faring as they are put into practice. To calculate the proportion of racial groups, I scaled vaccine counts by racial subpopulations in California and Tennessee.

Unfortunately, the datasets available on California and Tennessee vaccinations had different racial categories and fineness thereof. Regardless, there are important conclusions to draw from the data. It is clear that in both California and Tennessee, Black Americans received COVID-19 vaccinations at the low rates. This trend began from the beginning of vaccine administration in both cases, demonstrating a clear racial inequity for this group.

Similarly, most other groups of people of color in California were vaccinated at rates below Caucasian Californians (except for Asian Californians and Native Hawaiian/Pacific Islander Californians). This suggests that Black and Brown communities encounter barriers to receiving vaccinations. Asian Californians appear to receive vaccines at similar rates to Caucasian Californians, which suggests a possibly less stark set of systemic barriers keeping Asian Californians from accessing the COVID-19 vaccine. Lastly, Native Hawaiian or Pacific Islander Californians appear to have consistently received the COVID-19 vaccine from the start. One theory for this occurrence is that Native Hawaiians whose tribal status is federally or regionally recognized may be able to receive the COVID-19 vaccine more easily via Indian Health Service vaccination clinics. However, this theory does not align with the low rates that non-Hawaiian Natives appear to be receiving vaccines.

In Tennessee, we also see a similar vaccination rate between Asian Tennesseans and Caucasian Tennesseans. This also suggests that Asian Tennesseans encounter less stark systemic race-related barriers to COVID-19 vaccination. However, Black Tennesseans appear to receive COVID-19 vaccinations at lower rates than their non-Black counterparts, which suggests that vaccine distribution between Black and non-Black communities in Tennessee is not equitable.

Lastly, it is important to note that in general, the overall vaccination rates in Tennessee are higher than in California. I theorize that this is because the Tennessee dataset did not specify whether vaccine counts meant fully vaccinated persons or vaccine doses administered, whereas the California dataset supplied counts of fully vaccinated persons. Since the overall purpose of this visualization was to compare rates between racial groups, and I believe that it is a fair assumption to say that racial groups and vaccine type (i.e. one-dose series or two-dose series) are not confounded, I hope that you all will find this visualization adequate.

The above graph shows the change in total vaccine doses in California between February 15 and April 15, 2021. I chose to analyze total doses instead of proportion of populations to demonstrate the incredibly high need for vaccine doses in highly populated areas, such as Los Angeles County (colored dark purple in the April 15 map). It appears that total doses administered falls highest in more populous counties, specifically those that encircle the Bay Area, Los Angeles, San Diego, Sacramento, and the Central Valley. If you look closely, you can see that some less populous counties saw a change in vaccine dosage between February 15 and April 15, but that that change is much smaller than the changes in populous areas. This can be explained by three main factors, the first (and most obvious) that vaccine counts should be lower where there are less people, the second that rural areas tend to be more conservative and conservative populations tend to have more anti-vaccine individuals in them, and the third (and most important) that public health infrastructure in rural areas tends to be less thorough, presenting many access issues for agricultural communities, Native communities, elderly people, and other populations that live in rural areas.

Lastly, the above graph shows the change in total vaccine doses in Tennessee between February 15 and April 15, 2021. It appears that total doses administered falls highest in more populous counties, specifically those that encircle Nashville, Memphis, Knoxville, and Chattanooga. Similar as to in California, it appears that less populous counties saw a small change in vaccine dosage between February 15 and April 15. This suggests that rural counties in Tennessee face challenges in administering vaccine doses to their populations that are similar to those in rural Californian counties.

Politics and Vaccinations

Data and Methods

Much has been made of the possible correlation between political inclination and vaccine hesitancy, a phenomenon where people refuse to get their COVID-19 vaccination shot, even though the inoculation has been proven sound from a medical perspective. Articles such as this one from Fortune Magazine and this one from the New York Times allege that Trump voters and, to some extent, Republicans, are more likely to refuse vaccination.

To visualize this idea, I drew from two data sources. First, I relied on the CDC’s vaccination records. I was comfortable with the reliability of that source. I also drew from the MIT Election Lab, which provided me a comprehensive list of state’s voting records, including during the 2020 presidential election, which I was most interested in.

From there, it was a matter of data-wrangling: I merged the two datasets, selected the variables I was going to use, and created the required new variables. The first of these was a ratio of vaccines delivered vs. administered, per state. The second was an indicator variable, which allowed me to see how a state voted in 2020 (Trump vs. Biden).

After that, it was a matter of visualization. I proceeded in four steps. I wanted to take a highly scientific approach, convincing myself of the validity and importance of understanding politics role in vaccination, as it’s such a hot button issue.

  1. Look at the big picture. For this, I made a map of state-level vaccination administration ratios, then compared it to a map that showed the percentage of people vaccinated.

  2. The above convinced me of the importance of understanding what leads to low administration ratios, as much of the time, that appeared to be a barrier to achieving high levels of per-capita vaccination. To confirm this inkling, I turned to a scatterplot. Once satisfied there, I lastly checked a linear model. This led me to decide that it was valid to see if my factor of interest, politics, could play a role.

  3. I created a map of the United States, colored by politics, then shaded by ratio. I thought I saw a clear trend towards Republican states being worse at administering the vaccines they were given, but I wanted to investigate further.

  4. Looking at a boxplot, I confirmed my visual assumption: Republican states had a much lower median ratio.

These steps, and conclusions, are described in more detail on the next tabs.

Initial Maps, Scatterplot

Below are the first two visualizations I created, meant to justify my investigation of political impacts on vaccination ratios.

The above maps were meant to provide a bird’s eye view of the issue. I was interested in state’s success with vaccination, which meant I wanted to look at underlying factors that could explain differential population inoculation. Specifically, I wanted to know whether the proportion of vaccines administered out of the total received by a state appeared to be correlated with overall statewide progress. Investigating this question was crucial, as it wouldn’t make sense to look at dose administration if it turned out it was almost identical across states and, further, was not correlated with percentage vaccinated. If that had been true, it would’ve been better to look at medical infrastructure, or see if some states were getting less relative doses.

To disprove these hypothesized possibilities, I created the above maps. They allowed me to visualize, in a very easy way, whether it was worth considering vaccination ratio as a possible cause of some state’s advancements. The maps confirmed, in my mind, that it was. As you can see, states with very dark shading on the left-hand map tended to also have dark shading on the right-hand map. In simpler terms, this meant that when a state had much of its adult population fully vaccinated, it was also efficient about using the doses that it was given.

As I’d hoped, this suggested that at least some of the differential progress states are making in fighting COVID is due to their own medical progress, not due to the government misallocating shots.

The other benefit of creating these maps was that it allowed me to look at regional patterns. I could see that the South, in addition to some Great Plains states, had very low levels on both maps. Those are historically Republican regions. That inspired me to go ahead with my political research.

## 
## Call:
## lm(formula = percent_full_18 ~ ratio_overall, data = ratios.2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9933 -1.6603 -0.2191  2.0164 10.2985 
## 
## Coefficients:
##               Estimate Std. Error t value     Pr(>|t|)    
## (Intercept)      1.240      5.709   0.217        0.829    
## ratio_overall   46.007      7.170   6.416 0.0000000532 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.509 on 49 degrees of freedom
## Multiple R-squared:  0.4566, Adjusted R-squared:  0.4455 
## F-statistic: 41.17 on 1 and 49 DF,  p-value: 0.00000005322

In this section, I also created a scatterplot. The aim here was simple. In some circumstances, the eye can trick us. I wanted to confirm that the correlation I thought I saw actually existed, rather than being a result of my own bias.

As we can see though, there is a reasonably strong, positive linear correlation between the perfect of the 18+ population that is fully vaccinated and the ratio administered. It’s true that we see some variability and some clustering, but I would say it’s far from a random scatter.

Further investigation confirmed that a linear model predicting vaccination progress from the administration ratio explained 46% of the variability in vaccinations on the statewide level. That’s almost half, using only one predictor! Additionally, the ratio is a statistically significant predictor of the percentage vaccinated, further solidifying their relationship.

Given this research, there’s a clear motive to understanding what’s causing these failures in administration, as we can then better combat them and move towards vaccination. To do that, we turn to politics.

Political Map, Boxplot

This section is intended to investigate the link between a state’s voting record in the 2020 presidential election and the state government’s ability to administer as many COVID-19 vaccine doses as they receive. In all cases, when I describe a state as Democratic, I mean that it “went blue” in the 2020 election. In simpler terms, I mean that the majority of the population voted for Joe Biden. Conversely, when I say that a state is Republican, I mean that the majority of the population voted for Donald Trump, the incumbent.

I chose this election to focus on as it’s the most recent. Additionally, Trump has changed the dynamics of the Republican party, so whether a state voted to reelect him is, for these purposes, the most relevant information about a state’s modern political inclinations.

The above map shows many interesting results. First, we can see what we already knew from the previous maps, with regard to ratios across the United States. Areas in the the South and Great Plains had some of the lowest ratios. However, stratifying by political result allows us to catch a much wider pattern. There is more dark blue than dark red. Conversely, there is more light red than light blue.

These results imply that Republican areas are likely to have a lower amount of vaccinations administered out of the ones received. That means they’re wasting more doses and, based on the above results, they probably have less percentage of their population above 18 fully vaccinated.

What could explain this trend? Well, there has been a lot of misinformation spread by conspiracies such as Q-Anon. These conspiracy theories have an easier time slipping into Republican discourse, as Q-Anon is pro-Trump and more pro-Republican. Given that these conspiracies tell people not to get the vaccine, for fear of being tracked by the government, it would make sense that Republican states are suffering from worse vaccine hesitancy, preventing them from being able to give out the doses they’re receiving. If people simply don’t want the shots, the hospitals can’t use them.

That’s not to say that there’s no misinformation on the Democratic side, but it might be impacting the sides differently, with stronger influence in Republican states.

To visualize the above information in another way, we can look at the above boxplot. Visual color patterns are a good indicator of what’s happening, but this simplified diagram allows us to compare empirical numbers.

The pattern we noticed on the map is more pronounced here. The median ratio of vaccination administration for states that voted for Joe Biden in 2020 is around 0.825. The median level for states that voted for Donald Trump is around 0.755. That’s almost a 10% difference (0.07, to be exact). To interpret that in context: looking at the median, Republican states are wasting 7% more doses than Democratic states.

Additionally, the range for Democrat states, with the exception of one outlier (Georgia), is higher and more condensed, with all states above 0.7 and some as high as about 0.925. Meanwhile, Republican states see more range, with some as low as almost 0.6. We can also see that the entire interquartile range for Democrats is higher than the Republicans IQR. That’s a significant amount of difference between groups.

To summarize, then: we have evidence that one of the major factors impacting the speed at which states can vaccinate their populations is their ability to use all of the vaccine doses they’re given. One underlying factor of that ratio is whether people are actually willing to be inoculated. You can’t use all your shots if much of your population doesn’t want them.

The final bit of analysis showed that one reason people don’t want doses is possibly because of their political inclination. We can see on a statewide level, that likely had an effect. There’s certainly a difference in medians when stratified by voting history. If Democratic states consistently have higher ratios, the two variables are probably correlated.

The one outlier is Georgia, which is notably historically Republican, and flipped this election.

Tracking Statewide Vaccinations

When vaccines were first starting to be administered, each state had their own guidelines for eligibility, meaning who was allowed to get vaccinated. Most states started with first-line-responses, such as doctors, nurses, and other medical workers. Then, they expanded to the state’s most vulnerable populations, and continued to extend the guidelines, using age, occupation, and preexisting health conditions as a marker for who would be allowed to get vaccinated. Each state followed a similar process, and each state was allocated a certain number of vaccine doses each week, the amount proportionate to their population.

However, we see that some states opened up their guidelines much faster than other states. For instance, on one side of the spectrum, New Jersey was one of the last states to open up vaccine eligibility to everyone 16 and older, on April 19th. On the other side, Alaska, Alabama, and Arkansas opened up on March 9th, March 24th, and March 30th, respectively (US News).

This led me to my main questions of:

  1. How did each state differ in the number of doses they were administering each day?

  2. Are some states vaccinating their populations at a faster rate than other states?

To answer these questions, I decided to make a time lapse of the United States measuring the cumulative number of doses that were given overtime in each state. I started January 13th, and went up to May 15th of 2021, cumulatively adding the number of doses given each day in each state.

For instance, if on January 13th, there were 13361 doses given in Massachusetts, and 14697 doses given on January 14th in Massachusetts, the number that I measured for January 13th would be 13361 and the number that I measured for January 14th would be 13361 + 14697 = 28058.

Below is the main code that I used to make the time lapse visualization. First, I merged the vaccine dataset that I created (kv_states) with the usa_states dataset which contains the information to make a map of the United States. Then, I mapped that new dataset, called usa_vaccine_states, and used gganimate to bring it from a static map to a timelapse.

usa_states <- map_data(map = "state", region = ".") 

# merging the US map dataset with the vaccine dataset
# using inner_join to remove alaska & hawaii
usa_vaccine_states <- kv_states %>%
  inner_join(usa_states, by = c("state" = "region"))


#animate the map to become a time lapse/gif  
anim_map_us <- usa_vaccine_states %>%
ggplot() +
 geom_polygon(aes(x = long, y = lat, group = group, fill = cumulative_vaccinations)
            ,color = "white") +
 theme_void() +
 coord_fixed(ratio = 1.3) +
 labs(fill = "Number of People Vaccinated"
      , title = "USA Daily Vaccinations by State"
      , subtitle = "{closest_state}"
  theme(legend.position="right") +
  scale_fill_distiller(palette = "YlGnBu", direction = "horizantle") +
    
  # using gganimate's transition_states to inform how the map should move
  transition_states(state = Day, transition_length=2
                  , state_length = 5)
anim_map_us

#adjusting the number of frames to cycle through in order to cycle through all of the dates
animate(anim_map_us, nframes = 2*length(unique(usa_vaccine_states$Day))
        , renderer = gifski_renderer("kriti/usavaccinesfinal1.gif"))

Here is the result:

Cumulative Number of Vaccine Doses Delivered by State

As we see on the time lapse, it seems that California, Texas, Florida, and New York consistently delivered the most vaccine doses from the beginning of the vaccine rollout. For instance, on January 19th, these states had begun to turn green, while the other states were still yellow.

Cumulative Number of Vaccine Doses Delivered by State

And, the trend continues throughout the time lapse as well. As we start to see how the other states’ cumulative number of vaccine doses given changes from January to May, we can see that in general, the right half of the country seems to be on the upper side of the scale, while the left half of the country is on the lower end of the scale.

However, it’s important to note that while California, Texas, Florida, and New York have the most vaccine doses given, they also have the biggest populations. Because they have the biggest populations, it makes sense that they would get a larger number of doses allocated to them, which means that they would give a larger number of doses each day, meaning their cumulative number of vaccine doses given will be very high.

So, in order to be as accurate as possible in my conclusions (and answer my second question), I decided to make another map that tracked the cumulative number of vaccine doses given as a percent of state population. That way, I could account for the population when measuring how fast each state was vaccinating their populations. That way, a lower populated state would not be overshadowed by larger states if the proportion of cumulative vaccine doses given was the same as a larger state, even if the raw number was much lower.

In order to do this, I created a new column that took the cumulative number of vaccines for each day, and divided it by the state population for each day in each state.

From there, I used the same process as I did for the first timelapse.

Here is the result:

Cumulative Number of Vaccine Doses Delivered by State as a Percent of Population

At first glance, I was surprised to see how much the map standardized. In the previous map, measuring the number of vaccinations, the map had states from all different places on the scale. However, in the second map, all of the states were a shade of blue or dark blue, which meant that their cumulative numbers of vaccines doses given as a percent of population were all close to each other, and close to the higher end of the scale.

However, taking a closer look at the map, I realized that the states are not actually very similar in terms of cumulative vaccine doses as a percent of populations. If we look at a static map of May 15, here is what we see:

Cumulative Number of Vaccine Doses Delivered by State as a Percent of Population on May 15

While all of the states are a shade of blue, the shades of blue are much different, ranging from approximately 50% to 100%, which is about half of the scale. For instance, the top three states in terms of Cumulative Number of Vaccine Doses as a Percent of Population are Vermont, Massachusetts, and Connecticut–their percents are 101.41%, 99.62%, and 97.70%. The bottom two states are Alabama, and Mississippi at 55.82% and 55.41%, much lower than the top three states. So, while there is less of a discrepancy than the raw numbers, there is still a large difference between the rates that each state is vaccinating their populations at.

Lastly, I created a shiny app that represents the same information in line graph form, so that someone can more easily compare vaccine doses given within different states. The user is able to click through and decide which variable they want to measure. Below is a description of the variables:

  • Cumulative Vaccinations: the cumulative number of vaccine doses given each day in a state

  • Cumulative Vaccines Delivered as a Percent of Population: the cumulative number of vaccine doses given each day in a state divided by the state population, multiplied by 100

  • Vaccinations Administered per Day: the daily number of vaccinations delivered each day in a state

  • Vaccinations Administered per Day as a Percent of Population: the daily number of vaccinations delivered each day in a state divided by the state population, multiplied by 100

What I found really interesting, was that when we look at Vaccinations Administered per Day as a Percent of Population, the states as a group were steadily increasing until about mid-April, and then started a decline after that.

Shiny App: Vaccinations Administered per Day as a Percent of Population

This matches the NY Times finding that vaccine administration started decreasing steadily, after the pause of use of the Johnson and Johnson vaccine. I thought it was very interesting that all of the states moved together as a unit, rather than branching off at any point. This was interesting, because while there was a lot of range in number of doses given, the states all moved together to a decline at this mid-April point.

Limitations

As a limitation, I would like to note that second time lapse map is measuring the cumulative number of vaccine doses administered as a percent of population, and does not necessarily accurately reflect the number of people fully/partially vaccinated. The data I used only had the number of vaccine doses given per day, and since Pfizer-BioNTech and Moderna require 2 doses to be fully vaccinated, while Johnson and Johnson only requires one shot, I did not have a way to extract information needed to measure the number of people fully/partially vaccinated in each state. I do think it would be extremely informative to have a map that did have the number of people fully vaccinated as a measurement. Additionally, due to limitations of the US map dataset, I was unable to include Alaska and Hawaii in my temporal analysis.

Next, with regard to the political analysis, it’s important to clarify that a major limitation is our inability to determine causality. Without preforming an experiment, we cannot make any causal assumptions: rather, we can determine associations. This really impacts the conclusions we’re able to meaningfully draw. In further research, we could try and conduct at least a randomized study of political affiliation and vaccine opinion, to try and get closer to establishing a stronger type of connection.

Conclusion

We discovered several interesting patterns with regard to vaccination in the United States.

Looking at vaccine rollout by race in California and Tennessee revealed inequitable vaccine dosage, suggesting that communities of color encounter starker barriers to vaccine access than white communities do. Also, vaccine rollout visualizations between February 15th and April 15th suggest large dose increases in urban areas and bring the issue of sparse rural healthcare to light.

Also, we found that while there has been a lot of discussion within the United States about different states moving faster than others when it comes to vaccinations, most states are giving vaccination doses at a more similar rate when taking into account their populations, even if raw numbers differ widely. However, the rate still ranges widely, from ~55% to ~101% of population, which means that some states are vaccinating their populations at a much higher rate than other states. Additionally interestingly enough, all of the states moved to decline at mid-April together in terms of daily vaccinations.

That said, there are still some differences between states. We found that a major factor underlying those differences appears to be the ratio of received vaccines to vaccines that are actually administered. One reason that some states appear to be struggling to give out their doses is politics: support for this idea comes for the fact that Republican states have lower ratios than Democratic states, generally speaking.