A simple time series model of pandemic excess…

Spiro P. Pantazatos, PhD

Feb 8

Anyone with basic stats skills and internet can follow along

19 Comments

jr vildmarks

Feb 11

In my view, you allow too little time for the second-shot disaster to appear. Link:

https://metatron.substack.com/p/alberta-just-inadvertently-confessed

There you can see that 50% of breakthrough cases appear within 15 days after the first shot. But the damage takes more time to appear after the second shot: breakthrough cases peak in months 5, 6, and 7.

This author has a snapshot from two months earlier, on Nov 4:

https://robertmoloney.substack.com/p/what-the-alberta-covid-19-dashboard

The situation after the first shot is the same; only younger age groups have been added.

However, the picture after the second shot is still evolving; it really jumps within 2 months.

And most likely more data is still in the making: first shots are done, but many people are still less than 5 months past their second shot.

Please note that this Alberta data captures the Delta wave; Omicron was a game changer (Robert has some graphs on it).

Also, by eye you can see the correlations with hospitalizations and deaths.

Spiro P. Pantazatos, PhD

Feb 11

If I understand your comment correctly, you are saying the model could be improved if it could incorporate lagged and persistent effects of the vaccine, and separate out effects of the first dose vs. the second dose? If so, I agree, but there are some challenges/limitations to implementing these changes including multicollinearity etc.

henjin

Feb 10

How much variation did vaccines explain when you didn't allow each COVID wave to have a different term in your model?

Your 2015-2019 average baseline exaggerates excess deaths in 2021 and 2022 relative to 2020. Part of your variation explained by vaccines might actually be due to your inaccurate baseline, because your baseline produces superfluous excess deaths in 2021 and 2022 that happen to partially coincide with vaccination waves. And because your model has a different term for each COVID wave, it allows the weight of COVID waves in 2021 and 2022 to be reduced in order to accommodate a higher weight to vaccines.
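To see why a flat average baseline inflates later years when deaths trend upward, here is a minimal R sketch (R being the language of the code in this thread) with hypothetical numbers: annual deaths rising by 20,000 per year over 2010-2019. It compares a 2015-2019 average with a 2010-2019 linear trend extrapolated to 2020-2022; the figures are toy values, not CDC data.

```r
# Hypothetical annual deaths rising by a constant 20,000 per year (toy numbers,
# not CDC data), used to compare two baseline choices.
years  <- 2010:2019
deaths <- 2.4e6 + 20000 * (years - 2010)

# Baseline 1: flat average of 2015-2019 (centered on 2017)
avg_base <- mean(deaths[years >= 2015])

# Baseline 2: 2010-2019 linear trend, extrapolated to 2020-2022
fit <- lm(deaths ~ years)
lin_base <- predict(fit, newdata = data.frame(years = 2020:2022))

# Extra "excess" deaths that the flat baseline attributes to each year
extra <- lin_base - avg_base
print(round(extra))
```

With these toy inputs the flat baseline adds 60,000, 80,000, and 100,000 deaths to the excess in 2020, 2021, and 2022: the gap grows each year the trend moves further from the 2015-2019 midpoint. Real mortality is not this linear, so this only shows the direction of the effect.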

In your first plot, which shows excess deaths in the CDC dataset, there's no week where excess mortality is even close to zero after the first few weeks of 2020. However, at Mortality Watch, if you plot ASMR with a 2010-2019 linear baseline, there are even a few weeks with negative excess mortality in March and April of 2022: https://www.mortality.watch/explorer/?c=USA&ct=weekly&df=2020%2520W01&bm=lin_reg.

You wrote that the CDC dataset had a total of 1,743,770 excess deaths in 2020-2022. When I downloaded the CDC dataset, I got the same result for MMWR years 2020-2022 as a whole from the column "Number above average (unweighted)": 585,409 excess deaths in MMWR year 2020, 670,667 in 2021, and 487,694 in 2022:

library(data.table)
t=fread("AH_Excess_Deaths_by_Sex__Age__and_Race_and_Hispanic_Origin_20250211.csv")
t[Sex=="All Sexes"&RaceEthnicity=="All Race/Ethnicity Groups"&AgeGroup=="All Ages",sum(`Number above average (unweighted)`),MMWRyear]

However, when I used my own, more accurate method of calculating excess deaths, where I multiplied the 2010-2019 linear trend in CMR (crude mortality rate) for each age by the mid-year resident population estimate for that age, I got only about 1.27 million excess deaths in 2020-2022: sars2.net/rootclaim.html#Table_of_excess_deaths_by_cause. I got about 468,885 excess deaths in 2020, 515,125 in 2021, and 285,019 in 2022. So the CDC dataset had about 117,000 more excess deaths than my estimate in 2020, 156,000 more in 2021, and 203,000 more in 2022: it exaggerated excess deaths every year, but it was particularly bad in 2022.
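The per-age linear-trend method described above can be sketched roughly as follows. This is a simplified illustration, not henjin's actual code: the helper `expected_deaths()`, its column names (`age`, `year`, `deaths`, `pop`), and the two-age-group toy data are all hypothetical.

```r
# Hypothetical helper: expected deaths from a per-age linear trend in CMR
# (deaths / population) fitted over base_years, times population in target_years.
# Expects one row per age group and year with columns age, year, deaths, pop.
expected_deaths <- function(d, base_years = 2010:2019, target_years = 2020:2022) {
  out <- lapply(split(d, d$age), function(a) {
    fit <- lm(I(deaths / pop) ~ year, data = a[a$year %in% base_years, ])
    tgt <- a[a$year %in% target_years, ]
    tgt$expected <- predict(fit, newdata = tgt) * tgt$pop  # trend CMR x population
    tgt
  })
  do.call(rbind, out)
}

# Toy data (not real): constant age-specific rates, a growing "old" group,
# and 50,000 extra deaths per age group in each year from 2020 on.
years <- 2010:2022
toy <- rbind(
  data.frame(age = "young", year = years, pop = 2e8, rate = 0.001),
  data.frame(age = "old", year = years, pop = 5e7 + 1e6 * (years - 2010), rate = 0.05)
)
toy$deaths <- toy$rate * toy$pop + ifelse(toy$year >= 2020, 50000, 0)

res <- expected_deaths(toy)
excess <- tapply(res$deaths - res$expected, res$year, sum)  # excess deaths by year
print(excess)
```

With these toy inputs the recovered excess is 100,000 in each of 2020-2022, because the baseline follows each age group's rate times its population rather than the raw total, which keeps rising as the older group grows.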

Spiro P. Pantazatos, PhD

Feb 11

When individual COVID waves are not modeled, the doses term is not significant (p=0.16) and explains only an additional 0.4% of the variance. Omitted variable bias may be masking the effect of the doses in this model. You raise a valid point about the baseline not taking into account any pre-pandemic yearly trends. Your next sentence, "And because your model has a different term for each COVID wave, it allows the weight of COVID waves in 2021 and 2022 to be reduced in order to accommodate a higher weight to vaccines," is less clear to me. When the doses term is added, I see that w5 and w6 have lower weights, but w7 and w8 have higher weights. If the vaccines are a better fit to excess deaths, then I would expect the individual COVID wave weights to change a bit. I wasn't able to easily locate your (more conservative and probably accurate) excess death calculations in your second link, but I did see another CDC spreadsheet that appears to correct for yearly (and seasonal) trends (https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm). Do you have any comments on the methods the CDC uses for this spreadsheet in relation to your method for calculating excess deaths?
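A p-value for an added term and the extra variance it explains are commonly obtained by comparing nested linear models. The sketch below shows one way to do this in R on synthetic weekly data; the series `cases`, `vax`, and `excess` are simulated, not the CDC/OWID data, so the numbers will not match the p=0.16 and 0.4% above, and this is not necessarily how the post computed its figures.

```r
# Synthetic weekly data (simulated, not CDC/OWID): excess deaths depend on cases;
# `vax` is a second series with no true effect in this simulation.
set.seed(42)
n      <- 156                                        # three years of weeks
cases  <- 1e5 * (1.2 + sin(2 * pi * (1:n) / 26))
vax    <- pmax(0, 1e6 * sin(2 * pi * (1:n) / 80))
excess <- 2000 + 0.05 * cases + rnorm(n, sd = 1500)

m0 <- lm(excess ~ cases)         # reduced model
m1 <- lm(excess ~ cases + vax)   # adds the doses term

delta_r2 <- summary(m1)$r.squared - summary(m0)$r.squared  # additional variance explained
p_vax    <- anova(m0, m1)$`Pr(>F)`[2]                       # F-test for the added term
cat("extra R^2:", round(delta_r2, 4), " p:", signif(p_vax, 2), "\n")
```

For a single added term, the F-test from `anova()` gives the same p-value as that term's t-test in `summary(m1)`.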

henjin

Feb 11

On the CDC page you linked, the dataset under "Download Data > CSV Format > National and State Estimates of Excess Deaths" has about 1.31 million excess deaths on weeks ending in 2020-2022, so it seems to be more accurate than the CDC dataset you used, which had about 1.74 million excess deaths in MMWR years 2020-2022:

library(data.table)
e=fread("Excess_Deaths_Associated_with_COVID-19.csv")
e[Type=="Unweighted"&State=="United States"&`Week Ending Date`%like%"202[012]",sum(`Observed Number`)-sum(`Average Expected Count`)] # 1312316 (unweighted)
e[Type=="Predicted (weighted)"&Outcome=="All causes"&State=="United States"&`Week Ending Date`%like%"202[012]",sum(`Observed Number`)-sum(`Average Expected Count`)] # 1312344 (weighted)

The code above shows that excess deaths were nearly the same with the weighted and unweighted figures. The weighting was used to impute deaths that were missing due to registration delays, but the last version of the CDC dataset was published in 2023, when only a few deaths from 2020-2022 were still missing because of registration delays.

But there's still something weird even with CDC's more sophisticated method of calculating the baseline: on CDC's excess_deaths.htm page, the plot titled "Weekly number of deaths (from all causes)" shows actual deaths below the baseline on almost every week of 2018 and 2019: https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm. On the other hand, the plot seems correct in that it has weeks with negative excess deaths in March and April 2022.

In either case, it would probably have been better to use one of the other CDC datasets that used the more sophisticated method to calculate the baseline, rather than the dataset that used a simple 2015-2019 average baseline.

The dataset you used was titled "AH Excess Deaths by Sex, Age, and Race and Hispanic Origin". I didn't even find it linked on CDC's main excess_deaths.htm page, only here: https://www.cdc.gov/nchs/covid19/covid-19-mortality-data-files.htm. A note on the same page says: "AH = Ad-hoc. Datasets with the prefix AH are not updated routinely, but can be updated upon request."

---

I posted a more detailed response here: sars2.net/rootclaim2.html#Linear_regression_model_by_Spiro_Pantazatos.

I think I figured out why you didn't include a model that regressed excess deaths on cases plus vaccines without a separate term for each COVID wave: it gave you a negative coefficient for vaccines. Or at least that's what happened to me when I tried to reproduce your models.

And when I compared COVID deaths plus vaccines against excess deaths, it also gave me a negative coefficient for vaccines, regardless of whether I took the excess deaths from the same ad-hoc CDC dataset you used or I used my own more accurate method of calculating excess deaths.

If you had based the models directly on COVID deaths instead of cases, you wouldn't even have needed a separate term for each COVID wave, because COVID deaths already reflect how different COVID waves had different CFR (case fatality rate) values. If your goal was to evaluate the contribution of vaccines to excess deaths beyond the deaths attributed to COVID, then it seems odd to base the model indirectly on COVID cases rather than directly on COVID deaths.
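The argument that a COVID-deaths regressor already absorbs wave-to-wave CFR differences can be illustrated with a small synthetic example in R: two identical case waves with different (hypothetical) CFRs and an outcome proportional to COVID deaths. It is a toy illustration of the reasoning, not a reproduction of either commenter's model.

```r
# Synthetic illustration (not real data): two identical case waves with
# different hypothetical CFRs, and an outcome proportional to COVID deaths.
set.seed(3)
n     <- 40
wave  <- rep(1:2, each = n / 2)
cases <- rep(1e5 * sin(pi * (1:(n / 2)) / (n / 2)), 2)
covid_deaths <- c(0.02, 0.005)[wave] * cases           # CFR differs by wave
excess <- 1.1 * covid_deaths + rnorm(n, sd = 10)

w1 <- ifelse(wave == 1, cases, 0)                      # case term for wave 1 only
w2 <- ifelse(wave == 2, cases, 0)                      # case term for wave 2 only
r2 <- function(m) summary(m)$r.squared

r2_cases  <- r2(lm(excess ~ cases))         # one case term: CFR change not captured
r2_waves  <- r2(lm(excess ~ w1 + w2))       # one case term per wave
r2_deaths <- r2(lm(excess ~ covid_deaths))  # single COVID-deaths term
round(c(cases = r2_cases, waves = r2_waves, deaths = r2_deaths), 3)
```

A single case term fits poorly here, while either one case term per wave or one COVID-deaths term fits the toy outcome almost perfectly.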

So did you in fact also do models based on COVID deaths, but you didn't publish them because they gave you a negative coefficient for vaccines?

henjin

Feb 11

I forgot to ask: Can you list the exact ranges of MMWR weeks you used for each COVID wave so I can reproduce your model?

Spiro P. Pantazatos, PhD

Feb 12

01/04/2020 - 06/06/2020: w1

06/13/2020 - 09/05/2020: w2

09/12/2020 - 03/06/2021: w3

03/13/2021 - 06/26/2021: w4

07/03/2021 - 10/30/2021: w5

11/06/2021 - 04/02/2022: w6

04/09/2022 - 10/22/2022: w7

10/29/2022 - 12/31/2022: w8
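These dates are Saturdays, i.e. MMWR week-ending dates (MMWR weeks run Sunday to Saturday, and week 1 is the week containing January 4). The base-R sketch below uses a hypothetical helper, `mmwr_week()`, standing in for the `MMWRweek` package, to convert the first week of each wave into an MMWR year and week.

```r
# Hypothetical base-R helper standing in for the MMWRweek package.
# MMWR weeks run Sunday-Saturday; week 1 is the week containing January 4.
mmwr_start <- function(year) {
  jan4 <- as.Date(sprintf("%d-01-04", year))
  jan4 - as.POSIXlt(jan4)$wday          # Sunday on or before January 4
}

mmwr_week <- function(date) {
  date <- as.Date(date)
  year <- as.integer(format(date, "%Y"))
  # Dates before week 1 belong to the previous MMWR year; dates on or after
  # the next year's week 1 belong to the next MMWR year.
  year <- ifelse(date < mmwr_start(year), year - 1L, year)
  year <- ifelse(date >= mmwr_start(year + 1L), year + 1L, year)
  week <- as.integer(date - mmwr_start(year)) %/% 7L + 1L
  data.frame(MMWRyear = year, MMWRweek = week)
}

# First week-ending date of each wave, from the list above
starts <- c("2020-01-04", "2020-06-13", "2020-09-12", "2021-03-13",
            "2021-07-03", "2021-11-06", "2022-04-09", "2022-10-29")
cbind(date = starts, mmwr_week(starts))
```

This gives weeks 1, 24, and 37 of 2020, weeks 10, 26, and 44 of 2021, and weeks 14 and 43 of 2022, the same starting weeks as the `waves` table in henjin's replication code.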

henjin

Feb 12

Thanks, 5 of my weeks were off by 2 and 1 week was off by 1.

I'm now getting an r^2 of about 0.780 without the vaccine term and 0.789 with it. So I still don't understand what I'm doing differently from your calculation:

library(data.table);library(MMWRweek)
# first MMWR week of each wave (year, week)
waves=data.table(year=rep(2020:2022,c(3,3,2)),week=c(1,24,37,10,26,44,14,43))
# weekly vaccine doses summed by MMWR week
owid=fread("owid-covid-data.csv")[location=="United States"]
vax=owid[,.(MMWRweek(date),vax=new_vaccinations)][,.(vax=sum(vax,na.rm=T)),.(year=MMWRyear,week=MMWRweek)]
# daily cases shifted 8 days forward, then summed by MMWR week
case=fread("daily-new-confirmed-covid-19-cases.csv")[Entity=="United States"]
case=case[,.(MMWRweek(Day+8),cases=.SD[[3]])][,.(cases=sum(cases)),.(year=MMWRyear,week=MMWRweek)]
# weekly excess deaths for all sexes, races, and ages
excess=fread("AH_Excess_Deaths_by_Sex__Age__and_Race_and_Hispanic_Origin_20250211.csv")
excess=excess[Sex=="All Sexes"&RaceEthnicity=="All Race/Ethnicity Groups"&AgeGroup=="All Ages"]
excess=excess[,.(excess=`Number above average (unweighted)`,year=MMWRyear,week=MMWRweek)]
# merge the three series for 2020-2022; missing values become 0
me=merge(merge(excess,case,all=T),vax,all=T)[year%in%2020:2022];me[is.na(me)]=0
# assign each week to a wave, then make one case column per wave (w1..w8)
me$wave=me[,findInterval(year*100+week,waves[,year*100+week])]
me=cbind(me,`colnames<-`(sapply(1:8,\(i)me[,ifelse(wave==i,cases,0)]),paste0("w",1:8)))
summary(lm(excess~w1+w2+w3+w4+w5+w6+w7+w8,me)) # r^2 is about 0.780
summary(lm(excess~w1+w2+w3+w4+w5+w6+w7+w8+vax,me)) # r^2 is about 0.789

Spiro P. Pantazatos, PhD

Feb 12

Thanks for double-checking and for the replication attempt. I was not as precise as you with the lag (I did not apply an 8-day lag to the daily cases before downsampling to weekly resolution, which seems a better approach), plus there may be an error in my code, which I will double-check tomorrow. The OWID website says this about the daily case count: “In addition, there is a delay between testing, confirming, and reporting a case to international organizations. This means the numbers do not necessarily reflect the number of cases on the specific date.” but I don’t see any more details about how long the delay is. What happens if you don’t apply any lag to the cases, or apply just a 3- or 4-day lag? I’ll debug some more on my end and be back in touch shortly.
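One way to explore the lag question is to refit the weekly model for several lags. The sketch below does this in base R on synthetic data; the daily case curve, the 8-day "true" delay, and the helper names `week_end()` and `lag_r2()` are assumptions for illustration, not the OWID or CDC files. With real data the same loop would run on the actual daily cases and weekly excess deaths.

```r
# Hypothetical helpers (base R). week_end(): the Saturday ending each Sun-Sat week.
week_end <- function(d) d + (6L - as.POSIXlt(d)$wday)

# R^2 of weekly excess deaths regressed on daily cases shifted forward by
# `lag_days` and summed into weeks.
lag_r2 <- function(daily, weekly, lag_days) {
  shifted <- data.frame(week = week_end(daily$date + lag_days), cases = daily$cases)
  agg <- aggregate(cases ~ week, data = shifted, FUN = sum)
  m <- merge(weekly, agg, by = "week")
  summary(lm(excess ~ cases, data = m))$r.squared
}

# Synthetic data (not OWID/CDC): weekly excess built from cases with an 8-day delay.
set.seed(1)
dates <- seq(as.Date("2020-03-01"), as.Date("2021-12-31"), by = "day")
daily <- data.frame(date = dates, cases = 1e5 * (1.2 + sin(2 * pi * seq_along(dates) / 120)))
truth <- aggregate(cases ~ week, data = data.frame(week = week_end(dates + 8), cases = daily$cases), FUN = sum)
truth <- truth[-c(1, nrow(truth)), ]                  # drop partial first/last weeks
weekly <- data.frame(week = truth$week, excess = 0.01 * truth$cases + rnorm(nrow(truth), sd = 100))

lags <- c(0, 3, 4, 8)
r2 <- setNames(sapply(lags, function(l) lag_r2(daily, weekly, l)), lags)
round(r2, 3)
```

Because the synthetic outcome was built with an 8-day delay, the 8-day lag fits best here; with real data, which lag fits best is an empirical question.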

henjin

I have now used these as the starting weeks of each wave, after shifting the cases 8 days forward: 2020 week 1, 2020 week 24, 2020 week 39, 2021 week 12, 2021 week 27, 2021 week 46, 2022 week 16, and 2022 week 45.

However, in a model with a separate term for each wave, my r^2 value was about 0.780 regardless of whether I added a term for vaccine doses: sars2.net/rootclaim2.html#Linear_regression_model_by_Spiro_Pantazatos.

So I don't know what I did differently from your calculation. This time I even took the case data from the OWID dataset "Daily new confirmed COVID-19 cases per million people", which shows a moving average of cases per capita at daily precision, whereas earlier I had taken the weekly number of cases from the new_cases column in the file owid-covid-data.csv.

Spiro P. Pantazatos, PhD

Feb 11

After having slept on it, I think the 2015-2019 average baseline is not necessarily less “accurate”: it reports excess deaths relative to a 2015-2019 averaged baseline. If I understand correctly, your approach and the more sophisticated CDC adjustment assume that mortality rates were on an upward trend from 2015 through 2019, that this same upward trend would have continued in 2020-2022, and that this upward trend has nothing to do with the excess deaths attributable to the pandemic. The first assumption appears questionable in light of a virtually flat age-adjusted death rate from 2013 to 2018 (see https://www.cdc.gov/nchs/data-visualization/mortality-trends/index.htm). When I squint, the line even suggests a slight downward trend from 2017 to 2018. Moreover, in the first graph of the post (excess deaths across time), the yearly linear trend appears to decrease from 2020 to 2022, based on the “troughs” being lower in each subsequent year. Also, I wouldn’t expect excess deaths ever to reach zero or go negative in this period, given that US COVID cases were always positive from the beginning of the first wave in 2020 through the end of 2022. Even if we assume the yearly trend in deaths was increasing from 2013 through 2019, and even if we assume it would have increased at the same rate in 2020-2022 in the absence of the pandemic, I don't think it makes sense to try to “remove” that trend, because the same factors that would have caused mortality rates to increase steadily in 2013-2019 (e.g. increasing obesity, chronic disease, etc.) would also exacerbate and contribute to the pandemic excess deaths (e.g. COVID comorbidities).

Regarding your interesting alternative approach of using COVID deaths, instead of COVID cases, as a predictor of all-cause excess deaths: first, I would not trust the accuracy of a COVID deaths regressor, given the varying definitions of ‘COVID deaths’ (i.e. dying with, or of, COVID) and the variability in physicians’ thresholds (and possible hospital incentives) for adjudicating a ‘COVID death’, across time and between states and hospitals; second, given the increased risk of both COVID infection and death in the first weeks post-injection (see e.g. https://metatron.substack.com/p/alberta-just-inadvertently-confessed), the COVID death regressor may actually include a substantial number of vaccine-caused deaths, which would result in model misspecification to the extent that vaccine deaths are conflated with COVID deaths; and third, it is counterintuitive (to me at least) to use one type of death (a subset of the main outcome variable) to predict all-cause deaths, and this is not typical of epidemiological time series models that aim to measure the effects of environmental *exposures* that may *contribute* to excess deaths. Compared to COVID deaths, COVID cases were measured relatively consistently throughout the pandemic. While they did not accurately reflect the true infection rate, they consistently undershot it, and this underascertainment bias doesn’t really affect the model, because it’s the shape of each wave that matters and each wave is scaled anyway to fit the outcome variable.

In response to your last question: no, I did not also do models with COVID deaths. I had not thought of this modeling approach until you mentioned it. I think it is an interesting approach to consider. However, for the above reasons, I think modeling COVID cases is a better strategy.
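The point that a constant underascertainment factor does not matter, because each wave's case term gets its own fitted weight, follows from a property of linear regression: multiplying a regressor by a nonzero constant rescales its coefficient but leaves the fitted values and R^2 unchanged. A quick synthetic check in R, where the wave shapes, weights, and ascertainment fractions are all made up:

```r
# Synthetic check (made-up data): rescaling each wave's case column by a
# constant changes that column's coefficient but not the fit.
set.seed(7)
n     <- 60
wave  <- rep(1:3, each = 20)
cases <- 1e4 * (1 + sin(2 * pi * (1:n) / 20))
X <- sapply(1:3, function(i) ifelse(wave == i, cases, 0))   # one column per wave
colnames(X) <- paste0("w", 1:3)
excess <- as.vector(X %*% c(0.3, 0.1, 0.2)) + rnorm(n, sd = 50)

ascertain <- c(0.2, 0.5, 0.35)              # hypothetical detected fraction per wave
X_true <- sweep(X, 2, ascertain, "/")       # implied infections = reported / fraction

fit_rep  <- lm(excess ~ X)
fit_true <- lm(excess ~ X_true)
c(reported = summary(fit_rep)$r.squared, true = summary(fit_true)$r.squared)
```

The two fits have identical R^2 and fitted values, and each wave coefficient is multiplied by that wave's ascertainment fraction. This holds only if the fraction is constant within each wave; a fraction that drifts within a wave changes the wave's shape and would affect the fit.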

henjin

Feb 12

The page you linked shows an age-standardized mortality rate (ASMR), not a raw number of deaths. You can see here that ASMR was roughly flat in the 2010s but the raw number of deaths went up: https://www.mortality.watch/explorer/?c=USA&t=asmr&df=1999, https://www.mortality.watch/explorer/?c=USA&t=deaths&df=1999. And even before COVID, the raw number of deaths was projected to increase even more steeply in the 2020s than in the 2010s: https://www.census.gov/library/stories/2017/10/aging-boomers-deaths.html.
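The difference between ASMR and raw deaths can be seen in a toy two-age-group example in R (the rates, population counts, and standard-population weights are all hypothetical): age-specific death rates are held fixed while the older group grows, so raw deaths rise but the ASMR, computed with fixed standard-population weights, does not change.

```r
# Toy two-age-group population (hypothetical numbers): constant age-specific
# death rates, but the older group grows between 2010 and 2019.
rates   <- c(young = 0.002, old = 0.04)      # deaths per person-year
std_pop <- c(young = 0.8, old = 0.2)         # fixed standard-population weights
pop <- rbind(`2010` = c(young = 2.6e8, old = 4.0e7),
             `2019` = c(young = 2.6e8, old = 5.4e7))

deaths_by_age <- sweep(pop, 2, rates, "*")   # deaths in each age group and year
raw_deaths <- rowSums(deaths_by_age)         # crude total rises as "old" grows
asmr <- rowSums(sweep(deaths_by_age / pop, 2, std_pop, "*")) * 1e5  # per 100,000
rbind(raw_deaths, asmr)
```

Here raw deaths go from 2.12 million to 2.68 million while the ASMR stays at 960 per 100,000 in both years.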

You said there is an elevated risk of COVID death in the first weeks following vaccination. But in the English ONS data during each of the first four months of 2021, unvaccinated people had higher COVID ASMR than people who had received the first dose less than 21 days ago. During other months the number of deaths in the group "First dose, less than 21 days ago" was so small that the COVID ASMR was not listed. Data for the first three months of 2021 was excluded from the last two editions of the dataset which were based on the 2021 census, but it's still available from the earlier editions: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/deathsinvolvingcovid19byvaccinationstatusengland/deathsoccurringbetween1january2021and31may2022.

Spiro P. Pantazatos, PhD

Feb 12

Yes, you are right that raw excess deaths will increase if there are more older people in the population. But age is also a significant predictor of COVID death, so I'm still not clear on the motivation behind removing the linear trend for the baseline. At any rate, I meant to say that both infection risk and risk of death (from side effects such as myocarditis/heart attack or stroke, which could also be classified as "COVID deaths" if the person also tested positive for COVID) increase in the first few weeks post-injection. I'm not too familiar with the ONS data and their methods, but when I took a first pass at the ONS data that you sent (I clicked on "Deaths by vaccination status, England" in Section 2 of the link you sent, then downloaded and opened Table 1, Column G of the XLS dataset), it actually shows a consistently *higher* all-cause ASMR for "First dose, less than 21 days ago" and "First dose, at least 21 days ago" vs. "Unvaccinated" from June 2021 onward, and especially in 2022. When I look at "Deaths involving COVID-19", I also see many months where "First dose, at least 21 days ago" has higher ASMR than "Unvaccinated" from May 2021 onward. But what doesn't make sense to me is why "First dose, less than 21 days ago" has so few counts while "First dose, at least 21 days ago" has many more counts in the same month... does this make any sense to you? If enough time passed that many people died more than 21 days after their first dose, shouldn't there also be a fair number of deaths among people less than 21 days after their first dose? Or am I missing something?

henjin

Feb 12

After the first half of 2021, there were no longer many new people getting vaccinated, especially not in the age groups with the highest risk of death from COVID, so naturally there are not many deaths under "First dose, less than 21 days ago". But the group "First dose, at least 21 days ago" even includes people who got vaccinated a year earlier. There's also a reduced number of all-cause deaths in the first 3 weeks after vaccination, so even after adjusting for observation time, the group "First dose, less than 21 days ago" has much lower all-cause ASMR than the group "First dose, at least 21 days ago".

And also after the second dose was rolled out, the ASMR of the group "First dose, at least 21 days ago" shot up because the healthy vaccinees moved under the second dose but the so-called "unhealthy stragglers" remained under the first dose. A similar phenomenon can also be seen in the Czech record-level data: sars2.net/czech.html#Plot_for_ASMR_by_dose_and_date.

The reason why the group "First dose, less than 21 days ago" has higher all-cause ASMR than unvaccinated people after early 2021 might be because of a phenomenon I'm calling the "late vaccinee effect", where people who got vaccinated late after the main rollout wave was over had higher mortality than people who got vaccinated during the main rollout wave.

A similar phenomenon exists in the record-level datasets from the Czech Republic and Connecticut and in the UKHSA FOIA data that was given to Clare Craig in May 2024:

sars2.net/czech.html#Triangle_plot_for_excess_mortality_by_month_of_vaccination_and_month_of_death, sars2.net/connecticut.html#ASMR_by_month_of_vaccination, sars2.net/uk.html#Mortality_rate_by_week_of_vaccination_up_to_the_end_of_2022.

One explanation for the late vaccinee effect might be if the late vaccinees are less conscientious than people who got vaccinated on time so they might also have poorer health, or another explanation might be if the late vaccinees include people who had some kind of a sickness during the main rollout so they had to delay vaccination.

But if you look at total ASMR across all months of vaccination, then people who have gotten the first dose less than 3 weeks ago should have much lower ASMR than unvaccinated people: sars2.net/czech2.html#Excess_mortality_by_weeks_after_vaccination.

henjin

Feb 12 (edited)

In this Czech dataset there's a total of 43,633 COVID deaths listed, but only 13 of them occurred on the same week number as the week of the first vaccination (https://www.nzip.cz/data/2135-covid-19-prehled-populace):

library(data.table)
t=fread("Otevrena-data-NR-26-30-COVID-19-prehled-populace-2024-01.csv")
t[Umrti!="",.N] # 43633 (total COVID deaths)
t[Umrti!=""&Umrti==Datum_Prvni_davka,.N] # 13 (COVID deaths on week of first dose)

Considering that many people got vaccinated during the COVID wave in early 2021, there's actually surprisingly few COVID deaths on the week of the first dose. (But of course the average observation period here doesn't consist of a full 7 days, because if for example someone got vaccinated on Friday at midday, then they had only 2.5 days left to die on the same week when they got vaccinated. But assuming that people usually got vaccinated during weekdays and during working hours, the average exposure time might be around 4.5 days.)
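The 4.5-day figure follows if vaccinations are spread evenly over Monday to Friday at midday and the week runs Monday to Sunday, both assumptions taken from the comment above. A small R check:

```r
# Days left in a Monday-Sunday week for someone vaccinated at midday,
# assuming vaccinations are spread evenly over Monday-Friday.
day_index <- 1:5                          # 1 = Monday, ..., 5 = Friday
remaining <- 7 - (day_index - 1) - 0.5    # from midday to the end of Sunday
mean(remaining)                           # 4.5
```

Friday midday leaves 2.5 days and Monday midday leaves 6.5, so the average over the five weekdays is 4.5 days.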

But actually upon further thought, maybe it's not surprising that there's a low number of COVID deaths on the week of vaccination, because people normally didn't get vaccinated immediately after they had been diagnosed with COVID, and people usually took at least a few days after a diagnosis to die from COVID.

Spiro’s Newsletter
