When I tried running your code in Octave, it said `the 'readtable' function is not yet implemented in Octave`.
I tried reproducing your heatmap in R with negative lags included, but my results were completely different from yours. Maybe you can figure out what I did wrong: https://pastebin.com/raw/vCY0MM9k, https://i.ibb.co/QS89fkk/stimped-reproduction-fail.png.
Thanks, I'll look into a readtable alternative for Octave. Your heatmap looks similar to mine, but flipped about the x=0 weeks line. Are you sure your lags are correct (i.e. is your -10 weeks really +10 weeks)?
You're right, in my plot a lag of +5 weeks actually meant that the vaccine doses of week 10 were shifted to week 5, even though it should've been the other way around. But even after I fixed it, the results are still not that similar to your plot.
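To make the sign convention concrete, here is a minimal Python sketch. It is not the code behind either heatmap, and `shift_series` is a hypothetical helper. It assumes a lag of +k weeks should pair the doses from week t-k with the deaths of week t, so that doses come before deaths.

```python
import numpy as np

def shift_series(x, lag):
    """Return x shifted so that out[t] = x[t - lag]; weeks without data are NaN.

    lag > 0: doses from earlier weeks line up with later deaths.
    lag < 0: doses from later weeks line up with earlier deaths.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if lag > 0:
        out[lag:] = x[:-lag]
    elif lag < 0:
        out[:lag] = x[-lag:]
    else:
        out[:] = x
    return out

doses = np.arange(12)             # toy series: the value equals the week index
plus5 = shift_series(doses, 5)    # week 10 now holds the doses of week 5
minus5 = shift_series(doses, -5)  # week 5 now holds the doses of week 10
```

Under this convention, moving the doses of week 10 to week 5 corresponds to a lag of -5, not +5.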
But anyway, I suspect the reason why different age groups get high t-values with a lag of about 2-3 weeks is that the vaccination data was not stratified by age. If the data had been stratified by age, then younger age groups might have gotten more high t-values with negative lags.
Do you mind posting your updated heatmap, with x-axis [-10 weeks, -9 weeks, ... 9 weeks, 10 weeks]? I imagine there will be minor differences in our results because we used different languages, and because including negative lags changes the FDR critical p-value threshold, etc. Looking at your heatmap, the younger ages seem to have higher t-values at later weeks, at around +6-8 weeks, while the older ages peak earlier at around +2-3 weeks. As you state, this might be better explained by differences in the timing of the vaccine rollout by age group, rather than biological differences that affect the timing of vaccination adverse effects in different age groups.
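The point about the FDR threshold can be illustrated with a small Python sketch. It assumes the Benjamini-Hochberg procedure, which is an assumption because the thread doesn't say which FDR method was used. The p-values are made up for illustration, and `bh_threshold` is a hypothetical helper.

```python
import numpy as np

def bh_threshold(pvals, q=0.05):
    """Largest p-value rejected by Benjamini-Hochberg at level q (0.0 if none)."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    # Compare the k-th smallest p-value with its critical value k * q / m
    below = p <= q * np.arange(1, m + 1) / m
    return float(p[below].max()) if below.any() else 0.0

# Made-up p-values for models with positive lags only
pos = [0.001, 0.008, 0.02, 0.035, 0.30]
# Made-up, unremarkable p-values for the added negative lags
neg = [0.35, 0.50, 0.62, 0.80, 0.95]

thr_pos = bh_threshold(pos)        # threshold with 5 tests
thr_all = bh_threshold(pos + neg)  # threshold with 10 tests
```

In this toy example the critical p-value drops once the negative lags are added, because the same small p-values are now compared against stricter per-rank cutoffs.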
Here's the corrected version of the heatmap: https://i.ibb.co/HD2WNSNg/stimped-reproduction-v2-probably-still-wrong.png. I also extended the range of the lags to ±20 weeks.
Some age groups got t-values over 3 even with lags of -20 weeks and 20 weeks. I think it's because the model is highly flexible: there are 8 different terms for cases, and the terms for cases are even allowed to have negative weights. Most of my models with a p-value below 0.05 had at least one negative weight for cases.
Thanks. Given that both the vaccine and COVID regressors consist of "waves" (as does the outcome variable), it's not surprising if some negative lags yield clusters of (false) positives where the vaccine regressor becomes more correlated with the COVID regressor and/or the outcome variable. What about a heatmap of the overall model performance (i.e. R-squared) as a function of vaccine term lags? If the adjusted R-squared values are consistently higher for positive lags than for negative lags within each age group, then that would be strong evidence that the model is correctly specified for positively lagged vaccine terms and misspecified for negatively lagged terms. Please see Part 2 of this series for why negative terms in one or more "w" regressors do not invalidate the model. Also, your updated results show consistently elevated adverse effects for ages 25-40 from 0 to 20 weeks, consistent with adverse effects of myocarditis that can take years to cause death and with reports of elevated excess deaths in 2023 and 2024 (https://bmjgroup.com/high-excess-death-rates-in-the-west-for-3-years-running-since-start-of-pandemic/), especially in younger age groups. If it is not the vaccine that is causing this effect in your results, then what is? If you posit it is noise or false positives, then how would you better control for false positives in a way that is not overly conservative, and/or better model excess deaths from COVID cases and vaccinations using a time-series approach?
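A scan of adjusted R-squared over vaccine lags could look roughly like the Python sketch below. It is a simplified stand-in, not the model discussed in this thread: it uses a single cases term instead of 8, plain least squares, and entirely synthetic series in which deaths follow doses by 3 weeks by construction. All function and variable names are hypothetical.

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R-squared of an OLS fit of y on X (X already includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - p)) / (ss_tot / (n - 1))

def scan_lags(deaths, cases, doses, max_lag):
    """Adjusted R-squared per vaccine lag, fitted on a common window of weeks."""
    n = len(deaths)
    rows = np.arange(max_lag, n - max_lag)  # same weeks for every lag
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        # Positive lag: doses from week t - lag explain deaths in week t
        X = np.column_stack([np.ones(rows.size), cases[rows], doses[rows - lag]])
        out[lag] = adjusted_r2(deaths[rows], X)
    return out

# Synthetic weekly series, for illustration only
rng = np.random.default_rng(0)
weeks = np.arange(120)
cases = np.exp(-0.5 * ((weeks - 40) / 6.0) ** 2)
doses = np.exp(-0.5 * ((weeks - 70) / 5.0) ** 2)
true_lag = 3
deaths = cases + 2.0 * np.roll(doses, true_lag) + rng.normal(0, 0.02, weeks.size)

r2_by_lag = scan_lags(deaths, cases, doses, max_lag=10)
best_lag = max(r2_by_lag, key=r2_by_lag.get)
```

Fitting every lag on the same window of weeks keeps the adjusted R-squared values comparable across lags. On real data the wave-like regressors are correlated with each other, so a peak at a positive lag would still need to be interpreted with care.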
Most age groups got the highest r^2 value with a lag of 1 to 4 weeks: https://i.ibb.co/nNqj5b1R/stimped-with-r-squared.png. But also the w4 term got a negative coefficient in all models with a lag of 1 to 4 weeks.
71 out of 697 models got a negative coefficient for the vaccine term, even though the p-value of the vaccine term wasn't below 0.05 in any of them.
The reason why the r^2 values were consistently higher for positive lags might partially be because the data for vaccine doses was not age-stratified.
As for how I would make the models, I would rather include only a single term for COVID deaths and not multiple terms for COVID cases. And all data should be age-stratified. I'll try to make properly age-stratified models of the Czech data next.
The paper by Mostert et al. you linked only looked at data up to the end of 2022, so it didn't include 2023 or 2024. I got only about 1% excess ASMR in the United States in 2023 and about -5% excess ASMR in 2024: https://sars2.net/ethical2.html#Is_there_still_excess_all_cause_mortality_in_2024. And in the United States most excess deaths in ages 15-44 have been because of drugs and external causes.
You wrote "the temporal pattern of excess deaths (and vaccination peaks to some extent) will vary across age-groups and time". But I think your "to some extent" should be the other way around. The spikes in deaths occur around the same time in all age groups where COVID deaths had a sufficient impact on all-cause mortality, because all age groups were hit by COVID waves around the same time (though the youngest ages have so few COVID deaths that COVID waves don't have much effect on their all-cause mortality). The vaccination peaks, on the other hand, occur at different times in different ages, because younger age groups got vaccinated later than older age groups.
For example in the Czech Republic in November to December 2021, there was a spike in excess deaths which roughly coincided with a spike in the number of new vaccine doses administered if you look at all ages combined together: https://sars2.net/czech.html#Daily_deaths_and_vaccine_doses_by_age_group. But if you look at age-stratified data, vaccine doses peaked about a month before deaths in ages 80+ but about a month after deaths in ages 40-59, because the deaths peaked around the same time in all ages but the older age groups got booster doses earlier than younger age groups.
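The aggregation effect described above can be reproduced with a toy Python example. The series are entirely synthetic (Gaussian-shaped waves with assumed peak weeks), not the Czech data, and the names are hypothetical.

```python
import numpy as np

def bump(weeks, center, width):
    """Gaussian-shaped wave used as a stand-in for a weekly time series."""
    return np.exp(-0.5 * ((weeks - center) / width) ** 2)

weeks = np.arange(30)
deaths = bump(weeks, 15, 3)  # assumed: deaths peak in the same week in both age groups
doses = {
    "80+": bump(weeks, 11, 5),    # assumed: older ages got boosters earlier
    "40-59": bump(weeks, 19, 5),  # assumed: younger ages got boosters later
}
doses["all ages"] = doses["80+"] + doses["40-59"]

death_peak = int(np.argmax(deaths))
# Weeks from the dose peak to the death peak (positive: doses peaked first)
lead = {age: death_peak - int(np.argmax(d)) for age, d in doses.items()}
```

Here the combined dose curve peaks in the same week as deaths, while the age-specific dose peaks fall 4 weeks before deaths in one group and 4 weeks after in the other.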
You could've also tested for negative lags in your model. If negative lags had also improved your t-values, then the reason why adding the lag improved your t-values was not necessarily that vaccines cause deaths with a delay.
If you had used age-stratified data for vaccine doses administered, there might have been more variation across age groups in what the optimal lag values were. I haven't found data for vaccine doses administered by date and narrow age groups in the United States. This file has cases by 10-year age groups, however: https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf.
The Czech Republic has daily data with fine age groups for vaccine doses administered (https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/ockovani.csv), cases (https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/nakazeni-hospitalizace-testy.csv), COVID deaths (https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19/umrti.csv), and excess deaths (https://sars2.net/f/czdeadproj.csv).