Second Iteration: My incredibly amateur 2024 US Presidential Election model
My adventure to design an election model from scratch continues.
Welcome back!
For those of you who didn’t see the first iteration of my model, see here. (tl;dr below*)
This post contains the second iteration of my model. I have made multiple changes to the model to reduce the Democratic bias, hopefully providing more realistic predictions.
Like before, please don’t take my model’s predictions too seriously.
I am far more comfortable with the predictions of this iteration, though I won’t deny that the model still favours Democrats. I suspect that this is still due to the problems that I have previously outlined.
Nonetheless, some major problems have been corrected for. These include:
The overly large Democratic margins in the rust belt swing states have been reduced
Texas is no longer predicted as an implausibly close tossup
“Dead” swing states (i.e. Iowa, Ohio, Florida) are no longer projected as close tossups, though the model still views Florida as somewhat close
Other swing states (i.e. Nevada, Arizona, North Carolina, Georgia) have more realistic margins, albeit ones that still favour the Democrats
In this post, I will detail the changes I have made and include some further discussion on why the model still favours Democrats. The bulk of my explanation of the model lies in the original post, if you are interested.
Like before, I make nine predictions. The model predicts the performance of each party under three national popular vote scenarios: one where the projections are correct, and one each where they are skewed in either party’s favour. For each of these scenarios, I make three predictions that rely on state-level opinion polling to different extents.
Table of Contents
Introduction
Predictions
Probabilities
*tl;dr I built an election model from scratch as a personal project with the primary goal being to come up with any model, not necessarily the best model given my level of statistical knowledge. The model had a massive Democratic bias and made insanely unrealistic predictions.
Predictions last updated: 08 Sept 24
Standard National Popular Vote
These predictions are based on a situation where the national popular vote projections are correct.
GOP-Overestimated National Popular Vote
In these predictions, projections for the national popular vote overestimate Republican support and the party underperforms.
Dem-Overestimated National Popular Vote
Here, the predictions model a situation where projections for the popular vote overestimate Democrats to the extent that Harris vastly underperforms.
Electoral College Victory Probabilities
The chart below shows the probabilities that either candidate wins the electoral college. The three sets of predictions selected are those that I find to be the most likely scenarios we’ll see in November.
Average Electoral College Victory Probabilities over Time
The chart below tracks the probability of each candidate winning the electoral college over time. An average is taken from the three probabilities above.
State-by-State Probabilities
The table below shows the probabilities for a Democratic or Republican victory in each state based on three of the above predictions. Note that:
Std./High = Standard National Popular Vote, High Reliance on Polling
GOP+/Mod. = Dem-Overestimated (GOP-Overperformance) National Popular Vote, Moderate Reliance on Polling
GOP+/High = Dem-Overestimated (GOP-Overperformance) National Popular Vote, High Reliance on Polling
What changed?
Some quick background info…
My model’s predictions are in fact based on two separate models. The first predicts the state-level vote share for the incumbent and non-incumbent parties based on retrospective voting, and relies particularly strongly on economic data. The second predicts the share of the vote each party receives in each state using demographic data. Both rely strongly on a state’s historic voting pattern and the national popular vote. The results from both models are combined using calculated weights and are supplemented with polling data.
For more details including the specific variables I used, feel free to consult my original post.
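A minimal sketch of how two model outputs can be combined with a weight, not the actual code behind the model. The function name, the single weight, and the example numbers are all assumptions for illustration; the real weights are calculated from the data and are not shown here.

```python
def combine_models(retrospective_share, demographic_share, w_retrospective):
    """Blend two predicted vote shares (in percentage points) for one party.

    w_retrospective is the weight on the retrospective model, between 0 and 1;
    the demographic model receives the remaining weight.
    """
    if not 0.0 <= w_retrospective <= 1.0:
        raise ValueError("w_retrospective must be between 0 and 1")
    return (w_retrospective * retrospective_share
            + (1.0 - w_retrospective) * demographic_share)


# Hypothetical inputs: the retrospective model says 51%, the demographic model 49%,
# and the retrospective model gets a quarter of the weight.
blended = combine_models(retrospective_share=51.0,
                         demographic_share=49.0,
                         w_retrospective=0.25)
print(blended)  # 49.5
```

Lowering `w_retrospective` is one way to express "less emphasis on the economic side" when its input data is stale.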
In my original post about the first iteration of my model, I outlined a few key problems. These include an assumption that variables all have the same effect in every state, difficulty accounting for changes in voting behaviour, the quality of the data, and the calculation of a partisan voting index.
The changes I detail below all attempt to remedy the problems above, and it appears that they have had some success in reducing the Democratic bias.
Changes to the Retrospective Voting Model
A key challenge that the retrospective model faced was a lack of recent data. A lot of economic data for 2024 is not yet publicly available, so I had to rely on data from 2023 and even 2022 when calculating annual changes. To mitigate the effects of this poor-quality data, I decreased the emphasis that the retrospective model placed on economic data.
Measuring Variables using Quantiles
Another issue I faced was that my model did not make any distinctions between election years when analyzing the 2000 to 2020 elections. It treated each election year in each state as its own independent observation (e.g. Alabama 2000). This was a major issue, as it led to the effects of some variables being overestimated.
For example, let’s take religiosity. More religious states tend to favour Republicans, and this effect is very clear when you compare states within each election year. However, if you compare the effect of religiosity across years, that effect is not as strong. So, in the original model, the effect of religiosity was overestimated, since the model could see that states where 60% believed in God were far more Republican than states where 40% believed in God. The model focused on this big difference and ignored the possibility that it might simply be that states that are more religious relative to others favour Republicans more. Given that the overall national trend is towards a less religious country, the large effect extracted from an ‘absolute’ analysis of religion, as opposed to a ‘relative’ one, led the model to exaggerate the swing in favour of Democrats.
To remedy this, I now measure religion through quantiles so that a relative comparison is used instead. The model now predicts a state’s vote share through how religious it is relative to other states in an election year rather than through how religious it is overall. I adopt a similar approach for many of the other demographic variables I measure, such as educational attainment and union membership.
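To make the ‘relative’ idea concrete, here is a minimal sketch that converts a raw variable into a within-year rank between 0 and 1. It is only an illustration: the percentile-rank formula, the function name, and the made-up states and values are assumptions, and the actual model may bin states into quantiles differently.

```python
from collections import defaultdict


def rank_within_year(rows):
    """rows: iterable of (state, year, value).

    Returns {(state, year): rank}, where rank runs from 0.0 (lowest value
    among states in that year) to 1.0 (highest). Ties share a midpoint rank.
    """
    by_year = defaultdict(list)
    for state, year, value in rows:
        by_year[year].append((state, value))

    ranks = {}
    for year, entries in by_year.items():
        values = [v for _, v in entries]
        n = len(values)
        for state, value in entries:
            below = sum(1 for v in values if v < value)
            ties = sum(1 for v in values if v == value) - 1  # exclude itself
            ranks[(state, year)] = (below + 0.5 * ties) / (n - 1) if n > 1 else 0.5
    return ranks


# Made-up religiosity figures: every state drops 10 points between the two years,
# but the ordering of the states does not change.
rows = [
    ("A", 2000, 60.0), ("B", 2000, 50.0), ("C", 2000, 40.0),
    ("A", 2020, 50.0), ("B", 2020, 40.0), ("C", 2020, 30.0),
]
ranks = rank_within_year(rows)
print(ranks[("A", 2000)], ranks[("A", 2020)])  # 1.0 1.0
```

In this toy data, state A is the most religious state in both years, so its relative measure is unchanged even though its absolute figure fell. A model fed the rank would not read the national decline as a swing.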
Recalculating the Partisan Voting Index
This change is fairly simple. In the first iteration of my model, I didn’t give enough weight to the most recent elections. The latest election, for example, was weighted at only 50%. I have now increased its weight to two-thirds. I have also increased the relative weight of the second most recent election, which should give a better picture of a state’s voting habits in recent history.
I suspect that too much weight being given to even older elections previously led former swing states such as Ohio and Iowa to re-emerge as swing states in the first iteration of my model.
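The effect of reweighting can be sketched as a weighted average of a state’s lean (state result minus national result) over past elections. This is a hedged illustration only: the 50% and two-thirds weights on the latest election come from the text above, but how the remaining weight is split, the three-election window, and the example leans are all hypothetical.

```python
def weighted_lean(leans, relative_weights):
    """Weighted average of state-minus-national leans, most recent election first.

    relative_weights are normalised inside the function, so [8, 3, 1]
    means weights of 2/3, 1/4 and 1/12.
    """
    if len(leans) != len(relative_weights):
        raise ValueError("need one weight per election")
    total = sum(relative_weights)
    return sum(l * w for l, w in zip(leans, relative_weights)) / total


# Hypothetical state drifting Republican: Democratic lean in points, newest first.
leans = [-12.0, -10.0, 0.0]

old_weights = [2, 1, 1]  # latest election at 50%; the even split of the rest is assumed
new_weights = [8, 3, 1]  # latest election at two-thirds; the rest is assumed

print(weighted_lean(leans, old_weights))  # -8.5
print(weighted_lean(leans, new_weights))  # -10.5
```

With more weight on recent elections, the hypothetical state looks two points more Republican, which is the direction needed to stop former swing states from re-emerging as tossups.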
Incorporating Polling Data
In the article for the first iteration of my model, I explained that polling data was included because my model, with its focus on economic and demographic variables, didn’t strongly consider the contemporaneous political climate. To mitigate this problem at the time, I incorporated polling data for some key swing states.
In this iteration, I have incorporated even more polling data. The predictions now include polling data for all states.
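One simple way to picture ‘reliance on polling’ is as a blend between the model’s own estimate and a state’s polling average. The sketch below is an assumption about the general shape of such a blend, not the method actually used: the function name, the reliance values attached to ‘moderate’ and ‘high’, and the example numbers are all hypothetical.

```python
# Hypothetical reliance levels: the share of the final estimate taken from polls.
RELIANCE = {"moderate": 0.5, "high": 0.75}


def blend_with_polls(model_share, poll_average, reliance):
    """Mix a model-based vote share with a polling average.

    reliance = 0 ignores the polls; reliance = 1 uses only the polls.
    If a state has no polling (poll_average is None), the model share is kept.
    """
    if not 0.0 <= reliance <= 1.0:
        raise ValueError("reliance must be between 0 and 1")
    if poll_average is None:
        return model_share
    return (1.0 - reliance) * model_share + reliance * poll_average


# Hypothetical state: the model says 52%, the polls average 48%.
print(blend_with_polls(52.0, 48.0, RELIANCE["moderate"]))  # 50.0
print(blend_with_polls(52.0, 48.0, RELIANCE["high"]))      # 49.0
```

The higher the reliance, the more the final number is pulled towards the polls and away from whatever bias the underlying model carries.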
Persistent Limitations
Despite the changes I have made, some of the core issues remain and these limitations will persist. For instance, as much as I try to mitigate them, methodological issues such as the lack of distinction between election years and states will continue to have an effect.
I also struggle to capture changes in voting behaviour effectively. A key consequence of this limitation is that the model is unable to properly ascertain the effect that right-wing populism has on certain demographic groups, which political sociologists might term ‘modernization losers.’ The issue with this ‘populism problem’ is that in the period that I based my model on (2000 to 2020), only two cycles feature a right-wing populist. Moreover, even if I did include right-wing populism in my model, I am unsure of how I would operationalize such a variable. Regardless, I have not yet been able to incorporate the appeal of populism into my model, and I believe that this explains the model’s favouring of Democrats in the rust belt swing states and Florida. There are groups in these states that previously reliably voted Democrat but in recent years have pivoted to Republicans given Trump’s populist appeal.
In my original article, I wrote that:
…you could also argue that 2016 marked the end of [the political climate that began in 2000].
This is what I meant. The 2016 election turned states that had previously been swing states into safe states, and states that had previously been safe states into swing states. This upheaval caused by right-wing populism is something that my model struggles to account for.
One other issue that I think the model has is that it overestimates the rate at which some changes in voting behaviour occur. Some states such as Texas, Arizona and Georgia are certainly taking on demographic profiles that mirror those of traditionally Democratic states. And yes, demographics and social cleavages undoubtedly play a significant role in determining partisan preferences. However, other factors do so as well. Many individuals in these states were still socialized in a manner conducive to Republicans. Accordingly, even though the states’ demographic profiles are starting to match a ‘Democrat’ profile, that doesn’t mean voters are switching their votes immediately. Overall, I think this issue means that the model provides a good image of where potential gains for Democrats lie in the future. Indeed, many commentators predict that Texas will turn blue in the next decade.
What’s next?
For the time being, I will refrain from making any further changes to my model. I have exhausted the changes I can think of so far, and any additional changes would require either a substantial revision to the methodology or the collection of additional data (e.g. to address omitted variables), both of which would require a substantial time commitment. Quite frankly, I think I have sunk enough time into this project over the past several days and it is time for me to carry on with the other things I have going on. However, if anyone does have any suggestions for how to improve my model, I’d be more than happy to hear them and even find the time to incorporate them.
I nevertheless have some good news if you are, for some wild reason, very interested in my rather limited, amateur election model. I have made the process of updating my model far more efficient, so I will now be able to update it far more often than once weekly. Feel free to check back to this page fairly regularly. I indicate the date of the latest update to the model at the end of the introduction.
If you’ve gotten this far into my post, I’d like to extend my sincerest gratitude for your interest in my work. If you want to check out the full results in a spreadsheet, click here. As mentioned above, I intend to update the model a few times a week, so please check back from time to time.
If you have any questions or comments, please feel free to message me @jandthejuls on literally any social media platform (e.g. Twitter, Instagram, Discord).