My incredibly amateur 2024 US Presidential Election model
Join me on my adventure to design an election model from scratch.
Hi all, I’m Julian and I’m a second-year student studying History and Politics at Oxford.
Below, please find an election model on the 2024 US Presidential Election that I created, an explanation of how it works, and some discussion of its more-than-noticeable Democratic lean.
Importantly, please don’t take my model too seriously.
When developing it, my goal was first and foremost to build a model and then to deal with how rigorous it was. This model is based on my statistical knowledge, which I am more than willing to admit is somewhat limited. I simply hoped to construct some model, not necessarily the best model.
I make nine predictions in total. My model is very dependent on the estimated national popular vote, so I provide predictions where it matches current polls and projections, and predictions where it is skewed in favour of each party. For each popular vote scenario, I also make predictions that weigh opinion polling in swing states to various extents. My goal is to update the model at least once every week.
As I have already mentioned, my model is limited and I disagree with many of its predictions. So, if you are going to look at them, I highly suggest you skip all the predictions until the Dem-Overestimated, High Poll Reliance model as the others are painfully unrealistic.
EDIT: This version is now deprecated and will not be updated. Please see my second iteration here. It will be regularly updated.
Table of Contents:
Introduction
Predictions
Model last updated: 02 Sept 2024
Standard National Popular Vote
Please ignore this section! Well not really ignore, but please don’t take this section’s results seriously! The model leans to the Dems so much that it embarrasses me. Iowa? Ohio? Texas? Tossups? So unrealistic. So, keep on scrolling to the very bottom where I discuss what I think could’ve gone wrong.
GOP-Overestimated National Popular Vote
Keep on scrolling! The results in this section feel even LESS probable. The embarrassment just continues to mount, my goodness. These results make me want to cry.
In all seriousness, if you are still reading, these results are what the model predicts if the GOP vote is overestimated to the extent that the Dems win the popular vote by nearly 5 points.
Dem-Overestimated National Popular Vote
Here, you’ll find the predictions if the Democratic popular vote is overestimated and the GOP popular vote is underestimated, giving a margin of around 1.5 points in favour of the Democrats. Such a margin come election day may not be realistic. However, I find the state-by-state predictions here to be far more likely than any of the above, especially those of the first projection, which relies strongly on polling. Nevertheless, I still find these predictions to be rather unrealistic and exceedingly generous to the Dems.
How the Model Works
Overview
Simply, my model is based on an analysis of Presidential election results from 2000 to 2020. When using the model to predict each state in those elections, states are correctly predicted at a rate of 96%. In 2020, only Georgia and Arizona were incorrectly predicted. However, elections and voting behaviour are complex matters and my model only covers the tip of the iceberg. So again, please don’t take my model too seriously as it has many limitations.
I chose the elections from 2000 to 2020 for two reasons. Firstly, going before 2000 would make my life so much harder because I would have to find even more historical records for the variables mentioned below. It was already incredibly time-consuming gathering all that economic and demographic data, and including elections prior to 2000 would’ve taken even more time. Secondly, I find that the 2000 election marked the beginning of our current political climate (though you could also argue that 2016 marked the end of it too). Prior to 2000, elections featured prominent third-party candidates, Clinton’s appeal in many southern states, and Reagan’s landslide performances. For these reasons, I stick to elections from 2000 to 2020.
When designing my model, I collected data for the following variables:
Vote share each party obtained at the state and national level
Each state’s historic voting patterns
Whether the incumbent party had been in power for 8 years
Unemployment rate and change in unemployment rate at both the state and national level
Change in GDP per capita at both the state and national level
Inflation and change in inflation
Change in median household income at the state and national level
University of Michigan’s Consumer Sentiment Index and change in the index
Size of each state’s manufacturing sector
Each state’s median age
Size of each state’s urban population
Size of each state’s Black, Asian, Hispanic, and White populations
Size of each state’s evangelical population
Religiosity of each state
Rate of union membership in each state
Each state’s poverty rate
Each state’s educational attainment
Each state’s income decile (compared to other states)
Now that’s a lot of variables and it took me a lot of time to gather all that data! In the end, I did not use all of the variables as my analysis discovered that not all of the variables had a significant effect on vote share. Moreover, a lot of variables are highly correlated, especially those that are interlinked such as national and state unemployment rates. There are, of course, other variables I would’ve loved to include (e.g. favourability ratings) but the process of gathering that data seemed so tedious that it was not worth it.
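A minimal sketch of how one might check whether two candidate predictors are too correlated to include together. The numbers are invented toy values rather than the model's actual data, and the 0.8 cut-off is a hypothetical threshold chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy values (percent), invented for illustration only.
national_unemployment = [4.0, 5.5, 6.0, 8.0, 4.9, 6.7]
state_unemployment = [3.8, 5.9, 5.6, 8.4, 4.5, 7.0]

HIGH_CORRELATION = 0.8  # hypothetical cut-off, not a value from the model

r = pearson(national_unemployment, state_unemployment)
print(f"r = {r:.2f}; consider dropping one variable: {abs(r) > HIGH_CORRELATION}")
```

When two variables move together this closely, a model cannot reliably separate their effects, which is one reason to keep only one of them.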
This data was fed into two separate models, a retrospective voting model and a partisan voting model. The former predicts the incumbent and non-incumbent parties’ vote share whilst the latter predicts the Democratic and Republican parties’ vote share. At the end, these are combined with polling data to create the predictions above.
Retrospective Voting Model
In political science, retrospective voting is the idea that voters make their choice based on a reflection of the incumbent party’s performance. If they performed poorly, in comes a new government. If the incumbent government performed well, they are kindly rewarded with another term in office.
When studying the United States, retrospective voting is particularly useful given its majoritarian decision-making model, best exemplified by the two-party system. In theory, with there being two parties as opposed to complicated, messy coalitions, it is incredibly easy for voters to identify the political actor responsible for the current social, political, and economic climate. This concept is called ‘clarity of responsibility.’
In my model, I use economic indicators on the state and national level to predict parties’ vote share. However, instead of predicting it for the Democratic or Republican Party, I instead predict the vote share for the incumbent and non-incumbent party. The idea is that if the economic climate is poor, then voters may choose to punish the incumbent party and usher in the non-incumbent party.
I also include whether a party has served in office for eight years as a variable as there is a clear historical trend of parties being booted out of office after eight years.
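To make the idea concrete, here is a rough sketch of what a retrospective predictor could look like. It assumes a simple linear form and a two-party vote share, neither of which is confirmed above, and every coefficient is a hypothetical placeholder rather than a fitted value from the model.

```python
from dataclasses import dataclass


@dataclass
class StateYear:
    """One state in one election year (field names are illustrative)."""
    unemployment_change: float  # percentage-point change in unemployment
    gdp_pc_growth: float        # % change in GDP per capita
    eight_years: bool           # incumbent party in office for eight years?


# Hypothetical coefficients, NOT the values estimated by the actual model.
COEF = {
    "intercept": 50.0,
    "unemployment_change": -1.0,
    "gdp_pc_growth": 0.5,
    "eight_years": -2.0,
}


def incumbent_share(obs: StateYear) -> float:
    """Predicted two-party vote share (%) for the incumbent party."""
    return (
        COEF["intercept"]
        + COEF["unemployment_change"] * obs.unemployment_change
        + COEF["gdp_pc_growth"] * obs.gdp_pc_growth
        + COEF["eight_years"] * (1.0 if obs.eight_years else 0.0)
    )


def non_incumbent_share(obs: StateYear) -> float:
    """Under the two-party assumption, the challenger gets the remainder."""
    return 100.0 - incumbent_share(obs)


good_economy = StateYear(unemployment_change=-0.5, gdp_pc_growth=2.0, eight_years=False)
bad_economy = StateYear(unemployment_change=1.0, gdp_pc_growth=-1.0, eight_years=True)
print(incumbent_share(good_economy), incumbent_share(bad_economy))
```

The point of predicting incumbent rather than Democratic share is that the same coefficients apply whichever party happens to hold the White House.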
Partisan Voting Model
This model is simple. It attempts to predict the state-by-state vote share of each party, the Democrats and the Republicans, based on their past performance in each state, the national vote share and a set of demographic data. It is perhaps what most people first think of when they think of an election model.
Bringing it all Together
In the end, my model combines the results generated by the retrospective and partisan voting models to create the most accurate results possible. When used to predict each state’s election results from 2000 to 2020, the retrospective model succeeded 92% of the time and the partisan model succeeded 93% of the time. However, when the two are combined using calculated weights, the accuracy rises to 96%.
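The combination step can be sketched as a weighted average, with accuracy measured as the share of states whose winner is called correctly. This is only an illustration under assumptions: the function names, the two-party simplification, and the 0.25 weight are all hypothetical, and the actual calculated weights are not given here.

```python
def to_dem_share(incumbent_share: float, incumbent_is_dem: bool) -> float:
    """Convert an incumbent-party two-party share (%) into a Democratic share."""
    return incumbent_share if incumbent_is_dem else 100.0 - incumbent_share


def combine(retro_dem: float, partisan_dem: float, w_retro: float) -> float:
    """Weighted average of the two models' Democratic share predictions."""
    if not 0.0 <= w_retro <= 1.0:
        raise ValueError("w_retro must be between 0 and 1")
    return w_retro * retro_dem + (1.0 - w_retro) * partisan_dem


def hit_rate(predicted_dem, actual_dem) -> float:
    """Share of states whose predicted winner matches the actual winner."""
    if len(predicted_dem) != len(actual_dem) or not predicted_dem:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum((p > 50.0) == (a > 50.0) for p, a in zip(predicted_dem, actual_dem))
    return hits / len(predicted_dem)


# Toy example: the retrospective model gives a Republican incumbent 54%.
retro_dem = to_dem_share(54.0, incumbent_is_dem=False)
combined = combine(retro_dem, partisan_dem=50.0, w_retro=0.25)  # hypothetical weight
print(combined)
```

One way to pick the weight would be to try several values and keep the one with the highest `hit_rate` on past elections, though that risks fitting the past too closely.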
I must reiterate: this stamp of approval from the past is in no way an endorsement of my model. It only shows that the model is at least rooted in some form of reality, which is a low bar. It is still obvious that my model yields some, if not many, implausible results.
I also combine my model’s predictions with polling data. This is crucial because my model doesn’t strongly consider the contemporaneous political climate given its focus on economic and demographic variables. It is important to note that polling data is only used for swing states as it is hard to find recent, high-quality polls for safe states. Besides, extra precision is not needed for safe states given that the expected winner is already so obvious. The same clearly cannot be said for swing states.
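A minimal sketch of how polling could be mixed in for swing states only. The linear blend, the function name, and the example weight are assumptions for illustration; the exact blending rule is not spelled out above.

```python
from typing import Optional


def blend_with_polls(
    model_margin: float,
    poll_margin: Optional[float],
    poll_weight: float,
) -> float:
    """Return a Dem-minus-Rep margin (points) after mixing in a polling average.

    poll_margin is None for safe states, where no polls are used.
    """
    if not 0.0 <= poll_weight <= 1.0:
        raise ValueError("poll_weight must be between 0 and 1")
    if poll_margin is None:
        return model_margin
    return (1.0 - poll_weight) * model_margin + poll_weight * poll_margin


# Toy numbers: the model says D+6 in a swing state, the polls say D+2.
swing = blend_with_polls(model_margin=6.0, poll_margin=2.0, poll_weight=0.75)
safe = blend_with_polls(model_margin=12.0, poll_margin=None, poll_weight=0.75)
print(swing, safe)
```

A higher `poll_weight` pulls a swing state's margin towards the polls, while safe states are left exactly as the model predicts them.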
Lastly, I also run my model at three levels of the national popular vote. The national popular vote is an important part of my model, but the only figures available prior to the election are estimates and projections based on forecasts and opinion polling. As such, I run the model for a popular vote margin that is wider and narrower than expected. For each popular vote margin, I also provide predictions that weigh opinion polling to a low, moderate, and high extent.
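The nine predictions are simply every pairing of three popular vote scenarios with three levels of poll reliance. A short sketch of that grid follows; the labels come from the section names above, while the list and function names are hypothetical.

```python
from itertools import product

POPULAR_VOTE_SCENARIOS = ["Standard", "GOP-Overestimated", "Dem-Overestimated"]
POLL_RELIANCE = ["Low", "Moderate", "High"]


def scenario_labels():
    """Every combination of popular vote scenario and poll reliance level."""
    return [
        f"{scenario}, {reliance} Poll Reliance"
        for scenario, reliance in product(POPULAR_VOTE_SCENARIOS, POLL_RELIANCE)
    ]


labels = scenario_labels()
for label in labels:
    print(label)
```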
Discussion and Limitations
I think it’s quite obvious by this point that I strongly disagree with many of my model’s predictions, which is why I have urged all of you to focus on the prediction based on an underestimated Republican popular vote share and a strong reliance on polls—a prediction that is nonetheless problematic. In this section, I’ll discuss in more depth how my model functions and why I believe my model leads to such a strong lean in favour of the Democrats. Harris ain’t winning Texas, Iowa, Ohio, Florida! And, she’s not gonna get massive margins in swing states.
Methodological Issues
Firstly, my model treats each state and its corresponding presidential result in an election year as a single observation. That gives me in total 306 observations (six elections and 51 “states”) on which my model is based. The problem with this is that the model makes no distinction between states and no distinction between years.
Given that the model makes no distinction between states, it assumes, for example, that changes in economic circumstance would prompt the same result in every state. Or, that median age will always have the same relative impact compared to union membership in every state when predicting electoral outcomes. Is this realistic? Probably not. However, given my limited statistical knowledge, my model unfortunately operates under this assumption that every state behaves the same.
Changes in Voting Behaviour
As mentioned, my model also doesn’t make any distinction between election years, given my level of statistical knowledge. This was, again, just a fun project for me where I wanted to construct some model, not necessarily the best model.
This lack of distinction between years may not seem like a major problem at first. After all, isn’t the goal of an election model to predict the future based on what we learnt about the past? However, the inability to distinguish between years leads to a critical problem: inability to account for voting behaviour change.
Undeniably, over the past few decades there have been changes in voting behaviour. From changing cleavage systems, the rise of values-based voting and increased cognitive mobilization to partisan dealignment, voters do not vote the same way anymore. And these changes happen over time.
To give you a more concrete example of why this is problematic, consider the role of union membership as a predictor of electoral outcomes. Perhaps, in the past, union members did have a strong partisan affiliation and states with more union members could be expected to vote a certain way. However, in the present, that strong partisan affiliation might be weaker. Unfortunately, my model would not capture this change as it does not distinguish between years.
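To spell out the two pooling assumptions, suppose the model is a linear regression (an assumption made for illustration, since no functional form is stated above). Treating every state-year as an interchangeable observation amounts to the first equation below; the second shows one possible relaxation. All symbols are introduced here for illustration: y is the vote share in state s and year t, x is the vector of predictors, and epsilon is the error term.

```latex
% Pooled model: a single intercept and coefficient vector for every state s and year t
y_{s,t} = \alpha + \boldsymbol{\beta}^{\top} \mathbf{x}_{s,t} + \varepsilon_{s,t}

% One possible relaxation: state-specific intercepts and year-varying coefficients
y_{s,t} = \alpha_{s} + \boldsymbol{\beta}_{t}^{\top} \mathbf{x}_{s,t} + \varepsilon_{s,t}
```

In the second form, the coefficient on union membership could be large in 2000 and small in 2020, which is exactly the kind of change the pooled form cannot express. The catch is that every extra coefficient has to be estimated from the same 306 observations.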
Quality of Data
There are two factors at play here: accessibility and recency.
Firstly, let’s start with accessibility. The data I seek very likely does exist. However, that does not mean it is always easily accessible. Some of this data may be buried deep inside other regularly conducted surveys that are not as easily accessible as US Census data. As such, for some variables, I chose to interpolate and extrapolate values based on the data and trends I did have. Moreover, inaccessibility also limited the variables I could work with. One possible reason for the evident Democratic tilt, in addition to those that I have already mentioned, is that there is some omitted variable I have not accounted for.
Secondly, there’s the problem of recency. The year is currently 2024. However, it’s only September. That means a lot of the data I seek is not yet available for 2024. Data from this year’s American Community Survey will only be released next year! That means that a lot of the data I used to predict the election results is not current and instead dates from 2022-2023. Obviously, this limits the model’s accuracy.
Partisan Voting Index
Besides being dependent on the national popular vote, the model also relies strongly on each state’s historic voting patterns. I construct my own partisan voting index using past election results. Upon a review of my data, I find that I may have weighed older elections a bit too much, which would explain why old swing states like Iowa and Ohio are now in contention.
When designing my own index, I did review the Cook PVI, which is perhaps one of the more popular, well-known measures. However, I was not a massive fan of it only relying on the past two elections. (Also, you have to pay to access the data as a .csv.) It does seem though that I overcorrected, as these are the index values I have for some key states:
Arizona: R+3.03
Florida: R+1.91
Georgia: R+3.36
Iowa: R+5.02
Michigan: D+3.93
Nevada: D+3.44
North Carolina: R+2.44
Ohio: R+5.44
Pennsylvania: D+1.97
Texas: R+9.04
Wisconsin: D+2.04
In the coming week, I think I will definitely try to tweak my calculations for the index. However, two states that seem troublesome are Arizona and Georgia. If I weigh the 2020 election more than I already do, then Democrats would only be favoured more and I already find the predictions in both states to be unrealistic. But, it is perhaps more important that the overall results be more realistic than just those two states in particular. When I do devise a new index, I will definitely re-run the model and publish the new predictions.
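The weighting trade-off can be illustrated with a small sketch of a partisan index built as a weighted average of how far a state's margin sat from the national margin in each past election. This is one plausible construction, not necessarily the formula behind the values listed above, and the margins and weights are invented toy numbers rather than real election results.

```python
def partisan_index(state_margins, national_margins, weights):
    """Weighted average of (state margin - national margin), in points.

    Margins are Dem minus Rep, ordered from oldest to most recent election.
    """
    if not (len(state_margins) == len(national_margins) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_lean = sum(
        w * (s - n) for s, n, w in zip(state_margins, national_margins, weights)
    )
    return weighted_lean / total_weight


def format_index(lean: float) -> str:
    """Render a lean as, for example, 'R+3.00' or 'D+1.50'."""
    party = "D" if lean >= 0 else "R"
    return f"{party}+{abs(lean):.2f}"


# Toy state that has drifted towards the Democrats over three elections.
state = [-4.0, -2.0, 1.0]    # invented Dem-minus-Rep margins
national = [2.0, 2.0, 2.0]   # invented national margins

even_weights = format_index(partisan_index(state, national, [1, 1, 2]))
recent_heavy = format_index(partisan_index(state, national, [1, 1, 6]))
print(even_weights, recent_heavy)
```

For a state trending Democratic, putting more weight on the latest election moves the index towards the Democrats, which is the Arizona and Georgia problem in miniature.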
If you’ve gotten this far, thanks so much for checking out my election model! If you want to check out the full results in a spreadsheet, click here. My hope is to update the model weekly, so please check back from time to time.
If you have any questions or comments, please feel free to message me @jandthejuls on literally any social media platform (e.g. Twitter, Instagram, Discord).