How to predict an election result

While it is undeniable that Barack Obama won the 2012 US Presidential Election, many people believe the real winner was the discipline of Statistics - or Big Data or Quantitative Analysis or whatever you want to call it!

Most prominent among a number of successful election forecasters who used quantitative techniques was Nate Silver of the New York Times' FiveThirtyEight blog.

Nate Silver's election predictions

Before the 2008 election, Silver, who has a background in Sabermetrics (the pointy end of sports statistics applied to baseball), provided forecasts for each of the 50 states in the US Electoral College. To the chagrin of many so-called experts and political pundits, he got the result correct in 49 of the 50 states.

In the lead-up to the 2012 election, many pundits, and the news media in general, were claiming the election was 'down to the wire', 'neck and neck' and 'too close to call'. And they were lining up to give Silver a metaphorical beating.

"Nate Silver could be a one-term celebrity" said one commentator. Others described him as "overrated" and "smoking the wacky weed".

One conservative commentator railed against Silver's modelling, which was predicting a greater than 90% chance of an Obama win: "anybody that thinks that this race is anything but a toss-up right now is such an ideologue, they should be kept away from typewriters, computers, laptops and microphones for the next ten days, because they're jokes."

Unfortunately for his critics, Silver (and other like-minded statistical modellers) had the last laugh. In a triumph for data analysis over "gut feeling", Obama was the clear winner, and Silver got the electoral college result correct in all 50 states.

So how did he do it?

According to the methodology published on his blog and elsewhere, Silver uses a number of techniques familiar to undergraduate statistics students. Working state-by-state, he uses a weighted average of all the available polls. Polls that are less recent, historically less reliable, or based on a relatively small sample size receive a lower weighting. The averaged poll results are then adjusted for bias, recent trends in national polling and the likelihood that respondents will actually vote (unlike in Australia, voting is not compulsory in the US).
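The weighting idea can be illustrated with a minimal sketch. This is not Silver's actual formula: the `Poll` fields, the 14-day half-life, the square-root sample-size term and the 0-to-1 reliability rating are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    margin: float       # candidate's lead in percentage points
    days_old: int       # days since the poll was taken
    sample_size: int    # number of respondents
    reliability: float  # hypothetical pollster rating between 0 and 1


def poll_weight(poll, half_life_days=14.0):
    """Weight a poll lower if it is older, smaller or less reliable."""
    # Assumed recency rule: the weight halves every `half_life_days`.
    recency = 0.5 ** (poll.days_old / half_life_days)
    # Assumed size rule: precision grows with the square root of the sample.
    size = math.sqrt(poll.sample_size)
    return recency * size * poll.reliability


def weighted_poll_average(polls):
    """Return the weighted average margin across a state's polls."""
    weights = [poll_weight(p) for p in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("no usable polls")
    return sum(w * p.margin for w, p in zip(weights, polls)) / total
```

With this sketch, a fresh poll of 1,000 respondents from a highly rated pollster pulls the average much harder than a three-week-old poll of 400 from a poorly rated one. The later adjustments for bias, national trends and likely voters are not modelled here.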

In addition, certain political, financial, religious, ethnic and demographic factors are also combined using regression analysis to give an "ideal" poll result for each state. This ideal poll is combined with the actual poll results and is used to stabilise the predictions, particularly in states where actual polling data is lacking.
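As a rough illustration of how a regression-based "ideal" poll might stabilise a thinly polled state, the sketch below fits a one-variable least-squares line and then blends its prediction with the poll average. The single predictor, the `prior_strength` parameter and the blending rule are assumptions made for this example; the published methodology combines many more factors and does not necessarily blend in this way.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def blend(poll_average, n_polls, ideal, prior_strength=3.0):
    """Combine actual polling with the regression-based 'ideal' poll.

    The more polls a state has, the more the blend trusts them.
    `prior_strength` is a hypothetical tuning value: the number of polls
    at which polls and regression count equally.
    """
    if n_polls == 0:
        return ideal  # no polling data, so rely on the regression alone
    w = n_polls / (n_polls + prior_strength)
    return w * poll_average + (1 - w) * ideal
```

Here `fit_line` would be trained on states with plenty of polling (for example, a demographic index against the polled margin), and its prediction for a sparsely polled state would be passed to `blend` as `ideal`.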

In the run-up to polling day, the predictions are updated daily. Monte Carlo simulation is used to run a "virtual" US election 10,000 times. In this way, the (average) predicted electoral outcome and, most importantly, the margin of error can be calculated.
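A minimal sketch of such a simulation is shown below. It assumes each state is described by its electoral votes, a predicted margin and a standard deviation, and that state errors are independent and normally distributed. That independence is a simplification for illustration, not a description of Silver's model.

```python
import random
import statistics


def simulate_elections(states, n_sims=10_000, votes_to_win=270, seed=0):
    """Run many virtual elections and summarise the outcomes.

    `states` is a list of (electoral_votes, mean_margin, std_dev) tuples,
    where a positive margin means the candidate leads in that state.
    Returns (mean electoral votes, standard deviation, win probability).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    totals = []
    for _ in range(n_sims):
        electoral_votes_won = 0
        for electoral_votes, mean_margin, std_dev in states:
            # Draw one plausible result for this state; the candidate
            # takes all of its electoral votes if the drawn margin is positive.
            if rng.gauss(mean_margin, std_dev) > 0:
                electoral_votes_won += electoral_votes
        totals.append(electoral_votes_won)
    win_probability = sum(t >= votes_to_win for t in totals) / n_sims
    return statistics.mean(totals), statistics.stdev(totals), win_probability
```

The mean of the simulated totals gives the predicted outcome, the spread of the totals gives the margin of error, and the share of simulations reaching 270 electoral votes gives the headline chance of winning.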

The success of this approach is there for all to see.

Many observers are now predicting that the media coverage of US politics will be changed forever. The nerds have conquered Washington!

Dr. Ian Grundy - Program Leader Undergrad Maths and Stats Programs (RMIT)
