Introducing the OxMethod.

In 2004, I ran a site called MyElectionAnalysis.com.  I used a method somewhat similar to that now used by Nate Silver at fivethirtyeight.com.  I am now resurrecting it. 

It is basically a stripped-down version of fivethirtyeight.  I use only polling numbers.  There is no trendline analysis, nor is there (importantly imho) any demographic fiddling around.  There are a few reasons for this.  First, in the Democratic primary, a lot of the variables were nonlinear (e.g. Obama did well in heavily R and heavily D counties), and unsuited to an OLS model.

Second, a lot of the sixteen variables he uses strike me as related to each other (eg Kerry's vote performance and AA%).  This causes problems in an OLS model. 

Third, I'm not certain how useful some of them are -- such as donations -- given that Obama has actually outraised McCain in Arizona this year. 

 Finally, I think his weighting by sample size, while making some sense, results in some bad weighting.  For example, PPP -- a partisan polling outfit that uses 1,000 respondents -- is weighted more heavily than the historically very accurate Rasmussen Reports, which only uses 500 respondents.

Unlike RCP, I do weight the polls, and I explain the methodology at the bottom.

 


 

 

McCain trails Obama narrowly -- 276-262.  The closest state is CO, where McCain trails by a narrow .06%.  In the next couple days this will likely flip to McCain, as older polls continue to cycle out.  This would give McCain the lead.  At this point in 2004, by the way, Kerry was leading with over 300 electoral votes in his pocket.

Weighting the state polls by their turnout in 2004, Obama has a 46.3%-44.3% popular vote lead.  This is consistent with the weighted average of national polls, which shows Obama winning 45.5%-42.9%.

Interestingly, if you normalized the 2004 map to a Kerry-Bush national tie (i.e. moved all results 2.5 points in the Dem direction) and normalized the 2008 map to an Obama-McCain national tie (i.e. moved all results about 2.5 points in the Rep direction), you end up with a remarkably stable map.  Only four states move more than 6 points toward the Republicans (accounting for a 3% MOE) -- MA, LA, MD, and AZ.  Quite a few more states move more than 6 points toward the Democrats -- TX, VT, NE, IN, SD, WY, AK, HI, MT, ID, ND, and UT -- representing the relative movement of the Mountain West, plus VT and TX (in large part probably due to the absence of favorite son Bush at the top of the ticket) toward Obama.
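For anyone who wants to try the normalization themselves, here is a minimal sketch of the "shift everything to a national tie" step. It assumes margins are written the way the state table is (positive means McCain leads, negative means Obama leads), and the function name `normalize_to_tie` is just something I made up for illustration. The three state margins are from the 8/21 numbers; the national margin of -2.5 is the "about 2.5 points" mentioned above.

```python
def normalize_to_tie(margins, national_margin):
    """Shift every state margin so the national margin becomes zero.

    margins: dict of state -> margin in points (positive = Republican lead).
    national_margin: national margin on the same sign convention.
    """
    return {state: round(m - national_margin, 1) for state, m in margins.items()}


# A few 8/21 state margins (positive = McCain lead).
margins_2008 = {"CO": -0.1, "OH": 3.0, "VA": 0.6}

# Obama leads nationally by about 2.5, so the national margin is -2.5
# and every state moves 2.5 points in the Rep direction.
normalized = normalize_to_tie(margins_2008, national_margin=-2.5)
print(normalized)  # CO goes from -0.1 to 2.4, OH from 3.0 to 5.5, VA from 0.6 to 3.1
```

Doing the same thing to the 2004 results (with the shift going the other way) and subtracting the two normalized maps gives the state-by-state movement discussed above.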

22 states are within 3 points of their normalized 2004 vote result.

Anyway, here's where the states stand.  I'll be updating every week or so, to see trends:

 

State  8/21 (margin in points: positive = McCain lead, negative = Obama lead)
AL 17.7
AK 4.4
AZ 13.2
AR 10.0
CA -15.9
CO -0.1
CT -15.5
DE -9.0
FL 2.7
GA 8.8
HI -30.0
ID 13.0
IL -14.5
IN 5.9
IA -6.6
KS 15.8
KY 16.0
LA 18.2
ME -12.2
MD -10.0
MA -14.5
MI -4.1
MN -6.0
MS 10.6
MO 4.9
MT -1.0
NE 18.9
NV 1.4
NH -1.8
NJ -9.4
NM -6.0
NY -18.3
NC 4.5
ND 1.8
OH 3.0
OK 32.0
OR -7.0
PA -5.6
RI -24.9
SC 11.3
SD 4.0
TN 15.0
TX 8.6
UT 19.0
VT -34.0
VA 0.6
WA -10.5
WV 8.0
WI -7.3
WY 19.0

Here is a more detailed explanation of my methodology.

 

Assumptions

My Presidential model proceeds upon the assumption that there are two relevant factors when evaluating a poll: the poll’s accuracy and its date. Putting it another way, assuming two polls are taken at the same time, you would obviously give greater weight to the one that was historically more accurate. However, if the historically accurate poll occurred two months ago, you might give greater weight to the less accurate poll.

Basic Methodology

For those not interested in the details of the formula, this gives a brief overview of how that formula works. Basically, all polls for a particular state are collected. There are a number of places that you can get these polls from; aside from checking the websites for major polling companies like Survey USA ("SUSA"), there are also plenty of other sites that collect them (e.g. RealClearPolitics and Hedgehog Report).

After examining the polls, each poll result is assigned a weight based on the historical performance of the polling company. The poll is also weighted by date, with the weight declining in such a manner that all polls older than 60 days are given no weight. Averaging the weighted numbers gives the candidate’s overall performance. There is no subjective reasoning; even if the weighted poll results seem counter-intuitive, they are what I use. As a result, although the latest poll results get the most weight, they aren't dispositive, and the state designations change more gradually than at other sites. After the Republican Convention, I will shorten the time period used to 30 days. In the last three weeks, I will shorten the time period to 15 days.

Specific Methodology

The model incorporates these ideas by weighting polling data by accuracy and by date. The poll accuracy weight is determined as follows:

  • The poll is assigned a weighted accuracy score based on Survey USA's Election Scorecard. There, SurveyUSA has used a statistical analysis comparing its polling results to the results of other polling companies, where the two have polled the same race.

     

  • Consider a hypothetical polling company: "MetaPoll." The weighted accuracy score combines two comparisons with SUSA. First, it looks at the difference between MetaPoll's predicted spread and the actual spread. In other words, if MetaPoll predicted Kerry +5 and the result was Bush +1, MetaPoll's error was 6 (it's an absolute value). An average is taken of all of these errors to give MetaPoll's average error on the spread. Thankfully, SUSA computes this for many major pollsters. I then also look at the difference between MetaPoll's predicted candidate share and the actual share. Thus, if MetaPoll predicted Kerry to get 54% of the vote, and he got 52% of the vote, the error would be 2%. I then average the two errors. This rewards companies that get the spread right and that get close to the actual results, because a poll that predicted Bush to win 33% to 30% got the spread right but was pretty much junk in my opinion. Finally, I subtract SUSA's mean error in contests polled by both SUSA and MetaPoll. In other words, if MetaPoll had a mean error of 8.26, and SUSA had a mean error (when competing with MetaPoll) of 3.96, the result would be 4.3. Survey USA is subtracted against itself, for a value of 0.

     

  • In order to give the baseline a value of 1, one is added to all the results.

     

  • The problem with this range is that it would actually weight the more inaccurate polls more heavily when multiplied against the result. To correct this, the inverse of the number is taken. To compress the range, so that polls close to SUSA's results end up with weights close to SUSA's, the cube root of the inverse is also taken. This results in a range from .47 (Marist) to 1.14 (Gallup).

     

  • A few polls do not fit into this methodology. These are polls that perform much better than SUSA. Because the mean difference between the two results in a negative number here, their results after going through the formula are negative as well. In order to correct for this, they are assigned weights of 1.3, which places them about as far above Survey USA as a poll that was a similar degree worse (2 points or so) would be below SUSA.

     

  • Finally, the weight is multiplied by 100. This is of no consequence, as it is a linear operation that is later factored out in averaging. It just makes the number easier to remember. :-)

     

  • As a final matter, some polls do not have a reference. These polls are arbitrarily assigned a weight of 40, which is somewhat less than the worst poll. A partisan poll (e.g. a company that only polls for one party) is assigned a 30. Online polling (e.g. Zogby Interactive) is ignored.
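To make the steps above concrete, here is a rough sketch of the accuracy weight as code. The function and constant names are my own inventions for illustration, and the inputs are assumed to already be the averaged errors (spread error and share error averaged together) for the pollster and for SUSA in the contests they both polled. The numbers in the usage lines are the MetaPoll example from above.

```python
# Fixed weights on the x100 scale, taken from the bullets above.
MUCH_BETTER_THAN_SUSA = 130.0  # the "1.3" assigned when the formula goes negative
NO_REFERENCE = 40.0            # pollster with no scorecard entry
PARTISAN = 30.0                # pollster that only polls for one party


def accuracy_weight(pollster_error, susa_error):
    """Accuracy weight for a pollster, on the x100 scale.

    pollster_error: pollster's mean error (spread and share errors averaged).
    susa_error: SUSA's mean error in contests both companies polled.
    """
    # Difference from SUSA, plus one so that SUSA itself sits at 1.
    baseline = (pollster_error - susa_error) + 1.0
    if baseline <= 0:
        # The formula breaks down for polls much better than SUSA,
        # so they get the fixed weight instead.
        return MUCH_BETTER_THAN_SUSA
    # Inverse so that worse polls weigh less, cube root to compress the range.
    return 100.0 * (1.0 / baseline) ** (1.0 / 3.0)


metapoll = accuracy_weight(8.26, 3.96)  # roughly 57
susa = accuracy_weight(3.96, 3.96)      # the baseline, 100
```

The no-reference and partisan weights are not computed at all; they would simply be looked up in place of calling the function.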

Computing the time weight is much easier. It is (for now) simply 60 minus the number of days old a poll is. Thus, a 2-day old poll is weighted at 58, while a 30-day old poll is at 30. As a result, polls that are within a month are weighted roughly the same. After that, however, the relative weight of the poll drops off dramatically.
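In code, the time weight is about as simple as it sounds. This is just a sketch; the name `time_weight` and the `window` parameter are mine, with `window` standing in for the 60-day period (later 30 and then 15 days, as described in the overview).

```python
def time_weight(days_old, window=60):
    """Weight for a poll's age: window minus age in days, never below zero."""
    return max(0, window - days_old)


print(time_weight(2))   # 58
print(time_weight(30))  # 30
print(time_weight(75))  # 0, the poll has cycled out
```

Shortening the window is just a matter of passing `window=30` or `window=15`.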

 

The two weights are multiplied together, in order to get the final weight for a result. The technique for taking a weighted average is defined here.
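Putting it together, here is a minimal sketch of the final step, assuming the ordinary weighted mean (sum of weight times margin, divided by the sum of the weights). The function name and the three polls below are hypothetical and only there to show the arithmetic.

```python
def weighted_margin(polls, window=60):
    """Weighted average margin for one state.

    polls: iterable of (margin, accuracy_weight, days_old) tuples.
    Returns None when every poll has aged out of the window.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for margin, accuracy, days_old in polls:
        # Final weight = accuracy weight times time weight.
        weight = accuracy * max(0, window - days_old)
        weighted_sum += weight * margin
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Hypothetical polls for one state (positive margin = McCain lead).
polls = [
    (-2.0, 100.0, 5),  # recent poll from a SUSA-quality pollster
    (3.0, 47.0, 20),   # older poll from a weaker pollster
    (1.0, 40.0, 70),   # older than 60 days, so it gets no weight
]
result = weighted_margin(polls)
print(round(result, 2))  # about -0.73
```

Because the weights are divided back out at the end, multiplying every accuracy weight by 100 changes nothing in the result.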


Comments

Snore factor

Look, give us all a break and do the thing that really matters:  Give us numbers based on only LIKELY VOTERS.  Until then -- I'm snoozing.

Thanks Sean

I am looking forward to your reports.

Your technique for taking a weighted average link is broken.