Vote Chooser

Research

The data collected from the survey on this site supports research done at Carnegie Mellon University. Our research group has developed a novel, patent-pending method of combining data from traditional RDD (random digit dialing) surveys with data from online surveys.

Polling companies, such as Gallup, traditionally survey people by calling random telephone numbers and then using sophisticated statistical techniques to produce accurate results. This is called RDD (random digit dialing). The downside of this approach is the cost of hiring many telephone operators to perform this complex procedure. Online surveys could revolutionize the polling industry by making surveying much easier and cheaper. Lots of data can be collected simply by placing a survey online. VoteChooser.com has collected 1.5 million surveys since January 1, 2008. By contrast, the typical telephone survey covers a few thousand people.

The disadvantage of online surveys that lack a well-controlled sampling procedure is that they may be biased. The people taking the survey are self-selected and therefore may not be representative of the total population. For example, it is well known that Internet users tend to be wealthier and better educated than the general populace.

We have developed a novel technique that uses artificial intelligence to combine data collected from RDD surveys with data collected from online surveys in order to gain the advantages of each method: the accuracy of RDD surveys and the lower cost of online surveys.

Suppose we want to determine the percentage of people who think that the Iraq war was a good idea in each of the 50 states. We could do an RDD survey in all 50 states, but this would be expensive. We could do an online survey in all 50 states, but most online surveys are less accurate than RDD surveys. The new method we suggest is to perform an RDD survey in a fraction of the states, say 10 of them, and an online survey in all 50 states.

A machine learning algorithm can then be trained on the 10 states where both the RDD data and the online survey data have been obtained. The algorithm learns the relationship between the two datasets. For example, the online survey data might consistently underpredict the number of people who think the Iraq war was a good idea. One possible reason is that Internet users tend to be younger, and younger people tend to be Democrats rather than Republicans. The machine learning algorithm would be able to recognize this bias in the online survey data and correct for it.

After the machine learning algorithm has been trained, it can be applied to the remaining 40 states. The online survey results for those 40 states are passed into the algorithm, and the algorithm predicts what the result of an RDD survey would have been, even though no RDD survey was taken in those states.
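The train-then-apply procedure described above can be sketched in a few lines of Python. This is only an illustration, assuming the relationship between the two surveys is a straight line fitted by ordinary least squares; the state labels, the percentages, and the function names `fit_line` and `predict_rdd` are all hypothetical and are not taken from our data or software.

```python
# Illustrative sketch only: every number and name below is made up.
# It is not VoteChooser.com data and not an actual RDD poll.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rdd(online_pct, slope, intercept):
    """Estimate what an RDD survey would have reported for a state."""
    return slope * online_pct + intercept


# Hypothetical training states where BOTH surveys were taken
# (percent agreeing with the question in each survey).
train_online = {"State A": 30.0, "State B": 35.0, "State C": 40.0, "State D": 45.0}
train_rdd = {"State A": 38.0, "State B": 42.0, "State C": 46.0, "State D": 50.0}

states = sorted(train_online)
slope, intercept = fit_line(
    [train_online[s] for s in states],
    [train_rdd[s] for s in states],
)

# Hypothetical states where ONLY the online survey was taken.
online_only = {"State E": 50.0, "State F": 25.0}
estimated_rdd = {
    state: predict_rdd(pct, slope, intercept) for state, pct in online_only.items()
}

for state, estimate in estimated_rdd.items():
    print(f"{state}: online {online_only[state]:.1f}%, estimated RDD {estimate:.1f}%")
```

In this toy data the online survey reads lower than the RDD survey in every training state, so the fitted line shifts the online-only states upward. A richer model could replace the straight line without changing the overall procedure.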

The following graphs were created from real data. The first graph shows a set of data from 10 states. The x-axis indicates the percentage of people thinking the Iraq war was a good idea according to our online survey, and the y-axis indicates the same thing according to an RDD poll by CBS News. The trend line is created by linear regression and represents the machine learning model that is learned from the data.

The second graph shows this line along with data from 13 states that were NOT used to train the machine learning model. The green line, which predicts the CBS News results from the VoteChooser.com results using the machine learning model, is a good fit for these 13 states. The red line, which shows the prediction obtained by taking the VoteChooser.com results as they are, without machine learning, is clearly a worse fit to the data. In fact, using machine learning reduced the average error per state from 9.87 percentage points to 5.66 percentage points.
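A comparison of this kind can be sketched as follows. The sketch assumes that "error per state" means the mean absolute difference, in percentage points, between the predicted and the actual RDD result; that reading is an assumption. The slope, intercept, and held-out percentages are invented for illustration and do not reproduce the 9.87 and 5.66 figures.

```python
# Illustrative sketch only: all values below are hypothetical.

def mean_absolute_error(predicted, actual):
    """Average absolute gap between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Assumed line learned from the training states (hypothetical values).
slope, intercept = 0.8, 14.0

# Hypothetical held-out states that were not used to fit the line.
heldout_online = [32.0, 38.0, 44.0]  # online survey, percent agreeing
heldout_rdd = [40.0, 44.0, 50.0]     # RDD survey, percent agreeing

# "Red line": use the online result directly as the prediction.
raw_error = mean_absolute_error(heldout_online, heldout_rdd)

# "Green line": pass the online result through the learned line first.
calibrated = [slope * x + intercept for x in heldout_online]
calibrated_error = mean_absolute_error(calibrated, heldout_rdd)

print(f"Without machine learning: {raw_error:.2f} points per state")
print(f"With machine learning:    {calibrated_error:.2f} points per state")
```

Evaluating only on states that were left out of training is what makes the comparison meaningful, since the model has never seen their RDD results.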

Our method of using artificial intelligence to combine RDD surveys with online surveys has proven effective in preliminary studies of several of the questions on this site. We are conducting additional experiments to explore the method further. It is patent pending, with patent application number 61/189,451.