You Can't Trust Most Polls or Surveys

Internet polls are fun but rubbish. Formal surveys conducted by so-called experts aren’t much better. If you disagree, consider this: A poll I conducted for Betanews asking “How would you identify yourself as a computer user?” counts more than 25 percent of respondents as Linux PC users and under 61 percent as Windows PC users. Do you believe that? I don’t. I do believe, as early results indicated, that more Betanews readers identify themselves as Linux PC users than as Macheads. But more than one-quarter Linux users? Perhaps in some alternative universe, but not this one.

The polling started innocently enough. On November 12th, I blogged: “This film is rated PC: No Macs were used in the making of this video”, praising a Microsoft marketing video. I also inserted the aforementioned poll. Respondents had four choices: Windows PC, Macintosh, Linux PC and Other. Three days later, I awoke to 682 votes, with 507 for Windows PC, 77 for Linux PC, 70 for Macintosh and 19 for Other. That worked out to about 76 percent Windows PC, nearly 12 percent Linux PC and more than 11 percent Macintosh. The Windows PC response was a little lower than I expected, but not by much.

An Unbelievable Result
The Linux PC result surprised me, so I embedded the same poll in a new post: “Do more Betanews readers use Linux PCs than Macs?” I predicted: “This post may marshall the fanboys and skew further results”. That same day, November 15th, a commenter identified simply as Jesse responded (comment grammatically corrected):

I have a feeling this poll has massive amounts of bullshit in it, provided by Betanews’s core audience of 13 year-old Xbox Live kids. More people browse the Web on iOS than Linux, so honestly it’s not even a question. Would someone mark Linux when they’re on Windows just to make Apple look worse? Probably. You people are that sad.

I don’t agree with the “sad” dig at my readers, but Jesse and I otherwise agree that the results are skewed (well, with a caveat I’ll explain in a few paragraphs). As I write, there are 1,961 votes:

  • 1,191 for Windows PC (60.73 percent)
  • 496 for Linux PC (25.29 percent)
  • 236 for Macintosh (12.03 percent)
  • 38 for Other (1.94 percent)
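As a quick check on the arithmetic, here is a short Python sketch that recomputes each option's share from the raw counts listed above. It simply divides each count by the total; the dictionary and function names are illustrative, not anything PollDaddy provides.

```python
# Vote counts from the Betanews poll at the time of writing (1,961 votes).
tally = {
    "Windows PC": 1191,
    "Linux PC": 496,
    "Macintosh": 236,
    "Other": 38,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each option's share of the total vote, in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {option: round(100 * votes / total, 2) for option, votes in counts.items()}


shares = percentages(tally)
for option, pct in shares.items():
    print(f"{option}: {pct}%")
```

The rounded shares match the list: 60.73, 25.29, 12.03 and 1.94 percent. Note that rounding means they sum to 99.99 rather than exactly 100.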

That’s a fairly large sample, but it’s unqualified: I don’t know who the respondents are. I also don’t have handy information on which Websites or forums link to the poll. Could Linux blogs and forums be rallying their readers? The poll uses cookies to block repeat voters, but it wouldn’t take much tech savvy to get around that. Even with my $200-a-year Pro account, PollDaddy provides only basic tools for analyzing data. IP filtering is revealing. The votes came from 1,807 distinct IP addresses, so 154 votes came from addresses that had already voted, whether through shared networks or repeat voting. The largest number of votes (22) came from a bellsouth.net string. Microsoft.com accounts for another 7 IPs.

Geographic analysis is surprisingly useful. Only 958 responses are from the United States: nearly 64 percent for Windows PC, about 19 percent for Linux PC and 14.5 percent for Macintosh. Canada accounts for 158 responses and the United Kingdom for 132. If you believe the Netherlands’ 16 votes, then equal shares of people there, 43.75 percent each, identify themselves as Windows PC users and Linux PC users. Bulgaria’s 59 votes come out to 83 percent Linux PC users (I just might believe that). To my surprise, and I see this as a good finding, people from 94 countries responded to the poll.
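How much can a handful of national votes tell us? The textbook 95 percent margin of error for a proportion, z·√(p(1−p)/n) with z = 1.96, gives a sense of scale, but it assumes a random sample. This Python sketch applies it to the poll's overall Linux PC share and to the Netherlands' 16 votes. Because respondents here self-selected, neither number is a real error bound; the point is only that even a random sample of 16 would swing wildly.

```python
import math


def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Nominal margin of error for a proportion, in percentage points.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), with z = 1.96
    for roughly 95% confidence. Valid only for a random sample; a
    self-selected Web poll has no meaningful margin of error.
    """
    p = share / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Linux PC share across all 1,961 votes
print(f"All respondents: +/- {margin_of_error(25.29, 1961):.1f} points")  # +/- 1.9 points
# Netherlands: 43.75% of 16 votes
print(f"Netherlands: +/- {margin_of_error(43.75, 16):.1f} points")  # +/- 24.3 points
```

Nearly 2,000 votes would carry a nominal margin under two points if they were randomly drawn, which is exactly why a large, unqualified sample can look more credible than it is.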

Questioning Polls and Surveys
Now for that caveat: I didn’t ask which PC operating system people use but how they identify themselves as personal computer users. The poll is specifically meant to measure sentiment, which is about the only value I see coming from any poll or survey. The results reflect respondents’ attitudes rather than what they actually use. But whose sentiments? That’s the data problem with this poll and many others like it.

Internet polls are suddenly all the rage, and their results are easily shared on social networks (yeah, yeah, Facebook). Just click “Like”. The results are too easily believed as they spread. But the data isn’t necessarily representative of anything, and results can be manipulated. Take this post. Based on the data, I could have written something sensational here or at Betanews about the surprisingly large number of Linux users or the dwindling number of Windows users. I’ve got the data, backed up by nearly 2,000 respondents from more than 90 countries. That makes the poll seem credible. But it’s not. I know from my everyday dealings, where I often observe or ask which operating systems businesses and consumers use, that the majority run Windows. Virtually no one uses Linux. If by some strangeness I’m wrong, I just passed up one of the biggest tech stories of the decade. But I’m not wrong, because the data is incomplete and the respondents haven’t been properly vetted.

I see poll or survey data being manipulated or misreported nearly every day. Sometimes the fault lies in the interpretation (by the pollster or by people reporting or blogging about it), sometimes in the poll or survey itself (Web metrics data is among the most problematic). Online polls are self-selecting: people choose whether to take them. Good pollsters weight the results to compensate for the skew self-selection introduces, but if the data were relatively clean, why would math massaging be necessary? Phone polls and surveys can represent the target population fairly randomly, unless the respondents have been prequalified. Sentiment is another problem, because it can change, sometimes dramatically. Imagine a poll about Americans’ attitudes toward Muslims taken on Sept. 10, 2001, and one taken two days later.
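What does weighting look like in practice? One common approach is post-stratification: each respondent's answer is scaled by the ratio of their group's share of the target population to that group's share of the sample. The Python sketch below is a minimal illustration, not any particular pollster's method; the age groups and every number in it are hypothetical.

```python
def poststratify(sample_counts: dict[str, int],
                 population_shares: dict[str, float]) -> dict[str, float]:
    """Return a weight per group so the weighted sample matches known population shares.

    weight = population share / sample share. Over-represented groups get
    weights below 1; under-represented groups get weights above 1.
    """
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


# Hypothetical: 600 of 1,000 respondents are 18-29, but that group is
# assumed to be only 20% of the target population.
sample = {"18-29": 600, "30+": 400}
population = {"18-29": 0.20, "30+": 0.80}
weights = poststratify(sample, population)  # 18-29 gets 1/3, 30+ gets 2

# Hypothetical share answering "yes" within each group
yes_share = {"18-29": 0.50, "30+": 0.20}

raw = sum(sample[g] * yes_share[g] for g in sample) / sum(sample.values())
weighted = (sum(sample[g] * weights[g] * yes_share[g] for g in sample)
            / sum(sample[g] * weights[g] for g in sample))
print(f"Unweighted: {raw:.0%}, weighted: {weighted:.0%}")  # Unweighted: 38%, weighted: 26%
```

Weighting can only correct for traits the pollster measured and whose population distribution is known, such as age. It cannot fix self-selection on the very sentiment the poll is trying to measure.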

CBS News’ Airport Scanner Poll
On November 15th, CBS News posted the story “Poll: 4 in 5 Support Full-Body Airport Scanners”. The headline actually misstates the data. CBS News asked 1,137 U.S. adults by telephone: “Should airports use full-body airport scanners?” Eighty-one percent said yes. But agreeing that airports should use the scanners isn’t the same thing as supporting them. It’s this kind of nuance that a comprehensive survey would seek to reveal. For example, I may believe that Barack Obama should be president but not support all of his agenda. You might answer a poll saying the government should use whatever means necessary to fight terrorism, but that wouldn’t necessarily mean supporting surveillance of you or your neighbors.

Timing is important, too. CBS News conducted the poll November 7-10, just as full-body scanners were coming to major airports. San Diego, one of 11 airports scheduled to get them this year, received its scanners in August. Presumably, most respondents hadn’t been through a full-body scanner. How would they answer if asked November 29th, right after the busy Thanksgiving travel weekend?

CBS News’ poll result is startling. For most of November, news stories have appeared every day about conflict and controversy over full-body scanners, including the high-profile incident here in San Diego just last week. The U.S. Senate held hearings on airport scanners just two days ago. Has CBS News uncovered some media conspiracy? The news stories suggest people are pissed about the scanners. In contrast, the poll indicates most Americans believe that airports should use the security devices. A broader, well-crafted survey might sniff out the differences and truly be newsworthy. As it stands, the poll and the news coverage leave it uncertain how outraged, or how accepting, Americans are about full-body scanners. That the poll raises the question but offers no real answer makes the findings unreliable.

Can you really trust polls or surveys? My answer is no for the majority of them. In a future post, I’ll offer tips on how to conduct reasonably reliable polls or surveys.

Photo Credit: Greg Grieco/Penn State University

Do you have a story about polls, pollsters or surveys that you’d like told? Please email Joe Wilcox: joewilcox at gmail dot com.