Sunday, 25 July 2010

Survey Subject and Non-Response Error

Statisticians usually take a skeptical view of web surveys, nowadays the most common type of survey in North America. That is because there is virtually no sampling theory that can accommodate the "sampling design" of a web survey without making strong assumptions that are well known not to hold in many situations. But in a capitalist world cost drives everything, and it also drives the way we draw our samples. If you consider the huge savings a web survey can provide, you will agree that the "flawed" sample can be justified; it is just a trade-off between cost and precision. Hopefully those who are buying the survey are aware of that, and here is where the problem lies, as I see it.

The trade-off can indeed be very advantageous, because it has been shown that in some cases a web survey can reasonably approximate the results of a probability sample. Our experience is crucial for understanding when we should be concerned with web survey results and how to bring them more in line with probability samples. This paper, for example, shows that results for attitudinal questions from a web panel sample are quite close to those from a probability survey. But we certainly cannot hope to use the mathematical theory that would allow us to pinpoint the precision of the study.
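
To make the contrast concrete, here is a minimal sketch (in Python, with made-up numbers) of the kind of precision statement classical theory supports for a simple random sample; nothing comparable is available for an opt-in web panel without extra modelling assumptions.

```python
import math

# Hypothetical numbers: a simple random sample of n respondents,
# of whom x report a given attitude.
n = 1000
x = 430
p_hat = x / n  # sample proportion

# Under simple random sampling, the standard error of p_hat is
# sqrt(p_hat * (1 - p_hat) / n), and an approximate 95% confidence
# interval follows from the normal approximation.
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

print(f"estimate = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
# For an opt-in web panel there is no analogous formula: the inclusion
# probabilities are unknown, so the standard error above has no
# design-based justification.
```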

As if the challenges were not enough, in an effort to reduce the burden of the survey on the respondent, researchers are disclosing the subject of the survey before it begins so that the respondent can decide whether or not to participate. The expected length of the survey is another piece of information often released before the survey starts. Now the respondent who does not like the survey subject, or does not feel like answering questions about it, can simply send the invitation to the trash can. But what are the possible consequences of this?

Non-response bias can be huge. Take, for example, a survey about grocery shopping. Pretty much everybody buys groceries and would be eligible to participate in such a survey, but probably not everybody likes doing it. If the subject is disclosed, we can argue that someone who likes to shop for groceries is much more likely to participate than someone who sees it as just a chore. If the incidence of "grocery shopping lovers" in the population is, say, 20%, it will likely be much higher in the sample, with no possibility of correction through weighting. Shopping habits and attitudes will likely be associated, to some extent, with how much people like to shop, and this will bias the survey results.
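
A small simulation makes the point. The sketch below (Python, with invented parameters) builds a population in which 20% are "grocery shopping lovers", lets lovers be far more likely to accept the invitation once the subject is disclosed, and ties the attitude being measured to loving to shop. Because the trait driving participation is not observed in the frame, weighting on known margins cannot remove the resulting bias.

```python
import random

random.seed(1)

N = 100_000          # hypothetical population size
P_LOVER = 0.20       # 20% of the population enjoys grocery shopping

# Response propensities once the subject is disclosed (invented):
# lovers are much more likely to accept the invitation.
P_RESPOND_LOVER = 0.60
P_RESPOND_OTHER = 0.10

# The attitude we want to measure, e.g. "enjoys trying new products",
# is itself associated with loving to shop (also invented numbers).
P_ATTITUDE_LOVER = 0.70
P_ATTITUDE_OTHER = 0.30

population = []
for _ in range(N):
    lover = random.random() < P_LOVER
    attitude = random.random() < (P_ATTITUDE_LOVER if lover else P_ATTITUDE_OTHER)
    population.append((lover, attitude))

true_attitude = sum(a for _, a in population) / N

# Self-selected sample: participation depends on the unobserved trait.
sample = [(l, a) for l, a in population
          if random.random() < (P_RESPOND_LOVER if l else P_RESPOND_OTHER)]

sample_lovers = sum(l for l, _ in sample) / len(sample)
sample_attitude = sum(a for _, a in sample) / len(sample)

print(f"lovers in population: {P_LOVER:.0%}, in sample: {sample_lovers:.0%}")
print(f"true attitude rate: {true_attitude:.0%}, sample estimate: {sample_attitude:.0%}")
# The sample over-represents lovers (roughly 60% instead of 20%), so the
# attitude estimate is pulled upward; weighting could only fix this if we
# knew each person's response propensity or the share of lovers, which we do not.
```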

Disclosing the subject of a web survey is almost certainly the same as biasing the results in a way that cannot be corrected or quantified. Yet again the technical side of sampling is set aside without a full understanding of the consequences. Yet again the accuracy of estimates comes second to business priorities, and yet again we face the challenge of making useful the results of surveys that cross unknown boundaries...
