Saturday, 14 August 2010

Interesting figures

When analyzing cross-sectional surveys I am well acquainted with the "bias" toward significant results caused by searching too hard for them. Negative and neutral results are ignored while positive results are always reported. With thousands of possible comparisons in even a simple survey, it is obvious that Type I errors will happen quite often.
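
To make that concrete, here is a minimal simulation sketch (mine, not from the post's data; the number of tests and sample sizes are arbitrary choices): run many comparisons where every null hypothesis is true and count how many come out "significant" at the 5% level anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests = 1000      # hypothetical number of comparisons in a survey
n_per_group = 50    # hypothetical sample size per group

false_positives = 0
for _ in range(n_tests):
    # Two groups drawn from the SAME distribution: every null is true.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} null comparisons were 'significant'")
# Expect roughly 50: about 5% of true nulls cross the 0.05 threshold,
# so a report listing only the "positives" would contain pure noise.
```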

When testing a statistical hypothesis, the theory tells us that we should first establish the hypothesis, then collect the data, and finally test the hypothesis using the data. If we look at the data before establishing the hypothesis, the probabilities associated with the statistical test are no longer valid, and significant results may easily be nothing more than random fluctuations.
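
The following sketch (my illustration, with arbitrary simulation sizes) shows why the order matters: a pre-specified test keeps its nominal 5% error rate, while testing whichever variable looks most extreme after peeking at the data does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_candidates, n = 2000, 20, 30

prespecified_hits = 0
post_hoc_hits = 0
for _ in range(n_sims):
    # 20 candidate variables, none with any real effect (mean is 0).
    samples = rng.normal(size=(n_candidates, n))
    # Pre-specified: always test variable 0 against a zero mean.
    _, p = stats.ttest_1samp(samples[0], 0.0)
    prespecified_hits += p < 0.05
    # Post hoc: peek at the data, then test the most extreme variable.
    idx = np.argmax(np.abs(samples.mean(axis=1)))
    _, p = stats.ttest_1samp(samples[idx], 0.0)
    post_hoc_hits += p < 0.05

print(f"pre-specified false-positive rate: {prespecified_hits / n_sims:.3f}")
print(f"post-hoc false-positive rate:      {post_hoc_hits / n_sims:.3f}")
# The first stays near the nominal 0.05; the second is far above it,
# because selecting the hypothesis from the data invalidates the p-value.
```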

Ok, related to that, I read today in this book (see page 101) about a comparison between randomized studies and uncontrolled ones. I could not access any of the papers in Table 3.4.1, but the results shown are what one would expect anyway. If studies are not controlled, many more positive results will be found than in randomized studies, where the statistical theory of experimental design is followed as closely as possible. If the statistical hypotheses, and the procedures needed to test them, are not decided upon up front, as they usually are in randomized trials, a lot of room is left for human interference: either hypotheses are swapped for others that the results happen to confirm, or confounded results are used in favor of the sought-after ones.
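
As a rough sketch of the confounding mechanism (my own toy example, not one of the table's studies): below, the treatment does nothing, and outcomes depend only on a prognostic "severity" factor. When healthier patients are more likely to receive the treatment, the uncontrolled comparison looks positive; randomization breaks that link.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 500
severity = rng.normal(size=n)             # prognostic factor
outcome = -severity + rng.normal(size=n)  # worse severity, worse outcome

# Uncontrolled: healthier patients (low severity) get treated more often.
treated_obs = rng.random(n) < 1 / (1 + np.exp(severity))
# Randomized: treatment assigned by coin flip, independent of severity.
treated_rct = rng.random(n) < 0.5

for label, treated in [("uncontrolled", treated_obs),
                       ("randomized", treated_rct)]:
    _, p = stats.ttest_ind(outcome[treated], outcome[~treated])
    print(f"{label:>12}: p-value for treatment effect = {p:.4f}")
# The uncontrolled comparison tends to be "significant" even though the
# treatment has no effect at all; the randomized one does not.
```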

I think this table shows the importance of planned experiments in testing hypotheses, and how careful we should be when analyzing non-experimental data. At the same time, we need to recognize that experimental studies are not always feasible, because of cost, time, or because they simply cannot be run. This makes the analysis of cross-sectional data necessary and very valuable. In causal analysis, though, we need to be aware of the drawbacks of this type of data.
