Saturday 25 August 2012

Statistical Tests and Real Life

The interpretation of statistical tests is perhaps one of the most misunderstood aspects of routine statistical applications. Among the misinterpretations is the idea that a test that fails to reject the null hypothesis counts as evidence in favor of the null. We all know that not rejecting the null does not mean it is correct in most cases. Now, what is interesting is that this idea is also common in our daily life, outside the scope of statistical tests, and there it is misinterpreted as well.

A usual example I use when explaining the interpretation of classical hypothesis tests is the courtroom. In short, if someone is not found guilty, it does not mean the person did not commit the crime, only that there is no evidence of it. The person is therefore considered innocent, but innocence is usually not proven, since that is not the goal of the process. The right thing to say is that there is not enough evidence that the person committed the crime.

This week the news about Lance Armstrong's refusal to fight USADA's doping charges is all over the media. And I was surprised to see this article from a large US newspaper making such a basic mistake: the idea that not testing positive means that Lance did not use illegal drugs. Here again we have the same logic as a non-significant statistical test, in a non-statistical setting. Yes, statistical reasoning is important in our daily life.

Here the failure to reject the null hypothesis (a negative drug test) does not mean much, for more reasons than lack of power. But power is an important one: we do not know, and it is not easy to find out, the "power" of these tests, that is, how much a negative result actually means. Part of the problem may well be that there must be different tests for different drugs, each with its own "power". But a negative result can happen even with a guilty athlete for other, perhaps more relevant, reasons. For example, there may simply be no test for the drug used, or the athlete may have found a way to use illegal drugs that the test is not prepared to detect.
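To make this concrete, here is a minimal sketch in Python of how little a negative test may mean when the test's sensitivity (its "power" to catch a doping athlete) is low. All the numbers are made up for illustration; they are not estimates for any real doping test.

```python
# A minimal sketch of Bayes' rule applied to a negative drug test.
# All numbers below are hypothetical, chosen only for illustration.

sensitivity = 0.40  # P(positive test | doping): the test's "power"
specificity = 0.99  # P(negative test | clean)
prior = 0.30        # prior probability that the athlete is doping

# P(negative test) by the law of total probability
p_negative = (1 - sensitivity) * prior + specificity * (1 - prior)

# P(doping | negative test) by Bayes' rule
posterior = (1 - sensitivity) * prior / p_negative

print(f"before the test: {prior:.2f}")            # 0.30
print(f"after a negative test: {posterior:.2f}")  # ~0.21
```

With a sensitivity of only 40%, a negative result moves the probability of doping from 30% down to about 21%: hardly proof of innocence.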

I always like to look at the comments on such articles (to tell the truth, I could not finish reading this one; the idea that the writer takes negative tests as proof of innocence drove me away from it), and there you can see folks making the point I am making here, trying to correct the fallacy. But others, maybe most of the readers, will take it as the truth and will use its ideas to make a point in favor of Armstrong. This way even a flawed text like this can become widespread, making the bad science, and perhaps the bad statistical analysis, widespread as well. So much so that I found this text because someone retweeted it...

Tuesday 14 August 2012

Randomized Trials and Public Policy

There is a very interesting paper here that talks about using more Randomized Controlled Trials before making decisions on policies. The paper is easy to read and easy to understand, as I think it should be in order to better spread the idea. Well, it is not really a new idea, or new stuff, or anything.

It is my perception that RCTs are underutilized in a world where causal analysis is widespread. While causal methods for observational data are all over the place, RCTs seem to be limited to medical research and are forgotten, not considered, sometimes unknown, in other fields. I think that in the same way RCTs are demanded for drug development, they should be demanded for policy development.

Saturday 4 August 2012

Cohen's d

An often overlooked issue in statistical analysis is the meaningfulness of effect sizes. Usually, when comparing means or proportions, we run a statistical test and do not even think about how meaningful the observed difference is, whether or not it is significant. Many times, in the presence of high power, meaningless effects will be flagged as significant.
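A quick simulation makes the point. This is just a sketch with arbitrary numbers: the true difference between the groups is a trivial 0.02 standard deviations, yet with half a million observations per group the t-test flags it as highly significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups whose true means differ by only 0.02 standard deviations
a = rng.normal(loc=100.0, scale=10.0, size=500_000)
b = rng.normal(loc=100.2, scale=10.0, size=500_000)

t_stat, p_value = stats.ttest_ind(a, b)
print(f"p-value: {p_value:.2e}")                      # tiny: "statistically significant"
print(f"mean difference: {b.mean() - a.mean():.3f}")  # ~0.2, i.e. d of about 0.02
```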

Cohen's d is a standardized effect measure that allows us to make some assessment of the size of the observed effect in practical terms. It is just the difference in means divided by the standard deviation of the sample. Notice that we are not talking about the standard deviation of the mean, or of the difference of means, but of the sample itself. The idea is to assess how large the effect is in light of the natural variation observed in the data.
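In code, the computation is straightforward. The sketch below uses the common pooled-standard-deviation convention for two groups; variants exist (for example, Glass's delta uses only the control group's standard deviation).

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    # Pooled variance of the two samples (not the standard error of the mean)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Example with simulated data: true effect is 0.5 / 2.0 = 0.25 standard deviations
rng = np.random.default_rng(0)
treated = rng.normal(10.5, 2.0, size=200)
control = rng.normal(10.0, 2.0, size=200)
print(cohens_d(treated, control))  # roughly 0.25: a small effect
```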

Usually Cohen's d will be below 1 in absolute value, and values around 0.5 and above are taken as practically important. Thinking in broad, approximate terms, if we consider the data to be normally distributed, almost all of it spans about 4 standard deviations. So if we have an intervention that can shift things by one standard deviation (Cohen's d = 1), it makes sense to consider that a pretty big effect. And it does not matter too much what we are talking about: this holds across different variables and different studies.
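Two standard conversions help build intuition for why d = 1 is large; both assume equal-variance normal distributions, so take them as rough guides only.

```python
from scipy.stats import norm

d = 1.0
# Cohen's U3: fraction of the treated group above the control group's mean
print(norm.cdf(d))             # ~0.84
# Probability that a random treated observation exceeds a random control one
print(norm.cdf(d / 2 ** 0.5))  # ~0.76
```

In other words, with d = 1 about 84% of the treated group sits above the control mean, which is why an effect of a full standard deviation feels big regardless of the variable being measured.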

This calculation of effect size is totally missing in Marketing Research and is most common in fields related to medicine. I already have some ideas about trying Cohen's d in segmentation analysis, to understand how segments differ in a more meaningful way.

A short non-technical paper, with some more technical references on the subject, is this one.