Saturday 14 August 2010

Interesting figures

When analyzing cross-sectional surveys I am well familiar with the "bias" toward significant results caused by the excessive search for them. Negative and neutral results are ignored while positive results are always reported. With thousands of possible comparisons in even a simple survey, it is obvious that Type I errors will happen quite often.
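A quick simulation makes this concrete (a hypothetical sketch, not data from any real survey): if we run many tests of hypotheses that are all true nulls at the 5% level, about 5% of them come out "significant" purely by chance, so a survey with thousands of comparisons is guaranteed to produce false positives.

```python
import random

random.seed(1)

# Simulate many independent tests where the null hypothesis is TRUE:
# two groups drawn from the SAME normal distribution, compared with a
# two-sample z test (variance known to be 1) at the 5% level.
def null_test(n=50, crit=1.96):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    se = (2 / n) ** 0.5          # standard error of the difference
    z = (mean_a - mean_b) / se
    return abs(z) > crit          # "significant" at the 5% level

trials = 2000
false_positives = sum(null_test() for _ in range(trials))
print(false_positives / trials)   # close to 0.05
```

Reporting only the tests that cross the threshold, as happens when negatives are left in the drawer, turns this 5% of pure noise into the published "findings".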

When testing a statistical hypothesis, the theory tells us that we should first establish the hypothesis, then collect the data, and finally test the hypothesis using the data. If we look at the data prior to establishing the hypothesis, the probabilities associated with the statistical test are no longer valid, and significant results might easily be nothing but random fluctuations.
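To illustrate (again a hypothetical sketch, with invented numbers): suppose a survey measures 10 variables whose true means are all zero, and the analyst looks at the data first and then tests only the most extreme one at the nominal 5% level. The chance of a "significant" result is then 1 − 0.95^10 ≈ 40%, not 5%.

```python
import random

random.seed(2)

# Each "survey" measures k variables, all with true mean 0 and
# variance 1. A researcher who looks at the data first and tests only
# the most extreme variable inflates the nominal 5% error rate.
def data_dredged_test(k=10, n=50):
    zs = []
    for _ in range(k):
        x = [random.gauss(0, 1) for _ in range(n)]
        z = (sum(x) / n) / (1 / n ** 0.5)   # z statistic for mean = 0
        zs.append(abs(z))
    return max(zs) > 1.96   # hypothesis chosen AFTER seeing the data

trials = 2000
rate = sum(data_dredged_test() for _ in range(trials)) / trials
print(rate)   # far above the nominal 0.05
```

The test itself is perfectly valid for a pre-specified variable; it is the selection step that destroys the stated error probability.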

Ok, related to that, I read today in this book (see page 101) about a comparison between randomized studies and non-controlled ones. I could not access any of the papers in table 3.4.1, but the results shown are expected anyway. If studies are not controlled, many more positive results will be found than in randomized studies, where the statistical theory of experimental design is followed as closely as possible. If the statistical hypotheses and the procedures needed to test them are not decided upon up front, as they usually are in randomized trials, a lot of room is created for human interference, either by changing the hypotheses to others that are confirmed by the results or by making use of confounded results in favor of the sought results.

I think this table shows the importance of planned experiments in testing hypotheses, and how careful we should be when analyzing non-experimental data. At the same time, we need to recognize that experimental studies are not always possible, because of cost, time, or simply because they are not feasible. This makes the analysis of cross-sectional data necessary and very valuable. In causal analysis, though, we need to be aware of the drawbacks of this type of data.

Saturday 7 August 2010

New book

I started reading another book about clinical trials; at least I have heard better comments about this one. Design and Analysis of Clinical Trials is written in a more professional language, and it seems to explain clinical trials in depth, covering areas much broader than statistics.

I have read the first chapter so far, which talks about the FDA's regulatory rules for clinical trials in the US. It describes the structure of the FDA and the steps it takes to get a new drug approved for marketing. It also talks about the length of the process and its phases. I was surprised that the average time spent on drug development, before a drug is launched on the market, is about 12 years. That is a lot of time and a lot of money. The chapter is quite interesting for anyone who wants to understand the process of bringing a new drug to market, and if the content is not enough, there are references.

The second chapter is about basic statistical concepts, and I was less than happy with the way the authors explained variance, bias, validity and reliability. I have some disagreements with the text, such as using variability interchangeably with variance. To me, the text does not explain these very important concepts in a satisfactory way. One might think that the book is directed at non-math people and so should not go deep into these things, but I disagree; I think it might open a door to bad statistical practice, as if we did not have enough of that.

The other example is when they talk about the bias caused by the fact that a trial was planned to take place in three cities, but it turned out that one of them did not have enough volunteers, so the sample was increased in the other two cities. They say this might have caused a bias, but they do not clearly define the target population, the target variables to be measured, or the reasons for the bias when excluding one city. To me, if we are talking about a clinical trial to test a new drug, we have to define the target population for the drug to begin with: say, the set of people who have disease X. Of course these people are not only in three cities, so how do you explain that sampling only three cities does not itself introduce a bias? It is possible, if they define the three cities as the only important target for the new drug, but I don't think that is likely to be reasonable. They could also lay down an argument as to why these three cities alone would fairly represent the US for this given health condition. Anyway, they talked about biases, but very important things were left aside, and bias is a very important concept when sampling under restrictions (I see clinical trials as something that can almost never use a good probabilistic sample, so concepts like bias are crucial).
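To make my point about the three-city example concrete, here is a hypothetical sketch (the cities, means and sample sizes are all invented): if the true mean response differs across cities A, B and C, then dropping city C and enlarging the samples in A and B biases the estimate of the three-city population mean.

```python
import random

random.seed(3)

# Invented target population: patients in cities A, B and C, whose
# true mean responses differ. Equal-sized city strata are planned.
city_means = {"A": 10.0, "B": 12.0, "C": 16.0}

def sample_city(mean, n, sd=2.0):
    return [random.gauss(mean, sd) for _ in range(n)]

# Planned design: 100 volunteers per city.
planned = []
for m in city_means.values():
    planned += sample_city(m, 100)

# Actual design: city C had no volunteers, so A and B were enlarged.
actual = sample_city(city_means["A"], 150) + sample_city(city_means["B"], 150)

pop_mean = sum(city_means.values()) / 3           # target: about 12.67
planned_mean = sum(planned) / len(planned)        # near the target
actual_mean = sum(actual) / len(actual)           # near 11.0, biased low
print(pop_mean, planned_mean, actual_mean)
```

The bias only appears because city C differs from A and B; that is exactly why the target population and the target variables have to be defined before anyone can say whether dropping a city matters.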

I liked the first chapter, and so far what is in the second chapter is not what I was expecting to learn from the book, but I will keep reading. The topics ahead are of great interest to me: types of designs, randomization, specific types of trials for cancer, and so on. I will try to comment as I read the chapters.