Wednesday 3 July 2013

Confidence Intervals through simulation

A paper in the latest issue of The American Statistician discusses the construction of confidence intervals for functions of parameters using simulation. Confidence intervals are increasingly valued because they convey more information about a parameter than hypothesis tests do, and I tend to agree with this trend. Presenting confidence intervals, especially graphically, is usually a very nice way of showing inferential results, and it can even let us disregard p-values.

So, it often happens that one needs such intervals for functions of parameters. For example, in a recent study we needed to estimate a percentage effect in a Regression Discontinuity model; essentially, we wanted a confidence interval for the ratio of two regression coefficients.

The first thing that comes to my mind is the Delta Method, which uses derivatives of the function to approximate the variance of the transformed estimator. It is easy to apply when the derivatives have a simple analytical form. In the univariate case, for example, Var(g(theta_hat)) is approximately g'(theta_hat)^2 * Var(theta_hat). Of course this is just an approximation and it depends on some assumptions, but it usually works quite well.
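To make the univariate case concrete, here is a minimal sketch in Python. The data and the choice g(theta) = exp(theta) are made up for illustration: we estimate a mean, then use the Delta Method approximation Var(g(theta_hat)) ≈ g'(theta_hat)^2 * Var(theta_hat) to build a 95% interval for exp(mean).

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.0, size=500)  # hypothetical sample

theta_hat = x.mean()
var_theta = x.var(ddof=1) / len(x)  # estimated variance of the sample mean

# Delta Method for g(theta) = exp(theta): g'(theta) = exp(theta),
# so Var(g(theta_hat)) ~ exp(theta_hat)^2 * Var(theta_hat)
g_hat = np.exp(theta_hat)
se_g = np.sqrt(np.exp(theta_hat) ** 2 * var_theta)

ci = (g_hat - 1.96 * se_g, g_hat + 1.96 * se_g)
print(ci)
```

The whole interval comes from one closed-form derivative, which is why the Delta Method is so cheap when the derivative is easy to write down.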

The second thing that comes to mind is the bootstrap, which usually works fine but comes at the price of some time spent on programming and, often, on computation.
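For the ratio-of-coefficients case, a nonparametric bootstrap sketch might look like the following. The regression data are simulated here purely for illustration (true ratio beta1/beta2 = 0.5); note that the model must be refit on every resample, which is where the computational cost comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 4.0 * x2 + rng.normal(size=n)  # hypothetical data
X = np.column_stack([np.ones(n), x1, x2])

def coef_ratio(X, y):
    """OLS fit, then the ratio of the two slope coefficients."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[1] / beta[2]

# Resample rows with replacement and refit each time
boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[b] = coef_ratio(X[idx], y[idx])

# Percentile bootstrap 95% interval
ci = np.percentile(boot, [2.5, 97.5])
print(ci)
```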

The paper talks about a third option, which I hadn't heard about. The basic idea is that if we want a confidence interval for g(theta), we first estimate theta (say, theta_hat) and its variance from the sample. Assuming the distribution of theta_hat converges to a Normal, we can generate Normally distributed noises with mean zero and variance equal to var(theta_hat). We generate many of these noises, say n1, n2, n3, ..., nk, where k may be 10,000. For each simulated noise ni we compute g(theta_hat + ni), and the empirical quantiles of these simulated values give a confidence interval for g(theta) that is usually less expensive computationally than the bootstrap.
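As I understand the method, it can be sketched as follows for the ratio example, again with data simulated only for illustration. The key difference from the bootstrap is that the model is fit exactly once; afterwards we only draw from the estimated Normal distribution of the coefficients and evaluate g on each draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 4.0 * x2 + rng.normal(size=n)  # hypothetical data
X = np.column_stack([np.ones(n), x1, x2])

# Fit OLS once; estimate the covariance matrix of beta_hat
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# Draw k coefficient vectors from the approximate Normal of beta_hat,
# i.e. beta_hat plus mean-zero Normal noise with the estimated covariance
k = 10000
draws = rng.multivariate_normal(beta, cov, size=k)
g = draws[:, 1] / draws[:, 2]  # g(theta) = beta1 / beta2 on each draw

ci = np.percentile(g, [2.5, 97.5])
print(ci)
```

No refitting is needed, so for an expensive model this loop is far cheaper than resampling the data.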

The paper covers the theory and assumptions of these methods more thoroughly, as well as cases where they may not work well, so it is worth reading the text itself; my description here is quite condensed.
