Riding Numbers: Which model?

Saturday, 11 December 2010

Which model?

It happened just last month: a paper of ours submitted to a journal was rejected, mainly because of a number of non-methodological problems, but there was one criticism related to our regression model. Since our dependent variable was a sort of count, we were asked to explain why we had used a linear regression model instead of a Poisson model.

We noticed the highly skewed nature of the dependent variable, so we applied the logarithmic transformation before fitting the linear model. To make coefficients more easily interpretable we exponentiated them and presented the results with confidence intervals.
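To make the exponentiation step concrete, here is a minimal sketch of a log-transformed linear fit with a single predictor. The data and the helper name `fit_log_linear` are made up for illustration and are not from our paper; the sketch also assumes every count is strictly positive, since the logarithm of zero is undefined.

```python
import math

def fit_log_linear(x, y):
    """Ordinary least squares of log(y) on a single predictor x.

    Returns (intercept, slope) on the log scale. Assumes every y is > 0,
    since log(0) is undefined.
    """
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    # Closed-form least-squares estimates for simple regression.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    slope = sxy / sxx
    intercept = mean_ly - slope * mean_x
    return intercept, slope

# Made-up data in which y doubles with every unit increase in x.
x = [0, 1, 2, 3, 4]
y = [2, 4, 8, 16, 32]

intercept, slope = fit_log_linear(x, y)

# Exponentiating the slope turns an additive effect on log(y)
# into a multiplicative effect on y itself.
ratio = math.exp(slope)
print(f"multiplicative effect per unit of x: {ratio:.3f}")
```

Read this way, an exponentiated coefficient says by what factor the response is multiplied for each one-unit increase in the predictor, which is usually easier to explain than a change in log units.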

Upon receiving the reviewer's comment, we decided to go back and refit the data using the Poisson model. It turned out that the Poisson model was not adequate because of overdispersion, so we fitted the Negative Binomial model instead, with a log link. It worked nicely, but to our surprise the coefficients were nearly identical to the ones from the linear model fitted to the log-transformed dependent variable.
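For readers unfamiliar with overdispersion: the Poisson model assumes the variance equals the mean, so a variance far above the mean is a warning sign. The sketch below is one quick way to look at it, not necessarily the diagnostic we used. It ignores covariates entirely, the counts are made up, and the function names are hypothetical. The second function uses the common Negative Binomial parameterization in which the variance is mu + alpha * mu^2.

```python
def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Close to 1 is consistent with a Poisson distribution;
    well above 1 suggests overdispersion.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

def nb_alpha_moments(counts):
    """Method-of-moments estimate of the Negative Binomial dispersion.

    Assumes variance = mu + alpha * mu**2, so
    alpha = (variance - mu) / mu**2.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return (variance - mean) / mean ** 2

# Made-up, strongly right-skewed counts (illustration only).
counts = [0, 1, 1, 2, 3, 5, 8, 13, 21, 40]

index = dispersion_index(counts)
alpha = nb_alpha_moments(counts)
print(f"variance / mean = {index:.2f}")
print(f"moment estimate of alpha = {alpha:.2f}")
```

In a real regression one would check dispersion conditional on the predictors, for example with the Pearson chi-square statistic divided by the residual degrees of freedom, rather than on the raw counts.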

It feels good when you do the right thing and can say "I used a Negative Binomial model". It makes you look like a professional statistician using advanced, state-of-the-art models, despite the fact that this model has been around for quite some time. What makes it look good is, perhaps, that linear regression is by far almost the only thing known outside of statistical circles.

But my point is that it is not really clear to me how worthwhile it is to go to the Negative Binomial model when the conclusions will not differ from those of the simpler linear model. Maybe it makes sense to take the effort to go the most advanced way in a scientific paper, but there is also the world out there, where we need to be pragmatic. Does it pay to use the Poisson, Negative Binomial, Gamma or whatever model in the real world, or does it end up being just an academic distraction? Maybe in some cases it does pay off, for example in some financial predictive models, but for the most part my impression is that we do not add value by using more advanced and technically correct statistical methods. This might seem like a word against statistics, but I think it is the opposite. We, as statisticians, should pursue the answer to questions like this, so that we know when it is important to seek more advanced methods and when we might do just as well with simple models...

2 comments:

Will Dwinnell said...: I find your critic's question about not using Poisson regression interesting. I am not the first person to observe that traditional statisticians tend to worry about matching "appropriate" models to statistical assumptions, and data miners worry more about out-of-sample results.

My point is that I (a data miner) might have also asked what alternative models were attempted, but not because I thought that such-and-such model (in this case Poisson regression) should be used, but because other models might outperform the one you tried.
12 December 2010 at 14:14
Paulo said...: Hey Marcão! Almost a year later I am reading this post. I liked it a lot and even shared it on my Twitter.
Cheers.
Paulo Nogueira Starzynski; 8 November 2011 at 11:40