Sunday 19 December 2010

ARM Chapter 3

Data Analysis Using Regression and Multilevel/Hierarchical Models is one of the books I have liked the most. I think it is a different kind of book on regression models, one that focuses on the modeling part more than on the mathematical part. The mathematical foundation is of course important, even more so now with all these Bayesian models coming around. But when regression is used for causal inference, I feel that for the most part it is a lack of modeling expertise, rather than mathematical expertise, that drives the low quality of many works. I am reading the book for the second time and I will leave here some points I think are important.

Chapter 3 is about the basics of regression models. It does a good job covering basic things like binary predictors and interactions, and it uses scatterplots to make things visual. These are some points I would like to flag:

1 - Think about the intercept as the predicted value when X is zero; this has always helped me interpret it. Sometimes X = 0 will not make much sense, and then centering the predictor might help.
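
Just to make the point concrete, here is a minimal sketch in R with made-up data (the variable names and numbers are mine, not the book's):

set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)   # a predictor that is never close to zero
y <- 2 + 0.5 * x + rnorm(100)         # fake outcome

fit.raw <- lm(y ~ x)                  # intercept = predicted y at x = 0, which never happens
x.c <- x - mean(x)                    # centered predictor
fit.c <- lm(y ~ x.c)                  # intercept = predicted y at the average x

coef(fit.raw)
coef(fit.c)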

2 - One can interpret a regression in a predictive or a counterfactual way. The predictive interpretation focuses on group comparisons, sort of exploring the world out there: what is the difference between folks with X = 0 and folks with X = 1? The counterfactual interpretation is about the effect of changing X by one unit: if we increase X by one unit, how will the world out there change? This is, as I see it, the difference between modeling and merely fitting a model.

3 - Interactions should be included in the model when the main effects are large. With binary variables, interactions are a way of fitting different models to different groups.
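
A small sketch of what fitting different models to different groups looks like, again with made-up data and names of my own:

set.seed(2)
n <- 200
group <- rbinom(n, 1, 0.5)                                # binary predictor (0/1)
x <- rnorm(n)
y <- 1 + 0.5 * x + group + 0.8 * group * x + rnorm(n)     # fake outcome

fit <- lm(y ~ x * group)                                  # main effects plus the interaction
coef(fit)
# slope for group 0: coef(fit)["x"]
# slope for group 1: coef(fit)["x"] + coef(fit)["x:group"]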

4 - Always look at the residuals versus the predicted values as a diagnostic.
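
In R this is quick once the model is fit; a sketch with fake data:

set.seed(3)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)          # fake data, just to have something to fit
fit <- lm(y ~ x)

plot(fitted(fit), resid(fit),
     xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, lty = 2)               # look for curvature or funnel shapes around this line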

5 - Use graphics to display the precision of the coefficients. The R function "sim" (from the arm package that accompanies the book) can be used for simulations.
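
A quick sketch of how this can be used; the data here are made up:

library(arm)

set.seed(4)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)          # fake data
fit <- lm(y ~ x)

sims <- sim(fit, n.sims = 1000)      # 1000 simulated draws of the coefficients
beta <- coef(sims)                   # matrix of draws (older arm versions use sims@coef)
hist(beta[, 2],                      # second column: the slope on x
     xlab = "Coefficient of x", main = "Uncertainty in the slope")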

6 - Make sure regression is the right tool to use and that the variables you are using are the correct ones to answer your research question. The validity of the Regression Model for the research question is put by the book as the most important assumption of the regression model.

7 - Additivity and linearity are the second most important assumptions. Use transformations on the predictors if these assumptions are not met.
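
A sketch of the kind of transformation I have in mind, using a log on a made-up skewed predictor:

set.seed(5)
x <- rexp(100, rate = 0.1) + 1           # fake positive, skewed predictor
y <- 3 + 2 * log(x) + rnorm(100)         # fake outcome, nonlinear in x

fit.raw <- lm(y ~ x)                     # residuals will show a curved pattern
fit.log <- lm(y ~ log(x))                # roughly linear on the log scale

plot(fitted(fit.raw), resid(fit.raw))    # diagnostic from point 4
plot(fitted(fit.log), resid(fit.log))    # the pattern should mostly go away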

8 - Normality and equal variance are less important assumptions.

9 - The best way to validate a model is to test it in the real world, with a new set of data. A model is good when its results can be replicated with external data.
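
A sketch of the idea with made-up data: fit on one half, then check the predictions on the half the model never saw.

set.seed(6)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)              # fake data

train <- data.frame(x = x[1:100], y = y[1:100])
test  <- data.frame(x = x[101:200], y = y[101:200])

fit <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)     # predictions for the held-out data
plot(pred, test$y, xlab = "Predicted", ylab = "Observed")
abline(0, 1, lty = 2)                    # a good model should track this line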

For those who like R, the book has plenty of code that beginners can use as a starting point for learning the software.
