Saturday, 4 September 2010

The "Little Jiffy" rule and Factor Analysis

Factor Analysis is a statistical technique we use all the time in Marketing Research. We often want to uncover the dimensions in the consumer's mind that help explain variables of interest such as Overall Satisfaction and Loyalty. Questionnaires are often cluttered with a whole bunch of attributes that make understanding the analysis difficult and tiresome. To make matters worse, it is not uncommon to see people wanting to run a Regression Analysis with 20 or 30 highly correlated attributes. I don't want to talk about Regression here, but to me the only way out in these cases is to run a Factor Analysis and work with the factors instead.

Back in my university years I learnt that a Factor Analysis had to be rotated with Varimax Rotation, especially if you wanted to use the factors as independent variables in a Regression Analysis, because this way the factors would have the desirable property of being orthogonal, and the effect of each one could be estimated cleanly in the Regression, without collinearity. Using Varimax Rotation together with Principal Component Analysis, and retaining as many factors as there are eigenvalues greater than 1 in the correlation matrix, is the so-called "Little Jiffy" approach to factor analysis. And it is widespread: you not only learn it at school, you will also see people using it all the time and find lots of papers following it. I remember coming across the name while doing some analyses for a paper.
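
To make the recipe concrete, here is a minimal NumPy sketch of the Little Jiffy steps applied to a correlation matrix: principal components, the eigenvalue greater than 1 rule (Kaiser's criterion), then varimax. It uses the common SVD-based varimax iteration in its "raw" form, i.e. without the Kaiser row normalization that, for example, R's `stats::varimax` applies by default (`normalize = TRUE`). The function names `varimax` and `little_jiffy` and the six-item correlation matrix are hypothetical illustrations, not taken from any particular package.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Raw varimax rotation (no Kaiser row normalization), SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        rotated = loadings @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)            # column sums of squared loadings
        target = rotated ** 3 - (gamma / p) * rotated * col_ss
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                                # nearest orthogonal matrix
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:           # criterion stopped improving
            break
    return loadings @ rotation, rotation


def little_jiffy(corr):
    """PCA on a correlation matrix, keep eigenvalues > 1, then varimax-rotate."""
    eigvals, eigvecs = np.linalg.eigh(corr)              # returned in ascending order
    order = np.argsort(eigvals)[::-1]                    # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > 1.0))                       # Kaiser criterion
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])     # component loadings
    rotated, _ = varimax(loadings)
    return eigvals, loadings, rotated


# Hypothetical 6-item correlation matrix: items 1-3 and items 4-6 form two clusters.
corr = np.full((6, 6), 0.2)
corr[:3, :3] = 0.7
corr[3:, 3:] = 0.5
np.fill_diagonal(corr, 1.0)

eigvals, loadings, rotated = little_jiffy(corr)
print("eigenvalues:", np.round(eigvals, 3))
print("factors retained:", loadings.shape[1])
print("varimax loadings:\n", np.round(rotated, 2))
```

With this matrix the two largest eigenvalues are 2.2 ± √0.4 (about 2.83 and 1.57) and the other four are 0.5, 0.5, 0.3 and 0.3, so the rule keeps two components. Because the rotation is orthogonal, each item's communality (its row sum of squared loadings) is unchanged; only the way that variance is split between the factors changes, and the rotated factors stay uncorrelated by construction.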

In the workplace I soon noticed that you pay a price for having beautifully orthogonal factors to use in your regression: they are not always easy to interpret, and even when the factors make sense they are not as sharply defined as one would like. It also makes more sense not to constrain the factors to be uncorrelated, because we have no evidence that these factors, if they exist in the real world, are orthogonal. So I abandoned orthogonal rotation.

The number of factors to retain is a more subtle matter. I think the eigenvalue greater than 1 rule usually gives a good starting point for an Exploratory Factor Analysis, but I like to look at several solutions and choose the one that is easiest to interpret, since each factor has to be understood: it will be a target for action on the client side.

Finally, I want to link to this paper, which is an interesting source for those who work with Factor Analysis. There is something in its example (with measurements of boxes...) that I don't like: the non-linear dependence of some measurements on others may be a sign that we should not expect Factor Analysis to perform well there. It would be better to simulate data that clearly follow a given factor model structure and explore those instead.
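
As a rough illustration of that idea, the sketch below draws data from a known common factor model x = Λf + e, with item variances scaled to 1, so the population correlation matrix is ΛΦΛ' + Ψ. The loadings `L`, the factor correlation of 0.4 in `Phi`, the sample size and the helper name `simulate_factor_data` are all hypothetical choices made for this example. Because the truth is known, you can check how close the sample correlations come to the implied ones and what the eigenvalue greater than 1 rule suggests before trying any extraction or rotation.

```python
import numpy as np


def simulate_factor_data(loadings, factor_corr, n, seed=0):
    """Draw n rows from x = L f + e, scaled so every item has unit variance."""
    rng = np.random.default_rng(seed)
    p, k = loadings.shape
    common = loadings @ factor_corr @ loadings.T      # common-factor part of cov(x)
    unique_var = 1.0 - np.diag(common)                # unique variances (Psi)
    if np.any(unique_var <= 0):
        raise ValueError("loadings too large: communalities must be below 1")
    f = rng.multivariate_normal(np.zeros(k), factor_corr, size=n)
    e = rng.normal(size=(n, p)) * np.sqrt(unique_var)
    x = f @ loadings.T + e
    implied = common + np.diag(unique_var)            # population correlation matrix
    return x, implied


# Hypothetical simple structure: items 1-3 load on factor 1, items 4-6 on factor 2,
# and the two factors are correlated (r = 0.4), i.e. NOT orthogonal.
L = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])

x, implied = simulate_factor_data(L, Phi, n=5000, seed=42)
sample = np.corrcoef(x, rowvar=False)

pop_eig = np.sort(np.linalg.eigvalsh(implied))[::-1]
sample_eig = np.sort(np.linalg.eigvalsh(sample))[::-1]
print("population eigenvalues:", np.round(pop_eig, 3))
print("sample eigenvalues:    ", np.round(sample_eig, 3))
print("max |sample - implied|:", round(float(np.abs(sample - implied).max()), 3))
```

From here one could run any extraction and rotation on `x` and compare the recovered loadings with `L`. Since the generating factors are correlated, an orthogonal solution such as varimax cannot reproduce `Phi`: it forces the recovered factors to be uncorrelated even though the true ones are not.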

Do not let the computer decide what to do with your data by applying the Little Jiffy rule of thumb. Instead, take a more proactive approach and bring your own knowledge in: don't be afraid to think through your problem and what makes the most sense in your specific situation.