by johnr » Thu Jul 02, 2015 3:56 pm
Hi Neeraj
The short answer is no, this is probably not a problem.
The long answer is also no, but just a little longer ... You should think of multicollinearity in terms of its effect on the AVC (asymptotic variance-covariance) matrix. In linear regression models, the AVC matrix is
sigma^2 * (X'X)^-1
where X is the design matrix (the data).
The reason we are concerned about correlation comes down to two parts of the above equation: 1) X'X, and 2) the ^-1.
1) X'X
If X is orthogonal, then X'X will have zero off-diagonal elements (the "covariances") and large values along the leading diagonal. Inverting that matrix, i.e., the ^-1 above, produces a matrix that still has zero off-diagonal elements but now has very low values along the leading diagonal. This means zero covariances between the parameter estimates and very small standard errors.
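To make this concrete, here is a minimal sketch in Python/NumPy (my own toy numbers, not anyone's actual design) using a 2^2 full factorial in effects coding, which is orthogonal:

[code]
import numpy as np

# An orthogonal design: a 2^2 full factorial in effects (-1/+1) coding.
X = np.array([[-1., -1.],
              [-1.,  1.],
              [ 1., -1.],
              [ 1.,  1.]])

XtX = X.T @ X
print(XtX)                  # [[4, 0], [0, 4]]: zero off-diagonals, large diagonal
print(np.linalg.inv(XtX))   # [[0.25, 0], [0, 0.25]]: zero off-diagonals, small diagonal
# With sigma^2 = 1, each beta's standard error would be sqrt(0.25) = 0.5.
[/code]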
2) ^-1
The ability to invert the matrix will be affected by the degree of correlation within the columns of the matrix being inverted (in this case X'X). Too high a correlation means that you cannot invert the matrix, which means your matrix is singular!
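And a sketch of the failure case (again, toy numbers of my own), where two attribute columns are perfectly correlated:

[code]
import numpy as np

# Two perfectly correlated attribute columns.
x1 = np.array([-1., -1., 1., 1.])
x2 = x1.copy()                       # x2 duplicates x1 exactly
X = np.column_stack([x1, x2])

XtX = X.T @ X                        # [[4, 4], [4, 4]]: determinant 0
print(np.linalg.matrix_rank(XtX))    # 1, not 2: X'X is singular
# np.linalg.inv(XtX) raises LinAlgError here; with correlation close to (but
# below) 1, the inverse exists but its diagonal -- the variances -- blows up.
[/code]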
The point is, we care about the correlation structure of X, not because we care about the correlation structure of X per se, but because we care about obtaining uncorrelated betas (and low standard errors)! The aim is to find out the influence of X on Y, right? How do we measure that influence? The betas. If a design with high correlation allowed you to uncover the influences of X on Y better than a design with low correlation, which would you choose? That is, would you choose the design that has (so called*) "good" properties (i.e., low correlation) but doesn't allow you to uncover what you want, or the design that has "poor" statistical properties (i.e., high correlations) but gives you what you want? It just so happens that for linear regression models, designs with (so called*) good properties give you what you want.
For the MNL (the same principle holds for other logit models, just with more complicated formulas), the AVC matrix is computed as (Z'Z)^-1, where the elements of Z are given as
z_jk = (x_jk - sum_i(x_ik * P_i)) * sqrt(P_j)
where x_jk is the kth attribute of alternative j, and P_j the choice probability of alternative j. This is non-linear given that P has an exponential in it, meaning that the influence of x enters z through a non-linear monotonic transformation.
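In code, the construction of Z looks something like the following sketch (my own toy example for a single choice set; the function names are mine):

[code]
import numpy as np

def mnl_probs(X, beta):
    """MNL choice probabilities for one choice set: X is (J x K), beta is (K,)."""
    v = X @ beta                  # utilities
    e = np.exp(v - v.max())       # subtract the max for numerical stability
    return e / e.sum()

def z_matrix(X, beta):
    """z_jk = (x_jk - sum_i(x_ik * P_i)) * sqrt(P_j), per the formula above."""
    P = mnl_probs(X, beta)
    x_bar = P @ X                 # probability-weighted mean of each attribute
    return (X - x_bar) * np.sqrt(P)[:, None]

# Toy example: one choice set, 3 alternatives, 2 attributes, non-zero prior betas.
X = np.array([[0., 1.], [1., 0.], [1., 1.]])
beta = np.array([0.8, -0.5])
Z = z_matrix(X, beta)
AVC = np.linalg.inv(Z.T @ Z)      # the (Z'Z)^-1 AVC matrix for this toy design
[/code]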
Given this, and thinking about the above discussion on regression, we can simply replace (X'X)^-1 with (Z'Z)^-1.
1) Z'Z
If Z is orthogonal, then Z'Z will have zero off-diagonal elements and large values along the leading diagonal. Inverting that matrix, i.e., the ^-1 above, produces a matrix that still has zero off-diagonal elements but now has very low values along the leading diagonal. This means zero covariances between the parameter estimates and very small standard errors.
Note, however, that Z is no longer X under any assumption other than that the betas are zero (and of course, I am using the equation provided above for an MNL model - this may not hold for other models). So the question becomes: why do we care (if the betas are non-zero) what the correlation structure of X is, provided that the correlation structure of X does not unduly cause high correlations in Z? The answer is that we shouldn't. We care about Z because we care about the betas (recall, these are our measures of influence).
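You can see this for yourself with a sketch like the one below (again my own toy numbers; it treats four rows as a single choice set purely for illustration): under zero betas the correlation in Z mirrors the correlation in X, while under non-zero betas the two can differ.

[code]
import numpy as np

def z_matrix(X, beta):
    """z_jk = (x_jk - sum_i(x_ik * P_i)) * sqrt(P_j), as in the formula above."""
    v = X @ beta
    P = np.exp(v - v.max())
    P /= P.sum()                              # MNL probabilities
    return (X - P @ X) * np.sqrt(P)[:, None]

# A design whose two attribute columns are correlated by construction.
X = np.array([[0., 0.], [1., 1.], [2., 1.], [2., 2.]])

for beta in (np.zeros(2), np.array([1.0, -2.0])):
    Z = z_matrix(X, beta)
    print(beta, np.corrcoef(Z.T)[0, 1])
# Under beta = 0, P is uniform, so z is just a rescaled, mean-centred x and
# corr(Z) equals corr(X); under non-zero betas the correlation shifts.
[/code]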
2) ^-1
The ability to invert the matrix will be affected by the degree of correlation within the columns of the matrix being inverted (in this case Z'Z). Too high a correlation means that you cannot invert the matrix, which means your Hessian is singular!
But we care about X only insofar as it translates into Z, and that translation will depend on the betas (and hence the probabilities) used when generating the design, or the betas when estimating the model.
So based on the long answer, we arrive at the short answer: "this is probably not a problem." The "probably" in my answer lies in the fact that I have no idea how the correlation structure of your X translates into the correlation structure of your Z matrix, given that I have neither the design nor the betas you used to generate it. However, the fact that Z'Z is invertible (you got an AVC matrix for the design, did you not?) suggests that it is not a problem.
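If you want a quick numerical check on your own design (a sketch of mine, not an Ngene feature), the condition number of Z'Z is one way to flag how close the Hessian is to singular:

[code]
import numpy as np

def check_avc(Z):
    """Given a Z matrix built as above, flag (near-)singularity of Z'Z."""
    ZtZ = Z.T @ Z
    cond = np.linalg.cond(ZtZ)               # very large => (near-)singular
    print("condition number of Z'Z:", cond)
    if np.isfinite(cond) and cond < 1e10:    # rough, arbitrary cutoff for illustration
        return np.linalg.inv(ZtZ)            # the AVC matrix exists
    return None                              # Z'Z is effectively singular
[/code]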
So let me throw a question back to you and others using this forum ... what do you mean by the word multicollinearity when dealing with non-linear models? Should we think of it in terms of X as we do with linear models, or should we define it in terms of Z?
John
* I would argue that the degree of goodness a design has relates to the outputs of the design and not the design itself. So to blindly say that such and such a property, such as orthogonality, is good (or bad) is, to me, like saying drinking alcohol is always bad. When about to drive home, this is probably a true statement. When trying to get the courage to talk to that cute girl (or guy) at a party, then a beer or two is probably a good idea (at least for me).