Getting correlated attributes in an Efficient Design_Doubt


Postby neeraj85 » Tue Jun 23, 2015 3:35 pm

Hi,

I read chapter 7 of the NGENE manual and understood that efficient designs are based on minimizing the AVC matrix, hence orthogonality need not necessarily hold.

I developed an efficient design using prior betas, eff(mnl,d), and 3 alternatives (1 status quo and 2 hypothetical). Each alternative is defined by 4 attributes, all of which are alternative-specific variables.

I collected data with this design and fitted a Mixed Logit model to it. The estimates have the expected signs and are significant.

However, the following is the correlation matrix for the 4 attributes (data is in long form):

       1     2     3     4
  1  1.0
  2  0.7   1.0
  3  0.3   0.4   1.0
  4  0.8   0.5   0.3   1.0

I am wondering whether this high degree of correlation is going to impact my analysis. I feel it won't have an impact, as my estimates are already significant.
Additionally, I read in another post that if you estimate only a single parameter across both alternatives, then there is essentially no multicollinearity.

Please let me know whether my understanding is correct or not.

Thanks
Neeraj

Re: Getting correlated attributes in an Efficient Design_Doubt

Postby johnr » Thu Jul 02, 2015 3:56 pm

Hi Neeraj

The short answer is no, this is probably not a problem.

The long answer is also no, but just a little longer ... You should think of multicollinearity in terms of its effect on the AVC matrix. In linear regression models, the AVC matrix is

sigma^2 * (X'X) ^-1

where X is the data.
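To make this concrete, here is a small NumPy sketch of that formula. The design matrix and sigma^2 below are made up for illustration, not taken from the thread:

```python
import numpy as np

# Hypothetical orthogonal design: 8 runs, 2 effects-coded attributes
X = np.array([[ 1, -1], [ 1,  1], [-1, -1], [-1,  1],
              [ 1, -1], [ 1,  1], [-1, -1], [-1,  1]], dtype=float)
sigma2 = 1.0  # assumed (known) error variance

# AVC matrix of the OLS estimator: sigma^2 * (X'X)^-1
avc = sigma2 * np.linalg.inv(X.T @ X)
print(avc)  # off-diagonals are zero because the columns are orthogonal
```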

The issue, and the reason we are concerned about correlation, comes down to 1) X'X and 2) the ^-1 in the above equation.

1) X'X

If X is orthogonal, then X'X will have zero off-diagonal elements (covariances), and the leading diagonal will take large values. Inverting this matrix, i.e., the ^-1 above, yields a matrix with zero covariances and very small values along the leading diagonal. This means uncorrelated parameter estimates and very small standard errors.

2) ^-1

The ability to invert the matrix will be affected by the degree of correlation within the columns of the matrix being inverted (in this case X'X). Too high a correlation means that you cannot invert the matrix, which means your matrix is singular!
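A quick simulated sketch of that singularity problem: as two columns of X approach perfect correlation, the condition number of X'X explodes, and at perfect correlation the inverse does not exist at all. The data here are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 1e-8 * rng.normal(size=100)   # near-perfectly correlated column
X = np.column_stack([x1, x2])

# The condition number of X'X blows up as the columns become collinear;
# a singular X'X cannot be inverted, so no standard errors can be computed.
cond = np.linalg.cond(X.T @ X)
print(cond)
```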

The point is, we care about the correlation structure of X, not because we care about the correlation structure of X, but because we care about obtaining uncorrelated betas (and low standard errors)! The aim is to find out the influence of X on Y right? How do we measure that influence - the betas. If a design with high correlation allowed you to uncover the influences of X on Y better than a design with low correlation, which would you choose? That is, would you choose the design that has (so called*) "good" properties (i.e., low correlation) but doesn't allow you to uncover what you want, or the design that has "poor" statistical properties (i.e., high correlations) but gives you what you want? It just so happens that for linear regression models, designs with (so called*) good properties, give you what you want.

For the MNL (the same principle applies to other logit models, just with more complicated formulas), the AVC matrix is computed as (Z'Z)^-1, where the elements of Z are given by z_jk = (x_jk - sum_i x_ik * P_i) * sqrt(P_j), where x_jk is the kth attribute of alternative j and P_j is the choice probability of alternative j. This is non-linear given that P contains an exponential, meaning that x enters z via a non-linear monotonic transformation.

Given this, and thinking about the above discussion on regression, we can simply replace (X'X) ^-1 with (Z'Z) ^-1.
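Here is a minimal sketch of that replacement for a single choice set, following the z formula above. The attribute levels and prior betas are made up for illustration:

```python
import numpy as np

def mnl_z(X, beta):
    """Build the Z matrix entering the MNL AVC for one choice set,
    using z_jk = (x_jk - sum_i x_ik * P_i) * sqrt(P_j)."""
    p = np.exp(X @ beta)
    p /= p.sum()                      # MNL choice probabilities
    centred = X - p @ X               # probability-weighted centring
    return centred * np.sqrt(p)[:, None]

# Hypothetical 3-alternative, 2-attribute choice set with prior betas
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.9]])
beta = np.array([-0.5, 0.8])
Z = mnl_z(X, beta)
avc = np.linalg.inv(Z.T @ Z)          # the design's (Z'Z)^-1
```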

1) Z'Z

If Z is orthogonal, then Z'Z will have zero off-diagonal elements (covariances), and the leading diagonal will take large values. Inverting this matrix, i.e., the ^-1 above, yields a matrix with zero covariances and very small values along the leading diagonal. This means uncorrelated parameter estimates and very small standard errors.

Note, however, that Z is no longer X under any assumption other than the betas being zero (and, of course, I am using the equation given above for an MNL model - this may not hold for other models). So the question becomes: why do we care (if the betas are non-zero) what the correlation structure of X is, provided that the correlation structure of X does not unduly cause high correlations in Z? The answer is that we shouldn't. We care about Z because we care about the betas (recall, these are our measures of influence).
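To see that the correlation structure of Z depends on the betas, here is a small simulated sketch using the same z formula; the choice set and betas are invented. With zero betas, the probabilities are uniform, so Z is just centred, rescaled X; with non-zero betas, the probabilities reweight X non-linearly:

```python
import numpy as np

def mnl_z(X, beta):
    # z_jk = (x_jk - sum_i x_ik * P_i) * sqrt(P_j)
    p = np.exp(X @ beta)
    p /= p.sum()
    return (X - p @ X) * np.sqrt(p)[:, None]

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))   # one hypothetical 4-alternative choice set

# Correlation between the two columns of Z, under zero and non-zero betas:
corr0 = np.corrcoef(mnl_z(X, np.zeros(2)).T)[0, 1]
corr1 = np.corrcoef(mnl_z(X, np.array([1.5, -2.0])).T)[0, 1]
print(corr0, corr1)  # the betas change the correlation structure of Z
```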

2) ^-1

The ability to invert the matrix will be affected by the degree of correlation within the columns of the matrix being inverted (in this case Z'Z). Too high a correlation means that you cannot invert the matrix, which means your Hessian is singular!

But we care about X only insofar as how X translates into Z, and that will depend on the betas (probabilities), etc., used when generating the design, or the betas when estimating the model.

So based on the long answer, we arrive at the short answer: "this is probably not a problem." The "probably" in my answer lies in the fact that I have no idea how the correlation structure of your X translates into the correlation structure of your Z matrix, given that I have neither the design nor the betas you used to generate it. However, the fact that Z'Z is invertible (you obtained an AVC matrix for the design, did you not?) suggests that it is not a problem.

So let me throw a question back to you and others using this forum ... what do you mean by the word multicollinearity when dealing with non-linear models? Should we think of it in terms of X as we do with linear models, or should we define it in terms of Z?

John

* I would argue that the degree of goodness a design has relates to the outputs of the design and not the design itself. So to blindly say that such and such a property, such as orthogonality, is good (or bad) is, to me, like saying drinking alcohol is always bad. When about to drive home, this is probably a true statement. When trying to get the courage to talk to that cute girl (or guy) at a party, then a beer or two is probably a good idea (at least for me).

