Acceptable level of correlation between attributes
Posted:
Tue Aug 05, 2014 2:08 pm
by Kreg
I recently created an experimental design in Ngene for a choice experiment with 4 attributes (4*4*3*2), two alternatives + neither alternative, and one choice set per respondent given the length of other survey content. The smallest orthogonal design generated by Ngene was 72 rows. I was able to generate D-efficient designs with 16 rows, but the Pearson correlations between attributes in the design were high. The correlations within attributes (across alternatives) also were high, with a correlation of -1 common for the 2-level attribute. This was a concern in case this attribute were particularly salient for respondents.
I tried the modified Fedorov algorithm approach, but the designs were very imbalanced. For example, one level of one of the 4-level attributes appeared in the first alternative only once in 16 choice sets.
Question -- is there guidance on when inter-attribute and intra-attribute (inter-alternative) correlations become sufficiently high to discard a design in favour of others that are less desirable on other measures (D-error, number of rows, etc.)?
Thanks in advance for any tips.
Re: Acceptable level of correlation between attributes
Posted:
Tue Aug 12, 2014 2:46 pm
by johnr
Hi Kreg
Thanks for your question. The issue of correlation in non-linear models is an interesting one. One needs to put aside previous prejudices arising from working with linear models. If you look at the designs generated by Street and Burgess, for example, you will get exactly what you describe in terms of perfect negative correlations between the alternatives. If you follow the design strategy suggested by Kanninen in her 2002 paper, you will also get this result for all but one attribute. It is worth noting that both design types are OPTIMAL, not merely efficient. You will tend to get this type of result in (but not limited to) cases where the designs are generated under the assumption of zero priors and unlabelled alternatives. With unlabelled alternatives, this is not an issue, at least statistically. Behaviourally, however, it may be another matter. It means that the attribute will never take the same level across the two alternatives, so respondents will always have to trade off on that attribute. This is called minimum overlap, and whilst some of the literature promotes this type of design, personally I have found it to be problematic in that it has the potential to create lexicographic behaviour.
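To make the link between minimum overlap and the -1 correlation concrete, here is a small sketch in plain Python. The four-row, 0/1-coded attribute and the helper name pearson are made up for illustration; they are not taken from any actual Ngene output.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical 2-level attribute (coded 0/1) over four choice sets.
# Under minimum overlap, alternative B always takes the opposite level to A.
level_alt_a = [0, 1, 0, 1]
level_alt_b = [1, 0, 1, 0]

r = pearson(level_alt_a, level_alt_b)
no_overlap = all(a != b for a, b in zip(level_alt_a, level_alt_b))
print(r, no_overlap)
```

For a 2-level attribute, never sharing a level across the two alternatives and having a correlation of -1 are the same thing, which is why this shows up so often in these designs.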
The question worth asking is why avoiding correlation is important in the first place. In linear models, the variance-covariance matrix is sigma^2 * (X'X)^-1. If you ignore sigma^2 (it multiplies all elements of (X'X)^-1, so is often ignored), then you are left with (X'X)^-1, where X is the data/design. The interesting thing is that when X is orthogonal, X'X will tend to have very large values on the leading diagonal and zero off-diagonals. When you invert this matrix, you get very small values on the leading diagonal and zero off-diagonals. That is, you get zero covariances and the smallest possible variances for the parameter estimates. It also ensures that X'X is invertible.
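As a rough illustration of the linear-model case, the sketch below compares (X'X)^-1 for an orthogonal and a correlated two-attribute design. Both toy designs (effects coded -1/+1) and the helper names gram and inv2 are made up for this example, and sigma^2 is ignored as described above.

```python
def gram(X):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(k)]
            for a in range(k)]

def inv2(M):
    """Invert a 2x2 matrix (fails with ZeroDivisionError if singular)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical designs with two attributes and four rows each.
orthogonal = [[-1, -1], [-1, 1], [1, -1], [1, 1]]   # full factorial
correlated = [[-1, -1], [-1, -1], [1, 1], [1, -1]]  # columns correlated

vc_orth = inv2(gram(orthogonal))  # zero off-diagonals
vc_corr = inv2(gram(correlated))  # non-zero covariances, larger variances

print(vc_orth)
print(vc_corr)
```

In this toy case the orthogonal design gives a diagonal matrix, while the correlated design gives larger diagonal elements and non-zero off-diagonals. A perfectly correlated pair of columns would make X'X singular, so inv2 would fail.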
But this is not the VC matrix of a logit model. The logit model VC matrix is (Z'Z)^-1, where z_ksi = (x_ksi - sum over j of x_ksj * Psj) * sqrt(Psi), x_ksi is the kth attribute of alternative i in choice set s, and Psj is the probability that alternative j will be chosen in choice set s. Note that Z contains P, the probability, which is a function of the design and the betas. Hence, it is not just the X that matters, as with linear models, but also the betas. In cases where the betas are all 0, the choice probabilities will be constants and the model will approximate a linear model (which is why orthogonal designs tend to be optimal under this assumption). But if the betas are not zero, then the choice probabilities will vary over choice tasks, and the optimal design will not be orthogonal. This means that some correlation in the Xs is actually good for the model; however, how much will depend on Z, which depends on the design itself and the betas. Hence, there is no real answer other than that you need to be able to invert Z'Z, and this will generally not be possible if the Xs are (near) perfectly correlated. How near is near is an open question. In our book, Hensher, Rose and Greene (2005), we provide data (which you can download) with correlations up to 0.9. The models estimate perfectly, all the estimates are in the directions expected, etc.
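The dependence of Z'Z on the betas can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name mnl_information, the four-choice-set minimum overlap design and the prior values (1.0, 0.5) are all hypothetical, and each row of Z is weighted by the square root of that alternative's own choice probability.

```python
import math

def mnl_information(design, beta):
    """Return Z'Z for an MNL model.

    design: list of choice sets; each choice set is a list of alternatives;
            each alternative is a list of K attribute levels.
    beta:   list of K prior parameter values.
    """
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for choice_set in design:
        # Choice probabilities P_sj for this choice set
        expv = [math.exp(sum(coef * x for coef, x in zip(beta, alt)))
                for alt in choice_set]
        total = sum(expv)
        probs = [e / total for e in expv]
        # Probability-weighted attribute means: sum over j of x_ksj * P_sj
        xbar = [sum(p * alt[a] for p, alt in zip(probs, choice_set))
                for a in range(k)]
        for p, alt in zip(probs, choice_set):
            # One row of Z: (x_ksi - weighted mean) * sqrt(P_si)
            z = [(alt[a] - xbar[a]) * math.sqrt(p) for a in range(k)]
            for a in range(k):
                for b in range(k):
                    info[a][b] += z[a] * z[b]
    return info

# Hypothetical design: two unlabelled alternatives, two attributes,
# minimum overlap (alternative B is always the mirror image of A).
design = [
    [[-1, -1], [1, 1]],
    [[-1, 1], [1, -1]],
    [[1, -1], [-1, 1]],
    [[1, 1], [-1, -1]],
]

info_zero = mnl_information(design, [0.0, 0.0])   # zero priors
info_prior = mnl_information(design, [1.0, 0.5])  # made-up non-zero priors

print(info_zero)
print(info_prior)
```

With zero priors this design gives a diagonal Z'Z, just as in the linear case. With the made-up non-zero priors, the same design gives smaller diagonal elements and non-zero off-diagonals, so whether a design is good depends on the betas as well as on the Xs.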
John
Re: Acceptable level of correlation between attributes
Posted:
Thu Aug 21, 2014 9:30 am
by Kreg
Thank you very much for the response.