Long story short - I generate efficient designs optimized for MXL or MNL, as well as OOD designs. I use these designs to produce artificial choice data (with MXL as the data generating process) and estimate MXL models to check whether I can recover the original parameters. I repeatedly observe that the coefficient for the alternative specific constant (ASC) is recovered better by the designs optimized for MNL than by the designs optimized for MXL (despite including ;con in the Ngene syntax)! With the MXL designs the ASCs turn out insignificant more often.
When I investigated this more closely I found that the d-error of the MXL designs (recovered from simulations) is higher than that of the MNL or OOD designs.
Here are the details of a simulation I set up to reproduce this weird result:
1. Designs
I used 15 different designs which are based on the following syntax (with variations outlined below):
Code:
design
; alts = alt1, alt2, alt3
; rows = 48
; block = 4
; rdraws=sobol(10000)
; rep = 1200
; eff = (rppanel,d)
? ; eff = (mnl,d)
? ; orth = ood
; con
; model:
U(alt1) = bsq[n,-1,0.5] /
U(alt2) = bA1[n,1,0.5]*A1[1,0]
+ bA2[n,1,0.5]*A2[1,0]
+ bA3[n,1,0.5]*A3[1,0]
+ bC[n,1,0.5]*C[-1,-2,-3,-4] /
U(alt3) = bA1*A1
+ bA2*A2
+ bA3*A3
+ bC*C
$
There are 15 designs because they differ in the following way:
- I vary rdraws and rep - I let each efficient design 'cook' in Ngene for about 2 days at different levels of simulation precision.
- I include ;block or not (keeping the number of rows unchanged)
- I include ;con or not
- I optimize designs for mnl
- I optimize designs for ood
Here is the summary of the design types used:
Code:
design  type  block  con  d-error  d-error ('true')
1.1     mxl   1      1    0.4151   0.4439
1.2     mxl   1      1    0.4151   0.4383
1.3     mxl   1      1    0.4394
2.1     mxl   1      0    0.3113   0.3304
2.2     mxl   1      0    0.3421   0.3471
2.3     mxl   1      0    0.3542
3.1     mxl   0      1    0.3768   0.5415
3.2     mxl   0      1    0.4615   0.4993
3.3     mxl   0      1    0.5349
4.1     mnl   1      1
4.2     ood   1      1
5.1     mnl   1      0
5.2     ood   1      0
6.1     mnl   0      1
6.2     ood   0      1
For example, designs 1.1, 1.2 and 1.3 differ with respect to rdraws and rep only - 1.1 was generated with rdraws = 100 and rep = 100, 1.2 used rdraws = 1000 and rep = 1000, while 1.3 used rdraws = 10000 and rep = 1200.
2.x and 5.x did not include ;con
3.x and 6.x did not include ;block
4.1, 5.1 and 6.1 were optimized for mnl
4.2, 5.2 and 6.2 were optimized for ood
The table above (which is formatted badly here, but is also available in Excel format below) includes the d-error reported by Ngene and the 'true' d-error, obtained by evaluating the final design at 10000 Sobol draws and 1200 rep. You can see that, because of the limited time, some of the x.1 or x.2 designs are in fact better than the x.3 designs (the x.3 versions did not complete enough iterations in the time available).
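For anyone who wants to reproduce a d-error check outside Ngene, here is a minimal sketch (in Python, not my actual Matlab code) of the MNL D-error: det of the AVC matrix, i.e. the inverse Fisher information at the priors, normalized by the number of parameters. The design matrix and priors below are toy values, not one of the designs above:

```python
import numpy as np

def mnl_d_error(X, beta):
    """D-error of a design under MNL: det(AVC)^(1/K), where the AVC
    matrix is the inverse of the Fisher information at the priors.

    X: (tasks, alts, K) array of attribute levels, beta: (K,) priors.
    """
    K = len(beta)
    info = np.zeros((K, K))
    for Xt in X:                                  # loop over choice tasks
        p = np.exp(Xt @ beta)
        p /= p.sum()                              # logit choice probabilities
        # Fisher information contribution of this task: X' (diag(p) - p p') X
        info += Xt.T @ (np.diag(p) - np.outer(p, p)) @ Xt
    return np.linalg.det(np.linalg.inv(info)) ** (1.0 / K)

# toy design: 2 tasks, 3 alternatives, 2 attributes (values made up)
X = np.array([[[1, 0], [0, 1], [1, 1]],
              [[0, 0], [1, 0], [0, 1]]], dtype=float)
print(mnl_d_error(X, np.array([1.0, -1.0])))
```

The Bayesian/MXL versions differ only in that the information matrix is averaged over draws from the prior (which is what the rdraws setting controls).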
2. Data generating process
With the designs at hand I generate the data.
I assume 1200 respondents each facing 12 choice tasks (irrespective of whether a design was blocked or not).
Respondents' preferences are normally distributed, using the same parameters as for generating designs.
I then calculate the utilities associated with each alternative, add Gumbel errors, and simulate choices.
The process is repeated 100 times for each design type. I am happy to have the Matlab code inspected if anyone is interested.
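The data generating process above can be sketched as follows (a Python analogue of the Matlab code; the 0/1 attribute array is a random placeholder standing in for the actual Ngene design, and the dimension names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# dimensions mirroring the setup above
N, T, J, K = 1200, 12, 3, 5                    # respondents, tasks, alternatives, attributes
mu = np.array([-1.0, 1.0, 1.0, 1.0, 1.0])      # prior means (bsq, bA1, bA2, bA3, bC)
sd = np.full(K, 0.5)                           # prior standard deviations

beta = rng.normal(mu, sd, size=(N, K))         # individual-specific coefficients

# placeholder attribute levels; a real run would plug in the Ngene design here
X = rng.choice([0.0, 1.0], size=(N, T, J, K))

V = np.einsum('ntjk,nk->ntj', X, beta)         # systematic utilities
eps = rng.gumbel(size=(N, T, J))               # iid Gumbel (type I EV) errors
choice = (V + eps).argmax(axis=2)              # chosen alternative per person/task
```

Adding the Gumbel error and taking the argmax is equivalent to drawing choices from the logit probabilities, which is why this reproduces the MXL data generating process.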
3. Estimation
For each of the datasets generated in 2. I estimate an MXL model. The LL is simulated using 1000 Sobol draws; the models converge nicely.
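For reference, a minimal sketch of the simulated log-likelihood for a panel MXL with independent normal coefficients, using scrambled Sobol draws via SciPy (an illustration of the method, not my actual estimation code; all names are mine, and the same draws are reused across respondents for brevity):

```python
import numpy as np
from scipy.stats import norm, qmc

def simulated_ll(theta, X, y, R=1024):
    """Simulated log-likelihood for a panel MXL with normal coefficients.

    theta: concatenated (means, log-sds), length 2K.
    X: (N, T, J, K) attributes; y: (N, T) chosen alternatives.
    """
    N, T, J, K = X.shape
    mu, sd = theta[:K], np.exp(theta[K:])
    # quasi-random Sobol draws transformed to standard normals
    u = qmc.Sobol(d=K, scramble=True, seed=1).random(R)
    z = norm.ppf(u)                                 # (R, K)
    ll = 0.0
    for n in range(N):
        beta = mu + sd * z                          # (R, K) draws for person n
        V = np.einsum('tjk,rk->rtj', X[n], beta)    # utilities per draw/task/alt
        P = np.exp(V - V.max(axis=2, keepdims=True))
        P /= P.sum(axis=2, keepdims=True)           # logit probabilities per draw
        chosen = P[:, np.arange(T), y[n]]           # prob of the observed choice
        ll += np.log(chosen.prod(axis=1).mean())    # panel product, then average
    return ll
```

The panel structure enters through the product of the task-level probabilities within a respondent before averaging over the draws; R should be a power of 2 to preserve the balance properties of the Sobol sequence.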
Results
The full results are available here:
http://czaj.org/pub/results%20-%20con%20test.xlsx
Here is the summary:
Code:
                                                       d-error ^ 30                            trace
design  type  block  con  d-error  d-error ('true')   mean    median  min     max             mean    median  min     max
1.1     mxl   1      1    0.4151   0.4439             3.9128  1.6908  0.5889   78.1367        0.0200  0.0155  0.0127  0.2181
1.2     mxl   1      1    0.4151   0.4383             4.3882  1.5867  0.6061  215.6998        0.0214  0.0150  0.0129  0.4929
1.3     mxl   1      1    0.4394   0.4394             2.2341  1.8054  0.9440    9.3334        0.0155  0.0147  0.0131  0.0292
2.1     mxl   1      0    0.3113   0.3304             1.5419  1.3710  0.6293    3.7269        0.0158  0.0151  0.0132  0.0242
2.2     mxl   1      0    0.3421   0.3471             2.1977  1.9529  0.9850    7.4614        0.0159  0.0152  0.0136  0.0321
2.3     mxl   1      0    0.3542   0.3542             6.0659  3.0184  1.3638   99.1343        0.0220  0.0163  0.0144  0.2050
3.1     mxl   0      1    0.3768   0.5415             4.3769  2.7256  1.1420  104.8539        0.0179  0.0156  0.0136  0.1799
3.2     mxl   0      1    0.4615   0.4993             1.7092  0.8619  0.4220   68.3785        0.0165  0.0134  0.0120  0.2613
3.3     mxl   0      1    0.5349   0.5349             4.0756  3.4583  1.2738   22.4224        0.0178  0.0166  0.0137  0.0548
4.1     mnl   1      1    0.0000   0.0000             0.3070  0.2547  0.1520    1.2487        0.0125  0.0118  0.0108  0.0246
4.2     ood   1      1    0.0000   0.0000             1.1414  0.6939  0.2502   14.2980        0.0184  0.0142  0.0117  0.1630
5.1     mnl   1      0    0.0000   0.0000             0.3157  0.2413  0.1191    1.6325        0.0128  0.0118  0.0105  0.0341
5.2     ood   1      0    0.0000   0.0000             0.8741  0.4314  0.2137   31.6656        0.0180  0.0133  0.0116  0.3336
6.1     mnl   0      1    0.0000   0.0000             0.2770  0.2554  0.1190    0.7470        0.0123  0.0120  0.0106  0.0173
6.2     ood   0      1    0.0000   0.0000             1.3426  0.6011  0.1916   19.0074        0.0192  0.0146  0.0114  0.1592
As you can see:
1. the mean and median (over the 100 repetitions) d-error of the designs optimized for MXL is substantially higher than that of the MNL or OOD designs; the d-error of the MNL designs is lower than that of the OOD designs.
2. the differences in the trace of the AVC matrix are much smaller, but MNL and OOD are still better than MXL.
3. excluding ;con from the design resulted in better MXL designs (lower d-errors, though still worse than MNL or OOD).
4. excluding blocking from the design resulted in better MXL designs (lower d-errors, though still worse than MNL or OOD).
5. ... some other comparisons are possible
When looking at the recovered parameters, their standard errors and z-stats (in the Excel file), they are not all that different.
But you can see what I indicated at the beginning when you look at the minimum of the z-stats for the standard deviation of the ASC parameter: it is often 0 for the MXL (or OOD) designs, while this does not happen for the MNL designs. The difference was much more evident for designs/data with fewer choice tasks per respondent and fewer respondents - I can run a simulation like that too, if it helps.
Questions
- How is it possible that the d-error of the MXL designs is worse than that of the MNL designs?
- How come the s.d. of the ASC is recovered better by MNL-optimized than by MXL-optimized designs?
I appreciate any comments you could have on these weird results.