Problems with ASC parameter identification

Post by miq » Fri Sep 11, 2015 9:19 pm

I hope to pick your brain (identify an error?) about a weird result I observed when running simulations with different design types.

Long story short - I generate efficient designs optimized for MXL or MNL, or OOD designs. I use these designs to produce artificial choice data (using MXL as data generating process) and estimate MXL models to check if I am able to recover the original parameters. I repeatedly observe that the coefficient for the alternative specific constant (ASC) is better recovered by the designs optimized for MNL than designs optimized for MXL (despite including ;con in NGENE syntax)! MXL designs result in the ASCs turning out insignificant more often.
When I investigated this more closely I found that the d-error of the MXL designs (recovered from simulations) is higher than that of the MNL or OOD designs.

Here are the details of a simulation I set up to reproduce this weird result:

1. Designs
I used 15 different designs which are based on the following syntax (with variations outlined below):

design
; alts = alt1, alt2, alt3
; rows = 48
; block = 4
; rdraws=sobol(10000)
; rep = 1200
; eff = (rppanel,d)
? ; eff = (mnl,d)
? ; orth = ood
; con

; model:
U(alt1) = bsq[n,-1,0.5] /
U(alt2) = bA1[n,1,0.5]*A1[1,0]
        + bA2[n,1,0.5]*A2[1,0]
        + bA3[n,1,0.5]*A3[1,0]
        + bC[n,1,0.5]*C[-1,-2,-3,-4] /
U(alt3) = bA1*A1
        + bA2*A2
        + bA3*A3
        + bC*C
$



There are 15 designs because they differ in the following way:

- I vary rdraws and rep - I let each efficient design 'cook' in NGene for about 2 days with different simulation precision.
- I include ;block or not (keeping the number of rows unchanged)
- I include ;con or not
- I optimize designs for mnl
- I optimize designs for ood

Here is the summary of the design types used:
     design  block  con  d-error  d-error ('true')
1.1  mxl     1      1    0.4151   0.4439
1.2  mxl     1      1    0.4151   0.4383
1.3  mxl     1      1    -        0.4394
2.1  mxl     1      0    0.3113   0.3304
2.2  mxl     1      0    0.3421   0.3471
2.3  mxl     1      0    -        0.3542
3.1  mxl     0      1    0.3768   0.5415
3.2  mxl     0      1    0.4615   0.4993
3.3  mxl     0      1    -        0.5349
4.1  mnl     1      1    -        -
4.2  ood     1      1    -        -
5.1  mnl     1      0    -        -
5.2  ood     1      0    -        -
6.1  mnl     0      1    -        -
6.2  ood     0      1    -        -


For example, designs 1.1, 1.2 and 1.3 differ with respect to rdraws and rep only - 1.1 was generated with rdraws = 100 and rep = 100, 1.2 used rdraws = 1000 and rep = 1000, while 1.3 used rdraws = 10000 and rep = 1200.
2.x and 5.x did not include ;con
3.x and 6.x did not include ;block
4.1, 5.1 and 6.1 were optimized for mnl
4.2, 5.2 and 6.2 were optimized for ood

The table above (also available in Excel format below) includes the d-error reported by NGENE and the 'true' d-error, obtained by evaluating the final design with 10000 Sobol draws and rep = 1200. Because of limited time, some of the x.1 or x.2 designs are in fact better than the x.3 designs (the x.3 versions did not complete enough iterations in the time that was available).

2. Data generating process
With the designs at hand I generate the data.
I assume 1200 respondents each facing 12 choice tasks (irrespective of whether a design was blocked or not).
Respondents' preferences are normally distributed, using the same parameters as for generating designs.
I then calculate utilities associated with each alternative, add Gumbel error, simulate choices.
The process is repeated 100 times for each design type. I am happy to have the Matlab code inspected, if you are interested.
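
For readers who want to reproduce step 2 without the Matlab code, the data generating process can be sketched in Python roughly as follows. This is a minimal sketch, not the code behind the results in this thread: `random_design` is a hypothetical stand-in for a design exported from Ngene, every respondent is assumed to face the same 12 tasks, and the column order [sq, A1, A2, A3, C] is an assumption. The means and standard deviations are the priors from the Ngene syntax above.

```python
import numpy as np

# Priors from the Ngene syntax: bsq ~ N(-1, 0.5), all other parameters ~ N(1, 0.5)
MEAN = np.array([-1.0, 1.0, 1.0, 1.0, 1.0])
SD = np.full(5, 0.5)

def random_design(n_tasks, rng):
    """Hypothetical stand-in for an Ngene design; columns are [sq, A1, A2, A3, C]."""
    design = np.zeros((n_tasks, 3, 5))
    design[:, 0, 0] = 1.0                                          # alt1 carries only the constant
    design[:, 1:, 1:4] = rng.integers(0, 2, (n_tasks, 2, 3))       # dummy attributes A1..A3
    design[:, 1:, 4] = rng.choice([-1, -2, -3, -4], (n_tasks, 2))  # attribute C
    return design

def simulate_choices(design, n_resp, mean, sd, rng):
    """Return an (n_resp, n_tasks) array with the index of the chosen alternative."""
    n_par = design.shape[2]
    # One draw of the taste parameters per respondent, held fixed across tasks (panel)
    beta = mean + sd * rng.standard_normal((n_resp, n_par))
    # Deterministic utilities, shape (respondents, tasks, alternatives)
    v = np.einsum("tak,rk->rta", design, beta)
    # Add i.i.d. Gumbel errors and pick the alternative with the highest utility
    u = v + rng.gumbel(size=v.shape)
    return u.argmax(axis=2)
```

Calling `simulate_choices(random_design(12, rng), 1200, MEAN, SD, rng)` with `rng = np.random.default_rng(seed)` gives one artificial dataset; repeating the call 100 times mirrors the repetitions described above.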

3. Estimation
For each of the datasets generated in 2. I estimate the MXL model. The LL is simulated using 1000 Sobol draws, and the models converge nicely.
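
The simulated log-likelihood that such an estimation maximises can be sketched as below. This is an illustration under simplifying assumptions, not the estimation code used here: the draws are passed in as an argument (they could come from a Sobol sequence transformed to normals, but any standard-normal draws work), the same draws are shared by all respondents, every respondent sees the same tasks, and parameters are independent normals.

```python
import numpy as np

def simulated_loglik(theta, design, choices, draws):
    """Simulated log-likelihood of a panel MXL with independent normal parameters.

    theta   : length 2K vector, means followed by standard deviations
    design  : (T, A, K) attribute levels, the same T tasks for every respondent
    choices : (R, T) index of the chosen alternative
    draws   : (D, K) standard-normal draws, shared by all respondents (simplification)
    """
    k = design.shape[2]
    mean, sd = theta[:k], np.abs(theta[k:])
    beta = mean + sd * draws                       # (D, K) parameter draws
    v = np.einsum("tak,dk->dta", design, beta)     # (D, T, A) utilities
    v = v - v.max(axis=2, keepdims=True)           # guard against overflow in exp
    p = np.exp(v)
    p /= p.sum(axis=2, keepdims=True)              # logit probabilities per draw and task
    tasks = np.arange(design.shape[0])
    p_chosen = p[:, tasks, choices]                # (D, R, T) probability of each observed choice
    p_sequence = p_chosen.prod(axis=2)             # product over a respondent's tasks
    # Average over draws, then sum the logs over respondents
    return float(np.log(p_sequence.mean(axis=0)).sum())
```

The negative of this function can be handed to a numerical optimiser; the inverse Hessian at the optimum is then the AVC matrix from which the z-stats are computed.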

Results

The full results are available here:
http://czaj.org/pub/results%20-%20con%20test.xlsx

Here is the summary:
                                                    d-error ^ 30                      trace
     design  block  con  d-error  d-error ('true')  mean    median  min     max       mean    median  min     max
1.1  mxl     1      1    0.4151   0.4439            3.9128  1.6908  0.5889  78.1367   0.0200  0.0155  0.0127  0.2181
1.2  mxl     1      1    0.4151   0.4383            4.3882  1.5867  0.6061  215.6998  0.0214  0.0150  0.0129  0.4929
1.3  mxl     1      1    0.4394   0.4394            2.2341  1.8054  0.9440  9.3334    0.0155  0.0147  0.0131  0.0292
2.1  mxl     1      0    0.3113   0.3304            1.5419  1.3710  0.6293  3.7269    0.0158  0.0151  0.0132  0.0242
2.2  mxl     1      0    0.3421   0.3471            2.1977  1.9529  0.9850  7.4614    0.0159  0.0152  0.0136  0.0321
2.3  mxl     1      0    0.3542   0.3542            6.0659  3.0184  1.3638  99.1343   0.0220  0.0163  0.0144  0.2050
3.1  mxl     0      1    0.3768   0.5415            4.3769  2.7256  1.1420  104.8539  0.0179  0.0156  0.0136  0.1799
3.2  mxl     0      1    0.4615   0.4993            1.7092  0.8619  0.4220  68.3785   0.0165  0.0134  0.0120  0.2613
3.3  mxl     0      1    0.5349   0.5349            4.0756  3.4583  1.2738  22.4224   0.0178  0.0166  0.0137  0.0548
4.1  mnl     1      1    0.0000   0.0000            0.3070  0.2547  0.1520  1.2487    0.0125  0.0118  0.0108  0.0246
4.2  ood     1      1    0.0000   0.0000            1.1414  0.6939  0.2502  14.2980   0.0184  0.0142  0.0117  0.1630
5.1  mnl     1      0    0.0000   0.0000            0.3157  0.2413  0.1191  1.6325    0.0128  0.0118  0.0105  0.0341
5.2  ood     1      0    0.0000   0.0000            0.8741  0.4314  0.2137  31.6656   0.0180  0.0133  0.0116  0.3336
6.1  mnl     0      1    0.0000   0.0000            0.2770  0.2554  0.1190  0.7470    0.0123  0.0120  0.0106  0.0173
6.2  ood     0      1    0.0000   0.0000            1.3426  0.6011  0.1916  19.0074   0.0192  0.0146  0.0114  0.1592



As you can see:
1. the mean or median (of the 100 repetitions) d-error of the designs optimized for MXL is substantially higher than that of the MNL or OOD designs. The d-error for the MNL designs is lower than for the OOD designs.
2. the differences in the trace of the AVC matrix are much lower, but MNL and OOD are still better than MXL.
3. excluding ;con from the design resulted in a better mxl design (in terms of lower d-errors, but still worse than MNL or OOD)
4. excluding blocking from the design resulted in a better mxl design (in terms of lower d-errors, but still worse than MNL or OOD)
5. ... some other comparisons are possible
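
For reference, the two summary measures compared in the list above can be computed from an estimated AVC matrix as in the sketch below. It assumes the usual definition of the D-error as the determinant of the AVC matrix raised to the power 1/K, with K the number of parameters; the function names are illustrative. Values obtained this way are only comparable across designs when the same model with the same K parameters is estimated on each dataset.

```python
import numpy as np

def d_error(avc):
    """D-error: det(AVC)^(1/K), computed via the log-determinant for stability."""
    k = avc.shape[0]
    sign, logdet = np.linalg.slogdet(avc)
    if sign <= 0:
        raise ValueError("AVC matrix is not positive definite")
    return float(np.exp(logdet / k))

def avc_trace(avc):
    """Trace of the AVC matrix: the sum of the parameter variances."""
    return float(np.trace(avc))
```

For a diagonal AVC matrix with variances 0.04 and 0.01, `d_error` returns 0.02 and `avc_trace` returns 0.05.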

When looking at the recovered parameters, their standard errors and z-stats (in the excel file) - they are not all that different.
But you can see what I indicated in the beginning when you look at the minimum of the z-stats for the standard deviation of the ASC parameter - it is often 0 for MXL (or OOD) designs, while it does not happen for MNL designs. This difference was much more evident for designs/data which used fewer choice tasks per respondent and fewer respondents - I can run a simulation like this too, if this helps.

Questions
- How is it possible that the d-error of the MXL design is worse than that of the MNL design?
- How come s.d. of ASC is better recovered by MNL than MXL optimized designs?

I appreciate any comments you could have on these weird results.

Post by johnr » Thu Sep 17, 2015 5:42 pm

Hi Miq

I have been trying to understand your question for a few days before answering, however I'm still a little confused. For example, in your post, you have D-error and D-error true. What is the difference?

I ran your syntax, but with 100 draws and only 120 respondents (I've run out of computer processors at the moment). I took the first design generated, saved it and simulated a data set. I then estimated a model in Nlogit on the simulated data set. Below are the syntax and the output I got from Nlogit.

|-> nlogit
;lhs=choice,cset,Altij
;choices=A,B,C
;rpl
;fcn=sq(n),ba1(n),ba2(n),ba3(n),bc(n)
;pts= 100
;halton
;pds = 12
;model:
U(A) = SQ /
U(B) = bA1*BA1 + bA2*BA2 + bA3*BA3 + bC*BC /
U(C) = bA1*BA1 + bA2*BA2 + bA3*BA3 + bC*BC $
Normal exit: 6 iterations. Status=0, F= 5072.051

-----------------------------------------------------------------------------
Start values obtained using MNL model
Dependent variable Choice
Log likelihood function -5072.05139
Estimation based on N = 5760, K = 5
Inf.Cr.AIC = 10154.1 AIC/N = 1.763
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Response data are given as ind. choices
Number of obs.= 5760, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
SQ| -.58258*** .07179 -8.11 .0000 -.72329 -.44187
BA1| .95838*** .04796 19.98 .0000 .86437 1.05238
BA2| .86707*** .04759 18.22 .0000 .77379 .96034
BA3| .79718*** .04792 16.63 .0000 .70325 .89111
BC| .77871*** .02290 34.01 .0000 .73383 .82359
--------+--------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Sep 17, 2015 at 05:20:43 PM
-----------------------------------------------------------------------------

Normal exit: 19 iterations. Status=0, F= 4721.056

-----------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable CHOICE
Log likelihood function -4721.05637
Restricted log likelihood -6328.00678
Chi squared [ 10](P= .000) 3213.90082
Significance level .00000
McFadden Pseudo R-squared .2539426
Estimation based on N = 5760, K = 10
Inf.Cr.AIC = 9462.1 AIC/N = 1.643
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
No coefficients -6328.0068 .2539 .2533
Constants only can be computed directly
Use NLOGIT ;...;RHS=ONE$
At start values -5072.0514 .0692 .0684
Response data are given as ind. choices
Replications for simulated probs. = 100
Used Halton sequences in simulations.
RPL model with panel has 480 groups
Fixed number of obsrvs./group= 12
Number of obs.= 5760, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Random parameters in utility functions
SQ| -.96413*** .08513 -11.33 .0000 -1.13099 -.79728
BA1| 1.05441*** .06122 17.22 .0000 .93442 1.17440
BA2| .99544*** .06022 16.53 .0000 .87742 1.11347
BA3| .95677*** .05805 16.48 .0000 .84299 1.07055
BC| .99229*** .03696 26.85 .0000 .91985 1.06472
|Distns. of RPs. Std.Devs or limits of triangular
NsSQ| .39636*** .14986 2.64 .0082 .10265 .69008
NsBA1| .61460*** .09095 6.76 .0000 .43635 .79286
NsBA2| .60807*** .07482 8.13 .0000 .46144 .75471
NsBA3| .42629*** .10589 4.03 .0001 .21875 .63383
NsBC| .48455*** .02933 16.52 .0000 .42706 .54203
--------+--------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Sep 17, 2015 at 05:22:48 PM
-----------------------------------------------------------------------------

Looking at the above output, it seemed to retrieve the parameters (including ASCs) quite well.

To test if this was a fluke, I re-ran the design, took the first new design it found, and repeated the process (using different simulated draws in simulating my sample). Here are the results of my second attempt...

|-> nlogit
;lhs=choice,cset,Altij
;choices=A,B,C
;rpl
;fcn=sq(n),ba1(n),ba2(n),ba3(n),bc(n)
;pts= 100
;halton
;pds = 12
;model:
U(A) = SQ /
U(B) = bA1*BA1 + bA2*BA2 + bA3*BA3 + bC*BC /
U(C) = bA1*BA1 + bA2*BA2 + bA3*BA3 + bC*BC $
Normal exit: 6 iterations. Status=0, F= 5195.850

-----------------------------------------------------------------------------
Start values obtained using MNL model
Dependent variable Choice
Log likelihood function -5195.84980
Estimation based on N = 5760, K = 5
Inf.Cr.AIC = 10401.7 AIC/N = 1.806
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Response data are given as ind. choices
Number of obs.= 5760, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
SQ| -.62503*** .06730 -9.29 .0000 -.75693 -.49313
BA1| .83964*** .04784 17.55 .0000 .74587 .93341
BA2| .84971*** .04619 18.40 .0000 .75918 .94025
BA3| .88363*** .04798 18.42 .0000 .78960 .97766
BC| .79250*** .02338 33.90 .0000 .74668 .83833
--------+--------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Sep 17, 2015 at 05:31:43 PM
-----------------------------------------------------------------------------

Normal exit: 17 iterations. Status=0, F= 4784.668

-----------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable CHOICE
Log likelihood function -4784.66838
Restricted log likelihood -6328.00678
Chi squared [ 10](P= .000) 3086.67680
Significance level .00000
McFadden Pseudo R-squared .2438901
Estimation based on N = 5760, K = 10
Inf.Cr.AIC = 9589.3 AIC/N = 1.665
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
No coefficients -6328.0068 .2439 .2432
Constants only can be computed directly
Use NLOGIT ;...;RHS=ONE$
At start values -5195.8498 .0791 .0783
Response data are given as ind. choices
Replications for simulated probs. = 100
Used Halton sequences in simulations.
RPL model with panel has 480 groups
Fixed number of obsrvs./group= 12
Number of obs.= 5760, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Random parameters in utility functions
SQ| -1.02870*** .07741 -13.29 .0000 -1.18042 -.87699
BA1| .98507*** .05685 17.33 .0000 .87366 1.09649
BA2| 1.03489*** .05786 17.89 .0000 .92150 1.14829
BA3| .99579*** .05886 16.92 .0000 .88043 1.11115
BC| 1.01860*** .03881 26.25 .0000 .94253 1.09466
|Distns. of RPs. Std.Devs or limits of triangular
NsSQ| .09628 .21842 .44 .6594 -.33182 .52438
NsBA1| .43795*** .09999 4.38 .0000 .24198 .63391
NsBA2| .49086*** .08824 5.56 .0000 .31792 .66380
NsBA3| .50705*** .09070 5.59 .0000 .32928 .68481
NsBC| .55134*** .02883 19.12 .0000 .49483 .60785
--------+--------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Sep 17, 2015 at 05:33:40 PM
-----------------------------------------------------------------------------

Now the means are spot on, however the standard deviation parameter of the ASC is not significant.

Third time lucky...

|-> nlogit
;lhs=choice,cset,Altij
;choices=A,B,C
;rpl
;fcn=sq(n),ba1(n),ba2(n),ba3(n),bc(n)
;pts= 100
;halton
;pds = 12
;model:
U(A) = SQ /
U(B) = bA1*BA1 + bA2*BA2 + bA3*BA3 + bC*BC /
U(C) = bA1*BA1 + bA2*BA2 + bA3*BA3 + bC*BC $
Normal exit: 6 iterations. Status=0, F= 5010.361

-----------------------------------------------------------------------------
Start values obtained using MNL model
Dependent variable Choice
Log likelihood function -5010.36054
Estimation based on N = 5760, K = 5
Inf.Cr.AIC = 10030.7 AIC/N = 1.741
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
Response data are given as ind. choices
Number of obs.= 5760, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
SQ| -.57007*** .07862 -7.25 .0000 -.72416 -.41597
BA1| .86291*** .04947 17.44 .0000 .76595 .95987
BA2| .83462*** .04782 17.45 .0000 .74090 .92835
BA3| .82477*** .04740 17.40 .0000 .73187 .91768
BC| .74785*** .02232 33.50 .0000 .70409 .79160
--------+--------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Sep 17, 2015 at 05:37:24 PM
-----------------------------------------------------------------------------

Normal exit: 24 iterations. Status=0, F= 4629.674

-----------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable CHOICE
Log likelihood function -4629.67384
Restricted log likelihood -6328.00678
Chi squared [ 10](P= .000) 3396.66588
Significance level .00000
McFadden Pseudo R-squared .2683836
Estimation based on N = 5760, K = 10
Inf.Cr.AIC = 9279.3 AIC/N = 1.611
R2=1-LogL/LogL* Log-L fncn R-sqrd R2Adj
No coefficients -6328.0068 .2684 .2677
Constants only can be computed directly
Use NLOGIT ;...;RHS=ONE$
At start values -5010.3605 .0760 .0752
Response data are given as ind. choices
Replications for simulated probs. = 100
Used Halton sequences in simulations.
RPL model with panel has 480 groups
Fixed number of obsrvs./group= 12
Number of obs.= 5760, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Random parameters in utility functions
SQ| -1.02314*** .09300 -11.00 .0000 -1.20542 -.84087
BA1| .98081*** .05742 17.08 .0000 .86827 1.09335
BA2| .91640*** .05747 15.95 .0000 .80375 1.02904
BA3| .97447*** .06143 15.86 .0000 .85407 1.09486
BC| .96701*** .03799 25.45 .0000 .89255 1.04148
|Distns. of RPs. Std.Devs or limits of triangular
NsSQ| .39781*** .13898 2.86 .0042 .12542 .67021
NsBA1| .36040*** .12507 2.88 .0040 .11527 .60552
NsBA2| .48225*** .09694 4.97 .0000 .29224 .67225
NsBA3| .59396*** .07939 7.48 .0000 .43836 .74955
NsBC| .53241*** .02912 18.28 .0000 .47533 .58948
--------+--------------------------------------------------------------------
***, **, * ==> Significance at 1%, 5%, 10% level.
Model was estimated on Sep 17, 2015 at 05:40:02 PM
-----------------------------------------------------------------------------

I seem to have no problems retrieving the parameter estimates bearing in mind my three examples are just random (non-optimised) designs (the second one had a very high d-error compared to the other two). Without knowing more, I cannot offer any insights into why you cannot reproduce the inputs unfortunately.

John

Post by miq » Thu Sep 17, 2015 11:55 pm

Hi John,

Thanks for your reply.
The second attempt you report illustrates what is happening (every once in a while). I did this 100 times for different designs and observed that:
- this is less likely to happen in the case of the designs optimized for MNL (even though MXL is estimated using these designs)
- in fact, the mean or median determinant of the AVC matrix of the MXL model estimates is lower for MNL designs than for MXL designs.
This should not happen, as far as I understand.
I wondered if this has something to do with the ASC being treated differently than all other parameters, but excluding ;con or ;block does not make the weird result go away.

D-error true is the d-error reported by NGENE if the design generated with a lower number of draws is evaluated using 10,000 draws and 1,200 respondents.

Best,

Mik

Post by johnr » Fri Sep 18, 2015 9:37 am

Hi Mik

Okay, however I'm still a little confused about whether or not you are attempting to compare the D-errors across model types. The D-error can only be interpreted all else being equal (same model type, data structure, number of alts, atts, levels, priors, etc.). So you cannot compare the D-error of a design assuming an MNL model with that of a design assuming an MMNL model. What you can do is take the design optimised for an MNL model and test the D-error for that design assuming it is now estimated with an MMNL model.

Without opening up the AVC matrix itself, my best guess as to what is occurring is that, given you used so many draws for the MMNL model design, it probably did not find a very good design - you might have to run it for a few months. You might have only tested 25 designs over 2 days (I have no idea how many in reality). My experience is that as the number of choice tasks increases in a design, an MNL optimised design tends to mimic an MMNL optimised design. That is, a good MNL design is often a good design for an MMNL model if S is large. In an hour, you probably tested tens of thousands of MNL designs, and hence landed on an MNL design that was good for the MMNL simulation problem you set up. A fair comparison would be to test the same number of MMNL designs in your optimisation process as you did MNL designs. In practice, I often optimise for MNL and test how the design would perform assuming an MMNL model (Ngene allows you to do this if you set the code up properly). I then use the best MNL design as the start design for the MMNL model.
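
To illustrate why so many more MNL designs can be tested in the same time, here is a minimal Python sketch of an MNL design evaluation. It is not Ngene's implementation; it assumes the standard MNL Fisher information for a single respondent, summed over choice tasks, and takes the design as a (tasks, alternatives, parameters) array. No simulation over draws or respondents is needed, so one evaluation costs only a few small matrix products.

```python
import numpy as np

def mnl_d_error(design, beta):
    """D-error of a design under an MNL model with prior parameters beta.

    design : (T, A, K) array of attribute levels
    beta   : length K vector of priors
    """
    k = design.shape[2]
    info = np.zeros((k, k))
    for x in design:                        # x is the (A, K) matrix of one choice task
        v = x @ beta
        p = np.exp(v - v.max())
        p /= p.sum()                        # MNL choice probabilities
        info += x.T @ (np.diag(p) - np.outer(p, p)) @ x
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:
        raise ValueError("information matrix is singular; parameters not identified")
    # AVC = info^-1 for one respondent, so det(AVC)^(1/K) = exp(-logdet(info) / K)
    return float(np.exp(-logdet / k))
```

The panel MMNL counterpart has to average over parameter draws and simulated respondents for every candidate design, which is where the cost of rdraws and rep comes in.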

John

Post by Michiel Bliemer » Fri Sep 18, 2015 6:16 pm

Just reading this conversation, I fully agree with John that it makes no sense comparing an MNL D-error with a MXL D-error, there is no reason to assume that one is larger than the other.

Further, I think John is also correctly pointing to your design syntax with 10,000 Sobol draws and 1200 repetitions for generating a design with 48 rows. This is asking for trouble in my opinion. It means that each design evaluation requires 10000 x 1200 = 12 million draws. So it is not possible to optimise using these dimensions. Further, 48 rows in a panel model could be problematic, because you are multiplying 48 probabilities in a row in the calculations. This yields extremely small numbers, such that numerical problems could occur in the calculations in both Ngene and in model estimation.
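
To give a feel for the magnitudes involved: a product of 48 probabilities of about 0.3 each is of the order of 1e-25. That is still representable in double precision, but averaging and differentiating numbers of this size can lose accuracy, and longer sequences eventually underflow to zero. A common safeguard in estimation code is to work with sums of logs and a log-sum-exp average over draws. The sketch below shows that idea only; it makes no claim about how Ngene or any particular estimation package handles this internally, and the function name is illustrative.

```python
import math

def log_mean_of_products(chosen_probs_per_draw):
    """Log of the average over draws of the product of chosen-alternative probabilities.

    chosen_probs_per_draw : list over draws, each a list over tasks of P(chosen)
    """
    # Sum of logs instead of a product of probabilities
    log_prods = [sum(math.log(p) for p in draw) for draw in chosen_probs_per_draw]
    # Log-sum-exp: factor out the largest term before exponentiating
    m = max(log_prods)
    mean_scaled = sum(math.exp(lp - m) for lp in log_prods) / len(log_prods)
    return m + math.log(mean_scaled)
```

With a single draw and 3000 tasks at probability 0.3, the direct product `0.3 ** 3000` evaluates to 0.0 in double precision, while `log_mean_of_products([[0.3] * 3000])` still returns a finite log-probability.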

If you can run the computer for long enough (weeks), you could try this syntax:

design
; alts = alt1, alt2, alt3
; rows = 12
; rdraws=gauss(3)
; rep = 500
; eff = (rppanel,d)
? ; eff = (mnl,d)
? ; orth = ood
; con

; model:
U(alt1) = bsq[n,-1,0.5] /
U(alt2) = bA1[n,1,0.5]*A1[1,0]
+ bA2[n,1,0.5]*A2[1,0]
+ bA3[n,1,0.5]*A3[1,0]
+ bC[n,1,0.5]*C[-1,-2,-3,-4] /
U(alt3) = bA1*A1
+ bA2*A2
+ bA3*A3
+ bC*C
$

As we point out in Bliemer and Rose (2011), it is almost impossible to optimise for panel mixed logit models, and we advise to optimise for MNL models while evaluating the design for panel mixed logit.

Post by miq » Tue Sep 22, 2015 12:39 am

Dear Michiel,

Thanks for the reference to the paper. Did you mean this one:
Bliemer, M. C. J., and Rose, J. M., 2011. Experimental design influences on stated choice outputs: An empirical study in air travel choice. Transportation Research Part A: Policy and Practice, 45(1):63-79.
It actually only deals with MNL models, but here:
Bliemer, M. C. J., and Rose, J. M., 2010. Construction of experimental designs for mixed logit models allowing for correlation across choice observations. Transportation Research Part B: Methodological, 44(6):720-734.
I found some additional information (although no dummy-coded attributes and no ASCs).

Please note that I do not compare d-errors between MNL and MXL, I compare d-errors of MXL models estimated using data based on MNL or MXL (or RP-panel if you will) designs. And the MNL designs seem to outperform MXL designs in terms of d-error (calculated ex post, from the AVCs of the estimated MXL models) and MXL-efficient designs often result in insignificant standard deviations for the ASCs.

Also, please note that in addition to 10000 draws x 1200 repetitions (which was able to evaluate a few hundred designs in the time I gave it) I also used 1000x1000 and 100x100 for the comparison - the general conclusions were unchanged.
Yes I used 48 rows but I also used blocking - I suppose it is taken into account in design evaluation, but actually even without blocking I encountered no numerical problems.

I can probably repeat the simulation for a simpler design if you think this will bring more light to the issue.

Post by Michiel Bliemer » Tue Sep 22, 2015 8:20 am

You are right, it is Bliemer and Rose (2010).
In that article we advise to use an MNL optimised design for estimating RP panel models, since they turn out quite efficient. This seems to confirm your finding where the MNL optimised design is actually better than a non-optimised RP panel design. You need to evaluate millions of designs in order to find a good RP panel design and you simply have not run the optimisation long enough. Setting it to 100 x 100 will not be accurate enough to get reliable calculations; the values I provided I would say are lower bounds.

I do not think you can say that there are no numerical problems in the RP panel design, since you cannot see them. The numerical problems simply lead to a wrong AVC matrix, but you cannot easily check this. Sometimes you can check the probabilities and notice they do not add up to 1. Further note that Ngene does NOT take blocking into account in the optimisation; blocking in Ngene is done after the design has been generated (in line with all other design types). This of course is inconsistent, but simultaneously optimising for a blocking column would add a further complication to the optimisation and is very difficult. We hope to further improve this in the future. We mentioned this issue with blocking in Rose and Bliemer (2013?) in Transportation on sample size requirements. So this means that your simulation in which you take blocking into account will yield different results. Hence my suggestion to use a single block to avoid this inconsistency.