pilot study for an MMNL model
Posted: Sun Feb 17, 2013 5:03 pm
Dear Choice-Metrics team,
First of all I would like to thank you for developing Ngene and for releasing such a great user manual! It not only explains the software package very well, but also provides more insights on experimental design theory for stated choice experiments than most of the academic literature.
Before continuing with my inquiries, I would like to warn you that I’m very new to discrete choice analysis and therefore my questions may seem naïve.
For my study I’m planning to estimate an MMNL model with two unlabeled alternatives and a status quo alternative whose attributes are fixed at their base levels to serve as a reference. To do so I would like to generate an efficient design, but I do not have priors for it.
i) I’m therefore still deciding whether I should (a) proceed with an efficient design for an MNL model (and subsequently an MMNL panel design), or (b) first produce an orthogonal design for a pilot study, from which I could obtain the priors for my main study (please see the syntax for both options below).
a) Efficient design:
Design
;alts = proj1*, proj2*, stquo
;rows = 12
;eff = (mnl,wtp(ref1))
;wtp = ref1(*/b6)
;model:
U(proj1) = b11[0] +
b2[0]*yield[-160,-80,0,80] +
b3.effects[0]*pest[1,0] +
b4.effects[0]*hab_adja[1,0] +
b5.effects[0]*hab_farm[1,0] +
b6.effects[0|0]*bee[1,2,0] +
b7[-0.01]*costs[0,100,200,300] +
b8[0]*yield*yield
/
U(proj2) = b12[0] +
b2*yield +
b3*pest +
b4*hab_adja +
b5*hab_farm +
b6*bee +
b7*costs +
b8*yield*yield
/
U(stquo) = b2[0]*yieldSQ[-160] +
b3.effects[0]*pestSQ[1,0](0,12) +
b4.effects[0]*hab_adjaSQ[1,0](0,12) +
b5.effects[0]*hab_farmSQ[1,0](0,12) +
b6.effects[0|0]*beeSQ[1,2,0](0,0,12) +
b7[-0.01]*costsSQ[0] +
b8[0]*yieldSQ*yieldSQ
$
ii) The above syntax stops running with the following error:
"ERROR: A random design could not be generated after 2000000 attempts. There were 0 row repetitions, 7057 alternative repetitions, and 1992943 cases of dominance"
Should I try experimenting with the design dimensions to overcome dominance? In that case I must say that I don’t feel confident enough to do so, since there are no previous studies similar to mine from which I could use parameter estimates as a reference.
iii) My model implies a full factorial design of 4^2*3^1*2^3=384 possible attribute combinations. I decided that my respondents should face as few choice tasks as possible (i.e. 12 runs, which should suffice for the estimation of 10 parameters according to the S >= K/(J-1) constraint). Nevertheless, I’m not sure whether 12 runs are a sufficiently representative subsample of the full factorial design to provide accurate estimates. What do you think?
iv) I already tried running the above syntax for the MMNL model by setting the parameter priors to small values (assuming their distribution to be normal) and changing the commands to:
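As a sanity check on the arithmetic in iii), here is a minimal Python sketch (not Ngene syntax). The level counts and parameter names are read off the efficient-design syntax above; the variable names are illustrative only, and the total of 10 parameters assumes that the effects-coded three-level attribute bee contributes two parameters and that both ASCs (b11, b12) are counted.

```python
from math import ceil, prod

# Number of levels per attribute, read off the efficient-design syntax above.
levels = {"yield": 4, "costs": 4, "bee": 3, "pest": 2, "hab_adja": 2, "hab_farm": 2}
full_factorial = prod(levels.values())  # 4^2 * 3^1 * 2^3

# Parameters per coefficient (assumption: b6 is effects coded with 3 levels,
# so it contributes 2 parameters; b11 and b12 are the two ASCs).
params = {"b11": 1, "b12": 1, "b2": 1, "b3": 1, "b4": 1,
          "b5": 1, "b6": 2, "b7": 1, "b8": 1}
K = sum(params.values())  # total number of parameters to estimate
J = 3                     # alternatives per choice task (proj1, proj2, stquo)

# Lower bound on the number of choice tasks from S >= K / (J - 1).
min_choice_tasks = ceil(K / (J - 1))

print(full_factorial, K, min_choice_tasks)
```

Under these assumptions the bound is well below the 12 runs I chose. It only says the model is identifiable in principle; it says nothing about how precise the estimates will be.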
;eff = (rppanel,wtp(ref1))
;wtp = ref1(*/b6)
But the following error appears:
"Error: The ';eff' property optimises on the WTP of a non-MNL model. Only MNL WTP is currently supported."
Does this mean that, to generate an efficient design for my MMNL model, I should let Ngene optimize it on the D-efficiency measure of an MNL model rather than using the wtp command? Or should I rather estimate my MMNL model from an efficient design for an MNL model that is optimized on the variance of the WTP ratio, such as the one presented above?
b) Orthogonal design:
Design
;alts = proj1*, proj2*
;rows = 36
;block = 4
;orth = ood
;model:
U(proj1) = b1 +
b2*yield[-160,-80,0,80] +
b3*pest[1,0] +
b4*hab_adja[1,0] +
b5*hab_farm[1,0] +
b6*bee[1,2,0] +
b7*costs[0,300,600,900] +
b8*yield*yield
/
U(proj2) = b2*yield +
b3*pest +
b4*hab_adja +
b5*hab_farm +
b6*bee +
b7*costs +
b8*yield*yield
$
A design is easily generated with this syntax. I thought that I could present each block to 4 respondents, yielding a total of 16 respondents and 144 choice observations for my pilot study.
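The pilot sample arithmetic above can be checked with a short Python sketch. The figures come from the orthogonal-design syntax (36 rows, 4 blocks) and from my plan of 4 respondents per block; the variable names are illustrative only.

```python
rows = 36                  # ;rows = 36
blocks = 4                 # ;block = 4
respondents_per_block = 4  # planned pilot allocation

# Each respondent sees one block of the design.
tasks_per_respondent = rows // blocks
respondents = blocks * respondents_per_block
observations = respondents * tasks_per_respondent

print(tasks_per_respondent, respondents, observations)
```

So each respondent would answer 9 choice tasks.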
v) If I proceed with the orthogonal design approach, do you think a sample of 16 respondents would provide statistically significant estimates that can be used as priors for the efficient design underlying my MMNL model?
vi) How much better are optimal orthogonal designs than the regular sequential or simultaneous orthogonal designs? The manual states that the former lose efficiency as the parameter priors are set to zero, but when I run the above syntax with either the sequential or the simultaneous orthogonal design commands, Ngene produces the following warning:
Defaulting to prior values of zero for the following priors: 'b1, b2, b3, b4, b5, b6, b7, b8'
If one can obtain a simultaneous orthogonal design that isn’t larger in the number of choice sets than a sequential orthogonal design (which is my case), should one prefer it, or is the argument of maximizing attribute level differences enough for sticking to the optimal orthogonal design?
vii) Are the ASC specifications in both syntaxes correct?
viii) I’m considering including various socio-demographic covariates (e.g. education level and income) in my final MMNL model. Does it still make sense to include such covariates in the generation of the underlying design of a MMNL model (by e.g. defining a set of segments) if one of the main features of the MMNL model is that it handles heterogeneity in choice behavior?
I hope I’m not overwhelming you with too many questions and thank you in advance for your advice.
Best regards,
Manuel