choice-metrics.com

Posted: **Sun Feb 17, 2013 5:03 pm**

Dear Choice-Metrics team,

First of all I would like to thank you for developing Ngene and for releasing such a great user manual! It not only explains the software package very well, but also provides more insights on experimental design theory for stated choice experiments than most of the academic literature.

Before continuing with my inquiries, I would like to warn you that I’m very new to discrete choice analysis and therefore my questions may seem naïve.

For my study I’m planning to estimate a MMNL model with 2 unlabeled alternatives and a status quo alternative that has its attributes set at the base levels to serve as a reference. To do so I would like to produce an efficient design, for which I do not have priors.

i) I’m therefore still deciding on whether I should (a) proceed with an efficient design for a MNL model (and subsequently a MMNL panel design), or (b) first produce an orthogonal design for a pilot study, from which I could obtain the priors for my main study (please see the syntax for both options below).

a) Efficient design:

Design
;alts = proj1*, proj2*, stquo
;rows = 12
;eff = (mnl,wtp(ref1))
;wtp = ref1(*/b6)
;model:
U(proj1) = b11[0] +
b2[0]*yield[-160,-80,0,80] +
b3.effects[0]*pest[1,0] +
b4.effects[0]*hab_adja[1,0] +
b5.effects[0]*hab_farm[1,0] +
b6.effects[0|0]*bee[1,2,0] +
b7[-0.01]*costs[0,100,200,300] +
b8[0]*yield*yield
/
U(proj2) = b12[0] +
b2*yield +
b3*pest +
b4*hab_adja +
b5*hab_farm +
b6*bee +
b7*costs +
b8*yield*yield
/
U(stquo) = b2[0]*yieldSQ[-160] +
b3.effects[0]*pestSQ[1,0](0,12) +
b4.effects[0]*hab_adjaSQ[1,0](0,12) +
b5.effects[0]*hab_farmSQ[1,0](0,12) +
b6.effects[0|0]*beeSQ[1,2,0](0,0,12) +
b7[-0.01]*costsSQ[0]+
b8[0]*yieldSQ*yieldSQ
$

ii) The above syntax stops running with the following error report:

"ERROR: A random design could not be generated after 2000000 attempts. There were 0 row repetitions, 7057 alternative repetitions, and 1992943 cases of dominance"

Should I try experimenting with the design dimensions to overcome dominance? In that case I must say that I don’t feel confident enough to do so, since there are no previous studies similar to mine from which I could use parameter estimates as a reference.

iii) My model yields a full factorial design of 4^2*3^1*2^3=384 possible attribute combinations. I decided that my respondents should face the minimum possible number of choice tasks (i.e. 12 runs, which should suffice for the estimation of 10 parameters according to the S >= K/(J-1) constraint). Nevertheless, I’m not sure whether 12 runs offer a subsample that is representative enough of the full factorial design to provide accurate estimates. What do you think?

iv) I already tried running the above syntax for the MMNL model by defaulting the parameter priors to small amounts (assuming their distribution to be normal) and changing the commands to:

;eff = (rppanel,wtp(ref1))
;wtp = ref1(*/b6)

But the following error statement shows up:

"Error: The ';eff' property optimises on the WTP of a non-MNL model. Only MNL WTP is currently supported."

Does this mean that to generate an efficient design for my MMNL model, I should allow Ngene to optimize it based on the D-efficiency measure of a MNL model, rather than using the wtp command? Or should I rather estimate my MMNL model based on an efficient design for a MNL model that is optimized based on the variance of the WTP ratio, such as the one presented above?

b) Orthogonal design:

Design
;alts = proj1*, proj2*
;rows = 36
;block = 4
;orth = ood
;model:
U(proj1) = b1 +
b2*yield[-160,-80,0,80] +
b3*pest[1,0] +
b4*hab_adja[1,0] +
b5*hab_farm[1,0] +
b6*bee[1,2,0] +
b7*costs[0,300,600,900] +
b8*yield*yield
/
U(proj2) = b2*yield +
b3*pest +
b4*hab_adja +
b5*hab_farm +
b6*bee +
b7*costs +
b8*yield*yield
$

A design is easily generated with this syntax. I thought that I could present each block to 4 respondents, yielding a total of 16 respondents and 144 choice observations for my pilot study.

v) Do you think that if proceeding with the orthogonal design approach, a sample of 16 respondents would provide significant estimates that can be used as priors for the efficient design, which would underlie my MMNL model?

vi) How superior are optimal orthogonal designs over the regular sequential orthogonal designs or over the simultaneous orthogonal designs? In the manual it is stated that the former lose efficiency as the parameter priors are set to zero, but when running the above syntax with either the sequential or the simultaneous orthogonal design commands, Ngene produces the following warning statement:

Defaulting to prior values of zero for the following priors: 'b1, b2, b3, b4, b5, b6, b7, b8'

If one can obtain a simultaneous orthogonal design that isn’t larger in the number of choice sets than a sequential orthogonal design (which is my case), should one prefer it, or is the argument of maximizing attribute level differences enough for sticking to the optimal orthogonal design?

vii) Are the ASC specifications in both syntaxes correct?

viii) I’m considering including various socio-demographic covariates (e.g. education level and income) in my final MMNL model. Does it still make sense to include such covariates in the generation of the underlying design of a MMNL model (by e.g. defining a set of segments) if one of the main features of the MMNL model is that it handles heterogeneity in choice behavior?

I hope I’m not overwhelming you with too many questions and thank you in advance for your advice.

Best regards,
Manuel

Posted: **Sun Feb 17, 2013 7:32 pm**

Dear Manuel,

Thank you for your nice words.

Those are quite a few questions you pose; I will try to answer them one by one.

(i) If you already have some ideas about the priors (from a literature study, or because you logically know the sign of the parameter), it would be best to start with an efficient design. An orthogonal design assumes no information whatsoever about the value or sign of the parameters, and will therefore always be less efficient. Further, an orthogonal design does not rule out dominant choice alternatives, so you have to check your design carefully afterwards. Clearly, you will be very uncertain about the priors when you generate your initial efficient design, so I would recommend using a Bayesian efficient design, which is more robust against prior misspecification.

(ii) Ngene is not able to find a design that satisfies your requirements because of two problems: 1. You define the WTP as a division by b6, whose prior is set to zero, so you will always get infinite WTPs and Ngene does not like that. 2. You have set a negative cost prior and all other priors are zero. Since the status quo has a cost of zero, the status quo will therefore always be dominant, so Ngene cannot find any design without dominant alternatives. You can remove the asterisks (*) from the alternatives, so that Ngene does not check for dominance, or you should provide more realistic values for the other priors.

(iii) Representativeness of the data is not a prerequisite for being able to estimate the parameters reliably. An efficient design aims to optimise the information from each choice task, and with very low numbers of choice tasks it can already retrieve a lot of information. You do not need a lot of choice tasks to be able to estimate the parameters. The number of choice tasks will only play a role when you have a lot of parameters (for example when you have included many interaction terms). So there is no problem using 'only' 12 choice tasks.

(iv) I probably cannot give a definite answer on this, but generating efficient designs for the panel MMNL model is VERY time consuming, you may need weeks of computation time. Our practical approach is typically to optimise for the MNL model, as we have found that such a design is often also very efficient for estimating a panel MMNL model. So I suggest using the MNL model to optimise the design, but to CHECK how it would behave under a panel MMNL model (using the ;eval command). The WTP command is not supported for panel MMNL, but you can check the standard errors of the parameters themselves instead of the ratios of the parameters by using the D-efficiency criterion.

(v) It is hard to say whether 16 respondents would be sufficient, as this is very case specific (in some studies 5 are enough, some require 100, all depending on how strongly the attributes explain the behaviour). I would hope that 16 would be enough to get you good MNL priors, but I think that is not enough to get reliable MMNL priors. But as I suggested, I would opt for an MNL optimised design anyway, and test for panel MMNL.

(vi) Orthogonal designs always use priors equal to zero, as they do not require any priors. Optimal orthogonal designs, as described by Street and Burgess, provide a procedure for optimising the differences in the attributes, which is more efficient than merely randomly choosing an orthogonal design (as many do). Since optimal orthogonal designs maximise the differences, they ensure that all attribute levels across alternatives in each choice task are different. This maximises trade-offs (read: information obtained from the choice task), but could lead to problems of dominance or lexicographic behaviour. Your design includes a status quo and is therefore not an entirely unlabelled experiment, so the Street and Burgess approach may not provide an optimal design (given zero priors).

(vii) Since your first two alternatives are unlabelled, they should either have the same constant, or no constant and only a constant in the status quo alternative.

(viii) Including covariates in the estimation of an MMNL model is still important. Covariates provide EXPLAINED heterogeneity, while random parameters in the MMNL model provide UNEXPLAINED heterogeneity. You would like to explain as much as possible. In the design of experiments, people typically do not include covariates in the design process, except in a paper John Rose and I wrote some time ago in which we used segments. Clearly, the more information you put into the design process, the more efficient the design will be. But it does complicate the design and the survey, as you may have to create several surveys for different segments.
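To make point (ii) above concrete, here is a minimal Python sketch of the two problems. It is only an illustration based on the priors and cost levels in the posted syntax: it compares total utilities, which is a simplification and not necessarily how Ngene's own dominance check works, and the variable names are made up for this example.

```python
# Priors from the first syntax: only cost has a non-zero prior (-0.01).
cost_prior = -0.01
cost_levels = [0, 100, 200, 300]

# Status quo: cost is fixed at 0 and every other prior is 0.
sq_utility = cost_prior * 0

# Project alternatives: with all other priors at 0, utility depends on cost only.
project_utilities = [cost_prior * c for c in cost_levels]

# Problem 2: no project alternative is ever strictly better than the status quo.
better_than_sq = [u for u in project_utilities if u > sq_utility]
print("project utilities:", project_utilities)
print("alternatives beating the status quo:", better_than_sq)

# Problem 1: a WTP ratio with a zero prior in the denominator is undefined.
numerator_prior = -0.01      # e.g. the cost prior
denominator_prior = 0.0      # prior of b6 in the first syntax
try:
    wtp = numerator_prior / denominator_prior
except ZeroDivisionError:
    wtp = None
print("WTP with zero denominator:", wtp)
```

With these numbers the list of alternatives that beat the status quo is empty, which is in line with the 1992943 cases of dominance reported in the error message.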

Your questions were a bit overwhelming, but all very good questions! Not all are related to Ngene, some are more general to experimental design and different methods. I would like to invite you to come and take one of our courses in Experimental Design, as it is a bit difficult to explain everything on this forum :-)

Regards,
Michiel

Posted: **Mon Feb 18, 2013 1:26 pm**

Dear Michiel,

thank you very much for answering all my questions in such a quick fashion and I apologize for asking so many of them, especially because most of them could have indeed been answered by consulting the extended literature on experimental design theory. I doubt though that such answers would have been as clear as the answers you provided.

Again, thank you very much for this great service!

Best regards,

Manuel

Posted: **Tue Feb 19, 2013 6:04 pm**

Dear Michiel,

following your advice I'm now trying to generate an efficient design. As recommended in the manual, I wrote the syntax for a simple non-Bayesian design for a MNL model (before generating a more complex Bayesian design), in which I assign small values and the corresponding signs to the parameter priors, as follows:

Design
;alts = proj1*, proj2*, stquo
;rows = 12
;eff = (mnl,wtp(ref1))
;wtp = ref1(*/b6)
;model:
U(proj1) = b1[0.1]*yield[-160,-80,0,80] +
b2.effects[0.1]*pest[1,0] +
b3.effects[0.1]*hab_adja[1,0] +
b4.effects[0.1]*hab_farm[1,0] +
b5.effects[0.1|0.2]*bee[1,2,0] +
b6[-0.01]*costs[0,300,600,900] +
b7[0.1]*yield*yield
/
U(proj2) = b1*yield +
b2*pest +
b3*hab_adja +
b4*hab_farm +
b5*bee +
b6*costs +
b7*yield*yield
/
U(stquo) = c1[0.1] +
b1[0.1]*yieldSQ[-160] +
b2.effects[0.1]*pestSQ[1,0](0,12) +
b3.effects[0.1]*hab_adjaSQ[1,0](0,12) +
b4.effects[0.1]*hab_farmSQ[1,0](0,12) +
b5.effects[0.1|0.2]*beeSQ[1,2,0](0,0,12) +
b6[-0.01]*costsSQ[0]+
b7[0.1]*yieldSQ*yieldSQ
$

Shortly after running this syntax, Ngene produces the following statement:

"Warning: No valid design has been found after 1000 evaluations. There may be a problem with the specification of the design. A common problem is that the choice probabilities are too extreme (close to 1 and 0), perhaps because some or all of the prior values are too large. Also, it is generally a good idea to start with a simple design (MNL, non-Bayesian), then add complexity. If you press stop, a design will be reported, which may assist in diagnosing the problem."

I have tried experimenting with the design dimensions and no modification seems to work. The reason I defined values an order of magnitude smaller for the cost attribute is to avoid dominance, but this doesn't seem to do the trick either. Only if I set all priors to zero does it seem to run without complaints.

I thank you in advance for your advice!

With best regards from Thailand,

Manuel

Posted: **Tue Feb 19, 2013 8:51 pm**

Manuel,

Your priors for yield and cost are too large. Utility differences should typically be somewhere around 1 or 2 at maximum, but 0.01*900 = 9 and 0.1*160 = 16, so each choice task will have a clearly dominant alternative with the highest yield and cost levels. Making them 0.01 and 0.001 or even smaller should solve the problem. I am not sure how a WTP-efficient design will respond to a very small cost parameter, so why not optimise for D-efficiency instead?
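The arithmetic can be checked with a rough Python sketch. It looks only at the linear yield and cost terms (the quadratic yield term is ignored), and the helper names are hypothetical, chosen just for this illustration. It also shows what a utility difference of that size does to a binary logit choice probability.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for a list of utilities."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def max_contribution(prior, levels):
    """Largest absolute utility contribution prior * level over all levels."""
    return max(abs(prior * x) for x in levels)

yield_levels = [-160, -80, 0, 80]
cost_levels = [0, 300, 600, 900]

# Priors from the syntax that failed.
old_yield = max_contribution(0.1, yield_levels)    # 0.1 * 160
old_cost = max_contribution(-0.01, cost_levels)    # 0.01 * 900

# Priors reduced by a factor of ten, as suggested.
new_yield = max_contribution(0.01, yield_levels)
new_cost = max_contribution(-0.001, cost_levels)

# Probability of choosing the better of two alternatives that differ
# only by the maximum cost contribution.
p_old = mnl_probabilities([old_cost, 0.0])[0]
p_new = mnl_probabilities([new_cost, 0.0])[0]

print("old contributions:", old_yield, old_cost, "probability:", round(p_old, 4))
print("new contributions:", new_yield, new_cost, "probability:", round(p_new, 4))
```

With the original priors the better alternative is chosen almost with certainty, so the choice reveals very little; with the smaller priors the probability moves well away from 1.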

Michiel

Posted: **Thu Feb 21, 2013 7:08 pm**

Dear Michiel,

thank you very much! Indeed reducing the values of the yield and costs priors fixed the problem! I modified the levels of the cost attribute and added some constraints to obtain more realistic scenarios, as follows:

Design
;alts = proj1*, proj2*, stquo
;rows = 12
;eff = (mnl,d)
;cond:

? If nothing is implemented, the alternative cannot cost money and the yield change must be less than or equal to zero
If (proj1.pest = 0 and
proj1.hab_adja = 0 and
proj1.hab_farm = 0 and
proj1.bee = 0
, proj1.costs = 0 and proj1.yield <= 0),
If (proj2.pest = 0 and
proj2.hab_adja = 0 and
proj2.hab_farm = 0 and
proj2.bee = 0
, proj2.costs = 0 and proj2.yield <= 0),

? If any implementation level, cost greater than zero
If (proj1.pest > 0 or
proj1.hab_adja > 0 or
proj1.hab_farm > 0 or
proj1.bee > 0
, proj1.costs > 0),
If (proj2.pest > 0 or
proj2.hab_adja > 0 or
proj2.hab_farm > 0 or
proj2.bee > 0
, proj2.costs > 0),

? Project alternatives different from status quo
If (proj1.costs = 0 and
proj1.pest = 0 and
proj1.hab_adja = 0 and
proj1.hab_farm = 0 and
proj1.bee = 0
, proj1.yield <> -160),
If (proj2.costs = 0 and
proj2.pest = 0 and
proj2.hab_adja = 0 and
proj2.hab_farm = 0 and
proj2.bee = 0
, proj2.yield <> -160)
;model:
U(proj1) = b1[0.001]*yield[-160,-80,0,80] +
b2.effects[0.1]*pest[1,0] +
b3.effects[0.1]*hab_adja[1,0] +
b4.effects[0.1]*hab_farm[1,0] +
b5.effects[0.1]*bee[1,0] +
b6[-0.0008]*costs[0,100,200,300] +
b7[0.001]*yield*yield
/
U(proj2) = b1*yield +
b2*pest +
b3*hab_adja +
b4*hab_farm +
b5*bee +
b6*costs +
b7*yield*yield
/
U(stquo) = c1[0.1] +
b1[0.001]*yieldSQ[-160] +
b2.effects[0.1]*pestSQ[1,0](0,12) +
b3.effects[0.1]*hab_adjaSQ[1,0](0,12) +
b4.effects[0.1]*hab_farmSQ[1,0](0,12) +
b5.effects[0.1]*beeSQ[1,0](0,12) +
b6[-0.0008]*costsSQ[0]+
b7[0.001]*yieldSQ*yieldSQ
$

The syntax seems to run smoothly, but I'm a bit worried about the efficiency measures. My design doesn't seem to get a D-error lower than 0.4 (which for my case I do not know if it is small enough), and the S-estimate appears to be very large (9205542139.30297), which I understand has dramatic implications for my sample size! Is this related to the small magnitude of my parameter priors?

I would appreciate some advice in this regard.

With best regards,

Manuel

Posted: **Sat Feb 23, 2013 10:55 am**

Hi Manuel,

The value of a D-error really makes sense only if interpreted as a relative value, not an absolute one, and then only if everything else is equal. A D-error of 0.4 by itself is actually meaningless. It suggests better efficiency than a design with a D-error of 0.41 but worse than one with a D-error of 0.39. You cannot, however, compare it across designs where any design dimension differs, or even if you change the priors. Taking your example, if you were to drop yield^2 from the utility function, you might generate a design with a D-error of 0.2. You cannot compare this to your design with a D-error of 0.4 and say it is any better or worse. Hence I wouldn't get too caught up on the value itself. In practice I tend to find that D-errors greater than 1 suggest something may not be quite right, but that is simply my own rule of thumb and has no basis in science.
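As a small illustration of why the value depends on the priors, the following Python sketch computes the D-error of one fixed toy design under two sets of priors. It assumes a two-parameter MNL model and a single respondent, and takes the D-error as the determinant of the asymptotic variance-covariance matrix raised to the power 1/K. The toy design and the function names are invented for this example and have nothing to do with the design in this thread.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit probabilities for one choice task."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def d_error(design, priors):
    """D-error of a design for a two-parameter MNL model (one respondent).

    design: list of choice tasks; each task is a list of alternatives;
            each alternative is a pair (x1, x2) of attribute levels.
    """
    b1, b2 = priors
    i11 = i12 = i22 = 0.0  # entries of the 2x2 Fisher information matrix
    for task in design:
        probs = mnl_probs([b1 * x1 + b2 * x2 for x1, x2 in task])
        # Probability-weighted mean attribute levels in this task.
        m1 = sum(p * x1 for p, (x1, _) in zip(probs, task))
        m2 = sum(p * x2 for p, (_, x2) in zip(probs, task))
        for p, (x1, x2) in zip(probs, task):
            i11 += p * (x1 - m1) ** 2
            i12 += p * (x1 - m1) * (x2 - m2)
            i22 += p * (x2 - m2) ** 2
    det_info = i11 * i22 - i12 ** 2
    # det(AVC) = 1 / det(information); D-error = det(AVC) ** (1/K) with K = 2.
    return det_info ** (-1.0 / 2)

# Hypothetical toy design: four tasks, two alternatives, attributes (quality, cost).
toy_design = [
    [(1, 3), (0, 1)],
    [(0, 2), (1, 1)],
    [(1, 2), (0, 0)],
    [(0, 3), (1, 0)],
]

d_zero = d_error(toy_design, (0.0, 0.0))
d_other = d_error(toy_design, (0.5, -0.5))
print("D-error with zero priors:", d_zero)
print("D-error with priors (0.5, -0.5):", d_other)
```

The design is identical in both calls, yet the two D-errors differ, so the number says nothing on its own about whether one design is better than another unless the model and priors are held fixed.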

Regarding the S-estimate, the reason you are getting such a large number is power. Sample size requirements are based on the t-test which is a test of whether a parameter is different from zero. The closer a parameter is to zero, the larger the sample size required to detect whether it is truly different from zero. You can see this in the actual S-estimate equation. The prior/parameter is in the denominator of the calculation. As the prior/parameter gets closer to zero, the S-estimate goes to infinity.
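A minimal Python sketch of this effect, assuming the commonly given form S = (t * se / prior)^2, where se is the standard error of the parameter for a single respondent and t = 1.96. The standard error below is a made-up constant used only for illustration; in a real design it also changes with the design and the priors.

```python
import math

def s_estimate(prior, se_single_respondent, t_value=1.96):
    """Sample size needed for a parameter to be significantly different from zero."""
    if prior == 0:
        return math.inf  # a zero prior can never be shown to differ from zero
    return (t_value * se_single_respondent / prior) ** 2

HYPOTHETICAL_SE = 0.05  # assumption for illustration, not taken from any design

for prior in (0.1, 0.01, 0.001, 0.0):
    print(prior, s_estimate(prior, HYPOTHETICAL_SE))
```

Each tenfold reduction of the prior multiplies the required sample size by one hundred, which is why priors such as 0.001 or 0.0008 produce very large S-estimates.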

John

Posted: **Mon Feb 25, 2013 2:16 pm**

Dear John,

thank you very much for your reply and valuable feedback!

Since I'm trying to generate a design for a pilot study, from which I intend to obtain parameter priors, what do you suggest I do?

i) Go ahead with the pilot using the non-Bayesian efficient design that I produced with the small prior values, ignoring the high S-estimate? or
ii) Try experimenting with the priors to obtain a lower S-estimate and only then use the design for my pilot? or
iii) Obtain priors by following the approach that you and Michiel present in your 2011 paper "Experimental design influences on stated choice output: an empirical study on air travel choice", in which you conducted a pilot study based on an orthogonal design? I'm aware that Michiel advised me against using an orthogonal design (see the first reply to this forum entry), but wouldn't the parameters obtained from estimating such a pilot study provide sufficient information to feed into a Bayesian MNL design?

I thank you in advance for your advice!

Best regards, Manuel

Posted: **Mon Feb 25, 2013 4:14 pm**

Hi Manuel

There is actually no right or wrong answer to this. The approach in the paper you mention is not necessarily how I would do this now for example. In the paper, we assumed priors equal to zero if in the pilot the parameter was not significant. This probably is not the best strategy in hindsight, but like you, we found little in the literature on how to obtain priors. But what happens if for example, price is not significant in the pilot? Rather than assume the prior is zero which is what we would have done in the paper, I would now probably still assume that it is negative.

Also, I wouldn't get too tied up in worrying about the S-estimate at this time. It is only helpful to my mind if you are somewhat close to the true parameter estimates, which I am guessing at this time you are not.

So I would suggest that you take the best design you have found to date with the priors you assumed and give the design to a few colleagues. Michiel does this all the time and typically gets not only good feedback on the survey question, but some half decent priors. He then uses these to generate a design which he uses for a pilot which then gives him priors for his main study.

John

Posted: **Tue Feb 26, 2013 7:08 pm**

Dear John,

thank you very much for your advice! Letting my colleagues complete the choice tasks is a great idea!

Again, thank you and the rest of the Choice Metrics team for this great service.

Best regards,

Manuel

pilot study for a MMNL model