choice-metrics.com

by **tomosrobinson** » Thu Dec 17, 2020 11:56 pm

Hi there,

As discussed on a previous thread I started on the Choice Experiments - General page ('A question about blocking'), I am designing my first DCE study, which has the following characteristics:

- 7 attributes, each with 5 levels
- 120 choice sets, split into 10 blocks -> 12 choice sets per respondent
- 1,000 participants, so 100 per block
- We plan to use MNL and MXL regression models to analyse the data, and will be interested in main effects only
- We have data from a previous pilot study (n=171), so are hoping to use a Bayesian efficient design to take into account these priors

My code so far (please correct me if I've done anything wrong!) with no Bayesian priors is:

Code:

    ? Syntax for MNL Model (No Priors)
    Design
    ;alts = ChoiceA*, ChoiceB*
    ;rows = 120
    ;block = 10
    ;eff = (mnl,d)
    ;model:
    U(ChoiceA) = b0
               + b1_Tired         * Tired[1,2,3,4,5]
               + b2_Walking       * Walking[1,2,3,4,5]
               + b3_Sports        * Sports[1,2,3,4,5]
               + b4_Concentration * Concentration[1,2,3,4,5]
               + b5_Embaressed    * Embaressed[1,2,3,4,5]
               + b6_Unhappy       * Unhappy[1,2,3,4,5]
               + b7_Treated       * Treated[1,2,3,4,5]
    /
    U(ChoiceB) = b1_Tired         * Tired
               + b2_Walking       * Walking
               + b3_Sports        * Sports
               + b4_Concentration * Concentration
               + b5_Embaressed    * Embaressed
               + b6_Unhappy       * Unhappy
               + b7_Treated       * Treated
    $

As the title of this post suggests, I'm now thinking about trying to incorporate the Bayesian priors from the pilot study into the experimental design.

However, the issue I've got is that some of the priors from the pilot study do not make intuitive sense. I've pasted the coefficients and standard errors from the MNL regression models (n=171) below.

Code:

               Coeff    Std Error
    tired_2     0.097   0.109
    tired_3     0.119   0.106
    tired_4    -0.161   0.108
    tired_5    -0.383   0.105
    walki_2     0.064   0.106
    walki_3    -0.202   0.107
    walki_4     0.129   0.104
    walki_5    -0.186   0.107
    sport_2     0.248   0.112
    sport_3     0.212   0.106
    sport_4     0.428   0.110
    sport_5     0.182   0.107
    conce_2    -0.056   0.105
    conce_3     0.153   0.105
    conce_4     0.098   0.105
    conce_5    -0.159   0.102
    embar_2     0.260   0.110
    embar_3    -0.065   0.109
    embar_4     0.227   0.106
    embar_5     0.134   0.105
    unhap_2     0.077   0.107
    unhap_3     0.090   0.106
    unhap_4    -0.013   0.105
    unhap_5    -0.212   0.106
    treat_2    -0.074   0.109
    treat_3    -0.181   0.107
    treat_4    -0.215   0.108
    treat_5    -0.530   0.106

Essentially, a priori one would expect ALL the coefficients to be negative, and for the coefficients to be increasing in magnitude within the attributes (i.e. the tired_2 and tired_5 coefficients should both be negative, and tired_5 should be larger in magnitude).
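One way to see which pilot estimates actually conflict with that expectation is to divide each coefficient by its standard error. The following is only a minimal Python sketch and not part of the original post: the names `pilot_mnl`, `positive` and `significantly_positive` are illustrative, the values are copied from the pilot MNL output above, and the 1.96 cut-off assumes an approximate two-sided 5% test.

```python
# Pilot MNL estimates copied from the post: name -> (coefficient, standard error).
pilot_mnl = {
    "tired_2": (0.097, 0.109), "tired_3": (0.119, 0.106),
    "tired_4": (-0.161, 0.108), "tired_5": (-0.383, 0.105),
    "walki_2": (0.064, 0.106), "walki_3": (-0.202, 0.107),
    "walki_4": (0.129, 0.104), "walki_5": (-0.186, 0.107),
    "sport_2": (0.248, 0.112), "sport_3": (0.212, 0.106),
    "sport_4": (0.428, 0.110), "sport_5": (0.182, 0.107),
    "conce_2": (-0.056, 0.105), "conce_3": (0.153, 0.105),
    "conce_4": (0.098, 0.105), "conce_5": (-0.159, 0.102),
    "embar_2": (0.260, 0.110), "embar_3": (-0.065, 0.109),
    "embar_4": (0.227, 0.106), "embar_5": (0.134, 0.105),
    "unhap_2": (0.077, 0.107), "unhap_3": (0.090, 0.106),
    "unhap_4": (-0.013, 0.105), "unhap_5": (-0.212, 0.106),
    "treat_2": (-0.074, 0.109), "treat_3": (-0.181, 0.107),
    "treat_4": (-0.215, 0.108), "treat_5": (-0.530, 0.106),
}


def z_stat(coef, se):
    """z-statistic for the null hypothesis that the coefficient is zero."""
    return coef / se


def positive(estimates):
    """Names of coefficients whose point estimate is positive."""
    return [name for name, (coef, _) in estimates.items() if coef > 0]


def significantly_positive(estimates, z_crit=1.96):
    """Names of coefficients that are positive and exceed the z cut-off."""
    return [name for name, (coef, se) in estimates.items()
            if z_stat(coef, se) > z_crit]


wrong_sign = positive(pilot_mnl)
strongly_wrong = significantly_positive(pilot_mnl)
print(len(wrong_sign), "of", len(pilot_mnl), "coefficients are positive")
print("Positive and significant:", strongly_wrong)
```

With these numbers, the estimates that are both positive and clearly different from zero are confined to the sports and embarrassed attributes; the other positive estimates are within about two standard errors of zero.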

So my question is, if I were to use some of the priors from the pilot study (i.e. the ones which make intuitive sense) but not others, would this be a problem?

From looking at similar posts on the forum, I note that because I know the direction the coefficients should be going in, I could instead assign some of the coefficients a "very small negative or positive value, e.g. -0.000001 or 0.000001 as prior. This means that the prior is essentially zero, but it would allow the automatic avoidance of dominant alternatives in Ngene".

Any help regarding this matter would be greatly appreciated. Apologies if I haven't explained myself very well or have left out some information.

Best wishes,

Tom

by **Michiel Bliemer** » Fri Dec 18, 2020 3:03 pm

A few important things to note:

1. There should not be a constant b0 in the utility function of ChoiceA because ChoiceA and ChoiceB are generic alternatives. You can add a constant in model estimation to correct for left-to-right bias, but when generating a design for an unlabelled experiment there should be no constants.

2. I assume that you used dummy coding in your model estimation? I have changed the syntax to reflect this. If you used effects coding, please substitute .dummy with .effects.

3. According to your model estimates, you are estimating coefficients for levels 2 to 5, so I am assuming that level 1 is the reference level and that the order of preference for each attribute is 1>2>3>4>5 (since you mention that all coefficients are expected to be negative). I find it suspicious that all your standard errors are about 0.1, and with 171 respondents you would expect to obtain the correct signs. The coefficients sport_2 to sport_5 are all positive, meaning that levels 2 to 5 yield a higher utility than reference level 1. Please check your data carefully: did you do the conversion to dummy coding correctly?

4. In Ngene, the LAST level of a dummy coded variable is the reference level, so I have moved level 1 to the end of the list for each attribute.

5. If you believe that all coefficients need to be negative and have a certain order, then I think you need to be pragmatic and simply impose negativity and the order in your priors. In the syntax below I looked at the most negative value for an attribute and set that as the coefficient for level 5, and I distributed the coefficients for levels 2, 3, and 4 between 0 (the utility for level 1) and this negative value (for level 5). If all signs were wrong, you can assume only an order such as -0.01, -0.02, -0.03, -0.04.

6. You cannot assume all coefficients to be Bayesian because you would need a lot of draws from the distributions to obtain stable results. I generally recommend restricting the number of Bayesian priors to a maximum of 12 and using fixed priors for the other coefficients. I selected the largest coefficients (for levels 4 and 5) to be Bayesian while keeping the others fixed. I suggest using 1000 or 2000 Sobol draws or using ;bdraws = gauss(2). This will require a large amount of computation time, so you will probably need to run your syntax for a whole day or more.

Code:

    Design
    ;alts = ChoiceA*, ChoiceB*
    ;rows = 120
    ;block = 10
    ;eff = (mnl,d,mean)
    ;bdraws = sobol(2000)
    ;model:
    U(ChoiceA) = b1_Tired.dummy[-0.05|-0.1|(n,-0.16,0.1)|(n,-0.38,0.1)] * Tired[2,3,4,5,1] ? 1 = reference level for all dummy coded variables
               + b2_Walking.dummy[-0.05|-0.1|(n,-0.15,0.1)|(n,-0.2,0.1)] * Walking[2,3,4,5,1]
               + b3_Sports.dummy[-0.01|-0.02|-0.03|-0.04] * Sports[2,3,4,5,1]
               + b4_Concentration.dummy[-0.04|-0.08|-0.12|(n,-0.16,0.1)] * Concentration[2,3,4,5,1]
               + b5_Embaressed.dummy[-0.01|-0.02|-0.03|-0.04] * Embaressed[2,3,4,5,1]
               + b6_Unhappy.dummy[-0.05|-0.1|-0.15|(n,-0.212,0.1)] * Unhappy[2,3,4,5,1]
               + b7_Treated.dummy[(n,-0.074,0.1)|(n,-0.181,0.1)|(n,-0.215,0.1)|(n,-0.53,0.1)] * Treated[2,3,4,5,1]
    /
    U(ChoiceB) = b1_Tired         * Tired
               + b2_Walking       * Walking
               + b3_Sports        * Sports
               + b4_Concentration * Concentration
               + b5_Embaressed    * Embaressed
               + b6_Unhappy       * Unhappy
               + b7_Treated       * Treated
    $
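The pragmatic rule in point 5 can be sketched in a few lines of Python. This is a hedged illustration only: the function names `ordered_priors` and `to_ngene_dummy` are hypothetical, and the even spacing between 0 and the most negative pilot estimate is an assumption of this sketch. The priors in the syntax above were chosen by hand and are not evenly spaced.

```python
def ordered_priors(pilot_coefs, fallback_step=-0.01):
    """Monotonically decreasing, negative priors for levels 2..K of one attribute.

    pilot_coefs: pilot estimates for levels 2..K (level 1 is the reference
    level with utility 0).
    """
    n = len(pilot_coefs)
    most_negative = min(pilot_coefs)
    if most_negative >= 0:
        # All signs are "wrong": assume only an order, e.g. -0.01, -0.02, ...
        return [round(fallback_step * k, 4) for k in range(1, n + 1)]
    # Assumption of this sketch: spread the levels evenly between 0 and the
    # most negative pilot estimate, which is assigned to the last level.
    return [round(most_negative * k / n, 4) for k in range(1, n + 1)]


def to_ngene_dummy(name, priors):
    """Format fixed priors in the name.dummy[p2|p3|...] form used above."""
    return f"{name}.dummy[" + "|".join(f"{p:g}" for p in priors) + "]"


# Pilot MNL estimates for levels 2 to 5, copied from the first post.
sports = ordered_priors([0.248, 0.212, 0.428, 0.182])
treated = ordered_priors([-0.074, -0.181, -0.215, -0.530])

print(to_ngene_dummy("b3_Sports", sports))
print(to_ngene_dummy("b7_Treated", treated))
```

For Sports, where every pilot estimate is positive, the sketch falls back to the order-only priors; for Treated it anchors level 5 at -0.53 and interpolates the rest. Bayesian priors such as (n,-0.53,0.1) would still have to be written in by hand.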

Michiel

by **tomosrobinson** » Fri Dec 18, 2020 9:18 pm

Hi Michiel,

Once more, thank you so much for your detailed and thorough reply - it's a great help.

Re Points 1, 2 & 4 - these are basic errors on my part. Looks like I need to read the Ngene user manual again!

Re Point 3: Yes I agree that the data does seem a bit suspicious. I was not involved in the pilot study (it formed part of my colleague's PhD thesis), but in the new year I will go through the raw data again to make sure I'm not making a stupid mistake.

Re Point 5: This makes intuitive sense to me, thank you for pointing me towards this pragmatic and sensible approach.

Re Point 6: Yes, from reading other posts on the forum I suspected that including all the coefficients as Bayesian would not be possible. I'm fully expecting to leave the design running over the weekend when I finally run it in the new year!

Best wishes,

Tom

by **tomosrobinson** » Tue Jan 12, 2021 10:20 pm

Hi Michiel,

As you suggested, I've checked through the raw pilot data (that is being used to generate the Bayesian priors), and the data looks fine to me...

However, when going back through the pilot data, I ran a few more regression models (not just the MNL model as I had done before), and found that the estimated coefficients from the MXL model were much more in line with what one would expect a priori. I've pasted the output from the MXL model below:

Code:

                 Coeff    Std Error
    tired_2       0.029   0.149
    tired_3      -0.105   0.252
    tired_4      -0.397   0.357
    tired_5      -0.712   0.478
    walking_2     0.066   0.144
    walking_3    -0.275   0.248
    walking_4    -0.219   0.363
    walking_5    -0.499   0.468
    sports_2      0.152   0.150
    sports_3     -0.036   0.249
    sports_4     -0.052   0.360
    sports_5     -0.283   0.472
    concen_2     -0.094   0.149
    concen_3     -0.039   0.247
    concen_4     -0.233   0.359
    concen_5     -0.580   0.477
    embarr_2      0.132   0.152
    embarr_3     -0.242   0.250
    embarr_4     -0.190   0.360
    embarr_5     -0.357   0.470
    unhappy_2    -0.006   0.151
    unhappy_3    -0.111   0.255
    unhappy_4    -0.300   0.358
    unhappy_5    -0.670   0.477
    treated_2    -0.104   0.155
    treated_3    -0.381   0.249
    treated_4    -0.519   0.361
    treated_5    -0.876   0.478

Given that the priors generated from the MXL model are more in line with what one would expect as compared to the MNL model (i.e. more coefficients negative and increasing in magnitude within the attributes), and that we always planned to estimate both MNL and MXL models as part of our analysis, do you think it would be sensible to optimise the design for the MXL model instead of the MNL model?

I realise that there are several parts where the code would need to change if we were to do this, and that I would still need to restrict the number of coefficients that were assumed to be Bayesian due to the issues you highlighted in your previous post.

Any advice at all regarding this matter would be much appreciated.

Best wishes,

Tom

by **Michiel Bliemer** » Wed Jan 13, 2021 10:14 am

If you would like to optimise for the MXL model, you need to use ;eff = (rppanel,d), which refers to the panel version of the mixed logit model. However, optimising for the panel mixed logit model takes extremely long; with the number of parameters you have, it would take months or years of computation time, which is simply not feasible. You can consider using the means of the coefficients as priors to optimise for the MNL model. Afterwards, you can evaluate this design in Ngene for the rppanel model (instead of optimising for it). In our experience, optimising the design for the MNL model leads to a design that is also reasonably efficient for estimating the panel mixed logit model; see also Bliemer and Rose (2010).

Bliemer, M.C.J., and J.M. Rose (2010) Construction of experimental designs for mixed logit models allowing for correlation across choice observations. Transportation Research Part B, Vol. 44, No. 6, pp. 720-734.

Michiel

by **tomosrobinson** » Thu Jan 14, 2021 7:45 pm

Hi Michiel,

Thanks again for your reply, explanation and for the paper reference!

Can I just clarify what you mean by the statement "You can consider using the means of the coefficients as priors to optimise for the MNL model"?

Do you mean using the coefficients from the MMNL/MXL model and using them as priors to optimise for the MNL model? Or something else?

Best wishes,

Tom

by **Michiel Bliemer** » Thu Jan 14, 2021 8:17 pm

Yes that is what I mean, because the means of the random parameters will be similar to the coefficients of the MNL model. Bayesian priors are merely a guess of the parameter estimates, so if you believe that the MXL parameter estimates make more sense then you can decide to use the means of the random parameters as a proxy.

by **tomosrobinson** » Thu Jan 14, 2021 11:35 pm

Thanks Michiel!

by **tomosrobinson** » Wed Feb 03, 2021 4:32 am

Hi Michiel,

One more question if that's okay.

I've generated some designs using a version of the syntax you previously advised me on in this thread, and I'm looking to run some simulations with dummy data before committing to one design with a survey company.

I've found some other threads on this topic (for example: http://www.choice-metrics.com/forum/viewtopic.php?f=2&t=776&p=2822&hilit=simulation#p2822), but have been unable to write the correct syntax to generate the dummy data I'm looking for.

Is there a way that Ngene can "be tricked" to generate such dummy data given the specific nature of my design (i.e. some Bayesian parameters, MNL)?

Best wishes,

Tom

by **Michiel Bliemer** » Mon Feb 08, 2021 6:04 pm

Ngene can only be 'tricked' using the approach that you refer to in your post.

It is very easy to generate a sample in Excel; this is usually what I do myself. To create a choice for a certain choice task, you do the following:

1. For each alternative you compute the utilities V based on some given parameter priors.
2. For each alternative you take a draw from the Gumbel distribution to simulate a random error term epsilon. In Excel: =-LN(-LN(RAND()))
3. Compute U = V + epsilon for each alternative
4. Generate a choice by setting the choice indicator for the alternative with the highest U to 1 and all other alternatives to 0.

Michiel
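For anyone who prefers a script to a spreadsheet, the four steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not Ngene functionality: the function names, the two-attribute choice task and the seed are hypothetical, and the priors are the means for Tired and Treated taken from the design syntax earlier in this thread (Bayesian priors are replaced by their means here).

```python
import math
import random

# Assumed priors per attribute and level; level 1 is the reference (utility 0).
PRIORS = {
    "Tired":   {2: -0.05, 3: -0.1, 4: -0.16, 5: -0.38},
    "Treated": {2: -0.074, 3: -0.181, 4: -0.215, 5: -0.53},
}


def utility(levels, priors):
    """Step 1: observed utility V as the sum of the dummy-coded priors."""
    return sum(priors[attr].get(level, 0.0) for attr, level in levels.items())


def gumbel_draw(rng):
    """Step 2: Gumbel error, the same transform as =-LN(-LN(RAND())) in Excel."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice(alternatives, priors, rng):
    """Steps 3 and 4: U = V + epsilon, then 1 for the highest U and 0 otherwise."""
    totals = [utility(alt, priors) + gumbel_draw(rng) for alt in alternatives]
    best = max(range(len(totals)), key=totals.__getitem__)
    return [1 if i == best else 0 for i in range(len(totals))]


# Hypothetical choice task: ChoiceA at the reference levels, ChoiceB at level 5.
task = [{"Tired": 1, "Treated": 1}, {"Tired": 5, "Treated": 5}]

rng = random.Random(12345)  # fixed seed so the simulated sample is reproducible
n_resp = 10000
choices = [simulate_choice(task, PRIORS, rng) for _ in range(n_resp)]
share_a = sum(c[0] for c in choices) / n_resp

# Sanity check: the simulated share should be close to the logit probability.
v_a, v_b = (utility(alt, PRIORS) for alt in task)
p_a = math.exp(v_a) / (math.exp(v_a) + math.exp(v_b))
print(f"simulated share of ChoiceA: {share_a:.3f}, logit probability: {p_a:.3f}")
```

Repeating `simulate_choice` over every row of a design and every simulated respondent gives a dummy data set that can then be estimated with MNL to see whether the assumed priors are recovered.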

Incorporating Bayesian Priors from Pilot Study into Design
