Incorporating Bayesian Priors from Pilot Study into Design

Postby tomosrobinson » Thu Dec 17, 2020 11:56 pm

Hi there,

As discussed on a previous thread I started on the Choice Experiments - General page ('A question about blocking'), I am designing my first DCE study, which has the following characteristics:

- 7 attributes, each with 5 levels
- 120 choice sets, split into 10 blocks -> 12 choice sets per respondent
- 1,000 participants, so 100 per block
- We plan to use MNL and MXL regression models to analyse the data, and will be interested in main effects only
- We have data from a previous pilot study (n=171), so are hoping to use a Bayesian efficient design to take into account these priors

My code (so far; please correct me if I've done anything wrong!) with no Bayesian priors is:

Code:
? Syntax for MNL Model (No Priors)

Design
;alts = ChoiceA*, ChoiceB*
;rows = 120
;block = 10
;eff = (mnl,d)
;model:
U(ChoiceA) =  b0
           + b1_Tired * Tired[1,2,3,4,5]
           + b2_Walking * Walking[1,2,3,4,5]
           + b3_Sports * Sports[1,2,3,4,5]
           + b4_Concentration * Concentration[1,2,3,4,5]
           + b5_Embaressed * Embaressed[1,2,3,4,5]
           + b6_Unhappy * Unhappy[1,2,3,4,5]
           + b7_Treated * Treated[1,2,3,4,5] /

U(ChoiceB) =
              b1_Tired * Tired           
           + b2_Walking * Walking           
           + b3_Sports * Sports         
           + b4_Concentration * Concentration         
           + b5_Embaressed * Embaressed           
           + b6_Unhappy * Unhappy           
           + b7_Treated * Treated             
$


As the title of this post suggests, I'm now thinking about trying to incorporate the Bayesian priors from the pilot study into the experimental design.

However, the issue I've got is that some of the priors from the pilot study do not make intuitive sense. I've pasted the coefficients and standard errors from the MNL regression models (n=171) below.

Code:
         Coeff   Std Error
tired_2   0.097   0.109
tired_3   0.119   0.106
tired_4   -0.161   0.108
tired_5   -0.383   0.105
walki_2   0.064   0.106
walki_3   -0.202   0.107
walki_4   0.129   0.104
walki_5   -0.186   0.107
sport_2   0.248   0.112
sport_3   0.212   0.106
sport_4   0.428   0.110
sport_5   0.182   0.107
conce_2   -0.056   0.105
conce_3   0.153   0.105
conce_4   0.098   0.105
conce_5   -0.159   0.102
embar_2   0.260   0.110
embar_3   -0.065   0.109
embar_4   0.227   0.106
embar_5   0.134   0.105
unhap_2   0.077   0.107
unhap_3   0.090   0.106
unhap_4   -0.013   0.105
unhap_5   -0.212   0.106
treat_2   -0.074   0.109
treat_3   -0.181   0.107
treat_4   -0.215   0.108
treat_5   -0.530   0.106



Essentially, a priori one would expect ALL the coefficients to be negative, and the coefficients to increase in magnitude within each attribute (i.e. the tired_2 and tired_5 coefficients should both be negative, and tired_5 should be larger in magnitude).

So my question is, if I were to use some of the priors from the pilot study (i.e. the ones which make intuitive sense) but not others, would this be a problem?

From looking at similar posts on the forum, I note that because I know the direction the coefficients should go in, I could instead assign some of the coefficients a "very small negative or positive value, e.g. -0.000001 or 0.000001 as prior. This means that the prior is essentially zero, but it would allow the automatic avoidance of dominant alternatives in Ngene".

Any help regarding this matter would be greatly appreciated. Apologies if I haven't explained myself very well or have left out some information.

Best wishes,

Tom
tomosrobinson
 
Posts: 17
Joined: Tue Nov 17, 2020 2:36 am

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby Michiel Bliemer » Fri Dec 18, 2020 3:03 pm

A few important things to note:

1. There should not be a constant b0 in the utility function of ChoiceA because ChoiceA and ChoiceB are generic alternatives. You can add a constant in model estimation to correct for left-to-right bias, but when generating a design for an unlabelled experiment there should be no constants.

2. I assume that you used dummy coding in your model estimation? I have changed the syntax to reflect this. If you used effects coding, please substitute .dummy with .effects

3. According to your model estimates, you are estimating coefficients for levels 2 to 5, so I am assuming that level 1 is the reference level and that the order of preference for each attribute is 1>2>3>4>5 (since you mention that all coefficients are expected to be negative). I find it suspicious that all your standard errors are about 0.1; with 171 respondents you would expect to obtain the correct signs. sport_2 to sport_5 are all positive, meaning that levels 2 to 5 yield a higher utility than reference level 1. Please carefully check your data: did you do the conversion to dummy coding correctly?

4. In Ngene, the LAST level of a dummy coded variable is the reference level, so I have moved level 1 to the end of the list for each attribute.

5. If you believe that all coefficients need to be negative and have a certain order, then I think you need to be pragmatic and simply impose negativity and the order in your priors. In the syntax below I looked at the most negative value for an attribute and set that coefficient for level 5, and I distributed the coefficients for levels 2, 3, and 4 between 0 (the utility for level 1) and this negative value (for level 5). If all signs are wrong, you can assume only an order, such as -0.01, -0.02, -0.03, -0.04.

6. You cannot assume all coefficients to be Bayesian because you would need a lot of draws from the distributions to obtain stable results. I generally recommend restricting the number of Bayesian priors to a maximum of 12 and using fixed priors for the other coefficients. I selected the largest coefficients (for levels 4 and 5) to be Bayesian while keeping the others fixed. I suggest using 1000 or 2000 Sobol draws or using ;bdraws = gauss(2). This will require a large amount of computation time, so you will probably need to run your syntax for a whole day or more.

Code:
Design
;alts = ChoiceA*, ChoiceB*
;rows = 120
;block = 10
;eff = (mnl,d,mean)
;bdraws = sobol(2000)
;model:
U(ChoiceA) = b1_Tired.dummy[-0.05|-0.1|(n,-0.16,0.1)|(n,-0.38,0.1)]                       * Tired[2,3,4,5,1]  ? 1 = reference level for all dummy coded variables
           + b2_Walking.dummy[-0.05|-0.1|(n,-0.15,0.1)|(n,-0.2,0.1)]                      * Walking[2,3,4,5,1]   
           + b3_Sports.dummy[-0.01|-0.02|-0.03|-0.04]                                     * Sports[2,3,4,5,1]     
           + b4_Concentration.dummy[-0.04|-0.08|-0.12|(n,-0.16,0.1)]                      * Concentration[2,3,4,5,1]
           + b5_Embaressed.dummy[-0.01|-0.02|-0.03|-0.04]                                 * Embaressed[2,3,4,5,1]
           + b6_Unhappy.dummy[-0.05|-0.1|-0.15|(n,-0.212,0.1)]                            * Unhappy[2,3,4,5,1]
           + b7_Treated.dummy[(n,-0.074,0.1)|(n,-0.181,0.1)|(n,-0.215,0.1)|(n,-0.53,0.1)] * Treated[2,3,4,5,1]
           /
U(ChoiceB) = b1_Tired * Tired           
           + b2_Walking * Walking           
           + b3_Sports * Sports         
           + b4_Concentration * Concentration         
           + b5_Embaressed * Embaressed           
           + b6_Unhappy * Unhappy           
           + b7_Treated * Treated             
$
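
One simple way to arrive at ordered, negative fixed priors of the kind described in point 5 is sketched below in Python. This is only an illustration, not the procedure used for the syntax above: the function name `ordered_negative_priors`, the even spacing between 0 and the most negative pilot estimate, and the -0.01 fallback step are all assumptions, and the values in the syntax were rounded and adjusted by hand.

```python
def ordered_negative_priors(pilot_coeffs, fallback_step=-0.01):
    """Return ordered negative priors for levels 2..L of one attribute.

    pilot_coeffs: pilot estimates for levels 2..L (level 1 is the reference).
    The most negative estimate is assigned to the last level and the other
    levels are spaced evenly between 0 and that value (an assumed rule).
    If no estimate is negative, only an order is imposed via fallback_step.
    """
    n = len(pilot_coeffs)
    most_negative = min(pilot_coeffs)
    if most_negative >= 0:
        # All signs are "wrong": impose only a small negative ordering.
        return [round(fallback_step * (k + 1), 6) for k in range(n)]
    return [round(most_negative * (k + 1) / n, 6) for k in range(n)]


# Pilot MNL estimates quoted earlier in the thread (levels 2 to 5).
walking = [0.064, -0.202, 0.129, -0.186]
sports = [0.248, 0.212, 0.428, 0.182]

print(ordered_negative_priors(walking))
print(ordered_negative_priors(sports))
```

For Walking this gives values close to the -0.05|-0.1|-0.15|-0.2 pattern in the syntax, and for Sports, where every pilot estimate is positive, it falls back to the small ordered values.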


Michiel
Michiel Bliemer
 
Posts: 1885
Joined: Tue Mar 31, 2009 4:13 pm

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby tomosrobinson » Fri Dec 18, 2020 9:18 pm

Hi Michiel,

Once more, thank you so much for your detailed and thorough reply - it's a great help.

Re Points 1, 2 & 4 - these are basic errors on my part. Looks like I need to read the Ngene user manual again!

Re Point 3: Yes I agree that the data does seem a bit suspicious. I was not involved in the pilot study (it formed part of my colleague's PhD thesis), but in the new year I will go through the raw data again to make sure I'm not making a stupid mistake.

Re Point 5: This makes intuitive sense to me, thank you for pointing me towards this pragmatic and sensible approach.

Re Point 6: Yes, from reading other posts on the forum I suspected that including all the coefficients as Bayesian would not be possible. I'm fully expecting to leave the design running over the weekend when I finally run it in the new year!

Best wishes,

Tom

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby tomosrobinson » Tue Jan 12, 2021 10:20 pm

Hi Michiel,

As you suggested, I've checked through the raw pilot data (that is being used to generate the Bayesian priors), and the data looks fine to me...

However, when going back through the pilot data, I ran a few more regression models (not just the MNL model as I had done before), and found that the estimated coefficients from the MXL model were much more in line with what one would expect a priori. I've pasted the output from the MXL model below:

Code:
         Coeff   Std Error
tired_2   0.029   0.149
tired_3   -0.105   0.252
tired_4   -0.397   0.357
tired_5   -0.712   0.478
walking_2   0.066   0.144
walking_3   -0.275   0.248
walking_4   -0.219   0.363
walking_5   -0.499   0.468
sports_2   0.152   0.150
sports_3   -0.036   0.249
sports_4   -0.052   0.360
sports_5   -0.283   0.472
concen_2   -0.094   0.149
concen_3   -0.039   0.247
concen_4   -0.233   0.359
concen_5   -0.580   0.477
embarr_2   0.132   0.152
embarr_3   -0.242   0.250
embarr_4   -0.190   0.360
embarr_5   -0.357   0.470
unhappy_2 -0.006   0.151
unhappy_3 -0.111   0.255
unhappy_4 -0.300   0.358
unhappy_5 -0.670   0.477
treated_2   -0.104   0.155
treated_3   -0.381   0.249
treated_4   -0.519   0.361
treated_5   -0.876   0.478


Given that the priors generated from the MXL model are more in line with what one would expect as compared to the MNL model (i.e. more coefficients negative and increasing in magnitude within the attributes), and that we always planned to estimate both MNL and MXL models as part of our analysis, do you think it would be sensible to optimise the design for the MXL model instead of the MNL model?

I realise that there are several parts where the code would need to change if we were to do this, and that I would still need to restrict the number of coefficients that were assumed to be Bayesian due to the issues you highlighted in your previous post.

Any advice at all regarding this matter would be much appreciated.

Best wishes,

Tom

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby Michiel Bliemer » Wed Jan 13, 2021 10:14 am

If you would like to optimise for the MXL model, you need to use ;eff = (rppanel,d), which refers to the panel version of the mixed logit model. However, optimising for the panel mixed logit model takes extremely long. With the number of parameters you have, it would take months or years of computation time, so it is simply not feasible. You can consider using the means of the coefficients as priors to optimise for the MNL model. Afterwards, you can evaluate this design in Ngene for the rppanel model (instead of optimising for it). In our experience, optimising the design for the MNL model leads to a design that is also reasonably efficient for estimating the panel mixed logit model; see also Bliemer and Rose (2010).

Bliemer, M.C.J., and J.M. Rose (2010) Construction of experimental designs for mixed logit models allowing for correlation across choice observations. Transportation Research Part B, Vol. 44, No. 6, pp. 720-734.

Michiel

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby tomosrobinson » Thu Jan 14, 2021 7:45 pm

Hi Michiel,

Thanks again for your reply, explanation and for the paper reference!

Can I just clarify what you mean by the statement "You can consider using the means of the coefficients as priors to optimise for the MNL model"?

Do you mean using the coefficients from the MMNL/MXL model and using them as priors to optimise for the MNL model? Or something else?

Best wishes,

Tom

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby Michiel Bliemer » Thu Jan 14, 2021 8:17 pm

Yes that is what I mean, because the means of the random parameters will be similar to the coefficients of the MNL model. Bayesian priors are merely a guess of the parameter estimates, so if you believe that the MXL parameter estimates make more sense then you can decide to use the means of the random parameters as a proxy.

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby tomosrobinson » Thu Jan 14, 2021 11:35 pm

Thanks Michiel!

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby tomosrobinson » Wed Feb 03, 2021 4:32 am

Hi Michiel,

One more question if that's okay.

I've generated some designs using a version of the syntax you previously advised me on in this thread, and I'm looking to run some simulations with dummy data before committing to one design with a survey company.

I've found some other threads on this topic (for example: http://www.choice-metrics.com/forum/viewtopic.php?f=2&t=776&p=2822&hilit=simulation#p2822), but have been unable to write the correct syntax to generate the dummy data I'm looking for.

Is there a way that Ngene can "be tricked" to generate such dummy data given the specific nature of my design (i.e. some Bayesian parameters, MNL)?

Best wishes,

Tom

Re: Incorporating Bayesian Priors from Pilot Study into Desi

Postby Michiel Bliemer » Mon Feb 08, 2021 6:04 pm

Ngene can only be 'tricked' using the approach that you refer to in your post.

It is very easy to generate a sample in Excel; this is usually what I do myself. To create a choice for a certain choice task, you do the following:

1. For each alternative you compute the utilities V based on some given parameter priors.
2. For each alternative you take a draw from the Gumbel distribution to simulate a random error term epsilon. In Excel: =-LN(-LN(RAND()))
3. Compute U = V + epsilon for each alternative
4. Generate a choice by setting the choice indicator for the alternative with the highest U to 1 and all other alternatives to 0.
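
The four steps above can also be scripted instead of done in Excel. Below is a minimal Python sketch for a single choice task. The function names, the seed, and the example utilities (level 1 versus level 5 of Tired, using the -0.38 prior from earlier in the thread) are illustrative assumptions.

```python
import math
import random


def gumbel_draw(rng):
    """Standard Gumbel draw, the equivalent of Excel's =-LN(-LN(RAND()))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice(utilities, rng):
    """Steps 2 to 4: add Gumbel errors to V and flag the best alternative.

    utilities: list of systematic utilities V, one per alternative (step 1).
    Returns a list of 0/1 choice indicators with exactly one 1.
    """
    total = [v + gumbel_draw(rng) for v in utilities]  # U = V + epsilon
    best = max(range(len(total)), key=total.__getitem__)
    return [1 if j == best else 0 for j in range(len(total))]


# Example: ChoiceA has Tired at level 1 (V = 0), ChoiceB at level 5 (V = -0.38).
rng = random.Random(2021)
v = [0.0, -0.38]
n_resp = 10000
share_a = sum(simulate_choice(v, rng)[0] for _ in range(n_resp)) / n_resp

# With Gumbel errors the simulated share should approach the MNL probability.
mnl_prob_a = math.exp(v[0]) / sum(math.exp(x) for x in v)
print(round(share_a, 3), round(mnl_prob_a, 3))
```

Repeating `simulate_choice` for every choice task and simulated respondent yields a dummy data set that can then be estimated to check whether the priors are recovered.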

Michiel
