A ML (rppanel) DCE on Value of Time

This forum is for posts that specifically focus on Ngene.

Moderators: Andrew Collins, Michiel Bliemer, johnr


Postby yashar_zr » Tue Aug 25, 2020 5:06 am

Hi
First off and in advance, thank you for providing this resourceful platform for researchers.

I am building an SP experiment in Ngene to study truck drivers' perception of the Value of Time. I have decided on an ML (mixed logit) design and have already run my pilot using an MNL design, so I have prior parameter estimates for my D-efficient design. I am in the process of finalizing the design before distribution, but I wanted your thoughts on its specification first. Please feel free to give me any comments or advice.

I attach my current syntax below, explain a few details about it, and finish this post with a few questions for which I could not find a definitive answer in the literature.

Code:
Design
;alts = Route A, Route B
;alg = mfederov
;rows = 12
;eff = (rppanel, d)
;rep = 1000
;rdraws = Halton (500)
;reject:
Route A.TT + Route A.TTV >= Route B.TT + Route B.TTV, ?The study area shows that the toll route's travel time and its potential delay are always lower than the free route's
Route A.Dist > 0 and Route B.Dist > 0  ?Since a situation where both routes have "extra" distance does not make sense, one of them must always be zero
;model:
U(Route A) = b1[n,-0.04051,0.00837] * TT[60,80,100,120]  ?Travel Time
                 + b2[n,-0.03067,0.00570] * TC[50,100,150] ?Toll Cost
                 + b3[n,-0.02740,0.00823] * TTV[0,10,20,30] ?Potential_Delay
                 + b4[n,-0.02315,0.01293] * Dist[0,15,30] / ?Extra_Distance
U(Route B) = b1 * TT + b3 * TTV  + b4 * Dist $



Questions:

1. Do you know any good source on random parameter distributional assumptions? Can we postpone this choice to the model estimation stage, or must it be the same for both the DCE design and the estimated model?
1.1. All of my parameters are intuitively negative. Should I use a lognormal distribution instead of a normal? And if so, should I reverse the sign of the attribute levels (i.e. TT: -60, -80, etc.) or replace the + sign in the utility function with a - sign?
2. Given that supplying priors for all of my parameters produces more realistic choice situations, how should I decide whether to make them all random, or to keep some fixed and the others random so as to reduce model complexity?
3. When I introduce a Bayesian approach into this model, I receive an undefined D-error. What is the source of this problem?
4. How should I choose the standard deviations for the Bayesian version when I only have the mean values?
5. Due to the nature of my constraints, I cannot use if statements, as they only work with the other algorithm (RSC). Can you think of an equivalent formulation for the current algorithm? Does the choice of algorithm really make a difference?
6. Do you know any good source on choosing the simulation draw method? Which one do you suggest?
7. I experimented with the candidate set size and found that each run returns only one valid design, and there is no apparent relationship between the candidate set size and the size of my constrained candidate set (1710). What should I set as my candidate set size?

Your inputs are much appreciated.
Cheers,
Yashar
yashar_zr
 
Posts: 3
Joined: Tue Jan 21, 2020 9:10 am

Re: A ML (rppanel) DCE on Value of Time

Postby Michiel Bliemer » Tue Aug 25, 2020 9:42 am

Let me first say that I never optimise for mixed logit models, because computation times are generally prohibitive and there is little benefit in doing so. I always optimise for the MNL model with Bayesian priors obtained from a pilot study. It is very unusual to have mixed logit priors from a pilot study given its typically small sample size. How did you obtain your mixed logit priors? I would argue that optimising the Bayesian D-error for an MNL model (which accounts for the unreliability of the parameter estimates, but not for respondent heterogeneity) is better than optimising a local D-error for a mixed logit model (which accounts for respondent heterogeneity, but not for the unreliability of the parameter estimates).

When you estimate an MNL model based on pilot data, you obtain a parameter value (b) and a standard error (se), which you can use to inform a Bayesian prior, i.e. b1[(n,b,se)], noting the round brackets around the prior.
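To illustrate what a Bayesian prior does at the design stage, the sketch below draws parameter values from N(b, se) and averages the MNL D-error over those draws. This is a minimal numerical sketch, not Ngene's implementation; the small two-attribute design matrix is made up purely for illustration:

```python
import numpy as np

def mnl_d_error(X, beta):
    """D-error of a design for an MNL model.

    X: array (rows, alts, K) of attribute levels; beta: (K,) coefficients.
    D-error = det(inverse Fisher information)^(1/K).
    """
    rows, alts, K = X.shape
    info = np.zeros((K, K))
    for s in range(rows):
        v = X[s] @ beta                      # utilities of the alternatives
        p = np.exp(v) / np.exp(v).sum()      # MNL choice probabilities
        xbar = p @ X[s]                      # probability-weighted mean attributes
        for j in range(alts):
            d = X[s, j] - xbar
            info += p[j] * np.outer(d, d)    # Fisher information contribution
    return np.linalg.det(np.linalg.inv(info)) ** (1 / K)

rng = np.random.default_rng(0)
# Illustrative 4-row, 2-alternative design: (travel time, toll cost)
X = np.array([[[60,  50], [ 80, 0]],
              [[80, 150], [100, 0]],
              [[60, 100], [120, 0]],
              [[100, 50], [120, 0]]], dtype=float)

# Pilot MNL estimates (b) and standard errors (se) inform the prior N(b, se)
b = np.array([-0.04051, -0.03067])
se = np.array([0.00837, 0.00570])
draws = rng.normal(b, se, size=(500, 2))
bayesian_d_error = np.mean([mnl_d_error(X, beta) for beta in draws])
print(bayesian_d_error)
```

An optimiser then searches for the design matrix X that minimises this average, rather than the D-error at the point estimate alone.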

1. Optimisation for mixed logit models is very rare, I think that only a handful of people have ever done this, so there is essentially no guidance here. But given that at the design stage there is so much uncertainty about the distribution to use, if you want to optimise for a mixed logit model I would simply assume a normal distribution. In model estimation later, you can use any distribution you want.

2. A pilot study could provide information, but as said above, it is often difficult to estimate a mixed logit model with a high level of reliability.

3. What do you mean by a "Bayesian approach": Bayesian estimation, or Bayesian priors? You have not specified Bayesian priors in your syntax. In your syntax you are optimising for a mixed logit model with 8 coefficients to estimate (4 betas and 4 standard deviations) using only 12 choice tasks, which does not provide much information. If you increase the number of rows to 24 then Ngene will produce a finite D-error, but you will notice that optimisation is very slow, because each design evaluation requires 500 x 1,000 = 500,000 computations of the D-error. As mentioned above, I would recommend against rppanel optimisation and in favour of optimising for an MNL model, which has been shown to be efficient for estimating a mixed logit model as well. Please refer to Bliemer and Rose (2010).

Bliemer, M.C.J., and J.M. Rose (2010) Construction of experimental designs for mixed logit models allowing for correlation across choice observations. Transportation Research Part B, Vol. 44, No. 6, pp. 720-734.

4. See above, a pilot study gives you the full Bayesian distribution.

5. You can often easily rewrite ;cond constraints into ;reject and ;require constraints, they are both simply logical statements. If you need assistance please provide me with the ;cond constraint you would like to impose and I will try to rewrite it.
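For reference, the logical equivalence behind such rewrites: a conditional constraint "if X then Y" holds exactly when "X and not Y" is rejected. A quick truth-table check in Python:

```python
from itertools import product

# A ;cond constraint "if X then Y" is logically "not X, or Y",
# which is the complement of the ;reject expression "X and not Y".
for x, y in product([False, True], repeat=2):
    cond = (not x) or y            # the if-then form
    keep = not (x and not y)       # what survives the ;reject filter
    assert cond == keep
print("equivalent")
```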

6. I would recommend Gaussian quadrature, i.e. ;rdraws = gauss(5) and ;bdraws = gauss(5), which does 5^4 = 625 draws. See Bliemer et al. (2008).

Bliemer, M.C.J., J.M. Rose, and S. Hess (2008) Approximation of Bayesian efficiency in experimental choice designs. Journal of Choice Modelling, Vol. 1, pp. 98-127.
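To illustrate how Gaussian quadrature replaces random draws, the sketch below rescales standard Gauss-Hermite nodes to each normal prior; with 5 nodes per parameter and 4 parameters the full grid has 5^4 = 625 points, matching ;bdraws = gauss(5). A NumPy sketch (not Ngene's internal code), using the priors from the syntax above:

```python
import numpy as np
from itertools import product

def gauss_hermite_normal(mu, sigma, n=5):
    """n-node Gauss-Hermite rule rescaled for a N(mu, sigma) distribution."""
    t, w = np.polynomial.hermite.hermgauss(n)
    nodes = mu + np.sqrt(2.0) * sigma * t    # change of variables
    weights = w / np.sqrt(np.pi)             # weights now sum to 1
    return nodes, weights

# One prior from the syntax: b1 ~ N(-0.04051, 0.00837)
nodes, weights = gauss_hermite_normal(-0.04051, 0.00837)
print(np.dot(weights, nodes))                # recovers the prior mean exactly

# Four priors -> 5**4 = 625 grid points, as with ;bdraws = gauss(5)
priors = [(-0.04051, 0.00837), (-0.03067, 0.00570),
          (-0.02740, 0.00823), (-0.02315, 0.01293)]
grids = [gauss_hermite_normal(mu, se) for mu, se in priors]
points = list(product(*[g[0] for g in grids]))
print(len(points))                           # 625
```

The Bayesian D-error is then a weighted sum over these 625 points instead of an average over thousands of pseudo-random draws.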

7. You have imposed constraints and Ngene reports that there exist 1710 valid choice tasks. You can set the candidate set size to any value up to 1710, e.g. candidates = 1000, but you can also simply leave it as is. Ngene does not produce just one design: it picks 12 rows from these 1710 and keeps generating further designs. For a mixed logit model this may take VERY long, so you may need to wait for some time (minutes, hours) before it shows other designs.

I recommend using syntax like below, where I assume that you are generating a Bayesian efficient design.

Code:
Design
;alts = RouteA*, RouteB*
;alg = mfederov
;rows = 24
;eff = (mnl,d)
;bdraws = gauss(5)
;reject:
RouteA.TT + RouteA.TTV >= RouteB.TT + RouteB.TTV,  ?The study area shows that the toll route's travel time and its potential delay is always lower than the free route's
RouteA.Dist > 0 and RouteB.Dist > 0                ?Since the situation where both routes have "extra" distance does not make sense, I always have to keep one of them zero
;model:
U(RouteA) = b1[(n,-0.04051,0.00837)] * TT[60,80,100,120] ?Travel Time
          + b2[(n,-0.03067,0.00570)] * TC[50,100,150]    ?Toll Cost
          + b3[(n,-0.02740,0.00823)] * TTV[0,10,20,30]   ?Potential_Delay
          + b4[(n,-0.02315,0.01293)] * Dist[0,15,30]     ?Extra_Distance
          /
U(RouteB) = b1 * TT + b3 * TTV  + b4 * Dist
$


Michiel
Michiel Bliemer
 
Posts: 1040
Joined: Tue Mar 31, 2009 4:13 pm

Re: A ML (rppanel) DCE on Value of Time

Postby yashar_zr » Thu Aug 27, 2020 12:32 am

Thank you very much for your time Prof. Bliemer. Your comments were very informative.

I am sorry that my introduction was a bit confusing. I actually did run my pilot study using an MNL model. I attempted to use Bayesian priors for it; however, the resulting design was not realistic, as it could not sufficiently differentiate attributes and levels and hence produced a lot of implausible choice situations.
Therefore, having gone through the literature and found that even very small priors, which merely inform the model about the parameter signs, are helpful, I gave the pilot experiment very small prior parameter values (on the order of 10^-4) to generate more realistic questions.
Then, for the final design, I learned that preference heterogeneity has to be taken into account, since this is a kind of WTP study, and decided to design the final experiment using mixed logit with the information gained from the pilot MNL design.

Q1. For the pilot design, do you think I should have used Bayesian priors instead of small non-zero values that merely convey the sign?
Q2. Is it acceptable to use information from an MNL pilot for the final full mixed logit DCE?

I have read your paper comparing these design model types and saw that mixed logit performs best, although MNL is very competitive.
Q3. Which path would you consider best to follow?
A. Switching to MNL with Bayesian priors
B. Sticking with ML and using the obtained priors
C. Sticking with ML and trying to add a small level of uncertainty through Bayesian priors in addition to the random parameters?

Q4. Since I am going to run an SP survey, don't you recommend using the panel version?

Q5. Now that I have explained the problem more clearly, do you still think 24 rows works better for estimating my parameters than 12 rows? (In that case I would have to block my design, since I must take potential respondent boredom/fatigue very seriously.)


Yashar

Re: A ML (rppanel) DCE on Value of Time

Postby Michiel Bliemer » Fri Aug 28, 2020 9:39 pm

Q1. The state of the art is to do a pilot study, obtain Bayesian priors, and generate a Bayesian efficient design for an MNL model.

Q2. You can estimate a mixed logit model with a random design, an orthogonal design, or an efficient design. An efficient design optimised for the MNL model is much better for estimating a mixed logit model than a random or orthogonal design. It is not practically feasible to optimise for a mixed logit model; we only recommend evaluating a design for a mixed logit model, never optimising for one. As long as you have enough degrees of freedom, i.e. enough rows, you can estimate any model.

Q3. See above, optimise for MNL because this is the only feasible thing to do and it has a similar efficiency. No one optimises for panel mixed logit models, I think only a handful of people have ever tried that in the literature. So definitely option A.

Q4. You optimise for an MNL model using Bayesian priors, and in the end you estimate a panel mixed logit model. That is the most practical workflow.

Q5. Yes, you need sufficient variation in your data to estimate more advanced models later, such as mixed logit models. 12 rows is not enough variation in my mind.

Michiel

Re: A ML (rppanel) DCE on Value of Time

Postby yashar_zr » Fri Sep 11, 2020 4:08 am

Thank you Prof. Bliemer, I really appreciate your guidance. I have finally opted for optimising a Bayesian MNL design, as it improves my D-error significantly and greatly reduces the complexity.

I am in the process of finalizing my design and have a couple of small queries, which I list here. Hopefully they will be helpful for other researchers as well.

Q1. When using unlabeled alternatives with Bayesian prior values, do you recommend using the star property (*) to check for dominance between alternatives?

Q2. When Ngene converges to the lowest D-error for my design, I noticed that some of my attributes do not show all of their levels in the choice situations (i.e. I had 3 levels but could only see 2 of them). Will this compromise the design's performance in capturing non-linearities? When I change to 4 levels, all of them appear at least once in the choice situations, but at the cost of a small increase in D-error (on the order of 10^-4). Which approach do you recommend?

Q3. I suspect the first and second questions are related. If the design with the lowest D-error in either case does not cover all attribute levels, what should one do to capture non-linearities? Perhaps choose a design with a slightly higher D-error, or something else?

Thanks in advance
Yashar

Re: A ML (rppanel) DCE on Value of Time

Postby Michiel Bliemer » Fri Sep 11, 2020 9:50 am

Q1. Yes I would check for dominance by adding stars.

Q2. This happens because you are using the modified Federov algorithm, which relaxes the assumption of attribute level balance. Attribute level balance is not needed and it is often more efficient to let go of it, but as you mention, it may then not be possible to estimate nonlinearities afterwards. There are three ways around this:

(a) Include attribute level constraints; with 24 rows and 4 levels, you would have perfect attribute level balance if each level appears 6 times across the design. The mfederov algorithm cannot perfectly satisfy this constraint, but it can satisfy more relaxed constraints. I would use something like:
... + b1 * TT[60,80,100,120](4-8,4-8,4-8,4-8) ...
This requires each level to appear between 4 and 8 times within the design.

(b) Use dummy coding for all attributes; using dummy coding, all attribute levels will appear more or less equally in an efficient design. But this would also require using many more priors, which is likely unwanted.

(c) Use the default swapping algorithm; the swapping algorithm can only handle conditional if-then constraints, i.e. you would need to reformulate your ;reject constraints to ;cond constraints.
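The attribute level count constraint in option (a) can be verified outside Ngene by counting level frequencies in the generated design. A minimal Python sketch (the example column and bounds are hypothetical, not output from Ngene):

```python
from collections import Counter

def level_counts_ok(levels_in_design, bounds):
    """Check that each level's frequency falls within its (lo, hi) bounds."""
    counts = Counter(levels_in_design)
    return all(lo <= counts.get(level, 0) <= hi
               for level, (lo, hi) in bounds.items())

# Hypothetical TT column pooled over a 24-row design
tt_column = [60]*6 + [80]*5 + [100]*7 + [120]*6
# The (4-8,4-8,4-8,4-8) constraint from option (a)
bounds = {60: (4, 8), 80: (4, 8), 100: (4, 8), 120: (4, 8)}
print(level_counts_ok(tt_column, bounds))   # True
```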

Q3. See my proposed options above, you can make sure that all levels appear in the design.

Michiel

