choice-metrics.com

by **phildias** » Thu Sep 26, 2013 3:19 am

Fellow Ngeners,

I am creating a design for a Stated Choice experiment in which users choose between two alternatives: their current mode of transportation and a new train.

I'm currently considering the two alternatives' utility functions and design syntax below (considering the person currently uses the bus):

Code:

    ;alts = train*, bus*
    ;rows = 30
    ;eff = (mnl,d)
    ;block = 6
    ;model:
    U(train) = b1[(u,-1,0)] * time_train[0.2,0,-0.2]
             + b2[(u,-1,0)] * access_time_train[15,10,5]
             + b3[(u,-1,0)] * cost_train[0.2,0,-0.2]
             + b4[(u,-1,0)] * comfort_low_train[1,0]
             + b5[(u,0,1)] * comfort_high_train[0,1]
             /
    U(bus)   = b6
             + b7[(u,-1,0)] * time_bus[0.2,0,-0.2]
             + b8[(u,-1,0)] * access_time_bus[15,10,5]
             + b9[(u,-1,0)] * cost_bus[0.2,0,-0.2]
    ;cond:
    if(train.comfort_low_train = 1, train.comfort_high_train = 0),
    if(train.comfort_high_train = 1, train.comfort_low_train = 0)
    $

Note: The comfort level is only varied within the new train. The default value is "Medium comfort level", where both dummies would be coded to zero. That's why I created the constraints in the syntax.

The idea here would be to ask the person's current time and cost parameters in the beginning of the survey and, in the choice tasks, have the time and cost values of both alternatives vary according to what they stated.

Example:

In the beginning of the survey (information given by respondent)
Current mode: bus
Current time: 100 minutes
Current cost: $10

In the choice tasks:

Bus: Costs $8 and takes 120 minutes
New Train: Costs $12 and takes 80 minutes
In this scenario, which mode would you choose?

Note: both modes have time and cost parameters which are related/derived from the stated values in the beginning of the survey.

The main issues are:

1) Pivoted design
I am not using Ngene's function to generate pivoted designs because I am also interested in evaluating the user's sensitivity to the price and time attributes of their current mode of transportation. That's why I adopted the -0.2, 0 and +0.2 levels on almost all attributes (except access_time, where I imagined the 20% difference would be too small). Is this the best way to deal with the design I want to generate?

2) Prior information
It is well documented in the literature that time and cost impact utility negatively (the more expensive the mode and the longer it takes, the lower the utility), but in this case I have no information other than the sign. Therefore, I am creating a design with Bayesian priors that can help me get closer to the true estimates. My main question here is: have I crossed any lines or stated something absurd in the design syntax above?

3) Generic vs. Alternative specific coefficients
Also note that I have set alternative specific coefficients for every parameter. Afterwards, I might end up estimating generic coefficients for the time and cost parameters, but I'm taking the safe road on the assumption that it might be better to model the alternative specific version. Are there any problems with this?

4) Ngene Warning messages
Even though I've been using the asterisk command in the syntax to avoid alternative dominance, I have been getting this message:

Warning: Two alternatives were specified for alternative repetition checking, but do not have the same attribute names, and so will not be checked. 'train', 'bus'
Warning: Two alternatives were specified for alternative dominance checking, but do not have the same priors, and so will not be checked. 'train', 'bus'

Why is this happening? Why do the alternatives need to have the same priors to check for dominance?

I'm sorry for the long post! I hope I was clear about the issues. If anyone has any insights, it would be great.

Thank you!!!

by **johnr** » Thu Sep 26, 2013 9:15 am

Dear Phildias

The dominance check invoked by the * is designed for unlabelled choice experiments, which require generic estimates (aside from ASCs). The reason for this is explained in S8.8 of the manual. In short, the * prevents a number of conditions from arising in unlabelled choice experiments. Firstly, how attribute level combinations are allocated to the alternatives matters far more in unlabelled experiments. For example, given two alternatives described by the attributes time and cost, assume Alt A: T1 = 10, C1 = 5; Alt B: T2 = 20, C2 = 2.5. In an unlabelled experiment, this is effectively the same as Alt A: T1 = 20, C1 = 2.5; Alt B: T2 = 10, C2 = 5, putting aside ordering effects. In a labelled experiment, however, if the marginal utilities are not the same for train and bus, then Train: T1 = 10, C1 = 5; Bus: T2 = 20, C2 = 2.5 is not the same as Train: T1 = 20, C1 = 2.5; Bus: T2 = 10, C2 = 5. The * tells the software not to allow the same combinations of attribute levels to appear again in a different alternative ordering. In labelled experiments with alternative specific estimates, such re-ordered repetitions are genuinely different choice tasks, so the * is meaningless.
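A minimal Python sketch of the labelled versus unlabelled point, using the time and cost profiles from the example. The coefficient values are hypothetical and chosen only for illustration; they do not come from this thread. The check asks whether swapping the two profiles between the alternatives leaves the set of utilities unchanged.

```python
def utilities(coefs, profiles):
    """Utility of each alternative: b_time * time + b_cost * cost."""
    return [bt * t + bc * c for (bt, bc), (t, c) in zip(coefs, profiles)]

# (time, cost) profiles from the example, and the same profiles swapped
task = [(10, 5.0), (20, 2.5)]
swapped = task[::-1]

# Hypothetical (b_time, b_cost) per alternative, for illustration only
generic = [(-0.1, -0.2), (-0.1, -0.2)]        # unlabelled: same for both
alt_specific = [(-0.1, -0.2), (-0.04, -0.3)]  # labelled: train vs bus

def same_choice_task(coefs):
    """True if swapping the profiles gives the same utilities up to ordering."""
    return sorted(utilities(coefs, task)) == sorted(utilities(coefs, swapped))

print(same_choice_task(generic))       # True: the swap is a repetition
print(same_choice_task(alt_specific))  # False: the swap is a new choice task
```

With generic coefficients the swapped task carries no new information, which is what the * guards against. With alternative specific coefficients the swap changes the utilities, so there is nothing to guard against.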

Note that even though you have used the same Bayesian prior distributions in your syntax, a draw for b1 will generally not equal the corresponding draw for b7, even though both are associated with time attributes and the attributes take the same levels. It is possible that b1 will draw a value of -0.8 whilst b7 will draw -0.5. In this case, the program assumes that the marginal disutility of time on the train is greater than that on the bus, and hence a 0.2 level for train is worse than a 0.2 level for bus, all else being equal. The second draw might produce a value of -0.2 for train time and -0.9 for bus time, in which case a 0.2 level for bus is worse than a 0.2 level for train. The choice probabilities and D-error are averaged over these draws, and it is over this averaging process that dominance is determined. That is, when a design has alternative specific estimates, dominance is determined by a combination of the parameter and attribute values. In the generic case, the parameter values do not matter, as they are the same across alternatives and only the levels count.
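The effect of independent draws can be sketched in a few lines of Python. This is only an illustration under simple assumptions: it uses pseudo-random draws from the standard library, whereas the draw type Ngene actually uses is not stated in this thread, and the variable names are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_draws = 10_000
level = 0.2             # same time level shown for both alternatives

train_worse = 0
for _ in range(n_draws):
    b1 = rng.uniform(-1.0, 0.0)  # draw for train time, prior (u,-1,0)
    b7 = rng.uniform(-1.0, 0.0)  # draw for bus time, prior (u,-1,0)
    # A more negative contribution means a larger disutility for that draw
    if b1 * level < b7 * level:
        train_worse += 1

share_train_worse = train_worse / n_draws
print(share_train_worse)
```

The share comes out near one half: with identical priors but separate parameters, the same level is worse for train in about half of the draws and worse for bus in the rest, so the levels alone cannot establish dominance.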

I hope this explains the warning you are getting.

John

by **phildias** » Thu Sep 26, 2013 10:51 pm

Dear John,

Thank you for such a quick reply!! It is clear now why the * is useless in this case.

Now, regarding the other issues (mainly the pivoted design and known signs of priors), any comments?

Once again, thank you very much!

Phil

by **johnr** » Fri Sep 27, 2013 9:38 am

Hi Phil

Sorry, I ran out of time yesterday to complete the reply. As I advise all my PhD students, never graduate - it only means more unproductive meetings and less time to do the fun stuff.

Re designs, the objective is to calculate the AVC matrix that you would expect to obtain if the priors you assumed were correct. The AVC matrix is a function of the priors, the choice probabilities and the attribute levels. Hence, you want to use the attribute levels that you are going to use when analysing the data, as only then will the AVC matrix optimised for the design approximate the one obtained when estimating the model. So if in analysing the data you are going to use -0.2, 0 and 0.2, then these are the levels you should use when optimising the design. Pivot designs require that you first capture the attribute levels from the respondents and then construct the attribute levels as absolute values (rather than % shifts) based on the real life levels reported by the respondent. The absolute levels may be constructed using % shifts, but the model assumes the actual values rather than the % shifts when calculating the AVC matrix. For example, if the respondent's travel time is reported as 20 minutes, then even though you specify the levels as -0.2, 0 and 0.2, the design itself and the AVC matrix are constructed assuming you are going to analyse the data as if the respondent saw the levels 16, 20 and 24, not -0.2, 0 and 0.2. The way you have specified the design, by contrast, it assumes you will analyse the data using the levels -0.2, 0 and 0.2.
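To make the distinction between % shifts and absolute levels concrete, here is a small Python sketch. The function name pivot_levels is hypothetical; the numbers are the 20-minute example above and the 100-minute, $10 respondent from the first post.

```python
def pivot_levels(reference, shifts):
    """Turn relative shifts into absolute levels around a reported value."""
    return [round(reference * (1 + s), 6) for s in shifts]

shifts = [-0.2, 0.0, 0.2]

# Respondent who reports a 20-minute trip
time_levels_20 = pivot_levels(20, shifts)

# Respondent from the first post: 100 minutes, $10
time_levels_100 = pivot_levels(100, shifts)
cost_levels_10 = pivot_levels(10, shifts)

print(time_levels_20)   # [16.0, 20.0, 24.0]
print(time_levels_100)  # [80.0, 100.0, 120.0]
print(cost_levels_10)   # [8.0, 10.0, 12.0]
```

Whichever of the two codings (shifts or absolute values) enters the estimated model is the one the design should be optimised for.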

Re the priors, you appear to have done this correctly, sort of. Bayesian priors reflect uncertainty by the analyst as to the true parameter value, hence using Bayesian uniform priors is a good way of reflecting knowledge of sign but not magnitude. However, you need to consider the combined impact of prior and attribute level. Taking your example, the average travel time level is 0 and the average prior is -0.5. Hence, the average overall contribution of travel time to utility will be 0. For access time, the average level is 10 and the average prior is -0.5, so the average overall contribution to utility will be -5. In calculating the choice probabilities and utilities, you are stating here that access time will dominate the choice, as it has a much larger impact upon overall utility than all other attributes. The magnitude of the prior should reflect the magnitude of the attribute.
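The arithmetic can be checked with a short Python sketch. It takes the prior bounds and levels of the train attributes from the syntax in the first post; the dictionary layout and the extra "utility range" column (mean prior times the spread of the levels) are illustrative additions, not something Ngene reports.

```python
# Prior bounds (uniform) and levels copied from the posted syntax
attributes = {
    "time_train":        {"prior": (-1.0, 0.0), "levels": [0.2, 0.0, -0.2]},
    "access_time_train": {"prior": (-1.0, 0.0), "levels": [15, 10, 5]},
    "cost_train":        {"prior": (-1.0, 0.0), "levels": [0.2, 0.0, -0.2]},
}

def mean(values):
    return sum(values) / len(values)

summary = {}
for name, spec in attributes.items():
    prior_mean = mean(spec["prior"])  # midpoint of the uniform prior
    levels = spec["levels"]
    avg_contribution = prior_mean * mean(levels)
    utility_range = abs(prior_mean) * (max(levels) - min(levels))
    summary[name] = (avg_contribution, utility_range)
    print(f"{name}: average {avg_contribution:.2f}, range {utility_range:.2f}")
```

At the prior means, access time moves utility by about 5 units across its levels while time and cost move it by about 0.2, which is the imbalance described above.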

Re AS versus generic parameters, this is similar to point 1. You want the model utility in the design to be as close as possible to what you believe it will be after you collect the data and estimate the model. The point of all this is to approximate the AVC matrix of the final model you are going to estimate. If you believe the final model will have AS parameters, then assume them in generating the design. If not, treat them as generic. The question you are really trying to ask is what happens if you get this wrong. The answer is quite simple - you will likely lose efficiency. Moving from AS to generic parameters, you should still be able to estimate the parameters, but you may need a larger sample size than if you got all your assumptions right (you might need a smaller one also, one can never tell). Hence, I would suggest assuming AS in your design to play it safe, but assume you will need a larger sample size than what is suggested. This is why we always state that the S-error is the theoretical minimum sample size - it is calculated assuming all your assumptions are correct.
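For readers unfamiliar with the idea of a theoretical minimum sample size, the following Python sketch shows one common way such a bound is described: standard errors shrink with the square root of the number of respondents, so each parameter needs enough respondents for its t-ratio to reach a critical value. The formula, the function name and all numbers are assumptions for illustration; the thread does not state how Ngene computes its figure.

```python
import math

def min_sample_size(beta, se_single, t_crit=1.96):
    """Smallest N such that |beta| / (se_single / sqrt(N)) >= t_crit."""
    return math.ceil((t_crit * se_single / beta) ** 2)

# Hypothetical priors and single-respondent standard errors (not from the thread)
params = {
    "b_time": {"beta": -0.5, "se": 2.0},
    "b_cost": {"beta": -0.1, "se": 1.0},
}

per_param = {k: min_sample_size(v["beta"], v["se"]) for k, v in params.items()}
overall = max(per_param.values())  # the hardest parameter sets the bound

print(per_param)  # {'b_time': 62, 'b_cost': 385}
print(overall)    # 385
```

The bound depends on the assumed prior and on the standard error implied by the design, so if either assumption is off the required sample can be larger, as noted above.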

John

by **phildias** » Tue Oct 01, 2013 1:39 am

Hi John,

First of all, thanks for such a detailed response! I believe all my questions are now answered. We are going to run a pilot study in a few days with about 15~30 respondents receiving 5 choice tasks each. I think I'll be able to get better estimates of the coefficients and get a greater grasp of reality once this first step is complete.

Once this pilot study is finished, in what way should the results be input back into Ngene to get the "official" design? Correct me if I'm wrong, but I'm guessing that after I estimate the pilot coefficients and their standard errors with an MNL model, I create another design in Ngene with Bayesian priors, but instead of using uniformly distributed priors (like in the pilot) I can use normally distributed priors based on the pilot means and variances, right?

Once again, thank you so much for clearing things up so well.

Phil

PS: An eternal student life would definitely be incredible, wouldn't it? Hahahaha....

by **johnr** » Tue Oct 01, 2013 3:44 pm

Hi Phil

The choice of distribution should depend on your level of confidence about the priors. A uniform distribution suggests less certainty, whilst the Normal suggests more certainty, given that it is peaked around some mean. In effect, using a Normal suggests that you believe there is a higher probability of the true parameter being near the central mass. Really there is no right or wrong answer to this question theoretically - just how risk averse you feel on any given day.
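A quick Python sketch of what "peaked around some mean" implies for the draws. The Normal mean of -0.5 and standard deviation of 0.1 stand in for hypothetical pilot estimates; they are not results from this study.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 10_000

uniform_draws = [rng.uniform(-1.0, 0.0) for _ in range(n)]  # sign known only
normal_draws = [rng.gauss(-0.5, 0.1) for _ in range(n)]     # hypothetical pilot estimate

def share_near(draws, centre=-0.5, width=0.1):
    """Fraction of draws within +/- width of the centre."""
    return sum(abs(d - centre) <= width for d in draws) / len(draws)

share_uniform = share_near(uniform_draws)
share_normal = share_near(normal_draws)
print(share_uniform, share_normal)
```

Roughly a fifth of the uniform draws fall within 0.1 of -0.5, against roughly two thirds of the Normal draws, so the Normal prior concentrates the design's optimisation on a much narrower range of parameter values.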

John

PS: My plan is to go back and do additional PhDs under each of my former students and pay them back by doing everything they did to me when I was their supervisor. I figure I have enough former students now to stay a student myself for a while longer yet. Plus, as Shakespeare said, revenge is certainly a dish served cold.

Pivoted design and known sign of priors
