choice-metrics.com

by **Benrath** » Tue Aug 22, 2017 2:42 am

I am not sure if the title really represents my questions, but please bear with me for the time being. I am working on the design of a study and I am struggling with the proposed design. We are analyzing different transport choices to estimate an elasticity of demand. Therefore we plan an experiment with three different attributes (price (5 levels), time (3-5 levels) and punctuality (3 levels)) that we think are the most important factors for the decision. The respondents should decide between two alternatives of the same transport mode or opt out, which would mean that they either take an alternative transport mode or do not travel at all (as a side question: would treating these as two separate alternatives make a difference? To me, picking either the alternative mode or the no-travel option just indicates that my two presented choices have less utility. We do not want to specify the alternatives any further or learn anything about them). We think that we will have about 500 respondents who would be willing to respond to 12 tasks.
From what I read here and elsewhere, I would pick a design (orthogonal / (d) efficient /… ) and give the same design to all 500 respondents (right?). We could also draw randomly from all possible paired choices or 1000 sets of efficient designs, so that except for random repetitions each respondent would get a different design. To me these are two extreme cases, but I have not really found any examples for the latter. The latter case would cover a larger part of the possible state space, but have very few responses for certain tasks.
Does it make sense to do this if we have little prior knowledge about the parameters? Or should we rather stick to one fixed design, or choose a larger efficient design with 2-3 blocks?
Assuming we have more priors, should we then correct the fixed design by deleting implausible or dominated choices?

by **Michiel Bliemer** » Tue Aug 22, 2017 10:33 am

1) Showing the alternative 'neither' as a no choice may mean different things to different people. Some people may interpret this as still making the trip but using another mode, while others imagine it means staying at home. Both options will have a different utility being captured by the constant. Preferably you would avoid such ambiguity, which for example can be done by using two 'no choice options' for which you can estimate separate constants, one for 'different mode', and one for 'staying at home' or something like that. It really depends on your research question and your scenario. If your scenario is making a trip to work, then staying at home may not be an option. The interpretation of your demand elasticity depends on your definition of your 'no choice' option, so as long as you are clear to the respondents what it means, then you are at least consistent. If you tell the users that the 'no choice' may mean either another mode or staying at home, then I would expect a large variation in the constant and hence using a random constant may be useful.

2) The advantage of a random design is the large amount of variability in the data, but it will also contain many choice tasks that capture little information such that a much larger sample size is required. This is a simple but inefficient way of collecting data. More typical is to create a fixed subset of choice tasks (or create a fixed pivot design around the respondents' reference alternative) which consist of questions that capture a lot of information, and you indeed give them to all respondents. This is a smart and more efficient way of collecting data. Note that this fixed design can be of any size depending on how much variability you would like, and this depends on the number of parameters you are estimating. If you are only estimating 4 parameters (one for each attribute and a constant for the 'no choice') then your design does not need to be very large. It is no problem using just for example 12 choice tasks. But it is also no problem to use 24 or 36 and block the design such that you give a subset of 12 of the fixed design to each respondent. There is usually no need to use large fixed designs, large designs are only needed when estimating a very large number of parameters, for example if everything is dummy coded and all interactions between the dummy coded variables are included.

3) No matter what design type you use, it is important to avoid clearly dominant alternatives in your choice tasks. Ngene can create a random selection of profiles (just use ;rows = 1000 and ;fact to randomly select 1000 choice tasks) and also automatically remove dominant alternatives if you specify appropriate priors, for example by using ;alts = route1*, route2*, none, which can also be used to generate an efficient design.

In your case, each alternative has at maximum 5*5*3 = 75 possible profiles. That means that in total you will have 75*75 = 5625 possible choice tasks (the full factorial), of which many are problematic (dominant alternatives, identical alternatives). Therefore, the remaining candidate set is actually not that large, and I do not see a need to create 1000 different efficient designs.
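The counts above can be checked with a minimal sketch. The level values below are hypothetical (only the numbers of levels, 5, 5 and 3, come from the thread), and the dominance rule assumes that lower price, lower time and higher punctuality are always preferred. This is a plain enumeration, not what Ngene does internally.

```python
from itertools import product

# Hypothetical level values; only the counts (5, 5, 3) come from the thread.
PRICE = [10, 15, 20, 25, 30]   # assumed: lower is better
TIME = [30, 40, 50, 60, 70]    # assumed: lower is better
PUNCTUALITY = [80, 90, 95]     # assumed: higher is better


def weakly_better(a, b):
    """True if profile a is at least as good as profile b on every attribute."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]


def classify(a, b):
    """Label an ordered pair of profiles as identical, dominant or candidate."""
    if a == b:
        return "identical"
    if weakly_better(a, b) or weakly_better(b, a):
        return "dominant"
    return "candidate"


profiles = list(product(PRICE, TIME, PUNCTUALITY))  # full factorial per alternative
tasks = list(product(profiles, repeat=2))           # ordered pairs of profiles

counts = {"identical": 0, "dominant": 0, "candidate": 0}
for a, b in tasks:
    counts[classify(a, b)] += 1

print(len(profiles), len(tasks))
print(counts)
```

Under these assumptions, 75 of the 5625 ordered pairs are identical, 2550 contain a dominant alternative, and 3000 remain as candidates. Pairs are counted in both orders, in line with the 75*75 figure.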

Michiel

by **Benrath** » Tue Aug 22, 2017 11:15 pm

Thank you for the quick reply.

1) We intend to explain explicitly that not choosing Alternative 1 or 2 will mean using another alternative, of which no transport is one. We do not care about the other alternatives, both to avoid misspecifications and since we only care about the transport mode described in alternatives 1 and 2 (which is the same in different specifications). What would a random constant be?
I would have imagined that the constant for our neither would yield the same results as the sum of the constants for "alternative transport mode" and "no transport" if people would behave consistently.

2-3) OK, this sounds convincing. So we would rather start off with a D-efficient design, replace the dominant and implausible choices by hand, and give each respondent the same design. We should still think about the number of tasks needed, e.g. it could be 24 in 2 blocks. I think we will probably have more parameters to estimate.

From one of your presentations I saw that increasing the number of levels sometimes does not require a much larger sample. I guess this would especially be the case if I assumed one constant parameter for the attribute instead of dummies for each level? So I could even think about increasing the number of levels for "price" and "time" to six, as this would be more quantitative. We do not expect a big effect for punctuality (in x% punctual), so we might as well reduce this to 2 levels.

How much does the number of respondent specific parameters influence the minimum sample size or number of needed tasks?

by **Michiel Bliemer** » Tue Aug 22, 2017 11:45 pm

A random constant reflects an error component mixed logit model. There will be a large spread around what people think of "changing mode", as some people will think of bus and others of train or cycling, and for staying home some may have kids at home etc. I would expect a lot of heterogeneity in the constant, reflected in a distribution.
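To make the idea of a random constant concrete, here is a minimal simulation sketch. It assumes the 'neither' constant is normally distributed across respondents (the thread does not name a distribution), and all utility values are made up for illustration. Choice probabilities are approximated by averaging logit probabilities over draws of the constant.

```python
import math
import random


def mnl_probs(utilities):
    """Multinomial logit probabilities for a list of utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_probs(v1, v2, asc_mean, asc_sd, n_draws=2000, seed=1):
    """Average logit probabilities over draws of a normally distributed
    'neither' constant (an assumed distribution, for illustration only)."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(n_draws):
        asc = rng.gauss(asc_mean, asc_sd)  # one draw of the 'neither' constant
        probs = mnl_probs([v1, v2, asc])
        sums = [s + p for s, p in zip(sums, probs)]
    return [s / n_draws for s in sums]


# Hypothetical utilities for the two shown alternatives and the 'neither' option.
fixed = mnl_probs([-1.0, -1.2, -0.5])
mixed = simulated_probs(-1.0, -1.2, asc_mean=-0.5, asc_sd=1.5)
print(fixed)
print(mixed)
```

Setting `asc_sd` to 0 collapses the model back to a plain logit with a fixed constant, which makes it easy to compare the two specifications.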

Why not let the efficient design remove dominant alternatives automatically instead of manually?

The number of levels does not influence efficiency much if it is coded as a linear effect instead of dummy coding. Often fewer levels will be slightly more efficient; it is usually enough to have 2 to 4 levels. With 6 levels there are smaller trade-offs in the data that capture less information. But fewer levels also mean that dominant alternatives are more likely to occur, since you are only including 3 attributes.
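Both points can be illustrated with a small sketch. The first function counts attribute parameters under linear versus dummy coding. The second computes the share of ordered profile pairs that are identical or contain a dominant alternative, assuming every attribute has an unambiguous preference direction (an assumption made here for illustration). The function names are hypothetical.

```python
from math import prod


def n_parameters(levels, coding):
    """Attribute parameters, excluding constants: one per attribute if linear,
    (number of levels - 1) per attribute if dummy coded."""
    if coding == "linear":
        return len(levels)
    if coding == "dummy":
        return sum(n - 1 for n in levels)
    raise ValueError("coding must be 'linear' or 'dummy'")


def dominated_share(levels):
    """Share of ordered profile pairs that are identical or contain a dominant
    alternative, assuming each attribute has a clear preferred direction."""
    n_profiles = prod(levels)
    # Pairs (a, b) with a at least as good as b on every attribute.
    weakly_better = prod(n * (n + 1) // 2 for n in levels)
    # Both directions, minus identical pairs that were counted twice.
    return (2 * weakly_better - n_profiles) / n_profiles ** 2


for levels in ([5, 5, 3], [6, 6, 2], [2, 2, 2]):
    print(levels,
          n_parameters(levels, "linear"),
          n_parameters(levels, "dummy"),
          round(dominated_share(levels), 3))
```

With levels of 5, 5 and 3, linear coding needs 3 attribute parameters against 10 for dummy coding, and about 47% of ordered pairs are identical or dominated. With only 2 levels per attribute that share rises to about 72%.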

My rule of thumb is 50 respondents for each sociodemographic parameter that you include. This is not based on any scientific evidence though, since it is very much case specific. You may want to oversample some specific segments instead of using a representative sample.

Michiel

by **Benrath** » Thu Aug 31, 2017 1:06 am

My colleague would still be in favor of a random design in order to capture all different effects of price changes and other attribute changes. We concluded with a compromise to have a larger blocked design of 20 choices in 2 blocks. We will eliminate dominant choices and discuss implausible choices within our group. If most of us agree that a choice is so implausible that it does not represent any trade off, we will replace it with another choice or adjust it.
Would that be a sensible strategy?
Would you recommend more choices and/or blocks? We believe that respondents will be ready to answer 10-12 choices meaningfully.
I think that price will be the most important dimension for the decision of respondents. Should we then rather have a balanced design for the price levels or focus on deviations around the reference price?

by **Michiel Bliemer** » Thu Aug 31, 2017 9:39 am

I think 20 choice tasks over 2 blocks is fine, but to satisfy your colleague there is no problem in increasing the number of choice tasks to 30 or 50 or 100 and use blocks of 10 choice tasks each. This increases the variation in the design, but not all questions will make a lot of trade-offs (as with random designs, you may pick levels that overlap a lot).
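As a rough sketch of how blocking spreads responses, the snippet below splits 20 tasks into 2 blocks and assigns the 500 respondents mentioned earlier in the thread to blocks in turn. The random split is an assumption made for simplicity; design software would typically pick blocks that balance attribute levels instead.

```python
import random


def make_blocks(n_tasks, n_blocks, seed=0):
    """Randomly split task indices 0..n_tasks-1 into equally sized blocks."""
    if n_tasks % n_blocks != 0:
        raise ValueError("n_tasks must be divisible by n_blocks")
    rng = random.Random(seed)
    tasks = list(range(n_tasks))
    rng.shuffle(tasks)
    size = n_tasks // n_blocks
    return [sorted(tasks[i * size:(i + 1) * size]) for i in range(n_blocks)]


def assign_respondents(n_respondents, n_blocks):
    """Round-robin assignment so blocks receive near-equal respondent counts."""
    return [r % n_blocks for r in range(n_respondents)]


blocks = make_blocks(n_tasks=20, n_blocks=2)
assignment = assign_respondents(n_respondents=500, n_blocks=len(blocks))
per_block = [assignment.count(b) for b in range(len(blocks))]

print(blocks)
print(per_block)
```

With 2 blocks each task is answered by 250 of the 500 respondents. With 100 tasks in blocks of 10, the same call gives 50 respondents per task, which shows the price paid for the extra variation.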

If price varies widely across the population, you may want to use levels pivoted around their reference price in order to make the choice task more familiar.
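A pivoted price attribute can be sketched as follows. The relative deviations of -20% to +20% are hypothetical, as is the function name; the thread only says that levels are pivoted around each respondent's reference price.

```python
# Hypothetical relative deviations around the reference price (5 levels).
DEVIATIONS = [-0.20, -0.10, 0.0, 0.10, 0.20]


def pivot_levels(reference_price, deviations=DEVIATIONS):
    """Price levels shown to one respondent, pivoted around their own price."""
    return [round(reference_price * (1 + d), 2) for d in deviations]


# Two respondents with different reference prices see different absolute
# prices, but the same relative design.
print(pivot_levels(10.0))
print(pivot_levels(25.0))
```

The design itself is then specified in relative terms, and the absolute prices are filled in per respondent when the survey is shown.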

I think using random choice tasks and manually removing or adjusting problematic choice tasks is an inefficient way of optimising information in your design, but then again, if you have enough budget for a large sample size it is not so much of an issue I suppose and any design will do (but you do not need Ngene for that). Since your colleague seems to have expertise in experimental design, perhaps it is best to follow his recommendations as I think it is not useful for you (and also not for me) to have me argue against his advice.

Michiel

Fixed vs random design