choice-metrics.com

by **jbas** » Thu Nov 05, 2020 11:53 pm

Hi all, this is my first post in this forum. Thanks for having me in this community, and congratulations on this wonderful software.

I have a couple of questions related to pivot designs and conditions that I tried (unsuccessfully) to solve by myself.
My design is actually pretty simple: it contains three alternatives (car, bike, walk) that have one attribute in common, travel time. I use the travel time by car as a reference for the bike and walk travel times, which seems to work. In addition, I add some conditions on another attribute that bike and walk have in common, which also seems to work. The code is as follows:

Code:

    Design
    ;alts = Car, Bike, Walk, None
    ;alg = swap(stop=total(3500 iterations))
    ;rows = 24
    ;block = 4
    ;eff=(mnl, d, mean)
    ;cond:
    if(Bike.lts_bike = 1, Walk.lts_walk = [1,2]),
    if(Bike.lts_bike = 2, Walk.lts_walk = [1,2,3]),
    if(Bike.lts_bike = 3, Walk.lts_walk = [2,3,4]),
    if(Bike.lts_bike = 4, Walk.lts_walk = [3,4])
    ;model:
    U(Car) = b1[(u,-0.025, -0.0080)] * tt_car.ref[4,6,8] + b2 * tc_car[0.3] + b3[(u,-0.0088, -0.0036)] * pc_car[0,1,3] /
    U(Bike) = b1 * tt_car.piv[0%,25%,50%] + b5 * lts_bike[1,2,3,4] /
    U(Walk) = b1 * tt_car.piv[100%,125%,150%] + b7 * lts_walk[1,2,3,4]
    $

However, I finally opted for a heterogeneous pivot design, i.e. three different designs for three respondent segments (in this case, people who make short, medium, and long trips). When I run it, I get the message "Error: An attribute, 'walk.lts_walk', specified in the ';cond' property could not be found." The code is as follows:

Code:

    Design
    ;alts(short) = Car, Bike, Walk, None
    ;alts(medium) = Car, Bike, Walk, None
    ;alts(long) = Car, Bike, Walk, None
    ;alg = swap(stop=total(3500 iterations))
    ;rows = 24
    ;block = 4
    ;eff= fish(mnl, d,mean)
    ;fisher(fish)= design1(short[0.33], medium[0.43],long[0.24])
    ;cond:
    if(Bike.lts_bike = 1, Walk.lts_walk = [1,2]),
    if(Bike.lts_bike = 2, Walk.lts_walk = [1,2,3]),
    if(Bike.lts_bike = 3, Walk.lts_walk = [2,3,4]),
    if(Bike.lts_bike = 4, Walk.lts_walk = [3,4])
    ;model(short):
    U(Car) = b1[(u,-0.025, -0.0080)] * tt_car.ref[6] + b2 * tc_car[0.3] + b3[(u,-0.0088, -0.0036)] * pc_car[0,1,3] /
    U(Bike) = b1 * tt_car.piv[0%,25%,50%] + b5 * lts_bike[1,2,3,4] /
    U(Walk) = b1 * tt_car.piv[100%,125%,150%] + b7 * lts_walk[1,2,3,4]
    ;model(medium):
    U(Car) = b1[(u,-0.025, -0.0080)] * tt_car.ref[14] + b2 * tc_car[1] + b3[(u,-0.0088, -0.0036)] * pc_car[0,1,3] /
    U(Bike) = b1 * tt_car.piv[25%,50%,75%] + b5 * lts_bike[1,2,3,4] /
    U(Walk) = b1 * tt_car.piv[200%,250%,300%] + b7 * lts_walk[1,2,3,4]
    ;model(long):
    U(Car) = b1[(u,-0.025, -0.0080)] * tt_car.ref[17] + b2 * tc_car[1.5] + b3[(u,-0.0088, -0.0036)] * pc_car[0,1,3] /
    U(Bike) = b1 * tt_car.piv[50%,65%,75%] + b5 * lts_bike[1,2,3,4] /
    U(Walk) = b1 * tt_car.piv[300%,400%,500%] + b7 * lts_walk[1,2,3,4]
    $

At this point, I have three questions about this design:
1. Why are the conditional clauses not working in the second case?
2. I've noticed that, although I define several levels for the car travel time in the first case, the design always assigns the lowest value (this also happened in the heterogeneous case, so I fixed the levels there). Is it not possible to define levels for the reference?
3. This is a more general and conceptual question. I wonder about possible problems of correlation in the data in this type of design. Obviously, all travel times will be almost perfectly correlated, since two of them are calculated from the third. Although the coefficients produced by the model will supposedly have the lowest possible standard errors, couldn't highly correlated data be an obstacle to the significance of the parameters?

Thanks for your help.

by **Michiel Bliemer** » Fri Nov 06, 2020 11:09 am

1. Conditional constraints do not work in conjunction with multiple designs using the fisher command, see page 214 of the manual. Algorithms that can handle such complexity currently do not exist. I suggest that you create a design for each of the model categories separately using separate syntax.

2. I am not sure I understand your question. A reference level is defined as a fixed level. If you require more reference levels, then you simply need to create multiple designs. I generally recommend creating a library of designs using different reference levels, and simply picking the appropriate design for each respondent based on their reference levels. Note that in this case you do not need to use .ref and .piv; you can use the actual attribute levels that you show to the respondents. This makes creating designs much easier.

3. No, in many cases correlations help in reducing standard errors and therefore yield more reliable parameter estimates. Only when correlations are very high (e.g. 0.95 or 0.99) may identifiability issues arise, and it may then no longer be possible to estimate the parameters. But as long as there is no perfect correlation, correlations are fine and will always occur (especially in revealed preference data).

Michiel

by **jbas** » Fri Nov 06, 2020 10:52 pm

Thanks for your response, Michiel, it's been of great help. Nevertheless, I still have a couple of considerations:

1. I understand your suggestion, but then I wonder whether three separate designs (one per category) will be 'equivalent' to a single design that contains the three categories. I mean, when the three submodels are computed in one design, all the attribute levels are considered together in order to maintain its properties. But if the three are generated separately, the result of one is not taken into account in the others. I hope I'm explaining myself.
2. I get your point. My question was: even knowing that it can be done with different designs as you suggest, can the reference take one of several values instead of a fixed one, with the pivoting attribute simply pivoting on it? For instance, instead of tt_car.ref[6], could it be tt_car.ref[4,6,8], still with tt_car.piv[0%,25%,50%]? I would find this very convenient for presenting choice tasks in which I don't want to always show the same travel time by car, but still want the bike and walk travel times linked to it.
3. This is an interesting discussion. In my opinion, in a pivot design the pivots and the reference (walk, bike, and car travel times in this case) will be almost perfectly correlated, since the former are calculated exactly from the latter. How can the correlation not exceed 0.95? Actually, if I check the correlations in the design output, this is precisely the case, as shown below:

Code:

    Attribute      car.tt_car  car.tc_car  car.pc_car  bike.tt_car  bike.lts_bike  walk.tt_car  walk.lts_walk  Block
    car.tt_car     1           1           0.730297    0.986928     0.912871       0.99591      0.923186       0.912871
    car.tc_car     1           1           0.730297    0.986928     0.912871       0.99591      0.923186       0.912871
    car.pc_car     0.730297    0.730297    1           0.702731     0.675          0.714683     0.716337       0.666667
    bike.tt_car    0.986928    0.986928    0.702731    1            0.882919       0.984711     0.917192       0.900937
    bike.lts_bike  0.912871    0.912871    0.675       0.882919     1              0.897352     0.960735       0.833333
    walk.tt_car    0.99591     0.99591     0.714683    0.984711     0.897352       1            0.904087       0.909137
    walk.lts_walk  0.923186    0.923186    0.716337    0.917192     0.960735       0.904087     1              0.84275
    Block          0.912871    0.912871    0.666667    0.900937     0.833333       0.909137     0.84275        1

Thanks a lot for your time answering my questions and congratulations once again for this great software.

by **Michiel Bliemer** » Sat Nov 07, 2020 11:51 am

1. Yes, you will lose some efficiency, but this is likely not much. Either way, it is not possible to add conditional constraints when creating a heterogeneous design this way, so I am not sure what other option you have if you need constraints.

2. You cannot specify tt_car.ref[4,6,8] but you can do the following:

* specify tt_car_ref[4,6,8] (i.e., specifying it as a regular attribute)
* specify tt_car_piv[4,5,6,7.5,8,9,10,12] (i.e., as a regular attribute with pivots around the reference)
* add conditional constraints, e.g. if(tt_car_ref = 4,tt_car_piv=[4,5,6])

3. The whole point of experimental design is to make sure that attributes are NOT perfectly correlated. So with pivots, even though they are based on a reference alternative, there is still variation of 0-50% around the reference value, which is more than enough to estimate all coefficients. Note that in revealed preference data correlations are often about 80-90%, e.g. travel time and travel cost are generally very highly correlated, but this still allows one to estimate coefficients for travel time and travel cost because there is sufficient variation.
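The workaround in point 2 can be sketched in Python as a way to derive the level list and the conditions rather than typing them by hand. This is only an illustration: the function name is hypothetical, the attribute names tt_car_ref and tt_car_piv are taken from the bullets above, and the generated strings simply follow the pattern of the if(...) example given there.

```python
def pivot_conditions(ref_levels, pivot_pcts,
                     ref_name="tt_car_ref", piv_name="tt_car_piv"):
    """Return (all pivot levels, list of condition strings).

    For each reference level, the allowed pivot levels are
    reference * (1 + pct/100) for each percentage in pivot_pcts.
    """
    def fmt(x):
        # Print 4.0 as "4" and 7.5 as "7.5"
        return f"{x:g}"

    all_levels = set()
    conditions = []
    for ref in ref_levels:
        allowed = [ref * (1 + pct / 100) for pct in pivot_pcts]
        all_levels.update(allowed)
        allowed_text = ",".join(fmt(v) for v in allowed)
        conditions.append(
            f"if({ref_name} = {fmt(ref)}, {piv_name} = [{allowed_text}])"
        )
    return sorted(all_levels), conditions


levels, conditions = pivot_conditions([4, 6, 8], [0, 25, 50])
print(levels)      # the union of pivot levels over all reference values
for line in conditions:
    print(line)
```

With reference levels 4, 6, 8 and pivots of 0%, 25%, 50%, the union of levels is 4, 5, 6, 7.5, 8, 9, 10, 12, which matches the list in the second bullet.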

Michiel

by **jbas** » Mon Nov 09, 2020 9:18 pm

Thanks for your advice on points 1 and 2. Regarding 3, not that I want to insist, but the following design, with three values for the reference and high variations for the pivots, provides correlations (H index) among them of 0.98 and 0.99. Couldn’t this be really a problem?

Code:

    Design
    ;alts = Car, Bike, Walk, None
    ;alg = swap(stop=total(3500 iterations))
    ;rows = 24
    ;block = 4
    ;eff=(mnl, d,mean)
    ;cond:
    if(Bike.lts_bike = 1, Walk.lts_walk = [1,2]),
    if(Bike.lts_bike = 2, Walk.lts_walk = [1,2,3]),
    if(Bike.lts_bike = 3, Walk.lts_walk = [2,3,4]),
    if(Bike.lts_bike = 4, Walk.lts_walk = [3,4])
    ;model:
    U(Car) = b1[(u,-0.025, -0.0080)] * tt_car.ref[4,6,8] + b2 * tc_car[0.3] + b3[(u,-0.0088, -0.0036)] * pc_car[0,1,3] /
    U(Bike) = b1 * tt_car.piv[0%,25%,50%] + b5 * lts_bike[1,2,3,4] /
    U(Walk) = b1 * tt_car.piv[100%,125%,150%] + b7 * lts_walk[1,2,3,4]
    $

Code:

    Correlations (H Index)

    Attribute      car.tt_car  car.tc_car  car.pc_car  bike.tt_car  bike.lts_bike  walk.tt_car  walk.lts_walk  Block
    car.tt_car     1           1           0.730297    0.986928     0.912871       0.99591      0.923186       0.912871
    car.tc_car     1           1           0.730297    0.986928     0.912871       0.99591      0.923186       0.912871
    car.pc_car     0.730297    0.730297    1           0.716245     0.666667       0.72731      0.6742         0.666667
    bike.tt_car    0.986928    0.986928    0.716245    1            0.8649         0.978341     0.917192       0.900937
    bike.lts_bike  0.912871    0.912871    0.666667    0.8649       1              0.890618     0.960735       0.833333
    walk.tt_car    0.99591     0.99591     0.72731     0.978341     0.890618       1            0.888763       0.909137
    walk.lts_walk  0.923186    0.923186    0.6742      0.917192     0.960735       0.888763     1              0.84275
    Block          0.912871    0.912871    0.666667    0.900937     0.833333       0.909137     0.84275        1

I'll take advantage of this reply to also ask about the efficiency measures below. The S estimate and the Sp and Sb estimates look particularly high. Any insight on this?

Code:

    MNL efficiency measures

                                Bayesian
                  Fixed         Mean          Std dev.     Median        Minimum      Maximum
    D error       0.090726      0.090742      0.000211     0.090733      0.090393     0.091144
    A error       1.50337       1.503625      0.000571     1.50344       1.502847     1.505118
    B estimate    99.429913     99.376303     0.341497     99.425986     98.70528     99.858712
    S estimate    14284.367738  17182.704106  9088.735989  14178.886251  7143.189584  41285.791294

    Prior                b1          b2         b3            b5         b7
    Fixed prior value    -0.0165     0          -0.0062       0          0
    Sp estimates         176.267155  Undefined  14284.367738  Undefined  Undefined
    Sp t-ratios          0.147629    0          0.016399      0          0
    Sb mean estimates    236.638434  Undefined  17182.704106  Undefined  Undefined
    Sb mean t-ratios     0.147892    0          0.016441      0          0

Thanks again for your time.

by **Michiel Bliemer** » Tue Nov 10, 2020 1:56 pm

You should not compute correlations for attributes that have a fixed level, as that does not make sense; you should only compare correlations between two non-fixed variables. The highest Pearson product-moment correlation between two varying attributes is 0.85, which is fine. When correlations are too high, it becomes impossible to estimate the parameters and the D-error becomes very large, so you would immediately pick up on this.
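The point about fixed levels can be checked numerically. The sketch below is an arithmetic check in Python, not Ngene's documented definition of the H index: it assumes hypothetical, level-balanced columns over 24 rows (not the actual design rows) and observes that an uncentered, cosine-type similarity reproduces several entries of the posted table, including the value of 1 between the two fixed columns. The Pearson correlation, by contrast, is undefined when a column does not vary.

```python
import math


def cosine_similarity(x, y):
    """Uncentered similarity: high for any two all-positive columns."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def pearson(x, y):
    """Pearson correlation, or None if either column has no variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx < 1e-12 or syy < 1e-12:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical balanced columns, 24 rows each (assumption, not the real design)
tt_car_ref = [4] * 24          # reference fixed at its first level
tc_car = [0.3] * 24            # fixed cost level
pc_car = [0, 1, 3] * 8         # three levels, eight times each
lts_bike = [1, 2, 3, 4] * 6    # four levels, six times each

print(cosine_similarity(tt_car_ref, tc_car))    # approx. 1: both columns fixed
print(cosine_similarity(tt_car_ref, pc_car))    # approx. 0.7303
print(cosine_similarity(tt_car_ref, lts_bike))  # approx. 0.9129
print(pearson(tt_car_ref, pc_car))              # None: fixed column
print(pearson(pc_car, lts_bike))                # approx. 0 for these columns
print(cosine_similarity(pc_car, lts_bike))      # approx. 0.6667
```

In these made-up columns pc_car and lts_bike are orthogonal, yet the uncentered measure is still about 0.67, so values near 1 in such a table do not by themselves indicate an identification problem.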

Note that .ref[4,6,8] does not work, as I indicated in my previous message; .ref needs to have a single value, so it is now fixed to the first value (4).

S-estimates only make sense if you use reliable priors. Your priors are very small and essentially indicate that none of your attributes has a large impact on choice. Make sure that the data coding in your pilot study (from which I assume you obtained the priors) is consistent with the coding you use in Ngene. If the priors do not come from a sufficiently large pilot study, then the S-estimates should be ignored, as they become unreliable.
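To see why small priors inflate the S-estimates, it helps to look at the arithmetic. The posted Sp values are consistent with the common sample-size rule N = (1.96 / t)^2, where t is the t-ratio for a single replication of the design (|prior| divided by its standard error). The Python sketch below is a check on the posted numbers under that assumption and a 95% level; it is not Ngene's internal code, and the function name is hypothetical.

```python
def sample_size_for_significance(t_ratio, z=1.96):
    """Replications needed for a parameter to reach |t| >= z.

    Returns None when the t-ratio is zero (prior of zero), which is
    reported as 'Undefined' in the posted output.
    """
    if t_ratio == 0:
        return None
    return (z / t_ratio) ** 2


# Sp t-ratios copied from the posted efficiency measures
sp_t_ratios = {"b1": 0.147629, "b2": 0.0, "b3": 0.016399, "b5": 0.0, "b7": 0.0}

sp_estimates = {
    name: sample_size_for_significance(t) for name, t in sp_t_ratios.items()
}
# The overall estimate is driven by the hardest parameter to estimate
s_estimate = max(v for v in sp_estimates.values() if v is not None)

for name, value in sp_estimates.items():
    print(name, "Undefined" if value is None else round(value, 1))
print("S estimate:", round(s_estimate, 1))
```

Because the t-ratio scales with the size of the prior, a prior close to zero (here b3) gives a tiny t-ratio and therefore a very large required sample size, which then dominates the overall S estimate.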

Michiel

by **jbas** » Tue Nov 10, 2020 8:22 pm

Great, thanks for your insights. This conversation helped me to understand much better the issue of correlation.

I’d like to ask one last thing. Following the rationale that one can make a design based on a code scheme instead of based on the actual levels (where [-1,0,1] may mean $1, $2, $3), I coded the following design as a very initial approach to my project:

Code:

    Design
    ;alts = Car, Bike, Walk, None
    ;alg = swap(stop=total(3500 iterations))
    ;rows = 24
    ;block = 4
    ;eff=(mnl, d,mean)
    ;cond:
    if(Bike.lts_bike = 1, Walk.lts_walk = [1,2]),
    if(Bike.lts_bike = 2, Walk.lts_walk = [1,2,3]),
    if(Bike.lts_bike = 3, Walk.lts_walk = [2,3,4]),
    if(Bike.lts_bike = 4, Walk.lts_walk = [3,4])
    ;model:
    U(Car) = b1[(u,-0.025, -0.0080)] * tt_car[-1,0,1] + b2 * tc_car[-1,0,1] /
    U(Bike) = b3 * tt_bike[-1,0,1] + b5 * lts_bike[1,2,3,4] /
    U(Walk) = b6 * tt_walk[-1,0,1] + b8 * lts_walk[1,2,3,4]
    $

My intention was, once the output was generated, to substitute tt_car[-1,0,1] with 4, 6, 8, tt_bike[-1,0,1] with 0%, 25%, 50%, and so on. My question is (leaving aside that .ref needs a single value, which I didn't know at that moment): would that be equivalent to a pivot design in which tt_car.ref was set to 4 and tt_bike was given by tt_car.piv[0%,25%,50%]? To what extent would the efficiency and correlations be affected by the fact that we are working with a coding scheme that 'hides' a pivot design?

Thanks.

by **Michiel Bliemer** » Tue Nov 17, 2020 8:40 pm

Your priors must make sense on the coding scheme you use in estimation. So if your priors were estimated using $1, $2, and $3 then you need to use levels 1, 2, and 3 in your design as well.

So the best process would be:

1. Use the actual levels that you would use in your data set for estimation, i.e. 4,6,8 minutes and 1,2,3 dollars.
2. Generate a design using these actual levels.
3. Convert these levels to pivots for your survey instrument, i.e. assuming a reference level of 6 minutes and 2 dollars, travel time becomes [-33%, 0%, 33%] and travel cost becomes [-50%, 0%, 50%]. Or use absolute pivots.
4. Capture the reference levels for the respondent in your survey instrument, e.g. 8 minutes and 1 dollar.
5. Compute the pivot levels to show to this respondent, i.e. [5.36, 8, 10.64] minutes and [0.50, 1, 1.50] dollars.
6. Round the pivot values to reduce cognitive burden on respondent, so for travel time you may want to use [5, 8, 11] minutes.
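Steps 3 to 6 can be sketched in a few lines of Python. The function names are hypothetical, and the relative pivots are rounded to whole percentages to match the 33% used in step 3.

```python
def relative_pivots(design_levels, design_reference):
    """Step 3: express design levels as whole-percent deviations
    from the reference level assumed in the design."""
    return [
        round(100 * (level - design_reference) / design_reference)
        for level in design_levels
    ]


def apply_pivots(pivots_pct, respondent_reference):
    """Step 5: apply the percentage pivots to a respondent's own
    reference level (captured in step 4)."""
    return [
        round(respondent_reference * (1 + pct / 100), 2) for pct in pivots_pct
    ]


# Design generated with actual levels (steps 1 and 2)
time_pivots = relative_pivots([4, 6, 8], 6)   # [-33, 0, 33]
cost_pivots = relative_pivots([1, 2, 3], 2)   # [-50, 0, 50]

# Respondent reports 8 minutes and 1 dollar (step 4)
shown_time = apply_pivots(time_pivots, 8)     # [5.36, 8.0, 10.64]
shown_cost = apply_pivots(cost_pivots, 1)     # [0.5, 1.0, 1.5]

# Step 6: round travel times to whole minutes for presentation
rounded_time = [round(t) for t in shown_time]  # [5, 8, 11]

print(shown_time, shown_cost, rounded_time)
```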

Michiel

by **jbas** » Thu Nov 19, 2020 7:50 pm

Great. Thanks for your time, Michiel.

On pivot designs and conditions