choice-metrics.com

by **Andrew** » Tue May 10, 2016 2:08 am

Dear all,

We have a design with the following settings:
12 choice sets
8 blocks
2 unlabelled alternatives
6 attributes (2 attributes with 6 levels, 4 attributes with 3 levels)

We generated an efficient design with specified priors:

Code:
Design
;alts = alt1, alt2
;rows = 96
;block = 8
;eff = (mnl,d,mean)
;model:
U(alt1) = b1.effects[(n,0.8,0.1)|(n,0.01,0.02)|(n,0.01,0.02)|(n,0.01,0.02)|(n,0.01,0.02)] * att1 [0,1,2,3,4,5]
        + b2.effects[(n,0.2,0.1)|(n,0.01,0.02)] * att2 [0,1,2]
        + b3.effects[(n,0.6,0.1)|(n,0.01,0.02)|(n,0.01,0.02)|(n,0.01,0.02)|(n,0.01,0.02)] * att3 [0,1,2,3,4,5]
        + b4.effects[(n,0.1,0.1)|(n,0.01,0.02)] * att4 [0,1,2]
        + b5.effects[(n,0.1,0.1)|(n,0.01,0.02)] * att5 [0,1,2]
        + b6.effects[(n,0.1,0.1)|(n,0.01,0.02)] * att6 [0,1,2] /
U(alt2) = b1 * att1 + b2 * att2 + b3 * att3 + b4 * att4 + b5 * att5 + b6 * att6
$

On double-checking the design, we found unbalanced two-way frequencies, especially for one attribute level. The first level of attribute 3 (the preferred level) apparently appears less often together with the first level of attribute 1 (also a preferred level). Instead, it appears more often with the least preferred levels of attribute 1. An example choice set is shown below. The relevant combination of the first levels of attributes 1 and 3 is the pair of zeros in alternative 2. Compared with the other attribute-level combinations, this combination is very rare (and probably underrepresented) in the design.

Design
alt1.attr1 alt1.attr2 alt1.attr3 alt1.attr4 alt1.attr5 alt1.attr6
2 1 1 0 1 1
alt2.attr1 alt2.attr2 alt2.attr3 alt2.attr4 alt2.attr5 alt2.attr6
0 2 0 2 0 2
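For readers who want to check two-way frequencies themselves, here is a minimal Python sketch. It is not part of Ngene; the function name `two_way_frequencies` is made up for illustration, and the only data used are the two profiles of the example choice set above. With the full 96-row design one would pass all profiles instead.

```python
from collections import Counter

def two_way_frequencies(profiles, i, j):
    """Count how often each level of attribute i co-occurs with each level of attribute j."""
    return Counter((p[i], p[j]) for p in profiles)

# The two profiles of the example choice set (alt1 first, then alt2).
profiles = [
    (2, 1, 1, 0, 1, 1),
    (0, 2, 0, 2, 0, 2),
]

# Attribute 1 is index 0 and attribute 3 is index 2 (zero-based positions).
counts = two_way_frequencies(profiles, 0, 2)
for (att1, att3), n in sorted(counts.items()):
    print(f"att1={att1} att3={att3}: {n}")
```

Comparing the count for the pair (0, 0) with the counts of the other 35 combinations of attributes 1 and 3 shows how rare that combination is in the design.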

Using panel data, we estimated a simple conditional logit model and expected linearity in the effects-coded variables. We think that the patchy two-way frequencies in the design led to biased results. The coefficient of level 1 of attribute 3 is close to zero and breaks the linearity assumption for this 6-level attribute. The coefficients of the remaining attributes (1, 2, 4, 5, 6) show the linearity in levels that we assumed.

1. Is there a mistake in the design syntax which caused the unbalanced two-way frequencies in the design?

2. If so, is there a way to overcome the systematic design error in the estimation process?

Appreciate any help.

Kind regards,
Andrew

by **Michiel Bliemer** » Tue May 10, 2016 4:03 pm

1) There is no mistake; certain combinations simply provide more information than others. Efficient designs will never have balanced two-way frequencies. Designs that have balanced two-way frequencies are called orthogonal designs, which are only efficient if all priors are equal to zero.

2) This is not a design error, this actually leads to more (Fisher) information and therefore smaller standard errors. You can overcome this by looking for an orthogonal design (if one exists) using the ;orth = seq command.

Further, please note that you are looking for a Bayesian efficient design with 18 Bayesian priors. You are using the default number of Bayesian draws, which will not be nearly enough to get a stable result. With 18 Bayesian priors you will typically need at least 3^18 = 387,420,489 draws. We usually advise using no more than 10 Bayesian priors, with a large number of draws, and setting the rest as fixed priors, since otherwise results become unstable.
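The arithmetic behind that figure can be reproduced with a small Python sketch. It assumes the number comes from a full grid with three evaluation points per Bayesian prior; that reading, and the helper name `grid_draws`, are assumptions for illustration rather than a description of Ngene's internals.

```python
def grid_draws(points_per_prior: int, n_priors: int) -> int:
    """Size of a full grid with the same number of points in every dimension."""
    return points_per_prior ** n_priors

# Growth of the grid as more parameters get Bayesian (random) priors.
for n_priors in (2, 5, 10, 18):
    print(n_priors, grid_draws(3, n_priors))
```

The exponential growth is why limiting the number of Bayesian priors, and fixing the rest, keeps the evaluation tractable.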

by **Andrew** » Tue May 10, 2016 6:49 pm

Michiel,
many thanks for the fast response.
Indeed, we were aware that the two-way frequencies in the design wouldn't be perfectly balanced. But we were wondering about the strongly underrepresented specific level combinations between attr1 and attr3.

Thanks again,
Andrew

by **paulm** » Wed May 11, 2016 12:15 am

On a related topic, we ran a labeled design similar to this:
;alts = brandA, brandB, none
;model:
u(brandA) = b0[0] + b1.e[.75|.25|-.25]*priceA[1,2,3,4] + b2.e[-.1]*feature1[0,1]+b3.e[-.1]*feature2[0,1] +.../
u(brandB) = c0[0] + c1.e[.75|.25|-.25]*priceB[1,2,3,4] + b2*feature1 + b3*feature2 + ...
$

We then decided that we didn't want a labeled design but an unlabeled one where brandA could appear twice and brandB could appear twice. So we converted it:

;alts=alt1*, alt2*, none
model:
u(alt1) = b0[0] + b1.3[0]*brand[1,2] + p1.e[.75|.25|-.25]*brand.d[1]*priceA[1,2,3,4] + p2.e[.75|.25|-.25]*brand.d[2]*priceB[1,2,3,4] + b2.e[-.1]*feature1 + ..
u(alt2) = b0 + b1*brand + p1*...

Ngene can't design this model.

So we cheated a bit and removed the interaction.
model:
u(alt1) = b0[0] + b1.3[0]*brand[1,2] + p1.e[.75|.25|-.25]*price[1,2,3,4] + b2.e[-.1]*feature1 + ..
u(alt2) = b0 + b1*brand + p1*price + b2*feature1 + ...

And we noticed the following:
The brands (and all the features) are perfectly correlated across alternatives. That is, whenever alt1.brand = 1, alt2.brand = 2, and vice versa. So we might as well have a labeled model. In addition, the same happens for all the binary features: if alt1 has the feature, alt2 does not.

This doesn't seem right to me. What have we done wrong?

Paul

by **Michiel Bliemer** » Wed May 11, 2016 9:37 am

Hi Paul,

You have done nothing wrong, what you observe is a well-known outcome of optimal designs. You can only obtain (Fisher) information if there are trade-offs between attribute levels. If attributes have the same levels, then there is no trade-off and no information can be obtained for that coefficient. This is why optimal designs (as defined by Street and Burgess) require that all attribute levels are different across all alternatives, since this ensures minimum overlap (see also Huber and Zwerina) and maximum information. D-efficient designs as such relax this somewhat, so it may in certain circumstances be fine to keep attribute levels constant across alternatives for a limited number of choice tasks. But this means there is some loss of information and a large sample is required.
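A minimal Python sketch of why overlap yields no information, assuming a two-alternative MNL choice task with a single attribute `x` and coefficient `beta` (both hypothetical names). In that case the Fisher information about `beta` from one task is p1 * p2 * (x1 - x2)^2, which is a standard textbook result and not Ngene-specific code.

```python
import math

def binary_information(beta: float, x1: float, x2: float) -> float:
    """Fisher information about beta from one two-alternative MNL task
    with utilities beta*x1 and beta*x2: p1 * p2 * (x1 - x2)**2."""
    p1 = 1.0 / (1.0 + math.exp(-beta * (x1 - x2)))
    return p1 * (1.0 - p1) * (x1 - x2) ** 2

# Same level in both alternatives: no trade-off, so no information.
print(binary_information(-0.1, 1, 1))

# Different levels: the task is informative about beta.
print(binary_information(-0.1, 1, 0))
```

The factor p1 * p2 also shrinks as one alternative becomes dominant, so a large utility difference reduces the information even when the levels differ.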

Unlabelled alternatives do not mean that the levels of attributes are constant across alternatives. You can easily force some attributes to have the same level by using scenarios (e.g., priceB[priceA], see the Ngene manual) if you would like some attributes to have the same level (but then they are always the same). This is not very efficient though, and you will not capture much information here.

Note that BrandA, BrandB sounds like an unlabelled experiment to me, but maybe you have specific brand names in mind (e.g. Coca Cola, Pepsi Cola) to make it labelled.

Michiel

by **paulm** » Fri May 20, 2016 12:13 am

This brings up another question. What happens when you have some quite extreme priors? Say, for example, brandA = Coke and brandB = Pepsi. Some people will buy only Coke and others will buy only Pepsi. If the design always shows 1 Coke alternative and 1 Pepsi alternative, you learn nothing about the other preferences, for price or features. It has to be more efficient for a design to have some sets with 2 Coke offers and no Pepsi offer. The Pepsi people will choose None, but the Coke people will have a decision to make.

Would the same design result if the prior for Coke.dummy was [u,-6,6] or [(u,-6,6)]? Would the same thing happen?

Paul

by **Michiel Bliemer** » Fri May 20, 2016 9:22 am

Dear Paul,

These are all good questions, and the answer is: no, the same thing would not happen.

When a brand is dominant, you would rather not use it as an alternative, so typically you would put the brand name as an attribute in an unlabelled experiment like this:

Code:
design
;alts = alt1, alt2, none
;rows = 8
;eff = (mnl,d)
;model:
U(alt1) = brand[0.3] * brand[1,0] + price[-0.2] * price[1,2,3,4] + size.dummy[0.1|0.3] * size[1,2,0] /
U(alt2) = brand * brand[1,0] + price * price[1,2,3,4] + size * size[1,2,0] /
U(none) = b[-0.5]
$

This would generate a design like this:

alt1.brand alt1.price alt1.size alt2.brand alt2.price alt2.size
0 3 0 1 2 2
1 3 2 0 2 1
0 4 2 1 1 1
1 1 0 0 4 1
0 2 2 1 3 0
1 4 1 0 1 2
1 2 1 0 3 0
0 1 1 1 4 2

You can see that 'brand' for alt1 and alt2 are different each time.

Now suppose that 'brand' is a dominant attribute, so we change the prior from 0.3 to 3.0 in the following code:

Code:
design
;alts = alt1, alt2, none
;rows = 8
;eff = (mnl,d)
;model:
U(alt1) = brand[3.0] * brand[1,0] + price[-0.2] * price[1,2,3,4] + size.dummy[0.1|0.3] * size[1,2,0] /
U(alt2) = brand * brand[1,0] + price * price[1,2,3,4] + size * size[1,2,0] /
U(none) = b[-0.5]
$

The outcome now is very different, namely 'brand' is the same in most choice tasks (but not all, because we still require at least some differences in order to estimate the parameter for 'brand').

alt1.brand alt1.price alt1.size alt2.brand alt2.price alt2.size
0 3 2 0 2 1
1 4 2 1 1 1
1 1 0 1 4 2
0 3 0 0 1 2
0 2 1 1 3 0
1 4 1 0 2 2
1 1 2 1 4 1
0 2 1 0 3 0

So Ngene is 'smart' in that it maximises the information from the different choice tasks, taking into account the relative contribution to utility each attribute is giving.
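To explore this outside Ngene, the two designs listed above can be evaluated with a rough Python sketch of the MNL D-error. This is a textbook calculation, not Ngene's implementation, and it rests on assumptions: `size` is dummy coded with level 0 as the base (inferred from `size[1,2,0]` having two priors), the parameter order is brand, price, size level 1, size level 2, none constant, and the D-error is taken as det(inverse information)^(1/K) for one respondent. Ngene's reported values may be normalised differently.

```python
import numpy as np

def encode(brand, price, size):
    """Design-matrix row: brand, price, dummies for size levels 1 and 2, 'none' constant."""
    return [brand, price, float(size == 1), float(size == 2), 0.0]

NONE_ROW = [0.0, 0.0, 0.0, 0.0, 1.0]  # the 'none' alternative only has its constant

def d_error(design, beta):
    """D-error of an MNL design for fixed priors beta (assumed definition, see lead-in)."""
    beta = np.asarray(beta, dtype=float)
    info = np.zeros((beta.size, beta.size))
    for row in design:
        X = np.array([encode(*row[:3]), encode(*row[3:]), NONE_ROW], dtype=float)
        v = X @ beta
        p = np.exp(v - v.max())          # numerically stable softmax
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X   # Fisher information of one task
    return np.linalg.det(np.linalg.inv(info)) ** (1.0 / beta.size)

# Columns: alt1.brand, alt1.price, alt1.size, alt2.brand, alt2.price, alt2.size
design_a = [(0, 3, 0, 1, 2, 2), (1, 3, 2, 0, 2, 1), (0, 4, 2, 1, 1, 1), (1, 1, 0, 0, 4, 1),
            (0, 2, 2, 1, 3, 0), (1, 4, 1, 0, 1, 2), (1, 2, 1, 0, 3, 0), (0, 1, 1, 1, 4, 2)]
design_b = [(0, 3, 2, 0, 2, 1), (1, 4, 2, 1, 1, 1), (1, 1, 0, 1, 4, 2), (0, 3, 0, 0, 1, 2),
            (0, 2, 1, 1, 3, 0), (1, 4, 1, 0, 2, 2), (1, 1, 2, 1, 4, 1), (0, 2, 1, 0, 3, 0)]

prior_weak = [0.3, -0.2, 0.1, 0.3, -0.5]    # brand prior 0.3
prior_strong = [3.0, -0.2, 0.1, 0.3, -0.5]  # brand prior 3.0

for name, design in (("first design", design_a), ("second design", design_b)):
    overlap = sum(r[0] == r[3] for r in design)
    print(name, "| tasks with equal brand:", overlap,
          "| D-error (0.3):", d_error(design, prior_weak),
          "| D-error (3.0):", d_error(design, prior_strong))
```

Evaluating each design under both priors makes it possible to see how the ranking of the two designs depends on the assumed brand coefficient.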

I hope that this provides more insight for you.

Michiel

by **paulm** » Sat May 21, 2016 12:00 am

Thank you!

I think my problem was that my prior was not big enough. A bigger prior would solve my problem.

Paul

Efficient design with unbalanced two-way frequencies
