choice-metrics.com

by **VaishnaviP** » Wed Aug 07, 2024 11:44 pm

Dear all,
This is my basic code. Because the priors are not known, I started with (near) zero priors. For the same design, when both * and the mfederov algorithm were used, there was no attribute-level balance, but many different attribute level combinations were present. When the mfederov algorithm was not used (with or without * for the dominance check), attribute-level balance was present, but several attribute level combinations were missing (the same combinations were repeated in different blocks). The D-error with the algorithm was slightly higher than without it.
1. What exactly is the use of this algorithm?
2. What should I do to maximize the design efficiency without having dominant alternatives? Is there any way to detect dominant or identical alternatives apart from checking manually after the design is generated?
3. Is a design without different pairwise combinations of attribute levels (as observed when mfederov is not used) efficient?
4. Is there any problem if a design without attribute-level balance is used for surveying, or is balance a necessary criterion that should be fulfilled?

design
;alts = Alternative1*, Alternative2*
;rows = 24
;eff = (mnl,d)
;block = 6
;model:
U(Alternative1) = a.dummy[-0.002|-0.0012|-0.001] * A[0,1,2,3] +
b.dummy[-0.002|-0.0012|-0.001] * B[0,1,2,3] +
c.dummy[-0.002|-0.0015|-0.001] * C[0,1,2,3] +
d.dummy[-0.002|-0.001|-0.001] * D[0,1,2,3] +
e[0.003] * E[12,14,16] +
f[-0.003] * F[20,15,10,5] +
g[-0.003] * G[6,7,8,9]
/
U(Alternative2) = a * A + b * B + c * C + d * D + e * E + f * F + g * G
$
The dummy-coded variables relate to transportation facilities, and the continuous variables are service hours, waiting time and cost of travel. To show the order of the levels, I have used dummy coefficients in decreasing order relative to the last level (the highest improvement level).
5. Is the syntax correct?

Regards
P. Vaishnavi

by **Michiel Bliemer** » Thu Aug 08, 2024 9:40 am

The syntax looks right to me. What you observe is common: it results from the D-error criterion and the algorithm, and it can easily be addressed.

To avoid dominant alternatives, you need non-zero priors to indicate the preference order. So your near-zero priors are fine, although you could make them even closer to zero, e.g. 0.00001 for your numerical attributes, to clearly state that priors are essentially zero.

The modified Federov algorithm is a row-based algorithm that can be used if there are a lot of constraints imposed, including dominance constraints. I typically only use the mfederov algorithm if the default column-based swapping algorithm struggles to find designs. The modified Federov algorithm is unable to satisfy attribute level balance due to the nature of row-based algorithms, but in Ngene you can impose attribute level constraints to obtain reasonable attribute level balance. Alternatively, you can dummy code all attributes, including numerical attributes, which will also ensure that each level appears a similar number of times. Note that attribute level balance is not required for model estimation; it is merely a nice property of a design, as it covers the attribute level space more evenly.
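Attribute level balance can be checked by counting how often each level appears for each attribute. A minimal Python sketch, assuming the design has been exported as one row per alternative per choice task; the `level_counts` and `is_balanced` helpers and the toy rows are hypothetical and are not Ngene output:

```python
from collections import Counter


def level_counts(rows):
    """Count how often each level appears for every attribute."""
    counts = {}
    for row in rows:
        for attribute, level in row.items():
            counts.setdefault(attribute, Counter())[level] += 1
    return counts


def is_balanced(counter, levels):
    """Balanced means every level in `levels` appears equally often."""
    return len({counter[level] for level in levels}) == 1


# Toy rows (illustrative only): A uses all four levels, F only the extremes.
rows = [
    {"A": 0, "F": 5}, {"A": 1, "F": 20},
    {"A": 2, "F": 5}, {"A": 3, "F": 20},
]
counts = level_counts(rows)
print(dict(counts["A"]), is_balanced(counts["A"], [0, 1, 2, 3]))
print(dict(counts["F"]), is_balanced(counts["F"], [5, 10, 15, 20]))
```

Passing the full list of levels matters, because a level that never appears would otherwise go unnoticed.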

The fact that certain combinations of attribute levels appear more than others is the result of efficiency (D-error) in combination with numerical attributes and (near) zero priors. Combinations 5 & 20 and 10 & 15 are expected to appear most of the time for attribute F because 5 & 20 are the extreme levels, and comparisons across two extreme levels provide the most efficiency in model estimation (and thereby reduce the D-error). This mostly happens when using (near) zero priors. Note that this is not a concern at all; it actually increases the efficiency of your design. But I can understand that this may not be desirable in all situations, and the solution would be to also dummy code the numerical attributes, as this usually ensures that other attribute level combinations also appear. Once you have informative priors from a pilot study, you will likely no longer have to dummy code them to get more variation in attribute level combinations, as this mostly affects designs where uninformative priors are assumed.
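To illustrate why the extreme pairs win, here is a minimal Python sketch for a single numerical attribute in a two-alternative MNL model with a zero prior. It is a simplification and not Ngene's computation: the function name and the two toy designs are assumptions, and with only one parameter the D-error reduces to the inverse of the Fisher information.

```python
import math


def mnl_information(choice_sets, beta):
    """Fisher information for one generic coefficient in an MNL model."""
    info = 0.0
    for levels in choice_sets:
        exp_u = [math.exp(beta * x) for x in levels]
        total = sum(exp_u)
        probs = [e / total for e in exp_u]
        mean = sum(p * x for p, x in zip(probs, levels))
        # Probability-weighted variance of the attribute within the choice set
        info += sum(p * (x - mean) ** 2 for p, x in zip(probs, levels))
    return info


# Toy designs for waiting time F (levels 5, 10, 15, 20), four choice sets each.
extreme_pairs = [(5, 20), (20, 5), (10, 15), (15, 10)]
adjacent_pairs = [(5, 10), (10, 15), (15, 20), (20, 15)]

info_extreme = mnl_information(extreme_pairs, beta=0.0)
info_adjacent = mnl_information(adjacent_pairs, beta=0.0)
print(info_extreme, 1 / info_extreme)    # more information, lower D-error
print(info_adjacent, 1 / info_adjacent)  # less information, higher D-error
```

With a zero prior both alternatives have probability 0.5, so the information of a choice set grows with the squared difference between the two levels, which is why pairs such as 5 & 20 are favoured.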

So try dummy coding your attributes E, F, and G and see if that resolves your concerns.

Michiel

by **VaishnaviP** » Fri Aug 09, 2024 2:23 am

Dear Michiel,
Thank you so much for your valuable suggestions. I have a few more doubts regarding the design of the unlabelled experiment for my study.
When I dummy-coded all 7 variables, the D-error increased from 0.19 (when the numerical attributes were not dummy-coded) to 0.40. To reduce the error, I increased the number of choice sets to 40 (blocks = 10), which produced a D-error of 0.23.
1. For my study, I am trying to keep the D-error below 0.20. What would be the cut-off D-error for an efficient design? Is 0.20 ok?
2. When the mfederov algorithm was used, different attribute combinations were produced without attribute-level balance (as previously discussed), with a D-error of around 0.20 for the same 24 choice sets (6 blocks). So, is it better to use this algorithm (when only the dominance check using * is imposed) instead of dummy-coding all the variables, which increases the number of choice sets and in turn the sample size and the effort required to prepare a questionnaire with many blocks?
3. Or can the same syntax (shown in the previous message), which produced extreme attribute level combinations, be used, and is the design still efficient (without all possible combinations being present) for getting the maximum possible information?
4. When * is used for the dominance check, can the created choice sets still contain dominant or identical alternatives? Apart from manually checking the design for dominant or identical alternatives after it is created, is there any method in Ngene to do this, as manual checking becomes impractical for a larger number of choice sets?
5. Is using * the only way in Ngene to check for dominance in an unlabelled experiment?
6. Since my study is unlabelled, is it better to use fixed near-zero priors or others, such as Bayesian priors, to get an efficient design? If Bayesian priors should be used, how should I choose the mean value and distribution?
Sorry for asking so many questions at once, but I really want to know the requirements for an efficient D-optimal design with unlabelled alternatives.
Regards
P. Vaishnavi

by **Michiel Bliemer** » Fri Aug 09, 2024 8:40 am

1. The D-error will always change if you change the model. You should not compare D-errors across different model specifications. A D-error of 0.40 is fine. There is no cut-off value for D-errors since these are case specific. Of course the D-error will decrease when you increase the number of rows, and that is fine to do, but you also have to increase the number of blocks and the efficiency per block will likely not change much.

2 & 3. There is usually not much effort required with a questionnaire with more blocks, so it is really your own choice. There are no bad choices here. Just check which design you prefer based on attribute level balance, attribute level combinations, and efficiency.

4. No, all dominant alternatives are avoided if you specified your priors correctly. You can check this quite simply in Excel by testing whether one alternative is better than (or equal to) the other alternative on every attribute. This can be done with simple IF statements.
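The same check can be scripted instead of done in Excel. A minimal Python sketch, assuming each alternative is a mapping from attribute to level and that the preference order within each attribute is known; the helper names and the `worth` table are hypothetical and not part of Ngene:

```python
def dominates(alt_a, alt_b, worth):
    """True if alt_a is at least as good as alt_b on every attribute."""
    return all(worth[k][alt_a[k]] >= worth[k][alt_b[k]] for k in worth)


def flag_choice_set(alt1, alt2, worth):
    """Label a two-alternative choice set as identical, dominated, or ok."""
    if alt1 == alt2:
        return "identical"
    if dominates(alt1, alt2, worth):
        return "alt1 dominates"
    if dominates(alt2, alt1, worth):
        return "alt2 dominates"
    return "ok"


# Assumed preference order: lower waiting time (F) and lower fare (G) are better.
# Only the ordering of the numbers matters, not their magnitude.
worth = {
    "F": {5: 3, 10: 2, 15: 1, 20: 0},
    "G": {6: 3, 7: 2, 8: 1, 9: 0},
}

print(flag_choice_set({"F": 5, "G": 6}, {"F": 20, "G": 9}, worth))  # alt1 dominates
print(flag_choice_set({"F": 5, "G": 9}, {"F": 20, "G": 6}, worth))  # ok
```

Running `flag_choice_set` over every row of an exported design gives a quick list of choice sets that need attention.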

5. You could also impose a lot of reject constraints yourself in the syntax, but that is quite cumbersome. The * is the best way to do it.

6. I would only use Bayesian priors after a pilot study that provides informative priors; I would typically not use Bayesian uninformative (near-zero) priors. The mean and standard deviation then come directly from the parameter estimates of the pilot study.

Michiel

by **VaishnaviP** » Sat Aug 10, 2024 4:42 am

Dear Michiel,
Thank you so much for your valuable suggestions.

When I increased the prior coefficients of the numerical variables from 0.001 to 0.01, the pairwise combinations for the 2 unlabelled alternatives remained the same (i.e., only the first-last and second-third levels appeared as pairs). But when the priors were changed further from 0.01 to 0.1, significant changes were observed for waiting time (levels 20, 15, 10, 5): different combinations appeared, not only the last-first (5, 20) and second-third (10, 15) pairs as before. For service hours (levels 12, 14, 16), the combinations remained the same, with only 12 & 16 (and vice versa) and 14 & 14 as pairs. For fare (levels 6, 7, 8, 9), the combinations included only the 6 & 9 and 7 & 8 pairs (and vice versa), as before.
1. What could be the reason for this change in only one numerical variable and not the others? Does the range of values affect the pairwise combinations as well?
2. Is increasing the prior coefficients to get different level combinations for the numerical variables advisable?

Regards
P.Vaishnavi

by **Michiel Bliemer** » Sat Aug 10, 2024 9:44 am

This is the reason why I recommend using dummy coding when priors are (near) zero, as otherwise only a small subset of attribute level combinations appears. But note that having only a small number of combinations is not problematic; it is actually more efficient from a statistical point of view and would not affect your model estimates. It would only be problematic if you afterwards want to add interaction effects that you did not account for in your design phase. When using informative priors from a pilot study, more attribute level combinations are generally used when optimising for efficiency because there are certain optimal choice probabilities (which cannot be achieved with zero priors). I refer to the work of Kanninen and Johnson et al. on optimal choice probabilities, also referred to as magic probabilities, e.g. in your case with 7 attributes the optimal choice probabilities for each choice task are 68% versus 32%. Important attributes with a large range can be given levels to achieve these optimal choice probabilities, whereas attributes that are less important and have a narrow range are less capable of doing so. Larger priors in conjunction with large attribute levels (and range) lead to the effect that you observe. Again, you do not need to worry too much about certain attribute level combinations not appearing; this is quite common in many experimental designs, including optimal orthogonal designs (see the work of Louviere, Street, and Burgess).
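The interaction between prior size and attribute range can be seen in a small Python sketch. It assumes a two-alternative MNL model in which only one attribute differs between the alternatives, and it uses the priors 0.001, 0.01 and 0.1 mentioned earlier in this thread with the signs from the first post (positive for E, negative for F and G). `prob_first` is a hypothetical helper, not an Ngene function.

```python
import math


def prob_first(beta, x1, x2):
    """MNL probability of choosing alternative 1 when only one attribute differs."""
    return 1.0 / (1.0 + math.exp(-beta * (x1 - x2)))


# (sign of prior, best level, worst level) for the extreme pair of each attribute
extreme_pairs = {
    "E service hours": (+1, 16, 12),  # range 4
    "F waiting time": (-1, 5, 20),    # range 15
    "G fare": (-1, 6, 9),             # range 3
}

for size in (0.001, 0.01, 0.1):
    for name, (sign, best, worst) in extreme_pairs.items():
        p = prob_first(sign * size, best, worst)
        print(f"prior {size:<5} {name:<16} P(best) = {p:.3f}")
```

Under these assumptions only waiting time, with its range of 15, moves clearly away from a 50/50 split at a prior of 0.1, while service hours and fare stay much closer to it.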

Manually increasing the prior coefficients is NOT recommended. You should use priors from a pilot study. So my workflow is typically:
* Generate efficient design using dummy coding for all attributes with uninformative (near) zero priors
* Conduct pilot study, estimate parameters
* Generate efficient design where numerical attributes are no longer dummy coded with Bayesian informative priors

You are asking all the right questions, so hopefully my answers give you a better understanding of what is going on.

Michiel

by **VaishnaviP** » Fri Sep 06, 2024 11:11 pm

Dear Michiel,
Thank you so much for your insightful suggestions.

As suggested, I tried dummy coding all 7 variables (using modified code 1 below). I have a doubt which might sound silly. To show that the utility of an alternative decreases with particular numerical variables such as f (waiting time) and g (fare), '-' is used before the priors (shown in the first message). But when dummy coding, do we represent the sign of the dummy-coded variables anywhere? The coefficients of the levels 0, 1, 2, 3 represent the order of the levels, but whether the attribute is negatively or positively related to the utility of the alternative is unknown. To try that, I introduced '-' signs (as in modified code 2) and ran the syntax. It displayed the warnings shown below the code, but it did run.

Modified code1:
design
;alts = Alternative1*, Alternative2*
;rows = 48
;eff = (mnl,d)
;block = 8
;model:
U(Alternative1) = a.dummy[-0.0025|-0.0015|-0.001] * A[0,1,2,3] +
b.dummy[-0.0025|-0.0015|-0.001] * B[0,1,2,3] +
c.dummy[-0.003|-0.002|-0.001] * C[0,1,2,3] +
d.dummy[-0.002|-0.001|-0.001] * D[0,1,2,3] +
e.dummy[-0.002|-0.001] * E[0,1,2] +
f.dummy[-0.003|-0.002|-0.001] * F[0,1,2,3] +
g.dummy[-0.003|-0.002|-0.001] * G[0,1,2,3]
/
U(Alternative2) = a * A + b * B + c * C + d * D + e * E + f * F + g * G
$

Modified code2:
design
;alts = Alternative1*, Alternative2*
;rows = 48
;eff = (mnl,d)
;block = 8
;model:
U(Alternative1) = a.dummy[-0.0025|-0.0015|-0.001] * A[0,1,2,3] +
b.dummy[-0.0025|-0.0015|-0.001] * B[0,1,2,3] +
c.dummy[-0.003|-0.002|-0.001] * C[0,1,2,3] +
d.dummy[-0.002|-0.001|-0.001] * D[0,1,2,3] +
e.dummy[-0.002|-0.001] * E[0,1,2] +
-f.dummy[-0.003|-0.002|-0.001] * F[0,1,2,3] +
-g.dummy[-0.003|-0.002|-0.001] * G[0,1,2,3]
/
U(Alternative2) = a * A + b * B + c * C + d * D + e * E + f * F + g * G
$
Warning: Defaulting to prior values of zero for the following priors: 'f, g'
Warning: If Ngene generates designs, you can ignore the following warning. The model specification has dominance checks, with some (but not all) priors set to zero. As a result, in some cases, it may be difficult for Ngene to find a design that avoids dominance. If Ngene is not generating designs, then try and increase the number of non-zero priors. If there is a clear ordering of the attribute levels in terms of preference, but you do not know the magnitude of the priors, set a positive or negative prior very close to zero.
Warning: Two alternatives were specified for alternative dominance checking, but do not have the same priors, and so will not be checked. 'alternative1', 'alternative2'
Note: Defaulting to assigning blocks with the 'minsum' method.

Which of the above 2 codes should be used for my study? The second code produces several warnings, and I do not understand the last one, which says that the 2 alternatives have different priors.

by **Michiel Bliemer** » Sat Sep 07, 2024 10:55 am

You will need to correctly rewrite the numerical attribute into a categorical attribute. So you can replace:

+ e[0.003] * E[12,14,16]
+ f[-0.003] * F[20,15,10,5]
+ g[-0.003] * G[6,7,8,9]

with:

+ e.dummy[0.006|0.012] * E[14,16,12] ? base=12
+ f.dummy[-0.015|-0.030|-0.045] * F[10,15,20,5] ? base=5
+ g.dummy[-0.003|-0.006|-0.009] * G[7,8,9,6] ? base=6

or with:

+ e.dummy[0.012|0.006] * E[16,14,12] ? base=12
+ f.dummy[-0.045|-0.030|-0.015] * F[20,15,10,5] ? base=5
+ g.dummy[0.009|0.006|0.003] * G[6,7,8,9] ? base=9

The last level is always the base level in Ngene. The ordering of the priors in the dummy coding indicates the preference orders of the levels.

Michiel

by **VaishnaviP** » Tue Sep 10, 2024 3:25 am

Dear Michiel,
Thank you so much for your prompt replies. As per your suggestions, I am correcting my code as below:

design
;alts = Alternative1*, Alternative2*
;rows = 48
;eff = (mnl,d)
;block = 8
;model:
U(Alternative1) = a.dummy[-0.002|-0.0012|-0.001] * A[0,1,2,3] +
b.dummy[-0.002|-0.0012|-0.001] * B[0,1,2,3] +
c.dummy[-0.003|-0.002|-0.001] * C[0,1,2,3] +
d.dummy[-0.002|-0.001|-0.001] * D[0,1,2,3] +
e.dummy[-0.004|-0.002] * E[12,14,16] +
f.dummy[-0.015|-0.010|-0.005] * F[20,15,10,5] +
g.dummy[0.003|0.002|0.001] * G[6,7,8,9]
/
U(Alternative2) = a * A + b * B + c * C + d * D + e * E + f * F + g * G
$

Now, I have considered numerical variables as below to make priors even closer to zero:
+ e[0.001] * E[12,14,16]
+ f[-0.001] * F[20,15,10,5]
+ g[-0.001] * G[6,7,8,9]

So, the dummy priors become as shown in the modified code. I have used '+' signs for the priors of variable 'g' to represent the negative utility, i.e., as the fare increases from 6 to 9, the utility of the alternative decreases. The same applies to 'f', which also has a negative relationship. I assume that the prior coefficients of each variable are compared with the base value of zero only to get the preference order, and that the values of these coefficients are not comparable across attributes.

1. For example, do we treat the value -0.003 of variable 'c' at level 0 as less than the value -0.002 of variable 'a' at the same level? I am asking whether differences in the priors across attributes cause an issue in creating non-dominant alternatives, since we consider the utilities of all attributes to be the same (as we don't know the exact utilities).
2. As per my understanding, since the gap between the levels of numerical variable 'e' is 2, each level in the dummy coding is separated by 2 * 0.001. Can you explain the conversion procedure, i.e., how and why we do it?

Can I proceed with this modified code?

Regards,
Vaishnavi

by **Michiel Bliemer** » Tue Sep 10, 2024 9:57 am

Your questions all relate to how dummy coding works, so I suggest you become familiar with it. There are many resources online, as it is widely used, including in linear regression.

In dummy coding you choose the base level for each attribute. The coefficients of the other levels are relative to the base level. You only make such comparisons within an attribute, not across attributes. You are correct that all utilities are added together to calculate probabilities, but for dominance checks only comparisons within each attribute are relevant.

I think that you did the dummy coding for E, F and G correctly.

For attribute E, the utility associated with level 12 is 0.001 * 12 = 0.012 and for level 14 is 0.001 * 14 = 0.014. This means that level 14 has 0.002 more utility than level 12, and level 16 has 0.002 more utility than level 14. If you normalise level 16 as the base, which means setting its utility to 0, then level 12 has 0.004 LESS utility and level 14 has 0.002 LESS utility than level 16, hence the priors -0.004|-0.002 relative to the base.
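The same arithmetic can be written as a small Python sketch that converts a linear prior into dummy-coded priors, taking the last level as the base as Ngene does. The `dummy_priors` function is a hypothetical helper for illustration, not part of Ngene, and the rounding is only there to hide floating-point noise:

```python
def dummy_priors(beta, levels):
    """Priors for all levels except the last, relative to the last (base) level."""
    base = levels[-1]
    return [round(beta * (level - base), 6) for level in levels[:-1]]


# Linear priors of 0.001 (E) and -0.001 (F, G), as in the corrected code above.
print(dummy_priors(0.001, [12, 14, 16]))       # [-0.004, -0.002]
print(dummy_priors(-0.001, [20, 15, 10, 5]))   # [-0.015, -0.01, -0.005]
print(dummy_priors(-0.001, [6, 7, 8, 9]))      # [0.003, 0.002, 0.001]
```

The sign of each dummy prior follows from both the sign of the linear prior and the choice of base level, which is why G gets positive priors when the highest fare is the base.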

Michiel

Difference in level balance when mfederov algorithm is used
