choice-metrics.com

by **LaureK** » Wed Mar 18, 2020 3:59 am

Dear all,

Many thanks for providing the possibility to post questions on this forum.

I am currently preparing a Dz-efficient design for the pilot study of a labelled choice experiment and will then generate a Bayesian Efficient design for the actual survey based on the priors obtained in the pilot.

Each choice card will present 3 labelled alternatives + an opt-out option.
I have the following attributes:
- 4 alternative specific attributes that appear in 2 alternatives (attributes A, C, D and E)
- 1 generic attribute that appears in all 3 alternatives (attribute F = the cost attribute)
- 1 alternative specific attribute that appears in only 1 of the alternatives (attribute B in alternative 2)

I am also interested in the interactions between attribute A and attributes C, D and E.
My NGene syntax (for the pilot study) so far is:

Code:

    Design
    ;alts = alt1, alt2, alt3, alt4
    ;rows = 36
    ;block = 6
    ;eff = (mnl,d)
    ;con
    ;cond:
    if(alt2.C = 0 and alt2.D = 0, alt2.E > 0),
    if(alt3.C = 0 and alt3.D = 0, alt3.E > 0),
    if(alt2.A >= alt1.A, alt2.F > alt1.F)
    ;model:
    U(alt1) = b0 + b1.dummy[0|0]*A[3,2,1] + b2*F[1,2,3,4,5,6] /
    U(alt2) = b3 + b1*A[3,2,1] + b4*B[1,2,3,4,5,6]
              + b5.dummy[0|0]*C[2,1,0] + b6.dummy[0|0]*D[2,1,0] + b7.dummy[0|0]*E[2,1,0]
              + b2*F[1,2,3,4,5,6]
              + b9*A.dummy[3]*C.dummy[2] + b10*A.dummy[3]*C.dummy[1]
              + b11*A.dummy[2]*C.dummy[2] + b20*A.dummy[2]*C.dummy[1]
              + b12*A.dummy[3]*D.dummy[2] + b13*A.dummy[3]*D.dummy[1]
              + b14*A.dummy[2]*D.dummy[2] + b15*A.dummy[2]*D.dummy[1]
              + b16*A.dummy[3]*E.dummy[2] + b17*A.dummy[3]*E.dummy[1]
              + b18*A.dummy[2]*E.dummy[2] + b19*A.dummy[2]*E.dummy[1] /
    U(alt3) = b8 + b5*C[2,1,0] + b6*D[2,1,0] + b7*E[2,1,0] + b2*F[1,2,3,4,5,6]
    $

My questions are the following:
1- I am worried that the alternative specific constant for alternative 2, which will capture the effect of the label, might be correlated with attribute B since it is only defined in alt2, so that I would not be able to distinguish the effect of the label from the effect of attribute B. Can this be an issue?
2- If I use the condition if(alt2.A >= Alt1.A, Alt2.F > Alt1.F), is there a risk that I introduce a correlation between attribute A and attribute F?
3- I am unsure about what the size of the design should be in the presence of alternative specific attributes. Can I use a smaller number of rows (e.g. 18 in 3 blocks of 6)?
4- Initially the number of invalid designs is low but after some time (30 minutes), it reaches about 10,000 (of 75,000 evaluations). Is this a sign that something is wrong with the design?

Many thanks for your help,
Laure

by **Michiel Bliemer** » Wed Mar 18, 2020 1:16 pm

1. If Ngene produces a finite D-error (i.e., not Inf or NaN), then all parameters are identifiable and the model can be estimated. The constant would be perfectly correlated with B only if B were fixed at a single level. That is not the case here: B takes different levels across choice tasks, so there is no perfect correlation and no issue.

2. You introduce correlations through conditions, but as long as you do not introduce PERFECT correlation it is fine. You do not have to avoid correlations. An example of a perfect correlation would be requiring something like alt1.A = alt1.F. You do not have perfect correlations and you can estimate all parameters.

3. Yes, you can use 18 rows with 3 blocks; you can try it in Ngene. Ngene will tell you if the number of rows is not enough. If you try ;rows = 8 it will tell you that you need at least 9 rows. However, that is a minimum, and given that you are trying to estimate 25 parameters, it is always a good idea to use more rows. You could use 18, but I would probably feel more comfortable with 36 rows in 6 blocks (6 versions of the survey with 6 choice tasks each). There are no hard rules, but the more choice tasks you have in your design, the more variation in your data and the more parameters you will be able to estimate. Having said that, it is generally not necessary to use a very large number of rows.

4. No, this has more to do with the algorithms in Ngene. Each algorithm in Ngene uses some randomisation, and it may happen that certain combinations of attribute levels create perfect correlations in the data, so that some of the parameters cannot be estimated (in your case, this will likely happen for some of the interaction effects, since these are more sensitive to correlations). Such designs have an infinite D-error and are discarded (invalid designs), but the algorithm will continue searching. As long as valid designs are being generated, you can ignore the invalid ones.
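Answers 2 and 4 both turn on the difference between correlation and perfect correlation. A minimal Python sketch of a pairwise check is given here; the six-row design fragment is made up for illustration and the function names are hypothetical. It only looks at pairs of columns, so it is a rough screen rather than a substitute for Ngene's own test (a finite D-error).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length columns; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a column with a single fixed level has no correlation defined
    return sxy / sqrt(sxx * syy)


def perfectly_correlated_pairs(columns, tol=1e-9):
    """Return the pairs of column names whose |correlation| equals 1 within tol."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if r is not None and abs(abs(r) - 1.0) <= tol:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical design fragment (made-up levels, not taken from the thread).
design = {
    "alt1.A": [3, 2, 1, 3, 2, 1],
    "alt1.F": [3, 2, 1, 3, 2, 1],  # forced equal to alt1.A: perfect correlation
    "alt2.F": [4, 5, 2, 6, 3, 1],  # correlated with alt1.A, but not perfectly
}
print(perfectly_correlated_pairs(design))
```

In this fragment only the pair forced to be equal is flagged; the merely correlated column passes, which is the situation a condition such as if(alt2.A >= alt1.A, alt2.F > alt1.F) is expected to produce.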

Michiel

by **LaureK** » Wed Mar 18, 2020 8:22 pm

Dear Michiel,
Many thanks for your response, this is very helpful.
Kind regards,
Laure

by **jamalm** » Tue Aug 04, 2020 9:11 am

Dear all,

This is my first time using Ngene to design a DCE survey (SAS was my previous go-to software). Similar to Laure, and following the Ngene manual, I am developing a design for the pilot study of a labelled CE using priors from the literature (and some that are just guesses), with the ultimate aim of generating a Bayesian efficient design. As of now, I have 4 attributes with 4, 2, 4, and 40 levels. The attribute with 40 levels, TC, is a combination of three attributes that all vary together (e.g., down payment for an investment, monthly payment for the first 5 years, and monthly payment for the second 5 years). Each choice set has two alternatives and a status quo (opt-out) option. My current Ngene syntax for the pilot study is as follows:

Code:

    Design
    ;alts = Green, Yellow, Default
    ;rows = 80
    ;block = 20,minsum ? In total 80 choice sets are blocked in 20 blocks of 4 choice sets each, so every respondent answers 4 choice sets.
    ;eff = (mnl, d)
    ?;eff = (mnl, wtp(wtp1))
    ?;wtp = wtp1(*/b5)
    ;alg = mfederov(candidates=15000000)
    ;require:
    Green.W>1, Yellow.W>1, Green.S<2, Green.V>1, Yellow.S>1, Yellow.TC<37, Green.TC>36 and Green.TC<40
    ;model:
    U(Yellow) = b01[.4] + b2[.2]*S[1,2,3,4] + b3[-.4]*V[1,2] + b4[-.2]*W[1,2,3,4]
                + b5[-.1]*TC[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40] /
    U(Green) = b02 + b2*S + b3*V + b4*W + b5*TC
    $

I have some questions/concerns and I would highly appreciate your feedback:
1- Is it okay to have an attribute with 40 levels or should I break it down into its 3 perfectly correlated attributes in the design?
2- Are 80 rows and 20 blocks too many?
3- Once the design is done, would it be okay to present the TC attribute as three attributes to the respondents?
4- I ran the current syntax for an hour and selected the last iteration of the design. However, when I let Ngene run for several hours (>3*10^6 iterations), the majority (>30) of the TC attribute levels do not appear in the design. Why does Ngene do this, and how can I get around the problem?

Thank you,
Jamal

by **Michiel Bliemer** » Tue Aug 04, 2020 10:02 am

1. Including 3 separate attributes into a single attribute only makes sense in my opinion if this single attribute is treated as a categorical variable through dummy or effects coding. But that would mean estimating 39 parameters, and then I do not see the benefit of merging 3 attributes into a single attribute. I think separating the 3 attributes makes most sense, thereby estimating separate parameters for the three attributes. If not all combinations of the three attribute levels make sense, then you can impose constraints using ;require and/or ;reject.

2. The number of rows or blocks should not be an issue.

3. That would be fine, but only if you dummy or effects code TC, see comment 1.

4. Your candidate set is far too large. The modified Federov algorithm only reports back once it has gone through the whole candidate set. It takes very long for Ngene to try replacing each choice task with one of the 15 million choice tasks in the candidate set. Using a candidate set larger than 10,000 is generally not needed, so I would use candidates=10000 or a smaller value.

Note that instead of typing values [1, 2, ... , 40], you could also use the shortcut [1:40:1]. But as said above, I think you should separate the three attributes in the utility function.

Michiel

by **jamalm** » Wed Aug 05, 2020 6:38 am

Hi Michiel,

Thank you very much for your feedback.

1- Following your suggestion, I broke down the TC attribute into three separate attributes: DP with 5 levels, FOP with 14 levels, and SOP with 7 levels. I also deleted the W attribute. So, I now have 5 attributes: S (4 levels), V (2 levels), DP (5 levels), FOP (14 levels), and SOP (7 levels). The problem now is that ;require and/or ;reject without 'if' commands (;cond) will not yield all the plausible combinations (according to the manual, these constraints cannot be used together). That is why I had originally created a triple combination (the TC attribute) of those three attributes. Below is the list of combinations for those three attributes.

I am not sure how I can have Ngene only allow these combinations in the design. Below is my current syntax, which, as mentioned above, yields several implausible combinations:

Code:

    Design
    ;alts = Green, Yellow, Default
    ;rows = 36
    ;block = 6,minsum
    ;eff = (mnl, d)
    ;alg = mfederov(candidates=3900)
    ;require:
    Yellow.S>1, Yellow.fop<13, Yellow.sop<4, Green.fop>11 and Green.fop<14, Green.sop>4 and Green.sop<8
    ;model:
    U(Yellow) = b01[.4] + b1[.4]*S[1,2,3,4] + b2[-.3]*V[1,2] + b3[-.4]*DP[1:5:1] + b4[-.2]*FOP[1:14:1] + b5[-.3]*SOP[1:7:1] /
    U(Green) = b02 + b4*FOP + b5*SOP
    $

Any suggestions on how to change the syntax is much appreciated. If it's not possible, can I manually fix/change the combinations?

2. My understanding is that as long as the number of rows is greater than the number of parameters, the number of rows defined in the syntax shouldn't be an issue. Is that right? Or should I use ;rows=140, as it's divisible by all the levels?

3. That makes sense, thank you.

4. I had chosen such a large number for the candidate set because I kept getting the following error from Ngene: "Error: There were problems generating a fractional factorial of choice tasks. For the modified federov algorithm, increasing the number of candidates might assist." With the current syntax, only candidate set sizes below 3078 or above 15*10^6 work. The error that I get is, "Error: The modified Federov candidate set size of 3900 could not be achieved. The percentages of candidates that failed are: NaN% due dominance, NaN% due constraints, and NaN% due repeated alternatives. The candidate set size has been adjusted from 3900 to 3078."

Thanks for reminding me of the shortcut. It makes the syntax shorter and looks better.
Jamal

by **Michiel Bliemer** » Wed Aug 05, 2020 10:35 am

1. You can often rewrite if-constraints as require/reject constraints. In your case, you only have a very limited number of possible combinations for those three attributes, so I recommend creating your own external candidate set in Excel with all allowable choice tasks. This Excel spreadsheet needs to have the following headers and data:

Resp, Cset, Yellow.S, Yellow.V, Yellow.DP, Yellow.FOP, Yellow.SOP, Green.FOP, etc. (in the order of appearance in the syntax)

In the Resp column, put 1 for each row.
In the Cset column, put 1, 2, 3, etc.
In the other columns, put the attribute levels; each row should be an allowable choice task.

In total you have the following number of possible profiles:
Default: 1
Green: 4*2*4 = 32
Yellow: 4*2*24 = 192

Therefore, you will have 1*32*192 = 6,144 possible choice tasks in your candidate set.

Save the Excel file, e.g. candidateset.xlsx, and open the spreadsheet in Ngene (select Open and select the file). Then you simply use:
;alg = mfederov(candidates = candidateset.xlsx)
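
Instead of typing the candidate set by hand, the rows could be generated with a short script. The sketch below is only an illustration under several assumptions: the payment combinations are made-up placeholders (the actual list of allowed combinations is not reproduced in this thread), Green is assumed to carry the same five attributes as Yellow, the fixed Default alternative is given no columns, and the function name is hypothetical. It produces CSV text with the Resp and Cset columns described above, which would still need to be saved as the Excel file that Ngene opens.

```python
import csv
import io
from itertools import product


def build_candidate_rows(profiles_by_alt):
    """Cross every alternative's allowed profiles into full choice tasks.

    profiles_by_alt maps an alternative name to (attribute_names, profiles);
    each profile is a tuple of levels in the same order as attribute_names.
    """
    header = ["Resp", "Cset"]
    for alt, (attrs, _) in profiles_by_alt.items():
        header += [f"{alt}.{attr}" for attr in attrs]
    all_profiles = [profiles for _, profiles in profiles_by_alt.values()]
    rows = []
    for cset, task in enumerate(product(*all_profiles), start=1):
        levels = [level for profile in task for level in profile]
        rows.append([1, cset] + levels)  # Resp is 1 on every row
    return header, rows


# Placeholder (DP, FOP, SOP) combinations: made up for illustration only.
yellow_payment = [(1, 1, 1), (2, 3, 1), (3, 5, 2)]
green_payment = [(1, 12, 5), (1, 13, 6)]
s_levels, v_levels = [1, 2, 3, 4], [1, 2]

yellow = [(s, v) + pay for s, v, pay in product(s_levels, v_levels, yellow_payment)]
green = [(s, v) + pay for s, v, pay in product(s_levels, v_levels, green_payment)]

attrs = ["S", "V", "DP", "FOP", "SOP"]
header, rows = build_candidate_rows({
    "Yellow": (attrs, yellow),
    "Green": (attrs, green),
})

# Write to an in-memory buffer; swap in a real file handle to save it.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(len(rows), "candidate choice tasks")
```

With the placeholder lists this gives 24 x 16 = 384 rows; with the 192 Yellow and 32 Green profiles counted above, the same cross product would give the 6,144 choice tasks.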

2. It should hold that S(J-1)>=K, where S is the number of rows in the design, J is the number of alternatives, and K is the number of parameters. Ngene will tell you if you do not satisfy this equation. You can use 140 rows if you prefer attribute level balance, but this is not needed. Given that you are only estimating a small number of parameters, you can use a much smaller number if you like.
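
The rule S(J-1) >= K is easy to check by hand or with a few lines of Python. The helper names below are hypothetical. The first example uses the 25 parameters and four alternatives from the earlier design in this thread; the second assumes K = 7 for the 36-row syntax above (b01, b02, and b1 to b5).

```python
from math import ceil


def min_rows(num_parameters, num_alternatives):
    """Smallest number of rows S that satisfies S * (J - 1) >= K."""
    if num_alternatives < 2:
        raise ValueError("need at least two alternatives")
    return ceil(num_parameters / (num_alternatives - 1))


def enough_rows(rows, num_parameters, num_alternatives):
    """True if the design has enough rows to identify all K parameters."""
    return rows * (num_alternatives - 1) >= num_parameters


print(min_rows(25, 4))         # 9, the minimum Ngene reported for 25 parameters
print(enough_rows(8, 25, 4))   # False: 8 * 3 = 24 < 25
print(enough_rows(36, 7, 3))   # True: 36 * 2 = 72 >= 7
```

This is only the necessary minimum; as noted earlier in the thread, using more rows than the minimum is usually a good idea.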

Finally, I notice that you did not provide a utility function for Default, but you do have levels (DP,FOP,SOP)=(1,14,4). Only opt-out alternatives do not have a utility function, but in your case it sounds like Default is a status quo alternative, so you need to define its utility function. This can easily be done as:

U(Default) = ... + b3 * DP_default[1] + b4 * FOP_default[14] + b5 * SOP_default[4]

I hope this helps.

Michiel

by **jamalm** » Thu Aug 06, 2020 5:24 am

Michiel,

I really appreciate your thorough explanation. I followed your instruction: created a candidate set of all the possible combinations and used it in the design. I understand that attribute level balance for all the attributes won't exist as my design is not an orthogonal design. I have some follow-up questions:
1) Why don't some of the levels appear in the design? Is it due to the prior values and signs?
2) Why do some levels appear 3-4 times more than the others?
3) Will these two problems (?) cause any issues in the main design or analysis?

The current syntax is as follows:

Code:

    Design
    ;alts = Green*, Yellow*, Default
    ;rows = 24
    ;block = 6,minsum
    ;eff = (mnl, d)
    ;alg = mfederov(candidates=can_sets.xlsx)
    ;model:
    U(Yellow) = b01[.4] + b1[.4]*S[1,2,3,4] +  ? level 3 doesn't appear in the design
                b2[-.3]*V[1,2] + b3[-.4]*DP[1:5:1] + b4[-.2]*FOP[1:14:1] +  ? levels 2, 11, and 12 don't appear in the design
                b5[-.3]*SOP[1:7:1] /
    U(Green) = b02 + b1*S + b2*V + b3*DP + b4*FOP + b5*SOP /
    U(Default) = b3*DP_default[1] + b4*FOP_default[14] + b5*SOP_default[4]
    $

Thank you again, Michiel. Your answers are a big help.
-Jamal

by **Michiel Bliemer** » Thu Aug 06, 2020 10:18 am

1. You are optimising for utility functions with linearly coded attributes, so the most efficient choice tasks will be choice tasks with more extreme attribute levels, e.g. 1 and 14, since these provide more information due to larger trade-offs. Do you really need 14 levels for FOP?

2. See above.

3. Not if you estimate the utility functions as you have written them down. But if you moved to dummy/effects coding, you could no longer estimate your model, because the levels that are missing from the design would have no data.

There are two solutions. First, you could use dummy coding in your utility functions, e.g. b4.dummy[..|..|..|..etc] * FOP[1:14:1]. This will significantly increase the number of parameters that you are estimating, but you will see that all attribute levels appear in your design (otherwise the D-error would be infinite) and that the levels are quite balanced. You will likely want to increase the number of rows if you use dummy coding. Second, you could impose attribute level constraints, e.g. b4 * FOP[1,2,3,..,14](1-3,1-3,1-3,...,1-3), which means that each attribute level appears 1 to 3 times in the design.
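
To see how far a generated design is from such frequency bounds, the levels in a design column can simply be counted. A minimal Python sketch follows; the 24-row FOP column is invented for illustration (it is not the actual design from this thread) and the helper names are hypothetical.

```python
from collections import Counter


def level_counts(column, levels):
    """How often each level occurs in a design column, including zero counts."""
    counts = Counter(column)
    return {level: counts.get(level, 0) for level in levels}


def out_of_bounds(column, levels, lo, hi):
    """Levels whose frequency falls outside the range lo..hi."""
    return {
        level: n
        for level, n in level_counts(column, levels).items()
        if not lo <= n <= hi
    }


# Made-up 24-row FOP column that leans on the extreme levels 1 and 14,
# as a linearly coded efficient design tends to do.
fop = [1, 14, 1, 14, 1, 14, 1, 14, 3, 4, 5, 6,
       7, 8, 9, 10, 13, 1, 14, 1, 14, 3, 13, 6]

print(out_of_bounds(fop, range(1, 15), lo=1, hi=3))
```

For this invented column the check flags levels 1 and 14 as over-used and levels 2, 11 and 12 as missing, which is the kind of imbalance the (1-3,...,1-3) constraint is meant to rule out.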

Note that all of this is explained in the Ngene manual, see e.g. page 139.

Best wishes,
Michiel

by **jamalm** » Thu Aug 06, 2020 10:37 am

Thank you, Michiel. You have been a huge help.
-Jamal

Labelled CE with alternative specific attributes
