choice-metrics.com

by **Steven Guu** » Wed Oct 23, 2024 1:40 pm

Dear Professor,

I am currently designing a Discrete Choice Experiment (DCE) pilot study using Ngene. The design includes 18 choice tasks in total, split into 6 blocks, so each respondent will receive 3 choice cards. In this pilot study, I assume the utility function is based on continuous attributes, and I plan to use the results to establish priors for an efficient design.

I am designing the DCE by optimizing the D-error measure to find an efficient design for the Multinomial Logit (MNL) model. To start, I have opted for zero priors for all the attributes.

The experiment is unlabeled, with two alternatives (no status quo option) and 6 attributes. All attributes take the same levels, currently set at £0, £5, £10, £15, and £20. I would like both alternatives (Option A and Option B) to be constrained so that the sum of the attribute levels always adds up to a fixed amount (ideally £50). However, I am only able to run a design where the sum constraint varies between £45 and £50.

Here is the current code: (Plan A)

design
;alts = opt1*, opt2*
;eff = (mnl, d)
;alg = mfederov
;rows = 18
;block=6,minsum
;require:
opt1.x1 + opt1.x2 + opt1.x3 + opt1.x4 + opt1.x5 + opt1.x6 <= 50,
opt2.x1 + opt2.x2 + opt2.x3 + opt2.x4 + opt2.x5 + opt2.x6 <= 50,
opt1.x1 + opt1.x2 + opt1.x3 + opt1.x4 + opt1.x5 + opt1.x6 >= 45,
opt2.x1 + opt2.x2 + opt2.x3 + opt2.x4 + opt2.x5 + opt2.x6 >= 45

;model:
U(opt1) = b1[0] * x1[0,5,10,15,20]
+ b2[0] * x2[0,5,10,15,20]
+ b3[0] * x3[0,5,10,15,20]
+ b4[0] * x4[0,5,10,15,20]
+ b5[0] * x5[0,5,10,15,20]
+ b6[0] * x6[0,5,10,15,20]
/
U(opt2) = b1 * x1
+ b2 * x2
+ b3 * x3
+ b4 * x4
+ b5 * x5
+ b6 * x6
$
The D-error is currently 0.001625, and the A-error is 0.010341. However, I am encountering the warning: "One or more attributes will not have level balance with the number of rows specified: opt1.x1, opt1.x2, opt1.x3, opt1.x4, opt1.x5, opt1.x6, opt2.x1, opt2.x2, opt2.x3, opt2.x4, opt2.x5, opt2.x6."

To address this, and since dummy coding can be assumed for all attribute levels in a pilot study when using (near-)zero priors, I am also considering dummy coding with near-zero priors to increase design efficiency. This design will include 25 choice tasks in total, split into 5 blocks, so each respondent will receive 5 choice cards.

Below is my code (dummy-coding version, Plan B):

Design
;alts = opt1*,opt2*
;rows = 25
;block = 5,minsum
;eff = (mnl,d)
;alg = mfederov

;require:
opt1.x1 + opt1.x2 + opt1.x3 + opt1.x4 + opt1.x5 + opt1.x6 <= 50,
opt2.x1 + opt2.x2 + opt2.x3 + opt2.x4 + opt2.x5 + opt2.x6 <= 50,
opt1.x1 + opt1.x2 + opt1.x3 + opt1.x4 + opt1.x5 + opt1.x6 >= 45,
opt2.x1 + opt2.x2 + opt2.x3 + opt2.x4 + opt2.x5 + opt2.x6 >= 45

;model:
U(opt1) = b1.dummy[0.01|0.02|0.03|0.04] * x1[5,10,15,20,0]
+ b2.dummy[0.01|0.02|0.03|0.04] * x2[5,10,15,20,0]
+ b3.dummy[0.01|0.02|0.03|0.04] * x3[5,10,15,20,0]
+ b4.dummy[0.01|0.02|0.03|0.04] * x4[5,10,15,20,0]
+ b5.dummy[0.01|0.02|0.03|0.04] * x5[5,10,15,20,0]
+ b6.dummy[0.01|0.02|0.03|0.04] * x6[5,10,15,20,0]
/
U(opt2) = b0
+ b1 * x1
+ b2 * x2
+ b3 * x3
+ b4 * x4
+ b5 * x5
+ b6 * x6
$

However, the D-error is now 0.7901, and the A-error is 4.578.

Questions:

1. Is it possible to generate a design where the sum constraint is exactly £50? I am flexible if the sum constraint needs to vary slightly. If it is easier to fix the sum at £60, for example, that would be perfectly acceptable.

2. For the pilot study, which approach do you think is better: continuous attributes or dummy coding?

3. How could I resolve the attribute level imbalance issue? I attempted to solve this in Plan A by adding a constraint that ensures each level of the attributes appears at least once (1-4, 1-4, 1-4, 1-4, 1-4), but I could not obtain a result.

Thank you for your help.

Best regards,
Steve

by **Michiel Bliemer** » Thu Oct 24, 2024 7:42 am

Since you have 5 levels, you can only obtain attribute level balance if your number of rows is divisible by 5. 18 is not divisible by 5, so you could choose 20, for example. Note that attribute level balance is not a requirement; it is merely "nice to have".
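The divisibility rule above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `balanced_count` is hypothetical and is not part of Ngene.

```python
def balanced_count(rows: int, levels: int):
    """Return how often each level appears under perfect level balance,
    or None when the number of rows is not divisible by the number of levels."""
    if rows % levels != 0:
        return None
    return rows // levels


# Row counts that appear in this thread, each with 5-level attributes.
for rows in (18, 20, 25, 60):
    count = balanced_count(rows, 5)
    if count is None:
        print(f"{rows} rows: level balance not attainable with 5 levels")
    else:
        print(f"{rows} rows: each level can appear {count} times")
```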

You cannot compare D-errors across different model specifications. The D-error will always increase when you use dummy coding, and that is fine. But with dummy coding you have 20 parameters, and 25 rows is not much; you would need to increase the number of rows to 50 or so, and more if you intend to include interaction effects. You should use dummy coding for all categorical variables, and when you have zero priors, I would also dummy code any numerical attributes.
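As a rough sketch of the arithmetic behind the row count, the snippet below counts dummy-coded parameters (levels minus one per attribute) and applies the commonly used lower bound that rows × (alternatives − 1) must be at least the number of parameters. The function names are hypothetical, and the bound is only a minimum for identification, not a recommendation. With five dummy-coded five-level attributes the count is 20; with all six attributes, as in Plan B, it would be 24 before any constant.

```python
def n_dummy_params(levels_per_attribute):
    """Each dummy-coded attribute with L levels needs L - 1 parameters."""
    return sum(levels - 1 for levels in levels_per_attribute)


def min_rows(n_params: int, n_alts: int) -> int:
    """Smallest number of rows with rows * (n_alts - 1) >= n_params."""
    per_row = n_alts - 1
    return -(-n_params // per_row)  # ceiling division


five_attrs = n_dummy_params([5] * 5)
six_attrs = n_dummy_params([5] * 6)
print(five_attrs, min_rows(five_attrs, n_alts=2))
print(six_attrs, min_rows(six_attrs, n_alts=2))
```

A design close to this lower bound leaves very few degrees of freedom, which is consistent with the advice above to use considerably more rows.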

To answer your question: I think that you cannot make all attribute levels sum to 50 (or any other number) and also include all attributes in your utility function. Think about it: if you know 5 attribute levels, then you can compute the 6th, so you have perfectly correlated attributes in your data, and due to multicollinearity you cannot estimate the model. So the only way to do this is to include only 5 of the attributes in your utility function, whereby the 6th attribute is considered the "remainder" that makes the total add up to 50. In your choice tasks you could show all attribute levels, including the remainder value, but in the model you estimate I think that you need to leave one attribute out. Alternatively, you can keep all attributes in your utility function, but you can then only include them as interaction effects with another attribute.
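The multicollinearity argument can be illustrated numerically. The sketch below, which assumes linear (continuous) coding and is not taken from Ngene, enumerates every profile of six attributes with levels 0 to 20 that sums to 50, and computes the rank of the matrix of attribute differences relative to one base profile. With all six columns the rank is 5, not 6; after dropping x6 the remaining five columns have full rank.

```python
from fractions import Fraction
from itertools import product

LEVELS = (0, 5, 10, 15, 20)
TOTAL = 50


def rank(rows):
    """Matrix rank via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(v) for v in r] for r in rows]
    n_cols = len(m[0]) if m else 0
    rk = 0
    for col in range(n_cols):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


# All profiles whose six attribute levels add up to exactly 50.
profiles = [p for p in product(LEVELS, repeat=6) if sum(p) == TOTAL]

# A two-alternative MNL model only sees attribute differences between profiles.
base = profiles[0]
diffs = [[a - b for a, b in zip(p, base)] for p in profiles[1:]]

rank_all_six = rank(diffs)
rank_without_x6 = rank([d[:5] for d in diffs])
print("rank with x1..x6:", rank_all_six)
print("rank with x1..x5:", rank_without_x6)
```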

Michiel

by **Steven Guu** » Thu Oct 24, 2024 12:15 pm

Dear Professor Bliemer,

Thank you very much for your detailed explanation and insights; I really appreciate your help! It has been incredibly helpful in clarifying several key points, and I greatly appreciate the time you took to provide such detailed guidance.

Based on your suggestion, suppose there is a seventh attribute (other cost).

Design
;alts = opt1*,opt2*
;rows = 60
;block =15,minsum
;eff = (mnl,d)
;alg = mfederov

;require:
opt1.x1 + opt1.x2 + opt1.x3 + opt1.x4 + opt1.x5 + opt1.x6 <= 50,
opt2.x1 + opt2.x2 + opt2.x3 + opt2.x4 + opt2.x5 + opt2.x6 <= 50

;model:
U(opt1) = b1.dummy[0|0|0|0] * x1[5,10,15,20,0]
+ b2.dummy[0|0|0|0] * x2[5,10,15,20,0]
+ b3.dummy[0|0|0|0] * x3[5,10,15,20,0]
+ b4.dummy[0|0|0|0] * x4[5,10,15,20,0]
+ b5.dummy[0|0|0|0] * x5[5,10,15,20,0]
+ b6.dummy[0|0|0|0] * x6[5,10,15,20,0]
/
U(opt2) =
+ b1 * x1
+ b2 * x2
+ b3 * x3
+ b4 * x4 + b5 * x5 + b6 *x6
$
I have one follow-up question. If I include only 6 attributes in my utility function, how should I represent the 7th attribute in the require constraints without including it in the utility equation? Should I manually calculate the 7th attribute as the remainder, or is there another way to handle this within the design setup?

Thank you again for your invaluable assistance, and I look forward to your advice.

Steve

by **Michiel Bliemer** » Thu Oct 24, 2024 1:23 pm

I am trying to be creative here. In the script below, I specify two models: model m1, in which I interact all 6 attributes with a variable y so that the model is defined, and model m2, which is the actual model that you are interested in. Using model m1 in conjunction with the constraints, you can make the 6 attributes sum to 50. In model m2, attribute x6 is dropped from the model, but it is still there as the "remainder" that you can obtain from the experimental design without it being in model m2. The D-error is minimised based on model m2 only.

With respect to the interpretation of the model, you would need to think about that: are the coefficients of model m2 all relative to attribute x6? It is a model that I have not seen before, so it requires a bit of thinking about what it means.

Design
;alts(m1) = opt1*,opt2*
;alts(m2) = opt1*,opt2*
;rows = 60
;block = 15,minsum
;eff = m2(mnl,d)
;alg = mfederov
;require:
opt1.x1 + opt1.x2 + opt1.x3 + opt1.x4 + opt1.x5 + opt1.x6 = 50,
opt2.x1 + opt2.x2 + opt2.x3 + opt2.x4 + opt2.x5 + opt2.x6 = 50
;model(m1):
U(opt1) = b1 * y[0,1] * x1[5,10,15,20,0]
+ b2 * y * x2[5,10,15,20,0]
+ b3 * y * x3[5,10,15,20,0]
+ b4 * y * x4[5,10,15,20,0]
+ b5 * y * x5[5,10,15,20,0]
+ b6 * y * x6[5,10,15,20,0]
/
U(opt2) = b1 * y * x1[5,10,15,20,0]
+ b2 * y * x2[5,10,15,20,0]
+ b3 * y * x3[5,10,15,20,0]
+ b4 * y * x4[5,10,15,20,0]
+ b5 * y * x5[5,10,15,20,0]
+ b6 * y * x6[5,10,15,20,0]
;model(m2):
U(opt1) = b1.dummy[0|0|0|0] * x1[5,10,15,20,0]
+ b2.dummy[0|0|0|0] * x2[5,10,15,20,0]
+ b3.dummy[0|0|0|0] * x3[5,10,15,20,0]
+ b4.dummy[0|0|0|0] * x4[5,10,15,20,0]
+ b5.dummy[0|0|0|0] * x5[5,10,15,20,0]
/
U(opt2) =
+ b1 * x1
+ b2 * x2
+ b3 * x3
+ b4 * x4
+ b5 * x5
$
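For readers who want to see the "remainder" idea outside Ngene, here is a minimal Python sketch. It assumes the five modelled attributes are given and derives x6 as 50 minus their sum, accepting the result only if it is one of the allowed levels. The function name `remainder_level` is hypothetical.

```python
LEVELS = (0, 5, 10, 15, 20)
TOTAL = 50


def remainder_level(x1_to_x5, total=TOTAL, levels=LEVELS):
    """Return the x6 level that makes the profile sum to `total`,
    or None when that value is not an allowed attribute level."""
    x6 = total - sum(x1_to_x5)
    return x6 if x6 in levels else None


print(remainder_level((10, 10, 10, 10, 5)))   # remainder is an allowed level
print(remainder_level((20, 20, 10, 0, 0)))    # remainder of 0 is allowed
print(remainder_level((0, 0, 5, 5, 10)))      # remainder would exceed 20
```

Profiles for which the function returns None are exactly the ones the equality constraint in the script rules out.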

Michiel

by **Steven Guu** » Sun Oct 27, 2024 3:21 am

Dear Professor Bliemer,

Thank you very much for your detailed guidance—I greatly appreciate the time you took to provide such thorough advice.

After some reflection, I wanted to clarify my understanding regarding the interpretation of the model. I am wondering if the following understanding is correct: if I am interested in examining people's preferences when given, say, a 50-day holiday, with options like x1 (mountaineering), x2 (fishing), …, up to x5 for various travel activities, and x6 representing staying at home, I can only determine preferences for different travel choices, as x6 is not included in the utility equation.

Consequently, although the total holiday time is fixed at 50 days, the number of travel days varies depending on changes in x6, and I am unable to calculate whether people prefer to travel or stay home. In essence, the coefficients I derive for travel preferences are still influenced by changes in total days (stay at home), but this effect cannot be accurately quantified within my model. Is this interpretation correct?

Additionally, I would like to ask if there are any methods or approaches to incorporate the influence of x6 more explicitly.

Thank you again for your invaluable assistance, and I look forward to your advice.

Warm regards,
Steve

by **Michiel Bliemer** » Sun Oct 27, 2024 7:31 am

I have two answers:

1. I think that omitting "home" has the interpretation that getting positive coefficients for the other activities means that they are more preferred than staying home, while negative coefficients would mean less preferred than home. Utilities are always "relative to", where "home" essentially becomes your reference. I could be wrong, but I think it still has a sensible interpretation.

2. The appropriate model in your case may be a multiple discrete-continuous choice model called MDCEV (multiple discrete-continuous extreme value); see the work of Chandra Bhat. In that model, you first identify the 'outside good', which in your case would be "home", and then you model two consecutive choices, namely (i) which activity one wants to do, which is the discrete choice, and (ii) how many days of that activity one wants to do, which is the continuous choice. I do not have much expertise in MDCEV, but it would be worth checking out since "activity choice" and "activity duration choice" are often analysed using MDCEV. It is a much more complex model, though I believe software such as Apollo can estimate it. Some of my colleagues are experts in this type of model, and if you go down this route they may be open to collaboration.

Michiel

by **NassarN** » Sun Oct 27, 2024 5:24 pm

Hi,

Please note that the 6 attributes have [0,5,10,15,20] as levels and sum to 50, so at least 3 attributes will be non-zero. In the activity example, the respondent is asked to choose between two 'bundles' with at least 3 activities each.
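The "at least 3 non-zero attributes" observation can be confirmed by brute-force enumeration. The following sketch lists all profiles that sum to 50 and finds the smallest number of non-zero attributes among them; the variable names are illustrative only.

```python
from itertools import product

LEVELS = (0, 5, 10, 15, 20)
TOTAL = 50

# Every combination of six attribute levels that adds up to exactly 50.
feasible = [p for p in product(LEVELS, repeat=6) if sum(p) == TOTAL]


def nonzero_count(profile):
    """Number of attributes shown with a level other than 0."""
    return sum(1 for level in profile if level != 0)


min_nonzero = min(nonzero_count(p) for p in feasible)
example = next(p for p in feasible if nonzero_count(p) == min_nonzero)
print("fewest non-zero attributes:", min_nonzero)
print("one such profile:", example)
```

The reason is simple: two attributes at the maximum level of 20 give only 40, so a third non-zero attribute is always needed to reach 50.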

"Additionally, I would like to ask if there are any methods or approaches to incorporate the influence of x6 more explicitly."
'True model': U(Opt) = b1*x1 + ... + b5*x5 + b6*x6
'Estimated': U(Opt) = b'1*x1 + ... + b'5*x5 if x6 is the reference,
where b'i = bi - b6.
One approach is to treat x6 (or any other attribute) as the reference, i.e. b6 = 0; another is to impose b1 + ... + b6 = 0, in which case the 'average activity' is the reference.
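A small numerical check of the relation b'i = bi - b6 may help. The sketch below uses made-up coefficient values (purely hypothetical, not estimates from any data) and shows that, for any bundle summing to 50, the full six-attribute utility equals the reduced five-attribute utility plus the constant 50*b6, which is the same for every alternative and therefore does not affect choices.

```python
import math

TOTAL = 50

# Hypothetical 'true' coefficients b1..b6, chosen only for illustration.
b_true = [0.04, 0.01, -0.02, 0.03, 0.00, 0.02]


def utility_full(x):
    """Utility with all six attributes and the 'true' coefficients."""
    return sum(b * xi for b, xi in zip(b_true, x))


def utility_reduced(x):
    """Utility with x6 dropped and coefficients b'i = bi - b6."""
    b_rel = [b - b_true[5] for b in b_true[:5]]
    return sum(b * xi for b, xi in zip(b_rel, x[:5]))


bundle = (20, 10, 0, 5, 5, 10)  # sums to 50
shift = TOTAL * b_true[5]       # constant offset, identical for all bundles
print(utility_full(bundle), utility_reduced(bundle) + shift)
```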

Best
Naji

by **Steven Guu** » Mon Oct 28, 2024 6:02 am

Dear Professor Bliemer,

Thank you very much for your detailed explanation and insightful suggestions on exploring the multiple discrete-continuous choice model (MDCEV). I will take the time to learn about this method and consider whether it can be applied effectively in my current research. It would also be fantastic to connect with any of your colleagues who specialize in MDCEV if I decide to pursue this method further. Thanks again!

Lastly, I would also like to ask your advice on implementing the second approach you mentioned, where I attempt to interact the first five attributes with the sixth. I am curious if this approach might allow me to capture the influence of attribute six on the first five. Below is the code I am currently considering:

Design
;alts = opt1*,opt2*
;rows = 60
;block =20,minsum
;eff = (mnl,d)
;alg = mfederov

;require:
opt1.x1 + opt1.x2 + opt1.x3 + opt1.x4 + opt1.x5 + opt1.x6 = 50,
opt2.x1 + opt2.x2 + opt2.x3 + opt2.x4 + opt2.x5 + opt2.x6 = 50
;model:
U(opt1) = b1.dummy[0|0|0|0] * x1[5,10,15,20,0]
+ b2.dummy[0|0|0|0] * x2[5,10,15,20,0]
+ b3.dummy[0|0|0|0] * x3[5,10,15,20,0]
+ b4.dummy[0|0|0|0] * x4[5,10,15,20,0]
+ b5.dummy[0|0|0|0] * x5[5,10,15,20,0]
+ i1[0] * x6[5,10,15,20,0] * x1
+ i2[0] * x6[5,10,15,20,0] * x2
+ i3[0] * x6[5,10,15,20,0] * x3
+ i4[0] * x6[5,10,15,20,0] * x4
+ i5[0] * x6[5,10,15,20,0] * x5
/
U(opt2) =
+ b1 * x1
+ b2 * x2
+ b3 * x3
+ b4 * x4
+ b5 * x5
+ i1 * x1 * x6
+ i2 * x2 * x6
+ i3 * x3 * x6
+ i4 * x4 * x6
+ i5 * x5 * x6
$

Thank you again for your invaluable assistance. I look forward to any further advice you may have.

Best regards,
Steve

by **Steven Guu** » Mon Oct 28, 2024 9:01 am

Dear Naji,

Thank you very much for explaining the different approaches for incorporating the influence of x6 in the model.

The reason I cannot include all six attributes is the strict constraint that the sum of all attribute levels equals 50. If we estimate coefficients for all attributes, this leads to a multicollinearity problem. I therefore believe that, under these conditions, even if we fix b6 at 0 as the reference, the model would still account for x6. As a result, I wonder whether, regardless of our assumptions about b6, it is impossible to avoid multicollinearity under such constraints. I may not fully understand this issue yet, so I would greatly appreciate any additional insights you could provide.

Thanks again!

Best regards,
Steve

by **NassarN** » Tue Oct 29, 2024 6:42 am

Dear Steve

The model is invariant to any constant added to b1, ..., b6, so there are 5 degrees of freedom (b1-b6, ..., b5-b6). As x6 is determined by x1 to x5, the model is only affected by x1, ..., x5.

One way to avoid multicollinearity is to drop one attribute, here x6, when estimating the model (see Michiel's message of Oct 24):
1- When building the design, you can show x6 as 50 - (x1 + ... + x5) in the choice cards.
2- When estimating the model, you will get b1, ..., b5 (the utility of one additional unit of x1, ..., x5 relative to one additional unit of x6; b6 is then 0).
You can change the reference as I mentioned in my previous message. Please note that the choice of reference will not affect the simulation results (choice probabilities); it only matters when interpreting the coefficients b as the relative utility of an additional unit of each attribute.
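To make the invariance claim concrete, the sketch below computes binary logit choice probabilities under three normalisations of hypothetical coefficients: the original values, x6 as the reference (b6 = 0), and the sum-to-zero ('average activity') reference. All coefficient values and both bundles are invented for illustration. Because each bundle sums to 50, shifting every coefficient by the same constant shifts both utilities equally, so the probabilities coincide.

```python
import math

# Hypothetical coefficients b1..b6; not estimates from any real data.
b = [0.30, 0.10, -0.20, 0.05, 0.25, 0.15]

opt1 = (20, 10, 0, 5, 5, 10)   # sums to 50
opt2 = (0, 5, 15, 20, 10, 0)   # sums to 50


def prob_opt1(coeffs, a, c):
    """Binary logit probability of choosing bundle a over bundle c."""
    u_a = sum(k * x for k, x in zip(coeffs, a))
    u_c = sum(k * x for k, x in zip(coeffs, c))
    return 1.0 / (1.0 + math.exp(u_c - u_a))


# Reference x6: subtract b6 from every coefficient, so b6 becomes 0.
ref_x6 = [k - b[5] for k in b]

# Reference 'average activity': subtract the mean, so coefficients sum to 0.
mean_b = sum(b) / len(b)
ref_avg = [k - mean_b for k in b]

p_original = prob_opt1(b, opt1, opt2)
p_ref_x6 = prob_opt1(ref_x6, opt1, opt2)
p_ref_avg = prob_opt1(ref_avg, opt1, opt2)
print(p_original, p_ref_x6, p_ref_avg)
```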

Hope this helps

Best
Naji

Efficient design with zero priors or near zero priors
