Endpoint Unlabeled Efficient Bayesian Design

Postby hamad461 » Mon Oct 02, 2017 10:51 am

Dear All,
I am constructing choice sets for an unlabeled experiment. I have 5 attributes each with two levels.
I conducted a pilot study using an optimal design. The pilot had two alternatives and a no-choice option. The total choice sets were 16 divided into 2 blocks. So, every participant evaluated 8 choice sets.
I want to use the priors to construct a Bayesian efficient design. However, I am having trouble with some issues:

1. When I use 3 alternatives and a no-choice option with ;rows = 16, Ngene cannot find a design.

2. When I use 2 alternatives and a no-choice option with ;rows = 16, Ngene produces a design that looks great. But as illustrated by Sándor and Wedel (2002), having 3 or 4 alternatives per choice set is far more accurate for estimating a panel mixed logit model.

3. The only time Ngene is able to find a design with 3 alternatives and a no-choice option is when I reduce the total number of choice sets to ;rows = 8. I do not know why Ngene is limiting me to just 8 choice sets for the whole experiment (I tried 12 and 10 rows, but neither worked). I would like to increase the number of choice sets to at least 16. I am planning to use latent class analysis and panel mixed logit, and I would like more combinations to be evaluated. My problem is that my sampling frame limits the participants to a small sub-sample (with restrictive inclusion criteria), from which it is hard to obtain more than 300 participants.
3a. What is the best way to deal with this issue?
3b. Do more choice sets add to the efficiency of the design and compensate for fewer participants?
3c. Is it better to use 16 choice sets in 2 blocks, where each choice set has 2 alternatives and a no-choice option, or 8 choice sets in total, where each choice set has 3 alternatives and a no-choice option?

4. The syntax below is what worked in Ngene with 8 choice sets of 3 alternatives and with 16 choice sets of 2 alternatives. As advised elsewhere in the forum, one can generate an MNL Bayesian efficient design and optimize it for an RP panel model. How do I compare the efficiency between the two? If:

First Option: 16 choice sets, 2 blocks of 8 choices, each choice set has 2 alternatives and a no-choice
(MNL: D-error=0.31, S-estimate=179) and (RP-Panel: D-error=0.89, S-estimate=390) and (D-optimality=93.7%). Although the design looks good, while iterating Ngene shows “ERROR: Aborting the run. After approximately 10 minutes, an initial random design was not found”.

Second Option: 8 choice sets in total, each choice set has 3 alternatives and a no-choice option
(MNL: D-error=0.6, S-estimate=374) and (RP-Panel: D-error= 1.46, S-estimate=686) and (D-optimality=98.5%)
Can I conclude from this that I can proceed with the design and estimate taste heterogeneity using latent class and RP panel mixed logit models? Which of the two options should I proceed with?


?The model that did not work
Design
;alts (model1)= alt1*, alt2*, alt3*, alt4
;alts (model2)= alt1*, alt2*, alt3*, alt4
;rows=16
;block=2
;eff = model1(mnl,d,mean)
;rdraws= gauss(3)
;bdraws= gauss(3)
;rep= 1000
;model(model1):
U(alt1) = x1[(n,-0.10,0.04)]*A1[0,1] + x2[(n,0.04,0.04)]*A2[0,1] + x3[(n,0.23,0.04)]*A3[0,1] + x4[(n,0.12,0.04)]*A4[0,1] + x5[(n,0.16,0.04)]*A5[0,1]/
U(alt2) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt3) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt4) = None[-1.8]

;model(model2):
U(alt1) = x1[n,-0.17,0.45]*A1[0,1] + x2[n,0.08,0.61]*A2[0,1] + x3[n,0.46,0.6]*A3[0,1] + x4[n,0.23,0.6]*A4[0,1] + x5[n,0.33,0.88]*A5[0,1]/
U(alt2) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt3) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt4) = None[-1.2]

$

This syntax shows this message:
“A valid initial random design could not be generated after approximately 10 seconds. In this time, of the 224299 attempts made, there were 0 row repetitions, 25729 alternative repetitions, and 198570 cases of dominance. There are a number of possible causes for this, including the specification of too many constraints, not having enough attributes or attribute levels for the number of rows required, and the use of too many scenario attributes. A design may yet be found, and the search will continue for 10 minutes. Alternatively, you can stop the run and alter the syntax.”


Any feedback you provide is greatly appreciated.
Sincerely,
Hamad
hamad461
 
Posts: 6
Joined: Sat Sep 30, 2017 8:24 pm

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby Michiel Bliemer » Mon Oct 02, 2017 12:02 pm

The issue is that you are asking Ngene to avoid dominant alternatives, and with only 2 levels per attribute and 3 alternatives this rules out the majority of choice tasks. I did a quick calculation: your full factorial contains 32,768 choice tasks, of which only 6,540 (20%) do not contain a dominant alternative. The more choice tasks you ask for, the more difficult it is for Ngene to find them using a column-based algorithm (the default in Ngene). The probability of randomly selecting 16 choice tasks without dominant alternatives is near 0.
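
The 6,540 figure can be reproduced by brute force. The sketch below is Python, not Ngene syntax, and it rests on an assumption about how the dominance check works: a choice task is rejected when any alternative is at least as good as another on every attribute (which also rejects repeated alternatives), with the preferred level of each attribute taken from the sign of its prior. The names SIGNS, dominates and is_feasible are illustrative only.

```python
from itertools import combinations, product

# Signs of the MNL prior means for x1..x5 (x1 is negative, the rest positive).
# With 0/1 levels, a positive sign means level 1 is the preferred level.
SIGNS = (-1, 1, 1, 1, 1)

def dominates(a, b):
    """True if alternative a is at least as good as b on every attribute."""
    return all(s * (x - y) >= 0 for s, x, y in zip(SIGNS, a, b))

def is_feasible(task):
    """Assumed rule: feasible if no alternative (weakly) dominates another."""
    return not any(dominates(a, b) or dominates(b, a)
                   for a, b in combinations(task, 2))

profiles = list(product((0, 1), repeat=5))   # 2^5 = 32 attribute profiles
tasks = list(product(profiles, repeat=3))    # 32^3 = 32,768 choice tasks
feasible = [t for t in tasks if is_feasible(t)]

share = len(feasible) / len(tasks)
# Rough chance that 16 independently drawn tasks are all feasible.
p_all_16 = share ** 16

print(len(tasks), len(feasible), round(share, 3), p_all_16)
```

Under this assumption the count comes out at 6,540 of 32,768 tasks, about 20%, and the chance that 16 independent random draws are all feasible is far below one in a billion. The default algorithm does not literally draw rows independently, so this is only an order-of-magnitude illustration of why the random start fails.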

The solution is to use an algorithm that handles row-based constraints better, namely the modified Fedorov algorithm.

Simply add the following line to your syntax:
;alg = mfederov(candidates = 1000)

This will create a candidate set of 1000 choice tasks without dominant alternatives from which Ngene can easily select 16 or more. Usually 1000 candidates is more than enough to create an efficient design.
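
To illustrate the candidate-set idea, here is a minimal Python sketch that collects 1000 distinct feasible choice tasks by rejection sampling. It is not Ngene's implementation: the dominance rule (no alternative at least as good as another on every attribute), the function names and the fixed seed are all assumptions made for the example, and the row selection that the algorithm performs afterwards is not shown.

```python
import random
from itertools import combinations

# Signs of the MNL prior means for x1..x5; positive means level 1 is preferred.
SIGNS = (-1, 1, 1, 1, 1)

def dominates(a, b):
    """True if alternative a is at least as good as b on every attribute."""
    return all(s * (x - y) >= 0 for s, x, y in zip(SIGNS, a, b))

def is_feasible(task):
    """Assumed rule: feasible if no alternative (weakly) dominates another."""
    return not any(dominates(a, b) or dominates(b, a)
                   for a, b in combinations(task, 2))

def draw_candidates(n_candidates, n_alts=3, n_attrs=5, seed=0):
    """Draw random tasks and keep distinct feasible ones until enough are found."""
    rng = random.Random(seed)
    found = set()
    attempts = 0
    while len(found) < n_candidates:
        attempts += 1
        task = tuple(tuple(rng.randint(0, 1) for _ in range(n_attrs))
                     for _ in range(n_alts))
        if is_feasible(task):
            found.add(task)
    return sorted(found), attempts

candidates, attempts = draw_candidates(1000)
print(len(candidates), attempts)
```

Because every task is screened individually before it enters the candidate set, any 16 rows picked from it are free of dominance by construction, which is why the row-based approach does not get stuck.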

Note that the modified Fedorov algorithm does not preserve attribute level balance. I do not think there will be any issue in your case, but if you require more level balance you can try adding some constraints on the appearance of each attribute level within the design.

To answer your specific questions:

1. This is expected, see explanation above.

2. Yes, it is easier to find choice tasks without dominant alternatives with only two alternatives. Note that Sándor and Wedel (2002) did NOT estimate a panel mixed logit model, but rather a cross-sectional mixed logit model, which is not an appropriate model. See Bliemer and Rose (2010) in Transportation Research Part B for the first experimental design for panel mixed logit, which shows that it is actually a very different model from the cross-sectional model. Further, while 3 alternatives of course provide more information, they also come with higher choice task complexity and respondent burden.

3. This is expected, see explanation above. It is easier to find 8 random choice tasks without a dominant alternative than 16.
3a. See above, change algorithm.
3b. If you ask each respondent more questions, then yes, it adds efficiency. If you merely block the design into smaller pieces, it does not necessarily add efficiency, although a design needs a certain minimum size to contain sufficient variation. I would rather go with 16 than 8.
3c. Use 16 choice tasks with 3 alternatives (although I would use 2 alternatives if you plan to put the survey on the internet as I would worry about complexity, which increases error variance, but you will capture less information with 2 alternatives).

4. I assume this question is no longer relevant.

Michiel
Michiel Bliemer
 
Posts: 1705
Joined: Tue Mar 31, 2009 4:13 pm

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby hamad461 » Mon Oct 02, 2017 12:13 pm

Dear Michiel,
Thank you so much for your prompt and detailed response.

Hamad

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby hamad461 » Mon Oct 02, 2017 1:33 pm

Dear Michiel,

In the calculation you mentioned, I know that the number of all possible combinations = 2^5 * 2^5 * 2^5 = 32,768.
Could you please tell me how you got the 6,540?

Thank you

Hamad

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby Michiel Bliemer » Mon Oct 02, 2017 1:40 pm

If you run the script below, it creates a design with 6,540 feasible choice tasks. When creating the full factorial, Ngene automatically applies the constraints. Convenient in this case :)

Design
;alts = alt1*, alt2*, alt3*
;rows=all
;fact
;model:
U(alt1) = x1[(n,-0.10,0.04)]*A1[0,1] + x2[(n,0.04,0.04)]*A2[0,1] + x3[(n,0.23,0.04)]*A3[0,1] + x4[(n,0.12,0.04)]*A4[0,1] + x5[(n,0.16,0.04)]*A5[0,1]/
U(alt2) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt3) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5
$

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby hamad461 » Mon Oct 02, 2017 2:18 pm

Dear Michiel,

Thank you so much. Honestly, I can't thank you enough for your detailed and quick response.

Sincerely,

Hamad

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby hamad461 » Tue Oct 03, 2017 8:46 pm

Dear Michiel,

Thank you for your help. I have one more question that is related to experimental design.

I did add the ;alg = mfederov(candidates = 1000) and the results are great.

However, when I tried to run the design using priors from another pilot (with a smaller sample, n=30, and mostly insignificant results), the resulting design has HUGE S-estimates (>8000). Although the priors from the second pilot are fairly close to the previous ones and have the same signs (except that x2 and x4 are now negative in the MNL model), I could not find any explanation for why this is happening. Here is the syntax with the smaller sample's priors:


?design with small sample n=30, resulting design is problematic
Design
;alts (model1)= alt1*, alt2*, alt3*, alt4
;alts (model2)= alt1*, alt2*, alt3*, alt4
;rows=16
;block=2
;eff = model1(mnl,d,mean)
;rdraws= gauss(3)
;bdraws= gauss(3)
;rep= 1000
;alg = mfederov(candidates = 1000)
;model(model1):
U(alt1) = x1[(n,-0.09,0.07)]*A1[0,1] + x2[(n,-0.06,0.07)]*A2[0,1] + x3[(n,0.1,0.07)]*A3[0,1] + x4[(n,-0.04,0.07)]*A4[0,1] + x5[(n,0.17,0.07)]*A5[0,1]/
U(alt2) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt3) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt4) = None[-1.6]

;model(model2):
U(alt1) = x1[n,-0.27,0.72]*A1[0,1] + x2[n,0.01,1]*A2[0,1] + x3[n,0.19,0.73]*A3[0,1] + x4[n,-0.03,0.6]*A4[0,1] + x5[n,0.5,1]*A5[0,1]/
U(alt2) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt3) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt4) = None[-0.97]
$



?original design n=84, resulting design is great
Design
;alts (model1)= alt1*, alt2*, alt3*, alt4
;alts (model2)= alt1*, alt2*, alt3*, alt4
;rows=16
;block=2
;eff = model1(mnl,d,mean)
;rdraws= gauss(3)
;bdraws= gauss(3)
;rep= 1000
;alg = mfederov(candidates = 1000)
;model(model1):
U(alt1) = x1[(n,-0.10,0.04)]*A1[0,1] + x2[(n,0.04,0.04)]*A2[0,1] + x3[(n,0.23,0.04)]*A3[0,1] + x4[(n,0.12,0.04)]*A4[0,1] + x5[(n,0.16,0.04)]*A5[0,1]/
U(alt2) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt3) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt4) = None[-1.8]

;model(model2):
U(alt1) = x1[n,-0.17,0.45]*A1[0,1] + x2[n,0.08,0.61]*A2[0,1] + x3[n,0.46,0.6]*A3[0,1] + x4[n,0.23,0.6]*A4[0,1] + x5[n,0.33,0.88]*A5[0,1]/
U(alt2) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt3) = x1*A1 + x2*A2 + x3*A3 + x4*A4 + x5*A5/
U(alt4) = None[-1.2]
$


Sincerely,

Hamad

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby Michiel Bliemer » Wed Oct 04, 2017 1:30 pm

I am not sure whether you are referring to the S-estimates of the mnl model or the rppanel model, but for example looking at the rppanel model it is easy to explain why the S-estimates are huge.

In the original design, you have for example:
x2[n,0.08,0.61]*A2[0,1]

So you are estimating a mean of 0.08 having an attribute level range of only [0,1]. This is a very small effect and therefore you will need a large sample size in order to get a statistically significant parameter estimate.

However, with the small sample priors, you have:
x2[n,0.01,1]*A2[0,1]

In other words, you are trying to estimate an extremely small mean of 0.01, and this will require a huge sample size (assuming that 0.01 is correct). Note that S-estimates only make sense if your parameter priors are reasonably accurate, otherwise they are not so meaningful. In that case you can simply ignore them.
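
The scaling can be made concrete with the usual t-ratio argument: if se1 is the standard error of a parameter for a single respondent, the standard error with N respondents is se1/sqrt(N), so reaching a t-ratio of 1.96 requires N >= (1.96 * se1 / beta)^2. The Python sketch below applies this to the two x2 means. The value SE1 = 1.0 is a hypothetical placeholder rather than a figure from these designs, and holding it equal for both priors is a simplification.

```python
def s_estimate(beta, se1, t=1.96):
    """Minimum sample size N such that |beta| / (se1 / sqrt(N)) >= t."""
    return (t * se1 / beta) ** 2

SE1 = 1.0  # hypothetical single-respondent standard error (assumption)

n_original = s_estimate(0.08, SE1)   # x2 mean from the n=84 pilot
n_small = s_estimate(0.01, SE1)      # x2 mean from the n=30 pilot
ratio = n_small / n_original

print(round(n_original), round(n_small), round(ratio))
```

Whatever the true standard error is, shrinking the prior mean from 0.08 to 0.01 multiplies the required sample size by (0.08 / 0.01)^2 = 64, which is enough on its own to push an S-estimate from the hundreds into the thousands.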

Michiel

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby hamad461 » Wed Oct 04, 2017 2:41 pm

Dear Michiel,

So in the case where the priors are not accurate and one ought to ignore the S-estimate, what determines the sample size for an experiment that is intended to use either latent class or panel mixed logit?
I know that for aggregate estimation with an MNL model, Orme (2010) gives a sample size rule of thumb in his book Getting Started with Conjoint Analysis:
(n*t*a/c)>500
Where (n) is the sample size, (t) is the number of tasks, (a) is the number of alternatives per task (not including the no-choice option), and (c) is the largest number of levels of any one attribute.
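
As a quick worked example, the rule can be solved for n. The Python sketch below assumes each respondent sees one block of 8 tasks and that c = 2 (every attribute has two levels); the function name orme_min_sample is made up for the illustration, and the strict inequality follows the formula as written above.

```python
def orme_min_sample(tasks, alts, max_levels, threshold=500):
    """Smallest integer n with n * tasks * alts / max_levels > threshold."""
    return (threshold * max_levels) // (tasks * alts) + 1

# 8 tasks per respondent, 2-level attributes, no-choice option not counted.
n_three_alts = orme_min_sample(tasks=8, alts=3, max_levels=2)
n_two_alts = orme_min_sample(tasks=8, alts=2, max_levels=2)

print(n_three_alts, n_two_alts)
```

Under these assumptions the rule asks for 42 respondents with 3 alternatives per task and 63 with 2 alternatives, both well below the 300 available. The rule applies to aggregate MNL estimation only, not to latent class or panel mixed logit.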

Thank you,

Hamad

Re: Endpoint Unlabeled Efficient Bayesian Design

Postby Michiel Bliemer » Wed Oct 04, 2017 3:50 pm

An overview of sample size requirements is provided in:

Rose, J.M. and M.C.J. Bliemer (2013) Sample size requirements for stated choice experiments. Transportation, Vol. 40, No. 5, pp. 1021-1041.

Sample size calculations are very case specific, and therefore in my opinion you need somewhat reliable prior values in order to get a good estimate. For example, if your attributes are not very relevant, then you may need a sample size of many thousands, while if your attributes are very important then perhaps a sample size of 10 is enough. You can use an existing rule of thumb like Orme's, but I would not put too much trust in these calculations. As far as I am aware, no rule of thumb exists for latent class or mixed logit models.

If you have no reliable priors, then you may need to adopt a strategy in which you keep collecting data at least until your parameters become statistically significant, since it will be difficult to predict how many you need. Dummy coded variables with 0,1 range are always the most difficult to estimate, but if you believe that people feel very strongly about these attributes then it will be easier.

Michiel