Beginner help needed - am I on the right track?
Posted:
Thu Apr 12, 2018 12:41 am
by kschneider
Hello,
I am brand new to choice experiments and Ngene and am trying to generate a design for a simple situation. Here is the context: we are conducting a survey in rural Malawi on exposure to a toxic metabolite produced by mold on maize (corn). We have the following conditions:
1. Sample size is 360 respondents
2. Experiment has 3 attributes: Season (harvest, lean), Mold (moldy, clean), and Price (90,100,125,150,200,250).
3. We want to give respondents two blocks of choice sets, where the season is held constant in each block (e.g. we ask mold:price choice sets all for the lean season then all for the harvest season). Moldiness will be communicated with a visual image (a photo of moldy or clean maize), with the choice set shown as two side-by-side images each with a price under the photo.
4. We are going to estimate willingness-to-pay for clean (not moldy) maize in each season.
5. We have no prior studies to estimate the coefficients but expect a positive sign on the moldy/clean (coded 0 for moldy and 1 for clean), and a negative sign on price but don't have even a sign hypothesis for the coefficient on season (coded 1 for harvest, 0 for lean). I don't know how to put in a signed hypothesis without a value as the priors so my draft code below has put 0 for all.
I have tried to run the following design code but am getting the error "Warning: No valid design has been found after 1000 evaluations. There may be a problem with the specification of the design. A common problem is that the choice probabilities are too extreme (close to 1 and 0), perhaps because some or all of the prior values are too large. Also, it is generally a good idea to start with a simple design (MNL, non-Bayesian), then add complexity. If you press stop, a design will be reported, which may assist in diagnosing the problem." Where am I going wrong?
Design
;alts = alt1, alt2
;rows = 24
;eff = (mnl,wtp(wtp))
;wtp = wtp(* / b4)
;cond:
if(alt1.A = 0, alt2.A = 0) ,
if(alt1.A = 1, alt2.A = 1)
;model:
U(alt1) = b1 + b2 * A[0,1] + b3 * B[0,1] + b4 * C[90,100,125,150,200,250] /
U(alt2) = b2 * A + b3 * B[0,1] + b4 * C[90,100,125,150,200,250] $
Any help much appreciated! Thank you!
-Kate
Re: Beginner help needed - am I on the right track?
Posted:
Thu Apr 12, 2018 9:38 am
by johnr
Hi Kate
1. Sample size is 360 respondents
This is fine. Most empirical studies have a budgetary limit on the number of respondents possible.
2. Experiment has 3 attributes: Season (harvest, lean), Mold (moldy, clean), and Price (90,100,125,150,200,250).
Again, this is theoretically fine; however, you should note that the levels are not equidistant (100 - 90 = 10; 250 - 200 = 50). From an optimal experimental design perspective there is no issue with this, but some studies from the very early days of choice modelling suggested that it may influence the model results empirically (I’m looking for the references now). A lot has changed since then, so it may have been related to how designs and estimation were done back then, or it may be a real phenomenon. I haven’t seen anything recent on this, so I cannot say one way or the other. It is just something to keep in mind. Don’t be overly worried, however. We see this all the time without issues arising.
3. We want to give respondents two blocks of choice sets, where the season is held constant in each block (e.g. we ask mold:price choice sets all for the lean season then all for the harvest season). Moldiness will be communicated with a visual image (a photo of moldy or clean maize), with the choice set shown as two side-by-side images each with a price under the photo.
Why not generate two separate designs? The MNL model assumes that you can sum the components of the model that make up the variance-covariance (VC) matrix. Hence, if the constraints are not possible, you can generate two separate designs, one for each treatment (here, one for each season).
4. We are going to estimate willingness-to-pay for clean (not moldy) maize in each season.
You can almost always generate WTP estimates, even if you optimise on another criterion. In a linear utility function, the WTP is simply the ratio of two parameters. This is where you have gone wrong. By putting a zero prior on price and dividing by the price parameter, you are dividing by zero, which is undefined. The software cannot find a design because the optimisation criterion it computes for the WTP estimate is not defined. If you want to optimise on WTP, the denominator in the calculation cannot be zero.
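The division-by-zero problem can be illustrated outside Ngene with a few lines of Python. This is only a sketch: the function name wtp_ratio is made up for illustration, and it mimics the ratio "attribute parameter / price parameter", not Ngene's internal calculation.

```python
def wtp_ratio(b_attr, b_price):
    """Ratio of an attribute parameter to the price parameter.

    Returns None when the price parameter is zero, because the
    ratio is then undefined (this mirrors the all-zero priors case).
    """
    if b_price == 0:
        return None
    return b_attr / b_price


# All priors zero, as in the original syntax: the ratio is undefined.
print(wtp_ratio(0.0, 0.0))

# Non-zero priors such as b3 = -0.1 and b4 = -0.005 give a defined ratio.
print(wtp_ratio(-0.1, -0.005))
```

Any non-zero prior on price is enough to make the ratio computable; whether the chosen value is a sensible belief about the population parameter is a separate question.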
5. We have no prior studies to estimate the coefficients but expect a positive sign on the moldy/clean (coded 0 for moldy and 1 for clean), and a negative sign on price but don't have even a sign hypothesis for the coefficient on season (coded 1 for harvest, 0 for lean). I don't know how to put in a signed hypothesis without a value as the priors so my draft code below has put 0 for all.
This is a common misunderstanding. A zero prior is not the same as not knowing the sign of the parameter, though it is often stated in this way. If you optimise with the prior set to zero, you are saying that the population parameter is zero (which, strictly speaking, has neither a positive nor a negative sign), not that it could be positive or could be negative. Zero is no different to, say, -0.1: with a prior of -0.1 you are saying that you believe the parameter is -0.1, not -0.2 or 0.1. Likewise, if your prior is 0, you are saying you believe the parameter is 0. Strictly speaking, if you want to allow for the possibility of either sign, then you should apply Bayesian priors that span both sides of zero.
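To make the distinction concrete, here is a small Python sketch contrasting a fixed prior of zero with a prior distribution that spans zero. The uniform range of -0.2 to 0.2, the number of draws and the function name are assumptions chosen only for illustration; they are not recommended values and this is not Ngene syntax.

```python
import random


def draw_uniform_prior(low, high, n_draws, seed=0):
    """Draw n_draws values from a uniform prior on [low, high]."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    return [rng.uniform(low, high) for _ in range(n_draws)]


# A fixed prior states one belief: the parameter equals this value.
fixed_prior = 0.0

# A prior that spans zero allows for either sign.
draws = draw_uniform_prior(-0.2, 0.2, n_draws=1000)
share_negative = sum(d < 0 for d in draws) / len(draws)

print("fixed prior:", fixed_prior)
print("share of negative draws:", round(share_negative, 3))
```

With the fixed prior every evaluation of the design uses the value zero, whereas with the distribution roughly half of the draws are negative and half are positive, which is what "the sign could go either way" actually means.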
Here are two possible design approaches. The first gives the same questions to both groups; the second gives a different design to each group. Note that both require that you also add a no choice alternative. Without it, season takes the same value for every alternative (for the harvest group it is one for all alternatives), so it is a constant across alternatives with no variation, which causes problems. Let us think about other possible solutions, but this is a start.
Design
;alts(harvest) = alt1, alt2, alt3
;alts(mold) = alt1, alt2, alt3
;rows = 12
;eff = F1(mnl,d)
;Fisher(F1) = des1(harvest[0.5],mold[0.5])
;model(harvest):
U(alt1) = b2[-0.1] * A.covar[1] + b3[-0.1] * B[0,1] + b4[-0.005] * C[90,100,125,150,200,250] /
U(alt2) = b2 * A.covar[1] + b3 * B[0,1] + b4 * C[90,100,125,150,200,250]
;model(mold):
U(alt1) = b2[-0.1] * A.covar[0] + b3[-0.1] * B[0,1] + b4[-0.005] * C[90,100,125,150,200,250] /
U(alt2) = b2 * A.covar[0] + b3 * B[0,1] + b4 * C[90,100,125,150,200,250] $
Design
;alts(harvest) = alt1, alt2, alt3
;alts(mold) = alt1, alt2, alt3
;rows = 12
;eff = F1(mnl,d)
;Fisher(F1) = des1(harvest[0.5]) + des2(mold[0.5])
;model(harvest):
U(alt1) = b2[-0.1] * A.covar[1] + b3[-0.1] * B[0,1] + b4[-0.005] * C[90,100,125,150,200,250] /
U(alt2) = b2 * A.covar[1] + b3 * B[0,1] + b4 * C[90,100,125,150,200,250]
;model(mold):
U(alt1) = b2[-0.1] * A.covar[0] + b3[-0.1] * B[0,1] + b4[-0.005] * C[90,100,125,150,200,250] /
U(alt2) = b2 * A.covar[0] + b3 * B[0,1] + b4 * C[90,100,125,150,200,250] $
Re: Beginner help needed - am I on the right track?
Posted:
Thu Apr 12, 2018 9:54 am
by kschneider
Dear John,
I cannot thank you enough! This was so incredibly helpful and exactly what I was needing. I am going to use your second option with 2 completely different designs because we will actually ask all respondents to answer in reference to both seasons, but are going to randomize the order of which season is asked first.
Thank you again SO much,
Kate
Re: Beginner help needed - am I on the right track?
Posted:
Thu Apr 12, 2018 10:51 am
by johnr
Hi Kate
No problems. We are here to help. It is very important if you adopt this approach that you show respondents a no choice alternative. If you don't, then you will need to include the season variable as an interaction term rather than as a main effect as I have done. If you are concerned that people will always choose the no choice (it happens), you can always get them to rank the three alternatives (alt1, alt2 and no) rather than just pick one. That way, you still get information on the preferences for the two non-no choice alternatives.
John
Re: Beginner help needed - am I on the right track?
Posted:
Thu Apr 12, 2018 1:16 pm
by Michiel Bliemer
This is the alternative that John refers to, namely using season as a scenario variable, creating interactions with season, and blocking the design so that it creates two blocks of 12 choice tasks, where respondents in each block are asked about both seasons:
Design
;alts = alt1, alt2
;rows = 24
;block = 2
;eff = (mnl,d)
;model:
U(alt1) = bm[-0.1] * mold[0,1] + bms * mold * season[0,1] + bp[-0.005] * price[90,100,125,150,200,250] + bps * price * season /
U(alt2) = bm * mold[0,1] + bms * mold * season[season] + bp * price + bps * price * season $
mold[0] = moldy
mold[1] = clean
season[0] = harvest
season[1] = lean
Then:
bm / bp = willingness to pay for clean maize in season "harvest"
(bm + bms) / (bp + bps) = willingness to pay for clean maize in season "lean"
This creates a single design instead of two separate designs as in John's syntax.
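As a sketch of how the two ratios above would be computed after estimation, the Python below takes four estimated parameters and returns the ratio for each season. The function name and the parameter values are hypothetical, chosen only to show the arithmetic; they are not estimates from any data.

```python
def wtp_by_season(bm, bms, bp, bps):
    """Ratios from the interaction model, with season coded 0 = harvest, 1 = lean.

    harvest: bm / bp
    lean:    (bm + bms) / (bp + bps)
    """
    return {
        "harvest": bm / bp,
        "lean": (bm + bms) / (bp + bps),
    }


# Hypothetical estimates: clean maize is preferred (bm > 0), more so in the
# lean season (bms > 0), and price sensitivity differs by season (bps != 0).
result = wtp_by_season(bm=0.4, bms=0.2, bp=-0.005, bps=-0.001)
print(result)
```

Note that with a positive mold parameter and a negative price parameter the ratio itself is negative; many applications report the negative of this ratio so that WTP for a desirable attribute reads as a positive amount.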
Michiel
Re: Beginner help needed - am I on the right track?
Posted:
Fri Apr 13, 2018 11:26 am
by kschneider
Thank you again so much! I do want to have a "no choice" option, I forgot about that in my original post. I have to admit I'm outside my league here, I'm not sure I really understand the difference between what Michiel has described and John's earlier code that had me generate 2 different designs, one for each season. What is the difference between the model that has the interaction terms and one that doesn't for estimating WTP? (this is my first time estimating WTP, so I'm completely new not only to the experiment design but to putting together the right econometric model for the analysis).
From your experience and knowledge of the literature, is there a reason that I should or should not give respondents the same 12 choices for each season? My inclination is to give a completely different 12 choice sets for the second season so that they are not fatigued. Also, is there an empirical way to figure out exactly how many choice sets I need to present to each respondent, given my sample size?
Thank you again SO much, I'm tremendously grateful for all the handholding!
-Kate
Re: Beginner help needed - am I on the right track?
Posted:
Fri Apr 13, 2018 1:34 pm
by Michiel Bliemer
John's utility functions and my utility functions include behaviour in slightly different ways, so they represent a somewhat different model. It is up to you to decide what you believe is an appropriate model for the behaviour that you would like to capture. There is not necessarily one right way of doing things. John may do it his way, and I would choose to do it my way. I think that my solution is not only simpler but accounts for a different price sensitivity per season whereas John's model does not. You can also add a no-choice alternative in my syntax. This third alternative would only have a constant. If your objective is to obtain willingness to pay, you do not need a no-choice alternative.
I have blocked my design in 2 blocks of 12 choice tasks, in which each respondent sees 6 choice tasks with one season and 6 choice tasks with the other season. They need not be the same 6 choice tasks. I think asking 24 choice tasks from a single respondent is way too much. By having two versions of your survey (with Block A and Block B) you create some more variety in your data set.
Michiel
Re: Beginner help needed - am I on the right track?
Posted:
Fri Apr 13, 2018 2:11 pm
by johnr
Hi Kate
There are a few key differences between the two approaches, aside from the different code we have used. The first syntax I showed is functionally equivalent to Michiel’s with the inclusion of a no-choice. In effect, the first syntax produces a design where, for each choice task, B and C are the same for the two season designs. If you look at the output generated for this syntax, you will see that aside from season, the attribute combinations for B and C in choice task 1 are the same for the two designs. The same goes for choice tasks 2, 3, etc. So a respondent seeing choice task 1 from the harvest design will see the same combinations of attributes B and C as a respondent seeing choice task 1 from the second design. In the second syntax I provided, the combinations of levels of attributes B and C differ between the two designs.
The primary difference between the two approaches offered by Michiel and me is that he hasn’t included a no choice alternative (alt3 in my syntax). This has meant that he has assumed interaction terms in his utility functions. The problem with choice models is that attributes need to vary across alternatives to explain choice. If an attribute is always the same for all alternatives, even if it varies between choice tasks, you cannot estimate its effect. To demonstrate, consider the following example, assuming two alternatives, two attributes and two choice tasks:
Task 1
U(alt1) = b1*price[1] + b2*quality[2]
U(alt2) = b1*price[1] + b2*quality[1]
Task 2
U(alt1) = b1*price[2] + b2*quality[1]
U(alt2) = b1*price[2] + b2*quality[2]
In task 1, price = 1 for both alternatives, and in task 2, price = 2 for both alternatives. The price is always the same within any given task, and as it doesn’t vary, it cannot be used to explain choice (in task 1, did they choose alt1 because its price was higher or lower than the price of alt2? The same question arises in task 2). You need variation (in the independent variables) to explain variation (in the dependent variable).
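The point can be checked numerically with the standard MNL probability formula. The sketch below uses made-up parameter values and helper names; it shows that when price is identical for both alternatives, changing the price parameter b1 leaves the choice probabilities untouched, so the data carry no information about b1.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit choice probabilities for a list of utilities."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def task_probs(b1, b2, price, qualities):
    """Two alternatives that share the same price but differ in quality."""
    return mnl_probs([b1 * price + b2 * q for q in qualities])


# Task 1 from the example: price = 1 for both, quality = 2 versus 1.
p_small_b1 = task_probs(b1=-0.5, b2=0.3, price=1, qualities=[2, 1])
p_large_b1 = task_probs(b1=-5.0, b2=0.3, price=1, qualities=[2, 1])

print(p_small_b1)
print(p_large_b1)
```

The two printed probability vectors agree (up to floating-point rounding) even though b1 differs by a factor of ten, because the common term b1*price cancels when utilities are compared.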
My code gets around this by creating the variation with the no choice alternative:
Task 1
U(alt1) = b1*price[1] + b2*quality[2]
U(alt2) = b1*price[1] + b2*quality[1]
U(none) = b1*0 + b2*0
Task 2
U(alt1) = b1*price[2] + b2*quality[1]
U(alt2) = b1*price[2] + b2*quality[2]
U(none) = b1*0 + b2*0
such that you can at least determine whether higher prices are more/less likely to switch someone to not choosing either alternative (the no choice).
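Continuing the same toy example, the sketch below adds a no choice alternative with utility fixed at zero. The parameter values and function names are again hypothetical. Once the no choice is present, the price parameter does change the probabilities, which is what makes it estimable.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit choice probabilities for a list of utilities."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def probs_with_none(b1, b2, price, qualities):
    """alt1 and alt2 share a price; the last alternative is 'none' with utility 0."""
    utilities = [b1 * price + b2 * q for q in qualities]
    utilities.append(0.0)  # no choice alternative
    return mnl_probs(utilities)


# Task 1 (price = 1) and task 2 (price = 2) from the example above.
task1 = probs_with_none(b1=-0.5, b2=0.3, price=1, qualities=[2, 1])
task2 = probs_with_none(b1=-0.5, b2=0.3, price=2, qualities=[1, 2])

print("P(none) in task 1:", round(task1[2], 3))
print("P(none) in task 2:", round(task2[2], 3))
```

With a negative price parameter, the probability of choosing "none" is higher in the task with the higher price, so the shared price now explains part of the observed choices.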
Michiel overcame the problem by assuming interaction effects. You can see this in this part of code
+ bms * mold * season[0,1]
and
+ bms * mold * season[season]
In the second utility function, season[season] tells Ngene to use the same level as picked up the first time the attribute appears (which is in the first utility function). So if in the design season[0,1] = 1, then season[season] = 1, and if season[0,1] = 0, then season[season] = 0. This is what you want, as season is then fixed within a choice task: it is the same for both alternatives. Rather than creating the variation with a no choice alternative as I have, the rest of the interaction term creates the variation for you. You can see this here:
U(alt1) = bm[-0.1] * mold[0,1] + bms * mold * season[0,1]
U(alt2) = bm * mold[0,1] + bms * mold * season[season]
Assume the design picks mold = 1 and season = 1 for alt1, then
U(alt1) = bm[-0.1] * mold[1] + bms * mold[1] * season[1]
For alt2, mold can be either 1 or 0. Let’s say it is zero. Then
U(alt2) = bm * mold[0] + bms * mold[0] * season[season] = 0
So alt2 and alt1 are different because mold is different, even though season is forced to be the same between the two alternatives. Note that you can use the same approach as Michiel without interactions if you add a no choice. The syntax would be:
Design
;alts = alt1, alt2, alt3
;rows = 24
;block = 2
;eff = (mnl,d)
;model:
U(alt1) = bm[-0.1] * mold[0,1] + bms * season[0,1] + bp[-0.005] * price[90,100,125,150,200,250] /
U(alt2) = bm * mold[0,1] + bms * season[season] + bp * price $
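The argument about season main effects versus interactions can also be checked with a short calculation. In the Python sketch below, the parameter names bs (a season main effect) and bms (the mold x season interaction) and all values are assumptions for illustration. With only alt1 and alt2, the choice depends on the utility difference: the season main effect cancels, while the interaction survives whenever mold differs between the alternatives.

```python
def utility_difference(bm, bs, bms, mold1, mold2, season):
    """U(alt1) - U(alt2) when season is the same for both alternatives."""
    u1 = bm * mold1 + bs * season + bms * mold1 * season
    u2 = bm * mold2 + bs * season + bms * mold2 * season
    return u1 - u2


# Season main effect: changing bs does not change the difference.
d_bs_zero = utility_difference(bm=0.4, bs=0.0, bms=0.2, mold1=1, mold2=0, season=1)
d_bs_large = utility_difference(bm=0.4, bs=5.0, bms=0.2, mold1=1, mold2=0, season=1)

# Interaction: changing bms does change the difference.
d_bms_zero = utility_difference(bm=0.4, bs=0.0, bms=0.0, mold1=1, mold2=0, season=1)

print(d_bs_zero, d_bs_large, d_bms_zero)
```

This is why a season main effect needs a no choice alternative (whose utility does not contain season) to be estimable, whereas the interaction terms do not.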
From your experience and knowledge of the literature, is there a reason that I should or should not give respondents the same 12 choices for each season? My inclination is to give a completely different 12 choice sets for the second season so that they are not fatigued.
There is no evidence either way. Theoretically it shouldn’t matter; however, some researchers (read: reviewers) might harp on about possible design artefacts. This is code for the idea that people interact with the experiment, and that what you show them (and how you show it) may influence their responses. Purists along these lines might argue that if you observe differences between the outcomes for the two seasons, you cannot tell whether it is because season is different or because the other attribute combinations shown differed. Whether this is true or not I do not know, but it is something I have seen argued from time to time.
Also is there an empirical way to figure out exactly how many choice sets I need to present to each respondent, given my sample size?
There is a large literature on this, but the evidence is mixed. Most studies use somewhere between 1 and 20 tasks per respondent. I personally tend to err on the side of caution and aim for around 6-8; however, the answer tends to depend on the literature you are publishing in. Some literatures prefer very few, whilst others don’t seem to care. The evidence does suggest that having too many can result in increased error variance.
Thank you again SO much, I'm tremendously grateful for all the handholding!
It’s not a problem. It’s our pleasure.
Re: Beginner help needed - am I on the right track?
Posted:
Mon Apr 16, 2018 2:33 am
by kschneider
Thank you both again so very much. Here is where I am moving:
1. Michiel's model is more appropriate to our context because we expect WTP to be different in the two seasons
2. We do want a third option that would be "buy none" because opting out of the market and potentially going hungry is a real choice households may be making and that's important for us to know.
3. 6 choices per season per respondent and 2 blocks of respondents (survey versions) is perfect and seems to concur with the limited research guidance out there and best practices for choice experiments in similar contexts (e.g. ISPOR reports).
4. We will divide each block in half and 50% of respondents will see the 6 lean season choices first then the 6 harvest and the other half of respondents will see the reverse season ordering. However we don't want to change seasons between choice sets so we will group all the lean and all the harvest together. The anchoring is important to accurate responses we believe, and we will report this choice in our publications.
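The allocation in points 3 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the labels "A"/"B" and "lean_first"/"harvest_first", the function name and the seed are made up, and respondents are shuffled and then cycled through the four block-by-order combinations so that each combination gets the same number of people.

```python
import itertools
import random


def assign_versions(respondent_ids, seed=0):
    """Assign each respondent to one of 2 blocks x 2 season orders."""
    versions = list(itertools.product(["A", "B"], ["lean_first", "harvest_first"]))
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)  # random order, reproducible via the seed
    cycle = itertools.cycle(versions)
    return {rid: next(cycle) for rid in ids}


assignment = assign_versions(range(1, 361))  # 360 respondents

# Count how many respondents fall in each block x order combination.
counts = {}
for version in assignment.values():
    counts[version] = counts.get(version, 0) + 1
print(counts)
```

With 360 respondents this gives 90 per combination, i.e. 180 per block, with half of each block seeing the lean season first.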
So that brings me to the last coding issue where I want to confirm the correct addition of the third "buy none" choice into Michiel's model as follows:
Design
;alts = alt1, alt2, none
;rows = 24
;block = 2
;eff = (mnl,d)
;model:
U(alt1) = bm[-0.1] * mold[0,1] + bms * mold * season[0,1] + bp[-0.005] * price[90,100,125,150,200,250] + bps * price * season /
U(alt2) = bm * mold[0,1] + bms * mold * season[season] + bp * price + bps * price * season $
mold[0] = moldy
mold[1] = clean
season[0] = harvest
season[1] = lean
none[0] = buy
none[1] = optout
When I run the design code above it appears to work, so I just want to make sure it's doing what I hope it's doing and Michiel's code for the WTP analysis will still be the same:
bm / bp = willingness to pay for clean maize in season "harvest"
(bm + bms) / (bp + bps) = willingness to pay for clean maize in season "lean"
And we can then analyze the odds of opting out using the selection of the no choice option as the dependent variable. For interpretability, we would probably do that for each season separately rather than with the triple interaction but again would love guidance if that is a commonly estimated outcome from choice experiments.
Thank you again!
-Kate
Re: Beginner help needed - am I on the right track?
Posted:
Mon Apr 16, 2018 4:07 am
by kschneider
Oh no, I've run into a problem with the code in my recent post. All the alternatives have the clean choice at a lower price than moldy, which would result in all respondents always choosing the clean and cheaper option. I guess I need to impose more constraints on the design so that this doesn't happen. What do you advise?
With continued thanks!
-Kate
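One way to screen a generated design for the problem described above, outside the design software, is to flag choice tasks in which one alternative is at least as good on every attribute and strictly better on at least one. The Python sketch below assumes that clean maize (mold = 1) and a lower price are always preferred; the function name and the example tasks are hypothetical.

```python
def has_dominant_alternative(alt1, alt2):
    """True if one alternative dominates the other.

    Assumes mold is coded 0 = moldy, 1 = clean (higher is better)
    and that a lower price is better.
    """
    def dominates(a, b):
        at_least_as_good = a["mold"] >= b["mold"] and a["price"] <= b["price"]
        strictly_better = a["mold"] > b["mold"] or a["price"] < b["price"]
        return at_least_as_good and strictly_better

    return dominates(alt1, alt2) or dominates(alt2, alt1)


# Clean and cheaper versus moldy and dearer: alt1 dominates.
print(has_dominant_alternative({"mold": 1, "price": 90}, {"mold": 0, "price": 150}))

# Clean but dearer versus moldy but cheaper: a genuine trade-off.
print(has_dominant_alternative({"mold": 1, "price": 150}, {"mold": 0, "price": 90}))
```

Tasks flagged by such a check carry little information about the trade-off between mold and price, since nearly everyone would pick the same alternative.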