Best-Worst (Case 1) dataset development

Postby mbarrowc » Thu Dec 06, 2018 9:08 am

Dear all,
I hope my question is not too far outside the scope of this forum. I recently conducted a Best-Worst (Case 1) choice experiment. I am familiar with how to implement the “Count” method (Best minus Worst) but would like to run a more in-depth analysis, such as a conditional logit or latent class logit model. I am having issues with coding my dataset to allow me to do this.

Suppose I have 6 attributes (A, B, C, D, E, F) of interest. I give participants 10 choice scenarios, each including only 3 of the attributes. How do I create a dataset that will let me run a conditional logit or latent class logit model on the participants’ responses?

Below is an example of a dataset I created, but when I run a conditional logit, the last variable (e.g., F) is omitted due to collinearity. In this example, choice scenario #1 (of 10) asks participants to select a Best/Worst combination from {A, B, F}. For the sake of brevity, I am only including the data for the first choice scenario; the remaining 9 scenarios for each respondent would be coded similarly. The dependent variable is “Choice”, and A, B, C, D, E, and F are the independent variables. In scenario #1, the participant selected “A” as the most important attribute and “B” as the least important attribute; attribute “F” was not selected.

I originally left the variables for attributes not shown in a scenario blank (e.g., in scenario #1 I left the C, D, and E columns empty); however, the conditional logit model then reported no usable observations because every observation had missing data. The current setup does not seem correct to me, which is not surprising given the collinearity issues.

Dataset: With each scenario including 3 attributes, a total of 6 outcomes are possible; this is why I have 6 rows for each choice scenario. A "1" implies best while a "-1" implies worst.
Choice   A   B   C   D   E   F
0        1  -1   0   0   0   0
0        1   0   0   0  -1   0
0       -1   1   0   0   0   0
1        0   1   0   0  -1   0
0       -1   0   0   0   1   0
0        0  -1   0   0   1   0

I have searched extensively about this topic and have not had any luck. Any and all help and advice would be greatly appreciated. Thank you in advance.
-Mike
mbarrowc
 
Posts: 5
Joined: Wed Apr 24, 2013 7:49 am

Re: Best-Worst (Case 1) dataset development

Postby Michiel Bliemer » Thu Dec 06, 2018 10:22 am

If I understand correctly, you are trying to use data from a best-worst scaling experiment to estimate a choice model. I am not sure how to estimate a choice model using such data, but one of my colleagues mentions that this is possible and will respond in the coming days.

Michiel
Michiel Bliemer
 
Posts: 1705
Joined: Tue Mar 31, 2009 4:13 pm

Re: Best-Worst (Case 1) dataset development

Postby johnr » Thu Dec 06, 2018 11:32 am

Hi Mike

Of course, the details will depend on the data format your software expects. I will demonstrate using the Nlogit format; you may need to reformat the data if you are using other software.
Say that in the first task the respondent saw attributes D, C and E, and chose C as the best and E as the worst. You start by constructing the best choice set using dummy codes (hence attribute F is omitted in my example below). The task has three alternatives, coded 1, 2 and 3, with the dummy attributes indicating what was shown in each alternative. In the example below, Scenario = 1, 2, ..., 10; BW indicates the best choice (1) or the worst choice (-1); Altij = 1, 2, 3 identifies the alternative; CSET is the number of alternatives in the choice set (2 or 3); Choice = 1 or 0; and the attributes A to E are dummy coded. In the best choice task, all three alternatives are available. This is shown below.

Scenario  BW  Altij  CSET  Choice   A   B   C   D   E
1          1      1     3       0   0   0   0   1   0
1          1      2     3       1   0   0   1   0   0
1          1      3     3       0   0   0   0   0   1

We now construct a pseudo-observation for the worst choice in this task. If you chose C as best, the probability that it is simultaneously chosen as worst is 0, so we delete this alternative when constructing the worst pseudo-observation. Next, we simply code the attributes as -1 or 0 instead of 1 or 0:

Scenario  BW  Altij  CSET  Choice   A   B   C   D   E
1         -1      1     2       0   0   0   0  -1   0
1         -1      3     2       1   0   0   0   0  -1

This is still scenario 1, but now BW = -1. Note that we now have two alternatives available (we deleted altij = 2, as it was selected as best), hence CSET = 2. Further, the attributes are now coded -1 and 0, rather than 1 and 0.

Repeat the process for the remaining nine choice tasks, and you have your choice data.
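
The construction described above can be sketched in code. This is a minimal Python sketch, not Nlogit syntax: the function name best_worst_rows and the dictionary layout are hypothetical, and it assumes F is the omitted reference attribute, as in the example above. It builds the best pseudo-observation with every shown alternative coded 1/0, then the worst pseudo-observation with the best alternative removed and the attributes coded -1/0.

```python
ITEMS = ["A", "B", "C", "D", "E", "F"]
REFERENCE = "F"  # assumption: F is the omitted (reference) attribute
COLUMNS = [item for item in ITEMS if item != REFERENCE]


def best_worst_rows(scenario, shown, best, worst):
    """Return the long-format rows for one best-worst task.

    shown: attributes in the order they appeared (alternatives 1, 2, 3)
    best, worst: the attributes picked as best and as worst
    """
    if best == worst or best not in shown or worst not in shown:
        raise ValueError("best and worst must be two different shown attributes")

    rows = []

    # Best pseudo-observation: all shown alternatives, dummies coded 1/0.
    for alt, item in enumerate(shown, start=1):
        row = {"Scenario": scenario, "BW": 1, "Altij": alt,
               "CSET": len(shown), "Choice": int(item == best)}
        row.update({col: int(col == item) for col in COLUMNS})
        rows.append(row)

    # Worst pseudo-observation: drop the best alternative, code -1/0.
    for alt, item in enumerate(shown, start=1):
        if item == best:
            continue
        row = {"Scenario": scenario, "BW": -1, "Altij": alt,
               "CSET": len(shown) - 1, "Choice": int(item == worst)}
        row.update({col: -int(col == item) for col in COLUMNS})
        rows.append(row)

    return rows


# The task from the example: D, C and E shown, C best, E worst.
for row in best_worst_rows(1, ["D", "C", "E"], best="C", worst="E"):
    print(*row.values())
```

Calling the function once per task and stacking the results would give the full data for one respondent.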

Hope this helps.


John
johnr
 
Posts: 168
Joined: Fri Mar 13, 2009 7:15 am

Re: Best-Worst (Case 1) dataset development

Postby mbarrowc » Wed Dec 19, 2018 3:53 am

John and Michiel,
Thank you both for your responses.

John, I made the suggested changes but am still running into collinearity issues. When I try to run a conditional logit model, it omits the final variable (e.g., "E") because of collinearity. I am using Stata/IC (version 15) for my analysis, so that may be the reason, as it is not Nlogit; I had to modify your formatting slightly because of it. Below is my coding format following your suggestion. In this example, the respondent is shown attributes A, B, and C; attribute "B" is chosen as best and attribute "A" as worst. The "ID" variable identifies the "group" of possible outcomes from which the choice is made; this is needed for the clogit command in Stata. Thus, if 1 respondent answered 10 scenarios, the ID would run from 1 to 20. "ID" keeps increasing as respondents are added (e.g., the second respondent would have ID values from 21 to 40).

Scenario  ID  Choice   A   B   C   D   E
1          1       0   1   0   0   0   0
1          1       1   0   1   0   0   0
1          1       0   0   0   1   0   0
1          2       1  -1   0   0   0   0
1          2       0   0   0  -1   0   0
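
The ID numbering described above can be written as a small formula. This is a minimal Python sketch, assuming each scenario contributes exactly two groups (best first, then worst) and that respondents and scenarios are numbered from 1. The function name clogit_group_id is hypothetical and is not part of Stata.

```python
def clogit_group_id(respondent, scenario, bw, n_scenarios=10):
    """Return a unique group ID for one best or worst choice set.

    respondent, scenario: 1-based indices
    bw: 1 for the best choice set, -1 for the worst choice set
    """
    if bw not in (1, -1):
        raise ValueError("bw must be 1 (best) or -1 (worst)")
    within_scenario = 1 if bw == 1 else 2  # best set first, then worst set
    groups_per_respondent = 2 * n_scenarios
    return ((respondent - 1) * groups_per_respondent
            + (scenario - 1) * 2
            + within_scenario)


# First respondent spans 1..20, second respondent spans 21..40.
print(clogit_group_id(1, 1, 1), clogit_group_id(1, 10, -1))
print(clogit_group_id(2, 1, 1), clogit_group_id(2, 10, -1))
```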

I have tried another coding format but run into the same issue; an example is below. In this format, every possible outcome of a scenario is listed. Because I show respondents 3 attributes (out of 6), there are 6 possible outcomes per scenario. In the example below, respondents are presented with attributes A, B, and C and asked to choose which is best and which is worst. In the attribute columns (A-E), a 1 signifies "Best" and a -1 signifies "Worst". The remaining 9 scenarios presented to respondents are coded similarly. Unlike the format above, the ID variable for the first respondent here runs from 1 to 10.

Scenario  ID  Choice   A   B   C   D   E
1          1       0   1  -1   0   0   0
1          1       0   1   0  -1   0   0
1          1       0  -1   1   0   0   0
1          1       1   0   1  -1   0   0
1          1       0  -1   0   1   0   0
1          1       0   0  -1   1   0   0
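
The six rows of this second format are the ordered (best, worst) pairs that can be formed from the three attributes shown. This is a minimal Python sketch that enumerates them. The function name paired_rows is hypothetical, and the chosen pair passed in below (B best, C worst) is simply the row flagged with Choice = 1 in the table above.

```python
from itertools import permutations

COLUMNS = ["A", "B", "C", "D", "E"]  # F is not included as a column here


def paired_rows(scenario, group_id, shown, best, worst):
    """List every ordered (best, worst) pair of the shown attributes.

    Each row is [Scenario, ID, Choice, A, B, C, D, E], with 1 marking the
    attribute taken as best and -1 the attribute taken as worst.
    """
    rows = []
    for b, w in permutations(shown, 2):
        coded = [1 if col == b else -1 if col == w else 0 for col in COLUMNS]
        chosen = int((b, w) == (best, worst))
        rows.append([scenario, group_id, chosen] + coded)
    return rows


for row in paired_rows(1, 1, ["A", "B", "C"], best="B", worst="C"):
    print(*row)
```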

Any additional help or guidance would be greatly appreciated. Thank you again.

-Mike

Re: Best-Worst (Case 1) dataset development

Postby johnr » Wed Dec 19, 2018 11:18 am

Hi Mike

The first coding looks correct to me; the second does not at first glance. Note that I originally included only attributes A to E (this would be 1 to 5), as you mentioned attributes A to F (1 to 6). Hence, I had already dropped one. If the software is dropping E as well, then there is definitely something more going on with your data. Can you send me a sample of your data so I can take a look? Please note, my sole experience with Stata was to open the software and then close it 5 seconds later, so whatever you send, I will probably convert to Nlogit or Biogeme formatting. No biggie, leave that with me.
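
One way to see why a reference attribute has to be left out is to check the rank of the coded attributes after removing each choice set's column means, since a conditional logit is identified only from variation within a choice set. The sketch below is illustrative Python with made-up choice sets (not Mike's data), and the helper names are hypothetical. With all six indicators, the columns sum to the same value in every row of a set, so one column is redundant; with F dropped, the remaining five have full rank in this example.

```python
from fractions import Fraction

ITEMS = ["A", "B", "C", "D", "E", "F"]

# Hypothetical choice sets: (attributes shown, sign), where sign is
# 1 for a best set and -1 for a worst set.
SETS = [
    (["A", "B", "F"], 1),
    (["B", "F"], -1),
    (["C", "D", "E"], 1),
    (["B", "C", "F"], 1),
    (["A", "D", "E"], 1),
]


def coded_sets(columns):
    """Code each alternative as sign * indicator over the given columns."""
    return [[[sign * int(col == item) for col in columns] for item in shown]
            for shown, sign in SETS]


def demean_within(groups):
    """Subtract each column's mean within every choice set."""
    out = []
    for group in groups:
        means = [Fraction(sum(col), len(group)) for col in zip(*group)]
        out.extend([[Fraction(v) - m for v, m in zip(row, means)]
                    for row in group])
    return out


def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination in exact arithmetic."""
    m = [row[:] for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


rank_all_six = rank(demean_within(coded_sets(ITEMS)))      # six columns
rank_without_f = rank(demean_within(coded_sets(ITEMS[:-1])))  # five columns
print(rank_all_six, rank_without_f)
```

If a package still drops one of the five remaining attributes, a check of this kind on the actual data would show whether the coding itself is rank deficient.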

The reason the second coding doesn't look right to me is that I'm guessing the best and worst data are perfectly collinear in such a setup. I could be wrong (not the first time, not even the first time today, and certainly not the first time in the last hour). Anyway, I know that the first coding structure is how others code BW data (based on workshops I have attended), so let's keep that for the moment.

So email me a portion of your data and we will get this sorted.

John (john.rose-1@uts.edu.au).

Re: Best-Worst (Case 1) dataset development

Postby mbarrowc » Sat Dec 22, 2018 2:58 am

Thanks John. I just sent it over.
-Mike

Re: Best-Worst (Case 1) dataset development

Postby johnr » Thu Dec 27, 2018 1:05 pm

Hi Mike

I've sent you model results from Nlogit, assuming the data structure I suggested. The model appears to work fine, so I'm now wondering whether this is a Stata issue. As I'm not a Stata user, I can't comment and am at a loss. Certainly the data are not the issue, but maybe the format of the data for Stata is?

John