## Gathering Pilot data

### Gathering Pilot data

Hello,

We are attempting to model the effect of a service improvement on a major bus corridor. We present our respondents with two choices: take the bus or drive.

When we collected some pilot data and ran a model to generate our priors, we were unsure whether to use a binomial or a multinomial logit model. We decided on multinomial because our data is incomplete (in reality there are other options, such as cycling and walking), so our probabilities are not as simple as p and 1-p. Interestingly, though, with only two response options on our form, they are?

We ran both model types out of interest and got significantly different results. Could you please explain a little about how the MNL calculates probability? Do the probabilities sum to 1 across the choice options, as might be expected?

Secondly, we conducted a small pilot and obtained priors. As we entered our data, out of interest we ran our model over it and got better priors (with a smaller sample size). Is this a valid methodology? Our assumption is that we cannot use the survey responses used to generate the priors in the final model. Is this correct?

Regards
from the Unisa team.
Unisa

Posts: 11
Joined: Tue Jul 25, 2017 4:09 pm

### Re: Gathering Pilot data

If there are only two choice alternatives, then a multinomial logit model = binomial logit model. BI means two alternatives, MULTI means two or more. So I am not sure why you get different results, binomial logit is a special case of multinomial logit and should give exactly the same results with two alternatives (unless you also included a no-choice alternative).
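As a minimal sketch of this equivalence (the utility values are hypothetical and chosen only for illustration), the MNL probabilities are the exponentiated utilities divided by their sum, so they always sum to 1, and with two alternatives they reduce to the binary logit formula on the utility difference:

```python
import math


def mnl_probabilities(utilities):
    """MNL choice probabilities: exp(U_j) / sum_k exp(U_k)."""
    # Subtracting the maximum does not change the result; it only avoids overflow.
    largest = max(utilities)
    exp_u = [math.exp(u - largest) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def binary_logit_probability(u_first, u_second):
    """Binary logit probability of the first alternative."""
    return 1.0 / (1.0 + math.exp(-(u_first - u_second)))


# Hypothetical utilities for car and bus (not estimated from any data)
u_car, u_bus = -1.2, -0.7

p_car, p_bus = mnl_probabilities([u_car, u_bus])
print("MNL:   ", p_car, p_bus, "sum =", p_car + p_bus)
print("Binary:", binary_logit_probability(u_car, u_bus))
```

Both lines report the same probability for car, which is why two correctly specified models on two alternatives should not give different estimates.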

I am not sure what you mean with (we ran our model over it and got better priors). It is not possible to assess which priors are 'better' since we do not know the real value. A lower sample size does not necessarily mean that your priors are closer to the true parameter values.

I am also not sure what you mean with "use the survey responses to generate the priors in the final model". The parameters estimated from the pilot study can be used as priors for generating an efficient design for the final study.

Michiel
Michiel Bliemer

Posts: 598
Joined: Tue Mar 31, 2009 4:13 pm

### Re: Gathering Pilot data

Dear Michiel
Thank you for the quick reply. I will try to clarify the question with a brief description of my research.

I am presently working on a bus route in Adelaide and am investigating the outcomes of transforming the route into a (hypothetical) BRT corridor. The next step I took was to undertake a survey to understand whether existing car users would be inclined to use public buses (mode shift from cars) for their regular main trips, based on the improvements.

My survey is designed for the respondents to choose one alternative from either cars (Alternative 1) or improved buses (Alternative 2). The selected attributes include Peak Travel Time, Peak Frequency and Access & Egress Walking for both alternatives, plus one additional attribute, Transfer Wait and Delay, for the buses.

As the respondents choose only one of the two options, I considered that the binary logit model should be used when analysing the data. However, I noted in the literature that MNL has been used for similar situations with two alternatives, which made us very unsure. So I assume that our understanding of when to use the binary model may not be correct.

(Q1) Could you please provide some guidance on this?

The priors for the attributes were initially assumed and were distributed in the form of normalised values based on our best assumptions of their importance in making a choice decision.

After obtaining approximately 20 surveys, the data were analysed using NLogit, which showed results consistent with the signs of the coefficients and the coefficient values. In NLogit, we selected the “Discrete Choice >> Discrete Choice” option from the model tab. We also ran the “Binary Choice >> Logit” option expecting to get the same results, but they were not the same.

(Q3) Could you please advise which one I should choose for my design?

Based on the results, we further revised the priors, ran iterations in Ngene and obtained results with better "d", "a" and "s" values. After running the program several times, I am also getting consistent results in those areas. So I would be very keen to know:

(Q4) whether, methodologically, I can use real survey data as a pilot survey.
(Q5) if I choose to follow the new survey design, would there be any scope for me to use the data already collected with the old design in NLogit?
(Q6) Would this be a correct/valid approach overall?

This has become a very long message, and I apologise for that. I would be grateful if you could please have a look and comment when you get a chance.

Have a very happy new year.

Kind regards
Munshi
Unisa

Posts: 11
Joined: Tue Jul 25, 2017 4:09 pm

### Re: Gathering Pilot data

Hi Munshi

Econometrically, the binary logit and MNL models are the same model. The distinction is simply that the binary model has two alternatives, whereas the MNL model can have two or more alternatives.

Traditionally, the binary model was used for yes/no type choices, for example: are you in the labour force, yes or no? You would have a utility function for one of the alternatives (say yes), and set the other utility function (no in this case) to zero. As utility is relative and not absolute, the utility for yes will be relative to the utility for no, which was arbitrarily set to zero (you can theoretically set it to any value). However, you can also assign a (non-zero) utility function to both alternatives, as in your case, provided you set the constant of one of them to zero.

So you can have

U(car) = CarASC + Beta_travel_time_car*TT_car + Beta_cost_car*cost_car
U(bus) = Beta_travel_time_bus*TT_bus + Beta_cost_bus*cost_bus

Note that the utility functions for the MNL in the above example would be exactly the same. The models are equivalent.

In generating the design, the utility functions you assume (levels and betas) should match the utility functions of the model you think you are going to estimate. The objective of finding the best design is to minimise some measure of the AVC (asymptotic variance-covariance) matrix for the model you are planning to estimate (as translated from the design to the data). If you are going to use normalised values when you estimate your model, you should optimise your design for this also.

Looking at NLogit, the binary choice model looks to me to be of the yes/no type I described above, where you specify only one utility function and set the second to zero. That is

U(car) = CarASC + Beta_travel_time_car*TT_car + Beta_cost_car*cost_car
U(bus) = 0

Whereas in the MNL option, you probably have

U(car) = CarASC + Beta_travel_time_car*TT_car + Beta_cost_car*cost_car
U(bus) = Beta_travel_time_bus*TT_bus + Beta_cost_bus*cost_bus

The two are not functionally equivalent as you are not estimating the same thing. If that is the case, your results are going to be different.
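A small sketch of the difference between the two specifications follows. The parameter values and attribute levels are hypothetical, and the function names are made up for this illustration; it is not a reproduction of what NLogit does internally. With U(bus) fixed at zero, the bus attributes cannot influence the predicted probability at all, whereas with two utility functions they do:

```python
import math

# Hypothetical parameter values, for illustration only
CAR_ASC = 0.5
BETA_TT_CAR, BETA_COST_CAR = -0.05, -0.10
BETA_TT_BUS, BETA_COST_BUS = -0.04, -0.15


def logit_p_car(u_car, u_bus):
    """Logit probability of choosing car given both utilities."""
    return 1.0 / (1.0 + math.exp(u_bus - u_car))


def p_car_single_utility(tt_car, cost_car, tt_bus, cost_bus):
    """Yes/no style model: only U(car) is specified, U(bus) = 0."""
    u_car = CAR_ASC + BETA_TT_CAR * tt_car + BETA_COST_CAR * cost_car
    return logit_p_car(u_car, 0.0)  # bus attributes never enter the model


def p_car_two_utilities(tt_car, cost_car, tt_bus, cost_bus):
    """Both alternatives have their own utility function."""
    u_car = CAR_ASC + BETA_TT_CAR * tt_car + BETA_COST_CAR * cost_car
    u_bus = BETA_TT_BUS * tt_bus + BETA_COST_BUS * cost_bus
    return logit_p_car(u_car, u_bus)


# Same car attributes, but the bus is 20 minutes slower in the second scenario
fast_bus = dict(tt_car=50, cost_car=5, tt_bus=40, cost_bus=3)
slow_bus = dict(tt_car=50, cost_car=5, tt_bus=60, cost_bus=3)

print("single utility:", p_car_single_utility(**fast_bus), p_car_single_utility(**slow_bus))
print("two utilities: ", p_car_two_utilities(**fast_bus), p_car_two_utilities(**slow_bus))
```

The first pair of probabilities is identical across the two scenarios, while the second pair changes with the bus travel time, so the two specifications are fitting different models to the same data.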

John
johnr

Posts: 148
Joined: Fri Mar 13, 2009 7:15 am

### Re: Gathering Pilot data

To add to John's reply, addressing Q4-Q6: A pilot survey is essentially no different than a 'real' survey. Both are surveys that collect data on choice behaviour. A pilot survey is often useful to obtain parameter values to inform the priors in creating an efficient design. You can combine the two data sets in estimation, but since the underlying design may be different, you may want to use a nested logit structure in order to account for any scale differences in the two datasets.
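To illustrate what a scale difference means, here is a minimal sketch. The utilities and scale values are hypothetical, and which data set would have the higher or lower scale is an assumption made only for this example. The same underlying preferences produce different choice probabilities when the utilities are multiplied by a different scale, which is why pooling the two data sets without accounting for scale can distort the estimates:

```python
import math


def bus_probability(v_car, v_bus, scale=1.0):
    """Binary logit P(bus) when both utilities are multiplied by a dataset-specific scale."""
    return 1.0 / (1.0 + math.exp(-scale * (v_bus - v_car)))


# Hypothetical systematic utilities, identical in both data sets
v_car, v_bus = -1.2, -0.7

# Assumed scales: 1.0 for one data set, 0.5 (more error variance) for the other
p_high_scale = bus_probability(v_car, v_bus, scale=1.0)
p_low_scale = bus_probability(v_car, v_bus, scale=0.5)

print("P(bus), scale 1.0:", p_high_scale)
print("P(bus), scale 0.5:", p_low_scale)
```

With the lower scale the probability moves towards 0.5, even though the preferences themselves have not changed.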

Michiel
Michiel Bliemer

Posts: 598
Joined: Tue Mar 31, 2009 4:13 pm

### Re: Gathering Pilot data

Dear John and Michiel

I am very grateful that you took the time to answer my questions in such detail. I am a beginner and am undertaking three more choice analyses as part of my overall research, and I found your answers very helpful in clarifying my understanding of the DCE concept. Thank you.

From your answers and from the information I found recently on this forum, I can readily see that I have potentially made an error in my design by “NOT Including a Constant” in the utility functions, which is recommended for labelled alternatives. I would like to take this opportunity to copy both my initial and recent Ngene syntaxes and their respective results for you to have a quick glimpse at, and to seek a response to some further queries listed at the end.

Initial Syntax (S1) – Priors were assumed and values were distributed in a way that adds up to 1 for each alternative. (I had started my survey based on this design.)

Design
?Trial
;Alts = Car,Improved Bus
;rows = 12
;Eff =(mnl,s)
;model:
U(Car)=A[-0.55]*TTCar[45,48,51,54]+B[-0.45]*AEWCar[0,5,10,15]/
U(Improved Bus )=C[-0.35]*TTImproved Bus[39,42,45,48]+D[-0.20]*SFImproved Bus[6,10]+E[-0.25]*AEWImproved Bus[5,10,15,20]+D*TWDImproved Bus[0,5,10]
$

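MNL efficiency measures

| Measure | Value |
| --- | --- |
| D error | 0.528789 |
| A error | 3.047175 |
| B estimate | 7.850689 |
| S estimate | 145.1522 |

| Prior | a | b | c | d | e |
| --- | --- | --- | --- | --- | --- |
| Fixed prior value | -0.55 | -0.45 | -0.35 | -0.2 | -0.25 |
| Sp estimates | 47.5106 | 61.1318 | 145.1522 | 123.214 | 145.1086 |
| Sp t-ratios | 0.284355 | 0.250682 | 0.162684 | 0.176574 | 0.162708 |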

Current Syntax (S2) - Priors were adjusted from the coefficients observed from the analysis of 18 survey responses using NLogit.

Design
?Trial
;Alts = Car,Improved Bus
;rows = 12
;Eff =(mnl,s)
;model:
U(Car)=A[-0.15]*TTCar[45,48,51,54]+B[-0.35]*AEWCar[0,5,10,15]/
U(Improved Bus)=A*TTImproved Bus[39,42,45,48]+C[-0.25]*SFImproved Bus[6,10]+B*AEWImproved Bus[5,10,15,20]+C*TWDImproved Bus[0,5,10]
$

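MNL efficiency measures

| Measure | Value |
| --- | --- |
| D error | 0.054059 |
| A error | 0.603817 |
| B estimate | 55.11679 |
| S estimate | 38.53202 |

| Prior | a | b | c | d | e | f |
| --- | --- | --- | --- | --- | --- | --- |
| Fixed prior value | -0.3 | -0.7 | -0.2 | -0.1 | -0.25 | -0.45 |
| Sp estimates | 16.9997 | 13.57093 | 28.89348 | 38.53202 | 23.38713 | 13.50741 |
| Sp t-ratios | 0.475374 | 0.532049 | 0.364633 | 0.315751 | 0.405292 | 0.533298 |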
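For reading the two result tables, the reported numbers appear consistent with the following relationship: each Sp estimate is the sample size at which the single-respondent t-ratio, scaled by the square root of the sample size, reaches 1.96, and the overall S estimate is the largest of the Sp estimates. The sketch below checks this against the values copied from the tables; the critical value of 1.96 and the function names are assumptions made for this illustration.

```python
# Sp t-ratios copied from the S1 and S2 result tables
S1_T_RATIOS = {"a": 0.284355, "b": 0.250682, "c": 0.162684,
               "d": 0.176574, "e": 0.162708}
S2_T_RATIOS = {"a": 0.475374, "b": 0.532049, "c": 0.364633,
               "d": 0.315751, "e": 0.405292, "f": 0.533298}

CRITICAL_T = 1.96  # assumed: 95% two-sided critical value


def sample_size_for_significance(t_ratio_one_respondent, critical_t=CRITICAL_T):
    """Sample size N at which t_ratio * sqrt(N) reaches critical_t."""
    return (critical_t / t_ratio_one_respondent) ** 2


def sp_estimates(t_ratios):
    """Apply the sample size formula to every parameter."""
    return {name: sample_size_for_significance(t) for name, t in t_ratios.items()}


s1 = sp_estimates(S1_T_RATIOS)
s2 = sp_estimates(S2_T_RATIOS)

for label, estimates in (("S1", s1), ("S2", s2)):
    rounded = {name: round(value, 2) for name, value in estimates.items()}
    print(label, rounded, "largest:", round(max(estimates.values()), 2))
```

Under this reading, the drop in the S estimate from about 145 to about 39 comes from the larger single-respondent t-ratios under the revised priors, so the comparison between S1 and S2 depends on which set of priors is closer to the truth.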

A. It has been my understanding that the priors for the same attributes could differ for different alternatives;
B. In Syntax S2, the coefficient values from the survey data did vary (the signs were the same); however, I have adjusted the priors in accordance with the overall ratio that was found from the MNL results.

Based on the above, could you please comment on the following:

Q1 – How important is the value of a constant in a design such as mine? As I have not added a constant in my syntax would the results be now considered as incorrect?
Q2 – Is there any way that I can adjust the design to accommodate the “constant” if it is important? Or do I need to go through the experiment process all over again?
Q3 – Referring to the notes above (A and B), could you please advise if my understanding of the priors is correct, i.e. I have used it with some degree of flexibility based on the level of assumed influences of the attributes on someone’s choice making decision for a particular alternative?

Thank you once again for the advice you both have provided.

Wish you a very happy new year.

Kind regards
Munshi
munshi.nawaz

Posts: 8
Joined: Thu Dec 28, 2017 6:15 pm

### Re: Gathering Pilot data

Hi Munshi,

Q1 - Constants change the predicted probabilities and hence change the efficiency of the design. Leaving out the constants will therefore lead to a loss of efficiency in your case, but that does not mean that your design is not useful. You do not need an optimally efficient design to estimate your model; even random designs will typically work (although they require a larger sample size).

Q2 - You cannot manually adjust your design. If you include constants with non-zero priors, you will need to find a new efficient design by running the syntax again. If you do not know the priors of your constants and you set them to zero, then your current design is fine. Note that, as in my response to Q1, you do not HAVE to re-optimise your design: you will still be able to estimate your parameters, but re-optimising the design with the constants will likely reduce your standard errors further. If you are collecting data from a large number of respondents, this is less important than if you only have a small number of respondents.

Q3 - Yes this understanding is correct.
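To make the point in Q1 and Q2 concrete, the sketch below evaluates one fixed design under two different priors for a car constant. Everything in it is an assumption for illustration: the four choice tasks are made up from the attribute levels in the syntax, the priors are hypothetical, and the D-error is computed in the usual textbook way as the determinant of the inverse MNL Fisher information for a single respondent, raised to the power 1/K.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def fisher_information(design, betas):
    """MNL information matrix for one respondent, summed over choice tasks."""
    k = len(betas)
    info = [[0.0] * k for _ in range(k)]
    for task in design:
        utils = [sum(b * x for b, x in zip(betas, row)) for row in task]
        top = max(utils)
        exp_u = [math.exp(u - top) for u in utils]
        probs = [e / sum(exp_u) for e in exp_u]
        mean = [sum(p * row[i] for p, row in zip(probs, task)) for i in range(k)]
        for p, row in zip(probs, task):
            dev = [row[i] - mean[i] for i in range(k)]
            for i in range(k):
                for j in range(k):
                    info[i][j] += p * dev[i] * dev[j]
    return info


def d_error(design, betas):
    """det(AVC)^(1/K), where AVC is the inverse of the information matrix."""
    return determinant(fisher_information(design, betas)) ** (-1.0 / len(betas))


# Each row: [car constant dummy, travel time, access/egress walk]; first row car, second bus
DESIGN = [
    [[1, 45, 15], [0, 48, 5]],
    [[1, 48, 0], [0, 39, 20]],
    [[1, 51, 10], [0, 45, 10]],
    [[1, 54, 5], [0, 42, 15]],
]
BETA_TT, BETA_WALK = -0.05, -0.04  # hypothetical generic priors

d_error_zero_constant = d_error(DESIGN, [0.0, BETA_TT, BETA_WALK])
d_error_nonzero_constant = d_error(DESIGN, [1.0, BETA_TT, BETA_WALK])
print("D-error, constant prior 0.0:", d_error_zero_constant)
print("D-error, constant prior 1.0:", d_error_nonzero_constant)
```

The design itself is unchanged between the two evaluations; only the prior for the constant differs, and that alone changes the predicted probabilities and therefore the D-error. A design optimised under one value of the constant is still usable under another, just not necessarily the most efficient one.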

I notice that you are using the coefficient C for two different attributes. Therefore, your utility function will contain C * (SFImproved Bus + TWDImproved Bus), where these two attributes are assumed to be added together in the mind of respondents and to have the same preference weight. You can also estimate separate coefficients C1 and C2 and test whether respondents indeed perceive them the same.
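As a small sketch of what sharing the coefficient implies (only the two affected terms of the bus utility are shown; the value for C is the prior from syntax S2, while the C1 and C2 values and the function names are hypothetical):

```python
def shared_coefficient_terms(c, sf, twd):
    """Contribution of SF and TWD when both use the same coefficient C."""
    return c * sf + c * twd


def separate_coefficient_terms(c1, c2, sf, twd):
    """Contribution of SF and TWD with their own coefficients C1 and C2."""
    return c1 * sf + c2 * twd


c = -0.25          # prior for C in syntax S2
sf, twd = 10, 5    # one combination of levels from the syntax

shared = shared_coefficient_terms(c, sf, twd)
print("shared C:     ", shared, "= C * (SF + TWD) =", c * (sf + twd))

# Hypothetical separate priors: with C1 != C2 the two attributes are weighted differently
print("separate C1/C2:", separate_coefficient_terms(-0.25, -0.10, sf, twd))
```

The shared specification is the special case C1 = C2 of the separate one, which is what makes the restriction testable after estimation.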

Michiel
Michiel Bliemer

Posts: 598
Joined: Tue Mar 31, 2009 4:13 pm

### Re: Gathering Pilot data

Dear Michiel

Hope you had a lovely Christmas and a New Year.

I sincerely appreciate you making the time to answer my questions in detail. Thank you.

I will continue investigating how I can address the comments you made. I may come back with some further questions in the near future.

I have found this forum very helpful. I will also have to learn about the analysis of the data (using NLogit) soon, so I was hoping you could suggest a forum similar to this one (if there is any) that I can join as a student to learn from the discussions about NLogit.

Kind Regards
Munshi
munshi.nawaz

Posts: 8
Joined: Thu Dec 28, 2017 6:15 pm

### Re: Gathering Pilot data

I believe you can access the Nlogit forum here: http://www.limdep.com/listserver/
Michiel Bliemer

Posts: 598
Joined: Tue Mar 31, 2009 4:13 pm

### Re: Gathering Pilot data

Thank you Michiel.
munshi.nawaz

Posts: 8
Joined: Thu Dec 28, 2017 6:15 pm
