Gathering Pilot data


Re: Gathering Pilot data

Postby munshi.nawaz » Mon Sep 03, 2018 3:03 pm

Dear Michiel

My question is more about Nlogit than Ngene; however, it is based on the efficient design I have been working with.

Since our last correspondence, I have been conducting a household questionnaire survey to understand how likely the general public is to use an improved bus service (potentially a BRT) along my study corridor. Brief details are as follows:

A) I had to design three sets of surveys (which I am calling MA, MB and MC), with questions designed for people living in different locations. The samples obtained so far are MA = 23, MB = 12 and MC = 43, based on the sample size generated by Ngene.

B) The scenario questions were produced separately in Ngene, using separate syntax and (similar) priors. However, the following were the same:
• There are 2 alternatives: (1) Private Car and (2) Improved Bus
• The attributes are the same: (1) Travel time, (2) Peak service frequency (headway), (3) Access and egress walking time, and (4) Transfer wait and delay.

C) The attribute levels were different: firstly, because of the different locations considered, and secondly, because attribute 4 (transfer wait and delay) was only applicable to residents of the farther suburbs who live at least 500 m from the study corridor.

I have not included my syntax, to keep this message brief; please let me know if you would like to see it.

Based on the above, I would like to seek your advice on the following questions:
Q1. Can I combine all three datasets in one spreadsheet for analysis?
Q2. If not, is there any other mechanism to get a combined outcome?
Q3. I also used 18 questionnaires as a pilot before arriving at the final design of one of the above surveys, as I mentioned in my previous questions. Again, only the attribute levels changed between the pilot and the new design. As you indicated in your earlier reply, could you please advise me on how I can include these pilot data in my current analysis?
Q4. My understanding of the nested logit model so far is that it is based on nesting similar types of alternatives together. Is there any other basis for the nesting?

I would be very grateful for any advice on this matter. It would help me immensely in moving forward with my dataset and the analysis.

Thank you in advance.

Yours sincerely
Munshi
munshi.nawaz
 
Posts: 8
Joined: Thu Dec 28, 2017 6:15 pm

Re: Gathering Pilot data

Postby Michiel Bliemer » Mon Sep 03, 2018 3:15 pm

Q1: Yes, you can. Also include your scenario variables in the same dataset, so that you can use the variables that define MA, MB, and MC in your utility functions.

Q2: See Q1

Q3: You can combine the data from the pilot study with your other data in the same way.

Q4: Nested logit models are useful when you expect differences in scale (= differences in error variance). There may be scale differences between your main dataset and your pilot dataset (e.g., because one came from an orthogonal design and the other from an efficient design, which typically yields more difficult questions by forcing respondents to make trade-offs), or perhaps between data from different scenarios. Nested logit models are often used to handle scale differences across datasets.
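As a rough illustration of the pooling described under Q1 and Q3, the sketch below stacks observations from several survey versions into one dataset and adds 0/1 indicator columns that identify where each row came from. The column names, the example rows, and the choice of MA as the base are all hypothetical; the actual layout depends on how the data are prepared for Nlogit.

```python
# Hypothetical choice observations, one list of rows per survey version.
datasets = {
    "MA": [{"resp_id": 1, "time_car": 30, "time_bus": 40, "choice": 1}],
    "MB": [{"resp_id": 1, "time_car": 45, "time_bus": 50, "choice": 2}],
    "MC": [{"resp_id": 1, "time_car": 60, "time_bus": 55, "choice": 2}],
    "PILOT": [{"resp_id": 1, "time_car": 35, "time_bus": 45, "choice": 1}],
}

BASE = "MA"  # assumed base version: it gets no indicator column


def pool(datasets, base=BASE):
    """Stack all rows and add one 0/1 indicator per non-base source."""
    indicator_sources = [name for name in datasets if name != base]
    pooled = []
    for source, rows in datasets.items():
        for row in rows:
            record = dict(row)          # copy so the input is not modified
            record["source"] = source   # keep a readable label as well
            for name in indicator_sources:
                record["is_" + name] = 1 if source == name else 0
            pooled.append(record)
    return pooled


pooled = pool(datasets)
for record in pooled:
    print(record)
```

The indicator columns can then enter the utility functions as ordinary variables, or be used to define the groups between which scale is allowed to differ.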

Michiel
Michiel Bliemer
 
Posts: 1705
Joined: Tue Mar 31, 2009 4:13 pm

Re: Gathering Pilot data

Postby munshi.nawaz » Mon Sep 03, 2018 4:12 pm

Dear Michiel
I sincerely appreciate your advice and guidance. It will greatly help me continue with my analysis.
Thank you.

Yours sincerely
Munshi

Re: Gathering Pilot data

Postby munshi.nawaz » Sat Oct 06, 2018 3:12 am

Dear Michiel

Hope you are well.

First, I would like to let you know that I was unable to subscribe to the Limdep forum as you advised; it showed an error. I tried to report this to the "limdep-owner@limdep.itls.usyd.edu.au" email address as instructed on the page, but that message also bounced. Therefore, although my questions relate to choice analysis, I am posting my query on this forum to request advice.

As part of my research, I am undertaking two SP surveys: one to understand people's mode choice preferences, and the other to understand their residential choice preferences.

I used Ngene for the experimental designs of both (as I sought advice on in the past), and have collected data for the mode choice survey up to the minimum sample size it prescribed.

For the residential choice survey, I initially obtained 30 responses, analysed these as a pilot, and came up with a new design (more efficient than the first). Based on the new design I needed 109 respondents. Unfortunately, I have so far obtained only approximately 65 responses (mostly through face-to-face interviews, and also from a letter drop: approximately 1250 letters produced a response rate of approximately 2%).

I am nearly at the end of my research tenure and am very anxious about the number of responses available.

Hence, I would be grateful if you could please advise me on the following:

1. My understanding is that the minimum sample size produced by Ngene (109 in my case, for my new design) provides a 95% confidence level. How can I calculate the confidence level with 65 returned surveys?
1.(a) Would the mathematical formula for econometric models differ from that for conventional transport/social surveys (I note that there is a large literature on calculating survey sample sizes)?

2. If I continue my analysis with the data (65) I have collected so far, what would it mean for the quality of the data and, hence, the quality of the research?

3. I have a total of 95 surveys completed to date. Thirty were based on the old design, but with the same attributes and levels as the new 65. Would the old 30 be of any help in the model, and in improving the "confidence level"?

I look forward to your advice.

Again, I am very grateful to you, Prof John Rose and the forum for all the help I have received so far.

Yours sincerely

Munshi

Re: Gathering Pilot data

Postby Michiel Bliemer » Sun Oct 07, 2018 12:38 pm

I have sent an email to the people at Econometric Software (they make the Nlogit software and maintain the forum). Note that Econometric Software is a different company from ChoiceMetrics, and hence we cannot assist further with your subscription to their forum. You may want to contact support@limdep.com to let them know that the email address on their website does not work.

1. First, note that sample size calculations are only exact when the priors are correct. Priors are only a best guess, and hence sample size calculations are also only a best guess (which means that 109 could be 50 or 200 if the parameters deviate from your priors). N = 109 means that the t-ratio = beta / (sterror/sqrt(N)) > 1.96, where beta is the value of the prior and sterror is the corresponding standard error for one design replication (which you can find by taking the square root of the corresponding diagonal element of the AVC matrix reported for the design in Ngene). If N = 65, the t-ratio will be lower, and hence your parameters will have wider confidence intervals, i.e. lower statistical significance and less reliable parameter estimates. However, you should not calculate confidence levels based on priors in Ngene, but rather based on parameter estimates in Nlogit. Nlogit reports both the beta and sterror for each parameter.

2. It means that when testing the hypothesis H0: beta = 0 versus H1: beta <> 0 you will have less statistical power, i.e., it will become more difficult to reject the null hypothesis and show that the parameters are statistically significant (i.e. not 0).

3. Yes, you can use all 95 surveys in model estimation, and it will help improve the reliability of your parameter estimates.
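To make the relationship in points 1 and 2 concrete, here is a minimal sketch of how the expected t-ratio grows with sqrt(N), and of the approximate power of the two-sided test at a given expected t-ratio. The values of beta and sterror are invented for illustration only (they are not taken from any actual design), and the power formula is a standard normal approximation that assumes the prior equals the true parameter.

```python
import math
from statistics import NormalDist

Z_CRIT = 1.96  # two-sided 5% critical value, as in the text


def expected_t_ratio(beta, se_one_replication, n):
    """t-ratio = beta / (sterror / sqrt(N)); abs() so negative priors work too."""
    return abs(beta) / (se_one_replication / math.sqrt(n))


def required_sample_size(beta, se_one_replication, z=Z_CRIT):
    """Smallest N for which the expected t-ratio reaches z."""
    return math.ceil((z * se_one_replication / abs(beta)) ** 2)


def approximate_power(t_expected, z=Z_CRIT):
    """P(|t| > z) when the t-ratio is roughly normal around t_expected."""
    phi = NormalDist().cdf
    return phi(t_expected - z) + phi(-t_expected - z)


# Hypothetical prior and one-replication standard error (illustration only).
beta_prior = 0.4
se_one = 2.0

print("N needed for t > 1.96:", required_sample_size(beta_prior, se_one))
for n in (65, 95, 109):
    t = expected_t_ratio(beta_prior, se_one, n)
    print(f"N = {n}: expected t-ratio = {t:.2f}, "
          f"approximate power = {approximate_power(t):.2f}")
```

As noted above, once data are available the same quantities are better read off the estimated beta and sterror reported by Nlogit than recomputed from priors.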

Michiel

Re: Gathering Pilot data

Postby munshi.nawaz » Sun Oct 07, 2018 6:26 pm

Dear Michiel

I am very grateful for your time and for the detailed response. I will check my selected priors and the parameters, and I may have to get back to you with some further queries soon.

Thank you also for following up with Limdep and for forwarding the contact.

Apart from this issue, I would like to seek your quick advice on a different matter: how to interpret categorical variables (covariates).

The majority of my SDC variables are categorical, though most have some form of order in their categories: for example, 0/1 for "gender", or 1-10 for increasing ranges of "personal income", which can be explained in relation to their order.

I am struggling to interpret a few of the variables, such as "employment type", where the categories have no natural order, e.g. 1 = full-time work, 2 = pensioner, 3 = student, etc. I assume this issue has been addressed in the literature, but unfortunately I have not come across any reference to date. One of my colleagues is investigating whether SPSS can be used to explain the categories, as it shows results for all categories of a variable in the MNL regression analysis, but it is not clear how the link between the two systems could be established.

I would be grateful if you could please advise me on how I can interpret the "employment type" categories from Nlogit outputs (I am using the simple MNL model), or point me to some references that discuss this matter.

Thank you once again for the help.


Yours sincerely
Munshi

Re: Gathering Pilot data

Postby Michiel Bliemer » Sun Oct 07, 2018 9:03 pm

The interpretation of categorical variables depends on the coding you are using. As an analyst you have to select the coding. By far the most common coding scheme is called dummy coding, although some prefer effects coding or other contrast coding schemes. These coding schemes are used not only in logit models but also in regression models, and you will be able to find a lot of information about them online.

In Ngene you define the coding scheme explicitly. The syntax supports both dummy and effects coding. In estimation software like Nlogit, you create your own data using whatever coding scheme you prefer.

Taking your example, suppose we would like to include employment type in the model with the following levels:
1 = part-time work
2 = full-time work
3 = pensioner
4 = student
5 = unemployed

There is no particular ordering here, and it does not matter in what order you specify these levels; I could have selected 1 = student, etc.

Since there is no ordering, it is not appropriate to use the levels 1,2,3,4,5 directly in the utility function. In other words, the following Ngene syntax is not recommended, since it is difficult to interpret what b1 means:
U(alt1) = ... + b1 * employment[1,2,3,4,5] + ...

Instead, you can use dummy coding using the following syntax:
U(alt1) = ... + b1.dummy[..|..|..|..] * employment[1,2,3,4,5] + ...

In this case, you will not be estimating a single parameter b1, but rather you will be estimating 4 parameters:
b1(1) for level 1 (part-time work),
b1(2) for level 2 (full-time work),
b1(3) for level 3 (pensioner),
b1(4) for level 4 (student).
The last level in Ngene is considered the base level, so you will NOT estimate a parameter for level 5. (Since you can change the order of the levels, I could have selected another level as the base level; this does not matter for model estimation.)

Suppose that b1(1) = 0.5. This means that the utility of alt1 for a person in part-time employment will be 0.5 higher COMPARED TO SOMEONE WHO IS UNEMPLOYED. Similarly, if b1(4) = -0.2, the utility of alt1 for a student will be 0.2 lower compared to someone who is unemployed.

For level 5 (unemployed) there is no additional utility, i.e. it contributes 0 to utility. Note that only relative differences between utilities matter in logit models; the exact level of the utility is not important. U(alt1) = 1 and U(alt2) = 3 describes exactly the same choice behaviour as U(alt1) = 1.5 and U(alt2) = 3.5, so normalising one of the levels to 0 is appropriate (and without doing so you would not be able to estimate the model).
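The claim that only utility differences matter can be checked numerically. The short sketch below computes standard multinomial logit probabilities for the two pairs of utilities mentioned above; the function name is just an illustrative choice.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit: P(i) = exp(U_i) / sum_j exp(U_j)."""
    largest = max(utilities)  # subtracting the max avoids overflow
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


p_original = logit_probabilities([1.0, 3.0])  # U(alt1) = 1,   U(alt2) = 3
p_shifted = logit_probabilities([1.5, 3.5])   # U(alt1) = 1.5, U(alt2) = 3.5

print(p_original)
print(p_shifted)
```

Both calls depend only on the difference U(alt2) - U(alt1) = 2, so the two probability vectors coincide up to floating-point rounding.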

Consider choice observations from a full-time employed person (level 2), a pensioner (level 3), and a person who is unemployed (level 5).

The data for model estimation would contain 4 columns to represent 4 dummy coded variables (5 levels minus 1 for the base):
... 0 1 0 0 ... (this is dummy coding for level 2)
... 0 0 1 0 ... (this is dummy coding for level 3)
... 0 0 0 0 ... (this is dummy coding for level 5)
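For anyone building these columns by hand before estimation, the following sketch generates the dummy coded rows shown above, with level 5 as the base. The parameter values for levels 1 and 4 are the ones used in the example (0.5 and -0.2); the values for levels 2 and 3 are made up purely for illustration.

```python
LEVELS = {
    1: "part-time work",
    2: "full-time work",
    3: "pensioner",
    4: "student",
    5: "unemployed",
}
BASE_LEVEL = 5  # last level is the base, mirroring the Ngene convention above


def dummy_code(level, levels=LEVELS, base=BASE_LEVEL):
    """Return the 0/1 columns for one observation (one column per non-base level)."""
    if level not in levels:
        raise ValueError(f"unknown employment level: {level}")
    return [1 if level == other else 0 for other in sorted(levels) if other != base]


def employment_utility(level, params):
    """Contribution of employment type to utility: sum of b1(k) * dummy_k."""
    return sum(b * x for b, x in zip(params, dummy_code(level)))


# b1(1) and b1(4) are from the example; b1(2) and b1(3) are hypothetical.
b1 = [0.5, 0.3, 0.1, -0.2]

for level in (2, 3, 5):
    print(level, LEVELS[level], dummy_code(level), employment_utility(level, b1))
```

The base level maps to a row of zeros, so its utility contribution is 0 and every estimated parameter is read as a difference from that level.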

Good luck,
Michiel

Re: Gathering Pilot data

Postby munshi.nawaz » Sun Oct 07, 2018 10:26 pm

Hi Michiel

I cannot thank you enough.

I think I understand the method you explained. Thank you so much for explaining this in such great detail; it certainly helps a beginner like myself.

It is probably too late for me to test this in Ngene; however, I will analyse my data using Nlogit as per your suggestion, and I hope to get back to you (and to this forum) with my findings soon.

Once again, I really appreciate the help.

Yours sincerely
Munshi
