Main survey design based on pilot survey results



Postby bye1830 » Sat Aug 26, 2023 7:22 pm

Dear Ngene Team,

I hope all is well with you.

I have previously reached out for advice on the design of a choice experiment-based survey, and I am pleased to inform you that the pilot survey was completed last month. To provide some context, you can refer to my earlier posts:

• Dual Response Format and Reference Alternative: viewtopic.php?f=2&t=1003
• Pilot and Main Surveys: Plans: viewtopic.php?f=2&t=1009

To recap briefly, my experiment focuses on zero-emission truck choices: battery electric trucks (BEVs) and hydrogen fuel cell electric trucks (HFCEVs). I am using a dual response format with two respondent segments: fleet operators running diesel trucks exclusively, and those also operating natural gas trucks. For the former, the reference alternative is a diesel truck; the latter segment has two reference alternatives, diesel and natural gas trucks. Each choice task asks two questions: the first asks respondents to choose between a BEV and an HFCEV, and the second asks whether they would still choose that option if the reference alternative(s) were available.

In the pilot survey, I had 10 participants: 7 were diesel truck operators, and 3 were natural gas truck adopters. Each respondent was assigned 6 choice tasks, resulting in a total of 60 observations.

Using the Apollo package in R, I obtained the following estimation results:
Code:
1) Forced tasks (using the data from the 1st question only)

             Estimate        s.e.   t.rat.(0)    Rob.s.e. Rob.t.rat.(0)
asc_bev     -0.312465    0.721343     -0.4332    0.694035       -0.4502
b_pcost     -0.004857    0.005391     -0.9009    0.003845       -1.2632
b_ocost     -0.026976    0.012590     -2.1426    0.010507       -2.5674
b_range      0.003739    0.001150      3.2512    0.001159        3.2255
b_offsite    0.005557    0.008025      0.6925    0.011320        0.4909
b_onsite    -0.004918    0.004596     -1.0700    0.002830       -1.7377

Overview of choices for MNL model component :
                                    bev   hfcev
Times available                      60    60
Times chosen                         33    27
Percentage chosen overall            55    45
Percentage chosen when available     55    45

2) Unforced tasks (using the data pulled both from 1st and 2nd questions)

             Estimate        s.e.   t.rat.(0)    Rob.s.e. Rob.t.rat.(0)
asc_bev     -0.458703    0.919506     -0.4989    1.019888      -0.44976
asc_hfcev    0.233798    0.619895      0.3772    0.695334       0.33624
asc_ngv      0.124302    0.617563      0.2013    1.329537       0.09349
b_pcost     -0.008447    0.006713     -1.2582    0.003997      -2.11343
b_ocost     -0.010581    0.013136     -0.8055    0.004995      -2.11850
b_range      0.002343    0.001272      1.8425    0.001268       1.84787
b_offsite    0.010934    0.009024      1.2117    0.008892       1.22967
b_onsite    -0.006933    0.005754     -1.2048    0.003097      -2.23855

Overview of choices for MNL model component :
                                    bev   hfcev   dsl     ngv
Times available                     60     60     60     18
Times chosen                        12     22     20      6
Percentage chosen overall           20     36.7   33.3   10
Percentage chosen when available    20     36.7   33.3   33.3


Subsequently, I have drafted a script for the main survey’s choice experiment design, in which I plan to employ a Bayesian efficient design with an MNL model.

Code:
Design
? Dual response design
? Efficient design with priors from the pilot survey
;alts (forced) = BEV, HFCEV, NGV, DSL
;alts (unforced) = BEV, HFCEV, NGV, DSL
;rows = 18
;block = 3
;eff = combined (mnl,d, mean)
;bdraws = sobol(200)
;alg = swap (stop = noimprov(1000 iterations))
;fisher(combined) = design1(forced[0.5], unforced[0.5])

;model(forced):
U(BEV) = b1[0] +
         b2[(n,-0.004857,0.005391)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.026976,0.012590)]*ocostb[50,70] +
         b4[(n,0.003739,0.001150)]*rangeb[150,200,300,500] +
         b5[-0.0001]*offsitez[10,20,60] +
         b6[(n,-0.004918,0.004596)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
         b3*ocosth[90,115,130] +
         b4*rangeh[300,500,700] +
         b5*offsitez +
         b6*onsiteh[0,25,50,75,100] /

U(NGV)= b7[-9] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[-9] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]   

;model(unforced):
U(BEV) = b1[0] +
         b2[(n,-0.008447,0.006713)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.010581,0.013136)]*ocostb[50,70] +
         b4[(n,0.002343,0.001272)]*rangeb[150,200,300,500] +
         b5[-0.0001]*offsitez[10,20,60] +
         b6[(n,-0.006933,0.005754)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
          b3*ocosth[90,115,130] +
          b4*rangeh[300,500,700] +
          b5*offsitez +
          b6*onsiteh[0,25,50,75,100] /

U(NGV)=  b7[0] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[0] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]                     
$


At this stage, I have many questions. Would you be willing to help me? I would greatly appreciate your insights on any of the following:

Q1: In this draft script, I used the estimated results from the forced task data for the forced model and the results from the unforced task data for the unforced model. Is this approach correct, or are there more appropriate approaches?

Q2: In the estimation results, there are two sets of t-ratios and standard errors: basic and robust. In my estimation results, robust standard errors tended to be larger than basic standard errors. Could you provide guidance on which one I should use?

Q3: One estimated parameter is not statistically significant and has an unexpected sign (b_offsite). Should I use a near-zero prior, as I did in the pilot survey design? Do you have any recommendations?

Q4: Some estimated parameters are not statistically significant, and it is uncertain whether their signs are appropriate (e.g., asc_bev, asc_hfcev, asc_ngv). In these cases, what priors would you recommend?

Q5: In this main survey script, should the asterisks (*) that were placed after each alternative name be removed?

Q6: Is there a specific algorithm you would recommend for this main survey design (e.g., swap, mfederov, etc.)?

Q7: Among the various types of draws (e.g., Halton, Gauss, Sobol) and numbers of draws, which approach would you suggest?

Q8: I used 18 rows and 3 blocks in the pilot survey, initially targeting 6 to 10 respondents. For the main survey, I'm targeting 60 to 100 respondents at maximum. Do you think I should use a larger number of rows (e.g., 24 rows) for more variation in the data for the main survey?

Q9: Overall, could you please review my draft script? I’d appreciate it if you could correct any aspects that could be improved.

Q10: Lastly, when I ran the draft script, I obtained the following results. In particular, the S estimates are very large, such as over 53,000. How should such large numbers be interpreted?

Code:
MNL efficiency measures (forced)                  
                        Bayesian            
              Fixed     Mean       Std dev.   Median     Minimum    Maximum
D error      6.80E-05   7.90E-05   1.30E-05   7.60E-05   5.70E-05   0.00014
A error      0.000173   0.000209   6.10E-05   0.00019    0.000134   0.000488
B estimate   0.00023    0.000266   0.000227   0.000197   1.70E-05   0.00122
S estimate   53,946     64,606     14,105     60,901     53,432     213,826

MNL efficiency measures (unforced)                  
                        Bayesian            
              Fixed     Mean       Std dev.   Median     Minimum     Maximum
D error      0.000154   0.000169   4.40E-05   0.000158   8.80E-05   0.000319
A error      0.000306   0.000335   8.00E-05   0.000318   0.000185   0.000607
B estimate   51.88      42.92      18.79      42.44      8.10       96.46
S estimate   170,807    196,941    197,028    177,747    91,853     2,877,703


Thank you for taking the time to read my post, and I apologize for posing so many questions. Your guidance and insights would be greatly appreciated, as they will significantly aid me in moving forward with the next stage of this project. Thank you very much.

Sincerely,

YB

Re: Main survey design based on pilot survey results

Postby Michiel Bliemer » Sat Aug 26, 2023 11:10 pm

Q1. Yes, that is fine. But you could also consider estimating a joint model in which you combine the data from both questions; this may reduce your standard errors.

Q2. Both are fine but I generally use the regular standard errors when determining my Bayesian priors.

Q3. If a parameter has an unexpected sign then I would generally manually change it to make sure that the prior makes sense. Perhaps a uniform distribution, (u,-0.02,0).

Q4. The estimated parameters are still the best guess you have, so you could use normally distributed priors in the same way as with statistically significant parameters. If draws from the resulting normal distribution have extreme outliers, then you could consider using the median Bayesian D-error (mnl,d,median).
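As an illustration of how diffuse such priors are, here is a small numpy sketch (my own illustration, not Ngene functionality) using your asc_ngv estimate and its regular standard error; it reports the share of prior draws that fall on the other side of zero:

```python
import numpy as np

rng = np.random.default_rng(1)
# asc_ngv from the unforced pilot model: estimate 0.124302, regular s.e. 0.617563
draws = rng.normal(loc=0.124302, scale=0.617563, size=100_000)
# share of prior draws with the opposite (negative) sign
print(round(float(np.mean(draws < 0)), 3))
```

With roughly 40% of the prior mass below zero, the draws regularly flip sign, which is exactly the situation in which the median D-error is more robust than the mean.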

Q5. That depends on whether you believe there would be any issue with dominance. If there is no obvious dominance then * are not needed.

Q6. Swap is generally preferred because it maintains attribute level balance. I only use mfederov if swap does not work (for example if I have a lot of constraints).

Q7. Sobol is one of the best draws and works with both 'mean' and 'median'. Gauss is also good but only works with 'mean'.
See this paper: https://www.sciencedirect.com/science/article/pii/S1755534513700241
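For intuition on why quasi-random sequences need fewer draws than pseudo-random ones: they fill the unit interval evenly by construction. A minimal pure-Python Halton generator (my own illustration, not how Ngene generates its draws) makes this visible:

```python
def halton(index, base=2):
    """Radical-inverse (Halton) value for a 1-based index in the given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

# successive draws split the interval ever more finely: 1/2, 1/4, 3/4, 1/8, ...
print([halton(i) for i in range(1, 8)])
```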

Q8. More variation is not a bad idea.

Q9. You could add ;con to your script since you have labelled alternatives. Otherwise, it looks fine.

Q10. Your prior b5 = -0.0001 is extremely small. If this were the actual parameter value, then a very large sample size would be required to estimate this parameter at a statistically significant level, since your prior essentially says that the corresponding attribute hardly influences choice. You can see the sample size estimates for each parameter separately in the output; these are more useful to look at than the overall sample size estimate (which is not useful here since b5 is close to zero).
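To see why a near-zero prior inflates S: the S estimate for a parameter can be read as the sample size at which its t-ratio reaches 1.96, and t-ratios shrink as the prior approaches zero. A rough sketch of that relation (my own reconstruction, not Ngene's internal computation):

```python
# N >= (1.96 / t1)^2, where t1 is the t-ratio expected from a single respondent
def s_estimate(t1):
    return (1.96 / t1) ** 2

# a single-respondent t-ratio of ~0.00844 reproduces an S estimate in the
# ~54,000 range, like the one reported for the forced design above
print(round(s_estimate(0.00844)))
```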

Michiel

Re: Main survey design based on pilot survey results

Postby bye1830 » Fri Sep 22, 2023 10:43 am

Dear Michiel,

Thank you very much for answering all of my questions. I greatly appreciate your help, as always. Based on your answers, I have rewritten the script as shown below.

Code:
Design
? Dual response design
? Efficient design with priors from the pilot survey
;alts (forced) = BEV, HFCEV, NGV, DSL
;alts (unforced) = BEV, HFCEV, NGV, DSL
;rows = 24
;block = 3
;eff = combined (mnl,d, mean)
;bdraws = sobol(300)
;alg = swap (stop = noimprov(1000 iterations))
;fisher(combined) = design1(forced[0.5], unforced[0.5])
;con

;model(forced):
U(BEV) = b1[(n,-0.312465,0.721343)] +
         b2[(n,-0.004857,0.005391)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.026976,0.012590)]*ocostb[50,70] +
         b4[(n,0.003739,0.001150)]*rangeb[150,200,300,500] +
         b5[(u,-0.02,0)]*offsitez[10,20,60] +
         b6[(n,-0.004918,0.004596)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
         b3*ocosth[90,115,130] +
         b4*rangeh[300,500,700] +
         b5*offsitez +
         b6*onsiteh[0,25,50,75,100] /

U(NGV)= b7[-9] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[-9] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]   

;model(unforced):
U(BEV) = b1[(n,-0.692500,0.805947)] +
         b2[(n,-0.008447,0.006713)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.010581,0.013136)]*ocostb[50,70] +
         b4[(n,0.002343,0.001272)]*rangeb[150,200,300,500] +
         b5[(u,-0.02,0)]*offsitez[10,20,60] +
         b6[(n,-0.006933,0.005754)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
          b3*ocosth[90,115,130] +
          b4*rangeh[300,500,700] +
          b5*offsitez +
          b6*onsiteh[0,25,50,75,100] /

U(NGV)=  b7[(n,-0.109496,0.811147)] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[(n,-0.233798,0.619895)] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]                     
$


After running Ngene, I obtained the efficiency measures shown below. At this point, I have a question regarding the interpretation of the Sp estimates.

Code:
MNL efficiency measures (combined)
                     Bayesian           
          Fixed      Mean       Std dev.   Median     Minimum    Maximum
D error   0.001837   0.002226   0.000415   0.002148   0.001521   0.004786
A error   0.544566   0.666516   0.143584   0.638738   0.456183   1.715677

MNL efficiency measures (forced)
Prior               b1          b2          b3          b4         b5         b6         b7        b8
Fixed prior value   -0.312465   -0.004857   -0.026976   0.003739   -0.01      -0.004918  -9        -9
Sp estimates        40.297667   9.316242    2.430341    1.136976   4.820191   7.049002   7.08518   6.975668
Sp t-ratios         0.308757    0.642149    1.257252    1.838148   0.892738   0.738231   0.736344  0.742101

MNL efficiency measures (unforced)
Prior               b1          b2          b3          b4         b5         b6         b7         b8
Fixed prior value   -0.6925     -0.008447   -0.010581   0.002343   -0.01      -0.006933  -0.109496  -0.233798
Sp estimates        20.498185   9.53626     20.065659   5.213036   18.270624  13.128503  633.009316 102.958256
Sp t-ratios         0.432911    0.634698    0.437552    0.858442   0.458542   0.540939   0.077902   0.193164



In the unforced design, the Sp estimate for each parameter is below roughly 20, except for b7 (633) and b8 (103). I expect the sample size for my main survey to be 60 to 100 at maximum. This design has 24 rows and 6 blocks, so if I achieve 100 completed responses, this would result in 25 replications of the design (= 100 / (24/6)). Is it therefore appropriate to say that parameters with an Sp estimate greater than 25 (e.g., b7 and b8) would likely be nonsignificant with a sample size of 100?
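As a back-of-the-envelope check of my reading (assuming, as I understand it, that Sp is expressed in respondents and that t-ratios grow with the square root of the sample size), here is a small Python sketch of the expected t-ratios at N = 100:

```python
import math

# Sp is the sample size at which the t-ratio reaches 1.96, and t-ratios
# scale with sqrt(N), so the expected t-ratio at N respondents is:
def expected_t(sp, n):
    return 1.96 * math.sqrt(n / sp)

# Sp estimates from the unforced design above, evaluated at N = 100
for name, sp in [("b4", 5.213036), ("b7", 633.009316), ("b8", 102.958256)]:
    print(name, round(expected_t(sp, 100), 2))
```

Under that reading, b7 stays well below 1.96 at N = 100, while b8 lands right at the margin.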

In addition, are there any other efficiency measures that I need to review?

Thank you again for taking the time to answer my questions. I would greatly appreciate any additional insights you may have.

Sincerely,

YB

