I hope all is well with you.
I previously reached out for advice on the design of a choice-experiment survey, and I am pleased to report that the pilot survey was completed last month. For context, please see my earlier posts:
• Dual Response Format and Reference Alternative: viewtopic.php?f=2&t=1003
• Pilot and Main Surveys: Plans: viewtopic.php?f=2&t=1009
To recap briefly, my experiment focuses on zero-emission truck choices, including battery electric trucks (BEVs) and hydrogen fuel cell electric trucks (HFCEVs). I am using a dual response format with two respondent segments: fleet operators running diesel trucks exclusively, and those also operating natural gas trucks. For the former segment, the reference alternative is a diesel truck; the latter segment has two reference alternatives: diesel and natural gas trucks. Each choice task poses two questions: the first asks respondents to choose between a BEV and an HFCEV, and the second asks whether they would still choose that option if the reference alternative(s) were available.
In the pilot survey, I had 10 participants: 7 were diesel truck operators, and 3 were natural gas truck adopters. Each respondent was assigned 6 choice tasks, resulting in a total of 60 observations.
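To make the data structure concrete, here is a minimal Python sketch (with made-up responses, not my actual data pipeline) of how each dual-response task expands into two model rows, one per question:

```python
# Sketch: expanding one dual-response choice task into two model rows.
# Hypothetical task IDs and responses, for illustration only.

def expand_task(task_id, forced_choice, unforced_choice, has_ngv):
    """Return the forced-question row and the unforced-question row.

    forced_choice   : 'bev' or 'hfcev' (question 1, ZEVs only)
    unforced_choice : choice when the reference alternative(s) are added
    has_ngv         : True for natural gas truck operators (extra reference)
    """
    forced_alts = ["bev", "hfcev"]
    unforced_alts = ["bev", "hfcev", "dsl"] + (["ngv"] if has_ngv else [])
    return (
        {"task": task_id, "alts": forced_alts, "choice": forced_choice},
        {"task": task_id, "alts": unforced_alts, "choice": unforced_choice},
    )

rows = []
# e.g., a diesel-only operator who picks BEV, then reverts to diesel
rows.extend(expand_task(1, "bev", "dsl", has_ngv=False))
# e.g., an NGV operator who picks HFCEV and sticks with it
rows.extend(expand_task(2, "hfcev", "hfcev", has_ngv=True))

for r in rows:
    assert r["choice"] in r["alts"]
```

So each of the 60 tasks contributes one observation to the forced dataset and one to the unforced dataset.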
Using the Apollo package in R, I obtained the following estimation results:
Code:
1) Forced tasks (using the data from the 1st question only)
Estimate s.e. t.rat.(0) Rob.s.e. Rob.t.rat.(0)
asc_bev -0.312465 0.721343 -0.4332 0.694035 -0.4502
b_pcost -0.004857 0.005391 -0.9009 0.003845 -1.2632
b_ocost -0.026976 0.012590 -2.1426 0.010507 -2.5674
b_range 0.003739 0.001150 3.2512 0.001159 3.2255
b_offsite 0.005557 0.008025 0.6925 0.011320 0.4909
b_onsite -0.004918 0.004596 -1.0700 0.002830 -1.7377
Overview of choices for MNL model component :
bev hfcev
Times available 60 60
Times chosen 33 27
Percentage chosen overall 55 45
Percentage chosen when available 55 45
2) Unforced tasks (using data from both the 1st and 2nd questions)
Estimate s.e. t.rat.(0) Rob.s.e. Rob.t.rat.(0)
asc_bev -0.458703 0.919506 -0.4989 1.019888 -0.44976
asc_hfcev 0.233798 0.619895 0.3772 0.695334 0.33624
asc_ngv 0.124302 0.617563 0.2013 1.329537 0.09349
b_pcost -0.008447 0.006713 -1.2582 0.003997 -2.11343
b_ocost -0.010581 0.013136 -0.8055 0.004995 -2.11850
b_range 0.002343 0.001272 1.8425 0.001268 1.84787
b_offsite 0.010934 0.009024 1.2117 0.008892 1.22967
b_onsite -0.006933 0.005754 -1.2048 0.003097 -2.23855
Overview of choices for MNL model component :
bev hfcev dsl ngv
Times available 60 60 60 18
Times chosen 12 22 20 6
Percentage chosen overall 20 36.7 33.3 10
Percentage chosen when available 20 36.7 33.3 33.3
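On the two sets of standard errors shown above: my understanding is that the basic ones come from the inverse Hessian and the robust ones from the sandwich estimator, which is why the robust values can differ. A toy scalar sketch in Python (the h and b values are made up, not from my model):

```python
import math

# Toy scalar sketch: classical vs. robust (sandwich) standard error for
# a single parameter. h = curvature (negative Hessian) term; b = outer-
# product-of-scores term. Both numbers are hypothetical.
h = 4.0
b = 6.5   # b > h, e.g. with repeated choices per respondent

classical_se = math.sqrt(1.0 / h)    # sqrt(H^-1)
robust_se = math.sqrt(b / h ** 2)    # sqrt(H^-1 * B * H^-1)

# With b > h, the robust SE is the larger one, as in much of my output.
print(classical_se, robust_se)
```

With 6 repeated tasks per respondent, I would expect the two sets to diverge in this way, which is part of what motivates my Q2 below.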
I have since drafted a script for the main survey's choice experiment, for which I plan to use a Bayesian efficient design under the MNL model.
Code:
Design
? Dual response design
? Efficient design with priors from the pilot survey
;alts(forced) = BEV*, HFCEV*, NGV*, DSL*
;alts(unforced) = BEV*, HFCEV*, NGV*, DSL*
;rows = 18
;block = 3
;eff = combined(mnl,d,mean)
;bdraws = sobol(200)
;alg = swap(stop = noimprov(1000 iterations))
;fisher(combined) = design1(forced[0.5], unforced[0.5])
;model(forced):
U(BEV) = b1[0] +
b2[(n,-0.004857,0.005391)]*pcostz[105,110,115,125,150,175,200] +
b3[(n,-0.026976,0.012590)]*ocostb[50,70] +
b4[(n,0.003739,0.001150)]*rangeb[150,200,300,500] +
b5[-0.0001]*offsitez[10,20,60] +
b6[(n,-0.004918,0.004596)]*onsiteb[0,25,50,75,100] /
U(HFCEV)= b2*pcostz +
b3*ocosth[90,115,130] +
b4*rangeh[300,500,700] +
b5*offsitez +
b6*onsiteh[0,25,50,75,100] /
U(NGV)= b7[-9] +
b2*pcostn[105,130] +
b3*ocostn[70,90] +
b4*rangen[700] +
b5*offsiten[15] +
b6*onsiten[100] /
U(DSL) = b8[-9] +
b2*pcostd[100] +
b3*ocostd[100] +
b4*ranged[700] +
b5*offsited[5]
;model(unforced):
U(BEV) = b1[0] +
b2[(n,-0.008447,0.006713)]*pcostz[105,110,115,125,150,175,200] +
b3[(n,-0.010581,0.013136)]*ocostb[50,70] +
b4[(n,0.002343,0.001272)]*rangeb[150,200,300,500] +
b5[-0.0001]*offsitez[10,20,60] +
b6[(n,-0.006933,0.005754)]*onsiteb[0,25,50,75,100] /
U(HFCEV)= b2*pcostz +
b3*ocosth[90,115,130] +
b4*rangeh[300,500,700] +
b5*offsitez +
b6*onsiteh[0,25,50,75,100] /
U(NGV)= b7[0] +
b2*pcostn[105,130] +
b3*ocostn[70,90] +
b4*rangen[700] +
b5*offsiten[15] +
b6*onsiten[100] /
U(DSL) = b8[0] +
b2*pcostd[100] +
b3*ocostd[100] +
b4*ranged[700] +
b5*offsited[5]
$
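For reference, my mental model of what the `;bdraws` setting and the `(n,mean,sd)` priors do is: draw the b parameters from their normal priors, evaluate the design criterion at each draw, and average. A conceptual Python sketch (using pseudo-random draws and a placeholder criterion, not Ngene's actual Sobol draws or MNL D-error):

```python
import random
import statistics

# Conceptual sketch of Bayesian design evaluation: sample the priors,
# evaluate the criterion per draw, and average across draws.
random.seed(42)

PRIORS = {  # (mean, sd) taken from my forced-task pilot estimates
    "b2_pcost": (-0.004857, 0.005391),
    "b3_ocost": (-0.026976, 0.012590),
    "b4_range": (0.003739, 0.001150),
}

def d_error(betas):
    # Placeholder criterion for illustration only (not the MNL D-error).
    return sum(v * v for v in betas.values())

draws = []
for _ in range(200):  # analogous to 200 Sobol draws
    betas = {k: random.gauss(m, s) for k, (m, s) in PRIORS.items()}
    draws.append(d_error(betas))

bayesian_d = statistics.mean(draws)  # Ngene reports this mean (plus sd, etc.)
print(bayesian_d)
```

This averaging over the prior distributions is why the output below reports a mean, standard deviation, median, minimum, and maximum for each efficiency measure.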
At this stage, I have many questions. Would you be willing to help me? I would greatly appreciate your insights on any of the following:
Q1: In this draft script, I used the estimated results from the forced task data for the forced model and the results from the unforced task data for the unforced model. Is this approach correct, or are there more appropriate approaches?
Q2: The estimation output reports two sets of standard errors and t-ratios: basic and robust. In my results, the robust standard errors tended to be larger than the basic ones. Could you advise which I should use?
Q3: One estimated parameter (b_offsite) is not statistically significant and has an unexpected sign. Should I use a near-zero prior, as I did in the pilot survey design? Do you have any recommendations?
Q4: Some estimated parameters are not statistically significant, and it is uncertain whether their signs are appropriate (e.g., asc_bev, asc_hfcev, asc_ngv). In these cases, what priors would you recommend?
Q5: In this main survey script, should the asterisks (*) placed after each alternative name be removed?
Q6: Is there a specific algorithm you would recommend for this main survey design (e.g., swap, mfederov)?
Q7: Among the various types of draws (e.g., Halton, Gauss, Sobol), which would you suggest, and how many draws?
Q8: I used 18 rows and 3 blocks in the pilot survey, initially targeting 6 to 10 respondents. For the main survey, I am targeting up to 60 to 100 respondents. Should I use a larger number of rows (e.g., 24) to obtain more variation in the data?
Q9: Overall, could you please review my draft script? I’d appreciate it if you could correct any aspects that could be improved.
Q10: Lastly, when I ran the draft script, I obtained the results below. In particular, the S estimates are very large (over 53,000). How should I interpret such large numbers?
Code:
MNL efficiency measures (forced)
Bayesian
Fixed Mean Std dev. Median Minimum Maximum
D error 6.80E-05 7.90E-05 1.30E-05 7.60E-05 5.70E-05 0.00014
A error 0.000173 0.000209 6.10E-05 0.00019 0.000134 0.000488
B estimate 0.00023 0.000266 0.000227 0.000197 1.70E-05 0.00122
S estimate 53,946 64,606 14,105 60,901 53,432 213,826
MNL efficiency measures (unforced)
Bayesian
Fixed Mean Std dev. Median Minimum Maximum
D error 0.000154 0.000169 4.40E-05 0.000158 8.80E-05 0.000319
A error 0.000306 0.000335 8.00E-05 0.000318 0.000185 0.000607
B estimate 51.88 42.92 18.79 42.44 8.10 96.46
S estimate 170,807 196,941 197,028 177,747 91,853 2,877,703
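While drafting Q10, I tried to reason about where such numbers could come from. My understanding is that the S estimate is computed per parameter as roughly ((1.96 × se) / |prior|)², i.e., the respondents needed for significance, with the design-level S taken as the maximum over parameters. A quick Python sketch (with hypothetical se values, not my design output) showing how my near-zero b5 prior could inflate it:

```python
# Sketch of the sample-size (S) calculation as I understand it:
# S_k = ((z * se_k) / |beta_k|)^2, where se_k is the standard error
# implied by the design for a single respondent and z = 1.96 for 95%
# significance; the reported S is the maximum over parameters.
# The se values below are hypothetical, for illustration only.
Z = 1.96

def s_required(beta, se_one_respondent):
    return ((Z * se_one_respondent) / abs(beta)) ** 2

priors = {
    "b3_ocost": (-0.026976, 0.08),
    "b4_range": (0.003739, 0.008),
    "b5_offsite": (-0.0001, 0.06),  # my near-zero prior
}

s_values = {k: s_required(b, se) for k, (b, se) in priors.items()}
# Dividing by |-0.0001| drives the S for b5 into the millions, which
# drags the design-level (maximum) S up with it.
print(max(s_values, key=s_values.get))
```

If this reading is right, the huge S values may mostly reflect the near-zero b5 prior rather than the design itself, but I would appreciate confirmation.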
Thank you for taking the time to read my post, and I apologize for the number of questions. Your guidance and insights would be greatly appreciated, as they will significantly help me move forward with the next stage of this project. Thank you very much.
Sincerely,
YB