Main survey design based on pilot survey results



Postby bye1830 » Sat Aug 26, 2023 7:22 pm

Dear Ngene Team,

I hope all is well with you.

I have previously reached out for advice on the design of a choice experiment-based survey, and I am pleased to inform you that the pilot survey was completed last month. To provide some context, you can refer to my earlier posts:

• Dual Response Format and Reference Alternative: viewtopic.php?f=2&t=1003
• Pilot and Main Surveys: Plans: viewtopic.php?f=2&t=1009

To recap briefly, my experiment focuses on zero-emission truck choices: battery electric trucks (BEVs) and hydrogen fuel cell electric trucks (HFCEVs). I am using a dual response format with two respondent segments: fleet operators running diesel trucks exclusively, and those also operating natural gas trucks. For the former, the reference alternative is a diesel truck; the latter segment has two reference alternatives, diesel and natural gas trucks. Each choice task asks two questions: the first asks respondents to choose between a BEV and an HFCEV, and the second asks whether they would still choose that option if the reference alternative(s) were available.

In the pilot survey, I had 10 participants: 7 were diesel truck operators, and 3 were natural gas truck adopters. Each respondent was assigned 6 choice tasks, resulting in a total of 60 observations.

Using the Apollo package in R, I obtained the following estimation results:
Code:
1) Forced tasks (using the data from the 1st question only)

             Estimate        s.e.   t.rat.(0)    Rob.s.e. Rob.t.rat.(0)
asc_bev     -0.312465    0.721343     -0.4332    0.694035       -0.4502
b_pcost     -0.004857    0.005391     -0.9009    0.003845       -1.2632
b_ocost     -0.026976    0.012590     -2.1426    0.010507       -2.5674
b_range      0.003739    0.001150      3.2512    0.001159        3.2255
b_offsite    0.005557    0.008025      0.6925    0.011320        0.4909
b_onsite    -0.004918    0.004596     -1.0700    0.002830       -1.7377

Overview of choices for MNL model component :
                                    bev   hfcev
Times available                      60    60
Times chosen                         33    27
Percentage chosen overall            55    45
Percentage chosen when available     55    45

2) Unforced tasks (using the data pulled both from 1st and 2nd questions)

             Estimate        s.e.   t.rat.(0)    Rob.s.e. Rob.t.rat.(0)
asc_bev     -0.458703    0.919506     -0.4989    1.019888      -0.44976
asc_hfcev    0.233798    0.619895      0.3772    0.695334       0.33624
asc_ngv      0.124302    0.617563      0.2013    1.329537       0.09349
b_pcost     -0.008447    0.006713     -1.2582    0.003997      -2.11343
b_ocost     -0.010581    0.013136     -0.8055    0.004995      -2.11850
b_range      0.002343    0.001272      1.8425    0.001268       1.84787
b_offsite    0.010934    0.009024      1.2117    0.008892       1.22967
b_onsite    -0.006933    0.005754     -1.2048    0.003097      -2.23855

Overview of choices for MNL model component :
                                    bev   hfcev   dsl     ngv
Times available                     60     60     60     18
Times chosen                        12     22     20      6
Percentage chosen overall           20     36.7   33.3   10
Percentage chosen when available    20     36.7   33.3   33.3


Subsequently, I have drafted a script for the main survey’s choice experiment design, in which I plan to employ a Bayesian efficient design with an MNL model.

Code:
Design
? Dual response design
? Efficient design with priors from the pilot survey
;alts (forced) = BEV, HFCEV, NGV, DSL
;alts (unforced) = BEV, HFCEV, NGV, DSL
;rows = 18
;block = 3
;eff = combined (mnl,d, mean)
;bdraws = sobol(200)
;alg = swap (stop = noimprov(1000 iterations))
;fisher(combined) = design1(forced[0.5], unforced[0.5])

;model(forced):
U(BEV) = b1[0] +
         b2[(n,-0.004857,0.005391)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.026976,0.012590)]*ocostb[50,70] +
         b4[(n,0.003739,0.001150)]*rangeb[150,200,300,500] +
         b5[-0.0001]*offsitez[10,20,60] +
         b6[(n,-0.004918,0.004596)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
         b3*ocosth[90,115,130] +
         b4*rangeh[300,500,700] +
         b5*offsitez +
         b6*onsiteh[0,25,50,75,100] /

U(NGV)= b7[-9] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[-9] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]   

;model(unforced):
U(BEV) = b1[0] +
         b2[(n,-0.008447,0.006713)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.010581,0.013136)]*ocostb[50,70] +
         b4[(n,0.002343,0.001272)]*rangeb[150,200,300,500] +
         b5[-0.0001]*offsitez[10,20,60] +
         b6[(n,-0.006933,0.005754)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
          b3*ocosth[90,115,130] +
          b4*rangeh[300,500,700] +
          b5*offsitez +
          b6*onsiteh[0,25,50,75,100] /

U(NGV)=  b7[0] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[0] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]                     
$


At this stage, I have many questions. Would you be willing to help me? I would greatly appreciate your insights on any of the following:

Q1: In this draft script, I used the estimated results from the forced task data for the forced model and the results from the unforced task data for the unforced model. Is this approach correct, or are there more appropriate approaches?

Q2: In the estimation results, there are two sets of t-ratios and standard errors: basic and robust. In my estimation results, robust standard errors tended to be larger than basic standard errors. Could you provide guidance on which one I should use?

Q3: One estimated parameter is not statistically significant and has an unexpected sign (b_offsite). Should I use a near-zero prior, as I did in the pilot survey design? Do you have any recommendations?

Q4: Some estimated parameters are not statistically significant, and it is uncertain whether their signs are appropriate (e.g., asc_bev, asc_hfcev, asc_ngv). In these cases, what priors would you recommend?

Q5: In this main survey script, should the asterisks (*) that were placed after each alternative name be removed?

Q6: Is there a specific algorithm you would recommend for this main survey design (e.g., swap, mfederov, etc.)?

Q7: Among the various types of draws (e.g., Halton, Gauss, Sobol) and numbers of draws, which approach would you suggest?

Q8: I used 18 rows and 3 blocks in the pilot survey, initially targeting 6 to 10 respondents. For the main survey, I'm targeting 60 to 100 respondents at maximum. Do you think I should use a larger number of rows (e.g., 24 rows) for more variation in the data for the main survey?

Q9: Overall, could you please review my draft script? I’d appreciate it if you could correct any aspects that could be improved.

Q10: Lastly, when I ran the draft script, I obtained the following results. In particular, the S estimates are very large, such as over 53,000. How should such large numbers be interpreted?

Code:
MNL efficiency measures (forced)                  
                        Bayesian            
              Fixed     Mean       Std dev.   Median     Minimum    Maximum
D error      6.80E-05   7.90E-05   1.30E-05   7.60E-05   5.70E-05   0.00014
A error      0.000173   0.000209   6.10E-05   0.00019    0.000134   0.000488
B estimate   0.00023    0.000266   0.000227   0.000197   1.70E-05   0.00122
S estimate   53,946     64,606     14,105     60,901     53,432     213,826

MNL efficiency measures (unforced)                  
                        Bayesian            
              Fixed     Mean       Std dev.   Median     Minimum     Maximum
D error      0.000154   0.000169   4.40E-05   0.000158   8.80E-05   0.000319
A error      0.000306   0.000335   8.00E-05   0.000318   0.000185   0.000607
B estimate   51.88      42.92      18.79      42.44      8.10       96.46
S estimate   170,807    196,941    197,028    177,747    91,853     2,877,703


Thank you for taking the time to read my post, and I apologize for posing so many questions. Your guidance and insights would be greatly appreciated, as they will significantly aid me in moving forward with the next stage of this project. Thank you very much.

Sincerely,

YB

Re: Main survey design based on pilot survey results

Postby Michiel Bliemer » Sat Aug 26, 2023 11:10 pm

Q1. Yes, that is fine. But you could also consider estimating a joint model in which you combine the data from both questions; this may reduce your standard errors.

Q2. Both are fine but I generally use the regular standard errors when determining my Bayesian priors.

Q3. If a parameter has an unexpected sign then I would generally manually change it to make sure that the prior makes sense. Perhaps a uniform distribution, (u,-0.02,0).

Q4. The estimated parameters are still the best guess you have, so you could use normally distributed priors in the same way as with statistically significant parameters. If draws from the resulting normal distribution have extreme outliers, then you could consider using the median Bayesian D-error (mnl,d,median).
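As an illustration of how diffuse such priors are, here is a small numpy sketch (my own illustration, not Ngene functionality) using your asc_ngv estimate and its regular standard error; it reports the share of prior draws that fall on the other side of zero:

```python
import numpy as np

rng = np.random.default_rng(1)
# asc_ngv from the unforced pilot model: estimate 0.124302, regular s.e. 0.617563
draws = rng.normal(loc=0.124302, scale=0.617563, size=100_000)
# share of prior draws with the opposite (negative) sign
print(round(float(np.mean(draws < 0)), 3))
```

With roughly 40% of the prior mass below zero, the draws regularly flip sign, which is exactly the situation in which the median D-error is more robust than the mean.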

Q5. That depends on whether you believe there would be any issue with dominance. If there is no obvious dominance then * are not needed.

Q6. Swap is generally preferred because it maintains attribute level balance. I only use mfederov if swap does not work (for example if I have a lot of constraints).

Q7. Sobol is one of the best draws and works with both 'mean' and 'median'. Gauss is also good but only works with 'mean'.
See this paper: https://www.sciencedirect.com/science/article/pii/S1755534513700241
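For intuition on why quasi-random sequences need fewer draws than pseudo-random ones: they fill the unit interval evenly by construction. A minimal pure-Python Halton generator (my own illustration, not how Ngene generates its draws) makes this visible:

```python
def halton(index, base=2):
    """Radical-inverse (Halton) value for a 1-based index in the given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

# successive draws split the interval ever more finely: 1/2, 1/4, 3/4, 1/8, ...
print([halton(i) for i in range(1, 8)])
```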

Q8. More variation is not a bad idea.

Q9. You could add ;con to your script since you have labelled alternatives. Otherwise, it looks fine.

Q10. Your prior b5 = -0.0001 is extremely small. If this were the actual parameter value, then a very large sample size would be required to estimate this parameter at a statistically significant level, since your prior essentially says that the corresponding attribute hardly influences choice. You can see the sample size estimates for each parameter separately in the output; these are more useful to look at than the overall sample size estimate (which is not useful here since b5 is close to zero).
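To see why a near-zero prior inflates S: the S estimate for a parameter can be read as the sample size at which its t-ratio reaches 1.96, and t-ratios shrink as the prior approaches zero. A rough sketch of that relation (my own reconstruction, not Ngene's internal computation):

```python
# N >= (1.96 / t1)^2, where t1 is the t-ratio expected from a single respondent
def s_estimate(t1):
    return (1.96 / t1) ** 2

# a single-respondent t-ratio of ~0.00844 reproduces an S estimate in the
# ~54,000 range, like the one reported for the forced design above
print(round(s_estimate(0.00844)))
```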

Michiel

Re: Main survey design based on pilot survey results

Postby bye1830 » Fri Sep 22, 2023 10:43 am

Dear Michiel,

Thank you very much for answering all of my questions. I greatly appreciate your help, as always. Based on your answers, I have rewritten the script as shown below.

Code:
Design
? Dual response design
? Efficient design with priors from the pilot survey
;alts (forced) = BEV, HFCEV, NGV, DSL
;alts (unforced) = BEV, HFCEV, NGV, DSL
;rows = 24
;block = 3
;eff = combined (mnl,d, mean)
;bdraws = sobol(300)
;alg = swap (stop = noimprov(1000 iterations))
;fisher(combined) = design1(forced[0.5], unforced[0.5])
;con

;model(forced):
U(BEV) = b1[(n,-0.312465,0.721343)] +
         b2[(n,-0.004857,0.005391)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.026976,0.012590)]*ocostb[50,70] +
         b4[(n,0.003739,0.001150)]*rangeb[150,200,300,500] +
         b5[(u,-0.02,0)]*offsitez[10,20,60] +
         b6[(n,-0.004918,0.004596)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
         b3*ocosth[90,115,130] +
         b4*rangeh[300,500,700] +
         b5*offsitez +
         b6*onsiteh[0,25,50,75,100] /

U(NGV)= b7[-9] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[-9] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]   

;model(unforced):
U(BEV) = b1[(n,-0.692500,0.805947)] +
         b2[(n,-0.008447,0.006713)]*pcostz[105,110,115,125,150,175,200] +
         b3[(n,-0.010581,0.013136)]*ocostb[50,70] +
         b4[(n,0.002343,0.001272)]*rangeb[150,200,300,500] +
         b5[(u,-0.02,0)]*offsitez[10,20,60] +
         b6[(n,-0.006933,0.005754)]*onsiteb[0,25,50,75,100] /

U(HFCEV)= b2*pcostz +
          b3*ocosth[90,115,130] +
          b4*rangeh[300,500,700] +
          b5*offsitez +
          b6*onsiteh[0,25,50,75,100] /

U(NGV)=  b7[(n,-0.109496,0.811147)] +
         b2*pcostn[105,130] +
         b3*ocostn[70,90] +
         b4*rangen[700] +
         b5*offsiten[15] +
         b6*onsiten[100] /

U(DSL) = b8[(n,-0.233798,0.619895)] +
         b2*pcostd[100] +
         b3*ocostd[100] +
         b4*ranged[700] +
         b5*offsited[5]                     
$


After running Ngene, I obtained the efficiency measures shown below. At this point, I have a question regarding the interpretation of the Sp estimates.

Code:
MNL efficiency measures (combined)
                     Bayesian           
          Fixed      Mean       Std dev.   Median     Minimum    Maximum
D error   0.001837   0.002226   0.000415   0.002148   0.001521   0.004786
A error   0.544566   0.666516   0.143584   0.638738   0.456183   1.715677

MNL efficiency measures (forced)
Prior               b1          b2          b3          b4         b5         b6         b7        b8
Fixed prior value   -0.312465   -0.004857   -0.026976   0.003739   -0.01      -0.004918  -9        -9
Sp estimates        40.297667   9.316242    2.430341    1.136976   4.820191   7.049002   7.08518   6.975668
Sp t-ratios         0.308757    0.642149    1.257252    1.838148   0.892738   0.738231   0.736344  0.742101

MNL efficiency measures (unforced)
Prior               b1          b2          b3          b4         b5         b6         b7         b8
Fixed prior value   -0.6925     -0.008447   -0.010581   0.002343   -0.01      -0.006933  -0.109496  -0.233798
Sp estimates        20.498185   9.53626     20.065659   5.213036   18.270624  13.128503  633.009316 102.958256
Sp t-ratios         0.432911    0.634698    0.437552    0.858442   0.458542   0.540939   0.077902   0.193164



In the unforced design, the Sp estimate for each parameter is below roughly 20, except for b7 (633) and b8 (103). I expect the sample size for my main survey to be 60 to 100 at maximum. This design has 24 rows and 6 blocks, so if I achieve 100 completed responses, this would result in 25 replications of the design (= 100 / (24/6)). Is it therefore appropriate to say that parameters with an Sp estimate greater than 25 (e.g., b7 and b8) would likely be nonsignificant with a sample size of 100?
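As a back-of-the-envelope check of my reading (assuming, as I understand it, that Sp is expressed in respondents and that t-ratios grow with the square root of the sample size), here is a small Python sketch of the expected t-ratios at N = 100:

```python
import math

# Sp is the sample size at which the t-ratio reaches 1.96, and t-ratios
# scale with sqrt(N), so the expected t-ratio at N respondents is:
def expected_t(sp, n):
    return 1.96 * math.sqrt(n / sp)

# Sp estimates from the unforced design above, evaluated at N = 100
for name, sp in [("b4", 5.213036), ("b7", 633.009316), ("b8", 102.958256)]:
    print(name, round(expected_t(sp, 100), 2))
```

Under that reading, b7 stays well below 1.96 at N = 100, while b8 lands right at the margin.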

In addition, are there any other efficiency measures that I need to review?

Thank you again for taking the time to answer my questions. I would greatly appreciate any additional insights you may have.

Sincerely,

YB

