choice-metrics.com

Posted: **Thu Dec 09, 2021 10:23 pm**

Hi all,

I have a question regarding the required number of respondents for a pivot design, and the consequences for estimation. In our experiment respondents choose between purchasing an 'electric vehicle' and a 'conventional car', so there are 2 alternatives in the choice set. The price attribute for the car is a pivot attribute. At an earlier stage in the survey, we ask respondents 'if you buy a car, what would be the price point you're looking for'. This is an 'open field variable': it can take any value within a specified range. The displayed prices are then pivoted around this price point at -25%, 0 and +25%. We intend to estimate the model with PandasBiogeme, starting with a very basic MNL (to get some insights, it is only a small research question) and then extending it with sociodemographics. This leaves me wondering how many respondents are needed. A rough estimate, of course...

Earlier, I used the rule of thumb suggested by Johnson and Orme, but it doesn't really apply in this case. Because the number of possible attribute levels is quite large, the number of required responses increases significantly. I guess we should treat this variable as a 'continuous variable'. Is there any way to estimate the number of respondents beforehand? I am aware of the guideline published by Orme that mentions continuous variables, but that seems to need pilot data to get an indication of the standard error and, consequently, the required number of responses?

I do have some pilot data... but it is not really usable, as the setup of this question in the experiment changed based on feedback from respondents. It is now 'open format' and it used to be a 'drop-down list' (which also potentially gave a huge number of levels). But if it is best to use the indications we got from that data, it appears that the number of respondents needs to be 8x the pilot size.

Other question: as written, our intention is to do a basic MNL. However, I am undecided between extending the MNL with socio-demographics and creating segments. The goal is to capture different segments or preferences, mainly in price (for example based on gender or income level). A colleague suggested segmenting the collected responses and estimating separate MNL models for each group. Of course, the latter approach needs enough data for every group (see also the previous question). Are you guys aware of any broader advantages/disadvantages of both methods? Literature maybe? Or is it basically the same?

Thanks for your opinions and expertise!

Posted: **Fri Dec 10, 2021 8:28 am**

Regarding sample size, the only proper way to compute it is via parameter priors and predicted standard errors. Ngene automatically computes sample size estimates based on the priors provided in the syntax; I refer to Rose and Bliemer (2013). These priors typically come from a pilot study using the same alternatives and attributes. Without doing a pilot study, it is difficult to compute sample size estimates. You could use cost as a numerical variable, not as a dummy coded categorical variable, in your utility function, so you would generally only estimate a single coefficient for cost. Sample size estimates when using a pivot design can be based on the average cost level. You could use a nonlinear transformation in Biogeme if you wish, like beta * ln(cost) in the utility function.

Rose, J.M. and M.C.J. Bliemer (2013) Sample size requirements for stated choice experiments. Transportation, Vol. 40, No. 5, pp. 1021-1041.
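To make the idea of "priors plus predicted standard errors" concrete, here is a minimal Python sketch of the kind of calculation involved. It is not Ngene's implementation. It assumes a binary MNL with one constant for the electric vehicle and one generic cost coefficient, and it evaluates the design at a single average reference price. The prior values, the reference price of 30 (in thousands) and the 9-task full factorial are all made-up placeholders; substitute your own design and pilot estimates. The required sample size per parameter is taken as (t * se / beta)^2, where se is the predicted standard error for one respondent completing the whole design.

```python
import numpy as np

def mnl_fisher_information(design, beta):
    """Fisher information of an MNL model for one respondent.

    design: array of shape (tasks, alternatives, parameters)
    beta:   array of prior parameter values, shape (parameters,)
    """
    info = np.zeros((beta.size, beta.size))
    for X in design:
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return info

def s_estimates(design, beta, t_value=1.96):
    """Respondents needed per parameter for |t-ratio| >= t_value."""
    avc = np.linalg.inv(mnl_fisher_information(design, beta))
    se_one_respondent = np.sqrt(np.diag(avc))
    return (t_value * se_one_respondent / beta) ** 2

# Hypothetical average reference price, pivoted -25%, 0, +25%.
ref_price = 30.0
prices = ref_price * np.array([0.75, 1.0, 1.25])

# Each task: rows = [EV, conventional], columns = [EV constant, cost].
# A full factorial of the price levels is used purely as a placeholder.
design = np.array([
    [[1.0, p_ev], [0.0, p_conv]]
    for p_ev in prices
    for p_conv in prices
])

# Hypothetical priors: [EV constant, cost coefficient per thousand].
beta_prior = np.array([-0.5, -0.1])

n_required = np.ceil(s_estimates(design, beta_prior))
print(dict(zip(["asc_ev", "cost"], n_required)))
```

The largest value across the parameters is the binding one. Note that a prior of exactly zero gives an infinite estimate, so this only works for parameters you expect to be non-zero. Since standard errors shrink roughly with the square root of the sample size, the same logic can be applied to pilot results: a pilot t-ratio of about 0.69 would, for instance, point to roughly 8 times the pilot sample.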

Adding sociodemographics to your utility functions and estimating a single model is preferred over creating segments and estimating separate models for each segment. The reason is that in the latter approach you lose statistical power when you compare estimates across the separate models. Making comparisons will require a statistical test that involves two standard errors. In contrast, when using a joint model, you can perform a statistical test by looking at only a single parameter, so the test involves only one standard error.

For example, in a joint model you could use:

U = b1 * cost + b2 * cost * gender

If b2 = 0 then there is no gender effect.
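The two kinds of test can be written out as follows. This is only an arithmetic sketch in Python: the function names are mine, and the estimates and standard errors are invented numbers, not results from any model. It also assumes the two segment models are estimated on independent samples, so the variances of the two estimates simply add.

```python
import math

def t_ratio_single(estimate, std_err):
    """Joint model: test b2 = 0 using one estimate and one standard error."""
    return estimate / std_err

def t_ratio_difference(est_a, se_a, est_b, se_b):
    """Separate models: test equality of two independent estimates.

    The denominator combines both standard errors.
    """
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Invented numbers, for illustration only.
# Joint model: interaction coefficient b2 and its standard error.
t_joint = t_ratio_single(-0.04, 0.02)

# Separate models: cost coefficient per segment with its standard error.
t_separate = t_ratio_difference(-0.10, 0.03, -0.06, 0.04)

print(round(t_joint, 2), round(t_separate, 2))
```

In both cases the statistic is compared against the usual critical value (e.g. 1.96 for a two-sided test at the 5% level).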

Another example:

U = b1 * cost * income^b2

This is a typical way of including income, where income = (actual income / average income) and b2 is the income elasticity. b2 is typically negative, indicating that people with a higher income are less cost sensitive. If b2 = 0 then there is no income effect.
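A small Python sketch of how this term behaves, using invented values for b1, b2 and the incomes (the function name and the numbers are placeholders, not estimates):

```python
def cost_utility(cost, income, mean_income, b1, b2):
    """Cost term of the utility: b1 * cost * (income / mean_income) ** b2."""
    relative_income = income / mean_income
    return b1 * cost * relative_income ** b2

# Invented values: cost of 30 (thousands), average income of 50 (thousands).
b1 = -0.1

# With b2 = 0 the income term drops out and only b1 * cost remains.
u_no_effect = cost_utility(30.0, 200.0, 50.0, b1, 0.0)

# With b2 = -0.5, someone with 4x the average income gets a scale factor
# of 4 ** -0.5 = 0.5, i.e. half the cost sensitivity of the average person.
u_high_income = cost_utility(30.0, 200.0, 50.0, b1, -0.5)
u_avg_income = cost_utility(30.0, 50.0, 50.0, b1, -0.5)

print(u_no_effect, u_high_income, u_avg_income)
```

Because b2 appears as an exponent, the utility is nonlinear in the parameters, so the estimation software has to accept such expressions.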

In other words, I would recommend adding sociodemographics as interactions with other variables in your utility function, which allows you to investigate the impact of different segments in the population and also conduct powerful statistical tests based on the entire data set.

Michiel

Posted: **Tue Dec 14, 2021 2:36 am**

Thanks Michiel for your opinion. I appreciate it, and it helped confirm my set-up regarding the single model vs segments. The income elasticity is a nice way to do this. I wonder if it still works with a categorical income level (instead of a more precise open-field answer). I guess so, but less precisely. It would be nice to experiment with it versus adding several dummies for the different income levels.

Posted: **Tue Dec 14, 2021 9:12 am**

Income is often asked in categories, e.g. $50,000-$75,000, but you could consider taking the mean of each category and using it as a continuous variable (this requires an assumption for an open-ended category such as $200,000+). Using income with dummy coding is possible, but you will get a very large number of parameters to estimate, which I generally try to avoid.
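As a rough Python sketch of that conversion: the bracket labels and bounds below are hypothetical, the midpoint of the bounds is used as a stand-in for the category mean, and the 250,000 assigned to the open-ended top bracket is purely an assumption that you would need to justify for your own sample.

```python
from statistics import mean

# Hypothetical income brackets: label -> (lower bound, upper bound).
BRACKETS = {
    "0-25k": (0, 25_000),
    "25k-50k": (25_000, 50_000),
    "50k-75k": (50_000, 75_000),
    "200k+": (200_000, None),  # open-ended top category
}

# Assumed representative value for the open-ended category.
OPEN_TOP_ASSUMED = 250_000

def bracket_income(label):
    """Representative income for a bracket (midpoint of its bounds)."""
    low, high = BRACKETS[label]
    if high is None:
        return float(OPEN_TOP_ASSUMED)
    return (low + high) / 2

def relative_incomes(labels):
    """Income divided by the sample average, one value per respondent."""
    values = [bracket_income(label) for label in labels]
    average = mean(values)
    return [value / average for value in values]

answers = ["0-25k", "50k-75k", "50k-75k", "200k+"]
print(relative_incomes(answers))
```

The resulting relative income can then be used directly as the income variable in a term like b1 * cost * income^b2.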

Pivot design - number of respondents