Hi all,
I have a question regarding the required number of respondents for a pivot design, and the consequences for estimation. In our experiment, respondents choose between purchasing an 'electric vehicle' and a 'conventional car', so there are 2 alternatives in the choice set. The price attribute for the car is a pivot attribute. At an earlier stage in the survey, we ask respondents 'if you buy a car, what would be the price point you're looking for'. This is an 'open field variable': it can take basically any value within a specified range. The displayed prices are then pivoted around this price point by -25%, 0% and +25%. We intend to estimate the model with PandasBiogeme, starting with a very basic MNL (to get some insights; it is only a small research question) and then extending it with sociodemographics. This leaves me wondering how many respondents are needed. A rough estimate is fine, of course...
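To make the pivoting concrete, here is a minimal sketch of how the displayed prices follow from the stated price point. The function name and the example price of 30000 are only for illustration and are not taken from our survey code.

```python
def pivot_prices(reference_price, shifts=(-0.25, 0.0, 0.25)):
    """Return the displayed prices for one respondent.

    reference_price: the price point the respondent entered earlier.
    shifts: relative pivots around that price point (-25%, 0%, +25%).
    """
    return [round(reference_price * (1 + s), 2) for s in shifts]


# Hypothetical respondent who stated a price point of 30000
print(pivot_prices(30000))  # [22500.0, 30000.0, 37500.0]
```

So every respondent sees only three price levels, but across respondents the absolute prices can take almost any value.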
Earlier, I used the rule of thumb suggested by Johnson and Orme, but it doesn't really apply in this case. Because the possible number of attribute levels is quite large, the number of required responses increases significantly. I guess we should treat this variable as a 'continuous variable'. Is there any way to estimate the number of respondents beforehand? I am aware of the guideline published by Orme that mentions continuous variables, but that seems to need pilot data to get an indication of the standard errors and, consequently, the required number of responses?
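For reference, the rule of thumb as I understand it is N >= 500 * c / (t * a), with c the largest number of levels of any attribute, t the number of choice tasks per respondent and a the number of alternatives. A minimal sketch of why it breaks down here: the 8 tasks per respondent and the level counts below are made-up values for illustration, only the 2 alternatives come from our design.

```python
import math


def rule_of_thumb_respondents(max_levels, tasks, alternatives):
    """Minimum respondents according to N >= 500 * c / (t * a)."""
    return math.ceil(500 * max_levels / (tasks * alternatives))


# Price treated as 3 levels (-25%, 0%, +25%), hypothetical 8 tasks, 2 alternatives
print(rule_of_thumb_respondents(max_levels=3, tasks=8, alternatives=2))   # 94

# Price treated as, say, 50 distinct absolute levels (hypothetical)
print(rule_of_thumb_respondents(max_levels=50, tasks=8, alternatives=2))  # 1563
```

The required sample grows linearly with the number of levels, which is why counting every possible absolute price as a level gives unworkable numbers.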
I do have some pilot data... but it is not really usable, as the setup of this question in the experiment changed based on feedback from respondents. It is now 'open format' and it used to be a 'drop-down list' (which also potentially gives a huge number of levels). But if it is best to use whatever indications the data give us, it appears that the number of respondents needs to be 8x the pilot size.
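The reasoning behind scaling up a pilot, as far as I understand it, is that standard errors shrink roughly with 1/sqrt(N) when the design stays the same, so the sample has to grow with the square of the ratio between the target t-ratio and the pilot t-ratio. A rough sketch under that assumption: the coefficient and standard error below are hypothetical, not our actual pilot results.

```python
def sample_size_multiplier(beta, std_err, target_t=1.96):
    """Factor by which the pilot sample must grow so that |beta| / SE
    reaches target_t, assuming SE scales with 1 / sqrt(N) and the
    coefficient estimate itself stays the same."""
    pilot_t = abs(beta) / std_err
    return (target_t / pilot_t) ** 2


# Hypothetical pilot estimate with a t-ratio of 0.7
multiplier = sample_size_multiplier(beta=-0.07, std_err=0.10)
print(round(multiplier, 1))  # roughly 7.8 times the pilot size
```

A pilot t-ratio of about 0.7 on the price coefficient would therefore already imply a factor close to 8.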
Another question: as written, our intention is to estimate a basic MNL. However, I am undecided between extending the MNL with socio-demographics and creating segments. The goal is to capture different segments or preferences, mainly in price (for example based on gender, income level, etc.). A colleague suggested segmenting the collected responses and estimating separate MNL models for each group. Of course, for the latter approach you need enough data for every group (see also the previous question). Are you aware of any broader advantages/disadvantages of both methods? Literature maybe? Or is it basically the same?
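To show what I mean by extending the MNL with socio-demographics, here is a minimal sketch of a price coefficient that shifts with gender. All names and coefficient values are hypothetical and only illustrate the specification; this is not our Biogeme model.

```python
import math


def ev_probability(price_ev, price_conv, female,
                   b_price=-0.00005, b_price_female=-0.00002, asc_ev=0.0):
    """Probability of choosing the electric vehicle in a two-alternative MNL.

    The price sensitivity is b_price for men and
    b_price + b_price_female for women (female is 0 or 1).
    """
    price_coef = b_price + b_price_female * female
    v_ev = asc_ev + price_coef * price_ev
    v_conv = price_coef * price_conv
    return 1.0 / (1.0 + math.exp(v_conv - v_ev))


# Same choice task, EV more expensive than the conventional car
print(ev_probability(37500, 30000, female=0))
print(ev_probability(37500, 30000, female=1))
```

In this version only the price coefficient differs between groups and everything else is shared, whereas separate models per segment would let every parameter differ, including the alternative-specific constant.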
Thanks for your opinions and expertise!