Dear moderators and users,
I am writing to you about a problem of counterintuitive pilot results, namely unexpected parameter signs.
The following points briefly describe the project.
1. The survey aims to estimate valuations of travel time reliability.
2. A D-efficient design for MNL was created for 2 utility functions, using model averaging.
3. Both functions share 4 common attributes (mean waiting time, in-vehicle time, travel cost, maximum possible waiting time) and have 2 differing attributes describing reliability.
4. The attributes are not independent: some are functions of other attributes that are not themselves part of the utility functions. This was implemented by supplying Ngene with a list of all allowed choice alternatives. (A collinearity check on this is sketched after the list.)
5. Three designs were created, to be given to respondents with short/medium/long travel times. The medium and long designs yielded small S-errors (<20 respondents), but the best design for short travel times gave a high S-error (>1000 respondents). Data from all segments are pooled for the estimation. (A sketch of the S-estimate calculation also follows the list.)
6. A pilot of 40 respondents was carried out. No counterintuitive behaviour was observed: the respondents either traded off the attributes or used various lexicographic rules.
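
Since interdependent attributes are a classic cause of unexpected signs, here is the kind of collinearity check I have in mind for point 4, as a minimal sketch only; the file name and column names are placeholders, not my actual attributes.

```python
# Minimal collinearity check on the pooled design (file and column names
# are placeholders for illustration, not the actual Ngene output).
import numpy as np
import pandas as pd

design = pd.read_csv("pooled_design.csv")  # hypothetical export of all three designs
attrs = ["mean_wait", "in_vehicle_time", "cost", "max_wait", "rel_1", "rel_2"]

# Pairwise correlations between attribute levels across the design
print(design[attrs].corr().round(2))

# Condition number of the standardised attribute matrix; values well above
# ~30 are a common rule-of-thumb warning that the attributes' effects cannot
# be separated, which can flip estimated signs
X = design[attrs].to_numpy(dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
print("condition number:", round(np.linalg.cond(X), 1))
```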
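And here is my understanding of where the S-estimates in point 5 come from, again only as a sketch; the dimensions and priors below are made up, not the values from my design.

```python
# Sketch of the MNL S-estimate calculation (dimensions and priors are
# made up for illustration).
import numpy as np

def s_estimates(X, beta, z=1.96):
    """Respondents needed per parameter, from the design's AVC matrix.
    X: (tasks, alts, attrs) attribute levels; beta: nonzero priors."""
    info = np.zeros((len(beta), len(beta)))
    for Xn in X:                                 # one choice task at a time
        v = Xn @ beta
        p = np.exp(v - v.max())
        p /= p.sum()                             # MNL choice probabilities
        W = np.diag(p) - np.outer(p, p)
        info += Xn.T @ W @ Xn                    # Fisher information contribution
    se1 = np.sqrt(np.diag(np.linalg.inv(info)))  # std. errors for one respondent
    return (z * se1 / beta) ** 2                 # S-estimate per parameter

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(12, 2, 4))       # 12 tasks, 2 alternatives, 4 attributes
print(s_estimates(X, np.array([-0.8, -0.5, -0.3, -0.2])).round(1))
```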
The problem is that some parameter estimates have unexpected positive signs (for mean waiting time, travel cost, or several parameters at once, depending on which utility function is used).
All-negative estimates can be obtained only if mean waiting time is excluded, but that is not acceptable for the intended use of the results.
Moreover, while conducting the survey it seemed that respondents did evaluate mean waiting time negatively.
My question is: do you suspect there is a fundamental problem with my design?
Or could it be a matter of sample size, in which case I should keep interviewing additional pilot respondents until all (or at least the most important) parameters have the expected signs?
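
To get a feel for the sample-size explanation, I put together a rough Monte Carlo sketch: with a truly negative coefficient and a pilot-sized sample, the estimate can still come out positive in a fraction of replications. All levels, priors, and dimensions below are illustrative assumptions, not my design.

```python
# Rough Monte Carlo sketch: how often does a truly negative coefficient
# get a positive estimated sign at pilot scale? All levels, priors, and
# dimensions here are illustrative assumptions, not the actual design.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
N_RESP, N_TASKS, N_ALTS = 40, 8, 2             # pilot size, tasks, alternatives
TRUE_BETA = np.array([-0.02, -0.10])           # e.g. mean waiting time, travel cost

def neg_loglik(beta, X, y):
    """Negative MNL log-likelihood. X: (obs, alts, attrs); y: chosen alternative."""
    v = X @ beta                                # utilities, shape (obs, alts)
    v = v - v.max(axis=1, keepdims=True)        # numerical stability
    logp = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].sum()

flips = 0
R = 200                                         # Monte Carlo replications
for _ in range(R):
    X = rng.uniform([5.0, 1.0], [30.0, 10.0],   # made-up attribute level ranges
                    size=(N_RESP * N_TASKS, N_ALTS, 2))
    u = X @ TRUE_BETA + rng.gumbel(size=(N_RESP * N_TASKS, N_ALTS))
    y = u.argmax(axis=1)                        # simulated MNL choices
    fit = minimize(neg_loglik, np.zeros(2), args=(X, y), method="BFGS")
    flips += fit.x[0] > 0                       # wrong sign on waiting time?

print(f"Wrong sign on the waiting-time coefficient in {flips}/{R} replications")
```

If the flip rate under realistic priors is non-negligible, the positive signs in the pilot could simply be noise; otherwise, the design itself seems the more likely suspect.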
If necessary, I will be happy to provide any further information.
Many thanks in advance and best regards,
Baiba