choice-metrics.com

Posted: **Wed Mar 10, 2021 4:18 am**

Dear Choice-Metrics team,

First, I would like to thank you for developing Ngene and for releasing such a detailed user manual! It not only explains the software well, but also provides more insight into experimental design theory for SC experiments than most of the literature. I read it over the weekend and now have a much better understanding of SC experiments. Before continuing with some information about my research context and my questions, I would like to warn you that I’m very new to discrete choice analysis, so I apologise in advance for any silly questions or mistakes.

My objective is to investigate whether entrepreneurs who raise equity have high preference heterogeneity concerning their investor choice. For this reason, I want to analyze the importance of investor characteristics as well as interaction (complementarity and substitutability) effects between them.

Model specification: My idea is to present entrepreneurs with 10 choice sets that each comprise 3 alternatives (hypothetical investors 1, 2 and 3). Each alternative is described by 5 generic attributes (e.g., reputation or R&D-related support), each of which can take on 3 levels (low, medium, and high). I want to dummy code each generic attribute into two dummy variables indicating the deviation from the reference level. Participants are then asked to identify the best and the worst of the three hypothetical investors (each respondent thus makes 20 choices). I found this approach in another paper; it has the advantage that I obtain a complete ranking of alternatives for each choice set. To analyze such rank-ordered data, the authors of that paper applied a rank-ordered mixed logit model (also called a random coefficient model). As this worked well for them, I want to follow them and apply this model type as well. Considering all these specifications, I want to generate an efficient fractional-factorial design. In this context, I have the following questions:

1) Coding scheme: I want to show 3 qualitative levels (e.g., low, medium, and high reputation). Is it correct to use design coding (e.g., A[0,1,2]) and to replace the numbers with the words «low, medium, and high» later on in the questionnaire?

2) Dummy coded attributes: In the mentioned paper, they used the value with the (presumably) lowest benefit as reference for each attribute to «ensure convenient interpretation of coefficient estimates». Let’s say I have the attribute level «having a low reputation» and I want to set this level (since it has the lowest benefit) as the reference. Considering page 111 in the manual, would I then have to relate the 2 in the term A[0,1,2] to the level «low» as the dummy variables (e.g., b1.dummy[1.2 | 2.2]) will automatically relate to the first two numbers/levels in the brackets?

3) Attribute level balance and algorithms: In the paper I found, the attribute levels with the (presumably) lowest benefit appeared 12x in the design each, whereas the other levels appeared 9x each. If I understand it correctly, this means that the attribute level balance property was satisfied, right? I would prefer a similar setup to ensure that the parameters can be estimated well over the whole range of levels. Is this a problem considering that row-based algorithms like the Modified Fedorov algorithm are suggested for unlabelled designs like mine?

4) Rp vs. ec vs. rpec: In the paper I found (sorry for referring to it again), they also generated their efficient design with Ngene. However, I do not understand why they apply a different utility function (compared to the ones described in the manual). They modeled the utility of alternative j in choice set t for respondent n as a quadratic additive function of the alternative's characteristics, described by the vector x_njt. This function contained a linear term β_n x_njt and a quadratic term. In the latter, which captured the interaction effects, the symmetric matrix carried a coefficient for each interaction that they included. I think up to here it is identical. Then they state "Note that both the coefficient vector and the coefficient matrix are respondent-specific. The ε_njt are residual error terms that are assumed to be independently and identically distributed and to follow an extreme value distribution." If I understand it correctly, this is a combined design (random parameters and error components), but the error components are not normally distributed (as suggested in the manual). Do you have an idea why this could make sense and whether it is possible to model it with Ngene?

5) Values of the parameter estimates: As I want to employ a rank-ordered mixed logit model, I had a look into the section "estimating random parameters models". It says that the parameters need to be defined as distributions. However, I do not know much about the parameters except for their sign so far.
5.1) Considering this, would you suggest conducting a small pilot study in order to get a better idea of the necessary inputs for the normal or uniform distributions?
5.2) If so, how would you generate the design for the pilot study?
5.3) Is it also necessary to define the interaction parameters as distributions (I want to analyze 5-10 two-way interaction effects)?

6) Internet survey: Can you recommend any internet survey software that is particularly compatible with Ngene? Maybe Sawtooth?

Sorry for all these questions. I thought collecting them is better than asking many follow-up questions.

I would be very pleased to receive your answers / opinions on these issues.

Thank you in advance and kind regards

Michael

Posted: **Fri Mar 12, 2021 8:37 am**

1) You can use design coding or any other coding scheme you like since they are merely symbols to be replaced with labels low, medium, and high. So you can use 0,1,2 as you indicate or you can use 3,2,1 or 100,200,0, whatever you prefer. I generally use design coding where I set the reference level (which in Ngene is the last level) to zero, i.e. [1,2,0]

2) Yes that is correct. In Ngene, I would typically use the following syntax:

U(...) = ... + b1.dummy[1.2|2.2] * A[1,2,0] ? 0 = low reputation (ref), 1 = medium reputation, 2 = high reputation
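
As an illustration of the coding described in points 1 and 2, here is a minimal Python sketch (not Ngene code) of how a three-level attribute written as A[1,2,0] maps to two dummy variables when the last listed level is the reference. The function name `dummy_code` and the `LABELS` dictionary are assumptions made for this example only.

```python
# Illustrative only: mimics dummy coding where the LAST listed level is the reference.
LEVELS = [1, 2, 0]  # as in A[1,2,0]; the last entry (0) is the reference level
LABELS = {0: "low reputation (ref)", 1: "medium reputation", 2: "high reputation"}


def dummy_code(level, levels=LEVELS):
    """Return one 0/1 indicator per non-reference level, in the listed order."""
    if level not in levels:
        raise ValueError(f"unknown level: {level}")
    return [1 if level == lv else 0 for lv in levels[:-1]]


for lv in LEVELS:
    # The numeric codes are only symbols; the questionnaire shows the labels.
    print(lv, LABELS[lv], dummy_code(lv))
```

The first dummy switches on for level 1 and the second for level 2, so the two priors in `b1.dummy[1.2|2.2]` belong to medium and high reputation respectively, and the reference level gets all zeros.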

3) If some levels appear 12 times and other levels appear 9 times, then the design is not completely attribute level balanced, but the balance is quite good. When you use dummy coding it is fine to use a row-based algorithm since maximising efficiency automatically ensures that each level appears a sufficient number of times to be able to estimate all parameters. For example, if a certain level does not appear at all in the design, then the corresponding dummy parameter cannot be estimated and the D-error would be infinite. Minimising the D-error means that all levels of dummy coded variables appear more or less equally.
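
The argument in point 3 can be made concrete with a small Python sketch. It computes the MNL Fisher information and D-error for a single dummy-coded three-level attribute and shows that the D-error becomes infinite when one non-reference level never appears. This is a simplified illustration under stated assumptions (one attribute, two parameters, zero priors, hand-picked toy designs), not Ngene's implementation; all function names are hypothetical.

```python
import math


def dummy(level):
    # Levels 1 and 2 each get an indicator; level 0 is the reference.
    return [1.0 if level == 1 else 0.0, 1.0 if level == 2 else 0.0]


def mnl_information(design, betas):
    """MNL Fisher information summed over choice sets (single respondent)."""
    k = len(betas)
    info = [[0.0] * k for _ in range(k)]
    for choice_set in design:
        xs = [dummy(level) for level in choice_set]
        expu = [math.exp(sum(b * x for b, x in zip(betas, row))) for row in xs]
        total = sum(expu)
        probs = [e / total for e in expu]
        # Probability-weighted mean of the attribute vector in this choice set.
        xbar = [sum(p * row[i] for p, row in zip(probs, xs)) for i in range(k)]
        for p, row in zip(probs, xs):
            for i in range(k):
                for j in range(k):
                    info[i][j] += p * (row[i] - xbar[i]) * (row[j] - xbar[j])
    return info


def d_error(design, betas):
    """D-error = det(inverse information)^(1/K); written for K = 2 only."""
    info = mnl_information(design, betas)
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    if det <= 1e-12:
        return math.inf  # singular information matrix: parameter not estimable
    return det ** (-1.0 / len(betas))


# Each tuple is one choice set: the level shown for alternatives 1, 2 and 3.
balanced = [(0, 1, 2), (1, 2, 0), (2, 0, 1), (0, 1, 2)]
level_2_missing = [(0, 1, 0), (1, 0, 1), (0, 0, 1), (1, 1, 0)]

print(d_error(balanced, [0.0, 0.0]))         # finite
print(d_error(level_2_missing, [0.0, 0.0]))  # inf
```

Because a search algorithm that minimises the D-error will never accept the second kind of design, it is pushed towards designs in which every level appears often enough, even without an explicit balance constraint.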

4) The authors are referring to random parameters, but not error components. They are referring to the typical logit epsilons that are extreme value distributed, which also underlie the multinomial logit model. So they will likely have estimated an rppanel model. Note that optimising for an rppanel model is extremely challenging, therefore it is recommended to optimise for the mnl model and only evaluate the design for rppanel.
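
To give a feel for why a panel random-parameter model is so much heavier than MNL, the following Python sketch simulates the probability of one respondent's sequence of choices by averaging over draws of normally distributed parameters. It is not how Ngene computes efficiency; it only shows that every evaluation involves many draws, whereas MNL needs a single closed-form calculation. The means, standard deviations, tasks and function names are hypothetical.

```python
import math
import random


def mnl_probs(betas, alternatives):
    """Closed-form MNL choice probabilities for one choice task."""
    expu = [math.exp(sum(b * x for b, x in zip(betas, alt))) for alt in alternatives]
    total = sum(expu)
    return [e / total for e in expu]


def simulated_panel_probability(means, sds, tasks, choices, n_draws=500, seed=1):
    """Average over draws of beta of the product of MNL probabilities across tasks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # One draw of the respondent-specific parameters, held fixed across tasks.
        betas = [rng.gauss(m, s) for m, s in zip(means, sds)]
        sequence_prob = 1.0
        for alts, chosen in zip(tasks, choices):
            sequence_prob *= mnl_probs(betas, alts)[chosen]
        total += sequence_prob
    return total / n_draws


# Two tasks, three alternatives each, described by two dummy variables.
tasks = [[(1, 0), (0, 1), (0, 0)], [(0, 1), (0, 0), (1, 0)]]
choices = [1, 0]  # index of the chosen alternative in each task

print(simulated_panel_probability([0.5, 1.0], [0.3, 0.3], tasks, choices))
```

With all standard deviations set to zero, the simulation collapses to the plain MNL probability of the same choice sequence, which is the model the design is optimised for in the recommended workflow.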

5) Yes, a pilot study is always recommended for obtaining priors. You would use near-zero priors for generating a design for the pilot study indicating the sign (i.e., the ordering of the levels) only, which allows Ngene to avoid dominant alternatives. You would use syntax like:

;alts = alt1*, alt2*, alt3* ? which indicates generic alternatives and performs dominance checks
...
U(...) = ... + b1.dummy[0.00001|0.00002] * A[1,2,0] ? 0 = low reputation (ref), 1 = medium reputation, 2 = high reputation
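
The idea behind a dominance check can be sketched in Python as follows. Once the priors fix the ordering of the levels (here 0 = low, 1 = medium, 2 = high, with higher assumed to be better on every attribute), a choice set is flagged when one alternative is at least as good as every other alternative on all attributes and strictly better on at least one. This is a simplified illustration; Ngene's own check may differ in detail, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if profile a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def has_dominant_alternative(choice_set):
    """True if some alternative dominates every other alternative in the choice set."""
    return any(
        all(dominates(a, b) for j, b in enumerate(choice_set) if j != i)
        for i, a in enumerate(choice_set)
    )


# Each profile lists the level (0, 1 or 2) of the five attributes.
dominated_set = [(2, 2, 1, 2, 2), (0, 1, 1, 0, 2), (1, 2, 0, 1, 1)]
trade_off_set = [(2, 0, 1, 1, 0), (0, 2, 1, 0, 1), (1, 1, 0, 2, 2)]

print(has_dominant_alternative(dominated_set))  # first investor beats both others
print(has_dominant_alternative(trade_off_set))  # every investor gives something up
```

A choice set like the first one tells you nothing about trade-offs, which is why only the sign of the priors is needed at the pilot stage.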

6) SurveyEngine is most compatible with Ngene. This is a survey instrument specifically designed for choice experiments and Ngene designs can be imported or copied-pasted directly into the survey tool. SurveyEngine is the instrument that I used for most of my surveys. They are based in Germany.

A comment: best-worst responses in a choice experiment are often criticised since making a worst choice is not a natural selection process. Instead, some have argued for best-best responses (first best, and of the remaining alternatives, the next best) as a more acceptable way to obtain full rankings of the alternatives.
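
A best-best response maps directly onto the rank-ordered ("exploded") logit, in which a full ranking is treated as a sequence of best choices from a shrinking set of alternatives. The Python sketch below shows this for three alternatives; the utility values are hypothetical and the function name is an assumption for illustration.

```python
import math
from itertools import permutations


def ranking_probability(utilities, ranking):
    """Rank-ordered logit: product of successive 'best' choices among the remaining alternatives."""
    remaining = list(range(len(utilities)))
    prob = 1.0
    for alt in ranking[:-1]:  # the last position is fixed once the others are chosen
        denom = sum(math.exp(utilities[j]) for j in remaining)
        prob *= math.exp(utilities[alt]) / denom
        remaining.remove(alt)
    return prob


utilities = [0.8, 0.2, -0.5]  # hypothetical systematic utilities of investors 1 to 3

for ranking in permutations(range(3)):
    print(ranking, round(ranking_probability(utilities, ranking), 4))
```

With three alternatives, each choice set therefore contributes two choice observations: one from the full set and one from the two alternatives that remain.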

Michiel

Posted: **Wed Mar 17, 2021 3:24 am**

Hello Michiel,

Thank you so much for your help and for suggesting SurveyEngine. I checked SurveyEngine out and the import function works really well with Ngene.

Moreover, I followed your recommendations and developed code for the pilot study. May I kindly ask you to take a look at it and answer some questions?

? MNL Model Design
;alts = Investor1, Investor2, Investor3
;rows = 12
;eff = (mnl,d) ? I optimise for the mnl model and only evaluate the design for rppanel
;model:
U(Investor1) = b1.dummy[-0.00001|0.00001] * VAS[1,2,0] ? 0 = average (ref), 1 = sign. less than av., 2 = sign. more than av.
             + b2.dummy[-0.00001|0.00001] * REP[1,2,0] ? 0 = average (ref), 1 = sign. less than av., 2 = sign. more than av.
             + b3.dummy[-0.00001|0.00001] * AUT[1,2,0] ? 0 = average (ref), 1 = sign. less than av., 2 = sign. more than av.
             + b4.dummy[-0.00001|0.00001] * EXP[1,2,0] ? 0 = average (ref), 1 = sign. less than av., 2 = sign. more than av.
             + b5.dummy[-0.00001|0.00001] * OFF[1,2,0] ? 0 = average (ref), 1 = sign. less than av., 2 = sign. more than av.
             + i1[0] * VAS.dummy[1] * REP.dummy[2] ? Just an example of how I would code the interaction effect. I would include many more of them.
             /
U(Investor2) = b1 * VAS + b2 * REP + b3 * AUT + b4 * EXP + b5 * OFF /
U(Investor3) = b1 * VAS + b2 * REP + b3 * AUT + b4 * EXP + b5 * OFF
$

1.) Last time, I mentioned that I want to use the level with the (presumably) lowest benefit as the base for each attribute to ensure convenient interpretation of coefficient estimates (I found that in some other papers). Example: The probability that entrepreneurs select investors with a high reputation is, on average, xy percentage points higher than the probability of them selecting investors with a low reputation (the reference level). The disadvantage of this is that I cannot use the low levels in my interactions. I have read in the forum that effects coding could be an alternative, but I would prefer dummy coding. Hence, I had the idea to adjust the levels to «sign. less than average», «average», and «sign. more than average» and to take «average» as my base for all attributes. This allows me to measure interactions between the extremes. Do you have any concerns with this approach?
2.) Theoretically, I am interested in all possible interaction effects, but I could of course focus on fewer. What do you think is a manageable number of interaction effects for my experimental design?
3.) I am not sure about the direction of the interaction effects. Is it ok to take a prior value of zero for the interaction parameters?
4.) I would also like to include variables that capture the properties of the participants (e.g., age, gender, education, personal characteristics etc.). Thus, I had a look at Section «Designs with covariates» of the manual, but it seems challenging to include them. Would it be ok to ignore them in the first step (= pilot study)?

Thank you for your support and best regards

Michael

Posted: **Tue Mar 23, 2021 8:26 am**

Hi Michael,

1. You are free to choose the base/reference level for dummy coding, so I do not see an issue with using the middle level as the reference.

2. You can include as many as you like but note that Ngene will optimise for all parameters and if most of them are interaction parameters, then the information collected on the main effects may become lower given that information capture needs to be spread equally across all parameters. So you may want to focus on the most important interactions, but it is no problem putting all of them in. In any case, you should use a larger number of rows to create more variety in your data, this is especially important when estimating interaction effects and when having many parameters to estimate. You could use ;rows = 24 and ;block = 2 or ;rows = 36 and ;block = 3 if you want to show 12 choice tasks to a single respondent.

3. Yes you would default them to zero.

4. Yes most people (I would say 99.99%) leave such covariates out when optimising the design. It is very rare to use covariates at the design optimisation stage.
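
The row-count advice in point 2 can be checked with some simple arithmetic, sketched here in Python. It counts the dummy-coded main-effect and interaction parameters and compares them with the usual degrees-of-freedom requirement that rows × (alternatives − 1) must be at least the number of parameters. This is a lower bound only, not a recommended design size, and the function name is hypothetical.

```python
from math import ceil, comb


def design_size_check(n_attributes, n_levels, n_alternatives, rows, blocks,
                      n_interaction_params=0):
    """Count parameters and compare the design size with the minimum needed."""
    if rows % blocks != 0:
        raise ValueError("rows must be divisible by the number of blocks")
    main_effects = n_attributes * (n_levels - 1)  # one dummy per non-reference level
    n_params = main_effects + n_interaction_params
    min_rows = ceil(n_params / (n_alternatives - 1))  # rows * (J - 1) >= K
    return {
        "parameters": n_params,
        "min_rows": min_rows,
        "rows_sufficient": rows >= min_rows,
        "tasks_per_respondent": rows // blocks,
    }


# All two-way interactions between dummies of different attributes:
# 10 attribute pairs x (2 x 2) dummy pairs = 40 interaction parameters.
all_interactions = comb(5, 2) * (3 - 1) ** 2

print(design_size_check(5, 3, 3, rows=12, blocks=1))
print(design_size_check(5, 3, 3, rows=24, blocks=2, n_interaction_params=10))
print(design_size_check(5, 3, 3, rows=36, blocks=3, n_interaction_params=all_interactions))
```

Under these assumptions, 12 rows cover the 10 main effects but fall short once many interactions are added; with all 40 interaction parameters at least 25 rows are needed, so ;rows = 36 with ;block = 3 would qualify while still showing 12 tasks per respondent.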

Michiel

Efficient fractional-factorial design for mixed logit model