## Pilot data for main study design and sample size estimate

### Pilot data for main study design and sample size estimate

Hi there, I have conducted a pilot study to inform the priors for generating my main survey design, and I have some questions about how to take these findings forward to the main study. For context, my design is binary: each choice set involves one medical test option, and participants decide whether or not to test. The decision not to test is assumed to have zero utility.

1) I included an ASC (b0) for the decision to test when generating the pilot design in Ngene. However, because my design is binary, including a constant that equals 1 when someone decides to test and 0 when someone decides not to test means the ASC is a perfect predictor of choice, so I have not manually created an ASC for running the model. Instead, I have run a binary logistic regression on the pilot data (n=60), which automatically includes a constant (intercept term) in the output. My understanding of this variable (which is positive and significant: 2.63, p<.001) is that it reflects the probability of choosing Yes (to test) at the reference level of the categorical variables in the model (which are effects coded). Is this an appropriate way to interpret the constant, and an appropriate approach, before I proceed with the main study?

2) I planned to use the priors from the pilot study to estimate the sample size using the de Bekker-Grob et al. (2015) method. In the design matrix used for the calculation there are two rows per choice set: one for the decision to test (with that choice set's attribute levels) and one for the decision not to test (all attributes = 0). Should the constant term be included in the sample size calculation? If so, is it appropriate to add a column to the design that equals 1 for the test row and 0 for the no-test row, and then use the constant estimated in the logistic regression model as its parameter?

3) My design has 36 choice tasks divided into 3 blocks of 12. For the sample size calculation, does this mean that I will need to multiply the sample size estimates by 3 to obtain the total required sample size for the main study?

4) I'm estimating the required sample size for a binary logistic regression model by including one parameter per variable in the calculation. For the main study I plan to run a mixed effects logistic regression to account for heterogeneity and repeated measures. Should I therefore use 2 parameters per variable when calculating the required sample size at this stage? If so, is there an example or advice on how to construct the design matrix for the sample size calculation?
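The two data layouts contrasted in questions 1 and 2 can be sketched with a few made-up records. In this minimal Python illustration, `pilot`, `to_wide` and `to_long` are hypothetical names, not Stata or Ngene features. It shows that an ASC belongs to the test alternative, so it is 1 on every test row whatever the respondent chose, and that the wide layout's intercept and the long layout's 1/0 column encode the same constant.

```python
# Hypothetical pilot records: (respondent, task, chose_test), 1 = "test", 0 = "no test".
pilot = [(1, 1, 1), (1, 2, 0), (2, 1, 1), (2, 2, 1)]

def to_wide(records):
    """One row per choice task (Stata-style binary logit): intercept is 1 on every row."""
    return [{"resp": r, "task": t, "const": 1, "choice": y} for r, t, y in records]

def to_long(records):
    """Two rows per choice task: the ASC is 1 on the test row and 0 on the
    no-test row, regardless of what the respondent actually chose."""
    rows = []
    for r, t, y in records:
        rows.append({"resp": r, "task": t, "alt": "test", "asc": 1, "chosen": y})
        rows.append({"resp": r, "task": t, "alt": "no_test", "asc": 0, "chosen": 1 - y})
    return rows

wide_format = to_wide(pilot)
long_format = to_long(pilot)

# The ASC disagrees with the chosen indicator whenever someone declines the test,
# so it describes the alternative rather than copying the choice.
mismatches = sum(1 for row in long_format if row["asc"] != row["chosen"])

# Differencing the test and no-test rows of each task gives 1, i.e. the wide
# layout's intercept column: both layouts carry the same constant.
asc_diffs = [long_format[i]["asc"] - long_format[i + 1]["asc"]
             for i in range(0, len(long_format), 2)]

print(mismatches, asc_diffs)  # prints: 2 [1, 1, 1, 1]
```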

sab

Posts: 16
Joined: Fri Dec 15, 2023 3:55 am

### Re: Pilot data for main study design and sample size estimate

1. You do indeed need a constant for the test alternative, relative to the opt-out, which has no constant, and you need to add this constant to the utility function when generating an efficient design. The constant does not reflect the probability of testing; it merely indicates that people prefer testing over not testing, ceteris paribus (although this interpretation may not be possible if the attributes of the test alternative cannot take the value zero).

2. Ngene uses only one row per choice set; perhaps you are referring to the long format in your estimation software, or to the format used in De Bekker-Grob et al. (2015). You need to include the constant when you specify the utility function, as it affects the choice probabilities and hence the sample size calculations. You can also compute the sample size estimate for the constant itself.

3. Yes

4. No, there is no sample size calculation for the mixed logit model unless you also have priors for it. Ngene can calculate sample size estimates for mixed logit if you provide priors for the distributional parameters. However, mixed logit parameters estimated from a pilot study are often not very reliable, so I would not bother.
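Answers 2 and 3 can be made concrete with a minimal Python sketch of the de Bekker-Grob et al. (2015) formula as we understand it: `N >= (z_(1-beta) + z_(1-alpha/2))^2 * Sigma_gg / delta^2`, where `Sigma` is the variance-covariance matrix for a single respondent completing the design and `delta` is the effect size, here assumed equal to the prior. The toy design, the second prior and all function names are hypothetical; only the 2.63 intercept comes from the thread. The ASC enters as a column of ones in the one-row layout, which is equivalent to the 1/0 column in the two-row layout because the no-test row is all zeros. Ngene's built-in sample size estimates may be defined slightly differently, so treat this as a cross-check only.

```python
import math
from statistics import NormalDist

def information_one_respondent(design, priors):
    """Fisher information of a binary logit for ONE respondent answering every row.

    Each row lists the test alternative's columns (first entry = ASC of 1). With a
    two-row layout whose no-test row is all zeros, the MNL information per task
    reduces to p * (1 - p) * x x', which is what is accumulated here."""
    k = len(priors)
    info = [[0.0] * k for _ in range(k)]
    for x in design:
        v = sum(b * xi for b, xi in zip(priors, x))
        p = 1.0 / (1.0 + math.exp(-v))  # P(test) in this task under the priors
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return info

def invert(m):
    """Gauss-Jordan inverse with partial pivoting; adequate for small matrices."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]

def required_respondents(design, priors, alpha=0.05, power=0.80, n_blocks=1):
    """Sample size per parameter, with delta set to the prior (must be non-zero).
    `design` is the FULL design; if each respondent sees only one of `n_blocks`
    blocks, each contributes 1/n_blocks of the information, so N scales by n_blocks."""
    z = NormalDist().inv_cdf
    cov = invert(information_one_respondent(design, priors))
    factor = (z(power) + z(1 - alpha / 2)) ** 2
    return [math.ceil(n_blocks * factor * cov[g][g] / priors[g] ** 2)
            for g in range(len(priors))]

# Hypothetical toy design: columns = [ASC, one effects-coded two-level attribute].
design = [[1, 1], [1, -1], [1, 1], [1, -1], [1, 1], [1, -1]]
priors = [2.63, -0.5]  # 2.63 is the pilot intercept; -0.5 is made up

print(required_respondents(design, priors))              # everyone answers all 6 rows
print(required_respondents(design, priors, n_blocks=3))  # 3 blocks of 2 rows each
```

The second call mirrors answer 3: when the full design is split into 3 blocks, each respondent supplies a third of the information, so the requirement is roughly tripled (up to rounding).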

Michiel
Michiel Bliemer

Posts: 1815
Joined: Tue Mar 31, 2009 4:13 pm

### Re: Pilot data for main study design and sample size estimate

Thanks very much for your help with this, Michiel.

1) I am analysing the data in Stata with one row per choice task and a binary 0/1 choice variable. I have not added a column for the constant to my data set, as it would be identical to the choice variable. Instead, I am using the constant that Stata automatically includes in the output of a binary logistic regression (the intercept term). Is the coefficient for this constant (the intercept, 2.63, p<.001) the parameter I should use when generating the design for the main study? For example:
`U(test) = 2.63*b0 + b1.effects[0|0]*modality(0,1,2) + …`
I read in your other posts that the prior for the constant should not be zero if informative priors are used for the other parameters. So if it is not sensible to use this coefficient of 2.63 (which seems quite large), should I at least include a small positive prior for the constant when generating the main study design in Ngene, to indicate that choosing to test is the preferred option?

2) Is this coefficient for the constant (2.63) also the parameter I should specify when calculating the sample size estimate for the constant? Unlike the Stata layout (one row per task), the data matrix in the De Bekker-Grob format has a row for test and a row for no test. Should I add a column to the design matrix that equals 1 for the test row and 0 for the no-test row, and then specify 2.63 as the parameter for this variable?
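The `b1.effects[0|0]*modality(0,1,2)` term above also bears on how the intercept can be read. A minimal Python sketch, assuming (as is common for effects coding) that the last level is the base and is coded -1 in every column; please check the Ngene manual for how it orders levels. The priors in `b1` are hypothetical. Under effects coding no level is coded all zeros, so the intercept is not the utility at a reference level; it is the test utility averaged over the levels, and it is a utility (log-odds), not a probability.

```python
def effects_code(level, n_levels):
    """Effects-code `level` (0..n_levels-1), ASSUMING the last level is the base,
    which is coded -1 in every column."""
    if level == n_levels - 1:
        return [-1] * (n_levels - 1)
    return [1 if j == level else 0 for j in range(n_levels - 1)]

codes = {lvl: effects_code(lvl, 3) for lvl in range(3)}
print(codes)  # prints: {0: [1, 0], 1: [0, 1], 2: [-1, -1]}  (no level is [0, 0])

b0 = 2.63         # pilot intercept quoted in the thread
b1 = [0.4, -0.1]  # HYPOTHETICAL priors for the two effects-coded modality columns

def v_test(level):
    """Utility of the test alternative at a given modality level (no-test utility = 0)."""
    return b0 + sum(b * x for b, x in zip(b1, codes[level]))

utilities = [v_test(lvl) for lvl in range(3)]
# The implied base-level effect is -(0.4 - 0.1) = -0.3, so the three level effects
# sum to zero and the average test utility across levels equals b0.
print(round(sum(utilities) / len(utilities), 6))  # prints: 2.63
```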
sab

### Re: Pilot data for main study design and sample size estimate

1. If Stata estimates an intercept, then this is indeed the ASC. You would enter it as `b0[2.63] + ...` in Ngene. The value sounds correct if most respondents choose the test alternative and/or if the attributes have negative utilities. You can see in Ngene what the choice probabilities are in each choice task with the given priors.

2. Yes and yes.
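As a rough cross-check of the per-task choice probabilities that Ngene reports, the same binary logit probabilities can be computed directly. This is a minimal Python sketch: the attribute priors and task rows are hypothetical, and only the 2.63 intercept comes from the thread. It also illustrates the earlier point that the constant is a utility shift rather than a probability: an ASC of 2.63 alone implies P(test) of about 0.93 only in tasks where the attribute terms net to zero.

```python
import math

def p_test(asc, betas, x):
    """Binary logit P(test) when the no-test alternative has utility 0."""
    v = asc + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-v))

ASC = 2.63           # pilot intercept from the thread
BETAS = [-0.6, 0.2]  # HYPOTHETICAL priors for two coded attribute columns
tasks = [[1, 0], [0, 1], [-1, -1], [1, 1]]  # HYPOTHETICAL coded levels, one row per task

for t, x in enumerate(tasks, start=1):
    print(f"task {t}: P(test) = {p_test(ASC, BETAS, x):.3f}")

# If the attribute terms net to zero, the ASC alone implies P(test) of about 0.933;
# a negative attribute contribution pulls the probability below that.
baseline = p_test(ASC, BETAS, [0, 0])
print(f"ASC only: {baseline:.3f}")  # prints: ASC only: 0.933
```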

Michiel
Michiel Bliemer

### Re: Pilot data for main study design and sample size estimate

Thanks so much for your help with this, Michiel.
sab