Hi there, I have conducted a pilot study to inform the use of priors for generating my main survey design and have some questions about how I take these findings forward for the main study. For context, my design is binary – each choice set involves one medical test option and participants decide whether to test or not. The decision to not test is assumed to have zero utility.

1) I included an ASC (b0) for the decision to test when generating the pilot design in Ngene. However, because my design is binary, a constant that equals 1 when someone decides to test and 0 when they decide not to test would be a perfect predictor of choice, so I have not manually created an ASC for running the model. Instead, I have run a binary logistic regression on the pilot data (n=60), which automatically includes a constant (intercept term) in the output. The constant is positive and significant (2.63, p<.001), and my understanding is that it reflects the probability of choosing Yes (to test) at the reference level of the categorical variables in the model (which are effects coded). Is this an appropriate interpretation of the constant, and an appropriate approach, before I proceed with the main study?
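A minimal sketch of how the constant would translate into a probability, in case it helps to check the interpretation. A logistic regression intercept is on the log-odds scale, so it is converted with the inverse logit. This assumes every other term in the linear predictor is zero; whether that point corresponds to the reference level under effects coding is part of what I am asking. The function name is just illustrative.

```python
import math


def inv_logit(log_odds):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


intercept = 2.63  # constant from the pilot logistic regression

# Probability of choosing "test" when all other terms contribute zero.
p_test = inv_logit(intercept)
print(round(p_test, 3))  # roughly 0.93
```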

2) I planned to use the priors from the pilot study to estimate the required sample size using the de Bekker-Grob et al. (2015) method. In the design matrix used for the calculation, there are two rows per choice set: one for the decision to test (with the appropriate attribute levels for that choice set) and one for the decision not to test (all attributes = 0). Should the constant term be included in the sample size calculation here? If so, is it appropriate to add a column to the design that equals 1 for the test row and 0 for the no-test row, and then use the coefficient for the constant term estimated in the logistic regression model as the parameter?
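To make the layout in (2) concrete, here is a minimal sketch of what I have in mind, cut down to the constant plus a single attribute. Only the constant (2.63) comes from my pilot model; the attribute coefficient, the attribute levels, and all function names are made-up placeholders. The sample size formula is my understanding of the de Bekker-Grob et al. (2015) approach, N > ((z_(1-beta) + z_(1-alpha)) * sqrt(var_k) / |beta_k|)^2, where var_k is the k-th diagonal element of the parameter covariance matrix for a single respondent. It also assumes one respondent answers every choice set listed.

```python
import math
from statistics import NormalDist

# Priors, ordered as [constant, attribute]. Only 2.63 is from the pilot
# model; the attribute coefficient is a made-up placeholder.
priors = [2.63, -0.50]

# Hypothetical attribute level shown in the "test" row of each choice set.
test_levels = [-1.0, 1.0, -1.0, 1.0]


def choice_set_rows(level):
    """Return the two design rows for one choice set: (test, no test)."""
    test_row = [1.0, level]    # constant column = 1 for the test row
    no_test_row = [0.0, 0.0]   # constant and attribute = 0 for no test
    return test_row, no_test_row


def information_matrix(beta, levels):
    """Fisher information for one respondent who answers every choice set."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for level in levels:
        test_row, _ = choice_set_rows(level)
        utility = sum(b * x for b, x in zip(beta, test_row))
        p_test = 1.0 / (1.0 + math.exp(-utility))
        weight = p_test * (1.0 - p_test)
        # With an all-zero "no test" row, the logit information for a
        # choice set reduces to p(1-p) * x x', where x is the test row.
        for i in range(2):
            for j in range(2):
                info[i][j] += weight * test_row[i] * test_row[j]
    return info


def variances(info):
    """Diagonal of the inverse of a 2x2 information matrix."""
    (a, b), (c, d) = info
    det = a * d - b * c
    return [d / det, a / det]


def required_n(beta_k, var_k, alpha=0.05, power=0.80):
    """Minimum N for a one-sided test that beta_k differs from zero."""
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return math.ceil((z * math.sqrt(var_k) / abs(beta_k)) ** 2)


info = information_matrix(priors, test_levels)
n_per_parameter = [required_n(b, v) for b, v in zip(priors, variances(info))]
print(n_per_parameter)  # [N for the constant, N for the attribute]
```

In this sketch the constant is treated like any other parameter: it gets its own column and its own row in the covariance matrix, which is exactly the step I am unsure about.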

3) My design has 36 choice tasks divided into 3 blocks of 12. For the sample size calculation, does this mean that I will need to multiply the sample size estimates by 3 to obtain the total required sample size for the main study?

4) I’m estimating the required sample size for a binary logistic regression model by including one parameter per variable in the calculation. For the main study, I plan to run a mixed effects logistic regression to account for heterogeneity and repeated measures. In that case, should I be using two parameters per variable to calculate the required sample size at this stage? If so, is there an example of, or advice on, how to construct the design matrix for the sample size calculation?

Thanks very much in advance for any advice you can give.