how do i get the prior values given a pilot study

Postby elin » Tue Jan 27, 2015 2:12 am

Hi,
I am new to this software and to working with efficient designs. I thought I would start by making an MNL efficient design using priors. I have read the manual and have figured out how to use the code. What I don't understand is how priors are calculated (given that I actually have some results from a small pilot study). Are they simply the beta coefficients from running, e.g., a multinomial logit model? Can this be done in Ngene, or should I use Limdep?
Also, I understand that one gets estimates of the D-errors etc. However, are there any guidelines as to what constitutes a good design? E.g., is a D-error below a certain threshold considered a good efficient design?

Kind regards,
Elin
elin
 
Posts: 1
Joined: Fri Oct 17, 2014 9:20 pm

Re: how do i get the prior values given a pilot study

Postby Michiel Bliemer » Tue Jan 27, 2015 8:01 am

Hi Elin,

Some answers to your questions:
1. Yes, the priors are simply the betas that come out of an estimation package, such as Limdep/Nlogit, Biogeme, SAS, etc. If you are creating Bayesian efficient designs, you also need the standard errors that come out of the same estimation.
2. Each design will be different and there is no value of D-error that is good or bad; this is case specific. It usually helps most to look at the S-estimates, which tell you the likely sample size needed to estimate the parameters (assuming that your priors are correct). If these S-estimates are reasonable, say lower than 50 or 100, then it should be fine. If they produce values over 1000, then you may wish to reconsider your design or doubt your priors.
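As a rough illustration of what an S-estimate measures (a sketch, not Ngene's own code): for an MNL model, the asymptotic variance-covariance (AVC) matrix of a design for a single respondent can be computed from the priors, and the sample size at which parameter k is expected to become statistically significant is roughly (t * se_k / beta_k)^2. The toy design and the prior values below are made up for the example.

```python
import numpy as np

def mnl_avc(design, beta):
    """AVC matrix of an MNL design for ONE respondent.

    design: array of shape (S, J, K) = (tasks, alternatives, attributes)
    beta:   array of shape (K,) with the assumed (non-zero) priors
    """
    S, J, K = design.shape
    info = np.zeros((K, K))
    for X in design:                      # X has shape (J, K)
        v = X @ beta                      # utilities in this task
        p = np.exp(v - v.max())
        p /= p.sum()                      # MNL choice probabilities
        # Fisher information contribution: X' (diag(p) - p p') X
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return np.linalg.inv(info)

def s_estimates(design, beta, t=1.96):
    """Approximate sample size per parameter: (t * se_k / beta_k)^2."""
    se = np.sqrt(np.diag(mnl_avc(design, beta)))
    return (t * se / beta) ** 2

# Hypothetical design: 4 tasks, 2 alternatives, attributes (price, time).
design = np.array([
    [[1.0, 30.0], [3.0, 10.0]],
    [[3.0, 20.0], [1.0, 30.0]],
    [[2.0, 10.0], [1.0, 20.0]],
    [[1.0, 20.0], [2.0, 10.0]],
])
beta = np.array([-0.5, -0.05])            # hypothetical pilot-study priors

s_est = s_estimates(design, beta)
print("S-estimates per parameter:", s_est)
```

The largest value across parameters is the one that drives the overall sample size requirement, and all of it is conditional on the priors being correct.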

Michiel
Michiel Bliemer
 
Posts: 1885
Joined: Tue Mar 31, 2009 4:13 pm

Re: how do i get the prior values given a pilot study

Postby johnr » Tue Jan 27, 2015 8:30 am

Hi Elin

One can obtain priors from multiple sources, including but not limited to existing literature, expert judgement and pilot studies. Our preference is for the latter; however, the first two can also be quite informative if used appropriately.

Priors obtained from existing literature

Very rarely will we be examining a completely new empirical area, and more often than not someone somewhere will have done a study in the same or a similar area (e.g., how many times have buses versus trains been looked at in Transportation). One can often obtain information about the priors from other related studies. Care needs to be taken, however, as the magnitude of the parameter estimates will be related to the size of the Xs used in the study; hence, unless the existing studies use the same Xs you are using, you may need to rescale the priors you are assuming to take these differences into account. Also, contextual, cultural and even sample differences, etc. may mean that the sample in the study you are conducting is drawn from a different population than was used in the previous studies. Hence, whilst one can use parameters from previous studies, this should be done with much thought.
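A minimal sketch of the rescaling point, assuming the only difference between the earlier study and yours is the unit in which an attribute is measured (the numbers and the function name are made up). If the new X equals factor times the old X, dividing the old coefficient by that factor leaves the utility contribution unchanged.

```python
def rescale_prior(beta_old, factor):
    """Rescale a coefficient when the attribute is re-expressed as
    x_new = factor * x_old, so that beta_new * x_new == beta_old * x_old."""
    return beta_old / factor

# Hypothetical: an earlier study reports travel time in hours,
# while the new design specifies travel time in minutes.
beta_hours = -1.2
beta_minutes = rescale_prior(beta_hours, 60.0)   # minutes = 60 * hours

# Same trip, same utility contribution under either unit.
u_hours = beta_hours * 0.5       # half an hour
u_minutes = beta_minutes * 30.0  # thirty minutes
print(beta_minutes, u_hours, u_minutes)
```

This only handles units. Differences in model scale (error variance), context or population between studies are not corrected by this kind of arithmetic.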

Expert Judgment

Sometimes experts (such as yourself) may have some insights. We know, for example, that the marginal utility of price should be negative, so why not use this information in generating the design? Even if you only know the expected sign of the parameter, you can use this via, say, a Bayesian prior (e.g., price[(u,-1,0)] indicates that the price parameter is expected to be between -1 and 0, without specifying a specific value). Again, care is required here, as experts may be too close to the problem. If I want to study bubble gum choice, then I am sure the marketing manager for some bubble gum brand can rattle off 100s of attributes which are important, only a limited number of which consumers may actually care about. The best experts to ask are those drawn from the population of interest ...
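To make the uniform-prior idea concrete, here is a hedged sketch of a Bayesian D-error: the D-error is averaged over draws of the price parameter from a uniform distribution on [-1, 0], mirroring price[(u,-1,0)] above. It uses plain pseudo-random draws and a made-up toy design with a fixed, hypothetical time prior; it is not how Ngene itself takes its draws.

```python
import numpy as np

def d_error(design, beta):
    """MNL D-error for one respondent: det(AVC)^(1/K) = det(info)^(-1/K)."""
    S, J, K = design.shape
    info = np.zeros((K, K))
    for X in design:
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    det_info = np.linalg.det(info)
    if det_info <= 1e-12:                 # parameters not identified
        return np.inf
    return det_info ** (-1.0 / K)

def bayesian_d_error(design, draw_beta, n_draws=500, seed=0):
    """Average the D-error over draws from the prior distribution."""
    rng = np.random.default_rng(seed)
    values = [d_error(design, draw_beta(rng)) for _ in range(n_draws)]
    return float(np.mean(values))

def draw_beta(rng):
    price = rng.uniform(-1.0, 0.0)        # only the sign and range are assumed
    time = -0.05                          # hypothetical fixed prior
    return np.array([price, time])

# Hypothetical design: 4 tasks, 2 alternatives, attributes (price, time).
design = np.array([
    [[1.0, 30.0], [3.0, 10.0]],
    [[3.0, 20.0], [1.0, 30.0]],
    [[2.0, 10.0], [1.0, 20.0]],
    [[1.0, 20.0], [2.0, 10.0]],
])

db_error = bayesian_d_error(design, draw_beta)
print("Bayesian D-error:", db_error)
```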

Pilot studies

Personally, whilst I have used both of the above methods, I prefer to rely on pilot studies drawn from the population of interest. Not only does this provide priors, but if done correctly, can offer new insights that have been missed.

Getting the priors wrong

1. There is much confusion about efficient designs and priors, both in and outside academia. The theory we are relying on emanates from the original work of McFadden (1974), which, if read carefully, states that asymptotically the logit model will retrieve the true population estimates if the utility function is properly specified. McFadden in the same paper also showed that the model is capable of retrieving the population parameter estimates in finite samples. These properties should hold irrespective of the data, with the data simply driving the sample size requirements (subject to parameter estimates, utility specification, etc.). The important point is that getting the priors wrong simply implies that the sample size required to observe statistical significance will be different from what you thought it would be. Unless one is after specific (interaction) effects which were not included in the utility specification when generating the design, one should always be able to obtain the population parameters - it is just a question of at what sample size. (The issue with interactions is that they require certain combinations of attribute levels to appear over the design, which may not occur unless specifically requested, although they may occur by chance.)

2. What is little recognised is that if one assumes the priors are zero, then the logit probabilities equal 1/J in every one of the S choice tasks (where J is the number of alternatives and S is the number of choice tasks), and the logit model will approximate a linear model, for which the optimal design will typically be orthogonal. Hence, using an orthogonal design is equivalent to assuming the priors are zero - or, put another way, an orthogonal design is (often) the locally optimal design under priors equal to zero. This is simply an assumption about the priors. The point is that the design obtained depends on the priors assumed; however, as per 1 above, the issue is solely about sample size - any design, even a random one, should be able to retrieve the population parameters under the right conditions.
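The 1/J result is easy to check numerically. A small sketch with made-up attribute levels and a hypothetical non-zero prior for comparison:

```python
import numpy as np

def mnl_probabilities(X, beta):
    """MNL choice probabilities for one choice task (X has shape J x K)."""
    v = X @ beta
    expv = np.exp(v - v.max())
    return expv / expv.sum()

# Hypothetical task: J = 3 alternatives, attributes (price, time).
X = np.array([
    [1.0, 30.0],
    [2.0, 20.0],
    [3.0, 10.0],
])

p_zero = mnl_probabilities(X, np.zeros(2))                # zero priors
p_nonzero = mnl_probabilities(X, np.array([-0.5, -0.1]))  # hypothetical priors

print("zero priors:    ", p_zero)
print("non-zero priors:", p_nonzero)
```

With zero priors the attribute levels drop out of the probabilities entirely, which is why the information matrix no longer depends on how the levels trade off against each other.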

3. The right conditions are the important point missing here. The logit model makes certain assumptions, one not often discussed being that the sample is making trade-offs. If however the design has dominated alternatives, or some other property (such as minimum overlap), then it is possible (possible being the operative word) that this assumption is being violated. In the case of dominated alternatives, this means that respondents acting rationally should have no error, and that they should always choose the dominant alternative (hence violating the assumption of trading off). This does not mean (not being the operative word) that the design is bad. It means that the model we are imposing on the data is wrong! Some argue, for example, that Street and Burgess designs should not be used because respondents may act lexicographically when confronted with such designs, due to the minimum overlap property often associated with them. The flip side of this argument is that perhaps such designs are useful in allowing respondents to behave how they would anyway. The point, however, is that if respondents are answering questions in this manner, then perhaps it is time to question not the design, but rather the model we are using to analyse such data. Note, I am not arguing for or against S&B designs; what I am attempting to state (not so eloquently) is that the data is rarely if ever wrong, and that the models we the analysts impose are the problem.
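A simple way to screen a design for this situation is to check each choice task for an alternative that is at least as good as every other alternative on all attributes and strictly better on at least one. The sketch below assumes generic attributes with known preference signs (-1 for attributes where less is better, +1 where more is better); the function names and example tasks are made up.

```python
def dominates(a, b, signs):
    """True if alternative a is at least as good as b on every attribute
    and strictly better on at least one, given the preference signs."""
    diffs = [s * (xa - xb) for xa, xb, s in zip(a, b, signs)]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

def has_dominant_alternative(task, signs):
    """True if one alternative in the task dominates all the others."""
    return any(
        all(dominates(a, b, signs) for j, b in enumerate(task) if j != i)
        for i, a in enumerate(task)
    )

signs = (-1, -1)                          # lower price and lower time preferred

task_with_dominance = [(1, 10), (2, 20)]  # first alternative cheaper AND faster
task_with_tradeoff = [(1, 20), (2, 10)]   # cheaper but slower vs dearer but faster

print(has_dominant_alternative(task_with_dominance, signs))
print(has_dominant_alternative(task_with_tradeoff, signs))
```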

Back to your question

Ngene does not currently have an estimation routine - simply a design routine. You will need to estimate the pilot models in another software package and take the parameters into the Ngene environment. The D-error value is itself largely meaningless other than as a relative (within-design) measure. One cannot compare D-errors across different designs (different in terms of attributes, levels, priors, etc.). One can only compare D-errors within a design, with smaller D-errors being preferred to larger D-errors. Hence, there is no actual answer to this question. Some reviewers ask for the D-error value of a design in articles, and although I personally often state the D-error of the design used, it really has no actual meaning unless one knows the lowest D-error for that design to compare it to. Simply put, smaller is better, and you only know the smallest D-error if you try all the designs or can work it out mathematically, as Street and Burgess have done for a specific special case. We typically run the design overnight or for a few days and select the best design found after that time.
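The "run it for a while and keep the best design found" idea can be sketched as a naive random search: candidate designs with the same attributes, levels and priors are generated, and the one with the smallest D-error so far is kept. This is only an illustration under made-up levels and priors; Ngene's search algorithms are more sophisticated than random sampling.

```python
import numpy as np

def d_error(design, beta):
    """MNL D-error for one respondent: det(info)^(-1/K)."""
    S, J, K = design.shape
    info = np.zeros((K, K))
    for X in design:
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    det_info = np.linalg.det(info)
    if det_info <= 1e-12:                 # parameters not identified
        return np.inf
    return det_info ** (-1.0 / K)

def random_search(levels, beta, n_tasks, n_alts, n_iter, seed=0):
    """Keep the lowest-D-error design among n_iter random candidates."""
    rng = np.random.default_rng(seed)
    best_design, best_d = None, np.inf
    for _ in range(n_iter):
        # One (n_tasks x n_alts) array of levels per attribute.
        cols = [rng.choice(lv, size=(n_tasks, n_alts)) for lv in levels]
        design = np.stack(cols, axis=-1).astype(float)
        d = d_error(design, beta)
        if d < best_d:                    # only comparable within this setup
            best_design, best_d = design, d
    return best_design, best_d

levels = [[1, 2, 3], [10, 20, 30]]        # hypothetical price and time levels
beta = np.array([-0.5, -0.05])            # hypothetical priors

_, d_short = random_search(levels, beta, n_tasks=6, n_alts=2, n_iter=200)
_, d_long = random_search(levels, beta, n_tasks=6, n_alts=2, n_iter=500)
print("best D-error after 200 candidates:", d_short)
print("best D-error after 500 candidates:", d_long)
```

The two numbers can be compared with each other because attributes, levels and priors are identical; neither says anything about a design built under a different specification.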

Sorry for rambling.

John
johnr
 
Posts: 171
Joined: Fri Mar 13, 2009 7:15 am

Re: how do i get the prior values given a pilot study

Postby mkhakdaman » Mon May 22, 2017 1:07 am

Dear John

Thanks for the explanation of ways to get priors. I need to cite, in my own journal paper, the journal paper or book that introduces these approaches. I would appreciate it if you could point me to it.

Many thanks and regards,
Masoud
mkhakdaman
 
Posts: 1
Joined: Fri May 19, 2017 10:31 pm

Re: how do i get the prior values given a pilot study

Postby Michiel Bliemer » Tue May 23, 2017 12:43 am

One reference you could use is:

Bliemer, M.C.J., and A.T. Collins (2016) On determining priors for the generation of efficient stated choice experimental designs. Journal of Choice Modelling, Vol. 21, pp. 10-14.

Another one may be:

Kessels, R., B. Jones, P. Goos, and M. Vandebroek (2008) Recommendations on the use of Bayesian optimal designs for choice experiments. Quality and Reliability Engineering International, Vol. 24, pp. 737-744.


Michiel
Michiel Bliemer
 
Posts: 1885
Joined: Tue Mar 31, 2009 4:13 pm

