choice-metrics.com

by **mattw** » Sat Jul 15, 2017 2:55 am

Hello NGENE Users,

I am working with a team on a stated preference survey featuring salmon recovery. We have 200+ choice situation responses from a pilot, and are refining the design in an effort to improve the coefficient estimates from the pending full survey mailing. We expect to have several hundred choice situation responses at the end of the study.

We have code running in NGENE, but want to verify it with expert NGENE users so that we can make the most of the pilot study results. We are also getting an error when trying to run some NGENE features.

To summarize our conceptual design, we have just 2 attributes, SalmonStatus, and Cost. Each choice situation has three options, a Status Quo option and two generic alternatives, 1 and 2. The Status Quo always holds a $0 for the Cost attribute, and a NoRecovery level for SalmonStatus (which means significant possibility of extinction). Neither of these SQ levels ever occur in alts 1 or 2. Besides SQ there are six additional levels SalmonStatus may take, which include number of expected "returns" of salmon to spawning area yearly:
1) SlowBasic (Recovery expected in 50 yrs, returns at 40k, and low extinction risk)
2) SlowHigh (Recovery expected in 50 yrs, returns at 70k, and low extinction risk)
3) MediumBasic (Recovery expected in 25 yrs, returns at 40k, and low extinction risk)
4) MediumHigh (Recovery expected in 25 yrs, returns at 70k, and low extinction risk)
5) FastBasic (Recovery expected in 15 yrs, returns at 40k, and low extinction risk)
6) FastHigh (Recovery expected in 15 yrs, returns at 70k, and low extinction risk)

Since a recovery timeline (50, 25, or 15 yrs) only makes sense in the case of recovery (Basic or High), we believe recovery time cannot be identified as a separate variable from salmon abundance. That is, relative to the SQ, you cannot increase abundance without also incurring a waiting time. Thus we arrived at the bundled categoric variable described above. Preference for abundance however will be estimable e.g. from the coefficient difference from 1) to 2), and time preference will be estimable e.g. from the coefficient difference from 1) to 5).

I have some code below, including priors from the pilot. This code does run and does return a design, but I have several questions:
• In our pilot MNL discrete choice model results we have a "combined" ASC for choosing either alt1 or alt2, with a resulting coefficient of about -0.05. I believe I can put this either under the utility specs for alt1 and alt2 as -0.05, or under the utility spec for SQ as +0.05, which is how I have it here. Is that correct? I think I have to include some sort of U(SQ) in the model or NGENE will think SQ is a no-choice alternative rather than a SQ alternative.

• The design from the code below returns a lot of dominating options within a block. For example, the same level of SalmonStatus will show up at a different cost in one of the three choice situations. Or, within the same block, SalmonStatus 1) will show up at the same or higher price than SalmonStatus 2). Is there a way to prevent that from happening in NGENE? If it must be manually adjusted, we can go that route too, but are there any guidelines on how much impact on d-efficiency is "safe" as a result of that manipulation? I can also write several conditions to prevent dominance, but I would prefer not to enforce such conditions ACROSS blocks, just WITHIN blocks, is there a coding option for this?

• In an effort to reduce dominance in the design, if I try to include the "*" behind alt1 and alt2 on the second line of code, I get the following error:

A valid initial random design could not be generated after approximately 10 seconds. In this time, of the 375099 attempts made, there were 0 row repetitions, 17221 alternative repetitions, and 357878 cases of dominance. There are a number of possible causes for this, including the specification of too many constraints, not having enough attributes or attribute levels for the number of rows required, and the use of too many scenario attributes. A design may yet be found, and the search will continue for 10 minutes. Alternatively, you can stop the run and alter the syntax.

And then the program eventually aborts. I suspect this error when using the "*" is related to having just two attributes. However I am hoping to generate a highly efficient design using these attributes and something similar to the rows and block parameters in the code, and it seems like this is what NGENE is for, so I am wondering if my code just needs to be changed in some way. We do have flexibility to change the number of rows in the design, i.e. increase or decrease number of survey versions. We would like to keep three questions per block however.

Many thanks in advance for your thoughts and suggestions!

- Matt

Design
;alts = alt1, alt2, SQ
;rows = 18
;block = 6

? Level descriptions for Status are
? 1) SlowBasic; 2) SlowHigh;
? 3) MediumBasic; 4) MediumHigh;
? 5) FastBasic; 6) FastHigh;
? 7) NoRecovery
? Other variable is Cost

;eff = (mnl, d)

;model:

U(alt1) = b1.dummy[(n,0.9,0.25)|(n,0.95,0.25)|(n,1.0,0.25)|(n,1.05,0.25)|(n,1.1,0.25)|(n,1.2,0.25)] * Status[1,2,3,4,5,6,7] + b2[-0.004] * Cost[40,80,150,250,350] /

U(alt2) = b1 * Status + b2 * Cost /

U(SQ) = b0[0.05]

$

by **Michiel Bliemer** » Tue Jul 18, 2017 5:46 pm

I notice that in your syntax you actually allow Status = 7 to occur in alt1 and alt2, is that correct? From your post I understood that this cannot happen.

I first checked to see whether there are actually enough choice tasks without a dominant alternative, and this seems fine. I ran the following syntax and can confirm that there are 300 choice tasks without a dominant alternative:

Code:
Design
;alts = alt1*, alt2*, SQ*
;rows = all
;fact
;require: sq.Status = 7 and sq.Cost = 0, alt1.Status < 7 and alt1.Cost > 0, alt2.Status < 7 and alt2.Cost > 0
;model:
U(alt1) = b1.dummy[(n,0.9,0.25)|(n,0.95,0.25)|(n,1.0,0.25)|(n,1.05,0.25)|(n,1.1,0.25)|(n,1.2,0.25)] * Status[1,2,3,4,5,6,7] + b2[-0.004] * Cost[0,40,80,150,250,350] /
U(alt2) = b1 * Status + b2 * Cost /
U(SQ) = b0[0.05] + b1 * Status + b2 * Cost
$
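The count of 300 can be cross-checked outside Ngene. The following is a minimal Python sketch, not Ngene's own algorithm: it assumes that an alternative is dominant when it is at least as good as another on every attribute according to the prior means, and that SQ is fixed at Status = 7 and Cost = 0. The function and variable names are made up for illustration.

```python
from itertools import product

# Prior means from the syntax above; level 7 is the dummy reference (utility 0).
STATUS_PRIOR = {1: 0.9, 2: 0.95, 3: 1.0, 4: 1.05, 5: 1.1, 6: 1.2, 7: 0.0}
COSTS = [40, 80, 150, 250, 350]   # non-zero cost levels allowed in alt1/alt2
B_COST = -0.004                   # negative prior: cheaper is better


def dominates(a, b):
    """True if profile a = (status, cost) is at least as good as b on both
    attributes. Identical profiles count as dominating each other, which
    also rules out repeated alternatives (an assumption of this sketch)."""
    status_ok = STATUS_PRIOR[a[0]] >= STATUS_PRIOR[b[0]]
    cost_ok = B_COST * a[1] >= B_COST * b[1]
    return status_ok and cost_ok


def has_dominance(task):
    """True if any alternative in the task dominates any other."""
    return any(dominates(a, b)
               for i, a in enumerate(task)
               for j, b in enumerate(task) if i != j)


profiles = list(product(range(1, 7), COSTS))   # 6 x 5 = 30 profiles
sq = (7, 0)
valid = [(a1, a2, sq) for a1, a2 in product(profiles, repeat=2)
         if not has_dominance((a1, a2, sq))]
print(len(valid))  # 300
```

The result follows from simple counting: a pair survives only if one alternative has the strictly better status and the strictly higher cost, which gives 15 status pairs times 10 cost pairs in each of the two orderings.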

Using the default swapping algorithm it is difficult for Ngene to find an initial design in which none of the 18 choice tasks are problematic (this has a probability of near zero). You did the right thing by adding the * but you need to use the modified Federov algorithm, i.e. using alg = mfederov. I use constant b0 to pick up any preference towards the status quo merely because it is the status quo. Since in the status quo it holds that b1 * Status[7] + b2 * Cost[0] = 0 + 0 = 0, you can also write U(SQ) = b0[0.05] as you have done. So far so good.

However, inspecting the 300 choice tasks generated with the syntax above, it is clear that alt1 and alt2 will never have Status = 7, which is the reference level. Even if we only impose sq.Status = 7 and sq.Cost = 0, alt1 and alt2 will always have a Status lower than 7 in order to avoid dominant alternatives. This means that the reference level of the dummy coded variable Status never occurs in alt1 and alt2, which makes the model non-identifiable since constant b0 is now perfectly correlated with the reference level of Status. In other words, you cannot estimate b0, so in this case it needs to be omitted.
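The confounding can be made concrete with a rank check. This is a minimal sketch in Python, assuming (as is standard for MNL) that parameters are identified only through attribute differences between alternatives; the helper names are hypothetical. With the SQ constant included there are 8 parameters but the difference matrix has rank 7, because the constant column is the negative sum of the six dummy columns.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of lists) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk


def attrs(status, cost, is_sq, with_constant):
    """[SQ constant (optional)], six status dummies (level 7 = reference), cost."""
    dummies = [1 if status == k else 0 for k in range(1, 7)]
    const = [1 if is_sq else 0] if with_constant else []
    return const + dummies + [cost]


def difference_rows(with_constant):
    """Attribute differences (alternative minus SQ) for all 30 allowed profiles.
    Differences between alt1 and alt2 are combinations of these rows."""
    sq = attrs(7, 0, True, with_constant)
    rows = []
    for s in range(1, 7):
        for c in (40, 80, 150, 250, 350):
            alt = attrs(s, c, False, with_constant)
            rows.append([a - b for a, b in zip(alt, sq)])
    return rows


print(rank(difference_rows(True)))   # 7, but 8 parameters: b0 not identified
print(rank(difference_rows(False)))  # 7, and 7 parameters: full column rank
```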

I think the reason that you were able to estimate a model using the pilot study data is that you allow Status = 7 to occur in alt1 and alt2, is that right?

This would be my solution:

1. Generate a candidate set of choice tasks using the above syntax, which will generate a design with 300 rows. Save this design under the name 'choicetasks.ngd' and keep it open in Ngene.
2. Run the following syntax:

Code:
Design
;alts = alt1, alt2, SQ
;rows = 18
;block = 6
;alg = mfederov(candidates = choicetasks.ngd)
;eff = (mnl, d, mean)
;bdraws = gauss(3)
;model:
U(alt1) = b1.dummy[(n,0.9,0.25)|(n,0.95,0.25)|(n,1.0,0.25)|(n,1.05,0.25)|(n,1.1,0.25)|(n,1.2,0.25)] * Status[1,2,3,4,5,6,7] + b2[-0.004] * Cost[0,40,80,150,250,350] /
U(alt2) = b1 * Status + b2 * Cost /
U(SQ) = b1 * Status + b2 * Cost
$

In order to run this syntax, you will need a special version of Ngene that can read in an external candidate set. You can obtain this version by emailing info@choice-metrics.com. Note that optimisation with the modified Federov is relatively slow, so just run it for a while. I also increased the number of Bayesian draws by adding ;bdraws = gauss(3), since the default halton(200) is not enough for this number of Bayesian priors.
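To give a feel for what Gaussian quadrature with three abscissas per prior involves, here is a small Python sketch. It assumes that gauss(3) places a three-point Gauss-Hermite rule on each of the six normally distributed priors and takes the full product grid; how Ngene implements this internally is not stated in this thread, so treat the details as an assumption.

```python
from itertools import product
from math import sqrt

# (mean, sd) of the six Bayesian priors in the syntax above.
PRIORS = [(0.9, 0.25), (0.95, 0.25), (1.0, 0.25),
          (1.05, 0.25), (1.1, 0.25), (1.2, 0.25)]

# Three-point Gauss-Hermite rule for a standard normal variable:
# nodes -sqrt(3), 0, +sqrt(3) with weights 1/6, 2/3, 1/6.
NODES = [(-sqrt(3), 1 / 6), (0.0, 2 / 3), (sqrt(3), 1 / 6)]


def quadrature_grid(priors):
    """Yield (parameter vector, weight) over the full product grid."""
    for combo in product(NODES, repeat=len(priors)):
        values = [m + sd * z for (m, sd), (z, _) in zip(priors, combo)]
        weight = 1.0
        for _, w in combo:
            weight *= w
        yield values, weight


grid = list(quadrature_grid(PRIORS))
print(len(grid))                           # 729 = 3**6
print(round(sum(w for _, w in grid), 6))   # 1.0
```

Under this assumption the Bayesian efficiency measure is a weighted average of the D-error over 729 parameter vectors rather than 200 Halton draws.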

I quickly ran this for a few minutes and I got the following design that should satisfy all your requirements:

Choice situation  alt1.status  alt1.cost  alt2.status  alt2.cost  sq.status  sq.cost  Block
1                 2            40         3            350        7          0        4
2                 1            250        2            350        7          0        1
3                 5            40         6            350        7          0        1
4                 3            40         4            350        7          0        2
5                 2            250        3            350        7          0        6
6                 2            40         5            80         7          0        1
7                 4            350        3            40         7          0        2
8                 2            40         6            80         7          0        5
9                 5            350        4            40         7          0        3
10                5            350        1            250        7          0        5
11                4            350        1            250        7          0        3
12                4            40         5            350        7          0        6
13                6            80         3            40         7          0        5
14                1            40         2            350        7          0        3
15                1            250        3            350        7          0        4
16                6            350        5            40         7          0        4
17                1            40         6            80         7          0        6
18                4            40         6            80         7          0        2

Michiel

by **Michiel Bliemer** » Tue Jul 18, 2017 6:29 pm

I think the following syntax will also do the trick for you, which does not require the special version of Ngene:

Code:
Design
;alts = alt1*, alt2*, SQ*
;rows = 18
;block = 6
;alg = mfederov
;eff = (mnl, d, mean)
;bdraws = gauss(3)
;require: sq.Status = 7, alt1.Status < 7, alt2.Status < 7
;model:
U(alt1) = b1.dummy[(n,0.9,0.25)|(n,0.95,0.25)|(n,1.0,0.25)|(n,1.05,0.25)|(n,1.1,0.25)|(n,1.2,0.25)] * Status[1,2,3,4,5,6,7] + b2[-0.004] * Cost[40,80,150,250,350] /
U(alt2) = b1 * Status + b2 * Cost /
U(SQ) = b1 * Status + b2 * CostSQ[0]
$

You will get a warning, but I believe it should still produce a design that you are looking for.

Michiel

by **mattw** » Tue Aug 15, 2017 5:15 am

Hello Michiel,

Thank you so much for your extremely helpful reply, and apologies for this late follow-up. There are a few things that we wanted to ask for clarification on as we move forward with our study.

1) Yes, you are correct that our model was not identifiable as written. There was a mistake in our coding of a combined ASC that allowed (incorrect) estimation. Level 7 of recovery (i.e. no recovery) and Level 1 of price ($0) only occur in the Status Quo option. Thus, we are stuck with a situation in which we will not be able to estimate a separate Status Quo effect. After reviewing Cooper, Rose, and Crase AJARE 2011 (as mentioned in another NGENE post on a similar topic), I don't think effects coding will help us in this situation, but please let me know if we are missing something there. Furthermore, please do let us know if you know of published studies where the authors simply could not estimate a Status Quo effect due to the type of issue being addressed in the study. I am familiar with the importance of Status Quo effects, as reviewed by Scarpa et al. in Land Economics 2007, but it also seems like our situation of not being able to identify them may not be so unusual. Here, the reason we can't identify a Status Quo effect is that we have only one type of recovery variable, and recovery must always come at some cost.

2) We are using the code in your second follow-up post (with modified priors from rerunning the estimation), which does indeed run; thank you so much for your individual attention on this. We wanted to make sure the following two warning messages that occur in the output are indeed benign, especially the first one:

Warning: Two alternatives were specified for alternative repetition checking, but do not have the same attribute names, and so will not be checked. 'alt2', 'sq'

Warning: One or more attributes will not have level balance with the number of rows specified: alt1.status, alt1.cost, alt2.status, alt2.cost, sq.status

3) Reviewing the designs that NGENE returns, we found that dominance does not occur within a choice situation, but does occur within a block. For example, the same recovery level at a different cost level. In the other designs and surveys I have administered, we have always manually adjusted price levels in the computer-generated designs in order to avoid dominance within a block such that respondents don't become confused (for example: 'why is this same option cheaper in question 2 as compared with question 1?'). In the past I have specifically consulted with other (more experienced) practitioners in environmental economics, and they recommended avoiding dominance within blocks. I am wondering if there is a way to do this in NGENE, and if there is not, is that because NGENE developers and users don't worry about dominance within blocks? Related to this question, I was wondering if NGENE can read in a manually adjusted design, and report how D-efficiency changes as a result. I never used that option in SAS, but I think it can do something like that.

As before, thanks so much for your attention to this! This forum is a wonderful source of info, I have been scanning prior posts and learned quite a bit.

Matt

by **Michiel Bliemer** » Tue Aug 15, 2017 1:39 pm

1) I am not aware of a solution. This is more a question of model estimation and parameter identification, so it would be best to ask someone with more expertise in model estimation. I am afraid I am best at answering questions about experimental design.

2) The first warning means that the SQ alternative is not included in dominance checks. This is because of the way the algorithm works and how the syntax is formulated. You can get dominance checks to work using the special version of Ngene, but that is much more complicated, and if you do not see any clearly dominant alternatives in your choice tasks then I think you do not need to worry about it. The second warning regarding attribute level balance is what happens when you apply the modified Federov algorithm. Letting go of attribute level balance is actually more efficient, but there may be other reasons why you would like some degree of attribute level balance. You can add constraints on the number of times each level needs to appear within the design. For dummy coded variables this is not really necessary as this happens automatically in the optimisation, but for linear coded variables (like your cost) you may want to do it in this way:

Code:
Design
;alts = alt1*, alt2*, SQ*
;rows = 18
;block = 6
;alg = mfederov
;eff = (mnl, d, mean)
;bdraws = gauss(3)
;require: sq.Status = 7, alt1.Status < 7, alt2.Status < 7
;model:
U(alt1) = b1.dummy[(n,0.9,0.25)|(n,0.95,0.25)|(n,1.0,0.25)|(n,1.05,0.25)|(n,1.1,0.25)|(n,1.2,0.25)] * Status[1,2,3,4,5,6,7] + b2[-0.004] * Cost[40,80,150,250,350](2-4,2-4,2-4,2-4,2-4) /
U(alt2) = b1 * Status + b2 * Cost /
U(SQ) = b1 * Status + b2 * CostSQ[0]
$

Note that I have added (2-4,2-4,2-4,2-4,2-4) after the cost attribute to indicate that I want each level to appear between 2 and 4 times over the 18 choice tasks.
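To see how far a design generated without such bounds can drift from level balance, one can simply count level frequencies. The sketch below is in Python and uses the alt1.cost column of the 18-row design posted earlier in this thread; the helper names are hypothetical, and the 2 to 4 bounds are the ones from the syntax above.

```python
from collections import Counter

# alt1.cost column of the 18-row design posted earlier in this thread.
ALT1_COST = [40, 250, 40, 40, 250, 40, 350, 40, 350,
             350, 350, 40, 80, 40, 250, 350, 40, 40]
LEVELS = [40, 80, 150, 250, 350]


def level_counts(column, levels):
    """Number of times each level appears, including levels that never appear."""
    counts = Counter(column)
    return {lv: counts.get(lv, 0) for lv in levels}


def out_of_bounds(counts, lo, hi):
    """Levels whose frequency falls outside the interval [lo, hi]."""
    return {lv: n for lv, n in counts.items() if not lo <= n <= hi}


counts = level_counts(ALT1_COST, LEVELS)
print(counts)                       # {40: 9, 80: 1, 150: 0, 250: 3, 350: 5}
print(out_of_bounds(counts, 2, 4))  # {40: 9, 80: 1, 150: 0, 350: 5}
```

In that earlier design the 150 level never appears for alt1 and the 40 level appears in half of the tasks, which is the kind of imbalance the frequency bounds are meant to limit.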

3) I am not sure what you mean by dominance in a block, as alternative dominance can only appear in a choice task. Maybe this is something that I am not aware of. In stated choice surveys, each choice task is hypothetical, and an option in choice task 1 has no relationship with an option in choice task 2. But by all means, feel free to manually change attribute levels if you think it makes sense. You can let Ngene determine the efficiency of manually adjusted designs by using the ;eval command, which reads in the design and evaluates it.
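For readers who want to see what evaluating a design amounts to, here is a rough Python sketch of an MNL D-error computed at the prior means only. It is not Ngene's implementation and it ignores the Bayesian averaging, so its value will differ from what ;eval reports. It assumes the usual definition, D-error = det(AVC)^(1/K) with AVC the inverse of the Fisher information for a single respondent, and uses the 18-row design posted earlier in this thread.

```python
from math import exp

# Prior means: six Status dummies (level 7 is the reference), then Cost.
BETA = [0.9, 0.95, 1.0, 1.05, 1.1, 1.2, -0.004]

# (alt1.status, alt1.cost, alt2.status, alt2.cost); SQ is always (7, 0).
DESIGN = [(2, 40, 3, 350), (1, 250, 2, 350), (5, 40, 6, 350), (3, 40, 4, 350),
          (2, 250, 3, 350), (2, 40, 5, 80), (4, 350, 3, 40), (2, 40, 6, 80),
          (5, 350, 4, 40), (5, 350, 1, 250), (4, 350, 1, 250), (4, 40, 5, 350),
          (6, 80, 3, 40), (1, 40, 2, 350), (1, 250, 3, 350), (6, 350, 5, 40),
          (1, 40, 6, 80), (4, 40, 6, 80)]


def x_row(status, cost):
    """Dummy-coded Status (levels 1..6) followed by Cost."""
    return [1.0 if status == k else 0.0 for k in range(1, 7)] + [float(cost)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return det


def mnl_d_error(design, beta):
    """D-error at fixed priors; None if the information matrix is singular
    or its determinant is numerically non-positive."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for s1, c1, s2, c2 in design:
        xs = [x_row(s1, c1), x_row(s2, c2), x_row(7, 0)]
        expv = [exp(sum(b * x for b, x in zip(beta, row))) for row in xs]
        total = sum(expv)
        probs = [e / total for e in expv]
        mean = [sum(p * row[i] for p, row in zip(probs, xs)) for i in range(k)]
        for p, row in zip(probs, xs):
            d = [row[i] - mean[i] for i in range(k)]
            for i in range(k):
                for j in range(k):
                    info[i][j] += p * d[i] * d[j]
    det = determinant(info)
    return det ** (-1.0 / k) if det > 0.0 else None


print(mnl_d_error(DESIGN, BETA))
```

Running the same function on a manually adjusted copy of DESIGN gives a like-for-like comparison under these simplifying assumptions; a lower D-error means a more efficient design.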

Michiel

by **mattw** » Thu Aug 24, 2017 8:36 am

Hello Michiel,

Thanks for your response!

We have tried the "eval" command, thanks for pointing us in this direction and apologies for not previously noticing this option in the NGENE manual. We did have some follow-up questions after trying it:

* It does not seem like you can edit a design file within NGENE, so we opened the .ngd file in another program and edited it, and then resaved it with a new name, but keeping the .ngd extension. We did not have much luck attempting to read in Excel files as part of the eval command but .ngd files (usually) worked. We did not modify any of the efficiency or other leading information in an .ngd file, just information in the design matrix itself. Does that sound correct? It seemed strange to leave all the efficiency information in the .ngd file, but apparently the eval command just over-writes it.

* Here is the syntax that we used to run the evaluation, which has stripped away some of the syntax needed to run a design and adds the eval command at the end. Through trial and error I think we determined the part of the syntax NGENE needs to do the evaluation:

Design
;alts = alt1*, alt2*, SQ*
;rows = 18
;block = 6

;eff = (mnl, d, mean)

;model:
U(alt1) = b1.dummy[(n,0.9,0.25)|(n,0.95,0.25)|(n,1.0,0.25)|(n,1.05,0.25)|(n,1.1,0.25)|(n,1.2,0.25)] * Status[1,2,3,4,5,6,7] + b2[-0.004] * Cost[40,80,150,250,350] /
U(alt2) = b1 * Status + b2 * Cost /
U(SQ) = b1 * Status + b2 * CostSQ[0]

; eval = <filepath>
$

* The original design had a D-error of about 0.16; our modifications to the design come in at about 0.19. Are there any guidelines, e.g. on a percentage basis, for how much loss of design efficiency is relatively safe within the confines of a single experiment?

* The eval command worked for some manipulations of the design matrix, but for some reason, when we tried entering a matrix matching the original design output, we received the following error message:

"Unhandled exception has occurred in your application. If you click continue, the application will ignore this error and attempt to continue. If you click Quit, the application will close immediately. Index was outside the bounds of the array"

We then realized that we could not even open the .ngd file that had been modified such that the design matrix matched the original design, or NGENE would again return an error.

* Can you offer any guidance on the above errors, so that we can gain confidence that we can use the eval command to test a range of possibilities? Are there some conditions that NGENE checks for to make sure an .ngd file is internally consistent in some way?

Also, just to refresh how we got here: we are trying to avoid "dominance within blocks". That is, if a respondent sees more than one choice situation, there should not be a dominating alternative across all the options in those situations. NGENE can check for dominance within a choice situation but not across choice situations in a block. The idea is that people may be confused if they see the exact same choice profile but for a different cost in one question versus another. You indicated that this was not a concern you were familiar with, and so perhaps we need not be overly concerned either. We have had dominance within blocks come up as a concern in focus groups, even though people are instructed to treat each choice situation separately.

If anyone that may be reading this thread has an opinion on the importance or non-importance of the "dominance within blocks" issue, we would be grateful for your input. Thanks also for any further guidance on usage of the "eval" command.
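Since this check is described as something NGENE does not do, it can be run on an exported design with a few lines of Python. The sketch below is only an illustration: it assumes a higher Status level is preferred (as the prior means suggest) and a lower Cost is preferred, it treats identical profiles as repeats rather than dominance, and the function names are hypothetical. The data are the 18 rows and blocks of the design posted earlier in this thread.

```python
from collections import defaultdict
from itertools import combinations

# (task, alt1.status, alt1.cost, alt2.status, alt2.cost, block)
DESIGN = [(1, 2, 40, 3, 350, 4), (2, 1, 250, 2, 350, 1), (3, 5, 40, 6, 350, 1),
          (4, 3, 40, 4, 350, 2), (5, 2, 250, 3, 350, 6), (6, 2, 40, 5, 80, 1),
          (7, 4, 350, 3, 40, 2), (8, 2, 40, 6, 80, 5), (9, 5, 350, 4, 40, 3),
          (10, 5, 350, 1, 250, 5), (11, 4, 350, 1, 250, 3), (12, 4, 40, 5, 350, 6),
          (13, 6, 80, 3, 40, 5), (14, 1, 40, 2, 350, 3), (15, 1, 250, 3, 350, 4),
          (16, 6, 350, 5, 40, 4), (17, 1, 40, 6, 80, 6), (18, 4, 40, 6, 80, 2)]


def dominates(a, b):
    """a, b = (status, cost). Assumes higher status level and lower cost are better."""
    return a != b and a[0] >= b[0] and a[1] <= b[1]


def within_block_dominance(design):
    """Map block -> list of (task, profile, dominated task, dominated profile)
    for profiles shown in different choice tasks of the same block."""
    by_block = defaultdict(list)
    for task, s1, c1, s2, c2, block in design:
        by_block[block].append((task, (s1, c1)))
        by_block[block].append((task, (s2, c2)))
    found = defaultdict(list)
    for block, items in by_block.items():
        for (t1, a), (t2, b) in combinations(items, 2):
            if t1 == t2:
                continue  # same choice task: covered by the usual dominance check
            if dominates(a, b):
                found[block].append((t1, a, t2, b))
            elif dominates(b, a):
                found[block].append((t2, b, t1, a))
    return dict(found)


for block, cases in sorted(within_block_dominance(DESIGN).items()):
    print(block, cases)
```

For example, in block 1 the profile (Status 5, Cost 40) in task 3 reappears as (Status 5, Cost 80) in task 6, which is exactly the "same option, different price" situation described above.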

Matt

by **Michiel Bliemer** » Thu Aug 24, 2017 10:43 am

I have no experience in editing ngd files directly. It is probably safe to do, but I would personally always copy and paste into Excel. Reading in Excel files should work; you just need to make sure that the format is correct. I can assist if necessary. Your Excel file needs to have the following setup:
* no headers
* first column consists of all ones
* second column is numbered 1 to S where S is the number of choice tasks
* remaining columns are the design in the same order as presented in the Ngene output
To read in a design, it is best to first open the design by opening it as a data file in Ngene, and once read in (and available in the project), you can refer to it in your syntax.
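The layout described above can be produced mechanically. This is a small Python sketch that prefixes each design row with a column of ones and a 1..S task index and writes comma-separated text with no header, which can then be pasted or imported into a spreadsheet. Whether the block column should be kept, and which file types Ngene accepts beyond Excel, is not stated here, so that part is an assumption; the function names are hypothetical.

```python
import csv
import io


def eval_layout(design_rows):
    """Prefix each design row with a 1 and a running task number 1..S."""
    return [[1, i] + list(row) for i, row in enumerate(design_rows, start=1)]


def to_csv_text(rows):
    """Comma-separated text without a header row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Two example rows in the order alt1.status, alt1.cost, alt2.status,
# alt2.cost, sq.status, sq.cost, block (the order of the Ngene output).
design = [(1, 250, 2, 350, 7, 0, 3), (1, 40, 6, 80, 7, 0, 1)]
print(to_csv_text(eval_layout(design)))
```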

There are no guidelines for efficiency, but I would usually rather have a design that makes sense and that I am happy with than a design that has optimal efficiency. I think what you have done is fine.

I cannot really give much guidance on directly editing ngd files because this is not the appropriate way of editing designs and we do not support it. I do not know what the consequences are of editing such a file directly. There are several things that could possibly go wrong: this file needs to have internal consistency, and maybe this was somehow broken by editing it.

I would be interested to know more about "dominance within blocks" and will try to find some literature on this, as I am not familiar with this concept, so thanks for pointing this out.

Michiel

by **mattw** » Fri Sep 08, 2017 5:51 am

Hello Michiel,

Thanks for the guidance on evaluating designs. Unfortunately we have still not been able to get this to work. NGENE does output something; however, in the output window the Mean Bayesian MNL D-Error comes up as "Undefined". This is the outcome for each of the manually adjusted designs we have tried to evaluate, as well as the original design. I think we have followed your recommended format for the Excel file but we must be missing something. Below I have included both the evaluation syntax and the contents of the input Excel file. Thanks for your continuing review of this.

Matt

***************
? This will evaluate a given design. Modify the filepath as needed.
Design
;alts = alt1*, alt2*, SQ*
;rows = 18
;block = 6

;eff = (mnl, d, mean)

;model:
U(alt1) = b1.dummy[(n,0.9,0.25)|(n,0.95,0.25)|(n,1.0,0.25)|(n,1.05,0.25)|(n,1.1,0.25)|(n,1.2,0.25)] * Status[1,2,3,4,5,6,7] + b2[-0.004] * Cost[40,80,150,250,350] /
U(alt2) = b1 * Status + b2 * Cost /
U(SQ) = b1 * Status + b2 * CostSQ[0]

; eval = <filepath>
$

*****************

1 1 1 250 2 350 7 0 3
1 2 1 40 6 80 7 0 1
1 3 2 40 3 350 7 0 1
1 4 1 40 2 350 7 0 5
1 5 4 40 6 350 7 0 2
1 6 5 40 6 350 7 0 6
1 7 1 40 4 80 7 0 3
1 8 5 350 4 40 7 0 2
1 9 3 350 1 250 7 0 4
1 10 3 40 6 80 7 0 5
1 11 3 250 4 350 7 0 6
1 12 3 40 5 350 7 0 4
1 13 5 40 6 250 7 0 1
1 14 3 40 4 350 7 0 4
1 15 2 40 6 80 7 0 6
1 16 2 40 5 80 7 0 5
1 17 2 40 4 350 7 0 2
1 18 1 250 5 350 7 0 3

by **Michiel Bliemer** » Sun Sep 10, 2017 11:04 am

If Ngene gives an undefined D-error it usually means that the model parameters are not identifiable, i.e. the parameters cannot be estimated using this data. I suspect that the issue is that you are trying to estimate a dummy coded variable where the reference level only appears in the SQ alternative, which is often problematic (see several other posts here on the forum on this same issue). If this is indeed the problem, then you are asking to estimate a model that cannot actually be estimated given the restrictions you put on the data, and I am not able to resolve that for you. If I have time I will try to look more closely on Monday.

Michiel

by **Michiel Bliemer** » Thu Sep 14, 2017 1:42 pm

This is the link to the post that my colleague wrote on this topic: http://choice-metrics.com/forum/viewtopic.php?f=2&t=188&p=638#p638

It is not an issue with Ngene, but an issue in your model and identification of your parameters. There is a paper reference in that post that may be useful in this context.

Michiel

Dummy Coding, SQ Option, Dominance within Blocks
