by johnr » Tue Sep 08, 2015 11:57 am
Hi David
The issue of interaction effects is an interesting one, however as with most things design related, there is a lot of miscommunication in the literature on the subject. I will try to answer this systematically, but forgive me if I digress at times.
1. The language of main effects plus interaction effects designs is a hangover from the linear models literature, and it does not translate neatly to non-linear models. Consider the following design, which results in a zero correlation structure for the main and interaction effects in the traditional sense:
Design
;alts = alt1, alt2
;rows = 8
;orth = sim
;foldover
;model:
U(alt1) = b1 * A[-1,1] + b2 * B[-1,1] + b3 * A*B /
U(alt2) = b1 * A + b2 * B + b3 * A*B $
a. I want you to generate the design and copy and paste it into Excel. (I don't know how to post images, so unfortunately I will have to describe the process I want you to follow. Hopefully it will translate - if you email me directly, I will send you the actual files as well.)
b. Create the two interaction columns (i.e., for the first and second alternatives).
c. Calculate the correlation structure of the main and interaction effects just to confirm that they are identified (according to traditional linear land thinking).
d. Now, for the main and interaction effects, compute X'X, where X is the design (both main and interaction effects). This can be done using the matrix multiplication function in Excel (mmult()) and the matrix transposition function (transpose()). You need to first select a block of cells 6*6 in size, type the formula (which will appear in the first cell), and press shift+ctrl+enter simultaneously. My equation looks like this: =MMULT(TRANSPOSE(B2:G17),B2:G17), where the design (main effects and interactions) is located in cells B2:G17.
e. You should obtain a matrix, 6*6 in size, where the elements of the leading diagonal are all 16, and the off-diagonals are all zeros. I placed this matrix in cells B29:G34.
f. Now we want to take the inverse of this matrix. Select another 6*6 set of blank cells and use the minverse() function, with the matrix obtained in e. inside the brackets. In my spreadsheet, I used =MINVERSE(B29:G34).
g. You should obtain a matrix, 6*6 in size, where the elements of the leading diagonal are all 0.0625, and the off-diagonals are all zeros.
So what did we do that for? In linear (regression) land, the AVC matrix of a design is sigma*(X'X)^-1. Sigma is a scalar that scales all elements of (X'X)^-1 equally, so we will ignore it to simplify what I am trying to say, and examine (X'X)^-1 only. If you follow my logic above, step g. gives you (X'X)^-1, the non-scaled version of the AVC matrix you would obtain if you used this design in a linear regression model. If you examine it, you get small values on the leading diagonal (0.0625) and zeros for the off-diagonals. These are the parameter variances (the square roots of which are the standard errors) and the parameter covariances.

So the design I generated not only has a good correlation structure (if you assume zero correlations are good), but this translates through to the model in terms of producing very low standard errors (maximising the t-ratios) and zero parameter covariances. Note, as per my previous posts, we don't care about the correlation structure of X itself (at least we shouldn't), as the aim is to estimate a model based on the design (so we care about the properties of the model), and this example shows why, for linear models, you want an orthogonal design (or if not orthogonal, a design with zero correlations): the parameters of the model will not be correlated (zero covariances), hence the influence of B1 on Y is independent of B2 (again, zero covariance between B1 and B2).
As an aside, since the off-diagonals are all zero, multiplying them by sigma has no effect. Hence sigma, which relates to Y and the betas (econometrics 101), affects the standard errors only.
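If you would rather script steps b. to g. than work in Excel, here is a minimal Python/numpy sketch. I can't reproduce Ngene's exact output here, so as a stand-in I build a 16-run design with the same zero-correlation structure from a 2^4 full factorial; the variable names are my own.

import itertools
import numpy as np

# Stand-in for the Ngene design: the main effect and interaction columns
# of a 2^4 full factorial (16 runs) are all mutually orthogonal
ff = np.array(list(itertools.product([-1, 1], repeat=4)))
A1, B1, A2, B2 = ff.T

# X = the 16 x 6 design: main effects plus interactions, both alternatives
X = np.column_stack([A1, B1, A1 * B1, A2, B2, A2 * B2])

XtX = X.T @ X                 # step d.: 16 on the diagonal, 0 elsewhere
XtX_inv = np.linalg.inv(XtX)  # steps f. and g.: 0.0625 on the diagonal
print(XtX)
print(XtX_inv)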
2. Now for the complicated part. We are now going to simulate choice data using the same design and see how it goes in practice. I will use Nlogit (you can use whatever you want for this), so I will explain how to set this up in Nlogit format. Other software may use other formats, so translate this to whatever format your estimation software requires.
a. First of all, each row in Nlogit is an alternative, not a choice task (multiple rows make up a choice task). The design output from Ngene is different in that each row is a choice task. Hence, you need to reformat the design (with the interaction terms) so that it looks like the table below (I will explain everything in detail afterwards). For the moment, note that Altij = {1,2}, where Altij = 1 is the first alternative and Altij = 2 is the second. The attributes (main and interaction effects) simply sit next to the relevant alternative. I have created another variable called Cset, which is always equal to 2; this tells Nlogit how many alternatives each choice task has (fixed at two in this design). The Resp variable is simply a respondent index (I am assuming each (fold-over) block is assigned to a different respondent).
Resp Block Cset Altij Choice Att1 Att2 Int
1 1 2 1 1 -1 -1 1
1 1 2 2 0 -1 1 -1
1 1 2 1 0 -1 1 -1
1 1 2 2 1 1 -1 -1
1 1 2 1 1 1 -1 -1
1 1 2 2 0 1 -1 -1
1 1 2 1 0 1 1 1
1 1 2 2 1 -1 1 -1
1 1 2 1 0 1 1 1
1 1 2 2 1 -1 -1 1
1 1 2 1 0 1 -1 -1
1 1 2 2 1 1 1 1
1 1 2 1 1 -1 1 -1
1 1 2 2 0 1 1 1
1 1 2 1 0 -1 -1 1
1 1 2 2 1 -1 -1 1
2 2 2 1 1 1 1 1
2 2 2 2 0 1 -1 -1
2 2 2 1 0 1 -1 -1
2 2 2 2 1 -1 1 -1
2 2 2 1 1 -1 1 -1
2 2 2 2 0 -1 1 -1
2 2 2 1 1 -1 -1 1
2 2 2 2 0 1 -1 -1
2 2 2 1 1 -1 -1 1
2 2 2 2 0 1 1 1
2 2 2 1 0 -1 1 -1
2 2 2 2 1 -1 -1 1
2 2 2 1 0 1 -1 -1
2 2 2 2 1 -1 -1 1
2 2 2 1 0 1 1 1
2 2 2 2 1 1 1 1
b. The choice variable is a little more complicated to construct. The assumption is that people will choose the alternative that maximises their utility, where Unsj = Vnsj + Ensj, Vnsj = beta*Xnsj, and Ensj is IID EV1 distributed (n = respondent, s = choice task, j = alternative). So we need to construct Vnsj and Ensj first. Let us assume the following utility function:
Vnsj = -0.5*X1 - 0.6*X2 + 0.8*int(eraction)
such that, for example, for the first choice task V111 = -0.5*-1 - 0.6*-1 + 0.8*1 = 1.9 and V112 = -0.5*-1 - 0.6*1 + 0.8*-1 = -0.9. You can work out the values for the remaining Vnsj of the design.
Now we need to construct the Ensj values, which are IID EV1 distributed. A draw from this distribution can be simulated using the equation =-LN(-LN(RAND())), where LN is the natural log and RAND() returns a uniform random value between 0 and 1. Each alternative and choice task gets its own random draw, so you can drag (copy) the equation down the entire data set.
Now you have both Vnsj and Ensj for each alternative and choice task, and can compute Unsj by summing the two together. Under utility maximisation, respondents choose the alternative that maximises their utility, so I constructed the choice variable above using a simple IF statement such that choice = 1 if Unsj > Unsi, and zero otherwise (a scripted version of this step is sketched after c. below).
c. If you have been able to follow the above, you should have choice data for 2 respondents. Create 100 respondents by copying the data 50 times, stacked under each other (recall that the design is blocked into 2, so 50 * 2 = 100 respondents). Make sure you do not paste special, as you still want the random draws for the error terms to vary over respondents, choice tasks and alternatives. I would save this file, lest you lose it.
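For those who prefer to script it, here is a minimal Python sketch of steps b. and c., reusing the stand-in design X from the earlier sketch. For simplicity each simulated respondent answers all 16 rows rather than one 8-row block, so 50 replications here correspond to the 100 blocked respondents above; the seed and variable names are my own.

import numpy as np

rng = np.random.default_rng(1234)    # arbitrary seed, for reproducibility
beta = np.array([-0.5, -0.6, 0.8])   # the b1, b2, b3 assumed above

# Split the 16 x 6 design X into each alternative's (Att1, Att2, Int)
# columns, then replicate the design 50 times
X1, X2 = X[:, :3], X[:, 3:]
n_rep = 50
X1_all = np.tile(X1, (n_rep, 1))
X2_all = np.tile(X2, (n_rep, 1))

V1, V2 = X1_all @ beta, X2_all @ beta             # Vnsj for each alternative
E1 = -np.log(-np.log(rng.uniform(size=len(V1))))  # IID EV1 draws, as in
E2 = -np.log(-np.log(rng.uniform(size=len(V2))))  # =-LN(-LN(RAND()))
y1 = (V1 + E1 > V2 + E2).astype(float)            # 1 if alternative 1 chosen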
3. Open another Excel spreadsheet, and copy and paste special the simulated data for the 100 respondents into the new spreadsheet. Paste special (values only) so that the choice variable is fixed; otherwise it will change every time the spreadsheet recalculates. Save this file somewhere as a csv file.
4. Open Nlogit (or whatever software you choose) and open the csv data. Estimate the following model (you can estimate alternative-specific parameters too if you want, but I am trying to keep this simple)
nlogit
;lhs=choice,cset,Altij
;choices=A,B
;model:
U(A) = b1*Att1 + b2*Att2 + b3*Int /
U(B) = b1*Att1 + b2*Att2 + b3*Int $
For my data I got b1 = -0.48066, b2 = -0.62534 and b3 = 0.77920 (I assumed -0.5, -0.6 and 0.8 when I generated the data, so pretty damn close). Note you will get different values, given that you took different random draws (different draws from the rand() function) for the error term, but they should be close to the parameters you assumed when generating the data.
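If you are following along in Python rather than Nlogit, the same model can be estimated by maximising the MNL log-likelihood directly, using the arrays from the simulation sketch above. A minimal version, with scipy's BFGS standing in for Nlogit's optimiser:

import numpy as np
from scipy.optimize import minimize

def neg_loglik(b, X1, X2, y1):
    # MNL with two generic alternatives: log P(alt1) = V1 - log(e^V1 + e^V2)
    v1, v2 = X1 @ b, X2 @ b
    denom = np.logaddexp(v1, v2)  # numerically stable log-sum-exp
    return -(y1 * (v1 - denom) + (1 - y1) * (v2 - denom)).sum()

res = minimize(neg_loglik, x0=np.zeros(3), args=(X1_all, X2_all, y1),
               method='BFGS')
print(res.x)         # should land near (-0.5, -0.6, 0.8)
print(res.hess_inv)  # rough AVC matrix (BFGS approximation; Nlogit's VARB
                     # comes from the actual Hessian, so numbers will differ)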
5. Now here's the kicker. Open the AVC matrix for the model you estimated (it can be found in the project bar under the Matrices folder and is called VARB). This is what I got:
0.00486822 0.00176692 -0.00187315
0.00176692 0.00502271 -0.00208152
-0.00187315 -0.00208152 0.00536592
Now, what do we see when we look at the off-diagonal elements? I see non-zero values. They may look small, but recall that these are for 100 respondents. If you multiply them by sqrt(50) it will give you a better idea of what is happening (the AVC matrix scales with the number of design replications N, not the number of respondents - even though we have 100 respondents, the design is replicated only 50 times). Multiplying the elements of the AVC matrix by sqrt(50) normalises it to N = 1, which is equivalent to what I assumed when I was working in linear land above (so I can compare apples to apples). Now I get
0.034423514 0.012494011 -0.013245171
0.012494011 0.035515923 -0.014718569
-0.013245171 -0.014718569 0.037942784
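The same normalisation is one line in Python, using the VARB values I reported above:

import numpy as np

varb = np.array([[ 0.00486822,  0.00176692, -0.00187315],
                 [ 0.00176692,  0.00502271, -0.00208152],
                 [-0.00187315, -0.00208152,  0.00536592]])
print(varb * np.sqrt(50))  # reproduces the normalised matrix above, up to rounding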
Now, I will give you a moment to appreciate what you have just done.
Let's summarise this... You took a "main effects plus two-way interaction effects design" and showed that the parameters are correlated, which means that it is not a "main effects plus two-way interaction effects design", as such designs should have zero covariances. Or did you mean that you took a "main effects plus two-way interaction effects design" generated for a linear model, applied it to a non-linear model, and assumed it would translate, when obviously it didn't? In which case,
a) should we call it a "main effects plus two-way interaction effects design" if, when applied to the model you estimated, it is not doing what it should, or alternatively,
b) when generating such a design, call it something else (in case you haven't noticed, if I were forced to choose between a and b, I would pick b).
So let's go back to basics... the AVC matrices of non-linear models depend on the betas (repeat the above with different priors if you don't believe me yet - simply replace the betas I assumed when I simulated the data). Design theory (in both linear and non-linear land) is about minimising the elements of the AVC matrix of the design (for both main effects and interaction effects) - recall, we care about the model results, not the design. If the betas were all zero, you would have found in the above that the covariances are equal to zero - again, try it if you don't believe me - simply set the priors/betas = 0 in the simulation task.
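You can also see this dependence analytically, without any simulation noise. For an MNL model with two alternatives and generic parameters, the Fisher information for one design replication is sum_s P1s*P2s*(x1s - x2s)(x1s - x2s)', and the AVC matrix is its inverse; the weights P1s*P2s depend on the betas. A sketch, reusing X1 and X2 from the earlier sketches:

import numpy as np

def mnl_avc(beta, X1, X2):
    # Analytic one-replication AVC matrix for a two-alternative MNL design
    Z = X1 - X2                           # attribute differences per choice task
    p1 = 1.0 / (1.0 + np.exp(-Z @ beta))  # P(alt 1) under MNL
    w = p1 * (1.0 - p1)                   # P1*P2, the beta-dependent weight
    info = (Z * w[:, None]).T @ Z         # Fisher information
    return np.linalg.inv(info)

print(mnl_avc(np.array([-0.5, -0.6, 0.8]), X1, X2))  # non-zero covariances
print(mnl_avc(np.zeros(3), X1, X2))                  # betas = 0: covariances vanish

With zero betas the weights are a constant 0.25, so the orthogonality of the design carries straight through to a diagonal AVC matrix; with non-zero betas the weights differ across choice tasks and the covariances are no longer zero.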
Ergo, when people talk about main effects plus interaction designs as they do, they are really either
1. talking about linear models, or
2. talking about designs for non-linear models, generated under the assumption of an MNL model with zero priors (all the betas are zero) and optimised for D-error.
I presume we are not talking about linear models, otherwise I am on the wrong forum, in which case I apologise profusely for wasting everyone's time (Ngene deals with non-linear choice models only at the moment). The point is that if you want to discuss design generation for choice models, you should be talking about the assumptions under which you generated the design, and then perhaps the properties of the resulting design - not starting with the properties of the design and ignoring or pretending away the assumptions it was generated under (whether you knew you were making these assumptions or not is a moot point; you were making them regardless).
So when you say "Completing fractional factorial design with foldover inclusion", I read: you are generating a design for an MNL model under the assumption of local priors equal to zero, with D-error as your optimality criterion, and you are using tricks developed for linear models (the foldover etc.) as your algorithm to locate the design (i.e., search only amongst orthogonal designs (a constraint) and use the foldover). If these are your assumptions, I am fine with it, but what I want to eradicate completely from the choice modelling literature is linear land language and thinking when generating, and more importantly writing about, designs.

What I am calling for is a change in dialogue: we need to talk about and discuss the assumptions used when generating designs for choice models, and treat designs as outputs of these assumptions, not inputs, which is how the literature (myself included in the past) has tended to treat them.

What I don't understand, however, is the comment that Bayesian efficient designs don't include interaction terms. Why do they not? As I said above, what you called a "fractional factorial design with foldover inclusion" is actually a D-efficient design assuming an MNL model with zero priors - it is an EFFICIENT DESIGN, just efficient under a particular set of assumptions. So why is it special? The fact is that it isn't - the zero priors assumption is just an assumption, which could be equally as valid as any other non-zero priors you might assume.
Indeed, you can assume priors for interaction terms in Bayesian designs as well. You can set them at zero if you want to ensure that they can be estimated (that is, at minimum the relevant attribute combinations will occur over the design), use non-zero local priors (if you know the direction), or use weak priors (uniforms either side of zero).
Conclusion: I am not anti (2 - 2 in your post), however I am very anti the language used in the discrete choice literature (not just by you but by others, so please don't take it personally). Until we start talking assumptions -> designs, rather than starting with designs that are assumed to be (but in non-linear land are not) assumption free, I'm afraid the literature will simply fail to move forward. We need to stop with the orthogonal versus efficient design debate NOW, as orthogonal designs are efficient designs (MNL, zero local priors, D-error). The debate we need to be having is about the appropriate assumptions that make sense for discrete choice models (the most important debate that is not being had in the literature), but we can only have that debate when people stop thinking in linear land terms. Until then, we are stuck in no-(wo)man's land.
John