Unequal sampling over blocks

Posted: Thu Feb 08, 2024 12:21 am
by MirtheN
Hi,

I have created an efficient design consisting of two blocks of 10 choice situations. I distributed two questionnaires and received 13 responses for the first block and 10 for the second block. My question is: how can I account for the difference in sample size when estimating the discrete choice model? Should I subsample or weight the responses from the first block?

Thanks!

Re: Unequal sampling over blocks

Posted: Fri Feb 09, 2024 9:54 am
by johnr
This really depends on what you are hoping to achieve, but generally no. There are multiple reasons why the properties of a design may not translate directly to the data we use for estimation. One such reason is what you describe: unequal block sizes. In large samples, the efficiency of the design matters much less than in small samples, so an exact mapping between the design and the data is less important. Typically, as researchers, once the data are collected we forget about the design altogether, other than to describe it in a paper or report. In his original 1974 paper, McFadden used simulations to demonstrate both the asymptotic and small-sample properties of the logit model, and concluded that the model will generally retrieve the population parameter estimates as given by the data.

Of course, we can introduce a number of biases into the process through how we conduct the experiment and, in some instances, through the experimental design itself. One example is Street and Burgess designs, where zero overlap may induce particular behaviour by respondents, such as lexicographic choices. In one data set on dating choices that I collected years ago, the attributes described potential people the respondent could date, and one attribute was whether the potential date was a single parent or not. In the sample we collected using a Street and Burgess design, respondents, almost to a person, never chose to date an individual who was a single parent. They would rather date an axe-murdering psychopath (I'm not saying this is a desirable outcome, just what we found). Respondents never traded off any of the other attributes, as the choice was made solely on this one attribute: with zero overlap, they never saw a task in which both people were single parents or neither was.

In any case, the point is that even if you think the blocks may cause some form of bias in responses, such as one block containing all the easy questions and the other the harder ones, in large samples such biases will get washed out, provided the discrepancy in the number of respondents per block is not very large.

Also, there are tensions and different schools of thought as to whether one should weight data at all. Whilst weighting data is common in consulting, concerns about it have started to appear in academia. This comes down to how generalizable different subsegments of the sample are to the population. To demonstrate, suppose you have 100 respondents, of whom 98 answer block A and 2 answer block B, and suppose the two respondents who answered block B happen to have extreme preferences that differ from those of much of the population. If you weight the data so that each block counts equally, the 98 respondents who saw block A will each have a weight of 0.51 (50/98) and the two respondents who saw block B a weight of 25 (50/2) each. In this case, you are giving the two respondents with extreme preferences a huge weight in estimation, which will likely bias your estimates much more than if you don't weight. Now take the same example, but assume that 52 saw block A and 48 saw block B. The weights are now 0.96 and 1.04 respectively, and are likely to have limited impact on the estimates. The conclusion of this thought exercise is simple. In cases where you would most want to use weighting (the first example, where the sample is very unbalanced), you may cause more problems, because you are giving large weights to a few respondents who may themselves be problematic (and you just don't know). In cases where the differences between sample and population are small, weighting won't change much at all. Indeed, I believe this is one of the points that McFadden et al. (2006) were attempting to make. They suggest that you do NOT weight data in estimation, but rather use weighting in subsequent applications of the model (simulations, for example).

McFadden, D., Heiss, F., Jun, B. and Winter, J. (2006) On testing for independence in weighted contingency tables, Medium for Econometric Applications, Rotterdam Econometrics Institute, 14.2, 11-19.
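
For anyone who wants to check the arithmetic in the thought exercise, here is a minimal Python sketch. It assumes the weights are chosen so that every block carries the same total weight and the weights sum to the sample size, i.e. weight = (N / number of blocks) / respondents in that block. The function name block_weights is made up for illustration and is not part of any estimation package. The last case uses the 13 and 10 responses from the original question.

```python
def block_weights(block_counts):
    """Per-respondent weight for each block (hypothetical helper).

    Assumes every block should carry the same total weight and that
    the weights sum to the overall sample size.
    """
    total = sum(block_counts.values())
    target = total / len(block_counts)  # respondents per block if balanced
    return {block: target / n for block, n in block_counts.items()}


# Two cases from the thought exercise, plus the 13/10 split from the question.
for counts in ({"A": 98, "B": 2}, {"A": 52, "B": 48}, {"A": 13, "B": 10}):
    weights = block_weights(counts)
    print(counts, {block: round(w, 2) for block, w in weights.items()})
```

Under that assumption the 13/10 split gives weights of roughly 0.88 and 1.15, which is close to the 52/48 case: weights this near to 1 are unlikely to move the estimates much.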

The only case I can think of where you might weight the data this way is if the purpose of the paper is to demonstrate something about the design properties themselves, rather than being a simple application of the design. Even then, it would be rare to do so.

John

Re: Unequal sampling over blocks

Posted: Sat Feb 10, 2024 4:43 am
by MirtheN
Thank you very much for your reply, John!