Model specification dummy vs. linear coded attributes

Posted: Mon Feb 13, 2023 12:39 am
by JvB
Hi Michiel,

I have a question on model specification:

I have a choice experiment with three attributes: A1 (b_Beitr), which has only numerical levels, and A2 (b_HoeheEEE) and A3 (b_ZeitEEE), which each have three numerical levels plus one nominal level.

In the first approach, I dummy coded A2 and A3 entirely because of the one nominal level; see the code below:

Daten_Choice_Pivot <- read.csv("DatenApollo.csv", header = TRUE)

Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==1)] <- 1.6
Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==2)] <- 1.8
Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==3)] <- 3.3
Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==4)] <- 4.8

Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==1)] <- 1.6
Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==2)] <- 1.8
Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==3)] <- 3.3
Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==4)] <- 4.8

Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==1)] <- 1.6
Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==2)] <- 1.8
Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==3)] <- 3.3
Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==4)] <- 4.8

database = Daten_Choice_Pivot
# ################################################################# #
#### DEFINE MODEL PARAMETERS ####
# ################################################################# #
### Vector of parameters, including any kept fixed during estimation
apollo_beta = c(b0 = 0, b_Beitr = 0,
b_HoeheEEE300 = 0,
b_HoeheEEE600 = 0,
b_HoeheEEE900 = 0,
b_HoeheEEEunb = 0,
b_ZeitEEE12 = 0,
b_ZeitEEE42 = 0,
b_ZeitEEE72 = 0,
b_ZeitEEEunb = 0)
### Vector with parameter names (in quotes) to be kept fixed at
# their starting values during estimation.
# Use apollo_beta_fixed = c() if none
apollo_fixed = c("b_HoeheEEEunb", "b_ZeitEEEunb")
# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #
apollo_inputs = apollo_validateInputs()
# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #
apollo_probabilities=function(apollo_beta, apollo_inputs,
functionality="estimate"){

### Function initialisation: do not change the following three commands
### Attach and detach inputs, and create empty list of probabilities
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))
P = list()

### List of MNL utilities: must use the same names as in mnl_settings
V = list()

V[["SQ"]] = b0 + b_Beitr*Beitr_1 + b_HoeheEEE300*(HoeheEEE_1==1) + b_HoeheEEE600*(HoeheEEE_1==2) +
b_HoeheEEE900*(HoeheEEE_1==3)+ b_HoeheEEEunb*(HoeheEEE_1==4) + b_ZeitEEE12*(ZeitEEE_1==1) +
b_ZeitEEE42*(ZeitEEE_1==2) + b_ZeitEEE72*(ZeitEEE_1==3) + b_ZeitEEEunb*(ZeitEEE_1==4)
V[["RefA"]] = b_Beitr*Beitr_2 + b_HoeheEEE300*(HoeheEEE_2==1) + b_HoeheEEE600*(HoeheEEE_2==2) +
b_HoeheEEE900*(HoeheEEE_2==3)+ b_HoeheEEEunb*(HoeheEEE_2==4) + b_ZeitEEE12*(ZeitEEE_2==1) +
b_ZeitEEE42*(ZeitEEE_2==2) + b_ZeitEEE72*(ZeitEEE_2==3) + b_ZeitEEEunb*(ZeitEEE_2==4)
V[["RefB"]] = b_Beitr*Beitr_3 + b_HoeheEEE300*(HoeheEEE_3==1) + b_HoeheEEE600*(HoeheEEE_3==2) +
b_HoeheEEE900*(HoeheEEE_3==3)+ b_HoeheEEEunb*(HoeheEEE_3==4) + b_ZeitEEE12*(ZeitEEE_3==1) +
b_ZeitEEE42*(ZeitEEE_3==2) + b_ZeitEEE72*(ZeitEEE_3==3) + b_ZeitEEEunb*(ZeitEEE_3==4)
________________________
In my second approach, I split A2 and A3 each into a linear coded part (containing the three numerical levels) and a dummy coded part (equal to 1 if the nominal level applies and 0 if not); see the code below:

Daten_Choice_Pivot <- read.csv("DatenApollo.csv", header = TRUE)

#Beitrag
Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==1)] <- 1.6
Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==2)] <- 1.8
Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==3)] <- 3.3
Daten_Choice_Pivot$Beitr_1[which(Daten_Choice_Pivot$Beitr_1==4)] <- 4.8

Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==1)] <- 1.6
Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==2)] <- 1.8
Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==3)] <- 3.3
Daten_Choice_Pivot$Beitr_2[which(Daten_Choice_Pivot$Beitr_2==4)] <- 4.8

Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==1)] <- 1.6
Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==2)] <- 1.8
Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==3)] <- 3.3
Daten_Choice_Pivot$Beitr_3[which(Daten_Choice_Pivot$Beitr_3==4)] <- 4.8

# HoeheEEE
# Recode in one step: replacing 1 by 3 first and 3 by 9 afterwards would turn both levels into 9
Daten_Choice_Pivot$HoeheEEE_1 <- ifelse(Daten_Choice_Pivot$HoeheEEE_1==1, 3,
ifelse(Daten_Choice_Pivot$HoeheEEE_1==2, 6,
ifelse(Daten_Choice_Pivot$HoeheEEE_1==3, 9, 4)))

#Daten_Choice_Pivot$HoeheEEE_2[which(Daten_Choice_Pivot$HoeheEEE_2==1)] <- 3
#Daten_Choice_Pivot$HoeheEEE_2[which(Daten_Choice_Pivot$HoeheEEE_2==2)] <- 6
#Daten_Choice_Pivot$HoeheEEE_2[which(Daten_Choice_Pivot$HoeheEEE_2==3)] <- 9

Daten_Choice_Pivot$HoeheEEE_2 <- ifelse(Daten_Choice_Pivot$HoeheEEE_2==1, 3,
ifelse(Daten_Choice_Pivot$HoeheEEE_2==2, 6,
ifelse(Daten_Choice_Pivot$HoeheEEE_2==3, 9, 4)))

#Daten_Choice_Pivot$HoeheEEE_3[which(Daten_Choice_Pivot$HoeheEEE_3==1)] <- 3
#Daten_Choice_Pivot$HoeheEEE_3[which(Daten_Choice_Pivot$HoeheEEE_3==2)] <- 6
#Daten_Choice_Pivot$HoeheEEE_3[which(Daten_Choice_Pivot$HoeheEEE_3==3)] <- 9


Daten_Choice_Pivot$HoeheEEE_3 <- ifelse(Daten_Choice_Pivot$HoeheEEE_3==1, 3,
ifelse(Daten_Choice_Pivot$HoeheEEE_3==2, 6,
ifelse(Daten_Choice_Pivot$HoeheEEE_3==3, 9, 4)))

#ZeitEEE
Daten_Choice_Pivot$ZeitEEE_1[which(Daten_Choice_Pivot$ZeitEEE_1==1)] <- 1.2
Daten_Choice_Pivot$ZeitEEE_1[which(Daten_Choice_Pivot$ZeitEEE_1==2)] <- 4.2
Daten_Choice_Pivot$ZeitEEE_1[which(Daten_Choice_Pivot$ZeitEEE_1==3)] <- 7.2

Daten_Choice_Pivot$ZeitEEE_2[which(Daten_Choice_Pivot$ZeitEEE_2==1)] <- 1.2
Daten_Choice_Pivot$ZeitEEE_2[which(Daten_Choice_Pivot$ZeitEEE_2==2)] <- 4.2
Daten_Choice_Pivot$ZeitEEE_2[which(Daten_Choice_Pivot$ZeitEEE_2==3)] <- 7.2

Daten_Choice_Pivot$ZeitEEE_3[which(Daten_Choice_Pivot$ZeitEEE_3==1)] <- 1.2
Daten_Choice_Pivot$ZeitEEE_3[which(Daten_Choice_Pivot$ZeitEEE_3==2)] <- 4.2
Daten_Choice_Pivot$ZeitEEE_3[which(Daten_Choice_Pivot$ZeitEEE_3==3)] <- 7.2

database = Daten_Choice_Pivot

# ################################################################# #
#### DEFINE MODEL PARAMETERS ####
# ################################################################# #
### Vector of parameters, including any kept fixed during estimation
apollo_beta = c(b0 = 0,
b_Beitr = 0,
b_HoeheEEE = 0,
b_HoeheEEEunb = 0, # dummy
b_ZeitEEE = 0,
b_ZeitEEEunb = 0) # dummy
### Vector with parameter names (in quotes) to be kept fixed at
# their starting values during estimation.
# Use apollo_beta_fixed = c() if none
apollo_fixed = c() #"b_HoeheEEEunb", "b_ZeitEEEunb")
# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #
apollo_inputs = apollo_validateInputs()
# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #
apollo_probabilities=function(apollo_beta, apollo_inputs,
functionality="estimate"){

### Function initialisation: do not change the following three commands
### Attach and detach inputs, and create empty list of probabilities
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))
P = list()


### List of MNL utilities: must use the same names as in mnl_settings
V = list()

### Linear terms are switched off whenever the nominal level (code 4) applies
V[["SQ"]] = b0 + b_Beitr*Beitr_1 + b_HoeheEEE*(HoeheEEE_1!=4)*HoeheEEE_1 + b_HoeheEEEunb*(HoeheEEE_1==4) +
b_ZeitEEE*(ZeitEEE_1!=4)*ZeitEEE_1 + b_ZeitEEEunb*(ZeitEEE_1==4)
V[["RefA"]] = b_Beitr*Beitr_2 + b_HoeheEEE*(HoeheEEE_2!=4)*HoeheEEE_2 + b_HoeheEEEunb*(HoeheEEE_2==4) +
b_ZeitEEE*(ZeitEEE_2!=4)*ZeitEEE_2 + b_ZeitEEEunb*(ZeitEEE_2==4)
V[["RefB"]] = b_Beitr*Beitr_3 + b_HoeheEEE*(HoeheEEE_3!=4)*HoeheEEE_3 + b_HoeheEEEunb*(HoeheEEE_3==4) +
b_ZeitEEE*(ZeitEEE_3!=4)*ZeitEEE_3 + b_ZeitEEEunb*(ZeitEEE_3==4)
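
A note on the recoding step: replacing level codes in place, one value at a time, is order-sensitive whenever a new value coincides with a code that is tested later (for example, mapping 1 to 3 and afterwards 3 to 9 turns both original levels into 9). A minimal sketch of a lookup-based recode that avoids this, assuming the level codes are the integers 1 to 4; the helper name recode_levels and the example vector are hypothetical and not part of Apollo:

```r
# Hypothetical helper: map integer level codes to values in a single step,
# so a value that has already been recoded is never matched again.
recode_levels <- function(codes, values) {
  unname(values[as.character(codes)])
}

# Level values as in the second approach; 4 is kept as the code of the nominal level
hoehe_values <- c("1" = 3, "2" = 6, "3" = 9, "4" = 4)

# Illustrative level codes (not taken from the real data set)
codes <- c(1, 2, 3, 4, 1)
recoded <- recode_levels(codes, hoehe_values)
print(recoded)  # 3 6 9 4 3
```

The same helper could be applied to each of the Beitr, HoeheEEE and ZeitEEE columns with its own lookup vector.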

___________________________________________________
Do you see any problems with the second approach?

I am thankful for any insights.

Thank you very much in advance.
Best,
J.

Re: Model specification dummy vs. linear coded attributes

Posted: Wed Feb 15, 2023 3:29 pm
by Michiel Bliemer
I am not very proficient in Apollo or R, so I cannot comment on your code (you may want to post on the Apollo forum). Going just by what you describe: if an attribute has one categorical level (level 1) and three numerical levels (levels 2, 3, 4), I would probably split it into a dummy coded variable with 2 levels and a numerical variable with 3 levels, where the utility function becomes something like:

U = ... + b1 * (A2==1) + b2 * (A2<>1) * A2 + ...

In other words, you have a parameter for level 1 (versus the reference of not being level 1), and a parameter for the numerical levels that should only be included if A2 is not level 1.

This is just an idea and I have not tried this, but I think it makes sense.
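
The split described above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the function name mixed_attribute_utility, the parameter values and the example vector are hypothetical, and the nominal level is assumed to be stored under code 4, as in the second approach of the original post.

```r
# Contribution of one mixed attribute to utility:
# a dummy term for the nominal level plus a linear term that is
# switched off whenever the nominal level applies.
mixed_attribute_utility <- function(x, nominal_code, b_nominal, b_linear) {
  is_nominal <- x == nominal_code
  b_nominal * is_nominal + b_linear * (!is_nominal) * x
}

# Numerical levels 3, 6, 9 and the nominal level stored as code 4
x <- c(3, 6, 9, 4)

# Illustrative parameter values (not estimates)
u_masked <- mixed_attribute_utility(x, nominal_code = 4, b_nominal = 1, b_linear = 0.5)
print(u_masked)    # 1.5 3.0 4.5 1.0

# Without the mask, the code 4 enters the linear term as if it were a level value
u_unmasked <- 1 * (x == 4) + 0.5 * x
print(u_unmasked)  # 1.5 3.0 4.5 3.0
```

Without the mask, the nominal level's utility is shifted by 4 times the linear coefficient. The dummy parameter can absorb that shift during estimation, so the fit should be unaffected, but the dummy coefficient then no longer has the interpretation given above.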

Re: Model specification dummy vs. linear coded attributes

Posted: Wed Feb 15, 2023 7:10 pm
by JvB
Hi Michiel,

Thank you for your response. What you describe is exactly what my code is meant to do. I will also post it on the Apollo forum so that the code itself can be checked.
But thank you for commenting on the general idea. Very helpful!

Best,
J.