Class Probability in LCM
Posted: Fri Nov 24, 2023 2:42 pm
Dear Moderators,
This question relates to Nlogit analysis. My DCE study explores user preferences for a mobile health app.
I have 4 attributes: training (tr), typing (ty), monitoring (m) and health education (he), each with 3 levels.
Each participant answered 8 choice tasks with 3 alternatives: app A, app B and neither.
When a participant chose "neither", they were then forced to choose between app A and app B, with the same attribute-level combinations as in the original choice task.
So I have two datasets: (1) a combined dataset with responses to both the conditional and unconditional choice tasks, and (2) an unconditional dataset.
My panel MMNL model for the combined dataset indicates that coefficients for all attribute-levels except Tr2 and Ty2 are statistically significant. Please see below.
|-> sample ;all $
|-> Nlogit
;lhs= choice,cset,alt
;choices= appA, appB, neither, appC, appD
;rpl
;fcn = tr2(n), tr3(n), ty2(n), ty3(n), m2(n), m3(n), he2(n), he3(n)
;pts=500 ;halton
;pds=Pan2
;model:
U(appA) = ASC_A + TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3 /
U(appB) = ASC_B + TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3 /
U(appC) = ASC_C + TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3 /
U(appD) = TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3
$
Normal exit: 31 iterations. Status=0, F= 2435.644
-----------------------------------------------------------------------------
Random Parameters Logit Model
Dependent variable CHOICE
Log likelihood function -2435.64402
Restricted log likelihood -4511.25447
Chi squared [ 19 d.f.] 4151.22089
Significance level .00000
McFadden Pseudo R-squared .4600961
Estimation based on N = 2803, K = 19
Inf.Cr.AIC = 4909.3 AIC/N = 1.751
Model estimated: Nov 19, 2023, 19:21:57
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
At start values -2650.0815 .0809******
Response data are given as ind. choices
Replications for simulated probs. = 500
Halton sequences used for simulations
RPL model with panel has 302 groups
Variable number of obs./group =PAN2
Number of obs.= 2803, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Random parameters in utility functions
TR2| .00500 .09337 .05 .9573 -.17801 .18801
TR3| -.81923*** .13418 -6.11 .0000 -1.08223 -.55624
TY2| -.04009 .12429 -.32 .7470 -.28370 .20351
TY3| .40498*** .12948 3.13 .0018 .15121 .65876
M2| 1.10921*** .14963 7.41 .0000 .81594 1.40247
M3| 1.35348*** .18585 7.28 .0000 .98922 1.71775
HE2| .21667** .09269 2.34 .0194 .03499 .39834
HE3| .67798*** .11630 5.83 .0000 .45003 .90593
|Nonrandom parameters in utility functions
ASC_A| .18888 .18619 1.01 .3104 -.17606 .55381
ASC_B| -.08429 .18404 -.46 .6469 -.44502 .27643
ASC_C| .67803*** .15667 4.33 .0000 .37097 .98509
|Distns. of RPs. Std.Devs or limits of triangular
NsTR2| .74171*** .15390 4.82 .0000 .44009 1.04334
NsTR3| 1.47791*** .13593 10.87 .0000 1.21150 1.74433
NsTY2| .72191*** .12957 5.57 .0000 .46796 .97587
NsTY3| .99317*** .11326 8.77 .0000 .77119 1.21514
NsM2| 1.09848*** .10748 10.22 .0000 .88781 1.30914
NsM3| 1.83987*** .15884 11.58 .0000 1.52855 2.15120
NsHE2| .59425*** .17010 3.49 .0005 .26085 .92764
NsHE3| .90148*** .11224 8.03 .0000 .68150 1.12147
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
However, my LCM for the combined dataset, estimated with two classes, reports PrbCls1 = 0.0 and PrbCls2 = 1.0. Please see below.
|-> sample ;all $
|-> Nlogit
;lhs=choice,cset,alt
;choices= appA, appB, neither, appC, appD
;lcm
;pts=2
;pds=pan2
;model:
U(appA) = ASC_A + TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3 /
U(appB) = ASC_B + TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3 /
U(appC) = ASC_C + TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3 /
U(appD) = TR2*tr2 + TR3*tr3 + TY2*ty2 + TY3*ty3 + M2*m2 + M3*m3 + HE2*he2 + HE3*he3
$
Normal exit: 5 iterations. Status=0, F= 2650.082
Line search at iteration 58 does not improve fn. Exiting optimization.
-----------------------------------------------------------------------------
Latent Class Logit Model
Dependent variable CHOICE
Log likelihood function -2399.17591
Restricted log likelihood -4511.25447
Chi squared [ 23 d.f.] 4224.15712
Significance level .00000
McFadden Pseudo R-squared .4681799
Estimation based on N = 2803, K = 23
Inf.Cr.AIC = 4844.4 AIC/N = 1.728
Model estimated: Nov 24, 2023, 09:27:00
Constants only must be computed directly
Use NLOGIT ;...;RHS=ONE$
At start values -2650.0370 .0947******
Response data are given as ind. choices
Number of latent classes = 2
Average Class Probabilities
.472 .528
LCM model with panel has 302 groups
Variable number of obs./group =PAN2
Number of obs.= 2803, skipped 0 obs
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
CHOICE| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Utility parameters in latent class -->> 1
ASC_A|1| .70351*** .24393 2.88 .0039 .22542 1.18160
TR2|1| .03834 .07570 .51 .6125 -.11003 .18671
TR3|1| -.45627*** .07965 -5.73 .0000 -.61239 -.30016
TY2|1| .47226*** .12344 3.83 .0001 .23033 .71420
TY3|1| .88861*** .13413 6.62 .0000 .62571 1.15150
M2|1| 1.52117*** .15935 9.55 .0000 1.20885 1.83348
M3|1| 2.00652*** .18045 11.12 .0000 1.65283 2.36020
HE2|1| .33145*** .07495 4.42 .0000 .18456 .47835
HE3|1| .65084*** .08541 7.62 .0000 .48345 .81824
ASC_B|1| .49572** .24586 2.02 .0438 .01384 .97759
ASC_C|1| .08365 .28282 .30 .7674 -.47066 .63796
|Utility parameters in latent class -->> 2
ASC_A|2| -.16553 .25222 -.66 .5116 -.65987 .32880
TR2|2| -.11543 .11546 -1.00 .3174 -.34172 .11086
TR3|2| -.68659*** .12960 -5.30 .0000 -.94061 -.43257
TY2|2| -.24102 .14812 -1.63 .1037 -.53133 .04928
TY3|2| -.16998 .15433 -1.10 .2707 -.47246 .13249
M2|2| -.05793 .16353 -.35 .7232 -.37844 .26259
M3|2| -.28550 .18659 -1.53 .1260 -.65121 .08021
HE2|2| -.06837 .11835 -.58 .5635 -.30033 .16359
HE3|2| -.15197 .12901 -1.18 .2388 -.40483 .10090
ASC_B|2| -.19388 .25305 -.77 .4436 -.68985 .30208
ASC_C|2| .38469*** .13191 2.92 .0035 .12615 .64322
|Estimated latent class probabilities
PrbCls1| 0.0 .4136D-08 .00 1.0000 -.81071D-08 .81071D-08
PrbCls2| 1.00000*** .4136D-08 ******** .0000 1.00000 1.00000
--------+--------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
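For comparing the two models, the fit statistics in both output headers can be reproduced from the reported log likelihoods. The following is a minimal Python sketch, not NLOGIT code; the function name fit_stats is hypothetical, and the restricted log likelihood is assumed to be N*ln(1/5) (equal shares over the five listed alternatives), which matches the reported -4511.25447 numerically.

```python
import math

def fit_stats(ll, ll0, k, n):
    """Standard fit measures from a log likelihood (ll), a restricted
    log likelihood (ll0), k parameters and n observations."""
    aic = -2.0 * ll + 2.0 * k
    return {
        "aic": aic,
        "aic_per_obs": aic / n,
        "chi2": 2.0 * (ll - ll0),        # likelihood ratio statistic
        "pseudo_r2": 1.0 - ll / ll0,     # McFadden pseudo R-squared
    }

N = 2803
# Assumption: equal choice shares over 5 alternatives as the restricted model.
LL0 = N * math.log(1.0 / 5.0)

rpl = fit_stats(-2435.64402, LL0, k=19, n=N)   # values from the RPL output
lcm = fit_stats(-2399.17591, LL0, k=23, n=N)   # values from the LCM output

for name, stats in (("RPL", rpl), ("LCM", lcm)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

On these numbers the LCM has the lower AIC (about 4844.4 against 4909.3), even though its estimated class probabilities look degenerate.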
When I ran the model with ;pts=3, the pattern was the same: one class has probability 1.0 and the other classes 0.0, as shown below.
|Estimated latent class probabilities
PrbCls1| 0.0 .1464D-07 .00 1.0000 -.28690D-07 .28690D-07
PrbCls2| 1.00000*** .2872D-05 ******** .0000 .99999 1.00001
PrbCls3| 0.0 .2872D-05 .00 1.0000 -.56295D-05 .56298D-05
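To make the class probability output easier to read, here is a minimal Python sketch of how latent class shares are commonly parameterised: a multinomial logit over class constants, with the last class normalised to zero. Whether NLOGIT uses exactly this form is an assumption on my part, and the function name class_shares and the constants passed to it are illustrative values, not estimates from my model.

```python
import math

def class_shares(thetas):
    """Prior class shares as a softmax over class constants.
    The last class is the base, with its constant fixed at 0."""
    full = list(thetas) + [0.0]
    top = max(full)                       # subtract the max for numerical stability
    exps = [math.exp(t - top) for t in full]
    total = sum(exps)
    return [e / total for e in exps]

# A small constant gives interior shares, close to the .472 / .528
# printed under "Average Class Probabilities".
balanced = class_shares([-0.112])

# A constant that drifts to a large negative value pushes one share to
# numerically zero, which is what PrbCls1 = 0.0 would look like.
degenerate = class_shares([-25.0])

print([round(p, 3) for p in balanced])
print(degenerate)
```

Under that assumption, a share of exactly 0.0 or 1.0 with a standard error near 1e-08 would mean the class constant has run off toward a boundary rather than settling at an interior value.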
My questions are:
1. Why does the LCM indicate that the probability of participants belonging to Class 2 is 100%, when the MMNL model indicates there is significant preference heterogeneity for attribute levels among users? Is there a fault in my LCM code, or could this result be plausible?
2. If certain demographic variables such as age and sex turn out to be statistically non-significant in all classes, do you recommend removing those variables and re-running the model, or keeping all socio-demographic variables in the model irrespective of their significance?
Thank you so much for your time.
Kind regards,
Sumudu