SIMULTANEOUS MODELING OF BINARY RESPONSES: A SEQUENCE OF BINARY MODELS AND ORDINAL MULTINOMIAL WITH BAYESIAN ESTIMATES IMPACTING HIV/AIDS

ABSTRACT

BACKGROUND: In this research, we examined how several binary factors impact binary outcomes simultaneously, and how the public's perception of information about HIV/AIDS is associated with HIV/AIDS outcomes.

METHODS: We used polytomous responses through a sequence of binary models and a multinomial logistic regression model with Bayesian estimates to analyze the 2009 Mozambique survey data as it pertains to blood test, heard of HIV/AIDS, and heard about campaign.

RESULTS: The analysis reveals that both having heard about HIV and having heard about the campaign are represented differentially in testing positive. Wealth, education, and perception of risk are positively associated with having heard about HIV and having heard about the campaign, regardless of HIV status. However, religion is a positive factor for social efforts of hearing of HIV/AIDS and the campaign. Both the polytomous response model and the ordinal multinomial model gave the same findings with regard to the marginal mean. However, the polytomous (conditional) models gave additional information about education.

CONCLUSIONS: While knowledge of the disease continues to be important, future social efforts to combat HIV in Mozambique may need different strategies in different subpopulation groups.

For this study, variables such as marital status, education, work, and electricity in the home are key factors and were therefore used as covariates in the models. They were coded as follows: gender (0 for females, 1 for males), education (0 for ≤ 3 years of education, 1 otherwise), marital status (0 for living alone, 1 otherwise), working (0 for not working, 1 otherwise), and electricity at home (0 for no, 1 for yes).

STATISTICAL METHODS
It is customary in any national survey to have a plethora of factors impacting several outcomes of interest, and to attempt to identify the key factors among them. However, those factors may give different results when their influence is examined on one outcome at a time rather than on several outcomes together. As such, it is important to use models that address the questions through simultaneous modeling. To do otherwise may lead to conclusions and recommendations that are not appropriate, and it denies one the opportunity to make statements about the entire system of variables. Due to the uniqueness of the data, we relied on two different simultaneous modeling approaches to identify the key factors.

POLYTOMOUS RESPONSES BY SEQUENCE OF BINARY MODELS
The uniqueness of the Mozambique survey data affords a method of modeling polytomous responses through a sequence of binary models, also known as a nested dichotomy model. The approach is attractive when the polytomous response is naturally arranged as a system of nested dichotomies. This method decomposes a multi-class problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, arranged as a tree, and then fits a model at each node or stage (Figure 1).
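The decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `nested_stages` and the record layout `(know, aware, positive)` are hypothetical, and the toy records are synthetic.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (know, aware, positive), each coded 0/1.
Record = Tuple[int, int, int]


def nested_stages(records: Iterable[Record]) -> Dict[str, List[int]]:
    """Split three binary responses into the response used at each stage."""
    stages: Dict[str, List[int]] = {"stage1": [], "stage2": [], "stage3": []}
    for know, aware, positive in records:
        stages["stage1"].append(know)  # every respondent enters stage 1
        if know == 1:
            stages["stage2"].append(aware)  # only the knowledgeable
            if aware == 1:
                stages["stage3"].append(positive)  # knowledgeable and aware
    return stages


# Synthetic toy records, not survey data.
toy = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
result = nested_stages(toy)
```

Each stage keeps only the respondents who reached that node of the tree, so the sample shrinks from one stage to the next.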

FIGURE 1 ABOUT HERE
Based on the data structure shown in Figure 1, we fitted three submodels. Model #1 is fitted at stage 1 to model the logit of being knowledgeable about HIV/AIDS, defined as:

Model #1: $$\operatorname{logit}(P_{Know}) = \log\left(\frac{P_{Know}}{\overline{P}_{Know}}\right) = \omega_1 + \beta_1 X_{Gen} + \beta_2 X_{Ele} + \beta_3 X_{Edu} + \beta_4 X_{Emp} + \beta_5 X_{Mar},$$

where $\beta_i$, for $i = 1, 2, \ldots, 5$, are the regression coefficients; $X_{Gen}$ denotes the covariate for gender, $X_{Ele}$ the covariate for electricity, $X_{Edu}$ the covariate for education, $X_{Emp}$ the covariate for employment, and $X_{Mar}$ the covariate for marriage; $P_{Know}$ denotes the probability of being knowledgeable about HIV/AIDS and $\overline{P}_{Know}$ is its complement.

Model #2 is fitted at stage 2, using data from those subjects who are knowledgeable about HIV/AIDS, to model the logit of being aware of the HIV/AIDS campaign:

Model #2: $$\operatorname{logit}(P_{Aware}) = \omega_1 + \beta_1 X_{Gen} + \beta_2 X_{Ele} + \beta_3 X_{Edu} + \beta_4 X_{Emp} + \beta_5 X_{Mar},$$

where $P_{Aware}$ denotes the probability of being aware of the campaign about HIV/AIDS and $\overline{P}_{Aware}$ is its complement.

Model #3 is fitted at stage 3, using data from those subjects who are knowledgeable about HIV/AIDS and are aware of the HIV/AIDS campaign, to model the logit of a positive blood test for HIV/AIDS.
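To make the stage-wise fitting concrete, the following Python sketch fits Model #1 on all records and Model #2 only on the knowledgeable subset. It is a simplified illustration rather than the procedure used in the paper: it uses a single covariate (education) instead of five, plain gradient ascent instead of a production estimator, and synthetic toy records. The names `fit_logit`, `toy`, `stage1`, and `stage2` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(X: Sequence[Sequence[float]], y: Sequence[int],
              lr: float = 0.5, iters: int = 5000) -> List[float]:
    """Maximum-likelihood logit fit by plain gradient ascent.

    Each row of X must already contain a leading 1.0 for the intercept.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, yi in zip(X, y):
            resid = yi - sigmoid(sum(b * x for b, x in zip(beta, row)))
            for j in range(k):
                grad[j] += resid * row[j] / n  # mean score contribution
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


# Synthetic toy records (education, know, aware); NOT the survey data.
toy = ([(0, 0, 0)] * 4 + [(0, 1, 0), (0, 1, 1)]
       + [(1, 0, 0)] * 2 + [(1, 1, 0)] + [(1, 1, 1)] * 3)

# Stage 1: all respondents, response = know.
stage1 = fit_logit([[1.0, float(e)] for e, _, _ in toy],
                   [k for _, k, _ in toy])

# Stage 2: knowledgeable respondents only, response = aware.
knowers = [r for r in toy if r[1] == 1]
stage2 = fit_logit([[1.0, float(e)] for e, _, _ in knowers],
                   [a for _, _, a in knowers])
```

With one binary covariate the fitted intercept and slope reduce to the log-odds in each education group and their difference, which gives a simple check on the fit. The same subsetting pattern would extend to stage 3 by restricting to records that are both knowledgeable and aware.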

DERIVED MULTINOMIAL
We considered the information provided by the three binary responses [blood test, awareness of campaign, and knowledge of disease] to jointly model through a ... There are 1,910 respondents in cell 000, with cell probability $P_{000}$, which denotes those who did not test ...

$$\operatorname{logit}(P_{001}) = \log\left(\frac{P_{001}}{P_{000}}\right) = \theta_1 + \beta_1 X_{Gen} + \beta_2 X_{Ele} + \beta_3 X_{Edu} + \beta_4 X_{Emp} + \beta_5 X_{Mar},$$

where $(\theta_1, \beta_1, \ldots, \beta_5)$ has a normal prior distribution. The models for $\operatorname{logit}(P_{011})$ and $\operatorname{logit}(P_{111})$ are defined similarly. We fit these cumulative multinomial models with Bayes estimates using PROC MCMC.
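As a hedged illustration of how logits of this form map back to cell probabilities, the Python sketch below inverts a set of log-ratios taken against a common baseline cell. It assumes that the logits for cells 001, 011, and 111 are all defined relative to $P_{000}$, as in the displayed equation, and that these four cells exhaust the outcomes; the function name `cell_probabilities` and the linear-predictor values are hypothetical and are not estimates from the paper.

```python
import math
from typing import Dict


def cell_probabilities(eta: Dict[str, float],
                       baseline: str = "000") -> Dict[str, float]:
    """Invert log(P_cell / P_baseline) = eta_cell into cell probabilities."""
    denom = 1.0 + sum(math.exp(v) for v in eta.values())
    probs = {baseline: 1.0 / denom}
    for cell, v in eta.items():
        probs[cell] = math.exp(v) / denom
    return probs


# Hypothetical linear predictors for one covariate pattern (illustration only).
eta = {"001": 0.0, "011": math.log(2.0), "111": math.log(0.5)}
probs = cell_probabilities(eta)
```

The resulting probabilities sum to one, and taking the log-ratio of any cell against the baseline recovers the linear predictor that was supplied.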

RESULTS
There are 8,834 Mozambique respondents in the database. Table 2

TABLE 3 ABOUT HERE
The degree of dependency among the three binary responses was also assessed, and the results are shown in Table 4. There is a strong association between knowledge of HIV/AIDS and awareness of an HIV/AIDS campaign (Φ = 0.6152, p < 0.001). When blood test and awareness of an HIV/AIDS campaign were analyzed, the correlation was weak yet statistically significant (Φ = 0.1622). The correlation between blood test and knowledge of HIV/AIDS is also statistically significant, but weak as well.
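For readers who want to see how a Φ coefficient is obtained from a 2×2 table, a minimal Python sketch follows. The function name `phi_coefficient` is hypothetical. The counts 5,110 and 1,814 are those quoted in the text for knowledgeable respondents who had and had not heard of the campaign, and 1,910 is the remainder of the 8,834 respondents; the zero count for "aware but not knowledgeable" is an assumption based on the nested data structure, not a figure reported in the paper.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for a 2x2 table of counts [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Rows: knowledgeable (yes, no); columns: aware of campaign (yes, no).
# The 0 in the (not knowledgeable, aware) cell is an ASSUMPTION.
phi_know_aware = phi_coefficient(5110, 1814, 0, 1910)
```

Under that assumption the value comes out at roughly 0.615, which is in line with the association reported for knowledge and awareness.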

POLYTOMOUS RESPONSES THROUGH SEQUENCE OF MODELS
The present data structure is unique but suitable for fitting sequential models. As such, we fit a sequence of logit models. We began with the logit of the probability of being knowledgeable about HIV/AIDS, with gender, electricity, education, marital status (single), and employment as key predictors. We found that residents who had electricity, who were educated, and who were married were more likely to be knowledgeable about HIV/AIDS (Table 5).

In Stage #2, we fit a logit of the probability of being aware of the HIV/AIDS campaign. Of those who were knowledgeable, 5,110 residents had heard about the campaign and 1,814 had not. At this stage, we fit the model only to the residents who were knowledgeable about HIV/AIDS. We found that being male, having electricity in the home, education, and marital status were key factors in being aware of the campaign (Table 5).

In Stage #3, we fit the logit of testing positive. Seven hundred and sixty-nine residents tested positive and 4,341 tested negative. At this stage, we fit the model only to the residents who were knowledgeable about HIV/AIDS and aware of the campaign. We found that, among those aware and knowledgeable, females and the employed were more likely to test positive (Table 5).

The sequence of binary models is particularly useful when one wants to investigate certain subgroups. Education is a key factor in people being knowledgeable about AIDS and being aware of the campaign.
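The link between the conditional stage probabilities and the four observed response cells can be written out explicitly. The identities below are a sketch under two assumptions: the cell digits are ordered as (blood test, awareness, knowledge), and only the cells 000, 001, 011, and 111 occur. Here $P_{Aware}$ is conditional on being knowledgeable, and $P_{Pos}$, a symbol introduced only for this illustration, is the probability of a positive test conditional on being knowledgeable and aware.

```latex
\begin{aligned}
P_{000} &= 1 - P_{Know},\\
P_{001} &= P_{Know}\,(1 - P_{Aware}),\\
P_{011} &= P_{Know}\,P_{Aware}\,(1 - P_{Pos}),\\
P_{111} &= P_{Know}\,P_{Aware}\,P_{Pos}.
\end{aligned}
```

As a check using the raw counts quoted above, ignoring covariates, the product $\frac{6924}{8834} \cdot \frac{5110}{6924} \cdot \frac{769}{5110}$ telescopes to $\frac{769}{8834}$, the overall proportion of respondents in cell 111.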

MULTINOMIAL LOGISTIC REGRESSION WITH BAYES ESTIMATES
The data structure, with empty cells and limited information for certain response categories, called for a model that is less affected by extremity in the distributions and in certain subgroups, Albert and Chib (1993) ...

The diagnostic plots (trace plot, autocorrelation plot, and kernel density plot) indicate that all posterior estimates converged. An example of those plots is given in Figure 2. The trace plot is roughly constant across the graph, which indicates that the Markov chain has stabilized. The autocorrelation plot implies good mixing.

It is imperative, when analyzing survey data, to use simultaneous modeling whenever possible. In this research, we analyzed gender, education, electricity, employment, and marital status in addressing their impact on blood test (a measure of risk), awareness, and knowledge of HIV/AIDS, simultaneously. The data structure did not follow the usual data patterns. We encountered empty cells and cells with too little information to fit the usual statistical model. The empty cells and the limited information led to a unique set of data. On one hand, it allowed a sequence of submodels. On the other hand, it led to an ordering of the cells, which was informative. We used two related but different models that looked at the responses simultaneously.

• Competing interests
There are no competing interests.

None of Dr. Dornelles, Dr. Fang, Dr. Wang, or Dr. Wilson received any funding for this research. There are no funders.