Multinomial Probit Models

Symmetric multinomial probit models

Burgette and Hahn, 2013.  "A symmetric prior for multinomial probit models." Submitted.  Manuscript available here.

Previous Bayesian multinomial probit models have required the analyst to choose a base or reference category that identifies the location of the latent utilities that are commonly used to define the model.  Previous work has shown that the choice of which category serves as the base category can impact the posterior predictions.  We describe a Bayesian multinomial probit model that requires no base category and therefore is symmetric with respect to relabeling the outcome categories.  We achieve this through a series of sum-to-zero identifying restrictions on the latent utilities and the regression parameters.  We propose an efficient marginal data augmentation Gibbs sampler to estimate the model.
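As a rough illustration of why a sum-to-zero restriction can fix the location of the latent utilities without singling out a base category, the sketch below centers a vector of utilities and checks two things: the implied choice is unchanged, and relabeling the categories commutes with the centering. This is not the sampler from the paper; the function names and the simulated utilities are hypothetical.

```python
import random

def center_sum_to_zero(utilities):
    """Subtract the mean so that the utilities sum to zero."""
    mean = sum(utilities) / len(utilities)
    return [u - mean for u in utilities]

def chosen_category(utilities):
    """Index of the largest utility, i.e. the implied choice."""
    return max(range(len(utilities)), key=lambda k: utilities[k])

random.seed(0)
raw = [random.gauss(0.0, 1.0) for _ in range(4)]  # hypothetical latent utilities
centered = center_sum_to_zero(raw)

# Centering shifts every utility by the same amount, so the choice is unchanged.
same_choice = chosen_category(raw) == chosen_category(centered)

# Relabel (permute) the categories, then center; compare with centering first
# and relabeling afterwards. The two agree, so no category plays a special role.
perm = [2, 0, 3, 1]
relabel_then_center = center_sum_to_zero([raw[j] for j in perm])
center_then_relabel = [centered[j] for j in perm]
symmetric = all(
    abs(a - b) < 1e-12
    for a, b in zip(relabel_then_center, center_then_relabel)
)
```

By contrast, subtracting the utility of a designated base category would give a representation that depends on which category is labeled as the base.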

The trace-restricted multinomial probit model

Burgette and Nordheim (2012). "The trace restriction: An alternative identification strategy for the Bayesian multinomial probit model." Journal of Business and Economic Statistics, 30(3): 404-410. 

Previously, multinomial probit (MNP) models have been identified by fixing one of the variance parameters at one. In a consumer choice dataset, we demonstrate that posterior predictions can be sensitive to the choice of which element we fix. To avoid this arbitrary choice, we propose a model that instead restricts the trace of the covariance matrix. In simulated data, we find that this results in more reliable predictions.  Further, the trace restriction can provide stronger identification, yielding more meaningful marginal posterior distributions.  The trace restriction is now the default behavior of the MNP package of Kosuke Imai and David van Dyk.
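The scale normalization described above can be sketched as follows. Fixing one variance at one gives a different rescaling depending on which diagonal element is chosen, whereas restricting the trace treats all diagonal elements alike. This is only a toy illustration of the normalization, not the model or sampler from the paper; the example matrix, the function names, and the choice of the matrix dimension as the trace target are assumptions of this sketch.

```python
def trace(matrix):
    """Sum of the diagonal elements of a square matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))

def scale(matrix, factor):
    """Multiply every element of the matrix by a scalar."""
    return [[factor * x for x in row] for row in matrix]

def normalize_by_variance(sigma, k):
    """Conventional identification: rescale so that element (k, k) equals one."""
    return scale(sigma, 1.0 / sigma[k][k])

def normalize_by_trace(sigma, target=None):
    """Trace identification: rescale so that the trace equals `target`.

    The default target (the matrix dimension) is an assumption of this sketch.
    """
    if target is None:
        target = float(len(sigma))
    return scale(sigma, target / trace(sigma))

# Hypothetical unidentified covariance matrix with unequal variances.
sigma = [[4.0, 1.0, 0.5],
         [1.0, 1.0, 0.2],
         [0.5, 0.2, 0.25]]

# Fixing a different variance at one implies a different overall scale.
traces_when_fixing_one_variance = [
    trace(normalize_by_variance(sigma, k)) for k in range(3)
]

# The trace restriction yields a single scale, with no element singled out.
by_trace = normalize_by_trace(sigma)
```

Here the three fixed-variance normalizations give three different overall scales for the same matrix, which is the arbitrariness the trace restriction is meant to avoid.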

Simulating geography to protect data confidentiality

Burgette and Reiter (2011). "Multiple-shrinkage multinomial probit models with applications to simulating geographies in public use data." Forthcoming in Bayesian Analysis.  Manuscript available here.

Public release of spatially-referenced microdata can entail significant risk that motivated intruders will be able to learn the identities of respondents who provide sensitive data. To mitigate this risk, it is standard to aggregate data over large geographic areas, which can degrade the utility of the data for legitimate researchers. As an alternative, we propose methods to produce synthetic sets of areal identifiers. Our goal is to simulate multiple sets of data that, on average, retain the statistical properties of the observed data, while protecting respondents' anonymity. We propose methods to simulate areal identifiers using a multinomial probit model. Because this results in a model that (in typical applications) will have hundreds or even thousands of response categories, we propose a sparse structure for the multinomial model. Further, we suggest a simplified, latent Potts model structure for the regression coefficients, which can help to preserve spatial relationships. We demonstrate our methods on simulated and genuine data.

Multinomial probit selection/switching model

Burgette and Nordheim (2010). "A full Gibbs sampler for the Bayesian multinomial probit switching model." Manuscript available here.

In this paper, we propose a selection or switching model with a multinomial response. Such models are useful when respondents self-select the level of a treatment, or select themselves into or out of the sample. Unlike related work, our algorithm requires only Gibbs steps. C code and an R interface are provided in the endogMNP package, available from CRAN.

Methods for missing data

Bayesian nonparametrics for differing measurement scales

Burgette and Reiter (2011).  "Nonparametric Bayesian multiple imputation for missing data due to mid-study switching of measurement methods."  To appear in the Journal of the American Statistical Association. Manuscript available here.

In an ongoing study of adverse birth outcomes, the study team switched from one analytical lab to another when measuring levels of contaminants like lead in the mothers' blood.  Inspection makes it clear that the marginal distributions of contaminant measurements are very different for the two labs.  We describe three Bayesian nonparametric approaches for flexibly combining the observations into a single lab's scale.  Through a series of simulation studies, we provide guidelines for when each method is most appropriate.  We then apply our methods to the birth data. 

Multiple imputation via regression trees

Burgette and Reiter (2010). "Multiple imputation for missing data via sequential regression trees," American Journal of Epidemiology, 170(9): 1070-1076.  Available here.

We consider flexible nonparametric models, namely sequential regression trees, for performing multiple imputation.  We find that this imputation strategy significantly outperforms MICE imputations based on main effects when the true model includes interactions and quadratic terms. We apply these methods to an epidemiological study of adverse birth outcomes.

Quantile regression

Quantile regression via latent factors

Burgette and Reiter (2011). "Modeling adverse birth outcomes via confirmatory factor quantile regression." Forthcoming in Biometrics.

We develop a Bayesian quantile regression model that assumes a confirmatory factor model structure for at least part of the design matrix.  This can be useful when a collection of the covariates measures a latent concept that we would like to include as a predictor.  For example, we include a "psychosocial health" factor in a model of low birth weights, where psychosocial health is measured indirectly via a collection of observed variables.  C code with an R interface is available in the factorQR package, which is on CRAN.

Exploratory data analysis for quantile regression

Burgette, Reiter and Miranda (2011). "Exploratory quantile regression with many covariates: An application to adverse birth outcomes." Epidemiology 22(6): 859-866.

We propose a framework for exploring high-dimensional predictor spaces for quantile regression, using lasso and elastic net penalties (both fit by boosting).  We apply these methods to a study of birth weights, where the focus is on the lower percentiles of the response distribution.  Here we have more than 600 covariates when we include the two-way interactions that are of potential interest. 
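For readers unfamiliar with quantile regression, the sketch below shows the check (pinball) loss that such models minimize, and how a small quantile level shifts the fit toward the lower part of the response distribution. It covers only an intercept-only fit; the lasso and elastic net penalties and the boosting algorithm are not shown, and the birth weights and function names are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)

def total_loss(y, prediction, tau):
    """Summed check loss of a constant prediction over all observations."""
    return sum(check_loss(yi - prediction, tau) for yi in y)

# Hypothetical birth weights in grams.
weights = [2100, 2500, 2900, 3100, 3300, 3400, 3500, 3700, 3900, 4200]

# Intercept-only quantile "regression": search the observed values for the
# constant that minimizes the total check loss at the 25th percentile.
tau = 0.25
best = min(weights, key=lambda q: total_loss(weights, q, tau))

# For comparison, the mean is what a squared-error loss would target.
mean_weight = sum(weights) / len(weights)
```

With `tau = 0.25` the minimizer is one of the lower observed weights, well below the mean, which is why a small `tau` is the natural choice when interest lies in low birth weights.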


Bayesian growth mixture modeling

Neelon, Swamy, Burgette and Miranda (2011). "A Bayesian growth mixture model to examine gestational hypertension and birth outcomes."  Statistics in Medicine 30(22): 2721-2735.

We propose a Bayesian growth mixture model to jointly examine the associations between longitudinal blood pressure trajectories, preterm birth (PTB), and low birth weight (LBW).  The model partitions women into distinct classes characterized by a longitudinal mean arterial pressure curve and joint probabilities of PTB and LBW.