期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A hierarchical Bayesian model for binary data incorporating selection bias

Seongmi Choi Balgobin Nandram 《统计学通讯:模拟与计算》2017,46(6):4767-4782

We consider a Bayesian nonignorable model to accommodate a nonignorable selection mechanism for predicting small area proportions. Our main objective is to extend a model on selection bias in a previously published paper, coauthored by four authors, to accommodate small areas. These authors assume that the survey weights (or their reciprocals that we also call selection probabilities) are available, but there is no simple relation between the binary responses and the selection probabilities. To capture the nonignorable selection bias within each area, they assume that the binary responses and the selection probabilities are correlated. To accommodate the small areas, we extend their model to a hierarchical Bayesian nonignorable model and we use Markov chain Monte Carlo methods to fit it. We illustrate our methodology using a numerical example obtained from data on activity limitation in the U.S. National Health Interview Survey. We also perform a simulation study to assess the effect of the correlation between the binary responses and the selection probabilities. 相似文献

2.

Likelihood analysis for a class of beta mixed models

Wagner Hugo Bonat Paulo Justiniano Ribeiro Jr Walmes Marques Zeviani 《Journal of applied statistics》2015,42(2):252-266

Beta regression is a suitable choice for modelling continuous response variables taking values on the unit interval. Data structures such as hierarchical, repeated measures and longitudinal typically induce extra variability and/or dependence and can be accounted for by the inclusion of random effects. In this sense, Statistical inference typically requires numerical methods, possibly combined with sampling algorithms. A class of Beta mixed models is adopted for the analysis of two real problems with grouped data structures. We focus on likelihood inference and describe the implemented algorithms. The first is a study on the life quality index of industry workers with data collected according to an hierarchical sampling scheme. The second is a study assessing the impact of hydroelectric power plants upon measures of water quality indexes up, downstream and at the reservoirs of the dammed rivers, with a nested and longitudinal data structure. Results from different algorithms are reported for comparison including from data-cloning, an alternative to numerical approximations which also allows assessing identifiability. Confidence intervals based on profiled likelihoods are compared with those obtained by asymptotic quadratic approximations, showing relevant differences for parameters related to the random effects. In both cases, the scientific hypothesis of interest was investigated by comparing alternative models, leading to relevant interpretations of the results within each context. 相似文献

3.

On the Correlation Structure of Gaussian Copula Models for Geostatistical Count Data

下载免费PDF全文

Zifei Han Victor De Oliveira 《Australian & New Zealand Journal of Statistics》2016,58(1):47-69

We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero‐inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrate some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, specially for geostatistical count data that display substantial correlation and small overdispersion. 相似文献

4.

Longitudinal Poisson modeling: an application for CD4 counting in HIV-infected patients

Emílio Augusto Coelho-Barros Jorge Alberto Achcar Josmar Mazucheli 《Journal of applied statistics》2010,37(5):865-880

In this paper, we present different “frailty” models to analyze longitudinal data in the presence of covariates. These models incorporate the extra-Poisson variability and the possible correlation among the repeated counting data for each individual. Assuming a CD4 counting data set in HIV-infected patients, we develop a hierarchical Bayesian analysis considering the different proposed models and using Markov Chain Monte Carlo methods. We also discuss some Bayesian discrimination aspects for the choice of the best model. 相似文献

5.

High dimensional variable selection with clustered data: an application of random multivariate survival forests for detection of outlier medical device components

Guy Cafri Peter Calhoun Juanjuan Fan 《Journal of Statistical Computation and Simulation》2019,89(8):1410-1422

In many medical studies patients are nested or clustered within doctor. With many explanatory variables, variable selection with clustered data can be challenging. We propose a method for variable selection based on random forest that addresses clustered data through stratified binary splits. Our motivating example involves the detection orthopedic device components from a large pool of candidates, where each patient belongs to a surgeon. Simulations compare the performance of survival forests grown using the stratified logrank statistic to conventional and robust logrank statistics, as well as a method to select variables using a threshold value based on a variable's empirical null distribution. The stratified logrank test performs superior to conventional and robust methods when data are generated to have cluster-specific effects, and when cluster sizes are sufficiently large, perform comparably to the splitting alternatives in the absence of cluster-specific effects. Thresholding was effective at distinguishing between important and unimportant variables. 相似文献

6.

Bayesian computation for Log-Gaussian Cox processes: a comparative analysis of methods 总被引：1，自引：0，他引：1

Ming Teng Farouk Nathoo 《Journal of Statistical Computation and Simulation》2017,87(11):2227-2252

The Log-Gaussian Cox process is a commonly used model for the analysis of spatial point pattern data. Fitting this model is difficult because of its doubly stochastic property, that is, it is a hierarchical combination of a Poisson process at the first level and a Gaussian process at the second level. Various methods have been proposed to estimate such a process, including traditional likelihood-based approaches as well as Bayesian methods. We focus here on Bayesian methods and several approaches that have been considered for model fitting within this framework, including Hamiltonian Monte Carlo, the Integrated nested Laplace approximation, and Variational Bayes. We consider these approaches and make comparisons with respect to statistical and computational efficiency. These comparisons are made through several simulation studies as well as through two applications, the first examining ecological data and the second involving neuroimaging data. 相似文献

7.

Estimation in large cross random-effect models by data augmentation

D. Clayton & J. Rasbash 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1999,162(3):425-436

Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting of a linear mixed model within steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program. 相似文献

8.

Marginal Correlation in Longitudinal Binary Data Based on Generalized Linear Mixed Models

Tony Vangeneugden Geert Molenberghs Annouschka Laenen Helena Geys Caroline Beunckens Cristina Sotto 《统计学通讯:理论与方法》2013,42(19):3540-3557

This work aims at investigating marginal correlation within and between longitudinal data sequences. Useful and intuitive approximate expressions are derived based on generalized linear mixed models. Data from four double-blind randomized clinical trials are used to estimate the intra-class coefficient of reliability for a binary response. Additionally, the correlation between such a binary response and a continuous response is derived to evaluate the criterion validity of the binary response variable and the established continuous response variable. 相似文献

9.

Non-concave penalization in linear mixed-effect models and regularized selection of fixed effects

Abhik Ghosh Magne Thoresen 《AStA Advances in Statistical Analysis》2018,102(2):179-210

Mixed-effect models are very popular for analyzing data with a hierarchical structure. In medical applications, typical examples include repeated observations within subjects in a longitudinal design, patients nested within centers in a multicenter design. However, recently, due to the medical advances, the number of fixed-effect covariates collected from each patient can be quite large, e.g., data on gene expressions of each patient, and all of these variables are not necessarily important for the outcome. So, it is very important to choose the relevant covariates correctly for obtaining the optimal inference for the overall study. On the other hand, the relevant random effects will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effect models along with maximum penalized likelihood estimation of both fixed and random-effect parameters based on general non-concave penalties. Asymptotic and variable selection consistency with oracle properties are proved for low-dimensional cases as well as for high dimensionality of non-polynomial order of sample size (number of parameters is much larger than sample size). We also provide a suitable computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed model setup (like finite mixture of regressions) illustrating the huge range of applicability of our proposal. 相似文献

10.

Estimation of a large cross-classified multilevel model to study academic achievement in a modular degree course

V. Simonite W. J. Browne 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2003,166(1):119-133

相似文献

11.

Multistage sampling for latent variable models

Thomas DC 《Lifetime data analysis》2007,13(4):565-581

I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, with surrogate measurements of the latent variables on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In such situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. Three circumstances where analytic calculations of the optimal design are possible are considered: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed, but the outcome is binary. In each of these cases, it is often possible to considerably improve the cost efficiency of the design by appropriate selection of the sampling fractions. More complex situations arise when the data are spatially distributed: the spatial correlation can be exploited to improve exposure assignment for unmeasured locations using available measurements on neighboring locations; some approaches for informative selection of the measurement sample using location and/or exposure predictor data are considered. 相似文献

12.

Small area estimation via heteroscedastic nested‐error regression

Jiming Jiang Thuan Nguyen 《Revue canadienne de statistique》2012,40(3):588-603

We show that the maximum likelihood estimators (MLEs) of the fixed effects and within‐cluster correlation are consistent in a heteroscedastic nested‐error regression (HNER) model with completely unknown within‐cluster variances under mild conditions. The result implies that the empirical best linear unbiased prediction (EBLUP) method for small area estimation is valid in such a case. We also show that ignoring the heteroscedasticity can lead to inconsistent estimation of the within‐cluster correlation and inferior predictive performance. A jackknife measure of uncertainty for the EBLUP is developed under the HNER model. Simulation studies are carried out to investigate the finite‐sample performance of the EBLUP and MLE under the HNER model, with comparisons to those under the nested‐error regression model in various situations, as well as that of the jackknife measure of uncertainty. The well‐known Iowa crops data is used for illustration. The Canadian Journal of Statistics 40: 588–603; 2012 © 2012 Statistical Society of Canada 相似文献

13.

Structural Nested Models and Standard Software: A Mathematical Foundation through Partial Likelihood

JUDITH J. LOK 《Scandinavian Journal of Statistics》2007,34(1):186-206

Abstract. In observational studies treatment may be adapted to the patient's state during the course of time. These covariates may in turn also react on the treatment under study, and so on. This makes it hard to distinguish between treatment effect and selection bias. Structural nested models aim at estimating treatment effect in such complicated situations, even when treatment may change at any time. We show that structural nested models can often be calculated with standard software, by using standard models to predict treatment as a tool to estimate treatment effect. Robins ( Survival analysis, Volume 6 of Encyclopedia of Biostatistics , John Wiley and Sons, Chichester, 1998) conjectured this, but so far it was unproven. We use a partial likelihood approach to choose the estimators and tests as a subclass of the estimators and tests in Lok (math. ST/0410271 at http://arXiv.org , 2004). We show that this is the class of estimators and tests that can be calculated with standard software. The estimators are consistent and asymptotically normal, and have interesting asymptotic properties. 相似文献

14.

Parameter identifiability in a class of random graph mixture models

Elizabeth S. Allman Catherine Matias John A. Rhodes 《Journal of statistical planning and inference》2011,141(5):1719-1736

We prove identifiability of parameters for a broad class of random graph mixture models. These models are characterized by a partition of the set of graph nodes into latent (unobservable) groups. The connectivities between nodes are independent random variables when conditioned on the groups of the nodes being connected. In the binary random graph case, in which edges are either present or absent, these models are known as stochastic blockmodels and have been widely used in the social sciences and, more recently, in biology. Their generalizations to weighted random graphs, either in parametric or non-parametric form, are also of interest. Despite these many applications, the parameter identifiability issue for such models has only been touched upon in the literature. We give here a thorough investigation of this problem. Our work also has consequences for parameter estimation. In particular, the estimation procedure proposed by Frank and Harary for binary affiliation models is revisited in this article. 相似文献

15.

Extended Poisson process modelling of dilution series data

M. J. Faddy D. M. Smith 《Journal of the Royal Statistical Society. Series C, Applied statistics》2008,57(4):461-471

Summary. Data comprising colony counts, or a binary variable representing fertile (or sterile) samples, as a dilution series of the containing medium are analysed by using extended Poisson process modelling. These models form a class of flexible probability distributions that are widely applicable to count and grouped binary data. Standard distributions such as Poisson and binomial, and those representing overdispersion and underdispersion relative to these distributions can be expressed within this class. For all the models in the class, likelihoods can be obtained. These models have not been widely used because of the perceived difficulty of performing the calculations and the lack of associated software. Exact calculation of the probabilities that are involved can be time consuming although accurate approximations that use considerably less computational time are available. Although dilution series data are the focus here, the models are applicable to any count or binary data. A benefit of the approach is the ability to draw likelihood-based inferences from the data. 相似文献

16.

Parallel Markov chain Monte Carlo for Bayesian hierarchical models with big data,in two stages

Zheng Wei 《Journal of applied statistics》2019,46(11):1917-1936

Due to the escalating growth of big data sets in recent years, new Bayesian Markov chain Monte Carlo (MCMC) parallel computing methods have been developed. These methods partition large data sets by observations into subsets. However, for Bayesian nested hierarchical models, typically only a few parameters are common for the full data set, with most parameters being group specific. Thus, parallel Bayesian MCMC methods that take into account the structure of the model and split the full data set by groups rather than by observations are a more natural approach for analysis. Here, we adapt and extend a recently introduced two-stage Bayesian hierarchical modeling approach, and we partition complete data sets by groups. In stage 1, the group-specific parameters are estimated independently in parallel. The stage 1 posteriors are used as proposal distributions in stage 2, where the target distribution is the full model. Using three-level and four-level models, we show in both simulation and real data studies that results of our method agree closely with the full data analysis, with greatly increased MCMC efficiency and greatly reduced computation times. The advantages of our method versus existing parallel MCMC computing methods are also described. 相似文献

17.

Improved estimation procedures for multilevel models with binary response: a case-study 总被引：2，自引：1，他引：1

Germán Rodríguez & Noreen Goldman 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2001,164(2):339-355

During recent years, analysts have been relying on approximate methods of inference to estimate multilevel models for binary or count data. In an earlier study of random-intercept models for binary outcomes we used simulated data to demonstrate that one such approximation, known as marginal quasi-likelihood, leads to a substantial attenuation bias in the estimates of both fixed and random effects whenever the random effects are non-trivial. In this paper, we fit three-level random-intercept models to actual data for two binary outcomes, to assess whether refined approximation procedures, namely penalized quasi-likelihood and second-order improvements to marginal and penalized quasi-likelihood, also underestimate the underlying parameters. The extent of the bias is assessed by two standards of comparison: exact maximum likelihood estimates, based on a Gauss–Hermite numerical quadrature procedure, and a set of Bayesian estimates, obtained from Gibbs sampling with diffuse priors. We also examine the effectiveness of a parametric bootstrap procedure for reducing the bias. The results indicate that second-order penalized quasi-likelihood estimates provide a considerable improvement over the other approximations, but all the methods of approximate inference result in a substantial underestimation of the fixed and random effects when the random effects are sizable. We also find that the parametric bootstrap method can eliminate the bias but is computationally very intensive. 相似文献

18.

Bayesian inference for categorical data analysis 总被引：1，自引：0，他引：1

Alan Agresti David B. Hitchcock 《Statistical Methods and Applications》2005,14(3):297-330

This article surveys Bayesian methods for categorical data analysis, with primary emphasis on contingency table analysis. Early innovations were proposed by Good (1953, 1956, 1965) for smoothing proportions in contingency tables and by Lindley (1964) for inference about odds ratios. These approaches primarily used conjugate beta and Dirichlet priors. Altham (1969, 1971) presented Bayesian analogs of small-sample frequentist tests for 2 x 2 tables using such priors. An alternative approach using normal priors for logits received considerable attention in the 1970s by Leonard and others (e.g., Leonard 1972). Adopted usually in a hierarchical form, the logit-normal approach allows greater flexibility and scope for generalization. The 1970s also saw considerable interest in loglinear modeling. The advent of modern computational methods since the mid-1980s has led to a growing literature on fully Bayesian analyses with models for categorical data, with main emphasis on generalized linear models such as logistic regression for binary and multi-category response variables. 相似文献

19.

On Rao score and PearsonX 2 statistics in generalized linear models

Gianfranco Lovison 《Statistical Papers》2005,46(4):555-574

The identity of the Rao score and PearsonX ² statistics is well known in the areas where the latter was first introduced: goodness-of-fit in contingency tables and binary responses. We show in this paper that the same identity holds when the two statistics are used for testing goodness-of-fit of Generalized Linear Models. We also highlight the connections that exist between the two statistics when they are used for the comparison of nested models. Finally, we discuss some merits of these unifying results. Work financially supported by cofin. MIUR grants 2000 and 2002. 相似文献

20.

Comparative GMM and GQL logistic regression models on hierarchical data

Bei Wang Jeffrey R. Wilson 《Journal of applied statistics》2018,45(3):409-425

We often rely on the likelihood to obtain estimates of regression parameters but it is not readily available for generalized linear mixed models (GLMMs). Inferences for the regression coefficients and the covariance parameters are key in these models. We presented alternative approaches for analyzing binary data from a hierarchical structure that do not rely on any distributional assumptions: a generalized quasi-likelihood (GQL) approach and a generalized method of moments (GMM) approach. These are alternative approaches to the typical maximum-likelihood approximation approach in Statistical Analysis System (SAS) such as Laplace approximation (LAP). We examined and compared the performance of GQL and GMM approaches with multiple random effects to the LAP approach as used in PROC GLIMMIX, SAS. The GQL approach tends to produce unbiased estimates, whereas the LAP approach can lead to highly biased estimates for certain scenarios. The GQL approach produces more accurate estimates on both the regression coefficients and the covariance parameters with smaller standard errors as compared to the GMM approach. We found that both GQL and GMM approaches are less likely to result in non-convergence as opposed to the LAP approach. A simulation study was conducted and a numerical example was presented for illustrative purposes. 相似文献