期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Modeling proportions and marginal counts simultaneously for clustered multinomial data with random cluster sizes

Guohua Yan Renjun Ma 《Journal of applied statistics》2016,43(6):1074-1087

Clustered multinomial data with random cluster sizes commonly appear in health, environmental and ecological studies. Traditional approaches for analyzing clustered multinomial data contemplate two assumptions. One of these assumptions is that cluster sizes are fixed, whereas the other demands cluster sizes to be positive. Randomness of the cluster sizes may be the determinant of the within-cluster correlation and between-cluster variation. We propose a baseline-category mixed model for clustered multinomial data with random cluster sizes based on Poisson mixed models. Our orthodox best linear unbiased predictor approach to this model depends only on the moment structure of unobserved distribution-free random effects. Our approach also consolidates the marginal and conditional modeling interpretations. Unlike the traditional methods, our approach can accommodate both random and zero cluster sizes. Two real-life multinomial data examples, crime data and food contamination data, are used to manifest our proposed methodology. 相似文献

2.

A Comparative Study of Approaches for Predicting Prostate Cancer from Longitudinal Data

Christopher H. Morrell Shan L. Sheng Larry J. Brant 《统计学通讯:模拟与计算》2013,42(9):1494-1513

Disease prediction based on longitudinal data can be done using various modeling approaches. Alternative approaches are compared using data from a longitudinal study to predict the onset of disease. The data are modeled using linear mixed-effects models. Posterior probabilities of group membership are computed starting with the first observation and sequentially adding observations until the subject is classified as developing the disease or until the last measurement is used. Individuals are classified by computing posterior probabilities using the marginal distributions of the mixed-effects models, the conditional distributions (conditional on the group-specific random effects), and the distributions of the random effects. 相似文献

3.

Estimation and inference of the joint conditional distribution for multivariate longitudinal data using nonparametric copulas

Minjung Kwak 《Journal of nonparametric statistics》2017,29(3):491-514

In this paper we study estimating the joint conditional distributions of multivariate longitudinal outcomes using regression models and copulas. For the estimation of marginal models, we consider a class of time-varying transformation models and combine the two marginal models using nonparametric empirical copulas. Our models and estimation method can be applied in many situations where the conditional mean-based models are not good enough. Empirical copulas combined with time-varying transformation models may allow quite flexible modelling for the joint conditional distributions for multivariate longitudinal data. We derive the asymptotic properties for the copula-based estimators of the joint conditional distribution functions. For illustration we apply our estimation method to an epidemiological study of childhood growth and blood pressure. 相似文献

4.

A new variational Bayesian algorithm with application to human mobility pattern modeling

Burton Wu Clare A. McGrory Anthony N. Pettitt 《Statistics and Computing》2012,22(1):185-203

A new variational Bayesian (VB) algorithm, split and eliminate VB (SEVB), for modeling data via a Gaussian mixture model (GMM) is developed. This new algorithm makes use of component splitting in a way that is more appropriate for analyzing a large number of highly heterogeneous spiky spatial patterns with weak prior information than existing VB-based approaches. SEVB is a highly computationally efficient approach to Bayesian inference and like any VB-based algorithm it can perform model selection and parameter value estimation simultaneously. A significant feature of our algorithm is that the fitted number of components is not limited by the initial proposal giving increased modeling flexibility. We introduce two types of split operation in addition to proposing a new goodness-of-fit measure for evaluating mixture models. We evaluate their usefulness through empirical studies. In addition, we illustrate the utility of our new approach in an application on modeling human mobility patterns. This application involves large volumes of highly heterogeneous spiky data; it is difficult to model this type of data well using the standard VB approach as it is too restrictive and lacking in the flexibility required. Empirical results suggest that our algorithm has also improved upon the goodness-of-fit that would have been achieved using the standard VB method, and that it is also more robust to various initialization settings. 相似文献

5.

The Performance of Two Data-Generation Processes for Data with Specified Marginal Treatment Odds Ratios

Peter C. Austin James Stafford 《统计学通讯:模拟与计算》2013,42(6):1039-1051

Monte Carlo simulation methods are increasingly being used to evaluate the property of statistical estimators in a variety of settings. The utility of these methods depends upon the existence of an appropriate data-generating process. Observational studies are increasingly being used to estimate the effects of exposures and interventions on outcomes. Conventional regression models allow for the estimation of conditional or adjusted estimates of treatment effects. There is an increasing interest in statistical methods for estimating marginal or average treatment effects. However, in many settings, conditional treatment effects can differ from marginal treatment effects. Therefore, existing data-generating processes for conditional treatment effects are of little use in assessing the performance of methods for estimating marginal treatment effects. In the current study, we describe and evaluate the performance of two different data-generation processes for generating data with a specified marginal odds ratio. The first process is based upon computing Taylor Series expansions of the probabilities of success for treated and untreated subjects. The expansions are then integrated over the distribution of the random variables to determine the marginal probabilities of success for treated and untreated subjects. The second process is based upon an iterative process of evaluating marginal odds ratios using Monte Carlo integration. The second method was found to be computationally simpler and to have superior performance compared to the first method. 相似文献

6.

An approximate maximum likelihood procedure for parameter estimation in multivariate discrete data regression models

Andrew W. Roddam 《Journal of applied statistics》2001,28(2):273-279

This paper considers an alternative to iterative procedures used to calculate maximum likelihood estimates of regression coefficients in a general class of discrete data regression models. These models can include both marginal and conditional models and also local regression models. The classical estimation procedure is generally via a Fisher-scoring algorithm and can be computationally intensive for high-dimensional problems. The alternative method proposed here is non-iterative and is likely to be more efficient in high-dimensional problems. The method is demonstrated on two different classes of regression models. 相似文献

7.

Non-parametric Maximum-Likelihood Estimation in a Semiparametric Mixture Model for Competing-Risks Data 总被引：1，自引：0，他引：1

I-SHOU CHANG CHAO A. HSIUNG CHI-CHUNG WEN YUH-JENN WU CHE-CHI YANG 《Scandinavian Journal of Statistics》2007,34(4):870-895

Abstract. This paper describes our studies on non-parametric maximum-likelihood estimators in a semiparametric mixture model for competing-risks data, in which proportional hazards models are specified for failure time models conditional on cause and a multinomial model is specified for the marginal distribution of cause conditional on covariates. We provide a verifiable identifiability condition and, based on it, establish an asymptotic profile likelihood theory for this model. We also provide efficient algorithms for the computation of the non-parametric maximum-likelihood estimate and its asymptotic variance. The success of this method is demonstrated in simulation studies and in the analysis of Taiwan severe acute respiratory syndrome data. 相似文献

8.

Dependence Estimation for Marginal Models of Multivariate Survival Data

Segal Mark R. Neuhaus John M. James Ian R. 《Lifetime data analysis》1997,3(3):251-268

We have previously(Segal and Neuhaus, 1993) devised methods for obtaining marginal regression coefficients and associated variance estimates for multivariate survival data, using a synthesis of the Poisson regression formulation for univariate censored survival analysis and generalized estimating equations (GEE's). The method is parametric in that a baseline survival distribution is specified. Analogous semiparametric models, with unspecified baseline survival, have also been developed (Wei, Lin and Weissfeld, 1989; Lin, 1994).Common to both these approaches is the provision of robust variances for the regression parameters. However, none of this work has addressed the more difficult area of dependence estimation. While GEE approaches ostensibly provide such estimates, we show that there are problems adopting these with multivariate survival data. Further, we demonstrate that these problems can affect estimation of the regression coefficients themselves. An alternate, ad hoc approach to dependence estimation, based on design effects, is proposed and evaluated via simulation and illustrative examples. This revised version was published online in July 2006 with corrections to the Cover Date. 相似文献

9.

Regression analysis of longitudinal data with outcome‐dependent sampling and informative censoring

Weining Shen Suyu Liu Yong Chen Jing Ning 《Scandinavian Journal of Statistics》2019,46(3):831-847

We consider a regression analysis of longitudinal data in the presence of outcome‐dependent observation times and informative censoring. Existing approaches commonly require a correct specification of the joint distribution of longitudinal measurements, the observation time process, and informative censoring time under the joint modeling framework and can be computationally cumbersome due to the complex form of the likelihood function. In view of these issues, we propose a semiparametric joint regression model and construct a composite likelihood function based on a conditional order statistics argument. As a major feature of our proposed methods, the aforementioned joint distribution is not required to be specified, and the random effect in the proposed joint model is treated as a nuisance parameter. Consequently, the derived composite likelihood bypasses the need to integrate over the random effect and offers the advantage of easy computation. We show that the resulting estimators are consistent and asymptotically normal. We use simulation studies to evaluate the finite‐sample performance of the proposed method and apply it to a study of weight loss data that motivated our investigation. 相似文献

10.

Some asymptotic results for semiparametric nonlinear mixed-effects models with incomplete data

Wei Liu Lang Wu 《Journal of statistical planning and inference》2010,140(1):52-64

In modeling complex longitudinal data, semiparametric nonlinear mixed-effects (SNLME) models are very flexible and useful. Covariates are often introduced in the models to partially explain the inter-individual variations. In practice, data are often incomplete in the sense that there are often measurement errors and missing data in longitudinal studies. The likelihood method is a standard approach for inference for these models but it can be computationally very challenging, so computationally efficient approximate methods are quite valuable. However, the performance of these approximate methods is often based on limited simulation studies, and theoretical results are unavailable for many approximate methods. In this article, we consider a computationally efficient approximate method for a class of SNLME models with incomplete data and investigate its theoretical properties. We show that the estimates based on the approximate method are consistent and asymptotically normally distributed. 相似文献

11.

A generalized bivariate Bernoulli model with covariate dependence

M. Ataharul Islam Abdulhamid A. Alzaid Rafiqul I. Chowdhury Khalaf S. Sultan 《Journal of applied statistics》2013,40(5):1064-1075

Dependence in outcome variables may pose formidable difficulty in analyzing data in longitudinal studies. In the past, most of the studies made attempts to address this problem using the marginal models. However, using the marginal models alone, it is difficult to specify the measures of dependence in outcomes due to association between outcomes as well as between outcomes and explanatory variables. In this paper, a generalized approach is demonstrated using both the conditional and marginal models. This model uses link functions to test for dependence in outcome variables. The estimation and test procedures are illustrated with an application to the mobility index data from the Health and Retirement Survey and also simulations are performed for correlated binary data generated from the bivariate Bernoulli distributions. The results indicate the usefulness of the proposed method. 相似文献

12.

Simultaneous inference in structured additive conditional copula regression models: a unifying Bayesian approach

Nadja Klein Thomas Kneib 《Statistics and Computing》2016,26(4):841-860

While most regression models focus on explaining distributional aspects of one single response variable alone, interest in modern statistical applications has recently shifted towards simultaneously studying multiple response variables as well as their dependence structure. A particularly useful tool for pursuing such an analysis are copula-based regression models since they enable the separation of the marginal response distributions and the dependence structure summarised in a specific copula model. However, so far copula-based regression models have mostly been relying on two-step approaches where the marginal distributions are determined first whereas the copula structure is studied in a second step after plugging in the estimated marginal distributions. Moreover, the parameters of the copula are mostly treated as a constant not related to covariates and most regression specifications for the marginals are restricted to purely linear predictors. We therefore propose simultaneous Bayesian inference for both the marginal distributions and the copula using computationally efficient Markov chain Monte Carlo simulation techniques. In addition, we replace the commonly used linear predictor by a generic structured additive predictor comprising for example nonlinear effects of continuous covariates, spatial effects or random effects and furthermore allow to make the copula parameters covariate-dependent. To facilitate Bayesian inference, we construct proposal densities for a Metropolis–Hastings algorithm relying on quadratic approximations to the full conditionals of regression coefficients avoiding manual tuning. The performance of the resulting Bayesian estimates is evaluated in simulations comparing our approach with penalised likelihood inference, studying the choice of a specific copula model based on the deviance information criterion, and comparing a simultaneous approach with a two-step procedure. Furthermore, the flexibility of Bayesian conditional copula regression models is illustrated in two applications on childhood undernutrition and macroecology. 相似文献

13.

AN APPROACH FOR JOINTLY MODELING MULTIVARIATE LONGITUDINAL MEASUREMENTS AND DISCRETE TIME-TO-EVENT DATA

Albert PS Shih JH 《The annals of applied statistics》2010,4(3):1517-1532

In many medical studies, patients are followed longitudinally and interest is on assessing the relationship between longitudinal measurements and time to an event. Recently, various authors have proposed joint modeling approaches for longitudinal and time-to-event data for a single longitudinal variable. These joint modeling approaches become intractable with even a few longitudinal variables. In this paper we propose a regression calibration approach for jointly modeling multiple longitudinal measurements and discrete time-to-event data. Ideally, a two-stage modeling approach could be applied in which the multiple longitudinal measurements are modeled in the first stage and the longitudinal model is related to the time-to-event data in the second stage. Biased parameter estimation due to informative dropout makes this direct two-stage modeling approach problematic. We propose a regression calibration approach which appropriately accounts for informative dropout. We approximate the conditional distribution of the multiple longitudinal measurements given the event time by modeling all pairwise combinations of the longitudinal measurements using a bivariate linear mixed model which conditions on the event time. Complete data are then simulated based on estimates from these pairwise conditional models, and regression calibration is used to estimate the relationship between longitudinal data and time-to-event data using the complete data. We show that this approach performs well in estimating the relationship between multivariate longitudinal measurements and the time-to-event data and in estimating the parameters of the multiple longitudinal process subject to informative dropout. We illustrate this methodology with simulations and with an analysis of primary biliary cirrhosis (PBC) data. 相似文献

14.

Local Box–Cox transformation on time-varying parametric models for smoothing estimation of conditional CDF with longitudinal data

Mohammed Chowdhury Colin Wu Reza Modarres 《Journal of Statistical Computation and Simulation》2017,87(15):2900-2914

Nonparametric estimation and inferences of conditional distribution functions with longitudinal data have important applications in biomedical studies, such as epidemiological studies and longitudinal clinical trials. Estimation approaches without any structural assumptions may lead to inadequate and numerically unstable estimators in practice. We propose in this paper a nonparametric approach based on time-varying parametric models for estimating the conditional distribution functions with a longitudinal sample. Our model assumes that the conditional distribution of the outcome variable at each given time point can be approximated by a parametric model after local Box–Cox transformation. Our estimation is based on a two-step smoothing method, in which we first obtain the raw estimators of the conditional distribution functions at a set of disjoint time points, and then compute the final estimators at any time by smoothing the raw estimators. Applications of our two-step estimation method have been demonstrated through a large epidemiological study of childhood growth and blood pressure. Finite sample properties of our procedures are investigated through a simulation study. Application and simulation results show that smoothing estimation from time-variant parametric models outperforms the existing kernel smoothing estimator by producing narrower pointwise bootstrap confidence band and smaller root mean squared error. 相似文献

15.

Estimating non-simplified vine copulas using penalized splines

Christian Schellhase Fabian Spanhel 《Statistics and Computing》2018,28(2):387-409

Vine copulas (or pair-copula constructions) have become an important tool for high-dimensional dependence modeling. Typically, so-called simplified vine copula models are estimated where bivariate conditional copulas are approximated by bivariate unconditional copulas. We present the first nonparametric estimator of a non-simplified vine copula that allows for varying conditional copulas using penalized hierarchical B-splines. Throughout the vine copula, we test for the simplifying assumption in each edge, establishing a data-driven non-simplified vine copula estimator. To overcome the curse of dimensionality, we approximate conditional copulas with more than one conditioning argument by a conditional copula with the first principal component as conditioning argument. An extensive simulation study is conducted, showing a substantial improvement in the out-of-sample Kullback–Leibler divergence if the null hypothesis of a simplified vine copula can be rejected. We apply our method to the famous uranium data and present a classification of an eye state data set, demonstrating the potential benefit that can be achieved when conditional copulas are modeled. 相似文献

16.

The effect of omitted covariates in marginal and partially conditional recurrent event analyses

Zhong Yujie Cook Richard J. 《Lifetime data analysis》2019,25(2):280-300

There have been many advances in statistical methodology for the analysis of recurrent event data in recent years. Multiplicative semiparametric rate-based models are widely used in clinical trials, as are more general partially conditional rate-based models involving event-based stratification. The partially conditional model provides protection against extra-Poisson variation as well as event-dependent censoring, but conditioning on outcomes post-randomization can induce confounding and compromise causal inference. The purpose of this article is to examine the consequences of model misspecification in semiparametric marginal and partially conditional rate-based analysis through omission of prognostic variables. We do so using estimating function theory and empirical studies.

相似文献

17.

Marginal likelihood estimation from the Metropolis output: tips and tricks for efficient implementation in generalized linear latent variable models

《Journal of Statistical Computation and Simulation》2012,82(10):2091-2105

The marginal likelihood can be notoriously difficult to compute, and particularly so in high-dimensional problems. Chib and Jeliazkov employed the local reversibility of the Metropolis–Hastings algorithm to construct an estimator in models where full conditional densities are not available analytically. The estimator is free of distributional assumptions and is directly linked to the simulation algorithm. However, it generally requires a sequence of reduced Markov chain Monte Carlo runs which makes the method computationally demanding especially in cases when the parameter space is large. In this article, we study the implementation of this estimator on latent variable models which embed independence of the responses to the observables given the latent variables (conditional or local independence). This property is employed in the construction of a multi-block Metropolis-within-Gibbs algorithm that allows to compute the estimator in a single run, regardless of the dimensionality of the parameter space. The counterpart one-block algorithm is also considered here, by pointing out the difference between the two approaches. The paper closes with the illustration of the estimator in simulated and real-life data sets. 相似文献

18.

Variational approximation for heteroscedastic linear models and matching pursuit algorithms

David J. Nott Minh-Ngoc Tran Chenlei Leng 《Statistics and Computing》2012,22(2):497-512

Modern statistical applications involving large data sets have focused attention on statistical methodologies which are both efficient computationally and able to deal with the screening of large numbers of different candidate models. Here we consider computationally efficient variational Bayes approaches to inference in high-dimensional heteroscedastic linear regression, where both the mean and variance are described in terms of linear functions of the predictors and where the number of predictors can be larger than the sample size. We derive a closed form variational lower bound on the log marginal likelihood useful for model selection, and propose a novel fast greedy search algorithm on the model space which makes use of one-step optimization updates to the variational lower bound in the current model for screening large numbers of candidate predictor variables for inclusion/exclusion in a computationally thrifty way. We show that the model search strategy we suggest is related to widely used orthogonal matching pursuit algorithms for model search but yields a framework for potentially extending these algorithms to more complex models. The methodology is applied in simulations and in two real examples involving prediction for food constituents using NIR technology and prediction of disease progression in diabetes. 相似文献

19.

Variance component models for longitudinal count data with baseline information: epilepsy data revisited

Marco Alfò Murray Aitkin 《Statistics and Computing》2006,16(3):231-238

Random effect models have often been used in longitudinal data analysis since they allow for association among repeated measurements due to unobserved heterogeneity. Various approaches have been proposed to extend mixed models for repeated count data to include dependence on baseline counts. Dependence between baseline counts and individual-specific random effects result in a complex form of the (conditional) likelihood. An approximate solution can be achieved ignoring this dependence, but this approach could result in biased parameter estimates and in wrong inferences. We propose a computationally feasible approach to overcome this problem, leaving the random effect distribution unspecified. In this context, we show how the EM algorithm for nonparametric maximum likelihood (NPML) can be extended to deal with dependence of repeated measures on baseline counts. 相似文献

20.

A Bayesian nonparametric Markovian model for non-stationary time series

Maria DeYoreo Athanasios Kottas 《Statistics and Computing》2017,27(6):1525-1538

Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture non-standard distributions. The model for the transition density arises from the conditional distribution implied by a Bayesian nonparametric mixture of bivariate normals. This results in a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, non-stationary Markovian model for real-valued data indexed in discrete time. To obtain a computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest that the model is able to recover challenging transition densities and non-linear dynamic relationships. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions to accommodate higher order structure and to develop a state-space model are also discussed. 相似文献