1.

Modeling proportions and marginal counts simultaneously for clustered multinomial data with random cluster sizes

Guohua Yan Renjun Ma 《Journal of applied statistics》2016,43(6):1074-1087

Clustered multinomial data with random cluster sizes commonly appear in health, environmental and ecological studies. Traditional approaches for analyzing clustered multinomial data contemplate two assumptions. One of these assumptions is that cluster sizes are fixed, whereas the other demands cluster sizes to be positive. Randomness of the cluster sizes may be the determinant of the within-cluster correlation and between-cluster variation. We propose a baseline-category mixed model for clustered multinomial data with random cluster sizes based on Poisson mixed models. Our orthodox best linear unbiased predictor approach to this model depends only on the moment structure of unobserved distribution-free random effects. Our approach also consolidates the marginal and conditional modeling interpretations. Unlike the traditional methods, our approach can accommodate both random and zero cluster sizes. Two real-life multinomial data examples, crime data and food contamination data, are used to manifest our proposed methodology.

2.

A model selection criterion for clustered survival analysis with informative cluster size

Li-Chu Chien Li-Ying Chang Chung-Wei Shen 《Pharmaceutical statistics》2023,22(1):79-95

We propose a model selection criterion for correlated survival data when the cluster size is informative to the outcome. This approach, called Resampling Cluster Survival Information Criterion (RCSIC), uses the Cox proportional hazards model that is weighted with the inverse of the cluster size. The RCSIC based on the within-cluster resampling idea takes into account the possible variability of the within-cluster subsampling and the possible informativeness of cluster sizes. The RCSIC allows for easy execution of the within-cluster resampling idea without a large number of resamples of the data. In contrast with the traditional model selection method in survival analysis, the RCSIC has an additional penalization for the within-cluster subsampling variability. Our simulations show satisfactory results where the RCSIC provides a more robust power for variable selection in terms of clustered survival analysis, regardless of whether informative cluster size exists or not. Applying the RCSIC method to a periodontal disease study, we identify the tooth loss in patients associated with the risk factors, Age, Filled Tooth, Molar, Crown, Decayed Tooth, and Smoking Status, respectively.

3.

Regression analysis of clustered interval-censored failure time data with linear transformation models in the presence of informative cluster size

Hui Zhao Chenchen Ma Junlong Li 《Journal of nonparametric statistics》2018,30(3):703-715

This paper discusses regression analysis of clustered interval-censored failure time data, which often occur in medical follow-up studies among other areas. For such data, sometimes the failure time may be related to the cluster size, the number of subjects within each cluster or we have informative cluster sizes. For the problem, we present a within-cluster resampling method for the situation where the failure time of interest can be described by a class of linear transformation models. In addition to the establishment of the asymptotic properties of the proposed estimators of regression parameters, an extensive simulation study is conducted for the assessment of the finite sample properties of the proposed method and suggests that it works well in practical situations. An application to the example that motivated this study is also provided.
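The within-cluster resampling idea mentioned in the two abstracts above can be sketched in a few lines. This is only an illustration under stated assumptions, not the estimator of either paper: the function name `wcr_estimate`, the use of a plain sample mean as the per-resample estimator, and the toy data are all hypothetical. Each resample keeps exactly one randomly chosen observation per cluster, so every cluster contributes equally regardless of its size, and the per-resample estimates are averaged.

```python
import random
from statistics import mean


def wcr_estimate(clusters, estimator=mean, n_resamples=200, seed=0):
    """Within-cluster resampling sketch (hypothetical helper).

    clusters: list of non-empty lists of observations.
    estimator: function applied to each one-per-cluster resample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Draw exactly one observation from every cluster.
        resample = [rng.choice(cluster) for cluster in clusters]
        estimates.append(estimator(resample))
    # Average the estimates over all resamples.
    return mean(estimates)


# Toy data with an informative cluster size: the large cluster has large values.
clusters = [[1.0], [3.0] * 9]
pooled = mean(y for cluster in clusters for y in cluster)  # dominated by the big cluster
wcr = wcr_estimate(clusters)                               # each cluster counts once
```

In this toy example the pooled mean is pulled toward the nine-member cluster, whereas the resampling estimate treats both clusters equally.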

4.

Comparison of GEE1 and GEE2 estimation applied to clustered logistic regression

《Journal of Statistical Computation and Simulation》2012,82(4):361-378

Generalized estimating equations (GEE) have become a popular method for marginal regression modelling of data that occur in clusters. Features of the GEE methodology are the use of a ‘working covariance’, an approximation to the underlying covariance, which is used to improve the efficiency in estimating the regression coefficients, and the ‘sandwich’ estimate of variance, which provides a way of consistently estimating their standard errors. These techniques have been extended to include estimating equations for the underlying correlation structure, both to improve the efficiency of the regression coefficient estimates and to provide estimates of correlations between units in a cluster, when these are of interest. If the mean structure is of primary interest, then a simpler set of equations (GEE1) can be used, whereas if the underlying covariance structure is of interest in its own right, the use of the more complex GEE2 estimating equations is often recommended. In this paper, we compare the effect of increasing the complexity of the ‘working covariances’ on the variance of the parameter estimates, as well as the mean-squared error of the ‘sandwich’ estimate of variance. We give asymptotic expressions for these variances and mean-squared error terms. We use these to study the behaviour of different variants of GEE1 and GEE2 when we change the number of clusters, the cluster size, and the within-cluster correlation. We conclude that the extra complexity of the full GEE2 approach is not usually justified if the mean structure is of primary interest.
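To make the 'sandwich' idea concrete, the following is a minimal sketch for the simplest possible case: an intercept-only linear model with an independence working covariance. It is an assumption-laden illustration, not the GEE1 or GEE2 procedures compared in the paper; the function name `mean_with_variances` and the toy data are hypothetical. With N observations in total, the "bread" is N and the "meat" is the sum over clusters of the squared within-cluster residual totals.

```python
def mean_with_variances(clusters):
    """Return (mean, naive variance, sandwich variance) for clustered data.

    Assumes an intercept-only model and an independence working covariance.
    """
    n_obs = sum(len(cluster) for cluster in clusters)
    mu = sum(sum(cluster) for cluster in clusters) / n_obs

    # Naive variance: treats all observations as independent.
    sigma2 = sum((y - mu) ** 2 for cluster in clusters for y in cluster) / n_obs
    naive = sigma2 / n_obs

    # Sandwich variance: residuals are summed within each cluster before squaring,
    # so within-cluster correlation inflates (or deflates) the estimate.
    meat = sum(sum(y - mu for y in cluster) ** 2 for cluster in clusters)
    sandwich = meat / n_obs ** 2
    return mu, naive, sandwich


# Two perfectly homogeneous clusters: strong positive within-cluster correlation.
mu, naive, sandwich = mean_with_variances([[1.0, 1.0], [3.0, 3.0]])
```

Here the sandwich variance is twice the naive one, which is what one would expect when each cluster of two effectively carries a single independent observation.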

5.

Comparison of different computational implementations on fitting generalized linear mixed-effects models for repeated count measures

《Journal of Statistical Computation and Simulation》2012,82(12):2392-2404

In modelling repeated count outcomes, generalized linear mixed-effects models are commonly used to account for within-cluster correlations. However, inconsistent results are frequently generated by various statistical R packages and SAS procedures, especially in case of a moderate or strong within-cluster correlation or overdispersion. We investigated the underlying numerical approaches and statistical theories on which these packages and procedures are built. We then compared the performance of these statistical packages and procedures by simulating both Poisson-distributed and overdispersed count data. The SAS NLMIXED procedure outperformed the other procedures in all settings.

6.

A mixed-effects least square support vector regression model for three-level count data

Mohammad Moqaddasi Amiri Leili Tapak 《Journal of Statistical Computation and Simulation》2019,89(15):2801-2812

Hierarchical study design often occurs in many areas such as epidemiology, psychology, sociology, public health, engineering, and agriculture. This imposes correlation in the data structure that needs to be accounted for in the modelling process. In this study, a three-level mixed-effects least squares support vector regression (MLS-SVR) model is proposed to extend the standard least squares support vector regression (LS-SVR) model for handling cluster correlated data. The MLS-SVR model incorporates multiple random effects which allow handling unequal number of observations for each case at non-fixed time points (a very unbalanced situation) and correlation between subjects simultaneously. The methodology consists of a regression modelling step that is performed straightforwardly by solving a linear system. The proposed model is illustrated through numerical studies on simulated data sets and a real data example on human Brucellosis frequency. The generalization performance of the proposed MLS-SVR is evaluated by comparing to ordinary LS-SVR and some other parametric models.

7.

A comparison of centring parameterisations of Gaussian process-based models for Bayesian computation using MCMC

Mark R. Bass Sujit K. Sahu 《Statistics and Computing》2017,27(6):1491-1512

Markov chain Monte Carlo (MCMC) algorithms for Bayesian computation for Gaussian process-based models under default parameterisations are slow to converge due to the presence of spatial- and other-induced dependence structures. The main focus of this paper is to study the effect of the assumed spatial correlation structure on the convergence properties of the Gibbs sampler under the default non-centred parameterisation and a rival centred parameterisation (CP), for the mean structure of a general multi-process Gaussian spatial model. Our investigation finds answers to many pertinent, but as yet unanswered, questions on the choice between the two. Assuming the covariance parameters to be known, we compare the exact rates of convergence of the two by varying the strength of the spatial correlation, the level of covariance tapering, the scale of the spatially varying covariates, the number of data points, the number and the structure of block updating of the spatial effects and the amount of smoothness assumed in a Matérn covariance function. We also study the effects of introducing differing levels of geometric anisotropy in the spatial model. The case of unknown variance parameters is investigated using well-known MCMC convergence diagnostics. A simulation study and a real-data example on modelling air pollution levels in London are used for illustrations. A generic pattern emerges that the CP is preferable in the presence of more spatial correlation or more information obtained through, for example, additional data points or by increased covariate variability.

8.

Modelling the role of variables in model-based cluster analysis

Giuliano Galimberti Annamaria Manisi Gabriele Soffritti 《Statistics and Computing》2018,28(1):145-169

In the framework of cluster analysis based on Gaussian mixture models, it is usually assumed that all the variables provide information about the clustering of the sample units. Several variable selection procedures are available in order to detect the structure of interest for the clustering when this structure is contained in a variable sub-vector. Currently, in these procedures a variable is assumed to play one of (up to) three roles: (1) informative, (2) uninformative and correlated with some informative variables, (3) uninformative and uncorrelated with any informative variable. A more general approach for modelling the role of a variable is proposed by taking into account the possibility that the variable vector provides information about more than one structure of interest for the clustering. This approach is developed by assuming that such information is given by non-overlapped and possibly correlated sub-vectors of variables; it is also assumed that the model for the variable vector is equal to a product of conditionally independent Gaussian mixture models (one for each variable sub-vector). Details about model identifiability, parameter estimation and model selection are provided. The usefulness and effectiveness of the described methodology are illustrated using simulated and real datasets.

9.

Quantile dispersion graphs to compare the efficiencies of cluster randomized designs

S. Mukhopadhyay S. W. Looney 《Journal of applied statistics》2009,36(11):1293-1305

The purpose of this article is to compare efficiencies of several cluster randomized designs using the method of quantile dispersion graphs (QDGs). A cluster randomized design is considered whenever subjects are randomized at a group level but analyzed at the individual level. A prior knowledge of the correlation existing between subjects within the same cluster is necessary to design these cluster randomized trials. Using the QDG approach, we are able to compare several cluster randomized designs without requiring any information on the intracluster correlation. For a given design, several quantiles of the power function, which are directly related to the effect size, are obtained for several effect sizes. The quantiles depend on the intracluster correlation present in the model. The dispersion of these quantiles over the space of the unknown intracluster correlation is determined, and then depicted by the QDGs. Two applications of the proposed methodology are presented.

10.

The Evaluation of the Performance of IPWGEE, a Simulation Study

Maria Iachina 《Communications in Statistics - Simulation and Computation》2013,42(6):1212-1227

This article compares two recently proposed test statistics for unobserved cluster effects (C, SSR_w) with three statistics frequently mentioned in panel econometrics (BP, SLM, F). Simulations include data generating processes with a cluster-level explanatory variable, scenarios with unequally sized clusters, processes that have an incorrectly specified cluster structure, and processes that have no cluster structure but rather spatial correlation. All but the F test exhibit small-sample deviation from the asymptotic distribution. The SLM, F, and SSR_w tests show equivalent power when cluster sizes are balanced. SLM has greatest power when cluster sizes are unbalanced.

11.

Marginal Projected Multivariate Linear Models for Clustered Angular Data

Daniel B. Hall Jing Shen 《Australian & New Zealand Journal of Statistics》2015,57(2):241-257

Among the diverse frameworks that have been proposed for regression analysis of angular data, the projected multivariate linear model provides a particularly appealing and tractable methodology. In this model, the observed directional responses are assumed to correspond to the angles formed by latent bivariate normal random vectors that are assumed to depend upon covariates through a linear model. This implies an angular normal distribution for the observed angles, and incorporates a regression structure through a familiar and convenient relationship. In this paper we extend this methodology to accommodate clustered data (e.g., longitudinal or repeated measures data) by formulating a marginal version of the model and basing estimation on an EM‐like algorithm in which correlation among within‐cluster responses is taken into account by incorporating a working correlation matrix into the M step. A sandwich estimator is used for the parameter estimates’ covariance matrix. The methodology is motivated and illustrated using an example involving clustered measurements of microfibril angle on loblolly pine (Pinus taeda L.). Simulation studies are presented that evaluate the finite sample properties of the proposed fitting method. In addition, the relationship between within‐cluster correlation on the latent Euclidean vectors and the corresponding correlation structure for the observed angles is explored.

12.

Comparison of the estimators of the intra-cluster correlation for the nested error regression model

Sukanya Intarapak Rawee Suwandechochai 《Communications in Statistics - Simulation and Computation》2017,46(3):2057-2070

The intra-cluster correlation is insisted on nested error regression model that, in practice, is rarely known. This article demonstrates the size in generalized least squares (GLS) F-test using Fuller–Battese transformation and modification F-test. For the balanced case, the former using strictly positive, analysis of covariance (ANCOVA) and analysis of variance (ANOVA) estimators of intra-cluster correlation can control the size for moderate intra-cluster correlations. For small intra-cluster correlation, they perform well when the numbers of clusters are large. The latter using the ANOVA estimator performs well except for small numbers of clusters. When intra-cluster correlation is large, it cannot control the size. For the unbalanced case, the GLS F-test using the Fuller–Battese transformation and the modification F-test using the strictly positive, the ANCOVA and the ANOVA estimators maintain the significance level for small total sample size and small intra-cluster correlations when there is a large variation in cluster sizes, but they perform well in controlling the size for large total sample size and small different variation in cluster sizes. Besides, Henderson’s method 3 estimator maintains the significance level for a few situations.
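As a rough illustration of the ANOVA estimator of the intra-cluster correlation named in the abstract above, the sketch below handles only the balanced one-way case with no covariates, which is a simplifying assumption: in a nested error regression model the mean squares would be computed from regression residuals. The function name `anova_icc` is hypothetical. With k clusters of common size m, the estimator is (MSB - MSW) / (MSB + (m - 1) MSW), where MSB and MSW are the between-cluster and within-cluster mean squares.

```python
def anova_icc(clusters):
    """ANOVA estimator of the intra-cluster correlation (balanced, no covariates)."""
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(cluster) != m for cluster in clusters):
        raise ValueError("need at least two clusters of equal size >= 2")

    grand_mean = sum(sum(cluster) for cluster in clusters) / (k * m)
    cluster_means = [sum(cluster) / m for cluster in clusters]

    # Between-cluster and within-cluster mean squares.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (y - cm) ** 2
        for cluster, cm in zip(clusters, cluster_means)
        for y in cluster
    ) / (k * (m - 1))

    # Note: undefined (division by zero) when every observation is identical.
    return (msb - msw) / (msb + (m - 1) * msw)


icc_high = anova_icc([[1.0, 1.0], [3.0, 3.0]])  # all variation is between clusters
icc_low = anova_icc([[1.0, 3.0], [1.0, 3.0]])   # all variation is within clusters
```

The second example shows that this estimator can be negative, which is one reason a strictly positive alternative may be of interest.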

13.

Confidence intervals for the difference between two median survival times for clustered survival data

Yu-Mei Chang Pao-Sheng Shen Guan-Wei Liu 《Journal of applied statistics》2016,43(12):2325-2345

Clustered survival data arise often in clinical trial design, where the correlated subunits from the same cluster are randomized to different treatment groups. Under such design, we consider the problem of constructing a confidence interval for the difference of two median survival times given the covariates. We use the Cox gamma frailty model to account for the within-cluster correlation. Based on the conditional confidence intervals, we can identify the possible range of covariates over which the two groups would provide different median survival times. The associated coverage probability and the expected length of the proposed interval are investigated via a simulation study. The implementation of the confidence intervals is illustrated using a real data set.

14.

Population-averaged and subject-specific approaches for clustered categorical data

《Journal of Statistical Computation and Simulation》2012,82(1-3):231-253

Modeling clustered categorical data based on extensions of generalized linear model theory has received much attention in recent years. The rapidly increasing number of approaches suitable for categorical data in which clusters are uncorrelated, but correlations exist within a cluster, has caused uncertainty among applied scientists as to their respective merits and demerits. Upon centering estimation around solving an unbiased estimating function for mean parameters and estimation of covariance parameters describing within-cluster or among-cluster heterogeneity, many approaches can easily be related. This contribution describes a series of algorithms and their implementation in detail, based on a classification of inferential procedures for clustered data.

15.

On the use of between–within models to adjust for confounding due to unmeasured cluster-level covariates

Babette A. Brumback Li Li Zhuangyu Cai 《Communications in Statistics - Simulation and Computation》2017,46(5):3841-3854

Between–within models are generalized linear mixed models (GLMMs) for clustered data that incorporate a random intercept together with fixed effects for within-cluster and between-cluster covariates; the between-cluster covariates represent the cluster means of the within-cluster covariates. One popular use of these models is to adjust for confounding of the effect of within-cluster covariates due to unmeasured between-cluster covariates. Previous research has shown via simulations that using this approach can yield inconsistent estimators. We present theory and simulations as evidence that a primary cause of the inconsistency is heteroscedasticity of the linearized version of the GLMM used for estimation.
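The covariate construction behind a between–within model can be sketched as follows. This shows only the data-preparation step, under the assumption of a single covariate and the centred parameterisation (cluster mean plus deviation from it); the function name `between_within` is hypothetical, and fitting the GLMM itself is left to whatever mixed-model software is used.

```python
def between_within(x_by_cluster):
    """Split a covariate into between- and within-cluster parts.

    x_by_cluster: list of non-empty lists, one list of covariate values per cluster.
    Returns rows of (cluster_id, cluster_mean, deviation_from_cluster_mean).
    """
    rows = []
    for cluster_id, xs in enumerate(x_by_cluster):
        x_bar = sum(xs) / len(xs)  # between-cluster covariate
        for x in xs:
            rows.append((cluster_id, x_bar, x - x_bar))  # within-cluster part
    return rows


rows = between_within([[1.0, 3.0], [10.0, 20.0]])
```

Both columns would then enter the model as fixed effects alongside a random intercept per cluster.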

16.

Randomization of Clusters Versus Randomization of Persons Within Clusters

《The American statistician》2013,67(2):173-179

Many experiments aim at populations with persons nested within clusters. Randomization to treatment conditions can be done at the cluster level or at the person level within each cluster. The latter may result in control group contamination, and cluster randomization is therefore often preferred in practice. This article models the control group contamination, calculates the required sample sizes for both levels of randomization, and gives the degree of contamination for which cluster randomization is preferable above randomization of persons within clusters. Moreover, it provides examples of situations where one has to make a choice between both levels of randomization.

17.

Optimal model averaging estimation for correlation structure in generalized estimating equations

Fang Fang Jingli Wang 《Communications in Statistics - Simulation and Computation》2019,48(5):1574-1593

Longitudinal data analysis requires a proper estimation of the within-cluster correlation structure in order to achieve efficient estimates of the regression parameters. When applying likelihood-based methods one may select an optimal correlation structure by the AIC or BIC. However, such information criteria are not applicable for estimating equation based approaches. In this paper we develop a model averaging approach to estimate the correlation matrix by a weighted sum of a group of patterned correlation matrices under the GEE framework. The optimal weight is determined by minimizing the difference between the weighted sum and a consistent yet inefficient estimator of the correlation structure. The computation of our proposed approach only involves a standard quadratic programming on top of the standard GEE procedure and can be easily implemented in practice. We provide theoretical justifications and extensive numerical simulations to support the application of the proposed estimator. A couple of well-known longitudinal data sets are revisited where we implement and illustrate our methodology.
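The weighting step described in the abstract above can be illustrated in a deliberately reduced setting. The sketch assumes only two candidate structures (exchangeable and AR(1)), weights that are non-negative and sum to one, and a squared Frobenius distance as the "difference"; none of these choices is confirmed by the abstract, and the function names are hypothetical. With two candidates the quadratic programme has a closed-form solution: project onto the line between the candidates and clip the weight to [0, 1].

```python
def exchangeable(n, rho):
    """n x n exchangeable correlation matrix."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """n x n AR(1) correlation matrix."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def optimal_weight(a, b, target):
    """Weight w in [0, 1] minimising ||w*a + (1-w)*b - target||_F^2."""
    num = 0.0
    den = 0.0
    for row_a, row_b, row_t in zip(a, b, target):
        for x, y, t in zip(row_a, row_b, row_t):
            d = x - y
            num += d * (t - y)
            den += d * d
    if den == 0.0:
        return 0.5  # candidates coincide, so any weight gives the same matrix
    return min(1.0, max(0.0, num / den))


a = exchangeable(3, 0.5)
b = ar1(3, 0.5)
# Stand-in for a rough estimate of the correlation matrix (here an exact mixture).
target = [[0.3 * x + 0.7 * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
w = optimal_weight(a, b, target)
```

With more than two candidates the same objective would need a general quadratic programming solver, which is omitted here.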

18.

Some Properties of the Liang-Zeger Method Applied to Clustered Binary Regression

Andrew Balemi & Alan Lee 《Australian & New Zealand Journal of Statistics》1999,41(1):43-58

The Generalized Estimating Equation (GEE) method popularized by Liang and Zeger provides a very general method for fitting regression models to observations that occur in clusters. Features of the method are the specification of a 'working correlation' (a guess at the true correlation structure of the data) which is used to improve efficiency in estimating the regression coefficients, and the 'information sandwich' which provides a way of consistently estimating the standard errors of the estimated regression coefficients even if (as we might expect) the working correlation is wrong. This paper develops asymptotic expressions for the bias and efficiency both of the regression coefficient estimates and of the sandwich estimate, and uses them to study the behaviour of the estimates. It looks at the effect of the choice of the working correlation on the estimate and also examines the effect of different cluster sizes and different degrees of correlation between the covariates. The performance of these methods is found to be excellent, particularly when the degree of correlation in the responses and covariates is small to moderate.

19.

Generalized Additive Modelling of Mixed Distribution Markov Models with Application to Melbourne's Rainfall

Rob J. Hyndman & Gary K. Grunwald 《Australian & New Zealand Journal of Statistics》2000,42(2):145-158

The paper considers the modelling of time series using a generalized additive model with first-order Markov structure and mixed transition density having a discrete component at zero and a continuous component with positive sample space. Such models have application, for example, in modelling daily occurrence and intensity of rainfall, and in modelling numbers and sizes of insurance claims. The paper shows how these methods extend the usual sinusoidal seasonal assumption in standard chain-dependent models by assuming a general smooth pattern of occurrence and intensity over time. These models can be fitted using standard statistical software. The methods of Grunwald & Jones (2000) can be used to combine these separate occurrence and intensity models into a single model for amount. The models are used to investigate the relationship between the Southern Oscillation Index and Melbourne's rainfall, illustrated with 36 years of rainfall data from Melbourne, Australia.

20.

Modelling method effects as individual causal effects

Steffi Pohl Rolf Steyer Katrin Kraus 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(1):41-63

Summary. Method effects often occur when different methods are used for measuring the same construct. We present a new approach for modelling this kind of phenomenon, consisting of a definition of method effects and a first model, the method effect model, that can be used for data analysis. This model may be applied to multitrait–multimethod data or to longitudinal data where the same construct is measured with at least two methods at all occasions. In this new approach, the definition of the method effects is based on the theory of individual causal effects by Neyman and Rubin. Method effects are accordingly conceptualized as the individual effects of applying measurement method j instead of k. They are modelled as latent difference scores in structural equation models. A reference method needs to be chosen against which all other methods are compared. The model fit is invariant to the choice of the reference method. The model allows the estimation of the average of the individual method effects, their variance, their correlation with the traits (and other latent variables) and the correlation of different method effects among each other. Furthermore, since the definition of the method effects is in line with the theory of causality, the method effects may (under certain conditions) be interpreted as causal effects of the method. The method effect model is compared with traditional multitrait–multimethod models. An example illustrates the application of the model to longitudinal data analysing the effect of negatively (such as 'feel bad') as compared with positively formulated items (such as 'feel good') measuring mood states.