
1.

Prediction in multilevel generalized linear models (total citations: 2; self-citations: 0; citations by others: 2)

Anders Skrondal, Sophia Rabe-Hesketh 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(3):659-687

Summary. We discuss prediction of random effects and of expected responses in multilevel generalized linear models. Prediction of random effects is useful, for instance, in small area estimation and disease mapping, effectiveness studies and model diagnostics. Prediction of expected responses is useful for planning, model interpretation and diagnostics. For prediction of random effects, we concentrate on empirical Bayes prediction and discuss three different kinds of standard errors: the posterior standard deviation and the marginal prediction error standard deviation (comparative standard errors), and the marginal sampling standard deviation (diagnostic standard error). Analytical expressions are available only for linear models and are provided in an appendix. For other multilevel generalized linear models we present approximations and suggest using parametric bootstrapping to obtain standard errors. We also discuss prediction of expectations of responses or probabilities for a new unit in a hypothetical cluster, in a new (randomly sampled) cluster or in an existing cluster. The methods are implemented in gllamm and illustrated by applying them to survey data on reading proficiency of children nested in schools. Simulations are used to assess the performance of various predictions and associated standard errors for logistic random-intercept models under a range of conditions.
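For the linear random-intercept case, where the abstract says analytical expressions exist, the three standard errors reduce to simple functions of a shrinkage factor. The sketch below assumes a model y_ij = x_ij'β + u_j + e_ij with u_j ~ N(0, ψ) and e_ij ~ N(0, θ), treats ψ, θ and β as known (so parameter-estimation uncertainty is ignored), and uses a hypothetical function name; it is not the gllamm implementation.

```python
import math

def eb_random_intercept(cluster_mean_resid, n_j, psi, theta):
    """Empirical Bayes prediction for one cluster in a linear random-intercept model.

    cluster_mean_resid: mean of (y_ij - x_ij' beta) within cluster j (assumed known beta)
    n_j: cluster size; psi: random-intercept variance; theta: level-1 variance
    """
    # Shrinkage factor: share of the cluster-mean variance that is due to u_j
    r_j = psi / (psi + theta / n_j)
    u_hat = r_j * cluster_mean_resid
    # Posterior SD; with known parameters it equals the marginal prediction error SD here
    posterior_sd = math.sqrt(psi * (1.0 - r_j))
    # Marginal sampling SD of u_hat (the "diagnostic" standard error)
    diagnostic_sd = math.sqrt(psi * r_j)
    return u_hat, posterior_sd, diagnostic_sd

u_hat, post_sd, diag_sd = eb_random_intercept(2.0, n_j=12, psi=1.0, theta=4.0)
print(u_hat, post_sd, diag_sd)
```

As the cluster grows, r_j approaches 1, so the posterior SD shrinks toward zero while the diagnostic SD approaches √ψ.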

2.

Identifying predictors of violent behaviour among students using the conventional logistic and multilevel logistic models

Bidemi Yusuf, Olayinka Omigbodun, Babatunde Adedokun, Odunayo Akinyemi 《Journal of Applied Statistics》2011,38(5):1055-1061

Analysing individual-, school- and class-level observations is a good and efficient approach in epidemiologic research. Using data on violent behaviour among secondary school students, we compared results from conventional logistic modelling with those from a multilevel logistic modelling approach using the gllamm command in Stata, illustrating the advantage of multilevel modelling over conventional logistic modelling. We constructed a logistic model with random intercepts at the school and class levels to account for unexplained heterogeneity between schools and classes. In the multilevel model, we estimated that, in an average school, the odds of experiencing violence are about three times higher for students who use drugs than for students who do not (OR = 2.99, 95% CI: 1.86, 4.81, p < 0.0001). The corresponding estimates from the conventional logistic model are slightly lower.

The estimated variances of the normally distributed random intercepts for schools and classes, which account for unexplained heterogeneity between schools and classes, are 0.017 and 0.035, respectively. We therefore recommend multilevel logistic modelling when data are clustered.
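One way to put the reported variances on an interpretable scale (an assumption here, not something the abstract reports) is the latent-response intraclass correlation, which treats the level-1 residual as standard logistic with variance π²/3 and assumes classes are nested within schools. A minimal sketch using the two reported variances:

```python
import math

# Random-intercept variances reported in the abstract (log-odds scale)
var_school = 0.017
var_class = 0.035
# Assumed latent-response formulation: level-1 logistic residual variance pi^2 / 3
var_level1 = math.pi ** 2 / 3

total = var_school + var_class + var_level1
# Latent correlation for two students in the same school but in different classes
icc_school = var_school / total
# Latent correlation for two students in the same class (and hence the same school)
icc_class = (var_school + var_class) / total
print(f"{icc_school:.4f} {icc_class:.4f}")  # prints 0.0051 0.0156
```

Under this formulation both correlations are below 2%, which is useful context when weighing the reported gain from the multilevel model.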

3.

Multilevel models for longitudinal data

Fiona Steele 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(1):5-19

Summary. Repeated measures and repeated events data have a hierarchical structure which can be analysed by using multilevel models. A growth curve model is an example of a multilevel random-coefficients model, whereas a discrete time event history model for recurrent events can be fitted as a multilevel logistic regression model. The paper describes extensions to the basic growth curve model to handle auto-correlated residuals, multiple-indicator latent variables and correlated growth processes, and event history models for correlated event processes. The multilevel approach to the analysis of repeated measures data is contrasted with structural equation modelling. The methods are illustrated in analyses of children's growth, changes in social and political attitudes, and the interrelationship between partnership transitions and childbearing.
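Fitting a discrete-time event history model as a logistic regression rests on expanding each episode into one record per time period, with a binary indicator that is 1 only in the period in which the event occurs. A minimal sketch of that expansion step, assuming integer durations and right censoring; the function name is hypothetical. For recurrent events, each episode would be expanded the same way and tagged with its individual's identifier, which then defines the higher level for the random effect.

```python
from typing import List, Tuple

def to_person_period(durations: List[int], events: List[int]) -> List[Tuple[int, int, int]]:
    """Expand (duration, event) pairs into person-period rows (id, period, y).

    durations[i]: number of discrete periods episode i was observed (>= 1)
    events[i]: 1 if the event occurred in the last observed period, 0 if censored
    """
    rows = []
    for episode, (dur, event) in enumerate(zip(durations, events)):
        for period in range(1, dur + 1):
            # y = 1 only in the final period, and only if the event was observed
            y = 1 if (period == dur and event == 1) else 0
            rows.append((episode, period, y))
    return rows

rows = to_person_period([3, 2], [1, 0])
for r in rows:
    print(r)
```

A logistic regression of y on functions of period (plus covariates) then estimates the discrete-time hazard.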

4.

Research on Missing Data Imputation Methods Based on Multilevel Models

Yu Lichao, Jin Yongjin 《Statistical Research》2018,35(11):93-104

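Large-scale sample surveys mostly use complex sampling designs and yield survey data sets with a hierarchical, nested structure, in which missing data are unavoidable; yet imputation strategies for hierarchically structured data sets with missing values have so far received little research attention. This paper applies the Gibbs sampler to the multiple imputation of hierarchical data sets with missing values, studying both fixed-effects model imputation and random-effects model imputation. Through theoretical derivation and numerical simulation under different intraclass correlation coefficients, cluster sizes and proportions of missing data, it compares the imputation performance of the methods in terms of the unbiasedness and efficiency of the parameter estimates, and gives recommendations on the choice of imputation model. The results show that using a random-effects model as the imputation model gives more accurate parameter estimates, whereas the fixed-effects imputation model is relatively simple to apply. The fixed-effects imputation model can be used when the proportion of missing data is small, the intraclass correlation coefficient is large and cluster sizes are large; otherwise, the random-effects imputation model is recommended.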
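Whichever imputation model is used, the m completed data sets are usually analysed separately and their results combined. The abstract does not state the combining rule; the sketch below assumes the standard Rubin's rules for a single scalar parameter, with a hypothetical function name.

```python
import statistics

def rubin_pool(estimates, variances):
    """Pool m completed-data estimates of one parameter with Rubin's rules.

    estimates: point estimates from each imputed data set
    variances: their squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    w = statistics.mean(variances)            # average within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                   # total variance
    return q_bar, t

q_bar, t = rubin_pool([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(q_bar, t)
```

The between-imputation term b is where a poorly chosen imputation model shows up: it understates or inflates the extra uncertainty due to missingness.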

5.

A cluster tree based model selection approach for logistic regression classifier

Ozge Tanju 《Journal of Statistical Computation and Simulation》2018,88(7):1394-1414

Model selection methods are important for identifying the best approximating model. To identify the best meaningful model, the purpose of the model should be clearly stated in advance. The focus of this paper is model selection when the modelling purpose is classification. We propose a new model selection approach for logistic regression when the main modelling purpose is classification. The method is based on the distance between two clustering trees. We also question and evaluate the performance of conventional model selection methods based on information-theoretic concepts in determining the best logistic regression classifier. An extensive simulation study is used to assess the finite-sample performance of the cluster tree based and the information-theoretic model selection methods. The simulations distinguish between cases in which the true model is and is not in the candidate set. Results show that the new approach is highly promising. Finally, the methods are applied to a real data set to select a binary model as a means of classifying subjects with respect to their risk of breast cancer.

6.

A data-driven selection of the number of clusters in the Dirichlet allocation model via Bayesian mixture modelling

E. F. Saraiva, C. A. B. Pereira, A. K. Suzuki 《Journal of Statistical Computation and Simulation》2019,89(15):2848-2870

In this paper, we consider a Bayesian mixture model that allows us to integrate out the weights of the mixture in order to obtain a procedure in which the number of clusters is an unknown quantity. To determine clusters and estimate parameters of interest, we develop an MCMC algorithm called the sequential data-driven allocation sampler. In this algorithm, a single observation has a non-null probability of creating a new cluster, and a set of observations may create a new cluster through split-merge movements. The split-merge movements are developed using a sequential allocation procedure based on allocation probabilities that are calculated from the Kullback–Leibler divergence between the posterior distribution using the previously allocated observations and the posterior distribution including a ‘new’ observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.

7.

The effect of number of clusters and cluster size on statistical power and Type I error rates when testing random effects variance components in multilevel linear and logistic regression models

Peter C. Austin, George Leckie 《Journal of Statistical Computation and Simulation》2018,88(16):3151-3163

When using multilevel regression models that incorporate cluster-specific random effects, the Wald and the likelihood ratio (LR) tests are used for testing the null hypothesis that the variance of the random effects distribution is equal to zero. We conducted a series of Monte Carlo simulations to examine the effect of the number of clusters and the number of subjects per cluster on the statistical power to detect a non-null random effects variance and to compare the empirical type I error rates of the Wald and LR tests. Statistical power increased with increasing number of clusters and number of subjects per cluster. Statistical power was greater for the LR test than for the Wald test. These results applied to both the linear and logistic regressions, but were more pronounced for the latter. The use of the LR test is preferable to the use of the Wald test.
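A detail that matters when applying the LR test here is that the null value σ² = 0 lies on the boundary of the parameter space. For a single random intercept, the standard asymptotic result is that the LR statistic follows a 50:50 mixture of χ²₀ and χ²₁, so the naive χ²₁ p-value is halved; whether the authors used this correction is not stated in the abstract. A minimal sketch with hypothetical function names, using only the standard library:

```python
import math

def chi2_1_sf(x: float) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with 1 degree of freedom."""
    # X = Z^2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2.0))

def lr_pvalue_variance_component(lr_stat: float) -> float:
    """Boundary-corrected p-value for H0: sigma_u^2 = 0 vs H1: sigma_u^2 > 0 (one random intercept)."""
    if lr_stat <= 0:
        # The chi2_0 component puts all its mass at zero, so P(LR >= 0) = 1
        return 1.0
    return 0.5 * chi2_1_sf(lr_stat)

print(chi2_1_sf(3.8415), lr_pvalue_variance_component(3.8415))
```

Using the uncorrected χ²₁ p-value makes the LR test conservative, which is one reason its size and power depend on the reference distribution chosen.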

8.

Multilevel models for repeated binary outcomes: attitudes and voting over the electoral cycle

M. Yang, H. Goldstein & A. Heath 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1999,163(1):49-62

Models for fitting longitudinal binary responses are explored by using a panel study of voting intentions. A standard multilevel repeated measures logistic model is shown to be inadequate owing to a substantial proportion of respondents who maintain a constant response over time. A multivariate binary response model is shown to be a better fit to the data.

9.

Outliers in multilevel data (total citations: 2; self-citations: 0; citations by others: 2)

I. H. Langford & T. Lewis 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1998,161(2):121-160

This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for data exploration for outliers and outlier analysis and then applies these to the detailed analysis of outliers in two large scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.
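DFITS is easiest to see in the single-level regression setting that the abstract contrasts with multilevel data: it scales the externally studentized residual by √(h/(1−h)), so it is large only when a point is both poorly fitted and has high leverage. The sketch below is a single-level straight-line version, not the multilevel measure developed in the paper; the function name and the example data are hypothetical.

```python
import math
from typing import List

def dfits_simple_regression(x: List[float], y: List[float]) -> List[float]:
    """DFITS for each observation in a single-level straight-line regression y = a + b*x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Leverage (hat values) for a model with an intercept and one slope
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    sse = sum(e ** 2 for e in resid)
    out = []
    for e_i, h_i in zip(resid, h):
        # Residual variance estimate with observation i deleted
        s2_i = (sse - e_i ** 2 / (1 - h_i)) / (n - p - 1)
        # Externally studentized residual
        t_i = e_i / math.sqrt(s2_i * (1 - h_i))
        out.append(t_i * math.sqrt(h_i / (1 - h_i)))
    return out

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 15.0]
d = dfits_simple_regression(x, y)
cutoff = 2 * math.sqrt(2 / len(x))  # common rule-of-thumb threshold 2*sqrt(p/n)
print([i for i, v in enumerate(d) if abs(v) > cutoff])
```

In a multilevel model the same idea has to be applied at each level, which is where the complications described above arise.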

10.

A semiparametric multilevel survival model (total citations: 1; self-citations: 0; citations by others: 1)

Wenyang Zhang, Fiona Steele 《Journal of the Royal Statistical Society. Series C, Applied statistics》2004,53(2):387-404

Summary. We propose a semiparametric multilevel survival model for clustered duration data in which the effect of a continuous covariate is represented by an unspecified, possibly non-linear, function. This model makes no distributional assumption about the cluster level random effects. The performance of the method is assessed via Monte Carlo simulations. The model is applied in an analysis of first-birth intervals in Bangladesh to examine period effects in the timing of first births, while allowing for clustering within communities; the analysis reveals a non-linear trend in the first-birth interval over time.

11.

Optimality of Equal vs. Unequal Cluster Sizes in Multilevel Intervention Studies: A Monte Carlo Study for Small Sample Sizes

Math J. J. M. Candel, Gerard J. P. Van Breukelen, Larissa Kotova, Martijn P. F. Berger 《Communications in Statistics - Simulation and Computation》2013,42(1):222-239

Optimality of equal versus unequal cluster sizes in the context of multilevel intervention studies is examined. A Monte Carlo study is done to examine to what degree asymptotic results on the optimality hold for realistic sample sizes and for different estimation methods. The relative D-criterion, comparing equal versus unequal cluster sizes, almost always exceeded 85%, implying that loss of information due to unequal cluster sizes can be compensated for by increasing the number of clusters by 18%. The simulation results are in line with asymptotic results, showing that, for realistic sample sizes and various estimation methods, the asymptotic results can be used in planning multilevel intervention studies.
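The 18% figure follows from the relative efficiency (RE): if unequal cluster sizes retain a fraction RE of the information obtained with equal sizes, and information grows in proportion to the number of clusters K, then K must be scaled by 1/RE to recover it. With the lower bound RE = 0.85 reported above:

```latex
K_{\text{unequal}} = \frac{K_{\text{equal}}}{\mathrm{RE}}
                   = \frac{K_{\text{equal}}}{0.85}
                   \approx 1.176\, K_{\text{equal}},
```

that is, about 17.6% more clusters, rounded to 18%.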

12.

Orthogonalized Residuals for Estimation of Marginally Specified Association Parameters in Multivariate Binary Data

Bahjat F. Qaqish, Richard C. Zink, John S. Preisser 《Scandinavian Journal of Statistics》2012,39(3):515-527

Abstract. This paper focuses on marginal regression models for correlated binary responses when estimation of the association structure is of primary interest. A new estimating function approach based on orthogonalized residuals is proposed. A special case of the proposed procedure allows a new representation of the alternating logistic regressions method through marginal residuals. The connections between second‐order generalized estimating equations, alternating logistic regressions, pseudo‐likelihood and other methods are explored. Efficiency comparisons are presented, with emphasis on variable cluster size and on the role of higher‐order assumptions. The new method is illustrated with an analysis of data on impaired pulmonary function.
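In alternating logistic regressions, the association between two binary responses in the same cluster is parameterised as a pairwise log odds ratio. As a rough illustration of that association measure only (not the orthogonalized-residual estimator proposed in the paper), the sketch below pools all within-cluster pairs into one 2×2 table, assuming an exchangeable association and ignoring covariates; the function name is hypothetical.

```python
import math
from itertools import combinations

def pooled_pairwise_log_or(clusters):
    """Crude exchangeable within-cluster log odds ratio for binary responses.

    clusters: list of lists of 0/1 responses, one list per cluster.
    Each unordered within-cluster pair is counted in both orders so the 2x2
    table is symmetric; clusters of size 1 contribute no pairs.
    """
    n11 = n10 = n01 = n00 = 0
    for ys in clusters:
        for a, b in combinations(ys, 2):
            for first, second in ((a, b), (b, a)):
                if first and second:
                    n11 += 1
                elif first:
                    n10 += 1
                elif second:
                    n01 += 1
                else:
                    n00 += 1
    return math.log((n11 * n00) / (n10 * n01))

print(pooled_pairwise_log_or([[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0]]))
```

A positive value indicates that responses within a cluster tend to agree; a regression-based estimator such as ALR additionally adjusts for the marginal means implied by covariates.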

13.

Meta-analysis using multilevel models with an application to the study of class size effects (total citations: 1; self-citations: 0; citations by others: 1)

Harvey Goldstein, Min Yang, Rumana Omar, Rebecca Turner & Simon Thompson 《Journal of the Royal Statistical Society. Series C, Applied statistics》2000,49(3):399-412

Meta-analysis is formulated as a special case of a multilevel (hierarchical data) model in which the highest level is that of the study and the lowest level that of an observation on an individual respondent. Studies can be combined within a single model where the responses occur at different levels of the data hierarchy and efficient estimates are obtained. An example is given from studies of class sizes and achievement in schools, where study data are available at the aggregate level in terms of overall mean values for classes of different sizes, and also at the student level.
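The simplest special case of this formulation uses only study-level summaries: y_j = β + u_j + e_j, with a study-level random effect u_j ~ N(0, τ²) and a known sampling variance for e_j. The abstract does not say which estimator the authors use; as a swapped-in illustration, the sketch below fits this two-level model with the DerSimonian–Laird moment estimator of τ². The function name is hypothetical.

```python
import math
from typing import List, Tuple

def dersimonian_laird(estimates: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and the between-study variance tau^2."""
    w = [1.0 / v for v in variances]                      # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)       # moment estimate, truncated at 0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

pooled, se, tau2 = dersimonian_laird([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(pooled, se, tau2)
```

The multilevel formulation described above goes further by letting student-level and aggregate-level data enter the same model, which a summary-only estimator like this cannot do.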

14.

Diagnostics of multiple group influential observations for logistic regression models

Burcin Coskun 《Journal of Statistical Computation and Simulation》2019,89(16):3118-3136

In this paper, two new methods for detecting multiple influential observations in logistic regression, GCD.GSPR and mCD*, are introduced. The proposed diagnostic measures are compared with two existing diagnostics for multiple influential observations, the generalized difference in fits (GDFFITS) and the generalized squared difference in beta (GSDFBETA). The simulation study uses logistic regression models with one, two and five independent variables. The performance of the diagnostic measures is examined both when a single independent variable is contaminated in each model and when all the independent variables are contaminated at given contamination rates and intensities. In addition, the diagnostic measures are compared in terms of correct identification rate and swamping rate on a data set frequently used in the literature.

15.

Multilevel modelling of complex survey data

Sophia Rabe-Hesketh, Anders Skrondal 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(4):805-827

Summary. Multilevel modelling is sometimes used for data from complex surveys involving multistage sampling, unequal sampling probabilities and stratification. We consider generalized linear mixed models and particularly the case of dichotomous responses. A pseudolikelihood approach for accommodating inverse probability weights in multilevel models with an arbitrary number of levels is implemented by using adaptive quadrature. A sandwich estimator is used to obtain standard errors that account for stratification and clustering. When level 1 weights are used that vary between elementary units in clusters, the scaling of the weights becomes important. We point out that not only variance components but also regression coefficients can be severely biased when the response is dichotomous. The pseudolikelihood methodology is applied to complex survey data on reading proficiency from the American sample of the 'Program for international student assessment' 2000 study, using the Stata program gllamm which can estimate a wide range of multilevel and latent variable models. Performance of pseudo-maximum-likelihood with different methods for handling level 1 weights is investigated in a Monte Carlo experiment. Pseudo-maximum-likelihood estimators of (conditional) regression coefficients perform well for large cluster sizes but are biased for small cluster sizes. In contrast, estimators of marginal effects perform well in both situations. We conclude that caution must be exercised in pseudo-maximum-likelihood estimation for small cluster sizes when level 1 weights are used.
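Two scaling schemes for level-1 weights that are widely discussed in this literature rescale each cluster's weights to sum either to the cluster sample size or to an "effective" cluster size; which schemes the paper compares is an assumption here. A minimal sketch for a single cluster; the function and dictionary key names are hypothetical and this is not gllamm code.

```python
from typing import Dict, List

def scale_level1_weights(weights: List[float]) -> Dict[str, List[float]]:
    """Two common rescalings of the level-1 weights within one cluster."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return {
        # Scaled weights sum to the actual cluster sample size n_j
        "size": [w * n / total for w in weights],
        # Scaled weights sum to the effective cluster size (sum w)^2 / sum(w^2)
        "effective": [w * total / total_sq for w in weights],
    }

res = scale_level1_weights([1.0, 2.0, 3.0])
print(res["size"], sum(res["effective"]))
```

Both rescalings leave the relative weights within a cluster unchanged; they differ only in the apparent cluster size, which is what drives the bias for small clusters noted above.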

16.

A generalized linear mixed model for longitudinal binary data with a marginal logit link function

Parzen M, Ghosh S, Lipsitz S, Sinha D, Fitzmaurice GM, Mallick BK, Ibrahim JG 《The Annals of Applied Statistics》2011,5(1):449-467

Longitudinal studies of a binary outcome are common in the health, social, and behavioral sciences. In general, a feature of random effects logistic regression models for longitudinal binary data is that the marginal functional form, when integrated over the distribution of the random effects, is no longer of logistic form. Recently, Wang and Louis (2003) proposed a random intercept model in the clustered binary data setting where the marginal model has a logistic form. An acknowledged limitation of their model is that it allows only a single random effect that varies from cluster to cluster. In this paper, we propose a modification of their model to handle longitudinal data, allowing separate, but correlated, random intercepts at each measurement occasion. The proposed model allows for a flexible correlation structure among the random intercepts, where the correlations can be interpreted in terms of Kendall's τ. For example, the marginal correlations among the repeated binary outcomes can decline with increasing time separation, while the model retains the property of having matching conditional and marginal logit link functions. Finally, the proposed method is used to analyze data from a longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women.
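The property of the Wang and Louis (2003) random intercept that this model builds on is that, when the intercept b follows the "bridge" distribution with parameter φ in (0, 1), averaging the conditional logistic curve over b gives another logistic curve whose coefficients are multiplied by φ. The sketch below checks this numerically; the density formula is stated as an assumption and verified only by quadrature, and the function names and the values of φ and η are illustrative.

```python
import math

def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def bridge_density(b: float, phi: float) -> float:
    """Assumed bridge density for the logit link, 0 < phi < 1."""
    return math.sin(phi * math.pi) / (2 * math.pi * (math.cosh(phi * b) + math.cos(phi * math.pi)))

def bridge_variance(phi: float) -> float:
    """Variance of the bridge distribution: (pi^2 / 3) * (1 / phi^2 - 1)."""
    return (math.pi ** 2 / 3) * (1 / phi ** 2 - 1)

def phi_from_variance(sigma2: float) -> float:
    """Attenuation factor phi implied by a random-intercept variance sigma2."""
    return 1 / math.sqrt(1 + 3 * sigma2 / math.pi ** 2)

def marginal_prob(eta: float, phi: float, half_width: float = 80.0, step: float = 0.01) -> float:
    """Trapezoidal approximation of E[expit(eta + b)] with b ~ bridge(phi)."""
    n = round(2 * half_width / step)
    total = 0.0
    for i in range(n + 1):
        b = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * expit(eta + b) * bridge_density(b, phi)
    return total * step

phi, eta = 0.6, 1.3
# The marginal probability should match a logistic curve with linear predictor phi * eta
print(marginal_prob(eta, phi), expit(phi * eta))
```

Because the marginal and conditional links are both logit, marginal coefficients are simply φ times the conditional ones, which is the "matching link functions" property the abstract refers to.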

17.

Coherent mortality forecasting by the weighted multilevel functional principal component approach

Ruhao Wu 《Journal of Applied Statistics》2019,46(10):1774-1791

In human mortality modelling, if a population consists of several subpopulations, it can be desirable to model their mortality rates simultaneously while taking into account the heterogeneity among them. Mortality forecasting methods tend to produce divergent forecasts for subpopulations when independence is assumed. However, for subpopulations with closely related social, economic and biological backgrounds, mortality patterns are expected to remain non-divergent in the future. In this article, we propose a new method for coherent modelling and forecasting of mortality rates for multiple subpopulations, in the sense of non-divergent life expectancy among subpopulations. The mortality rates of subpopulations are treated as multilevel functional data, and a weighted multilevel functional principal component (wMFPCA) approach is proposed to model and forecast them. The proposed model is applied to sex-specific data for nine developed countries, and the results show that, in terms of overall forecasting accuracy, the model outperforms the independent model, the Product-Ratio model and the unweighted multilevel functional principal component approach.

18.

Robust interval forecasting algorithm based on a probabilistic cluster model

Yury Krakovsky 《Journal of Statistical Computation and Simulation》2018,88(12):2309-2324

Forecasts of dynamic indicators are used to substantiate managerial decisions, so their forecasting accuracy must be acceptable, and forecasting algorithms are continually improved to achieve it. This paper considers a variant of forecasting binary outcomes, named ‘interval forecasting’, which predicts whether or not a future value of an indicator will exceed a predetermined value. A robust interval forecasting algorithm based on a probabilistic cluster model is proposed, and its accuracy is compared with that of an algorithm based on logistic regression, using indicators with different statistical properties. The results show that the two algorithms have approximately similar accuracy in most cases. However, cases were identified in which the logistic regression algorithm showed unacceptable accuracy while the proposed algorithm did not. Thus, the new algorithm is more accurate.

19.

Assessing the performance of variational methods for mixed logistic regression models

《Journal of Statistical Computation and Simulation》2012,82(8):765-779

We present a variational estimation method for the mixed logistic regression model. The method is based on a lower bound approximation of the logistic function [Jaakkola, T.S. and Jordan, M.I., 2000, Bayesian parameter estimation via variational methods. Statistics and Computing, 10, 25–37]. Based on this approximation, an EM algorithm can be derived that considerably simplifies the maximization problem, because it does not require the numerical evaluation of integrals over the random effects. We assess the performance of the variational method for the mixed logistic regression model in a simulation study and an empirical data example, and compare it with Laplace's method. The results indicate that the variational method is a viable choice for estimating the fixed effects of the mixed logistic regression model provided that the number of outcomes within each cluster is sufficiently high.
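The Jaakkola–Jordan bound replaces log σ(x) with a function that is quadratic in x, so when x is a linear predictor containing normally distributed random effects, its expectation is available in closed form and no numerical integration is needed. A minimal sketch of the bound for a single predictor value x and variational parameter ξ (function names are hypothetical):

```python
import math

def log_sigmoid(x: float) -> float:
    """log(1 / (1 + exp(-x))), computed stably for either sign of x."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def jj_lambda(xi: float) -> float:
    """lambda(xi) = tanh(xi / 2) / (4 * xi), with its limiting value 1/8 at xi = 0."""
    return 0.125 if xi == 0 else math.tanh(xi / 2) / (4 * xi)

def jj_lower_bound(x: float, xi: float) -> float:
    """Quadratic lower bound on log sigmoid(x); tight at x = xi and x = -xi."""
    return log_sigmoid(xi) + (x - xi) / 2 - jj_lambda(xi) * (x * x - xi * xi)

for x in (-3.0, 0.0, 1.5, 3.0):
    print(x, log_sigmoid(x), jj_lower_bound(x, xi=1.5))
```

In an EM scheme, ξ is updated for each observation so that the bound is tight where most of the posterior mass of the linear predictor lies.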

20.

A resampling approach to estimate variance components of multilevel models

Zilin Wang, Mary E. Thompson 《The Canadian Journal of Statistics》2012,40(1):150-171

In a multilevel model for complex survey data, the weight‐inflated estimators of variance components can be biased. We propose a resampling method to correct this bias. The performance of the bias corrected estimators is studied through simulations using populations generated from a simple random effects model. The simulations show that, without lowering the precision, the proposed procedure can reduce the bias of the estimators, especially for designs that are both informative and have small cluster sizes. Application of these resampling procedures to data from an artificial workplace survey provides further evidence for the empirical value of this method.