1.

Multilevel modelling of complex survey data

Sophia Rabe-Hesketh Anders Skrondal 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(4):805-827

Summary. Multilevel modelling is sometimes used for data from complex surveys involving multistage sampling, unequal sampling probabilities and stratification. We consider generalized linear mixed models and particularly the case of dichotomous responses. A pseudolikelihood approach for accommodating inverse probability weights in multilevel models with an arbitrary number of levels is implemented by using adaptive quadrature. A sandwich estimator is used to obtain standard errors that account for stratification and clustering. When level 1 weights are used that vary between elementary units in clusters, the scaling of the weights becomes important. We point out that not only variance components but also regression coefficients can be severely biased when the response is dichotomous. The pseudolikelihood methodology is applied to complex survey data on reading proficiency from the American sample of the 'Program for international student assessment' 2000 study, using the Stata program gllamm which can estimate a wide range of multilevel and latent variable models. Performance of pseudo-maximum-likelihood with different methods for handling level 1 weights is investigated in a Monte Carlo experiment. Pseudo-maximum-likelihood estimators of (conditional) regression coefficients perform well for large cluster sizes but are biased for small cluster sizes. In contrast, estimators of marginal effects perform well in both situations. We conclude that caution must be exercised in pseudo-maximum-likelihood estimation for small cluster sizes when level 1 weights are used.

2.

Hierarchical Bayesian Models for the Estimation of Correlated Effects in Multilevel Data: A Simulation Study to Assess Model Performance

Giulia Roli Paola Monari 《Communications in Statistics - Theory and Methods》2013,42(12):2644-2653

In this article, we aim at assessing hierarchical Bayesian modeling for the analysis of multiple exposures and highly correlated effects in a multilevel setting. We exploit an artificial data set to apply our method and show the gains in the final estimates of the crucial parameters. As a motivating example to simulate data, we consider a real prospective cohort study designed to investigate the association of dietary exposures with the occurrence of colon-rectum cancer in a multilevel framework, where, e.g., individuals have been enrolled from different countries or cities. We rely on the presence of some additional information suitable to mediate the final effects of the exposures and to be arranged in a level-2 regression to model similarities among the parameters of interest (e.g., data on the nutrient compositions for each dietary item).

3.

The effects of missing serial effects and/or heteroscedastic errors on mixed models using repeated growth data

《Journal of Statistical Computation and Simulation》2012,82(16):3367-3382

When a two-level multilevel model (MLM) is used for repeated growth data, the individuals constitute level 2 and the successive measurements constitute level 1, which is nested within the individuals that make up level 2. The heterogeneity among individuals is represented by either the random-intercept or random-coefficient (slope) model. The variance components at level 1 involve serial effects and measurement errors under constant variance or heteroscedasticity. This study hypothesizes that missing serial effects and/or heteroscedasticity may bias the results obtained from two-level models. To illustrate this effect, we conducted two simulation studies, where the simulated data were based on the characteristics of an empirical mouse tumour data set. The results suggest that for repeated growth data with constant variance (measurement error) and misspecified serial effects (ρ > 0.3), the proportion of level-2 variation (intra-class correlation coefficient) increases with ρ and the two-level random-coefficient model is the minimum AIC (or AIC_c) model when compared with the fixed model, heteroscedasticity model, and random-intercept model. In addition, when the serial effect (ρ > 0.1) and heteroscedasticity are both misspecified, the two-level random-coefficient model is the minimum AIC (or AIC_c) model when compared with the fixed model and random-intercept model. This study demonstrates that missing serial effects and/or heteroscedasticity may indicate heterogeneity among individuals in repeated growth data (mixed or two-level MLM). This issue is critical in biomedical research.

4.

An improved MLM model and its parameter estimation based on the EM algorithm

Min Suqin (闵素芹) He Xiaoqun (何晓群) 《Statistics & Information Forum》2013,28(4):14-18

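Traditional hierarchical models assume that groups are independent of one another and do not account for correlation between groups. Data grouped by geographical unit, however, often exhibit spatial dependence: individuals are affected not only by their own region but possibly also by neighbouring regions. In that case the distributional assumption on the level-2 residuals of the traditional hierarchical model no longer holds. To handle spatially hierarchical data, ideas from spatial statistics and spatial econometric models are introduced into the hierarchical model, so that both the hierarchical structure and the spatial correlation are taken into account. A spatial hierarchical linear model is proposed, and maximum likelihood estimators of its fixed effects, variance-covariance components and spatial regression parameters are given; the Fisher scoring algorithm is used in combination with the EM algorithm.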

5.

Decomposition of Prediction Error in Multilevel Models

D. Afshartous J. de Leeuw 《Communications in Statistics - Simulation and Computation》2013,42(4):909-928

We present a decomposition of prediction error for the multilevel model in the context of predicting a future observable y_*j in the jth group of a hierarchical dataset. The multilevel prediction rule is used for prediction and the components of prediction error are estimated via a simulation study that spans the various combinations of level-1 (individual) and level-2 (group) sample sizes and different intraclass correlation values. Additionally, analytical results present the increase in predicted mean square error (PMSE) with respect to prediction error bias. The components of prediction error provide information with respect to the cost of parameter estimation versus data imputation for predicting future values in a hierarchical data set. Specifically, the cost of parameter estimation is very small compared to data imputation.

6.

Importance of sampling weights in multilevel modeling of international large-scale assessment data

Inga Laukaityte Marie Wiberg 《Communications in Statistics - Theory and Methods》2018,47(20):4991-5012

Multilevel modeling is an important tool for analyzing large-scale assessment data. However, the standard multilevel modeling will typically give biased results for such complex survey data. This bias can be eliminated by introducing design weights, which must be used carefully as they can affect the results. The aim of this paper is to examine different approaches and to give recommendations concerning handling design weights in multilevel models when analyzing large-scale assessments such as TIMSS (The Trends in International Mathematics and Science Study). To achieve the goal of the paper, we examined real data from two countries and included a simulation study. The analyses in the empirical study showed that using no weights or only level 1 weights could sometimes lead to misleading conclusions. The simulation study showed only small differences in estimation between the weighted and unweighted models when informative design weights were used. The use of unscaled or not rescaled weights, however, caused significant differences in some parameter estimates.

7.

One-sample location tests for multilevel data

Denis Larocque Jaakko Nevalainen Hannu Oja 《Journal of statistical planning and inference》2008,138(8):2469-2482

In this paper, we consider testing the location parameter with multilevel (or hierarchical) data. A general family of weighted test statistics is introduced. This family includes extensions to the case of multilevel data of familiar procedures like the t, the sign and the Wilcoxon signed-rank tests. Under mild assumptions, the test statistics have a null limiting normal distribution which facilitates their use. An investigation of the relative merits of selected members of the family of tests is achieved theoretically by deriving their asymptotic relative efficiency (ARE) and empirically via a simulation study. It is shown that the performance of a test depends on the cluster configurations and on the intracluster correlations. Explicit formulas for optimal weights and a discussion of the impact of omitting a level are provided for 2- and 3-level data. It is shown that using appropriate weights can greatly improve the performance of the tests. Finally, the use of the new tests is illustrated with a real data example.

8.

Evaluation of Conditional Weight Approximations for Two-Level Models

Laura Stapleton 《Communications in Statistics - Simulation and Computation》2013,42(2):182-204

This article evaluates two methods of approximating cluster-level and conditional sampling weights when only unconditional sampling weights are available. For estimation of a multilevel analysis that does not include all facets of a sampling design, conditional sampling weights at each stage of the model should be used, but typically only the unconditional sampling weight of the ultimate sampling unit is provided on federal publicly-released datasets. Methods of approximating these conditional weights have been suggested but there has been no study of their adequacy. This demonstration and simulation study examines the feasibility of using these weight approximations.

9.

Weighting for unequal selection probabilities in multilevel models

D. Pfeffermann C. J. Skinner D. J. Holmes H. Goldstein & J. Rasbash 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1998,60(1):23-40

When multilevel models are estimated from survey data derived using multistage sampling, unequal selection probabilities at any stage of sampling may induce bias in standard estimators, unless the sources of the unequal probabilities are fully controlled for in the covariates. This paper proposes alternative ways of weighting the estimation of a two-level model by using the reciprocals of the selection probabilities at each stage of sampling. Consistent estimators are obtained when both the sample number of level 2 units and the sample number of level 1 units within sampled level 2 units increase. Scaling of the weights is proposed to improve the properties of the estimators and to simplify computation. Variance estimators are also proposed. In a limited simulation study the scaled weighted estimators are found to perform well, although non-negligible bias starts to arise for informative designs when the sample number of level 1 units becomes small. The variance estimators perform extremely well. The procedures are illustrated using data from the survey of psychiatric morbidity.
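
To make the idea of scaling level 1 weights concrete, the following is a minimal sketch of two rescalings that are commonly discussed in this literature: one makes the weights within a cluster sum to the cluster sample size, the other makes them sum to an "effective" sample size. The abstract does not spell out its scaling formulas, so the function names and the choice of these two rules are illustrative assumptions rather than the authors' exact procedure.

```python
from typing import List


def scale_to_cluster_size(weights: List[float]) -> List[float]:
    """Rescale the level 1 weights of one cluster so they sum to n_j,
    the number of sampled units in that cluster (illustrative rule)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


def scale_to_effective_size(weights: List[float]) -> List[float]:
    """Rescale the level 1 weights of one cluster so they sum to the
    effective sample size (sum w)^2 / sum(w^2) (illustrative rule)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return [w * total / total_sq for w in weights]


if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0]  # hypothetical inverse selection probabilities
    print(scale_to_cluster_size(raw))
    print(scale_to_effective_size(raw))
```

When all weights in a cluster are equal, both rules return weights of 1, so scaling matters only when the level 1 weights vary within clusters.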

10.

NONPARAMETRIC ESTIMATION OF CONDITIONAL CUMULATIVE HAZARDS FOR MISSING POPULATION MARKS

Dipankar Bandyopadhyay Amalia Jácome Pumar 《Australian & New Zealand Journal of Statistics》2010,52(1):75-91

A new function for the competing risks model, the conditional cumulative hazard function, is introduced, from which the conditional distribution of failure times of individuals failing due to cause j can be studied. The standard Nelson–Aalen estimator is not appropriate in this setting, as population membership (mark) information may be missing for some individuals owing to random right-censoring. We propose the use of imputed population marks for the censored individuals through fractional risk sets. Some asymptotic properties, including uniform strong consistency, are established. We study the practical performance of this estimator through simulation studies and apply it to a real data set for illustration.
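
For reference, the standard Nelson–Aalen estimator mentioned in this abstract accumulates, at each distinct event time, the number of events divided by the number of individuals still at risk. The sketch below shows only that baseline estimator for a single population with right-censoring; it is not the fractional-risk-set estimator proposed in the paper, and the function name and data layout are assumptions made for illustration.

```python
from typing import List, Tuple


def nelson_aalen(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Standard Nelson-Aalen cumulative hazard estimate.

    times  : observed times (event or censoring)
    events : 1 if the time is an observed event, 0 if right-censored
    Returns (t, H(t)) at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cumulative = 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            cumulative += n_events / n_at_risk
            result.append((t, cumulative))
        # Events and censored observations both leave the risk set.
        n_at_risk -= n_leaving
    return result


if __name__ == "__main__":
    print(nelson_aalen([1, 2, 2, 3], [1, 1, 0, 1]))
```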

11.

Pseudolikelihood estimation in a class of problems with response-related missing covariates

X. Joan HU Jerald F. Lawless 《Revue canadienne de statistique》1997,25(2):125-142

Many practical situations involve a response variable Y and covariates X, where data on (Y, X) are incomplete for some portion of a sample of individuals. We consider two general types of pseudolikelihood estimation for problems in which missingness may be response-related. These are typically simpler to implement than ordinary maximum likelihood, which in this context is semiparametric. Asymptotics for the pseudolikelihood methods are presented, and simulations conducted to investigate the methods for an important class of problems involving lifetime data. Our results indicate that for these problems the two methods are effective and comparable with respect to efficiency.

12.

A study of missing-data imputation methods based on hierarchical models

Yu Lichao (于力超) Jin Yongjin (金勇进) 《Statistical Research》2018,35(11):93-104

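Large-scale sample surveys mostly use complex sampling designs and yield survey data sets with a hierarchical nested structure, in which missing data are unavoidable; imputation strategies for hierarchically structured data sets with missing values have so far received little study. This paper applies the Gibbs algorithm to the multiple imputation of hierarchical data sets with missing values and studies both fixed-effects model imputation and random-effects model imputation. Through theoretical derivation and numerical simulation, under different intraclass correlation coefficients, group sizes and proportions of missing data, the imputation performance of the methods is compared in terms of the unbiasedness and efficiency of the resulting parameter estimates, and recommendations for choosing an imputation model are given. The results show that parameter estimates are more accurate when a random-effects model is used as the imputation model, whereas a fixed-effects imputation model is relatively simple to apply. A fixed-effects imputation model can be used in situations such as a small proportion of missing data, a large intraclass correlation coefficient or a large group size; otherwise a random-effects imputation model is recommended.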

13.

Multilevel regression mixture analysis

Bengt Muthén Tihomir Asparouhov 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(3):639-657

Summary. A two-level regression mixture model is discussed and contrasted with the conventional two-level regression model. Simulated and real data shed light on the modelling alternatives. The real data analyses investigate gender differences in mathematics achievement from the US National Education Longitudinal Survey. The two-level regression mixture analyses show that unobserved heterogeneity should not be presupposed to exist only at level 2 at the expense of level 1. Both the simulated and the real data analyses show that level 1 heterogeneity in the form of latent classes can be mistaken for level 2 heterogeneity in the form of the random effects that are used in conventional two-level regression analysis. Because of this, mixture models have an important role to play in multilevel regression analyses. Mixture models allow heterogeneity to be investigated more fully, more correctly attributing different portions of the heterogeneity to the different levels.

14.

A comparison of univariate and multivariate multilevel models for repeated measures of use of antenatal care in Uttar Pradesh

Paula L. Griffiths James J. Brown Peter W. F. Smith 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2004,167(4):597-611

Summary. We compare two different multilevel modelling approaches to the analysis of repeated measures data to assess the effect of mother level characteristics on women's use of prenatal care services in Uttar Pradesh, India. We apply univariate multilevel models to our data and find that the model assumptions are severely violated and the parameter estimates are not stable, particularly for the mother level random effect. To overcome this we apply a multivariate multilevel model. The correlation structure shows that, once the decision has been made regarding use of antenatal care by the mother for her first observed birth in the data, she does not tend to change this decision for higher order births.

15.

Missing data techniques for multilevel data: implications of model misspecification

Anne C. Black Ofer Harel D. Betsy McCoach 《Journal of applied statistics》2011,38(9):1845-1865

When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation model misspecification on the quality of parameter estimates by employing multiple imputation under assumptions of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM to estimates from MI assuming a linear mixed model (MI/LMM) and maximum likelihood estimation to the same incomplete data sets. With 10% missing data (MD), techniques performed similarly for fixed-effects estimates, but variance components were biased with MI/NM. Effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates but performance was negatively affected by increased rates of MD.

16.

Semiparametric inference for estimating equations with nonignorably missing covariates

Ji Chen Fang Fang 《Journal of nonparametric statistics》2018,30(3):796-812

We consider statistical inference of unknown parameters in estimating equations (EEs) when some covariates have nonignorably missing values, which is quite common in practice but has rarely been discussed in the literature. When an instrument, a fully observed covariate vector that helps identify parameters under nonignorable missingness, is available, the conditional distribution of the missing covariates given other covariates can be estimated by the pseudolikelihood method of Zhao and Shao [(2015), ‘Semiparametric pseudo likelihoods in generalised linear models with nonignorable missing data’, Journal of the American Statistical Association, 110, 1577–1590] and be used to construct unbiased EEs. These modified EEs then constitute a basis for valid inference by empirical likelihood. Our method is applicable to a wide range of EEs used in practice. It is semiparametric since no parametric model for the propensity of missing covariate data is assumed. Asymptotic properties of the proposed estimator and the empirical likelihood ratio test statistic are derived. Some simulation results and a real data analysis are presented for illustration.

17.

Determinants of Contraceptive Use in Egypt: A Multilevel Approach

Caterina Giusti Daniele Vignoli 《Statistical Methods and Applications》2006,15(1):89-106

The increasing use of family planning methods seems to be the intermediate determinant that most influences the fertility decline in developing countries, particularly in countries that are in an advanced phase of demographic transition, such as Egypt. Moreover, large countries like Egypt are characterized by very different geographical realities and even by strong regional heterogeneities. The aim of this study is the analysis of the determinants of contraceptive use in Egypt, with particular reference to the differentials due to the socio-economic context and to the area of residence. To estimate the effect of each individual and regional factor on contraceptive use, a logistic two-level random intercept model is fitted to EDHS 2000 data; the use of a multilevel analysis is suggested by the two-level data structure: the first level units are the women, the second level units are their regions of residence.

18.

Inference for longitudinal data from complex sampling surveys: An approach based on quadratic inference functions

Laura Dumitrescu Wei Qian J. N. K. Rao 《Scandinavian Journal of Statistics》2021,48(1):246-274

We propose a survey weighted quadratic inference function method for the analysis of data collected from longitudinal surveys, as an alternative to the survey weighted generalized estimating equation method. The procedure yields estimators of model parameters, which are shown to be consistent and have a limiting normal distribution. Furthermore, based on the inference function, a pseudolikelihood ratio type statistic for testing a composite hypothesis on model parameters and a statistic for testing the goodness of fit of the assumed model are proposed. We establish their asymptotic distributions as weighted sums of independent chi‐squared random variables and obtain Rao‐Scott corrections to those statistics leading to a chi‐squared distribution, approximately. We examine the performance of the proposed methods in a simulation study.

19.

A protective estimator for longitudinal binary data subject to non-ignorable non-monotone missingness

Garrett M. Fitzmaurice Stuart R. Lipsitz Geert Molenberghs Joseph G. Ibrahim 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2005,168(4):723-735

Summary. In longitudinal studies missing data are the rule not the exception. We consider the analysis of longitudinal binary data with non-monotone missingness that is thought to be non-ignorable. In this setting a full likelihood approach is complicated algebraically and can be computationally prohibitive when there are many measurement occasions. We propose a 'protective' estimator that assumes that the probability that a response is missing at any occasion depends, in a completely unspecified way, on the value of that variable alone. Relying on this 'protectiveness' assumption, we describe a pseudolikelihood estimator of the regression parameters under non-ignorable missingness, without having to model the missing data mechanism directly. The method proposed is applied to CD4 cell count data from two longitudinal clinical trials of patients infected with the human immunodeficiency virus.

20.

Practical Maximum Pseudolikelihood for Spatial Point Patterns (with Discussion)

Adrian Baddeley & Rolf Turner 《Australian & New Zealand Journal of Statistics》2000,42(3):283-322

This paper describes a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman & Turner's (1992) device for maximizing the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood is known explicitly, except for the computation of an integral over the sampling region. Approximation of this integral by a finite sum in a special way yields an approximate pseudolikelihood which is formally equivalent to the (weighted) likelihood of a loglinear model with Poisson responses. This can be maximized using standard statistical software for generalized linear or additive models, provided the conditional intensity of the process takes an 'exponential family' form. Using this approach a wide variety of spatial point process models of Gibbs type can be fitted rapidly, incorporating spatial trends, interaction between points, dependence on spatial covariates, and mark information.