1.

Multilevel modelling of complex survey data

Sophia Rabe-Hesketh Anders Skrondal 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(4):805-827

Summary. Multilevel modelling is sometimes used for data from complex surveys involving multistage sampling, unequal sampling probabilities and stratification. We consider generalized linear mixed models and particularly the case of dichotomous responses. A pseudolikelihood approach for accommodating inverse probability weights in multilevel models with an arbitrary number of levels is implemented by using adaptive quadrature. A sandwich estimator is used to obtain standard errors that account for stratification and clustering. When level 1 weights are used that vary between elementary units in clusters, the scaling of the weights becomes important. We point out that not only variance components but also regression coefficients can be severely biased when the response is dichotomous. The pseudolikelihood methodology is applied to complex survey data on reading proficiency from the American sample of the 'Program for international student assessment' 2000 study, using the Stata program gllamm which can estimate a wide range of multilevel and latent variable models. Performance of pseudo-maximum-likelihood with different methods for handling level 1 weights is investigated in a Monte Carlo experiment. Pseudo-maximum-likelihood estimators of (conditional) regression coefficients perform well for large cluster sizes but are biased for small cluster sizes. In contrast, estimators of marginal effects perform well in both situations. We conclude that caution must be exercised in pseudo-maximum-likelihood estimation for small cluster sizes when level 1 weights are used.
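The abstract above notes that the scaling of level 1 weights matters but does not spell out any particular scaling. As a rough illustration only, the sketch below shows two scalings that are commonly discussed in the multilevel survey literature: one makes the scaled weights in each cluster sum to the cluster sample size, the other makes them sum to the "effective" sample size (Σw)² / Σw². The function name `scale_level1_weights` and the `method` labels are hypothetical and are not taken from the paper or from gllamm.

```python
from collections import defaultdict


def scale_level1_weights(weights, clusters, method="size"):
    """Rescale level 1 weights within each cluster (illustrative sketch).

    method="size":      scaled weights in a cluster sum to its sample size n_j.
    method="effective": scaled weights in a cluster sum to the effective
                        sample size (sum of w)^2 / (sum of w^2).
    Both scalings are assumptions for illustration, not the paper's notation.
    """
    sums = defaultdict(float)     # sum of weights per cluster
    sq_sums = defaultdict(float)  # sum of squared weights per cluster
    counts = defaultdict(int)     # number of units per cluster
    for w, c in zip(weights, clusters):
        sums[c] += w
        sq_sums[c] += w * w
        counts[c] += 1

    scaled = []
    for w, c in zip(weights, clusters):
        if method == "size":
            factor = counts[c] / sums[c]
        elif method == "effective":
            factor = sums[c] / sq_sums[c]
        else:
            raise ValueError("method must be 'size' or 'effective'")
        scaled.append(w * factor)
    return scaled


if __name__ == "__main__":
    raw = [1.0, 3.0, 2.0, 2.0]
    cluster_ids = ["a", "a", "b", "b"]
    print(scale_level1_weights(raw, cluster_ids, method="size"))
    print(scale_level1_weights(raw, cluster_ids, method="effective"))
```

When the weights are constant within a cluster, both scalings reduce every scaled weight to 1, which is one way to see why the choice only matters when level 1 weights vary within clusters.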

2.

Estimated estimating equations: semiparametric inference for clustered and longitudinal data

Jeng-Min Chiou Hans-Georg Müller 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2005,67(4):531-553

Summary. We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed effects linear predictor with unknown smooth link, and variance–covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance–covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.

3.

Clustered data with small sample sizes: Comparing the performance of model-based and design-based approaches

Daniel M. McNeish Jeffery R. Harring 《Communications in Statistics - Simulation and Computation》2017,46(2):855-869

Two classes of methods properly account for clustering of data: design-based methods and model-based methods. Estimates from both methods have been shown to be approximately equal with large samples. However, both classes are known to produce biased standard error estimates with small samples. This paper compares the bias of standard errors and statistical power of marginal effects for generalized estimating equations (a design-based method) and generalized/linear mixed effects models (model-based methods) with small sample sizes via a simulation study. Provided that the distributional assumptions are met, model-based methods produced the least-biased standard error estimates and greater relative statistical power.

4.

Weighted empirical adaptive variance estimators for correlated data regression

T. Lumley & P. Heagerty 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(2):459-477

Estimating equations based on marginal generalized linear models are useful for regression modelling of correlated data, but inference and testing require reliable estimates of standard errors. We introduce a class of variance estimators based on the weighted empirical variance of the estimating functions and show that an adaptive choice of weights allows reliable estimation both asymptotically and by simulation in finite samples. Connections with previous bootstrap and jackknife methods are explored. The effect of reliable variance estimation is illustrated in data on health effects of air pollution in King County, Washington.

5.

Predicting random effects with an expanded finite population mixed model

Edward J. Stanek III Julio M. Singer 《Journal of statistical planning and inference》2008

Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer [Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 119–130] developed best linear unbiased predictors (BLUP) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also consider simulation studies to illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model.

6.

Composite Likelihood Estimation for Multivariate Probit Latent Traits Models

M.-L. Feddag 《Communications in Statistics - Theory and Methods》2013,42(14):2551-2566

Inference in generalized linear mixed models with multivariate random effects is often made cumbersome by the high-dimensional intractable integrals involved in the marginal likelihood. This article presents an inferential methodology based on the marginal composite likelihood approach for probit latent traits models. This method, which belongs to the broad class of pseudo-likelihood approaches, involves marginal pairwise probabilities of the responses, which have analytical expressions. The results are illustrated with a simulation study and with an analysis of real data on health-related quality of life.

7.

Effects of the estimation of covariance matrix parameters in the generalized multivariate linear model

Gregory C. Reinsel 《Communications in Statistics - Theory and Methods》2013,42(5):639-650

We consider the generalized multivariate linear model and assume that the covariance matrix of the p × 1 vector of responses on a given individual can be represented in the general linear structure form described by Anderson (1973). The effects of the use of estimates of the parameters of the covariance matrix on the generalized least squares estimator of the regression coefficients and on the prediction of a portion of a future vector, when only the first portion of the vector has been observed, are investigated. Approximations are derived for the covariance matrix of the generalized least squares estimator and for the mean square error matrix of the usual predictor, for the practical case where estimated parameters are used.

8.

Functional Mixed Effects Model for Small Area Estimation

Tapabrata Maiti Samiran Sinha Ping‐Shou Zhong 《Scandinavian Journal of Statistics》2016,43(3):886-903

Functional data analysis has become an important area of research because of its ability to handle high-dimensional and complex data structures. However, its development has been limited in the context of linear mixed effect models and, in particular, for small area estimation. Linear mixed effect models are the backbone of small area estimation. In this article, we consider area-level data and fit a varying coefficient linear mixed effect model where the varying coefficients are semiparametrically modelled via B-splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.

9.

Aggregated functional data model for near-infrared spectroscopy calibration and prediction

Ronaldo Dias Guilherme Ludwig Marley A. Saraiva 《Journal of applied statistics》2015,42(1):127-143

Calibration and prediction for NIR spectroscopy data are performed based on a functional interpretation of the Beer–Lambert formula. Considering that, for each chemical sample, the resulting spectrum is a continuous curve obtained as the summation of overlapped absorption spectra from each analyte plus a Gaussian error, we assume that each individual spectrum can be expanded as a linear combination of B-spline basis functions. Calibration is then performed using two procedures for estimating the individual analytes’ curves: basis smoothing and smoothing splines. Prediction is done by minimizing the squared error of prediction. To assess the variance of the predicted values, we use a leave-one-out jackknife technique. Departures from the standard error models are discussed through a simulation study, in particular, how correlated errors affect the calibration step and consequently the prediction of the analytes’ concentrations. Finally, the performance of our methodology is demonstrated through the analysis of two publicly available datasets.
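The abstract above mentions a leave-one-out jackknife for assessing variance. The sketch below shows the generic leave-one-out jackknife variance formula for an arbitrary estimator applied to a plain list of numbers. It is an illustration of the general technique only and does not reproduce the paper's spectroscopy pipeline; the function name `jackknife_variance` is hypothetical.

```python
def jackknife_variance(data, estimator):
    """Leave-one-out jackknife variance estimate of `estimator(data)`.

    `data` is a list of observations and `estimator` maps a list to a number.
    Formula: (n - 1) / n * sum_i (theta_(i) - mean(theta_(.)))^2,
    where theta_(i) is the estimate with observation i removed.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Re-estimate with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)


def sample_mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    observations = [1.0, 2.0, 3.0, 4.0]
    print(jackknife_variance(observations, sample_mean))
```

For the sample mean, this estimate coincides with the usual s²/n, which gives a convenient sanity check before applying the same function to a more complicated estimator.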

10.

Generalized Estimating Equations to Binary Probit Model

M-L. Feddag 《Communications in Statistics - Theory and Methods》2014,43(19):3997-4010

Inference in generalized linear mixed models with multivariate random effects is often made cumbersome by the high-dimensional intractable integrals involved in the marginal likelihood. This article presents an inferential methodology based on generalized estimating equations for probit latent traits models. This method, which belongs to the broad class of semiparametric approaches, involves marginal joint moments of orders 1 and 2, which have analytical expressions. The results are illustrated with a simulation study.

11.

Theoretical evaluation of prediction error in linear regression with a bivariate response variable containing missing data

Lars Erik Gangsei Trygve Almøy Solve Sæbø 《Communications in Statistics - Theory and Methods》2017,46(20):9921-9929

Methods for linear regression with multivariate response variables are well described in statistical literature. In this study we conduct a theoretical evaluation of the expected squared prediction error in bivariate linear regression where one of the response variables contains missing data. We make the assumption of known covariance structure for the error terms. On this basis, we evaluate three well-known estimators: standard ordinary least squares, generalized least squares, and a James–Stein inspired estimator. Theoretical risk functions are worked out for all three estimators to evaluate under which circumstances it is advantageous to take the error covariance structure into account.

12.

Multilevel Mixed Linear Models for Survival Data (cited 2 times: 0 self-citations, 2 by others)

Ha ID Lee Y 《Lifetime data analysis》2005,11(1):131-142

For the analysis of correlated survival data, mixed linear models are useful alternatives to frailty models. With them, the survival times can be modelled directly, so that the interpretation of the fixed and random effects is straightforward. However, because of the intractable integration involved in the use of the marginal likelihood, the class of models in use has been severely restricted. This difficulty can be avoided by using hierarchical likelihood, which provides a statistically efficient and fast fitting algorithm for multilevel models. The proposed method is illustrated using the chronic granulomatous disease data. A simulation study is carried out to evaluate the performance.

13.

Diagnostics for elliptical linear mixed models with first-order autoregressive errors

《Journal of Statistical Computation and Simulation》2012,82(10):1281-1296

For longitudinal time series data, linear mixed models that contain both random effects across individuals and first-order autoregressive errors within individuals may be appropriate. Some statistical diagnostics based on the models under a proposed elliptical error structure are developed in this work. It is well known that the class of elliptical distributions offers a more flexible framework for modelling since it contains both light- and heavy-tailed distributions. Iterative procedures for the maximum-likelihood estimates of the model parameters are presented. Score tests for the presence of autocorrelation and the homogeneity of autocorrelation coefficients among individuals are constructed. The properties of test statistics are investigated through Monte Carlo simulations. The local influence method for the models is also given. The analysis of a real data set illustrates the value of the models and diagnostic statistics.
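To make the covariance structure described above concrete, here is a minimal sketch of the within-individual covariance matrix implied by a random intercept plus stationary first-order autoregressive errors. The simplifications are assumptions for illustration: a single random intercept with variance `sigma2_b`, a stationary AR(1) error with marginal variance `sigma2_e` and autocorrelation `rho`, and finite second moments. The paper's elliptical models are more general, and the function name is hypothetical.

```python
def ar1_random_intercept_cov(n_times, sigma2_b, sigma2_e, rho):
    """Covariance matrix of one individual's n_times responses.

    Assumed model (illustrative): y_t = mean_t + b + e_t, with
    Var(b) = sigma2_b and Cov(e_s, e_t) = sigma2_e * rho ** |s - t|.
    Entry (s, t) is therefore sigma2_b + sigma2_e * rho ** |s - t|.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [
        [sigma2_b + sigma2_e * rho ** abs(s - t) for t in range(n_times)]
        for s in range(n_times)
    ]


if __name__ == "__main__":
    for row in ar1_random_intercept_cov(3, sigma2_b=1.0, sigma2_e=2.0, rho=0.5):
        print(row)
```

Setting `rho` to 0 reduces the matrix to the compound-symmetry structure of an ordinary random-intercept model, which is the null case that a test for the presence of autocorrelation would target.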

14.

Computationally feasible estimation of the covariance structure in generalized linear mixed models

《Journal of Statistical Computation and Simulation》2012,82(12):1229-1239

In this paper, we discuss how a regression model with a non-continuous response variable, which allows for dependency between observations, should be estimated when observations are clustered and measurements on the subjects are repeated. The cluster sizes are assumed to be large. We find that the conventional estimation technique suggested by the literature on generalized linear mixed models (GLMM) is slow and sometimes fails due to non-convergence and lack of memory on standard PCs. We suggest estimating the random effects as fixed effects with a generalized linear model and deriving the covariance matrix from these estimates. A simulation study shows that our proposal is feasible in terms of mean-square error and computation time. We recommend that our proposal be implemented in GLMM software so that the estimation procedure can switch between the conventional technique and our proposal, depending on the size of the clusters.

15.

The effect of number of clusters and cluster size on statistical power and Type I error rates when testing random effects variance components in multilevel linear and logistic regression models

Peter C. Austin George Leckie 《Journal of Statistical Computation and Simulation》2018,88(16):3151-3163

When using multilevel regression models that incorporate cluster-specific random effects, the Wald and the likelihood ratio (LR) tests are used for testing the null hypothesis that the variance of the random effects distribution is equal to zero. We conducted a series of Monte Carlo simulations to examine the effect of the number of clusters and the number of subjects per cluster on the statistical power to detect a non-null random effects variance and to compare the empirical type I error rates of the Wald and LR tests. Statistical power increased with increasing number of clusters and number of subjects per cluster. Statistical power was greater for the LR test than for the Wald test. These results applied to both the linear and logistic regressions, but were more pronounced for the latter. The use of the LR test is preferable to the use of the Wald test.

16.

An experimental design criterion for minimizing meta-model prediction errors applied to die casting process design

Theodore T. Allen Liyang Yu John Schmitz 《Journal of the Royal Statistical Society. Series C, Applied statistics》2003,52(1):103-117

Summary. We propose the expected integrated mean-squared error (EIMSE) experimental design criterion and show how we used it to design experiments to meet the needs of researchers in die casting engineering. This criterion expresses in a direct way the researchers' goal to minimize the expected meta-model prediction errors, taking into account the effects of both random experimental errors and errors deriving from our uncertainty about the true model form. Because we needed to make assumptions about the prior distribution of model coefficients to estimate the EIMSE, we performed a sensitivity analysis to verify that the relative prediction performance of the design generated was largely insensitive to our assumptions. Also, we discuss briefly the general advantages of EIMSE optimal designs, including lower expected bias errors compared with popular response surface designs and substantially lower variance errors than certain Box–Draper all-bias designs.

17.

Prediction Error Estimation Under Bregman Divergence for Non-Parametric Regression and Classification

CHUNMING ZHANG 《Scandinavian Journal of Statistics》2008,35(3):496-523

Abstract. Prediction error is critical to assess model fit and evaluate model prediction. We propose the cross-validation (CV) and approximated CV methods for estimating prediction error under the Bregman divergence (BD), which embeds nearly all of the commonly used loss functions in the regression, classification and machine learning literature. The approximated CV formulas are analytically derived, which facilitates fast estimation of prediction error under BD. We then study a data-driven optimal bandwidth selector for local-likelihood estimation that minimizes the overall prediction error or, equivalently, the covariance penalty. It is shown that the covariance penalty and CV methods converge to the same mean-prediction-error criterion. We also propose a lower-bound scheme for computing the local logistic regression estimates and demonstrate that the algorithm monotonically increases the target local likelihood and converges. The idea and methods are extended to generalized varying-coefficient models and additive models.
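As a brief illustration of why the Bregman divergence covers many familiar loss functions, the sketch below uses the standard scalar definition with a convex generator φ: D(p, q) = φ(p) − φ(q) − φ′(q)(p − q). The paper may use a different but equivalent sign convention (for example a concave generator); the function names here are hypothetical. With φ(x) = x² the divergence is the squared error, and with the negative Bernoulli entropy it is the Kullback-Leibler divergence between two Bernoulli distributions.

```python
import math


def bregman_divergence(p, q, phi, dphi):
    """Scalar Bregman divergence for a convex generator phi with derivative dphi."""
    return phi(p) - phi(q) - dphi(q) * (p - q)


# Generator for squared-error loss: phi(x) = x^2.
def square(x):
    return x * x


def d_square(x):
    return 2.0 * x


# Generator for Bernoulli KL divergence: negative entropy on (0, 1).
def neg_entropy(x):
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)


def d_neg_entropy(x):
    return math.log(x) - math.log(1.0 - x)


if __name__ == "__main__":
    print(bregman_divergence(0.3, 0.5, square, d_square))
    print(bregman_divergence(0.3, 0.5, neg_entropy, d_neg_entropy))
```

Swapping the generator changes the loss without changing any other code, which mirrors the way a single prediction-error framework can accommodate several loss functions.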

18.

Bayesian modeling of autoregressive partial linear models with scale mixture of normal errors

Guillermo Ferreira Luis M. Castro Ronaldo Dias 《Journal of applied statistics》2013,40(8):1796-1816

Normality and independence of error terms are typical assumptions for partial linear models. However, these assumptions may be unrealistic in many fields, such as economics, finance and biostatistics. In this paper, a Bayesian analysis for a partial linear model with first-order autoregressive errors belonging to the class of scale mixtures of normal distributions is studied in detail. The proposed model provides a useful generalization of the symmetrical linear regression model with independent errors, since the distribution of the error term covers both correlated and thick-tailed distributions, and has a convenient hierarchical representation allowing easy implementation of a Markov chain Monte Carlo scheme. In order to examine the robustness of the model against outlying and influential observations, a Bayesian case-deletion influence diagnostic based on the Kullback–Leibler (K–L) divergence is presented. The proposed method is applied to monthly and daily returns of two Chilean companies.

19.

Why use arbitrary points scores?: ordered categories in models of educational progress (cited 2 times: 1 self-citation, 1 by others)

A. Fielding 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1999,162(3):303-328

Much statistical modelling of random effects on ordered responses, particularly of grades in educational research, continues to use linear models and to treat the responses through arbitrary scores. Methodological and software developments now facilitate the proper treatment of such situations through more realistic generalized random-effects models. This paper reviews some methodological comparisons of these approaches. It highlights the flexibility offered by the macro facilities of the multilevel random-effects software MLwiN. It considers applications to an analysis of primary school educational progress from reception to England and Wales national curriculum key stage 1 mathematics. By contrasting the results from generalized modelling and scoring approaches it draws some conclusions about the theoretical, methodological and practical options that are available. It also considers that results of generalized random-model estimation may be more intelligible to users of analytical results.

20.

Bayesian Inference in Generalized Error and Generalized Student-t Regression Models

Efthymios G. Tsionas 《Communications in Statistics - Theory and Methods》2013,42(3):388-407

This study takes up inference in linear models with generalized error and generalized t distributions. For the generalized error distribution, two computational algorithms are proposed. The first is based on indirect Bayesian inference using an approximating finite scale mixture of normal distributions. The second is based on Gibbs sampling. The Gibbs sampler involves only drawing random numbers from standard distributions. This is important because previously the impression has been that an exact analysis of the generalized error regression model using Gibbs sampling is not possible. Next, we describe computational Bayesian inference for linear models with generalized t disturbances based on Gibbs sampling, and exploiting the fact that the model is a mixture of generalized error distributions with inverse generalized gamma distributions for the scale parameter. The linear model with this specification has also been thought not to be amenable to exact Bayesian analysis. All computational methods are applied to actual data involving the exchange rates of the British pound, the French franc, and the German mark relative to the U.S. dollar.