1.

A new REML (parameter expanded) EM algorithm for linear mixed models

S. M. Diffey A. B. Smith A. H. Welsh B. R. Cullis 《Australian & New Zealand Journal of Statistics》2017,59(4):433-448

Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation‐maximisation (EM) algorithm and the parameter expanded EM (PX‐EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton‐Raphson type scheme such as the average information (AI) algorithm. The EM and PX‐EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX‐EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX‐EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX‐EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.

2.

Variable selection for longitudinal data with high-dimensional covariates and dropouts

Xueying Zheng Bo Fu Jiajia Zhang 《Journal of Statistical Computation and Simulation》2018,88(4):712-725

A new variable selection approach utilizing penalized estimating equations is developed for high-dimensional longitudinal data with dropouts under a missing at random (MAR) mechanism. The proposed method is based on the best linear approximation of efficient scores from the full dataset and does not need to specify a separate model for the missing or imputation process. The coordinate descent algorithm is adopted to implement the proposed method and is computationally feasible and stable. The oracle property is established, and extensive simulation studies show that the proposed variable selection method performs much better than penalized estimating equations applied to complete data, which do not account for the MAR mechanism. Finally, the proposed method is applied to a Lifestyle Education for Activity and Nutrition study, and the interaction effect between intervention and time is identified, which is consistent with previous findings.

3.

Functional Mixed Effects Model for Small Area Estimation

Tapabrata Maiti Samiran Sinha Ping‐Shou Zhong 《Scandinavian Journal of Statistics》2016,43(3):886-903

Functional data analysis has become an important area of research because of its ability to handle high‐dimensional and complex data structures. However, the development is limited in the context of linear mixed effect models and, in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area‐level data and fit a varying coefficient linear mixed effect model where the varying coefficients are semiparametrically modelled via B‐splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.

4.

A linear mixed model for analyzing longitudinal skew-normal responses with random dropout

M. Ganjali T. Baghfalaki M. Khazaei 《Journal of the Korean Statistical Society》2013,42(2):149-160

In this paper, a linear mixed effects model is used to fit skewed longitudinal data in the presence of dropout. Two distributional assumptions are considered to produce background for heavy tailed models. One is the linear mixed model with skew-normal random effects and normal errors, and the other is the linear mixed model with skew-normal errors and normal random effects. An ECM algorithm is developed to obtain the parameter estimates. An empirical Bayes approach is also used for estimating random effects. A simulation study is implemented to investigate the performance of the presented algorithm. Results of an application are also reported, where standard errors of estimates are calculated using the bootstrap approach.

5.

Inference in Dirichlet process mixed generalized linear models by using Monte Carlo EM

Malay Naskar Kalyan Das 《Australian & New Zealand Journal of Statistics》2004,46(4):685-701

Generalized linear mixed models are widely used for describing overdispersed and correlated data. Such data arise frequently in studies involving clustered and hierarchical designs. A more flexible class of models has been developed here through the Dirichlet process mixture. An additional advantage of using such mixture models is that the observations can be grouped together on the basis of the overdispersion present in the data. This paper proposes a partial empirical Bayes method for estimating all the model parameters by adopting a version of the EM algorithm. An augmented model that helps to implement an efficient Gibbs sampling scheme, under the non‐conjugate Dirichlet process generalized linear model, generates observations from the conditional predictive distribution of unobserved random effects and provides an estimate of the average number of mixing components in the Dirichlet process mixture. A simulation study has been carried out to demonstrate the consistency of the proposed method. The approach is also applied to a study on outdoor bacteria concentration in the air and to data from 14 retrospective lung‐cancer studies.

6.

Semiparametric Efficient Estimation for a Class of Generalized Proportional Odds Cure Models

Mao M Wang JL 《Journal of the American Statistical Association》2010,105(489):302-311

We present a mixture cure model with the survival time of the "uncured" group coming from a class of linear transformation models, which is an extension of the proportional odds model. This class of models, first proposed by Dabrowska and Doksum (1988), which we term "generalized proportional odds model," is well suited for the mixture cure model setting due to a clear separation between long-term and short-term effects. A standard expectation-maximization algorithm can be employed to locate the nonparametric maximum likelihood estimators, which are shown to be consistent and semiparametric efficient. However, there are difficulties in the M-step due to the nonparametric component. We overcome these difficulties by proposing two different algorithms. The first employs a majorize-minimize (MM) algorithm in the M-step instead of the usual Newton-Raphson method, and the other is based on an alternative form that expresses the model as a proportional hazards frailty model. The two new algorithms are compared in a simulation study with an existing estimating equation approach by Lu and Ying (2004). The MM algorithm provides both computational stability and efficiency. A case study of leukemia data is conducted to illustrate the proposed procedures.

7.

Fast and approximate exhaustive variable selection for generalised linear models with APES

Kevin YX Wang Garth Tarr Jean YH Yang Samuel Mueller 《Australian & New Zealand Journal of Statistics》2019,61(4):445-465

We present APproximated Exhaustive Search (APES), which enables fast and approximated exhaustive variable selection in Generalised Linear Models (GLMs). While exhaustive variable selection remains the gold standard in many model selection contexts, traditional exhaustive variable selection suffers from computational feasibility issues. More precisely, there is often a high cost associated with computing maximum likelihood estimates (MLE) for all subsets of GLMs. Efficient algorithms for exhaustive searches exist for linear models, most notably the leaps‐and‐bounds algorithm and, more recently, the mixed integer optimisation (MIO) algorithm. The APES method learns from observational weights in a generalised linear regression super‐model and reformulates the GLM problem as a linear regression problem. In this way, APES can approximate a true exhaustive search in the original GLM space. Where exhaustive variable selection is not computationally feasible, we propose a best‐subset search, which also closely approximates a true exhaustive search. APES is available both as a standalone R package and as part of the existing mplot package.

8.

Adaptive resampling algorithms for estimating bootstrap distributions

Jiaqiao Hu Zheng Su 《Journal of statistical planning and inference》2008

Based on recent developments in the field of operations research, we propose two adaptive resampling algorithms for estimating bootstrap distributions. One algorithm applies the principle of the recently proposed cross-entropy (CE) method for rare event simulation, and does not require calculation of the resampling probability weights via numerical optimization methods (e.g., Newton's method), whereas the other algorithm can be viewed as a multi-stage extension of the classical two-step variance minimization approach. The two algorithms can be easily used as part of a general algorithm for Monte Carlo calculation of bootstrap confidence intervals and tests, and are especially useful in estimating rare event probabilities. We analyze theoretical properties of both algorithms in an idealized setting and carry out simulation studies to demonstrate their performance. Empirical results on both one-sample and two-sample problems as well as a real survival data set show that the proposed algorithms are not only superior to traditional approaches, but may also provide more than an order of magnitude of computational efficiency gains.

9.

Estimated estimating equations: semiparametric inference for clustered and longitudinal data

Jeng-Min Chiou Hans-Georg Müller 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2005,67(4):531-553

Summary. We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed effects linear predictor with unknown smooth link, and variance–covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance–covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.

10.

Estimation of the linear mixed integrated Ornstein–Uhlenbeck model

Rachael A. Hughes Michael G. Kenward Jonathan A. C. Sterne Kate Tilling 《Journal of Statistical Computation and Simulation》2017,87(8):1541-1558

The linear mixed model with an added integrated Ornstein–Uhlenbeck (IOU) process (linear mixed IOU model) allows for serial correlation and estimation of the degree of derivative tracking. It is rarely used, partly due to the lack of available software. We implemented the linear mixed IOU model in Stata and using simulations we assessed the feasibility of fitting the model by restricted maximum likelihood when applied to balanced and unbalanced data. We compared different (1) optimization algorithms, (2) parameterizations of the IOU process, (3) data structures and (4) random-effects structures. Fitting the model was practical and feasible when applied to large and moderately sized balanced datasets (20,000 and 500 observations), and large unbalanced datasets with (non-informative) dropout and intermittent missingness. Analysis of a real dataset showed that the linear mixed IOU model was a better fit to the data than the standard linear mixed model (i.e. independent within-subject errors with constant variance).

11.

Small area estimation under unit-level temporal linear mixed models

Domingo Morales 《Journal of Statistical Computation and Simulation》2019,89(9):1592-1620

Data from past time periods and temporal correlation are rich sources of information for estimating small area parameters at the current period. This paper investigates the use of unit-level temporal linear mixed models for estimating linear parameters. Two models are considered, with domain and domain-time random effects. The first model assumes time independency and the second one AR(1)-type time correlation. They are fitted by a Fisher-scoring algorithm that calculates the residual maximum likelihood estimators of the model parameters. Based on the introduced models, empirical best linear unbiased predictors of small area linear parameters are studied, and analytic estimators for evaluating the performance of their mean squared errors are proposed. Three simulation experiments are carried out to study the behaviour of the fitting algorithm, the small area predictors and the estimators of the mean squared error. By using data of the Spanish surveys of income and living conditions of 2004–2008, an application to the estimation of 2008 average normalized net annual incomes in Spanish provinces by sex is given.

12.

Likelihood-based approaches for multivariate linear models under inequality constraints for incomplete data

Shurong Zheng Jianhua Guo Ning-Zhong Shi Guo-Liang Tian 《Journal of statistical planning and inference》2012

In this paper, we consider a multivariate linear model with complete/incomplete data, where the regression coefficients are subject to a set of linear inequality restrictions. We first develop an expectation/conditional maximization (ECM) algorithm for calculating restricted maximum likelihood estimates of parameters of interest. We then establish the corresponding convergence properties for the proposed ECM algorithm. Applications to growth curve models and linear mixed models are presented. Confidence interval construction via the double-bootstrap method is provided. Some simulation studies are performed and a real example is used to illustrate the proposed methods.

13.

Stochastic proximal-gradient algorithms for penalized mixed models

Fort Gersende Ollier Edouard Samson Adeline 《Statistics and Computing》2019,29(2):231-253

Motivated by penalized likelihood maximization in complex models, we study optimization problems where neither the function to optimize nor its gradient has an explicit expression, but its gradient can be approximated by a Monte Carlo technique. We propose a new algorithm based on a stochastic approximation of the proximal-gradient (PG) algorithm. This new algorithm, named stochastic approximation PG (SAPG), is the combination of a stochastic gradient descent step which, roughly speaking, computes a smoothed approximation of the gradient along the iterations, and a proximal step. The choice of the step size and of the Monte Carlo batch size for the stochastic gradient descent step in SAPG is discussed. Our convergence results cover the cases of biased and unbiased Monte Carlo approximations. While the convergence analysis of some classical Monte Carlo approximation of the gradient is already addressed in the literature (see Atchadé et al. in J Mach Learn Res 18(10):1–33, 2017), the convergence analysis of SAPG is new. Practical implementation is discussed, and guidelines to tune the algorithm are given. The two algorithms are compared on a linear mixed effect model as a toy example. A more challenging application is proposed on nonlinear mixed effect models in high dimension with a pharmacokinetic data set including genomic covariates. To the best of our knowledge, our work provides the first convergence result of a numerical method designed to solve penalized maximum likelihood in a nonlinear mixed effect model.

14.

A simultaneous variable selection methodology for linear mixed models

Juming Pan Junfeng Shang 《Journal of Statistical Computation and Simulation》2018,88(17):3323-3337

Selecting an appropriate structure for a linear mixed model is an appealing problem in a number of applications, such as the modelling of longitudinal or clustered data. In this paper, we propose a variable selection procedure for simultaneously selecting and estimating the fixed and random effects. More specifically, a profile log-likelihood function, along with an adaptive penalty, is utilized for sparse selection. The Newton-Raphson optimization algorithm is used to complete the parameter estimation. By jointly selecting the fixed and random effects, the proposed approach increases selection accuracy compared with two-stage procedures, and the use of the profile log-likelihood can improve computational efficiency in one-stage procedures. We prove that the proposed procedure enjoys model selection consistency. A simulation study and a real data application are conducted to demonstrate the effectiveness of the proposed method.

15.

A consistent estimator for logistic mixed effect models

Yizheng Wei Yanyuan Ma Tanya P. Garcia Samiran Sinha 《Revue canadienne de statistique》2019,47(2):140-156

We propose a consistent and locally efficient method of estimating the model parameters of a logistic mixed effect model with random slopes. Our approach relaxes two typical assumptions: the random effects being normally distributed, and the covariates and random effects being independent of each other. Adhering to these assumptions is particularly difficult in health studies where, in many cases, we have limited resources to design experiments and gather data in long‐term studies, while new findings from other fields might emerge, suggesting the violation of such assumptions. So it is crucial to have an estimator that is robust to such violations; then we could make better use of current data harvested using various valuable resources. Our method generalizes the framework presented in Garcia & Ma (2016) which also deals with a logistic mixed effect model but only considers a random intercept. A simulation study reveals that our proposed estimator remains consistent even when the independence and normality assumptions are violated. This contrasts favourably with the traditional maximum likelihood estimator which is likely to be inconsistent when there is dependence between the covariates and random effects. Application of this work to a study of Huntington's disease reveals that disease diagnosis can be enhanced using assessments of cognitive performance. The Canadian Journal of Statistics 47: 140–156; 2019 © 2019 Statistical Society of Canada

16.

Distributed inference for two-sample U-statistics in massive data analysis

Bingyao Huang Yanyan Liu Liuhua Peng 《Scandinavian Journal of Statistics》2023,50(3):1090-1115

This paper considers distributed inference for two-sample U-statistics under the massive data setting. In order to reduce the computational complexity, this paper proposes distributed two-sample U-statistics and blockwise linear two-sample U-statistics. The blockwise linear two-sample U-statistic, which requires less communication cost, is more computationally efficient especially when the data are stored in different locations. The asymptotic properties of both types of distributed two-sample U-statistics are established. In addition, this paper proposes bootstrap algorithms to approximate the distributions of distributed two-sample U-statistics and blockwise linear two-sample U-statistics for both nondegenerate and degenerate cases. The distributed weighted bootstrap for the distributed two-sample U-statistic is new in the literature. The proposed bootstrap procedures are computationally efficient and are suitable for distributed computing platforms with theoretical guarantees. Extensive numerical studies illustrate that the proposed distributed approaches are feasible and effective.

17.

Empirical Likelihood for Censored Linear Regression and Variable Selection

Tong Tong Wu Gang Li Chengyong Tang 《Scandinavian Journal of Statistics》2015,42(3):798-812

The linear regression model for right censored data, also known as the accelerated failure time model using the logarithm of survival time as the response variable, is a useful alternative to the Cox proportional hazards model. Empirical likelihood as a non‐parametric approach has been demonstrated to have many desirable merits thanks to its robustness against model misspecification. However, the linear regression model with right censored data cannot directly benefit from the empirical likelihood for inferences, mainly because of dependent elements in the estimating equations of the conventional approach. In this paper, we propose an empirical likelihood approach with a new estimating equation for linear regression with right censored data. A nested coordinate algorithm with majorization is used for solving the optimization problems with a non‐differentiable objective function. We show that Wilks' theorem holds for the new empirical likelihood. We also consider the variable selection problem with empirical likelihood when the number of predictors can be large. Because the new estimating equation is non‐differentiable, a quadratic approximation is applied to study the asymptotic properties of penalized empirical likelihood. We prove the oracle properties and evaluate the properties with simulated data. We apply our method to a Surveillance, Epidemiology, and End Results small intestine cancer dataset.

18.

Estimation in mixed models through three step minimization

Dário Ferreira Sandra S. Ferreira Célia Nunes João T. Mexia 《Communications in Statistics - Simulation and Computation》2017,46(2):1156-1166

The aim of this article is to present an estimation procedure for both fixed effects and variance components in linear mixed models. This procedure consists of a maximum-likelihood method which we call Three Step Minimization (TSM). The major contribution of this method is that, when variances tend to be null, standard algorithms behave badly, unlike the TSM method, which uses a grid search algorithm in a compact set. A numerical application with real and simulated data is provided.

19.

Restricted likelihood inference for generalized linear mixed models

Ruggero Bellio Alessandra R. Brazzale 《Statistics and Computing》2011,21(2):173-183

We aim to promote the use of the modified profile likelihood function for estimating the variance parameters of a GLMM in analogy to the REML criterion for linear mixed models. Our approach is based on both quasi-Monte Carlo integration and numerical quadrature, obtaining in either case simulation-free inferential results. We will illustrate our idea by applying it to regression models with binary responses or count data and independent clusters, covering also the case of two-part models. Two real data examples and three simulation studies support the use of the proposed solution as a natural extension of REML for GLMMs. An R package implementing the methodology is available online.

20.

An algorithm for the multivariate group lasso with covariance estimation

I. Wilms C. Croux 《Journal of applied statistics》2018,45(4):668-681

We study a group lasso estimator for the multivariate linear regression model that accounts for correlated error terms. A block coordinate descent algorithm is used to compute this estimator. We perform a simulation study with categorical data and multivariate time series data, typical settings with a natural grouping among the predictor variables. Our simulation studies show the good performance of the proposed group lasso estimator compared to alternative estimators. We illustrate the method on a time series data set of gene expressions.