Similar Documents
20 similar documents found.
1.
Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation-maximisation (EM) algorithm and the parameter expanded EM (PX-EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton-Raphson type scheme such as the average information (AI) algorithm. The EM and PX-EM algorithms require specification of the complete data, including the incomplete and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX-EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX-EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX-EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.
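The flavour of such EM iterations can be seen in the minimal sketch below for a balanced one-way random-effects model y_ij = mu + u_i + e_ij. It implements a plain ML-type EM (the REML and PX-EM variants, including the paper's new incomplete-data specification, modify these updates); the simulated data and all names are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Minimal ML-type EM sketch for a balanced one-way random-effects model:
#   y_ij = mu + u_i + e_ij,  u_i ~ N(0, s2u),  e_ij ~ N(0, s2e)
rng = np.random.default_rng(1)
q, n = 30, 5                                    # groups, replicates per group
u = rng.normal(0, np.sqrt(2.0), q)              # true s2u = 2
y = 10 + u[:, None] + rng.normal(0, 1.0, (q, n))  # true s2e = 1

mu, s2u, s2e = y.mean(), 1.0, 1.0
for it in range(500):
    # E-step: posterior mean and variance of each u_i given y
    shrink = n * s2u / (n * s2u + s2e)
    v = s2u * s2e / (n * s2u + s2e)             # posterior variance of u_i
    u_hat = shrink * (y.mean(axis=1) - mu)      # posterior mean of u_i
    # M-step: update parameters from expected complete-data statistics
    mu = (y - u_hat[:, None]).mean()
    s2u_new = np.mean(u_hat**2 + v)
    s2e_new = np.mean((y - mu - u_hat[:, None])**2 + v)
    done = abs(s2u_new - s2u) + abs(s2e_new - s2e) < 1e-10
    s2u, s2e = s2u_new, s2e_new
    if done:
        break

print(f"iterations={it}, mu={mu:.3f}, s2u={s2u:.3f}, s2e={s2e:.3f}")
```

The typically large iteration count of such a loop illustrates the slow convergence, relative to average-information schemes, that motivates the paper's improved specification.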

2.
The purpose of this paper is to describe a simple procedure for the estimation of parameters in the unbalanced mixed linear model. There are implications for hypothesis testing and interval estimation. A feature of these estimators is that they are expressed in terms of simple formulas. This has obvious advantages for computations and small sample analysis. In addition, the formulas suggest useful diagnostic procedures for assessing the quality of the data as well as possible defects in the model assumptions. The concepts are illustrated with several examples. Evidence is presented to indicate that, in cases of modest imbalance, these estimators are highly efficient and dominate AOV estimates over most of the parameter space. In cases of more extreme imbalance, the results are qualitatively the same but the estimators are less efficient than the AOV estimators for small values of the parameters. The extension of this method to factorial models with missing cells is not complete.

3.
Two-colour microarray experiments form an important tool in gene expression analysis. Due to the high risk of missing observations in microarray experiments, it is fundamental to concentrate not only on optimal designs but also on designs which are robust against missing observations. As an extension of Latif et al. (2009), we define the optimal breakdown number for a collection of designs to describe the robustness, and we calculate the breakdown number for various D-optimal block designs. We show that, for certain values of the numbers of treatments and arrays, the designs which are D-optimal have the highest breakdown number. Our calculations use methods from graph theory.
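In the graph-theoretic view behind such calculations, treatments are vertices and each two-colour array is an edge between the two treatments it compares; treatment contrasts stay estimable only while the graph stays connected. The sketch below (a hypothetical 5-treatment design, using networkx) illustrates how edge connectivity bounds the number of lost arrays a design can always tolerate; it is an illustration of the connectivity idea only, not the authors' breakdown-number computation.

```python
import networkx as nx

# Vertices = treatments; edges = two-colour arrays (hypothetical design)
arrays = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1),  # a cycle of arrays ...
          (1, 3), (2, 4)]                          # ... plus two extra arrays
G = nx.Graph(arrays)

# Contrasts remain estimable only while G is connected, so the design
# survives the loss of ANY k arrays iff the edge connectivity exceeds k.
lam = nx.edge_connectivity(G)
print(f"edge connectivity = {lam}; "
      f"design is guaranteed to survive any {lam - 1} missing arrays")
```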

4.
A fast and accurate method of confidence interval construction for the smoothing parameter in penalised spline and partially linear models is proposed. The method is akin to a parametric percentile bootstrap in which Monte Carlo simulation is replaced by saddlepoint approximation, and can therefore be viewed as an approximate bootstrap. It is applicable in a quite general setting, requiring only that the underlying estimator be the root of an estimating equation that is a quadratic form in normal random variables. This is the case under a variety of optimality criteria such as maximum likelihood (ML), restricted ML (REML), generalized cross validation (GCV) and Akaike's information criterion (AIC). Simulation studies reveal that under the ML and REML criteria, the method delivers near-exact performance with computational speeds an order of magnitude faster than existing exact methods, and two orders of magnitude faster than a classical bootstrap. Perhaps most importantly, the proposed method also offers a computationally feasible alternative when no exact or asymptotic methods are known, e.g. for GCV and AIC. The methodology is illustrated with an application to the well-known fossil data, where giving a range of plausible smoothed values can help answer questions about the statistical significance of apparent features in the data.
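The key structural assumption can be written compactly. In generic notation (not the paper's), the smoothing-parameter estimate is the root of

$$
g(\lambda) \;=\; \mathbf{Z}^{\top}\mathbf{A}(\lambda)\,\mathbf{Z} \;=\; 0,
\qquad \mathbf{Z} \sim N(\mathbf{0}, \boldsymbol{\Sigma}),
$$

so that, when $g$ is monotone in $\lambda$,

$$
P\left(\hat\lambda \le t\right) \;=\; P\left(\mathbf{Z}^{\top}\mathbf{A}(t)\,\mathbf{Z} \le 0\right),
$$

a quadratic-form tail probability that a saddlepoint approximation can evaluate from the quadratic form's cumulant generating function, with no simulation required.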

5.
Restricted maximum likelihood (REML) is a procedure for estimating a variance function in a heteroscedastic linear model. Although REML has been extended to non-linear models, the case in which the data are dominated by replicated observations with unknown values of the independent variable of interest, such as the concentration of a substance in a blood sample, has not been considered. We derive a REML procedure for an immunoassay and show that the resulting estimator is superior to those currently being used. Some interesting properties of the REML estimator are derived, and its relationship to other estimators is discussed.

6.
A simulation study of the binomial-logit model with correlated random effects is carried out based on the generalized linear mixed model (GLMM) methodology. Simulated data with various numbers of regression parameters and different values of the variance component are considered. The performance of approximate maximum likelihood (ML) and residual maximum likelihood (REML) estimators is evaluated. For a range of true parameter values, we report the average biases of estimators, the standard error of the average bias and the standard error of estimates over the simulations. In general, in terms of bias, the two methods do not show significant differences in estimating regression parameters. The REML estimation method is slightly better in reducing the bias of variance component estimates.
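A minimal sketch of how such data can be simulated is shown below: correlated random effects are drawn from a multivariate normal, pushed through a logit link, and binomial responses are generated. All dimensions and parameter values are illustrative assumptions, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(7)
n_clusters, n_per, n_trials = 100, 4, 10

# Correlated random effects: an intercept and a slope effect per cluster
s2 = 0.5                                  # variance component under study
Sigma = s2 * np.array([[1.0, 0.6],
                       [0.6, 1.0]])       # correlation 0.6 between effects
U = rng.multivariate_normal([0.0, 0.0], Sigma, size=n_clusters)

beta0, beta1 = -0.5, 1.0                  # regression parameters
x = rng.normal(size=(n_clusters, n_per))  # within-cluster covariate

# Binomial-logit GLMM: logit(p_ij) = b0 + u_i0 + (b1 + u_i1) * x_ij
eta = beta0 + U[:, [0]] + (beta1 + U[:, [1]]) * x
p = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(n_trials, p)             # simulated counts
print(y.shape, y.mean())
```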

7.
Inverse probability weighting (IPW) can deal with confounding in non-randomized studies. The inverse weights are probabilities of treatment assignment (propensity scores), estimated by regressing assignment on predictors. Problems arise when predictors are missing. Solutions previously proposed include assuming assignment depends only on observed predictors, and multiple imputation (MI) of missing predictors. For the MI approach, it was recommended that missingness indicators be used together with the other predictors. We determine when the two MI approaches (with and without missingness indicators) yield consistent estimators and compare their efficiencies. We find that, although including indicators can reduce bias when predictors are missing not at random, it can induce bias when they are missing at random. We propose a consistent variance estimator and investigate the performance of the simpler Rubin's Rules variance estimator. In simulations we find that both estimators perform well. IPW is also used to correct bias when an analysis model is fitted to incomplete data by restricting to complete cases. Here, the weights are inverse probabilities of being a complete case. We explain how the same MI methods can be used in this situation to deal with missing predictors in the weight model, and illustrate this approach using data from the National Child Development Survey.
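The basic IPW computation the paper builds on can be sketched as follows: estimate propensity scores by logistic regression of treatment on (here fully observed) predictors, then compare weighted outcome means. Variable names and data are hypothetical; the paper's contribution concerns what to do when the predictors themselves have missing values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=(n, 2))                      # confounders (fully observed)
e_true = 1 / (1 + np.exp(-(0.5 * x[:, 0] - x[:, 1])))
a = rng.binomial(1, e_true)                      # non-randomized treatment
y = 1.0 * a + x[:, 0] + rng.normal(size=n)       # outcome; true effect = 1

# Propensity scores: regress treatment assignment on the predictors
e_hat = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]

# Hajek-type IPW estimate of the average treatment effect
w1, w0 = a / e_hat, (1 - a) / (1 - e_hat)
ate = (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()
print(f"IPW estimate of treatment effect: {ate:.3f}")
```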

8.
We propose an ℓ1-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method is based on the assumption that the data are missing at random (MAR), which also covers the missing completely at random case. The implementation of the method is non-trivial, as the observed negative log-likelihood is in general a complicated, non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.
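A compact sketch of the E-step/M-step interplay is given below for the MAR multivariate-normal case: the E-step completes the sufficient statistics using conditional means and covariances given each row's observed entries, and the M-step runs a graphical-lasso solve on the completed covariance. This is a generic illustration (leaning on sklearn's graphical_lasso), not the authors' provably convergent algorithm.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def em_glasso(X, alpha=0.1, n_iter=25):
    """EM sketch for l1-penalised precision estimation with NaNs (MAR)."""
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0)) + 1e-3 * np.eye(p)
    for _ in range(n_iter):
        S = np.zeros((p, p))
        Xhat = np.empty_like(X)
        for i in range(n):
            m = np.isnan(X[i])              # missing entries of this row
            o = ~m                          # observed entries
            C = np.zeros((p, p))            # conditional-covariance term
            xi = X[i].copy()
            if m.any():
                Soo_inv = np.linalg.inv(Sigma[np.ix_(o, o)])
                reg = Sigma[np.ix_(m, o)] @ Soo_inv
                xi[m] = mu[m] + reg @ (X[i, o] - mu[o])   # E[x_m | x_o]
                C[np.ix_(m, m)] = (Sigma[np.ix_(m, m)]
                                   - reg @ Sigma[np.ix_(o, m)])
            Xhat[i] = xi
            S += C
        # M-step: graphical lasso on the expected covariance matrix
        mu = Xhat.mean(axis=0)
        S = S / n + np.cov(Xhat, rowvar=False, bias=True)
        Sigma, Theta = graphical_lasso(S, alpha=alpha)
    return Theta  # sparse estimate of the inverse covariance
```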

9.
We derive the optimal regression function (i.e., the best approximation in the L2 sense) when the vector of covariates has a random dimension. Furthermore, we consider applications of these results to problems in statistical regression and classification with missing covariates. It will be seen, perhaps surprisingly, that the correct regression function for the case with missing covariates can sometimes perform better than the usual regression function corresponding to the case with no missing covariates. This is because even if some of the covariates are missing, an indicator random variable δ, which is always observable and is equal to 1 if there are no missing values (and 0 otherwise), may have far more information and predictive power about the response variable Y than the missing covariates do. We also propose kernel-based procedures for estimating the correct regression function nonparametrically. As an alternative estimation procedure, we also consider the least-squares method.
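The intuition can be made concrete with a small sketch: when the covariate is missing (δ = 0), the best predictor falls back to E[Y | δ = 0], and because δ itself carries information about Y, that constant can out-predict a model that ignores the missingness. Everything below is simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
# Informative missingness: large responses are less likely to have x observed
delta = rng.random(n) < 1 / (1 + np.exp(y))      # True = x observed

x_obs, y_obs = x[delta], y[delta]                # complete cases
y_miss = y[~delta]                               # cases with x missing

def predict(x_new, h=0.3):
    """Sketch of the 'correct' regression function:
    E[Y | X, delta=1] via Nadaraya-Watson when x is observed,
    E[Y | delta=0] (a constant, but informative) when it is not."""
    if np.isnan(x_new):
        return y_miss.mean()
    w = np.exp(-0.5 * ((x_obs - x_new) / h) ** 2)
    return (w * y_obs).sum() / w.sum()

print(predict(1.0), predict(np.nan))
```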

10.
The density function is a fundamental concept in data analysis. Non-parametric methods, including the kernel smoothing estimate, are available when the data are completely observed. However, in studies such as diagnostic studies following a two-stage design, the membership of some of the subjects may be missing. Simply ignoring those subjects with unknown membership is valid only in the missing completely at random (MCAR) situation. In this paper, we consider kernel smoothing estimates of the density function, using inverse probability approaches to address the missing values. We illustrate the approaches with simulation studies and real data from a mental health study.
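A minimal sketch of the inverse-probability-weighted kernel estimate is shown below: only subjects with verified membership enter the sum, each weighted by the inverse of its verification probability (assumed known here; in practice it would be estimated). Names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
x = rng.normal(size=n)
pi = 1 / (1 + np.exp(-(0.5 + x)))         # verification probability
verified = rng.binomial(1, pi).astype(bool)

def ipw_kde(grid, x, pi, verified, h=0.3):
    """IPW Gaussian-kernel density estimate using verified subjects only."""
    w = verified / pi                      # inverse-probability weights
    k = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    k /= h * np.sqrt(2 * np.pi)
    return (w[None, :] * k).sum(axis=1) / w.sum()

grid = np.linspace(-3, 3, 61)
f_hat = ipw_kde(grid, x, pi, verified)    # density estimate on the grid
```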

11.
This work presents a new method to deal with missing values in financial time series. Previous works are generally based on state-space models and the Kalman filter, and few consider ARCH family models. The traditional approach is to splice the observed data together and perform the estimation without accounting for the presence of missing values. Existing methods generally consider missing values in the returns; the proposed method instead considers missing values in the prices of the assets. The performance of the method in estimating the parameters and the volatilities is evaluated through a Monte Carlo simulation, in which value at risk is also considered. An empirical application to the NASDAQ 100 Index series is presented.

12.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing a missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data, and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy-tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, based on an ICQ-type statistic, for selecting the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.
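In notation, with ρ_τ the usual check function, a propensity-weighted composite quantile objective takes the following generic form (quantile levels τ_1, …, τ_K; estimated propensities π̂_i; δ_i indicating an observed response). This is a schematic rendering of the weighting idea, not the authors' exact estimator:

$$
\rho_\tau(u) = u\{\tau - I(u < 0)\}, \qquad
(\hat b_1,\dots,\hat b_K,\hat{\boldsymbol\beta})
= \arg\min \sum_{k=1}^{K}\sum_{i=1}^{n}
\frac{\delta_i}{\hat\pi_i}\,
\rho_{\tau_k}\!\left(y_i - b_k - \mathbf{x}_i^{\top}\boldsymbol\beta\right).
$$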

13.
A version of the nonparametric bootstrap that resamples entire subjects from the original data, called the case bootstrap, has been increasingly used for estimating the uncertainty of parameters in mixed-effects models. It is usually applied to obtain more robust estimates of the parameters and more realistic confidence intervals (CIs). Alternative bootstrap methods, such as the residual bootstrap and the parametric bootstrap that resample both random effects and residuals, have been proposed to better take into account the hierarchical structure of multi-level and longitudinal data. However, few studies have compared these different approaches. In this study, we used simulation to evaluate bootstrap methods proposed for linear mixed-effects models. We also compared the results obtained by maximum likelihood (ML) and restricted maximum likelihood (REML). Our simulation studies showed the good performance of the case bootstrap as well as the bootstraps of both random effects and residuals. On the other hand, the bootstrap methods that resample only the residuals, and the bootstraps combining case and residual resampling, performed poorly. REML and ML provided similar bootstrap estimates of uncertainty, but there was slightly more bias and poorer coverage for variance parameters with ML in the sparse design. We applied the proposed methods to a real dataset from a study investigating the natural evolution of Parkinson's disease and were able to confirm that the methods provide plausible estimates of uncertainty. Given that most real-life datasets tend to exhibit heterogeneity in sampling schedules, the residual bootstraps would be expected to perform better than the case bootstrap.
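The case bootstrap itself is simple to express: resample whole subjects with replacement, relabel the copies so duplicated subjects remain distinct groups, and refit the mixed model. A hedged sketch using statsmodels' MixedLM is below; the data frame, formula, and column names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def case_bootstrap(df, formula="y ~ time", group_col="subject", B=200, seed=0):
    """Case (subject-level) bootstrap for a linear mixed-effects model."""
    rng = np.random.default_rng(seed)
    subjects = df[group_col].unique()
    estimates = []
    for _ in range(B):
        draw = rng.choice(subjects, size=len(subjects), replace=True)
        parts = []
        for new_id, s in enumerate(draw):
            part = df[df[group_col] == s].copy()
            part[group_col] = new_id       # duplicates become distinct groups
            parts.append(part)
        boot = pd.concat(parts, ignore_index=True)
        fit = smf.mixedlm(formula, boot, groups=boot[group_col]).fit(reml=True)
        estimates.append(fit.params)
    est = pd.DataFrame(estimates)
    return est.mean(), est.std()           # bootstrap means and SEs
```

Note that each of the B replicates refits the model, so run time scales with B; switching `reml=False` gives the ML comparison discussed in the abstract.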

14.
A common problem in the meta-analysis of continuous data is that some studies do not report sufficient information to calculate the standard deviations (SDs) of the treatment effect. One approach to handling this problem is imputation. This article examines the empirical implications of imputing the missing SDs on the standard error (SE) of the overall meta-analysis estimate. The simulation results show that if the SDs are missing under the missing completely at random or missing at random mechanisms, imputation is recommended. Under non-random missingness, imputation can lead to overestimation of the SE of the estimate.
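A minimal sketch of the imputation step and its knock-on effect on the pooled SE is shown below: missing SDs are filled with the mean of the reported SDs (one simple imputation rule among several), and the overall estimate is then pooled by inverse-variance weighting. All data are hypothetical.

```python
import numpy as np

# Hypothetical study-level data: mean effects, SDs (some unreported), sizes
effects = np.array([0.30, 0.45, 0.25, 0.50, 0.40])
sds = np.array([1.1, np.nan, 0.9, np.nan, 1.0])   # two studies missing SDs
n = np.array([40, 55, 35, 60, 50])

# Impute missing SDs with the mean of the observed SDs
sds_imp = np.where(np.isnan(sds), np.nanmean(sds), sds)

# Inverse-variance (fixed-effect) pooling of the study effects
se2 = sds_imp**2 / n                    # within-study variances of the means
w = 1 / se2
pooled = (w * effects).sum() / w.sum()
pooled_se = np.sqrt(1 / w.sum())
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
```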

15.
Time series arising in practice often have an inherently irregular sampling structure or missing values, which can arise, for example, from a faulty measuring device or the complex time-dependent nature of the phenomenon. Spectral decomposition of time series is a traditionally useful tool for analysing data variability. However, existing methods for spectral estimation often assume a regularly-sampled time series, or require modifications to cope with irregular or 'gappy' data. Additionally, many techniques also assume that the time series are stationary, which in the majority of cases is demonstrably not appropriate. This article addresses the topic of spectral estimation of a non-stationary time series sampled with missing data. The time series is modelled as a locally stationary wavelet process in the sense introduced by Nason et al. (J. R. Stat. Soc. B 62(2):271–292, 2000) and its realization is assumed to feature missing observations. We propose an estimator (the periodogram) for the process wavelet spectrum, which copes with the missing data whilst relaxing the strong assumption of stationarity. At the centre of our construction are second generation wavelets built by means of the lifting scheme (Sweldens, Wavelet Applications in Signal and Image Processing III, Proc. SPIE, vol. 2569, pp. 68–79, 1995), designed to cope with irregular data. We investigate the theoretical properties of the proposed periodogram, and show that it can be smoothed to produce a bias-corrected spectral estimate by adopting a penalized least squares criterion. We demonstrate the method with real data and simulated examples.

16.
We study methods to estimate regression and variance parameters for over-dispersed and correlated count data from highly stratified surveys. Our application involves counts of fish catches from stratified research surveys, and we propose a novel model in fisheries science to address changes in survey protocols. A challenge with this model is the large number of nuisance parameters, which leads to computational issues and biased statistical inferences. We use a computationally efficient profile generalized estimating equation method and compare it to marginal maximum likelihood (ML) and restricted ML (REML) methods. We use REML to address the bias and inaccurate confidence intervals caused by the many nuisance parameters. The marginal ML and REML approaches involve intractable integrals, and we used a new R package that is designed for estimating complex nonlinear models that may include random effects. We conclude from simulation analyses that the REML method provides the most reliable statistical inferences among the three methods we investigated.

17.
This paper considers the estimation of coefficients in a linear regression model with missing observations in the independent variables and introduces a modification of the standard first order regression method for the imputation of missing values. The modification provides stochastic values for imputation and, as an extension, makes use of the principle of weighted mixed regression. The proposed procedures are compared with two popular procedures: one that utilizes only the complete observations, and one that employs the standard first order regression imputation method for missing values. A simulation experiment is conducted to evaluate the gain in efficiency and to examine interesting issues such as the impact of varying degrees of multicollinearity in the explanatory variables. Some work on the case of discrete regressor variables is in progress and will be reported in a future article.
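The contrast between deterministic and stochastic first-order regression imputation can be sketched in a few lines: regress the incomplete covariate on a complete one among the complete cases, then impute either the fitted value or the fitted value plus a random residual. Names and data are hypothetical, and the weighted-mixed-regression extension is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)                          # fully observed regressor
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)    # regressor with missing values
miss = rng.random(n) < 0.3                       # 30% of x2 missing
x2_obs = np.where(miss, np.nan, x2)

# First-order regression of x2 on x1 among the complete cases
cc = ~miss
slope, intercept = np.polyfit(x1[cc], x2_obs[cc], 1)
resid_sd = np.std(x2_obs[cc] - (intercept + slope * x1[cc]), ddof=2)

fitted = intercept + slope * x1[miss]
x2_det = x2_obs.copy()
x2_det[miss] = fitted                            # deterministic imputation
x2_sto = x2_obs.copy()
x2_sto[miss] = fitted + rng.normal(scale=resid_sd, size=miss.sum())  # stochastic
```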

18.
Various statistical tests have been developed for testing the equality of means in matched pairs with missing values. However, most existing methods are commonly based on certain distributional assumptions such as normality, 0-symmetry or homoscedasticity of the data. The aim of this paper is to develop a statistical test that is robust against deviations from such assumptions and also leads to valid inference in case of heteroscedasticity or skewed distributions. This is achieved by applying a clever randomization approach to handle missing data. The resulting test procedure is not only shown to be asymptotically correct but is also finitely exact if the distribution of the data is invariant with respect to the considered randomization group. Its small sample performance is further studied in an extensive simulation study and compared to existing methods. Finally, an illustrative data example is analysed.
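The randomization idea can be illustrated in its simplest form on the complete pairs: under exchangeability within a pair, flipping the sign of each paired difference regenerates the null distribution of the test statistic. The sketch below is this plain sign-flip test; the paper's procedure additionally exploits the incompletely observed pairs.

```python
import numpy as np

def signflip_test(d, B=10000, seed=0):
    """Sign-flip randomization test of H0: zero mean paired difference.
    d: differences from the complete pairs (this plain sketch drops the
    incomplete pairs; the paper's procedure also uses them)."""
    rng = np.random.default_rng(seed)
    t_obs = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(B, d.size))
    t_null = np.abs((signs * d).mean(axis=1))
    return (1 + (t_null >= t_obs).sum()) / (B + 1)   # randomization p-value

# Hypothetical differences from 40 completely observed pairs
d = np.random.default_rng(4).normal(0.3, 1.0, size=40)
print(f"randomization p-value: {signflip_test(d):.4f}")
```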

19.
We derive the best linear unbiased interpolation for the missing order statistics of a random sample using the well-known projection theorem. The proposed interpolation method only needs the first two moments on both sides of a missing order statistic. A simulation study is performed to compare the proposed method with a few interpolation methods for exponential and Lévy distributions.
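The projection theorem delivers such an interpolator in the familiar best-linear-predictor form. Schematically, with X_o the observed order statistics neighbouring the missing one (generic notation, not the paper's):

$$
\hat X_{(k)} \;=\; \mu_{(k)} \;+\; \boldsymbol\sigma_{k,o}^{\top}\,\boldsymbol\Sigma_{o,o}^{-1}\bigl(\mathbf X_o - \boldsymbol\mu_o\bigr),
$$

where the μ terms are the means of the relevant order statistics, Σ_{o,o} their covariance matrix, and σ_{k,o} the cross-covariances; only these first two moments enter the formula, matching the abstract's claim.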

20.
Hierarchical generalized linear models (HGLMs) have become popular in data analysis. However, their maximum likelihood (ML) and restricted maximum likelihood (REML) estimators are often difficult to compute, especially when the random effects are correlated; this is because obtaining the likelihood function involves high-dimensional integration. Recently, an h-likelihood method that does not involve numerical integration has been proposed. In this study, we show how an h-likelihood method can be implemented by modifying the existing ML and REML procedures. A small simulation study is carried out to investigate the performances of the proposed methods for HGLMs with correlated random effects.
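The h-likelihood that replaces the intractable marginal likelihood is, schematically (Lee–Nelder form, generic notation):

$$
h(\boldsymbol\beta, \boldsymbol\theta, \mathbf u)
\;=\; \log f(\mathbf y \mid \mathbf u;\, \boldsymbol\beta, \boldsymbol\theta)
\;+\; \log f(\mathbf u;\, \boldsymbol\theta),
$$

maximised jointly over the fixed effects β and random effects u, with the dispersion parameters θ estimated from an adjusted profile of h. Because u is maximised over rather than integrated out, no high-dimensional integration is required.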
