首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In the parametric regression model, the covariate missing problem under missing at random is considered. It is often desirable to use flexible parametric or semiparametric models for the covariate distribution, which can reduce a potential misspecification problem. Recently, a completely nonparametric approach was developed by [H.Y. Chen, Nonparametric and semiparametric models for missing covariates in parameter regression, J. Amer. Statist. Assoc. 99 (2004), pp. 1176–1189; Z. Zhang and H.E. Rockette, On maximum likelihood estimation in parametric regression with missing covariates, J. Statist. Plann. Inference 47 (2005), pp. 206–223]. Although it does not require a model for the covariate distribution or the missing data mechanism, the proposed method assumes that the covariate distribution is supported only by observed values. Consequently, their estimator is a restricted maximum likelihood estimator (MLE) rather than the global MLE. In this article, we show the restricted semiparametric MLE could be very misleading in some cases. We discuss why this problem occurs and suggest an algorithm to obtain the global MLE. Then, we assess the performance of the proposed method via some simulation experiments.  相似文献   

2.
Incomplete covariate data is a common occurrence in many studies in which the outcome is survival time. With generalized linear models, when the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM by the method of weights proposed in Ibrahim (1990). In this article, we extend the EM by the method of weights to survival outcomes whose distributions may not fall in the class of generalized linear models. This method requires the estimation of the parameters of the distribution of the covariates. We present a clinical trials example with five covariates, four of which have some missing values.  相似文献   

3.
Summary. In many biomedical studies, covariates are subject to measurement error. Although it is well known that the regression coefficients estimators can be substantially biased if the measurement error is not accommodated, there has been little study of the effect of covariate measurement error on the estimation of the dependence between bivariate failure times. We show that the dependence parameter estimator in the Clayton–Oakes model can be considerably biased if the measurement error in the covariate is not accommodated. In contrast with the typical bias towards the null for marginal regression coefficients, the dependence parameter can be biased in either direction. We introduce a bias reduction technique for the bivariate survival function in copula models while assuming an additive measurement error model and replicated measurement for the covariates, and we study the large and small sample properties of the dependence parameter estimator proposed.  相似文献   

4.
Multilevel models have been widely applied to analyze data sets which present some hierarchical structure. In this paper we propose a generalization of the normal multilevel models, named elliptical multilevel models. This proposal suggests the use of distributions in the elliptical class, thus involving all symmetric continuous distributions, including the normal distribution as a particular case. Elliptical distributions may have lighter or heavier tails than the normal ones. In the case of normal error models with the presence of outlying observations, heavy-tailed error models may be applied to accommodate such observations. In particular, we discuss some aspects of the elliptical multilevel models, such as maximum likelihood estimation and residual analysis to assess features related to the fitting and the model assumptions. Finally, two motivating examples analyzed under normal multilevel models are reanalyzed under Student-t and power exponential multilevel models. Comparisons with the normal multilevel model are performed by using residual analysis.  相似文献   

5.
We propose a general framework for regression models with functional response containing a potentially large number of flexible effects of functional and scalar covariates. Special emphasis is put on historical functional effects, where functional response and functional covariate are observed over the same interval and the response is only influenced by covariate values up to the current grid point. Historical functional effects are mostly used when functional response and covariate are observed on a common time interval, as they account for chronology. Our formulation allows for flexible integration limits including, e.g., lead or lag times. The functional responses can be observed on irregular curve-specific grids. Additionally, we introduce different parameterizations for historical effects and discuss identifiability issues.The models are estimated by a component-wise gradient boosting algorithm which is suitable for models with a potentially high number of covariate effects, even more than observations, and inherently does model selection. By minimizing corresponding loss functions, different features of the conditional response distribution can be modeled, including generalized and quantile regression models as special cases. The methods are implemented in the open-source R package FDboost. The methodological developments are motivated by biotechnological data on Escherichia coli fermentations, but cover a much broader model class.  相似文献   

6.
Small area estimation is studied under a nested error linear regression model with area level covariate subject to measurement error. Ghosh and Sinha (2007) obtained a pseudo-Bayes (PB) predictor of a small area mean and a corresponding pseudo-empirical Bayes (PEB) predictor, using the sample means of the observed covariate values to estimate the true covariate values. In this paper, we first derive an efficient PB predictor by using all the available data to estimate true covariate values. We then obtain a corresponding PEB predictor and show that it is asymptotically “optimal”. In addition, we employ a jackknife method to estimate the mean squared prediction error (MSPE) of the PEB predictor. Finally, we report the results of a simulation study on the performance of our PEB predictor and associated jackknife MSPE estimator. Our results show that the proposed PEB predictor can lead to significant gain in efficiency over the previously proposed PEB predictor. Area level models are also studied.  相似文献   

7.
ABSTRACT

Latent variable modeling is commonly used in behavioral, social, and medical science research. The models used in such analysis relate all observed variables to latent common factors. In many applications, the observations are highly non normal or discrete, e.g., polytomous responses or counts. The existing approaches for non normal observations can be considered lacking in several aspects, especially for multi-group samples situations. We propose a generalized linear model approach for multi-sample latent variable analysis that can handle a broad class of non normal and discrete observations, and that furnishes meaningful interpretation and inference in multi-group studies through maximum likelihood analysis. A Monte Carlo EM algorithm is proposed for parameter estimation. The convergence assessment and standard error estimation is addressed. Simulation studies are reported to show the usefulness of the our approach. An example from a substance abuse prevention study is also presented.  相似文献   

8.
Many estimation procedures for quantitative linear models with autocorrelated errors have been proposed in the literature. A number of these procedures have been compared in various ways for different sample sizes and autocorrelation parameters values and for structured or random explanatory vaiables. In this paper, we revisit three situations that were considered to some extent in previous studies, by comparing ten estimation procedures: Ordinary Least Squares (OLS), Generalized Least Squares (GLS), estimated Generalized Least Squares (six procedures), Maximum Likelihood (ML), and First Differences (FD). The six estimated GLS procedures and the ML procedure differ in the way the error autocovariance matrix is estimated. The three situations can be defined as follows: Case 1, the explanatory variable x in the simple linear regression is fixed; Case 2,x is purely random; and Case 3x is first-order autoregressive. Following a theoretical presentation, the ten estimation procedures are compared in a Monte Carlo study conducted in the time domain, where the errors are first-order autoregressive in Cases 1-3. The measure of comparison for the estimation procedures is their efficiency relative to OLS. It is evaluated as a function of the time series length and the magnitude and sign of the error autocorrelation parameter. Overall, knowledge of the model of the time series process generating the errors enhances efficiency in estimated GLS. Differences in the efficiency of estimation procedures between Case 1 and Cases 2 and 3 as well as differences in efficiency among procedures in a given situation are observed and discussed.  相似文献   

9.
Observations collected over time are often autocorrelated rather than independent, and sometimes include observations below or above detection limits (i.e. censored values reported as less or more than a level of detection) and/or missing data. Practitioners commonly disregard censored data cases or replace these observations with some function of the limit of detection, which often results in biased estimates. Moreover, parameter estimation can be greatly affected by the presence of influential observations in the data. In this paper we derive local influence diagnostic measures for censored regression models with autoregressive errors of order p (hereafter, AR(p)‐CR models) on the basis of the Q‐function under three useful perturbation schemes. In order to account for censoring in a likelihood‐based estimation procedure for AR(p)‐CR models, we used a stochastic approximation version of the expectation‐maximisation algorithm. The accuracy of the local influence diagnostic measure in detecting influential observations is explored through the analysis of empirical studies. The proposed methods are illustrated using data, from a study of total phosphorus concentration, that contain left‐censored observations. These methods are implemented in the R package ARCensReg.  相似文献   

10.
We consider parametric regression problems with some covariates missing at random. It is shown that the regression parameter remains identifiable under natural conditions. When the always observed covariates are discrete, we propose a semiparametric maximum likelihood method, which does not require parametric specification of the missing data mechanism or the covariate distribution. The global maximum likelihood estimator (MLE), which maximizes the likelihood over the whole parameter set, is shown to exist under simple conditions. For ease of computation, we also consider a restricted MLE which maximizes the likelihood over covariate distributions supported by the observed values. Under regularity conditions, the two MLEs are asymptotically equivalent and strongly consistent for a class of topologies on the parameter set.  相似文献   

11.
This article deals with parameter estimation in the Cox proportional hazards model when covariates are measured with error. We consider both the classical additive measurement error model and a more general model which represents the mis-measured version of the covariate as an arbitrary linear function of the true covariate plus random noise. Only moment conditions are imposed on the distributions of the covariates and measurement error. Under the assumption that the covariates are measured precisely for a validation set, we develop a class of estimating equations for the vector-valued regression parameter by correcting the partial likelihood score function. The resultant estimators are proven to be consistent and asymptotically normal with easily estimated variances. Furthermore, a corrected version of the Breslow estimator for the cumulative hazard function is developed, which is shown to be uniformly consistent and, upon proper normalization, converges weakly to a zero-mean Gaussian process. Simulation studies indicate that the asymptotic approximations work well for practical sample sizes. The situation in which replicate measurements (instead of a validation set) are available is also studied.  相似文献   

12.
Two bootstrap procedures are introduced into the hybrid of the backfitting algorithm and the Cochrane–Orcutt procedure in the estimation of a spatial-temporal model. The use of time blocks of consecutive observations in resampling steps proved to be optimal in terms of stability and efficiency of estimates. Between iterations, there were minimal changes in the empirical distributions of the parameter estimates associated with the covariate and temporal effects indicating convergence of the algorithm. Crop yield data are used to illustrate the proposed methods.

The simulation study indicated that prediction error from the fitted model (estimated from either Method 1 or Method 2) is very low. Also, the prediction error is relatively robust to the number of spatial units and the number of time points.  相似文献   

13.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.  相似文献   

14.
We regard the simple linear calibration problem where only the response y of the regression line y = β0 + β1 t is observed with errors. The experimental conditions t are observed without error. For the errors of the observations y we assume that there may be some gross errors providing outlying observations. This situation can be modeled by a conditionally contaminated regression model. In this model the classical calibration estimator based on the least squares estimator has an unbounded asymptotic bias. Therefore we introduce calibration estimators based on robust one-step-M-estimators which have a bounded asymptotic bias. For this class of estimators we discuss two problems: The optimal estimators and their corresponding optimal designs. We derive the locally optimal solutions and show that the maximin efficient designs for non-robust estimation and robust estimation coincide.  相似文献   

15.
16.
In this study, we consider the problem of selecting explanatory variables of fixed effects in linear mixed models under covariate shift, which is when the values of covariates in the model for prediction differ from those in the model for observed data. We construct a variable selection criterion based on the conditional Akaike information introduced by Vaida & Blanchard (2005). We focus especially on covariate shift in small area estimation and demonstrate the usefulness of the proposed criterion. In addition, numerical performance is investigated through simulations, one of which is a design‐based simulation using a real dataset of land prices. The Canadian Journal of Statistics 46: 316–335; 2018 © 2018 Statistical Society of Canada  相似文献   

17.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approach is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weigh check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve nice robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ ‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.  相似文献   

18.
Using some logarithmic and integral transformation we transform a continuous covariate frailty model into a polynomial regression model with a random effect. The responses of this mixed model can be ‘estimated’ via conditional hazard function estimation. The random error in this model does not have zero mean and its variance is not constant along the covariate and, consequently, these two quantities have to be estimated. Since the asymptotic expression for the bias is complicated, the two-large-bandwidth trick is proposed to estimate the bias. The proposed transformation is very useful for clustered incomplete data subject to left truncation and right censoring (and for complex clustered data in general). Indeed, in this case no standard software is available to fit the frailty model, whereas for the transformed model standard software for mixed models can be used for estimating the unknown parameters in the original frailty model. A small simulation study illustrates the good behavior of the proposed method. This method is applied to a bladder cancer data set.  相似文献   

19.
金蛟等 《统计研究》2021,38(11):150-160
回归模型在经济学、生物医学、流行病学、工农业生产等众多领域有着广泛的应用,而在实际数据收集时常常出现无法获得变量的精确数据或全部数据的情况,即常碰到测量误差数据、缺失数据等复杂数据情形。对于回归模型中存在测量误差的情况,如在参数估计时不加以修正,则易产生估计偏差,使得估计精度下降。对于数据缺失情形,如不采取合理的处理方法也会导致模型分析结果不佳。故此,本文研究含有测量误差数据时,解释变量具有随机缺失时的线性测量误差模型和部分线性测量误差模型的稳健参数估计问题。本文提出了一种在测量误差服从拉普拉斯分布时参数的损失修正估计,通过蒙特卡洛模拟和医学研究中的实证分析,显示本文所提的估计方法具有偏差小、精度高、稳健性强的优势。  相似文献   

20.
The additive hazards model is one of the most commonly used regression models in the analysis of failure time data and many methods have been developed for its inference in various situations. However, no established estimation procedure exists when there are covariates with missing values and the observed responses are interval-censored; both types of complications arise in various settings including demographic, epidemiological, financial, medical and sociological studies. To address this deficiency, we propose several inverse probability weight-based and reweighting-based estimation procedures for the situation where covariate values are missing at random. The resulting estimators of regression model parameters are shown to be consistent and asymptotically normal. The numerical results that we report from a simulation study suggest that the proposed methods work well in practical situations. An application to a childhood cancer survival study is provided. The Canadian Journal of Statistics 48: 499–517; 2020 © 2020 Statistical Society of Canada  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号