Similar articles
20 similar articles found.
1.
Often the variables in a regression model are difficult or expensive to obtain, so auxiliary variables are collected in a preliminary step of a study and the model variables are measured at later stages on only a subsample of the study participants, called the validation sample. We consider a study with three stages. At the first stage, some variables (throughout called auxiliaries) are collected. At the second stage, the true outcome is measured on a subsample of the first-stage sample. At the third stage, the true covariates are collected on a subset of the second-stage sample. To increase efficiency, the probabilities of selection into the second- and third-stage samples are allowed to depend on the data observed at the previous stages. In this paper we describe a class of inverse-probability-of-selection-weighted semiparametric estimators for the parameters of the model for the conditional mean of the outcomes given the covariates. We assume that a subject's probability of being sampled at subsequent stages is bounded away from zero and depends only on the subject's data collected at the previous sampling stages. We show that the asymptotic variance of the optimal estimator in our class equals the semiparametric variance bound for the model. Since the optimal estimator depends on unknown population parameters, it is not available for data analysis. We therefore propose an adaptive estimation procedure for locally efficient inference. A simulation study is carried out to examine the finite-sample properties of the proposed estimators.
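The sketch below is a minimal illustration of the simplest member of such a class. It is not the optimal or adaptive estimator described above. It fits a straight-line conditional mean E[Y | X] = b0 + b1·X using only the subjects observed at the final stage. Each subject is weighted by 1/π, where π is its overall probability of reaching that stage (for example π = π2·π3). The function name is hypothetical, and the sketch assumes the selection probabilities are known and positive.

```python
from typing import Sequence, Tuple

def ipw_linear_fit(x: Sequence[float], y: Sequence[float],
                   pi: Sequence[float]) -> Tuple[float, float]:
    """Inverse-probability-weighted least squares for E[Y|X] = b0 + b1*X.

    x, y : covariate and outcome for subjects observed at the final stage.
    pi   : each subject's probability of reaching the final stage
           (assumed known and bounded away from zero).
    Solves sum_i (1/pi_i) * (1, x_i) * (y_i - b0 - b1*x_i) = 0.
    """
    if any(p <= 0 for p in pi):
        raise ValueError("selection probabilities must be positive")
    w = [1.0 / p for p in pi]
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares around those means
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1
```

The weights matter when selection depends on the outcome, or on auxiliaries related to it, observed at an earlier stage. In that case an unweighted fit to the final-stage subjects can be biased.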

2.
Measurement error, the difference between the measured (observed) value of a quantity and its true value, is perceived as a possible source of estimation bias in many surveys. To correct for such bias, a validation sample can be used in addition to the original sample to adjust for measurement error. Depending on the type of validation sample, either the internal calibration approach or the external calibration approach can be used. Motivated by the Korean Longitudinal Study of Aging (KLoSA), we propose a novel application of fractional imputation to correct for measurement error in the analysis of survey data. The proposed method uses a validation subsample to create imputed values of the unobserved true variables, which are mismeasured in the main study. Furthermore, the proposed method is directly applicable when the measurement error model is a mixture distribution. Variance estimation using Taylor linearization is developed. Results from a limited simulation study are also presented.

3.
An estimation procedure is proposed for the Cox model in cohort studies with validation sampling, where crude covariate information is observed for the full cohort and true covariate information is collected on a validation set sampled randomly from the full cohort. The method proposed makes use of the partial information from data that are available on the entire cohort by fitting a working Cox model relating crude covariates to the failure time. The resulting estimator is consistent regardless of the specification of the working model and is asymptotically more efficient than the validation-set-only estimator. Approximate asymptotic relative efficiencies with respect to some alternative methods are derived under a simple scenario and further studied numerically. The finite sample performance is investigated and compared with alternative methods via simulation studies. A similar procedure also works for the case where the validation set is a stratified random sample from the cohort.

4.
We consider regression analysis in generalized linear models when some of the covariates are incomplete. The covariates may be incomplete because of measurement error or because they are missing for some study subjects. We assume there exists a validation sample in which the data are complete and which is a simple random subsample of the whole sample. Based on the idea of the projection-solution method in Heyde (1997, Quasi-Likelihood and its Applications: A General Approach to Optimal Parameter Estimation. Springer, New York), a class of estimating functions is proposed to estimate the regression coefficients using the whole data. This method does not need a correctly specified parametric model for the incomplete covariates to yield a consistent estimate. It also avoids the ‘curse of dimensionality’ encountered in the existing semiparametric method. Simulation results show that the finite-sample performance and efficiency of the proposed estimates are satisfactory. The approach is also computationally convenient and hence can be applied in routine data analysis.

5.
The two-stage design is popular in epidemiological studies and clinical trials because of its cost-effectiveness. Typically, the first-stage sample contains cheaper and possibly biased information, while the second-stage validation sample consists of a subset of subjects with accurate and complete information. In this paper, we study estimation of a survival function with right-censored survival data from a two-stage design. A nonparametric estimator is derived by combining data from both stages. We also study its large-sample properties and derive pointwise and simultaneous confidence intervals for the survival function. The proposed estimator effectively reduces the variance and finite-sample bias of the Kaplan–Meier estimator based solely on the second-stage validation sample. Finally, we apply our method to a real data set from a medical device postmarketing surveillance study.
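The comparison baseline named above is the Kaplan–Meier estimator computed from the second-stage validation sample alone. The sketch below implements only that baseline, not the proposed combined two-stage estimator. The function name is illustrative.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up times (event or censoring).
    events : 1 if the event was observed at that time, 0 if right-censored.
    Subjects censored at t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count events (d) and censorings (c) tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        # Everyone with time t leaves the risk set after t
        n_at_risk -= d + c
    return curve
```

For example, kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]) steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3.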

6.
In biostatistical applications, interest often focuses on estimating the distribution of the time T between two consecutive events. If the initial event time is observed and the subsequent event time is only known to be larger or smaller than an observed monitoring time C, then the data conform to the well-understood singly censored current status model, also known as interval-censored data, case I. Additional covariates can be used to allow for dependent censoring and to improve estimation of the marginal distribution of T. Assuming a wrong model for the conditional distribution of T given the covariates leads to an inconsistent estimator of the marginal distribution. On the other hand, the nonparametric maximum likelihood estimator (NPMLE) of F_T requires splitting the sample into several subsamples, each corresponding to a particular value of the covariates, computing the NPMLE for every subsample, and then taking an average. With a few continuous covariates, the performance of the resulting estimator is typically poor. In van der Laan and Robins (1996), a locally efficient one-step estimator is proposed for smooth functionals of the distribution of T. It assumes nothing about the conditional distribution of T given the covariates, but assumes a model for censoring given the covariates. The estimators are asymptotically linear if the censoring mechanism is estimated correctly. The estimator also uses an estimator of the conditional distribution of T given the covariates. If this estimate is consistent, the estimator is efficient; if it is inconsistent, the estimator is still consistent and asymptotically normal. In this paper we show that the estimators can also be used to estimate the distribution function in a locally optimal way. Moreover, we show that the proposed estimator can be used to estimate the distribution from interval-censored data (T is now known to lie between two observed points) in the presence of covariates.
The resulting estimator also has a known influence curve, so asymptotic confidence intervals are directly available. In particular, our proposal can be applied to interval-censored data without covariates. In Geskus (1992), the information bound for interval-censored data with two uniformly distributed monitoring times at the uniform distribution (for T) was computed. We show that the relative efficiency of our proposal with respect to this optimal bound equals 0.994, which is also reflected in finite-sample simulations. Finally, the good practical performance of the estimator is shown in a simulation study. This revised version was published online in July 2006 with corrections to the Cover Date.

7.
We examine moving average (MA) filters for estimating the integrated variance (IV) of a financial asset price in a framework where high-frequency price data are contaminated with market microstructure noise. We show that the sum of squared MA residuals must be scaled to yield a suitable estimator of IV. Under restrictive assumptions, the scaled estimator is shown to be consistent, first-order efficient, and asymptotically Gaussian about the integrated variance. Under more plausible assumptions, such as time-varying volatility, the MA model is misspecified. This motivates an extensive simulation study of the merits of the MA-based estimator under misspecification. Specifically, we consider nonconstant volatility combined with rounding errors and various forms of dependence between the noise and efficient returns. We benchmark the scaled MA-based estimator against subsample and realized kernel estimators and find that the MA-based estimator performs well despite the misspecification.

8.
Sporting careers observed over a preset time interval can be partitioned into two distinct subsamples. The prevalent subsample consists of individuals whose careers had already commenced at the start of the time interval. The incident subsample consists of individuals whose careers began during the time interval. Both come with individual-level covariate data such as salary, height, weight, performance statistics, and draft position. Under the assumption of a proportional hazards model, we propose a partial likelihood estimator of the effect of covariates on survival for the case where the incident cohort data are used in conjunction with the prevalent cohort data. The estimator is based on an adjusted risk-set sampling procedure. We use simulated failure-time data to validate the combined-cohort proportional hazards methodology and illustrate our model using an NBA data set of career durations measured between 1990 and 2008.

10.
We consider failure time regression analysis with an auxiliary variable in the presence of a validation sample. We extend the nonparametric inference procedure of Zhou and Pepe to handle a continuous auxiliary or proxy covariate. We estimate the induced relative risk function with a kernel smoother and allow the selection probability of the validation set to depend on the observed covariates. We present some asymptotic properties for the kernel estimator and provide some simulation results. The proposed method is illustrated with a data set from an ongoing epidemiologic study.

11.
An alternative to the conventional sample quantile is proposed as a nonparametric estimator of a continuous population quantile. The alternative estimator is a "generalized sample quantile" obtained by averaging an appropriate subsample quantile over all subsamples of a fixed size. The resulting statistic is a U-statistic that can also be represented as a linear combination of order statistics, so known results are then employed to establish asymptotic normality. The alternative estimator is shown to be asymptotically efficient in the class of nonparametric models specified by Pfanzagl (1975). Analytic results and Monte Carlo studies with a moderate sample size for a variety of distributions indicate that the proposed estimator usually has a smaller mean squared error of estimation than the conventional sample quantile.
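The average over all subsamples does not have to be computed by enumeration. Suppose the "appropriate subsample quantile" is taken to be the k-th order statistic of each size-m subsample; that choice, and how k and m relate to the target quantile, are assumptions left open here. Then the average is an L-statistic. The i-th order statistic X_(i) receives weight C(i−1, k−1)·C(n−i, m−k)/C(n, m), because X_(i) is the k-th smallest of a subsample exactly when the subsample contains it, k−1 of the i−1 smaller values, and m−k of the n−i larger ones. The sketch below uses that identity and checks it against brute force. Both function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Sequence

def generalized_sample_quantile(x: Sequence[float], m: int, k: int) -> float:
    """Average, over all size-m subsamples, of each subsample's k-th smallest value.

    Computed in O(n log n) as a weighted sum of the order statistics.
    """
    n = len(x)
    if not (1 <= k <= m <= n):
        raise ValueError("need 1 <= k <= m <= n")
    xs = sorted(x)
    # Weight on X_(i): number of subsamples in which it is the k-th smallest
    return sum(comb(i - 1, k - 1) * comb(n - i, m - k) * xs[i - 1]
               for i in range(1, n + 1)) / comb(n, m)

def brute_force(x: Sequence[float], m: int, k: int) -> float:
    """Direct enumeration of all C(n, m) subsamples (only for checking small n)."""
    vals = [sorted(s)[k - 1] for s in combinations(x, m)]
    return sum(vals) / len(vals)

if __name__ == "__main__":
    data = [3.1, 0.4, 2.2, 5.0, 1.7, 4.4, 0.9]
    print(generalized_sample_quantile(data, 4, 2), brute_force(data, 4, 2))
```

Two special cases act as sanity checks. With m = n the estimator reduces to the ordinary k-th order statistic, and with m = k = 1 it is the sample mean.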

12.
Simple nonparametric estimates of the conditional distribution of a response variable given a covariate are often useful for data exploration, or to help with the specification or validation of a parametric or semiparametric regression model. In this paper we propose such an estimator for the case where the response variable is interval-censored and the covariate is continuous. Our approach consists of adding weights that depend on the covariate value to the self-consistency equation proposed by Turnbull (J R Stat Soc Ser B 38:290–295, 1976). The resulting estimator is no more difficult to implement than Turnbull's estimator itself. We show the convergence of our algorithm and that our estimator reduces to the generalized Kaplan–Meier estimator (Beran, Nonparametric regression with randomly censored survival data, 1981) when the data are either complete or right-censored. We demonstrate by simulation that the estimator, bootstrap variance estimation and bandwidth selection (by rule of thumb or cross-validation) all perform well in finite samples. We illustrate the method by applying it to a dataset from a study on the incidence of HIV in a group of female sex workers from Kinshasa.
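The weighting idea can be sketched as a weighted version of Turnbull's self-consistency (EM-type) iteration. This sketch rests on assumptions made only for illustration:

- each event time lies in a half-open interval (L_i, R_i];
- the candidate support points are supplied by the caller, rather than derived from Turnbull's innermost intervals;
- the covariate weights come from a Gaussian kernel, since the abstract does not specify the kernel.

Function names are hypothetical. With all weights equal to 1, the iteration is the ordinary self-consistency algorithm on the given support points.

```python
import math
from typing import List, Sequence

def gaussian_weights(z: Sequence[float], z0: float, bandwidth: float) -> List[float]:
    """Kernel weights K((z_i - z0) / h) with a Gaussian kernel (illustrative choice)."""
    return [math.exp(-0.5 * ((zi - z0) / bandwidth) ** 2) for zi in z]

def weighted_self_consistency(left: Sequence[float], right: Sequence[float],
                              support: Sequence[float], weights: Sequence[float],
                              tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Probability masses on `support` solving a weighted self-consistency equation.

    Subject i's event time lies in (left[i], right[i]]; alpha[i][j] = 1 when
    support[j] falls in that interval. Each iteration spreads subject i's weight
    over its compatible support points in proportion to the current masses.
    """
    n, m = len(left), len(support)
    alpha = [[1.0 if left[i] < support[j] <= right[i] else 0.0 for j in range(m)]
             for i in range(n)]
    if any(sum(row) == 0.0 for row in alpha):
        raise ValueError("every interval must contain at least one support point")
    total_w = sum(weights)
    p = [1.0 / m] * m
    for _ in range(max_iter):
        new_p = [0.0] * m
        for i in range(n):
            if weights[i] == 0.0:
                continue  # a zero-weight subject contributes nothing
            denom = sum(alpha[i][k] * p[k] for k in range(m))
            for j in range(m):
                new_p[j] += weights[i] * alpha[i][j] * p[j] / denom
        new_p = [v / total_w for v in new_p]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p
```

To estimate the conditional distribution at a covariate value z0, one would pass gaussian_weights(z, z0, h) as `weights`. Cumulative sums of the returned masses then give the estimated distribution function at the support points.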

13.
We consider additive hazards regression analysis that uses auxiliary covariate information to improve the efficiency of statistical inference when the primary covariate is ascertained only for a randomly selected subsample. We construct a martingale-based estimating equation for the regression parameter and establish the consistency and asymptotic normality of the resulting estimator. A simulation study shows that our proposed method can improve efficiency compared with the estimator that discards the auxiliary covariate information. A real example is also analysed as an illustration.

14.
A two-phase sampling estimator of ratio type for estimating the mean of a finite population is considered for the case where the value of ρC_y/C_x can be guessed or estimated in advance. Here C_y and C_x denote, respectively, the coefficients of variation of the characteristic under study, y, and of the auxiliary characteristic, x, and ρ denotes the coefficient of correlation between y and x. Suppose the value of ρC_y/C_x is guessed or estimated exactly, and the first-phase sample is drawn independently of the second-phase sample. Then the estimator has a smaller large-sample variance than either an ordinary ratio estimator or an ordinary linear regression estimator in two-phase sampling. If the second-phase sample is a subsample of the first-phase sample, the estimator has variance equal to that of the linear regression estimator. The largest allowable difference between the assumed and actual values of ρC_y/C_x is obtained. Up to that difference, the variance of the estimator is no larger than the variances of both an ordinary ratio estimator and an ordinary linear regression estimator.
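One standard way to build a ratio-type estimator around a guessed value K of ρC_y/C_x starts from the regression slope. Because C_y = S_y/Ȳ and C_x = S_x/X̄, the slope β = ρS_y/S_x equals K·Ȳ/X̄. Replacing β in the two-phase regression estimator by K·ȳ/x̄ gives ȳ_K = ȳ·[1 + K(x̄′/x̄ − 1)]. Here x̄′ is the first-phase mean of x, and ȳ, x̄ are second-phase means. The sketch below assumes that form; the estimator studied in the paper may be defined differently. The function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence

def ratio_type_two_phase(y2: Sequence[float], x2: Sequence[float],
                         x1_mean: float, k: float) -> float:
    """Two-phase ratio-type estimate of the population mean of y (assumed form).

    y2, x2  : y and x measured on the second-phase sample.
    x1_mean : mean of x over the (larger) first-phase sample.
    k       : guessed or pre-estimated value of rho * C_y / C_x.
    Returns y_bar * (1 + k * (x1_mean / x_bar - 1)).
    """
    y_bar = fmean(y2)
    x_bar = fmean(x2)
    return y_bar * (1.0 + k * (x1_mean / x_bar - 1.0))
```

With k = 1 the expression reduces to the ordinary two-phase ratio estimator ȳ·x̄′/x̄. With k = 0 it ignores the auxiliary variable and returns ȳ.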

15.
Measurement errors frequently occur when responses are observed. If case–control data are based on reported responses, which may not be the true responses, the case–control data are contaminated. In this paper, we first show that ordinary logistic regression analysis based on contaminated case–control data can lead to seriously biased conclusions. We support this with a theoretical argument, one example, and two simulation studies. We next derive the semiparametric maximum likelihood estimate (MLE) of the risk parameter of a logistic regression model when there is a validation subsample. The asymptotic normality of the semiparametric MLE is shown, along with a consistent estimate of its asymptotic variance. Our example and two simulation studies show these estimates to perform reasonably well in finite samples.

16.
In this paper we propose a new nonparametric estimator of the conditional distribution function under a semiparametric censorship model. We establish an asymptotic representation of the estimator as a sum of iid random variables weighted by kernel weights. This representation is used to obtain large-sample results such as the rate of uniform convergence of the estimator and its limiting distribution. We prove that the new estimator outperforms the conditional Kaplan–Meier estimator for censored data, in the sense that it has lower asymptotic variance. An illustration through a real data analysis is provided.

17.
We consider logistic regression with covariate measurement error. Most existing approaches require certain replicates of the error‐contaminated covariates, which may not be available in the data. We propose generalized method of moments (GMM) nonparametric correction approaches that use instrumental variables observed in a calibration subsample. The instrumental variable is related to the underlying true covariates through a general nonparametric model, and the probability of being in the calibration subsample may depend on the observed variables. We first take a simple approach adopting the inverse selection probability weighting technique using the calibration subsample. We then improve the approach based on the GMM using the whole sample. The asymptotic properties are derived, and the finite sample performance is evaluated through simulation studies and an application to a real data set.

18.
Sample covariance matrices play a central role in numerous popular statistical methodologies, for example principal components analysis, Kalman filtering and independent component analysis. However, modern random matrix theory indicates that, when the dimension of a random vector is not negligible with respect to the sample size, the sample covariance matrix demonstrates significant deviations from the underlying population covariance matrix. There is an urgent need to develop new estimation tools in such cases with high‐dimensional data to recover the characteristics of the population covariance matrix from the observed sample covariance matrix. We propose a novel solution to this problem based on the method of moments. When the parametric dimension of the population spectrum is finite and known, we prove that the proposed estimator is strongly consistent and asymptotically Gaussian. Otherwise, we combine the first estimation method with a cross‐validation procedure to select the unknown model dimension. Simulation experiments demonstrate the consistency of the proposed procedure. We also indicate possible extensions of the proposed estimator to the case where the population spectrum has a density.
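The sketch below shows why a moment correction is needed. It computes the first two sample spectral moments α_j = tr(S^j)/p. It then inverts the standard large-dimensional limits α_1 = β_1 and α_2 = β_2 + c·β_1², where c = p/n, to estimate the population spectral moments β_1 and β_2. This is only a two-moment illustration under stated assumptions: zero-mean data and S = XXᵀ/n. It is not the paper's estimator, which matches moments to a parametric model of the population spectrum. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def sample_covariance(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """S = X X^T / n for a p x n data matrix X (rows = variables, mean assumed zero)."""
    p, n = len(x), len(x[0])
    return [[sum(x[a][t] * x[b][t] for t in range(n)) / n for b in range(p)]
            for a in range(p)]

def sample_spectral_moments(s: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """alpha_1 = tr(S)/p and alpha_2 = tr(S^2)/p (tr(S^2) = squared Frobenius norm for symmetric S)."""
    p = len(s)
    alpha1 = sum(s[a][a] for a in range(p)) / p
    alpha2 = sum(s[a][b] ** 2 for a in range(p) for b in range(p)) / p
    return alpha1, alpha2

def population_moments(alpha1: float, alpha2: float, c: float) -> Tuple[float, float]:
    """Invert alpha_1 = beta_1 and alpha_2 = beta_2 + c * beta_1**2, with c = p/n."""
    return alpha1, alpha2 - c * alpha1 ** 2

if __name__ == "__main__":
    random.seed(0)
    p, n = 40, 200
    # Population covariance = identity, so the true moments are beta_1 = beta_2 = 1
    x = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    a1, a2 = sample_spectral_moments(sample_covariance(x))
    print("sample moments:", a1, a2)  # a2 is inflated by roughly c = 0.2
    print("corrected:", population_moments(a1, a2, p / n))
```

Even with p only a fifth of n, the uncorrected second moment overstates the population value by about c. This illustrates the deviation the abstract describes.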

19.
An extension of the linear growth curve model (Biometrics 38 (1982) 963) was proposed by Stukel and Demidenko (Biometrics 53 (1997) 720) to study the effects of population covariates on one or more characteristics of the curve, when the characteristics are expressed as linear combinations of the growth curve parameters. In the present paper, this general growth curve model receives a comprehensive theoretical treatment. A two-stage estimator, consisting of a generalized least squares estimator under constraints for the population parameters and a moment estimator for the variance parameters, is developed for application in the non-Gaussian error situation. Two likelihood based estimators, global maximum likelihood and second-stage maximum likelihood, are also developed. It is shown that all three estimators are consistent, asymptotically normally distributed, and efficient, and are equivalent when the number of individuals tends to infinity. An expression for the bias in the estimator of the population parameters is derived under second stage model misspecification. We show that if parameters that are not of primary interest are incorrectly specified, bias may occur in parameters that are of interest using the standard growth curve model. The general growth curve model does not require specification of such nuisance parameters and is robust in terms of bias. The general linear growth curve model is used to study the effects of host sex on pancreatic tumor growth in rats.

20.
We examine the finite-sample properties of the maximum likelihood estimator for the binary logit model with random covariates. Previous studies have either relied on large-sample asymptotics or have assumed non-random covariates. We derive analytic expressions for the first-order bias and second-order mean squared error function of the maximum likelihood estimator in this model. We then undertake numerical evaluations to illustrate these analytic results for the single-covariate case. For various data distributions, the bias of the estimator has the same sign as the covariate's coefficient, and both the absolute bias and the mean squared error increase symmetrically with the absolute value of that parameter. A bias-adjusted maximum likelihood estimator is constructed by subtracting the (maximum likelihood) estimator of the first-order bias from the original estimator, and its behaviour is examined in a Monte Carlo experiment. This bias correction is effective in all of the cases considered and is recommended when this logit model is estimated by maximum likelihood using small samples.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417