Similar Literature
20 similar documents found.
1.
Problems of the analysis of data with incomplete observations are all too familiar in statistics. They are doubly difficult if we are also uncertain about the choice of model. We propose a general formulation for the discussion of such problems and develop approximations to the resulting bias of maximum likelihood estimates on the assumption that model departures are small. Loss of efficiency in parameter estimation due to incompleteness in the data has a dual interpretation: the increase in variance when an assumed model is correct, and the bias in estimation when the model is incorrect. Examples include non-ignorable missing data, hidden confounders in observational studies and publication bias in meta-analysis. Doubling variances before calculating confidence intervals or test statistics is suggested as a crude way of addressing the possibility of undetectably small departures from the model. The problem of assessing the risk of lung cancer from passive smoking is used as a motivating example.
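As a minimal sketch of the doubled-variance adjustment suggested above (assuming a simple Wald interval for a mean; the data and parameter values are hypothetical), doubling the variance widens the interval by a factor of sqrt(2):

    import numpy as np
    from scipy import stats

    # Hypothetical data: the adjustment itself is generic.
    x = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=50)
    est = x.mean()
    var_hat = x.var(ddof=1) / len(x)      # usual variance of the sample mean
    z = stats.norm.ppf(0.975)             # 95% normal critical value

    ci_usual = (est - z * np.sqrt(var_hat), est + z * np.sqrt(var_hat))
    # Crude allowance for undetectably small model departures: double the
    # variance, which widens the interval by a factor of sqrt(2).
    ci_doubled = (est - z * np.sqrt(2 * var_hat), est + z * np.sqrt(2 * var_hat))
    print(ci_usual, ci_doubled)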

2.
A correlated probit model approximation for conditional probabilities (Mendell and Elston 1974) is used to estimate the variance for binary matched-pairs data by maximum likelihood. Using asymptotic data, the bias of the estimates is shown to be small for a wide range of intra-class correlations and incidences. This approximation is also compared with other recently published or implemented improved approximations. For the small-sample examples presented, it shows a substantial advantage over the other approximations. The method is extended to allow covariates for each observation, with fitting by iteratively reweighted least squares.

3.
Singh and Arnab (2010) presented a bias adjustment to the jackknife variance estimator of Rao and Sitter (1995) in the presence of non-response. In their paper, they obtained a second-order approximation of the bias of the Rao-Sitter variance estimator and then proposed a bias-adjusted estimator based on this approximation. To compare their proposed variance estimator with various other variance estimators, they performed a simulation study and showed that their variance estimator is superior to the Rao-Sitter variance estimator. In fact, they showed that the Rao-Sitter variance estimator suffers from severe underestimation. These results contradict those in the literature, which indicate that the Rao-Sitter variance estimator suffers from a positive bias if the sampling fractions are not negligible; see Rao and Sitter (1995), Lee et al. (1995) and Haziza and Picard (2011). Because of this contradiction, we felt that a further investigation was warranted. In this paper, we attempt to recreate the results of Singh and Arnab (2010) and, in fact, show that their second-order approximation to the bias of the Rao-Sitter variance estimator is incorrect and that their simulation results are also questionable.

4.
Compositional data can be transformed to directional data by the square-root transformation and then modelled using the Kent distribution. The current approach for estimating the parameters of the Kent model for compositional data relies on a large-concentration assumption, namely that the majority of the transformed data is not distributed too close to the boundaries of the positive orthant. When the data are distributed close to the boundaries with large variance, significant folding may result. To treat this case we propose new estimators of the parameters derived from the actual folded Kent distribution, obtained via the EM algorithm. We show that these new estimators significantly reduce the bias in the current estimators when both the sample size and the amount of folding are moderately large. We also propose using a saddlepoint density approximation for the Kent distribution normalising constant in order to estimate the shape parameters more accurately when the concentration is small or only moderately large.
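A minimal sketch of the square-root transformation described above (the composition values are hypothetical): a composition with non-negative parts summing to one maps to a unit vector on the sphere, where directional models such as the Kent distribution apply.

    import numpy as np

    def sqrt_transform(composition):
        """Map a composition to a point on the unit sphere."""
        p = np.asarray(composition, dtype=float)
        p = p / p.sum()          # ensure the parts sum to one
        return np.sqrt(p)        # sum of squares is 1, so this is a unit vector

    u = sqrt_transform([0.2, 0.5, 0.3])   # hypothetical three-part composition
    print(u, np.sum(u**2))                # squared norm equals 1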

5.
We derived two methods to estimate the logistic regression coefficients in a meta-analysis when only the 'aggregate' data (mean values) from each study are available. The estimators we proposed are the discriminant function estimator and the reverse Taylor series approximation. These two methods gave similar estimates in an example with individual data. However, when aggregate data were used, the discriminant function estimates differed considerably from the other two estimators. A simulation study was then performed to evaluate the performance of these two estimators as well as the estimator obtained from the model that simply uses the aggregate data in a logistic regression. The simulation study showed that all three estimators are biased. The bias increases as the variance of the covariate increases, and the distribution type of the covariates also affects the bias. In general, the estimator from the logistic regression using the aggregate data has less bias and better coverage probabilities than the other two estimators. We concluded that analysts should be cautious in using aggregate data to estimate the parameters of a logistic regression model for the underlying individual data.
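A hedged illustration of the contrast discussed above (not the paper's discriminant function or reverse Taylor estimators; the simulation settings are hypothetical): fitting a logistic model to simulated individual data versus regressing study-level event proportions on study-level covariate means.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.0 * x))))  # true coefficients
    study = rng.integers(0, 10, size=500)    # assign subjects to 10 studies

    # Individual-data fit: the target of inference.
    fit_ind = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()

    # Aggregate-data fit: each study's event proportion on its mean covariate.
    xbar = np.array([x[study == s].mean() for s in range(10)])
    ybar = np.array([y[study == s].mean() for s in range(10)])
    fit_agg = sm.GLM(ybar, sm.add_constant(xbar),
                     family=sm.families.Binomial()).fit()
    print(fit_ind.params, fit_agg.params)    # aggregate estimates are biased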

6.
In many applications, it is of interest to model the mean and variance of a response simultaneously when no replication exists; this is commonly referred to as dual modeling. Parametric approaches to dual modeling are popular when the underlying mean and variance functions can be expressed explicitly. Quite often, however, nonparametric approaches are more appropriate because of unusual curvature in the underlying functions. In sparse-data situations, nonparametric methods often fit the data too closely, while parametric estimates exhibit problems with bias. We propose a semi-parametric dual modeling approach, dual model robust regression (DMRR), for non-replicated data. DMRR combines parametric and nonparametric fits, resulting in improved mean and variance estimation. The methodology is illustrated with a data set from the literature as well as via a simulation study.

7.
Subject dropout is an inevitable problem in longitudinal studies. It makes the analysis challenging when the main interest is the change in outcome from baseline to the endpoint of the study. The last observation carried forward (LOCF) method is a very common approach to handling this problem. It assumes that the last measured outcome is frozen in time after the point of dropout, an unrealistic assumption given any time trends. Though the existence and direction of the bias can sometimes be anticipated, the more important statistical question involves the actual magnitude of the bias, and this requires computation. This paper provides explicit expressions for the exact bias in the LOCF estimates of mean change and its variance when the longitudinal data follow a linear mixed-effects model with linear time trajectories. General dropout patterns are considered that may depend on treatment group and subject-specific trajectories and follow different time-to-dropout distributions. In our case studies, the magnitude of the bias in the mean-change estimators increases linearly as time to dropout decreases. The bias depends heavily on the dropout interval, and the variance term is always underestimated.
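For concreteness, a minimal sketch of LOCF imputation itself on hypothetical data: within each subject, a missing visit simply takes the last observed value.

    import pandas as pd

    df = pd.DataFrame({
        "subject": [1, 1, 1, 2, 2, 2],
        "visit":   [0, 1, 2, 0, 1, 2],
        "outcome": [5.0, 6.0, None, 4.0, None, None],
    })
    # Carry each subject's last observed outcome forward past dropout.
    df["outcome_locf"] = df.groupby("subject")["outcome"].ffill()
    print(df)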

8.
This article proposes a fast approximation for the small-sample bias correction of the iterated bootstrap. The approximation adapts existing fast approximation techniques for the bootstrap p-value and quantile functions to the problem of estimating the bias function. We show an optimality result that holds under general conditions not requiring an asymptotic pivot. Monte Carlo evidence from the linear instrumental variable model and the nonlinear GMM suggests that, in addition to its computational appeal and success in reducing the mean and median bias in identified models, the fast approximation provides scope for bias reduction in weakly identified configurations.
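To fix ideas, here is a minimal single-level bootstrap bias correction on hypothetical data; the article's contribution is a fast approximation to the far more expensive iterated version of this scheme.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=2.0, size=30)

    theta_hat = x.var()                 # biased plug-in variance estimate
    boot = np.array([rng.choice(x, size=x.size, replace=True).var()
                     for _ in range(2000)])
    bias_hat = boot.mean() - theta_hat  # bootstrap estimate of the bias
    theta_bc = theta_hat - bias_hat     # bias-corrected estimate
    print(theta_hat, theta_bc, x.var(ddof=1))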

9.
We consider the problem of estimating the quantiles of a distribution function in a fixed-design regression model in which the observations are subject to random right censoring. The quantile estimator is defined via a conditional Kaplan-Meier type estimator of the distribution at a given design point. We establish an almost sure asymptotic representation for this quantile estimator, from which we obtain its asymptotic normality. Because a complicated estimation procedure would be needed to estimate the asymptotic bias and variance, we use a resampling procedure, which provides us, via an asymptotic representation for the bootstrapped estimator, with an alternative to the normal approximation.
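As background, a bare-bones (unconditional) Kaplan-Meier quantile on simulated right-censored data; the paper's estimator is a conditional, design-point version of this idea, and ties are handled only crudely here.

    import numpy as np

    def km_quantile(time, event, q):
        """Smallest observed time at which the KM survival curve falls to 1 - q."""
        order = np.argsort(time)
        t, d = time[order], event[order].astype(float)
        at_risk = len(t) - np.arange(len(t))   # assumes distinct event times
        surv = np.cumprod(1 - d / at_risk)     # Kaplan-Meier survival curve
        idx = np.nonzero(surv <= 1 - q)[0]
        return t[idx[0]] if idx.size else np.inf

    rng = np.random.default_rng(3)
    t_true = rng.exponential(2.0, 200)         # hypothetical survival times
    c = rng.exponential(4.0, 200)              # hypothetical censoring times
    time, event = np.minimum(t_true, c), t_true <= c
    print(km_quantile(time, event, 0.5))       # estimated median survival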

10.
The author considers (asymptotically) minimax extrapolation designs for an approximately multiple linear model with the model contaminant f being restricted only by its L2 norm. He splits the integrated mean squared prediction error (IMSPE) of the fitted value over the extrapolation space into two parts, namely the integrated prediction variance (IPV) and the integrated prediction bias (IPB). For a spherical design space and an annular extrapolation space, he constructs the design that minimizes the maximum value, over f, of IPB subject to bounding IPV. He also constructs the design that minimizes IPV subject to bounding the maximum IPB.

11.
The Rasch model is useful in the problem of estimating the population size from multiple incomplete lists. It is of great interest to determine whether there are list effects and whether individuals differ in their catchabilities. These two important model selection problems can be easily addressed conditionally. A conditional likelihood ratio test is used to evaluate the list effects and several graphical methods are used to diagnose the individual catchabilities, while neither the unknown population size nor the unknown mixing distribution of individual catchabilities is required to be estimated. Three epidemiological applications are used for illustration.

12.
This paper addresses the problem of obtaining maximum likelihood estimates for the parameters of the Pearson Type I distribution (a beta distribution with unknown end points and shape parameters). Since they do not seem to have appeared in the literature, the likelihood equations and the information matrix are derived. The regularity conditions which ensure asymptotic normality and efficiency are examined, and some apparent conflicts in the literature are noted. To ensure regularity, the shape parameters must be greater than two, giving an (asymmetrical) bell-shaped distribution with high contact in the tails. A numerical investigation was carried out to explore the bias and variance of the maximum likelihood estimates and their dependence on sample size. The numerical study indicated that only for large samples (n ≥ 1000) does the bias in the estimates become small and does the Cramér-Rao bound give a good approximation for their variance. The likelihood function has a global maximum which corresponds to parameter estimates that are inadmissible. Useful parameter estimates can be obtained at a local maximum, which is sometimes difficult to locate when the sample size is small.
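One way to attempt such a four-parameter fit in practice (a sketch, not the paper's procedure) is scipy's maximum likelihood fit of the beta distribution with free location and scale; as the abstract warns, the optimizer can wander toward inadmissible solutions, so starting values matter for small samples.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    # Hypothetical Pearson Type I sample: beta shapes (3, 4) on the interval [1, 6].
    data = stats.beta.rvs(3.0, 4.0, loc=1.0, scale=5.0, size=2000, random_state=rng)

    # Four-parameter ML fit: shapes a, b plus end points loc and loc + scale.
    a, b, loc, scale = stats.beta.fit(data)
    print(a, b, loc, loc + scale)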

13.
A bioequivalence test compares bioavailability parameters, such as the maximum observed concentration (Cmax) or the area under the concentration-time curve (AUC), for a test drug and a reference drug. Planning a bioequivalence test requires an assumption about the variance of Cmax or AUC for the estimation of the sample size. Since the variance is unknown, current 2-stage designs use the variance estimated from stage 1 data to determine the sample size for stage 2. However, this variance estimate is unstable and may result in a stage 2 sample size that is too large or too small. The problem is magnified in bioequivalence tests with a serial sampling schedule, in which only one sample is collected from each individual, so a correct assumption about the variance becomes even more difficult. To solve this problem, we propose 3-stage designs. Our designs increase the sample size gradually over the stages, so that extremely large sample sizes are avoided. With one more stage of data, the power is increased. Moreover, the variance estimated using data from both stages 1 and 2 is more stable than that using data from stage 1 only in a 2-stage design. These features of the proposed designs are demonstrated by simulations. Testing significance levels are adjusted to control the overall type I error at the same level for all the multistage designs.

14.
The capture-recapture method is applied to estimate the population size of a target population based on ascertainment data in epidemiological applications. We generalize the three-list case of Chao & Tsay (1998) to situations where more than three lists are available. An estimation procedure is presented using the concept of sample coverage, which can be interpreted as a measure of overlap information among multiple list records. When there is enough overlap, an estimator of the total population size is proposed. The bootstrap method is used to construct a variance estimator and confidence interval. If the overlap rate is relatively low, then the population size cannot be precisely estimated and thus only a lower (upper) bound is proposed for positively (negatively) dependent lists. The proposed method is applied to two data sets, one with a high and one with a low overlap rate.
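For orientation only, the classical two-list Chapman estimator is shown below; the paper's sample-coverage method generalizes far beyond this simple cross-classification, and the counts are hypothetical.

    # Classical two-list Chapman estimator of population size (not the paper's
    # sample-coverage estimator, which handles three or more dependent lists).
    def chapman(n1, n2, m):
        """n1, n2: list sizes; m: individuals appearing on both lists."""
        return (n1 + 1) * (n2 + 1) / (m + 1) - 1

    print(chapman(n1=120, n2=150, m=30))   # hypothetical ascertainment counts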

15.
In high-dimensional linear regression, the number of variables exceeds the sample size. In this situation, the traditional variance estimation technique based on ordinary least squares consistently exhibits high bias, even under a sparsity assumption. A major reason is the high spurious correlation between the unobserved realized noise and several predictors. To alleviate this problem, a refitted cross-validation (RCV) method has been proposed in the literature. However, for a complicated model and finite samples, RCV has a lower probability that the selected model includes the true model, which may easily result in a large bias in the variance estimate. Thus, a model selection method based on the ranks of the frequency of occurrences in six votes from a blocked 3×2 cross-validation is proposed in this study. The proposed method has a considerably larger probability of including the true model in practice than the RCV method. The variance estimate obtained using the model selected by the proposed method also shows lower bias and smaller variance. Furthermore, theoretical analysis proves the asymptotic normality of the proposed variance estimator.
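A hedged sketch of the baseline RCV idea on simulated data (the lasso penalty and dimensions are arbitrary illustrative choices): variables are selected on one half of the sample, the other half is refit by OLS, and the residual variance is averaged over the two role-swapped halves.

    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression

    rng = np.random.default_rng(5)
    n, p = 100, 300                          # more variables than observations
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)   # true sigma^2 = 1

    idx = rng.permutation(n)
    parts = [idx[: n // 2], idx[n // 2:]]

    sig2 = []
    for sel, ref in [(0, 1), (1, 0)]:
        # Select variables on one half (lasso; alpha is an arbitrary choice,
        # assumed to keep the selected set small)...
        keep = np.flatnonzero(
            Lasso(alpha=0.2).fit(X[parts[sel]], y[parts[sel]]).coef_)
        # ...then refit by OLS on the other half and use its residual variance.
        Xr, yr = X[parts[ref]][:, keep], y[parts[ref]]
        resid = yr - LinearRegression().fit(Xr, yr).predict(Xr)
        sig2.append(resid @ resid / (len(yr) - len(keep) - 1))
    print(np.mean(sig2))                     # RCV variance estimate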

16.
In this paper we examine the small-sample distribution of the likelihood ratio test in the random-effects model that is often recommended for meta-analyses. We find that this distribution depends strongly on the true value of the heterogeneity parameter (the between-study variance) of the model, and that the correct p-value may be quite different from its large-sample approximation. We recommend that this dependence on the heterogeneity parameter be examined for the data at hand and suggest a simulation method for doing so. Our setup allows for explanatory variables on the study level (meta-regression), and we discuss other possible applications too. Two data sets are analyzed and two simulation studies are performed for illustration.

17.
18.
A simulation study of the binomial-logit model with correlated random effects is carried out based on the generalized linear mixed model (GLMM) methodology. Simulated data with various numbers of regression parameters and different values of the variance component are considered. The performance of approximate maximum likelihood (ML) and residual maximum likelihood (REML) estimators is evaluated. For a range of true parameter values, we report the average biases of estimators, the standard error of the average bias and the standard error of estimates over the simulations. In general, in terms of bias, the two methods do not show significant differences in estimating regression parameters. The REML estimation method is slightly better in reducing the bias of variance component estimates.
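A minimal sketch of the data-generating step for such a simulation; for simplicity this uses an independent normal random intercept rather than the correlated random effects of the study, and all parameter values are hypothetical.

    import numpy as np

    rng = np.random.default_rng(6)
    n_groups, n_per = 50, 20
    beta0, beta1, sigma_u = -1.0, 0.8, 0.5  # fixed effects and variance component

    u = rng.normal(0.0, sigma_u, size=n_groups)   # random intercepts per group
    x = rng.normal(size=(n_groups, n_per))
    eta = beta0 + beta1 * x + u[:, None]          # linear predictor
    y = rng.binomial(1, 1 / (1 + np.exp(-eta)))   # binary responses by group
    print(y.mean())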

19.
This paper discusses a general strategy for reducing measurement-error-induced bias in statistical models. It is assumed that the measurement error is unbiased with a known variance, although no other distributional assumptions on the measurement error are employed.

Using a preliminary fit of the model to the observed data, a transformation of the variable measured with error is estimated. The transformation is constructed so that the estimates obtained by refitting the model to the 'corrected' data have smaller bias.

Whereas the general strategy can be applied in a number of settings, this paper focuses on the problem of covariate measurement error in generalized linear models. Two estimators are derived and their effectiveness at reducing bias is demonstrated in a Monte Carlo study.
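Not the paper's transformation-based method, but a classical point of comparison under the same known-measurement-error-variance assumption: the method-of-moments attenuation correction for a simple linear regression, on simulated data.

    import numpy as np

    rng = np.random.default_rng(7)
    n, sigma_u2 = 500, 0.5
    x = rng.normal(size=n)                      # true covariate (unobserved)
    w = x + rng.normal(scale=np.sqrt(sigma_u2), size=n)   # error-prone version
    y = 2.0 * x + rng.normal(size=n)

    beta_naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)   # attenuated estimate
    reliability = (np.var(w, ddof=1) - sigma_u2) / np.var(w, ddof=1)
    beta_corrected = beta_naive / reliability
    print(beta_naive, beta_corrected)           # true slope is 2.0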

20.
Frailty models with a non-parametric baseline hazard are widely used for the analysis of survival data. However, their maximum likelihood estimators can be substantially biased in finite samples, because the number of nuisance parameters associated with the baseline hazard increases with the sample size. The penalized partial likelihood based on a first-order Laplace approximation still has non-negligible bias. However, the second-order Laplace approximation to a modified marginal likelihood for bias reduction is infeasible because of the presence of too many complicated terms. In this article, we find adequate modifications of these likelihood-based methods by using the hierarchical likelihood.
