Similar literature: 20 similar documents found.
1.
This paper extends the missing plot substitution technique to the case where the missing observations cause some previously estimable functions to become non-estimable. It is shown that, with appropriate modifications, the usual methods of analysis remain valid. We also obtain necessary and sufficient conditions under which the sum of squares due to a hypothesis can be calculated without “re-estimating” the missing observations.

2.
The added variable plot is commonly used to assess the adequacy of a normal linear model. It is often used to evaluate the effect of adding an explanatory variable to the model and to detect high-leverage points or observations that are influential on the added variable. However, the validity of this type of plot is in doubt once the normal distributional assumptions are violated. In this article, we extend the robust likelihood technique introduced by Royall and Tsou [11] to propose a robust added variable plot. The validity of this diagnostic plot requires no knowledge of the true underlying distributions so long as their second moments exist. The usefulness of the robust graphical approach is demonstrated through a few illustrations and simulations.
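As a point of reference for the classical (non-robust) construction that the robust plot generalizes, the following minimal sketch computes ordinary added-variable (partial regression) plot coordinates by least squares. The function name and the simulated data are purely illustrative; this is not the authors' robust procedure.

```python
import numpy as np

def added_variable_residuals(y, X_other, x_new):
    """Classical added-variable plot coordinates: residuals of y on X_other
    versus residuals of x_new on X_other."""
    Z = np.column_stack([np.ones(len(y)), X_other])          # intercept + other covariates
    e_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]        # residuals of y given Z
    e_x = x_new - Z @ np.linalg.lstsq(Z, x_new, rcond=None)[0]  # residuals of x_new given Z
    return e_x, e_y   # plot e_y against e_x; the slope equals the coefficient of x_new

# Toy illustration with simulated data.
rng = np.random.default_rng(0)
n = 200
X_other = rng.normal(size=(n, 2))
x_new = rng.normal(size=n)
y = X_other @ np.array([1.0, -0.5]) + 2.0 * x_new + rng.normal(size=n)
e_x, e_y = added_variable_residuals(y, X_other, x_new)
print(np.polyfit(e_x, e_y, 1)[0])   # ~2.0, the coefficient of x_new
```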

3.
A simple derivation of expected mean squares is given for the randomized (complete) block design, showing that “experimental error,” the error term for testing treatments, comprises three sources of variability: block-by-treatment interaction, within-block plot-to-plot variability, and within-plot sampling variation. The approach could readily be extended to incorporate measurement error as a fourth component of experimental error.
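As a hedged illustration of the decomposition described above, a standard random-effects model for a randomized complete block design with n subsamples per plot gives an expected mean square of the following form; the notation and coefficients are an assumption consistent with the abstract, not taken from the paper.

```latex
% Assumed model: block rho_i, treatment tau_j, random interaction (rho*tau)_{ij},
% plot error delta_{ij}, sampling error eps_{ijk} with k = 1,...,n subsamples per plot.
y_{ijk} = \mu + \rho_i + \tau_j + (\rho\tau)_{ij} + \delta_{ij} + \varepsilon_{ijk},
\qquad
E\!\left(\mathrm{MS}_{\text{experimental error}}\right)
  = \sigma^{2}_{\varepsilon} + n\,\sigma^{2}_{\delta} + n\,\sigma^{2}_{\rho\tau}.
```

Under this assumed model the three variance components correspond, respectively, to within-plot sampling variation, within-block plot-to-plot variability, and block-by-treatment interaction; measurement error would enter as a fourth additive component.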

4.
The missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missingness, where the probability that a response is missing depends on its own value, is the most difficult missing data problem. In the statistical literature, unlike the ignorable missing data problem, few papers on non-ignorable missing data are available beyond fully parametric model-based approaches. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method, we obtain constrained maximum empirical likelihood estimators of the parameters in the missing probability and of the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of real AIDS trial data shows that the missingness of CD4 counts at around two years is non-ignorable and that the sample mean based on observed data only is biased.

5.
The analysis of doubly-balanced incomplete block designs (also known as 3-designs) with missing values using a competing effects model is discussed and illustrated with a numerical example.

6.
In clinical trials we always expect some missing data. If data are missing completely at random, then the missing data can be ignored for the purpose of statistical inference. In most situations, however, ignoring missing data will introduce bias. Adjustment for missing data is possible if the missingness mechanism is known, which is rare in real problems. Our approach is to estimate directly the mean outcome of each treatment group in the presence of missing data. To this end, we post-stratify all subjects by the expected value of the outcome (or by a variable predictive of the outcome) so that subjects within a stratum may be considered homogeneous with respect to the expected outcome, and we assume that subjects within a stratum are missing at random. We apply this post-stratification approach to a recently concluded clinical trial in which a high proportion of data are missing and the missingness depends on the same factors affecting the outcome variable. A simulation study shows that the post-stratification approach reduces the bias substantially compared to the naive approach in which only non-missing subjects are analyzed.
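A minimal sketch of a post-stratified mean estimator of this kind is given below; the stratification variable, the missingness mechanism and the simulated data are illustrative assumptions rather than the trial analysis described in the abstract.

```python
import numpy as np

def post_stratified_mean(y, observed, stratum):
    """Post-stratified mean under missing-at-random within each stratum:
    average the observed outcomes within a stratum, then weight each stratum
    mean by the stratum's share of all randomized subjects."""
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    stratum = np.asarray(stratum)
    total = len(y)
    estimate = 0.0
    for s in np.unique(stratum):
        in_s = stratum == s
        obs_s = in_s & observed
        if not obs_s.any():
            raise ValueError(f"stratum {s} has no observed outcomes")
        estimate += (in_s.sum() / total) * y[obs_s].mean()
    return estimate

# Toy example: outcomes are missing more often in strata with low expected outcome.
rng = np.random.default_rng(1)
strata = rng.integers(0, 3, size=500)          # e.g. tertiles of a prognostic score
y = strata + rng.normal(size=500)
observed = rng.random(500) < 0.4 + 0.2 * strata
print(post_stratified_mean(y, observed, strata))   # close to y.mean()
print(y[observed].mean())                          # naive estimate, biased upward here
```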

7.
This article examines methods to efficiently estimate the mean response in a linear model with an unknown error distribution under the assumption that the responses are missing at random. We show how the asymptotic variance is affected by the estimator of the regression parameter and by the imputation method. To estimate the regression parameter, ordinary least squares is efficient only if the error distribution happens to be normal. If the errors are not normal, we propose a one-step improvement estimator or a maximum empirical likelihood estimator to estimate the parameter efficiently. To investigate the imputation's impact on the estimation of the mean response, we compare the listwise deletion method and the propensity score method (which do not use imputation at all) with two imputation methods. We demonstrate that listwise deletion and the propensity score method are inefficient. Partial imputation, where only the missing responses are imputed, is compared to full imputation, where both missing and non-missing responses are imputed. Our results reveal that, in general, full imputation is better than partial imputation; however, when the regression parameter is estimated very poorly, partial imputation will outperform full imputation. The efficient estimator of the mean response is the full imputation estimator that utilizes an efficient estimator of the parameter.
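The sketch below illustrates the general idea of a full-imputation estimator of the mean response. For simplicity it uses ordinary least squares rather than the one-step improvement or maximum empirical likelihood estimators proposed in the paper, so it should be read as an assumption-laden toy, not the authors' efficient estimator.

```python
import numpy as np

def full_imputation_mean(X, y, observed):
    """Full-imputation estimator of the mean response: fit the regression on
    the complete cases, then average the fitted values over all subjects
    (both missing and non-missing responses are replaced by predictions)."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.linalg.lstsq(Z[observed], y[observed], rcond=None)[0]
    return (Z @ beta).mean()

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.standard_t(df=3, size=n)   # non-normal errors
observed = rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))     # MAR: depends on X only
print(full_imputation_mean(X, y, observed))   # close to the true mean response, 1.0
print(y[observed].mean())                     # listwise deletion, biased under MAR
```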

8.
Missing data often complicate the analysis of scientific data. Multiple imputation is a general-purpose technique for the analysis of datasets with missing values. The approach is applicable to a variety of missing data patterns but is often complicated by restrictions such as the type of variables to be imputed and the mechanism underlying the missing data. In this paper, the authors compare the performance of two multiple imputation methods, namely fully conditional specification and multivariate normal imputation, in the presence of ordinal outcomes with monotone missing data patterns. Through a simulation study and an empirical example, the authors show that the two methods are indeed comparable, meaning that either may be used when faced with scenarios at least similar to the ones presented here.
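For orientation, here is a very small sketch of the fully conditional specification (chained equations) idea, restricted to continuous variables and a normal linear conditional model. It is not the implementation or the ordinal-outcome setting studied in the paper; the function, data and number of imputations are assumptions.

```python
import numpy as np

def fcs_impute(data, n_iter=10, seed=0):
    """Tiny chained-equations sketch for continuous variables only: cycle
    through the incomplete columns, regress each on all the others, and redraw
    its missing entries from the fitted normal linear model."""
    rng = np.random.default_rng(seed)
    X = np.array(data, dtype=float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):                       # crude starting values
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            mis = miss[:, j]
            if not mis.any():
                continue
            Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, res, *_ = np.linalg.lstsq(Z[~mis], X[~mis, j], rcond=None)
            sigma = np.sqrt(res[0] / (~mis).sum()) if res.size else 0.0
            X[mis, j] = Z[mis] @ beta + rng.normal(scale=sigma, size=mis.sum())
    return X

# Multiple imputation = repeat with different seeds and pool the analyses.
rng = np.random.default_rng(9)
full = rng.multivariate_normal([0, 0, 0], [[1, .5, .3], [.5, 1, .4], [.3, .4, 1]], 300)
obs = full.copy()
obs[rng.random(300) < 0.3, 2] = np.nan                     # ~30% missing in column 3
completed = [fcs_impute(obs, seed=s) for s in range(5)]    # five imputed datasets
print(np.mean([c[:, 2].mean() for c in completed]))        # pooled estimate of the mean
```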

9.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can in principle approximate any distribution, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered a robust alternative. Missing data are inevitable in many situations, and parameter estimates can be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

10.
The estimation of mixtures of regression models is usually based on the normality assumption of the components, and maximum likelihood estimation of the normal components is sensitive to noise, outliers, or high-leverage points. Missing values are inevitable in many situations, and parameter estimates can be biased if the missing values are not handled properly. In this article, we propose mixtures of regression models for contaminated, incomplete, heterogeneous data. The proposed models provide robust estimates of regression coefficients varying across latent subgroups even in the presence of missing values. The methodology is illustrated through simulation studies and a real data analysis.

11.
A normal quantile-quantile (QQ) plot is an important diagnostic for checking the assumption of normality. Though useful, these plots confuse students in my introductory statistics classes. A water-filling analogy, however, intuitively conveys the underlying concept. This analogy characterizes a QQ plot as a parametric plot of the water levels in two gradually filling vases. Each vase takes its shape from a probability distribution or sample. If the vases share a common shape, then the water levels match throughout the filling, and the QQ plot traces a diagonal line. The R package qqvases provides an interactive animation of this process and is suitable for classroom use.
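Independently of the qqvases animation, the static construction that the analogy explains can be sketched as follows; the (k − 0.5)/n plotting positions are one common convention and are an assumption here.

```python
import numpy as np
from scipy import stats

def normal_qq_coords(sample):
    """Coordinates of a normal QQ plot: the k-th smallest observation is paired
    with the standard normal quantile at probability (k - 0.5) / n."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n
    return stats.norm.ppf(p), x   # plot sample quantiles against theoretical ones

rng = np.random.default_rng(3)
theo, samp = normal_qq_coords(rng.normal(loc=5, scale=2, size=200))
# For a normal sample the points fall near the line  samp ≈ 5 + 2 * theo.
print(np.polyfit(theo, samp, 1))   # slope ~ 2 (scale), intercept ~ 5 (location)
```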

12.
Data consisting of ranks within blocks are considered for randomized block designs when there are missing values. Tied ranks are possible. Such data can be analysed using the Skillings–Mack test. Here we suggest a new approach based on carrying out an ANOVA on the ranks using the general linear model platform available in many statistical packages. Such a platform allows an ANOVA to be calculated when there are missing values. Indicative sizes and powers show that the ANOVA approach performs better than the Skillings–Mack test.
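One way to realize this ranks-plus-general-linear-model idea is sketched below using statsmodels, whose formula interface drops rows with missing values by default; the column names, simulated layout and missingness pattern are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical unreplicated randomized block layout with a few missing cells.
rng = np.random.default_rng(4)
blocks, treats = 8, 4
df = pd.DataFrame({
    "block": np.repeat(np.arange(blocks), treats),
    "treat": np.tile(np.arange(treats), blocks),
})
df["y"] = rng.normal(size=len(df)) + 0.8 * df["treat"]
df.loc[rng.choice(len(df), size=3, replace=False), "y"] = np.nan   # missing values

# Rank within each block (ties get average ranks, missing values stay missing) ...
df["r"] = df.groupby("block")["y"].rank(method="average")
# ... then run a two-way ANOVA on the ranks; rows with NaN are dropped by default.
fit = smf.ols("r ~ C(treat) + C(block)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))   # the F-test for C(treat) is the test of interest
```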

13.
Graphical sensitivity analyses have recently been recommended for clinical trials with non-ignorable missing outcomes. We demonstrate an adaptation of this methodology for a continuous outcome in a trial of three cognitive-behavioural therapies for mild depression in primary care, in which one arm had unexpectedly high levels of missing data. Fixed-value imputation and multiple imputation from a normal distribution (assuming either a varying mean and fixed standard deviation, or a fixed mean and varying standard deviation) were used to obtain contour plots of the contrast estimates with their P-values superimposed, their confidence intervals, and the root mean square errors. Imputation was based either on the outcome value alone or on change from baseline. The plots showed fixed-value imputation to be more sensitive than imputing from a normal distribution, but the normally distributed imputations were subject to sampling noise. The contours of the sensitivity plots were close to linear in appearance, with slope approximately equal to the ratio of the proportions of subjects with missing data in each trial arm.

14.
In this paper we study a cure rate survival model involving a competing risks structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing-at-random situation, so that the missingness of the covariates may depend only on the observed ones. Parameter estimates are obtained using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of estimates with and without missing data. As expected, the estimation approach that takes the missing covariates into consideration is much more efficient in terms of mean squared error than the complete-case analysis. Effects of an increasing cured fraction and of censored observations are also reported. We demonstrate the proposed methodology with two real data sets: one involving the length of time to obtain a BS degree in Statistics, and the other the time to breast cancer recurrence.

15.
We analyse longitudinal data on CD4 cell counts from patients who participated in clinical trials that compared two therapeutic treatments: zidovudine and didanosine. The investigators were interested in modelling the CD4 cell count as a function of treatment, age at baseline and disease stage at baseline. Serious concerns can be raised about the normality assumption of CD4 cell counts that is implicit in many methods, and therefore an analysis may have to start with a transformation. Instead of assuming that we know the transformation (e.g. logarithmic) that makes the outcome normal and linearly related to the covariates, we estimate the transformation, by using maximum likelihood, within the Box–Cox family. There has been considerable work on the Box–Cox transformation for univariate regression models. Here, we discuss the Box–Cox transformation for longitudinal regression models when the outcome can be missing over time, and we also implement a maximization method for the likelihood, assuming that the missing data are missing at random.
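The longitudinal, missing-data version of the Box–Cox analysis is beyond a short sketch, but the univariate maximum-likelihood choice of the transformation parameter, which that analysis builds on, looks like this; the data are simulated and purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Hypothetical positive, right-skewed outcome (e.g. cell counts).
counts = rng.lognormal(mean=5.0, sigma=0.6, size=300)

# Maximum-likelihood choice of the Box-Cox exponent for a univariate sample.
transformed, lam = stats.boxcox(counts)   # lam maximises the profile likelihood
print(round(lam, 2))                      # near 0 here, i.e. close to a log transform

# Profile log-likelihood over a grid, useful for inspecting the likelihood surface.
grid = np.linspace(-1, 1, 41)
llf = [stats.boxcox_llf(l, counts) for l in grid]
print(grid[int(np.argmax(llf))])
```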

16.
In this paper, we propose to monitor a Markov chain sampler using the cusum path plot of a chosen one-dimensional summary statistic. We argue that the cusum path plot can bring out, more effectively than the sequential plot, those aspects of a Markov sampler which tell the user how quickly or slowly the sampler is moving around in its sample space in the direction of the summary statistic. The proposal is then illustrated in four examples which represent situations where the cusum path plot works well and where it does not. Moreover, a rigorous analysis is given for one of the examples. We conclude that the cusum path plot is an effective tool for convergence diagnostics of a Markov sampler and for comparing different Markov samplers.
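A minimal sketch of the cusum path statistic, with a toy comparison of a slowly mixing and a rapidly mixing chain, is given below; the interpretation in the comments paraphrases the abstract rather than the authors' formal analysis.

```python
import numpy as np

def cusum_path(summary_draws):
    """Cusum path of a one-dimensional summary statistic from a Markov chain
    sampler: cumulative sums of deviations from the overall mean. A smooth,
    slowly wandering path suggests slow mixing in that direction; a hairy,
    rapidly oscillating path suggests fast mixing."""
    x = np.asarray(summary_draws, dtype=float)
    return np.cumsum(x - x.mean())   # plot this against the iteration index

# Toy comparison: an AR(1) chain with high autocorrelation versus i.i.d. draws.
rng = np.random.default_rng(6)
slow = np.zeros(2000)
for t in range(1, 2000):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
fast = rng.normal(size=2000)
print(np.abs(cusum_path(slow)).max(), np.abs(cusum_path(fast)).max())   # slow >> fast
```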

17.
Complete and partial diallel cross designs are examined with respect to their construction and robustness against the loss of a block of observations. A simple generalized inverse is found for the information matrix of the line effects, which allows evaluation of expressions for the variances of the line-effect differences with and without the missing block. A-efficiencies, based on average variances of the elementary contrasts of the line effects, suggest that these designs are fairly robust. The loss of efficiency is generally less than 10%, but it is shown that specific comparisons might suffer a loss of efficiency of as much as 40%.

18.
The emphasis in the literature is on normalizing transformations, despite the greater importance of homogeneity of variance in analysis. A strategy for the choice of a variance-stabilizing transformation is suggested. The relevant component of variation must be identified and, when this is not within-subject variation, a major explanatory variable must also be selected to subdivide the data. A plot of group standard deviation against group mean, or of log standard deviation against log mean, may identify a simple power transformation or a shifted log transformation. In other cases, within the shifted Box-Cox family of transformations, a contour plot showing the region of minimum heterogeneity, defined by an appropriate index, is proposed to enable an informed choice of transformation. If used in conjunction with the maximum-likelihood contour plot for the normalizing transformation, it is possible to assess whether or not there exists a transformation that satisfies both criteria.
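A hedged sketch of the log standard deviation versus log mean diagnostic follows; the rule of thumb that a slope b suggests the power transform y^(1−b) is a standard heuristic and an assumption here, not a result quoted from the paper.

```python
import numpy as np

def suggest_power_transformation(values, groups):
    """Regress log(group SD) on log(group mean); if SD is roughly proportional
    to mean**b, the power transform y -> y**(1 - b) (or log y when b is near 1)
    approximately stabilises the variance."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    means, sds = [], []
    for g in np.unique(groups):
        v = values[groups == g]
        means.append(v.mean())
        sds.append(v.std(ddof=1))
    b = np.polyfit(np.log(means), np.log(sds), 1)[0]
    return 1.0 - b   # suggested exponent; a value near 0 suggests a log transform

rng = np.random.default_rng(7)
groups = np.repeat(np.arange(10), 30)
mu = np.exp(rng.normal(size=10))[groups]
y = rng.normal(loc=mu, scale=mu, size=len(groups))   # SD proportional to mean (b = 1)
print(round(suggest_power_transformation(y, groups), 2))   # near 0 -> log transform
```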

19.
Missing data in clinical trials are inevitable. We highlight the ICH guidelines and the CPMP points to consider on missing data. Specifically, we outline how missing data issues should be considered when designing, planning and conducting studies so as to minimize their impact. We also go beyond the coverage of the above two documents, providing a more detailed review of the basic concepts of missing data and frequently used terminology, examples of typical missing data mechanisms, and a discussion of technical details and literature for several frequently used statistical methods and associated software. Finally, we provide a case study in which the principles outlined in this paper are applied to one clinical program at the protocol design, data analysis plan and other stages of a clinical trial.

20.
We consider the problem of full information maximum likelihood (FIML) estimation in factor analysis when a majority of the data values are missing. The expectation–maximization (EM) algorithm is often used to find the FIML estimates, in which the missing values on manifest variables are included in the complete data. However, the ordinary EM algorithm has an extremely high computational cost. In this paper, we propose a new algorithm that is based on the EM algorithm but that efficiently computes the FIML estimates. A significant improvement in computational speed is realized by not treating the missing values on manifest variables as part of the complete data. When there are many missing values, it is not clear whether the FIML procedure can achieve good estimation accuracy. In order to investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.
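The quantity that FIML maximizes, the observed-data log-likelihood accumulated over missingness patterns, can be sketched as follows; the EM acceleration proposed in the paper is not shown, and the factor-like covariance structure in the toy example is an assumption.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fiml_loglik(Y, mu, Sigma):
    """Full-information ML log-likelihood for multivariate normal data with
    missing values: each case contributes the density of its observed
    subvector under the corresponding sub-mean and sub-covariance."""
    Y = np.asarray(Y, dtype=float)
    total = 0.0
    for row in Y:
        obs = ~np.isnan(row)
        if not obs.any():
            continue   # a fully missing row contributes nothing
        total += multivariate_normal.logpdf(
            row[obs], mean=mu[obs], cov=Sigma[np.ix_(obs, obs)]
        )
    return total

# Toy check against a small simulated two-factor-like covariance.
rng = np.random.default_rng(8)
p, n = 5, 400
L = rng.normal(size=(p, 2))
Sigma = L @ L.T + np.eye(p)
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y[rng.random(Y.shape) < 0.6] = np.nan   # a majority of the values are missing
print(fiml_loglik(Y, np.zeros(p), Sigma))
```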

