Similar literature
20 similar documents found
1.
Inequality-restricted hypothesis testing methods, which include multivariate one-sided tests, are useful in practice, especially in multiple comparison problems. Multivariate and longitudinal data often contain missing values, since it may be difficult to observe all values for each variable. However, although missing values are common in multivariate data, statistical methods for multivariate one-sided tests with missing values are quite limited. In this article, motivated by a dataset in a recent collaborative project, we develop two likelihood-based methods for multivariate one-sided tests with missing values, where the missing data patterns can be arbitrary and the missing data mechanisms may be non-ignorable. Although non-ignorable missingness is not testable based on observed data, statistical methods addressing this issue can be used for sensitivity analysis and might lead to more reliable results, since ignoring informative missingness may lead to biased analysis. We analyse the real dataset in detail under various possible missing data mechanisms and report interesting findings that were previously unavailable. We also derive some asymptotic results and evaluate our new tests using simulations.

2.
The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed to analyze incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally-distributed repeated measures data using linear mixed models, while Park and Davis (1992) derived the test for incomplete repeated categorical data in the framework of Grizzle, Starmer, and Koch (1969). The proposed procedure can be applied easily to any other multivariate general linear model that allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).

3.
Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the non-parametric multivariate Kruskal–Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete case analyses.

4.
There are various techniques for dealing with incomplete data; some are computationally highly intensive and others are not as computationally intensive, while all may be comparable in their efficiencies. In spite of these developments, analysis using only the complete data subset is performed when using popular statistical software. In an attempt to demonstrate the efficiencies and advantages of using all available data, we compared several approaches that are relatively simple but efficient alternatives to those using the complete data subset for analyzing repeated measures data with missing values, under the assumption of a multivariate normal distribution of the data. We also assumed that the missing values occur in a monotonic pattern and completely at random. The incomplete data procedure is demonstrated to be more powerful than the procedure of using the complete data subset, generally when the within-subject correlation gets large. One other principal finding is that even with small sample data, for which various covariance models may be indistinguishable, the empirical size and power are shown to be sensitive to misspecified assumptions about the covariance structure. Overall, the testing procedures that do not assume any particular covariance structure are shown to be more robust in keeping the empirical size at the nominal level than those assuming a special structure.

5.
Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because the MNAR mechanism is not verifiable based on the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data. Discrepancy in the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap-type method, which relies on heuristic input from the researcher, to test for the discrepancy of the parameter estimates. Instead of using the bootstrap, the current article obtains confidence intervals for parameter differences on two samples based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids likely convergence problems with bootstrap methods. It does not require heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test missing at random in addition to MCAR.
An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies are performed as well, using two methods of missing data generation, which show promise for the proposed sensitivity method. One method of missing data generation is also new and interesting in its own right.

6.
Crossover designs are often used in clinical trials. It is not uncommon for subjects to discontinue before completing all treatment periods in a crossover study. Despite the availability of statistical methodologies utilizing all available data and of software for obtaining valid inferences under the assumption of missing at random (MAR), naïve approaches such as the complete case (CC) analysis, which is valid only under the strong assumption of missing completely at random, are still widely used in practice. In this article, we obtain the analytical form of the estimation bias of treatment effects with CC for linear mixed models. We use simulation studies to examine the inflation of Type I error and efficiency loss in the inferences with CC under MAR. Invalidity and inefficiency of two other commonly used approaches for defining analyzed data in the presence of missing data, namely including data from at least two periods in a three-period crossover and using available cases for a specific comparison of interest, are also demonstrated through simulation studies.

7.
We propose a new regression-based filter for extracting signals online from multivariate high frequency time series. It separates relevant signals of several variables from noise and (multivariate) outliers.

Unlike parallel univariate filters, the new procedure takes into account the local covariance structure between the single time series components. It is based on high-breakdown estimates, which makes it robust against (patches of) outliers in one or several of the components as well as against outliers with respect to the multivariate covariance structure. Moreover, the trade-off problem between bias and variance for the optimal choice of the window width is approached by choosing the size of the window adaptively, depending on the current data situation.

Furthermore, we present an advanced algorithm of our filtering procedure that includes the replacement of missing observations in real time. Thus, the new procedure can be applied in online-monitoring practice. Applications to physiological time series from intensive care show the practical effect of the proposed filtering technique.

8.
Although Fan showed that the mixed-effects model for repeated measures (MMRM) is appropriate to analyze complete longitudinal binary data in terms of the rate difference, they focused on using the generalized estimating equations (GEE) to make statistical inference. The current article emphasizes the validity of the MMRM when the normal-distribution-based pseudo likelihood approach is used to make inference for complete longitudinal binary data. For incomplete longitudinal binary data under a missing at random mechanism, however, the MMRM, using either the GEE or the normal-distribution-based pseudo likelihood inferential procedure, gives biased results in general and should not be used for analysis.

9.
A MATLAB package, testing for multivariate normality (TMVN), is implemented as an interactive and graphical tool to examine multivariate normality (MVN). Monte Carlo simulation studies have failed to find a uniformly most powerful MVN test, which requires a rather extensive statistical inference procedure. TMVN contains several competitive MVN tests and provides a flexible and extensive testing environment for univariate or multivariate data analyses. Simulated results provide information on which test may possess more power for the selected non-MVN alternatives. Fisher's Iris data are used to show how TMVN can be used in practice.

10.
This paper proposes two asymptotic expansions relating to discrimination based on two-step monotone missing samples. These asymptotic expansions were obtained by Okamoto (1963) and McLachlan (1973) for complete data under multivariate normality. This paper extends both results, up to terms of the first order, to the case of two-step monotone missing samples. In particular, these asymptotic expansions play an important role in obtaining asymptotic approximations for the probabilities of misclassification in discriminant analysis. Simulation studies have also been conducted to evaluate the accuracy of the approximations derived in this paper.

11.
Bayesian palaeoclimate reconstruction
Summary.  We consider the problem of reconstructing prehistoric climates by using fossil data that have been extracted from lake sediment cores. Such reconstructions promise to provide one of the few ways to validate modern models of climate change. A hierarchical Bayesian modelling approach is presented and its use, inversely, is demonstrated in a relatively small but statistically challenging exercise: the reconstruction of prehistoric climate at Glendalough in Ireland from fossil pollen. This computationally intensive method extends current approaches by explicitly modelling uncertainty and reconstructing entire climate histories. The statistical issues that are raised relate to the use of compositional data (pollen) with covariates (climate) which are available at many modern sites but are missing for the fossil data. The compositional data arise as mixtures and the missing covariates have a temporal structure. Novel aspects of the analysis include a spatial process model for compositional data, local modelling of lattice data, the use, as a prior, of a random walk with long-tailed increments, a two-stage implementation of the Markov chain Monte Carlo approach and a fast approximate procedure for cross-validation in inverse problems. We present some details, contrasting its reconstructions with those which have been generated by a method in use in the palaeoclimatology literature. We suggest that the method provides a basis for resolving important challenging issues in palaeoclimate research. We draw attention to several challenging statistical issues that need to be overcome.

12.
A set of longitudinal binary, partially incomplete, data on obesity among children in the USA is reanalysed. The multivariate Bernoulli distribution is parameterized by the univariate marginal probabilities and dependence ratios of all orders, which together support maximum likelihood inference. The temporal association of obesity is strong and complex but stationary. We fit a saturated model for the distribution of response patterns and find that non-response is missing completely at random for boys but that the probability of obesity is consistently higher among girls who provided incomplete records than among girls who provided complete records. We discuss the statistical and substantive features of, respectively, pattern mixture and selection models for this data set.

13.
Methods for linear regression with multivariate response variables are well described in statistical literature. In this study we conduct a theoretical evaluation of the expected squared prediction error in bivariate linear regression where one of the response variables contains missing data. We make the assumption of known covariance structure for the error terms. On this basis, we evaluate three well-known estimators: standard ordinary least squares, generalized least squares, and a James–Stein inspired estimator. Theoretical risk functions are worked out for all three estimators to evaluate under which circumstances it is advantageous to take the error covariance structure into account.

14.
Summary.  We consider three sorts of diagnostics for random imputations: displays of the completed data, which are intended to reveal unusual patterns that might suggest problems with the imputations, comparisons of the distributions of observed and imputed data values and checks of the fit of observed data to the model that is used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation, which is an iterative procedure in which the missing values of each variable are randomly imputed conditionally on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 environmental sustainability index, which is a linear aggregation of 64 environmental variables on 142 countries.
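
The iterative scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes only two variables, a simple linear regression for each conditional model, and residual noise drawn from a normal distribution with the fitted residual standard deviation (a full procedure would also draw the regression parameters). The names `fit_line` and `sequential_impute` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sd)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sd = (rss / max(n - 2, 1)) ** 0.5
    return a, b, sd


def sequential_impute(data, n_iter=10, seed=0):
    """Impute None entries in a list of [v0, v1] rows (hypothetical helper).

    Each pass regresses one variable on the other, using the currently
    completed data as predictor, and redraws that variable's missing values.
    """
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in data]
    completed = [list(row) for row in data]

    # Starting values: fill each gap with the observed mean of its variable.
    for j in range(2):
        observed = [row[j] for row in data if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for row in completed:
            if row[j] is None:
                row[j] = mean_j

    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # the other variable acts as predictor
            rows_obs = [i for i in range(len(data)) if not missing[i][j]]
            xs = [completed[i][k] for i in rows_obs]
            ys = [completed[i][j] for i in rows_obs]
            a, b, sd = fit_line(xs, ys)
            for i in range(len(data)):
                if missing[i][j]:
                    # Random imputation: prediction plus a residual draw.
                    completed[i][j] = a + b * completed[i][k] + rng.gauss(0.0, sd)
    return completed
```

Observed entries are never overwritten, so diagnostics of the kind the abstract lists (for example, comparing observed and imputed values) can be computed from the returned matrix together with the original missingness pattern.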

15.
In statistical models involving constrained or missing data, likelihoods containing integrals emerge. In the case of both constrained and missing data, the result is a ratio of integrals, which for multivariate data may defy exact or approximate analytic expression. Seeking maximum-likelihood estimates in such settings, we propose Monte Carlo approximants for these integrals, and subsequently maximize the resulting approximate likelihood. Iteration of this strategy expedites the maximization, while the Gibbs sampler is useful for the required Monte Carlo generation. As a result, we handle a class of models broader than the customary EM setting without using an EM-type algorithm. Implementation of the methodology is illustrated in two numerical examples.

16.
Multiple imputation (MI) is an increasingly popular method for analysing incomplete multivariate data sets. One of the most crucial assumptions of this method relates to the mechanism leading to missing data. Distinctness is typically assumed, which indicates a complete independence of the mechanisms underlying missingness and data generation. In addition, missing at random or missing completely at random is assumed, which explicitly states under which conditions missingness is independent of observed data. Despite common use of MI under these assumptions, plausibility and sensitivity to these fundamental assumptions have not been well investigated. In this work, we investigate the impact of non-distinctness and non-ignorability. In particular, non-ignorability is due to unobservable cluster-specific effects (e.g. random effects). Through a comprehensive simulation study, we show that, for MI inferences, non-ignorability due to non-distinctness does not immediately imply dismal performance, while non-ignorability due to missing not at random leads to quite subpar performance.

17.
Non-response (or missing data) is often encountered in large-scale surveys. To enable the behavioural analysis of these data sets, statistical treatments are commonly applied to complete or remove these data. However, the correctness of such procedures critically depends on the nature of the underlying missingness generation process. Clearly, the efficacy of applying either case deletion or imputation procedures rests on the unknown missingness generation mechanism. The contribution of this paper is twofold. The study is the first to propose a simple sequential method to attempt to identify the form of missingness. Second, the effectiveness of the tests is assessed by generating (experimentally) nine missing data sets by imposed MCAR, MAR and NMAR processes, with data removed.
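
To make the three mechanisms named in this abstract concrete, the following is a small sketch of how MCAR, MAR and NMAR missingness might be imposed on simulated data. It is an illustration under stated assumptions, not the paper's experimental design: the two-variable model, the logistic removal probabilities and the function names (`make_data`, `impose_missing`, `observed_mean_y`) are all hypothetical.

```python
import math
import random


def make_data(n, seed=0):
    """Simulate (x, y) pairs with y correlated with x (assumed toy model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + rng.gauss(0.0, 0.6)
        rows.append((x, y))
    return rows


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def impose_missing(rows, mechanism, seed=1):
    """Return a copy of rows with some y values replaced by None."""
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = 0.3                        # constant: unrelated to any data
        elif mechanism == "MAR":
            p = logistic(-1.0 + 1.5 * x)   # depends only on the observed x
        elif mechanism == "NMAR":
            p = logistic(-1.0 + 1.5 * y)   # depends on the value being removed
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        out.append((x, None if rng.random() < p else y))
    return out


def observed_mean_y(rows):
    """Mean of y over the rows where y was not removed."""
    ys = [y for _, y in rows if y is not None]
    return sum(ys) / len(ys)
```

In this toy setup, the complete-case mean of `y` stays close to the full-data mean under MCAR but shifts downward under both MAR and NMAR, because larger values are removed more often. That shift is why case deletion is only safe when the mechanism is MCAR.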

18.
A longitudinal study commonly follows a set of variables, measured for each individual repeatedly over time, and usually suffers from incomplete data. A common approach for dealing with longitudinal categorical responses is to use the Generalized Linear Mixed Model (GLMM). This model induces the potential relation between response variables over time via a vector of random effects, assumed to be shared parameters in the non-ignorable missing mechanism. Most GLMMs assume that the random-effects parameters follow a normal or symmetric distribution, and this leads to serious problems in real applications. In this paper, we propose GLMMs for the analysis of incomplete multivariate longitudinal categorical responses with a non-ignorable missing mechanism based on a shared parameter framework with the less restrictive assumption of skew-normality for the random effects. These models may contain incomplete data with monotone and non-monotone missing patterns. The performance of the model is evaluated using simulation studies, and a well-known longitudinal data set extracted from a fluvoxamine trial is analyzed to determine the profile of fluvoxamine in ambulatory clinical psychiatric practice.

19.
Traditional factor analysis (FA) rests on the assumption of multivariate normality. However, in some practical situations, the data do not meet this assumption; thus, the statistical inference made from such data may be misleading. This paper aims at providing some new tools for the skew-normal (SN) FA model when missing values occur in the data. In such a model, the latent factors are assumed to follow a restricted version of multivariate SN distribution with additional shape parameters for accommodating skewness. We develop an analytically feasible expectation conditional maximization algorithm for carrying out parameter estimation and imputation of missing values under missing at random mechanisms. The practical utility of the proposed methodology is illustrated with two real data examples and the results are compared with those obtained from the traditional FA counterparts.

20.
Summary.  We apply multivariate shrinkage to estimate local area rates of unemployment and economic inactivity by using UK Labour Force Survey data. The method exploits the similarity of the rates of claiming unemployment benefit and the unemployment rates as defined by the International Labour Organisation. This is done without any distributional assumptions, merely relying on the high correlation of the two rates. The estimation is integrated with a multiple-imputation procedure for missing employment status of subjects in the database (item non-response). The hot deck method that is used in the imputations is adapted to reflect the uncertainty in the model for non-response. The method is motivated as a development (improvement) of the current operational procedure in which the imputed value is a non-stochastic function of the data. An extension of the procedure to subjects who are absent from the database (unit non-response) is proposed.
