Similar documents
 20 similar documents found
1.
A model developed by Andrich for ordered categorical data is extended to develop tests for treatment effects with paired or matched samples. In particular, this includes analysis for pre-post studies and crossover designs. Advantages of this model are that it allows for misclassification of subjects, yields reasonable conditional requirements for exact analysis, admits a normal approximation that is good for all but the smallest sample sizes, and is relatively simple mathematically. Furthermore, the tests derived are logical extensions of tests for unordered categories.

2.
The area under the receiver operating characteristic (ROC) curve (AUC) is broadly accepted and often used as a diagnostic accuracy index. Moreover, the equality of the predictive capacity of two or more diagnostic systems is frequently checked by comparing their respective AUCs. In paired designs, this comparison is usually performed using only the subjects for whom all the necessary information has been collected, in the so-called available-case analysis. On the other hand, the presence of missing data is a frequent problem, especially in retrospective and observational studies. The loss of statistical power and the inefficient use of the available information (with the resulting ethical implications) are the main consequences. In this paper a non-parametric method is developed to exploit all available information. In order to approximate the distribution of the proposed statistic, the asymptotic distribution is computed and two different resampling plans are studied. In addition, the methodology is applied to a real-world medical problem. Finally, some technical issues are reported in the Appendix.
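For reference, a minimal sketch in Python of the standard nonparametric (Mann-Whitney) AUC estimator that such comparisons build on, run on illustrative complete data; the paper's estimator for exploiting partially observed subjects is not reproduced here.

```python
import numpy as np

def auc_mann_whitney(neg, pos):
    """Nonparametric AUC: P(pos score > neg score) + 0.5 * P(tie)."""
    neg, pos = np.asarray(neg, float), np.asarray(pos, float)
    diff = pos[:, None] - neg[None, :]          # all (pos, neg) pairs
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, 80)   # marker values, non-diseased subjects
pos = rng.normal(1.0, 1.0, 60)   # marker values, diseased subjects
print(auc_mann_whitney(neg, pos))
```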

3.
Kang (2006) and Kang and Larsen (in press) used the log likelihood function with Lagrangian multipliers for estimation of cell probabilities in two-way incomplete contingency tables. This paper extends the results and simulations to three-way and multi-way tables. Numerous studies cross-classify subjects by three or more categorical factors. Constraints on the cell probabilities are incorporated through Lagrangian multipliers. Variances of the MLEs are derived from the matrix of second derivatives of the log likelihood with respect to the cell probabilities and the Lagrange multiplier. Wald and likelihood ratio tests of independence are derived using the estimates and estimated variances. In the simulation results of Kang and Larsen (in press), for data missing at random, maximum likelihood estimation (MLE) produced more efficient estimates of population proportions than either multiple imputation (MI) based on data augmentation or complete case (CC) analysis. Neither MLE nor MI, however, leads to an improvement over CC analysis with respect to the power of tests for independence in two-way tables. Results are extended to multidimensional tables with arbitrary patterns of missing data when the variables are recorded on individual subjects. In three-way and higher-way tables, however, partially classified information carries information relevant for judging independence, as long as two or more variables are jointly observed. The simulations study three-dimensional tables with three patterns of association and two levels of missing information.
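For orientation, a minimal sketch, assuming data missing at random in a two-way table with supplemental margins, of the classical EM allocation that converges to the same constrained MLE; the paper instead maximizes the observed-data log likelihood directly with Lagrangian multipliers, and the counts below are hypothetical.

```python
import numpy as np

# Fully classified 2x2 counts plus supplemental margins: some subjects were
# observed only on the row variable, others only on the column variable.
full = np.array([[30., 10.], [20., 40.]])
row_only = np.array([15., 25.])   # column variable missing
col_only = np.array([12., 18.])   # row variable missing

p = np.full((2, 2), 0.25)         # starting values for cell probabilities
for _ in range(200):
    # E-step: allocate partially classified counts to cells in proportion
    # to the current conditional cell probabilities.
    exp_row = row_only[:, None] * p / p.sum(axis=1, keepdims=True)
    exp_col = col_only[None, :] * p / p.sum(axis=0, keepdims=True)
    n_hat = full + exp_row + exp_col
    # M-step: multinomial MLE of the cell probabilities.
    p = n_hat / n_hat.sum()
print(p)
```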

4.
The generalized method of moments (GMM) and empirical likelihood (EL) are popular methods for combining sample and auxiliary information. These methods are used in very diverse fields of research, where competing theories often suggest variables satisfying different moment conditions. Results in the literature have shown that the efficient-GMM (GMME) and maximum empirical likelihood (MEL) estimators have the same asymptotic distribution to order n^(-1/2) and that both estimators are asymptotically semiparametric efficient. In this paper, we demonstrate that when data are missing at random from the sample, the utilization of some well-known missing-data handling approaches proposed in the literature can yield GMME and MEL estimators with nonidentical properties; in particular, it is shown that the GMME estimator is semiparametric efficient under all the missing-data handling approaches considered but that the MEL estimator is not always efficient. A thorough examination of the reason for the nonequivalence of the two estimators is presented. A particularly strong feature of our analysis is that we do not assume smoothness in the underlying moment conditions. Our results are thus relevant to situations involving nonsmooth estimating functions, including quantile and rank regressions, robust estimation, the estimation of receiver operating characteristic (ROC) curves, and so on.

5.
Inequality-restricted hypothesis testing methods, including multivariate one-sided testing methods, are useful in practice, especially in multiple comparison problems. In practice, multivariate and longitudinal data often contain missing values, since it may be difficult to observe all values for each variable. However, although missing values are common for multivariate data, statistical methods for multivariate one-sided tests with missing values are quite limited. In this article, motivated by a dataset in a recent collaborative project, we develop two likelihood-based methods for multivariate one-sided tests with missing values, where the missing data patterns can be arbitrary and the missing data mechanisms may be non-ignorable. Although non-ignorable missing data are not testable based on observed data, statistical methods addressing this issue can be used for sensitivity analysis and might lead to more reliable results, since ignoring informative missingness may lead to biased analysis. We analyse the real dataset in detail under various possible missing data mechanisms and report interesting findings that were previously unavailable. We also derive some asymptotic results and evaluate our new tests using simulations.

6.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables while taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered a robust alternative approach. Missing data are inevitable in many situations, and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

7.
Tukey proposed a class of distributions, the g-and-h family (gh family), based on a transformation of a standard normal variable, to accommodate different skewness and elongation in the distribution of variables arising in practical applications. It is easy to draw values from this distribution even though it is hard to explicitly state the probability density function. Given this flexibility, the gh family may be extremely useful in creating multiple imputations for missing data. This article demonstrates how this family, as well as its generalizations, can be used in the multiple imputation analysis of incomplete data. The focus of this article is on a scalar variable with missing values. In the absence of any additional information, data are missing completely at random, and hence the correct analysis is the complete-case analysis. Thus, the application of gh multiple imputation to the scalar case affords comparison with the correct analysis and with other model-based multiple imputation methods. Comparisons are made using simulated datasets and data from a survey of adolescents ascertaining driving after drinking alcohol.
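A minimal sketch of why drawing from the gh family is easy, using the standard transformation Y = a + b * ((exp(gZ) - 1)/g) * exp(h Z^2 / 2) of a standard normal Z (with the g = 0 limit handled separately); the parameter values below are illustrative.

```python
import numpy as np

def rgh(n, a=0.0, b=1.0, g=0.5, h=0.1, seed=None):
    """Draw n values from Tukey's g-and-h distribution by transforming
    standard normal draws; g controls skewness, h controls tail heaviness."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    core = np.expm1(g * z) / g if g != 0 else z   # (exp(gZ) - 1)/g
    return a + b * core * np.exp(h * z**2 / 2)

x = rgh(10_000, g=0.5, h=0.1, seed=0)
print(x.mean(), np.percentile(x, [5, 50, 95]))
```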

8.
In this paper, we investigate different procedures for testing the equality of two mean survival times in paired lifetime studies. We consider Owen's M-test and Q-test, a likelihood ratio test, the paired t-test, the Wilcoxon signed rank test and a permutation test based on log-transformed survival times. For the sake of comparison, we also consider the paired t-test, the Wilcoxon signed rank test and a permutation test based on the original survival times. The size and power characteristics of these tests are studied by means of Monte Carlo simulations under a frailty Weibull model. For less skewed marginal distributions, the Wilcoxon signed rank test based on the original survival times is found to be desirable. Otherwise, the M-test and the likelihood ratio test are the best choices in terms of power. In general, one can choose a test procedure based on information about the correlation between the two survival times and the skewness of the marginal survival distributions.
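A minimal sketch, with SciPy and a rough gamma-frailty Weibull simulation (not the paper's exact model), of three of the compared procedures: the paired t-test and Wilcoxon signed rank test on log survival times, and the signed rank test on the original times.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
frailty = rng.gamma(2.0, 0.5, 50)              # shared per-pair frailty
t1 = rng.weibull(1.5, 50) / frailty            # arm 1 survival times
t2 = rng.weibull(1.5, 50) / (1.3 * frailty)    # arm 2, systematically shorter

print(stats.ttest_rel(np.log(t1), np.log(t2)))    # paired t-test, log scale
print(stats.wilcoxon(np.log(t1) - np.log(t2)))    # signed rank, log scale
print(stats.wilcoxon(t1 - t2))                    # signed rank, original scale
```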

9.
k-POD: A Method for k-Means Clustering of Missing Data
The k-means algorithm is often used in clustering applications, but it requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the problem to a complete-data formulation through either deletion or imputation, but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.

[Received November 2014. Revised August 2015.]
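A minimal sketch of the k-POD idea, assuming NumPy and scikit-learn: alternate between running k-means on a completed data matrix and refilling only the missing cells from the assigned cluster centroids (a majorization-minimization scheme); convergence checks are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def k_pod(X, k, n_iter=25, seed=0):
    """k-POD sketch: k-means on a completed matrix, with missing cells
    refilled from the assigned cluster centroids at each iteration."""
    X = np.asarray(X, float)
    miss = np.isnan(X)
    Xc = X.copy()
    col_means = np.nanmean(X, axis=0)
    Xc[miss] = np.take(col_means, np.where(miss)[1])   # initial fill
    for _ in range(n_iter):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        centers = km.cluster_centers_[km.labels_]
        Xc[miss] = centers[miss]                       # refill missing only
    return km.labels_, Xc

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(4, 1, (50, 3))])
X[rng.random(X.shape) < 0.2] = np.nan   # 20% missing completely at random
labels, completed = k_pod(X, k=2)
```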

10.
Missing data in clinical trials are a well-known problem, and the classical statistical methods used can be overly simple. This case study shows how well-established missing data theory can be applied to efficacy data collected in a long-term open-label trial with a discontinuation rate of almost 50%. Satisfaction with treatment in chronically constipated patients was the efficacy measure, assessed at baseline and every 3 months post-baseline. The improvement in treatment satisfaction from baseline was originally analyzed with a paired t-test, ignoring missing data and discarding the correlation structure of the longitudinal data. As the original analysis started from missing-completely-at-random assumptions about the missing data process, the satisfaction data were re-examined, and several missing at random (MAR) and missing not at random (MNAR) techniques resulted in adjusted estimates of the improvement in satisfaction over 12 months. Throughout the different sensitivity analyses, the effect sizes remained significant and clinically relevant. Thus, even for an open-label trial design, sensitivity analysis, with different assumptions about the nature of the dropouts (MAR or MNAR) and with different classes of models (selection, pattern-mixture, or multiple imputation models), has been found useful and provides evidence towards the robustness of the original analyses; additional sensitivity analyses could be undertaken to further qualify robustness. Copyright © 2012 John Wiley & Sons, Ltd.

11.
A simple approach for analyzing longitudinally measured biomarkers is to calculate summary measures such as the area under the curve (AUC) for each individual and then compare the mean AUC between treatment groups using methods such as the t test. This two-step approach is difficult to implement when there are missing data, since the AUC cannot be directly calculated for individuals with missing measurements. Simple methods for dealing with missing data include complete case analysis and imputation. A recent study showed that the estimated mean AUC difference between treatment groups based on a linear mixed model (LMM), rather than on individually calculated AUCs by simple imputation, has negligible bias under a missing-at-random assumption and only small bias when missingness is not at random. However, this model assumes the outcome to be normally distributed, which is often violated in biomarker data. In this paper, we propose to use an LMM on log-transformed biomarkers, based on which statistical inference for the ratio, rather than the difference, of AUC between treatment groups is provided. The proposed method can not only handle potential baseline imbalance in a randomized trial but also circumvent the estimation of the nuisance variance parameters in the log-normal model. The proposed model is applied to a recently completed large randomized trial studying the effect of nicotine reduction on biomarkers of exposure in smokers.
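A minimal sketch, using statsmodels and simulated data with hypothetical column names, of fitting a random-intercept LMM to log-transformed biomarker values with some visits missing; the paper's AUC-ratio inference would be built from the fixed-effect estimates of such a model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n, visits = 100, 4
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), visits),
    "time": np.tile(np.arange(visits), n),
    "arm": np.repeat(rng.integers(0, 2, n), visits),
})
subj = np.repeat(rng.normal(0.0, 0.5, n), visits)     # random intercepts
df["log_marker"] = (2.0 - 0.3 * df["arm"] * df["time"]
                    + subj + rng.normal(0.0, 0.4, len(df)))
df = df.sample(frac=0.85, random_state=3)             # ~15% of visits missing

# Random-intercept model on the log scale; the arm:time interaction drives
# the between-arm difference in log-scale trajectories.
fit = smf.mixedlm("log_marker ~ arm * time", df, groups=df["id"]).fit()
print(fit.params)
```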

12.
In clinical trials with repeated measurements, the responses from each subject are measured multiple times during the study period. Two approaches have been widely used to assess the treatment effect: one compares the rate of change between two groups, and the other tests the time-averaged difference (TAD). While sample size calculations based on comparing the rate of change between two groups have been reported by many investigators, the literature has paid relatively little attention to sample size estimation for the TAD in the presence of heterogeneous correlation structures and missing data in repeated measurement studies. In this study, we investigate sample size calculation for the comparison of time-averaged responses between treatment groups in clinical trials with longitudinally observed binary outcomes. The generalized estimating equation (GEE) approach is used to derive a closed-form sample size formula, which is flexible enough to account for arbitrary missing patterns and correlation structures. In particular, we demonstrate that the proposed sample size can accommodate a mixture of missing patterns, which is frequently encountered by practitioners in clinical trials. To our knowledge, this is the first study that considers a mixture of missing patterns in sample size calculation. Our simulation shows that the nominal power and type I error are well preserved over a wide range of design parameters. Sample size calculation is illustrated through an example.
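For intuition, a sketch of the textbook complete-data special case under compound-symmetry correlation; this is not the paper's closed-form GEE formula, which additionally handles arbitrary missing patterns and correlation structures. Here m is the number of visits, rho the within-subject correlation, and p1, p2 the response probabilities per arm.

```python
from math import ceil
from scipy.stats import norm

def n_per_group_tad(p1, p2, m, rho, alpha=0.05, power=0.8):
    """Per-group sample size for the time-averaged difference of binary
    outcomes: complete data, compound-symmetry correlation, m visits."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = (p1 * (1 - p1) + p2 * (1 - p2)) * (1 + (m - 1) * rho) / m
    return ceil(z**2 * var / (p1 - p2) ** 2)

print(n_per_group_tad(0.5, 0.35, m=4, rho=0.3))
```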

13.
The analysis of time-to-event data typically makes the censoring-at-random assumption, ie, that—conditional on covariates in the model—the distribution of event times is the same whether they are observed or unobserved (ie, right censored). When patients who remain in follow-up stay on their assigned treatment, analysis under this assumption broadly addresses the de jure, or "while on treatment strategy," estimand. In such cases, we may well wish to explore the robustness of our inference to more pragmatic, de facto or "treatment policy strategy," assumptions about the behaviour of patients post-censoring. This is particularly the case when censoring occurs because patients change, or revert, to the usual (ie, reference) standard of care. Recent work has shown how such questions can be addressed for trials with continuous outcome data and longitudinal follow-up, using reference-based multiple imputation. For example, patients in the active arm may have their missing data imputed assuming they reverted to the control (ie, reference) intervention on withdrawal. Reference-based imputation has two advantages: (a) it avoids the user specifying numerous parameters describing the distribution of patients' post-withdrawal data, and (b) it is, to a good approximation, information anchored, so that the proportion of information lost due to missing data under the primary analysis is held constant across the sensitivity analyses. In this article, we build on recent work in the survival context, proposing a class of reference-based assumptions appropriate for time-to-event data. We report a simulation study exploring the extent to which the multiple imputation estimator (using Rubin's variance formula) is information anchored in this setting, and then illustrate the approach by reanalysing data from a randomized trial that compared medical therapy with angioplasty for patients presenting with angina.

14.
In the presence of missing values, researchers may be interested in the rates of missing information. The rates of missing information (a) are important for assessing how the missing information contributes to inferential uncertainty about Q, the population quantity of interest, (b) are an important component in the decision on the number of imputations, and (c) can be used to assess model uncertainty and model fit. In this article I derive the asymptotic distribution of the rates of missing information in two scenarios: conventional multiple imputation (MI) and two-stage MI. Numerically I show that the proposed asymptotic distribution agrees with the simulated one. I also suggest, based on the asymptotic distribution, the number of imputations needed to obtain reliable missing information rate estimates for each method.
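A minimal sketch of the quantities involved, using Rubin's combining rules for conventional MI and the simple large-sample estimate of the fraction of missing information (no degrees-of-freedom correction); the inputs are illustrative.

```python
import numpy as np

def rubin_fmi(estimates, variances):
    """Combine m multiply imputed estimates via Rubin's rules; return the
    pooled estimate, total variance, and fraction of missing information."""
    q = np.asarray(estimates, float)
    u = np.asarray(variances, float)
    m = len(q)
    qbar = q.mean()
    ubar = u.mean()                    # within-imputation variance
    b = q.var(ddof=1)                  # between-imputation variance
    t = ubar + (1 + 1 / m) * b         # total variance
    fmi = (1 + 1 / m) * b / t          # large-sample version
    return qbar, t, fmi

print(rubin_fmi([1.02, 0.95, 1.10, 0.99, 1.05],
                [0.04, 0.05, 0.04, 0.05, 0.04]))
```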

15.
In this article, we focus on general k-step step-stress accelerated life tests with Type-I censoring for two-parameter Weibull distributions based on the tampered failure rate (TFR) model. We obtain the optimum design for the tests under the criterion of minimizing the asymptotic variance of the maximum likelihood estimate of the pth percentile of the lifetime under normal operating conditions. Optimum test plans for simple step-stress accelerated life tests under Type-I censoring are developed for the Weibull distribution, and for the exponential distribution in particular. Finally, an example is provided to illustrate the proposed design, and a sensitivity analysis is conducted to investigate the robustness of the design.

16.
Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because an MNAR mechanism is not verifiable based on the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data. Discrepancy in the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap-type method, relying on heuristic input from the researcher, to test for discrepancy of the parameter estimates. Instead of using the bootstrap, the current article obtains confidence intervals for parameter differences on two samples based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids likely convergence problems with bootstrap methods. It does not require heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test missing at random in addition to MCAR. An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies are performed as well, using two methods of missing data generation, which show promise for the proposed sensitivity method. One method of missing data generation is also new and interesting in its own right.

17.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have recently been used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study on the Cancer-Neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods, the Peter-Clark algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket set of the variable to be imputed; and finally, our newly proposed method, which applies the whole available knowledge of all variables of interest, comprising the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new method, especially in the presence of high missingness percentages and more connected networks. The new method has also been shown to be more efficient than the MICE method for small sample sizes with high missing rates.

18.
The 2 × 2 crossover trial uses subjects as their own control to reduce the intersubject variability in the treatment comparison, and typically requires fewer subjects than a parallel design. The generalized estimating equations (GEE) methodology has been commonly used to analyze incomplete discrete outcomes from crossover trials. We propose a unified approach to the power and sample size determination for the Wald Z-test and t-test from GEE analysis of paired binary, ordinal and count outcomes in crossover trials. The proposed method allows misspecification of the variance and correlation of the outcomes, missing outcomes, and adjustment for the period effect. We demonstrate that misspecification of the working variance and correlation functions leads to no or minimal efficiency loss in GEE analysis of paired outcomes. In general, GEE requires the assumption of missing completely at random. For bivariate binary outcomes, we show by simulation that the GEE estimate is asymptotically unbiased or only minimally biased, and that the proposed sample size method is suitable under missing at random (MAR) if the working correlation is correctly specified. The performance of the proposed method is illustrated with several numerical examples. Adaptation of the method to other paired outcomes is discussed.

19.
The zero-inflated power series distribution is commonly used for modelling count data with extra zeros. Inflation at the point zero has been investigated, and several tests for zero inflation have been examined. Sometimes, however, inflation occurs at a point other than zero; in this case, we say inflation occurs at an arbitrary point j. Such j-inflation has been discussed far less than zero inflation. In this paper, inflation at an arbitrary point j is studied in more detail, and a Bayesian test for detecting inflation at point j is presented. The Bayesian method is extended to inflation at arbitrary points i and j. The relationship between the distribution with inflation at point j, inflation at points i and j, and missing value imputation is studied. It is shown how to obtain a proper estimate of the population variance if a mean-imputed missing-at-random data set is used. Some simulation studies are conducted, and the proposed Bayesian test is applied to two real data sets.
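To fix ideas, a minimal sketch of a j-inflated Poisson (a power series member) as a two-component mixture: a point mass omega at j plus a Poisson(lam) carrying the remaining probability; the parameter values used are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def j_inflated_poisson_pmf(x, lam, omega, j):
    """pmf of a j-inflated Poisson: point mass omega at j, mixed with a
    Poisson(lam) that carries the remaining 1 - omega probability."""
    x = np.asarray(x)
    return omega * (x == j) + (1 - omega) * poisson.pmf(x, lam)

def sample_j_inflated(n, lam, omega, j, seed=None):
    rng = np.random.default_rng(seed)
    inflate = rng.random(n) < omega               # inflated component?
    return np.where(inflate, j, rng.poisson(lam, n))

x = sample_j_inflated(1_000, lam=3.0, omega=0.2, j=2, seed=0)
print(j_inflated_poisson_pmf(np.arange(6), 3.0, 0.2, 2))
```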

20.
While standard techniques are available for the analysis of time-series (longitudinal) data, and for ordinal (rating) data, not much is available for the combination of the two, at least in a readily usable form. However, this data type is commonplace in the natural and health sciences, where repeated ratings are recorded on the same subject. To analyse these data, this paper considers a transition (Markov) model in which the rating of a subject at one time depends explicitly on the observed rating at the previous time point, by incorporating the previous rating as a predictor variable. Complications arise with adequate handling of data at the first observation (t=1), as there is no prior observation to use as a predictor. To overcome this, the existence of a rating at time t=0 is postulated; it is treated as missing data, and the expectation-maximisation algorithm is used to accommodate it. The particular benefits of this method are shown for shorter time series.
