1.

Testing Periodicity in Short Series and Application to Gene Expression Data

M. Shahidul Islam 《Communications in Statistics - Simulation and Computation》2013,42(4):561-573

Testing for periodicity in microarray time series faces the challenges of short series length, missing values, and the presence of non-Fourier frequencies. In this article, a test method for such series is proposed. The method is completely simulation based and finds p-values for the test of periodicity by fitting a Pearson Type VI distribution. Simulation results show that this method outperforms Fisher's g test across varying series lengths, frequencies, and error variances. The approach is applied to Caulobacter crescentus cell cycle data to demonstrate its practical performance.
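For orientation, the Fisher's g test used as the baseline above can be sketched as follows. This is a minimal illustration of the classical test only, not the simulation-based method of the article; the function names `fisher_g` and `fisher_g_pvalue` are hypothetical, and the sketch assumes a complete, evenly spaced series with at least three non-constant observations.

```python
import cmath
import math


def fisher_g(series):
    """Return (g, m): Fisher's g statistic and the number of periodogram ordinates.

    Assumes an evenly spaced, non-constant series with no missing values.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    m = (n - 1) // 2  # Fourier frequencies k/n for k = 1..m (zero and Nyquist excluded)
    ordinates = []
    for k in range(1, m + 1):
        dft = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centered))
        ordinates.append(abs(dft) ** 2 / n)
    # g is the largest periodogram ordinate as a share of their sum
    return max(ordinates) / sum(ordinates), m


def fisher_g_pvalue(g, m):
    """Exact p-value P(G > g) under the null of Gaussian white noise."""
    upper = min(m, math.floor(1.0 / g))
    total = 0.0
    for j in range(1, upper + 1):
        term = math.comb(m, j) * max(0.0, 1.0 - j * g) ** (m - 1)
        total += term if j % 2 == 1 else -term
    return min(1.0, max(0.0, total))
```

Because the periodogram is evaluated only at Fourier frequencies k/n, a cycle whose frequency falls between them spreads its power over neighbouring ordinates, which is one reason the classical test can lose power on short series.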

2.

A new multivariate imputation method based on Bayesian networks

P. Niloofar M. Ganjali 《Journal of applied statistics》2014,41(3):501-518

Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have recently been used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study from the cancer-neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods, the Peter-Clark algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket set of the variable to be imputed; and finally, our new proposed method, which applies the whole available knowledge of all variables of interest, comprising the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new proposed method, especially in the presence of high missingness percentages and more connected networks. The new method has also been shown to be more efficient than the MICE method for small sample sizes with high missing rates.

3.

Sequential imputation for models with latent variables assuming latent ignorability

Lauren J. Beesley Jeremy M. G. Taylor Roderick J. A. Little 《Australian & New Zealand Journal of Statistics》2019,61(2):213-233

Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model.

4.

A gradient-based algorithm for semiparametric models with missing covariates

《Journal of Statistical Computation and Simulation》2012,82(4):381-390

In the parametric regression model, the covariate missing problem under missing at random is considered. It is often desirable to use flexible parametric or semiparametric models for the covariate distribution, which can reduce a potential misspecification problem. Recently, a completely nonparametric approach was developed by [H.Y. Chen, Nonparametric and semiparametric models for missing covariates in parameter regression, J. Amer. Statist. Assoc. 99 (2004), pp. 1176–1189; Z. Zhang and H.E. Rockette, On maximum likelihood estimation in parametric regression with missing covariates, J. Statist. Plann. Inference 47 (2005), pp. 206–223]. Although it does not require a model for the covariate distribution or the missing data mechanism, the proposed method assumes that the covariate distribution is supported only by observed values. Consequently, their estimator is a restricted maximum likelihood estimator (MLE) rather than the global MLE. In this article, we show the restricted semiparametric MLE could be very misleading in some cases. We discuss why this problem occurs and suggest an algorithm to obtain the global MLE. Then, we assess the performance of the proposed method via some simulation experiments.

5.

Using the EM-algorithm for survival data with incomplete categorical covariates

Stuart R. Lipsitz Joseph G. Ibrahim 《Lifetime data analysis》1996,2(1):5-14

Incomplete covariate data is a common occurrence in many studies in which the outcome is survival time. With generalized linear models, when the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM by the method of weights proposed in Ibrahim (1990). In this article, we extend the EM by the method of weights to survival outcomes whose distributions may not fall in the class of generalized linear models. This method requires the estimation of the parameters of the distribution of the covariates. We present a clinical trials example with five covariates, four of which have some missing values.

6.

Power and sample size for GEE analysis of incomplete paired outcomes in 2 × 2 crossover trials

Yongqiang Tang 《Pharmaceutical statistics》2021,20(4):820-839

The 2 × 2 crossover trial uses subjects as their own control to reduce the intersubject variability in the treatment comparison, and typically requires fewer subjects than a parallel design. The generalized estimating equations (GEE) methodology has been commonly used to analyze incomplete discrete outcomes from crossover trials. We propose a unified approach to the power and sample size determination for the Wald Z-test and t-test from GEE analysis of paired binary, ordinal and count outcomes in crossover trials. The proposed method allows misspecification of the variance and correlation of the outcomes, missing outcomes, and adjustment for the period effect. We demonstrate that misspecification of the working variance and correlation functions leads to no or minimal efficiency loss in GEE analysis of paired outcomes. In general, GEE requires the assumption of missing completely at random. For bivariate binary outcomes, we show by simulation that the GEE estimate is asymptotically unbiased or only minimally biased, and the proposed sample size method is suitable under missing at random (MAR) if the working correlation is correctly specified. The performance of the proposed method is illustrated with several numerical examples. Adaption of the method to other paired outcomes is discussed.

7.

Robust multivariate mixture regression models with incomplete data

Hwa Kyung Lim Naveen N. Narisetty 《Journal of Statistical Computation and Simulation》2017,87(2):328-347

Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

8.

Inferences on missing information under multiple imputation and two-stage multiple imputation

Ofer Harel 《Statistical Methodology》2007,4(1):75-89

In the presence of missing values, researchers may be interested in the rates of missing information. The rates of missing information are (a) important for assessing how the missing information contributes to inferential uncertainty about Q, the population quantity of interest, (b) an important component in deciding the number of imputations, and (c) useful for testing model uncertainty and model fitting. In this article I will derive the asymptotic distribution of the rates of missing information in two scenarios: the conventional multiple imputation (MI), and the two-stage MI. Numerically I will show that the proposed asymptotic distribution agrees with the simulated one. I will also suggest the number of imputations needed to obtain reliable missing information rate estimates for each method, based on the asymptotic distribution.
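As a rough illustration of the quantity being studied, the sketch below computes a simple rate-of-missing-information estimate from conventional MI output using Rubin's combining rules. It is an assumption-laden sketch, not the asymptotic derivation of the article and not the two-stage MI case; the function name `missing_information_rate` is hypothetical, and the estimate shown is the share of total variance attributable to between-imputation variability.

```python
def missing_information_rate(estimates, variances):
    """Simple rate-of-missing-information estimate from m >= 2 imputed analyses.

    estimates: point estimates of Q, one per imputed data set
    variances: the corresponding within-imputation variance estimates
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    u_bar = sum(variances) / m                      # average within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = u_bar + (1 + 1 / m) * b                 # Rubin's total variance
    return (1 + 1 / m) * b / total
```

With few imputations the between-imputation variance is itself noisy, so this ratio can vary considerably from one set of imputations to the next, which is the practical motivation for asking how many imputations are needed.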

9.

Marginal maximum a posteriori estimation using Markov chain Monte Carlo

Arnaud Doucet Simon J. Godsill Christian P. Robert 《Statistics and Computing》2002,12(1):77-84

Markov chain Monte Carlo (MCMC) methods, while facilitating the solution of many complex problems in Bayesian inference, are not currently well adapted to the problem of marginal maximum a posteriori (MMAP) estimation, especially when the number of parameters is large. We present here a simple and novel MCMC strategy, called State-Augmentation for Marginal Estimation (SAME), which leads to MMAP estimates for Bayesian models. We illustrate the simplicity and utility of the approach for missing data interpolation in autoregressive time series and blind deconvolution of impulsive processes.

10.

Weighted modified first order regression procedures for estimation in linear models with missing X-observations

Helge Toutenburg Andreas Fieger V. K. Srivastava 《Statistical Papers》1999,40(3):351-361

This paper considers the estimation of coefficients in a linear regression model with missing observations in the independent variables and introduces a modification of the standard first order regression method for imputation of missing values. The modification provides stochastic values for imputation and, as an extension, makes use of the principle of weighted mixed regression. The proposed procedures are compared with two popular procedures: one which utilizes only the complete observations and the other which employs the standard first order regression imputation method for missing values. A simulation experiment is conducted to evaluate the gain in efficiency and to examine interesting issues like the impact of varying degrees of multicollinearity in the explanatory variables. Some work on the case of discrete regressor variables is in progress and will be reported in a future article.

11.

Semiparametric inference for estimating equations with nonignorably missing covariates

Ji Chen Fang Fang 《Journal of nonparametric statistics》2018,30(3):796-812

We consider statistical inference of unknown parameters in estimating equations (EEs) when some covariates have nonignorably missing values, which is quite common in practice but has rarely been discussed in the literature. When an instrument, a fully observed covariate vector that helps identify parameters under nonignorable missingness, is available, the conditional distribution of the missing covariates given other covariates can be estimated by the pseudolikelihood method of Zhao and Shao [(2015), ‘Semiparametric pseudo likelihoods in generalised linear models with nonignorable missing data’, Journal of the American Statistical Association, 110, 1577–1590] and be used to construct unbiased EEs. These modified EEs then constitute a basis for valid inference by empirical likelihood. Our method is applicable to a wide range of EEs used in practice. It is semiparametric since no parametric model for the propensity of missing covariate data is assumed. Asymptotic properties of the proposed estimator and the empirical likelihood ratio test statistic are derived. Some simulation results and a real data analysis are presented for illustration.

12.

Doubly robust estimation of partially linear models for longitudinal data with dropouts and measurement error in covariates

Huiming Lin Jiajia Zhang Wing K. Fung 《Statistics》2018,52(1):84-98

In longitudinal studies, missing responses and mismeasured covariates are commonly seen due to the data collection process. Without care in data analysis, inferences from standard statistical approaches may lead to wrong conclusions. In order to improve the estimation for longitudinal data analysis, a doubly robust estimation method for partially linear models, which can simultaneously account for the missing responses and mismeasured covariates, is proposed. Imprecisions of covariates are corrected by taking advantage of the independence between replicate measurement errors, and missing responses are handled by the doubly robust estimation under the mechanism of missing at random. The asymptotic properties of the proposed estimators are established under regularity conditions, and simulation studies demonstrate the desired properties. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition study.

13.

On the asymptotic non‐equivalence of efficient‐GMM and MEL estimators in models with missing data

Xuerong Chen Yan Chen Alan T.K. Wan Yong Zhou 《Scandinavian Journal of Statistics》2019,46(2):361-388

The generalized method of moments (GMM) and empirical likelihood (EL) are popular methods for combining sample and auxiliary information. These methods are used in very diverse fields of research, where competing theories often suggest variables satisfying different moment conditions. Results in the literature have shown that the efficient‐GMM (GMM_E) and maximum empirical likelihood (MEL) estimators have the same asymptotic distribution to order n^{-1/2} and that both estimators are asymptotically semiparametric efficient. In this paper, we demonstrate that when data are missing at random from the sample, the utilization of some well‐known missing‐data handling approaches proposed in the literature can yield GMM_E and MEL estimators with nonidentical properties; in particular, it is shown that the GMM_E estimator is semiparametric efficient under all the missing‐data handling approaches considered but that the MEL estimator is not always efficient. A thorough examination of the reason for the nonequivalence of the two estimators is presented. A particularly strong feature of our analysis is that we do not assume smoothness in the underlying moment conditions. Our results are thus relevant to situations involving nonsmooth estimating functions, including quantile and rank regressions, robust estimation, the estimation of receiver operating characteristic (ROC) curves, and so on.

14.

Estimation in capture–recapture models when covariates are subject to measurement errors and missing data

Liqun Xi Ray Watson Ji‐Ping Wang Paul S. F. Yip 《Revue canadienne de statistique》2009,37(4):645-658

For capture–recapture models when covariates are subject to measurement errors and missing data, a set of estimating equations is constructed to estimate population size and relevant parameters. These estimating equations can be solved by an algorithm similar to the EM algorithm. The proposed method is also applicable to the situation when covariates with no measurement errors have missing data. Simulation studies are used to assess the performance of the proposed estimator. The estimator is also applied to a capture–recapture experiment on the bird species Prinia flaviventris in Hong Kong. The Canadian Journal of Statistics 37: 645–658; 2009 © 2009 Statistical Society of Canada

15.

An estimated‐score approach for dealing with missing covariate data in matched case–control studies

Samiran Sinha 《Revue canadienne de statistique》2010,38(4):680-697

Matched case–control designs are commonly used in epidemiological studies for estimating the effect of exposure variables on the risk of a disease by controlling the effect of confounding variables. Due to the retrospective nature of the study, information on a covariate could be missing for some subjects. A straightforward application of the conditional logistic likelihood for analyzing matched case–control data with the partially missing covariate may yield inefficient estimators of the parameters. A robust method has been proposed to handle this problem using an estimated conditional score approach when the missingness mechanism does not depend on the disease status. Within the conditional logistic likelihood framework, an empirical procedure is used to estimate the odds of the disease for the subjects with missing covariate values. The asymptotic distribution and the asymptotic variance of the estimator are derived when the matching variables and the completely observed covariates are categorical. The finite sample performance of the proposed estimator is assessed through a simulation study. Finally, the proposed method has been applied to analyze two matched case–control studies. The Canadian Journal of Statistics 38: 680–697; 2010 © 2010 Statistical Society of Canada

16.

Bias Reduction in Logistic Regression with Missing Responses When the Missing Data Mechanism is Nonignorable

《The American statistician》2012,66(4):340-349

In logistic regression with nonignorable missing responses, Ibrahim and Lipsitz proposed a method for estimating regression parameters. It is known that the regression estimates obtained by using this method are biased when the sample size is small. Also, another complexity arises when the iterative estimation process encounters separation in estimating regression coefficients. In this article, we propose a method to improve the estimation of regression coefficients. In our likelihood-based method, we penalize the likelihood by multiplying it by a noninformative Jeffreys prior as a penalty term. The proposed method reduces bias and is able to handle the issue of separation. Simulation results show substantial bias reduction for the proposed method as compared to the existing method. Analyses using real world data also support the simulation findings. An R package called brlrmr is developed implementing the proposed method and the Ibrahim and Lipsitz method.

17.

Hypothesis test for paired samples in the presence of missing data

Pablo Martínez-Camblor Norberto Corral Jesus María de la Hera 《Journal of applied statistics》2013,40(1):76-87

Missing data are present in almost all statistical analyses. In simple paired design tests, when some subject has one of the involved variables missing in the so-called partially overlapping samples scheme, it is usually discarded from the analysis. The lack of consistency between the information reported in the univariate and multivariate analysis is, perhaps, the main consequence. Although randomness of the missingness mechanism (missing completely at random) is a usual and needed assumption for this particular situation, the presence of missing data could lead to serious inconsistencies in the reported conclusions. In this paper, the authors develop a simple and direct procedure which allows using the whole available information in order to perform paired tests. In particular, the proposed methodology is applied to check the equality of the means from two paired samples. In addition, the use of two different resampling techniques is also explored. Finally, real-world data are analysed.

18.

A mixed effects model for analyzing area under the curve of longitudinally measured biomarkers with missing data

Luoxi Shi Dorothy K. Hatsukami Joseph S. Koopmeiners Chap T. Le Neal L. Benowitz Eric C. Donny Xianghua Luo 《Pharmaceutical statistics》2021,20(6):1249-1264

A simple approach for analyzing longitudinally measured biomarkers is to calculate summary measures such as the area under the curve (AUC) for each individual and then compare the mean AUC between treatment groups using methods such as the t test. This two-step approach is difficult to implement when there are missing data since the AUC cannot be directly calculated for individuals with missing measurements. Simple methods for dealing with missing data include the complete case analysis and imputation. A recent study showed that the estimated mean AUC difference between treatment groups based on the linear mixed model (LMM), rather than on individually calculated AUCs by simple imputation, has negligible bias under random missing assumptions and only small bias when missing is not at random. However, this model assumes the outcome to be normally distributed, which is often violated in biomarker data. In this paper, we propose to use an LMM on log-transformed biomarkers, based on which statistical inference for the ratio, rather than difference, of AUC between treatment groups is provided. The proposed method can not only handle the potential baseline imbalance in a randomized trial but also circumvent the estimation of the nuisance variance parameters in the log-normal model. The proposed model is applied to a recently completed large randomized trial studying the effect of nicotine reduction on biomarker exposure of smokers.
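The first step of the two-step approach described above can be sketched as a trapezoidal AUC per individual. This is only an illustration of why missing measurements break the summary-measure approach, not the mixed-model method of the article; the function name `trapezoid_auc` and the convention of marking a missing value with `None` are assumptions.

```python
def trapezoid_auc(times, values):
    """Trapezoidal area under the curve for one individual.

    Returns None when any measurement is missing (marked as None), because the
    individual AUC cannot be computed directly in that case.
    """
    if any(v is None for v in values):
        return None
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))
```

Individuals for whom the function returns `None` would have to be dropped (complete case analysis) or imputed before a group comparison, which is the limitation that motivates modelling all observed measurements jointly.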

19.

Bias correction in logistic regression with missing categorical covariates

Ujjwal Das Tapabrata Maiti Vivek Pradhan 《Journal of statistical planning and inference》2010

Logistic regression plays an important role in many fields. In practice, we often encounter missing covariates in different applied sectors, particularly in biomedical sciences. Ibrahim (1990) proposed a method to handle missing covariates in the generalized linear model (GLM) setup. It is well known that logistic regression estimates using small or medium sized missing data are biased. Considering data that are missing at random, in this paper we reduce the bias by two methods: first, we derive a closed form bias expression using Cox and Snell (1968), and second, we use a likelihood based modification similar to Firth (1993). We show analytically that the Firth type likelihood modification in Ibrahim's method leads to second order bias reduction. The proposed methods are simple to apply to an existing method and need no analytical work, with the exception of a small change in the optimization function. We have carried out extensive simulation studies comparing the methods, and our simulation results are also supported by real-world data.

20.

Comparison of Nonparametric Regression Curves by Spline Smoothing

Na Li 《Communications in Statistics - Theory and Methods》2013,42(22):3972-3987

In this article, procedures are proposed to test the hypothesis of equality of two or more regression functions. Tests are proposed by p-values, first under the homoscedastic regression model, which are derived using the fiducial method based on cubic spline interpolation. Then, we construct a test in the heteroscedastic case based on Fisher's method of combining independent tests. We study the behavior of the tests by simulation experiments, in which comparisons with other tests are also given. The proposed tests perform well. Finally, an application to a data set is given to illustrate the usefulness of the proposed test in practice.
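Fisher's method of combining independent tests, mentioned above for the heteroscedastic case, can be sketched in a few lines. The sketch shows only the generic combination rule, not how the article obtains the individual p-values; the function name `fisher_combined_pvalue` is hypothetical, and the p-values are assumed to be independent and strictly positive.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the joint null hypothesis.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    # For even degrees of freedom 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    tail = 0.0
    for i in range(k):
        tail += term
        term *= half_stat / (i + 1)
    return math.exp(-half_stat) * tail
```

A single p-value is returned unchanged, and several moderately small p-values can combine into a much smaller one, which is why the independence assumption matters.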