Similar Articles
20 similar articles found
1.
A common occurrence in clinical trials with a survival end point is missing covariate data. With ignorably missing covariate data, Lipsitz and Ibrahim proposed a set of estimating equations to estimate the parameters of Cox's proportional hazards model. They proposed to obtain parameter estimates via a Monte Carlo EM algorithm. We extend those results to non-ignorably missing covariate data. We present a clinical trials example with three partially observed laboratory markers which are used as covariates to predict survival.

2.
Incomplete growth curve data often result from missing or mistimed observations in a repeated measures design. Virtually all methods of analysis rely on the dispersion matrix estimates. A Monte Carlo simulation was used to compare three methods of estimation of dispersion matrices for incomplete growth curve data. The three methods were: 1) maximum likelihood estimation with a smoothing algorithm, which finds the closest positive semidefinite estimate of the pairwise estimated dispersion matrix; 2) a mixed effects model using the EM (expectation–maximization) algorithm; and 3) a mixed effects model with the scoring algorithm. The simulation included 5 dispersion structures, 20 or 40 subjects with 4 or 8 observations per subject, and 10 or 30% missing data. In all the simulations, the smoothing algorithm was the poorest estimator of the dispersion matrix. In most cases, there were no significant differences between the scoring and EM algorithms. The EM algorithm tended to be better than the scoring algorithm when the variances of the random effects were close to zero, especially for the simulations with 4 observations per subject and two random effects.
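A minimal numpy sketch of the first of these approaches: a pairwise (available-case) dispersion estimate is smoothed to the closest positive semidefinite matrix by zeroing its negative eigenvalues. The data-generating values below are illustrative assumptions, not the settings of the study.

```python
import numpy as np

def pairwise_covariance(X):
    """Pairwise (available-case) covariance estimate for data with NaNs."""
    p = X.shape[1]
    S = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            mask = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            xi, xj = X[mask, i], X[mask, j]
            S[i, j] = np.mean((xi - xi.mean()) * (xj - xj.mean()))
    return S

def nearest_psd(S):
    """Smooth a symmetric matrix to the closest positive semidefinite one
    by clipping its negative eigenvalues to zero."""
    S = (S + S.T) / 2
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.clip(vals, 0, None)) @ vecs.T

# Illustrative incomplete growth-curve data: 20 subjects, 4 time points,
# an AR(1)-type dispersion matrix, and roughly 10% of entries deleted at random.
rng = np.random.default_rng(0)
true_cov = 0.5 ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
X = rng.multivariate_normal(np.zeros(4), true_cov, size=20)
X[rng.random(X.shape) < 0.10] = np.nan

print(np.round(nearest_psd(pairwise_covariance(X)), 2))
```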

3.
Missing data are present in almost all statistical analyses. In simple paired-design tests, when a subject is missing one of the involved variables in the so-called partially overlapping samples scheme, that subject is usually discarded from the analysis. The main consequence is, perhaps, a lack of consistency between the information reported in the univariate and multivariate analyses. Although randomness of the missing-data mechanism (missingness completely at random) is a usual and necessary assumption in this situation, the presence of missing data can lead to serious inconsistencies in the reported conclusions. In this paper, the authors develop a simple and direct procedure which allows the whole available information to be used in order to perform paired tests. In particular, the proposed methodology is applied to check the equality of the means of two paired samples. In addition, the use of two different resampling techniques is also explored. Finally, real-world data are analysed.
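As a rough illustration of using all available observations in a partially overlapping paired design (a generic resampling approach, not the authors' exact statistic), the sketch below estimates the mean difference from the paired and unpaired parts together and assesses it with a percentile bootstrap; an interval excluding zero suggests the means differ. All data-generating values are assumed for illustration.

```python
import numpy as np

def mean_difference(paired, x_only, y_only):
    """Difference of means using every available observation."""
    x_all = np.concatenate([paired[:, 0], x_only])
    y_all = np.concatenate([paired[:, 1], y_only])
    return x_all.mean() - y_all.mean()

def bootstrap_ci(paired, x_only, y_only, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: pairs are resampled jointly to preserve
    within-pair dependence, the unpaired parts separately."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        p_idx = rng.integers(0, len(paired), len(paired))
        x_idx = rng.integers(0, len(x_only), len(x_only))
        y_idx = rng.integers(0, len(y_only), len(y_only))
        diffs[b] = mean_difference(paired[p_idx], x_only[x_idx], y_only[y_idx])
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

# Illustrative data: 30 complete pairs plus 10 subjects missing one variable each.
rng = np.random.default_rng(1)
paired = rng.multivariate_normal([0.0, 0.3], [[1, 0.6], [0.6, 1]], size=30)
x_only = rng.normal(0.0, 1.0, size=10)
y_only = rng.normal(0.3, 1.0, size=10)

print("estimated difference:", mean_difference(paired, x_only, y_only))
print("95% bootstrap CI:", bootstrap_ci(paired, x_only, y_only))
```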

4.
We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data, as well as after deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. Populations compared were: (1) multivariate normal with covariance matrices Σ1 = Σ2, (2) multivariate normal with Σ1 ≠ Σ2, and (3) multivariate non-normal with Σ1 = Σ2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly, but all were better than non-replacement.
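A brief scikit-learn sketch in the same spirit, comparing two of the three replacement strategies (variable means, and an iterative imputer as a stand-in for EM-based replacement) on illustrative multivariate normal data with equal covariance matrices; the separation, sample sizes and missing-data rate are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.impute import SimpleImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n, p = 200, 4
# Two multivariate normal populations with equal covariance and moderate separation.
mean0, mean1 = np.zeros(p), np.full(p, 1.0)
cov = 0.5 * np.eye(p) + 0.5

def draw(n_each):
    X = np.vstack([rng.multivariate_normal(mean0, cov, n_each),
                   rng.multivariate_normal(mean1, cov, n_each)])
    y = np.repeat([0, 1], n_each)
    return X, y

X_train, y_train = draw(n)
X_test, y_test = draw(n)

# Delete 20% of the training entries completely at random.
X_miss = X_train.copy()
X_miss[rng.random(X_miss.shape) < 0.20] = np.nan

for name, imputer in [("variable means", SimpleImputer(strategy="mean")),
                      ("iterative (EM-like)", IterativeImputer(random_state=0))]:
    X_filled = imputer.fit_transform(X_miss)
    acc = LinearDiscriminantAnalysis().fit(X_filled, y_train).score(X_test, y_test)
    print(f"{name}: probability of correct classification ~ {acc:.3f}")
```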

5.
It is well known that the log-likelihood function for samples coming from normal mixture distributions may present spurious maxima and singularities. For this reason, here we reformulate some of Hathaway's results and propose two constrained estimation procedures for multivariate normal mixture modelling according to the likelihood approach. Their performances are illustrated by means of numerical simulations based on the EM algorithm. A comparison between multivariate normal mixtures and the hot-deck approach in missing data imputation is also considered. Salvatore Ingrassia: S. Ingrassia carried out the research as part of the project "Metodi Statistici e Reti Neuronali per l'Analisi di Dati Complessi" (PRIN 2000, resp. G. Lunetta).

6.
Patterns of sexual mixing and the sexual partner network are important determinants of the spread of all sexually transmitted diseases (STDs), including the human immunodeficiency virus. Novel statistical problems arise in the analysis and interpretation of studies aimed at measuring patterns of sexual mixing and sexual partner networks. Samples of mixing patterns and network structures derived from randomly sampling individuals are not themselves random samples of measures of partnerships or networks. In addition, the sensitive nature of questions on sexual activity will result in the introduction of non-response biases, which in estimating network structures are likely to be non-ignorable. Adjusting estimates for these biases by using standard statistical approaches is complicated by the complex interactions between the mechanisms generating bias and the non-independent nature of network data. Using a two-step Monte Carlo simulation approach, we have shown that measures of mixing patterns and the network structure that do not account for missing data and non-random sampling are severely biased. Here, we use this approach to adjust raw estimates in data to incorporate these effects. The results suggest that the risk for transmission of STDs in empirical data is underestimated by ignoring missing data and non-random sampling.

7.
We propose a new stochastic approximation (SA) algorithm for maximum-likelihood estimation (MLE) in the incomplete-data setting. This algorithm is most useful for problems where the EM algorithm is not possible due to an intractable E-step or M-step. Compared to other algorithms that have been proposed for intractable EM problems, such as the MCEM algorithm of Wei and Tanner (1990), our proposed algorithm appears more generally applicable and efficient. The approach we adopt is inspired by the Robbins-Monro (1951) stochastic approximation procedure, and we show that the proposed algorithm can be used to solve some of the long-standing problems in computing an MLE with incomplete data. We prove that in general O(n) simulation steps are required to compute the MLE with the SA algorithm, whereas O(n log n) simulation steps are required to compute the MLE using the MCEM and/or the MCNR algorithm, where n is the sample size of the observations. Examples include computing the MLE in the nonlinear errors-in-variables model and the nonlinear regression model with random effects.
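A toy sketch of the Robbins-Monro flavour of stochastic approximation for an incomplete-data MLE, applied to a right-censored normal mean rather than to the models above: at every step the censored values are simulated once given the current parameter, the complete-data score is evaluated, and the parameter is moved along it with a decreasing step size. All numbers are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)

# Toy incomplete-data problem: X ~ N(mu, 1), values above c are censored at c.
mu_true, c, n = 1.0, 1.5, 500
x = rng.normal(mu_true, 1.0, n)
censored = x > c
y = np.where(censored, c, x)          # observed data: value, or censoring point

mu, gamma0 = 0.0, 1.0                 # starting value and base step size
for k in range(1, 2001):
    # Simulate the censored values once from N(mu, 1) truncated to (c, inf).
    a = (c - mu) / 1.0                # standardized lower bound
    x_sim = y.copy()
    x_sim[censored] = truncnorm.rvs(a, np.inf, loc=mu, scale=1.0,
                                    size=censored.sum(), random_state=rng)
    # Complete-data score of N(mu, 1) and a Robbins-Monro step gamma_k = gamma0 / k.
    score = np.sum(x_sim - mu) / n
    mu += (gamma0 / k) * score

print("SA estimate of mu:", round(mu, 3))
```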

8.
Maximum likelihood (ML) estimation with spatial econometric models is a long-standing problem that finds application in several areas of economic importance. The problem is particularly challenging in the presence of missing data, since there is an implied dependence between all units, irrespective of whether they are observed or not. Out of the several approaches adopted for ML estimation in this context, that of LeSage and Pace [Models for spatially dependent missing data. J Real Estate Financ Econ. 2004;29(2):233–254] stands out as one of the most commonly used with spatial econometric models due to its ability to scale with the number of units. Here, we review their algorithm, and consider several similar alternatives that are also suitable for large datasets. We compare the methods through an extensive empirical study and conclude that, while the approximate approaches are suitable for large sampling ratios, for small sampling ratios the only reliable algorithms are those that yield exact ML or restricted ML estimates.

9.
The objective of this paper is to present a method which can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when we can make first and second moment assumptions only; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density which has the same mean and variance structure of the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. Then the loss function is minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.

10.
In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing-at-random situation, so that the missing covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach that takes the missing covariates into consideration presents much better efficiency in terms of mean square errors than the complete-case analysis. Effects of increasing the cured fraction and the proportion of censored observations are also reported. We demonstrate the proposed methodology with two real data sets, one involving the length of time to obtain a BS degree in Statistics and the other the time to breast cancer recurrence.
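A small sketch of the method of weights in a much simpler setting than the cure rate model above (logistic regression with one binary covariate that is sometimes missing at random): each incomplete subject is expanded into pseudo-observations for the possible covariate values, weighted by their posterior probabilities, and weighted maximum likelihood is iterated. Every value here is an illustrative assumption.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(4)

# Toy data: binary covariate x with P(x=1)=0.4, logistic outcome y,
# and 30% of the x values missing at random.
n, beta_true, pi_true = 400, np.array([-0.5, 1.2]), 0.4
x = rng.binomial(1, pi_true, n).astype(float)
y = rng.binomial(1, expit(beta_true[0] + beta_true[1] * x))
missing = rng.random(n) < 0.30
x_obs = np.where(missing, np.nan, x)

def weighted_logistic(X, y, w, n_iter=25):
    """Weighted logistic MLE via Newton-Raphson (the weighted M-step)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        W = w * p * (1 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (w * (y - p)))
    return beta

beta, pi = np.zeros(2), 0.5
for _ in range(50):
    # Expand each subject with missing x into two pseudo-observations (x=0 and x=1),
    # weighted by the posterior probability of that value (the method of weights).
    lik1 = expit(beta[0] + beta[1]) ** y[missing] * (1 - expit(beta[0] + beta[1])) ** (1 - y[missing])
    lik0 = expit(beta[0]) ** y[missing] * (1 - expit(beta[0])) ** (1 - y[missing])
    w1 = pi * lik1 / (pi * lik1 + (1 - pi) * lik0)
    x_fill = np.concatenate([x_obs[~missing], np.zeros(missing.sum()), np.ones(missing.sum())])
    y_fill = np.concatenate([y[~missing], y[missing], y[missing]])
    w = np.concatenate([np.ones((~missing).sum()), 1 - w1, w1])
    X_fill = np.column_stack([np.ones_like(x_fill), x_fill])
    beta = weighted_logistic(X_fill, y_fill, w)
    pi = np.sum(w * x_fill) / n

print("EM estimates: beta =", np.round(beta, 2), " P(x=1) =", round(pi, 2))
```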

11.
This note addresses a problem that can arise in surveys, namely when some respondents misinterpret the rating method and so assign high ratings when they intended to assign low ratings, and vice versa. We present a method that allows these misinterpretations to be corrected with high probability, and more meaningful conclusions to be drawn. The method is illustrated with data from a Community Value survey.

12.
Nonlinear mixed-effect models are often used in the analysis of longitudinal data. However, it sometimes happens that missing values for some of the model covariates are not purely random. Motivated by an application to HIV viral dynamics, where this situation occurs, the author considers likelihood inference for this type of problem. His approach involves a Monte Carlo EM algorithm, along with a Gibbs sampler and rejection/importance sampling methods. A concrete application is provided.

13.
The Poisson distribution is widely used for count data; however, it cannot accommodate over- or under-dispersion. The hyper-Poisson distribution is a particular case of the extended Conway–Maxwell–Poisson distribution, which accounts for dispersion in count data. The main motivation for considering this model is the possibility of linking the mean to the regressor variables in a very natural way, which simplifies testing problems. This paper therefore focuses on the gradient statistic for detecting dispersion and compares it with the classical likelihood ratio statistic. Two illustrative applications are considered.

14.
Randomized clinical trials with count measurements as the primary outcome are common in various medical areas, such as seizure counts in epilepsy trials or relapse counts in multiple sclerosis trials. Controlled clinical trials frequently use a conventional parallel-group design that assigns subjects randomly to one of two treatment groups and repeatedly evaluates them at baseline and at intervals across a treatment period of fixed duration. The primary interest is to compare the rates of change between treatment groups. Generalized estimating equations (GEEs) have been widely used to compare rates of change between treatment groups because of their robustness to misspecification of the true correlation structure. In this paper, we derive a sample size formula for comparing the rates of change between two groups in a repeatedly measured count outcome using GEE. The sample size formula incorporates general missing patterns, such as independent missing and monotone missing, and general correlation structures, such as AR(1) and compound symmetry (CS). The performance of the sample size formula is evaluated through simulation studies. Sample size estimation is illustrated by a clinical trial example from epilepsy.
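The closed-form sample size formula is specific to the paper, but the design it addresses can be checked by simulation. The sketch below, assuming statsmodels and purely illustrative design values (rates of change, frailty variance, dropout probability), estimates the empirical power of the GEE test for the group-by-time interaction under an exchangeable working correlation and a crude monotone dropout mechanism.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

def simulate_power(n_per_arm=50, n_visits=5, rate_slopes=(0.0, -0.15),
                   base_rate=1.0, frailty_var=0.5, drop_prob=0.1, n_sim=100):
    """Empirical power of the GEE test for unequal rates of change
    between two arms of a repeatedly measured count outcome."""
    rejections = 0
    for _ in range(n_sim):
        rows = []
        for arm in (0, 1):
            for subj in range(n_per_arm):
                u = rng.gamma(1 / frailty_var, frailty_var)  # subject frailty -> within-subject correlation
                for t in range(n_visits):
                    if t > 0 and rng.random() < drop_prob:   # crude monotone dropout
                        break
                    mu = u * base_rate * np.exp(rate_slopes[arm] * t)
                    rows.append((arm * 10_000 + subj, arm, t, rng.poisson(mu)))
        arr = np.array(rows, dtype=float)
        groups, g, time, y = arr.T
        X = sm.add_constant(np.column_stack([g, time, g * time]))
        res = sm.GEE(y, X, groups=groups, family=sm.families.Poisson(),
                     cov_struct=sm.cov_struct.Exchangeable()).fit()
        rejections += res.pvalues[3] < 0.05                  # group-by-time interaction
    return rejections / n_sim

print("empirical power:", simulate_power())
```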

15.
Large cohort studies are commonly launched to study the risk effect of genetic variants or other risk factors on a chronic disorder. In these studies, family data are often collected to provide additional information for the purpose of improving the inference results. Statistical analysis of the family data can be very challenging due to the missing observations of genotypes, incomplete records of disease occurrences in family members, and the complicated dependence attributed to the shared genetic background and environmental factors. In this article, we investigate a class of logistic models with family-shared random effects to tackle these challenges, and develop a robust regression method based on the conditional logistic technique for statistical inference. An expectation–maximization (EM) algorithm with fast computation speed is developed to handle the missing genotypes. The proposed estimators are shown to be consistent and asymptotically normal. Additionally, a score test based on the proposed method is derived to test the genetic effect. Extensive simulation studies demonstrate that the proposed method performs well in finite samples in terms of estimate accuracy, robustness and computational speed. The proposed procedure is applied to an Alzheimer's disease study.

16.
Missing covariate data are a common issue in generalized linear models (GLMs). A model-based procedure arising from properly specifying joint models for both the partially observed covariates and the corresponding missing indicator variables represents a sound and flexible methodology, which lends itself to maximum likelihood estimation as the likelihood function is available in computable form. In this paper, a novel model-based methodology is proposed for the regression analysis of GLMs when the partially observed covariates are categorical. Pair-copula constructions are used as graphical tools to facilitate the specification of the high-dimensional probability distributions of the underlying missingness components. The model parameters are estimated by maximizing the weighted log-likelihood function using an EM algorithm. In order to compare the performance of the proposed methodology with other well-established approaches, including complete-case analysis and multiple imputation, several simulation experiments with Binomial, Poisson and Normal regressions are carried out under both missing-at-random and not-missing-at-random mechanisms. The methods are illustrated by modelling data from a stage III melanoma clinical trial. The results show that the methodology is rather robust and flexible, representing a competitive alternative to traditional techniques.

17.
In this paper we discuss graphical models for mixed types of continuous and discrete variables with incomplete data. We use a set of hyperedges to represent an observed data pattern. A hyperedge is a set of variables observed for a group of individuals. In a mixed graph with two types of vertices and two types of edges, dots and circles represent discrete and continuous variables respectively. A normal graph represents a graphical model and a hypergraph represents an observed data pattern. In terms of the mixed graph, we discuss decomposition of mixed graphical models with incomplete data, and we present a partial imputation method which can be used in the EM algorithm and the Gibbs sampler to speed their convergence. For a given mixed graphical model and an observed data pattern, we try to decompose a large graph into several small ones so that the original likelihood can be factored into a product of likelihoods with distinct parameters for small graphs. For the case that a graph cannot be decomposed due to its observed data pattern, we can impute missing data partially so that the graph can be decomposed.

18.
Methodology is proposed for evaluating a new test using a reference test. Unlike in previous methodology, the sensitivity and specificity of the reference test are not assumed to be known; instead, they are estimated from data comparing the reference test with a gold standard.

19.
Multivariate analysis is difficult when there are missing observations in the response vectors. Kleinbaum (1973) proposed a Wald statistic useful in the analysis of incomplete multivariate data. SUBROUTINE COEF calculates the estimated parameter matrix ξ in the generalization of the Potthoff-Roy (1964) growth curve model proposed by Kleinbaum (1973). SUBROUTINE WALD calculates the Wald statistic for hypotheses of the form H0: HξD = 0, as proposed by Kleinbaum (1973).

20.
In statistical models involving constrained or missing data, likelihoods containing integrals emerge. In the case of both constrained and missing data, the result is a ratio of integrals, which for multivariate data may defy exact or approximate analytic expression. Seeking maximum-likelihood estimates in such settings, we propose Monte Carlo approximants for these integrals, and subsequently maximize the resulting approximate likelihood. Iteration of this strategy expedites the maximization, while the Gibbs sampler is useful for the required Monte Carlo generation. As a result, we handle a class of models broader than the customary EM setting without using an EM-type algorithm. Implementation of the methodology is illustrated in two numerical examples.
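A toy sketch of the core idea for a single constrained-data case (a normal sample retained only when positive, so the likelihood is a ratio involving the integral P(X > 0)): the integral is replaced by a Monte Carlo approximant based on a fixed set of importance-sampling draws, and the approximate log-likelihood is then maximized directly. No Gibbs sampler is needed in this simple setting, and all specifics are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)

# Toy constrained data: a N(mu, sigma^2) sample kept only when positive,
# so the likelihood is the ratio f(x; theta) / P(X > 0; theta).
mu_true, sigma_true = 0.5, 1.0
x = rng.normal(mu_true, sigma_true, 5000)
x = x[x > 0]

# Fixed importance-sampling draws from an Exponential(1) proposal on (0, inf);
# keeping them fixed makes the approximate likelihood a smooth function of theta.
z = rng.exponential(1.0, size=20_000)

def neg_approx_loglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    # Monte Carlo approximant of the normalizing integral P(X > 0; mu, sigma).
    p_hat = np.mean(norm.pdf(z, loc=mu, scale=sigma) / np.exp(-z))
    return -(np.sum(norm.logpdf(x, loc=mu, scale=sigma)) - len(x) * np.log(p_hat))

res = minimize(neg_approx_loglik, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print("approximate MLE: mu =", round(res.x[0], 3),
      " sigma =", round(float(np.exp(res.x[1])), 3))
```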
