Similar literature (20 results)
1.
In medical studies, the Cox proportional hazards model is commonly used to analyze right-censored survival data accompanied by many explanatory covariates. In practice, Akaike's information criterion (AIC) or the Bayesian information criterion (BIC) is usually used to select an appropriate subset of covariates. It is well known that neither AIC nor BIC dominates in all situations. In this paper, we propose an adaptive-Cox model averaging procedure to obtain a more robust hazard estimator. First, by applying the AIC and BIC criteria to perturbed datasets, we obtain two model-averaging (MA) estimated survival curves, called AIC-MA and BIC-MA. Then, based on Kullback–Leibler loss, the better of the AIC-MA and BIC-MA survival-curve estimates is chosen, which yields an adaptive-Cox estimate of the survival curve. Simulation results show the superiority of our approach, and an application of the proposed method is presented through an analysis of the German Breast Cancer Study dataset.
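
A minimal numpy sketch of the information-criterion model-averaging step described above, under the assumption that several candidate Cox models have already been fitted; the AIC values, the survival-curve estimates and the time grid are hypothetical, and the perturbation and Kullback–Leibler selection steps of the adaptive-Cox procedure are not reproduced.

```python
import numpy as np

def ic_weights(ic_values):
    """Turn information-criterion values into model-averaging weights."""
    ic = np.asarray(ic_values, dtype=float)
    delta = ic - ic.min()            # differences from the best model
    w = np.exp(-0.5 * delta)         # smaller IC -> larger weight
    return w / w.sum()

# Hypothetical survival curves S_m(t) from M candidate Cox models,
# evaluated on a common time grid (rows: models, columns: time points).
surv_curves = np.array([[1.00, 0.90, 0.75, 0.60],
                        [1.00, 0.85, 0.70, 0.58],
                        [1.00, 0.92, 0.80, 0.65]])
aic_values = [210.3, 208.1, 211.9]   # hypothetical AIC of each model

weights = ic_weights(aic_values)
aic_ma_curve = weights @ surv_curves  # AIC-based model-averaged survival curve
print(weights, aic_ma_curve)
```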

2.
In this article, by using constant and random selection matrices, several properties of the maximum likelihood (ML) estimates and the ML estimator of a normal distribution with missing data are derived. The constant selection matrix allows us to obtain an explicit form of the ML estimates and the exact relationship between the EM algorithm and the score function. The random selection matrix allows us to clarify how the missing-data mechanism works in the proof of the consistency of the ML estimator, to derive the asymptotic properties of the sequence generated by the EM algorithm, and to derive the information matrix.
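
A small numpy/scipy sketch of the constant-selection-matrix idea: for a given missingness pattern, the observed sub-vector of X ~ N(mu, Sigma) is S X ~ N(S mu, S Sigma S'), so each incomplete case contributes a multivariate-normal log-likelihood in the selected coordinates. The parameter values and the observation are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def selection_matrix(observed_mask):
    """Rows of the identity matrix corresponding to the observed coordinates."""
    p = len(observed_mask)
    return np.eye(p)[np.asarray(observed_mask, dtype=bool)]

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

x = np.array([0.5, np.nan, 2.5])      # second coordinate is missing
S = selection_matrix(~np.isnan(x))    # selects the first and third coordinates

# Observed-data log-likelihood contribution of this incomplete case.
loglik = multivariate_normal(S @ mu, S @ Sigma @ S.T).logpdf(x[~np.isnan(x)])
print(S, loglik)
```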

3.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with a penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for the considered models with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). The most attractive feature of our method is that it is rather general and can be extended, via the EM algorithm, to various situations with missing observations; in particular, when no data are missing, the new criteria reduce to the classical AIC. We can therefore not only compare models with missing observations under MAR/MNAR settings, but also compare missing-data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including the consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

4.
In this article, we consider the order estimation of autoregressive models with incomplete data using expectation–maximization (EM) algorithm-based information criteria. The criteria take the form of a penalization of the conditional expectation of the log-likelihood. The evaluation of the penalization term generally involves numerical differentiation and matrix inversion. We introduce a simplification of the penalization term for autoregressive model selection and propose a penalty factor based on a resampling procedure in the criterion formula. The simulation results show the improvements yielded by the proposed method when compared with the classical information criteria for model selection with incomplete data.
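
A hedged sketch of the general shape of such criteria, written as a penalized conditional expectation of the complete-data log-likelihood; the exact penalty in the article is obtained by resampling, so the constant $c_n$ below is only a placeholder:

$$\mathrm{IC}(k) \;=\; -2\,\mathbb{E}\!\left[\log L_c\bigl(\hat\theta_k\bigr)\,\middle|\,y_{\mathrm{obs}},\hat\theta_k\right] \;+\; c_n\, d_k,$$

where $d_k$ is the number of parameters of the AR($k$) model; $c_n = 2$ gives an AIC-type rule and $c_n = \log n$ a BIC-type rule.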

5.
The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed for analyzing incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing-data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing-data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally-distributed repeated measures data using linear mixed models, whereas Park and Davis (1992) derived their test for incomplete repeated categorical data in the framework of Grizzle, Starmer, and Koch (1969). The proposed procedure can be applied easily to any other multivariate general linear model that allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).

6.
The purpose of this paper is to develop a Bayesian analysis for the zero-inflated hyper-Poisson model. Markov chain Monte Carlo methods are used to develop a Bayesian procedure for the model, and the Bayes estimators are compared by simulation with the maximum-likelihood estimators. Regression modeling and model selection are also discussed, and case-deletion influence diagnostics are developed for the joint posterior distribution based on the functional Bregman divergence, which includes the ψ-divergence and several other divergence measures, such as the Itakura–Saito, Kullback–Leibler, and χ2 divergences. The performance of our approach is illustrated on artificial data and on real data from an apple cultivation experiment.
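
A small numpy sketch of two of the divergence measures named above, evaluated between two discrete distributions p and q (for example a full-data and a case-deleted posterior predictive approximated on a common grid); the distributions are illustrative and the functional Bregman framework of the paper is not reproduced.

```python
import numpy as np

p = np.array([0.50, 0.30, 0.15, 0.05])  # "full data" distribution
q = np.array([0.45, 0.32, 0.17, 0.06])  # "case deleted" distribution

kl = np.sum(p * np.log(p / q))          # Kullback-Leibler divergence
chi2 = np.sum((p - q) ** 2 / q)         # chi-squared divergence
print(kl, chi2)
```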

7.
Patrick Marsh, Statistics, 2019, 53(3): 656–672
The role of standard likelihood-based measures of information and efficiency is unclear when regressions involve nonstationary data. Typically the standardized score is not asymptotically Gaussian, and the standardized Hessian has a stochastic, rather than deterministic, limit. Here we consider a time series regression involving a deterministic covariate which can be evaporating, slowly evolving or nonstationary. It is shown that conditional information, or equivalently profile Kullback–Leibler and Fisher information, remains informative about both the accuracy, i.e. the asymptotic variance, of profile maximum likelihood estimators and the power of point optimal invariant tests for a unit root. Specifically, these information measures indicate fractional, rather than linear, trends that may minimize inferential accuracy. This is confirmed in a numerical experiment.

8.
In this article, we propose a new empirical information criterion (EIC) for model selection which penalizes the likelihood of the data by a non-linear function of the number of parameters in the model. It is designed to be used where there are a large number of time series to be forecast. However, a bootstrap version of the EIC can be used where there is a single time series to be forecast. The EIC provides a data-driven model selection tool that can be tuned to the particular forecasting task.

We compare the EIC with other model selection criteria including Akaike's information criterion (AIC) and Schwarz's Bayesian information criterion (BIC). The comparisons show that for the M3 forecasting competition data, the EIC outperforms both the AIC and BIC, particularly for longer forecast horizons. We also compare the criteria on simulated data and find that the EIC does better than existing criteria in that case also.
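
A hedged sketch of a criterion of this general form, -2 log-likelihood plus a non-linear function of the number of parameters; the particular penalty a·k^b, its tuning constants and the candidate models are illustrative assumptions, not the authors' EIC.

```python
import numpy as np

def empirical_ic(loglik, n_params, a=1.5, b=1.2):
    """Generic criterion: -2 log L + a * k**b, non-linear in the parameter count k."""
    return -2.0 * loglik + a * n_params ** b

# Hypothetical fitted models: (log-likelihood, number of parameters).
candidates = {"AR(1)": (-512.4, 2), "AR(2)": (-508.9, 3), "AR(3)": (-508.1, 4)}
scores = {name: empirical_ic(ll, k) for name, (ll, k) in candidates.items()}
print(min(scores, key=scores.get), scores)  # selected model and all criterion values
```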

9.
In this paper, we consider a Bayesian mixture model that allows us to integrate out the weights of the mixture in order to obtain a procedure in which the number of clusters is an unknown quantity. To determine clusters and estimate parameters of interest, we develop an MCMC algorithm, named the sequential data-driven allocation sampler. In this algorithm, a single observation has a non-null probability of creating a new cluster, and a set of observations may create a new cluster through split-merge movements. The split-merge movements are developed using a sequential allocation procedure based on allocation probabilities that are calculated according to the Kullback–Leibler divergence between the posterior distribution using the observations previously allocated and the posterior distribution including a 'new' observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.

10.
In this paper we prove the consistency in probability of a class of generalized BIC criteria for model selection in non-linear regression, by using asymptotic results of Gallant. This extends a result obtained by Nishii for model selection in linear regression.
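
A sketch of the generalized-BIC family referred to above, in the spirit of Nishii's criteria for linear regression; the exact form and conditions used in the paper may differ:

$$\mathrm{GIC}_n(k) \;=\; -2\log \hat L_k + c_n\,k, \qquad c_n \to \infty, \quad c_n / n \to 0,$$

so that $c_n = \log n$ recovers BIC, and consistency in probability is obtained when the penalty weight grows without bound but more slowly than the sample size.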

11.
ABSTRACT

In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of the model parameters and mixing proportions. Specifically, missing data are assumed to be missing not at random (MNAR), i.e. non-ignorably missing (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm's efficiency, a stochastic step based on data augmentation is incorporated into the E-step, whereas the M-step is solved by the method of conditional maximization. A variant of the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The considered model is a general framework that captures the important characteristics of count data such as zero inflation/deflation, heterogeneity and missingness, providing more insight into the data features and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed in place of the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.
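
A minimal scipy sketch of the hurdle Poisson probability mass function on which the mixture components above are built: a point mass at zero plus a zero-truncated Poisson for the positive counts. The parameter values are illustrative, and the mixture, missingness and probit-selection layers of the model are not reproduced.

```python
import numpy as np
from scipy.stats import poisson

def hurdle_poisson_pmf(y, pi0, lam):
    """P(Y=0) = pi0; P(Y=y) = (1 - pi0) * Poisson(y; lam) / (1 - exp(-lam)) for y >= 1."""
    y = np.asarray(y)
    positive = (1.0 - pi0) * poisson.pmf(y, lam) / (1.0 - np.exp(-lam))
    return np.where(y == 0, pi0, positive)

probs = hurdle_poisson_pmf(np.arange(6), pi0=0.4, lam=2.0)
print(probs, probs.sum())  # sum is below 1 only because the support extends beyond 5
```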

12.
Latent class analysis (LCA) has important applications in the social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting data. In this study, the non-response mainly comprised 'contingency questions' and genuine 'missing data'. The primary objective of this study was to evaluate the effects of some potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are the latent class proportions, the conditional probabilities, the sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size and the number of items are also significant. From our simulation results, the impact of missing data and contingency questions can be mitigated by increasing the sample size or the number of items.

13.
ABSTRACT

Inflated data are prevalent in many situations, and a variety of inflated models with extensions have been derived to fit data with excessive counts of particular responses. The family of information criteria (IC) is used to compare the fit of competing models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC for inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models, including Poisson regression and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data and the performance of the IC was compared. The effects of sample size and the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and the consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when the sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. It is therefore desirable to study other model selection criteria for dual-inflated data with small sample sizes.
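
A hedged sketch of the zero-inflated Poisson mass function and the three criteria compared above; the fitted log-likelihood, parameter count and sample size are hypothetical, and the CAIC penalty k(log n + 1) is the commonly used Bozdogan form assumed here.

```python
import numpy as np
from scipy.stats import poisson

def zip_pmf(y, pi0, lam):
    """P(Y=0) = pi0 + (1-pi0) e^{-lam}; P(Y=y) = (1-pi0) Poisson(y; lam) for y >= 1."""
    y = np.asarray(y)
    base = (1.0 - pi0) * poisson.pmf(y, lam)
    return np.where(y == 0, pi0 + base, base)

def aic(loglik, k):      return -2 * loglik + 2 * k
def bic(loglik, k, n):   return -2 * loglik + k * np.log(n)
def caic(loglik, k, n):  return -2 * loglik + k * (np.log(n) + 1)

print(zip_pmf(np.arange(4), pi0=0.3, lam=1.5))  # illustrative ZIP probabilities
loglik, k, n = -843.7, 4, 500                   # hypothetical ZIP fit
print(aic(loglik, k), bic(loglik, k, n), caic(loglik, k, n))
```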

14.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed-form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the proposed methods.
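
A sketch of the sequential-logistic factorization described above, with $r = (r_1,\dots,r_p)$ the missing-data indicators for the covariates; the exact set of conditioning variables in each term is an assumption:

$$p(r_1,\dots,r_p \mid x, y, \phi) \;=\; \prod_{j=1}^{p} p\bigl(r_j \mid r_1,\dots,r_{j-1}, x, y, \phi_j\bigr), \qquad \operatorname{logit}\Pr\bigl(r_j = 1 \mid r_1,\dots,r_{j-1}, x, y\bigr) = \phi_{j0} + \phi_j^{\top}\bigl(r_1,\dots,r_{j-1}, x, y\bigr).$$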

15.
ABSTRACT

Although there is a significant literature on the asymptotic theory of the Bayes factor, the set-ups considered are usually specialized and often involve independent and identically distributed data. Even in such specialized cases, mostly weak consistency results are available. In this article, for the first time, we derive the almost sure convergence theory of the Bayes factor in a general set-up that includes dependent data and misspecified models. Somewhat surprisingly, the key to the proof of such a general theory is a simple application of a result of Shalizi to a well-known identity satisfied by the Bayes factor. Supplementary materials for this article are available online.

16.
We propose a method for estimating parameters in generalized linear models when the outcome variable is missing for some subjects and the missing data mechanism is non-ignorable. We assume throughout that the covariates are fully observed. One possible method for estimating the parameters is maximum likelihood with a non-ignorable missing data model. However, caution must be used when fitting non-ignorable missing data models because certain parameters may be inestimable for some models. Instead of fitting a non-ignorable model, we propose the use of auxiliary information in a likelihood approach to reduce the bias, without having to specify a non-ignorable model. The method is applied to a mental health study.

17.
There is much literature on statistical inference for distributions under missing data, but surprisingly little previous attention has been paid to missing data in the context of estimating a distribution with auxiliary information. In this article, the use of auxiliary information in the presence of missing data is considered. We use the method of Zhou, Wan and Wang (2008) to mitigate the effects of missing data through a reformulation of the estimating equations, with missing values imputed through a semi-parametric procedure. We can then estimate the distribution and its τth quantile while taking the auxiliary information into account. Asymptotic properties of the distribution estimator and the corresponding sample quantile are derived and analyzed. The distribution estimators based on our method are found to significantly outperform the corresponding estimators without auxiliary information. Some simulation studies are conducted to illustrate the finite sample performance of the proposed estimators.

18.
In applications of multivariate finite mixture models, estimating the number of unknown components is often difficult. We propose a bootstrap information criterion, whereby we calculate the expected log-likelihood at the maximum a posteriori estimates for model selection. Accurate estimation using the bootstrap requires a large number of bootstrap replicates. We accelerate this computation by employing parallel processing on graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA) platform. We conducted a runtime comparison of the proposed algorithms between a GPU implementation and a CPU implementation. The results showed significant performance gains of the proposed CUDA algorithms over multithreaded CPUs.
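
A hedged, CPU-only sketch of a bootstrap-type information criterion for choosing the number of mixture components, in which the optimism of the maximized log-likelihood is estimated from bootstrap refits; the maximum a posteriori estimation and the CUDA/GPU parallelism of the paper are not reproduced, and the data, model and number of replicates are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two-component toy data in two dimensions.
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(3, 1, (150, 2))])

def bootstrap_ic(X, n_components, n_boot=20):
    full = GaussianMixture(n_components, random_state=0).fit(X)
    loglik = full.score(X) * len(X)                 # total log-likelihood on the data
    bias = []
    for _ in range(n_boot):
        Xb = X[rng.integers(0, len(X), len(X))]     # bootstrap resample
        gb = GaussianMixture(n_components, random_state=0).fit(Xb)
        # Optimism: fit-sample log-likelihood minus original-sample log-likelihood.
        bias.append(gb.score(Xb) * len(Xb) - gb.score(X) * len(X))
    return -2 * loglik + 2 * np.mean(bias)          # bootstrap-corrected criterion

print({k: round(bootstrap_ic(X, k), 1) for k in (1, 2, 3)})
```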

19.
The purpose of this paper is threefold. First, we obtain the asymptotic properties of the modified model selection criteria proposed by Hurvich et al. (1990, Improved estimators of Kullback–Leibler information for autoregressive model selection in small samples, Biometrika 77, 709–719) for autoregressive models. Second, we provide some insight into the better performance of these modified criteria. Third, we extend the modification introduced by these authors to model selection criteria commonly used in the class of self-exciting threshold autoregressive (SETAR) time series models. We show the improvements of the modified criteria in their finite sample performance. In particular, for small and medium sample sizes the frequency of selecting the true model improves for the consistent criteria, and the root mean square error (RMSE) of prediction improves for the efficient criteria. These results are illustrated via simulation with SETAR models in which we assume that the threshold and the parameters are unknown.
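
The commonly quoted general form of the small-sample correction that this line of work builds on is

$$\mathrm{AIC}_c \;=\; -2\log\hat L + 2k + \frac{2k(k+1)}{n-k-1},$$

where $k$ is the number of estimated parameters and $n$ the sample size; the autoregressive and SETAR versions studied in the paper are stated in terms of the innovation variance and the lag/regime structure, so this is a sketch of the type of correction only.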

20.
We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data, as well as after deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. The populations compared were: (1) multivariate normal with covariance matrices Σ1 = Σ2, (2) multivariate normal with Σ1 ≠ Σ2, and (3) multivariate non-normal with Σ1 = Σ2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly, but all were better than non-replacement.
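
A hedged sklearn sketch of the kind of comparison described above: linear discriminant analysis after replacing missing entries by variable means and by an EM-like iterative imputation (the paper's principal-component projection method is omitted). The simulated data, missingness rate and in-sample accuracy measure are illustrative stand-ins.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Two multivariate-normal groups with equal covariance matrices.
X = np.vstack([rng.multivariate_normal([0, 0, 0], np.eye(3), 200),
               rng.multivariate_normal([1, 1, 1], np.eye(3), 200)])
y = np.repeat([0, 1], 200)
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.2] = np.nan      # delete about 20% of entries at random

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative (EM-like)", IterativeImputer(random_state=0))]:
    X_imp = imputer.fit_transform(X_miss)
    acc = LinearDiscriminantAnalysis().fit(X_imp, y).score(X_imp, y)
    print(name, round(acc, 3))                   # in-sample probability of correct classification
```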
