In this paper, we propose a hybrid method to estimate the baseline hazard for the Cox proportional hazards model. In the proposed method, the nonparametric Kaplan-Meier estimate of the survival function and the parametric estimate of the logistic function in the Cox proportional hazards model, obtained by the partial likelihood method, are combined to estimate a parametric baseline hazard function. We compare the baseline hazard estimated by the proposed method with that estimated by the Cox model. The performance of each method is measured by the estimated parameters of the baseline distribution as well as by the goodness of fit of the model. We use real data as well as Monte Carlo simulation studies to compare the performance of both methods. The results show that the proposed hybrid method provides a better estimate of the baseline hazard than the Cox model.
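
The abstract does not spell out how the two estimates are combined, so the following is only a minimal sketch of the nonparametric ingredient, the Kaplan-Meier product-limit estimate of the survival function. The function name `kaplan_meier` and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : observed times (event or censoring time) for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)       # everyone who leaves the risk set at time t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]      # events and censorings both leave the risk set
    return curve


# Toy data (made up): the subject at time 2 is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Under the Cox model the survival function for covariates x is S0(t) raised to the power exp(x'β), which is presumably the link a hybrid method would exploit; that step is not shown here because the abstract gives no details.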

In this study, we evaluate several forms of both Akaike-type and Information Complexity (ICOMP)-type information criteria, in the context of selecting an optimal subset least squares ratio (LSR) regression model. Our simulation studies are designed to mimic many characteristics present in real data – heavy tails, multicollinearity, redundant variables, and completely unnecessary variables. Our findings are that LSR in conjunction with one of the ICOMP criteria is very good at selecting the true model. Finally, we apply these methods to the familiar body fat data set.

Let $\pi_1, \ldots, \pi_k$ denote $k$ ($\geq 2$) populations with unknown means $\mu_1, \ldots, \mu_k$ and variances $\sigma_1^2, \ldots, \sigma_k^2$, respectively, and let $\pi_0$ denote the control population having mean $\mu_0$ and variance $\sigma_0^2$. It is assumed that these populations are normally distributed with correlation matrix $\{\rho_{ij}\}$. The goal is to select a subset of the populations $\pi_1, \ldots, \pi_k$ that contains all the populations with means larger than or equal to the mean of the control. Procedures are given for selecting such a subset so that the probability that all the populations with means larger than or equal to the mean of the control are included in the selected subset is at least equal to a predetermined value $P^*$ ($1/k < P^* < 1$). The goal treated here is a first-step screening procedure that allows the experimenter to choose a subset and withhold judgement about which population has the largest mean. Then, if the population with the largest mean is desired, it can be chosen from the selected subset on the basis of cost and other considerations. Percentage points are also included.

The max $\chi^2$ technique for estimating the order of autoregressive processes (McClave (1976)) is extended to moving average models. The autoregressive-moving average duality is exploited by using the inverse autocorrelation function and the subset autoregression algorithm. The technique is demonstrated via simulations, and is applied to Box and Jenkins (1970) Series A.

A new method for detecting parameter changes in the generalized autoregressive conditional heteroskedasticity GARCH(1,1) model is proposed. In the proposed method, the time series observations are divided into several segments and a GARCH(1,1) model is fitted to each segment. The goodness of fit of the global model composed of these local GARCH(1,1) models is evaluated using the corresponding information criterion (IC). The division that minimizes the IC defines the best model. Furthermore, since the simultaneous estimation of all possible models requires huge computational time, a new time-saving algorithm is proposed. Simulation results and empirical results both indicate that the proposed method is useful in analysing financial data.
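
The idea of choosing the division that minimizes an information criterion can be sketched as follows. This is a deliberately simplified illustration, not the authors' algorithm: each segment is modelled as zero-mean Gaussian noise with its own variance (a stand-in for a fitted GARCH(1,1) model), only a single change point is searched, and AIC is assumed as the criterion. The names `gaussian_aic`, `best_split` and `min_len` are hypothetical.

```python
import math


def gaussian_aic(segment):
    """AIC of a zero-mean Gaussian whose variance is fitted by maximum likelihood.

    Stand-in for the IC of a local GARCH(1,1) fit (assumption for illustration).
    """
    n = len(segment)
    var = max(sum(x * x for x in segment) / n, 1e-12)  # guard against log(0)
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return 2 * 1 - 2 * loglik  # one free parameter: the variance


def best_split(series, min_len=5):
    """Return (total_ic, split_index); split_index is None if no split helps."""
    best = (gaussian_aic(series), None)
    for k in range(min_len, len(series) - min_len + 1):
        # IC of the global model is the sum of the local ICs.
        ic = gaussian_aic(series[:k]) + gaussian_aic(series[k:])
        if ic < best[0]:
            best = (ic, k)
    return best


# Made-up series whose volatility jumps at index 20.
series = [0.1, -0.1] * 10 + [5.0, -5.0] * 10
print(best_split(series))
```

The exhaustive loop already hints at why a time-saving algorithm is needed: with several change points and a numerically fitted GARCH model per segment, the number of candidate divisions grows quickly.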

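In joint generalized linear models, both the dispersion parameter and the mean are given the structure of a generalized linear model. This paper considers variable selection for the mean component of joint generalized linear models when only the first and second moments of the distribution are specified. Using the generalized quasi-likelihood function, we propose a new model selection criterion (EAIC), which is a generalization of the Akaike information criterion. A simulation study verifies the performance of the criterion.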

Biclustering is the simultaneous clustering of two related dimensions, for example, of individuals and features, or genes and experimental conditions. Very few statistical models for biclustering have been proposed in the literature. Instead, most of the research has focused on algorithms to find biclusters. The models underlying them have not received much attention. Hence, very little is known about the adequacy and limitations of the models and the efficiency of the algorithms. In this work, we shed light on associated statistical models behind the algorithms. This allows us to generalize most of the known popular biclustering techniques, and to justify, and many times improve on, the algorithms used to find the biclusters. It turns out that most of the known techniques have a hidden Bayesian flavor. Therefore, we adopt a Bayesian framework to model biclustering. We propose a measure of biclustering complexity (number of biclusters and overlapping) through a penalized plaid model, and present a suitable version of the deviance information criterion to choose the number of biclusters, a problem that has not been adequately addressed yet. Our ideas are motivated by the analysis of gene expression data.

This paper presents an extension of mean-squared forecast error (MSFE) model averaging for integrating linear regression models computed on data frames of various lengths. The proposed method is considered to be a preferable alternative to best model selection by various efficiency criteria such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), F-statistics and mean-squared error (MSE), as well as to Bayesian model averaging (BMA) and the naïve simple forecast average. The method is developed to deal with possibly non-nested models having different numbers of observations, and it selects forecast weights by minimizing the unbiased estimator of the MSFE. The proposed method also yields forecast confidence intervals with a given significance level, which is not possible with other model averaging methods. In addition, out-of-sample simulation and empirical testing demonstrate the efficiency of this kind of averaging when forecasting economic processes.

As new technologies permit the generation of hitherto unprecedented volumes of data (e.g. genome-wide association study data), researchers struggle to keep up with the added complexity and time commitment required for its analysis. For this reason, model selection commonly relies on machine learning and data-reduction techniques, which tend to afford models with obscure interpretations. Even in cases with straightforward explanatory variables, the so-called ‘best’ model produced by a given model-selection technique may fail to capture information of vital importance to the domain-specific questions at hand. Herein we propose a new concept for model selection, feasibility, for use in identifying multiple models that are in some sense optimal and may unite to provide a wider range of information relevant to the topic of interest, including (but not limited to) interaction terms. We further provide an R package and associated Shiny Applications for use in identifying or validating feasible models, the performance of which we demonstrate on both simulated and real-life data.

The problem of selecting s out of k given components such that the selection contains at least c of the t best ones is considered. In the case of underlying distribution families with a location or scale parameter, it is shown that the indifference zone approach can be strengthened to confidence statements for the parameters of the selected components. These confidence statements are valid over the entire parameter space without decreasing the infimum of the probability of a correct selection.

This paper deals with the application of model selection criteria to data generated by ARMA processes. The recently introduced modified divergence information criterion is used and compared with traditional selection criteria like the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). The appropriateness of the selected model is tested for one- and five-step-ahead predictions with the use of the normalized mean squared forecast errors (NMSFE).
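
For orientation, the two traditional criteria named above can be written in their textbook forms, AIC = 2k - 2 log L and SIC = k log n - 2 log L, where k is the number of estimated parameters and n the sample size. The sketch below is only illustrative: the candidate names, log-likelihood values and parameter counts are made-up numbers, the helper `select` is hypothetical, and the modified divergence information criterion is not implemented because the abstract does not define it.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik


def sic(loglik, k, n):
    """Schwarz information criterion: k log n - 2 log L."""
    return k * math.log(n) - 2 * loglik


def select(candidates, n):
    """candidates maps a model name to (maximized log-likelihood, parameter count).

    Returns the names of the models minimizing AIC and SIC, in that order.
    """
    by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
    by_sic = min(candidates, key=lambda m: sic(*candidates[m], n))
    return by_aic, by_sic


# Made-up fits for n = 200 observations (illustration only).
fits = {
    "ARMA(1,0)": (-290.0, 2),
    "ARMA(1,1)": (-287.5, 3),
    "ARMA(2,2)": (-286.9, 5),
}
print(select(fits, n=200))
```

With these numbers the two criteria disagree: SIC penalizes each extra parameter by log(200), roughly 5.3, instead of 2, so it prefers the smaller model.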

Summary. We consider the problem of identifying the genetic loci (called quantitative trait loci (QTLs)) contributing to variation in a quantitative trait, with data on an experimental cross. A large number of different statistical approaches to this problem have been described; most make use of multiple tests of hypotheses, and many consider models allowing only a single QTL. We feel that the problem is best viewed as one of model selection. We discuss the use of model selection ideas to identify QTLs in experimental crosses. We focus on a back-cross experiment, with strictly additive QTLs, and concentrate on identifying QTLs, considering the estimation of their effects and precise locations of secondary importance. We present the results of a simulation study to compare the performances of the more prominent methods.

Motivated by the papers of Woodward and Gray (1979) and Gray, Kelly and McIntire (1978) on the R and S array approach to ARMA modeling, the authors show that the R and S array algorithm is completely equivalent to Levinson recursion. Since entries in the R and S array can be computed by either algorithm, the equivalence provides greater insight into the R and S methodology as well as its links to Akaike's AIC or FPE. Numerical simulations serve to highlight the differences between the various approaches as well as illustrate the problems associated with exact methods. The R and S array approach is shown to be an effective procedure for determining ARMA model orders.

We generalize the Gaussian mixture transition distribution (GMTD) model introduced by Le and co-workers to the mixture autoregressive (MAR) model for the modelling of non-linear time series. The models consist of a mixture of K stationary or non-stationary AR components. The advantages of the MAR model over the GMTD model include a fuller range of shape-changing predictive distributions and the ability to handle cycles and conditional heteroscedasticity in the time series. The stationarity conditions and autocorrelation function are derived. The estimation is easily done via a simple EM algorithm and the model selection problem is addressed. The shape-changing feature of the conditional distributions makes these models capable of modelling time series with multimodal conditional distributions and with heteroscedasticity. The models are applied to two real data sets and compared with other competing models. The MAR models appear to capture features of the data better than other competing models do.
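
To make the "shape-changing" conditional distribution concrete, the sketch below evaluates the one-step conditional density of a mixture of K autoregressive components. It assumes Gaussian components with fixed mixing weights, which is the usual reading of a model generalizing the Gaussian mixture transition distribution; the function and parameter names and the numerical values are illustrative assumptions, not taken from the paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def mar_conditional_density(y, past, weights, intercepts, ar_coefs, sds):
    """Conditional density of y_t given past values, most recent first.

    past[0] is y_{t-1}, past[1] is y_{t-2}, and so on. Component k has
    mixing weight weights[k], intercept intercepts[k], AR coefficients
    ar_coefs[k] and innovation standard deviation sds[k].
    """
    density = 0.0
    for w, c, phi, sd in zip(weights, intercepts, ar_coefs, sds):
        # Each component predicts y_t from its own AR regression on the past.
        mean = c + sum(p * v for p, v in zip(phi, past))
        density += w * normal_pdf(y, mean, sd)
    return density


# Two AR(1) components with opposite signs (made-up parameters).
params = dict(weights=[0.5, 0.5], intercepts=[0.0, 0.0],
              ar_coefs=[[0.9], [-0.9]], sds=[0.5, 0.5])
for y in (-1.8, 0.0, 1.8):
    print(y, mar_conditional_density(y, [2.0], **params))
```

With y_{t-1} = 2 the component means sit at 1.8 and -1.8, so the conditional density is bimodal, while with y_{t-1} = 0 both means coincide and the density is unimodal. The conditional spread therefore depends on the past, which is how such a mixture can express conditional heteroscedasticity.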

For regression problems with grouped covariates, we adapt the idea of the sparse group lasso (SGL) (J. Friedman, T. Hastie, and R. Tibshirani, A note on the group lasso and a sparse group lasso, Tech. Rep., Statistics Department, Stanford University, 2010) to the framework of sufficient dimension reduction. Assuming that the regression has a single-index structure, we propose a method called sparse group sufficient dimension reduction that conducts group and within-group variable selection simultaneously without assuming a specific link function. Simulation studies show that our method is comparable to the SGL under the regular linear model setting and outperforms the SGL, with higher true positive rates and substantially lower false positive rates, when the regression function is nonlinear. One immediate application of our method is gene pathway data analysis, where genes naturally fall into groups (pathways). An analysis of a glioblastoma microarray data set is included to illustrate our method.

We investigate the exact coverage and expected length properties of the model averaged tail area (MATA) confidence interval proposed by Turek and Fletcher, CSDA, 2012, in the context of two nested, normal linear regression models. The simpler model is obtained by applying a single linear constraint on the regression parameter vector of the full model. For given length of response vector and nominal coverage of the MATA confidence interval, we consider all possible models of this type and all possible true parameter values, together with a wide class of design matrices and parameters of interest. Our results show that, while not ideal, MATA confidence intervals perform surprisingly well in our regression scenario, provided that we use the minimum weight within the class of weights that we consider on the simpler model.

To measure the distance between a robust function evaluated under the true regression model and under a fitted model, we propose generalized Kullback–Leibler information. Using this generalization we have developed three robust model selection criteria, AICR*, AICCR* and AICCR, that allow the selection of candidate models that not only fit the majority of the data but also take into account non-normally distributed errors. The AICR* and AICCR criteria can unify most existing Akaike information criteria; three examples of such unification are given. Simulation studies are presented to illustrate the relative performance of each criterion.

Growth curve models are used to analyze repeated measures data (longitudinal data), which are functions of time. In this paper, some necessary and sufficient conditions for the linear function $B_1 Y B_2$ to be the best linear unbiased estimator (BLUE) of the estimable functions $X_1 \Theta X_2$ (or $K_1 \Theta K_2$) under the general growth curve model are established. In addition, representations of BLUE($K_1 \Theta K_2$) (or BLUE($X_1 \Theta X_2$)) are derived for the case in which the conditions are satisfied. Finally, two special notions of linear sufficiency with respect to the general growth curve model are given. The findings of this paper enrich some known results in the literature.

