Similar articles
Found 20 similar articles (search time: 15 ms)
1.
ABSTRACT

Inflated data are prevalent in many situations, and a variety of inflated models with extensions have been derived to fit data with excessive counts of some particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models, including Poisson regression and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data and the performance of the IC was compared. The effects of sample size and of the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when the sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample sizes.
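
As a rough illustration of how the three criteria named in this abstract can disagree, the sketch below computes AIC, BIC and CAIC from a maximised log-likelihood using their standard definitions. The function name and the log-likelihood values, parameter counts and sample size are hypothetical and are not taken from the study.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC and CAIC for a fitted model (smaller is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
    }


# Hypothetical fits on n = 100 observations (illustrative numbers only).
simple_fit = information_criteria(log_likelihood=-105.0, n_params=2, n_obs=100)
complex_fit = information_criteria(log_likelihood=-101.5, n_params=4, n_obs=100)

for name in ("AIC", "BIC", "CAIC"):
    choice = "simple" if simple_fit[name] < complex_fit[name] else "complex"
    print(name, round(simple_fit[name], 2), round(complex_fit[name], 2), choice)
```

With these made-up numbers the AIC prefers the larger model while the BIC and CAIC prefer the smaller one, because their penalty per parameter grows with log(n). This is only meant to show the mechanism behind the over-fitting and under-parameterizing tendencies the abstract describes.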

2.
The QR-factorization provides a set of orthogonal variables which has advantages over other orthogonal representations, such as principal components and the singular-value decomposition, in selecting subsets of regression variables by least squares methods. Stopping rules, in particular, are easily understood. A new stopping rule is derived for prediction. This is derived by approximately minimizing the mean squared error in estimating the squared error of prediction. A clear distinction is made between the kind of stopping rule which is relevant when the objective is prediction, and when the objective is asymptotic consistency. Progress with reducing the bias due to the model selection procedure is briefly summarized.
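
One reason orthogonal variables make stopping rules easy to read is that each added variable lowers the residual sum of squares by exactly the squared projection of the response onto the new orthogonal direction. The sketch below shows this with a plain modified Gram-Schmidt orthogonalisation, assuming the columns are linearly independent. It is not the stopping rule derived in the paper, and the function names and data are hypothetical.

```python
import math


def orthogonal_projections(columns, y):
    """Orthonormalise the columns in order (modified Gram-Schmidt) and
    return the projection of y onto each resulting direction."""
    basis = []
    projections = []
    for col in columns:
        v = list(col)
        for q in basis:
            coef = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - coef * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))  # assumes norm > 0
        q = [vi / norm for vi in v]
        basis.append(q)
        projections.append(sum(qi * yi for qi, yi in zip(q, y)))
    return projections


def rss_path(columns, y):
    """Residual sum of squares after including 0, 1, 2, ... columns."""
    rss = sum(yi * yi for yi in y)
    path = [rss]
    for t in orthogonal_projections(columns, y):
        rss -= t * t
        path.append(rss)
    return path


# Hypothetical data: an intercept column and one predictor, y = 1 + 2x.
columns = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(rss_path(columns, y))
```

A stopping rule can then be phrased as a threshold on the successive squared projections, since each one is the exact reduction in residual sum of squares from adding that variable in the given order.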

3.
Latent class analysis (LCA) has been found to have important applications in social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting data. In this study, the non-response mainly included ‘contingency questions’ and real ‘missing data’. The primary objective of this study was to evaluate the effects of some potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size and the number of items are also significant. From our simulation results, the impact of missing data and contingency questions can be mitigated by increasing the sample size or the number of items.

4.
This paper proposes the second-order least squares estimation, which is an extension of the ordinary least squares method, for censored regression models where the error term has a general parametric distribution (not necessarily normal). The strong consistency and asymptotic normality of the estimator are derived under fairly general regularity conditions. We also propose a computationally simpler estimator which is consistent and asymptotically normal under the same regularity conditions. The finite sample behavior of the proposed estimators under both correctly specified and misspecified models is investigated through Monte Carlo simulations. The simulation results show that the proposed estimator using the optimal weighting matrix performs very similarly to the maximum likelihood estimator, and that the estimator with the identity weight is more robust against misspecification.

5.
Linear, least squares statistical methods in which the "parameters" are interpreted as random variables were introduced by Whittle, and further developed by Hartigan and others. They are applied here to the problem of estimating the coefficients in an orthogonal expansion of a multivariate density, given a simple random sample.

6.
In this paper we prove the consistency in probability of a class of generalized BIC criteria for model selection in non-linear regression, by using asymptotic results of Gallant. This extends a result obtained by Nishii for model selection in linear regression.

7.
Consider a partially linear regression model with an unknown vector parameter β, an unknown function g(·), and unknown heteroscedastic error variances. In this paper we develop an asymptotic semiparametric generalized least squares estimation theory under some weak moment conditions. These moment conditions are satisfied by many of the error distributions encountered in practice, and our theory does not require the number of replications to go to infinity.

8.
Two new model selection procedures based on a measure of roughness of the residuals in simple regression are proposed and studied. The first criterion utilises a certain loss function and the second comprises the application of hypotheses tests, using the bootstrap methodology. The performances of these selection rules are illustrated and comparisons are made with traditional criteria using real and artificial data, and it is found that the new selection methods perform more satisfactorily.

9.
There is currently much discussion about lasso-type regularized regression, which is a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling, owing to its sparsity, it is sensitive to outliers because it uses penalized least-squares methods. To overcome this issue, we propose a robust lasso-type estimation procedure that uses robust criteria as the loss function, imposing an L1-type penalty called the elastic net. We also propose using efficient bootstrap information criteria for choosing optimal regularization parameters and a constant in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.

10.
The generalized Pareto distribution (GPD) is widely used to model exceedances over thresholds. In this paper, we propose a new method, called weighted nonlinear least squares (WNLS), to estimate the parameters of the three-parameter GPD. Some asymptotic results of the proposed method are provided. An extensive simulation is carried out to evaluate the finite sample behaviour of the proposed method and to compare the behaviour with other methods suggested in the literature. The simulation results show that WNLS outperforms other methods in general situations. Finally, the WNLS is applied to analyse real-life data.

11.
Equivalent conditions are derived for the equality of GLSE (generalized least squares estimator) and partially GLSE (PGLSE), the latter introduced by Amemiya (1983). By adopting a more general approach, the ordinary least squares estimator (OLSE) can be shown to be a special PGLSE. Furthermore, linearly restricted estimators proposed by Balestra (1983) are investigated in this context. To facilitate the comparison of estimators, extensive use of oblique and orthogonal projectors is made.

12.
In a multi-sample simple regression model, generally, homogeneity of the regression slopes leads to improved estimation of the intercepts. Analogous to the preliminary test estimators, (smooth) shrinkage least squares estimators of intercepts based on the James-Stein rule on regression slopes are considered. Relative pictures on the (asymptotic) risk of the classical, preliminary test and the shrinkage least squares estimators are also presented. Neither the preliminary test nor the shrinkage least squares estimators may dominate the other, though each of them fares well relative to the other estimators.

13.
It is well known in the literature on multicollinearity that one of the major consequences of multicollinearity for the ordinary least squares estimator is that the estimator produces large sampling variances, which in turn might inappropriately lead to exclusion of otherwise significant coefficients from the model. To circumvent this problem, two accepted estimation procedures which are often suggested are the restricted least squares method and the ridge regression method. While the former leads to a reduction in the sampling variance of the estimator, the latter ensures a smaller mean square error value for the estimator. In this paper we propose a new estimator which is based on a criterion that combines the ideas underlying these two estimators. The standard properties of this new estimator are studied in the paper. It is also shown that this estimator is superior to both the restricted least squares and the ordinary ridge regression estimators by the criterion of mean square error of the estimator of the regression coefficients when the restrictions are indeed correct. The conditions for superiority of this estimator over the other two are also derived for the situation when the restrictions are not correct.

14.
Summary.  Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data.

15.
The general form of a matrix which appears in the normal equation for estimating parameters in the Gauss-Markoff linear model has been obtained.

16.
This note is concerned with the limiting properties of the least squares estimation for the random coefficient autoregressive model. In contrast with existing results, ours is applicable to a wide range of models under more general assumptions.

17.
This paper investigates estimation of parameters in a combination of the multivariate linear model and growth curve model, called a generalized GMANOVA model. Drawing an analogy between the outer product of data vectors and covariance yields an approach that applies least squares directly to the covariance. An outer product least squares estimator of covariance (COPLS estimator) is obtained and its distribution is presented if a normal assumption is imposed on the error matrix. Based on the COPLS estimator, two-stage generalized least squares estimators of the regression coefficients are derived. In addition, the asymptotic normality of these estimators is investigated. Simulation studies have shown that the COPLS estimator and two-stage GLS estimators are competitive alternatives to the existing ML estimator in finite samples, with greater efficiency in terms of sample mean, standard deviations and mean of the variance estimates. An example of application is also illustrated.

18.
The main focus of our paper is to compare the performance of different model selection criteria used for multivariate reduced rank time series. We consider one of the most commonly used reduced rank models, that is, the reduced rank vector autoregression (RRVAR (p, r)) introduced by Velu et al. [Reduced rank models for multiple time series. Biometrika. 1986;7(31):105–118]. In our study, the most popular model selection criteria are included. The criteria are divided into two groups, that is, simultaneous selection and two-step selection criteria. Methods from the former group select both an autoregressive order p and a rank r simultaneously, while in the case of two-step criteria, first an optimal order p is chosen (using model selection criteria intended for the unrestricted VAR model) and then an optimal rank r of coefficient matrices is selected (e.g. by means of sequential testing). The model selection criteria considered include well-known information criteria (such as the Akaike information criterion, Schwarz criterion, Hannan–Quinn criterion, etc.) as well as widely used sequential tests (e.g. the Bartlett test) and the bootstrap method. An extensive simulation study is carried out in order to investigate the efficiency of all model selection criteria included in our study. The analysis takes into account 34 methods, including 6 simultaneous methods and 28 two-step approaches. In order to carefully analyse how different factors affect the performance of model selection criteria, we consider over 150 simulation settings. In particular, we investigate the influence of the following factors: time series dimension, different covariance structure, different level of correlation among components and different level of noise (variance). Moreover, we analyse the prediction accuracy of the RRVAR model and compare it with results obtained for the unrestricted vector autoregression.
In this paper, we also present a real data application of model selection criteria for the RRVAR model using the Polish macroeconomic time series data observed in the period 1997–2007.

19.
20.
The restrictive properties of compositional data, that is multivariate data with positive parts that carry only relative information in their components, call for special care to be taken while performing standard statistical methods, for example, regression analysis. Among the special methods suitable for handling this problem is the total least squares procedure (TLS, orthogonal regression, regression with errors in variables, calibration problem), performed after an appropriate log-ratio transformation. The difficulty or even impossibility of deeper statistical analysis (confidence regions, hypotheses testing) using the standard TLS techniques can be overcome by calibration solution based on linear regression. This approach can be combined with standard statistical inference, for example, confidence and prediction regions and bounds, hypotheses testing, etc., suitable for interpretation of results. Here, we deal with the simplest TLS problem where we assume a linear relationship between two errorless measurements of the same object (substance, quantity). We propose an iterative algorithm for estimating the calibration line and also give confidence ellipses for the location of unknown errorless results of measurement. Moreover, illustrative examples from the fields of geology, geochemistry and medicine are included. It is shown that the iterative algorithm converges to the same values as those obtained using the standard TLS techniques. Fitted lines and confidence regions are presented for both original and transformed compositional data. The paper contains basic principles of linear models and addresses many related problems.
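
For orientation, the simplest TLS problem mentioned here (fitting a straight line when both coordinates are measured with error of equal variance) has a well-known closed-form solution. The sketch below implements that standard orthogonal regression formula only. It is not the iterative calibration algorithm proposed in the paper, it does not include the log-ratio transformation, and the function name and data are hypothetical.

```python
import math


def orthogonal_line_fit(x, y):
    """Fit y = intercept + slope * x by minimising perpendicular distances.

    Assumes equal error variance in both coordinates and a non-zero
    sample covariance between x and y.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical measurements lying exactly on y = 1 + 2x.
slope, intercept = orthogonal_line_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

Unlike ordinary least squares, this fit treats the two variables symmetrically: swapping x and y gives the reciprocal slope, which is the property that makes TLS natural for comparing two measurements of the same quantity.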
