Similar articles
 20 similar articles found
1.
The conceptual predictive statistic, Cp, is a widely used criterion for model selection in linear regression. Cp serves as an estimator of a discrepancy, a measure that reflects the disparity between the generating model and a fitted candidate model. This discrepancy, based on scaled squared error loss, is asymmetric: an alternate measure is obtained by reversing the roles of the two models in the definition of the measure. We propose a variant of the Cp statistic based on estimating a symmetrized version of the discrepancy targeted by Cp. We claim that the resulting criterion provides better protection against overfitting than Cp, since the symmetric discrepancy is more sensitive towards detecting overspecification than its asymmetric counterpart. We illustrate our claim by presenting simulation results. Finally, we demonstrate the practical utility of the new criterion by discussing a modeling application based on data collected in a cardiac rehabilitation program at University of Iowa Hospitals and Clinics.
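For orientation, the classical Cp statistic that this abstract starts from can be sketched as below. This is a minimal illustration of the standard Mallows form, Cp = RSS_p / s^2 - n + 2p, with s^2 estimated from the largest candidate model; it is not the symmetrized variant the authors propose, whose formula the abstract does not give. The function name `mallows_cp` and the simulated data are assumptions made for the example.

```python
import numpy as np

def mallows_cp(X_full, X_sub, y):
    """Classical Mallows' Cp for a candidate design X_sub.

    sigma^2 is estimated from the full design X_full (an assumption of
    this sketch; both designs are expected to have full column rank).
    """
    n = len(y)

    def rss(X):
        # Residual sum of squares of the least-squares fit of y on X.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    p_full = X_full.shape[1]
    p_sub = X_sub.shape[1]
    sigma2_hat = rss(X_full) / (n - p_full)
    return rss(X_sub) / sigma2_hat - n + 2 * p_sub

# Hypothetical simulated data: intercept plus three predictors,
# of which only the first predictor enters the generating model.
rng = np.random.default_rng(0)
n = 50
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X_full[:, :2] @ np.array([1.0, 2.0]) + rng.normal(size=n)

cp_full = mallows_cp(X_full, X_full, y)        # full model
cp_sub = mallows_cp(X_full, X_full[:, :2], y)  # intercept + first predictor
```

By construction, Cp for the full model equals its number of coefficients, so candidate models are usually judged by how close Cp is to p and how small it is.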

2.
We address the issue of order identification for hidden Markov models with Poisson and Gaussian emissions. We prove information-theoretic BIC-like mixture inequalities in the spirit of Finesso [1991. Consistent estimation of the order for Markov and hidden Markov chains. Ph.D. Thesis, University of Maryland]; Liu and Narayan [1994. Order estimation and sequential universal data compression of a hidden Markov source by the method of mixtures. Canad. J. Statist. 30(4), 573–589]; Gassiat and Boucheron [2003. Optimal error exponents in hidden Markov models order estimation. IEEE Trans. Inform. Theory 49(4), 964–980]. These inequalities lead to consistent penalized estimators that need no prior bound on the order. A simulation study and an application to postural analysis in humans are provided.

3.
In likelihood analysis of categorized data, it is well known that within a restricted class of log-linear models the likelihood kernels for multinomial and product multinomial sampling distributions are identical. In practical terms the estimation procedure for one is appropriate for the other. There does not appear to be a widespread realization that a similar result holds for a wide class of the Grizzle, Starmer, and Koch (1969) weighted least squares techniques. In this report such a fundamental relationship is explicitly presented and illustrated through two analyses of Bartlett's (1935) data.

4.
Linear discriminant analysis and quadratic discriminant analysis are used to predict group membership. Rare populations present situations in which group sizes differ drastically. This article examined k = 2 and k = 4 predictor variables for groups with different levels of rarity and different levels of sensitivity and specificity. Sample size recommendations were generated for both minimum and maximum group overlap using the leave-one-out (L-O-O) method of estimation. Minimum sample size recommendations are provided in tables for immediate implementation by applied researchers.

5.
Multilevel models have been widely applied to analyze data sets which present some hierarchical structure. In this paper we propose a generalization of the normal multilevel models, named elliptical multilevel models. This proposal suggests the use of distributions in the elliptical class, thus involving all symmetric continuous distributions, including the normal distribution as a particular case. Elliptical distributions may have lighter or heavier tails than the normal ones. In the case of normal error models with the presence of outlying observations, heavy-tailed error models may be applied to accommodate such observations. In particular, we discuss some aspects of the elliptical multilevel models, such as maximum likelihood estimation and residual analysis to assess features related to the fitting and the model assumptions. Finally, two motivating examples analyzed under normal multilevel models are reanalyzed under Student-t and power exponential multilevel models. Comparisons with the normal multilevel model are performed by using residual analysis.

6.
Summary.  Model selection for marginal regression analysis of longitudinal data is challenging owing to the presence of correlation and the difficulty of specifying the full likelihood, particularly for correlated categorical data. The paper introduces a novel Bayesian information criterion type model selection procedure based on the quadratic inference function, which does not require the full likelihood or quasi-likelihood. With probability approaching 1, the criterion selects the most parsimonious correct model. Although a working correlation matrix is assumed, there is no need to estimate the nuisance parameters in the working correlation matrix; moreover, the model selection procedure is robust against the misspecification of the working correlation matrix. The criterion proposed can also be used to construct a data-driven Neyman smooth test for checking the goodness of fit of a postulated model. This test is especially useful and often yields much higher power in situations where the classical directional test behaves poorly. The finite sample performance of the model selection and model checking procedures is demonstrated through Monte Carlo studies and analysis of a clinical trial data set.

7.
In the conventional hypothesis-testing approach to the detection of a unit root and a trend break, selections of the outlier type (additive or innovational) and of the break type (jump or kink) are carried out arbitrarily, because there is no generally accepted statistical technique. To overcome this problem, a model-selection approach using the modified Bayesian information criterion (MBIC) is proposed. Whether the observed time series contains a unit root and a trend break is determined as a result of model selection from among alternative models with and without unit root and trend break. The efficacy of the proposed approach is verified using comprehensive simulations.

8.
9.
In this article, we develop a robust variable selection procedure jointly for fixed and random effects in linear mixed models for longitudinal data. We propose a penalized robust estimator for both the regression coefficients and the variance of random effects based on a re-parametrization of the linear mixed models. Under some regularity conditions, we show the oracle properties of the proposed robust variable selection method. A simulation study shows the robustness of the proposed method against outliers. Finally, the proposed method is illustrated in the analysis of a real data set.

10.
Abstract

Sample size calculation is an important component in designing an experiment or a survey. In a wide variety of fields (including management science, insurance, and biological and medical science), truncated normal distributions are encountered in many applications. However, the sample size required for the left-truncated normal distribution has not been investigated, because the distribution of the sample mean from the left-truncated normal distribution is complex and difficult to obtain. This paper compares an ad hoc approach with two newly proposed methods, one based on the Central Limit Theorem and one on a high degree saddlepoint approximation, for calculating the sample size required to achieve a prespecified power. As shown by simulations and an example of health insurance cost in China, the ad hoc approach underestimates the sample size required to achieve the prespecified power. The method based on the high degree saddlepoint approximation provides valid sample size and power calculations, and it performs better than the method based on the Central Limit Theorem. When the sample size is not too small, the Central Limit Theorem also provides a valid and relatively simple tool for approximating that sample size.

11.
Model choice is one of the most crucial aspects of any statistical data analysis. It is well known that most models are just an approximation to the true data-generating process, but among such model approximations, it is our goal to select the ‘best’ one. Researchers typically consider a finite number of plausible models in statistical applications, and the related statistical inference depends on the chosen model. Hence, model comparison is required to identify the ‘best’ model among several such candidate models. This article considers the problem of model selection for spatial data. The issue of model selection for spatial models has been addressed in the literature by the use of traditional information criteria-based methods, even though such criteria have been developed based on the assumption of independent observations. We evaluate the performance of some of the popular model selection criteria via Monte Carlo simulation experiments using small to moderate samples. In particular, we compare the performance of some of the most popular information criteria, such as the Akaike information criterion (AIC), the Bayesian information criterion, and the corrected AIC, in selecting the true model. The ability of these criteria to select the correct model is evaluated under several scenarios. This comparison is made using various spatial covariance models ranging from stationary isotropic to nonstationary models.

12.
In Wu and Zen (1999), a linear model selection procedure based on M-estimation is proposed, which includes many classical model selection criteria as special cases, and it is shown that the selection procedure is strongly consistent for a variety of penalty functions. In this paper, we investigate its small-sample performance for some choices of fixed penalty functions. It can be seen that the performance varies with the choice of the penalty. Hence, a randomized penalty based on observed data is proposed, which preserves the consistency property and provides improved performance over a fixed choice of penalty functions.

13.
It is shown that dropping quantitative variables from a linear regression, based on t-statistics, is mathematically equivalent to dropping variables based on commonly used information criteria.
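One way to see an equivalence of this kind is the following worked derivation. It is a sketch under stated assumptions, not necessarily the paper's own argument: a Gaussian linear model with n observations, k coefficients in the larger model, a single variable with t-statistic t being considered for removal, and AIC and BIC written in their residual-sum-of-squares forms. The abstract does not say which criteria or forms the authors use.

```latex
% Dropping one variable inflates the residual sum of squares by a factor
% that depends only on its t-statistic in the larger model:
\[
  \mathrm{RSS}_{0} = \mathrm{RSS}_{1}\left(1 + \frac{t^{2}}{n-k}\right),
\]
% where RSS_1 is from the larger model (k coefficients) and RSS_0 from
% the model without the variable (k-1 coefficients).
% With AIC = n \log(\mathrm{RSS}/n) + 2(\text{number of coefficients}):
\[
  \mathrm{AIC}_{0} - \mathrm{AIC}_{1}
  = n \log\!\left(1 + \frac{t^{2}}{n-k}\right) - 2 ,
\]
% so AIC prefers the smaller model exactly when
\[
  t^{2} < (n-k)\left(e^{2/n} - 1\right).
\]
% Replacing the penalty 2 by \log n gives the corresponding BIC rule:
\[
  t^{2} < (n-k)\left(n^{1/n} - 1\right).
\]
```

For large n the AIC threshold approaches t^2 < 2, that is |t| below roughly 1.41, while the BIC threshold grows like log n.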

14.
In this paper, we investigate a mixture problem with two responses, which are functions of the mixing proportions and are correlated with a known dispersion matrix. We obtain D- and A-optimal designs for estimating the parameters of the response functions, when none or some of the regression coefficients of the two functions are the same. It is shown that when no prior knowledge about the regression coefficients is available, the D-optimal design is independent of the dispersion matrix, while the A-optimal design depends on it, provided the response functions are of different degree. On the other hand, when some of the regression coefficients are known to be the same for both functions, the D-optimal design depends on the dispersion matrix when the two response functions are not of the same degree.

15.
Selecting an appropriate structure for a linear mixed model is an important problem in a number of applications, such as the modelling of longitudinal or clustered data. In this paper, we propose a variable selection procedure for simultaneously selecting and estimating the fixed and random effects. More specifically, a profile log-likelihood function, along with an adaptive penalty, is utilized for sparse selection. The Newton-Raphson optimization algorithm is used to complete the parameter estimation. By jointly selecting the fixed and random effects, the proposed approach increases selection accuracy compared with two-stage procedures, and the use of the profile log-likelihood can improve computational efficiency in one-stage procedures. We prove that the proposed procedure enjoys model selection consistency. A simulation study and a real data application are conducted to demonstrate the effectiveness of the proposed method.

16.
The present article deals with the problem of misspecifying the disturbance-covariance matrix as scalar when it is locally nonscalar. We consider a family of shrinkage estimators based on the OLS estimator and compare its asymptotic properties with those of the OLS estimator. We propose a similar family of estimators based on FGLS and compare its asymptotic properties with those of the shrinkage estimator based on OLS under a Pitman drift process. The effect of misspecifying the disturbance-covariance matrix is analyzed with the help of a numerical simulation.

17.
ABSTRACT

I use longitudinal survey data from commercial fishing deckhands in the Alaskan Bering Sea to provide new insights on empirical methods commonly used to estimate compensating wage differentials and the value of statistical life (VSL). The unique setting exploits intertemporal variation in fatality rates and wages within worker-vessel pairs caused by a combination of weather patterns and policy changes, allowing identification of parameters and biases that it has only been possible to speculate about in more general settings. I show that estimation strategies common in the literature produce biased estimates in this setting, and decompose the bias components due to latent worker, establishment, and job-match heterogeneity. The estimates also remove the confounding effects of endogenous job mobility and dynamic labor market search, narrowing a conceptual gap between search-based hedonic wage theory and its empirical applications. I find that workers’ marginal aversion to fatal risk falls as risk levels rise, which suggests complementarities in the benefits of public safety policies. Supplementary materials for this article are available online.

18.
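A study of sample size for customer satisfaction models
Liang Yan (梁燕), Jin Yongjin (金勇进). 《统计研究》 (Statistical Research), 2007, 24(7): 68-74
Following a brief discussion of the customer satisfaction model and its estimation method, PLS (Partial Least Squares), this paper studies in detail the sample size required for PLS estimation of customer satisfaction models. Using data from an actual customer satisfaction study in China, it offers recommendations on sample size requirements for such models, providing practical guidance for customer satisfaction work.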

19.
Sparsity-inducing penalties are useful tools for variable selection and are also effective for regression problems where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in multiclass logistic regression models for functional data, using sparse regularization. The parameters of the functional logistic regression model are estimated in the framework of the penalized likelihood method with the sparse group lasso-type penalty, and then tuning parameters for the model are selected using the model selection criterion. The effectiveness of the proposed method is investigated through simulation studies and the analysis of a gene expression data set.

20.
This article deals with semisupervised learning based on the naive Bayes assumption. A univariate Gaussian mixture density is used for continuous input variables, whereas a histogram-type density is adopted for discrete input variables. The EM algorithm is used to compute maximum likelihood estimators of the model parameters when the number of mixing components is fixed for each continuous input variable. We carry out model selection to choose a parsimonious model among the various fitted models based on an information criterion. A common density method is proposed for the selection of significant input variables. Simulated and real datasets are used to illustrate the performance of the proposed method.
