Similar literature
 Found 20 similar articles; search took 46 ms
1.
In practice, when a principal component analysis is applied to a large number of variables, the resultant principal components may not be easy to interpret, as each principal component is a linear combination of all the original variables. Selection of a subset of variables that contains, in some sense, as much information as possible and enhances the interpretations of the first few covariance principal components is one possible approach to tackle this problem. This paper describes several variable selection criteria and investigates which criteria are best for this purpose. Although some criteria are shown to be better than others, the main message of this study is that it is unwise to rely on only one or two criteria. It is also clear that the interdependence between variables and the choice of how to measure closeness between the original components and those using subsets of variables are both important in determining the best criteria to use.

2.
The goal of this paper is to compare consistent and inconsistent model selection criteria by looking at their convergence rates (to be defined in the first section). The prototypes of the two types of criteria are BIC and AIC, respectively. For linear regression models with normally distributed errors, we show that the convergence rates for AIC and BIC are $O(n^{-1})$ and $O((n \log n)^{-1/2})$, respectively. When the error distributions are unknown, the two criteria become indistinguishable, both having convergence rate $O(n^{-1/2})$. We also argue that BIC has a nearly optimal convergence rate. The results partially justify some of the controversial simulation results in which inconsistent criteria seem to outperform consistent ones.
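As a rough illustration of the two criteria compared in this abstract, the following sketch computes AIC and BIC for an ordinary least squares fit under normal errors. It is not taken from the paper: the function name `gaussian_aic_bic`, the simulated data, and the convention of dropping additive constants shared by both criteria are assumptions made for the example.

```python
import numpy as np

def gaussian_aic_bic(y, X):
    """Return (AIC, BIC) for an OLS fit with normal errors.

    Additive constants common to both criteria are dropped, so only
    differences between candidate models are meaningful.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    fit = n * np.log(rss / n)           # -2 * maximized log-likelihood, up to a constant
    return fit + 2 * k, fit + k * np.log(n)

# Hypothetical data: the true model uses an intercept and one regressor.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([np.ones(n), x, rng.normal(size=n)])  # adds a pure-noise regressor

aic_small, bic_small = gaussian_aic_bic(y, X_small)
aic_big, bic_big = gaussian_aic_bic(y, X_big)
```

The only difference between the two criteria here is the penalty per parameter, 2 for AIC versus log n for BIC, so BIC charges more for the extra noise regressor whenever n exceeds about 7.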

3.
The performance of minimum aberration two-level fractional factorial designs is studied under two criteria of model robustness. Simple sufficient conditions for a design to dominate another design with respect to each of these two criteria are derived. It is also shown that a minimum aberration design of resolution III or higher maximizes the number of two-factor interactions which are not aliases of main effects and, subject to that condition, minimizes the sum of squares of the sizes of alias sets of two-factor interactions. This roughly says that minimum aberration designs tend to make the sizes of the alias sets very uniform. It follows that minimum aberration is a good surrogate for the two criteria of model robustness that are studied here. Examples are given to show that minimum aberration designs are indeed highly efficient.

4.
The paper addresses a formal definition of a confounder based on the qualitative definition that is commonly used in standard epidemiology textbooks. To derive the criterion for a factor to be a confounder given by Miettinen and Cook and to clarify the inconsistency between various criteria for a confounder, we introduce the concepts of an irrelevant factor, an occasional confounder and a uniformly irrelevant factor. We discuss criteria for checking these and show that Miettinen and Cook's criterion can also be applied to occasional confounders. Moreover, we consider situations with multiple potential confounders, and we obtain two necessary conditions that are satisfied by each confounder set. None of the definitions and results presented in this paper require the ignorability and sufficient control confounding assumptions which are commonly employed in observational and epidemiological studies.

5.
In this paper, different dissimilarity measures are investigated to construct maximin designs for compositional data. Specifically, the effect of different dissimilarity measures on the maximin design criterion for two case studies is presented. Design evaluation criteria are proposed to distinguish between the maximin designs generated. An optimization algorithm is also presented. Divergence is found to be the best dissimilarity measure to use in combination with the maximin design criterion for creating space-filling designs for mixture variables.
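The maximin design criterion mentioned in this abstract can be sketched as follows: a design is scored by the smallest dissimilarity between any two of its points, and larger scores indicate better space filling. The sketch assumes plain Euclidean distance as the dissimilarity measure, purely for simplicity; the abstract itself reports that a divergence measure works best, and the function name and the two example designs are hypothetical.

```python
import numpy as np

def min_pairwise_distance(design):
    """Maximin criterion value: smallest Euclidean distance between any two design points."""
    diffs = design[:, None, :] - design[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(design), k=1)   # each unordered pair once, no self-pairs
    return float(dists[upper].min())

# Two hypothetical three-component mixture designs (each row sums to 1).
vertices = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
clustered = np.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3],
                      [0.3, 0.3, 0.4]])

score_vertices = min_pairwise_distance(vertices)
score_clustered = min_pairwise_distance(clustered)
```

Under this stand-in measure the vertex design scores higher than the clustered one, which matches the intuition that a maximin design spreads its points apart.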

6.
One of the main advantages of factorial experiments is the information that they can offer on interactions. When there are many factors to be studied, some or all of this information is often sacrificed to keep the size of an experiment economically feasible. Two strategies for group screening are presented for a large number of factors, over two stages of experimentation, with particular emphasis on the detection of interactions. One approach estimates only main effects at the first stage (classical group screening), whereas the other new method (interaction group screening) estimates both main effects and key two-factor interactions at the first stage. Three criteria are used to guide the choice of screening technique, and also the size of the groups of factors for study in the first-stage experiment. The criteria seek to minimize the expected total number of observations in the experiment, the probability that the size of the experiment exceeds a prespecified target and the proportion of active individual factorial effects which are not detected. To implement these criteria, results are derived on the relationship between the grouped and individual factorial effects, and the probability distributions of the numbers of grouped factors whose main effects or interactions are declared active at the first stage. Examples are used to illustrate the methodology, and some issues and open questions for the practical implementation of the results are discussed.

7.
This paper aims to attain multiple objectives by proposing two compound optimality criteria built on the A-optimality criterion. The proposed compound criteria are ADP-optimality, which seeks an optimal design that minimizes the average variance of the parameter estimates, so that they are efficient, while maximizing the probability of a particular event, and AKL-optimality, which provides a specified balance between model discrimination and minimizing the average variance of the parameter estimates. The equivalence theorems are stated and proved. Finally, a numerical example with probit GLMs illustrates the results for both compound criteria.

8.
As a natural successor of the information criteria AIC and ABIC, information criteria for Bayes models were developed by evaluating the bias of the log-likelihood of the predictive distribution as an estimate of its expected log-likelihood. Considering two specific situations for the true distribution, two information criteria, PIC1 and PIC2, are derived. Linear Gaussian cases are considered in detail, and the evaluation of the maximum a posteriori estimator is also considered. A simple example of estimating the signal-to-noise ratio shows that PIC2 is a good approximation to the expected log-likelihood over the entire range of the signal-to-noise ratio. On the other hand, PIC1 performs well only for smaller values of the variance ratio. For illustration, the problems of trend estimation and seasonal adjustment are considered. Examples show that the hyper-parameters estimated by the new criteria are usually closer to the best ones than those estimated by ABIC.

9.
We consider the problem of choosing the ridge parameter. Two penalized maximum likelihood (PML) criteria based on a distribution-free and a data-dependent penalty function are proposed. These PML criteria can be considered as “continuous” versions of AIC. A systematic simulation is conducted to compare the suggested criteria to several existing methods. The simulation results strongly support the use of our method. The method is also applied to two real data sets.

10.
To measure the distance between a robust function evaluated under the true regression model and under a fitted model, we propose generalized Kullback–Leibler information. Using this generalization we have developed three robust model selection criteria, AICR*, AICCR* and AICCR, that allow the selection of candidate models that not only fit the majority of the data but also take into account non-normally distributed errors. The AICR* and AICCR criteria can unify most existing Akaike information criteria; three examples of such unification are given. Simulation studies are presented to illustrate the relative performance of each criterion.

11.
Since Smith (1918) initiated the theory of optimal design, many optimality criteria have been introduced. Atkinson et al. (2007) used the definition of compound design criteria to combine two optimality criteria and introduced the DT- and CD-optimality criteria. This paper introduces the CDT-optimum design, which provides a specified balance between model discrimination, parameter estimation and estimation of a parametric function such as the area under the curve in models for drug absorbance. An equivalence theorem is presented for the case of two models.

12.
Drug switchability requires evidence of individual bioequivalence, which refers to the comparison of the closeness between the two distributions of the pharmacokinetic (PK) responses from the same subject obtained under repeated administrations of the test and reference formulations. Advantages and drawbacks of the current statistical procedures for assessment of individual bioequivalence are discussed, with emphasis on the aggregate-based criteria. An intersection-union test based on disaggregate criteria is proposed for the evaluation of individual bioequivalence. In addition, a modified aggregate criterion is suggested to overcome the drawbacks suffered by aggregate criteria. The relationships among different criteria are examined, and the performance of the procedures is compared. A numerical example is given to illustrate the proposed procedures.

13.
Model selection criteria are frequently developed by constructing estimators of discrepancy measures that assess the disparity between the 'true' model and a fitted approximating model. The Akaike information criterion (AIC) and its variants result from utilizing Kullback's directed divergence as the targeted discrepancy. The directed divergence is an asymmetric measure of separation between two statistical models, meaning that an alternative directed divergence can be obtained by reversing the roles of the two models in the definition of the measure. The sum of the two directed divergences is Kullback's symmetric divergence. In the framework of linear models, a comparison of the two directed divergences reveals an important distinction between the measures. When used to evaluate fitted approximating models that are improperly specified, the directed divergence which serves as the basis for AIC is more sensitive towards detecting overfitted models, whereas its counterpart is more sensitive towards detecting underfitted models. Since the symmetric divergence combines the information in both measures, it functions as a gauge of model disparity which is arguably more balanced than either of its individual components. With this motivation, the paper proposes a new class of criteria for linear model selection based on targeting the symmetric divergence. The criteria can be regarded as analogues of AIC and two of its variants: 'corrected' AIC or AICc and 'modified' AIC or MAIC. The paper examines the selection tendencies of the new criteria in a simulation study and the results indicate that they perform favourably when compared to their AIC analogues.
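The asymmetry of the directed divergence, and the symmetric divergence formed by summing the two directions, can be seen in a small numerical sketch. The example below assumes two univariate normal models, for which the Kullback-Leibler divergence has a standard closed form; the function names and the parameter values are illustrative and do not come from the paper, which works in the linear model setting.

```python
import math

def directed_divergence(m1, s1, m2, s2):
    """Kullback-Leibler divergence KL(N(m1, s1^2) || N(m2, s2^2)) in closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def symmetric_divergence(m1, s1, m2, s2):
    """Kullback's symmetric divergence: the sum of the two directed divergences."""
    return (directed_divergence(m1, s1, m2, s2)
            + directed_divergence(m2, s2, m1, s1))

# Hypothetical pair of models: N(0, 1) and N(1, 2^2).
forward = directed_divergence(0.0, 1.0, 1.0, 2.0)
reverse = directed_divergence(1.0, 2.0, 0.0, 1.0)
total = symmetric_divergence(0.0, 1.0, 1.0, 2.0)
```

Reversing the roles of the two models changes the value of the directed divergence, while the symmetric divergence is unchanged by the swap.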

14.
Two designs equivalent under one or two criteria may be compared under other criteria. For certain configurations of eigenvalues of the information matrices, we decide which design is the better of the two for many other such criteria. The relationship to universal optimality (in the case of equivalence under one criterion) is indicated. For two criteria, applications are given to weighing and treatment-with-covariate settings.

15.
Several estimators of squared prediction error have been suggested for use in model and bandwidth selection problems. Among these are cross-validation, generalized cross-validation and a number of related techniques based on the residual sum of squares. For many situations with squared error loss, e.g. nonparametric smoothing, these estimators have been shown to be asymptotically optimal in the sense that in large samples the estimator minimizing the selection criterion also minimizes squared error loss. However, cross-validation is known not to be asymptotically optimal for some 'easy' location problems. We consider selection criteria based on estimators of squared prediction risk for choosing between location estimators. We show that criteria based on adjusted residual sum of squares are not asymptotically optimal for choosing between asymptotically normal location estimators that converge at rate $n^{1/2}$ but are when the rate of convergence is slower. We also show that leave-one-out cross-validation is not asymptotically optimal for choosing between $\sqrt{n}$-differentiable statistics but leave-$d$-out cross-validation is optimal when $d \to \infty$ at the appropriate rate.

16.
In this paper we consider a Bayesian predictive approach to sample size determination in equivalence trials. Equivalence experiments are conducted to show that the unknown difference between two parameters is small. For instance, in clinical practice this kind of experiment aims to determine whether the effects of two medical interventions are therapeutically similar. We declare an experiment successful if an interval estimate of the effects-difference is included in a set of values of the parameter of interest indicating a negligible difference between treatment effects (equivalence interval). We derive two alternative criteria for the selection of the optimal sample size, one based on the predictive expectation of the interval limits and the other based on the predictive probability that these limits fall in the equivalence interval. Moreover, for both criteria we derive a robust version with respect to the choice of the prior distribution. Numerical results are provided and an application is illustrated when the normal model with conjugate prior distributions is assumed.

17.
A variable sample size (VSS) scheme that directly monitors the coefficient of variation (CV), instead of monitoring transformed statistics, is proposed. Optimal chart parameters are computed based on two criteria: (i) minimizing the out-of-control ARL (ARL1) and (ii) minimizing the out-of-control ASS (ASS1). The performance under these two criteria is then compared. The advantages of the proposed chart over the VSS chart based on transformed statistics in the existing literature are that the proposed chart (i) provides an easier alternative, as no transformation is involved, and (ii) requires fewer observations to detect a shift when ASS1 is minimized.

18.
We consider optimal designs for a class of symmetric models for binary data which includes the common probit and logit models. We show that for a large group of optimality criteria which includes the main ones in the literature (e.g. A-, D-, E-, F- and G-optimality) the optimal design for our class of models is a two-point design with support points symmetrically placed about the ED50 but with possibly unequal weighting. We demonstrate how one can further reduce the problem to a one-variable optimization by characterizing several of the common criteria. We also use the results to demonstrate major qualitative differences between the F- and c-optimal designs, two design criteria which have similar motivation.

19.
When a regression model contains many explanatory variables, some of them may be intercorrelated. The resulting multicollinearity degrades the precision and accuracy of the coefficient estimates and makes the search for the best model tedious. To tackle such a situation, model selection criteria are applied to select the model that best fits the data. The current study evaluates four unmodified and four modified versions of generalized information criteria: the Akaike Information Criterion, Schwarz's Bayes Information Criterion, the Hannan-Quinn Information Criterion, and the Akaike Information Criterion corrected for small samples. A simulation study using SAS software was carried out to compare the unmodified and modified versions of the generalized information criteria and to identify the best of the four modified model selection criteria for selecting the best model when the collinearity assumption is violated. For the simulation, two samples, of sizes 50 and 100, are drawn from a normal distribution for three explanatory variables X1, X2, and X3. Two situations of collinearity between X1 and X2 are examined: first ρ = 0.6 and second ρ = 0.8. The outcomes of the simulations are displayed in tables along with visual representations. The results reveal that the modified versions of the generalized information criteria are more sensitive in identifying models marred by high multicollinearity than the unmodified generalized information criteria.

20.
In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often supposed to belong to some parametric family. In this paper, a semiparametric model where the emission distributions are a mixture of parametric distributions is proposed to gain flexibility. We show that the standard EM algorithm can be adapted to infer the model parameters. For the initialization step, starting from a large number of components, a hierarchical method to combine them into the hidden states is proposed. Three likelihood-based criteria to select the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination of the combining criteria and the model selection criteria and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.
