Found 20 similar documents; search took 19 ms
1.
K. Lam 《Communications in Statistics - Simulation and Computation》2013,42(3):995-1006
The problem of selecting the best of k normal populations with unknown and possibly unequal variances is considered. The two-stage procedure proposed by Rinott (1978) is improved so that fewer samples need to be drawn in the second stage of the sampling scheme.
2.
Regression plays a central role in the discipline of statistics and is the primary analytic technique in many research areas. Variable selection is a classical and major problem for regression. This article emphasizes the economic aspect of variable selection. The problem is formulated in terms of the cost of predictors to be purchased for future use: only the subset of covariates used in the model will need to be purchased. This leads to a decision-theoretic formulation of the variable selection problems, which includes the cost of predictors as well as their effect. We adopt a Bayesian perspective and propose two approaches to address uncertainty about the model and model parameters. These approaches, termed the restricted and extended approaches, lead us to rethink model averaging. From an objective or robust Bayes point of view, the former is preferred. The proposed method is applied to three popular datasets for illustration.
3.
Lexin Li, R. Dennis Cook, Christopher J. Nachtsheim 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2005,67(2):285-299
Summary. The importance of variable selection in regression has grown in recent years as computing power has encouraged the modelling of data sets of ever-increasing size. Data mining applications in finance, marketing and bioinformatics are obvious examples. A limitation of nearly all existing variable selection methods is the need to specify the correct model before selection. When the number of predictors is large, model formulation and validation can be difficult or even infeasible. On the basis of the theory of sufficient dimension reduction, we propose a new class of model-free variable selection approaches. The methods proposed assume no model of any form, require no nonparametric smoothing and allow for general predictor effects. The efficacy of the methods proposed is demonstrated via simulation, and an empirical example is given.
4.
Nitis Mukhopadhyay 《Communications in Statistics - Theory and Methods》2013,42(7):671-683
In this paper we study the procedures of Dudewicz and Dalal (1975), and the modifications suggested by Rinott (1978), for selecting the largest mean from k normal populations with unknown variances. We look at the case k = 2 in detail, because there is an optimal allocation scheme here. We do not really allocate the total number of samples into two groups, but we estimate this optimal sample size, as well, so as to guarantee that the probability of correct selection (written as P(CS)) is at least P*, 1/2 < P* < 1. We prove that the procedure of Rinott is “asymptotically inefficient” (to be defined below) in the sense of Chow and Robbins (1965) for any k ≥ 2. Next, we propose two-stage procedures having all the properties of Rinott's procedure, together with the property of “asymptotic efficiency”, which is highly desirable.
5.
A robust rank-based estimator for variable selection in linear models, with grouped predictors, is studied. The proposed estimation procedure extends the existing rank-based variable selection [Johnson, B.A., and Peng, L. (2008), ‘Rank-based Variable Selection’, Journal of Nonparametric Statistics, 20(3):241–252] and the ww-scad [Wang, L., and Li, R. (2009), ‘Weighted Wilcoxon-type Smoothly Clipped Absolute Deviation Method’, Biometrics, 65(2):564–571] to linear regression models with grouped variables. The resulting estimator is robust to contamination or deviations in both the response and the design space. The oracle property and asymptotic normality of the estimator are established under some regularity conditions. Simulation studies reveal that the proposed method performs better than the existing rank-based methods [Johnson, B.A., and Peng, L. (2008), ‘Rank-based Variable Selection’, Journal of Nonparametric Statistics, 20(3):241–252; Wang, L., and Li, R. (2009), ‘Weighted Wilcoxon-type Smoothly Clipped Absolute Deviation Method’, Biometrics, 65(2):564–571] for grouped variables models. This estimation procedure also outperforms the adaptive hlasso [Zhou, N., and Zhu, J. (2010), ‘Group Variable Selection Via a Hierarchical Lasso and its Oracle Property’, Interface, 3(4):557–574] in the presence of local contamination in the design space or for heavy-tailed error distribution.
6.
Yosef Rinott 《Communications in Statistics - Theory and Methods》2013,42(8):799-811
In this paper we discuss a modification of the Dudewicz-Dalal procedure for the problem of selecting the population with the largest mean from k normal populations with unknown variances. We derive some inequalities and use them to lower-bound the probability of correct selection. These bounds are applied to the determination of the second-stage sample size which is required in order to achieve a prescribed probability of correct selection. We discuss the resulting procedure and compare it to that of Dudewicz and Dalal (1975).
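The second-stage sample-size rule mentioned in this abstract can be sketched as follows. This is a minimal illustration of the textbook form of a Rinott-type rule, N_i = max(n0, ceil((h * S_i / delta)^2)), and not a transcription of the paper. The function name, the argument names, and the assumption that the constant h has already been looked up or computed for the chosen k, n0 and target probability are all hypothetical.

```python
import math
from statistics import variance


def rinott_second_stage_sizes(first_stage, h, delta):
    """Total sample size per population under a Rinott-type two-stage rule.

    first_stage: list of lists; each inner list holds the n0 >= 2
                 first-stage observations from one population.
    h:           the procedure's constant (assumed to be supplied by the caller).
    delta:       indifference-zone width (smallest difference worth detecting).
    """
    sizes = []
    for sample in first_stage:
        n0 = len(sample)
        s2 = variance(sample)  # unbiased first-stage sample variance S_i^2
        needed = math.ceil(h * h * s2 / (delta * delta))
        sizes.append(max(n0, needed))  # never fewer than the first-stage size
    return sizes


# Example: two populations, five first-stage observations each.
stage_one = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.0, 2.0, 2.0, 2.0]]
print(rinott_second_stage_sizes(stage_one, h=2.0, delta=1.0))
```

Population i then receives N_i minus n0 additional observations, and the population with the largest overall sample mean is selected.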
7.
Empirical likelihood based variable selection (cited by: 1; self-citations: 0; citations by others: 1)
Asokan Mulayath Variyath, Jiahua Chen, Bovas Abraham 《Journal of statistical planning and inference》2010
Information criteria form an important class of model/variable selection methods in statistical analysis. Parametric likelihood is a crucial part of these methods. In some applications such as the generalized linear models, the models are only specified by a set of estimating functions. To overcome the non-availability of a well-defined likelihood function, the information criteria under empirical likelihood are introduced. Under this setup, we successfully solve the existence problem of the profile empirical likelihood due to the over constraint in variable selection problems. The asymptotic properties of the new method are investigated. The new method is shown to be consistent at selecting the variables under mild conditions. Simulation studies find that the proposed method has comparable performance to the parametric information criteria when a suitable parametric model is available, and is superior when the parametric model assumption is violated. A real data set is also used to illustrate the usefulness of the new method.
8.
U-estimates are defined as maximizers of objective functions that are U-statistics. As an alternative to M-estimates, U-estimates have been extensively used in linear regression, classification, survival analysis, and many other areas. They may rely on weaker data and model assumptions and be preferred over alternatives. In this article, we investigate penalized variable selection with U-estimates. We propose smooth approximations of the objective functions, which can greatly reduce computational cost without affecting asymptotic properties. We study penalized variable selection using penalties that have been well investigated with M-estimates, including the LASSO, adaptive LASSO, and bridge, and establish their asymptotic properties. Generically applicable computational algorithms are described. Performance of the penalized U-estimates is assessed using numerical studies.
9.
《Journal of Statistical Computation and Simulation》2012,82(3-4):177-185
For stepwise regression and discriminant analysis the parameters F_in and F_out govern the inclusion and deletion of variables. The candidate variable with the biggest F-ratio is included if this exceeds F_in; the included variable with the smallest F-ratio is deleted if this is less than F_out. If F_in ≥ F_out, then a return to a previous subset size implies improvement in the criterion measure. This result also holds for a generalization, stepwise multivariate analysis, which includes stepwise regression and discriminant analysis as special cases. Eliminations do not occur if forward regression and backward elimination yield the same sequence of subsets. Conversely, there is a more liberal stepping rule which always eliminates if the two sequences differ.
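The inclusion and deletion rule described in this abstract can be sketched as a single decision step. The sketch assumes the usual convention that entry is tested against the F-to-enter threshold (`f_in`) and deletion against the F-to-remove threshold (`f_out`), and that deletion is checked before entry; that ordering, the function name and the dictionary inputs are illustrative assumptions rather than details from the paper. The F-ratios themselves are taken as given.

```python
def stepwise_action(f_to_enter, f_to_remove, f_in, f_out):
    """Decide the next move of a stepwise procedure.

    f_to_enter:  dict mapping each candidate variable to its F-ratio for entry.
    f_to_remove: dict mapping each included variable to its F-ratio for removal.
    f_in, f_out: thresholds for inclusion and deletion.
    Returns a pair (action, variable) with action in {"remove", "add", "stop"}.
    """
    # Deletion check: the included variable with the smallest F-ratio.
    if f_to_remove:
        weakest = min(f_to_remove, key=f_to_remove.get)
        if f_to_remove[weakest] < f_out:
            return ("remove", weakest)
    # Inclusion check: the candidate variable with the biggest F-ratio.
    if f_to_enter:
        strongest = max(f_to_enter, key=f_to_enter.get)
        if f_to_enter[strongest] > f_in:
            return ("add", strongest)
    return ("stop", None)


# Example with hypothetical F-ratios.
print(stepwise_action({"x3": 5.2, "x4": 1.1}, {"x1": 9.0, "x2": 4.5}, f_in=4.0, f_out=3.9))
```

A full procedure would recompute the F-ratios after every move and call this step repeatedly until it returns "stop".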
10.
Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of multiple mutually exclusive events, known as competing risks. The proportional subdistribution hazard (PSH) model proposed by Fine and Gray (J Am Stat Assoc 94:496–509, 1999) has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish the asymptotic properties of the proposed penalized estimators and modify the coordinate descent algorithm for implementation. Simulation studies are conducted to demonstrate the good performance of the proposed method. Data from deceased donor kidney transplants from the United Network of Organ Sharing illustrate the utility of the proposed method.
11.
We consider the problem of variable selection for a class of varying coefficient models with instrumental variables. We focus on the case that some covariates are endogenous variables, and some auxiliary instrumental variables are available. An instrumental variable based variable selection procedure is proposed by using modified smooth-threshold estimating equations (SEEs). The proposed procedure can automatically eliminate the irrelevant covariates by setting the corresponding coefficient functions as zero, and simultaneously estimate the nonzero regression coefficients by solving the smooth-threshold estimating equations. The proposed variable selection procedure avoids the convex optimization problem, and is flexible and easy to implement. Simulation studies are carried out to assess the performance of the proposed variable selection method.
12.
Hugh Chipman 《Revue canadienne de statistique》1996,24(1):17-36
In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not allow for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A nor B. This paper develops mathematical representations of this and other relations between predictors, which may then be incorporated in a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information. Priors relevant to arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on the size of the models are developed. Since the relations developed are for priors, they may be incorporated in any Bayesian variable selection algorithm for any type of linear model. The application of the methods here is illustrated via the stochastic search variable selection algorithm of George and McCulloch (1993), which is modified to utilize the new priors. The performance of the approach is illustrated with two constructed examples and a computer performance dataset.
13.
A fast Bayesian method that seamlessly fuses classification and hypothesis testing via discriminant analysis is developed. Building upon the original discriminant analysis classifier, modelling components are added to identify discriminative variables. A combination of cake priors and a novel form of variational Bayes we call reverse collapsed variational Bayes gives rise to variable selection that can be directly posed as a multiple hypothesis testing approach using likelihood ratio statistics. Some theoretical arguments are presented showing that Chernoff-consistency (asymptotically zero type I and type II error) is maintained across all hypotheses. We apply our method to some publicly available genomics datasets and show that our method performs well in practice for its computational cost. An R package VaDA has also been made available on GitHub.
14.
P. J. Brown, M. Vannucci & T. Fearn 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1998,60(3):627-641
The multivariate regression model is considered with p regressors. A latent vector with p binary entries serves to identify one of two types of regression coefficients: those close to 0 and those not. Specializing our general distributional setting to the linear model with Gaussian errors and using natural conjugate prior distributions, we derive the marginal posterior distribution of the binary latent vector. Fast algorithms aid its direct computation, and in high dimensions these are supplemented by a Markov chain Monte Carlo approach to sampling from the known posterior distribution. Problems with hundreds of regressor variables become quite feasible. We give a simple method of assigning the hyperparameters of the prior distribution. The posterior predictive distribution is derived and the approach illustrated on compositional analysis of data involving three sugars with 160 near infrared absorbances as regressors.
15.
When one or a few observations are deleted in the multiple linear regression model, the deletion can affect the variable selection. In this paper we derive the formula for the Mallows Cp criterion when k observations are deleted and express it as a function of basic building blocks such as residuals and leverages. Also, two real data sets are used to see how the selected model changes as a few observations are deleted.
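For orientation, the full-data Mallows Cp criterion that this abstract builds on is usually written as below. The notation is the common textbook one and is an assumption here, not taken from the paper; the paper's formula for k deleted observations is not reproduced.

```latex
C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} - n + 2p
```

Here RSS_p is the residual sum of squares of a candidate model with p coefficients (including the intercept), \hat{\sigma}^2 is the residual mean square of the full model, and n is the number of observations. For the full model itself the criterion equals its own number of coefficients.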
16.
A stepwise variable selection procedure for multinomial discrimination is presented and discussed. Based upon the work of Kullback and Hills, stopping rules are proposed and illustrated for a set of data on communication buyer behavior.
17.
Constrained estimators that enforce variable selection and grouping of highly correlated data have been shown to be successful in finding sparse representations and obtaining good performance in prediction. We consider polytopes as a general class of compact and convex constraint regions. Well-established procedures like LASSO (Tibshirani, 1996) or OSCAR (Bondell and Reich, 2008) are shown to be based on specific subclasses of polytopes. The general framework of polytopes can be used to investigate the geometric structure that underlies these procedures. Moreover, we propose a specifically designed class of polytopes that enforces variable selection and grouping. Simulation studies and an application illustrate the usefulness of the proposed method.
18.
The adaptive least absolute shrinkage and selection operator (Lasso) and least absolute deviation (LAD)-Lasso are two attractive shrinkage methods for simultaneous variable selection and regression parameter estimation. While the adaptive Lasso is efficient for small magnitude errors, LAD-Lasso is robust against heavy-tailed errors and severe outliers. In this article, we consider a data-driven convex combination of these two modern procedures to produce a robust adaptive Lasso, which not only enjoys the oracle properties, but synthesizes the advantages of the adaptive Lasso and LAD-Lasso. It fully adapts to different error structures including the infinite variance case and automatically chooses the optimal weight to achieve both robustness and high efficiency. Extensive simulation studies demonstrate a good finite sample performance of the robust adaptive Lasso. Two data sets are analyzed to illustrate the practical use of the procedure.
19.
A rank-based variable selection procedure is developed for the semiparametric accelerated failure time model with censored observations where the penalized likelihood (partial likelihood) method is not directly applicable.
20.
The article considers a Gaussian model with the mean and the variance modeled flexibly as functions of the independent variables. The estimation is carried out using a Bayesian approach that allows the identification of significant variables in the variance function, as well as averaging over all possible models in both the mean and the variance functions. The computation is carried out by a simulation method that is carefully constructed to ensure that it converges quickly and produces iterates from the posterior distribution that have low correlation. Real and simulated examples demonstrate that the proposed method works well. The method in this paper is important because (a) it produces more realistic prediction intervals than nonparametric regression estimators that assume a constant variance; (b) variable selection identifies the variables in the variance function that are important; (c) variable selection and model averaging produce more efficient prediction intervals than those obtained by regular nonparametric regression.