Similar Documents
20 similar documents found (search time: 15 ms)
1.
Using a forward selection procedure for selecting the best subset of regression variables involves the calculation of critical values (cutoffs) for an F-ratio at each step of a multistep search process. On dropping the restrictive (unrealistic) assumptions used in previous works, the null distribution of the F-ratio depends on unknown regression parameters for the variables already included in the subset. For the case of known σ, by conditioning the F-ratio on the set of regressors included so far and also on the observed (estimated) values of their regression coefficients, we obtain a forward selection procedure whose stepwise type I error does not depend on the unknown (nuisance) parameters. A numerical example with an orthogonal design matrix illustrates the difference between conditional cutoffs, cutoffs for the central F-distribution, and cutoffs suggested by Pope and Webster.

2.
In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not allow for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A nor B. This paper develops mathematical representations of this and other relations between predictors, which may then be incorporated in a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information. Priors relevant to arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on the size of the models are developed. Since the relations developed are for priors, they may be incorporated in any Bayesian variable selection algorithm for any type of linear model. The application of the methods is illustrated via the stochastic search variable selection algorithm of George and McCulloch (1993), which is modified to utilize the new priors. The performance of the approach is illustrated with two constructed examples and a computer performance dataset.
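As a concrete (hypothetical) illustration of the kind of structural prior this abstract describes, the sketch below enumerates all models over two main effects and their interaction, and identifies which models a strong-heredity prior would assign zero mass to; the function and variable names are mine, and this is not the George–McCulloch implementation:

```python
from itertools import combinations

def respects_strong_heredity(model, parents):
    """True if every interaction term in `model` has all its parent
    main effects also included in `model`."""
    return all(p in model
               for term in model if term in parents
               for p in parents[term])

terms = ["A", "B", "AB"]
parents = {"AB": ("A", "B")}  # AB is the interaction of A and B

all_models = [frozenset(c) for r in range(len(terms) + 1)
              for c in combinations(terms, r)]
allowed = [m for m in all_models if respects_strong_heredity(m, parents)]
# A prior honoring strong heredity puts zero mass on the models that
# contain AB but are missing A or B; the remaining models keep
# positive prior probability.
```

In a Bayesian variable selection algorithm this constraint would enter through the prior over inclusion indicators rather than by filtering an explicit model list, but the filtered enumeration makes the restriction easy to see.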

3.
To compare several promising product designs, manufacturers must measure their performance under multiple environmental conditions. In many applications, a product design is considered to be seriously flawed if its performance is poor for any level of the environmental factor. For example, if a particular automobile battery design does not function well under temperature extremes, then a manufacturer may not want to put this design into production. Thus, this paper considers the measure of a product's quality to be its worst performance over the levels of the environmental factor. We develop statistical procedures to identify (a near) optimal product design among a given set of product designs, i.e., the manufacturing design that maximizes the worst product performance over the levels of the environmental variable. We accomplish this by intuitive procedures based on the split-plot experimental design (and the randomized complete block design as a special case); split-plot designs have the essential structure of a product array and the practical convenience of local randomization. Two classes of statistical procedures are provided. In the first, the δ-best formulation of selection problems, we determine the number of replications of the basic split-plot design that are needed to guarantee, with a given confidence level, the selection of a product design whose minimum performance is within a specified amount, δ, of the performance of the optimal product design. In particular, if the difference between the quality of the best and second best manufacturing designs is δ or more, then the procedure guarantees that the best design will be selected with specified probability. 
For applications where a split-plot experiment that involves several product designs has been completed without the planning required of the δ-best formulation, we provide procedures to construct a ‘confidence subset’ of the manufacturing designs; the selected subset contains the optimal product design with a prespecified confidence level. The latter is called the subset selection formulation of selection problems. Examples are provided to illustrate the procedures.

4.
We introduce a novel predictive statistical modeling technique called Hybrid Radial Basis Function Neural Networks (HRBF-NN) as a forecaster. HRBF-NN is a flexible forecasting technique that integrates regression trees and ridge regression with radial basis function (RBF) neural networks (NN). We develop a new computational procedure that uses model selection based on information-theoretic principles as the fitness function of a genetic algorithm (GA) to carry out subset selection of the best predictors. Due to the dynamic and chaotic nature of the underlying stock market process, the task of generating economically useful stock market forecasts is, as is well known, difficult, if not impossible. HRBF-NN is well suited for modeling complex non-linear relationships and dependencies between the stock indices. We propose HRBF-NN as our forecaster and a predictive modeling tool to study the daily movements of stock indices. We show numerical examples that determine a predictive relationship between the Istanbul Stock Exchange National 100 Index (ISE100) and seven other international stock market indices. We select the best subset of predictors by minimizing the information complexity (ICOMP) criterion as the fitness function within the GA. Using the best subset of variables, we construct out-of-sample forecasts for the ISE100 index to determine its daily directional movements. Our results demonstrate the utility and flexibility of HRBF-NN as a predictive modeling tool for highly dependent and nonlinear data.

5.
In this paper, we translate variable selection for linear regression into a multiple testing problem and select significant variables according to the testing results. New variable selection procedures are proposed based on the optimal discovery procedure (ODP) in multiple testing. Owing to the ODP’s optimality, if we guarantee the number of significant variables included, the procedure includes fewer non-significant variables than marginal p-value based methods. Consistency of our procedures is established both theoretically and in simulation. Simulation results suggest that procedures based on multiple testing improve over procedures based on selection criteria, and that our new procedures outperform marginal p-value based procedures.

6.
In market research and some other areas, it is common that a sample of n judges (consumers, evaluators, etc.) are asked to independently rank a series of k objects or candidates. It is usually difficult to obtain the judges' full cooperation to completely rank all k objects. A practical way to overcome this difficulty is to give each judge the freedom to choose the number of top candidates he is willing to rank. A frequently encountered question in this type of survey is how to select the best object or candidate from the incompletely ranked data. This paper proposes a subset selection procedure which constructs a random subset of all the k objects involved in the survey such that the best object is included in the subset with a prespecified confidence. It is shown that the proposed subset selection procedure is distribution-free over a very broad class of underlying distributions. An example from a market research study is used to illustrate the proposed procedure.

7.
Variable selection is an important task in regression analysis. The performance of a statistical model depends strongly on the chosen subset of predictors. Several methods exist for selecting the most relevant variables to build a good model. In practice, however, the dependent variable may take positive continuous values and not be normally distributed. In such situations, the gamma distribution is more suitable than the normal for building a regression model. This paper introduces a heuristic approach that performs variable selection for gamma regression models using artificial bee colony optimization. We evaluated the proposed method against classical selection methods such as backward and stepwise selection. Both simulation studies and real data examples demonstrate the accuracy of our selection procedure.
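A search heuristic such as the one described above needs a fitness measure for each candidate predictor subset. A standard choice for a gamma GLM is its deviance; the sketch below implements the textbook gamma deviance formula, but its use as the fitness function here is my assumption, and the ABC search itself is omitted:

```python
import numpy as np

def gamma_deviance(y, mu):
    """Gamma GLM deviance: 2 * sum(-log(y/mu) + (y - mu)/mu),
    for positive observations y and fitted means mu."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return 2.0 * np.sum(-np.log(y / mu) + (y - mu) / mu)

# A perfect fit (mu == y) has zero deviance; worse fits score higher,
# so the deviance can serve as the objective a search heuristic
# minimizes over candidate predictor subsets.
```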

8.
The functional linear model is of great practical importance, as exemplified by applications in high-throughput studies such as meteorological and biomedical research. In this paper, we propose a new functional variable selection procedure, called functional variable selection via Gram–Schmidt (FGS) orthogonalization, for a functional linear model with a scalar response and multiple functional predictors. Instead of relying on regularization methods, FGS accounts for the similarity between the functional predictors in a data-driven way and utilizes the technique of Gram–Schmidt orthogonalization to remove the irrelevant predictors. FGS can successfully discriminate between relevant and irrelevant functional predictors to achieve a high true positive ratio without including many irrelevant predictors, and yields explainable models, which offers a new perspective on variable selection in the functional linear model. Simulation studies are carried out to evaluate the finite-sample performance of the proposed method, and a weather data set is also analysed.
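The orthogonalization step at the core of FGS is classical Gram–Schmidt. The sketch below is a generic numeric implementation of that step only, not the authors' full procedure (which applies it to basis-expanded functional predictors):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X in order (classical Gram-Schmidt).
    Assumes the columns are linearly independent."""
    Q = np.zeros_like(X, dtype=float)
    for j in range(X.shape[1]):
        v = X[:, j].astype(float)
        for i in range(j):
            # remove the component of column j along earlier directions
            v = v - (Q[:, i] @ X[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q
```

Because each new column is orthogonalized against the predictors already processed, the contribution of a candidate predictor can be judged net of what the earlier predictors already explain, which is what makes this device useful for screening correlated predictors.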

9.
We restrict attention to a class of Bernoulli subset selection procedures which take observations one-at-a-time and can be compared directly to the Gupta-Sobel single-stage procedure. For the criterion of minimizing the expected total number of observations required to terminate experimentation, we show that optimal sampling rules within this class are not of practical interest. We thus turn to procedures which, although not optimal, exhibit desirable behavior with regard to this criterion. A procedure which employs a modification of the so-called least-failures sampling rule is proposed, and is shown to possess many desirable properties among a restricted class of Bernoulli subset selection procedures. Within this class, it is optimal for minimizing the number of observations taken from populations excluded from consideration following a subset selection experiment, and asymptotically optimal for minimizing the expected total number of observations required. In addition, it can result in substantial savings in the expected total number of observations required as compared to a single-stage procedure, thus it may be desirable to a practitioner if sampling is costly or the sample size is limited.

10.
This paper examines the design and performance of sequential experiments where extensive switching is undesirable. Given an objective function to optimize by sampling between Bernoulli populations, two different models are considered. The constraint model restricts the maximum number of switches possible, while the cost model introduces a charge for each switch. Optimal allocation procedures and a new “hyperopic” procedure are discussed and their behavior examined. For the cost model, if one views the costs as control variables then the optimal allocation procedures yield the optimal tradeoff of expected switches vs. expected value of the objective function.

11.
Biased sampling occurs often in observational studies. With one biased sample, the problem of nonparametrically estimating both a target density function and a selection bias function is unidentifiable. This paper studies the nonparametric estimation problem when there are two biased samples that have some overlapping observations (i.e. recaptures) from a finite population. Since an intelligent subject sampled previously may experience a memory effect if sampled again, two general 2-stage models that incorporate both a selection bias and a possible memory effect are proposed. Nonparametric estimators of the target density, selection bias, and memory functions, as well as the population size are developed. Asymptotic properties of these estimators are studied and confidence bands for the selection function and memory function are provided. Our procedures are compared with those ignoring the memory effect or the selection bias in finite sample situations. A nonparametric model selection procedure is also given for choosing a model from the two 2-stage models and a mixture of these two models. Our procedures work well with or without a memory effect, and with or without a selection bias. The paper concludes with an application to a real survey data set.

12.
An aspect of cluster analysis which has been widely studied in recent years is the weighting and selection of variables. Procedures have been proposed which are able to identify the cluster structure present in a data matrix when that structure is confined to a subset of variables. Other methods assess the relative importance of each variable as revealed by a suitably chosen weight. But when a cluster structure is present in more than one subset of variables and is different from one subset to another, those solutions as well as standard clustering algorithms can lead to misleading results. Some very recent methodologies for finding consensus classifications of the same set of units can be useful also for the identification of cluster structures in a data matrix, but each one seems to be only partly satisfactory for the purpose at hand. Therefore a new more specific procedure is proposed and illustrated by analyzing two real data sets; its performances are evaluated by means of a simulation experiment.

13.
In many regression problems, predictors are naturally grouped. For example, when a set of dummy variables is used to represent categorical variables, or a set of basis functions of continuous variables is included in the predictor set, it is important to carry out feature selection simultaneously at the group level and at the level of individual variables within groups. To incorporate both group-level and within-group information into regularized model fitting, several regularization methods have been developed, including for the Cox regression and the conditional mean regression. Complementary to earlier work, we examine simultaneous group and within-group variable selection in quantile regression. We propose a hierarchically penalized quantile regression, and show that the hierarchical penalty possesses the oracle property in quantile regression, as well as in the Cox regression. The proposed method is evaluated through simulation studies and a real data application.
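The loss underlying any quantile regression, penalized or not, is the check function. A minimal sketch of that standard formula (the hierarchical penalty itself is not shown):

```python
import numpy as np

def check_loss(u, tau):
    """Quantile regression check function:
    rho_tau(u) = u * (tau - 1{u < 0}),
    which is tau*u for u >= 0 and (tau - 1)*u for u < 0."""
    u = np.asarray(u, float)
    return u * (tau - (u < 0))
```

For tau = 0.5 the check loss is half the absolute error, recovering median regression; asymmetric tau values weight over- and under-prediction differently, which is what lets the fitted function target a chosen conditional quantile.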

14.
Dominance analysis is a procedure for measuring the importance of predictors in multiple regression analysis. We show that dominance analysis can be enhanced using a dynamic programming approach for the rank-ordering of predictors. Using customer satisfaction data from a call center operation, we demonstrate how the integration of dominance analysis with dynamic programming can provide a better understanding of predictor importance. As a cautionary note, we recommend careful reflection on the relationship between predictor importance and variable subset selection. We observed that slight changes in the selected predictor subset can have an impact on the importance rankings produced by a dominance analysis.
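For readers unfamiliar with dominance analysis, the sketch below computes general dominance weights by brute-force enumeration: each predictor's incremental R² is averaged within each subset size and then across sizes. This is a plain reference implementation, not the dynamic-programming enhancement the abstract describes:

```python
import numpy as np
from itertools import combinations

def r2(X, y, cols):
    """R^2 of the OLS fit of y on an intercept plus the columns in `cols`."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def general_dominance(X, y):
    """Each predictor's average incremental R^2, averaged first within
    and then across subset sizes (general dominance weights)."""
    p = X.shape[1]
    dom = np.zeros(p)
    for j in range(p):
        others = [c for c in range(p) if c != j]
        by_size = []
        for r in range(p):  # size of the subset the predictor joins
            incs = [r2(X, y, list(S) + [j]) - r2(X, y, list(S))
                    for S in combinations(others, r)]
            by_size.append(np.mean(incs))
        dom[j] = np.mean(by_size)
    return dom
```

A useful sanity check is that these weights decompose the full-model R² exactly: the general dominance values sum to the R² of the regression on all predictors.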

15.
The purpose of this paper is to develop a Bayesian approach for log-Birnbaum–Saunders Student-t regression models under right-censored survival data. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the considered model. In order to attenuate the influence of outlying observations on the parameter estimates, we present Birnbaum–Saunders models in which a Student-t distribution is assumed to explain the cumulative damage. We also discuss model selection criteria for comparing the fitted models, and case-deletion influence diagnostics are developed for the joint posterior distribution based on the Kullback–Leibler divergence. The developed procedures are illustrated with a real data set.

16.
17.
In many studies a large number of variables is measured, and the identification of relevant variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on one model only neglects that there usually exist other, equally appropriate models. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say, more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam’s window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure. Based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. As a simple alternative screening procedure, backward elimination can be used. Through two examples and by means of simulation, we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, respectively, of which seven influence the outcome. The screening step eliminates most of the uninfluential variables, but also some variables with weak effects. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step.

18.
Model selection strategies play an important, if not explicit, role in quantitative research. The inferential properties of these strategies are largely unknown; therefore, there is little basis for recommending (or avoiding) any particular set of strategies. In this paper, we evaluate several commonly used model selection procedures [Bayesian information criterion (BIC), adjusted R², Mallows’ Cp, Akaike information criterion (AIC), AICc, and stepwise regression] using Monte Carlo simulation of model selection when the true data generating processes (DGP) are known.

We find that the ability of these selection procedures to include important variables and exclude irrelevant variables increases with the size of the sample and decreases with the amount of noise in the model. None of the model selection procedures does well in small samples, even when the true DGP is largely deterministic; thus, data mining in small samples should be avoided entirely. Instead, the implicit uncertainty in model specification should be explicitly discussed. In large samples, BIC is better than the other procedures at correctly identifying most of the generating processes we simulated, and stepwise regression does almost as well. In the absence of strong theory, both BIC and stepwise regression appear to be reasonable model selection strategies in large samples. Under the conditions simulated, adjusted R², Mallows’ Cp, AIC, and AICc are clearly inferior and should be avoided.
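The criterion comparison described above can be sketched as an exhaustive best-subset search scored by an information criterion in its Gaussian log-likelihood form, with penalty 2 per parameter for AIC and log n for BIC. The helper names and the toy data are mine, not the paper's simulation settings:

```python
import numpy as np
from itertools import combinations

def rss(X, y, cols):
    """Residual sum of squares of the OLS fit on an intercept plus `cols`."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid)

def best_subset(X, y, penalty):
    """Exhaustive search minimizing n*log(RSS/n) + penalty * (#params)."""
    n, p = X.shape
    subsets = (list(S) for r in range(p + 1)
               for S in combinations(range(p), r))
    return set(min(subsets,
                   key=lambda S: n * np.log(max(rss(X, y, S), 1e-12) / n)
                                 + penalty * (len(S) + 1)))

# AIC uses penalty=2; BIC uses penalty=log(n), which is larger for n > 7,
# so BIC never selects a bigger subset than AIC on the same data.
```

Exhaustive search is only feasible for small p (2^p candidate subsets), which is why the paper's stepwise regression is a common surrogate in larger problems.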


19.
In this paper, we derive statistical selection procedures to partition k normal populations into ‘good’ or ‘bad’ ones, respectively, using the nonparametric empirical Bayes approach. The relative regret risk of a selection procedure is used as a measure of its performance. We establish the asymptotic optimality of the proposed empirical Bayes selection procedures and investigate the associated rates of convergence. Under a very mild condition, the proposed empirical Bayes selection procedures are shown to have rates of convergence of order close to O(k^(-1/2)), where k is the number of populations involved in the selection problem. Under further strong assumptions, the empirical Bayes selection procedures have rates of convergence of order O(k^(α(r-1)/(2r+1))), where 1 < α < 2 and r is an integer greater than 2.

20.
In this article, we consider the problem of selecting functional variables using L1 regularization in a functional linear regression model with a scalar response and functional predictors, in the presence of outliers. Since the LASSO is a special case of penalized least-squares regression with an L1 penalty function, it suffers in the presence of heavy-tailed errors and/or outliers in the data. Recently, the Least Absolute Deviation (LAD) and LASSO methods have been combined (the LAD-LASSO regression method) to carry out robust parameter estimation and variable selection simultaneously in a multiple linear regression model. However, selection of functional predictors based on the ordinary LASSO fails, since multiple parameters correspond to a single functional predictor. Therefore, the group LASSO is used to select functional predictors, since it selects grouped rather than individual variables. In this study, we propose a robust functional predictor selection method, the LAD-group LASSO, for a functional linear regression model with a scalar response and functional predictors. We illustrate the performance of the LAD-group LASSO on both simulated and real data.
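The group LASSO's all-or-nothing behavior on a coefficient block comes from the block soft-thresholding operator, the proximal map of the penalty lam * ||beta_g||_2. The sketch below shows that operator in isolation; it is a generic building block, not the paper's full LAD-group-LASSO algorithm (whose loss is LAD rather than least squares):

```python
import numpy as np

def group_soft_threshold(beta_g, lam):
    """Proximal map of the group-LASSO penalty lam * ||beta_g||_2:
    shrink the whole coefficient block toward zero, or zero it out
    entirely when its norm is at most lam."""
    norm = np.linalg.norm(beta_g)
    if norm <= lam:
        return np.zeros_like(beta_g)
    return (1.0 - lam / norm) * beta_g
```

Because all coefficients of a functional predictor's basis expansion live in one block, the block either survives (the predictor is kept) or is zeroed (the predictor is dropped), which is exactly the grouped selection the abstract relies on.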
