Similar articles
Found 20 similar articles (search time: 406 ms)
1.
The location model is a familiar basis for discriminant analysis of mixtures of categorical and continuous variables. Its usual implementation involves second-order smoothing, using multivariate regression for the continuous variables and log-linear models for the categorical variables. In spite of the smoothing, these procedures still require many parameters to be estimated and this in turn restricts the categorical variables to a small number if implementation is to be feasible. In this paper we propose non-parametric smoothing procedures for both parts of the model. The number of parameters to be estimated is dramatically reduced and the range of applicability thereby greatly increased. The methods are illustrated on several data sets, and the performances are compared with a range of other popular discrimination techniques. The proposed method compares very favourably with all its competitors.

2.
Ordered multiple categorical (MC) variables have been widely studied as response variables, but few studies have carefully considered them as predictors in linear regression. In that role, the existence of pseudo-categories may result in overfitting, and detecting those pseudo-categories by hypothesis tests on all dummy variables may have low specificity. In this paper, we propose a transformation of the dummy variables for such ordered MC predictors, followed by a model selection method combined with the BIC. Theoretical consistency of our model selection method is established under some common assumptions. Both simulation studies and a real data analysis of a medical survey indicate that our method performs well and is applicable to a wide range of biomedical research.

3.
High-dimensional data arise in diverse fields of science, engineering and the humanities. Variable selection plays an important role in high-dimensional statistical modelling. In this article, we study variable selection based on quadratic approximation with the smoothly clipped absolute deviation (SCAD) penalty and a diverging number of parameters. We provide a unified method to select variables and estimate parameters for various high-dimensional models. Under appropriate conditions and with a proper regularization parameter, we show that the estimator is consistent and sparse, and that the estimators of the nonzero coefficients enjoy the same asymptotic normality as they would have if the zero coefficients were known in advance. In addition, under some mild conditions, we can obtain the global solution of the penalized objective function with the SCAD penalty. Numerical studies and a real data analysis are carried out to confirm the performance of the proposed method.
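
For readers unfamiliar with the penalty named in this abstract, the sketch below evaluates the SCAD penalty in its usual piecewise form from Fan and Li (2001). It is an illustration only: the function name `scad_penalty` and the default `a = 3.7` are assumptions made here, and the abstract does not state which parameterization the authors use.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """Elementwise SCAD penalty in its standard piecewise form (illustrative)."""
    t = np.abs(np.asarray(theta, dtype=float))
    # Region 1: |theta| <= lam, the penalty is linear like the lasso.
    linear = lam * t
    # Region 2: lam < |theta| <= a * lam, a quadratic piece that flattens out.
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    # Region 3: |theta| > a * lam, the penalty is constant (large effects are not shrunk further).
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))
```

The three pieces join continuously at `|theta| = lam` and `|theta| = a * lam`, which is what lets the penalty act like the lasso near zero while leaving large coefficients nearly unbiased.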

4.
In this paper we are concerned with the problems of variable selection and estimation in double generalized linear models in which both the mean and the dispersion are allowed to depend on explanatory variables. We propose a maximum penalized pseudo-likelihood method when the number of parameters diverges with the sample size. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and asymptotic properties of the resulting estimators are established. We also carry out simulation studies and a real data analysis to assess the finite sample performance of the proposed variable selection procedure, showing that the proposed variable selection method works satisfactorily.

5.
In this paper, we translate variable selection for linear regression into a multiple testing problem and select significant variables according to the test results. New variable selection procedures are proposed based on the optimal discovery procedure (ODP) in multiple testing. Owing to the ODP’s optimality, for a guaranteed number of significant variables included, it includes fewer nonsignificant variables than marginal p-value based methods. Consistency of our procedures is established in theory and confirmed in simulation. Simulation results suggest that procedures based on multiple testing improve on procedures based on selection criteria, and that our new procedures perform better than marginal p-value based procedures.

6.
If a number of candidate variables are available, variable selection is a key task aiming to identify those candidates which influence the outcome of interest. Methods such as backward elimination, forward selection, etc. are often implemented, despite their drawbacks. One of these drawbacks is the instability of their results with respect to small perturbations in the data. To handle this issue, resampling-based procedures have been introduced; using a resampling technique, e.g. bootstrap, these procedures generate several pseudo-samples that are used to compute the inclusion frequency of each variable, i.e. the proportion of pseudo-samples in which the variable is selected. Based on the inclusion frequencies, it is possible to discriminate between relevant and irrelevant variables. These procedures may fail in the case of correlated variables. To deal with this issue, two procedures based on 2×2 tables of inclusion frequencies have been developed in the literature. In this paper we analyse the behaviour of these two procedures and the role of their tuning parameters in an extensive simulation study.
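
The inclusion-frequency idea described in this abstract can be sketched as follows. This is a minimal illustration, not the procedures studied in the paper: the selection rule (`select_by_t_stat`, keeping variables whose OLS |t| exceeds 2), the function names, and the simulated data are all assumptions made here, standing in for whatever selection method is applied to each pseudo-sample.

```python
import numpy as np

def select_by_t_stat(X, y, threshold=2.0):
    """Hypothetical selection rule: keep variables whose OLS |t| exceeds threshold."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])            # add an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)             # residual variance estimate
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)
    t = beta / np.sqrt(np.diag(cov))
    return np.abs(t[1:]) > threshold                 # drop the intercept

def inclusion_frequencies(X, y, n_boot=200, seed=0):
    """Proportion of bootstrap pseudo-samples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample rows with replacement
        counts += select_by_t_stat(X[idx], y[idx])
    return counts / n_boot

# Simulated example: only the first of four variables influences the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(size=200)
freq = inclusion_frequencies(X, y)
```

Here `freq[0]` should sit at or near 1, while the frequencies of the three noise variables are lower; how to set a cut-off between the two groups, and what happens when the variables are correlated, is exactly what the abstract is concerned with.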

7.
In this article, we consider the problem of selecting functional variables using L1 regularization in a functional linear regression model with a scalar response and functional predictors, in the presence of outliers. Since the LASSO is a special case of penalized least-squares regression with an L1 penalty function, it is sensitive to heavy-tailed errors and/or outliers in the data. Recently, the Least Absolute Deviation (LAD) and LASSO methods have been combined (the LAD-LASSO regression method) to carry out robust parameter estimation and variable selection simultaneously for a multiple linear regression model. However, variable selection of the functional predictors based on the LASSO fails, since multiple parameters exist for each functional predictor. Therefore, the group LASSO is used for selecting functional predictors, since it selects grouped variables rather than individual variables. In this study, we propose a robust functional predictor selection method, the LAD-group LASSO, for a functional linear regression model with a scalar response and functional predictors. We illustrate the performance of the LAD-group LASSO on both simulated and real data.
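
One plausible way to write the combination described in this abstract is an absolute-deviation loss plus a group penalty. The display below is a sketch under assumptions, not the authors' stated criterion: it supposes that each functional predictor $j$ has been reduced to a vector of basis coefficients $c_{ij}$ for subject $i$, with a matching coefficient block $b_j$, intercept $\alpha$ and tuning parameter $\lambda$. All of these symbols are introduced here for illustration.

```latex
\min_{\alpha,\, b_1, \dots, b_p} \;
\sum_{i=1}^{n} \Bigl| \, y_i - \alpha - \sum_{j=1}^{p} c_{ij}^{\top} b_j \Bigr|
\; + \; \lambda \sum_{j=1}^{p} \lVert b_j \rVert_2
```

The unsquared Euclidean norm on each block $b_j$ is what sets a whole block to zero at once, so a functional predictor is kept or dropped as a unit, while the absolute-value loss gives the robustness to outliers that the squared-error LASSO lacks.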

8.
Variable selection is a fundamental challenge in statistical learning when one works with data sets containing a huge number of predictors. In this article we consider procedures popular in model selection: the Lasso and the adaptive Lasso. Our goal is to investigate properties of estimators based on minimization of a Lasso-type penalized empirical risk with a convex loss function, in particular a nondifferentiable one. We obtain theorems concerning the rate of convergence in estimation, consistency in model selection and oracle properties for Lasso estimators when the number of predictors is fixed, i.e. it does not depend on the sample size. Moreover, we study properties of Lasso and adaptive Lasso estimators on simulated and real data sets.

9.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy-tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ-type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.

10.
In this article, we study model selection and model averaging in quantile regression. Under general conditions, we develop a focused information criterion and a frequentist model average estimator for the parameters in the quantile regression model, and examine their theoretical properties. The new procedures provide a robust alternative to the least squares method or the likelihood method. A major advantage is that when the variance of the random error is infinite, the proposed procedures still work well while the least squares method breaks down. A simulation study and a real data example are presented to show that the proposed method performs well with a finite sample and is easy to use in practice.
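
The robustness claimed in this abstract comes from the check loss that defines quantile regression. The sketch below is an illustration with assumed names and simulated data, not the paper's estimator: it minimizes the check loss over a constant for heavy-tailed Cauchy data (whose variance is infinite) and recovers the sample median.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

rng = np.random.default_rng(0)
y = rng.standard_cauchy(size=2001)      # heavy tails: the variance is infinite
tau = 0.5                               # tau = 0.5 targets the median

# Search over the observed values, since a minimizer is always attained at a data point.
grid = np.sort(y)
risk = np.array([check_loss(y - q, tau).sum() for q in grid])
q_hat = grid[np.argmin(risk)]
```

With an odd sample size the minimizer `q_hat` coincides with `np.median(y)`, whereas the sample mean of Cauchy data does not settle down as the sample grows. Replacing the constant `q` with a linear predictor gives the quantile regression criterion.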

11.
Applying nonparametric variable selection criteria in nonlinear regression models generally requires a substantial computational effort if the data set is large. In this paper we present a selection technique that is computationally much less demanding and performs well in comparison with methods currently available. It is based on a polynomial approximation of the nonlinear model. Performing the selection only requires repeated least squares estimation of models that are linear in parameters. The main limitation of the method is that the number of variables among which to select cannot be very large if the sample is small and the order of an adequate polynomial at the same time is high. Large samples can be handled without problems.

12.
Many procedures have been developed to deal with the high-dimensional problem that is emerging in various business and economics areas. To evaluate and compare these procedures, modeling uncertainty caused by model selection and parameter estimation has to be assessed and integrated into a modeling process. To do this, a data perturbation method estimates the modeling uncertainty inherent in a selection process by perturbing the data. Critical to data perturbation is the size of perturbation, as the perturbed data should resemble the original dataset. To account for the modeling uncertainty, we derive the optimal size of perturbation, which adapts to the data, the model space, and other relevant factors in the context of linear regression. On this basis, we develop an adaptive data-perturbation method that, unlike its nonadaptive counterpart, performs well in different situations. This leads to a data-adaptive model selection method. Both theoretical and numerical analyses suggest that the data-adaptive model selection method adapts to distinct situations in that it yields consistent model selection and optimal prediction, without knowing which situation exists a priori. The proposed method is applied to real data from the commodity market and outperforms its competitors in terms of price forecasting accuracy.

13.
We propose a penalized quantile regression for the partially linear varying coefficient (VC) model with longitudinal data to select relevant nonparametric and parametric components simultaneously. Selection consistency and the oracle property are established. Furthermore, if the linear part and the VC part are unknown, we propose a new unified method that can do three types of selections, including separation of varying and constant effects and selection of relevant variables, and that can be carried out conveniently in one step. Consistency in the three types of selections and the oracle property in estimation are established as well. Simulation studies and a real data analysis also confirm our method.

14.
In this paper, we consider the problem of variable selection for the partially varying coefficient single-index model, and present a regularized variable selection procedure that combines basis function approximations with the smoothly clipped absolute deviation penalty. The proposed procedure simultaneously selects significant variables in the single-index parametric components and the nonparametric coefficient function components. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Finite sample performance of the proposed method is illustrated by a simulation study and a real data analysis.

15.
This paper considers a linear regression model with regression parameter vector $\beta$. The parameter of interest is $\theta = a^{T}\beta$, where $a$ is specified. When, as a first step, a data-based variable selection (e.g. minimum Akaike information criterion) is used to select a model, it is common statistical practice to then carry out inference about $\theta$, using the same data, based on the (false) assumption that the selected model had been provided a priori. The paper considers a confidence interval for $\theta$ with nominal coverage $1 - \alpha$ constructed on this (false) assumption, and calls this the naive $1 - \alpha$ confidence interval. The minimum coverage probability of this confidence interval can be calculated for simple variable selection procedures involving only a single variable. However, the kinds of variable selection procedures used in practice are typically much more complicated. For the real-life data presented in this paper, there are 20 variables each of which is to be either included or not, leading to $2^{20}$ different models. The coverage probability at any given value of the parameters provides an upper bound on the minimum coverage probability of the naive confidence interval. This paper derives a new Monte Carlo simulation estimator of the coverage probability, which uses conditioning for variance reduction. For these real-life data, the gain in efficiency of this Monte Carlo simulation due to conditioning ranged from 2 to 6. The paper also presents a simple one-dimensional search strategy for parameter values at which the coverage probability is relatively small. For these real-life data, this search leads to parameter values for which the coverage probability of the naive 0.95 confidence interval is 0.79 for variable selection using the Akaike information criterion and 0.70 for variable selection using the Bayes information criterion, showing that these confidence intervals are completely inadequate.
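
The undercoverage reported in this abstract can be reproduced in miniature. The sketch below is a plain Monte Carlo estimate under strong simplifying assumptions chosen here, not the paper's conditioning-based estimator: two correlated regressors, known error variance 1, AIC choosing between the full model and the model without the second regressor, and a normal-quantile interval for the first coefficient. All variable names and settings are hypothetical.

```python
import numpy as np

n_models = 2 ** 20                         # 20 include/exclude choices give 1,048,576 models

rng = np.random.default_rng(0)
n, rho, reps, z = 50, 0.8, 20000, 1.96

# Fixed design with x1'x1 = x2'x2 = n and x1'x2 = n * rho (hypothetical setup).
q, _ = np.linalg.qr(rng.normal(size=(n, 2)))
x1 = np.sqrt(n) * q[:, 0]
x2 = np.sqrt(n) * (rho * q[:, 0] + np.sqrt(1 - rho**2) * q[:, 1])
X = np.column_stack([x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
se_full = np.sqrt(np.diag(XtX_inv))        # standard errors with known sigma^2 = 1
se_sub = 1 / np.sqrt(x1 @ x1)

beta = np.array([1.0, 1.4 * se_full[1]])   # theta = beta[0]; beta[1] is weak but nonzero
Y = X @ beta + rng.normal(size=(reps, n))  # one simulated data set per row

b_full = Y @ X @ XtX_inv                   # full-model OLS estimates, shape (reps, 2)
b_sub = Y @ x1 / (x1 @ x1)                 # estimate of theta when x2 is dropped

# With sigma^2 = 1 known, AIC keeps x2 exactly when its squared z-statistic exceeds 2.
keep_x2 = (b_full[:, 1] / se_full[1]) ** 2 > 2

# Naive interval: treat whichever model was selected as if it had been fixed in advance.
est = np.where(keep_x2, b_full[:, 0], b_sub)
se = np.where(keep_x2, se_full[0], se_sub)
coverage = np.mean(np.abs(est - beta[0]) <= z * se)
```

In this setup `coverage` comes out well below the nominal 0.95, because the interval from the smaller model is centred on a biased estimate whenever the weak regressor is dropped. The abstract's point is that with $2^{20}$ candidate models the worst case has to be searched for rather than enumerated.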

16.
Variable selection in the presence of grouped variables is troublesome for competing risks data: while some recent methods deal with group selection only, simultaneous selection of both groups and within-group variables remains largely unexplored. In this context, we propose an adaptive group bridge method, enabling simultaneous selection both within and between groups, for competing risks data. The adaptive group bridge is applicable to independent and clustered data. It also allows the number of variables to diverge as the sample size increases. We show that our new method possesses excellent asymptotic properties, including variable selection consistency at group and within-group levels. We also show superior performance in simulated and real data sets over several competing approaches, including group bridge, adaptive group lasso, and AIC/BIC-based methods.

17.
The single index model is a useful regression model. In this paper, we propose a nonconcave penalized least squares method to estimate both the parameters and the link function of the single index model. Compared to other variable selection and estimation methods, the proposed method can estimate parameters and select variables simultaneously. When the dimension of parameters in the single index model is a fixed constant, under some regularity conditions, we demonstrate that the proposed estimators for parameters have the so-called oracle property, and furthermore we establish the asymptotic normality and develop a sandwich formula to estimate the standard deviations of the proposed estimators. Simulation studies and a real data analysis are presented to illustrate the proposed methods.

18.
A regression model with skew-normal errors provides a useful extension of ordinary normal regression models when the data set under consideration involves asymmetric outcomes. Variable selection is an important issue in all regression analyses, and in this paper, we investigate simultaneous variable selection in joint location and scale models of the skew-normal distribution. We propose a unified penalized likelihood method which can simultaneously select significant variables in the location and scale models. Furthermore, the proposed variable selection method can simultaneously perform parameter estimation and variable selection in the location and scale models. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. Simulation studies and a real example are used to illustrate the proposed methodologies.

19.
Variable selection is one of the most important tasks in regression analysis, especially in a high-dimensional setting. In this paper, we study this problem in the context of the scalar response functional regression model, which is a linear model with a scalar response and functional regressors. The functional model can be represented by a certain multiple linear regression model via basis expansions of the functional variables. Based on this model and the random subspace method of Mielniczuk and Teisseyre (Comput Stat Data Anal 71:725–742, 2014), two simple variable selection procedures for the scalar response functional regression model are proposed. The final functional model is selected by using generalized information criteria. Monte Carlo simulation studies and a real data example show very satisfactory performance of the new variable selection methods under finite samples. Moreover, they suggest that the considered procedures outperform solutions found in the literature in terms of correctly selected model, false discovery rate control and prediction error.

20.
This paper presents the derivation of a categorical variable selection technique which utilizes the entropy function as a measure of variability for nominally scaled variables. The selection criterion uses likelihood ratio statistics which, for the hypotheses under consideration, are identical to minimum discrimination information statistics. Thus, the paper provides an alternative motivation for a selection technique based on discriminatory power, and it provides an extension of that technique to the multipopulation discrimination problem. The selection technique is illustrated for a study in which we discriminate among three populations: cervical cancer patients, population-based controls, and hospital-based controls.
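
The link this abstract draws between entropy and likelihood ratio statistics can be illustrated with a small sketch. It is not the paper's selection procedure: the function names are assumptions made here, and the only result relied on is the standard identity that the likelihood ratio (G) statistic for independence in a contingency table equals 2N times the empirical mutual information.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of the empirical distribution of a nominal variable."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mutual_information(x, g):
    """Reduction in entropy of x from knowing the population label g: H(x) + H(g) - H(x, g)."""
    return entropy(x) + entropy(g) - entropy(list(zip(x, g)))

def g_statistic(x, g):
    """Likelihood ratio statistic for independence of x and g, written as 2 * N * MI."""
    return 2 * len(x) * mutual_information(x, g)
```

A variable whose distribution differs strongly across populations has a large `g_statistic`, so ranking candidate variables by it is one simple way to rank them by discriminatory power. For example, `g_statistic(["a", "a", "b", "b"], [0, 1, 0, 1])` is zero up to rounding, because the variable is distributed identically in both populations.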

