Similar Articles
20 similar articles found.
1.
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The possibility that covariate effects are not constant calls for a new perspective on variable selection, which, under the assumed quantile regression model, is to retain both the variables that have effects on all quantiles of interest and those that influence only some of the quantiles considered. Current work on \(\ell_1\)-penalized quantile regression either does not address varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach that adopts a novel uniform adaptive LASSO penalty. The new approach is easy to implement and does not require smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals.
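
A minimal sketch of the building block behind such methods, assuming a linear quantile model at one fixed level tau: the quantile check loss plus a standard adaptive LASSO penalty whose weights come from a pilot estimate. The function names and the pilot estimate `beta_init` are illustrative assumptions. This is not the authors' uniform adaptive LASSO penalty, whose exact form is not given in the abstract.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def adaptive_lasso_qr_objective(beta, X, y, tau, lam, beta_init, gamma=1.0):
    """sum_i rho_tau(y_i - x_i'beta) + lam * sum_j |beta_j| / |beta_init_j|**gamma.

    beta_init is a hypothetical pilot estimate (e.g. unpenalized quantile
    regression at tau); its entries must be nonzero for the weights to exist.
    """
    loss = 0.0
    for xi, yi in zip(X, y):
        fitted = sum(b * v for b, v in zip(beta, xi))
        loss += check_loss(yi - fitted, tau)
    # Adaptive weights: coefficients with small pilot values are penalized more
    penalty = sum(abs(b) / abs(b0) ** gamma for b, b0 in zip(beta, beta_init))
    return loss + lam * penalty
```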

2.
We consider kernel methods to construct nonparametric estimators of a regression function based on incomplete data. To tackle the presence of incomplete covariates, we employ Horvitz–Thompson-type inverse weighting techniques, where the weights are the selection probabilities. The unknown selection probabilities are themselves estimated using (1) kernel regression, when their functional form is completely unknown, and (2) the least-squares method, when the selection probabilities belong to a known class of candidate functions. To assess the overall performance of the proposed estimators, we establish exponential upper bounds on the \(L_p\) norms, \(1\le p<\infty \), of our estimators; these bounds immediately yield various strong convergence results. We also apply our results to the important problem of statistical classification with partially observed covariates.
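
To illustrate the Horvitz–Thompson-type weighting, the following sketch computes an inverse-weighted Nadaraya-Watson estimate at a point `x0`. It treats the selection probabilities `pi` as already available, either known or estimated beforehand. The abstract estimates them by kernel regression or least squares, and those estimation steps are not shown. A Gaussian kernel and a scalar covariate are further assumptions made for brevity. Incomplete cases (`delta[i] == 0`) receive zero weight.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ipw_nadaraya_watson(x0, X, Y, delta, pi, h):
    """Inverse-probability-weighted kernel regression estimate of E[Y | X = x0].

    delta[i] = 1 if X[i] is observed (X[i] may be None otherwise);
    pi[i] is the (known or estimated) selection probability for case i.
    """
    num = den = 0.0
    for xi, yi, di, pi_i in zip(X, Y, delta, pi):
        if di == 0:
            continue  # incomplete case: weight delta_i / pi_i = 0
        w = gaussian_kernel((x0 - xi) / h) / pi_i
        num += w * yi
        den += w
    if den == 0.0:
        raise ValueError("no complete case carries positive kernel weight")
    return num / den
```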

3.

In this article, we study variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method under normal random errors, and it is empirically shown to be much more efficient and robust under heavy-tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with the missing covariates, we consider inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator using the estimated selection probability has a smaller asymptotic variance than the one using the true selection probability and is thus more efficient. Therefore, the important Horvitz-Thompson property is verified for the penalized rank estimator with missing covariates in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.
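
For reference, the SCAD penalty used for sparsity can be written piecewise. This sketch uses the common default a = 3.7, which is an assumption because the abstract does not state the value. It covers only the penalty, not the rank-based loss or the inverse probability weighting.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|) with shape parameter a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # lasso-like near zero
    if t <= a * lam:
        # quadratic transition between the lasso and constant pieces
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0  # constant: large coefficients are not shrunk further
```

The three pieces join continuously at |theta| = lam and |theta| = a * lam. Because the penalty is flat beyond a * lam, large coefficients are left nearly unbiased, which is what supports the oracle property.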

4.
Variable selection over a potentially large set of covariates in a linear model is a popular problem. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector whose few nonzero components correspond to the most important covariates. This article extends the “global-local” shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. We develop a variable selection method for a K-outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for each regression coefficient is a mean-zero normal whose coefficient-specific variance consists of a predictor-specific factor (a shared local shrinkage parameter) and a model-specific factor (a global shrinkage term) that differs across models. The performance of our modeling approach is evaluated through simulation studies and a data example.
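
A minimal sketch of the prior structure described above. It draws a p x K coefficient matrix with beta_jk ~ N(0, (lambda_j * tau_k)^2), assuming the predictor-specific and model-specific factors enter multiplicatively. Here lambda_j is the shared local scale and tau_k is the global scale for outcome k. The half-Cauchy distributions for both scales are a further assumption borrowed from horseshoe-type priors; the abstract does not state them.

```python
import math
import random


def half_cauchy(rng):
    """Draw from a standard half-Cauchy via the inverse CDF of Cauchy(0, 1)."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))


def draw_coefficients(p, K, seed=0):
    """Draw a p x K coefficient matrix with beta_jk ~ N(0, (lambda_j * tau_k)^2).

    lambda_j: local scale shared by predictor j across all K outcomes (assumed half-Cauchy).
    tau_k: global scale specific to outcome model k (assumed half-Cauchy).
    """
    rng = random.Random(seed)
    local = [half_cauchy(rng) for _ in range(p)]
    global_scale = [half_cauchy(rng) for _ in range(K)]
    B = [[rng.gauss(0.0, local[j] * global_scale[k]) for k in range(K)] for j in range(p)]
    return B, local, global_scale
```

Because lambda_j multiplies every entry in row j, a small local scale shrinks that predictor's coefficients in all K outcome models together. This shared shrinkage is what lets the prior flag covariates that matter across outcomes.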

5.
In this article, we develop a Bayesian variable selection method for selecting covariates in the Poisson change-point regression model with both discrete and continuous candidate covariates. Considering models ranging from a null model with no selected covariates to a full model including all covariates, the method searches the entire model space, estimates posterior inclusion probabilities of covariates, and obtains model-averaged estimates of the covariate coefficients, while simultaneously estimating a time-varying baseline rate due to change-points. For posterior computation, we develop a Metropolis-Hastings within partially collapsed Gibbs sampler to efficiently fit the Poisson change-point regression model with variable selection. We illustrate the proposed method using simulated and real datasets.
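
The likelihood that such a sampler targets can be sketched as follows. The sketch assumes a log link, a regime-specific intercept alpha_s for the baseline rate, and known change-point locations, so that y_t ~ Poisson(exp(alpha_{s_t} + x_t'beta)). All names are hypothetical. The Bayesian sampler itself, which would also sample the change-points and the inclusion indicators, is not shown.

```python
import math


def poisson_changepoint_loglik(y, X, beta, alphas, change_points):
    """Log-likelihood of y_t ~ Poisson(exp(alpha_{s_t} + x_t'beta)).

    change_points: sorted time indices at which a new regime starts,
    so len(change_points) == len(alphas) - 1.
    """
    ll = 0.0
    regime = 0
    for t, (yt, xt) in enumerate(zip(y, X)):
        # advance the regime index once t reaches the next change-point
        while regime < len(change_points) and t >= change_points[regime]:
            regime += 1
        log_mu = alphas[regime] + sum(b * v for b, v in zip(beta, xt))
        ll += yt * log_mu - math.exp(log_mu) - math.lgamma(yt + 1)
    return ll
```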

6.
We derive the optimal regression function (i.e., the best approximation in the \(L_2\) sense) when the vector of covariates has a random dimension. Furthermore, we apply these results to problems in statistical regression and classification with missing covariates. It will be seen, perhaps surprisingly, that the correct regression function for the case with missing covariates can sometimes perform better than the usual regression function for the case with no missing covariates. This is because, even if some of the covariates are missing, an indicator random variable δ, which is always observable and equals 1 if there are no missing values (and 0 otherwise), may carry far more information and predictive power about the response variable Y than the missing covariates do. We also propose kernel-based procedures for estimating the correct regression function nonparametrically. As an alternative estimation procedure, we also consider the least-squares method.

7.
Logistic regression plays an important role in many fields. In practice, we often encounter missing covariates in various applied areas, particularly in the biomedical sciences. Ibrahim (1990) proposed a method to handle missing covariates in the generalized linear model (GLM) setup. It is well known that logistic regression estimates are biased when based on small or medium-sized samples with missing data. Assuming that the data are missing at random, in this paper we reduce the bias by two methods: first, we derive a closed-form bias expression using Cox and Snell (1968), and second, we use a likelihood-based modification similar to Firth (1993). We show analytically that the Firth-type likelihood modification in Ibrahim's method leads to second-order bias reduction. The proposed methods are simple to apply to an existing method and require no analytical work beyond a small change to the optimization function. We have carried out extensive simulation studies comparing the methods, and our simulation results are also supported by a real-world data analysis.
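
As a complete-data reference point, Firth's modification for logistic regression adds h_i * (1/2 - p_i) to each score contribution. Here h_i is the i-th diagonal element of the weighted hat matrix. The sketch below fits an intercept and one covariate by modified Fisher scoring. It does not handle missing covariates or Ibrahim's method, and the function name is illustrative.

```python
import math


def firth_logistic_1d(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression with an intercept and one covariate.

    Modified Fisher scoring: beta <- beta + I(beta)^{-1} U*(beta), where
    U*_r = sum_i (y_i - p_i + h_i * (0.5 - p_i)) * z_ir and z_i = (1, x_i).
    Complete data only; no missing-covariate handling.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information Z'WZ (2 x 2, symmetric)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse information
        # Diagonal of the hat matrix W^{1/2} Z (Z'WZ)^{-1} Z' W^{1/2}
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified score contributions
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        step0 = v00 * u0 + v01 * u1
        step1 = v01 * u0 + v11 * u1
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1
```

Completely separated data illustrate the modification, for example x = [-2, -1, 1, 2] with y = [0, 0, 1, 1]. There the ordinary maximum likelihood slope diverges, whereas this iteration settles at a finite slope. Production implementations usually add step-halving for robustness.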

8.

In this article we propose a new mixed-effects regression model for fractional bounded response variables. Our model allows us to incorporate covariates directly into the expected value, so we can quantify exactly the influence of these covariates on the mean of the variable of interest rather than on the conditional mean. Estimation is carried out from a Bayesian perspective. Due to the complexity of the augmented posterior distribution, we use a Hamiltonian Monte Carlo algorithm, the No-U-Turn sampler, implemented in the Stan software. A simulation study shows that our model performs better than other traditional longitudinal models for bounded variables. Finally, we apply our beta-inflated mean mixed-effects regression model to real data on the utilization of credit lines in the Peruvian financial system.

9.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with the penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function, respectively. Because classical model selection procedures such as Akaike's information criterion become invalid for these models with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). The most attractive feature of our method is its generality: it can be extended to various situations with missing observations based on the EM algorithm, and, in particular, when no data are missing, the new model selection criteria reduce to the classical AIC. Therefore, we can compare not only models with missing observations under MAR/MNAR settings, but also missing-data models with complete-data models. Theoretical properties of the proposed estimator, including the consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

10.
This article considers a nonparametric varying coefficient regression model with longitudinal observations. The relationship between the dependent variable and the covariates is assumed to be linear at each specific time point, but the coefficients are allowed to change over time. A general formulation is used to treat mean regression, median regression, quantile regression, and robust mean regression in one setting. The local M-estimators of the unknown coefficient functions are obtained by the local linear method. The asymptotic distributions of these M-estimators at both interior and boundary points are established. Various applications of the main results are derived, including estimating conditional quantile coefficient functions and robustifying the mean regression coefficient functions. Finite sample properties of our procedures are studied through Monte Carlo simulations.
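
In the least-squares (mean regression) special case, the local linear estimate of beta(t0) in y = beta(t) * x + e has a closed form. The sketch below assumes a scalar covariate, an Epanechnikov kernel, and a fixed bandwidth `h`. It also ignores the within-subject correlation of longitudinal data. Median, quantile, or robust versions would replace the squared loss and generally need iterative fitting.

```python
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_coef(t0, t, x, y, h):
    """Local linear least-squares estimate of beta(t0) in y = beta(t) * x + e.

    Minimizes sum_i K((t_i - t0)/h) * (y_i - x_i * (a + b * (t_i - t0)))**2
    over (a, b) and returns a, the estimate of beta(t0).
    """
    s00 = s01 = s11 = r0 = r1 = 0.0
    for ti, xi, yi in zip(t, x, y):
        w = epanechnikov((ti - t0) / h)
        z0, z1 = xi, xi * (ti - t0)  # local regressors for a and b
        s00 += w * z0 * z0
        s01 += w * z0 * z1
        s11 += w * z1 * z1
        r0 += w * z0 * yi
        r1 += w * z1 * yi
    det = s00 * s11 - s01 * s01
    # first component of the 2 x 2 weighted normal-equation solution
    return (s11 * r0 - s01 * r1) / det
```

Because the fit is locally linear in t, it reproduces a coefficient function that is exactly linear in time without bias. This is one reason the local linear method behaves well at boundary points.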

11.
In this paper, we use regression analysis to investigate the relationship between paired circular genomes, represented by the similarity of their homologous gene configurations. We propose a new regression model for two circular genomes, in which the Möbius transformation arises naturally and is taken as the link function, and we propose the least circular distance estimation method as an appropriate method for analyzing circular variables. The main utility of the new regression model lies in identifying the new angular location of one gene in a homologous gene pair between two circular genomes, given the location of the other gene, for various types of possible gene mutations. Furthermore, we demonstrate the utility of the new model for grouping genomes based on the closeness of their relationships. Using angular locations of homologous genes from five pairs of circular genomes (Horimoto et al. in Bioinformatics 14:789–802, 1998), the new model is compared with existing models.
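
To make the link function concrete, the sketch maps an angle through a Möbius transformation that preserves the unit circle, z -> beta0 * (z + beta1) / (1 + conj(beta1) * z) with |beta0| = 1 and |beta1| < 1. It then evaluates a least-circular-distance loss, sum_i (1 - cos(y_i - fitted_i)). Both this parameterization and the cosine-based distance are assumptions, and the paper's exact model may differ. Only the loss is shown; minimizing it over (beta0, beta1) would give the estimator.

```python
import cmath
import math


def mobius_on_circle(theta, beta0, beta1):
    """Map an angle through z -> beta0 * (z + beta1) / (1 + conj(beta1) * z).

    Requires |beta0| == 1 and |beta1| < 1 so that the unit circle maps onto
    itself; returns the image angle in [0, 2*pi).
    """
    z = cmath.exp(1j * theta)
    w = beta0 * (z + beta1) / (1 + beta1.conjugate() * z)
    return cmath.phase(w) % (2 * math.pi)


def circular_distance_loss(xs, ys, beta0, beta1):
    """Sum of 1 - cos(observed angle - fitted angle) over all pairs."""
    return sum(1.0 - math.cos(y - mobius_on_circle(x, beta0, beta1)) for x, y in zip(xs, ys))
```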

12.

In this paper, we propose a variable selection method for the quantile regression model with ultra-high dimensional longitudinal data, called the weighted adaptive robust lasso (WAR-Lasso), which is doubly robust. We derive the consistency and the model selection oracle property of WAR-Lasso. Simulation studies demonstrate the double robustness of WAR-Lasso under both heavy-tailed error distributions and heavy contamination of the covariates. WAR-Lasso outperforms other methods such as SCAD. A real data analysis is also carried out; it shows that WAR-Lasso tends to select fewer variables and that the estimated coefficients are in line with economic significance.

13.
We consider a partially linear model with a diverging number of groups of parameters in the parametric component. Variable selection and estimation of the regression coefficients are achieved simultaneously by applying a suitable penalty function to the covariates in the parametric component. An MM-type algorithm that estimates the parameters without inverting a high-dimensional matrix is proposed. The consistency and sparsity of the penalized least-squares estimators of the regression coefficients are discussed in a setting where some nonzero regression coefficients have very small values. It is found that \(\sqrt{p_n/n}\)-consistency and sparsity of the penalized least-squares estimators cannot be achieved simultaneously when the number of nonzero regression coefficients with very small values is unknown, where \(p_n\) and \(n\) denote the number of regression coefficients and the sample size, respectively. The finite sample behavior of the penalized least-squares estimators and the performance of the proposed algorithm are studied through simulations and a real data example.

14.
15.
We introduce extensions of stability selection, a method for stabilising variable selection methods introduced by Meinshausen and Bühlmann (J R Stat Soc 72:417–473, 2010). We propose to apply a base selection method repeatedly to random subsamples of observations and to random subsets of the covariates under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalizes the theoretical results of Meinshausen and Bühlmann (J R Stat Soc 72:417–473, 2010) from the case of half-samples to subsamples of arbitrary size. We study theoretically the effect of taking random covariate subsets using a simplified score model. Finally, we validate these extensions in numerical experiments on both synthetic and real datasets, and compare the results in detail with the original stability selection method.
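
The extension described above can be sketched as a loop over `B` rounds. Each round draws a random subsample of `n_sub` observations and a random subset of `p_sub` covariates, runs a base selector on them, and records which covariates it selects. Several details are hypothetical choices for illustration: the base selector (top-`q` covariates by absolute correlation), normalizing frequencies by `B`, and the threshold of 0.6.

```python
import random


def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return abs(cov) / (va * vb) ** 0.5


def top_q_selector(cols_data, y, cols, q):
    """Hypothetical base selector: the q covariates in `cols` most correlated with y."""
    ranked = sorted(cols, key=lambda j: abs_corr(cols_data[j], y), reverse=True)
    return set(ranked[:q])


def stability_selection(X_cols, y, n_sub, p_sub, q, B=100, threshold=0.6, seed=0):
    """X_cols[j] holds covariate j; returns (selected set, selection frequencies)."""
    rng = random.Random(seed)
    n, p = len(y), len(X_cols)
    counts = [0] * p
    for _ in range(B):
        rows = rng.sample(range(n), n_sub)  # random subsample of observations
        cols = rng.sample(range(p), p_sub)  # random subset of covariates
        sub = {j: [X_cols[j][i] for i in rows] for j in cols}
        y_sub = [y[i] for i in rows]
        for j in top_q_selector(sub, y_sub, cols, q):
            counts[j] += 1
    freq = [c / B for c in counts]
    return {j for j in range(p) if freq[j] >= threshold}, freq
```

With `n_sub = n // 2` and `p_sub = p`, the loop corresponds to the half-sample setting that the abstract says the analysis generalizes.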

16.
We consider the problem of model (or variable) selection in the classical regression model using the GIC (general information criterion). In this method, the maximized likelihood is penalized by a term governed by \(C_n\), which depends on the sample size \(n\) and is chosen to ensure consistent selection of the true model. Various choices of \(C_n\) have been suggested in the literature on model selection. In this paper we show that a particular data-based choice of \(C_n\), which is therefore random, preserves the consistency property and improves performance over a fixed choice of \(C_n\).
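
With Gaussian errors, -2 times the maximized log-likelihood equals n * log(RSS/n) up to an additive constant. The GIC for a model with k parameters can therefore be sketched as n * log(RSS/n) + C_n * k. The sketch accepts any fixed value of \(C_n\), such as \(C_n = 2\) (AIC) or \(C_n = \log n\) (BIC). The paper's data-driven choice of \(C_n\) is not reproduced, and the candidate-model dictionary is an illustrative assumption.

```python
import math


def gic(rss, n, k, c_n):
    """GIC = n * log(RSS / n) + c_n * k for a Gaussian linear model (constants dropped)."""
    return n * math.log(rss / n) + c_n * k


def select_model(candidates, n, c_n):
    """candidates: dict mapping model name -> (rss, number_of_parameters)."""
    return min(candidates, key=lambda m: gic(candidates[m][0], n, candidates[m][1], c_n))
```

Consider n = 100, a two-parameter model with RSS 120, and a five-parameter model with RSS 110. With \(C_n = 2\) the rule selects the larger model, while \(C_n = \log n\) selects the smaller one. This shows how a heavier penalty favours parsimony.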

17.
The problem of modeling the relationship between a set of covariates and a multivariate response with correlated components often arises in many areas of research, such as genetics, psychometrics and signal processing. In the linear regression framework, such a task can be addressed using a number of existing methods. In the high-dimensional sparse setting, most of these methods rely on penalization to efficiently estimate the regression matrix. Examples include the lasso, the group lasso, the adaptive group lasso and the simultaneous variable selection (SVS) method. Crucially, a suitably chosen penalty also allows efficient exploitation of the correlation structure within the multivariate response. In this paper we introduce a novel variant of such methods, called the adaptive SVS, which is closely linked to the adaptive group lasso. Via a simulation study, we investigate its performance in the high-dimensional sparse regression setting. We compare it with a number of other popular methods under different scenarios and show that the adaptive SVS is a powerful tool for efficient signal recovery in this setting. The methods are applied to genetic data.
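
The penalty that the adaptive group lasso places on a regression matrix can be sketched with one row per covariate and one column per response. A zero row removes that covariate from every response at once. The pilot estimate `B_init` that supplies the weights and the exponent `gamma` are illustrative assumptions. This is not the adaptive SVS penalty itself.

```python
def adaptive_group_lasso_penalty(B, B_init, lam, gamma=1.0):
    """lam * sum_j ||B_j||_2 / ||B_init_j||_2**gamma over rows B_j of B.

    Rows index covariates and columns index responses; B_init is a
    hypothetical pilot estimate whose rows must have nonzero norm.
    """
    total = 0.0
    for row, row0 in zip(B, B_init):
        norm = sum(b * b for b in row) ** 0.5     # joint size of covariate j across responses
        norm0 = sum(b * b for b in row0) ** 0.5   # pilot size, used for the adaptive weight
        total += norm / norm0 ** gamma
    return lam * total
```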

18.
Practitioners very often need to select appropriate predictors for right-censored time-to-event data. We consider \(\ell_1\)-penalized regression, or the “least absolute shrinkage and selection operator” (LASSO), as a tool for predictor selection with the accelerated failure time model. The choice of the penalty parameter λ is crucial for identifying the correct set of covariates. In this paper, we propose an information theory-based method to choose λ under a log-normal distribution. Furthermore, an efficient algorithm is discussed in the same context. The performance of the proposed λ and the algorithm is illustrated through simulation studies and a real data analysis. The convergence of the algorithm is also discussed.

19.
We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.

20.
In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model misspecification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is a lack of information about the underlying regression structure. The cSIRS procedure is also robust to outliers or extreme values, as it uses only the ranks of the censored response variable. We establish both the sure screening and the ranking consistency properties of the cSIRS procedure when the number of covariates p satisfies \(p=o\{\exp (an)\}\), where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号