期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Shrinkage estimation of varying covariate effects based on quantile regression

Limin Peng Jinfeng Xu Nancy Kutner 《Statistics and Computing》2014,24(5):853-869

Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l ₁-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. 相似文献

2.

On the $$L_p$$ norms of kernel regression estimators for incomplete data with applications to classification

Timothy Reese Majid Mojirsheibani 《Statistical Methods and Applications》2017,26(1):81-112

We consider kernel methods to construct nonparametric estimators of a regression function based on incomplete data. To tackle the presence of incomplete covariates, we employ Horvitz–Thompson-type inverse weighting techniques, where the weights are the selection probabilities. The unknown selection probabilities are themselves estimated using (1) kernel regression, when the functional form of these probabilities are completely unknown, and (2) the least-squares method, when the selection probabilities belong to a known class of candidate functions. To assess the overall performance of the proposed estimators, we establish exponential upper bounds on the $L_p$ norms, $1\le p<\infty $, of our estimators; these bounds immediately yield various strong convergence results. We also apply our results to deal with the important problem of statistical classification with partially observed covariates. 相似文献

3.

Penalized inverse probability weighted estimators for weighted rank regression with missing covariates

Hu Yang Jing Lv 《统计学通讯:理论与方法》2013,42(5):1388-1402

Abstract

In this article, we study the variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method for normal random errors and empirically shown to be much more efficient and robust with respect to heavy tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with the covariates missing, we consider the inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator by using estimated selection probability has a smaller asymptotic variance than that with true selection probability, thus is more efficient. Therefore, the important Horvitz-Thompson property is verified for penalized rank estimator with the covariates missing in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators. 相似文献

4.

Bayesian variable selection for multioutcome models through shared shrinkage

Debamita Kundu Riten Mitra Jeremy T. Gaskins 《Scandinavian Journal of Statistics》2021,48(1):295-320

Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector with a few nonzero components, those covariates that are most important. This article extends the “global‐local” shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. Here, we have developed a variable selection method for a K‐outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for all regression coefficients is a mean zero normal with coefficient‐specific variance term that consists of a predictor‐specific factor (shared local shrinkage parameter) and a model‐specific factor (global shrinkage term) that differs in each model. The performance of our modeling approach is evaluated through simulation studies and a data example. 相似文献

5.

Bayesian variable selection in Poisson change-point regression analysis

S. Min 《统计学通讯:模拟与计算》2017,46(3):2267-2282

In this article, we develop a Bayesian variable selection method that concerns selection of covariates in the Poisson change-point regression model with both discrete and continuous candidate covariates. Ranging from a null model with no selected covariates to a full model including all covariates, the Bayesian variable selection method searches the entire model space, estimates posterior inclusion probabilities of covariates, and obtains model averaged estimates on coefficients to covariates, while simultaneously estimating a time-varying baseline rate due to change-points. For posterior computation, the Metropolis-Hastings within partially collapsed Gibbs sampler is developed to efficiently fit the Poisson change-point regression model with variable selection. We illustrate the proposed method using simulated and real datasets. 相似文献

6.

On the correct regression function (in L₂) and its applications when the dimension of the covariate vector is random

Majid Mojirsheibani 《Journal of statistical planning and inference》2012

We derive the optimal regression function (i.e., the best approximation in the L₂ sense) when the vector of covariates has a random dimension. Furthermore, we consider applications of these results to problems in statistical regression and classification with missing covariates. It will be seen, perhaps surprisingly, that the correct regression function for the case with missing covariates can sometimes perform better than the usual regression function corresponding to the case with no missing covariates. This is because even if some of the covariates are missing, an indicator random variable δ

δ

, which is always observable, and is equal to 1 if there are no missing values (and 0 otherwise), may have far more information and predictive power about the response variable Y than the missing covariates do. We also propose kernel-based procedures for estimating the correct regression function nonparametrically. As an alternative estimation procedure, we also consider the least-squares method. 相似文献

7.

Bias correction in logistic regression with missing categorical covariates

Ujjwal Das Tapabrata Maiti Vivek Pradhan 《Journal of statistical planning and inference》2010

Logistic regression plays an important role in many fields. In practice, we often encounter missing covariates in different applied sectors, particularly in biomedical sciences. Ibrahim (1990) proposed a method to handle missing covariates in generalized linear model (GLM) setup. It is well known that logistic regression estimates using small or medium sized missing data are biased. Considering the missing data that are missing at random, in this paper we have reduced the bias by two methods; first we have derived a closed form bias expression using Cox and Snell (1968), and second we have used likelihood based modification similar to Firth (1993). Here we have analytically shown that the Firth type likelihood modification in Ibrahim led to the second order bias reduction. The proposed methods are simple to apply on an existing method, need no analytical work, with the exception of a little change in the optimization function. We have carried out extensive simulation studies comparing the methods, and our simulation results are also supported by a real world data. 相似文献

8.

A beta-inflated mean regression model with mixed effects for fractional response variables

Renzo Fernández Cristian L. Bayes Luis Valdivieso 《Journal of Statistical Computation and Simulation》2018,88(10):1936-1957

Abstract

In this article we propose a new mixed-effects regression model for fractional bounded response variables. Our model allows us to incorporate covariates directly to the expected value, so we can quantify exactly the influence of these covariates in the mean of the variable of interest rather than to the conditional mean. Estimation is carried out from a Bayesian perspective. Due to the complexity of the augmented posterior distribution, we use a Hamiltonian Monte Carlo algorithm, the No-U-Turn sampler, implemented using the Stan software. A simulation study was performed showing that our model has a better performance than other traditional longitudinal models for bounded variables. Finally, we applied our beta-inflated mean mixed-effects regression model to real data which consists of utilization of credit lines in the peruvian financial system. 相似文献

9.

Model selection of generalized partially linear models with missing covariates

Ying-Zi FuXue-Dong Chen 《Journal of statistical planning and inference》2012,142(1):126-138

In this paper, a generalized partially linear model (GPLM) with missing covariates is studied and a Monte Carlo EM (MCEM) algorithm with penalized-spline (P-spline) technique is developed to estimate the regression coefficients and nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for our considered models with incomplete data, some new model selection criterions for GPLMs with missing covariates are proposed under two different missingness mechanism, say, missing at random (MAR) and missing not at random (MNAR). The most attractive point of our method is that it is rather general and can be extended to various situations with missing observations based on EM algorithm, especially when no missing data involved, our new model selection criterions are reduced to classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also can compare missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criterions are investigated. A simulation study and a real example are used to illustrate the proposed methodology. 相似文献

10.

Asymptotic Normality of M-Estimators for Varying Coefficient Models with Longitudinal Data

Tang Qingguo Cheng Longsheng 《统计学通讯:理论与方法》2013,42(9):1422-1440

This article considers a nonparametric varying coefficient regression model with longitudinal observations. The relationship between the dependent variable and the covariates is assumed to be linear at a specific time point, but the coefficients are allowed to change over time. A general formulation is used to treat mean regression, median regression, quantile regression, and robust mean regression in one setting. The local M-estimators of the unknown coefficient functions are obtained by local linear method. The asymptotic distributions of M-estimators of unknown coefficient functions at both interior and boundary points are established. Various applications of the main results, including estimating conditional quantile coefficient functions and robustifying the mean regression coefficient functions are derived. Finite sample properties of our procedures are studied through Monte Carlo simulations. 相似文献

11.

Statistical inference for homologous gene pairs between two circular genomes: a new circular–circular regression model

Ashis SenGupta Sungsu Kim 《Statistical Methods and Applications》2016,25(3):421-432

In this paper, we investigate the problem of determining the relationship, represented by similarity of the homologous gene configuration, between paired circular genomes using a regression analysis. We propose a new regression model for studying two circular genomes, where the Möbius transformation naturally arises and is taken as the link function, and propose the least circular distance estimation method, as an appropriate method for analyzing circular variables. The main utility of the new regression model is in identification of a new angular location of one of a homologous gene pair between two circular genomes, for various types of possible gene mutations, given that of the other gene. Furthermore, we demonstrate the utility of our new regression model for grouping of various genomes based on closeness of their relationship. Using angular locations of homologous genes from the five pairs of circular genomes (Horimoto et al. in Bioinformatics 14:789–802, 1998), the new model is compared with the existing models. 相似文献

12.

Sparsity identification in ultra-high dimensional quantile regression models with longitudinal data

Xianli Gao 《统计学通讯:理论与方法》2020,49(19):4712-4736

Abstract

In this paper, we propose a variable selection method for quantile regression model in ultra-high dimensional longitudinal data called as the weighted adaptive robust lasso (WAR-Lasso) which is double-robustness. We derive the consistency and the model selection oracle property of WAR-Lasso. Simulation studies show the double-robustness of WAR-Lasso in both cases of heavy-tailed distribution of the errors and the heavy contaminations of the covariates. WAR-Lasso outperform other methods such as SCAD and etc. A real data analysis is carried out. It shows that WAR-Lasso tends to select fewer variables and the estimated coefficients are in line with economic significance. 相似文献

13.

Penalized least-squares estimation for regression coefficients in high-dimensional partially linear models

Huey-Fan Ni 《Journal of statistical planning and inference》2012,142(2):379-389

We consider a partially linear model with diverging number of groups of parameters in the parametric component. The variable selection and estimation of regression coefficients are achieved simultaneously by using the suitable penalty function for covariates in the parametric component. An MM-type algorithm for estimating parameters without inverting a high-dimensional matrix is proposed. The consistency and sparsity of penalized least-squares estimators of regression coefficients are discussed under the setting of some nonzero regression coefficients with very small values. It is found that the root p_n/n-consistency and sparsity of the penalized least-squares estimators of regression coefficients cannot be given consideration simultaneously when the number of nonzero regression coefficients with very small values is unknown, where p_n and n, respectively, denote the number of regression coefficients and sample size. The finite sample behaviors of penalized least-squares estimators of regression coefficients and the performance of the proposed algorithm are studied by simulation studies and a real data example. 相似文献

14.

Large sample properties of some survival estimators in heterogeneous samples

《Journal of statistical planning and inference》1997,60(1):123-138

相似文献

15.

Extensions of stability selection using subsamples of observations and covariates

Andre Beinrucker Ürün Dogan Gilles Blanchard 《Statistics and Computing》2016,26(5):1059-1077

We introduce extensions of stability selection, a method to stabilise variable selection methods introduced by Meinshausen and Bühlmann (J R Stat Soc 72:417–473, 2010). We propose to apply a base selection method repeatedly to random subsamples of observations and subsets of covariates under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalizes the theoretical results of Meinshausen and Bühlmann (J R Stat Soc 72:417–473, 2010) from the case of half-samples to subsamples of arbitrary size. We study, in a theoretical manner, the effect of taking random covariate subsets using a simplified score model. Finally we validate these extensions on numerical experiments on both synthetic and real datasets, and compare the obtained results in detail to the original stability selection method. 相似文献

16.

Model selection with data-oriented penalty

《Journal of statistical planning and inference》1999,77(1):103-117

We consider the problem of model (or variable) selection in the classical regression model using the GIC (general information criterion). In this method the maximum likelihood is used with a penalty function denoted by C_n, depending on the sample size n and chosen to ensure consistency in the selection of the true model. There are various choices of C_n suggested in the literature on model selection. In this paper we show that a particular choice of C_n based on observed data, which makes it random, preserves the consistency property and provides improved performance over a fixed choice of C_n. 相似文献

17.

Modeling association between multivariate correlated outcomes and high-dimensional sparse covariates: the adaptive SVS method

J. Pecanka A. W. van der Vaart 《Journal of applied statistics》2019,46(5):893-913

The problem of modeling the relationship between a set of covariates and a multivariate response with correlated components often arises in many areas of research such as genetics, psychometrics, signal processing. In the linear regression framework, such task can be addressed using a number of existing methods. In the high-dimensional sparse setting, most of these methods rely on the idea of penalization in order to efficiently estimate the regression matrix. Examples of such methods include the lasso, the group lasso, the adaptive group lasso or the simultaneous variable selection (SVS) method. Crucially, a suitably chosen penalty also allows for an efficient exploitation of the correlation structure within the multivariate response. In this paper we introduce a novel variant of such method called the adaptive SVS, which is closely linked with the adaptive group lasso. Via a simulation study we investigate its performance in the high-dimensional sparse regression setting. We provide a comparison with a number of other popular methods under different scenarios and show that the adaptive SVS is a powerful tool for efficient recovery of signal in such setting. The methods are applied to genetic data. 相似文献

18.

Covariate selection for accelerated failure time data

Ujjwal Das Nader Ebrahimi 《统计学通讯:理论与方法》2017,46(8):4051-4064

Selection of appropriate predictors for right censored time to event data is very often encountered by the practitioners. We consider the ?₁ penalized regression or “least absolute shrinkage and selection operator” as a tool for predictor selection in association with accelerated failure time model. The choice of the penalizing parameter λ is crucial to identify the correct set of covariates. In this paper, we propose an information theory-based method to choose λ under log-normal distribution. Furthermore, an efficient algorithm is discussed in the same context. The performance of the proposed λ and the algorithm is illustrated through simulation studies and a real data analysis. The convergence of the algorithm is also discussed. 相似文献

19.

Using the EM algorithm for Bayesian variable selection in logistic regression models with related covariates

M. D. Koslovsky M. D. Swartz L. Leon-Novelo W. Chan A. V. Wilkinson 《Journal of Statistical Computation and Simulation》2018,88(3):575-596

We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents. 相似文献

20.

Model-free feature screening for ultrahigh dimensional censored regression

Tingyou Zhou Liping Zhu 《Statistics and Computing》2017,27(4):947-961

In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model mis-specification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is lack of information for the underlying regression structure. The cSIRS procedure is also robust in the presence of outliers or extreme values as it merely uses the rank of the censored response variable. We establish both the sure screening and the ranking consistency properties for the cSIRS procedure when the number of covariates p satisfies $p=o\{\exp (an)\}$, where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set. 相似文献