Similar Documents
20 similar documents found.
1.
One of the standard problems in statistics is determining the relationship between a response variable and a single predictor variable through a regression function. Background scientific knowledge often suggests that the regression function should have a certain shape (e.g. monotonically increasing or concave) without prescribing a specific parametric form. Bernstein polynomials have been used to impose such shape restrictions on regression functions; they provide a smooth estimate over equidistant knots and are used in this paper for their ease of implementation, continuous differentiability, and attractive theoretical properties. In this work, we demonstrate a connection between the monotonic regression problem and the variable selection problem in the linear model. We develop a Bayesian procedure for fitting the monotonic regression model by adapting currently available variable selection procedures, and we demonstrate its effectiveness through simulations and the analysis of real data.
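The paper's procedure is Bayesian and built on variable-selection machinery; as a minimal non-Bayesian sketch of the shape constraint itself, the fit below expands f in a Bernstein basis, f(x) = sum_k beta_k B_{k,N}(x), and enforces monotonicity by writing the coefficients as cumulative sums of nonnegative increments, solved with nonnegative least squares. The degree (8) and the simulated data are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.special import comb
from scipy.optimize import nnls

def bernstein_design(x, degree):
    """Bernstein basis B_{k,N}(x) evaluated at points x in [0, 1]."""
    k = np.arange(degree + 1)
    return comb(degree, k) * x[:, None] ** k * (1 - x[:, None]) ** (degree - k)

def fit_monotone(x, y, degree=8):
    B = bernstein_design(x, degree)
    # beta_k = sum_{j<=k} gamma_j with gamma_j >= 0 makes beta nondecreasing,
    # hence f monotonically increasing; L maps increments gamma to beta.
    L = np.tril(np.ones((degree + 1, degree + 1)))
    offset = y.min()                          # shift so the base level is >= 0
    gamma, _ = nnls(B @ L, y - offset)        # nonnegative least squares
    beta = L @ gamma
    return lambda xnew: bernstein_design(xnew, degree) @ beta + offset

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = np.log1p(5 * x) + rng.normal(scale=0.1, size=200)  # increasing truth
f_hat = fit_monotone(x, y)
```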

2.
Hea-Jung Kim & Taeyoung Roh (2013). Statistics, 47(5), 1082–1111
In regression analysis, a sample selection scheme often applies to the response variable, resulting in observations that are missing not at random. In this case, a regression analysis using only the selected cases would lead to biased results. This paper proposes a Bayesian methodology to correct this bias based on a semiparametric Bernstein polynomial regression model that incorporates the sample selection scheme into a stochastic monotone trend constraint, variable selection, and robustness against departures from the normality assumption. We present the basic theoretical properties of the proposed model, including its stochastic representation, quantification of the sample selection bias, a hierarchical model specification that handles the stochastic monotone trend constraint in the nonparametric component, simple bias-corrected estimation, and variable selection for the linear components. We then develop computationally feasible Markov chain Monte Carlo methods for semiparametric Bernstein polynomial functions with stochastically constrained parameter estimation and variable selection procedures. We demonstrate the finite-sample performance of the proposed model relative to existing methods using simulation studies and illustrate its use with two real data applications.
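A toy illustration of the problem the paper corrects rather than of its (Bayesian, semiparametric) solution: when the selection and outcome errors are correlated, regressing y on x using only the selected cases distorts the slope. All numbers below (correlation 0.8, selection rule, true slope 2) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.normal(size=n)
# Correlated errors: one drives selection, the other drives the outcome.
e_sel, e_out = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n).T
selected = (0.5 * x + e_sel) > 0            # sample selection rule
y = 1.0 + 2.0 * x + e_out                   # true slope: 2

print(np.polyfit(x, y, 1)[0])               # ~2.0 using all cases
print(np.polyfit(x[selected], y[selected], 1)[0])  # biased below 2
```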

3.
Summary.  The false discovery rate (FDR) is a multiple hypothesis testing quantity that describes the expected proportion of false positive results among all rejected null hypotheses. Benjamini and Hochberg introduced this quantity and proved that a particular step-up p-value method controls the FDR. Storey introduced a point estimate of the FDR for fixed significance regions. The former approach conservatively controls the FDR at a fixed predetermined level, and the latter provides a conservatively biased estimate of the FDR for a fixed predetermined significance region. In this work, we show in both finite sample and asymptotic settings that the goals of the two approaches are essentially equivalent. In particular, the FDR point estimates can be used to define valid FDR controlling procedures. In the asymptotic setting, we also show that the point estimates can be used to estimate the FDR conservatively over all significance regions simultaneously, which is equivalent to controlling the FDR at all levels simultaneously. Our main tool is the translation of existing FDR methods into procedures involving empirical processes. This simplifies finite sample proofs, provides a framework for asymptotic results and proves that these procedures are valid even under certain forms of dependence.
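A minimal sketch of the two objects the abstract contrasts: the Benjamini-Hochberg step-up procedure controlling the FDR at level alpha, and Storey's point estimate of the FDR for a fixed significance region {p <= t}. The default pi0 = 1 is the fully conservative choice; a data-driven estimate of pi0 is sketched under item 10 below.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    return order[:k]            # reject the k smallest p-values

def storey_fdr_estimate(pvals, t, pi0=1.0):
    """Storey-type point estimate of the FDR for the region {p <= t}."""
    p = np.asarray(pvals)
    r = max(int((p <= t).sum()), 1)   # number of rejections, floored at 1
    return pi0 * p.size * t / r
```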

4.
The varying coefficient model (VCM) is an important generalization of the linear regression model, and many existing estimation procedures for the VCM are built on the L2 loss, which is popular for its mathematical beauty but is not robust to non-normal errors and outliers. In this paper, we address both the robustness and the efficiency of estimation and variable selection for the VCM using a convex combination of the L1 and L2 losses in place of the quadratic loss alone. Using the local linear modelling method, the asymptotic normality of the estimator is derived, and a practical method is proposed for selecting the weight in the composite L1 and L2 loss. The variable selection procedure is then given by combining local kernel smoothing with the adaptive group LASSO. With tuning parameters appropriately selected by the Bayesian information criterion (BIC), the theoretical properties of the new procedure, including consistency in variable selection and the oracle property in estimation, are established. The finite-sample performance of the new method is investigated through simulation studies and the analysis of body fat data. Numerical studies show that the new method performs better than, or at least as well as, the least-squares-based method in terms of both robustness and efficiency for variable selection.
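The key ingredient is the convex combination of losses, rho_w(r) = w|r| + (1 - w)r^2. A minimal sketch fitting a plain linear model by minimizing that combined loss with a generic optimizer; the paper's local-kernel smoothing, adaptive group LASSO, and data-driven choice of the weight w are all omitted, and w = 0.7 below is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def combined_loss_fit(X, y, w=0.5):
    """Minimize sum_i [ w*|r_i| + (1-w)*r_i**2 ] over beta (with intercept)."""
    Xd = np.column_stack([np.ones(len(y)), X])

    def objective(beta):
        r = y - Xd @ beta
        return np.sum(w * np.abs(r) + (1 - w) * r ** 2)

    beta0 = np.linalg.lstsq(Xd, y, rcond=None)[0]   # least-squares start
    return minimize(objective, beta0, method="Powell").x

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.standard_t(df=2, size=200)  # heavy tails
print(combined_loss_fit(X, y, w=0.7))
```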

5.
In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Inheriting their asymptotic properties, our information criteria are consistent in variable selection in both the large-sample and the high-dimensional asymptotic frameworks. In numerical simulations, variable selection methods based on our information criteria choose the true set of variables with high probability in most cases.
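The paper's criteria interpolate between AIC and BIC; as a baseline sketch (not the paper's new criteria), here are the classical Gaussian-linear-model versions scored over all subsets, with constants dropped: AIC = n log(RSS/n) + 2k and BIC = n log(RSS/n) + k log n, where k counts the fitted coefficients including the intercept.

```python
import itertools
import numpy as np

def best_subsets(X, y):
    n, p = X.shape
    results = []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            Xd = np.column_stack([np.ones(n), X[:, subset]])
            beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            rss = np.sum((y - Xd @ beta) ** 2)
            k = Xd.shape[1]
            aic = n * np.log(rss / n) + 2 * k
            bic = n * np.log(rss / n) + k * np.log(n)
            results.append((subset, aic, bic))
    return results

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - X[:, 2] + rng.normal(size=100)
scored = best_subsets(X, y)
print(min(scored, key=lambda r: r[1])[0], "chosen by AIC")
print(min(scored, key=lambda r: r[2])[0], "chosen by BIC")
```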

6.
Summary.  The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two-state hidden Markov model. We propose oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate (FNR) subject to a constraint on the false discovery rate (FDR). It is shown that the performance of a multiple-testing procedure can be substantially improved by adaptively exploiting the dependence structure among hypotheses, and hence conventional FDR procedures that ignore this structural information are inefficient. Both theoretical properties and numerical performances of the procedures proposed are investigated. It is shown that the procedures proposed control the FDR at the desired level, enjoy certain optimality properties and are especially powerful in identifying clustered non-null cases. The new procedure is applied to an influenza-like illness surveillance study for detecting the timing of epidemic periods.
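Procedures in this line of work rank hypotheses by the posterior probability of being null given the whole sequence (obtainable from a forward-backward pass through the fitted two-state HMM) and reject the largest prefix whose running average stays below alpha. A sketch of that thresholding step only, assuming the vector `lis` of posterior null probabilities has already been computed from the HMM.

```python
import numpy as np

def hmm_fdr_reject(lis, alpha=0.10):
    """Reject the k smallest posterior-null-probability hypotheses
    such that their running mean stays at or below alpha."""
    lis = np.asarray(lis)
    order = np.argsort(lis)
    running_mean = np.cumsum(lis[order]) / np.arange(1, lis.size + 1)
    below = running_mean <= alpha
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    return order[:k]   # indices declared non-null
```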

7.
In many scientific fields, it is interesting and important to determine whether an observed data stream comes from a prespecified model, particularly when the number of data streams is large, in which case multiple hypothesis testing is necessary. In this article, we consider large-scale model checking under certain dependence among different data streams observed at the same time. We propose a false discovery rate (FDR) control procedure to check those unusual data streams. Specifically, we derive an approximation of false discovery and construct a point estimate of the FDR. Theoretical results show that, under some mild assumptions, our proposed estimate of the FDR is simultaneously and conservatively consistent for the true FDR, and hence the method is an asymptotically strong control procedure. Simulation comparisons with some competing procedures show that our proposed FDR procedure performs better in general settings. Application of our proposed FDR procedure is illustrated by the StarPlus fMRI data.

8.
In many studies a large number of variables is measured, and the identification of relevant variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on one model only neglects that there usually exist other, equally appropriate models. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam's window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure: based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. Backward elimination can be used as a simple alternative screening procedure. Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, of which seven influence the outcome. The screening step eliminates most of the non-influential variables, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step.
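A sketch of the screening step: refit a selection procedure on bootstrap samples and keep only the variables whose inclusion frequency clears a threshold, before any model averaging. The abstract's procedures use selected models from stepwise methods such as backward elimination; the lasso is substituted here purely to keep the sketch short, and `threshold=0.3` is an illustrative choice, not a recommendation from the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def bootstrap_screen(X, y, n_boot=100, threshold=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)              # bootstrap resample
        coef = LassoCV(cv=5).fit(X[idx], y[idx]).coef_
        counts += coef != 0                           # selected this round?
    freq = counts / n_boot
    return np.nonzero(freq >= threshold)[0], freq     # survivors, frequencies
```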

9.
Linear models with a growing number of parameters have been widely used in modern statistics. An important problem for such models is variable selection. Bayesian approaches, which provide a stochastic search of informative variables, have gained popularity. In this paper, we study the asymptotic properties of Bayesian model selection when the model dimension p grows with the sample size n. We consider this growing-dimension regime and provide sufficient conditions under which: (1) with large probability, the posterior probability of the true model (from which samples are drawn) uniformly dominates the posterior probability of any incorrect model; and (2) the posterior probability of the true model converges to one in probability. Both (1) and (2) guarantee that the true model will be selected under a Bayesian framework. We also demonstrate several situations in which (1) holds but (2) fails, illustrating the difference between the two properties. Finally, we generalize our results to include g-priors, and provide simulation examples to illustrate the main results.
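One concrete instance of the posterior-probability comparison in (1) and (2): under Zellner's g-prior with the intercept kept in every model, the Bayes factor of a k-covariate model against the null model has the closed form BF = (1+g)^((n-1-k)/2) * (1 + g(1 - R^2))^(-(n-1)/2), so posterior model probabilities follow from each subset's R^2. A sketch assuming equal prior model probabilities and the common default g = n; this is the standard g-prior machinery, not necessarily the paper's exact setup.

```python
import itertools
import numpy as np

def log_bf_gprior(r2, n, k, g):
    """Log Bayes factor of a k-covariate model vs. the null, g-prior."""
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

def gprior_model_posteriors(X, y, g=None):
    n, p = X.shape
    g = n if g is None else g
    yc = y - y.mean()                 # centering handles the intercept
    models, log_bfs = [], []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            if subset:
                Xs = X[:, subset] - X[:, subset].mean(axis=0)
                beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
                r2 = 1 - np.sum((yc - Xs @ beta) ** 2) / np.sum(yc ** 2)
            else:
                r2 = 0.0
            models.append(subset)
            log_bfs.append(log_bf_gprior(r2, n, len(subset), g))
    log_bfs = np.asarray(log_bfs)
    w = np.exp(log_bfs - log_bfs.max())
    return models, w / w.sum()        # posterior probabilities over models
```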

10.
Abstract.  Controlling the false discovery rate (FDR) is a powerful approach to multiple testing, with procedures developed for applications in many areas. Dependence among the test statistics is a common problem, and many attempts have been made to extend the procedures. In this paper, we show that a certain degree of dependence is allowed among the test statistics, when the number of tests is large, with no need for any correction. We then suggest a way to conservatively estimate the proportion of false nulls, under both dependence and independence, and discuss the advantages of using such estimators when controlling the FDR.
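A sketch of the standard lambda-tuned estimator of the proportion of true nulls: p-values above a cutoff `lam` come almost entirely from true nulls (which are uniform on [0, 1]), giving the conservative estimate pi0_hat = #{p_i > lam} / ((1 - lam) m). The choice lam = 0.5 is a common illustrative default; the paper's own estimator may differ in detail.

```python
import numpy as np

def pi0_estimate(pvals, lam=0.5):
    """Conservative estimate of the proportion of true null hypotheses."""
    p = np.asarray(pvals)
    return min(1.0, (p > lam).sum() / ((1.0 - lam) * p.size))

# Plugging pi0_hat into an FDR point estimate (e.g. the Storey-type estimate
# sketched under item 3) or running BH at level alpha / pi0_hat recovers the
# adaptive procedures this literature discusses.
```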

11.
One of the standard variable selection procedures in multiple linear regression is to use a penalisation technique in least‐squares (LS) analysis. In this setting, many different types of penalties have been introduced to achieve variable selection. It is well known that LS analysis is sensitive to outliers, and consequently outliers can present serious problems for the classical variable selection procedures. Since rank‐based procedures have desirable robustness properties compared to LS procedures, we propose a rank‐based adaptive lasso‐type penalised regression estimator and a corresponding variable selection procedure for linear regression models. The proposed estimator and variable selection procedure are robust against outliers in both response and predictor space. Furthermore, since rank regression can yield unstable estimators in the presence of multicollinearity, in order to provide inference that is robust against multicollinearity, we adjust the penalty term in the adaptive lasso function by incorporating the standard errors of the rank estimator. The theoretical properties of the proposed procedures are established and their performances are investigated by means of simulations. Finally, the estimator and variable selection procedure are applied to the Plasma Beta‐Carotene Level data set.
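The building blocks here are Jaeckel's rank-based dispersion with Wilcoxon scores, D(beta) = sum_i a(R(e_i)) e_i with a(i) = sqrt(12)(i/(n+1) - 1/2), plus an adaptive L1 penalty weighted by a pilot estimate. A minimal sketch using a generic optimizer; the paper's rescaling of the penalty by the rank estimator's standard errors is omitted, and `lam` is an illustrative tuning value. The dispersion is location-invariant (the scores sum to zero), so only slopes are fitted; an intercept would be estimated separately, e.g. as the median residual.

```python
import numpy as np
from scipy.optimize import minimize

def jaeckel_dispersion(resid):
    """Jaeckel's dispersion of residuals under Wilcoxon scores."""
    n = resid.size
    ranks = np.empty(n)
    ranks[np.argsort(resid)] = np.arange(1, n + 1)      # ranks of residuals
    scores = np.sqrt(12.0) * (ranks / (n + 1) - 0.5)    # Wilcoxon scores
    return np.sum(scores * resid)

def rank_adaptive_lasso(X, y, lam=1.0):
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]    # pilot estimate
    w = 1.0 / (np.abs(beta_init) + 1e-8)                # adaptive weights

    def objective(beta):
        return jaeckel_dispersion(y - X @ beta) + lam * np.sum(w * np.abs(beta))

    return minimize(objective, beta_init, method="Powell").x
```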

12.
In this article, we develop a Bayesian variable selection method for the Poisson change-point regression model with both discrete and continuous candidate covariates. Ranging from a null model with no selected covariates to a full model including all covariates, the method searches the entire model space, estimates posterior inclusion probabilities of the covariates, and obtains model-averaged estimates of the covariate coefficients, while simultaneously estimating a time-varying baseline rate due to change-points. For posterior computation, a Metropolis-Hastings step within a partially collapsed Gibbs sampler is developed to efficiently fit the Poisson change-point regression model with variable selection. We illustrate the proposed method using simulated and real datasets.

13.
In this paper, we develop Bayesian methodology and computational algorithms for variable subset selection in Cox proportional hazards models with missing covariate data. A new joint semi-conjugate prior for the piecewise exponential model is proposed in the presence of missing covariates and its properties are examined. The covariates are assumed to be missing at random (MAR). Under this new prior, a version of the Deviance Information Criterion (DIC) is proposed for Bayesian variable subset selection in the presence of missing covariates. Monte Carlo methods are developed for computing the DICs for all possible subset models in the model space. A Bone Marrow Transplant (BMT) dataset is used to illustrate the proposed methodology.
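The comparison tool here is the DIC; the paper proposes a version adapted to missing covariates, while the sketch below computes only the generic form from posterior draws: with deviance D(theta) = -2 log-likelihood, pD = mean(D) - D(posterior mean) and DIC = mean(D) + pD. `loglik` stands for any user-supplied log-likelihood function.

```python
import numpy as np

def dic(draws, loglik):
    """draws: (n_draws, dim) posterior sample; loglik: parameter vector -> float."""
    deviances = np.array([-2.0 * loglik(theta) for theta in draws])
    d_bar = deviances.mean()                     # posterior mean deviance
    d_at_mean = -2.0 * loglik(draws.mean(axis=0))
    p_d = d_bar - d_at_mean                      # effective number of parameters
    return d_bar + p_d                           # smaller DIC = preferred model
```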

14.
This paper addresses the problem of comparing the fit of latent class and latent trait models when the indicators are binary and the contingency table is sparse. This problem is common in the analysis of data from large surveys, where many items are associated with an unobservable variable. A study of human resource data illustrates: (1) how the usual goodness-of-fit tests, model selection and cross-validation criteria can be inconclusive; (2) how model selection and evaluation procedures from time series and economic forecasting can be applied to extend residual analysis in this context.

15.
In this article, to reduce the computational load of Bayesian variable selection, we use a variant of reversible jump Markov chain Monte Carlo methods, the Holmes and Held (HH) algorithm, to sample model index variables in logistic mixed models involving a large number of explanatory variables. Furthermore, we propose a simple proposal distribution for the model index variables and use a simulation study and a real example to compare the performance of the HH algorithm under our proposed and existing proposal distributions. The results show that the HH algorithm with our proposed proposal distribution is a computationally efficient and reliable selection method.

16.
Regression plays a central role in the discipline of statistics and is the primary analytic technique in many research areas. Variable selection is a classical and major problem for regression. This article emphasizes the economic aspect of variable selection. The problem is formulated in terms of the cost of predictors to be purchased for future use: only the subset of covariates used in the model will need to be purchased. This leads to a decision-theoretic formulation of the variable selection problem, which includes the cost of predictors as well as their effect. We adopt a Bayesian perspective and propose two approaches to address uncertainty about the model and model parameters. These approaches, termed the restricted and extended approaches, lead us to rethink model averaging. From an objective or robust Bayes point of view, the former is preferred. The proposed method is applied to three popular datasets for illustration.
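A sketch of the decision-theoretic idea: a subset's score combines expected predictive loss with the purchase cost of its predictors. The `costs` vector and `trade_off` constant are illustrative assumptions, and plain cross-validation stands in for the paper's Bayesian (restricted/extended) treatment of model uncertainty.

```python
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cost_aware_selection(X, y, costs, trade_off=1.0):
    """Enumerate subsets; score = CV mean squared error + predictor cost."""
    n, p = X.shape
    best = (np.inf, ())
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            mse = -cross_val_score(LinearRegression(), X[:, subset], y,
                                   scoring="neg_mean_squared_error", cv=5).mean()
            score = mse + trade_off * sum(costs[j] for j in subset)
            best = min(best, (score, subset))
    return best   # (total expected loss, chosen subset)
```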

17.
Panel count data arise in many fields, and a number of estimation procedures have been developed for them, along with two procedures for variable selection. In this paper, we discuss model selection and parameter estimation together. For the former, a focused information criterion (FIC) is presented, and for the latter, a frequentist model average (FMA) estimation procedure is developed. A main advantage of the FIC, and its difference from existing model selection methods, is that it emphasizes the accuracy of the estimation of the parameters of interest rather than of all parameters. Further efficiency gains can be achieved by the FMA estimation procedure because, unlike existing methods, it takes into account the variability introduced at the model selection stage. Asymptotic properties of the proposed estimators are established, and a simulation study suggests that the proposed methods work well in practical situations. An illustrative example is also provided.

18.
Kai B., Li R. & Zou H. (2011). Annals of Statistics, 39(1), 305–332
The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise in real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve good efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of the proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the optimal convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.
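The efficiency engine here is composite quantile regression (CQR): one common slope vector is fit jointly across several quantile levels, each level getting its own intercept, i.e. minimize sum_k sum_i rho_{tau_k}(y_i - b_k - x_i' beta) with check loss rho_tau(u) = u(tau - I(u < 0)). A minimal plain-linear-model sketch (the paper applies this within a varying-coefficient partially linear model with penalties); the quantile grid is an illustrative choice.

```python
import numpy as np
from scipy.optimize import minimize

def cqr_fit(X, y, taus=(0.25, 0.5, 0.75)):
    """Composite quantile regression: shared slopes, per-quantile intercepts."""
    n, p = X.shape
    K = len(taus)

    def check(u, tau):                          # quantile (check) loss
        return np.sum(u * (tau - (u < 0)))

    def objective(theta):
        b, beta = theta[:K], theta[K:]
        return sum(check(y - b[k] - X @ beta, taus[k]) for k in range(K))

    start = np.concatenate([np.quantile(y, taus), np.zeros(p)])
    theta = minimize(objective, start, method="Powell").x
    return theta[:K], theta[K:]                 # intercepts, common slopes
```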

19.
Summary.  Existing Bayesian model selection procedures require the specification of prior distributions on the parameters appearing in every model in the selection set. In practice, this requirement limits the application of Bayesian model selection methodology. To overcome this limitation, we propose a new approach towards Bayesian model selection that uses classical test statistics to compute Bayes factors between possible models. In several test cases, our approach produces results that are similar to previously proposed Bayesian model selection and model averaging techniques in which prior distributions were carefully chosen. In addition to eliminating the requirement to specify complicated prior distributions, this method offers important computational and algorithmic advantages over existing simulation-based methods. Because it is easy to evaluate the operating characteristics of this procedure for a given sample size and specified number of covariates, our method facilitates the selection of hyperparameter values through prior-predictive simulation.

20.
This paper presents a Bayesian analysis of partially linear additive models for quantile regression. We develop a semiparametric Bayesian approach to quantile regression models using a spectral representation of the nonparametric regression functions and the Dirichlet process (DP) mixture for the error distribution. We also consider Bayesian variable selection procedures for both parametric and nonparametric components in a partially linear additive model structure based on Bayesian shrinkage priors via a stochastic search algorithm. Based on the proposed Bayesian semiparametric additive quantile regression model (BSAQ), Bayesian inference is considered for estimation and model selection. For the posterior computation, we design a simple and efficient Gibbs sampler based on a location-scale mixture of exponential and normal distributions for the asymmetric Laplace distribution, which facilitates the commonly used collapsed Gibbs sampling algorithms for DP mixture models. Additionally, we discuss the asymptotic properties of the semiparametric quantile regression model in terms of posterior consistency. Simulation studies and real data application examples illustrate the proposed method and compare it with Bayesian quantile regression methods in the literature.
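The Gibbs sampler rests on the standard location-scale mixture representation of the asymmetric Laplace distribution: with z ~ Exp(1) and u ~ N(0, 1), eps = theta z + psi sqrt(z) u, where theta = (1 - 2 tau)/(tau(1 - tau)) and psi^2 = 2/(tau(1 - tau)), follows the AL law whose tau-quantile is 0, so everything is conditionally Gaussian given z. A quick numerical check of that quantile property:

```python
import numpy as np

def sample_al(tau, size, rng):
    """Draw asymmetric-Laplace errors via the exponential-normal mixture."""
    theta = (1 - 2 * tau) / (tau * (1 - tau))
    psi = np.sqrt(2 / (tau * (1 - tau)))
    z = rng.exponential(size=size)      # mixing variable
    u = rng.normal(size=size)
    return theta * z + psi * np.sqrt(z) * u

rng = np.random.default_rng(3)
for tau in (0.1, 0.5, 0.9):
    eps = sample_al(tau, 200_000, rng)
    print(tau, np.mean(eps <= 0))       # should be close to tau
```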
