Similar documents
1.
Our article presents a general treatment of the linear regression model in which the error distribution is modelled nonparametrically and the error variances may be heteroscedastic, eliminating the need to transform the dependent variable in many data sets. The mean and variance components of the model may be either parametric or nonparametric, with parsimony achieved through variable selection and model averaging. A Bayesian approach is used for inference, with data-based priors so that estimation can be carried out automatically with minimal input from the user. A Dirichlet process mixture prior is used to model the error distribution nonparametrically; when there are no regressors in the model, the method reduces to Bayesian density estimation, and we show that in this case the estimator compares favourably with a well-regarded plug-in density estimator. We also consider a method for checking the fit of the full model. The methodology is applied to a number of simulated and real examples and is shown to work well.
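To make the Dirichlet process mixture prior concrete, here is a minimal Python/NumPy sketch that draws one random error density from a truncated stick-breaking DP mixture of normals and then samples errors from it. The concentration `alpha`, the truncation level `n_atoms`, the normal base measure and the common component scale are illustrative assumptions; the article's data-based priors and its posterior sampler are not reproduced here.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha.

    v_k ~ Beta(1, alpha) and w_k = v_k * prod_{l<k} (1 - v_l); the last stick
    is set to 1 so that the truncated weights sum to one.
    """
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def draw_dp_mixture_errors(n, alpha, n_atoms, rng, base_sd=2.0, comp_sd=0.5):
    """Draw one random error density from a truncated DP mixture of normals, then n errors from it.

    Component means come from a N(0, base_sd^2) base measure and every component
    has the same scale comp_sd (both are illustrative assumptions).
    """
    w = stick_breaking_weights(alpha, n_atoms, rng)
    means = rng.normal(0.0, base_sd, size=n_atoms)
    labels = rng.choice(n_atoms, size=n, p=w)
    return rng.normal(means[labels], comp_sd), w
```

Smaller `alpha` concentrates the weight on a few atoms, giving error densities close to a single normal; larger `alpha` spreads the mass and allows skewed or multimodal shapes.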

2.
The problem of selecting the correct subset of predictors within a linear model has received much attention in recent literature. Within the Bayesian framework, a popular choice of prior has been Zellner's g-prior, which is based on the inverse of the empirical covariance matrix of the predictors. This article proposes an extension of Zellner's prior that allows for a power parameter on the empirical covariance of the predictors. The power parameter controls the degree to which correlated predictors are smoothed towards or away from one another. In addition, the empirical covariance of the predictors is used to obtain suitable priors over model space; in this way, the power parameter also helps determine whether models containing highly collinear predictors are preferred or avoided. The power parameter can be chosen by an empirical Bayes method, which leads to a data-adaptive choice of prior. Simulation studies and a real data example show that the power parameter is well determined by the degree of cross-correlation among the predictors. In these examples, the proposed modification compares favorably with the standard use of Zellner's prior and with an intrinsic prior.
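The abstract does not give the exact form of the power prior, so the sketch below assumes the prior covariance g σ² (XᵀX)^λ, which reduces to Zellner's g-prior g σ² (XᵀX)⁻¹ at λ = −1. Under that assumption, and with y | β ~ N(Xβ, σ²I), the posterior mean is (XᵀX + (XᵀX)^(−λ)/g)⁻¹ Xᵀy, with σ² cancelling. The names `lam` and `g` are illustrative; the empirical Bayes choice of λ and the model-space prior are not shown.

```python
import numpy as np


def sym_matrix_power(s, power):
    """Real power of a symmetric positive definite matrix via its eigendecomposition."""
    d, v = np.linalg.eigh(s)
    return (v * d**power) @ v.T


def power_g_prior_posterior_mean(x, y, g, lam):
    """Posterior mean of beta under the ASSUMED prior beta ~ N(0, g * sigma^2 * (X'X)^lam).

    lam = -1 recovers Zellner's g-prior; sigma^2 cancels from the posterior mean.
    """
    xtx = x.T @ x
    prior_precision = sym_matrix_power(xtx, -lam) / g  # (X'X)^(-lam) / g, sigma^2 factored out
    return np.linalg.solve(xtx + prior_precision, x.T @ y)
```

Two special cases serve as checks: λ = −1 gives the familiar g/(1 + g) shrinkage of the least-squares estimate, and λ = 0 gives ridge regression with penalty 1/g.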

3.
The paper presents an overview of maximum likelihood estimation using simulated likelihood, including the use of antithetic variables and the evaluation of the simulation error of the resulting estimates. It gives a general-purpose implementation of simulated maximum likelihood and uses it to revisit four models that have previously appeared in the published literature: a state-space model for count data; a nested random effects model for binomial data; a nonlinear growth model with crossed random effects; and a crossed random effects model for binary salamander-mating data. For the last three examples, this appears to be the first time that maximum likelihood fits of these models have been presented.
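As a small illustration of simulated likelihood with antithetic variables, the sketch below evaluates the log-likelihood of a random-intercept binomial (logit) model by averaging over normal draws z together with their antithetic partners −z. The model, the function names and the reuse of one fixed set of draws (common random numbers) are assumptions chosen for brevity; this is not the paper's general-purpose implementation.

```python
import numpy as np
from math import comb


def make_draws(n_draws, rng, antithetic=True):
    """Standard normal draws, optionally augmented with their antithetic partners -z."""
    z = rng.standard_normal(n_draws)
    return np.concatenate([z, -z]) if antithetic else z


def simulated_loglik(beta0, sigma, successes, trials, z):
    """Simulated log-likelihood of a random-intercept binomial (logit) model.

    Cluster i contributes log E_b[Binomial(y_i; m_i, expit(beta0 + b))] with
    b ~ N(0, sigma^2); the expectation is replaced by an average over b = sigma * z.
    Reusing the same z for every parameter value keeps the objective smooth.
    """
    p = 1.0 / (1.0 + np.exp(-(beta0 + sigma * z)))
    total = 0.0
    for y, m in zip(successes, trials):
        lik = comb(m, y) * p**y * (1.0 - p) ** (m - y)
        total += np.log(lik.mean())
    return total
```

Holding `z` fixed lets a standard optimizer maximize `simulated_loglik` over `(beta0, sigma)`; repeating the fit with fresh draws gives a rough handle on the simulation error.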

4.
The composed error of a stochastic frontier (SF) model consists of two random variables, and the identification of the model relies heavily on the distributional assumptions for each of them. While the literature has put much effort into applying various SF models to a wide range of empirical problems, little has been done to test the distributional assumptions on these two variables. In this article, by exploiting the specification structure of the SF model, we propose a centered-residuals-based method of moments that can be easily and flexibly applied both to testing the distributional assumptions on the two random variables and to estimating the model parameters. A Monte Carlo simulation is conducted to assess the performance of the proposed method. We also provide two empirical examples to demonstrate the use of the proposed estimator and test on real data.
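For context, the sketch below shows the classic method-of-moments estimator for the normal/half-normal frontier y = a + Xb + v − u, which works from the second and third central moments of the OLS residuals. This is the standard textbook estimator, not the authors' centered-residuals procedure or test. It relies on the half-normal third central moment σ_u³·√(2/π)·(4/π − 1), which enters the composed error ε = v − u with a negative sign.

```python
import numpy as np

SQRT_2_OVER_PI = np.sqrt(2.0 / np.pi)


def sf_moment_estimates(y, x):
    """Moment estimates for y = a + X b + v - u, v ~ N(0, s_v^2), u ~ |N(0, s_u^2)|.

    Returns (frontier intercept, slopes, s_u, s_v) from the 2nd and 3rd central
    moments of the OLS residuals.
    """
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    e = y - design @ coef
    e = e - e.mean()  # already centred when an intercept is fitted; enforce it anyway
    m2, m3 = np.mean(e**2), np.mean(e**3)
    c3 = SQRT_2_OVER_PI * (4.0 / np.pi - 1.0)  # third central moment of u is c3 * s_u^3
    if m3 >= 0:
        raise ValueError("residuals are not negatively skewed; half-normal u looks inappropriate")
    s_u = (-m3 / c3) ** (1.0 / 3.0)
    s_v2 = m2 - (1.0 - 2.0 / np.pi) * s_u**2  # Var(eps) = s_v^2 + Var(u)
    # OLS intercept estimates a - E[u], with E[u] = s_u * sqrt(2/pi)
    intercept = coef[0] + s_u * SQRT_2_OVER_PI
    return intercept, coef[1:], s_u, np.sqrt(max(s_v2, 0.0))
```

A positive residual skewness (the "wrong skew" case) makes this estimator break down. That is exactly where a formal check of the distributional assumptions is useful.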

5.
In this article, we study model selection and model averaging in quantile regression. Under general conditions, we develop a focused information criterion and a frequentist model average estimator for the parameters of quantile regression models, and examine their theoretical properties. The new procedures provide a robust alternative to least squares or likelihood methods; a major advantage is that when the variance of the random error is infinite, the proposed procedures still perform well while the least squares method breaks down. A simulation study and a real data example show that the proposed method performs well in finite samples and is easy to use in practice.
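The robustness claim rests on the check loss that defines quantile regression, which needs no finite variance. Here is a minimal location-only sketch in pure Python showing that minimizing the summed check loss recovers a sample quantile, which an outlier barely moves, while the squared-loss mean is pulled far away. The focused information criterion and the model-averaging estimator are not shown.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(data, tau):
    """Return a minimiser of sum_i rho_tau(y_i - q); one always lies at a data point."""
    candidates = sorted(data)
    return min(candidates, key=lambda q: sum(check_loss(y - q, tau) for y in data))


def mean_by_squared_loss(data):
    """Minimiser of the summed squared loss, i.e. the sample mean."""
    return sum(data) / len(data)
```

For the data `[1, 2, 3, 4, 100]`, the τ = 0.5 minimizer is 3 and the τ = 0.25 minimizer is 2, whereas the mean is 22.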

6.
Motivated by an entropy inequality, this article proposes, for the first time, a penalized profile likelihood method for simultaneously selecting significant variables and estimating unknown coefficients in multiple linear regression models. The new method is robust to outliers and heavy-tailed errors and works well even when the error variance is infinite. The proposed approach outperforms the adaptive lasso in both theory and practice. Simulation studies show that (i) the new approach has a higher probability of selecting exactly the correct model than the least absolute deviation lasso and the adaptively penalized composite quantile regression approach, and (ii) exact model selection by the proposed approach is robust regardless of the error distribution. An application to a real dataset is also provided.

7.
Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector with a few nonzero components, corresponding to the most important covariates. This article extends the "global-local" shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. We develop a variable selection method for a K-outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for each regression coefficient is a zero-mean normal whose coefficient-specific variance consists of a predictor-specific factor (a shared local shrinkage parameter) and a model-specific factor (a global shrinkage term that differs for each outcome model). The performance of our approach is evaluated through simulation studies and a data example.

8.
The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on variable subset selection for regression and classification and perform several numerical experiments using both simulated and real-world data. The results show that optimizing a utility estimate such as the cross-validation (CV) score is liable to find overfitted models, because the utility estimates have relatively high variance when data are scarce. This can also lead to substantial selection-induced bias and optimism in the performance evaluation of the selected model. From a predictive viewpoint, the best results are obtained by accounting for model uncertainty through a full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information in the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on the CV score. Overall, the projection method also appears to outperform the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that model selection can benefit greatly from using cross-validation outside the search process, both for guiding the choice of model size and for assessing the predictive performance of the finally selected model.
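For a Gaussian observation model with fixed noise variance, the Kullback-Leibler projection of a reference model's predictive mean onto a submodel reduces to a least-squares fit of the submodel's columns to the reference fitted values. The sketch below relies on that fact. As a simplifying assumption, it uses a plain OLS point fit as a stand-in for the Bayesian reference model (the paper would use posterior draws, e.g. a model-averaged fit), and it ranks variables greedily by how well each enlarged submodel reproduces the reference fit.

```python
import numpy as np


def project_onto_subset(x, mu_ref, subset):
    """Gaussian-likelihood projection: least-squares fit of the reference fitted values on a column subset."""
    if not subset:
        return np.zeros(len(mu_ref))
    xs = x[:, subset]
    coef, *_ = np.linalg.lstsq(xs, mu_ref, rcond=None)
    return xs @ coef


def forward_projection_search(x, y, max_size):
    """Greedy forward search: at each step add the column whose projection best matches the reference fit."""
    coef_ref, *_ = np.linalg.lstsq(x, y, rcond=None)  # stand-in reference model (assumption)
    mu_ref = x @ coef_ref
    chosen = []
    for _ in range(max_size):
        remaining = [j for j in range(x.shape[1]) if j not in chosen]
        best = min(
            remaining,
            key=lambda j: np.mean((mu_ref - project_onto_subset(x, mu_ref, chosen + [j])) ** 2),
        )
        chosen.append(best)
    return chosen
```

Because the submodels are fitted to the reference predictions rather than to the noisy `y`, the search inherits the reference model's smoothing. Per the abstract, that is why projection is less prone to overfitting than direct CV-score optimization.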

9.
The article considers a Gaussian model in which the mean and the variance are modeled flexibly as functions of the independent variables. Estimation uses a Bayesian approach that identifies significant variables in the variance function and averages over all possible models in both the mean and the variance functions. Computation is carried out by a simulation method carefully constructed to converge quickly and to produce iterates from the posterior distribution with low correlation. Real and simulated examples demonstrate that the proposed method works well. The method is important because (a) it produces more realistic prediction intervals than nonparametric regression estimators that assume a constant variance; (b) variable selection identifies the variables that are important in the variance function; and (c) variable selection and model averaging produce more efficient prediction intervals than those obtained by regular nonparametric regression.

10.
In this article, we present a Bayesian model, based on the simplex distribution, for longitudinal response variables restricted to the interval (0, 1), such as proportions and rates, taking random effects into account. To investigate the stability of the posterior distribution, we use a sensitivity analysis to study how three different uniparametric prior distributions for the variance parameters of the random effects affect the final estimates. For this purpose, we consider homogeneous and heterogeneous structures for the parameters in the location and dispersion submodels. Models and results are illustrated with simulated data and a real data application.
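The abstract names the simplex distribution without stating it. For reference, here is a sketch of its standard density with mean μ and dispersion σ²: p(y) = [2πσ²{y(1−y)}³]^(−1/2) exp{−d(y; μ)/(2σ²)}, with unit deviance d(y; μ) = (y−μ)²/{y(1−y)μ²(1−μ)²}. The longitudinal layer (for example a logit link for μ with random effects) and the priors studied in the article are not shown.

```python
import numpy as np


def simplex_pdf(y, mu, sigma2):
    """Density of the simplex distribution S^-(mu, sigma^2) on (0, 1); its mean is mu."""
    y = np.asarray(y, dtype=float)
    unit_dev = (y - mu) ** 2 / (y * (1 - y) * mu**2 * (1 - mu) ** 2)
    return np.exp(-unit_dev / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2 * (y * (1 - y)) ** 3)
```

A quick numerical check (midpoint integration over (0, 1)) confirms that the density integrates to one and has mean μ.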

11.
In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not allow for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A nor B. This paper develops mathematical representations of this and other relations between predictors, which can then be incorporated into a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and a preference for certain models is interpreted as prior information. Priors are developed for arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on model size. Because the relations are expressed through priors, they can be incorporated into any Bayesian variable selection algorithm for any type of linear model. The methods are illustrated with the stochastic search variable selection algorithm of George and McCulloch (1993), modified to use the new priors. The performance of the approach is illustrated with two constructed examples and a computer performance dataset.
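The interaction example above can be encoded directly as a prior over models. This minimal sketch assumes a conditional form in which the main effects A and B enter independently and the interaction AB's inclusion probability depends on how many of its parents are in the model. Under strong heredity, a model with AB but without A or B gets zero prior probability. The probability values are illustrative only.

```python
from itertools import product


def heredity_prior(p_main, p_inter):
    """Prior probabilities of all models built from main effects A, B and the interaction AB.

    A and B enter independently with probability p_main. AB enters with
    probability p_inter[k], where k is the number of its parents (A, B) in the model.
    Strong heredity: p_inter = (0, 0, q); weak heredity: p_inter = (0, r, q).
    """
    probs = {}
    for a, b, ab in product((0, 1), repeat=3):
        q = p_inter[a + b]
        probs[(a, b, ab)] = (
            (p_main if a else 1 - p_main)
            * (p_main if b else 1 - p_main)
            * (q if ab else 1 - q)
        )
    return probs
```

Inside a stochastic search sampler, such a prior simply replaces the product of independent inclusion probabilities when models are compared.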

12.
Wavelet shrinkage estimation is an increasingly popular method for signal denoising and compression. Although Bayes estimators can provide excellent mean-squared error (MSE) properties, selecting an effective prior is difficult. To address this problem, we propose empirical Bayes (EB) prior selection methods for various error distributions, including the normal and the heavier-tailed Student t-distributions. Under such EB prior distributions, we obtain threshold shrinkage estimators based on model selection and multiple-shrinkage estimators based on model averaging. These EB estimators are computationally competitive with standard classical thresholding methods and robust to outliers in both the data and wavelet domains. Simulated and real examples illustrate the flexibility and improved MSE performance of these methods in a wide variety of settings.
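As a point of reference, the sketch below implements one of the "standard classical thresholding methods" the EB estimators are compared against, not the EB method itself. It applies an orthonormal Haar transform, soft-thresholds the detail coefficients at the universal threshold σ√(2 log n) of Donoho and Johnstone, estimates σ by the MAD of the finest details, and inverts the transform. Restricting the Haar wavelet to signals whose length is a power of two is a simplifying assumption.

```python
import numpy as np


def haar_forward(signal):
    """Orthonormal Haar decomposition of a length-2^J signal: [approx, coarsest detail, ..., finest detail]."""
    coeffs = []
    approx = np.asarray(signal, dtype=float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))  # detail coefficients at this level
        approx = (even + odd) / np.sqrt(2)
    return [approx] + coeffs[::-1]


def haar_inverse(coeffs):
    """Invert haar_forward level by level, from coarsest to finest."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx


def soft_threshold(c, t):
    """Shrink coefficients towards zero by t, setting those with |c| <= t to zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def denoise_universal(noisy):
    """Soft-threshold all detail coefficients at the universal threshold."""
    coeffs = haar_forward(noisy)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # MAD noise estimate from finest details
    t = sigma * np.sqrt(2 * np.log(len(noisy)))
    return haar_inverse([coeffs[0]] + [soft_threshold(d, t) for d in coeffs[1:]])
```

One fixed threshold is used for every coefficient. The EB estimators instead let a fitted prior decide how strongly each coefficient is shrunk.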

13.
For the balanced variance component model, when the intraclass correlation coefficient is of interest, Bayesian analysis is often appropriate. Berger and Bernardo's (1992a) grouped ordering reference prior approach is used to analyze this model. The reference priors are developed and compared for posterior inference with real and simulated data. We examine whether the reference priors satisfy the probability-matching criterion. Further, the reference prior is shown to perform well in the sense that its posterior quantiles have correct frequentist coverage probability.
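For orientation, the parameter targeted by these reference priors is ρ = σ_a²/(σ_a² + σ_e²) in the balanced one-way random effects model. The sketch below computes its familiar frequentist ANOVA estimate, (MSB − MSW)/(MSB + (n − 1)MSW), from k groups of n observations. It is a benchmark only, not the Bayesian reference-prior analysis, and unlike a posterior summary it can fall below zero.

```python
import numpy as np


def anova_icc(groups):
    """ANOVA estimate of the intraclass correlation in a balanced one-way random effects model.

    groups: array-like of shape (k, n), i.e. k groups with n observations each.
    """
    y = np.asarray(groups, dtype=float)
    k, n = y.shape
    group_means = y.mean(axis=1)
    msb = n * np.sum((group_means - y.mean()) ** 2) / (k - 1)        # between-group mean square
    msw = np.sum((y - group_means[:, None]) ** 2) / (k * (n - 1))   # within-group mean square
    return (msb - msw) / (msb + (n - 1) * msw)
```

For the toy data `[[1, 3], [5, 7]]`, MSB = 16 and MSW = 2, so the estimate is 14/18 = 7/9.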

14.
Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for variable selection, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessing the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations of multivariate binary and multivariate normal data demonstrate the feasibility of the proposed method and indicate that it is much more accurate than other available methods.
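The key point about assessing the selection procedure is that selection must be repeated inside every leave-one-out fold, or the held-out case leaks into the choice of variables. Here is a minimal sketch, assuming a two-class pooled-covariance discriminant with equal priors, forward selection by apparent (resubstitution) error, and a fixed number of variables `n_vars`. These are illustrative choices, not necessarily the paper's exact procedure.

```python
import numpy as np


def lda_fit(x, labels):
    """Two-class linear discriminant with pooled covariance and equal priors.

    Returns (w, c): classify as class 1 when x @ w > c.
    """
    x0, x1 = x[labels == 0], x[labels == 1]
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    pooled = ((x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)) / (len(x) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    return w, w @ (m0 + m1) / 2


def resub_error(x, labels, cols):
    """Apparent error rate of the discriminant fitted and evaluated on the same data."""
    w, c = lda_fit(x[:, cols], labels)
    return np.mean((x[:, cols] @ w > c) != labels)


def forward_select(x, labels, n_vars):
    """Stepwise forward selection by apparent error rate."""
    cols = []
    for _ in range(n_vars):
        rest = [j for j in range(x.shape[1]) if j not in cols]
        cols.append(min(rest, key=lambda j: resub_error(x, labels, cols + [j])))
    return cols


def loo_error_of_procedure(x, labels, n_vars):
    """Leave-one-out error of the whole procedure: selection is redone without the held-out case."""
    errors = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        cols = forward_select(x[keep], labels[keep], n_vars)
        w, c = lda_fit(x[keep][:, cols], labels[keep])
        errors += int((x[i, cols] @ w > c) != labels[i])
    return errors / len(x)
```

On pure-noise data the apparent error of the selected variables typically looks better than chance, while this leave-one-out estimate stays near 0.5. That gap is the optimism the paper warns about.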

15.
In this paper, we examine the potential determinants of foreign direct investment. For this purpose, we apply new exact subset selection procedures, which are based on idealized assumptions, as well as their possibly more plausible empirical counterparts, to an international data set to select the optimal set of predictors. Unlike the standard model selection procedures AIC and BIC, which penalize only the number of variables included in a model, and the subset selection procedures RIC and MRIC, which also consider the total number of available candidate variables, our data-specific procedures additionally take the correlation structure of all candidate variables into account. Our main focus is on a new procedure designed for situations where some of the potential predictors are certain to be included in the model. For a sample of 73 developing countries, this procedure selects only four variables: imports, net income from abroad, gross capital formation, and GDP per capita. An important secondary finding of our study is that the data-specific procedures, which are based on extensive simulations and are therefore very time-consuming, can be approximated reasonably well by the much simpler exact methods.
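To make the contrast between penalties concrete, the sketch below runs an exhaustive subset search that minimizes n·log(RSS/n) + penalty·k, where k counts the selected predictors and the intercept is always included. It compares three penalties: AIC (2) and BIC (log n) depend only on the model size and sample size, while RIC (2 log p) also depends on the number p of candidate variables. MRIC, the data-specific procedures and the forced-inclusion variant from the paper are not shown.

```python
import numpy as np
from itertools import combinations


def best_subset(x, y, penalty):
    """Exhaustive search minimising n*log(RSS/n) + penalty * k (intercept always included)."""
    n, p = x.shape
    best, best_score = (), np.inf
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            design = np.column_stack([np.ones(n)] + [x[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = np.sum((y - design @ coef) ** 2)
            score = n * np.log(rss / n) + penalty * k
            if score < best_score:
                best, best_score = subset, score
    return best


def penalties(n, p):
    """Per-variable penalties: AIC and BIC ignore p, RIC grows with the number of candidates."""
    return {"AIC": 2.0, "BIC": np.log(n), "RIC": 2.0 * np.log(p)}
```

The search visits all 2^p subsets, so it is only practical for a modest number of candidates.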

16.
This paper investigates the structural approach to inference for the parameters of a simultaneous equation model with heteroscedastic error variance. The joint and marginal structural distributions of the coefficients of the exogenous variables and the scale parameters of the error variables are derived, together with the marginal likelihood function of the coefficients of the endogenous variables. Estimates are obtained directly from the structural distribution and the marginal likelihood function of the parameters. The marginal distribution of a subset of the coefficients of the exogenous variables provides the basis for inference on a particular subset of parameters of interest.

17.
The variable selection problem is one of the most important tasks in regression analysis, especially in high-dimensional settings. In this paper, we study this problem for the scalar response functional regression model, which is a linear model with a scalar response and functional regressors. Through basis expansions of the functional variables, the functional model can be represented as a multiple linear regression model. Based on this representation and the random subspace method of Mielniczuk and Teisseyre (Comput Stat Data Anal 71:725–742, 2014), two simple variable selection procedures for the scalar response functional regression model are proposed. The final functional model is selected using generalized information criteria. Monte Carlo simulation studies and a real data example show very satisfactory finite-sample performance of the new variable selection methods. Moreover, they suggest that the considered procedures outperform solutions found in the literature in terms of correctly selected models, false discovery rate control and prediction error.

18.
A Gaussian process (GP) can be thought of as an infinite collection of random variables with the property that any subset of these variables, say of dimension n, has an n-dimensional multivariate normal distribution with mean vector β and covariance matrix Σ [O'Hagan, A., 1994, Kendall's Advanced Theory of Statistics, Vol. 2B, Bayesian Inference (John Wiley & Sons, Inc.)]. The elements of the covariance matrix are routinely specified as a common variance multiplied by a correlation function. It is important to use a correlation function that yields a valid (positive definite) covariance matrix. Further, it is well known that the smoothness of a GP is directly related to the specification of its correlation function. Also, from a Bayesian point of view, a prior distribution must be assigned to the unknowns of the model. Therefore, when using a GP to model a phenomenon, the researcher faces two challenges: specifying a correlation function and specifying a prior distribution for its parameters. The literature offers many classes of correlation functions that provide a valid covariance structure, as well as many suggested prior distributions for the parameters of these functions. We aim to investigate how sensitive GPs are to the (sometimes arbitrary) choice of correlation function. For this, we simulated 25 data sets, each of size 64, over the square [0, 5]×[0, 5] with a specific correlation function and fixed values of the GP's parameters. We then fit different correlation structures to these data with different prior specifications and compare the performance of the fitted models using several model comparison criteria.
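The simulation design can be sketched as follows. Three common isotropic correlation functions (exponential, Gaussian and Matérn with smoothness 3/2) with range parameter φ build σ²R, a Cholesky factor of that matrix tests positive definiteness numerically, and the factor is used to draw a GP realization with constant mean β. Placing the 64 sites on a regular 8 × 8 grid is an assumption (the abstract does not say how they were located), and the parameter values are hypothetical. Prior specification and model comparison are not shown.

```python
import numpy as np


def exponential_corr(d, phi):
    return np.exp(-d / phi)


def gaussian_corr(d, phi):
    return np.exp(-((d / phi) ** 2))


def matern32_corr(d, phi):
    s = np.sqrt(3.0) * d / phi
    return (1.0 + s) * np.exp(-s)


def grid_locations(m=8, upper=5.0):
    """m x m regular grid on [0, upper]^2 (64 sites for m = 8); the grid layout is an assumption."""
    g = np.linspace(0.0, upper, m)
    xx, yy = np.meshgrid(g, g)
    return np.column_stack([xx.ravel(), yy.ravel()])


def simulate_gp(locs, corr, phi, sigma2, beta, rng, jitter=1e-8):
    """One GP draw y ~ N(beta * 1, sigma2 * R), with R built from a correlation function of distance."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    cov = sigma2 * corr(d, phi) + jitter * np.eye(len(locs))  # tiny jitter guards against round-off
    chol = np.linalg.cholesky(cov)  # raises LinAlgError if not numerically positive definite
    return beta + chol @ rng.standard_normal(len(locs))
```

With the same φ, draws from the Gaussian correlation are visibly smoother than exponential ones, and Matérn 3/2 falls between them. This is the smoothness effect the abstract refers to.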

19.
This article presents a semiparametric method for estimating the receiver operating characteristic (ROC) surface under a density ratio model. The proposed method is based on the adjacent-category logit model and the empirical likelihood approach. A bootstrap approach for inference on the estimator of the volume under the ROC surface (VUS) is presented. In a simulation study, the proposed estimator is compared with existing parametric and nonparametric estimators in terms of bias, standard error and mean squared error. Finally, a real data example and some discussion of the proposed method are provided.
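For comparison, the nonparametric VUS estimator is simply the proportion of triples, one observation from each of the three ordered classes, that fall in the correct order (chance level is 1/6). The sketch below computes it and adds a percentile bootstrap interval that resamples each class separately. Both the tie handling (ties counted as failures) and the stratified resampling scheme are simplifying assumptions. This is the nonparametric comparator, not the semiparametric empirical-likelihood estimator.

```python
import numpy as np


def vus_nonparametric(x, y, z):
    """Empirical VUS: share of triples (x_i, y_j, z_k), one from each ordered class, with x < y < z.

    Ties are counted as failures here; tie corrections are omitted for brevity.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    ordered = (x[:, None, None] < y[None, :, None]) & (y[None, :, None] < z[None, None, :])
    return ordered.mean()


def vus_bootstrap_ci(x, y, z, n_boot, rng, level=0.95):
    """Percentile bootstrap interval for the VUS, resampling each class separately."""
    stats = []
    for _ in range(n_boot):
        xb, yb, zb = (rng.choice(np.asarray(a), size=len(a), replace=True) for a in (x, y, z))
        stats.append(vus_nonparametric(xb, yb, zb))
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi
```

The estimator evaluates all n₁·n₂·n₃ triples at once through broadcasting, so memory grows with the product of the class sizes.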

20.
This article proposes a stochastic version of the matching pursuit algorithm for Bayesian variable selection in linear regression. In the Bayesian formulation, the prior distribution of each regression coefficient is assumed to be a mixture of a point mass at 0 and a normal distribution with zero mean and a large variance. The proposed stochastic matching pursuit algorithm is designed to sample from the posterior distribution of the coefficients for the purpose of variable selection. It can be considered a modification of the componentwise Gibbs sampler. In the componentwise Gibbs sampler, the variables are visited by a random or a systematic scan; in the stochastic matching pursuit algorithm, variables that align better with the current residual vector are given higher probabilities of being visited. The proposed algorithm combines the efficiency of the matching pursuit algorithm with a Bayesian formulation that has well-defined prior distributions on the coefficients. Several simulated examples with small n and large p illustrate the algorithm and show that it is efficient for screening and selecting variables.
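The baseline that the stochastic matching pursuit modifies can be sketched as a random-scan componentwise Gibbs sampler for the point-mass/normal mixture prior β_j ~ (1 − w)δ₀ + w N(0, τ²). Each update computes the slab-versus-spike Bayes factor for β_j given the other coefficients, decides inclusion, and then draws β_j from its conditional normal. For simplicity the noise variance σ² is treated as known, and `w` and `tau2` are hypothetical settings. Visiting coordinates uniformly is the standard scheme; the paper's residual-alignment visiting probabilities are not implemented here.

```python
import numpy as np


def spike_slab_gibbs(x, y, sigma2, tau2, w, n_sweeps, rng):
    """Random-scan componentwise Gibbs sampler for beta_j ~ (1 - w) delta_0 + w N(0, tau2), known sigma2.

    Returns the fraction of sweeps in which each variable is included.
    """
    n, p = x.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    xtx = np.sum(x**2, axis=0)
    inclusion = np.zeros(p)
    for _ in range(n_sweeps):
        for j in rng.integers(0, p, size=p):  # random scan: coordinates chosen uniformly
            r = resid + x[:, j] * beta[j]  # partial residual without variable j
            prec = xtx[j] / sigma2 + 1.0 / tau2
            mean = (x[:, j] @ r / sigma2) / prec
            # log Bayes factor (slab vs spike) for beta_j given the other coefficients
            log_bf = -0.5 * np.log(tau2 * prec) + 0.5 * prec * mean**2
            log_odds = np.clip(np.log(w / (1 - w)) + log_bf, -50.0, 50.0)
            p_incl = 1.0 / (1.0 + np.exp(-log_odds))
            beta[j] = rng.normal(mean, 1.0 / np.sqrt(prec)) if rng.random() < p_incl else 0.0
            resid = r - x[:, j] * beta[j]
        inclusion += beta != 0
    return inclusion / n_sweeps
```

Because a large slab variance makes the Bayes factor penalize weak signals, noise variables are rarely included, while strong predictors stay in almost every sweep.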
