Similar Articles
20 similar articles found.
1.
In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Inheriting their asymptotic properties, our information criteria are consistent in variable selection in both the large-sample and the high-dimensional asymptotic frameworks. In numerical simulations, variable selection methods based on our information criteria choose the true set of variables with high probability in most cases.
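For reference, the two classical criteria that the proposed mixture-prior criteria are said to fuse have the standard forms below; this is background only, not the paper's new criteria.

```latex
% Standard definitions (background, not the paper's mixture-prior criteria):
% L(\hat\theta) = maximized likelihood, k = number of free parameters, n = sample size
\mathrm{AIC} = -2\log L(\hat{\theta}) + 2k, \qquad
\mathrm{BIC} = -2\log L(\hat{\theta}) + k\log n
```

AIC's penalty does not grow with n, while BIC's log n penalty does, which is what gives BIC-type criteria their selection consistency in the large-sample framework.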

2.
A multivariate linear calibration problem, in which the response variable is multivariate and the explanatory variable is univariate, is considered. In this paper a class of generalized inverse regression estimators is proposed for multi-univariate linear calibration. It includes the classical estimator and the inverse regression one (or Krutchkoff estimator). For the proposed estimator we derive expressions for the bias and mean square error (MSE). Furthermore, the behavior of these characteristics is investigated through an analytical method. In addition, through a numerical study we confirm the existence of a generalized inverse regression estimator that improves on both the classical and the inverse regression estimators under the MSE criterion.

3.
In statistical analysis, one of the most important tasks is to select the relevant explanatory variables that best explain the dependent variable. Variable selection is usually performed within regression analysis and is implemented so as to minimize an information criterion (IC) over candidate regression models. The information criterion directly affects the predictive power and the estimation of the selected models. There are numerous information criteria in the literature, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria have been modified to improve the performance of the selected models; in particular, BIC has been extended with alternative modifications involving the prior and the information matrix. The information matrix-based BIC (IBIC) and the scaled unit information prior BIC (SPBIC) are efficient criteria of this kind. In this article, we propose performing variable selection via the differential evolution (DE) algorithm, minimizing IBIC and SPBIC in linear regression analysis. We conclude that these alternative criteria are very useful for variable selection, and we illustrate the efficiency of this combination with various simulation and application studies.
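A minimal sketch of the kind of search this describes, assuming a standard DE/rand/1/bin scheme over inclusion indicators and using the ordinary BIC as a stand-in objective (the abstract does not reproduce the IBIC/SPBIC formulas); all names below are illustrative, not the authors' code.

```python
import numpy as np

def bic(y, X, mask):
    """Ordinary BIC of an OLS fit on the selected columns (stand-in for IBIC/SPBIC)."""
    n = len(y)
    cols = np.flatnonzero(mask)
    Xs = np.column_stack([np.ones(n)] + ([X[:, cols]] if cols.size else []))
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ beta) ** 2)
    return n * np.log(rss / n) + Xs.shape[1] * np.log(n)

def de_select(y, X, pop_size=30, gens=100, F=0.8, CR=0.9, seed=None):
    """Simplified DE/rand/1/bin over [0, 1]^p; a coordinate above 0.5 means 'include variable'."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p))
    scores = np.array([bic(y, X, ind > 0.5) for ind in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            trial = np.where(rng.random(p) < CR, np.clip(a + F * (b - c), 0, 1), pop[i])
            s = bic(y, X, trial > 0.5)
            if s < scores[i]:                      # greedy selection step of DE
                pop[i], scores[i] = trial, s
    best = pop[np.argmin(scores)] > 0.5
    return best, float(scores.min())

# Toy usage: only the first two of five predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
print(de_select(y, X, seed=1))
```

In a full implementation, the IBIC or SPBIC penalty would simply replace the `bic` function used as the fitness here.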

4.
An adaptive variable selection procedure is proposed which uses an adaptive test along with a stepwise procedure to select variables for a multiple regression model. We compared this adaptive stepwise procedure to methods that use Akaike's information criterion, Schwarz's information criterion, and Sawa's information criterion. The simulation studies demonstrated that the adaptive stepwise method is more effective than the traditional variable selection methods if the error distribution is not normally distributed. If the error distribution is known to be normally distributed, the variable selection method based on Sawa's information criterion appears to be superior to the other methods. Unless the error distribution is known to be normally distributed, the adaptive stepwise method is recommended.

5.
In a nonlinear regression model estimated with a regularization method, selection of appropriate regularization parameters is crucial. Information criteria such as the generalized information criterion (GIC) and the generalized Bayesian information criterion (GBIC) are useful for selecting the optimal regularization parameters. However, the optimal parameter is often determined by calculating the information criterion for every candidate regularization parameter, so the computational cost is high. A simpler approach is to regard GIC or GBIC as a function of the regularization parameters and to search for the value that minimizes it, but it is not obvious how to solve this optimization problem. In the present article, we propose an efficient Newton–Raphson type iterative method for selecting optimal regularization parameters with respect to GIC or GBIC in a nonlinear regression model based on basis expansions. This method reduces the computational time remarkably compared to grid search and can select more suitable regularization parameters. The effectiveness of the method is illustrated through real data examples.
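A rough sketch of the idea, assuming a scalar regularization parameter and using central finite differences in place of the analytic first and second derivatives the paper derives; the criterion plugged in below is ridge-regression GCV on simulated data, purely as an illustration, and every name is made up.

```python
import numpy as np

def newton_reg_param(criterion, log_lam0=0.0, tol=1e-6, max_iter=50, h=1e-3):
    """Newton-Raphson on t = log(lambda): t <- t - C'(t)/C''(t).
    Finite differences stand in for the analytic GIC/GBIC derivatives derived in the paper."""
    t = log_lam0
    for _ in range(max_iter):
        c0, cp, cm = criterion(t), criterion(t + h), criterion(t - h)
        grad = (cp - cm) / (2 * h)
        hess = (cp - 2 * c0 + cm) / h ** 2
        step = -grad / hess if hess > 0 else -0.1 * grad   # damped fallback if not locally convex
        if abs(step) < tol:
            break
        t += step
    return np.exp(t)

# Illustrative criterion: ridge-regression GCV on simulated data (not the paper's GIC/GBIC).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - X[:, 1] + rng.normal(size=100)

def gcv(log_lam):
    lam = np.exp(log_lam)
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    resid = y - H @ y
    n = len(y)
    return (resid @ resid / n) / (1 - np.trace(H) / n) ** 2

print(newton_reg_param(gcv))
```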

6.
In this article, we propose a more general criterion, called the Sp-criterion, for subset selection in the multiple linear regression model. Many subset selection methods are based on the Least Squares (LS) estimator of β, but whenever the data contain an influential observation or the distribution of the error variable deviates from normality, the LS estimator performs ‘poorly’ and hence a method based on this estimator (for example, Mallows’ Cp-criterion) tends to select a ‘wrong’ subset. The proposed method overcomes this drawback, and its main feature is that it can be used with any type of estimator of β (either the LS estimator or any robust estimator) without any need to modify the criterion. Moreover, this technique is operationally simple to implement compared to other existing criteria. The method is illustrated with examples.
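The Sp-criterion itself is not given in the abstract; for context, the least-squares-based Mallows' Cp it is contrasted with is usually written as

```latex
% Mallows' Cp for a candidate subset model with p fitted coefficients;
% \hat\sigma^2 is the residual variance estimate from the full model.
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^{2}} - n + 2p
```

Subsets with Cp close to p (and small) are preferred; because SSE_p comes from a least-squares fit, outliers or heavy-tailed errors distort the criterion, which is exactly the drawback the proposed criterion targets.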

7.
We introduce a multivariate heteroscedastic measurement error model for replications under scale mixtures of the normal distribution. The model provides a robust analysis and can be viewed as a generalization of multiple linear regression in terms of both model structure and distributional assumptions. An efficient method based on Markov chain Monte Carlo is developed for parameter estimation. The deviance information criterion and the conditional predictive ordinates are used as model selection criteria. Simulation studies show that inference under the model is robust against both misspecification of the distributions and outliers. We work out an illustrative example with a real data set on measurements of plant root decomposition.

8.
This paper considers a linear regression model with regression parameter vector β. The parameter of interest is θ = a^Tβ, where a is specified. When, as a first step, a data-based variable selection (e.g. minimum Akaike information criterion) is used to select a model, it is common statistical practice to then carry out inference about θ, using the same data, based on the (false) assumption that the selected model had been provided a priori. The paper considers a confidence interval for θ with nominal coverage 1 − α constructed on this (false) assumption, and calls this the naive 1 − α confidence interval. The minimum coverage probability of this confidence interval can be calculated for simple variable selection procedures involving only a single variable. However, the kinds of variable selection procedures used in practice are typically much more complicated. For the real-life data presented in this paper, there are 20 variables each of which is to be either included or not, leading to 2^20 different models. The coverage probability at any given value of the parameters provides an upper bound on the minimum coverage probability of the naive confidence interval. This paper derives a new Monte Carlo simulation estimator of the coverage probability, which uses conditioning for variance reduction. For these real-life data, the gain in efficiency of this Monte Carlo simulation due to conditioning ranged from 2 to 6. The paper also presents a simple one-dimensional search strategy for parameter values at which the coverage probability is relatively small. For these real-life data, this search leads to parameter values for which the coverage probability of the naive 0.95 confidence interval is 0.79 for variable selection using the Akaike information criterion and 0.70 for variable selection using the Bayes information criterion, showing that these confidence intervals are completely inadequate.
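A toy version of the phenomenon described, assuming a two-variable model chosen by AIC rather than the paper's 20-variable setting, and using plain Monte Carlo without the conditioning-based variance reduction the paper develops; every name below is illustrative.

```python
import numpy as np
from scipy import stats

def naive_ci_coverage(beta2, n=50, n_sim=2000, alpha=0.05, seed=0):
    """Coverage of the 'naive' CI for theta = beta1: AIC first picks the model with or
    without x2, then the CI is computed as if that model had been fixed a priori."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x1, x2 = rng.normal(size=n), rng.normal(size=n)
        y = 1.0 * x1 + beta2 * x2 + rng.normal(size=n)
        best_aic, best = np.inf, None
        for cols in ((x1,), (x1, x2)):
            X = np.column_stack(cols)
            beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
            rss = float(np.sum((y - X @ beta) ** 2))
            aic = n * np.log(rss / n) + 2 * X.shape[1]
            if aic < best_aic:
                best_aic, best = aic, (X, beta, rss)
        X, beta, rss = best
        k = X.shape[1]
        se = np.sqrt(rss / (n - k) * np.linalg.inv(X.T @ X)[0, 0])   # naive s.e. of beta1
        tcrit = stats.t.ppf(1 - alpha / 2, n - k)
        hits += abs(beta[0] - 1.0) <= tcrit * se
    return hits / n_sim

# Coverage of the nominal 95% interval tends to dip when beta2 is moderate (hard to detect).
for b2 in (0.0, 0.3, 2.0):
    print(b2, naive_ci_coverage(b2))
```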

9.
Two new model selection procedures based on a measure of roughness of the residuals in simple regression are proposed and studied. The first criterion utilises a certain loss function and the second comprises the application of hypothesis tests using the bootstrap methodology. The performance of these selection rules is illustrated, and comparisons are made with traditional criteria using real and artificial data; the new selection methods are found to perform more satisfactorily.

10.
We introduce a fully model-based approach to studying functional relationships between a multivariate circular dependent variable and several circular covariates, enabling inference regarding all model parameters and related prediction. Two multiple circular regression models are presented for this approach. First, for a univariate circular dependent variable, we propose the least circular mean-square error (LCMSE) estimation method, and asymptotic properties of the LCMSE estimators and inferential methods are developed and illustrated. Second, using a simulation study, we provide some practical suggestions for model selection between the two models. An illustrative example is given using a real data set from a protein structure prediction problem. Finally, a straightforward extension to the case of a multivariate dependent circular variable is provided.

11.
In this paper, we propose bandwidth selectors for nonparametric regression with dependent errors. The methods are based on criteria that approximate the average squared error. We show that these approximations are uniform over the bandwidth sequence. The criteria involve some constants that depend on the unknown error correlations. We propose a novel way of estimating these constants. Our numerical study shows that the method is quite efficient in a variety of error models.

12.
In this paper, a new estimation procedure based on composite quantile regression and functional principal component analysis (FPCA) is proposed for partially functional linear regression models (PFLRMs). The proposed method can simultaneously estimate both the parametric regression coefficients and the functional coefficient components without specification of the error distribution. It is shown empirically to be more efficient for non-normal random errors, especially Cauchy errors, and almost as efficient for normal random errors. Furthermore, based on the proposed estimation procedure, we use the penalized composite quantile regression method to study variable selection for the parametric part of the PFLRMs. Under certain regularity conditions, consistency, asymptotic normality, and the oracle property of the resulting estimators are derived. Simulation studies and a real data analysis are conducted to assess the finite sample performance of the proposed methods.

13.
Several estimators of squared prediction error have been suggested for use in model and bandwidth selection problems. Among these are cross-validation, generalized cross-validation and a number of related techniques based on the residual sum of squares. For many situations with squared error loss, e.g. nonparametric smoothing, these estimators have been shown to be asymptotically optimal in the sense that in large samples the estimator minimizing the selection criterion also minimizes squared error loss. However, cross-validation is known not to be asymptotically optimal for some 'easy' location problems. We consider selection criteria based on estimators of squared prediction risk for choosing between location estimators. We show that criteria based on adjusted residual sum of squares are not asymptotically optimal for choosing between asymptotically normal location estimators that converge at rate n^{1/2}, but are when the rate of convergence is slower. We also show that leave-one-out cross-validation is not asymptotically optimal for choosing between √n-differentiable statistics, but leave-d-out cross-validation is optimal when d → ∞ at the appropriate rate.
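To make the setting concrete, here is a minimal sketch of how a cross-validation criterion compares two root-n-consistent location estimators (the mean and the median); the abstract's point is precisely that this comparison is not asymptotically optimal in such cases. The names and the toy data are illustrative.

```python
import numpy as np

def loo_cv_score(x, estimator):
    """Leave-one-out cross-validated squared prediction error of a location estimator:
    the average of (x_i - estimator(x without x_i))^2 over the sample."""
    n = len(x)
    errs = [(x[i] - estimator(np.delete(x, i))) ** 2 for i in range(n)]
    return float(np.mean(errs))

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=200)   # heavy-tailed sample; both estimators converge at rate n^{1/2}

print("mean  :", loo_cv_score(x, np.mean))
print("median:", loo_cv_score(x, np.median))
```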

14.
Cross-validation has been widely used in the context of statistical linear models and multivariate data analysis. Recently, technological advancements have made it possible to collect new types of data that are in the form of curves. Statistical procedures for analysing these data, which are of infinite dimension, have been provided by functional data analysis. In functional linear regression, using statistical smoothing, estimation of the slope and intercept parameters is generally based on functional principal components analysis (FPCA), which allows for a finite-dimensional analysis of the problem. The estimators of the slope and intercept parameters in this context, proposed by Hall and Hosseini-Nasab [On properties of functional principal components analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol. 68 (2006), pp. 109–126], are based on FPCA and depend on a smoothing parameter that can be chosen by cross-validation. The cross-validation criterion given there is time-consuming and hard to compute. In this work, we approximate this cross-validation criterion by another criterion that, in a sense, reduces the problem to a multivariate data analysis one, and we evaluate its performance numerically. We also treat a real dataset consisting of two variables, temperature and the amount of precipitation, and estimate the regression coefficients for the former variable in a model predicting the latter.

15.
Based on B-spline basis functions and the smoothly clipped absolute deviation (SCAD) penalty, we present a new estimation and variable selection procedure based on modal regression for partially linear additive models. The outstanding merit of the new method is that it is robust against outliers or heavy-tailed error distributions and performs no worse than least-squares-based estimation in the normal error case. The main difference is that the standard quadratic loss is replaced by a kernel function depending on a bandwidth that can be selected automatically from the observed data. With appropriate selection of the regularization parameters, the new method possesses consistency in variable selection and the oracle property in estimation. Finally, both a simulation study and a real data analysis are performed to examine the performance of our approach.
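For orientation, the modal-regression idea referred to here replaces the squared-error loss by a kernel-based objective; in its simplest (linear, unpenalized) form one maximizes

```latex
% Kernel-based modal regression objective, shown in its plain linear form for orientation;
% the paper works with B-spline additive components plus SCAD penalties on top of this.
Q_h(\beta) = \frac{1}{n} \sum_{i=1}^{n} \phi_h\!\left( y_i - \mathbf{x}_i^{\top}\beta \right),
\qquad \phi_h(t) = \frac{1}{h}\,\phi\!\left(\frac{t}{h}\right)
```

with φ a kernel density (often Gaussian) and h the bandwidth mentioned in the abstract; for variable selection, SCAD penalties on the coefficients are subtracted from this objective.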

16.
A substantial fraction of statistical analyses, and in particular of statistical computing, is done under the heading of multiple linear regression, that is, the fitting of equations to multivariate data using the least squares technique for estimating parameters. The optimality properties of these estimates are described in an ideal setting which is not often realized in practice.

Frequently, we do not have "good" data in the sense that the errors are non-normal or the variance is non-homogeneous. The data may contain outliers or extremes which are not easily detectable; the variables may not be in the proper functional form; and the linearity assumption itself may be in doubt.

Prior to the mid-sixties, regression programs provided just the basic least squares computations plus possibly a step-wise algorithm for variable selection. The increased interest in regression, prompted by dramatic improvements in computers, has led to a vast amount of literature describing alternatives to least squares, improved variable selection methods, and extensive diagnostic procedures.

The purpose of this paper is to summarize and illustrate some of these recent developments. In particular, we shall review some of the potential problems with regression data, discuss the statistics and techniques used to detect these problems, and consider some of the proposed solutions. An example is presented to illustrate the effectiveness of these diagnostic methods in revealing such problems and the potential consequences of employing the proposed methods.

17.
Summary. The paper presents a general strategy for selecting the bandwidth of nonparametric regression estimators and specializes it to local linear regression smoothers. The procedure requires the sample to be divided into a training sample and a testing sample. Using the training sample we first compute a family of regression smoothers indexed by their bandwidths. Next we select the bandwidth by minimizing the empirical quadratic prediction error on the testing sample. The resulting bandwidth satisfies a finite sample oracle inequality which holds for all bounded regression functions. This permits asymptotically optimal estimation for nearly any regression function. The practical performance of the method is illustrated by a simulation study which shows good finite sample behaviour of our method compared with other bandwidth selection procedures.
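A minimal sketch of the split-sample selection rule described above, assuming a basic local linear smoother with a Gaussian kernel; the function names and the grid of candidate bandwidths are illustrative, not the paper's implementation.

```python
import numpy as np

def local_linear(x_train, y_train, x_eval, h):
    """Local linear regression estimate at each point of x_eval, Gaussian kernel, bandwidth h."""
    preds = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        w = np.exp(-0.5 * ((x_train - x0) / h) ** 2)
        X = np.column_stack([np.ones_like(x_train), x_train - x0])
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX + 1e-10 * np.eye(2), WX.T @ y_train)
        preds[j] = beta[0]                      # intercept = fitted value at x0
    return preds

def select_bandwidth(x, y, bandwidths, train_frac=0.7, seed=0):
    """Split the sample, fit the smoother for each candidate bandwidth on the training part,
    and pick the bandwidth minimizing the empirical squared prediction error on the test part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_tr = int(train_frac * len(x))
    tr, te = idx[:n_tr], idx[n_tr:]
    errs = [np.mean((y[te] - local_linear(x[tr], y[tr], x[te], h)) ** 2) for h in bandwidths]
    return bandwidths[int(np.argmin(errs))]

# Toy usage on a smooth signal with noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=300)
print(select_bandwidth(x, y, bandwidths=np.linspace(0.02, 0.3, 15)))
```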

18.
Model selection problems arise when constructing unbiased or asymptotically unbiased estimators of measures known as discrepancies in order to find the best model. Most of the usual criteria are based on goodness-of-fit and parsimony, and they aim to maximize a transformed version of the likelihood. For linear regression models with normally distributed errors, the situation is less clear when two models are equivalent: are they close to or far from the unknown true model? In this work, based on stochastic simulation and parametric simulation, we study the results of Vuong's test, Cox's test, Akaike's information criterion, the Bayesian information criterion, the Kullback information criterion and the bias-corrected Kullback information criterion, and the ability of these tests to discriminate between non-nested linear models.

19.
Summary. We propose two test statistics for use in inverse regression problems Y = Kθ + ε, where K is a given linear operator which cannot be continuously inverted, so that only noisy, indirect observations Y of the function θ are available. Both test statistics have a counterpart in classical hypothesis testing, where they are called the order selection test and the data-driven Neyman smooth test. We also introduce two model selection criteria which extend the classical Akaike information criterion and the Bayes information criterion to inverse regression problems. In a simulation study we show that the inverse order selection and Neyman smooth tests outperform their direct counterparts in many cases. The theory is motivated by data arising in confocal fluorescence microscopy, where images are observed with blurring, modelled as convolution, and stochastic error at subsequent times. The aim is then to improve the signal-to-noise ratio by averaging over the distinct images. In this context it is relevant to decide whether the images are still equal or have been changed by outside influences such as movement of the object table.

20.
Biased regression estimators have traditionally been studied using the Mean Square Error (MSE) criterion. Usually these comparisons have been based on the sum of the MSEs of the individual parameters, i.e., a scalar-valued measure that is the trace of the MSE matrix. However, since this summed MSE does not take into account the covariance structure of the estimators, we propose the use of a Pitman Measure of Closeness (PMC) criterion (Keating and Gupta, 1984; Keating and Mason, 1985). In this paper we consider two versions of PMC: one compares the estimates and the other compares the resultant predicted values for 12 different regression estimators. These estimators represent three classes of estimators, namely ridge, shrunken, and principal component estimators. The comparisons of these estimators under the PMC criteria are contrasted with the usual MSE criterion as well as the prediction mean square error. Included among the estimators is a relatively new estimator, termed the generalized principal component estimator, proposed by Jolliffe. This estimator has previously received little attention in the literature.
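For reference, the Pitman measure of closeness mentioned here is, for a scalar parameter θ and two competing estimators (the regression setting replaces the absolute differences with a suitable norm or quadratic form),

```latex
% Pitman measure of closeness of estimator 1 relative to estimator 2:
\mathrm{PMC}(\hat{\theta}_1, \hat{\theta}_2 \mid \theta)
  = P\bigl( \lvert \hat{\theta}_1 - \theta \rvert < \lvert \hat{\theta}_2 - \theta \rvert \bigr)
```

Estimator 1 is Pitman-closer than estimator 2 when this probability exceeds 1/2; unlike the summed MSE, it is driven by the joint distribution of the two estimation errors, which is why it responds to the covariance structure the trace criterion ignores.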
