Similar Literature
20 similar documents found.
1.
Classification models can demonstrate apparent prediction accuracy even when there is no underlying relationship between the predictors and the response. Variable selection procedures can lead to false positive variable selections and overestimation of true model performance. A simulation study was conducted using logistic regression with forward stepwise, best subsets, and LASSO variable selection methods with varying total sample sizes (20, 50, 100, 200) and numbers of random noise predictor variables (3, 5, 10, 15, 20, 50). Using our critical values can help reduce needless follow-up on variables having no true association with the outcome.
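A minimal sketch of the phenomenon this abstract describes (not the authors' simulation design): an L1-penalised, LASSO-type logistic regression fitted to pure noise still "selects" variables and reports above-chance training accuracy. The sample size, number of predictors, and penalty strength below are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))     # pure noise predictors
y = rng.integers(0, 2, size=n)  # response unrelated to every predictor

# The penalised fit still retains some noise variables as "selected"
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
n_selected = int(np.count_nonzero(model.coef_))
train_acc = model.score(X, y)
print(n_selected, round(train_acc, 2))
```

Any variable retained here is by construction a false positive, which is exactly the follow-up cost the paper's critical values aim to reduce.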

2.
In this paper, we analytically derive the exact formula for the mean squared error (MSE) of two weighted average (WA) estimators for each individual regression coefficient. Further, we execute numerical evaluations to investigate small sample properties of the WA estimators, and compare the MSE performance of the WA estimators with the other shrinkage estimators and the usual OLS estimator. Our numerical results show that (1) the WA estimators have smaller MSE than the other shrinkage estimators and the OLS estimator over a wide region of parameter space; (2) the range where the relative MSE of the WA estimator is smaller than that of the OLS estimator gets narrower as the number of explanatory variables k increases.

3.
If uncorrelated random variables have a common expected value and decreasing variances, then the variance of a sample mean is decreasing with the number of observations. Unfortunately, this natural and desirable variance reduction property (VRP) by augmenting data is not automatically inherited by ordinary least-squares (OLS) estimators of parameters. We derive a new decomposition for updating the covariance matrices of the OLS which implies conditions for the OLS to have the VRP. In particular, in the case of a straight-line regression, we show that the OLS estimators of intercept and slope have the VRP if the values of the explanatory variable are increasing. This also holds true for alternating two-point experimental designs.
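The VRP for the sample mean itself is easy to verify numerically. This sketch, with made-up variances, checks that Var(mean of first n observations) = (sum of the first n variances) / n² is decreasing when the variances are decreasing:

```python
import numpy as np

# Uncorrelated observations with a common mean and decreasing variances
sigma2 = np.array([4.0, 3.0, 2.5, 2.0, 1.5])  # illustrative values

# Var(mean of first n obs) = (sum of the first n variances) / n^2
var_of_mean = np.cumsum(sigma2) / np.arange(1, len(sigma2) + 1) ** 2
print(var_of_mean)
```

The paper's point is that this monotonicity does not automatically carry over to the OLS intercept and slope; it requires extra conditions on the design.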

4.
Nonparametric seemingly unrelated regression provides a powerful alternative to parametric seemingly unrelated regression for relaxing the linearity assumption. The existing methods are limited, particularly with sharp changes in the relationship between the predictor variables and the corresponding response variable. We propose a new nonparametric method for seemingly unrelated regression, which adopts a tree-structured regression framework, has satisfactory prediction accuracy and interpretability, places no restriction on the inclusion of categorical variables, and is less vulnerable to the curse of dimensionality. Moreover, an important feature is the construction of a unified tree-structured model for multivariate data, even when the predictor variables corresponding to each response variable are entirely different. This unified model can offer revealing insights, such as the underlying economic meaning. We propose the key components of tree-structured regression: an impurity function that detects complex nonlinear relationships between the predictor variables and the response variable, split rule selection with negligible selection bias, and tree size determination that addresses underfitting and overfitting. We demonstrate our proposed method using simulated data and illustrate it using data from the Korea stock exchange sector indices.
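As a simplified illustration of why a tree-structured framework copes with sharp changes in the predictor-response relationship (this is a single-response regression tree, not the authors' seemingly-unrelated-regression tree; all settings are invented):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(-2.0, 2.0, size=(n, 1))
# A sharp jump at x = 0 that a global smooth fit would blur
y = np.where(x[:, 0] < 0, -1.0, 1.0) + 0.1 * rng.normal(size=n)

# A shallow tree recovers the discontinuity via a split near 0
tree = DecisionTreeRegressor(max_depth=2).fit(x, y)
r2 = tree.score(x, y)
print(round(r2, 3))
```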

5.
The presence of collinearity among the explanatory variables results in larger standard errors of the estimated parameters. When multicollinearity is present, the ordinary least-squares (OLS) estimators tend to be unstable due to the larger variance of the estimators of the regression coefficients. Several ridge estimators have been proposed in the literature as alternatives to the OLS estimator. This article reviews some of the popular ridge estimators and attempts to provide (i) a generalized class of ridge estimators and (ii) a modified ridge estimator. The performance of the proposed estimators is investigated using Monte Carlo simulation. Simulation results indicate that the suggested estimators perform better than the OLS estimator and the other estimators considered in this article.
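A minimal sketch (not the estimators proposed in the article) of how a ridge-type estimator stabilises OLS under near-collinearity; the ridge constant k is an arbitrary illustrative value:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)             # (X'X)^{-1} X'y
k = 1.0                                              # illustrative ridge constant
beta_ridge = np.linalg.solve(XtX + k * np.eye(2), X.T @ y)
print(beta_ols, beta_ridge)
```

The ridge solution always has a smaller norm than the OLS solution, which is the source of its variance reduction under collinearity.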

6.
The calibration of forecasts for a sequence of events has an extensive literature. Since calibration does not ensure ‘good’ forecasts, the notion of refinement was introduced to provide a structure into which methods for comparing well-calibrated forecasters could be embedded. In this paper we apply these two concepts, calibration and refinement, to tree-structured statistical probability prediction systems by viewing predictions in terms of the expected value of a response variable given the values of a set of explanatory variables. When all of the variables are categorical, we show that, under suitable conditions, branching at the terminal node of a tree by adding another explanatory variable yields a tree with more refined predictions.

7.
Consider the usual linear regression model consisting of two or more explanatory variables. There are many methods aimed at indicating the relative importance of the explanatory variables. But in general these methods do not address a fundamental issue: when all of the explanatory variables are included in the model, how strong is the empirical evidence that the first explanatory variable is more or less important than the second explanatory variable? How strong is the empirical evidence that the first two explanatory variables are more important than the third explanatory variable? The paper suggests a robust method for dealing with these issues. The proposed technique is based on a particular version of explanatory power used in conjunction with a modification of the basic percentile method.
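The paper's robust method is not reproduced here; as a crude stand-in, this sketch uses squared correlation as "explanatory power" and a basic percentile bootstrap for the difference between two predictors (the data, coefficients, and number of resamples are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

def sq_corr(Xb, yb, j):
    """Squared correlation of y with predictor j: a crude 'explanatory power'."""
    return np.corrcoef(Xb[:, j], yb)[0, 1] ** 2

# Percentile bootstrap interval for the difference in explanatory power
diffs = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    diffs.append(sq_corr(X[idx], y[idx], 0) - sq_corr(X[idx], y[idx], 1))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(lo, hi)  # an interval above 0 favours the first predictor
```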

8.
In many medical studies, patients are nested or clustered within doctors. With many explanatory variables, variable selection with clustered data can be challenging. We propose a method for variable selection based on random forests that addresses clustered data through stratified binary splits. Our motivating example involves the detection of orthopedic device components from a large pool of candidates, where each patient belongs to a surgeon. Simulations compare the performance of survival forests grown using the stratified logrank statistic to those using conventional and robust logrank statistics, as well as a method to select variables using a threshold value based on a variable's empirical null distribution. The stratified logrank test performs better than the conventional and robust methods when the data are generated to have cluster-specific effects and, when cluster sizes are sufficiently large, performs comparably to the splitting alternatives in the absence of cluster-specific effects. Thresholding was effective at distinguishing between important and unimportant variables.

9.
We consider the estimation of a regression coefficient in a linear regression when observations are missing due to nonresponse. Response is assumed to be determined by a non-observable variable which is linearly related to an observable variable. The values of the observable variable are assumed to be available for the whole sample, but the variable is not included in the regression relationship of interest. Several alternative estimators have been proposed for this situation under various simplifying assumptions. A sampling-theory approach provides three alternative estimators by considering the observations as obtained from a sub-sample, selected on the basis of the fully observable variable, as formulated by Nathan and Holt (1980). Under an econometric approach, Heckman (1979) proposed a two-stage (probit and OLS) estimator which is consistent under specific conditions. A simulation comparison of the four estimators and the ordinary least-squares estimator, under multivariate normality of all the variables involved, indicates that the econometric estimator is not robust to departures from the conditions underlying its derivation, while two of the other estimators exhibit a similar degree of stable performance over a wide range of conditions. Simulations for a non-normal distribution show that gains in performance can be obtained if observations on the independent variable are available for the whole population.

10.
This article applies general engineering rules for describing the reliability of devices working under variable stresses. The approach is based on imposing completeness and physicality. Completeness refers to the model's capability to study as many stated conditions as possible, and physicality refers to its capability to incorporate explanatory variables specified and related to each other by physical laws. The proposed reliability model has as many explanatory variables as necessary but only three unknown parameters; hence, it allows the engineer to collect reliability data from different test campaigns and to extrapolate reliability results towards other operational and design points.

11.
At present, ensemble learning has exhibited great power in stabilizing and enhancing the performance of some traditional variable selection methods such as the lasso and genetic algorithms. In this paper, a novel bagging ensemble method called BSSW is developed to implement variable ranking and selection in linear regression models. Its main idea is to execute a stepwise search algorithm on multiple bootstrap samples. In each trial, a mixed importance measure is assigned to each variable according to the order in which it is selected into the final model as well as the improvement in model fit resulting from its inclusion. Based on the importance measure averaged across the bootstrap trials, all candidate variables are ranked and then classified as important or not. To extend the scope of application, BSSW is extended to generalized linear models. Experiments carried out with simulated and real data indicate that BSSW achieves better performance in most studied cases when compared with several other existing methods.
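The core of a bagged stepwise procedure in the spirit of BSSW can be sketched as follows. This is a simplification: the importance measure here is plain selection frequency, not the paper's mixed measure combining selection order and fit improvement, and all dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 120, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # only x0, x1 matter

def forward_select(Xb, yb, max_vars=3):
    """Greedy forward selection by residual sum of squares."""
    chosen = []
    for _ in range(max_vars):
        best, best_rss = None, np.inf
        for j in range(Xb.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            A = np.column_stack([np.ones(len(yb)), Xb[:, cols]])
            beta, *_ = np.linalg.lstsq(A, yb, rcond=None)
            rss = np.sum((yb - A @ beta) ** 2)
            if rss < best_rss:
                best, best_rss = j, rss
        chosen.append(best)
    return chosen

# Bagging: run stepwise on bootstrap samples, count selection frequency
counts = np.zeros(p)
B = 30
for _ in range(B):
    idx = rng.integers(0, n, size=n)
    for j in forward_select(X[idx], y[idx]):
        counts[j] += 1
ranking = np.argsort(-counts)
print(ranking[:2])  # the two truly relevant variables should rank on top
```

Averaging selection over bootstrap samples is what stabilises the notoriously variable single-run stepwise search.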

12.
This paper describes a permutation procedure to test for the equality of selected elements of a covariance or correlation matrix across groups. It involves either centring or standardising each variable within each group before randomly permuting observations between groups. Since the assumption of exchangeability of observations between groups does not strictly hold following such transformations, Monte Carlo simulations were used to compare expected and empirical rejection levels as a function of group size, the number of groups and distribution type (Normal, mixtures of Normals, and Gamma with various values of the shape parameter). The Monte Carlo study showed that the estimated probability levels are close to those that would be obtained with an exact test except at very small sample sizes (5 or 10 observations per group). The test appears robust against non-normal data, different numbers of groups or variables per group, and unequal sample sizes per group. Power increased with sample size, effect size and the number of elements in the matrix, and decreased with increasingly unequal numbers of observations per group.
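A toy version of the procedure for two groups and a single correlation element, with within-group standardisation before permuting (group sizes, the true correlation, and the number of permutations are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two groups with the same true correlation (null hypothesis holds)
g1 = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=40)
g2 = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=40)

def corr_diff(a, b):
    return abs(np.corrcoef(a.T)[0, 1] - np.corrcoef(b.T)[0, 1])

# Standardise each variable within each group before permuting
z1 = (g1 - g1.mean(0)) / g1.std(0)
z2 = (g2 - g2.mean(0)) / g2.std(0)
observed = corr_diff(z1, z2)

pooled = np.vstack([z1, z2])
n1 = len(z1)
perm_stats = []
for _ in range(999):
    idx = rng.permutation(len(pooled))
    perm_stats.append(corr_diff(pooled[idx[:n1]], pooled[idx[n1:]]))
p_value = (1 + sum(s >= observed for s in perm_stats)) / 1000
print(p_value)
```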

13.
Many estimation procedures for quantitative linear models with autocorrelated errors have been proposed in the literature. A number of these procedures have been compared in various ways for different sample sizes and autocorrelation parameter values and for structured or random explanatory variables. In this paper, we revisit three situations that were considered to some extent in previous studies, by comparing ten estimation procedures: Ordinary Least Squares (OLS), Generalized Least Squares (GLS), estimated Generalized Least Squares (six procedures), Maximum Likelihood (ML), and First Differences (FD). The six estimated GLS procedures and the ML procedure differ in the way the error autocovariance matrix is estimated. The three situations can be defined as follows: Case 1, the explanatory variable x in the simple linear regression is fixed; Case 2, x is purely random; and Case 3, x is first-order autoregressive. Following a theoretical presentation, the ten estimation procedures are compared in a Monte Carlo study conducted in the time domain, where the errors are first-order autoregressive in Cases 1-3. The measure of comparison for the estimation procedures is their efficiency relative to OLS. It is evaluated as a function of the time series length and the magnitude and sign of the error autocorrelation parameter. Overall, knowledge of the model of the time series process generating the errors enhances efficiency in estimated GLS. Differences in the efficiency of estimation procedures between Case 1 and Cases 2 and 3, as well as differences in efficiency among procedures in a given situation, are observed and discussed.
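The OLS-versus-GLS comparison at the heart of such studies can be sketched for Case 1 (fixed, trended regressor) with AR(1) errors. Here the AR(1) covariance is assumed known, whereas the estimated GLS procedures in the paper estimate it; the autocorrelation value and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = np.arange(n, dtype=float)   # Case 1: fixed, trended regressor
rho = 0.6                       # AR(1) error autocorrelation

# Generate AR(1) errors
e = np.zeros(n)
e[0] = rng.normal()
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS with known AR(1) correlation: V[i, j] = rho**|i - j|
# (the constant variance factor cancels out of the GLS formula)
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_ols, beta_gls)
```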

14.
In this study, the performances of linear regression techniques, which are used especially in method comparison studies in clinical chemistry, are compared via Monte Carlo simulation. Regression techniques that take the measurement errors of both the dependent and independent variables into account are called Type II regression techniques. We also compare the performances of Type II and Type I regression techniques (classical techniques that do not take the measurement error of the independent variable into account) for different sample sizes and different shape parameters of the Weibull distribution. The mean squared error is used as the performance criterion for each technique. MATLAB 7.02 software is used in the simulation study. In all conditions, the ordinary least-squares (OLS)-bisector regression technique, which bisects OLS(Y | X) and OLS(X | Y), shows the best performance.
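The OLS-bisector slope can be computed directly from the two OLS slopes. This sketch uses the standard bisector formula (as given by Isobe et al.) on simulated data with normal measurement error in x, not the study's Weibull design:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
true_x = rng.normal(0, 2, size=n)
x = true_x + rng.normal(0, 0.5, size=n)          # measurement error in X
y = 1.0 + 2.0 * true_x + rng.normal(0, 0.5, size=n)

sxx = np.sum((x - x.mean()) ** 2)
syy = np.sum((y - y.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))

b1 = sxy / sxx   # slope of OLS(Y | X), attenuated by error in x
b2 = syy / sxy   # slope of OLS(X | Y), expressed on the y = a + b*x scale
# Bisector of the two OLS lines
b_bis = (b1 * b2 - 1 + np.sqrt((1 + b1**2) * (1 + b2**2))) / (b1 + b2)
print(b1, b2, b_bis)
```

The bisector slope always lies between the two OLS slopes, which is why it compromises between the attenuation of OLS(Y | X) and the inflation of OLS(X | Y).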

15.
The integration of results from independent studies in order to make inferences about a common threshold is an important problem with many practical applications. In this article, we apply the generalized variable method to make inferences on the common threshold of several exponential distributions when the scale (or rate) parameters are unknown and unequal. The performance of the proposed method is evaluated numerically and compared with other existing methods. Numerical results from both simulation studies and real data analyses show that the proposed method is applicable and performs better than the other methods even when sample sizes are small.

16.
To compare their performance on high dimensional data, several regression methods are applied to data sets in which the number of explanatory variables greatly exceeds the sample sizes. The methods are stepwise regression, principal components regression, two forms of latent root regression, partial least squares, and a new method developed here. The data are four sample sets for which near infrared reflectance spectra have been determined, and the regression methods use the spectra to estimate the concentration of various chemical constituents, the latter having been determined by standard chemical analysis. Thirty-two regression equations are estimated using each method and their performances are evaluated using validation data sets. Although it is the most widely used, stepwise regression was decidedly poorer than the other methods considered. Differences between the latter were small, with partial least squares performing slightly better than the other methods under all criteria examined, albeit not by a statistically significant amount.

17.

In this article, the validity of procedures for testing the significance of the slope in quantitative linear models with one explanatory variable and first-order autoregressive [AR(1)] errors is analyzed in a Monte Carlo study conducted in the time domain. Two cases are considered for the regressor: fixed and trended versus random and AR(1). In addition to the classical t-test using the Ordinary Least Squares (OLS) estimator of the slope and its standard error, we consider seven t-tests with n - 2 df built on the Generalized Least Squares (GLS) estimator or an estimated GLS estimator, three variants of the classical t-test with different variances of the OLS estimator, two asymptotic tests built on the Maximum Likelihood (ML) estimator, the F-test for fixed effects based on the Restricted Maximum Likelihood (REML) estimator in the mixed-model approach, two t-tests with n - 2 df based on first differences (FD) and first-difference ratios (FDR), and four modified t-tests using various corrections of the number of degrees of freedom. The FDR t-test, the REML F-test and the modified t-test using Dutilleul's effective sample size are the most valid among the testing procedures that do not assume complete knowledge of the covariance matrix of the errors. However, modified t-tests are not applicable and the FDR t-test suffers from a lack of power when the regressor is fixed and trended (i.e., FDR is the same as FD in this case when observations are equally spaced), whereas the REML algorithm fails to converge at small sample sizes. The classical t-test is valid when the regressor is fixed and trended and autocorrelation among errors is predominantly negative, and when the regressor is random and AR(1), like the errors, and autocorrelation is moderately negative or positive.
We discuss the results graphically, in terms of the circularity condition defined in repeated measures ANOVA and of the effective sample size used in correlation analysis with autocorrelated sample data. An example with environmental data is presented.

18.
Neighbor designs are important in experiments where the performance of a treatment is affected by the treatments applied to its adjacent plots and the neighbor effects must be removed. If each pair of distinct treatments appears exactly once as neighbors, a neighbor design is called minimal. Most neighbor designs require a large number of blocks of equal sizes; in this situation, minimal neighbor designs with unequal block sizes are preferred to reduce the experimental material. In this article, some series are presented for constructing minimal neighbor designs in circular blocks of unequal sizes.

19.
20.
In this paper we are concerned with the problems of variable selection and estimation in double generalized linear models, in which both the mean and the dispersion are allowed to depend on explanatory variables. We propose a maximum penalized pseudo-likelihood method when the number of parameters diverges with the sample size. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and asymptotic properties of the resulting estimators are established. We also carry out simulation studies and a real data analysis to assess the finite sample performance of the proposed variable selection procedure, showing that the proposed variable selection method works satisfactorily.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) · 京ICP备09084417号