Similar literature
20 similar documents found (search time: 859 ms)
1.
In an epidemiological study the regression slope between a response and predictor variable is underestimated when the predictor variable is measured imprecisely. Repeat measurements of the predictor in individuals in a subset of the study or in a separate study can be used to estimate a multiplicative factor to correct for this 'regression dilution bias'. In applied statistics publications various methods have been used to estimate this correction factor. Here we compare six different estimation methods and explain how they fall into two categories, namely regression and correlation-based methods. We provide new asymptotic variance formulae for the optimal correction factors in each category, when these are estimated from the repeat measurements subset alone, and show analytically and by simulation that the correlation method of choice gives uniformly lower variance. The simulations also show that, when the correction factor is not much greater than 1, this correlation method gives a correction factor which is closer to the true value than that from the best regression method on up to 80% of occasions. We also provide a variance formula for a modified correlation method which uses the standard deviation of the predictor variable in the main study; this shows further improved performance provided that the correction factor is not too extreme. A confidence interval for a corrected regression slope in an epidemiological study should reflect the imprecision of both the uncorrected slope and the estimated correction factor. We provide formulae for this and show that, particularly when the correction factor is large and the size of the subset of repeat measures is small, the effect of allowing for imprecision in the estimated correction factor can be substantial.

2.
Jones and Copas (1986) present theoretical and simulation results on the relative merits of a Stein predictor (Copas, 1983) and the ordinary least squares predictor in the usual linear multiple regression model, when certain distributional properties of the regressor variables arising in the past differ from those for which predictions are to be made. Here, extension is made to the practical situation where the true regression parameters are unknown. A hypothesis testing procedure is developed to help determine which of shrinkage and least squares is preferable in any given instance. This approach is applied to explain some empirical evidence on the comparative merits of the two procedures, recently given by Berk (1984).

3.
Circular data are observations that are represented as points on a unit circle. Times of day and directions of wind are two such examples. In this work, we present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is useful especially when the likelihood surface is ill behaved. Markov chain Monte Carlo techniques are used to fit the proposed model and to generate predictions. The method is illustrated using an environmental data set.

4.
Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.

5.
New robust estimates for variance components are introduced. Two simple models are considered: the balanced one-way classification model with a random factor and the balanced mixed model with one random factor and one fixed factor. However, the method of estimation proposed can be extended to more complex models. The new method of estimation we propose is based on the relationship between the variance components and the coefficients of the least-mean-squared-error predictor between two observations of the same group. This relationship enables us to transform the problem of estimating the variance components into the problem of estimating the coefficients of a simple linear regression model. The variance-component estimators derived from the least-squares regression estimates are shown to coincide with the maximum-likelihood estimates. Robust estimates of the variance components can be obtained by replacing the least-squares estimates by robust regression estimates. In particular, a Monte Carlo study shows that for outlier-contaminated normal samples, the estimates of variance components derived from GM regression estimates and the derived test outperform other robust procedures.

6.
Both the least squares estimator and M-estimators of regression coefficients are susceptible to distortion when high leverage points occur among the predictor variables in a multiple linear regression model. In this article a weighting scheme which enables one to bound the leverage values of a weighted matrix of predictor variables is proposed. Bounded-leverage weighting of the predictor variables followed by M-estimation of the regression coefficients is shown to be effective in protecting against distortion due to extreme predictor-variable values, extreme response values, or outlier-induced multicollinearities. Bounded-leverage estimators can also protect against distortion by small groups of high leverage points.

7.
An estimated 1 billion people suffer from hunger worldwide, and climate change, urbanization, and globalization have the potential to exacerbate this situation. Improved models for predicting food security are needed to understand these impacts and design interventions. However, food insecurity is the result of complex interactions between physical and socio-economic factors that can overwhelm linear regression models. More sophisticated data-mining approaches could provide an effective way to model these relationships and accurately predict food insecure situations. In this paper, we compare multiple regression and data-mining methods in their ability to predict the percent of a country's population that suffers from undernourishment using widely available predictor variables related to socio-economic settings, agricultural production and trade, and climate conditions. Averaging predictions from multiple models results in the lowest predictive error and provides an accurate method to predict undernourishment levels. Partial dependence plots are used to evaluate covariate influence and demonstrate the relationship between food insecurity and climatic and socio-economic variables. By providing insights into these relationships and a mechanism for predicting undernourishment using readily available data, statistical models like those developed here could be a useful tool for those tasked with understanding and addressing food insecurity.

8.
Naranjo and Hettmansperger (1994) recently derived a bounded influence rank regression method and suggested how hypotheses about the regression coefficients might be tested. This brief note reports some simulation results on how their procedure performs when there is one predictor. Even when the error term is highly skewed, good control over the Type I error probability is obtained. Power can be high relative to least squares regression when the error term has a heavy tailed distribution and the predictor has a symmetric distribution. However, if the predictor has a skewed distribution, power can be relatively low even when the distribution of the error term is heavy tailed. Despite this, it is argued that their method provides an important and useful alternative to ordinary least squares as well as other robust regression methods.

9.
The estimation of population parameters of the continuous common factor model from categorical observed variables is now regularly performed. It is shown that the formula for the calculation of the determinacy of the regression factor score predictor from the estimated model parameters has to be adapted under these conditions. A method for the calculation of this determinacy from the model parameters of the continuous population factor model based on categorical variables is proposed and evaluated by means of simulated population data. It turns out that using the uncorrected formula can lead to serious overestimation of determinacy for categorical variables.

10.
Predictor importance in applied regression modeling gives the main operational tools for managers and decision-makers. The paper considers estimation of predictors' importance in regression using measures introduced in works by Gibson and R. Johnson (GJ), then modified by Green, Carroll, and DeSarbo, and developed further by J. Johnson (JJ). These indices of importance are based on the orthonormal decomposition of the data matrix, and the work shows how to improve this approximation. Using predictor importance, the regression coefficients can also be adjusted to reach the best data fit and to be meaningful and interpretable. The results are compared with Shapley value regression (SVR), which is robust to multicollinearity but computationally difficult. They show that the JJ index is good for importance estimation, but the GJ index outperforms it if both predictor importance and coefficients of regression are needed; hence, this index (GJ) can be used in place of the more computationally intensive estimation by SVR. The results can be easily estimated by the considered approach, which is very useful in practical regression modeling and analysis, especially for big data.

11.
This article shows when the theoretical Lagrange multiplier solution for combining forecasts has a regression representation. This solution is not optimal in general because it imposes a restriction on an otherwise more general linear form. The optimal linear predictor based on N forecasts is presented. This predictor is or is not a regression function depending on whether the latter function is linear. I also show that the Lagrange multiplier solution may often be nearly optimal. Hence, when estimating a composite forecast, the restriction imposed by this solution may prove useful. This observation is supported in an empirical example.

12.
In functional linear regression, one conventional approach is to first perform functional principal component analysis (FPCA) on the functional predictor and then use the first few leading functional principal component (FPC) scores to predict the response variable. The leading FPCs estimated by the conventional FPCA stand for the major source of variation of the functional predictor, but these leading FPCs may not be mostly correlated with the response variable, so the prediction accuracy of the functional linear regression model may not be optimal. In this paper, we propose a supervised version of FPCA by considering the correlation of the functional predictor and response variable. It can automatically estimate leading FPCs, which represent the major source of variation of the functional predictor and are simultaneously correlated with the response variable. Our supervised FPCA method is demonstrated to have a better prediction accuracy than the conventional FPCA method by using one real application on electroencephalography (EEG) data and three carefully designed simulation studies.

13.
Abstract. Previously, small area estimation under a nested error linear regression model was studied with area level covariates subject to measurement error. However, the information on observed covariates was not used in finding the Bayes predictor of a small area mean. In this paper, we first derive the fully efficient Bayes predictor by utilizing all the available data. We then estimate the regression and variance component parameters in the model to get an empirical Bayes (EB) predictor and show that the EB predictor is asymptotically optimal. In addition, we employ the jackknife method to obtain an estimator of mean squared prediction error (MSPE) of the EB predictor. Finally, we report the results of a simulation study on the performance of our EB predictor and associated jackknife MSPE estimators. Our results show that the proposed EB predictor can lead to significant gain in efficiency over the previously proposed EB predictor.

14.
Nested error linear regression models using survey weights have been studied in small area estimation to obtain efficient model‐based and design‐consistent estimators of small area means. The covariates in these nested error linear regression models are not subject to measurement errors. In practical applications, however, there are many situations in which the covariates are subject to measurement errors. In this paper, we develop a nested error linear regression model with an area‐level covariate subject to functional measurement error. In particular, we propose a pseudo‐empirical Bayes (PEB) predictor to estimate small area means. This predictor borrows strength across areas through the model and makes use of the survey weights to preserve the design consistency as the area sample size increases. We also employ a jackknife method to estimate the mean squared prediction error (MSPE) of the PEB predictor. Finally, we report the results of a simulation study on the performance of our PEB predictor and associated jackknife MSPE estimator.

15.
In this article, we propose a novel approach to fit a functional linear regression in which both the response and the predictor are functions. We consider the case where the response and the predictor processes are both sparsely sampled at random time points and are contaminated with random errors. In addition, the random times are allowed to be different for the measurements of the predictor and the response functions. The aforementioned situation often occurs in longitudinal data settings. To estimate the covariance and the cross‐covariance functions, we use a regularization method over a reproducing kernel Hilbert space. The estimate of the cross‐covariance function is used to obtain estimates of the regression coefficient function and of the functional singular components. We derive the convergence rates of the proposed cross‐covariance, the regression coefficient, and the singular component function estimators. Furthermore, we show that, under some regularity conditions, the estimator of the coefficient function has a minimax optimal rate. We conduct a simulation study and demonstrate merits of the proposed method by comparing it to some other existing methods in the literature. We illustrate the method by an example of an application to a real‐world air quality dataset. The Canadian Journal of Statistics 47: 524–559; 2019 © 2019 Statistical Society of Canada

16.
We introduce a nonparametric quantile predictor for multivariate time series via generalizing the well-known univariate conditional quantile into a multivariate setting for dependent data. Applying the multivariate predictor to predicting tail conditional quantiles from foreign exchange daily returns, it is observed that the accuracy of extreme tail quantile predictions can be greatly improved by incorporating interdependence between the returns in a bivariate framework. As a special application of the multivariate quantile predictor, we also introduce a so-called joint-horizon quantile predictor that is used to produce multi-step quantile predictions in one-go from univariate time series realizations. A simulation example is discussed to illustrate the relevance of the joint-horizon approach.

17.
18.
Research on multiple comparisons during the past 60 years or so has focused mainly on the comparison of several population means. Spurrier (J Am Stat Assoc 94:483–488, 1999) and Liu et al. (J Am Stat Assoc 99:395–403, 2004) considered the multiple comparison of several linear regression lines. They assumed that there was no functional relationship between the predictor variables. For the case of the polynomial regression model, the functional relationship between the predictor variables does exist. This lack of a full utilization of the functional relationship between the predictor variables may have some undesirable consequences. In this article we introduce an exact method for the multiple comparison of several polynomial regression models. This method takes full advantage of the features of the polynomial regression model, and therefore, it can quickly and accurately compute the critical constant. This proposed method allows various types of comparisons, including pairwise, many-to-one and successive, and it also allows the predictor variable to be either unconstrained or constrained to a finite interval. The examples from the dose-response study are used to illustrate the method. MATLAB programs have been written for easy implementation of this method.

19.
Nonparametric seemingly unrelated regression provides a powerful alternative to parametric seemingly unrelated regression for relaxing the linearity assumption. The existing methods are limited, particularly with sharp changes in the relationship between the predictor variables and the corresponding response variable. We propose a new nonparametric method for seemingly unrelated regression, which adopts a tree-structured regression framework, has satisfactory prediction accuracy and interpretability, no restriction on the inclusion of categorical variables, and is less vulnerable to the curse of dimensionality. Moreover, an important feature is constructing a unified tree-structured model for multivariate data, even though the predictor variables corresponding to the response variable are entirely different. This unified model can offer revelatory insights such as underlying economic meaning. We propose the key factors of tree-structured regression, which are an impurity function detecting complex nonlinear relationships between the predictor variables and the response variable, split rule selection with negligible selection bias, and tree size determination solving underfitting and overfitting problems. We demonstrate our proposed method using simulated data and illustrate it using data from the Korea stock exchange sector indices.

20.
The paper describes two regression models (principal components and maximum-likelihood factor analysis) which may be used when the stochastic predictor variables are highly intercorrelated and/or contain measurement error. The two problems can occur jointly, for example in social-survey data where the true (but unobserved) covariance matrix can be singular. Departure from singularity of the sample dispersion matrix is then due to measurement error. We first consider the more elementary principal components regression model, where it is shown that it can be derived as a special case of (i) canonical correlation, and (ii) restricted least squares. The second part consists of the more general maximum-likelihood factor-analysis regression model, which is derived from the generalized inverse of the product of two singular matrices. Also, it is proved that factor-analysis regression can be considered as an instrumental variables estimator and therefore does not depend on whether factors have been “properly” identified in terms of substantive behaviour. Consequently the additional task of rotating factors to “simple structure” does not arise.
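
As a rough illustration of the principal components regression mentioned in item 20, the sketch below shows the textbook form of the technique in Python with NumPy. It is not taken from the paper: the function name `pcr_fit`, the choice to center without scaling, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Textbook principal components regression (hypothetical helper).

    Regresses y on the k leading principal component scores of the
    centered predictors, then maps the fit back to coefficients on
    the original predictors. Returns (intercept, beta).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                      # center predictors (no scaling: an assumption)
    # Thin SVD of the centered predictors: Xc = U @ diag(s) @ Vt
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                       # p x k matrix of leading loadings
    scores = Xc @ V_k                    # n x k component scores
    # Score columns are mutually orthogonal with squared norms s**2,
    # so least squares on the scores reduces to elementwise division.
    gamma = (scores.T @ (y - y_mean)) / (s[:k] ** 2)
    beta = V_k @ gamma                   # coefficients on the original predictors
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Simulated example with two nearly collinear predictors (made-up data).
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([
    z + 0.01 * rng.normal(size=200),
    z + 0.01 * rng.normal(size=200),
    rng.normal(size=200),
])
y = X.sum(axis=1) + 0.1 * rng.normal(size=200)

# Dropping the smallest component discards the near-singular direction.
intercept_2, beta_2 = pcr_fit(X, y, k=2)
```

Keeping all components (`k` equal to the number of predictors) reproduces ordinary least squares whenever the centered predictor matrix has full column rank, which is one way to sanity-check an implementation like this.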
