Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
A recent article in this journal presented a variety of expressions for the coefficient of determination (R²) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the R² statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the R² statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.

3.
The coefficient of determination (R²) is perhaps the single most extensively used measure of goodness of fit for regression models. It is also widely misused. The primary source of the problem is that, except for linear models with an intercept term, the several alternative R² statistics are not generally equivalent. This article discusses various considerations and potential pitfalls in using the R² statistics. Specific points are exemplified by means of empirical data. A new resistant statistic is also introduced.
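The non-equivalence described in this abstract can be illustrated with a minimal sketch. The function name `r2_variants` and the toy data are assumptions made for illustration and are not taken from the article. The sketch compares two textbook expressions, 1 - SSE/SST and SSR/SST, for a straight-line fit with and without an intercept.

```python
import numpy as np

def r2_variants(x, y, intercept=True):
    """Return two common R^2 expressions for a least-squares line fit.

    Hypothetical helper for illustration only.
    """
    # Design matrix: with or without a column of ones for the intercept.
    X = np.column_stack([np.ones_like(x), x]) if intercept else x[:, None]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    ss_tot = np.sum((y - y.mean()) ** 2)                      # total sum of squares about the mean
    r2_resid = 1.0 - np.sum(resid ** 2) / ss_tot              # 1 - SSE/SST
    r2_explained = np.sum((fitted - y.mean()) ** 2) / ss_tot  # SSR/SST
    return r2_resid, r2_explained

# Toy data (made up): roughly y = 2 + x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 3.9, 5.2, 5.8, 7.1])

with_intercept = r2_variants(x, y, intercept=True)
through_origin = r2_variants(x, y, intercept=False)
print("with intercept:", with_intercept)
print("through origin:", through_origin)
```

With an intercept the two expressions agree up to floating-point error. For the fit through the origin they diverge, and the SSR/SST form can even exceed one.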

4.
Clusterwise regression aims to cluster data sets where the clusters are characterized by their specific regression coefficients in a linear regression model. In this paper, we propose a method for determining a partition which uses an idea of robust regression. We start with some random weighting to determine a start partition and continue in the spirit of M-estimators. The residuals for all regressions are used to assign the observations to the different groups. As target function we use the determination coefficient R²_w for the overall model. This coefficient is suitably defined for weighted regression.

5.
Inverse Gaussian regression models are useful for data where both the independent and dependent variables are nonnegative and the variance of the dependent variable depends on the independent variable. Zero intercept inverse Gaussian regression models are presented with nonconstant variance, constant ratio of variance to the mean and constant coefficient of variation. The power function for testing hypotheses about the slope is given for all of these models.

6.
Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design‐consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design‐consistent, but are systematically larger than would be obtained with a cross‐sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
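As a point of reference for this abstract, the usual (unweighted) Cox–Snell and Nagelkerke summaries can be computed from the null and fitted log-likelihoods. The sketch below uses the standard textbook definitions and is not the design-consistent estimator the note proposes. The function names and the example log-likelihood values are assumptions for illustration.

```python
import math

def cox_snell_r2(loglik_null, loglik_fit, n):
    """Cox-Snell summary: 1 - (L_null / L_fit)^(2/n), from log-likelihoods."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_fit) / n)

def nagelkerke_r2(loglik_null, loglik_fit, n):
    """Nagelkerke summary: Cox-Snell rescaled by its maximum attainable value."""
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell_r2(loglik_null, loglik_fit, n) / max_cox_snell

# Made-up example: n = 100 binary outcomes with half of them equal to one,
# so the intercept-only log-likelihood is 100 * log(0.5).
n = 100
ll_null = n * math.log(0.5)
ll_fit = -50.0  # hypothetical fitted log-likelihood

print(cox_snell_r2(ll_null, ll_fit, n))
print(nagelkerke_r2(ll_null, ll_fit, n))
```

For a discrete outcome the fitted log-likelihood cannot exceed zero, so the Cox–Snell value is bounded below one. The Nagelkerke rescaling divides by that bound so that a perfect fit gives a value of one.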

7.
Variability explained by covariates or explained variance is a well‐known concept in assessing the importance of covariates for dependent outcomes. In this paper we study R² statistics of explained variance pertinent to longitudinal data under linear mixed‐effect models, where the R² statistics are computed at two different levels to measure, respectively, within‐ and between‐subject variabilities explained by the covariates. By deriving the limits of R² statistics, we find that the interpretation of explained variance for the existing R² statistics is clear only in the case where the covariance matrix of the outcome vector is compound symmetric. Two new R² statistics are proposed to address the effect of time‐dependent covariate means. In the general case where the outcome covariance matrix is not compound symmetric, we introduce the concept of compound symmetry projection and use it to define level‐one and level‐two R² statistics. Numerical results are provided to support the theoretical findings and demonstrate the performance of the R² statistics. The Canadian Journal of Statistics 38: 352–368; 2010 © 2010 Statistical Society of Canada

8.
This paper extends an analysis of variance for categorical data (CATANOVA) procedure to multidimensional contingency tables involving several factors and a response variable measured on a nominal scale. Using an appropriate measure of total variation for multinomial data, partial and multiple association measures are developed as R² quantities which parallel the analogous statistics in multiple linear regression for quantitative data. In addition, test statistics are derived in terms of these R² criteria. Finally, this CATANOVA approach is illustrated within the context of a three-way contingency table from a multicenter clinical trial.

9.
For mixed regression models, we define a variance decomposition including three terms: explained individual variance, unexplained individual variance and noise variance. In contrast to traditional variance decomposition, it is thus the unexplained, not the explained, variance that is split. It gives rise to a coefficient of individual determination (CID) defined as the estimated fraction of explained individual variance. We argue that in many applications CID is a valuable complement to R², since it excludes noise variance (which can never be explained) and thus has one as a natural upper bound.

10.
To assess the quality of the fit in a multiple linear regression, the coefficient of determination or R² is a very simple tool, yet it is the one most used by practitioners. Indeed, it is reported in most statistical analyses, and although it is not recommended as a final model selection tool, it provides an indication of the suitability of the chosen explanatory variables in predicting the response. In the classical setting, it is well known that the least-squares fit and coefficient of determination can be arbitrary and/or misleading in the presence of a single outlier. In many applied settings, the assumption of normality of the errors and the absence of outliers are difficult to establish. In these cases, robust procedures for estimation and inference in linear regression are available and provide a suitable alternative.

11.
This article examines several goodness-of-fit measures in the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified measures and one new pseudo-R² measure are proposed. For the probit regression model, empirical comparisons are made for different goodness-of-fit measures with the squared sample correlation coefficient of the observed response and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.

12.
Many robust regression estimators are defined by minimizing a measure of spread of the residuals. An accompanying R²-measure, or multiple correlation coefficient, is then easily obtained. In this paper, local robustness properties of these robust R²-coefficients are investigated. It is also shown how confidence intervals for the population multiple correlation coefficient can be constructed in the case of multivariate normality.

13.
The coefficient of determination, also known as R², is a common measure in regression analysis. Many scientists use the R² and the adjusted R² on a regular basis. In most cases, researchers treat the coefficient of determination as an index of ‘usefulness’ or ‘goodness of fit,’ and in some cases, they even treat it as a model selection tool. When the data are incomplete, most researchers and common statistical software use complete case analysis to estimate the R², a procedure that might lead to biased results. In this paper, I introduce the use of multiple imputation for the estimation of R² and adjusted R² in incomplete data sets. I illustrate my methodology using a biomedical example.

14.
Inverse Gaussian regression models are useful for regression data where both variables are nonnegative and the variance of the dependent variable depends on the independent variable. Zero intercept inverse Gaussian regression models are presented with non-constant variance, constant ratio of variance to the mean and constant coefficient of variation. For purposes of calibration, the prediction band is used to give point and interval estimators for the independent variable. The results are illustrated with a real data set.

15.
Two methods are suggested for generating R² measures for a wide class of models. These measures are linked to the R² of the standard linear regression model through Wald and likelihood ratio statistics for testing the joint significance of the explanatory variables. Some currently used R² measures are shown to be special cases of these methods.

16.
R-squared (R²) and adjusted R-squared (R²_Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators for the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x’s are fixed. A new parameter is defined, in a similar fashion to the construction of R², but relying on the true parameters rather than their estimates. (The parameter definition also includes the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ²*. The proposed ρ²* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R²_Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R² overestimates ρ²*, while the traditional R²_Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²_Adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²_Adj to ρ²* is demonstrated. The effects of model parameters on the bias of R² and R²_Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
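The contrast between the unadjusted and the traditional adjusted statistic can be sketched as follows. This is a minimal illustration using the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1). It does not implement the article's parametric coefficient or its bi-adjusted estimator, and the helper names and simulated data are assumptions.

```python
import numpy as np

def r2(X, y):
    """Unadjusted R^2 for an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2_value, n, p):
    """Traditional adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - p - 1)

# Simulated data (made up): y depends on x only; noise_var is irrelevant.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
noise_var = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

r2_small = r2(x[:, None], y)
r2_big = r2(np.column_stack([x, noise_var]), y)

print("R^2:         ", r2_small, "->", r2_big)
print("adjusted R^2:", adjusted_r2(r2_small, n, 1), "->", adjusted_r2(r2_big, n, 2))
```

Adding the irrelevant column cannot lower the unadjusted value, whereas the adjusted value may move in either direction depending on the sample.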

17.
For right-censored data, the accelerated failure time (AFT) model is an alternative to the commonly used proportional hazards regression model. It is a linear model for the (log-transformed) outcome of interest, and is particularly useful for censored outcomes that are not time-to-event, such as laboratory measurements. We provide a general and easily computable definition of the R² measure of explained variation under the AFT model for right-censored data. We study its behavior under different censoring scenarios and under different error distributions; in particular, we also study its robustness when the parametric error distribution is misspecified. Based on Monte Carlo investigation results, we recommend the log-normal distribution as a robust error distribution to be used in practice for the parametric AFT model, when the R² measure is of interest. We apply our methodology to an alcohol consumption during pregnancy data set from Ukraine.

18.
In regression models with multiplicative error, estimation is often based on either the log-normal or the gamma model. It is well known that the gamma model with constant coefficient of variation and the log-normal model with constant variance give almost the same analysis. This article focuses on the discrepancies of the regression estimates between the two models based on real examples. It finds that even though the variance or the coefficient of variation remains constant, the regression estimates may differ between the two models. It also finds that for the same positive data set, the variance is constant under the log-normal model but non-constant under the gamma model. For this data set, the regression estimates are completely different between the two models. In the process, it explains the causes of discrepancies between the two models.

19.
Prior studies have shown that automated variable selection results in models with substantially inflated estimates of the model R², and that a large proportion of selected variables are truly noise variables. These earlier studies used simulated data sets whose sample sizes were at most 100. We used Monte Carlo simulations to examine the large-sample performance of backwards variable elimination. We found that in large samples, backwards variable elimination resulted in estimates of R² that were at most marginally biased. However, even in large samples, backwards elimination tended to identify the correct regression model in a minority of the simulated data sets.

20.
We provide a simple result on the H-decomposition of a U-statistic that allows for easy determination of its magnitude when the statistic’s kernel depends on the sample size n. The result provides a direct and convenient method to characterize the asymptotic magnitude of semiparametric and nonparametric estimators or test statistics involving high dimensional sums. We illustrate the use of our result in previously studied estimators/test statistics and in a novel nonparametric R² test for overall significance of a nonparametric regression model.
