
1.

Quantifying R² bias in the presence of measurement error

Karl D. Majeske Terri Lynch-Caris Janet Brelin-Fornari 《Journal of applied statistics》2010,37(4):667-677


2.

Another Cautionary Note about R²: Its Use in Weighted Least-Squares Regression Analysis

John B. Willett Judith D. Singer 《The American statistician》2013,67(3):236-238

A recent article in this journal presented a variety of expressions for the coefficient of determination (R²) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the R² statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the R² statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.
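To make the weighted least-squares pitfall concrete, here is a minimal sketch, assuming simple linear regression with known weights. The function names and the data are illustrative assumptions, and the original-metric quantity below is only one way to express the idea; it is not claimed to be the authors' proposed estimator.

```python
from statistics import mean

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return yw - slope * xw, slope

def r2_weighted_metric(x, y, w):
    """R² computed from weighted sums of squares (one common weighted form)."""
    a, b = wls_line(x, y, w)
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    sse = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sst = sum(wi * (yi - yw) ** 2 for wi, yi in zip(w, y))
    return 1 - sse / sst

def r2_original_metric(x, y, w):
    """R² in the original metric of y, using the WLS coefficient estimates."""
    a, b = wls_line(x, y, w)
    ybar = mean(y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical data whose scatter grows with x; weights 1/x² are an assumption.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.9, 4.8, 8.6, 6.1]
w = [1 / xi ** 2 for xi in x]
print(r2_weighted_metric(x, y, w), r2_original_metric(x, y, w))
```

With equal weights the two functions coincide; with unequal weights they generally differ, so it matters which one a reported R² refers to.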

3.

Cautionary Note about R²

Tarald O. Kvålseth 《The American statistician》2013,67(4):279-285

The coefficient of determination (R²) is perhaps the single most extensively used measure of goodness of fit for regression models. It is also widely misused. The primary source of the problem is that except for linear models with an intercept term, the several alternative R² statistics are not generally equivalent. This article discusses various considerations and potential pitfalls in using the R²'s. Specific points are exemplified by means of empirical data. A new resistant statistic is also introduced.
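The non-equivalence of alternative R² definitions can be seen in a small sketch. It compares just two of the possible definitions, 1 − SSE/SST and the squared correlation between observed and fitted values, for a straight-line fit with and without an intercept. The function name and the data are illustrative assumptions, not taken from the article.

```python
from statistics import mean

def r2_variants(x, y, intercept=True):
    """Return (1 - SSE/SST, squared correlation of y and fitted values)."""
    xbar, ybar = mean(x), mean(y)
    if intercept:
        # Ordinary least squares with an intercept term.
        slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
                 / sum((a - xbar) ** 2 for a in x))
        const = ybar - slope * xbar
    else:
        # Least squares through the origin.
        slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        const = 0.0
    fitted = [const + slope * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))
    sst = sum((b - ybar) ** 2 for b in y)
    fbar = mean(fitted)
    cov = sum((b - ybar) * (f - fbar) for b, f in zip(y, fitted))
    var_f = sum((f - fbar) ** 2 for f in fitted)
    return 1 - sse / sst, cov ** 2 / (sst * var_f)

# Hypothetical data lying close to the line y = 2 + x.
x = [1, 2, 3, 4, 5]
y = [3.1, 3.9, 5.2, 5.8, 7.1]
print(r2_variants(x, y, intercept=True))   # the two definitions agree
print(r2_variants(x, y, intercept=False))  # the two definitions disagree
```

With the intercept both definitions give the same value; for the through-origin fit the residual-based version drops while the squared correlation is unchanged, which is the kind of discrepancy the abstract warns about.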

4.

A weighted least-squares approach to clusterwise regression

Rainer Schlittgen 《AStA Advances in Statistical Analysis》2011,95(2):205-217

Clusterwise regression aims to cluster data sets where the clusters are characterized by their specific regression coefficients in a linear regression model. In this paper, we propose a method for determining a partition which uses an idea of robust regression. We start with some random weighting to determine a start partition and continue in the spirit of M-estimators. The residuals for all regressions are used to assign the observations to the different groups. As target function we use the determination coefficient R²_w for the overall model. This coefficient is suitably defined for weighted regression.

5.

POWER FUNCTION FOR INVERSE GAUSSIAN REGRESSION MODELS

《Communications in Statistics - Theory and Methods》2013,42(5):787-797

Inverse Gaussian regression models are useful for data where both the independent and dependent variable are nonnegative and the variance of the dependent variable depends on the independent variable. Zero intercept inverse Gaussian regression models are presented with nonconstant variance, constant ratio of variance to the mean and constant coefficient of variation. The power function for testing hypotheses about the slope is given for all of these models.

6.

Pseudo‐R² statistics under complex sampling


Thomas Lumley 《Australian & New Zealand Journal of Statistics》2017,59(2):187-194

Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design‐consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design‐consistent, but are systematically larger than would be obtained with a cross‐sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
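For orientation, the usual (unweighted, simple-random-sample) Cox–Snell and Nagelkerke summaries can be sketched from the null and fitted log-likelihoods as below. This is not the design-based estimator proposed in the article, and it is in Python rather than the R code the article mentions. The function names are illustrative, and the Nagelkerke rescaling assumes a discrete outcome such as logistic regression, where the log-likelihood cannot exceed zero.

```python
import math

def cox_snell_r2(loglik_null, loglik_model, n):
    """Cox-Snell R²: 1 - (L_null / L_model)^(2/n), from log-likelihoods."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Cox-Snell R² rescaled by its maximum attainable value."""
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell_r2(loglik_null, loglik_model, n) / max_cox_snell

# Hypothetical example: 100 binary observations with half successes,
# so the null log-likelihood is 100 * log(0.5); the fitted value is made up.
n = 100
ll_null = n * math.log(0.5)
ll_model = -50.0
print(cox_snell_r2(ll_null, ll_model, n), nagelkerke_r2(ll_null, ll_model, n))
```

Because both quantities depend only on the likelihood ratio and the sample size, anything that distorts the sample likelihoods relative to the population, such as case–control sampling, carries over directly into these summaries.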

7.

Variability explained by covariates in linear mixed‐effect models for longitudinal data

Bo Hu Jun Shao Mari Palta 《Revue canadienne de statistique》2010,38(3):352-368

Variability explained by covariates or explained variance is a well‐known concept in assessing the importance of covariates for dependent outcomes. In this paper we study R² statistics of explained variance pertinent to longitudinal data under linear mixed‐effect models, where the R² statistics are computed at two different levels to measure, respectively, within‐ and between‐subject variabilities explained by the covariates. By deriving the limits of R² statistics, we find that the interpretation of explained variance for the existing R² statistics is clear only in the case where the covariance matrix of the outcome vector is compound symmetric. Two new R² statistics are proposed to address the effect of time‐dependent covariate means. In the general case where the outcome covariance matrix is not compound symmetric, we introduce the concept of compound symmetry projection and use it to define level‐one and level‐two R² statistics. Numerical results are provided to support the theoretical findings and demonstrate the performance of the R² statistics. The Canadian Journal of Statistics 38: 352–368; 2010 © 2010 Statistical Society of Canada

8.

Catanova for multidimensional contingency tables: Nominal-scale response

Robert J. Anderson J. Richard Landis 《Communications in Statistics - Theory and Methods》2013,42(11):1191-1206

This paper extends an analysis of variance for categorical data (CATANOVA) procedure to multidimensional contingency tables involving several factors and a response variable measured on a nominal scale. Using an appropriate measure of total variation for multinomial data, partial and multiple association measures are developed as R² quantities which parallel the analogous statistics in multiple linear regression for quantitative data. In addition, test statistics are derived in terms of these R² criteria. Finally, this CATANOVA approach is illustrated within the context of a three-way contingency table from a multicenter clinical trial.

9.

On the coefficient of determination for mixed regression models

Ola Hössjer 《Journal of statistical planning and inference》2008

For mixed regression models, we define a variance decomposition including three terms, explained individual variance, unexplained individual variance and noise variance. In contrast to traditional variance decomposition, it is thus the unexplained, not the explained, variance that is split. It gives rise to a coefficient of individual determination (CID) defined as the estimated fraction of explained individual variance. We argue that in many applications CID is a valuable complement to R², since it excludes noise variance (which can never be explained) and thus has one as a natural upper bound.

10.

A robust coefficient of determination for regression

Olivier Renaud Maria-Pia Victoria-Feser 《Journal of statistical planning and inference》2010

To assess the quality of the fit in a multiple linear regression, the coefficient of determination or R² is a very simple tool, yet the most used by practitioners. Indeed, it is reported in most statistical analyses, and although it is not recommended as a final model selection tool, it provides an indication of the suitability of the chosen explanatory variables in predicting the response. In the classical setting, it is well known that the least-squares fit and coefficient of determination can be arbitrary and/or misleading in the presence of a single outlier. In many applied settings, the assumption of normality of the errors and the absence of outliers are difficult to establish. In these cases, robust procedures for estimation and inference in linear regression are available and provide a suitable alternative.

11.

Comparison of Goodness-of-Fit Measures in Probit Regression Model

Berna Yazici Özlem Alpu Yaning Yang 《Communications in Statistics - Simulation and Computation》2013,42(5):1061-1073

This article examines several goodness-of-fit measures in the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified and one new pseudo-R² measure are proposed. For the probit regression model, empirical comparisons are made for different goodness-of-fit measures with the squared sample correlation coefficient of the observed response and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.

12.

Estimators of the multiple correlation coefficient: Local robustness and confidence intervals

Christophe Croux Catherine Dehon 《Statistical Papers》2003,44(3):315-334

Many robust regression estimators are defined by minimizing a measure of spread of the residuals. An accompanying R²-measure, or multiple correlation coefficient, is then easily obtained. In this paper, local robustness properties of these robust R²-coefficients are investigated. It is also shown how confidence intervals for the population multiple correlation coefficient can be constructed in the case of multivariate normality.

13.

The estimation of R² and adjusted R² in incomplete data sets using multiple imputation

Ofer Harel 《Journal of applied statistics》2009,36(10):1109-1118

The coefficient of determination, known also as the R², is a common measure in regression analysis. Many scientists use the R² and the adjusted R² on a regular basis. In most cases, the researchers treat the coefficient of determination as an index of ‘usefulness’ or ‘goodness of fit,’ and in some cases, they even treat it as a model selection tool. In cases in which the data is incomplete, most researchers and common statistical software will use complete case analysis in order to estimate the R², a procedure that might lead to biased results. In this paper, I introduce the use of multiple imputation for the estimation of R² and adjusted R² in incomplete data sets. I illustrate my methodology using a biomedical example.

14.

Calibration for inverse gaussian regression

Mammo Woldie J. Leroy Folks 《Communications in Statistics - Theory and Methods》2013,42(10):2609-2620

Inverse Gaussian regression models are useful for regression data where both variables are nonnegative and the variance of the dependent variable depends on the independent variable. Zero intercept inverse Gaussian regression models are presented with non-constant variance, constant ratio of variance to the mean and constant coefficient of variation. For purposes of calibration, the prediction band is used to give point and interval estimators for the independent variable. The results are illustrated with a real data set.

15.

R² Measures Based on Wald and Likelihood Ratio Joint Significance Tests

Lonnie Magee 《The American statistician》2013,67(3):250-253

Two methods are suggested for generating R² measures for a wide class of models. These measures are linked to the R² of the standard linear regression model through Wald and likelihood ratio statistics for testing the joint significance of the explanatory variables. Some currently used R²'s are shown to be special cases of these methods.

16.

The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments

Hillel Bar-Gera 《The American statistician》2017,71(2):112-119

R-squared (R²) and adjusted R-squared (R²_Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators for the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x’s are fixed. A new parameter is defined, in a similar fashion to the construction of R², but relying on the true parameters rather than their estimates. (The parameter definition includes also the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ²_*. The proposed ρ²_* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R²_Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R² overestimates ρ²_*, while the traditional R²_Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²_Adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²_Adj to ρ²_* is demonstrated. The effects of model parameters on the bias of R² and R²_Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
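As a reminder of what the traditional adjustment does, here is a minimal sketch of the textbook adjusted R² formula, assuming a linear model with an intercept, n observations and p explanatory variables. The function name is illustrative. This is neither the parametric coefficient ρ²_* nor the bi-adjusted estimator discussed in the abstract.

```python
def adjusted_r2(r2, n, p):
    """Traditional adjusted R²: 1 - (1 - R²) * (n - 1) / (n - p - 1).

    r2: unadjusted R² of a linear model with an intercept
    n:  number of observations
    p:  number of explanatory variables (excluding the intercept)
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 residual degrees of freedom")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: the same unadjusted R² is penalised more heavily
# as the number of explanatory variables grows.
print(adjusted_r2(0.5, n=21, p=2))
print(adjusted_r2(0.5, n=21, p=5))
```

The penalty depends only on n and p, not on whether the added variables are relevant, which is consistent with the abstract's remark that R²_Adj may move in either direction when variables are added or removed.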

17.

A study of R² measure under the accelerated failure time models

Priscilla H. Chan Christina D. Chambers 《Communications in Statistics - Simulation and Computation》2018,47(2):380-391

For right-censored data, the accelerated failure time (AFT) model is an alternative to the commonly used proportional hazards regression model. It is a linear model for the (log-transformed) outcome of interest, and is particularly useful for censored outcomes that are not time-to-event, such as laboratory measurements. We provide a general and easily computable definition of the R² measure of explained variation under the AFT model for right-censored data. We study its behavior under different censoring scenarios and under different error distributions; in particular, we also study its robustness when the parametric error distribution is misspecified. Based on Monte Carlo investigation results, we recommend the log-normal distribution as a robust error distribution to be used in practice for the parametric AFT model, when the R² measure is of interest. We apply our methodology to an alcohol consumption during pregnancy data set from Ukraine.

18.

Discrepancy in regression estimates between log-normal and gamma: some case studies

Rabindra Nath Das 《Journal of applied statistics》2012,39(1):97-111

In regression models with multiplicative error, estimation is often based on either the log-normal or the gamma model. It is well known that the gamma model with constant coefficient of variation and the log-normal model with constant variance give almost the same analysis. This article focuses on the discrepancies of the regression estimates between the two models based on real examples. It identifies that even though the variance or the coefficient of variation remains constant, the regression estimates may differ between the two models. It also identifies that for the same positive data set, the variance is constant under the log-normal model but non-constant under the gamma model. For this data set, the regression estimates are completely different between the two models. In the process, it explains the causes of discrepancies between the two models.

19.

The large-sample performance of backwards variable elimination

Peter C. Austin 《Journal of applied statistics》2008,35(12):1355-1370

Prior studies have shown that automated variable selection results in models with substantially inflated estimates of the model R², and that a large proportion of selected variables are truly noise variables. These earlier studies used simulated data sets whose sample sizes were at most 100. We used Monte Carlo simulations to examine the large-sample performance of backwards variable elimination. We found that in large samples, backwards variable elimination resulted in estimates of R² that were at most marginally biased. However, even in large samples, backwards elimination tended to identify the correct regression model in a minority of the simulated data sets.

20.

An Asymptotic Characterization of Finite Degree U-statistics With Sample Size-Dependent Kernels: Applications to Nonparametric Estimators and Test Statistics

Feng Yao 《Communications in Statistics - Theory and Methods》2013,42(15):3251-3265

We provide a simple result on the H-decomposition of a U-statistic that allows for easy determination of its magnitude when the statistic’s kernel depends on the sample size n. The result provides a direct and convenient method to characterize the asymptotic magnitude of semiparametric and nonparametric estimators or test statistics involving high dimensional sums. We illustrate the use of our result in previously studied estimators/test statistics and in a novel nonparametric R² test for overall significance of a nonparametric regression model.