1.

Goodness-of-fit measures of R² for repeated measures mixed effect models

Honghu Liu Yan Zheng Jie Shen 《Journal of applied statistics》2008,35(10):1081-1092

The linear mixed effects model (LMEM) is efficient in modeling repeated measures longitudinal data. However, little research has been done in developing goodness-of-fit measures that can evaluate the models, particularly those that can be interpreted in an absolute sense without referencing a null model. This paper proposes three coefficients of determination (R²) as goodness-of-fit measures for LMEM with repeated measures longitudinal data. Theorems are presented describing the properties of R² and relationships between the R² statistics. A simulation study was conducted to evaluate and compare the R² along with other criteria from the literature. Finally, we applied the proposed R² to real virologic response data from an HIV-patient cohort. We conclude that our proposed R² statistics have more advantages than other goodness-of-fit measures in the literature, in terms of robustness to sample size, intuitive interpretation, a well-defined range, and no need to specify a null model.

2.

Comparison of Goodness-of-Fit Measures in Probit Regression Model

Berna Yazici Özlem Alpu Yaning Yang 《Communications in Statistics - Simulation and Computation》2013,42(5):1061-1073

This article examines several goodness-of-fit measures in the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified and one new pseudo-R² measure are proposed. For the probit regression model, empirical comparisons are made for different goodness-of-fit measures with the squared sample correlation coefficient of the observed response and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.

3.

A New Measure of Fit for Equations With Dichotomous Dependent Variables

Arturo Estrella 《Journal of Business & Economic Statistics》2013,31(2):198-205

The econometrics literature contains many alternative measures of goodness of fit, roughly analogous to R², for use with equations with dichotomous dependent variables. There is, however, no consensus as to the measures' relative merits or about which ones should be reported in empirical work. This article proposes a new measure that possesses several useful properties that the other measures lack. The new measure may be interpreted intuitively in a similar way to R² in the linear regression context.

4.

THE MULTIPLE CORRELATION COEFFICIENT AND FISHER'S A STATISTIC

W. N. Venable 《Australian & New Zealand Journal of Statistics》1985,27(2):172-182

Fisher's A statistic, often called the adjusted R² statistic, is shown to be a close approximation to the maximum likelihood estimate of the multiple correlation coefficient, ρ², based on the marginal distribution of R². Expansions for the estimate are obtained. The same methods lead to maximum marginal likelihood estimators for the noncentrality parameters for noncentral χ² and F.

5.

The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments

Hillel Bar-Gera 《The American statistician》2017,71(2):112-119

R-squared (R²) and adjusted R-squared (R²_Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators for the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x’s are fixed. A new parameter is defined, in a similar fashion to the construction of R², but relying on the true parameters rather than their estimates. (The parameter definition includes also the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ²_*. The proposed ρ²_* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R²_Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R² overestimates ρ²_*, while the traditional R²_Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²_Adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²_Adj to ρ²_* is demonstrated. The effects of model parameters on the bias of R² and R²_Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
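
For reference, the two sample statistics compared in this abstract are usually written as follows for a linear model with an intercept. The symbols SSE (residual sum of squares), SST (total sum of squares about the mean), n (number of observations), and p (number of explanatory variables) are notation assumed here, not taken from the abstract, and the article's own ρ²_* is not reproduced.

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}},
\qquad
R^2_{\mathrm{Adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Adding a variable cannot increase SSE, which is why the unadjusted R² cannot go down when variables are added, while the factor (n − 1)/(n − p − 1) lets R²_Adj move in either direction.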

6.

Variability explained by covariates in linear mixed‐effect models for longitudinal data

Bo Hu Jun Shao Mari Palta 《Revue canadienne de statistique》2010,38(3):352-368

Variability explained by covariates or explained variance is a well‐known concept in assessing the importance of covariates for dependent outcomes. In this paper we study R² statistics of explained variance pertinent to longitudinal data under linear mixed‐effect models, where the R² statistics are computed at two different levels to measure, respectively, within‐ and between‐subject variabilities explained by the covariates. By deriving the limits of R² statistics, we find that the interpretation of explained variance for the existing R² statistics is clear only in the case where the covariance matrix of the outcome vector is compound symmetric. Two new R² statistics are proposed to address the effect of time‐dependent covariate means. In the general case where the outcome covariance matrix is not compound symmetric, we introduce the concept of compound symmetry projection and use it to define level‐one and level‐two R² statistics. Numerical results are provided to support the theoretical findings and demonstrate the performance of the R² statistics. The Canadian Journal of Statistics 38: 352–368; 2010 © 2010 Statistical Society of Canada

7.

A Coefficient of Determination for Generalized Linear Models

Dabao Zhang 《The American statistician》2017,71(4):310-316

The coefficient of determination, a.k.a. R², is well-defined in linear regression models, and measures the proportion of variation in the dependent variable explained by the predictors included in the model. To extend it for generalized linear models, we use the variance function to define the total variation of the dependent variable, as well as the remaining variation of the dependent variable after modeling the predictive effects of the independent variables. Unlike other definitions that demand complete specification of the likelihood function, our definition of R² only needs to know the mean and variance functions, so it is applicable to more general quasi-models. It is consistent with the classical measure of uncertainty using variance, and reduces to the classical definition of the coefficient of determination when linear regression models are considered.

8.

Cautionary Note about R²

Tarald O. Kvålseth 《The American statistician》2013,67(4):279-285

The coefficient of determination (R²) is perhaps the single most extensively used measure of goodness of fit for regression models. It is also widely misused. The primary source of the problem is that except for linear models with an intercept term, the several alternative R² statistics are not generally equivalent. This article discusses various considerations and potential pitfalls in using the R²'s. Specific points are exemplified by means of empirical data. A new resistant statistic is also introduced.
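
The non-equivalence described in this abstract can be illustrated with a minimal sketch. It compares two common definitions, 1 − SSE/SST and the squared correlation between observed and fitted values, for a line fitted with and without an intercept. The function names and the five data points are hypothetical and chosen only for illustration; they are not taken from the article, and the article's resistant statistic is not implemented.

```python
def mean(values):
    return sum(values) / len(values)

def r2_residual(y, fitted):
    """R^2 defined as 1 - SSE/SST, with SST taken about the mean of y."""
    y_bar = mean(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    sst = sum((obs - y_bar) ** 2 for obs in y)
    return 1 - sse / sst

def r2_correlation(y, fitted):
    """R^2 defined as the squared sample correlation of y and the fitted values."""
    y_bar, f_bar = mean(y), mean(fitted)
    s_yf = sum((obs - y_bar) * (fit - f_bar) for obs, fit in zip(y, fitted))
    s_yy = sum((obs - y_bar) ** 2 for obs in y)
    s_ff = sum((fit - f_bar) ** 2 for fit in fitted)
    return s_yf ** 2 / (s_yy * s_ff)

def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns fitted values."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]

def fit_through_origin(x, y):
    """Least squares for y = b*x (no intercept); returns fitted values."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return [slope * xi for xi in x]

# Hypothetical data with a clearly nonzero intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 5.9, 7.2, 7.8, 9.1]

for label, fitted in [("with intercept", fit_with_intercept(x, y)),
                      ("through origin", fit_through_origin(x, y))]:
    print(label, r2_residual(y, fitted), r2_correlation(y, fitted))
```

With the intercept the two definitions agree up to rounding. For the line forced through the origin, the residual-based version turns negative on these data while the correlation-based version stays close to 1, so the number reported depends on which formula is used.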

9.

The estimation of R² and adjusted R² in incomplete data sets using multiple imputation

Ofer Harel 《Journal of applied statistics》2009,36(10):1109-1118

The coefficient of determination, known also as the R², is a common measure in regression analysis. Many scientists use the R² and the adjusted R² on a regular basis. In most cases, the researchers treat the coefficient of determination as an index of ‘usefulness’ or ‘goodness of fit,’ and in some cases, they even treat it as a model selection tool. In cases in which the data is incomplete, most researchers and common statistical software will use complete case analysis in order to estimate the R², a procedure that might lead to biased results. In this paper, I introduce the use of multiple imputation for the estimation of R² and adjusted R² in incomplete data sets. I illustrate my methodology using a biomedical example.

10.

Another Cautionary Note about R²: Its Use in Weighted Least-Squares Regression Analysis

John B. Willett Judith D. Singer 《The American statistician》2013,67(3):236-238

A recent article in this journal presented a variety of expressions for the coefficient of determination (R²) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the R² statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the R² statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.

11.

Limitations of P-Values and R-squared for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment

Sherri Rose Thomas G. McGuire 《The American statistician》2019,73(1):152-156

Stepwise regression building procedures are commonly used applied statistical tools, despite their well-known drawbacks. While many of their limitations have been widely discussed in the literature, other aspects of the use of individual statistical fit measures, especially in high-dimensional stepwise regression settings, have not. Giving primacy to individual fit, as is done with p-values and R², when group fit may be the larger concern, can lead to misguided decision making. One of the most consequential uses of stepwise regression is in health care, where these tools allocate hundreds of billions of dollars to health plans enrolling individuals with different predicted health care costs. The main goal of this “risk adjustment” system is to convey incentives to health plans such that they provide health care services fairly, a component of which is not to discriminate in access or care for persons or groups likely to be expensive. We address some specific limitations of p-values and R² for high-dimensional stepwise regression in this policy problem through an illustrated example by additionally considering a group-level fairness metric.

12.

Estimators of the multiple correlation coefficient: Local robustness and confidence intervals

Christophe Croux Catherine Dehon 《Statistical Papers》2003,44(3):315-334

Many robust regression estimators are defined by minimizing a measure of spread of the residuals. An accompanying R²-measure, or multiple correlation coefficient, is then easily obtained. In this paper, local robustness properties of these robust R²-coefficients are investigated. It is also shown how confidence intervals for the population multiple correlation coefficient can be constructed in the case of multivariate normality.

13.

Approximations of the Distributions of Test Statistics for Homogeneity of a Product Multinomial Model

Nobuhiro Taneichi Yuri Sekiya 《Communications in Statistics - Theory and Methods》2013,42(10):1610-1631

Statistics R^a based on power divergence can be used for testing the homogeneity of a product multinomial model. All R^a have the same chi-square limiting distribution under the null hypothesis of homogeneity. R⁰ is the log likelihood ratio statistic and R¹ is Pearson's X² statistic. In this article, we consider improvement of the approximation of the distribution of R^a under the homogeneity hypothesis. The expression of the asymptotic expansion of the distribution of R^a under the homogeneity hypothesis is investigated. The expression consists of continuous and discontinuous terms. Using the continuous term of the expression, a new approximation of the distribution of R^a is proposed. A moment-corrected type of chi-square approximation is also derived. By numerical comparison, we show that both of the approximations perform much better than the usual chi-square approximation for the statistics R^a when a ≤ 0, which include the log likelihood ratio statistic.

14.

Quantifying R² bias in the presence of measurement error

Karl D. Majeske Terri Lynch-Caris Janet Brelin-Fornari 《Journal of applied statistics》2010,37(4):667-677

15.

Frequency of Selecting Noise Variables in Subset Regression Analysis: A Simulation Study

Virginia F. Flack Potter C. Chang 《The American statistician》2013,67(1):84-86

This article presents the results of a simulation study of variable selection in a multiple regression context that evaluates the frequency of selecting noise variables and the bias of the adjusted R² of the selected variables when some of the candidate variables are authentic. It is demonstrated that for most samples a large percentage of the selected variables is noise, particularly when the number of candidate variables is large relative to the number of observations. The adjusted R² of the selected variables is highly inflated.

16.

Random sequential packing in Rⁿ

《Journal of Statistical Computation and Simulation》2012,82(2):87-93

We introduce a Monte Carlo method for packing hypercubes in Rⁿ. Rigorous and conceptually simple, it is currently practical for n≥4. Experimental results indicate that Palasti's conjecture is false for R² and K³ and still undecided for K⁴.

17.

Deux méthodes d'estimation pour les paramètres de processus moyenne mobile spatiaux

Luc D. Adjengue Marc Moore 《Revue canadienne de statistique》1999,27(4):795-818

We consider moving average processes, {X_s, s ∈ ??}, where ?? is a triangular lattice in the plane R². To estimate the parameters of such processes, Adjengue & Moore (1993) have considered likelihood and Gaussian pseudo-likelihood methods. We consider here two other methods. The first one is based on the estimation of the correlations and the relation between these correlations and the parameters of the process. The second relies on a linear approximation of the process. The asymptotic properties of the proposed estimators are analyzed and compared. A simulation study allows us to compare the estimators for fixed sample sizes.

18.

A Note on Screening Regression Equations

David A. Freedman 《The American statistician》2013,67(2):152-155

Consider developing a regression model in a context where substantive theory is weak. To focus on an extreme case, suppose that in fact there is no relationship between the dependent variable and the explanatory variables. Even so, if there are many explanatory variables, the R² will be high. If explanatory variables with small t statistics are dropped and the equation refitted, the R² will stay high and the overall F will become highly significant. This is demonstrated by simulation and by asymptotic calculation.
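
The first claim in this abstract, that pure noise yields a high R² when there are many explanatory variables, can be checked with a small simulation. The sketch below uses only the Python standard library; the function names, the seed, and the sizes n = 100 and p = 50 are illustrative assumptions rather than a reproduction of the article's study, and the second step (dropping variables with small t statistics and refitting) is not implemented.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def center(values):
    mu = sum(values) / len(values)
    return [v - mu for v in values]

def noise_r2(n, p, seed):
    """R^2 from regressing pure-noise y on p pure-noise predictors (n observations).

    Centering y and every predictor is equivalent to fitting an intercept.
    """
    rng = random.Random(seed)
    cols = [center([rng.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
    y = center([rng.gauss(0, 1) for _ in range(n)])
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(c, y)) for c in cols]
    beta = solve(xtx, xty)  # least-squares coefficients via the normal equations
    explained = sum(b * t for b, t in zip(beta, xty))  # regression sum of squares
    total = sum(t * t for t in y)                      # total sum of squares
    return explained / total

print(noise_r2(n=100, p=50, seed=0))  # many noise predictors
print(noise_r2(n=100, p=5, seed=0))   # few noise predictors
```

Under these assumptions (independent normal noise, intercept included), R² averages about p/(n − 1), so the first call should land near one half even though no predictor is related to y, while the second stays close to zero.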

19.

Limiting behavior of randomly weighted averages of symmetric heavy-tailed random variables

Rasool Roozegar 《Communications in Statistics - Theory and Methods》2017,46(9):4539-4544

In this paper we consider a sequence of independent continuous symmetric random variables X₁, X₂, …, with heavy-tailed distributions. Then we focus on limiting behavior of randomly weighted averages S_n = R⁽ⁿ⁾₁X₁ + ⋯ + R⁽ⁿ⁾_nX_n, where the random weights R⁽ⁿ⁾₁, …, R⁽ⁿ⁾_n, which are independent of X₁, X₂, …, X_n, are the cuts of (0, 1) by the n − 1 order statistics from a uniform distribution. Indeed we prove that c_nS_n converges in distribution to a symmetric α-stable random variable with c_n = n^{1 − 1/α}/Γ^{1/α}(α + 1).

20.

Catanova for multidimensional contingency tables: Nominal-scale response

Robert J. Anderson J. Richard Landis 《Communications in Statistics - Theory and Methods》2013,42(11):1191-1206

This paper extends an analysis of variance for categorical data (CATANOVA) procedure to multidimensional contingency tables involving several factors and a response variable measured on a nominal scale. Using an appropriate measure of total variation for multinomial data, partial and multiple association measures are developed as R² quantities which parallel the analogous statistics in multiple linear regression for quantitative data. In addition, test statistics are derived in terms of these R² criteria. Finally, this CATANOVA approach is illustrated within the context of a three-way contingency table from a multicenter clinical trial.