1.

R² Measures Based on Wald and Likelihood Ratio Joint Significance Tests

Lonnie Magee 《The American statistician》2013,67(3):250-253

Two methods are suggested for generating R² measures for a wide class of models. These measures are linked to the R² of the standard linear regression model through Wald and likelihood ratio statistics for testing the joint significance of the explanatory variables. Some currently used R²'s are shown to be special cases of these methods.
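
As a point of reference, the link to the standard linear regression model can be sketched as follows. This is a textbook derivation for Gaussian linear regression with an intercept and the maximum-likelihood variance estimate RSS/n; it is not a reproduction of the article's definitions. The symbols n, RSS, TSS, LR, and W are notation chosen here for illustration.

```latex
% Assumed notation: n observations, TSS = total sum of squares,
% RSS = residual sum of squares, R^2 = 1 - RSS/TSS,
% \ell_0 and \ell_1 = maximized log-likelihoods of the null and fitted models.
\begin{align*}
\mathrm{LR} &= 2(\ell_1 - \ell_0) = n \log\frac{\mathrm{TSS}}{\mathrm{RSS}} = -n \log(1 - R^2)
  &&\Longrightarrow\quad R^2 = 1 - \exp(-\mathrm{LR}/n), \\
W &= \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{RSS}/n} = \frac{n R^2}{1 - R^2}
  &&\Longrightarrow\quad R^2 = \frac{W}{W + n}.
\end{align*}
```

Inverting either identity gives a quantity that can be computed for any model with a joint significance test, which appears to be the kind of construction the abstract describes.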

2.

Variability explained by covariates in linear mixed‐effect models for longitudinal data

Bo Hu Jun Shao Mari Palta 《Revue canadienne de statistique》2010,38(3):352-368

Variability explained by covariates or explained variance is a well‐known concept in assessing the importance of covariates for dependent outcomes. In this paper we study R² statistics of explained variance pertinent to longitudinal data under linear mixed‐effect models, where the R² statistics are computed at two different levels to measure, respectively, within‐ and between‐subject variabilities explained by the covariates. By deriving the limits of R² statistics, we find that the interpretation of explained variance for the existing R² statistics is clear only in the case where the covariance matrix of the outcome vector is compound symmetric. Two new R² statistics are proposed to address the effect of time‐dependent covariate means. In the general case where the outcome covariance matrix is not compound symmetric, we introduce the concept of compound symmetry projection and use it to define level‐one and level‐two R² statistics. Numerical results are provided to support the theoretical findings and demonstrate the performance of the R² statistics. The Canadian Journal of Statistics 38: 352–368; 2010 © 2010 Statistical Society of Canada

3.

The Target Parameter of Adjusted R-Squared in Fixed-Design Experiments

Hillel Bar-Gera 《The American statistician》2017,71(2):112-119

R-squared (R²) and adjusted R-squared (R²_Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators for the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x’s are fixed. A new parameter is defined, in a similar fashion to the construction of R², but relying on the true parameters rather than their estimates. (The parameter definition includes also the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ²_*. The proposed ρ²_* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R²_Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R² overestimates ρ²_*, while the traditional R²_Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²_Adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²_Adj to ρ²_* is demonstrated. The effects of model parameters on the bias of R² and R²_Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
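
For readers who want the two sample statistics in concrete form, here is a minimal sketch of the unadjusted R² and the traditional adjusted R² for a simple linear regression. The function names and the toy data are illustrative assumptions; the parametric coefficient ρ²_* and the bi-adjusted estimator from the article are not reproduced here.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, fitted):
    """Unadjusted R² = 1 - RSS/TSS (model assumed to include an intercept)."""
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss


def adjusted_r_squared(r2, n, p):
    """Traditional adjusted R² for n observations and p explanatory
    variables (the intercept is not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    intercept, slope = fit_simple_ols(x, y)
    fitted = [intercept + slope * xi for xi in x]
    r2 = r_squared(y, fitted)
    print(r2, adjusted_r_squared(r2, n=len(y), p=1))
```

Because the adjustment factor (n − 1)/(n − p − 1) exceeds 1 whenever p ≥ 1, the adjusted value is never larger than the unadjusted one.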

4.

Pseudo‐R² statistics under complex sampling

Thomas Lumley 《Australian & New Zealand Journal of Statistics》2017,59(2):187-194

Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design‐consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design‐consistent, but are systematically larger than would be obtained with a cross‐sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
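
The usual (unweighted) Cox–Snell and Nagelkerke summaries mentioned above can be sketched from two maximized log-likelihoods. This is only the standard simple-random-sample form; the design-based estimators of the article are not reproduced, and the function names and arguments are assumptions made for illustration.

```python
import math


def cox_snell_r2(loglik_null, loglik_fitted, n):
    """Cox-Snell R²: 1 - (L_null / L_fitted)^(2/n), computed on the log scale."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_fitted) / n)


def nagelkerke_r2(loglik_null, loglik_fitted, n):
    """Nagelkerke R²: Cox-Snell R² divided by its maximum attainable value.

    Assumes a discrete outcome (for example logistic regression), so that
    log-likelihoods are at most 0 and the maximum is 1 - L_null^(2/n).
    """
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell_r2(loglik_null, loglik_fitted, n) / max_cox_snell


if __name__ == "__main__":
    # Hypothetical log-likelihoods, not taken from any data set.
    print(cox_snell_r2(-68.0, -52.5, 100))
    print(nagelkerke_r2(-68.0, -52.5, 100))
```

In a Gaussian model with identity link the Cox–Snell expression reduces to the ordinary R², which is the reduction the abstract refers to.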

5.

Goodness-of-fit measures of R² for repeated measures mixed effect models

Honghu Liu Yan Zheng Jie Shen 《Journal of applied statistics》2008,35(10):1081-1092

The linear mixed effects model (LMEM) is efficient in modeling repeated measures longitudinal data. However, little research has been done in developing goodness-of-fit measures that can evaluate the models, particularly those that can be interpreted in an absolute sense without referencing a null model. This paper proposes three coefficients of determination (R²) as goodness-of-fit measures for LMEM with repeated measures longitudinal data. Theorems are presented describing the properties of R² and relationships between the R² statistics. A simulation study was conducted to evaluate and compare the R² statistics along with other criteria from the literature. Finally, we applied the proposed R² statistics to real virologic response data from an HIV-patient cohort. We conclude that our proposed R² statistics have advantages over other goodness-of-fit measures in the literature in terms of robustness to sample size, intuitive interpretation, a well-defined range, and no need to determine a null model.

6.

Another Cautionary Note about R²: Its Use in Weighted Least-Squares Regression Analysis

John B. Willett Judith D. Singer 《The American statistician》2013,67(3):236-238

A recent article in this journal presented a variety of expressions for the coefficient of determination (R²) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the R² statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the R² statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.

7.

A Coefficient of Determination for Generalized Linear Models

Dabao Zhang 《The American statistician》2017,71(4):310-316

The coefficient of determination, a.k.a. R², is well-defined in linear regression models, and measures the proportion of variation in the dependent variable explained by the predictors included in the model. To extend it for generalized linear models, we use the variance function to define the total variation of the dependent variable, as well as the remaining variation of the dependent variable after modeling the predictive effects of the independent variables. Unlike other definitions that demand complete specification of the likelihood function, our definition of R² only needs to know the mean and variance functions, and so is applicable to more general quasi-models. It is consistent with the classical measure of uncertainty using variance, and reduces to the classical definition of the coefficient of determination when linear regression models are considered.

8.

The estimation of R² and adjusted R² in incomplete data sets using multiple imputation

Ofer Harel 《Journal of applied statistics》2009,36(10):1109-1118

The coefficient of determination, known also as the R², is a common measure in regression analysis. Many scientists use the R² and the adjusted R² on a regular basis. In most cases, the researchers treat the coefficient of determination as an index of ‘usefulness’ or ‘goodness of fit,’ and in some cases, they even treat it as a model selection tool. In cases in which the data is incomplete, most researchers and common statistical software will use complete case analysis in order to estimate the R², a procedure that might lead to biased results. In this paper, I introduce the use of multiple imputation for the estimation of R² and adjusted R² in incomplete data sets. I illustrate my methodology using a biomedical example.

9.

Estimators of the multiple correlation coefficient: Local robustness and confidence intervals

Cristophe Croux Catherine Dehon 《Statistical Papers》2003,44(3):315-334

Many robust regression estimators are defined by minimizing a measure of spread of the residuals. An accompanying R²-measure, or multiple correlation coefficient, is then easily obtained. In this paper, local robustness properties of these robust R²-coefficients are investigated. It is also shown how confidence intervals for the population multiple correlation coefficient can be constructed in the case of multivariate normality.

10.

Employee turnover forecasting for human resource management based on time series analysis

Xiaojuan Zhu William Seaver Rapinder Sawhney Bruce Holt Gurudatt Bhaskar Sanil 《Journal of applied statistics》2017,44(8):1421-1440

In some organizations, the hiring lead time is often long because of human resource requirements associated with technical and security constraints. Thus, the human resource departments in these organizations are keenly interested in forecasting employee turnover, since a good prediction could help the organizations minimize the costs and impacts of turnover on operational capabilities and the budget. This study aims to enhance the ability to forecast employee turnover with or without considering the impact of economic indicators. Various time series modelling techniques were used to identify optimal models for effective employee turnover prediction. More than 11 years of monthly turnover data were used to build and validate the proposed models. Compared with other models, a dynamic regression model with additive trend, seasonality, interventions, and a very important economic indicator effectively predicted the turnover, with training R² = 0.77 and holdout R² = 0.59. The forecasting performance of the optimal models confirms that the time series modelling approach is able to predict employee turnover for the specific scenario observed in our analysis.

11.

Approximations of the Distributions of Test Statistics for Homogeneity of a Product Multinomial Model

Nobuhiro Taneichi Yuri Sekiya 《Communications in Statistics - Theory and Methods》2013,42(10):1610-1631

Statistics R^a based on power divergence can be used for testing the homogeneity of a product multinomial model. All R^a have the same chi-square limiting distribution under the null hypothesis of homogeneity. R⁰ is the log likelihood ratio statistic and R¹ is Pearson's X² statistic. In this article, we consider improving the approximation of the distribution of R^a under the homogeneity hypothesis. The asymptotic expansion of the distribution of R^a under the homogeneity hypothesis is investigated. The expression consists of continuous and discontinuous terms. Using the continuous term of the expression, a new approximation of the distribution of R^a is proposed. A moment-corrected type of chi-square approximation is also derived. By numerical comparison, we show that both approximations perform much better than the usual chi-square approximation for the statistics R^a when a ≤ 0, which include the log likelihood ratio statistic.
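
The family R^a can be made concrete with a short sketch. It assumes the usual Cressie–Read form of the power-divergence statistic, R^a = 2/(a(a+1)) Σ O[(O/E)^a − 1], applied to an r × c table with expected counts fitted under homogeneity. That form is consistent with the abstract (a = 0 gives the log likelihood ratio statistic, a = 1 gives Pearson's X²), but the exact parameterisation used in the article is an assumption here, and the improved approximations it derives are not shown.

```python
import math


def expected_under_homogeneity(table):
    """Expected counts row_total * column_total / grand_total for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def power_divergence(table, a):
    """Power-divergence statistic R^a (assumed Cressie-Read form).

    a = 0 is taken as the limiting case 2 * sum O * log(O / E).
    Values a <= -1 are not handled in this sketch.
    """
    if a <= -1:
        raise ValueError("this sketch only handles a > -1")
    expected = expected_under_homogeneity(table)
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if o == 0:
                continue  # an empty cell contributes 0 for every a > -1
            if a == 0:
                stat += 2.0 * o * math.log(o / e)
            else:
                stat += 2.0 / (a * (a + 1.0)) * o * ((o / e) ** a - 1.0)
    return stat


if __name__ == "__main__":
    # Hypothetical 2 x 3 table of counts: two populations, three categories.
    counts = [[12, 18, 20], [25, 15, 10]]
    for a in (-0.5, 0, 2.0 / 3.0, 1):
        print(a, power_divergence(counts, a))
```

Under the null hypothesis each of these values would be referred to the same chi-square distribution with (r − 1)(c − 1) degrees of freedom, which is the baseline approximation the article sets out to improve.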

12.

The large-sample performance of backwards variable elimination

Peter C. Austin 《Journal of applied statistics》2008,35(12):1355-1370

Prior studies have shown that automated variable selection results in models with substantially inflated estimates of the model R², and that a large proportion of selected variables are truly noise variables. These earlier studies used simulated data sets whose sample sizes were at most 100. We used Monte Carlo simulations to examine the large-sample performance of backwards variable elimination. We found that in large samples, backwards variable elimination resulted in estimates of R² that were at most marginally biased. However, even in large samples, backwards elimination tended to identify the correct regression model in a minority of the simulated data sets.

13.

THE MULTIPLE CORRELATION COEFFICIENT AND FISHER'S A STATISTIC

W. N. Venable 《Australian & New Zealand Journal of Statistics》1985,27(2):172-182

Fisher's A statistic, often called the adjusted R² statistic, is shown to be a close approximation to the maximum likelihood estimate of the multiple correlation coefficient, ρ², based on the marginal distribution of R². Expansions for the estimate are obtained. The same methods lead to maximum marginal likelihood estimators for the noncentrality parameters for noncentral χ² and F.

14.

Fitting multiplicative models by robust alternating regressions (cited by: 1; self-citations: 0; citations by others: 1)

C. Croux P. Filzmoser G. Pison P. J. Rousseeuw 《Statistics and Computing》2003,13(1):23-36

In this paper a robust approach for fitting multiplicative models is presented. Focus is on the factor analysis model, where we will estimate factor loadings and scores by a robust alternating regression algorithm. The approach is highly robust, and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not predetermined by outliers, which can be retrieved from the residual plot. Also provided is an accompanying robust R²-plot to determine the appropriate number of factors. The approach is illustrated by real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.

15.

Frequency of Selecting Noise Variables in Subset Regression Analysis: A Simulation Study

Virginia F. Flack Potter C. Chang 《The American statistician》2013,67(1):84-86

This article presents the results of a simulation study of variable selection in a multiple regression context that evaluates the frequency of selecting noise variables and the bias of the adjusted R² of the selected variables when some of the candidate variables are authentic. It is demonstrated that for most samples a large percentage of the selected variables is noise, particularly when the number of candidate variables is large relative to the number of observations. The adjusted R² of the selected variables is highly inflated.

16.

D-minimax Second-order Designs Over Hypercubes for Extrapolation and Restricted Interpolation Regions

S. Huda R. M’Hallah 《Communications in Statistics - Theory and Methods》2013,42(21):4600-4613

The D-minimax criterion for estimating slopes of a response surface involving k factors is considered for situations where the experimental region χ and the region of interest are co-centered cubes but not necessarily identical. Taking χ = [−1, 1]^k and the region of interest to be [−R, R]^k, optimal designs under the criterion for the full second-order model are derived for various values of R and their relative performances investigated. The asymptotically optimal design as R → ∞ is also derived and investigated. In addition, the optimal designs within the class of product designs are obtained. In the asymptotic case it is found that the optimal product design is given by a solution of a cubic equation that reduces to a quadratic equation for k = 3 and 6. Relative performances of various designs obtained are examined. In particular, the optimal asymptotic product design and the traditional D-optimal design are compared and it is found that the former performs very well.

17.

A Study of Some Different Concepts of Symmetry on the Real Line

Wen-Jang Huang Hui-Yi Teng 《Communications in Statistics - Theory and Methods》2013,42(6):1042-1055

Recently, different concepts of symmetry on R⁺, such as R-symmetry, log-symmetry, and double symmetry, have been studied. Analogous concepts on R and their properties are studied in this work. Based on the skewing representation and previous studies, characterizations of double symmetry on R are given. Some interesting examples of the so-called I-symmetry, that is, the analogue of log-symmetry on R, are also presented.

18.

Geometric ergodicity of nonlinear autoregressive models with changing conditional variances

Min Chen Gemai Chen 《Revue canadienne de statistique》2000,28(3):605-614

The authors give easy‐to‐check sufficient conditions for the geometric ergodicity and the finiteness of the moments of a random process x_t = φ(x_{t−1}, …, x_{t−p}) + ε_t σ(x_{t−1}, …, x_{t−q}), in which φ: R^p → R, σ: R^q → R and (ε_t) is a sequence of independent and identically distributed random variables. They deduce strong mixing properties for this class of nonlinear autoregressive models with changing conditional variances which includes, among others, the ARCH(p), the AR(p)‐ARCH(p), and the double‐threshold autoregressive models.

19.

R² Bounds for Predictive Models: What Univariate Properties Tell us About Multivariate Predictability

James Mitchell Donald Robertson Stephen Wright 《Journal of Business & Economic Statistics》2013,31(4):681-695

A long-standing puzzle in macroeconomic forecasting has been that a wide variety of multivariate models have struggled to out-predict univariate models consistently. We seek an explanation for this puzzle in terms of population properties. We derive bounds for the predictive R² of the true, but unknown, multivariate model from univariate ARMA parameters alone. These bounds can be quite tight, implying little forecasting gain even if we knew the true multivariate model. We illustrate using CPI inflation data. Supplementary materials for this article are available online.

20.

Rényi statistics for testing composite hypotheses in general exponential models

D. Morales L. Pardo M. C. Pardo I. Vajda 《Statistics》2013,47(2):133-147

We introduce a family of Rényi statistics of orders r ∈ R for testing composite hypotheses in general exponential models, as alternatives to the previously considered generalized likelihood ratio (GLR) statistic and generalized Wald statistic. If appropriately normalized exponential models converge in a specific sense when the sample size (observation window) tends to infinity, and if the hypothesis is regular, then these statistics are shown to be χ²-distributed under the hypothesis. The corresponding Rényi tests are shown to be consistent. The exact sizes and powers of asymptotically α-size Rényi, GLR and generalized Wald tests are evaluated for a concrete hypothesis about a bivariate Lévy process and moderate observation windows. In this concrete situation the exact sizes of the Rényi test of the order r = 2 practically coincide with those of the GLR and generalized Wald tests but the exact powers of the Rényi test are on average somewhat better.