Similar Documents
20 similar documents found (search time: 31 ms)
1.
Two methods are suggested for generating R² measures for a wide class of models. These measures are linked to the R² of the standard linear regression model through the Wald and likelihood ratio statistics for testing the joint significance of the explanatory variables. Some R² measures currently in use are shown to be special cases of these methods.
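
For the Gaussian linear model this link can be checked numerically. The sketch below (simple regression on our own illustrative data) computes the classical R² together with 1 − exp(−LR/n), where LR = n log(RSS₀/RSS₁) is the likelihood ratio statistic, and W/(W + n), where W = n(RSS₀ − RSS₁)/RSS₁ is the Wald statistic built on the ML variance estimate; all three coincide. It illustrates only this identity and is not claimed to be the paper's construction for other model classes.

```python
import math


def ols_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r2_three_ways(x, y):
    """Classical R², LR-based 1 - exp(-LR/n), and Wald-based W / (W + n)."""
    n = len(y)
    b0, b1 = ols_line(x, y)
    my = sum(y) / n
    rss1 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # fitted model
    rss0 = sum((yi - my) ** 2 for yi in y)  # intercept-only model
    r2 = 1 - rss1 / rss0
    lr = n * math.log(rss0 / rss1)  # Gaussian likelihood ratio statistic
    wald = n * (rss0 - rss1) / rss1  # Wald statistic with ML variance RSS1 / n
    return r2, 1 - math.exp(-lr / n), wald / (wald + n)


print(r2_three_ways([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```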

2.
Variability explained by covariates, or explained variance, is a well-known concept in assessing the importance of covariates for dependent outcomes. In this paper we study R² statistics of explained variance for longitudinal data under linear mixed-effects models, where the R² statistics are computed at two levels to measure within- and between-subject variability explained by the covariates, respectively. By deriving the limits of the R² statistics, we find that the explained-variance interpretation of the existing R² statistics is clear only when the covariance matrix of the outcome vector is compound symmetric. Two new R² statistics are proposed to address the effect of time-dependent covariate means. In the general case where the outcome covariance matrix is not compound symmetric, we introduce the concept of compound symmetry projection and use it to define level-one and level-two R² statistics. Numerical results are provided to support the theoretical findings and demonstrate the performance of the R² statistics. The Canadian Journal of Statistics 38: 352–368; 2010 © 2010 Statistical Society of Canada
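
The abstract does not define "compound symmetry projection", so the following is only an assumed reading: the Frobenius-norm projection of a covariance matrix onto matrices with one common variance on the diagonal and one common covariance off it. Because the identity and the off-diagonal all-ones pattern have disjoint supports (and so are orthogonal in that inner product), the projection simply averages the diagonal entries and the off-diagonal entries. The paper's definition may differ.

```python
def cs_projection(S):
    """Frobenius-norm projection of a square matrix (size >= 2) onto
    compound-symmetric matrices: common diagonal, common off-diagonal.

    Assumption: this is how "compound symmetry projection" is read here.
    """
    n = len(S)
    diag = sum(S[i][i] for i in range(n)) / n
    off = sum(S[i][j] for i in range(n) for j in range(n) if i != j) / (n * n - n)
    return [[diag if i == j else off for j in range(n)] for i in range(n)]


# AR(1)-type covariance 0.5**|i - j|, which is not compound symmetric.
ar1 = [[0.5 ** abs(i - j) for j in range(3)] for i in range(3)]
proj = cs_projection(ar1)
print(proj)
```

A matrix that is already compound symmetric is left unchanged (up to rounding), as a projection should.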

3.
R-squared (R²) and adjusted R-squared (R²_adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators of the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x's are fixed. A new parameter is defined in a similar fashion to R², but using the true parameters rather than their estimates. (The parameter definition also includes the fixed x values.) This parameter is referred to as the "parametric" coefficient of determination and denoted by ρ²*. The proposed ρ²* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which never decreases when variables, relevant or not, are added to the model (and never increases when they are removed). The traditional R²_adj may go up or down when variables are added (or removed), whether or not they are relevant. It is shown that the unadjusted R² overestimates ρ²*, while the traditional R²_adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²_adj can be as large as that of the unadjusted R² (with opposite sign). Asymptotic convergence in probability of R²_adj to ρ²* is demonstrated. The effects of model parameters on the bias of R² and R²_adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.

4.
Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design-consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design-consistent, but are systematically larger than would be obtained with a cross-sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward, and code is provided in R.
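
The Cox–Snell summary is 1 − exp{−2(ℓ₁ − ℓ₀)/n}, where ℓ₁ and ℓ₀ are the fitted and null log-likelihoods, and Nagelkerke's version divides it by its maximum attainable value 1 − exp(2ℓ₀/n). The Python sketch below computes both for a binary outcome from fitted probabilities. The optional `weights` argument, which weights the log-likelihoods and replaces n by the sum of the weights, is our assumption of what a design-weighted analogue looks like, not the note's published estimator (which the abstract says is provided in R).

```python
import math


def bernoulli_loglik(y, p, w):
    """Weighted Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
               for yi, pi, wi in zip(y, p, w))


def cox_snell_nagelkerke(y, p_hat, weights=None):
    """Cox–Snell and Nagelkerke R² for a binary-outcome model.

    weights=None gives the usual unweighted version. Passing sampling weights
    (their sum replaces n) is an assumed design-weighted analogue.
    """
    w = weights if weights is not None else [1.0] * len(y)
    n_hat = sum(w)
    p_bar = sum(wi * yi for wi, yi in zip(w, y)) / n_hat  # null-model probability
    ll_null = bernoulli_loglik(y, [p_bar] * len(y), w)
    ll_fit = bernoulli_loglik(y, p_hat, w)
    cox_snell = 1 - math.exp(-2 * (ll_fit - ll_null) / n_hat)
    max_cox_snell = 1 - math.exp(2 * ll_null / n_hat)  # upper bound for binary y
    return cox_snell, cox_snell / max_cox_snell


y = [0, 0, 1, 1, 1, 0]
p_hat = [0.2, 0.3, 0.8, 0.7, 0.9, 0.4]  # illustrative fitted probabilities
print(cox_snell_nagelkerke(y, p_hat))
```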

5.
The linear mixed-effects model (LMEM) is an efficient tool for modeling repeated-measures longitudinal data. However, little research has been done on goodness-of-fit measures for evaluating these models, particularly measures that can be interpreted in an absolute sense without reference to a null model. This paper proposes three coefficients of determination (R²) as goodness-of-fit measures for LMEMs fitted to repeated-measures longitudinal data. Theorems are presented describing the properties of the R² statistics and the relationships among them. A simulation study was conducted to evaluate the proposed R² statistics and compare them with other criteria from the literature. Finally, we applied the proposed R² statistics to real virologic response data from a cohort of HIV patients. We conclude that the proposed R² statistics have advantages over other goodness-of-fit measures in the literature in terms of robustness to sample size, intuitive interpretation, a well-defined range, and no need to specify a null model.

6.
A recent article in this journal presented a variety of expressions for the coefficient of determination (R²) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the R² statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the R² statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.
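
The abstract does not spell out the proposed modification, so the sketch below only illustrates the underlying issue under our own assumptions (illustrative data, weights 1/x²): the R² a weighted fit reports in its weighted metric need not describe how well the same line fits y in its original metric. Since ordinary least squares minimizes the unweighted residual sum of squares, the WLS line can never beat the OLS line on the unweighted R².

```python
def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for y = b0 + b1 * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r2_for_line(x, y, w, b0, b1):
    """1 - weighted RSS / weighted TSS about the weighted mean; w = ones gives plain R²."""
    sw = sum(w)
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    rss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    tss = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return 1 - rss / tss


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.4, 3.7, 6.1, 5.2, 8.9, 6.8]  # illustrative data, spread grows with x
w = [1 / xi ** 2 for xi in x]  # assumed weights: variance proportional to x²
ones = [1.0] * len(x)

wls = weighted_line(x, y, w)
ols = weighted_line(x, y, ones)
print("WLS fit, weighted metric:", r2_for_line(x, y, w, *wls))
print("WLS fit, original metric:", r2_for_line(x, y, ones, *wls))
print("OLS fit, original metric:", r2_for_line(x, y, ones, *ols))
```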

7.
The coefficient of determination, a.k.a. R², is well defined in linear regression models and measures the proportion of variation in the dependent variable explained by the predictors included in the model. To extend it to generalized linear models, we use the variance function to define the total variation of the dependent variable, as well as the variation that remains after modeling the predictive effects of the independent variables. Unlike other definitions that require complete specification of the likelihood function, our definition of R² needs only the mean and variance functions, so it applies to more general quasi-models. It is consistent with the classical measure of uncertainty based on variance, and it reduces to the classical coefficient of determination for linear regression models.
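
The abstract does not give the formula, so the following is only one natural construction, assumed here, that uses nothing but a mean and a variance function V(μ): compare the Pearson-type variation Σ(y_i − μ̂_i)²/V(μ̂_i) left after fitting with the variation Σ(y_i − ȳ)²/V(ȳ) around the common mean. With V ≡ 1 it reduces to the classical 1 − RSS/TSS, matching the property the abstract describes.

```python
def r2_variance_function(y, mu_hat, variance):
    """Assumed variance-function analogue of R² (Pearson-type sums of squares).

    variance(mu) is the model's variance function, e.g. 1 for Gaussian,
    mu for Poisson, mu * (1 - mu) for Bernoulli.
    """
    n = len(y)
    y_bar = sum(y) / n
    total = sum((yi - y_bar) ** 2 / variance(y_bar) for yi in y)
    remaining = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu_hat))
    return 1 - remaining / total


# Gaussian case: a constant variance function recovers the classical R².
print(r2_variance_function([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], lambda m: 1.0))
# Poisson case: variance function V(mu) = mu, illustrative fitted means.
print(r2_variance_function([1, 3, 2, 5, 7], [1.5, 2.4, 2.9, 4.8, 6.4], lambda m: m))
```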

8.
The coefficient of determination, also known as R², is a common measure in regression analysis. Many scientists use R² and adjusted R² on a regular basis. In most cases, researchers treat the coefficient of determination as an index of 'usefulness' or 'goodness of fit,' and in some cases they even treat it as a model selection tool. When data are incomplete, most researchers and common statistical software use complete-case analysis to estimate R², a procedure that can lead to biased results. In this paper, I introduce the use of multiple imputation for estimating R² and adjusted R² in incomplete data sets, and I illustrate the methodology with a biomedical example.
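
After fitting the same model to each imputed data set, the m R² values must be combined. A commonly described rule, assumed here (the paper's procedure may differ in detail), averages them on the Fisher z scale, z = atanh(√R²), where the sampling distribution is closer to normal, and then back-transforms, rather than averaging R² directly.

```python
import math


def pool_r2(r2_values):
    """Pool R² estimates from m imputed data sets on the Fisher z scale.

    Assumed rule: z_k = atanh(sqrt(R²_k)), average the z_k, then
    back-transform with tanh and square. Requires 0 <= R²_k < 1.
    """
    z = [math.atanh(math.sqrt(r2)) for r2 in r2_values]
    return math.tanh(sum(z) / len(z)) ** 2


# R² of the same model on each of five imputed data sets (illustrative values).
print(pool_r2([0.30, 0.35, 0.42, 0.38, 0.33]))
```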

9.
Many robust regression estimators are defined by minimizing a measure of spread of the residuals. An accompanying R² measure, or multiple correlation coefficient, is then easily obtained. In this paper, the local robustness properties of these robust R² coefficients are investigated. It is also shown how confidence intervals for the population multiple correlation coefficient can be constructed under multivariate normality.
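
The general recipe can be sketched as 1 − (spread of residuals / spread of y)², mirroring 1 − RSS/TSS with a robust scale in place of the standard deviation. In the sketch below, both the median absolute deviation (MAD) as the spread and the Theil–Sen line as the robust fit are our own choices for illustration; Theil–Sen is not itself a spread-minimizing estimator of the kind the paper studies, and nothing here reproduces the paper's robustness analysis.

```python
from itertools import combinations
from statistics import median


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def theil_sen(x, y):
    """Theil–Sen line: median pairwise slope, then median intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    b1 = median(slopes)
    b0 = median([yi - b1 * xi for xi, yi in zip(x, y)])
    return b0, b1


def robust_r2(x, y):
    """1 - (MAD of residuals / MAD of y)**2 for the Theil–Sen fit."""
    b0, b1 = theil_sen(x, y)
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return 1 - (mad(residuals) / mad(y)) ** 2


x = list(range(1, 11))
y = [2 * xi + 1 for xi in x]
y[-1] = 100  # one gross outlier
print(robust_r2(x, y))
```

Because nine of the ten points lie exactly on a line, the robust fit ignores the outlier and the robust R² stays at its maximum, whereas the ordinary R² would be pulled down by that single point.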

10.
In some organizations, the hiring lead time is often long because human resource requirements are subject to technical and security constraints. The human resource departments in these organizations are therefore very interested in forecasting employee turnover, since a good prediction of turnover could help the organizations minimize its costs and its impact on operational capabilities and the budget. This study aims to improve the ability to forecast employee turnover, with or without considering the impact of economic indicators. Various time series modelling techniques were used to identify optimal models for employee turnover prediction. More than 11 years of monthly turnover data were used to build and validate the proposed models. Compared with the other models, a dynamic regression model with additive trend, seasonality, interventions, and a very important economic indicator predicted turnover effectively, with a training R² = 0.77 and a holdout R² = 0.59. The forecasting performance of the optimal models confirms that a time series modelling approach can predict employee turnover in the specific scenario observed in our analysis.

11.
Statistics R_a based on power divergence can be used to test the homogeneity of a product multinomial model. All R_a have the same limiting chi-square distribution under the null hypothesis of homogeneity. R_0 is the log-likelihood ratio statistic and R_1 is Pearson's X² statistic. In this article, we consider improving the approximation to the distribution of R_a under the homogeneity hypothesis. The asymptotic expansion of the distribution of R_a under this hypothesis is investigated; it consists of continuous and discontinuous terms. Using the continuous term, a new approximation to the distribution of R_a is proposed. A moment-corrected chi-square approximation is also derived. Numerical comparisons show that both approximations perform much better than the usual chi-square approximation for R_a when a ≤ 0, a range that includes the log-likelihood ratio statistic.
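
Assuming R_a follows the familiar Cressie–Read parametrization, R_a = 2/(a(a + 1)) Σ O[(O/E)^a − 1] over all cells with observed counts O and expected counts E (consistent with R_0 being the log-likelihood ratio and R_1 Pearson's X²), the statistic for a homogeneity table can be sketched as below, with a = 0 and a = −1 handled as limits. This only computes the statistic; the improved approximations to its null distribution are the paper's contribution and are not reproduced.

```python
import math


def power_divergence(table, a):
    """Power-divergence statistic R_a for homogeneity of the rows of a table.

    Each row is an independent multinomial sample; expected counts are
    row_total * column_total / grand_total. Assumes all observed counts
    are positive when a < 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if a == 0:  # limit a -> 0: log-likelihood ratio statistic G²
                if obs > 0:
                    stat += 2 * obs * math.log(obs / expected)
            elif a == -1:  # limit a -> -1: modified log-likelihood ratio
                stat += 2 * expected * math.log(expected / obs)
            else:
                stat += 2 / (a * (a + 1)) * obs * ((obs / expected) ** a - 1)
    return stat


table = [[10, 20, 30], [20, 20, 20]]  # illustrative 2 x 3 table
for a in (-1, -0.5, 0, 2 / 3, 1):
    print(a, power_divergence(table, a))
```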

12.
Prior studies have shown that automated variable selection results in models with substantially inflated estimates of the model R², and that a large proportion of the selected variables are truly noise variables. These earlier studies used simulated data sets with sample sizes of at most 100. We used Monte Carlo simulations to examine the large-sample performance of backward variable elimination. We found that in large samples, backward variable elimination resulted in estimates of R² that were at most marginally biased. However, even in large samples, backward elimination identified the correct regression model in only a minority of the simulated data sets.

13.
Fisher's A statistic, often called the adjusted R² statistic, is shown to be a close approximation to the maximum likelihood estimate of the squared multiple correlation coefficient ρ², based on the marginal distribution of R². Expansions for the estimate are obtained. The same methods lead to maximum marginal likelihood estimators of the noncentrality parameters of the noncentral χ² and F distributions.
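
Fisher's A, the usual adjusted R², rescales the residual and total sums of squares by their degrees of freedom, giving 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors plus an intercept. A minimal sketch:

```python
def adjusted_r2(r2, n, p):
    """Fisher's adjusted R² for n observations and p predictors plus an intercept."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(adjusted_r2(0.50, n=21, p=5))  # shrinks R² toward 0
print(adjusted_r2(0.05, n=20, p=3))  # can fall below 0 when R² is small
```

Unlike R² itself, the adjusted value is not confined to [0, 1]; a negative value simply signals that the predictors explain less than expected by chance for that many degrees of freedom.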

14.
Fitting multiplicative models by robust alternating regressions
In this paper a robust approach for fitting multiplicative models is presented. The focus is on the factor analysis model, where factor loadings and scores are estimated by a robust alternating regression algorithm. The approach is highly robust and also works well when there are more variables than observations. The technique yields a robust biplot depicting the interaction structure between individuals and variables. This biplot is not dominated by outliers, which can instead be identified from the residual plot. An accompanying robust R² plot is also provided to determine the appropriate number of factors. The approach is illustrated with real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.
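
The alternating idea can be illustrated in its simplest form, a rank-one model x_ij ≈ a_i b_j: with b fixed, each a_i is an L1 (least absolute deviation) regression through the origin of row i on b, whose solution is the |b_j|-weighted median of x_ij / b_j, and vice versa for b. The sketch below is a bare-bones version under our own assumptions (unweighted L1 fits, column medians as starting values, a fixed number of iterations); the paper's algorithm, with several factors and additional weighting, is not reproduced.

```python
from statistics import median


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value
    return pairs[-1][0]


def l1_slope_through_origin(y, x):
    """argmin_c sum_j |y_j - c * x_j|: the |x_j|-weighted median of y_j / x_j."""
    idx = [j for j in range(len(x)) if x[j] != 0]
    return weighted_median([y[j] / x[j] for j in idx], [abs(x[j]) for j in idx])


def rank1_alternating_l1(X, iterations=20):
    """Fit X[i][j] ~ a[i] * b[j] by alternating least-absolute-deviation regressions."""
    n_rows, n_cols = len(X), len(X[0])
    b = [median([X[i][j] for i in range(n_rows)]) for j in range(n_cols)]  # robust start
    for _ in range(iterations):
        a = [l1_slope_through_origin(X[i], b) for i in range(n_rows)]
        b = [l1_slope_through_origin([X[i][j] for i in range(n_rows)], a)
             for j in range(n_cols)]
    return a, b


# Exact rank-one table a_i * b_j with one gross outlier in cell (0, 0).
X = [[ai * bj for bj in [1, 2, 3, 4]] for ai in [1, 2, 3, 4, 5]]
X[0][0] = 100
a, b = rank1_alternating_l1(X)
residuals = [[X[i][j] - a[i] * b[j] for j in range(4)] for i in range(5)]
print(residuals[0][0])  # the outlier stands out in the residuals
```

The factors are identified only up to a scale (a_i c, b_j / c), so it is the fitted products and residuals, not a and b individually, that should be interpreted.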

15.
This article presents the results of a simulation study of variable selection in multiple regression that evaluates how often noise variables are selected and the bias of the adjusted R² of the selected variables when some of the candidate variables are authentic. For most samples, a large percentage of the selected variables are noise, particularly when the number of candidate variables is large relative to the number of observations. The adjusted R² of the selected variables is highly inflated.
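
The inflation can be reproduced with a stripped-down simulation under our own simplifying assumptions: every candidate is pure noise, so the true R² of each is 0, and "selection" means keeping the single candidate with the largest sample R². The average adjusted R² of the selected predictor is then clearly positive even though nothing is related to y.

```python
import random


def r2_one_predictor(x, y):
    """R² of the least-squares line of y on a single predictor x (squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


def mean_selected_adjusted_r2(n=20, k=50, reps=200, seed=1):
    """Average adjusted R² of the best single predictor chosen from k pure-noise candidates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(r2_one_predictor([rng.gauss(0.0, 1.0) for _ in range(n)], y)
                   for _ in range(k))
        total += 1 - (1 - best) * (n - 1) / (n - 2)  # adjusted R² with one predictor
    return total / reps


print(mean_selected_adjusted_r2())  # population R² is 0 for every candidate
```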

16.
The D-minimax criterion for estimating slopes of a response surface involving k factors is considered for situations where the experimental region χ and the region of interest are co-centered cubes but not necessarily identical. Taking $\chi = [-1, 1]^k$ and the region of interest to be $[-R, R]^k$, optimal designs under the criterion for the full second-order model are derived for various values of R, and their relative performances are investigated. The asymptotically optimal design as $R \to \infty$ is also derived and investigated. In addition, the optimal designs within the class of product designs are obtained. In the asymptotic case, the optimal product design is found to be given by a solution of a cubic equation that reduces to a quadratic equation for k = 3 and 6. The relative performances of the various designs obtained are examined. In particular, the optimal asymptotic product design is compared with the traditional D-optimal design, and the former is found to perform very well.

17.
Several concepts of symmetry on ℝ⁺, such as R-symmetry, log-symmetry, and double symmetry, have recently been studied. This work studies analogous concepts on ℝ and their properties. Based on a skewing representation and previous studies, characterizations of double symmetry on ℝ are given. Among other results, some interesting examples of so-called I-symmetry, the analogue of log-symmetry on ℝ, are also presented.

18.
The authors give easy-to-check sufficient conditions for the geometric ergodicity and the finiteness of the moments of a random process $x_t = f(x_{t-1}, \ldots, x_{t-p}) + \varepsilon_t \, \sigma(x_{t-1}, \ldots, x_{t-q})$, in which $f: \mathbb{R}^p \to \mathbb{R}$, $\sigma: \mathbb{R}^q \to \mathbb{R}$, and $(\varepsilon_t)$ is a sequence of independent and identically distributed random variables. They deduce strong mixing properties for this class of nonlinear autoregressive models with changing conditional variances, which includes, among others, the ARCH(p), the AR(p)-ARCH(p), and the double-threshold autoregressive models.
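
As an illustration, the sketch below simulates one member of this class of models x_t = f(lagged x) + ε_t σ(lagged x): an AR(1)-ARCH(1) model with f(x) = φx, σ(x) = √(ω + αx²), and, as an assumption, standard normal ε_t. The parameter values are arbitrary. Under stationarity, squaring the recursion and taking expectations gives E x_t² = (φ² + α) E x_t² + ω, so the stationary variance is ω/(1 − φ² − α) when φ² + α < 1, and the sample second moment should land near that value. This is a numerical check only, not the paper's ergodicity conditions.

```python
import math
import random


def simulate_ar1_arch1(phi, omega, alpha, n, burn_in=1000, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t * sqrt(omega + alpha * x_{t-1}**2).

    eps_t are i.i.d. standard normal (an assumption; the model class allows
    other i.i.d. innovations). The first burn_in values are discarded.
    """
    rng = random.Random(seed)
    x = 0.0
    path = []
    for t in range(burn_in + n):
        # Both terms on the right-hand side use the previous value x_{t-1}.
        x = phi * x + rng.gauss(0.0, 1.0) * math.sqrt(omega + alpha * x * x)
        if t >= burn_in:
            path.append(x)
    return path


phi, omega, alpha = 0.5, 1.0, 0.3  # illustrative values with phi**2 + alpha < 1
path = simulate_ar1_arch1(phi, omega, alpha, n=100_000)
sample_var = sum(v * v for v in path) / len(path)  # second moment about 0 (mean is 0)
print(sample_var, omega / (1 - phi ** 2 - alpha))
```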

19.

A long-standing puzzle in macroeconomic forecasting has been that a wide variety of multivariate models have struggled to out-predict univariate models consistently. We seek an explanation for this puzzle in terms of population properties. We derive bounds for the predictive R² of the true, but unknown, multivariate model from univariate ARMA parameters alone. These bounds can be quite tight, implying little forecasting gain even if the true multivariate model were known. We illustrate using CPI inflation data. Supplementary materials for this article are available online.
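
The bounds themselves are not given in the abstract, but their basic ingredient, the population predictive R² of a univariate ARMA model, is easy to compute. For a stationary, invertible ARMA(1,1) the optimal one-step forecast error is the innovation, so the predictive R² is 1 − σ²/Var(y). The sketch below (our own example, not the paper's derivation) evaluates it from the ARMA parameters alone.

```python
def arma11_predictive_r2(phi, theta):
    """One-step predictive R² of a stationary, invertible ARMA(1,1) process
    y_t = phi * y_{t-1} + e_t + theta * e_{t-1}.

    The optimal one-step forecast error is e_t, so R² = 1 - sigma² / Var(y),
    with Var(y) = sigma² * (1 + 2*phi*theta + theta**2) / (1 - phi**2).
    """
    if abs(phi) >= 1 or abs(theta) >= 1:
        raise ValueError("need |phi| < 1 (stationary) and |theta| < 1 (invertible)")
    variance_ratio = (1 + 2 * phi * theta + theta ** 2) / (1 - phi ** 2)
    return 1 - 1 / variance_ratio


print(arma11_predictive_r2(0.6, 0.0))   # pure AR(1): phi**2 in theory
print(arma11_predictive_r2(0.0, 0.5))   # pure MA(1): theta**2 / (1 + theta**2) in theory
print(arma11_predictive_r2(0.8, -0.5))
```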

20.
We introduce a family of Rényi statistics of orders r ∈ ℝ for testing composite hypotheses in general exponential models, as alternatives to the previously considered generalized likelihood ratio (GLR) statistic and generalized Wald statistic. If appropriately normalized exponential models converge in a specific sense as the sample size (observation window) tends to infinity, and if the hypothesis is regular, then these statistics are shown to be asymptotically χ²-distributed under the hypothesis. The corresponding Rényi tests are shown to be consistent. The exact sizes and powers of asymptotically α-size Rényi, GLR, and generalized Wald tests are evaluated for a concrete hypothesis about a bivariate Lévy process and moderate observation windows. In this situation, the exact sizes of the Rényi test of order r = 2 practically coincide with those of the GLR and generalized Wald tests, but the exact powers of the Rényi test are on average somewhat better.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417