Similar literature (20 articles found)
1.
The coefficient of determination, also known as $R^2$, is a common measure in regression analysis. Many scientists use $R^2$ and the adjusted $R^2$ on a regular basis. In most cases, researchers treat the coefficient of determination as an index of 'usefulness' or 'goodness of fit,' and in some cases, they even treat it as a model selection tool. When the data are incomplete, most researchers and common statistical software use complete case analysis to estimate $R^2$, a procedure that might lead to biased results. In this paper, I introduce the use of multiple imputation for the estimation of $R^2$ and adjusted $R^2$ in incomplete data sets. I illustrate my methodology using a biomedical example.
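
For orientation, the complete case baseline that this abstract contrasts with multiple imputation can be sketched as follows. This is only a minimal illustration, not the paper's method: the function name `complete_case_r2` and the use of `NaN` to mark missing values are assumptions, and the adjusted $R^2$ uses the usual formula $1 - (1 - R^2)(n - 1)/(n - p - 1)$.

```python
import numpy as np

def complete_case_r2(X, y):
    """R^2 and adjusted R^2 from an OLS fit with intercept, using complete cases only.

    Hypothetical helper for illustration; missing values are assumed to be NaN.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Complete case analysis: drop every row with a missing value in X or y.
    keep = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    Xc, yc = X[keep], y[keep]
    n, p = Xc.shape
    design = np.column_stack([np.ones(n), Xc])
    beta, *_ = np.linalg.lstsq(design, yc, rcond=None)
    resid = yc - design @ beta
    sse = float(resid @ resid)                   # residual sum of squares
    sst = float(((yc - yc.mean()) ** 2).sum())   # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2, n
```

The returned `n` makes visible how many observations survive the deletion step, which is where the potential bias discussed in the abstract originates.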

2.
The coefficient of determination ($R^2$) is perhaps the single most extensively used measure of goodness of fit for regression models. It is also widely misused. The primary source of the problem is that except for linear models with an intercept term, the several alternative $R^2$ statistics are not generally equivalent. This article discusses various considerations and potential pitfalls in using the $R^2$'s. Specific points are exemplified by means of empirical data. A new resistant statistic is also introduced.
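
The non-equivalence mentioned in this abstract can be seen in a small sketch that compares two common definitions, $1 - \mathrm{SSE}/\mathrm{SST}$ and the squared correlation between observed and fitted values. The function name `r2_variants` and the toy data are assumptions made for illustration; the article itself considers more variants than these two.

```python
import numpy as np

def r2_variants(x, y, intercept=True):
    """Return (1 - SSE/SST, squared correlation of y with fitted values).

    Hypothetical helper: fits y on x by ordinary least squares,
    with or without an intercept column.
    """
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), x]) if intercept else x
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    sse = float(((y - fitted) ** 2).sum())
    sst = float(((y - y.mean()) ** 2).sum())    # total variation about the mean
    one_minus_ratio = 1.0 - sse / sst
    corr_squared = float(np.corrcoef(y, fitted)[0, 1] ** 2)
    return one_minus_ratio, corr_squared

x = [1, 2, 3, 4, 5]
y = [3.1, 3.9, 5.2, 5.8, 7.1]
with_intercept = r2_variants(x, y, intercept=True)      # the two values agree
without_intercept = r2_variants(x, y, intercept=False)  # the two values differ
```

With an intercept the two definitions coincide; once the intercept is dropped, the first can fall well below the second for the same data.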

3.
The coefficient of determination, a.k.a. $R^2$, is well-defined in linear regression models, and measures the proportion of variation in the dependent variable explained by the predictors included in the model. To extend it to generalized linear models, we use the variance function to define the total variation of the dependent variable, as well as the remaining variation of the dependent variable after modeling the predictive effects of the independent variables. Unlike other definitions that demand complete specification of the likelihood function, our definition of $R^2$ needs only the mean and variance functions, so it is applicable to more general quasi-models. It is consistent with the classical measure of uncertainty using variance, and reduces to the classical definition of the coefficient of determination when linear regression models are considered.

4.
Clusterwise regression aims to cluster data sets where the clusters are characterized by their specific regression coefficients in a linear regression model. In this paper, we propose a method for determining a partition which uses an idea of robust regression. We start with some random weighting to determine a start partition and continue in the spirit of M-estimators. The residuals for all regressions are used to assign the observations to the different groups. As target function we use the determination coefficient $R^{2}_{w}$ for the overall model. This coefficient is suitably defined for weighted regression.

5.
Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar $R^2$ coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design-consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke $R^2$ are not design-consistent, but are systematically larger than would be obtained with a cross-sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
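
As background, the usual (unweighted) Cox–Snell and Nagelkerke summaries can be computed from the null and fitted log-likelihoods as sketched below. This is not the design-consistent estimator proposed in the note, and it is written in Python rather than the R code the author provides; the function names are assumptions. The Nagelkerke rescaling presumes a discrete outcome model, where the log-likelihood is at most zero.

```python
import math

def cox_snell_r2(loglik_null, loglik_fit, n):
    """Cox-Snell summary: 1 - (L_null / L_fit)^(2/n), computed on the log scale."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_fit) / n)

def nagelkerke_r2(loglik_null, loglik_fit, n):
    """Nagelkerke summary: Cox-Snell divided by its maximum attainable value.

    The maximum, 1 - L_null^(2/n), is reached when the fitted log-likelihood
    is zero, which is the best case for a discrete outcome model.
    """
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell_r2(loglik_null, loglik_fit, n) / max_cox_snell

# Illustrative numbers only: n = 100 binary outcomes with half of them equal to 1,
# so the null log-likelihood is 100 * log(0.5).
ll_null = 100 * math.log(0.5)
ll_fit = -50.0
cs = cox_snell_r2(ll_null, ll_fit, 100)
nk = nagelkerke_r2(ll_null, ll_fit, 100)
```

Because the Cox–Snell value cannot reach one for discrete outcomes, the Nagelkerke value is always at least as large for the same fit.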

6.
A recent article in this journal presented a variety of expressions for the coefficient of determination ($R^2$) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the $R^2$ statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the $R^2$ statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.

7.
The analysis of survival endpoints subject to right-censoring is an important research area in statistics, particularly among econometricians and biostatisticians. The two most popular semiparametric models are the proportional hazards model and the accelerated failure time (AFT) model. Rank-based estimation in the AFT model is computationally challenging due to optimization of a non-smooth loss function. Previous work has shown that rank-based estimators may be written as solutions to linear programming (LP) problems. However, the size of the LP problem is $O(n^2 + p)$ subject to $n^2$ linear constraints, where $n$ denotes the sample size and $p$ denotes the dimension of the parameters. As $n$ and/or $p$ increases, the feasibility of such a solution in practice becomes questionable. Among data mining and statistical learning enthusiasts, there is interest in extending ordinary regression coefficient estimators for low dimensions into high-dimensional data mining tools through regularization. Applying this recipe to rank-based coefficient estimators leads to formidable optimization problems which may be avoided through smooth approximations to non-smooth functions. We review smooth approximations and quasi-Newton methods for rank-based estimation in AFT models. The computational cost of our method is substantially smaller than that of the corresponding LP problem, and the method can be applied to small- or large-scale problems alike. The algorithm described here allows one to couple rank-based estimation for censored data with virtually any regularization and is exemplified through four case studies.

8.
The paper presents a general randomization theory approach to point and interval estimation of $Q$ linear functions $T_q = \sum_{k=1}^{N} c_{kq} Y_k$ ($q = 1,\dots,Q$), where $Y_1,\dots,Y_N$ are values of a variable of interest $Y$ in a finite population. Such linear functions include population and domain means and totals, population regression coefficients, etc. We assume that some auxiliary information can be exploited. This suggests the generalized regression technique based on the fit of a linear model, which yields approximately design-unbiased estimators $\hat{T}_q$. The paper focuses on estimation of the variance-covariance matrix of the $\hat{T}_q$ for single-stage and two-stage designs. Two techniques based on Taylor expansions are compared. Results of Monte Carlo experiments (not reported here) show that normal-theory confidence intervals based on either variance estimate have good coverage properties.

9.
For mixed regression models, we define a variance decomposition including three terms: explained individual variance, unexplained individual variance and noise variance. In contrast to traditional variance decomposition, it is thus the unexplained, not the explained, variance that is split. It gives rise to a coefficient of individual determination (CID) defined as the estimated fraction of explained individual variance. We argue that in many applications CID is a valuable complement to $R^2$, since it excludes noise variance (which can never be explained) and thus has one as a natural upper bound.

10.
A partially time-varying coefficient time series model is introduced to characterize the nonlinearity and trending phenomenon. To estimate the regression parameter and the nonlinear coefficient function, the profile least squares approach is applied with the help of local linear approximation. The asymptotic distributions of the proposed estimators are established under mild conditions. Meanwhile, the generalized likelihood ratio test is studied and the test statistics are shown to follow an asymptotic $\chi^2$-distribution under the null hypothesis. Furthermore, some extensions of the proposed model are discussed and several numerical examples are provided to illustrate the finite sample behavior of the proposed methods.

11.
12.
We examine the effects of modelling errors, such as underfitting and overfitting, on the asymptotic power of tests of association between an explanatory variable x and an outcome in the setting of generalized linear models. The regression function for x is approximated by a polynomial or another simple function, and a chi-square statistic is used to test whether the coefficients of the approximation are simultaneously equal to zero. Adding terms to the approximation increases asymptotic power if and only if the fit of the model increases by a certain quantifiable amount. Although a high degree of freedom approximation offers robustness to the shape of the unknown regression function, a low degree of freedom approximation can yield much higher asymptotic power even when the approximation is very poor. In practice, it is useful to compute the power of competing test statistics across the range of alternatives that are plausible a priori. This approach is illustrated through an application in epidemiology.

13.
In this paper, by considering a $2n$-dimensional elliptically contoured random vector $(X^T, Y^T)^T = (X_1,\dots,X_n,Y_1,\dots,Y_n)^T$, we derive the exact joint distribution of linear combinations of concomitants of order statistics arising from $X$. Specifically, we establish a mixture representation for the distribution of the $r$th concomitant order statistic, and also for the joint distribution of the $r$th order statistic and its concomitant. We show that these distributions are indeed mixtures of multivariate unified skew-elliptical distributions. The two most important special cases of multivariate normal and multivariate $t$ distributions are then discussed in detail. Finally, an application of the established results in an inferential problem is outlined.

14.
Stepwise regression building procedures are commonly used applied statistical tools, despite their well-known drawbacks. While many of their limitations have been widely discussed in the literature, other aspects of the use of individual statistical fit measures, especially in high-dimensional stepwise regression settings, have not. Giving primacy to individual fit, as is done with p-values and $R^2$, when group fit may be the larger concern, can lead to misguided decision making. One of the most consequential uses of stepwise regression is in health care, where these tools allocate hundreds of billions of dollars to health plans enrolling individuals with different predicted health care costs. The main goal of this “risk adjustment” system is to convey incentives to health plans such that they provide health care services fairly, a component of which is not to discriminate in access or care for persons or groups likely to be expensive. We address some specific limitations of p-values and $R^2$ for high-dimensional stepwise regression in this policy problem through an illustrated example by additionally considering a group-level fairness metric.

15.
The asymptotic distribution of certain tests of fit to the exponential distribution is obtained. The tests are based on regression of the order statistics on their expectations under a standard exponential distribution. Asymptotic normality at the rate $(\log n)^{1/2}$ is obtained for a family of statistics including the correlation coefficient.

16.
Local linear regression involves fitting a straight line segment over a small region whose midpoint is the target point $x$, and the local linear estimate at $x$ is the estimated intercept of that straight line segment, with an asymptotic bias of order $h^2$ and variance of order $(nh)^{-1}$ ($h$ is the bandwidth). In this paper, we propose a new estimator, the double-smoothing local linear estimator, which is constructed by integrally combining all fitted values at $x$ of local lines in its neighborhood with another round of smoothing. The proposed estimator attempts to make use of all information obtained from fitting local lines. Without changing the order of variance, the new estimator can reduce the bias to an order of $h^4$. The proposed estimator has better performance than local linear regression in situations with considerable bias effects; it also has less variability and more easily overcomes the sparse data problem than local cubic regression. At boundary points, the proposed estimator is comparable to local linear regression. Simulation studies are conducted and an ethanol example is used to compare the new approach with other competitive methods.
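
The ordinary local linear estimate described in the first sentence of this abstract (not the proposed double-smoothing estimator) can be sketched as a kernel-weighted least squares fit whose intercept is the estimate at the target point. The function name `local_linear` and the Gaussian kernel are assumptions chosen for illustration; the abstract does not specify a kernel.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate at x0: intercept of a kernel-weighted line fit.

    Hypothetical helper; uses a Gaussian kernel with bandwidth h (an assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (x - x0) / h
    w = np.exp(-0.5 * u ** 2)                    # kernel weights centred at x0
    # Centre the regressor at x0 so that the fitted intercept is the value at x0.
    design = np.column_stack([np.ones_like(x), x - x0])
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on sqrt-weighted data.
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return float(beta[0])
```

When the underlying regression function is exactly linear, this estimate reproduces it at every point regardless of the bandwidth, which is why the leading bias term involves curvature and is of order $h^2$.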

17.
This paper discusses a class of tests of lack-of-fit of a parametric regression model when the design is non-random and uniform on [0,1]. These tests are based on certain minimized distances between a nonparametric regression function estimator and the parametric model being fitted. We investigate asymptotic null distributions of the proposed tests, their consistency and asymptotic power against a large class of fixed and sequences of local nonparametric alternatives, respectively. The best fitted parameter estimate is seen to be $n^{1/2}$-consistent and asymptotically normal. A crucial result needed for proving these results is a central limit lemma for weighted degenerate U statistics where the weights are arrays of some non-random real numbers. This result is of independent interest and is an extension of a result of Hall for non-weighted degenerate U statistics.

18.
We develop splice plots as a diagnostic tool for parametric generalized linear models. Splice plots use the independence of the outcome and explanatory measures given the regression function. Plotting differences between the estimated parametric regression function and non-parametric estimates of the regression function computed in small neighborhoods of the fitted values from the parametric model can be used to assess model fit.

19.
The econometrics literature contains many alternative measures of goodness of fit, roughly analogous to $R^2$, for use with equations with dichotomous dependent variables. There is, however, no consensus as to the measures' relative merits or about which ones should be reported in empirical work. This article proposes a new measure that possesses several useful properties that the other measures lack. The new measure may be interpreted intuitively in a similar way to $R^2$ in the linear regression context.

20.
In this paper, we study M-estimators of regression parameters in semiparametric linear models for censored data. A class of consistent and asymptotically normal M-estimators is constructed. A resampling method is developed for the estimation of the asymptotic covariance matrix of the estimators.
