Similar Documents
20 similar documents found.
1.
This article examines several goodness-of-fit measures for the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified measures and one new pseudo-R² measure are proposed. For the probit regression model, the goodness-of-fit measures are compared empirically with the squared sample correlation coefficient between the observed responses and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.
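As a quick illustration (not one of the measures proposed in the article), McFadden's pseudo-R² for a probit fit can be set beside the squared sample correlation between the observed responses and the fitted probabilities; the data below are simulated placeholders:

```python
# Hedged sketch: McFadden's pseudo-R^2 vs. the squared correlation of
# y with the fitted probabilities, on simulated data (illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))         # design with intercept
y = (X @ [0.5, 1.0, -1.0] + rng.normal(size=n) > 0).astype(int)

fit = sm.Probit(y, X).fit(disp=0)
p_hat = fit.predict(X)                               # fitted probabilities

mcfadden = fit.prsquared                             # 1 - llf / llnull
r2_corr = np.corrcoef(y, p_hat)[0, 1] ** 2           # squared sample correlation
print(mcfadden, r2_corr)
```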

2.
ABSTRACT

Stepwise regression building procedures are commonly used applied statistical tools, despite their well-known drawbacks. While many of their limitations have been widely discussed in the literature, other aspects of the use of individual statistical fit measures, especially in high-dimensional stepwise regression settings, have not. Giving primacy to individual fit, as is done with p-values and R², when group fit may be the larger concern, can lead to misguided decision making. One of the most consequential uses of stepwise regression is in health care, where these tools allocate hundreds of billions of dollars to health plans enrolling individuals with different predicted health care costs. The main goal of this “risk adjustment” system is to convey incentives to health plans such that they provide health care services fairly, a component of which is not to discriminate in access or care for persons or groups likely to be expensive. We address some specific limitations of p-values and R² for high-dimensional stepwise regression in this policy problem through an illustrated example by additionally considering a group-level fairness metric.

3.
Suppose that we have a nonparametric regression model Y = m(X) + ε with X ∈ ℝᵖ, where X is a random design variable observed completely, and Y is the response variable with some Y-values missing at random. Based on the “complete” data sets for Y after nonparametric regression imputation and inverse-probability-weighted imputation, two estimators of the regression function m(x₀) for fixed x₀ ∈ ℝᵖ are proposed. Asymptotic normality of the two estimators is established and used to construct normal-approximation-based confidence intervals for m(x₀). We also construct an empirical likelihood (EL) statistic for m(x₀) with a limiting χ²₁ distribution, which is used to construct an EL confidence interval for m(x₀).
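A minimal sketch of the regression-imputation route, assuming p = 1, a Gaussian kernel, and ad hoc bandwidths (this is not the paper's exact estimator, and the empirical-likelihood interval is not reproduced):

```python
# Kernel-regression imputation of missing Y, then a Nadaraya-Watson
# estimate of m(x0) from the "completed" data set (rough sketch).
import numpy as np

def nw(x0, x, y, h):
    """Nadaraya-Watson estimate of m(x0) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(-2, 2, n)
y = np.sin(x) + 0.3 * rng.normal(size=n)
miss = rng.random(n) < 0.25                  # Y missing at random
y_complete = y.copy()

# Impute each missing Y by kernel regression on the fully observed pairs.
for i in np.where(miss)[0]:
    y_complete[i] = nw(x[i], x[~miss], y[~miss], h=0.3)

# Point estimate of m(x0) from the "completed" data set.
x0 = 0.5
print(nw(x0, x, y_complete, h=0.3), np.sin(x0))
```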

4.
Teresa Ledwina, Statistics, 2013, 47(1): 105–118
We state some necessary and sufficient conditions for admissibility of tests of a simple and a composite null hypothesis against “one-sided” alternatives for multivariate exponential distributions with discrete support.

Admissibility of the maximum likelihood test for “one-sided” alternatives and of the χ² test for the independence hypothesis in r × s contingency tables is deduced, among others.

5.
ABSTRACT

We study the asymptotic properties of the standard GMM estimator when additional moment restrictions, weaker than the original ones, are available. We provide conditions under which these additional weaker restrictions improve the efficiency of the GMM estimator. To detect “spurious” identification that may come from invalid moments, we rely on the Hansen J-test that assesses the compatibility between existing restrictions and additional ones. Our simulations reveal that the J-test has good power properties and that its power increases with the weakness of the additional restrictions. Our theoretical characterization of the J-test provides some intuition for why that is.
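For orientation, a generic two-step GMM fit and Hansen J-statistic in a simulated linear IV model; this shows only the standard overidentification test, not the paper's analysis of added weaker restrictions:

```python
# Two-step GMM with the Hansen J overidentification test (sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 1000
z = rng.normal(size=(n, 3))                       # 3 instruments
u = rng.normal(size=n)
x = z @ [1.0, 0.5, 0.2] + 0.5 * u + rng.normal(size=n)
y = 2.0 * x + u                                   # 1 parameter, 3 moments

Z, X = z, x[:, None]
W = np.linalg.inv(Z.T @ Z / n)                    # first-step weight matrix
b1 = np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)
g = Z * (y - X @ b1)[:, None]                     # moment contributions
S = g.T @ g / n                                   # estimated moment variance
b2 = np.linalg.solve(X.T @ Z @ np.linalg.inv(S) @ Z.T @ X,
                     X.T @ Z @ np.linalg.inv(S) @ Z.T @ y)
gbar = Z.T @ (y - X @ b2) / n                     # averaged moments at b2
J = n * gbar @ np.linalg.inv(S) @ gbar            # Hansen J-statistic
print(J, stats.chi2.sf(J, df=3 - 1))              # df = moments - parameters
```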

6.
This article presents the “centered” method for establishing cell boundaries in the X² goodness-of-fit test, which when applied to common stock returns significantly reduces the high bias of the test statistic associated with the traditional Mann–Wald equiprobable approach. A modified null hypothesis is proposed to incorporate explicitly the usually implicit assumption that the observed discrete returns are “approximated” by the hypothesized continuous density. Simulation results indicate extremely biased X² values resulting from the traditional approach, particularly for low-priced and low-volatility stocks. Daily stock returns for 114 firms are tested to determine whether they are approximated by a normal or one of several normal mixture densities. Results indicate a significantly higher degree of fit than that reported elsewhere to date.

7.
Nonparametric regression can be considered as a problem of model choice. In this article, we present the results of a simulation study in which several nonparametric regression techniques, including wavelets and kernel methods, are compared with respect to their behavior on different test beds. We also include the taut-string method, whose aim is not to minimize the distance of an estimator to some “true” generating function f but to provide a simple adequate approximation to the data. Test beds are situations where a “true” generating function f exists, so that the estimates of f can be compared with f itself. The measures of performance we use are the L₂- and L∞-norms and the ability to identify peaks.

8.
We propose a simple method for evaluating the model chosen by an adaptive regression procedure, our main focus being the lasso. The procedure deletes each chosen predictor and refits the lasso to obtain a set of models that are “close” to the chosen “base model,” and compares the error rates of the base model with those of the nearby models. If deleting a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure “Next-Door analysis” since it examines models “next” to the base model. It can be applied to supervised learning problems with ℓ₁ penalization and to stepwise procedures. We have implemented it as an R library to accompany the well-known glmnet library. The Canadian Journal of Statistics 48: 447–470; 2020 © 2020 Statistical Society of Canada
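A schematic re-implementation of the idea with scikit-learn's LassoCV rather than the authors' R/glmnet library; closeness of models is judged here by plain cross-validated error means, without the paper's significance assessment:

```python
# "Next-Door" style check: drop each chosen predictor, refit the lasso,
# and compare cross-validated error with the base model (sketch only).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(size=n)

base = LassoCV(cv=5).fit(X, y)
chosen = np.flatnonzero(base.coef_)               # predictors in the base model
base_err = -cross_val_score(LassoCV(cv=5), X, y,
                            scoring="neg_mean_squared_error", cv=5).mean()

# Delete each chosen predictor, refit, and compare error rates.
for j in chosen:
    X_minus = np.delete(X, j, axis=1)
    err = -cross_val_score(LassoCV(cv=5), X_minus, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"drop x{j}: CV error {err:.3f} vs base {base_err:.3f}")
```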

9.
The Akaike Information Criterion (AIC) is developed for selecting the variables of the nested error regression model, where an unobservable random effect is present. Using the idea of decomposing the likelihood into the “within” and “between” analysis-of-variance parts, we derive the AIC when the number of groups is large and the ratio of the variances of the random effects and the random errors is an unknown parameter. The proposed AIC is compared, by simulation, with Mallows' Cp, Akaike's AIC, and Sugiura's exact AIC. Based on the rates of selecting the true model, it is shown that the proposed AIC performs better.

10.
We re-examine the criteria of “hyper-admissibility” and “necessary bestness” for the choice of estimator from the point of view of their relevance to the design of actual surveys. Both criteria give rise to a unique choice of estimator (viz. the Horvitz–Thompson estimator ŶHT) whatever the character under investigation or the sample design. However, we show here that the “principal hyper-surfaces” (or “domains”) of dimension one (which are practically uninteresting) play the key role in arriving at the unique choice. A variance estimator v₁(ŶHT) (due to Horvitz and Thompson), which takes negative values “often,” is shown to be uniquely “hyper-admissible” in a wide class of unbiased estimators of the variance of ŶHT. Extensive empirical evidence on the superiority of the Sen–Yates–Grundy variance estimator v₂(ŶHT) over v₁(ŶHT) is presented.

11.
This article extends the theoretical analysis of spurious relationships to the situation where the deterministic components of the processes generating the individual series are independent and heavy-tailed with structural changes. It shows that when those sequences are used in ordinary least-squares regression, the usual t-statistic procedures wrongly indicate that (i) spurious significance is established when regressing mean-stationary and trend-stationary series with structural changes, (ii) a spurious relationship occurs under broken mean-stationary and difference-stationary sequences, and (iii) the extent of spurious regression becomes stronger between difference-stationary and trend-stationary series in the presence of breaks. The spurious phenomenon is present regardless of the sample size and of whether the structural breaks take place at the same points. Simulation experiments confirm our asymptotic results and reveal that the spurious effects are not only sensitive to the relative location of the structural changes within the sample, but also depend seriously on the stability indices.
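A small simulation in the spirit of case (i): two independent series, one mean-stationary with a break and one trend-stationary, both with heavy-tailed noise, regressed on each other by OLS (all parameters are arbitrary placeholders):

```python
# Spurious-significance sketch: independent broken-mean and trending
# series still yield "significant" t-statistics under naive OLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
rejections = 0
reps, n = 500, 400
t = np.arange(n)
for _ in range(reps):
    # independent series: mean break at mid-sample vs. deterministic trend
    x = 1.0 * (t > n // 2) + rng.standard_t(df=3, size=n)   # heavy-tailed noise
    y = 0.02 * t + rng.standard_t(df=3, size=n)
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    rejections += abs(fit.tvalues[1]) > 1.96
print("spurious rejection rate:", rejections / reps)
```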

12.
This paper proposes a wavelet-based approach to analyzing spurious and cointegrated regressions in time series. The approach is based on the properties of the wavelet covariance and correlation in Monte Carlo studies of spurious and cointegrated regression. In the case of spurious regression, the null hypotheses of zero wavelet covariance and correlation for these series across the scales fail to be rejected. Conversely, these null hypotheses across the scales are rejected for the cointegrated bivariate time series. These nonresidual-based tests are then applied to analyze whether any relationship exists between the extraterrestrial phenomenon of sunspots and the earthly economic time series of oil prices. Conventional residual-based tests appear sensitive to the specification of both the cointegrating regression and the lag order in the augmented Dickey–Fuller tests on the residuals. In contrast, the wavelet tests, with their bootstrap t-statistics and confidence intervals, detect the spuriousness of this relationship.

13.
Results of an exhaustive study of the bias of the least-squares estimator (LSE) of the first-order autoregression coefficient α in a contaminated Gaussian model are presented. The model describes the following situation. The process is defined as Xt = αXt−1 + Yt. Until a specified time T, the Yt are iid normal N(0, 1). At the moment T we start our observations, and from then on the distribution of Yt, t ≥ T, is a Tukey mixture T(ε, σ) = (1 − ε)N(0, 1) + εN(0, σ²). The bias of the LSE as a function of α, ε, and σ² is considered. A rather unexpected fact is revealed: given α and ε, the bias does not change monotonically with σ (“the magnitude of the contaminant”), and similarly, given α and σ, the bias does not grow monotonically with ε (“the amount of contaminants”).
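A Monte Carlo sketch of the setup, with a burn-in period standing in for the pre-observation phase; the small σ grid merely probes the (non-)monotonicity in σ and is not the authors' exhaustive study:

```python
# Bias of the AR(1) least-squares estimator when the observed-period
# noise follows the Tukey mixture (1-eps)N(0,1) + eps*N(0,sigma^2).
import numpy as np

def lse_bias(alpha, eps, sigma, n=200, burn=200, reps=500, seed=5):
    rng = np.random.default_rng(seed)
    biases = []
    for _ in range(reps):
        x = np.zeros(burn + n)
        for t in range(1, burn + n):
            if t < burn:                          # pre-observation: N(0, 1) noise
                e = rng.normal()
            else:                                 # observed period: Tukey mixture
                e = rng.normal(0.0, sigma if rng.random() < eps else 1.0)
            x[t] = alpha * x[t - 1] + e
        xs = x[burn:]
        a_hat = np.sum(xs[1:] * xs[:-1]) / np.sum(xs[:-1] ** 2)
        biases.append(a_hat - alpha)
    return np.mean(biases)

for sigma in (1.0, 3.0, 10.0):
    print(sigma, lse_bias(alpha=0.8, eps=0.1, sigma=sigma))
```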

14.
We consider automatic data-driven density, regression, and autoregression estimates based on any random bandwidth selector ĥT. We show that, in a first-order asymptotic approximation, they behave as well as the related estimates obtained with the “optimal” bandwidth hT as long as ĥT/hT → 1 in probability. The results are obtained for dependent observations; some of them are also new for independent observations.

15.
The two-parameter lognormal distribution with density function f(y; γ, σ²) = [(2πσ²)^(1/2) y]⁻¹ exp[−(ln y − γ)²/(2σ²)], y > 0, is important as a failure-time model in life testing. In this paper, Bayesian lower bounds for the reliability function R(t; γ, σ²) = Φ[(γ − ln t)/σ] are obtained for two cases. First, it is assumed that γ is known and σ² has either an inverted gamma or “general uniform” prior distribution. Then, for the case that both γ and σ² are unknown, the normal-gamma prior and Jeffreys' vague prior are considered. Some Monte Carlo simulations are given to indicate some of the properties of the Bayesian lower bounds.
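The reliability function itself is straightforward to evaluate at plug-in values of (γ, σ²); the sketch below does only that and does not reproduce the Bayesian lower bounds:

```python
# Plug-in evaluation of R(t) = Phi[(gamma - ln t) / sigma] for the
# two-parameter lognormal failure-time model (illustration only).
import numpy as np
from scipy.stats import norm

def reliability(t, gamma, sigma2):
    """Probability that a lognormal(gamma, sigma2) lifetime exceeds t."""
    return norm.cdf((gamma - np.log(t)) / np.sqrt(sigma2))

print(reliability(t=100.0, gamma=5.0, sigma2=0.25))
```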

16.
Consider k independent observations Yi (i = 1, …, k) from two-parameter exponential populations Πi with location parameters μi and the same scale parameter σ. If the μi are ranked as μp(1) ≤ … ≤ μp(k), consider population Πp(1) as the “worst” population and Πp(k) as the “best” population (with some tagging so that p(1) and p(k) are well defined in the case of equalities). If the Yi are ranked as Yr(1) ≤ … ≤ Yr(k), we consider the procedure, “Select Πr(k) provided Yr(k) − Yr(k−1) is sufficiently large so that Πr(k) is demonstrably better than the other populations.” A similar procedure is studied for selecting the “demonstrably worst” population.

17.
Let μ̂(1) and μ̂(2) be location-equivariant estimators of an unknown location parameter μ. It is shown that the test for H0: μ ≤ μ0 versus HA: μ > μ0 that rejects H0 if μ̂(1) is large is uniformly more powerful than the one that rejects H0 if μ̂(2) is large if and only if μ̂(2) is “more dispersed” than μ̂(1). A similar result is obtained for tests on scale using the star-shaped ordering. Examples are given.

18.
Significance tests on coefficients of lower-order terms in polynomial regression models are affected by linear transformations. For this reason, a polynomial regression model that excludes hierarchically inferior predictors (i.e., lower-order terms) is considered not well formulated. Existing variable-selection algorithms do not take the hierarchy of predictors into account and often select as “best” a model that is not hierarchically well formulated. This article proposes a theory of the hierarchical ordering of the predictors of an arbitrary polynomial regression model in m variables, where m is an arbitrary positive integer. Ways of modifying existing algorithms to restrict their search to well-formulated models are suggested, and an algorithm that generates all possible well-formulated models is presented.

19.
“Precision” may be thought of either as the closeness with which a reported value approximates a “true” value, or as the number of digits carried in computations, depending on context. With suitable formal definitions, it is shown that the precision of a reported value is the difference between the precision with which computations are performed and the “loss” in precision due to the computations. Loss in precision is a function of the quantity computed and of the algorithm used to compute it; in the case of the usual “computing formula” for variances and covariances, it is shown that the expected loss of precision is log(kᵢkⱼ), where kᵢ, the reciprocal of the coefficient of variation, is the ratio of the mean to the standard deviation of the ith variable. When the precision of a reported value, the precision of computations, and the loss of precision due to the computations are expressed to the same base, all three quantities have the units of significant digits in the corresponding number system. Using this metric for “precision,” the expected precision of a computed (co)variance may be estimated in advance of the computation; for the data reported in the paper, the estimates agree closely with observed precision. Implications are drawn for the programming of general-purpose statistical programs, as well as for users of existing programs, in order to minimize the loss of precision resulting from characteristics of the data. A nomograph is provided to facilitate the estimation of precision in binary, decimal, and hexadecimal digits.
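A back-of-the-envelope application of the loss formula, reading the expected digits lost as log(kᵢkⱼ) in the base of the digit system (base 10 below); the numbers are illustrative:

```python
# Expected decimal digits lost when computing a covariance via the
# one-pass "computing formula", using k = mean / standard deviation.
import numpy as np

def expected_digits_lost(mean_i, sd_i, mean_j, sd_j, base=10):
    k_i, k_j = mean_i / sd_i, mean_j / sd_j
    return np.log(abs(k_i * k_j)) / np.log(base)

# Two variables whose means are large relative to their spread lose
# several digits: log10(1000 * 250) is about 5.4 decimal digits.
print(expected_digits_lost(1000.0, 1.0, 500.0, 2.0))
```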

20.
R-squared (R²) and adjusted R-squared (R²Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators of the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x's are fixed. A new parameter is defined, in a similar fashion to the construction of R², but relying on the true parameters rather than their estimates. (The parameter definition also includes the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ²*. The proposed ρ²* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R²Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R² overestimates ρ²*, while the traditional R²Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²Adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²Adj to ρ²* is demonstrated. The effects of the model parameters on the bias of R² and R²Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
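A simulation sketch contrasting R², R²Adj, and a plug-in construction of ρ²* from the true coefficients and the fixed x values; the ρ²* formula below is our reading of the construction, not necessarily the paper's exact definition:

```python
# Simple-regression Monte Carlo: average R^2 sits above the plug-in
# rho^2*, average adjusted R^2 sits below it (sketch under assumptions).
import numpy as np

rng = np.random.default_rng(6)
n, sigma = 50, 2.0
x = np.linspace(0, 10, n)                       # fixed design
beta0, beta1 = 1.0, 0.5
mu = beta0 + beta1 * x                          # true mean at the fixed x's

signal = np.sum((mu - mu.mean()) ** 2)
rho2_star = signal / (signal + n * sigma ** 2)  # true-parameter analogue of R^2

r2s, r2adjs = [], []
for _ in range(2000):
    y = mu + sigma * rng.normal(size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    r2s.append(r2)
    r2adjs.append(1 - (1 - r2) * (n - 1) / (n - 2))
print(rho2_star, np.mean(r2s), np.mean(r2adjs))
```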
