Similar Documents
20 similar documents found.
1.
This article examines several goodness-of-fit measures for the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified measures and one new pseudo-R² measure are proposed. For the probit regression model, the goodness-of-fit measures are compared empirically with the squared sample correlation coefficient between the observed responses and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.
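As a quick illustration (not one of the measures proposed in the article), McFadden's pseudo-R² for a probit fit can be set beside the squared sample correlation between the observed responses and the fitted probabilities; the data below are simulated placeholders:

```python
# Hedged sketch: McFadden's pseudo-R^2 vs. the squared correlation of
# y with the fitted probabilities, on simulated data (illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))         # design with intercept
y = (X @ [0.5, 1.0, -1.0] + rng.normal(size=n) > 0).astype(int)

fit = sm.Probit(y, X).fit(disp=0)
p_hat = fit.predict(X)                               # fitted probabilities

mcfadden = fit.prsquared                             # 1 - llf / llnull
r2_corr = np.corrcoef(y, p_hat)[0, 1] ** 2           # squared sample correlation
print(mcfadden, r2_corr)
```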

2.
ABSTRACT

Stepwise regression building procedures are commonly used applied statistical tools, despite their well-known drawbacks. While many of their limitations have been widely discussed in the literature, other aspects of the use of individual statistical fit measures, especially in high-dimensional stepwise regression settings, have not. Giving primacy to individual fit, as is done with p-values and R², when group fit may be the larger concern, can lead to misguided decision making. One of the most consequential uses of stepwise regression is in health care, where these tools allocate hundreds of billions of dollars to health plans enrolling individuals with different predicted health care costs. The main goal of this “risk adjustment” system is to convey incentives to health plans such that they provide health care services fairly, a component of which is not to discriminate in access or care for persons or groups likely to be expensive. We address some specific limitations of p-values and R² for high-dimensional stepwise regression in this policy problem through an illustrated example by additionally considering a group-level fairness metric.

3.
Suppose that we have a nonparametric regression model Y = m(X) + ε with X ∈ ℝᵖ, where X is a random design variable observed completely, and Y is the response variable with some Y-values missing at random. Based on the “complete” data sets for Y after nonparametric regression imputation and inverse-probability-weighted imputation, two estimators of the regression function m(x₀) for fixed x₀ ∈ ℝᵖ are proposed. Asymptotic normality of the two estimators is established and used to construct normal-approximation-based confidence intervals for m(x₀). We also construct an empirical likelihood (EL) statistic for m(x₀) with a limiting χ²₁ distribution, which is used to construct an EL confidence interval for m(x₀).
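A minimal sketch of the regression-imputation route, assuming p = 1, a Gaussian kernel, and ad hoc bandwidths (this is not the paper's exact estimator, and the empirical-likelihood interval is not reproduced):

```python
# Kernel-regression imputation of missing Y, then a Nadaraya-Watson
# estimate of m(x0) from the "completed" data set (rough sketch).
import numpy as np

def nw(x0, x, y, h):
    """Nadaraya-Watson estimate of m(x0) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(-2, 2, n)
y = np.sin(x) + 0.3 * rng.normal(size=n)
miss = rng.random(n) < 0.25                  # Y missing at random
y_complete = y.copy()

# Impute each missing Y by kernel regression on the fully observed pairs.
for i in np.where(miss)[0]:
    y_complete[i] = nw(x[i], x[~miss], y[~miss], h=0.3)

# Point estimate of m(x0) from the "completed" data set.
x0 = 0.5
print(nw(x0, x, y_complete, h=0.3), np.sin(x0))
```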

4.
Teresa Ledwina, Statistics, 2013, 47(1): 105–118
We state some necessary and sufficient conditions for admissibility of tests of a simple and a composite null hypothesis against “one-sided” alternatives for multivariate exponential distributions with discrete support.

Admissibility of the maximum likelihood test for “one-sided” alternatives and of the χ² test for the independence hypothesis in r × s contingency tables is deduced, among others.

5.
ABSTRACT

We study the asymptotic properties of the standard GMM estimator when additional moment restrictions, weaker than the original ones, are available. We provide conditions under which these additional weaker restrictions improve the efficiency of the GMM estimator. To detect “spurious” identification that may come from invalid moments, we rely on the Hansen J-test that assesses the compatibility between existing restrictions and additional ones. Our simulations reveal that the J-test has good power properties and that its power increases with the weakness of the additional restrictions. Our theoretical characterization of the J-test provides some intuition for why that is.
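For orientation, a generic two-step GMM fit and Hansen J-statistic in a simulated linear IV model; this shows only the standard overidentification test, not the paper's analysis of added weaker restrictions:

```python
# Two-step GMM with the Hansen J overidentification test (sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 1000
z = rng.normal(size=(n, 3))                       # 3 instruments
u = rng.normal(size=n)
x = z @ [1.0, 0.5, 0.2] + 0.5 * u + rng.normal(size=n)
y = 2.0 * x + u                                   # 1 parameter, 3 moments

Z, X = z, x[:, None]
W = np.linalg.inv(Z.T @ Z / n)                    # first-step weight matrix
b1 = np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)
g = Z * (y - X @ b1)[:, None]                     # moment contributions
S = g.T @ g / n                                   # estimated moment variance
b2 = np.linalg.solve(X.T @ Z @ np.linalg.inv(S) @ Z.T @ X,
                     X.T @ Z @ np.linalg.inv(S) @ Z.T @ y)
gbar = Z.T @ (y - X @ b2) / n                     # averaged moments at b2
J = n * gbar @ np.linalg.inv(S) @ gbar            # Hansen J-statistic
print(J, stats.chi2.sf(J, df=3 - 1))              # df = moments - parameters
```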

6.
This article presents the “centered” method for establishing cell boundaries in the X² goodness-of-fit test, which when applied to common stock returns significantly reduces the high bias of the test statistic associated with the traditional Mann–Wald equiprobable approach. A modified null hypothesis is proposed to incorporate explicitly the usually implicit assumption that the observed discrete returns are “approximated” by the hypothesized continuous density. Simulation results indicate extremely biased X² values resulting from the traditional approach, particularly for low-priced and low-volatility stocks. Daily stock returns for 114 firms are tested to determine whether they are approximated by a normal or one of several normal mixture densities. Results indicate a significantly higher degree of fit than that reported elsewhere to date.

7.
Nonparametric regression can be considered as a problem of model choice. In this article, we present the results of a simulation study in which several nonparametric regression techniques, including wavelets and kernel methods, are compared with respect to their behavior on different test beds. We also include the taut-string method, whose aim is not to minimize the distance of an estimator to some “true” generating function f but to provide a simple adequate approximation to the data. Test beds are situations where a “true” generating function f exists, so that the estimates of f can be compared with f itself. The measures of performance we use are the L₂- and L∞-norms and the ability to identify peaks.

8.
We propose a simple method for evaluating the model chosen by an adaptive regression procedure, our main focus being the lasso. The procedure deletes each chosen predictor and refits the lasso to obtain a set of models that are “close” to the chosen “base model,” and compares the error rates of the base model with those of the nearby models. If deleting a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure “Next-Door analysis” since it examines models “next” to the base model. It can be applied to supervised learning problems with ℓ₁ penalization and to stepwise procedures. We have implemented it as an R library to accompany the well-known glmnet library. The Canadian Journal of Statistics 48: 447–470; 2020 © 2020 Statistical Society of Canada
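A schematic re-implementation of the idea with scikit-learn's LassoCV rather than the authors' R/glmnet library; closeness of models is judged here by plain cross-validated error means, without the paper's significance assessment:

```python
# "Next-Door" style check: drop each chosen predictor, refit the lasso,
# and compare cross-validated error with the base model (sketch only).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(size=n)

base = LassoCV(cv=5).fit(X, y)
chosen = np.flatnonzero(base.coef_)               # predictors in the base model
base_err = -cross_val_score(LassoCV(cv=5), X, y,
                            scoring="neg_mean_squared_error", cv=5).mean()

# Delete each chosen predictor, refit, and compare error rates.
for j in chosen:
    X_minus = np.delete(X, j, axis=1)
    err = -cross_val_score(LassoCV(cv=5), X_minus, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"drop x{j}: CV error {err:.3f} vs base {base_err:.3f}")
```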

9.
The Akaike Information Criterion (AIC) is developed for selecting the variables of the nested error regression model, where an unobservable random effect is present. Using the idea of decomposing the likelihood into the “within” and “between” analysis-of-variance parts, we derive the AIC when the number of groups is large and the ratio of the variances of the random effects and the random errors is an unknown parameter. The proposed AIC is compared, by simulation, with Mallows' Cp, Akaike's AIC, and Sugiura's exact AIC. Based on the rates of selecting the true model, it is shown that the proposed AIC performs better.

10.
We re-examine the criteria of “hyper-admissibility” and “necessary bestness” for the choice of estimator from the point of view of their relevance to the design of actual surveys. Both criteria give rise to a unique choice of estimator (viz. the Horvitz–Thompson estimator ŶHT) whatever the character under investigation or the sample design. However, we show here that the “principal hyper-surfaces” (or “domains”) of dimension one (which are practically uninteresting) play the key role in arriving at the unique choice. A variance estimator v₁(ŶHT) (due to Horvitz and Thompson), which takes negative values “often,” is shown to be uniquely “hyper-admissible” in a wide class of unbiased estimators of the variance of ŶHT. Extensive empirical evidence on the superiority of the Sen–Yates–Grundy variance estimator v₂(ŶHT) over v₁(ŶHT) is presented.

11.
This article extends the theoretical analysis of spurious relationships to the situation where the deterministic components of the processes generating the individual series are independent and heavy-tailed with structural changes. It shows that when those sequences are used in ordinary least-squares regression, the usual t-statistic procedures wrongly indicate that (i) spurious significance is established when regressing mean-stationary and trend-stationary series with structural changes, (ii) a spurious relationship occurs under broken mean-stationary and difference-stationary sequences, and (iii) the extent of spurious regression becomes stronger between difference-stationary and trend-stationary series in the presence of breaks. The spurious phenomenon is present regardless of the sample size and of whether the structural breaks take place at the same points. Simulation experiments confirm our asymptotic results and reveal that the spurious effects are not only sensitive to the relative location of the structural changes within the sample, but also depend seriously on the stability indices.
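A small simulation in the spirit of case (i): two independent series, one mean-stationary with a break and one trend-stationary, both with heavy-tailed noise, regressed on each other by OLS (all parameters are arbitrary placeholders):

```python
# Spurious-significance sketch: independent broken-mean and trending
# series still yield "significant" t-statistics under naive OLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
rejections = 0
reps, n = 500, 400
t = np.arange(n)
for _ in range(reps):
    # independent series: mean break at mid-sample vs. deterministic trend
    x = 1.0 * (t > n // 2) + rng.standard_t(df=3, size=n)   # heavy-tailed noise
    y = 0.02 * t + rng.standard_t(df=3, size=n)
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    rejections += abs(fit.tvalues[1]) > 1.96
print("spurious rejection rate:", rejections / reps)
```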

12.
This paper proposes a wavelet-based approach to analyzing spurious and cointegrated regressions in time series. The approach is based on the properties of the wavelet covariance and correlation in Monte Carlo studies of spurious and cointegrated regression. In the case of spurious regression, the null hypotheses of zero wavelet covariance and correlation for these series across the scales fail to be rejected. Conversely, these null hypotheses across the scales are rejected for the cointegrated bivariate time series. These nonresidual-based tests are then applied to analyze whether any relationship exists between the extraterrestrial phenomenon of sunspots and the earthly economic time series of oil prices. Conventional residual-based tests appear sensitive to the specification of both the cointegrating regression and the lag order in the augmented Dickey–Fuller tests on the residuals. In contrast, the wavelet tests, with their bootstrap t-statistics and confidence intervals, detect the spuriousness of this relationship.

13.
Results of an exhaustive study of the bias of the least-squares estimator (LSE) of the first-order autoregression coefficient α in a contaminated Gaussian model are presented. The model describes the following situation. The process is defined as Xt = αXt−1 + Yt. Until a specified time T, the Yt are iid normal N(0, 1). At the moment T we start our observations, and from then on the distribution of Yt, t ≥ T, is a Tukey mixture T(ε, σ) = (1 − ε)N(0, 1) + εN(0, σ²). The bias of the LSE as a function of α, ε, and σ² is considered. A rather unexpected fact is revealed: given α and ε, the bias does not change monotonically with σ (“the magnitude of the contaminant”), and similarly, given α and σ, the bias does not grow monotonically with ε (“the amount of contaminants”).
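A Monte Carlo sketch of the setup, with a burn-in period standing in for the pre-observation phase; the small σ grid merely probes the (non-)monotonicity in σ and is not the authors' exhaustive study:

```python
# Bias of the AR(1) least-squares estimator when the observed-period
# noise follows the Tukey mixture (1-eps)N(0,1) + eps*N(0,sigma^2).
import numpy as np

def lse_bias(alpha, eps, sigma, n=200, burn=200, reps=500, seed=5):
    rng = np.random.default_rng(seed)
    biases = []
    for _ in range(reps):
        x = np.zeros(burn + n)
        for t in range(1, burn + n):
            if t < burn:                          # pre-observation: N(0, 1) noise
                e = rng.normal()
            else:                                 # observed period: Tukey mixture
                e = rng.normal(0.0, sigma if rng.random() < eps else 1.0)
            x[t] = alpha * x[t - 1] + e
        xs = x[burn:]
        a_hat = np.sum(xs[1:] * xs[:-1]) / np.sum(xs[:-1] ** 2)
        biases.append(a_hat - alpha)
    return np.mean(biases)

for sigma in (1.0, 3.0, 10.0):
    print(sigma, lse_bias(alpha=0.8, eps=0.1, sigma=sigma))
```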

14.
We consider automatic data-driven density, regression, and autoregression estimates based on any random bandwidth selector ĥT. We show that, in a first-order asymptotic approximation, they behave as well as the related estimates obtained with the “optimal” bandwidth hT as long as ĥT/hT → 1 in probability. The results are obtained for dependent observations; some of them are also new for independent observations.

15.
The two-parameter lognormal distribution with density function f(y; γ, σ²) = [(2πσ²)^(1/2) y]⁻¹ exp[−(ln y − γ)²/(2σ²)], y > 0, is important as a failure-time model in life testing. In this paper, Bayesian lower bounds for the reliability function R(t; γ, σ²) = Φ[(γ − ln t)/σ] are obtained for two cases. First, it is assumed that γ is known and σ² has either an inverted gamma or “general uniform” prior distribution. Then, for the case that both γ and σ² are unknown, the normal-gamma prior and Jeffreys' vague prior are considered. Some Monte Carlo simulations are given to indicate some of the properties of the Bayesian lower bounds.
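The reliability function itself is straightforward to evaluate at plug-in values of (γ, σ²); the sketch below does only that and does not reproduce the Bayesian lower bounds:

```python
# Plug-in evaluation of R(t) = Phi[(gamma - ln t) / sigma] for the
# two-parameter lognormal failure-time model (illustration only).
import numpy as np
from scipy.stats import norm

def reliability(t, gamma, sigma2):
    """Probability that a lognormal(gamma, sigma2) lifetime exceeds t."""
    return norm.cdf((gamma - np.log(t)) / np.sqrt(sigma2))

print(reliability(t=100.0, gamma=5.0, sigma2=0.25))
```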

16.
Consider k independent observations Yi (i = 1, …, k) from two-parameter exponential populations Πi with location parameters μi and the same scale parameter σ. If the μi are ranked as μp(1) ≤ … ≤ μp(k), consider population Πp(1) as the “worst” population and Πp(k) as the “best” population (with some tagging so that p(1) and p(k) are well defined in the case of equalities). If the Yi are ranked as Yr(1) ≤ … ≤ Yr(k), we consider the procedure, “Select Πr(k) provided Yr(k) − Yr(k−1) is sufficiently large so that Πr(k) is demonstrably better than the other populations.” A similar procedure is studied for selecting the “demonstrably worst” population.

17.
Let μ̂(1) and μ̂(2) be location-equivariant estimators of an unknown location parameter μ. It is shown that the test for H0: μ ≤ μ0 versus HA: μ > μ0 that rejects H0 if μ̂(1) is large is uniformly more powerful than the one that rejects H0 if μ̂(2) is large if and only if μ̂(2) is “more dispersed” than μ̂(1). A similar result is obtained for tests on scale using the star-shaped ordering. Examples are given.

18.
Significance tests on coefficients of lower-order terms in polynomial regression models are affected by linear transformations. For this reason, a polynomial regression model that excludes hierarchically inferior predictors (i.e., lower-order terms) is considered not well formulated. Existing variable-selection algorithms do not take the hierarchy of predictors into account and often select as “best” a model that is not hierarchically well formulated. This article proposes a theory of the hierarchical ordering of the predictors of an arbitrary polynomial regression model in m variables, where m is an arbitrary positive integer. Ways of modifying existing algorithms to restrict their search to well-formulated models are suggested, and an algorithm that generates all possible well-formulated models is presented.

19.
“Precision” may be thought of either as the closeness with which a reported value approximates a “true” value, or as the number of digits carried in computations, depending on context. With suitable formal definitions, it is shown that the precision of a reported value is the difference between the precision with which computations are performed and the “loss” in precision due to the computations. Loss in precision is a function of the quantity computed and of the algorithm used to compute it; in the case of the usual “computing formula” for variances and covariances, it is shown that the expected loss of precision is log(kᵢkⱼ), where kᵢ, the reciprocal of the coefficient of variation, is the ratio of the mean to the standard deviation of the ith variable. When the precision of a reported value, the precision of computations, and the loss of precision due to the computations are expressed to the same base, all three quantities have the units of significant digits in the corresponding number system. Using this metric for “precision,” the expected precision of a computed (co)variance may be estimated in advance of the computation; for the data reported in the paper, the estimates agree closely with observed precision. Implications are drawn for the programming of general-purpose statistical programs, as well as for users of existing programs, in order to minimize the loss of precision resulting from characteristics of the data. A nomograph is provided to facilitate the estimation of precision in binary, decimal, and hexadecimal digits.
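A back-of-the-envelope application of the loss formula, reading the expected digits lost as log(kᵢkⱼ) in the base of the digit system (base 10 below); the numbers are illustrative:

```python
# Expected decimal digits lost when computing a covariance via the
# one-pass "computing formula", using k = mean / standard deviation.
import numpy as np

def expected_digits_lost(mean_i, sd_i, mean_j, sd_j, base=10):
    k_i, k_j = mean_i / sd_i, mean_j / sd_j
    return np.log(abs(k_i * k_j)) / np.log(base)

# Two variables whose means are large relative to their spread lose
# several digits: log10(1000 * 250) is about 5.4 decimal digits.
print(expected_digits_lost(1000.0, 1.0, 500.0, 2.0))
```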

20.
R-squared (R²) and adjusted R-squared (R²Adj) are sometimes viewed as statistics detached from any target parameter, and sometimes as estimators of the population multiple correlation. The latter interpretation is meaningful only if the explanatory variables are random. This article proposes an alternative perspective for the case where the x's are fixed. A new parameter is defined, in a similar fashion to the construction of R², but relying on the true parameters rather than their estimates. (The parameter definition also includes the fixed x values.) This parameter is referred to as the “parametric” coefficient of determination, and denoted by ρ²*. The proposed ρ²* remains stable when irrelevant variables are removed (or added), unlike the unadjusted R², which always goes up when variables, either relevant or not, are added to the model (and goes down when they are removed). The value of the traditional R²Adj may go up or down with added (or removed) variables, either relevant or not. It is shown that the unadjusted R² overestimates ρ²*, while the traditional R²Adj underestimates it. It is also shown that for simple linear regression the magnitude of the bias of R²Adj can be as high as the bias of the unadjusted R² (while their signs are opposite). Asymptotic convergence in probability of R²Adj to ρ²* is demonstrated. The effects of the model parameters on the bias of R² and R²Adj are characterized analytically and numerically. An alternative bi-adjusted estimator is presented and evaluated.
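A simulation sketch contrasting R², R²Adj, and a plug-in construction of ρ²* from the true coefficients and the fixed x values; the ρ²* formula below is our reading of the construction, not necessarily the paper's exact definition:

```python
# Simple-regression Monte Carlo: average R^2 sits above the plug-in
# rho^2*, average adjusted R^2 sits below it (sketch under assumptions).
import numpy as np

rng = np.random.default_rng(6)
n, sigma = 50, 2.0
x = np.linspace(0, 10, n)                       # fixed design
beta0, beta1 = 1.0, 0.5
mu = beta0 + beta1 * x                          # true mean at the fixed x's

signal = np.sum((mu - mu.mean()) ** 2)
rho2_star = signal / (signal + n * sigma ** 2)  # true-parameter analogue of R^2

r2s, r2adjs = [], []
for _ in range(2000):
    y = mu + sigma * rng.normal(size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    r2s.append(r2)
    r2adjs.append(1 - (1 - r2) * (n - 1) / (n - 2))
print(rho2_star, np.mean(r2s), np.mean(r2adjs))
```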
