Similar Literature (20 results)
1.
Abstract

The objective of this paper is to propose an efficient estimation procedure for a marginal mean regression model for longitudinal count data and to develop a hypothesis test for detecting the presence of overdispersion. We extend the matrix expansion idea of quadratic inference functions to the negative binomial regression framework, which accommodates both within-subject correlation and overdispersion. Theoretical and numerical results show that the proposed procedure yields an asymptotically more efficient estimator than one that ignores either the within-subject correlation or the overdispersion. When overdispersion is absent from the data, however, the proposed method may lose estimation efficiency in practice, whereas a Poisson-based regression model already fits the data sufficiently well. We therefore construct a hypothesis test that recommends the appropriate model for the analysis of correlated count data. Extensive simulation studies indicate that the proposed test identifies the correct model consistently. The proposed procedure is also applied to a transportation safety study, for which the negative binomial regression model is recommended.
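The quadratic-inference-function estimator and the overdispersion test proposed above are not reproduced here; purely as a rough, generic illustration of checking count data for overdispersion, the sketch below fits a Poisson GLM with statsmodels and inspects the Pearson dispersion statistic before switching to a negative binomial fit. The simulated data and the fixed alpha value are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical count data: one covariate, counts generated with extra-Poisson variation.
n = 500
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu))   # overdispersed counts with mean mu

# Fit a Poisson GLM and compute the Pearson dispersion statistic.
X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid
print(f"Pearson dispersion: {dispersion:.2f}")      # values well above 1 suggest overdispersion

# A value near 1 is consistent with the Poisson model; a much larger value
# suggests switching to a negative binomial specification (alpha fixed here for illustration).
nb_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(nb_fit.summary())
```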

2.
New generalized correlation measures of 2012, GMC(Y|X), use kernel regressions to overcome the linearity of Pearson's correlation coefficients. A new matrix of generalized correlation coefficients is such that when |r*ij| > |r*ji|, it is more likely that the column variable Xj is what Granger called the “instantaneous cause,” or what we call the “kernel cause,” of the row variable Xi. New partial correlations ameliorate confounding. Various examples and simulations support the robustness of the new causality measures. We include bootstrap inference and robustness checks based on the dependence between regressor and error and on out-of-sample forecasts. Data for 198 countries on nine development variables support growth policy over redistribution and Deaton's criticism of foreign aid. Potential applications include Big Data, since our R code is available in the online supplementary material.
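As a rough sketch of the idea behind an asymmetric, kernel-based correlation, the following computes a Nadaraya-Watson estimate of E[Y|X] and reports one minus the residual-variance ratio in each direction. This is only one plausible reading of GMC(Y|X), not the authors' exact estimator; the Gaussian kernel and fixed bandwidth are assumptions made for illustration.

```python
import numpy as np

def nw_fit(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[Y|X] evaluated at the sample points."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)            # Gaussian kernel
    return (weights @ y) / weights.sum(axis=1)

def gmc(y, x, bandwidth=0.3):
    """Share of Var(Y) explained by a kernel regression of Y on X."""
    resid = y - nw_fit(x, y, bandwidth)
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = x ** 2 + 0.5 * rng.normal(size=400)            # nonlinear dependence, near-zero Pearson r

print("GMC(Y|X):", round(gmc(y, x), 3))            # large: X predicts Y well
print("GMC(X|Y):", round(gmc(x, y), 3))            # smaller: Y predicts X less well
```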

3.
The gist of the quickest change-point detection problem is to detect the presence of a change in the statistical behavior of a series of sequentially made observations, and to do so in an optimal detection-speed-versus-“false-positive”-risk manner. When optimality is understood either in the generalized Bayesian sense or as defined in Shiryaev's multi-cyclic setup, the so-called Shiryaev–Roberts (SR) detection procedure is known to be the “best one can do,” provided, however, that the observations' pre- and post-change distributions are both fully specified. We consider a more realistic setup, viz. one where the post-change distribution is assumed known only up to a parameter, so that the latter may be misspecified. The question of interest is the sensitivity (or robustness) of the otherwise “best” SR procedure with respect to a possible misspecification of the post-change distribution parameter. To answer this question, we provide a case study where, in a specific Gaussian scenario, we allow the SR procedure to be “out of tune” with respect to the post-change distribution parameter, and numerically assess the effect of the “mistuning” on Shiryaev's (multi-cyclic) Stationary Average Detection Delay delivered by the SR procedure. The comprehensive quantitative robustness characterization of the SR procedure obtained in the study can be used to develop the respective theory as well as to provide a rationale for the practical design of the SR procedure. The overall qualitative conclusion of the study is an expected one: the SR procedure is less (more) robust for less (more) contrast changes and for lower (higher) levels of the false alarm risk.
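For reference, the Shiryaev-Roberts statistic admits a simple recursion. The sketch below implements it for a Gaussian mean-shift scenario (pre-change N(0,1), post-change N(theta,1)); the threshold and the assumed post-change parameter values are arbitrary choices for illustration, and passing a mistuned value shows the kind of effect the case study quantifies.

```python
import numpy as np

def shiryaev_roberts_stopping_time(x, theta, threshold):
    """Return the first time n at which the SR statistic R_n crosses the threshold.

    Pre-change: N(0, 1).  Post-change: N(theta, 1).
    Recursion: R_n = (1 + R_{n-1}) * exp(theta * x_n - theta**2 / 2), with R_0 = 0.
    """
    r = 0.0
    for n, obs in enumerate(x, start=1):
        r = (1.0 + r) * np.exp(theta * obs - 0.5 * theta ** 2)
        if r >= threshold:
            return n
    return None  # no alarm within the observed sample

rng = np.random.default_rng(2)
change_point, true_theta = 200, 0.5
x = np.concatenate([rng.normal(0.0, 1.0, change_point),
                    rng.normal(true_theta, 1.0, 300)])

# "Tuned" vs. "mistuned" post-change parameter.
for assumed_theta in (0.5, 1.0):
    alarm = shiryaev_roberts_stopping_time(x, assumed_theta, threshold=500.0)
    print(f"assumed theta={assumed_theta}: alarm at n={alarm}")
```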

4.
Qunfang Xu, Statistics, 2017, 51(6): 1280–1303
In this paper, semiparametric modelling for longitudinal data with an unstructured error process is considered. We propose a partially linear additive regression model for longitudinal data in which within-subject variances and covariances of the error process are described by unknown univariate and bivariate functions, respectively. We provide an estimating approach in which polynomial splines are used to approximate the additive nonparametric components, and the within-subject variance and covariance functions are estimated nonparametrically. Both the asymptotic normality of the resulting parametric component estimators and the optimal convergence rate of the resulting nonparametric component estimators are established. In addition, we develop a variable selection procedure to identify significant parametric and nonparametric components simultaneously. We show that the proposed SCAD penalty-based estimators of non-zero components have an oracle property. Some simulation studies are conducted to examine the finite-sample performance of the proposed estimation and variable selection procedures. A real data set is also analysed to demonstrate the usefulness of the proposed method.

5.
We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are “close” to the chosen “base model,” and compares the error rate of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure “Next-Door analysis” since it examines models “next” to the base model. It can be applied to supervised learning problems with ℓ1 penalization and to stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library. The Canadian Journal of Statistics 48: 447–470; 2020 © 2020 Statistical Society of Canada
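Next-Door analysis is distributed as an R companion to glmnet; purely as a rough Python analogue of the core idea (without the paper's significance assessment), the sketch below fits a cross-validated lasso, deletes each selected predictor in turn, refits, and compares cross-validated errors of the nearby models with the base model.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                      # three truly active predictors
y = X @ beta + rng.normal(size=n)

# Base model: lasso with cross-validated penalty.
base = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(base.coef_)
base_cv_error = -cross_val_score(LassoCV(cv=5, random_state=0), X, y,
                                 scoring="neg_mean_squared_error", cv=5).mean()

# "Nearby" models: delete each selected predictor and refit the lasso.
for j in selected:
    X_minus_j = np.delete(X, j, axis=1)
    cv_error = -cross_val_score(LassoCV(cv=5, random_state=0), X_minus_j, y,
                                scoring="neg_mean_squared_error", cv=5).mean()
    # A large increase suggests the predictor is "indispensable";
    # a comparable error suggests an acceptable alternative model.
    print(f"drop column {j}: CV error {cv_error:.3f} vs base {base_cv_error:.3f}")
```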

6.
A “spurious regression” is one in which the time-series variables are nonstationary and independent. It is well known that in this context the OLS parameter estimates and the R² converge to functionals of Brownian motions, the “t-ratios” diverge in distribution, and the Durbin–Watson statistic converges in probability to zero. We derive corresponding results for some common tests for the normality and homoskedasticity of the errors in a spurious regression.
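As a quick illustration of the setting (not of the paper's new results on normality and homoskedasticity tests), the sketch below regresses one independent random walk on another and prints the familiar symptoms: a sizeable R-squared, an inflated t-ratio, and a Durbin-Watson statistic near zero.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
T = 500

# Two independent random walks (nonstationary, unrelated series).
y = np.cumsum(rng.normal(size=T))
x = np.cumsum(rng.normal(size=T))

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"R^2            : {res.rsquared:.3f}")             # often far from zero
print(f"t-ratio (slope): {res.tvalues[1]:.2f}")            # typically 'significant'
print(f"Durbin-Watson  : {durbin_watson(res.resid):.3f}")  # close to zero
```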

7.
In the prospective study of a finely stratified population, one individual from each stratum is chosen at random for the “treatment” group and one for the “non-treatment” group. For each individual, the probability of failure is a logistic function of parameters designating the stratum, the treatment, and a covariate. Uniformly most powerful unbiased tests for the treatment effect are given. These tests are generally cumbersome, but if the covariate is dichotomous, the tests and confidence intervals are simple. Readily usable (but non-optimal) tests are also proposed for polytomous covariates and factorial designs. These are then adapted to retrospective studies (in which one “success” and one “failure” per stratum are sampled). Tests for retrospective studies with a continuous “treatment” score are also proposed.

8.
This article examines several goodness-of-fit measures in the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified and one new pseudo-R² measures are proposed. For the probit regression model, empirical comparisons are made between the different goodness-of-fit measures and the squared sample correlation coefficient of the observed response and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.
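As a small, generic illustration using standard quantities rather than the article's modified or new measures, the sketch below fits a probit model and reports McFadden's pseudo-R-squared together with the squared sample correlation between the observed response and the predicted probabilities.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
prob = norm.cdf(-0.3 + 1.2 * x)          # true probit success probabilities
y = rng.binomial(1, prob)

X = sm.add_constant(x)
res = sm.Probit(y, X).fit(disp=False)

# McFadden's pseudo-R^2 (1 - llf / llnull), provided directly by statsmodels.
print(f"McFadden pseudo-R^2 : {res.prsquared:.3f}")

# Squared sample correlation between observed response and predicted probabilities,
# the benchmark used in the empirical comparisons described above.
p_hat = res.predict(X)
r2_corr = np.corrcoef(y, p_hat)[0, 1] ** 2
print(f"Squared correlation : {r2_corr:.3f}")
```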

9.
We discuss and evaluate bootstrap algorithms for obtaining confidence intervals for parameters in Generalized Linear Models when the data are correlated. The methods are based on a stratified bootstrap and are suited to correlation occurring within “blocks” of data (e.g., individuals within a family, teeth within a mouth, etc.). Application of the intervals to data from a Dutch follow-up study on preterm infants shows the corroborative usefulness of the intervals, while the intervals are seen to be a powerful diagnostic in studying annual measles data. In a simulation study, we compare the coverage rates of the proposed intervals with existing methods (e.g., via Generalized Estimating Equations). In most cases, the bootstrap intervals are seen to perform better than current methods, and are produced in an automatic fashion, so that the user need not know (or have to guess) the dependence structure within a block.
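A minimal sketch of the underlying idea, not the paper's exact algorithms: resample whole blocks with replacement, refit a GLM on each bootstrap sample, and form a percentile confidence interval for a coefficient. The clustered Poisson data, the number of replicates, and the block structure are all hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_blocks, block_size = 50, 4

# Hypothetical clustered count data: a shared random effect induces within-block correlation.
block_effect = np.repeat(rng.normal(scale=0.5, size=n_blocks), block_size)
block_id = np.repeat(np.arange(n_blocks), block_size)
x = rng.normal(size=n_blocks * block_size)
y = rng.poisson(np.exp(0.2 + 0.6 * x + block_effect))

def fit_slope(xx, yy):
    return sm.GLM(yy, sm.add_constant(xx), family=sm.families.Poisson()).fit().params[1]

# Block bootstrap: resample blocks, keeping each block's observations intact.
boot_slopes = []
for _ in range(500):
    sampled_blocks = rng.choice(n_blocks, size=n_blocks, replace=True)
    idx = np.concatenate([np.flatnonzero(block_id == b) for b in sampled_blocks])
    boot_slopes.append(fit_slope(x[idx], y[idx]))

lo, hi = np.percentile(boot_slopes, [2.5, 97.5])
print(f"Slope estimate: {fit_slope(x, y):.3f}, 95% block-bootstrap CI: ({lo:.3f}, {hi:.3f})")
```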

10.
Shuo Li, Econometric Reviews, 2019, 38(10): 1202–1215
This paper develops a testing procedure to simultaneously check (i) the independence between the error and the regressor(s) and (ii) the parametric specification in nonlinear regression models. This procedure generalizes the existing work of Sen and Sen [“Testing Independence and Goodness-of-fit in Linear Models,” Biometrika, 101, 927–942] to a regression setting that allows any smooth parametric form of the regression function. We establish asymptotic theory for the test procedure under both conditionally homoscedastic and heteroscedastic errors. The derived tests are easily implementable, asymptotically normal, and consistent against a large class of fixed alternatives. In addition, the local power performance is investigated. To calibrate the finite-sample distribution of the test statistics, a smooth bootstrap procedure is proposed and found to work well in simulation studies. Finally, two real data examples are analyzed to illustrate the practical merit of our proposed tests.

11.
The central limit theorem implies that, provided an estimator fulfills certain weak conditions, its sampling distribution is approximately normal for reasonably large sample sizes. We propose a procedure to find out what a “reasonably large sample size” is. The procedure is based on the properties of Gini's mean difference decomposition. We show the results of implementations of the procedure on simulated datasets and on data from the German Socio-Economic Panel.
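As a small piece of the machinery, the sketch below computes Gini's mean difference for a sample; the decomposition-based rule for deciding what counts as a "reasonably large" sample is not reproduced here.

```python
import numpy as np

def gini_mean_difference(x):
    """Mean absolute difference over all distinct pairs of observations."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    abs_diffs = np.abs(x[:, None] - x[None, :])
    return abs_diffs.sum() / (n * (n - 1))

rng = np.random.default_rng(7)
sample = rng.exponential(scale=2.0, size=1000)
print(f"Gini's mean difference: {gini_mean_difference(sample):.3f}")
# For an exponential distribution with scale 2, the population value equals the scale (2.0).
```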

12.
An important problem in fitting local linear regression is the choice of the smoothing parameter. As the smoothing parameter becomes large, the estimator tends to a straight line, which is the least squares fit in the ordinary linear regression setting. This property may be used to assess the adequacy of a simple linear model. Motivated by Silverman's (1981) work in kernel density estimation, a suitable test statistic is the critical smoothing parameter at which the estimate changes from nonlinear to linear, although judging linearity versus nonlinearity requires a more precise criterion. We define the critical smoothing parameter through the approximate F-tests of Hastie and Tibshirani (1990). To assess significance, the “wild bootstrap” procedure is used to replicate the data, and the proportion of bootstrap samples that give a nonlinear estimate when using the critical bandwidth is taken as the p-value. Simulation results show that the critical smoothing test is useful in detecting a wide range of alternatives.
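The full critical-bandwidth test involves a bandwidth search and approximate F-tests; as a minimal sketch of just the wild bootstrap step it relies on, the following generates bootstrap responses from a fitted linear null model using Rademacher multipliers on its residuals. The data and the choice of the linear fit as the resampling base are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = 1.0 + 2.0 * x + 0.3 * np.sin(6 * x) + 0.2 * rng.normal(size=n)

# Fit the null (linear) model and keep its residuals.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
resid = y - fitted

# Wild bootstrap: multiply residuals by independent Rademacher (+/-1) weights.
def wild_bootstrap_sample(rng):
    signs = rng.choice([-1.0, 1.0], size=n)
    return fitted + signs * resid

y_star = wild_bootstrap_sample(rng)
# Each y_star would then be smoothed at the critical bandwidth, and the proportion of
# bootstrap samples yielding a nonlinear estimate gives the p-value described above.
```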

13.
This report is about the analysis of stochastic processes of the form R = S + N, where S is a “smooth” functional and N is noise. The proposed methods derive from the assumption that the observed R-values and the unobserved values of R, the assumed inferential objectives of the analysis, are linearly related through Taylor series expansions of observed about unobserved values. The expansion errors and all other a priori unspecified quantities have a joint multivariate normal distribution that expresses the prior uncertainty about their values. The results include interpolators, predictors, and derivative estimates, with credibility-interval estimates automatically generated in each case. An analysis of an acid-rain wet-deposition time series is included to indicate the efficacy of the proposed method. It was this problem that led to the methodological developments reported in this paper.

14.
The Akaike Information Criterion (AIC) is developed for selecting the variables of the nested error regression model, where an unobservable random effect is present. Using the idea of decomposing the likelihood into the two parts of the “within” and “between” analysis of variance, we derive the AIC when the number of groups is large and the ratio of the variances of the random effects and the random errors is an unknown parameter. The proposed AIC is compared, using simulation, with Mallows' Cp, Akaike's AIC, and Sugiura's exact AIC. Based on the rates of selecting the true model, it is shown that the proposed AIC performs better.

15.
A class of “optimal” U-statistics-type nonparametric test statistics is proposed for the one-sample location problem by considering a kernel depending on a constant a and all possible (distinct) subsamples of size two from a sample of n independent and identically distributed observations. The “optimal” choice of a is determined by the underlying distribution. The proposed class includes the Sign and the modified Wilcoxon signed-rank statistics as special cases. It is shown that any “optimal” member of the class performs better, in terms of Pitman efficiency, than the Sign and Wilcoxon signed-rank statistics. The effect on Pitman efficiency of deviations of the chosen a from the “optimal” a is also examined. A Hodges–Lehmann-type point estimator of the location parameter corresponding to the proposed “optimal” test statistics is also defined and studied in this paper.

16.
This paper describes a computer program, GTEST, for designing group testing experiments for classifying each member of a population of items as “good” or “defective.” The outcome of a test on a group of items is either “negative” (if all items in the group are good) or “positive” (if at least one of the items is defective, but it is not known which). GTEST is based on a Bayesian approach. At each stage, it attempts to (nearly) maximize the expected reduction in the “entropy,” which is a quantitative measure of the amount of uncertainty about the state of the items. The user controls the procedure through specification of the prior probabilities of being defective, restrictions on the construction of the test group, and priorities that are assigned to the items. The nominal prior probabilities can be modified adaptively, to reduce the sensitivity of the procedure to the proportion of defectives in the population.
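GTEST's exact entropy computation is not reproduced here. As a rough sketch of the scoring idea, the following evaluates the expected reduction in the sum of marginal binary entropies from testing a candidate group, assuming independent items; this marginal-entropy criterion is an approximation introduced for illustration, not the program's exact objective.

```python
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def expected_entropy_after_test(prior, group):
    """Expected sum of marginal item entropies after testing the items in `group`.

    Assumes items are independent a priori; items outside the group are unchanged.
    """
    prior = np.asarray(prior, dtype=float)
    p_negative = np.prod(1.0 - prior[group])        # probability all group items are good

    # Negative outcome: every item in the group is known to be good (entropy 0).
    post_neg = prior.copy()
    post_neg[group] = 0.0

    # Positive outcome: each group item is defective with probability p_i / P(positive).
    post_pos = prior.copy()
    post_pos[group] = prior[group] / (1.0 - p_negative)

    return (p_negative * binary_entropy(post_neg).sum()
            + (1.0 - p_negative) * binary_entropy(post_pos).sum())

prior = np.array([0.02, 0.05, 0.10, 0.20, 0.01])
before = binary_entropy(prior).sum()
for group in ([0, 1], [2, 3], [0, 1, 2, 3]):
    after = expected_entropy_after_test(prior, group)
    print(f"group {group}: expected entropy reduction {before - after:.3f} bits")
```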

17.
Moran's I statistic [Moran (1950), ‘Notes on Continuous Stochastic Phenomena’, Biometrika, 37, 17–23] has been widely used to evaluate spatial autocorrelation. This paper is concerned with the Moran's I-induced testing procedure in residual analysis. We begin by exploring the Moran's I statistic in both its original and extended forms, analytically and numerically. We demonstrate that the magnitude of the statistic in general depends not only on the underlying correlation but also on certain heterogeneity in the individual observations. One should therefore exercise caution when interpreting the outcome on correlation from the Moran's I-induced procedure. On the other hand, the effect of heterogeneity in the observations on Moran's I enables a regression model checking procedure based on the residuals. This novel application of Moran's I is justified by simulation and illustrated by an analysis of wildfire records from Alberta, Canada.
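As a minimal sketch of the statistic in its original form (the paper's extended form and the heterogeneity analysis are not reproduced), the following computes Moran's I for a vector of residuals with a user-supplied spatial weight matrix; the toy one-dimensional neighbourhood structure is an assumption for illustration.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I for a vector of values and a spatial weight matrix W (zero diagonal)."""
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(9)

# Hypothetical setting: points on a line, neighbours within distance 1 get weight 1.
coords = np.arange(30, dtype=float)
W = (np.abs(coords[:, None] - coords[None, :]) <= 1.0).astype(float)
np.fill_diagonal(W, 0.0)

# Residuals with positive spatial correlation: a smooth signal plus noise.
residuals = np.sin(coords / 5.0) + 0.3 * rng.normal(size=len(coords))
print(f"Moran's I: {morans_i(residuals, W):.3f}")   # well above the null expectation -1/(n-1)
```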

18.
An iterative, self-weighting procedure is presented for fitting straight lines to data with heteroscedastic error variances in the response variable. The error variances are assumed to be unknown, even relative to each other. The procedure is compared with the “resistant line” method advocated by Emerson and Hoaglin (1983), using extensive Monte Carlo calculations. The proposed method is simple and easily automated, and it gives parameter estimates with smaller variance (higher efficiency) than those resulting from the resistant line technique. A BASIC program to perform the self-weighting fit is given in an appendix.
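The abstract does not specify the weighting scheme, so the following is only a generic iteratively reweighted least-squares fit of a straight line, with weights taken as inverses of floored squared residuals; that choice is an assumption made here for illustration and is not the authors' procedure.

```python
import numpy as np

def self_weighting_line_fit(x, y, n_iter=10, eps=1e-8):
    """Iteratively reweighted straight-line fit for heteroscedastic responses."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(n_iter):
        W = np.diag(w)
        coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted least squares step
        resid = y - X @ coef
        # Crude per-point variance estimate: squared residual, floored to avoid huge weights.
        w = 1.0 / np.maximum(resid ** 2, 0.1 * np.median(resid ** 2) + eps)
    return coef

rng = np.random.default_rng(10)
x = np.linspace(0, 10, 200)
sigma = 0.2 + 0.3 * x                      # error standard deviation grows with x
y = 1.0 + 0.5 * x + sigma * rng.normal(size=len(x))

intercept, slope = self_weighting_line_fit(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```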

19.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
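The full algorithm couples wavelet transforms with a genetic-algorithm subset search; as a small sketch of the first two stages only (a discrete wavelet transform of each spectrum followed by screening out coefficients with low correlation to the response), the following uses the PyWavelets package on simulated spectra. The wavelet, decomposition level, and correlation cutoff are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(11)
n_samples, n_wavelengths = 40, 256

# Hypothetical spectra: smooth curves whose bump height drives the response.
t = np.linspace(0, 1, n_wavelengths)
bump_height = rng.uniform(0.5, 2.0, size=n_samples)
spectra = (bump_height[:, None] * np.exp(-((t - 0.4) ** 2) / 0.01)
           + 0.05 * rng.normal(size=(n_samples, n_wavelengths)))
response = 3.0 * bump_height + 0.1 * rng.normal(size=n_samples)

# Stage 1: discrete wavelet transform of each spectrum, coefficients concatenated.
def dwt_features(spectrum, wavelet="db4", level=4):
    coeffs = pywt.wavedec(spectrum, wavelet, level=level)
    return np.concatenate(coeffs)

features = np.array([dwt_features(s) for s in spectra])

# Stage 2: drop coefficients with low absolute correlation to the response.
corrs = np.array([abs(np.corrcoef(features[:, j], response)[0, 1])
                  for j in range(features.shape[1])])
keep = np.flatnonzero(corrs > 0.3)
print(f"{features.shape[1]} wavelet coefficients, {len(keep)} retained after screening")
# The retained columns would then feed a genetic-algorithm subset search (not shown here).
```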

20.
The usual one-sided Kolmogorov–Smirnov distance is generalized to obtain an improved lower confidence region for the extreme left tail of the reliability function, based on k observations in a “k out of n censored” plan. Finite-sample and asymptotic critical values necessary for implementation are given. Two numerical comparisons with existing parametric procedures, for the cases of complete and censored samples, demonstrate the applicability of the proposed nonparametric procedure.
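The generalized distance and its censored-sample critical values are not reproduced here. As a basic illustration of the idea for a complete sample, the sketch below forms a one-sided lower confidence band for the reliability function by shifting the empirical survival function down by an asymptotic one-sided Kolmogorov-Smirnov critical value, sqrt(ln(1/alpha)/(2n)).

```python
import numpy as np

rng = np.random.default_rng(12)
n, alpha = 60, 0.05
lifetimes = rng.weibull(1.5, size=n) * 100.0         # hypothetical failure times

# Empirical reliability (survival) function evaluated on a grid.
grid = np.linspace(0, lifetimes.max(), 200)
r_hat = np.array([(lifetimes > t).mean() for t in grid])

# One-sided critical value (asymptotic): P(sup_t (R_hat - R) > d) <= alpha.
d_alpha = np.sqrt(np.log(1.0 / alpha) / (2.0 * n))
r_lower = np.clip(r_hat - d_alpha, 0.0, 1.0)

t0 = 20.0
i = np.searchsorted(grid, t0)
print(f"R_hat({t0}) = {r_hat[i]:.3f}, 95% lower confidence bound = {r_lower[i]:.3f}")
```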
