Similar Literature (20 results)
1.
Abstract

The objective of this paper is to propose an efficient estimation procedure for a marginal mean regression model for longitudinal count data and to develop a hypothesis test for detecting the presence of overdispersion. We extend the matrix expansion idea of quadratic inference functions to the negative binomial regression framework, which accommodates both within-subject correlation and overdispersion. Theoretical and numerical results show that the proposed procedure yields an asymptotically more efficient estimator than one that ignores either the within-subject correlation or the overdispersion. When overdispersion is absent from the data, however, the proposed method may lose estimation efficiency in practice, whereas a Poisson-based regression model already fits the data sufficiently well. We therefore construct a hypothesis test that recommends the appropriate model for the analysis of correlated count data. Extensive simulation studies indicate that the proposed test identifies the correct model consistently. The proposed procedure is also applied to a transportation safety study, for which the negative binomial regression model is recommended.
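The quadratic-inference-function estimator and the overdispersion test proposed above are not reproduced here; purely as a rough, generic illustration of checking count data for overdispersion, the sketch below fits a Poisson GLM with statsmodels and inspects the Pearson dispersion statistic before switching to a negative binomial fit. The simulated data and the fixed alpha value are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical count data: one covariate, counts generated with extra-Poisson variation.
n = 500
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu))   # overdispersed counts with mean mu

# Fit a Poisson GLM and compute the Pearson dispersion statistic.
X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid
print(f"Pearson dispersion: {dispersion:.2f}")      # values well above 1 suggest overdispersion

# A value near 1 is consistent with the Poisson model; a much larger value
# suggests switching to a negative binomial specification (alpha fixed here for illustration).
nb_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(nb_fit.summary())
```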

2.
New generalized correlation measures of 2012, GMC(Y|X), use kernel regressions to overcome the linearity of Pearson's correlation coefficients. A new matrix of generalized correlation coefficients is such that when |r*ij| > |r*ji|, it is more likely that the column variable Xj is what Granger called the “instantaneous cause,” or what we call the “kernel cause,” of the row variable Xi. New partial correlations ameliorate confounding. Various examples and simulations support the robustness of the new causality measures. We include bootstrap inference and robustness checks based on the dependence between regressor and error and on out-of-sample forecasts. Data for 198 countries on nine development variables support growth policy over redistribution and Deaton's criticism of foreign aid. Potential applications include Big Data, since our R code is available in the online supplementary material.
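As a rough sketch of the idea behind an asymmetric, kernel-based correlation, the following computes a Nadaraya-Watson estimate of E[Y|X] and reports one minus the residual-variance ratio in each direction. This is only one plausible reading of GMC(Y|X), not the authors' exact estimator; the Gaussian kernel and fixed bandwidth are assumptions made for illustration.

```python
import numpy as np

def nw_fit(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[Y|X] evaluated at the sample points."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)            # Gaussian kernel
    return (weights @ y) / weights.sum(axis=1)

def gmc(y, x, bandwidth=0.3):
    """Share of Var(Y) explained by a kernel regression of Y on X."""
    resid = y - nw_fit(x, y, bandwidth)
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = x ** 2 + 0.5 * rng.normal(size=400)            # nonlinear dependence, near-zero Pearson r

print("GMC(Y|X):", round(gmc(y, x), 3))            # large: X predicts Y well
print("GMC(X|Y):", round(gmc(x, y), 3))            # smaller: Y predicts X less well
```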

3.
The gist of the quickest change-point detection problem is to detect the presence of a change in the statistical behavior of a series of sequentially made observations, and to do so in an optimal detection-speed-versus-“false-positive”-risk manner. When optimality is understood either in the generalized Bayesian sense or as defined in Shiryaev's multi-cyclic setup, the so-called Shiryaev–Roberts (SR) detection procedure is known to be the “best one can do,” provided, however, that the observations' pre- and post-change distributions are both fully specified. We consider a more realistic setup, viz. one where the post-change distribution is assumed known only up to a parameter, so that the latter may be misspecified. The question of interest is the sensitivity (or robustness) of the otherwise “best” SR procedure with respect to a possible misspecification of the post-change distribution parameter. To answer this question, we provide a case study where, in a specific Gaussian scenario, we allow the SR procedure to be “out of tune” with respect to the post-change distribution parameter, and numerically assess the effect of the “mistuning” on Shiryaev's (multi-cyclic) Stationary Average Detection Delay delivered by the SR procedure. The comprehensive quantitative robustness characterization of the SR procedure obtained in the study can be used to develop the respective theory as well as to provide a rationale for the practical design of the SR procedure. The overall qualitative conclusion of the study is an expected one: the SR procedure is less (more) robust for less (more) contrast changes and for lower (higher) levels of the false alarm risk.
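For reference, the Shiryaev-Roberts statistic admits a simple recursion. The sketch below implements it for a Gaussian mean-shift scenario (pre-change N(0,1), post-change N(theta,1)); the threshold and the assumed post-change parameter values are arbitrary choices for illustration, and passing a mistuned value shows the kind of effect the case study quantifies.

```python
import numpy as np

def shiryaev_roberts_stopping_time(x, theta, threshold):
    """Return the first time n at which the SR statistic R_n crosses the threshold.

    Pre-change: N(0, 1).  Post-change: N(theta, 1).
    Recursion: R_n = (1 + R_{n-1}) * exp(theta * x_n - theta**2 / 2), with R_0 = 0.
    """
    r = 0.0
    for n, obs in enumerate(x, start=1):
        r = (1.0 + r) * np.exp(theta * obs - 0.5 * theta ** 2)
        if r >= threshold:
            return n
    return None  # no alarm within the observed sample

rng = np.random.default_rng(2)
change_point, true_theta = 200, 0.5
x = np.concatenate([rng.normal(0.0, 1.0, change_point),
                    rng.normal(true_theta, 1.0, 300)])

# "Tuned" vs. "mistuned" post-change parameter.
for assumed_theta in (0.5, 1.0):
    alarm = shiryaev_roberts_stopping_time(x, assumed_theta, threshold=500.0)
    print(f"assumed theta={assumed_theta}: alarm at n={alarm}")
```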

4.
Qunfang Xu, Statistics, 2017, 51(6): 1280–1303
In this paper, semiparametric modelling for longitudinal data with an unstructured error process is considered. We propose a partially linear additive regression model for longitudinal data in which within-subject variances and covariances of the error process are described by unknown univariate and bivariate functions, respectively. We provide an estimating approach in which polynomial splines are used to approximate the additive nonparametric components, and the within-subject variance and covariance functions are estimated nonparametrically. Both the asymptotic normality of the resulting parametric component estimators and the optimal convergence rate of the resulting nonparametric component estimators are established. In addition, we develop a variable selection procedure to identify significant parametric and nonparametric components simultaneously. We show that the proposed SCAD penalty-based estimators of non-zero components have an oracle property. Some simulation studies are conducted to examine the finite-sample performance of the proposed estimation and variable selection procedures. A real data set is also analysed to demonstrate the usefulness of the proposed method.

5.
We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are “close” to the chosen “base model,” and compares the error rate of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure “Next-Door analysis” since it examines models “next” to the base model. It can be applied to supervised learning problems with ℓ1 penalization and to stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library. The Canadian Journal of Statistics 48: 447–470; 2020 © 2020 Statistical Society of Canada
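Next-Door analysis is distributed as an R companion to glmnet; purely as a rough Python analogue of the core idea (without the paper's significance assessment), the sketch below fits a cross-validated lasso, deletes each selected predictor in turn, refits, and compares cross-validated errors of the nearby models with the base model.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                      # three truly active predictors
y = X @ beta + rng.normal(size=n)

# Base model: lasso with cross-validated penalty.
base = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(base.coef_)
base_cv_error = -cross_val_score(LassoCV(cv=5, random_state=0), X, y,
                                 scoring="neg_mean_squared_error", cv=5).mean()

# "Nearby" models: delete each selected predictor and refit the lasso.
for j in selected:
    X_minus_j = np.delete(X, j, axis=1)
    cv_error = -cross_val_score(LassoCV(cv=5, random_state=0), X_minus_j, y,
                                scoring="neg_mean_squared_error", cv=5).mean()
    # A large increase suggests the predictor is "indispensable";
    # a comparable error suggests an acceptable alternative model.
    print(f"drop column {j}: CV error {cv_error:.3f} vs base {base_cv_error:.3f}")
```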

6.
A “spurious regression” is one in which the time-series variables are nonstationary and independent. It is well known that in this context the OLS parameter estimates and the R² converge to functionals of Brownian motions, the “t-ratios” diverge in distribution, and the Durbin–Watson statistic converges in probability to zero. We derive corresponding results for some common tests for the normality and homoskedasticity of the errors in a spurious regression.
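As a quick illustration of the setting (not of the paper's new results on normality and homoskedasticity tests), the sketch below regresses one independent random walk on another and prints the familiar symptoms: a sizeable R-squared, an inflated t-ratio, and a Durbin-Watson statistic near zero.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
T = 500

# Two independent random walks (nonstationary, unrelated series).
y = np.cumsum(rng.normal(size=T))
x = np.cumsum(rng.normal(size=T))

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"R^2            : {res.rsquared:.3f}")             # often far from zero
print(f"t-ratio (slope): {res.tvalues[1]:.2f}")            # typically 'significant'
print(f"Durbin-Watson  : {durbin_watson(res.resid):.3f}")  # close to zero
```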

7.
In the prospective study of a finely stratified population, one individual from each stratum is chosen at random for the “treatment” group and one for the “non-treatment” group. For each individual, the probability of failure is a logistic function of parameters designating the stratum, the treatment, and a covariate. Uniformly most powerful unbiased tests for the treatment effect are given. These tests are generally cumbersome, but if the covariate is dichotomous, the tests and confidence intervals are simple. Readily usable (but non-optimal) tests are also proposed for polytomous covariates and factorial designs. These are then adapted to retrospective studies (in which one “success” and one “failure” per stratum are sampled). Tests for retrospective studies with a continuous “treatment” score are also proposed.

8.
This article examines several goodness-of-fit measures in the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified and one new pseudo-R² measures are proposed. For the probit regression model, empirical comparisons are made between the different goodness-of-fit measures and the squared sample correlation coefficient of the observed response and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.
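As a small, generic illustration using standard quantities rather than the article's modified or new measures, the sketch below fits a probit model and reports McFadden's pseudo-R-squared together with the squared sample correlation between the observed response and the predicted probabilities.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
prob = norm.cdf(-0.3 + 1.2 * x)          # true probit success probabilities
y = rng.binomial(1, prob)

X = sm.add_constant(x)
res = sm.Probit(y, X).fit(disp=False)

# McFadden's pseudo-R^2 (1 - llf / llnull), provided directly by statsmodels.
print(f"McFadden pseudo-R^2 : {res.prsquared:.3f}")

# Squared sample correlation between observed response and predicted probabilities,
# the benchmark used in the empirical comparisons described above.
p_hat = res.predict(X)
r2_corr = np.corrcoef(y, p_hat)[0, 1] ** 2
print(f"Squared correlation : {r2_corr:.3f}")
```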

9.
We discuss and evaluate bootstrap algorithms for obtaining confidence intervals for parameters in Generalized Linear Models when the data are correlated. The methods are based on a stratified bootstrap and are suited to correlation occurring within “blocks” of data (e.g., individuals within a family, teeth within a mouth, etc.). Application of the intervals to data from a Dutch follow-up study on preterm infants shows the corroborative usefulness of the intervals, while the intervals are seen to be a powerful diagnostic in studying annual measles data. In a simulation study, we compare the coverage rates of the proposed intervals with existing methods (e.g., via Generalized Estimating Equations). In most cases, the bootstrap intervals are seen to perform better than current methods, and are produced in an automatic fashion, so that the user need not know (or have to guess) the dependence structure within a block.
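A minimal sketch of the underlying idea, not the paper's exact algorithms: resample whole blocks with replacement, refit a GLM on each bootstrap sample, and form a percentile confidence interval for a coefficient. The clustered Poisson data, the number of replicates, and the block structure are all hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_blocks, block_size = 50, 4

# Hypothetical clustered count data: a shared random effect induces within-block correlation.
block_effect = np.repeat(rng.normal(scale=0.5, size=n_blocks), block_size)
block_id = np.repeat(np.arange(n_blocks), block_size)
x = rng.normal(size=n_blocks * block_size)
y = rng.poisson(np.exp(0.2 + 0.6 * x + block_effect))

def fit_slope(xx, yy):
    return sm.GLM(yy, sm.add_constant(xx), family=sm.families.Poisson()).fit().params[1]

# Block bootstrap: resample blocks, keeping each block's observations intact.
boot_slopes = []
for _ in range(500):
    sampled_blocks = rng.choice(n_blocks, size=n_blocks, replace=True)
    idx = np.concatenate([np.flatnonzero(block_id == b) for b in sampled_blocks])
    boot_slopes.append(fit_slope(x[idx], y[idx]))

lo, hi = np.percentile(boot_slopes, [2.5, 97.5])
print(f"Slope estimate: {fit_slope(x, y):.3f}, 95% block-bootstrap CI: ({lo:.3f}, {hi:.3f})")
```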

10.
Shuo Li, Econometric Reviews, 2019, 38(10): 1202–1215
This paper develops a testing procedure to simultaneously check (i) the independence between the error and the regressor(s) and (ii) the parametric specification in nonlinear regression models. This procedure generalizes the existing work of Sen and Sen [“Testing Independence and Goodness-of-fit in Linear Models,” Biometrika, 101, 927–942] to a regression setting that allows any smooth parametric form of the regression function. We establish asymptotic theory for the test procedure under both conditionally homoscedastic and heteroscedastic errors. The derived tests are easily implementable, asymptotically normal, and consistent against a large class of fixed alternatives. In addition, the local power performance is investigated. To calibrate the finite-sample distribution of the test statistics, a smooth bootstrap procedure is proposed and found to work well in simulation studies. Finally, two real data examples are analyzed to illustrate the practical merit of our proposed tests.

11.
The central limit theorem implies that, provided an estimator fulfills certain weak conditions, its sampling distribution is approximately normal for reasonably large sample sizes. We propose a procedure to find out what a “reasonably large sample size” is. The procedure is based on the properties of Gini's mean difference decomposition. We show the results of implementations of the procedure on simulated datasets and on data from the German Socio-Economic Panel.
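As a small piece of the machinery, the sketch below computes Gini's mean difference for a sample; the decomposition-based rule for deciding what counts as a "reasonably large" sample is not reproduced here.

```python
import numpy as np

def gini_mean_difference(x):
    """Mean absolute difference over all distinct pairs of observations."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    abs_diffs = np.abs(x[:, None] - x[None, :])
    return abs_diffs.sum() / (n * (n - 1))

rng = np.random.default_rng(7)
sample = rng.exponential(scale=2.0, size=1000)
print(f"Gini's mean difference: {gini_mean_difference(sample):.3f}")
# For an exponential distribution with scale 2, the population value equals the scale (2.0).
```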

12.
An important problem in fitting local linear regression is the choice of the smoothing parameter. As the smoothing parameter becomes large, the estimator tends to a straight line, which is the least squares fit in the ordinary linear regression setting. This property may be used to assess the adequacy of a simple linear model. Motivated by Silverman's (1981) work in kernel density estimation, a suitable test statistic is the critical smoothing parameter at which the estimate changes from nonlinear to linear, although judging linearity versus nonlinearity requires a more precise criterion. We define the critical smoothing parameter through the approximate F-tests of Hastie and Tibshirani (1990). To assess significance, the “wild bootstrap” procedure is used to replicate the data, and the proportion of bootstrap samples that give a nonlinear estimate when using the critical bandwidth is taken as the p-value. Simulation results show that the critical smoothing test is useful in detecting a wide range of alternatives.
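The full critical-bandwidth test involves a bandwidth search and approximate F-tests; as a minimal sketch of just the wild bootstrap step it relies on, the following generates bootstrap responses from a fitted linear null model using Rademacher multipliers on its residuals. The data and the choice of the linear fit as the resampling base are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = 1.0 + 2.0 * x + 0.3 * np.sin(6 * x) + 0.2 * rng.normal(size=n)

# Fit the null (linear) model and keep its residuals.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
resid = y - fitted

# Wild bootstrap: multiply residuals by independent Rademacher (+/-1) weights.
def wild_bootstrap_sample(rng):
    signs = rng.choice([-1.0, 1.0], size=n)
    return fitted + signs * resid

y_star = wild_bootstrap_sample(rng)
# Each y_star would then be smoothed at the critical bandwidth, and the proportion of
# bootstrap samples yielding a nonlinear estimate gives the p-value described above.
```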

13.
This report is about the analysis of stochastic processes of the form R = S + N, where S is a “smooth” functional and N is noise. The proposed methods derive from the assumption that the observed R-values and the unobserved values of R, the assumed inferential objectives of the analysis, are linearly related through Taylor series expansions of observed about unobserved values. The expansion errors and all other a priori unspecified quantities have a joint multivariate normal distribution that expresses the prior uncertainty about their values. The results include interpolators, predictors, and derivative estimates, with credibility-interval estimates automatically generated in each case. An analysis of an acid-rain wet-deposition time series is included to indicate the efficacy of the proposed method. It was this problem that led to the methodological developments reported in this paper.

14.
The Akaike Information Criterion (AIC) is developed for selecting the variables of the nested error regression model, where an unobservable random effect is present. Using the idea of decomposing the likelihood into the two parts of the “within” and “between” analysis of variance, we derive the AIC when the number of groups is large and the ratio of the variances of the random effects and the random errors is an unknown parameter. The proposed AIC is compared, using simulation, with Mallows' Cp, Akaike's AIC, and Sugiura's exact AIC. Based on the rates of selecting the true model, it is shown that the proposed AIC performs better.

15.
A class of “optimal” U-statistics-type nonparametric test statistics is proposed for the one-sample location problem by considering a kernel depending on a constant a and all possible (distinct) subsamples of size two from a sample of n independent and identically distributed observations. The “optimal” choice of a is determined by the underlying distribution. The proposed class includes the Sign and the modified Wilcoxon signed-rank statistics as special cases. It is shown that any “optimal” member of the class performs better, in terms of Pitman efficiency, than the Sign and Wilcoxon signed-rank statistics. The effect on Pitman efficiency of deviations of the chosen a from the “optimal” a is also examined. A Hodges–Lehmann-type point estimator of the location parameter corresponding to the proposed “optimal” test statistics is also defined and studied in this paper.

16.
This paper describes a computer program, GTEST, for designing group testing experiments for classifying each member of a population of items as “good” or “defective.” The outcome of a test on a group of items is either “negative” (if all items in the group are good) or “positive” (if at least one of the items is defective, but it is not known which). GTEST is based on a Bayesian approach. At each stage, it attempts to (nearly) maximize the expected reduction in the “entropy,” which is a quantitative measure of the amount of uncertainty about the state of the items. The user controls the procedure through specification of the prior probabilities of being defective, restrictions on the construction of the test group, and priorities that are assigned to the items. The nominal prior probabilities can be modified adaptively, to reduce the sensitivity of the procedure to the proportion of defectives in the population.
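GTEST's exact entropy computation is not reproduced here. As a rough sketch of the scoring idea, the following evaluates the expected reduction in the sum of marginal binary entropies from testing a candidate group, assuming independent items; this marginal-entropy criterion is an approximation introduced for illustration, not the program's exact objective.

```python
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def expected_entropy_after_test(prior, group):
    """Expected sum of marginal item entropies after testing the items in `group`.

    Assumes items are independent a priori; items outside the group are unchanged.
    """
    prior = np.asarray(prior, dtype=float)
    p_negative = np.prod(1.0 - prior[group])        # probability all group items are good

    # Negative outcome: every item in the group is known to be good (entropy 0).
    post_neg = prior.copy()
    post_neg[group] = 0.0

    # Positive outcome: each group item is defective with probability p_i / P(positive).
    post_pos = prior.copy()
    post_pos[group] = prior[group] / (1.0 - p_negative)

    return (p_negative * binary_entropy(post_neg).sum()
            + (1.0 - p_negative) * binary_entropy(post_pos).sum())

prior = np.array([0.02, 0.05, 0.10, 0.20, 0.01])
before = binary_entropy(prior).sum()
for group in ([0, 1], [2, 3], [0, 1, 2, 3]):
    after = expected_entropy_after_test(prior, group)
    print(f"group {group}: expected entropy reduction {before - after:.3f} bits")
```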

17.
Moran's I statistic [Moran (1950), ‘Notes on Continuous Stochastic Phenomena’, Biometrika, 37, 17–23] has been widely used to evaluate spatial autocorrelation. This paper is concerned with the Moran's I-induced testing procedure in residual analysis. We begin by exploring the Moran's I statistic in both its original and extended forms, analytically and numerically. We demonstrate that the magnitude of the statistic in general depends not only on the underlying correlation but also on certain heterogeneity in the individual observations. One should therefore exercise caution when interpreting the outcome on correlation from the Moran's I-induced procedure. On the other hand, the effect of heterogeneity in the observations on Moran's I enables a regression model checking procedure based on the residuals. This novel application of Moran's I is justified by simulation and illustrated by an analysis of wildfire records from Alberta, Canada.
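As a minimal sketch of the statistic in its original form (the paper's extended form and the heterogeneity analysis are not reproduced), the following computes Moran's I for a vector of residuals with a user-supplied spatial weight matrix; the toy one-dimensional neighbourhood structure is an assumption for illustration.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I for a vector of values and a spatial weight matrix W (zero diagonal)."""
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(9)

# Hypothetical setting: points on a line, neighbours within distance 1 get weight 1.
coords = np.arange(30, dtype=float)
W = (np.abs(coords[:, None] - coords[None, :]) <= 1.0).astype(float)
np.fill_diagonal(W, 0.0)

# Residuals with positive spatial correlation: a smooth signal plus noise.
residuals = np.sin(coords / 5.0) + 0.3 * rng.normal(size=len(coords))
print(f"Moran's I: {morans_i(residuals, W):.3f}")   # well above the null expectation -1/(n-1)
```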

18.
An iterative, self-weighting procedure is presented for fitting straight lines to data with heteroscedastic error variances in the response variable. The error variances are assumed to be unknown, even relative to each other. The procedure is compared with the “resistant line” method advocated by Emerson and Hoaglin (1983), using extensive Monte Carlo calculations. The proposed method is simple and easily automated, and it gives parameter estimates with smaller variance (higher efficiency) than those resulting from the resistant line technique. A BASIC program to perform the self-weighting fit is given in an appendix.
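The abstract does not specify the weighting scheme, so the following is only a generic iteratively reweighted least-squares fit of a straight line, with weights taken as inverses of floored squared residuals; that choice is an assumption made here for illustration and is not the authors' procedure.

```python
import numpy as np

def self_weighting_line_fit(x, y, n_iter=10, eps=1e-8):
    """Iteratively reweighted straight-line fit for heteroscedastic responses."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(n_iter):
        W = np.diag(w)
        coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted least squares step
        resid = y - X @ coef
        # Crude per-point variance estimate: squared residual, floored to avoid huge weights.
        w = 1.0 / np.maximum(resid ** 2, 0.1 * np.median(resid ** 2) + eps)
    return coef

rng = np.random.default_rng(10)
x = np.linspace(0, 10, 200)
sigma = 0.2 + 0.3 * x                      # error standard deviation grows with x
y = 1.0 + 0.5 * x + sigma * rng.normal(size=len(x))

intercept, slope = self_weighting_line_fit(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```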

19.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
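The full algorithm couples wavelet transforms with a genetic-algorithm subset search; as a small sketch of the first two stages only (a discrete wavelet transform of each spectrum followed by screening out coefficients with low correlation to the response), the following uses the PyWavelets package on simulated spectra. The wavelet, decomposition level, and correlation cutoff are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(11)
n_samples, n_wavelengths = 40, 256

# Hypothetical spectra: smooth curves whose bump height drives the response.
t = np.linspace(0, 1, n_wavelengths)
bump_height = rng.uniform(0.5, 2.0, size=n_samples)
spectra = (bump_height[:, None] * np.exp(-((t - 0.4) ** 2) / 0.01)
           + 0.05 * rng.normal(size=(n_samples, n_wavelengths)))
response = 3.0 * bump_height + 0.1 * rng.normal(size=n_samples)

# Stage 1: discrete wavelet transform of each spectrum, coefficients concatenated.
def dwt_features(spectrum, wavelet="db4", level=4):
    coeffs = pywt.wavedec(spectrum, wavelet, level=level)
    return np.concatenate(coeffs)

features = np.array([dwt_features(s) for s in spectra])

# Stage 2: drop coefficients with low absolute correlation to the response.
corrs = np.array([abs(np.corrcoef(features[:, j], response)[0, 1])
                  for j in range(features.shape[1])])
keep = np.flatnonzero(corrs > 0.3)
print(f"{features.shape[1]} wavelet coefficients, {len(keep)} retained after screening")
# The retained columns would then feed a genetic-algorithm subset search (not shown here).
```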

20.
The usual one-sided Kolmogorov–Smirnov distance is generalized to obtain an improved lower confidence region for the extreme left tail of the reliability function, based on k observations in a “k out of n censored” plan. Finite-sample and asymptotic critical values necessary for implementation are given. Two numerical comparisons with existing parametric procedures, for the cases of complete and censored samples, demonstrate the applicability of the proposed nonparametric procedure.
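The generalized distance and its censored-sample critical values are not reproduced here. As a basic illustration of the idea for a complete sample, the sketch below forms a one-sided lower confidence band for the reliability function by shifting the empirical survival function down by an asymptotic one-sided Kolmogorov-Smirnov critical value, sqrt(ln(1/alpha)/(2n)).

```python
import numpy as np

rng = np.random.default_rng(12)
n, alpha = 60, 0.05
lifetimes = rng.weibull(1.5, size=n) * 100.0         # hypothetical failure times

# Empirical reliability (survival) function evaluated on a grid.
grid = np.linspace(0, lifetimes.max(), 200)
r_hat = np.array([(lifetimes > t).mean() for t in grid])

# One-sided critical value (asymptotic): P(sup_t (R_hat - R) > d) <= alpha.
d_alpha = np.sqrt(np.log(1.0 / alpha) / (2.0 * n))
r_lower = np.clip(r_hat - d_alpha, 0.0, 1.0)

t0 = 20.0
i = np.searchsorted(grid, t0)
print(f"R_hat({t0}) = {r_hat[i]:.3f}, 95% lower confidence bound = {r_lower[i]:.3f}")
```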
