20 similar documents retrieved
1.
In this paper, we consider the setting where the observed data are incomplete. In the general situation where the number of gaps, as well as the number of unobserved values in some gaps, goes to infinity, the asymptotic behavior of the maximum likelihood estimator is not clear. We derive and investigate the asymptotic properties of the maximum likelihood estimator under censorship, and derive a statistic for testing, in a lifetime setting, the null hypothesis that two proposed non-nested models are equally close to the true model against the alternative that one model is closer. Furthermore, we derive a normalization of the difference of Akaike criteria for estimating the difference in expected Kullback–Leibler risk between the distributions in the two models.
2.
In many studies a large number of variables is measured, and the identification of relevant variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on one model only neglects that there usually exist other, equally appropriate models. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam's window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure: based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. As a simple alternative screening procedure, backward elimination can be used. Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, respectively, of which seven influence the outcome. The screening step eliminates most of the uninfluential variables, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step.
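The bootstrap screening step described in this entry can be made concrete. Below is a minimal sketch, not the authors' implementation: each bootstrap sample gets a backward elimination based on OLS t-statistics, and a variable is kept only if its inclusion frequency over the replicates exceeds a cutoff. The 0.3 cutoff, the t-threshold of 2.0, and the toy data settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_inclusion_frequencies(X, y, n_boot=200, t_drop=2.0):
    """Screening sketch: in each bootstrap sample, run backward elimination
    on OLS t-statistics and count which variables survive."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # resample with replacement
        Xb, yb = X[idx], y[idx]
        keep = list(range(p))
        while keep:
            Z = np.column_stack([np.ones(n), Xb[:, keep]])
            beta, *_ = np.linalg.lstsq(Z, yb, rcond=None)
            resid = yb - Z @ beta
            sigma2 = resid @ resid / (n - Z.shape[1])
            t = beta[1:] / np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z))[1:])
            j = int(np.argmin(np.abs(t)))         # weakest remaining variable
            if abs(t[j]) < t_drop:
                keep.pop(j)                       # eliminate and refit
            else:
                break
        counts[keep] += 1                         # survivors of this replicate
    return counts / n_boot

# Toy data: 15 candidate variables, seven with a real effect (mirroring the
# simulation settings described above); effect sizes are made up here.
n, p = 200, 15
X = rng.standard_normal((n, p))
beta_true = np.r_[np.linspace(1.5, 0.3, 7), np.zeros(p - 7)]
y = X @ beta_true + rng.standard_normal(n)

freq = bootstrap_inclusion_frequencies(X, y)
print("kept:", np.where(freq >= 0.3)[0])          # 0.3 cutoff is illustrative
```

As the abstract notes, variables with weak effects (here, the smaller entries of `beta_true`) tend to fall below any fixed cutoff along with the pure noise variables.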
3.
In this article, we consider the problem of testing hypotheses on mean vectors in the multiple-sample problem when the number of observations is smaller than the number of variables. First, we propose an independence rule test (IRT) to deal with high-dimensional effects. The asymptotic distributions of the IRT under the null hypothesis, as well as under the alternative, are established when both the dimension and the sample size go to infinity. Next, using the derived asymptotic power of the IRT, we propose an adaptive independence rule test (AIRT) that is particularly designed for testing against sparse alternatives. The AIRT is novel in that it can effectively pick out a few relevant features and reduce the effect of noise accumulation. Real data analysis and Monte Carlo simulations are used to illustrate the proposed methods.
4.
Adam McCloskey, Journal of Business & Economic Statistics, 2020, 38(4): 810-825
This article specializes the critical value (CV) methods based upon (refinements of) Bonferroni bounds, introduced by McCloskey, to a problem of inference after consistent model selection in a general linear regression model. The post-selection problem is formulated to mimic common empirical practice and is applicable to both cross-sectional and time series contexts. We provide algorithms for constructing the CVs in this setting and establish uniform asymptotic size results for the resulting tests. The practical implementation of the CVs is illustrated in an empirical application to the effect of classroom size on test scores.
5.
Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models.
The cause of the problem is that, when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problems, we show that leave-k-out cross-validation, in which sub-samples are taken without replacement, does not suffer from the same problem.
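A minimal illustration of the phenomenon this entry describes, using scikit-learn's GaussianMixture with BIC for class enumeration; the two-component toy data, replicate counts, and 80% subsample fraction are assumptions for the sketch, not the authors' simulation design.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Toy data: a genuine two-class mixture.
X = np.concatenate([rng.normal(0.0, 1.0, 150),
                    rng.normal(4.0, 1.0, 150)]).reshape(-1, 1)
n = len(X)

def best_k_by_bic(sample, k_max=4):
    """Pick the number of classes that minimizes BIC."""
    bics = [GaussianMixture(n_components=k, n_init=3, random_state=0)
            .fit(sample).bic(sample) for k in range(1, k_max + 1)]
    return int(np.argmin(bics)) + 1

# Bootstrap: sampling WITH replacement can replicate moderately extreme
# points, which tends to produce spurious extra classes.
boot_k = [best_k_by_bic(X[rng.integers(0, n, n)]) for _ in range(50)]

# Leave-k-out style subsampling: WITHOUT replacement, so no duplicates.
sub_k = [best_k_by_bic(X[rng.choice(n, size=int(0.8 * n), replace=False)])
         for _ in range(50)]

print("bootstrap k counts:", np.bincount(boot_k, minlength=5)[1:])
print("subsample k counts:", np.bincount(sub_k, minlength=5)[1:])
```

Comparing the two tallies of selected class counts shows the resampling-with-replacement runs drifting toward extra classes more often than the subsampling runs.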
6.
We develop diagnostic tests for random-effects multi-spell multi-state models focusing on: independence between the unobserved heterogeneity and observed covariates; mutual independence of heterogeneity terms; and distributional form. They are applied to a transition model of the British youth labor market, revealing significant misspecifications in our initial model, and allowing us to develop a considerably better-fitting specification that would have been difficult to reach by other means. The improved specification implies reduced estimates of the effectiveness of the youth training scheme (YTS), but we nevertheless retain the conclusion of significant positive effects of YTS on employment prospects.
7.
M. L. Martin-Magniette, Journal of the Royal Statistical Society, Series C (Applied Statistics), 2005, 54(2): 317-331
Summary. Controversy has intensified regarding the death rate from cancer induced by a dose of radiation. In the models that are usually considered, the hazard function is an increasing function of the dose of radiation; such models can mask local variations. We consider models of excess relative risk and of excess absolute risk and propose a nonparametric estimation of the effect of the dose by using a model selection procedure. This estimation deals with stratified data. We approximate the function of the dose by a collection of splines and select the best one according to the Akaike information criterion. In the same way, between the excess relative risk and excess absolute risk models, we choose the model that best fits the data. We propose a bootstrap method for calculating a pointwise confidence interval for the dose function. We apply our method to estimate the solid cancer and leukaemia death hazard functions from the Hiroshima data.
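The fit-a-collection-of-splines-and-pick-by-AIC strategy can be sketched as follows. This uses a truncated-power spline basis fit by least squares; the knot placement, the toy dose-response data, and the Gaussian AIC form n log(RSS/n) + 2p are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def truncated_power_basis(x, knots, degree=3):
    """Design matrix: 1, x, ..., x^degree, plus (x - kappa)_+^degree per knot."""
    cols = [x**d for d in range(degree + 1)]
    cols += [np.clip(x - k, 0, None)**degree for k in knots]
    return np.column_stack(cols)

def fit_aic(x, y, n_knots):
    """Fit one spline model by least squares and return its Gaussian AIC."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    B = truncated_power_basis(x, knots)
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = np.sum((y - B @ beta)**2)
    aic = len(x) * np.log(rss / len(x)) + 2 * B.shape[1]
    return aic, knots, beta

# Toy dose-response data with a local bump that a monotone model would mask.
x = rng.uniform(0, 4, 300)
y = 0.5 * x + 0.8 * np.exp(-8 * (x - 2.0)**2) + rng.normal(0, 0.3, 300)

# Select, among splines with 1..8 interior knots, the one with smallest AIC.
best = min((fit_aic(x, y, k) for k in range(1, 9)), key=lambda r: r[0])
print("best AIC:", round(best[0], 1), "with", len(best[1]), "knots")
```

A pointwise confidence band in the spirit of the entry would then come from refitting this selection on bootstrap resamples of (x, y).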
8.
Jing Qin, Scandinavian Journal of Statistics, 1998, 25(4): 681-691
We use Owen's (1988, 1990) empirical likelihood method in upgraded mixture models. Two groups of independent observations are available. One is z_1, ..., z_n, observed directly from a distribution F(z). The other is x_1, ..., x_m, observed indirectly from F(z), where the x_i have density ∫ p(x|z) dF(z) and p(x|z) is a conditional density function. We are interested in testing H_0: p(x|z) = p(x|z; θ) for some specified smooth density function. A semiparametric likelihood ratio statistic is proposed and shown to converge to a chi-squared distribution. This provides a simple method for goodness-of-fit tests, especially when x is a discrete variable with finitely many values. In addition, we discuss estimation of θ and F(z) when H_0 is true. The connection between upgraded mixture models and general estimating equations is pointed out.
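In this setup, the semiparametric likelihood being ratio-tested has a concrete form once F is restricted, in Owen's empirical likelihood fashion, to distributions placing weights w_i on the observed z_i. The display below is a sketch of that construction in my notation, not a quotation of the paper.

```latex
% Empirical likelihood for the upgraded mixture model: F puts mass w_i on z_i.
\[
  L(\theta, w) \;=\; \prod_{i=1}^{n} w_i
  \;\times\; \prod_{j=1}^{m} \sum_{i=1}^{n} w_i\, p(x_j \mid z_i; \theta),
  \qquad w_i \ge 0,\; \sum_{i=1}^{n} w_i = 1.
\]
% The likelihood ratio compares the maximum under H_0 (over theta and w)
% with the unconstrained maximum, yielding the chi-squared limit above.
```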
9.
Journal of Statistical Computation and Simulation, 2012, 82(4): 333-350
The problem of constructing confidence intervals for the mean in a two-stage nested model is considered. Several approximate intervals, based on both linear and nonlinear estimators of the mean, are investigated. In particular, the bootstrap is used to correct the bias in the ‘usual’ variance of the nonlinear estimators. It is found that the intervals based on the nonlinear estimators do not achieve the nominal confidence coefficient for designs involving a small number of groups. Further, it turns out that the intervals are generally conservative, especially at small values of the intraclass correlation coefficient, and that the intervals based on the nonlinear estimators are more conservative than those based on the linear estimators. Compared with the others, the intervals based on the unweighted mean of the group means performed well in terms of coverage and length. For small values of the intraclass correlation coefficient, the ANOVA estimators of the variance components are recommended; otherwise, the unweighted means estimator of the between-groups variance component should be used. If one is fortunate enough to have control over the design, it is advisable to increase the number of groups, as opposed to increasing group sizes, while avoiding groups of size one or two.
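For reference, the unweighted mean of the group means singled out in this entry has a simple closed form under the two-stage nested model. This is the standard display in my notation, which also suggests why the estimator behaves well across intraclass correlations.

```latex
% Two-stage nested model: groups i = 1, ..., k with n_i observations each.
\[
  y_{ij} = \mu + a_i + e_{ij}, \qquad
  a_i \sim N(0, \sigma_a^2), \quad e_{ij} \sim N(0, \sigma_e^2),
\]
\[
  \hat\mu_U = \frac{1}{k}\sum_{i=1}^{k} \bar y_{i\cdot},
  \qquad
  \operatorname{Var}(\hat\mu_U)
    = \frac{\sigma_a^2}{k} + \frac{\sigma_e^2}{k^2}\sum_{i=1}^{k}\frac{1}{n_i}.
\]
% An interval is \hat\mu_U plus/minus a t quantile times the square root of
% a plug-in estimate of this variance (ANOVA or unweighted-means based).
```

The variance formula makes the design advice concrete: the dominant term σ_a²/k shrinks only with the number of groups k, not with the group sizes n_i.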
10.
Nicolai Bissantz, Gerda Claeskens, Hajo Holzmann, Axel Munk, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 2009, 71(1): 25-48
Summary. We propose two test statistics for use in inverse regression problems Y = Kθ + ε, where K is a given linear operator that cannot be continuously inverted. Thus, only noisy, indirect observations Y of the function θ are available. Both test statistics have a counterpart in classical hypothesis testing, where they are called the order selection test and the data-driven Neyman smooth test. We also introduce two model selection criteria which extend the classical Akaike information criterion and Bayes information criterion to inverse regression problems. In a simulation study we show that the inverse order selection and Neyman smooth tests outperform their direct counterparts in many cases. The theory is motivated by data arising in confocal fluorescence microscopy, where images are observed with blurring, modelled as convolution, and stochastic error at subsequent times. The aim is then to improve the signal-to-noise ratio by averaging over the distinct images. In this context it is relevant to decide whether the images are still equal or have changed through outside influences such as movement of the object table.
11.
12.
13.
Dominique Haughton, Communications in Statistics - Theory and Methods, 2013, 42(5-6): 1619-1629
In this paper we prove the consistency in probability of a class of generalized BIC criteria for model selection in non-linear regression, by using asymptotic results of Gallant. This extends a result obtained by Nishii for model selection in linear regression.
14.
Lan Wang, Annie Qu, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 2009, 71(1): 177-190
Summary. Model selection for marginal regression analysis of longitudinal data is challenging owing to the presence of correlation and the difficulty of specifying the full likelihood, particularly for correlated categorical data. The paper introduces a novel Bayesian information criterion type model selection procedure based on the quadratic inference function, which does not require the full likelihood or quasi-likelihood. With probability approaching 1, the criterion selects the most parsimonious correct model. Although a working correlation matrix is assumed, there is no need to estimate the nuisance parameters in the working correlation matrix; moreover, the model selection procedure is robust against the misspecification of the working correlation matrix. The criterion proposed can also be used to construct a data-driven Neyman smooth test for checking the goodness of fit of a postulated model. This test is especially useful and often yields much higher power in situations where the classical directional test behaves poorly. The finite sample performance of the model selection and model checking procedures is demonstrated through Monte Carlo studies and analysis of a clinical trial data set.
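A sketch of how such a quadratic-inference-function criterion is typically assembled; the notation and the exact penalty form are my assumptions, not necessarily the paper's.

```latex
% Extended score from the estimating equations, averaged over N subjects:
\[
  \bar g_N(\beta) = \frac{1}{N}\sum_{i=1}^{N} g_i(\beta), \qquad
  Q_N(\beta) = N\, \bar g_N(\beta)^{\top}\, \widehat C_N(\beta)^{-1}\,
               \bar g_N(\beta),
\]
% BIC-type criterion: penalize the minimized QIF by model dimension p_M,
% with no likelihood required:
\[
  \mathrm{BIQIF}(M) \;=\; Q_N(\hat\beta_M) \;+\; p_M \log N .
\]
```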
15.
We consider multiple comparison test procedures among treatment effects in a randomized block design. We propose closed testing procedures based on maximum values of some two-sample t test statistics and on F test statistics. It is shown that the proposed procedures are more powerful than single-step procedures and the REGW (Ryan/Einot–Gabriel/Welsch)-type tests. Next, we consider the randomized block design under simple ordered restrictions of treatment effects. We propose closed testing procedures based on maximum values of two-sample one-sided t test statistics and on Bartholomew's statistics for all pairwise comparisons of treatment effects. Although single-step multiple comparison procedures are commonly used, their power is low for a large number of groups. The closed testing procedures stated in the present article are more powerful than the single-step procedures. Simulation studies are performed under the null hypothesis and some alternative hypotheses; in these studies, the proposed procedures perform well.
16.
This article considers constructing confidence intervals for the date of a structural break in linear regression models. Using extensive simulations, we compare the performance of various procedures in terms of exact coverage rates and lengths of the confidence intervals. These include the procedures of Bai (1997) based on the asymptotic distribution under a shrinking shift framework, Elliott and Müller (2007) based on inverting a test locally invariant to the magnitude of break, Eo and Morley (2015) based on inverting a likelihood ratio test, and various bootstrap procedures. On the basis of achieving an exact coverage rate that is closest to the nominal level, Elliott and Müller's (2007) approach is by far the best one. However, this comes with a very high cost in terms of the length of the confidence intervals. When the errors are serially correlated and dealing with a change in intercept or a change in the coefficient of a stationary regressor with a high signal-to-noise ratio, the length of the confidence interval increases and approaches the whole sample as the magnitude of the change increases. The same problem occurs in models with a lagged dependent variable, a common case in practice. This drawback is not present for the other methods, which have similar properties. Theoretical results are provided to explain the drawbacks of Elliott and Müller's (2007) method.
17.
Peter Hall, D. M. Titterington, Jing-Hao Xue, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 2009, 71(4): 783-803
Summary. Many contemporary classifiers are constructed to provide good performance for very high dimensional data. However, an issue that is at least as important as good classification is determining which of the many potential variables provide key information for good decisions. Responding to this issue can help us to determine which aspects of the data-generating mechanism (e.g. which genes in a genomic study) are of greatest importance in terms of distinguishing between populations. We introduce tilting methods for addressing this problem. We apply weights to the components of data vectors, rather than to the data vectors themselves (as is commonly the case in related work). In addition, we tilt in a way that is governed by the L2-distance between weight vectors, rather than by the more commonly used Kullback–Leibler distance. It is shown that this approach, together with the added constraint that the weights should be non-negative, produces an algorithm which eliminates vector components that have little influence on the classification decision. In particular, use of the L2-distance in this problem produces properties that are reminiscent of those that arise when L1-penalties are employed to eliminate explanatory variables in very high dimensional prediction problems, e.g. those involving the lasso. We introduce techniques that can be implemented very rapidly, and we show how to use bootstrap methods to assess the accuracy of our variable ranking and variable elimination procedures.
18.
19.
In reliability and life-testing experiments, the researcher is often interested in the effects of extreme or varying stress factors, such as temperature, voltage and load, on the lifetimes of experimental units. Step-stress tests, a special class of accelerated life-tests, allow the experimenter to increase the stress levels at fixed times during the experiment in order to obtain information on the parameters of the life distributions more quickly than under normal operating conditions. In this paper, we consider a new step-stress model in which the life-testing experiment terminates either at a pre-fixed time (say, T_{m+1}) or at a random time ensuring at least a specified number of failures (say, r out of n). Under this model, in which the data obtained are Type-II hybrid censored, we consider the case of exponentially distributed lifetimes and derive the maximum likelihood estimators (MLEs) of the parameters assuming a cumulative exposure model. The exact distributions of the MLEs are obtained through the use of conditional moment generating functions. We also derive confidence intervals for the parameters using these exact distributions, the asymptotic distributions of the MLEs and parametric bootstrap methods, and assess their performance through a Monte Carlo simulation study. Finally, we present two examples to illustrate all the methods of inference discussed here.
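For orientation, in the simple exponential step-stress setting under cumulative exposure, each stress level's mean lifetime is estimated by a total-time-on-test ratio. The display below is that familiar form in my notation; it is not the paper's exact Type-II hybrid censored derivation, whose stopping rule alters the details.

```latex
% Stress level k runs on [\tau_{k-1}, \tau_k); N_k failures fall in it.
\[
  \hat\theta_k = \frac{U_k}{N_k}, \qquad
  U_k = \sum_{i:\, \tau_{k-1} \le t_i < \tau_k} (t_i - \tau_{k-1})
        \;+\; n_k^{+}\,(\tau_k - \tau_{k-1}),
\]
% where the t_i are observed failure times and n_k^{+} counts the units that
% survive (or are censored) past \tau_k: U_k is the total time on test
% accumulated at level k.
```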
20.
This paper is concerned with the problem of constructing a good predictive distribution relative to the Kullback–Leibler information in a linear regression model. The problem is equivalent to the simultaneous estimation of regression coefficients and error variance in terms of a complicated risk, which yields a new challenging issue in a decision-theoretic framework. An estimator of the variance is incorporated here into a loss for estimating the regression coefficients. Several estimators of the variance and of the regression coefficients are proposed and shown to improve on usual benchmark estimators both analytically and numerically. Finally, the prediction problem of a distribution is noted to be related to an information criterion for model selection like the Akaike information criterion (AIC). Thus, several AIC variants are obtained based on proposed and improved estimators and are compared numerically with AIC as model selection procedures.
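For reference, the Kullback–Leibler criterion by which such predictive distributions are evaluated takes the standard form below (my notation); the decision-theoretic loss for the coefficients and variance described in the entry is built around this risk.

```latex
% KL risk of a predictive density \hat p(\cdot \mid y) for a future \tilde y:
\[
  R(\theta, \hat p)
    = E_\theta\!\left[
        \int p(\tilde y \mid \theta)\,
             \log \frac{p(\tilde y \mid \theta)}{\hat p(\tilde y \mid y)}
        \, d\tilde y
      \right],
\]
% where the outer expectation is taken over the observed data y. Expanding
% this risk is what links the prediction problem to AIC-type criteria.
```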