The r largest order statistics approach is widely used in extreme value analysis because it may use more information from the data than just the block maxima. In practice, the choice of r is critical. If r is too large, bias can occur; if too small, the variance of the estimator can be high. The limiting distribution of the r largest order statistics, denoted by GEV\(_r\), extends that of the block maxima. Two specification tests are proposed to select r sequentially. The first is a score test for the GEV\(_r\) distribution. Due to the special characteristics of the GEV\(_r\) distribution, the classical chi-square asymptotics cannot be used. The simplest approach is to use the parametric bootstrap, which is straightforward to implement but computationally expensive. An alternative fast weighted bootstrap or multiplier procedure is developed for computational efficiency. The second test uses the difference in estimated entropy between the GEV\(_r\) and GEV\(_{r-1}\) models, applied to the r largest order statistics and the \(r-1\) largest order statistics, respectively. The asymptotic distribution of the difference statistic is derived. In a large-scale simulation study, both tests held their size and had substantial power to detect various misspecification schemes. A new approach to address the issue of multiple, sequential hypothesis testing is adapted to this setting to control the false discovery rate or familywise error rate. The utility of the procedures is demonstrated with extreme sea level and precipitation data.

Let X be an N(μ, σ²) distributed characteristic with unknown σ. We present the minimax version of the two-stage t test, which has the minimal maximal average sample size among all two-stage t tests obeying the classical two-point condition on the operating characteristic. We give several examples. Furthermore, the minimax version of the two-stage t test is compared with the corresponding two-stage Gauß test.

Estimation of prediction accuracy is important when our aim is prediction. The training error is an easy estimate of prediction error, but it has a downward bias. On the other hand, K-fold cross-validation has an upward bias. The upward bias may be negligible in leave-one-out cross-validation, but it sometimes cannot be neglected in 5-fold or 10-fold cross-validation, which are favored from a computational standpoint. Since the training error has a downward bias and K-fold cross-validation has an upward bias, there will be an appropriate estimate in a family that connects the two estimates. In this paper, we investigate two families that connect the training error and K-fold cross-validation.
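The two endpoints that such a family connects can be computed side by side. The following is a minimal sketch, assuming ordinary least squares as the learner and squared error as the loss; the function name `training_and_kfold_mse` and the simulated data are hypothetical and are not taken from the paper, and the connecting families themselves are not implemented here.

```python
import numpy as np

def training_and_kfold_mse(X, y, k=5, seed=0):
    """Return (training MSE, K-fold CV MSE) for ordinary least squares."""
    n = len(y)

    # Training error: fit on all data, evaluate on the same data.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    train_mse = float(np.mean((y - X @ beta) ** 2))

    # K-fold CV: each fold is predicted by a model fitted without it.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    squared_error_sum = 0.0
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        b, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        squared_error_sum += float(np.sum((y[test_idx] - X[test_idx] @ b) ** 2))
    return train_mse, squared_error_sum / n

# Simulated example (assumed setup): 60 observations, intercept plus 4 predictors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 4))])
y = X @ np.array([1.0, 0.5, -0.5, 0.0, 0.0]) + rng.normal(size=60)
train_mse, cv_mse = training_and_kfold_mse(X, y, k=5)
print(f"training MSE = {train_mse:.3f}, 5-fold CV MSE = {cv_mse:.3f}")
```

For least squares the cross-validated error is never smaller than the training error on the same data, which is consistent with the opposite bias directions described above.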

This note shows that the asymptotic properties of the quasi-maximum likelihood estimator for dynamic panel models can be easily derived by following the approach of Grassetti (Stat Methods Appl 20:221–240, 2011), which takes the long difference to remove the time-invariant individual-specific effects.

The skew t-distribution includes both the skew normal and the normal distributions as special cases. Inference for the skew t-model becomes problematic in these cases because the expected information matrix is singular and the parameter corresponding to the degrees of freedom takes a value at the boundary of its parameter space. In particular, the distributions of the likelihood ratio statistics for testing the null hypotheses of skew normality and normality are not asymptotically \(\chi ^2\). The asymptotic distributions of the likelihood ratio statistics are considered by applying the results of Self and Liang (J Am Stat Assoc 82:605–610, 1987) for boundary-parameter inference in terms of reparameterizations designed to remove the singularity of the information matrix. The Self–Liang asymptotic distributions are mixtures, and it is shown that their accuracy can be improved substantially by correcting the mixing probabilities. Furthermore, although the asymptotic distributions are non-standard, versions of Bartlett correction are developed that afford additional accuracy. Bootstrap procedures for estimating the mixing probabilities and the Bartlett adjustment factors are shown to produce excellent approximations, even for small sample sizes.

A new algorithm is presented and studied in this paper for fast computation of the nonparametric maximum likelihood estimate of a U-shaped hazard function. It overcomes a difficulty in computing a U-shaped hazard function: the function is only properly defined once its anti-mode is known, yet the anti-mode itself has to be found during the computation. Specifically, the new algorithm maintains the constant hazard segment, regardless of whether its length is zero or positive. The length varies naturally, according to the mass values allocated to the associated knots after each update. Being an appropriate extension of the constrained Newton method, the new algorithm also inherits its advantage of fast convergence, as demonstrated by some real-world data examples. The algorithm works not only for exact observations, but also for purely interval-censored data and for data that mix exact and interval-censored observations.

Mixtures of multivariate t distributions provide a robust parametric extension of normal mixtures for fitting data. In the presence of a noise component, potential outliers, or data with longer-than-normal tails, one way to broaden the model is to consider t distributions. In this framework, the degrees of freedom can act as a robustness parameter, tuning the heaviness of the tails and downweighting the effect of the outliers on parameter estimation. The aim of this paper is to extend to mixtures of multivariate elliptical distributions some theoretical results about likelihood maximization on constrained parameter spaces. Further, a constrained monotone algorithm implementing maximum likelihood mixture decomposition of multivariate t distributions is proposed, to achieve improved convergence capabilities and robustness. Monte Carlo numerical simulations and a real data study illustrate the better performance of the algorithm, comparing it to earlier proposals.

A general formulation of mixed proportional hazard models with K random effects is provided. It makes it possible to account for a population stratified at K different levels. I then show how to approximate the partial maximum likelihood estimator using an EM algorithm. In a Monte Carlo study, the behavior of the estimator is assessed, and I provide an application to the ratification of ILO conventions. Compared to other procedures, the results indicate an important decrease in computing time, as well as improved convergence and stability.

This paper derives the distribution of the probabilities of misclassification that arise from the use of the linear discriminant function. The statistical background is two independent doubly truncated t populations with distinct location parameters and a common scale parameter and degrees of freedom. The behavior of the linear discriminant function is studied by comparing the distribution function of the errors of misclassification under the truncated t and truncated normal models.

Unfortunately, many of the numerous algorithms for computing the cumulative distribution function (cdf) and noncentrality parameter of the noncentral F and beta distributions can produce completely incorrect results, as demonstrated in the paper by examples. Existing algorithms are scrutinized and those parts that involve numerical difficulties are identified. As a result, pseudocode is presented in which all the known numerical problems are resolved. This pseudocode can be easily implemented in C or FORTRAN without understanding the complicated mathematical background. Symbolic evaluation of a finite and closed formula is proposed to compute exact cdf values. This approach makes it possible to check quickly and reliably the values returned by professional statistical packages over an extraordinarily wide parameter range without any programming knowledge. This research was motivated by the fact that a very useful table for calculating the size of detectable effects for ANOVA tables contains suspect values in the region of large noncentrality parameter values compared to the values obtained by Patnaik’s 2-moment central-F approximation. The cause is identified and the corrected form of the table for ANOVA purposes is given. The accuracy of the approximations to the noncentral-F distribution is also discussed. The authors wish to thank Mr. Richárd Király for his preliminary work. The authors are grateful to the Editor and Associate Editor of STCO and the unknown reviewers for their helpful suggestions.

Exact permutation testing of effects in unreplicated two-level multifactorial designs is developed based on the notion of realigning observations and on paired permutations. This approach preserves the exchangeability of error components for testing up to k effects. Advantages and limitations of exact permutation procedures for unreplicated factorials are discussed and a simulation study on paired permutation testing is presented.

The pooled variance of p samples presumed to have been obtained from p populations having common variance σ² has invariably been adopted as the default estimator for σ². In this paper, alternative estimators of the common population variance are developed. These estimators are biased and have lower mean-squared error values than the pooled variance. The comparative merit of these estimators over the unbiased estimator is explored using relative efficiency (a ratio of mean-squared error values).
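For reference, the default estimator discussed above weights each sample variance by its degrees of freedom. The following is a minimal sketch of that standard pooled variance only; the function name `pooled_variance` and the example data are hypothetical, and the paper's alternative biased estimators are not reproduced here.

```python
from statistics import variance

def pooled_variance(samples):
    """Pooled variance: sum of (n_i - 1) * s_i^2 divided by sum of (n_i - 1)."""
    numerator = sum((len(s) - 1) * variance(s) for s in samples)
    degrees_of_freedom = sum(len(s) - 1 for s in samples)
    return numerator / degrees_of_freedom

# Two small illustrative samples (made-up numbers).
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
pooled = pooled_variance(samples)
print(pooled)
```

Each sample must contain at least two observations, since `statistics.variance` computes the unbiased sample variance.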

In this paper we consider the double k-class estimator which incorporates the Stein variance estimator. This estimator is called the SVKK estimator. We derive the explicit formula for the mean squared error (MSE) of the SVKK estimator for each individual regression coefficient. It is shown analytically that the MSE performance of the Stein-rule estimator for each individual regression coefficient can be improved by utilizing the Stein variance estimator. Also, the MSEs of several estimators included in the family of SVKK estimators are compared by numerical evaluations.

This paper presents a novel framework for maximum likelihood (ML) estimation in skew-t factor analysis (STFA) models in the presence of missing values or nonresponses. As a robust extension of the ordinary factor analysis model, the STFA model assumes a restricted version of the multivariate skew-t distribution for the latent factors and the unobservable errors to accommodate non-normal features such as asymmetry and heavy tails or outliers. An EM-type algorithm is developed to carry out ML estimation and imputation of missing values under a missing at random mechanism. The practical utility of the proposed methodology is illustrated through real and synthetic data examples.

The aim of this paper is to develop a general, unified approach to some change point problems in mathematical statistics, based on partial estimation functions which we call the “Z-process”. The proposed method can be applied not only to ergodic models but also to some models where the Fisher information matrix is random. Applications to some concrete models, including a parametric model for volatilities of diffusion processes, are presented. Computer-intensive simulations are performed for the randomly time-transformed Brownian bridge process that appears as the limit of the proposed test statistics.

Current status and panel count data frequently arise from cancer and tumorigenicity studies when events occur recurrently. A common and widely used class of two-sample tests for current status and panel count data is the permutation class. We use the double saddlepoint method to calculate the exact mid-p-values of the underlying permutation distributions of this class of tests. Permutation simulations are replaced by analytical saddlepoint computations which provide extremely accurate mid-p-values that are exact for most practical purposes and almost always more accurate than normal approximations. The method is illustrated using two real tumorigenicity panel count data sets. To compare the saddlepoint approximation with the normal asymptotic approximation, a simulation study is conducted. The speed and accuracy of the saddlepoint method facilitate the calculation of the confidence interval for the treatment effect. The inversion of the mid-p-values to calculate the confidence interval for the mean rate of development of the recurrent event is discussed.
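To make the target quantity concrete, a mid-p-value counts permutations with a strictly more extreme statistic in full and ties with the observed value at half weight. The following is a minimal brute-force sketch, assuming a simple two-sample setting with the sum of the first group's values as the test statistic; it enumerates all group assignments and is not the saddlepoint method, and the function name `exact_mid_p` and the data are hypothetical.

```python
from itertools import combinations

def exact_mid_p(group1, group2):
    """Upper-tail exact mid-p-value: P(T > t_obs) + 0.5 * P(T = t_obs)."""
    pooled = list(group1) + list(group2)
    observed = sum(group1)
    greater = equal = total = 0
    # Enumerate every way of choosing which pooled positions form group 1.
    for idx in combinations(range(len(pooled)), len(group1)):
        stat = sum(pooled[i] for i in idx)
        total += 1
        if stat > observed:
            greater += 1
        elif stat == observed:
            equal += 1
    return (greater + 0.5 * equal) / total

# Made-up scores: the treated group holds the two largest values.
mid_p = exact_mid_p([4, 5], [1, 2, 3])
print(mid_p)
```

Full enumeration grows combinatorially with sample size, which is the cost that simulation or analytical approximations are meant to avoid.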

Crime or disease surveillance commonly relies on space-time clustering methods to identify emerging patterns. The goal is to detect spatio-temporal clusters as soon as possible after their occurrence and to control the rate of false alarms. With this in mind, a spatio-temporal multiple cluster detection method was developed as an extension of a previous proposal based on a spatial version of the Shiryaev–Roberts statistic. Besides being able to detect multiple clusters, the method has fewer input parameters than the previous proposal, making its use more intuitive for practitioners. To evaluate the new methodology, a simulation study is performed in several scenarios, and it highlights many advantages of the proposed method. Finally, we present a case study of a crime data set from Belo Horizonte, Brazil.

W-graph refers to a general class of random graph models that can be seen as a random graph limit. It is characterized by both its graphon function and its motif frequencies. In this paper, relying on an existing variational Bayes algorithm for stochastic block models (SBMs) along with the corresponding weights for model averaging, we derive an estimate of the graphon function as an average of SBMs with an increasing number of blocks. In the same framework, we derive the variational posterior frequency of any motif. A simulation study and an illustration on a social network complete our work.

