Similar Documents (20 results)
1.
This paper considers two general ways dependent groups might be compared based on quantiles. The first compares the quantiles of the marginal distributions. The second focuses on the lower and upper quantiles of the usual difference scores. Methods for comparing quantiles have been derived that typically assume that sampling is from a continuous distribution. There are exceptions, but generally, when sampling from a discrete distribution where tied values are likely, extant methods can perform poorly, even with a large sample size. One reason is that extant methods for estimating the standard error can perform poorly. Another is that quantile estimators based on a single order statistic, or a weighted average of two order statistics, are not necessarily asymptotically normal. Our main result is that when using the Harrell–Davis estimator, good control over the Type I error probability can be achieved in simulations via a standard percentile bootstrap method, even when there are tied values, provided the sample sizes are not too small. In addition, the two methods considered here can have substantially higher power than alternative procedures. Using real data, we illustrate how quantile comparisons can be used to gain a deeper understanding of how groups differ.
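A minimal sketch of the Harrell–Davis estimator the abstract relies on: the q-th quantile is estimated as a weighted sum of all order statistics, with weights given by Beta-distribution probabilities. The function name and example data are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import beta

def harrell_davis(x, q):
    """Harrell–Davis estimate of the q-th quantile of x."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    # Weight on the i-th order statistic: mass the Beta(a, b)
    # distribution assigns to the interval ((i-1)/n, i/n].
    w = np.diff(beta.cdf(np.arange(n + 1) / n, a, b))
    return np.dot(w, x)

rng = np.random.default_rng(0)
sample = rng.exponential(size=50)
print(harrell_davis(sample, 0.5))  # smooth estimate of the median
```

Because every order statistic receives positive weight, the estimate varies smoothly under resampling, which is what makes the percentile bootstrap behave well even with tied values.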

2.
This paper introduces some robust estimation procedures to estimate quantiles of a continuous random variable based on data, without any further assumptions about the probability distribution. We construct a reasonable linear regression model to connect the relationship between a suitable symmetric data transformation and the approximate standard normal statistics. Statistical properties of this linear regression model and its applications are studied, including estimators of quantiles, quartile mean, quartile deviation, correlation coefficient of quantiles and standard errors of these estimators. We give some empirical examples to illustrate the statistical properties and apply our estimators to grouped data.
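A hedged sketch of the general idea: regress the ordered, symmetrically transformed data on approximate standard normal scores and read quantile estimates off the fitted line. The log transform and Blom plotting positions below are illustrative choices, not necessarily those of the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=0.5, size=40)

y = np.sort(np.log(x))                                     # assumed symmetrizing transform
p = (np.arange(1, y.size + 1) - 0.375) / (y.size + 0.25)   # Blom plotting positions
z = norm.ppf(p)                                            # approximate normal scores

slope, intercept = np.polyfit(z, y, 1)                     # fitted linear model

def quantile_est(q):
    # invert the transform to return to the original scale
    return np.exp(intercept + slope * norm.ppf(q))

print(quantile_est(0.25), quantile_est(0.75))
```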

3.
Error rate is a popular criterion for assessing the performance of an allocation rule in discriminant analysis. Training samples which involve missing values cause problems for those error rate estimators that require all variables to be observed at all data points. This paper explores imputation algorithms, their effects on, and problems of implementing them with, eight commonly used error rate estimators (three parametric and five non-parametric) in linear discriminant analysis. The results indicate that imputation should not be based on the way error rate estimators are calculated, and that imputed values may underestimate error rates.
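A minimal scikit-learn sketch of one pipeline in the spirit of the paper: impute missing training values, fit a linear discriminant rule, and estimate the error rate by cross-validation. Mean imputation and the iris data are illustrative stand-ins; the paper compares eight error rate estimators.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(2)
mask = rng.random(X.shape) < 0.1           # knock out ~10% of values
X = np.where(mask, np.nan, X)

pipe = make_pipeline(SimpleImputer(strategy="mean"),
                     LinearDiscriminantAnalysis())
acc = cross_val_score(pipe, X, y, cv=5)    # cross-validated accuracy
print("estimated error rate:", 1 - acc.mean())
```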

4.
Estimation for Type II domain of attraction based on the W statistic
The paper presents an estimating equation approach to the estimation of high quantiles of a distribution in the Type II domain of attraction based on the k largest order statistics. The estimators are shown to be consistent. The method fits neatly into a general scheme for estimating high quantiles irrespective of the domain of attraction, which includes Wang's approach to optimally choosing k.

5.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a new proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more appropriately.
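A short sketch of two of the outlier identifiers named above, classical and robust (minimum covariance determinant) Mahalanobis distances, flagged against a chi-square cutoff. The cutoff and planted outlier are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=100)
X[0] = [6.0, -6.0]                          # plant one outlier

cut = chi2.ppf(0.975, df=X.shape[1])        # squared-distance cutoff
d2_classic = EmpiricalCovariance().fit(X).mahalanobis(X)
d2_robust = MinCovDet(random_state=0).fit(X).mahalanobis(X)

print("classical flags:", np.where(d2_classic > cut)[0])
print("robust flags:   ", np.where(d2_robust > cut)[0])
```

The robust fit resists masking: a cluster of extreme points inflates the classical covariance estimate but barely moves the MCD estimate.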

6.
Various exact tests for statistical inference are available for powerful and accurate decision rules provided that corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing p-values of exact tests by combining Monte Carlo simulations and statistical tables generated a priori. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The p-values are linked to the posterior means of quantiles. In this framework, we present relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations to provide a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using the theoretical propositions, we calculate the minimum number of Monte Carlo resamples needed for a desired level of accuracy on the basis of distances between actual data characteristics (e.g. sample sizes) and the characteristics of the data used to present corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via recently developed STATA and R packages.

7.
Based on a chi-square transform of a multivariate normal data set, we propose a technique for testing multinormality whose statistic is the sum of interpoint squared distances between an ordered set of the transformed observations and the set of population pth quantiles of the chi-squared distribution. The critical values of the test were evaluated for different sample sizes and random vector dimensions through extensive simulations. The empirical Type I error rates and powers of the proposed test were compared with those of some other well-known tests for multivariate normality, with the proposed test showing excellent results at large sample sizes.
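A hedged reconstruction of the statistic's shape: transform the data to squared Mahalanobis distances (approximately chi-squared with d degrees of freedom under multinormality), order them, and sum the squared gaps to the corresponding chi-squared quantiles. The plotting positions are an assumption; critical values would come from simulation, as in the paper.

```python
import numpy as np
from scipy.stats import chi2

def mvn_qq_statistic(X):
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # ordered squared Mahalanobis distances
    d2 = np.sort(np.einsum("ij,jk,ik->i", Xc, S_inv, Xc))
    p = (np.arange(1, n + 1) - 0.5) / n      # assumed plotting positions
    return np.sum((d2 - chi2.ppf(p, df=d)) ** 2)

rng = np.random.default_rng(4)
print(mvn_qq_statistic(rng.normal(size=(200, 3))))
```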

8.
The most common strategy for comparing two independent groups is in terms of some measure of location intended to reflect the typical observation. However, it can be informative and important to compare the lower and upper quantiles as well, but when there are tied values, extant techniques suffer from practical concerns reviewed in the paper. For the special case where the goal is to compare the medians, a slight generalization of the percentile bootstrap method performs well in terms of controlling Type I errors when there are tied values [Wilcox RR. Comparing medians. Comput. Statist. Data Anal. 2006;51:1934–1943]. But our results indicate that when the goal is to compare the quartiles, or quantiles close to zero or one, this approach is highly unsatisfactory when the quantiles are estimated using a single order statistic or a weighted average of two order statistics. The main result in this paper is that when using the Harrell–Davis estimator, which uses all of the order statistics to estimate a quantile, control over the Type I error probability can be achieved in simulations, even when there are tied values, provided the sample sizes are not too small. It is demonstrated that this method can also have substantially higher power than the distribution free method derived by Doksum and Sievers [Plotting with confidence: graphical comparisons of two populations. Biometrika 1976;63:421–434]. Data from two studies are used to illustrate the practical advantages of the method studied here.
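A minimal sketch of the percentile-bootstrap comparison described here: bootstrap the difference between Harrell–Davis quantile estimates of two independent groups and read a confidence interval off the percentiles. The number of bootstrap samples and the Poisson example data (chosen so ties are likely) are illustrative.

```python
import numpy as np
from scipy.stats import beta

def hd(x, q):
    # Harrell–Davis quantile estimate (Beta-weighted order statistics)
    x = np.sort(x)
    n = x.size
    w = np.diff(beta.cdf(np.arange(n + 1) / n, (n + 1) * q, (n + 1) * (1 - q)))
    return np.dot(w, x)

def boot_quantile_diff(x, y, q=0.25, B=2000, seed=0):
    rng = np.random.default_rng(seed)
    d = [hd(rng.choice(x, x.size), q) - hd(rng.choice(y, y.size), q)
         for _ in range(B)]
    return np.percentile(d, [2.5, 97.5])    # 95% percentile-bootstrap CI

rng = np.random.default_rng(5)
x, y = rng.poisson(5, 60), rng.poisson(6, 60)   # discrete data, many ties
print(boot_quantile_diff(x, y))
```

If the interval excludes zero, the groups' lower quartiles are declared to differ; no standard error estimate is needed, which is what sidesteps the problems ties cause.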

9.
For noninformative nonparametric estimation of finite population quantiles under simple random sampling, estimation based on the Polya posterior is similar to estimation based on the Bayesian approach developed by Ericson (J. Roy. Statist. Soc. Ser. B 31 (1969) 195) in that the Polya posterior distribution is the limit of Ericson's posterior distributions as the weight placed on the prior distribution diminishes. Furthermore, Polya posterior quantile estimates can be shown to be admissible under certain conditions. We demonstrate the admissibility of the sample median as an estimate of the population median under such a set of conditions. As with Ericson's Bayesian approach, Polya posterior-based interval estimates for population quantiles are asymptotically equivalent to the interval estimates obtained from standard frequentist approaches. In addition, for small to moderate sized populations, Polya posterior-based interval estimates for quantiles of a continuous characteristic of interest tend to agree with the standard frequentist interval estimates.
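A small sketch of simulating the Polya posterior: starting from the observed sample, grow a synthetic finite population by repeatedly drawing with replacement from the current urn and adding a copy of the draw. Repeating this yields a posterior distribution for a population quantile; population size, sample, and replication count below are illustrative.

```python
import numpy as np

def polya_population(sample, N, rng):
    urn = list(sample)
    while len(urn) < N:
        urn.append(urn[rng.integers(len(urn))])   # Polya urn step
    return np.array(urn)

rng = np.random.default_rng(6)
sample = rng.normal(size=30)
medians = [np.median(polya_population(sample, 300, rng))
           for _ in range(500)]
print(np.percentile(medians, [2.5, 97.5]))        # interval estimate for the median
```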

10.
In this paper, a probability plots class of tests for multivariate normality is introduced. Based on independent standardized principal components of a d-variate normal data set, we obtain the sum of squared differences between the ordered observations of each principal component and the corresponding population pth quantiles of the standard normal distribution. We propose the sum of these d sums of squared differences as an appropriate statistic for testing multivariate normality. We evaluate empirical critical values of the statistic and compare its power with those of some highly regarded techniques, with favorable results.

11.
In this paper the estimation of high-return-period quantiles of the flood peak and volume in the Kolubara River basin is carried out. Estimation of flood frequencies is carried out on a data set containing high outliers, which are identified by Rosner's test; low outliers are simultaneously identified by the multiple Grubbs–Beck test. The next step involves applying mixed distribution functions to a data set drawn from three populations: floods with low outliers, normal floods and floods with high outliers. The contribution of the data set with low outliers is neglected, since including it would underestimate the flood quantiles with large return periods. Finally, the best-fitting mixed distribution among the types applied (EV1, GEV, P3 and LP3) is determined using the minimum standard error of fit.
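A hedged numpy implementation of Rosner's (generalized ESD) test used above to flag high outliers. The significance level and the maximum number of outliers k are illustrative choices, and the data are synthetic.

```python
import numpy as np
from scipy.stats import t

def rosner_esd(x, k=3, alpha=0.05):
    """Generalized ESD test: returns the values declared outliers."""
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    removed, R, lam = [], [], []
    for i in range(1, k + 1):
        idx = np.argmax(np.abs(x - x.mean()))
        R.append(abs(x[idx] - x.mean()) / x.std(ddof=1))   # test statistic R_i
        removed.append(x[idx])
        x = np.delete(x, idx)
        m = n - i + 1                                      # sample size at step i
        tv = t.ppf(1 - alpha / (2 * m), m - 2)
        lam.append((m - 1) * tv / np.sqrt((m - 2 + tv**2) * m))
    # declare outliers up to the largest i with R_i > lambda_i
    hits = [i for i in range(k) if R[i] > lam[i]]
    return removed[: hits[-1] + 1] if hits else []

rng = np.random.default_rng(7)
data = np.append(rng.normal(size=50), [8.0, 9.5])
print(rosner_esd(data))
```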

12.
QUANTILES OF SUMS AND EXPECTED VALUES OF ORDERED SUMS
Watson & Gordon (1986) investigated the relationship between the quantiles of a sum of independent continuous random variables and the sum of the individual quantiles. In this note some further results are obtained. Corresponding relationships are also developed for the expected values of the order statistics of a sum, and for the sum of the expected values of the individual order statistics.
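A quick Monte Carlo illustration of the relationship studied in this note: the quantile of a sum of independent variables versus the sum of the individual quantiles. The distributions and quantile level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
q = 0.9
x = rng.exponential(1.0, size=200_000)
y = rng.gamma(2.0, 1.0, size=200_000)

# for independent right-skewed variables these typically differ at high q
print("quantile of sum: ", np.quantile(x + y, q))
print("sum of quantiles:", np.quantile(x, q) + np.quantile(y, q))
```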

13.

In survival or reliability data analysis, it is often useful to estimate the quantiles of the lifetime distribution, such as the median time to failure. Various nonparametric methods can be used to construct confidence intervals for the quantiles of the lifetime distribution, some of which are implemented in commonly used statistical software packages. Here we investigate the performance of different interval estimation procedures under a variety of settings with different censoring schemes. Our main objectives in this paper are to (i) evaluate the performance of confidence intervals based on the transformation approach commonly used in statistical software, (ii) introduce a new density-estimation-based approach to obtain confidence intervals for survival quantiles, and (iii) compare it with the transformation approach. We provide a comprehensive comparative study and offer some useful practical recommendations based on our results. Some numerical examples are presented to illustrate the methodologies developed.
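A compact numpy sketch of the starting point for such intervals: the Kaplan–Meier estimate of a survival quantile (here the median) under right censoring. The interval constructions compared in the paper would be layered on top of this point estimate; the simulated censoring setup is illustrative and assumes continuous times without ties.

```python
import numpy as np

def km_quantile(time, event, p=0.5):
    """Kaplan–Meier estimate of the p-th lifetime quantile."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    n = time.size
    at_risk = n - np.arange(n)
    surv = np.cumprod(1.0 - event / at_risk)    # KM survival curve
    below = np.nonzero(surv <= 1.0 - p)[0]
    return time[below[0]] if below.size else np.nan

rng = np.random.default_rng(9)
t_true = rng.exponential(10.0, size=100)
c = rng.exponential(15.0, size=100)             # censoring times
time = np.minimum(t_true, c)
event = (t_true <= c).astype(float)
print("KM median estimate:", km_quantile(time, event))
```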

14.
We used a proper multiple imputation (MI) approach via Gibbs sampling to impute missing values of a gamma-distributed outcome variable that were missing at random, using a generalized linear model (GLM) with an identity link function. The missing values of the outcome variable were multiply imputed using the GLM, and the complete data sets obtained after MI were then analysed through the GLM again for estimation purposes. We examined the performance of the proposed technique through a simulation study with data sets having four moderate-to-large proportions of missing values: 10%, 20%, 30% and 50%. We also applied this technique to real-life data and compared the results with those obtained by applying the GLM only to the observed cases. The results showed that the proposed technique gave better results for moderate proportions of missing values.

15.
The statistical literature on the analysis of discrete variate time series has concentrated mainly on parametric models, that is, the conditional probability mass function is assumed to belong to a parametric family. Generally, these parametric models impose strong assumptions on the relationship between the conditional mean and variance. To relax these implausible assumptions, this paper instead considers a more realistic semiparametric model, called the random rounded integer-valued autoregressive conditional heteroskedastic (RRINARCH) model, where there are essentially no assumptions on the relationship between the conditional mean and variance. The new model has several advantages: (a) it provides a coherent semiparametric framework for discrete variate time series, in which the conditional mean and variance can be modeled separately; (b) it allows negative values both for the series and its autocorrelation function; (c) its autocorrelation structure is the same as that of a standard autoregressive (AR) process; (d) standard software for its estimation is directly applicable. For the new model, conditions for stationarity, ergodicity and the existence of moments are established and the consistency and asymptotic normality of the conditional least squares estimator are proved. Simulation experiments are carried out to assess the performance of the model. The analyses of real data sets illustrate the flexibility and usefulness of the RRINARCH model for obtaining more realistic forecast means and variances.

16.
In this paper nonparametric simultaneous tolerance limits are developed using rectangle probabilities for uniform order statistics. Consideration is given to the handling of censored data, and some comparisons are made with the parametric normal theory. The nonparametric regional estimation techniques of (i) confidence bands for a distribution function, (ii) simultaneous confidence intervals for quantiles and (iii) simultaneous tolerance limits are unified. A Bayesian approach is also discussed.
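A sketch of the classical distribution-free calculation underlying such limits: for an i.i.d. sample, the coverage of the interval between the r-th and s-th order statistics is Beta(s - r, n - s + r + 1), so the confidence that (X_(1), X_(n)) covers at least a proportion p of the population is that Beta distribution's survival function at p. The sample sizes below are illustrative.

```python
from scipy.stats import beta

def tolerance_confidence(n, p, r=1, s=None):
    """P(coverage of (X_(r), X_(s)) >= p) for an i.i.d. sample of size n."""
    s = n if s is None else s
    return beta.sf(p, s - r, n - s + r + 1)

for n in (50, 100, 300):
    print(n, tolerance_confidence(n, p=0.95))
```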

17.
The appropriate interpretation of measurements often requires standardization for concomitant factors. For example, standardization of weight for both height and age is important in obesity research and in failure-to-thrive research in children. Regression quantiles from a reference population afford one intuitive and popular approach to standardization. Current methods for the estimation of regression quantiles can be classified as nonparametric with respect to distributional assumptions or as fully parametric. We propose a semiparametric method where we model the mean and variance as flexible regression spline functions and allow the unspecified distribution to vary smoothly as a function of covariates. Similarly to Cole and Green, our approach provides separate estimates and summaries for location, scale and distribution. However, similarly to Koenker and Bassett, we do not assume any parametric form for the distribution. Estimation for either cross-sectional or longitudinal samples is obtained by using estimating equations for the location and scale functions and through local kernel smoothing of the empirical distribution function for standardized residuals. Using this technique with data on weight, height and age for females under 3 years of age, we find that there is a close relationship between quantiles of weight for height and age and quantiles of body mass index (BMI = weight/height²) for age in this cohort.

18.
For many continuous distributions, a closed-form expression for their quantiles does not exist. Numerical approximations for their quantiles are developed on a distribution-by-distribution basis. This work develops a general approximation for quantiles using the Taylor expansion. Our method only requires that the distribution has a continuous probability density function whose derivatives can be computed up to a certain order (usually 3 or 4). We demonstrate our unified approach by approximating the quantiles of the normal, exponential, and chi-square distributions. The approximation works well for these distributions.
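A toy version of the idea, not the paper's full machinery: expand the quantile function Q(p) around a point p0 with known quantile, using Q'(p) = 1/f(Q(p)) and Q''(p) = -f'(Q(p))/f(Q(p))^3. Shown to second order for the standard normal around the median.

```python
from scipy.stats import norm

p0, q0 = 0.5, 0.0                  # known anchor: Q(0.5) = 0
f0 = norm.pdf(q0)                  # f(Q(p0))
fp0 = -q0 * f0                     # phi'(x) = -x * phi(x)

def q_approx(p):
    # second-order Taylor expansion of the quantile function
    d = p - p0
    return q0 + d / f0 - fp0 * d**2 / (2 * f0**3)

for p in (0.55, 0.6, 0.7):
    print(p, q_approx(p), norm.ppf(p))   # approximation vs. exact
```

The expansion is accurate near p0 and degrades in the tails, which is why the paper carries the derivatives out to third or fourth order.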

19.
Population conditional quantiles at a given level α are useful as indices for identifying outliers. We propose a class of symmetric quantiles for estimating unknown nonlinear regression conditional quantiles. In large samples, symmetric quantiles are more efficient than the regression quantiles considered by Koenker and Bassett (Econometrica 46 (1978) 33) for small or large values of α when the underlying distribution is symmetric, in the sense that they have smaller asymptotic variances. Symmetric quantiles play a useful role in identifying outliers. In estimating nonlinear regression parameters by symmetric trimmed means constructed from symmetric quantiles, we show that their asymptotic variances can be very close to (or can even attain) the Cramér–Rao lower bound under symmetric heavy-tailed error distributions, whereas the usual robust and nonrobust estimators cannot.
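For reference, a brief sketch of the ordinary Koenker–Bassett regression quantiles that serve as the baseline here, via statsmodels; the symmetric-quantile estimator proposed in the paper is not implemented. The simulated heavy-tailed regression data are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=200)   # heavy-tailed errors

X = sm.add_constant(x)
for q in (0.1, 0.5, 0.9):
    res = sm.QuantReg(y, X).fit(q=q)     # Koenker–Bassett regression quantile
    print(q, res.params)
```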

20.
Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at the item level and multivariate normal imputation at the scale level. We investigated a hybrid approach which is a combination of MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. MLE occasionally provided the best results in terms of bias, but hardly ever in terms of MSE. The imputation methods at the scale level and logistic regression at the item level hardly ever showed the best performance. The hybrid approach performed similarly to or better than its original MI counterpart. The PMM-hybrid approach at the item level demonstrated the best MSE for most settings and in some cases also the smallest bias.
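A minimal numpy sketch of predictive mean matching for one incomplete variable: fit a regression on the complete cases, then impute each missing value with the observed value of a donor whose predicted mean is closest. A single imputation with k nearest donors is shown; proper MI repeats this with perturbed fits. Function names and the donor count are illustrative.

```python
import numpy as np

def pmm_impute(y, X, rng, k=5):
    obs = ~np.isnan(y)
    b, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)  # complete-case fit
    pred = X @ b
    y_imp, y_obs, pred_obs = y.copy(), y[obs], pred[obs]
    for i in np.nonzero(~obs)[0]:
        donors = np.argsort(np.abs(pred_obs - pred[i]))[:k]  # k nearest donors
        y_imp[i] = y_obs[donors[rng.integers(k)]]            # random donor value
    return y_imp

rng = np.random.default_rng(11)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
y[rng.random(100) < 0.2] = np.nan
print(np.nanmean(y), np.mean(pmm_impute(y, X, rng)))
```

Because imputed values are always real observed values, PMM never produces implausible entries, which is part of why it did well in the simulations above.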
