Similar Literature
20 similar documents found (search time: 46 ms).
1.
A boxplot is a simple and effective exploratory data analysis tool for graphically summarizing a distribution of data. However, in cases where the quartiles in a boxplot are inaccurately estimated, these estimates can affect subsequent analyses. In this paper, we consider the problem of constructing boxplots in a bivariate setting with a categorical covariate with multiple subgroups, and assume that some of these boxplots can be clustered. We propose to use this grouping property to improve the estimation of the quartiles. We demonstrate that the proposed method more accurately estimates the quartiles compared to the usual boxplot. It is also shown that the proposed method identifies outliers effectively as a consequence of accurate quartiles, and possesses a clustering effect due to the group property. We then apply the proposed method to annual maximum precipitation data in South Korea and present its clustering results.
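
For orientation, here is a minimal numpy sketch of the per-group quartile and fence computation in the usual boxplot that the paper improves on; the clustering-based pooled quartile estimator itself is not reproduced, and the function name is our own.

```python
import numpy as np

def boxplot_stats(x, k=1.5):
    """Standard boxplot summary: quartiles, IQR fences, flagged outliers.

    This is the usual per-group boxplot; the paper's method additionally
    pools information across clustered subgroups when estimating Q1-Q3.
    """
    q1, q2, q3 = np.quantile(x, [0.25, 0.50, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    outliers = x[(x < lo) | (x > hi)]
    return {"q1": q1, "median": q2, "q3": q3, "fences": (lo, hi), "outliers": outliers}

rng = np.random.default_rng(0)
print(boxplot_stats(rng.lognormal(size=200)))
```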

2.
In this paper, we propose a new Bayesian inference approach for classification based on the traditional hinge loss used for classical support vector machines, which we call the Bayesian Additive Machine (BAM). Unlike existing approaches, the new model has a semiparametric discriminant function where some feature effects are nonlinear and others are linear. This separation of features is achieved automatically during model fitting without user pre-specification. Following the literature on sparse regression of high-dimensional models, we can also identify the irrelevant features. By introducing spike-and-slab priors using two sets of indicator variables, these multiple goals are achieved simultaneously and automatically, without any parameter tuning such as cross-validation. An efficient partially collapsed Markov chain Monte Carlo algorithm is developed for posterior exploration based on a data augmentation scheme for the hinge loss. Our simulations and three real data examples demonstrate that the new approach is a strong competitor to some approaches that were proposed recently for dealing with challenging classification examples with high dimensionality.
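
A minimal sketch of the hinge loss the BAM likelihood is built around; the spike-and-slab priors, the semiparametric discriminant, and the partially collapsed sampler are well beyond this snippet.

```python
import numpy as np

def hinge_loss(y, f):
    """Hinge loss max(0, 1 - y*f) for labels y in {-1, +1} and scores f.

    A Bayesian treatment typically turns this loss into a pseudo-likelihood
    proportional to exp(-2 * hinge) via data augmentation (an assumption
    about the construction, not a quote from the paper).
    """
    return np.maximum(0.0, 1.0 - y * f)

y = np.array([1, -1, 1])
f = np.array([0.3, -2.0, 1.5])
print(hinge_loss(y, f))  # [0.7 0.  0. ]
```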

3.
We propose approximations to the moments, different possibilities for the limiting distributions, and approximate confidence intervals for the maximum-likelihood estimator of a given parametric function when sampling from partially non-regular log-exponential models. Our results are applicable to the two-parameter exponential, power-function, and Pareto distributions. Asymptotic confidence intervals for quartiles in several Pareto models have been simulated and compared to asymptotic intervals based on sample quartiles. Our intervals are superior, since we obtain shorter intervals with similar coverage probability; this superiority is even assessed probabilistically. Applications to real data are included.
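
As a rough illustration (not the authors' moment approximations), here is a sketch comparing ML-based quartiles with sample quartiles in a two-parameter Pareto model; the min-based scale estimate is the non-regular ingredient.

```python
import numpy as np

rng = np.random.default_rng(1)

def pareto_mle(x):
    """ML estimates for the two-parameter Pareto(alpha, x_m)."""
    xm = x.min()                              # MLE of scale: the non-regular part
    alpha = len(x) / np.log(x / xm).sum()     # MLE of the shape
    return alpha, xm

def pareto_quantile(p, alpha, xm):
    return xm * (1.0 - p) ** (-1.0 / alpha)

alpha0, xm0 = 2.0, 1.0
x = xm0 * (1 - rng.uniform(size=100)) ** (-1 / alpha0)   # inverse-CDF sampling
a, m = pareto_mle(x)
for p in (0.25, 0.50, 0.75):
    print(p, pareto_quantile(p, a, m), np.quantile(x, p), pareto_quantile(p, alpha0, xm0))
```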

4.
For J ≥ 2 independent groups, the article deals with testing the global hypothesis that all J groups have a common population median or identical quantiles, with an emphasis on the quartiles. Classic rank-based methods are sometimes suggested for comparing medians, but it is well known that under general conditions they do not adequately address this goal. Extant methods based on the usual sample median are unsatisfactory when there are tied values, except for the special case J = 2. A variation of the percentile bootstrap used in conjunction with the Harrell–Davis quantile estimator performs well in simulations. The method is illustrated with data from the Well Elderly 2 study.
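
The Harrell–Davis estimator (a Beta-weighted sum of order statistics) and a percentile bootstrap are standard enough to sketch; the global J-group test built on them is more involved and not reproduced here. Function names are ours.

```python
import numpy as np
from scipy.stats import beta

def harrell_davis(x, p=0.5):
    """Harrell-Davis quantile estimate: Beta((n+1)p, (n+1)(1-p))-weighted
    sum of the order statistics."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    i = np.arange(1, n + 1)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    w = beta.cdf(i / n, a, b) - beta.cdf((i - 1) / n, a, b)
    return np.sum(w * x)

def boot_ci_diff(x1, x2, p=0.5, B=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the difference of HD quantiles (J = 2 case)."""
    rng = np.random.default_rng(seed)
    d = np.array([harrell_davis(rng.choice(x1, len(x1)), p)
                  - harrell_davis(rng.choice(x2, len(x2)), p) for _ in range(B)])
    return np.quantile(d, [alpha / 2, 1 - alpha / 2])
```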

5.
Rank tests, such as the logrank or Wilcoxon rank sum tests, are widely used to compare the survival distributions of two or more groups in the presence of right censoring. However, there has been little research on sample size calculation methods for rank tests comparing more than two groups. An existing method is based on a crude approximation that tends to underestimate the sample size, i.e., the calculated sample size has lower power than projected. In this paper we propose an asymptotically correct method and an approximate method for sample size calculation. The proposed methods are compared to other methods through simulation studies.
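
For context, a sketch of the classic two-group Schoenfeld events formula, the standard baseline such multi-group methods generalize; this is not the paper's proposal, and the allocation and event-rate inputs are illustrative.

```python
import numpy as np
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.8, p1=0.5):
    """Required number of EVENTS for a two-group logrank test to detect
    hazard ratio `hr`, with allocation fraction p1 to group 1."""
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return (za + zb) ** 2 / (p1 * (1 - p1) * np.log(hr) ** 2)

d = schoenfeld_events(hr=0.7)
print(np.ceil(d))           # events needed
print(np.ceil(d / 0.6))     # sample size if ~60% of subjects are expected to fail
```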

6.
The functional boxplot is an attractive technique for visualizing data that come from functions. We propose an alternative to the functional boxplot based on depth measures. Our proposal generalizes the usual one-dimensional boxplot construction, which relies on the downward and upward orderings of the data, by considering two intuitive pre-orders in the functional context. These orderings are based on the epigraphs and hypographs of the data and allow a new definition of functional quartiles that is more robust to shape outliers. Simulated and real examples show that this proposal provides a convenient visualization technique with great potential for analyzing functional data, and illustrate its usefulness for detecting outliers that other procedures miss.
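
One common (strict) variant of the epigraph/hypograph indices is easy to sketch for curves stored as an array; the exact pre-orders and quartile definitions in the paper may differ from this simplified version.

```python
import numpy as np

def epigraph_hypograph_indices(X):
    """Strict epigraph/hypograph indices for functional data given as an
    (n_curves, n_points) array.

    EI_i = share of curves lying entirely above curve i (incl. itself);
    HI_i = share of curves lying entirely below curve i (incl. itself).
    Ranking curves by such indices induces the up/down pre-orders from
    which functional quartiles, and hence a boxplot, can be defined.
    """
    n = X.shape[0]
    ei = np.array([(X >= X[i]).all(axis=1).mean() for i in range(n)])
    hi = np.array([(X <= X[i]).all(axis=1).mean() for i in range(n)])
    return ei, hi
```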

7.
In single-arm clinical trials with survival outcomes, the Kaplan–Meier estimator and its confidence interval are widely used to assess survival probability and median survival time. Because the asymptotic normality of the Kaplan–Meier estimator is a standard result, sample size calculation methods have not been studied in depth. An existing sample size calculation method is founded on the asymptotic normality of the Kaplan–Meier estimator under the log transformation. However, the log-transformed estimator has quite poor small-sample properties (small samples being typical in single-arm trials), and the existing method uses an inappropriate standard normal approximation to calculate sample sizes. These issues can seriously affect the accuracy of the results. In this paper, we propose alternative methods to determine sample sizes based on a valid standard normal approximation, with several transformations that may give an accurate normal approximation even for small sample sizes. In numerical evaluations via simulations, some of the proposed methods provided more accurate results, and the empirical power of the proposed method with the arcsine square-root transformation tended to be closer to the prescribed power than the other transformations. These results were supported when the methods were applied to data from three clinical trials.
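
A sketch of one standard form of the arcsine-square-root confidence interval for a Kaplan–Meier survival probability; the paper's sample size formulas built on such transformations are not reproduced, and the numeric inputs below are illustrative.

```python
import numpy as np
from scipy.stats import norm

def km_ci_arcsine(s_hat, se_greenwood, conf=0.95):
    """Arcsine-square-root CI for a KM survival probability.

    `s_hat` is the KM estimate at the landmark time, `se_greenwood` its
    Greenwood standard error; the transformation keeps the interval in
    [0, 1], which helps in small samples.
    """
    z = norm.ppf(0.5 + conf / 2)
    half = z * se_greenwood / (2 * np.sqrt(s_hat * (1 - s_hat)))
    theta = np.arcsin(np.sqrt(s_hat))
    lo = np.sin(np.clip(theta - half, 0, np.pi / 2)) ** 2
    hi = np.sin(np.clip(theta + half, 0, np.pi / 2)) ** 2
    return lo, hi

print(km_ci_arcsine(0.8, 0.06))
```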

8.
We extend four tests common in classical regression – Wald, score, likelihood ratio and F tests – to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
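
The re-expression step is easy to sketch for a fixed number K of components: project the curves onto their leading principal components and run a classical F test on the scores. This is a simplified stand-in under dense observation; the paper's theory covers diverging K.

```python
import numpy as np
from scipy.stats import f as f_dist

def fpc_f_test(Y, X, K=3):
    """F-test of 'no association' between scalar Y and functional X
    ((n_samples, n_points) curves), after projecting X onto its first K
    principal components via SVD of the centered curves."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :K] * s[:K]                  # FPC scores
    Z = np.column_stack([np.ones(len(Y)), scores])
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    rss1 = np.sum((Y - Z @ beta) ** 2)
    rss0 = np.sum((Y - Y.mean()) ** 2)         # null model: intercept only
    n = len(Y)
    F = ((rss0 - rss1) / K) / (rss1 / (n - K - 1))
    return F, f_dist.sf(F, K, n - K - 1)
```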

9.
The logratio methodology is not applicable when rounded zeros occur in compositional data. Many methods exist to deal with rounded zeros, but some are not suitable for analyzing data sets of high dimensionality. Related methods have recently been developed, but they cannot balance calculation time and accuracy. For further improvement, we propose a method based on regression imputation with Q-mode clustering. This method forms groups of parts and builds a partial least squares regression with these groups using centered logratio coordinates. We also prove that using centered logratio coordinates or isometric logratio coordinates in the response of the partial least squares regression yields equivalent results for the replacement of rounded zeros. A simulation study and a real example are conducted to analyze the performance of the proposed method. The results show that the proposed method reduces calculation time in higher dimensions and improves the quality of the results.
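
The centered logratio (clr) coordinate change itself is simple enough to sketch; note that rounded zeros must be replaced before applying it, which is exactly what the paper's PLS-based imputation addresses.

```python
import numpy as np

def clr(X):
    """Centered logratio coordinates for strictly positive compositions
    (one composition per row). clr rows always sum to zero."""
    L = np.log(np.asarray(X, float))
    return L - L.mean(axis=1, keepdims=True)

print(clr(np.array([[0.2, 0.3, 0.5]])))
print(clr(np.array([[0.2, 0.3, 0.5]])).sum())  # ~0 by construction
```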

10.
In this paper we shall deal with the AOQL single sampling plans for inspection by variables when the remainder of rejected lots is inspected. We shall report on an algorithm allowing the calculation of these plans; the calculation is considerably difficult and requires the sequential use of three numerical methods. For the calculation we shall derive a new theorem (see Theorem 4) and we shall use an original method.

11.
In this paper we present relatively simple (ruler, paper, and pencil) nonparametric procedures for constructing joint confidence regions for (i) the median and the interquartile range in the symmetric one-sample problem and (ii) the shift and ratio of scale parameters in the two-sample case. Both procedures are functions of the sample quartiles and have exact confidence levels when the populations are continuous. The one-sample case requires symmetry of the first and third quartiles about the median.

The confidence regions we propose are always convex, nested for decreasing confidence levels, and compact for reasonably large sample sizes. Both exact small-sample and approximate large-sample distributions are given.

12.
Missing data are commonly encountered in self-reported measurements and questionnaires. It is crucial to treat missing values with an appropriate method to avoid bias and loss of power. Various types of imputation methods exist, but it is not always clear which method is preferred for imputing data with non-normal variables. In this paper, we compare four imputation methods: mean imputation, quantile imputation, multiple imputation, and quantile regression multiple imputation (QRMI), using both simulated and real data investigating factors affecting self-efficacy in breast cancer survivors. The results show an advantage of multiple imputation, especially QRMI, when the data are non-normal.
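
A toy simulation showing why mean imputation is poor for non-normal data, which motivates the quantile-based alternatives compared in the paper; this illustrates the problem, not the QRMI algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=2000)   # clearly non-normal
miss = rng.uniform(size=x.size) < 0.3               # 30% MCAR missingness

x_mean_imp = x.copy()
x_mean_imp[miss] = x[~miss].mean()                  # single mean imputation

# Mean imputation piles mass at one value and distorts the quantiles of
# a skewed variable; compare against the complete data.
for p in (0.25, 0.50, 0.75):
    print(p, np.quantile(x, p), np.quantile(x_mean_imp, p))
```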

13.
Euclidean distance k-nearest neighbor (k-NN) classifiers are simple nonparametric classification rules. Bootstrap methods, widely used for estimating the expected prediction error of classification rules, are motivated by the objective of calculating the ideal bootstrap estimate of expected prediction error. In practice, bootstrap methods use Monte Carlo resampling to estimate the ideal bootstrap estimate because exact calculation is generally intractable. In this article, we present analytical formulae for exact calculation of the ideal bootstrap estimate of expected prediction error for k-NN classifiers and propose a new weighted k-NN classifier based on resampling ideas. The resampling-weighted k-NN classifier replaces the k-NN posterior probability estimates by their expectations under resampling and predicts an unclassified covariate as belonging to the group with the largest resampling expectation. A simulation study and an application involving remotely sensed data show that the resampling-weighted k-NN classifier compares favorably to unweighted and distance-weighted k-NN classifiers.
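
For contrast with the paper's exact formulae, a sketch of the Monte Carlo (out-of-bag style) bootstrap approximation of expected prediction error for a 1-NN classifier, the quantity the authors compute analytically.

```python
import numpy as np

def mc_bootstrap_error_1nn(X, y, B=200, seed=0):
    """Monte Carlo approximation of the bootstrap estimate of expected
    prediction error for 1-NN: train on each resample, test out-of-bag."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(B):
        idx = rng.integers(0, n, n)               # bootstrap resample
        out = np.setdiff1d(np.arange(n), idx)     # out-of-bag points
        if out.size == 0:
            continue
        D = np.linalg.norm(X[out][:, None] - X[idx][None], axis=2)
        pred = y[idx][D.argmin(axis=1)]
        errs.append((pred != y[out]).mean())
    return np.mean(errs)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.repeat([0, 1], 30)
print(mc_bootstrap_error_1nn(X, y))
```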

14.
Univariate Pareto distributions have been extensively studied. In this article, we propose a Bayesian inference methodology in the context of multivariate Pareto distributions of the second kind (Mardia's type). Computational techniques organized around Gibbs sampling with data augmentation are proposed to implement Bayesian inference in practice. The new methods are shown to work well in artificial examples involving a trivariate distribution and in an empirical application involving daily exchange rate data for four major currencies.

15.
Consistency of propensity score matching estimators hinges on the propensity score's ability to balance the distributions of covariates in the pools of treated and non-treated units. Conventional balance tests merely check for differences in covariates’ means, but cannot account for differences in higher moments. For this reason, this paper proposes balance tests which test for differences in the entire distributions of continuous covariates based on quantile regression (to derive Kolmogorov–Smirnov and Cramer–von-Mises–Smirnov-type test statistics) and resampling methods (for inference). Simulations suggest that these methods are very powerful and capture imbalances related to higher moments when conventional balance tests fail to do so.
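
A simplified stand-in for the paper's construction: a Kolmogorov–Smirnov-type comparison of a covariate's full distribution between treated and controls, with a permutation p-value in place of the quantile-regression-based derivation.

```python
import numpy as np

def ks_balance_test(x, d, B=999, seed=0):
    """KS-type balance check for covariate x between treated (d == 1)
    and controls (d == 0), with permutation inference."""
    rng = np.random.default_rng(seed)

    def ks_stat(x, d):
        grid = np.sort(x)
        F1 = (x[d == 1][:, None] <= grid).mean(axis=0)   # treated ECDF
        F0 = (x[d == 0][:, None] <= grid).mean(axis=0)   # control ECDF
        return np.abs(F1 - F0).max()

    t0 = ks_stat(x, d)
    perm = np.array([ks_stat(x, rng.permutation(d)) for _ in range(B)])
    return t0, (1 + (perm >= t0).sum()) / (B + 1)        # stat, p-value
```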

16.
Contemporary statistical research frequently deals with problems involving a diverging number of parameters. For such problems, various shrinkage methods (e.g. the lasso and smoothly clipped absolute deviation) are found to be particularly useful for variable selection. Nevertheless, the desirable performance of these shrinkage methods hinges heavily on an appropriate selection of the tuning parameters. With a fixed predictor dimension, Wang and co-workers have demonstrated that tuning parameters selected by a Bayesian information criterion type criterion can identify the true model consistently. In this work, similar results are extended to the situation with a diverging number of parameters for both unpenalized and penalized estimators. Consequently, our theoretical results enlarge not only the scope of applicability of Bayesian information criterion type criteria but also that of those shrinkage estimation methods.
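
In practice, BIC-type lasso tuning is readily available; here is a sketch using scikit-learn's LassoLarsIC, which scores the whole LARS path by BIC and returns the minimizer. The simulated sparse design is illustrative, and the paper's diverging-dimension theory is of course not reproduced.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(3)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                 # sparse truth
y = X @ beta + rng.normal(size=n)

fit = LassoLarsIC(criterion="bic").fit(X, y)
print(np.nonzero(fit.coef_)[0])             # ideally recovers {0, 1, 2}
```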

17.
Much of the small-area estimation literature focuses on population totals and means. However, users of survey data are often interested in the finite-population distribution of a survey variable and in the measures (e.g. medians, quartiles, percentiles) that characterize the shape of this distribution at the small-area level. In this paper we propose a model-based direct estimator (MBDE, Chandra and Chambers) of the small-area distribution function. The MBDE is defined as a weighted sum of sample data from the area of interest, with weights derived from the calibrated spline-based estimate of the finite-population distribution function introduced by Harms and Duchesne, under an appropriately specified regression model with random area effects. We also discuss the mean squared error estimation of the MBDE. Monte Carlo simulations based on both simulated and real data sets show that the proposed MBDE and its associated mean squared error estimator perform well when compared with alternative estimators of the area-specific finite-population distribution function.

18.
In this paper we introduce a flexible extension of the Gumbel distribution called the odd log-logistic exponentiated Gumbel distribution. The new model is implemented in the GAMLSS package of the R software, and a brief tutorial on how to use this package is presented throughout the paper. We provide a comprehensive treatment of the model's general mathematical properties. Further, we propose a new extended regression model considering four regression structures. We discuss estimation methods based on censored and uncensored data. Two simulation studies are presented, and four real data sets are used to illustrate the usefulness of the new model.
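
A sketch of what such a CDF could look like, assuming the usual odd log-logistic-G construction F = P^τ / (P^τ + (1 − P)^τ) applied to an exponentiated Gumbel baseline P = G^a; the authors' exact parameterization may differ. Inversion sampling works because F is monotone.

```python
import numpy as np

def ollEG_cdf(x, mu=0.0, sigma=1.0, a=1.0, tau=1.0):
    """Hypothetical odd log-logistic exponentiated Gumbel CDF, assuming
    the standard OLL-G construction with baseline P = G^a, where G is
    the Gumbel (maximum) CDF."""
    G = np.exp(-np.exp(-(x - mu) / sigma))
    P = G ** a
    return P ** tau / (P ** tau + (1 - P) ** tau)

def ollEG_rvs(n, seed=0, **kw):
    """Inversion sampling: solve F(x) = u by bisection (F is monotone)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    lo, hi = np.full(n, -50.0), np.full(n, 50.0)
    for _ in range(60):                       # bisection to high precision
        mid = (lo + hi) / 2
        below = ollEG_cdf(mid, **kw) < u
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
    return (lo + hi) / 2

print(ollEG_rvs(5, a=2.0, tau=0.5))
```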

19.
We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set.

20.
Reuse of controls in a nested case-control (NCC) study has not been considered feasible, since the controls are matched to their respective cases. However, in the last decade or so, methods have been developed that break the matching and allow for analyses where the controls are no longer tied to their cases. These methods can be divided into two groups: weighted partial likelihood (WPL) methods and full maximum likelihood methods. The weights in the WPL can be estimated in different ways, and four estimation procedures are discussed. In addition, we address the modifications needed to accommodate left truncation. A full likelihood approach is also presented, and we suggest an aggregation technique to decrease the computation time. Furthermore, we generalize calibration for case-cohort designs to NCC studies. We consider a competing risks situation and compare WPL, full likelihood, and calibration through simulations and through the analysis of a real data example.
