Similar documents
20 similar documents retrieved (search time: 15 ms)
1.
Nonparametric approaches to classification have gained significant attention in the last two decades. In this paper, we propose a classification methodology based on multivariate rank functions and show that it is a Bayes rule for spherically symmetric distributions with a location shift. We show that the rank-based classifier is equivalent to the optimal Bayes rule under suitable conditions. We also present an affine-invariant version of the classifier. To accommodate different covariance structures, we construct a classifier based on the central rank region. Asymptotic properties of these classification methods are studied. We illustrate the performance of the proposed methods, in comparison with some other depth-based classifiers, using simulated and real data sets.
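To make the idea concrete, here is a minimal sketch of a maximum-depth-style classifier using Mahalanobis depth; it illustrates the general depth/rank classification principle only and is not the authors' exact multivariate-rank rule (the function names and simulated classes are illustrative).

```python
# Minimal sketch of a depth-based classifier (Mahalanobis depth); illustrative only,
# not the exact multivariate-rank rule proposed in the paper.
import numpy as np

def mahalanobis_depth(x, sample):
    """Depth of point x within 'sample': 1 / (1 + squared Mahalanobis distance)."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(sample, rowvar=False))
    d2 = (x - mu) @ cov_inv @ (x - mu)
    return 1.0 / (1.0 + d2)

def max_depth_classify(x, class_samples):
    """Assign x to the class in which it attains the largest depth."""
    depths = {label: mahalanobis_depth(x, s) for label, s in class_samples.items()}
    return max(depths, key=depths.get)

rng = np.random.default_rng(0)
classes = {0: rng.normal(0.0, 1.0, size=(200, 2)),   # class 0 centred at the origin
           1: rng.normal(2.0, 1.0, size=(200, 2))}   # class 1 shifted in location
print(max_depth_classify(np.array([1.8, 2.1]), classes))  # expected label: 1
```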

2.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values, making the outlier less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established MI methods as well as a newly proposed one, nine in all, on several actual clinical laboratory data sets of different dimensions. Two criteria, an 'outlier recovery probability' and a 'relative accuracy measure', are developed based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, the study compares not only the imputation methods but also the outlier detection methods. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as on the size and dimension of the data and the fraction of missing components. By taking these features into account, a more suitable MI method can be selected for a given data set.
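As a point of reference, one of the benchmark identifiers mentioned above (the robust Mahalanobis distance rule) can be sketched as follows; the MCD estimator and the chi-square cutoff are illustrative choices rather than the paper's exact configuration.

```python
# Sketch of a robust Mahalanobis distance outlier identifier: points whose robust
# distance exceeds a chi-square cutoff are flagged. Illustrative choices throughout.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def robust_mahalanobis_outliers(X, alpha=0.975):
    mcd = MinCovDet(random_state=0).fit(X)      # robust location/scatter (MCD)
    d2 = mcd.mahalanobis(X)                     # squared robust Mahalanobis distances
    cutoff = chi2.ppf(alpha, df=X.shape[1])     # chi-square quantile cutoff
    return d2 > cutoff                          # boolean mask of flagged outliers

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[0] = [8.0, -8.0, 8.0]                         # planted outlier
print(np.where(robust_mahalanobis_outliers(X))[0])
```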

3.
AStA Advances in Statistical Analysis - Notions of data depth have motivated nonparametric multivariate analysis, especially in supervised learning. Maximum depth classifiers, classifiers based on...

4.
One of the greatest challenges in using piecewise exponential models (PEMs) is finding an adequate grid of time-points for their construction. In general, the number of intervals in such a grid and the positions of their endpoints are ad hoc choices. We extend previous work by introducing a fully Bayesian approach to the piecewise exponential model in which the grid of time-points (and, consequently, the endpoints and the number of intervals) is random. We estimate the failure rates using the proposed procedure and compare the results with the non-parametric piecewise exponential estimates. Estimates of the survival function using the most probable partition are compared with the Kaplan-Meier estimators (KMEs). A sensitivity analysis for the proposed model is provided, considering different prior specifications for the failure rates and for the grid. We also evaluate the effect of different percentages of censored observations on the estimates. An application to a real data set is also provided. We observe that the posteriors are strongly influenced by the prior specifications, mainly for the failure rate parameters. The priors must therefore be constructed carefully, so that they genuinely reflect expert prior opinion.
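For orientation, the log-likelihood of a piecewise exponential model on a fixed grid can be sketched as below; the paper's contribution, treating the grid itself as random in a fully Bayesian way, is not reproduced here, and the grid and data values are made up for illustration.

```python
# Sketch of the piecewise exponential log-likelihood for a *fixed* grid of time-points
# with right-censored data; the random-grid Bayesian treatment is not shown.
import numpy as np

def pem_loglik(rates, grid, times, events):
    """rates[j] is the constant hazard on (grid[j], grid[j+1]]; grid[0]=0, grid[-1]>=max(times)."""
    rates, grid, times, events = map(np.asarray, (rates, grid, times, events))
    ll = 0.0
    for t, d in zip(times, events):
        # time at risk contributed by this subject to each interval
        exposure = np.clip(np.minimum(t, grid[1:]) - grid[:-1], 0.0, None)
        ll -= np.sum(rates * exposure)                  # cumulative-hazard contribution
        if d == 1:                                      # event observed: add log hazard at t
            j = np.searchsorted(grid, t, side="left") - 1
            ll += np.log(rates[j])
    return ll

grid = [0.0, 1.0, 2.0, 5.0]                             # illustrative grid of time-points
print(pem_loglik([0.2, 0.4, 0.1], grid, times=[0.5, 1.5, 4.0, 3.0], events=[1, 1, 0, 1]))
```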

5.
This paper presents a Bayesian non-parametric approach to survival analysis based on arbitrarily right-censored data. The analysis is based on posterior predictive probabilities using a Polya tree prior distribution on the space of probability measures on [0, ∞). In particular, we show that the estimate generalizes the classical Kaplan–Meier non-parametric estimator, which is obtained in the limiting case as the weight of prior information tends to zero.
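The limiting case referred to above is the classical Kaplan–Meier product-limit estimator; a compact sketch for right-censored data (not the Polya tree computation itself) is given below.

```python
# Product-limit (Kaplan-Meier) estimate of the survival function for right-censored data.
import numpy as np

def kaplan_meier(times, events):
    times = np.asarray(times, float)
    events = np.asarray(events, int)            # 1 = event observed, 0 = right-censored
    surv, t_out, s_out = 1.0, [], []
    for t in np.unique(times[events == 1]):     # distinct observed event times
        at_risk = np.sum(times >= t)            # subjects still at risk just before t
        deaths = np.sum((times == t) & (events == 1))
        surv *= 1.0 - deaths / at_risk          # product-limit update
        t_out.append(t)
        s_out.append(surv)
    return np.array(t_out), np.array(s_out)

t, s = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
print(np.column_stack([t, s]))
```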

6.
When the outcome of interest is semicontinuous and collected longitudinally, efficient testing can be difficult. Daily rainfall data are an excellent example, which we use to illustrate the various challenges. Even under the simplest scenario, the popular 'two-part model', which uses correlated random effects to account for both the semicontinuous and longitudinal characteristics of the data, often requires prohibitively intensive numerical integration and is difficult to interpret. Reducing the data to binary (truncating continuous positive values to equal one), while relatively straightforward, leads to a potentially substantial loss of power. We propose an alternative: a non-parametric rank test recently proposed for joint longitudinal and survival data. We investigate the potential benefits of such a test for the analysis of semicontinuous longitudinal data with regard to power and computational feasibility.

7.
Using 1998 and 1999 singleton birth data from the State of Florida, we study the stability of classification trees. Tree stability depends on both the learning algorithm and the specific data set. In this study, test samples are used to evaluate both stability and predictive performance. We also use the bootstrap resampling technique, which can be regarded as data self-perturbation, to evaluate the sensitivity of the modelling algorithm to the specific data set. We demonstrate that the choice of cost function plays an important role in stability. In particular, classifiers with equal misclassification costs and equal priors are less stable than those with unequal misclassification costs and equal priors.
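A rough sketch of the bootstrap self-perturbation idea follows: trees are refit on resampled training data and their test-set predictions are compared to gauge stability. The cost structure is mimicked here with class weights on synthetic data; the paper's exact cost functions and the Florida birth data are not reproduced.

```python
# Measuring tree stability via bootstrap refitting: higher average agreement between
# the predictions of bootstrap-trained trees indicates a more stable classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def bootstrap_agreement(class_weight, n_boot=50, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X_tr), len(X_tr))       # bootstrap resample of the training set
        tree = DecisionTreeClassifier(max_depth=5, class_weight=class_weight, random_state=0)
        tree.fit(X_tr[idx], y_tr[idx])
        preds.append(tree.predict(X_te))
    preds = np.array(preds)
    # stability: average pairwise agreement of test-set predictions across bootstrap trees
    return np.mean([(preds[i] == preds[j]).mean()
                    for i in range(n_boot) for j in range(i + 1, n_boot)])

print("equal costs:  ", bootstrap_agreement(None))         # equal misclassification costs
print("unequal costs:", bootstrap_agreement({0: 1, 1: 5})) # costs mimicked via class weights
```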

8.
Li and Liu [New nonparametric tests of multivariate locations and scales. Statist Sci. 2004;19(4):686–696] introduced two tests for a difference in locations of two multivariate distributions based on the concept of data depth. Using the simplicial depth [Liu RY. On a notion of data depth based on random simplices. Ann Stat. 1990;18(1):405–414], they studied the performance of these tests for symmetric distributions, namely the normal and the Cauchy, in a simulation study. However, to the best of our knowledge, the performance of these tests for skewed distributions has not been studied in the current literature. This paper is a contribution in that direction and examines the performance of these depth-based tests in an extensive simulation study involving ten distributions belonging to five well-known families of multivariate skewed distributions. The study includes a comparison of the performance of these tests for four popular affine-invariant depth functions. Conclusions and recommendations are offered.

9.
Balanced Confidence Regions Based on Tukey's Depth and the Bootstrap (total citations: 1; self-citations: 0; citations by others: 1)
We propose and study bootstrap confidence regions for multivariate parameters based on Tukey's depth. The bootstrap is based on the normalized or Studentized statistic formed from an independent and identically distributed random sample from some unknown distribution in R^q. The bootstrap points are deleted on the basis of Tukey's depth until the desired confidence level is reached. The proposed confidence regions are shown to be second-order balanced in the context discussed by Beran. We also study the asymptotic consistency of Tukey's depth-based bootstrap confidence regions. The applicability of the proposed method is demonstrated in a simulation study.
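The peeling step can be sketched roughly as follows: bootstrap replicates of a multivariate statistic are ranked by Tukey's depth and the shallowest points are deleted until the desired coverage remains. A random-projection approximation to Tukey's halfspace depth is used here purely for illustration, not the exact depth computation.

```python
# Peeling bootstrap replicates by (approximate) Tukey halfspace depth to form a
# depth-based confidence region; the projection approximation is an illustrative shortcut.
import numpy as np

def approx_tukey_depth(points, n_dirs=500, seed=0):
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, points.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = points @ dirs.T                               # (n_points, n_dirs) projections
    ranks = np.argsort(np.argsort(proj, axis=0), axis=0)
    n = len(points)
    one_sided = np.minimum(ranks + 1, n - ranks) / n     # smaller tail fraction per direction
    return one_sided.min(axis=1)                         # approximate halfspace depth

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 2))
boot_means = np.array([data[rng.integers(0, 200, 200)].mean(axis=0) for _ in range(1000)])
depth = approx_tukey_depth(boot_means)
keep = depth >= np.quantile(depth, 0.05)                 # delete shallowest 5% -> ~95% region
print("points retained in the 95% depth region:", keep.sum())
```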

10.
In many applications in applied statistics, researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually have high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data-compression approaches is that the data do not have to be linearised. Applying our ideas to the United Nations Human Development Index, we find that the four variables used in its construction have dimension 3 and that the index loses information.
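The Takens estimator itself is simple to state: for all pairwise distances below a cutoff r0, the estimated correlation dimension is the negative reciprocal of the mean log relative distance. A sketch follows, with an illustrative quantile-based cutoff that is an assumption here, not a recommendation from the paper.

```python
# Takens maximum-likelihood estimator of correlation dimension from pairwise distances
# below a cutoff r0; the cutoff choice (a distance quantile) is illustrative.
import numpy as np
from scipy.spatial.distance import pdist

def takens_dimension(X, cutoff_quantile=0.1):
    r = pdist(np.asarray(X, float))                  # all pairwise Euclidean distances
    r0 = np.quantile(r, cutoff_quantile)             # cutoff radius
    r_small = r[(r < r0) & (r > 0)]
    return -1.0 / np.mean(np.log(r_small / r0))      # Takens' estimator

rng = np.random.default_rng(3)
X = rng.uniform(size=(2000, 3))                      # points filling a 3-D cube
print(takens_dimension(X))                           # roughly 3; the cutoff and edge effects bias it slightly
```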

11.
A general inductive Bayesian classification framework is considered, using a simultaneous predictive distribution for test items. We introduce a principle of generative supervised and semi-supervised classification based on marginalizing the joint posterior distribution of labels for all test items. The simultaneous and marginalized classifiers arise under different loss functions, while both jointly acknowledge all uncertainty about the labels of test items and the generating probability measures of the classes. We illustrate, for data from multiple finite alphabets, that such classifiers achieve higher correct classification rates than a standard marginal predictive classifier, which labels all test items independently, when training data are sparse. In the supervised case with multiple finite alphabets, the simultaneous and marginal classifiers are proven to become equal under generalized exchangeability as the amount of training data increases. Hence, the marginal classifier can be interpreted as an asymptotic approximation to the simultaneous classifier for finite sets of training data. It is also shown that such convergence is not guaranteed in the semi-supervised setting, where the marginal classifier does not provide a consistent approximation.

12.
In this paper, the indicator approach to spatial data analysis is presented for determining probability distributions that characterize the uncertainty about any unknown value. Such an analysis is non-parametric and is carried out independently of the estimate retained. These distributions are given through a series of quantile estimates and are not tied to any particular prior model or shape. Moreover, determination of these distributions accounts for both the data configuration and the data values. An application is discussed, and some properties related to the Gaussian model are presented.

13.
Interval-censored survival data arise very frequently, where the event of interest is not observed exactly but is only known to occur within some time interval. In this paper, we propose a location-scale regression model based on the log-generalized gamma distribution for modelling interval-censored data. We are concerned only with parametric forms. The proposed model for interval-censored data represents a parametric family that contains, as special submodels, other regression models widely used in lifetime data analysis. Assuming interval-censored data, we consider a frequentist analysis, a jackknife estimator, and a non-parametric bootstrap for the model parameters. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some techniques for performing global influence analysis.
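The core of an interval-censored likelihood is that each observation known to fall in (L, R] contributes F(R) - F(L) under the assumed lifetime distribution. The sketch below uses a normal error on the log scale (a log-normal lifetime) for brevity; the paper's log-generalized gamma model nests this as a special case, and the interval data here are made up.

```python
# Interval-censored likelihood for a location-scale model on the log scale; each interval
# (L, R] contributes P(L < T <= R). Normal errors are used here for brevity.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(params, left, right):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    zl = (np.log(left) - mu) / sigma
    zr = (np.log(right) - mu) / sigma
    p = norm.cdf(zr) - norm.cdf(zl)                  # P(L < T <= R) for each observation
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

left = np.array([1.0, 2.0, 0.5, 3.0])                # illustrative interval lower endpoints
right = np.array([2.0, 4.0, 1.5, 6.0])               # illustrative interval upper endpoints
fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(left, right))
print("mu, sigma:", fit.x[0], np.exp(fit.x[1]))
```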

14.
Generalized partially linear varying-coefficient models (total citations: 1; self-citations: 0; citations by others: 1)
Generalized varying-coefficient models are useful extensions of generalized linear models. They arise naturally when investigating how regression coefficients change across groups characterized by certain covariates, such as age. In this paper, we extend these models to generalized partially linear varying-coefficient models, in which some coefficients are constants and the others are functions of certain covariates. Procedures for estimating the linear and non-parametric parts are developed, and their associated statistical properties are studied. The proposed methods are illustrated using simulations and real data analysis.

15.
Summary. Although some researchers have examined posterior multimodality for specific richly parameterized models, multimodality is not well characterized for any such model. The paper characterizes bimodality of the joint and marginal posteriors for a conjugate analysis of the balanced one-way random-effects model with a flat prior on the mean. This apparently simple model has surprisingly complex and even bizarre mode behaviour. Bimodality usually arises when the data indicate a much larger between-groups variance than does the prior. We examine an example in detail, present a graphical display for describing bimodality, and use real data sets from a statistical practice to shed light on the practical relevance of bimodality for these models.

16.
In this paper, we use simulated data to investigate the power of different causality tests in a two-dimensional vector autoregressive (VAR) model. The data are generated in a nonlinear environment that is modelled using a logistic smooth transition autoregressive function. We use both linear and nonlinear causality tests to investigate the unidirectional causality relationship and compare the power of these tests. The linear test is the commonly used Granger causality F test. The nonlinear test is a non-parametric test based on Baek and Brock [A general test for non-linear Granger causality: Bivariate model. Tech. Rep., Iowa State University and University of Wisconsin, Madison, WI, 1992] and Hiemstra and Jones [Testing for linear and non-linear Granger causality in the stock price–volume relation, J. Finance 49(5) (1994), pp. 1639–1664]. When implementing the nonlinear test, we use separately the original data, the linear VAR filtered residuals, and the wavelet-decomposed series based on wavelet multiresolution analysis. The VAR filtered residuals and the wavelet-decomposed series are used to extract the nonlinear structure of the original data. The simulation results show that the non-parametric test based on the wavelet-decomposed series (which is a model-free approach) has the highest power for detecting the causality relationship in nonlinear models.
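For reference, the linear Granger causality F test used as the benchmark above compares a restricted autoregression of y on its own lags with an unrestricted model that adds lags of x; a sketch with an illustrative lag order and simulated data follows (not the paper's smooth transition design).

```python
# Linear Granger causality F test: does adding lags of x improve the prediction of y?
import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(y, x, p=2):
    y, x = np.asarray(y, float), np.asarray(x, float)
    T = len(y)
    target = y[p:]
    def lags(s):
        # column j holds the series lagged by j, aligned with target y[p:]
        return np.column_stack([s[p - j:T - j] for j in range(1, p + 1)])
    X_r = np.column_stack([np.ones(T - p), lags(y)])        # restricted: own lags only
    X_u = np.column_stack([X_r, lags(x)])                   # unrestricted: plus lags of x
    def rss(Xmat):
        beta, *_ = np.linalg.lstsq(Xmat, target, rcond=None)
        return np.sum((target - Xmat @ beta) ** 2)
    rss_r, rss_u = rss(X_r), rss(X_u)
    df1, df2 = p, (T - p) - X_u.shape[1]
    F = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return F, 1.0 - f_dist.cdf(F, df1, df2)

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(2, 500):                                     # y depends on lagged x by construction
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()
print(granger_f_test(y, x, p=2))                            # a very small p-value is expected
```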

17.
Several methods for comparing k populations have been proposed in the literature. These methods assess the same null hypothesis of equal distributions but differ in the alternative hypothesis they consider. We focus on two important alternative hypotheses: monotone and umbrella ordering. Two new families of test statistics are proposed; they include two known tests as well as two new tests that are powerful under monotone ordering. Furthermore, these families are adapted for testing umbrella ordering. We compare some members of the families with respect to power and Type I error under different simulation scenarios. Finally, the methods are illustrated in several applications to real data.

18.
We apply univariate sliced inverse regression to survival data. Our approach differs from other papers on this subject. Right-censored observations are taken into account during the slicing of the survival times by assigning each of them, with equal weight, to all of the slices with longer survival. We test this method with different distributions for the two main survival data models, the accelerated lifetime model and Cox's proportional hazards model. In both cases, and under different conditions of sparsity, sample size, and dimension of the parameters, this non-parametric approach recovers the data structure and can be viewed as a variable selector.
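A minimal sketch of standard sliced inverse regression for complete (uncensored) responses is shown below; the paper's weighting scheme, which spreads each right-censored observation over all later slices, is not reproduced.

```python
# Sliced inverse regression (SIR): whiten X, slice by the sorted response, and
# eigendecompose the weighted covariance of the slice means.
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    X = np.asarray(X, float)
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    cov_inv_sqrt = np.linalg.inv(np.linalg.cholesky(cov)).T   # whitening transform
    Z = (X - mu) @ cov_inv_sqrt
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)                   # slice indices by sorted response
    M = np.zeros((X.shape[1], X.shape[1]))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / len(y)) * np.outer(m, m)               # weighted slice-mean covariance
    vals, vecs = np.linalg.eigh(M)
    return cov_inv_sqrt @ vecs[:, ::-1][:, :n_dirs]             # leading directions, original scale

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + 0.5 * X[:, 1] + 0.2 * rng.normal(size=1000)       # true direction ~ (1, 0.5, 0, 0, 0)
b = sir_directions(X, y)
print(b.ravel() / b.ravel()[0])                                 # roughly proportional to (1, 0.5, 0, 0, 0)
```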

19.
In this paper, we investigate the application of stochastic complexity theory to classification problems. In particular, we define the notion of admissible models as a function of problem complexity, the number of data points N, and prior belief. This allows us to derive general bounds relating classifier complexity to data-dependent parameters such as sample size, class entropy, and the optimal Bayes error rate. We discuss the application of these results to a variety of problems, including decision tree classifiers, Markov models for image segmentation, and feedforward multilayer neural network classifiers.

20.
In this article, we introduce a new method for modelling curves with dynamic structures, using a non-parametric approach formulated as a state space model. The non-parametric approach is based on penalised splines, represented as a dynamic mixed model. This formulation can capture the dynamic evolution of curves using a limited number of latent factors, allowing an accurate fit with a small number of parameters. We also present a new method for determining the optimal smoothing parameter through an adaptive procedure, using a formulation analogous to a stochastic volatility (SV) model. The non-parametric state space model allows us to unify different methods applied to data with a functional structure in finance. We present the advantages and limitations of this method through simulation studies and by comparing its predictive performance with other parametric and non-parametric methods used in financial applications, using data on the term structure of interest rates.
