Similar literature
 20 similar documents found (search time: 171 ms)
1.
Hellinger distances are considered as measures of distance between density functions, and an inequality concerning different Hellinger distances is proved. Distance measures based on the α-entropy are proposed, and their relationship to a Hellinger distance is shown. Furthermore, explicit expressions for the distance measures examined are derived in a one-parameter class of density functions, including Weibull, Gamma, and Maxwell distributions.
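As a rough illustration of the kind of quantity discussed in this abstract, the sketch below computes a Hellinger distance between two discrete distributions. The function name `hellinger` and the normalization (a factor 1/2 inside the square root, so that values lie in [0, 1]) are assumptions chosen for illustration; the paper itself considers several Hellinger distances on density functions and may use other conventions.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    Assumed convention: H(p, q) = sqrt(1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2),
    which lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * s)

p = [0.5, 0.5, 0.0]
q = [0.0, 0.0, 1.0]
print(hellinger(p, p))  # identical distributions
print(hellinger(p, q))  # distributions with disjoint supports
```

Under this convention, identical distributions are at distance zero and distributions with disjoint supports are at the maximal distance.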

2.
Distance between two probability densities or two random variables is a well established concept in statistics. The present paper considers generalizations of distances to separation measurements for three or more elements in a function space. Geometric intuition and examples from hypothesis testing suggest lower and upper bounds for such measurements in terms of pairwise distances; but also in Lp spaces some useful non-pairwise separation measurements always lie within these bounds. Examples of such separation measurements are the Bayes probability of correct classification among several arbitrary distributions, and the expected range among several random variables.

3.
Distance concentration is the phenomenon that, in certain conditions, the contrast between the nearest and the farthest neighbouring points vanishes as the data dimensionality increases. It affects high dimensional data processing, analysis, retrieval, and indexing, which all rely on some notion of distance or dissimilarity. Previous work has characterised this phenomenon in the limit of infinite dimensions. However, real data is finite dimensional, and hence the infinite-dimensional characterisation is insufficient. Here we quantify the phenomenon more precisely, for the possibly high but finite dimensional case in a distribution-free manner, by bounding the tails of the probability that distances become meaningless. As an application, we show how this can be used to assess the concentration of a given distance function in some unknown data distribution solely on the basis of an available data sample from it. This can be used to test and detect problematic cases more rigorously than is currently possible, and we demonstrate the working of this approach on both synthetic data and ten real-world data sets from different domains.
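The phenomenon described in this abstract can be observed empirically with a small simulation. The sketch below is only an illustration, not the paper's bound or test: it assumes uniformly distributed points in the unit cube, Euclidean distance, and a hypothetical helper `relative_contrast` that returns (d_max - d_min) / d_min for one random query point.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min from one random query point
    to n_points random points, all drawn uniformly from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

for dim in (2, 20, 500):
    print(dim, round(relative_contrast(dim), 3))
```

In this setting the printed contrast shrinks as the dimension grows, which is the sense in which nearest and farthest neighbours become hard to tell apart.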

4.
Functional measures of skewness and kurtosis, called asymmetry and gradient asymmetry functions, are described for continuous univariate unimodal distributions. They are defined and interpreted directly in terms of the density function and its derivative. Asymmetry is defined by comparing distances from points of equal density to the mode. Gradient asymmetry is defined, in novel fashion, as asymmetry of an appropriate function of the density derivative. Properties and illustrations of asymmetry and gradient asymmetry functions are presented. Estimation of them is considered and illustrated with an example. Scalar summary skewness and kurtosis measures associated with asymmetry and gradient asymmetry functions are discussed.

5.
Exact influence measures are applied in the evaluation of a principal component decomposition for high dimensional data. Some data used for classifying samples of rice from their near infra-red transmission profiles, following a preliminary principal component analysis, are examined in detail. A normalization of eigenvalue influence statistics is proposed which ensures that measures reflect the relative orientations of observations, rather than their overall Euclidean distance from the sample mean. Thus, the analyst obtains more information from an analysis of eigenvalues than from approximate approaches to eigenvalue influence. This is particularly important for high dimensional data where a complete investigation of eigenvector perturbations may be cumbersome. The results are used to suggest a new class of influence measures based on ratios of Euclidean distances in orthogonal spaces.

6.
Consider the problem of estimating the positions of a set of targets in a multidimensional Euclidean space from distances reported by a number of observers when the observers do not know their own positions in the space. Each observer reports the distance from the observer to each target plus a random error. This statistical problem is the basic model for the various forms of what is called multidimensional unfolding in the psychometric literature. Multidimensional unfolding methodology as developed in the field of cognitive psychology is basically a statistical estimation problem where the data structure is a set of measures that are monotonic functions of Euclidean distances between a number of observers and targets in a multidimensional space. The new method presented in this article deals with estimating the target locations and the observer positions when the observations are functions of the squared distances between observers and targets observed with an additive random error in a two-dimensional space. The method provides robust estimates of the target locations in a multidimensional space for the parametric structure of the data generating model presented in the article. The method also yields estimates of the orientation of the coordinate system and the mean and variances of the observer locations. The mean and the variances are not estimated by standard unfolding methods, which yield target maps that are invariant to a rotation of the coordinate system. The data are transformed so that the nonlinearity due to the squared observer locations is removed. The sampling properties of the estimates are derived from the asymptotic variances of the additive errors of a maximum likelihood factor analysis of the sample covariance matrix of the transformed data augmented with bootstrapping. The robustness of the new method is tested using artificial data. The method is applied to a 2001 survey data set from Turkey to provide a real data example.

7.
A Bayes linear space is a linear space of equivalence classes of proportional σ‐finite measures, including probability measures. Measures are identified with their density functions. Addition is given by Bayes' rule and subtraction by Radon–Nikodym derivatives. The present contribution shows the subspace of square‐log‐integrable densities to be a Hilbert space, which can include probability and infinite measures, measures on the whole real line or discrete measures. It extends the ideas from the Hilbert space of densities on a finite support towards Hilbert spaces on general measure spaces. It is also a generalisation of the Euclidean structure of the simplex, the sample space of random compositions. In this framework, basic notions of mathematical statistics get a simple algebraic interpretation. A key tool is the centred‐log‐ratio transformation, a generalisation of that used in compositional data analysis, which maps the Hilbert space of measures into a subspace of square‐integrable functions. As a consequence of this structure, distances between densities, orthonormal bases, and Fourier series representing measures become available. As an application, Fourier series of normal distributions and distances between them are derived, and an example related to grain size distributions is presented. The geometry of the sample space of random compositions, known as Aitchison geometry of the simplex, is obtained as a particular case of the Hilbert space when the measures have discrete and finite support.

8.
The aim of this paper is to investigate how some results related to the complex normal distribution are relevant in size and shape analysis. Our main focus is on the derivation of influence measures. In particular, Cook and Kullback–Leibler distances are combined with their respective asymptotic results as well as with an alternative process for defining cut-off points. Some numerical examples illustrate how these measures are used in practice. We perform an application to simulated and actual data. Results provide evidence that the methodology based on the Kullback–Leibler distance outperforms the one based on the classical Cook distance.

9.
Among the statistical methods to model stochastic behaviours of objects, clustering is a preliminary technique to recognize similar patterns within a group of observations in a data set. Various distances to measure differences among objects could be invoked to cluster data through numerous clustering methods. When the variables at hand contain geometrical information about objects, such metrics should be adequately adapted. In fact, statistical methods for these typical data are endowed with a geometrical paradigm in a multivariate sense. In this paper, a procedure for clustering shape data is suggested employing appropriate metrics. Then, the best shape distance candidate as well as a suitable agglomerative method for clustering the simulated shape data are provided by considering cluster validation measures. The results are implemented in a real life application.

10.
In this article, we introduce a new weighted quantile regression method. Traditionally, the estimation of the parameters involved in quantile regression is obtained by minimizing a loss function based on absolute distances with weights independent of explanatory variables. Specifically, we study a new estimation method using a weighted loss function with the weights associated with explanatory variables so that the performance of the resulting estimation can be improved. In full generality, we derive the asymptotic distribution of the weighted quantile regression estimators for any uniformly bounded positive weight function independent of the response. Two practical weighting schemes are proposed, each for a certain type of data. Monte Carlo simulations are carried out for comparing our proposed methods with the classical approaches. We also demonstrate the proposed methods using two real-life data sets from the literature. Both our simulation study and the results from these examples show that our proposed method outperforms the classical approaches when the relative efficiency is measured by the mean-squared errors of the estimators.
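The weighted loss mentioned in this abstract can be sketched with the standard quantile regression check function. This is a minimal illustration under assumptions: the names `check_loss` and `weighted_objective` are hypothetical, the weights in the toy data are arbitrary placeholders, and the two weighting schemes proposed in the article are not reproduced here.

```python
def check_loss(u, tau):
    """Standard check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def weighted_objective(beta, xs, ys, weights, tau):
    """Weighted quantile regression objective: sum_i w_i * rho_tau(y_i - x_i' beta)."""
    total = 0.0
    for x, y, w in zip(xs, ys, weights):
        fitted = sum(b * xj for b, xj in zip(beta, x))
        total += w * check_loss(y - fitted, tau)
    return total

# Toy data: an intercept column plus one covariate; weights are placeholders.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0.0, 1.0, 4.0]
weights = [1.0, 0.5, 0.25]
print(weighted_objective([0.0, 1.0], xs, ys, weights, tau=0.5))
```

An estimator would minimize this objective over `beta`; setting all weights to 1 recovers the classical unweighted loss.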

11.
For the data from multivariate t distributions, it is very hard to make an influence analysis based on the probability density function since its expression is intractable. In this paper, we present a technique for influence analysis based on the mixture distribution and EM algorithm. In fact, the multivariate t distribution can be considered as a particular Gaussian mixture by introducing the weights from the Gamma distribution. We treat the weights as the missing data and develop the influence analysis for the data from multivariate t distributions based on the conditional expectation of the complete-data log-likelihood function in the EM algorithm. Several case-deletion measures are proposed for detecting influential observations from multivariate t distributions. Two numerical examples are given to illustrate our methodology.
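The mixture representation used in this abstract can be illustrated by simulation. The sketch below assumes the usual parametrization (a latent weight u drawn from a Gamma distribution with shape ν/2 and rate ν/2, then a Gaussian draw with covariance scaled by 1/u) and, for simplicity, an identity scale matrix. The function name `sample_multivariate_t` is hypothetical, and the influence measures of the paper are not implemented.

```python
import math
import random

def sample_multivariate_t(mu, nu, rng):
    """Draw one vector from a multivariate t with location mu, identity scale
    matrix and nu degrees of freedom, via its Gaussian scale-mixture form."""
    # Latent weight u ~ Gamma(shape = nu/2, rate = nu/2).
    # random.gammavariate takes a scale parameter, so scale = 2/nu.
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Given u, each component is N(mu_j, 1/u); the same u is shared by all components.
    return [m + rng.gauss(0.0, 1.0) / math.sqrt(u) for m in mu]

rng = random.Random(1)
draws = [sample_multivariate_t([0.0, 0.0], nu=10.0, rng=rng) for _ in range(20000)]
first = [x[0] for x in draws]
mean = sum(first) / len(first)
var = sum((v - mean) ** 2 for v in first) / (len(first) - 1)
print(round(var, 3), "vs theoretical", 10.0 / (10.0 - 2.0))
```

The sample variance of each component should be close to ν/(ν − 2), the marginal variance of a t distribution with ν > 2. Treating `u` as missing data is what makes an EM-style analysis possible.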

12.
Automatic identification of faces from a database given a digital view is becoming increasingly important. The question arises whether or not there can be a face identification system similar to the fingerprinting system, where a certain number of matches are regarded as sufficient to identify the person in the database. We first give a very general review of the topic of facial measurements and indicate some deep statistical problems. We then analyze a database of photographs. Certain characteristics of the population are provided, such as the modes of variation and correlation structures using shape analysis. The data involve angles as well as distances. The principal component analysis for angular data is discussed, its conversion into landmark data is established and the two approaches are compared. A new approach of anchor shape analysis for specialized distances is discussed.

13.
In this paper, we extend Choi and Hall's [Data sharpening as a prelude to density estimation. Biometrika. 1999;86(4):941–947] data sharpening algorithm for kernel density estimation to interval-censored data. Data sharpening has several advantages, including bias and mean integrated squared error (MISE) reduction as well as increased robustness to bandwidth misspecification. Several interval metrics are explored for use with the kernel function in the data sharpening transformation. A simulation study based on randomly generated data is conducted to assess and compare the performance of each interval metric. It is found that the bias is reduced by sharpening, often with little effect on the variance, thus maintaining or reducing overall MISE. Applications involving time to onset of HIV and running distances subject to measurement error are used for illustration.

14.
The location linear discriminant function is used in a two-population classification problem when the available data are generated from both binary and continuous random variables. The asymptotic distribution of the studentized location linear discriminant function is derived directly without the inversion of the corresponding characteristic function. The resulting plug-in estimate of the overall error of misclassification consists of the estimate based on the limiting distribution of the discriminant plus a correction term up to the second order. By comparison, our estimate avoids exact knowledge of the Mahalanobis distances, which is necessary when the expansions of Vlachonikolis (1985) are used in the case of an arbitrary cut-off point. An example is re-examined and analysed in the present context.

15.
Indices of Dependence Between Types in Multivariate Point Patterns   (cited 2 times in total: 0 self-citations, 2 citations by others)
We propose new summary statistics quantifying several forms of dependence between points of different types in a multi-type spatial point pattern. These statistics are the multivariate counterparts of the J-function for point processes of a single type, introduced by Lieshout & Baddeley (1996). They are based on comparing distances from a type i point to either the nearest type j point or to the nearest point in the pattern regardless of type to these distances seen from an arbitrary point in space. Information about the range of interaction can also be inferred. Our statistics can be computed explicitly for a range of well-known multivariate point process models. Some applications to bivariate and trivariate data sets are presented as well.

16.
Time dependent association measures between variables are of interest in bivariate survival data. Several such measures have been proposed in the literature for the modelling and analysis of survival data. In this paper, we introduce a new measure of association for bivariate survival data using the product moment residual life function and the mean residual life function. Various properties of the proposed measure and its relationship with existing measures are discussed. We also develop a non-parametric estimator of the measure and study its asymptotic properties. The application of the result is illustrated using a real-life data set. Finally, a simulation study is carried out to assess the performance of the estimator.

17.
This paper presents a unified method for influence analysis to deal with random effects appearing in additive nonlinear regression models for repeated measurement data. The basic idea is to apply the Q-function, the conditional expectation of the complete-data log-likelihood function obtained from the EM algorithm, instead of the observed-data log-likelihood function as used in standard influence analysis. Diagnostic measures are derived based on the case-deletion approach and the local influence approach. Two real examples and a simulation study are examined to illustrate our methodology.

18.
Communications in Statistics: Theory and Methods, 2012, 41(13-14): 2342-2355
We propose a distance-based method to relate two data sets. We define and study some measures of multivariate association based on distances between observations. The proposed approach can be used to deal with general data sets (e.g., observations on continuous, categorical or mixed variables). An application, using Hellinger distance, provides the relationships between two regions of hyperspectral images.

19.
The influence of observations in estimating the misclassification probability in multiple discriminant analysis is studied using the common omission approach. An empirical influence function for the misclassification probability is also derived. It can give a very good approximation to the omission approach, but the computational load is much reduced. Various extensions of the measures are suggested. The proposed measures are applied to the famous Iris data set. The same three observations are identified as having the most influence under different measures.

20.
Vine copulas are a flexible class of dependence models consisting of bivariate building blocks and have proven to be particularly useful in high dimensions. Classical model distance measures require multivariate integration and thus suffer from the curse of dimensionality. In this paper, we provide numerically tractable methods to measure the distance between two vine copulas even in high dimensions. For this purpose, we consecutively develop three new distance measures based on the Kullback–Leibler distance, using the result that it can be expressed as the sum over expectations of KL distances between univariate conditional densities, which can be easily obtained for vine copulas. To reduce numerical calculations, we approximate these expectations on adequately designed grids, outperforming Monte Carlo integration with respect to computational time. For the sake of interpretability, we provide a baseline calibration for the proposed distance measures. We further develop similar substitutes for the Jeffreys distance, a symmetrized version of the Kullback–Leibler distance. In numerous examples and applications, we illustrate the strengths and weaknesses of the developed distance measures.
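To make the relation between the Kullback–Leibler and Jeffreys distances concrete, the sketch below uses the well-known closed form of the KL distance between two univariate normal densities. It is not the vine copula method of the paper: the function names are hypothetical, and the Jeffreys distance is taken here as the sum of the two directed KL distances, which is one common convention for the symmetrization.

```python
import math

def kl_normal(m1, s1, m2, s2):
    """KL distance from N(m1, s1^2) to N(m2, s2^2), using the closed form
    log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 1/2."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

def jeffreys_normal(m1, s1, m2, s2):
    """Symmetrized version: sum of the two directed KL distances (assumed convention)."""
    return kl_normal(m1, s1, m2, s2) + kl_normal(m2, s2, m1, s1)

print(kl_normal(0.0, 1.0, 1.0, 2.0))        # one direction
print(kl_normal(1.0, 2.0, 0.0, 1.0))        # the other direction differs
print(jeffreys_normal(0.0, 1.0, 1.0, 2.0))  # symmetric in its arguments
```

The two directed values differ, which is why a symmetrized measure is useful when neither model is a natural reference.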
