Similar literature
 20 similar documents found; search took 46 ms
1.
In this work it is shown how the k-means method for clustering objects can be applied in the context of statistical shape analysis. Because the choice of a suitable distance measure is a key issue for shape analysis, the Hartigan and Wong k-means algorithm is adapted for this situation. Simulations on controlled artificial data sets demonstrate that distances on the pre-shape spaces are more appropriate than the Euclidean distance on the tangent space. Finally, results are presented for an application to a real problem in oceanography, which in fact motivated the current work.

2.
Matrix-valued covariance functions are crucial to geostatistical modelling of multivariate spatial data. The classical assumption of symmetry of a multivariate covariance function is overly restrictive and has been considered unrealistic for most real data applications. Despite that, the literature on asymmetric covariance functions has been very sparse. In particular, there is some work on asymmetric covariances on Euclidean spaces, depending on the Euclidean distance. However, for data collected over large portions of planet Earth, the most natural spatial domain is a sphere, with the corresponding geodesic distance being the natural metric. In this work, we propose a strategy based on spatial rotations to generate asymmetric covariances for multivariate random fields on the d-dimensional unit sphere. We illustrate through simulations as well as real data analysis that our proposal improves predictive performance in comparison with its symmetric counterpart.

3.
This study proposes a simple way to perform a power analysis of Mantel's test applied to squared Euclidean distance matrices. The general statistical aspects of the simple Mantel's test are reviewed. The Monte Carlo method is used to generate bivariate Gaussian variables in order to create squared Euclidean distance matrices. The power of the parametric correlation t-test applied to raw data is also evaluated and compared with that of Mantel's test. The standard procedure for calculating pointwise power levels is used for validation. The proposed procedure allows one to draw the power curve by running the test only once, dispensing with the time-consuming standard procedure of Monte Carlo simulations. Unlike the standard procedure, it does not depend on knowledge of the distribution of the raw data. The simulated power function has all the properties of the power analysis theory and is in agreement with the results of the standard procedure.
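
For orientation, the simple Mantel test that this abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the power-analysis procedure proposed in the study: the helper names (`squared_distance_matrix`, `pearson`, `upper_triangle`, `mantel_test`) are hypothetical, the observations are scalars, and the p-value is a one-sided (upper-tail) permutation p-value.

```python
import math
import random


def squared_distance_matrix(values):
    """Squared Euclidean distances between scalar observations."""
    return [[(a - b) ** 2 for b in values] for a in values]


def pearson(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def upper_triangle(d, order):
    """Upper-triangle entries of d after relabelling the objects by `order`."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value).

    Rows and columns of d2 are permuted together, which keeps each
    permuted matrix a valid distance matrix.
    """
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    u = upper_triangle(d1, identity)
    r_obs = pearson(u, upper_triangle(d2, identity))
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        if pearson(u, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Repeating such a test over many simulated data sets is the Monte Carlo route to a power estimate that the abstract describes as the time-consuming standard procedure.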

4.
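In research on cluster analysis methods for panel data, the Euclidean distance function is improved to reflect the fact that panel data have both a cross-sectional and a time dimension. Indicator weights and time weights are taken into account in the clustering process, and a "weighted distance function" suited to panel data clustering is proposed together with the corresponding Ward.D clustering method. First, a Euclidean distance function is defined that accounts for the absolute values of the indicators, the growth rates between adjacent time points, and the degree of fluctuation and variation. Then the indicator weights and time weights are aggregated into a composite weighted distance through a linear model, which finally yields the weighted clustering procedure for panel data. The empirical results show that the weighted clustering method for panel data that considers indicator weights and time weights has better discriminating power and can improve the accuracy of sample clustering.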
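
The abstract does not give the formula for the weighted distance, so the following is only a rough sketch of how such a composite distance might be assembled. Everything specific here is an assumption: the function names, the use of the coefficient of variation as the "fluctuation" component, the convention that a growth rate between t-1 and t takes the weight of time t, and the `mix` coefficients of the linear combination. Indicator values are assumed to be strictly positive.

```python
import math


def growth_rates(series):
    """Growth rate between adjacent time points: (x_t - x_{t-1}) / x_{t-1}."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def coefficient_of_variation(series):
    """Standard deviation divided by the mean (assumed fluctuation measure)."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    return math.sqrt(var) / mean


def weighted_panel_distance(unit_a, unit_b, indicator_w, time_w, mix=(1.0, 1.0, 1.0)):
    """Composite distance between two units of a panel.

    unit_a, unit_b: lists indexed as [indicator][time].
    indicator_w: one weight per indicator; time_w: one weight per time point.
    mix: hypothetical coefficients that combine the level, growth and
    fluctuation components linearly.
    """
    level = growth = variation = 0.0
    for j, v in enumerate(indicator_w):
        a, b = unit_a[j], unit_b[j]
        level += v * sum(w * (x - y) ** 2 for w, x, y in zip(time_w, a, b))
        ga, gb = growth_rates(a), growth_rates(b)
        # The growth rate from t-1 to t is weighted by the weight of time t.
        growth += v * sum(w * (x - y) ** 2 for w, x, y in zip(time_w[1:], ga, gb))
        variation += v * (coefficient_of_variation(a) - coefficient_of_variation(b)) ** 2
    return math.sqrt(mix[0] * level + mix[1] * growth + mix[2] * variation)
```

A matrix of such pairwise distances could then be handed to a hierarchical clustering routine; the choice of Ward.D linkage is taken from the abstract, not from this sketch.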

5.
Consider the problem of estimating the positions of a set of targets in a multidimensional Euclidean space from distances reported by a number of observers when the observers do not know their own positions in the space. Each observer reports the distance from the observer to each target plus a random error. This statistical problem is the basic model for the various forms of what is called multidimensional unfolding in the psychometric literature. Multidimensional unfolding methodology as developed in the field of cognitive psychology is basically a statistical estimation problem where the data structure is a set of measures that are monotonic functions of Euclidean distances between a number of observers and targets in a multidimensional space. The new method presented in this article deals with estimating the target locations and the observer positions when the observations are functions of the squared distances between observers and targets observed with an additive random error in a two-dimensional space. The method provides robust estimates of the target locations in a multidimensional space for the parametric structure of the data-generating model presented in the article. The method also yields estimates of the orientation of the coordinate system and the mean and variances of the observer locations. The mean and the variances are not estimated by standard unfolding methods, which yield target maps that are invariant to a rotation of the coordinate system. The data are transformed so that the nonlinearity due to the squared observer locations is removed. The sampling properties of the estimates are derived from the asymptotic variances of the additive errors of a maximum likelihood factor analysis of the sample covariance matrix of the transformed data augmented with bootstrapping. The robustness of the new method is tested using artificial data. The method is applied to a 2001 survey data set from Turkey to provide a real data example.
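
The abstract does not say which transformation removes the nonlinearity, so the sketch below shows only one standard device, offered as an assumption: since the squared distance from observer o to target t expands to |o|² − 2 o·t + |t|², subtracting the squared distance to a reference target cancels |o|² and leaves an expression that is linear in the observer position. The function names are hypothetical and the data are error-free.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def differenced_row(observer, targets):
    """Squared distances to each target minus the squared distance to targets[0]."""
    base = squared_distance(observer, targets[0])
    return [squared_distance(observer, t) - base for t in targets[1:]]


def linear_form(observer, targets):
    """The same quantities written as an expression linear in the observer position:
    -2 * o . (t_j - t_0) + |t_j|^2 - |t_0|^2."""
    t0 = targets[0]
    norm0 = sum(c * c for c in t0)
    out = []
    for t in targets[1:]:
        dot = sum(o * (a - b) for o, a, b in zip(observer, t, t0))
        out.append(-2.0 * dot + sum(c * c for c in t) - norm0)
    return out
```

Both functions return the same numbers for any observer, which is the point: after differencing, the unknown observer position enters only through inner products with known combinations of target coordinates.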

6.
The aim of this paper is to investigate how some results related to the complex normal distribution are relevant in size and shape analysis. Our main focus is on the derivation of influence measures. In particular, Cook and Kullback–Leibler distances are combined with their respective asymptotic results as well as with an alternative procedure for defining cut-off points. Some numerical examples illustrate how these measures are used in practice. We perform an application to simulated and actual data. The results provide evidence that the methodology based on the Kullback–Leibler distance outperforms the one based on the classical Cook distance.

7.
Candidate locally D-optimal designs for the binary two-variable logistic model with no interaction, which comprise 3 and 4 support points lying in the first quadrant of the two-dimensional Euclidean space, were introduced by Haines et al. (D-optimal designs for logistic regression in two variables. In: Lopez-Fidalgo J, Rodríguez-Díaz JM, Torsney B, editors. MODA8 – advances in model-oriented designs and analysis. Heidelberg: Physica-Verlag; 2007. p. 91–98). The authors proved algebraically the global D-optimality of the 3-point design for the special case in which the intercept parameter is equal to −1.5434. However, for other selected values of the intercept parameter, the global D-optimality of the proposed 3- and 4-point designs was only demonstrated numerically. In this paper, we provide analytical proofs of the D-optimality of these 3- and 4-point designs for all negative and zero intercept parameters of the binary two-variable logistic model with no interaction. The results are extended to the construction of D-optimal designs on a rectangular design space and illustrated by means of two examples, one of which is a real example taken from the literature.

8.
P. Jagers 《Statistics》2013,47(4):455-464
For a suitable norm, conservation of the distance between expectation and hypothesis may furnish a basis for data reduction by invariance in the linear, not necessarily normal, model. If the norm is Euclidean (i.e. based on some inner product), the maximal invariant is a pair of sums of squares. This provides support for traditional χ² (or F) methods also in nonnormal cases. If the norm is l_p with p ≠ 2, or the sup-norm, the maximal invariant is, at best, a pair of order statistics.

9.
In this paper we establish a new matrix inequality that fills a gap in the literature, which so far contains no matrix Euclidean norm version of the Wielandt inequality. This inequality can be used to obtain an upper bound for a new measure of association which plays a very important role in statistics, especially in multivariate analysis. A new alternative, based on the Euclidean norm, for the relative gain of the covariance adjusted estimator of parameters is provided.

10.
Few publications consider the estimation of relative risk for vector-borne infectious diseases. Most of these articles involve exploratory analysis that includes the study of covariates and their effects on disease distribution and the study of geographic information systems to integrate patient-related information. The aim of this paper is to introduce an alternative method of relative risk estimation based on discrete time–space stochastic SIR-SI models (susceptible–infective–recovered for human populations; susceptible–infective for vector populations) for the transmission of vector-borne infectious diseases, particularly dengue disease. First, we describe deterministic compartmental SIR-SI models that are suitable for dengue disease transmission. We then adapt these to develop corresponding discrete time–space stochastic SIR-SI models. Finally, we develop an alternative method of estimating the relative risk for dengue disease mapping based on these models and apply them to analyse dengue data from Malaysia. This new approach offers a better model for estimating the relative risk for dengue disease mapping compared with the other common approaches, because it takes into account the transmission process of the disease while allowing for covariates and spatial correlation between risks in adjacent regions.

11.
Microarray experiments are widely used in medical and biological research. The main features of these studies are the large number of variables (genes) involved and the low number of replicates (arrays). It seems clear that the most appropriate models for detecting differences in gene expression are those that exploit the most useful information to compensate for the lack of replicates. On the other hand, control of the error in the decision process plays an important role because of the high number of simultaneous statistical tests (one for each gene), so that concepts such as the false discovery rate (FDR) take on special importance. One of the alternatives for the analysis of the data in these experiments is based on the calculation of statistics derived from modifications of the classical methods used in this type of problem (moderated-t, B-statistic). Nonparametric techniques have also been proposed [B. Efron, R. Tibshirani, J.D. Storey, and V. Tusher, Empirical Bayes analysis of a microarray experiment, J. Amer. Stat. Assoc. 96 (2001), pp. 1151–1160; W. Pan, J. Lin, and C.T. Le, A mixture model approach to detecting differentially expressed genes with microarray data, Funct. Integr. Genomics 3 (2003), pp. 117–124], allowing the analysis without assuming any prior condition about the distribution of the data, which makes them especially suitable in such situations. This paper presents a new method to detect differentially expressed genes based on non-parametric density estimation by a class of functions that allow us to define a distance between individuals in the sample (characterized by the coordinates of the individual (gene) in the dual space tangent to the manifold of parameters) [A. Miñarro and J.M. Oller, Some remarks on the individuals-score distance and its applications to statistical inference, Qüestiió, 16 (1992), pp. 43–57]. From these distances, we designed the test to determine the rejection region based on the control of FDR.
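
To make the notion of FDR control concrete, here is a minimal sketch of the Benjamini–Hochberg step-up rule, a standard FDR-controlling procedure. It is not the distance-based test of the paper above, whose rejection region is not described in the abstract; the function name and the level `q` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg rule.

    With m p-values sorted in increasing order, find the largest rank k such
    that p_(k) <= k * q / m and reject the k hypotheses with the smallest
    p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])
```

In a microarray setting there would be one p-value per gene, so `m` is the number of genes tested.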

12.
A non-normal invariance principle is established for a restricted class of univariate multi-response permutation procedures whose distance measure is the square of Euclidean distance. For observations from a distribution with finite second moment, the test statistic is found asymptotically to have a centered chi-squared distribution. Spectral expansions are used to determine the asymptotic distribution for more general distance measures d, and it is shown that if d(x, y) = |x − y|^u, u ≠ 2, the asymptotic distribution is not invariant (i.e. it is dependent on the distribution of the observations).

13.
Jan Rataj 《Statistics》2013,47(4):377-385
Two classes of random distances (generated as contact distances or free path lengths) external to a stationary random closed set in Euclidean space are introduced. The censored distance distributions within a bounded region are obtained. Unbiased estimators of the random distance distribution functions using only the information from inside the bounded region are constructed.

14.
Frame corrections have been studied in census applications for a long time. One very promising method is dual system estimation, which is based on capture–recapture models. These methods have been applied recently in the USA, England, Israel and Switzerland. In order to gain information on subgroups of the population, structure preserving estimators can be applied [i.e. structure preserving estimation (SPREE) and generalized SPREE]. The present paper extends the SPREE approach with an alternative distance function, the chi‐square. The new method has shown improved estimates in our application with very small domains. A comparative study based on a large‐scale Monte Carlo simulation elaborates on advantages and disadvantages of the estimators in the context of the German register‐assisted Census 2011.

15.
郑振龙  孙清泉 《统计研究》2014,31(6):98-106
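Model specification testing is a core step in asset pricing. As a new measure of model misspecification, the first HJ distance has attracted wide attention in academia. However, few studies compare the first HJ distance with traditional misspecification measures. This paper systematically analyses the properties of the first HJ distance and compares it with traditional measures of model specification error. It finds that: (1) the first HJ distance links the Euclidean-space distance based on the SDF used by the model with the maximum pricing error, and so has rich economic meaning; (2) the first HJ distance focuses on pricing errors and, compared with traditional misspecification measures, tends to select a large zero-beta rate and small factor risk premia, so the two rank models differently; (3) the weighting matrix of the first HJ distance is independent of the model and consistent with respect to the choice of test asset portfolios.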

16.
A statistical model is said to be an order‐restricted statistical model when its parameter takes its values in a closed convex cone C of the Euclidean space. In recent years, order‐restricted likelihood ratio tests and maximum likelihood estimators have been criticized on the grounds that they may violate a cone order monotonicity (COM) property, and hence reverse the cone order induced by C. The authors argue here that these reversals occur only in the case that C is an obtuse cone, and that in this case COM is an inappropriate requirement for likelihood‐based estimates and tests. They conclude that these procedures thus remain perfectly reasonable procedures for order‐restricted inference.

17.
In a recent paper, Nair et al. [Stat Pap 52:893–909, 2011] proposed a Chernoff distance measure for left/right-truncated random variables and studied its properties in the context of reliability analysis. Here we extend the definition of Chernoff distance to doubly truncated distributions. This measure may help information theorists and reliability analysts to study the various characteristics of a system/component when it fails between two time points. We study some properties of this measure and obtain its upper and lower bounds. We also study the interval Chernoff distance between the original and weighted distributions. These results generalize and enhance the related existing results that are developed based on Chernoff distance for one-sided truncated random variables.

18.
Efficiency and robustness are two fundamental concepts in parametric estimation problems. It was long thought that there was an inherent contradiction between the aims of achieving robustness and efficiency; that is, a robust estimator could not be efficient and vice versa. It is now known that the minimum Hellinger distance approach introduced by Beran [R. Beran, Annals of Statistics 1977;5:445–463] is one way of reconciling the conflicting concepts of efficiency and robustness. For parametric models, it has been shown that minimum Hellinger estimators achieve efficiency at the model density and simultaneously have excellent robustness properties. In this article, we examine the application of this approach in two semiparametric models. In particular, we consider a two‐component mixture model and a two‐sample semiparametric model. In each case, we investigate minimum Hellinger distance estimators of finite‐dimensional Euclidean parameters of particular interest and study their basic asymptotic properties. Small sample properties of the proposed estimators are examined using a Monte Carlo study. The results can be extended to semiparametric models of general form as well. The Canadian Journal of Statistics 37: 514–533; 2009 © 2009 Statistical Society of Canada
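
As a small illustration of the distance itself (not of the semiparametric estimators studied in the article), the squared Hellinger distance between two normal densities has a closed form that can be checked numerically. The sketch assumes the convention H² = 1 − ∫√(f·g); some texts use a different scaling. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def hellinger_sq_normal(mu1, s1, mu2, s2):
    """Closed form of H^2 = 1 - integral of sqrt(f * g) for two normal densities."""
    v = s1 ** 2 + s2 ** 2
    affinity = math.sqrt(2 * s1 * s2 / v) * math.exp(-((mu1 - mu2) ** 2) / (4 * v))
    return 1.0 - affinity


def hellinger_sq_numeric(f, g, lo, hi, n=20000):
    """Midpoint-rule approximation of 1 - integral of sqrt(f * g) on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += math.sqrt(f(x) * g(x))
    return 1.0 - total * h
```

A minimum Hellinger distance estimator would choose the model parameters that minimize such a distance between a nonparametric density estimate and the model density; that optimization step is not shown here.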

19.
We discuss the problem of selecting among alternative parametric models within the Bayesian framework. For model selection problems that involve non‐nested models, the common objective choice of a prior on the model space is the uniform distribution. The same applies to situations where the models are nested. It is our contention that assigning equal prior probability to each model is oversimplistic. Consequently, we introduce a novel approach to objectively determine model prior probabilities, conditionally on the choice of priors for the parameters of the models. The idea is based on the notion of the worth of having each model within the selection process. At the heart of the procedure is the measure of this worth using the Kullback–Leibler divergence between densities from different models.

20.
Exact influence measures are applied in the evaluation of a principal component decomposition for high dimensional data. Some data used for classifying samples of rice from their near infra-red transmission profiles, following a preliminary principal component analysis, are examined in detail. A normalization of eigenvalue influence statistics is proposed which ensures that measures reflect the relative orientations of observations, rather than their overall Euclidean distance from the sample mean. Thus, the analyst obtains more information from an analysis of eigenvalues than from approximate approaches to eigenvalue influence. This is particularly important for high dimensional data where a complete investigation of eigenvector perturbations may be cumbersome. The results are used to suggest a new class of influence measures based on ratios of Euclidean distances in orthogonal spaces.
