Similar literature
 20 similar articles found (search time: 15 ms)
1.
In environmetrics, interest often centres around the development of models and methods for making inference on observed point patterns assumed to be generated by latent spatial or spatio‐temporal processes, which may have a hierarchical structure. In this research, motivated by the analysis of spatio‐temporal storm cell data, we generalize the Neyman–Scott parent–child process to account for hierarchical clustering. This is accomplished by allowing the parents to follow a log‐Gaussian Cox process, thereby incorporating correlation and facilitating inference at all levels of the hierarchy. This approach is applied to monthly storm cell data from the Bismarck, North Dakota radar station from April through August 2003, and we compare these results to simpler cluster processes to demonstrate the advantages of accounting for both levels of correlation present in these hierarchically clustered point patterns. The Canadian Journal of Statistics 47: 46–64; 2019 © 2019 Statistical Society of Canada

2.
Model-based clustering is a method that clusters data under the assumption of a statistical model structure. In this paper, we propose a novel model-based hierarchical clustering method for a finite statistical mixture model based on the Fisher distribution. The main foci of the proposed method are to: (a) provide an efficient solution for estimating the parameters of a Fisher mixture model (FMM); (b) generate a hierarchy of FMMs; and (c) select the optimal model. To this end, we develop a Bregman soft clustering method for FMM. Our model estimation strategy exploits Bregman divergence and hierarchical agglomerative clustering, whereas our model selection strategy comprises a parsimony-based approach and an evaluation graph-based approach. We empirically validate our proposed method by applying it to simulated data. Next, we apply the method to real data to perform depth image analysis. We demonstrate that the proposed clustering method can be used as a potential tool for unsupervised depth image analysis.

3.
Demonstrated equivalence between a categorical regression model based on case‐control data and an I‐sample semiparametric selection bias model leads to a new goodness‐of‐fit test. The proposed test statistic is an extension of an existing Kolmogorov–Smirnov‐type statistic and is the weighted average of the absolute differences between two estimated distribution functions in each response category. The paper establishes an optimal property for the maximum semiparametric likelihood estimator of the parameters in the I‐sample semiparametric selection bias model. It also presents a bootstrap procedure, some simulation results and an analysis of two real datasets.

4.
Research on structure determination and parameter estimation of hierarchical Archimedean copulas (HACs) has so far mostly focused on the case in which all appearing Archimedean copulas belong to the same Archimedean family. The present work addresses this issue and proposes a new approach for estimating HACs that involve different Archimedean families. It is based on incorporating goodness-of-fit test statistics directly into HAC estimation. The approach is summarized in a simple algorithm, its theoretical justification is given, and its applicability is illustrated by several experiments, which include estimation of HACs involving up to five different Archimedean families.

5.
Abstract. DNA array technology is an important tool for genomic research due to its capacity to measure simultaneously the expression levels of a great number of genes or fragments of genes in different experimental conditions. An important point in gene expression data analysis is to identify clusters of genes which present similar expression levels. We propose a new procedure for estimating the mixture model for clustering of gene expression data. The proposed method is a posterior split‐merge‐birth MCMC procedure which does not require the specification of the number of components, since it is estimated jointly with component parameters. The strategy for splitting is based on data and on the posterior distribution from the previously allocated observations. This procedure defines a quick split proposal, in contrast to other split procedures, which require substantial computational effort. The performance of the method is verified using real and simulated datasets.

6.
Measuring a statistical model's complexity is important for model criticism and comparison. However, it is unclear how to do this for hierarchical models due to uncertainty about how to count the random effects. The authors develop a complexity measure for generalized linear hierarchical models based on linear model theory. They demonstrate the new measure for binomial and Poisson observables modeled using various hierarchical structures, including a longitudinal model and an areal‐data model having both spatial clustering and pure heterogeneity random effects. They compare their new measure to a Bayesian index of model complexity, the effective number pD of parameters (Spiegelhalter, Best, Carlin & van der Linde 2002); the comparisons are made in the binomial and Poisson cases via simulation and two real data examples. The two measures are usually close, but differ markedly in some instances where pD is arguably inappropriate. Finally, the authors show how the new measure can be used to approach the difficult task of specifying prior distributions for variance components, and in the process cast further doubt on the commonly‐used vague inverse gamma prior.

7.
Priors are introduced into goodness‐of‐fit tests, both for unknown parameters in the tested distribution and on the alternative density. Neyman–Pearson theory leads to the test with the highest expected power. To make the test practical, we seek priors that make it likely a priori that the power will be larger than the level of the test but not too close to one. As a result, priors are sample size dependent. We explore this procedure in particular for priors that are defined via a Gaussian process approximation for the logarithm of the alternative density. In the case of testing for the uniform distribution, we show that the optimal test is of the U‐statistic type and establish limiting distributions for the optimal test statistic, both under the null hypothesis and averaged over the alternative hypotheses. The optimal test statistic is shown to be of the Cramér–von Mises type for specific choices of the Gaussian process involved. The methodology when parameters in the tested distribution are unknown is discussed and illustrated in the case of testing for the von Mises distribution. The Canadian Journal of Statistics 47: 560–579; 2019 © 2019 Statistical Society of Canada

8.
Identifying the distribution of the incidence rate of a disease over a region is a prediction problem where area‐specific random effects are to be estimated. The authors consider the inclusion of such effects at different levels of a hierarchical health administrative structure. They develop inference procedures for this type of multi‐level model and show that the predicted rates are approximately weighted sums of the crude rates obtained by pooling data on each level of the hierarchy. Their techniques are illustrated using infant mortality data from British Columbia.

9.
Icicle Plots: Better Displays for Hierarchical Clustering   Cited by: 1 (self-citations: 0, citations by others: 1)
An icicle plot is a method for presenting a hierarchical clustering. Compared with other methods of presentation, it is far easier in an icicle plot to read off which objects belong to which clusters, and which objects join or drop out from a cluster as we move up and down the levels of the hierarchy, though these benefits only appear when enough objects are being clustered. Icicle plots are described, and their benefits are illustrated using a clustering of 48 objects.

10.
Abstract. We consider the problem of testing parametric assumptions in an inverse regression model with a convolution‐type operator. An L2‐type goodness‐of‐fit test is proposed which compares the distance between a parametric and a non‐parametric estimate of the regression function. Asymptotic normality of the corresponding test statistic is shown under the null hypothesis and under a general non‐parametric alternative with different rates of convergence in both cases. The feasibility of the proposed test is demonstrated by means of a small simulation study. In particular, the power of the test against certain types of alternative is investigated. Finally, an empirical example is provided, in which the proposed methods are applied to the determination of the shape of the luminosity profile of the elliptical galaxy NGC 5017.

11.
A goodness‐of‐fit procedure is proposed for parametric families of copulas. The new test statistics are functionals of an empirical process based on the theoretical and sample versions of Spearman's dependence function. Conditions under which this empirical process converges weakly are seen to hold for many families including the Gaussian, Frank, and generalized Farlie–Gumbel–Morgenstern systems of distributions, as well as the models with singular components described by Durante [Durante (2007) Comptes Rendus Mathématique. Académie des Sciences. Paris, 344, 195–198]. Thanks to a parametric bootstrap method that allows valid P‐values to be computed, it is shown empirically that tests based on Cramér–von Mises distances keep their size under the null hypothesis. Simulations attesting the power of the newly proposed tests, comparisons with competing procedures and complete analyses of real hydrological and financial data sets are presented. The Canadian Journal of Statistics 37: 80‐101; 2009 © 2009 Statistical Society of Canada

12.
The process comparing the empirical cumulative distribution function of the sample with a parametric estimate of the cumulative distribution function is known as the empirical process with estimated parameters and has been extensively employed in the literature for goodness‐of‐fit testing. The simplest way to carry out such goodness‐of‐fit tests, especially in a multivariate setting, is to use a parametric bootstrap. Although very easy to implement, the parametric bootstrap can become very computationally expensive as the sample size, the number of parameters, or the dimension of the data increases. An alternative resampling technique based on a fast weighted bootstrap is proposed in this paper, and is studied both theoretically and empirically. The outcome of this work is a generic and computationally efficient multiplier goodness‐of‐fit procedure that can be used as a large‐sample alternative to the parametric bootstrap. In order to approximately determine how large the sample size needs to be for the parametric and weighted bootstraps to have roughly equivalent powers, extensive Monte Carlo experiments are carried out in dimensions one, two and three, and for models containing up to nine parameters. The computational gains resulting from the use of the proposed multiplier goodness‐of‐fit procedure are illustrated on trivariate financial data. A by‐product of this work is a fast large‐sample goodness‐of‐fit procedure for the bivariate and trivariate t distribution whose degrees of freedom are fixed. The Canadian Journal of Statistics 40: 480–500; 2012 © 2012 Statistical Society of Canada
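
As a rough illustration of the parametric bootstrap that this abstract treats as the baseline (not the paper's multiplier procedure), the sketch below tests an exponential model with a Kolmogorov–Smirnov distance. The choice of model, the statistic, and all function names are assumptions made for the example; the key point it tries to show is that the parameter is re-estimated on every bootstrap sample, which is what makes the approach costly.

```python
import math
import random

def ks_statistic_exponential(sample, rate):
    """Kolmogorov-Smirnov distance between the empirical CDF and Exp(rate)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Compare the fitted CDF with the empirical CDF just before and at x.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d

def fit_rate(sample):
    """Maximum likelihood estimate of the exponential rate."""
    return len(sample) / sum(sample)

def parametric_bootstrap_pvalue(sample, n_boot=500, seed=0):
    """Approximate p-value for H0: the data follow some exponential law."""
    rng = random.Random(seed)
    rate_hat = fit_rate(sample)
    observed = ks_statistic_exponential(sample, rate_hat)
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted model, then refit on the simulated data.
        boot = [rng.expovariate(rate_hat) for _ in sample]
        stat = ks_statistic_exponential(boot, fit_rate(boot))
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)
```

Each bootstrap replicate requires a full refit, so the cost grows with the sample size, the number of parameters and the number of replicates, which is the expense the abstract's multiplier approach aims to avoid.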

13.
During drug development, the calculation of inhibitory concentration that results in a response of 50% (IC50) is performed thousands of times every day. The nonlinear model most often used to perform this calculation is a four‐parameter logistic, suitably parameterized to estimate the IC50 directly. When performing these calculations in a high‐throughput mode, each and every curve cannot be studied in detail, and outliers in the responses are a common problem. A robust estimation procedure to perform this calculation is desirable. In this paper, a rank‐based estimate of the four‐parameter logistic model that is analogous to least squares is proposed. The rank‐based estimate is based on the Wilcoxon norm. The robust procedure is illustrated with several examples from the pharmaceutical industry. When no outliers are present in the data, the robust estimate of IC50 is comparable with the least squares estimate, and when outliers are present in the data, the robust estimate is more accurate. A robust goodness‐of‐fit test is also proposed. To investigate the impact of outliers on the traditional and robust estimates, a small simulation study was conducted. Copyright © 2012 John Wiley & Sons, Ltd.
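
To make the two ingredients in this abstract concrete, here is a minimal sketch of a four‐parameter logistic curve parameterized directly by the IC50, together with a rank-based dispersion of residuals using Wilcoxon scores. The abstract does not give its exact parameterization or objective, so both functions are assumptions: the curve uses one common form, and the dispersion is the usual Jaeckel-type sum of score-weighted residuals. The function and argument names are hypothetical.

```python
import math

def four_parameter_logistic(conc, bottom, top, ic50, hill):
    """Response at concentration conc (> 0); equals the midpoint when conc == ic50."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

def wilcoxon_dispersion(residuals):
    """Rank-based dispersion: sum of Wilcoxon scores times residuals.

    Assumes no tied residuals; ties would need averaged ranks.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    total = 0.0
    for rank, i in enumerate(order, start=1):
        # Wilcoxon score for this rank: sqrt(12) * (rank / (n + 1) - 1/2).
        score = math.sqrt(12.0) * (rank / (n + 1) - 0.5)
        total += score * residuals[i]
    return total
```

A rank-based fit would minimize `wilcoxon_dispersion` of the residuals over the curve parameters instead of the sum of squares. Because each residual enters linearly with a bounded score, a single outlying response has far less influence than under least squares.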

14.
Reduced k‐means clustering is a method for clustering objects in a low‐dimensional subspace. The advantage of this method is that both the clustering of objects and a low‐dimensional subspace reflecting the cluster structure are obtained simultaneously. In this paper, the relationship between conventional k‐means clustering and reduced k‐means clustering is discussed. Conditions ensuring almost sure convergence of the estimator of reduced k‐means clustering as the sample size increases without bound are presented. Results for a more general model covering both conventional k‐means clustering and reduced k‐means clustering are also provided. Moreover, a consistent selection of the numbers of clusters and dimensions is described.

15.
In document clustering, a document may be assigned to multiple clusters and the probabilities of a document belonging to different clusters are directly normalized. We propose a new Posterior Probabilistic Clustering (PPC) model that has this normalization property. The clustering model is based on Nonnegative Matrix Factorization (NMF) and is flexible in that, if we use class conditional probability normalization, the model reduces to Probabilistic Latent Semantic Indexing (PLSI). Systematic comparison and evaluation indicate that PPC is competitive with other state-of-the-art clustering methods. Furthermore, the results of PPC are more sparse and orthogonal, both of which are highly desirable.

16.
In this paper, we propose to use a special class of bivariate frailty models to study dependent censored data. The proposed models are closely linked to Archimedean copula models. We give sufficient conditions for the identifiability of this type of competing risks models. The proposed conditions are derived based on a property shared by Archimedean copula models and satisfied by several well‐known bivariate frailty models. Compared with the models studied by Heckman and Honoré and Abbring and van den Berg, our models are more restrictive but can be identified with a discrete (even finite) covariate. Under our identifiability conditions, the expectation–maximization (EM) algorithm provides us with consistent estimates of the unknown parameters. Simulation studies have shown that our estimation procedure works quite well. We fit a dependent censored leukaemia data set using the Clayton copula model and end our paper with some discussions. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics

17.
The Ising model is one of the simplest and most famous models of interacting systems. It was originally proposed to model ferromagnetic interactions in statistical physics and is now widely used to model spatial processes in many areas such as ecology, sociology, and genetics, usually without testing its goodness of fit. Here, we propose various test statistics and an exact goodness‐of‐fit test for the finite‐lattice Ising model. The theory of Markov bases has been developed in algebraic statistics for exact goodness‐of‐fit testing using a Monte Carlo approach. However, finding a Markov basis is often computationally intractable. Thus, we develop a Monte Carlo method for exact goodness‐of‐fit testing for the Ising model that avoids computing a Markov basis and also leads to a better connectivity of the Markov chain and hence to a faster convergence. We show how this method can be applied to analyze the spatial organization of receptors on the cell membrane.

18.
One of the most popular algorithms for partitioning data into k clusters is the k-means clustering algorithm. Since this method relies on some basic conditions, such as the existence of the mean and a finite variance, it is unsuitable for data whose variance is infinite, such as data with a heavy-tailed distribution. The Pitman Measure of Closeness (PMC) is a criterion that shows how close an estimator is to its parameter relative to another estimator. In this article, using PMC and building on k-means clustering, a new distance and clustering algorithm is developed for heavy-tailed data.
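
For reference, the conventional k-means iteration that this abstract takes as its starting point can be sketched as below for one-dimensional data. This is the standard Lloyd-style algorithm with squared Euclidean distance, not the PMC-based variant the article proposes; the function name and the fixed iteration count are assumptions made for the example.

```python
def kmeans_1d(points, centers, n_iter=20):
    """Plain k-means on 1-D data: alternate assignment and mean update."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to the nearest current center.
            nearest = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Update each center to its cluster mean; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters
```

The update step is a sample mean, which is why the method presupposes that the mean and variance exist: with heavy-tailed data a few extreme points can drag a center arbitrarily far.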

19.
In this paper we present a semiparametric test of goodness of fit which is based on the method of L‐moments for the estimation of the nuisance parameters. This test is particularly useful for any distribution that has a convenient expression for its quantile function. The test proceeds by investigating equality of the first few L‐moments of the true and the hypothesised distributions. We provide details and undertake simulation studies for the logistic and the generalised Pareto distributions. Although for some distributions the method of L‐moments estimator is less efficient than the maximum likelihood estimator, the former method has the advantage that it may be used in semiparametric settings and that it requires weaker existence conditions. The new test is often more powerful than competitor tests for goodness of fit of the logistic and generalised Pareto distributions.
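
The sample L‐moments that such a test compares can be computed from the ordered data. The sketch below uses the standard unbiased estimators based on probability-weighted moments; it illustrates only the first three sample L‐moments and is not the test statistic from the paper. The function name is hypothetical, and at least three observations are assumed.

```python
def sample_l_moments(data):
    """First three sample L-moments via probability-weighted moments b0, b1, b2."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    # With 0-based index i, the weights match (j-1)/(n-1) and
    # (j-1)(j-2)/((n-1)(n-2)) for the j-th order statistic.
    b0 = sum(xs) / n
    b1 = sum(i * x for i, x in enumerate(xs)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x for i, x in enumerate(xs)) / (n * (n - 1) * (n - 2))
    l1 = b0                      # location (the sample mean)
    l2 = 2.0 * b1 - b0           # scale
    l3 = 6.0 * b2 - 6.0 * b1 + b0  # third L-moment (zero for symmetric samples)
    return l1, l2, l3
```

Because L‐moments are linear combinations of order statistics, they exist whenever the mean exists, which is the weaker existence condition the abstract refers to.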

20.
S. H. Ong. Statistics, 2013, 47(3): 291-302
In this paper, we consider the preliminary test approach for the estimation of the regression parameter in a multiple regression model under multicollinearity. The preliminary test two-parameter estimators based on the Wald (W), likelihood ratio, and Lagrangian multiplier tests are given for the case in which it is suspected that the regression parameter may be restricted to a subspace and the regression error follows a multivariate Student's t distribution. The bias and mean square error of the proposed estimators are derived and compared. The conditions of superiority of the proposed estimators are obtained. Finally, we conclude that the optimum choice of the level of significance becomes the traditional choice by using the Wald test.
