Similar articles
 20 similar articles found
1.
Crime and disease surveillance commonly rely on space-time clustering methods to identify emerging patterns. The goal is to detect spatio-temporal clusters as soon as possible after they occur while controlling the rate of false alarms. With this in mind, a spatio-temporal multiple cluster detection method was developed as an extension of a previous proposal based on a spatial version of the Shiryaev–Roberts statistic. Besides being able to detect multiple clusters, the method has fewer input parameters than the previous proposal, which makes it more intuitive for practitioners. To evaluate the new methodology, a simulation study is performed in several scenarios; it highlights many advantages of the proposed method. Finally, we present a case study on a crime data set from Belo Horizonte, Brazil.
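
The abstract above builds on a spatial version of the Shiryaev–Roberts statistic. As a rough illustration only, the sketch below implements the classical univariate Shiryaev–Roberts recursion for Poisson counts, not the spatial multiple-cluster procedure of the paper. The function name, the baseline and outbreak rates `lam0` and `lam1`, and the alarm threshold are assumptions chosen for the example.

```python
import math


def shiryaev_roberts_alarm(counts, lam0, lam1, threshold):
    """Return the 1-based time of the first alarm, or None if no alarm is raised.

    Classical recursion: R_n = (1 + R_{n-1}) * LR_n with R_0 = 0, where LR_n is
    the likelihood ratio of the outbreak rate lam1 against the baseline lam0.
    """
    r = 0.0
    for n, x in enumerate(counts, start=1):
        # Poisson likelihood ratio for a single count x (the x! terms cancel).
        lr = math.exp(-(lam1 - lam0)) * (lam1 / lam0) ** x
        r = (1.0 + r) * lr
        if r >= threshold:
            return n
    return None


# Hypothetical counts: baseline near 2 per period, then a jump.
alarm_time = shiryaev_roberts_alarm([2, 1, 3, 2, 8, 9, 10], lam0=2.0, lam1=6.0, threshold=100.0)
```

A larger threshold lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off the abstract describes.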

2.
For many years, detection of clusters has been of great public health interest and widely studied. Several methods have been developed to detect clusters and their performance has been evaluated in various contexts. Spatial scan statistics are widely used for geographical cluster detection and inference. Different types of discrete or continuous data can be analyzed using spatial scan statistics for Bernoulli, Poisson, ordinal, exponential, and normal models. In this paper, we propose a scan statistic for survival data based on a generalized life distribution model that covers three important life distributions, namely Weibull, exponential, and Rayleigh. The proposed method is applied to the survival data of tuberculosis patients in the Nainital district of Uttarakhand, India, for the year 2004–05. Monte Carlo simulation studies reveal that the proposed method performs well for different survival distributions.

3.
Binary outcome data with small clusters often arise in medical studies, and the size of clusters might be informative of the outcome. The authors conducted a simulation study to examine the performance of a range of statistical methods. The simulation results showed that all methods performed mostly comparably in the estimation of covariate effects. However, the standard logistic regression approach that ignores the clustering encountered an undercoverage problem when the degree of clustering was nontrivial. The performance of the random-effects logistic regression approach tended to be affected by low disease prevalence, relatively small cluster size, or informative cluster size.

4.
Many experiments aim at populations with persons nested within clusters. Randomization to treatment conditions can be done at the cluster level or at the person level within each cluster. The latter may result in control group contamination, and cluster randomization is therefore often preferred in practice. This article models the control group contamination, calculates the required sample sizes for both levels of randomization, and gives the degree of contamination for which cluster randomization is preferable to randomization of persons within clusters. Moreover, it provides examples of situations where one has to make a choice between both levels of randomization.

5.
Online monitoring is needed to detect outbreaks of diseases such as influenza. Surveillance is also needed for other kinds of outbreaks, in the sense of an increasing expected value after a constant period. Information on spatial location or other variables might be available and may be utilized. We adapted a robust method for outbreak detection to a multivariate case. The relation between the times of the onsets of the outbreaks at different locations (or some other variable) was used to determine the sufficient statistic for surveillance. The derived maximum-likelihood estimator of the outbreak regression was semi-parametric in the sense that the baseline and the slope were non-parametric while the distribution belonged to the one-parameter exponential family. The estimator was used in a generalized-likelihood ratio surveillance method. The method was evaluated with respect to robustness and efficiency in a simulation study and applied to spatial data for detection of influenza outbreaks in Sweden.

6.
Accurate and efficient methods to detect unusual clusters of abnormal activity are needed in many fields such as medicine and business. Often the size of clusters is unknown; hence, multiple (variable) window scan statistics are used to identify clusters using a set of different potential cluster sizes. We give an efficient method to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. We define a Markov chain to efficiently keep track of probabilities needed to compute p-values for the statistic. The state space of the Markov chain is set up by a criterion developed to identify strings that are associated with observing the specified values of the statistic. Using our algorithm, we identify cases where the available approximations do not perform well. We demonstrate our methods by detecting unusual clusters of made free throw shots by National Basketball Association players during the 2009–2010 regular season.

7.
We set out IDR as a loglinear-model-based Moran's I test for Poisson count data that resembles the Moran's I residual test for Gaussian data. We evaluate its type I and type II error probabilities via simulations, and demonstrate its utility via a case study. When population sizes are heterogeneous, IDR is effective in detecting local clusters by local association terms with an acceptable type I error probability. When used in conjunction with local spatial association terms in loglinear models, IDR can also indicate the existence of a first-order global cluster that can hardly be removed by local spatial association terms. In this situation, IDR should not be directly applied for local cluster detection. In the case study of St. Louis homicides, we bridge loglinear model methods for parameter estimation to exploratory data analysis, so that a uniform association term can be defined with spatially varied contributions among spatial neighbors. The method makes use of exploratory tools such as Moran's I scatter plots and residual plots to evaluate the magnitude of deviance residuals, and it is effective in modeling the shape, the elevation and the magnitude of a local cluster in the model-based test.
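
For readers unfamiliar with the statistic that IDR is modeled on, the sketch below computes the classical global Moran's I for a vector of values and a spatial weights matrix. It is not the IDR test from the abstract; the function name and the four-region chain example are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Classical global Moran's I.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights. Assumes the values are not all equal
    and that at least one weight is non-zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)
    return (n / s0) * (cross / ss)


# Hypothetical example: four regions on a line, each adjacent to its neighbors.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_stat = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights)
```

A positive value, as in this smoothly increasing example, indicates that neighboring regions tend to have similar values.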

8.
This paper considers the effects of informative two-stage cluster sampling on estimation and prediction. The aims of this article are twofold: first, to estimate the parameters of the superpopulation model for two-stage cluster sampling from a finite population, when the sampling design for both stages is informative, using maximum likelihood estimation methods based on the sample-likelihood function; second, to predict the finite population total, the cluster-specific effects, and the cluster totals for clusters in the sample and for clusters not in the sample. To achieve this, we derive the sample and sample-complement distributions and the moments of the first- and second-stage measurements. We also derive the conditional sample and conditional sample-complement distributions and the moments of the cluster-specific effects given the cluster measurements. It should be noted that classical design-based inference, which consists of weighting the sample observations by the inverse of the sample selection probabilities, cannot be applied to predict the cluster-specific effects for clusters not in the sample. We also give an alternative justification of the Royall [1976. The linear least squares prediction approach to two-stage sampling. Journal of the American Statistical Association 71, 657–664] predictor of the finite population total under two-stage cluster population. Furthermore, small-area models are studied under informative sampling.

9.
Applying spatiotemporal scan statistics is an effective way to detect the clustering of mean shifts in many application fields. Although several exponentially weighted moving average (EWMA) based scan statistics have been proposed, the existing methods generally require a fixed scan window size or apply the weighting technique across the temporal axis only. However, the size of shift coverage is often unavailable in practical problems. Using a mismatched scan radius may misrepresent the size of cluster coverage in space or delay the time to detection. This research proposed an stEWMA method that applies the weighting technique across both temporal and spatial axes with a variable scan radius. The simulation analysis showed that the stEWMA method can have a significantly shorter time to detection than the likelihood ratio-based scan statistic using a variable scan radius, especially when cluster coverage size is small. The application to detecting the increase of male thyroid cancer in the state of New Mexico also showed the effectiveness of the proposed method.
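
As background for the abstract above, the following sketch shows the standard temporal EWMA recursion. It weights observations along the time axis only, which is exactly the limitation the abstract says stEWMA addresses; the spatial weighting of stEWMA is not reproduced here. The function name, the smoothing constant `lam`, and the starting value are assumptions for the example.

```python
def ewma(series, lam, start):
    """Temporal EWMA: z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = start."""
    z = start
    smoothed = []
    for x in series:
        # Recent observations get weight lam; older ones decay geometrically.
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed


# Hypothetical series: a sustained shift from 0 to 1 is tracked gradually.
track = ewma([1.0, 1.0, 1.0], lam=0.5, start=0.0)
```

A smaller `lam` smooths more heavily and reacts more slowly, so the choice trades sensitivity to small sustained shifts against speed of reaction to large ones.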

10.
Among the many tools suited to detecting local clusters in group-level data, Kulldorff–Nagarwalla’s spatial scan statistic has gained wide popularity (Kulldorff and Nagarwalla in Stat Med 14(8):799–810, 1995). The underlying assumptions needed to make statistical inference feasible are quite strong, as counts in spatial units are assumed to be independent Poisson-distributed random variables. Unfortunately, outcomes in spatial units are often not independent of each other, and risk estimates of areas that are close to each other will tend to be positively correlated as they share a number of spatially varying characteristics. We therefore introduce a Bayesian model-based algorithm for cluster detection in the presence of spatially autocorrelated relative risks. Our approach has been made possible by the recent development of new numerical methods based on integrated nested Laplace approximation, by which we can directly compute very accurate approximations of posterior marginals within short computational time (Rue et al. in JRSS B 71(2):319–392, 2009). Simulated data and a case study show that the performance of our method is at least comparable to that of Kulldorff–Nagarwalla’s statistic.
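
To make the independent-Poisson baseline mentioned above concrete, the sketch below evaluates the widely used Poisson scan log-likelihood ratio for candidate zones and picks the zone with the largest value. It is not the Bayesian algorithm of the abstract, and it omits the Monte Carlo step normally used to obtain a p-value. The function names, the candidate zones, and the data are hypothetical; expected counts are assumed to be positive and to sum to the total number of cases.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio for an excess of cases inside a zone (0 if no excess)."""
    if cases_in <= expected_in:
        return 0.0
    inside = cases_in * math.log(cases_in / expected_in)
    outside = 0.0
    if cases_in < total_cases:
        rest = total_cases - cases_in
        outside = rest * math.log(rest / (total_cases - expected_in))
    return inside + outside


def most_likely_zone(zones, cases, expected):
    """Return (zone, llr) for the candidate zone with the largest LLR."""
    total = sum(cases)
    best_zone, best_llr = None, 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = sum(expected[i] for i in zone)
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best_zone, best_llr = zone, llr
    return best_zone, best_llr


# Hypothetical data: region 0 has twice as many cases as expected.
cases = [10, 30, 30, 30]
expected = [5.0, 30.0, 30.0, 35.0]
zone, llr = most_likely_zone([(0,), (1,), (0, 1), (3,)], cases, expected)
```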

11.
In this paper we consider a novel approach to analyzing medical images by applying a concept typically employed in geospatial studies. For certain diseases, such as asthma, there is a relevant distinction between the heterogeneity of constriction in airways for patients compared to healthy individuals. In order to describe such heterogeneities quantitatively, we utilize spatial correlation in the realm of lung computed tomography (CT). Specifically, we apply the approximate profile-likelihood estimator (APLE) to simulated lung air-trapping data selected based on potential interest to pulmonologists, and we explore reference values obtainable through this statistic. Results indicate that APLE values are independent of air-trapping values, and can provide useful insight into spatial patterns of these values within the lungs in situations where other common metrics, such as the coefficient of variation, reveal little. The APLE relies on a neighborhood weights matrix to define spatial relatedness of considered regions, and among a few weight structures explored, a working optimal choice seems to be one based on the inverse distance squared between regions of interest. The application yields a new method to help analyze the degree of heterogeneity in lung CT images, which can be generalized to other medical images as well.

12.
In epidemiologic studies where the outcome is binary, the data often arise as clusters, as when siblings, friends or neighbors are used as matched controls in a case-control study. Conditional logistic regression (CLR) is typically used for such studies to estimate the odds ratio for an exposure of interest. However, CLR assumes the exposure coefficient is the same in every cluster, and CLR-based inference can be badly biased when homogeneity is violated. Existing methods for testing goodness-of-fit for CLR are not designed to detect such violations. Good alternative methods of analysis exist if one suspects there is heterogeneity across clusters. However, routine use of alternative robust approaches when there is no appreciable heterogeneity could cause loss of precision and be computationally difficult, particularly if the clusters are small. We propose a simple non-parametric test, the test of heterogeneous susceptibility (THS), to assess the assumption of homogeneity of a coefficient across clusters. The test is easy to apply and provides guidance as to the appropriate method of analysis. Simulations demonstrate that the THS has reasonable power to reveal violations of homogeneity. We illustrate by applying the THS to a study of periodontal disease.

13.
This article investigates the effects of number of clusters, cluster size, and correction for chance agreement on the distribution of two similarity indices, namely, the Jaccard and Rand indices. Skewness and kurtosis are calculated for the two indices and their corrected forms, then compared with those of the normal distribution. Three clustering algorithms are implemented: complete linkage, Ward, and K-means. Data were randomly generated from bivariate normal distributions with specified means and variance-covariance matrices. Three-way ANOVA is performed to assess the significance of the design factors using skewness and kurtosis of the indices as responses. Test statistics for testing skewness and kurtosis and observed power are calculated. Simulation results showed that, independent of the clustering algorithms or the similarity indices used, the interaction effect cluster size × number of clusters and the main effects of cluster size and number of clusters were always found to be significant for skewness and kurtosis. The three-way interaction of cluster size × correction × number of clusters was significant for skewness of Rand and Jaccard indices using all clustering algorithms, but was not significant using Ward's method for both Rand and Jaccard indices, while significant for Jaccard only using complete linkage and K-means algorithms. The correction for chance agreement was significant for skewness and kurtosis using Rand and Jaccard indices when the complete linkage method is used. Hence, such design factors must be taken into consideration when studying the distribution of such indices.
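
The two similarity indices studied above can be computed from pair counts. The sketch below is a minimal illustration of the uncorrected Rand and Jaccard indices for two labelings of the same objects; the chance-corrected forms discussed in the abstract are not implemented. Function names and the example labelings are assumptions.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count object pairs by whether each partition puts them in the same cluster."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Share of pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(labels_a, labels_b):
    """Like Rand, but ignores pairs separated in both partitions.

    Undefined (division by zero) if no pair is co-clustered in either partition.
    """
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    return both / (both + only_a + only_b)


# Hypothetical labelings of four objects.
rand_value = rand_index([0, 0, 1, 1], [0, 0, 1, 2])
jaccard_value = jaccard_index([0, 0, 1, 1], [0, 0, 1, 2])
```

Because the Rand index counts pairs separated in both partitions as agreements, it tends to be larger than the Jaccard index when there are many small clusters, which is one reason cluster size and number of clusters can affect the distributions of the two indices differently.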

14.
Most disease registries are updated at least yearly. If a geographically localized health hazard suddenly occurs, we would like to have a surveillance system in place that can pick up a new geographical disease cluster as quickly as possible, irrespective of its location and size. At the same time, we want to minimize the number of false alarms. By using a space–time scan statistic, we propose and illustrate a system for regular time periodic disease surveillance to detect any currently 'active' geographical clusters of disease and which tests the statistical significance of such clusters adjusting for the multitude of possible geographical locations and sizes, time intervals and time periodic analyses. The method is illustrated on thyroid cancer among men in New Mexico 1973–1992.

15.
We study the problem of merging homogeneous groups of pre-classified observations from a robust perspective motivated by the anti-fraud analysis of international trade data. This problem may be seen as a clustering task which exploits preliminary information on the potential clusters, available in the form of group-wise linear regressions. Robustness is then needed because of the sensitivity of likelihood-based regression methods to deviations from the postulated model. Through simulations run under different contamination scenarios, we assess the impact of outliers both on group-wise regression fitting and on the quality of the final clusters. We also compare alternative robust methods that can be adopted to detect the outliers and thus to clean the data. One major conclusion of our study is that the use of robust procedures for preliminary outlier detection is generally recommended, except perhaps when contamination is weak and the identification of cluster labels is more important than the estimation of group-specific population parameters. We also apply the methodology to find homogeneous groups of transactions in one empirical example that illustrates our motivating anti-fraud framework.

16.
We suggest locally parametric methods for estimating curves, such as boundaries of density supports or fault lines in response surfaces, in a variety of spatial problems. The methods are based on spatial approximations to the local likelihood that the curve passes through a given point in the plane, as a function of that point. The local likelihood might be a regular likelihood computed locally, with kernel weights (e.g. in the case of support boundary estimation) or a local version of a likelihood ratio statistic (e.g. in fault line estimation). In either case, the local likelihood surface represents a function which is relatively large near the target curve, and relatively small elsewhere. Therefore, the curve may be estimated as a ridge line of the surface; we require only a numerical algorithm for tracking the projection of a ridge into the plane. This approach offers several potential advantages over alternative methods. First, the local (log-)likelihood surface can be graphed, and the degree of 'ridginess' assessed visually, to determine how the level of local smoothing should be varied in different spatial locations in order to emphasize the ridge and hence the curve adequately. Secondly, the local likelihood surface does not need to be computed in anything like its entirety; once we have a reasonable approximation to a point on the curve we may track it by numerically 'walking along' the ridge line. Thirdly, the method is appropriate without change for many different types of spatial explanatory variables, whether gridded, stochastic or otherwise. Three examples are explored in detail: fault lines in response surfaces and in intensity or density surfaces, and boundaries of supports of probability densities.

17.
A spatial process observed over a lattice or a set of irregular regions is usually modeled using a conditionally autoregressive (CAR) model. The neighborhoods within a CAR model are generally formed using only the inter-distances or boundaries between the regions. To accommodate directional spatial variation, a new class of spatial models is proposed that gives different weights to neighbors in different directions. The proposed model generalizes the usual CAR model by accounting for spatial anisotropy. Maximum likelihood estimators are derived and shown to be consistent under some regularity conditions. Simulation studies are presented to evaluate the finite sample performance of the new model as compared to the CAR model. Finally, the method is illustrated using data sets on crime rates in Columbus, OH, and on elevated blood lead levels of children under the age of 72 months observed in Virginia in 2000.

18.
Centroid-based partitioning cluster analysis is a popular method for segmenting data into more homogeneous subgroups. Visualization can help tremendously to understand the positions of these subgroups relative to each other in higher dimensional spaces and to assess the quality of partitions. In this paper we present several improvements on existing cluster displays using neighborhood graphs with edge weights based on cluster separation and convex hulls of inner and outer cluster regions. A new display called shadow-stars can be used to diagnose pairwise cluster separation with respect to the distribution of the original data. Artificial data and two case studies with real data are used to demonstrate the techniques.

19.
When modeling correlated binary data in the presence of informative cluster sizes, generalized estimating equations with either resampling or inverse-weighting are often used to correct for estimation bias. However, existing methods for the clustered longitudinal setting assume constant cluster sizes over time. We present a subject-weighted generalized estimating equations scheme that provides valid parameter estimation for the clustered longitudinal setting while allowing cluster sizes to change over time. We compare, via simulation, the performance of existing methods to our subject-weighted approach. The subject-weighted approach was the only method that showed negligible bias, with excellent coverage, for all model parameters.

20.
Compared to tests for localized clusters, the tests for global clustering only collect evidence for clustering throughout the study region without evaluating the statistical significance of the individual clusters. The weighted likelihood ratio (WLR) test based on the weighted sum of likelihood ratios represents an important class of tests for global clustering. Song and Kulldorff (Likelihood based tests for spatial randomness. Stat Med. 2006;25(5):825–839) developed a wide variety of weight functions with the WLR test for global clustering. However, these weight functions are often defined based on the cell population size or the geographic information such as area size and distance between cells. They do not make use of the information from the observed count, although the likelihood ratio of a potential cluster depends on both the observed count and its population size. In this paper, we develop a self-adjusted weight function to directly allocate weights onto the likelihood ratios according to their values. The power of the test was evaluated and compared with existing methods based on a benchmark data set. The comparison results favour the suggested test especially under global chain clustering models.

Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号