Similar Articles
20 similar articles found (search time: 31 ms)
1.
ABSTRACT

For many years, the detection of clusters has been of great public health interest and widely studied. Several methods have been developed to detect clusters, and their performance has been evaluated in various contexts. Spatial scan statistics are widely used for geographical cluster detection and inference. Different types of discrete or continuous data can be analyzed using spatial scan statistics for Bernoulli, Poisson, ordinal, exponential, and normal models. In this paper, we propose a scan statistic for survival data based on a generalized life distribution model that encompasses three important life distributions: the Weibull, exponential, and Rayleigh. The proposed method is applied to survival data on tuberculosis patients in the Nainital district of Uttarakhand, India, for the year 2004–05. Monte Carlo simulation studies reveal that the proposed method performs well for different survival distributions.
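The abstract does not give the statistic's form; as a point of reference, the likelihood-ratio scan statistic for the exponential special case (a Weibull with shape one), with d_Z deaths and T_Z total time at risk inside a candidate zone Z, takes the following familiar form. This is a sketch of the general construction, not necessarily the paper's exact expression.

```latex
% Exponential model: \hat\lambda = d/T maximizes d\log\lambda - \lambda T, so
\mathrm{LLR}(Z) = d_Z \log\frac{d_Z}{T_Z}
                + d_{Z^c} \log\frac{d_{Z^c}}{T_{Z^c}}
                - d \log\frac{d}{T},
\qquad \text{scan statistic} = \max_{Z} \mathrm{LLR}(Z).
```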

2.
In this article, scan statistics for detecting a local change in variance in two-dimensional normal data are discussed. When the precise size of the rectangular window where a local change in variance has occurred is unknown, multiple and variable window scan statistics are proposed. A simulation study is presented to evaluate the performance of the scan statistics investigated in this article via a comparison of power. A method for estimating the rectangular region where a change in variance has occurred, and the size of that change, is also discussed.
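To make the construction concrete, here is a minimal sketch of a variable-window variance scan for normal data on a grid. The mean-zero assumption, the window list, and the plain likelihood-ratio form are our illustrative choices, not the authors' exact statistic.

```python
import numpy as np

def variance_scan(x, window_sizes):
    """Max log-likelihood ratio for a local change in variance over all
    rectangular windows of the given (height, width) sizes.
    Assumes the field x has known mean zero."""
    n = x.size
    s_tot = np.sum(x**2)
    best, best_win = -np.inf, None
    for h, w in window_sizes:
        for i in range(x.shape[0] - h + 1):
            for j in range(x.shape[1] - w + 1):
                win = x[i:i+h, j:j+w]
                m = win.size
                if m == n:          # window must be a proper subregion
                    continue
                s_in = np.sum(win**2)
                s_out = s_tot - s_in
                # LLR with variance MLEs inside/outside vs. a common variance
                llr = (n/2)*np.log(s_tot/n) - (m/2)*np.log(s_in/m) \
                      - ((n-m)/2)*np.log(s_out/(n-m))
                if llr > best:
                    best, best_win = llr, (i, j, h, w)
    return best, best_win

# p-value by Monte Carlo: compare against scans of simulated N(0,1) fields.
```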

3.
In statistical data analysis it is often important to compare, classify, and cluster different time series. Various methods have been proposed in the literature for these purposes, but they usually assume time series of equal sample size. In this article, we propose a spectral domain method for handling time series of unequal length. The method makes the spectral estimates comparable by producing statistics at the same frequencies. The procedure is compared with other methods proposed in the literature in a Monte Carlo simulation study. As an illustrative example, the proposed spectral method is applied to cluster industrial production series of several developed countries.
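A minimal sketch of the general idea follows: evaluate each series' log-periodogram on a common frequency grid by interpolation, then cluster on the resulting distances. The interpolation and complete-linkage choices are ours, not necessarily the authors'.

```python
import numpy as np
from scipy.signal import periodogram
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def log_spec_on_grid(x, grid):
    """Log-periodogram of one series, interpolated onto a common grid."""
    f, p = periodogram(x, detrend='constant')
    return np.interp(grid, f, np.log(p + 1e-12))

def cluster_series(series_list, n_clusters, n_grid=64):
    """Cluster series of unequal length via spectra on shared frequencies."""
    grid = np.linspace(0.01, 0.5, n_grid)   # frequencies in cycles/sample
    feats = np.array([log_spec_on_grid(x, grid) for x in series_list])
    z = linkage(pdist(feats), method='complete')
    return fcluster(z, t=n_clusters, criterion='maxclust')
```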

4.
Abstract

In many cluster randomization studies, cluster sizes are not fixed and may be highly variable. For such studies, sample size estimation assuming a constant cluster size may lead to under-powered trials. Sample size formulas have been developed to incorporate the variability in cluster size for clinical trials with continuous and binary outcomes. Count outcomes also frequently occur in cluster randomized studies. In this paper, we derive a closed-form sample size formula for count outcomes that accounts for the variability in cluster size. We compare the performance of the proposed method with the average-cluster-size method through simulation. The simulation study shows that the proposed method performs better, with empirical power and type I error closer to the nominal levels.
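The paper's closed-form formula is not reproduced in the abstract. The sketch below uses the standard design-effect route as a stand-in: inflate an individually randomized sample size by DE = 1 + ((CV² + 1)·m̄ − 1)·ρ, which reduces to the usual 1 + (m̄ − 1)·ρ when cluster sizes are constant. The two-rate comparison and all names are illustrative, not the authors' derivation.

```python
import math
from scipy.stats import norm

def clusters_per_arm(rate0, rate1, mean_size, cv_size, icc,
                     alpha=0.05, power=0.8):
    """Clusters per arm for comparing two event rates (one unit of
    person-time per subject), inflated for variable cluster size via
    DE = 1 + ((CV^2 + 1)*m - 1)*icc. A design-effect sketch only."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_subj = z**2 * (rate0 + rate1) / (rate0 - rate1)**2  # per arm, unclustered
    de = 1 + ((cv_size**2 + 1) * mean_size - 1) * icc     # variable-size design effect
    return math.ceil(n_subj * de / mean_size)
```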

5.
This paper discusses regression analysis of clustered interval-censored failure time data, which often occur in medical follow-up studies, among other areas. For such data, the failure time may be related to the cluster size, the number of subjects within each cluster; that is, the cluster sizes may be informative. For this problem, we present a within-cluster resampling method for the situation where the failure time of interest can be described by a class of linear transformation models. In addition to establishing the asymptotic properties of the proposed estimators of the regression parameters, we conduct an extensive simulation study to assess the finite-sample properties of the proposed method; it suggests that the method works well in practical situations. An application to the example that motivated this study is also provided.
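A generic sketch of within-cluster resampling: repeatedly draw one subject per cluster, fit the model to each resampled data set, and combine. The combination rule shown, averaging the estimates and subtracting the between-resample variance from the average model-based variance, is the usual WCR recipe; `fit` is a placeholder for, e.g., a transformation-model fit, and is not the paper's implementation.

```python
import numpy as np

def wcr(clusters, fit, n_resamples=500, rng=0):
    """clusters: list of per-cluster data arrays.
    fit: callable taking one resampled data set (one subject per
    cluster) and returning (estimate, model_based_variance)."""
    rng = np.random.default_rng(rng)
    ests, variances = [], []
    for _ in range(n_resamples):
        sample = [c[rng.integers(len(c))] for c in clusters]  # one per cluster
        est, var = fit(sample)
        ests.append(est)
        variances.append(var)
    ests = np.asarray(ests)
    theta = ests.mean()
    # WCR variance: mean model-based variance minus between-resample variance
    v = np.mean(variances) - ests.var(ddof=1)
    return theta, v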

6.
A scan statistic is proposed for the prospective monitoring of spatio-temporal count data with an excess of zeros. The method, based on an outbreak model for the zero-inflated Poisson distribution, is shown to be superior to traditional scan statistics based on the Poisson distribution in the presence of structural zeros. The spatial accuracy and detection timeliness of the proposed scan statistic are investigated by means of simulation, and an application to the weekly cases of campylobacteriosis in Germany illustrates how the scan statistic can be used to detect emerging disease outbreaks. An implementation of the method is provided in the open-source R package scanstatistics, available on the Comprehensive R Archive Network.
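The abstract names the model but not its form. For reference, a zero-inflated Poisson count with structural-zero probability π, together with an outbreak alternative that inflates the rate by q > 1 inside a space–time window W, can be written as follows; this is a sketch of the standard ZIP setup, not the paper's full specification.

```latex
P(Y_{it} = y) \;=\; \pi_{it}\,\mathbf{1}\{y = 0\}
  \;+\; (1 - \pi_{it})\,\frac{e^{-\lambda_{it}}\,\lambda_{it}^{\,y}}{y!},
\qquad
\lambda_{it} \;\longrightarrow\; q\,\lambda_{it}
  \ \ \text{for } (i,t) \in W,\ q > 1 .
```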

7.
Accurate and efficient methods to detect unusual clusters of abnormal activity are needed in many fields, such as medicine and business. Often the size of the clusters is unknown; hence, multiple (variable) window scan statistics are used to identify clusters using a set of potential cluster sizes. We give an efficient method to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. We define a Markov chain that efficiently keeps track of the probabilities needed to compute p-values for the statistic. The state space of the Markov chain is set up by a criterion developed to identify strings that are associated with observing the specified values of the statistic. Using our algorithm, we identify cases where the available approximations do not perform well. We demonstrate our methods by detecting unusual streaks of made free throws by National Basketball Association players during the 2009–2010 regular season.
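The Markov-chain idea can be seen in miniature in the i.i.d. Bernoulli special case: carry the last w−1 outcomes as the state, absorb as soon as a completed window of length w holds k or more successes, and propagate probabilities forward. This is a sketch of the exact-distribution technique for a single window size; the paper's method handles multiple windows and higher-order multi-state Markov inputs.

```python
from itertools import product
import numpy as np

def prob_scan_below(n, w, k, p):
    """P(max window-sum over all length-w windows in n i.i.d.
    Bernoulli(p) trials is < k), via a Markov chain whose state is the
    last w-1 outcomes. Requires k <= w and n >= w."""
    states = list(product((0, 1), repeat=w - 1))
    idx = {s: i for i, s in enumerate(states)}
    # distribution of the first w-1 trials (no full window exists yet)
    prob = np.zeros(len(states))
    for s in states:
        prob[idx[s]] = p**sum(s) * (1 - p)**(w - 1 - sum(s))
    for _ in range(n - w + 1):        # each step completes one window
        new = np.zeros_like(prob)
        for s in states:
            for b in (0, 1):
                if sum(s) + b >= k:   # window sum reached k: absorb (fail)
                    continue
                t = s[1:] + (b,)
                new[idx[t]] += prob[idx[s]] * (p if b else 1 - p)
        prob = new
    return prob.sum()
```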

8.
The problem of predicting future generalized order statistics when the future sample size is a random variable is discussed. A general expression for the coverage probability of the prediction intervals is derived. Since k-records and progressively type-II censored order statistics are contained in the model of generalized order statistics, the corresponding results for them can be deduced as special cases. Numerical computations are given for future sample sizes with degenerate, binomial, Poisson, and geometric distributions. The procedure for finding an optimal prediction interval is presented for each case. Finally, we apply our results to a real data set in life testing given in Lee and Wang [Statistical methods for survival data analysis. Hoboken, NJ: John Wiley and Sons; 2003, p. 58, Table 3.4] to illustrate the proposed procedure.
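The key device, averaging the conditional coverage over the distribution of the random future sample size N, can be sketched as follows (our notation, not the paper's general expression):

```latex
P\bigl(Y \in I(\mathbf{x})\bigr)
  \;=\; \sum_{n} P(N = n)\,
        P\bigl(Y \in I(\mathbf{x}) \mid N = n\bigr),
```

where the degenerate case P(N = n₀) = 1 recovers ordinary fixed-sample-size prediction.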

9.
The aim of this paper is to propose a pedagogical explanation of the Le Cam theorem and to illustrate its use, through a practical application, for temporal cluster detection. The theorem concerns the division of an interval by randomly chosen points, and it characterizes the asymptotic behavior of a certain class of sums of functions of the lengths of the successive intervals between points. It is not very intuitive, and understanding it requires some deepening. After stating the theorem, its different aspects are explained and detailed as pedagogically as possible. Theoretical applications are proposed through the proof of two propositions. Then a very concrete application of this theorem to temporal cluster detection is presented, tested in a power study, and compared with other global cluster detection tests. Finally, this approach is applied to the well-known Knox temporal data set.
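For orientation, the objects in the theorem are the spacings D_i of points dropped uniformly on [0,1]. A standard background fact, stated here as context rather than as the theorem itself, is that uniform spacings can be represented through i.i.d. standard exponentials, which is what makes sums of functions of the normalized spacings asymptotically normal under regularity conditions:

```latex
(D_1,\dots,D_n) \;\overset{d}{=}\;
  \Bigl(\frac{E_1}{\sum_{j} E_j},\dots,\frac{E_n}{\sum_{j} E_j}\Bigr),
\qquad E_i \overset{\text{iid}}{\sim} \mathrm{Exp}(1),
\qquad \sum_{i} h(nD_i) \ \text{asymptotically normal.}
```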

10.
This article focuses on the clustering problem based on Dirichlet process (DP) mixtures. To model both time-invariant and temporal patterns, and unlike other existing clustering methods, the proposed semi-parametric model is flexible in that both common and unique patterns are taken into account simultaneously. Furthermore, by jointly clustering subjects and the associated variables, the intrinsic complex patterns shared among subjects and among variables are expected to be captured. The number of clusters and the cluster assignments are inferred directly through the DP. Simulation studies illustrate the effectiveness of the proposed method. An application to wheal size data is discussed with the aim of identifying novel temporal patterns among allergens within subject clusters.
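How a DP lets the number of clusters be inferred rather than fixed is easiest to see through the Chinese restaurant process, the partition law a DP induces. The sketch below draws cluster assignments from a CRP with concentration alpha; it is a generic illustration, not the paper's joint subject-and-variable sampler.

```python
import numpy as np

def crp(n, alpha, rng=None):
    """Cluster assignments for n items from a Chinese restaurant process:
    item i joins an existing cluster with probability proportional to its
    size, or opens a new cluster with probability proportional to alpha."""
    rng = np.random.default_rng(rng)
    z, counts = [0], [1]
    for _ in range(1, n):
        p = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(p), p=p / p.sum())
        if k == len(counts):
            counts.append(1)      # a new cluster is born
        else:
            counts[k] += 1
        z.append(k)
    return z
```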

11.
12.

Pairwise likelihood is a limited-information estimation method that has also been used for estimating the parameters of latent variable and structural equation models. It is a special case of composite likelihood methods, which use lower-order conditional or marginal log-likelihoods instead of the full log-likelihood. The composite likelihood to be maximized is a weighted sum of marginal or conditional log-likelihoods. Weighting has been proposed for increasing efficiency, but the choice of weights is not straightforward in most applications. Furthermore, the importance of leaving out higher-order scores to avoid duplicating lower-order marginal information has been pointed out. In this paper, we approach the problem of weighting from a sampling perspective. More specifically, we propose a sampling method for selecting pairs based on their contribution to the total variance across all pairs. The sampling approach does not aim to increase efficiency but to decrease the estimation time, especially in models with a large number of observed categorical variables. We demonstrate the performance of the proposed methodology using simulated examples and a real application.
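The objective being maximized is a weighted sum of bivariate log-likelihoods; with a sampled subset of pairs, an inverse-inclusion-probability weight keeps the sampled criterion unbiased for the full one. The following is our sketch of the construction, in our notation:

```latex
p\ell(\theta) = \sum_{i<j} w_{ij}\,\log L(\theta;\, y_i, y_j),
\qquad
\widehat{p\ell}(\theta)
  = \sum_{(i,j)\in S} \frac{w_{ij}}{\pi_{ij}}\,\log L(\theta;\, y_i, y_j),
```

where S is the sampled set of pairs and π_{ij} the probability that pair (i, j) is included in S.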


13.
A versatile procedure is described comprising an application of statistical techniques to the analysis of the large, multi-dimensional data arrays produced by electroencephalographic (EEG) measurements of human brain function. Previous analytical methods have been unable to identify objectively the precise times at which statistically significant experimental effects occur, owing to the large number of variables (electrodes) and small number of subjects, or have been restricted to two-treatment experimental designs. Many time points are sampled in each experimental trial, making adjustment for multiple comparisons mandatory. Given the typically large number of comparisons and the clear dependence structure among time points, simple Bonferroni-type adjustments are far too conservative. A three-step approach is proposed: (i) summing univariate statistics across variables; (ii) using permutation tests for treatment effects at each time point; and (iii) adjusting for multiple comparisons using permutation distributions to control family-wise error across the whole set of time points. Our approach provides an exact test of the individual hypotheses while asymptotically controlling family-wise error in the strong sense, and can provide tests of interaction and main effects in factorial designs. An application to two experimental data sets from EEG studies is described, but the approach is applicable to the analysis of spatio-temporal multivariate data gathered in many other contexts.
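Steps (ii) and (iii) amount to the classical max-statistic permutation adjustment, sketched below for the two-group special case (Welch t per time point, family-wise adjusted p-values from the permutation distribution of the maximum). The function and array layout are our own illustration.

```python
import numpy as np

def maxT_pvalues(x, y, n_perm=5000, rng=0):
    """Permutation max-T adjusted p-values across time points.
    x, y: (subjects, timepoints) arrays for the two groups."""
    rng = np.random.default_rng(rng)
    data = np.vstack([x, y])
    nx = len(x)

    def tstats(idx):
        a, b = data[idx[:nx]], data[idx[nx:]]
        return np.abs(a.mean(0) - b.mean(0)) / np.sqrt(
            a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))

    obs = tstats(np.arange(len(data)))
    # null distribution of the maximum statistic over all time points
    maxes = np.array([tstats(rng.permutation(len(data))).max()
                      for _ in range(n_perm)])
    return (maxes[None, :] >= obs[:, None]).mean(axis=1)
```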

14.
In many medical studies patients are nested, or clustered, within doctors. With many explanatory variables, variable selection with clustered data can be challenging. We propose a method for variable selection based on random forests that addresses clustered data through stratified binary splits. Our motivating example involves the detection of orthopedic device components from a large pool of candidates, where each patient belongs to a surgeon. Simulations compare the performance of survival forests grown using the stratified logrank statistic to those using conventional and robust logrank statistics, as well as a method for selecting variables using a threshold based on a variable's empirical null distribution. The stratified logrank split performs better than the conventional and robust alternatives when data are generated with cluster-specific effects and cluster sizes are sufficiently large, and performs comparably to those alternatives in the absence of cluster-specific effects. Thresholding was effective at distinguishing between important and unimportant variables.
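The split statistic pools the usual logrank (O − E) and variance contributions across strata (here, surgeons). A generic sketch of that two-sample stratified statistic follows; the forest machinery around it is not shown.

```python
import numpy as np

def stratified_logrank(time, event, group, stratum):
    """Stratified two-sample logrank z-statistic. group is binary {0,1};
    (O - E) and hypergeometric variances are summed over strata."""
    num, var = 0.0, 0.0
    for s in np.unique(stratum):
        m = stratum == s
        t, d, g = time[m], event[m], group[m]
        for tau in np.unique(t[d == 1]):          # event times in stratum
            at_risk = t >= tau
            n = at_risk.sum()
            n1 = (at_risk & (g == 1)).sum()
            d_tot = ((t == tau) & (d == 1)).sum()
            d1 = ((t == tau) & (d == 1) & (g == 1)).sum()
            num += d1 - d_tot * n1 / n            # observed - expected
            if n > 1:
                var += d_tot * (n1 / n) * (1 - n1 / n) * (n - d_tot) / (n - 1)
    return num / np.sqrt(var)
```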

15.
We propose a method for assessing an individual patient's risk of a future clinical event using clinical trial or cohort data and Cox proportional hazards regression, combining the information from several studies using meta-analysis techniques. The method combines patient-specific estimates of the log cumulative hazard across studies, weighting by the relative precision of the estimates, using either fixed- or random-effects meta-analysis calculations. Risk assessment can be done for any future patient using a few key summary statistics determined once and for all from each study. Generalizations of the method to logistic regression and linear models are immediate. We evaluate the methods using simulation studies and illustrate their application using real data.
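The precision-weighted combination step is standard inverse-variance meta-analysis, sketched below with an optional DerSimonian–Laird random-effects adjustment. The estimates fed in would be the patient-specific log cumulative hazards from each study; the function itself is a generic sketch, not the paper's code.

```python
import numpy as np

def combine(estimates, ses, random_effects=True):
    """Inverse-variance pooling of per-study estimates with standard
    errors ses; optionally adds a DerSimonian-Laird tau^2."""
    est = np.asarray(estimates, float)
    se = np.asarray(ses, float)
    w = 1.0 / se**2
    mu = np.sum(w * est) / w.sum()            # fixed-effects pooled estimate
    if random_effects:
        q = np.sum(w * (est - mu)**2)         # Cochran's Q
        k = len(est)
        tau2 = max(0.0, (q - (k - 1)) / (w.sum() - np.sum(w**2) / w.sum()))
        w = 1.0 / (se**2 + tau2)              # re-weight with tau^2 added
        mu = np.sum(w * est) / w.sum()
    return mu, np.sqrt(1.0 / w.sum())         # pooled estimate and its SE
```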

16.
The circulant embedding method for generating statistically exact simulations of time series from certain Gaussian stationary processes is attractive because of its advantage in computational speed over a competing method based on the modified Cholesky decomposition. We demonstrate that the circulant embedding method can be used to generate simulations from stationary processes whose spectral density functions are dictated by a number of popular nonparametric estimators, including all direct spectral estimators (a special case being the periodogram), certain lag window spectral estimators, all forms of Welch's overlapped segment averaging spectral estimator, and all basic multitaper spectral estimators. One application of this technique is to generate time series for bootstrapping various statistics. When used with bootstrapping, our proposed technique avoids some, but not all, of the pitfalls of previously proposed frequency-domain methods for simulating time series.
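A compact sketch of the core algorithm: embed the n-lag autocovariance sequence in a circulant of size 2n − 2, diagonalize it with the FFT, and color complex white noise by the square roots of the eigenvalues. Assumes the embedding is nonnegative definite (otherwise one pads further); the normalization below is our derivation of the "take the real part" variant, and with it the real and imaginary parts give two independent realizations.

```python
import numpy as np

def circulant_embedding_sim(acvs, rng=None):
    """Simulate a stationary Gaussian series with autocovariances
    acvs[0..n-1] via circulant embedding."""
    rng = np.random.default_rng(rng)
    n = len(acvs)
    c = np.concatenate([acvs, acvs[-2:0:-1]])   # circulant first row, size 2n-2
    lam = np.fft.fft(c).real                    # its eigenvalues
    if np.any(lam < -1e-10):
        raise ValueError("embedding not nonnegative definite; pad further")
    m = len(c)
    z = rng.normal(size=m) + 1j * rng.normal(size=m)
    y = np.fft.fft(np.sqrt(np.maximum(lam, 0.0) / m) * z)
    return y.real[:n]                           # first n values of one realization
```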

17.
Although prediction in mixed effects models usually concerns the random effects, in this paper we deal with the problem of predicting a future, or yet unobserved, response random variable belonging to a given cluster. In particular, the aim is to define computationally tractable prediction intervals with conditional and unconditional coverage probabilities close to the target nominal value. The solution involves the conditional density of the future response random variable given the observed data, or a suitable high-order approximation based on the Laplace method. We prove that, unless the amount of data is very limited, the estimative, or naive, predictive procedure gives a relatively simple, feasible solution for response prediction. An application to generalized linear mixed models is presented.
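The conditional density in question, and the "estimative" interval built from it by plugging in the maximum likelihood estimate, can be sketched as follows (b denotes the cluster's random effect; the Laplace route approximates the integral). Our notation, not the paper's.

```latex
p(z \mid y;\,\theta) \;=\; \int p(z \mid b;\,\theta)\, p(b \mid y;\,\theta)\, db,
\qquad
I_{1-\alpha}(y) \;=\; \bigl[\, q_{\alpha/2}(\hat\theta),\; q_{1-\alpha/2}(\hat\theta) \,\bigr],
```

where q_γ(θ̂) are quantiles of the estimative density p(z | y; θ̂).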

18.
Abstract

This paper presents a new method to estimate the quantiles of generic statistics by combining the concept of random weighting with importance resampling. The method converts the problem of quantile estimation into a dual problem of estimating tail probabilities. Random weighting theory is established to calculate the optimal resampling weights for estimating tail probabilities via sequential variance minimization. The quantile estimate is then constructed using the obtained optimal resampling weights. Experimental results on real and simulated data sets demonstrate that the proposed random weighting method can effectively estimate the quantiles of generic statistics.
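The dual relation the method exploits, recovering a quantile by inverting an estimated tail probability, together with a weighted-resampling estimate of that tail, can be sketched as (our notation; the optimal-weight construction itself is the paper's contribution and is not reproduced):

```latex
\hat{q}_\alpha \;=\; \inf\bigl\{\, x : \widehat{P}(T > x) \le 1 - \alpha \,\bigr\},
\qquad
\widehat{P}(T > x) \;=\; \sum_{i=1}^{B} w_i\, \mathbf{1}\{T_i^{*} > x\},
```

with resampled statistics T*_i and resampling weights w_i summing to one.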

19.
When conducting research with controlled experiments, sample size planning is one of the important decisions researchers have to make. However, current methods do not adequately address variance heterogeneity under cost constraints when comparing several treatment means. This paper proposes a sample size allocation ratio for the fixed-effect heterogeneous analysis of variance when group variances are unequal and the sampling and/or variable costs are constrained. The efficient sample size allocation is determined so as to minimize total cost for a designated power, or to maximize power for a given total cost. The proposed method is verified using an index of relative efficiency together with the corresponding total cost and total sample size needed. We also apply the method in a pain management trial to decide an efficient sample size. Simulation studies show that the proposed sample size formulas are efficient in terms of statistical power. SAS and R code is provided in the appendix for easy application.
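For orientation, the classical cost-constrained allocation that formulas of this kind build on assigns group sizes proportional to σ_i/√c_i: minimizing the variance of a two-group contrast subject to a total cost Σ c_i n_i = C gives the textbook result below, not the paper's multi-treatment formula.

```latex
n_i \;\propto\; \frac{\sigma_i}{\sqrt{c_i}},
\qquad
n_i \;=\; \frac{C\,\sigma_i/\sqrt{c_i}}{\sum_{j} \sqrt{c_j}\,\sigma_j},
```

where c_i is the per-observation cost and σ_i the standard deviation in group i.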

20.
In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps, resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and study their temporal evolution along the time frame. We account for space-time dependence by introducing spatio-temporally varying mixing weights that give observations at nearby locations and consecutive time points similar cluster-membership probabilities. As a result, a clustering varying over time and space is accomplished. Conditionally on cluster membership, a state-space model is deployed to describe the temporal evolution of the sites belonging to each group. Fully Bayesian posterior inference is provided through Markov chain Monte Carlo algorithms. A strategy to select a suitable number of clusters, based on the posterior temporal patterns of the clusters, is also offered. We evaluate our approach through simulation experiments and illustrate it using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength across space and time.
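The model's skeleton, observation-level mixtures with spatio-temporally varying weights and a random-walk state-space law within each cluster, can be sketched as follows; this is our notation, with the weight construction and priors omitted.

```latex
y_{st} \mid z_{st} = k \;\sim\; f\bigl(\,\cdot \mid \mu_{kt}\bigr),
\qquad
P(z_{st} = k) = \pi_{stk},
\qquad
\mu_{kt} = \mu_{k,t-1} + \eta_{kt},\ \ \eta_{kt} \sim N(0,\tau_k^2),
```

where the weights π_{stk} are tied across neighboring sites s and consecutive times t.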
