Similar Documents
20 similar documents found (search time: 31 ms)
1.
Yu (1995) proposes a novel convergence diagnostic for Markov chain Monte Carlo (MCMC) which provides a qualitative measure of mixing for Markov chains via a cusum path plot for univariate parameters of interest. The method is based upon the output of a single replication of an MCMC sampler and is therefore widely applicable and simple to use. One criticism of the method is that its interpretation is subjective, since it is based upon a graphical comparison of two cusum path plots. In this paper, we develop a quantitative measure of smoothness which can be associated with any given cusum path, and show how this measure yields a quantitative measure of mixing. In particular, we derive the large-sample distribution of this smoothness measure, so that objective inference is possible. In addition, we show how this quantitative measure may also be used to estimate the burn-in length for any given sampler. We discuss the utility of this quantitative approach, and highlight a problem which may occur if the chain is able to remain in any one state for some period of time. We provide a more general implementation of the method to overcome this problem in such cases.
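As a minimal sketch of the construction (the sign-change "hairiness" summary below is in the spirit of the smoothness measure described, but is not guaranteed to match the paper's exact statistic):

```python
import numpy as np

def cusum_path(x, burn_in=0):
    """Observed cusum path S_t = sum_{i<=t} (x_i - xbar) of MCMC output x."""
    x = np.asarray(x, dtype=float)[burn_in:]
    return np.cumsum(x - x.mean())

def hairiness(x, burn_in=0):
    """Proportion of slope sign changes along the cusum path. For a rapidly
    mixing chain this is near 1/2 (slope changes behave roughly like fair
    coin flips, which gives an asymptotic normal reference); for a slowly
    mixing chain the path is smooth and the value is near 0."""
    d = np.sign(np.diff(cusum_path(x, burn_in)))
    return float(np.mean(d[1:] != d[:-1]))
```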

2.
In this paper, we propose to monitor a Markov chain sampler using the cusum path plot of a chosen one-dimensional summary statistic. We argue that the cusum path plot can bring out, more effectively than the sequential plot, those aspects of a Markov sampler which tell the user how quickly or slowly the sampler is moving around in its sample space, in the direction of the summary statistic. The proposal is then illustrated in four examples which represent situations where the cusum path plot works well and where it does not. Moreover, a rigorous analysis is given for one of the examples. We conclude that the cusum path plot is an effective tool for convergence diagnostics of a Markov sampler and for comparing different Markov samplers.
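The qualitative contrast is easy to reproduce (illustrative only; the AR(1) chains below merely stand in for two Markov samplers with different mixing speeds):

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(rho, n=5000):
    """AR(1) series: a stand-in for sampler output with tunable mixing."""
    x = np.empty(n)
    x[0] = 0.0
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

for rho in (0.5, 0.99):
    x = ar1(rho)
    s = np.cumsum(x - x.mean())
    print(f"rho={rho}: cusum path range = {s.max() - s.min():.1f}")
# The slowly mixing chain (rho=0.99) gives a smooth cusum path with large
# excursions; the fast chain's path is hairy and hugs zero.
```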

3.
Social network monitoring consists of monitoring changes in networks with the aim of detecting significant changes and attempting to identify the assignable cause(s) contributing to the occurrence of a change. This paper proposes a method that helps to overcome some of the weaknesses of existing methods. A Poisson regression model for the number of communications between network members, as a function of vertex attributes, is constructed. Multivariate exponentially weighted moving average (MEWMA) and multivariate cumulative sum (MCUSUM) control charts are used to monitor the network-formation process. The results indicate more efficient performance for the MEWMA chart in identifying significant changes.
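The MEWMA recursion itself is standard; a minimal sketch on generic multivariate summaries follows (the Poisson regression network model, the MCUSUM competitor and the chart calibration are not reproduced, and the in-control mean and covariance are assumed known):

```python
import numpy as np

def mewma(X, mu0, Sigma, lam=0.2):
    """MEWMA statistics T2_i for the rows of X given the in-control mean mu0
    and covariance Sigma; a change is signalled when T2_i exceeds a control
    limit h chosen to give the desired in-control run length."""
    Z = np.zeros(len(mu0))
    t2 = []
    for i, x in enumerate(np.asarray(X, dtype=float), start=1):
        Z = lam * (x - mu0) + (1 - lam) * Z
        # exact covariance factor of Z_i (steady state: lam / (2 - lam))
        c = lam * (1 - (1 - lam) ** (2 * i)) / (2 - lam)
        t2.append(Z @ np.linalg.solve(c * Sigma, Z))
    return np.array(t2)
```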

4.
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. In this paper, we propose a flexible rank-based nonparametric procedure for gene selection from microarray data. We propose a statistic for testing whether the area under the receiver operating characteristic curve (AUC) for each gene is equal to 0.5, allowing a different variance for each gene. The contribution of this “single gene” statistic is the studentization of the empirical AUC, which takes into account the variance associated with each gene in the experiment. DeLong et al. proposed a nonparametric procedure for calculating a consistent variance estimator of the AUC; we use their variance-estimation technique to obtain a test statistic, and we focus on the primary step in the gene-selection process, namely the ranking of genes with respect to a statistical measure of differential expression. Two real datasets are analyzed to illustrate the methods and a simulation study is carried out to assess the relative performance of different statistical gene-ranking measures. The work includes how to use the variance information to produce a list of significant targets and to assess differential gene expression under two conditions. The proposed method does not involve complicated formulas and does not require advanced programming skills. We conclude that the proposed methods offer useful analytical tools for identifying differentially expressed genes for further biological and clinical analysis.
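A minimal per-gene version of the studentized AUC (standard DeLong placements; the full gene-ranking pipeline and target-list construction are not shown):

```python
import numpy as np

def auc_z(x, y):
    """Studentized empirical AUC for one gene. x and y hold the expression
    values in the two phenotype classes; returns (AUC, z), where z tests
    AUC = 0.5 using DeLong's consistent variance estimator."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    psi = (x[:, None] > y[None, :]) + 0.5 * (x[:, None] == y[None, :])
    auc = psi.mean()
    v10 = psi.mean(axis=1)   # placements of class-1 samples
    v01 = psi.mean(axis=0)   # placements of class-0 samples
    var = v10.var(ddof=1) / len(x) + v01.var(ddof=1) / len(y)
    return float(auc), float((auc - 0.5) / np.sqrt(var))
```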

5.
Sun Haiyan. Statistical Research (《统计研究》), 1999, 16(10): 39–44
For a long time the Chinese government has attached great importance to the quality of statistical data, and through the sustained efforts of statisticians at every level a fairly complete, accurate and effective set of measures and methods for controlling data quality has been worked out, together with a wealth of practical operating experience, which has played an indispensable role in improving the quality of statistical data. These measures and methods, however, lean towards the implementation of specific rules and the application of practical experience, and lack statistically grounded methods of data-quality control. In particular, under different economic regimes statistical data fluctuate considerably, and judging falsified data by experience alone falls far short of what is required. The situation urgently demands that, besides strengthening the enforcement of policies and regulations, we introduce modern statistical methods as quickly as possible and move from experience alone to a combination of experience and methodology, so that…

6.
In this paper some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that the use of multivariate association measures between two sets of variables can overcome the drawbacks of the commonly employed bivariate correlation coefficient, but the resulting methods are generally not monotonic. Thus a new multivariate association measure is proposed, based on the links between canonical correlation analysis and principal component analysis, which is better suited to the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and discussed theoretically.

7.
A method of monitoring the incidence of malformations is described. It is suitable for systems where the number of births between successive malformations is known or can be estimated with reasonable accuracy. The method utilises a cusum technique based on the exponential distribution to detect an increase in the incidence of malformations above a baseline level. Adequate information to enable the implementation of the method is presented. The proposed method compares favourably with others such as the Poisson cusum and the modified sets technique.
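A minimal version of such a cusum on inter-event gaps (the reference value k and decision interval h must be taken from the design information the paper supplies; the values below are placeholders):

```python
import numpy as np

def exp_cusum(gaps, k, h):
    """Lower-sided cusum on the numbers of births between successive
    malformations. Short gaps (increased incidence) push the statistic up;
    returns the index of the first signal, or None."""
    s = 0.0
    for i, g in enumerate(gaps):
        s = max(0.0, s + (k - g))
        if s > h:
            return i
    return None

# placeholder design: baseline mean gap 1000 births, rate rises after gap 50
rng = np.random.default_rng(0)
gaps = np.r_[rng.exponential(1000, 50), rng.exponential(400, 20)]
print(exp_cusum(gaps, k=700, h=2000))
```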

8.
ADE-4: a multivariate analysis and graphical display software
We present ADE-4, a multivariate analysis and graphical display software package. Multivariate analysis methods available in ADE-4 include usual one-table methods like principal component analysis and correspondence analysis; spatial data analysis methods (using a total variance decomposition into local and global components, analogous to Moran and Geary indices); discriminant analysis and within/between-groups analyses; many linear regression methods including lowess and polynomial regression, multiple and PLS (partial least squares) regression and orthogonal regression (principal component regression); projection methods like principal component analysis on instrumental variables, canonical correspondence analysis and many other variants; coinertia analysis and the RLQ method; and several three-way table (k-table) analysis methods. Graphical display techniques include an automatic collection of elementary graphics corresponding to groups of rows or to columns in the data table, thus providing a very efficient way of producing automatic k-table graphics and geographical mapping options. A dynamic graphic module allows interactive operations like searching, zooming, selection of points, and display of data values on factor maps. The user interface is simple and homogeneous among all the programs; this contributes to making the use of ADE-4 very easy for non-specialists in statistics, data analysis or computer science.

9.
Process monitoring in the presence of data correlation is one of the most discussed issues in the statistical process control literature of the past decade. However, retrospective analysis in the presence of data correlation with various common-cause sigma estimators has received little attention in the literature. Maragah et al. (1992), in an early paper on retrospective analysis in the presence of data correlation, address only a single common-cause sigma estimator. This paper studies the effect of data correlation on the retrospective X-chart with various common-cause sigma estimates in the stable period of an AR(1) process. The study is carried out with the aim of identifying standard deviation statistics that are robust to data correlation. The paper also discusses the robustness of common-cause sigma estimates for monitoring data following other time-series models, namely ARMA(1,1) and AR(p), and the bias characteristics of the robust standard deviation estimates for these models. It further studies the performance of the retrospective X-chart on forecast residuals from various forecasting methods for the AR(1) process. The studies were carried out by simulating the stable period of AR(1) and AR(2) processes and the stable and invertible period of the ARMA(1,1) process. The average number of false alarms is used as the measure of performance, and the results of the simulation studies are discussed.
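The flavour of the comparison is easy to reproduce (an illustrative sketch only: two common-cause sigma estimators, the average moving range MRbar/d2 and the overall sample standard deviation, on simulated in-control AR(1) data; the paper covers a wider set of estimators and models):

```python
import numpy as np

rng = np.random.default_rng(42)
phi, n, reps = 0.8, 200, 500
d2 = 1.128  # unbiasing constant for moving ranges of span 2

false_alarms = {"MRbar/d2": 0, "sample s": 0}
for _ in range(reps):
    x = np.empty(n)
    x[0] = rng.normal(scale=1 / np.sqrt(1 - phi**2))  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    sigma = {"MRbar/d2": np.abs(np.diff(x)).mean() / d2,
             "sample s": x.std(ddof=1)}
    for name, s in sigma.items():
        false_alarms[name] += np.sum(np.abs(x - x.mean()) > 3 * s)

for name, fa in false_alarms.items():
    print(name, fa / (reps * n))  # false alarms per in-control observation
# With positive autocorrelation the moving-range estimator underestimates the
# process sigma, inflating the false-alarm rate relative to the sample s.
```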

10.
The availability of next generation sequencing (NGS) technology in today's biomedical research has provided new opportunities for the scientific discovery of genetic information. High-throughput NGS technology, especially DNA-seq, is particularly useful in profiling a genome for the analysis of DNA copy number variants (CNVs). The read count (RC) data resulting from NGS technology are massive and information rich. How to exploit the RC data for accurate CNV detection has become a computational and statistical challenge. In this paper we provide a statistical online change-point method to help detect CNVs in sequencing RC data. The method searches online for change points (or breakpoints) under a Markov chain assumption on the breakpoint loci, using an iterative computing process within a Bayesian framework. We illustrate that an online change-point detection method is particularly suitable for identifying CNVs in RC data. The algorithm is applied to the publicly available NCI-H2347 lung cancer cell line sequencing reads data to locate the breakpoints. Extensive simulation studies have been carried out and the results show the good behaviour of the proposed algorithm. The algorithm is implemented in R and the code is available upon request.
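The authors' online algorithm (implemented in R) is not reproduced here; the sketch below only illustrates the core Bayesian computation for a single breakpoint, scoring each candidate locus by conjugate Poisson-Gamma marginal likelihoods of the two flanking segments (the Gamma(a, b) hyperparameters are placeholders):

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, a=1.0, b=0.1):
    """log marginal likelihood of a read-count segment under a Poisson model
    with a Gamma(a, b) prior (b is a rate) on the segment's intensity."""
    s, n = counts.sum(), len(counts)
    return (a * np.log(b) - gammaln(a) + gammaln(a + s)
            - (a + s) * np.log(b + n) - gammaln(counts + 1).sum())

def breakpoint_posterior(rc):
    """Posterior over a single breakpoint location in read-count vector rc:
    p(k) is proportional to m(rc[:k]) * m(rc[k:])."""
    rc = np.asarray(rc)
    lp = np.array([log_marginal(rc[:k]) + log_marginal(rc[k:])
                   for k in range(1, len(rc))])
    p = np.exp(lp - lp.max())
    return p / p.sum()
```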

11.
Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because an MNAR mechanism is not verifiable from the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data, and the discrepancy between the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR), and use a bootstrap-type method, which relies on heuristic input from the researcher, to test for discrepancy of the parameter estimates. Instead of using the bootstrap, the current article obtains confidence intervals for the parameter differences between the two samples based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids the convergence problems likely with bootstrap methods; it requires no heuristic input from the researcher and can readily be implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test missing at random in addition to MCAR. An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies are performed as well, using two methods of missing data generation, which show promise for the proposed sensitivity method. One method of missing data generation is also new and interesting in its own right.

12.
The frailties, representing extra variation due to unobserved measurements, are often assumed to be iid in shared frailty models. In medical applications, however, one may suspect that a data set violates the iid assumption. In this paper we investigate this conjecture through an analysis of the kidney infection data in McGilchrist and Aisbett (McGilchrist, C. A., Aisbett, C. W. (1991). Regression with frailty in survival analysis. Biometrics 47:461–466). As a test procedure, we consider the cusum of squares test, which is frequently used for monitoring a variance change in statistical models. Our result strongly supports the heterogeneity of the frailty distribution.
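The test statistic itself is compact (the generic Inclán-Tiao form of the cusum of squares is sketched below; its application to frailty models follows the paper, and the 5% critical value comes from the supremum of a Brownian bridge, roughly 1.36):

```python
import numpy as np

def cusum_of_squares(x):
    """Cusum-of-squares statistic max_k sqrt(n/2) |C_k - k/n|, where C_k is
    the cumulative proportion of the total sum of squares. Values above
    about 1.36 suggest a variance change at the 5% level."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    C = np.cumsum(x**2) / np.sum(x**2)
    k = np.arange(1, n + 1)
    return float(np.sqrt(n / 2) * np.max(np.abs(C - k / n)))
```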

13.
Structural equation models (SEMs) have been extensively used in behavioral, social, and psychological research to model relations between latent variables and observations. Most software packages for fitting SEMs rely on frequentist methods, and traditional models and software are not appropriate for the analysis of dependent observations such as time-series data. In this study, a structural equation model with a time-series feature is introduced. A Bayesian approach, aided by Markov chain Monte Carlo methods, is used to fit the model. Bayesian inference, as well as prediction with the proposed time-series structural equation model, can also reveal certain unobserved relationships among the observations. The approach is successfully applied to real Asian, American and European stock return data.

14.
Environmental issues have become a hot topic recently, especially those surrounding industrial outputs. Effluents, emissions, outflows, by-products, waste materials, product de-commissioning, land reclamation and energy consumption are all the subject of monitoring, either under new legislation or through economic necessity. However, many types of environmental data are difficult to understand or measure because of their unusual distributions of values. Standard methods of monitoring these data types often fail or are unwieldy. The scarcity of events, small-volume measurements and the unusual time scales sometimes involved add to the complexity of the task. One recently developed monitoring technique is the Summed Rank Cusum (SRC), which applies non-parametric methods to a standard chart. The SRC can be used diagnostically, and this paper describes the application of this new tool to three data sets, each derived from a different problem area: measuring industrial effluent, assessing the levels of potentially harmful proteins produced by an industrial process, and industrial land reclamation in the face of harmful waste materials. The use of the SRC to spot change points in time retrospectively is described. The paper also shows the use of the SRC in significant-difference testing mode, applied via spreadsheets, with links to other similar methods in the literature and formulae describing the statistical nature of the transformation. These practical demonstrations illustrate that the graphical interpretation of the method helps considerably in practice when trying to find time-series change points. The charts are an effective graphical retrospective monitoring technique for non-normal data. The method is easy to apply and may help considerably with environmental data in the industrial setting when standard methods are not appropriate. Further work is continuing on the more theoretical aspects of the method.
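The paper gives the precise SRC recipe; the sketch below is only a plausible rank-based cusum in the same spirit (centred ranks are accumulated, so only the ordering of the observations matters):

```python
import numpy as np
from scipy.stats import rankdata

def rank_cusum(x):
    """Cusum of centred ranks: robust to heavy tails and unusual value
    distributions. A change point shows up as a pronounced peak or trough."""
    r = rankdata(x)                      # ties receive average ranks
    return np.cumsum(r - (len(x) + 1) / 2)

rng = np.random.default_rng(3)
x = np.r_[rng.exponential(1.0, 60), rng.exponential(3.0, 40)]  # shift at 60
path = rank_cusum(x)
print("suggested change point:", int(np.argmin(path)) + 1)
```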

15.
Statisticians often employ simultaneous confidence intervals to reduce the likelihood of their drawing false conclusions when they must make a number of comparisons. To do this properly, it is necessary to consider the family of comparisons over which simultaneous confidence must be assured. Sometimes it is not clear what family of comparisons is appropriate. We describe how computer software can monitor the types of contrasts a user examines, and select the smallest family of contrasts that is likely to be of interest. We also describe how to calculate simultaneous confidence intervals for these families using a hybrid of the Bonferroni and Scheffé methods. Our method is especially suitable for problems with discrete and continuous predictors.

16.
Existing statistical methods for the detection of space–time clusters of point events are retrospective, in that they are used to ascertain whether space–time clustering exists among a fixed number of past events. In contrast, prospective methods treat a series of observations sequentially, with the aim of detecting quickly any changes that occur in the series. In this paper, cumulative sum methods of monitoring are adapted for use with Knox's space–time statistic. The result is a procedure for the rapid detection of any emergent space–time interactions for a set of sequentially monitored point events. The approach relies on a 'local' Knox statistic that is useful in retrospective analyses to detect when and where space–time interaction occurs. The distribution of the local Knox statistic under the null hypothesis of no space–time interaction is derived. The retrospective local statistic and the prospective cumulative sum monitoring method are illustrated by using previously published data on Burkitt's lymphoma in Uganda.
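The Knox count at the heart of the procedure is simple (a minimal sketch; the local decomposition, its null distribution and the cusum monitoring scheme are not shown):

```python
import numpy as np

def knox(xy, t, ds, dt):
    """Knox space-time statistic: the number of event pairs that are within
    distance ds in space AND within dt in time. xy is an (n, 2) array of
    locations, t the event times."""
    xy, t = np.asarray(xy, float), np.asarray(t, float)
    d_space = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    d_time = np.abs(t[:, None] - t[None, :])
    close = (d_space <= ds) & (d_time <= dt)
    iu = np.triu_indices(len(t), k=1)    # count each unordered pair once
    return int(close[iu].sum())
```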

17.
The real-life environment is probabilistic by nature, and the ability to make decisions based on probabilities is crucial in the business world. It is common to have a set of data and to need the probability of taking a value greater than or less than a specific value. It is also common for companies to lack statistical software or a professional specialized in statistics. The purpose of this paper is to present a practical and simple method for calculating probabilities from normally or non-normally distributed data sets, illustrated with an application from the electronics industry. The method does not demand statistical knowledge from the user; there is no need for normality assumptions, goodness-of-fit tests or transformations. The proposed method is easy to implement and robust, and the experiments have evidenced its quality. The technique is validated on a large variety of instances and compared with the well-known Johnson system of distributions.
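As a baseline for comparison (this is the distribution-free empirical estimate, not the authors' technique, which is validated against the Johnson system):

```python
import numpy as np

def prob_greater(data, c):
    """Distribution-free estimate of P(X > c): the empirical proportion of
    observations above c. No normality assumption, goodness-of-fit test or
    transformation is required."""
    return float(np.mean(np.asarray(data, dtype=float) > c))
```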

18.
When recruitment into a clinical trial is limited due to rarity of the disease of interest, or when recruitment to the control arm is limited for ethical reasons (e.g., pediatric studies or an important unmet medical need), exploiting historical controls to augment the prospectively collected database can be an attractive option. Statistical methods for combining historical data with randomized data, while accounting for the incompatibility between the two, have recently been proposed and remain an active field of research. The current literature lacks both a rigorous comparison between methods and guidelines for their use in practice. In this paper, we compare the existing methods based on a confirmatory phase III study design exercise done for a new antibacterial therapy with a binary endpoint and a single historical dataset. A procedure to assess the relative performance of the different methods for borrowing information from historical control data is proposed, and practical questions related to the selection and implementation of methods are discussed. Based on our examination, we found that the methods have comparable performance, but we recommend the robust mixture prior for its ease of implementation.
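For a binary endpoint, the recommended robust mixture prior is straightforward with conjugate updating (a sketch under simple Beta components; the prior weight w, the historical component (a_h, b_h) and the vague Beta(1, 1) component are design choices):

```python
import numpy as np
from scipy.special import betaln

def robust_mixture_posterior(s, n, a_h, b_h, w=0.5, a_v=1.0, b_v=1.0):
    """Posterior of the control response rate under the robust mixture prior
    w * Beta(a_h, b_h) + (1 - w) * Beta(a_v, b_v), after observing s
    responses in n current control patients. Returns the posterior mixture
    weights and the updated Beta components."""
    def log_ml(a, b):   # log marginal likelihood of s | n under Beta(a, b)
        return betaln(a + s, b + n - s) - betaln(a, b)
    lw = np.log([w, 1 - w]) + np.array([log_ml(a_h, b_h), log_ml(a_v, b_v)])
    post_w = np.exp(lw - lw.max())
    post_w /= post_w.sum()
    return post_w, [(a_h + s, b_h + n - s), (a_v + s, b_v + n - s)]
```

If the current data conflict with the historical controls, the marginal likelihood of the vague component dominates and borrowing is automatically discounted.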

19.
Covariance changes detection in multivariate time series
This paper studies the detection of step changes in the variances and in the correlation structure of the components of a vector of time series. Two procedures, based on the likelihood ratio test (LRT) statistic and on a cumulative sum (cusum) statistic, are considered and compared in a simulation study. We conclude that for a single covariance change the cusum procedure is more powerful in small and medium samples, whereas the likelihood ratio test is more powerful in large samples. For several covariance changes, however, the cusum procedure works clearly better. The procedures are illustrated in two real data examples.
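A minimal version of the LRT scan for a single covariance break (asymptotic form with no small-sample correction; the cusum competitor and the critical values used in the paper are not reproduced):

```python
import numpy as np

def lrt_cov_scan(X, min_seg=10):
    """Scan candidate breakpoints k in the (n, p) series X and return
    (best k, statistic): -2 log Lambda = n log|S| - k log|S1| - (n-k) log|S2|
    with MLE covariance matrices of the full series and the two segments."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    logdet = lambda A: np.linalg.slogdet(np.cov(A, rowvar=False, bias=True))[1]
    full = n * logdet(X)
    stat, k_hat = max(
        (full - k * logdet(X[:k]) - (n - k) * logdet(X[k:]), k)
        for k in range(min_seg, n - min_seg))
    return k_hat, stat
```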
