We propose a new regression-based filter for extracting signals online from multivariate high frequency time series. It separates relevant signals of several variables from noise and (multivariate) outliers.

Unlike parallel univariate filters, the new procedure takes into account the local covariance structure between the single time series components. It is based on high-breakdown estimates, which makes it robust against (patches of) outliers in one or several of the components as well as against outliers with respect to the multivariate covariance structure. Moreover, the trade-off problem between bias and variance for the optimal choice of the window width is approached by choosing the size of the window adaptively, depending on the current data situation.

Furthermore, we present an advanced version of our filtering algorithm that replaces missing observations in real time. Thus, the new procedure can be applied in online-monitoring practice. Applications to physiological time series from intensive care show the practical effect of the proposed filtering technique.

This article considers the analysis of complex monitored health data in which, in addition to a set of covariates, one or several signals often reflect the current health status, which can be represented by a finite number of states. In particular, we consider a novel application of a non-parametric state intensity regression method in order to study time-dependent effects of covariates on the state transition intensities. The method can handle baseline, time-varying as well as dynamic covariates. Because of its non-parametric nature, the method can handle different data types and challenges under minimal assumptions. If the signal that reflects the current health status is continuous, we propose the application of a weighted median and a hysteresis filter as data pre-processing steps in order to facilitate robust analysis. In intensity regression, covariates can be aggregated by a suitable functional form over a time history window. We propose to study the estimated cumulative regression parameters for different choices of the time history window in order to investigate short- and long-term effects of the given covariates. The proposed framework is discussed and applied to resuscitation data of newborns collected in Tanzania.
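
The hysteresis pre-processing step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hysteresis_filter`, the two-state coding (0/1), and the thresholds `low` and `high` are assumptions made for illustration, and the weighted median step is omitted.

```python
from typing import Iterable, List


def hysteresis_filter(values: Iterable[float], low: float, high: float,
                      initial_state: int = 0) -> List[int]:
    """Map a continuous signal to a binary state using two thresholds.

    The state switches to 1 only when the signal rises above `high`
    and back to 0 only when it falls below `low`. Values between the
    two thresholds leave the state unchanged, which suppresses rapid
    toggling caused by noise near a single cut-off.
    """
    state = initial_state
    states = []
    for v in values:
        if state == 0 and v > high:
            state = 1
        elif state == 1 and v < low:
            state = 0
        states.append(state)
    return states


if __name__ == "__main__":
    signal = [0.1, 0.6, 0.45, 0.55, 0.2, 0.45]
    print(hysteresis_filter(signal, low=0.3, high=0.5))
```

The gap between `low` and `high` controls how much fluctuation is tolerated before a state change is recorded; a wider gap yields fewer, more stable transitions.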

We propose tests for hypotheses on the parameters of the deterministic trend function of a univariate time series. The tests do not require knowledge of the form of serial correlation in the data, and they are robust to strong serial correlation. The data can contain a unit root and the tests still have the correct size asymptotically. The tests that we analyze are standard heteroscedasticity autocorrelation robust tests based on nonparametric kernel variance estimators. We analyze these tests using the fixed-b asymptotic framework recently proposed by Kiefer and Vogelsang. This analysis allows us to study the power properties of the tests with regard to bandwidth and kernel choices. Our analysis shows that among popular kernels, specific kernel and bandwidth choices deliver tests with maximal power within a specific class of tests. Based on the theoretical results, we propose a data-dependent bandwidth rule that maximizes integrated power. Our recommended test is shown to have power that dominates a related test proposed by Vogelsang. We apply the recommended test to the logarithm of a net barter terms of trade series and we find that this series has a statistically significant negative slope. This finding is consistent with the well-known Prebisch–Singer hypothesis.

Cumulative sum (cusum) methods can be used for monitoring processes and for retrospective (historical) data analysis. Most software only provides the former. The comment by Williamson that retrospective cusum analysis is a neglected area is still true. Though not in vogue, retrospective cusum analysis is useful for investigations such as benchmarking of processes, identifying causes of process decay, selecting reference data sets for typicality studies, and reporting of historical data. Even those texts that cover retrospective analyses usually ignore the question of identifying multiple points of change (breakpoints), and present essentially manual methods for assessing single breakpoints. Most users of statistical methods want software solutions that are easy to use and require little user intervention or interpretation. Direct implementation of the manual methods does not give the user a robust solution. Problems are illustrated. Attempts to use monitoring cusums in retrospective analysis can also lead to errors. A practical recursive method is presented for breakpoint identification and significance assessment, which can be automated. Copyright © 2002 John Wiley & Sons, Ltd.
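
As a rough illustration of the retrospective idea, the sketch below computes the cusum of deviations from the overall mean and takes the position of its largest absolute value as a single candidate breakpoint. This is the textbook single-change step only, not the recursive method or the significance assessment described in the abstract; the function names `cusum_path` and `candidate_breakpoint` are hypothetical.

```python
from typing import List, Sequence


def cusum_path(x: Sequence[float]) -> List[float]:
    """Cumulative sum of deviations from the overall mean."""
    mean = sum(x) / len(x)
    total = 0.0
    path = []
    for value in x:
        total += value - mean
        path.append(total)
    return path


def candidate_breakpoint(x: Sequence[float]) -> int:
    """Return the 0-based index of the last observation before the
    candidate change, i.e. the position where |cusum| is largest.

    The final point is excluded because the cusum of deviations from
    the mean always returns to (numerically) zero there.
    """
    path = cusum_path(x)
    return max(range(len(path) - 1), key=lambda i: abs(path[i]))


if __name__ == "__main__":
    series = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(candidate_breakpoint(series))  # mean shifts after index 4
```

To look for multiple breakpoints, one could apply the same step to the segments on either side of an accepted breakpoint, but a stopping rule based on a significance test would be needed and is not shown here.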

Early phase 2 tuberculosis (TB) trials are conducted to characterize the early bactericidal activity (EBA) of anti‐TB drugs. The EBA of anti‐TB drugs has conventionally been calculated as the rate of decline in colony forming unit (CFU) count during the first 14 days of treatment. The measurement of CFU count, however, is expensive and prone to contamination. As an alternative to CFU count, time to positivity (TTP), which is a potential biomarker for long‐term efficacy of anti‐TB drugs, can be used to characterize EBA. The current Bayesian nonlinear mixed‐effects (NLME) regression model for TTP data, however, lacks robustness to gross outliers that often are present in the data. The conventional way of handling such outliers involves their identification by visual inspection and subsequent exclusion from the analysis. However, this process can be questioned because of its subjective nature. For this reason, we fitted robust versions of the Bayesian nonlinear mixed‐effects regression model to a wide range of TTP datasets. The performance of the explored models was assessed through model comparison statistics and a simulation study. We conclude that fitting a robust model to TTP data obviates the need for explicit identification and subsequent “deletion” of outliers but ensures that gross outliers exert no undue influence on model fits. We recommend that the current practice of fitting conventional normal theory models be abandoned in favor of fitting robust models to TTP data.

The trend test is often used for the analysis of 2×K ordered categorical data, in which K pre-specified increasing scores are used. There have been discussions on how to assign these scores and on how the choice of scores affects the outcome. The scores are often assigned based on the data-generating model. When this model is unknown, using the trend test is not robust. We discuss the weighted average of a trend test over all scientifically plausible choices of scores or models. This approach is more computationally efficient than a commonly used robust test MAX when K is large. Our discussion is for any ordered 2×K table, but simulation and applications to real data are focused on case-control genetic association studies. Although there is no single test optimal for all choices of scores, our numerical results show that some score averaging tests can achieve the performance of MAX.
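
For orientation, the sketch below computes the standard Cochran-Armitage trend statistic for a 2×K case-control table under a given score vector. It shows only the single-score test whose dependence on the scores motivates the abstract; the averaging weights and the MAX procedure of the paper are not reproduced. The function name `trend_test_z` and the example counts are assumptions for illustration.

```python
import math
from typing import Sequence


def trend_test_z(cases: Sequence[int], controls: Sequence[int],
                 scores: Sequence[float]) -> float:
    """Cochran-Armitage trend statistic Z for a 2xK table.

    cases[i] and controls[i] are the counts in ordered category i,
    and scores[i] is the score assigned to that category. Under the
    null hypothesis of no association, Z is approximately N(0, 1).
    """
    R = sum(cases)        # total number of cases
    S = sum(controls)     # total number of controls
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]  # column totals

    T = sum(x * (r * S - s * R)
            for x, r, s in zip(scores, cases, controls))
    var = (R * S / N) * (
        N * sum(x * x * m for x, m in zip(scores, n))
        - sum(x * m for x, m in zip(scores, n)) ** 2
    )
    return T / math.sqrt(var)


if __name__ == "__main__":
    cases, controls = [10, 20, 30], [30, 20, 10]
    # Three score choices often used for genotype data
    # (recessive, additive, dominant); Z changes with the scores.
    for scores in ([0, 0, 1], [0, 0.5, 1], [0, 1, 1]):
        print(scores, trend_test_z(cases, controls, scores))
```

The statistic is unchanged by a positive affine rescaling of the scores, so only their relative spacing matters; for K = 2 its square coincides with the Pearson chi-square statistic of the 2×2 table.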

Time series arise often in environmental monitoring settings, which typically involve measuring processes repeatedly over time. In many such applications, observations are irregularly spaced and, additionally, are not distributed normally. An example is water monitoring data collected in Boston Harbor by the Massachusetts Water Resources Authority. We describe a simple robust approach for estimating regression parameters and a first-order autocorrelation parameter in a time series where the observations are irregularly spaced. Estimates are obtained from an estimating equation that is constructed as a linear combination of estimated innovation errors, suitably made robust by symmetric and possibly bounded functions. Under an assumption of data missing completely at random and mild regularity conditions, the proposed estimating equation yields consistent and asymptotically normal estimates. Simulations suggest that our estimator performs well in moderate sample sizes. We demonstrate our method on Secchi depth data collected from Boston Harbor.

Standard unit-root and cointegration tests are sensitive to atypical events such as outliers and structural breaks. In this article, we use outlier-robust estimation techniques to examine the impact of these events on cointegration analysis. Our outlier-robust cointegration test provides a new diagnostic tool for signaling when standard cointegration results might be driven by a few aberrant observations. A main feature of our approach is that the proposed robust estimator can be used to compute weights for all observations, which in turn can be used to identify the approximate dates of atypical events. We evaluate our method using simulated data and a Monte Carlo experiment. We also present an empirical example showing the usefulness of the proposed analysis.

This paper discusses the contribution of Cerioli et al. (Stat Methods Appl, 2018), where robust monitoring based on high breakdown point estimators is proposed for multivariate data. The results follow years of development in robust diagnostic techniques. We discuss the issues of extending data monitoring to other models with complex structure, e.g. factor analysis, mixed linear models for which S and MM-estimators exist or deviating data cells. We emphasise the importance of robust testing that is often overlooked despite robust tests being readily available once S and MM-estimators have been defined. We mention open questions like out-of-sample inference or big data issues that would benefit from monitoring.

We introduce and study a class of rank-based estimators for the linear model. The estimate may be roughly described as being calculated in the same manner as a generalized M-estimate, but with the residual being replaced by a function of its signed rank. The influence function can thus be bounded, both as a function of the residual and as a function of the carriers. Subject to such a bound, the efficiency at a particular model distribution can be optimized by appropriate choices of rank scores and carrier weights. Such choices are given, with respect to a variety of optimality criteria. We compare our estimates with several others, in a Monte Carlo study and on a real data set from the literature.

Traditional statistical modeling of continuous outcome variables relies heavily on the assumption of a normal distribution. However, in some applications, such as analysis of microRNA (miRNA) data, normality may not hold. Skewed distributions play an important role in such studies and might lead to robust results in the presence of extreme outliers. We apply a skew-normal (SN) distribution, which is indexed by three parameters (location, scale and shape), in the context of miRNA studies. We developed a test statistic for comparing the means of two conditions, replacing the normal assumption with an SN distribution. We compared the performance of the statistic with other Wald-type statistics through simulations. Two real miRNA datasets are analyzed to illustrate the methods. Our simulation findings showed that the use of an SN distribution can result in improved identification of differentially expressed miRNAs, especially with markedly skewed data and when the two groups have different variances. It also appeared that the statistic with the SN assumption performs comparably with other Wald-type statistics irrespective of the sample size or distribution. Moreover, the real dataset analyses suggest that the statistic with the SN assumption can be used effectively for identification of important miRNAs. Overall, the statistic with the SN distribution is useful when data are asymmetric and when the samples have different variances for the two groups.

Although interest in monitoring and diagnosing multistage processes has increased in recent years, this issue has been largely overlooked in cascade processes where the quality characteristics are liable to outliers. The presence of outliers has a debilitating effect on the detection ability of the traditional cause-selecting control charts and thus makes them unreliable. Therefore, the purpose of this article is to provide a robust approach to quality control in multistage processes. It is assumed that the process consists of two stages and that the historical data for both dependent quality characteristics contain outliers. A robust fitting procedure based on a compound estimator is employed to build the relationship between the quality variables, and a robust monitoring approach is presented. Subsequently, simulation studies are undertaken to assess the performance of the robust scheme by means of the average run length (ARL) criterion. It is shown that the proposed robust procedure detects diverse types of shift much faster.

Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

We introduce a general class of continuous univariate distributions with positive support obtained by transforming the class of two-piece distributions. We show that this class of distributions is very flexible, easy to implement, and contains members that can capture different tail behaviours and shapes, producing also a variety of hazard functions. The proposed distributions represent a flexible alternative to the classical choices such as the log-normal, Gamma, and Weibull distributions. We investigate empirically the inferential properties of the proposed models through an extensive simulation study. We present some applications using real data in the contexts of time-to-event and accelerated failure time models. In the second kind of applications, we explore the use of these models in the estimation of the distribution of the individual remaining life.

In this article, a new model-free feature screening method based on the probability density (mass) function distance (PDFD) correlation is presented for ultrahigh-dimensional data analysis. We improve the fused-Kolmogorov filter (F-KOL) screening procedure by using probability density functions. The proposed method is also fully nonparametric and can be applied to more general types of predictors and responses, including discrete and continuous random variables. Kernel density estimation and numerical integration are applied to obtain the proposed estimator. The results of simulation studies indicate that the fused-PDFD performs better than other existing screening methods, such as the F-KOL filter, sure-independent screening (SIS), sure independent ranking and screening (SIRS), distance correlation sure-independent screening (DCSIS) and robust ranking correlation screening (RRCS). Finally, we demonstrate the validity of fused-PDFD by a real data example.

We investigate and develop methods for structural break detection, considering time series from thermal spraying process monitoring. Since engineers induce technical malfunctions during the processes, the time series exhibit structural breaks at known time points, giving us valuable information to conduct the investigations. First, we consider a recently developed robust online (also real-time) filtering (i.e. smoothing) procedure that comprises a test for local linearity. This test rejects when jumps and trend changes are present, so that it can also be useful to detect such structural breaks online. Second, based on the filtering procedure we develop a robust method for the online detection of ongoing trends. We investigate these two methods as to the online detection of structural breaks by simulations and applications to the time series from the manipulated spraying processes. Third, we consider a recently developed fluctuation test for constant variances that can be applied offline, i.e. after the whole time series has been observed, to control the spraying results. Since this test is not reliable when jumps are present in the time series, we suggest data transformation based on filtering and demonstrate that this transformation makes the test applicable.


This paper deals with Bayes, robust Bayes, and minimax predictions in a subfamily of scale parameters under an asymmetric precautionary loss function. In Bayesian statistical inference, the goal is to obtain optimal rules under a specified loss function and an explicit prior distribution over the parameter space. In practice, however, we may not be able to specify the prior completely, or, when a problem must be solved by two statisticians, they may agree on the choice of the prior but not on the values of the hyperparameters. A common approach to prior uncertainty in Bayesian analysis is to choose a class of prior distributions and compute some functional quantity. This is known as robust Bayesian analysis, which provides a way to express prior knowledge in terms of a class of priors Γ as global protection against bad choices of hyperparameters. Under a scale invariant precautionary loss function, we deal with robust Bayes predictions of Y based on X. We carried out a simulation study and a real data analysis to illustrate the practical utility of the prediction procedure.

We study the problem of merging homogeneous groups of pre-classified observations from a robust perspective motivated by the anti-fraud analysis of international trade data. This problem may be seen as a clustering task which exploits preliminary information on the potential clusters, available in the form of group-wise linear regressions. Robustness is then needed because of the sensitivity of likelihood-based regression methods to deviations from the postulated model. Through simulations run under different contamination scenarios, we assess the impact of outliers both on group-wise regression fitting and on the quality of the final clusters. We also compare alternative robust methods that can be adopted to detect the outliers and thus to clean the data. One major conclusion of our study is that the use of robust procedures for preliminary outlier detection is generally recommended, except perhaps when contamination is weak and the identification of cluster labels is more important than the estimation of group-specific population parameters. We also apply the methodology to find homogeneous groups of transactions in one empirical example that illustrates our motivating anti-fraud framework.

We develop strategies for Bayesian modelling as well as model comparison, averaging and selection for compartmental models with particular emphasis on those that occur in the analysis of positron emission tomography (PET) data. Both modelling and computational issues are considered. Biophysically inspired informative priors are developed for the problem at hand, and by comparison with default vague priors it is shown that the proposed modelling is not overly sensitive to prior specification. It is also shown that an additive normal error structure does not describe measured PET data well, despite being very widely used, and that within a simple Bayesian framework simultaneous parameter estimation and model comparison can be performed with a more general noise model. The proposed approach is compared with standard techniques using both simulated and real data. In addition to good, robust estimation performance, the proposed technique provides, automatically, a characterisation of the uncertainty in the resulting estimates which can be considerable in applications such as PET.

Many methods have been developed for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be groups in the data. We propose a method of simultaneously determining groups (as in cluster analysis) and detecting outliers, which are points that are distant from every group. Our method is an adaptation of the BACON algorithm proposed by Billor, Hadi and Velleman for the robust detection of multiple outliers in a single group of multivariate data. There are two versions of our method, depending on whether or not the groups can be assumed to have equal covariance matrices. The effectiveness of the method is illustrated by its application to two real data sets and further shown by a simulation study for different sample sizes and dimensions for 2 and 3 groups, with and without planted outliers in the data. When the number of groups is not known in advance, the algorithm could be used as a robust method of cluster analysis, by running it for various numbers of groups and choosing the best solution.
