Similar documents
20 similar documents found (search time: 625 ms)
1.
In this article, we present a compressive sensing based framework for generalized linear model regression that employs a two-component noise model and convex optimization techniques to simultaneously detect outliers and determine optimally sparse representations of noisy data from arbitrary sets of basis functions. First, we extend our model with model order reduction capabilities that can uncover inherent sparsity in regression coefficients and achieve simple, superior fits. Second, we use the mixed ℓ2/ℓ1 norm to develop another model that can efficiently uncover block-sparsity in regression coefficients. By performing model order reduction over all independent variables and basis functions, our algorithms successfully deemphasize the effect of independent variables that become uncorrelated with dependent variables. This desirable property has various applications in real-time anomaly detection, such as faulty sensor detection and sensor jamming in wireless sensor networks. After developing our framework and inheriting a stable recovery theorem from compressive sensing theory, we present two simulation studies on sparse or block-sparse problems that demonstrate the superior performance of our algorithms with respect to (1) classic outlier-invariant regression techniques such as least absolute value and iteratively reweighted least squares and (2) classic sparse-regularized regression techniques such as the LASSO.
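The two-component noise idea in this abstract, a dense small-noise term plus a sparse gross-error (outlier) term, can be sketched as an ℓ1-penalised regression on an augmented design. The following is a minimal illustration under that assumption, not the authors' algorithm: a plain ISTA solver for min ½‖y − Xβ − o‖² + λβ‖β‖₁ + λo‖o‖₁, where the sparse vector o absorbs outliers.

```python
import numpy as np

def soft(z, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_outlier_regression(X, y, lam_beta=0.1, lam_out=1.0, n_iter=20000):
    """ISTA for min 0.5||y - X b - o||^2 + lam_beta*||b||_1 + lam_out*||o||_1.
    Returns (b, o): sparse coefficients and per-observation outlier estimates."""
    n, p = X.shape
    A = np.hstack([X, np.eye(n)])                  # augmented design [X | I]
    w = np.concatenate([np.full(p, lam_beta), np.full(n, lam_out)])
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    z = np.zeros(p + n)
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)                   # gradient of the smooth part
        z = soft(z - grad / L, w / L)              # proximal gradient step
    return z[:p], z[p:]
```

Observations with large entries in the returned o are flagged as outliers, while b stays sparse.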

2.
Sequential multi-chart detection procedures for detecting changes in multichannel sensor systems are developed. In the case of complete information on the pre-change and post-change distributions, the detection algorithm is a likelihood ratio-based multichannel generalization of Page's cumulative sum (CUSUM) test, applied to general stochastic models that may include correlated and nonstationary observations. There are many potential application areas where it is necessary to consider multichannel generalizations and general statistical models. In this paper, our main motivation is network security: rapid anomaly detection for the early detection of attacks on computer networks that lead to changes in network traffic. This application also motivates the development of a nonparametric multichannel detection test that does not require exact pre-change (legitimate) and post-change (attack) traffic models. The proposed nonparametric method can be effectively applied to detect a wide variety of attacks, such as denial-of-service attacks, worm-based attacks, port scanning, and man-in-the-middle attacks. In addition, we propose a multichannel CUSUM procedure based on binary quantized data; this procedure turns out to be more efficient than the previous two algorithms in certain scenarios. All proposed detection algorithms are based on change-point detection theory. They threshold test statistics to achieve a fixed rate of false alarms, while allowing changes in statistical models to be detected "as soon as possible". Theoretical frameworks for the performance analysis of the detection procedures, as well as results of Monte Carlo simulations for a Poisson example and results of detecting real flooding attacks, are presented.
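Page's CUSUM recursion, the building block generalised here to multiple channels, is simple to state. Below is a hedged sketch, not the paper's exact procedure: each channel maintains its own CUSUM of log-likelihood ratios, and an alarm is raised when the maximum over channels crosses a threshold. The example likelihood ratio assumes a Gaussian mean shift, one of many possible pre/post-change models.

```python
import numpy as np

def multichannel_cusum(x, llr, threshold):
    """Run one CUSUM statistic per channel and alarm when the maximum
    over channels crosses `threshold`.
    x: (T, K) array of observations over T times and K channels.
    llr: maps a length-K observation vector to per-channel log-likelihood
    ratios of the post-change vs pre-change density.
    Returns (alarm_time, channel), or (None, None) if no alarm."""
    T, K = x.shape
    W = np.zeros(K)                              # one CUSUM statistic per channel
    for t in range(T):
        W = np.maximum(0.0, W + llr(x[t]))       # Page's recursion
        if W.max() >= threshold:
            return t, int(W.argmax())
    return None, None
```

For a pre-change N(0, 1) and post-change N(mu, 1) model, the per-observation log-likelihood ratio is `mu * v - mu**2 / 2`.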

3.
In this paper, we propose a new nonparametric simultaneous test for dual alternatives. Simultaneous tests for dual alternatives are used for detecting patterns in arsenic contamination levels in ground water. We consider two possible patterns, namely a monotone shift and an umbrella-type location alternative, as the dual alternatives. Pattern recognition problems of this nature are addressed in Bandyopadhyay et al. [5], extending the idea of multiple hypothesis testing as in Benjamini and Hochberg [6]. In the present context, we develop an alternative approach based on contrasts that helps us detect the underlying patterns much more efficiently. We illustrate the new methodology through a motivating example related to the highly sensitive issue of arsenic contamination in ground water. We provide some Monte Carlo studies of the proposed technique and give a comparative study of the different detection procedures. We also obtain some related asymptotic results.

4.
Change point problems are considered where at some unobservable time the intensity of a point process (T_n), n ∈ ℕ, has a jump. For a given reward functional, we detect the change point optimally under different information schemes, which differ in the information available. We consider three information levels, namely sequential observation of (T_n), ex post decision after observing the point process up to a fixed time t*, and a combination of both observation schemes. In all of these cases, the detection problem is viewed as an optimal stopping problem, which can be solved by deriving a semimartingale representation of the gain process and applying tools from filtering theory.

5.
It is commonly required to detect change points in sequences of random variables. In the most difficult setting of this problem, change detection must be performed sequentially, with new observations constantly being received over time. Further, the parameters of both the pre- and post-change distributions may be unknown. In Hawkins and Zamba (Technometrics 47(2):164–173, 2005), the sequential generalised likelihood ratio test was introduced for detecting changes in this context, under the assumption that the observations follow a Gaussian distribution. However, we show that the asymptotic approximation used in their test statistic leads to it being conservative even when a large number of observations is available. We propose an improved procedure which is more efficient, in the sense of detecting changes faster, in all situations. We also show that similar issues arise in other parametric change detection contexts, which we illustrate by introducing a novel monitoring procedure for sequences of exponentially distributed random variables, an important topic in time-to-failure modelling.
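A fixed-sample version of the generalised likelihood ratio idea for a single Gaussian mean change with unknown common variance can be sketched as follows. This is an illustrative simplification, not the sequential procedure of Hawkins and Zamba: it scans all split points and compares within-segment to pooled residual sums of squares.

```python
import numpy as np

def glr_mean_change(x):
    """Maximise over split points k the GLR statistic (n/2) * log(RSS0 / RSS1(k))
    for a single mean change at k with unknown common variance.
    Returns (best split index, statistic value)."""
    x = np.asarray(x, float)
    n = len(x)
    rss0 = np.sum((x - x.mean())**2)             # no-change residual sum of squares
    best_k, best_stat = None, -np.inf
    for k in range(2, n - 1):                    # keep both segments non-trivial
        a, b = x[:k], x[k:]
        rss1 = np.sum((a - a.mean())**2) + np.sum((b - b.mean())**2)
        stat = 0.5 * n * np.log(rss0 / rss1)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_k, best_stat
```

A large statistic indicates a mean change near the returned split; a sequential monitoring scheme would recompute this as data arrive and compare against a calibrated boundary.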

6.
Site occupancy, as estimated by the probability of presence, is used for monitoring species populations. However, the detection of species at individual sites is often subject to error. To estimate occupancy accurately, we must simultaneously account for imperfect detectability by estimating the probability of detection. The difficulty in estimating occupancy arises from not knowing whether a nondetection at an occupied site is due to imperfect detectability (a sampling zero) or to the site being unoccupied (a fixed zero). We evaluated the performance of the basic, normal approximation, studentised, and percentile methods for approximating confidence limits for the occupancy and detection of species. Using coverage and average interval width, we demonstrated that the studentised estimator is generally superior to the others, except when a small sample of sites is selected. In that circumstance, and when calculating limits for detection, no estimator produced reliable results. The experimental factors we considered include: (i) the number of sites; (ii) the number of survey occasions; (iii) the probabilities of presence (occupancy) and detection; and (iv) overdispersion in the capture matrix. Similar conclusions were reached for both the simulation studies and a case study. Overall, estimation near the boundaries of the probabilities of occupancy and detectability was difficult.
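The sampling-zero versus fixed-zero ambiguity described above is what the standard zero-inflated binomial occupancy likelihood resolves. A minimal maximum-likelihood sketch, assuming constant occupancy ψ and detection probability p across all sites (an assumption the abstract does not state):

```python
import numpy as np
from math import comb
from scipy.optimize import minimize

def occupancy_mle(d, K):
    """MLE of (psi, p) in the zero-inflated binomial occupancy model:
    each site is occupied with probability psi; an occupied site yields
    d ~ Binomial(K, p) detections over K surveys, an unoccupied site yields 0.
    A zero detection history can arise from either state."""
    d = np.asarray(d, float)
    C = np.array([comb(K, int(k)) for k in d])        # binomial coefficients
    def nll(theta):
        psi, p = 1.0 / (1.0 + np.exp(-np.asarray(theta)))  # logit scale keeps (0,1)
        lik = psi * C * p**d * (1 - p)**(K - d) + (1 - psi) * (d == 0)
        return -np.sum(np.log(lik))
    res = minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead")
    return 1.0 / (1.0 + np.exp(-res.x))               # back-transform to (psi, p)
```

The logit parameterisation is one convenient way to keep both probabilities in (0, 1) during optimisation; bootstrap resampling of sites around this MLE is the kind of procedure whose interval methods the abstract compares.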

7.
The detection of (structural) breaks, the so-called change point problem, has drawn increasing attention in theoretical, applied economic, and financial fields. Much of the existing research concentrates on the detection of change points and the asymptotic properties of their estimators in panels when N, the number of panels, as well as T, the number of observations in each panel, are large. In this paper we pursue a different approach: we consider the asymptotic properties when N→∞ while keeping T fixed. This situation is typical of large (firm-level) data sets containing financial information about an immense number of firms/stocks across a limited number of years/quarters/months. We propose a general approach for testing for break(s) in this setup. In particular, we obtain the asymptotic behavior of the test statistics. We also propose a wild bootstrap procedure that can be used to generate the critical values of the test statistics. The theoretical approach is supplemented by numerous simulations and by an empirical illustration. We demonstrate that the testing procedure works well in the framework of the four-factor CAPM. In particular, we estimate the breaks in the monthly returns of US mutual funds during the period January 2006 to February 2010, which covers the subprime crisis.

8.
Spatial outliers are spatially referenced objects whose non-spatial attribute values are significantly different from the corresponding values in their spatial neighborhoods. In other words, a spatial outlier is a local instability, an extreme observation that deviates significantly within its spatial neighborhood but not necessarily in the entire dataset. In this article, we propose a novel spatial outlier detection algorithm, the location quotient (LQ), for multiple-attribute spatial datasets, and compare its performance with the well-known mean and median algorithms for multiple-attribute spatial datasets from the literature. In particular, we applied the mean, median, and LQ algorithms to a real dataset and to simulated spatial datasets of 13 different sizes to compare their performance. In addition, we calculated area under the curve values in all cases, which show that the proposed algorithm is more powerful than the mean and median algorithms in almost all the cases considered; receiver operating characteristic curves are also plotted for some cases.

9.
In case–control studies, the Cochran–Armitage trend test is powerful for detecting an association between a genetic risk marker and a disease of interest. To apply this test, a score must be assigned to each genotype based on the genetic model. When the underlying genetic model is unknown, the trend test statistic is quite sensitive to the choice of score. In this paper, we study the asymptotic properties of the robust suptest statistic, defined as the supremum of the Cochran–Armitage trend test over all scores between 0 and 1. Through numerical studies we show that the small- to moderate-sample performance of the suptest appears reasonable in terms of type I error control, and we compare the empirical power of the suptest to that of three individual Cochran–Armitage trend tests and the maximum of the three. The suptest is applied to rheumatoid arthritis data from a genome-wide association study.
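For reference, the Cochran–Armitage trend statistic for a 2 × k case–control table with genotype scores x has a simple closed form; the suptest described above maximises this statistic over the score choice. A sketch of the single-score statistic, using the standard formula rather than code from the paper:

```python
import numpy as np

def ca_trend_z(cases, controls, x):
    """Cochran-Armitage trend statistic for a 2 x k case-control table.
    cases, controls: counts per genotype category; x: genotype scores.
    Returns Z, approximately N(0,1) under the no-trend null."""
    cases, controls, x = map(np.asarray, (cases, controls, x))
    n = cases + controls                       # per-genotype totals
    N, R = n.sum(), cases.sum()                # grand total, total cases
    U = x @ cases - R / N * (x @ n)            # score statistic
    var = R * (N - R) / N**2 * ((n * x**2).sum() - (x @ n)**2 / N)
    return U / np.sqrt(var)
```

With scores (0, s, 1), s = 0, 0.5, 1 correspond to recessive, additive, and dominant models; the suptest takes the supremum of |Z| over s in [0, 1].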

10.
Change-point detection has regained much attention recently for analyzing array or sequencing data for copy number variation (CNV) detection. In such applications, the true signals are typically very short and buried in a long data sequence, which makes it challenging to identify the variations efficiently and accurately. In this article, we propose a new change-point detection method, a backward procedure, which is not only fast and simple enough to exploit high-dimensional data but also performs very well for detecting short signals. Although motivated by CNV detection, the backward procedure is generally applicable to assorted change-point problems that arise in a variety of scientific applications. It is illustrated on both simulated and real CNV data that backward detection has clear advantages over competing methods, especially when the true signal is short. The Canadian Journal of Statistics 48: 366–385; 2020 © 2020 Statistical Society of Canada

11.
In this paper, we develop a monitoring procedure for the early detection of parameter changes in random coefficient autoregressive models. It is shown that the stopping rule signaling a parameter change satisfies the desired asymptotic property, as seen in Lee, Lee, and Na (submitted for publication). Simulation results are provided for illustration.

12.
Observations collected over time are often autocorrelated rather than independent, and sometimes include observations below or above detection limits (i.e. censored values reported as less or more than a level of detection) and/or missing data. Practitioners commonly discard censored cases or replace these observations with some function of the limit of detection, which often results in biased estimates. Moreover, parameter estimation can be greatly affected by the presence of influential observations in the data. In this paper we derive local influence diagnostic measures for censored regression models with autoregressive errors of order p (hereafter, AR(p)-CR models) on the basis of the Q-function under three useful perturbation schemes. To account for censoring in a likelihood-based estimation procedure for AR(p)-CR models, we use a stochastic approximation version of the expectation-maximisation algorithm. The accuracy of the local influence diagnostic measures in detecting influential observations is explored through empirical studies. The proposed methods are illustrated using data from a study of total phosphorus concentration that contain left-censored observations. These methods are implemented in the R package ARCensReg.

13.
Outlier detection has been used extensively in data analysis to detect anomalous observations in data. It has important applications, such as fraud detection and robust analysis, among others. In this paper, we propose a method for detecting multiple outliers in the linear functional relationship model for circular variables. Using the residual values of the Caires and Wyatt model, we apply a hierarchical clustering approach. With the use of a tree diagram, we illustrate the detection of outliers graphically. A Monte Carlo simulation study is done to verify the accuracy of the proposed method. Low probabilities of masking and swamping effects indicate the validity of the proposed approach. Applications to two real data sets are also given to show its practical applicability.
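The residual-clustering idea can be sketched with standard tools: single-linkage hierarchical clustering of residuals, cutting the dendrogram at a chosen height, and flagging everything outside the largest cluster. This is an illustrative stand-in on plain (non-circular) residuals, not the Caires and Wyatt circular-residual procedure itself:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_outliers(residuals, cut_height):
    """Flag outliers by single-linkage clustering of residuals.
    Observations that fall outside the largest cluster after cutting the
    dendrogram at `cut_height` are flagged. Returns a boolean mask."""
    r = np.asarray(residuals, float).reshape(-1, 1)
    Z = linkage(r, method="single")                    # build the tree diagram
    labels = fcluster(Z, t=cut_height, criterion="distance")
    sizes = np.bincount(labels)                        # cluster sizes (labels start at 1)
    main = sizes.argmax()                              # largest cluster = "clean" data
    return labels != main
```

The linkage matrix Z is exactly what a dendrogram plot would draw, mirroring the graphical detection described in the abstract.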

14.
Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that result from a shifted mean of the error process under the regression model. A sample influence function of autocorrelation-based diagnostic technique also exists for the detection of outliers that result from shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on the effects of model misspecification on outlier detection are also provided.

15.

Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or attribute a specific case weight to the entire case. In practice, particularly in high-dimensional data, the outlier will most likely not be outlying along all of its variables, but only along a subset of them. If so, the scientific question of why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness. Thereby, it helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably via the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well on both simulated data and real-life examples.


16.
The exponential-logarithmic distribution has a decreasing failure rate and various applications in fields such as biology and engineering. In this paper, we study a change-point problem for this distribution. A procedure based on the Schwarz information criterion is proposed to detect changes in the parameters of this distribution. Simulations are conducted to show the performance of the proposed procedure under different scenarios. Applications to two real data sets are provided to illustrate the detection procedure.

17.
In this paper we build on an approach proposed by Zou et al. (2014) for nonparametric changepoint detection. This approach defines the best segmentation of a data set as the one which minimises a penalised cost function, with the cost defined in terms of minus a nonparametric log-likelihood for the data within each segment. Minimising this cost function is possible using dynamic programming, but their algorithm has a computational cost that is cubic in the length of the data set. To speed up computation, Zou et al. (2014) resorted to a screening procedure, which means that the estimated segmentation is no longer guaranteed to be the global minimum of the cost function. We show that the screening procedure adversely affects the accuracy of the changepoint detection method, and show how a faster dynamic programming algorithm, pruned exact linear time (PELT) (Killick et al. 2012), can be used to find the optimal segmentation with a computational cost that can be close to linear in the amount of data. PELT requires a penalty to avoid under- or over-fitting the model, which can have a detrimental effect on the quality of the detected changepoints. To overcome this issue we use a relatively new method, changepoints over a range of penalties (Haynes et al. 2016), which finds all of the optimal segmentations for multiple penalty values over a continuous range. We apply our method to detect changes in heart rate during physical activity.
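The penalised-cost formulation is concrete enough to sketch. Below is the unpruned O(n²) dynamic programme (optimal partitioning) with a Gaussian within-segment sum-of-squares cost, chosen here for simplicity: PELT computes the same optimum faster by pruning candidate last-changepoints, and the nonparametric cost of Zou et al. would replace the Gaussian one.

```python
import numpy as np

def optimal_partitioning(y, beta):
    """Minimise sum of segment costs + beta per changepoint, by dynamic
    programming over all last-changepoint candidates (unpruned O(n^2)).
    Cost of a segment is its sum of squares around the segment mean.
    Returns the sorted list of changepoint locations."""
    y = np.asarray(y, float)
    n = len(y)
    S = np.concatenate([[0.0], np.cumsum(y)])          # prefix sums
    S2 = np.concatenate([[0.0], np.cumsum(y**2)])      # prefix sums of squares
    def cost(i, j):                                    # segment y[i:j], half-open
        s, s2, m = S[j] - S[i], S2[j] - S2[i], j - i
        return s2 - s * s / m
    F = np.full(n + 1, np.inf)
    F[0] = -beta                                       # cancels the first +beta
    last = np.zeros(n + 1, int)
    for t in range(1, n + 1):
        cands = [F[s] + cost(s, t) + beta for s in range(t)]
        s_best = int(np.argmin(cands))
        F[t], last[t] = cands[s_best], s_best
    cps, t = [], n                                     # backtrack
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

The penalty `beta` plays the role discussed above: too small over-fits (spurious changepoints), too large under-fits, which is exactly the issue the range-of-penalties method addresses.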

18.
High leverage points can induce or disrupt multicollinearity patterns in data. Observations responsible for this problem are generally known as collinearity-influential observations. A significant amount of published work on the identification of collinearity-influential observations exists; however, we show in this article that all commonly used detection techniques display greatly reduced sensitivity in the presence of multiple high leverage collinearity-influential observations. We propose a new measure based on a diagnostic robust group deletion approach. Some practical cutoff points for the existing and newly developed diagnostic measures are also introduced. Numerical examples and simulation results show that the proposed measure provides significant improvement over the existing measures.

19.
In fitting a regression model, one or more observations may have substantial effects on the estimators. Such unusual observations are precisely detected by a new diagnostic measure, Pena's statistic. In this article, we introduce a version of Pena's statistic for each point in Liu regression. Using the forecast change property, we simplify Pena's statistic in a numerical sense. The simplified Pena's statistic is found to behave quite well as far as the detection of influential observations is concerned. We express Pena's statistic in terms of the Liu leverages and residuals. The normality of this statistic is also discussed, and it is demonstrated that it can identify a subset of high Liu leverage outliers. For numerical evaluation, simulation studies are given and a real data set is analysed for illustration.

20.
In astronomy, multiple images are frequently obtained at the same position of the sky for follow-up coaddition, as it helps one go deeper and look for fainter objects. With large-scale panchromatic synoptic surveys becoming more common, image coaddition has become even more necessary as new observations start to be compared with a coadded fiducial sky in real time. The standard coaddition techniques include straight averages, variance-weighted averages, medians, etc. A more sophisticated nonlinear-response chi-square method is also used when it is known that the data are background-noise limited and the point spread function is homogenized in all channels. A more robust object detection technique is described, capable of detecting faint sources, even those not seen at all epochs, which would normally be smoothed out by traditional methods. The analysis at each pixel level is based on a formula similar to the Mahalanobis distance.
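The contrast between linear coaddition and a chi-square-type per-pixel statistic can be illustrated directly. The sketch below is an assumption-laden simplification, not the paper's method: it assumes background-subtracted images, known per-pixel variances, and a Gaussian approximation to the chi-square tail, and computes an inverse-variance-weighted coadd alongside a per-pixel chi-square detection mask that keeps single-epoch transients a plain average would dilute.

```python
import numpy as np

def coadd_and_detect(stack, var, nsigma=7.0):
    """stack: (E, H, W) background-subtracted epoch images;
    var: per-pixel noise variances of the same shape.
    Returns the inverse-variance-weighted coadd and a boolean detection mask
    from a per-pixel chi-square over epochs (sensitive to sources present in
    only some epochs, unlike a straight average)."""
    w = 1.0 / var
    coadd = (w * stack).sum(axis=0) / w.sum(axis=0)
    chi2 = (stack**2 / var).sum(axis=0)            # chi-square, E dof under pure noise
    E = stack.shape[0]
    thresh = E + nsigma * np.sqrt(2.0 * E)         # mean + nsigma * std of chi2_E
    return coadd, chi2 > thresh
```

With equal variances the coadd reduces to the straight average, while the chi-square mask still responds to a bright source seen at a single epoch.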


Copyright©北京勤云科技发展有限公司  京ICP备09084417号