Similar Literature
1.
In this paper, we describe some results of an ESPRIT project known as StatLog, whose purpose is the comparison of classification algorithms. We give a brief summary of some of the algorithms in the project: discriminant analysis; nearest neighbours; decision trees; neural net methods; SMART; kernel methods and other Bayesian approaches. We focus on data sets derived from images, ranging from raw pixel data to features and summaries extracted from such data.

2.
A new method of discrimination and classification based on a Hausdorff-type distance is proposed. For two groups, the distance is defined as the sum of the two directed Hausdorff distances, that is, the largest distance from a point of one set to its nearest element of the other. This distance has some useful properties and is exploited to develop a discriminant criterion between individual objects belonging to two groups, based on a finite number of classification variables. The criterion is generalized to more than two groups in a couple of ways. Several data sets are analysed and their classification accuracy is compared with that of the linear discriminant function; the results are encouraging. The method is simple, lends itself to parallel computation, and imposes less stringent conditions on the data.
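The abstract does not spell out the exact criterion, but the ingredients are standard enough to sketch. Below is a minimal illustration (all function names are mine, not the paper's): the directed Hausdorff distance between finite point sets, the "sum" variant described above, and a naive rule that assigns a single object to the group nearest to it in this set distance.

```python
import numpy as np

def directed_hausdorff(A, B):
    # largest distance from a point of A to its nearest point of B
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff_sum(A, B):
    # the "sum of the two directed distances" variant described above
    return directed_hausdorff(A, B) + directed_hausdorff(B, A)

def classify(x, groups):
    # assign a single object x to the group closest to the singleton {x}
    x = np.atleast_2d(np.asarray(x, float))
    return int(np.argmin([hausdorff_sum(x, np.asarray(g, float)) for g in groups]))

rng = np.random.default_rng(0)
groups = [rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))]
print(classify([2.8, 3.1], groups))   # expect group index 1
```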

3.
Since the squared ranks test was first proposed by Taha in 1964, it has been mentioned by several authors as a test that is easy to use, with good power in many situations. It is almost as easy to use as the Wilcoxon rank sum test, and has greater power when two populations differ in their scale parameters rather than in their location parameters. This paper discusses the versatility of the squared ranks test, introduces a test which uses squared ranks, and presents some exact tables.
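For concreteness, here is a hedged sketch of the squared ranks test in its common large-sample form (Conover's normal approximation for a difference in scale, assuming no ties); the exact tables the paper presents would replace the normal tail probability in small samples.

```python
import numpy as np
from scipy.stats import norm

def squared_ranks_test(x, y):
    # rank absolute deviations from each sample's own mean, jointly
    x, y = np.asarray(x, float), np.asarray(y, float)
    u, v = np.abs(x - x.mean()), np.abs(y - y.mean())
    n, m = len(u), len(v)
    N = n + m
    ranks = np.argsort(np.argsort(np.concatenate([u, v]))) + 1  # 1..N, no ties assumed
    T = np.sum(ranks[:n].astype(float) ** 2)    # sum of squared ranks, sample 1
    mean_T = n * (N + 1) * (2 * N + 1) / 6
    var_T = n * m * (N + 1) * (2 * N + 1) * (8 * N + 11) / 180
    z = (T - mean_T) / np.sqrt(var_T)
    return T, 2 * norm.sf(abs(z))               # statistic, two-sided p-value
```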

4.
In this paper, we present a test of independence between the response variable, which can be discrete or continuous, and a continuous covariate, after adjusting for heteroscedastic treatment effects. The method first augments each pair of the data for all treatments with a fixed number of nearest neighbours as pseudo-replicates. A test statistic is then constructed by taking the difference of two quadratic forms. The statistic is equivalent to the average lagged correlation between the response and nearest-neighbour local estimates of the conditional mean of the response given the covariate for each treatment group. This approach effectively eliminates the need to estimate the nonlinear regression function. The asymptotic distribution of the proposed test statistic is obtained under the null and local alternatives. Although using a fixed number of nearest neighbours poses significant difficulty for the inference, compared with letting the number of nearest neighbours go to infinity, the parametric standardizing rate is obtained for our test statistic. Numerical studies show that the new test procedure has robust power to detect nonlinear dependency in the presence of outliers that might result from highly skewed distributions. The Canadian Journal of Statistics 38: 408–433; 2010 © 2010 Statistical Society of Canada

5.
Spearman's rank correlation coefficient is not entirely suitable for measuring the correlation between two rankings in some applications because it treats all ranks equally. In 2000, Blest proposed an alternative measure of correlation that gives more importance to higher ranks but has some drawbacks. This paper proposes a weighted rank measure of correlation that weights the distance between two ranks using a linear function of those ranks, giving more importance to higher ranks than lower ones. It analyses the measure's distribution and provides a table of critical values for testing whether a given value of the coefficient is significantly different from zero. The paper also summarizes a number of applications for which the new measure is more suitable than Spearman's.
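The abstract describes the coefficient's form (a rank distance weighted by a linear function of the ranks) without giving the normalizing constant. The sketch below uses one published normalization of exactly this shape; treat it as an assumption rather than as this paper's definitive formula.

```python
import numpy as np

def weighted_rank_corr(x, y):
    # ranks 1..n with rank 1 for the largest value, so the top ranks
    # carry the most weight; identical rankings give exactly 1
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    R = n - np.argsort(np.argsort(x))
    Q = n - np.argsort(np.argsort(y))
    num = 6 * np.sum((R - Q) ** 2 * ((n - R + 1) + (n - Q + 1)))
    return 1 - num / (n ** 4 + n ** 3 - n ** 2 - n)
```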

6.
Millions of smart meters that can collect individual load curves, that is, electricity consumption time series, of residential and business customers on fine time grids are now deployed by electricity companies all around the world. Because transmitting and exploiting such a large quantity of information may be complex and costly, survey sampling techniques can be used to estimate the mean load curves of specific groups of customers. Data collection, like every mass process, may undergo technical problems at every point of the metering and collection chain, resulting in missing values. We consider imputation approaches (linear interpolation, kernel smoothing, nearest neighbours, principal analysis by conditional estimation) that take advantage of a specificity of the data, namely the strong relation between consumption at different instants of time. The performances of these techniques are compared on a real example of Irish electricity load curves under various scenarios of missing data. A general variance approximation of total estimators is also given, which encompasses nearest neighbours, kernel smoother and linear imputation methods. The Canadian Journal of Statistics 47: 65–89; 2019 © 2018 Statistical Society of Canada
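As a sketch of two of the imputation families mentioned (linear interpolation within a curve, nearest neighbours across curves), assuming curves sampled on a common time grid with NaN marking missing instants; names and details are illustrative, not the paper's implementation.

```python
import numpy as np

def interpolate_gaps(curve):
    # linear interpolation of missing values (NaN) within one load curve
    t = np.arange(len(curve))
    ok = ~np.isnan(curve)
    out = curve.copy()
    out[~ok] = np.interp(t[~ok], t[ok], curve[ok])
    return out

def knn_impute(curves, i, k=5):
    # fill curve i's gaps with the average of its k nearest complete
    # curves, distance computed on the commonly observed instants
    target = curves[i]
    miss = np.isnan(target)
    complete = [c for j, c in enumerate(curves)
                if j != i and not np.isnan(c).any()]
    dists = [np.nanmean((target - c) ** 2) for c in complete]
    nearest = np.array([complete[j] for j in np.argsort(dists)[:k]])
    out = target.copy()
    out[miss] = nearest[:, miss].mean(axis=0)
    return out
```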

7.
Randomly generated points in R^d are connected to their nearest neighbours (in Euclidean distance). The resulting connected clusters of points are studied. This paper examines questions related both to the collection of clusters formed and to the internal structure of a cluster. In particular, the one-dimensional case is examined in detail.
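A minimal sketch of the construction being studied: draw points, join each to its nearest neighbour, and read off the connected clusters (here via scipy; the paper's interest is the probabilistic structure of these clusters, not the computation).

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def nn_clusters(points):
    # connect every point to its Euclidean nearest neighbour and
    # return the connected clusters of the resulting graph
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                 # each point's nearest neighbour
    edges = coo_matrix((np.ones(n), (np.arange(n), nn)), shape=(n, n))
    return connected_components(edges, directed=False)  # (count, labels)

pts = np.random.default_rng(0).random((50, 2))
n_comp, labels = nn_clusters(pts)
print(n_comp)
```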

8.
Indices of Dependence Between Types in Multivariate Point Patterns
We propose new summary statistics quantifying several forms of dependence between points of different types in a multi-type spatial point pattern. These statistics are the multivariate counterparts of the J-function for point processes of a single type, introduced by van Lieshout & Baddeley (1996). They are based on comparing the distances from a type i point to either the nearest type j point or the nearest point in the pattern regardless of type with the corresponding distances seen from an arbitrary point in space. Information about the range of interaction can also be inferred. Our statistics can be computed explicitly for a range of well-known multivariate point process models. Some applications to bivariate and trivariate data sets are also presented.
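A rough illustration of the statistic's definition, ignoring the edge corrections a real estimator needs: Ĝ_ij(r) uses distances from type-i points to their nearest type-j points, F̂_j(r) uses the same distances from arbitrary dummy locations, and J_ij(r) = (1 − Ĝ_ij(r))/(1 − F̂_j(r)). All names are illustrative.

```python
import numpy as np

def j_ij(points_i, points_j, window, r, n_dummy=2000, seed=0):
    # naive (edge-effect-ignored) estimate of the multitype J-function
    rng = np.random.default_rng(seed)

    def nn_dist(from_pts, to_pts):
        d = np.linalg.norm(from_pts[:, None, :] - to_pts[None, :, :], axis=2)
        return d.min(axis=1)

    # G_ij: distance from each type-i point to the nearest type-j point
    G = (nn_dist(points_i, points_j) <= r).mean()
    # F_j: distance from arbitrary locations to the nearest type-j point
    (xmin, xmax), (ymin, ymax) = window
    dummy = np.column_stack([rng.uniform(xmin, xmax, n_dummy),
                             rng.uniform(ymin, ymax, n_dummy)])
    F = (nn_dist(dummy, points_j) <= r).mean()
    return (1 - G) / (1 - F)
```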

9.
We consider a novel univariate nonparametric cumulative sum (CUSUM) control chart for detecting small shifts in the mean of a process when the nominal value of the mean is unknown but some historical data are available. The chart is based on the Mann–Whitney statistic together with a change-point model, so no assumption about the underlying distribution of the process is required. Performance comparisons based on simulations show that the proposed control chart is slightly more effective than some other related nonparametric control charts.
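A sketch of the core computation, assuming no ties: for each candidate change point, the Mann–Whitney (rank-sum) statistic comparing the two segments is standardized, and the chart monitors the maximum; in practice the control limit would be calibrated by simulation to a target in-control average run length.

```python
import numpy as np

def mw_changepoint_stat(x):
    # max over candidate change points t of the standardized
    # Mann-Whitney rank-sum statistic comparing x[:t] with x[t:]
    x = np.asarray(x, float)
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1
    stats = []
    for t in range(2, n - 1):
        n1, n2 = t, n - t
        W = ranks[:t].sum()               # rank sum of the first segment
        mean_W = n1 * (n + 1) / 2
        var_W = n1 * n2 * (n + 1) / 12
        stats.append(abs(W - mean_W) / np.sqrt(var_W))
    return max(stats)
```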

10.
This article proposes a new spatial cluster detection method for longitudinal outcomes that detects neighborhoods and regions with elevated rates of disease while controlling for individual-level confounders. The proposed method, CumResPerm, uses cumulative geographic residuals in a permutation test to detect potential clusters, defined as sets of administrative regions such as a town or a group of towns. Previous cluster detection methods cannot incorporate individual-level data, including covariate adjustment, while still defining potential clusters by informative neighborhood or town boundaries. It is often of interest to detect such spatial clusters because individuals residing in the same town may have similar environmental exposures or socioeconomic backgrounds, due to administrative factors such as zoning laws; these boundaries can therefore be far more informative and relevant than arbitrary clusters such as the standard circle or square. Application of the CumResPerm method is illustrated with the Home Allergens and Asthma prospective cohort study, analyzing the relationship between area or neighborhood of residence and the repeatedly measured outcome, occurrence of wheeze in the previous six months, while taking into account mobile residential locations.

11.
Properties of the Weibull cumulative exposure model
This article investigates some properties of the Weibull cumulative exposure model for multiple-step step-stress accelerated life test data. Although the model incorporates a probabilistic version of Miner's rule in order to express the effect of cumulative damage in fatigue, our results show that this alone is not sufficient to express degradation of specimens: the shape parameter must be larger than 1. For a random variable obeying the model, its mean and standard deviation are investigated over various sets of parameter values. In addition, a way of checking the validity of the model is illustrated through an example of maximum likelihood estimation on an actual data set concerning time to breakdown of cross-linked polyethylene-insulated cables.
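A sketch of the standard cumulative exposure construction for a common shape parameter (the Nelson-type formulation; the paper's notation may differ): time spent at each stress level is converted into an equivalent exposure at the next level so that the CDF is continuous across stress changes.

```python
import numpy as np

def ce_cdf(t, change_times, scales, beta):
    # Weibull cumulative-exposure CDF for a multi-step step-stress test:
    # common shape beta, scale scales[i] on step i, equivalent start
    # times enforcing continuity at each stress change
    starts = [0.0] + list(change_times)
    e = 0.0                               # equivalent exposure entering the step
    for i, theta in enumerate(scales):
        end = change_times[i] if i < len(change_times) else np.inf
        if t <= end:
            return 1.0 - np.exp(-(((t - starts[i]) + e) / theta) ** beta)
        # translate time spent in this step into the next step's scale
        e = scales[i + 1] / theta * ((end - starts[i]) + e)
    return 1.0

# two-step test: stress raised at t=100, shortening the scale parameter
print(ce_cdf(150.0, change_times=[100.0], scales=[500.0, 200.0], beta=1.5))
```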

12.
Competing risks situations are encountered in many research areas, such as medicine, social science and engineering. The mainstream of analyses of competing risks data has been nonparametric or semiparametric in the statistical literature. We propose a new parametric family that parameterizes the cumulative incidence function completely. The new distribution is sufficiently flexible to fit various shapes of hazard patterns in survival data and increases the efficiency of the cumulative incidence estimates over distribution-free approaches. A simple two-sample parametric test statistic is also proposed to compare the cumulative incidence functions between two groups at a given time point. The new parametric approach is illustrated using breast cancer data sets from the National Surgical Adjuvant Breast and Bowel Project.

13.
Distance concentration is the phenomenon that, under certain conditions, the contrast between the nearest and farthest neighbouring points vanishes as the data dimensionality increases. It affects high-dimensional data processing, analysis, retrieval and indexing, which all rely on some notion of distance or dissimilarity. Previous work has characterised this phenomenon in the limit of infinite dimensions; however, real data are finite dimensional, so the infinite-dimensional characterisation is insufficient. Here we quantify the phenomenon more precisely for the possibly high but finite dimensional case, in a distribution-free manner, by bounding the tails of the probability that distances become meaningless. As an application, we show how this can be used to assess the concentration of a given distance function in an unknown data distribution solely on the basis of an available data sample from it. This allows problematic cases to be tested and detected more rigorously than is currently possible, and we demonstrate the approach on both synthetic data and ten real-world data sets from different domains.
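The phenomenon itself is easy to demonstrate empirically; this toy run illustrates the shrinking contrast, not the paper's finite-dimensional tail bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((1000, d))            # sample points in the unit cube
    q = rng.random(d)                    # a query point
    dist = np.linalg.norm(X - q, axis=1)
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:5d}  relative contrast={contrast:.3f}")  # shrinks with d
```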

14.
Existing statistical methods for the detection of space–time clusters of point events are retrospective, in that they are used to ascertain whether space–time clustering exists among a fixed number of past events. In contrast, prospective methods treat a series of observations sequentially, with the aim of quickly detecting any changes that occur in the series. In this paper, cumulative sum methods of monitoring are adapted for use with Knox's space–time statistic. The result is a procedure for the rapid detection of any emergent space–time interactions among a set of sequentially monitored point events. The approach relies on a 'local' Knox statistic that is useful in retrospective analyses for detecting when and where space–time interaction occurs. The distribution of the local Knox statistic under the null hypothesis of no space–time interaction is derived. The retrospective local statistic and the prospective cumulative sum monitoring method are illustrated using previously published data on Burkitt's lymphoma in Uganda.
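The (global, retrospective) Knox statistic that the monitoring scheme builds on is simply a count of event pairs close in both space and time; the paper's 'local' statistic restricts attention to pairs involving a given event, and its null distribution is derived there. A minimal sketch of the global count:

```python
import numpy as np

def knox(coords, times, s_crit, t_crit):
    # number of event pairs closer than s_crit in space AND t_crit in time
    coords, times = np.asarray(coords, float), np.asarray(times, float)
    d_space = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_time = np.abs(times[:, None] - times[None, :])
    close = (d_space < s_crit) & (d_time < t_crit)
    iu = np.triu_indices(len(times), k=1)   # count each unordered pair once
    return int(close[iu].sum())
```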

15.
Because of the destructiveness of natural disasters, the limited number of disaster scenarios and various human causes, missing data usually occur in disaster decision-making problems. To estimate the missing attribute values of alternatives, this paper focuses on imputing heterogeneous disaster attribute values with an improved K nearest neighbour imputation (KNNI) method. First, some definitions of trapezoidal fuzzy numbers (TFNs) are introduced and three types of attributes (linguistic term sets, intervals and real numbers) are converted to TFNs. Then the correlated degree model is used to extract related attributes to form the instances used in the K nearest neighbour algorithm, and a novel KNNI method merged with the correlated degree model is presented. Finally, an illustrative example verifies the proposed method and demonstrates its feasibility and effectiveness.
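A heavily simplified sketch of the ingredients, with all names and the TFN distance chosen purely for illustration: reals and intervals are embedded as trapezoidal fuzzy numbers (linguistic terms would first be mapped to TFNs via a predefined scale), and a missing attribute is imputed from the k nearest records; the correlated degree model used for attribute selection is omitted here.

```python
import numpy as np

def as_tfn(v):
    # a real r becomes (r, r, r, r); an interval (l, u) becomes (l, l, u, u)
    if np.isscalar(v):
        return np.array([v, v, v, v], float)
    l, u = v
    return np.array([l, l, u, u], float)

def tfn_dist(p, q):
    # one simple distance between TFNs: mean absolute component difference
    return np.abs(p - q).mean()

def knn_impute_tfn(records, i, attr, k=3):
    # impute record i's missing `attr` as the k-NN average, neighbours
    # ranked by TFN distance over the attributes both records observe
    target = records[i]
    donors = [r for j, r in enumerate(records)
              if j != i and r.get(attr) is not None]
    def dist(r):
        shared = [a for a in target if a != attr
                  and target[a] is not None and r.get(a) is not None]
        return np.mean([tfn_dist(as_tfn(target[a]), as_tfn(r[a])) for a in shared])
    donors.sort(key=dist)
    return np.mean([as_tfn(r[attr]) for r in donors[:k]], axis=0)
```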

16.
Leverage values are used in regression diagnostics as measures of influential observations in the X-space. Detecting high leverage values is crucial because they can lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, and mask and/or swamp outliers. Much work has been done on the identification of a single high leverage point, and it is generally believed that this problem has been largely resolved; there is, however, no general agreement among statisticians about the detection of multiple high leverage points. When a group of high leverage points is present in a data set, the commonly used diagnostic methods fail to identify them correctly, mainly because of masking and/or swamping effects. Robust alternative methods, on the other hand, can identify the high leverage points correctly but tend to declare too many low leverage points to be high leverage, which is also undesirable. We attempt a compromise between these two approaches: an adaptive method in which suspected high leverage points are identified by robust methods and then any low leverage points among them are put back into the estimation data set after diagnostic checking. The usefulness of the newly proposed method for the detection of multiple high leverage points is studied on some well-known data sets and by Monte Carlo simulation.
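The classical (non-robust) diagnostic the paper starts from is the diagonal of the hat matrix with the 2p/n cutoff; the sketch below shows the single-point case that the literature considers solved. Groups of high leverage points defeat this diagnostic through masking, which is what the paper's adaptive robust step addresses.

```python
import numpy as np

def leverages(X):
    # diagonal of the hat matrix H = Z (Z'Z)^{-1} Z', intercept included
    Z = np.column_stack([np.ones(len(X)), X])
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return np.diag(H)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
X[0] = [8.0, 9.0]                     # one planted high-leverage point
h = leverages(X)
p, n = X.shape[1] + 1, len(X)         # p counts the intercept column
print(np.where(h > 2 * p / n)[0])     # classical cutoff flags point 0
```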

17.
Scaling multivariate data prior to cluster analysis is an important preprocessing step, and several methods currently exist for doing it. This paper proposes some alternatives that are particularly directed at helping reveal cluster structure in data. The methods are applied to simulated and real data sets, and their performance is compared with some currently used methods. The results indicate that, in many situations, the new methods are much better than the most popular method, autoscaling. In the most challenging clustering example considered, their performance, while poor, is no worse than that of all the currently used methods.
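The abstract does not define the new scalings, so no attempt is made to reproduce them here; for reference, this is the baseline they are compared against (autoscaling), plus one other common choice.

```python
import numpy as np

def autoscale(X):
    # the most popular preprocessing: centre each variable and divide
    # by its standard deviation (column-wise z-scores)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def range_scale(X):
    # a common alternative: scale each variable to the [0, 1] range
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)
```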

18.
The ranking of paired contestants (players) after a series of contests is difficult when not every player plays every other player. In a 1975 JASA paper, Mark Thompson presented a maximum likelihood solution based on the assumption that the probability of any one player defeating any other is a function only of the difference in their ranks. Here the linear approximation to that likelihood is shown to lead to a nonparametric measure of the efficacy of the ranking, called the net difference in ranks (NDR): the sum of the differences in ranks of the paired players in the observed contests that agree with the ranking, minus the sum of the differences in ranks in the observed contests that disagree with the ranking (upsets). The subject is part of a large literature consolidated by H.A. David in The Method of Paired Comparisons (1963, 1988). The method was introduced by the psychophysicist Fechner in 1860 and has been widely applied to sensory testing.
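The NDR as described can be computed directly from a ranking and a list of game results; a minimal sketch (names are mine):

```python
def net_difference_in_ranks(ranking, results):
    # ranking[p] is player p's rank (1 = best); results is a list of
    # (winner, loser) pairs. Contests the ranking predicts correctly
    # add their rank difference; upsets subtract theirs.
    ndr = 0
    for winner, loser in results:
        diff = abs(ranking[winner] - ranking[loser])
        ndr += diff if ranking[winner] < ranking[loser] else -diff
    return ndr

ranks = {"A": 1, "B": 2, "C": 3}
print(net_difference_in_ranks(ranks, [("A", "C"), ("C", "B")]))  # 2 - 1 = 1
```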

19.
An overview of risk-adjusted charts
The paper provides an overview of risk-adjusted charts, with examples based on two data sets: the first consists of outcomes following cardiac surgery and the patient factors contributing to the Parsonnet score; the second comprises age–sex-adjusted death-rates per year under a single general practitioner. The charts presented include the cumulative sum (CUSUM), the resetting sequential probability ratio test, the sets method and the Shewhart chart, and comparisons between the charts are made. Estimation of the process parameter and two-sided charts are also discussed. The CUSUM is found to be the least efficient, under the average run length (ARL) criterion, of the resetting sequential probability ratio test class of charts, but the ARL criterion is thought not to be sensible for comparisons within that class. An empirical comparison of the sets method and the CUSUM for binary data shows that the sets method is more efficient when the in-control ARL is small, and more efficient over a slightly larger range of in-control ARLs when the change in the parameter being tested for is larger. The Shewhart p-chart is found to be less efficient than the CUSUM even when the change in the parameter being tested for is large.
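Of the charts reviewed, the risk-adjusted CUSUM for binary outcomes admits a compact sketch: each patient contributes the log-likelihood ratio for a postulated odds-ratio shift, with the predicted risk p_t coming from a risk model such as one based on the Parsonnet score. The threshold h below is a placeholder; in practice it is chosen by simulation to achieve a target in-control ARL.

```python
import numpy as np

def risk_adjusted_cusum(outcomes, risks, odds_ratio=2.0, h=4.5):
    # one-sided risk-adjusted CUSUM for binary outcomes: under an
    # odds-ratio shift R, P(y=1) = R p / (1 - p + R p), giving the
    # log-likelihood-ratio weight below for each patient
    S, path = 0.0, []
    for y, p in zip(outcomes, risks):
        w = y * np.log(odds_ratio) - np.log(1 - p + odds_ratio * p)
        S = max(0.0, S + w)
        path.append(S)
    signals = [t for t, s in enumerate(path) if s > h]
    return path, signals
```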
