Similar Documents
1.
Longitudinal data often contain influential points, and generalized and weighted generalized estimating equations (GEEs and WGEEs) are highly sensitive to them. One approach for dealing with outliers is to introduce weight functions. In this article, we propose new weights based on statistical depth. These weights express the centrality of each point with respect to the whole sample, with a smaller depth for points far from the center and a larger depth for points near the center. The proposed approach leads to a robust WGEE. The methods are applied to two real datasets and evaluated in simulation studies.
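A minimal sketch of the general idea, not the authors' exact weights: rank observations by a depth measure and down-weight shallow (outlying) ones before solving a weighted estimating equation. Mahalanobis depth stands in for the depth function here, and names such as `depth_weights` are illustrative.

```python
import numpy as np

def mahalanobis_depth(X):
    """Mahalanobis depth D(x) = 1 / (1 + (x - mean)' S^-1 (x - mean))."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)   # squared Mahalanobis distances
    return 1.0 / (1.0 + d2)

def depth_weights(X):
    """Weights proportional to depth: rows far from the centre get small weight."""
    d = mahalanobis_depth(X)
    return d / d.max()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [8.0, 8.0, 8.0]                 # an influential point
w = depth_weights(X)
print(round(float(w[0]), 3), round(float(w[1:].mean()), 3))   # outlier weight vs. typical weight
```

In a WGEE these weights would multiply each subject's contribution to the estimating equation, so the influential row contributes far less to the fit.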

2.
The concept of location depth was introduced as a way to extend the univariate notion of ranking to a bivariate configuration of data points. It has been used successfully for robust estimation, hypothesis testing, and graphical display. The depth contours form a collection of nested polygons, and the center of the deepest contour is called the Tukey median. The only available implemented algorithms for the depth contours and the Tukey median are slow, which limits their usefulness. In this paper we describe an optimal algorithm which computes all bivariate depth contours in O(n²) time and space, using topological sweep of the dual arrangement of lines. Once these contours are known, the location depth of any point can be computed in O(log² n) time with no additional preprocessing, or in O(log n) time after O(n²) preprocessing. We provide fast implementations of these algorithms to allow their use in everyday statistical practice.
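The paper's topological-sweep algorithm is exact; as a hedged illustration of what location depth measures, the sketch below approximates the halfspace (Tukey) depth of a query point by minimizing, over many random directions, the number of points on one side. The result is an upper bound on the exact depth that tightens as the number of directions grows; function names are illustrative.

```python
import numpy as np

def approx_halfspace_depth(theta, X, n_dir=2000, seed=None):
    """Approximate Tukey (halfspace) depth of point theta w.r.t. rows of X.

    Exact depth = min over closed halfplanes through theta of the number of
    data points they contain; here the minimum is taken over random
    directions only, so the value is an upper bound on the exact depth.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    angles = rng.uniform(0.0, np.pi, size=n_dir)
    U = np.column_stack([np.cos(angles), np.sin(angles)])   # unit directions
    proj = (X - np.asarray(theta)) @ U.T                     # projections, shape (n, n_dir)
    counts = np.minimum((proj >= 0).sum(axis=0), (proj <= 0).sum(axis=0))
    return int(counts.min())

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
print(approx_halfspace_depth(X.mean(axis=0), X))   # deep point: close to n/2
print(approx_halfspace_depth([4.0, 4.0], X))       # outlying point: close to 0
```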

3.
In recent years, the notion of data depth has been used in nonparametric multivariate data analysis, since it gives a natural 'centre-outward' ordering of multivariate data points with respect to a given data cloud. In the literature, various nonparametric tests based on data depth have been developed for testing equality of location of two multivariate distributions. Here, we define two nonparametric tests based on two different test statistics for testing equality of locations of two multivariate distributions. In the present work, we compare the performance of these tests with the tests developed by Li and Liu [New nonparametric tests of multivariate locations and scales using data depth. Statist Sci. 2004;(1):686–696]. The comparison, in terms of power, is carried out for multivariate symmetric and skewed distributions using simulation for three popular depth functions. An application of the tests to real-life data is provided, together with conclusions and recommendations.

4.
Several nonparametric tests for the multivariate multi-sample location problem are proposed in this paper. These tests are based on the notion of data depth, which is used to measure the centrality or outlyingness of a given point with respect to a given distribution or data cloud. The proposed tests are completely nonparametric and are implemented as permutation tests. Their performance is compared with an existing parametric test and a nonparametric test based on data depth. An extensive simulation study reveals that the proposed tests are superior in power to the existing tests based on data depth. Illustrations with real data are provided.

5.
Data depth provides a natural means to rank multivariate vectors with respect to an underlying multivariate distribution. Most existing depth functions emphasize a centre-outward ordering of data points, which may not provide a useful geometric representation of certain distributional features, such as multimodality, of concern to some statistical applications. Such inadequacy motivates us to develop a device for ranking data points according to their “representativeness” rather than “centrality” with respect to an underlying distribution of interest. Derived essentially from a choice of goodness-of-fit test statistic, our device calls for a new interpretation of “depth” more akin to the concept of density than location. It copes particularly well with multivariate data exhibiting multimodality. In addition to providing depth values for individual data points, depth functions derived from goodness-of-fit tests also extend naturally to provide depth values for subsets of data points, a concept new to the data-depth literature.
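As a hedged illustration of the "representativeness versus centrality" contrast (not the authors' goodness-of-fit construction), one can rank points by an estimated density: in a bimodal sample, points near either mode score high while the geometric centre between the modes scores low, which is exactly where a centre-outward depth would be misleading.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
# Bimodal sample: two well-separated clusters in the plane.
X = np.vstack([rng.normal(-3, 1, size=(150, 2)),
               rng.normal(3, 1, size=(150, 2))])

kde = gaussian_kde(X.T)          # density estimate used as a representativeness score
scores = kde(X.T)

midpoint = np.zeros((2, 1))      # the point halfway between the two modes
print(kde(midpoint))             # low score despite being the 'centre' of the cloud
print(np.median(scores))         # typical score of an actual sample point
```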

6.
In this article, we consider the problem of estimating regression coefficients for a linear model with censored and truncated data based on regression depth. Any line can be given a rank using regression depth, and the deepest regression line is the one with maximum regression depth. We propose a method to define the regression depth of a line in the presence of censoring and truncation. We show how the proposed regression performs by analyzing the Stanford heart transplant data and AIDS incubation data.

7.
The present paper deals with the problem of testing equality of locations of two multivariate distributions using the notion of data depth, which measures the centrality or outlyingness of a given point within a given data cloud. The paper proposes two nonparametric tests for testing equality of locations of two multivariate populations, developed by observing the behavior of the depth versus depth (DD) plot. A simulation study reveals that the proposed tests are superior in power to the existing tests based on data depth. Illustrations with real data are provided.

8.
聂斌 (Nie Bin), 杜梦莹 (Du Mengying), 廖丹 (Liao Dan). Statistical Research (《统计研究》), 2012, 29(9): 88–94
In Phase I of statistical process control, accurately identifying the time point at which the process state shifts is key to the effectiveness of control. This paper takes the degree to which data deviate from the centre of a multidimensional space as the criterion for the change-point rule: the sequence of individual observations is transformed into data points in a multidimensional space through probability density profiles, data depth techniques are used to construct feature variables, and a change-point localization rule is established. Simulation results show that the new method can locate the change point accurately without assuming that the process follows a normal distribution, and it also exhibits good overall performance in comparative studies.

9.
For classification problems where the test data are labeled sequentially, the point at which all true positives are first identified is often of critical importance. This article develops hypothesis tests to assess whether all true positives have been labeled in the test data. The tests use a partial receiver operating characteristic (ROC) that is generated from a labeled subset of the test data. These methods are developed in the context of unexploded ordnance (UXO) classification, but are applicable to any binary classification problem. First, the likelihood of the observed ROC given binormal model parameters is derived using order statistics, leading to a nonlinear parameter estimation problem. I then derive the approximate distribution of the point on the ROC at which all true instances are found. Using estimated binormal parameters, this distribution can be integrated up to a desired confidence level to define a critical false alarm rate (FAR). If the selected operating point is before this critical point, then additional labels out to the critical point are required. A second test uses the uncertainty in binormal parameters to determine the critical FAR. These tests are demonstrated with UXO classification examples and both approaches are recommended for testing operating points.
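The article derives this distribution analytically via order statistics; as a hedged stand-in, the sketch below obtains the same quantity by Monte Carlo under an assumed binormal model (false-alarm scores N(0,1), true-positive scores N(mu, sigma)): simulate where the lowest-scoring true positive falls and take the desired quantile of the corresponding false alarm rate. Names and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def critical_far(mu_tp, sigma_tp, n_tp, confidence=0.95, n_sim=200_000, seed=0):
    """Critical FAR by which all true positives are found, at the given
    confidence level, under a binormal score model (FP scores ~ N(0, 1))."""
    rng = np.random.default_rng(seed)
    # Score of the last (lowest-scoring) true positive in each simulated survey.
    min_tp = mu_tp + sigma_tp * rng.standard_normal((n_sim, n_tp)).min(axis=1)
    # FAR incurred if the threshold is lowered to that score.
    far_at_last_tp = norm.sf(min_tp)
    return float(np.quantile(far_at_last_tp, confidence))

# Example: 50 true positives with scores well separated from the false alarms.
print(critical_far(mu_tp=2.0, sigma_tp=1.0, n_tp=50))
```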

10.
An exact permutation test for analyzing and/or dredging multi-response data at the ordinal or higher levels is presented. The associated test statistic is based on the average distance (under any specified norm) between points within a priori disjoint subgroups of a finite population of points in an r-dimensional space (corresponding to r measured responses from each object in a finite population of objects). Alternative approximate tests based on the beta and normal distributions are provided. Two detailed examples utilizing actual social science data are considered, including comparisons of the approximate tests. An additional example describes the behavior of these tests under a variety of conditions, including extreme data configurations.
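A minimal sketch of a permutation test of this type, with details such as the norm and group weighting simplified: the statistic is the average within-group pairwise distance, and its permutation distribution is obtained by reshuffling the group labels.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def within_group_stat(D, labels):
    """Average pairwise distance within groups (smaller = tighter groups)."""
    parts = []
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        sub = D[np.ix_(idx, idx)]
        parts.append(sub[np.triu_indices(len(idx), k=1)].mean())
    return float(np.mean(parts))

def permutation_test(X, labels, n_perm=5000, seed=0):
    rng = np.random.default_rng(seed)
    D = squareform(pdist(X))          # Euclidean distances; any norm could be substituted
    obs = within_group_stat(D, labels)
    perm = np.array([within_group_stat(D, rng.permutation(labels))
                     for _ in range(n_perm)])
    return obs, float(np.mean(perm <= obs))   # small p-value: groups tighter than chance

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(1.5, 1, (20, 2))])
labels = np.repeat([0, 1], 20)
print(permutation_test(X, labels))
```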

11.
ON BOOTSTRAP HYPOTHESIS TESTING
We describe methods for constructing bootstrap hypothesis tests, illustrating our approach using analysis of variance. The importance of pivotalness is discussed; pivotal statistics usually result in improved accuracy of level. We note that hypothesis tests and confidence intervals call for different methods of resampling, so as to ensure that accurate critical-point estimates are obtained in the former case even when the data fail to comply with the null hypothesis. Our main points are illustrated by a simulation study and applications to three real data sets.
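A minimal one-sample example of the points made here, with illustrative names (not the authors' ANOVA setting): the statistic is studentized, hence asymptotically pivotal, and the bootstrap statistic is recentred at the sample mean rather than at the hypothesized value, so the resampling respects the null hypothesis even when the data do not.

```python
import numpy as np

def bootstrap_t_test(x, mu0, n_boot=9999, seed=0):
    """Two-sided bootstrap test of H0: mean = mu0 using a pivotal t statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    t_obs = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=n, replace=True)
        # Recentre at the sample mean so the bootstrap mimics the null
        # distribution of the statistic whether or not H0 holds for the data.
        t_boot[b] = (xb.mean() - x.mean()) / (xb.std(ddof=1) / np.sqrt(n))
    return t_obs, float(np.mean(np.abs(t_boot) >= abs(t_obs)))

rng = np.random.default_rng(4)
data = rng.exponential(scale=1.0, size=30)    # skewed data
print(bootstrap_t_test(data, mu0=1.0))
```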

12.
In this article, we present an alternative test of discordancy for samples of univariate circular data. The new technique is based on the effect an outlier has on the sum of circular distances from the point of interest to all other points. Percentage points are calculated and the performance of the test is examined. We compare its power to detect an outlier with that of other tests and show that the new approach performs relatively better than other known tests. A practical example is presented as an illustration.
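A hedged sketch of a statistic of this form (the test's percentage points are not reproduced here): for each observation, sum its circular distances to all other observations and flag the observation with the largest sum as the candidate outlier.

```python
import numpy as np

def circular_distance(a, b):
    """Shorter arc length between angles a and b (radians), in [0, pi]."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def discordancy_scores(theta):
    """Sum of circular distances from each point to all other points."""
    theta = np.asarray(theta, dtype=float)
    D = circular_distance(theta[:, None], theta[None, :])
    return D.sum(axis=1)

rng = np.random.default_rng(5)
# A tight cluster of directions plus one point on the opposite side of the circle.
theta = np.concatenate([rng.vonmises(mu=0.0, kappa=8.0, size=30), [np.pi]])
scores = discordancy_scores(theta)
print(int(np.argmax(scores)), float(scores.max()), float(np.median(scores)))
```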

13.
Computing location depth and regression depth in higher dimensions
The location depth (Tukey 1975) of a point θ relative to a p-dimensional data set Z of size n is defined as the smallest number of data points in a closed halfspace with boundary through θ. For bivariate data, it can be computed in O(n log n) time (Rousseeuw and Ruts 1996). In this paper we construct an exact algorithm to compute the location depth in three dimensions in O(n² log n) time. We also give an approximate algorithm to compute the location depth in p dimensions in O(mp³ + mpn) time, where m is the number of p-subsets used. Recently, Rousseeuw and Hubert (1996) defined the depth of a regression fit. The depth of a hyperplane with coefficients (θ1,...,θp) is the smallest number of residuals that need to change sign to make (θ1,...,θp) a nonfit. For bivariate data (p=2) this depth can also be computed in O(n log n) time. We construct an algorithm to compute the regression depth of a plane relative to a three-dimensional data set in O(n² log n) time, and another that deals with p=4 in O(n³ log n) time. For data sets with large n and/or p we propose an approximate algorithm that computes the depth of a regression fit in O(mp³ + mpn + mn log n) time. For all of these algorithms, actual implementations are made available.

14.
For given (small) α and β, a sequential confidence set that covers the true parameter point with probability at least 1 - α, and one or more specified false parameter points with probability at most β, can be generated by a family of sequential tests. Several situations are described where this approach would be a natural one. The following example is studied in some detail: obtain an upper (1 - α)-confidence interval for a normal mean μ (variance known) with β-protection at μ - δ(μ), where δ(·) is not bounded away from 0, so that a truly sequential procedure is mandatory. Some numerical results are presented for intervals generated by (1) sequential probability ratio tests (SPRTs) and (2) generalized sequential probability ratio tests (GSPRTs). These results indicate the superiority of the GSPRT-generated intervals over the SPRT-generated ones when expected sample size is taken as the performance criterion.

15.
This paper discusses two problems that can occur when using central composite designs (CCDs), that are not generally covered in the literature but can lead to wrong decisions, and therefore incorrect models, if they are ignored. Most industrial experimental designs are sequential. This usually involves running as few initial tests as possible while getting enough information to provide a reasonable approximation to reality (the screening stage). The CCD strategy generally requires running a full or fractional factorial design (the cube or hypercube) with one or more additional centre points. The cube is augmented, if deemed necessary, by additional experiments known as star points. The major problems highlighted here concern the decision of whether or not to run the star points. If the difference between the average response at the centre of the design and the average of the cube results is significant, there is probably a need for one or more quadratic terms in the predictive model; if not, a simpler model that includes only main effects and interactions is usually considered sufficient. This test for 'curvature' in a main effect will often fail if the design space contains or surrounds a saddle point, which may disguise the need for a quadratic term. This paper describes the occurrence of a real saddle point in an industrial project and how it was overcome. The second problem occurs because the cube and star-point portions of a CCD are sometimes run as orthogonal blocks; indeed, theory would suggest that this is the correct procedure. However, in the industrial context, where minimizing the total number of tests is at a premium, this can lead to designs with star points a long way from the cube. In such a situation, were the curvature test found to be non-significant, we could end up with a model that predicts well within the cube portion of the design space but is unreliable over the balance of the total area of investigation. The paper discusses just such a design, one that disguised the real need for a quadratic term.
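A minimal sketch of the curvature check described above, using hypothetical data: compare the average response at the centre points with the average over the factorial (cube) points via a two-sample t statistic. A non-significant result is usually read as "no quadratic terms needed", which is exactly the reading a saddle surface can defeat, since opposing curvatures cancel at the centre.

```python
import numpy as np
from scipy import stats

# Hypothetical 2^2 factorial (cube) responses and centre-point replicates.
cube_y = np.array([78.1, 80.3, 79.5, 81.2])
centre_y = np.array([79.6, 80.1, 79.9, 80.4])

t, p = stats.ttest_ind(centre_y, cube_y, equal_var=False)
print(f"centre - cube = {centre_y.mean() - cube_y.mean():.2f}, p = {p:.3f}")
# A large p-value suggests no curvature at the centre -- but a saddle surface
# (curving up in one factor and down in another) can give the same result,
# which is the trap discussed in the paper.
```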

17.
Zero-inflated models are commonly used for modeling count and continuous data with extra zeros. Inflation at one or two points other than zero when modeling continuous data has been discussed far less than zero inflation. In this article, inflation at an arbitrary point α is presented as a semicontinuous distribution, and mean imputation of a continuous response is discussed as a cause of semicontinuous data. Inflation at two points, and more generally at k arbitrary points, and its relation to cell-mean imputation in a mixture of continuous distributions are also studied. To analyze the imputed data, a mixture of semicontinuous distributions is used, and the effects of covariates on the dependent variable in a mixture of k semicontinuous distributions with inflation at k points are investigated. Parameter estimates are obtained with the expectation-maximization (EM) algorithm. Using real data from the Iranian Households Income and Expenditure Survey (IHIES), it is shown how to obtain a proper estimate of the population variance when continuous missing-at-random responses are mean imputed.

18.
In many therapeutic areas, the identification and validation of surrogate end points is of prime interest to reduce the duration and/or size of clinical trials. Buyse and co-workers and Burzykowski and co-workers have proposed a validation strategy for end points that are either normally distributed or (possibly censored) failure times. In this paper, we address the problem of validating an ordinal categorical or binary end point as a surrogate for a failure time true end point. In particular, we investigate the validity of tumour response as a surrogate for survival time in evaluating fluoropyrimidine-based experimental therapies for advanced colorectal cancer. Our analysis is performed on data from 28 randomized trials in advanced colorectal cancer, which are available through the Meta-Analysis Group in Cancer.

19.
We consider the comparison of point processes in a discrete observation situation in which each subject is observed only at discrete time points and no history information between observation times is available. A class of non-parametric test statistics for the comparison of point processes based on this kind of data is presented and their asymptotic distributions are derived. The proposed tests are generalizations of the corresponding tests for continuous observations. Some results from a simulation study for evaluating the proposed tests are presented and an illustrative example from a clinical trial is discussed.

20.
Zero-inflated power series distributions are commonly used for modelling count data with extra zeros. Inflation at point zero has been investigated and several tests for zero inflation have been examined. Sometimes, however, inflation occurs at a point other than zero; in this case we say inflation occurs at an arbitrary point j. This j-inflation has been discussed less than zero inflation. In this paper, inflation at an arbitrary point j is studied in more detail and a Bayesian test for detecting inflation at point j is presented. The Bayesian method is extended to inflation at two arbitrary points i and j. The relationship between the distribution with inflation at point j, inflation at points i and j, and missing-value imputation is studied. It is shown how to obtain a proper estimate of the population variance when a mean-imputed missing-at-random data set is used. Some simulation studies are conducted, and the proposed Bayesian test is applied to two real data sets.
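As a hedged illustration of inflation at an arbitrary point j (a simple Poisson special case, not the paper's general power-series or Bayesian machinery): the j-inflated model places extra mass p at the point j on top of a Poisson(λ) component.

```python
import numpy as np
from scipy.stats import poisson

def j_inflated_poisson_pmf(x, j, p, lam):
    """P(X = x) = p * 1{x = j} + (1 - p) * Poisson(lam) pmf at x."""
    x = np.asarray(x)
    return p * (x == j) + (1 - p) * poisson.pmf(x, lam)

def sample_j_inflated_poisson(n, j, p, lam, seed=0):
    rng = np.random.default_rng(seed)
    inflate = rng.random(n) < p          # which observations come from the point mass
    y = rng.poisson(lam, size=n)
    y[inflate] = j
    return y

y = sample_j_inflated_poisson(1000, j=3, p=0.25, lam=6.0)
print(np.mean(y == 3), j_inflated_poisson_pmf(3, 3, 0.25, 6.0))  # empirical vs. model mass at j
```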
