Similar Documents
20 similar documents found.
1.
2.
Computing location depth and regression depth in higher dimensions
The location depth (Tukey 1975) of a point \(\theta\) relative to a p-dimensional data set Z of size n is defined as the smallest number of data points in a closed halfspace with boundary through \(\theta\). For bivariate data, it can be computed in \(O(n\log n)\) time (Rousseeuw and Ruts 1996). In this paper we construct an exact algorithm to compute the location depth in three dimensions in \(O(n^2\log n)\) time. We also give an approximate algorithm to compute the location depth in p dimensions in \(O(mp^3+mpn)\) time, where m is the number of p-subsets used. Recently, Rousseeuw and Hubert (1996) defined the depth of a regression fit. The depth of a hyperplane with coefficients \((\theta_1,\ldots,\theta_p)\) is the smallest number of residuals that need to change sign to make \((\theta_1,\ldots,\theta_p)\) a nonfit. For bivariate data (p=2) this depth can be computed in \(O(n\log n)\) time as well. We construct an algorithm to compute the regression depth of a plane relative to a three-dimensional data set in \(O(n^2\log n)\) time, and another that deals with p=4 in \(O(n^3\log n)\) time. For data sets with large n and/or p we propose an approximate algorithm that computes the depth of a regression fit in \(O(mp^3+mpn+mn\log n)\) time. For all of these algorithms, actual implementations are made available.
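To fix ideas, here is a minimal sketch of the bivariate case: the location depth of \(\theta\) is the smallest number of data points in any closed halfplane whose boundary passes through \(\theta\). The sketch approximates it by scanning a finite grid of directions; the function name, the grid size `n_dir`, and the brute-force strategy are illustrative assumptions, not the \(O(n\log n)\) algorithm of Rousseeuw and Ruts.

```python
import numpy as np

def location_depth_2d(theta, Z, n_dir=3600):
    """Approximate bivariate Tukey location depth of point theta w.r.t.
    the rows of Z, by scanning a grid of halfplane directions."""
    V = np.asarray(Z, float) - np.asarray(theta, float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_dir, endpoint=False)
    U = np.column_stack([np.cos(angles), np.sin(angles)])  # (n_dir, 2)
    # number of points in the closed halfplane {z: <z - theta, u> >= 0}
    counts = (V @ U.T >= 0).sum(axis=0)
    return int(counts.min())
```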

3.
4.
R. Göb, Statistical Papers, 1992, 33(1):273-277
In elementary probability theory, as a result of a limiting process the probabilities of a Bi(n, p) binomial distribution are approximated by the probabilities of a Po(np) Poisson distribution. Accordingly, in statistical quality control the binomial operating characteristic function \(\mathcal{L}_{n,c}(p)\) is approximated by the Poisson operating characteristic function \(\mathcal{F}_{n,c}(p)\). The inequality \(\mathcal{L}_{n+1,c+1}(p) > \mathcal{L}_{n,c}(p)\) for \(p \in (0,1)\) is evident from the interpretation of \(\mathcal{L}_{n+1,c+1}(p)\) and \(\mathcal{L}_{n,c}(p)\) as probabilities of accepting a lot. It is shown that the Poisson approximation \(\mathcal{F}_{n,c}(p)\) preserves this essential feature of the binomial operating characteristic function, i.e. that an analogous inequality holds for the Poisson operating characteristic function, too.
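A quick numerical illustration, as a hedged sketch using scipy (the plan parameters and the fraction nonconforming p below are arbitrary choices): the OC function of a single-sampling plan with sample size n and acceptance number c is the probability of observing at most c nonconforming items.

```python
from scipy.stats import binom, poisson

def oc_binomial(n, c, p):
    """Binomial OC function L_{n,c}(p) = P(X <= c), X ~ Bi(n, p)."""
    return binom.cdf(c, n, p)

def oc_poisson(n, c, p):
    """Poisson approximation F_{n,c}(p) = P(Y <= c), Y ~ Po(n p)."""
    return poisson.cdf(c, n * p)

# The (n+1, c+1) plan accepts with higher probability than the (n, c)
# plan, and the Poisson approximation preserves this ordering.
p = 0.03
print(oc_binomial(101, 3, p) > oc_binomial(100, 2, p))  # True
print(oc_poisson(101, 3, p) > oc_poisson(100, 2, p))    # True
```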

5.
6.
In this paper, a procedure based on the delete-1 cross-validation is given for estimating the number of superimposed exponential signals; its limiting behavior is explored, and it is shown that the probability of overestimating the true number of signals is greater than a positive constant for sufficiently large samples. Also, a general procedure based on cross-validation is presented in which the deletion proceeds according to a collection of subsets of indices. The result is similar to the delete-1 cross-validation if the number of deletions is fixed. Simulation results are provided for the performance of the procedure when the collections of subsets of indices are chosen as those suggested by Shao (1993, J. Amer. Statist. Assoc. 88:486-494) in a linear model selection problem.
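To make the deletion scheme concrete, here is a minimal sketch (not the authors' procedure) of Monte Carlo delete-d cross-validation in the style suggested by Shao (1993); the names `model_fit`, `model_predict`, and `n_splits` are hypothetical placeholders.

```python
import numpy as np

def delete_d_cv(X, y, model_fit, model_predict, d, n_splits=200, rng=None):
    """Monte Carlo delete-d cross-validation: average squared prediction
    error over random validation subsets of size d."""
    rng = np.random.default_rng(rng)
    n = len(y)
    errs = []
    for _ in range(n_splits):
        val = rng.choice(n, size=d, replace=False)
        train = np.setdiff1d(np.arange(n), val)
        fit = model_fit(X[train], y[train])
        pred = model_predict(fit, X[val])
        errs.append(np.mean((y[val] - pred) ** 2))
    return np.mean(errs)
```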

7.
Let \(\mathbb{N} = \{1, 2, 3, \ldots\}\). Let \(\{X, X_n; n \in \mathbb{N}\}\) be a sequence of i.i.d. random variables, and let \(S_n = \sum_{i=1}^{n} X_i\), \(n \in \mathbb{N}\). Then \(S_n/\sqrt{n} \Rightarrow N(0, \sigma^2)\) for some \(\sigma^2 < \infty\) whenever, for a subsequence \(\{n_k; k \in \mathbb{N}\}\) of \(\mathbb{N}\), \(S_{n_k}/\sqrt{n_k} \Rightarrow N(0, \sigma^2)\). Motivated by this result, we study the central limit theorem along subsequences of sums of i.i.d. random variables when \(\{\sqrt{n}; n \in \mathbb{N}\}\) is replaced by \(\{\sqrt{na_n}; n \in \mathbb{N}\}\) with \(\lim_{n \rightarrow \infty} a_n = \infty\). We show that, for a given positive nondecreasing sequence \(\{a_n; n \in \mathbb{N}\}\) with \(\lim_{n \rightarrow \infty} a_n = \infty\) and \(\lim_{n \rightarrow \infty} a_{n+1}/a_n = 1\), and a given nondecreasing function \(h(\cdot): (0, \infty) \rightarrow (0, \infty)\) with \(\lim_{x \rightarrow \infty} h(x) = \infty\), there exists a sequence \(\{X, X_n; n \in \mathbb{N}\}\) of symmetric i.i.d. random variables such that \(\mathbb{E} h(|X|) = \infty\) and, for some subsequence \(\{n_k; k \in \mathbb{N}\}\) of \(\mathbb{N}\), \(S_{n_k}/\sqrt{n_k a_{n_k}} \Rightarrow N(0, 1)\). In particular, for a given \(0 < p < 2\) and a given nondecreasing function \(h(\cdot): (0, \infty) \rightarrow (0, \infty)\) with \(\lim_{x \rightarrow \infty} h(x) = \infty\), there exists a sequence \(\{X, X_n; n \in \mathbb{N}\}\) of symmetric i.i.d. random variables such that \(\mathbb{E} h(|X|) = \infty\) and, for some subsequence \(\{n_k; k \in \mathbb{N}\}\) of \(\mathbb{N}\), \(S_{n_k}/n_k^{1/p} \Rightarrow N(0, 1)\).

8.
We study the validation of prediction rules such as regression models and classification algorithms through two out-of-sample strategies, cross-validation and accumulated prediction error. We use the framework of Efron (1983, Journal of the American Statistical Association 78:316-331), where measures of prediction error are defined as sample averages of expected errors, and show through exact finite-sample calculations that cross-validation and accumulated prediction error yield different smoothing parameter choices in nonparametric regression. The difference in choice does not vanish as sample size increases.
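For intuition, here is a hedged sketch of the two strategies applied to choosing a Nadaraya-Watson bandwidth; the Gaussian kernel, the burn-in length, and the function names are illustrative assumptions, not the paper's exact finite-sample setup.

```python
import numpy as np

def nw_predict(x_train, y_train, x_eval, h):
    """Nadaraya-Watson regression estimate with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def cv_score(x, y, h):
    """Leave-one-out cross-validation score for bandwidth h."""
    errs = []
    for i in range(len(x)):
        m = np.ones(len(x), bool); m[i] = False
        errs.append((y[i] - nw_predict(x[m], y[m], x[i:i+1], h)[0]) ** 2)
    return np.mean(errs)

def ape_score(x, y, h, burn=10):
    """Accumulated prediction error: predict each new observation from
    the past only (assumes the data arrive in the given order)."""
    errs = []
    for t in range(burn, len(x)):
        errs.append((y[t] - nw_predict(x[:t], y[:t], x[t:t+1], h)[0]) ** 2)
    return np.mean(errs)
```

Minimizing `cv_score` and `ape_score` over a grid of h values generally picks different bandwidths, which is the phenomenon the paper quantifies exactly.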

9.
10.
11.
The purpose of this article is to explain cross-validation and describe its use in regression. Because replicability analyses are not typically employed in studies, this is a topic with which many researchers may not be familiar. As a result, researchers may not understand how to conduct cross-validation in order to evaluate the replicability of their data. This article not only explains the purpose of cross-validation, but also uses the widely available Holzinger and Swineford (1939) dataset as a heuristic example to concretely demonstrate its use. By incorporating multiple tables and examples of SPSS syntax and output, the reader is provided with additional visual examples in order to further clarify the steps involved in conducting cross-validation. A brief discussion of the limitations of cross-validation is also included. After reading this article, the reader should have a clear understanding of cross-validation, including when it is appropriate to use and how it can be used to evaluate replicability in regression.
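As a language-neutral sketch of the classic split-sample procedure the article demonstrates in SPSS (a toy under assumed OLS regression; the half-split and the function name are our choices): fit on one random half, then correlate predictions with outcomes on the other half; a cross-validated r much lower than the original multiple R signals poor replicability.

```python
import numpy as np

def split_sample_cross_validation(X, y, rng=None):
    """Fit OLS on a random half of the data and return the correlation
    between predicted and observed outcomes on the held-out half."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2 :]
    Xa = np.column_stack([np.ones(len(a)), X[a]])
    beta, *_ = np.linalg.lstsq(Xa, y[a], rcond=None)
    Xb = np.column_stack([np.ones(len(b)), X[b]])
    return np.corrcoef(Xb @ beta, y[b])[0, 1]
```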

12.
13.
14.
The Andersen-Gill multiplicative intensity (MI) model is well-suited to the analysis of recurrent failure-time data. The fundamental assumption of the MI model is that the process \(M_i(t)\) for subjects \(i=1,\ldots,n\), defined to be the difference between a subject's counting process and compensator, i.e., \(M_i(t) = N_i(t) - A_i(t)\), \(t \ge 0\), is a martingale with respect to some filtration. We propose omnibus procedures for testing this assumption. The methods are based on transformations of the estimated martingale residual process \(\hat{M}_i(t)\), a function of consistent estimates of the log-intensity ratios and the baseline cumulative hazard. Under a correctly specified model, the expected value of \(\hat{M}_i(t)\) is approximately equal to zero with approximately uncorrelated increments. These properties are exploited in the proposed testing procedures. We examine the effects of censoring and covariate effects on the operating characteristics of the proposed methods via simulation. The procedures are most sensitive to the omission of a time-varying continuous covariate. We illustrate use of the methods in an analysis of data from a clinical trial involving patients with chronic granulomatous disease.
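A hedged sketch of the basic building block, for the single-event Cox special case rather than the full recurrent-event MI model: martingale residuals \(\hat{M}_i = N_i - \hat{A}_i\) computed with the Breslow baseline hazard estimate. Ties and time-varying covariates are ignored here for brevity.

```python
import numpy as np

def martingale_residuals(times, events, Z, beta):
    """Martingale residuals M_i = N_i - A_i at end of follow-up for a
    single-event Cox model, with the Breslow baseline hazard estimate.
    times: follow-up times; events: 1 = failure, 0 = censored;
    Z: covariate matrix; beta: fitted log-intensity ratios."""
    risk = np.exp(Z @ beta)                    # exp(beta' Z_i)
    order = np.argsort(times)
    t, d, r = times[order], events[order], risk[order]
    at_risk = np.cumsum(r[::-1])[::-1]         # sum of risks still at risk
    dL = np.where(d == 1, 1.0 / at_risk, 0.0)  # Breslow hazard increments
    A = r * np.cumsum(dL)                      # A_i = risk_i * Lambda0(t_i)
    M = d - A
    out = np.empty_like(M, dtype=float)
    out[order] = M                             # undo the sort
    return out
```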

15.
Many stochastic processes considered in applied probability models, and, in particular, in reliability theory, are processes of the following form: Shocks occur according to some point process, and each shock causes the process to have a random jump. Between shocks the process increases or decreases in some deterministic fashion. In this paper we study processes for which the rate of increase or decrease between shocks depends only on the height of the process. For such processes we find conditions under which the processes can be stochastically compared. We also study hybrid processes in which periods of increase and periods of decrease alternate. A further result yields a stochastic comparison of processes that start with a random jump, rather than processes in which there is at the beginning some random delay time before the first jump. Supported by NSF Grant DMS 9303891.
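A minimal simulation sketch of such a process, assuming Poisson shocks and an Euler discretization of the state-dependent drift between shocks (all names and parameters below are illustrative):

```python
import numpy as np

def simulate_shock_process(T, rate, jump_sampler, drift, x0=0.0,
                           dt=0.01, rng=None):
    """Simulate a process with Poisson shocks causing random jumps and a
    deterministic, height-dependent drift r(x) between shocks."""
    rng = np.random.default_rng(rng)
    t, x = 0.0, x0
    path = [(t, x)]
    next_shock = rng.exponential(1.0 / rate)
    while t < T:
        step = min(dt, next_shock - t, T - t)  # advance to next event
        x += drift(x) * step                   # deterministic motion
        t += step
        if t >= next_shock:                    # shock: random jump
            x += jump_sampler(rng)
            next_shock += rng.exponential(1.0 / rate)
        path.append((t, x))
    return np.array(path)

# example: exponentially distributed upward jumps, height-dependent decay
path = simulate_shock_process(T=10.0, rate=2.0,
                              jump_sampler=lambda rng: rng.exponential(1.0),
                              drift=lambda x: -0.5 * x, rng=0)
```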

16.
We introduce a simple combinatorial scheme for systematically running through a complete enumeration of sample reuse procedures such as the bootstrap, Hartigan's subsets, and various permutation tests. The scheme is based on Gray codes which give tours through various spaces, changing only one or two points at a time. We use updating algorithms to avoid recomputing statistics and achieve substantial speedups. Several practical examples and computer codes are given.
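A hedged sketch of the core idea for the simplest tour, the binary reflected Gray code over all subsets: each step flips a single element, so a statistic such as the subset sum is updated in O(1) per subset instead of being recomputed from scratch.

```python
def gray_tour_sums(x):
    """Visit all 2^n subsets of x in Gray-code order, updating the subset
    sum by a single +/- per step."""
    n = len(x)
    in_set = [False] * n
    s = 0.0
    sums = [s]                           # start from the empty subset
    for k in range(1, 2 ** n):
        # the bit flipped between successive Gray codes g(k-1) and g(k)
        # is the lowest set bit of k
        j = (k & -k).bit_length() - 1
        s = s - x[j] if in_set[j] else s + x[j]
        in_set[j] = not in_set[j]
        sums.append(s)
    return sums

print(gray_tour_sums([1.0, 2.0, 4.0]))   # 8 subset sums, one flip apart
```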

17.
In this paper we address the problem of protecting confidentiality in statistical tables containing sensitive information that cannot be disseminated. This is an issue of primary importance in practice. Cell Suppression is a widely-used technique for avoiding disclosure of sensitive information, which consists in suppressing all sensitive table entries along with a certain number of other entries, called complementary suppressions. Determining a pattern of complementary suppressions that minimizes the overall loss of information results in a difficult (i.e., NP-hard) optimization problem known as the Cell Suppression Problem. We propose here a different protection methodology consisting of replacing some table entries by appropriate intervals containing the actual value of the unpublished cells. We call this methodology Partial Cell Suppression, as opposed to classical complete cell suppression. Partial cell suppression has the important advantage of reducing the overall information loss needed to protect the sensitive information. Also, the new method automatically provides auditing ranges for each unpublished cell, thus saving an often time-consuming task for the statistical office while increasing the information explicitly provided with the table. Moreover, we propose an efficient (i.e., polynomial-time) algorithm to find an optimal partial suppression solution. A preliminary computational comparison between the partial and complete suppression methodologies is reported, showing the advantages of the new approach. Finally, we address possible extensions leading to a unified complete/partial cell suppression framework.
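To illustrate what an auditing range is, here is a hedged sketch (our construction, not the paper's algorithm) that computes the interval an attacker could infer for one suppressed cell from the published totals, via two linear programs:

```python
import numpy as np
from scipy.optimize import linprog

def audit_range(A_eq, b_eq, lo, hi, j):
    """Feasibility interval [min x_j, max x_j] for suppressed cell j, given
    published linear relations A_eq x = b_eq (e.g. row/column totals) and
    a-priori bounds lo <= x <= hi on the unpublished cells."""
    c = np.zeros(A_eq.shape[1]); c[j] = 1.0
    bounds = list(zip(lo, hi))
    low = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    high = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    return low, high

# toy example: two suppressed cells with published total 10, each known
# a priori to lie in [0, 10]; the auditing range of cell 0 is [0, 10]
print(audit_range(np.array([[1.0, 1.0]]), np.array([10.0]),
                  [0, 0], [10, 10], 0))
```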

18.
19.
Multi-layer perceptrons (MLPs), a common type of artificial neural networks (ANNs), are widely used in computer science and engineering for object recognition, discrimination and classification, and have more recently found use in process monitoring and control. Training such networks is not a straightforward optimisation problem, and we examine features of these networks which contribute to the optimisation difficulty. Although the original perceptron, developed in the late 1950s (Rosenblatt 1958, Widrow and Hoff 1960), had a binary output from each node, this was not compatible with back-propagation and similar training methods for the MLP. Hence the output of each node (and the final network output) was made a differentiable function of the network inputs. We reformulate the MLP model with the original perceptron in mind so that each node in the hidden layers can be considered as a latent (that is, unobserved) Bernoulli random variable. This maintains the property of binary output from the nodes, and with an imposed logistic regression of the hidden layer nodes on the inputs, the expected output of our model is identical to the MLP output with a logistic sigmoid activation function (for the case of one hidden layer). We examine the usual MLP objective function, the sum of squares, and show its multi-modal form and the corresponding optimisation difficulty. We also construct the likelihood for the reformulated latent variable model and maximise it by standard finite mixture ML methods using an EM algorithm, which provides stable ML estimates from random starting positions without the need for regularisation or cross-validation. Over-fitting of the number of nodes does not affect this stability. This algorithm is closely related to the EM algorithm of Jordan and Jacobs (1994) for the Mixture of Experts model. We conclude with some general comments on the relation between the MLP and latent variable models.
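A hedged toy sketch of the latent-variable reading (our illustration, not the authors' reformulation or their EM algorithm): each hidden node is drawn as a Bernoulli variable with logistic success probability, and the Monte Carlo average of the resulting outputs can be set beside the deterministic logistic MLP output. Note that the authors define the expectation in their model so that it matches the MLP output exactly; this naive Monte Carlo average over binary hidden states need not, and is offered only to make the sampling mechanism concrete.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_output(x, W1, b1, w2, b2):
    """Deterministic one-hidden-layer MLP with logistic activations."""
    return sigmoid(w2 @ sigmoid(W1 @ x + b1) + b2)

def latent_output(x, W1, b1, w2, b2, n_mc=100000, rng=None):
    """Monte Carlo mean output when each hidden node is a latent Bernoulli
    variable with success probability sigmoid(W1 x + b1)."""
    rng = np.random.default_rng(rng)
    p = sigmoid(W1 @ x + b1)
    H = (rng.random((n_mc, p.size)) < p).astype(float)  # binary hidden states
    return sigmoid(H @ w2 + b2).mean()

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
w2, b2 = rng.normal(size=3), rng.normal()
x = rng.normal(size=2)
print(mlp_output(x, W1, b1, w2, b2), latent_output(x, W1, b1, w2, b2, rng=2))
```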

20.
Several estimators of squared prediction error have been suggested for use in model and bandwidth selection problems. Among these are cross-validation, generalized cross-validation and a number of related techniques based on the residual sum of squares. For many situations with squared error loss, e.g. nonparametric smoothing, these estimators have been shown to be asymptotically optimal in the sense that in large samples the estimator minimizing the selection criterion also minimizes squared error loss. However, cross-validation is known not to be asymptotically optimal for some 'easy' location problems. We consider selection criteria based on estimators of squared prediction risk for choosing between location estimators. We show that criteria based on adjusted residual sums of squares are not asymptotically optimal for choosing between asymptotically normal location estimators that converge at rate \(n^{1/2}\), but are when the rate of convergence is slower. We also show that leave-one-out cross-validation is not asymptotically optimal for choosing between \(\sqrt{n}\)-differentiable statistics, but leave-d-out cross-validation is optimal when \(d \rightarrow \infty\) at the appropriate rate.
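A hedged sketch of the setting (our toy example, not the paper's analysis): leave-one-out cross-validated squared prediction risk for two competing \(\sqrt{n}\)-rate location estimators, the sample mean and the sample median.

```python
import numpy as np

def loo_cv_risk(x, estimator):
    """Leave-one-out CV estimate of squared prediction risk for a location
    estimator: predict each held-out point by the estimate computed from
    the remaining observations."""
    errs = []
    for i in range(len(x)):
        rest = np.delete(x, i)
        errs.append((x[i] - estimator(rest)) ** 2)
    return np.mean(errs)

x = np.random.default_rng(0).standard_normal(200)
print(loo_cv_risk(x, np.mean), loo_cv_risk(x, np.median))
```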
