Similar Documents
20 similar documents found.
1.
Conventional bootstrap-t intervals for density functions based on kernel density estimators exhibit poor coverage because the bootstrap fails to estimate the bias correctly. The problem can be resolved either by estimating the bias explicitly or by undersmoothing the kernel density estimate so that its bias becomes asymptotically negligible. The resulting bias-corrected intervals have an optimal coverage error of order arbitrarily close to second order for a sufficiently smooth density function. We investigate the effects on coverage error of both bias-corrected intervals when the nominal coverage level is calibrated by the iterated bootstrap. In either case, an asymptotic reduction of coverage error is possible provided that the bias terms are handled using an extra round of smoothed bootstrapping. Under appropriate smoothness conditions, the optimal coverage error of the iterated bootstrap-t intervals has order arbitrarily close to third order. Examples with both simulated and real data illustrate the iterated bootstrap procedures.
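The undersmoothing idea can be sketched in a few lines. The Python fragment below is only a simplified percentile-interval illustration, not the paper's studentised and iterated bootstrap-t procedure: it builds a pointwise interval for f(0) using a bandwidth that shrinks at the faster n^(-1/3) rate, so the bias is asymptotically negligible next to the standard error. The constants and seed are illustrative.

```python
import numpy as np

def kde_at(x0, data, h):
    """Gaussian-kernel density estimate at a single point x0."""
    u = (x0 - data) / h
    return np.exp(-0.5 * u**2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(6)
data = rng.normal(size=400)
n = len(data)
x0 = 0.0

# Undersmoothing: shrink the bandwidth at n**(-1/3) instead of the
# MISE-optimal n**(-1/5) rate, so bias is negligible relative to spread.
h_us = 1.06 * data.std(ddof=1) * n ** (-1 / 3)

boot = np.array([kde_at(x0, rng.choice(data, size=n, replace=True), h_us)
                 for _ in range(999)])
lo, hi = np.percentile(boot, [2.5, 97.5])
```

A full bootstrap-t version would additionally studentise each resampled estimate and calibrate the nominal level with an inner bootstrap round.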

2.
Bandwidth selection was an active research topic in the 1980s and 1990s, but the area has seen little work recently. We re-opened this investigation and found a new method for estimating the mean integrated squared error (MISE) of kernel density estimators. We provide an overview of other methods for obtaining optimal bandwidths and compare these methods in a simulation study. In certain situations, our method of estimating an optimal bandwidth yields a smaller MISE than competing bandwidth selectors. The procedure is illustrated by application to two data sets.
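For context, the baseline any MISE-driven selector is compared against is a rule-of-thumb bandwidth. The sketch below is our illustration, not the paper's new estimator: it computes Silverman's Gaussian rule-of-thumb bandwidth and evaluates the resulting kernel estimate on a grid.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth for a Gaussian kernel: 0.9 min(sd, IQR/1.349) n^(-1/5)."""
    n = len(x)
    spread = min(np.std(x, ddof=1),
                 (np.percentile(x, 75) - np.percentile(x, 25)) / 1.349)
    return 0.9 * spread * n ** (-1 / 5)

def kde(x_grid, data, h):
    """Gaussian-kernel density estimate evaluated on a grid."""
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(size=500)
h = silverman_bandwidth(data)
grid = np.linspace(-4, 4, 201)
fhat = kde(grid, data, h)
```

An MISE-based selector would instead minimise an estimate of the integrated squared error over h rather than plugging in a normal-reference constant.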

3.
Many directional data, such as wind directions, can be collected extremely easily, so experiments typically yield a huge number of sequentially collected data points. To deal with such big data, traditional nonparametric techniques rapidly become too slow to compute and therefore useless in practice if real-time or online forecasts are expected. In this paper, we propose a recursive kernel density estimator for directional data which (i) can be updated extremely easily when a new set of observations is available and (ii) asymptotically keeps the nice features of the traditional kernel density estimator. Our methodology is based on Robbins–Monro stochastic approximation ideas. We show that our estimator outperforms the traditional techniques in terms of computational time while being extremely competitive in terms of efficiency with respect to its competitors in the sequential context considered here. We obtain expressions for its asymptotic bias and variance together with an almost sure convergence rate and an asymptotic normality result. Our technique is illustrated on a wind dataset collected in Spain. A Monte Carlo study confirms the nice properties of our recursive estimator with respect to its non-recursive counterpart.
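A linear-data analogue of such a recursive estimator illustrates the constant-cost update (the paper works on the circle; this Euclidean sketch is ours, and the bandwidth sequence h_n = c·n^(-1/5) with c = 1 is an illustrative choice):

```python
import numpy as np

def make_recursive_kde(grid, c=1.0):
    """Recursive (Wolverton–Wagner-type) kernel density estimator on a grid.

    Each observation updates the running estimate in O(len(grid)):
        f_n = (1 - 1/n) f_{n-1} + (1/n) K_{h_n}(grid - x_n),  h_n = c n^(-1/5),
    so past observations never need to be revisited.
    """
    state = {"n": 0, "f": np.zeros_like(grid, dtype=float)}

    def update(x_new):
        state["n"] += 1
        n = state["n"]
        h_n = c * n ** (-1 / 5)                      # shrinking bandwidth
        kernel = np.exp(-0.5 * ((grid - x_new) / h_n) ** 2) / (
            h_n * np.sqrt(2 * np.pi))
        state["f"] += (kernel - state["f"]) / n      # Robbins–Monro step
        return state["f"]

    return update

rng = np.random.default_rng(1)
grid = np.linspace(-4, 4, 161)
update = make_recursive_kde(grid, c=1.0)
for x in rng.normal(size=2000):
    f = update(x)
```

The directional version replaces the Gaussian kernel with a kernel on the sphere or circle, but the update has the same stochastic-approximation form.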

4.
This study develops a robust automatic algorithm for clustering probability density functions, building on previous research. Unlike other existing methods, which often pre-determine the number of clusters, this method can self-organize data groups based on the original data structure. The proposed clustering method is also robust to noise. Three examples of synthetic data and a real-world COREL dataset are used to illustrate the accuracy and effectiveness of the proposed approach.

5.
One of the main problems in geostatistics is fitting a valid variogram or covariogram model to describe the underlying dependence structure in the data. The dependence between observations can also be modeled in the spectral domain, but the traditional methods based on the periodogram as an estimator of the spectral density may present some problems in the spatial case. In this work, we propose an estimation method for the covariogram parameters based on the fast Fourier transform (FFT) of biased covariances. The performance of this estimator for finite samples is compared through a simulation study with other classical methods stated in the spatial domain, such as weighted least squares and maximum likelihood, as well as with other spectral estimators. Additionally, an example of application to real data is given.

6.
Kernel density estimation is an important tool in visualizing posterior densities from Markov chain Monte Carlo output. It is well known that when smooth transition densities exist, the asymptotic properties of the estimator agree with those for independent data. In this paper, we show that because of the rejection step of the Metropolis–Hastings algorithm, this is no longer true and the asymptotic variance will depend on the probability of accepting a proposed move. We find an expression for this variance and apply the result to algorithms for automatic bandwidth selection.
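The mechanism described here is easy to reproduce: rejected Metropolis–Hastings proposals duplicate the current state, and those duplicates feed straight into the kernel estimate. A minimal random-walk Metropolis sampler with a Gaussian-kernel estimate of its output might look like the following (the target, step size, and bandwidth rule are our illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(7)

def log_target(x):
    return -0.5 * x**2          # standard normal target, up to a constant

# Random-walk Metropolis: a rejected proposal repeats the current state,
# which is exactly what inflates the asymptotic variance of the KDE.
n_iter, step = 20000, 2.5
chain = np.empty(n_iter)
x, n_accept = 0.0, 0
for i in range(n_iter):
    prop = x + step * rng.standard_normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x, n_accept = prop, n_accept + 1
    chain[i] = x
accept_rate = n_accept / n_iter

# Kernel density estimate of the posterior from the MCMC output.
grid = np.linspace(-4, 4, 161)
h = 1.06 * chain.std() * n_iter ** (-1 / 5)
u = (grid[:, None] - chain[None, :]) / h
f_hat = np.exp(-0.5 * u**2).sum(axis=1) / (n_iter * h * np.sqrt(2 * np.pi))
```

The paper's point is that the usual i.i.d. variance formula for f_hat understates the truth by a factor depending on accept_rate.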

7.
Marron, J. S. and Udina, F. Statistics and Computing (1999) 9(2): 101–110.
A tool for user choice of the local bandwidth function for kernel density and nonparametric regression estimates is developed using KDE, a graphical object-oriented package for interactive kernel density estimation written in LISP-STAT. The bandwidth function is a parameterized spline, whose knots are manipulated by the user in one window, while the resulting estimate appears in another window. A real data illustration of this method raises concerns, because an extremely large family of estimates is available. Suggestions are made to overcome this problem so that this tool can be used effectively for presenting final results of a data analysis.

8.
Motivated by the need to develop meaningful empirical approximations to a 'typical' data value, we introduce methods for density and mode estimation when data are in the form of random curves. Our approach is based on finite dimensional approximations via generalized Fourier expansions on an empirically chosen basis. The mode estimation problem is reduced to a problem of kernel-type multivariate estimation from vector data and is solved using a new recursive algorithm for finding the empirical mode. The algorithm may be used as an aid to the identification of clusters in a set of data curves. Bootstrap methods are employed to select the bandwidth.

9.
We discuss the case of the multivariate linear model Y = XB + E with Y an (n × p) matrix, and so on, when there are missing observations in the Y matrix in a so-called nested pattern. We propose an analysis that arises by incorporating the predictive density of the missing observations in determining the posterior distribution of B, and its mean and variance matrix. This involves us with matric-T variables. The resulting analysis is illustrated with some Canadian economic data.

10.
In a model for rounded data, suppose that the i.i.d. random sample X1, …, Xn is transformed into an observed sample X1*, …, Xn*, where Xi* = 2νΔ if Xi ∈ (2νΔ − Δ, 2νΔ + Δ), for i = 1, …, n. We show that the precision Δ of the observations has an important effect on the shape of the kernel density estimator, and we identify important points for the graphical display of this estimator. We examine the IMSE criterion to find the optimal window under the rounded-data model.

11.
In this paper, we address the problem of simulating from a data-generating process for which the observed data do not follow a regular probability distribution. One existing method for doing this is bootstrapping, but it is incapable of interpolating between observed data. For univariate or bivariate data, in which a mixture structure can easily be identified, we could instead simulate from a Gaussian mixture model; in general, though, we would face the problem of identifying and estimating the mixture model. Instead, we introduce a non-parametric method for simulating such datasets: Kernel Carlo Simulation. Our algorithm begins by using kernel density estimation to build a target probability distribution. An envelope function that is guaranteed to lie above the target distribution is then created, and simple accept–reject sampling is used. Our approach is more flexible than others, can simulate intelligently across gaps in the data, and requires no subjective modelling decisions. With several univariate and multivariate examples, we show that our method returns simulated datasets that retain the covariance structure of the observed data and have remarkably similar distributional characteristics.
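The accept–reject step can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' algorithm: we use a flat constant envelope over the data range rather than the paper's constructed envelope function, and a hand-picked bandwidth.

```python
import numpy as np

rng = np.random.default_rng(2)
# Bimodal observed data with a gap between the modes.
data = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 0.5, 150)])
h = 0.4                                  # bandwidth, chosen by hand here

def kde_pdf(x):
    """Gaussian-kernel density of the observed data at points x."""
    u = (np.asarray(x)[..., None] - data) / h
    return np.exp(-0.5 * u**2).sum(axis=-1) / (len(data) * h * np.sqrt(2 * np.pi))

lo, hi = data.min() - 3 * h, data.max() + 3 * h
grid = np.linspace(lo, hi, 400)
env = 1.1 * kde_pdf(grid).max()          # constant envelope above the KDE

def kernel_carlo(n_sim):
    """Accept–reject sampling with the KDE as target density."""
    out = []
    while len(out) < n_sim:
        x = rng.uniform(lo, hi)          # uniform proposal under the envelope
        if rng.uniform(0, env) < kde_pdf(x):
            out.append(x)
    return np.array(out)

sim = kernel_carlo(500)
```

Because the target is the smoothed KDE rather than the empirical distribution, draws can land between observed points, which a bootstrap cannot do.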

12.
Constructing spatial density maps of seismic events, such as earthquake hypocentres, is complicated by the fact that events are not located precisely. In this paper, we present a method for estimating density maps from event locations that are measured with error. The estimator is based on the simulation–extrapolation (SIMEX) method of estimation and is appropriate for location errors that are either homoscedastic or heteroscedastic. A simulation study shows that the estimator outperforms the standard density estimator that ignores location errors, even when location errors are spatially dependent. We apply our method to construct an estimated density map of earthquake hypocentres using data from the Alaska earthquake catalogue.
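The simulation–extrapolation recipe itself is generic: add extra measurement error at several levels λ, track how the estimate degrades, and extrapolate back to λ = −1 (no error). A one-dimensional toy version, our sketch with an assumed known error standard deviation rather than the authors' spatial estimator, is:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_u = 0.4                                   # known measurement-error s.d. (assumed)
x_true = rng.normal(size=2000)
w = x_true + rng.normal(scale=sigma_u, size=2000)   # observed, error-prone

def kde_at(points, data, h):
    """Gaussian-kernel density estimate on a grid of points."""
    u = (points[:, None] - data) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

grid = np.linspace(-3, 3, 61)
h, B = 0.3, 50
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = np.empty((len(lambdas), len(grid)))
for i, lam in enumerate(lambdas):
    # Average over B remeasured datasets with extra noise of variance lam*sigma_u^2.
    est[i] = np.mean(
        [kde_at(grid, w + rng.normal(scale=np.sqrt(lam) * sigma_u, size=len(w)), h)
         for _ in range(B)], axis=0)

# Fit a quadratic in lambda at each grid point; extrapolate to lambda = -1.
coef = np.polyfit(lambdas, est, deg=2)          # shape (3, len(grid)), highest degree first
f_simex = coef[0] - coef[1] + coef[2]           # quadratic evaluated at lambda = -1
```

The extrapolated estimate partially undoes the blurring that the measurement error adds on top of the kernel smoothing; the spatial version applies the same idea in two dimensions.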

13.
René Michel. Statistics (2013) 47(2): 187–202.
We investigate a method to estimate the angular density non-parametrically in bivariate generalized Pareto models. The angular density can be used as a visual tool to gain a first insight into the tail-dependence structure of given data. We derive a representation of the angular density by means of the Pickands density and use it to construct our estimator. The estimator is asymptotically normal under certain regularity conditions. We also test it with simulated data and give an application to a real hydrological data set. Finally, we show that our estimator cannot be transferred directly to higher dimensions.

14.
The most common techniques for graphically presenting a multivariate dataset involve projection onto a one- or two-dimensional subspace. Interpretation of such plots is not always straightforward because projections are smoothing operations: structure can be obscured by projection but never enhanced. In this paper an alternative procedure for finding interesting features is proposed, based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed. Emphasis is placed on identifying non-linear effects such as clustering, so to this end the data are first sphered to remove all location, scale and correlational structure. A set of simulated bivariate data and a dataset on the artistic qualities of painters are used as examples.

15.
Dot Plots     
Dot plots represent individual observations in a batch of data with symbols, usually circular dots. They have been used for more than 100 years to depict distributions in detail. Hand-drawn examples show their authors' efforts to arrange symbols so that they are as near as possible to their proper locations on a scale without overlapping enough to obscure each other. Recent computer programs that attempt to reproduce these historical plots have unfortunately resorted to simple histogram binning instead of using methods that follow the rules for the hand-drawn examples. This article introduces an algorithm that more accurately represents the dot plots cited in the literature.
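A much-simplified stacking rule, a sketch in the spirit of Wilkinson's dot plot algorithm rather than the article's own, sweeps the sorted values and starts a new stack whenever the next value lies more than one symbol diameter from the current stack's anchor:

```python
def dotplot_stacks(values, diameter):
    """Assign each value a stack anchor (x) and a stack height (y).

    Values within one symbol diameter of the current anchor are stacked
    vertically; a value farther away starts a new stack at its own position.
    """
    xs, ys = [], []
    anchor, height = None, 0
    for v in sorted(values):
        if anchor is None or v - anchor > diameter:
            anchor, height = v, 0          # start a new stack here
        else:
            height += 1                    # stack on top of the current anchor
        xs.append(anchor)
        ys.append(height)
    return xs, ys

xs, ys = dotplot_stacks([1.0, 1.05, 1.1, 2.0, 2.02, 3.0], diameter=0.2)
```

Histogram binning differs in that bin edges are fixed in advance, which is exactly the shortcut the article criticises; here each stack is anchored at an observed value.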

16.
CORRECTING FOR KURTOSIS IN DENSITY ESTIMATION
Using a global window-width kernel estimator to estimate an approximately symmetric probability density with high kurtosis usually leads to poor estimation: good estimation of the peak of the distribution leads to unsatisfactory estimation of the tails, and vice versa. The proposed technique corrects for kurtosis via a transformation of the data before a global window-width kernel estimator is applied. The transformation depends on a "generalised smoothing parameter" consisting of two real-valued parameters and a window-width parameter, which can be selected either by a simple graphical method or, for a completely data-driven implementation, by minimising an estimate of the mean integrated squared error. Examples of real and simulated data demonstrate the effectiveness of this approach, which appears suitable for a wide range of symmetric, unimodal densities. Its performance is similar to ordinary kernel estimation in situations where the latter is effective, e.g. Gaussian densities. For densities like the Cauchy, where ordinary kernel estimation is not satisfactory, our methodology offers a substantial improvement.
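The transformation idea can be illustrated with a fixed kurtosis-reducing map. The sketch below substitutes asinh for the paper's parameterised transformation family (an assumption for illustration only) and maps the estimate back with the Jacobian:

```python
import numpy as np

def transformation_kde(x_grid, data, h):
    """Transformation kernel density estimate for heavy-tailed data.

    Uses t = asinh(x) as a fixed, kurtosis-reducing transformation (a stand-in
    for the paper's parameterised family), estimates the density of t with an
    ordinary global-bandwidth Gaussian kernel, then maps back:
        f_X(x) = f_T(asinh(x)) / sqrt(1 + x^2)
    """
    t_data = np.arcsinh(data)
    t_grid = np.arcsinh(x_grid)
    u = (t_grid[:, None] - t_data) / h
    f_t = np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
    return f_t / np.sqrt(1 + x_grid**2)      # Jacobian of asinh

rng = np.random.default_rng(4)
data = rng.standard_cauchy(size=2000)        # heavy tails defeat ordinary KDE
grid = np.linspace(-10, 10, 401)
f_hat = transformation_kde(grid, data, h=0.25)
```

On the transformed scale the data are light-tailed, so a single global bandwidth serves both the peak and the tails; back-transforming restores the original scale.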

17.
Circular data are observations that are represented as points on a unit circle. Times of day and directions of wind are two such examples. In this work, we present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is useful especially when the likelihood surface is ill behaved. Markov chain Monte Carlo techniques are used to fit the proposed model and to generate predictions. The method is illustrated using an environmental data set.

18.
Sample-entropy-based tests, methods of sieves and Grenander-type estimation procedures are known to be very efficient tools for assessing the normality of underlying data distributions in one-dimensional nonparametric settings. Recently, it has been shown that the density-based empirical likelihood (EL) concept extends and standardizes these methods, presenting a powerful approach for approximating optimal parametric likelihood ratio test statistics in a distribution-free manner. In this paper, we discuss the difficulties of constructing density-based EL ratio techniques for testing bivariate normality and propose a solution. Toward this end, a novel bivariate sample entropy expression is derived and shown to satisfy the known concept related to bivariate histogram density estimation. Monte Carlo results show that the new density-based EL ratio tests for bivariate normality behave very well for finite sample sizes. To demonstrate the applicability of the proposed approach, we analyse a real data example.

19.
Local likelihood has been developed mainly from an asymptotic point of view, with little attention to finite-sample issues. The present paper provides simulation evidence of how local likelihood density estimation performs in practice, from two points of view. First, we explore the impact of the normalization step of the final estimate; second, we show the effectiveness of higher-order fits in identifying modes present in the population when only small sample sizes are available. We refer to circular data; nevertheless, it is easily seen that our findings extend straightforwardly to the Euclidean setting, where they appear to be somewhat new.

20.
Continuous data are often measured or used in binned or rounded form. In this paper we follow up on Hall's work analyzing the effect of using equally-spaced binned data in a kernel density estimator. It is shown that a surprisingly large amount of binning does not adversely affect the integrated mean squared error of a kernel estimate.
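Binned kernel estimation also buys a large computational saving: the kernel sum runs over bin counts instead of raw observations, costing O(n_bins) rather than O(n) per evaluation point. A minimal sketch (equally spaced bins and a Gaussian kernel; the bin count and bandwidth are illustrative):

```python
import numpy as np

def binned_kde(data, grid_lo, grid_hi, n_bins, h):
    """Kernel density estimate computed from binned data.

    Observations are first rounded to equally spaced bin centres; the KDE is
    then a kernel-weighted sum over bin counts instead of raw points.
    """
    edges = np.linspace(grid_lo, grid_hi, n_bins + 1)
    counts, _ = np.histogram(data, bins=edges)
    centres = 0.5 * (edges[:-1] + edges[1:])
    eval_grid = np.linspace(grid_lo, grid_hi, 201)
    u = (eval_grid[:, None] - centres[None, :]) / h
    weights = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))
    return eval_grid, weights @ counts / len(data)

rng = np.random.default_rng(5)
data = rng.normal(size=5000)
grid, f_binned = binned_kde(data, -4, 4, 50, h=0.3)
```

With 50 bins across eight standard-deviation units the binning is coarse relative to the bandwidth, yet, as the paper's result predicts, the estimate is barely distinguishable from the unbinned one.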
