Similar Documents
20 similar documents were retrieved (search time: 15 ms).
1.
The idea of searching for orthogonal projections from a multidimensional space onto a linear subspace, as an aid to detecting nonlinear structure, has been named exploratory projection pursuit. Most approaches are tied to the idea of searching for interesting projections; typically, an interesting projection is one where the distribution of the projected data differs from the normal distribution. In this paper we define two projection indices aimed specifically at finding projections that best reveal grouped structure in the plane, if such structure exists in the multidimensional space. These involve a numerical optimization problem that is tackled in two stages, the projection and the pursuit: the first is based on a procedure for generating pseudo-random rotation matrices in the sense of the grand tour of D. Asimov (1985), and the second is a local numerical optimization procedure. One artificial and one real example illustrate the performance of the suggested indices.
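
A minimal sketch of the projection stage may help. The snippet below draws a pseudo-random rotation matrix via QR decomposition — a standard construction for random orthogonal matrices, not Asimov's grand tour path itself — and uses it to project hypothetical data onto a plane, which would then seed the local pursuit stage.

```python
# Sketch: random rotation + planar projection (illustrative, not the paper's code).
import numpy as np

def random_rotation(p, rng):
    """Sample a p x p rotation matrix (Haar-uniform) via QR decomposition."""
    A = rng.standard_normal((p, p))
    Q, R = np.linalg.qr(A)
    Q *= np.sign(np.diag(R))          # fix column signs so Q is unique
    if np.linalg.det(Q) < 0:          # force det = +1 (a proper rotation)
        Q[:, 0] = -Q[:, 0]
    return Q

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))     # hypothetical 5-dimensional data
Q = random_rotation(5, rng)
plane = X @ Q[:, :2]                  # 2-D projection used as a starting
print(plane.shape)                    # point for the local "pursuit" stage
```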

2.
In this paper, a new method for robust principal component analysis (PCA) is proposed. PCA is a widely used tool for dimension reduction without substantial loss of information; however, classical PCA is vulnerable to outliers because it depends on the empirical covariance matrix. To avoid this weakness, several alternative approaches based on robust scatter matrices have been suggested. A popular choice is ROBPCA, which combines projection pursuit ideas with robust covariance estimation via a variance maximization criterion. Our approach is based on the fact that PCA can be formulated as a regression-type optimization problem, which is the main difference from previous approaches. The proposed robust PCA is derived by replacing the squared loss with a robust penalty, the Huber loss function. A practical algorithm is proposed to carry out the optimization, and the convergence properties of the algorithm are investigated. Results from a simulation study and a real data example demonstrate the promising empirical properties of the proposed method.
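
Not the authors' exact algorithm, but an illustrative sketch in the same spirit: the first principal component computed by iteratively reweighted least squares, where each point's weight comes from a Huber-type function applied to the norm of its reconstruction residual. The cutoff c and the MAD-style scale estimate are conventional choices, assumed here for illustration.

```python
# Sketch: Huber-weighted first principal component via IRLS (illustrative).
import numpy as np

def huber_pc1(X, c=1.345, n_iter=50):
    X = X - np.median(X, axis=0)              # robust centering
    w = np.ones(len(X))
    for _ in range(n_iter):
        C = (X * w[:, None]).T @ X / w.sum()  # weighted covariance
        vals, vecs = np.linalg.eigh(C)
        v = vecs[:, -1]                       # leading eigenvector
        resid = X - np.outer(X @ v, v)        # reconstruction residuals
        r = np.linalg.norm(resid, axis=1)
        scale = np.median(r) / 0.6745 + 1e-12 # robust residual scale
        w = np.minimum(1.0, c * scale / np.maximum(r, 1e-12))  # Huber weights
    return v

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3)) @ np.diag([3.0, 1.0, 0.5])
X[:5] += 20.0                                 # a few gross outliers
print(huber_pc1(X))                           # direction barely moved by them
```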

3.
Projection techniques for nonlinear principal component analysis
Principal Components Analysis (PCA) is traditionally a linear technique for projecting multidimensional data onto lower-dimensional subspaces with minimal loss of variance. However, in several applications the data lie in a lower-dimensional subspace that is not linear; in these cases linear PCA is not the optimal method for recovering this subspace and thus accounting for the largest proportion of variance in the data. Nonlinear PCA addresses this problem by relaxing the linearity restrictions of standard PCA. We investigate linear and nonlinear approaches to PCA, both separately and in combination; in particular, we introduce a combination of projection pursuit and nonlinear regression for nonlinear PCA. We compare the success of PCA techniques in variance recovery by applying linear, nonlinear and hybrid methods to simulated and real data sets. We show that the best linear projection that captures the structure in the data (in the sense that the original data can be reconstructed from the projection) is not necessarily a (linear) principal component. We also show that the ability of certain nonlinear projections to capture data structure is affected by the choice of constraint in the eigendecomposition of a nonlinear transform of the data. Similar success in recovering data structure was observed for both linear and nonlinear projections.
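
A toy sketch of the hybrid idea (not the paper's method): project onto the leading linear component, then reconstruct each coordinate as a nonlinear (here, quadratic) function of that score. On curved data this recovers more structure than the purely linear rank-1 reconstruction. The data-generating curve is hypothetical.

```python
# Sketch: linear rank-1 PCA reconstruction vs. a hybrid nonlinear one.
import numpy as np

rng = np.random.default_rng(13)
t = rng.uniform(-3, 3, 300)
X = np.column_stack([t, 0.3 * t ** 2]) + 0.05 * rng.standard_normal((300, 2))

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
score = Xc @ Vt[0]                                 # 1-D score on the leading PC
linear = X.mean(axis=0) + np.outer(score, Vt[0])   # rank-1 linear reconstruction
hybrid = np.column_stack([np.polyval(np.polyfit(score, X[:, j], 2), score)
                          for j in range(2)])      # nonlinear map: score -> data

for name, R in (("linear PCA", linear), ("hybrid", hybrid)):
    print(name, "MSE:", np.mean((X - R) ** 2).round(4))
```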

4.
Concerning the task of integrating census and survey data from different sources, as carried out by supranational statistical agencies, a formal metadata approach is investigated that supports data integration and table processing simultaneously. To this end, a metadata model is devised so that statistical query processing is accomplished by means of symbolic reasoning on machine-readable, operative metadata. As in databases, statistical queries are stated as formal expressions that specify declaratively what the intended output is; the operations necessary to retrieve appropriate available source data and to aggregate them into the requested macrodata are derived mechanically. Using simple mathematics, this paper focuses particularly on the metadata model devised to harmonize semantically related data sources, as well as on the table model providing the principal data structure of the proposed system. Only an outline of the general design of a statistical information system based on the proposed metadata model is given, and the state of development is summarized briefly.
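
A toy illustration of the declarative idea (the schema and data are entirely hypothetical, and this does not reproduce the paper's metadata model): the query states which macrodata are wanted, and the aggregation over harmonized microdata is carried out mechanically from that specification.

```python
# Sketch: a declarative table query executed mechanically over microdata.
import pandas as pd

micro = pd.DataFrame({
    "country": ["AT", "AT", "DE", "DE", "DE"],
    "sex":     ["f", "m", "f", "f", "m"],
    "income":  [30.0, 34.0, 31.0, 29.0, 36.0],
})
query = {"rows": "country", "cols": "sex",
         "measure": "income", "op": "mean"}        # declarative specification
table = micro.pivot_table(index=query["rows"], columns=query["cols"],
                          values=query["measure"], aggfunc=query["op"])
print(table)                                        # the requested macrodata
```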

5.
Recent evidence indicates that regressions on multiple forward rates sharply predict future excess returns on U.S. Treasury bonds, with R² values around 30%. The projection coefficients in these regressions exhibit a distinct pattern related to the maturity of the forward rate. These dimensions of the data, in conjunction with the transition dynamics of bond yields, pose a serious challenge to term structure models. In this article we show that a regime-shifting term structure model can empirically account for these challenging features of the data; alternative models, such as affine specifications, fail to do so. We find that regimes in the model are intimately related to bond risk premia and real business cycles.
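
A schematic of the predictive regressions described above: one-period excess returns regressed on several forward rates. The data, coefficients and noise level below are simulated placeholders; with real yield data the abstract reports R² values of roughly 30%.

```python
# Sketch: excess bond returns regressed on multiple forward rates.
import numpy as np

rng = np.random.default_rng(2)
T = 300
forwards = rng.standard_normal((T, 5))        # f^(1)..f^(5), hypothetical
beta = np.array([-1.0, 0.5, 1.5, 0.4, -1.2])  # maturity pattern (illustrative)
xret = forwards @ beta + rng.standard_normal(T) * 2.0

Z = np.column_stack([np.ones(T), forwards])   # add an intercept
coef, *_ = np.linalg.lstsq(Z, xret, rcond=None)
fitted = Z @ coef
r2 = 1 - np.sum((xret - fitted) ** 2) / np.sum((xret - xret.mean()) ** 2)
print("projection coefficients:", np.round(coef[1:], 2))
print("R^2:", round(r2, 2))
```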

6.
This article considers an approach to estimating and testing a new Kronecker product covariance structure for three-level multivariate data (multiple time points (p), multiple sites (u) and multiple response variables (q)). Testing such a covariance structure is potentially important for high-dimensional multilevel multivariate data. The hypothesis testing procedure developed in this article can not only test hypotheses for three-level multivariate data, but also, as special cases, many different hypotheses for two-level multivariate data, such as blocked compound symmetry. The tests are illustrated on two real data sets.
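
A minimal sketch of the covariance structure under test: for three-level data the (p·u·q) × (p·u·q) covariance is modelled as a Kronecker product of three component matrices. The dimensions and the AR(1) component blocks below are hypothetical.

```python
# Sketch: simulating data with a three-factor Kronecker covariance.
import numpy as np

p, u, q = 3, 2, 2
def ar1(n, rho):                       # simple AR(1) correlation block
    i = np.arange(n)
    return rho ** np.abs(i[:, None] - i[None, :])

U, V, W = ar1(p, 0.6), ar1(u, 0.3), ar1(q, 0.5)
Sigma = np.kron(U, np.kron(V, W))      # the Kronecker product structure
rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(p * u * q), Sigma, size=100)
print(Sigma.shape, X.shape)            # (12, 12) (100, 12)
```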

7.
Six years of rainfall-event pH measurements from the nine-station MAP3S/PCN monitoring network in the eastern United States were analyzed. The initial objective was an attempted validation of the model developed by Eynon and Switzer (1983, Canad. J. Statist. 11, 11–24) on this independent data set. Because some features of the structure presumed in that model are not evident in this data set, the underlying structure of the data was then explored in some detail. Both aspects of the investigation confirmed that identifying an appropriate statistical model for such data is a difficult undertaking: anticipated structure may not be evident, and the data for specific stations or years may exhibit anomalous behavior.

8.
Recurrent events involve repeated occurrences of the same type of event over time and are commonly encountered in longitudinal studies; examples include seizures in epilepsy studies or the occurrence of cancer tumors. In such studies, interest lies in the number of events that occur over a fixed period of time. One considerable challenge in analyzing such data arises when a large proportion of patients discontinues before the end of the study, for example because of adverse events, leading to partially observed data. In this situation, data are often modeled using a negative binomial distribution with time in study as offset. Such an analysis assumes that data are missing at random (MAR). As the adequacy of MAR cannot be tested, sensitivity analyses that assess the robustness of conclusions across a range of different assumptions need to be performed. Sophisticated sensitivity analyses are frequently performed for continuous data, but less often for recurrent event or count data. We present a flexible approach to performing clinically interpretable sensitivity analyses for recurrent event data. Our approach fits into the framework of reference-based imputation, where information from reference arms can be borrowed to impute post-discontinuation data, and different assumptions can be made about the future behavior of dropouts depending on the reason for dropout and the treatment received. The imputation model is based on a flexible model that allows for time-varying baseline intensities. We assess the performance in a simulation study and provide an illustration with a clinical trial in patients who suffer from bladder cancer.
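
A sketch of the primary analysis described above: event counts modelled with a negative binomial GLM using log time-in-study as an offset. The data are simulated placeholders, and a fixed dispersion alpha is assumed here, whereas in practice it would be estimated; the reference-based imputation itself is not reproduced.

```python
# Sketch: negative binomial model for event counts with a time offset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
treat = rng.integers(0, 2, n)                  # 0 = reference, 1 = treated
time = rng.uniform(0.2, 1.0, n)                # years on study (early dropout)
mu = time * np.exp(0.5 - 0.7 * treat)          # lower event rate under treatment
y = rng.poisson(mu * rng.gamma(2.0, 0.5, n))   # overdispersed counts

X = sm.add_constant(treat.astype(float))
model = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5),
               offset=np.log(time))            # log time-in-study as offset
res = model.fit()
print(res.params)                              # intercept, treatment effect
```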

9.
A supersaturated design (SSD) is a design whose run size is not large enough to estimate all the main effects. The goal in conducting such a design is to identify, presumably only a few, relatively dominant active effects at as low a cost as possible. However, data analysis for such designs remains underdeveloped: traditional approaches are not appropriate in this situation, and several methods proposed in the literature in recent years are effective only for analyzing two-level SSDs. In this paper, we introduce a variable selection procedure, called PLSVS, to screen active effects in mixed-level SSDs based on the variable importance in projection, an important concept in partial least-squares regression. Simulation studies show that this procedure is effective.
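
A sketch of the variable-importance-in-projection (VIP) scores on which the screening is based, using the standard VIP formula and scikit-learn's PLS implementation; the PLSVS screening rule itself is not reproduced, and the data are simulated with more factors than runs, as in an SSD.

```python
# Sketch: VIP scores from a fitted PLS regression (standard formula).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls, X):
    t = pls.x_scores_                      # (n, A) latent scores
    w = pls.x_weights_                     # (p, A) weight vectors
    q = pls.y_loadings_                    # (1, A) for a single response
    ss = (q[0] ** 2) * np.sum(t ** 2, axis=0)       # variance explained per comp.
    wnorm2 = (w / np.linalg.norm(w, axis=0)) ** 2   # normalised squared weights
    p = X.shape[1]
    return np.sqrt(p * (wnorm2 @ ss) / ss.sum())

rng = np.random.default_rng(5)
X = rng.standard_normal((14, 10))          # 14 runs, 10 factors
y = 2 * X[:, 0] - 3 * X[:, 4] + 0.5 * rng.standard_normal(14)
pls = PLSRegression(n_components=2).fit(X, y)
print(np.round(vip_scores(pls, X), 2))     # factors 1 and 5 should stand out
```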

10.
Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model for functional response curves with a set of functional covariates. Their method addresses two main problems: modelling a nonlinear and nonparametric regression relationship, and modelling the covariance structure and mean structure simultaneously. It gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data with 'spatially' indexed data, i.e., where the heterogeneity depends on factors such as region and individual patient information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follow a Gaussian process functional regression model as a lower-level model, and we introduce an allocation model for the latent indicator variables as a higher-level model that depends on the information related to each batch. This method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model is also used for curve clustering, focusing on clustering the functional relationships between response curve and covariates, i.e., clustering based on the shape of the functional response against the set of functional covariates. The model is examined on simulated and real data.
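
A sketch of the lower-level model only: a Gaussian process regression fitted to one batch's curve, here with scikit-learn and a generic RBF-plus-noise kernel. The higher-level allocation model over latent indicators described in the paper is not reproduced, and the curve is a synthetic placeholder.

```python
# Sketch: a per-batch Gaussian process curve fit (lower-level model only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(14)
t = np.sort(rng.uniform(0, 10, 40))[:, None]      # functional input
y = np.sin(t).ravel() + 0.1 * rng.standard_normal(40)

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01),
                              normalize_y=True).fit(t, y)
grid = np.linspace(0, 10, 200)[:, None]
mean, sd = gp.predict(grid, return_std=True)      # curve fit with uncertainty
print(mean[:3].round(2), sd[:3].round(2))
```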

11.
Many kinds of directional data, such as wind directions, can be collected extremely easily, so experiments typically yield a huge number of sequentially collected data points. With such big data, traditional nonparametric techniques rapidly become very time-consuming to compute and are therefore useless in practice when real-time or online forecasts are expected. In this paper, we propose a recursive kernel density estimator for directional data that (i) can be updated extremely easily when a new set of observations is available and (ii) asymptotically keeps the nice features of the traditional kernel density estimator. Our methodology is based on Robbins–Monro stochastic approximation ideas. We show that our estimator outperforms the traditional techniques in terms of computational time while remaining extremely competitive in terms of efficiency with respect to its competitors in the sequential context considered here. We obtain expressions for its asymptotic bias and variance, together with an almost sure convergence rate and an asymptotic normality result. Our technique is illustrated on a wind dataset collected in Spain. A Monte Carlo study confirms the nice properties of our recursive estimator with respect to its non-recursive counterpart.
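
A minimal sketch of the recursive idea on the circle: the density estimate on a fixed grid is updated with a Robbins–Monro step each time a new direction arrives, so no pass over past data is needed. The von Mises kernel and the step-size and concentration schedules are illustrative choices, not the paper's exact specification.

```python
# Sketch: recursive circular kernel density estimate, updated online.
import numpy as np
from scipy.special import i0               # modified Bessel function I_0

grid = np.linspace(0, 2 * np.pi, 256, endpoint=False)
f = np.zeros_like(grid)                    # running density estimate

def vm_kernel(grid, x, kappa):
    """von Mises density on the grid, centred at the new observation x."""
    return np.exp(kappa * np.cos(grid - x)) / (2 * np.pi * i0(kappa))

rng = np.random.default_rng(6)
for n in range(1, 5001):                   # stream of wind-like directions
    x = rng.vonmises(np.pi / 3, 4.0)
    gamma = 1.0 / n                        # Robbins-Monro step size
    kappa = 2.0 * n ** 0.4                 # slowly increasing concentration
    f = (1 - gamma) * f + gamma * vm_kernel(grid, x, kappa)

print(grid[np.argmax(f)])                  # mode near pi/3 ~ 1.047
```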

12.
Among the diverse frameworks that have been proposed for regression analysis of angular data, the projected multivariate linear model provides a particularly appealing and tractable methodology. In this model, the observed directional responses are assumed to correspond to the angles formed by latent bivariate normal random vectors that depend on covariates through a linear model. This implies an angular normal distribution for the observed angles and incorporates a regression structure through a familiar and convenient relationship. In this paper we extend this methodology to accommodate clustered data (e.g., longitudinal or repeated measures data) by formulating a marginal version of the model and basing estimation on an EM-like algorithm in which correlation among within-cluster responses is taken into account by incorporating a working correlation matrix into the M step. A sandwich estimator is used for the covariance matrix of the parameter estimates. The methodology is motivated and illustrated using an example involving clustered measurements of microfibril angle on loblolly pine (Pinus taeda L.). Simulation studies evaluate the finite-sample properties of the proposed fitting method. In addition, the relationship between within-cluster correlation on the latent Euclidean vectors and the corresponding correlation structure for the observed angles is explored.
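
A sketch of the data-generating mechanism behind the projected linear model: a latent bivariate normal vector whose mean is linear in the covariates, with the observed response being the angle that vector forms. The coefficients below are hypothetical, and the EM-type fitting with a working correlation matrix is not reproduced here.

```python
# Sketch: simulating angles from a projected (bivariate normal) linear model.
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 1, n)                     # a single covariate
B = np.array([[1.0, 2.0],                    # latent mean: (1 + 2x, -0.5 + 1.5x)
              [-0.5, 1.5]])
mean = B @ np.vstack([np.ones(n), x])        # shape (2, n)
latent = mean + rng.standard_normal((2, n))  # unit-variance latent errors
theta = np.arctan2(latent[1], latent[0])     # observed angles in (-pi, pi]
print(theta[:5])
```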

13.
In this article, we study methods for two-sample hypothesis testing of high-dimensional data from a multivariate binary distribution. We examine the random projection method and apply an Edgeworth expansion to improve it. Additionally, we propose new statistics that are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment, and we apply them to the 20 Newsgroups data, showing that our proposed tests have considerably higher power than the others for differentiating groups of news articles with different topics.
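
A basic variant of the random projection test discussed above: both high-dimensional binary samples are projected onto a random direction, and a two-sample t-test is applied to the projections. The Edgeworth correction and the sparse-data statistics are not reproduced here; dimensions and success probabilities are illustrative.

```python
# Sketch: a one-direction random projection two-sample test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
d, n, m = 500, 80, 80
X = rng.binomial(1, 0.10, size=(n, d))       # group 1
Y = rng.binomial(1, 0.12, size=(m, d))       # group 2, slightly shifted

u = rng.standard_normal(d)
u /= np.linalg.norm(u)                       # random unit direction
t, pval = stats.ttest_ind(X @ u, Y @ u, equal_var=False)
print(f"t = {t:.2f}, p = {pval:.3f}")
```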

14.
Conditional information measures the information in a sample about a parameter of interest in the presence of a nuisance parameter. In the context of Gaussian likelihoods, this paper first derives conditions under which a projection of the data may reduce the conditional information to zero. These conditions are then applied to time series regressions and to inference on a covariance parameter, such as with autoregressive or moving average errors. It is shown that regressing out very common regressors, such as a linear trend or a dummy variable, can imply that the conditional information is zero in the case of non-stationary autoregressions or non-invertible moving averages, respectively.

15.
Data on the timing of events such as births, residential moves and changes in employment status are collected in many longitudinal surveys. These data often have a highly complex structure, with events of several types occurring repeatedly over time for an individual and with interdependencies between different event processes (e.g. births and employment transitions). The aim of this paper is to review a general class of multilevel discrete-time event history models for handling recurrent events and transitions between multiple states. It is also shown how standard methods can be extended to allow for time-varying covariates that are outcomes of an event process jointly determined with the process of interest. The considerable potential of these methods for studying transitions through the life course is illustrated in analyses of the effect of the presence and age of children on women's employment transitions, using data from the British Household Panel Survey.
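
A minimal sketch of the discrete-time setup: each spell is expanded into one record per time interval with a binary event indicator, and the hazard is then fitted by logistic regression. The multilevel random effects and the joint modelling of interdependent processes described in the paper are beyond this illustration; the hazard coefficients below are hypothetical.

```python
# Sketch: person-period expansion and a discrete-time logistic hazard model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
rows = []
for i in range(300):                        # individuals
    child = rng.integers(0, 2)              # time-constant covariate
    for t in range(1, 11):                  # discrete time intervals
        h = 1 / (1 + np.exp(-(-2.0 + 0.1 * t - 0.8 * child)))
        event = rng.random() < h
        rows.append((t, child, int(event)))
        if event:
            break                           # spell ends at the event

data = np.array(rows, dtype=float)
X = sm.add_constant(data[:, :2])            # intercept, duration, covariate
res = sm.Logit(data[:, 2], X).fit(disp=0)
print(res.params)                           # log-odds effects on the hazard
```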

16.
17.
It is of essential importance that researchers have access to linked employer–employee data, but such data sets are rarely available to researchers or the public. Even when survey data have been made available, the evaluation of estimation methods is usually done via complex design-based simulation studies. For this purpose, population-level data are needed so that the true parameters are known and can be compared with estimates derived from complex samples, which are drawn from the population under various sampling designs, missing-value and outlier scenarios. The structural earnings statistics sample survey provides accurate and harmonized data on the level and structure of remuneration of employees, their individual characteristics, and the enterprise or place of employment to which they belong, in EU member states and candidate countries. On the basis of this data set, we show how to simulate a synthetic close-to-reality population representing the employer and employee structure of Austria. The proposed simulation builds on the work of A. Alfons, S. Kraft, M. Templ, and P. Filzmoser [On the simulation of complex universes in the case of applying the German microcensus, DACSEIS research paper series No. 4, University of Tübingen, 2003] and R. Münnich and J. Schürle [Simulation of close-to-reality population data for household surveys with application to EU-SILC, Statistical Methods & Applications 20(3) (2011), pp. 383–407]. However, new challenges arise in accommodating the special structure of employer–employee data and the complexity induced by the underlying two-stage design of the survey. Using quality measures in the form of simple summary statistics, benchmarking indicators and visualizations, the simulated population is analysed and evaluated. An accompanying literature study was carried out to select the most important benchmarking indicators.

18.
Multivariate Poisson regression with covariance structure
In recent years, applications of multivariate Poisson models have increased, mainly because of the gradual improvement in computer performance. The multivariate Poisson model used in practice is based on a common covariance term for all pairs of variables; this is rather restrictive and does not allow the covariance structure of the data to be modelled flexibly. In this paper we propose inference for a multivariate Poisson model with a richer structure, i.e. a different covariance term for each pair of variables. Both maximum likelihood and Bayesian estimation methods are proposed, based on a data augmentation scheme that reflects the multivariate reduction derivation of the joint probability function. To broaden the applicability of the model, we allow for covariates in the specification of both the mean and the covariance parameters. An extension to models with complete structure, with many multi-way covariance terms, is discussed. The method is demonstrated by analyzing a real-life data set.
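
A sketch of the multivariate-reduction construction underlying the model: each pair of counts shares a common latent Poisson term, which induces a pairwise covariance, and log-linear covariate effects enter the rates. The rates below are hypothetical, and the estimation machinery is not reproduced.

```python
# Sketch: bivariate Poisson counts via a shared latent Poisson term.
import numpy as np

rng = np.random.default_rng(9)
n = 1000
z = rng.uniform(0, 1, n)                      # a covariate
lam1 = np.exp(0.2 + 0.8 * z)                  # variable-specific rates
lam2 = np.exp(-0.1 + 0.5 * z)
lam12 = np.exp(-1.0 + 0.3 * z)                # pair-specific covariance term

y1, y2, y12 = (rng.poisson(l) for l in (lam1, lam2, lam12))
x1, x2 = y1 + y12, y2 + y12                   # observed correlated counts
print(np.cov(x1, x2)[0, 1])                   # positive covariance from y12
```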

19.
Summary: One specific problem statistical offices and research institutes face when releasing microdata is the preservation of confidentiality. Traditional methods of disclosure avoidance often destroy the structure of the data, and the information loss is potentially high. In this paper an alternative technique for creating scientific-use files is discussed, which reproduces the characteristics of the original data quite well. It is based on Fienberg (1994, 1997), who estimates and resamples from the empirical multivariate cumulative distribution function of the data in order to obtain synthetic data. The procedure creates data sets (the resample) that have the same characteristics as the original survey data. The paper includes applications of this method to (a) simulated data and (b) innovation survey data from the Mannheim Innovation Panel (MIP), as well as a comparison between resampling and a common method of disclosure control (disturbance with multiplicative error) with regard to confidentiality on the one hand and the suitability of the disturbed data for different kinds of analyses on the other. The results show that univariate distributions can be reproduced better by unweighted resampling. Parameter estimates can be reproduced quite well if the resampling procedure incorporates the correlation structure of the original data as a scale, or if the data are multiplicatively perturbed and a correction term is used. On average, anonymization of data with multiplicatively perturbed values protects better against re-identification than the various resampling methods used.
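
A sketch of the multiplicative-perturbation approach mentioned above, under assumptions of my own for illustration: a regressor is multiplied by independent noise with mean 1, and a correction factor, computable when the noise variance is published, undoes the attenuation of the regression slope. The noise level and the method-of-moments correction are illustrative, not the paper's exact procedure.

```python
# Sketch: multiplicative perturbation and a slope-attenuation correction.
import numpy as np

rng = np.random.default_rng(12)
n = 20000
x = rng.normal(10, 2, n)
y = 3.0 * x + rng.normal(0, 1, n)

s2 = 0.05                                   # published noise variance, E[e] = 1
e = rng.lognormal(-np.log(1 + s2) / 2, np.sqrt(np.log(1 + s2)), n)
xp = x * e                                  # anonymised (perturbed) variable

b_naive = np.cov(xp, y)[0, 1] / np.var(xp)  # attenuated slope on perturbed data
ex2 = np.mean(xp ** 2) / (1 + s2)           # recovers E[x^2] from E[xp^2]
b_corr = b_naive * np.var(xp) / (np.var(xp) - s2 * ex2)
print(round(b_naive, 2), round(b_corr, 2))  # attenuated vs. corrected (~3.0)
```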

20.
The posterior corneal curvature, like many other medical, environmental and ecological variables, is measured as an angle whose range is less than π. Such data are called axial or half-circular data, and their modeling has not received much attention from researchers. This paper proposes a new half-circular distribution based on an inverse stereographic projection of the Burr XII distribution. The maximum likelihood estimates of the parameters are obtained, and a simulation study is carried out to evaluate their performance. An application to the posterior corneal curvature of 23 patients shows that the proposed distribution fits the data well.
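
A sketch of the construction: Burr XII variates on (0, ∞) are mapped to half-circular angles on (0, π) via an inverse stereographic projection, θ = 2·arctan(x). The shape parameters are illustrative, and the paper's maximum likelihood fitting is not reproduced.

```python
# Sketch: half-circular angles from a Burr XII distribution by projection.
import numpy as np

def rburr12(size, c, k, rng):
    """Burr XII variates by inversion: F(x) = 1 - (1 + x^c)^(-k)."""
    u = rng.uniform(size=size)
    return ((1 - u) ** (-1.0 / k) - 1) ** (1.0 / c)

rng = np.random.default_rng(10)
x = rburr12(1000, c=2.0, k=3.0, rng=rng)
theta = 2 * np.arctan(x)                   # half-circular angles in (0, pi)
print(theta.min(), theta.max())            # all values lie inside (0, pi)
```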
