Similar Documents
20 similar documents found.
1.
Kernel-based density estimation algorithms are inefficient in the presence of discontinuities at the endpoints of the support. This is largely because classic kernel density estimators produce positive estimates beyond the endpoints. If a nonparametric estimate of a density functional is required to determine the bandwidth, the problem also affects the bandwidth selection procedure. In this paper, algorithms for bandwidth selection and kernel density estimation are proposed for non-negative random variables. Furthermore, the proposed methods are compared with some of the principal solutions in the literature through a simulation study.
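To make the "positive estimates beyond the endpoints" problem concrete, here is a minimal sketch that measures how much probability mass a Gaussian-kernel estimator assigns below zero for non-negative data. The function name `leaked_mass` and the fixed bandwidth are illustrative assumptions, not part of the paper. Each kernel N(x_i, h^2) contributes Phi(-x_i/h)/n of mass below 0.

```python
import numpy as np
from math import erf, sqrt

def leaked_mass(x, h):
    """Probability mass a Gaussian-kernel KDE places below 0 (hypothetical helper).

    Each kernel N(x_i, h^2) contributes Phi(-x_i / h) / n below zero,
    where Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    """
    x = np.asarray(x, dtype=float)
    below = [0.5 * (1.0 + erf(-xi / (h * sqrt(2.0)))) for xi in x]
    return float(np.mean(below))
```

For exponential data with h = 0.3, the leaked mass is on the order of 10%. All of that mass is lost from the part of the support where the density actually lives.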

2.
As conventional cross-validation bandwidth selection methods do not work properly when the data are serially dependent time series, alternative bandwidth selection methods are necessary. In recent years, Bayesian methods for global bandwidth selection have been studied. In our experience, however, a global bandwidth is less suitable than a localized bandwidth for kernel density estimation from serially dependent time series data. A difficult issue, nonetheless, is how to estimate a localized bandwidth consistently. This paper presents a nonparametric localized bandwidth estimator, for which we establish a completely new asymptotic theory. Applications of this new bandwidth estimator to kernel density estimation of the Eurodollar deposit rate and the S&P 500 daily return demonstrate the effectiveness and competitiveness of the proposed localized bandwidth.

3.
In kernel density estimation, a criticism of bandwidth selection techniques that minimize squared-error expressions is that they perform poorly when estimating the tails of probability density functions. Techniques that minimize absolute-error expressions are thought to give more uniform performance and to be potentially superior. An asymptotic mean absolute error expression for nonparametric kernel density estimators from right-censored data is developed here. This expression is used to obtain local and global bandwidths that are optimal in the sense that they minimize the asymptotic mean absolute error and the integrated asymptotic mean absolute error, respectively. These estimators are illustrated for eight data sets from known distributions. Computer simulation results comparing the estimation methods with squared-error-based bandwidth selection for right-censored data are discussed.
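A common way to build a kernel density estimator from right-censored data is to replace the equal 1/n weights with the jump sizes of the Kaplan-Meier estimator. The sketch below does this with a Gaussian kernel and assumes no tied times. The bandwidth is simply passed in; it does not implement the paper's absolute-error bandwidth rule, and `km_weights` and `censored_kde` are hypothetical names.

```python
import numpy as np

def km_weights(times, events):
    """Kaplan-Meier jump sizes at each observation (assumes no tied times).

    events[i] = 1 if times[i] is an observed failure, 0 if right-censored.
    Returns (sorted_times, weights); censored observations get weight 0.
    """
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    d = np.asarray(events, dtype=float)[order]
    n = t.size
    surv = 1.0                      # KM survival just before t[i]
    w = np.zeros(n)
    for i in range(n):
        at_risk = n - i
        w[i] = surv * d[i] / at_risk  # jump of the KM estimator at t[i]
        surv -= w[i]                  # S(t[i]) = S(t[i]-) * (1 - d_i / at_risk)
    return t, w

def censored_kde(grid, times, events, h):
    """Gaussian KDE weighted by Kaplan-Meier jumps instead of 1/n."""
    t, w = km_weights(times, events)
    u = (np.asarray(grid, dtype=float)[:, None] - t[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return (k * w[None, :]).sum(axis=1) / h
```

With no censoring every weight equals 1/n, so the estimator reduces to the ordinary kernel density estimator.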

4.
Statistical learning is emerging as a promising field in which a number of machine learning algorithms are interpreted as statistical methods and vice versa. Owing to its good practical performance, boosting is one of the most studied machine learning techniques. We propose algorithms for multivariate density estimation and classification that use traditional kernel techniques as weak learners in boosting algorithms. Our algorithms take the form of multistep estimators whose first step is a standard kernel method. Some strategies for bandwidth selection are also discussed, both for the standard kernel density classification problem and for our 'boosted' kernel methods. Extensive experiments on real and simulated data show encouraging practical relevance of the findings: standard kernel methods are often outperformed by the first boosting iterations, across several bandwidth values. In addition, the practical effectiveness of our classification algorithm is confirmed by a comparative study on two real datasets, with the competitors being classification trees, including AdaBoost with trees.

5.
Nonparametric density estimation in the presence of measurement error is considered. The usual kernel deconvolution estimator seeks to account for the contamination in the data by employing a modified kernel. In this paper a new approach based on a weighted kernel density estimator is proposed. Theoretical motivation is provided by the existence of a weight vector that perfectly counteracts the bias in density estimation without generating an excessive increase in variance. In practice, a data-driven method of weight selection is required. Our strategy is to minimize the discrepancy between a standard kernel estimate from the contaminated data on the one hand, and the convolution of the weighted deconvolution estimate with the measurement error density on the other. We consider a direct implementation of this approach, in which the weights are optimized subject to sum and non-negativity constraints, and a regularized version in which the objective function includes a ridge-type penalty. Numerical tests suggest that weighted kernel estimation can lead to tangible improvements in performance over the usual kernel deconvolution estimator. Furthermore, weighted kernel estimates are free from the negative estimates in the tails that can occur when using modified kernels. The weighted kernel approach generalizes to multivariate deconvolution density estimation in a very straightforward manner.

6.
We consider the problem of density estimation when the data arrive as a continuous stream with no fixed length. In this setting, implementations of the usual density estimation methods, such as kernel density estimation, are problematic. We propose a density estimation method for massive datasets that is based on taking the derivative of a smooth curve fitted through a set of quantile estimates. To achieve this, a low-storage, single-pass, sequential method is proposed for simultaneously estimating the multiple quantiles on which this density estimator is based. For comparison, we also consider a sequential kernel density estimator. A simulation study shows that the proposed methods perform well and have several distinct advantages over existing methods.
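The idea of estimating several quantiles in one low-memory pass, then differentiating the quantile-to-probability relationship, can be sketched as follows. This is an assumption-laden stand-in, not the authors' algorithm. It uses a plain Robbins-Monro stochastic-approximation update with step size t^(-a), and it replaces the paper's smooth fitted curve with simple finite differences. Both function names are hypothetical.

```python
import numpy as np

def sequential_quantiles(stream, probs, a=0.6):
    """Single-pass stochastic-approximation estimates of several quantiles.

    Robbins-Monro update q <- q + gamma_t * (p - 1{x <= q}), gamma_t = t**-a.
    Memory is O(len(probs)) regardless of the stream length.
    """
    p = np.asarray(probs, dtype=float)
    q = None
    for t, x in enumerate(stream, start=1):
        if q is None:
            q = np.full(p.shape, float(x))  # initialise at the first observation
            continue
        gamma = t ** (-a)
        q += gamma * (p - (x <= q))
    return q

def density_from_quantiles(probs, q):
    """Finite-difference density dp/dq at midpoints of adjacent quantiles."""
    p = np.asarray(probs, dtype=float)
    q = np.sort(q)  # guard against small quantile crossings
    mid = 0.5 * (q[1:] + q[:-1])
    return mid, np.diff(p) / np.diff(q)
```

Only the current quantile estimates are stored, so memory stays constant however long the stream runs. This constant-memory property is the one the abstract emphasises.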

7.
On the one hand, kernel density estimation has become a common tool for empirical studies in every research area, which goes hand in hand with the fact that this kind of estimator is now provided by many software packages. On the other hand, the discussion of bandwidth selection has been going on for about three decades. Although a good part of that discussion concerns nonparametric regression, the choice of this parameter is by no means less problematic for density estimation. This becomes obvious when reading empirical studies in which practitioners have used kernel densities. New contributions typically provide simulations only to show that their own selector outperforms some of the existing methods. We review existing methods and compare them on a set of designs that exhibit few bumps and exponentially falling tails. We concentrate on small and moderate sample sizes, because for large ones the differences between consistent methods are often negligible, at least for practitioners. As a byproduct, we find that a mixture of simple plug-in and cross-validation methods produces bandwidths with quite stable performance.
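As an illustration of the two ingredients mentioned above, the sketch computes Silverman's rule-of-thumb bandwidth and a least-squares cross-validation (LSCV) bandwidth for a Gaussian kernel. It then averages them. The simple 50/50 average is a hypothetical mixing rule chosen for illustration; the review's exact combination may differ.

For the Gaussian kernel, the LSCV integral of the squared estimate has the closed form (1/n^2) sum_ij phi_{h*sqrt(2)}(X_i - X_j).

```python
import numpy as np

def gauss(u, s):
    """Normal density with mean 0 and standard deviation s."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2 * np.pi))

def silverman_bandwidth(x):
    x = np.asarray(x, dtype=float)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    sigma = min(x.std(ddof=1), iqr / 1.34)
    return 0.9 * sigma * x.size ** (-0.2)

def int_fhat_squared(x, h):
    """Closed form of the integral of fhat^2 for a Gaussian kernel."""
    d = x[:, None] - x[None, :]
    return gauss(d, h * np.sqrt(2)).sum() / x.size ** 2

def lscv_score(x, h):
    x = np.asarray(x, dtype=float)
    n = x.size
    k = gauss(x[:, None] - x[None, :], h)
    np.fill_diagonal(k, 0.0)                  # leave-one-out estimates
    loo = k.sum(axis=1) / (n - 1)
    return int_fhat_squared(x, h) - 2.0 * loo.mean()

def lscv_bandwidth(x, grid):
    scores = [lscv_score(x, h) for h in grid]
    return grid[int(np.argmin(scores))]

def hybrid_bandwidth(x, grid):
    """Hypothetical mix: plain average of rule-of-thumb and LSCV bandwidths."""
    return 0.5 * (silverman_bandwidth(x) + lscv_bandwidth(x, grid))
```

Averaging damps the high variability of LSCV while keeping some of its data adaptivity. That is one plausible reading of why mixed selectors behave stably.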

8.
Length-biased data are a particular case of weighted data, which arise in many fields, including biomedicine, quality control and epidemiology. In this paper we study the theoretical properties of kernel density estimation for length-biased data and propose two consistent bootstrap methods, which we use for bandwidth selection. Apart from the bootstrap bandwidth selectors, we suggest a rule of thumb. These bandwidth selection proposals are compared with a least-squares cross-validation method. A simulation study is carried out to understand the behaviour of the procedures in finite samples.
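For length-biased data the observed density is g(y) = y f(y) / mu, where mu is the mean of f. A standard kernel estimator of f therefore reweights each observation by 1/y and estimates mu by the harmonic mean of the sample. The sketch below assumes a Gaussian kernel and a user-supplied bandwidth; it does not implement the paper's bootstrap selectors, and the function name is hypothetical.

```python
import numpy as np

def length_biased_kde(grid, y, h):
    """Estimate f from a length-biased sample y (all y > 0).

    fhat(x) = mu_hat / n * sum_i K_h(x - y_i) / y_i, with mu_hat the
    harmonic mean of y, so the weights mu_hat / (n * y_i) sum to one.
    """
    y = np.asarray(y, dtype=float)
    mu_hat = y.size / np.sum(1.0 / y)            # harmonic mean of the sample
    u = (np.asarray(grid, dtype=float)[:, None] - y[None, :]) / h
    k = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))
    return mu_hat * (k / y[None, :]).mean(axis=1)
```

If f is Exp(1), the length-biased sample is Gamma(2, 1), which gives a convenient check of the estimator.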

9.
A bandwidth selection method for kernel density estimation that combines the concepts of least-squares cross-validation and the plug-in approach is introduced. A simulation study reveals that this hybrid methodology outperforms some commonly used bandwidth selection rules. It is shown that the proposed approach can also be readily employed for variable kernel density estimation. We conclude with two illustrative examples.

10.
Kernel smoothing of spatial point data can often be improved using an adaptive, spatially varying bandwidth instead of a fixed bandwidth. However, computation with a varying bandwidth is much more demanding, especially when edge correction and bandwidth selection are involved. This paper proposes several new computational methods for adaptive kernel estimation from spatial point pattern data. A key idea is that a variable-bandwidth kernel estimator for d-dimensional spatial data can be represented as a slice of a fixed-bandwidth kernel estimator in \((d+1)\)-dimensional scale space, enabling fast computation using Fourier transforms. Edge correction factors have a similar representation. Different values of global bandwidth correspond to different slices of the scale space, so that bandwidth selection is greatly accelerated. Potential applications include estimation of multivariate probability density and spatial or spatiotemporal point process intensity, relative risk, and regression functions. The new methods perform well in simulations and in two real applications concerning the spatial epidemiology of primary biliary cirrhosis and the alarm calls of capuchin monkeys.
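The kind of adaptive estimator being accelerated can be written directly, at O(mn) cost, using Abramson's square-root rule. Under that rule, h_i = h0 * (pilot(x_i) / G)^(-1/2), where G is the geometric mean of the pilot density at the data points. The minimal 2-D sketch below omits edge correction and does not implement the paper's scale-space/FFT method. It assumes isotropic Gaussian kernels and a fixed-bandwidth pilot evaluated at the data points, including each point's own kernel; all names are hypothetical.

```python
import numpy as np

def gauss2d(points, centres, bw):
    """Isotropic 2-D Gaussian kernels; bw is a scalar or one value per centre."""
    bw = np.broadcast_to(np.asarray(bw, dtype=float), (centres.shape[0],))
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / bw**2) / (2 * np.pi * bw**2)

def abramson_bandwidths(x, h0, pilot_h):
    """Abramson square-root law: h_i = h0 * sqrt(G / pilot(x_i))."""
    pilot = gauss2d(x, x, pilot_h).mean(axis=1)   # fixed-bandwidth pilot density
    gmean = np.exp(np.mean(np.log(pilot)))        # geometric mean G
    return h0 * np.sqrt(gmean / pilot), pilot

def adaptive_kde(points, x, h0, pilot_h):
    """Variable-bandwidth estimate: each data point carries its own h_i."""
    h_i, _ = abramson_bandwidths(x, h0, pilot_h)
    return gauss2d(points, x, h_i).mean(axis=1)
```

Points in dense regions get small bandwidths and isolated points get large ones. Dividing by G makes the geometric mean of the h_i equal the global bandwidth h0.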

11.
A great deal of research has focused on improving the bias properties of kernel estimators. One proposal involves removing the non-negativity restriction on the kernel to construct “higher-order” kernels that eliminate additional terms in the Taylor series expansion of the bias. This paper considers an alternative that uses a local approach to bandwidth selection not only to reduce the bias but to eliminate it entirely. These so-called “zero-bias bandwidths” are shown to exist for univariate and multivariate kernel density estimation as well as kernel regression. Implications of the existence of such bandwidths are discussed. An estimation strategy is presented, and the extent of the reduction or elimination of bias in practice is studied through simulation and example.
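As an example of the "higher-order" kernels mentioned above, K4(u) = (3 - u^2) phi(u) / 2 is a well-known fourth-order kernel built from the Gaussian. It integrates to one and has zero second moment, so the h^2 bias term vanishes, but it takes negative values for |u| > sqrt(3). The sketch below is only an illustration of that competing approach, not the paper's zero-bias bandwidth method; `gauss4` and `kde` are hypothetical names.

```python
import numpy as np

def gauss4(u):
    """Fourth-order Gaussian-based kernel K4(u) = (3 - u^2) / 2 * phi(u)."""
    return 0.5 * (3.0 - u**2) * np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(grid, x, h, kernel):
    """Generic kernel density estimate on a grid for a given kernel function."""
    u = (np.asarray(grid, dtype=float)[:, None] - np.asarray(x, dtype=float)[None, :]) / h
    return kernel(u).mean(axis=1) / h
```

Its moments are 1, 0 and -3 for orders 0, 2 and 4. The price of the bias reduction is that the density estimate itself can become negative.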

12.
ABSTRACT. This paper deals with kernel nonparametric estimation. The multiple kernel method, as proposed by Berlinet (1993), consists of choosing both the smoothing parameter and the order of the kernel function. In this paper we follow this general idea, and the selection is carried out by a combination of plug-in and cross-validation techniques. We first give an asymptotic optimality theorem, stated in a general unifying setting that includes many curve estimation problems. Then, as an illustration, we show how the method behaves in the two special cases of kernel density estimation and kernel regression estimation.

13.
On boundary correction in kernel density estimation (total citations: 1; self-citations: 0; other citations: 1)
It is now well known that kernel density estimators are not consistent when estimating a density near the finite endpoints of its support. This is due to the boundary effects that occur in nonparametric curve estimation problems. A number of proposals have been made in the kernel density estimation context with some success, but as yet there appears to be no single dominating solution that corrects the boundary problem for all shapes of densities. In this paper, we propose a new general method of boundary correction for univariate kernel density estimation. The proposed method generates a class of boundary-corrected estimators, all of which possess desirable properties such as local adaptivity and non-negativity. In simulations, the proposed method performs quite well compared with other existing methods in the literature for most shapes of densities, demonstrating an important robustness property. The theory behind the new approach and the bias and variance of the proposed estimators are given. Results of a data analysis are also given.
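For comparison, one of the oldest and simplest boundary corrections is the reflection method. For data on [0, inf), it adds a mirrored kernel at -x_i for every observation, so no mass is lost below zero. This is a classic baseline, not the new method proposed in the paper. The sketch assumes a Gaussian kernel, and the function names are hypothetical.

```python
import numpy as np

def _phi(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def naive_kde(grid, x, h):
    """Ordinary Gaussian KDE, which leaks mass past the boundary at 0."""
    u = (np.asarray(grid, dtype=float)[:, None] - np.asarray(x, dtype=float)[None, :]) / h
    return _phi(u).mean(axis=1) / h

def reflection_kde(grid, x, h):
    """Reflection estimator for data on [0, inf): add mirror images -x_i.

    Meant for grid points >= 0; it integrates to one over [0, inf).
    """
    return naive_kde(grid, x, h) + naive_kde(grid, -np.asarray(x, dtype=float), h)
```

At x = 0 the reflected estimate is exactly twice the naive one. Away from the boundary the two coincide because the mirrored kernels are negligible there.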

14.
Kernel density classification and boosting: an L2 analysis (total citations: 1; self-citations: 0; other citations: 1)
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that, when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study that examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is boosting, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias while only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
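The basic kernel density classifier assigns a point to the class that maximises prior times estimated density. The 1-D sketch below uses class proportions as priors and one bandwidth per class. The bandwidths are simply supplied by the caller, so the paper's finding that each optimal bandwidth grows with the other group's sample size is not encoded, and no boosting is performed. The names are hypothetical.

```python
import numpy as np

def kde_1d(grid, x, h):
    """Gaussian KDE of sample x evaluated at the points in grid."""
    u = (np.asarray(grid, dtype=float)[:, None] - np.asarray(x, dtype=float)[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def kde_classify(points, samples, bandwidths):
    """Assign each point to the class maximising prior * density estimate.

    samples: list of 1-D arrays, one per class; priors are class proportions.
    bandwidths: one bandwidth per class, supplied by the caller.
    """
    n_total = sum(len(s) for s in samples)
    scores = np.stack([
        (len(s) / n_total) * kde_1d(points, s, h)
        for s, h in zip(samples, bandwidths)
    ])
    return scores.argmax(axis=0)
```

With two classes, only the sign of the difference between the two weighted density estimates matters. That is why the abstract analyses the estimation of a density difference rather than of each density separately.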

15.
Abstract. The performance of multivariate kernel density estimates depends crucially on the choice of bandwidth matrix, but progress towards developing good bandwidth matrix selectors has been relatively slow. In particular, previous studies of cross-validation (CV) methods have been restricted to biased and unbiased CV selection of diagonal bandwidth matrices. However, for certain types of target density the use of full (i.e. unconstrained) bandwidth matrices offers the potential for significantly improved density estimation. In this paper, we generalize earlier work from diagonal to full bandwidth matrices, and develop a smooth cross-validation (SCV) methodology for multivariate data. We consider optimization of the SCV technique with respect to a pilot bandwidth matrix. All the CV methods are studied using asymptotic analysis, simulation experiments and real data analysis. The results suggest that SCV for full bandwidth matrices is the most reliable of the CV methods. We also observe that experience from the univariate setting can sometimes be a misleading guide for understanding bandwidth selection in the multivariate case.
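To show what a full (unconstrained) bandwidth matrix means in practice, the sketch evaluates a Gaussian KDE with an arbitrary symmetric positive-definite H. The pilot H comes from the normal-scale rule H = (4/(d+2))^(2/(d+4)) n^(-2/(d+4)) S, with S the sample covariance. That rule is only a simple reference choice, not the SCV selector developed in the paper, and the names are hypothetical.

```python
import numpy as np

def normal_scale_bandwidth(X):
    """Normal-scale full bandwidth matrix H = (4/(d+2))^(2/(d+4)) n^(-2/(d+4)) S."""
    n, d = X.shape
    S = np.cov(X, rowvar=False)
    return (4.0 / (d + 2)) ** (2.0 / (d + 4)) * n ** (-2.0 / (d + 4)) * S

def kde_full(points, X, H):
    """Gaussian KDE with an unconstrained bandwidth matrix H."""
    d = X.shape[1]
    H_inv = np.linalg.inv(H)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(H))
    diff = points[:, None, :] - X[None, :, :]               # shape (m, n, d)
    quad = np.einsum("mni,ij,mnj->mn", diff, H_inv, diff)   # Mahalanobis terms
    return norm * np.exp(-0.5 * quad).mean(axis=1)
```

For strongly correlated data, H inherits the off-diagonal structure of S, so the kernels are tilted along the data's main axis. A diagonal H cannot represent this orientation.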

16.
We propose a modification to the regular kernel density estimation method that uses asymmetric kernels to circumvent the spill-over problem for densities with positive support. First, a pivoting method is introduced for placing the data relative to the kernel function. This yields a strongly consistent density estimator that integrates to one for each fixed bandwidth, in contrast to most density estimators based on asymmetric kernels proposed in the literature. Then a data-driven Bayesian local bandwidth selection method is presented, and lognormal, gamma, Weibull and inverse Gaussian kernels are discussed as useful special cases. Simulation results and a real-data example illustrate the advantages of the new methodology.
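A standard asymmetric-kernel estimator, on which pivoting proposals like the one above build, is the gamma kernel estimator. At each evaluation point t >= 0 it averages a Gamma density with shape t/b + 1 and scale b over the data. Because that kernel lives on [0, inf), nothing spills below zero. The sketch is this classic estimator only; it does not include the paper's pivoting step or Bayesian bandwidth choice. It assumes strictly positive data, and the function name is hypothetical.

```python
import numpy as np
from math import lgamma, log

def gamma_kernel_kde(grid, x, b):
    """Gamma kernel estimator for strictly positive data x.

    At each evaluation point t >= 0 the kernel is a Gamma density with
    shape t / b + 1 and scale b, evaluated at the data points.
    """
    x = np.asarray(x, dtype=float)
    out = []
    for t in np.asarray(grid, dtype=float):
        shape = t / b + 1.0
        # log of the Gamma(shape, scale=b) density at each observation
        logpdf = (shape - 1.0) * np.log(x) - x / b - lgamma(shape) - shape * log(b)
        out.append(np.exp(logpdf).mean())
    return np.array(out)
```

The kernel's shape changes with the evaluation point. This variable shape is what removes the boundary bias, at the cost of an estimate that does not in general integrate exactly to one, the deficiency the pivoting method targets.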

17.
Abstract

In this work, we propose a beta prime kernel estimator for probability density functions with nonnegative support. The proposed estimator uses the beta prime probability density function as its kernel. It is free of boundary bias and nonnegative, with a naturally varying shape. We obtain the optimal rate of convergence for the mean squared error (MSE) and the mean integrated squared error (MISE). We also use an adaptive Bayesian bandwidth selection method with Lindley approximation for heavy-tailed distributions and compare its performance with the global least-squares cross-validation bandwidth selection method. Monte Carlo simulation studies are performed to evaluate the average integrated squared error (ISE) of the proposed kernel estimator against some asymmetric competitors. Moreover, real data sets are presented to illustrate the findings.

18.
Spatial point pattern data sets are commonplace in a variety of research disciplines. Using kernel methods to smooth such data is a flexible way to explore spatial trends and make inference about underlying processes without, or perhaps prior to, designing and fitting more intricate semiparametric or parametric models to quantify specific effects. The long-standing issue of ‘optimal’ data-driven bandwidth selection is complicated in these settings by factors such as high heterogeneity in observed patterns and the need to consider edge correction factors. We scrutinize bandwidth selectors built on leave-one-out cross-validation approximations to likelihood functions. A key outcome relates to previously unconsidered adaptive smoothing regimes for spatiotemporal density and multitype conditional probability surface estimation, for which we propose a novel simultaneous pilot-global selection strategy. Motivated by applications in epidemiology, the results of both simulated and real-world analyses suggest that this strategy is largely preferable to classical fixed-bandwidth estimation for such data.
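The basic building block scrutinized here is leave-one-out likelihood cross-validation. It picks the bandwidth that maximises the sum of log f_{-i}(x_i), where f_{-i} is the estimate built without point i. A minimal sketch for fixed-bandwidth, isotropic 2-D Gaussian smoothing follows. It assumes no edge correction and a plain grid search, does not cover the adaptive or pilot-global strategy proposed in the paper, and uses hypothetical names.

```python
import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of an isotropic 2-D Gaussian KDE."""
    n = x.shape[0]
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-0.5 * d2 / h**2) / (2 * np.pi * h**2)
    np.fill_diagonal(k, 0.0)                  # leave each point out of its own estimate
    f_loo = k.sum(axis=1) / (n - 1)
    # clip to avoid log(0) for isolated points at very small h
    return np.sum(np.log(np.maximum(f_loo, np.finfo(float).tiny)))

def likelihood_cv_bandwidth(x, grid):
    """Grid search for the bandwidth maximising the LOO log-likelihood."""
    scores = [loo_log_likelihood(x, h) for h in grid]
    return grid[int(np.argmax(scores))]
```

Leaving each point out is essential. Without it, the likelihood would grow without bound as h shrinks towards zero.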

19.
Abstract. The problem of choosing the bandwidth h for kernel density estimation is considered. All plug-in-type bandwidth selection methods require the use of a pilot bandwidth g. The usual way to make an h-dependent choice of g is to obtain their asymptotic expressions separately and solve the two equations. In contrast, we obtain the asymptotically optimal value of g for every fixed h, thus making our selection 'less asymptotic'. Exact error expressions show that some commonly assumed hypotheses have to be discarded in the asymptotic study in this case. Two versions of a new bandwidth selector based on this idea are proposed, and their properties are analysed through theoretical results and a simulation study.
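To show where the pilot bandwidth g enters a plug-in selector, here is a simple direct plug-in rule for a Gaussian kernel with one pilot stage. First, psi_6 is taken from a normal reference, psi_6 = -15 / (16 sqrt(pi) sigma^7). Next, g = [2 K^(4)(0) / (-psi_6 n)]^(1/7) is used to estimate psi_4 from the data. Finally, h = [R(K) / (psi_4 n)]^(1/5). This is a simpler h-independent scheme used for illustration, neither the solve-the-equation approach nor the h-dependent pilot proposed in the paper, and the names are hypothetical.

```python
import numpy as np

SQRT2PI = np.sqrt(2 * np.pi)

def phi4(u):
    """Fourth derivative of the standard normal density."""
    return (u**4 - 6 * u**2 + 3) * np.exp(-0.5 * u**2) / SQRT2PI

def psi4_hat(x, g):
    """Kernel estimate of psi_4 = integral of f''^2, with pilot bandwidth g."""
    d = (x[:, None] - x[None, :]) / g
    return phi4(d).sum() / (x.size**2 * g**5)

def dpi_bandwidth(x):
    """One-stage direct plug-in bandwidth for a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)
    psi6_nr = -15.0 / (16.0 * np.sqrt(np.pi) * sigma**7)  # normal-reference psi_6
    k4_at_0 = 3.0 / SQRT2PI                               # K^(4)(0) for the Gaussian
    g = (-2.0 * k4_at_0 / (psi6_nr * n)) ** (1.0 / 7.0)   # AMSE-optimal pilot for psi_4
    rk = 1.0 / (2.0 * np.sqrt(np.pi))                     # R(K) for the Gaussian
    return (rk / (psi4_hat(x, g) * n)) ** 0.2
```

For normal data this lands close to (4/3)^(1/5) * sigma * n^(-1/5), about 1.06 * sigma * n^(-1/5). That value is the AMISE-optimal bandwidth when f is normal.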

20.
In this paper the use of three kernel-based nonparametric forecasting methods (the conditional mean, the conditional median, and the conditional mode) is explored in detail. Several issues related to the estimation of these methods are discussed, including the choice of the bandwidth and the type of kernel function. The out-of-sample forecasting performance of the three nonparametric methods is investigated using 60 real time series. We find that no forecasting method is superior for series with fewer than approximately 100 observations. However, when a time series is long or its conditional density is bimodal, the forecasting performance of the three kernel-based methods differs considerably.
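The three forecasts can be sketched from the same kernel weights on lagged values. The conditional mean is the weighted average of the next values (Nadaraya-Watson), and the conditional median is their weighted median. The conditional mode is the maximiser of a kernel conditional density on a grid. This minimal version assumes a single lag, Gaussian kernels and hand-picked bandwidths h and h_y; it is an illustration, not the paper's exact estimators, and the names are hypothetical.

```python
import numpy as np

def _gauss(u):
    return np.exp(-0.5 * u**2)

def kernel_forecasts(y, x0, h, h_y, n_grid=400):
    """One-step conditional mean, median and mode given the last value x0.

    Uses lag-1 pairs (y[t-1], y[t]) and Gaussian kernel weights in the lag.
    """
    y = np.asarray(y, dtype=float)
    lag, resp = y[:-1], y[1:]
    w = _gauss((x0 - lag) / h)
    w = w / w.sum()                                   # normalised kernel weights
    mean = np.sum(w * resp)                           # Nadaraya-Watson mean
    order = np.argsort(resp)
    cum = np.cumsum(w[order])
    median = resp[order][np.searchsorted(cum, 0.5)]   # weighted median
    grid = np.linspace(resp.min(), resp.max(), n_grid)
    cond_dens = (_gauss((grid[:, None] - resp[None, :]) / h_y) * w[None, :]).sum(axis=1)
    mode = grid[np.argmax(cond_dens)]                 # argmax of the conditional density
    return mean, median, mode
```

For a linear AR(1) series with symmetric noise, all three forecasts target the same value. They separate only when the conditional density is skewed or multimodal, consistent with the abstract's finding for bimodal series.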


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417