In non-parametric function estimation, selection of a smoothing parameter is one of the most important issues. The performance of smoothing techniques depends highly on the choice of this parameter. Preferably the bandwidth should be determined via a data-driven procedure. In this paper we consider kernel estimators in a white noise model, and investigate whether locally adaptive plug-in bandwidths can achieve optimal global rates of convergence. We consider various classes of functions: Sobolev classes, bounded variation function classes, classes of convex functions and classes of monotone functions. We study the situations of pilot estimation with oversmoothing and without oversmoothing. Our main finding is that simple local plug-in bandwidth selectors can adapt to spatial inhomogeneity of the regression function as long as there are no local oscillations of high frequency. We establish the pointwise asymptotic distribution of the regression estimator with local plug-in bandwidth.

A bandwidth selection method that combines the concept of least-squares cross-validation and the plug-in approach is introduced in connection with kernel density estimation. A simulation study reveals that this hybrid methodology outperforms some commonly used bandwidth selection rules. It is shown that the proposed approach can also be readily employed in the context of variable kernel density estimation. We conclude with two illustrative examples.
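
The abstract above does not spell out how the hybrid selector is built, so the following is only a minimal sketch of one of its two ingredients, least-squares cross-validation, for a Gaussian-kernel density estimate. The function names, the candidate grid and the simulated sample are assumptions made for illustration; the plug-in part and the way the two are combined are not reproduced here.

```python
import math
import random


def normal_pdf(x, sd):
    """Density of N(0, sd^2) evaluated at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def lscv_score(data, h):
    """Least-squares cross-validation criterion for a Gaussian-kernel KDE.

    LSCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(X_i).
    With a Gaussian kernel the integral has a closed form, because the
    convolution of two N(0, h^2) densities is an N(0, 2 h^2) density.
    """
    n = len(data)
    integral_term = 0.0   # sum over all pairs (i, j), including i == j
    leave_one_out = 0.0   # sum over pairs with i != j
    for i in range(n):
        for j in range(n):
            d = data[i] - data[j]
            integral_term += normal_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                leave_one_out += normal_pdf(d, h)
    return integral_term / n ** 2 - 2.0 * leave_one_out / (n * (n - 1))


def lscv_bandwidth(data, candidates):
    """Return the candidate bandwidth with the smallest LSCV score."""
    return min(candidates, key=lambda h: lscv_score(data, h))


# Illustrative data and grid (both hypothetical, not from the paper).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(60)]
candidates = [0.05 * k for k in range(1, 31)]
h_lscv = lscv_bandwidth(sample, candidates)
print("LSCV bandwidth on the grid:", h_lscv)
```

A grid search is used instead of a numerical optimiser because the LSCV criterion can have several local minima; the grid makes that behaviour easy to inspect.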

A data-driven bandwidth choice for a kernel density estimator called critical bandwidth is investigated. This procedure allows the estimate to have as many modes as assumed for the density being estimated. Both Gaussian and uniform kernels are considered. For the Gaussian kernel, asymptotic results are given. For the uniform kernel, an argument against these properties is mentioned. These theoretical results are illustrated with a simulation study that compares the kernel estimators that rely on critical bandwidth with another one that uses a plug-in method to select its bandwidth. An estimator that consists of estimates of density contour clusters and takes assumptions on the number of modes into account is also considered. Finally, the methodology is illustrated using environment monitoring data.
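
As a rough illustration of the critical bandwidth idea for the Gaussian kernel, the sketch below searches by bisection for the smallest bandwidth at which the density estimate has at most k modes. It assumes that the number of modes does not increase as the bandwidth grows, which is the known behaviour for the Gaussian kernel, and it counts modes on a finite grid, so the result is approximate. All names, the grid size, the tolerance and the toy data are hypothetical choices, not taken from the paper.

```python
import math


def gaussian_kde(data, h, x):
    """Gaussian-kernel density estimate at x with bandwidth h."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def count_modes(data, h, grid_size=400):
    """Count local maxima of the estimate on an evenly spaced grid."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    xs = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    ys = [gaussian_kde(data, h, x) for x in xs]
    modes = 0
    for k in range(1, grid_size - 1):
        # Strict rise followed by a non-rise marks one local maximum.
        if ys[k] > ys[k - 1] and ys[k] >= ys[k + 1]:
            modes += 1
    return modes


def critical_bandwidth(data, k=1, tol=1e-3):
    """Approximate the smallest h whose estimate has at most k modes."""
    span = max(data) - min(data)
    h_lo, h_hi = span / 100.0, span
    if count_modes(data, h_lo) <= k:
        return h_lo  # already at most k modes at the smallest h tried
    while count_modes(data, h_hi) > k:
        h_hi *= 2.0  # widen the bracket until the upper end qualifies
    while h_hi - h_lo > tol * span:
        mid = 0.5 * (h_lo + h_hi)
        if count_modes(data, mid) <= k:
            h_hi = mid
        else:
            h_lo = mid
    return h_hi


# Toy data with two visible clusters (hypothetical).
two_clusters = [-0.4, -0.2, 0.0, 0.2, 0.4, 4.7, 4.9, 5.0, 5.3, 5.6]
h_crit = critical_bandwidth(two_clusters, k=1)
print("approximate critical bandwidth for one mode:", h_crit)
```

The bisection always returns a bandwidth at which the grid-based mode count was actually verified, so the returned value errs on the side of slight oversmoothing.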

ABSTRACT. This paper deals with kernel non-parametric estimation. The multiple kernel method, as proposed by Berlinet (1993), consists of choosing both the smoothing parameter and the order of the kernel function. In this paper we follow this general idea, and the selection is carried out by a combination of plug-in and cross-validation techniques. We first give an asymptotic optimality theorem, stated in a general unifying setting that includes many curve estimation problems. Then, as an illustration, we show how this behaves in the two special cases of kernel density and kernel regression estimation.

In this paper we study the ideal variable bandwidth kernel density estimator introduced by McKay (1993a, b) and Jones et al. (1994) and the plug-in practical version of the variable bandwidth kernel estimator with two sequences of bandwidths as in Giné and Sang (2013). Based on the bias and variance analysis of the ideal and plug-in variable bandwidth kernel density estimators, we study the central limit theorems for each of them. The simulation study confirms the central limit theorem and demonstrates the advantage of the plug-in variable bandwidth kernel method over the classical kernel method.

The geographical relative risk function is a useful tool for investigating the spatial distribution of disease based on case and control data. The most common way of estimating this function is using the ratio of bivariate kernel density estimates constructed from the locations of cases and controls, respectively. An alternative is to use a local-linear (LL) estimator of the log-relative risk function. In both cases, the choice of bandwidth is critical. In this article, we examine the relative performance of the two estimation techniques using a variety of data-driven bandwidth selection methods, including likelihood cross-validation (CV), least-squares CV, rule-of-thumb reference methods, and a new approximate plug-in (PI) bandwidth for the LL estimator. Our analysis includes the comparison of asymptotic results, a simulation study, and application of the estimators to two real data sets. Our findings suggest that the density ratio method implemented with the least-squares CV bandwidth selector is generally best, with the LL estimator with PI bandwidth being competitive in applications with strong large-scale trends but much worse in situations with elliptical clusters.

Abstract. The problem of choosing the bandwidth h for kernel density estimation is considered. All the plug-in-type bandwidth selection methods require the use of a pilot bandwidth g. The usual way to make an h-dependent choice of g is by obtaining their asymptotic expressions separately and solving the two equations. In contrast, we obtain the asymptotically optimal value of g for every fixed h, thus making our selection 'less asymptotic'. Exact error expressions show that some usually assumed hypotheses have to be discarded in the asymptotic study in this case. Two versions of a new bandwidth selector based on this idea are proposed, and their properties are analysed through theoretical results and a simulation study.

Density level sets are mainly estimated using one of three methodologies: plug-in, excess mass, or a hybrid approach. The plug-in methods are based on replacing the unknown density by some nonparametric estimator, usually the kernel one. Thus, the bandwidth selection is a fundamental problem from an applied perspective. Recently, specific selectors for level sets have been proposed. However, if some a priori information about the geometry of the level set is available, then excess mass algorithms can be useful. In this case, the problem of bandwidth selection can be avoided. The third methodology is a hybrid of the others. It assumes a mild geometric restriction on the level set and it requires a pilot nonparametric estimator of the density. One interesting open question concerns the performance of these methods. In this work, existing methods are reviewed, and two new hybrid algorithms are proposed. Their practical behaviour is compared through an extensive simulation study.

In this paper, we propose a robust bandwidth selection method for local M-estimates used in nonparametric regression. We study the asymptotic behavior of the resulting estimates. We use the results of a Monte Carlo study to compare the performance of various competitors for moderate sample sizes. It appears that the robust plug-in bandwidth selector we propose compares favorably to its competitors, despite the need to select a pilot bandwidth. The Monte Carlo study shows that the robust plug-in bandwidth selector is very stable and relatively insensitive to the choice of the pilot.

Statistical learning is emerging as a promising field where a number of algorithms from machine learning are interpreted as statistical methods and vice-versa. Due to good practical performance, boosting is one of the most studied machine learning techniques. We propose algorithms for multivariate density estimation and classification. They are generated by using the traditional kernel techniques as weak learners in boosting algorithms. Our algorithms take the form of multistep estimators, whose first step is a standard kernel method. Some strategies for bandwidth selection are also discussed with regard both to the standard kernel density classification problem, and to our 'boosted' kernel methods. Extensive experiments, using real and simulated data, show an encouraging practical relevance of the findings. Standard kernel methods are often outperformed by the first boosting iterations, and this holds for several bandwidth values. In addition, the practical effectiveness of our classification algorithm is confirmed by a comparative study on two real datasets, the competitors being trees including AdaBoosting with trees.

The use of a kernel estimator as a smooth estimator for a distribution function has been suggested by many authors. An expression for the bandwidth that minimizes the mean integrated square error asymptotically has been available for some time. However, few practical data-based methods for estimating this bandwidth have been investigated. In this paper we propose a multistage plug-in type estimator for this optimal bandwidth and derive its asymptotic properties. In particular we show that two stages are required for good asymptotic properties. This behavior is verified for finite samples using a simulation study.
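
For readers unfamiliar with the smooth distribution function estimator this abstract refers to, the sketch below shows one common form of it, using a Gaussian kernel: the average of normal distribution functions centred at the observations. The bandwidth value, the function names and the data are illustrative assumptions; the multistage plug-in bandwidth proposed in the paper is not implemented here.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_cdf(data, h, x):
    """Kernel distribution function estimate at x with bandwidth h.

    Each observation contributes a normal CDF centred at that point
    with standard deviation h, instead of the jump of the empirical CDF.
    """
    return sum(std_normal_cdf((x - xi) / h) for xi in data) / len(data)


def empirical_cdf(data, x):
    """Ordinary empirical distribution function, for comparison."""
    return sum(1 for xi in data if xi <= x) / len(data)


# Hypothetical data and a placeholder bandwidth.
observations = [1.0, 2.0, 3.0, 4.0]
h = 0.5
for x in (0.0, 2.5, 5.0):
    print(x, empirical_cdf(observations, x), kernel_cdf(observations, h, x))
```

As h shrinks toward zero the smooth estimate approaches the empirical distribution function away from the data points, which is why the choice of bandwidth matters for the quality of the smoothed estimate.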

A plug-in selector for the number of interior knots (NIKs) is proposed for polynomial spline estimation in nonparametric regression. The existence and properties of the optimal NIKs for spline regression are established by minimising the weighted mean integrated squared error. We obtain plug-in formulae for the optimal NIKs based on the theoretical results of asymptotic optimality, and develop strategies for choosing the NIKs of the spline estimator. The proposed NIKs selection method is tested on our simulated data with quite satisfactory performance, and is illustrated by analysing a fossil data set.

In order to explore and compare a finite number T of data sets by applying functional principal component analysis (FPCA) to the T associated probability density functions, we estimate these density functions by using the multivariate kernel method. The data set sizes being fixed, we study the behaviour of this FPCA under the assumption that all the bandwidth matrices used in the estimation of densities are proportional to a common parameter h and proportional to either the variance matrices or the identity matrix. In this context, we propose a selection criterion of the parameter h which depends only on the data and the FPCA method. Then, on simulated examples, we compare the quality of approximation of the FPCA when the bandwidth matrices are selected using either the previous criterion or two other classical bandwidth selection methods, that is, a plug-in or a cross-validation method.

This paper investigates nonparametric estimation of density on [0, 1]. The kernel estimator of density on [0, 1] has been found to be sensitive to both bandwidth and kernel. This paper proposes a unified Bayesian framework for choosing both the bandwidth and kernel function. In a simulation study, the Bayesian bandwidth estimator performed better than others, and kernel estimators were sensitive to the choice of the kernel and the shapes of the population densities on [0, 1]. The simulation and empirical results demonstrate that the methods proposed in this paper can improve the way the probability densities on [0, 1] are presently estimated.

Abstract. The performance of multivariate kernel density estimates depends crucially on the choice of bandwidth matrix, but progress towards developing good bandwidth matrix selectors has been relatively slow. In particular, previous studies of cross-validation (CV) methods have been restricted to biased and unbiased CV selection of diagonal bandwidth matrices. However, for certain types of target density the use of full (i.e. unconstrained) bandwidth matrices offers the potential for significantly improved density estimation. In this paper, we generalize earlier work from diagonal to full bandwidth matrices, and develop a smooth cross-validation (SCV) methodology for multivariate data. We consider optimization of the SCV technique with respect to a pilot bandwidth matrix. All the CV methods are studied using asymptotic analysis, simulation experiments and real data analysis. The results suggest that SCV for full bandwidth matrices is the most reliable of the CV methods. We also observe that experience from the univariate setting can sometimes be a misleading guide for understanding bandwidth selection in the multivariate case.

On the one hand, kernel density estimation has become a common tool for empirical studies in any research area. This goes hand in hand with the fact that this kind of estimator is now provided by many software packages. On the other hand, the discussion on bandwidth selection has been going on for about three decades. Although a good part of the discussion is about nonparametric regression, this parameter choice is by no means less problematic for density estimation. This becomes obvious when reading empirical studies in which practitioners have made use of kernel densities. New contributions typically provide simulations only to show that their own selector outperforms some of the existing methods. We review existing methods and compare them on a set of designs that exhibit few bumps and exponentially falling tails. We concentrate on small and moderate sample sizes because for large ones the differences between consistent methods are often negligible, at least for practitioners. As a byproduct we find that a mixture of simple plug-in and cross-validation methods produces bandwidths with a quite stable performance.

The existence and properties of optimal bandwidths for multivariate local linear regression are established, using either a scalar bandwidth for all regressors or a diagonal bandwidth vector that has a different bandwidth for each regressor. Both involve functionals of the derivatives of the unknown multivariate regression function. Estimating these functionals is difficult primarily because they contain multivariate derivatives. In this paper, an estimator of the multivariate second derivative is obtained via local cubic regression with most cross-terms left out. This estimator has the optimal rate of convergence but is simpler and uses much less computing time than the full local estimator. Using this as a pilot estimator, we obtain plug-in formulae for the optimal bandwidth, both scalar and diagonal, for multivariate local linear regression. As a simpler alternative, we also provide rule-of-thumb bandwidth selectors. All these bandwidths have satisfactory performance in our simulation study.

Interval-grouped data arise, in general, when the event of interest cannot be directly observed and it is only known to have occurred within an interval. In this framework, a nonparametric kernel density estimator is proposed and studied. The approach is based on the classical Parzen–Rosenblatt estimator and on the generalisation of the binned kernel density estimator. The asymptotic bias and variance of the proposed estimator are derived under usual assumptions, and the effect of using non-equally spaced grouped data is analysed. Additionally, a plug-in bandwidth selector is proposed. Through a comprehensive simulation study, the behaviour of both the estimator and the plug-in bandwidth selector considering different scenarios of data grouping is shown. An application to real data confirms the simulation results, revealing the good performance of the estimator whenever data are not heavily grouped.

Consider a regression model where the regression function is the sum of a linear and a nonparametric component. Assuming that the errors of the model follow a stationary strong mixing process with mean zero, the problem of bandwidth selection for a kernel estimator of the nonparametric component is addressed here. We obtain an asymptotic expression for an optimal bandwidth and we propose to use a plug-in methodology in order to estimate this bandwidth through preliminary estimates of the unknown quantities. Asymptotic optimality for the plug-in bandwidth is established.

Integrated squared density derivatives are important to the plug-in type of bandwidth selector for kernel density estimation. Conventional estimators of these quantities are inefficient when there is a non-smooth boundary in the support of the density. We introduce estimators that utilize density derivative estimators obtained from local polynomial fitting. They retain the rates of convergence in mean-squared error that are familiar from non-boundary cases, and the constant coefficients have similar forms. The estimators and the formula for their asymptotically optimal bandwidths, which depend on integrated products of density derivatives, are applied to automatic bandwidth selection for local linear density estimation. Simulation studies show that the constructed bandwidth rule and the Sheather–Jones bandwidth are competitive in non-boundary cases, but the former overcomes boundary problems whereas the latter does not.
