Kernel estimation of probability density functions is considered when ranked-set samples are available. The properties of the resulting estimators are derived for small and large samples, while performance with respect to the usual simple random sample estimators is investigated for a range of probability density models.

Since the late 1980s, several methods have been considered in the literature to reduce the sample variability of the least-squares cross-validation bandwidth selector for kernel density estimation. In this article, a weighted version of this classical method is proposed and its asymptotic and finite-sample behavior is studied. The simulation results attest that the weighted cross-validation bandwidth performs quite well, presenting a better finite-sample performance than the standard cross-validation method for “easy-to-estimate” densities, and retaining the good finite-sample performance of the standard cross-validation method for “hard-to-estimate” ones.
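
As a point of reference for the abstract above, here is a minimal sketch of the classical (unweighted) least-squares cross-validation selector that the weighted method modifies. It is not the weighted selector itself, whose weights the abstract does not specify. The Gaussian kernel, the grid search, and the names `normal_pdf`, `lscv_score` and `lscv_bandwidth` are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv_score(x, h):
    """Classical LSCV criterion for a Gaussian-kernel density estimate.

    LSCV(h) = integral of fhat^2 - (2/n) * sum_i fhat_{-i}(x_i).
    For the Gaussian kernel the integral has a closed form: the
    convolution of two N(0, h^2) kernels is an N(0, 2 h^2) density.
    """
    n = x.size
    d = x[:, None] - x[None, :]                      # pairwise differences
    int_f2 = normal_pdf(d, h * np.sqrt(2.0)).sum() / n**2
    k = normal_pdf(d, h)
    # Leave-one-out estimate at each x_i: drop the diagonal term K_h(0).
    loo = (k.sum(axis=1) - normal_pdf(0.0, h)) / (n - 1)
    return int_f2 - 2.0 * loo.mean()

def lscv_bandwidth(x, grid):
    """Return the grid value that minimizes the LSCV criterion."""
    scores = np.array([lscv_score(x, h) for h in grid])
    return grid[int(np.argmin(scores))]

# Illustrative data only: 200 draws from a standard normal.
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 60)
h_cv = lscv_bandwidth(sample, grid)
```

The high sample variability of `h_cv` across repeated draws is the behavior that the variance-reduction proposals in this literature aim to improve.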

Summary.  Compared with the classical backfitting of Buja, Hastie and Tibshirani, the smooth backfitting estimator (SBE) of Mammen, Linton and Nielsen not only provides complete asymptotic theory under weaker conditions but is also more efficient, robust and easier to calculate. However, the original paper describing the SBE method is complex and the practical as well as the theoretical advantages of the method have still neither been recognized nor accepted by the statistical community. We focus on a clear presentation of the idea, the main theoretical results and practical aspects like implementation and simplification of the algorithm. We introduce a feasible cross-validation procedure and apply it to the problem of data-driven bandwidth choice for the SBE. By simulations it is shown that the SBE and our cross-validation work very well indeed. In particular, the SBE is less affected by sparseness of data in high dimensional regression problems or strongly correlated designs. The SBE has reasonable performance even in 100-dimensional additive regression problems.

We propose a method for simultaneously estimating a density function and its derivatives based on a recursive reduction of bias in the usual one-step estimator, in effect a jackknife. The procedure is computationally simple and requires only the inversion of a triangular matrix with easily calculated elements.

Local quasi-likelihood estimation is a useful extension of local least squares methods, but its computational cost and algorithmic convergence problems make the procedure less appealing, particularly when it is iteratively used in methods such as the back-fitting algorithm, cross-validation and bootstrapping. A one-step local quasi-likelihood estimator is introduced to overcome the computational drawbacks of the local quasi-likelihood method. We demonstrate that as long as the initial estimators are reasonably good, the one-step estimator has the same asymptotic behaviour as the local quasi-likelihood method. Our simulation shows that the one-step estimator performs at least as well as the local quasi-likelihood method for a wide range of choices of bandwidths. A data-driven bandwidth selector is proposed for the one-step estimator based on the pre-asymptotic substitution method of Fan and Gijbels. It is then demonstrated that the data-driven one-step local quasi-likelihood estimator performs as well as the maximum local quasi-likelihood estimator by using the ideal optimal bandwidth.

The performance of data-driven bandwidth selection procedures in local polynomial regression is investigated by using asymptotic methods and simulation. The bandwidth selection procedures considered are based on minimizing 'prelimit' approximations to the (conditional) mean-squared error (MSE) when the MSE is considered as a function of the bandwidth h. We first consider approximations to the MSE that are based on Taylor expansions around h=0 of the bias part of the MSE. These approximations lead to estimators of the MSE that are accurate only for small bandwidths h. We also consider a bias estimator which, instead of using small-h approximations to bias, naïvely estimates bias as the difference of two local polynomial estimators of different order, and we show that this estimator performs well only for moderate to large h. We next define a hybrid bias estimator which equals the Taylor-expansion-based estimator for small h and the difference estimator for moderate to large h. We find that the MSE estimator based on this hybrid bias estimator leads to a bandwidth selection procedure with good asymptotic and, for our Monte Carlo examples, finite-sample properties.

Partitioned cross-validation is proposed as a method for overcoming the large amounts of across-sample variability to which ordinary cross-validation is subject. The price for cutting down on the sample noise is that a type of bias is introduced. A theory is presented for optimal trade-off of this variance and bias. Comparison with other bandwidth selection methods is given.

The expected inactivity time (EIT) function (also known as the mean past lifetime function) is a well-known reliability function which has application in many disciplines such as survival analysis, actuarial studies and forensic science, to name but a few. In this paper, we use a fixed design local polynomial fitting technique to obtain estimators for the EIT function when the lifetime random variable has an unknown distribution. It will be shown that the proposed estimators are asymptotically unbiased, consistent and also, when standardized, have an asymptotic normal distribution. An optimal bandwidth, which minimizes the AMISE (asymptotic mean integrated squared error) of the estimator, is derived. Numerical examples based on simulated samples from various lifetime distributions common in reliability studies will be presented to evaluate the performance of these estimators. Finally, three real-life applications will also be presented to further illustrate the wide applicability of these estimators.

Nonparametric regression techniques have been studied extensively in the literature in recent years due to their flexibility. In addition, robust versions of these techniques have become popular and have been incorporated into some of the standard statistical analysis packages. With new techniques available comes the responsibility of using them properly and in appropriate situations. Often, as in the case presented here, model-fitting diagnostics, such as cross-validation statistics, are not available as tools to determine if the smoothing parameter value being used is preferable to some other arbitrarily chosen value.

Our article presents a general treatment of the linear regression model, in which the error distribution is modelled nonparametrically and the error variances may be heteroscedastic, thus eliminating the need to transform the dependent variable in many data sets. The mean and variance components of the model may be either parametric or nonparametric, with parsimony achieved through variable selection and model averaging. A Bayesian approach is used for inference with priors that are data-based so that estimation can be carried out automatically with minimal input by the user. A Dirichlet process mixture prior is used to model the error distribution nonparametrically; when there are no regressors in the model, the method reduces to Bayesian density estimation, and we show that in this case the estimator compares favourably with a well-regarded plug-in density estimator. We also consider a method for checking the fit of the full model. The methodology is applied to a number of simulated and real examples and is shown to work well.

In kernel density estimation, a criticism of bandwidth selection techniques which minimize squared error expressions is that they perform poorly when estimating tails of probability density functions. Techniques minimizing absolute error expressions are thought to result in more uniform performance and be potentially superior. An asymptotic mean absolute error expression for nonparametric kernel density estimators from right-censored data is developed here. This expression is used to obtain local and global bandwidths that are optimal in the sense that they minimize asymptotic mean absolute error and integrated asymptotic mean absolute error, respectively. These estimators are illustrated for eight data sets from known distributions. Computer simulation results are discussed, comparing the estimation methods with squared-error-based bandwidth selection for right-censored data.

Selecting predictors to optimize the outcome prediction is an important statistical method. However, it usually ignores the false positives in the selected predictors. In this article, we advocate a conventional stepwise forward variable selection method based on the predicted residual sum of squares, and develop a positive false discovery rate (pFDR) estimate for the selected predictor subset, and a local pFDR estimate to prioritize the selected predictors. This pFDR estimate takes account of the existence of non-null predictors, and is proved to be asymptotically conservative. In addition, we propose two views of a variable selection process: an overall and an individual test. An interesting feature of the overall test is that its power of selecting non-null predictors increases with the proportion of non-null predictors among all candidate predictors. Data analysis is illustrated with an example, in which genetic and clinical predictors were selected to predict the cholesterol level change after four months of tamoxifen treatment, and pFDR was estimated. Our method's performance is evaluated through statistical simulations.
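
The selection step named in the abstract above, forward stepwise selection driven by the predicted residual sum of squares (PRESS), can be sketched as follows. This covers only the conventional selection step and not the pFDR estimates, which the abstract does not describe in enough detail to reproduce. The stopping rule (stop when PRESS no longer decreases), the always-included intercept, and the names `press` and `forward_press` are assumptions for illustration.

```python
import numpy as np

def press(X, y):
    """PRESS for ordinary least squares: sum_i (e_i / (1 - h_ii))^2.

    X must have full column rank. Leverages h_ii are the squared row
    norms of Q from a reduced QR decomposition of X.
    """
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)
    resid = y - Q @ (Q.T @ y)
    return float(np.sum((resid / (1.0 - leverage)) ** 2))

def forward_press(X, y):
    """Greedy forward selection: add the column that lowers PRESS most.

    Assumed stopping rule: stop once no candidate lowers PRESS.
    Returns the selected column indices (in order of entry) and the
    final PRESS value.
    """
    n, p = X.shape
    ones = np.ones((n, 1))                 # intercept is always in the model
    selected = []
    best = press(ones, y)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [press(np.hstack([ones, X[:, selected + [j]]]), y)
                  for j in candidates]
        k = int(np.argmin(scores))
        if scores[k] >= best:
            break
        best = scores[k]
        selected.append(candidates[k])
    return selected, best

# Illustrative data only: columns 0 and 2 carry signal, the rest are noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + 0.5 * rng.normal(size=200)
chosen, final_press = forward_press(X_demo, y_demo)
```

Because the rule admits any column that lowers PRESS, `chosen` can contain noise columns alongside the true ones, which is the false-positive issue the abstract raises.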

Spatial point pattern data sets are commonplace in a variety of different research disciplines. The use of kernel methods to smooth such data is a flexible way to explore spatial trends and make inference about underlying processes without, or perhaps prior to, the design and fitting of more intricate semiparametric or parametric models to quantify specific effects. The long-standing issue of ‘optimal’ data-driven bandwidth selection is complicated in these settings by issues such as high heterogeneity in observed patterns and the need to consider edge correction factors. We scrutinize bandwidth selectors built on leave-one-out cross-validation approximation to likelihood functions. A key outcome relates to previously unconsidered adaptive smoothing regimens for spatiotemporal density and multitype conditional probability surface estimation, whereby we propose a novel simultaneous pilot-global selection strategy. Motivated by applications in epidemiology, the results of both simulated and real-world analyses suggest this strategy to be largely preferable to classical fixed-bandwidth estimation for such data.

In this note, we propose a new method for selecting the bandwidth parameter in non-parametric regression. While standard criteria, such as cross-validation, are based on the true regression curve about which we know little, we propose a criterion which focuses on the true errors about which assumptions may be made. Our proposal is to choose the bandwidth for which the residuals are as uncorrelated as possible. We use the Box-Pierce statistic as the objective to be minimized. In doing so, the behaviour of our residuals will be close to that of the true errors under the hypothesis of independent errors. A simulation study shows that our method succeeds in capturing the main features of the regression curve, in the sense that the number of turning-points of the curve is correctly estimated most of the time.
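
The criterion in the abstract above can be illustrated with a short sketch: fit a kernel smoother for each candidate bandwidth, compute the Box-Pierce statistic of the residuals ordered by the design points, and keep the bandwidth with the smallest value. The abstract does not say which smoother or how many lags the authors use, so the Nadaraya-Watson estimator with a Gaussian kernel, the default of five lags, and the names `nw_fit`, `box_pierce` and `select_bandwidth` are all assumptions.

```python
import numpy as np

def nw_fit(x, y, h):
    """Nadaraya-Watson fitted values at the design points (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def box_pierce(resid, max_lag):
    """Box-Pierce statistic Q = n * sum_{k=1}^{max_lag} r_k^2,
    where r_k is the lag-k sample autocorrelation of the residuals."""
    e = resid - resid.mean()
    n = e.size
    denom = np.sum(e ** 2)
    r = np.array([np.sum(e[k:] * e[:-k]) / denom
                  for k in range(1, max_lag + 1)])
    return n * np.sum(r ** 2)

def select_bandwidth(x, y, grid, max_lag=5):
    """Pick the grid bandwidth whose residuals look least autocorrelated."""
    order = np.argsort(x)                  # residual order follows the design
    xs, ys = x[order], y[order]
    scores = np.array([box_pierce(ys - nw_fit(xs, ys, h), max_lag)
                       for h in grid])
    return grid[int(np.argmin(scores))], scores

# Illustrative data only: a sine curve observed with independent noise.
rng = np.random.default_rng(2)
x_demo = rng.uniform(0.0, 1.0, size=150)
y_demo = np.sin(2.0 * np.pi * x_demo) + 0.3 * rng.normal(size=150)
h_grid = np.linspace(0.02, 0.5, 40)
h_bp, bp_scores = select_bandwidth(x_demo, y_demo, h_grid)
```

The grid should stay away from bandwidths so small that the smoother nearly interpolates the data, since the autocorrelations are then computed from near-zero residuals and become numerically unreliable.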

A linear regression method to predict a scalar from a discretized smooth function is presented. The method takes into account the functional nature of the predictors and the importance of the second derivative in spectroscopic applications. This motivates a functional inner product that can be used as a roughness penalty. Using this inner product, we derive a linear prediction method that is similar to ridge regression but with different shrinkage characteristics. We describe its practical implementation and we address the problem of computing the second derivatives nonparametrically. We apply the method to a calibration example using near infra-red spectra. We conclude with a discussion comparing our approach with other regression algorithms.

We propose a new nonparametric estimator for the density function of multivariate bounded data. As frequently observed in practice, the variables may be partially bounded (e.g. nonnegative) or completely bounded (e.g. in the unit interval). In addition, the variables may have a point mass. We reduce the conditions on the underlying density to a minimum by proposing a nonparametric approach. By using a gamma, a beta, or a local linear kernel (also called boundary kernels) in a product kernel, the suggested estimator becomes simple to implement and robust to the well-known boundary bias problem. We investigate the mean integrated squared error properties, including the rate of convergence, uniform strong consistency and asymptotic normality. We establish consistency of the least squares cross-validation method to select optimal bandwidth parameters. A detailed simulation study investigates the performance of the estimators. Applications using lottery and corporate finance data are provided.

The mean squared error (MSE)-minimizing local variable bandwidth for the univariate local linear estimator (the LL) is well-known. This bandwidth does not stabilize variance over the domain. Moreover, in regions where a regression function has zero curvature, the LL estimator is discontinuous. In this paper, we propose a variance-stabilizing (VS) local variable diagonal bandwidth matrix for the multivariate LL estimator. Theoretically, the VS bandwidth can outperform the multivariate extension of the MSE-minimizing local variable scalar bandwidth in terms of asymptotic mean integrated squared error and can avoid discontinuity created by the MSE-minimizing bandwidth. We present an algorithm for estimating the VS bandwidth and simulation studies.

