The existence and properties of optimal bandwidths for multivariate local linear regression are established, using either a scalar bandwidth for all regressors or a diagonal bandwidth vector that has a different bandwidth for each regressor. Both involve functionals of the derivatives of the unknown multivariate regression function. Estimating these functionals is difficult primarily because they contain multivariate derivatives. In this paper, an estimator of the multivariate second derivative is obtained via local cubic regression with most cross-terms left out. This estimator has the optimal rate of convergence but is simpler and uses much less computing time than the full local estimator. Using this as a pilot estimator, we obtain plug-in formulae for the optimal bandwidth, both scalar and diagonal, for multivariate local linear regression. As a simpler alternative, we also provide rule-of-thumb bandwidth selectors. All these bandwidths have satisfactory performance in our simulation study.


The most important factor in kernel regression is the choice of bandwidth. Considerable attention has been paid to extending the idea of an iterative method known from kernel density estimation to kernel regression. Data-driven selectors of the bandwidth for kernel regression are considered. The proposed method is based on an optimally balanced relation between the integrated variance and the integrated squared bias. This approach leads to an iterative, quadratically convergent process. The analysis of statistical properties shows the rationale of the proposed method, and its consistency is established. The utility of the method is illustrated through a simulation study and real data applications.

Many biological experiments involve data whose distribution belongs to the exponential family. Such data are often analysed using generalised linear models, but this method requires specification of the link function, which can have a strong influence on the resulting estimate. Instead a local method based on quasi-likelihood can be used, but the choice of the smoothing parameter is crucial for its performance. A bootstrap bandwidth selection method is proposed and shown to be consistent. Examples of application to data from biological and psychometric experiments are given.

Approximate confidence intervals are given for the lognormal regression problem. The error in the nominal level can be reduced to $O(n^{-2})$, where n is the sample size. An alternative procedure is given which avoids the non-robust assumption of lognormality. This amounts to finding a confidence interval based on M-estimates for a general smooth function of both the parameters of the general (possibly nonlinear) regression problem and F, the unknown distribution function of the residuals. The derived intervals are compared using theory, simulation and real data sets.

A bandwidth selection method that combines the concept of least-squares cross-validation and the plug-in approach is introduced in connection with kernel density estimation. A simulation study reveals that this hybrid methodology outperforms some commonly used bandwidth selection rules. It is shown that the proposed approach can also be readily employed in the context of variable kernel density estimation. We conclude with two illustrative examples.
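As a rough illustration of the least-squares cross-validation ingredient mentioned above (not the hybrid selector itself, whose details the abstract does not give), the following sketch scores a grid of bandwidths for a Gaussian-kernel density estimate. The function names `normal_pdf` and `lscv_score`, the simulated data and the bandwidth grid are all assumptions made for the example.

```python
import numpy as np

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv_score(data, h):
    """Least-squares cross-validation criterion for a Gaussian-kernel KDE.

    LSCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(X_i),
    where fhat_{-i} is the leave-one-out estimate. For the Gaussian kernel
    the integral has a closed form: a kernel with scale sqrt(2) * h.
    """
    n = len(data)
    d = data[:, None] - data[None, :]           # all pairwise differences
    int_f2 = normal_pdf(d, np.sqrt(2.0) * h).sum() / n ** 2
    k = normal_pdf(d, h)
    # Drop the diagonal (i == j) terms to get the leave-one-out average.
    loo = (k.sum() - np.trace(k)) / (n * (n - 1))
    return int_f2 - 2.0 * loo

# Hypothetical data: 300 draws from a standard normal density.
rng = np.random.default_rng(0)
data = rng.standard_normal(300)
grid = np.linspace(0.05, 1.5, 60)               # assumed candidate bandwidths
scores = np.array([lscv_score(data, h) for h in grid])
h_lscv = grid[int(np.argmin(scores))]
print("LSCV bandwidth:", h_lscv)
```

A plug-in rule would instead estimate the unknown functionals in the asymptotically optimal bandwidth formula; how the two are combined is specific to the paper and is not reproduced here.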

As conventional cross-validation bandwidth selection methods do not work properly in the situation where the data are serially dependent time series, alternative bandwidth selection methods are necessary. In recent years, Bayesian-based methods for global bandwidth selection have been studied. Our experience shows, however, that a global bandwidth is less suitable than a localized bandwidth in kernel density estimation based on serially dependent time series data. Nonetheless, a difficult issue is how we can consistently estimate a localized bandwidth. This paper presents a nonparametric localized bandwidth estimator, for which we establish a completely new asymptotic theory. Applications of this new bandwidth estimator to the kernel density estimation of Eurodollar deposit rate and the S&P 500 daily return demonstrate the effectiveness and competitiveness of the proposed localized bandwidth.

Global sensitivity analysis (GSA) can help practitioners focus on the inputs whose uncertainties have an impact on the model output, which makes it possible to reduce the complexity of the model. Screening, the qualitative method of GSA, aims to identify and exclude non- or less-influential input variables in high-dimensional models. However, for non-parametric problems, there remains the challenging problem of finding an efficient screening procedure, as one needs to properly handle the non-parametric high-order interactions among input variables and keep the size of the screening experiment economically feasible. In this study, we design a novel screening approach based on analysis of variance decomposition of the model. This approach combines the virtues of run-size economy and model independence. The core idea is to choose a low-level complete orthogonal array to derive the sensitivity estimates for all input factors and their interactions at low cost, and then develop a statistical process to screen out the non-influential ones without assuming the effect-sparsity of the model. Simulation studies show that the proposed approach performs well in various settings.

In this paper, a new non-parametric multivariate exponentially weighted moving average (NMEWMA) sign chart is proposed for monitoring the process dispersion. The run length characteristics of the NMEWMA sign chart are computed with the help of Markov chain and Monte Carlo simulations. Moreover, the NMEWMA sign chart is also used to detect changes in the process mean and dispersion simultaneously. An illustrative example is also used to explain the implementation of the proposed control chart.

A new, fully data-driven bandwidth selector with a double smoothing (DS) bias term and a data-driven variance estimator is developed following the bootstrap idea. The data-driven variance estimation does not involve any additional bandwidth selection. The proposed bandwidth selector converges faster than a plug-in one due to the DS bias estimate, whereas the data-driven variance clearly improves its finite sample performance and makes it stable. Asymptotic results for the proposals are obtained. A comparative simulation study was done to show the overall gains and the gains obtained by improving either the bias term or the variance estimate, respectively. It is shown that the use of a good variance estimator is more important when the sample size is relatively small.

Let $\hat{f}_{n,h}$ denote the kernel density estimate based on a sample of size n drawn from an unknown density f. Using techniques from L2 projection density estimators, the author shows how to construct a data-driven estimator $\hat{f}_{n,h}$ which satisfies [...]. This paper is inspired by work of Stone (1984), Devroye and Lugosi (1996) and Birge and Massart (1997).

A computationally simple method for estimating finite-population quantiles in the presence of auxiliary information is proposed. An algorithm is also found for implementing related approaches for estimating quantiles, including that of Rao et al. (1990), obtained from inverting difference-type estimators of the distribution function. The proposed estimation procedure can be seen as a one-step iteration of the suggested algorithm and is asymptotically equivalent to the limiting estimator. In particular, the proposed method yields a simple and efficient way of approximating Rao et al.'s estimator. Simulation studies based on two real populations show that the approximation can be very satisfactory even for small to moderate samples.

Consider a regression model where the regression function is the sum of a linear and a nonparametric component. Assuming that the errors of the model follow a stationary strong mixing process with mean zero, the problem of bandwidth selection for a kernel estimator of the nonparametric component is addressed here. We obtain an asymptotic expression for an optimal bandwidth and we propose to use a plug-in methodology in order to estimate this bandwidth through preliminary estimates of the unknown quantities. Asymptotic optimality for the plug-in bandwidth is established.

Summary. The paper presents a general strategy for selecting the bandwidth of nonparametric regression estimators and specializes it to local linear regression smoothers. The procedure requires the sample to be divided into a training sample and a testing sample. Using the training sample we first compute a family of regression smoothers indexed by their bandwidths. Next we select the bandwidth by minimizing the empirical quadratic prediction error on the testing sample. The resulting bandwidth satisfies a finite sample oracle inequality which holds for all bounded regression functions. This permits asymptotically optimal estimation for nearly any regression function. The practical performance of the method is illustrated by a simulation study which shows good finite sample behaviour of our method compared with other bandwidth selection procedures.
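The training/testing procedure described in this abstract can be sketched as follows for a single regressor. This is only a minimal illustration under stated assumptions: the Gaussian kernel, the 50/50 random split, the bandwidth grid, the simulated data and the helper names `local_linear_fit` and `select_bandwidth_holdout` are all choices made for the example, not details taken from the paper.

```python
import numpy as np

def local_linear_fit(x_train, y_train, x_eval, h):
    """Local linear regression estimate at each point of x_eval.

    Uses a Gaussian kernel with bandwidth h (an assumed kernel choice).
    """
    preds = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        d = x_train - x0
        w = np.exp(-0.5 * (d / h) ** 2)
        # Weighted least squares of y on (1, d); the intercept is the fit at x0.
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * y_train).sum(), (w * d * y_train).sum()
        preds[i] = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
    return preds

def select_bandwidth_holdout(x, y, bandwidths, train_frac=0.5, seed=0):
    """Pick the bandwidth minimizing squared prediction error on a test split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    tr, te = idx[:n_train], idx[n_train:]
    # One smoother per bandwidth, fitted on the training part only.
    errors = np.array([
        np.mean((y[te] - local_linear_fit(x[tr], y[tr], x[te], h)) ** 2)
        for h in bandwidths
    ])
    return bandwidths[int(np.argmin(errors))], errors

# Hypothetical data: a noisy sine curve on [0, 1].
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(400)
grid = np.linspace(0.02, 0.5, 25)               # assumed candidate bandwidths
h_best, errs = select_bandwidth_holdout(x, y, grid)
print("selected bandwidth:", h_best)
```

The grid is kept away from very small bandwidths so that every test point has training points with non-negligible weight; a more careful implementation would guard against a vanishing denominator in the weighted least-squares solve.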


The literature on spurious regressions has found that the t-statistic for testing the null of no relationship between two independent variables diverges asymptotically under a wide variety of nonstationary data-generating processes for the dependent and explanatory variables. This paper introduces a simple method which guarantees convergence of this t-statistic to a pivotal limit distribution, thus allowing asymptotic inference. This method can be used to distinguish a genuine relationship from a spurious one among integrated processes. We apply the proposed procedure to several pairs of apparently independent integrated variables, and find that our procedure does not find (spurious) significant relationships.

It is common practice to compare the fit of non‐nested models using the Akaike (AIC) or Bayesian (BIC) information criteria. The basis of these criteria is the log‐likelihood evaluated at the maximum likelihood estimates of the unknown parameters. For the general linear model (and the linear mixed model, which is a special case), estimation is usually carried out using residual or restricted maximum likelihood (REML). However, for models with different fixed effects, the residual likelihoods are not comparable and hence information criteria based on the residual likelihood cannot be used. For model selection, it is often suggested that the models are refitted using maximum likelihood to enable the criteria to be used. The first aim of this paper is to highlight that both the AIC and BIC can be used for the general linear model by using the full log‐likelihood evaluated at the REML estimates. The second aim is to provide a derivation of the criteria under REML estimation. This aim is achieved by noting that the full likelihood can be decomposed into a marginal (residual) and conditional likelihood and this decomposition then incorporates aspects of both the fixed effects and variance parameters. Using this decomposition, the appropriate information criteria for model selection of models which differ in their fixed effects specification can be derived. An example is presented to illustrate the results and code is available for analyses using the ASReml‐R package.

This paper studies bandwidth selection for kernel estimation of derivatives of multidimensional conditional densities, a non-parametric realm unexplored in the literature. This paper extends Baird [Cross validation bandwidth selection for derivatives of multidimensional densities. RAND Working Paper series, WR-1060; 2014] in its examination of conditional multivariate densities, derives and presents criteria for arbitrary kernel order and density dimension, shows consistency of the estimators, and investigates a minimization criterion which jointly estimates numerator and denominator bandwidths. I conduct a Monte Carlo simulation study for various orders of kernels in the Gaussian family and compare the new cross validation criterion with those implied by Baird [Cross validation bandwidth selection for derivatives of multidimensional densities. RAND Working Paper series, WR-1060; 2014]. The paper finds that higher order kernels become increasingly important as the dimension of the distribution increases. I find that the cross validation criterion developed in this paper that jointly estimates the derivative of the joint density (numerator) and the marginal density (denominator) does orders of magnitude better than criteria that estimate the bandwidths separately. I further find that using the infinite order Dirichlet kernel tends to have the best results.

Many different methods have been proposed to construct nonparametric estimates of a smooth regression function, including local polynomial, (convolution) kernel and smoothing spline estimators. Each of these estimators uses a smoothing parameter to control the amount of smoothing performed on a given data set. In this paper an improved version of a criterion based on the Akaike information criterion (AIC), termed AICC, is derived and examined as a way to choose the smoothing parameter. Unlike plug-in methods, AICC can be used to choose smoothing parameters for any linear smoother, including local quadratic and smoothing spline estimators. The use of AICC avoids the large variability and tendency to undersmooth (compared with the actual minimizer of average squared error) seen when other 'classical' approaches (such as generalized cross-validation (GCV) or the AIC) are used to choose the smoothing parameter. Monte Carlo simulations demonstrate that the AICC-based smoothing parameter is competitive with a plug-in method (assuming that one exists) when the plug-in method works well but also performs well when the plug-in approach fails or is unavailable.
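To make the idea of an AICC-type criterion for a linear smoother concrete, here is a small sketch. The abstract does not state the formula; the code assumes the form commonly cited for this criterion, $\log \hat\sigma^2 + 1 + 2(\mathrm{tr}(H) + 1)/(n - \mathrm{tr}(H) - 2)$, where H is the smoother ("hat") matrix and $\hat\sigma^2$ is the mean squared residual. The Nadaraya-Watson smoother, the simulated data and the names `nw_smoother_matrix` and `aicc` are assumptions chosen for illustration.

```python
import numpy as np

def nw_smoother_matrix(x, h):
    """Hat matrix of a Nadaraya-Watson smoother with a Gaussian kernel.

    Any linear smoother (fitted values = H @ y) could be used instead.
    """
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return w / w.sum(axis=1, keepdims=True)     # each row sums to one

def aicc(y, H):
    """AICC-type score, assuming the commonly cited form described above."""
    n = len(y)
    resid = y - H @ y
    sigma2 = np.mean(resid ** 2)                # mean squared residual
    tr = np.trace(H)                            # effective number of parameters
    return np.log(sigma2) + 1.0 + 2.0 * (tr + 1.0) / (n - tr - 2.0)

# Hypothetical data: a noisy sine curve on [0, 1].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 150))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(150)
grid = np.linspace(0.01, 0.3, 30)               # assumed candidate bandwidths
scores = np.array([aicc(y, nw_smoother_matrix(x, h)) for h in grid])
h_aicc = grid[int(np.argmin(scores))]
print("AICC-selected bandwidth:", h_aicc)
```

Because the score depends on the smoother only through its fitted values and the trace of H, the same function applies unchanged to local polynomial or smoothing spline fits once their hat matrices are available.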

A supersaturated design (SSD) is a design whose run size is not enough for estimating all main effects. Such a design is commonly used in screening experiments to screen active effects based on the effect sparsity principle. Traditional approaches, such as the ordinary stepwise regression and the best subset variable selection, may not be appropriate in this situation. In this article, a new variable selection method is proposed based on the idea of staged dimensionality reduction. Simulations and several real data studies indicate that the newly proposed method is more effective than the existing data analysis methods.

We propose a new simulation method, SimSel, for variable selection in linear and nonlinear modelling problems. SimSel works by disturbing the input data with pseudo-errors. We then study how this disturbance affects the quality of an approximative model fitted to the data. The main idea is that disturbing unimportant variables does not affect the quality of the model fit. The use of an approximative model has the advantage that the true underlying function does not need to be known and that the method becomes insensitive to model misspecifications. We demonstrate SimSel on simulated data from linear and nonlinear models and on two real data sets. The simulation studies suggest that SimSel works well in complicated situations, such as nonlinear errors-in-variable models.
