Similar literature
 20 similar documents found (search time: 15 ms)
1.
Statistical learning is emerging as a promising field where a number of algorithms from machine learning are interpreted as statistical methods and vice versa. Due to its good practical performance, boosting is one of the most studied machine learning techniques. We propose algorithms for multivariate density estimation and classification. They are generated by using the traditional kernel techniques as weak learners in boosting algorithms. Our algorithms take the form of multistep estimators, whose first step is a standard kernel method. Some strategies for bandwidth selection are also discussed with regard to both the standard kernel density classification problem and our 'boosted' kernel methods. Extensive experiments, using real and simulated data, show an encouraging practical relevance of the findings. Standard kernel methods are often outperformed by the first boosting iterations, and this holds across several bandwidth values. In addition, the practical effectiveness of our classification algorithm is confirmed by a comparative study on two real datasets, the competitors being tree-based methods, including AdaBoost with trees.

2.
The use of a kernel estimator as a smooth estimator for a distribution function has been suggested by many authors. An expression for the bandwidth that minimizes the mean integrated squared error asymptotically has been available for some time. However, few practical data-based methods for estimating this bandwidth have been investigated. In this paper we propose a multistage plug-in type estimator for this optimal bandwidth and derive its asymptotic properties. In particular, we show that two stages are required for good asymptotic properties. This behavior is verified for finite samples using a simulation study.
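The kernel distribution function estimator mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel, so the estimate at x is the average of normal CDFs centred at the data points; the function name `kernel_cdf` and the fixed bandwidth `h` are illustrative assumptions, and the multistage plug-in bandwidth selector proposed in the paper is not reproduced here.

```python
import math


def kernel_cdf(x, sample, h):
    """Smooth estimate of F(x) using a Gaussian kernel with bandwidth h.

    Hypothetical helper for illustration: h is supplied by the caller,
    not chosen by the plug-in rule described in the abstract.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    total = 0.0
    for xi in sample:
        u = (x - xi) / h
        # Standard normal CDF: Phi(u) = 0.5 * (1 + erf(u / sqrt(2)))
        total += 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return total / len(sample)
```

As h shrinks towards zero this estimate approaches the empirical distribution function; larger h gives a smoother curve at the cost of more bias.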

3.
Kernel density estimation is an important tool in visualizing posterior densities from Markov chain Monte Carlo output. It is well known that when smooth transition densities exist, the asymptotic properties of the estimator agree with those for independent data. In this paper, we show that because of the rejection step of the Metropolis–Hastings algorithm, this is no longer true and the asymptotic variance will depend on the probability of accepting a proposed move. We find an expression for this variance and apply the result to algorithms for automatic bandwidth selection.

4.
We recently proposed a representation of the bivariate survivor function as a mapping of the hazard function for truncated failure time variates. The representation led to a class of estimators that includes van der Laan’s repaired nonparametric maximum likelihood estimator (NPMLE) as an important special case. We proposed a Greenwood-like variance estimator for the repaired NPMLE but found somewhat poor agreement between the empirical variance estimates and these analytic estimates for the sample sizes and bandwidths considered in our simulation study. The simulation results also confirmed those of others in showing slightly inferior performance for the repaired NPMLE compared to other competing estimators, as well as a sensitivity to bandwidth choice in moderate-sized samples. Despite its attractive asymptotic properties, the repaired NPMLE has drawbacks that hinder its practical application. This paper presents a modification of the repaired NPMLE that improves its performance in moderate-sized samples and renders it less sensitive to the choice of bandwidth. Along with this modified estimator, more extensive simulation studies of the repaired NPMLE and Greenwood-like variance estimates are presented. The methods are then applied to a real data example. This revised version was published online in September 2005 with a correction to the second author's name.

5.
This paper develops non-parametric techniques for dynamic models whose data have unknown probability distributions. Point estimators are obtained from the maximization of a semiparametric likelihood function built on the kernel density of the disturbances. This approach can also provide Kullback–Leibler cross-validation estimates of the bandwidth of the kernel densities. Confidence regions are derived from the dual-empirical likelihood method based on non-parametric estimates of the scores. Limit theorems for martingale difference sequences support the statistical theory; moreover, simulation experiments and a real case study show the validity of the methods.

6.
This paper focuses on bivariate kernel density estimation that bridges the gap between univariate and multivariate applications. We propose a subsampling-extrapolation bandwidth matrix selector that improves the reliability of the conventional cross-validation method. The proposed procedure combines a U-statistic expression of the mean integrated squared error and asymptotic theory, and can be used in both cases of diagonal bandwidth matrix and unconstrained bandwidth matrix. In the subsampling stage, one takes advantage of the reduced variability of estimating the bandwidth matrix at a smaller subsample size m (m < n); in the extrapolation stage, a simple linear extrapolation is used to remove the incurred bias. Simulation studies reveal that the proposed method reduces the variability of the cross-validation method by about 50% and achieves an expected integrated squared error that is up to 30% smaller than that of the benchmark cross-validation. It shows comparable or improved performance compared to other competitors across six distributions in terms of the expected integrated squared error. We prove that the components of the selected bivariate bandwidth matrix have an asymptotic multivariate normal distribution, and also present the relative rate of convergence of the proposed bandwidth selector.
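For orientation, the conventional cross-validation benchmark that this abstract improves on can be sketched in its simplest form. The code below is an illustration under stated assumptions, not the paper's subsampling-extrapolation selector: it covers only the univariate case with a Gaussian kernel and a scalar bandwidth, and the names `lscv_score` and `lscv_bandwidth` as well as the grid search are hypothetical choices. The criterion is the usual least-squares cross-validation score, the integral of the squared density estimate minus twice the average leave-one-out density at the data points.

```python
import math


def normal_pdf(d, s):
    """Density of N(0, s^2) evaluated at d."""
    return math.exp(-0.5 * (d / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def lscv_score(x, h):
    """Least-squares cross-validation score for a Gaussian-kernel KDE.

    Uses the closed form for the Gaussian kernel: the integral of the
    squared estimate is a double sum of N(0, 2h^2) densities.
    """
    n = len(x)
    if n < 2 or h <= 0:
        raise ValueError("need at least two points and a positive bandwidth")
    int_f2 = 0.0  # accumulates the double sum for the integral of f_hat^2
    loo = 0.0     # accumulates kernel values over pairs with i != j
    for i in range(n):
        for j in range(n):
            d = x[i] - x[j]
            int_f2 += normal_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                loo += normal_pdf(d, h)
    return int_f2 / n ** 2 - 2.0 * loo / (n * (n - 1))


def lscv_bandwidth(x, grid):
    """Return the candidate bandwidth in grid with the smallest score."""
    return min(grid, key=lambda h: lscv_score(x, h))
```

A call such as `lscv_bandwidth(data, [0.1, 0.2, 0.4, 0.8])` picks the best candidate from the grid. The score is computed from all pairs of points, so its minimizer can vary considerably from sample to sample, which is the variability the abstract refers to.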

7.
This paper studies a functional coefficient time series model with trending regressors, where the coefficients are unknown functions of time and random variables. We propose a local linear estimation method to estimate the unknown coefficient functions, and establish the corresponding asymptotic theory under mild conditions. We also develop a test procedure to see if the functional coefficients take particular parametric forms. For practical use, we further propose a Bayesian approach to select the bandwidths, and conduct several numerical experiments to examine the finite sample performance of our proposed local linear estimator and the test procedure. The results show that the local linear estimator works well and the proposed test has satisfactory size and power. In addition, our simulation studies show that the Bayesian bandwidth selection method performs better than the cross-validation method. Furthermore, we use the functional coefficient model to study the relationship between consumption per capita and income per capita in the United States, and show that the functional coefficient model with our proposed local linear estimator and Bayesian bandwidth selection method performs well in both in-sample fitting and out-of-sample forecasting.

8.
The paper introduces a new local polynomial estimator and develops supporting asymptotic theory for nonparametric regression in the presence of covariate measurement error. We address the measurement error with Cook and Stefanski's simulation–extrapolation (SIMEX) algorithm. Our method improves on previous local polynomial estimators for this problem by using a bandwidth selection procedure that addresses SIMEX's particular estimation method and considers higher degree local polynomial estimators. We illustrate the accuracy of our asymptotic expressions with a Monte Carlo study, compare our method with other estimators with a second set of Monte Carlo simulations and apply our method to a data set from nutritional epidemiology. SIMEX was originally developed for parametric models. Although SIMEX is, in principle, applicable to nonparametric models, a serious problem arises with SIMEX in nonparametric situations. The problem is that smoothing parameter selectors that are developed for data without measurement error are no longer appropriate and can result in considerable undersmoothing. We believe that this is the first paper to address this difficulty.
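The simulation-extrapolation idea can be sketched in the parametric setting for which, as the abstract notes, SIMEX was originally developed. The sketch below is an illustration only and is not the local polynomial estimator or bandwidth selector of the paper: it corrects the slope of a simple linear regression, assumes the measurement-error standard deviation `sigma_u` is known, and uses an exact quadratic through the estimates at λ = 0, 1, 2 as the extrapolant. The names `ols_slope` and `simex_slope` and the defaults `n_rep` and `seed` are hypothetical.

```python
import random


def ols_slope(w, y):
    """Ordinary least-squares slope of y regressed on w."""
    n = len(w)
    mean_w = sum(w) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_w) * (b - mean_y) for a, b in zip(w, y))
    sxx = sum((a - mean_w) ** 2 for a in w)
    return sxy / sxx


def simex_slope(w, y, sigma_u, n_rep=20, seed=0):
    """SIMEX-corrected slope when w = x + u and u has known sd sigma_u."""
    rng = random.Random(seed)
    estimates = {0: ols_slope(w, y)}  # lambda = 0 is the naive estimate
    # Simulation step: add extra noise with variance lambda * sigma_u^2
    # and average the resulting slopes over n_rep replications.
    for lam in (1, 2):
        extra_sd = (lam ** 0.5) * sigma_u
        slopes = []
        for _ in range(n_rep):
            w_noisy = [wi + rng.gauss(0.0, extra_sd) for wi in w]
            slopes.append(ols_slope(w_noisy, y))
        estimates[lam] = sum(slopes) / n_rep
    # Extrapolation step: the quadratic through lambda = 0, 1, 2,
    # evaluated at lambda = -1 (the hypothetical "no error" case).
    return 3.0 * estimates[0] - 3.0 * estimates[1] + estimates[2]
```

The quadratic extrapolant generally reduces, but does not fully remove, the attenuation of the naive slope. Applied implementations typically fit the extrapolant by least squares over a finer grid of λ values rather than through three points.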

9.
Non-inferiority trials aim to demonstrate that an experimental therapy is not unacceptably worse than an active reference therapy already in use. When applicable, a three-arm non-inferiority trial, including an experimental therapy, an active reference therapy, and a placebo, is often recommended to assess assay sensitivity and internal validity of a trial. In this paper, we share some practical considerations based on our experience from a phase III three-arm non-inferiority trial. First, we discuss the determination of the total sample size and its optimal allocation based on the overall power of the non-inferiority testing procedure and provide ready-to-use R code for implementation. Second, we consider the non-inferiority goal of ‘capturing all possibilities’ and show that it naturally corresponds to a simple two-step testing procedure. Finally, using this two-step non-inferiority testing procedure as an example, we extensively compare commonly used frequentist p-value methods with the Bayesian posterior probability approach. Copyright © 2016 John Wiley & Sons, Ltd.

10.
We study bandwidth selection for a class of semi-parametric models. The optimal bandwidth is the one that minimizes the prediction errors of the model. We provide a detailed derivation of our procedure and the corresponding computation algorithms. Our proposed method simplifies the computation of the cross-validation criteria and facilitates more complicated inference and analysis in practice. A data set from the Wisconsin Diabetes Registry is analysed as an illustration.

11.
In this paper we study the ideal variable bandwidth kernel density estimator introduced by McKay (1993a, b) and Jones et al. (1994) and the plug-in practical version of the variable bandwidth kernel estimator with two sequences of bandwidths as in Giné and Sang (2013). Based on the bias and variance analysis of the ideal and plug-in variable bandwidth kernel density estimators, we study the central limit theorems for each of them. The simulation study confirms the central limit theorem and demonstrates the advantage of the plug-in variable bandwidth kernel method over the classical kernel method.

12.
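Profit-Maximizing Location Theory and the Root Causes of High Housing Prices in Guangzhou   Cited by: 1 (self-citations: 0; citations by others: 1)
张立建, 《统计研究》 (Statistical Research), 2008, 25(9): 16-23
This paper uses profit-maximizing location theory to build a housing price model and empirically investigates the root causes of the sustained rise in Guangzhou's housing prices. It finds that the main factor driving prices is the shortage of housing supply, with high costs and severe wealth polarization as secondary causes. The institutional root lies in the contradiction between a freely competitive real estate demand market and a planned-economy real estate supply market; the policy root lies in the government acting as an "economic agent" that single-mindedly runs the city as a business; the economic root lies in industrial polarization caused by competition and monopoly of power; and the social root lies in the unreasonable housing consumption habits of Guangzhou residents. Therefore, in the near term, the keys to curbing housing prices are increasing land supply, reforming the way land is transferred, and introducing progressive and regressive real estate taxes. In the long term, a freely competitive real estate supply market should be established, the "economic agent" government should become a service-oriented government, the industrial structure should be optimized, and the privileges of state-owned ("国字头") industries should be abolished.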

13.
Bandwidth plays an important role in determining the performance of nonparametric estimators, such as the local constant estimator. In this article, we propose a Bayesian approach to bandwidth estimation for local constant estimators of time-varying coefficients in time series models. We establish a large sample theory for the proposed bandwidth estimator and Bayesian estimators of the unknown parameters involved in the error density. A Monte Carlo simulation study shows that (i) the proposed Bayesian estimators for bandwidth and parameters in the error density have satisfactory finite sample performance; and (ii) our proposed Bayesian approach achieves better performance in estimating the bandwidths than the normal reference rule and cross-validation. Moreover, we apply our proposed Bayesian bandwidth estimation method for the time-varying coefficient models that explain Okun’s law and the relationship between consumption growth and income growth in the U.S. For each model, we also provide calibrated parametric forms of the time-varying coefficients. Supplementary materials for this article are available online.

14.
This paper discusses the detailed performance of an iterative plug-in (IPI) bandwidth selector for estimating the diurnal duration pattern in a Semi-ACD (semiparametric autoregressive conditional duration) model. For this purpose a large simulation study was carried out. The effects of different factors that affect the selected bandwidth are discussed in detail. The simulated results and data examples show that the proposed IPI algorithm works very well in practice and that the Semi-ACD model in general is clearly superior to the parametric ACD model if there is a deterministic trend in the duration data. It is also shown that the bandwidth selection and the estimation of the diurnal pattern and the model parameters are all clearly improved if the sample size is enlarged. According to the goodness-of-fit of the estimated diurnal pattern, a best combination of the above-mentioned factors is found. Moreover, a comparative study shows that our proposal usually outperforms the commonly used cubic spline.

15.
As conventional cross-validation bandwidth selection methods do not work properly when the data are serially dependent time series, alternative bandwidth selection methods are necessary. In recent years, Bayesian-based methods for global bandwidth selection have been studied. Our experience shows, however, that a global bandwidth is less suitable than a localized bandwidth in kernel density estimation based on serially dependent time series data. Nonetheless, a difficult issue is how we can consistently estimate a localized bandwidth. This paper presents a nonparametric localized bandwidth estimator, for which we establish a completely new asymptotic theory. Applications of this new bandwidth estimator to the kernel density estimation of the Eurodollar deposit rate and the S&P 500 daily return demonstrate the effectiveness and competitiveness of the proposed localized bandwidth.

16.
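Gaming tourism is the pillar industry of Macau's economy and a decisive factor in maintaining Macau's political stability, social development, and economic health. Using a new method, the second-order formative PLS path model, to analyse the satisfaction of visitors to Macau, we find that shopping, information services, and tourist attractions are the areas that Macau's tourism industry most needs to improve, and that the training and management of gaming service staff should be strengthened. Applying the second-order formative PLS path model yields direct estimates of the effect sizes of the first-order manifest variables and the first-order latent variables, which effectively enhances the practical value of satisfaction research.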

17.
This article presents five principles of learning, derived from cognitive theory and supported by empirical results in cognitive psychology. To bridge the gap between theory and practice, each of these principles is transformed into a practical guideline and exemplified in a real teaching context. It is argued that this approach of putting cognitive theory into practice can offer several benefits to statistics education: a means for explaining and understanding why reform efforts work; a set of guidelines that can help instructors make well-informed design decisions when implementing these reforms; and a framework for generating new and effective instructional innovations.

18.
In this paper, we provide a large bandwidth analysis for a class of local likelihood methods. This work complements the small bandwidth analysis of Park et al. (Ann. Statist. 30 (2002) 1480). Our treatment is more general than the large bandwidth analysis of Eguchi and Copas (J. Roy. Statist. Soc. B 60 (1998) 709). We provide a higher-order asymptotic analysis for the risk of the local likelihood density estimator, from which a direct comparison between various versions of local likelihood can be made. The present work, being combined with the small bandwidth results of Park et al. (2002), gives an optimal size of the bandwidth which depends on the degree of departure of the underlying density from the proposed parametric model.

19.
Compared with the classical backfitting of Buja, Hastie and Tibshirani, the smooth backfitting estimator (SBE) of Mammen, Linton and Nielsen not only provides complete asymptotic theory under weaker conditions but is also more efficient, robust and easier to calculate. However, the original paper describing the SBE method is complex and the practical as well as the theoretical advantages of the method have still neither been recognized nor accepted by the statistical community. We focus on a clear presentation of the idea, the main theoretical results and practical aspects like implementation and simplification of the algorithm. We introduce a feasible cross-validation procedure and apply it to the problem of data-driven bandwidth choice for the SBE. By simulations it is shown that the SBE and our cross-validation work very well indeed. In particular, the SBE is less affected by sparseness of data in high dimensional regression problems or strongly correlated designs. The SBE has reasonable performance even in 100-dimensional additive regression problems.

20.
This paper is devoted to the nonparametric estimation of the hazard function by means of kernel smoothers, and more specifically to the crucial problem of bandwidth selection. We first obtain the convergence rate of the usual cross-validated bandwidth under a general dependence assumption on the sample data, extending in several directions the results existing in the literature. Second, this rate of convergence is used to motivate the introduction of a penalized version of the cross-validation procedure. The rate of convergence is calculated, and a short simulation study, together with a practical application to real data, shows the interest of this approach for finite sample studies. Finally, as a by-product of our proofs, we state a general inequality for the moments of sums of strongly dependent variables. Because of its possible interest for many other purposes apart from hazard estimation, this inequality is presented in a specific self-contained section.
