Tree-based methods similar to CART have recently been used for problems in which the main goal is to estimate some set of interest. The boundary of the true set is often smooth in some sense; tree-based estimates, however, are not smooth, because they are a union of 'boxes'. We propose a general methodology for smoothing such sets that automatically allows for varying levels of smoothness on the boundary. The method is similar to the idea underlying support vector machines: a computationally simple technique is applied to the data after a non-linear mapping to produce smooth estimates in the original space. In particular, we consider the problem of level-set estimation for regression functions and the dyadic tree-based method of Willett and Nowak [Minimax optimal level-set estimation, IEEE Trans. Image Process. 16 (2007), pp. 2965–2979].

Except in special cases optimum smoothing parameters of kernel methods are difficult to obtain for small samples, and large sample results are often used. Simulation is used to obtain finite sample optimum smoothing parameters and mean integrated square errors for the bivariate normal density. For this example, comparison is made of finite and asymptotic results, and of fixed and adaptive kernel methods. Further comparisons are made of fixed and adaptive methods by considering four other different types of density. Finally, some examples are given.   

Point estimates that are weighted averages of other estimates are considered. They are adaptive because the weights are also functions of the sample observations. In particular, the weights are functions of new measures of peakedness and skewness. Five adaptive estimators are compared (in a Monte Carlo study using the swindle) to some of the usual estimators, including the robust ones of Huber and Tukey. In addition, the swindle constant is considered in some detail. All of the adaptive estimators do extremely well, with an adaptive biweight statistic being the best one in this study. Suggestions are made about future directions in this area.

The ROC curve is a graphical representation of the relationship between the sensitivity and specificity of a diagnostic test. It is a popular tool for evaluating and comparing different diagnostic tests in the medical sciences. In the literature, the ROC curve is often estimated empirically, based on an empirical distribution function estimator and an empirical quantile function estimator. In this paper an alternative nonparametric procedure for estimating the ROC curve is suggested, which is based on local smoothing techniques. Several numerical examples are presented to evaluate the performance of this procedure.
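The empirical estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's local smoothing procedure; the function names `empirical_roc` and `auc` and the convention that larger scores indicate disease are assumptions made for illustration.

```python
def empirical_roc(healthy, diseased):
    """Return the (false positive rate, true positive rate) points of the
    empirical ROC curve. A subject is called positive when its score is
    at least the threshold (assumed convention: larger score = disease)."""
    thresholds = sorted(set(healthy) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every observed score
    for c in thresholds:
        fpr = sum(x >= c for x in healthy) / len(healthy)     # 1 - specificity
        tpr = sum(y >= c for y in diseased) / len(diseased)   # sensitivity
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y1 + y0) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The resulting curve is a step-like function of the threshold, which is the lack of smoothness that a smoothed estimator is meant to address.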

The location model is a familiar basis for discriminant analysis of mixtures of categorical and continuous variables. Its usual implementation involves second-order smoothing, using multivariate regression for the continuous variables and log-linear models for the categorical variables. In spite of the smoothing, these procedures still require many parameters to be estimated and this in turn restricts the categorical variables to a small number if implementation is to be feasible. In this paper we propose non-parametric smoothing procedures for both parts of the model. The number of parameters to be estimated is dramatically reduced and the range of applicability thereby greatly increased. The methods are illustrated on several data sets, and the performances are compared with a range of other popular discrimination techniques. The proposed method compares very favourably with all its competitors.   

This paper examines the problem of assessing local influence on the optimal bandwidth estimation in kernel smoothing based on cross validation. The bandwidth for kernel smoothing plays an important role in the model fitting and is often estimated using the cross-validation criterion. Following the argument of the second-order approach to local influence suggested by Wu and Luo (1993), we develop a new diagnostic statistic to examine the local influence of the observations on the estimation of the optimal bandwidth, where the perturbation may belong to one of three schemes. These are the response perturbation, the perturbation in the explanatory variable, and the case-weight perturbation. The proposed diagnostic is nonparametric and is capable of identifying influential observations with strong influence on the bandwidth estimation. An example is presented to illustrate the application of the proposed diagnostic, and the usefulness of the nonparametric approach is illustrated in comparison with some other approaches to the assessment of local influence.

There are many statistics which can be used to characterize data sets and provide valuable information regarding the data distribution, even for large samples. Traditional measures, such as skewness and kurtosis, mentioned in introductory statistics courses, are rarely applied. A variety of other measures of tail length, skewness and tail weight have been proposed, which can be used to describe the underlying population distribution. Adaptive statistical procedures change the estimator of location, depending on sample characteristics. The success of these estimators depends on correctly classifying the underlying distribution model. Advocates of adaptive distribution testing propose to proceed by assuming (1) that an appropriate model, say $\Omega_i$, is such that $\Omega_i \in \{\Omega_1, \Omega_2, \ldots, \Omega_k\}$, and (2) that the character of the model selection process is statistically independent of the hypothesis testing. We review the development of adaptive linear estimators and adaptive maximum-likelihood estimators.

We consider online monitoring of sequentially arising data as e.g. met in clinical information systems. The general focus thereby is to detect breakpoints, i.e. timepoints where the measurement series suddenly changes the general level. The method suggested is based on local estimation. In particular, local linear smoothing is combined by ridging with local constant smoothing. The procedure is demonstrated by examples and compared with other available online monitoring routines.   
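One common way to combine local linear and local constant smoothing by ridging is to add a ridge term to the denominator of the local slope. The sketch below follows that idea; it is an assumption about the general technique, not the authors' monitoring procedure, and the Gaussian kernel, the function name `ridged_local_linear`, and the parameter `ridge` are chosen for illustration.

```python
import math


def ridged_local_linear(xs, ys, x0, h, ridge=0.0):
    """Kernel-weighted local fit at x0 with bandwidth h.

    ridge = 0 gives the local linear estimate; as ridge grows the slope
    shrinks to zero and the fit tends to the local constant
    (Nadaraya-Watson) estimate.
    """
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]  # Gaussian weights
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw  # local constant estimate
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    denom = sxx + ridge
    slope = sxy / denom if denom > 0 else 0.0  # ridged local slope
    return ybar + slope * (x0 - xbar)
```

In an online setting the fit is evaluated at the most recent time point, which is a boundary point; there the two extremes differ most, so the ridge term controls the trade-off between the lower bias of the linear fit and the lower variance of the constant fit.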

A new procedure is proposed for deriving variable bandwidths in univariate kernel density estimation, based upon likelihood cross-validation and an analysis of a Bayesian graphical model. The procedure admits bandwidth selection which is flexible in terms of the amount of smoothing required. In addition, the basic model can be extended to incorporate local smoothing of the density estimate. The method is shown to perform well in both theoretical and practical situations, and we compare our method with those of Abramson (The Annals of Statistics 10: 1217–1223) and Sain and Scott (Journal of the American Statistical Association 91: 1525–1534). In particular, we note that in certain cases, the Sain and Scott method performs poorly even with relatively large sample sizes. We compare various bandwidth selection methods using standard mean integrated square error criteria to assess the quality of the density estimates. We study situations where the underlying density is assumed both known and unknown, and note that in practice, our method performs well when sample sizes are small. In addition, we also apply the methods to real data, and again we believe our methods perform at least as well as existing methods.
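Likelihood cross-validation, on which the procedure builds, can be sketched for the simpler case of a single fixed bandwidth. This is a simplification and not the paper's variable-bandwidth Bayesian method; the Gaussian kernel, the candidate grid, and the function names are assumptions made for illustration.

```python
import math


def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of a Gaussian kernel density estimate
    with a single fixed bandwidth h."""
    n = len(data)
    norm = (n - 1) * h * math.sqrt(2 * math.pi)
    total = 0.0
    for i, xi in enumerate(data):
        # Density at xi estimated from all observations except xi itself.
        s = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                for j, xj in enumerate(data) if j != i)
        if s <= 0.0:
            return float("-inf")  # kernel underflow: h is far too small
        total += math.log(s / norm)
    return total


def select_bandwidth(data, candidates):
    """Pick the candidate bandwidth maximising the leave-one-out likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(data, h))


data = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
best = select_bandwidth(data, [0.01, 0.1, 1.0, 10.0])
```

Leaving each point out of its own density estimate is what prevents the criterion from favouring an arbitrarily small bandwidth.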

Optimal estimation in rotation patterns
The aim of this paper is to examine the setting of surveys repeated over time when the elements in the sample are rotated in a predesigned way. On each occasion the best linear unbiased estimator (BLUE) of the current population mean, built on all past responses, is to be found. The most straightforward approach would be to compute the estimator as a solution of a least squares problem with linear restrictions. However, this method has certain drawbacks related to the fact that the size of the response data set increases over time. We follow a different approach based on finding linear recurrence relationships between optimal estimators obtained on successive occasions. Most of the original disadvantages are then corrected. In this context we present the solution to the BLUE estimation problem for some—sufficiently regular—classes of rotation patterns.   


In this paper, a class of estimators of a finite population variance is proposed under an adaptive cluster sampling design in the presence of information on an auxiliary variable. We obtain expressions for the mean square error and bias of the developed estimators, and their performance is evaluated on a Poisson clustered process and a real data set. The simulation study evaluates the efficiency of the suggested estimators for an adaptive cluster sampling (ACS) design, and of the Isaki (1983) [Variance estimation using auxiliary information, Journal of the American Statistical Association 78 (381): 117–23, doi: 10.1080/01621459.1983.10477939] estimator of the variance for SRSWOR, relative to the sample variance for SRSWOR.

Penalized likelihood methods provide a range of practical modelling tools, including spline smoothing, generalized additive models and variants of ridge regression. Selecting the correct weights for penalties is a critical part of using these methods and in the single-penalty case the analyst has several well-founded techniques to choose from. However, many modelling problems suggest a formulation employing multiple penalties, and here general methodology is lacking. A wide family of models with multiple penalties can be fitted to data by iterative solution of the generalized ridge regression problem: minimize $\| W^{1/2}(Xp - y) \|^2 \rho + \sum_{i=1}^{m} \theta_i p' S_i p$ ($p$ is a parameter vector, $X$ a design matrix, $S_i$ a non-negative definite coefficient matrix defining the $i$th penalty with associated smoothing parameter $\theta_i$, $W$ a diagonal weight matrix, $y$ a vector of data or pseudodata and $\rho$ an 'overall' smoothing parameter included for computational efficiency). This paper shows how smoothing parameter selection can be performed efficiently by applying generalized cross-validation to this problem and how this allows non-linear, generalized linear and linear models to be fitted using multiple penalties, substantially increasing the scope of penalized modelling methods. Examples of non-linear modelling, generalized additive modelling and anisotropic smoothing are given.
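For fixed smoothing parameters, the generalized ridge problem above has a closed-form solution, and the generalized cross-validation score can be written in terms of the resulting influence matrix. The block below is a sketch under assumptions: it uses the standard form of the GCV score, and the symbols $\hat{p}$, $A$, $V$ and the sample size $n$ (the length of $y$) are introduced here and do not appear in the abstract. It also assumes the matrix being inverted is non-singular.

```latex
% Setting the gradient of the objective to zero:
%   2\rho X'W(Xp - y) + 2\sum_i \theta_i S_i p = 0
\hat{p} = \Bigl( \rho\, X' W X + \sum_{i=1}^{m} \theta_i S_i \Bigr)^{-1} \rho\, X' W y ,
\qquad
A = X \Bigl( \rho\, X' W X + \sum_{i=1}^{m} \theta_i S_i \Bigr)^{-1} \rho\, X' W ,

% Fitted values are A y, and the GCV score to be minimised over
% (\rho, \theta_1, \ldots, \theta_m) is
V = \frac{ n \, \| W^{1/2} ( y - A y ) \|^{2} }{ \bigl[ \, n - \operatorname{tr}(A) \, \bigr]^{2} } .
```

Only the ratios $\theta_i / \rho$ affect $\hat{p}$, which is consistent with the abstract's remark that $\rho$ is included for computational efficiency rather than as an extra degree of freedom.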

In economics, a production frontier function is a graph that shows the maximum output of production units such as firms, industries, or economies, as a function of their inputs. In practice, estimating production frontiers often requires the imposition of constraints such as monotonicity or monotone concavity. However, few constrained estimators of the production frontier have been proposed in the literature. They are based on simple envelopment techniques, which often suffer from a lack of precision and smoothness. Motivated by this observation, we propose a smooth nonparametric frontier estimator that respects these constraints by applying kernel smoothing estimators to transformed data. It is particularly appealing to practitioners who would like to use smooth estimates that, in addition, satisfy theoretical axioms of production. The utility of this method is illustrated through an application to one real dataset, and simulation evidence is also presented to show its superiority over the best-known methods.

In this paper we demonstrate how the task of spectral density estimation by direct methods may be posed as that of solving a simple optimal smoothing problem. A criterion functional is considered which involves a smoothness frequency domain term and a fidelity time domain term.   

This paper demonstrates that cross-validation (CV) and Bayesian adaptive bandwidth selection can be applied in the estimation of associated kernel discrete functions. This idea was originally proposed by Brewer [A Bayesian model for local smoothing in kernel density estimation, Stat. Comput. 10 (2000), pp. 299–309] to derive variable bandwidths in adaptive kernel density estimation. Our approach considers the adaptive binomial kernel estimator and treats the variable bandwidths as parameters with a beta prior distribution. The best variable bandwidth selector is estimated by the posterior mean in the Bayesian sense under squared error loss. Monte Carlo simulations are conducted to examine the performance of the proposed Bayesian adaptive approach in comparison with the performance of the asymptotic mean integrated squared error estimator and the CV technique for selecting a global (fixed) bandwidth proposed in Kokonendji and Senga Kiessé [Discrete associated kernels method and extensions, Stat. Methodol. 8 (2011), pp. 497–516]. The Bayesian adaptive bandwidth estimator performs better than the global bandwidth, in particular for small and moderate sample sizes.

We examine the issue of asymptotic efficiency of estimation for response adaptive designs of clinical trials, from which the collected data set contains a dependency structure. We establish the asymptotic lower bound of exponential rates for consistent estimators. Under certain regularity conditions, we show that the maximum likelihood estimator achieves the asymptotic lower bound for response adaptive trials with dichotomous responses. Furthermore, it is shown that the maximum likelihood estimator of the treatment effect is asymptotically efficient in the Bahadur sense for response adaptive clinical trials.   

In a missing data setting, we have a sample in which a vector of explanatory variables ${\bf x}_i$ is observed for every subject i, while scalar responses $y_i$ are missing by happenstance on some individuals. In this work we propose robust estimators of the distribution of the responses assuming missing at random (MAR) data, under a semiparametric regression model. Our approach allows the consistent estimation of any weakly continuous functional of the response's distribution. In particular, strongly consistent estimators of any continuous location functional, such as the median, L‐functionals and M‐functionals, are proposed. A robust fit for the regression model combined with the robust properties of the location functional gives rise to a robust recipe for estimating the location parameter. Robustness is quantified through the breakdown point of the proposed procedure. The asymptotic distribution of the location estimators is also derived. The proofs of the theorems are presented in Supplementary Material available online. The Canadian Journal of Statistics 41: 111–132; 2013 © 2012 Statistical Society of Canada   

We consider the problem of statistical inference for functional and dynamic magnetic resonance imaging (MRI). A new approach is proposed which extends the adaptive weights smoothing procedure of Polzehl and Spokoiny that was originally designed for image denoising. We demonstrate how the adaptive weights smoothing method can be applied to time series of images, which typically occur in functional and dynamic MRI. It is shown how signal detection in functional MRI and the analysis of dynamic MRI can benefit from spatially adaptive smoothing. The performance of the procedure is illustrated by using real and simulated data.   

The authors consider doubly robust estimation of the regression parameter defined by an estimating equation in a surrogate outcome set‐up. Under a correct specification of the propensity score, the proposed estimator has the smallest trace of the asymptotic covariance matrix whether or not the "working outcome regression model" involved is correctly specified, and it is particularly meaningful when that model is incorrectly specified. Simulations are conducted to examine the finite sample performance of the proposed procedure. Data on obesity and high blood pressure are analyzed for illustration. The Canadian Journal of Statistics 38: 633–646; 2010 © 2010 Statistical Society of Canada

