期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Fast computation of spatially adaptive kernel estimates

Tilman M. Davies Adrian Baddeley 《Statistics and Computing》2018,28(4):937-956

Kernel smoothing of spatial point data can often be improved using an adaptive, spatially varying bandwidth instead of a fixed bandwidth. However, computation with a varying bandwidth is much more demanding, especially when edge correction and bandwidth selection are involved. This paper proposes several new computational methods for adaptive kernel estimation from spatial point pattern data. A key idea is that a variable-bandwidth kernel estimator for d-dimensional spatial data can be represented as a slice of a fixed-bandwidth kernel estimator in \((d+1)\)-dimensional scale space, enabling fast computation using Fourier transforms. Edge correction factors have a similar representation. Different values of global bandwidth correspond to different slices of the scale space, so that bandwidth selection is greatly accelerated. Potential applications include estimation of multivariate probability density and spatial or spatiotemporal point process intensity, relative risk, and regression functions. The new methods perform well in simulations and in two real applications concerning the spatial epidemiology of primary biliary cirrhosis and the alarm calls of capuchin monkeys. 相似文献

2.

Estimating a sparse reduction for general regression in high dimensions

Tao Wang Mengjie Chen Hongyu Zhao Lixing Zhu 《Statistics and Computing》2018,28(1):33-46

Although the concept of sufficient dimension reduction that was originally proposed has been there for a long time, studies in the literature have largely focused on properties of estimators of dimension-reduction subspaces in the classical “small p, and large n” setting. Rather than the subspace, this paper considers directly the set of reduced predictors, which we believe are more relevant for subsequent analyses. A principled method is proposed for estimating a sparse reduction, which is based on a new, revised representation of an existing well-known method called the sliced inverse regression. A fast and efficient algorithm is developed for computing the estimator. The asymptotic behavior of the new method is studied when the number of predictors, p, exceeds the sample size, n, providing a guide for choosing the number of sufficient dimension-reduction predictors. Numerical results, including a simulation study and a cancer-drug-sensitivity data analysis, are presented to examine the performance. 相似文献

3.

Automated selection of <Emphasis Type="Italic">r</Emphasis> for the <Emphasis Type="Italic">r</Emphasis> largest order statistics approach with adjustment for sequential testing

Brian Bader Jun Yan Xuebin Zhang 《Statistics and Computing》2017,27(6):1435-1451

The r largest order statistics approach is widely used in extreme value analysis because it may use more information from the data than just the block maxima. In practice, the choice of r is critical. If r is too large, bias can occur; if too small, the variance of the estimator can be high. The limiting distribution of the r largest order statistics, denoted by GEV\(_r\), extends that of the block maxima. Two specification tests are proposed to select r sequentially. The first is a score test for the GEV\(_r\) distribution. Due to the special characteristics of the GEV\(_r\) distribution, the classical chi-square asymptotics cannot be used. The simplest approach is to use the parametric bootstrap, which is straightforward to implement but computationally expensive. An alternative fast weighted bootstrap or multiplier procedure is developed for computational efficiency. The second test uses the difference in estimated entropy between the GEV\(_r\) and GEV\(_{r-1}\) models, applied to the r largest order statistics and the \(r-1\) largest order statistics, respectively. The asymptotic distribution of the difference statistic is derived. In a large scale simulation study, both tests held their size and had substantial power to detect various misspecification schemes. A new approach to address the issue of multiple, sequential hypotheses testing is adapted to this setting to control the false discovery rate or familywise error rate. The utility of the procedures is demonstrated with extreme sea level and precipitation data. 相似文献

4.

Sequential Monte Carlo for counting vertex covers in general graphs

Radislav Vaisman Zdravko I. Botev Ad Ridder 《Statistics and Computing》2016,26(3):591-607

In this paper we describe a sequential importance sampling (SIS) procedure for counting the number of vertex covers in general graphs. The optimal SIS proposal distribution is the uniform over a suitably restricted set, but is not implementable. We will consider two proposal distributions as approximations to the optimal. Both proposals are based on randomization techniques. The first randomization is the classic probability model of random graphs, and in fact, the resulting SIS algorithm shows polynomial complexity for random graphs. The second randomization introduces a probabilistic relaxation technique that uses Dynamic Programming. The numerical experiments show that the resulting SIS algorithm enjoys excellent practical performance in comparison with existing methods. In particular the method is compared with cachet—an exact model counter, and the state of the art SampleSearch, which is based on Belief Networks and importance sampling. 相似文献

5.

Minimax versions of the two-stage <Emphasis Type="Italic">t</Emphasis> test

Wolf Krumbholz Andreas Rohr Eno Vangjeli 《Statistical Papers》2012,53(2):311-321

Let X be a N(μ, σ ²) distributed characteristic with unknown σ. We present the minimax version of the two-stage t test having minimal maximal average sample size among all two-stage t tests obeying the classical two-point-condition on the operation characteristic. We give several examples. Furthermore, the minimax version of the two-stage t test is compared with the corresponding two-stage Gauß test. 相似文献

6.

Multi-dimensional functional principal component analysis

Lu-Hung Chen Ci-Ren Jiang 《Statistics and Computing》2017,27(5):1181-1192

Functional principal component analysis is one of the most commonly employed approaches in functional and longitudinal data analysis and we extend it to analyze functional/longitudinal data observed on a general d-dimensional domain. The computational issues emerging in the extension are fully addressed with our proposed solutions. The local linear smoothing technique is employed to perform estimation because of its capabilities of performing large-scale smoothing and of handling data with different sampling schemes (possibly on irregular domain) in addition to its nice theoretical properties. Besides taking the fast Fourier transform strategy in smoothing, the modern GPGPU (general-purpose computing on graphics processing units) architecture is applied to perform parallel computation to save computation time. To resolve the out-of-memory issue due to large-scale data, the random projection procedure is applied in the eigendecomposition step. We show that the proposed estimators can achieve the classical nonparametric rates for longitudinal data and the optimal convergence rates for functional data if the number of observations per sample is of the order \((n/ \log n)^{d/4}\). Finally, the performance of our approach is demonstrated with simulation studies and the fine particulate matter (PM 2.5) data measured in Taiwan. 相似文献

7.

Bayesian Additive Regression Trees using Bayesian model averaging

Belinda Hernández Adrian E. Raftery Stephen R Pennington Andrew C. Parnell 《Statistics and Computing》2018,28(4):869-890

Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for datasets where the number of variables p is large the algorithm can become inefficient and computationally expensive. Another method which is popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the “small n large p” scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git. 相似文献

8.

A new universal resample-stable bootstrap-based stopping criterion for PLS component construction

Jérémy Magnanensi Frédéric Bertrand Myriam Maumy-Bertrand Nicolas Meyer 《Statistics and Computing》2017,27(3):757-774

We develop a new robust stopping criterion for partial least squares regression (PLSR) component construction, characterized by a high level of stability. This new criterion is universal since it is suitable both for PLSR and extensions to generalized linear regression (PLSGLR). The criterion is based on a non-parametric bootstrap technique and must be computed algorithmically. It allows the testing of each successive component at a preset significance level \(\alpha \). In order to assess its performance and robustness with respect to various noise levels, we perform dataset simulations in which there is a preset and known number of components. These simulations are carried out for datasets characterized both by \(n>p\), with n the number of subjects and p the number of covariates, as well as for \(n<p\). We then use t-tests to compare the predictive performance of our approach with other common criteria. The stability property is in particular tested through re-sampling processes on a real allelotyping dataset. An important additional conclusion is that this new criterion gives globally better predictive performances than existing ones in both the PLSR and PLSGLR (logistic and poisson) frameworks. 相似文献

9.

Robust skew-<Emphasis Type="Italic">t</Emphasis> factor analysis models for handling missing data

Wan-Lun Wang Min Liu Tsung-I Lin 《Statistical Methods and Applications》2017,26(4):649-672

This paper presents a novel framework for maximum likelihood (ML) estimation in skew-t factor analysis (STFA) models in the presence of missing values or nonresponses. As a robust extension of the ordinary factor analysis model, the STFA model assumes a restricted version of the multivariate skew-t distribution for the latent factors and the unobservable errors to accommodate non-normal features such as asymmetry and heavy tails or outliers. An EM-type algorithm is developed to carry out ML estimation and imputation of missing values under a missing at random mechanism. The practical utility of the proposed methodology is illustrated through real and synthetic data examples. 相似文献

10.

Rate of uniform consistency for a class of mode regression on functional stationary ergodic data

Mohamed Chaouch Naâmane Laïb Djamal Louani 《Statistical Methods and Applications》2017,26(1):19-47

The aim of this paper is to study the asymptotic properties of a class of kernel conditional mode estimates whenever functional stationary ergodic data are considered. To be more precise on the matter, in the ergodic data setting, we consider a random elements (X, Z) taking values in some semi-metric abstract space \(E\times F\). For a real function \(\varphi \) defined on the space F and \(x\in E\), we consider the conditional mode of the real random variable \(\varphi (Z)\) given the event “\(X=x\)”. While estimating the conditional mode function, say \(\theta _\varphi (x)\), using the well-known kernel estimator, we establish the strong consistency with rate of this estimate uniformly over Vapnik–Chervonenkis classes of functions \(\varphi \). Notice that the ergodic setting offers a more general framework than the usual mixing structure. Two applications to energy data are provided to illustrate some examples of the proposed approach in time series forecasting framework. The first one consists in forecasting the daily peak of electricity demand in France (measured in Giga-Watt). Whereas the second one deals with the short-term forecasting of the electrical energy (measured in Giga-Watt per Hour) that may be consumed over some time intervals that cover the peak demand. 相似文献

11.

On estimation of covariate-specific residual time quantiles under the proportional hazards model

Luis Alexander Crouch Susanne May Ying Qing Chen 《Lifetime data analysis》2016,22(2):299-319

Estimation and inference in time-to-event analysis typically focus on hazard functions and their ratios under the Cox proportional hazards model. These hazard functions, while popular in the statistical literature, are not always easily or intuitively communicated in clinical practice, such as in the settings of patient counseling or resource planning. Expressing and comparing quantiles of event times may allow for easier understanding. In this article we focus on residual time, i.e., the remaining time-to-event at an arbitrary time t given that the event has yet to occur by t. In particular, we develop estimation and inference procedures for covariate-specific quantiles of the residual time under the Cox model. Our methods and theory are assessed by simulations, and demonstrated in analysis of two real data sets. 相似文献

12.

Importance tempering

Robert Gramacy Richard Samworth Ruth King 《Statistics and Computing》2010,20(1):1-7

Simulated tempering (ST) is an established Markov chain Monte Carlo (MCMC) method for sampling from a multimodal density π(θ). Typically, ST involves introducing an auxiliary variable k taking values in a finite subset of [0,1] and indexing a set of tempered distributions, say π _k(θ)∝ π(θ)^k. In this case, small values of k encourage better mixing, but samples from π are only obtained when the joint chain for (θ,k) reaches k=1. However, the entire chain can be used to estimate expectations under π of functions of interest, provided that importance sampling (IS) weights are calculated. Unfortunately this method, which we call importance tempering (IT), can disappoint. This is partly because the most immediately obvious implementation is naïve and can lead to high variance estimators. We derive a new optimal method for combining multiple IS estimators and prove that the resulting estimator has a highly desirable property related to the notion of effective sample size. We briefly report on the success of the optimal combination in two modelling scenarios requiring reversible-jump MCMC, where the naïve approach fails. 相似文献

13.

Optimal regular graph designs

Sera Aylin Cakiroglu 《Statistics and Computing》2018,28(1):103-112

A typical problem in optimal design theory is finding an experimental design that is optimal with respect to some criteria in a class of designs. The most popular criteria include the A- and D-criteria. Regular graph designs occur in many optimality results, and if the number of blocks is large enough, an A-optimal (or D-optimal) design is among them (if any exist). To explore the landscape of designs with a large number of blocks, we introduce extensions of regular graph designs. These are constructed by adding the blocks of a balanced incomplete block design repeatedly to the original design. We present the results of an exact computer search for the best regular graph designs and the best extended regular graph designs with up to 20 treatments v, block size \(k \le 10\) and replication r \(\le 10\) and \(r(k-1)-(v-1)\lfloor r(k-1)/(v-1)\rfloor \le 9\). 相似文献

14.

A comparison of efficient approximations for a weighted sum of chi-squared random variables

Dean A. Bodenham Niall M. Adams 《Statistics and Computing》2016,26(4):917-928

In many applications, the cumulative distribution function (cdf) \(F_{Q_N}\) of a positively weighted sum of N i.i.d. chi-squared random variables \(Q_N\) is required. Although there is no known closed-form solution for \(F_{Q_N}\), there are many good approximations. When computational efficiency is not an issue, Imhof’s method provides a good solution. However, when both the accuracy of the approximation and the speed of its computation are a concern, there is no clear preferred choice. Previous comparisons between approximate methods could be considered insufficient. Furthermore, in streaming data applications where the computation needs to be both sequential and efficient, only a few of the available methods may be suitable. Streaming data problems are becoming ubiquitous and provide the motivation for this paper. We develop a framework to enable a much more extensive comparison between approximate methods for computing the cdf of weighted sums of an arbitrary random variable. Utilising this framework, a new and comprehensive analysis of four efficient approximate methods for computing \(F_{Q_N}\) is performed. This analysis procedure is much more thorough and statistically valid than previous approaches described in the literature. A surprising result of this analysis is that the accuracy of these approximate methods increases with N. 相似文献

15.

Objective Bayesian analysis for the multivariate skew-t model

Antonio Parisi B. Liseo 《Statistical Methods and Applications》2018,27(2):277-295

We propose a novel Bayesian analysis of the p-variate skew-t model, providing a new parameterization, a set of non-informative priors and a sampler specifically designed to explore the posterior density of the model parameters. Extensions, such as the multivariate regression model with skewed errors and the stochastic frontiers model, are easily accommodated. A novelty introduced in the paper is given by the extension of the bivariate skew-normal model given in Liseo and Parisi (2013) to a more realistic p-variate skew-t model. We also introduce the R package mvst, which produces a posterior sample for the parameters of a multivariate skew-t model. 相似文献

16.

The analysis of mixed interval-censored and complete data

Dejing Kong Lirong Cui 《统计学通讯:模拟与计算》2017,46(1):145-163

The article focuses mainly on a conditional imputation algorithm of quantile-filling to analyze a new kind of censored data, mixed interval-censored and complete data related to interval-censored sample. With the algorithm, the imputed failure times, which are the conditional quantiles, are obtained within the censoring intervals in which some exact failure times are. The algorithm is viable and feasible for the parameter estimation with general distributions, for instance, a case of Weibull distribution that has a moment estimation of closed form by log-transformation. Furthermore, interval-censored sample is a special case of the new censored sample, and the conditional imputation algorithm can also be used to deal with the failure data of interval censored. By comparing the interval-censored data and the new censored data, using the imputation algorithm, in the view of the bias of estimation, we find that the performance of new censored data is better than that of interval censored. 相似文献

17.

Discussion of “The power of monitoring: how to make the most of a contaminated multivariate sample” by Andrea Cerioli,Marco Riani,Anthony C. Atkinson and Aldo Corbellini

Valentin Todorov 《Statistical Methods and Applications》2018,27(4):595-602

This paper discusses the contribution of Cerioli et al. (Stat Methods Appl, 2018), where robust monitoring based on high breakdown point estimators is proposed for multivariate data. The results follow years of development in robust diagnostic techniques. We discuss the issues of extending data monitoring to other models with complex structure, e.g. factor analysis, mixed linear models for which S and MM-estimators exist or deviating data cells. We emphasise the importance of robust testing that is often overlooked despite robust tests being readily available once S and MM-estimators have been defined. We mention open questions like out-of-sample inference or big data issues that would benefit from monitoring. 相似文献

18.

Optimal designs for treatment comparisons represented by graphs

Samuel Rosa 《AStA Advances in Statistical Analysis》2018,102(4):479-503

Consider an experiment for comparing a set of treatments: in each trial, one treatment is chosen and its effect determines the mean response of the trial. We examine the optimal approximate designs for the estimation of a system of treatment contrasts under this model. These designs can be used to provide optimal treatment proportions in more general models with nuisance effects. For any system of pairwise treatment comparisons, we propose to represent such a system by a graph. Then, we represent the designs by the inverses of the vertex weights in the corresponding graph and we show that the values of the eigenvalue-based optimality criteria can be expressed using the Laplacians of the vertex-weighted graphs. We provide a graph theoretic interpretation of D-, A- and E-optimality for estimating sets of pairwise comparisons. We apply the obtained graph representation to provide optimality results for these criteria as well as for ’symmetric’ systems of treatment contrasts. 相似文献

19.

Adaptive grid semidefinite programming for finding optimal designs

Belmiro P. M. Duarte Weng Kee Wong Holger Dette 《Statistics and Computing》2018,28(2):441-460

We find optimal designs for linear models using a novel algorithm that iteratively combines a semidefinite programming (SDP) approach with adaptive grid techniques. The proposed algorithm is also adapted to find locally optimal designs for nonlinear models. The search space is first discretized, and SDP is applied to find the optimal design based on the initial grid. The points in the next grid set are points that maximize the dispersion function of the SDP-generated optimal design using nonlinear programming. The procedure is repeated until a user-specified stopping rule is reached. The proposed algorithm is broadly applicable, and we demonstrate its flexibility using (i) models with one or more variables and (ii) differentiable design criteria, such as A-, D-optimality, and non-differentiable criterion like E-optimality, including the mathematically more challenging case when the minimum eigenvalue of the information matrix of the optimal design has geometric multiplicity larger than 1. Our algorithm is computationally efficient because it is based on mathematical programming tools and so optimality is assured at each stage; it also exploits the convexity of the problems whenever possible. Using several linear and nonlinear models with one or more factors, we show the proposed algorithm can efficiently find optimal designs. 相似文献

20.

A characterization of geometric distribution based on weak records

Mohammad Ahsanullah Fazil Aliev 《Statistical Papers》2011,52(3):651-655

Let \({\{X_n, n\geq 1\}}\) be a sequence of independent and identically distributed non-degenerated random variables with common cumulative distribution function F. Suppose X ₁ is concentrated on 0, 1, . . . , N ≤ ∞ and P(X ₁ = 1) > 0. Let \({X_{U_w(n)}}\) be the n-th upper weak record value. In this paper we show that for any fixed m ≥ 2, X ₁ has Geometric distribution if and only if \({X_{U_{w}(m)}\mathop=\limits^d X_1+\cdots+X_m ,}\) where \({\underline{\underline{d}}}\) denotes equality in distribution. Our result is a generalization of the case m = 2 obtained by Ahsanullah (J Stat Theory Appl 8(1):5–16, 2009). 相似文献