1.

Classification using distance nearest neighbours

N. Friel A. N. Pettitt 《Statistics and Computing》2011,21(3):431-437

This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away. Our approach builds on previous work by Holmes and Adams (J. R. Stat. Soc. Ser. B 64:295–306, 2002; Biometrika 90:99–112, 2003) and Cucala et al. (J. Am. Stat. Assoc. 104:263–273, 2009). Our work shares many of the advantages of these approaches in providing a probabilistic basis for the statistical inference. In comparison to previous work, we present a more efficient computational algorithm to overcome the intractability of the Markov random field model. The results of our algorithm are encouraging in comparison to the k-nearest neighbour algorithm.
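
For orientation, the k-nearest neighbour baseline that the abstract compares against can be sketched in a few lines. This is only the plain majority-vote rule, not the authors' Markov random field model; the function name `knn_predict`, the toy data and the choice k=3 are assumptions made for illustration.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    # Count the class labels of the k closest points.
    votes = Counter(train_y[i] for i in order[:k])
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["a", "a", "a", "b", "b", "b"]

label_near_origin = knn_predict(train_x, train_y, (0.05, 0.05))
label_near_one = knn_predict(train_x, train_y, (1.0, 0.95))
```

The rule gives every one of the k neighbours an equal vote and no probability statement, which is the gap a probabilistic, distance-aware model of the labels is meant to fill.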

2.

Bayesian inference for the Birnbaum–Saunders nonlinear regression model

Rafael B. A. Farias Artur J. Lemonte 《Statistical Methods and Applications》2011,20(4):423-438

We develop a Bayesian analysis for the class of Birnbaum–Saunders nonlinear regression models introduced by Lemonte and Cordeiro (Comput Stat Data Anal 53:4441–4452, 2009). This regression model, which is based on the Birnbaum–Saunders distribution (Birnbaum and Saunders in J Appl Probab 6:319–327, 1969a), has been used successfully to model fatigue failure times. We have considered a Bayesian analysis under a normal-gamma prior. Due to the complexity of the model, Markov chain Monte Carlo methods are used to develop a Bayesian procedure for the considered model. We describe tools for model determination, which include the conditional predictive ordinate, the logarithm of the pseudo-marginal likelihood and the pseudo-Bayes factor. Additionally, case deletion influence diagnostics are developed for the joint posterior distribution based on the Kullback–Leibler divergence. Two empirical applications are considered in order to illustrate the developed procedures.

3.

A Bayesian latent variable approach to functional principal components analysis with binary and count data

Angelika van der Linde 《AStA Advances in Statistical Analysis》2009,93(3):307-333

Recently, van der Linde (Comput. Stat. Data Anal. 53:517–533, 2008) proposed a variational algorithm to obtain approximate Bayesian inference in functional principal components analysis (FPCA), where the functions were observed with Gaussian noise. Generalized FPCA under different noise models with sparse longitudinal data was developed by Hall et al. (J. R. Stat. Soc. B 70:703–723, 2008), but no Bayesian approach is available yet. It is demonstrated that an adapted version of the variational algorithm can be applied to obtain a Bayesian FPCA for canonical parameter functions, particularly log-intensity functions given Poisson count data or logit-probability functions given binary observations. To this end a second order Taylor expansion of the log-likelihood, that is, a working Gaussian distribution and hence another step of approximation, is used. Although the approach is conceptually straightforward, difficulties can arise in practical applications depending on the accuracy of the approximation and the information in the data. A modified algorithm is introduced generally for one-parameter exponential families and exemplified for binary and count data. Conditions for its successful application are discussed and illustrated using simulated data sets. Also an application with real data is presented.

4.

A computational framework for empirical Bayes inference

Yves F. Atchadé 《Statistics and Computing》2011,21(4):463-473

In empirical Bayes inference one is typically interested in sampling from the posterior distribution of a parameter with a hyper-parameter set to its maximum likelihood estimate. This is often problematic, particularly when the likelihood function of the hyper-parameter is not available in closed form and the posterior distribution is intractable. Previous works have dealt with this problem using a multi-step approach based on the EM algorithm and Markov Chain Monte Carlo (MCMC). We propose a framework based on recent developments in adaptive MCMC, where this problem is addressed more efficiently using a single Monte Carlo run. We discuss the convergence of the algorithm and its connection with the EM algorithm. We apply our algorithm to the Bayesian Lasso of Park and Casella (J. Am. Stat. Assoc. 103:681–686, 2008) and to the empirical Bayes variable selection of George and Foster (J. Am. Stat. Assoc. 87:731–747, 2000).
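
To make the empirical Bayes recipe concrete, here is a minimal sketch for a toy case in which everything is available in closed form, so no MCMC is needed. The model is an assumption chosen for illustration and does not come from the paper: observations x_i ~ N(θ_i, 1) with prior θ_i ~ N(0, τ²), so that marginally x_i ~ N(0, 1 + τ²). The name `empirical_bayes_normal_means` is hypothetical.

```python
def empirical_bayes_normal_means(x):
    """Plug-in empirical Bayes for x_i ~ N(theta_i, 1), theta_i ~ N(0, tau2)."""
    # Marginally x_i ~ N(0, 1 + tau2), so the marginal MLE of tau2 is
    # mean(x_i^2) - 1, truncated at zero.
    mean_square = sum(v * v for v in x) / len(x)
    tau2_hat = max(0.0, mean_square - 1.0)
    # Given tau2, the posterior mean of theta_i is tau2 / (1 + tau2) * x_i.
    shrink = tau2_hat / (1.0 + tau2_hat)
    return tau2_hat, [shrink * v for v in x]


# Hypothetical data with mean square 4, giving tau2_hat = 3 and shrinkage 0.75.
x = [2.0, -2.0, 2.0, -2.0]
tau2_hat, posterior_means = empirical_bayes_normal_means(x)
```

The difficulty described in the abstract arises when neither of the two closed-form steps here (the marginal likelihood of the hyper-parameter and the posterior given it) is available.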

5.

The structured elastic net for quantile regression and support vector classification

Martin Slawski 《Statistics and Computing》2012,22(1):153-168

In view of its ongoing importance for a variety of practical applications, feature selection via ℓ₁-regularization methods like the lasso has been subject to extensive theoretical as well as empirical investigations. Despite its popularity, mere ℓ₁-regularization has been criticized for being inadequate or ineffective, notably in situations in which additional structural knowledge about the predictors should be taken into account. This has stimulated the development of either systematically different regularization methods or double regularization approaches which combine ℓ₁-regularization with a second kind of regularization designed to capture additional problem-specific structure. One instance thereof is the ‘structured elastic net’, a generalization of the proposal in Zou and Hastie (J. R. Stat. Soc. Ser. B 67:301–320, 2005), studied in Slawski et al. (Ann. Appl. Stat. 4(2):1056–1080, 2010) for the class of generalized linear models.

6.

On population-based simulation for static inference

Ajay Jasra David A. Stephens Christopher C. Holmes 《Statistics and Computing》2007,17(3):263-279

In this paper we present a review of population-based simulation for static inference problems. Such methods can be described as generating a collection of random variables $\{X_n\}_{n=1,\ldots,N}$ in parallel in order to simulate from some target density π (or potentially sequence of target densities). Population-based simulation is important as many challenging sampling problems in applied statistics cannot be dealt with successfully by conventional Markov chain Monte Carlo (MCMC) methods. We summarize population-based MCMC (Geyer, Computing Science and Statistics: The 23rd Symposium on the Interface, pp. 156–163, 1991; Liang and Wong, J. Am. Stat. Assoc. 96, 653–666, 2001) and sequential Monte Carlo samplers (SMC) (Del Moral, Doucet and Jasra, J. Roy. Stat. Soc. Ser. B 68, 411–436, 2006a), providing a comparison of the approaches. We give numerical examples from Bayesian mixture modelling (Richardson and Green, J. Roy. Stat. Soc. Ser. B 59, 731–792, 1997).

7.

On-line changepoint detection and parameter estimation with application to genomic data

François Caron Arnaud Doucet Raphael Gottardo 《Statistics and Computing》2012,22(2):579-595

8.

Automatic Bayesian quantile regression curve fitting

Colin Chen Keming Yu 《Statistics and Computing》2009,19(3):271-281

Quantile regression, including median regression, is a more complete statistical model than mean regression and is now well known for its widespread applications. Bayesian inference on quantile regression, or Bayesian quantile regression, has attracted much interest recently. Most existing research in Bayesian quantile regression focuses on parametric quantile regression, though there are discussions on different ways of modeling the model error by a parametric distribution named asymmetric Laplace distribution or by a nonparametric alternative named scale mixture asymmetric Laplace distribution. This paper discusses Bayesian inference for nonparametric quantile regression. This general approach fits quantile regression curves using piecewise polynomial functions with an unknown number of knots at unknown locations, all treated as parameters to be inferred through reversible jump Markov chain Monte Carlo (RJMCMC) of Green (Biometrika 82:711–732, 1995). Instead of drawing samples from the posterior, we use regression quantiles to create Markov chains for the estimation of the quantile curves. We also use an approximate Bayes factor in the inference. This method extends the work in automatic Bayesian mean curve fitting to quantile regression. Numerical results show that this Bayesian quantile smoothing technique is competitive with quantile regression/smoothing splines of He and Ng (Comput. Stat. 14:315–337, 1999) and P-splines (penalized splines) of Eilers and de Menezes (Bioinformatics 21(7):1146–1153, 2005).
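
As background to the abstract above, quantile regression rests on the check loss, whose negative is the log-kernel of the asymmetric Laplace distribution. The sketch below covers only the simplest case, a constant fitted to a sample, and is not the RJMCMC curve-fitting method of the paper; the function names and the toy data are assumptions made for illustration.

```python
def check_loss(u, tau):
    """Check loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(y, tau):
    """Return the sample point that minimises the total check loss."""
    # A minimiser over constants is always attained at a data point,
    # so scanning the sample itself is sufficient.
    return min(y, key=lambda q: sum(check_loss(v - q, tau) for v in y))


# Hypothetical sample with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median = sample_quantile_by_check_loss(y, 0.5)
upper = sample_quantile_by_check_loss(y, 0.9)
```

With tau = 0.5 the loss is proportional to the absolute error, so the minimiser is the median and the outlier at 100 has no pull on it, unlike the sample mean.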

9.

A note on “Data depths satisfying the projection property”

Li Qiang Wu Yi 《AStA Advances in Statistical Analysis》2008,92(2):229-232

This note is on two theorems in a paper by Rainer Dyckerhoff (Allg. Stat. Arch. 88:163–190, 2004). We state a missing condition in Theorem 3. On the other hand, Theorem 2 can be weakened.

10.

Make assurance double sure: combination of two disclosure limitation methods and estimation of general regression models

Anton Flossmann Sandra Nolte 《AStA Advances in Statistical Analysis》2008,92(4):405-422

In order to guarantee confidentiality and privacy of firm-level data, statistical offices apply various disclosure limitation techniques. However, each anonymization technique has its protection limits such that the probability of disclosing the individual information for some observations is not minimized. To overcome this problem, we propose combining two separate disclosure limitation techniques, blanking and multiplication of independent noise, in order to protect the original dataset. The proposed approach yields a decrease in the probability of reidentifying/disclosing individual information and can be applied to linear and nonlinear regression models. We show how to combine the blanking method with the multiplicative measurement error method and how to estimate the model by combining the multiplicative Simulation-Extrapolation (M-SIMEX) approach from Nolte (2007) on the one side with the Inverse Probability Weighting (IPW) approach going back to Horvitz and Thompson (J. Am. Stat. Assoc. 47:663–685, 1952) and on the other side with matching methods, as an alternative to IPW, like the semiparametric M-Estimator proposed by Flossmann (2007). Based on Monte Carlo simulations, we show that multiplicative measurement error combined with blanking as a masking procedure does not necessarily lead to a severe reduction in the estimation quality, provided that its effects on the data generating process are known.

11.

A management consulting view on the statistical consulting process

Dietmar Fink Oded Löwenbein 《AStA Advances in Statistical Analysis》2010,94(1):105-109

In an earlier contribution to this journal, Kauermann and Weihs (Adv. Stat. Anal. 91(4):344, 2007) addressed the lack of procedural understanding in statistical consulting: “Even though there seems to be a consensus that statistical consulting should be well structured and target-orientated, the range of activity and the process itself seem to be less well-understood.” While this issue appears to be rather new to statistical consultants, other consulting disciplines, in particular management consulting, have long since come up with a viable approach that divides the typical consulting process into seven successive steps. Using this model as a frame allows for reflecting on the approaches to statistical consulting suggested by authors published in AStA volume 91, number 4, and for adding value to statistical consulting in general.

12.

Estimation of 2D jump location curve and 3D jump location surface in nonparametric regression

Chih-Kang Chu Jhao-Siang Siao Lih-Chung Wang Wen-Shuenn Deng 《Statistics and Computing》2012,22(1):17-31

A new procedure is proposed to estimate the jump location curve and surface in the two-dimensional (2D) and three-dimensional (3D) nonparametric jump regression models, respectively. In each of the 2D and 3D cases, our estimation procedure is motivated by the fact that, under some regularity conditions, the ridge location of the rotational difference kernel estimate (RDKE; Qiu in Sankhyā Ser. A 59, 268–294, 1997, and J. Comput. Graph. Stat. 11, 799–822, 2002; Garlipp and Müller in Sankhyā Ser. A 69, 55–86, 2007) obtained from the noisy image is asymptotically close to the jump location of the true image. Accordingly, a computational procedure based on the kernel smoothing method is designed to find the ridge location of RDKE, and the result is taken as the jump location estimate. The sequence relationship among the points comprising our jump location estimate is obtained. Our jump location estimate is produced without the knowledge of the range or shape of jump region. Simulation results demonstrate that the proposed estimation procedure can detect the jump location very well, and thus it is a useful alternative for estimating the jump location in each of the 2D and 3D cases.

13.

Using recursive algorithms for the efficient identification of smoothing spline ANOVA models

Marco Ratto Andrea Pagano 《AStA Advances in Statistical Analysis》2010,94(4):367-388

In this paper we present a unified discussion of different approaches to the identification of smoothing spline analysis of variance (ANOVA) models: (i) the “classical” approach (in the line of Wahba in Spline Models for Observational Data, 1990; Gu in Smoothing Spline ANOVA Models, 2002; Storlie et al. in Stat. Sin., 2011) and (ii) the State-Dependent Regression (SDR) approach of Young in Nonlinear Dynamics and Statistics (2001). The latter is a nonparametric approach which is very similar to smoothing splines and kernel regression methods, but based on recursive filtering and smoothing estimation (the Kalman filter combined with fixed interval smoothing). We will show that SDR can be effectively combined with the “classical” approach to obtain a more accurate and efficient estimation of smoothing spline ANOVA models to be applied for emulation purposes. We will also show that such an approach can compare favorably with kriging.

14.

Spectral estimation for locally stationary time series with missing observations

Marina I. Knight Matthew A. Nunes Guy P. Nason 《Statistics and Computing》2012,22(4):877-895

Time series arising in practice often have an inherently irregular sampling structure or missing values, which can arise for example due to a faulty measuring device or complex time-dependent nature. Spectral decomposition of time series is a traditionally useful tool for data variability analysis. However, existing methods for spectral estimation often assume a regularly-sampled time series, or require modifications to cope with irregular or ‘gappy’ data. Additionally, many techniques also assume that the time series are stationary, which in the majority of cases is demonstrably not appropriate. This article addresses the topic of spectral estimation of a non-stationary time series sampled with missing data. The time series is modelled as a locally stationary wavelet process in the sense introduced by Nason et al. (J. R. Stat. Soc. B 62(2):271–292, 2000) and its realization is assumed to feature missing observations. Our work proposes an estimator (the periodogram) for the process wavelet spectrum, which copes with the missing data whilst relaxing the strong assumption of stationarity. At the centre of our construction are second generation wavelets built by means of the lifting scheme (Sweldens, Wavelet Applications in Signal and Image Processing III, Proc. SPIE, vol. 2569, pp. 68–79, 1995), designed to cope with irregular data. We investigate the theoretical properties of our proposed periodogram, and show that it can be smoothed to produce a bias-corrected spectral estimate by adopting a penalized least squares criterion. We demonstrate our method with real data and simulated examples.

15.

Variance decompositions of nonlinear time series using stochastic simulation and sensitivity analysis

T. J. Harris W. Yu 《Statistics and Computing》2012,22(2):387-396

In this paper, a variance decomposition approach to quantify the effects of endogenous and exogenous variables for nonlinear time series models is developed. This decomposition is taken temporally with respect to the source of variation. The methodology uses Monte Carlo methods to effect the variance decomposition using the ANOVA-like procedures proposed in Archer et al. (J. Stat. Comput. Simul. 58:99–120, 1997) and Sobol’ (Math. Model. 2:112–118, 1990). The results of this paper can be used in investment problems, biomathematics and control theory, where nonlinear time series with multiple inputs are encountered.
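
For readers unfamiliar with ANOVA-like variance decompositions, the standard form for a scalar output Y = g(X_1, …, X_d) with independent inputs is sketched below. The notation (g, V_i, S_i) is assumed here for illustration and is not taken from the paper, whose time-series setting may differ in detail.

```latex
\operatorname{Var}(Y)
  = \sum_{i=1}^{d} V_i
  + \sum_{1 \le i < j \le d} V_{ij}
  + \cdots
  + V_{1 2 \cdots d},
\qquad
V_i = \operatorname{Var}\bigl(\mathbb{E}[\,Y \mid X_i\,]\bigr),
\qquad
S_i = \frac{V_i}{\operatorname{Var}(Y)} .
```

Here S_i is the first-order sensitivity index of input X_i. Under this assumption the conditional expectations are typically estimated by Monte Carlo, which is the role simulation plays in the approach described above.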

16.

A Robust Conflict Measure of Inconsistencies in Bayesian Hierarchical Models

Fredrik A. Dahl Jørund Gåsemyr Bent Natvig 《Scandinavian Journal of Statistics》2007,34(4):816-828

Abstract. O'Hagan (Highly Structured Stochastic Systems, Oxford University Press, Oxford, 2003) introduces some tools for criticism of Bayesian hierarchical models that can be applied at each node of the model, with a view to diagnosing problems of model fit at any point in the model structure. His method relies on computing the posterior median of a conflict index, typically through Markov chain Monte Carlo simulations. We investigate a Gaussian model of one-way analysis of variance, and show that O'Hagan's approach gives unreliable false warning probabilities. We extend and refine the method, especially avoiding double use of data by a data-splitting approach, accompanied by theoretical justifications from a non-trivial special case. Through extensive numerical experiments we show that our method detects model mis-specification about as well as the method of O'Hagan, while retaining the desired false warning probability for data generated from the assumed model. This also holds for Student's t and uniform distribution versions of the model.

17.

Regression analysis for cumulative incidence probability under competing risks and left-truncated sampling

Shen PS 《Lifetime data analysis》2012,18(1):1-18

The cumulative incidence function provides intuitive summary information about competing risks data. Via a mixture decomposition of this function, Chang and Wang (Statist. Sinica 19:391–408, 2009) study how covariates affect the cumulative incidence probability of a particular failure type at a chosen time point. Without specifying the corresponding failure time distribution, they proposed two estimators and derived their large sample properties. The first estimator utilized the technique of weighting to adjust for the censoring bias, and can be considered as an extension of Fine’s method (J R Stat Soc Ser B 61:817–830, 1999). The second used imputation and extends the idea of Wang (J R Stat Soc Ser B 65:921–935, 2003) from a nonparametric setting to the current regression framework. In this article, when covariates take only discrete values, we extend both approaches of Chang and Wang (Statist. Sinica 19:391–408, 2009) by allowing left truncation. Large sample properties of the proposed estimators are derived, and their finite sample performance is investigated through a simulation study. We also apply our methods to heart transplant survival data.

18.

Factor stochastic volatility with time varying loadings and Markov switching regimes

Hedibert Freitas Lopes Carlos Marinho Carvalho 《Journal of statistical planning and inference》2007

We generalize the factor stochastic volatility (FSV) model of Pitt and Shephard [1999. Time varying covariances: a factor stochastic volatility approach (with discussion). In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, vol. 6, Oxford University Press, London, pp. 547–570.] and Aguilar and West [2000. Bayesian dynamic factor models and variance matrix discounting for portfolio allocation. J. Business Econom. Statist. 18, 338–357.] in two important directions. First, we make the FSV model more flexible and able to capture more general time-varying variance–covariance structures by letting the matrix of factor loadings be time dependent. Secondly, we entertain FSV models with jumps in the common factor volatilities through So, Lam and Li's [1998. A stochastic volatility model with Markov switching. J. Business Econom. Statist. 16, 244–253.] Markov switching stochastic volatility model. Novel Markov Chain Monte Carlo algorithms are derived for both classes of models. We apply our methodology to two illustrative situations: daily exchange rate returns [Aguilar, O., West, M., 2000. Bayesian dynamic factor models and variance matrix discounting for portfolio allocation. J. Business Econom. Statist. 18, 338–357.] and Latin American stock returns [Lopes, H.F., Migon, H.S., 2002. Comovements and contagion in emergent markets: stock indexes volatilities. In: Gatsonis, C., Kass, R.E., Carriquiry, A.L., Gelman, A., Verdinelli, I., Pauler, D., Higdon, D. (Eds.), Case Studies in Bayesian Statistics, vol. 6, pp. 287–302].

19.

The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator

Steffen Liebscher Thomas Kirschstein Claudia Becker 《Statistics and Computing》2012,22(1):325-336

Self-organizing maps (SOMs) introduced by Kohonen (Biol. Cybern. 43(1):59–69, 1982) are well-known in the field of artificial neural networks. The way SOMs operate is very intuitive, leading to great popularity and numerous applications (related to statistics: classification, clustering). The result of the unsupervised learning process performed by SOMs is a non-linear, low-dimensional projection of the high-dimensional input data that preserves certain features of the underlying data, e.g. the topology and probability distribution (Lee and Verleysen in Nonlinear Dimensionality Reduction, Springer, 2007; Kohonen in Self-organizing Maps, 3rd edn., Springer, 2001).

20.

Cluster analysis of massive datasets in astronomy

Woncheol Jang Martin Hendry 《Statistics and Computing》2007,17(3):253-262

Clusters of galaxies are a useful proxy to trace the distribution of mass in the universe. By measuring the mass of clusters of galaxies on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set $S_c \equiv \{f > c\}$ where f is a probability density function. Cuevas et al. (Can. J. Stat. 28, 367–382, 2000; Comput. Stat. Data Anal. 36, 441–459, 2001) proposed a nonparametric method for density contour clusters, attempting to find density contour clusters by the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computations for large datasets. We propose a more efficient clustering method based on their algorithm with the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering on large astronomical sky survey data.
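
The notion of a density contour cluster can be illustrated on a small grid. The sketch below assumes the density has already been evaluated on a regular 2D grid and labels the 4-connected components of the cells where it exceeds c with a simple flood fill. It is neither the minimal-spanning-tree method of Cuevas et al. nor the FFT-based method of the paper; the function name `level_set_clusters` and the grid values are hypothetical.

```python
def level_set_clusters(density, c):
    """Label 4-connected components of the grid cells where density > c."""
    rows, cols = len(density), len(density[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not in any cluster"
    n_clusters = 0
    for r in range(rows):
        for s in range(cols):
            # Skip cells outside the level set or already labelled.
            if density[r][s] <= c or labels[r][s]:
                continue
            # Start a new cluster and flood-fill it with an explicit stack.
            n_clusters += 1
            labels[r][s] = n_clusters
            stack = [(r, s)]
            while stack:
                i, j = stack.pop()
                for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    inside = 0 <= a < rows and 0 <= b < cols
                    if inside and density[a][b] > c and not labels[a][b]:
                        labels[a][b] = n_clusters
                        stack.append((a, b))
    return n_clusters, labels


# Hypothetical density values on a 3 x 4 grid with two separated bumps.
density = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.6],
    [0.1, 0.1, 0.5, 0.9],
]
n_clusters, labels = level_set_clusters(density, 0.4)
```

Lowering c merges clusters and raising it splits or removes them, which is why the choice of level matters in this formulation.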