期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Using recursive algorithms for the efficient identification of smoothing spline ANOVA models

Marco Ratto Andrea Pagano 《AStA Advances in Statistical Analysis》2010,94(4):367-388

In this paper we present a unified discussion of different approaches to the identification of smoothing spline analysis of variance (ANOVA) models: (i) the “classical” approach (in the line of Wahba in Spline Models for Observational Data, 1990; Gu in Smoothing Spline ANOVA Models, 2002; Storlie et al. in Stat. Sin., 2011) and (ii) the State-Dependent Regression (SDR) approach of Young in Nonlinear Dynamics and Statistics (2001). The latter is a nonparametric approach which is very similar to smoothing splines and kernel regression methods, but based on recursive filtering and smoothing estimation (the Kalman filter combined with fixed interval smoothing). We will show that SDR can be effectively combined with the “classical” approach to obtain a more accurate and efficient estimation of smoothing spline ANOVA models to be applied for emulation purposes. We will also show that such an approach can compare favorably with kriging. 相似文献

2.

A Bayesian latent variable approach to functional principal components analysis with binary and count data

Angelika van der Linde 《AStA Advances in Statistical Analysis》2009,93(3):307-333

Recently, van der Linde (Comput. Stat. Data Anal. 53:517–533, 2008) proposed a variational algorithm to obtain approximate Bayesian inference in functional principal components analysis (FPCA), where the functions were observed with Gaussian noise. Generalized FPCA under different noise models with sparse longitudinal data was developed by Hall et al. (J. R. Stat. Soc. B 70:703–723, 2008), but no Bayesian approach is available yet. It is demonstrated that an adapted version of the variational algorithm can be applied to obtain a Bayesian FPCA for canonical parameter functions, particularly log-intensity functions given Poisson count data or logit-probability functions given binary observations. To this end a second order Taylor expansion of the log-likelihood, that is, a working Gaussian distribution and hence another step of approximation, is used. Although the approach is conceptually straightforward, difficulties can arise in practical applications depending on the accuracy of the approximation and the information in the data. A modified algorithm is introduced generally for one-parameter exponential families and exemplified for binary and count data. Conditions for its successful application are discussed and illustrated using simulated data sets. Also an application with real data is presented. 相似文献

3.

A multivariate uniformity test for the case of unknown support

José R. Berrendero Antonio Cuevas Beatriz Pateiro-López 《Statistics and Computing》2012,22(1):259-271

A test for the hypothesis of uniformity on a support S⊂ℝ^d is proposed. It is based on the use of multivariate spacings as those studied in Janson (Ann. Probab. 15:274–280, 1987). As a novel aspect, this test can be adapted to the case that the support S is unknown, provided that it fulfils the shape condition of λ-convexity. The consistency properties of this test are analyzed and its performance is checked through a small simulation study. The numerical problems involved in the practical calculation of the maximal spacing (which is required to obtain the test statistic) are also discussed in some detail. 相似文献

4.

Efficiency improvement in a class of survival models through model-free covariate incorporation 总被引：1，自引：1，他引：0

Garcia TP Ma Y Yin G 《Lifetime data analysis》2011,17(4):552-565

In randomized clinical trials, we are often concerned with comparing two-sample survival data. Although the log-rank test is usually suitable for this purpose, it may result in substantial power loss when the two groups have nonproportional hazards. In a more general class of survival models of Yang and Prentice (Biometrika 92:1–17, 2005), which includes the log-rank test as a special case, we improve model efficiency by incorporating auxiliary covariates that are correlated with the survival times. In a model-free form, we augment the estimating equation with auxiliary covariates, and establish the efficiency improvement using the semiparametric theories in Zhang et al. (Biometrics 64:707–715, 2008) and Lu and Tsiatis (Biometrics, 95:674–679, 2008). Under minimal assumptions, our approach produces an unbiased, asymptotically normal estimator with additional efficiency gain. Simulation studies and an application to a leukemia study show the satisfactory performance of the proposed method. 相似文献

5.

Semi-parametric hybrid empirical likelihood inference for two-sample comparison with censored data

Su H Zhou M Liang H 《Lifetime data analysis》2011,17(4):533-551

Two-sample comparison problems are often encountered in practical projects and have widely been studied in literature. Owing to practical demands, the research for this topic under special settings such as a semiparametric framework have also attracted great attentions. Zhou and Liang (Biometrika 92:271–282, 2005) proposed an empirical likelihood-based semi-parametric inference for the comparison of treatment effects in a two-sample problem with censored data. However, their approach is actually a pseudo-empirical likelihood and the method may not be fully efficient. In this study, we develop a new empirical likelihood-based inference under more general framework by using the hazard formulation of censored data for two sample semi-parametric hybrid models. We demonstrate that our empirical likelihood statistic converges to a standard chi-squared distribution under the null hypothesis. We further illustrate the use of the proposed test by testing the ROC curve with censored data, among others. Numerical performance of the proposed method is also examined. 相似文献

6.

Perfect sampling algorithm for small <Emphasis Type="Italic">m</Emphasis>×<Emphasis Type="Italic">n</Emphasis> contingency tables

Nicolas Wicker 《Statistics and Computing》2010,20(1):57-61

A Markov chain is proposed that uses coupling from the past sampling algorithm for sampling m×n contingency tables. This method is an extension of the one proposed by Kijima and Matsui (Rand. Struct. Alg., 29:243–256, 2006). It is not polynomial, as it is based upon a recursion, and includes a rejection phase but can be used for practical purposes on small contingency tables as illustrated in a classical 4×4 example. 相似文献

7.

Regression analysis for cumulative incidence probability under competing risks and left-truncated sampling

Shen PS 《Lifetime data analysis》2012,18(1):1-18

The cumulative incidence function provides intuitive summary information about competing risks data. Via a mixture decomposition of this function, Chang and Wang (Statist. Sinca 19:391–408, 2009) study how covariates affect the cumulative incidence probability of a particular failure type at a chosen time point. Without specifying the corresponding failure time distribution, they proposed two estimators and derived their large sample properties. The first estimator utilized the technique of weighting to adjust for the censoring bias, and can be considered as an extension of Fine’s method (J R Stat Soc Ser B 61: 817–830, 1999). The second used imputation and extends the idea of Wang (J R Stat Soc Ser B 65: 921–935, 2003) from a nonparametric setting to the current regression framework. In this article, when covariates take only discrete values, we extend both approaches of Chang and Wang (Statist Sinca 19:391–408, 2009) by allowing left truncation. Large sample properties of the proposed estimators are derived, and their finite sample performance is investigated through a simulation study. We also apply our methods to heart transplant survival data. 相似文献

8.

A comment on the orthogonalization of B-spline basis functions and their derivatives

Andrew Redd 《Statistics and Computing》2012,22(1):251-257

Through the use of a matrix representation for B-splines presented by Qin (Vis. Comput. 16:177–186, 2000) we are able to reexamine calculus operations on B-spline basis functions. In this matrix framework the problem associated with generating orthogonal splines is reexamined, and we show that this approach can simplify the operations involved to linear matrix operations. We apply these results to a recent paper (Zhou et al. in Biometrika 95:601–619, 2008) on hierarchical functional data analysis using a principal components approach, where a numerical integration scheme was used to orthogonalize a set of B-spline basis functions. These orthogonalized basis functions, along with their estimated derivatives, are then used to construct estimates of mean functions and functional principal components. By applying the methods presented here such algorithms can benefit from increased speed and precision. An R package is available to do the computations. 相似文献

9.

A partially adaptive estimator for the censored regression model based on a mixture of normal distributions 总被引：1，自引：0，他引：1

Steven B. Caudill 《Statistical Methods and Applications》2012,21(2):121-137

The goal of this paper is to introduce a partially adaptive estimator for the censored regression model based on an error structure described by a mixture of two normal distributions. The model we introduce is easily estimated by maximum likelihood using an EM algorithm adapted from the work of Bartolucci and Scaccia (Comput Stat Data Anal 48:821–834, 2005). A Monte Carlo study is conducted to compare the small sample properties of this estimator to the performance of some common alternative estimators of censored regression models including the usual tobit model, the CLAD estimator of Powell (J Econom 25:303–325, 1984), and the STLS estimator of Powell (Econometrica 54:1435–1460, 1986). In terms of RMSE, our partially adaptive estimator performed well. The partially adaptive estimator is applied to data on wife’s hours worked from Mroz (1987). In this application we find support for the partially adaptive estimator over the usual tobit model. 相似文献

10.

Cluster analysis of massive datasets in astronomy

Woncheol Jang Martin Hendry 《Statistics and Computing》2007,17(3):253-262

Clusters of galaxies are a useful proxy to trace the distribution of mass in the universe. By measuring the mass of clusters of galaxies on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set S _c≡{f>c} where f is a probability density function. Cuevas et al. (Can. J. Stat. 28, 367–382, 2000; Comput. Stat. Data Anal. 36, 441–459, 2001) proposed a nonparametric method for density contour clusters, attempting to find density contour clusters by the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computations for large datasets. We propose a more efficient clustering method based on their algorithm with the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering on large astronomical sky survey data. 相似文献

11.

Test of hypotheses in panel data models when the regressor and disturbances are possibly non-stationary

Badi H. Baltagi Chihwa Kao Sanggon Na 《AStA Advances in Statistical Analysis》2011,95(4):329-350

This paper considers the problem of hypothesis testing in a simple panel data regression model with random individual effects and serially correlated disturbances. Following Baltagi et al. (Econom. J. 11:554–572, 2008), we allow for the possibility of non-stationarity in the regressor and/or the disturbance term. While Baltagi et al. (Econom. J. 11:554–572, 2008) focus on the asymptotic properties and distributions of the standard panel data estimators, this paper focuses on testing of hypotheses in this setting. One important finding is that unlike the time-series case, one does not necessarily need to rely on the “super-efficient” type AR estimator by Perron and Yabu (J. Econom. 151:56–69, 2009) to make an inference in the panel data. In fact, we show that the simple t-ratio always converges to the standard normal distribution, regardless of whether the disturbances and/or the regressor are stationary. 相似文献

12.

The structured elastic net for quantile regression and?support vector classification

Martin Slawski 《Statistics and Computing》2012,22(1):153-168

In view of its ongoing importance for a variety of practical applications, feature selection via ℓ ₁-regularization methods like the lasso has been subject to extensive theoretical as well empirical investigations. Despite its popularity, mere ℓ ₁-regularization has been criticized for being inadequate or ineffective, notably in situations in which additional structural knowledge about the predictors should be taken into account. This has stimulated the development of either systematically different regularization methods or double regularization approaches which combine ℓ ₁-regularization with a second kind of regularization designed to capture additional problem-specific structure. One instance thereof is the ‘structured elastic net’, a generalization of the proposal in Zou and Hastie (J. R. Stat. Soc. Ser. B 67:301–320, 2005), studied in Slawski et al. (Ann. Appl. Stat. 4(2):1056–1080, 2010) for the class of generalized linear models. 相似文献

13.

Stochastic boosting algorithms

Ajay Jasra Christopher C. Holmes 《Statistics and Computing》2011,21(3):335-347

In this article we develop a class of stochastic boosting (SB) algorithms, which build upon the work of Holmes and Pintore (Bayesian Stat. 8, Oxford University Press, Oxford, 2007). They introduce boosting algorithms which correspond to standard boosting (e.g. Bühlmann and Hothorn, Stat. Sci. 22:477–505, 2007) except that the optimization algorithms are randomized; this idea is placed within a Bayesian framework. We show that the inferential procedure in Holmes and Pintore (Bayesian Stat. 8, Oxford University Press, Oxford, 2007) is incorrect and further develop interpretational, computational and theoretical results which allow one to assess SB’s potential for classification and regression problems. To use SB, sequential Monte Carlo (SMC) methods are applied. As a result, it is found that SB can provide better predictions for classification problems than the corresponding boosting algorithm. A theoretical result is also given, which shows that the predictions of SB are not significantly worse than boosting, when the latter provides the best prediction. We also investigate the method on a real case study from machine learning. 相似文献

14.

Automatic Bayesian quantile regression curve fitting

Colin Chen Keming Yu 《Statistics and Computing》2009,19(3):271-281

Quantile regression, including median regression, as a more completed statistical model than mean regression, is now well known with its wide spread applications. Bayesian inference on quantile regression or Bayesian quantile regression has attracted much interest recently. Most of the existing researches in Bayesian quantile regression focus on parametric quantile regression, though there are discussions on different ways of modeling the model error by a parametric distribution named asymmetric Laplace distribution or by a nonparametric alternative named scale mixture asymmetric Laplace distribution. This paper discusses Bayesian inference for nonparametric quantile regression. This general approach fits quantile regression curves using piecewise polynomial functions with an unknown number of knots at unknown locations, all treated as parameters to be inferred through reversible jump Markov chain Monte Carlo (RJMCMC) of Green (Biometrika 82:711–732, 1995). Instead of drawing samples from the posterior, we use regression quantiles to create Markov chains for the estimation of the quantile curves. We also use approximate Bayesian factor in the inference. This method extends the work in automatic Bayesian mean curve fitting to quantile regression. Numerical results show that this Bayesian quantile smoothing technique is competitive with quantile regression/smoothing splines of He and Ng (Comput. Stat. 14:315–337, 1999) and P-splines (penalized splines) of Eilers and de Menezes (Bioinformatics 21(7):1146–1153, 2005). 相似文献

15.

ML estimation for factor analysis: EM or non-EM?

J.-H. Zhao Philip L. H. Yu Qibao Jiang 《Statistics and Computing》2008,18(2):109-123

To obtain maximum likelihood (ML) estimation in factor analysis (FA), we propose in this paper a novel and fast conditional maximization (CM) algorithm, which has quadratic and monotone convergence, consisting of a sequence of CM log-likelihood (CML) steps. The main contribution of this algorithm is that the closed form expression for the parameter to be updated in each step can be obtained explicitly, without resorting to any numerical optimization methods. In addition, a new ECME algorithm similar to Liu’s (Biometrika 81, 633–648, 1994) one is obtained as a by-product, which turns out to be very close to the simple iteration algorithm proposed by Lawley (Proc. R. Soc. Edinb. 60, 64–82, 1940) but our algorithm is guaranteed to increase log-likelihood at every iteration and hence to converge. Both algorithms inherit the simplicity and stability of EM but their convergence behaviors are much different as revealed in our extensive simulations: (1) In most situations, ECME and EM perform similarly; (2) CM outperforms EM and ECME substantially in all situations, no matter assessed by the CPU time or the number of iterations. Especially for the case close to the well known Heywood case, it accelerates EM by factors of around 100 or more. Also, CM is much more insensitive to the choice of starting values than EM and ECME. 相似文献

16.

Variance decompositions of nonlinear time series using stochastic simulation and sensitivity analysis

T. J. Harris W. Yu 《Statistics and Computing》2012,22(2):387-396

In this paper, A variance decomposition approach to quantify the effects of endogenous and exogenous variables for nonlinear time series models is developed. This decomposition is taken temporally with respect to the source of variation. The methodology uses Monte Carlo methods to affect the variance decomposition using the ANOVA-like procedures proposed in Archer et al. (J. Stat. Comput. Simul. 58:99–120, 1997), Sobol’ (Math. Model. 2:112–118, 1990). The results of this paper can be used in investment problems, biomathematics and control theory, where nonlinear time series with multiple inputs are encountered. 相似文献

17.

Feasible estimation in generalized structured models

Javier Roca-Pardiñas Stefan Sperlich 《Statistics and Computing》2010,20(3):367-379

This article introduces a feasible estimation method for a large class of semi and nonparametric models. We present the family of generalized structured models which we wish to estimate. After highlighting the main idea of the theoretical smooth backfitting estimators, we introduce a general estimation procedure. We consider modifications and practical issues, and discuss inference, cross validation, and asymptotic theory applying the theoretical framework of Mammen and Nielsen (Biometrika 90: 551–566, 2003). An extensive simulation study shows excellent performance of our method. Furthermore, real data applications from environmetrics and biometrics demonstrate its usefulness. 相似文献

18.

Multivariate CUSUM chart: properties and enhancements

Vasyl Golosnoy Sergiy Ragulin Wolfgang Schmid 《AStA Advances in Statistical Analysis》2009,93(3):263-279

The multivariate CUSUM#1 control chart of Pignatiello and Runger (J. Qual. Technol. 22:173–186, 1991) is widely used in practical applications due to its good ability to detect shifts of small and medium size in a process of interest. This paper investigates properties and suggests several refinements of this chart. The performance of the competing procedures is evaluated within a Monte Carlo simulation study. The suggested log MCUSUM chart proves to be the best among the investigated alternatives for the considered performance criteria. 相似文献

19.

On population-based simulation for static inference

Ajay Jasra David A. Stephens Christopher C. Holmes 《Statistics and Computing》2007,17(3):263-279

In this paper we present a review of population-based simulation for static inference problems. Such methods can be described as generating a collection of random variables {X _n}_n=1,…,N in parallel in order to simulate from some target density π (or potentially sequence of target densities). Population-based simulation is important as many challenging sampling problems in applied statistics cannot be dealt with successfully by conventional Markov chain Monte Carlo (MCMC) methods. We summarize population-based MCMC (Geyer, Computing Science and Statistics: The 23rd Symposium on the Interface, pp. 156–163, 1991; Liang and Wong, J. Am. Stat. Assoc. 96, 653–666, 2001) and sequential Monte Carlo samplers (SMC) (Del Moral, Doucet and Jasra, J. Roy. Stat. Soc. Ser. B 68, 411–436, 2006a), providing a comparison of the approaches. We give numerical examples from Bayesian mixture modelling (Richardson and Green, J. Roy. Stat. Soc. Ser. B 59, 731–792, 1997). 相似文献

20.

Multitrait-multimethod change modelling

Christian Geiser Michael Eid Fridtjof W. Nussbeck Delphine S. Courvoisier David A. Cole 《AStA Advances in Statistical Analysis》2010,94(2):185-201

Geiser (Multitrait-multimethod-multioccasion modeling, 2009) recently presented the Correlated State-Correlated (Methods-Minus-1) [CS-C(M−1)] model for analysing longitudinal multitrait-multimethod (MTMM) data. In the present article, the authors discuss the extension of the CS-C(M−1) model to a model that includes latent difference variables, called CS-C(M−1) change model. The CS-C(M−1) change model allows investigators to study inter-individual differences in intra-individual change over time, to separate true change from random measurement error, and to analyse change simultaneously for different methods. Change in a reference method can be contrasted with change in other methods to analyse convergent validity of change. 相似文献