期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Cluster analysis of massive datasets in astronomy

Woncheol Jang Martin Hendry 《Statistics and Computing》2007,17(3):253-262

Clusters of galaxies are a useful proxy to trace the distribution of mass in the universe. By measuring the mass of clusters of galaxies on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set S _c≡{f>c} where f is a probability density function. Cuevas et al. (Can. J. Stat. 28, 367–382, 2000; Comput. Stat. Data Anal. 36, 441–459, 2001) proposed a nonparametric method for density contour clusters, attempting to find density contour clusters by the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computations for large datasets. We propose a more efficient clustering method based on their algorithm with the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering on large astronomical sky survey data. 相似文献

2.

Output-sensitive algorithms for Tukey depth and related problems

David Bremner Dan Chen John Iacono Stefan Langerman Pat Morin 《Statistics and Computing》2008,18(3):259-266

The Tukey depth (Proceedings of the International Congress of Mathematicians, vol. 2, pp. 523–531, 1975) of a point p with respect to a finite set S of points is the minimum number of elements of S contained in any closed halfspace that contains p. Algorithms for computing the Tukey depth of a point in various dimensions are considered. The running times of these algorithms depend on the value of the output, making them suited to situations, such as outlier removal, where the value of the output is typically small. This research was partly funded by the NSERC Canada. 相似文献

3.

A likelihood ratio test of a homoscedastic normal mixture against a heteroscedastic normal mixture

Yungtai Lo 《Statistics and Computing》2008,18(3):233-240

It is generally assumed that the likelihood ratio statistic for testing the null hypothesis that data arise from a homoscedastic normal mixture distribution versus the alternative hypothesis that data arise from a heteroscedastic normal mixture distribution has an asymptotic χ ² reference distribution with degrees of freedom equal to the difference in the number of parameters being estimated under the alternative and null models under some regularity conditions. Simulations show that the χ ² reference distribution will give a reasonable approximation for the likelihood ratio test only when the sample size is 2000 or more and the mixture components are well separated when the restrictions suggested by Hathaway (Ann. Stat. 13:795–800, 1985) are imposed on the component variances to ensure that the likelihood is bounded under the alternative distribution. For small and medium sample sizes, parametric bootstrap tests appear to work well for determining whether data arise from a normal mixture with equal variances or a normal mixture with unequal variances. 相似文献

4.

Test of hypotheses in panel data models when the regressor and disturbances are possibly non-stationary

Badi H. Baltagi Chihwa Kao Sanggon Na 《AStA Advances in Statistical Analysis》2011,95(4):329-350

This paper considers the problem of hypothesis testing in a simple panel data regression model with random individual effects and serially correlated disturbances. Following Baltagi et al. (Econom. J. 11:554–572, 2008), we allow for the possibility of non-stationarity in the regressor and/or the disturbance term. While Baltagi et al. (Econom. J. 11:554–572, 2008) focus on the asymptotic properties and distributions of the standard panel data estimators, this paper focuses on testing of hypotheses in this setting. One important finding is that unlike the time-series case, one does not necessarily need to rely on the “super-efficient” type AR estimator by Perron and Yabu (J. Econom. 151:56–69, 2009) to make an inference in the panel data. In fact, we show that the simple t-ratio always converges to the standard normal distribution, regardless of whether the disturbances and/or the regressor are stationary. 相似文献

5.

On population-based simulation for static inference

Ajay Jasra David A. Stephens Christopher C. Holmes 《Statistics and Computing》2007,17(3):263-279

In this paper we present a review of population-based simulation for static inference problems. Such methods can be described as generating a collection of random variables {X _n}_n=1,…,N in parallel in order to simulate from some target density π (or potentially sequence of target densities). Population-based simulation is important as many challenging sampling problems in applied statistics cannot be dealt with successfully by conventional Markov chain Monte Carlo (MCMC) methods. We summarize population-based MCMC (Geyer, Computing Science and Statistics: The 23rd Symposium on the Interface, pp. 156–163, 1991; Liang and Wong, J. Am. Stat. Assoc. 96, 653–666, 2001) and sequential Monte Carlo samplers (SMC) (Del Moral, Doucet and Jasra, J. Roy. Stat. Soc. Ser. B 68, 411–436, 2006a), providing a comparison of the approaches. We give numerical examples from Bayesian mixture modelling (Richardson and Green, J. Roy. Stat. Soc. Ser. B 59, 731–792, 1997). 相似文献

6.

A partially adaptive estimator for the censored regression model based on a mixture of normal distributions 总被引：1，自引：0，他引：1

Steven B. Caudill 《Statistical Methods and Applications》2012,21(2):121-137

The goal of this paper is to introduce a partially adaptive estimator for the censored regression model based on an error structure described by a mixture of two normal distributions. The model we introduce is easily estimated by maximum likelihood using an EM algorithm adapted from the work of Bartolucci and Scaccia (Comput Stat Data Anal 48:821–834, 2005). A Monte Carlo study is conducted to compare the small sample properties of this estimator to the performance of some common alternative estimators of censored regression models including the usual tobit model, the CLAD estimator of Powell (J Econom 25:303–325, 1984), and the STLS estimator of Powell (Econometrica 54:1435–1460, 1986). In terms of RMSE, our partially adaptive estimator performed well. The partially adaptive estimator is applied to data on wife’s hours worked from Mroz (1987). In this application we find support for the partially adaptive estimator over the usual tobit model. 相似文献

7.

Efficiency improvement in a class of survival models through model-free covariate incorporation 总被引：1，自引：1，他引：0

Garcia TP Ma Y Yin G 《Lifetime data analysis》2011,17(4):552-565

In randomized clinical trials, we are often concerned with comparing two-sample survival data. Although the log-rank test is usually suitable for this purpose, it may result in substantial power loss when the two groups have nonproportional hazards. In a more general class of survival models of Yang and Prentice (Biometrika 92:1–17, 2005), which includes the log-rank test as a special case, we improve model efficiency by incorporating auxiliary covariates that are correlated with the survival times. In a model-free form, we augment the estimating equation with auxiliary covariates, and establish the efficiency improvement using the semiparametric theories in Zhang et al. (Biometrics 64:707–715, 2008) and Lu and Tsiatis (Biometrics, 95:674–679, 2008). Under minimal assumptions, our approach produces an unbiased, asymptotically normal estimator with additional efficiency gain. Simulation studies and an application to a leukemia study show the satisfactory performance of the proposed method. 相似文献

8.

The effect of infrequent trading on detecting price jumps

Frowin C. Schulz Karl Mosler 《AStA Advances in Statistical Analysis》2011,95(1):27-58

The subject of the present study is to analyze how accurately an elaborated price jump detection methodology by Barndorff-Nielsen and Shephard (J. Financ. Econom. 2:1–37, 2004a; 4:1–30, 2006) applies to financial time series characterized by less frequent trading. In this context, it is of primary interest to understand the impact of infrequent trading on two test statistics, applicable to disentangle contributions from price jumps to realized variance. In a simulation study, evidence is found that infrequent trading induces a sizable distortion of the test statistics towards overrejection. A new empirical investigation using high frequency information of the most heavily traded electricity forward contract of the Nord Pool Energy Exchange corroborates the evidence of the simulation. In line with the theory, a “zero-return-adjusted estimation” is introduced to reduce the bias in the test statistics, both illustrated in the simulation study and empirical case. 相似文献

9.

Classification using distance nearest neighbours

N. Friel A. N. Pettitt 《Statistics and Computing》2011,21(3):431-437

This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away. Our approach builds on previous work by Holmes and Adams (J. R. Stat. Soc. Ser. B 64:295–306, 2002; Biometrika 90:99–112, 2003) and Cucala et al. (J. Am. Stat. Assoc. 104:263–273, 2009). Our work shares many of the advantages of these approaches in providing a probabilistic basis for the statistical inference. In comparison to previous work, we present a more efficient computational algorithm to overcome the intractability of the Markov random field model. The results of our algorithm are encouraging in comparison to the k-nearest neighbour algorithm. 相似文献

10.

Precedence tests and Lehmann alternatives

Paul van der Laan Subha Chakraborti 《Statistical Papers》2001,42(3):301-312

G = F ^k (k > 1); G = 1 − (1−F)^k (k < 1); G = F ^k (k < 1); and G = 1 − (1−F)^k (k > 1), where F and G are two continuous cumulative distribution functions. If an optimal precedence test (one with the maximal power) is determined for one of these four classes, the optimal tests for the other classes of alternatives can be derived. Application of this is given using the results of Lin and Sukhatme (1992) who derived the best precedence test for testing the null hypothesis that the lifetimes of two types of items on test have the same distibution. The test has maximum power for fixed κ in the class of alternatives G = 1 − (1−F)^k, with k < 1. Best precedence tests for the other three classes of Lehmann-type alternatives are derived using their results. Finally, a comparison of precedence tests with Wilcoxon's two-sample test is presented. Received: February 22, 1999; revised version: June 7, 2000 相似文献

11.

Pooling multivariate data under W, LR and LM tests

B. M. Golam Kibria A. K. Ms. E. Saleh 《Statistical Papers》2006,47(1):49-68

Two independent random samples are drawn from two multivariate normal populations with mean vectors μ1 and μ2 and a common variance-covariance matrix Σ. Ahmed and Saleh (1990) considered preliminary test maximum likelihood estimator (PMLTE) for estimating μ1 based on the Hotelling's T _N ², when it is suspected that μ1=μ2. In this paper, the PTMLE based on the Wald (W), Likelihood Ratio (LR) and Lagrangian Multiplier (LM) tests are considered. Using the quadratic risk function, the conditions of superiority of the proposed estimator for departure parameter are derived. A max-min rule for the size of the preliminary test of significance is presented. It is demonstrated that the PTMLE based on W test produces the highest minimum guaranteed efficiencies compared to UMLE among the three test procedures. 相似文献

12.

Estimation of the variance when kurtosis is known

Eshetu Wencheko Honest W. Chipoyera 《Statistical Papers》2009,50(3):455-464

The unbiased estimator of a population variance σ², S ² has traditionally been overemphasized, regardless of sample size. In this paper, alternative estimators of population variance are developed. These estimators are biased and have the minimum possible mean-squared error [and we define them as the “minimum mean-squared error biased estimators” (MBBE)]. The comparative merit of these estimators over the unbiased estimator is explored using relative efficiency (RE) (a ratio of mean-squared error values). It is found that, across all population distributions investigated, the RE of the MBBE is much higher for small samples and progressively diminishes to 1 with increasing sample size. The paper gives two applications involving the normal and exponential distributions. 相似文献

13.

Testing multivariate uniformity: The distance‐to‐boundary method

Jos R. Berrendero Antonio Cuevas Francisco Vjoszquez‐grande 《Revue canadienne de statistique》2006,34(4):693-707

Given a random sample taken on a compact domain S ? ^d, the authors propose a new method for testing the hypothesis of uniformity of the underlying distribution. The test statistic is based on the distance of every observation to the boundary of S. The proposed test has a number of interesting properties. In particular, it is feasible and particularly suitable for high dimensional data; it is distribution free for a wide range of choices of 5; it can be adapted to the case that the support of S is unknown; and it also allows for one‐sided versions. Moreover, the results suggest that, in some cases, this procedure does not suffer from the well‐known curse of dimensionality. The authors study the properties of this test from both a theoretical and practical point of view. In particular, an extensive Monte Carlo simulation study allows them to compare their methods with some alternative procedures. They conclude that the proposed test provides quite a satisfactory balance between power, computational simplicity, and adaptability to different dimensions and supports. 相似文献

14.

The structured elastic net for quantile regression and?support vector classification

Martin Slawski 《Statistics and Computing》2012,22(1):153-168

In view of its ongoing importance for a variety of practical applications, feature selection via ℓ ₁-regularization methods like the lasso has been subject to extensive theoretical as well empirical investigations. Despite its popularity, mere ℓ ₁-regularization has been criticized for being inadequate or ineffective, notably in situations in which additional structural knowledge about the predictors should be taken into account. This has stimulated the development of either systematically different regularization methods or double regularization approaches which combine ℓ ₁-regularization with a second kind of regularization designed to capture additional problem-specific structure. One instance thereof is the ‘structured elastic net’, a generalization of the proposal in Zou and Hastie (J. R. Stat. Soc. Ser. B 67:301–320, 2005), studied in Slawski et al. (Ann. Appl. Stat. 4(2):1056–1080, 2010) for the class of generalized linear models. 相似文献

15.

Using recursive algorithms for the efficient identification of smoothing spline ANOVA models

Marco Ratto Andrea Pagano 《AStA Advances in Statistical Analysis》2010,94(4):367-388

In this paper we present a unified discussion of different approaches to the identification of smoothing spline analysis of variance (ANOVA) models: (i) the “classical” approach (in the line of Wahba in Spline Models for Observational Data, 1990; Gu in Smoothing Spline ANOVA Models, 2002; Storlie et al. in Stat. Sin., 2011) and (ii) the State-Dependent Regression (SDR) approach of Young in Nonlinear Dynamics and Statistics (2001). The latter is a nonparametric approach which is very similar to smoothing splines and kernel regression methods, but based on recursive filtering and smoothing estimation (the Kalman filter combined with fixed interval smoothing). We will show that SDR can be effectively combined with the “classical” approach to obtain a more accurate and efficient estimation of smoothing spline ANOVA models to be applied for emulation purposes. We will also show that such an approach can compare favorably with kriging. 相似文献

16.

ML estimation for factor analysis: EM or non-EM?

J.-H. Zhao Philip L. H. Yu Qibao Jiang 《Statistics and Computing》2008,18(2):109-123

To obtain maximum likelihood (ML) estimation in factor analysis (FA), we propose in this paper a novel and fast conditional maximization (CM) algorithm, which has quadratic and monotone convergence, consisting of a sequence of CM log-likelihood (CML) steps. The main contribution of this algorithm is that the closed form expression for the parameter to be updated in each step can be obtained explicitly, without resorting to any numerical optimization methods. In addition, a new ECME algorithm similar to Liu’s (Biometrika 81, 633–648, 1994) one is obtained as a by-product, which turns out to be very close to the simple iteration algorithm proposed by Lawley (Proc. R. Soc. Edinb. 60, 64–82, 1940) but our algorithm is guaranteed to increase log-likelihood at every iteration and hence to converge. Both algorithms inherit the simplicity and stability of EM but their convergence behaviors are much different as revealed in our extensive simulations: (1) In most situations, ECME and EM perform similarly; (2) CM outperforms EM and ECME substantially in all situations, no matter assessed by the CPU time or the number of iterations. Especially for the case close to the well known Heywood case, it accelerates EM by factors of around 100 or more. Also, CM is much more insensitive to the choice of starting values than EM and ECME. 相似文献

17.

Bias correction for the regression-based LM fractional integration test

Matei Demetrescu Adina I. Tarcolea 《AStA Advances in Statistical Analysis》2008,92(1):91-99

This paper examines the finite-sample behavior of the Lagrange Multiplier (LM) test for fractional integration proposed by Breitung and Hassler (J. Econom. 110:167–185, 2002). We find by extensive Monte Carlo simulations that size distortions can be quite large in small samples. These are caused by a finite-sample bias towards the alternative. Analytic expressions for this bias are derived, based on which the test can easily be corrected. 相似文献

18.

Causal Proportional Hazards Models and Time-constant Exposure in Randomized Clinical Trials

Loeys T Goetghebeur E Vandebosch A 《Lifetime data analysis》2005,11(4):435-449

The last decade saw enormous progress in the development of causal inference tools to account for noncompliance in randomized clinical trials. With survival outcomes, structural accelerated failure time (SAFT) models enable causal estimation of effects of observed treatments without making direct assumptions on the compliance selection mechanism. The traditional proportional hazards model has however rarely been used for causal inference. The estimator proposed by Loeys and Goetghebeur (2003, Biometrics vol. 59 pp. 100–105) is limited to the setting of all or nothing exposure. In this paper, we propose an estimation procedure for more general causal proportional hazards models linking the distribution of potential treatment-free survival times to the distribution of observed survival times via observed (time-constant) exposures. Specifically, we first build models for observed exposure-specific survival times. Next, using the proposed causal proportional hazards model, the exposure-specific survival distributions are backtransformed to their treatment-free counterparts, to obtain – after proper mixing – the unconditional treatment-free survival distribution. Estimation of the parameter(s) in the causal model is then based on minimizing a test statistic for equality in backtransformed survival distributions between randomized arms. 相似文献

19.

Sparse conformal predictors

Mohamed Hebiri 《Statistics and Computing》2010,20(2):253-266

Conformal predictors, introduced by Vovk et al. (Algorithmic Learning in a Random World, Springer, New York, 2005), serve to build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. We propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only few of the covariates have significant influence on the response variable even if the total number of covariates is very large. Our approach is based on combining the principle of conformal prediction with the ℓ ₁ penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter ε>0 and has a coverage probability larger than or equal to 1−ε. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated and real data. 相似文献

20.

A new functional statistic for multivariate normality

Chun-Chao Wang Yi-Ting Hwang 《Statistics and Computing》2011,21(4):501-509

The test statistics of assessing multivariate normality based on Roy’s union-intersection principle (Roy, Some Aspects of Multivariate Analysis, Wiley, New York, 1953) are generalizations of univariate normality, and are formed as the optimal value of a nonlinear multivariate function. Due to the difficulty of solving multivariate optimization problems, researchers have proposed various approximations. However, this paper shows that the (nearly) global solution contrarily results in unsatisfactory power performance in Monte Carlo simulations. Thus, instead of searching for a true optimal solution, this study proposes a functional statistic constructed by the q% quantile of the objective function values. A comparative Monte Carlo analysis shows that the proposed method is superior to two highly recommended tests when detecting widely-selected alternatives that characterize the various properties of multivariate normality. 相似文献