Similar Documents (20 results)
1.
There exist primarily three types of algorithms in the literature for computing nonparametric maximum likelihood estimates (NPMLEs) of mixing distributions: EM-type algorithms, vertex direction algorithms such as VDM and VEM, and algorithms based on general constrained optimization techniques such as the projected gradient method. The projected gradient algorithm is known to stagnate during iterations; when stagnation occurs, VDM steps must be added. We argue that this abrupt switch to VDM steps can significantly reduce the efficiency of the projected gradient algorithm and is usually unnecessary. In this paper, we define a family of partially projected directions, which can be regarded as hybrids of ordinary projected gradient directions and VDM directions. Based on these directions, four new algorithms are proposed for computing NPMLEs of mixing distributions. The properties of the algorithms are discussed and their convergence is proved. Extensive numerical simulations show that the new algorithms outperform existing methods, especially when an NPMLE has a large number of support points or when high accuracy is required.
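The directions at issue can be made concrete on a simple instance. Below is a minimal sketch, assuming a normal location mixture with a fixed support grid, of one ascent step that blends a projected gradient direction with a VDM-style vertex direction; the blending rule `alpha` and all tuning constants are illustrative, not the paper's four algorithms.

```python
import numpy as np
from scipy.stats import norm

def hybrid_ascent_step(x, grid, p, alpha=0.5):
    """One ascent step for mixture weights p over a fixed support grid.

    alpha = 1 gives a pure projected gradient direction, alpha = 0 a pure
    VDM direction; intermediate values are "partially projected" blends.
    """
    dens = norm.pdf(x[:, None], loc=grid[None, :])  # f(x_i | theta_j)
    mix = dens @ p                                  # mixture density at each x_i
    grad = dens.T @ (1.0 / mix)                     # d loglik / d p_j
    proj = grad - grad.mean()                       # gradient projected onto the simplex plane
    vdm = -p.copy()
    vdm[np.argmax(grad)] += 1.0                     # direction toward the best vertex
    d = alpha * proj + (1.0 - alpha) * vdm
    loglik = lambda q: np.sum(np.log(dens @ q))
    base, t = loglik(p), 1.0
    while t > 1e-10:                                # backtracking line search on the simplex
        q = p + t * d
        if q.min() >= 0.0:
            q = q / q.sum()                         # absorb floating-point drift
            if loglik(q) > base:
                return q
        t *= 0.5
    return p
```

Iterating this step (and pruning grid points whose weight collapses to zero) gives the flavour of the hybrid schemes; the paper's contribution is choosing the blend so that stagnation is avoided without an abrupt switch to full VDM steps.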

2.
Clusters of galaxies are a useful proxy for tracing the distribution of mass in the universe. By measuring the mass of galaxy clusters on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). Finding galaxy clusters can be shown to be equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set S_c ≡ {f > c}, where f is a probability density function. Cuevas et al. (Can. J. Stat. 28:367–382, 2000; Comput. Stat. Data Anal. 36:441–459, 2001) proposed a nonparametric method that finds density contour clusters via the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computation for large datasets. We propose a more efficient clustering method based on their algorithm that uses the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering on large astronomical sky survey data.
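To make the FFT speedup concrete, here is a minimal 2-D sketch (not the authors' implementation): bin the points, smooth the binned density with a Gaussian kernel via FFT convolution, then label the connected components of the level set {f > c}. Grid size, bandwidth, and the level c are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import label

def level_set_clusters(xy, c, grid=256, bw_bins=3):
    """Connected components of {f > c} for a binned KDE of 2-D points."""
    H, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=grid, density=True)
    r = np.arange(-4 * bw_bins, 4 * bw_bins + 1)
    K = np.exp(-0.5 * (r[:, None] ** 2 + r[None, :] ** 2) / bw_bins ** 2)
    K /= K.sum()                         # mass-one smoothing kernel
    f = fftconvolve(H, K, mode="same")   # FFT-based density estimate on the grid
    labels, ncomp = label(f > c)         # density contour clusters
    return f, labels, ncomp

# two well-separated blobs -> two contour clusters at a suitable level c
rng = np.random.default_rng(0)
xy = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(6, 1, (500, 2))])
_, _, ncomp = level_set_clusters(xy, c=0.005)
```

The FFT convolution costs O(grid² log grid) regardless of the sample size, which is what makes the approach cheap on large sky-survey catalogues compared with a minimal-spanning-tree pass over all points.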

3.
Model selection is a general paradigm which includes many statistical problems. One of the most fruitful and popular approaches to carrying it out is the minimization of a penalized criterion. Birgé and Massart (Probab. Theory Relat. Fields 138:33–73, 2006) proposed a promising data-driven method, the "slope heuristics", to calibrate such criteria whose penalties are known up to a multiplicative factor. Theoretical work validates this heuristic in some situations, and several papers report promising practical behavior in various frameworks. The purpose of this work is twofold. First, we present an introduction to the slope heuristics and an overview of the theoretical and practical results about it. Second, we focus on the practical difficulties that arise when applying the slope heuristics. A new practical approach is carried out and compared to the standard dimension jump method. All the practical solutions discussed in this paper in different frameworks are implemented and brought together in a Matlab graphical user interface called capushe. Supplemental materials containing further information and an additional application, the capushe package, and the datasets presented in this paper are available on the journal web site.
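A toy rendition of the slope-heuristics recipe may help fix ideas. This is the direct slope-estimation variant (the dimension-jump method is the comparison point in the paper), and the fraction of "large" models used in the fit is an illustrative choice.

```python
import numpy as np

def slope_heuristics(dims, logliks, frac=0.4):
    """Select a model by the slope heuristics.

    The minimal-penalty slope kappa is estimated by regressing the
    empirical contrast -loglik on dimension over the most complex models,
    where the contrast is expected to be roughly linear; the selected
    model minimizes -loglik + 2 * kappa * dim.
    """
    dims = np.asarray(dims, dtype=float)
    contrast = -np.asarray(logliks, dtype=float)
    big = dims >= np.quantile(dims, 1.0 - frac)   # most complex models
    slope, _ = np.polyfit(dims[big], contrast[big], 1)
    kappa = max(-slope, 0.0)                       # contrast decreases at rate kappa
    crit = contrast + 2.0 * kappa * dims
    return int(np.argmin(crit)), kappa
```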

4.
In order to guarantee the confidentiality and privacy of firm-level data, statistical offices apply various disclosure limitation techniques. However, each anonymization technique has its protection limits, so that for some observations the probability of disclosing individual information remains non-negligible. To overcome this problem, we propose combining two separate disclosure limitation techniques, blanking and multiplication by independent noise, to protect the original dataset. The proposed approach yields a decrease in the probability of re-identifying/disclosing individual information and can be applied to linear and nonlinear regression models. We show how to combine the blanking method with the multiplicative measurement error method, and how to estimate the model by combining the multiplicative Simulation-Extrapolation (M-SIMEX) approach of Nolte (2007) with, on the one side, the Inverse Probability Weighting (IPW) approach going back to Horvitz and Thompson (J. Am. Stat. Assoc. 47:663–685, 1952), and on the other side matching methods, such as the semiparametric M-estimator proposed by Flossmann (2007), as an alternative to IPW. Based on Monte Carlo simulations, we show that multiplicative measurement error combined with blanking as a masking procedure does not necessarily lead to a severe reduction in estimation quality, provided that its effects on the data generating process are known.
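As one concrete piece of the pipeline, here is a minimal sketch of the IPW correction for blanked observations, under the assumed setting where blanking depends on always-observed covariates W; the M-SIMEX half of the procedure is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_blanking_ols(X, y, observed, W):
    """OLS on complete cases, reweighted by inverse blanking probabilities.

    observed : boolean mask, False for blanked rows.
    W        : covariates assumed to drive the blanking mechanism.
    """
    pi = LogisticRegression().fit(W, observed).predict_proba(W)[:, 1]
    w = 1.0 / np.clip(pi[observed], 1e-6, None)     # Horvitz-Thompson weights
    Xo, yo = X[observed], y[observed]
    A = Xo.T @ (Xo * w[:, None])                    # weighted normal equations
    b = Xo.T @ (yo * w)
    return np.linalg.solve(A, b)
```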

5.
This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space than on those which are further away. Our approach builds on previous work by Holmes and Adams (J. R. Stat. Soc. Ser. B 64:295–306, 2002; Biometrika 90:99–112, 2003) and Cucala et al. (J. Am. Stat. Assoc. 104:263–273, 2009), and shares many of the advantages of these approaches in providing a probabilistic basis for statistical inference. In comparison to previous work, we present a more efficient computational algorithm to overcome the intractability of the Markov random field model. The results of our algorithm are encouraging in comparison with the k-nearest neighbour algorithm.
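The flavour of the model can be stated compactly. A generic distance-weighted full conditional in the spirit of Holmes and Adams (the exact specification in the paper may differ) is

```latex
P(y_i = c \mid y_{-i}, x) \;\propto\;
\exp\!\Bigl(\beta \sum_{j \neq i} w_{ij}\,\mathbf{1}\{y_j = c\}\Bigr),
\qquad
w_{ij} \propto K\!\bigl(\lVert x_i - x_j \rVert / h\bigr),
```

so labels of nearby points in feature space carry more weight, exactly the intuition stated above; the difficulty is the intractable normalizing constant of the joint distribution, which the paper's computational algorithm targets.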

6.
To obtain maximum likelihood (ML) estimates in factor analysis (FA), we propose a novel and fast conditional maximization (CM) algorithm with quadratic and monotone convergence, consisting of a sequence of CM log-likelihood (CML) steps. The main contribution of this algorithm is that a closed-form expression for the parameter updated in each step is obtained explicitly, without resorting to any numerical optimization method. In addition, a new ECME algorithm similar to that of Liu (Biometrika 81:633–648, 1994) is obtained as a by-product; it turns out to be very close to the simple iteration algorithm proposed by Lawley (Proc. R. Soc. Edinb. 60:64–82, 1940), but our algorithm is guaranteed to increase the log-likelihood at every iteration and hence to converge. Both algorithms inherit the simplicity and stability of EM, but their convergence behaviors differ markedly, as revealed in our extensive simulations: (1) in most situations, ECME and EM perform similarly; (2) CM outperforms EM and ECME substantially in all situations, whether assessed by CPU time or by the number of iterations. In particular, for cases close to the well-known Heywood case, it accelerates EM by factors of around 100 or more. CM is also much less sensitive to the choice of starting values than EM and ECME.
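For orientation, the objective all of these algorithms maximize is the Gaussian FA log-likelihood: with loadings Λ, uniquenesses Ψ, and sample covariance S,

```latex
\Sigma(\Lambda,\Psi) = \Lambda\Lambda^{\top} + \Psi,
\qquad
\ell(\Lambda,\Psi) = -\tfrac{n}{2}\bigl(\log\lvert\Sigma\rvert
 + \operatorname{tr}(\Sigma^{-1}S)\bigr) + \mathrm{const},
```

and each CML step maximizes ℓ over one block of parameters with the others held fixed, which is what makes the closed-form updates possible.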

7.
We propose a regression method that studies covariate effects on the conditional quantiles of residual lifetimes at a given follow-up time point. This can be particularly useful in cancer studies, where more patients survive their initial cancer and a patient's residual life expectancy is used to compare the efficacy of secondary or adjuvant therapies. The new method provides a consistent estimator that often exhibits a smaller standard error in real and simulated examples than the existing method of Jung et al. (2009). It also provides a simple empirical likelihood inference method that requires neither estimating the covariance matrix of the estimator nor resampling. We apply the new method to a breast cancer study (NSABP Protocol B-04, Fisher et al. 2002) and estimate median residual lifetimes at various follow-up time points, adjusting for important prognostic factors.
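The estimand can be written down directly. The display below gives the standard definition of the τ-quantile of residual life at follow-up time t₀, together with a log-linear covariate model of the type used in this literature (stated here for orientation, not as the paper's exact specification):

```latex
\theta_{\tau}(t_0 \mid Z) = \inf\{t : P(T - t_0 > t \mid T > t_0, Z) \le 1 - \tau\},
\qquad
\log \theta_{\tau}(t_0 \mid Z) = \beta_{\tau}(t_0)^{\top} Z,
```

so β_τ(t₀) describes covariate effects on the remaining lifetime of a patient still alive at time t₀.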

8.
A Markov chain is proposed that uses the coupling-from-the-past sampling algorithm for sampling m×n contingency tables. This method extends the one proposed by Kijima and Matsui (Rand. Struct. Alg. 29:243–256, 2006). It is not polynomial, as it is based upon a recursion and includes a rejection phase, but it can be used for practical purposes on small contingency tables, as illustrated on a classical 4×4 example.

9.
While most of the literature on measurement error focuses on additive measurement error, this paper considers the multiplicative case. We apply the Simulation-Extrapolation method (SIMEX), a procedure originally proposed by Cook and Stefanski (J. Am. Stat. Assoc. 89:1314–1328, 1994) to correct the bias due to additive measurement error, to the case where data are perturbed by multiplicative noise, and present several approaches to accounting for multiplicative noise in the SIMEX procedure. Furthermore, we analyze how well these approaches reduce the bias caused by multiplicative perturbation. Using a binary probit model, we produce Monte Carlo evidence on how the loss of estimation quality due to the reduced data quality can be minimized. For helpful comments, we thank Helmut Küchenhoff, Winfried Pohlmeier, and Gerd Ronning. Sandra Nolte gratefully acknowledges financial support from the DFG. Elena Biewen and Martin Rosemann gratefully acknowledge financial support from the Federal Ministry of Education and Research (BMBF). The usual disclaimer applies.
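A compact sketch of the SIMEX loop adapted to multiplicative noise, with a plain OLS slope standing in for the paper's probit fit for brevity; the remultiplication scheme (mean-one lognormal noise whose log-variance grows with λ) is one plausible choice, not necessarily the paper's.

```python
import numpy as np

def simex_multiplicative(w, y, lambdas=(0.5, 1.0, 1.5, 2.0), B=200,
                         sigma=0.2, rng=None):
    """SIMEX slope estimate for y = b0 + b1*x + e when only w = x*u is observed.

    At each lambda, extra mean-one lognormal noise with log-variance
    lambda*sigma^2 is multiplied in, the naive slope is recomputed, and a
    quadratic in lambda is extrapolated back to lambda = -1 (no noise).
    """
    rng = np.random.default_rng(rng)
    lams, slopes = [0.0], [np.polyfit(w, y, 1)[0]]      # naive estimate at lambda = 0
    for lam in lambdas:
        b = []
        for _ in range(B):
            u = rng.lognormal(mean=-0.5 * lam * sigma**2,
                              sigma=np.sqrt(lam) * sigma, size=len(w))
            b.append(np.polyfit(w * u, y, 1)[0])
        lams.append(lam)
        slopes.append(np.mean(b))
    coef = np.polyfit(lams, slopes, 2)                  # quadratic extrapolant
    return np.polyval(coef, -1.0)                       # SIMEX estimate
```

The naive estimate sits at λ = 0; the quadratic extrapolation to λ = −1 is the standard SIMEX device of Cook and Stefanski carried over to the multiplicative setting.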

10.
A new exchange algorithm for the construction of 2^m D-optimal fractional factorial designs (FFDs) is devised. This exchange algorithm is a modification of the one due to Fedorov (1969, 1972) and improves on similar algorithms due to Mitchell (1974) and Galil & Kiefer (1980). The exchange algorithm is then used to construct 54 D-optimal 2^m FFDs of resolution V for m = 4, 5, 6.
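The basic exchange mechanics can be sketched in a few lines. This toy version recomputes the determinant for every candidate swap, whereas the actual algorithms of Fedorov, Mitchell, and Galil & Kiefer gain their speed from rank-one update formulas; the intercept-plus-main-effects model and all settings are illustrative.

```python
import numpy as np
from itertools import product

def d_optimal_exchange(n, m, sweeps=50, rng=None):
    """Greedy exchange search for an n-run D-optimal 2^m main-effects design."""
    rng = np.random.default_rng(rng)
    cand = np.array(list(product([-1.0, 1.0], repeat=m)))
    cand = np.hstack([np.ones((len(cand), 1)), cand])   # intercept + main effects
    X = cand[rng.choice(len(cand), size=n, replace=True)]
    logdet = lambda A: np.linalg.slogdet(A.T @ A)[1]
    best = logdet(X)
    for _ in range(sweeps):
        improved = False
        for i in range(n):                    # design point to drop
            for j in range(len(cand)):        # candidate point to add
                Xtry = X.copy()
                Xtry[i] = cand[j]
                val = logdet(Xtry)
                if val > best + 1e-9:
                    X, best, improved = Xtry, val, True
        if not improved:
            break
    return X, best

# e.g. a 12-run design in m = 4 factors
X, ld = d_optimal_exchange(12, 4, rng=0)
```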

11.
This article develops a new and stable estimator of the information matrix when the EM algorithm is used for maximum likelihood estimation. The estimator is constructed from the smoothed individual complete-data scores that are readily available from running the EM algorithm. The method works for dependent data sets and when the expectation step is an irregular function of the conditioning parameters. In comparison with the approach of Louis (J. R. Stat. Soc. Ser. B 44:226–233, 1982), the new estimator is more stable and easier to implement. Both real and simulated data are used to demonstrate its use.
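Up to the smoothing step, the estimator is of the familiar empirical-information form built from individual score contributions ŝᵢ evaluated at the MLE:

```latex
\widehat{I}(\hat\theta) \;=\; \sum_{i=1}^{n} \hat{s}_i\hat{s}_i^{\top}
 \;-\; \frac{1}{n}\Bigl(\sum_{i=1}^{n}\hat{s}_i\Bigr)\Bigl(\sum_{i=1}^{n}\hat{s}_i\Bigr)^{\!\top}.
```

The centering term vanishes in theory at the MLE, where the total score is zero, but stabilizes the estimate in finite samples; the paper's point is that the ŝᵢ can be read off from quantities the EM iterations already compute.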

12.
This paper focuses on some recent developments concerning nonparametric mixture distributions. It discusses nonparametric maximum likelihood estimation of the mixing distribution and emphasizes gradient-type results, especially global results and the global convergence of algorithms such as the vertex direction and vertex exchange methods. However, the NPMLE (or the algorithms constructing it) also provides an estimate of the number of components of the mixing distribution, which may be undesirable for theoretical reasons or may not be allowed by the physical interpretation of the mixture model. When the number of components is fixed in advance, the aforementioned algorithms cannot be used, and no globally convergent algorithms exist to date. Instead, the EM algorithm is often used to find maximum likelihood estimates, but in this case multiple maxima often occur. An example from a meta-analysis of vitamin A and childhood mortality is used to illustrate the considerable inferential importance of identifying the correct global likelihood. To improve the behavior of the EM algorithm, we suggest a combination of gradient function steps and EM steps to achieve global convergence, leading to the EM algorithm with gradient function update (EMGFU). This algorithm retains the number of components at exactly k and typically converges to the global maximum. The behavior of the algorithm is illustrated with several examples.
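The gradient function in question is Lindsay's directional derivative of the log-likelihood at the current mixing distribution G:

```latex
D(\theta; G) \;=\; \sum_{i=1}^{n} \frac{f(x_i \mid \theta)}{f(x_i \mid G)} - n,
\qquad
f(x \mid G) = \int f(x \mid \theta)\, dG(\theta),
```

and G is the NPMLE precisely when sup_θ D(θ; G) ≤ 0 (the general equivalence theorem). In EMGFU, a point θ* with D(θ*; G) > 0 after an EM step signals where mass should move, and exchanging it against an existing support point keeps the number of components at exactly k.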

13.
A new exchange algorithm for the construction of (M, S)-optimal incomplete block designs (IBDs) is developed. The algorithm is used to construct 973 (M, S)-optimal IBDs with parameters (v, k, b), for v = 4,…,12 varieties, arbitrary block size k, and arbitrary number of blocks b. The efficiencies of the "best" (M, S)-optimal IBDs constructed by this algorithm are compared with those of the corresponding nearly balanced incomplete block designs (NBIBDs) of Cheng (1979), Cheng & Wu (1981), and Mitchell & John (1976).
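For reference, (M, S)-optimality is a two-stage criterion on the treatment information matrix C of a design, stated here in what I take to be its standard form:

```latex
\text{first maximize } \operatorname{tr}(C),
\qquad
\text{then, among the maximizers, minimize } \operatorname{tr}(C^{2}),
```

so the first stage fixes the total information and the second pushes the nonzero eigenvalues of C toward equality.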

14.
In this paper, the problem of learning Bayesian network (BN) structures is studied via particle swarm optimization (PSO) algorithms. After analysing the flight behaviours of some classic PSO algorithms, we put forward a new PSO-based method for learning BN structures. In this method, the position of a particle is treated as an imaginary likelihood representing the extent to which the associated edges exist, the velocity as the corresponding increment or decrement of likelihood describing how the position changes during flight, and the output BN structures as appendages of the positions. The resulting algorithm, and an improved version that integrates expert knowledge, are shown to be efficient in collecting the randomly searched information from all particles. A numerical study based on two benchmark BNs shows the superiority of our algorithms in terms of precision, speed, and accuracy.
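The canonical PSO updates underlying the method are, for particle position x and velocity v with inertia ω and acceleration constants c₁, c₂,

```latex
v_{t+1} = \omega v_t + c_1 r_1 \,(p^{\mathrm{best}} - x_t)
        + c_2 r_2 \,(g^{\mathrm{best}} - x_t),
\qquad
x_{t+1} = x_t + v_{t+1},
```

with r₁, r₂ drawn uniformly on (0, 1). In the structure-learning reading above, each component of x is the "imaginary likelihood" of one directed edge, and a structure is extracted by thresholding x into an adjacency matrix (the thresholding rule itself is not specified here).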

15.
In this paper we review population-based simulation for static inference problems. Such methods generate a collection of random variables {X_n}_{n=1,…,N} in parallel in order to simulate from some target density π (or potentially a sequence of target densities). Population-based simulation is important because many challenging sampling problems in applied statistics cannot be dealt with successfully by conventional Markov chain Monte Carlo (MCMC) methods. We summarize population-based MCMC (Geyer, Computing Science and Statistics: The 23rd Symposium on the Interface, pp. 156–163, 1991; Liang and Wong, J. Am. Stat. Assoc. 96:653–666, 2001) and sequential Monte Carlo (SMC) samplers (Del Moral, Doucet and Jasra, J. R. Stat. Soc. Ser. B 68:411–436, 2006a), providing a comparison of the approaches. We give numerical examples from Bayesian mixture modelling (Richardson and Green, J. R. Stat. Soc. Ser. B 59:731–792, 1997).
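A minimal population-MCMC sketch (parallel tempering on a bimodal toy target) illustrates the idea; the temperature ladder, step size, and target are all illustrative.

```python
import numpy as np

def parallel_tempering(logpi, n_iter=5000, n_chains=6, step=0.8, rng=None):
    """Population MCMC: tempered chains with random-walk moves plus swaps."""
    rng = np.random.default_rng(rng)
    betas = np.linspace(1.0, 0.1, n_chains)      # inverse temperatures
    x = rng.normal(size=n_chains)
    out = []
    for _ in range(n_iter):
        # within-chain random-walk Metropolis at each temperature
        prop = x + step * rng.normal(size=n_chains)
        accept = np.log(rng.random(n_chains)) < betas * (logpi(prop) - logpi(x))
        x = np.where(accept, prop, x)
        # swap move between a random pair of neighbouring temperatures
        j = rng.integers(n_chains - 1)
        a = (betas[j] - betas[j + 1]) * (logpi(x[j + 1]) - logpi(x[j]))
        if np.log(rng.random()) < a:
            x[j], x[j + 1] = x[j + 1], x[j]
        out.append(x[0])                          # the beta = 1 chain targets pi
    return np.array(out)

# toy bimodal target: mixture of N(-3, 1) and N(3, 1)
logpi = lambda z: np.logaddexp(-0.5 * (z - 3) ** 2, -0.5 * (z + 3) ** 2)
samples = parallel_tempering(logpi, rng=0)
```

The chains at small β move freely between modes and feed states to the β = 1 chain through the swap moves; SMC samplers achieve a similar effect with weighted particles moved through a sequence of distributions.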

16.
We introduce multicovariate-adjusted regression (MCAR), an adjustment method for regression analysis where neither the response (Y) nor the predictors (X_1, …, X_p) are directly observed. The available data have been contaminated by unknown functions of a set of observable distorting covariates, Z_1, …, Z_s, in a multiplicative fashion. The proposed method substantially extends current contaminated-regression modelling capability by allowing for multiple distorting covariate effects. MCAR is a flexible generalisation of the recently proposed covariate-adjusted regression method, an effective adjustment method in the presence of a single covariate Z. For MCAR estimation, we establish a connection between MCAR models and adaptive varying-coefficient models. This connection leads to an adaptation of a hybrid backfitting estimation algorithm. Extensive simulations are used to study the performance and limitations of the proposed iterative estimation algorithm. In particular, the bias and mean squared error of the proposed MCAR estimators are examined relative to a baseline and a consistent benchmark estimator. The method is also illustrated with a Pima Indian diabetes data set, where the response and predictors are potentially contaminated by body mass index and triceps skin fold thickness. Both distorting covariates measure aspects of obesity, an important risk factor in type 2 diabetes.
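The distortion structure can be written out explicitly; the mean-one identifiability condition below is the standard one from covariate-adjusted regression and is an assumption of this presentation rather than a detail taken from the abstract:

```latex
\tilde{Y} = Y \prod_{r=1}^{s} \psi_r(Z_r), \qquad
\tilde{X}_j = X_j \prod_{r=1}^{s} \phi_{jr}(Z_r), \qquad
E\,\psi_r(Z_r) = E\,\phi_{jr}(Z_r) = 1,
```

so the distortions average out to no effect. Regressing Ỹ on X̃ then yields coefficients that are functions of Z, which is the varying-coefficient connection exploited for estimation.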

17.
Conformal predictors, introduced by Vovk et al. (Algorithmic Learning in a Random World, Springer, New York, 2005), serve to build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. We propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only a few of the covariates have significant influence on the response variable, even if the total number of covariates is very large. Our approach combines the principle of conformal prediction with the ℓ1-penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter ε > 0 and has coverage probability greater than or equal to 1−ε. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated and real data.
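The mechanics are easiest to see in the split-conformal variant sketched below, which is a simplification rather than the authors' exact construction: calibrate absolute LASSO residuals on held-out data and use their finite-sample (1−ε) quantile as the interval half-width. The penalty `alpha` is fixed here, whereas the paper derives a data-driven choice.

```python
import numpy as np
from sklearn.linear_model import Lasso

def split_conformal_lasso(X, y, X_new, eps=0.1, alpha=0.1, rng=None):
    """(1 - eps) prediction intervals via split conformal prediction + LASSO."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))
    tr, cal = idx[: len(y) // 2], idx[len(y) // 2 :]
    model = Lasso(alpha=alpha).fit(X[tr], y[tr])
    scores = np.sort(np.abs(y[cal] - model.predict(X[cal])))  # conformity scores
    k = int(np.ceil((1.0 - eps) * (len(cal) + 1)))            # finite-sample rank
    q = scores[min(k, len(cal)) - 1]   # half-width (infinite if k > n_cal; clipped here)
    mu = model.predict(X_new)
    return mu - q, mu + q
```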

18.
The method of tempered transitions was proposed by Neal (Stat. Comput. 6:353–366, 1996) for tackling the difficulties that arise when using Markov chain Monte Carlo to sample from multimodal distributions. In common with methods such as simulated tempering and Metropolis-coupled MCMC, the key idea is to utilise a series of successively easier-to-sample distributions to improve movement around the state space. Tempered transitions does this by incorporating moves through these less modal distributions into the MCMC proposals. Unfortunately, the improved movement between modes comes at a high computational cost, with a low acceptance rate for expensive proposals. We consider how the algorithm may be tuned to increase the acceptance rate for a given number of temperatures. We find that the commonly assumed geometric spacing of temperatures is reasonable in many, but not all, applications.
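For orientation, with target π the tempered family and the tempered-transitions acceptance probability take the form (up to indexing conventions; see Neal 1996 for the precise construction):

```latex
\pi_i(x) \propto \pi(x)^{\beta_i}, \quad 1 = \beta_0 > \beta_1 > \cdots > \beta_n,
\qquad
a = \min\Bigl\{1,\ \prod_{i=0}^{n-1}
  \frac{\pi_{i+1}(\hat{x}_i)}{\pi_i(\hat{x}_i)}
  \prod_{i=0}^{n-1}
  \frac{\pi_i(\check{x}_i)}{\pi_{i+1}(\check{x}_i)}\Bigr\},
```

where x̂_i and x̌_i are the states visited on the heating and cooling passes. The geometric spacing discussed above corresponds to taking β_i = β_n^{i/n}, so that successive ratios β_{i+1}/β_i are constant.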

19.
In this study, we consider methods for the efficient construction of A-, MV-, D- and E-optimal or near-optimal block designs for two-colour cDNA microarray experiments with the array as the block effect. Two algorithms, the array exchange and treatment exchange algorithms, are introduced alongside the complete enumeration technique. For large numbers of arrays or treatments, or both, complete enumeration is highly computer-intensive. The treatment exchange algorithm computes optimal or near-optimal designs faster than the array exchange algorithm; the two methods, however, produce optimal or near-optimal designs of the same efficiency under the four optimality criteria.
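The four criteria are standard functions of the nonzero eigenvalues λ₁ ≥ ⋯ ≥ λ_{v−1} of the treatment information matrix C:

```latex
\text{A: } \min \sum_{i=1}^{v-1} \lambda_i^{-1}, \qquad
\text{D: } \max \prod_{i=1}^{v-1} \lambda_i, \qquad
\text{E: } \max \lambda_{v-1}, \qquad
\text{MV: } \min \max_{i \neq j}
  \operatorname{Var}\bigl(\widehat{\tau}_i - \widehat{\tau}_j\bigr),
```

i.e., A-optimality minimizes the average variance of treatment contrast estimates, D-optimality minimizes the volume of the confidence ellipsoid, E-optimality guards against the worst-estimated contrast, and MV-optimality minimizes the largest pairwise-contrast variance.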

20.
We study the limiting degree distribution of the vertex splitting model introduced by David, Dukes, Jonsson, and Stefansson (Random tree growth by vertex splitting, J. Stat. Mech. Theory Exp. 2009, P04009, doi:10.1088/1742-5468/2009/04/P04009). This is a model of randomly growing ordered trees in which, at each time step, the tree is separated into two components by splitting a vertex into two, and an edge is then inserted between the two new vertices. Under some assumptions on the parameters, related to the growth of the maximal degree of the tree, we prove that the vertex degree densities converge almost surely to constants satisfying a system of equations. Using this, we are also able to strengthen and prove some previously non-rigorous results mentioned in the literature.
