Similar documents
1.
The computation of rectangular probabilities of multivariate discrete integer distributions such as the multinomial, multivariate hypergeometric or multivariate Pólya distributions is of great interest both for statistical applications and for probabilistic modeling purposes. All these distributions are members of a broader family of multivariate discrete integer distributions for which computationally efficient approximate methods have been proposed for evaluating such probabilities, but with no control over their accuracy. Recently, exact algorithms have been proposed, but they are either dedicated to a specific distribution or to very specific rectangular probabilities. We propose a new algorithm that computes arbitrary rectangular probabilities in the most general case. Its accuracy matches or even outperforms that of exact algorithms once rounding errors are taken into account. In the worst case its computational cost equals that of the most efficient exact method published so far, and in many situations of interest it is much lower. It needs no storage beyond that required for the parameters of the distribution, so large-dimension/large-counting-parameter applications can be handled at no extra memory cost and within acceptable computation time, a major difference with respect to previously published methods.
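The abstract does not spell out the proposed algorithm, so as a point of reference here is a naive dynamic-programming evaluation of a rectangular multinomial probability. This is a minimal sketch (all names are illustrative), not the paper's method:

```python
import math

def multinomial_rect_prob(n, p, lower, upper):
    """Naive DP for P(lower[i] <= X[i] <= upper[i]) with X ~ Multinomial(n, p).

    f[s] holds the sum, over the categories processed so far, of
    prod_i p[i]**x[i] / x[i]! subject to sum(x) == s and the box bounds;
    the answer is n! * f[n]. Cost is O(k * n * max box width).
    """
    f = [0.0] * (n + 1)
    f[0] = 1.0
    for pi, lo, hi in zip(p, lower, upper):
        g = [0.0] * (n + 1)
        for s, fs in enumerate(f):
            if fs == 0.0:
                continue
            for x in range(lo, min(hi, n - s) + 1):
                g[s + x] += fs * pi ** x / math.factorial(x)
        f = g
    return math.factorial(n) * f[n]

# P(0 <= X1 <= 2, 1 <= X2 <= 3, 0 <= X3 <= 4) for X ~ Multinomial(5, (0.2, 0.3, 0.5))
print(multinomial_rect_prob(5, (0.2, 0.3, 0.5), (0, 1, 0), (2, 3, 4)))
```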

2.
Conventional computations take real numbers as input and produce real numbers as results, with no indication of their accuracy. Interval analysis instead uses interval elements throughout the computation and produces intervals as output, with the guarantee that the true results are contained in them. One major use of interval analysis in statistics is computing high-dimensional multivariate probabilities. By decreasing the length of the intervals that contain the theoretically true answers, we can obtain results to arbitrary accuracy, as demonstrated by multivariate normal and multivariate t integrations. This is an advantage over the approximation methods currently in use. Since interval analysis is more computationally intensive than traditional computing, a MasPar parallel computer is used in this research to improve performance.
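To make the enclosure guarantee concrete, here is a toy sketch of interval arithmetic in Python with outward rounding via math.nextafter (the class and names are illustrative; the paper's MasPar implementation is of course far more elaborate):

```python
import math

class Interval:
    """Toy interval type: every operation rounds its bounds outward
    (math.nextafter, Python 3.9+), so the exact real result of combining
    the stored endpoints is guaranteed to lie inside the output interval."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(corners), -math.inf),
                        math.nextafter(max(corners), math.inf))

    def __repr__(self):
        return f"[{self.lo!r}, {self.hi!r}]"

x = Interval(0.1)
print(x + x + x)  # a tight interval certified to contain the exact sum
```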

3.
This work is devoted to robust principal component analysis (PCA). We compare several multivariate estimators of location and scatter by computing the influence functions of the sensitivity coefficient ρ corresponding to these estimators, and the mean squared error (MSE) of estimators of ρ. The coefficient ρ measures the closeness between the subspaces spanned by the initial eigenvectors and their counterparts derived from an infinitesimal perturbation of the data distribution.

4.
Parallel multivariate slice sampling (total citations: 2; self-citations: 0; by others: 2)
Slice sampling provides an easily implemented method for constructing a Markov chain Monte Carlo (MCMC) algorithm. However, slice sampling has two major drawbacks: (i) it requires repeated likelihood evaluations for each update, which can make it impractical when evaluations are expensive or when the number of evaluations grows (geometrically) with the dimension of the slice sampler, and (ii) since multivariate updates can be challenging to construct, updates are typically univariate, which often results in slow-mixing samplers. We propose an approach to multivariate slice sampling that naturally lends itself to a parallel implementation. Our approach takes advantage of recent advances in computer architectures; for instance, the newest generation of graphics cards can execute roughly 30,000 threads simultaneously. We demonstrate that it is possible to construct a multivariate slice sampler that has good mixing properties and is efficient in terms of computing time. The contributions of this article are therefore twofold: we study approaches for constructing a multivariate slice sampler, and we show how parallel computing can be useful for making MCMC algorithms computationally efficient. We study various implementations of our algorithm in the context of real and simulated data.
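For readers unfamiliar with the building block, a univariate slice sampler with stepping-out and shrinkage (in the style of Neal, 2003) can be sketched as follows; the article's parallel multivariate construction is not reproduced here:

```python
import math, random

def slice_sample(logf, x0, w=1.0, n_samples=1000, rng=random):
    """Univariate slice sampler with stepping-out and shrinkage.

    logf: log of an unnormalized target density; x0: starting point;
    w: initial bracket width. Each update needs repeated logf evaluations,
    which is the cost the article's parallel scheme aims to amortize.
    """
    samples, x = [], x0
    for _ in range(n_samples):
        logy = logf(x) + math.log(rng.random())  # slice level under the density
        lo = x - w * rng.random()                # randomly placed bracket
        hi = lo + w
        while logf(lo) > logy:                   # step out to the left
            lo -= w
        while logf(hi) > logy:                   # step out to the right
            hi += w
        while True:                              # shrink until a point is accepted
            x1 = lo + (hi - lo) * rng.random()
            if logf(x1) > logy:
                x = x1
                break
            if x1 < x:
                lo = x1
            else:
                hi = x1
        samples.append(x)
    return samples

draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0)  # standard normal target
```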

5.
Influence functions are derived for the parameters in covariance structure analysis, where the parameters are estimated by minimizing a discrepancy function between the assumed covariance matrix and the sample covariance matrix. The case of confirmatory factor analysis is studied in detail with a numerical example. Compared with a general procedure called one-step estimation, the proposed procedure has two advantages: (1) its computing cost is lower, and (2) the property that an arbitrary influence can be decomposed into a finite number of components, discussed by Tanaka and Castano-Tostado (1990), can be used for efficient computing and for characterizing a covariance structure model from the sensitivity perspective. A numerical comparison is made between confirmatory factor analysis and some procedures of exploratory factor analysis using the decomposition mentioned above.

6.
Recent developments in multivariate smoothing methods provide a rich collection of feasible models for nonparametric multivariate data analysis. Among the most interpretable are those with smoothed additive terms. The literature in this area has mainly been concerned with constructing methods and algorithms for computing the models. Fewer results are available on validating the computed fit; instead, many applications of nonparametric methods end up computing and comparing the generalized validation error or related indexes. This article reviews the behaviour of some of the best-known multivariate nonparametric methods, based on subset selection and on projection, when (exact) collinearity or multicollinearity (near collinearity) is present in the input matrix. It shows the possible aliasing effects in the computed fits of some selection methods and explores the properties of the projection spaces reached by projection methods, in order to help data analysts select the best model in the case of ill-conditioned input matrices. Two simulation studies and a real data set application further illustrate the effects of collinearity and multicollinearity on the fit.
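As a quick illustration of detecting the ill conditioning discussed here (a simple diagnostic sketch, not the article's simulation studies; names are illustrative), one can inspect the condition number and variance inflation factors of the input matrix:

```python
import numpy as np

def collinearity_diagnostics(X):
    """Condition number and variance inflation factors (VIFs) of X.

    Large condition numbers (say > 30) or VIFs (say > 10) signal the
    near-collinearity that distorts subset-selection fits."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the columns
    cond = np.linalg.cond(Xc)
    R = np.corrcoef(Xc, rowvar=False)
    vif = np.diag(np.linalg.inv(R))            # VIF_j = 1 / (1 - R_j^2)
    return cond, vif

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1,                       # x2 is nearly a copy of x1
                     x1 + 1e-3 * rng.normal(size=100),
                     rng.normal(size=100)])
print(collinearity_diagnostics(X))             # huge condition number and VIFs
```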

7.
The aim of this article is to assess and compare several statistical methods for supervised classification of hyperspectral images using only the spectral dimension. Since hyperspectral profiles may be viewed either as random vectors or as random curves, we confront various multivariate discrimination procedures with functional alternatives. Eight methods representing three important statistical communities (mixture models, machine learning and functional data analysis) were applied to three hyperspectral datasets following three protocols that study the influence of the size and composition of the learning sample, with or without noisy labels. Besides this comparative study, this work proposes a functional extension of the multinomial logit model as well as a fast-computing adaptation of nonparametric functional discrimination. As a by-product, it provides a useful comprehensive bibliography and supplemental material especially oriented towards practitioners.

8.
The first step in statistical analysis is parameter estimation. In multivariate analysis, one parameter of interest to be estimated is the mean vector. It is usually assumed that the data come from a multivariate normal distribution, in which case the maximum likelihood estimator (MLE), the sample mean vector, is the best estimator. However, when outliers exist in the data, the sample mean vector gives poor estimates, and estimators that are robust to outliers should be used instead. The most popular robust multivariate estimator of the mean vector is the S-estimator, which has desirable properties; computing it, however, requires a robust estimate of the mean vector as a starting point. Usually the minimum volume ellipsoid (MVE) is used as that starting point. For high-dimensional data, computing the MVE takes too much time; in some cases so much that existing computers cannot perform the computation. Besides the computation time, the MVE method is also imprecise for high-dimensional data sets. In this paper, a robust starting point for the S-estimator based on robust clustering is proposed, which can be used for estimating the mean vector of high-dimensional data. The performance of the proposed estimator in the presence of outliers is studied, and the results indicate that it performs precisely and much better than some existing robust estimators for high-dimensional data.
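The clustering-based starting point itself cannot be reconstructed from the abstract; the following toy sketch merely illustrates why a robust start matters, using a coordinate-wise median (cheap, though not affine equivariant) against a contaminated sample:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 50, 200
clean = rng.normal(size=(n, p))                   # true mean vector is 0
outliers = rng.normal(loc=10.0, size=(20, p))     # ~9% contamination
X = np.vstack([clean, outliers])

# The sample mean is pulled far from the true mean by the outliers,
# while the coordinate-wise median stays much closer to 0.
print(np.linalg.norm(X.mean(axis=0)))             # large
print(np.linalg.norm(np.median(X, axis=0)))       # small
```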

9.
Various methods for estimating the parameters of the simple harmonic curve, and corresponding statistics for testing the significance of the sinusoidal trend, are investigated. The locally reasonable method is almost fully efficient when the size of the trend is very small; in general, however, the maximum likelihood method is preferred, especially when the trend is not very small. The log-likelihood ratio test is more powerful than the R test, which is based on locally reasonable estimates. The efficient method and the log-likelihood ratio or equivalent tests are the best statistical techniques for identifying the cyclical trend, and are thus the methods of choice when adequate computing facilities are available.
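For a known frequency the harmonic model is linear in its coefficients, so the ML estimates under i.i.d. normal errors reduce to least squares. A minimal sketch (the paper's R test and locally reasonable estimates are not reproduced):

```python
import numpy as np

def fit_harmonic(t, y, omega):
    """Least-squares fit of y ~ mu + A*cos(omega*t) + B*sin(omega*t).

    With omega known, the model is linear in (mu, A, B), so ordinary
    least squares gives the ML estimates under i.i.d. normal errors."""
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mu, A, B), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = np.hypot(A, B)  # size of the sinusoidal trend
    return mu, A, B, amplitude

t = np.arange(100.0)
y = 1.0 + 0.5 * np.cos(0.3 * t - 1.0) + np.random.default_rng(2).normal(0, 0.2, 100)
print(fit_harmonic(t, y, 0.3))  # recovers mu ~ 1 and amplitude ~ 0.5
```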

10.
We describe methods to detect influential observations in a sample of pre-shapes when the underlying distribution is assumed to be complex Bingham. One of these methods is based on Cook's distance, derived from the likelihood of the complex Bingham distribution. Another method works in the tangent space and is based on local influence for the multivariate normal distribution. A method to detect outliers is also explained. The application of the methods is illustrated on both a real dataset and a simulated sample.

11.
Local Influence in Generalized Estimating Equations (total citations: 1; self-citations: 0; by others: 1)
We investigate the influence of subjects or observations on the regression coefficients of generalized estimating equations (GEEs) using local influence. The GEE approach does not require the full multivariate distribution of the response vector. We extend the likelihood displacement to a quasi-likelihood displacement, and propose local influence diagnostics under several perturbation schemes. An illustrative example in GEEs is given, and we compare the results from the local influence and deletion methods.

12.
Estimating the parameters of a stochastic volatility (SV) model is a challenging task. Among other estimation methods and approaches, efficient simulation methods based on importance sampling have been developed for Monte Carlo maximum likelihood estimation of univariate SV models. This paper shows that importance sampling methods can be used in a general multivariate SV setting, and that they are computationally efficient. To illustrate the versatility of this approach, three different multivariate stochastic volatility models are estimated for a standard data set, and the empirical results are compared to those from earlier studies in the literature. Monte Carlo simulation experiments, based on parameter estimates from the standard data set, are used to show the effectiveness of the importance sampling methods.
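The general device is easy to state: estimate an intractable likelihood by averaging importance weights over draws of the latent variables. A generic, numerically stable sketch follows, checked on a toy latent-variable model (the SV-specific importance densities of the paper are not constructed here):

```python
import numpy as np

def is_loglik(draws, log_joint, log_proposal):
    """Importance-sampling estimate of log p(y) = log E_g[ p(y, h) / g(h) ].

    draws: samples of the latent h from a proposal density g;
    log_joint(h): log p(y, h); log_proposal(h): log g(h).
    Uses the log-sum-exp trick for numerical stability."""
    logw = np.array([log_joint(h) - log_proposal(h) for h in draws])
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))

# Toy check: h ~ N(0, 1), y | h ~ N(h, 1), so marginally y ~ N(0, 2).
rng = np.random.default_rng(3)
y = 0.7
draws = rng.normal(0.0, np.sqrt(2.0), 5000)                    # proposal g = N(0, 2)
log_joint = lambda h: -0.5 * ((y - h) ** 2 + h ** 2) - np.log(2 * np.pi)
log_prop = lambda h: -0.25 * h ** 2 - 0.5 * np.log(4 * np.pi)
print(is_loglik(draws, log_joint, log_prop))                   # ~ log N(0.7; 0, 2)
```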

13.
First, we compare several methods for computing the ML estimators of the two-parameter beta distribution and identify the most effective one. Second, we find a simple way to characterize the sampling distribution of the ML estimators; this characterization leads to a practical way of establishing confidence intervals for them.
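As a baseline against which the compared methods can be judged, SciPy's generic numerical ML fit of the two-parameter beta (with location and scale held fixed at 0 and 1) can be sketched as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
data = rng.beta(2.0, 5.0, size=500)

# ML estimation of the two shape parameters; floc/fscale pin down the
# support to (0, 1) so only the two-parameter beta is fitted.
a_hat, b_hat, _, _ = stats.beta.fit(data, floc=0, fscale=1)
print(a_hat, b_hat)  # close to the true (2, 5)
```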

14.
Models with many parameters (i.e., hundreds or thousands) often behave as if they depended on only a few, with the rest having comparatively little influence. One challenge of sensitivity analysis with such models is screening the parameters to identify the influential ones, and then characterizing their influences.

Large models often require significant computing resources to evaluate their output, so a good screening mechanism should be efficient: it should minimize the number of times the model must be exercised. This paper describes an efficient procedure for performing sensitivity analysis on deterministic models with specified ranges or probability distributions for each parameter.

It is based on repeatedly exercising the model, which can be treated as a black box. Statistical checks can ensure that the screening has identified the parameters that account for the bulk of the model variation. Subsequent sensitivity analysis can use the screening information to reduce the investment required to characterize the influence of the influential and other parameters.

The procedure exploits simplifications in the dependence of the model output on the model inputs. It works best where a small number of parameters are much more influential than all the rest; the method is far more sensitive to the number of influential parameters than to the total number of parameters. It is most effective when linear or quadratic effects dominate higher-order effects and complex interactions.

The paper presents a set of Mathematica functions that can be used to create a variety of experimental designs useful for sensitivity analysis, including simple random, Latin hypercube and fractional factorial sampling. Each sampling method can use discretization, folding, grouping and replication to create composite designs. These techniques have been combined in a composite approach called Iterated Fractional Factorial Design (IFFD).

The procedure is applied to a model of nuclear fuel waste disposal, and to simplified example models to demonstrate the concepts involved.
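Of the sampling designs mentioned, Latin hypercube sampling is the simplest to sketch; the version below is a basic LHS on the unit cube (no discretization, folding, grouping or replication, and not the paper's Mathematica code):

```python
import numpy as np

def latin_hypercube(n, k, seed=None):
    """n points in [0, 1)^k: in each dimension, each of the n
    equal-probability strata is sampled exactly once."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, k))
    for j in range(k):
        # permute the strata, then jitter uniformly within each stratum
        X[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return X

design = latin_hypercube(8, 3, seed=0)  # 8 model runs over 3 parameters
```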

15.
We give a general procedure to characterize multivariate distributions by using products of the hazard gradient and mean residual life components. This procedure is applied to characterize multivariate distributions such as the Gumbel exponential, Lomax, Burr, Pareto and generalized Pareto multivariate distributions. Our results extend those of several authors and can be used to study how univariate models extend to the multivariate setting.
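For reference, the two ingredients named in this abstract are usually defined as follows (standard definitions; the notation here is illustrative):

```latex
% Multivariate survival function, hazard gradient, and mean residual life.
\bar F(x) = P(X_1 > x_1, \ldots, X_n > x_n), \qquad
h_i(x) = -\frac{\partial}{\partial x_i} \log \bar F(x), \qquad
m_i(x) = \mathbb{E}\left[ X_i - x_i \mid X_1 > x_1, \ldots, X_n > x_n \right].
```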

16.
Some simple methods for estimating mixed multivariate autoregressive moving average time series models are introduced. The methods require fitting a long autoregression to the data and computing consistent initial estimates of the model parameters. After these preliminaries, the estimators of the paper are obtained by applying weighted least squares to a multivariate auxiliary regression model. Two types of weight matrices are considered; both yield estimators that are strongly consistent and asymptotically normally distributed. The first estimators are also asymptotically efficient, while the second are not fully efficient but are computationally simple. A simulation study illustrates the behaviour of the estimators in finite samples.
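Only the first stage of this scheme, fitting a long autoregression by ordinary least squares, is easy to sketch from the abstract; the weighted least-squares second stage is model specific and not reproduced:

```python
import numpy as np

def fit_long_var(Y, p):
    """OLS fit of a VAR(p) to a (T x k) series Y: the 'long autoregression'
    first stage. Returns the stacked coefficient matrices (k*p x k) and
    the residuals that a second estimation stage would use."""
    T, k = Y.shape
    # regressor block: [Y_{t-1}, ..., Y_{t-p}] for t = p, ..., T-1
    Z = np.column_stack([Y[p - j - 1:T - j - 1] for j in range(p)])
    A, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
    resid = Y[p:] - Z @ A
    return A, resid

Y = np.random.default_rng(5).normal(size=(500, 2))
A, resid = fit_long_var(Y, p=10)
```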

17.
There has been much recent work on Bayesian approaches to survival analysis, incorporating features such as flexible baseline hazards, time-dependent covariate effects, and random effects. Some of the proposed methods are quite complicated to implement, and we argue that results as good or better can be obtained via simpler methods. In particular, the normal approximation to the log-gamma distribution yields easy and efficient computational methods under simple multivariate normal priors for baseline log-hazards and time-dependent covariate effects. While the basic method applies to piecewise-constant hazards and covariate effects, importance sampling is easily applied to consider smoother functions.
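The approximation at the heart of the method is standard: if G ~ Gamma(a, 1), then log G has mean ψ(a) (digamma) and variance ψ′(a) (trigamma) and is roughly normal for moderate a. A quick simulation check (a toy verification, not the survival-model implementation):

```python
import numpy as np
from scipy import special, stats

a = 5.0
g = stats.gamma.rvs(a, size=100_000, random_state=0)

# Empirical mean/variance of log G versus digamma(a) and trigamma(a).
print(np.log(g).mean(), special.digamma(a))      # approximately equal
print(np.log(g).var(), special.polygamma(1, a))  # approximately equal
```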

18.
ADE-4: a multivariate analysis and graphical display software (total citations: 59; self-citations: 0; by others: 59)
We present ADE-4, a multivariate analysis and graphical display software. Multivariate analysis methods available in ADE-4 include the usual one-table methods such as principal component analysis and correspondence analysis; spatial data analysis methods (using a total variance decomposition into local and global components, analogous to the Moran and Geary indices); discriminant analysis and within/between-groups analyses; many linear regression methods including lowess and polynomial regression, multiple and PLS (partial least squares) regression and orthogonal regression (principal component regression); projection methods such as principal component analysis on instrumental variables, canonical correspondence analysis and many other variants; coinertia analysis and the RLQ method; and several three-way table (k-table) analysis methods. Graphical display techniques include an automatic collection of elementary graphics corresponding to groups of rows or to columns in the data table, providing a very efficient way of producing automatic k-table graphics, as well as geographical mapping options. A dynamic graphics module allows interactive operations such as searching, zooming, selecting points, and displaying data values on factor maps. The user interface is simple and consistent across all the programs, which makes ADE-4 very easy to use for non-specialists in statistics, data analysis or computer science.

19.
Multiple regression diagnostic methods have recently been developed to help data analysts identify failures of data to adhere to the assumptions that customarily accompany regression models. However, the mathematical development of regression diagnostics has not generally led to efficient computing formulas, and conflicting terminology and the use of closely related but subtly different statistics have caused confusion. This article attempts to make regression diagnostics more readily available to those who compute regressions with packaged statistics programs. We review regression diagnostic methodology, highlighting ambiguities of terminology and relationships among similar methods; we present new formulas for efficient computation of regression diagnostics; and finally, we offer specific advice on obtaining regression diagnostics from existing statistics programs, with examples drawn from Minitab and SAS.
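In the spirit of the efficient formulas advocated here, leverages, internally studentized residuals and Cook's distances all fall out of a single hat-matrix computation. A sketch in plain NumPy (illustrative names, not Minitab or SAS output):

```python
import numpy as np

def regression_diagnostics(X, y):
    """Standard diagnostics for the linear model y = X b + e.

    Returns leverages h_i, internally studentized residuals r_i,
    and Cook's distances D_i = r_i^2 * h_i / (p * (1 - h_i))."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
    h = np.diag(H)                         # leverages
    e = y - H @ y                          # residuals
    s2 = e @ e / (n - p)                   # residual variance estimate
    r = e / np.sqrt(s2 * (1 - h))          # internally studentized residuals
    cooks = r ** 2 * h / ((1 - h) * p)     # Cook's distances
    return h, r, cooks

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, 50)
h, r, cooks = regression_diagnostics(X, y)
```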
