Similar literature
20 similar articles found.
1.
The calculation of multivariate normal orthant probabilities is practically impossible when the number of variates is greater than five or six, except in very special cases. A transformation of the integral is derived which enables quite accurate Monte Carlo estimates to be obtained for a fairly high number of dimensions, particularly if control variates are used.
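For orientation, the naive baseline that such transformations improve on is plain Monte Carlo: sample from the multivariate normal and count how often all coordinates are positive. A minimal sketch; the equicorrelated covariance is illustrative only, and no transformation or control variate is applied.

```python
import numpy as np

def orthant_prob_mc(cov, n_draws=100_000, seed=None):
    """Plain Monte Carlo estimate of P(X_1 > 0, ..., X_k > 0) for
    X ~ N(0, cov) -- the naive baseline, without the integral
    transformation or control variates discussed above."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)            # cov = L @ L.T
    x = rng.standard_normal((n_draws, cov.shape[0])) @ L.T
    return np.mean(np.all(x > 0, axis=1))

# Illustrative equicorrelated normal in 6 dimensions
k, rho = 6, 0.5
cov = (1 - rho) * np.eye(k) + rho * np.ones((k, k))
print(orthant_prob_mc(cov, seed=0))
```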

2.
To construct an optimal estimating function by weighting a set of score functions, we must either know or estimate consistently the covariance matrix for the individual scores. In problems with high dimensional correlated data the estimated covariance matrix could be unreliable. The smallest eigenvalues of the covariance matrix will be the most important for weighting the estimating equations, but in high dimensions these will be poorly determined. Generalized estimating equations introduced the idea of a working correlation to minimize such problems. However, it can be difficult to specify the working correlation model correctly. We develop an adaptive estimating equation method which requires no working correlation assumptions. This methodology relies on finding a reliable approximation to the inverse of the variance matrix in the quasi-likelihood equations. We apply a multivariate generalization of the conjugate gradient method to find estimating equations that preserve the information well at fixed low dimensions. This approach is particularly useful when the estimator of the covariance matrix is singular or close to singular, or impossible to invert owing to its large size.
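The core numerical device, approximating V⁻¹s with a few conjugate gradient iterations instead of forming an explicit inverse, can be illustrated with a generic textbook CG sketch; this is not the authors' multivariate generalization, and V is assumed symmetric positive definite.

```python
import numpy as np

def cg_solve(V, s, n_iters=5):
    """Approximate x = V^{-1} s with a few conjugate-gradient steps,
    avoiding an explicit inverse of a large (near-singular) covariance
    matrix. Generic CG; V must be symmetric positive definite."""
    x = np.zeros_like(s, dtype=float)
    r = s - V @ x                      # residual
    p = r.copy()                       # search direction
    for _ in range(n_iters):
        Vp = V @ p
        alpha = (r @ r) / (p @ Vp)     # step length
        x = x + alpha * p
        r_new = r - alpha * Vp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x
```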

3.
Vine copulas are a flexible class of dependence models consisting of bivariate building blocks and have proven to be particularly useful in high dimensions. Classical model distance measures require multivariate integration and thus suffer from the curse of dimensionality. In this paper, we provide numerically tractable methods to measure the distance between two vine copulas even in high dimensions. For this purpose, we consecutively develop three new distance measures based on the Kullback–Leibler distance, using the result that it can be expressed as the sum over expectations of KL distances between univariate conditional densities, which can be easily obtained for vine copulas. To reduce numerical calculations, we approximate these expectations on adequately designed grids, outperforming Monte Carlo integration with respect to computational time. For the sake of interpretability, we provide a baseline calibration for the proposed distance measures. We further develop similar substitutes for the Jeffreys distance, a symmetrized version of the Kullback–Leibler distance. In numerous examples and applications, we illustrate the strengths and weaknesses of the developed distance measures.
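As a point of reference for what the grid-based substitutes improve on, the brute-force Monte Carlo estimate of the Kullback–Leibler distance between two copula densities looks as follows. Gaussian copulas stand in for vine copulas purely to keep the sketch self-contained; none of the paper's conditional-density decomposition is shown.

```python
import numpy as np
from scipy.stats import norm

def gauss_copula_logpdf(u, R):
    """Log-density of a Gaussian copula with correlation matrix R."""
    z = norm.ppf(u)
    _, logdet = np.linalg.slogdet(R)
    M = np.linalg.inv(R) - np.eye(len(R))
    quad = np.einsum('ij,jk,ik->i', z, M, z)   # z_i' M z_i for each row
    return -0.5 * logdet - 0.5 * quad

def kl_mc(R1, R2, n=50_000, seed=None):
    """Monte Carlo estimate of KL(c1 || c2): sample from copula c1 and
    average the log-density ratio. The brute-force baseline that the
    grid approximations above are designed to outperform."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(R1)
    u = norm.cdf(rng.standard_normal((n, len(R1))) @ L.T)
    return np.mean(gauss_copula_logpdf(u, R1) - gauss_copula_logpdf(u, R2))
```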

4.
This article develops a new test based on Spearman's rank correlation coefficients for total independence in high dimensions. The test is robust to non-normality and heavy tails in the data, a merit not shared by existing tests in the literature. Simulation results suggest that the new test performs well under several typical null and alternative hypotheses. In addition, we employ a real data set to illustrate the use of the new test.
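A natural statistic of this kind sums the squared pairwise Spearman correlations over all variable pairs; large values indicate departure from total independence. The sketch below is illustrative only, and the article's exact statistic and its null calibration may differ.

```python
import numpy as np
from scipy.stats import spearmanr

def total_independence_stat(X):
    """Sum of squared pairwise Spearman rank correlations for an
    (n, p) data matrix X with p > 2 columns. Rank-based, hence
    robust to non-normality and heavy tails."""
    rho, _ = spearmanr(X)              # (p, p) matrix of pairwise rhos
    iu = np.triu_indices(rho.shape[0], k=1)
    return float(np.sum(rho[iu] ** 2))
```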

5.
Two procedures are considered for estimating the concentration parameters of the Fisher matrix distribution for rotations or orientations in three dimensions. The first is maximum likelihood; we suggest the use of a convenient one-dimensional integral representation of the normalising constant, which greatly simplifies the computation. The second approach exploits the equivalence of the Fisher distribution for rotations in three dimensions and the Bingham distribution for axes in four dimensions. We describe a pseudo-likelihood procedure which works for the Bingham distribution in any dimension; this alternative approach does not require numerical integration. Results on the asymptotic efficiency of the pseudo-likelihood estimator relative to the maximum likelihood estimator are given, and the two estimators are compared in the analysis of a well-known vectorcardiography dataset.

6.
We propose a family of goodness-of-fit tests for copulas. The tests use generalizations of the information matrix (IM) equality of White and so relate to the copula test proposed by Huang and Prokhorov. The idea is that eigenspectrum-based statements of the IM equality reduce the degrees of freedom of the test's asymptotic distribution and lead to better size-power properties, even in high dimensions. The gains are especially pronounced for vine copulas, where additional benefits come from simplifications of score functions and the Hessian. We derive the asymptotic distribution of the generalized tests, accounting for the nonparametric estimation of the marginals, and apply a parametric bootstrap procedure, valid when asymptotic critical values are inaccurate. In Monte Carlo simulations, we study the behavior of the new tests, compare them with several Cramér–von Mises type tests, and confirm the desired properties of the new tests in high dimensions.
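White's information matrix equality states that, under correct specification, the expected outer product of the per-observation scores equals minus the expected Hessian. A generic sample analogue of the discrepancy underlying such tests is sketched below, with user-supplied score and Hessian functions; the eigenspectrum-based reductions and the bootstrap are not shown.

```python
import numpy as np

def im_discrepancy(score, hess, theta_hat, data):
    """Sample analogue of White's IM equality E[s s'] + E[H] = 0.
    `score(theta, x)` returns the per-observation score vector and
    `hess(theta, x)` the per-observation Hessian; both are assumed
    supplied by the user. Returns a matrix that should be near zero
    under a correctly specified model."""
    S = np.array([score(theta_hat, x) for x in data])   # (n, k)
    H = np.array([hess(theta_hat, x) for x in data])    # (n, k, k)
    return S.T @ S / len(data) + H.mean(axis=0)
```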

7.
Quasi-regression, an approach for approximating an unknown function on the unit cube in very high dimensions, is computationally highly efficient. In this article, generalized quasi-regression is introduced. Some theoretical results on generalized quasi-regression are provided, and numerical examples are given.
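The essence of quasi-regression is that coefficients β_j = E[f(X)φ_j(X)] of an orthonormal basis expansion are estimated by plain averaging over uniform draws, with no matrix inversion. A one-dimensional sketch with a shifted Legendre basis; the high-dimensional case uses tensor products of such bases, and `f` is assumed vectorized.

```python
import numpy as np
from numpy.polynomial import legendre

def quasi_regression_coeffs(f, degree, n=100_000, seed=None):
    """Estimate beta_j = E[f(X) phi_j(X)] for X ~ Uniform(0, 1), where
    phi_j is the shifted Legendre polynomial normalized to be
    orthonormal on [0, 1]. One-dimensional sketch of quasi-regression."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    fx = f(x)
    betas = []
    for j in range(degree + 1):
        c = np.zeros(j + 1)
        c[j] = 1.0
        phi = np.sqrt(2 * j + 1) * legendre.legval(2 * x - 1, c)
        betas.append(np.mean(fx * phi))
    return np.array(betas)

print(quasi_regression_coeffs(np.exp, degree=3, seed=0))
```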

8.
We propose two nonparametric portmanteau test statistics for serial dependence in high dimensions using the correlation integral. One test depends on a cutoff threshold value, while the other is free of this dependence. Although these tests may each be viewed as variants of the classical Brock, Dechert, and Scheinkman (BDS) test statistic, they avoid some of the major weaknesses of that test. We establish consistency and asymptotic normality of both portmanteau tests. Using Monte Carlo simulations, we investigate the small-sample properties of the tests for a variety of data-generating processes with normally and uniformly distributed innovations. We show that asymptotic theory provides accurate inference in finite samples and for relatively high dimensions. This is followed by a power comparison with the BDS test and with several rank-based extensions of the BDS test that have recently been proposed in the literature. Two real data examples are provided to illustrate the use of the test procedures.
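The common building block is the sample correlation integral, the fraction of pairs of delay-embedded vectors within distance eps of each other. A direct O(n²) sketch for a scalar series embedded in dimension m; how the portmanteau statistics combine such quantities across dimensions is not shown.

```python
import numpy as np

def correlation_integral(x, m, eps):
    """Sample correlation integral C_m(eps): the proportion of pairs
    of m-dimensional delay-embedded vectors of the series x whose
    sup-norm distance is at most eps."""
    n = len(x) - m + 1
    emb = np.column_stack([x[i:i + n] for i in range(m)])        # (n, m)
    d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
    iu = np.triu_indices(n, k=1)
    return float(np.mean(d[iu] <= eps))
```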

9.
Slope rotatability over all directions (SROAD) is a useful concept when the slope of a second-order response is to be studied. SROAD designs ensure that knowledge of the slope is acquired symmetrically, whatever direction later becomes of more interest as the data are analyzed. In a prior paper, we explored designs for k = 2 and 3 dimensions which do not have the full symmetries of second-order designs but which still possess the SROAD property. Here, we discuss designs in higher dimensions. The introductory Sections 1 and 2 are essentially identical to those of the prior paper.

10.
This paper proposes an algorithm for the classification of multi-dimensional datasets based on conjugate Bayesian Multiple Kernel Grouping Learning (BMKGL). Using a conjugate Bayesian framework improves computational efficiency. Multiple kernels, instead of a single kernel, avoid the kernel selection problem, which is also computationally expensive. Through grouping parameter learning, BMKGL can simultaneously integrate information from different dimensions and find the dimensions that contribute most to the variation in the outcome, which aids interpretability. Meanwhile, BMKGL can select the most suitable combination of kernels for different dimensions so as to extract the most appropriate measure for each dimension and improve the accuracy of the classification results. Simulation results show that our learning process gives better predictive performance and stability than some popular classifiers, such as the k-nearest-neighbours algorithm, the support vector machine and the naive Bayes classifier. BMKGL also outperforms previous methods in terms of accuracy and interpretability on the heart disease and EEG datasets.

11.
This paper sets out a methodology for risk assessment of pregnancies in terms of adverse outcomes such as low birth-weight and neonatal mortality in a situation of multiple but possibly interdependent major dimensions of risk. In the present analysis, the outcome is very low birth-weight and the observed risk indicators are assumed to be linked to three main dimensions: socio-demographic, bio-medical status, and fertility history. Summary scores for each mother under each risk dimension are derived from observed indicators and used as the basis for a multidimensional classification as high or low risk. A fully Bayesian method of implementation is applied to estimation and prediction. A case study is presented of very low birth-weight singleton livebirths over 1991-93 in a health region covering North West London and parts of the adjacent South East of England, with validating predictions for maternities in 1994.

12.
The main reason for the limited use of multivariate discrete models is the difficulty in calculating the required probabilities. The task is usually undertaken via recursive relationships which become quite computationally demanding for high dimensions and large values. The present paper discusses efficient algorithms that make use of the recurrence relationships in a manner that reduces the computational effort and thus allow for easy and cheap calculation of the probabilities. The most common multivariate discrete distribution, the multivariate Poisson distribution, is treated. Real data problems are provided to motivate the use of the proposed strategies. Extensions of our results are discussed. It is shown that probabilities for a large family of multivariate distributions can be computed efficiently via our algorithms.
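In the bivariate case, for example, the pmf satisfies x·P(x,y) = λ₁·P(x−1,y) + λ₃·P(x−1,y−1) together with its counterpart in y, so the whole probability table can be filled with each entry computed exactly once. A dynamic programming sketch of this idea; the paper's algorithms treat the general multivariate case.

```python
import numpy as np

def bivariate_poisson_table(lam1, lam2, lam3, xmax, ymax):
    """Table of bivariate Poisson probabilities P(x, y) filled via the
    standard recurrences, each entry evaluated exactly once:
      x * P(x, y) = lam1 * P(x-1, y) + lam3 * P(x-1, y-1)
      y * P(0, y) = lam2 * P(0, y-1)
    with P(0, 0) = exp(-(lam1 + lam2 + lam3))."""
    P = np.zeros((xmax + 1, ymax + 1))
    P[0, 0] = np.exp(-(lam1 + lam2 + lam3))
    for y in range(1, ymax + 1):
        P[0, y] = lam2 / y * P[0, y - 1]
    for x in range(1, xmax + 1):
        P[x, 0] = lam1 / x * P[x - 1, 0]
        for y in range(1, ymax + 1):
            P[x, y] = (lam1 * P[x - 1, y] + lam3 * P[x - 1, y - 1]) / x
    return P
```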

13.
We give a new characterization of Elfving's (1952) method for computing c-optimal designs in k dimensions which gives explicit formulae for the k unknown optimal weights and k unknown signs in Elfving's characterization. This eliminates the need to search over these parameters to compute c-optimal designs, and thus reduces the computational burden from solving a family of optimization problems to solving a single optimization problem for the optimal finite support set. We give two illustrative examples: a high dimensional polynomial regression model and a logistic regression model, the latter showing that the method can be used for locally optimal designs in nonlinear models as well.
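For contrast, the brute-force numerical alternative that Elfving's characterization renders unnecessary solves min over w of c′M(w)⁻¹c on a fixed candidate set. A schematic sketch under those assumptions; the softmax reparameterization merely keeps the weights on the simplex, and all names are made up.

```python
import numpy as np
from scipy.optimize import minimize

def c_optimal_weights(F, c):
    """Numerically minimize c' M(w)^{-1} c with M(w) = sum_i w_i f_i f_i'
    over design weights w on the candidate regression vectors (rows of F).
    Brute force for illustration -- Elfving's characterization gives the
    optimal weights and signs explicitly instead."""
    def crit(v):
        w = np.exp(v)
        w /= w.sum()                       # softmax: w stays on the simplex
        M = F.T @ (F * w[:, None])
        M += 1e-10 * np.eye(len(c))        # ridge guard against singularity
        return c @ np.linalg.solve(M, c)
    res = minimize(crit, np.zeros(len(F)), method='Nelder-Mead')
    w = np.exp(res.x)
    return w / w.sum()
```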

14.
We describe parallel Markov chain Monte Carlo methods that propagate a collective ensemble of paths, with local covariance information calculated from neighbouring replicas. The use of collective dynamics eliminates multiplicative noise and stabilizes the dynamics, thus providing a practical approach to difficult anisotropic sampling problems in high dimensions. Numerical experiments with model problems demonstrate that dramatic potential speedups, compared to various alternative schemes, are attainable.
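A toy version of the collective-covariance idea: each walker makes a random-walk Metropolis proposal preconditioned by the empirical covariance of the other replicas, which stabilizes steps in anisotropic targets. A simplified illustration only (dim ≥ 2 assumed), not the paper's scheme.

```python
import numpy as np

def ensemble_rwm(logpi, n_walkers, dim, n_steps, scale=0.5, seed=None):
    """Ensemble random-walk Metropolis: walker i proposes from a
    Gaussian whose covariance is estimated from the *other* walkers
    (so the proposal remains symmetric). Toy illustration of using
    local covariance information from neighbouring replicas."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_walkers, dim))
    lp = np.array([logpi(x) for x in X])
    for _ in range(n_steps):
        for i in range(n_walkers):
            C = np.cov(np.delete(X, i, axis=0).T) + 1e-6 * np.eye(dim)
            prop = X[i] + scale * rng.multivariate_normal(np.zeros(dim), C)
            lp_prop = logpi(prop)
            if np.log(rng.uniform()) < lp_prop - lp[i]:   # Metropolis accept
                X[i], lp[i] = prop, lp_prop
    return X
```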

15.
This paper deals with the problem of finding nearly D-optimal designs for multivariate quadratic regression on a cube which take as few observations as possible and still allow estimation of all parameters. It is shown that among the class of all such designs taking as many observations as possible on the corners of the cube, there is one which is asymptotically efficient as the dimension of the cube increases. Methods for constructing designs in this class, using balanced arrays, are given. It is shown that the designs so constructed for dimensions ≤ 6 compare well with existing computer-generated designs, and in dimensions 5 and 6 are better than those in the literature prior to 1978.

16.
Multivariate extreme events are typically modelled using multivariate extreme value distributions. Unfortunately, there exists no finite parametrization for the class of multivariate extreme value distributions. One common approach is to model extreme events using some flexible parametric subclass. This approach has been limited to only two or three dimensions, primarily because suitably flexible high-dimensional parametric models have prohibitively complex density functions. We present an approach that allows a number of popular flexible models to be used in arbitrarily high dimensions. The approach easily handles missing and censored data, and can be employed when modelling componentwise maxima and multivariate threshold exceedances. The approach is based on a representation using conditionally independent marginal components, conditioning on positive stable random variables. We use Bayesian inference, where the conditioning variables are treated as auxiliary variables within Markov chain Monte Carlo simulations. We demonstrate these methods with an application to sea-levels, using data collected at 10 sites on the east coast of England.
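For the symmetric logistic model the representation is concrete: given a positive α-stable variable S (Laplace transform exp(−t^α)), the components Z_i = (S/E_i)^α with independent unit exponentials E_i are conditionally independent and have unit Fréchet margins. A simulation sketch using the Chambers–Mallows–Stuck/Kanter construction; this illustrates the representation only, not the paper's MCMC inference.

```python
import numpy as np

def rpostable(alpha, n, rng):
    """Positive alpha-stable draws (0 < alpha < 1) with Laplace
    transform exp(-t^alpha), via the Kanter/CMS representation."""
    U = rng.uniform(0.0, np.pi, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * U) / np.sin(U) ** (1 / alpha)
            * (np.sin((1 - alpha) * U) / W) ** ((1 - alpha) / alpha))

def rmev_logistic(alpha, d, n, seed=None):
    """Simulate n draws of the d-dimensional symmetric logistic
    max-stable vector with unit Frechet margins: conditionally on the
    stable variable S, the components (S / E_i)^alpha are independent."""
    rng = np.random.default_rng(seed)
    S = rpostable(alpha, n, rng)[:, None]
    E = rng.exponential(1.0, (n, d))
    return (S / E) ** alpha
```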

17.
High-dimensional predictive models, those with more measurements than observations, require regularization to be well defined, perform well empirically, and possess theoretical guarantees. The amount of regularization, often determined by tuning parameters, is integral to achieving good performance. One can choose the tuning parameter in a variety of ways, such as through resampling methods or generalized information criteria. However, the theory supporting many regularized procedures relies on an estimate for the variance parameter, which is complicated in high dimensions. We develop a suite of information criteria for choosing the tuning parameter in lasso regression by leveraging the literature on high-dimensional variance estimation. We derive intuition showing that existing information-theoretic approaches work poorly in this setting. We compare our risk estimators to existing methods with an extensive simulation and derive some theoretical justification. We find that our new estimators perform well across a wide range of simulation conditions and evaluation criteria.
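A schematic version of such a criterion plugs an externally supplied high-dimensional variance estimate into a Cp-style score RSS/σ̂² + 2·df, with df the number of nonzero lasso coefficients. A sketch using scikit-learn; the paper's specific criteria and variance estimators are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Lasso

def choose_lasso_lambda(X, y, lambdas, sigma2_hat):
    """Pick the lasso tuning parameter minimizing the Cp-style score
    RSS / sigma2_hat + 2 * df, where df is the number of nonzero
    coefficients and sigma2_hat is a high-dimensional variance
    estimate supplied by the user."""
    scores = []
    for lam in lambdas:
        fit = Lasso(alpha=lam).fit(X, y)
        rss = np.sum((y - fit.predict(X)) ** 2)
        df = np.count_nonzero(fit.coef_)
        scores.append(rss / sigma2_hat + 2 * df)
    return lambdas[int(np.argmin(scores))]
```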

18.
Distance concentration is the phenomenon that, under certain conditions, the contrast between the nearest and the farthest neighbouring points vanishes as the data dimensionality increases. It affects high dimensional data processing, analysis, retrieval, and indexing, which all rely on some notion of distance or dissimilarity. Previous work has characterised this phenomenon in the limit of infinite dimensions. However, real data are finite dimensional, and hence the infinite-dimensional characterisation is insufficient. Here we quantify the phenomenon more precisely, for the possibly high but finite dimensional case, in a distribution-free manner, by bounding the tails of the probability that distances become meaningless. As an application, we show how this can be used to assess the concentration of a given distance function in some unknown data distribution solely on the basis of an available data sample from it. This can be used to test and detect problematic cases more rigorously than is currently possible, and we demonstrate the working of this approach on both synthetic data and ten real-world data sets from different domains.
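The phenomenon is easy to observe empirically: the relative contrast (d_max − d_min)/d_min of distances from the origin to uniform random points collapses as the dimension grows. A quick demonstration:

```python
import numpy as np

def relative_contrast(dim, n=1000, seed=0):
    """Empirical relative contrast of Euclidean norms of n uniform
    points in [0, 1]^dim -- the quantity whose vanishing defines
    distance concentration."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(rng.uniform(size=(n, dim)), axis=1)
    return (d.max() - d.min()) / d.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 4))   # contrast shrinks with dim
```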

19.
Longitudinal imaging studies have moved to the forefront of medical research due to their ability to characterize spatio-temporal features of biological structures across the lifespan. Valid inference in longitudinal imaging requires enough flexibility of the covariance model to allow reasonable fidelity to the true pattern. On the other hand, the existence of computable estimates demands a parsimonious parameterization of the covariance structure. Separable (Kronecker product) covariance models provide one such parameterization in which the spatial and temporal covariances are modeled separately. However, evaluating the validity of this parameterization in high dimensions remains a challenge. Here we provide a scientifically informed approach to assessing the adequacy of separable (Kronecker product) covariance models when the number of observations is large relative to the number of independent sampling units (sample size). We address both the general case, in which unstructured matrices are considered for each covariance model, and the structured case, which assumes a particular structure for each model. For the structured case, we focus on the situation where the within-subject correlation is believed to decrease exponentially in time and space, as is common in longitudinal imaging studies. However, the provided framework equally applies to all covariance patterns used within the more general multivariate repeated measures context. Our approach provides useful guidance for high-dimension, low-sample-size data that preclude using standard likelihood-based tests. Longitudinal medical imaging data of caudate morphology in schizophrenia illustrate the approach's appeal.
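One elementary diagnostic in this spirit compares the sample covariance with its nearest Kronecker product, obtainable from Van Loan's rearrangement and a rank-one SVD. The sketch below is a generic construction, not the authors' test; signs of the leading singular vectors may need flipping so that the factors are positive definite.

```python
import numpy as np

def nearest_kronecker(S, p, q):
    """Find A (p x p) and B (q x q) minimizing ||S - kron(A, B)||_F
    for a (p*q) x (p*q) matrix S, via the leading singular pair of
    Van Loan's rearrangement. Comparing kron(A, B) with S gives a
    crude check of a separable covariance model."""
    R = np.empty((p * p, q * q))
    for i in range(p):
        for j in range(p):
            # Row (i, j) of R holds the vectorized (i, j) block of S
            R[i * p + j] = S[i * q:(i + 1) * q, j * q:(j + 1) * q].ravel()
    U, sv, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(sv[0]) * U[:, 0].reshape(p, p)
    B = np.sqrt(sv[0]) * Vt[0].reshape(q, q)
    return A, B
```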

20.
In a clinical trial with a time-to-event endpoint the treatment effect can be measured in various ways. Under proportional hazards all reasonable measures (such as the hazard ratio and the difference in restricted mean survival time) are consistent in the following sense: take any control group survival distribution such that the hazard rate remains above zero; if there is no benefit by any measure, there is no benefit by all measures, and as the magnitude of treatment benefit increases by any measure, it increases by all measures. Under nonproportional hazards, however, survival curves can cross, and the direction of the effect for any pair of measures can be inconsistent. In this paper we critically evaluate a variety of treatment effect measures in common use and identify flaws with them. In particular, we demonstrate that a treatment's benefit has two distinct and independent dimensions which can be measured by the difference in the survival rate at the end of follow-up and the difference in restricted mean survival time, and that commonly used measures do not adequately capture both dimensions. We demonstrate that a generalized hazard difference, which can be estimated by the difference in exposure-adjusted subject incidence rates, captures both dimensions, and that its inverse, the number of patient-years of follow-up that results in one fewer event (the NYNT), is an easily interpretable measure of the magnitude of clinical benefit.
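The arithmetic behind the proposed measure is simple: the generalized hazard difference is the difference in exposure-adjusted incidence rates (events per patient-year), and the NYNT is its inverse. A worked toy example; all numbers are made up.

```python
# Hypothetical two-arm trial (made-up numbers, for illustration only)
events_ctrl, years_ctrl = 120, 800.0   # control: events, patient-years
events_trt,  years_trt  = 90,  850.0   # treatment: events, patient-years

rate_ctrl = events_ctrl / years_ctrl   # 0.150 events per patient-year
rate_trt  = events_trt / years_trt     # ~0.106 events per patient-year

hazard_diff = rate_ctrl - rate_trt     # generalized hazard difference, ~0.044
nynt = 1.0 / hazard_diff               # ~22.7 patient-years per event avoided
print(rate_ctrl, rate_trt, hazard_diff, nynt)
```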
