Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
Density-based clustering methods hinge on the idea of associating groups with the connected components of the level sets of the density underlying the data, estimated by a nonparametric method. These methods claim some desirable properties and generally good performance, but they require a non-trivial computational effort to identify the connected regions. In a previous work, the use of a spatial tessellation such as the Delaunay triangulation was proposed, because it suitably generalizes the univariate procedure for detecting the connected components. However, its computational complexity grows exponentially with the dimensionality of the data, making the triangulation infeasible in high dimensions. Our aim is to overcome the limitations of the Delaunay triangulation. We discuss an alternative procedure for identifying the connected regions associated with the level sets of the density. By measuring the extent of possible valleys of the density along the segment connecting pairs of observations, the proposed procedure shifts the formulation from a space of arbitrary dimension to a univariate one, yielding benefits in both computation and visualization.
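A minimal sketch of the pair-connection idea described above, assuming a Gaussian kernel density estimate, a user-chosen level c, and a fixed grid of points on each segment; the function names and tuning values are illustrative, not taken from the paper.

```python
# Sketch: decide whether two points belong to the same connected component of
# the level set {x : f(x) >= c} by checking the estimated density along the
# segment joining them; connected components of the resulting graph are the
# clusters at level c.  Illustrative only, not the paper's exact procedure.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def level_set_clusters(X, c, m=50):
    kde = gaussian_kde(X.T)                      # nonparametric density estimate
    f = kde(X.T)
    keep = np.where(f >= c)[0]                   # points inside the level set
    n = len(keep)
    rows, cols = [], []
    t = np.linspace(0.0, 1.0, m)[:, None]
    for i in range(n):
        for j in range(i + 1, n):
            seg = (1 - t) * X[keep[i]] + t * X[keep[j]]   # points on the segment
            if kde(seg.T).min() >= c:            # no valley below level c
                rows.append(i)
                cols.append(j)
    A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(A, directed=False)
    return keep, labels                          # cluster labels of kept points

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
keep, labels = level_set_clusters(X, c=0.02)
print(len(np.unique(labels)), "connected components at this level")
```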

2.
In economics and government statistics, aggregated data rather than individual-level data are usually reported, both for confidentiality and for simplicity. In this paper we develop a method for flexibly estimating the probability density function of a population from aggregated data given as group averages, where the individual-level data are grouped according to quantile limits. The kernel density estimator has commonly been applied to such data without taking the aggregation process into account and has been shown to perform poorly. Our method models the quantile function as an integral of the exponential of a spline function and deduces the density function from the quantile function. We match the aggregated data to their theoretical counterparts by least squares and regularize the estimation using the squared second derivative of the density function as the penalty function. A computational algorithm is developed to implement the method. Applications to simulated data and US household income survey data show that our penalized spline estimator accurately recovers the density of the underlying population, whereas the commonly used kernel density estimator is severely biased. The method is applied to study the dynamics of China's urban income distribution using published interval-aggregated data for 1985–2010.
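The construction described in the abstract can be written compactly. The sketch below uses s(.) for the spline in the exponent and omits boundary handling and the penalty term; it is a paraphrase of the idea, not the paper's exact parameterization.

```latex
% Quantile function modelled through a spline s(.), and the implied density:
\[
  Q(p) = Q(p_0) + \int_{p_0}^{p} e^{\,s(t)}\,dt ,
  \qquad
  f\bigl(Q(p)\bigr) = \frac{1}{Q'(p)} = e^{-s(p)} > 0 .
\]
% For quantile limits p_{k-1} < p_k, the theoretical group average is
\[
  \bar{y}_k = \frac{1}{p_k - p_{k-1}} \int_{p_{k-1}}^{p_k} Q(t)\,dt ,
\]
% and the spline coefficients are chosen by (penalized) least squares so that
% these theoretical averages match the reported group averages.
```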

3.
Classical statistical approaches to multiclass probability estimation are typically based on regression techniques such as multiple logistic regression, or on density estimation approaches such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). These methods often make particular assumptions about the form of the probability functions or the underlying distributions of the subclasses. In this article, we develop a model-free procedure for estimating multiclass probabilities based on large-margin classifiers. In particular, the new estimation scheme works by solving a series of weighted large-margin classifiers and then systematically extracting the probability information from these multiple classification rules. A main advantage of the proposed technique is that it does not impose any strong parametric assumption on the underlying distribution and can be applied to a wide range of large-margin classification methods. A general computational algorithm is developed for class probability estimation. Furthermore, we establish asymptotic consistency of the probability estimates. Both simulated and real data examples are presented to illustrate the competitive performance of the new approach and to compare it with several other existing methods.
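A rough illustration of the weighted-classifier idea in the binary case, assuming a hinge-loss SVM as the large-margin method and scikit-learn's sample_weight mechanism; the weight grid, kernel, and extraction rule are simplifications, not the article's multiclass scheme.

```python
# Sketch: estimate P(Y=+1 | x) by solving weighted large-margin classifiers on a
# grid of weights pi and recording where the predicted label flips.  With weight
# (1 - pi) on class +1 and pi on class -1, the weighted Bayes rule predicts +1
# exactly when P(Y=+1|x) > pi, so the flip point approximates the probability.
import numpy as np
from sklearn.svm import SVC

def weighted_margin_probs(X, y, X_new, pis=np.linspace(0.05, 0.95, 19)):
    # y must be coded as +1 / -1
    preds = np.empty((len(pis), len(X_new)))
    for k, pi in enumerate(pis):
        w = np.where(y == 1, 1.0 - pi, pi)          # weighted hinge loss
        clf = SVC(kernel="rbf", C=1.0).fit(X, y, sample_weight=w)
        preds[k] = clf.predict(X_new)
    # estimated probability: the largest pi at which the label is still +1
    probs = np.array([pis[preds[:, j] == 1].max() if np.any(preds[:, j] == 1)
                      else pis[0] / 2 for j in range(len(X_new))])
    return probs

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1, -1)
print(weighted_margin_probs(X, y, X[:5]).round(2))
```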

4.
The tabled significance values of the Kolmogorov-Smirnov goodness-of-fit statistic determined for continuous underlying distributions are conservative for applications involving discrete underlying distributions. Conover (1972) proposed an efficient method for computing the exact significance level of the Kolmogorov-Smirnov test for discrete distributions; however, he warned against its use for large sample sizes because “the calculations become too difficult.”

In this work we explore the relationship between sample size and the computational effectiveness of Conover's formulas, where "computational effectiveness" means the accuracy attained with a fixed precision of machine arithmetic. The nature of the computational difficulties is pointed out. We find that, despite these difficulties, Conover's method of computing the Kolmogorov-Smirnov significance level for discrete distributions can still be a useful tool for a wide range of sample sizes.
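A small simulation, not Conover's exact recursion, can illustrate the conservativeness mentioned above: the continuous-case asymptotic p-value overstates the true tail probability when the null distribution is discrete. The null distribution (a fair six-sided die), sample size, and replication count below are arbitrary choices for the demonstration.

```python
# Sketch: simulate the one-sample KS statistic under a discrete null and compare
# its true upper tail with the conservative p-value based on the continuous-case
# limiting distribution.  Conover's exact formulas are not implemented here.
import numpy as np
from scipy.stats import kstwobign

rng = np.random.default_rng(2)
n, reps = 50, 20000
support = np.arange(1, 7)
cdf = support / 6.0                               # CDF of the discrete null

def ks_stat(sample):
    ecdf = np.array([(sample <= k).mean() for k in support])
    # for a purely discrete F, the sup of |F_n - F| is attained at the jump points
    return np.max(np.abs(ecdf - cdf))

stats_sim = np.array([ks_stat(rng.integers(1, 7, n)) for _ in range(reps)])
d_obs = np.quantile(stats_sim, 0.95)              # empirical 5% critical value
cont_p = kstwobign.sf(np.sqrt(n) * d_obs)         # continuous-table p-value
true_p = (stats_sim >= d_obs).mean()              # roughly 0.05 by construction
print(f"continuous-table p-value {cont_p:.3f} vs simulated exact level {true_p:.3f}")
```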

5.
For big data analysis, the high computational cost of Bayesian methods often limits their application in practice. In recent years, there have been many attempts to improve the computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity of the parameter space of the underlying probabilistic model in order to construct an effective approximation of its geometric properties. To this end, we build a surrogate function that approximates the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm that converges to the correct target distribution. We show that, by choosing the basis functions and the optimization process differently, our method can be related to other approaches for constructing surrogate functions, such as generalized additive models or Gaussian process models. Experiments on simulated and real data show that our approach leads to substantially more efficient sampling than existing state-of-the-art methods.
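A toy sketch of the surrogate idea under simplifying assumptions: random Fourier features fitted by ridge regression approximate the log target, the cheap surrogate gradient drives the leapfrog proposals, and the true log density is used in the accept/reject step so the chain still targets the correct distribution. The two-dimensional banana-shaped target, the number of bases, and all tuning constants are invented for illustration and are not those of the article.

```python
# Sketch: surrogate-gradient HMC with an exact Metropolis correction.
import numpy as np

rng = np.random.default_rng(3)
d, J = 2, 200                                      # dimension, number of random bases

def log_target(x):                                 # toy banana-shaped target
    return -0.5 * (x[0] ** 2 + (x[1] - x[0] ** 2) ** 2)

# --- build the surrogate from scattered evaluations ---------------------------
W = rng.normal(size=(J, d))                        # random frequencies
b = rng.uniform(0, 2 * np.pi, J)                   # random phases
phi = lambda X: np.cos(X @ W.T + b)                # random Fourier features
X_tr = rng.normal(scale=2.0, size=(500, d))
y_tr = np.array([log_target(x) for x in X_tr])
P = phi(X_tr)
coef = np.linalg.solve(P.T @ P + 1e-3 * np.eye(J), P.T @ y_tr)   # ridge fit

def grad_surrogate(x):                             # analytic gradient of the surrogate
    return -(np.sin(x @ W.T + b) * coef) @ W

# --- HMC: surrogate gradient in the dynamics, exact target in the acceptance ---
def hmc(x, n_iter=2000, eps=0.1, L=20):
    out = []
    for _ in range(n_iter):
        p = rng.normal(size=d)
        x_new = x.copy()
        p_new = p + 0.5 * eps * grad_surrogate(x_new)            # half step
        for _ in range(L):
            x_new = x_new + eps * p_new
            p_new = p_new + eps * grad_surrogate(x_new)
        p_new = p_new - 0.5 * eps * grad_surrogate(x_new)        # correct final half step
        log_a = (log_target(x_new) - 0.5 * p_new @ p_new) - \
                (log_target(x) - 0.5 * p @ p)                    # exact Hamiltonian ratio
        if np.log(rng.uniform()) < log_a:
            x = x_new
        out.append(x.copy())
    return np.array(out)

print(hmc(np.zeros(d)).mean(axis=0))
```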

6.
The adaptive rejection sampling (ARS) algorithm is a universal random generator for drawing samples efficiently from a univariate log-concave target probability density function (pdf). ARS generates independent samples from the target via rejection sampling with high acceptance rates. Indeed, ARS yields a sequence of proposal functions that converges toward the target pdf, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. In this work, we propose a novel ARS scheme, called Cheap Adaptive Rejection Sampling (CARS), in which the computational effort for drawing from the proposal remains constant, fixed in advance by the user. When a large number of samples is required, CARS is faster than ARS.
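The constant-cost idea can be illustrated with a fixed, non-adaptive envelope: instead of refining the proposal after each accepted sample, the number of envelope pieces is decided in advance. The sketch below uses a truncated standard normal on [-4, 4] as the log-concave target and tangent lines at cell midpoints as the envelope; this is an illustration of the principle, not the exact construction used in the article.

```python
# Sketch: rejection sampling from a log-concave target with a fixed K-piece
# piecewise-exponential envelope (tangents of the log density at cell midpoints,
# which upper-bound a concave function).  K is chosen by the user in advance.
import numpy as np

rng = np.random.default_rng(4)
a, b, K = -4.0, 4.0, 16
log_f = lambda x: -0.5 * x ** 2                 # unnormalised log density
dlog_f = lambda x: -x

edges = np.linspace(a, b, K + 1)
mids = 0.5 * (edges[:-1] + edges[1:])
beta = dlog_f(mids)                             # tangent slopes
alpha = log_f(mids) - beta * mids               # tangent intercepts

def cell_mass(al, be, l, r):                    # integral of exp(al + be*x) on [l, r]
    if abs(be) < 1e-12:
        return np.exp(al) * (r - l)
    return (np.exp(al + be * r) - np.exp(al + be * l)) / be

mass = np.array([cell_mass(alpha[i], beta[i], edges[i], edges[i + 1])
                 for i in range(K)])
probs = mass / mass.sum()

def sample(n):
    out = []
    while len(out) < n:
        i = rng.choice(K, p=probs)              # pick an envelope piece
        l, r, be = edges[i], edges[i + 1], beta[i]
        u = rng.uniform()
        if abs(be) < 1e-12:
            x = l + u * (r - l)
        else:                                   # inverse CDF of the exponential piece
            x = l + np.log1p(u * (np.exp(be * (r - l)) - 1.0)) / be
        # accept with prob f(x)/envelope(x), which is <= 1 since the tangent lies above log f
        if np.log(rng.uniform()) < log_f(x) - (alpha[i] + be * x):
            out.append(x)
    return np.array(out)

xs = sample(5000)
print(xs.mean().round(3), xs.std().round(3))    # roughly 0 and 1
```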

7.
We develop a novel computational methodology for Bayesian optimal sequential design in nonparametric regression. This methodology, which we call inhomogeneous evolutionary Markov chain Monte Carlo, combines ideas from simulated annealing, genetic and evolutionary algorithms, and Markov chain Monte Carlo. Our framework allows optimality criteria with general utility functions and general classes of priors for the underlying regression function. We illustrate the usefulness of the methodology with applications to experimental design for nonparametric function estimation using Gaussian process priors and free-knot cubic spline priors.

8.
The theoretical price of a financial option is the expectation of its discounted payoff at expiry. Computing this expectation requires the density of the value of the underlying instrument at expiry. This density depends both on the parametric model assumed for the behaviour of the underlying and on the values of the parameters within that model, such as the volatility. However, neither the model nor the parameter values are known. Common practice when pricing options is to assume a specific model, such as geometric Brownian motion, and to use point estimates of the model parameters, thereby precisely defining a density function. We explicitly acknowledge the uncertainty in both model and parameters by constructing the predictive density of the underlying as an average of model predictive densities, weighted by each model's posterior probability. A model's predictive density is constructed by integrating its transition density over the posterior distribution of its parameters. This is an extension of Bayesian model averaging. Sampling importance resampling and Monte Carlo algorithms implement the computation. The advantage of this method is that, rather than falsely assuming the model and parameter values to be known, the inherent ignorance is acknowledged and handled in a mathematically coherent manner that uses all information from past and current observations to generate and update option prices. Moreover, point estimates of the parameters are unnecessary. We use the method to price a European call option on a share index.
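A sketch of the parameter-averaging step for a single model: under geometric Brownian motion, the call price is computed by Monte Carlo with the volatility drawn from a posterior rather than plugged in as a point estimate, so the payoff is averaged over the predictive density of the underlying. The stand-in return data, the conjugate posterior for the variance, and all numbers are illustrative; the article's model-averaging layer over several models is omitted.

```python
# Sketch: European call priced by averaging the discounted GBM payoff over
# posterior draws of the (annualised) variance instead of a point estimate.
import numpy as np

rng = np.random.default_rng(5)
S0, Kstrike, r, T = 100.0, 105.0, 0.03, 0.5          # spot, strike, rate, maturity
returns = rng.normal(0.0, 0.2 / np.sqrt(252), 252)   # stand-in for observed daily log returns

# posterior for the annualised variance under a zero-mean normal model,
# Jeffreys prior: sigma^2 | data ~ Inverse-Gamma(n/2, 252 * sum(r^2) / 2)
n = len(returns)
scale = 252.0 * np.sum(returns ** 2) / 2.0
sigma2 = scale / rng.gamma(shape=n / 2.0, size=20000)  # inverse-gamma draws

z = rng.normal(size=20000)
ST = S0 * np.exp((r - 0.5 * sigma2) * T + np.sqrt(sigma2 * T) * z)  # terminal price
price = np.exp(-r * T) * np.maximum(ST - Kstrike, 0.0).mean()
print(f"Monte Carlo price under the predictive density: {price:.2f}")
```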

9.
10.
A method for constructing powerful significance tests for the equivalence of two proportions is proposed, based on assumed prior density values. Recent changes in the medical research environment emphasize the need to choose a prior density in advance of any study. The proposed test is based on the posterior probability of the alternative model and preserves the significance level with minimal loss of power. The new test performs better than the familiar mid-p test under a uniform prior density. In addition, the computational burden is low. Potential extensions of the proposed test to related problems are also discussed.
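A basic posterior-probability calculation of the kind such a test builds on, assuming independent uniform Beta(1, 1) priors: draw from the two Beta posteriors and estimate the probability that the proportions differ by less than an equivalence margin. The data, the margin delta, and the Monte Carlo approach are illustrative; the abstract's decision rule and level calibration are not reproduced here.

```python
# Sketch: posterior probability of equivalence of two proportions under
# uniform priors, estimated from Beta posterior draws.
import numpy as np

rng = np.random.default_rng(6)
x1, n1 = 42, 100            # successes / trials, group 1 (invented data)
x2, n2 = 45, 100            # successes / trials, group 2 (invented data)
delta = 0.10                # assumed equivalence margin

p1 = rng.beta(1 + x1, 1 + n1 - x1, 100000)   # posterior draws, Beta(1,1) prior
p2 = rng.beta(1 + x2, 1 + n2 - x2, 100000)
post_equiv = np.mean(np.abs(p1 - p2) < delta)
print(f"posterior probability of equivalence: {post_equiv:.3f}")
```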

11.
A nonstationary Markov process model with embedded explanatory variables offers a means to account for underlying causal factors while retaining unrestrictive assumptions and the predictive ability of a stochastic framework. We find that a direct search algorithm requiring minimal user preparation is a feasible computational procedure for estimating such a model. We compare this method with several others using factorially designed Monte Carlo simulations and find evidence that a small state space and a long time series lead to better algorithmic performance.

12.
This paper studies the case in which the observations come from a unimodal and skew density function with an unknown mode. The skew-symmetric representation of such a density has a symmetric component that can be written as a scale mixture of uniform densities. A Dirichlet process (DP) prior is assigned to the mixing distribution, and prior distributions are also assumed for the mode and the skewed component. A computational approach is used to obtain the Bayes estimates of the components. An example is given to illustrate the approach.

13.
Stochastic Models, 2013, 29(2-3): 785-797
This paper describes a new algorithm for policy evaluation for Markov decision processes (MDPs) that possess a quasi-birth-death structure. The proposed algorithm is based on matrix-analytic methods, which use probabilistic concepts associated with restricting the underlying Markov process to certain state subsets. A telecommunications application example shows that the method offers a significant computational reduction compared with a standard MDP policy evaluation approach.

14.
The procedure for building space-time autoregressive moving average (STARMA) models depends on the form of the variance-covariance matrix G of the underlying errors (see Pfeifer and Deutsch (1980a, c)). In this paper the distribution of the statistic for testing the hypothesis that G is diagonal is obtained in a computationally convenient form. A table of critical values for the test is given, and a comparison is made with the approximate values obtained by Pfeifer and Deutsch (1980c).

15.
The problem of updating a discriminant function on the basis of data of unknown origin is studied. There are observations of known origin from each of the underlying populations, and subsequently a limited number of unclassified observations, assumed to have been drawn from a mixture of the underlying populations, becomes available. A sample discriminant function can be formed initially from the classified data. Whether the subsequent updating of this discriminant function on the basis of the unclassified data produces a reduction in the error rate large enough to warrant the computational effort is considered by carrying out a series of Monte Carlo experiments. The simulation results are contrasted with available asymptotic results.

16.
Least-squares cross-validation is a completely automatic method for choosing the smoothing parameter in probability density estimation, but it consumes large amounts of computer time. This article presents an efficient computational algorithm for the method when the kernel is a symmetric polynomial function.
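For reference, a direct (and slow, O(n^2) per bandwidth) evaluation of the least-squares cross-validation criterion for a Gaussian kernel in one dimension; the bandwidth grid and data are invented, and the article's fast algorithm for polynomial kernels is not implemented here.

```python
# Sketch: LSCV(h) = integral of fhat^2 - (2/n) * sum_i fhat_{-i}(x_i), using the
# closed form of the convolution of two Gaussian kernels; minimise over a grid.
import numpy as np
from scipy.stats import norm

def lscv_score(x, h):
    n = len(x)
    d = (x[:, None] - x[None, :]) / h                  # pairwise scaled differences
    term1 = norm.pdf(d, scale=np.sqrt(2)).sum() / (n ** 2 * h)   # integral of fhat^2
    off = norm.pdf(d).sum() - n * norm.pdf(0.0)        # exclude the i = j terms
    term2 = 2.0 * off / (n * (n - 1) * h)              # leave-one-out cross term
    return term1 - term2

rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 0.5, 150)])
grid = np.linspace(0.05, 1.5, 60)
scores = [lscv_score(x, h) for h in grid]
h_lscv = grid[int(np.argmin(scores))]
print(f"LSCV bandwidth: {h_lscv:.3f}")
```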

17.
Bandwidth selection is an important problem in kernel density estimation. Traditional simple and quick bandwidth selectors usually oversmooth the density estimate. Existing sophisticated selectors usually run into computational difficulties and occasionally fail to exist. Moreover, they may not be robust against outliers in the sample data, and some are highly variable, tending to undersmooth the density. In this paper, a highly robust, simple, and quick bandwidth selector is proposed which adapts to different types of densities.
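For background, one standard "simple and quick" selector of the kind discussed above is Silverman's rule of thumb, made somewhat robust to outliers by taking the smaller of the standard deviation and the rescaled interquartile range; this is not the selector proposed in the article, only the familiar baseline it competes with.

```python
# Sketch: robustified Silverman rule-of-thumb bandwidth for a Gaussian kernel.
import numpy as np

def silverman_robust(x):
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25])) / 1.349   # IQR scaled to sd units
    return 0.9 * min(sd, iqr) * len(x) ** (-0.2)

rng = np.random.default_rng(8)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(0, 1, 5) + 30])  # data with outliers
print(f"robust rule-of-thumb bandwidth: {silverman_robust(x):.3f}")
```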

18.
This paper studies mixtures of a class of probability density functions under a type-I censoring scheme. We model a heterogeneous population by means of a two-component mixture from this class of densities. The parameters of the mixture are estimated and compared using Bayes estimates under the squared-error and precautionary loss functions. For computational purposes, a censored mixture dataset is simulated by probabilistic mixing, taking the Maxwell distribution as a particular case. Closed-form expressions for the Bayes estimators, along with their posterior risks, are derived for censored as well as complete samples. Some interesting comparisons and properties of the estimates are presented. A real dataset is also analysed for illustration.
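A sketch of the "probabilistic mixing" simulation step under type-I censoring, with the Maxwell distribution as the special case mentioned above; the mixing weight, scale parameters, and censoring time are illustrative values, not those of the paper.

```python
# Sketch: simulate a type-I censored two-component Maxwell mixture by first
# drawing latent component labels, then the lifetimes, then censoring at t_cens.
import numpy as np
from scipy.stats import maxwell

rng = np.random.default_rng(9)
n, w, s1, s2, t_cens = 500, 0.4, 1.0, 2.5, 3.0      # size, weight, scales, censoring time

z = rng.binomial(1, w, n)                           # latent component labels
y = np.where(z == 1,
             maxwell.rvs(scale=s1, size=n, random_state=rng),
             maxwell.rvs(scale=s2, size=n, random_state=rng))
delta = (y <= t_cens).astype(int)                   # 1 = observed, 0 = censored
y_obs = np.minimum(y, t_cens)                       # type-I censored observations
print(f"{delta.sum()} uncensored observations out of {n}")
```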

19.
In the computational sciences, including computational statistics, machine learning, and bioinformatics, articles presenting new supervised learning methods often claim that the new method performs better than existing methods on real data, for instance in terms of error rate. However, these claims are often not based on proper statistical tests, and even when such tests are performed, the tested hypothesis is not clearly defined and little attention is devoted to Type I and Type II errors. In this article, we aim to fill this gap by providing a proper statistical framework for hypothesis tests that compare the performance of supervised learning methods on several real datasets with unknown underlying distributions. After giving a statistical interpretation of the ad hoc tests commonly performed by computational researchers, we devote special attention to power issues and outline a simple method for determining the number of datasets to include in a comparison study in order to reach adequate power. These methods are illustrated through three comparison studies from the literature and an exemplary benchmarking study using gene expression microarray data. All our results can be reproduced using R code and datasets available from the companion website http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/compstud2013.
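A sketch of the kind of test being formalized: error-rate differences between two learners across M datasets treated as a sample, tested with a paired t-test, followed by the usual normal-approximation calculation of how many datasets would be needed for a given power. The numbers are invented for illustration; see the article (and its companion R code) for the proper framework.

```python
# Sketch: cross-dataset comparison of two learners plus a rough sample-size
# (number-of-datasets) calculation for a target power.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
M = 15                                              # number of datasets
diff = rng.normal(loc=0.02, scale=0.04, size=M)     # error(method A) - error(method B)

t, p = stats.ttest_1samp(diff, popmean=0.0)         # paired test across datasets
print(f"paired t-test across datasets: t = {t:.2f}, p = {p:.3f}")

# datasets needed for power 0.8 at two-sided level 0.05, effect size d = mean/sd
d = 0.02 / 0.04
z_alpha, z_beta = stats.norm.ppf(0.975), stats.norm.ppf(0.8)
print("approx. datasets needed:", int(np.ceil(((z_alpha + z_beta) / d) ** 2)))
```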

20.
A new class of nonparametric tests based on random projections is proposed. They can be used for several null hypotheses of practical interest, including uniformity for spherical (directional) and compositional data, sphericity of the underlying distribution, and homogeneity in two-sample problems on the sphere or the simplex. The proposed procedures have a number of advantages, mostly associated with their flexibility (for example, they can also test "partial uniformity" on a subset of the sphere), computational simplicity, and ease of application even in high-dimensional cases.
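The projection idea can be illustrated directly: if X is uniform on the unit sphere in R^d, then for any fixed unit vector u the quantity (1 + X'u)/2 follows a Beta((d-1)/2, (d-1)/2) distribution, so a univariate goodness-of-fit test can be applied to the projections. The sketch below uses a single random direction and a plain Kolmogorov-Smirnov test; the article's procedures combine projections differently.

```python
# Sketch: test uniformity on the sphere by projecting onto one random direction
# and comparing the transformed projections with their Beta null distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, d = 300, 5
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)       # uniform points on the sphere

u = rng.normal(size=d)
u /= np.linalg.norm(u)                              # one random projection direction
t = (1.0 + X @ u) / 2.0                             # Beta((d-1)/2,(d-1)/2) under the null
stat, p = stats.kstest(t, stats.beta((d - 1) / 2, (d - 1) / 2).cdf)
print(f"KS test of projected uniformity: p = {p:.3f}")
```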
