Similar documents
20 similar documents retrieved.
1.
A probabilistic expert system provides a graphical representation of a joint probability distribution which can be used to simplify and localize calculations. Jensen et al. (1990) introduced a flow-propagation algorithm for calculating marginal and conditional distributions in such a system. This paper analyses that algorithm in detail, and shows how it can be modified to perform other tasks, including maximization of the joint density and simultaneous fast retraction of evidence entered on several variables.
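A minimal sketch of the flow-propagation idea on a two-clique junction tree, with illustrative numbers; the cliques, tables, and variable names are ours, not Jensen et al.'s:

```python
# Sketch (not the paper's code): sum- and max-propagation on a two-clique
# junction tree over binary variables A, B, C with cliques {A,B}, {B,C}
# sharing separator {B}. All tables are illustrative.
import numpy as np

# Clique potentials: their product is the joint P(A,B,C).
phi_AB = np.array([[0.06, 0.24], [0.14, 0.56]])   # P(A) * P(B|A)
phi_BC = np.array([[0.9, 0.1], [0.2, 0.8]])       # P(C|B)

# Collect: message from {A,B} to {B,C} is the A-marginal of phi_AB.
msg_B = phi_AB.sum(axis=0)                # potential over the separator B
phi_BC_cal = phi_BC * msg_B[:, None]      # calibrated clique {B,C}

# Marginals can now be read off locally, e.g. P(C):
p_C = phi_BC_cal.sum(axis=0)
print("P(C) =", p_C / p_C.sum())

# Max-propagation (the 'maximization of the joint density' variant)
# replaces sum with max in the message:
msg_B_max = phi_AB.max(axis=0)
print("max joint probability:", (phi_BC * msg_B_max[:, None]).max())
```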

2.
Summary: The next German census will be an Administrative Record Census. Data from several administrative registers about persons will be merged. Object identification has to be applied, since no unique identification number exists in the registers. We present a two-step procedure. We briefly discuss questions such as the correctness and completeness of the Administrative Record Census. Then we focus on the object identification problem, which can be viewed as a special classification problem: pairs of records are to be classified as matched or not matched. To achieve computational efficiency, a preselection technique for pairs is applied. Our approach is illustrated with a database containing a large set of consumer addresses. *This work was partially supported by the Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316). The authors thank Michael Fürnrohr for previewing the paper. We would also like to thank an anonymous reviewer for helpful comments.
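A toy sketch of the preselection ("blocking") idea: only pairs that agree on a cheap blocking key are compared in full. Field names, the blocking key, and the decision rule below are hypothetical, not the paper's:

```python
# Illustrative blocking for record linkage: compare only within-block pairs.
from collections import defaultdict

register_a = [{"id": 1, "name": "Anna Schmidt", "zip": "10115"},
              {"id": 2, "name": "Bernd Meier",  "zip": "80331"}]
register_b = [{"id": 7, "name": "Anna Schmitd", "zip": "10115"},
              {"id": 9, "name": "Carla Weber",  "zip": "50667"}]

def blocking_key(rec):
    # Block on zip code plus first letter of the name.
    return (rec["zip"], rec["name"][0])

blocks = defaultdict(lambda: ([], []))
for r in register_a:
    blocks[blocking_key(r)][0].append(r)
for r in register_b:
    blocks[blocking_key(r)][1].append(r)

def similar(r1, r2):
    # Toy decision rule: a shared name token; a real system would classify
    # pairs with a trained model over several comparison variables.
    return len(set(r1["name"].split()) & set(r2["name"].split())) > 0

candidate_pairs = [(r1, r2) for left, right in blocks.values()
                   for r1 in left for r2 in right]
matches = [(r1["id"], r2["id"]) for r1, r2 in candidate_pairs if similar(r1, r2)]
print("candidate pairs:", len(candidate_pairs), "matches:", matches)
```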

3.
A traditional interpolation model is characterized by the choice of regularizer applied to the interpolant, and the choice of noise model. Typically, the regularizer has a single regularization constant α, and the noise model has a single parameter β. The ratio α/β alone is responsible for determining globally all these attributes of the interpolant: its complexity, flexibility, smoothness, characteristic scale length, and characteristic amplitude. We suggest that interpolation models should be able to capture more than just one flavour of simplicity and complexity. We describe Bayesian models in which the interpolant has a smoothness that varies spatially. We emphasize the importance, in practical implementation, of the concept of conditional convexity when designing models with many hyperparameters. We apply the new models to the interpolation of neuronal spike data and demonstrate a substantial improvement in generalization error.
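The contrast can be written as two penalized objectives; the notation below is the standard form of such models (our transcription, not quoted from the paper):

```latex
% One global (alpha, beta) pair versus a spatially varying alpha(x).
\begin{align*}
  M(f) &= \frac{\beta}{2}\sum_{i}\bigl(y_i - f(x_i)\bigr)^2
          + \frac{\alpha}{2}\int \bigl(f''(x)\bigr)^2\,dx
  \quad\text{(one global smoothness, set by } \alpha/\beta\text{)}\\
  M(f) &= \frac{\beta}{2}\sum_{i}\bigl(y_i - f(x_i)\bigr)^2
          + \frac{1}{2}\int \alpha(x)\,\bigl(f''(x)\bigr)^2\,dx
  \quad\text{(smoothness varying with } x\text{)}
\end{align*}
```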

4.
A new area of research interest is the computation of exact confidence limits or intervals for a scalar parameter of interest θ from discrete data by inverting a hypothesis test based on a studentized test statistic. See, for example, Chan and Zhang (1999), Agresti and Min (2001) and Agresti (2003), who deal with a difference of binomial probabilities, and Agresti and Min (2002), who deal with an odds ratio. However, neither (1) a detailed analysis of the computational issues involved nor (2) a reliable method of computation that deals effectively with these issues is currently available. In this paper we solve these two problems for a very broad class of discrete data models. We suppose that the distribution of the data is determined by (θ, ψ), where ψ is a nuisance parameter vector. We also consider six different studentized test statistics. Our contributions to (1) are as follows. We show that the P-value resulting from the hypothesis test, considered as a function of the null-hypothesized value of θ, has both jump and drop discontinuities. Numerical examples are used to demonstrate that these discontinuities lead to the failure of simple-minded approaches to the computation of the confidence limit or interval. We also provide a new method for efficiently computing the set of all possible locations of these discontinuities. Our contribution to (2) is to provide a new and reliable method of computing the confidence limit or interval, based on the knowledge of this set.
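A toy illustration of why the P-value is discontinuous in θ₀, using a single binomial with no nuisance parameter (our setup, much simpler than the paper's class of models): the P-value is a sum of probabilities over a data-dependent set, so it jumps whenever the ordering of sample points changes.

```python
# Exact two-sided P-value for a binomial, scanned as a function of theta0.
from scipy.stats import binom

n, x_obs = 20, 6   # observed 6 successes out of 20

def p_value(theta0):
    # Total probability of outcomes at most as likely as the observed one.
    probs = [binom.pmf(x, n, theta0) for x in range(n + 1)]
    return sum(p for p in probs if p <= probs[x_obs] + 1e-12)

# The P-value is piecewise smooth with abrupt jumps where the 'as likely'
# set changes, which defeats naive root-finding for the confidence limits.
grid = [i / 1000 for i in range(1, 1000)]
pvals = [p_value(t) for t in grid]
jumps = [t for t, p1, p2 in zip(grid[1:], pvals, pvals[1:])
         if abs(p2 - p1) > 0.01]
print("approximate jump locations:", jumps[:10])
```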

5.
In the context of ACD models for ultra-high-frequency data, different specifications are available for estimating the conditional mean of intertrade durations, while quantile estimation has been completely neglected in the literature, even though for trading purposes it can be more informative. The main problem arising in quantile estimation is the correct specification of the durations' probability law: the usual assumption of exponentially distributed residuals is very robust for the estimation of the parameters of the conditional mean, but dramatically fails the distributional fit. In this paper a semiparametric approach is formalized and compared with the parametric one derived from the Exponential assumption. Empirical evidence for a stock on the Italian financial market strongly supports the former approach. Paola Zuccolotto: The author wishes to thank Prof. A. Mazzali, Dott. G. De Luca, Dott. M. Sandri for valuable comments.
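A sketch of the semiparametric idea under our own assumptions (fixed ACD(1,1) parameters and simulated durations, not the paper's data or estimates): fit the conditional mean, then scale an empirical residual quantile instead of the Exponential quantile −ln(1 − p).

```python
# Semiparametric conditional duration quantiles from an ACD(1,1) mean.
import numpy as np

rng = np.random.default_rng(0)
durations = rng.weibull(0.8, size=5000)   # stand-in for intertrade durations

omega, a, b = 0.1, 0.1, 0.8               # hypothetical ACD(1,1) parameters
psi = np.empty_like(durations)
psi[0] = durations.mean()
for t in range(1, len(durations)):
    psi[t] = omega + a * durations[t - 1] + b * psi[t - 1]

resid = durations / psi                    # standardized residuals e_t = x_t / psi_t
p = 0.95
q_semiparam = psi * np.quantile(resid, p)  # semiparametric conditional quantile
q_exponential = psi * (-np.log(1 - p))     # parametric (Exponential) quantile

# The semiparametric quantile should show ~5% exceedances even when the
# Exponential law is wrong for the residuals.
print("exceedance rates:",
      np.mean(durations > q_semiparam), np.mean(durations > q_exponential))
```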

6.
When simulating a dynamical system, the computation is actually of a spatially discretized system, because finite machine arithmetic replaces the continuum state space. For chaotic dynamical systems, the discretized simulations often have collapsing effects, to a fixed point or to short cycles. Statistical properties of these phenomena can be modelled with random mappings with an absorbing centre. The model gives results which are very much in line with computational experiments. The effects are discussed with special reference to the family of mappings f_γ(x) = 1 − |1 − 2x|^γ, x ∈ [0, 1], 1 < γ < ∞. Computer experiments show close agreement with predictions of the model.
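A quick demonstration under assumed parameters: discretize [0, 1] to a grid of N points (mimicking finite machine arithmetic, just coarse enough that the effect is immediate), iterate f_γ, and count how few distinct cycles all trajectories collapse onto.

```python
# Collapsing of discretized orbits of f_gamma(x) = 1 - |1 - 2x|**gamma.
import numpy as np

N, gamma = 2**16, 1.5
def f_disc(i):
    x = i / N
    return round(N * (1.0 - abs(1.0 - 2.0 * x) ** gamma))

def cycle_of(i):
    seen = set()
    while i not in seen:        # iterate until the orbit revisits a state
        seen.add(i)
        i = f_disc(i)
    cyc = [i]                   # reconstruct the cycle that was reached
    j = f_disc(i)
    while j != i:
        cyc.append(j)
        j = f_disc(j)
    return min(cyc)             # canonical label: smallest state on the cycle

rng = np.random.default_rng(2)
labels = {cycle_of(int(i)) for i in rng.integers(0, N + 1, size=200)}
print("200 trajectories collapsed onto", len(labels), "distinct cycles")
```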

7.
Chu, Hui-May and Kuo, Lynn. Statistics and Computing, 1997, 7(3): 183-192.
Bayesian methods for estimating dose-response curves with the one-hit model, the gamma multi-hit model, and their modified versions with Abbott's correction are studied. The Gibbs sampling approach with data augmentation and with the Metropolis algorithm is employed to compute the Bayes estimates of the potency curves. In addition, estimation of the relative additional risk and the virtually safe dose is studied. Model selection based on conditional predictive ordinates from cross-validated data is developed.
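For reference, the dose-response forms referred to, in standard textbook notation (our transcription, not the paper's):

```latex
% One-hit model and its modification by Abbott's correction for a
% background response rate c.
\begin{align*}
  P(d) &= 1 - e^{-\lambda d}                   &&\text{(one-hit model)}\\
  P^{*}(d) &= c + (1 - c)\bigl(1 - e^{-\lambda d}\bigr)
           &&\text{(with Abbott's correction)}
\end{align*}
```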

8.
Over the last few years many studies have been carried out in Italy to identify reliable small area labour force indicators. Considering the rotated sample design of the Italian Labour Force Survey, the aim of this work is to derive a small area estimator which borrows strength from individual temporal correlation, as well as from related areas. Two small area estimators are derived as extensions of an estimation strategy proposed by Fuller (1990) for partial overlap samples. A simulation study is carried out to evaluate the gain in efficiency provided by our solutions. Results obtained for different levels of autocorrelation between repeated measurements on the same outcome and different population settings show that these estimators are always more reliable than the traditional composite one, and in some circumstances they are extremely advantageous. The present paper is financially supported by Murst-Cofin (2001) "L'utilizzo di informazioni di tipo amministrativo nella stima per piccole aree e per sottoinsiemi della popolazione" (National Coordinator Prof. Carlo Filippucci).

9.
The generalized odds-rate class of regression models for time to event data is indexed by a non-negative constant ρ and assumes that g_ρ(S(t|Z)) = α(t) + β′Z, where g_ρ(s) = log(ρ⁻¹(s^(−ρ) − 1)) for ρ > 0, g₀(s) = log(−log s), S(t|Z) is the survival function of the time to event for an individual with q×1 covariate vector Z, β is a q×1 vector of unknown regression parameters, and α(t) is some arbitrary increasing function of t. When ρ = 0, this model is equivalent to the proportional hazards model, and when ρ = 1, this model reduces to the proportional odds model. In the presence of right censoring, we construct estimators for β and exp(α(t)) and show that they are consistent and asymptotically normal. In addition, we show that the estimator for β is semiparametric efficient in the sense that it attains the semiparametric variance bound.
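As a one-line sanity check (ours, not the paper's), the ρ → 0 limit of the link indeed recovers the proportional hazards link:

```latex
\[
  \lim_{\rho \to 0} \log\!\Bigl(\tfrac{s^{-\rho} - 1}{\rho}\Bigr)
  = \log\!\Bigl(\lim_{\rho \to 0} \tfrac{e^{-\rho \log s} - 1}{\rho}\Bigr)
  = \log(-\log s) = g_0(s),
\]
% while rho = 1 gives g_1(s) = log((1 - s)/s), the log-odds of failure,
% i.e. the proportional odds model.
```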

10.
A probabilistic expert system provides a graphical representation of a joint probability distribution which enables local computations of probabilities. Dawid (1992) provided a flow-propagation algorithm for finding the most probable configuration of the joint distribution in such a system. This paper analyses that algorithm in detail, and shows how it can be combined with a clever partitioning scheme to formulate an efficient method for finding the M most probable configurations. The algorithm is a divide-and-conquer technique that iteratively identifies the M most probable configurations.

11.
Computing location depth and regression depth in higher dimensions
The location depth (Tukey 1975) of a point θ relative to a p-dimensional data set Z of size n is defined as the smallest number of data points in a closed halfspace with boundary through θ. For bivariate data, it can be computed in O(n log n) time (Rousseeuw and Ruts 1996). In this paper we construct an exact algorithm to compute the location depth in three dimensions in O(n² log n) time. We also give an approximate algorithm to compute the location depth in p dimensions in O(mp³ + mpn) time, where m is the number of p-subsets used. Recently, Rousseeuw and Hubert (1996) defined the depth of a regression fit. The depth of a hyperplane with coefficients (θ₁, ..., θ_p) is the smallest number of residuals that need to change sign to make (θ₁, ..., θ_p) a nonfit. For bivariate data (p = 2) this depth can be computed in O(n log n) time as well. We construct an algorithm to compute the regression depth of a plane relative to a three-dimensional data set in O(n² log n) time, and another that deals with p = 4 in O(n³ log n) time. For data sets with large n and/or p we propose an approximate algorithm that computes the depth of a regression fit in O(mp³ + mpn + mn log n) time. For all of these algorithms, actual implementations are made available.
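A direction-based approximation in the same spirit, simplified to random unit directions rather than the paper's p-subsets: each direction gives an upper bound on the depth, and the minimum over many directions approximates it.

```python
# Approximate Tukey location depth via random projections (our sketch).
import numpy as np

def approx_location_depth(theta, Z, m=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    depth = n
    for _ in range(m):
        u = rng.standard_normal(p)
        u /= np.linalg.norm(u)
        proj = (Z - theta) @ u
        # Points weakly on either closed side of the hyperplane through theta.
        depth = min(depth, np.sum(proj >= 0), np.sum(proj <= 0))
    return int(depth)

Z = np.random.default_rng(1).standard_normal((500, 3))
print("depth of center:", approx_location_depth(np.zeros(3), Z))      # ~ n/2
print("depth of outlier:", approx_location_depth(np.full(3, 5.0), Z)) # ~ 0
```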

12.
A fast splitting procedure for classification trees
This paper provides a faster method to find the best split at each node when using the CART methodology. The predictability index τ is proposed as a splitting rule for growing the same classification tree as CART does when using the Gini index of heterogeneity as an impurity measure. A theorem is introduced to show a new property of the index τ: the τ for a given predictor has a value not lower than the τ for any split generated by the predictor. This property is used to make a substantial saving in the time required to generate a classification tree. Three simulation studies are presented in order to show the computational gain in terms of both the number of splits analysed at each node and the CPU time. The proposed splitting algorithm proves computationally efficient on real data sets, as shown in an example.
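For context, a plain CART-style exhaustive Gini split search, the baseline whose candidate-split scan the paper's τ-based bound prunes (toy data and implementation are our own):

```python
# Exhaustive best-split search for one numeric predictor under Gini impurity.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)

def best_split(x, y):
    """Scan all cut points; return (cut, impurity decrease)."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    parent, n = gini(y), len(y)
    best = (None, 0.0)
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue                       # no valid cut between equal values
        drop = parent - (i / n) * gini(y[:i]) - ((n - i) / n) * gini(y[i:])
        if drop > best[1]:
            best = ((x[i - 1] + x[i]) / 2, drop)
    return best

rng = np.random.default_rng(3)
x = rng.random(200)
y = (x > 0.6).astype(int) ^ (rng.random(200) < 0.1)  # noisy threshold rule
print("best cut, impurity decrease:", best_split(x, y))
```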

13.
When constructing uniform random numbers in [0, 1] from the output of a physical device, usually n independent and unbiased bits B_j are extracted and combined into the machine number Y = Σ_{j=1}^{n} B_j 2^(−j). In order to reduce the number of data used to build one real number, we observe that for independent and exponentially distributed random variables X_n (which arise for example as waiting times between two consecutive impulses of a Geiger counter) the variable U_n := X_{2n−1}/(X_{2n−1} + X_{2n}) is uniform in [0, 1]. In the practical application X_n can only be measured up to a given precision δ (in terms of the expectation of the X_n); it is shown that the distribution function obtained by calculating U_n from these measurements differs from the uniform by less than δ/2. We compare this deviation with the error resulting from the use of biased bits B_j with P{B_j = 1} = 1/2 + ε (where ε ∈ ]−1/2, 1/2[) in the construction of Y above. The influence of a bias is given by the estimate that in the p-total variation norm ‖Q‖_{TV_p} = (Σ_ω |Q(ω)|^p)^(1/p) (p ≥ 1) we have ‖P^Y − P_0^Y‖_{TV_p} ≤ (c_n · ε)^(1/p) with c_n → p for n → ∞. For the distribution functions ‖F^Y − F_0^Y‖ ≤ 2(1 − 2^(−n))|ε| holds.
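An empirical check of the core identity (our demo): if X₁, X₂ are i.i.d. exponential, then U = X₁/(X₁ + X₂) is uniform on [0, 1], so two waiting times yield one uniform number.

```python
# U = X1 / (X1 + X2) for i.i.d. exponentials is Uniform[0, 1].
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.exponential(size=100000)
x2 = rng.exponential(size=100000)
u = x1 / (x1 + x2)

# Kolmogorov-Smirnov-type distance to the uniform CDF, ~1/sqrt(N) if uniform.
u_sorted = np.sort(u)
ecdf = np.arange(1, len(u) + 1) / len(u)
print("max CDF deviation:", np.max(np.abs(ecdf - u_sorted)))
```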

14.
We discuss the statistical properties of return-based OLS style analysis introduced by Sharpe (1992). The aim of style analysis is to infer a fund manager's investment decisions using only publicly available data on the fund performance and on the time evolution of market indexes. We show that the model proposed by Sharpe suffers from serious drawbacks, most notably that it fails to yield correct results even in the simple case of a buy-and-hold strategy that only invests in the market indexes. Under this hypothesis we show that a model linear in index levels, as opposed to index returns, estimated via a Kalman filter avoids the drawbacks of Sharpe's model. We further extend our analysis to strategies where the fund manager's policy changes with time and the asset classes in which the fund manager invests are not known exactly. In this last case we show that a style analysis is possible only conditional on either an orthogonality hypothesis on the active investment strategy, or the introduction of suitable instrumental variables. The authors are grateful to the editor and an anonymous referee for many comments which greatly helped in improving the paper. The authors are, obviously, fully responsible for any remaining errors.
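A sketch of the Sharpe-style estimator itself, as a constrained least-squares problem (simulated data and weights; this illustrates the estimator, not the paper's Kalman-filter alternative):

```python
# Returns-based style analysis: nonnegative index weights summing to one
# that best track the fund's returns.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
T, k = 250, 3
index_returns = rng.normal(0.0003, 0.01, size=(T, k))
true_w = np.array([0.5, 0.3, 0.2])                 # hypothetical style mix
fund_returns = index_returns @ true_w + rng.normal(0, 0.001, T)

def tracking_error(w):
    return np.sum((fund_returns - index_returns @ w) ** 2)

res = minimize(tracking_error, np.full(k, 1 / k), method="SLSQP",
               bounds=[(0, 1)] * k,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}])
print("estimated style weights:", res.x.round(3))
```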

15.
Evolution strategies (ESs) are a special class of probabilistic, direct, global optimization methods. They are similar to genetic algorithms but work in continuous spaces and have the additional capability of self-adapting their major strategy parameters. This paper presents the most important features of ESs, namely their self-adaptation, as well as their robustness and potential for parallelization, which they share with other evolutionary algorithms. Besides the early (1 + 1)-ES and its underlying theoretical results, the modern (μ + λ)-ES and (μ, λ)-ES are presented with special emphasis on the self-adaptation of strategy parameters, a mechanism which enables the algorithm to evolve not only the object variables but also the characteristics of the probability distributions of normally distributed mutations. The self-adaptation property of the algorithm is also illustrated by an experimental example. The robustness of ESs is demonstrated for noisy fitness evaluations and by its application to discrete optimization problems, namely the travelling salesman problem (TSP). Finally, the paper concludes by summarizing existing work and general possibilities regarding the parallelization of evolution strategies and evolutionary algorithms in general.
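A minimal (μ, λ)-ES sketch with log-normal self-adaptation of a single step size, the standard textbook mechanism; parameter choices and the test function are ours:

```python
# (mu, lambda)-ES: each individual carries its own step size, which is
# mutated first and then used to mutate the object variables.
import numpy as np

def es_minimize(f, dim=10, mu=5, lam=35, generations=200, seed=6):
    rng = np.random.default_rng(seed)
    parents = [(rng.standard_normal(dim), 1.0) for _ in range(mu)]
    tau = 1.0 / np.sqrt(dim)                      # learning rate for sigma
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = parents[rng.integers(mu)]
            sigma_c = sigma * np.exp(tau * rng.standard_normal())
            x_c = x + sigma_c * rng.standard_normal(dim)
            offspring.append((x_c, sigma_c))
        # Comma selection: the next parents come from the offspring only.
        offspring.sort(key=lambda ind: f(ind[0]))
        parents = offspring[:mu]
    return parents[0]

sphere = lambda x: float(np.sum(x**2))
best_x, best_sigma = es_minimize(sphere)
print("best f:", sphere(best_x), "final step size:", best_sigma)
```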

16.
The paper presents non-standard methods in evolutionary computation and discusses their applicability to various optimization problems. These methods maintain populations of individuals with nonlinear chromosomal structure and use genetic operators enhanced by the problem specific knowledge.

17.
The posterior distribution of the likelihood is used to interpret the evidential meaning of P-values, posterior Bayes factors and Akaike's information criterion when comparing point null hypotheses with composite alternatives. Asymptotic arguments lead to simple re-calibrations of these criteria in terms of posterior tail probabilities of the likelihood ratio. (Prior) Bayes factors cannot be calibrated in this way as they are model-specific.

18.
We investigate the properties of several statistical tests for comparing treatment groups with respect to multivariate survival data, based on the marginal analysis approach introduced by Wei, Lin and Weissfeld [Regression analysis of multivariate incomplete failure time data by modelling marginal distributions, JASA, vol. 84, pp. 1065-1073]. We consider two types of directional tests, based on a constrained maximization and on linear combinations of the unconstrained maximizer of the working likelihood function, and the omnibus test arising from the same working likelihood. The directional tests are members of a larger class of tests, from which an asymptotically optimal test can be found. We compare the asymptotic powers of the tests under general contiguous alternatives for a variety of settings, and also consider the choice of the number of survival times to include in the multivariate outcome. We illustrate the results with simulations and with the results from a clinical trial examining recurring opportunistic infections in persons with HIV.

19.
Principal curves revisited
A principal curve (Hastie and Stuetzle, 1989) is a smooth curve passing through the middle of a distribution or data cloud, and is a generalization of linear principal components. We give an alternative definition of a principal curve, based on a mixture model. Estimation is carried out through an EM algorithm. Some comparisons are made to the Hastie-Stuetzle definition.

20.
Summary: Panel data offer a unique opportunity to identify data that interviewers clearly faked by comparing data waves. In the German Socio-Economic Panel (SOEP), only 0.5 percent of all records of raw data have been detected as faked. These fakes are used here to analyze the potential impact of fakes on survey results. Our central finding is that the faked records have no impact on the mean or the proportions. However, we show that there may be a serious bias in the estimation of correlations and regression coefficients. In all but one year (1998), the detected faked data were never disseminated within the widely used SOEP study; the fakes are removed prior to data release. * We are grateful to participants in the workshop on Item Nonresponse and Data Quality in Large Social Surveys for useful critique and comments, especially Rainer Schnell and our outstanding discussant Regina Riphahn. The usual disclaimer applies.
