Similar Articles
Found 20 similar articles (search time: 31 ms)
1.

In this article we evaluate the performance of a randomization test for a subset of regression coefficients in a linear model. The randomization test is based on random permutations of the independent variables. We show that the method maintains its level of significance except in extreme situations, and that its power approximates the power of another randomization test, which is based on permuting residuals from the reduced model. We also show, via an example, that permuting independent variables is more valuable than other randomization methods because it can be used in connection with the downweighting of outliers.

2.
3.
One important component of model selection using generalized linear models (GLMs) is the choice of a link function. We propose using approximate Bayes factors to assess the improvement in fit over a GLM with canonical link when a parametric link family is used. The approximate Bayes factors are calculated using the Laplace approximations given in [32], together with a reference set of prior distributions. This methodology can be used to differentiate between parametric link families, as well as to select the link family and the independent variables jointly. This involves comparing non-nested models, so standard significance tests cannot be used. The approach also accounts explicitly for uncertainty about the link function. The methods are illustrated using the parametric link families studied in [12] for two data sets involving binomial responses. The first author was supported by Sonderforschungsbereich 386 Statistische Analyse Diskreter Strukturen, and the second author by NIH Grant 1R01CA094212-01 and ONR Grant N00014-01-10745.

4.
The marginal posterior probability density function (pdf) for the mean of a stationary pth order Gaussian autoregressive process is derived using the conditional likelihood function. While the posterior pdf provides a small sample analysis, the pdf is not well known and must be analyzed numerically. This is relatively easy since it is a function of only one variable. Two sets of examples are presented. The first set involves synthetic data generated by computer, and the second set deals with energy expenditure data on a burn patient.

5.
6.
Summary.  A graph theoretical approach is employed to describe the support set of the nonparametric maximum likelihood estimator for the cumulative distribution function given interval-censored and left-truncated data. A necessary and sufficient condition for the existence of a nonparametric maximum likelihood estimator is then derived. Two previously analysed data sets are revisited.

7.
We introduce two types of graphical log‐linear models: label‐ and level‐invariant models for triangle‐free graphs. These models generalise symmetry concepts in graphical log‐linear models and provide a tool with which to model symmetry in the discrete case. A label‐invariant model is category‐invariant and is preserved after permuting some of the vertices according to transformations that maintain the graph, whereas a level‐invariant model equates expected frequencies according to a given set of permutations. These new models can both be seen as instances of a new type of graphical log‐linear model termed the restricted graphical log‐linear model, or RGLL, in which equality restrictions on subsets of main effects and first‐order interactions are imposed. Their likelihood equations and graphical representation can be obtained from those derived for the RGLL models.

8.
Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modeling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to model selection is straightforward in the sense that models are judged directly on their estimated out-of-sample predictive performance. The cross-validation approach, penalized likelihood, and McLachlan's bootstrap method are applied to two data sets, and the results from all three methods are in close agreement. The second data set involves a well-known clustering problem from the atmospheric science literature using historical records of upper atmosphere geopotential height in the Northern hemisphere. Cross-validated likelihood provides an interpretable and objective solution to the atmospheric clustering problem. The clusters found are in agreement with prior analyses of the same data based on non-probabilistic clustering techniques.
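A hedged sketch of the cross-validated likelihood idea: fit a k-component mixture on each training fold and score the held-out fold's log-likelihood, choosing the k that predicts best. The mixture "fit" below is a crude quantile-split stand-in for EM (my simplification, not the paper's estimator); only the cross-validation logic is the point.

```python
import math
import statistics

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def crude_mixture_fit(train, k):
    # Stand-in for EM: split the sorted training data into k equal blocks
    # and fit one normal component per block (illustration only).
    train = sorted(train)
    n = len(train)
    comps = []
    for j in range(k):
        block = train[j * n // k:(j + 1) * n // k]
        mu = statistics.fmean(block)
        sigma = max(statistics.pstdev(block), 1e-3)  # floor avoids degeneracy
        comps.append((len(block) / n, mu, sigma))
    return comps

def heldout_loglik(test, comps):
    total = 0.0
    for x in test:
        dens = sum(w * math.exp(normal_logpdf(x, mu, s)) for w, mu, s in comps)
        total += math.log(max(dens, 1e-300))
    return total

def cv_loglik(data, k, folds=5):
    # v-fold cross-validated log-likelihood for a k-component mixture:
    # the model-selection criterion, maximized over candidate k
    total = 0.0
    for f in range(folds):
        test = data[f::folds]
        train = [x for i, x in enumerate(data) if i % folds != f]
        total += heldout_loglik(test, crude_mixture_fit(train, k))
    return total
```

On clearly bimodal data, the two-component model attains a higher cross-validated log-likelihood than the one-component model.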

9.
Summary.  As a special case of statistical learning, ensemble methods are well suited for the analysis of opportunistically collected data that involve many weak and sometimes specialized predictors, especially when subject-matter knowledge favours inductive approaches. We analyse data on the incidental mortality of dolphins in the purse-seine fishery for tuna in the eastern Pacific Ocean. The goal is to identify those rare purse-seine sets for which incidental mortality would be expected but none was reported. The ensemble method random forests is used to classify sets according to whether mortality was (response 1) or was not (response 0) reported. To identify questionable reporting practice, we construct 'residuals' as the difference between the categorical response (0,1) and the proportion of trees in the forest that classify a given set as having mortality. Two uses of these residuals to identify suspicious data are illustrated. This approach shows promise as a means of identifying suspect data gathered for environmental monitoring.
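The residual construction in this abstract reduces to simple arithmetic once the forest's vote proportions are in hand; a minimal sketch (the threshold is an illustrative choice, not the paper's):

```python
def classification_residuals(records):
    # records: (reported, p_hat) pairs, where reported is the 0/1 field
    # response and p_hat is the fraction of trees voting "mortality"
    return [reported - p_hat for reported, p_hat in records]

def flag_suspect_sets(records, threshold=-0.5):
    # A strongly negative residual means no mortality was reported even
    # though most trees predict it -- the questionable-reporting signal.
    res = classification_residuals(records)
    return [i for i, r in enumerate(res) if r <= threshold]
```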

10.
Given i.i.d. observations x1, x2, ..., xn drawn from a mixture of normal terms, one is often interested in determining the number of terms in the mixture and their defining parameters. Although the problem of determining the number of terms is intractable under the most general assumptions, there is hope of elucidating the mixture structure given appropriate caveats on the underlying mixture. This paper examines a new approach to this problem based on Akaike information criterion (AIC)-based pruning of data-driven mixture models obtained from resampled data sets. Results of the application of this procedure to artificially generated data sets and a real-world data set are provided.
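The AIC-pruning criterion itself is a one-line computation; a minimal sketch for 1-D normal mixtures (the parameter count 3k-1 assumes k means, k standard deviations, and k-1 free weights; the log-likelihood values would come from the fitted models):

```python
def gaussian_mixture_aic(loglik, k):
    # A k-component 1-D Gaussian mixture has 3k - 1 free parameters.
    return 2 * (3 * k - 1) - 2 * loglik

def prune_by_aic(logliks):
    # logliks: {k: maximized log-likelihood}; keep the k minimizing AIC.
    return min(logliks, key=lambda k: gaussian_mixture_aic(logliks[k], k))
```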

11.
Exact k-sample permutation tests for binary data for three commonly encountered hypothesis tests are presented. The tests are derived under both the population and randomization models. The generating function for the number of cases in the null distribution is obtained, and the asymptotic distributions of the test statistics are derived. Actual significance levels are computed for the asymptotic test versions. Random sampling of the null distribution is suggested as a superior alternative to the asymptotics, and an efficient computer technique for implementing the random sampling is described. Finally, some numerical examples are presented and sample size guidelines are given for computer implementation of the exact tests.
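For the two-sample special case (k = 2), the exact permutation test of equal success probabilities reduces to a hypergeometric calculation after conditioning on the total number of successes; a minimal sketch (the three specific hypotheses of the paper are not reproduced here):

```python
from math import comb

def exact_binary_pvalue(s1, n1, s2, n2):
    # Exact conditional (permutation) test of equal success probabilities
    # in two groups of binary data.  Conditioning on the total number of
    # successes makes the group-1 success count hypergeometric under H0;
    # the two-sided p-value sums all outcomes no more probable than the
    # observed one.
    n, s = n1 + n2, s1 + s2

    def prob(k):
        return comb(s, k) * comb(n - s, n1 - k) / comb(n, n1)

    p_obs = prob(s1)
    lo, hi = max(0, s - n2), min(s, n1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs + 1e-12)
```

For larger problems, random sampling of the permutation distribution (as the abstract suggests) replaces the full enumeration.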

12.
This article provides some views on the statistical design and analysis of weather modification experiments. Perspectives were developed from experience with analyses of the Santa Barbara Phase I experiment, summarized in Section 2. Randomization analyses are reported and compared with previously published parametric analyses. The parametric significance levels of tests for a cloud seeding effect agree well with the significance levels of the corresponding new randomization tests. These results, along with similar results of others, suggest that parametric analyses may be used as approximations to randomization analyses in exploratory analyses or reanalyses of weather modification experimental data.

13.
Two equivalent methods (gene counting and maximum likelihood) for estimating gene frequencies in a general genetic marker system based on observed phenotype data are derived. Under the maximum likelihood approach, an expression is given for the estimated covariance matrix from which estimated standard errors of the estimators can be found. In addition, consideration is given to the problem of estimating gene frequencies when there are available several independent population data sets.
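Gene counting is an EM iteration; a minimal sketch for the classic ABO marker system (a standard textbook instance of the technique, not necessarily the paper's example): each E-step splits the A phenotype into AA and AO genotypes (and B into BB and BO) in proportion to their Hardy-Weinberg probabilities, and the M-step simply counts alleles.

```python
def gene_counting_abo(nA, nB, nAB, nO, iters=200):
    # EM ("gene counting") for ABO allele frequencies p (A), q (B), r (O)
    # from the four observed phenotype counts.
    n = nA + nB + nAB + nO
    p, q, r = 1 / 3, 1 / 3, 1 / 3
    for _ in range(iters):
        # E-step: expected genotype counts within the A and B phenotypes
        nAA = nA * p * p / (p * p + 2 * p * r)
        nAO = nA - nAA
        nBB = nB * q * q / (q * q + 2 * q * r)
        nBO = nB - nBB
        # M-step: allele counting over 2n alleles
        p = (2 * nAA + nAO + nAB) / (2 * n)
        q = (2 * nBB + nBO + nAB) / (2 * n)
        r = 1 - p - q
    return p, q, r
```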

14.
Abstract.  We are interested in estimating level sets using a Bayesian non-parametric approach, from an independent and identically distributed sample drawn from an unknown distribution. Under fairly general conditions on the prior, we provide an upper bound on the rate of convergence of the Bayesian level set estimate, via the rate at which the posterior distribution concentrates around the true level set. We then consider, as an application, the log-spline prior in the two-dimensional unit cube. Assuming that the true distribution belongs to a Hölder class, we provide an upper bound on the rate of convergence of the Bayesian level set estimates. We compare our results with existing rates of convergence in the frequentist non-parametric literature: the Bayesian level set estimator proves to be competitive and is also easy to compute, which is of no small importance. A simulation study is given as an illustration.

15.
16.
An adaptive test is proposed for the one-way layout. This test procedure uses the order statistics of the combined data to obtain estimates of percentiles, which are used to select an appropriate set of rank scores for the one-way test statistic. This test is designed to have reasonably high power over a range of distributions. The adaptive procedure proposed for a one-way layout is a generalization of an existing two-sample adaptive test procedure. In this Monte Carlo study, the power and significance level of the F-test, the Kruskal-Wallis test, the normal scores test, and the adaptive test were evaluated for the one-way layout. All tests maintained their significance level for data sets having at least 24 observations. The simulation results show that the adaptive test is more powerful than the other tests for skewed distributions if the total number of observations equals or exceeds 24. For data sets having at least 60 observations the adaptive test is also more powerful than the F-test for some symmetric distributions.
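The selection step of such adaptive procedures can be sketched as a Hogg-style selector: summary statistics of the pooled order statistics gauge the shape of the distribution, and a score family is picked accordingly. The cutoff and labels below are illustrative assumptions, not the paper's rules.

```python
import statistics

def select_scores(combined):
    # Compare the upper and lower quarters of the pooled order statistics
    # to gauge skewness, then pick a score family (illustrative cutoff).
    xs = sorted(combined)
    n = len(xs)
    quarter = max(n // 4, 1)
    lower = statistics.fmean(xs[:quarter])
    upper = statistics.fmean(xs[-quarter:])
    med = statistics.median(xs)
    skew = (upper - med) / max(med - lower, 1e-12)
    return "skewed scores" if skew > 2.0 else "symmetric scores"
```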

17.
Influence diagnostic methods are extended in this article to the Grubbs model when the unknown quantity x (a latent variable) follows a skew-normal distribution. Diagnostic measures are derived from the case-deletion approach and the local influence approach under several perturbation schemes. The observed information matrix for the postulated model and the Delta matrices for the corresponding perturbed models are derived. Results obtained for one real data set are reported, illustrating the usefulness of the proposed methodology.
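The case-deletion idea is easiest to see in a much simpler setting than the Grubbs model: refit the model without each case in turn and measure how much the estimates move. A minimal sketch for a least-squares line (my simplified stand-in, not the paper's measurement-error model):

```python
def ols_fit(pts):
    # Least-squares line through (x, y) points: returns (intercept, slope).
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def deletion_influence(pts):
    # Case-deletion diagnostic: refit without each case and record the
    # shift in (intercept, slope); large shifts flag influential cases.
    b0, b1 = ols_fit(pts)
    out = []
    for i in range(len(pts)):
        c0, c1 = ols_fit(pts[:i] + pts[i + 1:])
        out.append(abs(b0 - c0) + abs(b1 - c1))
    return out
```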

18.
In nonparametric statistics, a hypothesis testing problem based on the ranks of the data gives rise to two separate permutation sets corresponding to the null and to the alternative hypothesis, respectively. A modification of Critchlow's unified approach to hypothesis testing is proposed. By defining the distance between permutation sets to be the average distance between pairs of permutations, one from each set, various test statistics are derived for the multi-sample location problem and the two-way layout. The asymptotic distributions of the test statistics are computed under both the null and alternative hypotheses. Some comparisons are made on the basis of the asymptotic relative efficiency.
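The set-to-set distance defined in this abstract is easy to make concrete. A minimal sketch using the Spearman footrule as the between-permutation distance (one of several metrics the framework admits):

```python
def footrule(a, b):
    # Spearman footrule distance between two permutations of 0..n-1.
    return sum(abs(x - y) for x, y in zip(a, b))

def set_distance(set_a, set_b):
    # Distance between permutation sets: the average distance over all
    # pairs with one permutation drawn from each set.
    total = sum(footrule(a, b) for a in set_a for b in set_b)
    return total / (len(set_a) * len(set_b))
```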

19.
Convex sets of probability distributions are also called credal sets. They generalize probability theory by relaxing the requirement that probability values be precise. Classification, i.e. assigning class labels to instances described by a set of attributes, is an important domain of application of Bayesian methods, where the naive Bayes classifier performs surprisingly well. This paper proposes a new method of classification which extends the naive Bayes classifier to credal sets. Exact and effective solution procedures for naive credal classification are derived, and the related dominance criteria are discussed. Credal classification emerges as a new method, based on more realistic assumptions, that points toward more reliable inferences.
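A hedged sketch of the credal flavor of such a classifier, using the imprecise Dirichlet model for the class prior and interval dominance as the decision rule (a simplification of full credal dominance, and only the class-marginal part of a naive credal classifier):

```python
def idm_intervals(counts, s=2.0):
    # Imprecise Dirichlet model: class-probability intervals
    # [n_c / (N + s), (n_c + s) / (N + s)] for hyperparameter s.
    n = sum(counts.values())
    return {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}

def nondominated_classes(counts, s=2.0):
    # Interval dominance: drop any class whose upper probability falls
    # below another class's lower probability; what remains is the
    # credal prediction set (possibly more than one class).
    iv = idm_intervals(counts, s)
    return sorted(c for c in iv
                  if not any(iv[d][0] > iv[c][1] for d in iv if d != c))
```

When the data are scarce, the intervals widen and the classifier may return several classes rather than committing to one, which is the intended cautious behavior.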

20.
A new randomization approach to constructing goodness-of-fit tests is proposed. Some new test statistics are derived, based on the stochastic empirical distribution function (EDF); note that the stochastic EDF for a given set of sample observations is a randomized distribution function. By substituting the stochastic EDF for the classical EDF in the Kolmogorov–Smirnov, Cramér–von Mises, Anderson–Darling, Berk–Jones, and Einmahl–McKeague statistics, randomized statistics are derived, of which the qth quantile and the expectation are chosen as test statistics. A simulation study shows that the new test statistics are generally more powerful than the corresponding ones based on the classical or modified EDF.
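A heavily hedged sketch of the construction: the randomization below replaces the EDF value at the i-th order statistic by a uniform draw on ((i-1)/n, i/n), which is one plausible reading of a "stochastic EDF" (the paper's exact definition may differ), and then summarizes the randomized KS-type statistic by its mean and an upper quantile over replications.

```python
import random

def stochastic_edf_ks(data, cdf, n_rep=200, seed=0):
    # Randomized KS statistic: draw the EDF value at the i-th order
    # statistic uniformly on ((i-1)/n, i/n), take sup |F*_n - F_0| over
    # the sample points, and repeat.  Returns the mean and the 0.9
    # quantile of the randomized statistic over n_rep replications.
    xs = sorted(data)
    n = len(xs)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_rep):
        d = max(abs(rng.uniform((i - 1) / n, i / n) - cdf(x))
                for i, x in enumerate(xs, start=1))
        stats.append(d)
    stats.sort()
    return sum(stats) / n_rep, stats[int(0.9 * n_rep)]
```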


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号