Similar Articles
Found 20 similar articles (search time: 31 ms)
1.

In this article we evaluate the performance of a randomization test for a subset of regression coefficients in a linear model. The randomization test is based on random permutations of the independent variables. We show that the method maintains its level of significance except in extreme situations, and that its power approximates the power of another randomization test, which is based on permuting residuals from the reduced model. We also show, via an example, that permuting independent variables is more valuable than other randomization methods because it can be used in connection with the downweighting of outliers.

2.
3.
One important component of model selection using generalized linear models (GLMs) is the choice of a link function. We propose using approximate Bayes factors to assess the improvement in fit over a GLM with canonical link when a parametric link family is used. The approximate Bayes factors are calculated using the Laplace approximations given in [32], together with a reference set of prior distributions. This methodology can be used to differentiate between parametric link families, as well as to select the link family and the independent variables jointly. This involves comparing non-nested models, so standard significance tests cannot be used. The approach also accounts explicitly for uncertainty about the link function. The methods are illustrated using the parametric link families studied in [12] for two data sets involving binomial responses. The first author was supported by Sonderforschungsbereich 386 Statistische Analyse Diskreter Strukturen, and the second author by NIH Grant 1R01CA094212-01 and ONR Grant N00014-01-10745.

4.
The marginal posterior probability density function (pdf) for the mean of a stationary pth order Gaussian autoregressive process is derived using the conditional likelihood function. While the posterior pdf provides a small sample analysis, the pdf is not well known and must be analyzed numerically. This is relatively easy since it is a function of only one variable. Two sets of examples are presented. The first set involves synthetic data generated by computer, and the second set deals with energy expenditure data on a burn patient.

5.
6.
Summary.  A graph theoretical approach is employed to describe the support set of the nonparametric maximum likelihood estimator for the cumulative distribution function given interval-censored and left-truncated data. A necessary and sufficient condition for the existence of a nonparametric maximum likelihood estimator is then derived. Two previously analysed data sets are revisited.

7.
We introduce two types of graphical log‐linear models: label‐ and level‐invariant models for triangle‐free graphs. These models generalise symmetry concepts in graphical log‐linear models and provide a tool with which to model symmetry in the discrete case. A label‐invariant model is category‐invariant and is preserved after permuting some of the vertices according to transformations that maintain the graph, whereas a level‐invariant model equates expected frequencies according to a given set of permutations. These new models can both be seen as instances of a new type of graphical log‐linear model termed the restricted graphical log‐linear model, or RGLL, in which equality restrictions on subsets of main effects and first‐order interactions are imposed. Their likelihood equations and graphical representation can be obtained from those derived for the RGLL models.

8.
Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modeling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to model selection is straightforward in the sense that models are judged directly on their estimated out-of-sample predictive performance. The cross-validation approach, penalized likelihood, and McLachlan's bootstrap method are applied to two data sets, and the results from all three methods are in close agreement. The second data set involves a well-known clustering problem from the atmospheric science literature using historical records of upper atmosphere geopotential height in the Northern hemisphere. Cross-validated likelihood provides an interpretable and objective solution to the atmospheric clustering problem. The clusters found are in agreement with prior analyses of the same data based on non-probabilistic clustering techniques.
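A hedged sketch of the cross-validated likelihood idea: fit a k-component mixture on each training fold and score the held-out fold's log-likelihood, choosing the k that predicts best. The mixture "fit" below is a crude quantile-split stand-in for EM (my simplification, not the paper's estimator); only the cross-validation logic is the point.

```python
import math
import statistics

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def crude_mixture_fit(train, k):
    # Stand-in for EM: split the sorted training data into k equal blocks
    # and fit one normal component per block (illustration only).
    train = sorted(train)
    n = len(train)
    comps = []
    for j in range(k):
        block = train[j * n // k:(j + 1) * n // k]
        mu = statistics.fmean(block)
        sigma = max(statistics.pstdev(block), 1e-3)  # floor avoids degeneracy
        comps.append((len(block) / n, mu, sigma))
    return comps

def heldout_loglik(test, comps):
    total = 0.0
    for x in test:
        dens = sum(w * math.exp(normal_logpdf(x, mu, s)) for w, mu, s in comps)
        total += math.log(max(dens, 1e-300))
    return total

def cv_loglik(data, k, folds=5):
    # v-fold cross-validated log-likelihood for a k-component mixture:
    # the model-selection criterion, maximized over candidate k
    total = 0.0
    for f in range(folds):
        test = data[f::folds]
        train = [x for i, x in enumerate(data) if i % folds != f]
        total += heldout_loglik(test, crude_mixture_fit(train, k))
    return total
```

On clearly bimodal data, the two-component model attains a higher cross-validated log-likelihood than the one-component model.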

9.
Summary.  As a special case of statistical learning, ensemble methods are well suited for the analysis of opportunistically collected data that involve many weak and sometimes specialized predictors, especially when subject-matter knowledge favours inductive approaches. We analyse data on the incidental mortality of dolphins in the purse-seine fishery for tuna in the eastern Pacific Ocean. The goal is to identify those rare purse-seine sets for which incidental mortality would be expected but none was reported. The ensemble method random forests is used to classify sets according to whether mortality was (response 1) or was not (response 0) reported. To identify questionable reporting practice, we construct 'residuals' as the difference between the categorical response (0,1) and the proportion of trees in the forest that classify a given set as having mortality. Two uses of these residuals to identify suspicious data are illustrated. This approach shows promise as a means of identifying suspect data gathered for environmental monitoring.
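The residual construction in this abstract reduces to simple arithmetic once the forest's vote proportions are in hand; a minimal sketch (the threshold is an illustrative choice, not the paper's):

```python
def classification_residuals(records):
    # records: (reported, p_hat) pairs, where reported is the 0/1 field
    # response and p_hat is the fraction of trees voting "mortality"
    return [reported - p_hat for reported, p_hat in records]

def flag_suspect_sets(records, threshold=-0.5):
    # A strongly negative residual means no mortality was reported even
    # though most trees predict it -- the questionable-reporting signal.
    res = classification_residuals(records)
    return [i for i, r in enumerate(res) if r <= threshold]
```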

10.
Given i.i.d. observations x1, x2, ..., xn drawn from a mixture of normal terms, one is often interested in determining the number of terms in the mixture and their defining parameters. Although the problem of determining the number of terms is intractable under the most general assumptions, there is hope of elucidating the mixture structure given appropriate caveats on the underlying mixture. This paper examines a new approach to this problem based on Akaike information criterion (AIC)-based pruning of data-driven mixture models obtained from resampled data sets. Results of the application of this procedure to artificially generated data sets and a real-world data set are provided.
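The AIC-pruning criterion itself is a one-line computation; a minimal sketch for 1-D normal mixtures (the parameter count 3k-1 assumes k means, k standard deviations, and k-1 free weights; the log-likelihood values would come from the fitted models):

```python
def gaussian_mixture_aic(loglik, k):
    # A k-component 1-D Gaussian mixture has 3k - 1 free parameters.
    return 2 * (3 * k - 1) - 2 * loglik

def prune_by_aic(logliks):
    # logliks: {k: maximized log-likelihood}; keep the k minimizing AIC.
    return min(logliks, key=lambda k: gaussian_mixture_aic(logliks[k], k))
```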

11.
Exact k-sample permutation tests for binary data for three commonly encountered hypothesis tests are presented. The tests are derived under both the population and randomization models. The generating function for the number of cases in the null distribution is obtained, and the asymptotic distributions of the test statistics are derived. Actual significance levels are computed for the asymptotic test versions. Random sampling of the null distribution is suggested as a superior alternative to the asymptotics, and an efficient computer technique for implementing the random sampling is described. Finally, some numerical examples are presented and sample size guidelines are given for computer implementation of the exact tests.
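For the two-sample special case (k = 2), the exact permutation test of equal success probabilities reduces to a hypergeometric calculation after conditioning on the total number of successes; a minimal sketch (the three specific hypotheses of the paper are not reproduced here):

```python
from math import comb

def exact_binary_pvalue(s1, n1, s2, n2):
    # Exact conditional (permutation) test of equal success probabilities
    # in two groups of binary data.  Conditioning on the total number of
    # successes makes the group-1 success count hypergeometric under H0;
    # the two-sided p-value sums all outcomes no more probable than the
    # observed one.
    n, s = n1 + n2, s1 + s2

    def prob(k):
        return comb(s, k) * comb(n - s, n1 - k) / comb(n, n1)

    p_obs = prob(s1)
    lo, hi = max(0, s - n2), min(s, n1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs + 1e-12)
```

For larger problems, random sampling of the permutation distribution (as the abstract suggests) replaces the full enumeration.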

12.
This article provides some views on the statistical design and analysis of weather modification experiments. Perspectives were developed from experience with analyses of the Santa Barbara Phase I experiment, summarized in Section 2. Randomization analyses are reported and compared with previously published parametric analyses. The parametric significance levels of tests for a cloud seeding effect agree well with the significance levels of the corresponding new randomization tests. These results, along with similar results of others, suggest that parametric analyses may be used as approximations to randomization analyses in exploratory analyses or reanalyses of weather modification experimental data.

13.
Two equivalent methods (gene counting and maximum likelihood) for estimating gene frequencies in a general genetic marker system based on observed phenotype data are derived. Under the maximum likelihood approach, an expression is given for the estimated covariance matrix from which estimated standard errors of the estimators can be found. In addition, consideration is given to the problem of estimating gene frequencies when there are available several independent population data sets.
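Gene counting is an EM iteration; a minimal sketch for the classic ABO marker system (a standard textbook instance of the technique, not necessarily the paper's example): each E-step splits the A phenotype into AA and AO genotypes (and B into BB and BO) in proportion to their Hardy-Weinberg probabilities, and the M-step simply counts alleles.

```python
def gene_counting_abo(nA, nB, nAB, nO, iters=200):
    # EM ("gene counting") for ABO allele frequencies p (A), q (B), r (O)
    # from the four observed phenotype counts.
    n = nA + nB + nAB + nO
    p, q, r = 1 / 3, 1 / 3, 1 / 3
    for _ in range(iters):
        # E-step: expected genotype counts within the A and B phenotypes
        nAA = nA * p * p / (p * p + 2 * p * r)
        nAO = nA - nAA
        nBB = nB * q * q / (q * q + 2 * q * r)
        nBO = nB - nBB
        # M-step: allele counting over 2n alleles
        p = (2 * nAA + nAO + nAB) / (2 * n)
        q = (2 * nBB + nBO + nAB) / (2 * n)
        r = 1 - p - q
    return p, q, r
```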

14.
Abstract.  We are interested in estimating level sets using a Bayesian non-parametric approach, from an independent and identically distributed sample drawn from an unknown distribution. Under fairly general conditions on the prior, we provide an upper bound on the rate of convergence of the Bayesian level set estimate, via the rate at which the posterior distribution concentrates around the true level set. We then consider, as an application, the log-spline prior in the two-dimensional unit cube. Assuming that the true distribution belongs to a Hölder class, we provide an upper bound on the rate of convergence of the Bayesian level set estimates. We compare our results with existing rates of convergence in the frequentist non-parametric literature: the Bayesian level set estimator proves to be competitive and is also easy to compute, which is of no small importance. A simulation study is given as an illustration.

15.
16.
An adaptive test is proposed for the one-way layout. This test procedure uses the order statistics of the combined data to obtain estimates of percentiles, which are used to select an appropriate set of rank scores for the one-way test statistic. This test is designed to have reasonably high power over a range of distributions. The adaptive procedure proposed for a one-way layout is a generalization of an existing two-sample adaptive test procedure. In this Monte Carlo study, the power and significance level of the F-test, the Kruskal-Wallis test, the normal scores test, and the adaptive test were evaluated for the one-way layout. All tests maintained their significance level for data sets having at least 24 observations. The simulation results show that the adaptive test is more powerful than the other tests for skewed distributions if the total number of observations equals or exceeds 24. For data sets having at least 60 observations the adaptive test is also more powerful than the F-test for some symmetric distributions.
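The selection step of such adaptive procedures can be sketched as a Hogg-style selector: summary statistics of the pooled order statistics gauge the shape of the distribution, and a score family is picked accordingly. The cutoff and labels below are illustrative assumptions, not the paper's rules.

```python
import statistics

def select_scores(combined):
    # Compare the upper and lower quarters of the pooled order statistics
    # to gauge skewness, then pick a score family (illustrative cutoff).
    xs = sorted(combined)
    n = len(xs)
    quarter = max(n // 4, 1)
    lower = statistics.fmean(xs[:quarter])
    upper = statistics.fmean(xs[-quarter:])
    med = statistics.median(xs)
    skew = (upper - med) / max(med - lower, 1e-12)
    return "skewed scores" if skew > 2.0 else "symmetric scores"
```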

17.
Influence diagnostic methods are extended in this article to the Grubbs model when the unknown quantity x (a latent variable) follows a skew-normal distribution. Diagnostic measures are derived from the case-deletion approach and the local influence approach under several perturbation schemes. The observed information matrix for the postulated model and the Delta matrices for the corresponding perturbed models are derived. Results obtained for one real data set are reported, illustrating the usefulness of the proposed methodology.
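The case-deletion idea is easiest to see in a much simpler setting than the Grubbs model: refit the model without each case in turn and measure how much the estimates move. A minimal sketch for a least-squares line (my simplified stand-in, not the paper's measurement-error model):

```python
def ols_fit(pts):
    # Least-squares line through (x, y) points: returns (intercept, slope).
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def deletion_influence(pts):
    # Case-deletion diagnostic: refit without each case and record the
    # shift in (intercept, slope); large shifts flag influential cases.
    b0, b1 = ols_fit(pts)
    out = []
    for i in range(len(pts)):
        c0, c1 = ols_fit(pts[:i] + pts[i + 1:])
        out.append(abs(b0 - c0) + abs(b1 - c1))
    return out
```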

18.
In nonparametric statistics, a hypothesis testing problem based on the ranks of the data gives rise to two separate permutation sets corresponding to the null and to the alternative hypothesis, respectively. A modification of Critchlow's unified approach to hypothesis testing is proposed. By defining the distance between permutation sets to be the average distance between pairs of permutations, one from each set, various test statistics are derived for the multi-sample location problem and the two-way layout. The asymptotic distributions of the test statistics are computed under both the null and alternative hypotheses. Some comparisons are made on the basis of the asymptotic relative efficiency.
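The set-to-set distance defined in this abstract is easy to make concrete. A minimal sketch using the Spearman footrule as the between-permutation distance (one of several metrics the framework admits):

```python
def footrule(a, b):
    # Spearman footrule distance between two permutations of 0..n-1.
    return sum(abs(x - y) for x, y in zip(a, b))

def set_distance(set_a, set_b):
    # Distance between permutation sets: the average distance over all
    # pairs with one permutation drawn from each set.
    total = sum(footrule(a, b) for a in set_a for b in set_b)
    return total / (len(set_a) * len(set_b))
```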

19.
Convex sets of probability distributions are also called credal sets. They generalize probability theory by relaxing the requirement that probability values be precise. Classification, i.e. assigning class labels to instances described by a set of attributes, is an important domain of application of Bayesian methods, where the naive Bayes classifier performs surprisingly well. This paper proposes a new method of classification which extends the naive Bayes classifier to credal sets. Exact and effective solution procedures for naive credal classification are derived, and the related dominance criteria are discussed. Credal classification emerges as a new method, based on more realistic assumptions, that points toward more reliable inferences.
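A hedged sketch of the credal flavor of such a classifier, using the imprecise Dirichlet model for the class prior and interval dominance as the decision rule (a simplification of full credal dominance, and only the class-marginal part of a naive credal classifier):

```python
def idm_intervals(counts, s=2.0):
    # Imprecise Dirichlet model: class-probability intervals
    # [n_c / (N + s), (n_c + s) / (N + s)] for hyperparameter s.
    n = sum(counts.values())
    return {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}

def nondominated_classes(counts, s=2.0):
    # Interval dominance: drop any class whose upper probability falls
    # below another class's lower probability; what remains is the
    # credal prediction set (possibly more than one class).
    iv = idm_intervals(counts, s)
    return sorted(c for c in iv
                  if not any(iv[d][0] > iv[c][1] for d in iv if d != c))
```

When the data are scarce, the intervals widen and the classifier may return several classes rather than committing to one, which is the intended cautious behavior.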

20.
A new randomization approach to constructing goodness-of-fit tests is proposed. Some new test statistics are derived, based on the stochastic empirical distribution function (EDF); note that the stochastic EDF for a given set of sample observations is a randomized distribution function. By substituting the stochastic EDF for the classical EDF in the Kolmogorov–Smirnov, Cramér–von Mises, Anderson–Darling, Berk–Jones, and Einmahl–McKeague statistics, randomized statistics are derived, of which the qth quantile and the expectation are chosen as test statistics. A simulation study shows that the new test statistics are generally more powerful than the corresponding ones based on the classical or modified EDF.
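A heavily hedged sketch of the construction: the randomization below replaces the EDF value at the i-th order statistic by a uniform draw on ((i-1)/n, i/n), which is one plausible reading of a "stochastic EDF" (the paper's exact definition may differ), and then summarizes the randomized KS-type statistic by its mean and an upper quantile over replications.

```python
import random

def stochastic_edf_ks(data, cdf, n_rep=200, seed=0):
    # Randomized KS statistic: draw the EDF value at the i-th order
    # statistic uniformly on ((i-1)/n, i/n), take sup |F*_n - F_0| over
    # the sample points, and repeat.  Returns the mean and the 0.9
    # quantile of the randomized statistic over n_rep replications.
    xs = sorted(data)
    n = len(xs)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_rep):
        d = max(abs(rng.uniform((i - 1) / n, i / n) - cdf(x))
                for i, x in enumerate(xs, start=1))
        stats.append(d)
    stats.sort()
    return sum(stats) / n_rep, stats[int(0.9 * n_rep)]
```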


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号