Similar Articles
 20 similar articles found.
1.
In this second part of the paper, reproducibility of discrete ordinal and nominal outcomes is addressed. The first part deals with continuous outcomes, concentrating on intraclass correlation (ρ) in the context of one‐way analysis of variance. For categorical data, the focus has generally not been on a meaningful population parameter such as ρ. However, intraclass correlation has been defined for discrete ordinal data, ρc, and for nominal data, κI. Therefore, a unified approach to reproducibility is proposed. The relevance of these parameters is outlined. Estimation and inferential procedures for ρc and κI are reviewed, together with worked examples. Topics related to reproducibility that are not addressed in either this or the previous paper are highlighted. Considerations for designing reproducibility studies and for interpreting their results are provided. Copyright © 2004 John Wiley & Sons, Ltd.
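For readers who want a concrete handle on the continuous-outcome case that this second part builds on, here is a minimal sketch (not the authors' code) of the one-way ANOVA intraclass correlation estimator; the layout (subjects in rows, replicate measurements in columns) and the example data are assumptions for illustration.

```python
import numpy as np

def icc_oneway(y):
    """Moment estimator of the one-way ANOVA intraclass correlation.

    y : 2-D array, shape (n_subjects, k_replicates).
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    subj_means = y.mean(axis=1)
    msb = k * np.sum((subj_means - grand) ** 2) / (n - 1)         # between-subject mean square
    msw = np.sum((y - subj_means[:, None]) ** 2) / (n * (k - 1))  # within-subject mean square
    return (msb - msw) / (msb + (k - 1) * msw)

# Example: 5 subjects, each measured 3 times
rng = np.random.default_rng(0)
subject_effects = rng.normal(0.0, 2.0, size=(5, 1))
data = subject_effects + rng.normal(0.0, 1.0, size=(5, 3))
print(icc_oneway(data))
```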

2.
ABSTRACT

We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to “ban” p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly.

3.
4.
Abstract

The EPA-RTP Library is a federal research library established on the campus of the United States Environmental Protection Agency in Research Triangle Park, North Carolina. Historically, one of the ways the library has supported the information needs of researchers on campus is through the provision of basic bibliometric-related reference support. As these requests evolved in complexity due to the nature of organizational and scientific research impact demands, library staff recognized an opportunity to expand the library’s offerings to form two distinct services with product deliverables. The resulting services analyze and package metrics at the author, article, and journal levels with graphical data visualizations to provide researchers with a portfolio-style product they can use to demonstrate their individual scholarly output and research value. Development of Research Impact Reports and Article Impact Reports is explained in detail along with reflections on implementation processes and prospects for future expansion in bibliometric analysis services.

5.
ABSTRACT

The current concerns about reproducibility have focused attention on proper use of statistics across the sciences. This gives statisticians an extraordinary opportunity to change what are widely regarded as statistical practices detrimental to the cause of good science. However, how that should be done is enormously complex, made more difficult by the balkanization of research methods and statistical traditions across scientific subdisciplines. Working within those sciences while also allying with science reform movements—operating simultaneously on the micro and macro levels—is the key to making lasting change in applied science.

6.
The anonymous mixing of Fisherian (p-values) and Neyman–Pearsonian (α levels) ideas about testing, distilled in the customary but misleading p < α criterion of statistical significance, has led researchers in the social and management sciences (and elsewhere) to commonly misinterpret the p-value as a ‘data-adjusted’ Type I error rate. Evidence substantiating this claim is provided from a number of fronts, including comments by statisticians, articles judging the value of significance testing, textbooks, surveys of scholars, and the statistical reporting behaviours of applied researchers. That many investigators do not know the difference between p’s and α’s indicates much bewilderment over what those most ardently sought research outcomes—statistically significant results—mean. Statisticians can play a leading role in clearing this confusion. A good starting point would be to abolish the p < α criterion of statistical significance.

7.
The Fisher exact test has been unjustly dismissed by some as ‘only conditional,’ whereas it is, unconditionally, the uniformly most powerful test among all unbiased tests of size α (tests whose power never falls below their nominal significance level α). The problem with this truly optimal test is that it requires randomization at the critical value(s) to be of size α. Obviously, in practice, one does not want to conclude that ‘with probability x we have a statistically significant result.’ Usually, the hypothesis is rejected only if the test statistic's outcome is more extreme than the critical value, reducing the actual size considerably.

The randomized unconditional Fisher exact test is constructed (using Neyman-structure arguments) by deriving a conditional randomized test that randomizes at critical values c(t) with probabilities γ(t), both of which depend on the total number of successes T (the complete sufficient statistic for the nuisance parameter, the common success probability) that is conditioned upon.

In this paper, the Fisher exact test is approximated by deriving nonrandomized conditional tests whose critical region includes the critical value only if γ(t) > γ0, for a fixed threshold value γ0, such that the size of the unconditional modified test is, for all values of the nuisance parameter (the common success probability), smaller than but as close as possible to α. It will be seen that this greatly improves the size of the test as compared with the conservative nonrandomized Fisher exact test.

Size, power, and p-value comparisons with the (virtual) randomized Fisher exact test, the conservative nonrandomized Fisher exact test, Pearson's chi-square test, the more competitive mid-p value, McDonald's modification, and Boschloo's modifications are performed under the assumption of two binomial samples.
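As a numerical companion to the claim that the nonrandomized Fisher exact test is conservative, the sketch below (my illustration, not the paper's modified test) computes the test's actual unconditional size, i.e. its rejection probability maximized over the common success probability, for two small binomial samples; the result typically falls well below the nominal α.

```python
import numpy as np
from scipy.stats import fisher_exact, binom

def unconditional_size(n1, n2, alpha=0.05):
    """Actual size of the nonrandomized (two-sided) Fisher exact test,
    maximized over a grid of values for the common success probability."""
    # Rejection region: all 2x2 tables with Fisher p-value <= alpha
    reject = [(x1, x2)
              for x1 in range(n1 + 1)
              for x2 in range(n2 + 1)
              if fisher_exact([[x1, n1 - x1], [x2, n2 - x2]])[1] <= alpha]
    grid = np.linspace(0.01, 0.99, 99)
    sizes = [sum(binom.pmf(x1, n1, p) * binom.pmf(x2, n2, p) for x1, x2 in reject)
             for p in grid]
    return max(sizes)

print(unconditional_size(10, 10))   # typically well below the nominal 0.05
```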

8.
Despite nearly 350 years of practice, we still do a fairly poor job of measuring the quality and impact of scholarly papers and the work of individual researchers. Much of our system for determining funding and career advancement revolves around one metric: the impact factor. Technology has progressed to a point where we can do better. We can now track the life of each individual paper after it has been published and better understand how it is read and how it is used. This creates a tremendous opportunity for new alternative metrics (“altmetrics”) to give us a much better sense of a paper's true impact. At the same time, we are now faced with an overwhelming quantity of information and seemingly unlimited ways to analyze a researcher's work. How do we separate out the digital needles from the haystack—the signal from the noise—and create useful tools for research assessment?

9.
Stochastic Models, 2013, 29(2): 173-191
Abstract

We propose a new approximation formula for the waiting time tail probability of the M/G/1 queue with FIFO discipline and unlimited waiting space. The aim is to address the difficulty of obtaining good estimates when the tail probability has non-exponential asymptotics. We show that the waiting time tail probability can be expressed in terms of the waiting time tail probability of a notional M/G/1 queue with truncated service time distribution plus the tail probability of an extreme order statistic. The Cramér–Lundberg approximation is applied to approximate the tail probability of the notional queue. In essence, our technique extends the applicability of the Cramér–Lundberg approximation to cases where the standard Lundberg condition does not hold. We propose a simple moment-based technique for estimating the parameters of the approximation; numerical results demonstrate that our approximation can yield very good estimates over the whole range of the argument.
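For orientation, the classical Cramér–Lundberg approximation that the paper extends takes the form P(W > x) ≈ C exp(-γx), where γ is the positive root of the Lundberg equation λ(M_B(γ) - 1)/γ = 1 for the service-time moment generating function M_B, and C = (1 - ρ)/(λ M_B'(γ) - 1). The sketch below implements this standard version only (not the paper's modification or its moment-based parameter estimates) and checks it against the exact M/M/1 tail, for which the approximation is exact.

```python
import numpy as np
from scipy.optimize import brentq

def cramer_lundberg_tail(lam, mgf, dmgf, mean_s, gamma_upper, x):
    """Classical approximation P(W > x) ~ C * exp(-gamma * x) for the M/G/1
    FIFO waiting time, assuming the Lundberg condition holds.

    lam         : Poisson arrival rate
    mgf, dmgf   : service-time MGF and its derivative
    mean_s      : mean service time (rho = lam * mean_s < 1)
    gamma_upper : upper bound for the root search (below the MGF's abscissa)
    """
    rho = lam * mean_s
    lundberg = lambda g: lam * (mgf(g) - 1.0) / g - 1.0   # Lundberg equation
    gamma = brentq(lundberg, 1e-9, gamma_upper)
    C = (1.0 - rho) / (lam * dmgf(gamma) - 1.0)
    return C * np.exp(-gamma * x)

# Sanity check against the exact M/M/1 result P(W > x) = rho * exp(-(mu - lam) * x)
lam, mu, x = 0.7, 1.0, 5.0
approx = cramer_lundberg_tail(lam,
                              mgf=lambda g: mu / (mu - g),
                              dmgf=lambda g: mu / (mu - g) ** 2,
                              mean_s=1.0 / mu,
                              gamma_upper=mu - 1e-6,
                              x=x)
exact = (lam / mu) * np.exp(-(mu - lam) * x)
print(approx, exact)   # the two agree for exponential service times
```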

10.
ABSTRACT

When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a fire-storm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true, given the sample, p-values nonetheless can provide evidential information, toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent for almost 70% of the cases with the parameter’s true location for the sampled-from population. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used. Here, rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large or practically significant, in the researcher’s judgment and experience. The simulation compares performances of several methods: from p-value and/or effect size-based, to confidence-interval based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, like flagging whether a p-value is significant, the output of one single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.
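The hybrid MESP criterion is short to state in code: reject the null hypothesis about the mean only when the p-value clears the α threshold and the observed difference from the null value is at least the researcher-chosen minimum meaningful effect. A minimal sketch for a one-sample t-test (my illustration with made-up numbers, not the authors' 10,000-case simulation):

```python
import numpy as np
from scipy import stats

def mesp_decision(sample, mu0, min_effect, alpha=0.05):
    """Hybrid MESP rule: reject H0: mu = mu0 only if the one-sample t-test
    p-value is below alpha AND the observed mean differs from mu0 by at
    least the minimum meaningful effect size."""
    t_stat, p_value = stats.ttest_1samp(sample, mu0)
    observed_diff = abs(np.mean(sample) - mu0)
    return (p_value < alpha) and (observed_diff >= min_effect)

rng = np.random.default_rng(1)
x = rng.normal(loc=0.2, scale=1.0, size=200)
# Often a small p-value, but the observed effect sits below the 0.3 threshold,
# so MESP does not reject while a p-value-only rule would.
print(mesp_decision(x, mu0=0.0, min_effect=0.3))
```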

11.
Abstract

Numerous methods—based on exact and asymptotic distributions—can be used to obtain confidence intervals for the odds ratio in 2 × 2 tables. We examine ten methods for generating these intervals based on coverage probability, closeness of coverage probability to target, and length of confidence intervals. Based on these criteria, Cornfield’s method, without the continuity correction, performed the best of the methods examined here. A drawback to use of this method is the significant possibility that the attained coverage probability will not meet the nominal confidence level. Use of a mid-P value greatly improves methods based on the “exact” distribution. When combined with the Wilson rule for selection of a rejection set, the resulting method is a procedure that performed very well. Crow’s method, with use of a mid-P, performed well, although it was only a slight improvement over the Wilson mid-P method. Its cumbersome calculations preclude its general acceptance. Woolf’s (logit) method—with the Haldane–Anscombe correction—performed well, especially with regard to length of confidence intervals, and is recommended based on ease of computation.
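Of the recommended methods, Woolf's logit interval with the Haldane–Anscombe correction is the simplest to compute: add 0.5 to each cell, take the log odds ratio, and use its large-sample standard error. A sketch (my own code, with an arbitrary example table):

```python
import numpy as np
from scipy.stats import norm

def woolf_logit_ci(a, b, c, d, conf=0.95):
    """Woolf (logit) confidence interval for the odds ratio of a 2x2 table
        group 1: a successes, b failures
        group 2: c successes, d failures
    with the Haldane-Anscombe correction (add 0.5 to every cell)."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = np.log(a * d / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = norm.ppf(0.5 + conf / 2)
    return np.exp(log_or - z * se), np.exp(log_or + z * se)

print(woolf_logit_ci(12, 5, 6, 12))   # (lower, upper) for the odds ratio
```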

12.
We propose the L1 distance between the distribution of a binned data sample and a probability distribution from which it is hypothetically drawn as a statistic for testing agreement between the data and a model. We study the distribution of this distance for N-element samples drawn from k bins of equal probability and derive asymptotic formulae for the mean and dispersion of L1 in the large-N limit. We argue that the L1 distance is asymptotically normally distributed, with the mean and dispersion being accurately reproduced by asymptotic formulae even for moderately large values of N and k.
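A quick way to see the statistic in action is to compute it for binned counts and calibrate it by Monte Carlo rather than with the paper's asymptotic formulae. The sketch below does this for k equal-probability bins; normalising counts to relative frequencies is my convention for illustration, not necessarily the paper's.

```python
import numpy as np

def l1_statistic(counts, probs):
    """L1 distance between observed bin frequencies and model bin probabilities."""
    counts = np.asarray(counts, dtype=float)
    return np.sum(np.abs(counts / counts.sum() - np.asarray(probs)))

rng = np.random.default_rng(2)
N, k, reps = 500, 10, 20000
probs = np.full(k, 1.0 / k)                       # k equal-probability bins

# Monte Carlo null distribution of the statistic
null = np.array([l1_statistic(rng.multinomial(N, probs), probs) for _ in range(reps)])
print(null.mean(), null.std())                    # compare with asymptotic formulae

# Monte Carlo tail probability for a sample drawn from a mildly tilted distribution
tilt = np.linspace(1, 2, k); tilt /= tilt.sum()
stat = l1_statistic(rng.multinomial(N, tilt), probs)
print(np.mean(null >= stat))
```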

13.
Taguchi's quality engineering concepts are of great importance in designing and improving product and process quality. However, most of the controversy and mystique has centred on Taguchi's tactics. This research extends the ongoing work by investigating, via simulation, the probability of identifying insignificant factors as significant (the so-called alpha error) with the L16 orthogonal array for the larger-the-better type of response variable. The response variables in the L16 array are generated from a normal distribution with the same mean and standard deviation. Consequently, the null hypothesis that all factors in the L16 array will be identified as insignificant is true. Simulation results, however, reveal that some insignificant factors are wrongly identified as significant with a very high probability, which may yield a risky parameter design. Therefore, efficient and more valid statistical tactics should be developed to put Taguchi's important quality engineering concepts into practice.
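The flavour of such a simulation is easy to reproduce: build the 16-run, 15-column two-level orthogonal array, generate every response from one normal distribution so that all factors are truly insignificant, and count how often at least one factor gets flagged. The flagging rule below (effects judged against an error estimate pooled from the smallest effects) is deliberately simple; it illustrates the alpha-error phenomenon and is not the authors' larger-the-better analysis.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

# 16-run, 15-column two-level orthogonal array (L16 equivalent): every non-empty
# product of the 4 basic +/-1 columns of a 2^4 full factorial.
base = np.array(list(itertools.product([-1, 1], repeat=4)))          # 16 x 4
cols = [np.prod(base[:, list(s)], axis=1)
        for r in range(1, 5) for s in itertools.combinations(range(4), r)]
design = np.column_stack(cols)                                        # 16 x 15

def false_flag_rate(design, n_sims=5000, crit=2.0):
    """Fraction of pure-noise simulations in which at least one of the 15
    columns is flagged as 'significant' when effects are compared against an
    error estimate pooled from the 8 smallest effects."""
    hits = 0
    for _ in range(n_sims):
        y = rng.normal(50.0, 5.0, size=16)      # all runs share one distribution: H0 true
        effects = design.T @ y / 8.0            # high-level mean minus low-level mean
        pooled = np.sort(np.abs(effects))[:8]   # pool the smallest effects as 'error'
        sigma = np.sqrt(np.mean(pooled ** 2))
        hits += np.any(np.abs(effects) > crit * sigma)
    return hits / n_sims

print(false_flag_rate(design))   # typically far above a nominal 0.05
```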

14.
This article presents evidence that published results of scientific investigations are not a representative sample of the results of all scientific studies. Research studies from 11 major journals demonstrate the existence of biases that favor studies observing effects that, on statistical evaluation, have a low probability of erroneously rejecting the so-called null hypothesis (H0). This practice makes the probability of erroneously rejecting H0 different for the reader than for the investigator. It introduces two biases into the interpretation of the scientific literature: one due to multiple repetition of studies with false hypotheses, and one due to failure to publish smaller and less significant outcomes of tests of true hypotheses. These practices distort the results of literature surveys and of meta-analyses. The results also indicate that the practices leading to publication bias have not changed over a period of 30 years.

15.
Despite the simplicity of the Bernoulli process, developing good confidence interval procedures for its parameter—the probability of success p—is deceptively difficult. The binary data yield a discrete number of successes from a discrete number of trials, n. This discreteness results in actual coverage probabilities that oscillate with n for fixed values of p (and with p for fixed n). Moreover, this oscillation necessitates a large sample size to guarantee a good coverage probability when p is close to 0 or 1.

It is well known that the Wilson procedure is superior to many existing procedures because it is less sensitive to p than the other procedures and is therefore less costly. The procedures proposed in this article work as well as the Wilson procedure when 0.1 ≤ p ≤ 0.9, and are even less sensitive (i.e., more robust) than the Wilson procedure when p is close to 0 or 1. Specifically, when the nominal coverage probability is 0.95, the Wilson procedure requires a sample size of 1,021 to guarantee that the coverage probabilities stay above 0.92 for any 0.001 ≤ min{p, 1 − p} < 0.01. By contrast, our procedures guarantee the same coverage probabilities but need a sample size of only 177, without increasing either the expected interval width or the standard deviation of the interval width.
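For concreteness, the Wilson interval and its exact coverage at a given (n, p) take only a few lines, and evaluating the coverage over a range of p makes the oscillation and the near-boundary behaviour visible. This sketches only the standard Wilson procedure, not the authors' proposed procedures.

```python
import numpy as np
from scipy.stats import binom, norm

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for a binomial proportion."""
    z = norm.ppf(0.5 + conf / 2)
    phat = x / n
    centre = (phat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * np.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def coverage(n, p, conf=0.95):
    """Exact coverage probability of the Wilson interval at a given (n, p)."""
    xs = np.arange(n + 1)
    bounds = np.array([wilson_interval(x, n, conf) for x in xs])
    covered = (bounds[:, 0] <= p) & (p <= bounds[:, 1])
    return binom.pmf(xs, n, p)[covered].sum()

# Coverage oscillates with p for fixed n and changes character near the boundaries
for p in (0.005, 0.05, 0.3, 0.5):
    print(p, round(coverage(200, p), 3))
```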

16.
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 are motivated by the assumption that the test statistics are independent, which is often not true in reality. Simulations indicate that, in the presence of dependence among test statistics, most existing estimators can be poor, mainly due to increased variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 that incorporate the distribution pattern of the observed p-values as a practical way to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate of the proportion of true-null p-values in (λ, 1] over the whole range [0, 1], instead of using the expected proportion at 1 − λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.
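As a reference point, the single-λ estimator that the paper improves upon, together with a simple smoothed variant fitted over a grid of λ values, can be written in a few lines. The linear polyfit here is an illustration in the spirit of the paper's data-driven fit, not its exact method.

```python
import numpy as np

def pi0_storey(pvals, lam=0.5):
    """Single-lambda (Storey-type) estimate of the proportion of true nulls."""
    pvals = np.asarray(pvals)
    return np.mean(pvals > lam) / (1.0 - lam)

def pi0_smoothed(pvals, grid=np.arange(0.05, 0.95, 0.05)):
    """Illustrative smoothed variant: fit a line to pi0(lambda) over a grid of
    lambda values and report the fitted value at lambda = 1."""
    est = np.array([pi0_storey(pvals, l) for l in grid])
    slope, intercept = np.polyfit(grid, est, 1)
    return min(1.0, max(0.0, slope * 1.0 + intercept))

# Mixture example: 80% uniform null p-values, 20% from a small-p alternative
rng = np.random.default_rng(4)
p = np.concatenate([rng.uniform(size=800), rng.beta(0.2, 5.0, size=200)])
print(pi0_storey(p), pi0_smoothed(p))   # both should land near the true 0.8
```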

17.
While it is often argued that a p-value is a probability (see Wasserstein and Lazar), we argue that a p-value is not defined as a probability. A p-value is a bijection of the sufficient statistic for a given test, mapped onto the same scale as the Type I error probability. As such, the use of p-values in a test should be no more a source of controversy than the use of a sufficient statistic. It is demonstrated that there is, in fact, no ambiguity about what a p-value is, contrary to what has been claimed in recent public debates in the applied statistics community. We give a simple example to illustrate that rejecting the use of p-values in testing for a normal mean parameter is conceptually no different from rejecting the use of a sample mean. The p-value is innocent; the problem arises from its misuse and misinterpretation. The way that p-values have been informally defined and interpreted appears to have led to tremendous confusion and controversy regarding their place in statistical analysis.
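The point that a p-value is just the test statistic re-expressed on another scale can be seen in a tiny example for a normal mean with known variance: the two-sided p-value is a strictly decreasing function of |x̄|, so it carries the same information as the sample mean. The numbers below are my own toy values, not from the paper.

```python
import numpy as np
from scipy.stats import norm

def p_value(xbar, n, sigma=1.0):
    """Two-sided p-value for H0: mu = 0 with known sigma; a strictly
    decreasing function of |xbar|, the (absolute) sufficient statistic."""
    return 2 * norm.sf(abs(xbar) * np.sqrt(n) / sigma)

n = 25
for xbar in (0.05, 0.20, 0.39, 0.60):
    print(f"xbar = {xbar:.2f}  ->  p = {p_value(xbar, n):.4f}")
```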

18.
ABSTRACT

Most statistical analyses use hypothesis tests or estimation about parameters to form inferential conclusions. I think this is noble, but misguided. The point of view expressed here is that observables are fundamental, and that the goal of statistical modeling should be to predict future observations, given the current data and other relevant information. Further, the prediction of future observables provides multiple advantages to practicing scientists, and to science in general. These include an interpretable numerical summary of a quantity of direct interest to current and future researchers, a calibrated prediction of what’s likely to happen in future experiments, a prediction that can be either “corroborated” or “refuted” through experimentation, and avoidance of inference about parameters, quantities that exist only as convenient indices of hypothetical distributions. Finally, the predictive probability of a future observable can be used as a standard for communicating the reliability of the current work, regardless of whether confirmatory experiments are conducted. Adoption of this paradigm would improve our rigor for scientific accuracy and reproducibility by shifting our focus from “finding differences” among hypothetical parameters to predicting observable events based on our current scientific understanding.

19.
The discrete stable family constitutes an interesting two-parameter model of distributions on the non-negative integers with a Paretian tail. The practical use of the discrete stable distribution is inhibited by the lack of an explicit expression for its probability function. Moreover, the distribution does not possess moments of any order. Therefore, the usual tools—such as the maximum-likelihood method or even the moment method—are not feasible for parameter estimation. However, the probability generating function of the discrete stable distribution is available in a simple form. Hence, we initially explore the application of some existing estimation procedures based on the empirical probability generating function. Subsequently, we propose a new estimation method by minimizing a suitable weighted L2-distance between the empirical and the theoretical probability generating functions. In addition, we provide a goodness-of-fit statistic based on the same distance.
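The estimation idea is easy to prototype because the probability generating function of the discrete stable law has the simple closed form G(s) = exp(-λ(1 - s)^γ) with 0 < γ ≤ 1. The sketch below minimises a weighted L2 distance between the empirical and theoretical PGFs on a grid of s values; the grid, weights, starting values, and optimiser are my choices rather than the authors', and the sanity check uses γ = 1, where the law reduces to the Poisson distribution.

```python
import numpy as np
from scipy.optimize import minimize

def pgf_discrete_stable(s, lam, gamma):
    """PGF of the discrete stable law: G(s) = exp(-lam * (1 - s)**gamma)."""
    return np.exp(-lam * (1.0 - s) ** gamma)

def fit_discrete_stable(x, s_grid=np.linspace(0.0, 0.99, 100), weights=None):
    """Estimate (lam, gamma) by minimising a weighted L2 distance between the
    empirical and theoretical PGFs evaluated on a grid of s values."""
    x = np.asarray(x)
    if weights is None:
        weights = np.ones_like(s_grid)
    emp = np.array([np.mean(s ** x) for s in s_grid])      # empirical PGF

    def loss(theta):
        lam, gamma = theta
        return np.sum(weights * (emp - pgf_discrete_stable(s_grid, lam, gamma)) ** 2)

    res = minimize(loss, x0=[1.0, 0.8], method="L-BFGS-B",
                   bounds=[(1e-6, None), (1e-6, 1.0)])
    return res.x

# Sanity check with gamma = 1, where the discrete stable reduces to Poisson(lam)
rng = np.random.default_rng(5)
sample = rng.poisson(2.0, size=2000)
print(fit_discrete_stable(sample))   # lam close to 2, gamma close to 1
```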

20.
Experimentation with graphical methods for data presentation is important for improving graphical communication in science. Several methods—full scale breaks, dot charts, and multibased logging—are discussed. Full scale breaks are suggested as replacements for partial scale breaks, since partial breaks can fail to provide a forceful visual indication of a change in the scale. Dot charts show data that have labels and are replacements for bar charts; the new charts can be used in a wider variety of circumstances and allow more effective visual decoding of the quantitative information. Logarithms are powerful tools for data presentation; base 2 or base e is often more effective than the commonly used base 10.
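A minimal matplotlib rendering of the dot-chart idea, with values shown on a log base-2 scale; the data and labels are made up for illustration, and this is not code from the article.

```python
import numpy as np
import matplotlib.pyplot as plt

# Dot chart of labelled values, plotted on a log base-2 scale
labels = ["Method A", "Method B", "Method C", "Method D", "Method E"]
values = np.array([3.2, 6.5, 12.8, 25.0, 51.3])       # hypothetical measurements

y = np.arange(len(labels))
fig, ax = plt.subplots(figsize=(6, 3))
ax.hlines(y, xmin=0, xmax=np.log2(values), color="lightgray", linestyle="dotted")
ax.plot(np.log2(values), y, "o", color="black")
ax.set_yticks(y)
ax.set_yticklabels(labels)
ax.set_xlabel("value (log base 2)")
fig.tight_layout()
plt.show()
```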
