Similar Documents
20 similar documents found (search time: 31 ms)
1.
P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-à-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels 0.05, 0.01, and 0.001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.
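A minimal sketch (not the authors' code, and with made-up data) of how one could examine the sample-to-sample variability of a p-value on the log scale: compute a two-sample t-test p-value, then bootstrap the samples and summarize the spread of log10(p) across resamples.

```python
# Hedged illustration: bootstrap variability of a t-test p-value on the log scale.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=50)   # hypothetical control sample
y = rng.normal(0.5, 1.0, size=50)   # hypothetical treatment sample

p_obs = stats.ttest_ind(x, y).pvalue

log10_p = []
for _ in range(2000):                                   # bootstrap replicates
    xb = rng.choice(x, size=x.size, replace=True)
    yb = rng.choice(y, size=y.size, replace=True)
    log10_p.append(np.log10(stats.ttest_ind(xb, yb).pvalue))

log10_p = np.array(log10_p)
print(f"observed p = {p_obs:.4g}")
print(f"bootstrap SE of log10(p) = {log10_p.std(ddof=1):.2f}")
print("95% bootstrap prediction interval for log10(p):",
      np.percentile(log10_p, [2.5, 97.5]).round(2))
```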

2.
In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We exploit this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods comes at the expense of a weaker claim made upon rejection of the null hypothesis: that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP), which combines the K smallest p-values, and the truncated product method (TPM), which combines the p-values that fall below a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like that of GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations.
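A hedged sketch, not the paper's implementation, of two of the ingredients above: Fisher's combination method (exact chi-square reference distribution) and a Monte Carlo calibration of a truncated product statistic that multiplies only the p-values below a threshold tau. The threshold, the number of tests, and the simulated signal are arbitrary assumptions.

```python
# Illustrative p-value combination: Fisher's method and a truncated product statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def fisher_combine(p):
    """Fisher's method: -2*sum(log p) follows chi2(2L) under the global null."""
    stat = -2.0 * np.sum(np.log(p))
    return stats.chi2.sf(stat, df=2 * len(p))

def tpm_pvalue(p, tau=0.05, n_sim=20_000):
    """Truncated product statistic, calibrated by simulating uniform null p-values."""
    def log_stat(q):
        q = q[q <= tau]
        return np.sum(np.log(q)) if q.size else 0.0   # empty product counts as 1
    obs = log_stat(np.asarray(p))
    null = np.array([log_stat(rng.uniform(size=len(p))) for _ in range(n_sim)])
    return np.mean(null <= obs)       # smaller log-product = stronger evidence

# L = 100 tests, 5 of which carry (simulated) true associations
p = np.concatenate([rng.uniform(size=95), rng.uniform(0, 1e-3, size=5)])
print("Fisher:", fisher_combine(p), "  TPM:", tpm_pvalue(p))
```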

3.
When a genetic algorithm (GA) is employed in a statistical problem, the result is affected both by sampling variability and by the stochastic elements of the algorithm. Both components should be controlled in order to obtain reliable results. In the present work we analyze parametric estimation problems tackled by GAs and pursue two objectives. The first is a formal variability analysis of the final estimates, showing that their variability can easily be decomposed into the two sources. The second introduces a framework for GA estimation with fixed computational resources, a form of the statistical-versus-computational tradeoff question that is crucial in modern problems. In this situation the result should be optimal from both the statistical and the computational point of view, taking into account the two sources of variability and the constraints on resources. Simulation studies are presented to illustrate the proposed method and the statistical and computational tradeoff question.
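To make the decomposition concrete, the sketch below applies the law of total variance to a toy location-estimation problem, with scipy's differential evolution standing in for a genetic algorithm: the within-dataset variance across seeds is the algorithmic component and the between-dataset variance of the per-dataset averages is the sampling component. Everything in the example (the problem, the sizes, the optimizer settings) is an assumption chosen for illustration.

```python
# Illustrative variance decomposition for a stochastic-optimizer estimate.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(42)
n_datasets, n_runs = 20, 10

est = np.empty((n_datasets, n_runs))
for i in range(n_datasets):
    data = rng.normal(loc=2.0, scale=1.0, size=100)        # fresh sample each time
    loss = lambda theta: np.mean((data - theta[0]) ** 2)    # fit the location
    for j in range(n_runs):                                 # same data, different seeds
        res = differential_evolution(loss, bounds=[(-10, 10)],
                                     seed=j, maxiter=30, polish=False)
        est[i, j] = res.x[0]

# Law of total variance: total = E[Var(est | data)] + Var(E[est | data]).
algorithmic = est.var(axis=1, ddof=1).mean()   # stochastic-algorithm component
sampling    = est.mean(axis=1).var(ddof=1)     # sampling component
# For this easy convex problem the algorithmic part is expected to be much smaller.
print(f"algorithmic variance ~ {algorithmic:.2e}, sampling variance ~ {sampling:.2e}")
```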

4.
A method of statistical analysis of single-replicate and fractional factorial designs requiring no estimate of error variance is given. By comparing the relative magnitudes of independent effect estimates, effects corresponding to relatively large effect estimates may be asserted to be nonzero. The procedure maintains a prespecified experimentwise error rate for a general class of modulus-ratio statistics.

5.
Pressure is often placed on statistical analysts to improve the accuracy of their population estimates. In response to this pressure, analysts have long exploited the potential to combine surveys in various ways. This paper develops a framework for combining surveys when data items from one of the surveys are mass imputed. The estimates from the surveys are combined using a composite estimator (CE). The CE accounts for the variability due to the imputation model and the surveys' sampling schemes. Diagnostics for the validity of the imputation model are also discussed. We describe an application combining the Australian Labour Force Survey and the National Aboriginal and Torres Strait Islander Health Survey to estimate employment characteristics of the Indigenous population. The findings suggest that combining these surveys is beneficial.
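As a hedged illustration of the composite-estimator idea (not the paper's estimator), the sketch below combines two estimates of the same proportion using inverse-variance weights, inflating the variance of the second estimate by an assumed imputation-model component; all numbers are made up.

```python
# Toy composite estimator: two survey estimates of the same employment proportion.
import numpy as np

theta1, var1 = 0.46, 0.0004           # direct survey estimate and its variance
theta2, var2_sampling = 0.43, 0.0002  # estimate built on mass-imputed items
var2_imputation = 0.0003              # assumed imputation-model variance
var2 = var2_sampling + var2_imputation

w = var2 / (var1 + var2)              # inverse-variance weight on estimate 1
theta_ce = w * theta1 + (1 - w) * theta2
var_ce = (w ** 2) * var1 + ((1 - w) ** 2) * var2   # assumes independent surveys

print(f"composite estimate = {theta_ce:.3f}, variance = {var_ce:.2e}")
```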

6.
Two types of state-switching models for U.S. real output have been proposed: models that switch randomly between states and models that switch states deterministically, as in the threshold autoregressive model of Potter. These models have been justified primarily on how well they fit the sample data, yielding statistically significant estimates of the model coefficients. Here we propose a new approach to the evaluation of an estimated nonlinear time series model that provides a complement to existing methods based on in-sample fit or on out-of-sample forecasting. In this new approach, a battery of distinct nonlinearity tests is applied to the sample data, resulting in a set of p-values for rejecting the null hypothesis of a linear generating mechanism. This set of p-values is taken to be a “stylized fact” characterizing the nonlinear serial dependence in the generating mechanism of the time series. The effectiveness of an estimated nonlinear model for this time series is then evaluated in terms of the congruence between this stylized fact and a set of nonlinearity test results obtained from data simulated using the estimated model. In particular, we derive a portmanteau statistic based on this set of nonlinearity test p-values that allows us to test the proposition that a given model adequately captures the nonlinear serial dependence in the sample data. We apply the method to several estimated state-switching models of U.S. real output.

7.
Zhang C, Fan J, Yu T. Annals of Statistics 2011, 39(1): 613-642.
The multiple testing procedure plays an important role in detecting the presence of spatial signals for large scale imaging data. Typically, the spatial signals are sparse but clustered. This paper provides empirical evidence that for a range of commonly used control levels, the conventional FDR procedure can lack the ability to detect statistical significance, even if the p-values under the true null hypotheses are independent and uniformly distributed; more generally, ignoring the neighboring information of spatially structured data will tend to diminish the detection effectiveness of the FDR procedure. This paper first introduces a scalar quantity to characterize the extent to which the "lack of identification phenomenon" (LIP) of the FDR procedure occurs. Second, we propose a new multiple comparison procedure, called FDR(L), to accommodate the spatial information of neighboring p-values, via a local aggregation of p-values. Theoretical properties of the FDR(L) procedure are investigated under weak dependence of p-values. It is shown that the FDR(L) procedure alleviates the LIP of the FDR procedure, thus substantially facilitating the selection of more stringent control levels. Simulation evaluations indicate that the FDR(L) procedure improves the detection sensitivity of the FDR procedure with little loss in detection specificity. The computational simplicity and detection effectiveness of the FDR(L) procedure are illustrated through a real brain fMRI dataset.
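The sketch below conveys the flavor of local aggregation rather than the published FDR(L) procedure: each p-value is replaced by the median of its spatial neighbours, the result is mapped through the exact null distribution of a median of independent uniform p-values (ignoring the dependence between overlapping windows that the paper's theory handles), and Benjamini-Hochberg is then applied. The 1-D layout, signal strength, window size, and control level are arbitrary assumptions.

```python
# Illustrative local aggregation of p-values for a clustered 1-D signal.
import numpy as np
from scipy import stats, ndimage

def benjamini_hochberg(p, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(p)
    order = np.argsort(p)
    passed = np.nonzero(np.sort(p) <= q * np.arange(1, m + 1) / m)[0]
    return order[: passed.max() + 1] if passed.size else np.array([], dtype=int)

rng = np.random.default_rng(7)
n, k = 1000, 5                                   # 1-D "image" and window size
signal = np.zeros(n, dtype=bool)
signal[300:330] = True                           # clustered block of weak signals
z = rng.normal(size=n) + 2.5 * signal
p = 2 * stats.norm.sf(np.abs(z))                 # two-sided p-values

# Median-filter the p-values, then recalibrate through the null CDF of the
# median of k independent uniforms: P(median <= x) = P(Binomial(k, x) >= 3).
p_med = ndimage.median_filter(p, size=k, mode="nearest")
p_agg = stats.binom.sf(2, k, p_med)

for label, pv in [("raw p-values      ", p), ("locally aggregated", p_agg)]:
    rej = benjamini_hochberg(pv)
    print(f"{label}: {np.sum(signal[rej])} of {signal.sum()} true signals found, "
          f"{np.sum(~signal[rej])} false positives")
```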

8.
Statistics and Computing - Multiple hypothesis tests are often carried out in practice using p-value estimates obtained with bootstrap or permutation tests since the analytical p-values underlying...

9.
10.
Software packages usually report the results of statistical tests using p-values. Users often interpret these values by comparing them with standard thresholds, for example, 0.1, 1, and 5%, which is sometimes reinforced by a star rating (***, **, and *, respectively). We consider an arbitrary statistical test whose p-value p is not available explicitly, but can be approximated by Monte Carlo samples, for example, by bootstrap or permutation tests. The standard implementation of such tests usually draws a fixed number of samples to approximate p. However, the probability that the exact and the approximated p-value lie on different sides of a threshold (the resampling risk) can be high, particularly for p-values close to a threshold. We present a method to overcome this. We consider a finite set of user-specified intervals that cover [0, 1] and that can be overlapping. We call these p-value buckets. We present algorithms that, with arbitrarily high probability, return a p-value bucket containing p. We prove that for both a bounded resampling risk and a finite runtime, overlapping buckets need to be employed, and that our methods both bound the resampling risk and guarantee a finite runtime for such overlapping buckets. To interpret decisions with overlapping buckets, we propose an extension of the star rating system. We demonstrate that our methods are suitable for use in standard software, including for low p-value thresholds occurring in multiple testing settings, and that they can be computationally more efficient than standard implementations.
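A hedged sketch of the bucket idea, not the authors' algorithm: draw Monte Carlo samples in batches, maintain a high-confidence Clopper-Pearson interval for the unknown p-value, and stop once the interval fits inside one of a user-specified set of possibly overlapping buckets. The bucket set, batch size, and toy statistic are assumptions.

```python
# Sequential Monte Carlo p-value estimation that reports a p-value bucket.
import numpy as np
from scipy import stats

BUCKETS = [(0.0, 0.01), (0.008, 0.05), (0.04, 0.12), (0.1, 1.0)]   # assumed buckets

def clopper_pearson(k, n, conf=0.999):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    a = 1.0 - conf
    lo = 0.0 if k == 0 else stats.beta.ppf(a / 2, k, n - k + 1)
    hi = 1.0 if k == n else stats.beta.ppf(1 - a / 2, k + 1, n - k)
    return lo, hi

def mc_pvalue_bucket(count_exceedances, batch=1000, max_batches=1000):
    """count_exceedances(m): how many of m simulated null statistics reach the observed one."""
    k = n = 0
    for _ in range(max_batches):
        k += count_exceedances(batch)
        n += batch
        lo, hi = clopper_pearson(k, n)
        for lo_b, hi_b in BUCKETS:
            if lo_b <= lo and hi <= hi_b:
                return (lo_b, hi_b), (k + 1) / (n + 1), n   # bucket, p-hat, samples used
    return None, (k + 1) / (n + 1), n

# Toy example: the test statistic is N(0,1) under the null; observed value 2.3.
rng = np.random.default_rng(3)
obs = 2.3
count = lambda m: int(np.sum(rng.normal(size=m) >= obs))
print(mc_pvalue_bucket(count))
```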

11.
This paper deals with the analysis of datasets in which the subjects are described by the estimated means of a p-dimensional variable. Classical statistical methods of data analysis do not handle measurements affected by intrinsic variability, as is the case for estimates, so the heterogeneity this condition induces among subjects is not taken into account. This paper suggests a way to solve the problem in the context of symbolic data analysis, whose specific aim is to handle data tables where single-valued measurements are replaced by complex data structures such as frequency distributions, intervals, and sets of values. A principal component analysis is carried out according to this proposal, with a significant improvement in the treatment of the information.

12.
An estimator is proposed for the parameter λ of the log-zero-Poisson distribution. While it is not a consistent estimator of λ in the usual statistical sense, it is shown to be quite close to the maximum likelihood estimates for many of the 35 sets of data on which it is tried. Since obtaining maximum likelihood estimates is extremely difficult for this and other contagious distributions, this estimate will serve at least as an initial estimate in solving the likelihood equations iteratively. A lesson learned from this experience is that in the area of contagious distributions, variability is so large that attention should be focused directly on the mean squared error and not on consistency or unbiasedness, whether for small samples or for the asymptotic case. Sample sizes for some of the data considered in the paper are in the hundreds. The fact that this estimator, although not consistent for λ, is closer to the maximum likelihood estimator than the consistent moment estimator shows that the variability is large enough that consistency fails to materialize even for the large sample sizes usually available in practice.

13.
Tsui and Weerahandi (1989) introduced the notion of generalized p-values, and since then this idea has been used to solve many statistical testing problems. Heteroskedasticity is one of the major practical problems encountered in ANOVA. To compare the means of several groups under heteroskedasticity, approximate tests are used in the literature. Weerahandi (1995a) introduced a test based on the notion of generalized p-values for comparing the means of several populations when the variances are not equal. This test is referred to as the generalized F-test.

In this paper we compare the size performance of the generalized F-test with that of four other widely used procedures: the classical F-test for ANOVA, the F-test obtained by weighted least squares to adjust for heteroskedasticity, the Brown-Forsythe test, and the Welch test. The comparison is based on a simulation study of size performance for the balanced one-way model, with the intended level of the tests set at 0.05. While the generalized F-test was found to have size not exceeding the intended level, the other tests showed poor size performance as heteroskedasticity became severe. Under mild heteroskedasticity the Welch test and the classical ANOVA F-test held the intended level, and the Welch test performed better than the latter. The weighted F-test, widely used because of its computational convenience, was found to have very serious size problems. The size advantage of the generalized F-test was also found to be robust even under severe deviations from the assumption of normality.
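A simplified, hedged sketch of this kind of size simulation, reduced to two groups so that the pooled-variance t-test (the classical F-test with two groups) and the Welch test can be compared directly with scipy. The group sizes and variance ratio are arbitrary choices, and unlike the paper's balanced design the groups here are unbalanced so that the size distortion is visible.

```python
# Empirical size of the pooled-variance test vs. the Welch test under heteroskedasticity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
n1, n2 = 30, 10            # unbalanced groups
sd1, sd2 = 1.0, 4.0        # severe heteroskedasticity; means equal, so the null is true
n_rep, alpha = 10_000, 0.05

rej_pooled = rej_welch = 0
for _ in range(n_rep):
    x = rng.normal(0, sd1, n1)
    y = rng.normal(0, sd2, n2)
    rej_pooled += stats.ttest_ind(x, y, equal_var=True).pvalue < alpha
    rej_welch  += stats.ttest_ind(x, y, equal_var=False).pvalue < alpha

print(f"empirical size, pooled-variance test: {rej_pooled / n_rep:.3f}")
print(f"empirical size, Welch test:           {rej_welch / n_rep:.3f}")
```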

14.
The statistical problems associated with estimating the mean responding cell density in the limiting dilution assay (LDA) have largely been ignored. We evaluate techniques for analyzing LDA data from multiple biological samples, assumed to follow either a normal or gamma distribution. Simulated data are used to evaluate the performance of an unweighted mean, a log transform, and a weighted mean procedure described by Taswell (1987). In general, an unweighted mean with jackknife estimates will produce satisfactory results. In some cases, a log transform is more appropriate. Taswell's weighted mean algorithm is unable to estimate an accurate variance. We also show that methods which pool samples, or LDA data, are invalid. In addition, we show that optimization of the variance in multiple-sample LDAs depends on the estimator, the between-organism variance, the replicate well size, and the number of biological samples. However, this optimization is generally achieved by maximizing the number of biological samples at the expense of well replicates.
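As a hedged sketch of the strategy the abstract reports as generally satisfactory, the code below takes made-up per-organism frequency estimates and computes their unweighted mean with a jackknife standard error.

```python
# Unweighted mean with a jackknife standard error for per-organism LDA estimates.
import numpy as np

# Hypothetical responding-cell frequency estimates, one per biological sample.
freq = np.array([1 / 9500, 1 / 14200, 1 / 8100, 1 / 11800, 1 / 10400])

n = freq.size
theta_hat = freq.mean()                                        # unweighted mean
loo = np.array([np.delete(freq, i).mean() for i in range(n)])  # leave-one-out means
se_jack = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

print(f"mean responding-cell frequency = {theta_hat:.3e} (jackknife SE {se_jack:.1e})")
```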

15.
Most real-world shapes and images are characterized by high variability: they are not rigid, like crystals, but they are strongly structured. A fundamental task in the understanding and analysis of such image ensembles is therefore the construction of models that incorporate both variability and structure in a mathematically precise way. The global shape models introduced in Grenander's general pattern theory are intended to do this. In this paper, we describe the representation of two-dimensional mitochondria and membranes in electron microscope photographs, and three-dimensional amoebae in optical sectioning microscopy. These representations accommodate three kinds of variability common to all of these patterns. The first is variability in shape and viewing orientation. For this, the typical structure is represented via linear, circular and spherical templates, with the variability accommodated via transformations applied to the templates. The transformations form groups: scale, rotation and translation. They are locally applied throughout the continuum and of high dimension. The second is textural variability; the inside and outside of these basic shapes are subject to random variation, as well as sensor noise. For this, statistical sensor models and Markov random field texture models are used to connect the constituent structures of the shapes to the measured data. The third kind of variability is associated with the fact that each scene is made up of a variable number of shapes; this number is not assumed to be known a priori. Each scene has a variable number of parameters encoding the transformations of the templates appropriate for that scene. For this, a single posterior distribution is defined over the countable union of spaces representing models with varying numbers of shapes. Bayesian inference is performed via computation of the conditional expectation of the parametrically defined shapes under the posterior. These conditional mean estimates are generated using jump-diffusion processes. Results for membranes, mitochondria and amoebae are shown.

17.
Second-generation p-values preserve the simplicity that has made p-values popular while resolving critical flaws that promote misinterpretation of data, distraction by trivial effects, and unreproducible assessments of data. The second-generation p-value (SGPV) is an extension that formally accounts for scientific relevance by using a composite null hypothesis that captures null and scientifically trivial effects. Because the majority of spurious findings are small effects that are technically nonnull but practically indistinguishable from the null, the second-generation approach greatly reduces the likelihood of a false discovery. SGPVs promote transparency, rigor and reproducibility of scientific results by a priori identifying which candidate hypotheses are practically meaningful and by providing a more reliable statistical summary of when the data are compatible with the candidate hypotheses or null hypotheses, or when the data are inconclusive. We illustrate the importance of these advances using a dataset of 247,000 single-nucleotide polymorphisms, i.e., genetic markers that are potentially associated with prostate cancer.
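A minimal sketch of the interval-overlap computation behind the SGPV, assuming the standard definition in which an interval estimate I is compared with an interval null H0 and the correction factor max(|I|/(2|H0|), 1) is applied for very wide, inconclusive intervals; the intervals in the example are made up.

```python
# Illustrative second-generation p-value from an interval estimate and an interval null.
def sgpv(ci_lo, ci_hi, null_lo, null_hi):
    """Fraction of the interval estimate compatible with the interval null,
    with a correction so that very wide (inconclusive) intervals give 1/2."""
    overlap = max(0.0, min(ci_hi, null_hi) - max(ci_lo, null_lo))
    len_i, len_null = ci_hi - ci_lo, null_hi - null_lo
    return (overlap / len_i) * max(len_i / (2 * len_null), 1.0)

print(sgpv(0.10, 0.70, -0.20, 0.20))   # partial overlap -> about 0.17
print(sgpv(0.45, 0.90, -0.20, 0.20))   # no overlap      -> 0 (only non-trivial effects supported)
print(sgpv(-0.05, 0.10, -0.20, 0.20))  # CI inside null  -> 1 (only trivial effects supported)
```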

18.
Following the release of the American Statistical Association's official statement on statistical significance and p-values, p-values have once again attracted broad attention from researchers in China and abroad. Building on the basic content and steps of hypothesis testing as presented in Chinese statistics textbooks, this article uses the examples of coin tossing and recognizing a person from behind to give intuitive explanations of the p-value, statistical significance, statistical power, and related concepts, and draws on a classic survey case from psychological statistics to analyze why p-values are misinterpreted. Based on the ASA statement, recommendations for the correct use of p-values are given.
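As a hedged, worked version of the coin-tossing example mentioned above (the counts are made up), the p-value is the probability, computed under the assumption of a fair coin, of observing data at least as extreme as what was seen:

```python
# Two-sided p-value for 60 heads in 100 tosses of a supposedly fair coin.
from scipy import stats

result = stats.binomtest(k=60, n=100, p=0.5, alternative="two-sided")
print(f"two-sided p-value = {result.pvalue:.4f}")
# About 0.057: not below the conventional 0.05 threshold, so the result is not
# statistically significant at that level, but this is weak evidence against
# fairness, not proof that the coin is fair.
```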

19.
We present an application study which exemplifies a cutting-edge statistical approach for detecting climate regime shifts. The algorithm uses Bayesian computational techniques that make time-efficient analysis of large volumes of climate data possible. Output includes probabilistic estimates of the number and duration of regimes, the number and probability distribution of hidden states, and the probability of a regime shift in any year of the time series. Analysis of the Pacific Decadal Oscillation (PDO) index is provided as an example. Two states are detected: one is associated with positive values of the PDO and presents lower interannual variability, while the other corresponds to negative values of the PDO and greater variability. We compare this approach with existing alternatives from the literature and highlight the potential for ours to unlock features hidden in climate data.

20.
Test statistics from the class of two-sample linear rank tests are commonly used to compare a treatment group with a control group. Two independent random samples of sizes m and n are drawn from two populations. As a result, N = m + n observations in total are obtained. The aim is to test the null hypothesis of identical distributions. The alternative hypothesis is that the populations are of the same form but with a different measure of central tendency. This article examines mid p-values from the null permutation distributions of tests based on the class of two-sample linear rank statistics. The results obtained indicate that normal approximation-based computations are very close to the permutation simulations, and they provide p-values that are close to the exact mid p-values for all practical purposes.
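A hedged sketch of a permutation mid p-value for the two-sample Wilcoxon rank-sum statistic, one member of the linear rank class: the mid p-value gives half weight to permutations that tie the observed statistic. The data are made up, and the permutation null is approximated by random shuffles rather than full enumeration.

```python
# Permutation mid p-value for the two-sample rank-sum statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = np.array([1.2, 2.3, 0.8, 1.9, 2.6, 1.4])   # control (size m)
y = np.array([2.1, 3.0, 2.8, 1.7, 3.4])        # treatment (size n)

pooled = np.concatenate([x, y])
ranks = stats.rankdata(pooled)
m = x.size
t_obs = ranks[m:].sum()                         # rank sum of the treatment group

n_perm = 50_000
t_null = np.empty(n_perm)
for b in range(n_perm):                         # random relabelling of the groups
    perm = rng.permutation(ranks)
    t_null[b] = perm[m:].sum()

# Mid p-value: P(T > t_obs) + 0.5 * P(T = t_obs) under the permutation null.
mid_p = np.mean(t_null > t_obs) + 0.5 * np.mean(t_null == t_obs)
print(f"one-sided permutation mid p-value = {mid_p:.4f}")
```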
