Similar Articles
1.
ABSTRACT

A statistical test can be seen as a procedure for producing a decision based on observed data, where some decisions consist of rejecting a hypothesis (yielding a significant result) and some do not, and where one controls the probability of making a wrong rejection at some prespecified significance level. Whereas traditional hypothesis testing involves only two possible decisions (to reject or not to reject a null hypothesis), Kaiser’s directional two-sided test, as well as the more recently introduced testing procedure of Jones and Tukey, each equivalent to running two one-sided tests, involve three possible decisions to infer the value of a unidimensional parameter. The latter procedure assumes that a point null hypothesis is impossible (e.g., that two treatments cannot have exactly the same effect), allowing a gain in statistical power. There are, however, situations where a point hypothesis is indeed plausible, for example, when considering hypotheses derived from Einstein’s theories. In this article, we introduce a five-decision testing procedure, equivalent to running a traditional two-sided test in addition to two one-sided tests, which combines the advantages of the procedures of Kaiser (no assumption that a point hypothesis is impossible) and Jones and Tukey (higher power), allowing a nonnegligible (typically 20%) reduction of the sample size needed to reach a given statistical power for a significant result, compared to the traditional approach.
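
As an illustration of how the three tests combine, the R sketch below maps the two one-sided p-values onto five decisions; the alpha/2 versus alpha thresholds are one plausible reading of the rule, shown here as an assumption rather than the authors' exact calibration.

```r
# Illustrative sketch only: a two-sided test at level alpha is equivalent to
# rejecting when the relevant one-sided p-value falls below alpha/2, so three
# p-values collapse to two. The mapping to five decisions below is an assumed,
# simplified version of the procedure described in the abstract.
five_decision <- function(x, y, alpha = 0.05) {
  p_greater <- t.test(x, y, alternative = "greater")$p.value  # H1: mu_x - mu_y > 0
  p_less    <- t.test(x, y, alternative = "less")$p.value     # H1: mu_x - mu_y < 0
  if      (p_greater < alpha / 2) "conclude the difference is positive"
  else if (p_greater < alpha)     "conclude the difference is positive or null"
  else if (p_less    < alpha / 2) "conclude the difference is negative"
  else if (p_less    < alpha)     "conclude the difference is negative or null"
  else                            "no conclusion"
}

set.seed(1)
five_decision(rnorm(40, mean = 0.4), rnorm(40, mean = 0))
```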

2.

Decisions on the presence of seasonal unit roots in economic time series are commonly taken on the basis of statistical hypothesis tests. Some of these tests have absence of unit roots as the null hypothesis, while others use unit roots as their null. Following a suggestion by Hylleberg (1995) to combine such tests in order to reach a clearer conclusion, we evaluate the merits of such test combinations on the basis of a Bayesian decision setup. We find that the potential gains over a pure application of the most common test due to Hylleberg et al. (1990) can be small.

3.

Item response models are essential tools for analyzing results from many educational and psychological tests. Such models are used to quantify the probability of a correct response as a function of unobserved examinee ability and other parameters explaining the difficulty and the discriminatory power of the questions in the test. Some of these models also incorporate a threshold parameter for the probability of a correct response to account for the effect of guessing the correct answer in multiple-choice tests. In this article we consider fitting such models using the Gibbs sampler. A data augmentation method to analyze a normal-ogive model incorporating a threshold guessing parameter is introduced and compared with a Metropolis-Hastings sampling method. The proposed method is an order of magnitude more efficient than the existing method. Another objective of this paper is to develop Bayesian model choice techniques for model discrimination. A predictive approach based on a variant of the Bayes factor is used and compared with another decision-theoretic method that minimizes an expected loss function on the predictive space. A classical model choice technique based on a modified likelihood ratio test statistic is shown to be one component of the second criterion. As a consequence, the Bayesian methods proposed in this paper are contrasted with the classical approach based on the likelihood ratio test. Several examples are given to illustrate the methods.

4.
ABSTRACT

The cost and time of pharmaceutical drug development continue to grow at rates that many say are unsustainable. These trends have enormous impact on what treatments get to patients, when they get them, and how they are used. The statistical framework for supporting decisions in regulated clinical development of new medicines has followed a traditional path of frequentist methodology. Trials using hypothesis tests of “no treatment effect” are done routinely, and a p-value < 0.05 is often the determinant of what constitutes a “successful” trial. Many drugs fail in clinical development, adding to the cost of new medicines, and some evidence places blame on the deficiencies of the frequentist paradigm. An unknown number of effective medicines may have been abandoned because trials were declared “unsuccessful” due to a p-value exceeding 0.05. Recently, the Bayesian paradigm has shown utility in the clinical drug development process for its probability-based inference. We argue for a Bayesian approach that employs data from other trials as a “prior” for Phase 3 trials so that synthesized evidence across trials can be used to compute probability statements that are valuable for understanding the magnitude of treatment effect. Such a Bayesian paradigm provides a promising framework for improving statistical inference and regulatory decision making.
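
A minimal sketch of the kind of synthesis advocated here, assuming a normal approximation for the treatment effect and using hypothetical numbers: evidence from earlier trials enters as a prior, and the Phase 3 posterior yields direct probability statements about the magnitude of the effect.

```r
# Conjugate normal-normal update with made-up inputs (all numbers hypothetical).
prior_mean <- 0.30; prior_se <- 0.20   # assumed synthesis of earlier-trial evidence
trial_est  <- 0.25; trial_se <- 0.10   # assumed Phase 3 estimate and standard error

w_prior <- 1 / prior_se^2              # precision weighting
w_trial <- 1 / trial_se^2
post_mean <- (w_prior * prior_mean + w_trial * trial_est) / (w_prior + w_trial)
post_se   <- sqrt(1 / (w_prior + w_trial))

# Probability statements of direct interest to decision makers.
p_benefit    <- 1 - pnorm(0,    post_mean, post_se)  # P(effect > 0 | data)
p_meaningful <- 1 - pnorm(0.20, post_mean, post_se)  # P(effect > 0.20 | data)
c(post_mean = post_mean, post_se = post_se,
  p_benefit = p_benefit, p_meaningful = p_meaningful)
```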

5.
ABSTRACT

In a test of significance, it is common practice to report the p-value as one way of summarizing the incompatibility between a set of data and a proposed model for the data constructed under a set of assumptions together with a null hypothesis. However, the p-value has some flaws: one concerns how it is defined for two-sided tests in general, and a related, more serious logical flaw is its incoherence when interpreted as a statistical measure of evidence for its respective null hypothesis. We address these two issues in this article.
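
For context on the two-sided definitional issue, two common but non-equivalent conventions for the two-sided p-value of an observed statistic are shown below (the specific flaws analyzed in the article may be framed differently); they agree when the null distribution of the test statistic is symmetric but can disagree otherwise.

```latex
% Two common conventions for a two-sided p-value of an observed statistic t_obs:
p_{\mathrm{doubled}} = 2\,\min\bigl\{\Pr_{H_0}(T \ge t_{\mathrm{obs}}),\ \Pr_{H_0}(T \le t_{\mathrm{obs}})\bigr\},
\qquad
p_{\mathrm{extreme}} = \Pr_{H_0}\bigl(\lvert T - t_0\rvert \ge \lvert t_{\mathrm{obs}} - t_0\rvert\bigr),
```

where \(t_0\) denotes a central value of the null distribution of \(T\).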

6.
Abstract

Statistical distributions are very useful in describing and predicting real-world phenomena. In many applied areas there is a clear need for extended forms of the well-known distributions. Generally, the new distributions are more flexible for modeling real data that exhibit a high degree of skewness and kurtosis. The choice of the best-suited statistical distribution for modeling data is very important.

In this article, we propose an extended generalized Gompertz (EGGo) family of distributions. Certain statistical properties of the EGGo family, including distribution shapes, the hazard function, skewness, limit behavior, moments, and order statistics, are discussed. The flexibility of this family is assessed by its application to real data sets and comparison with other competing distributions. The maximum likelihood equations for estimating the parameters from real data are given. The performance of several estimators, namely maximum likelihood, least squares, weighted least squares, Cramér-von Mises, Anderson-Darling, and right-tailed Anderson-Darling estimators, is discussed. A likelihood ratio test is derived to assess whether the EGGo distribution fits a given data set better than its nested submodels. We use the R software for simulations, applications, and checks of the validity of this model.
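
The nested-model likelihood ratio comparison mentioned above follows a generic recipe; since the EGGo density itself is not reproduced in this abstract, the R sketch below illustrates the workflow with an exponential model nested inside a Weibull model as a stand-in.

```r
# Generic nested-model LRT workflow (exponential within Weibull as a stand-in
# for the EGGo-versus-submodel comparison described above).
set.seed(42)
x <- rweibull(200, shape = 1.5, scale = 2)   # simulated positive data

# Full model: Weibull, fitted by maximum likelihood via optim().
nll_weibull <- function(par)
  -sum(dweibull(x, shape = par[1], scale = par[2], log = TRUE))
fit_full    <- optim(c(1, 1), nll_weibull, method = "L-BFGS-B", lower = c(1e-6, 1e-6))
loglik_full <- -fit_full$value

# Null model: exponential (Weibull with shape = 1); its MLE is rate = 1/mean(x).
loglik_null <- sum(dexp(x, rate = 1 / mean(x), log = TRUE))

lr_stat <- 2 * (loglik_full - loglik_null)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)  # one restricted parameter
c(lr_stat = lr_stat, p_value = p_value)
```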

7.
ABSTRACT

When the power of different nonparametric tests is evaluated by simulation, the alternative hypothesis should be carefully designed to ensure validity of the results in the specific research field. In this article, we propose a probit-based progressive shift alternative that is more realistic than the simple shift alternative for skewed non-negative data that occur in many research areas. Our motivation comes from parasitology. The progressive shift alternative is used to compare the power of six location-scale tests and seven commonly used location tests for several skewed theoretical and empirical parasite distributions. It is shown that location-scale tests are more powerful than location tests. Programs for applying the methods studied in the article are freely available for download.
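
As a simplified illustration of such a power study (a plain location shift on negative-binomial counts is used here in place of the probit-based progressive shift proposed in the article), power for two of the tests can be estimated by simulation:

```r
# Simulated power of a location test (t-test) versus a rank test (Wilcoxon)
# for skewed, overdispersed count data; distributions and shift are assumptions
# chosen only for illustration.
set.seed(7)
power_sim <- function(shift, n = 50, reps = 2000, alpha = 0.05) {
  rej_t <- rej_w <- logical(reps)
  for (i in seq_len(reps)) {
    x <- rnbinom(n, size = 0.5, mu = 5)           # control group counts
    y <- rnbinom(n, size = 0.5, mu = 5) + shift   # shifted group counts
    rej_t[i] <- t.test(x, y)$p.value < alpha
    rej_w[i] <- wilcox.test(x, y, exact = FALSE)$p.value < alpha
  }
  c(t_test = mean(rej_t), wilcoxon = mean(rej_w))
}
power_sim(shift = 3)
```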

8.
ABSTRACT

This article develops and investigates a confidence interval and hypothesis testing procedure for a population proportion based on a ranked set sample (RSS). The inference is exact, in the sense that it is based on the exact distribution of the total number of successes observed in the RSS. Furthermore, this distribution can be readily computed with the well-known and freely available R statistical software package. A data example that illustrates the methodology is presented. In addition, the properties of the inference procedures are compared with their simple random sample (SRS) counterparts. With regard to expected lengths of confidence intervals and the power of tests, the RSS inference procedures are superior to the SRS methods.
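
A minimal sketch of the exact distribution the inference rests on, assuming a balanced RSS design (set size k, m cycles) with perfect rankings; this illustrates the computation rather than reproducing the authors' code.

```r
# Exact pmf of the total number of successes in a balanced ranked set sample.
rss_total_pmf <- function(k, m, p) {
  # Unit with judgment rank r is a success iff at least k - r + 1 of its k
  # ranked Bernoulli(p) units are successes.
  p_rank <- sapply(1:k, function(r) pbinom(k - r, k, p, lower.tail = FALSE))
  # Total = sum over ranks of independent Binomial(m, p_rank[r]) counts;
  # build its pmf by discrete convolution.
  pmf <- 1
  for (r in 1:k) {
    pmf_r <- dbinom(0:m, m, p_rank[r])
    out   <- numeric(length(pmf) + m)
    for (j in seq_along(pmf_r)) {
      idx <- j:(j + length(pmf) - 1)
      out[idx] <- out[idx] + pmf_r[j] * pmf
    }
    pmf <- out
  }
  setNames(pmf, 0:(k * m))
}

round(rss_total_pmf(k = 3, m = 5, p = 0.4), 4)   # distribution on 0, ..., 15
```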

9.
In this article we discuss variable selection for decision making, with a focus on decisions regarding when to provide treatment and which treatment to provide. Current variable selection techniques were developed for use in a supervised learning setting where the goal is prediction of the response. These techniques often downplay the importance of interaction variables that have small predictive ability but that are critical when the ultimate goal is decision making rather than prediction. We propose two new techniques designed specifically to find variables that aid in decision making. Simulation results are given, along with an application of the methods to data from a randomized controlled trial for the treatment of depression.

10.
Abstract

A convention in designing randomized clinical trials has been to choose sample sizes that yield specified statistical power when testing hypotheses about treatment response. Manski and Tetenov recently critiqued this convention and proposed enrollment of sufficiently many subjects to enable near-optimal treatment choices. This article develops a refined version of that analysis applicable to trials comparing aggressive treatment of patients with surveillance. The need for a refined analysis arises because the earlier work assumed that there is only a primary health outcome of interest, without secondary outcomes. An important aspect of choice between surveillance and aggressive treatment is that the latter may have side effects. One should then consider how the primary outcome and side effects jointly determine patient welfare. This requires new analysis of sample design. As a case study, we reconsider a trial comparing nodal observation and lymph node dissection when treating patients with cutaneous melanoma. Using a statistical power calculation, the investigators assigned 971 patients to dissection and 968 to observation. We conclude that assigning 244 patients to each option would yield findings that enable suitably near-optimal treatment choice. Thus, a much smaller sample size would have sufficed to inform clinical practice.
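
For contrast, the conventional power-based calculation that the abstract critiques takes one line in R; the outcome rates below are hypothetical and are not those of the melanoma trial.

```r
# Per-arm sample size for a two-sided 5% test of a difference in outcome
# proportions with 90% power (assumed rates, for illustration only).
power.prop.test(p1 = 0.80, p2 = 0.85, sig.level = 0.05, power = 0.90)
```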

11.
The method of control variates has been intensively used for reducing the variance of estimated (linear) regression metamodels in simulation experiments. In contrast to previous studies, this article presents a procedure for applying multiple control variates when the objective is to estimate and validate a nonlinear regression metamodel for a single response, in terms of selected decision variables. This procedure includes robust statistical regression techniques for estimation and validation. Assuming joint normality of the response and controls, confidence intervals and hypothesis tests for the metamodel parameters are obtained. Finally, results for measuring the efficiency of the use of control variates are discussed.
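
The variance-reduction idea behind control variates can be sketched with a single control and a made-up simulation model; the procedure in the article extends this to multiple controls and to estimating and validating a nonlinear regression metamodel.

```r
# Single control variate with known mean (all model choices here are assumptions
# for illustration, not the article's experimental setup).
set.seed(123)
n <- 1000
C <- rexp(n, rate = 1)               # control variate with known mean E[C] = 1
Y <- 2 + 3 * C + rnorm(n, sd = 2)    # simulated response correlated with C

b_hat <- cov(Y, C) / var(C)          # estimated optimal control coefficient
Y_cv  <- Y - b_hat * (C - 1)         # control-variate-adjusted response

c(naive_mean = mean(Y),    naive_se = sd(Y) / sqrt(n),
  cv_mean    = mean(Y_cv), cv_se    = sd(Y_cv) / sqrt(n))
```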

12.
ABSTRACT

Efforts to address a reproducibility crisis have generated several valid proposals for improving the quality of scientific research. We argue there is also a need to address the separate but related issues of relevance and responsiveness. To address relevance, researchers must produce what decision makers actually need to inform investments and public policy, that is, the probability that a claim is true or the probability distribution of an effect size given the data. The term responsiveness refers to the irregularity and delay with which issues about the quality of research are brought to light. Instead of relying on the good fortune that some motivated researchers will periodically conduct efforts to reveal potential shortcomings of published research, we could establish a continuous quality-control process for scientific research itself. Quality metrics could be designed through the application of this statistical process control for the research enterprise. We argue that one quality-control metric, the probability that a research hypothesis is true, is required to address at least relevance and may also be part of the solution for improving responsiveness and reproducibility. This article proposes a “straw man” solution that could be the basis for implementing these improvements. As part of this solution, we propose one way to “bootstrap” priors. The processes required for improving reproducibility and relevance can also become part of a comprehensive statistical quality control for science itself, built on continuously monitored metrics of the scientific performance of a field of research.
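
One simple way to compute such a metric, the probability that a “significant” research hypothesis is true, uses Bayes' rule together with a field's base rate of true hypotheses, the significance level, and average power; the inputs below are assumptions for illustration, with the article's bootstrapped priors playing the role of the base rate.

```r
# Probability that a hypothesis is true given a significant result
# (Bayes' rule among significant findings; all inputs are assumed values).
prob_hypothesis_true <- function(prior = 0.10, alpha = 0.05, power = 0.80) {
  (power * prior) / (power * prior + alpha * (1 - prior))
}
prob_hypothesis_true()               # about 0.64 under these assumptions
prob_hypothesis_true(prior = 0.50)   # a higher base rate raises the probability
```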

13.
Various exact tests for statistical inference are available for powerful and accurate decision rules provided that corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing p-values of exact tests by combining Monte Carlo simulations and statistical tables generated a priori. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The p-values are linked to the posterior means of quantiles. In this framework, we present relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations to provide a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using the theoretical propositions, we calculate the minimum number of Monte Carlo resamples needed for a desired level of accuracy on the basis of distances between actual data characteristics (e.g., sample sizes) and the characteristics of the data used to present the corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via the recently developed STATA and R statistical packages.

14.
A general way of testing in the presence of nuisance parameters is to choose, from a family of tests, the one that maximizes evidence against the null hypothesis; that is, the one that minimizes the significance level. This method yields exact tests when applied to distribution-free testing in various statistical designs, and the arbitrary choice of score functions is eliminated. However, the exact null distributions are highly non-normal, and there are problems with both computation and asymptotic theory.

15.
ABSTRACT

Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p < 0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p < 0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values.
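
For p < 1/e, the Bayes factor bound referred to above is commonly given as 1/(-e p log p); a short R check shows how modest the implied evidence is.

```r
# Upper bound on the odds in favor of the alternative implied by a p-value
# (valid for p < 1/e); values for conventional thresholds.
bf_bound <- function(p) ifelse(p < exp(-1), 1 / (-exp(1) * p * log(p)), 1)
round(bf_bound(c(0.05, 0.01, 0.005, 0.001)), 2)
# p = 0.05 caps the odds at roughly 2.5 to 1, far weaker evidence than the
# "1 in 20" intuition suggests.
```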

16.
ABSTRACT

Online consumer product ratings data are increasing rapidly. While most of the current graphical displays mainly represent the average ratings, Ho and Quinn proposed an easily interpretable graphical display based on an ordinal item response theory (IRT) model, which successfully accounts for systematic interrater differences. Conventionally, the discrimination parameters in IRT models are constrained to be positive, particularly in the modeling of scored data from educational tests. In this article, we use real-world ratings data to demonstrate that such a constraint can have a great impact on the parameter estimation. This impact on estimation was explained through rater behavior. We also discuss correlation among raters and assess the prediction accuracy for both the constrained and the unconstrained models. The results show that the unconstrained model performs better when a larger fraction of rater pairs exhibit negative correlations in ratings.

17.
This paper describes the author's research connecting the empirical analysis of treatment response with the normative analysis of treatment choice under ambiguity. Imagine a planner who must choose a treatment rule assigning a treatment to each member of a heterogeneous population of interest. The planner observes certain covariates for each person. Each member of the population has a response function mapping treatments into a real-valued outcome of interest. Suppose that the planner wants to choose a treatment rule that maximizes the population mean outcome. An optimal rule assigns to each member of the population a treatment that maximizes mean outcome conditional on the person's observed covariates. However, identification problems in the empirical analysis of treatment response commonly prevent planners from knowing the conditional mean outcomes associated with alternative treatments; hence planners commonly face problems of treatment choice under ambiguity. The research surveyed here characterizes this ambiguity in practical settings where the planner may be able to bound but not identify the relevant conditional mean outcomes. The statistical problem of treatment choice using finite-sample data is discussed as well.

18.
This article introduces a class of statistical tests for the hypothesis that some feature that is present in each of several variables is common to them. Features are data properties such as serial correlation, trends, seasonality, heteroscedasticity, autoregressive conditional heteroscedasticity, and excess kurtosis. A feature is detected by a hypothesis test taking no feature as the null, and a common feature is detected by a test that finds linear combinations of variables with no feature. Often, a test with an exact asymptotic critical value can be obtained simply as a test of overidentifying restrictions in an instrumental variables regression. This article tests for a common international business cycle.

19.

In this article we examine the effect that logarithmic and power transformations have on the order of integration in raw time series. For this purpose, we use a version of the tests of Robinson (1994) that permits us to test I(d) statistical models. The results, obtained via Monte Carlo, show that such transformations have no effect on the degree of dependence of the series, making them useful mechanisms to apply when a more plausible economic interpretation of the data is required.

20.
ABSTRACT

Two-dimensionally indexed random coefficient autoregressive (2D-RCAR) models and the corresponding statistical inference are important tools for the analysis of spatial lattice data. The study of such models is motivated by their second-order properties, which are similar to those of 2D-(G)ARCH models that play an important role in spatial econometrics. In this article, we study the asymptotic properties of the two-stage generalized method of moments (2S-GMM) estimator under a general asymptotic framework for 2D-RCA models. The efficiency, strong consistency, and asymptotic normality of the 2S-GMM estimator, as well as associated hypothesis tests, are derived. A simulation experiment is presented to highlight the theoretical results.
