Similar Documents (20 results)
1.
ABSTRACT

The cost and time of pharmaceutical drug development continue to grow at rates that many say are unsustainable. These trends have enormous impact on what treatments get to patients, when they get them and how they are used. The statistical framework for supporting decisions in regulated clinical development of new medicines has followed a traditional path of frequentist methodology. Trials using hypothesis tests of “no treatment effect” are done routinely, and the p-value < 0.05 is often the determinant of what constitutes a “successful” trial. Many drugs fail in clinical development, adding to the cost of new medicines, and some evidence lays blame on deficiencies of the frequentist paradigm. An unknown number of effective medicines may have been abandoned because trials were declared “unsuccessful” due to a p-value exceeding 0.05. Recently, the Bayesian paradigm has shown utility in the clinical drug development process for its probability-based inference. We argue for a Bayesian approach that employs data from other trials as a “prior” for Phase 3 trials, so that evidence synthesized across trials can be used to compute probability statements that are valuable for understanding the magnitude of treatment effect. Such a Bayesian paradigm provides a promising framework for improving statistical inference and regulatory decision making.
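As a hedged illustration of the kind of probability statement such a paradigm enables (not the authors' model), the sketch below performs a conjugate normal-normal update in which a prior summarizing earlier-phase trials is combined with a Phase 3 estimate; all numbers are invented.

```python
# A minimal sketch, assuming a normal prior pooled from earlier trials and
# a normal approximation to the Phase 3 treatment-effect estimate; the
# posterior probability that the effect exceeds zero is then reported.
# All numbers are hypothetical.
from scipy.stats import norm

prior_mean, prior_sd = 1.5, 1.0      # effect summary from earlier trials
data_mean, data_se = 2.0, 0.8        # Phase 3 estimate and standard error

# Conjugate normal-normal update (known-variance approximation).
prec_prior, prec_data = prior_sd**-2, data_se**-2
post_prec = prec_prior + prec_data
post_mean = (prec_prior * prior_mean + prec_data * data_mean) / post_prec
post_sd = post_prec**-0.5

print(f"P(effect > 0 | data) = {norm.sf(0, loc=post_mean, scale=post_sd):.3f}")
```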

2.
We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are “close” to the chosen “base model,” and compares the error rates of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure “Next-Door analysis” since it examines models “next” to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library. The Canadian Journal of Statistics 48: 447–470; 2020 © 2020 Statistical Society of Canada
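The deletion-and-refit loop is easy to mimic; the sketch below uses scikit-learn's LassoCV in place of the authors' glmnet-based R library, and a crude 10% cross-validated-error threshold stands in for the paper's formal assessment of "significant deterioration," so this is an illustration of the idea rather than the published procedure.

```python
# A simplified Next-Door-style loop: fit a base lasso, then drop each
# selected predictor, refit, and compare cross-validated error with the
# base model. Threshold and data are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

base = LassoCV(cv=5, random_state=0).fit(X, y)
chosen = np.flatnonzero(base.coef_)
base_err = -cross_val_score(base, X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()

for j in chosen:
    X_minus = np.delete(X, j, axis=1)        # delete predictor j and refit
    err = -cross_val_score(LassoCV(cv=5, random_state=0), X_minus, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    verdict = "indispensable" if err > 1.1 * base_err else "acceptable alternative"
    print(f"drop x{j}: CV-MSE {err:.1f} vs base {base_err:.1f} -> {verdict}")
```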

3.
ABSTRACT

Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly answer this question and are often misinterpreted in ways that lead to overstating the evidence against the null hypothesis. Even in the “post p < 0.05 era,” however, it is quite possible that p-values will continue to be widely reported and used to assess the strength of evidence (if for no other reason than the widespread availability and use of statistical software that routinely produces p-values and thereby implicitly advocates for their use). If so, the potential for misinterpretation will persist. In this article, we recommend three practices that would help researchers more accurately interpret p-values. Each of the three recommended practices involves interpreting p-values in light of their corresponding “Bayes factor bound,” which is the largest odds in favor of the alternative hypothesis relative to the null hypothesis that is consistent with the observed data. The Bayes factor bound generally indicates that a given p-value provides weaker evidence against the null hypothesis than typically assumed. We therefore believe that our recommendations can guard against some of the most harmful p-value misinterpretations. In research communities that are deeply attached to reliance on “p < 0.05,” our recommendations will serve as initial steps away from this attachment. We emphasize that our recommendations are intended merely as initial, temporary steps and that many further steps will need to be taken to reach the ultimate destination: a holistic interpretation of statistical evidence that fully conforms to the principles laid out in the ASA statement on statistical significance and p-values.
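The abstract does not state the bound's formula; the sketch below assumes the well-known Sellke-Bayarri-Berger bound −1/(e·p·ln p), valid for p < 1/e, which reproduces the often-quoted result that p = 0.05 supports at most about 2.5-to-1 odds against the null.

```python
# Assuming the Sellke-Bayarri-Berger form of the Bayes factor bound:
# the largest odds in favor of the alternative that a p-value can support
# is -1 / (e * p * ln p), for p < 1/e.
import math

for p in (0.05, 0.01, 0.005):
    bound = -1.0 / (math.e * p * math.log(p))
    print(f"p = {p}: Bayes factor bound = {bound:.1f} to 1")
```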

4.
ABSTRACT

The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: that P-values exhibit extreme variation across samples (and thus are “unreliable”), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate evidence against those hypotheses. These problems are not, however, properties of P-values but are faults of researchers who focus on null hypotheses and overstate evidence based on misperceptions that p = 0.05 represents enough evidence to reject hypotheses. Those problems are easily seen without use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) −log2(p).
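The S-value translation is directly computable; for example, p = 0.05 carries −log2(0.05) ≈ 4.3 bits of information against the hypothesis, roughly as surprising as four heads in a row from a fair coin.

```python
# The surprisal translation described above: s = -log2(p) measures the
# information against the tested model in bits.
import math

for p in (0.5, 0.05, 0.005):
    print(f"p = {p}: s = {-math.log2(p):.2f} bits")
```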

5.
This article provides a strategy to identify the existence and direction of a causal effect in a generalized nonparametric and nonseparable model identified by instrumental variables. The causal effect concerns how the outcome depends on the endogenous treatment variable. The outcome variable, treatment variable, other explanatory variables, and the instrumental variable can be essentially any combination of continuous, discrete, or “other” variables. In particular, it is not necessary to have any continuous variables, none of the variables need to have large support, and the instrument can be binary even if the corresponding endogenous treatment variable and/or outcome is continuous. The outcome can be mismeasured or interval-measured, and the endogenous treatment variable need not even be observed. The identification results are constructive, and can be empirically implemented using standard estimation results.

6.
7.
In this paper optimal experimental designs for multilevel models with covariates and two levels of nesting are considered. Multilevel models are used to describe the relationship between an outcome variable and a treatment condition and covariate. It is assumed that the outcome variable is measured on a continuous scale. D-optimality and L-optimality are chosen as optimality criteria. It is shown that pre-stratification on the covariate leads to a more efficient design and that the person level is the optimal level of randomization. Furthermore, optimal sample sizes are given, and it is shown that these do not depend on the optimality criterion when randomization is done at the group level.

8.
This paper describes a computer program GTEST for designing group testing experiments for classifying each member of a population of items as “good” or “defective”. The outcome of a test on a group of items is either “negative” (if all items in the group are good) or “positive” (if at least one of the items is defective, but it is not known which). GTEST is based on a Bayesian approach. At each stage, it attempts to (nearly) maximize the expected reduction in “entropy”, a quantitative measure of the amount of uncertainty about the state of the items. The user controls the procedure through specification of the prior probabilities of being defective, restrictions on the construction of the test group, and priorities that are assigned to the items. The nominal prior probabilities can be modified adaptively, to reduce the sensitivity of the procedure to the proportion of defectives in the population.
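GTEST itself is not reproduced here, but the entropy criterion can be sketched: with independent prior defect probabilities, the test outcome is deterministic given the item states, so the expected entropy reduction equals the binary entropy of P(negative) = Π(1 − p_i), and a greedy step favors groups whose negative-probability is near 1/2. Restrictions, priorities, and adaptive priors are omitted.

```python
# A sketch of one greedy entropy step under independent Bernoulli priors;
# not the GTEST program itself.
import math
from itertools import combinations

def expected_entropy_reduction(priors):
    q = math.prod(1.0 - p for p in priors)       # P(test is negative)
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

priors = [0.05, 0.10, 0.20, 0.40]                # hypothetical defect priors
best = max((g for r in range(1, len(priors) + 1)
            for g in combinations(range(len(priors)), r)),
           key=lambda g: expected_entropy_reduction([priors[i] for i in g]))
print("best group to test:", best)
```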

9.
A new two-sample rank test for location is proposed. This test, called the D-test, is asymptotically efficient for underlying densities which follow a “flat-topped” Laplace distribution. The D-statistic is simple to compute, and the test may be suitable when there is censoring. The D-test includes the median test as a special case.

10.
The last decade saw enormous progress in the development of causal inference tools to account for noncompliance in randomized clinical trials. With survival outcomes, structural accelerated failure time (SAFT) models enable causal estimation of effects of observed treatments without making direct assumptions on the compliance selection mechanism. The traditional proportional hazards model has, however, rarely been used for causal inference. The estimator proposed by Loeys and Goetghebeur (2003, Biometrics, vol. 59, pp. 100–105) is limited to the setting of all-or-nothing exposure. In this paper, we propose an estimation procedure for more general causal proportional hazards models linking the distribution of potential treatment-free survival times to the distribution of observed survival times via observed (time-constant) exposures. Specifically, we first build models for observed exposure-specific survival times. Next, using the proposed causal proportional hazards model, the exposure-specific survival distributions are backtransformed to their treatment-free counterparts to obtain, after proper mixing, the unconditional treatment-free survival distribution. Estimation of the parameter(s) in the causal model is then based on minimizing a test statistic for equality of the backtransformed survival distributions between randomized arms.

11.
12.
P. Reimnitz, Statistics, 2013, 47(2): 245–263
The classical “Two-Armed Bandit” problem with Bernoulli-distributed outcomes is considered. First, the terms “asymptotic nearly admissibility” and “asymptotic nearly optimality” are defined. A nontrivial asymptotically nearly admissible and (with respect to a certain Bayes risk) asymptotically nearly optimal strategy is presented, and these properties are then shown. Finally, it is discussed how these results generalize to the non-Bernoulli case and the “k-Armed Bandit” problem (k ≥ 2).
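The abstract does not describe the strategy itself; purely for context, here is a standard Bayesian strategy for the same Bernoulli two-armed bandit (Thompson sampling with Beta(1, 1) priors), which is explicitly not the paper's method.

```python
# Thompson sampling for a Bernoulli two-armed bandit: sample a success
# probability per arm from its Beta posterior, play the arm with the
# larger draw, and update. Arm probabilities here are invented.
import random

def thompson_two_armed(true_p=(0.3, 0.6), horizon=1000, seed=1):
    rng = random.Random(seed)
    wins, losses = [1, 1], [1, 1]                # Beta(1, 1) priors per arm
    total = 0
    for _ in range(horizon):
        arm = max((0, 1), key=lambda a: rng.betavariate(wins[a], losses[a]))
        reward = rng.random() < true_p[arm]
        wins[arm] += reward
        losses[arm] += 1 - reward
        total += reward
    return total

print("total reward:", thompson_two_armed())
```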

13.
There are two distinct definitions of “P-value” for evaluating a proposed hypothesis or model for the process generating an observed dataset. The original definition starts with a measure of the divergence of the dataset from what was expected under the model, such as a sum of squares or a deviance statistic. A P-value is then the ordinal location of the measure in a reference distribution computed from the model and the data, and is treated as a unit-scaled index of compatibility between the data and the model. In the other definition, a P-value is a random variable on the unit interval whose realizations can be compared to a cutoff α to generate a decision rule with known error rates under the model and specific alternatives. It is commonly assumed that realizations of such decision P-values always correspond to divergence P-values. But this need not be so: Decision P-values can violate intuitive single-sample coherence criteria where divergence P-values do not. It is thus argued that divergence and decision P-values should be carefully distinguished in teaching, and that divergence P-values are the relevant choice when the analysis goal is to summarize evidence rather than implement a decision rule.
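A divergence P-value in this original sense is easy to compute by simulation: choose a divergence measure, build its reference distribution under the model, and report the observed measure's ordinal location. A minimal sketch, with an invented N(0, 1) model and a sum-of-squares divergence:

```python
# Divergence P-value by Monte Carlo: the ordinal location of the observed
# divergence statistic within a reference distribution simulated from the
# model itself. Model and data are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(0.3, 1.0, size=50)          # data; model says N(0, 1)

def divergence(x):                                # sum-of-squares divergence
    return np.sum(x**2)

d_obs = divergence(observed)
ref = np.array([divergence(rng.normal(0.0, 1.0, size=50))
                for _ in range(10_000)])          # reference distribution
p = np.mean(ref >= d_obs)                         # unit-scaled compatibility
print(f"divergence P-value = {p:.3f}")
```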

14.
ABSTRACT

Various approaches can be used to construct a model from a null distribution and a test statistic. I prove that one such approach, originating with D. R. Cox, has the property that the p-value is never greater than the Generalized Likelihood Ratio (GLR). When combined with the general result that the GLR is never greater than any Bayes factor, we conclude that, under Cox’s model, the p-value is never greater than any Bayes factor. I also provide a generalization, illustrations for the canonical Normal model, and an alternative approach based on sufficiency. This result is relevant for the ongoing discussion about the evidential value of small p-values, and the movement among statisticians to “redefine statistical significance.”
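For the canonical Normal model the inequality can be checked numerically: testing H0: μ = 0 with known unit variance, the generalized likelihood ratio of the null to the unrestricted maximum is exp(−z²/2), and the two-sided p-value 2Φ(−z) never exceeds it. This is a numeric illustration consistent with the stated result, not a reproduction of Cox's construction.

```python
# Check p <= GLR for the canonical Normal model: p = 2*Phi(-z),
# GLR = exp(-z**2 / 2) (null likelihood over unrestricted maximum).
import math
from scipy.stats import norm

for z in (0.5, 1.0, 1.96, 3.0):
    p = 2 * norm.sf(z)
    glr = math.exp(-z**2 / 2)
    print(f"z = {z}: p = {p:.4f} <= GLR = {glr:.4f}: {p <= glr}")
```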

15.
The importance of interval forecasts is reviewed. Several general approaches to calculating such forecasts are described and compared. They include the use of theoretical formulas based on a fitted probability model (with or without a correction for parameter uncertainty), various “approximate” formulas (which should be avoided), and empirically based, simulation, and resampling procedures. The latter are useful when theoretical formulas are not available or there are doubts about some model assumptions. The distinction between a forecasting method and a forecasting model is expounded. For large groups of series, a forecasting method may be chosen in a fairly ad hoc way. With appropriate checks, it may be possible to base interval forecasts on the model for which the method is optimal. It is certainly unsound to use a model for which the method is not optimal, but, strangely, this is sometimes done. Some general comments are made as to why prediction intervals tend to be too narrow in practice to encompass the required proportion of future observations. An example demonstrates the overriding importance of careful model specification. In particular, when data are “nearly nonstationary,” the difference between fitting a stationary and a nonstationary model is critical.
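A minimal sketch of the simulation/resampling route: fit a model (here a crude AR(1) by least squares), simulate many future paths with resampled residuals, and read off empirical quantiles. Parameter uncertainty is deliberately ignored, one of the reasons such intervals come out too narrow in practice; the data and model are invented.

```python
# Simulation-based interval forecast for an AR(1) fitted by least squares,
# with residual resampling and no parameter-uncertainty correction.
import numpy as np

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(0, 1, 200)) * 0.1 + rng.normal(0, 1, 200)

phi = np.polyfit(y[:-1], y[1:], 1)               # y_t ~ a * y_{t-1} + b
resid = y[1:] - np.polyval(phi, y[:-1])

h, n_sims = 10, 5000
paths = np.empty((n_sims, h))
for s in range(n_sims):
    last = y[-1]
    for t in range(h):
        last = np.polyval(phi, last) + rng.choice(resid)
        paths[s, t] = last

lo, hi = np.percentile(paths, [2.5, 97.5], axis=0)
print("95% interval for horizon 1:", (round(lo[0], 2), round(hi[0], 2)))
```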

16.
ABSTRACT

To estimate causal treatment effects, we propose a new matching approach based on the reduced covariates obtained from sufficient dimension reduction. Compared with the original covariates and the propensity score, which are commonly used for matching in the literature, the reduced covariates are nonparametrically estimable and are effective in imputing the missing potential outcomes, under a mild assumption on the low-dimensional structure of the data. Under the ignorability assumption, the consistency of the proposed approach requires a weaker common support condition. In addition, researchers are allowed to employ different reduced covariates to find matched subjects for different treatment groups. We develop relevant asymptotic results and conduct simulation studies as well as real data analysis to illustrate the usefulness of the proposed approach.
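The authors' estimator is not reproduced here; as a hedged sketch of the idea, the code below reduces the covariates with one-direction sliced inverse regression (SIR) learned on the control group, then matches each treated unit to its nearest control in the reduced covariate. The data-generating process is invented.

```python
# Matching on an SIR-reduced covariate instead of the propensity score:
# a sketch under an unconfounded, single-index outcome model.
import numpy as np

def sir_direction(X, y, n_slices=5):
    """Leading sliced-inverse-regression direction, in X coordinates."""
    mu = X.mean(0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Linv = np.linalg.inv(L)
    Z = (X - mu) @ Linv.T                        # whitened covariates
    slices = np.array_split(np.argsort(y), n_slices)
    M = sum(len(s) / len(y) * np.outer(Z[s].mean(0), Z[s].mean(0))
            for s in slices)
    v = np.linalg.eigh(M)[1][:, -1]              # leading eigenvector
    return Linv.T @ v

rng = np.random.default_rng(3)
n, p = 500, 6
X = rng.normal(size=(n, p))
treat = rng.random(n) < 0.5
y = X[:, 0] + 0.5 * X[:, 1] + treat * 1.0 + rng.normal(0, 1, n)

beta = sir_direction(X[~treat], y[~treat])       # learn reduction on controls
reduced = X @ beta                               # reduced covariate, everyone
controls = np.flatnonzero(~treat)
matches = controls[np.argmin(np.abs(reduced[treat][:, None]
                                    - reduced[controls][None, :]), axis=1)]
print(f"matched ATT estimate: {np.mean(y[treat] - y[matches]):.2f} (truth 1.0)")
```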

17.
For the problem of estimating a parameter θ when θ is known to lie in a closed, convex subset D of R^k, conditions are given under which estimators δ of θ cannot be Bayes estimators, as well as conditions under which δ is inadmissible. The estimators considered are so-called “boundary estimators”. Maximum-likelihood estimators in truncated parameter spaces are examples to which our results often apply. For the special case when k = 1 and D is compact, two classes of estimators dominating the inadmissible ones are constructed. Some examples are given.
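A standard concrete instance of such a boundary estimator: for X ~ N(θ, 1) with θ restricted to the closed convex set [0, ∞), the maximum-likelihood estimator projects X onto the set and sits on the boundary whenever X < 0.

```python
# Truncated-parameter-space MLE as a boundary estimator: project the
# unrestricted estimate onto the constraint set.
def truncated_mle(x, lower=0.0):
    return max(x, lower)

print(truncated_mle(-1.3))   # boundary value 0.0
print(truncated_mle(0.7))    # interior value 0.7
```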

18.
An important problem in fitting local linear regression is the choice of the smoothing parameter. As the smoothing parameter becomes large, the estimator tends to a straight line, which is the least squares fit in the ordinary linear regression setting. This property may be used to assess the adequacy of a simple linear model. Motivated by Silverman's (1981) work in kernel density estimation, a suitable test statistic is the critical smoothing parameter at which the estimate changes from nonlinear to linear, where linearity or nonlinearity is judged by a more precise criterion. We define the critical smoothing parameter through the approximate F-tests of Hastie and Tibshirani (1990). To assess significance, the “wild bootstrap” procedure is used to replicate the data, and the proportion of bootstrap samples that give a nonlinear estimate when using the critical bandwidth is taken as the p-value. Simulation results show that the critical smoothing test is useful in detecting a wide range of alternatives.
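A compressed sketch of the test follows; a crude curvature threshold stands in for the Hastie-Tibshirani approximate F-test, and Rademacher weights implement the wild bootstrap under the fitted straight line, so thresholds and data are illustrative only.

```python
# Critical-smoothing test sketch: find the smallest bandwidth at which a
# local linear fit is judged linear, then wild-bootstrap a p-value.
import numpy as np

def local_linear(x, y, h, grid):
    fit = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = np.exp(-0.5 * ((x - g) / h) ** 2)     # Gaussian kernel weights
        b = np.polyfit(x, y, 1, w=np.sqrt(w))     # weighted line fit at g
        fit[i] = np.polyval(b, g)
    return fit

def nonlinear(x, y, h, tol=0.15):                 # crude stand-in criterion
    grid = np.linspace(x.min(), x.max(), 40)
    smooth = local_linear(x, y, h, grid)
    line = np.polyval(np.polyfit(x, y, 1), grid)
    return np.max(np.abs(smooth - line)) > tol

rng = np.random.default_rng(7)
x = np.sort(rng.uniform(-2, 2, 120))
y = 0.5 * x + 0.4 * np.sin(2 * x) + rng.normal(0, 0.3, 120)

hs = np.linspace(0.05, 3.0, 60)
h_crit = next(h for h in hs if not nonlinear(x, y, h))  # critical bandwidth

ols_fit = np.polyval(np.polyfit(x, y, 1), x)
resid = y - ols_fit
p = np.mean([nonlinear(x, ols_fit + resid * rng.choice([-1, 1], len(x)),
                       h_crit) for _ in range(200)])    # wild bootstrap
print(f"critical bandwidth = {h_crit:.2f}, p-value = {p:.3f}")
```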

19.
In searching for the “best” growth inhibitor, we decided to consider growth inhibition in terms of the lengths of the terminal sprouts. For it is logical to infer that the trees with the longer sprouts (after a 20-month period) will most likely be the ones that will need trimming in the future. Additionally, we reasoned that if a particular treatment produced a smaller proportion of “long” sprouts, then it would be a more effective growth inhibitor. It was now necessary to define what was meant by “long”. After consultation with foresters we chose cutoff lengths of 15.0, 25.0 and 35.0 cm. Hence the response variable was chosen to be the proportion of the terminal sprouts on a tree that exceeded a specified cutoff length. By varying the cutoff lengths, we would minimize the effect of the arbitrariness involved in choosing one particular length.
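The response variable is simple to compute; for one hypothetical tree:

```python
# Proportion of terminal sprout lengths (cm) exceeding each cutoff, the
# response variable described above. Sprout lengths are invented.
sprouts = [8.2, 12.5, 17.0, 22.3, 28.1, 33.4, 40.2]
for cutoff in (15.0, 25.0, 35.0):
    prop = sum(length > cutoff for length in sprouts) / len(sprouts)
    print(f"cutoff {cutoff} cm: proportion long = {prop:.2f}")
```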

20.
This paper sets out to identify the abilities that a person needs to be able to successfully use an experimental device, such as a probability wheel or balls in an urn, for the elicitation of subjective probabilities. It is assumed that the successful use of the device requires that the person elicits unique probability values that obey the standard probability laws. This leads to a definition of probability based on the idea of the similarity between the likeliness of events and this concept is naturally extended to the idea that probabilities have strengths, which relates to information about the likeliness of an event that lies beyond a simple probability value. The latter notion is applied to the problem of explaining the Ellsberg paradox. To avoid the definition of probability being circular, probabilities are defined such that they depend on the choice of a reference set of events R which, in simple cases, corresponds to the raw outcomes produced by using an experimental device. However, it is shown that even when the events in R are considered as having an “equal chance” of occurring, the values and/or strengths of probabilities can still be affected by the choice of the set R.
