Similar Articles
1.
ABSTRACT

Scientific research of all kinds should be guided by statistical thinking: in the design and conduct of the study, in the disciplined exploration and enlightened display of the data, and in the interpretation of the results, where statistical pitfalls must be avoided. However, formal, probability-based statistical inference should play no role in most scientific research, which is inherently exploratory and requires flexible methods of analysis that risk overfitting. The nature of exploratory work is that the data are used to help guide model choice, and under these circumstances uncertainty cannot be precisely quantified, because of the model selection bias that inevitably results. To be valid, statistical inference should be restricted to situations where the study design and analysis plan are specified prior to data collection. Exploratory data analysis provides the flexibility needed for most other situations, including statistical methods that are regularized, robust, or nonparametric. Of course, no individual statistical analysis should be considered sufficient to establish scientific validity: research requires many sets of data along many lines of evidence, with a watchfulness for systematic error. Replicating and predicting findings in new data and new settings is a stronger way of validating claims than blessing results from an isolated study with statistical inferences.

2.
Covariance matrices, or more generally matrices of sums of squares and cross-products, are used as input to many multivariate analysis techniques. The eigenvalues of these matrices play an important role in the statistical analysis of data, including estimation and hypothesis testing. It has long been recognized that one or a few observations can exert undue influence on the eigenvalues of a covariance matrix. The relationship between the eigenvalues of the covariance matrix computed from all the data and the eigenvalues of the perturbed covariance matrix (the covariance matrix computed after a small subset of the observations has been deleted) cannot in general be written in closed form. Two methods for approximating the eigenvalues of a perturbed covariance matrix, for the case of perturbation by a single observation, were suggested by Hadi (1988) and by Wang and Nyquist (1991). In this paper we improve on these two methods and give additional theoretical results that may offer further insight into the problem. We also compare the two improved approximations in terms of their accuracy.
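As context for the problem setting, the sketch below compares the exact eigenvalues after deleting one observation with a generic first-order perturbation approximation, λ_k(S + E) ≈ λ_k + v_kᵀEv_k. This is a minimal illustration on simulated data, not the Hadi (1988) or Wang and Nyquist (1991) approximations themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

S = np.cov(X, rowvar=False)                  # covariance from all observations
lam, V = np.linalg.eigh(S)                   # unperturbed eigenpairs

i = 7                                        # index of the observation to delete
S_del = np.cov(np.delete(X, i, axis=0), rowvar=False)  # exact perturbed matrix

E = S_del - S                                # small symmetric perturbation
lam_approx = lam + np.diag(V.T @ E @ V)      # lambda_k + v_k' E v_k

print("exact :", np.linalg.eigvalsh(S_del))
print("approx:", np.sort(lam_approx))
```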

3.
4.
Open Data (OD) is an emerging term in the process of defining how scientific data may be published and re-used without price or permission barriers. Scientists generally see published data as belonging to the scientific community, but many publishers claim copyright over data and will not allow its re-use without permission. This is a major impediment to the progress of scholarship in the digital age. This article reviews the need for Open Data, shows examples of why Open Data are valuable, and summarizes some early initiatives in formalizing the right of access to and re-use of scientific data.

5.
Identifying important biomarkers that are predictive of cancer patients' prognosis is key to gaining better insight into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created a high demand for efficient screening tools for selecting predictive biomarkers. The vast number of biomarkers defies any existing variable selection method via regularization. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful at detecting marginally weak but jointly important signals. We propose a new conditional screening method for survival outcome data that computes the marginal contribution of each biomarker given a priori known biological information, based on the premise that some biomarkers are known in advance to be associated with disease outcomes. Our method possesses the sure screening property and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and an analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods for analyzing survival data.
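A schematic sketch of how such conditional screening might be computed with a Cox working model: each candidate biomarker is scored by its contribution after adjusting for the a priori known biomarkers. The data, column names, and scoring rule here are hypothetical illustrations of the idea, not the authors' exact procedure.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.normal(size=(n, 4)),
                  columns=['known_1', 'bm_1', 'bm_2', 'bm_3'])
df['time'] = rng.exponential(scale=np.exp(-0.5 * df['known_1'] - 0.8 * df['bm_2']))
df['event'] = rng.integers(0, 2, n)          # toy censoring indicator

KNOWN = ['known_1']                          # biomarkers taken as important a priori
CANDIDATES = ['bm_1', 'bm_2', 'bm_3']        # biomarkers to screen

def conditional_utility(candidate):
    """|z| of the candidate's coefficient in a Cox model adjusted for KNOWN."""
    cols = ['time', 'event'] + KNOWN + [candidate]
    cph = CoxPHFitter().fit(df[cols], duration_col='time', event_col='event')
    return abs(cph.summary.loc[candidate, 'z'])

# Rank candidates by conditional contribution and keep the top d of them
# (d might be chosen as n / log(n), as is common in the screening literature).
ranked = sorted(CANDIDATES, key=conditional_utility, reverse=True)
print(ranked)   # bm_2, the truly associated candidate, should rank first
```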

6.
Time-to-event outcome trials in clinical research are typically large, expensive and high-profile affairs. Such trials are commonplace in the oncology and cardiovascular therapeutic areas but are also seen in other areas, such as respiratory, in indications like chronic obstructive pulmonary disease. Their progress is closely monitored and their results are often eagerly awaited. Once available, the top-line result is often big news, at least within the therapeutic area in which the trial was conducted, and the data are subsequently fully scrutinized in a series of high-profile publications. In such circumstances, the statistician has a vital role to play in the design, conduct, analysis and reporting of the trial. In particular, in drug development it is incumbent on the statistician to ensure at the outset that the sizing of the trial is fully appreciated by their medical, and other non-statistical, drug development team colleagues, and that the risk of delivering a statistically significant but clinically unpersuasive result is minimized. The statistician also has a key role in advising the team when, early in the life of an outcomes trial, a lower than anticipated event rate appears to be emerging. This paper highlights some of the important features of outcome trial sample sizing and makes a number of simple recommendations aimed at ensuring a better, common understanding of the interplay between sample size and power and of the final result required to provide a statistically positive and clinically persuasive outcome.
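To make the interplay between targeted effect size and trial size concrete, here is the standard Schoenfeld back-of-the-envelope calculation for the number of events a log-rank comparison needs. This is textbook material used as an illustration, not a formula taken from the paper, and the hazard ratios are invented.

```python
import math
from scipy.stats import norm

def required_events(hr, alpha=0.05, power=0.90, alloc=0.5):
    """Events needed for a two-sided log-rank test to detect hazard ratio hr."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2)

for hr in (0.70, 0.80, 0.85):
    print(f"HR = {hr}: about {required_events(hr):.0f} events")
# HR = 0.80 requires roughly 844 events: if the event rate comes in lower than
# planned, follow-up must be extended or recruitment increased to reach that total.
```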


8.
赵彦云 《统计研究》2015,32(6):3-10
This paper argues that big data statistics raises three questions: as big data grows without bound, how will the data and information of human society change? Will the development of big data produce data garbage that harms social progress? And if big data truly amounts to a revolution, what form should the thorough inheritance and renewal of statistics, as the science of data, take? In answer, the paper proposes a statistical-design perspective on big data development, supports it with theoretical and practical analysis, and, drawing on China's actual conditions, discusses the theory and key content of statistical design for big data development in China.

9.
This article provides a unified methodology for meta-analysis that synthesizes medical evidence using both available individual patient data (IPD) and published summary statistics within the framework of the likelihood principle. The most up-to-date scientific evidence on medicine is crucial information not only to consumers but also to decision makers, and it can only be obtained when existing evidence from the literature and the most recent individual patient data are optimally synthesized. We propose a general linear mixed effects model for conducting meta-analyses when individual patient data are available for only some of the studies and summary statistics have to be used for the rest. Our approach includes as special cases both traditional meta-analysis, in which only summary statistics are available for all studies, and the other extreme, in which individual patient data are available for all studies. We implement the proposed model with statistical procedures from standard computing packages and provide measures of heterogeneity based on the proposed model. Finally, we demonstrate the methodology with a real-life example studying cerebrospinal fluid biomarkers to identify individuals at high risk of developing Alzheimer's disease while they are still cognitively normal.
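As a much-simplified sketch of the general idea, the code below reduces an IPD study to an (estimate, standard error) pair and pools it with literature summary statistics using the standard DerSimonian-Laird random-effects estimator. The authors' unified linear mixed model is richer than this; the numbers and function names here are illustrative.

```python
import numpy as np

def ipd_to_summary(y_treat, y_ctrl):
    """Reduce raw arm-level data from an IPD study to (estimate, SE)."""
    est = y_treat.mean() - y_ctrl.mean()
    se = np.sqrt(y_treat.var(ddof=1) / len(y_treat) +
                 y_ctrl.var(ddof=1) / len(y_ctrl))
    return est, se

def dersimonian_laird(est, se):
    """Random-effects pooling of per-study estimates and standard errors."""
    est, se = np.asarray(est, float), np.asarray(se, float)
    w = 1.0 / se ** 2
    mu_fe = np.sum(w * est) / np.sum(w)
    Q = np.sum(w * (est - mu_fe) ** 2)                      # heterogeneity
    k = len(est)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_re = 1.0 / (se ** 2 + tau2)                           # random-effects weights
    mu = np.sum(w_re * est) / np.sum(w_re)
    return mu, np.sqrt(1.0 / np.sum(w_re)), tau2

# Two studies known only through the literature, one with raw patient data:
rng = np.random.default_rng(1)
ipd_est, ipd_se = ipd_to_summary(rng.normal(0.4, 1, 120), rng.normal(0.0, 1, 120))
print(dersimonian_laird([0.30, 0.55, ipd_est], [0.10, 0.15, ipd_se]))
```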

10.
In drug development, we ask ourselves which population, endpoint and treatment comparison should be investigated. In this context, we also debate what matters most to the different stakeholders involved in clinical drug development, for example, patients, physicians, regulators and payers. With the publication of the draft ICH E9 addendum on estimands in 2017, we now have a common framework and language for discussing such questions in an informed and transparent way. This has made the estimand discussion a key element of study development, including the design, analysis and interpretation of a treatment effect. At an invited session at the 2018 PSI annual conference, PSI hosted a role-play debate whose aim was to mimic a regulatory and payer scientific advice discussion for a COPD drug, including role-play views from an industry sponsor, a patient, a regulator and a payer. This paper presents the invented COPD case-study design, and considerations relating to appropriate estimands are discussed by each of the stakeholders from their differing viewpoints, with the additional inclusion of a technical (academic) perspective. The rationale for each perspective on approaches to handling intercurrent events is presented, with a key emphasis on the application of while-on-treatment and treatment policy estimands in this context. It is increasingly recognised that the treatment effect estimated by the treatment policy approach may not always be of primary clinical interest and may not appropriately communicate to patients the efficacy they can expect if they take the treatment as directed.

11.
Hopes and expectations for the use and utility of new, emerging biomarkers in drug development have probably never been higher, especially in oncology. Biomarkers are exalted as vital patient selection tools in an effort to target those most likely to benefit from a new drug, and so to reduce development costs, lessen risk and expedite development times. It is further hoped that biomarkers can be used as surrogate endpoints for clinical outcomes, to demonstrate effectiveness and, ultimately, to support drug approval. However, I perceive that all is not straightforward and that, particularly in terms of the promise of accelerated drug development, biomarker strategies may not in all cases deliver the advances and advantages hoped for.

12.
Noting that several rule discovery algorithms in data mining can produce a large number of irrelevant or obvious rules, substantial research in data mining has addressed the question of what makes rules truly 'interesting'. This has resulted in the development of a number of interestingness measures and algorithms that find all interesting rules from data. However, these approaches have the drawback that many of the discovered rules, while interesting by definition, may actually (1) be obvious, in that they logically follow from other discovered rules, or (2) be expected, given some of the other discovered rules and some simple distributional assumptions. In this paper we argue that this is a paradox: rules that are supposed to be interesting are, in reality, uninteresting for the above reasons. We show that this paradox exists for various popular interestingness measures and present an abstract characterization of an approach to alleviating the paradox. Finally, we discuss existing work in data mining that addresses this issue and show how those approaches can be viewed with respect to the characterization presented here.
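A toy numerical illustration of the paradox, with made-up transactions: a standard confidence measure scores the specialized rule {a, b} → {c} just as highly as {a} → {c}, even though the longer rule is logically expected once the shorter one is known.

```python
transactions = [
    {'a', 'b', 'c'}, {'a', 'c'}, {'a', 'b', 'c'}, {'b', 'c'},
    {'a', 'c'}, {'a', 'b', 'c'}, {'c'}, {'a', 'c'},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    return support(lhs | rhs) / support(lhs)

print(confidence({'a'}, {'c'}))       # 1.0
print(confidence({'a', 'b'}, {'c'}))  # 1.0 -- scored as "interesting" by the
# measure, yet logically expected: {a, b} -> {c} follows from {a} -> {c}, so a
# screen should report it only if its confidence departs from the general rule's.
```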

13.
Detection (diagnosis) techniques play an important role in clinical medicine. Early detection of diseases can be life-saving, and the consequences of false positives and false negatives can be costly. Using a multiple-measurements strategy is a popular way to increase diagnostic accuracy. Alongside new diagnostic technology, recent advances in genomics, proteomics, and other areas have allowed some newly developed individual biomarkers measured by non-invasive and inexpensive procedures (e.g. samples from serum, urine or stool) to progress from basic discovery research to assay development. As more tests become commercially available, there is increasing interest among clinicians in requesting combinations of various non-invasive and inexpensive tests to increase diagnostic accuracy. Using information on individual test sensitivities and specificities, we propose a likelihood approach to combine individual test results and to approximate or estimate the combined sensitivities and specificities of various tests, taking into account the conditional correlations, in order to quantify system performance. To illustrate this approach, we consider an example using various combinations of diagnostic tests to detect bladder cancer.
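A minimal sketch of the likelihood idea under the simplifying assumption of conditionally independent tests (the paper additionally models conditional correlations). The sensitivities, specificities, and prevalence below are invented numbers.

```python
def post_prob(prior, results, sens, spec):
    """Posterior disease probability from binary test results via likelihood ratios."""
    odds = prior / (1 - prior)
    for r, se, sp in zip(results, sens, spec):
        odds *= se / (1 - sp) if r else (1 - se) / sp   # LR+ or LR-
    return odds / (1 + odds)

# Two positive tests and one negative at a prior prevalence of 10%:
print(post_prob(0.10, [1, 1, 0],
                sens=[0.80, 0.70, 0.90], spec=[0.85, 0.90, 0.75]))
```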

14.
Covariance matrices play an important role in many multivariate techniques, and hence good covariance estimation is crucial in this kind of analysis. In many applications a sparse covariance matrix is expected, owing to the nature of the data or for ease of interpretation. Hard thresholding, soft thresholding, and generalized thresholding were developed to this end. However, these estimators do not always yield well-conditioned covariance estimates. To obtain estimates that are both sparse and well conditioned, we propose doubly shrinkage estimators: small covariances are shrunk towards zero and the covariance matrix is then shrunk towards a diagonal matrix. Additionally, a richness index is defined to evaluate how rich a covariance matrix is. According to our simulations, the richness index serves as a good indicator for choosing the relevant covariance estimator.
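A sketch of the two shrinkage steps described above: soft-threshold the off-diagonal covariances towards zero, then shrink the whole matrix towards its diagonal to restore conditioning. The parameters lam and alpha are illustrative; the paper selects estimators via simulations and its richness index.

```python
import numpy as np

def doubly_shrunk(S, lam=0.10, alpha=0.20):
    """Soft-threshold off-diagonal entries, then shrink towards the diagonal."""
    D = np.diag(np.diag(S))
    off = S - D
    off = np.sign(off) * np.maximum(np.abs(off) - lam, 0.0)   # soft thresholding
    return (1 - alpha) * (D + off) + alpha * D                # towards diagonal

S = np.cov(np.random.default_rng(1).normal(size=(30, 5)), rowvar=False)
S_hat = doubly_shrunk(S)
print(np.linalg.cond(S), np.linalg.cond(S_hat))   # conditioning typically improves
```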

15.
Students in elementary statistics traditionally see experiments and data as words and numbers in a text. They receive little exposure to the important statistical activities of sample selection, data collection, experimental design, development of statistical models, the need for randomization, selection of factors, and so on. They often leave the first course without a firm understanding of the role of applied statistics, or of the statistician, in scientific investigations. In an attempt to improve elementary statistics education, we have developed a statistics laboratory similar to those of other elementary science courses. We discuss our experiences in teaching a laboratory component alongside the traditional elementary statistics course. In each lab session, students, working in teams, discuss the design of an experiment, carry out the experiment, and analyze their data using Minitab on a Macintosh or MS-DOS based computer. The students then individually either answer a series of short-answer questions or write a formal scientific report. The labs are designed to be relatively inexpensive and portable, and they do not require a prior background in science, statistics or computing.

16.
Summary

A measurement error model is a regression model with (substantial) measurement errors in the variables. Disregarding these measurement errors when estimating the regression parameters results in asymptotically biased estimators. Several methods have been proposed to eliminate, or at least reduce, this bias, and the relative efficiency and robustness of these methods have been compared. The paper gives an account of these endeavors. In another context, when data are of a categorical nature, classification errors play a role similar to that of measurement errors in continuous data. The paper also reviews some recent advances in this field.
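To see the basic bias concretely: under the classical error model the naive OLS slope is attenuated by the reliability ratio λ = σ_x²/(σ_x² + σ_u²), and dividing by λ undoes the bias. The simulation below is a generic illustration of this standard fact, not a method from the paper; all parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta = 100_000, 2.0
x = rng.normal(0.0, 1.0, n)              # true regressor, variance 1
u = rng.normal(0.0, 0.5, n)              # measurement error, variance 0.25
w = x + u                                # observed error-prone regressor
y = beta * x + rng.normal(0.0, 1.0, n)

b_naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)   # attenuated OLS slope
lam = 1.0 / (1.0 + 0.25)                           # reliability ratio = 0.8
print(b_naive, b_naive / lam)            # ~1.6 (biased) vs ~2.0 (corrected)
```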

17.
Response     
This article focuses on the roles consulting units can play in statistics departments, universities, and the statistics profession. An opinion on what constitutes legitimate academic activity is given. The roles endorsed are consistent with the view that statistics is both a science and an integral part of scientific methodology and hence that statistics departments need to be concerned with both the development of new methods and the proper application of those methods in the pure and applied sciences. Consulting units serve as an interface between the developers and users of statistical methodology. Communication between developers and users can be carried on effectively through consultants who are actively involved in the education of users and in the education of statistics students, either formally or informally. Such communication is also enhanced when consultants are publishing in both statistics and nonstatistics journals. Routine data analysis is not endorsed as a justifiable academic activity for consulting faculty. Supervision of graduate students on such analyses, however, can be justified. The success a consulting unit can have in playing its roles depends on appropriate funding, staffing, and administration of the unit.

18.
《Serials Review》2012,38(4):239-244
Abstract

As physical collections are increasingly pressed for space, libraries continue to look at practices such as weeding and off-site storage, coupled with services like on-demand article-level document delivery, as potential space solutions. As libraries discard long runs of print journals, though, what role do journal backfiles play going forward? When many see backfiles as space hogs, is there data that provides a compelling argument for keeping backfiles of journals, and, if so, how should those backfiles be handled and by whom? Incorporating information from an interview with Glenn Jaeger, owner of Absolute Backorder Service, interlibrary loan data from the University of Prince Edward Island, and examples from the literature, this Balance Point column looks at the role that print journal backfiles play in the library landscape.

19.
I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, with surrogate measurements of the latent variables on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In such situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. Three circumstances where analytic calculations of the optimal design are possible are considered: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed, but the outcome is binary. In each of these cases, it is often possible to considerably improve the cost efficiency of the design by appropriate selection of the sampling fractions. More complex situations arise when the data are spatially distributed: the spatial correlation can be exploited to improve exposure assignment for unmeasured locations using available measurements on neighboring locations; some approaches for informative selection of the measurement sample using location and/or exposure predictor data are considered.
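As a flavor of the optimization involved, the sketch below computes a Neyman-type second-phase allocation, n_h ∝ N_h·S_h/√c_h, that minimizes variance subject to a budget when the subsample can be stratified on fully observed data. This is a standard textbook allocation used for illustration, with invented numbers, not one of the article's specific optimal designs.

```python
import numpy as np

N = np.array([500, 300, 200])      # stratum sizes in the main study
S = np.array([1.0, 2.0, 4.0])      # anticipated within-stratum SDs
c = np.array([10.0, 10.0, 40.0])   # per-subject measurement costs
budget = 5000.0

n = N * S / np.sqrt(c)             # Neyman-type shape of the allocation
n = n * budget / np.sum(c * n)     # scale so that total cost equals the budget
n = np.minimum(np.round(n), N)     # a stratum cannot be oversampled
print(n, "sampling fractions:", n / N)
```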

20.
Summary.

Longitudinal population-based surveys are widely used in the health sciences to study patterns of change over time. In many of these data sets unique patient identifiers are not publicly available, making it impossible to link the repeated measures from the same individual directly. This poses a statistical challenge for making inferences about time trends, because repeated measures from the same individual are likely to be positively correlated: although the time trend estimated under the naïve assumption of independence is unbiased, an unbiased estimate of its variance cannot be obtained without knowledge of the subject identifiers linking repeated measures over time. We propose a simple method for obtaining a conservative estimate of variability for making inferences about trends in proportions over time, ensuring that the type I error is no greater than the specified level. The method is illustrated using longitudinal data on diabetes hospitalization proportions in South Carolina.
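The sketch below illustrates the shape of the problem with invented numbers: a weighted least-squares trend in wave-level proportions whose independence-based variance is inflated by an assumed worst-case design effect to keep the test conservative. This is a crude stand-in for, not a reproduction of, the authors' conservative variance estimator.

```python
import numpy as np
from scipy import stats

years = np.array([0, 1, 2, 3, 4])
p_hat = np.array([0.10, 0.11, 0.12, 0.13, 0.15])   # wave-level proportions
n     = np.array([2000, 2100, 1900, 2050, 2000])   # wave sample sizes

# Weighted least-squares slope and its variance under (false) independence:
w = n / (p_hat * (1 - p_hat))
xbar = np.sum(w * years) / np.sum(w)
sxx = np.sum(w * (years - xbar) ** 2)
slope = np.sum(w * (years - xbar) * p_hat) / sxx
var_indep = 1.0 / sxx

deff = 2.0                                   # assumed worst-case design effect
z = slope / np.sqrt(deff * var_indep)        # conservative trend test
print(slope, 2 * stats.norm.sf(abs(z)))
```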
