Similar Documents
Found 20 similar documents (search time: 78 ms).
1.
Recently developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas in practice it is often the case that only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large; or because only summary data are collected, as in DNA pooling experiments. In this article, we introduce a new statistical method that can accurately infer the frequencies of untyped genetic variants in these settings, and indeed substantially improve frequency estimates at typed variants in pooling experiments where observations are noisy. Our approach, which predicts each allele frequency using a linear combination of observed frequencies, is statistically straightforward, and related to a long history of the use of linear methods for estimating missing values (e.g. Kriging). The main statistical novelty is our approach to regularizing the covariance matrix estimates, and the resulting linear predictors, which is based on methods from population genetics. We find that, besides being both fast and flexible (allowing new problems to be tackled that cannot be handled by existing imputation approaches purpose-built for the genetic context), these linear methods are also very accurate. Indeed, imputation accuracy using this approach is similar to that obtained by state-of-the-art imputation methods that use individual-level data, but at a fraction of the computational cost.
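The linear-prediction idea in this abstract can be sketched with the classical Kriging formula. The numbers below are purely illustrative (not real genetic data), and the plain formula omits the paper's population-genetics regularization of the covariance matrix:

```python
import numpy as np

# Hypothetical setup: frequencies at 3 typed variants predict 1 untyped variant.
# mu and Sigma would normally come from a reference panel; here they are made up.
mu = np.array([0.30, 0.25, 0.40, 0.35])          # mean allele frequencies
Sigma = np.array([[0.020, 0.012, 0.004, 0.010],
                  [0.012, 0.018, 0.003, 0.009],
                  [0.004, 0.003, 0.015, 0.002],
                  [0.010, 0.009, 0.002, 0.016]])  # covariance (last row/col = untyped)

f_obs = np.array([0.28, 0.22, 0.41])              # observed typed frequencies

# Best linear predictor of the untyped frequency (the Kriging formula):
#   f_hat = mu_u + Sigma_uo Sigma_oo^{-1} (f_obs - mu_o)
Sigma_oo = Sigma[:3, :3]
Sigma_uo = Sigma[3, :3]
f_hat = mu[3] + Sigma_uo @ np.linalg.solve(Sigma_oo, f_obs - mu[:3])
print(round(float(f_hat), 4))
```

Because the observed frequencies sit slightly below their means, the predictor shrinks the untyped frequency below its mean as well.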

2.
3.
The range (R), mean absolute deviation (AD), and standard deviation (δ) are three measures used to describe dispersion. It can be proved that they satisfy R ≥ δ ≥ AD ≥ 0. Examples further show that the relationship between the quartile deviation (Qd) and these three measures is not fixed. When applying the three measures in practice, their respective advantages and disadvantages should be taken into account.
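The chain R ≥ δ ≥ AD ≥ 0 (with δ taken as the population standard deviation) is easy to check numerically on random samples:

```python
import random

# Numeric check of R >= sd >= AD >= 0 on random continuous samples.
random.seed(0)
for _ in range(100):
    x = [random.uniform(-10, 10) for _ in range(random.randint(2, 50))]
    n = len(x)
    m = sum(x) / n
    R = max(x) - min(x)                              # range
    sd = (sum((v - m) ** 2 for v in x) / n) ** 0.5   # population standard deviation
    ad = sum(abs(v - m) for v in x) / n              # mean absolute deviation
    assert R >= sd >= ad >= 0
ok = True
print("inequality holds on all samples")
```

The first inequality is in fact loose: Popoviciu's inequality gives the sharper bound δ ≤ R/2.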

4.
Differences between plant varieties are based on phenotypic observations, which are both space and time consuming. Moreover, the phenotypic data result from the combined effects of genotype and environment. On the contrary, molecular data are easier to obtain and give direct access to the genotype. In order to save experimental trials and to concentrate efforts on the relevant comparisons between varieties, the relationship between phenotypic and genetic distances is studied. It appears that the classical genetic distances based on molecular data are not appropriate for predicting phenotypic distances. In the linear model framework, we define a new pseudo genetic distance, which is a prediction of the phenotypic one. The distribution of the phenotypic distance given the pseudo genetic distance is established. Statistical properties of the predicted distance are derived when the parameters of the model are either given or estimated. Finally, we apply these results to the problem of distinguishing between 144 maize lines. This case study is very satisfactory because the use of anonymous molecular markers (RFLP) leads to saving 29% of the trials with an acceptable error risk. These results need to be confirmed on other varieties and species and would certainly be improved by using genes coding for phenotypic traits.

5.
Many late-onset diseases are caused by what appears to be a combination of a genetic predisposition to disease and environmental factors. The use of existing cohort studies provides an opportunity to infer genetic predisposition to disease in a representative sample of a study population, now that many such studies are gathering genetic information on the participants. One complication of using existing cohorts is that subjects may be censored due to death prior to genetic sampling, thereby adding a layer of complexity to the analysis. We develop a statistical framework to infer parameters of a latent variable model for disease onset. The latent variable model describes the role of genetic and modifiable risk factors in the onset ages of multiple diseases, and accounts for right-censoring of disease onset ages. The framework also allows for missing genetic information by inferring a subject's unknown genotype through appropriately incorporated covariate information. The model is applied to data gathered in the Framingham Heart Study to measure the effect of different Apo-E genotypes on the occurrence of various cardiovascular disease events.

6.
The identification of influential observations has drawn a great deal of attention in regression diagnostics. Most identification techniques are based on single-case deletion, and among them DFFITS has become very popular with statisticians. However, this technique, along with all other single-case diagnostics, may be ineffective in the presence of multiple influential observations. In this paper we develop a generalized version of DFFITS based on group deletion and propose a new technique to identify multiple influential observations using this measure. The advantage of the proposed method in identifying multiple influential cases is then demonstrated on several well-known data sets.
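For reference, classical single-case DFFITS can be computed directly from the hat matrix; the group-deletion generalization proposed in the paper is not reproduced here. The data below are simulated, with one deliberately influential point:

```python
import numpy as np

# Single-case DFFITS from the hat matrix and externally studentized residuals.
rng = np.random.default_rng(1)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[0] += 5.0                                    # plant one influential point

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
h = np.diag(H)                                 # leverages
e = y - H @ y                                  # residuals
s2 = e @ e / (n - p)
s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)   # leave-one-out variance
t = e / np.sqrt(s2_i * (1 - h))                # externally studentized residuals
dffits = t * np.sqrt(h / (1 - h))

cutoff = 2 * np.sqrt(p / n)                    # common size-adjusted threshold
flagged = np.where(np.abs(dffits) > cutoff)[0]
print(flagged)
```

The planted observation (index 0) is flagged; a masked group of outliers, by contrast, could escape this single-case screen entirely, which motivates the group-deletion version.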

7.
While effective communication of statistical concepts is important for the enthusiastic adoption of these concepts by collaborators, statisticians are not necessarily trained in the process of communication with collaborators in other substantive fields. It is proposed that increased attention be paid to pedagogical techniques for communicating to our non-statistical colleagues what statisticians have to offer to the design and analysis aspects of a collaborative effort. One approach is to offer examples relevant to our colleagues’ fields when we explain statistical ideas. This paper provides several such examples from the field of neurology, focusing on the issue of sample selection bias and prospective study designs.

8.
From a geometric perspective, linear model theory relies on a single assumption, that (‘corrected’) data vector directions are uniformly distributed in Euclidean space. We use this perspective to explore pictorially the effects of violations of the traditional assumptions (normality, independence and homogeneity of variance) on the Type I error rate. First, for several non-normal distributions we draw geometric pictures and carry out simulations to show how the effects of non-normality diminish with increased parent distribution symmetry and continuity, and increased sample size. Second, we explore the effects of dependencies on the Type I error rate. Third, we use simulation and geometry to investigate the effect of heterogeneity of variance on the Type I error rate. We conclude, in a fresh way, that independence and homogeneity of variance are more important assumptions than normality. The practical implication is that statisticians and authors of statistical computing packages need to pay more attention to the correctness of these assumptions than to normality.
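The effect of variance heterogeneity on the Type I error rate is easy to reproduce: when the smaller group has the larger variance, the pooled two-sample t-test rejects a true null far more often than the nominal 5%. A small simulation sketch with illustrative parameters:

```python
import numpy as np

# Type I error of the pooled t-test under unequal variances, unbalanced groups.
rng = np.random.default_rng(0)
n1, n2, reps = 10, 40, 5000
x = rng.normal(scale=3.0, size=(reps, n1))    # H0 is true: both means are 0,
y = rng.normal(scale=1.0, size=(reps, n2))    # but the SMALL group has sd 3

sp2 = ((n1 - 1) * x.var(axis=1, ddof=1)
       + (n2 - 1) * y.var(axis=1, ddof=1)) / (n1 + n2 - 2)   # pooled variance
t = (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

rate = np.mean(np.abs(t) > 2.0106)            # two-sided 5% critical value, df = 48
print(rate)
```

The empirical rejection rate comes out far above 0.05, because the pooled variance is dominated by the large, low-variance group and badly understates the variance of the mean difference.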

9.
The 175th anniversary of the ASA provides an opportunity to look back into the past and peer into the future. What led our forebears to found the association? What commonalities do we still see? What insights might we glean from their experiences and observations? I will use the anniversary as a chance to reflect on where we are now and where we are headed in terms of statistical education amidst the growth of data science. Statistics is the science of learning from data. By fostering more multivariable thinking, building data-related skills, and developing simulation-based problem solving, we can help to ensure that statisticians are fully engaged in data science and the analysis of the abundance of data now available to us.

10.
Although the Bezier curve is very popular in the area of computational graphics, it has rarely been used by statisticians. In this paper we develop methods and techniques for using the Bezier curve in the estimation of density and regression functions. The asymptotic mean integrated squared error of both estimators is also derived. Comparisons with the kernel estimator are conducted using simulation.
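For readers unfamiliar with Bezier curves, the de Casteljau algorithm evaluates one by repeated linear interpolation of the control points; the density and regression estimators of the paper build on such curves but are not reproduced here:

```python
def de_casteljau(points, t):
    """Evaluate a 2-D Bezier curve at parameter t by repeated linear interpolation."""
    pts = list(points)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# A cubic Bezier curve interpolates its first and last control points.
ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
print(de_casteljau(ctrl, 0.0))   # first control point
print(de_casteljau(ctrl, 1.0))   # last control point
print(de_casteljau(ctrl, 0.5))   # interior point of the curve
```

The intermediate control points shape the curve without lying on it, which is what makes Bezier representations attractive for smooth function estimation.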

11.
Genomewide association studies have become the primary tool for discovering the genetic basis of complex human diseases. Such studies are susceptible to the confounding effects of population stratification, in that the combination of allele-frequency heterogeneity with disease-risk heterogeneity among different ancestral subpopulations can induce spurious associations between genetic variants and disease. This article provides a statistically rigorous and computationally feasible solution to this challenging problem of unmeasured confounders. We show that the odds ratio of disease with a genetic variant is identifiable if and only if the genotype is independent of the unknown population substructure conditional on a set of observed ancestry-informative markers in the disease-free population. Under this condition, the odds ratio of interest can be estimated by fitting a semiparametric logistic regression model with an arbitrary function of a propensity score relating the genotype probability to ancestry-informative markers. Approximating the unknown function of the propensity score by B-splines, we derive a consistent and asymptotically normal estimator for the odds ratio of interest with a consistent variance estimator. Simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. An application to the well-known Wellcome Trust Case-Control Study is presented. Supplemental materials are available online.

12.
The Fay–Herriot model is a linear mixed model that plays a relevant role in small area estimation (SAE). Under the SAE set-up, tools for selecting an adequate model are required. Applied statisticians are often interested in deciding whether it is worthwhile to use a mixed-effects model instead of a simpler fixed-effects model. This problem is not standard because under the null hypothesis the random-effect variance is on the boundary of the parameter space. The likelihood ratio test and the residual likelihood ratio test are proposed and their finite sample distributions are derived. Finally, we analyse their behaviour in simulated scenarios and apply them to real data.
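The boundary issue mentioned here has a well-known consequence: under the null, the likelihood ratio statistic is asymptotically a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom, so the naive chi-square p-value is conservative. A minimal sketch of the mixture p-value, using only the standard library:

```python
import math

# Under H0 (random-effect variance = 0, a boundary point) the LRT statistic is
# asymptotically 0.5*chi2_0 + 0.5*chi2_1 rather than chi2_1.
def mixture_pvalue(lrt):
    """P-value under the 0.5*chi2_0 + 0.5*chi2_1 mixture reference distribution."""
    if lrt <= 0:
        return 1.0
    # chi2 with 1 df survival function, via the complementary error function:
    # P(chi2_1 > c) = erfc(sqrt(c/2))
    return 0.5 * math.erfc(math.sqrt(lrt / 2))

# The 5% critical value is the chi2_1 90th percentile (about 2.71), not 3.84:
print(round(mixture_pvalue(2.7055), 4))
```

Using the naive chi2_1 reference would double every p-value, making the test for a nonzero variance component unnecessarily conservative.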

13.
Robust regression has not had a great impact on statistical practice, although all statisticians are convinced of its importance. The procedures for robust regression currently available are complex and computer-intensive. With a modification of the Gaussian paradigm, taking into consideration outliers and leverage points, we propose an iteratively weighted least squares method which gives robust fits. The procedure is illustrated by applying it to data sets which have been previously used to illustrate robust regression methods. It is hoped that this simple, effective and accessible method will find its use in statistical practice.
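A generic iteratively reweighted least squares loop with Huber weights (a standard textbook sketch, not the authors' specific modification of the Gaussian paradigm) looks like this:

```python
import numpy as np

def irls_huber(X, y, k=1.345, iters=50):
    """Iteratively reweighted least squares with Huber weights and MAD scale."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # start from the OLS fit
    for _ in range(iters):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745             # robust scale estimate (MAD)
        u = np.abs(r) / (s + 1e-12)
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Simulated line y = 1 + 2x with five gross outliers added.
rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.3, size=n)
y[:5] += 20.0                                         # contaminate 10% of the data
beta_hat = irls_huber(X, y)
print(beta_hat)
```

The outliers receive small weights after the first few iterations, so the fitted coefficients stay close to the true (1, 2) while plain OLS would be pulled upward.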

14.
This paper provides an overview of “Improving Design, Evaluation and Analysis of early drug development Studies” (IDEAS), a European Commission-funded network bringing together leading academic institutions and small- to large-sized pharmaceutical companies to train a cohort of graduate-level medical statisticians. The network is composed of a diverse mix of public and private sector partners spread across Europe, which will host 14 early-stage researchers for 36 months. IDEAS training activities are composed of a well-rounded mixture of specialist methodological components and generic transferable skills. Particular attention is paid to fostering collaborations between researchers and supervisors, which span academia and the private sector. Within this paper, we review existing medical statistics programmes (MSc and PhD) and highlight the training they provide on skills relevant to drug development. Motivated by this review and our experiences with the IDEAS project, we propose a concept for a joint, harmonised European PhD programme to train statisticians in quantitative methods for drug development.

15.
We introduce health technology assessment and evidence synthesis briefly, and then concentrate on the statistical approaches used for conducting network meta-analysis (NMA) in the development and approval of new health technologies. NMA is an extension of standard meta-analysis where indirect as well as direct information is combined and can be seen as similar to the analysis of incomplete-block designs. We illustrate it with an example involving three treatments, using fixed-effects and random-effects models, and using frequentist and Bayesian approaches. As most statisticians in the pharmaceutical industry are familiar with SAS® software for analyzing clinical trials, we provide example code for each of the methods we illustrate. One issue that has been overlooked in the literature is the choice of constraints applied to random effects, and we show how this affects the estimates and standard errors and propose a symmetric set of constraints that is equivalent to most current practice. Finally, we discuss the role of statisticians in planning and carrying out NMAs and the strategy for dealing with important issues such as heterogeneity.

16.
Regression methods for common data types such as measured, count and categorical variables are well understood, but increasingly statisticians need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, either as the response and/or a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We call the mapping from a new observation to a score scoring, whereas backscoring maps a score back to an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment to study the motion of children with cleft lip.
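The scoring step described here can be illustrated with classical multidimensional scaling, which turns a distance matrix into low-dimensional scores; for exact one-dimensional Euclidean distances the configuration is recovered perfectly:

```python
import numpy as np

# Classical multidimensional scaling: distance matrix -> low-dimensional scores.
def classical_mds(D, k=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]         # keep the k largest eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

# Points on a line: pairwise distances are recovered up to sign and shift.
pts = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(pts[:, None] - pts[None, :])
scores = classical_mds(D, k=1)
recovered = np.abs(scores[:, 0][:, None] - scores[:, 0][None, :])
ok = np.allclose(recovered, D)
print(ok)
```

Once observations live in the score space, ordinary regression machinery applies; backscoring then maps predicted scores back toward the original metric space.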

17.
This is an expository article. Here we show how the widely used Kalman filter, popular with control engineers and other scientists, can be easily understood by statisticians if we use a Bayesian formulation and some well-known results in multivariate statistics. We also give a simple example illustrating the use of the Kalman filter for quality control work.
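The Bayesian reading of the Kalman filter is easiest to see in one dimension: each update is a precision-weighted compromise between the predicted state and the new observation. A minimal sketch with illustrative noise variances q (state) and r (observation):

```python
# One-dimensional Kalman filter as sequential Bayesian updating: the posterior
# after each observation is Normal, and the Kalman gain is the weight the
# posterior mean places on the new data point.
def kalman_1d(observations, q=0.01, r=0.25, m0=0.0, p0=1.0):
    m, p = m0, p0                    # prior mean and variance of the state
    estimates = []
    for y in observations:
        p = p + q                    # predict: state variance grows by q
        k = p / (p + r)              # Kalman gain = prior weight on the data
        m = m + k * (y - m)          # update: shrink the mean toward y
        p = (1 - k) * p              # posterior variance shrinks
        estimates.append(m)
    return estimates

# Constant true level 5.0 observed with noise: the estimate settles near 5.
obs = [5.2, 4.9, 5.1, 5.0, 4.8, 5.3, 5.1, 4.9, 5.0, 5.2]
est = kalman_1d(obs)
print(round(est[-1], 2))
```

Early on the gain is large (the prior is vague), so the filter tracks the data closely; as the posterior variance shrinks, each new observation moves the estimate less, exactly as Bayesian updating prescribes.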

18.
In this paper we set out what we consider to be a set of best practices for statisticians in the reporting of pharmaceutical industry-sponsored clinical trials. We make eight recommendations covering: author responsibilities and recognition; publication timing; conflicts of interest; freedom to act; full author access to data; trial registration and independent review. These recommendations are made in the context of the prominent role played by statisticians in the design, conduct, analysis and reporting of pharmaceutical sponsored trials and the perception of the reporting of these trials in the wider community. Copyright © 2010 John Wiley & Sons, Ltd.

19.
In this paper, we propose the class of generalized additive models for location, scale and shape in a test for the association of genetic markers with non-normally distributed phenotypes comprising a spike at zero. The resulting statistical test is a generalization of the quantitative transmission disequilibrium test with mating type indicator, which was originally designed for normally distributed quantitative traits and parent-offspring data. As a motivational example, we consider coronary artery calcification (CAC), which can accurately be identified by electron beam tomography. In the investigated regions, individuals will have a continuous measure of the extent of calcium found or they will be calcium-free. Hence, the resulting distribution is a mixed discrete-continuous distribution with spike at zero. We carry out parent-offspring simulations motivated by such CAC measurement values in a screening population to study statistical properties of the proposed test for genetic association. Furthermore, we apply the approach to data of the Genetic Analysis Workshop 16 that are based on real genotype and family data of the Framingham Heart Study, and test the association of selected genetic markers with simulated coronary artery calcification.

20.

The current concerns about reproducibility have focused attention on proper use of statistics across the sciences. This gives statisticians an extraordinary opportunity to change what are widely regarded as statistical practices detrimental to the cause of good science. However, how that should be done is enormously complex, made more difficult by the balkanization of research methods and statistical traditions across scientific subdisciplines. Working within those sciences while also allying with science reform movements—operating simultaneously on the micro and macro levels—is the key to making lasting change in applied science.
