Similar literature (20 articles found)
1.
Genome-wide association studies are effective in investigating the loci associated with complex diseases. Sometimes the genotype is not exactly decoded and only genotype probabilities are obtained. In this case, an F-test based on the imputed genotype is usually used for the association analysis. Simulations show that existing methods such as the dosage test perform poorly when the genetic model is misspecified. In this study, we develop a robust test to detect the association between a disease and genetic loci when the genotype is uncertain and the genetic model is unknown.

2.
Although efficiency robust tests are preferred for genetic association studies when the genetic model is unknown, their statistical properties have been studied for different study designs separately under special situations. We study some statistical properties of the maximin efficiency robust test and a maximum‐type robust test (MAX3) under a general setting and obtain unified results. The results can also be applied to testing hypotheses with a constrained two‐dimensional parameter space. The results are applied to genetic association studies using case–parents trio data.

3.
Quantitative trait loci (QTL) mapping has been a standard means of identifying genetic regions harboring potential genes underlying complex traits. The likelihood ratio test (LRT) has been commonly applied to assess the significance of a genetic locus in a mixture model context. Given the time constraint in commonly used permutation tests to assess the significance of the LRT in QTL mapping, we study the behavior of the LRT statistic in mixture models when the proportions of the distributions are unknown. We found that the asymptotic null distribution is a stationary Gaussian process after suitable transformation. The result can be applied to one-parameter exponential family mixture models. Under certain conditions, such as in a backcross mapping model, the tail probability of the supremum of the process is calculated and the threshold values can be determined by solving the distribution function. Simulation studies were performed to evaluate the asymptotic results.

4.
In case–control studies the Cochran–Armitage trend test is powerful for detection of an association between a risk genetic marker and a disease of interest. To apply this test, a score should be assigned to the genotypes based on the genetic model. When the underlying genetic model is unknown, the trend test statistic is quite sensitive to the choice of the score. In this paper, we study the asymptotic property of the robust suptest statistic defined as a supremum of the Cochran–Armitage trend test across all scores between 0 and 1. Through numerical studies we show that small to moderate sample size performances of the suptest appear reasonable in terms of type I error control, and we compare the empirical power of the suptest to that of three individual Cochran–Armitage trend tests and the maximum of the three Cochran–Armitage trend tests. The suptest is applied to rheumatoid arthritis data from a genome-wide association study.

5.
The asymptotic distributions of many classical test statistics are normal. The resulting approximations are often accurate for commonly used significance levels, 0.05 or 0.01. In genome‐wide association studies, however, the significance level can be as low as $1\times 10^{-7}$, and the accuracy of the p‐values can be challenging. We study the accuracy of these small p‐values using two‐term Edgeworth expansions for three commonly used test statistics in GWAS. These tests have nuisance parameters not defined under the null hypothesis but estimable. We derive results for this general form of test statistics using Edgeworth expansions, and find that the commonly used score test, maximin efficiency robust test and the chi‐squared test are second order accurate in the presence of the nuisance parameter, justifying the use of the p‐values obtained from these tests in genome‐wide association studies.

6.
A two-way contingency table in which both variables have the same categories is termed a symmetric table. In many applications, because of the social processes involved, most of the observations lie on the main diagonal and the off-diagonal counts are small. For these tables, the model of independence is implausible and interest is then focussed on the off-diagonal cells and the models of quasi-independence and quasi-symmetry. For ordinal variables, a linear-by-linear association model can be used to model the interaction structure. For sparse tables, large-sample goodness-of-fit tests are often unreliable and one should use an exact test. In this paper, we review exact tests and the computing problems involved. We propose new recursive algorithms for exact goodness-of-fit tests of quasi-independence, quasi-symmetry, linear-by-linear association and some related models. We propose that all computations be carried out using symbolic computation and rational arithmetic in order to calculate the exact p-values accurately and describe how we implemented our proposals. Two examples are presented.

7.
Identifying the risk factors for comorbidity is important in psychiatric research. Empirically, studies have shown that testing multiple, correlated traits simultaneously is more powerful than testing a single trait at a time in association analysis. Furthermore, for complex diseases, especially mental illnesses and behavioral disorders, the traits are often recorded in different scales such as dichotomous, ordinal and quantitative. In the absence of covariates, nonparametric association tests have been developed for multiple complex traits to study comorbidity. However, genetic studies generally contain measurements of some covariates that may affect the relationship between the risk factors of major interest (such as genes) and the outcomes. While it is relatively easy to adjust these covariates in a parametric model for quantitative traits, it is challenging for multiple complex traits with possibly different scales. In this article, we propose a nonparametric test for multiple complex traits that can adjust for covariate effects. The test aims to achieve an optimal scheme of adjustment by using a maximum statistic calculated from multiple adjusted test statistics. We derive the asymptotic null distribution of the maximum test statistic, and also propose a resampling approach, both of which can be used to assess the significance of our test. Simulations are conducted to compare the type I error and power of the nonparametric adjusted test to the unadjusted test and other existing adjusted tests. The empirical results suggest that our proposed test increases the power through adjustment for covariates when there exist environmental effects, and is more robust to model misspecifications than some existing parametric adjusted tests. We further demonstrate the advantage of our test by analyzing a data set on genetics of alcoholism.

8.
A multistage variable selection method is introduced for detecting association signals in structured brain‐wide and genome‐wide association studies (brain‐GWAS). Compared to conventional methods that link one voxel to one single nucleotide polymorphism (SNP), our approach is more efficient and powerful in selecting the important signals by integrating anatomic and gene grouping structures in the brain and the genome, respectively. It avoids resorting to a large number of multiple comparisons while effectively controlling the false discoveries. Validity of the proposed approach is demonstrated by both theoretical investigation and numerical simulations. We apply our proposed method to a brain‐GWAS using Alzheimer's Disease Neuroimaging Initiative positron emission tomography (ADNI PET) imaging and genomic data. We confirm previously reported association signals and also uncover several novel SNPs and genes that are either associated with brain glucose metabolism or have their association significantly modified by Alzheimer's disease status.

9.
Causal inference approaches in systems genetics exploit quantitative trait loci (QTL) genotypes to infer causal relationships among phenotypes. The genetic architecture of each phenotype may be complex, and poorly estimated genetic architectures may compromise the inference of causal relationships among phenotypes. Existing methods assume QTLs are known or inferred without regard to the phenotype network structure. In this paper we develop a QTL-driven phenotype network method (QTLnet) to jointly infer a causal phenotype network and associated genetic architecture for sets of correlated phenotypes. Randomization of alleles during meiosis and the unidirectional influence of genotype on phenotype allow the inference of QTLs causal to phenotypes. Causal relationships among phenotypes can be inferred using these QTL nodes, enabling us to distinguish among phenotype networks that would otherwise be distribution equivalent. We jointly model phenotypes and QTLs using homogeneous conditional Gaussian regression models, and we derive a graphical criterion for distribution equivalence. We validate the QTLnet approach in a simulation study. Finally, we illustrate with simulated data and a real example how QTLnet can be used to infer both direct and indirect effects of QTLs and phenotypes that co-map to a genomic region.

10.
Many late-onset complex diseases exhibit variable age of onset. Efficiently incorporating age of onset information into linkage analysis can potentially increase the power of dissecting complex diseases. In this paper, we treat age of onset as a genetic trait with censored observations. We use multiple markers to infer the inheritance vector at the disease susceptibility (DS) locus in order to extract information about the inheritance pattern of the disease allele in a pedigree. Given the inheritance distribution at the DS locus, we define the genetic frailty for each individual within a nuclear family as the sum of frailties due to a putative major disease gene and a polygenic effect due to any remaining DS loci. Conditioning on these frailties we use the proportional hazards model for the risk of developing disease. We show that a test of linkage can be formulated as a test of zero variance due to a specific locus of the additive gamma frailties. Maximum likelihood estimation, using the EM algorithm, and likelihood ratio tests are employed for parameter estimation and tests of linkage. A simulation study presented indicates that the proposed method is well behaved and can be more powerful than the currently available allele-sharing based linkage methods. A breast cancer data example is used for illustration.

11.
We propose a semiparametric approach for the analysis of case–control genome-wide association studies. Parametric components are used to model both the conditional distribution of the case status given the covariates and the distribution of genotype counts, whereas the distribution of the covariates is modelled nonparametrically. This yields a direct and joint modelling of the case status, covariates and genotype counts, and gives a better understanding of the disease mechanism and results in more reliable conclusions. Side information, such as the disease prevalence, can be conveniently incorporated into the model by an empirical likelihood approach and leads to more efficient estimates and a powerful test in the detection of disease-associated SNPs. Profiling is used to eliminate a nuisance nonparametric component, and the resulting profile empirical likelihood estimates are shown to be consistent and asymptotically normal. For the hypothesis test on disease association, we apply the approximate Bayes factor (ABF) which is computationally simple and most desirable in genome-wide association studies where hundreds of thousands to a million genetic markers are tested. We treat the approximate Bayes factor as a hybrid Bayes factor which replaces the full data by the maximum likelihood estimates of the parameters of interest in the full model and derive it under a general setting. The deviation from Hardy–Weinberg Equilibrium (HWE) is also taken into account and the ABF for HWE using cases is shown to provide evidence of association between a disease and a genetic marker. Simulation studies and an application are further provided to illustrate the utility of the proposed methodology.

12.
In genetic association studies, detecting phenotype–genotype association is a primary goal. We assume that the relationship between the data—phenotype, genetic markers and environmental covariates—can be modeled by a generalized linear model. The number of markers is allowed to be far greater than the number of individuals of the study. A multivariate score statistic is used to test each marker for association with a phenotype. We assume that the test statistics asymptotically follow a multivariate normal distribution under the complete null hypothesis of no phenotype–genotype association. We present the familywise error rate order k approximation method to find a local significance level (alternatively, an adjusted p-value) for each test such that the familywise error rate is controlled. The special case k=1 gives the Šidák method. As a by-product, an effective number of independent tests can be defined. Furthermore, if environmental covariates and genetic markers are uncorrelated, or no environmental covariates are present, we show that covariances between score statistics depend on genetic markers alone. This not only leads to more efficient calculations but also to a local significance level that is determined only by the collection of markers used, independent of the phenotypes and environmental covariates of the experiment at hand.

13.
We consider the problem of testing against trend and umbrella alternatives, with known and unknown peak, in two-way layouts with fixed effects. We consider the non-parametric two-way layout ANOVA model of Akritas and Arnold (J. Amer. Statist. Assoc. 89 (1994) 336), and use the non-parametric formulation of patterned alternatives introduced by Akritas and Brunner (Research Developments in Probability and Statistics: Festschrift in honor of Madan L. Puri, VSP, Zeist, The Netherlands, 1996, pp. 277–288). The hypotheses of no main effects and of no simple effects are both considered. New rank test statistics are developed to specifically detect these types of alternatives. For main effects, we consider two types of statistics, one using weights similar to Hettmansperger and Norton (J. Amer. Statist. Assoc. 82 (1987) 292) and one with weights which maximize the asymptotic efficacy. For simple effects, we consider in detail only statistics to detect trend or umbrella patterns with known peaks, and refer to Callegari (Ph.D. Thesis, University of Padova, Italy) for a discussion about possible statistics for umbrella alternatives with unknown peaks. The null asymptotic distributions of the new statistics are derived. A number of simulation studies investigate their finite sample behaviors and compare the achieved alpha levels and power with some alternative procedures. An application to data used in a clinical study is presented to illustrate how to utilize some of the proposed tests for main effects.

14.
Testing goodness‐of‐fit of commonly used genetic models is of critical importance in many applications including association studies and testing for departure from Hardy–Weinberg equilibrium. Case–control design has become widely used in population genetics and genetic epidemiology, thus it is of interest to develop powerful goodness‐of‐fit tests for genetic models using case–control data. This paper develops a likelihood ratio test (LRT) for testing recessive and dominant models for case–control studies. The LRT statistic has a closed‐form formula with a simple $\chi^{2}(1)$ null asymptotic distribution, thus its implementation is easy even for genome‐wide association studies. Moreover, it has the same power and optimality as when the disease prevalence is known in the population. The Canadian Journal of Statistics 41: 341–352; 2013 © 2013 Statistical Society of Canada

15.
Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNPs) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in their scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate the false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online.

16.
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.

17.
The hazard rate (HR) and mean residual lifetime are two of the most practical and best-known functions in biometry, reliability, statistics and life testing. Recently, the reversed HR function has been found to have interesting properties useful in additional areas such as censored data and forensic science. For these three biometric functions, we propose methods for testing that they take on a known functional form against the alternative that they dominate, or are dominated by, this known form. This goodness-of-fit-type testing has wider applications and is more interesting than the long-standing testing procedures for exponentiality against the monotonicity of these functions or even the change point problems, since we can test against any choice of the survival distribution and not just exponentiality. For this general testing, we present easy-to-implement tests and generalize them into classes of statistics that could lead to more powerful and efficient testing.

18.
A primary focus of an increasing number of scientific studies is to determine whether two exposures interact in the effect that they produce on an outcome of interest. Interaction is commonly assessed by fitting regression models in which the linear predictor includes the product between those exposures. When the main interest lies in the interaction, this approach is not entirely satisfactory because it is prone to (possibly severe) bias when the main exposure effects or the association between outcome and extraneous factors are misspecified. In this article, we therefore consider conditional mean models with identity or log link which postulate the statistical interaction in terms of a finite-dimensional parameter, but which are otherwise unspecified. We show that estimation of the interaction parameter is often not feasible in this model because it would require nonparametric estimation of auxiliary conditional expectations given high-dimensional variables. We thus consider 'multiply robust estimation' under a union model that assumes at least one of several working submodels holds. Our approach is novel in that it makes use of information on the joint distribution of the exposures conditional on the extraneous factors in making inferences about the interaction parameter of interest. In the special case of a randomized trial or a family-based genetic study in which the joint exposure distribution is known by design or by Mendelian inheritance, the resulting multiply robust procedure leads to asymptotically distribution-free tests of the null hypothesis of no interaction on an additive scale. We illustrate the methods via simulation and the analysis of a randomized follow-up study.

19.
In studies of complex disorders such as nicotine dependence, it is common that researchers assess multiple variables related to a disorder as well as other disorders that are potentially correlated with the primary disorder of interest. In this work, we refer to those variables and disorders broadly as multiple traits. The multiple traits may or may not have a common causal genetic variant. Intuitively, it may be more powerful to accommodate multiple traits in genetic studies, but the analysis of multiple traits is generally more complicated than the analysis of a single trait. Furthermore, it is not well documented how much power we may potentially gain by considering multiple traits. Our aim is to enhance our understanding of this important and practical issue. We considered a variety of correlation structures between traits and the disease locus. To focus on the effect of accommodating multiple traits, we examined genetic models that are relatively simple so that we can pinpoint the factors affecting the power. We conducted simulation studies to explore the performance of testing multiple traits simultaneously and the performance of testing a single trait at a time in family-based association studies. Our simulation results demonstrated that the performance of testing multiple traits simultaneously is better than that of testing each trait individually for almost all models considered. We also found that the power of association tests varies among the underlying models. The advantage of conducting a multiple traits test is minimized when some traits are influenced by the gene only through other traits; and it is maximized when there are causal relations between the traits and the gene, and among the traits themselves or when there are extraneous traits.

20.
This paper deals with the asymptotics of a class of tests for association in 2-way contingency tables based on square forms in cell frequencies, given the total number of observations (multinomial sampling) or one set of marginal totals (stratified sampling). The case when both row and column marginal totals are fixed (hypergeometric sampling) was studied in Kulinskaya (1994). The class of tests under consideration includes a number of classical measures for association. Its two subclasses are the tests based on statistics using centralized cell frequencies (asymptotically distributed as weighted sums of central chi-squares) and those using the non-centralized cell frequencies (asymptotically normal). The parameters of asymptotic distributions depend on the sampling model and on true marginal probabilities. Maximum efficiency for asymptotically normal statistics is achieved under hypergeometric sampling. If the cell frequencies or the statistic as a whole are centralized using marginal proportions as estimates for marginal probabilities, the asymptotic distribution does not differ much between models and it is equivalent to that under hypergeometric sampling. These findings give an extra justification for the use of permutation tests for association (which are based on hypergeometric sampling). As an application, several well known measures of association are analysed.
