Similar references (20 results)
1.
Analysis of familial aggregation in the presence of varying family sizes
Summary. Family studies are frequently undertaken as the first step in the search for genetic and/or environmental determinants of disease. Significant familial aggregation of disease is suggestive of a genetic aetiology and may lead to more focused genetic analysis, although it may also be due to shared environmental factors. Many methods have been proposed in the literature for the analysis of family studies. One model that is appealing for the simplicity of its computation and the conditional interpretation of its parameters is the quadratic exponential model. However, a limiting factor in its application is that it is not reproducible, meaning that all families must be of the same size. To increase the applicability of this model, we propose a hybrid approach in which analysis is based on the assumption of the quadratic exponential model for a selected family size, combining a missing-data approach for smaller families with a marginalization approach for larger families. We apply our approach to a family study of colorectal cancer that was sponsored by the Cancer Genetics Network of the National Institutes of Health, and we investigate the properties of our approach in simulation studies. Our approach applies more generally to clustered binary data.

2.
Many late-onset complex diseases exhibit variable age of onset. Efficiently incorporating age of onset information into linkage analysis can potentially increase the power of dissecting complex diseases. In this paper, we treat age of onset as a genetic trait with censored observations. We use multiple markers to infer the inheritance vector at the disease susceptibility (DS) locus in order to extract information about the inheritance pattern of the disease allele in a pedigree. Given the inheritance distribution at the DS locus, we define the genetic frailty for each individual within a nuclear family as the sum of frailties due to a putative major disease gene and a polygenic effect due to any remaining DS loci. Conditioning on these frailties we use the proportional hazards model for the risk of developing disease. We show that a test of linkage can be formulated as a test of zero variance due to a specific locus of the additive gamma frailties. Maximum likelihood estimation, using the EM algorithm, and likelihood ratio tests are employed for parameter estimation and tests of linkage. A simulation study presented indicates that the proposed method is well behaved and can be more powerful than the currently available allele-sharing based linkage methods. A breast cancer data example is used for illustration.

3.
Familial aggregation studies seek to identify diseases that cluster in families. These studies are often carried out as a first step in the search for hereditary factors affecting the risk of disease. It is necessary to account for age at disease onset to avoid potential misclassification of family members who are disease-free at the time of study participation or who die before developing disease. This is especially true for late-onset diseases, such as prostate cancer or Alzheimer's disease. We propose a discrete time model that accounts for the age at disease onset and allows the familial association to vary with age and to be modified by covariates, such as pedigree relationship. The parameters of the model have interpretations as conditional log-odds and log-odds ratios, which can be viewed as discrete time conditional cross hazard ratios. These interpretations are appealing for cancer risk assessment. Properties of this model are explored in simulation studies, and the method is applied to a large family study of cancer conducted by the National Cancer Institute-sponsored Cancer Genetics Network (CGN).

4.
Abstract. Family-based case–control designs are commonly used in epidemiological studies for evaluating the role of genetic susceptibility and environmental exposure to risk factors in the etiology of rare diseases. Within this framework, it is often reasonable to assume that genetic susceptibility and environmental exposure are conditionally independent of each other within families in the source population. We focus on this setting to explore the situation of measurement error affecting the assessment of the environmental exposure. We correct for measurement error through a likelihood-based method, exploiting a conditional likelihood approach to relate the probability of disease to the genetic and environmental risk factors. We show that this approach provides less biased and more efficient results than those based on logistic regression. Regression calibration, by contrast, provides severely biased estimators of the parameters. The correction methods are compared through simulation, under common measurement error structures.

5.
Large cohort studies are commonly launched to study the risk effect of genetic variants or other risk factors on a chronic disorder. In these studies, family data are often collected to provide additional information for the purpose of improving the inference results. Statistical analysis of the family data can be very challenging due to the missing observations of genotypes, incomplete records of disease occurrences in family members, and the complicated dependence attributed to the shared genetic background and environmental factors. In this article, we investigate a class of logistic models with family-shared random effects to tackle these challenges, and develop a robust regression method based on the conditional logistic technique for statistical inference. An expectation–maximization (EM) algorithm with fast computation speed is developed to handle the missing genotypes. The proposed estimators are shown to be consistent and asymptotically normal. Additionally, a score test based on the proposed method is derived to test the genetic effect. Extensive simulation studies demonstrate that the proposed method performs well in finite samples in terms of estimate accuracy, robustness and computational speed. The proposed procedure is applied to an Alzheimer's disease study.

6.
An algorithm for functional evaluation of the likelihood of paternal and maternal recombination fractions for pedigree data is proposed. The idea behind the algorithm is that the probability of the affected status and of certain marker genotypes of ancestors is inherited by their descendants along with the inheritance of certain haplotypes. In this algorithm, the likelihood is evaluated by a single recursive call for each terminal sibling set along the inheritance flow. The advantage of the algorithm is not only the simplicity of its implementation but also its functional form of evaluation: the likelihood is obtained as a polynomial in the recombination fractions, making it easier to validate the likelihood carefully and resulting in a more accurate localization of the disease locus. We report an experimental implementation of this algorithm in R, together with several practical applications.

7.
We propose a fully Bayesian model with a non-informative prior for analyzing misclassified binary data with a validation substudy. In addition, we derive a closed-form algorithm for drawing all parameters from the posterior distribution and making statistical inference on odds ratios. Our algorithm draws each parameter from a beta distribution, avoids the specification of initial values, and does not have convergence issues. We apply the algorithm to a data set and compare the results with those obtained by other methods. Finally, the performance of our algorithm is assessed using simulation studies.
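The beta-draw idea above can be illustrated with a simplified stand-in (not the authors' closed-form algorithm): draw sensitivity and specificity from conjugate beta posteriors fitted to hypothetical validation-substudy counts, draw the apparent prevalence per exposure group, apply the classical Rogan–Gladen correction, and form posterior odds-ratio draws. All counts and the uniform Beta(1, 1) priors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def rogan_gladen_draws(app_pos, n_main, val_tp, val_pos, val_tn, val_neg, n_draws=4000):
    """Posterior draws of the true proportion in one group, correcting the
    apparent prevalence with sensitivity/specificity learned from the
    validation substudy. Uniform Beta(1, 1) priors throughout (an assumption
    of this sketch, standing in for the paper's non-informative prior)."""
    q = rng.beta(app_pos + 1, n_main - app_pos + 1, n_draws)   # apparent prevalence
    se = rng.beta(val_tp + 1, val_pos - val_tp + 1, n_draws)   # sensitivity
    sp = rng.beta(val_tn + 1, val_neg - val_tn + 1, n_draws)   # specificity
    p = (q + sp - 1.0) / (se + sp - 1.0)                       # Rogan-Gladen correction
    return np.clip(p, 1e-6, 1 - 1e-6)

def odds_ratio_draws(exposed, unexposed):
    """Posterior odds-ratio draws from two groups' count tuples:
    (apparent positives, n, val true pos, val pos, val true neg, val neg)."""
    p1 = rogan_gladen_draws(*exposed)
    p0 = rogan_gladen_draws(*unexposed)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))
```

Because every draw is a direct beta sample, no initial values or convergence diagnostics are needed, which is the practical point the abstract emphasizes.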

8.
Family survival data can be used to estimate the degree of genetic and environmental contributions to the age at onset of a disease or of a specific event in life. The data can be modeled with a correlated frailty model in which the frailty variable accounts for the degree of kinship within the family. The heritability (degree of heredity) of the age at a specific event in life (or the onset of a disease) is usually defined as the proportion of variance of the survival age that is associated with genetic effects. If the survival age is (interval) censored, heritability as usually defined cannot be estimated. Instead, it is defined as the proportion of variance of the frailty associated with genetic effects. In this paper we describe a correlated frailty model to estimate the heritability and the degree of environmental effects on the age at which individuals contact a social worker for the first time and to test whether there is a difference between the survival functions of this age for twins and non-twins.

9.
The Additive Genetic Gamma Frailty Model
In this paper the additive genetic gamma frailty model is defined. Individual frailties are correlated as a result of an additive genetic model. An algorithm to construct additive genetic gamma frailties for any pedigree is given, such that the variance–covariance structure among individual frailties equals the numerator relationship matrix times a variance. The EM algorithm can be used to estimate the parameters in the model. The calculations are similar to those for the EM algorithm in the shared frailty model; however, the E step is not correspondingly simple. This is illustrated by re-analysing data from the Danish adoptive register, previously analysed with the shared frailty model in Nielsen et al. (1992). Goodness of fit of the additive genetic gamma frailty model can be tested after analysing data with the correlated frailty model; doing so revealed a "defect" in the often-used and otherwise well-behaved likelihood.
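A minimal simulation of the additive-gamma construction described above, for a single pair of relatives: each frailty is the sum of a shared gamma component and an individual gamma component with a common rate, so the frailties have mean one and their correlation equals the relationship coefficient. The kinship and shape values below are illustrative, and a full pedigree would need the paper's general algorithm rather than this two-member special case.

```python
import numpy as np

def gamma_frailty_pair(kinship, total_shape, n, seed=2):
    """Simulate n frailty pairs with mean 1, variance 1/total_shape, and
    correlation `kinship`, using additivity of gammas with a common scale."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / total_shape                       # common scale => sums stay gamma
    shared = rng.gamma(kinship * total_shape, scale, n)
    own1 = rng.gamma((1 - kinship) * total_shape, scale, n)
    own2 = rng.gamma((1 - kinship) * total_shape, scale, n)
    return shared + own1, shared + own2
```

For a parent-offspring pair the numerator relationship matrix entry is 0.5, so `gamma_frailty_pair(0.5, a, n)` reproduces that covariance structure with frailty variance 1/a.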

10.
Absolute risk is the chance that a person with given risk factors and free of the disease of interest at age a will be diagnosed with that disease in the interval (a, a + τ]. Absolute risk is sometimes called cumulative incidence. Absolute risk is a “crude” risk because it is reduced by the chance that the person will die of competing causes of death before developing the disease of interest. Cohort studies admit flexibility in modeling absolute risk, either by allowing covariates to affect the cause-specific relative hazards or to affect the absolute risk itself. An advantage of cause-specific relative risk models is that various data sources can be used to fit the required components. For example, case–control data can be used to estimate relative risk and attributable risk, and these can be combined with registry data on age-specific composite hazard rates for the disease of interest and with national data on competing hazards of mortality to estimate absolute risk. Family-based designs, such as the kin-cohort design and collections of pedigrees with multiple affected individuals can be used to estimate the genotype-specific hazard of disease. Such analyses must be adjusted for ascertainment, and failure to take into account residual familial risk, such as might be induced by unmeasured genetic variants or by unmeasured behavioral or environmental exposures that are correlated within families, can lead to overestimates of mutation-specific absolute risk in the general population.
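The "crude" risk described above can be sketched in discrete time: in each age interval, the disease hazard is weighted by the probability of having escaped both the disease and the competing causes of death up to that interval. The hazard values in the test are hypothetical.

```python
def absolute_risk(h_disease, h_death):
    """Discrete-time crude risk: sum over intervals of
    P(event-free so far) * P(disease this interval),
    where survival is depleted by both the disease and competing mortality."""
    surv = 1.0   # probability of being alive and disease-free at interval start
    risk = 0.0
    for h1, h2 in zip(h_disease, h_death):
        risk += surv * h1
        surv *= (1.0 - h1 - h2)
    return risk
```

With the competing hazard set to zero this reduces to ordinary cumulative incidence, 1 minus the product of interval survival probabilities, and any positive competing hazard strictly lowers the crude risk, which is the point the abstract makes.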

11.
In 1918 R.A. Fisher published an interpretation of covariation between relatives in terms of Mendelian inheritance, which has allowed inference on genetic and environmental components of variation from plant, animal and human pedigree data. Fisher had introduced maximum likelihood six years earlier, and his 1918 paper also contained the basics of linear regression and decomposition of variance. These concepts have now been united to allow flexible modelling of the mean and covariance structure of non-independent data on continuous traits, using maximum likelihood under a multivariate normal assumption. FISHER is a software package designed for pedigree analysis and easily adapted for repeated-measures and longitudinal data analysis. A range of applications illustrates FISHER as a useful statistical tool. Issues related to assumptions, tests of fit, and robustness of inference are discussed.

12.
In this article we propose a method called GLLS for fitting bilinear time series models. The GLLS procedure combines the LASSO method, the generalized cross-validation method, the least angle regression method, and the stepwise regression method. Compared with traditional methods such as the repeated residual method and the genetic algorithm, GLLS has the advantages of shrinking the coefficients of the models and saving computational time. Monte Carlo simulation studies and a real data example are reported to assess the performance of the proposed GLLS method.
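GLLS itself chains several selection steps; as a stand-in for its LASSO component only, a plain coordinate-descent LASSO illustrates the coefficient shrinkage the abstract refers to. This is not the authors' procedure, and the penalty `lam` and data are illustrative.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for 0.5*||y - Xb||^2 + lam*||b||_1.
    Each coordinate update soft-thresholds the partial correlation."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)            # per-column sum of squares
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove all fitted effects except coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return beta
```

In a bilinear-model fit the columns of `X` would be the lagged and cross-product regressors; here any design matrix demonstrates how irrelevant coefficients are driven to (near) zero.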

13.
Family-based follow-up study designs are important in epidemiology as they enable investigations of disease aggregation within families. Such studies are subject to methodological complications since data may include multiple endpoints as well as intra-family correlation. The methods herein are developed for the analysis of age of onset with multiple disease types for family-based follow-up studies. The proposed model expresses the marginalized frailty model in terms of the subdistribution hazards (SDH). As with Pipper and Martinussen’s (Scand J Stat 30:509–521, 2003) model, the proposed multivariate SDH model yields marginal interpretations of the regression coefficients while allowing the correlation structure to be specified by a frailty term. Further, the proposed model allows for a direct investigation of the covariate effects on the cumulative incidence function since the SDH is modeled rather than the cause specific hazard. A simulation study suggests that the proposed model generally offers improved performance in terms of bias and efficiency when a sufficient number of events is observed. The proposed model also offers type I error rates close to nominal. The method is applied to a family-based study of breast cancer when death in absence of breast cancer is considered a competing risk.

14.
In familial data, ascertainment correction is often necessary to decipher the genetic bases of complex human diseases, because families usually are not drawn at random or are not selected according to well-defined rules. While there has been much progress in identifying genes associated with a given phenotype, little attention has been paid in familial studies to exploring common genetic influences on different phenotypes of interest. In this study, we develop a powerful bivariate analytical approach that can be used for the complex situation of paired binary traits. In addition, our model is framed to accommodate the possibility of imperfect diagnosis, as traits may be wrongly observed. Thus, the primary focus is to see whether a particular gene jointly influences both phenotypes. We examine the plausibility of this theory in a sample of families ascertained on the basis of at least one affected individual. We propose a bivariate binary mixed model that provides a novel and flexible way to account for wrong ascertainment in families collected with multiple cases. A hierarchical Bayesian analysis using a Markov chain Monte Carlo (MCMC) method is carried out to investigate the effect of covariates on disease status. Results based on simulated data indicate that estimates of the parameters are biased when classification errors and/or ascertainment are ignored.

15.
In biomedical research, profiling is now commonly conducted, generating high-dimensional genomic measurements (without loss of generality, say genes). An important analysis objective is to rank genes according to their marginal associations with a disease outcome/phenotype. Clinical covariates, including for example clinical risk factors and environmental exposures, usually exist and need to be properly accounted for. In this study, we propose conducting marginal ranking of genes using a receiver operating characteristic (ROC) based method. This method can accommodate categorical, censored survival, and continuous outcome variables in a very similar manner. Unlike logistic-model-based methods, it does not make very specific model assumptions, making it robust. In ranking genes, we account for both the main effects of clinical covariates and their interactions with genes, and develop multiple diagnostic accuracy improvement measurements. Using simulation studies, we show that the proposed method is effective in that genes, or gene–covariate interactions, associated with the outcome receive high rankings. In data analysis, we observe some differences between the rankings from the proposed method and from the logistic-model-based method.
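A minimal version of ROC-based marginal ranking, without the covariate adjustments and accuracy-improvement measures the paper develops: score each gene by how far its AUC (computed in the Mann–Whitney form) departs from 0.5, and rank. Function names and the toy data are hypothetical.

```python
import numpy as np

def auc(x, y):
    """AUC of scores x for binary outcome y, via the Mann-Whitney statistic:
    fraction of (case, control) pairs ranked correctly, ties counted as 1/2."""
    pos, neg = x[y == 1], x[y == 0]
    return ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())

def rank_genes(expr, y):
    """Rank genes (columns of expr) by |AUC - 0.5|, most discriminative first.
    Using the distance from 0.5 treats over- and under-expression symmetrically."""
    scores = np.array([abs(auc(expr[:, j], y) - 0.5) for j in range(expr.shape[1])])
    return np.argsort(-scores)
```

Because the AUC is rank-based, no logistic (or any parametric) model is assumed, which is the robustness property the abstract highlights.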

16.
Recently developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas in practice it is often the case that only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large, or because only summary data are collected, as in DNA pooling experiments. In this article, we introduce a new statistical method that can accurately infer the frequencies of untyped genetic variants in these settings, and indeed substantially improve frequency estimates at typed variants in pooling experiments where observations are noisy. Our approach, which predicts each allele frequency using a linear combination of observed frequencies, is statistically straightforward and related to a long history of the use of linear methods for estimating missing values (e.g., kriging). The main statistical novelty is our approach to regularizing the covariance matrix estimates, and the resulting linear predictors, which is based on methods from population genetics. We find that, besides being fast and flexible (allowing new problems to be tackled that cannot be handled by existing imputation approaches purpose-built for the genetic context), these linear methods are also very accurate. Indeed, imputation accuracy using this approach is similar to that obtained by state-of-the-art imputation methods that use individual-level data, but at a fraction of the computational cost.
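The linear predictor described above can be sketched as a kriging-style conditional mean: untyped frequencies are predicted from typed ones through a covariance matrix. The paper's novelty is population-genetics-based regularization of that matrix; this sketch substitutes a plain ridge term, and `mu`/`Sigma` (which would come from a reference panel) are illustrative.

```python
import numpy as np

def impute_untyped(freq_obs, mu, Sigma, typed, untyped, ridge=1e-4):
    """Predict untyped allele frequencies as a linear combination of observed
    (typed) frequencies: mu_u + S_uo S_oo^{-1} (f_o - mu_o), the conditional
    mean of a Gaussian. `ridge` crudely stands in for the paper's
    population-genetic regularization of the covariance estimate."""
    S_oo = Sigma[np.ix_(typed, typed)] + ridge * np.eye(len(typed))
    S_uo = Sigma[np.ix_(untyped, typed)]
    w = S_uo @ np.linalg.inv(S_oo)          # kriging-style weights
    return mu[untyped] + w @ (freq_obs - mu[typed])
```

The same formula applies whether the target variant is untyped or a noisily observed pooled frequency, which is why the approach also sharpens typed-variant estimates in pooling experiments.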

17.
The error distribution is generally unknown in deconvolution problems with real applications, so a separate independent experiment is often conducted to collect additional noise data in those studies. In this paper, we study nonparametric deconvolution estimation from a contaminated sample coupled with an additional noise sample. A ridge-based kernel deconvolution estimator is proposed and its asymptotic properties are investigated as a function of the error magnitude. We then present a data-driven bandwidth selection algorithm that combines the bootstrap method with the idea of simulation extrapolation. The finite-sample performance of the proposed methods and the effects of error magnitude are evaluated through simulation studies. A real data analysis of a gene Illumina BeadArray study is performed to illustrate the use of the proposed methods.

18.
In many epidemiologic studies the first indication of an environmental or genetic contribution to the risk of disease is the way in which the diseased cases cluster within the same family units. The concept of clustering is contrasted with incidence. We assume that all individuals within the same family are independent, up to their disease status. This assumption is used to provide an exact test of the initial hypothesis of no familial link with the disease, conditional on the number of diseased cases and the sizes of the various family units. Ascertainment bias is described and the appropriate sampling distribution is demonstrated. Two numerical examples with published data illustrate these methods.
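A Monte Carlo stand-in for the exact conditional test described above: holding the family sizes and the total number of cases fixed, case labels are shuffled uniformly across individuals, and clustering is measured by the number of within-family case pairs. The statistic and all counts are illustrative, not the authors' exact formulation (which enumerates the conditional distribution rather than sampling it).

```python
import numpy as np

def cluster_test(family_sizes, cases_per_family, n_perm=2000, seed=0):
    """Monte Carlo p-value for familial clustering, conditional on the
    number of cases and the family sizes: under the null, the cases are a
    uniform random subset of all individuals."""
    rng = np.random.default_rng(seed)
    sizes = np.asarray(family_sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)    # one family label per person
    n_cases = int(np.sum(cases_per_family))
    stat = lambda d: int(np.sum(d * (d - 1) // 2))      # within-family case pairs
    obs = stat(np.asarray(cases_per_family))
    count = 0
    for _ in range(n_perm):
        pick = rng.choice(labels, size=n_cases, replace=False)
        count += stat(np.bincount(pick, minlength=len(sizes))) >= obs
    return (count + 1) / (n_perm + 1)
```

Conditioning on the margins is what makes the test exact in principle; the add-one in the p-value keeps the Monte Carlo estimate valid.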

19.
Spatially correlated data appear in many environmental studies, and consequently there is an increasing demand for estimation methods that take account of spatial correlation and thereby improve the accuracy of estimation. In this paper we propose an iterative nonparametric procedure for modelling spatial data with general correlation structures. The asymptotic normality of the proposed estimators is established under mild conditions. We demonstrate, using both simulation and case studies, that the proposed estimators are more efficient than the traditional locally linear methods which fail to account for spatial correlation.

20.
We present two new statistics for estimating the number of factors underlying a multivariate system. One of the two new methods, the original NUMFACT, has been used in high-profile environmental studies. The two new methods are first explained from a geometrical viewpoint. We then present an algebraic development and asymptotic cutoff points. Next, we present a simulation study showing that for skewed data the new methods are typically superior to traditional methods, and that for normally distributed data the new methods are competitive with the best of the traditional methods. We finally show how the methods compare on two environmental data sets.
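As a point of reference for the "traditional methods" the abstract compares against, one common baseline is the Kaiser rule: retain as many factors as there are correlation-matrix eigenvalues greater than one. This is not NUMFACT, and the simulated two-factor data in the test are illustrative.

```python
import numpy as np

def kaiser_count(data):
    """Traditional baseline for the number of factors: count eigenvalues of
    the sample correlation matrix that exceed 1 (each standardized variable
    contributes unit variance, so a factor should explain more than one)."""
    corr = np.corrcoef(data, rowvar=False)
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))
```

Eigenvalue-based rules like this are sensitive to skewness in the data, which is the regime where the abstract reports its new statistics doing better.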
