Similar Documents (20 results found)
1.
The field of genetic epidemiology is growing rapidly with the realization that many important diseases are influenced by both genetic and environmental factors. For this reason, pedigree data are becoming increasingly valuable as a means of studying patterns of disease occurrence. Analysis of pedigree data is complicated by the lack of independence among family members and by the non-random sampling schemes used to ascertain families. An additional complicating factor is the variability in age at disease onset from one person to another. In developing statistical methods for analysing pedigree data, analytic results are often intractable, making simulation studies imperative for assessing the performance of proposed methods and estimators. In this paper, an algorithm is presented for simulating disease data in pedigrees, incorporating variable age at onset and genetic and environmental effects. Computational formulas are developed in the context of a proportional hazards model and assuming single ascertainment of families, but the methods can be easily generalized to alternative models. The algorithm is computationally efficient, making multi-dataset simulation studies feasible. Numerical examples are provided to demonstrate the methods.
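The simulation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes a constant (exponential) baseline hazard and log-linear effects of a binary genotype and an environmental covariate, with all parameter names and values hypothetical, and it draws onset ages by inverse-transform sampling under the proportional hazards model.

```python
import math
import random


def simulate_onset(genotype, env, beta_g=0.7, beta_e=0.3, baseline=0.01, rng=random):
    """Draw an age at disease onset under a proportional hazards model.

    Hazard: h(t) = baseline * exp(beta_g*genotype + beta_e*env), constant in t,
    so the onset age is exponential and can be drawn by inverse transform.
    All parameter values here are hypothetical.
    """
    rate = baseline * math.exp(beta_g * genotype + beta_e * env)
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return -math.log(u) / rate


def simulate_family(n_members, allele_freq=0.2, rng=random):
    """Toy family simulator: genotypes drawn independently for brevity.

    A real pedigree simulator would transmit alleles from parents to
    offspring; that bookkeeping is omitted here.
    """
    members = []
    for _ in range(n_members):
        g = int(rng.random() < allele_freq)  # carrier indicator
        e = rng.gauss(0.0, 1.0)              # environmental covariate
        members.append((g, e, simulate_onset(g, e, rng=rng)))
    return members
```

Because carriers have a proportionally higher hazard, their simulated onset ages are stochastically earlier, which is the qualitative behavior the ascertainment-corrected estimators are meant to recover.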

2.
Consider a standard conjugate family of prior distributions for a vector-parameter indexing an exponential family. Two distinct model parameterizations may well lead to standard conjugate families which are not consistent, i.e. one family cannot be derived from the other by the usual change-of-variable technique. This raises the problem of finding suitable parameterizations that may lead to enriched conjugate families which are more flexible than the traditional ones. The previous remark motivates the definition of a new property for an exponential family, named conditional reducibility. Features of conditionally-reducible natural exponential families are investigated thoroughly. In particular, we relate this new property to the notion of cut, and show that conditionally-reducible families admit a reparameterization in terms of a vector having likelihood-independent components. A general methodology to obtain enriched conjugate distributions for conditionally-reducible families is described in detail, generalizing previous works and more recent contributions in the area. The theory is illustrated with reference to natural exponential families having simple quadratic variance function.

3.
A generalized linear empirical Bayes model is developed for empirical Bayes analysis of several means in natural exponential families. A unified approach is presented for all natural exponential families with quadratic variance functions (the Normal, Poisson, Binomial, Gamma, and two others). The hyperparameters are estimated using the extended quasi-likelihood of Nelder and Pregibon (1987), which is easily implemented via the GLIM package. The accuracy of these estimates is assessed via an asymptotic approximation of the variance. Two data examples illustrate the approach.

4.
Many late-onset diseases are caused by what appears to be a combination of a genetic predisposition to disease and environmental factors. The use of existing cohort studies provides an opportunity to infer genetic predisposition to disease on a representative sample of a study population, now that many such studies are gathering genetic information on the participants. One complication of using existing cohorts is that subjects may be censored due to death prior to genetic sampling, adding a layer of complexity to the analysis. We develop a statistical framework to infer parameters of a latent variables model for disease onset. The latent variables model describes the role of genetic and modifiable risk factors on the onset ages of multiple diseases, and accounts for right-censoring of disease onset ages. The framework also allows for missing genetic information by inferring a subject's unknown genotype through appropriately incorporated covariate information. The model is applied to data gathered in the Framingham Heart Study for measuring the effect of different Apo-E genotypes on the occurrence of various cardiovascular disease events.

5.
Large cohort studies are commonly launched to study the risk effect of genetic variants or other risk factors on a chronic disorder. In these studies, family data are often collected to provide additional information for the purpose of improving the inference results. Statistical analysis of the family data can be very challenging due to the missing observations of genotypes, incomplete records of disease occurrences in family members, and the complicated dependence attributed to the shared genetic background and environmental factors. In this article, we investigate a class of logistic models with family-shared random effects to tackle these challenges, and develop a robust regression method based on the conditional logistic technique for statistical inference. An expectation–maximization (EM) algorithm with fast computation speed is developed to handle the missing genotypes. The proposed estimators are shown to be consistent and asymptotically normal. Additionally, a score test based on the proposed method is derived to test the genetic effect. Extensive simulation studies demonstrate that the proposed method performs well in finite samples in terms of estimation accuracy, robustness and computational speed. The proposed procedure is applied to an Alzheimer's disease study.

6.
It may sometimes be clear from background knowledge that a population under investigation proportionally consists of a known number of subpopulations, whose distributions belong to the same, yet unknown, family. While a parametric family is commonly used in practice, one can also consider some nonparametric families to avoid distributional misspecification. In this article, we propose a solution using a mixture-based nonparametric family for the component distribution in a finite mixture model as opposed to some recent research that utilizes a kernel-based approach. In particular, we present a semiparametric maximum likelihood estimation procedure for the model parameters and tackle the bandwidth parameter selection problem via some popular means for model selection. Empirical comparisons through simulation studies and three real data sets suggest that estimators based on our mixture-based approach are more efficient than those based on the kernel-based approach, in terms of both parameter estimation and overall density estimation.

7.
Family‐based case–control designs are commonly used in epidemiological studies for evaluating the role of genetic susceptibility and environmental exposure to risk factors in the etiology of rare diseases. Within this framework, it is often reasonable to assume that genetic susceptibility and environmental exposure are conditionally independent of each other within families in the source population. We focus on this setting to explore the situation of measurement error affecting the assessment of the environmental exposure. We correct for measurement error through a likelihood‐based method. We exploit a conditional likelihood approach to relate the probability of disease to the genetic and the environmental risk factors. We show that this approach provides less biased and more efficient results than those based on logistic regression. Regression calibration, instead, provides severely biased estimators of the parameters. The comparison of the correction methods is performed through simulation, under common measurement error structures.

8.
The problem of inference based on a rounded random sample from the exponential distribution is treated. The main results are given by an explicit expression for the maximum-likelihood estimator, a confidence interval with a guaranteed level of confidence, and a conjugate class of distributions for Bayesian analysis. These results are illustrated with two concrete examples. The large and increasing body of results on the topic of grouped data has mostly focused on the effect on the estimators. The methods and results for the derivation of confidence intervals here are hence of some general theoretical value as a model approach for other parametric models. The Bayesian credibility interval recommended in cases with a lack of other prior information follows by letting the prior equal the inverted exponential with a scale equal to one divided by the resolution. It is shown that this corresponds to the standard non-informative prior for the scale in the case of non-rounded data. For cases with the absence of explicit prior information it is argued that the inverted exponential prior with a scale given by the resolution is also a reasonable choice for more general digitized scale families.
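As a rough illustration of why rounding admits an explicit maximum-likelihood estimator: if observations are assumed rounded down to integer multiples of the resolution h (the paper's rounding convention may differ), the rounded counts follow a geometric distribution and the MLE has a closed form. This sketch is not taken from the paper.

```python
import math


def mle_rate_rounded(rounded, h):
    """MLE of the exponential rate from data rounded DOWN to multiples of h.

    With K = floor(X/h), K is geometric: P(K=k) = (1-q) * q**k with
    q = exp(-rate*h), which yields the closed form
        rate_hat = log(1 + 1/mean(K)) / h.
    """
    ks = [round(x / h) for x in rounded]  # recover the integer multiples
    kbar = sum(ks) / len(ks)
    if kbar == 0:
        raise ValueError("all observations rounded to zero; the MLE is unbounded")
    return math.log(1.0 + 1.0 / kbar) / h
```

As h shrinks, log(1 + 1/kbar)/h approaches 1/mean(X), the familiar MLE for non-rounded exponential data.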

9.
The quadratic discriminant function (QDF) with known parameters has been represented as a weighted sum of independent noncentral chi-square variables. To approximate the density function of the QDF by an m-dimensional exponential family, its moments of each order have been calculated. This is done using a recursive formula for the moments via Stein's identity in the exponential family. We validate the performance of our method in a simulation study and compare it with other methods in the literature on real data. The results reveal better estimation of misclassification probabilities and less computation time with our method.

10.
In the case of exponential families, it is a straightforward matter to approximate a density function by use of summary statistics; however, an appropriate approach to such approximation is far less clear when an exponential family is not assumed. In this paper, a maximin argument based on information theory is used to derive a new approach to density approximation from summary statistics which is not restricted by the assumption of validity of an underlying exponential family. Information-theoretic criteria are developed to assess loss of predictive power of summary statistics under such minimal knowledge. Under these criteria, optimal density approximations in the maximin sense are obtained and shown to be related to exponential families. Conditions for existence of optimal density approximations are developed. Applications of the proposed approach are illustrated, and methods for estimation of densities are provided in the case of simple random sampling. Large-sample theory for estimates is developed.

11.
In familial data, ascertainment correction is often necessary to decipher genetic bases of complex human diseases. This is because families usually are not drawn at random or are not selected according to well-defined rules. While there has been much progress in identifying genes associated with a certain phenotype, little attention has so far been paid in familial studies to exploring common genetic influences on different phenotypes of interest. In this study, we develop a powerful bivariate analytical approach that can be used for a complex situation with paired binary traits. In addition, our model has been framed to accommodate the possibility of imperfect diagnosis, as traits may be wrongly observed. Thus, the primary focus is to see whether a particular gene jointly influences both phenotypes. We examine the plausibility of this theory in a sample of families ascertained on the basis of at least one affected individual. We propose a bivariate binary mixed model that provides a novel and flexible way to account for wrong ascertainment in families collected with multiple cases. A hierarchical Bayesian analysis using Markov chain Monte Carlo (MCMC) methods has been carried out to investigate the effect of covariates on the disease status. Results based on simulated data indicate that estimates of the parameters are biased when classification errors and/or ascertainment are ignored.

12.
Two dice are rolled repeatedly, and only their sum is registered. Have the two dice been “shaved,” so that two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding the data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler; it provides additional insight about the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content of ancillary statistics.
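The incomplete-data view lends itself directly to the EM algorithm. Here is a minimal sketch, not the paper's implementation: it assumes the two dice are identically "shaved" (a common face-probability vector) and treats the unobserved ordered face pair as the complete data.

```python
import itertools


def em_shaved_dice(sum_counts, n_iter=200):
    """EM for the face probabilities of two i.i.d. dice when only sums are seen.

    sum_counts: dict mapping observed sums (2..12) to their counts.
    Faces are indexed 0..5 for values 1..6.
    """
    p = [1.0 / 6] * 6
    pairs = list(itertools.product(range(6), repeat=2))  # ordered face pairs
    for _ in range(n_iter):
        face_exp = [0.0] * 6
        for s, n in sum_counts.items():
            # E-step: split the n rolls with sum s over the compatible pairs
            compat = [(i, j) for i, j in pairs if i + j + 2 == s]
            z = sum(p[i] * p[j] for i, j in compat)
            for i, j in compat:
                w = n * p[i] * p[j] / z
                face_exp[i] += w
                face_exp[j] += w
        total = sum(face_exp)                 # equals 2 * (number of rolls)
        p = [c / total for c in face_exp]     # M-step: normalized face counts
    return p
```

When the observed sum frequencies match those of fair dice, the uniform vector is a fixed point of the iteration, as one would hope.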

13.
In many epidemiologic studies the first indication of an environmental or genetic contribution to the risk of disease is the way in which the diseased cases cluster within the same family units. The concept of clustering is contrasted with incidence. We assume that all individuals within the same family are independent, up to their disease status. This assumption is used to provide an exact test of the initial hypothesis of no familial link with the disease, conditional on the number of diseased cases and the sizes of the various family units. Ascertainment bias is described and the appropriate sampling distribution is demonstrated. Two numerical examples with published data illustrate these methods.
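Conditioning on the total number of cases and the family sizes means the null hypothesis distributes cases uniformly over individuals. A Monte Carlo approximation of such a conditional test can be sketched as follows; the clustering statistic (same-family case pairs) and all names are chosen here for illustration, not taken from the paper.

```python
import random


def clustering_pvalue(family_sizes, cases_per_family, n_perm=10000, seed=1):
    """Monte Carlo p-value for familial clustering of disease cases.

    Conditional on the total number of cases D and the family sizes, the
    null spreads the D cases uniformly over all individuals. The statistic
    counts pairs of cases falling in the same family (larger = more clustered).
    """
    rng = random.Random(seed)

    def same_family_pairs(counts):
        return sum(c * (c - 1) // 2 for c in counts)

    observed = same_family_pairs(cases_per_family)
    # one family label per individual
    labels = [f for f, size in enumerate(family_sizes) for _ in range(size)]
    d = sum(cases_per_family)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(labels, d)  # families of D randomly placed cases
        counts = [0] * len(family_sizes)
        for f in sample:
            counts[f] += 1
        if same_family_pairs(counts) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

For example, four cases concentrated in one family of four among five equal families yields a very small p-value, while four cases spread across four families yields p = 1.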

14.
Large, family-based imaging studies can provide a better understanding of the interactions of environmental and genetic influences on brain structure and function. The interpretation of imaging data from large family studies, however, has been hindered by the paucity of well-developed statistical tools that permit the analysis of complex imaging data together with behavioral and clinical data. In this paper, we propose two methods for these analyses. First, a variance components model along with score statistics is used to test linear hypotheses of unknown parameters, such as the associations of brain measures (e.g., cortical and subcortical surfaces) with their potential genetic determinants. Second, we develop a test procedure based on a resampling method to assess simultaneously the statistical significance of linear hypotheses across the entire brain. The value of these methods lies in their computational simplicity and in their applicability to a wide range of imaging data. Simulation studies show that our test procedure can accurately control the family-wise error rate. We apply our methods to the detection of statistical significance of gender-by-age interactions and of the effects of genetic variation on the thickness of the cerebral cortex in a family study of major depressive disorder.
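A standard resampling device for simultaneous significance across many brain locations is the max-statistic permutation threshold. The sketch below uses simple mean differences rather than the paper's score statistics, and two-group label permutation rather than the family-structured resampling the paper would require; all names are illustrative.

```python
import random
import statistics


def maxstat_threshold(data_a, data_b, alpha=0.05, n_perm=2000, seed=7):
    """Permutation family-wise threshold via the max statistic.

    data_a, data_b: lists of per-subject vectors (one entry per brain
    location). The (1-alpha) quantile of the permutation distribution of
    max_v |mean_a(v) - mean_b(v)| controls the family-wise error rate.
    """
    rng = random.Random(seed)
    pooled = data_a + data_b
    n_a = len(data_a)
    n_loc = len(pooled[0])

    def max_mean_diff(group_a, group_b):
        return max(
            abs(statistics.fmean(x[v] for x in group_a)
                - statistics.fmean(x[v] for x in group_b))
            for v in range(n_loc)
        )

    null = []
    for _ in range(n_perm):
        perm = rng.sample(pooled, len(pooled))  # shuffle group labels
        null.append(max_mean_diff(perm[:n_a], perm[n_a:]))
    null.sort()
    return null[min(int((1 - alpha) * n_perm), n_perm - 1)]
```

Any location whose observed statistic exceeds this threshold is declared significant with the family-wise error rate held at alpha.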

15.
16.
We propose a model selection criterion for correlated survival data when the cluster size is informative to the outcome. This approach, called the Resampling Cluster Survival Information Criterion (RCSIC), uses the Cox proportional hazards model weighted by the inverse of the cluster size. The RCSIC, based on the within-cluster resampling idea, takes into account the possible variability of the within-cluster subsampling and the possible informativeness of cluster sizes. The RCSIC allows easy execution of the within-cluster resampling idea without a large number of resamples of the data. In contrast with the traditional model selection method in survival analysis, the RCSIC has an additional penalization for the within-cluster subsampling variability. Our simulations show satisfactory results: the RCSIC provides more robust power for variable selection in clustered survival analysis, regardless of whether informative cluster size exists. Applying the RCSIC method to a periodontal disease study, we identify tooth loss in patients as associated with the risk factors Age, Filled Tooth, Molar, Crown, Decayed Tooth, and Smoking Status.
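The inverse-cluster-size weighting and the within-cluster resampling scheme it approximates can be sketched as follows; the helper names are illustrative, not from the paper.

```python
import random
from collections import Counter


def inverse_cluster_weights(cluster_ids):
    """Weight each subject by 1 / (size of its cluster).

    This weighting reproduces, in expectation, the within-cluster
    resampling scheme that draws one subject per cluster, so every
    cluster contributes equally regardless of its size.
    """
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]


def wcr_draw(cluster_ids, rng=random):
    """One within-cluster resample: one randomly chosen member per cluster.

    Returns the selected subject indices, one per cluster, in first-seen
    cluster order.
    """
    by_cluster = {}
    for idx, c in enumerate(cluster_ids):
        by_cluster.setdefault(c, []).append(idx)
    return [rng.choice(members) for members in by_cluster.values()]
```

The weights would then be supplied to a weighted Cox fit; note that they sum to the number of clusters, not the number of subjects, which is why each cluster counts once.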

17.
Classes of processes of the diffusion type permitting a sufficient data reduction are derived. None of these classes are exponential families in the usual sense. For one type of such classes the sufficient statistic equals that of a curved exponential family of diffusion-type processes. For a second type the last observation is sufficient. In particular cases both types of classes are defined by means of a Riccati equation.

18.
In fitting a generalized linear model, many authors have noticed that data sets can show greater residual variability than predicted under the exponential family. Two main approaches have been used to model this overdispersion. The first approach uses a sampling density which is a conjugate mixture of exponential family distributions. The second uses a quasilikelihood which adds a new scale parameter to the exponential likelihood. The approaches are compared by means of a Bayesian analysis using noninformative priors. In examples, it is indicated that the posterior analysis can be significantly different using the two approaches.
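The quasilikelihood route adds a dispersion parameter that is typically estimated from the Pearson statistic of the fitted model. A minimal sketch for the Poisson family, whose variance function is V(mu) = mu (the function name is illustrative):

```python
def pearson_dispersion(y, mu, n_params):
    """Moment estimate of the quasilikelihood dispersion for a fitted GLM.

    phi_hat = (1/(n - p)) * sum_i (y_i - mu_i)**2 / V(mu_i), with
    V(mu) = mu for the Poisson family. A value well above 1 signals
    overdispersion relative to the exponential-family variance.
    """
    n = len(y)
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (n - n_params)
```

Under the quasilikelihood approach, standard errors from the exponential-family fit are inflated by the square root of this estimate.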

19.
In studies of disease inheritance, it is more convenient to collect family data by first locating an affected individual and then enquiring about the status of his or her relatives. Although the different categories of children classified by disease, sex, and other covariates may have a particular multinomial distribution among families of a given size, the numbers as ascertained do not have the same distribution because of unequal probabilities of selection of families. The introduction of weighted distributions to correct for ascertainment bias in the estimation of parameters in the classical segregation model can be traced to Fisher in 1934. This theory was presented in a general formulation by C. R. Rao at the First International Symposium on Classical and Contagious Distributions in 1963. Further expansion on the topic was given by C. R. Rao in the ISI Centenary Volume published in 1985. The effects of different two-phase sampling designs on the estimation of parameters in the classical segregation model are examined. An approximation to the classical segregation likelihood model is found to produce results close to those of the exact likelihood function in Monte Carlo simulations for a balanced two-phase design. This has implications for more complex models in which the computation of the exact likelihood is prohibitive, such as for the enhancement of a typical survey sampling plan designed initially for linkage analysis but then used retroactively for a combined segregation and linkage analysis.
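Fisher's 1934 correction replaces the binomial likelihood for the number of affected children by a zero-truncated one, since a family can only be ascertained if it has at least one affected child (complete ascertainment). A sketch of the resulting MLE for the segregation ratio, with a grid search used purely for illustration; in practice Newton's method would be used.

```python
import math


def truncated_binom_loglik(p, families):
    """Log-likelihood of segregation ratio p under complete ascertainment.

    families: list of (s, r) pairs = (sibship size, number affected), r >= 1.
    Each family contributes C(s, r) * p**r * (1-p)**(s-r) / (1 - (1-p)**s),
    the binomial truncated at zero affected.
    """
    ll = 0.0
    for s, r in families:
        ll += (math.log(math.comb(s, r)) + r * math.log(p)
               + (s - r) * math.log(1 - p) - math.log(1 - (1 - p) ** s))
    return ll


def mle_segregation_ratio(families, grid=2000):
    """Grid-search MLE of p on (0, 1) -- illustrative only."""
    best_p, best_ll = None, -math.inf
    for k in range(1, grid):
        p = k / grid
        ll = truncated_binom_loglik(p, families)
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p
```

As a check, among ascertained sibships of size two the truncated model gives P(both affected) = p/(2 - p), so observing that ratio at 1/7 recovers p = 1/4.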

20.