Similar Documents
20 similar documents found.
1.
We study genotype calling algorithms for high-throughput single-nucleotide polymorphism (SNP) arrays. Building upon the novel SNP-robust multi-chip average preprocessing approach and the state-of-the-art corrected robust linear model with Mahalanobis distance (CRLMM) approach for genotype calling, we propose a simple modification that uses empirical Bayes modeling to better combine the information across multiple SNPs, which often significantly improves the genotype calls of CRLMM. Through applications to the HapMap Trio data set and a non-HapMap test set of high-quality SNP chips, we illustrate the competitive performance of the proposed method.
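The empirical Bayes idea can be sketched as a normal-normal shrinkage of noisy per-SNP estimates toward an across-SNP average. This is a minimal illustration under assumed normal models, not the CRLMM implementation; the function name and the method-of-moments prior estimates are ours.

```python
import numpy as np

def eb_shrink(est, se):
    """Normal-normal empirical Bayes shrinkage (illustrative sketch).

    Per-SNP estimates are pulled toward the across-SNP mean, with the
    prior mean and variance estimated from the ensemble of SNPs by the
    method of moments.
    """
    est, se = np.asarray(est, float), np.asarray(se, float)
    mu0 = est.mean()                                   # prior mean
    tau2 = max(est.var() - (se**2).mean(), 1e-12)      # prior variance
    w = tau2 / (tau2 + se**2)                          # per-SNP weight
    return w * est + (1 - w) * mu0

# Example: noisy per-SNP cluster centers shrink toward the ensemble mean.
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, 500)
noisy = truth + rng.normal(0.0, 0.5, 500)
shrunk = eb_shrink(noisy, np.full(500, 0.5))
```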

2.
In a kin-cohort design, genotype data are first obtained from a sample composed mostly of individuals who have experienced onset of a disease. Disease history data are then obtained from their relatives. The design is useful when examining the conditional distribution, given genotype, of phenotypes such as disease severity scores when some genotypes are rare. Here, the problem of combining the genotype data on the probands with the phenotype information on their relatives is considered. A class of unbiased estimators is described, its optimal member, which attains the semiparametric efficiency bound, is identified, and results from simulation experiments are discussed.

3.
We consider testing hypotheses about a single Poisson mean. When prior information is not available, use of objective priors is of interest. We provide intrinsic priors based on the arithmetic intrinsic and fractional Bayes factors, and evaluate their characteristics.
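For orientation, the Bayes factor for H0: λ = λ0 against a Gamma(a, b) prior on λ has a closed form, because the Gamma prior is conjugate to the Poisson likelihood. The sketch below uses an ordinary conjugate prior standing in for the intrinsic priors derived in the paper; the hyperparameters a and b are illustrative.

```python
import math

def log_bf01_poisson(x, lam0, a=1.0, b=1.0):
    """log Bayes factor for H0: lam = lam0 vs H1: lam ~ Gamma(a, b).

    Sketch with a conjugate Gamma prior (not the intrinsic prior); the
    factorial terms cancel between the two marginal likelihoods.
    """
    n, s = len(x), sum(x)
    log_m0 = -n * lam0 + s * math.log(lam0)            # H0 marginal
    log_m1 = (a * math.log(b) - math.lgamma(a)         # H1 marginal
              + math.lgamma(a + s) - (a + s) * math.log(b + n))
    return log_m0 - log_m1

print(log_bf01_poisson([1, 3, 2, 2, 0, 4], lam0=2.0))
```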

4.
Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNPs) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple-comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate the false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online.
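The permutation baseline being improved upon can be sketched as a min-p (max-statistic) adjustment: permuting the phenotype preserves the LD among SNPs while breaking the genotype-phenotype association. This is an illustrative score-type statistic and function of our own, not the de-clumping algorithm itself.

```python
import numpy as np

def minp_adjust(genotypes, phenotype, n_perm=1000, seed=0):
    """Permutation (min-p) family-wise p-value adjustment (sketch).

    genotypes: (n_subjects, n_snps) array; phenotype: (n_subjects,) array.
    Each observed per-SNP statistic is compared with the permutation
    null distribution of the genome-wide maximum statistic.
    """
    rng = np.random.default_rng(seed)
    G = genotypes - genotypes.mean(0)
    y = phenotype - phenotype.mean()
    denom = np.sqrt((G**2).sum(0) * (y**2).sum())
    z_obs = np.abs(G.T @ y) / denom
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        max_null[i] = (np.abs(G.T @ rng.permutation(y)) / denom).max()
    return (z_obs[:, None] <= max_null[None, :]).mean(1)  # adjusted p-values
```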

5.
Data collected on a rectangular lattice are common in many areas, and the models used often make simplifying assumptions. These assumptions include axial symmetry in the spatial process and separability. Some different methods for testing axial symmetry and separability are considered. Using the sample periodogram is shown to provide simple, satisfactory tests of both hypotheses, but tests for separability given axial symmetry have low power for small lattices.
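A sketch of the basic ingredient, assuming a mean-corrected lattice: under axial symmetry the spectral density satisfies f(ω1, ω2) = f(ω1, −ω2), so periodogram ordinates at frequency pairs mirrored in the second axis should agree on average.

```python
import numpy as np

def periodogram2d(z):
    """Sample periodogram of a mean-corrected rectangular lattice."""
    z = z - z.mean()
    n1, n2 = z.shape
    return np.abs(np.fft.fft2(z))**2 / (n1 * n2)

# Illustrative check on white noise (which is axially symmetric): the
# ratio of ordinates at (w1, w2) and (w1, -w2) should scatter around 1.
rng = np.random.default_rng(1)
I = periodogram2d(rng.normal(size=(32, 32)))
mirror_ratio = I[1:, 1:] / I[1:, :0:-1]   # column k vs column n2 - k
```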

6.
The transmission/disequilibrium test (TDT) is widely used to detect linkage disequilibrium between a candidate locus (a marker) and a disease locus. The TDT is a family-based design and has the advantage of remaining valid when population stratification exists. The TDT requires the marker genotypes of affected individuals and their parents. For diseases with late age of onset, it is difficult or impossible to obtain the marker genotypes of the parents. Therefore, when both parents' marker genotypes are unavailable, Ewens and Spielman extended the TDT to the S-TDT for use in sibships with at least one affected and one unaffected individual. When only one parent's genotype is available, Sun et al. proposed a test, the 1-TDT, for use with the marker genotypes of affected individuals and only one available parent. Here, we study the sample sizes of the TDT, S-TDT, and 1-TDT. We show that the sample size needed for the 1-TDT is roughly the same as that needed for the S-TDT with two sibs and is about twice the sample size for the TDT.
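The TDT itself is a McNemar-type statistic on transmission counts from heterozygous parents; a minimal sketch, with illustrative counts:

```python
from scipy.stats import chi2

def tdt(b, c):
    """Transmission/disequilibrium test.

    b: heterozygous parents transmitting the candidate allele to an
    affected child; c: those transmitting the other allele. Under the
    null of no linkage, (b - c)**2 / (b + c) is chi-square with 1 df.
    """
    stat = (b - c)**2 / (b + c)
    return stat, chi2.sf(stat, df=1)

print(tdt(60, 40))  # illustrative counts: statistic 4.0, p approx 0.046
```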

7.
We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (Commun. Stat., Simul. Comput. 36:45–54, 2007). This new sampler allows for the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we consider priors defined through infinite sequences of independent positive random variables. Two applications are considered: density estimation using mixture models and hazard function estimation. In each case we show how the slice-efficient sampler can be applied to make inference in the models. In the mixture case, two submodels are studied in detail. The first assumes that the positive random variables are Gamma distributed and the second assumes that they are inverse-Gaussian distributed. Both priors have two hyperparameters and we consider their effect on the prior distribution of the number of occupied clusters in a sample. Extensive computational comparisons with alternative “conditional” simulation techniques for mixture models, using the standard Dirichlet process prior and our new priors, are made. The properties of the new priors are illustrated on a density estimation problem.
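The device at the heart of the slice sampler is one latent uniform variable per observation that truncates the infinite stick-breaking mixture to finitely many components in each sweep. Below is a minimal sketch of that truncation step, assuming standard Dirichlet process sticks with concentration 1; it is not the full sampler of the paper.

```python
import numpy as np

# One slice-truncation step (sketch). Given allocations d and weights w,
# draw u_i ~ Uniform(0, w[d_i]); only components with w_k > min(u) can
# receive mass this sweep, so the infinite mixture becomes finite.
rng = np.random.default_rng(2)
v = rng.beta(1.0, 1.0, size=50)                           # stick fractions
w = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))   # mixture weights
d = rng.choice(len(w), size=20, p=w / w.sum())            # allocations
u = rng.uniform(0.0, w[d])                                # slice variables
active = np.flatnonzero(w > u.min())                      # needed components
```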

8.
Model selection procedures often depend explicitly on the sample size n of the experiment. One example is the Bayesian information criterion (BIC), and another is the use of Zellner–Siow priors in Bayesian model selection. Sample size is well defined if one has i.i.d. real observations, but is not well defined for vector observations or in non-i.i.d. settings; extending criteria such as BIC to such settings thus requires a definition of effective sample size that applies in those cases as well. A definition of effective sample size that applies to fairly general linear models is proposed and illustrated in a variety of situations. The definition is also used to propose a suitable ‘scale’ for default proper priors for Bayesian model selection.
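For reference, BIC penalizes model complexity by k log n, so the criterion is only as well defined as n itself; the sketch below simply makes that dependence explicit, with n_eff standing in for whatever effective sample size definition is adopted.

```python
import math

def bic(loglik, k, n_eff):
    """BIC = -2 * log-likelihood + k * log(n).

    For i.i.d. scalar data, n_eff is just the number of observations;
    the point of the paper is choosing n_eff for vector or dependent
    observations, where the raw row count can be misleading.
    """
    return -2.0 * loglik + k * math.log(n_eff)
```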

9.
The versatile criterion called the intrinsic Bayes factor (IBF), introduced by Berger and Pericchi [J. Amer. Statist. Assoc. 91 (1996) 109–122], has made it possible to perform model selection and hypothesis testing using standard (improper) noninformative priors in a variety of situations. In this paper, we use their methodology to test several hypotheses regarding the shape parameter of the power law process, which has been widely used to model failure times of repairable systems. Assuming that we have data from the process according to the time-truncation sampling scheme, we derive the arithmetic IBFs using four default priors, including the reference and Jeffreys priors. We establish the frequentist probability-matching properties of these priors. We also identify two priors that are justifiable under both time-truncation and failure-truncation schemes, so that the IBFs for both schemes can be unified. Deriving the intrinsic priors of a certain canonical form, as the time of truncation tends to infinity, we show that the arithmetic IBFs correspond asymptotically to actual Bayes factors. We also discuss the expected IBFs, which are useful with small samples. We then use these results to analyze an actual data set on the interruption times of a transmission line, summarizing our results under the default priors.
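As a frequentist reference point before the Bayesian treatment: under time truncation at T, the shape parameter β of the power law process has the well-known closed-form MLE β̂ = n / Σ log(T / t_i). A sketch with illustrative failure times:

```python
import math

def plp_shape_mle(times, T):
    """MLE of the power law process shape parameter, time-truncated at T:
    beta_hat = n / sum(log(T / t_i)) for failure times t_1 < ... < t_n."""
    return len(times) / sum(math.log(T / t) for t in times)

print(plp_shape_mle([10, 35, 70, 120, 180], T=200))  # beta_hat approx 0.78
```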

10.
Many late-onset diseases are caused by what appears to be a combination of a genetic predisposition to disease and environmental factors. The use of existing cohort studies provides an opportunity to infer genetic predisposition to disease in a representative sample of a study population, now that many such studies are gathering genetic information on the participants. One complication of using existing cohorts is that subjects may be censored due to death prior to genetic sampling, thereby adding a layer of complexity to the analysis. We develop a statistical framework to infer the parameters of a latent variable model for disease onset. The latent variable model describes the role of genetic and modifiable risk factors on the onset ages of multiple diseases, and accounts for right-censoring of disease onset ages. The framework also allows for missing genetic information by inferring a subject's unknown genotype through appropriately incorporated covariate information. The model is applied to data gathered in the Framingham Heart Study for measuring the effect of different Apo-E genotypes on the occurrence of various cardiovascular disease events.

11.
We present particle-based algorithms for sequential filtering and parameter learning in state-space autoregressive (AR) models with structured priors. Non-conjugate priors are specified on the AR coefficients at the system level by imposing uniform or truncated normal priors on the moduli and wavelengths of the reciprocal roots of the AR characteristic polynomial. Sequential Monte Carlo (SMC) algorithms are implemented for on-line filtering and parameter learning within this modeling framework. More specifically, three SMC approaches are considered and compared by applying them to data simulated from different state-space AR models. An analysis of a human electroencephalogram signal is also presented to illustrate the use of the structured state-space AR models in describing biomedical signals.
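The reparameterization behind the structured priors is the standard one for a complex reciprocal-root pair: a modulus r and wavelength λ map to AR(2) coefficients φ1 = 2 r cos(2π/λ) and φ2 = −r², so uniform or truncated normal priors on (r, λ) induce non-conjugate priors on the coefficients. A sketch of the mapping:

```python
import numpy as np

def ar2_from_root(modulus, wavelength):
    """AR(2) coefficients implied by a complex reciprocal-root pair with
    modulus r and wavelength lam (time units per cycle):
    phi1 = 2 * r * cos(2 * pi / lam), phi2 = -r**2."""
    freq = 2.0 * np.pi / wavelength
    return 2.0 * modulus * np.cos(freq), -modulus**2

print(ar2_from_root(0.95, 12.0))  # quasi-periodic component, period ~ 12
```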

12.
This paper develops an objective Bayesian analysis method for estimating the unknown parameters of the half-logistic distribution when a sample is available from the progressive Type-II censoring scheme. Noninformative priors such as Jeffreys and reference priors are derived. In addition, the derived priors are checked to determine whether they satisfy probability-matching criteria. The Metropolis–Hastings algorithm is applied to generate Markov chain Monte Carlo samples from the posterior density functions because the marginal posterior density of each parameter cannot be expressed in explicit form. Monte Carlo simulations are conducted to investigate the frequentist properties of the estimated models under the noninformative priors. For illustration purposes, a real data set is presented, and the quality of the models under the noninformative priors is evaluated through posterior predictive checking.
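A minimal sketch of the Metropolis–Hastings step for the half-logistic scale parameter, assuming complete (uncensored) data and the prior π(σ) ∝ 1/σ; the paper's analysis additionally handles progressive Type-II censoring and the formally derived noninformative priors.

```python
import numpy as np

def mh_halflogistic_scale(x, n_iter=5000, step=0.2, seed=3):
    """Random-walk Metropolis-Hastings on log(sigma) for the scale of a
    half-logistic sample (sketch: complete data, prior 1/sigma, which is
    flat on log(sigma))."""
    x = np.asarray(x, float)
    rng = np.random.default_rng(seed)

    def logpost(logsig):
        s = np.exp(logsig)
        z = x / s
        # half-logistic log density: log 2 - log s - z - 2 log(1 + e^-z)
        return np.sum(np.log(2.0) - np.log(s) - z - 2.0 * np.log1p(np.exp(-z)))

    cur = 0.0
    lp = logpost(cur)
    draws = []
    for _ in range(n_iter):
        prop = cur + step * rng.normal()
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            cur, lp = prop, lp_prop
        draws.append(np.exp(cur))
    return np.array(draws)

# Usage on stand-in positive data (illustrative only):
sigma_draws = mh_halflogistic_scale(np.random.default_rng(0).exponential(2.0, 40))
```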

13.
Interactions among multiple genes across the genome may contribute to the risks of many complex human diseases. Whole-genome single nucleotide polymorphism (SNP) data collected for many thousands of SNP markers from thousands of individuals under the case-control design promise to shed light on our understanding of such interactions. However, nearby SNPs are highly correlated due to linkage disequilibrium (LD), and the number of possible interactions is too large for exhaustive evaluation. We propose a novel Bayesian method for simultaneously partitioning SNPs into LD blocks and selecting SNPs within blocks that are associated with the disease, either individually or interactively with other SNPs. When applied to homogeneous population data, the method gives posterior probabilities for LD-block boundaries, which not only result in accurate block partitions of SNPs but also provide measures of partition uncertainty. When applied to case-control data for association mapping, the method implicitly filters out SNP associations created merely by LD with disease loci within the same blocks. A simulation study showed that this approach is more powerful in detecting multi-locus associations than the other methods we tested, including one of ours. When applied to the WTCCC type 1 diabetes data, the method identified many previously known T1D-associated genes, including PTPN22, CTLA4, MHC, and IL2RA. The method also revealed some interesting two-way associations that are undetected by single-SNP methods. Most of the significant associations are located within the MHC region. Our analysis showed that the MHC SNPs form long-distance joint associations over several known recombination hotspots. By controlling for the haplotypes of the MHC class II region, we identified additional associations in both the MHC class I (HLA-A, HLA-B) and class III regions (BAT1). We also observed significant interactions between the genes PRSS16 and ZNF184 in the extended MHC region and the MHC class II genes. The proposed method can be broadly applied to classification problems with correlated discrete covariates.

14.
In this paper, we propose a three-level hierarchical Bayesian model for variable selection and estimation in quantile regression problems. Specifically, at the first level we place zero-mean normal priors on the coefficients, with unknown variance parameters. At the second level, we specify two different priors for the unknown variance parameters, which yield two different models producing different levels of sparsity. Then, at the third level we suggest joint improper priors for the unknown hyperparameters, assuming they are independent. Simulations and the Boston Housing data are used to compare the performance of our models with six existing models. The results indicate that our models perform well in both the simulations and the Boston Housing data.
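The standard machinery underneath Bayesian quantile regression is the check loss, whose negative exponential is an asymmetric-Laplace working likelihood; the hierarchy described above sits on top of that likelihood. A sketch of the loss:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile regression check loss rho_tau(u) = u * (tau - I(u < 0)).

    Minimizing sum(rho_tau(y - X @ beta)) targets the tau-th conditional
    quantile; exponentiating -rho_tau gives the asymmetric-Laplace
    kernel used as a working likelihood in the Bayesian formulation.
    """
    u = np.asarray(u, float)
    return u * (tau - (u < 0))

print(check_loss(np.array([-1.0, 2.0]), tau=0.9))  # [0.1, 1.8]
```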

15.
A multistage variable selection method is introduced for detecting association signals in structured brain‐wide and genome‐wide association studies (brain‐GWAS). Compared to conventional methods that link one voxel to one single nucleotide polymorphism (SNP), our approach is more efficient and powerful in selecting the important signals by integrating anatomic and gene grouping structures in the brain and the genome, respectively. It avoids resorting to a large number of multiple comparisons while effectively controlling the false discoveries. Validity of the proposed approach is demonstrated by both theoretical investigation and numerical simulations. We apply our proposed method to a brain‐GWAS using Alzheimer's Disease Neuroimaging Initiative positron emission tomography (ADNI PET) imaging and genomic data. We confirm previously reported association signals and also uncover several novel SNPs and genes that are either associated with brain glucose metabolism or have their association significantly modified by Alzheimer's disease status.

16.
When prior information on model parameters is weak or lacking, Bayesian statistical analyses are typically performed with so-called “default” priors. We consider the problem of constructing default priors for the parameters of survival models in the presence of censoring, using Jeffreys’ rule. We compare these Jeffreys priors to the “uncensored” Jeffreys priors, obtained without considering censored observations, for the parameters of the exponential and log-normal models. The comparison is based on the frequentist coverage of the posterior Bayes intervals obtained from these prior distributions.
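For the exponential model the contrast can be made explicit. Assuming a fixed censoring time c (one simple censoring scheme; the paper's setting may differ), a single possibly censored observation contributes Fisher information that depends on c, so Jeffreys' rule yields a censoring-aware prior:

```latex
% Uncensored Jeffreys prior for the exponential rate (scale-invariant):
\pi(\lambda) \propto \lambda^{-1}.
% With censoring at a fixed time c, one observation contributes
I(\lambda) = \frac{P(X \le c)}{\lambda^{2}} = \frac{1 - e^{-\lambda c}}{\lambda^{2}},
% so the censored Jeffreys prior acquires a censoring-dependent factor:
\pi_c(\lambda) \propto \frac{\sqrt{1 - e^{-\lambda c}}}{\lambda}.
```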

17.
In the Bayesian approach, the Behrens–Fisher problem has been posed as one of estimation for the difference of two means. No Bayesian solution to the Behrens–Fisher testing problem has yet been given, due perhaps to the fact that the conventional priors used are improper. While default Bayesian analysis can be carried out for estimation purposes, it poses difficulties for testing problems. This paper generates sensible intrinsic and fractional prior distributions for the Behrens–Fisher testing problem from the improper priors commonly used for estimation. This allows us to compute the Bayes factor comparing the null and the alternative hypotheses. The default model selection procedure is compared with a frequentist test and the Bayesian information criterion. We find a discrepancy, in the sense that the frequentist test and the Bayesian information criterion reject the null hypothesis for data for which the Bayes factors under intrinsic or fractional priors do not.

18.
In this paper, we develop an info-metric framework for testing hypotheses about structural instability in nonlinear, dynamic models estimated from the information in population moment conditions. Our methods are designed to distinguish between three states of the world: (i) the model is structurally stable in the sense that the population moment condition holds at the same parameter value throughout the sample; (ii) the model parameters change at some point in the sample but otherwise the model is correctly specified; and (iii) the model exhibits more general forms of instability than a single shift in the parameters. An advantage of the info-metric approach is that the null hypotheses concerned are formulated in terms of distances between various choices of probability measures constrained to satisfy (i) and (ii), and the empirical measure of the sample. Under the alternative hypotheses considered, the model is assumed to exhibit structural instability at a single point in the sample, referred to as the break point; our analysis allows for the break point to be either fixed a priori or treated as occurring at some unknown point within a certain fraction of the sample. We propose various test statistics that can be thought of as sample analogs of the distances described above, and derive their limiting distributions under the appropriate null hypothesis. The limiting distributions of our statistics are nonstandard but coincide with various distributions that arise in the literature on structural instability testing within the Generalized Method of Moments framework. A small simulation study illustrates the finite sample performance of our test statistics.

19.
Uniformly most powerful Bayesian tests (UMPBTs) are a new class of Bayesian tests in which null hypotheses are rejected if their Bayes factor exceeds a specified threshold. The alternative hypotheses in UMPBTs are defined to maximize the probability that the null hypothesis is rejected. Here, we generalize the notion of UMPBTs by restricting the class of alternative hypotheses over which this maximization is performed, resulting in restricted most powerful Bayesian tests (RMPBTs). We then derive RMPBTs for linear models by restricting alternative hypotheses to g priors. For linear models, the rejection regions of RMPBTs coincide with those of usual frequentist F‐tests, provided that the evidence thresholds for the RMPBTs are appropriately matched to the size of the classical tests. This correspondence supplies default Bayes factors for many common tests of linear hypotheses. We illustrate the use of RMPBTs for ANOVA tests and t‐tests and compare their performance in numerical studies.

20.
Mixtures of Dirichlet process priors offer a reasonable compromise between purely parametric and purely non‐parametric models, and are popularly used in survival analysis and for testing problems with non‐parametric alternatives. In this paper, we study large sample properties of the posterior distribution with a mixture of Dirichlet process priors. We show that the posterior distribution of the survival function is consistent with right censored data.
