Similar Articles (20 results)
1.
Recent research has extended standard methods for meta-analysis to more general forms of evidence synthesis, where the aim is to combine different data types or data summaries that contain information about functions of multiple parameters to make inferences about the parameters of interest. We consider one such scenario in which the goal is to make inferences about the association between a primary binary exposure and a continuously valued outcome in the context of several confounding exposures, and where the data are available in various forms: individual participant data (IPD) with repeated measures, sample means that have been aggregated over strata, and binary data generated by thresholding the underlying continuously valued outcome measure. We show that an estimator of the population mean of a continuously valued outcome can be constructed from binary threshold data provided that a separate estimate of the outcome standard deviation is available. The results of a simulation study show that this estimator has negligible bias but is less efficient than the sample mean; the minimum variance ratio is derived using a Taylor series expansion. Combining this estimator with sample means and IPD from different sources (such as a series of published studies) using both linear and probit regression does, however, considerably improve the precision of estimation by incorporating data that would otherwise have been excluded for being in the wrong format. We apply these methods to investigate the association between the G277S mutation in the transferrin gene and serum ferritin (iron) levels, separately in pre- and post-menopausal women, based on data from three published studies.
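A minimal sketch of the threshold-data estimator described above, assuming a normally distributed outcome, a known threshold c, and an externally supplied SD estimate; the function name and toy numbers are ours, not the paper's:

```python
# Sketch: recover a mean estimate from binary threshold data, assuming the
# outcome is (approximately) normal, the threshold c is known, and an external
# estimate of the outcome SD is available, as the abstract requires.
import numpy as np
from scipy.stats import norm

def mean_from_threshold(n_above, n_total, c, sigma_hat):
    """If Y ~ N(mu, sigma^2) then P(Y > c) = Phi((mu - c)/sigma),
    so mu = c + sigma * Phi^{-1}(p), with p the observed exceedance proportion."""
    p = n_above / n_total
    return c + sigma_hat * norm.ppf(p)

# Toy check: simulate, threshold, and recover the mean.
rng = np.random.default_rng(0)
y = rng.normal(loc=50.0, scale=30.0, size=2000)
c = 40.0
print(mean_from_threshold((y > c).sum(), y.size, c, sigma_hat=30.0))  # ~50
```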

2.
Interactions among multiple genes across the genome may contribute to the risks of many complex human diseases. Whole-genome single nucleotide polymorphism (SNP) data collected for many thousands of SNP markers from thousands of individuals under the case-control design promise to shed light on our understanding of such interactions. However, nearby SNPs are highly correlated due to linkage disequilibrium (LD), and the number of possible interactions is too large for exhaustive evaluation. We propose a novel Bayesian method for simultaneously partitioning SNPs into LD-blocks and selecting SNPs within blocks that are associated with the disease, either individually or interactively with other SNPs. When applied to homogeneous population data, the method gives posterior probabilities for LD-block boundaries, which not only result in accurate block partitions of SNPs but also provide measures of partition uncertainty. When applied to case-control data for association mapping, the method implicitly filters out SNP associations created merely by LD with disease loci within the same blocks. A simulation study showed that this approach is more powerful in detecting multi-locus associations than the other methods we tested, including one of our own. When applied to the WTCCC type 1 diabetes data, the method identified many previously known T1D-associated genes, including PTPN22, CTLA4, MHC, and IL2RA. The method also revealed some interesting two-way associations that are undetected by single-SNP methods. Most of the significant associations are located within the MHC region. Our analysis showed that the MHC SNPs form long-distance joint associations over several known recombination hotspots. By controlling for the haplotypes of the MHC class II region, we identified additional associations in both the MHC class I (HLA-A, HLA-B) and class III (BAT1) regions. We also observed significant interactions between the genes PRSS16 and ZNF184 in the extended MHC region and the MHC class II genes. The proposed method can be broadly applied to classification problems with correlated discrete covariates.
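The Bayesian block-partition machinery is beyond a short example, but the signal it exploits, strong LD between nearby SNPs, is easy to illustrate. A toy sketch (our own simplification, not the authors' method) that cuts blocks wherever adjacent-pair LD drops:

```python
# Illustrative only: compute adjacent-pair LD (r^2) from a genotype matrix and
# cut blocks greedily where LD is weak. The paper instead places posterior
# probabilities on block boundaries within a full Bayesian model.
import numpy as np

def adjacent_r2(G):
    """G: (n_individuals, n_snps) matrix of genotype dosages."""
    return np.array([np.corrcoef(G[:, j], G[:, j + 1])[0, 1] ** 2
                     for j in range(G.shape[1] - 1)])

def greedy_blocks(G, cutoff=0.3):
    r2, blocks, start = adjacent_r2(G), [], 0
    for j, v in enumerate(r2):
        if v < cutoff:                 # weak LD between SNPs j and j+1: boundary
            blocks.append((start, j))
            start = j + 1
    blocks.append((start, G.shape[1] - 1))
    return blocks

# Toy demo: two blocks of three tightly linked SNPs each.
rng = np.random.default_rng(1)
base = rng.integers(0, 3, size=(200, 2))
G = base[:, [0, 0, 0, 1, 1, 1]] + rng.integers(0, 2, size=(200, 6))
print(greedy_blocks(G))                # [(0, 2), (3, 5)]
```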

3.
Airborne laser scanning is now used in many territorial studies, providing point data that may contain strong discontinuities. Motivated by the need to interpolate such data while preserving their edges, this paper considers robust nonparametric smoothers. These estimators, when implemented with bounded loss functions, have suitable jump-preserving properties. Iterative algorithms are developed here; they are equivalent to nonlinear M-smoothers but have the advantage of resembling linear kernel regression. The selection of their coefficients is carried out by combining cross-validation and robust-tuning techniques. Two real case studies and a simulation experiment confirm the validity of the method; in particular, the performance in building recognition is excellent.
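A minimal sketch of the iterative M-smoother idea: kernel weights in x are multiplied by bounded weights on the residuals, so observations across a jump contribute little. Bandwidths here are fixed placeholders; the paper selects them by cross-validation and robust tuning:

```python
# Jump-preserving iterative M-smoother (a simplification of the estimators
# discussed above): each iteration is a kernel regression whose weights are
# damped by a bounded (Gaussian) function of the current residuals.
import numpy as np

def m_smoother(x, y, h=0.05, g=0.5, n_iter=10):
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # spatial kernel weights
    m = y.copy()                                             # initialise at the data
    for _ in range(n_iter):
        R = np.exp(-0.5 * ((y[None, :] - m[:, None]) / g) ** 2)  # bounded residual weights
        W = K * R
        m = (W @ y) / W.sum(axis=1)
    return m

# Toy data with a sharp edge at x = 0.5, as at a building boundary.
x = np.linspace(0, 1, 200)
y = np.where(x < 0.5, 0.0, 2.0) + 0.2 * np.random.default_rng(2).normal(size=200)
m_hat = m_smoother(x, y)        # smooths the noise but keeps the jump sharp
```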

4.
There is considerable interest in understanding how factors such as time and geographic distance between isolates might influence the evolutionary direction of foot-and-mouth disease. Genetic differences between viruses can be measured as the proportion of nucleotides that differ for a given sequence or gene. We present a Bayesian hierarchical regression model for the statistical analysis of continuous data with sample space restricted to the interval (0, 1). The data are modelled using beta distributions with means that depend on covariates through a link function. We discuss methodology for: (i) the incorporation of informative prior information into an analysis; (ii) fitting the model using Markov chain Monte Carlo sampling; (iii) model selection using Bayes factors; and (iv) semiparametric beta regression using penalized splines. The model is applied to two different datasets.
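The likelihood at the core of the model is easy to state: with the mean mu linked to covariates, the response is Beta(mu*phi, (1-mu)*phi). As a hedged sketch, the snippet below fits this by maximum likelihood rather than by the paper's MCMC, just to make the parameterization concrete:

```python
# Beta regression with a logit link, fitted by maximum likelihood (the paper
# uses MCMC with priors; this sketch shows only the likelihood core).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def neg_loglik(params, X, y):
    beta, phi = params[:-1], np.exp(params[-1])   # phi > 0 via log scale
    mu = expit(X @ beta)                          # logit link for the mean
    a, b = mu * phi, (1 - mu) * phi
    return -np.sum(gammaln(a + b) - gammaln(a) - gammaln(b)
                   + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
mu_true = expit(X @ np.array([0.5, -1.0]))
y = rng.beta(mu_true * 20, (1 - mu_true) * 20)
fit = minimize(neg_loglik, x0=np.zeros(3), args=(X, y), method="BFGS")
print(fit.x)          # roughly [0.5, -1.0, log(20)]
```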

5.
Recently developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas in practice it is often the case that only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large, or because only summary data are collected, as in DNA pooling experiments. In this article, we introduce a new statistical method that can accurately infer the frequencies of untyped genetic variants in these settings, and indeed substantially improve frequency estimates at typed variants in pooling experiments where observations are noisy. Our approach, which predicts each allele frequency using a linear combination of observed frequencies, is statistically straightforward and related to a long history of the use of linear methods for estimating missing values (e.g. Kriging). The main statistical novelty is our approach to regularizing the covariance matrix estimates, and the resulting linear predictors, which is based on methods from population genetics. We find that, besides being both fast and flexible (allowing new problems to be tackled that cannot be handled by existing imputation approaches purpose-built for the genetic context), these linear methods are also very accurate. Indeed, imputation accuracy using this approach is similar to that obtained by state-of-the-art imputation methods that use individual-level data, but at a fraction of the computational cost.
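A sketch of the linear predictor at the heart of the method: the untyped frequency is the conditional mean of a multivariate normal given the typed frequencies. The shrinkage used here is a generic stand-in; the paper's contribution is a population-genetics-based regularization of the covariance:

```python
# Kriging-style linear imputation of an untyped allele frequency from typed
# frequencies: f_hat = mu_u + s_ut' S_tt^{-1} (f_typed - mu_typed), with the
# covariance regularized (here: simple shrinkage toward its diagonal).
import numpy as np

def impute_frequency(f_typed, mu_typed, mu_untyped, S_tt, s_ut, shrink=0.1):
    S = (1 - shrink) * S_tt + shrink * np.diag(np.diag(S_tt))
    w = np.linalg.solve(S, s_ut)           # linear weights on observed frequencies
    return mu_untyped + w @ (f_typed - mu_typed)

# Toy usage with two typed SNPs (all numbers invented):
S_tt = np.array([[0.040, 0.030], [0.030, 0.040]])
print(impute_frequency(np.array([0.30, 0.35]), np.array([0.28, 0.33]),
                       0.25, S_tt, np.array([0.020, 0.010])))
```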

6.
Count response data often exhibit departures from the assumptions of standard Poisson generalized linear models. In particular, cluster-level correlation and truncation at zero are two common characteristics of such data. This paper describes a random-components truncated Poisson model that can be applied to clustered and zero-truncated count data. Residual maximum likelihood estimators for the parameters of this model are developed, and their use is illustrated on a dataset of non-zero counts of iron sheets with edge-strain defects produced by the Mobarekeh Steel Complex, Iran. The paper also reports on a small-scale simulation study that supports the estimation procedure.
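The zero-truncation correction is the key departure from a standard Poisson fit. A minimal sketch for a single homogeneous sample, without the random components or REML estimation the paper adds:

```python
# Zero-truncated Poisson MLE: P(Y = y | Y > 0) = e^{-lam} lam^y / (y! (1 - e^{-lam})).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def ztp_neg_loglik(lam, y):
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1)
                   - np.log(1 - np.exp(-lam)))

y = np.array([1, 1, 2, 1, 3, 2, 1, 4, 2, 1])      # hypothetical non-zero defect counts
fit = minimize_scalar(ztp_neg_loglik, bounds=(1e-6, 50), args=(y,), method="bounded")
print(fit.x)   # MLE of lambda; smaller than the raw sample mean, which ignores truncation
```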

7.
In longitudinal clinical trials, when outcome variables at later time points are only defined for patients who survive to those times, the evaluation of the causal effect of treatment is complicated. In this paper, we describe an approach that can be used to obtain the causal effects of three treatment arms on ordinal outcomes in the presence of death, using a principal stratification approach. We introduce a set of flexible assumptions to identify the causal effect and implement a sensitivity analysis for the non-identifiable assumptions, which we parameterize parsimoniously. The methods are illustrated on quality-of-life data from a recent colorectal cancer clinical trial.

8.
We consider two related aspects of the study of old-age mortality. One is the estimation of a parameterized hazard function from grouped data, and the other is its possible deceleration at extreme old age owing to heterogeneity described by a mixture of distinct sub-populations. The first is treated by half of a logistic transform, which is known to be free of discretization bias at older ages, and also preserves the increasing slope of the log hazard in the Gompertz case. It is assumed that data are available in the form published by official statistical agencies, that is, as aggregated frequencies in discrete time. Local polynomial modelling and weighted least squares are applied to cause-of-death mortality counts. The second, related, problem is to discover what conditions are necessary for population mortality to exhibit deceleration for a mixture of Gompertz sub-populations. The general problem remains open but, in the case of three groups, we demonstrate that heterogeneity may be such that it is possible for a population to show decelerating mortality and then return to a Gompertz-like increase at a later age. This implies that there are situations, depending on the extent of heterogeneity, in which there is at least one age interval in which the hazard function decreases before increasing again.
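The deceleration mechanism in the second problem can be seen directly by computing the survival-weighted hazard of a three-group Gompertz mixture; all parameter values below are arbitrary illustrations, not estimates from the paper:

```python
# Population hazard of a mixture of Gompertz sub-populations,
# h_i(t) = a_i exp(b_i t), H_i(t) = (a_i / b_i)(exp(b_i t) - 1):
# frail groups die out first, so the aggregate hazard can decelerate
# and later resume a Gompertz-like rise.
import numpy as np

a = np.array([1e-5, 1e-4, 2e-3])     # baseline hazards of three subgroups
b = np.array([0.14, 0.11, 0.05])     # Gompertz slopes
p = np.array([0.60, 0.30, 0.10])     # initial mixing proportions
t = np.linspace(0.0, 110.0, 500)

H = (a / b) * (np.exp(b * t[:, None]) - 1)          # cumulative hazards, (age, group)
S = np.exp(-H)                                       # subgroup survival
pop_hazard = (p * S * a * np.exp(b * t[:, None])).sum(1) / (p * S).sum(1)
# Plotting log(pop_hazard) against t shows an age interval where the hazard
# decelerates before increasing again, as described above.
```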

9.
Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data become increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) that is smaller than that of either pooling or not pooling, and sometimes substantially improves over both extremes. The bias introduced is small, as shown by a simulation study carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation we are likely to encounter in a genome-wide association study.
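The adaptive pooling idea can be caricatured with a per-SNP moment-based shrinkage weight; this is our own illustration of the bias-variance trade-off, not the paper's estimator:

```python
# Shrink the primary-sample frequency toward the pooled frequency, with the
# weight driven by estimated between-sample (stratification) variance relative
# to binomial sampling noise: similar samples pool, divergent samples do not.
import numpy as np

def eb_frequency(p_primary, n_primary, p_other, n_other):
    p_pool = (n_primary * p_primary + n_other * p_other) / (n_primary + n_other)
    samp_var = p_pool * (1 - p_pool) / n_primary
    between = max((p_primary - p_other) ** 2
                  - p_pool * (1 - p_pool) * (1 / n_primary + 1 / n_other), 0.0)
    w = between / (between + samp_var)     # w = 0: pool fully; w = 1: ignore others
    return w * p_primary + (1 - w) * p_pool

print(eb_frequency(0.20, 200, 0.22, 5000))   # similar samples: near-pooled estimate
print(eb_frequency(0.20, 200, 0.45, 5000))   # divergent samples: near-primary estimate
```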

10.
Sometimes, in industrial quality control experiments and destructive stress testing, only values smaller than all previous ones are observed. Here we consider nonparametric quantile estimation, both the 'sample quantile function' and kernel-type estimators, from such record-breaking data. For a single record-breaking sample, consistent estimation is not possible except in the extreme tails of the distribution. Hence replication is required, and for m such independent record-breaking samples the quantile estimators are shown to be strongly consistent and asymptotically normal as m → ∞. Also, for small m, the mean-squared errors, biases and smoothing parameters (for the smoothed estimators) are investigated through computer simulations.

11.
Suppose the probability model for failure time data, subject to censoring, is specified by the hazard function λ(t)exp(βᵀx), where x is a vector of covariates. Analytical difficulties involved in finding the optimal design are avoided by assuming that λ is completely specified and by using D-optimality based on the information matrix for β. Optimal designs are found to depend on β, but some results of practical consequence are obtained. It is found that censoring does not affect the choice of design appreciably when βᵀx ≥ 0 for all points of the feasible region, but may have an appreciable effect when βᵢxᵢ < 0 for all i and all points in the feasible experimental region. The nature of the effect is discussed in detail for the cases of one and two parameters. It is argued that in practical biomedical situations the optimal design is almost always the same as for uncensored data.

12.
A model with nonrandom latent and infectious periods is suggested for epidemics in a large community. This permits a relatively complete statistical analysis of data from the spread of a single epidemic. An attractive feature of such models is the possibility of exploring how the rate of spread of the disease depends on the number of susceptibles and infectives. An application to smallpox data is included.

13.
Fan J, Feng Y and Niu YS (2010). Annals of Statistics, 38(5): 2723–2750.
Estimation of genewise variance arises from two important applications in microarray data analysis: selecting significantly differentially expressed genes and validation tests for normalization of microarray data. We approach the problem by introducing a two-way nonparametric model, which is an extension of the famous Neyman-Scott model and is applicable beyond microarray data. The problem itself poses interesting challenges because the number of nuisance parameters is proportional to the sample size and it is not obvious how the variance function can be estimated when measurements are correlated. In such a high-dimensional nonparametric problem, we propose two novel nonparametric estimators for the genewise variance function and semiparametric estimators for the measurement correlation, obtained by solving a system of nonlinear equations. Their asymptotic normality is established, and their finite-sample properties are demonstrated by simulation studies. The estimators also improve the power of tests for detecting statistically differentially expressed genes. The methodology is illustrated by data from the MicroArray Quality Control (MAQC) project.

14.
This paper discusses the estimation of time-dependent probabilities of a finite state-space discrete-time process using aggregate cross-sectional data. A large-sample version of multistate logistic regression is described and shown to be useful for analysing multistate life tables. The technique is applied to the estimation of disability-free, severely disabled and other disabled survival curves and health expectancies in Australia, based on data from national health surveys in 1988, 1993 and 1998. A conclusion is that there has been a general upward trend in the future time expected to be spent in a state of disability, the picture being more pessimistic for males than females. For example, during 1988–1998 the estimated increase in male life expectancy at birth of 2.7 years is decomposed as a decrease of 1.2 years in disability-free health (life) expectancy and increases of 1.3 and 2.6 years, respectively, in states of severe disability and other disability.

15.
This paper presents a method of fitting factorial models to recidivism data consisting of the (possibly censored) times to 'fail' of individuals, in order to test for differences between groups. Here 'failure' means rearrest, reconviction or reincarceration, etc. A proportion P of the sample is assumed to be 'susceptible' to failure, i.e. to fail eventually, while the remaining 1 - P are 'immune' and never fail. Thus failure may be described in two ways: by the probability P that an individual ever fails again (the 'probability of recidivism'), and by the rate of failure Λ for the susceptibles. Related analyses have been proposed previously; this paper argues that a factorial approach, as opposed to the regression approaches advocated in earlier work, offers simplified analysis and interpretation of these kinds of data. The methods proposed, which are also applicable in medical statistics and reliability analyses, are demonstrated on data sets in which the factors are Parole Type (released to freedom or on parole), Age group (≤ 20 years, 20–40 years, > 40 years) and Marital Status. The outcome (failure) is a return to prison following first or second release.
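A minimal sketch of the split-population ('cure') likelihood for a single group; a factorial analysis of the kind advocated above would let P and the failure rate vary with Parole Type, Age group and Marital Status:

```python
# Split-population model: a proportion P is susceptible and fails at
# exponential rate L; the rest never fail. With censoring indicator d,
#   failed at t:    P * L * exp(-L t)
#   censored at t:  (1 - P) + P * exp(-L t)
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_loglik(params, t, d):
    P, L = expit(params[0]), np.exp(params[1])    # keep P in (0,1) and L > 0
    ll = np.where(d == 1,
                  np.log(P) + np.log(L) - L * t,
                  np.log((1 - P) + P * np.exp(-L * t)))
    return -ll.sum()

rng = np.random.default_rng(4)
n, P_true, L_true, cens = 500, 0.4, 0.3, 8.0
fail = rng.exponential(1 / L_true, n)
susc = rng.random(n) < P_true
t = np.where(susc, np.minimum(fail, cens), cens)   # immunes are always censored
d = (susc & (fail < cens)).astype(int)
fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(t, d), method="BFGS")
print(expit(fit.x[0]), np.exp(fit.x[1]))           # roughly 0.4 and 0.3
```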

16.
Several statistics based on the empirical characteristic function have been proposed for testing the simple goodness-of-fit hypothesis that the data come from a population with a completely specified characteristic function that cannot be inverted in closed form, the typical example being the class of stable characteristic functions. As an alternative approach, it is pointed out here that the inversion formula of Gil-Pelaez and Rosén, applied to the data and the hypothetical characteristic function via numerical integration, is the natural replacement for the probability integral transformation in this situation. The transformed sample is from the uniform (0, 1) distribution if and only if the null hypothesis is true, and for testing uniformity on (0, 1) the whole arsenal of methods that statistics has so far produced can be used.
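A sketch of the proposed transform, using the standard Cauchy characteristic function as the hypothesized phi (a stable law whose cdf is in fact known, which makes the sketch checkable):

```python
# Gil-Pelaez inversion, F(x) = 1/2 - (1/pi) * int_0^inf Im(e^{-itx} phi(t)) / t dt,
# applied to each observation; the transformed values are Uniform(0,1) iff the
# null holds, so any uniformity test (here Kolmogorov-Smirnov) can be used.
import numpy as np
from scipy.integrate import quad
from scipy.stats import kstest

phi = lambda t: np.exp(-np.abs(t))       # standard Cauchy characteristic function

def gil_pelaez_cdf(x):
    integrand = lambda t: np.imag(np.exp(-1j * t * x) * phi(t)) / t
    val, _ = quad(integrand, 0, np.inf, limit=200)
    return 0.5 - val / np.pi

data = np.random.default_rng(5).standard_cauchy(200)
u = np.array([gil_pelaez_cdf(x) for x in data])
print(kstest(u, "uniform"))              # large p-value: uniformity not rejected
```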

17.
The commonly used survey technique of clustering introduces dependence into sample data. Such data are frequently used in economic analysis, though the dependence induced by the sample structure is often ignored. In this paper, the effect of clustering on the nonparametric kernel estimate of the density f(x) is examined. The window width commonly used for density estimation in the i.i.d. case is shown to be no longer optimal. A new optimal bandwidth using a higher-order kernel is proposed and is shown to give a smaller integrated mean squared error than two window widths widely used in the i.i.d. case. Several illustrations from simulation are provided.
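For concreteness, a hedged sketch of a higher-order kernel density estimate of the kind the bandwidth result concerns, using the fourth-order Gaussian kernel; the bandwidth below is an arbitrary placeholder, whereas the paper's point is precisely that the i.i.d.-optimal choice is no longer right under clustering:

```python
# Kernel density estimate with the fourth-order Gaussian kernel
# K4(u) = 0.5 * (3 - u^2) * phi(u), which removes the leading bias term
# (at the price of possibly negative estimates in the tails).
import numpy as np

def kde4(x_grid, data, h):
    u = (x_grid[:, None] - data[None, :]) / h
    K4 = 0.5 * (3 - u**2) * np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K4.mean(axis=1) / h

# Clustered toy sample: 100 clusters of 5, sharing a cluster-level effect.
rng = np.random.default_rng(6)
data = rng.normal(size=(100, 5)).ravel() + np.repeat(rng.normal(size=100), 5)
f_hat = kde4(np.linspace(-4, 4, 200), data, h=0.6)
```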

18.
The estimation of Bayesian networks given high-dimensional data, in particular gene expression data, has been the focus of much recent research. Whilst there are several methods available for the estimation of such networks, these typically assume that the data consist of independent and identically distributed samples. It is often the case, however, that the available data have a more complex mean structure, plus additional components of variance, which must then be accounted for in the estimation of a Bayesian network. In this paper, score metrics that take account of such complexities are proposed for use in conjunction with score-based methods for the estimation of Bayesian networks. We propose, first, a fully Bayesian score metric and, second, a metric inspired by the notion of restricted maximum likelihood. We demonstrate the performance of these new metrics for the estimation of Bayesian networks using simulated data with known complex mean structures. We then present an analysis of expression levels of grape-berry genes, adjusting for exogenous variables believed to affect the expression levels of the genes. Demonstrable biological effects can be inferred from the estimated conditional independence relationships and correlations amongst the grape-berry genes.

19.
The most common forecasting methods in business are based on exponential smoothing, and the most common time series in business are inherently non-negative. Therefore it is of interest to consider the properties of the potential stochastic models underlying exponential smoothing when applied to non-negative data. We explore exponential smoothing state space models for non-negative data under various assumptions about the innovations, or error, process. We first demonstrate that prediction distributions from some commonly used state space models may have an infinite variance beyond a certain forecasting horizon. For multiplicative error models that do not have this flaw, we show that sample paths will converge almost surely to zero even when the error distribution is non-Gaussian. We propose a new model with similar properties to exponential smoothing, but which does not have these problems, and we develop some distributional properties for our new model. We then explore the implications of our results for inference, and compare the short-term forecasting performance of the various models using data on the weekly sales of over 300 items of costume jewelry. The main findings of the research are that the Gaussian approximation is adequate for estimation and one-step-ahead forecasting. However, as the forecasting horizon increases, the approximate prediction intervals become increasingly problematic. When the model is to be used for simulation purposes, a suitably specified scheme must be employed.
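The almost-sure convergence to zero of multiplicative-error sample paths is easy to see by simulating a local-level model (parameter values below are our own illustrations):

```python
# Multiplicative-error local level model behind simple exponential smoothing:
#   y_t = l_{t-1} (1 + e_t),   l_t = l_{t-1} (1 + alpha * e_t),
# which keeps paths non-negative when e_t > -1. Long sample paths of the
# level drift toward zero, as the abstract notes.
import numpy as np

rng = np.random.default_rng(7)
alpha, l, T = 0.2, 100.0, 100_000
levels = np.empty(T)
for t in range(T):
    e = max(rng.normal(scale=0.3), -0.95)   # truncate so that 1 + e > 0
    l *= 1 + alpha * e                      # level update
    levels[t] = l
print(levels[[999, 9_999, 99_999]])         # level decays toward zero over time
```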

20.
Acute respiratory diseases are transmitted over networks of social contacts. Large-scale simulation models are used to predict epidemic dynamics and evaluate the impact of various interventions, but the contact behavior in these models is based on simplistic and strong assumptions which are not informed by survey data. These assumptions are also used for estimating transmission measures such as the basic reproductive number and secondary attack rates. Development of methodology to infer contact networks from survey data could improve these models and estimation methods. We contribute to this area by developing a model of within-household social contacts and using it to analyze the Belgian POLYMOD data set, which contains detailed diaries of social contacts in a 24-hour period. We model dependency in contact behavior through a latent variable indicating which household members are at home. We estimate age-specific probabilities of being at home and age-specific probabilities of contact conditional on two members being at home. Our results differ from the standard random mixing assumption. In addition, we find that the probability that all members contact each other on a given day is fairly low: 0.49 for households with two 0-5 year olds and two 19-35 year olds, and 0.36 for households with two 12-18 year olds and two 36+ year olds. We find higher contact rates in households with 2-3 members, helping explain the higher influenza secondary attack rates found in households of this size.
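An illustrative simulation of the latent "who is at home" structure (all probabilities invented, not the POLYMOD estimates), computing the chance that every household pair makes contact on a given day:

```python
# Household contacts via a latent at-home indicator: members are at home with
# age-specific probabilities, and a pair makes contact only if both are home,
# with an age-pair-specific probability.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(8)
p_home = {"child": 0.9, "adult": 0.7}                    # P(at home), by age class
p_contact = {("adult", "adult"): 0.75, ("adult", "child"): 0.85,
             ("child", "child"): 0.95}                   # P(contact | both at home)

def all_contact_today(household):
    if any(rng.random() >= p_home[m] for m in household):
        return False                  # someone is away: not all pairs can meet
    return all(rng.random() < p_contact[tuple(sorted(pair))]
               for pair in combinations(household, 2))

hh = ["child", "child", "adult", "adult"]
print(np.mean([all_contact_today(hh) for _ in range(20_000)]))   # fairly low
```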
