Similar Documents
20 similar documents found (search time: 31 ms)
1.
We describe estimation, learning, and prediction in a treatment-response model with two outcomes. The introduction of potential outcomes into this model brings in four cross-regime correlation parameters that are not contained in the likelihood for the observed data and thus are not identified. Despite this inescapable identification problem, we build upon the results of Koop and Poirier (1997) to describe how learning takes place about the four nonidentified correlations through the imposed positive definiteness of the covariance matrix. We then derive bivariate distributions associated with commonly estimated “treatment parameters” (including the Average Treatment Effect and the effect of Treatment on the Treated), and use the learning that takes place about the nonidentified correlations to calculate these densities. We illustrate our points in several generated-data experiments and apply our methods to estimate the joint impact of child labor on achievement scores in language and mathematics.
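A minimal sketch of the positive-definiteness mechanism this abstract appeals to, not the authors' Bayesian procedure: given two identified correlations between the potential outcomes and a common latent error, positive definiteness of the 3×3 correlation matrix bounds the non-identified cross-regime correlation. All names and values here are hypothetical.

```python
import numpy as np

def rho01_bounds(rho_0s, rho_1s):
    """Feasible range for the non-identified cross-regime correlation rho_01,
    implied by positive definiteness of the correlation matrix
    [[1, rho_01, rho_0s], [rho_01, 1, rho_1s], [rho_0s, rho_1s, 1]]."""
    root = np.sqrt((1.0 - rho_0s**2) * (1.0 - rho_1s**2))
    return rho_0s * rho_1s - root, rho_0s * rho_1s + root

# The more the identified correlations pin down each regime, the narrower
# the feasible interval -- this is the "learning" about rho_01.
print(rho01_bounds(0.2, 0.3))   # wide interval: little learning
print(rho01_bounds(0.9, 0.95))  # narrow interval: substantial learning
```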

2.
When historical data are available, incorporating them optimally into the current data analysis can improve the quality of statistical inference. In Bayesian analysis, one can achieve this by using Zellner's quality-adjusted priors or the power priors of Ibrahim and coauthors. These rules are constructed by raising the prior and/or the sample likelihood to exponent values that act as measures of the quality of the historical data or of their proximity to the current data. This paper presents a general, optimum procedure that unifies these rules and is derived by minimizing a Kullback–Leibler divergence under a divergence constraint. We show that the exponent values are directly related to the divergence constraint set by the user and investigate the effect of this choice both theoretically and through sensitivity analysis. We show that this approach yields ‘100% efficient’ information processing rules in the sense of Zellner. Monte Carlo experiments are conducted to investigate the effect of the historical and current sample sizes on the optimum rule. Finally, we illustrate these methods by applying them to real data sets.
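To make the exponent mechanism concrete, here is a minimal conjugate-normal sketch (known variance, flat initial prior) in which the historical likelihood is raised to a power a0; the function name and data are illustrative, not from the paper.

```python
import numpy as np

def power_prior_posterior(y_cur, y_hist, a0, sigma=1.0):
    """Posterior mean and variance for a normal mean theta when
    p(theta | data) is proportional to L(theta; y_cur) * L(theta; y_hist)**a0:
    the exponent a0 in [0, 1] downweights the historical sample size."""
    n, n0 = len(y_cur), len(y_hist)
    w = n + a0 * n0
    mean = (n * np.mean(y_cur) + a0 * n0 * np.mean(y_hist)) / w
    return mean, sigma**2 / w

rng = np.random.default_rng(0)
y_hist = rng.normal(0.0, 1.0, size=200)  # historical sample, slightly off
y_cur = rng.normal(0.3, 1.0, size=50)    # current sample
for a0 in (0.0, 0.5, 1.0):
    print(a0, power_prior_posterior(y_cur, y_hist, a0))
```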

3.
High-content automated imaging platforms allow several targets to be multiplexed simultaneously, generating multi-parametric single-cell data sets over extended periods of time. Typically, simple summary measures, such as the mean value across all cells at each time point, are calculated to summarize the temporal process, at the cost of losing the time dynamics of the single cells. Multiple experiments are performed, but the observation time points are not necessarily identical, which makes it difficult to integrate summary measures from different experiments. We used functional data analysis to analyze continuous curve data, where the temporal process of a response variable for each single cell is described by a smooth curve. This allows analyses to be performed on continuous functions rather than on the original discrete data points. Functional regression models were applied to determine common temporal characteristics of a set of single-cell curves, and random effects were employed in the models to explain variation between experiments. The aim of the multiplexing approach is to analyze the effects of a large number of compounds simultaneously, in comparison to control, so as to discriminate between their modes of action. Functional principal component analysis based on T-statistic curves for pairwise comparison to control was used to study time-dependent compound effects.
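A rough sketch of the two steps this abstract describes: smoothing each single-cell trajectory onto a common grid, then extracting principal modes of temporal variation. The smoothing parameter and simulated data are placeholders.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 10.0, 100)          # common evaluation grid

# Each "cell" is observed at its own, not necessarily identical, time points.
curves = []
for _ in range(40):
    t = np.sort(rng.uniform(0.0, 10.0, size=30))
    y = np.sin(t) + rng.normal(0.0, 0.2, size=t.size)
    spline = UnivariateSpline(t, y, s=1.0)  # smoothing spline per cell
    curves.append(spline(grid))             # evaluate on the common grid
X = np.vstack(curves)

# Functional PCA on the centered smoothed curves: the leading right-singular
# vectors are the principal modes of temporal variation.
Xc = X - X.mean(axis=0)
_, svals, components = np.linalg.svd(Xc, full_matrices=False)
explained = svals**2 / np.sum(svals**2)
print("variance explained by first 3 components:", explained[:3].round(3))
```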

4.
Summary.  Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. One possible approach is to use structural and sequence-based information on the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations, and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with a non-hierarchical BMARS model, a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.

5.
In this article we describe methods for obtaining the predictive distributions of outcome gains in the framework of a standard latent variable selection model. Although most previous work has focused on estimating mean treatment parameters as the way to characterize outcome gains from program participation, we show how the entire distributions associated with these gains can be obtained in certain situations. Although the out-of-sample outcome gain distributions depend on an unidentified parameter, we use the results of Koop and Poirier to show that learning about this parameter can take place through information contained in the identified parameters via a positive definiteness restriction on the covariance matrix. In cases where this type of learning is not highly informative, the spread of the predictive distributions depends more critically on the prior. We show both theoretically and in extensive generated-data experiments how this learning occurs, and delineate the sensitivity of our results to the prior specifications. We relate our analysis to three treatment parameters widely used in the evaluation literature—the average treatment effect, the effect of treatment on the treated, and the local average treatment effect—and show how one might estimate the predictive distributions associated with these outcome gains rather than simply estimating mean effects. We apply these techniques to predict the effect of literacy on the weekly wages of a sample of New Jersey child laborers in 1903.

6.
Abstract

Teratological experiments are controlled dose-response studies in which impregnated animals are randomly assigned to various exposure levels of a toxic substance. Subsequently, both continuous and discrete responses are recorded on the litters of fetuses that these animals produce. Discrete responses are usually binary in nature, such as the presence or absence of some fetal anomaly. Such clustered binary data usually exhibit over-dispersion (or under-dispersion), which can be interpreted either as variation between litter response probabilities or as intralitter correlation. To model this correlation and/or variation, the beta-binomial distribution has been assumed for the number of positive fetal responses within a litter. Although the mean of the beta-binomial model has been linked to dose-response functions, as a way of measuring over-dispersion it may be too restrictive for modeling data from teratological studies. Moreover, for certain toxins a threshold effect has been observed in the dose-response pattern of the data. We propose to incorporate a random effect into a general threshold dose-response model to account for the variation in responses while simultaneously estimating the threshold effect. We fit this model to a well-known data set in the field of teratology. Simulation studies are performed to assess the validity of the random effects threshold model in these types of studies.
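As a sketch of the beta-binomial baseline this abstract starts from (not the proposed random-effects threshold model), the following simulates litters and recovers the mean response probability and intralitter correlation by maximum likelihood; parameter values are invented.

```python
import numpy as np
from scipy.stats import betabinom
from scipy.optimize import minimize

rng = np.random.default_rng(2)
a_true, b_true = 2.0, 6.0                  # implies intralitter correlation 1/9
n = rng.integers(5, 15, size=100)          # litter sizes
x = betabinom.rvs(n, a_true, b_true, random_state=rng)  # positive responses

def negloglik(log_ab):
    a, b = np.exp(log_ab)                  # optimize on the log scale (a, b > 0)
    return -np.sum(betabinom.logpmf(x, n, a, b))

fit = minimize(negloglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(fit.x)
print("mean response probability:", a_hat / (a_hat + b_hat))
print("intralitter correlation:", 1.0 / (a_hat + b_hat + 1.0))
```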

7.
Abstract

In continuous-time capture-recapture experiments, individual heterogeneity has a large effect on the capture probability. To account for the heterogeneity, we consider an individual covariate that is categorical and subject to missingness. In this article, we develop a general model that encompasses three kinds of missing mechanisms, and we propose a maximum likelihood estimator of the abundance. A likelihood ratio confidence interval for the abundance is also proposed. We illustrate the proposed methods with simulation studies and a real data example on the bird species Prinia subflava in Hong Kong.

8.
The recent time series literature has developed many models for analyzing dynamic conditional correlation among observations of the same variable in different locations; very often, however, spatial interactions are left out of this framework. We propose to extend a time-varying conditional correlation model (following an autoregressive moving average dynamic) to include spatial effects, with a specification that depends on the local spatial interactions. The spatial part is based on a fixed symmetric weight matrix, called the Gaussian kernel matrix, but its effect varies over time depending on the degree of time correlation in a given period. We present the theoretical aspects, with the support of simulation experiments, and apply this methodology to two space–time data sets, in a demographic and a financial framework, respectively.
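The fixed spatial component can be illustrated in a few lines; the exact kernel parametrization used by the authors is not given here, so this form is an assumption.

```python
import numpy as np

def gaussian_kernel_weights(coords, bandwidth):
    """Fixed symmetric spatial weight matrix with Gaussian kernel entries
    w_ij = exp(-d_ij**2 / (2 * bandwidth**2)) and a zero diagonal."""
    d2 = np.sum((coords[:, None, :] - coords[None, :, :])**2, axis=-1)
    W = np.exp(-d2 / (2.0 * bandwidth**2))
    np.fill_diagonal(W, 0.0)               # no self-neighbourhood
    return W

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
print(gaussian_kernel_weights(coords, bandwidth=1.5).round(3))
```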

9.
Recent advances in technology have allowed researchers to collect large-scale complex biological data simultaneously, often in matrix format. In genomic studies, for instance, measurements from tens to hundreds of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome, resulting in a high-dimensional matrix for each gene. In such experiments, researchers are faced with high-dimensional longitudinal data, and traditional methods for longitudinal data are not appropriate for high-dimensional situations. In this paper, we use the growth curve model to introduce a test suitable for high-dimensional longitudinal data and evaluate its performance using simulations. We also show how our approach can be used to filter genes in time course genomic experiments. We illustrate this using publicly available genomic data from experiments comparing normal human lung tissue with vanadium pentoxide treated human lung tissue, designed to understand the susceptibility of individuals working in petro-chemical factories to airway re-modelling. Using our method, we were able to filter 1053 genes (about 5%) as non-noise genes from a pool of 22,277. Although our focus is on hypothesis testing, we also provide a modified maximum likelihood estimator for the mean parameter of the growth curve model and assess its performance through bias and mean squared error.

10.
Research involving a clinical intervention normally aims to test the treatment effects on a dependent variable that is assumed to be a relevant indicator of health or quality-of-life status. In much clinical research, large-n trials are impractical because the availability of individuals within well-defined categories is limited in this field. This makes it increasingly important to concentrate on single-case experiments, whose goal is to investigate whether there is a difference between the effects of the treatments considered in the study. In this setting, valid inference generally cannot be made with the parametric statistical procedures typically used for the analysis of clinical trials and other large-n designs, so nonparametric tools can be a valid alternative for analyzing this kind of data. We propose a permutation solution for assessing treatment effects in single-case experiments within alternation designs, together with an extension to the case of more than two treatments. A simulation study shows that the approach is both reliable under the null hypothesis and powerful under the alternative, and that it improves on the performance of a competing method. Finally, we present the results of a real case application.
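A minimal sketch of a permutation test for a two-treatment alternation design; the statistic (difference in treatment means) and the random re-labelling scheme are simple stand-ins for the paper's exact permutation solution.

```python
import numpy as np

def permutation_test(y, labels, n_perm=10000, seed=0):
    """Approximate permutation p-value for H0: no treatment difference,
    re-assigning labels while keeping the number of A/B sessions fixed."""
    rng = np.random.default_rng(seed)
    obs = y[labels == "A"].mean() - y[labels == "B"].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        stat = y[perm == "A"].mean() - y[perm == "B"].mean()
        count += abs(stat) >= abs(obs)
    return obs, (count + 1) / (n_perm + 1)

y = np.array([4.1, 6.0, 3.8, 6.4, 4.5, 5.9, 4.0, 6.2])  # session outcomes
labels = np.array(list("ABABABAB"))                      # alternation design
print(permutation_test(y, labels))
```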

11.
Summary.  In microarray experiments, accurate estimation of the gene variance is a key step in the identification of differentially expressed genes. Variance models range from the overly stringent homoscedastic assumption to the overparameterized model that assumes a specific variance for each gene. Between these two extremes there is room for intermediate models. We propose a method that identifies clusters of genes with equal variance, using a mixture model on the gene variance distribution. A test statistic for ranking and detecting differentially expressed genes is proposed. The method is illustrated with publicly available complementary deoxyribonucleic acid microarray experiments, an unpublished data set and further simulation studies.

12.
Summary.  The paper extends the susceptible–exposed–infective–removed model to handle heterogeneity introduced by spatially arranged populations, biologically plausible distributional assumptions and the incorporation of observations from additional diagnostic tests. These extensions are motivated by a desire to analyse disease transmission experiments in more detail than before. Such experiments are performed by veterinarians to gain knowledge about the dynamics of an infectious disease. By fitting our spatial susceptible–exposed–infective–removed model with diagnostic testing to data for a specific disease and production environment, a valuable decision support tool is obtained, e.g. when evaluating on-farm control measures. Partial observability of the epidemic process is an inherent problem when trying to estimate model parameters from experimental data. We therefore extend existing work on Markov chain Monte Carlo estimation in partially observable epidemics to the multitype epidemic set-up of our model. Throughout the paper, data from a Belgian classical swine fever virus transmission experiment are used as a motivating example.

13.
We consider bivariate current status data with death, which often arise in animal tumorigenicity experiments. The exact tumor onset time is not observed; instead, the presence of a tumor is determined at the time of death or sacrifice. Such an incomplete data structure makes it difficult to investigate the effect of treatment on tumor onset times. Furthermore, when tumors occur at two sites, the order of their onsets is unknown. A multistate model is applied to incorporate the sequential occurrence of events. For the inference of parameters, an EM algorithm is applied, and a real NTP (National Toxicology Program) dataset is analyzed as an illustrative example.

14.
Incorporation of historical controls using semiparametric mixed models
The analysis of animal carcinogenicity data is complicated by various statistical issues. A topic of recent debate is how to control for the effect of the animal's body weight on the outcome of interest, the onset of tumours. We propose a method that incorporates historical information from the control animals in previously conducted experiments. We allow non-linearity in the effects of body weight by modelling the relationship nonparametrically through a penalized spline. A simple extension of the penalized spline model allows the relationship between weight and the onset of tumour to vary from one experiment to another.
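A bare-bones version of the penalized-spline component only (the experiment-specific extension is omitted), using a truncated-line basis with a ridge penalty on the knot coefficients; knot placement, the penalty value and the simulated weight data are arbitrary choices for illustration.

```python
import numpy as np

def penalized_spline_fit(x, y, n_knots=10, lam=1.0):
    """Fit f(x) = b0 + b1*x + sum_k u_k * max(x - knot_k, 0) with a ridge
    penalty lam on the u_k (a classical P-spline representation)."""
    knots = np.quantile(x, np.linspace(0.05, 0.95, n_knots))
    X = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(x - k, 0.0) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)   # penalize only knot terms
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta                              # fitted curve

rng = np.random.default_rng(3)
weight = np.sort(rng.uniform(20.0, 50.0, size=200))   # body weights
y = 0.02 + 0.0001 * (weight - 35.0)**2 + rng.normal(0.0, 0.02, size=200)
fitted = penalized_spline_fit(weight, y, lam=5.0)
print(fitted[:5].round(4))
```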

15.
We propose a new summary tool, the so-called average predictive comparison (APC), which summarizes the effect of a particular predictor in a regression context. Unlike the definition in our earlier work (Liu and Gustafson, 2008), the new definition allows pointwise evaluation of a predictor's effect at any given value of that predictor. We employ this summary tool to examine the consequences of erroneously omitting interactions from regression models. To accommodate curved relationships between a response variable and predictors, we consider fractional polynomial regression models (Royston and Altman, 1994). We derive the asymptotic properties of the APC estimates in a general setting with p (≥ 2) predictors. In particular, when there are only two predictors of interest, we find that the APC estimator is robust to the model misspecification under certain conditions. We illustrate the application of the proposed summary tool with a real data example and conduct simulation experiments to further check the performance of the APC estimates.
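A sketch of the APC idea for a fitted regression function: move the focal predictor between two values, hold the remaining predictors at their observed values, and average the change in predictions. The toy model with an interaction hints at why omitting interactions can distort a single-number effect summary; all names and values are hypothetical.

```python
import numpy as np

def average_predictive_comparison(predict, X, j, u_from, u_to):
    """APC for predictor j: average change in the fitted response per unit
    change in x_j, averaged over observed values of the other predictors."""
    X_from, X_to = X.copy(), X.copy()
    X_from[:, j], X_to[:, j] = u_from, u_to
    return np.mean(predict(X_to) - predict(X_from)) / (u_to - u_from)

def predict(X):  # toy fitted model with an interaction term
    return 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.5 * X[:, 0] * X[:, 1]

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2))
# Roughly 2 + 1.5 * mean(x2): the interaction shifts the averaged effect.
print(average_predictive_comparison(predict, X, j=0, u_from=0.0, u_to=1.0))
```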

16.
Scientific experiments commonly result in clustered discrete and continuous data. Existing methods for analyzing such data include the use of quasi-likelihood procedures and generalized estimating equations to estimate marginal mean response parameters. In applications such as developmental toxicity studies, where discrete and continuous measurements are recorded on each fetus, or clinical ophthalmologic trials, where different types of observations are made on each eye, the assumption that data within a cluster are exchangeable is often very reasonable. We use this assumption to formulate fully parametric regression models for clusters of bivariate data with binary and continuous components. The proposed regression models have marginal interpretations and reproducible model structures. Tractable expressions for the likelihood equations are derived, and iterative schemes are given for computing efficient estimates (MLEs) of the marginal mean, correlations, variances and higher moments. We demonstrate the use of the ‘exchangeable’ procedure with an application to a developmental toxicity study involving fetal weight and malformation data.

17.
Randomized and natural experiments are commonly used in economics and other social science fields to estimate the effects of programs and interventions. Even when employing experimental data, assessing the impact of a treatment is often complicated by the presence of sample selection (outcomes are only observed for a selected group) and noncompliance (some treatment group individuals do not receive the treatment while some control individuals do). We address both of these identification problems simultaneously and derive nonparametric bounds for average treatment effects within a principal stratification framework. We employ these bounds to empirically assess the wage effects of Job Corps (JC), the most comprehensive and largest federally funded job training program for disadvantaged youth in the United States. Our results strongly suggest positive average effects of JC on wages for individuals who comply with their treatment assignment and would be employed whether or not they enrolled in JC (the “always-employed compliers”). Under relatively weak monotonicity and mean dominance assumptions, we find that this average effect is between 5.7% and 13.9% four years after randomization, and between 7.7% and 17.5% for non-Hispanics. Our results are consistent with larger effects of JC on wages than those found without adjusting for noncompliance.

18.
The design of infectious disease studies has received little attention because such studies are generally viewed as observational: epidemic and endemic disease transmission happens and we observe it. We argue here that statistical design often provides useful guidance for such studies with regard to the type of data and the size of the data set to be collected. It is shown that data on disease transmission in part of the community enable the estimation of central parameters, and that it is possible to compute the sample size required to make inferences with a desired precision. We illustrate this for data on disease transmission in a single community of uniformly mixing individuals and for data on outbreak sizes in households. Data on disease transmission are usually incomplete, which creates an identifiability problem for certain parameters of multitype epidemic models. We identify designs that can overcome this problem for the important objective of estimating parameters that help to assess the effectiveness of a vaccine. With disease transmission in animal groups there is greater scope for conducting planned experiments, and we explore some possibilities for such experiments. The topic is largely unexplored, and numerous open research problems in the statistical design of infectious disease studies are mentioned.

19.
Crossover designs, or repeated measurements designs, are used for experiments in which t treatments are applied to each of n experimental units successively over p time periods. Such experiments are widely used in areas such as clinical trials, experimental psychology and agricultural field trials. In addition to the direct effect of the treatment applied in a period on the response, there may also be a residual, or carry-over, effect of a treatment from one or more previous periods. We use a model in which the residual effect of a treatment depends upon the treatment applied in the succeeding period; that is, a model that includes interactions between the treatment direct and residual effects. We assume that residual effects do not persist beyond one succeeding period. A particular class of strongly balanced repeated measurements designs with n = t² units and which are uniform on the periods is examined. A lower bound for the A-efficiency of the designs for estimating the direct effects is derived, and it is shown that such designs are highly efficient for any number of periods p = 2, …, 2t.
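For the special case p = 2, the structure of such a class can be written down directly: assigning each of the n = t² units one of the t² ordered treatment pairs gives a design that is uniform on the periods and strongly balanced. This enumeration only illustrates the structure; it is not the paper's general construction for p up to 2t.

```python
import itertools

def strongly_balanced_two_period(t):
    """All t**2 ordered treatment pairs as two-period sequences: each
    treatment occurs t times per period (uniform on periods) and every
    ordered pair, including a treatment preceded by itself, occurs
    exactly once in consecutive periods (strong balance)."""
    return list(itertools.product(range(1, t + 1), repeat=2))

for seq in strongly_balanced_two_period(3):
    print(seq)   # 9 units: (1, 1), (1, 2), ..., (3, 3)
```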

20.
The statistical inference problem for effect size indices is addressed using a series of independent two-armed experiments from k arbitrary populations. The effect size parameter quantifies the difference between two groups and is a meaningful index when data are measured on different scales. In the context of bivariate statistical models, we define estimators of the effect size indices and propose large-sample testing procedures for the homogeneity of these indices. The null and non-null distributions of the proposed test statistics are derived and their performance is evaluated via Monte Carlo simulation. Further, three types of interval estimation of the proposed indices are considered for both combined and uncombined data. Lower and upper confidence limits for the actual effect size indices are obtained and compared via bootstrapping. It is found that the length of the intervals based on the combined effect size estimator is almost half that of the intervals based on the uncombined effect size estimators. Finally, we illustrate the proposed procedures for hypothesis testing and interval estimation using a real data set.
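A possible large-sample version of such a homogeneity test, assuming the effect size index is the standardized mean difference with its usual asymptotic variance; the formulas are standard meta-analysis approximations, not necessarily the exact statistics of this paper.

```python
import numpy as np
from scipy.stats import chi2

def cohens_d(x1, x2):
    """Standardized mean difference for one two-armed experiment."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * np.var(x1, ddof=1) +
           (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(sp2)

def homogeneity_test(ds, n1s, n2s):
    """Q-statistic for H0: the k effect sizes are equal, weighting each
    estimate by the inverse of its asymptotic variance; Q ~ chi2(k - 1)."""
    ds, n1s, n2s = map(np.asarray, (ds, n1s, n2s))
    var = (n1s + n2s) / (n1s * n2s) + ds**2 / (2.0 * (n1s + n2s))
    w = 1.0 / var
    d_bar = np.sum(w * ds) / np.sum(w)   # combined (pooled) estimator
    Q = np.sum(w * (ds - d_bar)**2)
    return Q, chi2.sf(Q, df=len(ds) - 1)

rng = np.random.default_rng(5)
studies = [(rng.normal(0.5, 1, 40), rng.normal(0, 1, 40)) for _ in range(4)]
ds = [cohens_d(x1, x2) for x1, x2 in studies]
print(homogeneity_test(ds, [40] * 4, [40] * 4))
```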
