Similar Documents
20 similar documents retrieved.
1.
We propose a semiparametric modeling approach for mixtures of symmetric distributions. The mixture model is built from a common symmetric density with different components arising through different location parameters. This structure ensures identifiability for mixture components, which is a key feature of the model as it allows applications to settings where primary interest is inference for the subpopulations comprising the mixture. We focus on the two-component mixture setting and develop a Bayesian model using parametric priors for the location parameters and for the mixture proportion, and a nonparametric prior probability model, based on Dirichlet process mixtures, for the random symmetric density. We present an approach to inference using Markov chain Monte Carlo posterior simulation. The performance of the model is studied with a simulation experiment and through analysis of a rainfall precipitation data set as well as with data on eruptions of the Old Faithful geyser.
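Below is a minimal sketch of the two-component location-mixture structure just described, using a normal density as a parametric stand-in for the nonparametric common symmetric density and maximum likelihood in place of the paper's Bayesian Dirichlet-process/MCMC treatment; the data and all names are illustrative.

```python
# Two-component location mixture with a common symmetric kernel (toy version).
# A normal kernel stands in for the nonparametric symmetric density; maximum
# likelihood replaces the Bayesian MCMC treatment described in the abstract.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 100)])

def neg_loglik(theta, x):
    mu1, mu2, log_sigma, logit_w = theta
    sigma = np.exp(log_sigma)
    w = 1.0 / (1.0 + np.exp(-logit_w))
    # Common symmetric kernel f, shifted by the two location parameters.
    f1 = stats.norm.pdf(x, loc=mu1, scale=sigma)
    f2 = stats.norm.pdf(x, loc=mu2, scale=sigma)
    return -np.sum(np.log(w * f1 + (1.0 - w) * f2))

fit = optimize.minimize(neg_loglik, x0=[-1.0, 1.0, 0.0, 0.0], args=(x,),
                        method="Nelder-Mead")
mu1, mu2, log_sigma, logit_w = fit.x
print("locations:", mu1, mu2, "mixture weight:", 1 / (1 + np.exp(-logit_w)))
```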

2.
Functional magnetic resonance imaging (FMRI) measures the physiological response of the human brain to experimentally controlled stimulation. In a periodically designed experiment it is of interest to test for a difference in the timing (phase shift) of the response between two anatomically distinct brain regions. We suggest two tests for an interregional difference in phase shift: one based on asymptotic theory and one based on bootstrapping. Whilst the two procedures differ in some of their assumptions, both tests rely on employing the large number of voxels (three-dimensional pixels) in non-activated brain regions to take account of spatial autocorrelation between voxelwise phase shift observations within the activated regions of interest. As an example we apply both tests, and their counterparts assuming spatial independence, to FMRI phase shift data that were acquired from a normal young woman during performance of a periodically designed covert verbal fluency task. We conclude that it is necessary to take account of spatial autocovariance between voxelwise FMRI time series parameter estimates such as the phase shift, and that the most promising way of achieving this is by modelling the spatial autocorrelation structure from a suitably defined base region of the image slice.

3.
Incorporation of historical controls using semiparametric mixed models
The analysis of animal carcinogenicity data is complicated by various statistical issues. A topic of recent debate is how to control for the effect of the animal's body weight on the outcome of interest, the onset of tumours. We propose a method which incorporates historical information from the control animals in previously conducted experiments. We allow non-linearity in the effects of body weight by modelling the relationship nonparametrically through a penalized spline. A simple extension of the penalized spline model allows the relationship between weight and the onset of tumour to vary from one experiment to another.
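The following sketch shows the basic penalized-spline device used for the weight effect: a truncated-line basis with a ridge penalty on the knot coefficients. It is a generic illustration on simulated data, not the authors' mixed-model formulation or their historical-control pooling.

```python
# Penalized-spline fit of a smooth weight-response relationship using a
# truncated-linear basis with a ridge penalty on the knot coefficients.
import numpy as np

rng = np.random.default_rng(1)
weight = np.sort(rng.uniform(20, 40, 200))           # body weight (toy units)
y = np.sin(weight / 4.0) + rng.normal(0, 0.2, 200)   # outcome (toy data)

knots = np.linspace(22, 38, 15)
X = np.column_stack([np.ones_like(weight), weight,
                     np.maximum(weight[:, None] - knots, 0.0)])  # truncated lines

lam = 1.0                               # smoothing parameter (fixed here; REML or
D = np.zeros(X.shape[1]); D[2:] = 1.0   #  cross-validation would be used in practice)
beta = np.linalg.solve(X.T @ X + lam * np.diag(D), X.T @ y)
fitted = X @ beta
```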

4.
In sample surveys and many other areas of application, the ratio of variables is often of great importance. This often occurs when one variable is available at the population level while another variable of interest is available for sample data only. In this case, using the sample ratio, we can often gather valuable information on the variable of interest for the unsampled observations. In many other studies, the ratio itself is of interest, for example when estimating proportions from a random number of observations. In this note we compare three confidence intervals for the population ratio: a large sample interval, a log based version of the large sample interval, and Fieller’s interval. This is done through data analysis and through a small simulation experiment. The Fieller method has often been proposed as a superior interval for small sample sizes. We show through a data example and simulation experiments that Fieller’s method often gives nonsensical and uninformative intervals when the observations are noisy relative to the mean of the data. The large sample interval does not similarly suffer and thus can be a more reliable method for small and large samples.
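The three intervals being compared can be written down in a few lines. The sketch below uses standard textbook delta-method and Fieller formulas for the ratio of two sample means (assumed here to come from paired observations); it is illustrative and not taken verbatim from the paper.

```python
# Three approximate 95% confidence intervals for a ratio R = mean(y)/mean(x):
# a delta-method ("large sample") interval, its log-scale version, and
# Fieller's interval.
import numpy as np
from scipy import stats

def ratio_intervals(x, y, level=0.95):
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    R = ybar / xbar
    vxx = x.var(ddof=1) / n
    vyy = y.var(ddof=1) / n
    vxy = np.cov(x, y, ddof=1)[0, 1] / n
    t = stats.t.ppf(0.5 + level / 2, df=n - 1)

    # Delta-method variance of R and of log(R)
    var_R = (vyy - 2 * R * vxy + R**2 * vxx) / xbar**2
    large = (R - t * np.sqrt(var_R), R + t * np.sqrt(var_R))
    se_log = np.sqrt(var_R) / abs(R)
    log_based = (R * np.exp(-t * se_log), R * np.exp(t * se_log))

    # Fieller: solve a*rho^2 - 2*b*rho + c <= 0 in rho
    a = xbar**2 - t**2 * vxx
    b = xbar * ybar - t**2 * vxy
    c = ybar**2 - t**2 * vyy
    disc = b**2 - a * c
    if a > 0 and disc > 0:
        fieller = ((b - np.sqrt(disc)) / a, (b + np.sqrt(disc)) / a)
    else:
        fieller = None   # unbounded or uninformative interval, as discussed above
    return large, log_based, fieller

rng = np.random.default_rng(2)
x = rng.normal(5.0, 1.0, 30)
y = rng.normal(2.0, 1.0, 30)
print(ratio_intervals(x, y))
```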

5.
This article considers the case where two surveys collect data on a common variable, with one survey being much smaller than the other. The smaller survey collects data on an additional variable of interest, related to the common variable collected in the two surveys, and out-of-scope with respect to the larger survey. Estimation of the two related variables is of interest at domains defined at a granular level. We propose a multilevel model for integrating data from the two surveys, by reconciling survey estimates available for the common variable, accounting for the relationship between the two variables, and expanding estimation for the other variable, for all the domains of interest. The model is specified as a hierarchical Bayes model for domain-level survey data, and posterior distributions are constructed for the two variables of interest. A synthetic estimation approach is considered as an alternative to the hierarchical modelling approach. The methodology is applied to wage and benefits estimation using data from the National Compensation Survey and the Occupational Employment Statistics Survey, available from the Bureau of Labor Statistics, Department of Labor, United States.

6.
There is currently great interest in understanding the way in which recombination rates vary, over short scales, across the human genome. Aside from inherent interest, an understanding of this local variation is essential for the sensible design and analysis of many studies aimed at elucidating the genetic basis of common diseases or of human population histories. Standard pedigree-based approaches do not have the fine scale resolution that is needed to address this issue. In contrast, samples of deoxyribonucleic acid sequences from unrelated chromosomes in the population carry relevant information, but inference from such data is extremely challenging. Although there has been much recent interest in the development of full likelihood inference methods for estimating local recombination rates from such data, they are not currently practicable for data sets of the size being generated by modern experimental techniques. We introduce and study two approximate likelihood methods. The first, a marginal likelihood, ignores some of the data. A careful choice of what to ignore results in substantial computational savings with virtually no loss of relevant information. For larger sequences, we introduce a 'composite' likelihood, which approximates the model of interest by ignoring certain long-range dependences. An informal asymptotic analysis and a simulation study suggest that inference based on the composite likelihood is practicable and performs well. We combine both methods to reanalyse data from the lipoprotein lipase gene, and the results seriously question conclusions from some earlier studies of these data.

7.
Phage display is a biological process that is used to screen random peptide libraries for ligands that bind to a target of interest with high affinity. On the basis of a count data set from an innovative multistage phage display experiment, we propose a class of Bayesian mixture models to cluster peptide counts into three groups that exhibit different display patterns across stages. Among the three groups, the investigators are particularly interested in that with an ascending display pattern in the counts, which implies that the peptides are likely to bind to the target with strong affinity. We apply a Bayesian false discovery rate approach to identify the peptides with the strongest affinity within the group. A list of peptides is obtained, among which important ones with meaningful functions are further validated by biologists. To examine the performance of the Bayesian model, we conduct a simulation study and obtain desirable results.

8.
Some experiences with the use of student projects in experimental design courses at Wisconsin are described. Each student is given the opportunity of selecting a problem of direct interest to him/her, designing and performing an experiment, collecting and analyzing the data. Some ideas with regard to pedagogy and the use of simulated data are also discussed.

9.
We use Bayesian methods to infer an unobserved function that is convolved with a known kernel. Our method is based on the assumption that the function of interest is a Gaussian process and, assuming a particular correlation structure, the resulting convolution is also a Gaussian process. This fact is used to obtain inferences regarding the unobserved process, effectively providing a deconvolution method. We apply the methodology to the problem of estimating the parameters of an oil reservoir from well-test pressure data. Here, the unknown process describes the structure of the well. Applications to data from Mexican oil wells show very accurate results.
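A discretized sketch of the Gaussian-process deconvolution idea follows: the unknown function gets a GP prior, the observations are a known kernel convolved with it plus noise, and the posterior of the unknown function follows from standard Gaussian conditioning. The grid, kernels and hyperparameters are purely illustrative and do not reproduce the paper's well-test application.

```python
# GP deconvolution on a grid: observed g = K f + noise, with a GP prior on f.
import numpy as np

grid = np.linspace(0.0, 1.0, 100)
dx = grid[1] - grid[0]

def se_cov(s, t, tau=1.0, ell=0.1):
    # Squared-exponential prior covariance for the latent function f
    return tau**2 * np.exp(-0.5 * (s[:, None] - t[None, :])**2 / ell**2)

Sigma_f = se_cov(grid, grid)                              # prior covariance of f
K = np.exp(-np.abs(grid[:, None] - grid[None, :]) / 0.05) * dx  # known convolution operator

rng = np.random.default_rng(3)
f_true = np.sin(2 * np.pi * grid)
noise_sd = 0.05
g_obs = K @ f_true + rng.normal(0, noise_sd, grid.size)

# Posterior mean of f given g: Sigma_f K' (K Sigma_f K' + s^2 I)^{-1} g
S = K @ Sigma_f @ K.T + noise_sd**2 * np.eye(grid.size)
f_post_mean = Sigma_f @ K.T @ np.linalg.solve(S, g_obs)
```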

10.
For right-censored data, the accelerated failure time (AFT) model is an alternative to the commonly used proportional hazards regression model. It is a linear model for the (log-transformed) outcome of interest, and is particularly useful for censored outcomes that are not time-to-event, such as laboratory measurements. We provide a general and easily computable definition of the R2 measure of explained variation under the AFT model for right-censored data. We study its behavior under different censoring scenarios and under different error distributions; in particular, we also study its robustness when the parametric error distribution is misspecified. Based on Monte Carlo investigation results, we recommend the log-normal distribution as a robust error distribution to be used in practice for the parametric AFT model, when the R2 measure is of interest. We apply our methodology to an alcohol consumption during pregnancy data set from Ukraine.
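As a rough illustration, one variance-decomposition style of explained-variation measure for a parametric log-normal AFT model fitted to right-censored data can be computed as below. The specific R2 definition sketched here is a generic choice and may differ from the paper's; the data and all names are simulated.

```python
# Log-normal AFT fit by censored maximum likelihood, followed by a simple
# explained-variation ratio on the log-time scale.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)
logT = 1.0 + 0.8 * x + rng.normal(0, 0.5, n)     # true AFT model on the log scale
C = rng.exponential(6.0, n)                      # censoring times
time = np.minimum(np.exp(logT), C)
event = (np.exp(logT) <= C).astype(float)

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    z = (np.log(time) - b0 - b1 * x) / s
    # density contribution for events, survival contribution for censored cases
    ll = event * (stats.norm.logpdf(z) - np.log(s)) + (1 - event) * stats.norm.logsf(z)
    return -ll.sum()

b0, b1, log_s = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0]).x
lin_pred = b0 + b1 * x
r2 = lin_pred.var() / (lin_pred.var() + np.exp(log_s)**2)
print("explained variation on the log scale:", r2)
```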

11.
The author considers density estimation from contaminated data where the measurement errors come from two very different sources. A first error, of Berkson type, is incurred before the experiment: the variable X of interest is unobservable and only a surrogate can be measured. A second error, of classical type, is incurred after the experiment: the surrogate can only be observed with measurement error. The author develops two nonparametric estimators of the density of X, valid whenever Berkson, classical or a mixture of both errors are present. Rates of convergence of the estimators are derived and a fully data-driven procedure is proposed. Finite sample performance is investigated via simulations and on a real data example.

12.
Data on functional disability are of widespread policy interest in the United States, especially with respect to planning for Medicare and Social Security for a growing population of elderly adults. We consider an extract of functional disability data from the National Long Term Care Survey (NLTCS) and attempt to develop disability profiles using variations of the Grade of Membership (GoM) model. We first describe GoM as an individual-level mixture model that allows individuals to have partial membership in several mixture components simultaneously. We then prove the equivalence between individual-level and population-level mixture models, and use this property to develop a Markov Chain Monte Carlo algorithm for Bayesian estimation of the model. We use our approach to analyze functional disability data from the NLTCS.

13.
In geophysical and environmental problems, it is common to have multiple variables of interest measured at the same location and time. These multiple variables typically have dependence over space (and/or time). As a consequence, there is a growing interest in developing models for multivariate spatial processes, in particular, the cross-covariance models. On the other hand, many data sets these days cover a large portion of the Earth such as satellite data, which require valid covariance models on a globe. We present a class of parametric covariance models for multivariate processes on a globe. The covariance models are flexible in capturing non-stationarity in the data yet computationally feasible and require moderate numbers of parameters. We apply our covariance model to surface temperature and precipitation data from an NCAR climate model output. We compare our model to the multivariate version of the Matérn cross-covariance function and models based on coregionalization and demonstrate the superior performance of our model in terms of AIC (and/or maximum log-likelihood values) and predictive skill. We also present some challenges in modelling the cross-covariance structure of the temperature and precipitation data. Based on the fitted results using full data, we give the estimated cross-correlation structure between the two variables.

14.
An important question within industrial statistics is how to find operating conditions that achieve some goal for the mean of a characteristic of interest while simultaneously minimizing the characteristic's process variance. Often, people refer to this kind of situation as the robust parameter design problem. The robust parameter design literature is rich with ways to create separate models for the mean and variance from this type of experiment. Frequently, time and/or cost constraints force certain factors of interest to be much more difficult to change than others. An appropriate approach to such an experiment restricts the randomization, which leads to a split-plot structure. The paper modifies the central composite design to allow the estimation of separate models for the characteristic's mean and variances under a split-plot structure. The paper goes on to discuss an appropriate analysis of the experimental results. It illustrates the methodology with an industrial experiment involving a chemical vapour deposition process for the manufacture of silicon wafers. The methodology was used to achieve a silicon layer thickness value of 485 Å while minimizing the process variation.

15.
In some applications, the failure time of interest is the time from an originating event to a failure event while both event times are interval censored. We propose fitting Cox proportional hazards models to this type of data using a spline-based sieve maximum marginal likelihood, where the time to the originating event is integrated out in the empirical likelihood function of the failure time of interest. This greatly reduces the complexity of the objective function compared with the fully semiparametric likelihood. The dependence of the time of interest on time to the originating event is induced by including the latter as a covariate in the proportional hazards model for the failure time of interest. The use of splines results in a higher rate of convergence of the estimator of the baseline hazard function compared with the usual non-parametric estimator. The computation of the estimator is facilitated by a multiple imputation approach. Asymptotic theory is established and a simulation study is conducted to assess its finite sample performance. It is also applied to analyzing a real data set on AIDS incubation time.

16.
When previous results are available about quantities of interest in a designed experiment, they should be incorporated into the analysis. We suppose that estimates of treatment effects and variance components and their precisions are available from previous data but not necessarily the full data. A prior-posterior method is described which incorporates these previous estimates directly into the analysis of a current set of data. General but concise formulae are derived for the class of generally balanced designs. Previous results are represented by a multivariate normal prior for the treatment means and independent inverse chi-squared distributions for the variance components. Joint and marginal posterior modes and a measure of dispersion are proposed as combined or updated estimates. These posterior summary statistics have highly interpretable forms and are readily computed.
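The scalar special case conveys the idea: a normal prior summarizing the previous estimate of a treatment mean is combined with the current sample mean by precision weighting, and an inverse-chi-squared prior on a variance component is updated by pooling sums of squares. The sketch below covers only this toy case, not the general formulae for generally balanced designs; all numbers are hypothetical.

```python
# Combining previous estimates (as a prior) with current data for a single
# treatment mean and a single variance component.
import numpy as np

# Previous results (treated as a prior)
mu0, se0 = 10.2, 0.6           # prior mean and its standard error
nu0, s0_sq = 8, 2.5            # prior df and scale for the variance component

# Current data
y = np.array([11.3, 10.1, 12.0, 10.8, 11.5, 9.9])
n, ybar, s_sq = len(y), y.mean(), y.var(ddof=1)

# Updated treatment mean: precision-weighted average of prior and current estimates
w0, w1 = 1 / se0**2, n / s_sq
mu_post = (w0 * mu0 + w1 * ybar) / (w0 + w1)
se_post = np.sqrt(1 / (w0 + w1))

# Updated variance component: pooled (inverse chi-squared) combination
nu_post = nu0 + (n - 1)
s_post_sq = (nu0 * s0_sq + (n - 1) * s_sq) / nu_post
print(mu_post, se_post, s_post_sq)
```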

17.
In this paper, we examine a method for analyzing competing risks data where the failure type of interest is missing or incomplete, but where there is an intermediate event, and only patients who experience the intermediate event can die of the cause of interest. In some applications, a method called “log-rank subtraction” has been applied to these problems. There has been no systematic study of this methodology, though. We investigate the statistical properties of the method and further propose a modified method by including a weight function in the construction of the test statistic to correct for potential biases. A class of tests is then proposed for comparing the disease-specific mortality in the two groups. The tests are based on comparing the difference of weighted log-rank scores for the failure type of interest. We derive the asymptotic properties for the modified test procedure. Simulation studies indicate that the tests are unbiased and have reasonable power. The results are also illustrated with data from a breast cancer study.

18.
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Contrary to the others, the proposed method can be easily used on data sets where the number of individuals is less than the number of variables and when the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the widths of the confidence intervals built for the quantities of interest are often smaller whilst ensuring a valid coverage.
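The single-imputation core of PCA-based imputation can be sketched with an iterative low-rank SVD fit, as below; the paper's method additionally places a Bayesian treatment on the PCA model to produce multiple imputations that reflect parameter uncertainty. The data and rank choices are illustrative.

```python
# Iterative PCA imputation: fill missing cells, fit a rank-k SVD, replace the
# missing cells with fitted values, and repeat.
import numpy as np

def iterative_pca_impute(X, rank=2, n_iter=100):
    X = X.astype(float).copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])   # initial fill with column means
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        fitted = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        X[miss] = fitted[miss]                        # update only the missing cells
    return X

rng = np.random.default_rng(4)
Z = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 8))  # low-rank signal
Z[rng.random(Z.shape) < 0.1] = np.nan                   # 10% missing
Z_imputed = iterative_pca_impute(Z, rank=3)
```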

19.
For high-dimensional data, it is a tedious task to determine anomalies such as outliers. We present a novel outlier detection method for high-dimensional contingency tables. We use the class of decomposable graphical models to model the relationship among the variables of interest, which can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed-form expression of the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the use of the LRT outlier detection framework on genetic data modeled by Chow–Liu trees.

20.
In the analysis of retrospective data or when interpreting results from a single-arm phase II clinical trial relative to historical data, it is often of interest to show plots summarizing time-to-event outcomes comparing treatment groups. If the groups being compared are imbalanced with respect to factors known to influence outcome, these plots can be misleading and seemingly incompatible with results obtained from a regression model that accounts for these imbalances. We consider ways in which covariate information can be used to obtain adjusted curves for time-to-event outcomes. We first review a common model-based method and then suggest another model-based approach that is not as reliant on model assumptions. Finally, an approach that is partially model free is suggested. Each method is applied to an example from hematopoietic cell transplantation.
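A sketch of the common model-based approach (direct adjustment) follows: fit a Cox model including the group indicator and covariates, then average the predicted survival curves over the pooled covariate distribution with the group set to each value in turn. The column names and the use of the lifelines package are assumptions for illustration, not taken from the paper.

```python
# Covariate-adjusted survival curves by direct adjustment with a Cox model.
import pandas as pd
from lifelines import CoxPHFitter

def adjusted_curves(df, duration_col="time", event_col="event", group_col="group"):
    cph = CoxPHFitter()
    cph.fit(df, duration_col=duration_col, event_col=event_col)
    covs = df.drop(columns=[duration_col, event_col])
    curves = {}
    for g in sorted(df[group_col].unique()):
        pseudo = covs.copy()
        pseudo[group_col] = g                       # everyone assigned to group g
        # Average the individual predicted curves -> adjusted curve for group g
        curves[g] = cph.predict_survival_function(pseudo).mean(axis=1)
    return pd.DataFrame(curves)
```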
