Similar Literature
20 similar documents found
1.
Motivated by a study of social variation in the relationship of functional limitation prevalence to age, this paper examines methods for modelling social variation in health outcomes. It is argued that, from a Bayesian perspective, modelling the dependence of functional limitation prevalence on age separately for each social group corresponds to an implausible prior model, in addition to leading to imprecise estimates for some groups. The alternative strategy of fitting a single model, perhaps including some age-by-group interactions but omitting higher-order interactions, requires a strong prior commitment to the absence of such effects. Hierarchical Bayesian modelling is proposed as a compromise between these two analytical approaches. Under all hierarchical Bayes analyses there is strong evidence for an ethnic group difference in limitation prevalence in early- to mid-adulthood among tertiary-qualified males. In contrast, the single-model approach largely misses this effect, while the group-specific analyses exhibit an unrealistically large degree of heterogeneity in gender-education-specific ethnicity effects. The sensitivity of posterior inferences to prior specifications is studied.
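To make the hierarchical compromise concrete, here is a minimal sketch (not the authors' model) of partially pooled, group-specific age effects on a binary limitation outcome using PyMC; the data, group structure, and priors are hypothetical placeholders.

```python
import numpy as np
import pymc as pm

# hypothetical data: functional limitation indicator by age for G social groups
rng = np.random.default_rng(8)
G, n_per_group = 12, 80
group = np.repeat(np.arange(G), n_per_group)
age = rng.uniform(20, 80, size=G * n_per_group)
true_slope = rng.normal(0.04, 0.01, size=G)
prob = 1 / (1 + np.exp(-(-3.0 + true_slope[group] * age)))
y = rng.binomial(1, prob)

with pm.Model():
    # population-level distribution of group intercepts and age slopes
    mu_a, mu_b = pm.Normal("mu_a", 0, 2), pm.Normal("mu_b", 0, 0.1)
    sd_a, sd_b = pm.HalfNormal("sd_a", 1), pm.HalfNormal("sd_b", 0.05)
    a = pm.Normal("a", mu_a, sd_a, shape=G)        # partially pooled intercepts
    b = pm.Normal("b", mu_b, sd_b, shape=G)        # partially pooled age slopes
    pm.Bernoulli("y", p=pm.math.invlogit(a[group] + b[group] * age), observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(idata.posterior["b"].mean(dim=("chain", "draw")).values.round(3))
```

Because the group-level standard deviations are estimated from the data, each group's age slope is shrunk toward the population mean only as far as the data warrant, which is the compromise between group-specific and single-model analyses described in the abstract.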

2.
In a two-level factorial experiment, the authors consider designs with partial duplication which permit estimation of the constant term, all main effects and some specified two-factor interactions, assuming that the other effects are negligible. They construct parallel-flats designs with two identical parallel flats that meet prior specifications; they also consider classes of 3-flat and 4-flat designs. They show that the designs obtained can have a very simple covariance structure and high D-efficiency. They give an algorithm from which they generate a series of practical designs with run sizes 12, 16, 24, and 32.
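The design criterion can be illustrated with a short sketch that builds the model matrix (intercept, main effects, specified two-factor interactions) and evaluates D-efficiency for a candidate run set; the candidate design below is an arbitrary illustration, not the parallel-flats construction of the paper.

```python
import itertools
import numpy as np

def model_matrix(runs, two_fis):
    """Intercept + main effects + specified two-factor interactions."""
    cols = [np.ones(len(runs))]
    cols += [runs[:, j] for j in range(runs.shape[1])]
    cols += [runs[:, a] * runs[:, b] for a, b in two_fis]
    return np.column_stack(cols)

def d_efficiency(X):
    n, p = X.shape
    return np.linalg.det(X.T @ X / n) ** (1.0 / p)

# full 2^4 factorial in -1/+1 coding
full = np.array(list(itertools.product([-1, 1], repeat=4)), dtype=float)
X_full = model_matrix(full, two_fis=[(0, 1), (2, 3)])
print("full factorial D-efficiency:", round(d_efficiency(X_full), 3))   # 1.0 (orthogonal)

# a 12-run candidate with partial duplication (random subset, for illustration only)
rng = np.random.default_rng(0)
cand = full[rng.choice(16, size=12, replace=True)]
X_cand = model_matrix(cand, two_fis=[(0, 1), (2, 3)])
print("candidate design D-efficiency:", round(d_efficiency(X_cand), 3))
```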

3.
This paper considers the use of Dirichlet process prior distributions in the statistical analysis of network data. Dirichlet process prior distributions have the advantages of avoiding the parametric specifications for distributions, which are rarely known, and of facilitating a clustering effect, which is often applicable to network nodes. The approach is highlighted for two network models and is conveniently implemented using WinBUGS software.
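The clustering effect mentioned here comes from the discreteness of the Dirichlet process; a minimal sketch of the induced clustering (via the Chinese restaurant process, with hypothetical concentration values) is:

```python
import numpy as np

def chinese_restaurant_process(n, alpha, rng=None):
    """Cluster labels for n items under a Dirichlet process prior with concentration alpha
    (the clustering effect that makes DP priors attractive for network nodes)."""
    rng = np.random.default_rng(rng)
    labels = [0]
    for i in range(1, n):
        counts = np.bincount(labels)
        probs = np.append(counts, alpha) / (i + alpha)  # join existing cluster or open a new one
        labels.append(rng.choice(len(probs), p=probs))
    return np.array(labels)

for alpha in (0.5, 5.0):
    labels = chinese_restaurant_process(100, alpha, rng=7)
    print(f"alpha={alpha}: {labels.max() + 1} clusters among 100 nodes")
```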

4.
Real-world phenomena are frequently modelled by Bayesian hierarchical models. The building-blocks in such models are the distribution of each variable conditional on parent and/or neighbour variables in the graph. The specifications of centre and spread of these conditional distributions may be well motivated, whereas the tail specifications are often left to convenience. However, the posterior distribution of a parameter may depend strongly on such arbitrary tail specifications. This is not easily detected in complex models. In this article, we propose a graphical diagnostic, the Local critique plot, which detects such influential statistical modelling choices at the node level. It identifies the properties of the information coming from the parents and neighbours (the local prior) and from the children and co-parents (the lifted likelihood) that are influential on the posterior distribution, and examines local conflict between these distinct information sources. The Local critique plot can be derived for all parameters in a chain graph model.

5.
Mixed effect models, which contain both fixed effects and random effects, are frequently used in dealing with correlated data arising from repeated measurements (made on the same statistical units). In mixed effect models, the distributions of the random effects need to be specified and they are often assumed to be normal. The analysis of correlated data from repeated measurements can also be done with GEE by assuming any type of correlation structure as initial input. Both mixed effect models and GEE are approaches requiring distributional specifications (likelihood, score function). In this article, we consider a distribution-free least squares approach under a general setting with missing values allowed. This approach does not require the specification of distributions or an initial correlation input. Consistency and asymptotic normality of the estimators are discussed.

6.
A Gaussian process (GP) can be thought of as an infinite collection of random variables with the property that any subset, say of dimension n, of these variables have a multivariate normal distribution of dimension n, mean vector β and covariance matrix Σ [O'Hagan, A., 1994, Kendall's Advanced Theory of Statistics, Vol. 2B, Bayesian Inference (John Wiley & Sons, Inc.)]. The elements of the covariance matrix are routinely specified through the multiplication of a common variance by a correlation function. It is important to use a correlation function that provides a valid (positive definite) covariance matrix. Further, it is well known that the smoothness of a GP is directly related to the specification of its correlation function. Also, from a Bayesian point of view, a prior distribution must be assigned to the unknowns of the model. Therefore, when using a GP to model a phenomenon, the researcher faces two challenges: the need to specify a correlation function and a prior distribution for its parameters. In the literature there are many classes of correlation functions which provide a valid covariance structure. Also, there are many suggestions of prior distributions to be used for the parameters involved in these functions. We aim to investigate how sensitive the GPs are to the (sometimes arbitrary) choices of their correlation functions. For this, we have simulated 25 data sets, each of size 64, over the square [0, 5]×[0, 5] with a specific correlation function and fixed values of the GP's parameters. We then fit different correlation structures to these data, with different prior specifications, and check the performance of the fitted models using different model comparison criteria.
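A rough analogue of this sensitivity study, assuming maximum-likelihood fitting in scikit-learn rather than the fully Bayesian treatment of the paper, simulates one data set of size 64 on [0, 5]×[0, 5] from a Matérn correlation and refits under several correlation-function choices:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

# simulate one data set of size 64 on [0, 5] x [0, 5] from a GP with a Matern(nu=1.5) kernel
rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(64, 2))
true_kernel = Matern(length_scale=1.0, nu=1.5)
K = true_kernel(X) + 1e-8 * np.eye(64)
y = rng.multivariate_normal(np.zeros(64), K)

# refit under different correlation-function choices and compare log marginal likelihoods
for name, kernel in [("RBF", RBF(1.0)),
                     ("Matern 1/2", Matern(1.0, nu=0.5)),
                     ("Matern 3/2", Matern(1.0, nu=1.5))]:
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-8).fit(X, y)
    print(f"{name:12s} log marginal likelihood = {gp.log_marginal_likelihood_value_:.2f}")
```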

7.
This paper deals with imputation techniques and strategies. Usually, imputation proper commences only after initial data editing, but many preceding operations are needed before that. In this editing step, the missing or deficient items are recognized and coded, and it is then decided which of these, if any, should be substituted by imputation. There are a number of imputation methods, each with its own specifications. Consequently, it is not clear which method should finally be chosen, especially since one method may be best in one respect and another method in a different respect. In this paper, we consider these questions through the following four imputation methods: (i) random hot decking, (ii) logistic regression imputation, (iii) linear regression imputation, and (iv) regression-based nearest neighbour hot decking. The last two methods are applied with two different specifications. Two metric variables are used in the empirical tests: the first is very complex, while the second is more ordinary and thus easier to handle. The empirical examples are based on simulations, which clearly show the biases of the various methods and their specifications. In general, method (iv) appears to be recommendable, although its results are not perfect either.
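Method (iv) can be read as predictive-mean matching; a minimal numpy sketch under that reading (not necessarily the paper's exact specification) is:

```python
import numpy as np

def regression_nn_hot_deck(x, y, rng=None):
    """Impute missing y by predictive-mean matching: fit OLS on complete cases,
    then donate the observed y of the case whose fitted value is nearest."""
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    fitted = X @ beta
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        donor = np.argmin(np.abs(fitted[obs] - fitted[i]))
        y_imp[i] = y[obs][donor]
    return y_imp

# small simulated illustration with 20% of responses missing completely at random
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 2 + 3 * x + rng.normal(scale=1.5, size=200)
y[rng.random(200) < 0.2] = np.nan
y_imp = regression_nn_hot_deck(x, y)
print("complete-case mean:", round(np.nanmean(y), 3),
      " imputed-data mean:", round(y_imp.mean(), 3))
```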

8.
Earlier attempts at reconciling disparate substitution elasticity estimates examined differences in separability hypotheses, data bases, and estimation techniques, as well as the methods employed to construct capital service prices. Although these studies showed that differences in elasticity estimates between two or three studies may be attributable to the aforementioned features of the econometric models, they were unable to demonstrate this link statistically or to establish the existence of systematic relationships between features of the econometric models and the perception of production technologies generated by those models. Using sectoral data covering the entire production side of the U.S. economy, we estimate 34 production models for alternative definitions of the capital service price. We employ substitution elasticities calculated from these models as dependent variables in a statistical search for systematic relationships between features of the econometric models and perceptions of the sectoral technology as characterized by the elasticities. Statistically significant systematic effects of the service price and technical-change specifications are found on the monotonicity and concavity properties of the cost functions, as well as on the substitution elasticities.

9.
In this article we describe methods for obtaining the predictive distributions of outcome gains in the framework of a standard latent variable selection model. Although most previous work has focused on estimation of mean treatment parameters as the method for characterizing outcome gains from program participation, we show how the entire distributions associated with these gains can be obtained in certain situations. Although the out-of-sample outcome gain distributions depend on an unidentified parameter, we use the results of Koop and Poirier to show that learning can take place about this parameter through information contained in the identified parameters via a positive definiteness restriction on the covariance matrix. In cases where this type of learning is not highly informative, the spread of the predictive distributions depends more critically on the prior. We show both theoretically and in extensive generated data experiments how learning occurs, and delineate the sensitivity of our results to the prior specifications. We relate our analysis to three treatment parameters widely used in the evaluation literature—the average treatment effect, the effect of treatment on the treated, and the local average treatment effect—and show how one might approach estimation of the predictive distributions associated with these outcome gains rather than simply the estimation of mean effects. We apply these techniques to predict the effect of literacy on the weekly wages of a sample of New Jersey child laborers in 1903.
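The positive-definiteness restriction that permits learning about the unidentified parameter can be illustrated directly: in a trivariate normal selection model, the non-identified correlation between the two potential outcomes is confined to an interval determined by the identified correlations with the selection equation. A small sketch (the notation is assumed here, not taken from the paper):

```python
import numpy as np

def feasible_interval(rho_s0, rho_s1):
    """Range of the non-identified correlation rho_01 between potential outcomes
    that keeps the 3x3 correlation matrix of (selection, Y0, Y1) positive definite."""
    half_width = np.sqrt((1 - rho_s0**2) * (1 - rho_s1**2))
    centre = rho_s0 * rho_s1
    return centre - half_width, centre + half_width

# the better the selection equation explains both outcomes, the tighter the bound
for rho_s0, rho_s1 in [(0.2, 0.3), (0.7, 0.8), (0.95, 0.9)]:
    lo, hi = feasible_interval(rho_s0, rho_s1)
    print(f"rho_s0={rho_s0}, rho_s1={rho_s1}:  rho_01 in ({lo:+.3f}, {hi:+.3f})")
```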

10.
We study the association between bone mineral density (BMD) and body mass index (BMI) when contingency tables are constructed from several U.S. counties, where BMD has three levels (normal, osteopenia and osteoporosis) and BMI has four levels (underweight, normal, overweight and obese). We use the Bayes factor (posterior odds divided by prior odds, or equivalently the ratio of the marginal likelihoods) to construct the new test. Analogous to the chi-squared test and Fisher's exact test, we have a direct Bayes test, which is a standard test using the data from each county separately. In our main contribution, for each county, techniques of small area estimation are used to borrow strength across counties, and a pooled test of independence of BMD and BMI is obtained using a hierarchical Bayesian model. Our pooled Bayes test is computed by performing a Monte Carlo integration using random samples rather than Gibbs samples. We found important differences among the pooled Bayes test, the direct Bayes test and the Cressie-Read test (which allows for some degree of sparseness) when the degree of evidence against independence is studied. As expected, we also found that the direct Bayes test is sensitive to the prior specifications, whereas the pooled Bayes test is not so sensitive. Moreover, the pooled Bayes test has competitive power properties, and it is superior when the cell counts are small to moderate.
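A minimal sketch of a direct (single-county) Bayes test of independence, using Dirichlet-multinomial marginal likelihoods with a hypothetical 3×4 table; the pooled hierarchical test of the paper is not shown:

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of multinomial counts under a Dirichlet(alpha) prior
    (multinomial coefficient omitted; it cancels in the Bayes factor)."""
    counts, alpha = np.asarray(counts, float), np.asarray(alpha, float)
    return (gammaln(alpha.sum()) - gammaln(alpha.sum() + counts.sum())
            + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

def log_bf_dependence(table, a=1.0):
    """Log Bayes factor of the saturated (dependence) model against independence,
    with uniform Dirichlet(a) priors on cell, row and column probabilities."""
    table = np.asarray(table, float)
    full = log_dirichlet_multinomial(table.ravel(), np.full(table.size, a))
    rows = log_dirichlet_multinomial(table.sum(axis=1), np.full(table.shape[0], a))
    cols = log_dirichlet_multinomial(table.sum(axis=0), np.full(table.shape[1], a))
    return full - (rows + cols)

# hypothetical 3x4 BMD-by-BMI table for a single county (illustrative numbers only)
county = [[35, 60, 40, 15],
          [20, 45, 50, 30],
          [ 5, 15, 25, 20]]
print("log BF (dependence vs independence):", round(log_bf_dependence(county), 2))
```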

11.
One of the greatest challenges related to the use of piecewise exponential models (PEMs) is to find an adequate grid of time-points for their construction. In general, the number of intervals in such a grid and the positions of their endpoints are ad hoc choices. We extend previous works by introducing a fully Bayesian approach for the piecewise exponential model in which the grid of time-points (and, consequently, the endpoints and the number of intervals) is random. We estimate the failure rates using the proposed procedure and compare the results with the non-parametric piecewise exponential estimates. Estimates of the survival function using the most probable partition are compared with the Kaplan-Meier estimators (KMEs). A sensitivity analysis for the proposed model is provided, considering different prior specifications for the failure rates and for the grid. We also evaluate the effect of different percentages of censored observations on the estimates. An application to a real data set is also provided. We notice that the posteriors are strongly influenced by the prior specifications, mainly for the failure rate parameters. Thus, the priors must be carefully constructed, genuinely reflecting the expert's prior opinion.

12.
The rapid increase in the number of AIDS cases during the 1980s and the spread of the disease from the high-risk groups into the general population has created widespread concern. In particular, assessing the accuracy of the screening tests used to detect antibodies to the HIV (AIDS) virus in donated blood and determining the prevalence of the disease in the population are fundamental statistical problems. Because the prevalence of AIDS varies widely by geographic region and data on the number of infected blood donors are published regularly, Bayesian methods, which utilize prior results and update them as new data become available, are quite useful. In this paper we develop a Bayesian procedure for estimating the prevalence of a rare disease, the sensitivity and specificity of the screening tests, and the predictive value of a positive or negative screening test. We apply the procedure to data on blood donors in the United States and in Canada. Our results augment those described in Gastwirth (1987) using classical methods. Indeed, we show that the inclusion of sound prior knowledge into the statistical analysis does not yield sufficiently precise estimates of the predictive value of a positive test; hence confirmatory testing is needed to obtain reliable estimates. The emphasis of the Bayesian predictive paradigm on prediction intervals for future data yields a valuable insight: we demonstrate that using them might have detected a decline in the specificity of the most frequently used screening test earlier than it apparently was detected.
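The core calculation, propagating Beta priors on prevalence, sensitivity and specificity through Bayes' theorem to the predictive value of a positive test, can be sketched by simple Monte Carlo; the hyperparameters below are illustrative assumptions, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 100_000

# Beta priors (hypothetical hyperparameters, for illustration only)
prevalence  = rng.beta(2, 1000, M)   # rare disease
sensitivity = rng.beta(90, 10, M)    # test detects roughly 90% of true positives
specificity = rng.beta(995, 5, M)    # roughly 0.5% false-positive rate

# predictive value of a positive screening test, by Bayes' theorem
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

lo, med, hi = np.percentile(ppv, [2.5, 50, 97.5])
print(f"PPV posterior median {med:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

Even with sharp-looking priors on sensitivity and specificity, the interval for the positive predictive value remains wide when the disease is rare, which is the point the abstract makes about needing confirmatory testing.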

13.
Interval-censored data are very common in reliability and lifetime data analysis. This paper investigates the performance of different estimation procedures for a special type of interval-censored data, i.e. grouped data, from three widely used lifetime distributions. The approaches considered here include maximum likelihood estimation, minimum distance estimation based on the chi-square criterion, moment estimation based on an imputation (IM) method, and an ad hoc estimation procedure. Although IM-based techniques have been used extensively in recent years, we show that this method is not always effective. It is found that the ad hoc estimation procedure is equivalent to minimum distance estimation with another distance metric and is more effective in the simulation. The procedures of the different approaches are presented and their performance is investigated by Monte Carlo simulation for various combinations of sample sizes and parameter settings. The numerical results provide practitioners with guidelines for choosing a good estimation approach when analysing grouped data.
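As a small illustration of the grouped-data setting, the sketch below computes the maximum likelihood estimate of an exponential rate when only interval counts are observed (a simplified stand-in for the three lifetime distributions studied in the paper):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def grouped_exponential_mle(edges, counts):
    """MLE of the exponential rate when only the counts per time interval are observed.
    edges: increasing interval endpoints starting at 0; the last one may be np.inf."""
    edges, counts = np.asarray(edges, float), np.asarray(counts, float)

    def neg_log_lik(rate):
        cdf = 1 - np.exp(-rate * edges)
        probs = np.diff(cdf)                       # interval probabilities
        return -np.sum(counts * np.log(np.clip(probs, 1e-300, None)))

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
    return res.x

# simulate grouped data from an Exponential(rate=0.5) lifetime and recover the rate
rng = np.random.default_rng(4)
lifetimes = rng.exponential(scale=1 / 0.5, size=500)
edges = np.array([0.0, 1.0, 2.0, 4.0, 8.0, np.inf])
counts, _ = np.histogram(lifetimes, bins=edges)
print("grouped-data MLE of rate:", round(grouped_exponential_mle(edges, counts), 3))
```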

14.
Bayesian methods are increasingly used in proof-of-concept studies. An important benefit of these methods is the potential to use informative priors, thereby reducing sample size. This is particularly relevant for treatment arms where there is a substantial amount of historical information, such as placebo and active comparators. One issue with using an informative prior is the possibility of a mismatch between the informative prior and the observed data, referred to as prior-data conflict. We focus on two methods for dealing with this: a testing approach and a mixture prior approach. The testing approach assesses prior-data conflict by comparing the observed data to the prior predictive distribution and resorting to a non-informative prior if prior-data conflict is declared. The mixture prior approach uses a prior with a precise and a diffuse component. We assess these approaches for the normal case via simulation and show they have some attractive features as compared with the standard one-component informative prior. For example, when the discrepancy between the prior and the data is sufficiently marked, and intuitively one feels less certain about the results, both the testing and mixture approaches typically yield wider posterior credible intervals than when there is no discrepancy. In contrast, when there is no discrepancy, the results of these approaches are typically similar to those of the standard approach. Whilst for any specific study the operating characteristics of any selected approach should be assessed and agreed at the design stage, we believe these two approaches are each worthy of consideration. Copyright © 2015 John Wiley & Sons, Ltd.
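For the normal case with known sampling variance, the mixture-prior mechanics have a closed form: the posterior is again a two-component mixture whose weights are updated by each component's marginal likelihood, so the diffuse component takes over under marked prior-data conflict. A minimal sketch with hypothetical numbers, not the paper's specification:

```python
import numpy as np
from scipy.stats import norm

def mixture_posterior(ybar, n, sigma, comps):
    """Posterior for a normal mean under a two-component normal mixture prior.
    comps: list of (weight, prior_mean, prior_sd); sigma: known sampling SD."""
    se2 = sigma**2 / n
    post = []
    for w, m, s in comps:
        # marginal likelihood of ybar under this component drives the new weight
        log_w = np.log(w) + norm.logpdf(ybar, loc=m, scale=np.sqrt(s**2 + se2))
        post_var = 1 / (1 / s**2 + 1 / se2)          # conjugate normal-normal update
        post_mean = post_var * (m / s**2 + ybar / se2)
        post.append([log_w, post_mean, np.sqrt(post_var)])
    post = np.array(post)
    post[:, 0] = np.exp(post[:, 0] - np.logaddexp.reduce(post[:, 0]))  # normalize weights
    return post  # rows: (weight, posterior mean, posterior sd)

# precise component centred at the historical estimate plus a diffuse component
prior = [(0.8, 0.0, 0.5), (0.2, 0.0, 5.0)]
for ybar in (0.2, 3.0):   # no conflict vs marked prior-data conflict
    print(f"ybar={ybar}:")
    print(mixture_posterior(ybar, n=20, sigma=2.0, comps=prior).round(3))
```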

15.
Age-conditional probabilities of developing a first cancer represent the transition from being cancer-free to developing a first cancer. Natural inputs into their calculation are rates of first cancer per person-years alive and cancer-free. However, these rates are not readily available because they require information on the cancer-free population. Instead, rates of first cancer per person-years alive, which use as denominator the mid-year populations available from census data, can easily be calculated from cancer registry data. Methods have been developed to estimate age-conditional probabilities of developing cancer based on these easily available rates per person-years alive, which do not directly account for the cancer-free population (DevCan: Probability of Developing or Dying of Cancer Software, Version 6.0, 2005). In the last few years, models (Merrill et al., Int J Epidemiol 29(2):197-207, 2000; Mariotto et al., SEER Cancer Statistics Review, 2002; Clegg et al., Biometrics 58(3):684-688, 2002; Gigli et al., Stat Methods Med Res 15(3):235-253, 2006) and software (ComPrev: Complete Prevalence Software, Version 1.0, 2005) have been developed that allow estimation of cancer prevalence. Estimates of population-based cancer prevalence allow for the estimation of the cancer-free population and consequently of rates per person-years alive and cancer-free. In this paper we present a method that directly estimates the age-conditional probabilities of developing a first cancer using rates per person-years alive and cancer-free obtained from prevalence estimates. We explore conditions under which the previous and the new estimators give similar or different values, using real data from the Surveillance, Epidemiology and End Results (SEER) program.

16.
In familial data, ascertainment correction is often necessary to decipher the genetic bases of complex human diseases, because families usually are not drawn at random or are not selected according to well-defined rules. While there has been much progress in identifying genes associated with a certain phenotype, little attention has so far been paid in familial studies to exploring common genetic influences on different phenotypes of interest. In this study, we develop a powerful bivariate analytical approach that can be used for a complex situation with paired binary traits. In addition, our model has been framed to accommodate the possibility of imperfect diagnosis, as traits may be wrongly observed. Thus, the primary focus is to see whether a particular gene jointly influences both phenotypes. We examine the plausibility of this theory in a sample of families ascertained on the basis of at least one affected individual. We propose a bivariate binary mixed model that provides a novel and flexible way to account for wrong ascertainment in families collected with multiple cases. A hierarchical Bayesian analysis using a Markov chain Monte Carlo (MCMC) method has been carried out to investigate the effect of covariates on the disease status. Results based on simulated data indicate that estimates of the parameters are biased when classification errors and/or ascertainment are ignored.

17.
We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (Commun. Stat., Simul. Comput. 36:45–54, 2007). This new sampler allows for the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we consider priors defined through infinite sequences of independent positive random variables. Two applications are considered: density estimation using mixture models and hazard function estimation. In each case we show how the slice-efficient sampler can be applied to make inference in the models. In the mixture case, two submodels are studied in detail. The first one assumes that the positive random variables are Gamma distributed and the second assumes that they are inverse-Gaussian distributed. Both priors have two hyperparameters and we consider their effect on the prior distribution of the number of occupied clusters in a sample. Extensive computational comparisons with alternative “conditional” simulation techniques for mixture models, using the standard Dirichlet process prior and our new priors, are made. The properties of the new priors are illustrated on a density estimation problem.
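The quantity examined here, the prior distribution of the number of occupied clusters in a sample, can be simulated directly from a truncated stick-breaking representation; the sketch below uses the standard Beta(1, α) Dirichlet process sticks rather than the paper's Gamma or inverse-Gaussian constructions:

```python
import numpy as np

def stick_breaking_weights(alpha, k_max, rng):
    """Truncated DP stick-breaking weights with Beta(1, alpha) sticks."""
    v = rng.beta(1.0, alpha, size=k_max)
    w = v * np.concatenate(([1.0], np.cumprod(1 - v[:-1])))
    return w / w.sum()  # fold the truncation remainder back in

def occupied_clusters(alpha, n, k_max=200, reps=2000, rng=None):
    """Prior distribution of the number of occupied clusters among n observations."""
    rng = np.random.default_rng(rng)
    counts = []
    for _ in range(reps):
        w = stick_breaking_weights(alpha, k_max, rng)
        labels = rng.choice(k_max, size=n, p=w)
        counts.append(np.unique(labels).size)
    return np.array(counts)

for alpha in (0.5, 2.0, 10.0):
    k = occupied_clusters(alpha, n=100, rng=5)
    print(f"alpha={alpha}: prior mean clusters {k.mean():5.2f} (sd {k.std():.2f})")
```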

18.
Two-phase study designs can reduce cost and other practical burdens associated with large-scale epidemiologic studies by limiting ascertainment of expensive covariates to a smaller but informative sub-sample (phase-II) of the main study (phase-I). During the analysis of such studies, however, subjects who are selected at phase-I but not at phase-II remain informative, as they may have partial covariate information. A variety of semi-parametric methods now exist for incorporating such data from phase-I subjects when the covariate information can be summarized into a finite number of strata. In this article, we consider extending the pseudo-score approach proposed by Chatterjee et al. (J Am Stat Assoc 98:158–168, 2003) using a kernel smoothing approach to incorporate information on continuous phase-I covariates. Practical issues and algorithms for implementing the methods using existing software are discussed. A sandwich-type variance estimator based on the influence-function representation of the pseudo-score function is proposed. Finite-sample performance of the methods is studied using simulated data. The advantage of the proposed smoothing approach over alternative methods that use discretized phase-I covariate information is illustrated using two-phase data simulated within the National Wilms Tumor Study (NWTS).

19.
In recent years there has been a rapid growth in the amount of DNA being sequenced and in its availability through genetic databases. Statistical techniques which identify structure within these sequences can be of considerable assistance to molecular biologists, particularly when they incorporate the discrete nature of changes caused by evolutionary processes. This paper focuses on the detection of homogeneous segments within heterogeneous DNA sequences. In particular, we study an intron from the chimpanzee α-fetoprotein gene; this protein plays an important role in the embryonic development of mammals. We present a Bayesian solution to this segmentation problem using a hidden Markov model implemented by Markov chain Monte Carlo methods. We consider the important practical problem of specifying informative prior knowledge about sequences of this type. Two Gibbs sampling algorithms are contrasted and the sensitivity of the analysis to the prior specification is investigated. Model selection and possible ways to overcome the label switching problem are also addressed. Our analysis of intron 7 identifies three distinct homogeneous segment types, two of which occur in more than one region, and one of which is reversible.
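Underlying the Bayesian segmentation is an ordinary hidden Markov model likelihood over the four bases; a minimal forward-recursion sketch with hypothetical GC-rich and AT-rich states (the paper's actual inference is by Gibbs sampling) is:

```python
import numpy as np

def hmm_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of an observed symbol sequence under a hidden Markov model,
    computed with the scaled forward recursion."""
    alpha = start * emit[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        log_lik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik

# two hypothetical segment types: GC-rich vs AT-rich (bases coded A=0, C=1, G=2, T=3)
start = np.array([0.5, 0.5])
trans = np.array([[0.99, 0.01],                 # segments tend to persist
                  [0.01, 0.99]])
emit  = np.array([[0.15, 0.35, 0.35, 0.15],     # GC-rich state
                  [0.35, 0.15, 0.15, 0.35]])    # AT-rich state

rng = np.random.default_rng(6)
seq = rng.integers(0, 4, size=300)              # a random sequence, purely for illustration
print("log-likelihood:", round(hmm_log_likelihood(seq, start, trans, emit), 2))
```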

20.
Estimating nonlinear effects of continuous covariates by penalized splines is well established for regressions with cross-sectional data as well as for panel data regressions with random effects. Penalized splines are particularly advantageous since they enable both the estimation of unknown nonlinear covariate effects and inferential statements about these effects. The latter are based, for example, on simultaneous confidence bands that provide a simultaneous uncertainty assessment for the whole estimated functions. In this paper, we consider fixed effects panel data models instead of random effects specifications and develop a first-difference approach for the inclusion of penalized splines in this case. We take the resulting dependence structure into account and adapt the construction of simultaneous confidence bands accordingly. In addition, the penalized spline estimates as well as the confidence bands are also made available for derivatives of the estimated effects which are of considerable interest in many application areas. As an empirical illustration, we analyze the dynamics of life satisfaction over the life span based on data from the German Socio-Economic Panel. An open-source software implementation of our methods is available in the R package pamfe.
