期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Physically constrained spatiotemporal modeling: generating clear-sky constructions of land surface temperature from sparse,remotely sensed satellite data

Gavin Q. Collins Matthew J. Heaton Leiqiu Hu 《Journal of applied statistics》2020,47(8):1439

Satellite remote-sensing is used to collect important atmospheric and geophysical data at various spatial resolutions, providing insight into spatiotemporal surface and climate variability globally. These observations are often plagued with missing spatial and temporal information of Earth''s surface due to (1) cloud cover at the time of a satellite passing and (2) infrequent passing of polar-orbiting satellites. While many methods are available to model missing data in space and time, in the case of land surface temperature (LST) from thermal infrared remote sensing, these approaches generally ignore the temporal pattern called the ‘diurnal cycle’ which physically constrains temperatures to peak in the early afternoon and reach a minimum at sunrise. In order to infill an LST dataset, we parameterize the diurnal cycle into a functional form with unknown spatiotemporal parameters. Using multiresolution spatial basis functions, we estimate these parameters from sparse satellite observations to reconstruct an LST field with continuous spatial and temporal distributions. These estimations may then be used to better inform scientists of spatiotemporal thermal patterns over relatively complex domains. The methodology is demonstrated using data collected by MODIS on NASA''s Aqua and Terra satellites over both Houston, TX and Phoenix, AZ USA. 相似文献

2.

Bayesian analysis of multivariate mortality data with large families

M.-H. Chen D. K. Dey & D. Sinha 《Journal of the Royal Statistical Society. Series C, Applied statistics》2000,49(1):129-144

This paper presents a Bayesian method for the analysis of toxicological multivariate mortality data when the discrete mortality rate for each family of subjects at a given time depends on familial random effects and the toxicity level experienced by the family. Our aim is to model and analyse one set of such multivariate mortality data with large family sizes: the potassium thiocyanate (KSCN) tainted fish tank data of O'Hara Hines. The model used is based on a discretized hazard with additional time-varying familial random effects. A similar previous study (using sodium thiocyanate (NaSCN)) is used to construct a prior for the parameters in the current study. A simulation-based approach is used to compute posterior estimates of the model parameters and mortality rates and several other quantities of interest. Recent tools in Bayesian model diagnostics and variable subset selection have been incorporated to verify important modelling assumptions regarding the effects of time and heterogeneity among the families on the mortality rate. Further, Bayesian methods using predictive distributions are used for comparing several plausible models. 相似文献

3.

Efficient estimation of variance components in nonparametric mixed-effects models with large samples

Nathaniel E. Helwig 《Statistics and Computing》2016,26(6):1319-1336

Linear mixed-effects (LME) regression models are a popular approach for analyzing correlated data. Nonparametric extensions of the LME regression model have been proposed, but the heavy computational cost makes these extensions impractical for analyzing large samples. In particular, simultaneous estimation of the variance components and smoothing parameters poses a computational challenge when working with large samples. To overcome this computational burden, we propose a two-stage estimation procedure for fitting nonparametric mixed-effects regression models. Our results reveal that, compared to currently popular approaches, our two-stage approach produces more accurate estimates that can be computed in a fraction of the time. 相似文献

4.

Double threshold autoregressive conditionally heteroscedastic model building by genetic algorithms

《Journal of Statistical Computation and Simulation》2012,82(6):541-558

Asymmetric behaviour in both mean and variance is often observed in real time series. The approach we adopt is based on double threshold autoregressive conditionally heteroscedastic (DTARCH) model with normal innovations. This model allows threshold nonlinearity in mean and volatility to be modelled as a result of the impact of lagged changes in assets and squared shocks, respectively. A methodology for building DTARCH models is proposed based on genetic algorithms (GAs). The most important structural parameters, that is regimes and thresholds, are searched for by GAs, while the remaining structural parameters, that is the delay parameters and models orders, vary in some pre-specified intervals and are determined using exhaustive search and an Asymptotic Information Criterion (AIC) like criterion. For each structural parameters trial set, a DTARCH model is fitted that maximizes the (penalized) likelihood (AIC criterion). For this purpose the iteratively weighted least squares algorithm is used. Then the best model according to the AIC criterion is chosen. Extension to the double threshold generalized ARCH (DTGARCH) model is also considered. The proposed methodology is checked using both simulated and market index data. Our findings show that our GAs-based procedure yields results that comparable to that reported in the literature and concerned with real time series. As far as artificial time series are considered, the proposed procedure seems to be able to fit the data quite well. In particular, a comparison is performed between the present procedure and the method proposed by Tsay [Tsay, R.S., 1989, Testing and modeling threshold autoregressive processes. Journal of the American Statistical Association, Theory and Methods, 84, 231–240.] for estimating the delay parameter. The former almost always yields better results than the latter. However, adopting Tsay's procedure as a preliminary stage for finding the appropriate delay parameter may save computational time specially if the delay parameter may vary in a large interval. 相似文献

5.

Stylometric analyses using Dirichlet process mixture models

Paramjit S. Gill Tim B. Swartz 《Journal of statistical planning and inference》2011,141(11):3665-3674

Stylometry refers to the statistical analysis of literary style of authors based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of word frequency data are the “word prints” of the author. Our approach is based on model-based clustering of the vectors of probability values of the multinomial distribution. The resultant clusters identify different writing styles that assist in author attribution for disputed works in a corpus. As a test case, the methodology is applied to the problem of authorship attribution involving the Federalist papers. Our results are consistent with previous stylometric analyses of these papers. 相似文献

6.

Modeling Event Times with Multiple Outcomes Using the Wiener Process with Drift

Horrocks J Thompson ME 《Lifetime data analysis》2004,10(1):29-49

Length of stay in hospital (LOS) is a widely used outcome measure in Health Services research, often acting as a surrogate for resource consumption or as a measure of efficiency. The distribution of LOS is typically highly skewed, with a few large observations. An interesting feature is the presence of multiple outcomes (e.g. healthy discharge, death in hospital, transfer to another institution). Health Services researchers are interested in modeling the dependence of LOS on covariates, often using administrative data collected for other purposes, such as calculating fees for doctors. Even after all available covariates have been included in the model, unexplained heterogeneity usually remains. In this article, we develop a parametric regression model for LOS that addresses these features. The model is based on the time, T, that a Wiener process with drift (representing an unobserved health level process) hits one of two barriers, one representing healthy discharge and the other death in hospital. Our approach to analyzing event times has many parallels with competing risks analysis (Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data, New York: John Wiley and Sons, 1980)), and can be seen as a way of formalizing a competing risks situation. The density of T is an infinite series, and we outline a proof that the density and its derivatives are absolutely and uniformly convergent, and regularity conditions are satisfied. Expressions for the expected value of T, the conditional expectation of T given outcome, and the probability of each outcome are available in terms of model parameters. The proposed regression model uses an approximation to the density formed by truncating the series, and its parameters are estimated by maximum likelihood. An extension to allow a third outcome (e.g. transfers out of hospital) is discussed, as well as a mixture model that addresses the issue of unexplained heterogeneity. The model is illustrated using administrative data. 相似文献

7.

Correlation-adjusted estimation of sensitivity and specificity of two diagnostic tests 总被引：7，自引：0，他引：7

Marios P. Georgiadis Wesley O. Johnson Ian A. Gardner Ramanpreet Singh 《Journal of the Royal Statistical Society. Series C, Applied statistics》2003,52(1):63-76

Summary. Models for multiple-test screening data generally require the assumption that the tests are independent conditional on disease state. This assumption may be unreasonable, especially when the biological basis of the tests is the same. We propose a model that allows for correlation between two diagnostic test results. Since models that incorporate test correlation involve more parameters than can be estimated with the available data, posterior inferences will depend more heavily on prior distributions, even with large sample sizes. If we have reasonably accurate information about one of the two screening tests (perhaps the standard currently used test) or the prevalences of the populations tested, accurate inferences about all the parameters, including the test correlation, are possible. We present a model for evaluating dependent diagnostic tests and analyse real and simulated data sets. Our analysis shows that, when the tests are correlated, a model that assumes conditional independence can perform very poorly. We recommend that, if the tests are only moderately accurate and measure the same biological responses, researchers use the dependence model for their analyses. 相似文献

8.

Multiple Regression Model Averaging and the Focused Information Criterion With an Application to Portfolio Choice

Filip Klimenka James Lewis Wolter 《商业与经济统计学杂志》2013,31(3):506-516

ABSTRACT

We consider multiple regression (MR) model averaging using the focused information criterion (FIC). Our approach is motivated by the problem of implementing a mean-variance portfolio choice rule. The usual approach is to estimate parameters ignoring the intention to use them in portfolio choice. We develop an estimation method that focuses on the trading rule of interest. Asymptotic distributions of submodel estimators in the MR case are derived using a localization framework. The localization is of both regression coefficients and error covariances. Distributions of submodel estimators are used for model selection with the FIC. This allows comparison of submodels using the risk of portfolio rule estimators. FIC model averaging estimators are then characterized. This extension further improves risk properties. We show in simulations that applying these methods in the portfolio choice case results in improved estimates compared with several competitors. An application to futures data shows superior performance as well. 相似文献

9.

A non-homogeneous skew-Gaussian Bayesian spatial model

Hossein Boojari Majid Jafari Khaledi Firoozeh Rivaz 《Statistical Methods and Applications》2016,25(1):55-73

In spatial statistics, models are often constructed based on some common, but possible restrictive assumptions for the underlying spatial process, including Gaussianity as well as stationarity and isotropy. However, these assumptions are frequently violated in applied problems. In order to simultaneously handle skewness and non-homogeneity (i.e., non-stationarity and anisotropy), we develop the fixed rank kriging model through the use of skew-normal distribution for its non-spatial latent variables. Our approach to spatial modeling is easy to implement and also provides a great flexibility in adjusting to skewed and large datasets with heterogeneous correlation structures. We adopt a Bayesian framework for our analysis, and describe a simple MCMC algorithm for sampling from the posterior distribution of the model parameters and performing spatial prediction. Through a simulation study, we demonstrate that the proposed model could detect departures from normality and, for illustration, we analyze a synthetic dataset of CO\(_2\) measurements. Finally, to deal with multivariate spatial data showing some degree of skewness, a multivariate extension of the model is also provided. 相似文献

10.

STATISTICAL INFERENCE OF TRANSMISSION FIDELITY OF DNA METHYLATION PATTERNS OVER SOMATIC CELL DIVISIONS IN MAMMALS

Fu AQ Genereux DP Stöger R Laird CD Stephens M 《The annals of applied statistics》2010,4(2):871-892

We develop Bayesian inference methods for a recently-emerging type of epigenetic data to study the transmission fidelity of DNA methylation patterns over cell divisions. The data consist of parent-daughter double-stranded DNA methylation patterns with each pattern coming from a single cell and represented as an unordered pair of binary strings. The data are technically difficult and time-consuming to collect, putting a premium on an efficient inference method. Our aim is to estimate rates for the maintenance and de novo methylation events that gave rise to the observed patterns, while accounting for measurement error. We model data at multiple sites jointly, thus using whole-strand information, and considerably reduce confounding between parameters. We also adopt a hierarchical structure that allows for variation in rates across sites without an explosion in the effective number of parameters. Our context-specific priors capture the expected stationarity, or near-stationarity, of the stochastic process that generated the data analyzed here. This expected stationarity is shown to greatly increase the precision of the estimation. Applying our model to a data set collected at the human FMR1 locus, we find that measurement errors, generally ignored in similar studies, occur at a non-trivial rate (inappropriate bisulfite conversion error: 1.6% with 80% CI: 0.9-2.3%). Accounting for these errors has a substantial impact on estimates of key biological parameters. The estimated average failure of maintenance rate and daughter de novo rate decline from 0.04 to 0.024 and from 0.14 to 0.07, respectively, when errors are accounted for. Our results also provide evidence that de novo events may occur on both parent and daughter strands: the median parent and daughter de novo rates are 0.08 (80% CI: 0.04-0.13) and 0.07 (80% CI: 0.04-0.11), respectively. 相似文献

11.

Modelling spatially correlated data via mixtures: a Bayesian approach

Carmen Fernández Peter J. Green 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2002,64(4):805-826

Summary. The paper develops mixture models for spatially indexed data. We confine attention to the case of finite, typically irregular, patterns of points or regions with prescribed spatial relationships, and to problems where it is only the weights in the mixture that vary from one location to another. Our specific focus is on Poisson-distributed data, and applications in disease mapping. We work in a Bayesian framework, with the Poisson parameters drawn from gamma priors, and an unknown number of components. We propose two alternative models for spatially dependent weights, based on transformations of autoregressive Gaussian processes: in one (the logistic normal model), the mixture component labels are exchangeable; in the other (the grouped continuous model), they are ordered. Reversible jump Markov chain Monte Carlo algorithms for posterior inference are developed. Finally, the performances of both of these formulations are examined on synthetic data and real data on mortality from a rare disease. 相似文献

12.

Additive–multiplicative rates model for recurrent events

Yanyan Liu Yuanshan Wu Jianwen Cai Haibo Zhou 《Lifetime data analysis》2010,16(3):353-373

Recurrent events are frequently encountered in biomedical studies. Evaluating the covariates effects on the marginal recurrent event rate is of practical interest. There are mainly two types of rate models for the recurrent event data: the multiplicative rates model and the additive rates model. We consider a more flexible additive–multiplicative rates model for analysis of recurrent event data, wherein some covariate effects are additive while others are multiplicative. We formulate estimating equations for estimating the regression parameters. The estimators for these regression parameters are shown to be consistent and asymptotically normally distributed under appropriate regularity conditions. Moreover, the estimator of the baseline mean function is proposed and its large sample properties are investigated. We also conduct simulation studies to evaluate the finite sample behavior of the proposed estimators. A medical study of patients with cystic fibrosis suffered from recurrent pulmonary exacerbations is provided for illustration of the proposed method. 相似文献

13.

Fitting Kent models to compositional data with small concentration

J. L. Scealy A. H. Welsh 《Statistics and Computing》2014,24(2):165-179

Compositional data can be transformed to directional data by the square root transformation and then modelled by using the Kent distribution. The current approach for estimating the parameters in the Kent model for compositional data relies on a large concentration assumption which assumes that the majority of the transformed data is not distributed too close to the boundaries of the positive orthant. When the data is distributed close to the boundaries with large variance significant folding may result. To treat this case we propose new estimators of the parameters derived based on the actual folded Kent distribution which are obtained via the EM algorithm. We show that these new estimators significantly reduce the bias in the current estimators when both the sample size and amount of folding is moderately large. We also propose using a saddlepoint density approximation for the Kent distribution normalising constant in order to more accurately estimate the shape parameters when the concentration is small or only moderately large. 相似文献

14.

Bayesian Clustering of Many Garch Models

L. Bauwens J. V. K. Rombouts 《Econometric Reviews》2007,26(2):365-386

We consider the estimation of a large number of GARCH models, of the order of several hundreds. Our interest lies in the identification of common structures in the volatility dynamics of the univariate time series. To do so, we classify the series in an unknown number of clusters. Within a cluster, the series share the same model and the same parameters. Each cluster contains therefore similar series. We do not know a priori which series belongs to which cluster. The model is a finite mixture of distributions, where the component weights are unknown parameters and each component distribution has its own conditional mean and variance. Inference is done by the Bayesian approach, using data augmentation techniques. Simulations and an illustration using data on U.S. stocks are provided. 相似文献

15.

Bayesian Clustering of Many Garch Models

L. Bauwens 《Econometric Reviews》2013,32(2-4):365-386

We consider the estimation of a large number of GARCH models, of the order of several hundreds. Our interest lies in the identification of common structures in the volatility dynamics of the univariate time series. To do so, we classify the series in an unknown number of clusters. Within a cluster, the series share the same model and the same parameters. Each cluster contains therefore similar series. We do not know a priori which series belongs to which cluster. The model is a finite mixture of distributions, where the component weights are unknown parameters and each component distribution has its own conditional mean and variance. Inference is done by the Bayesian approach, using data augmentation techniques. Simulations and an illustration using data on U.S. stocks are provided. 相似文献

16.

Blinded assessment of treatment effects for survival endpoint in an ongoing trial

Jun Xie Hui Quan Ji Zhang 《Pharmaceutical statistics》2012,11(3):204-213

Many assumptions, including assumptions regarding treatment effects, are made at the design stage of a clinical trial for power and sample size calculations. It is desirable to check these assumptions during the trial by using blinded data. Methods for sample size re‐estimation based on blinded data analyses have been proposed for normal and binary endpoints. However, there is a debate that no reliable estimate of the treatment effect can be obtained in a typical clinical trial situation. In this paper, we consider the case of a survival endpoint and investigate the feasibility of estimating the treatment effect in an ongoing trial without unblinding. We incorporate information of a surrogate endpoint and investigate three estimation procedures, including a classification method and two expectation–maximization (EM) algorithms. Simulations and a clinical trial example are used to assess the performance of the procedures. Our studies show that the expectation–maximization algorithms highly depend on the initial estimates of the model parameters. Despite utilization of a surrogate endpoint, all three methods have large variations in the treatment effect estimates and hence fail to provide a precise conclusion about the treatment effect. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

17.

Bayesian Analysis of Masked Series System Lifetime Data

Chiranjit Mukhopadhyay Sanjib Basu 《统计学通讯:理论与方法》2013,42(2):329-348

The problem of analyzing series system lifetime data with masked or partial information on cause of failure is recent, compared to that of the standard competing risks model. A generic Gibbs sampling scheme is developed in this article towards a Bayesian analysis for a general parametric competing risks model with masked cause of failure data. The masking probabilities are not subjected to the symmetry assumption and independent Dirichlet priors are used to marginalize these nuisance parameters. The developed methodology is illustrated for the case where the components of a series system have independent log-Normal life distributions by employing independent Normal-Gamma priors for these component lifetime parameters. The Gibbs sampling scheme developed for the required analysis can also be used to provide a Bayesian analysis of data arising from the conventional competing risks model of independent log-Normals, which interestingly has so far remained by and large neglected in the literature. The developed methodology is deployed to analyze a masked lifetime data of PS/2 computer systems. 相似文献

18.

Comparisons of computational methods for clustered binary data

Tanujit Dey Chae Young Lim 《Journal of Statistical Computation and Simulation》2013,83(11):2030-2046

Clustered binary data are common in medical research and can be fitted to the logistic regression model with random effects which belongs to a wider class of models called the generalized linear mixed model. The likelihood-based estimation of model parameters often has to handle intractable integration which leads to several estimation methods to overcome such difficulty. The penalized quasi-likelihood (PQL) method is the one that is very popular and computationally efficient in most cases. The expectation–maximization (EM) algorithm allows to estimate maximum-likelihood estimates, but requires to compute possibly intractable integration in the E-step. The variants of the EM algorithm to evaluate the E-step are introduced. The Monte Carlo EM (MCEM) method computes the E-step by approximating the expectation using Monte Carlo samples, while the Modified EM (MEM) method computes the E-step by approximating the expectation using the Laplace's method. All these methods involve several steps of approximation so that corresponding estimates of model parameters contain inevitable errors (large or small) induced by approximation. Understanding and quantifying discrepancy theoretically is difficult due to the complexity of approximations in each method, even though the focus is on clustered binary data. As an alternative competing computational method, we consider a non-parametric maximum-likelihood (NPML) method as well. We review and compare the PQL, MCEM, MEM and NPML methods for clustered binary data via simulation study, which will be useful for researchers when choosing an estimation method for their analysis. 相似文献

19.

Estimation of parameters in incomplete data models defined by dynamical systems

Sophie Donnet Adeline Samson 《Journal of statistical planning and inference》2007

Parametric incomplete data models defined by ordinary differential equations (ODEs) are widely used in biostatistics to describe biological processes accurately. Their parameters are estimated on approximate models, whose regression functions are evaluated by a numerical integration method. Accurate and efficient estimations of these parameters are critical issues. This paper proposes parameter estimation methods involving either a stochastic approximation EM algorithm (SAEM) in the maximum likelihood estimation, or a Gibbs sampler in the Bayesian approach. Both algorithms involve the simulation of non-observed data with conditional distributions using Hastings–Metropolis (H–M) algorithms. A modified H–M algorithm, including an original local linearization scheme to solve the ODEs, is proposed to reduce the computational time significantly. The convergence on the approximate model of all these algorithms is proved. The errors induced by the numerical solving method on the conditional distribution, the likelihood and the posterior distribution are bounded. The Bayesian and maximum likelihood estimation methods are illustrated on a simulated pharmacokinetic nonlinear mixed-effects model defined by an ODE. Simulation results illustrate the ability of these algorithms to provide accurate estimates. 相似文献

20.

A non-homogeneous hidden Markov model for precipitation occurrence 总被引：9，自引：0，他引：9

J. P. Hughes P Guttorp & S. P. Charles 《Journal of the Royal Statistical Society. Series C, Applied statistics》1999,48(1):15-30

A non-homogeneous hidden Markov model is proposed for relating precipitation occurrences at multiple rain-gauge stations to broad scale atmospheric circulation patterns (the so-called 'downscaling problem'). We model a 15-year sequence of winter data from 30 rain stations in south-western Australia. The first 10 years of data are used for model development and the remaining 5 years are used for model evaluation. The fitted model accurately reproduces the observed rainfall statistics in the reserved data despite a shift in atmospheric circulation (and, consequently, rainfall) between the two periods. The fitted model also provides some useful insights into the processes driving rainfall in this region. 相似文献