Similar Articles
20 similar articles found (search time: 31 ms)
1.
In a wireless sensor network, data collection is relatively cheap whereas data transmission is relatively expensive, so preserving battery life is critical. If the process of interest is sufficiently predictable, suppressing transmissions can improve the efficiency of the network with little loss of information. The prime interest lies in finding an inference-efficient way to support suppressed data collection applications. In this paper, we present a suppression scheme for a multiple-node setting with spatio-temporal processes, especially when process knowledge is insufficient. We also explore the impact of suppression schemes on inference about the regional processes under various suppression levels. Finally, we formalize a hierarchical Bayesian model for these schemes.

2.
This paper describes an application of small area estimation (SAE) techniques under area-level spatial random effect models when only area (district, or aggregated) level data are available. In particular, the SAE approach is applied to produce district-level model-based estimates of crop yield for paddy in the state of Uttar Pradesh in India, using data on crop-cutting experiments supervised under the Improvement of Crop Statistics scheme and secondary data from the Population Census. Diagnostic measures are illustrated to examine the model assumptions as well as the reliability and validity of the generated model-based small area estimates. The results show a considerable gain in precision for the model-based estimates produced by applying SAE. Furthermore, the model-based estimates obtained by exploiting spatial information are more efficient than those obtained by ignoring it, and both are more efficient than the direct survey estimate. Many districts have no survey data at all, so direct survey estimates cannot be produced for them; the model-based estimates generated using SAE remain reliable even for such districts. These estimates will provide invaluable information to policy analysts and decision-makers.

3.
Heavily right-censored time-to-event, or survival, data arise frequently in research areas such as medicine and industrial reliability. Recently, there have been suggestions that auxiliary outcomes which are more fully observed may be used to “enhance”, or increase the efficiency of, inferences for a primary survival time variable. However, efficiency gains from this approach have mostly been very small. Most of the situations considered have involved semiparametric models, so in this note we consider two very simple fully parametric models. In the first case, involving a correlated auxiliary variable that is always observed, we find that efficiency gains are small unless the response and auxiliary variable are very highly correlated and the response is heavily censored. In the second case, which involves an intermediate stage in a three-stage model of failure, the efficiency gains can be more substantial. We suggest that careful study of specific situations is needed to identify opportunities for “enhanced” inferences, but that substantial gains seem more likely when the auxiliary information involves structural information about the failure process.

4.
Inspired by reliability issues in electric transmission networks, we use a probabilistic approach to study the occurrence of large failures in a stylized cascading line failure model. Such models capture the phenomenon where an initial line failure potentially triggers massive knock-on effects. Under certain critical conditions, the probability that the number of line failures exceeds a large threshold obeys a power-law distribution, a distinctive property observed in empirical blackout data. In this paper, we examine the robustness of the power-law behavior by exploring under which conditions this behavior prevails.

5.

There is growing interest in fully MR-based radiotherapy. The most important development needed is improved bone tissue estimation, since existing model-based methods perform poorly on bone tissues. This paper aimed to obtain improved bone tissue estimation. A skew-Gaussian mixture model and a Gaussian mixture model were proposed to investigate CT image estimation from MR images by partitioning the data into two major tissue types. The performance of the proposed models was evaluated using the leave-one-out cross-validation method on real data. In comparison with existing model-based approaches, the model-based partitioning approach performed better in bone tissue estimation, especially for dense bone tissue.
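As an illustration of the partitioning idea, here is a minimal sketch of how a fitted two-component mixture yields posterior tissue probabilities for a single MR intensity. The mixture parameters below are hypothetical, not taken from the paper, and the paper's skew-Gaussian component is replaced by a plain Gaussian for brevity:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def tissue_posterior(mr_intensity, params):
    """Posterior probability of each tissue class given an MR intensity,
    under a two-component Gaussian mixture (soft tissue vs. bone).
    `params` maps class name -> (weight, mean, std); values are illustrative."""
    joint = {k: w * gaussian_pdf(mr_intensity, mu, sd)
             for k, (w, mu, sd) in params.items()}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

# Hypothetical mixture parameters (not estimates from the paper):
params = {"soft": (0.8, 300.0, 80.0), "bone": (0.2, 90.0, 40.0)}
post = tissue_posterior(120.0, params)
```

A low MR intensity such as 120 falls closer to the assumed bone component, so its posterior bone probability dominates.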

6.
A novel family of mixture models is introduced based on modified t-factor analyzers. Modified factor analyzers were recently introduced within the Gaussian context, and our work presents a more flexible and robust alternative. We introduce a family of mixtures of modified t-factor analyzers that uses this generalized version of the factor analysis covariance structure. We apply this family within three paradigms: model-based clustering; model-based classification; and model-based discriminant analysis. In addition, we apply the recently published Gaussian analogue to this family under the model-based classification and discriminant analysis paradigms for the first time. Parameter estimation is carried out within the alternating expectation-conditional maximization framework, and the Bayesian information criterion is used for model selection. Two real data sets are used to compare our approach to other popular model-based approaches; in these comparisons, the chosen mixtures of modified t-factor analyzers model performs favourably. We conclude with a summary and suggestions for future work.

7.
Semiparametric methods provide estimates of finite parameter vectors without requiring that the complete data generation process be assumed to lie in a finite-dimensional family. By avoiding bias from incorrect specification, such estimators gain robustness, although usually at the cost of decreased precision. The most familiar semiparametric method in econometrics is ordinary least squares, which estimates the parameters of a linear regression model without requiring that the distribution of the disturbances be in a finite-parameter family. The recent literature in econometric theory has extended semiparametric methods to a variety of non-linear models, including models appropriate for the analysis of censored duration data. Horowitz and Newman make perhaps the first empirical application of these methods, to data on employment duration. Their analysis provides insights into the practical problems of implementing these methods, and limited information on performance. Their data set, containing 226 male controls from the Denver income maintenance experiment in 1971-74, does not show any significant covariates (except race), even when a fully parametric model is assumed. Consequently, the authors are unable to reject the fully parametric model in a test against the alternative semiparametric estimators. This provides some negative, but tenuous, evidence that in practical applications the reduction in bias from using semiparametric estimators is insufficient to offset the loss in precision. Larger samples, and data sets with strongly significant covariates, will be needed before firmer conclusions can be drawn.

8.
Stochastic Models, 2013, 29(4): 391-405
The stochastic fluid networks we consider here are open multiclass fluid networks with Markov-modulated inflow rate. They are typically used as models for manufacturing and telecommunication systems. The main aim of this paper is to investigate the question of positive Harris recurrence of the joint process of buffer content and inflow rate. It will turn out that a Markovian server allocation exists under which the process is positive Harris recurrent if the usual traffic conditions are satisfied on average. Moreover, for special networks like single-server networks, re-entrant lines and Jackson networks it is possible to show that certain service disciplines induce positive Harris recurrent state processes. As a by-product we show that these stochastic fluid networks converge under appropriately defined scaling to solutions of deterministic fluid models.

9.
We introduce a fully model-based approach to studying functional relationships between a multivariate circular dependent variable and several circular covariates, enabling inference regarding all model parameters and related prediction. Two multiple circular regression models are presented for this approach. First, for a univariate circular dependent variable, we propose the least circular mean-square error (LCMSE) estimation method, and asymptotic properties of the LCMSE estimators and inferential methods are developed and illustrated. Second, using a simulation study, we provide some practical suggestions for model selection between the two models. An illustrative example is given using a real data set from a protein structure prediction problem. Finally, a straightforward extension to the case of a multivariate circular dependent variable is provided.

10.
Missing data are often problematic in social network analysis, since what is missing may potentially alter the conclusions about what we have observed: tie-variables need to be interpreted in relation to their local neighbourhood and the global structure. Some ad hoc methods for dealing with missing data in social networks have been proposed, but here we consider a model-based approach. We discuss various aspects of fitting exponential family random graph (or p-star) models (ERGMs) to networks with missing data and present a Bayesian data augmentation algorithm for the purpose of estimation. This involves drawing from the full conditional posterior distribution of the parameters, something which is made possible by recently developed algorithms. With ERGMs already having complicated interdependencies, it is particularly important to provide inference that adequately describes the uncertainty, something that the Bayesian approach provides. To the extent that we wish to explore the missing parts of the network, the posterior predictive distributions, immediately available at the termination of the algorithm, are at our disposal, allowing us to explore the distribution of what is missing unconditional on any particular parameter values. Some important features of treating missing data, and of the implementation of the algorithm, are illustrated using a well-known collaboration network and a variety of missing data scenarios.

11.
Clustering gene expression time course data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Statistically, the problem of clustering time course data is a special case of the more general problem of clustering longitudinal data. In this paper, a very general and flexible model-based technique is used to cluster longitudinal data. Mixtures of multivariate t-distributions are utilized, with a linear model for the mean and a modified Cholesky-decomposed covariance structure. Constraints are placed upon the covariance structure, leading to a novel family of mixture models, including parsimonious models. In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters, including the component degrees of freedom, are estimated using an expectation-maximization algorithm and two different approaches to model selection are considered. The models are applied to simulated data to illustrate their efficacy; this includes a comparison with their Gaussian analogues—the use of these Gaussian analogues with a linear model for the mean is novel in itself. Our family of multivariate t mixture models is then applied to two real gene expression time course data sets and the results are discussed. We conclude with a summary, suggestions for future work, and a discussion about constraining the degrees of freedom parameter.

12.
In this paper we outline a class of fully parametric proportional hazards models in which the baseline hazard is assumed to be a power transform of the time scale, corresponding to assuming that survival times follow a Weibull distribution. Such a class of models allows for the possibility of time-varying hazard rates, but assumes a constant hazard ratio. We outline how Bayesian inference proceeds for such a class of models using asymptotic approximations which require only the ability to maximize the joint log posterior density. We apply these models to a clinical trial to assess the efficacy of neutron therapy compared to conventional treatment for patients with tumors of the pelvic region. In this trial there was prior information about the log hazard ratio, both in terms of elicited clinical beliefs and the results of previous studies. Finally, we consider a number of extensions to this class of models, in particular the use of alternative baseline functions, and the extension to multi-state data.
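The defining property of this class — a Weibull (power-transform) baseline hazard combined with a hazard ratio that is constant over time — can be sketched as follows; the shape, scale, and log hazard-ratio values are illustrative, not estimates from the trial:

```python
import math

def weibull_ph_hazard(t, x, lam, alpha, beta):
    """Hazard at time t for covariate x under a Weibull baseline:
    h(t | x) = lam * alpha * t**(alpha - 1) * exp(beta * x).
    The baseline hazard is a power transform of the time scale, so the
    hazard can rise or fall with t, but the hazard *ratio* between two
    covariate values is exp(beta * dx), free of t."""
    return lam * alpha * t ** (alpha - 1) * math.exp(beta * x)

# Hazard ratio, treatment (x=1) vs control (x=0), at an early and a late
# time point; lam, alpha, beta are purely illustrative:
lam, alpha, beta = 0.1, 1.5, -0.4
hr_early = weibull_ph_hazard(0.5, 1, lam, alpha, beta) / weibull_ph_hazard(0.5, 0, lam, alpha, beta)
hr_late = weibull_ph_hazard(5.0, 1, lam, alpha, beta) / weibull_ph_hazard(5.0, 0, lam, alpha, beta)
```

With alpha > 1 the hazard itself increases over time, yet `hr_early` and `hr_late` coincide at exp(beta), which is the proportional hazards assumption the abstract describes.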

13.
The quantity of water contained in soil is referred to as the soil moisture. Soil moisture plays an important role in agriculture, percolation, and soil chemistry. Precipitation, temperature, atmospheric demand and topography are the primary processes that control soil moisture. Estimates of landscape variation in soil moisture are limited due to the complexity required to link high spatial variation in measurements with the aforesaid processes, which vary in space and time. In this paper, we develop an inferential framework that takes the form of data fusion, using high temporal resolution environmental data from wireless networks along with sparse reflectometer data as inputs, and yields inference on moisture variation as precipitation and temperature vary over time and drainage and canopy coverage vary in space. We specifically address soil moisture modeling in the context of wireless sensor networks.

14.
Model-based clustering using copulas with applications
The majority of model-based clustering techniques are based on multivariate normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: (i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and (ii) the explicit choice of marginal distributions for the clusters allows the modelling of multivariate data of various modes (either discrete or continuous) in a natural way. This paper introduces and studies the framework of copula-based finite mixture models for clustering applications. Estimation in the general case can be performed using standard EM, and, depending on the mode of the data, more efficient procedures are provided that can fully exploit the copula structure. The closure properties of the mixture models under marginalization are discussed, and for continuous, real-valued data parametric rotations in the sample space are introduced, with a parallel discussion on parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology is accompanied and motivated by the analysis of real and artificial data.

15.
Evaluation of the impact of nosocomial infection on duration of hospital stay usually relies on estimates obtained in prospective cohort studies. However, the statistical methods used to estimate the extra length of stay are often inadequate: a naive comparison of duration of stay in infected and non-infected patients cannot properly estimate the extra hospitalisation time due to nosocomial infections. Matching on duration of stay prior to infection can compensate in part for the bias of such ad hoc methods. New model-based approaches have been developed to estimate the excess length of stay. We demonstrate that statistical models based on multivariate counting processes provide an appropriate framework to analyse the occurrence and impact of nosocomial infections. We propose and investigate new approaches to estimate the extra time spent in hospital attributable to nosocomial infections, based on functionals of the transition probabilities in multistate models. Additionally, within the class of structural nested failure time models an alternative approach to estimating the extra stay due to nosocomial infections is derived. The methods are illustrated using data from a cohort study on 756 patients admitted to intensive care units at the University Hospital in Freiburg.

16.
Few publications consider the estimation of relative risk for vector-borne infectious diseases. Most of these articles involve exploratory analysis that includes the study of covariates and their effects on disease distribution and the study of geographic information systems to integrate patient-related information. The aim of this paper is to introduce an alternative method of relative risk estimation based on discrete time–space stochastic SIR-SI models (susceptible–infective–recovered for human populations; susceptible–infective for vector populations) for the transmission of vector-borne infectious diseases, particularly dengue disease. First, we describe deterministic compartmental SIR-SI models that are suitable for dengue disease transmission. We then adapt these to develop corresponding discrete time–space stochastic SIR-SI models. Finally, we develop an alternative method of estimating the relative risk for dengue disease mapping based on these models and apply them to analyse dengue data from Malaysia. This new approach offers a better model for estimating the relative risk for dengue disease mapping compared with the other common approaches, because it takes into account the transmission process of the disease while allowing for covariates and spatial correlation between risks in adjacent regions.
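A minimal deterministic skeleton of one discrete-time SIR-SI step may help fix ideas; the paper's stochastic, spatial version builds on this compartmental structure, and all rates below are illustrative:

```python
def sir_si_step(state, beta_hv, beta_vh, gamma, mu):
    """One discrete-time step of a deterministic SIR-SI compartmental model.
    state = (S, I, R, Sv, Iv): fractions of the human population that are
    susceptible / infective / recovered, and of the vector population that
    are susceptible / infective. beta_hv: vector-to-human transmission rate,
    beta_vh: human-to-vector rate, gamma: human recovery rate, mu: vector
    birth/death rate (births enter the susceptible vector class)."""
    S, I, R, Sv, Iv = state
    new_hi = beta_hv * S * Iv   # humans newly infected by vectors
    new_vi = beta_vh * Sv * I   # vectors newly infected by humans
    return (S - new_hi,
            I + new_hi - gamma * I,
            R + gamma * I,
            Sv + mu * (1.0 - Sv) - new_vi,
            Iv + new_vi - mu * Iv)

# Iterate a few steps from a small initial outbreak (illustrative values):
state = (0.99, 0.01, 0.0, 0.95, 0.05)
for _ in range(10):
    state = sir_si_step(state, beta_hv=0.3, beta_vh=0.3, gamma=0.1, mu=0.05)
```

Both population totals are conserved by construction, which is a quick sanity check on any compartmental update of this kind.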

17.
We examine the asymptotic and small sample properties of model-based and robust tests of the null hypothesis of no randomized treatment effect based on the partial likelihood arising from an arbitrarily misspecified Cox proportional hazards model. When the distribution of the censoring variable is either conditionally independent of the treatment group given covariates or conditionally independent of covariates given the treatment group, the numerators of the partial likelihood treatment score and Wald tests have asymptotic mean equal to 0 under the null hypothesis, regardless of whether or how the Cox model is misspecified. We show that the model-based variance estimators used in the calculation of the model-based tests are not, in general, consistent under model misspecification, yet using analytic considerations and simulations we show that their true sizes can be as close to the nominal value as tests calculated with robust variance estimators. As a special case, we show that the model-based log-rank test is asymptotically valid. When the Cox model is misspecified and the distribution of censoring depends on both treatment group and covariates, the asymptotic distributions of the resulting partial likelihood treatment score statistic and maximum partial likelihood estimator do not, in general, have a zero mean under the null hypothesis. Here neither the fully model-based tests, including the log-rank test, nor the robust tests will be asymptotically valid, and we show through simulations that the distortion to test size can be substantial.

18.
Model-based classification using latent Gaussian mixture models
A novel model-based classification technique is introduced based on parsimonious Gaussian mixture models (PGMMs). PGMMs, which were introduced recently as a model-based clustering technique, arise from a generalization of the mixtures of factor analyzers model and are based on a latent Gaussian mixture model. In this paper, this mixture modelling structure is used for model-based classification, and the particular area of application is food authenticity. Model-based classification is performed by jointly modelling data with known and unknown group memberships within a likelihood framework and then estimating parameters, including the unknown group memberships, within an alternating expectation-conditional maximization framework. Model selection is carried out using the Bayesian information criterion and the quality of the maximum a posteriori classifications is summarized using the misclassification rate and the adjusted Rand index. This new model-based classification technique gives excellent classification performance when applied to real food authenticity data on the chemical properties of olive oils from nine areas of Italy.

19.
Joint modeling of degradation and failure time data
This paper surveys some approaches to modelling the relationship between failure time data and covariate data such as internal degradation and external environmental processes. These models, which reflect the dependency between system state and system reliability, include threshold models and hazard-based models. In particular, we consider the class of degradation–threshold–shock models (DTS models) in which failure is due to the competing causes of degradation and trauma. For this class of reliability models we express the failure time in terms of degradation and covariates. We compute the survival function of the resulting failure time and derive the likelihood function for the joint observation of failure times and degradation data at discrete times. We consider a special class of DTS models where degradation is modeled by a process with stationary independent increments and related to external covariates through a random time scale, and extend this model class to repairable items by a marked point process approach. The proposed model class provides a rich conceptual framework for the study of degradation–failure issues.
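The competing-causes structure of a DTS model — failure at the first threshold crossing of the degradation path, or at the trauma (shock) time, whichever comes first — can be sketched with a toy deterministic path; the linear path and numeric values below are illustrative stand-ins for the stochastic degradation process of the paper:

```python
def dts_failure_time(path, threshold, trauma_time=None):
    """Failure time in a degradation-threshold-shock (DTS) setting:
    the first time the (monotone increasing) degradation `path` crosses
    `threshold`, or the trauma time, whichever is earlier.
    `path` is a callable t -> degradation level, with path(0) < threshold."""
    # Bracket the crossing, then bisect; valid because path is monotone.
    lo, hi = 0.0, 1.0
    while path(hi) < threshold:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if path(mid) < threshold:
            lo = mid
        else:
            hi = mid
    crossing = 0.5 * (lo + hi)
    return crossing if trauma_time is None else min(crossing, trauma_time)

# Toy linear degradation D(t) = 0.25 * t against threshold 5:
t_fail = dts_failure_time(lambda t: 0.25 * t, 5.0)
```

Passing a `trauma_time` smaller than the crossing time makes trauma the failure cause instead, which is exactly the competing-risks mechanism the abstract describes.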

20.
Most approaches to applying knowledge-based techniques for data analysis concentrate on context-independent statistical support. EXPLORA, however, is developed for subject-specific interpretation with regard to the contents of the data to be analyzed (i.e., content interpretation). Its knowledge base therefore also includes the objects and semantic relations of the real system that produces the data. In this paper we describe the functional model representing the process of content interpretation, summarize the software architecture of the system, and give some examples of its application by pilot users in survey analysis. EXPLORA addresses applications with regularly produced data that have to be analyzed in a routine way. The system systematically searches for statistical results (facts) to detect relations which could be overlooked by a human analyst. On the other hand, EXPLORA helps overcome the large bulk of information that is usually still produced when presenting the data. A second knowledge process of content interpretation therefore consists in discovering messages about the data by condensing the facts. Approaches to inductive generalization developed for machine learning are utilized to identify common values of attributes of the objects to which the facts relate. At a later stage the system searches for interesting facts by applying redundancy rules and domain-dependent selection rules. EXPLORA formulates the messages in terms of the domain, groups and orders them, and even provides flexible navigation in the fact spaces.

