Similar Documents
20 similar documents found.
1.
A spatial hidden Markov model (SHMM) is introduced to analyse the distribution of a species on an atlas, taking into account that false observations and false non-detections of the species can occur during the survey, blurring the true map of presence and absence of the species. The reconstruction of the true map is tackled as the restoration of a degraded pixel image: the true map is an autologistic model hidden behind the observed map, and its normalizing constant is computed efficiently by simulating an auxiliary map. The distribution of the species is modelled under the Bayesian paradigm, and Markov chain Monte Carlo (MCMC) algorithms are developed. We are interested in the spatial distribution of the bird species Greywing Francolin in southern Africa. Many climatic and land-use explanatory variables are also available: they are included in the SHMM, and a subset of them is selected by the mutation operators within the MCMC algorithm.
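To make the auxiliary-map machinery concrete, below is a minimal sketch (not the authors' code) of two ingredients: simulating a binary autologistic field by Gibbs sampling, and degrading a true map with false detections and false non-detections. The grid size, the parameters alpha and beta, the error rates, and the number of sweeps are all illustrative assumptions.

```python
import numpy as np

def simulate_autologistic(shape=(40, 40), alpha=-0.5, beta=0.8,
                          sweeps=100, rng=None):
    """Gibbs sampler for a binary autologistic field on a lattice.

    Full conditional: P(x_ij = 1 | neighbours) =
        logistic(alpha + beta * sum of 4-neighbour values).
    """
    rng = np.random.default_rng(rng)
    rows, cols = shape
    x = rng.integers(0, 2, size=shape)
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                s = (x[i - 1, j] if i > 0 else 0) + \
                    (x[i + 1, j] if i < rows - 1 else 0) + \
                    (x[i, j - 1] if j > 0 else 0) + \
                    (x[i, j + 1] if j < cols - 1 else 0)
                p = 1.0 / (1.0 + np.exp(-(alpha + beta * s)))
                x[i, j] = rng.random() < p
    return x

# Degrade the "true" map with false detections and false non-detections,
# mimicking the observation layer of the SHMM (error rates are assumed).
true_map = simulate_autologistic(rng=1)
rng = np.random.default_rng(2)
p_false_pos, p_false_neg = 0.05, 0.20
observed = np.where(true_map == 1,
                    rng.random(true_map.shape) > p_false_neg,
                    rng.random(true_map.shape) < p_false_pos).astype(int)
```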

2.
The Finnish common toad data of Heikkinen and Hogmander are reanalysed using an alternative fully Bayesian model that does not require a pseudolikelihood approximation and an alternative prior distribution for the true presence or absence status of toads in each 10 km×10 km square. Markov chain Monte Carlo methods are used to obtain posterior probability estimates of the square-specific presences of the common toad and these are presented as a map. The results are different from those of Heikkinen and Hogmander and we offer an explanation in terms of the prior used for square-specific presence of the toads. We suggest that our approach is more faithful to the data and avoids unnecessary confounding of effects. We demonstrate how to extend our model efficiently with square-specific covariates and illustrate this by introducing deterministic spatial changes.

3.
The empirical distribution function (EDF) is a commonly used estimator of the population cumulative distribution function, and the survival function is estimated as the complement of the EDF. However, the clinical diagnosis of an event is often subject to misclassification, so the outcome is recorded with some uncertainty. In the presence of such errors, the true distribution of the time to first event is unknown. We develop a method to estimate the true survival distribution by incorporating the negative and positive predictive values of the prediction process into a product-limit style construction. This allows us to quantify the bias of the EDF estimates due to the presence of misclassified events in the observed data. We present an unbiased estimator of the true survival rates and its variance. Asymptotic properties of the proposed estimators are provided, and these properties are examined through simulations. We evaluate our methods using data from the VIRAHEP-C study.
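At a single assessment time the correction can be written down directly. A sketch of the identity in assumed notation, with F_obs the naive EDF of observed event statuses:

```latex
\[
\widehat{F}_{\text{true}}(t)
  = \underbrace{\mathrm{PPV}\cdot \widehat{F}_{\text{obs}}(t)}_{\text{observed events that are real}}
  \;+\; \underbrace{(1-\mathrm{NPV})\bigl(1-\widehat{F}_{\text{obs}}(t)\bigr)}_{\text{missed events}},
\]
```

so the bias of the naive EDF at t is the difference between the observed and corrected quantities.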

4.
The product-limit or Kaplan-Meier (KM) estimator is commonly used to estimate the survival function in the presence of incomplete time-to-event data. Application of this method inherently assumes that the occurrence of an event is known with certainty. However, the clinical diagnosis of an event is often subject to misclassification due to assay error or adjudication error, by which the event is assessed with some uncertainty. In the presence of such errors, the true distribution of the time to first event is not estimated accurately by the KM method. We develop a method to estimate the true survival distribution by incorporating negative and positive predictive values into a KM-like method of estimation. This allows us to quantify the bias in the KM survival estimates due to the presence of misclassified events in the observed data. We present an unbiased estimator of the true survival function and its variance. Asymptotic properties of the proposed estimators are provided, and these properties are examined through simulations. We demonstrate our methods using data from the Viral Resistance to Antiviral Therapy of Hepatitis C study.
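As a rough illustration of the same idea without censoring, the sketch below simulates misclassified event statuses at a fixed time and applies the PPV/NPV correction to the naive event rate. The sensitivity, specificity, and event probability are assumed values; the full KM-like construction in the paper is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 100_000, 0.30          # true event probability by the fixed time
sens, spec = 0.85, 0.95           # assumed assay sensitivity / specificity

true = rng.random(n) < theta
obs = np.where(true, rng.random(n) < sens, rng.random(n) < (1 - spec))

# PPV / NPV computed from the simulated joint distribution; in practice
# they would come from an adjudication or validation substudy.
ppv = true[obs].mean()
npv = (~true[~obs]).mean()

p_obs = obs.mean()
p_corrected = ppv * p_obs + (1 - npv) * (1 - p_obs)
print(f"naive {p_obs:.3f}  corrected {p_corrected:.3f}  true {theta}")
```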

5.
The estimation of abundance from presence–absence data is an intriguing problem in applied statistics. The classical Poisson model makes strong independence and homogeneity assumptions and in practice generally underestimates the true abundance. A controversial ad hoc method based on negative‐binomial counts (Am. Nat.) has been empirically successful but lacks theoretical justification. We first present an alternative estimator of abundance based on a paired negative binomial model that is consistent and asymptotically normally distributed. A quadruple negative binomial extension is also developed, which yields the previous ad hoc approach and resolves the controversy in the literature. We examine the performance of the estimators in a simulation study and estimate the abundance of 44 tree species in a permanent forest plot.
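A minimal numerical sketch of the two occupancy inversions (assuming, for the negative binomial case, that the clumping parameter k is known): the Poisson estimator inverts psi = 1 - exp(-lambda), while the negative binomial estimator inverts psi = 1 - (1 + lambda/k)^(-k). All parameter values below are illustrative, and under clumping the Poisson inversion underestimates abundance, as the abstract notes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, lam, n_quadrats = 0.5, 2.0, 2000   # illustrative clumping and mean density

# Clustered counts: negative binomial with mean lam and dispersion k.
counts = rng.negative_binomial(k, k / (k + lam), size=n_quadrats)
psi_hat = (counts > 0).mean()          # observed occupancy

lam_poisson = -np.log(1 - psi_hat)                   # Poisson inversion
lam_negbin = k * ((1 - psi_hat) ** (-1 / k) - 1)     # NB inversion (k known)
print(f"true {lam}  Poisson {lam_poisson:.2f}  NB {lam_negbin:.2f}")
```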

6.
7.
We revisit the problem of estimating the proportion π of true null hypotheses when a large number of parallel hypothesis tests are performed independently. While the proportion is a quantity of interest in its own right in applications, the problem has arisen in assessing or controlling an overall false discovery rate. On the basis of a Bayes interpretation of the problem, the marginal distribution of the p-value is modeled as a mixture of the uniform distribution (null) and a non-uniform distribution (alternative), so that the parameter π of interest is characterized as the mixing proportion of the uniform component in the mixture. In this article, a nonparametric exponential mixture model is proposed to fit the p-values. As an alternative to the convex decreasing mixture model, the exponential mixture model has the advantages of identifiability, flexibility, and regularity. A computation algorithm is developed. The new approach is applied to a leukemia gene expression data set where multiple significance tests over 3,051 genes are performed. The new estimate of π for the leukemia gene expression data appears to be about 10% lower than three other estimates that are known to be conservative. Simulation results also show that the new estimate is usually lower and has smaller bias than the other three estimates.
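A compact EM sketch for this kind of mixture, using a single truncated-exponential alternative component f_lam(p) = lam*exp(-lam*p)/(1-exp(-lam)) on (0,1). The starting values and the simulated data are assumptions, and the paper's nonparametric exponential mixture is more general than this one-component version; the sketch also assumes the alternative p-values are concentrated near zero.

```python
import numpy as np
from scipy.optimize import brentq

def fit_pi0(p, n_iter=200):
    """EM for g(p) = pi0 * 1 + (1 - pi0) * f_lam(p)."""
    p = np.asarray(p, dtype=float)
    pi0, lam = 0.8, 5.0                       # starting values (assumed)
    for _ in range(n_iter):
        f_alt = lam * np.exp(-lam * p) / (1 - np.exp(-lam))
        w = pi0 / (pi0 + (1 - pi0) * f_alt)   # E-step: P(null | p_i)
        pi0 = w.mean()                        # M-step for pi0
        pbar = np.average(p, weights=1 - w)   # weighted mean under alternative
        # M-step for lam: solve 1/lam - exp(-lam)/(1 - exp(-lam)) = pbar
        g = lambda l: 1 / l - np.exp(-l) / (1 - np.exp(-l)) - pbar
        lam = brentq(g, 1e-6, 500.0)
    return pi0, lam

# Illustrative data: 90% uniform nulls, 10% small alternative p-values.
rng = np.random.default_rng(0)
p = np.concatenate([rng.random(2700), rng.beta(0.5, 8.0, 300)])
print(fit_pi0(p))
```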

8.
One of the fundamental issues in analyzing microarray data is to determine which genes are expressed and which ones are not for a given group of subjects. In datasets where many genes are expressed and many are not expressed (i.e., underexpressed), a bimodal distribution for the gene expression levels often results, where one mode of the distribution represents the expressed genes and the other mode represents the underexpressed genes. To model this bimodality, we propose a new class of mixture models that utilize a random threshold value for accommodating bimodality in the gene expression distribution. Theoretical properties of the proposed model are carefully examined. We use this new model to examine the problem of differential gene expression between two groups of subjects, develop prior distributions, and derive a new criterion for determining which genes are differentially expressed between the two groups. Prior elicitation is carried out using empirical Bayes methodology in order to estimate the threshold value as well as elicit the hyperparameters for the two component mixture model. The new gene selection criterion is demonstrated via several simulations to have excellent false positive rate and false negative rate properties. A gastric cancer dataset is used to motivate and illustrate the proposed methodology.
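The paper's model places a random threshold between the two components; as a rough two-component Gaussian-mixture analogue of the classification step (all data and settings illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative log-expression values: underexpressed and expressed modes.
x = np.concatenate([rng.normal(4.0, 0.7, 6000),
                    rng.normal(8.0, 1.0, 4000)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(x)
hi = np.argmax(gm.means_.ravel())          # component with the larger mean
prob_expressed = gm.predict_proba(x)[:, hi]
expressed = prob_expressed > 0.5           # simple posterior classification
print(f"estimated expressed fraction: {expressed.mean():.2f}")
```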

9.
In survey research, it is assumed that the response reported by an individual is correct. However, owing to issues such as prestige bias and self-respect, respondents' reported data often produce estimates that deviate substantially from the true values. This introduces measurement error (ME) into the sample estimates. In this article, the estimation of the population mean in the presence of measurement error using information on a single auxiliary variable is studied. A generalized estimator of the population mean is proposed. The class of estimators is obtained by using some conventional and non-conventional measures. A simulation and numerical study is also conducted to assess the performance of the estimators in the presence and absence of measurement error.
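A small simulation in the spirit of the numerical study: with mean-zero reporting error on the study variable, the classical ratio estimator stays approximately unbiased but its variance is inflated, which is the degradation the proposed class of estimators targets. All parameters below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rep, n, X_mean = 5000, 200, 10.0
beta, sigma_u = 1.5, 2.0                 # assumed slope and reporting-error sd

est_clean, est_noisy = [], []
for _ in range(n_rep):
    x = rng.normal(X_mean, 2.0, n)       # auxiliary variable (error-free)
    y = beta * x + rng.normal(0, 1.0, n) # true study variable
    w = y + rng.normal(0, sigma_u, n)    # study variable as reported
    est_clean.append(y.mean() * X_mean / x.mean())  # ratio estimator
    est_noisy.append(w.mean() * X_mean / x.mean())  # same estimator under ME

true_mean = beta * X_mean
for name, e in (("no ME  ", est_clean), ("with ME", est_noisy)):
    e = np.asarray(e)
    print(f"{name}: bias {e.mean() - true_mean:+.3f}, variance {e.var():.4f}")
```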

10.
This paper introduces a new biostatistical approach to the problems of estimating true values and approximating the distribution of true values from unreliable data. We present the basic rationale for the unmixing method and report on a simulation study of its properties in estimating the centiles of a skewed, outlier-prone class of distributions. We also present an application to highly skewed USDA vitamin A intake data, and a pseudo-code version of the unmixing algorithm that we hope will allow other researchers to experiment with it.

11.
The Riesz distributions on a symmetric cone are used to introduce a class of beta-Riesz distributions. Some fundamental properties of these distributions are established. In particular, we study the effect of a projection on a beta-Riesz distribution and we give some properties of independence. We also calculate the expectation of a beta-Riesz random variable. As a corollary, we give the regression on the mean of a Riesz random variable; that is, we determine the conditional expectation E(U | U + V), where U and V are two independent Riesz random variables.
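For orientation, in the Wishart special case (Riesz distributions with scalar shape parameters a and b and a common scale), the regression on the mean takes the familiar form; this is a sketch of the special case, not the paper's general Riesz result:

```latex
\[
\mathbb{E}\!\left(U \mid U+V = S\right) = \frac{a}{a+b}\, S,
\qquad U \sim W_r(a,\Sigma),\; V \sim W_r(b,\Sigma)\ \text{independent},
\]
```

which follows because B = S^{-1/2} U S^{-1/2} is matrix-beta distributed and independent of S = U + V.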

12.
A common assumption in fitting panel data models is normality of the stochastic subject effects. This can be extremely restrictive, obscuring most potential features of the true distributions. The objective of this article is to propose a modeling strategy, from a semi-parametric Bayesian perspective, that specifies a flexible distribution for the random effects in dynamic panel data models. This is addressed here by assuming a Dirichlet process mixture model, which places a Dirichlet process prior on the random-effects distribution. We address the role of initial conditions in dynamic processes, emphasizing the joint modeling of start-up and subsequent responses. We adopt Gibbs sampling techniques to approximate posterior estimates. These topics are illustrated by a simulation study and by testing hypothetical models in two empirical contexts drawn from economic studies. We use modified versions of information criteria to compare the fitted models.
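A minimal sketch of the key ingredient, a truncated stick-breaking draw from a Dirichlet process prior for the random-effects distribution; the concentration parameter, base measure, and truncation level are illustrative assumptions.

```python
import numpy as np

def dp_stick_breaking(alpha, base_draw, n_atoms=500, rng=None):
    """Truncated stick-breaking draw from DP(alpha, G0)."""
    rng = np.random.default_rng(rng)
    betas = rng.beta(1.0, alpha, n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining
    atoms = base_draw(n_atoms, rng)
    return weights, atoms

# Random-effects distribution: DP prior with a N(0, 1) base measure.
w, theta = dp_stick_breaking(alpha=2.0,
                             base_draw=lambda m, r: r.normal(0, 1, m))
# Sample 10 subject-specific effects from the discrete random measure
# (weights renormalized to absorb the truncation remainder).
rng = np.random.default_rng(1)
effects = rng.choice(theta, size=10, p=w / w.sum())
```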

13.
We investigate the effect of unobserved heterogeneity in the context of the linear transformation model for censored survival data in the clinical trials setting. The unobserved heterogeneity is represented by a frailty term, with unknown distribution, in the linear transformation model. We obtain the bias of the estimate made under the assumption of no unobserved heterogeneity when heterogeneity truly is present. We also derive the asymptotic relative efficiency of the estimate of the treatment effect under the incorrect assumption of no unobserved heterogeneity. Additionally, we investigate the loss of power for clinical trials that are designed assuming the model without frailty when, in fact, the model with frailty is true. Numerical studies under a proportional odds model show that the loss of efficiency and the loss of power can be substantial when the heterogeneity, as embodied by a frailty, is ignored. An erratum to this article is available.
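The paper works with linear transformation models; as a simpler proportional-hazards illustration of the same attenuation phenomenon, the sketch below simulates survival times with a gamma frailty and fits a Cox partial likelihood that ignores it. All parameter values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, beta_true = 2000, 1.0
z = rng.integers(0, 2, n)                      # treatment arm indicator
frailty = rng.gamma(0.5, 2.0, n)               # unobserved heterogeneity (mean 1)
t = rng.exponential(1.0 / (frailty * np.exp(beta_true * z)))  # no censoring

def neg_partial_loglik(beta):
    """Cox partial log-likelihood (continuous times, so no ties)."""
    order = np.argsort(t)
    zs = z[order]
    risk = np.exp(beta * zs)
    rev_cumsum = np.cumsum(risk[::-1])[::-1]   # risk-set sums at each event
    return -(beta * zs - np.log(rev_cumsum)).sum()

res = minimize_scalar(neg_partial_loglik, bounds=(-3, 3), method="bounded")
print(f"true log-HR {beta_true}, estimate ignoring frailty {res.x:.2f}")
```

The ignored frailty attenuates the estimated treatment effect toward zero, mirroring the efficiency and power losses quantified in the paper.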

14.
It is important that the proportion of true null hypotheses be estimated accurately in a multiple hypothesis testing context. Current estimation methods, however, are not suitable for high-dimensional data such as microarray data. First, they do not consider the (strong) dependence between hypotheses (or genes), resulting in inaccurate estimation. Second, the unknown distribution of the false null hypotheses cannot be estimated properly by these methods. Third, the estimation is strongly affected by outliers. In this paper, we derive an optimal procedure for estimating the proportion of true null hypotheses under (strong) dependence based on the Dirichlet process prior. In addition, by using the minimum Hellinger distance, the estimation is robust to model misspecification and to outliers while maintaining efficiency. The results are confirmed by a simulation study, and the newly developed methodology is illustrated on a real microarray data set.
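A minimal sketch of minimum Hellinger distance estimation in a toy location problem: the model density is fitted by minimizing the Hellinger distance to a kernel density estimate, which blunts the influence of outliers. The data, grid, and search range are assumptions, and the paper combines this principle with a Dirichlet process prior rather than a simple parametric model.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 950), rng.normal(8, 1, 50)])  # 5% outliers

grid = np.linspace(-6, 12, 1800)
dx = grid[1] - grid[0]
f_hat = gaussian_kde(x)(grid)          # nonparametric density estimate

def hellinger_sq(mu):
    """Squared Hellinger distance between the KDE and N(mu, 1) on the grid."""
    f_model = norm.pdf(grid, loc=mu, scale=1.0)
    return np.sum((np.sqrt(f_hat) - np.sqrt(f_model)) ** 2) * dx

mu_mhd = minimize_scalar(hellinger_sq, bounds=(-5, 5), method="bounded").x
print(f"sample mean {x.mean():.2f} (pulled by outliers), MHD estimate {mu_mhd:.2f}")
```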

15.
The receiver operating characteristic (ROC) curve is a graphical representation of the relationship between false positive and true positive rates. It is a widely used statistical tool for describing the accuracy of a diagnostic test. In this paper we propose a new nonparametric ROC curve estimator based on the smoothed empirical distribution functions. We prove its strong consistency and perform a simulation study to compare it with some other popular nonparametric estimators of the ROC curve. We also apply the proposed method to a real data set.
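A sketch of the estimator's ingredients under assumed Gaussian samples: each distribution function is estimated by a kernel-smoothed EDF, and the ROC curve is traced out over a threshold grid. The rule-of-thumb bandwidths are an assumption rather than the paper's recommendation.

```python
import numpy as np
from scipy.stats import norm

def smoothed_edf(sample, h):
    """Kernel-smoothed EDF: F_h(t) = mean over i of Phi((t - X_i) / h)."""
    return lambda t: norm.cdf((np.asarray(t)[:, None] - sample) / h).mean(axis=1)

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, 300)    # scores of non-diseased subjects
x1 = rng.normal(1.2, 1.0, 300)    # scores of diseased subjects
h0 = 1.06 * x0.std() * len(x0) ** (-0.2)   # rule-of-thumb bandwidths
h1 = 1.06 * x1.std() * len(x1) ** (-0.2)

thresholds = np.linspace(-5, 7, 2000)
fpr = 1.0 - smoothed_edf(x0, h0)(thresholds)   # false positive rate
tpr = 1.0 - smoothed_edf(x1, h1)(thresholds)   # true positive rate

# Area under the smoothed ROC (fpr decreases along the threshold grid).
auc = -np.sum(np.diff(fpr) * 0.5 * (tpr[1:] + tpr[:-1]))
print(f"smoothed ROC AUC: {auc:.3f}")
```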

16.
For classification problems where the test data are labeled sequentially, the point at which all true positives are first identified is often of critical importance. This article develops hypothesis tests to assess whether all true positives have been labeled in the test data. The tests use a partial receiver operating characteristic (ROC) that is generated from a labeled subset of the test data. These methods are developed in the context of unexploded ordnance (UXO) classification, but are applicable to any binary classification problem. First, the likelihood of the observed ROC given binormal model parameters is derived using order statistics, leading to a nonlinear parameter estimation problem. I then derive the approximate distribution of the point on the ROC at which all true instances are found. Using estimated binormal parameters, this distribution can be integrated up to a desired confidence level to define a critical false alarm rate (FAR). If the selected operating point is before this critical point, then additional labels out to the critical point are required. A second test uses the uncertainty in binormal parameters to determine the critical FAR. These tests are demonstrated with UXO classification examples and both approaches are recommended for testing operating points.
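The paper derives the distribution of the all-true-positives point from order statistics; the sketch below approximates the same quantity by Monte Carlo under an assumed binormal model, reading off the false alarm rate incurred at the weakest true target and taking an upper quantile as the critical FAR. Sample sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos, n_neg, n_sim = 40, 400, 10_000
mu, sigma = 2.0, 1.0                      # assumed binormal parameters

far_last = np.empty(n_sim)
for s in range(n_sim):
    pos = rng.normal(mu, sigma, n_pos)    # true-target scores
    neg = rng.normal(0.0, 1.0, n_neg)     # clutter scores
    # FAR incurred when digging down to the weakest true target:
    far_last[s] = (neg >= pos.min()).mean()

critical_far = np.quantile(far_last, 0.95)
print(f"dig to FAR {critical_far:.3f} to find all targets with 95% confidence")
```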

17.
Self-reported income information particularly suffers from an intentional coarsening of the data, which is called heaping or rounding. If it does not occur completely at random – which is usually the case – heaping and rounding have detrimental effects on the results of statistical analysis. Conventional statistical methods do not consider this kind of reporting bias, and thus might produce invalid inference. We describe a novel statistical modeling approach that allows us to deal with self-reported heaped income data in an adequate and flexible way. We suggest modeling heaping mechanisms and the true underlying model in combination. To describe the true net income distribution, we use the zero-inflated log-normal distribution. Heaping points are identified from the data by applying a heuristic procedure comparing a hypothetical income distribution and the empirical one. To determine heaping behavior, we employ two distinct models: either we assume piecewise constant heaping probabilities, or heaping probabilities are considered to increase steadily with proximity to a heaping point. We validate our approach by some examples. To illustrate the capacity of the proposed method, we conduct a case study using income data from the German National Educational Panel Study.
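A small generator of heaped income data in the spirit of the model: a zero-inflated log-normal truth, plus a heaping mechanism that rounds to coarse grids with fixed probabilities. The grids, probabilities, and log-normal parameters are illustrative, and the paper's proximity-dependent heaping probabilities are simplified here to constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_zero = 5000, 0.08                      # assumed zero-inflation share
income = np.exp(rng.normal(7.4, 0.6, n))    # log-normal net income
income[rng.random(n) < p_zero] = 0.0        # zero inflation

def heap(x, rng):
    """Round to coarse grids with fixed probabilities, coarsest first
    (a simplified stand-in for the paper's heaping mechanisms)."""
    out = x.copy()
    for grid, prob in [(1000.0, 0.4), (100.0, 0.3), (50.0, 0.2)]:
        mask = (rng.random(len(out)) < prob) & (out % grid != 0) & (out > 0)
        out[mask] = np.round(out[mask] / grid) * grid
    return out

reported = heap(income, rng)
# Heaping points reveal themselves as spikes in the reported distribution.
vals, counts = np.unique(np.round(reported), return_counts=True)
print("most frequent reported values:", vals[np.argsort(counts)[-5:]])
```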

18.
We propose a bivariate Weibull regression model with heterogeneity (frailty, or random effect) generated by a Weibull distribution. We assume that the bivariate survival data follow the bivariate Weibull distribution of Hanagal (Econ Qual Control 19:83–90, 2004). There are interesting situations, such as survival times in genetic epidemiology, dental implants of patients, and twin births (both monozygotic and dizygotic), where the genetic behavior of patients (which is unknown and random) follows a known frailty distribution. These situations motivate the study of this particular model. We propose two-stage maximum likelihood estimation for the hierarchical likelihood in the proposed model. We present a small simulation study comparing these estimates with the true values of the parameters, and we observe that the estimates are very close to the true values.
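A minimal simulation of the shared-frailty construction: one frailty draw per pair multiplies both conditional Weibull hazards, which induces within-pair dependence. The baseline parameters and the Weibull frailty shape are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
n, lam, rho = 5000, 0.1, 1.5          # assumed baseline Weibull parameters

z = rng.weibull(2.0, n)               # shared Weibull frailty, one per pair
u1, u2 = rng.random(n), rng.random(n)
# Conditional survival S(t | z) = exp(-z * lam * t**rho)  =>  inverse transform:
t1 = (-np.log(u1) / (z * lam)) ** (1 / rho)
t2 = (-np.log(u2) / (z * lam)) ** (1 / rho)

tau, _ = kendalltau(t1, t2)
print(f"within-pair Kendall's tau induced by the frailty: {tau:.2f}")
```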

19.
Overdispersion is a common phenomenon in count data and is usually treated with the negative binomial model. This paper shows that measurement errors in covariates in general also lead to overdispersion in the observed data if the true data-generating process is indeed the Poisson regression. This kind of overdispersion cannot be treated with the negative binomial model; otherwise, biases will occur. To provide consistent estimates, we propose a new type of corrected score estimator, assuming that the distribution of the latent variables is known. The consistency and asymptotic normality of the proposed estimator are established. Simulation results show that this estimator has good finite sample performance. We also illustrate that the Akaike information criterion and Bayesian information criterion work well for selecting the correct model if the true model is the errors-in-variables Poisson regression.
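A sketch of both claims under an assumed additive normal measurement error with known sigma: the observed counts are overdispersed even though the latent model is Poisson, and a corrected score, using E[exp(b*W) | X] = exp(b*X + b^2*sigma^2/2), recovers the coefficients. Parameter values are illustrative, and this is one concrete corrected-score construction rather than necessarily the paper's exact estimator.

```python
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(0)
n, alpha, beta, sigma = 5000, 0.5, 0.7, 0.5   # assumed true values

x = rng.normal(0, 1, n)                        # latent covariate
y = rng.poisson(np.exp(alpha + beta * x))
w = x + rng.normal(0, sigma, n)                # covariate observed with error

print(f"observed dispersion: var/mean = {y.var() / y.mean():.2f}")  # > 1

def corrected_score(theta):
    a, b = theta
    mu = np.exp(a + b * w - 0.5 * (b * sigma) ** 2)   # corrected mean term
    return [np.sum(y - mu),
            np.sum(y * w - (w - b * sigma**2) * mu)]

a_hat, b_hat = fsolve(corrected_score, x0=[0.0, 0.0])
print(f"corrected-score estimates: alpha {a_hat:.2f}, beta {b_hat:.2f}")
```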

20.
This article proposes a simulation-based methodology for theoretical model comparison, with application to two time series road accident models. The model comparison exercise helps to quantify the main differences and similarities between the two models and comprises three main stages: (1) simulation of time series from a true model with predefined properties; (2) estimation of the alternative model using the simulated data; (3) sensitivity analysis to quantify, through analysis of variance (ANOVA), the effect of changes in the true model parameters on the alternative model's parameter estimates. The proposed methodology is applied to two time series road accident models: the UCM (unobserved components model) and DRAG (Demand for Road Use, Accidents and their Severity). Assuming that the real data-generating process is the UCM, new datasets approximating the road accident data are generated, and DRAG models are estimated using the simulated data. Since these two methodologies are usually assumed to be equivalent, in the sense that both models accurately capture the true effects of the regressors, we specifically address the modeling of the stochastic trend through the alternative model. The stochastic trend is the time-varying component and one of the crucial factors in time series road accident data. Theoretically, it can easily be modeled through the UCM, given its modeling properties. However, properly capturing the effect of a non-stationary component such as a stochastic trend in a stationary explanatory model such as DRAG is challenging. After obtaining the parameter estimates of the alternative model (DRAG), the estimates of the true and alternative models are compared, and the differences are quantified through experimental design and ANOVA techniques. It is observed that the effects of the explanatory variables used in the UCM simulation are only partially captured by the respective DRAG coefficients. A priori, this could be due to multicollinearity, but the results of both simulating the UCM data and estimating the DRAG models reveal no significant static correlation among the regressors. Using ANOVA, it is instead determined that this bias in the regression coefficient estimates is caused by the stochastic trend present in the simulated data. Thus, the results suggest that the stochastic component present in the data should be treated accordingly, following a preliminary, exploratory data analysis.
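A toy version of stages (1)-(3): simulate a series with a local-level stochastic trend plus a stationary regressor, then compare a static regression that ignores the trend with a regression on first differences. The static coefficient stays centered but becomes erratic across replications, echoing the finding that an untreated stochastic trend degrades the coefficient estimates. All settings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta, n_rep = 300, 0.5, 2000
b_static = np.empty(n_rep)
b_diff = np.empty(n_rep)
for r in range(n_rep):
    x = rng.normal(0, 1, T)                      # stationary regressor
    trend = np.cumsum(rng.normal(0, 0.3, T))     # local-level stochastic trend
    y = trend + beta * x + rng.normal(0, 1, T)   # UCM-style "true" series
    # Static regression (no trend component):
    X = np.column_stack([np.ones(T), x])
    b_static[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    # Same regression after first-differencing to strip the trend:
    Xd = np.column_stack([np.ones(T - 1), np.diff(x)])
    b_diff[r] = np.linalg.lstsq(Xd, np.diff(y), rcond=None)[0][1]

for name, b in (("static     ", b_static), ("differenced", b_diff)):
    print(f"{name}: mean {b.mean():.3f}  sd {b.std():.3f}")
```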
