Risk assessors often use different probability plots as a way to assess the fit of a particular distribution or model by comparing the plotted points to a straight line and to obtain estimates of the parameters in parametric distributions or models. When empirical data do not fall in a sufficiently straight line on a probability plot, and when no other single parametric distribution provides an acceptable (graphical) fit to the data, the risk assessor may consider a mixture model with two component distributions. Animated probability plots are a way to visualize the possible behaviors of mixture models with two component distributions. When no single parametric distribution provides an adequate fit to an empirical dataset, animated probability plots can help an analyst pick some plausible mixture models for the data based on their qualitative fit. After using animations during exploratory data analysis, the analyst must then use other statistical tools, including but not limited to: Maximum Likelihood Estimation (MLE) to find the optimal parameters, Goodness of Fit (GoF) tests, and a variety of diagnostic plots to check the adequacy of the fit. Using a specific example with two LogNormal components, we illustrate the use of animated probability plots as a tool for exploring the suitability of a mixture model with two component distributions. Animations work well with other types of probability plots, and they may be extended to analyze mixture models with three or more component distributions.
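The idea of an animated probability plot for a two-component LogNormal mixture can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the parameterization (a weight `w` for the first component, plus the mean and standard deviation of ln x for each component), and the choice of animating over the mixing weight are all assumptions made here. Each frame is a list of (z, ln x) points, where z is the standard normal quantile of the mixture CDF; a single LogNormal gives a straight line, and a mixture bends it.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probability scale


def mixture_cdf(x, w, mu1, sigma1, mu2, sigma2):
    """CDF at x > 0 of a mixture of two LogNormal components.

    w is the weight of component 1; mu and sigma are the mean and
    standard deviation of ln(x) within each component (assumed names).
    """
    ln_x = math.log(x)
    return (w * NormalDist(mu1, sigma1).cdf(ln_x)
            + (1.0 - w) * NormalDist(mu2, sigma2).cdf(ln_x))


def plot_points(xs, w, mu1, sigma1, mu2, sigma2):
    """Return (z, ln x) pairs for a LogNormal probability plot.

    The x values must not be so extreme that the CDF rounds to 0 or 1,
    because the normal quantile is undefined there.
    """
    return [(STD_NORMAL.inv_cdf(mixture_cdf(x, w, mu1, sigma1, mu2, sigma2)),
             math.log(x)) for x in xs]


def animation_frames(xs, weights, mu1, sigma1, mu2, sigma2):
    """One list of plot points per mixing weight: the frames of an animation."""
    return [plot_points(xs, w, mu1, sigma1, mu2, sigma2) for w in weights]
```

Drawing the frames in sequence with any plotting library, with the empirical data overlaid, gives the kind of qualitative comparison the abstract describes; the other parameters could be animated in the same way.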

A popular way to account for unobserved heterogeneity is to assume that the data are drawn from a finite mixture distribution. A barrier to using finite mixture models is that parameters that could previously be estimated in stages must now be estimated jointly: using mixture distributions destroys any additive separability of the log‐likelihood function. We show, however, that an extension of the EM algorithm reintroduces additive separability, thus allowing one to estimate parameters sequentially during each maximization step. In establishing this result, we develop a broad class of estimators for mixture models. Returning to the likelihood problem, we show that, relative to full information maximum likelihood, our sequential estimator can generate large computational savings with little loss of efficiency.
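To make the separability point concrete, here is a minimal sketch of the textbook EM algorithm for a two-component normal mixture. It is not the extended, sequential estimator the abstract proposes; it only shows that the mixture log-likelihood is the log of a sum, while the M-step, given the E-step weights, updates each block of parameters on its own. All names and the synthetic data are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, w, mu1, s1, mu2, s2):
    """Mixture log-likelihood: the log of a sum, so not additively separable."""
    return sum(math.log(w * normal_pdf(x, mu1, s1) + (1.0 - w) * normal_pdf(x, mu2, s2))
               for x in data)


def weighted_normal_mle(data, weights):
    """Weighted MLE of (mu, sigma) for one component, using no other component's parameters."""
    total = sum(weights)
    mu = sum(r * x for r, x in zip(weights, data)) / total
    var = sum(r * (x - mu) ** 2 for r, x in zip(weights, data)) / total
    return mu, math.sqrt(var)


def em_step(data, w, mu1, s1, mu2, s2):
    # E-step: posterior probability that each observation came from component 1.
    resp = []
    for x in data:
        a = w * normal_pdf(x, mu1, s1)
        b = (1.0 - w) * normal_pdf(x, mu2, s2)
        resp.append(a / (a + b))
    # M-step: the weight and each component are updated separately.
    new_w = sum(resp) / len(data)
    new_mu1, new_s1 = weighted_normal_mle(data, resp)
    new_mu2, new_s2 = weighted_normal_mle(data, [1.0 - r for r in resp])
    return new_w, new_mu1, new_s1, new_mu2, new_s2


def fit(data, start, steps=50):
    """Run a fixed number of EM steps and record the log-likelihood after each."""
    params = start
    history = [log_likelihood(data, *params)]
    for _ in range(steps):
        params = em_step(data, *params)
        history.append(log_likelihood(data, *params))
    return params, history


# Synthetic illustration data (an assumption, not taken from any study).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(4.0, 1.0) for _ in range(200)])
params, history = fit(data, start=(0.5, -1.0, 1.0, 5.0, 1.0))
```

The recorded `history` should never decrease, which is the usual sanity check for an EM implementation.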

Variability is the heterogeneity of values within a population. Uncertainty refers to lack of knowledge regarding the true value of a quantity. Mixture distributions have the potential to improve the goodness of fit to data sets not adequately described by a single parametric distribution. Uncertainty due to random sampling error in statistics of interest can be estimated based upon bootstrap simulation. In order to evaluate the robustness of using mixture distributions as a basis for estimating both variability and uncertainty, 108 synthetic data sets generated from selected population mixture log-normal distributions were investigated, and properties of variability and uncertainty estimates were evaluated with respect to variation in sample size, mixing weight, and separation between components of mixtures. Furthermore, mixture distributions were compared with single-component distributions. Findings include: (1) mixing weight influences the stability of variability and uncertainty estimates; (2) bootstrap simulation results tend to be more stable for larger sample sizes; (3) when two components are well separated, the stability of bootstrap simulation is improved; however, a larger degree of uncertainty arises regarding the percentiles coinciding with the separated region; (4) when two components are not well separated, a single distribution may often be a better choice because it has fewer parameters and better numerical stability; and (5) dependencies exist in sampling distributions of parameters of mixtures and are influenced by the amount of separation between the components. An emission factor case study based upon NOx emissions from coal-fired tangential boilers is used to illustrate the application of the approach.

This paper shows how to use realized kernels to carry out efficient feasible inference on the ex post variation of underlying equity prices in the presence of simple models of market frictions. The weights can be chosen to achieve the best possible rate of convergence and to have an asymptotic variance which equals that of the maximum likelihood estimator in the parametric version of this problem. Realized kernels can also be selected to (i) be analyzed using endogenously spaced data such as that in data bases on transactions, (ii) allow for market frictions which are endogenous, and (iii) allow for temporally dependent noise. The finite sample performance of our estimators is studied using simulation, while empirical work illustrates their use in practice.

Two recently developed probabilistic multidimensional models for analyzing pairwise choice data are introduced, discussed in terms of their differential properties, and extended in several ways. The first one, the wandering vector model, was originally suggested by Carroll [12] and extended by De Soete and Carroll [30]. The second model, called the wandering ideal point model, is a more recently proposed [32] unfolding analog of the wandering vector model. A general maximum likelihood estimation method for fitting the various models described is mentioned, as well as a statistical test for assessing the goodness of fit. Finally, an application of the models is provided concerning consumer choice for some 14 brands of over-the-counter analgesics to illustrate how such models can be gainfully utilized for marketing decision making concerning product positioning.

ARCH and GARCH models directly address the dependency of conditional second moments, and have proved particularly valuable in modelling processes where a relatively large degree of fluctuation is present. These include financial time series, which can be particularly heavy tailed. However, little is known about properties of ARCH or GARCH models in the heavy–tailed setting, and no methods are available for approximating the distributions of parameter estimators there. In this paper we show that, for heavy–tailed errors, the asymptotic distributions of quasi–maximum likelihood parameter estimators in ARCH and GARCH models are nonnormal, and are particularly difficult to estimate directly using standard parametric methods. Standard bootstrap methods also fail to produce consistent estimators. To overcome these problems we develop percentile–t, subsample bootstrap approximations to estimator distributions. Studentizing is employed to approximate scale, and the subsample bootstrap is used to estimate shape. The good performance of this approach is demonstrated both theoretically and numerically.

Variability arises due to differences in the value of a quantity among different members of a population. Uncertainty arises due to lack of knowledge regarding the true value of a quantity for a given member of a population. We describe and evaluate two methods for quantifying both variability and uncertainty. These methods, bootstrap simulation and a likelihood-based method, are applied to three datasets. The datasets include a synthetic sample of 19 values from a Lognormal distribution, a sample of nine values obtained from measurements of the PCB concentration in leafy produce, and a sample of five values for the partitioning of chromium in the flue gas desulfurization system of coal-fired power plants. For each of these datasets, we employ the two methods to characterize uncertainty in the arithmetic mean and standard deviation, cumulative distribution functions based upon fitted parametric distributions, the 95th percentile of variability, and the 63rd percentile of uncertainty for the 81st percentile of variability. The latter is intended to show that it is possible to describe any point within the uncertain frequency distribution by specifying an uncertainty percentile and a variability percentile. Using the bootstrap method, we compare results based upon use of the method of matching moments and the method of maximum likelihood for fitting distributions to data. Our results indicate that with only 5–19 data points as in the datasets we have evaluated, there is substantial uncertainty based upon random sampling error. Both the bootstrap and likelihood-based approaches yield comparable uncertainty estimates in most cases.
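A bootstrap estimate of uncertainty in a variability percentile can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: it uses a nonparametric resample of the data, a Lognormal fit by maximum likelihood on each resample, and a simple percentile interval; the function names, the random seeds, and the synthetic 19-value sample are all made up for the example.

```python
import math
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # standard normal quantile for the 95th percentile


def lognormal_mle(sample):
    """Maximum likelihood estimates of the mean and std. dev. of ln(x)."""
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def p95(sample):
    """95th percentile of variability implied by the fitted Lognormal."""
    mu, sigma = lognormal_mle(sample)
    return math.exp(mu + Z95 * sigma)


def bootstrap_interval(sample, statistic, n_boot=2000, level=0.95, seed=1):
    """Percentile interval for a statistic from resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    reps = sorted(statistic([rng.choice(sample) for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[int((1.0 - level) / 2.0 * n_boot)]
    hi = reps[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi


# Synthetic sample of 19 Lognormal values (illustrative only, not the paper's data).
rng = random.Random(42)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(19)]
point = p95(sample)
lo, hi = bootstrap_interval(sample, p95)
```

With so few data points the interval from `lo` to `hi` is typically wide relative to `point`, which is the qualitative finding the abstract reports for samples of 5 to 19 values.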

Applying a hockey stick parametric dose-response model to data on late or retarded development in Iraqi children exposed in utero to methylmercury, with mercury (Hg) exposure characterized by the peak Hg concentration in mothers' hair during pregnancy, Cox et al. calculated the "best statistical estimate" of the threshold for health effects as 10 ppm Hg in hair with a 95% range of uncertainty of between 0 and 13.6 ppm.(1) A new application of the hockey stick model to the Iraqi data shows, however, that the statistical upper limit of the threshold based on the hockey stick model could be as high as 255 ppm. Furthermore, the maximum likelihood estimate of the threshold using a different parametric model is virtually zero. These and other analyses demonstrate that threshold estimates based on parametric models exhibit high statistical variability and model dependency, and are highly sensitive to the precise definition of an abnormal response. Consequently, they are not a reliable basis for setting a reference dose (RfD) for methylmercury. Benchmark analyses and statistical analyses useful for deriving NOAELs are also presented. We believe these latter analyses, particularly the benchmark analyses, generally form a sounder basis for determining RfDs than the type of hockey stick analysis presented by Cox et al. However, the acute nature of the exposures, as well as other limitations in the Iraqi data, suggest that other data may be more appropriate for determining acceptable human exposures to methylmercury.

A novel method was used to incorporate in vivo host–pathogen dynamics into a new robust outbreak model for legionellosis. Dose‐response and time‐dose‐response (TDR) models were generated for Legionella longbeachae exposure to mice via the intratracheal route using a maximum likelihood estimation approach. The best‐fit TDR model was then incorporated into two L. pneumophila outbreak models: an outbreak that occurred at a spa in Japan, and one that occurred in a Melbourne aquarium. The best‐fit TDR from the murine dosing study was the beta‐Poisson with exponential‐reciprocal dependency model, which had a minimized deviance of 32.9. This model was tested against other incubation distributions in the Japan outbreak, and performed consistently well, with reported deviances ranging from 32 to 35. In the case of the Melbourne outbreak, the exponential model with exponential dependency was tested against non‐time‐dependent distributions to explore the performance of the time‐dependent model with the lowest number of parameters. This model reported low minimized deviances around 8 for the Weibull, gamma, and lognormal exposure distribution cases. This work shows that the incorporation of a time factor into outbreak distributions provides models with acceptable fits that can provide insight into the in vivo dynamics of the host‐pathogen system.

We show how correctly to extend known methods for generating error bands in reduced form VAR's to overidentified models. We argue that the conventional pointwise bands common in the literature should be supplemented with measures of shape uncertainty, and we show how to generate such measures. We focus on bands that characterize the shape of the likelihood. Such bands are not classical confidence regions. We explain that classical confidence regions mix information about parameter location with information about model fit, and hence can be misleading as summaries of the implications of the data for the location of parameters. Because classical confidence regions also present conceptual and computational problems in multivariate time series models, we suggest that likelihood-based bands, rather than approximate confidence bands based on asymptotic theory, be standard in reporting results for this type of model.

This paper analyzes the properties of standard estimators, tests, and confidence sets (CS's) for parameters that are unidentified or weakly identified in some parts of the parameter space. The paper also introduces methods to make the tests and CS's robust to such identification problems. The results apply to a class of extremum estimators and corresponding tests and CS's that are based on criterion functions that satisfy certain asymptotic stochastic quadratic expansions and that depend on the parameter that determines the strength of identification. This covers a class of models estimated using maximum likelihood (ML), least squares (LS), quantile, generalized method of moments, generalized empirical likelihood, minimum distance, and semi‐parametric estimators. The consistency/lack‐of‐consistency and asymptotic distributions of the estimators are established under a full range of drifting sequences of true distributions. The asymptotic sizes (in a uniform sense) of standard and identification‐robust tests and CS's are established. The results are applied to the ARMA(1, 1) time series model estimated by ML and to the nonlinear regression model estimated by LS. In companion papers, the results are applied to a number of other models.

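Building on a study of the classical and modified Levy tempered stable distributions, and taking into account the observed characteristics of financial asset return distributions, this paper analyzes the advantages of Levy tempered stable distributions for constructing Levy jump models that simulate financial asset price processes. Because the probability density functions of these distributions have no closed-form expression, applying the conventional MLE method directly for parameter estimation is difficult. Therefore, exploiting the equivalence between the characteristic function and the probability density function, this paper develops a parameter estimation method for Levy tempered stable distributions based on the characteristic function (CF) and GMM with a continuum of moment conditions (CF-CGMM for short). The distributions and the estimation method are then studied empirically using data on the Hang Seng Index, the Shanghai Composite Index, and the S&P 500 Index, and the goodness of fit of the different Levy tempered stable distributions is tested and compared on the basis of the parameter estimates and statistical hypothesis tests. Building on the parameter estimation and statistical testing, and drawing on the meaning of the individual parameters in the Levy tempered stable models and the empirical results, the paper also gives a realistic interpretation of the price dynamics of the Hang Seng Index, the Shanghai Composite Index, and the S&P 500 Index.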

Many environmental data sets, such as for air toxic emission factors, contain several values reported only as below detection limit. Such data sets are referred to as "censored." Typical approaches to dealing with the censored data sets include replacing censored values with arbitrary values of zero, one-half of the detection limit, or the detection limit. Here, an approach to quantification of the variability and uncertainty of censored data sets is demonstrated. Empirical bootstrap simulation is used to simulate censored bootstrap samples from the original data. Maximum likelihood estimation (MLE) is used to fit parametric probability distributions to each bootstrap sample, thereby specifying alternative estimates of the unknown population distribution of the censored data sets. Sampling distributions for uncertainty in statistics such as the mean, median, and percentile are calculated. The robustness of the method was tested by application to different degrees of censoring, sample sizes, coefficients of variation, and numbers of detection limits. Lognormal, gamma, and Weibull distributions were evaluated. The reliability of using this method to estimate the mean is evaluated by averaging the best estimated means of 20 cases for a small sample size of 20. The confidence intervals for distribution percentiles estimated with the bootstrap/MLE method compared favorably to results obtained with the nonparametric Kaplan-Meier method. The bootstrap/MLE method is illustrated via an application to an empirical air toxic emission factor data set.
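The MLE step for a left-censored sample can be sketched as follows, assuming a Lognormal distribution: detected values contribute their log-density, and each nondetect contributes the log-probability of falling below its detection limit. The function names, the coarse grid search (used here only to avoid depending on an optimizer), and the synthetic data are assumptions for illustration, not the study's implementation.

```python
import math
import random
from statistics import NormalDist


def censored_lognormal_loglik(mu, sigma, detects, nondetect_limits):
    """Log-likelihood of a left-censored Lognormal sample.

    detects: values reported above their detection limit.
    nondetect_limits: one detection limit per value reported as below limit.
    """
    dist = NormalDist(mu, sigma)
    ll = 0.0
    for x in detects:
        z = (math.log(x) - mu) / sigma
        # Lognormal log-density of a detected value.
        ll += -0.5 * z * z - math.log(sigma * x * math.sqrt(2.0 * math.pi))
    for dl in nondetect_limits:
        # A nondetect contributes P(X < detection limit).
        ll += math.log(dist.cdf(math.log(dl)))
    return ll


def grid_mle(detects, nondetect_limits, mus, sigmas):
    """Pick the (mu, sigma) pair on a grid that maximizes the log-likelihood."""
    return max(((m, s) for m in mus for s in sigmas),
               key=lambda p: censored_lognormal_loglik(p[0], p[1],
                                                       detects, nondetect_limits))


# Synthetic data with a single detection limit (illustrative only).
rng = random.Random(7)
limit = 0.5
values = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]
detects = [v for v in values if v >= limit]
nondetects = [limit for v in values if v < limit]
mus = [-1.0 + 0.05 * i for i in range(41)]
sigmas = [0.5 + 0.05 * i for i in range(31)]
mu_hat, sigma_hat = grid_mle(detects, nondetects, mus, sigmas)
```

In a bootstrap/MLE scheme of the kind the abstract describes, this fit would be repeated on each resampled (and re-censored) data set to build sampling distributions for the mean, median, or percentiles.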

Regional estimates of cryptosporidiosis risks from drinking water exposure were developed and validated, accounting for AIDS status and age. We constructed a model with probability distributions and point estimates representing Cryptosporidium in tap water, tap water consumed per day (exposure characterization); dose response, illness given infection, prolonged illness given illness; and three conditional probabilities describing the likelihood of case detection by active surveillance (health effects characterization). The model predictions were combined with population data to derive expected case numbers and incidence rates per 100,000 population, by age and AIDS status, borough specific and for New York City overall in 2000 (risk characterization). To evaluate predictive ability, they were compared with same-year surveillance data, which were assumed to represent true incidence of waterborne cryptosporidiosis. The predicted mean risks, similar to previously published estimates for this region, overpredicted observed incidence, most extensively when accounting for AIDS status. The results suggest that overprediction may be due to conservative parameters applied to both non-AIDS and AIDS populations, and that biological differences for children need to be incorporated. Interpretations are limited by the unknown accuracy of available surveillance data, in addition to variability and uncertainty of model predictions. The model appears sensitive to geographical differences in AIDS prevalence. The use of surveillance data for validation and model parameters pertinent to susceptibility are discussed.

The two-stage mathematical model of carcinogenesis has been shown to be nonidentifiable whenever tumor incidence data alone is used to fit the model (Hanin and Yakovlev, 1996). This lack of identifiability implies that more than one parameter vector satisfies the optimization criteria for parameter estimation, e.g., maximum likelihood estimation. A question of greater concern to persons using the two-stage model of carcinogenesis is under what conditions can identifiable parameters be obtained from the observed experimental data. We outline how to obtain identifiable parameters for the two-stage model.

Li R, Englehardt JD, Li X. Risk Analysis, 2012, 32(2): 345-359
Multivariate probability distributions, such as may be used for mixture dose‐response assessment, are typically highly parameterized and difficult to fit to available data. However, such distributions may be useful in analyzing the large electronic data sets becoming available, such as dose‐response biomarker and genetic information. In this article, a new two‐stage computational approach is introduced for estimating multivariate distributions and addressing parameter uncertainty. The proposed first stage comprises a gradient Markov chain Monte Carlo (GMCMC) technique to find Bayesian posterior mode estimates (PMEs) of parameters, equivalent to maximum likelihood estimates (MLEs) in the absence of subjective information. In the second stage, these estimates are used to initialize a Markov chain Monte Carlo (MCMC) simulation, replacing the conventional burn‐in period to allow convergent simulation of the full joint Bayesian posterior distribution and the corresponding unconditional multivariate distribution (not conditional on uncertain parameter values). When the distribution of parameter uncertainty is such a Bayesian posterior, the unconditional distribution is termed predictive. The method is demonstrated by finding conditional and unconditional versions of the recently proposed emergent dose‐response function (DRF). Results are shown for the five‐parameter common‐mode and seven‐parameter dissimilar‐mode models, based on published data for eight benzene–toluene dose pairs. The common mode conditional DRF is obtained with a 21‐fold reduction in data requirement versus MCMC. Example common‐mode unconditional DRFs are then found using synthetic data, showing a 71% reduction in required data. The approach is further demonstrated for a PCB 126‐PCB 153 mixture. Applicability is analyzed and discussed. Matlab® computer programs are provided.

The effect of bioaerosol size was incorporated into predictive dose‐response models for the effects of inhaled aerosols of Francisella tularensis (the causative agent of tularemia) on rhesus monkeys and guinea pigs with bioaerosol diameters ranging between 1.0 and 24 μm. Aerosol‐size‐dependent models were formulated as modification of the exponential and β‐Poisson dose‐response models and model parameters were estimated using maximum likelihood methods and multiple data sets of quantal dose‐response data for which aerosol sizes of inhaled doses were known. Analysis of F. tularensis dose‐response data was best fit by an exponential dose‐response model with a power function including the particle diameter size substituting for the rate parameter k scaling the applied dose. There were differences in the pathogen's aerosol‐size‐dependence equation and models that better represent the observed dose‐response results than the estimate derived from applying the model developed by the International Commission on Radiological Protection (ICRP, 1994) that relies on differential regional lung deposition for human particle exposure.
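For orientation, the two base models named above can be written in their commonly used forms, together with the binomial deviance that maximum likelihood fitting of quantal data minimizes. This sketch covers only the standard, size-independent models; the aerosol-size-dependent modification (the power function of particle diameter replacing k) is not reproduced because the abstract does not give its exact form. Function and parameter names are assumptions, and the β-Poisson is shown in its usual approximate form parameterized by alpha and the median dose N50.

```python
import math


def exponential_response(dose, k):
    """Exponential model: probability of response at a given dose."""
    return 1.0 - math.exp(-k * dose)


def beta_poisson_response(dose, alpha, n50):
    """Approximate beta-Poisson model, parameterized by alpha and median dose N50."""
    return 1.0 - (1.0 + dose * (2.0 ** (1.0 / alpha) - 1.0) / n50) ** (-alpha)


def binomial_deviance(doses, n_exposed, n_responding, model):
    """Deviance of a fitted model against quantal dose-response data.

    model is a function of dose returning a probability strictly
    between 0 and 1; smaller deviance means a closer fit.
    """
    dev = 0.0
    for d, n, y in zip(doses, n_exposed, n_responding):
        p = model(d)
        if y > 0:
            dev += y * math.log((y / n) / p)
        if y < n:
            dev += (n - y) * math.log(((n - y) / n) / (1.0 - p))
    return 2.0 * dev
```

A fit would search over k (or over alpha and N50) for the values that minimize `binomial_deviance`, for example by passing `lambda d: exponential_response(d, k)` as the model for each candidate k.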

Based on results reported from the NHANES II Survey (the National Health and Nutrition Examination Survey II) for people living in the United States during 1976–1980, we use exploratory data analysis, probability plots, and the method of maximum likelihood to fit lognormal distributions to percentiles of body weight for males and females as a function of age from 6 months through 74 years. The results are immediately useful in probabilistic (and deterministic) risk assessments.

For several years machine learning methods have been proposed for risk classification. While machine learning methods have also been used for failure diagnosis and condition monitoring, to the best of our knowledge, these methods have not been used for probabilistic risk assessment. Probabilistic risk assessment is a subjective process. The problem of how well machine learning methods can emulate expert judgments is challenging. Expert judgments are based on mental shortcuts, heuristics, which are susceptible to biases. This paper presents a process for developing natural language-based probabilistic risk assessment models, applying deep learning algorithms to emulate experts’ quantified risk estimates. This allows the risk analyst to obtain an a priori risk assessment when there is limited information in the form of text and numeric data. Universal sentence embedding (USE) with gradient boosting regression (GBR) trees trained over limited structured data presented the most promising results. When we apply these models’ outputs to generate survival distributions for autonomous systems’ likelihood of loss with distance, we observe that for open water and ice shelf operating environments, the differences between the survival distributions generated by the machine learning algorithm and those generated by the experts are not statistically significant.

In this paper, we explore the differences between store sales models that allow for heterogeneity in marketing effects across stores and models that accommodate potential irregularities in sales response through the use of nonparametric estimation techniques. In particular, we investigate the following question: What benefits can we gain from incorporating store heterogeneity versus functional flexibility in sales response models concerning fit and predictive validity, as compared to a simple parametric store sales model? In an empirical study based on store-level data, we also compare the different model versions with respect to estimated price elasticities and resulting shapes for own- and cross-price effects. Our empirical results indicate that addressing heterogeneity is not advantageous in general, as model fit, predictive validity and the accuracy of price elasticities did not improve for many brands. In contrast, estimating sales response flexibly provides much more potential for statistical improvements and leads to different implications concerning price elasticities, too.

