Similar Documents
20 similar documents found.
1.
Finite mixture models, that is, weighted averages of parametric distributions, provide a powerful way to extend parametric families of distributions to fit data sets not adequately fit by a single parametric distribution. First-order finite mixture models have been widely used in the physical, chemical, biological, and social sciences for over 100 years. Using maximum likelihood estimation, we demonstrate how a first-order finite mixture model can represent the large variability in data collected by the U.S. Environmental Protection Agency for the concentration of Radon 222 in drinking water supplied from ground water, even when 28% of the data fall at or below the minimum reporting level. Extending the use of maximum likelihood, we also illustrate how a second-order finite mixture model can separate and represent both the variability and the uncertainty in the data set.
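To make the method concrete, here is a minimal Python sketch of maximum likelihood fitting of a two-component lognormal mixture to left-censored data, in the spirit of the abstract above. The data, the reporting level, and the starting values are synthetic stand-ins, not the EPA radon data, and censored observations contribute the mixture CDF at the reporting level to the likelihood.

```python
# Sketch: ML fit of a two-component lognormal mixture with left-censoring.
# Data and reporting level are illustrative, not the EPA radon-222 set.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
x = np.exp(rng.normal(0.0, 1.0, 300))          # synthetic "concentrations"
rl = np.quantile(x, 0.28)                      # ~28% at/below reporting level
observed, n_cens = x[x > rl], np.sum(x <= rl)

def neg_log_lik(theta):
    w = 1 / (1 + np.exp(-theta[0]))            # mixing weight mapped into (0, 1)
    mus, sigmas = theta[1:3], np.exp(theta[3:5])
    comp = [stats.lognorm(s, scale=np.exp(m)) for m, s in zip(mus, sigmas)]
    pdf = w * comp[0].pdf(observed) + (1 - w) * comp[1].pdf(observed)
    cdf_rl = w * comp[0].cdf(rl) + (1 - w) * comp[1].cdf(rl)
    # Censored points contribute P(X <= rl); observed points the mixture density.
    return -(np.sum(np.log(pdf)) + n_cens * np.log(cdf_rl))

fit = optimize.minimize(neg_log_lik, x0=[0.0, -0.5, 0.5, 0.0, 0.0],
                        method="Nelder-Mead")
print(fit.x, fit.fun)
```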

2.
The purpose of this paper is to undertake a statistical analysis to specify empirical distributions and to estimate univariate parametric probability distributions for air exchange rates for residential structures in the United States. To achieve this goal, we used data compiled by the Brookhaven National Laboratory using a method known as the perfluorocarbon tracer (PFT) technique. While these data are not fully representative of all areas of the country or all housing types, they are judged to be by far the best available. The analysis is characterized by four key points: the use of data for 2,844 households; a four-region breakdown based on heating degree days, the best available measure of climatic factors affecting air exchange rates; estimation of lognormal distributions as well as provision of empirical (frequency) distributions; and provision of these distributions for all of the data, for the data segmented by the four regions, for the data segmented by the four seasons, and for the data segmented by a 16-cell region-by-season breakdown. Except in a few cases, primarily for small sample sizes, air exchange rates were found to be well fit by lognormal distributions (adjusted R² ≥ 0.95). The empirical or lognormal distributions may be used in indoor air models or as input variables for probabilistic human health risk assessments.
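A short sketch of the kind of fit check used above: assess lognormality by the R² of a normal probability plot of the log data. The air exchange rates below are synthetic placeholders, not the Brookhaven PFT measurements.

```python
# Sketch: lognormal fit quality via a probability plot of the log data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
ach = np.exp(rng.normal(np.log(0.5), 0.7, 200))   # hypothetical air changes/hour

(osm, osr), (slope, intercept, r) = stats.probplot(np.log(ach), dist="norm")
print(f"geometric mean = {np.exp(intercept):.3f} 1/h, GSD = {np.exp(slope):.3f}")
print(f"probability-plot R^2 = {r**2:.4f}")       # near 1 => lognormal fits well
```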

3.
This research reports empirical distributions and estimated univariate parametric probability distributions for house volume and certain zone volumes within households for residential structures in the United States. The author derived the distributions from two separate databases. The volumes were found to be exceptionally well fit by lognormal distributions (adjusted R² ≫ 0.95) in almost all cases. In addition, data from one database indicate that the correlation between house volume and air changes per hour is very weak.

4.
Using probability plots and maximum likelihood estimation (MLE), we fit lognormal distributions to data compiled by Ershow et al. for daily intake of total water and tap water by three groups of women (controls, pregnant, and lactating; all aged 15–49 years) in the United States. We also develop bivariate lognormal distributions for the joint distribution of water ingestion and body weight for these three groups. Overall, we recommend the marginal distributions for water intake as fit by MLE for use in human health risk assessments.
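A bivariate lognormal is simply a bivariate normal on the logs, so sampling one is straightforward. The sketch below uses invented parameter values, not the Ershow et al. estimates; the log-scale correlation rho is an assumption for illustration.

```python
# Sketch: sampling a bivariate lognormal for (water intake, body weight).
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([np.log(1.9), np.log(65.0)])     # log-scale means: L/day, kg
sd = np.array([0.45, 0.20])                    # log-scale standard deviations
rho = 0.3                                      # assumed log-scale correlation
cov = np.array([[sd[0]**2, rho*sd[0]*sd[1]],
                [rho*sd[0]*sd[1], sd[1]**2]])

log_draws = rng.multivariate_normal(mu, cov, size=10_000)
intake, weight = np.exp(log_draws).T           # back-transform to natural units
print(intake.mean(), weight.mean(), np.corrcoef(intake, weight)[0, 1])
```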

5.
Fish consumption rates play a critical role in the assessment of human health risks posed by the consumption of fish from chemically contaminated water bodies. Based on data from the 1989 Michigan Sport Anglers Fish Consumption Survey, we examined total fish consumption, consumption of self-caught fish, and consumption of Great Lakes fish for all adults, men, women, and certain higher risk subgroups such as anglers. We present average daily consumption rates as compound probability distributions consisting of a Bernoulli trial (to distinguish those who ate fish from those who did not) combined with a distribution (both empirical and parametric) for those who ate fish. We found that the average daily consumption rates for adults who ate fish are reasonably well fit by lognormal distributions. The compound distributions may be used as input variables for Monte Carlo simulations in public health risk assessments.
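Sampling such a compound distribution is a one-liner per draw: a Bernoulli trial for whether any fish was eaten, times a lognormal amount among eaters. The eating probability, geometric mean, and GSD below are placeholders, not the Michigan survey estimates.

```python
# Sketch: Monte Carlo draws from a Bernoulli x lognormal compound distribution.
import numpy as np

rng = np.random.default_rng(4)
n, p_eat = 100_000, 0.6                 # hypothetical fraction who ate fish
gm, gsd = 15.0, 2.5                     # hypothetical g/day geometric mean, GSD

eats = rng.random(n) < p_eat
amount = np.exp(rng.normal(np.log(gm), np.log(gsd), n))
rate = np.where(eats, amount, 0.0)      # non-eaters contribute zero intake
print(f"mean = {rate.mean():.1f} g/day, P(rate == 0) = {np.mean(rate == 0):.2f}")
```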

6.
In this paper, we explore the differences between store sales models that allow for heterogeneity in marketing effects across stores and models that accommodate potential irregularities in sales response through the use of nonparametric estimation techniques. In particular, we investigate the following question: What benefits, in terms of fit and predictive validity, can we gain from incorporating store heterogeneity versus functional flexibility in sales response models, as compared to a simple parametric store sales model? In an empirical study based on store-level data, we also compare the different model versions with respect to estimated price elasticities and resulting shapes for own- and cross-price effects. Our empirical results indicate that addressing heterogeneity is not advantageous in general, as model fit, predictive validity, and the accuracy of price elasticities did not improve for many brands. In contrast, estimating sales response flexibly offers much more potential for statistical improvement and also leads to different implications for price elasticities.

7.
Many environmental data sets, such as for air toxic emission factors, contain several values reported only as below the detection limit. Such data sets are referred to as "censored." Typical approaches to dealing with censored data sets include replacing censored values with arbitrary values of zero, one-half of the detection limit, or the detection limit. Here, an approach to quantification of the variability and uncertainty of censored data sets is demonstrated. Empirical bootstrap simulation is used to simulate censored bootstrap samples from the original data. Maximum likelihood estimation (MLE) is used to fit parametric probability distributions to each bootstrap sample, thereby specifying alternative estimates of the unknown population distribution of the censored data sets. Sampling distributions for uncertainty in statistics such as the mean, median, and percentiles are calculated. The robustness of the method was tested by application to different degrees of censoring, sample sizes, coefficients of variation, and numbers of detection limits. Lognormal, gamma, and Weibull distributions were evaluated. The reliability of using this method to estimate the mean is evaluated by averaging the best estimated means of 20 cases for a small sample size of 20. The confidence intervals for distribution percentiles estimated with the bootstrap/MLE method compared favorably to results obtained with the nonparametric Kaplan-Meier method. The bootstrap/MLE method is illustrated via an application to an empirical air toxic emission factor data set.
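A minimal sketch of the bootstrap/MLE idea follows: resample the censored data with replacement, refit a lognormal by censored-data MLE in each replication, and read uncertainty off the resulting sampling distribution of the mean. The single detection limit and the data are synthetic, and only the lognormal case is shown.

```python
# Sketch: bootstrap + censored-data MLE for uncertainty in the mean.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
x = np.exp(rng.normal(0.0, 1.0, 40))
dl = np.quantile(x, 0.3)                         # single detection limit
vals, cens = np.where(x > dl, x, dl), x <= dl    # values and censoring flags

def cens_mle(vals, cens):
    def nll(theta):
        mu, sig = theta[0], np.exp(theta[1])
        # Censored points contribute P(X <= dl); detects the lognormal density.
        ll = np.where(cens,
                      stats.norm.logcdf((np.log(vals) - mu) / sig),
                      stats.norm.logpdf(np.log(vals), mu, sig) - np.log(vals))
        return -ll.sum()
    return optimize.minimize(nll, [0.0, 0.0], method="Nelder-Mead").x

means = []
for _ in range(500):                             # bootstrap replications
    idx = rng.integers(0, len(vals), len(vals))
    mu, logsig = cens_mle(vals[idx], cens[idx])
    means.append(np.exp(mu + np.exp(logsig)**2 / 2))  # lognormal mean
print(np.percentile(means, [2.5, 50, 97.5]))     # 95% CI for the mean
```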

8.
Survival models are developed to predict response and time‐to‐response for mortality in rabbits following exposures to single or multiple aerosol doses of Bacillus anthracis spores. Hazard function models were developed for a multiple‐dose data set to predict the probability of death through specifying functions of dose response and the time between exposure and the time‐to‐death (TTD). Among the models developed, the best‐fitting survival model (baseline model) is an exponential dose–response model with a Weibull TTD distribution. Alternative models assessed use different underlying dose–response functions and use the assumption that, in a multiple‐dose scenario, earlier doses affect the hazard functions of each subsequent dose. In addition, published mechanistic models are analyzed and compared with models developed in this article. None of the alternative models that were assessed provided a statistically significant improvement in fit over the baseline model. The general approach utilizes simple empirical data analysis to develop parsimonious models with limited reliance on mechanistic assumptions. The baseline model predicts TTDs consistent with reported results from three independent high‐dose rabbit data sets. More accurate survival models depend upon future development of dose–response data sets specifically designed to assess potential multiple‐dose effects on response and time‐to‐response. The process used in this article to develop the best‐fitting survival model for exposure of rabbits to multiple aerosol doses of B. anthracis spores should have broad applicability to other host–pathogen systems and dosing schedules because the empirical modeling approach is based upon pathogen‐specific empirically‐derived parameters.
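The structure of the baseline model can be sketched in a few lines: an exponential dose-response for the probability of death and a Weibull distribution for time-to-death among responders. All parameter values below (k, shape, scale, dose) are invented for illustration and are not the fitted rabbit-study estimates.

```python
# Sketch: exponential dose-response with a Weibull time-to-death (TTD).
import numpy as np

rng = np.random.default_rng(6)

def p_death(dose, k=1e-6):
    """Exponential dose-response: P(death | dose) = 1 - exp(-k * dose)."""
    return 1.0 - np.exp(-k * dose)

def sample_ttd(n, shape=1.8, scale=4.0):
    """Weibull time-to-death in days among responders (illustrative values)."""
    return scale * rng.weibull(shape, n)

dose, n = 2e6, 1000                         # hypothetical spore dose, cohort size
dies = rng.random(n) < p_death(dose)        # Bernoulli response per animal
ttd = sample_ttd(int(dies.sum()))
print(f"P(death) = {p_death(dose):.3f}, median TTD = {np.median(ttd):.2f} d")
```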

9.
Variability is the heterogeneity of values within a population. Uncertainty refers to lack of knowledge regarding the true value of a quantity. Mixture distributions have the potential to improve the goodness of fit to data sets not adequately described by a single parametric distribution. Uncertainty due to random sampling error in statistics of interest can be estimated based upon bootstrap simulation. In order to evaluate the robustness of using mixture distributions as a basis for estimating both variability and uncertainty, 108 synthetic data sets generated from selected population mixture lognormal distributions were investigated, and properties of variability and uncertainty estimates were evaluated with respect to variation in sample size, mixing weight, and separation between components of mixtures. Furthermore, mixture distributions were compared with single-component distributions. Findings include: (1) mixing weight influences the stability of variability and uncertainty estimates; (2) bootstrap simulation results tend to be more stable for larger sample sizes; (3) when two components are well separated, the stability of bootstrap simulation is improved; however, a larger degree of uncertainty arises regarding the percentiles coinciding with the separated region; (4) when two components are not well separated, a single distribution may often be a better choice because it has fewer parameters and better numerical stability; and (5) dependencies exist in sampling distributions of parameters of mixtures and are influenced by the amount of separation between the components. An emission factor case study based upon NOx emissions from coal-fired tangential boilers is used to illustrate the application of the approach.

10.
This note suggests the use of Bézier curves to model probability distributions on computers. This represents an approach completely different from current practice, which mostly employs parametric families or piecewise polynomials. Bézier curves combine simplicity and flexibility with easy manipulation, allowing curves to be fitted to data more accurately.
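The basic machinery is de Casteljau's algorithm, which evaluates a Bézier curve by repeated linear interpolation of its control points. With control points whose second coordinates rise from 0 to 1, the curve can trace a CDF; the control points below are arbitrary illustrations, not a fitted distribution.

```python
# Sketch: evaluating a Bezier curve by de Casteljau's algorithm.
import numpy as np

def de_casteljau(ctrl, t):
    """Evaluate a Bezier curve with control points ctrl (n x 2) at t in [0, 1]."""
    pts = np.asarray(ctrl, dtype=float)
    while len(pts) > 1:                       # repeated linear interpolation
        pts = (1 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

ctrl = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.8), (4.0, 1.0)]   # x: value, y: CDF
curve = np.array([de_casteljau(ctrl, t) for t in np.linspace(0, 1, 101)])
print(curve[[0, 50, 100]])                    # (x, F(x)) samples along the curve
```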

11.
Today, chemical risk and safety assessments rely heavily on the estimation of environmental fate by models. The key compound‐related properties in such models describe partitioning and reactivity. Uncertainty in determining these properties can be separated into random and systematic (incompleteness) components, requiring different types of representation. Here, we evaluate two approaches that can also treat systematic errors: fuzzy arithmetic and probability bounds analysis. When a best estimate (mode) and a range can be computed for an input parameter, it is possible to characterize the uncertainty with a triangular fuzzy number (possibility distribution) or a corresponding probability box bounded by two uniform distributions. We use a five‐compartment Level I fugacity model and reported empirical data from the literature for three well‐known environmental pollutants (benzene, pyrene, and DDT) as illustrative cases for this evaluation. Propagation of uncertainty by discrete probability calculus or interval arithmetic can be done at low computational cost and gives maximum flexibility in applying different approaches. Our evaluation suggests that the difference between fuzzy arithmetic and probability bounds analysis is small, at least for this specific case. The fuzzy arithmetic approach can, however, be regarded as less conservative than probability bounds analysis if the assumption of independence is removed. Both approaches are sensitive to repeated parameters, which may inflate the uncertainty estimate. Uncertainty described by probability boxes was therefore also propagated through the model by Monte Carlo simulation to show how this problem can be avoided.
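A minimal sketch of fuzzy arithmetic by alpha-cuts: each membership level alpha yields an interval of a triangular fuzzy number, and for a monotone function the endpoints of that interval map to the endpoints of the output interval. The toy function (a first-order rate constant from a half-life) and the triangular number stand in for the paper's fugacity model inputs.

```python
# Sketch: propagating a triangular fuzzy number via alpha-cuts.
import numpy as np

def alpha_cut(lo, mode, hi, alpha):
    """Interval of a triangular fuzzy number at membership level alpha."""
    return lo + alpha * (mode - lo), hi - alpha * (hi - mode)

def propagate(lo, mode, hi, f, levels=np.linspace(0.0, 1.0, 6)):
    out = []
    for al in levels:
        a, b = alpha_cut(lo, mode, hi, al)
        fa, fb = sorted((f(a), f(b)))      # monotone f: endpoints are enough
        out.append((al, (fa, fb)))
    return out

half_life = (5.0, 10.0, 40.0)              # hypothetical half-life (d): lo/mode/hi
rate = lambda t: np.log(2) / t             # first-order loss rate constant
for al, (lo_r, hi_r) in propagate(*half_life, rate):
    print(f"alpha={al:.1f}: rate in [{lo_r:.4f}, {hi_r:.4f}] 1/d")
```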

12.
Li R, Englehardt JD, Li X. Risk Analysis. 2012;32(2):345-359.
Multivariate probability distributions, such as may be used for mixture dose‐response assessment, are typically highly parameterized and difficult to fit to available data. However, such distributions may be useful in analyzing the large electronic data sets becoming available, such as dose‐response biomarker and genetic information. In this article, a new two‐stage computational approach is introduced for estimating multivariate distributions and addressing parameter uncertainty. The proposed first stage comprises a gradient Markov chain Monte Carlo (GMCMC) technique to find Bayesian posterior mode estimates (PMEs) of parameters, equivalent to maximum likelihood estimates (MLEs) in the absence of subjective information. In the second stage, these estimates are used to initialize a Markov chain Monte Carlo (MCMC) simulation, replacing the conventional burn‐in period to allow convergent simulation of the full joint Bayesian posterior distribution and the corresponding unconditional multivariate distribution (not conditional on uncertain parameter values). When the distribution of parameter uncertainty is such a Bayesian posterior, the unconditional distribution is termed predictive. The method is demonstrated by finding conditional and unconditional versions of the recently proposed emergent dose‐response function (DRF). Results are shown for the five‐parameter common‐mode and seven‐parameter dissimilar‐mode models, based on published data for eight benzene–toluene dose pairs. The common mode conditional DRF is obtained with a 21‐fold reduction in data requirement versus MCMC. Example common‐mode unconditional DRFs are then found using synthetic data, showing a 71% reduction in required data. The approach is further demonstrated for a PCB 126-PCB 153 mixture. Applicability is analyzed and discussed. Matlab® computer programs are provided.
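The two-stage idea can be sketched on a toy problem: first find the posterior mode by gradient-based optimization, then start a Metropolis MCMC chain at that mode so no burn-in is wasted. A 1-D normal-mean problem with a flat prior stands in for the paper's multivariate DRF, so the mode equals the MLE, as the abstract notes; this is not the authors' GMCMC code.

```python
# Sketch: posterior-mode start for MCMC (stage 1: optimize, stage 2: sample).
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(8)
data = rng.normal(2.0, 1.0, 50)

def neg_log_post(mu):
    # Flat prior, known sigma: posterior mode coincides with the MLE.
    return -stats.norm.logpdf(data, mu, 1.0).sum()

mode = optimize.minimize(neg_log_post, x0=0.0).x[0]   # stage 1: mode estimate

theta, chain, step = mode, [], 0.3                    # stage 2: MCMC from mode
for _ in range(5000):
    prop = theta + step * rng.normal()                # random-walk proposal
    if np.log(rng.random()) < neg_log_post(theta) - neg_log_post(prop):
        theta = prop                                  # Metropolis acceptance
    chain.append(theta)
print(mode, np.mean(chain), np.std(chain))
```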

13.
Using exploratory data analysis, probability plots, scatterplots, and computer animations to rotate and visualize the data, we fit a trivariate normal distribution to data on height, the natural logarithm of body weight, and body fat for 646 men between the ages of 50 and 80 years, as reported by the medical staff of the U.S. Veterans Administration's “Normative Aging Study” in Boston, MA. Although these data do not include any children, women, or young men, the measurements represent the best data that we could find through a 4-year search. We believe that these data are well measured and reliable for men in the specified age range and that they reveal an interesting statistical pattern for use in probabilistic PBPK models.

14.
Based on results reported from the NHANES II Survey (the National Health and Nutrition Examination Survey II) for people living in the United States during 1976–1980, we use exploratory data analysis, probability plots, and the method of maximum likelihood to fit lognormal distributions to percentiles of body weight for males and females as a function of age from 6 months through 74 years. The results are immediately useful in probabilistic (and deterministic) risk assessments.
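When only percentiles are reported, lognormal parameters can be recovered from them. The sketch below uses a least-squares probability-plot fit (regressing log quantiles on standard normal quantiles), a simpler stand-in for the paper's maximum likelihood approach; the percentile values are invented, not NHANES II numbers.

```python
# Sketch: lognormal parameters from reported percentiles via regression.
import numpy as np
from scipy import stats

probs = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])
weights = np.array([52.0, 56.0, 63.0, 71.0, 81.0, 92.0, 99.0])  # hypothetical kg

z = stats.norm.ppf(probs)
sigma, mu = np.polyfit(z, np.log(weights), 1)   # slope = sigma, intercept = mu
print(f"median = {np.exp(mu):.1f} kg, GSD = {np.exp(sigma):.2f}")
```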

15.
Moment‐matching discrete distributions were developed by Miller and Rice (1983) as a method to translate continuous probability distributions into discrete distributions for use in decision and risk analysis. Using Gaussian quadrature, they showed that an n‐point discrete distribution can be constructed that exactly matches the first 2n − 1 moments of the underlying distribution. These moment‐matching discrete distributions offer several theoretical advantages over the typical discrete approximations, as shown in Smith (1993), but they also pose practical problems. In particular, how does the analyst estimate the moments given only the subjective assessments of the continuous probability distribution? Smith suggests that the moments can be estimated by fitting a distribution to the assessments. This research note shows that the quality of the moment estimates cannot be judged solely by how close the fitted distribution is to the true distribution. Examples are used to show that the relative errors in higher-order moment estimates can be greater than 100%, even though the cumulative distribution function is estimated within a Kolmogorov‐Smirnov distance of less than 1%.
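For the normal case, the Miller-Rice construction reduces to Gauss-Hermite quadrature: the n nodes and normalized weights form a discrete distribution matching the first 2n − 1 moments exactly. A minimal sketch:

```python
# Sketch: n-point moment-matching discretization of Normal(mu, sigma).
import numpy as np

def moment_matching_normal(mu, sigma, n):
    nodes, wts = np.polynomial.hermite_e.hermegauss(n)  # probabilists' Hermite
    probs = wts / wts.sum()                             # normalize to sum to 1
    return mu + sigma * nodes, probs

pts, probs = moment_matching_normal(0.0, 1.0, 3)
for k in range(1, 6):                                   # check moments 1..2n-1
    print(k, np.sum(probs * pts**k))                    # vs N(0,1): 0,1,0,3,0
```

For n = 3 this gives nodes 0 and ±√3 with probabilities 2/3 and 1/6 each, reproducing the normal's first five moments.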

16.
Risk Analysis. 2018;38(10):2073-2086.
The guidelines for setting environmental quality standards are increasingly based on probabilistic risk assessment due to a growing general awareness of the need for probabilistic procedures. One of the commonly used tools in probabilistic risk assessment is the species sensitivity distribution (SSD), which represents the proportion of species affected belonging to a biological assemblage as a function of exposure to a specific toxicant. Our focus is on the inverse use of the SSD curve with the aim of estimating the concentration, HCp, of a toxic compound that is hazardous to p% of the biological community under study. Toward this end, we propose the use of robust statistical methods in order to take into account the presence of outliers or apparent skew in the data, which may occur without any ecological basis. A robust approach exploits the full neighborhood of a parametric model, enabling the analyst to account for the typical real‐world deviations from ideal models. We examine two classic HCp estimation approaches and consider robust versions of these estimators. In addition, we also use data transformations in conjunction with robust estimation methods in case of heteroscedasticity. Different scenarios using real data sets as well as simulated data are presented in order to illustrate and compare the proposed approaches. These scenarios illustrate that the use of robust estimation methods enhances HCp estimation.
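The inverse use of the SSD is just a quantile lookup: fit a distribution to species toxicity values, then read HCp off the inverse CDF. The sketch below shows the classical (non-robust) lognormal version for concreteness; the toxicity values are invented.

```python
# Sketch: HC5 from a lognormal species sensitivity distribution (SSD).
import numpy as np
from scipy import stats

tox = np.array([1.2, 3.4, 5.1, 8.0, 12.5, 20.0, 33.0, 55.0])  # hypothetical mg/L
mu, sigma = np.log(tox).mean(), np.log(tox).std(ddof=1)       # classical fit

p = 0.05
hc5 = np.exp(mu + sigma * stats.norm.ppf(p))    # concentration hazardous to 5%
print(f"HC5 = {hc5:.2f} mg/L")
```

A robust variant would replace the mean and standard deviation of the logs with resistant estimates (e.g., median and a scale estimator), which is the direction the paper pursues.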

17.
The article proposes and investigates the performance of two Bayesian nonparametric estimation procedures in the context of benchmark dose estimation in toxicological animal experiments. The methodology is illustrated using several existing animal dose‐response data sets and is compared with traditional parametric methods available in standard benchmark dose estimation software (BMDS), as well as with a published model‐averaging approach and a frequentist nonparametric approach. These comparisons together with simulation studies suggest that the nonparametric methods provide a lot of flexibility in terms of model fit and can be a very useful tool in benchmark dose estimation studies, especially when standard parametric models fail to fit to the data adequately.

18.
We consider the problem of estimating the probability of detection (POD) of flaws in an industrial steel component. Modeled as an increasing function of the flaw height, the POD characterizes the detection process; it is also involved in the estimation of the flaw size distribution, a key input parameter of physical models describing the behavior of the steel component when submitted to extreme thermodynamic loads. Such models are used to assess the resistance of highly reliable systems whose failures are seldom observed in practice. We develop a Bayesian method to estimate the flaw size distribution and the POD function, using flaw height measures from periodic in‐service inspections conducted with an ultrasonic detection device, together with measures from destructive lab experiments. Our approach, based on approximate Bayesian computation (ABC) techniques, is applied to a real data set and compared to maximum likelihood estimation (MLE) and a more classical approach based on Markov Chain Monte Carlo (MCMC) techniques. In particular, we show that the parametric model describing the POD as the cumulative distribution function (cdf) of a log‐normal distribution, though often used in this context, can be invalidated by the data at hand. We propose an alternative nonparametric model, which assumes no predefined shape, and extend the ABC framework to this setting. Experimental results demonstrate the ability of this method to provide a flexible estimation of the POD function and describe its uncertainty accurately.
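To show the ABC mechanism in its simplest form, here is a rejection-ABC sketch for a lognormal-cdf POD: draw parameters from a prior, simulate hit/miss data, and keep draws whose summary statistic is close to the observed one. Everything is synthetic, the single summary (overall detection fraction) is deliberately crude and only partially identifies the parameters, and this is far simpler than the paper's extended ABC framework.

```python
# Sketch: rejection ABC for a lognormal-cdf probability of detection (POD).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
heights = np.exp(rng.normal(0.5, 0.6, 200))           # synthetic flaw heights (mm)
true_pod = stats.lognorm(0.5, scale=2.0).cdf          # "true" POD, unknown in practice
detected = rng.random(200) < true_pod(heights)        # observed hit/miss record

obs_rate = detected.mean()                            # crude summary statistic
accepted = []
for _ in range(20_000):
    m, s = rng.uniform(0.5, 4.0), rng.uniform(0.1, 1.5)   # draws from flat priors
    sim = rng.random(200) < stats.lognorm(s, scale=m).cdf(heights)
    if abs(sim.mean() - obs_rate) < 0.01:             # keep if summaries agree
        accepted.append((m, s))
print(len(accepted), np.mean(accepted, axis=0))       # approximate posterior mean
```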

19.
The National Collegiate Athletic Association's (NCAA) men's Division I college basketball tournament is an annual competition that draws widespread attention in the United States. Predicting the winner of each game is a popular activity undertaken by numerous websites, fans, and more recently, academic researchers. This paper analyzes the 29 tournaments from 1985 to 2013, and presents two models to capture the winning seed distribution (i.e., a probability distribution modeling the winners of each round). The Exponential Model uses the exponential random variable to model the waiting time between a seed's successive winnings in a round. The Markov Model uses Markov chains to estimate the winning seed distributions by considering a seed's total number of winnings in previous tournaments. The proposed models allow one to estimate the likelihoods of different seed combinations by applying the estimated winning seed distributions, which accurately summarize aggregate performance of the seeds. Moreover, the proposed models show that the winning rate of seeds is not a monotonically decreasing function of the seed number. Results of the proposed models are validated using a chi-squared goodness-of-fit test and compared to the frequency of observed events.

20.
We investigate a class of semiparametric ARCH(∞) models that includes as a special case the partially nonparametric (PNP) model introduced by Engle and Ng (1993) and that allows for both flexible dynamics and flexible functional form with regard to the “news impact” function. We show that the functional part of the model satisfies a type II linear integral equation and give simple conditions under which there is a unique solution. We propose an estimation method based on kernel smoothing and profiled likelihood. We establish the distribution theory of the parametric components and the pointwise distribution of the nonparametric component of the model. We also discuss efficiency of both the parametric part and the nonparametric part. We investigate the performance of our procedures on simulated data and on a sample of S&P 500 index returns. We find evidence of asymmetric news impact functions, consistent with the parametric analysis.
