Similar Articles
20 similar articles found
1.
With the rapid increase in the ability to store and analyze large amounts of data, organizations are gathering extensive data regarding their customers, vendors, and other entities. There has been a concurrent increase in the demand for preserving the privacy of confidential data that may be collected. The rapid growth of e‐commerce has also increased calls for maintaining privacy and confidentiality of data. For numerical data, data perturbation methods offer an easy yet effective solution to the dilemma of providing access to legitimate users while protecting the data from snoopers (legitimate users who perform illegitimate analysis). In this study, we define a new security requirement that achieves the objective of providing access to legitimate users without an increase in the ability of a snooper to predict confidential information. We also derive the specifications under which perturbation methods can achieve this objective. Numerical examples are provided to show that the use of the new specification achieves the objective of no additional information to the snooper. Implications of the new specification for e‐commerce are discussed.
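As a rough illustration of the perturbation idea, the sketch below masks a confidential numeric column with additive Gaussian noise; the noise scheme and the `noise_ratio` parameter are illustrative assumptions, not the article's derived specification.

```python
# A minimal sketch of additive data perturbation (illustrative scheme only).
import numpy as np

rng = np.random.default_rng(0)

def perturb(confidential, noise_ratio=0.1):
    """Mask a confidential numeric column with zero-mean Gaussian noise
    whose variance is a fixed fraction of the data variance."""
    sd = np.sqrt(noise_ratio * np.var(confidential))
    return confidential + rng.normal(0.0, sd, size=confidential.shape)

salaries = rng.lognormal(mean=11, sigma=0.4, size=1000)  # confidential data
masked = perturb(salaries)
# Aggregate statistics remain usable for legitimate analysis.
print(salaries.mean(), masked.mean())
```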

2.
National Statistical Agencies and other data custodian agencies hold a wealth of data regarding individuals and organizations, collected from censuses, surveys and administrative sources. In many cases, these data are made available to external researchers for the investigation of questions of social and economic importance. To enhance access to this information, several national statistical agencies are developing remote analysis systems (RAS) designed to accept queries from a researcher, run them on data held in a secure environment, and then return the results. An RAS prevents a researcher from accessing the underlying data, and most rely on manual checking to ensure the responses have acceptably low disclosure risk. However, the need for scalability and consistency will increasingly require automated methods. We propose a RAS output confidentialization procedure based on statistical bootstrapping that automates disclosure control while achieving a provably good balance between disclosure risk and usefulness of the responses. The bootstrap masking mechanism is easy to implement for most statistical queries, yet the characteristics of the bootstrap distribution assure us that it is also effective in providing both useful responses and low disclosure risk. Interestingly, our proposed bootstrap masking mechanism represents an ideal application of Efron's bootstrap—one that takes advantage of all the theoretical properties of the bootstrap, without ever having to construct the bootstrap distribution.
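A minimal sketch of the masking idea, assuming a simple one-resample mechanism: the server answers each statistical query with the statistic computed on a bootstrap resample rather than on the confidential data itself. Function names and data are illustrative, not from the article.

```python
# A minimal sketch of a bootstrap masking mechanism for a remote analysis
# system: a query is answered from a single bootstrap resample, so the
# exact confidential value is never released.
import numpy as np

rng = np.random.default_rng(1)

def masked_query(data, statistic):
    resample = rng.choice(data, size=len(data), replace=True)
    return statistic(resample)

income = rng.lognormal(10.5, 0.6, size=5000)   # confidential microdata
print(masked_query(income, np.mean))           # useful but masked answer
print(np.mean(income))                         # exact value stays in-house
```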

3.
A Flexible Count Data Regression Model for Risk Analysis
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM fits data as well as the commonly used existing models on overdispersed data sets while outperforming those models on underdispersed data sets.
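For reference, here is a sketch of the Conway-Maxwell-Poisson pmf that underlies such a GLM: nu < 1 yields overdispersion, nu > 1 underdispersion, and nu = 1 recovers the ordinary Poisson. The series truncation point is an assumption.

```python
# A minimal sketch of the Conway-Maxwell-Poisson pmf.
import numpy as np
from scipy.special import gammaln

def com_poisson_pmf(y, lam, nu, truncation=200):
    j = np.arange(truncation)
    log_terms = j * np.log(lam) - nu * gammaln(j + 1)
    log_z = np.logaddexp.reduce(log_terms)   # log normalizing constant
    return np.exp(y * np.log(lam) - nu * gammaln(y + 1) - log_z)

print(com_poisson_pmf(2, lam=3.0, nu=1.0))   # matches Poisson(3) at y = 2
```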

4.
Survey research is often deployed in the study of situational issues facing organizations and functions within organizations. One particular survey research approach can be described as follows: (1) survey questionnaires involving perceptual questions about a situational issue are administered to key informants, one key informant per unit of analysis; (2) key informants vary in a transparent manner across units of analysis such that groups of these key informants are discernible; and (3) perceptual responses, after data collection, are then pooled to create a single larger data set for subsequent statistical manipulations. In this methodological note, we draw attention to this particular survey research approach and ask the question: When is it appropriate to pool data provided by key informants with transparently different demographics across units of analysis so as to create a single larger data set for statistical manipulations? We use a simple example and data from a published study to motivate the relevance and gravity of this methodological question. Offering the concept and empirical assessment of measurement equivalence as the answer to this methodological question of data pooling, we prescribe and demonstrate, with the total quality management→customer satisfaction relationship, the procedural steps for evaluating the seven subdimensions of measurement equivalence. In conclusion, we highlight methods that should be adopted, before data collection, to minimize the risk of violating measurement equivalence. After data collection and for the instances when the empirical assessment for measurement equivalence advises against pooling of such data, we also offer suggestions for analyzing such data and presenting associated statistical results.

5.
Quantitative risk assessments for physical, chemical, biological, occupational, or environmental agents rely on scientific studies to support their conclusions. These studies often include relatively few observations, and, as a result, models used to characterize the risk may include large amounts of uncertainty. The motivation, development, and assessment of new methods for risk assessment are facilitated by the availability of a set of experimental studies that span a range of dose‐response patterns observed in practice. We describe construction of such a historical database focusing on quantal data in chemical risk assessment, and we employ this database to develop priors in Bayesian analyses. The database is assembled from a variety of existing toxicological data sources and contains 733 separate quantal dose‐response data sets. As an illustration of the database's use, prior distributions for individual model parameters in Bayesian dose‐response analysis are constructed. Results indicate that including prior information based on curated historical data in quantitative risk assessments may help stabilize eventual point estimates, producing dose‐response functions that are more stable and precisely estimated. These in turn produce potency estimates that share the same benefit. We are confident that quantitative risk analysts will find many other applications and issues to explore using this database.
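A minimal sketch of the workflow, assuming a one-hit model with fixed background and a handful of invented historical slope values in place of the 733 curated data sets: a prior is fitted to historical parameter estimates and combined with new quantal data by grid approximation.

```python
# A minimal sketch: empirical prior from historical fits + grid-based
# Bayesian update for a one-hit quantal dose-response model. All values
# are illustrative placeholders.
import numpy as np
from scipy import stats

hist_slopes = np.array([0.8, 1.1, 1.4, 0.9, 1.6, 1.2, 1.0, 1.3])
mu, sd = stats.norm.fit(hist_slopes)       # empirical prior for the slope

dose = np.array([0.0, 1.0, 2.0, 4.0])
n = np.array([50, 50, 50, 50])
y = np.array([2, 6, 15, 30])               # responders per dose group

beta = np.linspace(0.01, 3.0, 600)         # grid over the slope parameter
gamma = 0.04                               # fixed background response rate
p = gamma + (1 - gamma) * (1 - np.exp(-beta[:, None] * dose[None, :]))
loglik = stats.binom.logpmf(y, n, p).sum(axis=1)
logpost = loglik + stats.norm.logpdf(beta, mu, sd)
post = np.exp(logpost - logpost.max())
post /= post.sum()
print("posterior mean slope:", (beta * post).sum())
```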

6.
This article proposes a methodology for incorporating electrical component failure data into the human error assessment and reduction technique (HEART) for estimating human error probabilities (HEPs). The existing HEART method contains factors known as error-producing conditions (EPCs) that adjust a generic HEP to a more specific situation being assessed. The selection and proportioning of these EPCs are at the discretion of an assessor, and are therefore subject to the assessor's experience and potential bias. This dependence on expert opinion is prevalent in similar HEP assessment techniques used in numerous industrial areas. The proposed method incorporates factors based on observed trends in electrical component failures to produce a revised HEP that can trigger risk mitigation actions more effectively based on the presence of component categories or other hazardous conditions that have a history of failure due to human error. The data used for the additional factors are a result of an analysis of failures of electronic components experienced during system integration and testing at NASA Goddard Space Flight Center. The analysis includes the determination of root failure mechanisms and trend analysis. The major causes of these defects were attributed to electrostatic damage, electrical overstress, mechanical overstress, or thermal overstress. These factors representing user-induced defects are quantified and incorporated into specific hardware factors based on the system's electrical parts list. This proposed methodology is demonstrated with an example comparing the original HEART method and the proposed modified technique.
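For context, here is a sketch of the standard HEART calculation that the proposed method modifies: the generic HEP is scaled by each selected error-producing condition according to the assessor's proportion-of-effect weighting. The numbers are illustrative, not values from the HEART tables.

```python
# A minimal sketch of the basic HEART calculation:
# assessed effect of an EPC = (max multiplier - 1) * proportion + 1,
# and the generic HEP is multiplied by each assessed effect.
def heart_hep(generic_hep, epcs):
    """epcs: list of (max_multiplier, assessed_proportion) pairs."""
    hep = generic_hep
    for max_effect, proportion in epcs:
        hep *= (max_effect - 1.0) * proportion + 1.0
    return min(hep, 1.0)   # probabilities cannot exceed 1

# e.g. a routine task (generic HEP 0.003) under two illustrative EPCs
print(heart_hep(0.003, [(11, 0.4), (3, 0.6)]))
```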

7.
Developmental anomalies induced by toxic chemicals may be identified using laboratory experiments with rats, mice or rabbits. Multinomial responses of fetuses from the same mother are often positively correlated, resulting in overdispersion relative to multinomial variation. In this article, a simple data transformation based on the Rao-Scott concept of generalized design effects is proposed for dose-response modeling of developmental toxicity. After scaling the original multinomial data using the average design effect, standard methods for the analysis of uncorrelated multinomial data can be applied. Benchmark doses derived using this approach are comparable to those obtained using generalized estimating equations with an extended Dirichlet-trinomial covariance function to describe the dispersion of the original data. This empirical agreement, coupled with a large-sample theoretical justification of the Rao-Scott transformation, confirms the applicability of the statistical methods proposed in this article for developmental toxicity risk assessment.
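A minimal sketch of the transformation, assuming a single average design effect estimated elsewhere: litter counts are divided by the design effect so that standard multinomial methods can be applied to the scaled data.

```python
# A minimal sketch of the Rao-Scott-style scaling of cluster-correlated
# litter counts; the design effect value is an illustrative assumption.
import numpy as np

def rao_scott_scale(counts, design_effect):
    """counts: fetuses per litter in each outcome category."""
    return counts / design_effect

litters = np.array([[8, 2, 1], [10, 1, 0], [6, 4, 2]])  # normal/malformed/dead
deff = 1.8                                              # average design effect
scaled = rao_scott_scale(litters, deff)
print(scaled.sum(axis=1))   # effective litter sizes after scaling
```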

8.
Network traffic anomaly detection and analysis is an important research topic in network and security management. This paper examines the types of network traffic anomalies and the methods used to detect them, and analyzes the problems that arise when traditional detection methods are applied to network traffic anomaly detection. It focuses on anomaly detection based on the stream data model, surveying progress in applying existing stream data mining methods to network traffic anomaly detection. Finally, the paper discusses open problems in existing work and directions for future research.

9.
Using an analogy with the history of physical measurements and of population and energy projections, I analyze the trends in several data sets to quantify experts' overconfidence in the reliability of their uncertainty estimates. The data sets include (i) time trends in sequential measurements of the same physical quantity; (ii) national population projections; and (iii) projections for the U.S. energy sector. Probabilities of large deviations of the true values are parametrized by an exponential distribution whose slope is determined from the data. Statistics of past errors can be used in probabilistic risk assessment to hedge against unsuspected uncertainties and to include the possibility of human error in the framework of uncertainty analysis. By means of a sample Monte Carlo simulation of cancer risk caused by ingestion of benzene in soil, I demonstrate how the upper 95th percentiles of risk change when unsuspected uncertainties are included. I recommend inflating the estimated uncertainties by default safety factors determined from the relevant historical data sets.
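A minimal sketch of the hedging step, assuming lognormal uncertainty and a default safety factor of 3 (both placeholders): widening the stated uncertainty shifts the upper 95th percentile of the simulated risk.

```python
# A minimal sketch of hedging a Monte Carlo risk estimate against
# unsuspected uncertainty by inflating the stated spread with a default
# safety factor; all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

nominal_sd = 0.5          # analyst's stated log-scale uncertainty
safety_factor = 3.0       # default factor from historical error data

risk_nominal = rng.lognormal(np.log(1e-6), nominal_sd, n)
risk_hedged = rng.lognormal(np.log(1e-6), nominal_sd * safety_factor, n)

print(np.percentile(risk_nominal, 95), np.percentile(risk_hedged, 95))
```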

10.
Longitudinal data are important in exposure and risk assessments, especially for pollutants with long half‐lives in the human body and where chronic exposures to current levels in the environment raise concerns for human health effects. It is usually difficult and expensive to obtain large longitudinal data sets for human exposure studies. This article reports a new simulation method to generate longitudinal data with flexible numbers of subjects and days. Mixed models are used to describe the variance‐covariance structures of input longitudinal data. Based on estimated model parameters, simulated data are generated with statistical characteristics similar to those of the input data. Three criteria are used to determine similarity: the overall mean and standard deviation, the variance component percentages, and the average autocorrelation coefficients. Building on the discussion of mixed models, a simulation procedure is presented and numerical results are shown for one human exposure study. Simulations of three sets of exposure data successfully meet the above criteria. In particular, the simulations always retain the correct weights of inter‐ and intrasubject variances found in the input data. Autocorrelations are also reproduced well. Compared with other simulation algorithms, this new method stores more information about the overall input distribution and so satisfies the multiple statistical criteria above. In addition, it can draw on numerous data sources and simulates continuous observed variables better than existing methods. The method also provides flexible options in both the modeling and simulation procedures according to user requirements.
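A minimal sketch of such a generator, assuming a random subject intercept plus a stationary AR(1) within-subject process; in practice the parameters would come from the fitted mixed model, and the values below are illustrative.

```python
# A minimal sketch of simulating longitudinal exposure data that preserves
# the inter/intra-subject variance split and the autocorrelation structure.
import numpy as np

rng = np.random.default_rng(3)

def simulate(n_subjects, n_days, mu, var_between, var_within, rho):
    subj = rng.normal(0, np.sqrt(var_between), size=(n_subjects, 1))
    eps = np.empty((n_subjects, n_days))
    eps[:, 0] = rng.normal(0, np.sqrt(var_within), n_subjects)
    for t in range(1, n_days):   # AR(1) errors within each subject
        eps[:, t] = rho * eps[:, t - 1] + rng.normal(
            0, np.sqrt(var_within * (1 - rho**2)), n_subjects)
    return mu + subj + eps

data = simulate(n_subjects=30, n_days=7, mu=2.0,
                var_between=0.4, var_within=0.2, rho=0.3)
print(data.mean(), data.std())
```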

11.
Data Preprocessing Methods for Dynamic Comprehensive Evaluation
Data preprocessing is the foundation of comprehensive evaluation; it usually includes making indicator types consistent and rendering indicators dimensionless, and the preprocessing of indicator values strongly influences the subsequent evaluation conclusions. When existing static preprocessing methods are applied to the three-dimensional data of dynamic comprehensive evaluation, they eliminate the incremental information implicit in the data. To solve this problem, this paper proposes a data preprocessing method for dynamic comprehensive evaluation. After reviewing commonly used preprocessing methods for evaluation indicators and analyzing their characteristics, a globally improved normalization method is proposed as the type-consistency and dimensionless transformation for dynamic comprehensive evaluation. Finally, a comparative numerical example illustrates the characteristics of each method and verifies that the globally improved normalization method is an effective preprocessing method for dynamic comprehensive evaluation data.
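One plausible reading of the global idea, sketched below under the assumption of min-max normalization: pooling the extremes over all time periods preserves the between-period growth that per-period normalization erases. The article's exact formula is not reproduced here.

```python
# A minimal sketch contrasting per-period normalization, which destroys
# trend information, with a global normalization pooled over all periods.
import numpy as np

# shape (time, objects, indicators); illustrative values
x = np.array([[[10.0, 200.0], [12.0, 180.0]],
              [[14.0, 220.0], [16.0, 210.0]]])

per_period = (x - x.min(axis=1, keepdims=True)) / (
    x.max(axis=1, keepdims=True) - x.min(axis=1, keepdims=True))

gmin = x.min(axis=(0, 1), keepdims=True)   # extremes pooled over all periods
gmax = x.max(axis=(0, 1), keepdims=True)
global_norm = (x - gmin) / (gmax - gmin)

print(per_period[:, 0, 0])    # object 1's growth on indicator 1 is lost
print(global_norm[:, 0, 0])   # growth between periods is preserved
```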

12.
Intrusion detection systems help network administrators prepare for and deal with network security attacks. These systems collect information from a variety of systems and network sources and analyze it for signs of intrusion and misuse. A variety of techniques have been employed for this analysis, ranging from traditional statistical methods to new data mining approaches. In this study the performance of three data mining methods in detecting network intrusion is examined. An experimental design (3×2×2) is created to evaluate the impact of three data mining methods, two data representation formats, and two data proportion schemes on the classification accuracy of intrusion detection systems. The results indicate that data mining methods and data proportion have a significant impact on classification accuracy. Among the data mining methods, rough sets provide the best accuracy, followed by neural networks and inductive learning. Balanced data proportion performs better than unbalanced data proportion. There are no major differences in performance between binary and integer data representations.

13.
There are often several data sets that may be used in developing a quantitative risk estimate for a carcinogen. These estimates are usually based, however, on the dose-response data for tumor incidences from a single sex/strain/species of animal. When appropriate, the use of more data should result in a higher level of confidence in the risk estimate. The decision to use more than one data set (e.g., representing different animal sexes, strains, species, or tumor sites) can be made following biological and statistical analyses of the compatibility of these data sets. Biological analysis involves consideration of factors such as the relevance of the animal models, study design and execution, dose selection and route of administration, the mechanism of action of the agent, its pharmacokinetics, any species- and/or sex-specific effects, and tumor site specificity. If the biological analysis does not prohibit combining data sets, statistical compatibility of the data sets is then investigated. A generalized likelihood ratio test is proposed for determining the compatibility of different data sets with respect to a common dose-response model, such as the linearized multistage model. The biological and statistical factors influencing the decision to combine data sets are described, followed by a case study of bromodichloromethane.
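A minimal sketch of the statistical step, assuming a one-hit model in place of the linearized multistage model and invented data for two animal groups: separate and pooled fits are compared through the likelihood ratio.

```python
# A minimal sketch of a likelihood ratio test for pooling two quantal
# dose-response data sets under a common model; data are illustrative.
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def negloglik(beta, dose, n, y):
    p = 1.0 - np.exp(-beta * dose)
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -stats.binom.logpmf(y, n, p).sum()

def fit(dose, n, y):
    res = minimize_scalar(negloglik, bounds=(1e-6, 10.0),
                          args=(dose, n, y), method="bounded")
    return -res.fun   # maximized log-likelihood

dose = np.array([0.5, 1.0, 2.0, 4.0])
nA, yA = np.array([50] * 4), np.array([3, 7, 14, 26])   # e.g. male mice
nB, yB = np.array([50] * 4), np.array([5, 9, 18, 29])   # e.g. female mice

ll_sep = fit(dose, nA, yA) + fit(dose, nB, yB)
ll_pool = fit(np.tile(dose, 2), np.concatenate([nA, nB]),
              np.concatenate([yA, yB]))
lr = 2 * (ll_sep - ll_pool)   # df = 1: one extra parameter when separate
print(lr, stats.chi2.ppf(0.95, df=1))   # pooling acceptable if lr is small
```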

14.
Many environmental data sets, such as for air toxic emission factors, contain several values reported only as below detection limit. Such data sets are referred to as "censored." Typical approaches to dealing with censored data sets include replacing censored values with arbitrary values of zero, one-half of the detection limit, or the detection limit. Here, an approach to quantifying the variability and uncertainty of censored data sets is demonstrated. Empirical bootstrap simulation is used to simulate censored bootstrap samples from the original data. Maximum likelihood estimation (MLE) is used to fit parametric probability distributions to each bootstrap sample, thereby specifying alternative estimates of the unknown population distribution of the censored data sets. Sampling distributions for uncertainty in statistics such as the mean, median, and percentiles are calculated. The robustness of the method was tested by application to different degrees of censoring, sample sizes, coefficients of variation, and numbers of detection limits. Lognormal, gamma, and Weibull distributions were evaluated. The reliability of using this method to estimate the mean is evaluated by averaging the best estimates of the mean over 20 cases with a small sample size of 20. The confidence intervals for distribution percentiles estimated with the bootstrap/MLE method compared favorably with results obtained with the nonparametric Kaplan-Meier method. The bootstrap/MLE method is illustrated via an application to an empirical air toxic emission factor data set.
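A minimal sketch of the MLE step for one sample, assuming a lognormal population: nondetects enter the likelihood through the CDF at their detection limits. In the full procedure this fit would be repeated over many censored bootstrap resamples.

```python
# A minimal sketch of censored maximum likelihood estimation for a
# lognormal distribution; data values are illustrative.
import numpy as np
from scipy import stats
from scipy.optimize import minimize

detects = np.array([0.8, 1.2, 2.5, 3.1, 4.7, 6.0])   # measured values
dls = np.array([0.5, 0.5, 1.0, 1.0])                 # detection limits (nondetects)

def negloglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    ll = stats.lognorm.logpdf(detects, sigma, scale=np.exp(mu)).sum()
    ll += stats.lognorm.logcdf(dls, sigma, scale=np.exp(mu)).sum()  # censored part
    return -ll

fit = minimize(negloglik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print("estimated mean:", np.exp(mu_hat + sigma_hat**2 / 2))
# Repeating this fit over censored bootstrap samples yields sampling
# distributions for the mean, median, and percentiles.
```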

15.
We propose a novel methodology for evaluating the accuracy of numerical solutions to dynamic economic models. It consists in constructing a lower bound on the size of approximation errors. A small lower bound on errors is a necessary condition for accuracy: If a lower error bound is unacceptably large, then the actual approximation errors are even larger, and hence, the approximation is inaccurate. Our lower‐bound error analysis is complementary to the conventional upper‐error (worst‐case) bound analysis, which provides a sufficient condition for accuracy. As an illustration of our methodology, we assess approximation in the first‐ and second‐order perturbation solutions for two stylized models: a neoclassical growth model and a new Keynesian model. The errors are small for the former model but unacceptably large for the latter model under some empirically relevant parameterizations.
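A toy sketch of the lower-bound logic: if two approximations differ by d at some state, the triangle inequality implies that at least one of them has a true error of at least d/2, with no need to know the exact solution. The "solutions" below are stand-ins, not actual perturbation output.

```python
# A toy sketch of a lower bound on approximation errors built from the
# disagreement between two candidate solutions.
import numpy as np

k = np.linspace(0.8, 1.2, 101)              # capital grid near steady state
policy_1st = 0.95 * k                       # toy "first-order" solution
policy_2nd = 0.95 * k - 0.30 * (k - 1)**2   # toy "second-order" solution

d = np.abs(policy_1st - policy_2nd).max()
print("max error of the worse solution is at least", d / 2)
```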

16.
Risk Analysis, 2018, 38(1): 194–209
This article presents the findings from a numerical simulation study that was conducted to evaluate the performance of alternative statistical analysis methods for background screening assessments when data sets are generated with incremental sampling methods (ISMs). A wide range of background and site conditions are represented in order to test different ISM sampling designs. Both hypothesis tests and upper tolerance limit (UTL) screening methods were implemented following U.S. Environmental Protection Agency (USEPA) guidance for specifying error rates. The simulations show that hypothesis testing using two‐sample t‐tests can meet standard performance criteria under a wide range of conditions, even with relatively small sample sizes. Key factors that affect the performance include unequal population variances and small absolute differences in population means. UTL methods are generally not recommended due to conceptual limitations in the technique when applied to ISM data sets from single decision units and due to insufficient power given standard statistical sample sizes from ISM.
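A minimal sketch of the hypothesis-testing route, using Welch's one-sided two-sample t-test on illustrative ISM replicate concentrations; the unequal-variance form is chosen because the simulations flag unequal population variances as a key factor.

```python
# A minimal sketch of a one-sided Welch two-sample t-test comparing site
# ISM replicates against background ISM replicates; data are illustrative.
import numpy as np
from scipy import stats

background = np.array([4.1, 3.8, 4.5])   # ISM replicate means, background DU
site = np.array([5.2, 5.9, 5.5])         # ISM replicate means, site DU

t, p_two_sided = stats.ttest_ind(site, background, equal_var=False)
p = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2   # one-sided p-value
print(f"t = {t:.2f}, one-sided p = {p:.3f}")  # small p: site exceeds background
```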

17.
The benchmark dose (BMD) approach has gained acceptance as a valuable risk assessment tool, but risk assessors still face significant challenges in selecting an appropriate BMD/BMDL estimate from the results of a set of acceptable dose‐response models. Current approaches do not explicitly address model uncertainty, and there is a need to more fully inform health risk assessors in this regard. In this study, a Bayesian model averaging (BMA) BMD estimation method that takes model uncertainty into account is proposed as an alternative to current BMD estimation approaches for continuous data. Using the "hybrid" method proposed by Crump, two BMA strategies, one based on maximum likelihood estimation and one based on Markov chain Monte Carlo, are first applied as a demonstration to calculate model-averaged BMD estimates from real continuous dose‐response data. The outcomes from the example data sets suggest that the BMA BMD estimates are more reliable than the estimates from the individual models with the highest posterior weight, in terms of higher BMDLs and narrower 90th percentile intervals. In addition, a simulation study is performed to evaluate the accuracy of the BMA BMD estimator. The results of the simulation study indicate that the BMA BMD estimates have smaller bias than BMDs selected using other criteria. To further validate the BMA method, some technical issues, including the selection of models and the use of bootstrap methods for BMDL derivation, need further investigation over a more extensive, representative set of dose‐response data.
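A minimal sketch of the averaging step, assuming BIC-based posterior weights as a simple stand-in for the article's MLE- and MCMC-based weighting; the per-model BMDs and BICs are placeholders, not fits to real data.

```python
# A minimal sketch of model-averaged BMD estimation with approximate
# posterior model weights derived from BIC; all values are illustrative.
import numpy as np

models = ["exponential", "hill", "power"]
bmds = np.array([2.4, 3.1, 2.7])          # BMD from each fitted model
bics = np.array([105.2, 104.1, 107.8])    # BIC of each fit

w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()                              # approximate posterior weights
print("model-averaged BMD:", (w * bmds).sum(), "weights:", w.round(3))
```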

18.
Applications of Data Mining in Commercial Banks
Data mining can effectively analyze the information held in a commercial bank's databases and convert it into knowledge that supports the bank's business decisions. After introducing data mining techniques and their principal tasks, this paper identifies the main application areas of data mining in commercial banking as customer relationship management, risk management, and financial fraud detection, and describes specific applications of data mining in each of these areas.

19.
A conventional dose–response function must be refitted as additional data become available. A predictive dose–response function, by contrast, requires no curve-fitting step, only additional data, and presents unconditional probabilities of illness that reflect the level of information it contains; the predictive Bayesian dose–response function becomes progressively less conservative as more information is included. This investigation evaluated the potential of predictive Bayesian methods to develop a dose–response function for human infection that improves on existing models, showed how predictive Bayesian statistical methods can utilize additional data, and expanded the Bayesian methods for a broad audience, including those concerned about oversimplified use of dose–response curves in quantitative microbial risk assessment (QMRA). The study used a dose–response relationship incorporating six separate data sets for Cryptosporidium parvum. A Pareto II distribution with known priors was applied to one of the six data sets to calibrate the model, while the others were used for subsequent updating. While epidemiological principles indicate that local variations, host susceptibility, and organism strain virulence may vary, all six data sets are well characterized by the Bayesian approach. For validation, the adaptable model was applied to an existing data set for Campylobacter jejuni, demonstrating the ability to build a dose–response function from limited data and to update it with new data. An analysis of goodness of fit relative to beta-Poisson methods also showed agreement between the predictive Bayesian model and the data.
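A minimal sketch of the predictive idea, assuming placeholder posterior draws rather than the article's Pareto II fit: the single-hit exponential model is averaged over the posterior of the infectivity parameter r, giving an unconditional probability of infection that changes as data accumulate.

```python
# A minimal sketch of a predictive dose-response: average the exponential
# single-hit model over posterior draws of infectivity r; the posterior
# sample below is an illustrative placeholder.
import numpy as np

rng = np.random.default_rng(4)
r_post = rng.lognormal(np.log(0.005), 0.8, size=10_000)  # posterior draws of r

def predictive_p_infection(dose):
    return np.mean(1.0 - np.exp(-r_post * dose))

for dose in [1, 10, 100]:
    print(dose, round(predictive_p_infection(dose), 4))
```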

20.
Psychometric data on risk perceptions are often collected using the method developed by Slovic, Fischhoff, and Lichtenstein, in which an array of risk issues is evaluated with respect to a number of risk characteristics, such as how dreadful, catastrophic, or involuntary exposure to each risk is. These data have often been analyzed at an aggregate level, where mean scores for all respondents are compared between risk issues. However, this approach may conceal important variation between individuals, and individual-level analyses have also been performed for single risk issues. This paper presents a new methodological approach, using a technique called multilevel modelling, for analysing individual and aggregated responses simultaneously, producing unconditional and unbiased results at both the individual and aggregate levels of the data. Two examples are given using previously published data sets on risk perceptions collected by the authors, and the results of the traditional and new approaches are compared. The discussion focuses on the implications of and possibilities provided by the new methodology.
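A minimal sketch of the multilevel setup, assuming simulated dread ratings and a random intercept per respondent, fitted with statsmodels' MixedLM; the data frame is generated purely for illustration.

```python
# A minimal sketch of a multilevel (mixed) model for psychometric ratings
# nested within respondents, with a random intercept per respondent.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_resp, n_risks = 40, 6
df = pd.DataFrame({
    "respondent": np.repeat(np.arange(n_resp), n_risks),
    "risk": np.tile(np.arange(n_risks), n_resp),
})
resp_effect = rng.normal(0, 0.8, n_resp)       # individual-level variation
risk_effect = np.linspace(-1, 1, n_risks)      # aggregate-level differences
df["dread"] = (4 + resp_effect[df["respondent"].to_numpy()]
                 + risk_effect[df["risk"].to_numpy()]
                 + rng.normal(0, 0.5, len(df)))

# Fixed effects separate risk issues; the grouping captures respondents.
model = smf.mixedlm("dread ~ C(risk)", df, groups=df["respondent"])
print(model.fit().summary())
```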

