Similar Literature
20 similar documents found.
1.
2.
Statistical database management systems keep raw, elementary, and/or aggregated data and include query languages with facilities to calculate various statistics from these data. In this article we examine statistical database query languages with respect to the criteria identified and the taxonomy developed in Ozsoyoglu and Ozsoyoglu (1985b). The criteria include statistical metadata and objects, aggregation features, and the interface to statistical packages. The taxonomy classifies statistical database query languages with respect to the data model used, the type of user interface, and the method of implementation. Temporal databases are rich sources of data for statistical analysis, so the aggregation features of temporal query languages, as well as the issues in calculating aggregates from temporal data, are also examined.
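
A recurring issue in this area is that temporal aggregates must weight values by how long they were valid rather than simply averaging over rows. As a minimal illustration (not drawn from the paper; the table and values are invented), the following Python sketch contrasts a row-based average with a time-weighted average over valid-time intervals:

```python
import pandas as pd

# Hypothetical valid-time relation: each row holds a salary and the
# interval [start, end) during which that salary was valid.
df = pd.DataFrame({
    "salary": [50000, 56000, 60000],
    "start": pd.to_datetime(["2020-01-01", "2020-07-01", "2021-04-01"]),
    "end":   pd.to_datetime(["2020-07-01", "2021-04-01", "2022-01-01"]),
})

# A row-based AVG(salary) ignores how long each value was in effect.
row_avg = df["salary"].mean()

# A temporal average weights each value by the length of its valid interval.
duration = (df["end"] - df["start"]).dt.days
time_weighted_avg = (df["salary"] * duration).sum() / duration.sum()

print(f"row-based AVG:     {row_avg:.2f}")
print(f"time-weighted AVG: {time_weighted_avg:.2f}")
```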

3.
Fault detection and isolation occupies a strategic position in modern industrial processes, and various approaches to it have been proposed. These approaches are usually based on a consistency test between the observed state of the process, provided by sensors, and the expected behaviour, provided by a mathematical model of the system. Such methods require a reliable model of the monitored system, which is complex to obtain. Alternatively, we propose in this paper to use blind source separation filters (BSSFs) to detect and isolate faults in a three-tank pilot plant. This technique is beneficial because it performs blind identification without an explicit mathematical model of the system. Independent component analysis (ICA), which relies on the assumption of statistical independence of the extracted sources, is used as the tool for each BSSF to extract signals of the process under consideration. The experimental results show the effectiveness and robustness of this approach in detecting and isolating sensor faults in the system.
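
The abstract gives no implementation details for the BSSFs; the sketch below is only a rough illustration of the ICA step, using scikit-learn's FastICA to unmix simulated sensor readings and flagging the sensor whose reconstruction residual is largest after an injected bias fault. The signals, mixing matrix, fault, and detection logic are all invented and deliberately simplified:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)

# Two latent process signals (e.g., tank levels), invented for illustration.
s1 = np.sin(2 * np.pi * 0.5 * t)
s2 = np.sign(np.sin(2 * np.pi * 0.3 * t))
S = np.c_[s1, s2]

# Three sensors observe unknown mixtures of the sources plus noise.
A = np.array([[1.0, 0.5], [0.4, 1.2], [0.8, 0.9]])  # unknown mixing matrix
X = S @ A.T + 0.05 * rng.standard_normal((len(t), 3))

# Inject an additive bias fault on sensor 1 halfway through the run.
X[len(t) // 2:, 1] += 1.5

# Blind identification: recover source estimates without a process model.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)

# Residual between observations and their ICA reconstruction; a persistent
# residual on one channel suggests a sensor (rather than process) fault.
X_hat = ica.inverse_transform(S_hat)
residual = np.abs(X - X_hat).mean(axis=0)
print("mean reconstruction residual per sensor:", residual.round(3))
print("suspected faulty sensor:", int(np.argmax(residual)))
```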

4.
Distributed agent-based simulation is a popular method for realizing computational experiments on large-scale artificial societies. Meanwhile, the strategy for partitioning the artificial-society models among hosts plays an essential role in enabling the simulation engine to offer high execution efficiency, as it has great impact on communication overheads and computational load balancing during simulation. Aiming at this problem, we first analyze the execution and scheduling process of agents during simulation and model it as a wide-sense cyclostationary random process. Then, a static statistical partitioning model is proposed to obtain the optimal partitioning strategy with minimum average communication cost and load-imbalance factor. To solve the static statistical partitioning model, this paper turns it into a graph-partitioning problem. A partitioning algorithm based on a statistical movement graph is then devised, which generates the task graph model by mining statistical movement information from the initialization data of the simulation model. In the experiments, two other popular partitioning methods are used to evaluate the performance of the proposed graph-partitioning algorithm. Furthermore, this paper compares graph-partitioning performance under different task graph models. The results indicate that the proposed statistical-movement-graph-based static partitioning method outperforms the other methods in reducing communication overhead while satisfying the load-balance constraint.
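
The authors' partitioning algorithm is not spelled out in the abstract; the sketch below illustrates only the general idea of casting the problem as graph partitioning. It uses networkx's Kernighan-Lin bisection on a hypothetical task graph whose edge weights stand for statistically mined inter-agent communication frequencies:

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Hypothetical task graph: nodes are agent groups, edge weights are the
# average communication frequencies mined from movement statistics.
G = nx.Graph()
edges = [
    ("a", "b", 9), ("a", "c", 8), ("b", "c", 7),   # tightly coupled cluster
    ("d", "e", 9), ("d", "f", 8), ("e", "f", 7),   # another cluster
    ("c", "d", 1),                                  # weak inter-cluster link
]
G.add_weighted_edges_from(edges)

# Bisect so that heavy communication stays within a host (partition),
# which minimizes the cross-host communication cost.
part_a, part_b = kernighan_lin_bisection(G, weight="weight", seed=0)
print("host 1:", sorted(part_a))
print("host 2:", sorted(part_b))

# Communication cost = total weight of edges cut by the partition.
cut = sum(d["weight"] for u, v, d in G.edges(data=True)
          if (u in part_a) != (v in part_a))
print("cut weight (cross-host traffic):", cut)
```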

5.
Sun Yifan et al. 《统计研究》 (Statistical Research), 2019, 36(3): 124-128.
Identifying disease-causing genes from a large number of genes is an important high-dimensional statistical problem in the big-data setting. Because of the network structure among genes, the identification of disease genes has expanded from identifying single genes to identifying gene modules. Mining gene modules from a gene network is the so-called community detection (or node clustering) problem. Most community detection methods use only the network-structure information and ignore the information carried by the nodes themselves. In 2016, Newman and Clauset proposed a statistical-inference-based community detection method (the NC method) that organically combines the two. Taking the NC method as a case study, this article introduces the application of statistical methods to real gene networks and the results achieved, and proposes improvements from a statistical perspective. The analysis of the NC method shows that for unstructured data such as gene networks, statistical ideas and principles remain central to data analysis, while the corresponding statistical methods need to be adjusted and optimized for the characteristics of the data and the questions of interest.
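
The NC method itself combines structural likelihoods with node metadata and is not reproduced here; for reference, the sketch below runs purely structural community detection (greedy modularity in networkx) on a synthetic two-module graph, i.e. the kind of structure-only approach the NC method improves on:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Synthetic stand-in for a gene network: two planted modules plus noise edges.
G = nx.planted_partition_graph(l=2, k=10, p_in=0.6, p_out=0.05, seed=42)

# Structure-only community detection (the baseline the NC method extends
# by also modeling node metadata such as gene annotations).
communities = greedy_modularity_communities(G)
for i, com in enumerate(communities):
    print(f"module {i}: {sorted(com)}")
```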

6.
This study presents statistical techniques for obtaining local approximate query answers for aggregate multivariate materialized views, thus eliminating the need for repetitive scanning of the source data. In widely distributed management information systems, detailed data do not necessarily reside in the same physical location as the decision-maker, thus requiring scanning of the source data as demanded by the query. Decision making, business intelligence, and data analysis can involve multiple data sources, data diversity, aggregates, and large amounts of data. Management often confronts delays in acquiring information from remote sites, and management decisions usually involve analyses that require the most precise summary data available. These summaries are readily available from data warehouses and can be used to estimate or approximate data in exchange for a quicker response. An approach to supporting aggregate materialized view management is proposed that reconstructs data sets locally using posterior parameter estimates based on sufficient statistics in a log-linear model with a multinomial likelihood.
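
As a toy rendering of the reconstruction idea (my reading of the abstract, not the paper's exact procedure): under a multinomial likelihood the cell counts of the summary table are sufficient statistics, and with a conjugate Dirichlet prior the posterior mean cell probabilities let a local site answer queries approximately without rescanning the remote source. The table contents and prior setting below are invented:

```python
import numpy as np

# Summary (materialized view): counts for region x product-line cells.
# These counts are the sufficient statistics under a multinomial likelihood.
counts = np.array([[120, 30, 50],
                   [ 60, 90, 10]], dtype=float)
N = counts.sum()

# Conjugate Dirichlet(alpha) prior over cell probabilities; posterior mean
# p_ij = (n_ij + alpha) / (N + alpha * K) smooths the empirical proportions.
alpha = 1.0
K = counts.size
post_mean = (counts + alpha) / (N + alpha * K)

# Local approximate query answering: e.g. estimate P(region 0 | product 1)
# without going back to the remote detailed data.
p_region0_given_prod1 = post_mean[0, 1] / post_mean[:, 1].sum()
print("estimated P(region=0 | product=1):", round(p_region0_given_prod1, 3))

# Approximate an aggregate for a hypothetical future batch of 1000 records.
print("expected cell counts in next 1000 rows:\n", (1000 * post_mean).round(1))
```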

7.
For a specific government administrative department, data supply and data demand are often asymmetric. How can a statistical system be constructed in which the collection, processing, application, and release of the department's statistical data are all realized through one information system, satisfying both its external data supply and its internal data demand? Taking the Beijing Municipal Commission of Housing and Urban-Rural Development (北京市住建委) as an example, this paper gives four basic steps for designing the statistical system of a government administrative department: analyzing the administrative functions, constructing the basic framework of the statistical system, designing the specific statistical content module by module, and settling the reporting cycle, data sources, and transmission channels.

8.
Research on Methods for Developing China's Interregional Input-Output Models (cited 3 times: 0 self-citations, 3 by others)
This paper proposes a new model for estimating interregional trade coefficients, further studies and refines the methods and concrete steps for developing China's interregional input-output models, and constructs such models for 2002 and 2007. The structure of an interregional input-output model makes it difficult to obtain a relatively complete and accurate basis of survey statistics, so its development must closely combine survey data with scientific estimation methods. In developing the models, we combine the results of the National Bureau of Statistics' national input-output survey, which reflect inter-provincial inflows and outflows, with basic data such as transport-volume statistics, adopting a method that couples typical surveys with non-survey techniques; at the same time, we make full use of the information in each provincial table and use the national table as the aggregate control for balancing adjustments.
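
The estimation model for trade coefficients is not given in the abstract; the sketch below illustrates only the balancing step, applying the standard RAS (biproportional) adjustment to reconcile a rough interregional flow matrix with row and column totals taken as the national-table controls. All figures are invented:

```python
import numpy as np

def ras(initial, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Biproportional (RAS) balancing of a nonnegative matrix to given margins."""
    X = initial.astype(float).copy()
    for _ in range(max_iter):
        X *= (row_totals / X.sum(axis=1))[:, None]   # scale rows to margins
        X *= (col_totals / X.sum(axis=0))[None, :]   # scale columns to margins
        if np.abs(X.sum(axis=1) - row_totals).max() < tol:
            return X
    return X

# Invented example: rough interregional flow estimates from typical surveys
# and transport statistics, to be reconciled with national-table margins.
initial = np.array([[40.0, 10.0,  5.0],
                    [ 8.0, 50.0, 12.0],
                    [ 6.0,  9.0, 30.0]])
row_totals = np.array([60.0, 70.0, 40.0])   # outflows by region (control)
col_totals = np.array([55.0, 65.0, 50.0])   # inflows by region (control)

balanced = ras(initial, row_totals, col_totals)
print(balanced.round(2))
print("row sums:", balanced.sum(axis=1).round(2))
print("col sums:", balanced.sum(axis=0).round(2))
```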

9.
One main challenge for statistical prediction with data from multiple sources is that not all the associated covariate data are available for many sampled subjects. Consequently, new statistical methodology is needed to handle this type of "fragmentary data," which has become increasingly common in recent years. In this article, we propose a novel method based on frequentist model averaging that fits candidate models using all available covariate data. The weights in the model average are selected by delete-one cross-validation based on the data from complete cases. The optimality of the selected weights is rigorously proved under some conditions. The finite-sample performance of the proposed method is confirmed by simulation studies. An example of personal income prediction based on real data from a leading e-community for wealth management in China is also presented for illustration.
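
This is not the paper's estimator, but a minimal frequentist-model-averaging sketch in the same spirit: candidate OLS models use different covariate subsets (mimicking sources that observe different covariates), and simplex-constrained weights are chosen to minimize the delete-one cross-validation error, here computed on fully observed data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 120, 3
X = rng.standard_normal((n, p))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.standard_normal(n)

# Candidate models = different covariate subsets (invented for the example).
subsets = [[0], [0, 1], [0, 1, 2]]

def loo_predictions(Xs, y):
    """Delete-one CV predictions for OLS via the hat-matrix shortcut."""
    H = Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)
    fitted = H @ y
    # LOO prediction: y_i - e_i / (1 - h_ii), with e_i the in-sample residual.
    return y - (y - fitted) / (1.0 - np.diag(H))

# Matrix of delete-one predictions, one column per candidate model.
P = np.column_stack([loo_predictions(X[:, s], y) for s in subsets])

# Choose weights on the simplex minimizing the CV squared error.
obj = lambda w: np.sum((y - P @ w) ** 2)
cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
w0 = np.full(len(subsets), 1.0 / len(subsets))
res = minimize(obj, w0, bounds=[(0, 1)] * len(subsets), constraints=cons)
print("model-averaging weights:", res.x.round(3))
```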

10.
When a genetic algorithm (GA) is employed in a statistical problem, the result is affected both by sampling variability and by the stochastic elements of the algorithm. Both components should be controlled in order to obtain reliable results. In the present work we analyze parametric estimation problems tackled by GAs and pursue two objectives. The first is a formal variability analysis of the final estimates, showing that the variability can easily be decomposed into the two sources. The second introduces a framework for GA estimation with fixed computational resources, a form of the statistical and computational trade-off question that is crucial in recent problems; in this situation the result should be optimal from both the statistical and the computational point of view, considering the two sources of variability and the constraints on resources. Simulation studies illustrate the proposed method and the statistical and computational trade-off question.
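
A schematic of the decomposition, with a deliberately tiny GA and an invented problem: run the GA R times on each of B bootstrap datasets, then split the variance of the final estimate into a sampling component (between datasets) and an algorithmic component (within datasets) via the law of total variance:

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(loc=3.0, scale=1.0, size=200)   # the observed sample

def ga_estimate(sample, seed, pop=30, gens=40, sigma=0.3):
    """Tiny GA estimating a location parameter by minimizing squared loss."""
    g = np.random.default_rng(seed)
    vals = g.uniform(sample.min(), sample.max(), size=pop)
    for _ in range(gens):
        mse = np.array([np.mean((sample - v) ** 2) for v in vals])
        parents = vals[np.argsort(mse)[: pop // 2]]      # keep the fittest
        vals = g.choice(parents, size=pop) + g.normal(0.0, sigma, size=pop)
    mse = np.array([np.mean((sample - v) ** 2) for v in vals])
    return vals[np.argmin(mse)]

# R independent GA runs on each of B bootstrap resamples of the data.
B, R = 20, 10
estimates = np.empty((B, R))
for b in range(B):
    boot = rng.choice(data, size=len(data))          # bootstrap resample
    for r in range(R):
        estimates[b, r] = ga_estimate(boot, seed=1000 * b + r)

# Law of total variance (approximately, with finite B and R):
# total = Var(per-dataset means) + E[within-dataset variance].
sampling_var = estimates.mean(axis=1).var(ddof=1)
algorithmic_var = estimates.var(axis=1, ddof=1).mean()
print(f"sampling component:    {sampling_var:.5f}")
print(f"algorithmic component: {algorithmic_var:.5f}")
```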

11.
12.
This paper is concerned with the analysis of observations made on a system that is stimulated at fixed time intervals but where the precise nature and effect of any individual stimulus are unknown. The realized values are modelled as a stochastic process consisting of a random signal embedded in noise. The aim of the analysis is to use the data to unravel the unknown structure of the system and to ascertain the probabilistic behaviour of the stimuli. A method of parameter estimation based on quasi-profile likelihood is presented, and the statistical properties of the estimates are established while recognizing that there will be a discrepancy between the model and the true data-generating mechanism. A method of model validation and determination is also advanced, and kernel smoothing techniques are proposed as a basis for identifying the amplitude distribution of the stimuli. The data-processing techniques described have a direct application to the investigation of excitatory post-synaptic currents recorded from nerve cells in the central nervous system, and their use in quantal analysis of such data is illustrated.
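
The quasi-profile likelihood machinery is beyond a snippet, but the final kernel-smoothing step is easy to show in isolation. The sketch below applies a Gaussian KDE to simulated response amplitudes which, in quantal analysis, would be expected to pile up near integer multiples of the quantal size; all simulation settings are invented:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)

# Simulated response amplitudes: a mixture over 1..3 quanta of size q,
# blurred by measurement noise (a caricature of post-synaptic currents).
q = 10.0
n_quanta = rng.choice([1, 2, 3], size=400, p=[0.5, 0.3, 0.2])
amplitudes = n_quanta * q + rng.normal(0, 1.5, size=400)

# Kernel smoothing of the amplitude distribution; peaks near multiples of
# q are the signature looked for in quantal analysis.
kde = gaussian_kde(amplitudes)
grid = np.linspace(0, 40, 400)
density = kde(grid)
for k in (1, 2, 3):
    idx = np.argmin(np.abs(grid - k * q))
    print(f"density near {k}q: {density[idx]:.4f}")
```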

13.
Huang Hengjun 《统计研究》 (Statistical Research), 2019, 36(7): 3-12.
Big data has great potential in statistical production and can help build a high-quality statistical production system, but the characteristics of data sources that fit statistical production goals, and the associated data quality issues, remain to be clarified. Starting from the common ground between big-data sources and traditional statistical data sources, this paper discusses big-data sources for statistical production and their data-quality problems, and then explores the integrated application of big data and traditional statistical production. It first delimits, from the two aspects of the data-generation process and data characteristics, the big-data sources that can be used for statistical production; it then discusses the data-quality issues of big-data statistical production under a generalized data-quality framework, sorting out the quality-control points and quality defects in the big-data statistical production process; finally, based on the results of the data-quality analysis, it proposes an approach for constructing a statistical system that integrates big data into traditional surveys.

14.
Fan Xinyan et al. 《统计研究》 (Statistical Research), 2021, 38(2): 99-113.
Traditional credit scoring mainly uses statistical classification methods, which can predict only whether a borrower will default, not when the default will occur. The cure rate model is a mixture of binary classification and survival analysis; it can predict not only whether default will occur but also when, providing more information than traditional binary classification. In addition, with the development of big data there are more and more data sources, and multiple datasets can be collected for the same or similar tasks. This paper proposes an integrative cure rate model that fuses multi-source data, allowing several datasets to be modeled and their parameters estimated simultaneously. A composite penalty function performs two-level variable selection, both between and within groups, and encouraging the regression coefficients of the two sub-models to share the same signs improves the interpretability of the model. Numerical simulations show that the proposed method has clear advantages in both variable selection and parameter estimation. Finally, the method is applied to predicting the time of default for credit loans, where the model performs well.
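
The integrative multi-source model with composite penalties is out of scope for a snippet, but the underlying single-dataset cure rate model is simple to state: a logistic model for whether a borrower ever defaults, mixed with a survival model for when. The sketch below fits such a mixture (logistic incidence, exponential latency) by maximum likelihood on simulated loans; all parameters and data are invented:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(5)
n = 2000
x = rng.standard_normal(n)

# Simulate: default ever? (logistic incidence); if so, when (exponential).
true_b0, true_b1, true_lam = -0.5, 1.0, 0.2
will_default = rng.random(n) < expit(true_b0 + true_b1 * x)
t_event = rng.exponential(1 / true_lam, size=n)
t_censor = rng.uniform(0, 15, size=n)                 # end of observation
time = np.where(will_default, np.minimum(t_event, t_censor), t_censor)
event = will_default & (t_event <= t_censor)          # default observed

def neg_loglik(theta):
    b0, b1, log_lam = theta
    lam = np.exp(log_lam)
    pi = expit(b0 + b1 * x)                           # P(ever defaults | x)
    log_f = np.log(lam) - lam * time                  # exponential density
    surv = np.exp(-lam * time)                        # exponential survival
    ll_event = np.log(pi) + log_f                     # observed defaults
    ll_cens = np.log(1 - pi + pi * surv)              # cured, or not yet
    return -(ll_event[event].sum() + ll_cens[~event].sum())

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b0, b1, lam = res.x[0], res.x[1], np.exp(res.x[2])
print(f"incidence: b0={b0:.2f}, b1={b1:.2f} (true {true_b0}, {true_b1})")
print(f"latency rate: {lam:.3f} (true {true_lam})")
```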

15.
We present a mathematical theory of objective, frequentist chance phenomena that uses a set of probability measures as its model. In this work, sets of measures are viewed neither as a statistical compound hypothesis nor as a tool for modeling imprecise subjective behavior. Instead, we use sets of measures to model stable (although not stationary in the traditional stochastic sense) physical sources of finite time-series data that have highly irregular behavior. Such models give a coarse-grained picture of the phenomena, keeping track of the range of possible probabilities of the events. We present methods to simulate finite data sequences coming from a source modeled by a set of probability measures, and to estimate the model from finite time-series data. The estimation of the set of probability measures is based on the analysis of a set of relative frequencies of events taken along subsequences selected by a collection of rules. In particular, we provide a universal methodology for finding a family of subsequence selection rules that can estimate any set of probability measures with high probability.
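
A hedged illustration of the estimation idea (my construction, not the authors' methodology): simulate a stable but non-stationary binary source whose success probability switches between two values, then compute relative frequencies along subsequences picked out by a few simple selection rules; the spread of those frequencies estimates the range of probabilities that the set-of-measures model keeps track of:

```python
import numpy as np

rng = np.random.default_rng(11)
T = 50_000

# Stable but non-stationary source: the success probability switches
# between 0.3 and 0.7 every 1000 steps (invented for the illustration).
regime = (np.arange(T) // 1000) % 2
p = np.where(regime == 0, 0.3, 0.7)
xs = (rng.random(T) < p).astype(int)

# Subsequence selection rules: each rule picks out a subsequence of trials.
rules = {
    "all trials":  np.ones(T, dtype=bool),
    "after a 1":   np.r_[False, xs[:-1] == 1],
    "after a 0":   np.r_[False, xs[:-1] == 0],
    "even blocks": regime == 0,
    "odd blocks":  regime == 1,
}

# Relative frequencies along the selected subsequences; their spread is an
# estimate of the range of probabilities the set-of-measures model tracks.
freqs = {name: xs[sel].mean() for name, sel in rules.items()}
for name, f in freqs.items():
    print(f"{name:>12}: {f:.3f}")
print(f"estimated probability range: "
      f"[{min(freqs.values()):.3f}, {max(freqs.values()):.3f}]")
```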

16.
Causal probabilistic models have been suggested for representing diagnostic knowledge in expert systems. This paper describes the theoretical basis for, and the implementation of, an expert system based on causal probabilistic networks. The system includes model search for building the knowledge base, a shell for making the knowledge base available to users in consultation sessions, and a user interface. It contains facilities for storing knowledge and propagating new knowledge, and mechanisms for building the knowledge base by semi-automated analysis of a large sparse contingency table. The contingency table contains data acquired for patients in the same diagnostic category as the intended application area of the expert system. The knowledge base is created by combining expert knowledge with a statistical model search, in a model-conversion scheme based on the theory developed by Lauritzen & Spiegelhalter and using exact tests as suggested by Kreiner. The system is implemented on a PC and has been used to simulate the diagnostic value of additional clinical information for coronary artery disease patients under consideration for referral to coronary arteriography.
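
Model search and Lauritzen & Spiegelhalter propagation are out of scope for a snippet, but the core operation, propagating new findings through a causal probabilistic network, can be shown on a toy diagnosis network with plain numpy. The network (one disease node, two conditionally independent tests) and all probabilities are invented:

```python
import numpy as np

# Toy network: Disease -> Test1, Disease -> Test2 (tests independent given D).
p_d = np.array([0.1, 0.9])                 # P(D = present, absent)
p_t1_d = np.array([[0.9, 0.2],             # P(T1 | D); rows: pos/neg, cols: D
                   [0.1, 0.8]])
p_t2_d = np.array([[0.7, 0.1],             # P(T2 | D); same layout
                   [0.3, 0.9]])

# Joint distribution P(T1, T2, D) built from the conditional tables.
joint = p_d[None, None, :] * p_t1_d[:, None, :] * p_t2_d[None, :, :]

def posterior_disease(t1=None, t2=None):
    """Condition the joint on observed test results and marginalize to D."""
    j = joint
    if t1 is not None:
        j = j[t1:t1 + 1, :, :]
    if t2 is not None:
        j = j[:, t2:t2 + 1, :]
    p = j.sum(axis=(0, 1))
    return p / p.sum()

print("prior:              P(D=present) =", posterior_disease()[0].round(3))
print("T1 positive:        P(D=present) =", posterior_disease(t1=0)[0].round(3))
print("T1 and T2 positive: P(D=present) =",
      posterior_disease(t1=0, t2=0)[0].round(3))
```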

17.
Dual-record system estimation has been widely used in the past to estimate vital events. Its use became limited because of the weakness of the model's statistical assumptions, as well as the biases of the estimators. The estimators proposed here for dual-record systems are based on a further subdivision of the cells of the original table. The results show that they reduce the underestimation of total counts compared with the classical Chandra Sekar-Deming estimator.
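
The classical Chandra Sekar-Deming (dual-system) estimate is n1*n2/m, where n1 and n2 are the event counts from the two systems and m is the number of matched events. The sketch below computes the pooled estimate and a cell-subdivided variant (shown here simply as stratification, a stand-in for the finer subdivision the abstract alludes to); all counts are invented:

```python
# Dual-record (capture-recapture) data: events found by two systems.
def csd_estimate(n1, n2, m):
    """Classical Chandra Sekar-Deming estimate of the total event count."""
    return n1 * n2 / m

# Pooled estimate over the whole table (invented counts).
print("pooled estimate:", round(csd_estimate(900, 800, 600), 1))

# Subdividing the table (e.g., by region) and estimating within each cell
# relaxes the homogeneous-capture assumption that biases the pooled figure.
strata = [
    {"n1": 500, "n2": 300, "m": 200},   # stratum with low system-2 coverage
    {"n1": 400, "n2": 500, "m": 400},   # stratum with high coverage
]
stratified = sum(csd_estimate(s["n1"], s["n2"], s["m"]) for s in strata)
print("stratified estimate:", round(stratified, 1))
```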

18.
Epidemic surveillance in a community involves monitoring infection trends, triggering alarms before outbreaks, and identifying sources and paths of disease transmission. Algorithms for outbreak detection derived from industrial statistical process control (SPC) and from scan statistics have been reported in the literature, but relatively few methods have been reported for identifying transmission paths. In this work, we propose an expanded spatial-temporal (EST) model for identifying infection sources. Three dimensions of information (subject, location, and time) are expanded into a two-dimensional space by dividing the time horizon into segments and crossing each segment with the locations. Based on the EST model, we further propose a variable-selection algorithm to identify potential location/time combinations as sources of infection, and thus achieve diagnosis. Numerical simulations show that the proposed scheme is effective in locating infection sources.
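
The variable-selection step can be caricatured with a lasso (my choice for the sketch; the abstract does not name the selector): expand location × time-segment into indicator covariates, regress infection burden on them, and read the nonzero coefficients as candidate source/time combinations. The simulation is invented and far simpler than the authors' EST model:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n_loc, n_seg = 10, 8
p = n_loc * n_seg                       # expanded location x time space

# Each observation: exposure indicators over the expanded space (invented).
n = 400
X = rng.binomial(1, 0.1, size=(n, p)).astype(float)

# True sources: (location 3, segment 2) and (location 7, segment 5).
beta = np.zeros(p)
beta[3 * n_seg + 2] = 2.0
beta[7 * n_seg + 5] = 1.5
y = X @ beta + 0.3 * rng.standard_normal(n)   # infection burden per subject

# Sparse selection over location/time indicators.
lasso = Lasso(alpha=0.05).fit(X, y)
for j in np.flatnonzero(np.abs(lasso.coef_) > 0.1):
    loc, seg = divmod(j, n_seg)
    print(f"candidate source: location {loc}, time segment {seg}, "
          f"coef {lasso.coef_[j]:.2f}")
```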

19.
Tests of sharp null hypotheses, although frequently computed, are rarely appropriate as the major end product of statistical analyses, except possibly in some areas of the natural sciences. But procedures closely akin to tests, although often less formal, are needed in almost every investigation in which exploratory data analysis is used to help decide upon the statistical model appropriate for the final analysis. The term "diagnostic check" has been suggested by Box and Jenkins for these procedures. Traditional statistical tests often suggest useful diagnostic checks (and this, in my view, is what tests are mainly good for), but visual examination and interpretation of data plots are often equally important. There is also much to be gained by the development of new diagnostic checks, and testing theory may be useful as one guide to this development.

20.
This article applies stochastic ideas to reasoning about association rule mining and provides a formal statistical view of the discipline. A simple stochastic model is proposed, under which support and confidence are reasonable estimates of certain probabilities of the model. Statistical properties of the corresponding estimators, such as moments and confidence intervals, are derived, and items and itemsets are examined for correlations. After a brief review of measures of interest of association rules, with the main focus on interestingness measures motivated by statistical principles, two new measures are described. These measures, called α-precision and σ-precision, rely on statistical properties of the estimators discussed before. Experimental results demonstrate the effectiveness of both measures.
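
The statistical view of support and confidence can be made concrete: both are proportions, so standard binomial interval methods apply. The sketch below computes a rule's support and confidence from simulated transactions together with Wilson confidence intervals; the α- and σ-precision measures themselves are not reproduced here, and all data are invented:

```python
import numpy as np
from scipy.stats import norm

def wilson_ci(k, n, level=0.95):
    """Wilson score interval for a binomial proportion k/n."""
    z = norm.ppf(0.5 + level / 2)
    p = k / n
    center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return center - half, center + half

# Invented transactions; rule under study: {bread} -> {butter}.
rng = np.random.default_rng(4)
n = 5000
bread = rng.random(n) < 0.4
butter = np.where(bread, rng.random(n) < 0.6, rng.random(n) < 0.2)

k_ante = bread.sum()                 # transactions with the antecedent
k_both = (bread & butter).sum()      # ... and the consequent as well

support = k_both / n                 # estimates P(bread and butter)
confidence = k_both / k_ante         # estimates P(butter | bread)
lo, hi = wilson_ci(k_both, n)
print(f"support    = {support:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
lo, hi = wilson_ci(k_both, k_ante)
print(f"confidence = {confidence:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```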
