Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
The 21st century will be a century of strategic competition among nations in high technology. Information technology lies at its core, and the engineering of statistical techniques will provide key high-technology support for scientific and technological development in the 21st century. In the mid-20th century, 6σ statistical management techniques helped carry Japan into the ranks of the economically developed powers. Early in the 21st century, can China likewise join the ranks of the world's developed countries by applying statistical engineering techniques? Through a technical analysis of statistical engineering, this article offers the author's own understanding of how the statistical knowledge system can be renewed, and presents new views on the relationship between statistical engineering and future scientific and technological development, in particular on statistics education in the 21st century.

2.
Real-world applications of association rule mining have the well-known problem of discovering a large number of rules, many of which are not interesting or useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of rules discovered and the complexity of the task, but the implications of their use, and the important differences in generalization power, precision and recall when they are used for classification, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided both as a whole and with respect to individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant and contradictive rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, using frequent, closed and maximal itemsets for the classification task, and the effect of incorporating statistical/heuristic measures when optimizing such rule sets. With closed itemset mining already a preferred choice for complexity and redundancy reduction during rule generation, this study further confirms that closed-itemset-based association rules are also of better quality in terms of overall classification precision and recall, and precision and recall on individual class examples. Maximal-itemset-based association rules, which form a subset of the closed-itemset-based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also expose the pitfall of applying the confidence measure at the start of rule generation, as is typically done within the association rule framework: removing rules that fall below a confidence threshold also removes the knowledge that contradictions to the relatively higher-confidence rules exist in the data, so precision can be increased by discarding contradictive rules before the confidence constraint is applied.
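To make the distinction between frequent, closed and maximal itemsets concrete, the brute-force sketch below enumerates all three families on a tiny invented dataset with an assumed absolute support threshold; it is only a toy illustration, not the mining algorithms evaluated in the paper.

```python
from itertools import combinations

# Toy brute-force illustration of frequent, closed and maximal itemsets.
# Transactions and the support threshold are invented; real miners use
# Apriori/FP-growth-style algorithms rather than enumerating the power set.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"b"},
]
min_support = 2  # absolute support count

items = sorted(set().union(*transactions))
support = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = frozenset(cand)
        count = sum(1 for t in transactions if s <= t)
        if count >= min_support:
            support[s] = count

frequent = set(support)
# Closed: frequent itemsets with no proper superset of identical support.
closed = {s for s in frequent
          if not any(s < t and support[t] == support[s] for t in frequent)}
# Maximal: frequent itemsets with no frequent proper superset (a subset of closed).
maximal = {s for s in frequent if not any(s < t for t in frequent)}

print("frequent:", sorted(map(sorted, frequent)))
print("closed:  ", sorted(map(sorted, closed)))
print("maximal: ", sorted(map(sorted, maximal)))
```

Every maximal itemset is closed and every closed itemset is frequent; in this toy example the 7 frequent itemsets reduce to 4 closed and a single maximal itemset, which mirrors the reduction in rule volume discussed above.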

3.
In recent years, several expert systems have been developed for practical applications of applied statistical methodology. Existing expert systems in statistics have explored several areas, e.g. the determination of appropriate statistical tests, regression analysis, and determination of the ‘best’ experimental design for industrial screening experiments. We present here the DESIGN EXPERT, a prototype expert system for the design of complex statistical experiments. It is intended for scientific investigators and statisticians who must design and analyze complex experiments, e.g. multilevel medical experiments with nested factors, repeated measures, and both fixed and random effects. This system is ‘expert’ in the sense that it is capable of the following: (i) recognizing specific types of complex experimental designs, based on the application of inference rules to non-technical information supplied by the user; (ii) encoding the obtained and inferred information in a flexible, general-purpose internal representation for use by other program modules; (iii) generating analysis of variance tables for the recognized design and an appropriate BMDP runfile for data analysis, using the encoded information. DESIGN EXPERT is capable of recognizing randomized block designs, including lattice designs within embedded Latin squares, cross-over designs, split plots, nesting, repeated measures and covariates. It is written in an experimental programming language developed specifically for research in artificial intelligence.

4.
This article describes Online Information Exchange (ONIX) for Serials, a set of XML-based messages parallel to the ONIX for Books format described in an earlier “Standards Update.” These messages are being developed by a joint working party of National Information Standards Organization (NISO) and EDItEUR, the European EDI organization that oversees ONIX development and maintenance. The article describes the messages, their development, and some of the applications in which it is envisioned the messages will be used.  相似文献   

5.
This article describes Online Information Exchange (ONIX) for Serials, a set of XML-based messages parallel to the ONIX for Books format described in an earlier “Standards Update.” These messages are being developed by a joint working party of National Information Standards Organization (NISO) and EDItEUR, the European EDI organization that oversees ONIX development and maintenance. The article describes the messages, their development, and some of the applications in which it is envisioned the messages will be used.  相似文献   

6.
When historical data are available, incorporating them in an optimal way into the current data analysis can improve the quality of statistical inference. In Bayesian analysis, one can achieve this by using quality-adjusted priors of Zellner, or using power priors of Ibrahim and coauthors. These rules are constructed by raising the prior and/or the sample likelihood to some exponent values, which act as measures of compatibility of their quality or proximity of historical data to current data. This paper presents a general, optimum procedure that unifies these rules and is derived by minimizing a Kullback–Leibler divergence under a divergence constraint. We show that the exponent values are directly related to the divergence constraint set by the user and investigate the effect of this choice theoretically and also through sensitivity analysis. We show that this approach yields ‘100% efficient’ information processing rules in the sense of Zellner. Monte Carlo experiments are conducted to investigate the effect of historical and current sample sizes on the optimum rule. Finally, we illustrate these methods by applying them on real data sets.  相似文献   
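For orientation, a commonly quoted form of the power prior of Ibrahim and co-authors is sketched below in our own notation (historical data $D_0$, initial prior $\pi_0$, discounting exponent $a_0$); the paper's unified rule generalises how such exponents are set through a Kullback–Leibler divergence constraint.

\[
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\,\pi_0(\theta), \qquad 0 \le a_0 \le 1,
\]
\[
\pi(\theta \mid D, D_0, a_0) \;\propto\; L(\theta \mid D)\, L(\theta \mid D_0)^{a_0}\, \pi_0(\theta),
\]

so that $a_0 = 0$ ignores the historical data entirely and $a_0 = 1$ pools it fully with the current sample $D$.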

7.
Causal probabilistic models have been suggested for representing diagnostic knowledge in expert systems. This paper describes the theoretical basis for and the implementation of an expert system based on causal probabilistic networks. The system includes model search for building the knowledge base, a shell for making the knowledge base available for users in consultation sessions, and a user interface. The system contains facilities for storing knowledge and propagating new knowledge, and mechanisms for building the knowledge base by semi-automated analysis of a large sparse contingency table. The contingency table contains data acquired for patients in the same diagnostic category as the intended application area of the expert system. The knowledge base of the expert system is created by combining expert knowledge and a statistical model search in a model conversion scheme based on a theory developed by Lauritzen & Spiegelhalter and using exact tests as suggested by Kreiner. The system is implemented on a PC and has been used to simulate the diagnostic value of additional clinical information for coronary artery disease patients under consideration for being referred to coronary arteriography.  相似文献   
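As background (standard Bayesian-network notation, not specific to this system), a causal probabilistic network over variables $X_1,\ldots,X_p$ represents the joint distribution through the factorization

\[
P(x_1,\ldots,x_p) \;=\; \prod_{j=1}^{p} P\bigl(x_j \mid \mathrm{pa}(x_j)\bigr),
\]

where $\mathrm{pa}(x_j)$ denotes the parents of $X_j$ in the directed acyclic graph; the Lauritzen & Spiegelhalter scheme mentioned above propagates evidence exactly in such a factorized model.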

8.
The estimand framework included in the addendum to the ICH E9 guideline facilitates discussions to ensure alignment between the key question of interest, the analysis, and interpretation. Therapeutic knowledge and drug mechanism play a crucial role in determining the strategy and defining the estimand for clinical trial designs. Clinical trials in patients with hematological malignancies often present unique challenges for trial design due to complexity of treatment options and existence of potential curative but highly risky procedures, for example, stem cell transplant or treatment sequence across different phases (induction, consolidation, maintenance). Here, we illustrate how to apply the estimand framework in hematological clinical trials and how the estimand framework can address potential difficulties in trial result interpretation. This paper is a result of a cross-industry collaboration to connect the International Conference on Harmonisation (ICH) E9 addendum concepts to applications. Three randomized phase 3 trials will be used to consider common challenges including intercurrent events in hematologic oncology trials to illustrate different scientific questions and the consequences of the estimand choice for trial design, data collection, analysis, and interpretation. Template language for describing estimand in both study protocols and statistical analysis plans is suggested for statisticians' reference.  相似文献   

9.
This paper overviews some recent developments in panel data asymptotics, concentrating on the nonstationary panel case and gives a new result for models with individual effects. Underlying recent theory are asymptotics for multi-indexed processes in which both indexes may pass to infinity. We review some of the new limit theory that has been developed, show how it can be applied and give a new interpretation of individual effects in nonstationary panel data. Fundamental to the interpretation of much of the asymptotics is the concept of a panel regression coefficient which measures the long run average relation across a section of the panel. This concept is analogous to the statistical interpretation of the coefficient in a classical regression relation. A variety of nonstationary panel data models are discussed and the paper reviews the asymptotic properties of estimators in these various models. Some recent developments in panel unit root tests and stationary dynamic panel regression models are also reviewed.  相似文献   
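As a generic illustration of the setting (our notation, not necessarily the paper's exact formulation), consider a panel regression with individual effects,

\[
y_{it} = \alpha_i + \beta\, x_{it} + u_{it}, \qquad i = 1,\ldots,n, \quad t = 1,\ldots,T,
\]

with the pooled within-group estimator

\[
\hat{\beta} \;=\; \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} (x_{it}-\bar{x}_i)(y_{it}-\bar{y}_i)}{\sum_{i=1}^{n}\sum_{t=1}^{T} (x_{it}-\bar{x}_i)^{2}},
\]

whose behaviour is studied under limits in which $n$, $T$, or both pass to infinity; in the nonstationary case $\beta$ is then read as a long-run average relation across the cross-section of the panel.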

10.
Nonstationary panel data analysis: an overview of some recent developments   (Total citations: 2; self-citations: 0; citations by others: 2)
This paper overviews some recent developments in panel data asymptotics, concentrating on the nonstationary panel case and gives a new result for models with individual effects. Underlying recent theory are asymptotics for multi-indexed processes in which both indexes may pass to infinity. We review some of the new limit theory that has been developed, show how it can be applied and give a new interpretation of individual effects in nonstationary panel data. Fundamental to the interpretation of much of the asymptotics is the concept of a panel regression coefficient which measures the long run average relation across a section of the panel. This concept is analogous to the statistical interpretation of the coefficient in a classical regression relation. A variety of nonstationary panel data models are discussed and the paper reviews the asymptotic properties of estimators in these various models. Some recent developments in panel unit root tests and stationary dynamic panel regression models are also reviewed.  相似文献   

11.
Statistical database management systems keep raw, elementary and/or aggregated data and include query languages with facilities to calculate various statistics from this data. In this article we examine statistical database query languages with respect to the criteria identified and taxonomy developed in Ozsoyoglu and Ozsoyoglu (1985b). The criteria include statistical metadata and objects, aggregation features and interface to statistical packages. The taxonomy of statistical database query languages classifies them with respect to the data model used, the type of user interface and method of implementation. Temporal databases are rich sources of data for statistical analysis. Aggregation features of temporal query languages, as well as the issues in calculating aggregates from temporal data, are also examined.  相似文献   

12.
赵君明. 《统计研究》(Statistical Research), 1998, 15(3): 49-52.
This paper compares the similarities and differences between quantification analysis and fuzzy quantification analysis. Owing to space limitations, only types I and II of quantification analysis and fuzzy quantification analysis are discussed here.

13.
Under given concrete exogenous conditions, the fraction of identifiable records in a microdata file without positive identifiers such as name and address is estimated. The effect of possible noise in the data, as well as the sample property of microdata files, is taken into account. Using real microdata files, it is shown that there is no risk of disclosure if the information content of characteristics known to the investigator (additional knowledge) is limited. Files with additional knowledge of large information content yield a high risk of disclosure. This can be eliminated only by massive modifications of the data records, which, however, involve large biases for complex statistical evaluations. In this case, the requirement for privacy protection and high-quality data perhaps may be fulfilled only if the linkage of such files with extensive additional knowledge is prevented by appropriate organizational and legal restrictions.  相似文献   

14.
Nonparametric estimators of component and system life distributions are developed and presented for situations where recurrent competing risks data from series systems are available. The use of recurrences of components’ failures leads to improved efficiencies in statistical inference, thereby leading to resource-efficient experimental or study designs or improved inferences about the distributions governing the event times. Finite and asymptotic properties of the estimators are obtained through simulation studies and analytically. The detrimental impact of parametric model misspecification is also vividly demonstrated, lending credence to the virtue of adopting nonparametric or semiparametric models, especially in biomedical settings. The estimators are illustrated by applying them to a data set pertaining to car repairs for vehicles that were under warranty.  相似文献   
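For context, in a series system the system fails as soon as any one of its components fails; if, purely for illustration, the component lifetimes $T_j$ are assumed independent, then

\[
T_{\mathrm{sys}} \;=\; \min_j T_j, \qquad S_{\mathrm{sys}}(t) \;=\; \prod_j S_j(t),
\]

and each observed system failure also identifies the component responsible, which is the competing-risks structure that the nonparametric estimators above exploit.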

15.
Priors elicited according to maximal entropy rules have been used for years in objective and subjective Bayesian analysis. However, when the prior knowledge remains fuzzy or dubious, they often suffer from impropriety, which can make them inconvenient to use. In this article we suggest the formal elicitation of an encompassing family for the standard maximal entropy (ME) priors and the maximal data information (MDI) priors, which can yield proper families. An interpretation is given in the objective framework of channel coding. In a subjective framework, the performance of the method is shown in a reliability context, where flat but proper priors are elicited for Weibull lifetime distributions. Such priors appear to be practical tools for sensitivity studies.
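For reference, the two baseline families being encompassed can be written, in standard textbook notation of our choosing, as Zellner's maximal data information prior and a moment-constrained maximal entropy prior,

\[
\pi_{\mathrm{MDI}}(\theta) \;\propto\; \exp\!\left\{ \int f(x \mid \theta)\,\ln f(x \mid \theta)\,dx \right\},
\qquad
\pi_{\mathrm{ME}}(\theta) \;\propto\; \exp\!\left\{ \sum_{k} \lambda_k\, g_k(\theta) \right\},
\]

where the multipliers $\lambda_k$ enforce the chosen moment constraints; for lifetime models such as the Weibull these forms are frequently improper, which is the difficulty the encompassing family is designed to address.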

16.
The purpose of assessing adverse events (AEs) in clinical studies is to evaluate what AE patterns are likely to occur during treatment. In contrast, it is difficult to specify which of these patterns occurs in each patient. To tackle this challenging issue, we constructed a new statistical model including nonnegative matrix factorization by incorporating background knowledge of AE-specific structures such as severity and drug mechanism of action. The model uses a meta-analysis framework for integrating data from multiple clinical studies because insufficient information is derived from a single trial. We demonstrated the proposed method by applying it to real data consisting of three Phase III studies, two mechanisms of action, five anticancer treatments, 3317 patients, 848 AE types, and 99,546 AEs. The extracted typical treatment-specific AE patterns coincided with medical knowledge. We also demonstrated patient-level safety profiles using the data of AEs that were observed by the end of the second cycle.  相似文献   
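The sketch below shows plain nonnegative matrix factorization on a hypothetical patient-by-AE-type count matrix using scikit-learn; it illustrates only the factorization step, not the paper's structured model that incorporates severity, mechanism of action and the meta-analytic combination of studies. All dimensions and data are invented.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical patient-by-AE-type count matrix (rows: patients, columns: AE types).
# Plain NMF sketch only -- not the paper's structured, meta-analytic model.
rng = np.random.default_rng(0)
X = rng.poisson(lam=0.3, size=(200, 50)).astype(float)

n_patterns = 4  # assumed number of latent AE patterns
model = NMF(n_components=n_patterns, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)   # (200, 4): patient loadings on each AE pattern
H = model.components_        # (4, 50): each row is an AE pattern over AE types

# A crude patient-level safety profile: the pattern that dominates each patient.
dominant_pattern = W.argmax(axis=1)
print(W.shape, H.shape, dominant_pattern[:10])
```

Each row of H can be read as a candidate AE pattern and each row of W as a patient's loading on those patterns, which is the sense in which patient-level safety profiles are obtained.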

17.
This work presents an optimal value to be used in the power transformation to transform the exponential to normality for statistical process control (SPC) applications. The optimal value is found by minimizing the sum of absolute differences between two distinct cumulative probability functions. Based on this criterion, a numerical search yields a proposed value of 3.5142, so the transformed distribution is well approximated by the normal distribution. Two examples are presented to demonstrate the effectiveness of using the transformation method and its applications in SPC. The transformed data are almost normally distributed and the performance of the individual charts is satisfactory. Compared to charts that use the original exponential data and probability control limits, the individual charts constructed using the transformed distribution are superior in appearance, ease of interpretation and implementation by practitioners.  相似文献   
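A minimal sketch of how such a transformation would be applied in practice is given below; we read the abstract as using Y = X**(1/3.5142), but the exact role of the reported constant should be checked against the paper, and the exponential sample is simulated rather than real process data.

```python
import numpy as np
from scipy import stats

# Sketch of the power-transformation idea: raise exponential data to a power so
# that the result is approximately normal. The constant 3.5142 is taken from the
# abstract; its use as a reciprocal exponent here is our assumption.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)

y = x ** (1.0 / 3.5142)

# Compare skewness and a normality test before and after transforming.
for label, data in [("original", x), ("transformed", y)]:
    stat, p = stats.normaltest(data)
    print(f"{label:11s} skew={stats.skew(data):+.3f}  normaltest p-value={p:.3g}")
```

The transformed sample should show skewness close to zero, so conventional individuals-chart limits computed on the transformed scale become reasonable to use, as the abstract describes.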

18.
The paper gives a review of a number of data models for aggregate statistical data which have appeared in the computer science literature in the last ten years. After a brief introduction to the data model in general, the fundamental concepts of statistical data are introduced. These are called statistical objects because they are complex data structures (vectors, matrices, relations, time series, etc.) which may have different possible representations (e.g. tables, relations, vectors, pie-charts, bar-charts, graphs, and so on). For this reason a statistical object is defined by two different types of attribute (a summary attribute, with its own summary type and with its own instances, called summary data, and the set of category attributes, which describe the summary attribute). Some conceptual models of statistical data (CSM, SDM4S), some semantic models of statistical data (SCM, SAM*, OSAM*), and some graphical models of statistical data (SUBJECT, GRASS, STORM) are also discussed.

19.
In many clinical research applications, the time to occurrence of one event of interest, which may be obscured by another (so-called competing) event, is investigated. Specific interventions can only have an effect on the endpoint they address, or research questions might focus on risk factors for a certain outcome. Different approaches for the analysis of time-to-event data in the presence of competing risks were introduced in the last decades, including some new methodologies which are not yet frequently used in the analysis of competing risks data. Cause-specific hazard regression, subdistribution hazard regression, mixture models, vertical modelling and the analysis of time-to-event data based on pseudo-observations are described in this article and are applied to a dataset from a cohort study intended to establish risk stratification for cardiac death after myocardial infarction. Data analysts are encouraged to use the appropriate methods for their specific research questions by comparing different regression approaches in the competing risks setting with regard to assumptions, methodology and interpretation of the results. Notes on applying the mentioned methods in the statistical software R are presented, and extensions to the presented standard methods proposed in the statistical literature are mentioned.
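For reference, the two quantities that separate cause-specific from subdistribution (Fine and Gray) modelling can be written, in standard notation, as the cause-specific hazard and the cumulative incidence function:

\[
\lambda_k(t) \;=\; \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\; D = k \mid T \ge t)}{\Delta t},
\qquad
F_k(t) \;=\; P(T \le t,\; D = k) \;=\; \int_0^t \lambda_k(u)\, S(u)\, du,
\]

where $D$ labels the event type and $S(t) = \exp\{-\sum_k \int_0^t \lambda_k(u)\,du\}$ is overall event-free survival; cause-specific hazard regression models $\lambda_k$ directly, while subdistribution hazard regression models a hazard attached to $F_k$.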

20.
Wavelets are a commonly used tool in science and technology. Often, their use involves applying a wavelet transform to the data, thresholding the coefficients and applying the inverse transform to obtain an estimate of the desired quantities. In this paper, we argue that it is often possible to gain more insight into the data by producing not just one, but many wavelet reconstructions using a range of threshold values and analysing the resulting object, which we term the Time–Threshold Map (TTM) of the input data. We discuss elementary properties of the TTM, in its “basic” and “derivative” versions, using both Haar and Unbalanced Haar wavelet families. We then show how the TTM can help in solving two statistical problems in the signal + noise model: breakpoint detection, and estimating the longest interval of approximate stationarity. We illustrate both applications with examples involving volatility of financial returns. We also briefly discuss other possible uses of the TTM.  相似文献   
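The core sweep behind such an object can be sketched in a few lines with PyWavelets: the code below produces one hard-thresholded Haar reconstruction per threshold value and stacks them by threshold. It is only an illustration of the sweep on an invented noisy step signal, not the paper's definition of the Time–Threshold Map or its basic and derivative versions.

```python
import numpy as np
import pywt

# Sweep of hard-thresholded Haar reconstructions over a grid of thresholds.
# Invented noisy step signal with a single breakpoint in the middle.
rng = np.random.default_rng(2)
n = 256
signal = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
data = signal + 0.3 * rng.standard_normal(n)

coeffs = pywt.wavedec(data, "haar")
reconstructions = []
for thr in np.linspace(0.0, 2.0, 40):
    # Keep the approximation coefficients, hard-threshold the detail levels.
    thresholded = [coeffs[0]] + [pywt.threshold(c, thr, mode="hard") for c in coeffs[1:]]
    reconstructions.append(pywt.waverec(thresholded, "haar"))

ttm_like = np.vstack(reconstructions)  # rows: threshold values, columns: time
print(ttm_like.shape)                  # (40, 256): one reconstruction per threshold
```

Reading down a column of this array shows how the reconstructed value at one time point evolves as the threshold grows, which is the kind of information the TTM is designed to expose for breakpoint detection.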
