Similar Literature
Found 20 similar documents (search time: 93 ms)
1.
Discretization of continuous attributes plays a very important role in data mining. Approaching the problem from the standpoint of independence, this paper proposes a discretization method for continuous attributes based on a likelihood-ratio hypothesis test; the method effectively avoids the limitations of the Chi2 algorithm and performs well in statistical simulations.
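A likelihood-ratio alternative to Chi2-style interval merging can be sketched as a generic ChiMerge-style bottom-up procedure in which the chi-square statistic is replaced by the G (likelihood-ratio) statistic. This is an illustrative sketch, not the paper's actual algorithm; all names and the merging loop are assumptions.

```python
import math

def g_statistic(a, b):
    """Likelihood-ratio (G) statistic for independence between the
    class-count vectors a and b of two adjacent intervals."""
    total = sum(a) + sum(b)
    g = 0.0
    for row in (a, b):
        rs = sum(row)
        for j, obs in enumerate(row):
            exp = rs * (a[j] + b[j]) / total
            if obs > 0 and exp > 0:
                g += 2.0 * obs * math.log(obs / exp)
    return g

def lr_merge(intervals, threshold):
    """Bottom-up merging: repeatedly merge the adjacent pair whose
    G statistic is smallest, as long as it stays below the threshold
    (i.e. the pair looks independent of the class label)."""
    intervals = [list(c) for c in intervals]
    while len(intervals) > 1:
        stats = [g_statistic(intervals[i], intervals[i + 1])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break
        merged = [x + y for x, y in zip(intervals[i], intervals[i + 1])]
        intervals[i:i + 2] = [merged]
    return intervals
```

With threshold 3.84 (the 5% chi-square critical value for 1 degree of freedom), two intervals with identical class proportions merge immediately, while an interval with a clearly different class distribution survives as its own bin.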

2.
Discretization of continuous attributes plays an important role in artificial intelligence and data mining. This paper uses an impurity-reduction index to quantify the homogeneity within intervals and the differences between intervals after discretization, and proposes a discretization method based on this index. By adjusting a minimum impurity-reduction threshold and building a binary tree to determine the cut points, the method can effectively search for a globally optimal discretization scheme. Simulation experiments verify the advantages of this method over traditional approaches.
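A threshold-controlled, binary-tree style discretizer of the kind described above can be sketched with Gini impurity standing in for the paper's impurity measure (an assumption; the paper may define impurity differently). Splitting recurses while the impurity reduction exceeds the threshold, and the collected cut points define the intervals.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(pairs):
    """Return (reduction, cut) maximizing impurity reduction over all
    candidate cut points; pairs must be sorted by attribute value."""
    n = len(pairs)
    parent = gini([y for _, y in pairs])
    best = (0.0, None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        red = parent - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        if red > best[0]:
            best = (red, cut)
    return best

def discretize(values, labels, min_reduction):
    """Recursively split while the impurity reduction exceeds the
    threshold, collecting the cut points of the implicit binary tree."""
    pairs = sorted(zip(values, labels))
    cuts = []

    def recurse(chunk):
        red, cut = best_split(chunk)
        if cut is None or red < min_reduction:
            return
        cuts.append(cut)
        recurse([p for p in chunk if p[0] < cut])
        recurse([p for p in chunk if p[0] >= cut])

    recurse(pairs)
    return sorted(cuts)
```

On a toy sample with two cleanly separated classes, a single cut point midway between the groups is found and the recursion stops because further splits reduce impurity by nothing.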

3.
Rough set theory provides a new, more objective analytical framework for handling vague, incomplete, and uncertain knowledge; its drawback is that the values in a decision table must be discrete. This paper first introduces rough set theory and its basic concepts; it then presents a continuous-attribute discretization method based on fuzzy C-means clustering, and combines the two into a comprehensive evaluation method. Finally, 20 listed companies are used as an empirical sample to evaluate their financial condition.

4.
In practical decision making, decision makers often face uncertain environments and cannot obtain precise attribute values in advance. Stochastic multi-attribute decision making, in which the attribute values are random variables, is a very important type of multi-attribute decision making under uncertainty. According to the type of random variable, stochastic multi-attribute decision making can likewise be divided into two cases: discrete and continuous. Research on the discrete case remains scarce. This paper considers discrete stochastic multi-attribute…

5.
On the basis of generating a training set by clustering, preprocessing that training set with rough-set discretization can further improve classification accuracy. This paper clusters the original samples with the PAM algorithm to form a training set, discretizes it with an algorithm combining Boolean logic and rough set theory, and trains a classifier on the discretized training set. Experimental results show that, on the same data sets, this method improves classification accuracy by up to 15.5% over the RDDTE method based on PAM preprocessing alone, while using a smaller training set.

6.
Most current methods for obtaining attribute weights require decision makers to score attributes directly, compare them pairwise, or rank them; this requires the number of attributes to be small and the decision makers to understand each attribute well. Under the "take the largest" rule, the frequency-based method instead asks each decision maker to pick the single most important attribute and derives the weights from the pick frequencies. This paper first treats attribute values as random variables and defines a weight formula via probability expectations; it then uses results from discrete choice models to derive a conversion expression between attribute frequencies and weights, which is substituted into the weight formula to obtain the attribute weights; finally, it constructs a chi-square statistic to test whether the weights differ significantly. The method is suitable for group decision problems with many decision makers and many attributes.
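A minimal sketch of the frequency-to-weight idea, assuming a simple Luce-type choice model in which pick probabilities are proportional to weights; the paper's actual conversion expression comes from discrete choice theory and may differ, so treat both functions as illustrative.

```python
def weights_from_picks(counts):
    """Normalize 'most important attribute' pick counts into weights.
    Assumes pick probability proportional to weight (a Luce-type
    assumption, not necessarily the paper's derived formula)."""
    total = sum(counts)
    return [c / total for c in counts]

def chi_square_uniform(counts):
    """Chi-square statistic against the null of equal weights:
    expected count = total / m for each of the m attributes."""
    m = len(counts)
    exp = sum(counts) / m
    return sum((c - exp) ** 2 / exp for c in counts)
```

For pick counts (50, 30, 20) over three attributes, the weights are (0.5, 0.3, 0.2) and the chi-square statistic is 14.0, well above the 5% critical value of 5.99 for 2 degrees of freedom, so the weights differ significantly.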

7.
An Empirical Test of the Factors Influencing Online Price Dispersion in China    (cited: 2; self-citations: 0; citations by others: 2)
Building on related research, this paper empirically examines the factors influencing online price dispersion in China. The results show that online price dispersion is related to product attributes, website characteristics, and product-market characteristics.

8.
As an approximation tool, rough sets are mainly used for decision analysis under uncertainty and require no prior assumptions about the data. However, mainstream rough-set classification methods still require a discretization step, which sacrifices the high-quality information provided by numerical variables. This paper redefines the membership function probabilistically and proposes a rough-set classification technique based on a Bayes probabilistic boundary region, which largely resolves a series of problems faced by current rough-set methods, such as poor handling of numerical attributes and incomplete classification rules.

9.
Disease maps are often used to study the geographic distribution of disease. Because relative risks in neighboring regions tend to be similar, random-effects models are commonly used, which makes the estimates more stable. To further study the degree of spatial variation in disease risk, we propose a new structural model of disease risk that turns discrete regional risks into a continuous surface and is built on a Gaussian random field; in this way we can estimate not only the relative risk of each individual region but also the overall disease risk of the area.

10.
Coefficient Conversion of Dummy Variables in Regression Analysis    (cited: 1; self-citations: 0; citations by others: 1)
Cao Zhixiang, Statistical Research (《统计研究》), 1994, 11(1): 69-71
In regression analysis for industrial statistics, one sometimes encounters independent variables that are attribute (categorical) variables, i.e., variables describing qualitative or classifiable phenomena. Such attribute variables should not be used directly in a regression, because the equal spacing of the discrete values assigned to an attribute variable masks the differences between categories; using attribute variables directly in regre…

11.
Evaluation of system reliability for complex systems based on Taylor's approximation becomes increasingly intractable. Taguchi's concept of random experimentation was exploited by English et al. (1996) for discretization of complex systems and determination of reliability values. We indicate a few demerits of the discretization method and propose to retain the continuous character of the original problem by evaluating system reliability using a range approximation method. Our proposed method works better than the discretization approach in all three engineering problems considered for the purpose of demonstration.

12.
A two-parameter discrete gamma distribution is derived from the continuous two-parameter gamma distribution using the general approach for discretization of continuous probability distributions. The one-parameter discrete gamma distribution is obtained as a particular case. A few important distributional and reliability properties of the proposed distribution are examined. Parameter estimation by different methods is discussed, and the performance of the different estimation methods is compared through simulation. Data fitting is carried out to investigate the suitability of the proposed distribution for modeling discrete failure-time data and other count data.
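The general discretization approach referred to above defines the pmf as P(X = k) = S(k) − S(k + 1), where S is the survival function of the continuous distribution. A standard-library sketch for the integer-shape (Erlang) case, where S has a closed form:

```python
import math

def erlang_sf(x, shape, scale=1.0):
    """Survival function of a gamma distribution with integer shape
    (Erlang): S(x) = exp(-x/scale) * sum_{i<shape} (x/scale)^i / i!."""
    z = x / scale
    return math.exp(-z) * sum(z ** i / math.factorial(i) for i in range(shape))

def discrete_gamma_pmf(k, shape, scale=1.0):
    """General discretization of a continuous distribution on the
    non-negative integers: P(X = k) = S(k) - S(k + 1)."""
    return erlang_sf(k, shape, scale) - erlang_sf(k + 1, shape, scale)
```

Because the pmf telescopes, the probabilities sum to S(0) = 1, and each term is non-negative since S is non-increasing.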

13.
A discrete version of the Gumbel distribution (Type-I extreme value distribution) is derived using the general approach of discretization of a continuous distribution. Important distributional and reliability properties are explored. It is shown that, depending on the choice of parameters, the proposed distribution can be positively or negatively skewed and can possess long tails. Log-concavity of the distribution and its consequences are established. Estimation of parameters by the method of maximum likelihood, the method of moments, and the method of proportions is discussed. A method of checking model adequacy and a regression-type estimation based on the empirical survival function are also examined. A simulation study is carried out to compare the efficacy of the three estimation methods. The distribution is applied to three real count data sets from diverse application areas, namely survival times in days, maximum annual flood data from Brazil, and goal differences in the English Premier League, and the results show the relevance of the proposed distribution.
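Applying the same CDF-difference discretization to the Gumbel distribution gives a closed-form pmf on the integers. The parameterization below is the standard continuous Gumbel, and the support convention P(X = k) = F(k + 1) − F(k) is one common choice (an assumption; the paper may discretize slightly differently):

```python
import math

def gumbel_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the continuous Gumbel (Type-I extreme value) distribution:
    F(x) = exp(-exp(-(x - mu) / sigma))."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def discrete_gumbel_pmf(k, mu=0.0, sigma=1.0):
    """Discretization on the integers: P(X = k) = F(k + 1) - F(k)."""
    return gumbel_cdf(k + 1, mu, sigma) - gumbel_cdf(k, mu, sigma)
```

Since the terms telescope, summing the pmf over a wide integer range recovers essentially all of the probability mass; the lower tail decays doubly exponentially and the upper tail decays geometrically, giving the long right tail mentioned in the abstract.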

14.

In the area of goodness-of-fit there is a clear distinction between the problem of testing the fit of a continuous distribution and that of testing a discrete distribution. In all continuous problems the data are recorded with a limited number of decimals, so in theory the problem is always of a discrete nature, but it is common practice to ignore the discretization and proceed as if the data were continuous. It is therefore an interesting question whether, in a given test-of-fit problem, the "limited resolution" of the observed recorded values may or may not be of concern when the analysis ignores this implied discretization. In this article, we address the problem of testing the fit of a continuous distribution with data recorded with limited resolution. A measure of the degree of discretization is proposed which involves the size of the rounding interval, the dispersion of the underlying distribution, and the sample size. This measure is shown to be a key characteristic which allows comparison, across different problems, of the amount of discretization involved. Some asymptotic results are given for the distribution of the EDF (empirical distribution function) statistics that explicitly depend on this measure of the degree of discretization. The results are illustrated with simulations for testing normality when the parameters are known and also when they are unknown. The asymptotic distributions are shown to be an accurate approximation to the true finite-n distribution obtained by Monte Carlo. A real example from image analysis is also discussed. The conclusion drawn is that when the value of the measure of the degree of discretization is not "large", the practice of ignoring discreteness is of no concern. However, when this value is "large", ignoring discreteness leads to an excess number of rejections of the distribution tested, compared with the number of rejections that would occur if rounding were taken into account; the error in the number of rejections might be huge.
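The abstract says the measure combines the rounding-interval size, the dispersion of the underlying distribution, and the sample size. A toy version of such a measure is sketched below; the exact functional form sqrt(n) · h / σ is only an illustrative guess, not the article's formula.

```python
import math

def degree_of_discretization(n, h, sigma):
    """Illustrative measure of the degree of discretization: grows with
    the rounding interval h and sample size n, shrinks with dispersion
    sigma. The specific form sqrt(n) * h / sigma is an assumption."""
    return math.sqrt(n) * h / sigma

def round_to_grid(x, h):
    """Record a 'continuous' observation with resolution h, i.e. round
    it to the nearest multiple of h."""
    return h * round(x / h)
```

Under a measure of this shape, fine rounding on a small sample (n = 100, h = 0.1, σ = 1) scores low, while the same rounding on a sample 100 times larger scores ten times higher, matching the abstract's point that discreteness which is harmless in small samples can inflate rejection rates in large ones.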

15.
Many statistical methods for continuous distributions assume a linear conditional expectation. Components of multivariate distributions are often measured on a discrete ordinal scale based on a discretization of an underlying continuous latent variable. The results in this paper show that common examples of discretized bivariate and trivariate distributions will have a linear conditional expectation. Examples and simulations are provided to illustrate the results.

16.
The field of microrheology is based on experiments involving particle diffusion. Microscopic tracer beads are placed into a non-Newtonian fluid and tracked using high-speed video capture and light microscopy. The modelling of the behaviour of these beads is now an active scientific area which demands multiple stochastic and statistical methods. We propose an approximate wavelet-based simulation technique for two classes of continuous time anomalous diffusion models, the fractional Ornstein–Uhlenbeck process and the fractional generalized Langevin equation. The proposed algorithm is an iterative method that provides approximate discretizations that converge quickly and in an appropriate sense to the continuous time target process. As compared to previous works, it covers cases where the natural discretization of the target process does not have closed form in the time domain. Moreover, we propose to minimize the border effect via smoothing.

17.
In some physical systems, where the goal is to describe behaviour over an entire field using scattered observations, a multiple regression model can be derived from the discretization of a continuous process. These models often have more parameters than observations. We propose a technique for constructing smoothed estimators in this situation. Our method assumes the model has random explanatory and response variables, and imposes a smoothness penalty based on the signal-to-noise ratio of the model. Results are presented using a known value for the ratio, and a method for estimating the ratio is discussed. The procedure is applied to modelling temperature measurements taken in the California Current.

18.
It is often assumed in statistics that the random variables under consideration come from a continuous distribution. However, real data is always given in a rounded (discretized) form. The rounding errors become serious when the sample size is large. In this paper, we consider the situation where the mesh of discretization tends to zero as the sample size tends to infinity, and give some sets of sufficient conditions under which the rounding errors can be asymptotically ignored, in the context of Z-estimation. It is theoretically proved that the mid-point discretization is preferable.
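The preference for mid-point discretization can be illustrated with a small simulation: recording each observation as the left edge of its rounding cell biases the sample mean by about −h/2, while recording it as the cell's mid-point is nearly unbiased. This is an illustrative sketch, not the paper's Z-estimation setting.

```python
import random

def discretize_floor(x, h):
    """Left-endpoint rounding: x is recorded as the left edge of its cell."""
    return h * (x // h)

def discretize_mid(x, h):
    """Mid-point rounding: x is recorded as the center of its cell."""
    return h * (x // h) + h / 2

random.seed(12345)
h = 0.5
data = [random.gauss(0.0, 1.0) for _ in range(20000)]
mean = sum(data) / len(data)

# bias of the sample mean induced by each rounding rule
bias_floor = sum(discretize_floor(x, h) for x in data) / len(data) - mean
bias_mid = sum(discretize_mid(x, h) for x in data) / len(data) - mean
```

On this sample, `bias_floor` sits close to −h/2 = −0.25, while `bias_mid` is within a few thousandths of zero, in line with the theoretical preference for mid-point discretization.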

19.
This article discusses the discretization of continuous-time filters for application to discrete time series sampled at any fixed frequency. In this approach, the filter is first set up directly in continuous time; since the filter is expressed over a continuous range of lags, we also refer to them as continuous-lag filters. The second step is to discretize the filter itself. This approach applies to different problems in signal extraction, including trend or business cycle analysis, and the method allows for coherent design of discrete filters for observed data sampled as a stock or a flow, for nonstationary data with stochastic trend, and for different sampling frequencies. We derive explicit formulas for the mean squared error (MSE) optimal discretization filters. We also discuss the problem of optimal interpolation for nonstationary processes – namely, how to estimate the values of a process and its components at arbitrary times in-between the sampling times. A number of illustrations of discrete filter coefficient calculations are provided, including the local level model (LLM) trend filter, the smooth trend model (STM) trend filter, and the Band Pass (BP) filter. The essential methodology can be applied to other kinds of trend extraction problems. Finally, we provide an extended demonstration of the method on CPI flow data measured at monthly and annual sampling frequencies.

20.
Summary.  A Bayesian intensity model is presented for studying a bioassay problem involving interval-censored tumour onset times, and without discretization of times of death. Both tumour lethality and base-line hazard rates are estimated in the absence of cause-of-death information. Markov chain Monte Carlo methods are used in the numerical estimation, and sophisticated group updating algorithms are applied to achieve reasonable convergence properties. This method was tried on the rat tumorigenicity data that have previously been analysed by Ahn, Moon and Kodell, and our results seem to be more realistic.
