20 similar documents found (search time: 93 ms)
1.
Discretization of continuous attributes plays a very important role in data mining. Approaching the problem from the perspective of statistical independence, this paper proposes a discretization method for continuous attributes based on a likelihood-ratio hypothesis test, which effectively avoids the limitations of the chi2 algorithm and performs well in statistical simulations.
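The abstract gives no code, but the core idea of a likelihood-ratio (G-test) criterion for deciding whether two adjacent intervals may be merged, used as a drop-in replacement for the chi-square statistic in ChiMerge-style algorithms, can be sketched roughly as follows. The contingency table and significance level below are illustrative assumptions, not values from the paper:

```python
import math

def g_statistic(table):
    """Likelihood-ratio (G) statistic for a 2 x k contingency table:
    rows = two adjacent intervals, columns = class-label counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if obs > 0:                    # obs * log(obs/exp) -> 0 as obs -> 0
                g += 2.0 * obs * math.log(obs / exp)
    return g

# Two adjacent intervals with similar class proportions: merge them
# when G falls below the chi-square critical value (df = k - 1).
table = [[20, 18], [22, 19]]   # counts of class A / class B per interval
g = g_statistic(table)
threshold = 3.841              # chi-square 5% critical value, df = 1
print(g, g < threshold)        # small G -> the intervals can be merged
```

Under the null of independence the G statistic is asymptotically chi-square, so the same merge/stop logic as chi2-based methods applies with the test statistic swapped out.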
2.
Discretization of continuous attributes plays an important role in artificial intelligence and data mining. This paper uses an impurity-reduction index to quantify the homogeneity within intervals and the differences between intervals after discretization, and proposes a discretization method based on this index. By adjusting a minimum impurity-reduction threshold and building a binary tree to determine the cut points between intervals, the method can effectively search for a globally optimal discretization scheme. Simulation experiments verify its advantages over traditional methods.
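The paper's exact impurity index is not reproduced in the abstract, but the general scheme it describes, picking the cut point that maximizes the reduction in class impurity and recursing only while the reduction exceeds a minimum threshold, can be sketched with the common entropy-based variant. The data, labels, and threshold are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (an impurity measure)."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels, min_gain=0.1):
    """Return the cut point with the largest impurity reduction,
    or None if no cut achieves the minimum reduction threshold."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, min_gain)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no cut between equal values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best[0]

values = [1.0, 1.2, 1.4, 5.0, 5.2, 5.5]
labels = ['a', 'a', 'a', 'b', 'b', 'b']
print(best_cut(values, labels))   # cut near 3.2 separates the two classes
```

Applying `best_cut` recursively to the left and right halves until no cut clears `min_gain` yields the binary tree of cut points the abstract describes.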
3.
4.
In practical decision making, decision makers often face uncertain environments in which precise attribute values cannot be obtained in advance. Stochastic multi-attribute decision making, i.e., multi-attribute decision making in which the attribute values are random variables, is a very important type of decision making under uncertainty. According to the type of random variable, stochastic multi-attribute decision making can likewise be divided into two cases: discrete and continuous. To date, research on discrete stochastic multi-attribute…
5.
Preprocessing a training set generated by clustering with rough-set-based discretization can further improve classification accuracy. This paper uses the PAM algorithm to cluster the original samples into a training set, then discretizes that training set with an algorithm combining Boolean reasoning and rough set theory, and trains a classifier on the discretized training set. Experimental results show that, classifying on the same data sets, this method improves accuracy by up to 15.5% over the RDDTE method preprocessed with the PAM algorithm alone, while using a smaller training set.
6.
7.
An Empirical Test of the Factors Influencing Online Price Dispersion in China (Total citations: 2; self-citations: 0; citations by others: 2)
Building on related research, this paper empirically examines the factors that influence online price dispersion. The results show that price dispersion for online goods is related to product attributes, website characteristics, product market characteristics, and other factors.
8.
9.
10.
Coefficient Conversion for Dummy Variables in Regression Analysis (Cao Zhixiang). In regression analysis for industrial statistics, one sometimes encounters explanatory variables that are attribute (categorical) variables, i.e., the phenomenon described by the variable is qualitative, or classifiable, but such attribute variables should not be used directly in a regression. The equal spacing between the discrete values assigned to an attribute variable masks the differences between the categories, so using the attribute variable directly in the regression…
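The point about not feeding attribute variables directly into a regression is standard dummy (indicator) coding: each non-baseline category gets its own 0/1 column, so the model no longer imposes equal spacing between categories. A minimal sketch with an assumed three-level factor:

```python
# Dummy coding of a categorical variable for regression: instead of the
# codes 0, 1, 2 directly (which forces the effect of "medium" to be
# exactly halfway between "low" and "high"), each non-baseline category
# gets its own 0/1 indicator column.
levels = ["low", "medium", "high"]            # "low" is the baseline
observations = ["low", "high", "medium", "high", "low"]

def dummy_code(obs, levels):
    baseline, *others = levels                # drop one level to avoid collinearity
    return [[1 if x == lvl else 0 for lvl in others] for x in obs]

X = dummy_code(observations, levels)
print(X)  # [[0, 0], [0, 1], [1, 0], [0, 1], [0, 0]]
```

Each fitted coefficient is then the difference between that category's mean response and the baseline's, which is the quantity the "coefficient conversion" in the title concerns.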
11.
Evaluation of system reliability for complex systems based on Taylor's approximation becomes increasingly intractable. Taguchi's concept of random experimentation was exploited by English et al. (1996) for discretization of complex systems and determination of reliability values. We indicate a few demerits of the discretization method and propose to retain the continuous character of the original problem by evaluating system reliability using a range approximation method. Our proposed method works better than the discretization approach in all three engineering problems considered for the purpose of demonstration.
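The abstract does not spell out the range approximation method, but the basic idea of propagating ranges rather than point values can be sketched for a monotone system: when system reliability is increasing in each component reliability, evaluating at the lower and upper endpoints of the component ranges brackets the true value. The series-parallel structure and the numbers here are illustrative assumptions:

```python
# Interval ("range") bounds for system reliability. For a structure
# function monotone increasing in each component reliability, the
# endpoint evaluations bound the system reliability.

def series(*r):                 # all components must work
    p = 1.0
    for x in r:
        p *= x
    return p

def parallel(*r):               # at least one component must work
    q = 1.0
    for x in r:
        q *= (1.0 - x)
    return 1.0 - q

def system(r1, r2, r3):         # component 1 in series with (2 || 3)
    return series(r1, parallel(r2, r3))

lo = system(0.90, 0.80, 0.75)   # lower endpoints of component ranges
hi = system(0.95, 0.90, 0.85)   # upper endpoints
print(lo, hi)                   # system reliability lies in [lo, hi]
```

This keeps the continuous character of the component uncertainties instead of collapsing them to a few discrete levels, which is the contrast the abstract draws with the discretization approach.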
12.
A two-parameter discrete gamma distribution is derived from the continuous two-parameter gamma distribution using the general approach for discretization of continuous probability distributions. The one-parameter discrete gamma distribution is obtained as a particular case. A few important distributional and reliability properties of the proposed distribution are examined. Parameter estimation by different methods is discussed, and the performance of the different estimation methods is compared through simulation. Data fitting is carried out to investigate the suitability of the proposed distribution for modeling discrete failure-time data and other count data.
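The "general approach" referred to here is commonly the survival-function discretization P(Y = k) = S(k) − S(k + 1) for k = 0, 1, 2, …, where S is the survival function of the continuous distribution. A minimal sketch for the gamma case with an integer shape (the Erlang case, so S has a closed form in the standard library); the shape and scale values are illustrative:

```python
import math

def erlang_sf(x, shape, scale):
    """Survival function of an Erlang (integer-shape gamma) distribution:
    S(x) = exp(-x/scale) * sum_{i<shape} (x/scale)^i / i!"""
    lam = x / scale
    return math.exp(-lam) * sum(lam ** i / math.factorial(i) for i in range(shape))

def discrete_gamma_pmf(k, shape=2, scale=1.5):
    """Survival-function discretization: P(Y = k) = S(k) - S(k + 1)."""
    return erlang_sf(k, shape, scale) - erlang_sf(k + 1, shape, scale)

# The pmf telescopes, so it sums to S(0) - S(inf) = 1 over k = 0, 1, 2, ...
total = sum(discrete_gamma_pmf(k) for k in range(200))
print(total)   # ~ 1.0
```

The same construction applied to other continuous parents (Weibull, Gumbel, and so on) yields the corresponding discrete analogues.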
13.
Subrata Chakraborty, Dhrubajyoti Chakravarty, Josmar Mazucheli, Wesley Bertoli. Journal of Applied Statistics, 2021, 48(4): 712
A discrete version of the Gumbel distribution (Type-I extreme value distribution) has been derived by using the general approach of discretization of a continuous distribution. Important distributional and reliability properties have been explored. It has been shown that, depending on the choice of parameters, the proposed distribution can be positively or negatively skewed and can possess long tails. Log-concavity of the distribution and consequent results have been established. Estimation of parameters by the method of maximum likelihood, the method of moments, and the method of proportions has been discussed. A method of checking model adequacy and regression-type estimation based on the empirical survival function has also been examined. A simulation study has been carried out to compare and check the efficacy of the three methods of estimation. The distribution has been applied to model three real count data sets from diverse application areas, namely survival times in number of days, maximum annual flood data from Brazil, and goal differences in the English Premier League, and the results show the relevance of the proposed distribution.
14.
Communications in Statistics: Simulation and Computation, 2013, 42(3): 953–976
Abstract: In the area of goodness-of-fit there is a clear distinction between the problem of testing the fit of a continuous distribution and that of testing a discrete distribution. In all continuous problems the data are recorded with a limited number of decimals, so in theory one could say that the problem is always of a discrete nature, but it is common practice to ignore the discretization and proceed as if the data were continuous. It is therefore an interesting question whether, in a given test-of-fit problem, the "limited resolution" of the observed recorded values may or may not be of concern when the analysis ignores this implied discretization. In this article, we address the problem of testing the fit of a continuous distribution with data recorded with limited resolution. A measure for the degree of discretization is proposed which involves the size of the rounding interval, the dispersion in the underlying distribution, and the sample size. This measure is shown to be a key characteristic which allows comparison, across different problems, of the amount of discretization involved. Some asymptotic results are given for the distribution of the EDF (empirical distribution function) statistics that explicitly depend on the above-mentioned measure of the degree of discretization. The results are illustrated with simulations for testing normality, both when the parameters are known and when they are unknown. The asymptotic distributions are shown to be an accurate approximation to the true finite-n distribution obtained by Monte Carlo. A real example from image analysis is also discussed. The conclusion drawn is that when the value of the measure of the degree of discretization is not "large", the practice of ignoring discreteness is of no concern. However, when this value is "large", ignoring discreteness leads to an excess of rejections of the tested distribution, compared with the number of rejections that would occur if the rounding were taken into account. The error in the number of rejections can be huge.
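The paper's measure of the degree of discretization is not reproduced here, but the inflation of rejections it describes is easy to demonstrate by Monte Carlo: apply a Kolmogorov–Smirnov test of normality (parameters known) to data rounded coarsely versus finely. The sample size, rounding widths, and replication count below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import kstest

def rejection_rate(width, n=200, reps=500, alpha=0.05, seed=0):
    """Share of KS tests of N(0, 1) (parameters known) rejected at level
    alpha when the data are rounded to a grid of the given width."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        x = np.round(x / width) * width          # round to the grid
        if kstest(x, "norm").pvalue < alpha:
            rejections += 1
    return rejections / reps

fine = rejection_rate(width=0.01)    # negligible discretization
coarse = rejection_rate(width=1.0)   # rounding interval of about one sigma
print(fine, coarse)                  # near the 5% level vs. near 100%
```

With a rounding interval of the order of the standard deviation, the EDF develops jumps far larger than the KS critical value allows, so virtually every test rejects even though the underlying data are exactly normal.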
15.
Many statistical methods for continuous distributions assume a linear conditional expectation. Components of multivariate distributions are often measured on a discrete ordinal scale based on a discretization of an underlying continuous latent variable. The results in this paper show that common examples of discretized bivariate and trivariate distributions will have a linear conditional expectation. Examples and simulations are provided to illustrate the results.
16.
Journal of Statistical Computation and Simulation, 2012, 82(4): 697–723
The field of microrheology is based on experiments involving particle diffusion. Microscopic tracer beads are placed into a non-Newtonian fluid and tracked using high-speed video capture and light microscopy. The modelling of the behaviour of these beads is now an active scientific area which demands multiple stochastic and statistical methods. We propose an approximate wavelet-based simulation technique for two classes of continuous-time anomalous diffusion models, the fractional Ornstein–Uhlenbeck process and the fractional generalized Langevin equation. The proposed algorithm is an iterative method that provides approximate discretizations that converge quickly, and in an appropriate sense, to the continuous-time target process. Compared with previous works, it covers cases where the natural discretization of the target process does not have a closed form in the time domain. Moreover, we propose to minimize the border effect via smoothing.
17.
Gary Sneddon 《Revue canadienne de statistique》1999,27(1):63-79
In some physical systems, where the goal is to describe behaviour over an entire field using scattered observations, a multiple regression model can be derived from the discretization of a continuous process. These models often have more parameters than observations. We propose a technique for constructing smoothed estimators in this situation. Our method assumes the model has random explanatory and response variables, and imposes a smoothness penalty based on the signal-to-noise ratio of the model. Results are presented using a known value for the ratio, and a method for estimating the ratio is discussed. The procedure is applied to modelling temperature measurements taken in the California Current.
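In its simplest form, a smoothness penalty set from the signal-to-noise ratio is ridge-type shrinkage with penalty equal to the noise-to-signal variance ratio (the Bayes-linear estimator under a Gaussian prior on the coefficients). This sketch is a generic illustration of that device, not the paper's procedure; the model dimensions, noise level, and ratio are assumed:

```python
import numpy as np

def ridge(X, y, noise_to_signal):
    """Shrinkage estimator (X'X + k I)^{-1} X'y with penalty k equal to
    the assumed noise-to-signal variance ratio. Well defined even with
    more parameters than observations, since k I makes X'X + k I
    positive definite."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + noise_to_signal * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n, p = 30, 60                      # more parameters than observations
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)      # "signal": coefficient variance 1
y = X @ beta + 0.5 * rng.standard_normal(n)
beta_hat = ridge(X, y, noise_to_signal=0.25)   # (0.5 ** 2) / 1
print(float(np.linalg.norm(beta_hat - beta)), float(np.linalg.norm(beta)))
```

With the ratio set correctly, `beta_hat` is the posterior mean of the coefficients, which is why a known (or estimated) signal-to-noise ratio is enough to pin down the amount of smoothing.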
18.
Yoichi Nishiyama 《Journal of statistical planning and inference》2011,141(1):287-292
It is often assumed in statistics that the random variables under consideration come from a continuous distribution. However, real data is always given in a rounded (discretized) form. The rounding errors become serious when the sample size is large. In this paper, we consider the situation where the mesh of discretization tends to zero as the sample size tends to infinity, and give some sets of sufficient conditions under which the rounding errors can be asymptotically ignored, in the context of Z-estimation. It is theoretically proved that the mid-point discretization is preferable.
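The preference for mid-point discretization is easy to see for the simplest Z-estimator, the sample mean: recording the left endpoint of each cell biases the estimate by about half the mesh, while recording the cell mid-point leaves it nearly unbiased. The true mean, mesh, and sample size here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=100_000)
w = 0.5                                  # mesh of discretization

left_points = np.floor(x / w) * w        # record the left cell endpoint
mid_points = left_points + w / 2         # record the cell mid-point

bias_left = abs(left_points.mean() - 3.0)
bias_mid = abs(mid_points.mean() - 3.0)
print(bias_left, bias_mid)               # roughly w/2 = 0.25 vs. near 0
```

Shifting every recorded value by half the mesh removes the systematic w/2 offset, leaving only the much smaller within-cell rounding error, which is the effect the paper's asymptotic conditions control.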
19.
Tucker McElroy 《Econometric Reviews》2013,32(5):475-513
This article discusses the discretization of continuous-time filters for application to discrete time series sampled at any fixed frequency. In this approach, the filter is first set up directly in continuous time; since the filter is expressed over a continuous range of lags, we also refer to it as a continuous-lag filter. The second step is to discretize the filter itself. This approach applies to different problems in signal extraction, including trend or business-cycle analysis, and the method allows for coherent design of discrete filters for observed data sampled as a stock or a flow, for nonstationary data with stochastic trend, and for different sampling frequencies. We derive explicit formulas for the mean squared error (MSE) optimal discretization filters. We also discuss the problem of optimal interpolation for nonstationary processes, namely, how to estimate the values of a process and its components at arbitrary times in between the sampling times. A number of illustrations of discrete filter coefficient calculations are provided, including the local level model (LLM) trend filter, the smooth trend model (STM) trend filter, and the band-pass (BP) filter. The essential methodology can be applied to other kinds of trend extraction problems. Finally, we provide an extended demonstration of the method on CPI flow data measured at monthly and annual sampling frequencies.
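The paper derives MSE-optimal discretizations; those formulas are not reproduced here, but the basic two-step recipe (set up a continuous-lag kernel, then convert it to discrete weights) can be sketched with a naive version: integrate the kernel over unit intervals around each integer lag and renormalize after truncation. The two-sided exponential kernel and its parameters below are illustrative assumptions:

```python
import math

def continuous_lag_weight(k, lam):
    """Integral of the two-sided exponential kernel (lam/2) * exp(-lam*|t|)
    over the unit interval centred at integer lag k."""
    if k == 0:
        return 1.0 - math.exp(-lam / 2)
    a = abs(k)
    return 0.5 * (math.exp(-lam * (a - 0.5)) - math.exp(-lam * (a + 0.5)))

def discretize_filter(lam, max_lag):
    """Discrete symmetric filter weights, renormalized to sum to one
    after truncating the continuous-lag kernel at +/- max_lag."""
    w = {k: continuous_lag_weight(k, lam) for k in range(-max_lag, max_lag + 1)}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

weights = discretize_filter(lam=0.8, max_lag=10)
print(round(sum(weights.values()), 12))   # 1.0: a level-preserving filter
```

Applying `weights` as a moving average to a sampled series gives the discrete counterpart of the continuous-lag smoother; the paper's contribution is replacing this naive integration step with discretizations that are optimal in MSE for stock or flow sampling.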
20.
T. Härkänen, E. Arjas. Journal of the Royal Statistical Society, Series C (Applied Statistics), 2004, 53(4): 601–617
Summary. A Bayesian intensity model is presented for studying a bioassay problem involving interval-censored tumour onset times, and without discretization of times of death. Both tumour lethality and base-line hazard rates are estimated in the absence of cause-of-death information. Markov chain Monte Carlo methods are used in the numerical estimation, and sophisticated group updating algorithms are applied to achieve reasonable convergence properties. This method was tried on the rat tumorigenicity data that have previously been analysed by Ahn, Moon and Kodell, and our results seem to be more realistic.