Similar articles
20 similar articles found (search time: 15 ms)
1.
Density estimation for pre-binned data is challenging because the exact positions of the original observations are lost. Traditional kernel density estimation methods cannot be applied when data are pre-binned in unequally spaced bins or when one or more bins are semi-infinite intervals. We propose a novel density estimation approach using the generalized lambda distribution (GLD) for data that have been pre-binned over a sequence of consecutive bins. This method enjoys the high power of the parametric model and the great shape flexibility of the GLD. The performance of the proposed estimators is benchmarked via simulation studies. Both simulation results and a real data application show that the proposed density estimators work well for data sets of moderate or large size.

2.
We present a hybrid method for random pattern sequence classification that takes into account the random structural properties of the sequence. The method works in two steps: a segmentation step, which divides the original sequence into segments such that all observations in the same segment belong to a single class, and a classification step, in which each segment is classified by a neural network classifier.

3.
This paper considers estimation and prediction in the Aalen additive hazards model in the case where the covariate vector is high-dimensional, as with gene expression measurements. Some form of dimension reduction of the covariate space is needed to obtain useful statistical analyses. We study the partial least squares regression method. It turns out that it is naturally adapted to this setting via the so-called Krylov sequence. The resulting PLS estimator is shown to be consistent provided that the number of terms included is taken to be equal to the number of relevant components in the regression model. A standard PLS algorithm can also be constructed, but it turns out that the resulting predictor can only be related to the original covariates via time-dependent coefficients. The methods are applied to a breast cancer data set with gene expression recordings and to the well-known primary biliary cirrhosis clinical data.

4.
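For original sequences that already exhibit a saturation phase and approximately follow the form of a logistic function, the paper proposes applying reciprocal generation to the sequence and building an unbiased grey Verhulst direct model. On this basis, simultaneous optimization of the background value and the grey derivative is combined with the least absolute deviation method for determining the response coefficient, which yields an optimized unbiased grey Verhulst direct model. The results show that, for curves of logistic form, the simulated and predicted values of the model coincide exactly with the curve. A worked example illustrates the feasibility and effectiveness of the optimized model.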

5.
The ARHD model     
We introduce and study a new model for functional data. The ARHD is an autoregressive model in which the first order derivative of the random curves appears explicitly. Convergent estimates are obtained through an original double penalization method. The prediction method is applied to a real set of data already studied in the literature.

6.
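Zhao Xueyan, 《统计研究》 (Statistical Research), 2020, 37(6): 106-118
When correspondence analysis quantifies qualitative data, an "arch effect" appears. Methods for correcting the arch effect in correspondence analysis have been studied extensively; they prevent potentially erroneous conclusions and matter for both theory and applications. Quantification method II (数量化Ⅱ类) is a discriminant analysis method for qualitative data that is widely used in China and abroad. Using a large number of simulated data sets, this paper finds that quantification method II also produces an arch effect when quantifying qualitative data, which lowers the correct classification rate, fails to reproduce the information in the original data, and leads to conclusions that contradict that information. To correct the arch effect, a two-stage discriminant analysis method is proposed. It is compared with quantification method II in terms of the correct classification rate and the degree to which the original data are reproduced, and it is applied to personal credit rating, where its discriminant performance is found to be better than that of quantification method II.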

7.
Multinomial goodness-of-fit tests arise in a diversity of settings. The long history of the problem has spawned a multitude of asymptotic tests. If the sample size relative to the number of categories is small, the accuracy of these tests is compromised. In that case, an exact test is a prudent option. But such tests are computationally intensive and need efficient algorithms. This paper gives a conceptual overview and empirical comparisons of two approaches, namely the network and fast Fourier transform (FFT) algorithms, for an exact goodness-of-fit test on a multinomial. We show that a recursive execution of a polynomial product forms the basis of both these approaches. Specific details to implement the network method, and techniques to enhance the efficiency of the FFT algorithm, are given. Our empirical comparisons show that for exact analysis with the chi-square and likelihood ratio statistics, the network-cum-polynomial multiplication algorithm is the more efficient and accurate of the two.
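
The recursive polynomial product mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's network or FFT algorithm. It only shows how the exact null distribution of the chi-square statistic can be built category by category, by multiplying in one per-category "polynomial" at a time. The function name `exact_multinomial_pvalue`, the rounding of the partial statistic to merge equal states, and the numerical tolerance are all assumptions made for this illustration.

```python
from math import factorial


def exact_multinomial_pvalue(observed, probs):
    """Exact p-value of Pearson's chi-square statistic under a multinomial null.

    Illustrative sketch only; assumes every null probability is > 0.
    """
    n = sum(observed)
    expected = [n * p for p in probs]

    def term(x, e):
        # Contribution of one category to the chi-square statistic.
        return (x - e) ** 2 / e

    t_obs = sum(term(x, e) for x, e in zip(observed, expected))

    # State (trials used so far, partial statistic) -> accumulated weight.
    # Each pass multiplies in the per-category series sum_x p^x / x!.
    states = {(0, 0.0): 1.0}
    for p, e in zip(probs, expected):
        new_states = {}
        for (m, t), w in states.items():
            for x in range(n - m + 1):
                key = (m + x, round(t + term(x, e), 9))
                new_states[key] = new_states.get(key, 0.0) + w * p ** x / factorial(x)
        states = new_states

    # Only states that use all n trials are valid outcomes; n! restores
    # the multinomial coefficient.
    return sum(
        w * factorial(n)
        for (m, t), w in states.items()
        if m == n and t >= t_obs - 1e-7
    )
```

For two equally likely categories and four observations, the outcome (4, 0) is as extreme as (0, 4), so the sketch should return 2/16 = 0.125. The state space grows quickly with the number of categories and the sample size, which is the cost that efficient algorithms of the kind compared in the paper aim to reduce.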

8.
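Because the system of grey algebraic operations is still incomplete, it is difficult to build effective grey models directly from grey number sequences, and traditional whitenization methods for grey number sequences cause information loss. Therefore, without destroying the independence or the informational completeness of the interval grey numbers, a new whitenization method for interval grey number sequences is designed. The study focuses on the data characteristics of the whitenized sequence and the original interval grey number sequence under translation and scalar multiplication transformations. The whitenized sequence is then successfully applied to building prediction and relational analysis models for interval grey numbers. This result is significant for extending the range of application of grey models.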

9.
Multiple comparison methods are widely implemented in statistical packages and heavily used. To obtain the critical value of a multiple comparison method for a given confidence level, a double integral equation must be solved. Current computer implementations evaluate one double integral for each candidate critical value using Gaussian quadrature. Consequently, iterative refinement of the critical value can slow the response time enough to hamper interactive data analysis. However, for balanced designs, to obtain the critical value for multiple comparisons with the best, subset selection, and one-sided multiple comparison with a control, if one regards the inner integral as a function of the outer integration variable, then this function can be obtained by discrete convolution using the Fast Fourier Transform (FFT). Exploiting the fact that this function need not be re-evaluated during iterative refinement of the critical value, it is shown that the FFT method obtains critical values at least four times as accurate and two to five times as fast as the Gaussian quadrature method.

10.
Compositional data are a kind of complex multidimensional data that reflect relative rather than absolute information. There are a variety of models for regression analysis with compositional variables. As in traditional regression analysis, heteroskedasticity can arise in these models. However, the existing heteroskedastic regression analysis methods cannot be applied to models with a compositional error term. In this paper, we mainly study the heteroskedastic linear regression model with compositional response and covariates. The parameter estimator is obtained through the weighted least squares method. For hypothesis tests on the parameters, the test statistic is based on the original least squares estimator and the corresponding heteroskedasticity-consistent covariance matrix estimator. When the proposed method is applied to both simulated data and a real example, we use the original least squares method as a comparison throughout. The results indicate the model's practicality and effectiveness in regression analysis with heteroskedasticity.

11.
A Partial Likelihood Estimator of Vaccine Efficacy   (cited 1 time in total: 0 self-citations, 1 citation by others)
A partial likelihood method is proposed for estimating vaccine efficacy for a general epidemic model. In contrast to the maximum likelihood estimator (MLE), which requires complete observation of the epidemic, the suggested method only requires information on the sequence in which individuals are infected and not the exact infection times. A simulation study shows that the method performs almost as well as the MLE. The method is applied to data on the infectious disease mumps.

12.
The problem of selecting the best of k populations is studied for data which are incomplete as some of the values have been deleted randomly. This situation is met in extreme value analysis where only data exceeding a threshold are observable. For increasing sample size we study the case where the probability that a value is observed tends to zero, but the sparse condition is satisfied, so that the mean number of observable values in each population is bounded away from zero and infinity as the sample size tends to infinity. The incomplete data are described by thinned point processes which are approximated by Poisson point processes. Under weak assumptions and after suitable transformations these processes converge to a Poisson point process. Optimal selection rules for the limit model are used to construct asymptotically optimal selection rules for the original sequence of models. The results are applied to extreme value data for high thresholds.

13.
The conditional distribution given complete sufficient statistics is used along with the Rao-Blackwell theorem to obtain uniformly minimum variance unbiased (UMVU) estimators after a transformation to normality has been applied to data. The estimators considered are for the mean, the variance and the cumulative distribution of the original non-normal data. Previous procedures to obtain UMVU estimators have used Laplace transforms, Taylor expansions and the jackknife. An integration method developed in this paper requires only integrability of the normalizing transformation function. This method is easy to employ and it is always possible to obtain a numerical result.

14.
Generalized discriminant analysis based on distances   (cited 14 times in total: 1 self-citation, 13 citations by others)
This paper describes a method of generalized discriminant analysis based on a dissimilarity matrix to test for differences in a priori groups of multivariate observations. Use of classical multidimensional scaling produces a low‐dimensional representation of the data for which Euclidean distances approximate the original dissimilarities. The resulting scores are then analysed using discriminant analysis, giving tests based on the canonical correlations. The asymptotic distributions of these statistics under permutations of the observations are shown to be invariant to changes in the distributions of the original variables, unlike the distributions of the multi‐response permutation test statistics which have been considered by other workers for testing differences among groups. This canonical method is applied to multivariate fish assemblage data, with Monte Carlo simulations to make power comparisons and to compare theoretical results and empirical distributions. The paper proposes classification based on distances. Error rates are estimated using cross‐validation.

15.
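Based on the background value expression of the grey model and the form of a non-homogeneous exponential growth sequence, a relationship between the first-order accumulated sequence and the original sequence is derived. This yields a direct discrete grey model suited to non-homogeneous exponential growth sequences, together with a method for determining its coefficients. A case study shows that the optimized model is not only practical to apply but also highly accurate and effective.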

16.
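For the normalization problem in multi-objective decision making with interval-valued decision values, a linear transformation operator onto [-1, 1] with a "reward the good, penalize the poor" property is proposed to normalize the original decision information. The operator is applied to multi-objective grey situation decision making in which the objective weights are known and the attribute values are interval numbers, giving an interval-number multi-objective grey situation decision method based on this operator. The selection of an air-to-ship missile design scheme serves as the application case. The results show that the method is convenient to operate, computationally simple and easy to implement, and that it offers an effective, sound and practical approach to uncertain decision problems with interval values.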

17.
Spectral clustering uses eigenvectors of the Laplacian of the similarity matrix. It is convenient for solving binary clustering problems. When applied to multi-way clustering, either binary spectral clustering is applied recursively, or the data are embedded in spectral space and another method, such as K-means clustering, is used to cluster the points. Here we propose and study a K-way clustering algorithm, spectral modular transformation, based on the fact that the graph Laplacian has an equivalent representation with a diagonal modular structure. The method first transforms the original similarity matrix into a new one, which is nearly disconnected and reveals the cluster structure clearly; a linearized cluster assignment algorithm is then applied to split the clusters. In this way, we can find some samples for each cluster recursively using the divide and conquer method. To get the overall clustering results, we use the cluster assignment obtained in the previous step as the initialization of the multiplicative update method for spectral clustering. Examples show that our method outperforms spectral clustering using other initializations.

18.
Statistical disclosure control (SDC) is a balancing act between mandatory data protection and the comprehensible demand from researchers for access to original data. In this paper, a family of methods is defined to ‘mask’ sensitive variables before data files can be released. In the first step, the variable to be masked is ‘cloned’ (C). Then, the duplicated variable as a whole or just a part of it is ‘suppressed’ (S). The masking procedure's third step ‘imputes’ (I) data for these artificial missing values. Then, the original variable can be deleted and its masked substitute has to serve as the basis for the analysis of data. The idea of this general ‘CSI framework’ is to open the wide field of imputation methods for SDC. The method applied in the I-step can make use of available auxiliary variables including the original variable. Different members of this family of methods delivering variance estimators are discussed in some detail. Furthermore, a simulation study analyzes various methods belonging to the family with respect to both the quality of parameter estimation and privacy protection. Based on the results obtained, recommendations are formulated for different estimation tasks.
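
The three steps of the CSI framework described above (clone, suppress, impute) can be sketched as follows. This is only a toy member of the family. The abstract does not specify which imputation method is used in the I-step, so plain mean imputation is swapped in here as a placeholder, and random suppression of a fixed fraction is likewise an assumption. The function name `csi_mask` and its parameters are hypothetical.

```python
import random


def csi_mask(values, suppress_fraction, seed=0):
    """Toy clone-suppress-impute masking of one numeric variable.

    Returns the masked copy and the set of suppressed positions.
    Mean imputation is a placeholder, not the paper's method.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)

    # C: clone the sensitive variable so the original is never modified.
    clone = list(values)

    # S: suppress a random part of the clone, keeping at least one value.
    n_suppress = int(round(suppress_fraction * len(clone)))
    n_suppress = max(0, min(len(clone) - 1, n_suppress))
    suppressed = set(rng.sample(range(len(clone)), n_suppress))

    # I: impute the artificial missing values (here: mean of the kept values).
    kept = [v for i, v in enumerate(clone) if i not in suppressed]
    fill = sum(kept) / len(kept)
    masked = [fill if i in suppressed else v for i, v in enumerate(clone)]
    return masked, suppressed
```

Under this sketch, the released file would carry `masked` in place of the original variable. A realistic I-step would draw on auxiliary variables, as the abstract notes, rather than on a single mean.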

19.
Summary.  Social science applications of sequence analysis have thus far involved the development of a typology on the basis of an analysis of one or two variables which have had a relatively low number of different states. There is a yet unexplored potential for sequence analysis to be applied to a greater number of variables and thereby a much larger state space. The development of a typology of employment experiences, for example, without reference to data on changes in housing, marital and family status is arguably inadequate. The paper demonstrates the use of sequence analysis in the examination of multivariable combinations of status as they change over time and shows that this method can provide insights that are difficult to achieve through other analytic methods. The data that are examined here provide support to intuitive understandings of clusters of common experiences which are both life course specific and related to socio-economic factors. Housing tenure is found to be of key importance in understanding the holistic trajectories that are examined. This suggests that life course trajectories are sharply differentiated by experience of social housing.

20.
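Analysis of an indirect DGM(1,1) model for approximately non-homogeneous exponential growth sequences   (cited 6 times in total: 1 self-citation, 5 citations by others)
The DGM(1,1) model predicts approximately homogeneous exponential growth sequences with high accuracy, but in practice few data sequences follow an approximately homogeneous exponential growth law. Following the difference information principle of grey system theory, an approximately non-homogeneous exponential growth sequence is converted into an approximately homogeneous one by inverse accumulated generation (differencing) of the original sequence. A DGM(1,1) model is built on the differenced sequence, and the original sequence is then restored from it for simulation and prediction. Because the differenced sequence satisfies the homogeneity requirement of the modeling sequence as closely as possible, simulation and prediction accuracy improve and the range of application of the model is extended. A numerical example confirms that the improved method is simple, practical and effective.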
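
The indirect modeling route in the abstract above (difference the sequence, fit DGM(1,1) to the differences, then restore the original sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes the basic DGM(1,1) form s(k+1) = b1·s(k) + b2 on the accumulated sequence s, fits b1 and b2 by ordinary least squares, and uses the first accumulated value as the starting point. All function names are hypothetical.

```python
def fit_dgm11(seq):
    """Least-squares fit of s(k+1) = b1 * s(k) + b2 on the accumulated sequence."""
    # First-order accumulated generating operation (running sum).
    s, total = [], 0.0
    for v in seq:
        total += v
        s.append(total)
    xs, ys = s[:-1], s[1:]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b2 = mean_y - b1 * mean_x
    return b1, b2


def simulate_dgm11(first, b1, b2, length):
    """Iterate the fitted recursion, then undo the accumulation."""
    s_hat = [first]
    for _ in range(length - 1):
        s_hat.append(b1 * s_hat[-1] + b2)
    return [s_hat[0]] + [s_hat[k] - s_hat[k - 1] for k in range(1, length)]


def indirect_dgm11(x, horizon=0):
    """Fit DGM(1,1) to the differenced sequence and restore the original scale.

    Needs at least four observations so that the regression is determined.
    """
    # Inverse accumulated generation: first differences of the original data.
    d = [x[k] - x[k - 1] for k in range(1, len(x))]
    b1, b2 = fit_dgm11(d)
    d_hat = simulate_dgm11(d[0], b1, b2, len(d) + horizon)
    # Restore the original sequence by re-accumulating from x[0].
    x_hat = [x[0]]
    for v in d_hat:
        x_hat.append(x_hat[-1] + v)
    return x_hat
```

For an exactly non-homogeneous exponential sequence such as x(k) = 2·1.5^k + 3, the differences are purely geometric, so the fitted recursion should reproduce and extrapolate the sequence up to floating-point error. On real data the fit is only approximate.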
