Similar Literature
20 similar documents retrieved.
1.
In the method of paired comparisons (PCs), treatments are compared on the basis of the qualitative characteristics they possess, in the light of sensory evaluations made by judges. Situations may arise, however, in which judges, in addition to assessing qualitative merits (worths), assign quantitative weights to specify the relative importance of the treatments. In this study, an attempt is made to reconcile qualitative and quantitative PCs by assigning quantitative weights to treatments with qualitative merits, extending the Bradley–Terry (BT) model. The behavior of the existing BT model and the proposed weighted BT model is studied through goodness-of-fit tests. Experimental and simulated data sets are used for illustration.
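As context for the BT worths that such quantitative weights would modify, the following minimal Python sketch fits the plain (unweighted) Bradley–Terry model by maximum likelihood to a hypothetical win-count matrix; it is an illustration under assumed data, not the authors' weighted extension.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical pairwise win counts: wins[i, j] = number of times
# treatment i was preferred over treatment j by the judges.
wins = np.array([[0, 6, 8],
                 [4, 0, 7],
                 [2, 3, 0]])
n_items = wins.shape[0]

def neg_log_lik(theta):
    """Negative Bradley-Terry log-likelihood with worths pi_i = exp(theta_i).

    P(i beats j) = pi_i / (pi_i + pi_j); theta_0 is fixed at 0 for identifiability.
    """
    theta = np.concatenate(([0.0], theta))          # anchor the first worth
    pi = np.exp(theta)
    ll = 0.0
    for i in range(n_items):
        for j in range(n_items):
            if i != j:
                ll += wins[i, j] * (theta[i] - np.log(pi[i] + pi[j]))
    return -ll

res = minimize(neg_log_lik, x0=np.zeros(n_items - 1), method="BFGS")
worths = np.exp(np.concatenate(([0.0], res.x)))
print("estimated worths (relative):", worths / worths.sum())
```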

2.
The analysis of high-dimensional data often begins with the identification of lower dimensional subspaces. Principal component analysis is a dimension reduction technique that identifies linear combinations of variables along which most variation occurs or which best “reconstruct” the original variables. For example, many temperature readings may be taken in a production process when in fact there are just a few underlying variables driving the process. A problem with principal components is that the linear combinations can seem quite arbitrary. To make them more interpretable, we introduce two classes of constraints. In the first, coefficients are constrained to equal a small number of values (homogeneity constraint). The second constraint attempts to set as many coefficients to zero as possible (sparsity constraint). The resultant interpretable directions are either calculated to be close to the original principal component directions, or calculated in a stepwise manner that may make the components more orthogonal. A small dataset on characteristics of cars is used to introduce the techniques. A more substantial data mining application is also given, illustrating the ability of the procedure to scale to a very large number of variables.
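The constrained directions above come from the paper's own optimization; purely to illustrate the sparsity and homogeneity ideas, the hedged sketch below (simulated data, naive thresholding and magnitude-snapping rather than the paper's stepwise procedure) contrasts an ordinary first component with crude "sparse" and "homogeneous" versions of it.

```python
import numpy as np

rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
X = rng.normal(scale=0.5, size=(200, 6))
X[:, :3] += factor                      # first three variables share a common factor
X -= X.mean(axis=0)

# Ordinary first principal component via SVD of the centered data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]

# Crude "sparsity" stand-in: zero out small loadings, then renormalize.
sparse_pc1 = np.where(np.abs(pc1) < 0.3, 0.0, pc1)
sparse_pc1 /= np.linalg.norm(sparse_pc1)

# Crude "homogeneity" stand-in: snap surviving loadings to a common magnitude.
homog_pc1 = np.sign(sparse_pc1) / np.sqrt(np.count_nonzero(sparse_pc1))

for name, v in [("PCA", pc1), ("sparse", sparse_pc1), ("homogeneous", homog_pc1)]:
    rel_var = np.var(X @ v) / np.var(X @ pc1)
    print(f"{name:12s} loadings={np.round(v, 2)}  relative variance={rel_var:.2f}")
```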

3.
The K-means clustering method is a widely adopted clustering algorithm in data mining and pattern recognition, where the partitions are made by minimizing the total within-group sum of squares based on a given set of variables. Weighted K-means clustering extends the K-means method by assigning nonnegative weights to the set of variables. In this paper, we aim to obtain more meaningful and interpretable clusters by deriving the optimal variable weights for weighted K-means clustering. Specifically, we improve the weighted K-means clustering method by introducing a new algorithm that obtains the globally optimal variable weights based on the Karush-Kuhn-Tucker conditions. We present the mathematical formulation of the clustering problem, derive the structural properties of the optimal weights, and implement a recursive algorithm to calculate them. Numerical examples on simulated and real data indicate that our method is superior in both clustering accuracy and computational efficiency.
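For orientation, the sketch below implements weighted K-means with a fixed, hypothetical weight vector (Lloyd iterations under a weighted squared Euclidean distance); it does not carry out the KKT-based optimization of the weights themselves, which is the paper's contribution.

```python
import numpy as np

def weighted_kmeans(X, k, w, n_iter=100, seed=0):
    """K-means with fixed nonnegative variable weights w (Lloyd iterations).

    Distances are sum_j w_j * (x_ij - c_lj)^2; this sketch does not solve for
    the optimal weights (the paper's KKT-based step).
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([X[labels == l].mean(axis=0) if np.any(labels == l)
                                else centers[l] for l in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
w = np.array([0.6, 0.3, 0.1])        # hypothetical variable weights, summing to 1
labels, centers = weighted_kmeans(X, k=2, w=w)
print("cluster sizes:", np.bincount(labels))
```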

4.
The asymptotic behavior of localized principal components applying kernels as weights is investigated. In particular, we show that the first-order approximation of the first localized principal component at any given point only depends on the bandwidth parameter(s) and the density at that point. This result is extended to the context of local principal curves, where the characteristics of the points at which the curve stops at the edges are identified. This is used to provide a method which allows the curve to proceed beyond its natural endpoint if desired.

5.
Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem; a subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced-rank regression, and two variants of what is called power regression. The comparison is mainly done by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their capability of recovering the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights in situations with collinearity is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recoveries of regression weights. The picture is inconclusive, however, because, especially in the study with more realistic simulated data, PLS and PCR gave the poorest recoveries of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases it is better to use ordinary regression. As far as prediction is concerned, prediction suffers far less from collinearity than recovery of the regression weights does.
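As a reference point for one of the compared methods, here is a minimal principal component regression sketch on simulated collinear predictors (an assumed setup, not the study's simulation design): the response is regressed on the first few component scores and the coefficients are mapped back to the original predictors.

```python
import numpy as np

def pcr(X, y, n_components):
    """Principal component regression: OLS of y on the first few PC scores,
    mapped back to coefficients for the original (centered) predictors."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                 # loadings of the retained components
    scores = Xc @ V                         # component scores
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    return V @ gamma                        # regression weights on original scale

rng = np.random.default_rng(0)
n, p = 100, 6
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))   # collinear predictors
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)
print("PCR weights (2 components):", np.round(pcr(X, y, 2), 2))
```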

6.
Technical advances in many areas have produced more complicated high-dimensional data sets than the usual high-dimensional data matrix, such as the fMRI data collected in a period for independent trials, or expression levels of genes measured in different tissues. Multiple measurements exist for each variable in each sample unit of these data. Regarding the multiple measurements as an element in a Hilbert space, we propose Principal Component Analysis (PCA) in Hilbert space. The principal components (PCs) thus defined carry information about not only the patterns of variations in individual variables but also the relationships between variables. To extract the features with greatest contributions to the explained variations in PCs for high-dimensional data, we also propose sparse PCA in Hilbert space by imposing a generalized elastic-net constraint. Efficient algorithms to solve the optimization problems in our methods are provided. We also propose a criterion for selecting the tuning parameter.

7.
We propose new ensemble approaches to estimate the population mean for missing response data with fully observed auxiliary variables. We first compress the working models according to their categories through a weighted average, where the weights are proportional to the square of the least-squares coefficients of model refitting. Based on the compressed values, we develop two ensemble frameworks, under which one is to adjust weights in the inverse probability weighting procedure and the other is built upon an additive structure by reformulating the augmented inverse probability weighting function. The asymptotic normality property is established for the proposed estimators through the theory of estimating functions with plugged-in nuisance parameter estimates. Simulation studies show that the new proposals have substantial advantages over existing ones for small sample sizes, and an acquired immune deficiency syndrome data example is used for illustration.

8.
In research on clustering methods for panel data, which have both a cross-sectional and a time dimension, the Euclidean distance function is improved by incorporating indicator weights and time weights into the clustering process, yielding a "weighted distance function" suited to panel data clustering together with a corresponding Ward.D clustering method. First, a Euclidean distance function is defined that accounts for the absolute values of the indicators, the growth rates between adjacent time points, and the degree of fluctuation; then the indicator weights and time weights are aggregated into a comprehensive weighted distance through a linear model, completing the weighted clustering procedure for panel data. Empirical results show that the weighted panel data clustering method that accounts for indicator and time weights has better discriminatory power and improves the accuracy of sample clustering.
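A minimal sketch of the weighting idea (not the full distance with growth-rate and fluctuation terms): indicator weights and time weights are combined linearly into a weighted Euclidean distance between panel units, and Ward clustering is then applied to the resulting distance matrix. All data and weights below are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical panel data: n samples x p indicators x T time points.
rng = np.random.default_rng(0)
n, p, T = 12, 4, 6
X = rng.normal(size=(n, p, T))

w_ind = np.full(p, 1.0 / p)          # indicator weights (assumed equal here)
w_time = np.linspace(1, 2, T)        # time weights, later periods weighted more
w_time /= w_time.sum()

def weighted_dist(a, b):
    """Weighted Euclidean distance between two panel units, aggregating the
    squared differences over indicators and time through a linear weighting."""
    return np.sqrt(np.sum(w_ind[:, None] * w_time[None, :] * (a - b) ** 2))

D = np.array([[weighted_dist(X[i], X[j]) for j in range(n)] for i in range(n)])
Z = linkage(squareform(D, checks=False), method="ward")
print("cluster labels:", fcluster(Z, t=3, criterion="maxclust"))
```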

9.
We address the task of choosing prior weights for models that are to be used for weighted model averaging. Models that are very similar should usually be given smaller weights than models that are quite distinct. Otherwise, the importance of a model in the weighted average could be increased by augmenting the set of models with duplicates of the model or virtual duplicates of it. Similarly, the importance of a particular model feature (a certain covariate, say) could be exaggerated by including many models with that feature. Ways of forming a correlation matrix that reflects the similarity between models are suggested. Then, weighting schemes are proposed that assign prior weights to models on the basis of this matrix. The weighting schemes give smaller weights to models that are more highly correlated. Other desirable properties of a weighting scheme are identified, and we examine the extent to which these properties are held by the proposed methods. The weighting schemes are applied to real data, and prior weights, posterior weights and Bayesian model averages are determined. For these data, empirical Bayes methods were used to form the correlation matrices that yield the prior weights. Predictive variances are examined, as empirical Bayes methods can result in unrealistically small variances.

10.
We develop an improved approximation to the asymptotic null distribution of the goodness-of-fit tests for panel-observed multi-state Markov models (Aguirre-Hernandez and Farewell, Stat Med 21:1899–1911, 2002) and hidden Markov models (Titman and Sharples, Stat Med 27:2177–2195, 2008). By considering the joint distribution of the grouped observed transition counts and the maximum likelihood estimate of the parameter vector, it is shown that the distribution can be expressed as a weighted sum of independent χ²₁ random variables, where the weights depend on the true parameters. The performance of this approximation for finite sample sizes, with the weights calculated using the maximum likelihood estimates of the parameters, is assessed through simulation. In the scenarios considered, the approximation performs well and is a substantial improvement over the simple χ² approximation.
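Given a set of weights (in practice plugged in from the maximum likelihood estimates), the approximating null distribution is easy to work with by simulation; the sketch below, with hypothetical weights and test statistic, computes a Monte Carlo p-value against a weighted sum of independent χ²₁ variables.

```python
import numpy as np

def weighted_chi2_pvalue(t_obs, weights, n_sim=100_000, seed=0):
    """Monte Carlo p-value for a statistic whose null distribution is
    approximated by sum_k w_k * chi^2_1 with independent chi^2_1 terms."""
    rng = np.random.default_rng(seed)
    chi2_draws = rng.chisquare(df=1, size=(n_sim, len(weights)))
    null_sample = chi2_draws @ np.asarray(weights)
    return np.mean(null_sample >= t_obs)

# Hypothetical weights and observed statistic, purely for illustration.
weights = [1.0, 0.7, 0.4, 0.1]
print("approximate p-value:", weighted_chi2_pvalue(t_obs=5.8, weights=weights))
```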

11.
In this study, classical and robust principal component analyses are used to evaluate the socioeconomic development of the regions served by development agencies, which were established to reduce development differences among regions in Turkey. Because the development levels of the regions differ greatly, outliers occur, so robust statistical methods are used; classical and robust methods are also used to investigate whether there are any outliers in the data set. In classical principal component analysis, the number of observations must be larger than the number of variables; otherwise the determinant of the covariance matrix is zero. With the Robust method for Principal Component Analysis (ROBPCA), a robust approach to principal component analysis for high-dimensional data, principal components are obtained even if the number of variables is larger than the number of observations. In this paper, 26 development agencies are first evaluated on 19 variables using principal component analysis based on classical and robust scatter matrices, and then the same 26 development agencies are evaluated on 46 variables using the ROBPCA method.

12.
In practice, when a principal component analysis is applied on a large number of variables the resultant principal components may not be easy to interpret, as each principal component is a linear combination of all the original variables. Selection of a subset of variables that contains, in some sense, as much information as possible and enhances the interpretations of the first few covariance principal components is one possible approach to tackle this problem. This paper describes several variable selection criteria and investigates which criteria are best for this purpose. Although some criteria are shown to be better than others, the main message of this study is that it is unwise to rely on only one or two criteria. It is also clear that the interdependence between variables and the choice of how to measure closeness between the original components and those using subsets of variables are both important in determining the best criteria to use.

13.
An approach to non-linear principal components using radially symmetric kernel basis functions is described. The procedure consists of two stages. The first is a projection of the data set to a reduced dimension using a non-linear transformation whose parameters are determined by the solution of a generalized symmetric eigenvector equation. This is achieved by demanding a maximum-variance transformation subject to a normalization condition (Hotelling's approach) and can be related to the homogeneity analysis approach of Gifi through the minimization of a loss function. The transformed variables are the principal components, whose values define contours, or more generally hypersurfaces, in the data space. The second stage of the procedure defines the fitting surface, the principal surface, in the data space (again as a weighted sum of kernel basis functions) using the definition of self-consistency of Hastie and Stuetzle. The parameters of this principal surface are determined by a singular value decomposition, and cross-validation is used to obtain the kernel bandwidths. The approach is assessed on four data sets.

14.
Principal components are often used for reducing dimensions in multivariate data, but they frequently fail to provide useful results and their interpretation is rather difficult. In this article, the use of entropy optimization principles for dimensional reduction in multivariate data is proposed. Under the assumptions of multivariate normality, a four-step procedure is developed for selecting principal variables and hence discarding redundant variables. For comparative performance of the information theoretic procedure, we use simulated data with known dimensionality. Principal variables of cluster bean (Guar) are identified by applying this procedure to a real data set generated in a plant breeding experiment.

15.
This article studies dynamic panel data models in which the long run outcome for a particular cross-section is affected by a weighted average of the outcomes in the other cross-sections. We show that imposing such a structure implies a model with several cointegrating relationships that, unlike in the standard case, are nonlinear in the coefficients to be estimated. Assuming that the weights are exogenously given, we extend the dynamic ordinary least squares methodology and provide a dynamic two-stage least squares estimator. We derive the large sample properties of our proposed estimator under a set of low-level assumptions. Then our methodology is applied to US financial market data, which consist of credit default swap spreads, as well as firm-specific and industry data. We construct the economic space using a “closeness” measure for firms based on input–output matrices. Our estimates show that this particular form of spatial correlation of credit default swap spreads is substantial and highly significant.

16.
This article provides a method of interpreting a surprising inequality in multiple linear regression: the squared multiple correlation can be greater than the sum of the simple squared correlations between the response variable and each of the predictor variables. The interpretation is obtained via principal component analysis by studying the influence of some components with small variance on the response variable. One example is used as an illustration and some conclusions are derived.
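A small numerical illustration of the inequality, in a classic suppression setting rather than the article's own example: with two highly correlated predictors and a response equal to their difference, each simple squared correlation is about 0.05, yet the multiple R² is essentially 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)   # corr(x1, x2) ~ 0.9
y = x1 - x2                                                   # exactly determined by x1, x2

r1 = np.corrcoef(y, x1)[0, 1]
r2 = np.corrcoef(y, x2)[0, 1]

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
R2 = 1 - resid.var() / y.var()

print(f"r1^2 + r2^2 = {r1**2 + r2**2:.3f}")   # about 0.10
print(f"multiple R^2 = {R2:.3f}")             # essentially 1.00
```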

17.
We consider the case of a multicenter trial in which the center-specific sample sizes are potentially small. Under homogeneity, the conventional procedure is to pool information using a weighted estimator whose weights are the inverses of the estimated center-specific variances. Whereas this procedure is efficient under conventional asymptotics (e.g., center-specific sample sizes become large, number of centers fixed), it is commonly believed that the efficiency of this estimator also holds under meta-analytic asymptotics (e.g., center-specific sample sizes bounded and potentially small, number of centers large). In this contribution we demonstrate that this estimator fails to be efficient. In fact, it shows a persistent bias as the number of centers increases, showing that it is not meta-consistent. In addition, we show that the Cochran and Mantel-Haenszel weighted estimators are meta-consistent and, more generally, provide conditions on the weights such that the associated weighted estimator is meta-consistent.
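A toy illustration of the phenomenon (assumed skewed data and many tiny centers, not the paper's formal argument): because each center's estimated variance is correlated with its estimated mean, inverse-estimated-variance pooling typically drifts away from the true common mean, whereas simple size-based (Cochran-type) weights do not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_centers, n_per_center = 2000, 4            # many small centers (meta-asymptotics)

# Hypothetical skewed data with a common true mean of 1: the sample variance is
# correlated with the sample mean, which is what makes inverse-estimated-variance
# weighting problematic in this regime.
data = rng.exponential(scale=1.0, size=(n_centers, n_per_center))
means = data.mean(axis=1)
vars_ = data.var(axis=1, ddof=1)

w_inv = 1.0 / vars_                          # conventional inverse-variance weights
w_n = np.full(n_centers, n_per_center)       # simple size-based weights

print("inverse-variance pooled mean:", np.sum(w_inv * means) / np.sum(w_inv))
print("size-weighted pooled mean:   ", np.sum(w_n * means) / np.sum(w_n))
print("true mean:                    1.0")
```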

18.
张波  刘晓倩 《统计研究》2019,36(4):119-128
This paper studies a sparse principal component analysis method based on the fused penalty, designed for data in which adjacent variables are highly correlated or even identical. First, starting from a regression perspective, a simple approach to solving for sparse principal components is proposed: a generalized sparse principal component model, the GSPCA model, is given together with its solution algorithm, and it is proved that when the penalty function is the L1 norm, the model yields the same solution as the existing sparse principal component model, the SPC model. Second, the fused penalty is combined with principal component analysis to obtain a fused sparse principal component analysis method, and two model formulations are given from the perspectives of penalized matrix decomposition and regression. It is proved theoretically that the two formulations yield the same solution, so they are jointly referred to as the FSPCA model. Simulation experiments show that the FSPCA model performs well on data sets in which adjacent variables are highly correlated or even identical. Finally, the FSPCA model is applied to handwritten digit recognition; compared with the SPC model, the principal components extracted by the FSPCA model are more interpretable, which makes the model more useful in practice.

19.
A fundamental concept in two-arm non-parametric survival analysis is the comparison of observed versus expected numbers of events on one of the treatment arms (the choice of which arm is arbitrary), where the expectation is taken assuming that the true survival curves in the two arms are identical. This concept is at the heart of the counting-process theory that provides a rigorous basis for methods such as the log-rank test. It is natural, therefore, to maintain this perspective when extending the log-rank test to deal with non-proportional hazards, for example, by considering a weighted sum of the “observed - expected” terms, where larger weights are given to time periods where the hazard ratio is expected to favor the experimental treatment. In doing so, however, one may stumble across some rather subtle issues, related to difficulties in the interpretation of hazard ratios, that may lead to strange conclusions. An alternative approach is to view non-parametric survival comparisons as permutation tests. With this perspective, one can easily improve on the efficiency of the log-rank test, while thoroughly controlling the false positive rate. In particular, for the field of immuno-oncology, where researchers often anticipate a delayed treatment effect, sample sizes could be substantially reduced without loss of power.
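A minimal sketch of the weighted "observed minus expected" statistic described above (not the permutation approach the abstract advocates), with hypothetical simulated data and an illustrative late-emphasis weight function.

```python
import numpy as np

def weighted_logrank_z(time, event, arm, weights_fn=lambda t: np.ones_like(t, dtype=float)):
    """Weighted log-rank Z statistic: sum of w_j * (observed - expected) events in
    arm 1 at each event time, standardized by its hypergeometric variance."""
    event_times = np.unique(time[event == 1])
    w = weights_fn(event_times)
    num, var = 0.0, 0.0
    for wj, t in zip(w, event_times):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (arm == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (arm == 1)).sum()
        e1 = d * n1 / n
        v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        num += wj * (d1 - e1)
        var += wj ** 2 * v
    return num / np.sqrt(var)

# Hypothetical data; a late-emphasis weight such as w(t) = min(t, 5) up-weights
# later event times where a delayed treatment effect is anticipated.
rng = np.random.default_rng(0)
n = 100
arm = rng.integers(0, 2, size=n)
time = rng.exponential(scale=np.where(arm == 1, 12, 8))
event = (time < 10).astype(int)
time = np.minimum(time, 10)
print("weighted log-rank Z:", weighted_logrank_z(time, event, arm, lambda t: np.minimum(t, 5)))
```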

20.
In this article, a robust variable selection procedure based on the weighted composite quantile regression (WCQR) is proposed. Compared with the composite quantile regression (CQR), WCQR is robust to heavy-tailed errors and outliers in the explanatory variables. For the choice of the weights in the WCQR, we employ a weighting scheme based on the principal component method. To select variables with grouping effect, we consider WCQR with SCAD-L2 penalization. Furthermore, under some suitable assumptions, the theoretical properties, including the consistency and oracle property of the estimator, are established with a diverging number of parameters. In addition, we study the numerical performance of the proposed method in the case of ultrahigh-dimensional data. Simulation studies and real examples are provided to demonstrate the superiority of our method over the CQR method when there are outliers in the explanatory variables and/or the random error is from a heavy-tailed distribution.
