Similar Documents
20 similar documents found (search time: 31 ms).
1.
Sparse principal components analysis (SPCA) is a technique for finding principal components with a small number of non-zero loadings. Our contribution to this methodology is twofold. First we derive the sparse solutions that minimise the least squares criterion subject to sparsity requirements. Second, recognising that sparsity is not the only requirement for achieving simplicity, we suggest a backward elimination algorithm that computes sparse solutions with large loadings. This algorithm can be run without specifying the number of non-zero loadings in advance. It is also possible to impose the requirement that a minimum amount of variance be explained by the components. We give thorough comparisons with existing SPCA methods and present several examples using real datasets.
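A minimal numerical sketch of the backward-elimination idea (not the authors' exact algorithm; the function name and the stopping rule min_var_ratio are illustrative): repeatedly drop the variable with the smallest absolute loading of the leading eigenvector, recomputed on the remaining variables, until too much explained variance would be lost.

    import numpy as np

    def backward_sparse_pc(X, min_var_ratio=0.8):
        # Centre the data and work with the sample covariance matrix.
        S = np.cov(X - X.mean(axis=0), rowvar=False)
        p = S.shape[0]
        active = list(range(p))

        def leading(sub):
            vals, vecs = np.linalg.eigh(S[np.ix_(sub, sub)])
            return vals[-1], vecs[:, -1]      # leading eigenvalue / eigenvector

        full_var, _ = leading(active)
        while len(active) > 1:
            var, vec = leading(active)
            # Tentatively drop the variable with the smallest absolute loading.
            trial = [a for i, a in enumerate(active) if i != int(np.argmin(np.abs(vec)))]
            if leading(trial)[0] < min_var_ratio * full_var:
                break                          # stop: too much variance would be lost
            active = trial

        var, vec = leading(active)
        loadings = np.zeros(p)
        loadings[active] = vec
        return loadings, var / full_var        # sparse loadings, retained variance ratio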

2.
The principal components analysis (PCA) in the frequency domain of a stationary p-dimensional time series (X_n)_{n∈ℤ} leads to a summarizing time series written as a linear combination series X_n = Σ_m C_m ∘ X_{n−m}. We observe that, when the coefficients C_m, m ≠ 0, are close to 0, this PCA is close to the usual PCA, that is, the PCA in the temporal domain. When the coefficients tend to 0, the corresponding limit is said to satisfy a property denoted 𝒫, of which we study the consequences. Finally, we examine, for any series, the proximity between the two PCAs.

3.
Consider a parallel system with n independent components. Assume that the lifetime of the jth component follows an exponential distribution with a constant but unknown parameter λ_j, 1 ≤ j ≤ n. We test r_j components of type j for failure and record the total time T_j of the r_j failures for the jth component. Based on T = (T_1, T_2, …, T_n) and r = (r_1, r_2, …, r_n), we derive optimal reliability test plans which ensure the usual probability requirements on system reliability. Further, we solve the associated nonlinear integer programming problem by a simple enumeration of integers over the feasible range. An algorithm is developed to obtain integer solutions with minimum cost. Finally, some examples are discussed for various levels of producer's and consumer's risk to illustrate the approach. Our optimal plans lead to considerable savings in cost over the plans available in the literature.
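A hedged sketch of the enumeration step described above; meets_risk_requirements stands in for the paper's producer/consumer-risk conditions on system reliability and is a placeholder, not the actual constraint.

    import itertools
    import numpy as np

    def optimal_test_plan(costs, r_max, meets_risk_requirements):
        # Enumerate integer vectors r = (r_1, ..., r_n), 0 <= r_j <= r_max,
        # and return the cheapest plan satisfying the supplied risk constraint.
        # costs[j] is the cost of testing one more type-j component to failure.
        best_r, best_cost = None, np.inf
        n = len(costs)
        for r in itertools.product(range(r_max + 1), repeat=n):
            cost = float(np.dot(costs, r))
            if cost < best_cost and meets_risk_requirements(r):
                best_r, best_cost = r, cost
        return best_r, best_cost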

4.
A new method for constructing interpretable principal components is proposed. The method first clusters the variables, and then interpretable (sparse) components are constructed from the correlation matrices of the clustered variables. For the first step of the method, a new weighted-variances method for clustering variables is proposed. It reflects the nature of the problem that the interpretable components should maximize the explained variance and thus provide sparse dimension reduction. An important feature of the new clustering procedure is that the optimal number of clusters (and components) can be determined in a non-subjective manner. The new method is illustrated using well-known simulated and real data sets. It clearly outperforms many existing methods for sparse principal component analysis in terms of both explained variance and sparseness.
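A rough two-step illustration in the same spirit (this is not the paper's weighted-variances procedure; the distance 1 − |correlation| and average linkage are assumptions): variables are clustered first, and the first principal component within each cluster gives a loading vector that is zero outside that cluster.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def clustered_sparse_components(X, n_clusters):
        # Cluster the variables by 1 - |correlation|, then compute the first PC
        # of each cluster's covariance block as a sparse loading vector.
        R = np.corrcoef(X, rowvar=False)
        D = 1.0 - np.abs(R)
        np.fill_diagonal(D, 0.0)
        labels = fcluster(linkage(squareform(D, checks=False), method="average"),
                          t=n_clusters, criterion="maxclust")
        p = X.shape[1]
        components = []
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            S = np.atleast_2d(np.cov(X[:, idx], rowvar=False))
            vals, vecs = np.linalg.eigh(S)
            v = np.zeros(p)
            v[idx] = vecs[:, -1]              # loadings are zero outside the cluster
            components.append(v)
        return np.array(components)           # one sparse loading vector per cluster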

5.
Interpretation of principal components is difficult due to their weights (loadings, coefficients) being of various sizes. Whereas very small weights or very large weights give a clear indication of the importance of particular variables, weights that are neither large nor small (‘grey area’ weights) are problematical. This is a particular problem in the fast-moving goods industries, where a lot of multivariate panel data are collected on products. These panel data are subjected to univariate and multivariate analyses, where principal components (PCs) are key to the interpretation of the data. Several authors have suggested alternatives to PCs, seeking simplified components such as sparse PCs. Here, components termed simple components (SCs) are sought in conjunction with Thurstonian criteria: a component should have only a few variables highly weighted on it, and each variable should be weighted heavily on just a few components. An algorithm is presented that finds SCs efficiently. Simple components are found for panel data consisting of the responses to a questionnaire on efficacy and other features of deodorants. It is shown that five SCs can explain an amount of variation within the data comparable to that explained by the PCs, but with easier interpretation.

6.
Differential analysis techniques are commonly used to offer scientists a dimension reduction procedure and an interpretable gateway to variable selection, especially when confronting high-dimensional genomic data. Huang et al. used a gene expression profile of breast cancer cell lines to identify genomic markers which are highly correlated with in vitro sensitivity of the drug Dasatinib. They considered three statistical methods to identify differentially expressed genes and used the intersection of the results. However, the statistical methods used in that paper are not sufficient for selecting the genomic markers. In this paper we use three alternative statistical methods to select a combined list of genomic markers and compare it with the genes proposed by Huang et al. We then propose sparse principal component analysis (Sparse PCA) to identify a final list of genomic markers. Sparse PCA takes the correlation among the genes into account and leads to a successful discovery of genomic markers. We present a new, small set of genomic markers that effectively separates the groups of patients who are sensitive to the drug Dasatinib. The analysis procedure will also encourage scientists to identify genomic markers that can help to separate two groups.
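As a small illustration of the final step, sparse PCA yields components in which only a few genes carry non-zero loadings; the expression matrix, its dimensions, and the penalty below are placeholders, not the data or tuning used in the study.

    import numpy as np
    from sklearn.decomposition import SparsePCA

    rng = np.random.default_rng(0)
    expr = rng.standard_normal((60, 200))     # placeholder: 60 cell lines x 200 genes

    spca = SparsePCA(n_components=2, alpha=2.0, random_state=0)
    spca.fit(expr)

    # Genes with a non-zero loading on either sparse component are candidate markers.
    marker_idx = np.where(np.any(spca.components_ != 0, axis=0))[0]
    print(len(marker_idx), "candidate marker genes")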

7.
Abstract

In this article we study the relationship between principal component analysis and a multivariate dependency measure. It is shown, via simulated examples and real data, that the information provided by principal components is compatible with that obtained via the dependency measure δ. Furthermore, we show that in some instances in which principal component analysis fails to give reasonable results due to nonlinearity among the random variables, the dependency statistic δ still provides good results. Finally, we give some ideas about using the statistic δ in order to reduce the dimensionality of a given data set.

8.
In practice, when a principal component analysis is applied on a large number of variables the resultant principal components may not be easy to interpret, as each principal component is a linear combination of all the original variables. Selection of a subset of variables that contains, in some sense, as much information as possible and enhances the interpretations of the first few covariance principal components is one possible approach to tackle this problem. This paper describes several variable selection criteria and investigates which criteria are best for this purpose. Although some criteria are shown to be better than others, the main message of this study is that it is unwise to rely on only one or two criteria. It is also clear that the interdependence between variables and the choice of how to measure closeness between the original components and those using subsets of variables are both important in determining the best criteria to use.

9.
We introduce a new family of skew-normal distributions that contains the skew-normal distributions introduced by Azzalini (Scand J Stat 12:171–178, 1985), Arellano-Valle et al. (Commun Stat Theory Methods 33(7):1465–1480, 2004), Gupta and Gupta (Test 13(2):501–524, 2004) and Sharafi and Behboodian (Stat Papers 49:769–778, 2008). We denote this distribution by GBSN_n(λ_1, λ_2). We present some properties of GBSN_n(λ_1, λ_2) and derive its moment generating function. Finally, we use two numerical examples to illustrate the practical usefulness of this distribution.
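The GBSN_n(λ_1, λ_2) density itself is not reproduced in this abstract; as a point of reference, a minimal sketch of the Azzalini (1985) skew-normal density that the family generalizes:

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    def azzalini_sn_pdf(x, lam):
        # Azzalini (1985) skew-normal density: f(x) = 2 * phi(x) * Phi(lam * x)
        return 2.0 * norm.pdf(x) * norm.cdf(lam * x)

    total, _ = quad(azzalini_sn_pdf, -np.inf, np.inf, args=(3.0,))
    print(round(total, 6))   # ~1.0: the skewing factor leaves the total mass unchanged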

10.
A fast routine for converting regression algorithms into corresponding orthogonal regression (OR) algorithms was introduced in Ammann and Van Ness (1988). The present paper discusses the properties of various ordinary and robust OR procedures created using this routine. OR minimizes the sum of the orthogonal distances from the regression plane to the data points. OR has three types of applications. First, L_2 OR is the maximum likelihood solution of the Gaussian errors-in-variables (EV) regression problem. This L_2 solution is unstable, so the robust OR algorithms created from robust regression algorithms should prove very useful. Second, OR is intimately related to principal components analysis; the routine can therefore also be used to create L_1, robust, and other principal components algorithms. Third, OR treats the x and y variables symmetrically, which is important in many modeling problems. Using Monte Carlo studies, this paper compares the performance of standard regression, robust regression, OR, and robust OR on Gaussian EV data, contaminated Gaussian EV data, heavy-tailed EV data, and contaminated heavy-tailed EV data.
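For the L_2 case, orthogonal regression coincides with total least squares and can be computed directly from an SVD; a minimal sketch (the robust variants discussed above additionally require a robust regression routine and are not shown):

    import numpy as np

    def orthogonal_regression(X, y):
        # L2 orthogonal regression (total least squares): the fitted hyperplane is
        # orthogonal to the smallest right singular vector of the centered data [X, y].
        Z = np.column_stack([X, y])
        mean = Z.mean(axis=0)
        _, _, Vt = np.linalg.svd(Z - mean, full_matrices=False)
        v = Vt[-1]                      # normal vector of the best-fitting plane
        # Rewrite the plane as y = a + b'x (assumes the last entry of v is non-zero).
        b = -v[:-1] / v[-1]
        a = mean[-1] - mean[:-1] @ b
        return a, b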

11.
We consider a generalization of the Azzalini skew-normal distribution, which we denote by SNB_n(λ). Some properties of SNB_n(λ) are studied, its moment generating function is derived, and the bivariate case of SNB_n(λ) is introduced. Finally, we give a numerical example and present an application to order statistics.

12.
In this note, we derive mixture representations for the reliability function of the conditional residual lifetime of a coherent system with n independent and identically distributed (i.i.d.) components, under the condition that at time t_1 the jth failure has occurred and at time t_2 the kth failure (j < k) has not yet occurred. Based on these mixture representations, we then discuss stochastic comparisons of the conditional residual lifetimes of two coherent systems with i.i.d. components.

13.
The robust principal components analysis (RPCA) introduced by Campbell (Applied Statistics 1980, 29, 231–237) provides, in addition to robust versions of the usual output of a principal components analysis, weights for the contribution of each point to the robust estimation of each component. Low weights may thus be used to indicate outliers. The present simulation study provides critical values for testing the kth smallest weight in the RPCA of a sample of n p-dimensional vectors, under the null hypothesis of a multivariate normal distribution. The cases p = 2(2)10, 15, 20 for n = 20, 30, 40, 50, 75, 100, subject to n ≥ p/2, are examined, with k ≤ √n.
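A rough sketch in the spirit of Campbell's weighting scheme, not his exact estimator: the mean and covariance are re-estimated iteratively with Huber-type weights based on Mahalanobis distances, and the resulting weights flag potential outliers (the cut-off constant c below is an assumption, not Campbell's tuning).

    import numpy as np

    def robust_pca_weights(X, c=None, n_iter=20):
        # Iteratively reweighted mean/covariance; observations with large
        # Mahalanobis distances are down-weighted before the eigendecomposition.
        n, p = X.shape
        if c is None:
            c = np.sqrt(p) + 2.0              # assumed cut-off
        w = np.ones(n)
        for _ in range(n_iter):
            m = np.average(X, axis=0, weights=w)
            Xc = X - m
            S = (w[:, None] * Xc).T @ Xc / w.sum()
            d = np.sqrt(np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc))
            w = np.where(d <= c, 1.0, c / d)  # Huber-type down-weighting
        vals, vecs = np.linalg.eigh(S)
        return w, vals[::-1], vecs[:, ::-1]   # weights, eigenvalues, robust components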

14.
Double arrays of n rows and p columns can be regarded as n drawings from some p-dimensional population. A sequence of such arrays is considered. Principal component analysis of each array forms sequences of sample principal components and eigenvalues. The continuity of these sequences, in the sense of convergence with probability one and convergence in probability, is investigated; this appears to be informative for studying patterns in, and predicting, principal components. Various features of the paths of sequences of population principal components are highlighted through an example.

15.
The use of large-dimensional factor models in forecasting has received much attention in the literature, with the consensus being that improvements over standard models can be achieved. However, recent contributions have demonstrated that care needs to be taken when choosing which variables to include in the model. A number of different approaches to determining these variables have been put forward; these are, however, often based on ad hoc procedures or abandon the underlying theoretical factor model. In this article, we take a different approach to the problem by using the least absolute shrinkage and selection operator (LASSO) as a variable selection method to choose between the possible variables and thus obtain sparse loadings from which factors or diffusion indexes can be formed. This allows us to build a more parsimonious factor model that is better suited for forecasting than the traditional principal components (PC) approach. We provide an asymptotic analysis of the estimator and illustrate its merits empirically in a forecasting experiment based on U.S. macroeconomic data. Overall, we obtain improvements in forecasting accuracy compared to PC and thus find the LASSO-based approach to be an important alternative to PC. Supplementary materials for this article are available online.
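A rough sketch of the pipeline described above (not the authors' estimator; the helper name, tuning choices, and simple OLS forecasting step are illustrative): LASSO selects which predictors enter, principal components of the selected predictors form the diffusion indexes, and a regression on the factors delivers the h-step forecast.

    import numpy as np
    from sklearn.linear_model import LassoCV, LinearRegression
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def lasso_factor_forecast(X, y, h=1, n_factors=3):
        # Align predictors at time t with the target at time t + h.
        Xs = StandardScaler().fit_transform(X)
        X_train, y_target = Xs[:-h], y[h:]
        # Step 1: LASSO selects the predictors with non-zero coefficients.
        sel = np.flatnonzero(LassoCV(cv=5).fit(X_train, y_target).coef_)
        if sel.size == 0:
            return float(np.mean(y_target))          # no predictor survives the penalty
        # Step 2: principal components of the selected predictors = diffusion indexes.
        pca = PCA(n_components=min(n_factors, sel.size)).fit(X_train[:, sel])
        F_train = pca.transform(X_train[:, sel])
        # Step 3: regress the target on the factors and forecast from the last observation.
        reg = LinearRegression().fit(F_train, y_target)
        F_last = pca.transform(Xs[-1:, sel])
        return float(reg.predict(F_last)[0])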

16.
Series and parallel systems consisting of two dependent components are studied under bivariate shock models. The random variables N_1 and N_2, which represent respectively the number of shocks until failure of component 1 and component 2, are assumed to be dependent and of phase type. The times between successive shocks are assumed to follow a continuous phase-type distribution, and survival functions and mean time to failure values of series and parallel systems are obtained in matrix form. An upper bound for the joint survival function of the components is also provided in the particular case when the times between shocks follow an exponential distribution.

17.
This article investigates the relevance of considering a large number of macroeconomic indicators to forecast the complete distribution of a variable. The baseline time series model is a semiparametric specification based on the quantile autoregressive (QAR) model, which assumes that the quantiles depend on the lagged values of the variable. We then augment the time series model with macroeconomic information from a large dataset by including principal components or a subset of variables selected by LASSO. We forecast the distribution of the h-month growth rate for four economic variables from 1975 to 2011 and evaluate the forecast accuracy relative to a stochastic volatility model using the quantile score. The results for the output and employment measures indicate that the multivariate models outperform the time series forecasts, in particular at long horizons and in the tails of the distribution, while for the inflation variables the improved performance occurs mostly at the 6-month horizon. We also illustrate the practical relevance of predicting the distribution by considering forecasts at three dates during the last recession.

18.
How to improve the fit of Archimedean copulas by means of transforms
The selection of copulas is an important aspect of dependence modeling. In many practical applications, only a limited number of copulas is tested and the copula with the best result for a goodness-of-fit test is chosen, which, however, does not always lead to the best possible fit. In this paper we develop a practical and logical method for improving the goodness-of-fit of a particular Archimedean copula by means of transforms. To do this, we introduce concordance-invariant transforms, which can also be tail-dependence preserving, based on an analysis of the λ-function, λ = φ/φ′, where φ is the Archimedean generator. The methodology is applied to the data set studied in Cook and Johnson (J R Stat Soc B 43:210–218, 1981) and Genest and Rivest (J Am Stat Assoc 88:1034–1043, 1993), where we improve the fit of the Frank copula and obtain statistically significant results.
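The transforms themselves are not reproduced here; as a small illustration of the quantity being analysed, a numerical evaluation of λ(t) = φ(t)/φ′(t) for the Clayton generator (the choice of generator and of θ are assumptions, not taken from the paper):

    import numpy as np

    def lambda_function(phi, t, eps=1e-6):
        # lambda(t) = phi(t) / phi'(t), with the derivative taken by a
        # central finite difference so any generator can be plugged in.
        dphi = (phi(t + eps) - phi(t - eps)) / (2 * eps)
        return phi(t) / dphi

    # Example: Clayton generator phi(t) = (t^(-theta) - 1) / theta.
    theta = 2.0
    clayton = lambda t: (t ** (-theta) - 1.0) / theta

    t = np.linspace(0.05, 0.95, 5)
    print(lambda_function(clayton, t))       # numerical lambda(t)
    print((t ** (theta + 1) - t) / theta)    # closed form for Clayton, for comparison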

19.
ABSTRACT

We present a decomposition of prediction error for the multilevel model in the context of predicting a future observable y*_j in the jth group of a hierarchical dataset. The multilevel prediction rule is used for prediction, and the components of prediction error are estimated via a simulation study that spans the various combinations of level-1 (individual) and level-2 (group) sample sizes and different intraclass correlation values. Additionally, analytical results present the increase in predicted mean square error (PMSE) with respect to prediction error bias. The components of prediction error provide information about the cost of parameter estimation versus data imputation when predicting future values in a hierarchical dataset. Specifically, the cost of parameter estimation is very small compared to that of data imputation.

20.
Summary  Let g(x) and f(x) be continuous density functions on (a, b), and let {φ_j} be a complete orthonormal sequence of functions on L_2(g), the set of square-integrable functions weighted by g on (a, b). Suppose that f admits a series expansion in the φ_j, with coefficients θ_j, over (a, b). Given a grouped sample of size n from f(x), the paper investigates the asymptotic properties of the restricted maximum likelihood estimator of the density, obtained by setting all but the first m of the θ_j's equal to 0. Practical suggestions are given for performing the estimation via Fourier and Legendre polynomial series. Research partially supported by a CNR grant, n. 93.00837.CT10.
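A simpler, moment-based cousin of the estimator studied above (not the restricted maximum likelihood estimator itself), using the same Legendre basis on (−1, 1); the simulated Beta sample below is only there to check the fit, and the data are assumed to be already rescaled to (−1, 1):

    import numpy as np
    from scipy.special import eval_legendre

    def series_density_estimate(sample, x, m=8):
        # Orthogonal series density estimate using the first m+1 orthonormal
        # Legendre polynomials; theta_j is estimated by the sample mean of phi_j(X).
        fhat = np.zeros_like(x, dtype=float)
        for j in range(m + 1):
            norm_j = np.sqrt((2 * j + 1) / 2.0)      # makes P_j orthonormal on (-1, 1)
            theta_j = np.mean(norm_j * eval_legendre(j, sample))
            fhat += theta_j * norm_j * eval_legendre(j, x)
        return np.clip(fhat, 0.0, None)              # truncate negative wiggles

    # Quick check against a Beta(2, 2) sample rescaled to (-1, 1).
    rng = np.random.default_rng(1)
    sample = 2 * rng.beta(2, 2, size=2000) - 1
    x = np.linspace(-0.99, 0.99, 5)
    print(series_density_estimate(sample, x, m=6))
    print(0.75 * (1 - x**2))                         # true density of 2*Beta(2,2) - 1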
