Similar documents
Found 20 similar documents (search time: 46 ms)
1.
Relationships between species and their environment are a key component in understanding ecological communities. Usually, such data are collected repeatedly over time or space for communities and their environment, which leads to a sequence of pairs of ecological tables, i.e. multi-way matrices. This work proposes a new method, a combined approach of the STATICO and Tucker3 techniques, which addresses the problem of describing not only the stable part of the dynamics of structure–function relationships between communities and their environment (in different locations and/or at different times), but also the interactions and changes associated with the ecosystems' dynamics. At the same time, emphasis is given to a comparison with the STATICO method on the same (real) data set, where advantages and drawbacks are explored and discussed. This study thus produces a general methodological framework and develops a new technique to facilitate the use of these practices by researchers. Furthermore, this first application to estuarine environmental data suggests that one of the major advantages of modelling ecological data sets with the CO-TUCKER model is the gain in interpretability.

2.
ABSTRACT

Canonical correlations are maximized correlation coefficients indicating the relationships between pairs of canonical variates, which are linear combinations of the two sets of original variables. The number of non-zero canonical correlations in a population is called its dimensionality. Parallel analysis (PA) is an empirical method for determining the number of principal components or factors that should be retained in factor analysis. An example is given to illustrate how procedures based on PA and a bootstrap-modified PA can be adapted to the context of canonical correlation analysis (CCA). The performance of the proposed procedures is evaluated in a simulation study by comparing them with traditional sequential test procedures with respect to under-, correct and over-determination of dimensionality in CCA.
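As a rough illustration of how a permutation-style parallel analysis can be adapted to CCA (a sketch of the general idea only, with an assumed retention rule; it is not the authors' exact procedure or their bootstrap variant), sample canonical correlations can be compared with the null distribution obtained by permuting the rows of one block:

```python
import numpy as np

def canonical_corrs(X, Y):
    """Canonical correlations via QR of the centred blocks + SVD."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def pa_cca_dimension(X, Y, n_perm=500, quantile=0.95, seed=0):
    """Parallel-analysis-style dimensionality estimate for CCA (a sketch)."""
    rng = np.random.default_rng(seed)
    obs = canonical_corrs(X, Y)
    null = np.empty((n_perm, len(obs)))
    for b in range(n_perm):
        # Permuting the rows of Y destroys the X-Y association while
        # keeping each block's internal correlation structure intact.
        null[b] = canonical_corrs(X, Y[rng.permutation(len(Y))])
    thresh = np.quantile(null, quantile, axis=0)
    keep = obs > thresh          # retain correlations beating the null
    return len(obs) if keep.all() else int(np.argmin(keep))
```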

3.
Many recent articles have found that atheoretical forecasting methods using many predictors give better predictions for key macroeconomic variables than various small-model methods. The practical relevance of these results is open to question, however, because these articles generally use ex post revised data not available to forecasters and because no comparison is made to best actual practice. We provide some evidence on both of these points using a new large dataset of vintage data synchronized with the Fed's Greenbook forecast. This dataset consists of a large number of variables as observed at the time of each Greenbook forecast since 1979. We compare real-time, large dataset predictions to both simple univariate methods and to the Greenbook forecast. For inflation we find that univariate methods are dominated by the best atheoretical large dataset methods and that these, in turn, are dominated by Greenbook. For GDP growth, in contrast, we find that once one takes account of Greenbook's advantage in evaluating the current state of the economy, neither the large dataset methods nor the Greenbook process offers much advantage over a univariate autoregressive forecast.
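The kind of horse race described can be sketched on simulated data (a toy illustration, not the paper's real-time Greenbook exercise; the AR order, factor count and panel are all assumptions): a univariate autoregression versus a diffusion-index forecast built from principal-component factors of a large predictor panel.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
T, N = 200, 50                       # time points, predictors (simulated panel)
F = rng.normal(size=(T, 2))          # latent factors driving the panel
panel = F @ rng.normal(size=(2, N)) + rng.normal(scale=0.5, size=(T, N))
y = 0.5 * np.roll(F[:, 0], 1) + rng.normal(scale=0.3, size=T)  # toy target

# Univariate benchmark: AR(4) one-step-ahead forecast of the last point.
ar_fc = AutoReg(y[:-1], lags=4).fit().predict(start=T - 1, end=T - 1)

# "Large dataset" forecast: PCA factors from the panel, then OLS of
# y_{t+1} on the estimated factors at time t.
fhat = PCA(n_components=2).fit_transform(panel)
ols = LinearRegression().fit(fhat[:-2], y[1:-1])
di_fc = ols.predict(fhat[-2:-1])

print("AR:", float(ar_fc[0]), "factors:", float(di_fc[0]), "actual:", float(y[-1]))
```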

4.
The number of parameters in a linear mixed effects (LME) model mushrooms in the case of multivariate repeated-measures data, and computing these parameters becomes a real problem as the number of response variables or the number of time points increases. The problem becomes more intricate with the addition of further random effects, and a full multivariate analysis is not possible in a small-sample setting. We propose a method to estimate these many parameters in bits and pieces from baby models, taking a subset of response variables at a time, and finally combining these pieces to obtain the parameter estimates for the mother model, with all variables taken together. Applying this method, one can calculate the fixed effects, the best linear unbiased predictions (BLUPs) for the random effects in the model, and also the BLUPs at each time of observation for each response variable, in order to monitor the effectiveness of the treatment for each subject. The proposed method is illustrated with an example of multiple response variables measured over multiple time points arising from a clinical trial in osteoporosis.
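A minimal sketch of the baby-model flavour using statsmodels (the column names, the formula and the one-response-at-a-time split are assumptions; recombining the pieces into the multivariate mother model is not reproduced here):

```python
import statsmodels.formula.api as smf

def fit_baby_models(df, responses):
    """Fit one small LME per response variable (a 'baby model' sketch).

    df is long-format with hypothetical columns subject, time, treatment,
    plus one column per response named in `responses`.
    """
    results = {}
    for resp in responses:
        model = smf.mixedlm(f"{resp} ~ time + treatment", df,
                            groups=df["subject"],
                            re_formula="~time")   # random intercept + slope
        fit = model.fit()
        results[resp] = {
            "fixed_effects": fit.fe_params,       # estimated fixed effects
            "blups": fit.random_effects,          # per-subject BLUPs
        }
    return results
```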

5.
Credit-scoring techniques have been developed in recent years to reduce the risk taken by banks and financial institutions in the loans they grant. Credit scoring is the problem of classifying individuals into one of two groups: defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new method of discrimination for the case where the dependent variable is categorical and a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components that take into account both the relationships between the explanatory categorical variables and the canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application to credit-scoring data are provided.
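The paper's Categorical Multiblock Linear Discriminant Analysis is not available off the shelf; as a hedged point of reference, the sketch below shows the standard baseline such methods are compared against: indicator (one-hot) coding of the categorical predictors followed by ordinary linear discriminant analysis (the file and column names are hypothetical).

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical credit data: categorical predictors plus a binary
# default indicator (column names are illustrative).
df = pd.read_csv("credit.csv")
X = pd.get_dummies(df.drop(columns="default"))   # indicator coding
y = df["default"]

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=5, scoring="roc_auc")
print("5-fold AUC:", scores.mean())
```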

6.
This paper describes a proposal for extending the dual multiple factor analysis (DMFA) method developed by Lê and Pagès [15] to the analysis of categorical tables in which the same set of variables is measured on different sets of individuals. The extension of DMFA is based on transforming the categorical variables into properly weighted indicator variables, in a way analogous to that used in the multiple factor analysis of categorical variables. The DMFA of categorical variables enables visual comparison of the association structures between categories over the sample as a whole and in the various subsamples (sets of individuals). For each category, DMFA yields its global (considering all the individuals) and partial (considering each set of individuals) coordinates in a factor space. This visual analysis allows us to compare the sets of individuals and identify their similarities and differences. The suitability of the technique is illustrated through two applications: one using simulated data for two groups of individuals with very different association structures, and the other using real data from a voting-intention survey in which some respondents were interviewed by telephone and others face to face. The results indicate that the two data-collection methods, while similar, are not entirely equivalent.

7.
Regression methods for common data types such as measured, count and categorical variables are well understood, but statisticians increasingly need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, whether as the response and/or as a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We call the transformation from a new observation to a score scoring, whereas backscoring is a method to represent a score as an observation in the data space. Both are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion-capture data from an experiment studying the motion of children with cleft lip.
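A minimal sketch of the scoring pipeline just described, assuming a user-supplied distance function (the linear internal model is one simple choice among the "standard methods" mentioned):

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.linear_model import LinearRegression

def score_and_regress(objects, distance, y, n_components=2, seed=0):
    """Distance matrix -> MDS scores -> linear internal model (a sketch)."""
    n = len(objects)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = distance(objects[i], objects[j])
    # Metric MDS on the precomputed distances gives low-dimensional scores.
    mds = MDS(n_components=n_components, dissimilarity="precomputed",
              random_state=seed)
    scores = mds.fit_transform(D)
    # Internal model linking predictor scores to the response.
    return scores, LinearRegression().fit(scores, y)
```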

8.
Technical advances in many areas have produced high-dimensional data sets more complicated than the usual high-dimensional data matrix, such as fMRI data collected over a period for independent trials, or expression levels of genes measured in different tissues. In such data, multiple measurements exist for each variable in each sample unit. Regarding the multiple measurements as an element of a Hilbert space, we propose Principal Component Analysis (PCA) in Hilbert space. The principal components (PCs) thus defined carry information not only about the patterns of variation in individual variables but also about the relationships between variables. To extract the features with the greatest contributions to the variation explained by the PCs in high-dimensional data, we also propose sparse PCA in Hilbert space by imposing a generalized elastic-net constraint. Efficient algorithms to solve the resulting optimization problems are provided, along with a criterion for selecting the tuning parameter.
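A rough numpy rendering of the core construction (an illustrative sketch of my own, not the authors' algorithm or their sparse elastic-net variant): with data of shape (samples, variables, measurements), each (sample, variable) cell is treated as a Hilbert-space element and the variables' covariance is built from inner products of the centred elements.

```python
import numpy as np

def hilbert_pca(X, n_components=2):
    """PCA when each (sample, variable) cell is a measurement vector.

    X has shape (n_samples, n_vars, n_meas). The (j, k) covariance entry
    is the average inner product between the centred elements of
    variables j and k, so the PCs reflect both within-variable variation
    and between-variable relationships.
    """
    Xc = X - X.mean(axis=0, keepdims=True)           # centre in Hilbert space
    # cov[j, k] = mean_i <Xc[i, j, :], Xc[i, k, :]>
    cov = np.einsum("ijm,ikm->jk", Xc, Xc) / X.shape[0]
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order]               # variances and loadings
```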

9.
Cluster analysis is one of the most widely used methods in statistical analysis, in which homogeneous subgroups are identified in a heterogeneous population. Because mixed continuous and discrete data arise in many applications, ordinary clustering methods such as hierarchical methods, k-means and model-based methods have been extended to the analysis of mixed data. However, in the available model-based clustering methods, the number of parameters grows with the number of continuous variables, and identifying and fitting an appropriate model may be difficult. In this paper, a set of parsimonious models is introduced to reduce the number of parameters in model-based clustering of mixed continuous (normal) and nominal data. These models use the general location model approach to model the distribution of the mixed variables and impose a factor-analyzer structure on the covariance matrices. The ECM algorithm is used for estimating the parameters of these models. To show the performance of the proposed models for clustering, results from simulation studies and from the analysis of two real data sets are presented.

10.
Summary. Social science applications of sequence analysis have thus far involved the development of a typology on the basis of an analysis of one or two variables with a relatively small number of different states. There is as yet unexplored potential for sequence analysis to be applied to a greater number of variables, and thereby a much larger state space. The development of a typology of employment experiences without reference to data on changes in housing, marital and family status, for example, is arguably inadequate. The paper demonstrates the use of sequence analysis in the examination of multivariable combinations of status as they change over time, and shows that this method can provide insights that are difficult to achieve through other analytic methods. The data examined here support intuitive understandings of clusters of common experiences which are both life-course specific and related to socio-economic factors. Housing tenure is found to be of key importance in understanding the holistic trajectories that are examined, suggesting that life-course trajectories are sharply differentiated by experience of social housing.

11.
Pettitt, A. N., Weir, I. S. & Hart, A. G. Statistics and Computing (2002) 12(4): 353–367
A Gaussian conditional autoregressive (CAR) formulation is presented that permits modelling of the spatial dependence, and of the dependence between multivariate random variables, at irregularly spaced sites, thus capturing some of the modelling advantages of the geostatistical approach. The model benefits not only from the explicit availability of the full conditionals but also from the computational simplicity of the precision-matrix determinant calculation, which uses a closed-form expression involving the eigenvalues of a precision-matrix submatrix. Introducing covariates into the model adds little computational complexity, so the method extends straightforwardly to regression models. Because of its computational simplicity, the model is well suited to fully Bayesian analysis of large data sets involving multivariate measurements with a spatial ordering; an extension to spatio-temporal data is also considered. Here, we demonstrate use of the model in the analysis of bivariate binary data, where the observed data are modelled as the sign of a hidden CAR process. A case study involving over 450 irregularly spaced sites and the presence or absence of each of two species of rain-forest tree at each site is presented; Markov chain Monte Carlo (MCMC) methods are implemented to obtain posterior distributions of all unknowns. The MCMC method works well with both simulated data and the tree-biodiversity data set.
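A sketch of the determinant shortcut the abstract alludes to, for the common CAR precision Q = τ(D − ρW); this particular parameterization is an assumption, but given it, the eigenvalues need be computed only once and log|Q| is then closed-form in (τ, ρ):

```python
import numpy as np

def car_logdet_factory(W):
    """Precompute eigenvalues so log|Q| is cheap for any (tau, rho).

    W is a symmetric 0/1 neighbourhood matrix with every site having at
    least one neighbour; D = diag(row sums); Q = tau * (D - rho * W). Then
        log|Q| = n log tau + sum_i log d_i + sum_i log(1 - rho * lam_i),
    where lam_i are the eigenvalues of D^{-1/2} W D^{-1/2} and
    |rho * lam_i| < 1 keeps Q positive definite.
    """
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    lam = np.linalg.eigvalsh(d_isqrt[:, None] * W * d_isqrt[None, :])
    log_d = np.log(d).sum()
    n = len(d)

    def logdet(tau, rho):
        return n * np.log(tau) + log_d + np.log1p(-rho * lam).sum()

    return logdet
```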

12.
In the past decades, the number of variables explaining observations in practical applications has increased gradually, leading to heavy computational tasks despite the widespread use of provisional variable-selection methods in data processing. More methodological techniques have therefore appeared for reducing the number of explanatory variables without losing much of the information. Among these techniques, two distinct approaches are apparent: 'shrinkage regression' and 'sufficient dimension reduction'. Surprisingly, there has been little communication or comparison between these two methodological categories, and it is not clear when each approach is appropriate. In this paper, we fill some of this gap by first reviewing each category briefly, paying special attention to its most commonly used methods. We then compare commonly used methods from both categories on their accuracy, computation time, and ability to select effective variables. A simulation study of the performance of the methods in each category is presented as well. The selected methods are also tested on two real data sets, which allows us to recommend conditions under which each approach is more appropriate for high-dimensional data.

13.
Cox's seminal 1972 paper on regression methods for possibly censored failure-time data popularized the use of time to an event as a primary response in prospective studies. A key assumption of this and other regression methods, however, is that observations are independent of one another. In many problems failure times are clustered into small groups within which outcomes are correlated; examples include failure times for the two eyes of one person or for members of the same family. This paper presents a survey of models for multivariate failure-time data. Two distinct classes of models are considered: frailty and marginal models. In a frailty model, the correlation is assumed to derive from latent variables (frailties) common to observations from the same cluster, and regression models are formulated for the conditional failure-time distribution given the frailties. Alternatively, marginal models describe the marginal failure-time distribution of each response while separately modelling the association among responses from the same cluster. We focus on recent extensions of the proportional hazards model to multivariate failure-time data; model formulation, parameter interpretation and estimation procedures are considered.
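As a brief illustration of the marginal-model route using the lifelines library (the file and column names are hypothetical): a standard Cox proportional hazards fit with a cluster identifier, so that within-cluster association is handled through robust sandwich standard errors rather than modelled via frailties.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical clustered failure-time data: duration, event indicator,
# covariates, and a family (cluster) identifier.
df = pd.read_csv("eyes.csv")

cph = CoxPHFitter()
# cluster_col requests robust (sandwich) standard errors that account
# for correlation among failure times within the same family.
cph.fit(df, duration_col="time", event_col="event",
        cluster_col="family_id")
cph.print_summary()
```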

14.
Statistics, as one of the applied sciences, has great impact on a vast range of other sciences. Prediction of protein structures, with particular emphasis on their geometrical features as described by dihedral angles, has motivated the branch of statistics known as directional statistics. One of the available biological techniques for such prediction is molecular dynamics simulation, which produces high-dimensional molecular-structure data. Principal component analysis (PCA) is therefore expected to address some of the related statistical problems, particularly reducing the dimension of the variables involved. Since dihedral angles are variables on a non-Euclidean space (their locus is the torus), a direct implementation of PCA is not expected to be very informative in this case. Principal geodesic analysis is one of the recent methods for reducing dimension in the non-Euclidean case. A procedure for using this technique to reduce the dimension of a set of dihedral angles is highlighted in this paper. We further propose an extension of this tool, implemented in such a way that the torus is approximated by the product of two unit circles, and evaluate its application in studying a real data set. A comparison of this technique with some previous methods is also undertaken.
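A small sketch of the product-of-circles approximation mentioned above (an illustrative rendering, not the authors' principal geodesic analysis): each dihedral-angle pair (φ, ψ) is embedded as (cos φ, sin φ, cos ψ, sin ψ) before ordinary PCA is applied.

```python
import numpy as np
from sklearn.decomposition import PCA

def torus_pca(phi, psi, n_components=2):
    """PCA on dihedral angles after embedding the torus in R^4.

    phi, psi are arrays of dihedral angles in radians; each (phi, psi)
    pair is mapped onto the product of two unit circles, avoiding the
    discontinuity a naive PCA on raw angles would hit at +/- pi.
    """
    E = np.column_stack([np.cos(phi), np.sin(phi),
                         np.cos(psi), np.sin(psi)])
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(E)
    return scores, pca.explained_variance_ratio_
```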

15.
A versatile procedure is described comprising an application of statistical techniques to the analysis of the large, multi-dimensional data arrays produced by electroencephalographic (EEG) measurements of human brain function. Previous analytical methods have been unable to identify objectively the precise times at which statistically significant experimental effects occur, owing to the large number of variables (electrodes) and small number of subjects, or have been restricted to two-treatment experimental designs. Many time-points are sampled in each experimental trial, making adjustment for multiple comparisons mandatory. Given the typically large number of comparisons and the clear dependence structure among time-points, simple Bonferroni-type adjustments are far too conservative. A three-step approach is proposed: (i) summing univariate statistics across variables; (ii) using permutation tests for treatment effects at each time-point; and (iii) adjusting for multiple comparisons using permutation distributions to control family-wise error across the whole set of time-points. Our approach provides an exact test of the individual hypotheses while asymptotically controlling family-wise error in the strong sense, and can provide tests of interaction and main effects in factorial designs. An application to two experimental data sets from EEG studies is described, but the approach has application to the analysis of spatio-temporal multivariate data gathered in many other contexts.
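The three steps map naturally onto a short sketch for a two-group design (an illustration of the max-statistic idea; the paper's factorial-design machinery is not reproduced):

```python
import numpy as np
from scipy import stats

def maxT_permutation(xa, xb, n_perm=2000, seed=0):
    """Per-time-point two-sample t tests with max-|t| FWE control.

    xa, xb: arrays of shape (n_subjects, n_timepoints), already summed
    across electrodes (step i). Returns observed |t| per time point and
    adjusted p-values controlling family-wise error (step iii).
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([xa, xb])
    na = len(xa)
    t_obs = np.abs(stats.ttest_ind(xa, xb, axis=0).statistic)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))        # step (ii): relabel subjects
        ta, tb = pooled[idx[:na]], pooled[idx[na:]]
        max_null[b] = np.abs(stats.ttest_ind(ta, tb, axis=0).statistic).max()
    # Adjusted p-value: how often the null max statistic beats each t_obs.
    p_adj = (max_null[None, :] >= t_obs[:, None]).mean(axis=1)
    return t_obs, p_adj
```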

16.
Confronted with multivariate group-structured data, one is in fact always interested in describing differences between groups. In this paper, canonical correlation analysis (CCA) is used as an exploratory data-analysis tool to detect and describe differences between groups of objects. CCA allows for the construction of Gabriel biplots, relating representations of objects and variables in the plane that best represents the distinction between the groups of object points. In the case of non-linear CCA, transformations of the original variables are suggested to achieve better group separation than that obtained by linear CCA, and one can detect which (transformed) variables are responsible for this separation. The separation itself might be due to several characteristics of the data (e.g. distances between the centres of gravity of the original or transformed groups of object points, or differences in the structure of the original groups). Four case studies give an overview of the possibilities offered by linear and non-linear CCA.

17.
In this paper, some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that multivariate association measures between two sets of variables can overcome the drawbacks of the usually employed bivariate correlation coefficient, but the resulting methods are generally not monotonic. A new multivariate association measure is therefore proposed, based on the links between canonical correlation analysis and principal component analysis, which is more suitable for the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally, an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and discussed theoretically.
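For orientation, here is a sketch of the baseline bivariate version that the proposed multivariate measure improves on (the 1 − |r| dissimilarity and average linkage are my assumptions): variables are clustered hierarchically on a correlation-based distance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_variables(X, n_groups=3):
    """Hierarchical clustering of the *columns* (variables) of X using
    the bivariate dissimilarity 1 - |r|; the paper's multivariate
    association measure would replace this distance."""
    R = np.corrcoef(X, rowvar=False)     # variable-by-variable correlations
    D = 1.0 - np.abs(R)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_groups, criterion="maxclust")
```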

18.
Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A major challenge is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason computational methods are employed to filter out those compounds which do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be tested experimentally. In this paper we propose several novel approaches to virtual screening based on Canonical Correlation Analysis (CCA) and on a kernel-based extension. Spectral-learning ideas motivate our proposed new method, called Indefinite Kernel CCA (IKCCA). We show the strong performance of this approach both on a toy problem and on real-world data, with dramatic improvements over an existing methodology in the predictive accuracy of virtual screening.
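Kernel and indefinite-kernel CCA are not in standard toolkits, but the linear CCA starting point can be sketched (a hedged linear stand-in, not IKCCA; the ranking rule is an arbitrary choice): fit CCA on descriptors and activity profiles of tested compounds, then rank unseen compounds by their predicted activity.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def screen_compounds(X_train, Y_train, X_new, n_components=2):
    """Rank unseen compounds by predicted activity from linear CCA.

    X_*: compound descriptor matrices; Y_train: measured activity
    profiles. A linear stand-in for the paper's (indefinite) kernel CCA.
    """
    cca = CCA(n_components=n_components)
    cca.fit(X_train, Y_train)
    Y_pred = cca.predict(X_new)          # activity estimates in Y-space
    score = Y_pred.sum(axis=1)           # crude aggregate activity score
    return np.argsort(score)[::-1]       # most promising compounds first
```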

19.
ABSTRACT

Environmental data are typically indexed in space and time. This work deals with modelling spatio-temporal air-quality data when multiple measurements are available for each space-time point. Typically this situation arises when measurements on several response variables are observed at each space-time point, for example different pollutants or size-resolved data on particulate matter. Nonetheless, such data also arise when a mobile monitoring station moves along a path for a certain period of time; in this case, each spatio-temporal point has a number of measurements of the response variable observed several times over different locations in a close neighbourhood of the space-time point. We deal with this type of data within a hierarchical Bayesian framework, in which observed measurements are modelled in the first stage of the hierarchy, while the unobserved spatio-temporal process is considered in the following stages. The final model is very flexible: it includes autoregressive terms in time and different structures for the variance-covariance matrix of the errors, and it can handle covariates available at different space-time resolutions. This approach is motivated by the availability of data on urban pollution dynamics: fast measurements of gases and size-resolved particulate matter have been collected using an Optical Particle Counter located on a cabin of a public conveyance that moves on a monorail along a line transect of a town. Urban microclimate information is also available and included in the model. Simulation studies are conducted to evaluate the performance of the proposed model against existing alternatives that do not model the data in the first stage of the hierarchy.
