Similar Articles
20 similar articles found.
1.
This paper describes a proposal for extending the dual multiple factor analysis (DMFA) method developed by Lê and Pagès [15] to the analysis of categorical tables in which the same set of variables is measured on different sets of individuals. The extension of DMFA is based on transforming categorical variables into properly weighted indicator variables, in a way analogous to that used in the multiple factor analysis of categorical variables. The DMFA of categorical variables enables visual comparison of the association structures between categories over the sample as a whole and in the various subsamples (sets of individuals). For each category, DMFA yields its global (considering all the individuals) and partial (considering each set of individuals) coordinates in a factor space. This visual analysis allows the sets of individuals to be compared in order to identify their similarities and differences. The suitability of the technique is illustrated through two applications: one using simulated data for two groups of individuals with very different association structures, and the other using real data from a voting intention survey in which some respondents were interviewed by telephone and others face to face. The results indicate that the two data collection methods, while similar, are not entirely equivalent.
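
A minimal sketch of the indicator-variable transformation, assuming the MCA-style scaling z/p − 1 with column masses p/J (a common choice in multiple correspondence analysis; the exact weights used in the DMFA extension may differ). The function names and the example variables are hypothetical.

```python
import numpy as np


def indicator_matrix(column):
    """Complete disjunctive (one-hot) coding of one categorical variable."""
    categories = sorted(set(column))
    Z = np.array([[1.0 if value == c else 0.0 for c in categories] for value in column])
    return Z, categories


def weighted_indicators(columns):
    """Stack one-hot blocks with MCA-style scaling (an assumed weighting).

    Each indicator column k becomes z_ik / p_k - 1, where p_k is the category
    proportion, and receives the column mass p_k / J for J variables.
    """
    blocks, masses, labels = [], [], []
    J = len(columns)
    for name, column in columns.items():
        Z, categories = indicator_matrix(column)
        p = Z.mean(axis=0)              # category proportions
        blocks.append(Z / p - 1.0)      # centred, frequency-scaled indicators
        masses.extend(p / J)            # column masses sum to 1 over all variables
        labels.extend(f"{name}={c}" for c in categories)
    return np.hstack(blocks), np.array(masses), labels


# Hypothetical survey answers; in a DMFA-style analysis the transformation
# would be computed within each set of individuals (e.g. telephone vs face to face).
X, masses, labels = weighted_indicators(
    {"vote": ["A", "B", "A", "C"], "sex": ["f", "m", "m", "f"]}
)
print(labels)
```

Each scaled column has mean zero by construction, so the weighted indicator table can be fed directly to a factor analysis.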

2.
At the core of multivariate statistics is the investigation of relationships between different sets of variables: more precisely, inter-variable relationships and causal relationships. The latter constitute a regression problem, in which one set of variables is referred to as the response variables and the other as the predictor variables. In this situation, the effect of the predictors on the response variables is revealed through the regression coefficients. The results of the regression analysis can be displayed graphically using the biplot. The resulting biplot provides a single graphical representation of the samples together with the predictor and response variables. In addition, their effect in terms of the regression coefficients can be visualized, although sub-optimally, in the same biplot.
KEYWORDS: Biplot, regression analysis, multivariate regression, rank approximation
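
One way to build such a biplot, sketched here under assumptions: fit the multivariate regression by least squares, take a rank-2 SVD of the fitted responses, and place samples, responses and predictors in the same plane. This is a generic reduced-rank construction for illustration, not necessarily the biplot proposed in the paper; the function name and data are hypothetical.

```python
import numpy as np


def regression_biplot(X, Y, rank=2):
    """Coordinates for a rank-`rank` biplot of a least-squares multivariate regression.

    Returns sample points, response-variable directions, predictor-variable
    directions and the coefficient matrix B (predictors x responses).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)   # regression coefficients
    Y_hat = Xc @ B                                 # fitted responses
    U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T
    samples = U[:, :rank] * s[:rank]               # equals Y_hat @ V_r
    responses = V_r                                # response axes
    predictors = B @ V_r                           # samples = Xc @ predictors
    return samples, responses, predictors, B


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                                      # hypothetical predictors
Y = X @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(30, 4))  # hypothetical responses
samples, responses, predictors, B = regression_biplot(X, Y)
```

Each sample's point is a linear combination of its centred predictor values, weighted by the rows of `predictors`; this is what allows the predictors to be drawn as axes in the same plot as the responses.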

3.
Using several variables known to be related to prostate cancer, a multivariate classification method is developed to predict the onset of clinical prostate cancer. A multivariate mixed-effects model is used to describe longitudinal changes in prostate-specific antigen (PSA), a free testosterone index (FTI), and body mass index (BMI) before any clinical evidence of prostate cancer. The patterns of change in these three variables are allowed to vary depending on whether or not the subject develops prostate cancer and on the severity of the cancer at diagnosis. An application of Bayes' theorem provides posterior probabilities that we use to predict whether an individual will develop prostate cancer and, if so, whether it is a high-risk or a low-risk cancer. The classification rule is applied sequentially, one multivariate observation at a time, until the subject is classified as a cancer case or the last observation has been used. We perform the analyses using each of the three variables individually, in pairs, and all three together. We compare the classification results among the various analyses, and a simulation study demonstrates how the sensitivity of prediction changes with the number and type of variables used in the prediction process.
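
The sequential Bayes rule can be sketched as follows. As a deliberate simplification (an assumption, not the paper's model), each visit's markers are treated as independent normal variables with class-specific means instead of coming from a multivariate mixed-effects model; the class labels, means, standard deviation and 0.9 threshold are all hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def posterior(prior, log_lik):
    """Bayes' theorem on the log scale: P(k | data) is proportional to P(k) P(data | k)."""
    scores = {k: math.log(prior[k]) + log_lik[k] for k in prior}
    top = max(scores.values())                  # subtract max for numerical stability
    weights = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def sequential_classify(observations, class_means, sd, prior, threshold=0.9):
    """Update the posterior one multivariate observation at a time.

    Stops as soon as P(low) + P(high) reaches `threshold`; otherwise uses
    every observation and returns "none".
    """
    log_lik = {k: 0.0 for k in prior}
    post = dict(prior)
    for t, x in enumerate(observations):
        for k in prior:
            # Independent normal markers at visit t (simplifying assumption).
            log_lik[k] += sum(normal_logpdf(xi, mi, sd) for xi, mi in zip(x, class_means[k][t]))
        post = posterior(prior, log_lik)
        if post["low"] + post["high"] >= threshold:
            label = "high" if post["high"] >= post["low"] else "low"
            return label, t, post
    return "none", len(observations) - 1, post


# Hypothetical class means of (log PSA, FTI) at each of three visits.
means = {
    "none": [(1.0, 5.0)] * 3,
    "low": [(2.0, 4.5)] * 3,
    "high": [(3.0, 4.0)] * 3,
}
prior = {"none": 0.9, "low": 0.05, "high": 0.05}
label, visit, post = sequential_classify([(3.0, 4.0)] * 3, means, sd=0.5, prior=prior)
```

Here the first visit already matches the high-risk means closely enough to cross the threshold, so classification stops at visit index 0.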

4.
Consider a population of individuals who are free of a disease under study and who are exposed simultaneously, at random exposure levels, say X, Y, Z, …, to several risk factors suspected of causing the disease in the population. At any specified levels X = x, Y = y, Z = z, …, the incidence rate of the disease in the population at risk is given by the exposure–response relationship r(x, y, z, …) = P(disease | x, y, z, …). The present paper examines the relationship between the joint distribution of the exposure variables X, Y, Z, … in the population at risk and the joint distribution of the exposure variables U, V, W, … among cases under the linear and the exponential risk models. It is proven that under the exponential risk model these two joint distributions belong to the same family of multivariate probability distributions, possibly with different parameter values. For example, if the exposure variables in the population at risk jointly have a multivariate normal distribution, so do the exposure variables among cases; if the former jointly have a multinomial distribution, so do the latter. More generally, it is demonstrated that if the joint distribution of the exposure variables in the population at risk belongs to the exponential family of multivariate probability distributions, so does the joint distribution of the exposure variables among cases. If the epidemiologist can specify the differences between the mean exposure levels in the case and control groups that are considered clinically or etiologically important in the study, the results of the present paper may be used to determine sample sizes for the case–control study corresponding to specified protection levels, i.e., size α and power 1 − β of a statistical test. The multivariate normal, multinomial, negative multinomial, and Fisher's multivariate logarithmic series exposure distributions are used to illustrate the results.
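
For the exponential risk model, the multivariate normal case can be checked in a few lines. Writing x for the exposure vector (X, Y, Z, …), u for the exposures among cases, f for the exposure density in the population at risk, and taking r(x) = exp(α + βᵀx) as the parametrisation of the exponential risk model (an assumption about notation), Bayes' theorem gives the case density, and completing the square shows it is again multivariate normal:

```latex
f_{\mathrm{case}}(\mathbf{u})
  = \frac{r(\mathbf{u})\, f(\mathbf{u})}{\int r(\mathbf{v})\, f(\mathbf{v})\, d\mathbf{v}},
  \qquad r(\mathbf{u}) = \exp(\alpha + \boldsymbol{\beta}^{\top}\mathbf{u}).

\text{If } f = N_p(\boldsymbol{\mu}, \Sigma):\quad
r(\mathbf{u})\, f(\mathbf{u})
  \propto \exp\!\Big(\boldsymbol{\beta}^{\top}\mathbf{u}
      - \tfrac12 (\mathbf{u}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{u}-\boldsymbol{\mu})\Big)
  \propto \exp\!\Big(-\tfrac12 (\mathbf{u}-\boldsymbol{\mu}-\Sigma\boldsymbol{\beta})^{\top}
      \Sigma^{-1}(\mathbf{u}-\boldsymbol{\mu}-\Sigma\boldsymbol{\beta})\Big),

\text{so}\quad \mathbf{U} \sim N_p(\boldsymbol{\mu} + \Sigma\boldsymbol{\beta},\ \Sigma).
```

The case and population mean exposures then differ by Σβ while the covariance is unchanged, which is the kind of mean difference a sample-size calculation would start from.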

5.
In this paper, we propose a new measure of fit for quantile–quantile plots. When applied to Small's and Srivastava's graphical methods, this measure provides two new tests for assessing multivariate normality. The critical values of these tests were evaluated via simulation for different sample sizes and numbers of variables. The power of the new tests, and a comparison with some other tests for multivariate normality, are presented.

6.
In this article we consider a set of t repeated measurements on p variables (or characteristics) for each of n individuals, so that the data on each individual form a p × t matrix. The n individuals may be divided and randomly assigned to g groups. We consider the analysis of these data with a MANOVA model, assuming that the covariance matrix of each individual's data is a Kronecker product of two positive definite matrices. The well-known Satterthwaite-type approximation to the distribution of a quadratic form in normal variables is extended to the distribution of a multivariate quadratic form in multivariate normal variables. Multivariate tests based on this approximation are developed for testing the usual hypotheses. Results are illustrated on a data set. A method for analysing unbalanced data is also discussed.
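
For orientation, the classical scalar Satterthwaite approximation, which the paper extends to multivariate quadratic forms, matches the first two moments of Q = y'Ay (A symmetric, y ~ N(0, V)) to those of c·χ²(ν). The sketch below shows only this scalar version with a Kronecker-structured V; the paper's multivariate extension is not reproduced, and the matrices and the kron(Psi, Sigma) ordering are assumptions.

```python
import numpy as np


def satterthwaite(A, V):
    """Approximate Q = y' A y, y ~ N(0, V), A symmetric, by c * chi-square(nu).

    Matches E[Q] = tr(AV) and Var[Q] = 2 tr((AV)^2):
    c = tr((AV)^2) / tr(AV),  nu = tr(AV)^2 / tr((AV)^2).
    """
    AV = A @ V
    tr1 = np.trace(AV)
    tr2 = np.trace(AV @ AV)
    return tr2 / tr1, tr1 ** 2 / tr2


# Hypothetical Kronecker-structured covariance for vec of a p x t data matrix
# (p = 2 variables, t = 3 occasions); the ordering kron(Psi, Sigma) is assumed.
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])                             # between variables
Psi = np.array([[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]])  # between occasions
V = np.kron(Psi, Sigma)
A = np.eye(6)                       # Q = sum of squares of all 6 entries
c, nu = satterthwaite(A, V)
```

With V equal to the identity the approximation is exact (c = 1, ν = 6); correlation in V reduces the effective degrees of freedom ν.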

7.
We consider Pitman closeness for evaluating the performance of univariate and multivariate forecasting methods. Optimal weights for the combination of forecasts are calculated with respect to this criterion. These weights depend on the assumed distribution of the individual forecast errors. In the normal case they are identical to the optimal weights with respect to the MSE criterion (univariate case) and to the optimal weights with respect to the MMSE criterion (multivariate case). Further, we present a simple example showing how the different combination techniques perform; it illustrates how much the optimal multivariate combination can outperform other combinations. Multivariate forecasts arise in practice, e.g., in econometrics, where forecasting institutes often estimate several economic variables.
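
Since, for normal errors, the Pitman-closeness weights coincide with the MSE-optimal weights in the univariate case, the latter give a concrete picture. The sketch below assumes unbiased individual forecasts with a known error covariance S (in practice it would be estimated) and implements the classical variance-minimising combination, not the paper's Pitman-closeness derivation; the covariance values are hypothetical.

```python
import numpy as np


def mse_optimal_weights(error_cov):
    """Combination weights summing to 1 that minimise the combined error variance.

    For unbiased forecasts with error covariance S: w = S^{-1} 1 / (1' S^{-1} 1).
    """
    ones = np.ones(error_cov.shape[0])
    s_inv_ones = np.linalg.solve(error_cov, ones)
    return s_inv_ones / (ones @ s_inv_ones)


S = np.array([[1.0, 0.0], [0.0, 4.0]])   # hypothetical error covariance of two forecasts
w = mse_optimal_weights(S)
combined_var = w @ S @ w
```

Here the forecast with error variance 1 receives weight 0.8, and the combined error variance (0.8) is lower than that of either individual forecast. When the errors are strongly correlated, some weights can become negative.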

8.
Linear rank tests are used extensively for comparing two or more groups of continuous outcomes. Tests in this class retain proper test size with minimal assumptions and can have high efficiency towards an alternative of interest. In recent years, these tests have been increasingly used in settings where an individual's observation is itself a scalar summary of several outcome measures. Here, simple distributional structures on the outcome variables can lead to complex differences between the distributions of summary statistics of the comparison groups. The local asymptotic power of linear rank tests when the groups are assumed to differ by a location or scale alternative has been studied in detail. However, not much is known about their behavior for other types of alternatives. To address this, we derive the asymptotic distribution of linear rank tests under a general contiguous alternative and then investigate the implications for location–scale families and more general settings, including an example drawn from an AIDS clinical trial where the continuous outcome is a summary statistic computed from repeated measures of a biological marker.
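
As a concrete instance, the Wilcoxon rank-sum test is the linear rank test with scores a(i) = i; other score functions, such as van der Waerden normal scores, give other members of the class. A minimal sketch, assuming continuous data without ties; the function name is hypothetical, and only the exact null mean and variance are used (the paper's contiguous-alternative asymptotics are not reproduced).

```python
import math
from statistics import NormalDist


def linear_rank_test(x, y, score=lambda i, n: i):
    """Two-sample linear rank statistic S = sum of a(R_i) over the first sample.

    Assumes no ties. Under H0 (exchangeable pooled sample):
      E[S]   = m / N * sum(a)
      Var[S] = m * n / (N * (N - 1)) * sum((a - mean(a))^2)
    Returns (S, z) with z the standardised statistic.
    """
    pooled = list(x) + list(y)
    m, n, N = len(x), len(y), len(x) + len(y)
    order = sorted(range(N), key=lambda i: pooled[i])
    ranks = [0] * N
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    a = [score(i, N) for i in range(1, N + 1)]   # scores a(1), ..., a(N)
    a_bar = sum(a) / N
    S = sum(a[ranks[i] - 1] for i in range(m))
    mean = m / N * sum(a)
    var = m * n / (N * (N - 1)) * sum((ai - a_bar) ** 2 for ai in a)
    return S, (S - mean) / math.sqrt(var)


# Wilcoxon scores (default) and van der Waerden normal scores on hypothetical data.
S, z = linear_rank_test([1.1, 2.3, 0.7], [3.4, 4.0, 5.2])
S_vw, z_vw = linear_rank_test([1.1, 2.3, 0.7], [3.4, 4.0, 5.2],
                              score=lambda i, n: NormalDist().inv_cdf(i / (n + 1)))
```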

9.
When data sets are multilevel (group nesting or repeated measures), different sources of variation must be identified. In the framework of unsupervised analyses, multilevel simultaneous component analysis (MSCA) has recently been proposed as the most satisfactory option for analyzing multilevel data. MSCA estimates submodels for the different levels in the data and thereby separates the "within"-subject and "between"-subject variation in the variables. Following the principles of MSCA and the strategy of decomposing the available data matrix into orthogonal blocks, and taking into account the between- and within-subject data structures, we generalize to a multilevel setting the multivariate models in which a matrix of response variables guides the projections (formed by responses predicted by explanatory variables or by a limited number of their combinations/composites) toward meaningful directions. To this end, the paper proposes multilevel versions of the multivariate regression model and of dimensionality-reduction methods (used to predict responses with fewer linear composites of the explanatory variables). The principal finding of the study is that, under some constraints, minimizing the loss functions of multivariate regression, principal-component regression, reduced-rank regression, and canonical-correlation regression is equivalent to separately minimizing the two loss functions, corresponding to the between and within structures, whose sum makes up the total loss. The paper closes with a case study on the relationship between mental health severity and the intensity of care in the mental health system of the Lombardy region.
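
The orthogonal between/within split on which MSCA builds can be sketched as below, assuming rows are observations nested in subjects. Fitting a plain SVD (PCA) to each block is a simplification for illustration, not the multilevel regression estimators proposed in the paper; the data and function name are hypothetical.

```python
import numpy as np


def between_within(X, groups):
    """Split X (rows = observations) into grand mean, between and within parts.

    X = 1 * grand + between + within, where `between` repeats each group's
    centred mean and `within` holds deviations from the group mean.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand = X.mean(axis=0)
    between = np.empty_like(X)
    for g in np.unique(groups):
        rows = groups == g
        between[rows] = X[rows].mean(axis=0) - grand
    within = X - grand - between
    return grand, between, within


rng = np.random.default_rng(1)
X = rng.normal(size=(12, 3))
subjects = np.repeat([0, 1, 2], 4)          # hypothetical: 3 subjects, 4 occasions each
grand, B, W = between_within(X, subjects)
# Separate component models for each level (here plain SVD), in the spirit of MSCA.
_, s_between, _ = np.linalg.svd(B, full_matrices=False)
_, s_within, _ = np.linalg.svd(W, full_matrices=False)
```

Because the within part sums to zero inside every group while the between part is constant there, the two blocks are orthogonal and the total sum of squares splits exactly into a between and a within term; this is why a loss defined on the whole matrix can separate into two parts.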

10.
Tables of critical values are given that can be used to carry out interim analyses in clinical trials involving two groups when the joint distribution of the test statistics can be approximated by a multivariate normal distribution. Critical values are given for both the one- and two-interim-analysis cases for a variety of partitions of α and correlation structures. Results of power calculations are presented that reflect the effects of both the correlation structure and the partition of α. Several examples illustrate how to apply the tables to a variety of experiments.

11.
This work stems from the idea of describing the scientific productivity of Italian statisticians. There are several problems that must be addressed in achieving this goal: What data should be used? Have the data been cleaned? What techniques can be used? We propose the use of multiple sources and multiple metrics to get a complete information base. We check the correctness of the data using multivariate outlier identification techniques. We appropriately transform the data. We apply robust clustering to verify the existence of homogeneous groups. We suggest the use of forward search to establish a ranking among scholars. The proposed methodology, which, in this case, allowed us to group scholars into four homogeneous groups and sort them according to multidimensional data, can be applied to other similar applications in bibliometrics.

12.
The essence of the generalised multivariate Behrens–Fisher problem (BFP) is how to test the null hypothesis of equality of mean vectors for two or more populations when their dispersion matrices differ. Solutions to the BFP usually assume variables are multivariate normal and do not handle high‐dimensional data. In ecology, species' count data are often high‐dimensional, non‐normal and heterogeneous. Also, interest lies in analysing compositional dissimilarities among whole communities in non‐Euclidean (semi‐metric or non‐metric) multivariate space. Hence, dissimilarity‐based tests by permutation (e.g., PERMANOVA, ANOSIM) are used to detect differences among groups of multivariate samples. Such tests are not robust, however, to heterogeneity of dispersions in the space of the chosen dissimilarity measure, most conspicuously for unbalanced designs. Here, we propose a modification to the PERMANOVA test statistic, coupled with either permutation or bootstrap resampling methods, as a solution to the BFP for dissimilarity‐based tests. Empirical simulations demonstrate that the type I error remains close to nominal significance levels under classical scenarios known to cause problems for the un‐modified test. Furthermore, the permutation approach is found to be more powerful than the (more conservative) bootstrap for detecting changes in community structure for real ecological datasets. The utility of the approach is shown through analysis of 809 species of benthic soft‐sediment invertebrates from 101 sites in five areas spanning 1960 km along the Norwegian continental shelf, based on the Jaccard dissimilarity measure.
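
For reference, the classical (unmodified) PERMANOVA pseudo-F can be computed directly from a dissimilarity matrix and assessed by permuting group labels. The sketch below follows the standard sum-of-squared-dissimilarities formulation; the modification proposed in the paper for heterogeneous dispersions and the bootstrap variant are not reproduced, and the function names and toy data are hypothetical.

```python
import numpy as np


def pseudo_f(D, groups):
    """Classical PERMANOVA pseudo-F from a dissimilarity matrix D (N x N)."""
    D2 = np.asarray(D, dtype=float) ** 2
    groups = np.asarray(groups)
    N = len(groups)
    labels = np.unique(groups)
    a = len(labels)
    upper = np.triu_indices(N, k=1)
    ss_total = D2[upper].sum() / N
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (N - a))


def permanova(D, groups, n_perm=999, seed=0):
    """Pseudo-F and permutation p-value obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs = pseudo_f(D, groups)
    exceed = sum(pseudo_f(D, rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    return f_obs, (exceed + 1) / (n_perm + 1)


# Toy example with Euclidean distances on one variable (hypothetical data).
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
D = np.abs(x[:, None] - x[None, :])
f_stat, p_value = permanova(D, [0, 0, 0, 1, 1, 1])
```

With Euclidean distances, as here, the pseudo-F equals the one-way ANOVA F statistic (37.5 for this toy example); any other dissimilarity matrix, such as Jaccard dissimilarities between sites, can be passed as `D`.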

13.
Summary. We consider joint spatial modelling of areal multivariate categorical data assuming a multiway contingency table for the variables, modelled by using a log-linear model, and connected across units by using spatial random effects. With no distinction regarding whether variables are response or explanatory, we do not limit inference to conditional probabilities, as in customary spatial logistic regression. With joint probabilities we can calculate arbitrary marginal and conditional probabilities without having to refit models to investigate different hypotheses. Flexible aggregation allows us to investigate subgroups of interest; flexible conditioning enables not only the study of outcomes given risk factors but also retrospective study of risk factors given outcomes. A benefit of joint spatial modelling is the opportunity to reveal disparities in health in a richer fashion, e.g. across space for any particular group of cells, across groups of cells at a particular location, and, hence, potential space–group interaction. We illustrate with an analysis of birth records for the state of North Carolina and compare with spatial logistic regression.
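
Once the joint cell probabilities at a unit are available, any marginal or conditional probability follows by summing and normalising, without refitting. A minimal sketch, assuming the joint table at one unit has already been obtained (the spatial log-linear fit itself is not shown); the table, the axis meanings and the function names are hypothetical.

```python
import numpy as np


def marginal(joint, keep):
    """Sum a joint probability table over every axis not listed in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop) if drop else joint.copy()


def conditional(joint, target, given):
    """P(target | given) with shape (levels of target, levels of given)."""
    pair = marginal(joint, keep=(target, given))   # axes stay in ascending order
    if target > given:
        pair = pair.T
    return pair / pair.sum(axis=0, keepdims=True)


# Hypothetical cell probabilities at one areal unit, as a log-linear model would
# give them: p = exp(eta) / sum(exp(eta)); axes = (outcome, risk factor, group).
eta = np.log(np.arange(1.0, 9.0)).reshape(2, 2, 2)
joint = np.exp(eta) / np.exp(eta).sum()
prospective = conditional(joint, target=0, given=1)    # P(outcome | risk factor)
retrospective = conditional(joint, target=1, given=0)  # P(risk factor | outcome)
```

Both the prospective and the retrospective conditionals come from the same joint table, which is the flexibility the joint modelling approach is meant to provide.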

14.
A variety of primary endpoints are used in clinical trials treating patients with severe infectious diseases, and existing guidelines do not provide a consistent recommendation. We propose to study two primary endpoints, cure and death, simultaneously in a comprehensive multistate cure‐death model as a starting point for a treatment comparison. This technique enables us to study the temporal dynamics of the patient‐relevant probability of being cured and alive. We describe and compare traditional and innovative methods suitable for a treatment comparison based on this model. Traditional analyses using risk differences focus on only one prespecified timepoint. A restricted logrank‐based test of treatment effect is sensitive to ordered categories of responses and integrates information on the duration of response. Pseudo‐value regression provides a direct regression model for examining the treatment effect via differences in transition probabilities. Using a topical real‐data example and simulation scenarios, we demonstrate advantages and limitations and provide insight into how these methods handle different kinds of treatment imbalance. The cure‐death model provides a suitable framework for better understanding how a new treatment influences the time‐dynamic cure and death process. This may help in the future planning of randomised clinical trials, sample size calculations, and data analyses.

15.
A novel approach to solving the independent component analysis (ICA) model in the presence of noise is proposed. We use wavelets as natural denoising tools to solve the noisy ICA model. To do this, we use a multivariate wavelet denoising algorithm that allows for spatial and temporal dependence. We also propose using a statistical approach, the nested design of experiments, to select parameters such as the wavelet family and the thresholding type. This technique helps us select a more suitable combination of parameters, and the approach could be extended to many other problems in which parameters must be chosen from many alternatives. The performance of the proposed method is illustrated on simulated data, with promising results. The suggested method is also applied to latent variable regression on noisy real data. The good results confirm the ability of multivariate wavelet denoising to solve the noisy ICA problem.

16.
Quarterly data for the period 1960:1 to 1997:2 are analysed with conventional tests, a bootstrap simulation approach, and a multivariate Rao's F-test to investigate whether the causality between government spending and revenue in Finland changed at the beginning of 1990 owing to plans to create the European Monetary Union (EMU). The results indicate that before 1990 government revenue Granger-caused spending, whereas the opposite held after 1990, which agrees better with Barro's tax-smoothing hypothesis. However, when monthly data are used instead of quarterly data for almost the same sample period, totally different results are obtained. The general conclusion is that the relationship between spending and revenue in Finland is still not completely understood. The ambiguity of these results may well be due to the several time scales involved in the relationship, which conventional analyses may be unable to separate. Therefore, to investigate the relation between these variables empirically, we use wavelet analysis, which separates out the different time scales of variation in the data. We find that time-scale decomposition is important for analysing these economic variables.
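
A basic bivariate Granger-causality F test, of the conventional kind contrasted here with wavelet analysis, can be sketched with ordinary least squares. The lag order, the simulated series and the function names are assumptions for illustration; the bootstrap, Rao's F-test and the wavelet decomposition are not reproduced.

```python
import numpy as np


def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., T-1."""
    T = len(series)
    return np.column_stack([series[lags - k:T - k] for k in range(1, lags + 1)])


def granger_f(y, x, lags):
    """F test of H0: lags of x add nothing to an AR(lags) model for y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    target = y[lags:]
    restricted = np.column_stack([np.ones(len(target)), lagged(y, lags)])
    unrestricted = np.column_stack([restricted, lagged(x, lags)])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_num, df_den = lags, len(target) - unrestricted.shape[1]
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    return f_stat, df_num, df_den


# Simulated example: "revenue" drives "spending" with a one-period lag.
rng = np.random.default_rng(42)
revenue = rng.normal(size=200)
spending = np.zeros(200)
spending[1:] = 0.8 * revenue[:-1] + 0.1 * rng.normal(size=199)
f_rs, df1, df2 = granger_f(spending, revenue, lags=2)   # revenue -> spending
f_sr, _, _ = granger_f(revenue, spending, lags=2)       # spending -> revenue
```

A p-value can be read from the F(df1, df2) distribution, for example with `scipy.stats.f.sf`. Running the same test on separate sample periods, or on quarterly versus monthly series, is how the conflicting conclusions described above would surface.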

17.
Multivariate panel count data often occur when there exist several related recurrent events, or response variables defined by occurrences of related events. For univariate panel count data, several nonparametric treatment comparison procedures have been developed. However, there does not appear to be a nonparametric procedure for the multivariate case. Based on differences between estimated mean functions, this article proposes a class of nonparametric test procedures for multivariate panel count data. The asymptotic distribution of the new test statistics is established, and a simulation study is conducted. Moreover, the new procedures are applied to the skin cancer problem that motivated this study.

18.
In many experiments where data are collected at two points in time (pre-treatment and post-treatment), investigators wish to determine whether there is a difference between two treatment groups. In recent years it has been proposed that an appropriate analysis for detecting treatment differences is to use the post-treatment values as the primary comparison variables and the pre-treatment values as covariates. When there are several outcome variables, we propose new tests based on residuals as alternatives to existing methods, and we investigate how the powers of the new and existing tests are affected by various choices of covariates. The limiting distribution of the new residual-based test statistic is given. Monte Carlo simulations are employed in the power comparisons.

19.
Global sensitivity analysis with variance-based measures suffers from several theoretical and practical limitations, since these measures focus only on the variance of the output and handle multivariate variables in a limited way. In this paper, we introduce a new class of sensitivity indices based on dependence measures that overcomes these shortcomings. Our approach originates from the idea of comparing the output distribution with its conditional counterpart when one of the input variables is fixed. We establish that this comparison yields previously proposed indices when it is performed with Csiszár f-divergences, as well as sensitivity indices that are well-known dependence measures between random variables. This leads us to investigate completely new sensitivity indices based on recent state-of-the-art dependence measures, such as distance correlation and the Hilbert–Schmidt independence criterion. We also emphasize the potential of feature selection techniques relying on such dependence measures as alternatives to screening in high dimension.
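
Distance correlation, one of the dependence measures mentioned, is straightforward to compute from double-centred distance matrices. The sketch below uses it as a first-order sensitivity index of one input on the output, which is an illustrative assumption (the paper's indices and their estimators may be defined differently); the toy model and function names are hypothetical, and the Hilbert–Schmidt independence criterion is not shown.

```python
import numpy as np


def _double_centred_distances(z):
    """Pairwise Euclidean distance matrix, double-centred (rows and columns)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version) of Szekely, Rizzo and Bakirov."""
    A = _double_centred_distances(x)
    B = _double_centred_distances(y)
    dcov2 = max((A * B).mean(), 0.0)        # guard against tiny negative rounding
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(dcov2 / denom))


# Hypothetical model y = sin(x1) + 0.1 * x2: a dependence-based index
# should rank x1 above x2.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-np.pi, np.pi, size=(2, 300))
y = np.sin(x1) + 0.1 * x2
indices = {"x1": distance_correlation(x1, y), "x2": distance_correlation(x2, y)}
```

Unlike a variance-based index, this measure also detects non-monotone dependence: for y = x² on a symmetric grid, the Pearson correlation is zero while the distance correlation is clearly positive.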

20.
Existing sample statistics do little to address the question of multimodality, a question that is interesting in itself and that also arises in exploratory multivariate data analysis using projection pursuit. We propose a new index, more strongly geared to the specific task of measuring multimodality than other sample statistics known to us. We show how to compute it, explore its properties, and consider its generalisation to the multivariate case. The behaviour of the index is illustrated by some simple numerical examples.


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Filing No. 09084417