Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables while taking unobserved population heterogeneity into account. It is common to use multivariate normal distributions as the mixing components, but such mixture models are sensitive to heavy-tailed errors and outliers. Although normal mixture models can in principle approximate any distribution, the number of components needed to accommodate heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions offer a robust alternative. Missing data are unavoidable in many situations, and parameter estimates can be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, the proposed method can be used for (i) visualizing the partial correlations between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering, even in the presence of missing values. We also propose a multivariate t mixture regression model based on MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and a real data analysis.
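Much of the robustness of a t mixture comes from a per-observation weight computed in the E-step of an EM-type fit. The sketch below covers a single component with a fixed location, scale matrix, and degrees of freedom. It is not the authors' full mixture regression with missing data. In a regression setting, `Y` would hold the responses and `mu` the fitted means. The weight (nu + p) / (nu + delta_i) shrinks as the squared Mahalanobis distance delta_i grows, so a gross outlier has little influence on the M-step updates. The function name `t_weights` and the simulated data are illustrative.

```python
import numpy as np

def t_weights(Y, mu, Sigma, nu):
    """E-step weights E[u_i | y_i] for one multivariate t component.

    Y: (n, p) observations; mu: (p,) location; Sigma: (p, p) scale matrix;
    nu: degrees of freedom. Returns (weights, squared Mahalanobis distances).
    """
    p = Y.shape[1]
    diff = Y - mu
    # delta_i = (y_i - mu)' Sigma^{-1} (y_i - mu), computed row by row
    delta = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)
    # large delta_i -> small weight, so outliers are down-weighted in the M-step
    return (nu + p) / (nu + delta), delta

# Illustrative data: 50 bivariate points plus one planted gross outlier
rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 2))
Y[0] = [8.0, -8.0]
w, delta = t_weights(Y, mu=np.zeros(2), Sigma=np.eye(2), nu=4.0)
print(round(w[0], 3), round(float(np.median(w)), 3))
```

Under the t model, delta_i / p follows an F(p, nu) distribution, so the same distances can also serve as an outlier flag.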

In several sciences, especially in performance evaluation, complex testing problems may arise, in particular because of the presence of multidimensional categorical data. In such cases, nonparametric methods can be a reasonable approach. In this paper, we consider the problem of testing whether a “treatment” is stochastically larger than a “control” when univariate and multivariate ordinal categorical data are present. We propose a solution based on the nonparametric combination of dependent permutation tests (Pesarin in Multivariate permutation tests with applications in biostatistics. Wiley, Chichester, 2001), on variable transformation, and on tests on moments. The solution requires transforming the categorical response variables into numeric variables and breaking up the original hypotheses into partial sub-hypotheses concerning the moments of the transformed variables. Problems of this type are considered almost impossible to analyze with likelihood ratio tests, especially in the multivariate case (Wang in J Am Stat Assoc 91:1676–1683, 1996). A comparative simulation study is also presented, along with an application example.
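The nonparametric combination (NPC) idea can be sketched for the simplest case: two ordinal variables with categories scored as integers. Integer scoring is an assumption here, and other score transformations are possible. There is one partial test per variable, on the first moment only, whereas the paper also uses tests on other moments. Rows are permuted jointly, which preserves the dependence between variables. Partial p-values are computed for the observed data and for every permutation, then combined with Fisher's combining function. The function `npc_fisher` and the simulated data are hypothetical.

```python
import numpy as np

def npc_fisher(x_treat, x_ctrl, n_perm=2000, seed=0):
    """NPC test of 'treatment stochastically larger than control' on k ordinal variables.

    x_treat: (n1, k), x_ctrl: (n2, k) arrays of numeric category scores.
    Partial statistic per variable: difference in mean scores (treatment - control).
    Returns (partial p-values, global p-value).
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x_treat, x_ctrl]).astype(float)
    n1 = len(x_treat)

    def partial_stats(data):
        return data[:n1].mean(axis=0) - data[n1:].mean(axis=0)

    # Row 0: observed data; rows 1..n_perm: permutations of whole rows,
    # which keeps the dependence between variables within each unit.
    T = np.empty((n_perm + 1, pooled.shape[1]))
    T[0] = partial_stats(pooled)
    for b in range(1, n_perm + 1):
        T[b] = partial_stats(pooled[rng.permutation(len(pooled))])

    # Partial p-value of every row: share of rows whose statistic is >= it
    # (large values support the alternative). Never zero, so log() is safe.
    P = (T[None, :, :] >= T[:, None, :]).mean(axis=1)

    # Fisher's combining function; large values support the alternative
    C = -2.0 * np.log(P).sum(axis=1)
    return P[0], float((C >= C[0]).mean())

# Illustrative data: two ordinal items scored 1..4, treatment shifted upward
rng = np.random.default_rng(1)
ctrl = rng.integers(1, 5, size=(30, 2))
treat = np.clip(rng.integers(1, 5, size=(30, 2)) + 1, 1, 4)
partial_p, global_p = npc_fisher(treat, ctrl)
print(partial_p, global_p)
```

Reusing the same permutations for every partial test is what lets the combined statistic account for the dependence between variables without modelling it.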

Methods for linear regression with multivariate response variables are well described in the statistical literature. In this study, we conduct a theoretical evaluation of the expected squared prediction error in bivariate linear regression where one of the response variables contains missing data. We assume that the covariance structure of the error terms is known. On this basis, we evaluate three well-known estimators: standard ordinary least squares, generalized least squares, and a James–Stein-inspired estimator. Theoretical risk functions are derived for all three estimators to determine under which circumstances it is advantageous to take the error covariance structure into account.

We focus on the evaluation of the long-term health care services provided to elderly patients by nursing homes in four different health districts of the Umbria region (Italy). To this end, we analyze data from a longitudinal survey designed to assess several aspects of patients' health conditions. We develop an extended version of the latent Markov model with covariates that can handle the dropout and intermittent missing-data patterns common in longitudinal studies. Maximum likelihood estimates are obtained by a two-step approach that allows fast estimation of the model parameters. This approach also avoids some drawbacks that the standard maximum likelihood method encounters in the presence of many response variables and covariates. In the application to the observed data, we show how to obtain indicators of the effectiveness of the health care services delivered by each health district by means of a resampling procedure.
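The way a latent Markov likelihood absorbs intermittent missing responses and dropout can be sketched with the forward recursion for a single subject. Under a missing-at-random assumption, a skipped visit contributes a factor of 1 instead of a response probability, and dropout simply truncates the sequence. This is a deliberately simplified sketch: one categorical response, no covariates, and time-homogeneous transitions. It is not the authors' extended model or their two-step estimator, and the function name and parameter values are illustrative.

```python
import numpy as np

def lm_loglik(y, pi, Pt, E):
    """Log-likelihood of one subject's sequence under a basic latent Markov model.

    y:  observed category (int) per visit, or None for an intermittent missing value;
        dropout simply means the list ends early.
    pi: (k,) initial latent-state probabilities.
    Pt: (k, k) time-homogeneous transition matrix (rows sum to 1).
    E:  (k, c) response probabilities P(y = c | state).
    Missing values are assumed missing at random, so they contribute a factor of 1.
    """
    def emission(obs):
        return np.ones(len(pi)) if obs is None else E[:, obs]

    alpha = pi * emission(y[0])
    loglik = 0.0
    for t in range(1, len(y)):
        scale = alpha.sum()            # rescale to avoid underflow on long sequences
        loglik += np.log(scale)
        alpha = ((alpha / scale) @ Pt) * emission(y[t])
    return loglik + np.log(alpha.sum())

# Illustrative parameters: 2 latent states, 3 response categories
pi = np.array([0.6, 0.4])
Pt = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
# visit 2 skipped (intermittent missing), no data after visit 4 (dropout)
print(lm_loglik([0, None, 2, 1], pi, Pt, E))
```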

A Factor Analysis-Based Study of Demand Satisfaction with Rural Public Goods   Times cited: 1 (self-citations: 0; by others: 1)
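This study draws on Shaanxi Province farmers' satisfaction ratings of 12 specific public goods, including farmland irrigation and water conservancy infrastructure, rural road infrastructure, rural health care, and rural social security. Factor analysis is used to extract four common factors that capture the shared features of the satisfaction ratings: rural "soft" basic public services, "hard" basic public services, agricultural production services, and higher-level rural public services. An evaluation system for overall satisfaction with rural public goods is then built from the factor analysis results. The results are given an economic interpretation using respondents' answers to auxiliary questionnaire items. This interpretation reveals the hierarchical structure of demand for rural public goods and the mechanism by which demand satisfaction is formed. Finally, measures are proposed for optimizing the supply of rural public goods and raising farmers' satisfaction.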
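The abstract does not state how the factors were extracted or how the overall index was weighted. One common convention is assumed here: principal-component extraction from the correlation matrix of the item ratings, followed by a composite score that weights each factor by its share of the retained variance. The sketch omits factor rotation (such as varimax), which studies of this kind often apply. The function name and simulated data are hypothetical.

```python
import numpy as np

def composite_index(X, m=4):
    """Composite satisfaction index from principal-component factor extraction.

    X: (n, p) item ratings. Retains m factors (no rotation) and weights each
    factor by its share of the retained variance. Returns (index, scores, weights).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardised items
    R = np.corrcoef(X, rowvar=False)                     # item correlation matrix
    eigval, eigvec = np.linalg.eigh(R)                   # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:m]
    lam, V = eigval[top], eigvec[:, top]
    F = Z @ V / np.sqrt(lam)                             # unit-variance factor scores
    w = lam / lam.sum()                                  # variance-contribution weights
    return F @ w, F, w

# Illustrative data: 12 items, each driven by one of 4 latent factors
rng = np.random.default_rng(3)
latent = rng.standard_normal((300, 4))
X = np.repeat(latent, 3, axis=1) + 0.7 * rng.standard_normal((300, 12))
index, F, w = composite_index(X, m=4)
print(np.round(w, 3))
```

With principal-component extraction, these scores coincide with regression-method factor scores, so each retained factor has unit variance and the weights alone set its contribution to the index.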

In many dose-response studies, each of several independent groups of animals is treated with a different dose of a substance. Many response variables are then measured on each animal. The distributions of the response variables may be nonnormal, and Jonckheere's (1954) test for ordered alternatives in the one-way layout is sometimes used to test whether the level of a single variable increases with increasing dose. In some applications, however, it is important to consider a set of response variables simultaneously. For instance, an increase in each of certain enzymes in the blood serum may suggest liver damage. To test whether these enzyme levels increase with increasing dose, it may be preferable to consider these enzymes as a group, rather than individually.

I propose two multivariate generalizations of Jonckheere's univariate test. Each multivariate test statistic is a function of coordinate-wise Jonckheere statistics—one a sum, the other a quadratic form. The sum statistic can be used to test the alternative hypothesis that each variable is stochastically increasing with increasing dose. The quadratic form statistic is designed for the more general alternative hypothesis that each variable is stochastically ordered with increasing dose.

For each of these two alternatives, I also propose a multivariate generalization of a normal theory test described by Puri (1965). I examine the asymptotic distributions of the four test statistics under the null hypothesis and under translation alternatives and compare each distribution-free test to the corresponding normal theory test in terms of asymptotic relative efficiency.

The multivariate Jonckheere tests are illustrated using dose-response data from a subchronic toxicology study carried out by the National Toxicology Program. Four groups of ten male rats each were treated with increasing doses of vinylidene fluoride, and the serum enzymes SDH, SGOT, and SGPT were measured. A comparison of univariate Jonckheere tests on each variable, bivariate tests on SDH and SGOT, and multivariate tests on all three variables gives insight into the behavior of the various procedures.
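A minimal sketch of the sum statistic follows. For each response, compute Jonckheere's statistic: Mann-Whitney counts summed over ordered pairs of dose groups, with ties counted as 1/2. Add the statistics across responses and calibrate the sum by permutation. Without ties, every coordinate-wise statistic has the same null mean and variance, so the raw sum orders samples the same way as a sum of standardized statistics. Several choices are assumptions of this sketch: the permutation reference distribution (in place of the asymptotic theory studied in the paper), the function names, and the simulated data, which are not the NTP data.

```python
import numpy as np

def jonckheere(x, groups):
    """Jonckheere's statistic: Mann-Whitney counts summed over ordered dose-group pairs.

    x: (N,) responses; groups: (N,) integer dose levels 0 < 1 < ... < G-1.
    A pair (lower dose, higher dose) scores 1 if the higher-dose value is larger, 1/2 if tied.
    """
    J = 0.0
    G = int(groups.max()) + 1
    for a in range(G):
        for b in range(a + 1, G):
            diff = x[groups == b][None, :] - x[groups == a][:, None]
            J += (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return J

def jonckheere_sum_test(Y, groups, n_perm=999, seed=0):
    """Sum of coordinate-wise Jonckheere statistics with a one-sided permutation p-value.

    Permuting dose labels moves whole rows of Y, so the dependence between responses is kept.
    """
    rng = np.random.default_rng(seed)

    def stat(g):
        return sum(jonckheere(Y[:, v], g) for v in range(Y.shape[1]))

    observed = stat(groups)
    perm = np.array([stat(rng.permutation(groups)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, p_value

# Illustrative data (not the NTP data): 4 dose groups of 10, 3 responses rising with dose
rng = np.random.default_rng(4)
groups = np.repeat(np.arange(4), 10)
Y = rng.standard_normal((40, 3)) + 0.5 * groups[:, None]
J_sum, p = jonckheere_sum_test(Y, groups)
print(J_sum, p)
```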

Investigations of multivariate populations are common in applied research, and the two-way crossed factorial design is frequently used in the exploratory phase of industrial applications. When assumptions such as multivariate normality and homogeneity of covariances are violated, the conventional approach is to resort to nonparametric tests. In this paper, we compare the performance, and in particular the power, of several nonparametric and semi-parametric methods developed in recent years. Specifically, we examine resampling methods and robust versions of classical multivariate analysis of variance (MANOVA) tests. In a simulation study, we generate data sets with different configurations of factor effects, number of replicates, number of response variables under the null hypothesis, and number of response variables under the alternative hypothesis. The objective is to give practitioners practical guidance on several points: the sensitivity of the tests in the various configurations, the trade-off between power and type I error, the strategic impact of increasing the number of response variables, and the favourable performance of one test when the alternative is sparse. A real case study from an industrial engineering experiment in thermoformed packaging production is used to compare the methods and illustrate their application.

Generalized linear models (GLMs) are widely studied for handling complex response variables. For categorical dependent variables with more than two response categories, multivariate GLMs describe the relationship between the polytomous response and a set of regressors. Traditional variable selection approaches in the literature have been proposed for the multivariate GLM with a canonical link function when the number of parameters is fixed. However, in many model selection problems, the number of parameters may be large and grow with the sample size. In this paper, we present a new selection criterion for the model with a diverging number of parameters. Under suitable conditions, the criterion is shown to be model selection consistent. A simulation study and a real data analysis are conducted to support the theoretical findings.

Two sample surveys of Post-Docs were planned and carried out at the University of Ferrara in 2004 and 2007. They aimed to determine the professional status of Post-Docs, the relationship between their PhD education and their employment, and their satisfaction with certain aspects of the education and research program. Two methodological contributions were developed as part of these surveys. The first is an extension of the nonparametric combination of dependent rankings, used to construct a synthesis of composite indicators measuring satisfaction with particular aspects of PhD programs [R. Arboretti Giancristofaro and L. Salmaso, Global ranking indicators with application to the evaluation of PhD programs, Atti del Convegno “Valutazione e Customer Satisfaction per la Qualità dei Servizi”, Roma, 8–9 Settembre 2005, pp. 19–22; R. Arboretti Giancristofaro, S. Bonnini, and L. Salmaso, A performance indicator for multivariate data, Quad. Stat. 9 (2007), pp. 1–29; R. Arboretti Giancristofaro, F. Pesarin, and L. Salmaso, Nonparametric approaches for multivariate testing with mixed variables and for ranking on ordered categorical variables with an application to the evaluation of PhD programs, in Real Data Analysis, S. Sawilowsky, ed., a volume in Quantitative Methods in Education and the Behavioral Sciences: Issues, Research and Teaching, Ronald C. Serlin, series ed., Information Age Publishing, Charlotte, North Carolina, 2007, pp. 355–385]. The procedure was applied to highlight differences in the interviewed Post-Docs’ multivariate satisfaction profiles with respect to two aspects: the education/employment relationship, and employment expectations and opportunities. The second is an inferential procedure for hypothesis testing when the objective is to compare the heterogeneity of two populations on the basis of sample data [G.R. Arboretti, S. Bonnini, and F. Pesarin, A permutation approach for testing heterogeneity in two-sample categorical variables, Stat. Comput. (2009) doi: 10.1007/S11222-008-9085-8.]. The procedure was applied to compare the degree of heterogeneity of Post-Doc judgments in the two surveys regarding the adequacy of their PhD education for the work they carried out.
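The authors' heterogeneity test is not reproduced here. As a simplified stand-in, the sketch below measures the heterogeneity of a categorical judgement with the normalized Gini index. The index is 0 when all answers fall in one category and 1 when they are spread uniformly. Two samples are then compared with a one-sided permutation test. Permuting pooled answers is valid only under the stronger null hypothesis that both surveys have the same distribution. Function names and simulated data are hypothetical.

```python
import numpy as np

def gini_heterogeneity(x, c):
    """Normalised Gini heterogeneity of categories 0..c-1 (0: one category, 1: uniform)."""
    p = np.bincount(x, minlength=c) / len(x)
    return c / (c - 1) * (1.0 - np.sum(p ** 2))

def heterogeneity_test(x1, x2, c, n_perm=2000, seed=0):
    """One-sided permutation test of H1: sample 1 comes from a more heterogeneous population.

    Pooled answers are permuted, which is valid under the null of identical distributions.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x1, x2])
    n1 = len(x1)
    observed = gini_heterogeneity(x1, c) - gini_heterogeneity(x2, c)
    count = 0
    for _ in range(n_perm):
        z = rng.permutation(pooled)
        if gini_heterogeneity(z[:n1], c) - gini_heterogeneity(z[n1:], c) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Illustrative data: answers on a 4-point adequacy scale in two survey waves
rng = np.random.default_rng(5)
wave_a = rng.choice(4, size=120, p=[0.25, 0.25, 0.25, 0.25])   # spread-out judgements
wave_b = rng.choice(4, size=120, p=[0.05, 0.80, 0.10, 0.05])   # concentrated judgements
diff, p = heterogeneity_test(wave_a, wave_b, c=4)
print(round(diff, 3), p)
```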

In this paper, a joint model is presented for analyzing multivariate mixed ordinal and continuous responses, where the continuous outcomes may be skewed. The discrete ordinal responses are modeled with a continuous latent variable approach, and the continuous responses with a skew-normal mixed effects model. A Bayesian approach using Markov chain Monte Carlo (MCMC) is adopted for parameter estimation. Simulation studies are performed to illustrate the proposed approach. They show that two alternatives to joint modeling under a skew-normal distribution lead to biased parameter estimates. One is fitting separate models; the other is assuming normality for the shared random effects and the within-subject errors of the continuous and ordinal variables. The approach is used to analyze part of the British Household Panel Survey (BHPS) data set, with annual income and life satisfaction as the continuous and the ordinal longitudinal responses, respectively. The annual income variable is severely skewed; therefore, assuming normality for the continuous response does not yield acceptable results. The results of the data analysis show that gender, marital status, educational level, and the amount of money spent on leisure have a significant effect on annual income, while marital status has the largest impact on life satisfaction.

At the core of multivariate statistics is the investigation of relationships between different sets of variables: more precisely, inter-variable relationships and causal relationships. The latter is a regression problem, in which one set of variables is referred to as the response variables and the other as the predictor variables. In this situation, the effect of the predictors on the response variables is revealed through the regression coefficients. The results of the regression analysis can be displayed graphically using the biplot. The resulting biplot provides a single graphical representation of the samples together with the predictor and response variables. In addition, the effect of the predictors, in terms of the regression coefficients, can be visualized in this biplot, although sub-optimally.
KEYWORDS: Biplot, regression analysis, multivariate regression, rank approximation

The number of parameters in a linear mixed effects (LME) model mushrooms in the case of multivariate repeated measures data. Estimating these parameters becomes a real problem as the number of response variables or the number of time points increases, and the problem becomes more intricate when further random effects are added. A full multivariate analysis is not possible in a small-sample setting. We propose a method that estimates these many parameters in bits and pieces from baby models, each taking a subset of the response variables at a time. These pieces are then combined to obtain the parameter estimates for the mother model, with all variables taken together. With this method, one can calculate the fixed effects and the best linear unbiased predictions (BLUPs) of the random effects in the model. One can also calculate the BLUPs at each observation time for each response variable, to monitor the effectiveness of the treatment for each subject. The proposed method is illustrated with an example of multiple response variables measured over multiple time points in a clinical trial in osteoporosis.

The purpose of this paper is to discuss response surface designs for multivariate generalized linear models (GLMs). Such models are considered whenever several response variables can be measured for each setting of a group of control variables, and the response variables are adequately represented by GLMs. The mean-squared error of prediction (MSEP) matrix is used to assess the quality of prediction associated with a given design. The MSEP incorporates both the prediction variance and the prediction bias, which results from using maximum likelihood estimates of the parameters of the fitted linear predictor. For a given design, quantiles of a scalar-valued function of the MSEP are obtained within a certain region of interest. The quantiles depend on the unknown parameters of the linear predictor. The dispersion of these quantiles over the space of the unknown parameters is determined and then depicted by the so-called quantile dispersion graphs. An application of the proposed methodology is presented using the special case of the bivariate binary distribution.

When data sets are multilevel (group nesting or repeated measures), different sources of variation must be identified. In the framework of unsupervised analyses, multilevel simultaneous component analysis (MSCA) has recently been proposed as the most satisfactory option for analyzing multilevel data. MSCA estimates submodels for the different levels in the data and thereby separates the “within”-subject and “between”-subject variation in the variables. We follow the principles of MSCA and the strategy of decomposing the available data matrix into orthogonal blocks, taking into account the between and within data structures. On this basis, we generalize, from a multilevel perspective, multivariate models in which a matrix of response variables guides the projections toward meaningful directions. These projections are formed by responses predicted by explanatory variables, or by a limited number of their combinations/composites. To this end, the paper proposes multilevel versions of the multivariate regression model and of dimensionality-reduction methods, which predict responses with fewer linear composites of the explanatory variables. The principal findings concern the loss functions of multivariate regression, principal-component regression, reduced-rank regression, and canonical-correlation regression. Under some constraints, minimizing each of these loss functions is equivalent to separately minimizing the two loss functions corresponding to the between and within structures, whose sum gives the total loss. The paper closes with a case study on the relationship between mental health severity and the intensity of care in the Lombardy region mental health system.
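The orthogonal between/within split behind MSCA can be sketched as follows. Each row of the column-centred data is split into its group mean (the between part) and its deviation from that mean (the within part). The two parts are orthogonal, so sums of squares add up, and so do least-squares losses fitted separately at each level. This illustrates only the decomposition step on simulated data. The constrained multilevel versions of reduced-rank, principal-component, and canonical-correlation regression in the paper are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def between_within(X, groups):
    """Split column-centred data into between-group and within-group parts.

    X: (n, p) data; groups: (n,) subject or group labels.
    Returns (Xb, Xw) with X - mean(X) = Xb + Xw and Xb.T @ Xw = 0.
    """
    Xc = X - X.mean(axis=0)
    Xb = np.zeros_like(Xc)
    for g in np.unique(groups):
        idx = groups == g
        Xb[idx] = Xc[idx].mean(axis=0)      # every row replaced by its group mean
    return Xb, Xc - Xb

# Illustrative data: 5 subjects x 8 occasions, 3 predictors and 2 responses
rng = np.random.default_rng(6)
groups = np.repeat(np.arange(5), 8)
X = rng.standard_normal((40, 3)) + rng.standard_normal((5, 3))[groups]
Y = X @ rng.standard_normal((3, 2)) + rng.standard_normal((40, 2))
Xb, Xw = between_within(X, groups)
Yb, Yw = between_within(Y, groups)

# Separate least-squares fits at each level
Bb = np.linalg.lstsq(Xb, Yb, rcond=None)[0]
Bw = np.linalg.lstsq(Xw, Yw, rcond=None)[0]
Rb, Rw = Yb - Xb @ Bb, Yw - Xw @ Bw

# The total loss of the two-level fit is the sum of the two level-specific losses
total = np.sum((Rb + Rw) ** 2)
print(np.isclose(total, np.sum(Rb ** 2) + np.sum(Rw ** 2)))
```

The additivity holds because between-level residuals are constant within each group, while within-level residuals sum to zero within each group.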

Between-group comparisons often involve many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal–Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in the response variables, and deleting cases with missing values is likely to lead to inefficient statistical inference. Here, we extend the MKW test to retain information from partially observed cases. Results of simulation studies and an analysis of real data show that the proposed method provides adequate coverage and greater power than complete-case analyses.
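For reference, the complete-case MKW statistic that the paper extends can be sketched as follows. Each response is ranked across all cases, and the group mean rank vectors are compared through a quadratic form in the inverse rank covariance matrix. Under the null hypothesis, the statistic is approximately chi-square with p(G - 1) degrees of freedom. Rows with any missing value are dropped first, which is exactly the loss of information the proposed method avoids. The sketch assumes no ties (ranks come from a double `argsort`) and uses divisor N - 1 for the rank covariance, so with one response it equals the Kruskal–Wallis H statistic. These choices and the function name are assumptions.

```python
import numpy as np

def mkw_complete_case(Y, groups):
    """Multivariate Kruskal-Wallis statistic computed on complete cases only.

    Y: (N, p) responses with np.nan for missing values; groups: (N,) labels.
    Ranks come from a double argsort, which assumes there are no ties.
    Returns (statistic, degrees of freedom, number of cases used).
    """
    keep = ~np.isnan(Y).any(axis=1)            # listwise deletion of incomplete cases
    Y, groups = Y[keep], groups[keep]
    N, p = Y.shape
    R = Y.argsort(axis=0).argsort(axis=0) + 1.0
    Rc = R - (N + 1) / 2.0                      # centred ranks
    V = Rc.T @ Rc / (N - 1)                     # rank covariance matrix
    Vinv = np.linalg.inv(V)
    labels = np.unique(groups)
    W = 0.0
    for g in labels:
        in_g = groups == g
        d = Rc[in_g].mean(axis=0)               # group mean rank minus its null expectation
        W += in_g.sum() * (d @ Vinv @ d)
    return W, p * (len(labels) - 1), N

# Illustrative data: 3 groups of 20, 2 responses, about 10% of values missing
rng = np.random.default_rng(7)
groups = np.repeat(np.arange(3), 20)
Y = rng.standard_normal((60, 2)) + 0.4 * groups[:, None]
Y[rng.random((60, 2)) < 0.1] = np.nan
W, df, n_used = mkw_complete_case(Y, groups)
print(W, df, n_used)
```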

In prediction problems, both the response and the covariates may be highly correlated with a second group of influential regressors that can be considered background variables. An important challenge is to perform variable selection and importance assessment among the covariates in the presence of these variables. A clinical example is the prediction of lean body mass (response) from bioimpedance (covariates), where anthropometric measures play the role of background variables. We introduce a reduced dataset in which the variables are defined as the residuals with respect to the background, and perform variable selection and importance assessment in both linear and random forest models. Using a clinical dataset of multi-frequency bioimpedance, we show the effectiveness of this method in selecting the most relevant predictors of lean body mass beyond anthropometry.
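The residual construction can be sketched with ordinary least squares. Each covariate and the response are regressed on the background variables plus an intercept, and the residuals form the reduced dataset on which selection and importance are then assessed. Here a simple correlation ranking stands in for the linear and random forest importance measures used in the paper. The function name and simulated data are hypothetical.

```python
import numpy as np

def residualize(M, B):
    """Residuals of each column of M after least-squares regression on B plus an intercept."""
    Bi = np.column_stack([np.ones(len(B)), B])
    coef = np.linalg.lstsq(Bi, M, rcond=None)[0]
    return M - Bi @ coef

# Illustrative data: 2 standardised background variables (e.g. height, weight),
# 5 bioimpedance-like covariates, and a response driven by background + covariate 0
rng = np.random.default_rng(8)
n = 200
B = rng.standard_normal((n, 2))
X = B @ rng.standard_normal((2, 5)) + rng.standard_normal((n, 5))
y = B @ np.array([1.0, 2.0]) + 0.8 * X[:, 0] + 0.3 * rng.standard_normal(n)

Xr = residualize(X, B)                   # reduced dataset: covariates beyond background
yr = residualize(y[:, None], B)[:, 0]    # response beyond background
corr = np.array([np.corrcoef(Xr[:, j], yr)[0, 1] for j in range(5)])
print(np.round(corr, 2))
```

On the raw data, every covariate would correlate with the response through the shared background. After residualization, only the covariate that adds information beyond the background stands out.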

Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their extension to the multivariate case, however, is not straightforward, starting with the definition of multivariate quantiles. We propose a flexible Bayesian quantile regression model for a multivariate response variable, in which a structured additive framework can be defined for all predictor variables. We build on previous ideas that use a directional approach to define the quantiles of a response variable with multiple outputs, and we define noncrossing quantiles in every directional quantile model. We define a Markov chain Monte Carlo (MCMC) procedure for model estimation. In this procedure, the noncrossing property is obtained by using a Gaussian process design to model the correlation between several quantile regression models. We illustrate these models using two datasets. The first concerns dimensions of inequality in the population, such as income and health. The second concerns students' scores in the Brazilian High School National Exam, with three dimensions for the response variable.

Two outcomes in the Kaiser-Oregon Prepaid Medical Care System were investigated using multiple logistic function analysis: the one-year probability of using preventive services and the one-year probability of visits for acute micro-organism disease. Their dependence on ten explanatory factors was examined. Our results demonstrate the importance of disaggregating the population and the type of medical care when investigating the determinants of service utilization. This is especially important for understanding the underlying structure of medical care utilization decisions. The study also illustrates a potentially fruitful application of multivariate logistic analysis to health care planning and policy analysis. The use of specific types of medical care in the short term is probabilistic and depends on many factors. For most groups, the multivariate logistic approach produces an analysis reasonably consistent with the actual data. Further research is needed to test the predictive ability of these types of utilization models and to investigate the determinants of other morbidity-specific types of medical care utilization. The problem will then be to develop a model of the quantity of alternative types of services utilized, conditioned on the number of persons initiating service for alternative health reasons.

Nowadays, many manufacturing and service systems deliver products and services to their customers through several consecutive stages of operation, in each of which one or more quality characteristics of interest are monitored. In these environments, the final quality at the last stage depends on more than the task performed at that stage. It also depends on the quality of the products and services at intermediate stages and on the design parameters of each stage. In this paper, a novel methodology based on the posterior preference approach is proposed to robustly optimize such multistage processes. In this methodology, a multi-response surface optimization problem is solved to find preferred solutions among different nondominated solutions (NDSs) according to the decision maker's preferences. In addition, because the intermediate response variables (quality characteristics) may act as covariates in subsequent stages, a robust multi-response estimation method is applied to extract the relationships between the outputs and inputs of each stage. NDSs are generated by the ε-constraint method. The robust preferred solutions are selected using some newly defined conformance criteria. The applicability of the proposed approach is illustrated by a numerical example.
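Generating nondominated solutions with the ε-constraint method can be sketched on a finite set of candidate settings. For each bound ε, minimize the first objective subject to the second not exceeding ε, then collect the distinct minimizers. The two quadratic losses below are hypothetical stand-ins for fitted response-surface criteria. The robust estimation, multistage structure, and preference-based selection described in the paper are not modelled.

```python
import numpy as np

def eps_constraint(F, n_eps=25):
    """epsilon-constraint method over a finite set of candidate solutions.

    F: (m, 2) objective values, both to be minimised. For each eps on a grid,
    minimise f1 subject to f2 <= eps, breaking ties in f1 by the smaller f2 so
    that every returned candidate is nondominated. Returns sorted indices.
    """
    found = set()
    for eps in np.linspace(F[:, 1].min(), F[:, 1].max(), n_eps):
        feasible = np.flatnonzero(F[:, 1] <= eps)
        # np.lexsort sorts by the last key first: f1, then f2
        order = np.lexsort((F[feasible, 1], F[feasible, 0]))
        found.add(int(feasible[order[0]]))
    return sorted(found)

# Illustrative problem: candidate settings on a grid, two hypothetical quality losses
g = np.linspace(0.0, 1.0, 21)
x1, x2 = np.meshgrid(g, g)
S = np.column_stack([x1.ravel(), x2.ravel()])
F = np.column_stack([(S[:, 0] - 1) ** 2 + S[:, 1] ** 2,
                     S[:, 0] ** 2 + (S[:, 1] - 1) ** 2])
nds = eps_constraint(F)
print(S[nds])
```

The lexicographic tie-break matters on a discrete grid: minimizing the first objective alone can return points that are only weakly nondominated.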

Consider a vector valued response variable related to a vector valued explanatory variable through a normal multivariate linear model. The multivariate calibration problem deals with statistical inference on unknown values of the explanatory variable. The problem addressed is the construction of joint confidence regions for several unknown values of the explanatory variable. The problem is investigated when the variance covariance matrix is a scalar multiple of the identity matrix and also when it is a completely unknown positive definite matrix. The problem is solved in only two cases: (i) the response and explanatory variables have the same dimensions, and (ii) the explanatory variable is a scalar. In the former case, exact joint confidence regions are derived based on a natural pivot statistic. In the latter case, the joint confidence regions are only conservative. Computational aspects and the practical implementation of the confidence regions are discussed and illustrated using an example.
