Population parameters of the continuous common factor model are now routinely estimated from categorical observed variables. It is shown that, under these conditions, the formula for calculating the determinacy of the regression factor score predictor from the estimated model parameters has to be adapted. A method for calculating this determinacy from the parameters of the continuous population factor model based on categorical variables is proposed and evaluated with simulated population data. Using the uncorrected formula can lead to serious overestimation of determinacy for categorical variables.
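For reference, the sketch below computes the *uncorrected*, continuous-variable determinacy for a one-factor model with a standardized factor. This is the quantity the abstract warns can overestimate determinacy for categorical data; the corrected method proposed in the paper is not reproduced here. It assumes Sigma = lambda lambda' + Psi with diagonal Psi, so that lambda' Sigma^{-1} lambda reduces to s / (1 + s) with s = sum(lambda_i^2 / psi_i).

```python
import math

def regression_score_determinacy(loadings, uniquenesses):
    """Uncorrected determinacy of the regression factor score predictor.

    Assumes a one-factor model with a standardized factor, Sigma = l l' + Psi
    and Psi diagonal. By the Sherman-Morrison identity,
    l' Sigma^{-1} l = s / (1 + s), where s = sum(l_i^2 / psi_i).
    The determinacy is the square root of that squared correlation.
    """
    s = sum(l * l / psi for l, psi in zip(loadings, uniquenesses))
    return math.sqrt(s / (1.0 + s))

# Example: three indicators, each with loading 0.8 and uniqueness 0.36.
print(regression_score_determinacy([0.8, 0.8, 0.8], [0.36, 0.36, 0.36]))
```

Applied directly to parameters estimated from categorical indicators, this is the formula the abstract says needs adapting.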

Mixture separation for mixed-mode data   Total citations: 3; self-citations: 0; citations by others: 3
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.
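In a finite mixture of location models, the component density factorizes into a probability for the location (the cell defined by the categorical variables) and a Gaussian density for the continuous part, with a covariance common to all cells. The sketch below shows the E-step quantity of an EM fit: the posterior group-membership probabilities of one observation. For brevity it assumes a single continuous variable. The parameter names (`pi`, `p`, `mu`, `sd`) are illustrative, and this is not presented as the authors' implementation.

```python
import math

def normal_pdf(x, mean, sd):
    """Univariate normal density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def location_mixture_responsibilities(x, m, pi, p, mu, sd):
    """Posterior group probabilities for one observation (x, m).

    x: continuous value; m: location (cell) index of the categorical part.
    pi[g]: mixing proportion of group g.
    p[g][m]: probability of location m within group g.
    mu[g][m]: mean of x in location m of group g.
    sd: common standard deviation (homogeneous across groups and cells).
    """
    weights = [pi[g] * p[g][m] * normal_pdf(x, mu[g][m], sd) for g in range(len(pi))]
    total = sum(weights)
    return [w / total for w in weights]
```

An M-step would then re-estimate `pi`, `p`, `mu` and `sd` from these weights. That step is omitted here.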

The multinomial logit model (MNL) is one of the most frequently used statistical models in marketing applications. It allows one to relate an unordered categorical response variable, for example representing the choice of a brand, to a vector of covariates such as the price of the brand or variables characterising the consumer. In its classical form, all covariates enter in strictly parametric, linear form into the utility function of the MNL model. In this paper, we introduce semiparametric extensions, where smooth effects of continuous covariates are modelled by penalised splines. A mixed model representation of these penalised splines is employed to obtain estimates of the corresponding smoothing parameters, leading to a fully automated estimation procedure. To validate semiparametric models against parametric models, we utilise different scoring rules as well as predicted market share and compare parametric and semiparametric approaches for a number of brand choice data sets.
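In the MNL model, choice probabilities are the softmax of the alternatives' utilities. The sketch below shows that, and it also replaces the linear price term with a spline function of price. The truncated-line basis `[x, (x - k)_+, ...]` is an assumed, illustrative choice of spline basis. Penalised estimation of its coefficients and of the smoothing parameters, which is the paper's contribution, is not shown.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: softmax of the utilities."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def truncated_line_basis(x, knots):
    """Hypothetical spline basis for a smooth covariate effect."""
    return [x] + [max(0.0, x - k) for k in knots]

def utility(alpha, coefs, price, knots):
    """Brand-specific constant plus a spline function of price.

    With knots == [] this reduces to the classical linear-in-price utility.
    """
    basis = truncated_line_basis(price, knots)
    return alpha + sum(c * b for c, b in zip(coefs, basis))

# Three brands sharing a (hypothetical) price effect with one knot at 1.5.
knots, coefs = [1.5], [-1.0, 0.6]
utils = [utility(a, coefs, pr, knots) for a, pr in [(0.2, 1.0), (0.0, 1.4), (0.5, 2.0)]]
print(mnl_probabilities(utils))
```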

Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary and/or categorical data, but because it assumes local independence, the estimated latent classes may not correspond to groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the mixture of latent trait analyzers model, which provides an efficient model-fitting strategy. The model is demonstrated on the analysis of data from the National Long Term Care Survey (NLTCS) and on voting in the U.S. Congress. It is shown to yield intuitive clustering results and to give a much better fit than either latent class analysis or latent trait analysis alone.
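A common variational device for integrals of this kind is the Jaakkola-Jordan lower bound on the logistic function. In the exponent, the bound is quadratic in the linear predictor, so the integral against a Gaussian latent trait becomes tractable. The sketch below implements only that bound. Whether it matches every detail of the authors' variational scheme is an assumption; the full fitting algorithm is not reproduced.

```python
import math

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def jj_lambda(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), with its limit 1/8 at xi = 0."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)

def jj_lower_bound(x, xi):
    """Jaakkola-Jordan bound: sigmoid(x) >= this value for any xi.

    The bound is tight at x = xi and x = -xi. Its exponent is quadratic in x,
    which is what makes Gaussian integrals over a latent trait analytic.
    """
    return sigmoid(xi) * math.exp((x - xi) / 2.0 - jj_lambda(xi) * (x * x - xi * xi))
```

In a variational fit, each variational parameter `xi` would be updated to tighten the bound for its response. That update is not shown.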

Both continuous and categorical covariates are common in traditional Chinese medicine (TCM) research, especially in clinical syndrome identification and risk prediction. For a group of dummy variables generated by the same categorical covariate, it is important to penalize the variables group-wise rather than individually. In this paper, we discuss the group lasso method for risk prediction analysis in TCM osteoporosis research. This is the first application of such a group-wise variable selection method in this field, and it may lead to new insights into the use of grouped penalization to select appropriate covariates in TCM research. The methodology selects categorical and continuous variables and estimates their parameters simultaneously. In our application to the osteoporosis data, four covariates (including both categorical and continuous covariates) are selected out of 52. The prediction model achieves excellent accuracy. Compared with prediction models using other sets of covariates, the group lasso risk prediction model significantly decreases the error rate and can help TCM doctors identify patients at high risk of osteoporosis in clinical practice. Simulation results show that the group lasso method is a reasonable choice for selecting categorical covariates in this TCM osteoporosis research.
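Group-wise penalization works through a group soft-thresholding step: either the whole block of coefficients belonging to one covariate is shrunk toward zero, or the whole block is set to exactly zero. The sketch below implements that proximal operator for the penalty lam * sum_g sqrt(p_g) * ||beta_g||. It is the building block used by, for example, block coordinate descent. The full fitting loop, and the specific solver used in the paper, are not shown and are not assumed.

```python
import math

def group_soft_threshold(z, groups, lam):
    """Proximal step of the group lasso penalty.

    z: unpenalized coefficient values (e.g. from a gradient step).
    groups: list of index lists; the dummies of one categorical covariate
            form one group, and a continuous covariate is a group of size 1.
    lam: penalty level; each group's threshold is lam * sqrt(group size).
    """
    beta = [0.0] * len(z)
    for idx in groups:
        norm = math.sqrt(sum(z[i] ** 2 for i in idx))
        thresh = lam * math.sqrt(len(idx))
        if norm > thresh:  # otherwise the whole group stays at zero
            scale = 1.0 - thresh / norm
            for i in idx:
                beta[i] = scale * z[i]
    return beta

# Two dummy-coded categorical covariates: the weak one is dropped as a block.
print(group_soft_threshold([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], 1.0))
```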

Modeling clustered categorical data with extensions of generalized linear model theory has received much attention in recent years. The rapidly growing number of approaches for categorical data in which clusters are uncorrelated but observations within a cluster are correlated has left applied scientists uncertain about their respective merits and drawbacks. When estimation is centred on solving an unbiased estimating equation for the mean parameters and on estimating covariance parameters that describe within-cluster or among-cluster heterogeneity, many of these approaches can easily be related. This contribution describes a series of algorithms and their implementation in detail, based on a classification of inferential procedures for clustered data.
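As one concrete instance of "estimating covariance parameters that describe within-cluster heterogeneity", the sketch below shows the moment estimator of an exchangeable working correlation that is commonly used in generalized estimating equations (Liang and Zeger, 1986). It computes this from Pearson residuals of binary outcomes. It is illustrative only and is not one of the specific algorithms described in the paper. The mean-model fit that produces the fitted values is assumed to exist.

```python
import math

def pearson_residual(y, mu):
    """Pearson residual for a binary outcome y with fitted mean mu."""
    return (y - mu) / math.sqrt(mu * (1.0 - mu))

def exchangeable_alpha(residuals_by_cluster, n_params):
    """Moment estimate of a common within-cluster correlation.

    residuals_by_cluster: one list of Pearson residuals per cluster.
    n_params: number of mean parameters p (degrees-of-freedom correction).
    """
    n_total = sum(len(r) for r in residuals_by_cluster)
    # Dispersion estimate from all squared residuals.
    phi = sum(v * v for r in residuals_by_cluster for v in r) / (n_total - n_params)
    cross, n_pairs = 0.0, 0
    for r in residuals_by_cluster:
        # Sum of residual products over all distinct pairs within the cluster.
        for j in range(len(r)):
            for k in range(j + 1, len(r)):
                cross += r[j] * r[k]
        n_pairs += len(r) * (len(r) - 1) // 2
    return cross / (phi * (n_pairs - n_params))
```

In a full GEE iteration, this estimate would alternate with re-solving the estimating equation for the mean parameters.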

Contrary to the common belief that the logit model has no analytical (closed-form) solution, such a solution can be found when the predictors are categorical. This paper shows that a binary logistic regression on categorical explanatory variables can be obtained in closed form. No special software and no iterative nonlinear estimation procedures are needed to obtain the model with all its parameters and characteristics, including the regression coefficients, their standard errors and t-statistics, as well as the residual and null deviances. The derivation is given for logistic models with one binary or categorical predictor and with several binary or categorical predictors. The analytical formulae can be used to calculate all the parameters of the logit regression arithmetically. The explicit expressions for the characteristics of logit regression are convenient for analysing and interpreting the results of logistic modeling.
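The single-binary-predictor case can be written down directly from the 2x2 table. The intercept is the log-odds at x = 0, the slope is the log odds ratio, and the standard errors follow from the cell counts. The sketch below uses these well-known identities and computes the deviances for ungrouped 0/1 data. The paper's general formulae for several categorical predictors are not reproduced. All cell counts are assumed to be nonzero.

```python
import math

def logit_2x2(a, b, c, d):
    """Closed-form logistic regression of y on one binary predictor x.

    a, b: counts of y = 1 and y = 0 when x = 0.
    c, d: counts of y = 1 and y = 0 when x = 1.
    """
    b0 = math.log(a / b)                    # log-odds at x = 0
    b1 = math.log((c / d) / (a / b))        # log odds ratio
    se0 = math.sqrt(1 / a + 1 / b)
    se1 = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def loglik(n1, n0):
        """Bernoulli log-likelihood at the group's fitted proportion."""
        p = n1 / (n1 + n0)
        return n1 * math.log(p) + n0 * math.log(1 - p)

    return {
        "b0": b0, "b1": b1, "se0": se0, "se1": se1,
        "t0": b0 / se0, "t1": b1 / se1,     # Wald statistics
        "residual_deviance": -2 * (loglik(a, b) + loglik(c, d)),
        "null_deviance": -2 * loglik(a + c, b + d),
    }

print(logit_2x2(10, 10, 20, 10))
```

These values should agree with an iterative maximum-likelihood fit (for example, R's `glm` with a binomial family) on the same data.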

Generalized linear models (GLMs) are widely studied for handling complex response variables. For categorical dependent variables with more than two response categories, multivariate GLMs are used to model the relationship between this polytomous response and a set of regressors. Traditional variable selection approaches for the multivariate GLM with a canonical link function have been proposed in the literature for the case where the number of parameters is fixed. However, in many model selection problems the number of parameters may be large and may grow with the sample size. In this paper, we present a new selection criterion for the model with a diverging number of parameters. Under suitable conditions, the criterion is shown to be model selection consistent. A simulation study and a real data analysis are conducted to support the theoretical findings.

If the capture probabilities in a capture‐recapture experiment depend on covariates, parametric models may be fitted and the population size may then be estimated. Here a semiparametric model for the capture probabilities that allows both continuous and categorical covariates is developed. Kernel smoothing and profile estimating equations are used to estimate the nonparametric and parametric components. Analytic forms of the standard errors are derived, which allows an empirical bias bandwidth selection procedure to be used to estimate the bandwidth. The method is evaluated in simulations and is applied to a real data set concerning captures of Prinia flaviventris, which is a common bird species in Southeast Asia.
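Once covariate-dependent capture probabilities have been estimated, the population size is typically obtained with a Horvitz-Thompson-type estimator. Each captured individual is weighted by the inverse of its probability of being caught at least once. The sketch below assumes that each individual's per-occasion capture probability is constant across `occasions` sampling occasions. How those probabilities are estimated (kernel smoothing plus profile estimating equations in the paper) is not shown.

```python
def horvitz_thompson_population(capture_probs, occasions):
    """Estimate population size from the individuals actually captured.

    capture_probs: per-occasion capture probability of each captured
                   individual (assumed already estimated from covariates).
    occasions: number of capture occasions T.
    """
    total = 0.0
    for p in capture_probs:
        seen = 1.0 - (1.0 - p) ** occasions  # P(captured at least once)
        total += 1.0 / seen
    return total

# Ten captured birds, each with p = 0.5 per occasion over two occasions.
print(horvitz_thompson_population([0.5] * 10, 2))
```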

The likelihood function of a general nonlinear, non-Gaussian state space model is a high-dimensional integral with no closed-form solution. In this article, I show how to calculate the likelihood function exactly for a large class of non-Gaussian state space models that include stochastic intensity, stochastic volatility, and stochastic duration models among others. The state variables in this class follow a nonnegative stochastic process that is popular in econometrics for modeling volatility and intensities. In addition to calculating the likelihood, I also show how to perform filtering and smoothing to estimate the latent variables in the model. The procedures in this article can be used for either Bayesian or frequentist estimation of the model’s unknown parameters as well as the latent state variables. Supplementary materials for this article are available online.

In the context of local interpolators, radial basis functions (RBFs) are known to reduce computational time by using a subset of the data for prediction. In this paper, we propose a new distance-based spatial RBF method for modeling spatially continuous random variables. The trend is incorporated into an RBF through a detrending procedure with mixed variables, which may include categorical variables. To evaluate the efficiency of the proposed method, a simulation study is carried out for a variety of practical scenarios with five distinct RBFs, incorporating principal coordinates. Finally, the proposed method is illustrated with an application to predicting calcium concentration measured at a depth of 0–20 cm in Brazil, with the smoothing parameter selected by cross-validation.
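The core of any RBF interpolator is solving a linear system so that the fitted surface passes through the observed values, and then predicting at new locations as a weighted sum of basis functions. The sketch below does this with a multiquadric basis and Euclidean distances. The paper's distance-based variant (detrending with mixed variables, principal coordinates) is not reproduced. The smoothing parameter `c` is fixed here by assumption, whereas the paper selects it by cross-validation.

```python
import math

def multiquadric(r, c):
    """Multiquadric radial basis function with smoothing parameter c."""
    return math.sqrt(r * r + c * c)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def fit_rbf(points, values, c):
    """Return a predictor that interpolates `values` at `points`."""
    A = [[multiquadric(math.dist(p, q), c) for q in points] for p in points]
    weights = solve(A, values)

    def predict(x):
        return sum(w * multiquadric(math.dist(x, p), c) for w, p in zip(weights, points))

    return predict

f = fit_rbf([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], [1.0, 2.0, 3.0, 4.0], 0.5)
print(f((0.5, 0.5)))
```

For distinct points, the multiquadric system matrix is nonsingular, which is why this basis is a common default.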

In most cases the distribution of communications is unknown, and social network communications with categorical attributes may be summarized in a contingency table. Because the data are categorical and the features numerous, many parameters must be estimated in the model, which lowers the accuracy of the estimators. To overcome the high dimensionality and the unknown distribution of communications, multiple correspondence analysis is used to reduce the number of parameters. The rescaled data are then modelled with a Dirichlet distribution whose parameters are estimated. Two control charts, Hotelling’s T² and the multivariate exponentially weighted moving average (MEWMA), are developed to monitor the parameters of the Dirichlet distribution. The performance of the proposed method is evaluated through simulation studies in terms of the average run length criterion. Finally, the proposed method is applied to a real case.
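The two monitoring statistics named above have standard forms. Hotelling's T² is the squared Mahalanobis distance of an observation from its in-control mean. MEWMA applies the same quadratic form to an exponentially smoothed vector, using the asymptotic covariance lam / (2 - lam) * Sigma. The sketch below is restricted to two monitored quantities for brevity. It treats the in-control mean and covariance as known, and it does not reproduce the paper's Dirichlet-parameter estimation. Control limits would be chosen, e.g. by simulation, to reach a target average run length. They are not given here.

```python
def quad_form_2x2(v, S):
    """v' S^{-1} v for a 2-vector v and a 2x2 covariance matrix S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(v[i] * inv[i][j] * v[j] for i in range(2) for j in range(2))

def hotelling_t2(x, mu, S):
    """Hotelling's T^2 for one bivariate observation."""
    return quad_form_2x2([x[0] - mu[0], x[1] - mu[1]], S)

def mewma_statistics(xs, mu, S, lam=0.1):
    """MEWMA chart statistics using the asymptotic covariance of z_t."""
    z = [0.0, 0.0]
    scale = lam / (2.0 - lam)
    Sz = [[scale * S[0][0], scale * S[0][1]], [scale * S[1][0], scale * S[1][1]]]
    stats = []
    for x in xs:
        # z_t = lam * (x_t - mu) + (1 - lam) * z_{t-1}
        z = [lam * (x[i] - mu[i]) + (1.0 - lam) * z[i] for i in range(2)]
        stats.append(quad_form_2x2(z, Sz))
    return stats
```

With `lam = 1`, the MEWMA statistic reduces to Hotelling's T². Smaller values make the chart more sensitive to small, persistent shifts.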

Logistic-normal models can be applied for analysis of longitudinal binary data. The aim of this article is to propose a goodness-of-fit test using nonparametric smoothing techniques for checking the adequacy of logistic-normal models. Moreover, the leave-one-out cross-validation method for selecting the suitable bandwidth is developed. The quadratic form of the proposed test statistic based on smoothing residuals provides a global measure for checking the model with categorical and continuous covariates. The formulae of expectation and variance of the proposed statistics are derived, and their asymptotic distribution is approximated by a scaled chi-squared distribution. The power performance of the proposed test for detecting the interaction term or the squared term of continuous covariates is examined by simulation studies. A longitudinal dataset is utilized to illustrate the application of the proposed test.
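Leave-one-out cross-validation chooses the bandwidth that minimizes the error of predicting each point from a smooth fitted without it. The sketch below applies this to a Nadaraya-Watson smoother with a Gaussian kernel. In the test described above, the smoothed quantity would be residuals from the fitted logistic-normal model. Here `y` is a generic response, and the kernel choice is an assumption. Very small bandwidths can make the kernel weights underflow to zero, so candidate values should stay on the scale of the spacing of `x`.

```python
import math

def nw_loo_cv(x, y, h):
    """Leave-one-out mean squared error of a Gaussian-kernel smoother."""
    n = len(x)
    err = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave observation i out of its own prediction
            w = math.exp(-0.5 * ((x[i] - x[j]) / h) ** 2)
            num += w * y[j]
            den += w
        err += (y[i] - num / den) ** 2
    return err / n

def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest LOO-CV error."""
    return min(candidates, key=lambda h: nw_loo_cv(x, y, h))

xs = [i / 10 for i in range(30)]
ys = [math.sin(v) for v in xs]
print(select_bandwidth(xs, ys, [0.05, 0.1, 0.2, 0.5, 1.0]))
```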

For clustering mixed categorical and continuous data, Lawrence and Krzanowski (1996) proposed a finite mixture model in which component densities conform to the location model. In the graphical models literature the location model is known as the homogeneous Conditional Gaussian model. In this paper it is shown that their model is not identifiable without imposing additional restrictions. Specifically, for g groups and m locations, (g!)^(m−1) distinct sets of parameter values (not including permutations of the group mixing parameters) produce the same likelihood function. Excessive shrinkage of parameter estimates in a simulation experiment reported by Lawrence and Krzanowski (1996) is shown to be an artifact of the model's non-identifiability. Identifiable finite mixture models can be obtained by imposing restrictions on the conditional means of the continuous variables. These new identified models are assessed in simulation experiments. The conditional mean structure of the continuous variables in the restricted location mixture models is similar to that in the underlying variable mixture models proposed by Everitt (1988), but the restricted location mixture models are more computationally tractable.
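A worked instance of the count stated above shows how quickly non-identifiability grows, even for small models:

```latex
(g!)^{m-1}:\qquad g = 2,\ m = 3 \;\Rightarrow\; (2!)^{2} = 4; \qquad g = 3,\ m = 2 \;\Rightarrow\; (3!)^{1} = 6; \qquad g = 3,\ m = 4 \;\Rightarrow\; (3!)^{3} = 216
```

So with three groups and four locations, 216 distinct parameter sets (before permuting the mixing parameters) give the same likelihood.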

A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to the observed outcomes. The model accommodates multilevel data structures, general covariate effects, and distinct link functions and error distributions for each underlying variable. Within the proposed framework, novel models are developed for clustered multiple binary outcomes, unordered categorical outcomes, and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated with data from a reproductive toxicity study.

We propose a multiple imputation method for incomplete categorical data. The method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). Uncertainty about the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating only a small number of parameters, thanks to the dimensionality reduction property of MCA, which allows it to impute a wide range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared with the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as with the latest work on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time-consuming than the other multiple imputation methods on high-dimensional data sets.
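MCA operates on the complete disjunctive table (indicator matrix), which has one 0/1 column per category of each variable. The sketch below builds that table and fills missing entries with the observed category proportions of their variable. That kind of mean initialization is typically used before iterative MCA imputation, but this step is an assumption rather than a detail confirmed by the abstract. The MCA itself, the iterations and the bootstrap are not shown. Category labels must be mutually comparable so they can be sorted, and `None` marks a missing value.

```python
def disjunctive_table(rows):
    """Indicator matrix of categorical data; None marks a missing value.

    Missing entries are filled with the observed proportions of each
    category of that variable (an assumed initialization).
    """
    n_vars = len(rows[0])
    levels = [sorted({r[j] for r in rows if r[j] is not None}) for j in range(n_vars)]
    props = []
    for j in range(n_vars):
        observed = [r[j] for r in rows if r[j] is not None]
        props.append({lvl: observed.count(lvl) / len(observed) for lvl in levels[j]})
    header = [f"{j}:{lvl}" for j in range(n_vars) for lvl in levels[j]]
    table = []
    for r in rows:
        out = []
        for j in range(n_vars):
            for lvl in levels[j]:
                if r[j] is None:
                    out.append(props[j][lvl])
                else:
                    out.append(1.0 if r[j] == lvl else 0.0)
        table.append(out)
    return header, table

print(disjunctive_table([("a", "x"), ("b", "y"), (None, "x")]))
```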

In this paper, a simple resampling technique based on semiparametric smoothing is introduced. Although the method is very flexible and can in principle be applied to any sparse-data or ill-posed statistical problem, an efficient or even reasonable implementation requires special investigation. The paper addresses the problem of fitting the local dependence structure of finite-state random sequences. This problem arises, for example, in genetics, bioinformatics and computational linguistics, and usually leads to the analysis of sparse contingency tables of dependent categorical data, so the classical assumptions of the log-linear model, a standard technique for analysing contingency tables, do not hold. A framework convenient for implementing semiparametric smoothing and resampling is proposed. It is based on a special representation of the data under consideration and a generalized logit model. A computer experiment is carried out to gain better insight into the practical performance of the procedure.

Forecasting in economic data analysis is dominated by linear prediction methods, in which predicted values are calculated from a fitted linear regression model. For multiple predictor variables, multivariate nonparametric models have been proposed in the literature. However, empirical studies indicate that the prediction performance of multi-dimensional nonparametric models may be unsatisfactory. We propose a new semiparametric model average prediction (SMAP) approach to analyse panel data and investigate its prediction performance with numerical examples. Estimating each individual covariate effect requires only univariate smoothing and thus may be more stable than previous multivariate smoothing approaches. The estimation of the optimal weight parameters incorporates the longitudinal correlation, and the asymptotic properties of the estimators are studied in detail in this paper.

This paper investigates the effects of ordinal regressors in linear regression models and in limited dependent variable models. Each ordered categorical variable is interpreted as a rough measurement of an underlying continuous variable, as is often done in microeconometrics for the dependent variable. It is shown that using ordinal indicators leads to correct answers only in a few special cases; in most situations, the usual estimators are biased. To estimate the parameters of the model consistently, the indirect estimation procedure suggested by Gourieroux et al. (1993) is applied. To demonstrate this method, a simulation study is performed first, and then two real data sets are used. In the latter case, continuous regressors are transformed into categorical variables to study the behavior of the estimation procedure. The method is extended to the case of limited dependent variable models. In general, the indirect estimators lead to adequate results. Received: March 27, 2000; revised version: March 6, 2001

A discrimination procedure based on the location model is described and suggested for use in situations where the discriminating variables are a mixture of continuous and binary variables. Procedures previously employed in similar situations, such as Fisher's linear discriminant function and logistic regression, were compared with this method using the error rate (ER). Optimal ERs for these procedures are reported for real and simulated data with varying sample sizes and numbers of continuous and binary variables, and were used to assess the performance of the various procedures. The suggested procedure performed considerably better in the cases considered and never produced a poor result relative to the other procedures. Hence, the suggested procedure may be considered for such situations.
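A location-model classifier assigns an observation to the group that maximizes the prior times the probability of its location (the cell formed by the binary variables) times the Gaussian density of its continuous part, with a pooled within-cell variance. The sketch below uses plug-in maximum-likelihood-style estimates and assumes a single continuous variable. It also assumes that every location seen at prediction time was observed in training for the competing groups. Practical location-model procedures often smooth sparse or empty cells, and that is not shown. Group labels and data are hypothetical.

```python
import math
from collections import defaultdict

def fit_location_model(data):
    """data: list of (group, location, x) with a single continuous x."""
    n_g, n_gm, sum_gm = defaultdict(int), defaultdict(int), defaultdict(float)
    for g, m, x in data:
        n_g[g] += 1
        n_gm[(g, m)] += 1
        sum_gm[(g, m)] += x
    mu = {k: sum_gm[k] / n_gm[k] for k in n_gm}
    ss = sum((x - mu[(g, m)]) ** 2 for g, m, x in data)
    var = ss / (len(data) - len(n_gm))  # pooled within-cell variance
    prior = {g: n_g[g] / len(data) for g in n_g}
    cell_prob = {k: n_gm[k] / n_g[k[0]] for k in n_gm}
    return prior, cell_prob, mu, var

def classify(model, m, x):
    """Assign (m, x) to the group with the largest log posterior score."""
    prior, cell_prob, mu, var = model

    def score(g):
        if (g, m) not in cell_prob:
            return -math.inf  # location never seen in this group
        return (math.log(prior[g]) + math.log(cell_prob[(g, m)])
                - (x - mu[(g, m)]) ** 2 / (2.0 * var))

    return max(prior, key=score)

train = [("A", 0, 0.0), ("A", 0, 1.0), ("A", 1, 0.5), ("A", 1, -0.5),
         ("B", 0, 5.0), ("B", 0, 6.0), ("B", 1, 4.5), ("B", 1, 5.5)]
model = fit_location_model(train)
print(classify(model, 0, 0.2), classify(model, 1, 5.2))
```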

