Seemingly unrelated regressions (SUR) models appear frequently in econometrics and in the analyses of repeated measures designs and longitudinal data. It is known that iterative algorithms are generally required to obtain the MLEs of the regression parameters. Under a minimal set of lattice conditional independence (LCI) restrictions imposed on the covariance structure, however, closed-form MLEs can be obtained by standard linear regression techniques (Andersson and Perlman, 1993, 1994, 1998). In this paper, simulation is used to study the efficiency of these LCI model-based estimators. We also propose two possible improvements of the usual two-stage estimators for the regression parameters.

In the analysis of non-monotone missing data patterns in multinomial distributions for contingency tables, it is known that explicit MLEs of the unknown parameters cannot be obtained. Iterative procedures such as the EM algorithm are therefore required to obtain the MLEs. These iterative procedures, however, can present several difficulties. Andersson and Perlman [Ann. Statist. 21 (1993) 1318–1358] introduced lattice conditional independence (LCI) models for multivariate normal distributions, which can be applied to the analysis of non-monotone missing observations in continuous data (Andersson and Perlman, Statist. Probab. Lett. 12 (1991) 465–486). In this paper, we show that LCI models may also be applied to the analysis of categorical data with non-monotone missing data patterns. Under a parsimonious set of LCI assumptions naturally determined by the observed data pattern, the likelihood function for the observed data can be factored as in the monotone case and explicit MLEs can be obtained for the unknown parameters. Furthermore, the LCI assumptions can be tested by explicit likelihood ratio tests.

It is often the case that high-dimensional data consist of only a few informative components. Standard statistical modeling and estimation in such a situation is prone to inaccuracies due to overfitting unless regularization is applied. In the context of classification, we propose a class of regularization methods through shrinkage estimators. The shrinkage is based on variable selection coupled with conditional maximum likelihood. Using Stein's unbiased estimator of the risk, we derive an estimator for the optimal shrinkage method within a certain class. We compare the optimal shrinkage methods in a classification context with the optimal shrinkage method when estimating a mean vector under a squared loss. The latter problem has been studied extensively, but it seems that the results of those studies are not completely relevant for classification. We demonstrate and examine our method on simulated data and compare it to the feature annealed independence rule and Fisher's rule.

Maximum likelihood (ML) estimation with spatial econometric models is a long-standing problem that finds application in several areas of economic importance. The problem is particularly challenging in the presence of missing data, since there is an implied dependence between all units, irrespective of whether they are observed or not. Out of the several approaches adopted for ML estimation in this context, that of LeSage and Pace [Models for spatially dependent missing data. J Real Estate Financ Econ. 2004;29(2):233–254] stands out as one of the most commonly used with spatial econometric models due to its ability to scale with the number of units. Here, we review their algorithm, and consider several similar alternatives that are also suitable for large datasets. We compare the methods through an extensive empirical study and conclude that, while the approximate approaches are suitable for large sampling ratios, for small sampling ratios the only reliable algorithms are those that yield exact ML or restricted ML estimates.

We consider the piecewise proportional hazards (PWPH) model with interval-censored (IC) relapse times under the distribution-free set-up. The partial likelihood approach is not applicable for IC data, and the generalized likelihood approach has not been studied in the literature. It turns out that under the PWPH model with IC data, the semi-parametric MLE (SMLE) of the covariate effect under the standard generalized likelihood may not be unique and may not be consistent. In fact, the parameter under the PWPH model with IC data is not identifiable unless the identifiability assumption is imposed. We propose a modification to the likelihood function so that its SMLE is unique. Under the identifiability assumption, our simulation study suggests that the SMLE is consistent. We apply the method to our cancer relapse time data and conclude that bone marrow micrometastasis is not a significant prognostic factor.

We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

We propose a method for estimating parameters in generalized linear models when the outcome variable is missing for some subjects and the missing data mechanism is non-ignorable. We assume throughout that the covariates are fully observed. One possible method for estimating the parameters is maximum likelihood with a non-ignorable missing data model. However, caution must be used when fitting non-ignorable missing data models because certain parameters may be inestimable for some models. Instead of fitting a non-ignorable model, we propose the use of auxiliary information in a likelihood approach to reduce the bias, without having to specify a non-ignorable model. The method is applied to a mental health study.

An assumed hypothetical consensus category corresponding to a case being classified provides a basis for assessing the reliability of judges. Equivalent judges are characterised by the joint probability distribution of the judge assignment and the consensus category. Estimates of the conditional probabilities of judge assignment given consensus category and of consensus category given judge assignments are indices of reliability. All parameters can be estimated if the data include classifications of a number of cases by 3 or more judges. Restrictive assumptions are imposed to obtain models for data from classifications by two judges. Maximum likelihood estimation is discussed and illustrated by example for the case of 3 or more judges.

In this article we provide a rigorous treatment of one of the central statistical issues of credit risk management. Given K-1 rating categories, the rating of a corporate bond over a certain horizon may either stay the same or change to one of the remaining K-2 categories; in addition, it is usually the case that the rating of some bonds is withdrawn during the time interval considered in the analysis. When estimating transition probabilities, we thus have to consider a K-th category, called withdrawal, which contains (partially) missing data. We show how maximum likelihood estimation can be performed in this setup; whereas in discrete time our solution gives rigorous support to a solution often used in applications, in continuous time the maximum likelihood estimator of the transition matrix computed by means of the EM algorithm represents a significant improvement over existing methods.
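
The abstract above does not spell out which discrete-time solution is "often used in applications". A minimal sketch, assuming it is the common practice of dropping withdrawn ratings and renormalising each row of transition counts, might look as follows. The function name `transition_matrix`, the label `"WR"` for withdrawals, and the toy data are all hypothetical.

```python
from collections import Counter


def transition_matrix(transitions, states, withdrawn="WR"):
    """Row-normalised transition frequencies, ignoring withdrawn ratings.

    transitions: iterable of (start_rating, end_rating) pairs.
    states: the ratings that form the rows and columns of the matrix.
    withdrawn: hypothetical label marking a withdrawn end-of-period rating.
    """
    # Count only transitions whose end rating was actually observed.
    counts = Counter((a, b) for a, b in transitions if b != withdrawn)
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        # A row with no observed transitions is left as all zeros.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in states
        }
    return matrix


# Toy data: two ratings, one withdrawal from each starting rating.
observed = [
    ("A", "A"), ("A", "A"), ("A", "B"), ("A", "WR"),
    ("B", "B"), ("B", "A"), ("B", "WR"), ("B", "B"),
]
P = transition_matrix(observed, ["A", "B"])
```

Each row of `P` sums to one over the non-withdrawn categories. Dropping withdrawals in this way treats them as uninformative about the unobserved rating, which is an assumption of this sketch rather than a claim taken from the abstract.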

A full likelihood method is proposed to analyse continuous longitudinal data with non-ignorable (informative) missing values and non-monotone patterns. The problem arose in a breast cancer clinical trial where repeated assessments of quality of life were collected: patients rated their coping ability during and after treatment. We allow the missingness probabilities to depend on unobserved responses, and we use a multivariate normal model for the outcomes. A first-order Markov dependence structure for the responses is a natural choice and facilitates the construction of the likelihood; estimates are obtained via the Nelder–Mead simplex algorithm. Computations are difficult and become intractable with more than three or four assessments. Applying the method to the quality-of-life data results in easily interpretable estimates, confirms the suspicion that the data are non-ignorably missing and highlights the likely bias of standard methods. Although treatment comparisons are not affected here, the methods are useful for obtaining unbiased means and estimating trends over time.

Wong et al. [(2018), ‘Piece-wise Proportional Hazards Models with Interval-censored Data’, Journal of Statistical Computation and Simulation, 88, 140–155] studied the piecewise proportional hazards (PWPH) model with interval-censored (IC) data under the distribution-free set-up. It is well known that the partial likelihood approach is not applicable for IC data, and Wong et al. (2018) showed that the standard generalised likelihood approach does not work either. They proposed the maximum modified generalised likelihood estimator (MMGLE), and their simulation results suggest that the MMGLE is consistent. We establish the consistency and asymptotic normality of the MMGLE.

For the nonparametric estimation of multivariate finite mixture models with the conditional independence assumption, we propose a new formulation of the objective function in terms of penalised smoothed Kullback–Leibler distance. The nonlinearly smoothed majorisation-minimisation (NSMM) algorithm is derived from this perspective. An elegant representation of the NSMM algorithm is obtained using a novel projection-multiplication operator, a more precise monotonicity property of the algorithm is discovered, and the existence of a solution to the main optimisation problem is proved for the first time.


In this paper, we investigate the consistency of the Expectation Maximization (EM) algorithm-based information criteria for model selection with missing data. The criteria correspond to a penalization of the conditional expectation of the complete data log-likelihood given the observed data and with respect to the missing data conditional density. We present asymptotic properties related to maximum likelihood estimation in the presence of incomplete data and we provide sufficient conditions for the consistency of model selection by minimizing the information criteria. Their finite sample performance is illustrated through simulation and real data studies.
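
The verbal description above can be written out as a formula. The following is only a sketch of the generic form such a criterion might take. The notation is assumed and does not come from the abstract: $y_{\mathrm{obs}}$ and $y_{\mathrm{mis}}$ for the observed and missing data, $\hat\theta$ for the ML estimate, and $c_n$ for a penalty term.

```latex
\[
Q(\theta \mid \theta')
  = \mathbb{E}\!\left[\, \log f(y_{\mathrm{obs}}, y_{\mathrm{mis}}; \theta)
      \;\middle|\; y_{\mathrm{obs}};\, \theta' \right],
\qquad
\mathrm{IC}_{Q} = -2\, Q(\hat\theta \mid \hat\theta) + c_n(\hat\theta).
\]
```

Under this reading, an AIC-type penalty would be $c_n = 2d$ and a BIC-type penalty $c_n = d \log n$, with $d$ the number of parameters and $n$ the sample size. These particular choices are illustrative assumptions.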

In this paper, a generalized partially linear model (GPLM) with missing covariates is studied, and a Monte Carlo EM (MCEM) algorithm with a penalized-spline (P-spline) technique is developed to estimate the regression coefficients and the nonparametric function. Because classical model selection procedures such as Akaike's information criterion (AIC) become invalid for models with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two missingness mechanisms: missing at random (MAR) and missing not at random (MNAR). The most attractive feature of our method is its generality: it can be extended to various situations with missing observations based on the EM algorithm, and when no data are missing the new criteria reduce to the classical AIC. Therefore, we can compare not only models with missing observations under MAR/MNAR settings, but also missing-data models with complete-data models. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data as well as after deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. Populations compared were: (1) multivariate normal with covariance matrices Σ1=Σ2, (2) multivariate normal with Σ1≠Σ2 and (3) multivariate non-normal with Σ1=Σ2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly but all were better than non-replacement.
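
Of the three replacement methods listed above, replacement by variable means is the simplest to illustrate. A minimal sketch, assuming missing entries are encoded as `None` and that every column has at least one observed value (the function name and the toy data are hypothetical):

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    # Build a completed copy; the input rows are left unchanged.
    return [
        [means[j] if value is None else value for j, value in enumerate(r)]
        for r in rows
    ]


data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
completed = impute_column_means(data)
```

Here the missing first-column entry is replaced by 2.0 and the missing second-column entry by 3.0. Mean replacement keeps the column means unchanged but shrinks the sample variances, so its effect on a discriminant analysis can differ from that of the EM-based replacement.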

In longitudinal studies, the mixture generalized estimating equation (mix-GEE) was proposed to improve the efficiency of the fixed-effects estimator when the working correlation structure is misspecified. When the subject-specific effect is of interest, mixed-effects models are widely used to analyze longitudinal data. However, most existing approaches assume a normal distribution for the random effects, and this could affect the efficiency of the fixed-effects estimator. In this article, a conditional mixture generalized estimating equation (cmix-GEE) approach that combines the advantages of mix-GEE and the conditional quadratic inference function (CQIF) method is developed. The advantage of our new approach is that it does not require the normality assumption for the random effects and can accommodate the serial correlation between observations within the same cluster. A feature of our proposed approach is that the estimators of the regression parameters are more efficient than those of CQIF even if the working correlation structure is not correctly specified. In addition, according to the estimates of some mixture proportions, the true working correlation matrix can be identified. We establish the asymptotic results for the fixed-effects parameter estimators. Simulation studies were conducted to evaluate our proposed method.

Nonlinear mixed-effect models are often used in the analysis of longitudinal data. However, it sometimes happens that missing values for some of the model covariates are not purely random. Motivated by an application to HIV viral dynamics, where this situation occurs, the author considers likelihood inference for this type of problem. His approach involves a Monte Carlo EM algorithm, along with a Gibbs sampler and rejection/importance sampling methods. A concrete application is provided.

The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed to analyze incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally-distributed repeated measures data using linear mixed models, while Park and Davis (1992) derived the test for incomplete repeated categorical data in the framework of Grizzle, Starmer, and Koch (1969). The proposed procedure can be applied easily to any other multivariate general linear model that allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).

In this article, the profile maximum likelihood estimate (PMLE) is proposed for nonlinear mixed models (NLMMs) with longitudinal data, where the variance components are estimated by the expectation-maximization (EM) algorithm. Strong consistency and asymptotic normality of the estimators are derived. A simulation study is conducted in which the performance of the PMLE is compared with that of the Fisher scoring estimate (FSE) from the literature. Moreover, a real data set is analyzed to investigate the empirical performance of the procedure.

In this article, by using the constant and random selection matrices, several properties of the maximum likelihood (ML) estimates and the ML estimator of a normal distribution with missing data are derived. The constant selection matrix allows us to obtain an explicit form of the ML estimates and the exact relationship between the EM algorithm and the score function. The random selection matrix allows us to clarify how the missing-data mechanism works in the proof of the consistency of the ML estimator, to derive the asymptotic properties of the sequence by the EM algorithm, and to derive the information matrix.
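
To make the setting concrete, here is a minimal sketch of the EM algorithm for a bivariate normal sample in which the second coordinate is sometimes missing. It does not use the selection-matrix formulation of the abstract above. It assumes the first coordinate is always observed, the data are missing at random, and missing values are encoded as `None`. The function name, starting values, and toy data are hypothetical.

```python
def em_bivariate_normal(data, n_iter=500):
    """EM for a bivariate normal where x2 may be None (x1 always observed).

    Returns (mu1, mu2, s11, s12, s22), the estimated means and covariances.
    """
    n = len(data)
    mu1, mu2, s11, s12, s22 = 0.0, 0.0, 1.0, 0.0, 1.0  # arbitrary start
    for _ in range(n_iter):
        # E-step: expected sufficient statistics under current parameters.
        beta = s12 / s11                   # slope of x2 regressed on x1
        resid_var = s22 - s12 * s12 / s11  # Var(x2 | x1)
        sum1 = sum2 = sum11 = sum12 = sum22 = 0.0
        for x1, x2 in data:
            if x2 is None:
                e2 = mu2 + beta * (x1 - mu1)  # E[x2 | x1]
                e22 = e2 * e2 + resid_var     # E[x2^2 | x1]
            else:
                e2, e22 = x2, x2 * x2
            sum1 += x1
            sum2 += e2
            sum11 += x1 * x1
            sum12 += x1 * e2
            sum22 += e22
        # M-step: complete-data ML formulas applied to the expected statistics.
        mu1, mu2 = sum1 / n, sum2 / n
        s11 = sum11 / n - mu1 * mu1
        s12 = sum12 / n - mu1 * mu2
        s22 = sum22 / n - mu2 * mu2
    return mu1, mu2, s11, s12, s22


sample = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, None), (6.0, None)]
mu1, mu2, s11, s12, s22 = em_bivariate_normal(sample)
```

For this monotone pattern the ML estimates are also available in closed form by regressing the second coordinate on the first over the complete cases, which gives a slope of 1.94 and a second mean of 6.94 for the toy data. The EM iterates approach those values, which gives a small-scale check of the link between EM and the explicit ML estimates.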

