Similar Documents
20 similar documents found (search time: 31 ms)
1.
In this work, we modify finite mixtures of factor analysers to provide a method for simultaneous clustering of subjects and multivariate discrete outcomes. The joint clustering is performed through a suitable reparameterization of the outcome (column)-specific parameters. We develop an expectation–maximization-type algorithm for maximum likelihood parameter estimation where the maximization step is divided into orthogonal sub-blocks that refer to row and column-specific parameters, respectively. Model performance is evaluated via a simulation study with varying sample size, number of outcomes and row/column-specific clustering (partitions). We compare the performance of our model with the performance of standard model-based biclustering approaches. The proposed method is also demonstrated on a benchmark data set where a multivariate binary response is considered.
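The E/M alternation underlying such mixture estimators can be illustrated on the simplest case, a two-component univariate Gaussian mixture. This is a generic EM sketch on synthetic data, not the authors' algorithm for mixtures of factor analysers, and all names are illustrative.

```python
import numpy as np

def em_gaussian_mixture(x, n_iter=200, seed=0):
    """EM for a two-component univariate Gaussian mixture (generic sketch)."""
    rng = np.random.default_rng(seed)
    # Initialise: equal weights, two random data points as means, pooled variance.
    w = np.array([0.5, 0.5])
    mu = rng.choice(x, size=2, replace=False)
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
w, mu, var = em_gaussian_mixture(x)
```

In the paper's setting the M-step additionally splits into orthogonal row- and column-parameter sub-blocks; the sketch keeps a single joint M-step.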

2.
We present a novel methodology for estimating the parameters of a finite mixture model (FMM) based on partially rank-ordered set (PROS) sampling and use it in a fishery application. A PROS sampling design first selects a simple random sample of fish and creates partially rank-ordered judgement subsets by dividing units into subsets of prespecified sizes. The final measurements are then obtained from these partially ordered judgement subsets. The traditional expectation–maximization algorithm is not directly applicable for these observations. We propose a suitable expectation–maximization algorithm to estimate the parameters of the FMMs based on PROS samples. We also study the problem of classification of the PROS sample into the components of the FMM. We show that the maximum likelihood estimators based on PROS samples perform substantially better than their simple random sample counterparts even with small samples. The results are used to classify a fish population using the length-frequency data.

3.
Misclassifications in binary responses have long been a common problem in medical and health surveys. One way to handle misclassifications in clustered or longitudinal data is to incorporate the misclassification model through the generalized estimating equation (GEE) approach. However, existing methods are developed under a non-survey setting and cannot be used directly for complex survey data. We propose a pseudo-GEE method for the analysis of binary survey responses with misclassifications. We focus on cluster sampling and develop analysis strategies for analyzing binary survey responses with different forms of additional information for the misclassification process. The proposed methodology has several attractive features, including simultaneous inferences for both the response model and the association parameters. Finite sample performance of the proposed estimators is evaluated through simulation studies and an application using a real dataset from the Canadian Longitudinal Study on Aging.

4.
Nonresponse is a very common phenomenon in survey sampling. Nonignorable nonresponse – that is, a response mechanism that depends on the values of the variable subject to nonresponse – is the most difficult type of nonresponse to handle. This article develops a robust estimation approach for estimating equations (EEs) when some responses are nonignorably missing, combining a model for the nonignorable missingness, the generalized method of moments (GMM) and imputation of the EEs via the observed data rather than imputed values for the missing observations. Based on a particular semiparametric logistic model for nonignorable missing responses, we propose modified EEs that compute the required conditional expectations under nonignorable missingness, and we apply the GMM to infer the parameters. The advantage of our method is that it replaces nonparametric kernel smoothing with a parametric sampling importance resampling (SIR) procedure, thereby avoiding the problems kernel smoothing encounters with high-dimensional covariates. Simulations show the proposed method to be more robust than some current approaches.

5.
Estimating equations which are not necessarily likelihood-based score equations are becoming increasingly popular for estimating regression model parameters. This paper is concerned with estimation based on general estimating equations when true covariate data are missing for all the study subjects, but surrogate or mismeasured covariates are available instead. The method is motivated by the covariate measurement error problem in marginal or partly conditional regression of longitudinal data. We propose to base estimation on the expectation of the complete data estimating equation conditioned on available data. The regression parameters and other nuisance parameters are estimated simultaneously by solving the resulting estimating equations. The expected estimating equation (EEE) estimator is equal to the maximum likelihood estimator if the complete data scores are likelihood scores and conditioning is with respect to all the available data. A pseudo-EEE estimator, which requires less computation, is also investigated. Asymptotic distribution theory is derived. Small sample simulations are conducted when the error process is an order 1 autoregressive model. Regression calibration is extended to this setting and compared with the EEE approach. We demonstrate the methods on data from a longitudinal study of the relationship between childhood growth and adult obesity.

6.
Clustered binary data are common in medical research and can be fitted by a logistic regression model with random effects, which belongs to the wider class of generalized linear mixed models. Likelihood-based estimation of the model parameters often involves an intractable integral, and several estimation methods have been developed to overcome this difficulty. The penalized quasi-likelihood (PQL) method is very popular and computationally efficient in most cases. The expectation–maximization (EM) algorithm yields maximum-likelihood estimates but requires computing a possibly intractable integral in the E-step; variants of the EM algorithm that approximate the E-step are therefore introduced. The Monte Carlo EM (MCEM) method approximates the expectation using Monte Carlo samples, while the modified EM (MEM) method approximates it using Laplace's method. All these methods involve several layers of approximation, so the corresponding parameter estimates contain errors (large or small) induced by the approximations. Understanding and quantifying these discrepancies theoretically is difficult because of the complexity of the approximations in each method, even when the focus is restricted to clustered binary data. As a competing alternative, we also consider a non-parametric maximum-likelihood (NPML) method. We review and compare the PQL, MCEM, MEM and NPML methods for clustered binary data via a simulation study, which will be useful for researchers when choosing an estimation method for their analysis.
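The intractable per-cluster integral, and the Monte Carlo approximation that MCEM relies on, can be sketched for a single cluster of a random-intercept logistic model. All numbers here are illustrative, and a Gauss–Hermite quadrature benchmark stands in for a full EM loop.

```python
import numpy as np

def cluster_marginal_lik_mc(y, beta0, sigma, n_mc=100000, seed=0):
    """Monte Carlo approximation of the per-cluster integral
    int prod_j p(y_j | b) * phi(b; 0, sigma^2) db
    for a random-intercept logistic model (the kind of integral the
    MCEM E-step approximates by sampling)."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, sigma, n_mc)           # draws from the random-effect prior
    eta = beta0 + b[:, None]                   # linear predictor for each draw
    p = 1.0 / (1.0 + np.exp(-eta))
    lik = np.prod(np.where(y == 1, p, 1 - p), axis=1)  # Bernoulli product per draw
    return lik.mean()

def cluster_marginal_lik_quad(y, beta0, sigma, n_nodes=80):
    """Gauss-Hermite (probabilists') quadrature benchmark for the same integral."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    eta = beta0 + sigma * nodes[:, None]
    p = 1.0 / (1.0 + np.exp(-eta))
    lik = np.prod(np.where(y == 1, p, 1 - p), axis=1)
    return (weights * lik).sum() / np.sqrt(2 * np.pi)

y = np.array([1, 0, 1, 1])                     # hypothetical cluster of 4 binary outcomes
mc = cluster_marginal_lik_mc(y, beta0=0.3, sigma=1.2)
quad = cluster_marginal_lik_quad(y, beta0=0.3, sigma=1.2)
```

The two approximations agree closely here; MCEM trades the deterministic quadrature error for Monte Carlo error that shrinks with the number of draws.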

7.
In this article, we consider a model allowing the analysis of multivariate data, which can contain data attributes of different types (e.g., continuous, discrete, binary). This model is a two-level hierarchical model which supports a wide range of correlation structures and can accommodate overdispersed data. Maximum likelihood estimation of the model parameters is achieved with an automated Monte Carlo expectation maximization algorithm. Our method is tested in a simulation study in the bivariate case and applied to a data set dealing with beehive activity.

8.
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Contrary to the others, the proposed method can be easily used on data sets where the number of individuals is less than the number of variables and when the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the widths of the confidence intervals built for the quantities of interest are often smaller whilst ensuring a valid coverage.  
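The deterministic core of PCA-based imputation, iterative low-rank reconstruction of the missing cells, can be sketched as follows on synthetic data. The abstract's method additionally draws the PCA parameters from a posterior to produce multiple imputations, which is not shown here.

```python
import numpy as np

def iterative_pca_impute(X, rank=1, n_iter=200):
    """Single imputation by iterative rank-`rank` PCA reconstruction:
    fill missing cells, fit a low-rank SVD, overwrite the missing cells
    with the reconstruction, and repeat."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])  # initialise with column means
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        X_hat = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[miss] = X_hat[miss]                         # update only the missing cells
    return X

# Synthetic rank-1 data with 20% of cells missing completely at random.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
X_true = z @ np.array([[1.0, 0.8, -0.5]]) + 0.05 * rng.normal(size=(100, 3))
X_obs = X_true.copy()
X_obs[rng.random(X_obs.shape) < 0.2] = np.nan
X_imp = iterative_pca_impute(X_obs, rank=1)
```

Because the correlation structure is exploited through the low-rank fit, the method remains usable when variables outnumber individuals, as the abstract notes.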

9.
The weighted least squares (WLS) estimator is often employed in linear regression using complex survey data to deal with the bias in ordinary least squares (OLS) arising from informative sampling. In this paper a 'quasi-Aitken WLS' (QWLS) estimator is proposed. QWLS modifies WLS in the same way that Cragg's quasi-Aitken estimator modifies OLS. It weights by the usual inverse sample inclusion probability weights multiplied by a parameterized function of covariates, where the parameters are chosen to minimize a variance criterion. The resulting estimator is consistent for the superpopulation regression coefficient under fairly mild conditions and has a smaller asymptotic variance than WLS.
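The baseline design-weighted WLS estimator that QWLS modifies can be sketched on a toy informative-sampling design (all numbers illustrative). The QWLS step of multiplying the weights by a variance-minimising parameterized function of covariates is not shown.

```python
import numpy as np

def survey_wls(X, y, pi):
    """Design-weighted least squares: solve (X'WX) beta = X'Wy with
    W = diag(1/pi_i), the inverse sample inclusion probabilities."""
    w = 1.0 / pi
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Toy informative Poisson-sampling design: the inclusion probability depends
# on y, so unweighted OLS is biased for the superpopulation coefficients (2, 3).
rng = np.random.default_rng(0)
N = 100000
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)
pi = 0.2 + 0.6 / (1.0 + np.exp(-(y - 2.0)))   # larger y -> more likely sampled
sampled = rng.random(N) < pi
X = np.column_stack([np.ones(sampled.sum()), x[sampled]])
beta_wls = survey_wls(X, y[sampled], pi[sampled])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y[sampled])
```

Weighting by the inverse inclusion probabilities restores consistency for the superpopulation coefficients; QWLS then reshapes these weights to shrink the asymptotic variance.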

10.
This article considers the case where two surveys collect data on a common variable, with one survey being much smaller than the other. The smaller survey collects data on an additional variable of interest, related to the common variable collected in the two surveys, and out-of-scope with respect to the larger survey. Estimation of the two related variables is of interest at domains defined at a granular level. We propose a multilevel model for integrating data from the two surveys, by reconciling survey estimates available for the common variable, accounting for the relationship between the two variables, and expanding estimation for the other variable, for all the domains of interest. The model is specified as a hierarchical Bayes model for domain-level survey data, and posterior distributions are constructed for the two variables of interest. A synthetic estimation approach is considered as an alternative to the hierarchical modelling approach. The methodology is applied to wage and benefits estimation using data from the National Compensation Survey and the Occupational Employment Statistics Survey, available from the Bureau of Labor Statistics, Department of Labor, United States.

11.
This article considers the statistical analysis of a dependent competing risks model with incomplete data under Type-I progressive hybrid censoring, using a Marshall–Olkin bivariate Weibull distribution. Based on the expectation–maximization algorithm, maximum likelihood estimators for the unknown parameters are obtained, and the missing information principle is used to obtain the observed information matrix. As the maximum likelihood approach may fail when the available information is insufficient, a Bayesian approach incorporating auxiliary variables is developed for estimating the parameters of the model, and a Monte Carlo method is employed to construct the highest posterior density credible intervals. The proposed method is illustrated through a numerical example under different progressive censoring schemes and masking probabilities. Finally, a real data set is analyzed for illustrative purposes.

12.
The method of target estimation developed by Cabrera and Fernholz [(1999). Target estimation for bias and mean square error reduction. The Annals of Statistics, 27(3), 1080–1104.] to reduce bias and variance is applied to logistic regression models of several parameters. The expectation functions of the maximum likelihood estimators for the coefficients in the logistic regression models of one and two parameters are analyzed and simulations are given to show a reduction in both bias and variability after targeting the maximum likelihood estimators. In addition to bias and variance reduction, it is found that targeting can also correct the skewness of the original statistic. An example based on real data is given to show the advantage of using target estimators for obtaining better confidence intervals of the corresponding parameters. The notion of the target median is also presented with some applications to the logistic models.
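The targeting idea, inverting the expectation function g(theta) = E_theta[theta_hat] at the observed estimate, can be sketched on the textbook example of the normal-variance MLE, whose expectation function is known in closed form. This is a generic illustration, not the paper's logistic-regression application, where g must be approximated by simulation.

```python
import numpy as np

def target_estimate(theta_hat, expectation_fn, lo, hi, tol=1e-6):
    """Target estimation: solve g(theta) = theta_hat for theta, where
    g(theta) = E_theta[theta_hat] is the (monotone increasing) expectation
    function of the original estimator. Bisection sketch of the
    Cabrera-Fernholz targeting step."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expectation_fn(mid) < theta_hat:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy case: the MLE of a normal variance, s2 = (1/n) sum (x - xbar)^2, has
# expectation g(sigma2) = (n-1)/n * sigma2, so targeting should undo the
# multiplicative bias exactly.
n = 10
g = lambda sigma2: (n - 1) / n * sigma2
s2_obs = 4.5                                   # hypothetical observed MLE
sigma2_target = target_estimate(s2_obs, g, lo=0.0, hi=100.0)
```

Here targeting recovers the usual unbiased correction n/(n-1) * s2; in the logistic models of the abstract, the same inversion reduces bias, variance and skewness simultaneously.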

13.
In this article, we consider the estimation of regression parameters in the linear model in the presence of interval-censored data. When the response variable is interval-censored, traditional methods cannot be used to estimate the parameters directly. We therefore carry out an unbiased transformation, constructing a new random variable that has the same expectation as the function of the response variable. Regression analysis of the constructed statistic then yields the estimator by the least-squares method.

14.
Doubly censored failure time data occur in many areas including demographical studies, epidemiology studies, medical studies and tumorigenicity experiments, and correspondingly some inference procedures have been developed in the literature (Biometrika, 91, 2004, 277; Comput. Statist. Data Anal., 57, 2013, 41; J. Comput. Graph. Statist., 13, 2004, 123). In this paper, we discuss regression analysis of such data under a class of flexible semiparametric transformation models, which includes some commonly used models for doubly censored data as special cases. For inference, the non-parametric maximum likelihood estimation will be developed and in particular, we will present a novel expectation–maximization algorithm with the use of subject-specific independent Poisson variables. In addition, the asymptotic properties of the proposed estimators are established and an extensive simulation study suggests that the proposed methodology works well for practical situations. The method is applied to an AIDS study.

15.
We propose a robust estimation procedure for the analysis of longitudinal data including a hidden process to account for unobserved heterogeneity between subjects in a dynamic fashion. We show how to perform estimation by an expectation–maximization-type algorithm in the hidden Markov regression literature. We show that the proposed robust approaches work comparably to the maximum-likelihood estimator when there are no outliers and the error is normal and outperform it when there are outliers or the error is heavy tailed. A real data application is used to illustrate our proposal. We also provide details on a simple criterion to choose the number of hidden states.

16.
This article introduces a new method, named the two-sided M-Bayesian credible limits method, to estimate reliability parameters. It is especially suitable for situations of high reliability or zero-failure data. The definition, properties, and related formulas of the two-sided M-Bayesian credible limit are proposed. A real data set is also discussed. By means of an example we can see that the two-sided M-Bayesian credible limits method is efficient and easy to perform.

17.
Longitudinal or clustered response data arise in many applications such as biostatistics, epidemiology and environmental studies. The repeated responses cannot in general be assumed to be independent. One method of analysing such data is by using the generalized estimating equations (GEE) approach. The current GEE method for estimating regression effects in longitudinal data focuses on the modelling of the working correlation matrix assuming a known variance function. However, correct choice of the correlation structure may not necessarily improve estimation efficiency for the regression parameters if the variance function is misspecified [Wang YG, Lin X. Effects of variance-function misspecification in analysis of longitudinal data. Biometrics. 2005;61:413–421]. In this connection two problems arise: finding a correct variance function and estimating the parameters of the chosen variance function. In this paper, we study the problem of estimating the parameters of the variance function assuming that the form of the variance function is known and then the effect of a misspecified variance function on the estimates of the regression parameters. We propose a GEE approach to estimate the parameters of the variance function. This estimation approach borrows the idea of Davidian and Carroll [Variance function estimation. J Amer Statist Assoc. 1987;82:1079–1091] by solving a nonlinear regression problem where residuals are regarded as the responses and the variance function is regarded as the regression function. A limited simulation study shows that the proposed method performs at least as well as the modified pseudo-likelihood approach developed by Wang and Zhao [A modified pseudolikelihood approach for analysis of longitudinal data. Biometrics. 2007;63:681–689]. Both these methods perform better than the GEE approach.
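The Davidian–Carroll idea borrowed above, regressing squared residuals on the variance function, can be sketched for a power variance function Var(y) = phi * mu^theta on synthetic data. A grid search stands in for the nonlinear solver, and this is a generic sketch rather than the paper's GEE estimator.

```python
import numpy as np

def estimate_variance_power(mu, resid_sq, grid=np.linspace(0.0, 3.0, 301)):
    """Estimate theta in Var(y) = phi * mu**theta by least squares of the
    squared residuals on the variance function, profiling out phi at each
    candidate theta."""
    best = (np.inf, None, None)
    for theta in grid:
        v = mu ** theta
        phi = (resid_sq * v).sum() / (v ** 2).sum()   # LS solution for the scale phi
        sse = ((resid_sq - phi * v) ** 2).sum()
        if sse < best[0]:
            best = (sse, theta, phi)
    return best[1], best[2]

# Synthetic data with known means and true variance function phi=2, theta=1.5.
rng = np.random.default_rng(0)
mu = rng.uniform(1.0, 10.0, 20000)
y = mu + np.sqrt(2.0 * mu ** 1.5) * rng.normal(size=20000)
theta_hat, phi_hat = estimate_variance_power(mu, (y - mu) ** 2)
```

In the GEE setting the residuals come from a fitted mean model rather than the true means, and theta feeds back into the working covariance; the regression-on-residuals step is the common core.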

18.
In this paper, the estimation of parameters for a three-parameter Weibull distribution based on progressively Type-II right censored samples is studied. Different estimation procedures for complete samples are generalized to the case with progressively censored data. These methods include the maximum likelihood estimators (MLEs), corrected MLEs, weighted MLEs, maximum product spacing estimators and least squares estimators. We also propose the use of a censored estimation method with one-step bias-correction to obtain reliable initial estimates for iterative procedures. These methods are compared via a Monte Carlo simulation study in terms of their biases, root mean squared errors and their rates of obtaining reliable estimates. Recommendations are made from the simulation results and a numerical example is presented to illustrate all of the methods of inference developed here.

19.
This article investigates the presence of habit formation in household consumption, using data from the Panel Study of Income Dynamics. We develop an econometric model of internal habit formation of the multiplicative specification. The restrictions of the model allow for classical measurement errors in consumption without parametric assumptions on the distribution of measurement errors. We estimate the parameters by nonlinear generalized method of moments and find that habit formation is an important determinant of household food-consumption patterns. Using the parameter estimates, we develop bounds for the expectation of the implied heterogeneous intertemporal elasticity of substitution and relative risk aversion that account for measurement errors, and compute confidence intervals for these bounds. Supplementary materials for this article are available online.

20.
The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation–maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with a Metropolis-within-Gibbs scheme is used to effectively draw the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.
