Similar Literature
20 similar documents found.
1.
Summary.  Factor analysis is a powerful tool for identifying the common characteristics of a set of variables measured on a continuous scale. In the context of factor analysis for non-continuous data, most applications are restricted to item response data. We extend the factor model to accommodate ranked data. The Monte Carlo expectation–maximization algorithm is used for parameter estimation, with the E-step implemented via the Gibbs sampler. Analyses based on both complete and incomplete ranked data (e.g. ranking the top q out of k items) are considered, and estimation of the factor scores is also discussed. The proposed method is applied to a set of incomplete ranked data obtained from a survey carried out in Guangzhou, a major city in mainland China, to investigate the factors affecting people's attitudes towards choosing jobs.
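As a minimal sketch of the Gibbs step inside such a Monte Carlo E-step, the snippet below draws latent continuous utilities consistent with one observed ranking, assuming a simple per-item normal model with known means and scales (all names and parameter values are illustrative, not the authors' code):

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_latent_utilities(ranking, mu, sigma, n_sweeps=50, rng=None):
    """Gibbs-sample latent utilities z consistent with an observed ranking.

    ranking[0] is the index of the top item, ranking[1] the runner-up, etc.
    Each z[j] ~ N(mu[j], sigma[j]^2), truncated so the ordering is preserved;
    for an incomplete "top-q of k" ranking, only the constraints among the
    ranked items would apply.
    """
    rng = np.random.default_rng(rng)
    ranking = np.asarray(ranking)
    k = len(ranking)
    z = np.empty(k)
    z[ranking] = np.arange(k, 0, -1, dtype=float)   # any feasible start
    for _ in range(n_sweeps):
        for pos, j in enumerate(ranking):
            hi = z[ranking[pos - 1]] if pos > 0 else np.inf
            lo = z[ranking[pos + 1]] if pos < k - 1 else -np.inf
            a, b = (lo - mu[j]) / sigma[j], (hi - mu[j]) / sigma[j]
            z[j] = truncnorm.rvs(a, b, loc=mu[j], scale=sigma[j],
                                 random_state=rng)
    return z

# Item 2 ranked first, then item 0, then item 1.
print(gibbs_latent_utilities([2, 0, 1], mu=np.zeros(3), sigma=np.ones(3), rng=0))
```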

2.
When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation-model misspecification on the quality of parameter estimates by employing multiple imputation under the assumption of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM with estimates from MI assuming a linear mixed model (MI/LMM) and from maximum likelihood estimation applied to the same incomplete data sets. With 10% missing data (MD), the techniques performed similarly for fixed-effects estimates, but variance components were biased under MI/NM. The effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance-component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates, but their performance deteriorated as the rate of MD increased.
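The gist of the misspecification result can be reproduced in a toy simulation: imputing the dependent variable from a flat normal regression that ignores clustering dilutes the estimated cluster variance, while likelihood-based analysis of the observed cases does not. A hedged sketch (single imputation for brevity; the study itself uses proper multiple imputation and different designs), assuming statsmodels is available:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
# Two-level data: 50 clusters of 20, cluster variance 0.3, residual 0.7.
n_clus, n_per = 50, 20
clus = np.repeat(np.arange(n_clus), n_per)
x = rng.normal(size=n_clus * n_per)
u = rng.normal(scale=np.sqrt(0.3), size=n_clus)[clus]
y = 1.0 + 0.5 * x + u + rng.normal(scale=np.sqrt(0.7), size=len(x))
df = pd.DataFrame({"y": y, "x": x, "clus": clus})

# Roughly 30% missing at random on y, driven by the observed covariate x.
miss = rng.random(len(df)) < 1.0 / (1.0 + np.exp(-(x - 1.0)))
df.loc[miss, "y"] = np.nan

# (a) Impute from a flat normal regression that ignores clustering.
ols = smf.ols("y ~ x", data=df.dropna()).fit()
df_flat = df.copy()
df_flat.loc[miss, "y"] = (ols.predict(df_flat)[miss]
                          + rng.normal(scale=np.sqrt(ols.scale),
                                       size=miss.sum()))

# (b) Direct maximum likelihood on the observed cases.
for name, d in [("flat-normal imputation", df_flat),
                ("ML on observed cases  ", df.dropna())]:
    fit = smf.mixedlm("y ~ x", d, groups=d["clus"]).fit()
    print(name, "cluster variance:", round(float(fit.cov_re.iloc[0, 0]), 3))
```

The imputed values carry no cluster effect, so the first fit understates the cluster variance; the mixed-model analysis of the observed cases does not.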

3.
In this paper, we study the performance of a soccer player by analysing an incomplete data set. To this end, we fit the bivariate Rayleigh distribution to the soccer data set by the method of maximum likelihood, accounting for the missing data and right-censoring problems that typically arise in such studies. Our aim is to draw inferences about the player's performance in terms of stress and strength components: the first goal scored by the player of interest in a match is treated as the stress component, and the second goal of the match as the strength component. We propose methods to overcome the incomplete-data problem and use them to draw inferences about the player's performance.
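A sketch of the stress–strength idea under a simplifying assumption of independent Rayleigh margins (the paper fits a genuinely bivariate Rayleigh model; the closed-form censored MLE and the reliability formula below hold for the independent case only, and all numbers are illustrative):

```python
import numpy as np

def rayleigh_scale_mle(t, censored):
    """MLE of the Rayleigh scale with right censoring.  The log-likelihood
    sum_unc[log(t/s^2) - t^2/(2 s^2)] + sum_cens[-t^2/(2 s^2)] gives the
    closed form sigma_hat^2 = sum(t^2) / (2 * number uncensored)."""
    t, censored = np.asarray(t, float), np.asarray(censored, bool)
    return np.sqrt((t ** 2).sum() / (2.0 * (~censored).sum()))

rng = np.random.default_rng(7)
# X = stress (first goal time), Y = strength (second goal time).
x = rng.rayleigh(2.0, 200)
y = rng.rayleigh(3.0, 200)
cens = x > 5.0                      # right censoring, e.g. goal not observed
x = np.minimum(x, 5.0)
s_x = rayleigh_scale_mle(x, cens)
s_y = rayleigh_scale_mle(y, np.zeros_like(y, dtype=bool))
# Stress-strength reliability for independent Rayleigh margins:
print("P(X < Y) =", s_y ** 2 / (s_x ** 2 + s_y ** 2))
```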

4.
In many clinical studies, subjects are at risk of experiencing more than one type of potentially recurrent event. In some situations, however, the occurrence of an event is observed but its specific type is not determined. We consider the analysis of this type of incomplete data when the objectives are to summarize features of conditional intensity functions and associated treatment effects, and to study the association between different types of event. We describe a likelihood approach based on joint models for the multi-type recurrent events, with parameter estimation carried out by a Monte Carlo EM algorithm. Simulation studies show that the proposed method gives unbiased estimators of the regression coefficients and variance–covariance parameters, and that the coverage probabilities of confidence intervals for the regression coefficients are close to the nominal level. When the distribution of the frailty variable is misspecified, the method still provides regression-coefficient estimators with good properties. The proposed method is applied to a motivating data set from an asthma study in which exacerbations were to be sub-typed, by cellular analysis of sputum samples, as eosinophilic or non-eosinophilic.
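The E-step logic for a masked event type can be shown in miniature: under competing intensities, a masked event belongs to type 1 with probability proportional to its intensity. A one-subject toy EM, assuming two homogeneous Poisson event types and no frailty or covariates (so not the paper's Monte Carlo EM, just its simplest special case):

```python
def em_masked_types(n1, n2, n_masked, total_time, iters=100):
    """Toy EM for two competing event types when some events have masked type.

    Under independent homogeneous Poisson intensities (lam1, lam2), a masked
    event is type 1 with probability lam1 / (lam1 + lam2) (E-step); the
    M-step is then a Poisson rate update with the fractional counts.
    """
    lam1 = lam2 = (n1 + n2 + n_masked) / (2.0 * total_time)   # crude start
    for _ in range(iters):
        w1 = lam1 / (lam1 + lam2)           # E-step: P(masked event is type 1)
        lam1 = (n1 + w1 * n_masked) / total_time              # M-step
        lam2 = (n2 + (1.0 - w1) * n_masked) / total_time
    return lam1, lam2

# 30 known type-1 events, 10 known type-2, 20 masked, over 100 time units.
print(em_masked_types(n1=30, n2=10, n_masked=20, total_time=100.0))
```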

5.
The McDonald extended distribution: properties and applications
We study a five-parameter lifetime distribution, called the McDonald extended exponential model, which generalizes the exponential, generalized exponential, Kumaraswamy exponential and beta exponential distributions, among others. We obtain explicit expressions for the moments and incomplete moments, quantile and generating functions, mean deviations, Bonferroni and Lorenz curves, and the Gini concentration index. The method of maximum likelihood and a Bayesian procedure are adopted for estimating the model parameters. The applicability of the new model is illustrated with a real data set.
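For concreteness, the McDonald-G construction on which such models are built has density f(x) = c/B(a,b) · g(x) G(x)^(ac-1) [1 - G(x)^c]^(b-1) for a baseline cdf G with pdf g. A sketch using a plain exponential baseline (an assumption for illustration; the paper's five-parameter model uses an extended-exponential baseline):

```python
import numpy as np
from scipy.special import beta as beta_fn
from scipy.integrate import quad

def mc_g_pdf(x, a, b, c, g, G):
    """Density of the McDonald-G family for baseline pdf g and cdf G:
        f(x) = c / B(a, b) * g(x) * G(x)**(a*c - 1) * (1 - G(x)**c)**(b - 1).
    """
    Gx = G(x)
    return c / beta_fn(a, b) * g(x) * Gx ** (a * c - 1) * (1 - Gx ** c) ** (b - 1)

lam = 0.8
g = lambda x: lam * np.exp(-lam * x)        # exponential baseline pdf
G = lambda x: 1.0 - np.exp(-lam * x)        # exponential baseline cdf

# Sanity checks: the density integrates to 1, and a = b = c = 1 recovers
# the plain exponential baseline.
print(quad(lambda x: mc_g_pdf(x, 2.0, 3.0, 1.5, g, G), 0, np.inf)[0])
print(mc_g_pdf(1.0, 1, 1, 1, g, G), g(1.0))
```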

6.
In survival or reliability studies, it is common to have data that are not only incomplete but also weakly dependent. Random truncation and censoring are two common forms of such data in which observations are neither independent nor strongly mixing but rather associated. The focus of this paper is on estimating conditional distribution and conditional quantile functions for randomly left-truncated data satisfying an association condition. We derive strong uniform consistency rates and asymptotic normality for the estimators, thereby extending to the associated case some results stated under i.i.d. and α-mixing hypotheses. The performance of the quantile-function estimator is evaluated on simulated data sets.
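The building block for such estimators is the product-limit (Lynden-Bell) estimator for left-truncated data. A sketch of the unconditional version, checked against an i.i.d. simulation (the paper's estimators are conditional, kernel-smoothed versions studied under association rather than independence):

```python
import numpy as np

def lynden_bell_sf(y_obs, t_obs, grid):
    """Lynden-Bell product-limit estimate of S(y) = P(Y > y) from left-
    truncated pairs (T_i, Y_i) observed only when Y_i >= T_i:
        S_hat(y) = prod_{Y_i <= y} (1 - 1 / C(Y_i)),
    with risk-set size C(z) = #{i : T_i <= z <= Y_i}.  Ties are ignored in
    this sketch, and the estimate degenerates if some risk set has size one.
    """
    y_obs, t_obs = np.asarray(y_obs, float), np.asarray(t_obs, float)
    ys = np.sort(y_obs)
    C = np.array([np.sum((t_obs <= y) & (y <= y_obs)) for y in ys])
    sf = np.cumprod(1.0 - 1.0 / C)
    idx = np.searchsorted(ys, grid, side="right") - 1
    return np.where(idx < 0, 1.0, sf[np.clip(idx, 0, None)])

rng = np.random.default_rng(3)
y = rng.exponential(1.0, 4000)
t = rng.uniform(0.0, 2.0, 4000)
keep = y >= t                               # left truncation
grid = np.array([0.5, 1.0, 2.0])
print("estimate:", lynden_bell_sf(y[keep], t[keep], grid))
print("truth:   ", np.exp(-grid))
```

A conditional quantile estimate would then be obtained by localizing this estimator around a covariate value and inverting the resulting distribution estimate.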

7.
This paper presents a new parametric model for recurrent events in which the time of each recurrence is associated with one or more latent causes and no information is provided about the cause responsible for the event. The model is characterized by a rate function and is based on the Poisson-exponential distribution, namely the distribution of the maximum of a random number (truncated Poisson distributed) of exponential times; the time of each recurrence is then the maximum lifetime among all latent causes. Inference is based on a maximum likelihood approach. A simulation study is performed to assess the frequentist properties of the estimation procedure for small and moderate sample sizes, and likelihood-based test procedures are also investigated. A real example from a gastroenterology study concerning small-bowel motility during the fasting state is used to illustrate the methodology, and the proposed model is compared with the classical homogeneous Poisson model, which is a particular case.
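The Poisson-exponential mechanism is easy to simulate and to check against its closed-form cdf, F(t) = (exp(-θ e^(-λt)) - e^(-θ)) / (1 - e^(-θ)). A sketch with illustrative parameter values:

```python
import numpy as np

def rpoisson_exponential(theta, lam, size, rng=None):
    """Draw from the Poisson-exponential distribution: the maximum of N iid
    Exp(lam) latent-cause times, with N a zero-truncated Poisson(theta)."""
    rng = np.random.default_rng(rng)
    n = rng.poisson(theta, size)
    while (n == 0).any():                   # zero truncation by rejection
        m = n == 0
        n[m] = rng.poisson(theta, m.sum())
    return np.array([rng.exponential(1.0 / lam, k).max() for k in n])

theta, lam, t = 2.0, 0.5, 3.0
x = rpoisson_exponential(theta, lam, 20000, rng=1)
F = (np.exp(-theta * np.exp(-lam * t)) - np.exp(-theta)) / (1 - np.exp(-theta))
print("empirical:", (x <= t).mean(), " closed form:", F)
```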

8.
Most data used to study the durations of unemployment spells come from the Current Population Survey (CPS), which is a point-in-time survey and gives an incomplete picture of the underlying duration distribution. We introduce a new sample of completed unemployment spells obtained from panel data and apply CPS sampling and reporting techniques to replicate the type of data used by other researchers. Predicted duration distributions derived from these CPS-like data are then compared to the actual distribution. We conclude that the best inferences that can be made about unemployment durations by using CPS-like data are seriously biased.
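The stock-sampling bias the paper documents can be illustrated directly: sampling spells in progress at a survey date over-represents long spells, and only elapsed (not completed) duration is observed. A toy simulation with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
start = rng.uniform(0.0, 1000.0, n)        # spell start dates (weeks)
dur = rng.gamma(0.5, 24.0, n)              # completed durations, mean 12

survey = 500.0                             # one point-in-time survey date
stock = (start <= survey) & (start + dur > survey)   # spells in progress
elapsed = survey - start[stock]            # what the survey can record

# Length bias: sampled spells have mean E[X^2]/E[X] = 36, and the elapsed
# duration actually seen has mean E[X^2]/(2 E[X]) = 18, versus a true mean
# completed duration of 12.
print("mean completed duration (all spells):", dur.mean())
print("mean completed duration (sampled):   ", dur[stock].mean())
print("mean elapsed duration (observed):    ", elapsed.mean())
```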

9.
Multiple imputation has emerged as a popular approach to handling data sets with missing values. For incomplete continuous variables, imputations are usually produced using multivariate normal models. However, this approach can be problematic for variables with strongly non-normal shapes, as it generates imputations inconsistent with the actual distributions and can thus lead to incorrect inferences. For non-normal data, we consider a multivariate extension of Tukey's gh distribution/transformation [38] to accommodate skewness and/or kurtosis and to capture the correlation among the variables. We propose an algorithm to fit the incomplete data with the model and generate imputations. We apply the method to a national data set of hospital performance on several standard quality measures, which are highly skewed to the left and substantially correlated with each other. We use Monte Carlo studies to assess the performance of the proposed approach, discuss possible generalizations, and offer some advice to practitioners on how to handle non-normal incomplete data.
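The univariate g-and-h transform is a one-liner, and applying it to the margins of correlated normals gives a simple version of the multivariate extension (a sketch; the paper's fitting algorithm for incomplete data is not reproduced, and the parameter values are illustrative):

```python
import numpy as np
from scipy.stats import skew

def tukey_gh(z, g, h):
    """Tukey g-and-h transform of standard normal draws:
        T(z) = ((exp(g z) - 1) / g) * exp(h z^2 / 2)   for g != 0,
    where g controls skewness and h controls tail heaviness."""
    z = np.asarray(z, float)
    core = z if g == 0 else (np.exp(g * z) - 1.0) / g
    return core * np.exp(h * z ** 2 / 2.0)

rng = np.random.default_rng(2)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], 50_000)
x = tukey_gh(z, g=-0.5, h=0.1)             # g < 0: left-skewed margins
print("marginal skewness:", skew(x, axis=0))
print("correlation of transformed margins:", np.corrcoef(x[:, 0], x[:, 1])[0, 1])
```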

10.
Missing data can rarely be avoided in large-scale studies in which subjects are asked to complete questionnaires with many items. Analyses of such surveys are often based on the records with no missing items, resulting in a loss of efficiency and, when data are missing not at random, in bias. This paper applies the method of multiple imputation to handle missing data in an analysis of the alcohol consumption of subjects in the Medical Research Council National Survey of Health and Development. The outcomes studied are derived from entries in diaries of food and drink intake over seven designated days. Background variables and other responses related to alcohol consumption and associated problems are used as collateral information. In conventional analyses, subpopulation means of the quantities of alcohol consumed are compared. Since we are interested in the harmful effects of alcohol, we make inferences about the percentages of those who consume more than a given quantity of net alcohol. We assess the contribution that the incomplete records make to the analyses and outline a more integrated way of applying multiple imputation in large-scale longitudinal surveys.
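Inference after multiple imputation, e.g. for the percentage exceeding a consumption threshold, combines the per-imputation estimates by Rubin's rules. A sketch with illustrative numbers (not the survey's results):

```python
import numpy as np

def rubin_combine(est, var):
    """Combine m multiply-imputed estimates by Rubin's rules:
    point estimate qbar = mean of the estimates, total variance
    T = W + (1 + 1/m) B, with W the mean within-imputation variance and
    B the between-imputation variance."""
    est, var = np.asarray(est, float), np.asarray(var, float)
    m = len(est)
    qbar = est.mean()
    W = var.mean()                   # within-imputation variance
    B = est.var(ddof=1)              # between-imputation variance
    return qbar, W + (1.0 + 1.0 / m) * B

# Share drinking above a threshold, estimated in m = 5 completed data sets.
p = np.array([0.212, 0.198, 0.224, 0.205, 0.217])
n = 3000
qbar, T = rubin_combine(p, p * (1 - p) / n)
print("combined estimate:", qbar, " 95% CI half-width ~", 1.96 * np.sqrt(T))
```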

11.
Clustered longitudinal data feature cross-sectional associations within clusters, serial dependence within subjects, and associations between responses at different time points from different subjects within the same cluster. Generalized estimating equations are often used for inference with data of this sort since they do not require full specification of the response model. When data are incomplete, however, they require the data to be missing completely at random unless inverse probability weights are introduced based on a model for the missing-data process. The authors propose a robust approach for incomplete clustered longitudinal data using composite likelihood. Specifically, pairwise likelihood methods are described for conducting robust estimation with minimal model assumptions. The authors also show that the resulting estimates remain valid for a wide variety of missing-data problems, including missing-at-random mechanisms, so that in such cases there is no need to model the missing-data process. In addition to describing the asymptotic properties of the resulting estimators, the method is shown empirically, through simulation studies, to perform well for complete and incomplete data. Pairwise likelihood estimators are also compared with estimators obtained from inverse-probability-weighted alternating logistic regression. An application to data from the Waterloo Smoking Prevention Project is provided for illustration.
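The pairwise-likelihood idea can be sketched for the simplest case, an exchangeable normal model: every observed within-cluster pair contributes a bivariate normal term, and pairs involving a missing value drop out. A toy version (values are deleted completely at random here for simplicity; the paper treats general missing-at-random mechanisms and longitudinal responses):

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def within_cluster_pairs(clusters):
    """Collect every observed within-cluster pair; pairs that would involve
    a missing value simply drop out of the composite likelihood."""
    pairs = []
    for y in clusters:
        pairs.extend(combinations(y[~np.isnan(y)], 2))
    return np.asarray(pairs)

def neg_pairwise_loglik(params, pairs):
    """Exchangeable normal model: each pair is bivariate normal with common
    mean mu, variance s2, and within-cluster correlation rho (parametrised
    unconstrained via log and tanh)."""
    mu, log_s2, z = params
    s2, rho = np.exp(log_s2), np.tanh(z)
    cov = s2 * np.array([[1.0, rho], [rho, 1.0]])
    return -multivariate_normal.logpdf(pairs, [mu, mu], cov).sum()

rng = np.random.default_rng(4)
clusters = []
for _ in range(150):                       # 150 clusters of size 5, rho = 0.4
    u = rng.normal(scale=np.sqrt(0.4))
    y = 2.0 + u + rng.normal(scale=np.sqrt(0.6), size=5)
    y[rng.random(5) < 0.2] = np.nan        # incomplete responses
    clusters.append(y)

pairs = within_cluster_pairs(clusters)
res = minimize(neg_pairwise_loglik, x0=[0.0, 0.0, 0.0], args=(pairs,))
print("mu, s2, rho:", res.x[0], np.exp(res.x[1]), np.tanh(res.x[2]))
```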

12.
A four-parameter extension of the generalized gamma distribution capable of modelling a bathtub-shaped hazard rate function is defined and studied. The beauty and importance of this distribution lie in its ability to model monotone and non-monotone failure rate functions, which are quite common in lifetime data analysis and reliability. The new distribution has a number of well-known lifetime special sub-models, such as the exponentiated Weibull, exponentiated generalized half-normal, exponentiated gamma and generalized Rayleigh distributions, among others. We derive two infinite sum representations for its moments. We calculate the density of the order statistics and two expansions for their moments. The method of maximum likelihood is used for estimating the model parameters, and the observed information matrix is obtained. Finally, a real data set from the medical area is analysed.
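The paper's four-parameter density is not reproduced here, but the bathtub behaviour it targets is already visible in one of its stated sub-models, the exponentiated Weibull, whose hazard is decreasing-then-increasing when the Weibull shape k > 1 and ka < 1. A sketch:

```python
import numpy as np

# Exponentiated Weibull: F(t) = (1 - exp(-(t/s)^k))^a.
def ew_cdf(t, k, s, a):
    return (1.0 - np.exp(-(t / s) ** k)) ** a

def ew_pdf(t, k, s, a):
    w = np.exp(-(t / s) ** k)
    return a * (1.0 - w) ** (a - 1.0) * w * k * t ** (k - 1.0) / s ** k

def hazard(t, k, s, a):
    """h(t) = f(t) / (1 - F(t))."""
    return ew_pdf(t, k, s, a) / (1.0 - ew_cdf(t, k, s, a))

# With k = 2 and a = 0.3 (so k > 1 and k*a < 1), the printed hazard values
# fall and then rise: a bathtub shape.
t = np.array([0.02, 0.1, 0.3, 1.0, 2.0, 3.0])
print(np.round(hazard(t, k=2.0, s=1.0, a=0.3), 2))
```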

13.
The United States is experiencing a major public health problem relating to increasing levels of excess body fat. This paper examines the relationship between trends in the distribution of body mass index (BMI) in the United States, including trends in overweight and obesity, and demographic change. We provide estimates of the counterfactual distribution of BMI that would have been observed in 2003–2008 had demographics remained fixed at their 1980 values, roughly the beginning of the period of increasing overweight and obesity. We find that changes in demographics are partly responsible for the changes in the population distribution of BMI, explaining about 8.6% of the increase in the combined rate of overweight and obesity among women and about 7.2% of the increase among men. We also use demographic projections to predict the BMI distribution and corresponding rates of overweight and obesity for 2050.
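Counterfactual distributions of this kind can be computed by reweighting the later sample so that demographic cell shares match their 1980 values. A toy two-cell sketch with made-up shares and BMI distributions (the paper's method and estimates are more refined):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 100_000
# Later-period sample with two demographic cells (toy version).
cell = rng.choice(["age<40", "age>=40"], p=[0.40, 0.60], size=n)
bmi = np.where(cell == "age<40", rng.normal(26.0, 4.0, n),
               rng.normal(28.0, 5.0, n))

# Counterfactual: weight each observation by (1980 cell share) / (current
# cell share); the 1980 shares here are invented for illustration.
share_1980 = {"age<40": 0.55, "age>=40": 0.45}
share_now = pd.Series(cell).value_counts(normalize=True).to_dict()
w = np.array([share_1980[c] / share_now[c] for c in cell])

overweight = bmi >= 25.0
print("observed overweight rate:      ", overweight.mean())
print("counterfactual (1980 weights): ", np.average(overweight, weights=w))
```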

14.
Summary.  We consider the Bayesian analysis of human movement data in which the subjects perform various reaching tasks. A set of markers is placed on each subject and a system of cameras records the three-dimensional Cartesian co-ordinates of the markers during the reaching movement. It is of interest to describe the mean and variability of the curves traced by the markers during one reaching movement, and to identify any differences due to covariates. We propose a methodology based on a hierarchical Bayesian model for the curves. An important part of the method is to obtain identifiable features of the movement so that different curves can be compared after temporal warping. We consider four landmarks, with a set of equally spaced pseudo-landmarks located in between. We demonstrate that the algorithm works well in locating the landmarks, and shape analysis techniques are used to describe the posterior distribution of the mean curve. A feature of this type of data is that some parts of the movement may be missing; the Bayesian methodology is easily adapted to cope with this situation.
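The registration step can be sketched with a piecewise-linear warp that maps each subject's landmark times onto common reference landmarks (the hierarchical Bayesian model and the landmark search itself are not sketched; the timings below are illustrative):

```python
import numpy as np

def warp_time(t, landmarks_subj, landmarks_ref):
    """Piecewise-linear time warping: map a subject's landmark times onto
    common reference landmark times so curves can be compared pointwise
    after temporal registration.  np.interp provides the piecewise-linear
    map through the landmark pairs."""
    return np.interp(t, landmarks_subj, landmarks_ref)

# A subject who reaches the mid-movement landmarks late, warped onto the
# reference timeline.
t = np.linspace(0.0, 1.0, 6)
print(warp_time(t, [0.0, 0.45, 0.8, 1.0], [0.0, 0.25, 0.5, 1.0]))
```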

15.
The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed for analyzing incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing-data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing-data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally-distributed repeated measures data using linear mixed models, while Park and Davis (1992) derived their test for incomplete repeated categorical data in the framework of Grizzle, Starmer, and Koch (1969). The proposed procedure can be applied easily to any other multivariate general linear model that allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).
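An informal version of such a test compares fully observed measurements across missingness patterns: under MCAR their distributions coincide, while under MAR they need not. A sketch in which dropout of y2 depends on the observed y1 (the paper's procedure is a formal test within linear mixed models, not this two-sample check):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 1000
y1 = rng.normal(0.0, 1.0, n)                    # first measurement, complete
y2 = 0.8 * y1 + rng.normal(0.0, 0.6, n)         # second measurement
y2[y1 > 0.5] = np.nan                           # dropout depends on observed y1:
miss = np.isnan(y2)                             # MAR, but not MCAR

# Under MCAR, y1 has the same distribution in both missingness groups; a
# small p-value here signals a departure from MCAR.
print(stats.ttest_ind(y1[miss], y1[~miss], equal_var=False))
```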

16.
The field of genetic epidemiology is growing rapidly with the realization that many important diseases are influenced by both genetic and environmental factors. For this reason, pedigree data are becoming increasingly valuable as a means of studying patterns of disease occurrence. Analysis of pedigree data is complicated by the lack of independence among family members and by the non-random sampling schemes used to ascertain families. An additional complicating factor is the variability in age at disease onset from one person to another. In developing statistical methods for analysing pedigree data, analytic results are often intractable, making simulation studies imperative for assessing the performance of proposed methods and estimators. In this paper, an algorithm is presented for simulating disease data in pedigrees, incorporating variable age at onset and genetic and environmental effects. Computational formulas are developed in the context of a proportional hazards model and assuming single ascertainment of families, but the methods can be easily generalized to alternative models. The algorithm is computationally efficient, making multi-dataset simulation studies feasible. Numerical examples are provided to demonstrate the methods.
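The core simulation step, drawing an age at onset from a proportional hazards model by inverse transform, looks as follows under an assumed Weibull baseline (a generic sketch, not the paper's full ascertainment-corrected pedigree algorithm; all parameter values are illustrative):

```python
import numpy as np

def sim_onset_ages(eta, shape=3.0, scale=70.0, rng=None):
    """Simulate ages at onset from a proportional hazards model with Weibull
    baseline cumulative hazard H0(t) = (t/scale)**shape.  Since
    S(t) = exp(-H0(t) * exp(eta)), inverse transform gives
        T = scale * (-log(U) * exp(-eta)) ** (1/shape),
    where eta is the summed log relative risk (genetic + environmental)."""
    rng = np.random.default_rng(rng)
    eta = np.asarray(eta, float)
    u = rng.random(np.shape(eta))
    return scale * (-np.log(u) * np.exp(-eta)) ** (1.0 / shape)

# One nuclear family: two carriers (log hazard ratio 1.2), two non-carriers.
eta = np.array([1.2, 0.0, 1.2, 0.0])
print(sim_onset_ages(eta, rng=8))
```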

17.
18.
A popular choice when analyzing ordinal data is the cumulative proportional odds model, which relates the marginal probabilities of the ordinal outcome to a set of covariates. Application of this model, however, relies on identical cumulative odds ratios across the cut-offs of the ordinal outcome: the well-known proportional odds assumption. This paper focuses on assessing this assumption while accounting for repeated and missing data. We develop a statistical method built on multiple imputation (MI) and generalized estimating equations that allows the proportionality assumption to be tested under the missing-at-random setting. The performance of the proposed method is evaluated for two MI algorithms for incomplete longitudinal ordinal data. The impact of the two MI methods is compared with respect to type I error rate and power for situations covering various numbers of outcome categories, sample sizes, rates of missingness, and well-balanced and skewed data. A comparison of both MI methods with the complete-case analysis is also provided. We illustrate the use of the proposed methods on quality-of-life data from a cancer clinical trial.
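An informal version of the proportionality check fits a separate binary logistic regression for each cut-off of the outcome; under proportional odds the covariate slopes should agree. A sketch on complete simulated data (the paper's contribution is turning this into a formal test under MI and GEE for repeated, incomplete outcomes):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 2000
x = rng.normal(size=n)
# Four-category ordinal outcome generated under proportional odds:
# logit P(Y > j) = 1.0 * x - cut_j.
cuts = np.array([-1.0, 0.0, 1.2])
latent = 1.0 * x + rng.logistic(size=n)
y = (latent > cuts[:, None]).sum(axis=0)

# The slope on x should be roughly the same (near 1.0) across the binary
# "Y > j" fits if proportional odds holds.
X = sm.add_constant(x)
for j in range(3):
    fit = sm.Logit((y > j).astype(int), X).fit(disp=0)
    print(f"cut-off {j}: slope = {fit.params[1]:.3f}")
```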

19.
Phillips and Sweeting [J. R. Statist. Soc. B 58 (1996) 775–783] considered estimation of the parameter of the exponential distribution from censored failure time data when there is incomplete knowledge of the censoring times. It was shown that, under particular models for the censoring mechanism and censoring errors, it will usually be safe to ignore such errors provided they are not expected to be too large. A flexible model is introduced which includes the extreme cases of no censoring errors and no information on the censoring values. The effect of alternative assumptions about knowledge of the censoring values on the estimation of the failure rate is investigated.
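With right-censored exponential data the failure-rate MLE is the number of failures divided by the total time on test, which makes the effect of mean-zero errors in the recorded censoring values easy to probe. A toy check of the "usually safe to ignore" message (not the paper's flexible error model; all values illustrative):

```python
import numpy as np

def exp_rate_mle(times, observed):
    """MLE of the exponential failure rate under right censoring:
    lambda_hat = (number of failures) / (total time on test)."""
    return np.asarray(observed).sum() / np.asarray(times, float).sum()

rng = np.random.default_rng(10)
t_fail = rng.exponential(10.0, 500)               # true rate = 0.1
c = rng.uniform(0.0, 25.0, 500)                   # true censoring times
times, observed = np.minimum(t_fail, c), t_fail <= c
print("estimate, exact censoring times:", exp_rate_mle(times, observed))

# Perturb only the *recorded censoring* values with mean-zero errors; the
# estimate barely moves, consistent with small errors being ignorable.
err = rng.normal(0.0, 1.0, 500)
times_err = np.where(observed, times, np.maximum(times + err, 0.0))
print("estimate, noisy censoring times:", exp_rate_mle(times_err, observed))
```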

20.
This paper presents a Bayesian analysis of the projected normal distribution, which is a flexible and useful distribution for the analysis of directional data. We obtain samples from the posterior distribution using the Gibbs sampler after the introduction of suitably chosen latent variables. The procedure is illustrated using simulated data as well as a real data set previously analysed in the literature.
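A compact sketch of the latent-variable scheme for directional data: each observed angle is the direction of a latent bivariate normal vector, and the sampler alternates between the latent lengths and the mean. Assumptions made for illustration: identity covariance, a flat prior on the mean, and a Metropolis step (rather than an exact draw) for the lengths, so this is not necessarily the paper's exact sampler:

```python
import numpy as np

def gibbs_projected_normal(theta, n_iter=2000, rng=None):
    """Gibbs sampler for PN(mu, I): each angle theta_i is the direction of a
    latent x_i = r_i * u_i ~ N(mu, I_2).  Full conditionals: with a flat
    prior, mu | x ~ N(xbar, I/n); and r_i | theta_i, mu has density
    proportional to r * phi(r - b_i) on r > 0 with b_i = u_i . mu, updated
    here by a symmetric reflected random-walk Metropolis step."""
    rng = np.random.default_rng(rng)
    u = np.column_stack([np.cos(theta), np.sin(theta)])
    n = len(theta)
    mu, r = np.zeros(2), np.ones(n)
    draws = []
    for _ in range(n_iter):
        b = u @ mu
        prop = np.abs(r + 0.5 * rng.normal(size=n))       # reflect at zero
        log_acc = (np.log(prop) - np.log(r)
                   - 0.5 * ((prop - b) ** 2 - (r - b) ** 2))
        accept = np.log(rng.random(n)) < log_acc
        r = np.where(accept, prop, r)
        x = r[:, None] * u
        mu = x.mean(axis=0) + rng.normal(size=2) / np.sqrt(n)
        draws.append(mu.copy())
    return np.array(draws)

rng = np.random.default_rng(11)
true_mu = np.array([1.5, 0.8])
x = true_mu + rng.normal(size=(400, 2))
theta = np.arctan2(x[:, 1], x[:, 0])                      # angles only
print(gibbs_projected_normal(theta, rng=12)[1000:].mean(axis=0))  # ~ true_mu
```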
