Similar Articles
19 similar articles found.
1.
The sample selection bias problem occurs when the outcome of interest is observed only according to some selection rule, and there is a dependence structure between the outcome and the selection rule. In a pioneering work, J. Heckman proposed a sample selection model based on a bivariate normal distribution for dealing with this problem. Because of the non-robustness of the normal distribution, many alternatives have been introduced in the literature based on extensions of the normal distribution, such as the Student-t and skew-normal models. One common limitation of existing sample selection models is that they require a transformation of the outcome of interest, which is typically R+-valued, such as income or wage. As a result, the data are analyzed on a non-original scale, which complicates the interpretation of the parameters. In this paper, we propose a sample selection model based on the bivariate Birnbaum–Saunders distribution, which has the same number of parameters as the classical Heckman model. Furthermore, the associated outcome equation is R+-valued. We discuss estimation by maximum likelihood and present Monte Carlo simulation studies. An empirical application to the ambulatory expenditures data from the 2001 Medical Expenditure Panel Survey is presented.
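For orientation, the selection structure that such models extend can be written as the classical two-equation Heckman setup (generic notation, not taken verbatim from the paper):

\[
y_i^{*} = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \quad\text{(outcome equation)},
\qquad
s_i^{*} = \mathbf{w}_i^{\top}\boldsymbol{\gamma} + u_i \quad\text{(selection equation)},
\]
\[
y_i \text{ observed only if } s_i^{*} > 0,
\qquad (\varepsilon_i, u_i) \text{ jointly distributed with dependence parameter } \rho .
\]

In Heckman's original model the errors are bivariate normal; here the bivariate Birnbaum–Saunders distribution plays that role, so the outcome stays on its original positive scale.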

2.
3.
Models for dealing with survival data in the presence of a cured fraction of individuals have attracted the attention of many researchers and practitioners in recent years. In this paper, we propose a cure rate model under the competing risks scenario. For the number of causes that can lead to the event of interest, we assume the polylogarithm distribution. The model is flexible in the sense that it encompasses some well-known models, which can be tested using large-sample test statistics applied to nested models. Maximum likelihood estimation based on the EM algorithm and hypothesis testing are investigated. Results of simulation studies designed to gauge the performance of the estimation method and of two test statistics are reported. The methodology is applied to the analysis of a data set.
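As background, competing-causes cure rate models of this type link the latent number of causes N to the population survival function through the probability generating function of N (a standard formulation, stated here for context rather than quoted from the paper):

\[
S_{\mathrm{pop}}(t) = P(N=0) + \sum_{n\ge 1} P(N=n)\,[S(t)]^{n} = G_{N}\!\bigl(S(t)\bigr),
\qquad
\lim_{t\to\infty} S_{\mathrm{pop}}(t) = P(N=0),
\]

where S(t) is the survival function of the time to the event for a single cause and P(N = 0) is the cured fraction; here N would follow the polylogarithm distribution.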

4.
5.
Classical factor analysis relies on the assumption of normally distributed factors, which guarantees that the model can be estimated via the maximum likelihood method. Even when the assumption of Gaussian factors is not explicitly formulated and estimation is performed via the iterated principal factors method, interest is mainly focused on the linear structure of the data, since only moments up to the second order are involved. In many real situations, the factors cannot be adequately described by the first two moments alone. For example, the skewness characterizing most latent variables in social analysis can be properly measured by the third moment: the factors are not normally distributed and the covariance is no longer a sufficient statistic. In this work we propose a factor model characterized by skew-normally distributed factors. The skew-normal is a parametric class of probability distributions that extends the normal distribution with an additional shape parameter regulating skewness. The model can be estimated by the generalized EM algorithm, in which an iterative Newton–Raphson procedure is needed in the M-step to estimate the factor shape parameter. The proposed skew-normal factor analysis is applied to the study of student satisfaction with university courses, in order to identify the factors representing different aspects of the latent overall satisfaction.
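For reference, the univariate skew-normal density referred to above, in its standard location-scale form with location ξ, scale ω > 0, and shape α (α = 0 recovers the normal distribution):

\[
f(x;\xi,\omega,\alpha) = \frac{2}{\omega}\,
\phi\!\left(\frac{x-\xi}{\omega}\right)
\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right),
\]

where φ and Φ denote the standard normal density and distribution function, respectively.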

6.
The skew-normal distribution is a class of distributions that includes the normal distribution as a special case. In this paper, we explore the use of Markov chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of a multivariate, null-intercept, measurement error model [R. Aoki, H. Bolfarine, J.A. Achcar, and D. Leão Pinto Jr, Bayesian analysis of a multivariate null intercept error-in-variables regression model, J. Biopharm. Stat. 13(4) (2003b), pp. 763–771] in which the unobserved value of the covariate (latent variable) follows a skew-normal distribution. The results and methods are applied to a real dental clinical trial presented in [A. Hadgu and G. Koch, Application of generalized estimating equations to a dental randomized clinical trial, J. Biopharm. Stat. 9 (1999), pp. 161–178].

7.
The widely used Fellegi–Sunter model for probabilistic record linkage does not leverage information contained in field values and consequently leads to identical classification of match status regardless of whether records agree on rare or common values. Since agreement on rare values is less likely to occur by chance than agreement on common values, records agreeing on rare values are more likely to be matches. Existing frequency-based methods typically rely on knowledge of the error probabilities associated with field values and the frequencies of agreed field values among matches, often derived from prior studies or training data. When such information is unavailable, applying these methods is challenging. In this paper, we propose a simple two-step procedure for frequency-based matching within the Fellegi–Sunter framework to overcome these challenges. Matching weights are adjusted based on the frequency distributions of the agreed field values among matches and non-matches, estimated by the Fellegi–Sunter model without relying on prior studies or training data. Through a real-world application and a simulation, our method is found to produce performance comparable to or better than the unadjusted method. Furthermore, frequency-based matching provides a greater improvement in matching accuracy when poorly discriminating fields are used, with the benefit diminishing as the discriminating power of the matching fields increases.
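A minimal Python sketch of the idea behind frequency-based adjustment in the Fellegi–Sunter framework. The function names, probabilities, and the squared-frequency approximation of the value-specific u-probability are illustrative assumptions, not the paper's two-step procedure.

    import math

    def fs_weights(m: float, u: float) -> tuple:
        """Classical Fellegi-Sunter log2 weights for one field.
        m: P(agreement | matched pair), u: P(agreement | non-matched pair)."""
        return math.log2(m / u), math.log2((1 - m) / (1 - u))

    def value_specific_weight(m: float, value_freq: float) -> float:
        """Hypothetical frequency-based agreement weight: approximate the
        value-specific u-probability by the chance that two unrelated records
        both carry this value, so rarer values receive larger weights."""
        u_value = value_freq ** 2
        return math.log2(m / u_value)

    agree_w, disagree_w = fs_weights(m=0.95, u=0.10)         # field-level weights
    rare_w = value_specific_weight(0.95, value_freq=0.001)   # rare surname
    common_w = value_specific_weight(0.95, value_freq=0.05)  # common surname
    print(agree_w, disagree_w, rare_w, common_w)             # rare_w > common_w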

8.
In this paper, we consider three different mixture models based on the Birnbaum-Saunders (BS) distribution: (1) a mixture of two different BS distributions, (2) a mixture of a BS distribution and a length-biased version of another BS distribution, and (3) a mixture of a BS distribution and its length-biased version. For all these models, we study their characteristics, including the shape of their density and hazard rate functions. For the maximum likelihood estimation of the model parameters, we use the EM algorithm. For the purpose of illustration, we analyze two data sets related to enzyme and depressive condition problems. In the case of the enzyme data, Model 1 is shown to provide the best fit, while for the depressive condition data, all three models are shown to fit well, with Model 3 providing the best fit.
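For reference, the two-parameter Birnbaum-Saunders density underlying these mixtures, in its standard parameterization with shape α > 0 and scale β > 0:

\[
f_{\mathrm{BS}}(t;\alpha,\beta)
= \frac{1}{2\sqrt{2\pi}\,\alpha\beta}
\left[\left(\frac{\beta}{t}\right)^{1/2}
     +\left(\frac{\beta}{t}\right)^{3/2}\right]
\exp\!\left\{-\frac{1}{2\alpha^{2}}
\left(\frac{t}{\beta}+\frac{\beta}{t}-2\right)\right\},
\qquad t>0.
\]

A two-component mixture then has density g(t) = p f1(t) + (1 − p) f2(t) with mixing proportion p ∈ (0, 1), where in Models 2 and 3 the second component is the length-biased version t f_BS(t)/E(T).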

9.
We consider the detection of changes in the mean of a set of time series. The breakpoints are allowed to be series-specific, and the series are assumed to be correlated. The correlation between the series is supposed to be constant over time but is allowed to take an arbitrary form. We show that such a dependence structure can be encoded in a factor model. Thanks to this representation, the inference of the breakpoints can be achieved via dynamic programming, which remains one of the most efficient algorithms. We propose a model selection procedure to determine both the number of breakpoints and the number of factors. The proposed method is implemented in the FASeg R package, which is available on CRAN. We demonstrate the performance of our procedure through simulation experiments and present an application to geodesic data.
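A minimal Python sketch of the dynamic-programming segmentation step for a single series with a least-squares cost (breakpoints in the mean). The factor-model decorrelation and the model selection procedure described above are not shown, and the function names here are illustrative rather than taken from the FASeg package.

    import numpy as np

    def segment_cost(y: np.ndarray) -> np.ndarray:
        """cost[i, j] = sum of squared deviations of y[i:j] from its mean."""
        n = len(y)
        s1 = np.concatenate(([0.0], np.cumsum(y)))
        s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))
        cost = np.full((n + 1, n + 1), np.inf)
        for i in range(n):
            for j in range(i + 1, n + 1):
                cost[i, j] = (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)
        return cost

    def dp_breakpoints(y: np.ndarray, n_segments: int) -> list:
        """Optimal split of y into n_segments contiguous segments via dynamic programming."""
        n = len(y)
        cost = segment_cost(y)
        best = np.full((n_segments + 1, n + 1), np.inf)  # best[k, j]: cost of y[:j] in k segments
        arg = np.zeros((n_segments + 1, n + 1), dtype=int)
        best[0, 0] = 0.0
        for k in range(1, n_segments + 1):
            for j in range(k, n + 1):
                cand = best[k - 1, :j] + cost[:j, j]
                arg[k, j] = int(np.argmin(cand))
                best[k, j] = cand[arg[k, j]]
        bps, j = [], n          # backtrack the segment start positions
        for k in range(n_segments, 0, -1):
            j = arg[k, j]
            bps.append(j)
        return sorted(bps)[1:]  # drop the leading zero

    rng = np.random.default_rng(0)  # toy series with mean shifts at 50 and 120
    y = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 70), rng.normal(1, 1, 80)])
    print(dp_breakpoints(y, n_segments=3))  # expected near [50, 120]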

10.
The family of power series cure rate models provides a flexible modeling framework for survival data of populations with a cure fraction. In this work, we present a simplified estimation procedure for the maximum likelihood (ML) approach. ML estimates are obtained via the expectation-maximization (EM) algorithm, where the expectation step involves computing the expected number of concurrent causes for each individual. A major advantage is that the maximization step can be decomposed into separate maximizations of two lower-dimensional functions of the regression and survival distribution parameters, respectively. Two simulation studies are performed: the first investigates the accuracy of the estimation procedure for different numbers of covariates, and the second compares our proposal with direct maximization of the observed log-likelihood function. Finally, we illustrate the technique for parameter estimation on a dataset of survival times for patients with malignant melanoma.
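As context, a common way to write the power series cure rate family (stated in generic form; the paper's exact parameterization may differ): if the number of concurrent causes N has a power series distribution, the population survival function follows from the probability generating function of N,

\[
P(N=n) = \frac{a_n\,\theta^{n}}{C(\theta)},\quad n = 0,1,2,\dots,
\qquad
S_{\mathrm{pop}}(t) = \frac{C\bigl(\theta\,S(t)\bigr)}{C(\theta)},
\qquad
p_0 = \frac{a_0}{C(\theta)},
\]

with C(θ) = Σ_n a_n θ^n and p_0 the cure fraction. This latent-causes structure is what allows the E-step to compute the expected number of concurrent causes for each individual.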

11.
This article focuses on parameter estimation for experimental items/units from the Weibull–Poisson model under progressive type-II censoring with binomial removals (PT-II CBRs). The expectation–maximization algorithm is used to obtain the maximum likelihood estimators (MLEs). The MLEs and Bayes estimators are obtained under symmetric and asymmetric loss functions, and the performance of the competing estimators is studied through their simulated risks. One-sample Bayes prediction and the expected experiment time are also studied. Furthermore, the suitability of the considered model and the proposed methodology is illustrated through a real bladder cancer data set.

12.
We present a new test for the presence of a normal mixture distribution, based on the posterior Bayes factor of Aitkin (1991). The new test has slightly lower power than the likelihood ratio test. It does not require computation of the MLEs of the parameters or a search for multiple maxima, but it does require computations based on classification likelihood assignments of observations to mixture components.

13.
Estimators derived from the expectation-maximization (EM) algorithm are not robust, since they are based on maximization of the likelihood function. We propose an iterative proximal-point algorithm, based on the EM algorithm, that minimizes a divergence criterion between a mixture model and the unknown distribution generating the data. At each iteration, the algorithm estimates the proportions and the parameters of the mixture components in two separate steps. The resulting estimators are generally robust against outliers and misspecification of the model. Convergence properties of the algorithm are studied, and its convergence is discussed for a two-component Weibull mixture, which entails a condition on the initialization of the EM algorithm in order for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences are provided to confirm the validity of our work and the robustness of the resulting estimators against outliers, in comparison with the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada

14.
The cumulative exposure model (CEM) is a statistical model commonly used to analyze data from step-stress accelerated life testing, a special class of accelerated life testing (ALT). In practice, researchers conduct ALT to (1) determine the effects of extreme levels of stress factors (e.g., temperature) on the life distribution, and (2) gain information on the parameters of the life distribution more rapidly than under normal operating (or environmental) conditions. In the literature, the CEM is usually assumed to come from well-known distributions, such as the Weibull family. This study, on the other hand, considers a p-step-stress model with q stress factors based on the two-parameter Birnbaum-Saunders distribution when there is a time constraint on the duration of the experiment. In this comparison paper, we consider different frameworks for numerically computing point estimates of the unknown parameters of the CEM using maximum likelihood theory. Each framework implements at least one optimization method; numerical examples and extensive Monte Carlo simulations are therefore used to compare and numerically examine the performance of the considered estimation frameworks.
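For orientation, the cumulative exposure model for a simple two-step stress pattern (generic single-factor form, not the paper's p-step, q-factor setup): the distribution under the second stress level is shifted in time so that the accumulated exposure is continuous at the stress change time τ₁,

\[
G(t) =
\begin{cases}
F_{1}(t), & 0 \le t < \tau_{1},\\
F_{2}(t - \tau_{1} + s_{1}), & t \ge \tau_{1},
\end{cases}
\qquad \text{with } s_{1} \text{ solving } F_{2}(s_{1}) = F_{1}(\tau_{1}),
\]

where F_k is the lifetime distribution under stress level k (here two-parameter Birnbaum-Saunders).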

15.
The K-means algorithm and the normal mixture model method are two common clustering methods. The K-means algorithm is a popular heuristic approach that gives reasonable clustering results if the component clusters are ball-shaped. Currently, there are no analytical results for this algorithm when the component distributions deviate from the ball shape. This paper analytically studies how the K-means algorithm changes its classification rule as the normal component distributions become more elongated under the homoscedastic assumption, and compares this rule with the Bayes rule from the mixture model method. We show that the classification rules of both methods are linear, but that the slopes of the two classification lines change in opposite directions as the component distributions become more elongated. The classification performance of the K-means algorithm is then compared to that of the mixture model method via simulation. The comparison, which is limited to two clusters, shows that the K-means algorithm consistently provides poor classification performance as the component distributions become more elongated, while the mixture model method can potentially, but not necessarily, take advantage of this change and provide much better classification performance.
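A brief Python illustration of the phenomenon described above, using two elongated homoscedastic Gaussian clusters; the data-generating settings are illustrative assumptions, and the mixture fit uses scikit-learn's EM-based GaussianMixture rather than the paper's analytical comparison.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    # Two homoscedastic clusters elongated along a direction oblique to the mean difference
    cov = np.array([[13.0, 12.0], [12.0, 13.0]])
    X = np.vstack([
        rng.multivariate_normal([0.0, 0.0], cov, size=500),
        rng.multivariate_normal([4.0, 0.0], cov, size=500),
    ])
    truth = np.repeat([0, 1], 500)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    gmm = GaussianMixture(n_components=2, covariance_type="tied", random_state=0).fit(X)

    def error_rate(pred, labels):
        """Misclassification rate up to label switching (two clusters)."""
        err = np.mean(pred != labels)
        return min(err, 1 - err)

    print("K-means error:", error_rate(km.labels_, truth))
    print("Mixture error:", error_rate(gmm.predict(X), truth))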

16.
The factor score determinacy coefficient represents the common variance between the factor score predictor and the corresponding factor. The aim of the present simulation study was to compare the bias of determinacy coefficients based on different estimation methods for the exploratory factor model. Overall, determinacy coefficients computed from parameters based on maximum likelihood estimation, unweighted least squares estimation, and principal axis factoring were more precise than determinacy coefficients based on generalized least squares estimation and alpha factoring.
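For reference, one common expression for the determinacy coefficient of the regression factor score predictor, stated here for orthogonal factors as background (an assumption about the usual definition, not a formula taken from the paper):

\[
\rho_{j} = \bigl(\boldsymbol{\lambda}_{j}^{\top}\,\boldsymbol{\Sigma}^{-1}\,\boldsymbol{\lambda}_{j}\bigr)^{1/2},
\]

where λ_j is the loading vector of factor j and Σ is the model-implied covariance matrix of the observed variables; ρ_j² is then the shared variance between the factor and its score predictor.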

17.
A progressive hybrid censoring scheme is a mixture of type-I and type-II progressive censoring schemes. In this paper, we mainly consider the analysis of progressive type-II hybrid-censored data when the lifetime distribution of the individual items is normal or extreme value. Since the maximum likelihood estimators (MLEs) of the parameters cannot be obtained in closed form, we propose using the expectation–maximization (EM) algorithm to compute the MLEs. The Newton–Raphson method is also used to estimate the model parameters. The asymptotic variance–covariance matrix of the MLEs under the EM framework is obtained from the Fisher information matrix using the missing information principle, and asymptotic confidence intervals for the parameters are then constructed. The study concludes by comparing the two estimation methods and the coverage probabilities of the asymptotic confidence intervals corresponding to the missing information principle and the observed information matrix, through a simulation study, illustrative examples, and a real data analysis.
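The missing information principle mentioned above is Louis's identity (a standard result, stated for completeness): the observed information is the complete-data information minus the missing information,

\[
I_{\mathrm{obs}}(\theta) = I_{\mathrm{complete}}(\theta) - I_{\mathrm{missing}}(\theta),
\]

and the asymptotic variance-covariance matrix of the MLEs is obtained by inverting I_obs(θ) at the EM estimates.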

18.
In this paper, inference for the scale parameter of the lifetime distribution of a k-unit parallel system is provided. The lifetime distribution of each unit of the system is assumed to be a member of a scale family of distributions. The maximum likelihood estimator (MLE) and confidence intervals for the scale parameter based on a progressively Type-II censored sample are obtained. A β-expectation tolerance interval for the lifetime of the system is also obtained. The half-logistic distribution is considered as a member of the scale family, and the performance of the MLE, confidence intervals, and tolerance intervals is studied using simulation.
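As background, the standard reliability identity behind this setup: a k-unit parallel system fails only when all of its units have failed, so for independent and identically distributed unit lifetimes from a scale family with baseline distribution F₀ and scale parameter σ,

\[
F_{\mathrm{sys}}(t;\sigma) = \left[F_{0}\!\left(\tfrac{t}{\sigma}\right)\right]^{k},
\qquad
S_{\mathrm{sys}}(t;\sigma) = 1 - \left[F_{0}\!\left(\tfrac{t}{\sigma}\right)\right]^{k},
\]

which gives the system lifetime distribution entering the likelihood under progressive Type-II censoring.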

19.
This paper describes a Bayesian approach to mixture modelling and a method based on the predictive distribution to determine the number of components in the mixture. The implementation is carried out through the use of the Gibbs sampler. The method is described through mixtures of normal and gamma distributions. Analyses are presented for one simulated and one real data example. The Bayesian results are then compared with the likelihood approach for the two examples.
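A minimal Python sketch of a Gibbs sampler for a two-component normal mixture with known component variance and conjugate priors. The priors, initialization, and all names here are illustrative assumptions; the paper's models, priors, and predictive-distribution criterion for the number of components are not reproduced.

    import numpy as np

    def gibbs_normal_mixture(y, n_iter=2000, sigma2=1.0, tau2=10.0, a=1.0, b=1.0, seed=0):
        """Gibbs sampler for y_i ~ w*N(mu1, sigma2) + (1-w)*N(mu2, sigma2),
        with priors mu_k ~ N(0, tau2) and w ~ Beta(a, b)."""
        rng = np.random.default_rng(seed)
        n = len(y)
        mu = np.array([np.min(y), np.max(y)], dtype=float)  # crude initialization
        w = 0.5
        draws = []
        for _ in range(n_iter):
            # 1. Sample component labels given (mu, w)
            logp = np.stack([np.log(w) - 0.5 * (y - mu[0]) ** 2 / sigma2,
                             np.log(1 - w) - 0.5 * (y - mu[1]) ** 2 / sigma2], axis=1)
            p = np.exp(logp - logp.max(axis=1, keepdims=True))
            p1 = p[:, 0] / p.sum(axis=1)
            z = (rng.random(n) >= p1).astype(int)  # 0 -> component 1, 1 -> component 2
            # 2. Sample component means given labels (conjugate normal update)
            for k in range(2):
                yk = y[z == k]
                prec = len(yk) / sigma2 + 1.0 / tau2
                mu[k] = rng.normal((yk.sum() / sigma2) / prec, np.sqrt(1.0 / prec))
            # 3. Sample the mixing weight given labels (conjugate beta update)
            n1 = int(np.sum(z == 0))
            w = rng.beta(a + n1, b + n - n1)
            draws.append((mu.copy(), w))
        return draws

    rng = np.random.default_rng(42)  # toy data from two normal components
    y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 100)])
    samples = gibbs_normal_mixture(y)
    print(np.mean([m for m, _ in samples[500:]], axis=0))  # posterior means of mu, after burn-in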
