Similar literature
 20 similar articles found
1.
Biological control of pests is an important branch of entomology, providing environmentally friendly forms of crop protection. Bioassays are used to find the optimal conditions for the production of parasites and strategies for application in the field. In some of these assays, proportions are measured and, often, these data have an inflated number of zeros. In this work, six models will be applied to data sets obtained from biological control assays for Diatraea saccharalis, a common pest in sugar cane production. A natural choice for modelling proportion data is the binomial model. The second model will be an overdispersed version of the binomial model, estimated by a quasi-likelihood method. This model was initially built to model overdispersion generated by individual variability in the probability of success. When interest is only in the positive proportion data, a model can be based on the truncated binomial distribution or on its overdispersed version. The last two models include the zero proportions and are based on a finite mixture model with the binomial distribution or its overdispersed version for the positive data. Here, we will present the models, discuss their estimation and compare the results.
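
The finite mixture for zero-inflated proportions described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: a point mass `omega` at zero is mixed with a zero-truncated binomial for the positive counts, and the function names and the values `n = 10`, `p = 0.3`, `omega = 0.4` are hypothetical.

```python
from math import comb


def binom_pmf(y, n, p):
    """Ordinary binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def zero_mixture_pmf(y, n, p, omega):
    """Mixture of a point mass at zero (weight omega, an assumed name)
    and a zero-truncated binomial for the positive counts."""
    if y == 0:
        return omega
    # Renormalise the binomial over y = 1..n so the positive part sums to 1.
    return (1 - omega) * binom_pmf(y, n, p) / (1 - binom_pmf(0, n, p))


# Hypothetical parameter values, chosen only for illustration.
probs = [zero_mixture_pmf(y, 10, 0.3, 0.4) for y in range(11)]
total = sum(probs)
```

Because the zero mass and the positive part are separate factors, `omega` and `p` can be estimated separately in this parameterisation.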

2.
Abstract

In this paper we are concerned with variable selection in finite mixtures of semiparametric regression models. This task consists of model selection for the nonparametric component and variable selection for the parametric part. Thus, we encounter separate model selections for every nonparametric component of each sub-model. To overcome this computational burden, we introduce a class of variable selection procedures for finite mixtures of semiparametric regression models using a penalized approach for variable selection. It is shown that the new method is consistent for variable selection. Simulations show that the proposed method performs well, improves on previous work in this area, and requires much less computing power than existing methods.

3.
This research was motivated by our goal to design an efficient clinical trial to compare two doses of docosahexaenoic acid supplementation for reducing the rate of earliest preterm births (ePTB) and/or preterm births (PTB). Dichotomizing continuous gestational age (GA) data using a classic binomial distribution will result in a loss of information and reduced power. A distributional approach is an improved strategy to retain statistical power from the continuous distribution. However, appropriate distributions that fit the data properly, particularly in the tails, must be chosen, especially when the data are skewed. A recent study proposed a skew-normal method. We propose a three-component normal mixture model and introduce separate treatment effects at different components of GA. We evaluate the operating characteristics of the mixture model, the beta-binomial model, and the skew-normal model through simulation. We also apply these three methods to data from two completed clinical trials from the USA and Australia. Finite mixture models are shown to have favorable properties in PTB analysis but minimal benefit for ePTB analysis. Normal models on log-transformed data have the largest bias. Therefore we recommend the finite mixture model for PTB studies. Either the finite mixture model or the beta-binomial model is acceptable for ePTB studies.

4.
The article considers Bayesian analysis of hierarchical models for count, binomial and multinomial data using efficient MCMC sampling procedures. To this end, an improved method of auxiliary mixture sampling is proposed. In contrast to previously proposed samplers, the method uses a bounded number of latent variables per observation, independent of the intensity of the underlying Poisson process in the case of count data, or of the number of experiments in the case of binomial and multinomial data. The bounded number of latent variables results in a more general error distribution, which is a negative log-Gamma distribution with arbitrary integer shape parameter. The required approximations of these distributions by Gaussian mixtures have been computed. Overall, the improvement leads to a substantial increase in efficiency of auxiliary mixture sampling for highly structured models. The method is illustrated for finite mixtures of generalized linear models and an epidemiological case study.

5.
The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.

6.
Selection of the important variables is one of the most important model selection problems in statistical applications. In this article, we address variable selection in finite mixtures of generalized semiparametric models. To overcome the computational burden, we introduce a class of variable selection procedures for finite mixtures of generalized semiparametric models using a penalized approach for variable selection. The nonparametric component is estimated via multivariate kernel regression. It is shown that the new method is consistent for variable selection, and the performance of the proposed method is assessed via simulation.

7.
Analysis of the human sex ratio by using overdispersion models (cited 2 times: 1 self-citation, 1 citation by others)
For study of the human sex ratio, one of the most important data sets was collected in Saxony in the 19th century by Geissler. The data contain the sizes of families, with the sex of all children, at the time of registration of the birth of a child. These data are reanalysed to determine how the probability for each sex changes with family size. Three models for overdispersion are fitted: the beta–binomial model of Skellam, the 'multiplicative' binomial model of Altham and the double-binomial model of Efron. For each distribution, both the probability and the dispersion parameters are allowed to vary simultaneously with family size according to two separate regression equations. A finite mixture model is also fitted. The models are fitted using non-linear Poisson regression. They are compared using direct likelihood methods based on the Akaike information criterion. The multiplicative and beta–binomial models provide similar fits, substantially better than that of the double-binomial model. All models show that both the probability that the child is a boy and the dispersion are greater in larger families. There is also some indication that a point probability mass is needed for families containing children uniquely of one sex.
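
To make the notion of overdispersion concrete, the sketch below evaluates the standard beta–binomial probability mass function and compares its variance with that of a binomial with the same mean. It does not reproduce the regression models of the abstract; the family size `n = 12` and shape parameters `a = b = 6.0` are hypothetical values chosen only for illustration.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binom_pmf(y, n, a, b):
    """Beta-binomial pmf: C(n, y) * B(y + a, n - y + b) / B(a, b)."""
    return comb(n, y) * exp(log_beta(y + a, n - y + b) - log_beta(a, b))


# Hypothetical values: families of 12 children, symmetric shape parameters.
n, a, b = 12, 6.0, 6.0
pmf = [beta_binom_pmf(y, n, a, b) for y in range(n + 1)]

mean = sum(y * q for y, q in enumerate(pmf))
var = sum((y - mean) ** 2 * q for y, q in enumerate(pmf))

# Variance of a plain binomial with the same success probability a / (a + b).
p = a / (a + b)
binom_var = n * p * (1 - p)
```

With these values the beta–binomial variance exceeds the binomial variance by the factor (a + b + n) / (a + b + 1), which is the extra-binomial variation an overdispersion model is meant to capture.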

8.
In recent years, a variety of regression models, including zero-inflated and hurdle versions, have been proposed to explain the case of a dependent variable with respect to exogenous covariates. Apart from the classical Poisson, negative binomial and generalised Poisson distributions, many proposals have appeared in the statistical literature, perhaps in response to the new possibilities offered by advanced software that now enables researchers to implement numerous special functions in a relatively simple way. However, we believe that a significant research gap remains, since very little attention has been paid to the quasi-binomial distribution, which was first proposed over fifty years ago. We believe this distribution might constitute a valid alternative to existing regression models, in situations in which the variable has bounded support. Therefore, in this paper we present a zero-inflated regression model based on the quasi-binomial distribution, taking into account the moments and maximum likelihood estimators, and perform a score test to compare the zero-inflated quasi-binomial distribution with the zero-inflated binomial distribution, and the zero-inflated model with the homogeneous model (the model in which covariates are not considered). This analysis is illustrated with two data sets that are well known in the statistical literature and which contain a large number of zeros.

9.
A general model is proposed for flexibly estimating the density of a continuous response variable conditional on a possibly high-dimensional set of covariates. The model is a finite mixture of asymmetric Student t densities with covariate-dependent mixture weights. The four parameters of the components, the mean, degrees of freedom, scale and skewness, are all modeled as functions of the covariates. Inference is Bayesian and the computation is carried out using Markov chain Monte Carlo simulation. To enable model parsimony, a variable selection prior is used in each set of covariates and among the covariates in the mixing weights. The model is used to analyze the distribution of daily stock market returns, and is shown to forecast the distribution of returns more accurately than other widely used models for financial data.

10.
Summary. We consider a finite mixture model with k components and a kernel distribution from a general one-parameter family. The problem of testing the hypothesis k = 2 versus k ≥ 3 is studied. There has been no general statistical testing procedure for this problem. We propose a modified likelihood ratio statistic where under the null and the alternative hypotheses the estimates of the parameters are obtained from a modified likelihood function. It is shown that estimators of the support points are consistent. The asymptotic null distribution of the modified likelihood ratio test proposed is derived and found to be relatively simple and easily applied. Simulation studies for the asymptotic modified likelihood ratio test based on finite mixture models with normal, binomial and Poisson kernels suggest that the test proposed performs well. Simulation studies are also conducted for a bootstrap method with normal kernels. An example involving foetal movement data from a medical study illustrates the testing procedure.

11.
Despite their popularity and importance, there is limited work on using finite mixture models for data that come from complex survey designs. In this work, we explored the use of finite mixture regression models when the samples were drawn using a complex survey design. In particular, we considered modelling data collected under a stratified sampling design. We developed a new design-based inference in which we integrated sampling weights into the complete-data log-likelihood function. The expectation–maximisation algorithm was developed accordingly. A simulation study was conducted to compare the new methodology with the usual finite mixture of regression models. The comparison was done using the bias and variance components of the mean square error. Additionally, a simulation study was conducted to assess the ability of the Bayesian information criterion to select the optimal number of components under the proposed modelling approach. The methodology was implemented on real data with good results.

12.
Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with asymmetric behavior. In this paper, we introduce a variable selection procedure for FMR models using the skew-normal distribution. With appropriate choice of the tuning parameters, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. To estimate the parameters of the model, a modified EM algorithm for numerical computations is developed. The methodology is illustrated through numerical experiments and a real data example.

13.
The phenotype of a quantitative trait locus (QTL) is often modeled by a finite mixture of normal distributions. If the QTL effect depends on the number of copies of a specific allele one carries, then the mixture model has three components. In this case, the mixing proportions have a binomial structure according to the Hardy–Weinberg equilibrium. In the search for QTL, a significance test of homogeneity against the Hardy–Weinberg normal mixture model alternative is an important first step. The LOD score method, a likelihood ratio test used in genetics, is a favored choice. However, there is not yet a general theory for the limiting distribution of the likelihood ratio statistic in the presence of unknown variance. This paper derives the limiting distribution of the likelihood ratio statistic, which can be described by the supremum of a quadratic form of a Gaussian process. Further, the result implies that the distribution of the modified likelihood ratio statistic is well approximated by a chi-squared distribution. Simulation results show that the approximation has satisfactory precision for the cases considered. We also give a real-data example.

14.
We extend proportional hazards frailty models for lifetime data to allow a negative binomial, Poisson, geometric or other discrete distribution of the frailty variable. This might represent, for example, the unknown number of flaws in an item under test. Zero frailty corresponds to a limited failure model containing a proportion of units that never fail (long-term survivors). Ways of modifying the model to avoid this are discussed. The models are illustrated on a previously published set of data on failures of printed circuit boards and on new data on breaking strengths of samples of cord.

15.
Abstract

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with heavy tails and outliers. In this paper, we introduce a robust variable selection procedure for FMR models using the t distribution. With appropriate selection of the tuning parameters, the consistency and the oracle property of the regularized estimators are established. To estimate the parameters of the model, we develop an EM algorithm for numerical computations and a method for selecting tuning parameters adaptively. The parameter estimation performance of the proposed model is evaluated through simulation studies. The application of the proposed model is illustrated by analyzing a real data set.

16.
This paper addresses the problem of detecting a mixture of parallel regression lines when information about group membership of individual cases is not given. The problem is approached as a missing variable problem, with the missing variables being the dummy variables that code for groups. If a mixture of parallel regression lines with normally distributed error terms is present, a simple regression model without dummy variables will produce residuals that follow approximately a mixed normal distribution. In a simulation study, several goodness-of-fit tests of normality were used to test the residuals obtained from mis-specified models that excluded dummy variables. Factors varied in the simulation included the number and the separation of the parallel lines and the sample size. The goodness-of-fit test based on the sample kurtosis (b2) was overall most powerful in detecting mixtures of parallel regression lines. Applications are discussed.
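
The diagnostic idea above can be illustrated with a small simulation. The sketch below is not the study's design: it assumes two parallel lines with hypothetical intercepts 0 and 4, a common slope of 0.5, unit error variance and 2000 cases. It fits a single regression line that ignores the groups and computes the sample kurtosis b2 of the residuals, which would be close to 3 under normality.

```python
import random


def ols_residuals(x, y):
    """Residuals from a simple least-squares line fitted to (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


def sample_kurtosis(r):
    """Sample kurtosis b2 = m4 / m2**2 (about 3 for normal data)."""
    n = len(r)
    m = sum(r) / n
    m2 = sum((v - m) ** 2 for v in r) / n
    m4 = sum((v - m) ** 4 for v in r) / n
    return m4 / m2**2


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.uniform(0, 10) for _ in range(2000)]
group = [rng.randrange(2) for _ in x]  # unobserved group membership
# Two parallel lines: intercepts 0 and 4, common slope 0.5 (assumed values).
y = [4.0 * g + 0.5 * xi + rng.gauss(0, 1) for g, xi in zip(group, x)]

b2 = sample_kurtosis(ols_residuals(x, y))
```

For two well-separated groups of similar size the residuals are bimodal, so b2 falls clearly below 3; other configurations, such as unequal group sizes, can move it in the other direction.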

17.
For clustering mixed categorical and continuous data, Lawrence and Krzanowski (1996) proposed a finite mixture model in which component densities conform to the location model. In the graphical models literature the location model is known as the homogeneous Conditional Gaussian model. In this paper it is shown that their model is not identifiable without imposing additional restrictions. Specifically, for g groups and m locations, (g!)^(m-1) distinct sets of parameter values (not including permutations of the group mixing parameters) produce the same likelihood function. Excessive shrinkage of parameter estimates in a simulation experiment reported by Lawrence and Krzanowski (1996) is shown to be an artifact of the model's non-identifiability. Identifiable finite mixture models can be obtained by imposing restrictions on the conditional means of the continuous variables. These new identified models are assessed in simulation experiments. The conditional mean structure of the continuous variables in the restricted location mixture models is similar to that in the underlying variable mixture models proposed by Everitt (1988), but the restricted location mixture models are more computationally tractable.
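
To give a sense of how quickly the number of likelihood-equivalent parameter sets grows, the short sketch below evaluates the count of g! raised to the power m - 1 quoted in the abstract. The function name is hypothetical and the inputs are arbitrary illustrative values.

```python
from math import factorial


def equivalent_parameter_sets(g, m):
    """Number of distinct parameter sets giving the same likelihood,
    (g!) ** (m - 1), for g groups and m locations."""
    return factorial(g) ** (m - 1)


two_groups_three_locations = equivalent_parameter_sets(2, 3)    # 2! ** 2 = 4
three_groups_four_locations = equivalent_parameter_sets(3, 4)   # 6 ** 3 = 216
```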

18.
Abstract

In this article, we propose a new penalized-likelihood method to conduct model selection for finite mixture of regression models. The penalties are imposed on mixing proportions and regression coefficients, and hence order selection of the mixture and the variable selection in each component can be simultaneously conducted. The consistency of order selection and the consistency of variable selection are investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations are conducted to demonstrate the finite sample performance of the estimation procedure. The proposed methodology is further illustrated via real data analysis.

19.
Many probability distributions can be represented as compound distributions. Consider some parameter vector as random. The compound distribution is the expected distribution of the variable of interest given the random parameters. Our idea is to define a partition of the domain of definition of the random parameters, so that we can represent the expected density of the variable of interest as a finite mixture of conditional densities. We then model the mixture probabilities of the conditional densities using information on population categories, thus modifying the original overall model. We thus obtain specific models for sub-populations that stem from the overall model. The distribution of a sub-population of interest is thus completely specified in terms of mixing probabilities. All characteristics of interest can be derived from this distribution and the comparison between sub-populations easily proceeds from the comparison of the mixing probabilities. A real example based on EU-SILC data is given. Then the methodology is investigated through simulation.

20.
Summary. We propose a mixture of binomial and beta–binomial distributions for estimating the size of closed populations. The new mixture model is applied to several real capture–recapture data sets and is shown to provide a convenient, objective framework for model selection. The new model is compared with three alternative models in a simulation study, and the results shed light on the general performance of models in this area. The new model provides a robust flexible analysis, which automatically deals with small capture probabilities.
