首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
SUMMARY Using San Francisco city clinic cohort data, we estimate the HIV seroconversion distribution by both non-parametric and parametric methods, and illustrate the effects of age on this distribution. The non-parametric methods include the Turnbull method, the Bacchetti method, the expectation, maximization and smoothing (EMS) method and the penalized spline method. The seroconversion density curves estimated by these nonparametric methods are of bimodal nature with obvious effects of age. As a result of the bimodal nature of the seroconversion curves, the parametric models considered are mixtures of two distributions taken from the generalized log-logistic distribution with three parameters, the Weibull distribution and the log-normal distribution. In terms of the logarithm of the likelihood values, it appears that the non-parametric methods with smoothing as well as without smoothing (i.e. the Turnbull method) provided much better fits than did the parametric models. Among the non-parametric methods, the EMS and the spline estimates are more appealing, because the unsmoothed Turnbull estimates are very unstable and because the Bacchetti estimates have a longer tail. Among the parametric models, the mixture of a generalized log-logistic distribution with three parameters and a Weibull distribution or a log-normal distribution provided better fits than did other mixtures of parametric models.  相似文献   

2.
Abstract.  We consider large sample inference in a semiparametric logistic/proportional-hazards mixture model. This model has been proposed to model survival data where there exists a positive portion of subjects in the population who are not susceptible to the event under consideration. Previous studies of the logistic/proportional-hazards mixture model have focused on developing point estimation procedures for the unknown parameters. This paper studies large sample inferences based on the semiparametric maximum likelihood estimator. Specifically, we establish existence, consistency and asymptotic normality results for the semiparametric maximum likelihood estimator. We also derive consistent variance estimates for both the parametric and non-parametric components. The results provide a theoretical foundation for making large sample inference under the logistic/proportional-hazards mixture model.  相似文献   

3.
Abstract.  We consider a two-component mixture model where one component distribution is known while the mixing proportion and the other component distribution are unknown. These kinds of models were first introduced in biology to study the differences in expression between genes. The various estimation methods proposed till now have all assumed that the unknown distribution belongs to a parametric family. In this paper, we show how this assumption can be relaxed. First, we note that generally the above model is not identifiable, but we show that under moment and symmetry conditions some 'almost everywhere' identifiability results can be obtained. Where such identifiability conditions are fulfilled we propose an estimation method for the unknown parameters which is shown to be strongly consistent under mild conditions. We discuss applications of our method to microarray data analysis and to the training data problem. We compare our method to the parametric approach using simulated data and, finally, we apply our method to real data from microarray experiments.  相似文献   

4.
Consider the problem of covariance analysis based on regression models whose regression function is the sum of a linear and a non-parametric component. We propose a parametric and a non-parametric statistical test to compare the effects of the linear and non-parametric components, respectively, on the response variable in   L ≥ 2  groups. Serially correlated errors within each group are allowed. The first (second) test compares the differences between the estimates of the parametric (non-parametric) components of each group by means of a Mahalanobis  ( L 2)  distance. The asymptotic distribution of each statistic under the null hypothesis is obtained. A modest simulation study and an application to a real data set illustrate our methodology.  相似文献   

5.
This paper presents an EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion. The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully non-parametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters may be sensitive to the specification of a parametric form for the mixing distribution. A listing of a GLIM4 algorithm for fitting the overdispersed binomial logit model is given in an appendix.A simple method is given for obtaining correct standard errors for parameter estimates when using the EM algorithm.Several examples are discussed.  相似文献   

6.
A finite mixture model is considered in which the mixing probabilities vary from observation to observation. A parametric model is assumed for one mixture component distribution, while the others are nonparametric nuisance parameters. Generalized estimating equations (GEE) are proposed for the semi‐parametric estimation. Asymptotic normality of the GEE estimates is demonstrated and the lower bound for their dispersion (asymptotic covariance) matrix is derived. An adaptive technique is developed to derive estimates with nearly optimal small dispersion. An application to the sociological analysis of voting results is discussed. The Canadian Journal of Statistics 41: 217–236; 2013 © 2013 Statistical Society of Canada  相似文献   

7.
It has been found that, for a variety of probability distributions, there is a surprising linear relation between mode, mean, and median. In this article, the relation between mode, mean, and median regression functions is assumed to follow a simple parametric model. We propose a semiparametric conditional mode (mode regression) estimation for an unknown (unimodal) conditional distribution function in the context of regression model, so that any m-step-ahead mean and median forecasts can then be substituted into the resultant model to deliver m-step-ahead mode prediction. In the semiparametric model, Least Squared Estimator (LSEs) for the model parameters and the simultaneous estimation of the unknown mean and median regression functions by the local linear kernel method are combined to infer about the parametric and nonparametric components of the proposed model. The asymptotic normality of these estimators is derived, and the asymptotic distribution of the parameter estimates is also given and is shown to follow usual parametric rates in spite of the presence of the nonparametric component in the model. These results are applied to obtain a data-based test for the dependence of mode regression over mean and median regression under a regression model.  相似文献   

8.
Insurance and economic data are frequently characterized by positivity, skewness, leptokurtosis, and multi-modality; although many parametric models have been used in the literature, often these peculiarities call for more flexible approaches. Here, we propose a finite mixture of contaminated gamma distributions that provides a better characterization of data. It is placed in between parametric and non-parametric density estimation and strikes a balance between these alternatives, as a large class of densities can be implemented. We adopt a maximum likelihood approach to estimate the model parameters, providing the likelihood and the expected-maximization algorithm implemented to estimate all unknown parameters. We apply our approach to an artificial dataset and to two well-known datasets as the workers compensation data and the healthcare expenditure data taken from the medical expenditure panel survey. The Value-at-Risk is evaluated and comparisons with other benchmark models are provided.  相似文献   

9.
We developed a flexible non-parametric Bayesian model for regional disease-prevalence estimation based on cross-sectional data that are obtained from several subpopulations or clusters such as villages, cities, or herds. The subpopulation prevalences are modeled with a mixture distribution that allows for zero prevalence. The distribution of prevalences among diseased subpopulations is modeled as a mixture of finite Polya trees. Inferences can be obtained for (1) the proportion of diseased subpopulations in a region, (2) the distribution of regional prevalences, (3) the mean and median prevalence in the region, (4) the prevalence of any sampled subpopulation, and (5) predictive distributions of prevalences for regional subpopulations not included in the study, including the predictive probability of zero prevalence. We focus on prevalence estimation using data from a single diagnostic test, but we also briefly discuss the scenario where two conditionally dependent (or independent) diagnostic tests are used. Simulated data demonstrate the utility of our non-parametric model over parametric analysis. An example involving brucellosis in cattle is presented.  相似文献   

10.
Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many areas of application require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component; therefore, the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally non-trivial due to the difficulties in justifying the exact number of components to be used and due to the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and smaller false non-discovery rate. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.  相似文献   

11.
Abstract

In this paper we introduce continuous tree mixture model that is the mixture of undirected graphical models with tree structured graphs and is considered as multivariate analysis with a non parametric approach. We estimate its parameters, the component edge sets and mixture proportions through regularized maximum likalihood procedure. Our new algorithm, which uses expectation maximization algorithm and the modified version of Kruskal algorithm, simultaneosly estimates and prunes the mixture component trees. Simulation studies indicate this method performs better than the alternative Gaussian graphical mixture model. The proposed method is also applied to water-level data set and is compared with the results of Gaussian mixture model.  相似文献   

12.
A semiparametric two-component mixture model is considered, in which the distribution of one (primary) component is unknown and assumed symmetric. The distribution of the other component (admixture) is known. Generalized estimating equations are constructed for the estimation of the mixture proportion and the location parameter of the primary component. Asymptotic normality of the estimates is demonstrated and the lower bound for the asymptotic covariance matrix is obtained. An adaptive estimation technique is proposed to obtain the estimates with nearly optimal asymptotic variances.  相似文献   

13.
The interval-censored survival data appear very frequently, where the event of interest is not observed exactly but it is only known to occur within some time interval. In this paper, we propose a location-scale regression model based on the log-generalized gamma distribution for modelling interval-censored data. We shall be concerned only with parametric forms. The proposed model for interval-censored data represents a parametric family of models that has, as special submodels, other regression models which are broadly used in lifetime data analysis. Assuming interval-censored data, we consider a frequentist analysis, a Jackknife estimator and a non-parametric bootstrap for the model parameters. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some techniques to perform global influence.  相似文献   

14.
Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm where the observed data log-likelihood function is maximized. This paper proposes an alternative approach for fitting finite mixture models. Our method, called the iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability of a data point belonging to a particular mixing component given that the data point value is obtained, it then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method. It finally updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with some other algorithms for fitting mixture normal, and mixture t, densities.  相似文献   

15.
Abstract.  We study a semiparametric generalized additive coefficient model (GACM), in which linear predictors in the conventional generalized linear models are generalized to unknown functions depending on certain covariates, and approximate the non-parametric functions by using polynomial spline. The asymptotic expansion with optimal rates of convergence for the estimators of the non-parametric part is established. Semiparametric generalized likelihood ratio test is also proposed to check if a non-parametric coefficient can be simplified as a parametric one. A conditional bootstrap version is suggested to approximate the distribution of the test under the null hypothesis. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed methods. We further apply the proposed model and methods to a data set from a human visceral Leishmaniasis study conducted in Brazil from 1994 to 1997. Numerical results outperform the traditional generalized linear model and the proposed GACM is preferable.  相似文献   

16.
In this paper, we consider a new mixture of varying coefficient models, in which each mixture component follows a varying coefficient model and the mixing proportions and dispersion parameters are also allowed to be unknown smooth functions. We systematically study the identifiability, estimation and inference for the new mixture model. The proposed new mixture model is rather general, encompassing many mixture models as its special cases such as mixtures of linear regression models, mixtures of generalized linear models, mixtures of partially linear models and mixtures of generalized additive models, some of which are new mixture models by themselves and have not been investigated before. The new mixture of varying coefficient model is shown to be identifiable under mild conditions. We develop a local likelihood procedure and a modified expectation–maximization algorithm for the estimation of the unknown non‐parametric functions. Asymptotic normality is established for the proposed estimator. A generalized likelihood ratio test is further developed for testing whether some of the unknown functions are constants. We derive the asymptotic distribution of the proposed generalized likelihood ratio test statistics and prove that the Wilks phenomenon holds. The proposed methodology is illustrated by Monte Carlo simulations and an analysis of a CO2‐GDP data set.  相似文献   

17.
As the treatments of cancer progress, a certain number of cancers are curable if diagnosed early. In population‐based cancer survival studies, cure is said to occur when mortality rate of the cancer patients returns to the same level as that expected for the general cancer‐free population. The estimates of cure fraction are of interest to both cancer patients and health policy makers. Mixture cure models have been widely used because the model is easy to interpret by separating the patients into two distinct groups. Usually parametric models are assumed for the latent distribution for the uncured patients. The estimation of cure fraction from the mixture cure model may be sensitive to misspecification of latent distribution. We propose a Bayesian approach to mixture cure model for population‐based cancer survival data, which can be extended to county‐level cancer survival data. Instead of modeling the latent distribution by a fixed parametric distribution, we use a finite mixture of the union of the lognormal, loglogistic, and Weibull distributions. The parameters are estimated using the Markov chain Monte Carlo method. Simulation study shows that the Bayesian method using a finite mixture latent distribution provides robust inference of parameter estimates. The proposed Bayesian method is applied to relative survival data for colon cancer patients from the Surveillance, Epidemiology, and End Results (SEER) Program to estimate the cure fractions. The Canadian Journal of Statistics 40: 40–54; 2012 © 2012 Statistical Society of Canada  相似文献   

18.
Abstract.  We consider marginal semiparametric partially linear models for longitudinal/clustered data and propose an estimation procedure based on a spline approximation of the non-parametric part of the model and an extension of the parametric marginal generalized estimating equations (GEE). Our estimates of both parametric part and non-parametric part of the model have properties parallel to those of parametric GEE, that is, the estimates are efficient if the covariance structure is correctly specified and they are still consistent and asymptotically normal even if the covariance structure is misspecified. By showing that our estimate achieves the semiparametric information bound, we actually establish the efficiency of estimating the parametric part of the model in a stronger sense than what is typically considered for GEE. The semiparametric efficiency of our estimate is obtained by assuming only conditional moment restrictions instead of the strict multivariate Gaussian error assumption.  相似文献   

19.
In biomedical research and diagnostic practice it is common to classify objects dichotomously based on continuous observations (x) measuring some form of biological activity, where some proportion of the objects have a level of activity above background. In this paper, we consider the problem of estimating the proportion of positive objects for a typical assay where:(i) the distribution of x for positive objects is unknown. although (ii) the risk of positivity is known to be a monotonic function of x:and (iii) x has been measured for a set of negative control objects. Monte Carlo simulations evaluating four alternative estimators of the positivity, including novel non-parametric mixture decompositions, indicate that where the positives and negatives have distributions of x with a moderate degree of overlap, a non-parametric decomposition using a latent class model provides precise and close to unbiased estimates. The methods are illustrated using data from an autoradiography assay used in cell biology.  相似文献   

20.
Kinetic models are used extensively in science, engineering, and medicine. Mathematically, they are a set of coupled differential equations including a source function, otherwise known as an input function. We investigate whether parametric modeling of a noisy input function offers any benefit over the non-parametric input function in estimating kinetic parameters. Our analysis includes four formulations of Bayesian posteriors of model parameters where noise is taken into account in the likelihood functions. Posteriors are determined numerically with a Markov chain Monte Carlo simulation. We compare point estimates derived from the posteriors to a weighted non-linear least squares estimate. Results imply that parametric modeling of the input function does not improve the accuracy of model parameters, even with perfect knowledge of the functional form. Posteriors are validated using an unconventional utilization of the χ2-test. We demonstrate that if the noise in the input function is not taken into account, the resulting posteriors are incorrect.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号