Similar literature
 20 similar articles found (search time: 46 ms)
1.
We propose an extension of parametric product partition models (PPMs). We name our proposal nonparametric product partition models because we associate a random measure instead of a parametric kernel to each set within a random partition. Our methodology does not impose any specific form on the marginal distribution of the observations, allowing us to detect shifts of behaviour even when dealing with heavy-tailed or skewed distributions. We propose a suitable loss function and find the partition of the data having minimum expected loss. We then apply our nonparametric procedure to multiple change-point analysis and compare it with PPMs and with other methodologies that have recently appeared in the literature. Also, in the context of missing data, we exploit the product partition structure in order to estimate the distribution function of each missing value, allowing us to detect change points using the loss function mentioned above. Finally, we present applications to financial as well as genetic data.

2.
We consider the problem of random grouping of data from discrete distributions to form χ² goodness of fit tests. In general the random partitions need not converge to the partition one gets using the true distribution function and the same partitioning scheme. We present a method of guaranteeing the convergence of the random partition and yielding the usual χ² asymptotics.

3.
Covariate informed product partition models incorporate the intuitively appealing notion that individuals or units with similar covariate values a priori have a higher probability of co-clustering than those with dissimilar covariate values. These methods have been shown to perform well if the number of covariates is relatively small. However, as the number of covariates increases, their influence on partition probabilities overwhelms any information the response may provide in clustering and often encourages partitions with either a large number of singleton clusters or one large cluster, resulting in poor model fit and poor out-of-sample prediction. This same phenomenon is observed in Bayesian nonparametric regression methods that induce a conditional distribution for the response given covariates through a joint model. In light of this, we propose two methods that calibrate the covariate-dependent partition model by capping the influence that covariates have on partition probabilities. We demonstrate the new methods' utility using simulation and two publicly available datasets.

4.
In the causal analysis of survival data a time-based response is related to a set of explanatory variables. Defining the relation between the time and the covariates may become a difficult task, particularly in the preliminary stage, when the information is limited. Through a nonparametric approach, we propose to estimate the survival function in a way that allows us to evaluate the relative importance of each potential explanatory variable in a simple and explanatory fashion. To achieve this aim, each of the explanatory variables is used to partition the observed survival times. The observations are assumed to be partially exchangeable according to such a partition. We then consider, conditionally on each partition, a hierarchical nonparametric Bayesian model on the hazard functions. We define and compare different prior distributions for the hazard functions.

5.
This paper shows that the Fisher information in a partition of the space of a random variable X is less than or equal to the Fisher information of X about the unknown parameter. It also provides tables to construct β-sufficient partitions if X has an exponential distribution or a geometric distribution. A simulation study was conducted to show the advantage of using these tables over the usual methods used to construct histograms.
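
The inequality stated in this abstract can be checked numerically in the exponential case. The sketch below is only an illustration, not the paper's construction of β-sufficient partitions: it assumes X is exponential with rate `theta`, that the partition is given by positive, distinct cut points, and it uses the standard multinomial formula for the information in the cell indicator, the sum over cells of (dp_j/dθ)² / p_j. The function names are hypothetical.

```python
import math


def exponential_information(theta):
    """Fisher information about the rate theta in one Exponential(theta) draw."""
    return 1.0 / theta ** 2


def exponential_partition_information(theta, breakpoints):
    """Fisher information about theta carried by the cell that X falls into.

    Assumption: X ~ Exponential(rate=theta) and the cells are
    [0, b1), [b1, b2), ..., [bk, inf) for positive, distinct breakpoints.
    """
    edges = [0.0] + sorted(breakpoints)
    # Survival function S(a) = exp(-theta * a) and its derivative in theta,
    # padded with 0 for the right end point at infinity.
    surv = [math.exp(-theta * a) for a in edges] + [0.0]
    dsurv = [-a * math.exp(-theta * a) for a in edges] + [0.0]
    info = 0.0
    for j in range(len(edges)):
        p = surv[j] - surv[j + 1]      # probability of cell j
        dp = dsurv[j] - dsurv[j + 1]   # derivative of that probability
        info += dp * dp / p
    return info


if __name__ == "__main__":
    theta = 1.0
    for cuts in ([1.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0, 2.0, 4.0]):
        print(cuts, exponential_partition_information(theta, cuts),
              "<=", exponential_information(theta))
```

Under these assumptions, refining the cut points moves the partition information closer to 1/θ² without exceeding it, which is the sense in which a well-chosen partition retains most of the information in X.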

6.
We consider probability models for the estimation of normal means that allow for equality among arbitrary subsets of the means. These models, which are called product partition models, assign probabilities to random partitions of sets of objects. Here, the objects correspond to the means. We look at two interesting cases: the first is when all the means are equal and the second is when there are several sets of equal means that are far apart. We show that the posterior distribution of the number of sets in the partition is asymptotically Poisson-like. This will help us calibrate the choice of one of our prior parameters. Finally, we look at simulations to see how well the above results hold for moderate sample sizes.
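
As a small illustration of how a product partition model assigns probabilities to partitions, the sketch below enumerates every partition of a few objects and weights each one by the product of a cohesion function over its blocks. The cohesion c(S) = mass · (|S| − 1)! is an assumption chosen for illustration (it is the choice that reproduces the partition law of a Dirichlet process with total mass `mass`); the abstract does not state which cohesion or prior parameters the paper uses, and all function names here are hypothetical.

```python
import math


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or open a new block containing only `first`.
        yield [[first]] + partition


def cohesion(block, mass=1.0):
    """Assumed cohesion c(S) = mass * (|S| - 1)!; not taken from the paper."""
    return mass * math.factorial(len(block) - 1)


def partition_prior(items, mass=1.0):
    """Return (partition, probability) pairs, probability ∝ product of cohesions."""
    weighted = [
        (partition, math.prod(cohesion(block, mass) for block in partition))
        for partition in set_partitions(list(items))
    ]
    total = sum(weight for _, weight in weighted)
    return [(partition, weight / total) for partition, weight in weighted]


if __name__ == "__main__":
    for partition, prob in partition_prior(["mu1", "mu2", "mu3"]):
        print(partition, round(prob, 4))
```

Full enumeration is only feasible for a handful of objects, since the number of partitions grows as the Bell numbers; it is meant to make the prior concrete, not to suggest a computational strategy.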

7.
Some conditional models to deal with binary longitudinal responses are proposed, extending random effects models to include serial dependence of Markovian form, and hence allowing for quite general association structures between repeated observations recorded on the same individual. The presence of both these components implies a form of dependence between them, and so a complicated expression for the resulting likelihood. To handle this problem, we introduce, as a first instance, what Follmann and Wu (1995) called, in a different setting, an approximate conditional model, which represents an optimal choice for the general framework of categorical longitudinal responses. Then we define two more formally correct models for the binary case, with no assumption about the distribution of the random effect. All of the discussed models are estimated by means of an EM algorithm for nonparametric maximum likelihood. The algorithm, an adaptation of that used by Aitkin (1996) for the analysis of overdispersed generalized linear models, is initially derived as a form of Gaussian quadrature, and then extended to a completely unknown mixing distribution. A large scale simulation work is described to explore the behaviour of the proposed approaches in a number of different situations.

8.
Nonparametric inference for point processes is discussed by way of histograms, which provide a nice tool for the analysis of on-line data. The construction of histograms depends on a sequence of partitions, which we take to be nonembedded to allow partitions with sets of equal measure. This presents some theoretical problems, which are addressed with an assumption on the decomposition of second order moments. In another direction, we drop the usual independence assumption on the sample, replacing it by a strong mixing assumption. Under this setting, we study the convergence of the histogram in probability, which depends on approximation conditions between the distributions of random pairs and the product of their marginal distributions, and almost completely, which is based on the decomposition of the second order moments. This last convergence is stated in two versions according to the assumption of Laplace transforms or the Cramér moment conditions. These are somewhat stronger, but enable us to recover the usual condition on the decrease rate of sets on each partition. In the final section we prove that the finite dimensional distributions converge in distribution to a Gaussian centered vector with a specified covariance.

9.
Many probability distributions can be represented as compound distributions. Consider some parameter vector as random. The compound distribution is the expected distribution of the variable of interest given the random parameters. Our idea is to define a partition of the domain of definition of the random parameters, so that we can represent the expected density of the variable of interest as a finite mixture of conditional densities. We then model the mixture probabilities of the conditional densities using information on population categories, thus modifying the original overall model. We thus obtain specific models for sub-populations that stem from the overall model. The distribution of a sub-population of interest is thus completely specified in terms of mixing probabilities. All characteristics of interest can be derived from this distribution and the comparison between sub-populations easily proceeds from the comparison of the mixing probabilities. A real example based on EU-SILC data is given. Then the methodology is investigated through simulation.
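
The decomposition described in this abstract can be written out in generic notation. The symbols below (parameter density g on Θ, partition cells A_1, ..., A_J, mixing probabilities π_j and conditional densities f_j) are assumptions introduced for illustration, not the paper's own notation, and each cell is assumed to have positive probability.

```latex
\begin{align*}
f(x) &= \int_{\Theta} f(x \mid \theta)\, g(\theta)\, d\theta
      = \sum_{j=1}^{J} \int_{A_j} f(x \mid \theta)\, g(\theta)\, d\theta
      = \sum_{j=1}^{J} \pi_j\, f_j(x), \\
\pi_j &= \int_{A_j} g(\theta)\, d\theta, \qquad
f_j(x) = \frac{1}{\pi_j} \int_{A_j} f(x \mid \theta)\, g(\theta)\, d\theta .
\end{align*}
```

Read this way, a sub-population model keeps the conditional densities f_j fixed and replaces the overall weights π_j with weights specific to that sub-population, which is why comparing sub-populations reduces to comparing mixing probabilities.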

10.
This paper examines the use of Dirichlet process mixtures for curve fitting. An important modelling aspect in this setting is the choice between constant and covariate-dependent weights. By examining the problem of curve fitting from a predictive perspective, we show the advantages of using covariate-dependent weights. These advantages are a result of the incorporation of covariate proximity in the latent partition. However, closer examination of the partition yields further complications, which arise from the vast number of total partitions. To overcome this, we propose to modify the probability law of the random partition to strictly enforce the notion of covariate proximity, while still maintaining certain properties of the Dirichlet process. This allows the distribution of the partition to depend on the covariate in a simple manner and greatly reduces the total number of possible partitions, resulting in improved curve fitting and faster computations. Numerical illustrations are presented.

11.
This paper presents a generalization of the partition of the chi-squared statistic presented in Beh & Davy (1998). For a three-way contingency table with one or two sets of ordered categories, the chi-squared statistic partition is defined using orthogonal polynomials. Using this partition, information about the relationship between the variables can be obtained by identifying important associations in terms of the location (linear), dispersion (quadratic) and higher order components. The paper compares these partitions with log-linear models for ordinal data.

12.
Robust mixture modelling using the t distribution   (citations: 2 in total, 0 self-citations, 2 by others)
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer-than-normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.

13.
Categorical data frequently arise in applications in the Social Sciences. In such applications, the class of log-linear models, based on either a Poisson or (product) multinomial response distribution, is a flexible model class for inference and prediction. In this paper we consider the Bayesian analysis of both Poisson and multinomial log-linear models. It is often convenient to model multinomial or product multinomial data as observations of independent Poisson variables. For multinomial data, Lindley (1964) [20] showed that this approach leads to valid Bayesian posterior inferences when the prior density for the Poisson cell means factorises in a particular way. We develop this result to provide a general framework for the analysis of multinomial or product multinomial data using a Poisson log-linear model. Valid finite population inferences are also available, which can be particularly important in modelling social data. We then focus particular attention on multivariate normal prior distributions for the log-linear model parameters. Here, an improper prior distribution for certain Poisson model parameters is required for valid multinomial analysis, and we derive conditions under which the resulting posterior distribution is proper. We also consider the construction of prior distributions across models, and for model parameters, when uncertainty exists about the appropriate form of the model. We present classes of Poisson and multinomial models, invariant under certain natural groups of permutations of the cells. We demonstrate that, if prior belief concerning the model parameters is also invariant, as is the case in a 'reference' analysis, then the choice of prior distribution is considerably restricted. The analysis of multivariate categorical data in the form of a contingency table is considered in detail. We illustrate the methods with two examples.

14.
Longitudinal studies of a binary outcome are common in the health, social, and behavioral sciences. In general, a feature of random effects logistic regression models for longitudinal binary data is that the marginal functional form, when integrated over the distribution of the random effects, is no longer of logistic form. Recently, Wang and Louis (2003) proposed a random intercept model in the clustered binary data setting where the marginal model has a logistic form. An acknowledged limitation of their model is that it allows only a single random effect that varies from cluster to cluster. In this paper, we propose a modification of their model to handle longitudinal data, allowing separate, but correlated, random intercepts at each measurement occasion. The proposed model allows for a flexible correlation structure among the random intercepts, where the correlations can be interpreted in terms of Kendall's τ. For example, the marginal correlations among the repeated binary outcomes can decline with increasing time separation, while the model retains the property of having matching conditional and marginal logit link functions. Finally, the proposed method is used to analyze data from a longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women.

15.
Many recent applications of nonparametric Bayesian inference use random partition models, i.e. probability models for clustering a set of experimental units. We review the popular basic constructions. We then focus on an interesting extension of such models. In many applications covariates are available that could be used to a priori inform the clustering. This leads to random clustering models indexed by covariates, i.e., regression models with the outcome being a partition of the experimental units. We discuss some alternative approaches that have been used in the recent literature to implement such models, with an emphasis on a recently proposed extension of product partition models. Several of the reviewed approaches were not originally intended as covariate-based random partition models, but can be used for such inference.

16.
We propose a random partition model that implements prediction with many candidate covariates and interactions. The model is based on a modified product partition model that includes a regression on covariates by favouring homogeneous clusters in terms of these covariates. Additionally, the model allows for a cluster-specific choice of the covariates that are included in this evaluation of homogeneity. The variable selection is implemented by introducing a set of cluster-specific latent indicators that include or exclude covariates. The proposed model is motivated by an application to predicting mortality in an intensive care unit in Lisboa, Portugal.

17.
Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large and hence interesting, it might be informative to partition the quadratic form into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition that is determined will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that maximises the sum of the correlations between individual variables and the variables to which they transform under a constraint. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear-cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is adapted so that it has an invariance property under such rotation, whereby the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one- and two-sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a partition are readily obtained.

18.
Clusterwise regression aims to cluster data sets where the clusters are characterized by their specific regression coefficients in a linear regression model. In this paper, we propose a method for determining a partition which uses an idea of robust regression. We start with some random weighting to determine a start partition and continue in the spirit of M-estimators. The residuals for all regressions are used to assign the observations to the different groups. As the target function we use the determination coefficient $R^{2}_{w}$ for the overall model. This coefficient is suitably defined for weighted regression.

19.
Cell lineage data consist of observations on quantitative characteristics of the descendants of an initial cell. The bifurcating autoregressive model has been previously used to model dependencies in cell lineage data by considering each line of descent to be a first order autoregressive process and allowing the environmental effects of sisters to be correlated. Here the basic bifurcating autoregressive model is modified to include random coefficients, which allow the relationship between mother and daughter cells to depend on environmental factors. Maximum likelihood inference under the assumption of multivariate normality is considered and the method is illustrated on several data sets.

20.
We study the nonparametric maximum likelihood estimate (NPMLE) of the cdf or sub-distribution functions of the failure time for the failure causes in a series system. The study is motivated by cancer research data (from the Memorial Sloan-Kettering Cancer Center) with interval-censored time and masked failure cause. The NPMLE based on this data set suggests that the existing masking models are not appropriate. We propose a new model called the random partition masking (RPM) model, which does not rely on the commonly used symmetry assumption (namely, given the failure cause, the probability of observing the masked failure causes is independent of the failure time; see Flehinger et al. Inference about defects in the presence of masking, Technometrics 38 (1996), pp. 247–255). The RPM model is easier to implement in simulation studies than the existing models. We discuss the algorithms for computing the NPMLE and study its asymptotic properties. Our simulation and data analysis indicate that the NPMLE is feasible for a moderate sample size.
