首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Colours and Cocktails: Compositional Data Analysis 2013 Lancaster Lecture   总被引:1,自引:0,他引:1  
The different constituents of physical mixtures such as coloured paint, cocktails, geological and other samples can be represented by d‐dimensional vectors called compositions with non‐negative components that sum to one. Data in which the observations are compositions are called compositional data. There are a number of different ways of thinking about and consequently analysing compositional data. The log‐ratio methods proposed by Aitchison in the 1980s have become the dominant methods in the field. One reason for this is the development of normative arguments converting the properties of log‐ratio methods to ‘essential requirements’ or Principles for any method of analysis to satisfy. We discuss different ways of thinking about compositional data and interpret the development of the Principles in terms of these different viewpoints. We illustrate the properties on which the Principles are based, focussing particularly on the key subcompositional coherence property. We show that this Principle is based on implicit assumptions and beliefs that do not always hold. Moreover, it is applied selectively because it is not actually satisfied by the log‐ratio methods it is intended to justify. This implies that a more open statistical approach to compositional data analysis should be adopted.  相似文献   

2.
This paper is mainly concerned with modelling data from degradation sample paths over time. It uses a general growth curve model with Box‐Cox transformation, random effects and ARMA(p, q) dependence to analyse a set of such data. A maximum likelihood estimation procedure for the proposed model is derived and future values are predicted, based on the best linear unbiased prediction. The paper compares the proposed model with a nonlinear degradation model from a prediction point of view. Forecasts of failure times with various data lengths in the sample are also compared.  相似文献   

3.
The Cox proportional frailty model with a random effect has been proposed for the analysis of right-censored data which consist of a large number of small clusters of correlated failure time observations. For right-censored data, Cai et al. [3] proposed a class of semiparametric mixed-effects models which provides useful alternatives to the Cox model. We demonstrate that the approach of Cai et al. [3] can be used to analyze clustered doubly censored data when both left- and right-censoring variables are always observed. The asymptotic properties of the proposed estimator are derived. A simulation study is conducted to investigate the performance of the proposed estimator.  相似文献   

4.
In the analysis of semi‐competing risks data interest lies in estimation and inference with respect to a so‐called non‐terminal event, the observation of which is subject to a terminal event. Multi‐state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non‐terminal and terminal events specified, in part, by a unit‐specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ2. When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi‐competing risks analysis that permit the non‐parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi‐parametric efficient score under the complete data setting and propose a non‐parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators is derived and small‐sample operating characteristics evaluated via simulation. Although the proposed semi‐parametric transformation model and non‐parametric score imputation method are motivated by the analysis of semi‐competing risks data, they are broadly applicable to any analysis of multivariate time‐to‐event outcomes in which a unit‐specific shared frailty is used to account for correlation. Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer.  相似文献   

5.
In this paper, we develop a semiparametric regression model for longitudinal skewed data. In the new model, we allow the transformation function and the baseline function to be unknown. The proposed model can provide a much broader class of models than the existing additive and multiplicative models. Our estimators for regression parameters, transformation function and baseline function are asymptotically normal. Particularly, the estimator for the transformation function converges to its true value at the rate n ? 1 ∕ 2, the convergence rate that one could expect for a parametric model. In simulation studies, we demonstrate that the proposed semiparametric method is robust with little loss of efficiency. Finally, we apply the new method to a study on longitudinal health care costs.  相似文献   

6.
ABSTRACT

Incremental modelling of data streams is of great practical importance, as shown by its applications in advertising and financial data analysis. We propose two incremental covariance matrix decomposition methods for a compositional data type. The first method, exact incremental covariance decomposition of compositional data (C-EICD), gives an exact decomposition result. The second method, covariance-free incremental covariance decomposition of compositional data (C-CICD), is an approximate algorithm that can efficiently compute high-dimensional cases. Based on these two methods, many frequently used compositional statistical models can be incrementally calculated. We take multiple linear regression and principle component analysis as examples to illustrate the utility of the proposed methods via extensive simulation studies.  相似文献   

7.
In the present article, we discuss the regression of a point on the surface of a unit sphere in d dimensions given a point on the surface of a unit sphere in p dimensions, where p may not be equal to d. Point projection is added to the rotation and linear transformation for regression link function. The identifiability of the model is proved. Then, parameter estimation in this set up is discussed. Simulation studies and data analyses are done to illustrate the model.  相似文献   

8.
Given two independent samples of size n and m drawn from univariate distributions with unknown densities f and g, respectively, we are interested in identifying subintervals where the two empirical densities deviate significantly from each other. The solution is built by turning the nonparametric density comparison problem into a comparison of two regression curves. Each regression curve is created by binning the original observations into many small size bins, followed by a suitable form of root transformation to the binned data counts. Turned as a regression comparison problem, several nonparametric regression procedures for detection of sparse signals can be applied. Both multiple testing and model selection methods are explored. Furthermore, an approach for estimating larger connected regions where the two empirical densities are significantly different is also derived, based on a scale-space representation. The proposed methods are applied on simulated examples as well as real-life data from biology.  相似文献   

9.
Abstract. Suppose the random vector (X,Y) satisfies the regression model Y = m(X) + σ (X) ? , where m (?) and σ (?) are unknown location and scale functions and ? is independent of X. The response Y is subject to random right censoring, and the covariate X is completely observed. A new test for a specific parametric form of any scale function σ (?) (including the standard deviation function) is proposed. Its statistic is based on the distribution of the residuals obtained from the assumed regression model. Weak convergence of the corresponding process is obtained, and its finite sample behaviour is studied via simulations. Finally, characteristics of the test are illustrated in the analysis of a fatigue data set.  相似文献   

10.
Failure time data occur in many areas and in various censoring forms and many models have been proposed for their regression analysis such as the proportional hazards model and the proportional odds model. Another choice that has been discussed in the literature is a general class of semiparmetric transformation models, which include the two models above and many others as special cases. In this paper, we consider this class of models when one faces a general type of censored data, case K informatively interval-censored data, for which there does not seem to exist an established inference procedure. For the problem, we present a two-step estimation procedure that is quite flexible and can be easily implemented, and the consistency and asymptotic normality of the proposed estimators of regression parameters are established. In addition, an extensive simulation study is conducted and suggests that the proposed procedure works well for practical situations. An application is also provided.  相似文献   

11.
The authors propose a robust transformation linear mixed‐effects model for longitudinal continuous proportional data when some of the subjects exhibit outlying trajectories over time. It becomes troublesome when including or excluding such subjects in the data analysis results in different statistical conclusions. To robustify the longitudinal analysis using the mixed‐effects model, they utilize the multivariate t distribution for random effects or/and error terms. Estimation and inference in the proposed model are established and illustrated by a real data example from an ophthalmology study. Simulation studies show a substantial robustness gain by the proposed model in comparison to the mixed‐effects model based on Aitchison's logit‐normal approach. As a result, the data analysis benefits from the robustness of making consistent conclusions in the presence of influential outliers. The Canadian Journal of Statistics © 2009 Statistical Society of Canada  相似文献   

12.
灰色成分数据模型在中国产业结构分析预测中的应用   总被引:3,自引:0,他引:3  
针对成分数据这种特殊类型的统计数据,提出一种新的预测建模方法:对于一列按照时间顺序收集的成分数据,先运用对数变换使成分数据降维,然后对降维后的数据运用GM(1,1)模型进行预测,最后再将预测值进行反对数变换,从而得到了各成分的预测值.根据提出的方法,建立了中国产业结构的预测模型,并分析了中国产业结构的发展趋势和未来状况.经检验,运用该方法预测出的数据与实际值十分吻合.  相似文献   

13.
Biplots represent a widely used statistical tool for visualizing the resulting loadings and scores of a dimension reduction technique applied to multivariate data. If the underlying data carry only relative information (i.e. compositional data expressed in proportions, mg/kg, etc.) they have to be pre-processed with a logratio transformation before the dimension reduction is carried out. In the context of principal component analysis, the resulting biplot is called compositional biplot. We introduce an alternative, the ilr biplot, which is based on a special choice of orthonormal coordinates resulting from an isometric logratio (ilr) transformation. This allows to incorporate also external non-compositional variables, and to study the relations to the compositional variables. The methodology is demonstrated on real data sets.  相似文献   

14.
Multivariate normal distribution approaches for dependently truncated data   总被引:1,自引:1,他引:0  
Many statistical methods for truncated data rely on the independence assumption regarding the truncation variable. In many application studies, however, the dependence between a variable X of interest and its truncation variable L plays a fundamental role in modeling data structure. For truncated data, typical interest is in estimating the marginal distributions of (L, X) and often in examining the degree of the dependence between X and L. To relax the independence assumption, we present a method of fitting a parametric model on (L, X), which can easily incorporate the dependence structure on the truncation mechanisms. Focusing on a specific example for the bivariate normal distribution, the score equations and Fisher information matrix are provided. A robust procedure based on the bivariate t-distribution is also considered. Simulations are performed to examine finite-sample performances of the proposed method. Extension of the proposed method to doubly truncated data is briefly discussed.  相似文献   

15.
The analysis of compositional data using the log-ratio approach is based on ratios between the compositional parts. Zeros in the parts thus cause serious difficulties for the analysis. This is a particular problem in case of structural zeros, which cannot be simply replaced by a non-zero value as it is done, e.g. for values below detection limit or missing values. Instead, zeros to be incorporated into further statistical processing. The focus is on exploratory tools for identifying outliers in compositional data sets with structural zeros. For this purpose, Mahalanobis distances are estimated, computed either directly for subcompositions determined by their zero patterns, or by using imputation to improve the efficiency of the estimates, and then proceed to the subcompositional and subgroup level. For this approach, new theory is formulated that allows to estimate covariances for imputed compositional data and to apply estimations on subgroups using parts of this covariance matrix. Moreover, the zero pattern structure is analyzed using principal component analysis for binary data to achieve a comprehensive view of the overall multivariate data structure. The proposed tools are applied to larger compositional data sets from official statistics, where the need for an appropriate treatment of zeros is obvious.  相似文献   

16.
Length‐biased sampling data are often encountered in the studies of economics, industrial reliability, epidemiology, genetics and cancer screening. The complication of this type of data is due to the fact that the observed lifetimes suffer from left truncation and right censoring, where the left truncation variable has a uniform distribution. In the Cox proportional hazards model, Huang & Qin (Journal of the American Statistical Association, 107, 2012, p. 107) proposed a composite partial likelihood method which not only has the simplicity of the popular partial likelihood estimator, but also can be easily performed by the standard statistical software. The accelerated failure time model has become a useful alternative to the Cox proportional hazards model. In this paper, by using the composite partial likelihood technique, we study this model with length‐biased sampling data. The proposed method has a very simple form and is robust when the assumption that the censoring time is independent of the covariate is violated. To ease the difficulty of calculations when solving the non‐smooth estimating equation, we use a kernel smoothed estimation method (Heller; Journal of the American Statistical Association, 102, 2007, p. 552). Large sample results and a re‐sampling method for the variance estimation are discussed. Some simulation studies are conducted to compare the performance of the proposed method with other existing methods. A real data set is used for illustration.  相似文献   

17.
Among the diverse frameworks that have been proposed for regression analysis of angular data, the projected multivariate linear model provides a particularly appealing and tractable methodology. In this model, the observed directional responses are assumed to correspond to the angles formed by latent bivariate normal random vectors that are assumed to depend upon covariates through a linear model. This implies an angular normal distribution for the observed angles, and incorporates a regression structure through a familiar and convenient relationship. In this paper we extend this methodology to accommodate clustered data (e.g., longitudinal or repeated measures data) by formulating a marginal version of the model and basing estimation on an EM‐like algorithm in which correlation among within‐cluster responses is taken into account by incorporating a working correlation matrix into the M step. A sandwich estimator is used for the parameter estimates’ covariance matrix. The methodology is motivated and illustrated using an example involving clustered measurements of microbril angle on loblolly pine (Pinus taeda L.) Simulation studies are presented that evaluate the finite sample properties of the proposed fitting method. In addition, the relationship between within‐cluster correlation on the latent Euclidean vectors and the corresponding correlation structure for the observed angles is explored.  相似文献   

18.
Although there is no shortage of clustering algorithms proposed in the literature, the question of the most relevant strategy for clustering compositional data (i.e. data whose rows belong to the simplex) remains largely unexplored in cases where the observed value is equal or close to zero for one or more samples. This work is motivated by the analysis of two applications, both focused on the categorization of compositional profiles: (1) identifying groups of co-expressed genes from high-throughput RNA sequencing data, in which a given gene may be completely silent in one or more experimental conditions; and (2) finding patterns in the usage of stations over the course of one week in the Velib' bicycle sharing system in Paris, France. For both of these applications, we make use of appropriately chosen data transformations, including the Centered Log Ratio and a novel extension called the Log Centered Log Ratio, in conjunction with the K-means algorithm. We use a non-asymptotic penalized criterion, whose penalty is calibrated with the slope heuristics, to select the number of clusters. Finally, we illustrate the performance of this clustering strategy, which is implemented in the Bioconductor package coseq, on both the gene expression and bicycle sharing system data.  相似文献   

19.
A bivariate generalized linear model is developed as a mixture distribution with one component of the mixture being discrete with probability mass only at the origin. The use of the proposed model is illustrated by analyzing local area meteorological measurements with constant correlation structure that incorporates predictor variables. The Monte Carlo study is performed to evaluate the inferential efficiency of model parameters for two types of true models. These results suggest that the estimates of regression parameters are consistent and the efficiency of the inference increases for the proposed model for ρ≥0.50 especially in larger samples. As an illustration of a bivariate generalized linear model, we analyze a precipitation monitoring data of adjacent local stations for Tokyo and Yokohama.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号