Similar literature
 20 similar documents found (search time: 31 ms)
1.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
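
As background for the Box-Cox transformation that this abstract builds on, the following is a minimal univariate sketch, not the authors' t mixture model or their EM algorithm. It assumes strictly positive data, a normal model for the transformed values, and a simple grid search; the function names `box_cox`, `profile_loglik`, and `estimate_lambda` are hypothetical.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value y with power parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return math.log(y)  # limiting case lam -> 0
    return (y ** lam - 1.0) / lam


def profile_loglik(data, lam):
    """Profile log-likelihood of lam under a normal model for the transformed data.

    Additive constants that do not depend on lam are dropped.
    """
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum likelihood variance
    log_jacobian = (lam - 1.0) * sum(math.log(y) for y in data)
    return -0.5 * n * math.log(var) + log_jacobian


def estimate_lambda(data, grid=None):
    """Grid search for the lam that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 20.0 for i in range(-40, 41)]  # -2.0 to 2.0 in steps of 0.05
    return max(grid, key=lambda lam: profile_loglik(data, lam))


if __name__ == "__main__":
    rng = random.Random(0)
    # Lognormal data: the log transform (lam = 0) should be close to optimal.
    sample = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
    print(estimate_lambda(sample))
```

A grid search is used here only because it is easy to read; a numerical optimizer would serve equally well.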

2.
The emphasis in the literature is on normalizing transformations, despite the greater importance of the homogeneity of variance in analysis. A strategy for a choice of variance-stabilizing transformation is suggested. The relevant component of variation must be identified and, when this is not within-subject variation, a major explanatory variable must also be selected to subdivide the data. A plot of group standard deviation against group mean, or log standard deviation against log mean, may identify a simple power transformation or shifted log transformation. In other cases, within the shifted Box-Cox family of transformations, a contour plot to show the region of minimum heterogeneity defined by an appropriate index is proposed to enable an informed choice of transformation. If used in conjunction with the maximum-likelihood contour plot for the normalizing transformation, then it is possible to assess whether or not there exists a transformation that satisfies both criteria.
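
The plot of log standard deviation against log mean mentioned above can be read numerically. If the group standard deviation is roughly proportional to mean^b, the standard delta-method argument suggests the power 1 - b, with the log transformation when b is close to 1. The sketch below assumes positive group means and standard deviations and fits the slope by ordinary least squares; `suggest_power` is a hypothetical helper, not the paper's procedure or its heterogeneity index.

```python
import math


def suggest_power(group_means, group_sds):
    """Return (slope b, suggested power 1 - b) from log(SD) regressed on log(mean)."""
    xs = [math.log(m) for m in group_means]
    ys = [math.log(s) for s in group_sds]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # A power near 0 is read as "use the log transformation".
    return slope, 1.0 - slope


if __name__ == "__main__":
    means = [1.0, 2.0, 4.0, 8.0]
    sds = [0.5 * math.sqrt(m) for m in means]  # SD grows like the square root of the mean
    print(suggest_power(means, sds))
```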

3.
The Box-Cox power family of transformations for multivariate regression data is considered. The influence of cases on the maximum likelihood estimators of the transformation parameters is investigated using the local influence approach. An example is given to illustrate the local influence method and to show its effectiveness.

4.
The authors propose a robust transformation linear mixed‐effects model for longitudinal continuous proportional data when some of the subjects exhibit outlying trajectories over time. It becomes troublesome when including or excluding such subjects in the data analysis results in different statistical conclusions. To robustify the longitudinal analysis using the mixed‐effects model, they utilize the multivariate t distribution for random effects and/or error terms. Estimation and inference in the proposed model are established and illustrated by a real data example from an ophthalmology study. Simulation studies show a substantial robustness gain by the proposed model in comparison to the mixed‐effects model based on Aitchison's logit‐normal approach. As a result, the data analysis benefits from the robustness of making consistent conclusions in the presence of influential outliers. The Canadian Journal of Statistics © 2009 Statistical Society of Canada

5.
We have tested alternative models of the demand for medical care using experimental data. The estimated response of demand to insurance plan is sensitive to the model used. We therefore use a split-sample analysis and find that a model that more closely approximates distributional assumptions and uses a nonparametric retransformation factor performs better in terms of mean squared forecast error. Simpler models are inferior either because they are not robust to outliers (e.g., ANOVA, ANOCOVA), or because they are inconsistent when strong distributional assumptions are violated (e.g., a two-parameter Box-Cox transformation).
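
The abstract does not say which nonparametric retransformation factor is used. One widely known choice is the smearing estimate, which averages the exponentiated log-scale residuals. The sketch below assumes, purely for illustration, an intercept-only model fitted to log(y); `smearing_factor` and `retransformed_mean` are hypothetical names.

```python
import math


def smearing_factor(log_residuals):
    """Average of exp(residual) over the log-scale residuals."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)


def retransformed_mean(y):
    """Raw-scale mean predicted from an intercept-only model fitted to log(y)."""
    logs = [math.log(v) for v in y]
    intercept = sum(logs) / len(logs)
    residuals = [v - intercept for v in logs]
    # exp(intercept) alone is the geometric mean and understates the raw-scale mean;
    # multiplying by the smearing factor corrects this without assuming normal errors.
    return math.exp(intercept) * smearing_factor(residuals)


if __name__ == "__main__":
    costs = [1.0, 2.0, 10.0, 50.0]
    print(retransformed_mean(costs), sum(costs) / len(costs))
```

In this intercept-only case the retransformed prediction reproduces the sample mean exactly, which makes the role of the factor easy to check; with covariates the same factor multiplies each exponentiated fitted value.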

6.
The Box-Cox transformation is one of the most commonly used methodologies when data do not follow a normal distribution. However, its use is restricted since it usually requires the availability of covariates. In this article, the use of a non-informative auxiliary variable is proposed for the implementation of the Box-Cox transformation. Simulation studies are conducted to illustrate that the proposed approach is successful in attaining normality under different sample sizes and most of the distributions considered, and in estimating the transformation parameter for different sample sizes and mean-variance combinations. The methodology is illustrated on two real-life datasets.

7.
To transform the F distribution to a normal distribution, two types of formula for power transformation of the F variable are introduced. One formula is an extension of the Wilson-Hilferty transformation for the chi-square variable, and the other type is based on the median of the F distribution. Combining those two formulas, a simple formula for the median of the F distribution is derived, and its numerical accuracy is evaluated. Simplification of the formula of the Wilson-Hilferty transformation, through the median formula, leads us to construct a power normal family from the generalized F distribution. Unlike the Box-Cox power normal family, our family has a property that the covariance structure of the maximum-likelihood estimates of the parameters is invariant under a scale transformation of the response variable. Numerical examples are given to show the difference between the two power normal families.
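
For orientation, the classical Wilson-Hilferty transformation for a chi-square variable with k degrees of freedom treats (X/k)^(1/3) as approximately normal with mean 1 - 2/(9k) and variance 2/(9k). The sketch below covers only this chi-square case, not the paper's extension to the F distribution; the function names are hypothetical.

```python
import math


def wilson_hilferty_z(x, k):
    """Approximate standard-normal score of a chi-square value x with k degrees of freedom."""
    c = 2.0 / (9.0 * k)
    return ((x / k) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi2_cdf_approx(x, k):
    """Wilson-Hilferty approximation to the chi-square CDF."""
    return normal_cdf(wilson_hilferty_z(x, k))


if __name__ == "__main__":
    # With 2 degrees of freedom the exact CDF is 1 - exp(-x / 2).
    for x in (0.5, 2.0, 5.0):
        print(x, chi2_cdf_approx(x, 2), 1.0 - math.exp(-x / 2.0))
```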

8.
The pretest–posttest design is widely used to investigate the effect of an experimental treatment in biomedical research. The treatment effect may be assessed using analysis of variance (ANOVA) or analysis of covariance (ANCOVA). The normality assumption for parametric ANOVA and ANCOVA may be violated due to outliers and skewness of data. Nonparametric methods, robust statistics, and data transformation may be used to address the nonnormality issue. However, there is no simultaneous comparison for the four statistical approaches in terms of empirical type I error probability and statistical power. We studied 13 ANOVA and ANCOVA models based on parametric approach, rank and normal score-based nonparametric approach, Huber M-estimation, and Box–Cox transformation using normal data with and without outliers and lognormal data. We found that ANCOVA models preserve the nominal significance level better and are more powerful than their ANOVA counterparts when the dependent variable and covariate are correlated. Huber M-estimation is the most liberal method. Nonparametric ANCOVA, especially ANCOVA based on normal score transformation, preserves the nominal significance level, has good statistical power, and is robust for data distribution.
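
A normal score transformation replaces each observation by a normal quantile of its rank. The abstract does not specify the scoring rule, so the sketch below assumes van der Waerden scores, the standard normal quantile of rank / (n + 1), with average ranks for ties; `normal_scores` is a hypothetical helper.

```python
from statistics import NormalDist


def normal_scores(values):
    """Replace each value by the normal quantile of rank / (n + 1); ties get average ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


if __name__ == "__main__":
    print(normal_scores([3.0, 1.0, 2.0]))
```

The scores can then be fed to an ordinary ANOVA or ANCOVA in place of the raw values.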

9.
This paper considers the analysis of linear models where the response variable is a linear function of observable component variables. For example, scores on two or more psychometric measures (the component variables) might be weighted and summed to construct a single response variable in a psychological study. A linear model is then fit to the response variable. The question addressed in this paper is how to optimally transform the component variables so that the response is approximately normally distributed. The transformed component variables, themselves, need not be jointly normal. Two cases are considered; in both cases, the Box-Cox power family of transformations is employed. In Case I, the coefficients of the linear transformation are known constants. In Case II, the linear function is the first principal component based on the matrix of correlations among the transformed component variables. For each case, an algorithm is described for finding the transformation powers that minimize a generalized Anderson-Darling statistic. The proposed transformation procedure is compared to likelihood-based methods by means of simulation. The proposed method rarely performed worse than likelihood-based methods and for many data sets performed substantially better. As an illustration, the algorithm is applied to a problem from rural sociology and social psychology; namely scaling family residences along an urban-rural dimension.

10.
The behavior of the Box-Cox estimate of power transformation is further examined. Through the asymptotic expansions and small-σ approximations, the exact nature of dependence of transformation estimation on the model structure, the spread of the means and the error variance is revealed. The results are shown to be useful in assessing what Box and Cox called transformation potential of a particular data set.

11.
The main purpose of this paper is to give an algorithm to attain joint normality of non-normal multivariate observations through a new power normal family introduced by the author (Isogai, 1999). The algorithm tries to transform each marginal variable simultaneously to joint normality, but due to a large number of parameters it repeats a maximization process with respect to the conditional normal density of one transformed variable given the other transformed variables. A non-normal data set is used to examine performance of the algorithm, and the degree of achievement of joint normality is evaluated by measures of multivariate skewness and kurtosis. Besides the above topic, making use of properties of our power normal family, we discuss not only a normal approximation formula of non-central F distributions in the frame of regression analysis but also some decomposition formulas of a power parameter, which appear in a Wilson-Hilferty power transformation setting.

12.
We provide a method for simultaneous variable selection and outlier identification using the mean-shift outlier model. The procedure consists of two steps: the first step is to identify potential outliers, and the second step is to perform all possible subset regressions for the mean-shift outlier model containing the potential outliers identified in step 1. This procedure is helpful for model selection while simultaneously considering outlier identification, and can be used to identify multiple outliers. In addition, we can evaluate the impact on the regression model of simultaneous omission of variables and interesting observations. In an example, we provide detailed output from the R system, and compare the results with those using posterior model probabilities as proposed by Hoeting et al. [Comput. Stat. Data Anal. 22 (1996), pp. 252-270] for simultaneous variable selection and outlier identification.

13.
Maclean et al. (1976) applied a specific Box-Cox transformation to test for mixtures of distributions against a single distribution. Their null hypothesis is that a sample of n observations is from a normal distribution with unknown mean and variance after a restricted Box-Cox transformation. The alternative is that the sample is from a mixture of two normal distributions, each with unknown mean and unknown, but equal, variance after another restricted Box-Cox transformation. We developed a computer program that calculated the maximum likelihood estimates (MLEs) and likelihood ratio test (LRT) statistic for the above. Our algorithm for the calculation of the MLEs of the unknown parameters used multiple starting points to protect against convergence to a local rather than global maximum. We then simulated the distribution of the LRT for samples drawn from a normal distribution and five Box-Cox transformations of a normal distribution. The null distribution appeared to be the same for the Box-Cox transformations studied and appeared to be distributed as a chi-square random variable for samples of 25 or more. The degrees of freedom parameter appeared to be a monotonically decreasing function of the sample size. The null distribution of this LRT appeared to converge to a chi-square distribution with 2.5 degrees of freedom. We estimated the critical values for the 0.10, 0.05, and 0.01 levels of significance.

14.
The performance of Box-Cox power transformations in classification using Hinkley's (1975) method is studied. Misclassification probabilities before and after transformation are compared. It is found that the use of Box-Cox transformations can sometimes substantially reduce the error probabilities. Estimates of error probabilities are obtained and certain properties are derived. Examples for a number of distributions are given.

15.
The union-intersection approach to multivariate test construction is used to develop an alternative to Wilks' likelihood ratio test statistic for testing for two or more outliers in multivariate normal data. It is shown that critical values of both statistics are poorly approximated by Bonferroni bounds. Simulated critical values are presented for both statistics for significance levels 1% and 5%, for sample sizes 10(5)30, 40, 50, 75 and 100 for 2, 3, 4 and 5 dimensions. A power comparison of the two tests in the slippage of the mean model for generating outliers indicates that the union-intersection test is the more powerful when the slippages are close to collinear. Although Wilks' test remains the preference for general use, the union-intersection test could be valuable when such special structure in the data is suspected.

16.
In time series analysis, the Box-Cox power transformation is generally used for variance stabilization. In this paper we show that the order and the first step ahead forecast of the transformed model are approximately invariant to those of the original model under certain assumptions on the mean and variance. A small Monte Carlo simulation is performed to support the results.

17.
Many methods have been developed for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be groups in the data. We propose a method of simultaneously determining groups (as in cluster analysis) and detecting outliers, which are points that are distant from every group. Our method is an adaptation of the BACON algorithm proposed by Billor, Hadi and Velleman for the robust detection of multiple outliers in a single group of multivariate data. There are two versions of our method, depending on whether or not the groups can be assumed to have equal covariance matrices. The effectiveness of the method is illustrated by its application to two real data sets and further shown by a simulation study for different sample sizes and dimensions for 2 and 3 groups, with and without planted outliers in the data. When the number of groups is not known in advance, the algorithm could be used as a robust method of cluster analysis, by running it for various numbers of groups and choosing the best solution.

18.
19.
The phenomenon of crossing hazard rates is common in clinical trials with time to event endpoints. Many methods have been proposed for testing equality of hazard functions against a crossing hazards alternative. However, there have been relatively few approaches available in the literature for point or interval estimation of the crossing time point. The problem of constructing confidence intervals for the first crossing time point of two hazard functions is considered in this paper. After reviewing a recent procedure based on Cox proportional hazard modeling with Box-Cox transformation of the time to event, a nonparametric procedure using the kernel smoothing estimate of the hazard ratio is proposed. The proposed procedure and the one based on Cox proportional hazard modeling with Box-Cox transformation of the time to event are both evaluated by Monte Carlo simulations and applied to two clinical trial datasets.

20.
Early investigations of the effects of non-normality indicated that skewness has a greater effect on the distribution of the t-statistic than does kurtosis. When the distribution is skewed, the actual p-values can be larger than the values calculated from the t-tables. Transformation of data to normality has shown good results in the case of the univariate t-test. In order to reduce the effect of skewness of the distribution on the normal-based t-test, one can transform the data and perform the t-test on the transformed scale. This method is not only a remedy for satisfying the distributional assumption, but it also turns out that one can achieve greater efficiency of the test. We investigate the efficiency of tests after a Box-Cox transformation. In particular, we consider the one-sample test of location and study the gains in efficiency for the one-sample t-test following a Box-Cox transformation. Under some conditions, we prove that the asymptotic relative efficiency of the transformed t-test and Hotelling's T²-test of multivariate location with respect to the same statistic based on untransformed data is at least one.
