Similar articles
1.
The importance of the normal distribution for fitting continuous data is well known. In many practical situations, however, the data distribution departs from normality. For example, the sample skewness and sample kurtosis may lie far from 0 and 3, respectively, which are the values these measures take under a normal distribution. It is therefore important to have formal tests of normality against general alternatives. D'Agostino et al. [A suggestion for using powerful and informative tests of normality, Am. Statist. 44 (1990), pp. 316–321] review four procedures, \(Z^2(g_1)\), \(Z^2(g_2)\), \(D\) and \(K^2\), for testing departure from normality. The first two test normality against departures due to skewness and kurtosis, respectively; the other two are omnibus tests. An alternative to the normal distribution is the class of skew-normal distributions (see [A. Azzalini, A class of distributions which includes the normal ones, Scand. J. Statist. 12 (1985), pp. 171–178]). In this paper, we obtain a score test (W) and a likelihood ratio test (LR) of goodness of fit of the normal regression model against the skew-normal family of regression models. The score test turns out to be based on the sample skewness and has a very simple form. The size and power of these six procedures are compared by simulation. The level properties of the three statistics LR, W and \(Z^2(g_1)\) are similar and close to the nominal level for moderate to large sample sizes. Their power properties are also similar for small departures from normality due to skewness (\(\gamma_1 \le 0.4\)). Of these three, the score statistic has the simplest form and is computationally much simpler than the other two. The LR statistic generally has the highest power. However, it is computationally much more demanding, because it requires parameter estimates under both the normal and the skew-normal models.
So, the score test may be used to test for normality against small departures from normality due to skewness. Otherwise, the likelihood ratio statistic LR should be used, as it detects general departures from normality (due to both skewness and kurtosis) with, in general, the largest power.
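To make the idea of a skewness-based normality test concrete, here is a minimal Python sketch. It relies on the classical large-sample result that, under normality, \(\sqrt{n}\,g_1\) is approximately \(N(0, 6)\), so that \(n g_1^2/6\) is approximately \(\chi^2_1\). It illustrates that assumption only and is not the paper's W statistic, whose exact form is not given here. The function names are hypothetical, and the \(\chi^2\) approximation is crude for small \(n\).

```python
import math

def sample_skewness(x):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (biased central moments)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    return m3 / m2 ** 1.5

def skewness_test(x):
    """Asymptotic test of normality against skewness.

    Under normality n * g1**2 / 6 is approximately chi-squared(1);
    its survival function is P(chi2_1 > s) = erfc(sqrt(s / 2)).
    Returns (statistic, p_value).
    """
    n = len(x)
    stat = n * sample_skewness(x) ** 2 / 6.0
    return stat, math.erfc(math.sqrt(stat / 2.0))

stat, p = skewness_test([1, 2, 3, 4, 10])
print(round(stat, 4), round(p, 4))
```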

2.
It is well known that Pearson's statistic \(\chi ^{2}\) can perform poorly in studying the association between ordinal categorical variables. Taguchi's and Hirotsu's statistics have been introduced in the literature as simple alternatives to Pearson's chi-squared test for contingency tables with ordered categorical variables. The aim of this paper is to shed new light on these statistics by stressing their interpretations and characteristics. Moreover, a theoretical scheme is developed that shows the links between the different proposals and classes of cumulative chi-squared tests, starting from a unifying index of heterogeneity, unalikeability and variability measures. This scheme should help users understand how the proposals differ. Some decompositions of both statistics are also highlighted. The paper then presents a case study on optimizing the polysilicon deposition process in very-large-scale integrated circuit manufacturing, with the goal of identifying the optimal combination of factor levels. This combination is obtained from a correspondence analysis based on Taguchi's statistic together with regression models for binary dependent variables. The resulting optimal combination of factor levels differs from many others proposed in the literature for these data.
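Taguchi's statistic is usually written as a weighted sum over cumulative column splits. The sketch below assumes the common form \(T=\sum_{s=1}^{J-1}\sum_{i=1}^{I} n_{i\cdot}\,(z_{is}/n_{i\cdot}-d_s)^2/[d_s(1-d_s)]\). Here \(z_{is}\) is the cumulative count of row \(i\) up to column \(s\), and \(d_s\) is the cumulative column proportion. With these weights, \(T\) equals the sum of the Pearson statistics of the \(J-1\) collapsed \(I\times 2\) tables. Rows are groups and columns are ordered categories; the function name is hypothetical.

```python
def taguchi_statistic(table):
    """Taguchi's cumulative chi-squared statistic for an I x J table
    whose columns are ordered categories.

    Each split s (columns 1..s vs s+1..J) contributes the Pearson
    chi-squared of the collapsed I x 2 table, written in cumulative form.
    Assumes no split has d_s equal to 0 or 1 (no empty end categories).
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    n_cols = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]

    total = 0.0
    cum_col = 0
    cum_rows = [0] * len(table)  # running cumulative counts z_is, one per row
    for s in range(n_cols - 1):
        cum_col += col_totals[s]
        d_s = cum_col / n  # cumulative column proportion
        for i, row in enumerate(table):
            cum_rows[i] += row[s]
        term = sum(n_i * (z / n_i - d_s) ** 2 for n_i, z in zip(row_totals, cum_rows))
        total += term / (d_s * (1.0 - d_s))
    return total

print(taguchi_statistic([[10, 20, 30], [30, 20, 10]]))
```

For a table with only two columns there is a single split, so \(T\) reduces to the ordinary Pearson \(\chi^2\).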

3.
Stochastic frontier models are widely used to measure, for example, the technical efficiency of firms. The classical stochastic frontier model often suffers from an empirical artefact. The residuals of the production function may show positive skewness, whereas the model implies negative skewness, and in that case every firm is estimated to be fully efficient. We propose a new approach to this problem by generalizing the distribution of the inefficiency variable. The resulting generalized stochastic frontier model allows the sample data to have the wrong skewness while still yielding well-defined, nondegenerate efficiency measures. We discuss the statistical properties of the model, as well as a test for the symmetry of the error term (that is, for no inefficiency). A simulation study shows that our model delivers efficiency estimators with smaller bias than those of the classical model, even when the population skewness has the correct sign. Finally, we apply the model to data on the U.S. textile industry for 1958–2005. For a number of years, our model suggests technical efficiencies well below the frontier, whereas the classical model estimates no inefficiency in those years.
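The wrong-skewness problem can be seen directly in the classical normal–half-normal model \(\varepsilon = v - u\), where \(u \ge 0\) is half-normal with scale \(\sigma_u\). A method-of-moments step recovers \(\sigma_u\) from the third central moment of the OLS residuals, using \(E[(u-Eu)^3] = \sigma_u^3\sqrt{2/\pi}\,(4/\pi - 1)\). This is the classical corrected-OLS idea, not the authors' generalized model. When the residual skewness is positive there is no admissible solution, and the estimate collapses to zero, which means full efficiency for every firm. The function name is hypothetical.

```python
import math

def sigma_u_from_residuals(residuals):
    """Method-of-moments estimate of sigma_u in the normal-half-normal
    stochastic frontier model eps = v - u.

    The third central moment of eps equals
    -sigma_u**3 * sqrt(2/pi) * (4/pi - 1), so it must be negative.
    With positive ("wrong") skewness the estimate is set to 0 (no inefficiency).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    if m3 >= 0:
        return 0.0  # wrong skewness: the classical model implies full efficiency
    c = math.sqrt(2.0 / math.pi) * (4.0 / math.pi - 1.0)
    return (-m3 / c) ** (1.0 / 3.0)

print(sigma_u_from_residuals([1, 1, 1, -3]))    # negative skew: positive sigma_u
print(sigma_u_from_residuals([-1, -1, -1, 3]))  # positive skew: 0.0
```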

4.
This paper studies four methods for estimating the Box-Cox parameter used to transform data to normality. Three of them optimize the test statistics of standard normality tests (the Shapiro-Wilk, skewness and kurtosis tests); the fourth uses the maximum likelihood estimator of the Box-Cox parameter. The four methods are compared in a simulation study that analyzes their performance under different skewness and kurtosis conditions. The estimator based on optimizing the Shapiro-Wilk statistic generally yields the best transformations, and the maximum likelihood estimator performs almost as well. Estimators based on optimizing skewness and kurtosis generally perform poorly.
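The maximum likelihood method can be sketched by maximizing the profile log-likelihood \(\ell(\lambda) = -\tfrac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_i \log y_i\) over a grid. This assumes that the transformed data are i.i.d. normal and that the data are strictly positive. The grid range and step are illustrative choices, not values taken from the paper.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of positive data: (y**lam - 1) / lam, or log(y) at lam = 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood of lam, up to a constant, with the normal
    mean and variance of the transformed data profiled out."""
    z = box_cox(y, lam)
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    # Jacobian term (lam - 1) * sum(log y) accounts for the change of variables
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def box_cox_mle(y, grid=None):
    """Maximum likelihood estimate of lam by grid search (default -2..2, step 0.01)."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: profile_loglik(y, lam))

y = [math.exp(t) for t in (-2, -1, 0, 1, 2)]  # log-symmetric sample
print(box_cox_mle(y))  # close to 0, i.e. a log transform
```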

5.
New measures of skewness for real-valued random variables are proposed. The measures are based on a functional representation of real-valued random variables: the expected value of a transformed random variable can be used to characterize the distribution of the original variable. First, estimators of the proposed skewness measures are analyzed. Second, asymptotic tests for symmetry are developed; these tests are consistent for both discrete and continuous distributions. Bootstrap versions that improve the empirical results for moderate and small samples are also provided. Simulations illustrate the performance of the tests in comparison with other methods. The results show that our procedures are competitive and have some practical advantages.

6.
Recently, a new procedure for distribution fitting based on matching the first two moments, partial and complete, was introduced (Shore, 1995). Suppose the sampling skewness of the fitted distribution is compared with the sample skewness, with both regarded as estimates of the skewness of the underlying distribution. The mean squared error of the former is then appreciably lower than that of the latter. In this paper we present simulation results that support this claim and show its magnitude. An alternative two-moment distribution-fitting procedure, based on a new family of four-parameter distributions, is also introduced and studied. Since three-moment distribution fitting is very common practice in simulation studies, these results may have important implications for the current state of the art in simulation.

7.
In several applied disciplines, such as economics, marketing, business, sociology, psychology, political science, environmental research and medicine, data are commonly collected as ordered categorical observations. In this paper, we introduce a class of models based on mixtures of discrete random variables, which provides a general framework for the statistical analysis of such data. The structure of these models lets the final response be interpreted in terms of feeling, uncertainty and a possible shelter option. It also lets the relationship between these components and subjects' covariates be expressed. Such models can be estimated effectively by maximum likelihood, leading to asymptotically efficient inference. We present a simulation experiment and discuss a real case study to check the consistency and usefulness of the approach. Some final considerations conclude the paper.
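As an illustration of what a mixture with "feeling", "uncertainty" and "shelter" components can look like, the sketch below assumes the well-known CUB specification with a shelter effect. In that specification, a shifted binomial (feeling, parameter \(\xi\)) is mixed with a discrete uniform (uncertainty, weight \(1-\pi\)), and a point mass \(\delta\) is placed on a shelter category \(c\). This is an assumed member of such a class, chosen for illustration, and not necessarily the exact model of the paper.

```python
from math import comb

def cub_shelter_pmf(m, pi, xi, delta=0.0, shelter=None):
    """Probabilities P(R = r), r = 1..m, of a CUB model with an optional shelter.

    P(R = r) = delta * [r == shelter]
             + (1 - delta) * (pi * b_r(xi) + (1 - pi) / m),
    where b_r(xi) = C(m-1, r-1) * (1 - xi)**(r-1) * xi**(m-r) is a shifted binomial.
    """
    probs = []
    for r in range(1, m + 1):
        b_r = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        p = (1 - delta) * (pi * b_r + (1 - pi) / m)
        if shelter == r:
            p += delta  # extra mass on the shelter category
        probs.append(p)
    return probs

probs = cub_shelter_pmf(m=7, pi=0.8, xi=0.3, delta=0.1, shelter=4)
print([round(p, 3) for p in probs])
```

Setting \(\pi = 0\) and \(\delta = 0\) leaves only the uncertainty component, which gives the uniform distribution over the \(m\) categories.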

8.
It is well known that statistical classifiers trained on imbalanced data have low true positive rates and select inconsistent sets of significant variables. In this article, an improved method is proposed to enhance classification accuracy for the minority class by assigning a different misclassification cost to each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach that estimates the tuning parameter, the composite criterion and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate in prediction and performs well in variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative analysis of suboptimal health state data in traditional Chinese medicine shows a reasonable application of the proposed method.
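The effect of group-specific misclassification costs on the choice of cut-point can be sketched as follows. The paper's composite criterion is not specified here, so this hypothetical example simply picks the threshold on predicted scores that minimizes the weighted count \(c_{FN}\cdot FN + c_{FP}\cdot FP\). Raising the cost of missing a minority-class case moves the cut-point down and raises the true positive rate.

```python
def best_cut_point(scores, labels, cost_fn, cost_fp):
    """Return (threshold, cost) minimizing cost_fn * FN + cost_fp * FP.

    A case is predicted positive (minority class, label 1) if score >= threshold.
    Candidate thresholds are the observed scores plus +inf (predict all negative).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 0, 1, 1]
print(best_cut_point(scores, labels, cost_fn=1, cost_fp=1))  # (0.8, 1)
print(best_cut_point(scores, labels, cost_fn=5, cost_fp=1))  # (0.4, 2)
```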

9.
We introduce the 2nd-power skewness and kurtosis, which are interesting alternatives to the classical Pearson skewness and kurtosis, called the 3rd-power skewness and 4th-power kurtosis in our terminology. We use the sample 2nd-power skewness and kurtosis to build a powerful test of normality. This test can also be derived as Rao's score test on the asymmetric power distribution. That distribution combines the wide range of exponential tail behavior of the exponential power distribution family with various levels of asymmetry. We find that our test statistic is asymptotically chi-squared distributed. We also propose a modified test statistic whose finite-sample distribution, as we show numerically, can be approximated very precisely by a chi-squared distribution. Similarly, we propose a directional test based on the sample 2nd-power kurtosis only, for situations where the true distribution is known to be symmetric. Our tests are very similar in spirit to the famous Jarque–Bera test and, like it, are locally optimal. They share its simple interpretation and, in addition, attain the gold-standard power of the regression and correlation tests. An extensive empirical power analysis shows that our tests are among the most powerful normality tests. Our test is implemented in an R package called PoweR.
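For comparison, the Jarque–Bera test combines the classical (3rd-power) sample skewness \(S\) and the (4th-power) sample kurtosis \(K\) as \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\). Under normality, this statistic is asymptotically \(\chi^2_2\). Below is a minimal sketch of that classical statistic. It is not the authors' 2nd-power test, whose formula is not given here.

```python
import math

def jarque_bera(x):
    """Jarque-Bera statistic and its asymptotic chi-squared(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5  # 3rd-power sample skewness
    kurt = m4 / m2 ** 2    # 4th-power sample kurtosis
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-squared with 2 df
    return jb, p_value

print(jarque_bera([1, 2, 3, 4, 5]))
```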

10.
The location model is a familiar basis for discriminant analysis of mixtures of categorical and continuous variables. Its usual implementation involves second-order smoothing, using multivariate regression for the continuous variables and log-linear models for the categorical variables. Despite the smoothing, these procedures still require many parameters to be estimated. This in turn limits the number of categorical variables to a small one if implementation is to be feasible. In this paper we propose non-parametric smoothing procedures for both parts of the model. The number of parameters to be estimated is dramatically reduced, and the range of applicability is thereby greatly increased. The methods are illustrated on several data sets, and their performance is compared with a range of other popular discrimination techniques. The proposed method compares very favourably with all its competitors.

11.
In this paper, we present an innovative method for constructing proper priors for the skewness (shape) parameter in the skew‐symmetric family of distributions. The proposed method is based on assigning a prior distribution on the perturbation effect of the shape parameter, which is quantified in terms of the total variation distance. We discuss strategies to translate prior beliefs about the asymmetry of the data into an informative prior distribution of this class. We show via a Monte Carlo simulation study that our non‐informative priors induce posterior distributions with good frequentist properties, similar to those of the Jeffreys prior. Our informative priors yield better results than their competitors from the literature. We also propose a scale‐invariant and location‐invariant prior structure for models with unknown location and scale parameters and provide sufficient conditions for the propriety of the corresponding posterior distribution. Illustrative examples are presented using simulated and real data.
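Consider the skew-normal member of this family, and assume the perturbation effect is measured as the total variation distance between \(SN(0,1,\alpha)\), with density \(2\phi(x)\Phi(\alpha x)\), and \(N(0,1)\). The distance then has the closed form \(d_{TV}(\alpha) = \tfrac12\int \phi(x)\,|2\Phi(\alpha x)-1|\,dx = \arctan(|\alpha|)/\pi\), which rises from 0 towards 1/2 as \(|\alpha|\) grows. The sketch below checks this numerically. It is our own illustration, not code from the paper.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tv_skew_normal_numeric(alpha, lower=-12.0, upper=12.0, steps=24000):
    """Total variation distance between SN(0, 1, alpha) and N(0, 1):
    0.5 * integral of |2 phi(x) Phi(alpha x) - phi(x)| dx, by the midpoint rule."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += std_normal_pdf(x) * abs(2.0 * std_normal_cdf(alpha * x) - 1.0)
    return 0.5 * total * h

def tv_skew_normal_closed_form(alpha):
    return math.atan(abs(alpha)) / math.pi

for a in (0.0, 1.0, 3.0):
    print(a, round(tv_skew_normal_numeric(a), 5), round(tv_skew_normal_closed_form(a), 5))
```

The map is monotone in \(|\alpha|\), so a prior placed on \(d \in [0, 1/2)\) can be carried over to the shape parameter through \(|\alpha| = \tan(\pi d)\).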

12.
Correspondence analysis (CA) has gained a reputation as a very useful statistical technique for determining the nature of the association between two or more categorical variables. For simple and multiple CA, the singular value decomposition (SVD) is the primary tool; it allows the user to construct a low-dimensional space in which to visualize this association. As an alternative to SVD, one may consider the bivariate moment decomposition (BMD), which uses orthogonal polynomials to reflect the structure of ordered categorical responses. Combining the features of BMD with SVD yields a hybrid decomposition (HD). The aim of this paper is to show the applicability of HD when performing simple and multiple CA.
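In simple CA, the SVD is applied to the matrix of standardized residuals \(s_{ij} = (p_{ij} - r_i c_j)/\sqrt{r_i c_j}\). Here \(p_{ij}\) are the cell proportions, and \(r_i\) and \(c_j\) are the row and column masses. The sum of the squared singular values, called the total inertia, equals \(\sum_{ij} s_{ij}^2 = \chi^2/n\). The dependency-free sketch below builds that matrix and checks the identity; in practice the matrix would be passed to an SVD routine such as `numpy.linalg.svd`. The function names are hypothetical.

```python
def standardized_residuals(table):
    """Matrix S with s_ij = (p_ij - r_i c_j) / sqrt(r_i c_j) for a two-way table."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]                                 # row masses
    c = [sum(row[j] for row in table) / n for j in range(len(table[0]))]  # column masses
    return [
        [(table[i][j] / n - r[i] * c[j]) / (r[i] * c[j]) ** 0.5 for j in range(len(c))]
        for i in range(len(r))
    ]

def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a two-way table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(len(table[0]))]
    return sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i in range(len(table))
        for j in range(len(col_tot))
    )

table = [[10, 20, 30], [30, 20, 10]]
S = standardized_residuals(table)
inertia = sum(s * s for row in S for s in row)
n = sum(map(sum, table))
print(inertia, pearson_chi2(table) / n)  # both give the total inertia
```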

13.
The skew-normal model is a class of distributions that extends the Gaussian family by including a skewness parameter. This model presents some inferential problems linked to the estimation of the skewness parameter. In particular, its maximum likelihood estimator can be infinite, especially for moderate sample sizes, and it is not clear how to calculate confidence intervals for this parameter. In this work, we show how these inferential problems can be solved when the interest is in the distribution of extreme statistics of two random variables with a joint normal distribution. Such situations are not uncommon in applications, especially in medical and environmental contexts, where it can be relevant to estimate the distribution of extreme statistics. A theoretical result found by Loperfido [N. Loperfido, Statistical implications of selectively reported inferential results, Statist. Probab. Lett. 56 (2002), pp. 13–22] proves that such extreme statistics have a skew-normal distribution. Its skewness parameter can be expressed as a function of the correlation coefficient between the two initial variables. Using some theoretical results involving the correlation coefficient, it is then possible to find approximate confidence intervals for the skewness parameter. These theoretical intervals are compared with parametric bootstrap intervals in a simulation study. Two applications using real data are given.
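The sketch below assumes the standard form of this result for standardized variables. If \((X, Y)\) is bivariate normal with zero means, unit variances and correlation \(\rho\), then \(\max(X,Y) \sim SN(\alpha)\) with \(\alpha = \sqrt{(1-\rho)/(1+\rho)}\), and \(\min(X,Y) \sim SN(-\alpha)\). Because \(\alpha\) decreases in \(\rho\), an interval for \(\rho\) maps directly to an interval for \(\alpha\). Building the interval for \(\rho\) with Fisher's z-transformation is our assumption for illustration; the paper's intervals may be constructed differently.

```python
import math

def alpha_from_rho(rho):
    """Skewness parameter of max(X, Y) for a standardized bivariate normal (X, Y)."""
    return math.sqrt((1.0 - rho) / (1.0 + rho))

def alpha_interval(r, n, z_crit=1.959964):
    """Approximate 95% CI for alpha from the sample correlation r of n pairs,
    via Fisher's z-transformation for rho and the decreasing map rho -> alpha."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    rho_lo, rho_hi = math.tanh(z - half_width), math.tanh(z + half_width)
    # alpha decreases in rho, so the endpoints swap
    return alpha_from_rho(rho_hi), alpha_from_rho(rho_lo)

print(alpha_from_rho(0.6))
print(alpha_interval(0.6, n=50))
```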

14.
Imbalanced data lead to biased classification and to low classification accuracy for the minority class. In this article, we propose a methodology for selecting grouped variables using the area under the ROC curve (AUC) with an adjustable prediction cut-point. The proposed method enhances the classification accuracy for the minority class by maximizing the true positive rate. Simulation results show that the proposed method is appropriate for both categorical and continuous covariates. An illustrative analysis of the suboptimal health state (SHS) data in traditional Chinese medicine (TCM) shows a reasonable application of the proposed method.
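The AUC used as the selection criterion has a simple rank interpretation. It is the probability that a randomly chosen minority-class case receives a higher score than a randomly chosen majority-class case, with ties counted as 1/2. A brute-force sketch follows; it is quadratic in the sample size, which is fine for illustration. It computes the AUC and the true positive rate at a given cut-point, and the function names are hypothetical.

```python
def auc(scores, labels):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def true_positive_rate(scores, labels, cut_point):
    """Share of minority-class cases (label 1) with score >= cut_point."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= cut_point for s in pos) / len(pos)

scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 0, 1, 1]
print(auc(scores, labels))                      # 13/15, about 0.867
print(true_positive_rate(scores, labels, 0.5))  # 2/3
```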

15.
Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. Model-based clustering methods for categorical data, however, are less standard. Latent class analysis is commonly used for model-based clustering of binary and/or categorical data. Because it assumes a local independence structure, though, the estimated latent classes may not correspond to groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable. The discrete latent class accommodates group structure, and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the mixture of latent trait analyzers model, which provides an efficient model-fitting strategy. The model is demonstrated on data from the National Long Term Care Survey (NLTCS) and on voting in the U.S. Congress. It is shown to yield intuitive clustering results and to give a much better fit than either latent class analysis or latent trait analysis alone.

16.
In this paper we study the biases of jackknife estimators of central third moments, which play an important role in improving the accuracy of the normal approximation. Simulation studies have found that the jackknife estimator of the skewness coefficient, obtained by substituting the jackknife variance and third-moment estimators, is biased downward. The asymptotic properties of the jackknife variance estimators have been studied precisely, and their biases have been discussed theoretically. Here we study theoretically the biases of the jackknife estimators of the central third moments for U-statistics. The results show that the biases are not always downward.
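The jackknife estimators discussed here are built from leave-one-out recomputations of a statistic. Below is a minimal sketch of the standard jackknife variance estimator \(v_J = \frac{n-1}{n}\sum_{i}(\hat\theta_{(i)} - \bar\theta_{(\cdot)})^2\). The third-moment estimators studied in the paper use the same leave-one-out values, but their exact scaling is not given in this abstract, so they are not reproduced. For the sample mean, \(v_J\) reduces exactly to \(s^2/n\), which serves as a check.

```python
def leave_one_out(values, statistic):
    """Statistic recomputed on each leave-one-out sample (values must be a list)."""
    return [statistic(values[:i] + values[i + 1:]) for i in range(len(values))]

def jackknife_variance(values, statistic):
    """Jackknife variance estimate: (n - 1)/n * sum((theta_(i) - theta_bar)**2)."""
    n = len(values)
    loo = leave_one_out(values, statistic)
    theta_bar = sum(loo) / n
    return (n - 1) / n * sum((t - theta_bar) ** 2 for t in loo)

def mean(values):
    return sum(values) / len(values)

data = [1, 2, 3, 4, 10]
print(jackknife_variance(data, mean))  # equals s**2 / n for the sample mean
```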

17.
Estimating the population parameters of the continuous common factor model from categorical observed variables is now routine. It is shown that, under these conditions, the formula for calculating the determinacy of the regression factor score predictor from the estimated model parameters has to be adapted. A method is proposed for calculating this determinacy from the parameters of the continuous population factor model based on categorical variables, and it is evaluated on simulated population data. Using the uncorrected formula can lead to serious overestimation of determinacy for categorical variables.
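For reference, consider the uncorrected formula for continuous variables in a single-factor model with a standardized factor, loadings \(\lambda_j\) and unique variances \(\psi_j\). It gives the determinacy, the correlation between the factor and its regression score predictor, as \(\rho = \sqrt{\lambda^\top\Sigma^{-1}\lambda}\) with \(\Sigma = \lambda\lambda^\top + \Psi\). By the Sherman–Morrison identity, this simplifies to \(\rho = \sqrt{s/(1+s)}\), where \(s = \sum_j \lambda_j^2/\psi_j\). The sketch shows only this continuous-variable formula, which the paper argues must be corrected for categorical indicators; the correction itself is not reproduced.

```python
import math

def factor_determinacy(loadings, uniquenesses):
    """Determinacy of the regression factor score predictor in a one-factor
    model with unit factor variance: sqrt(s / (1 + s)), s = sum(lambda**2 / psi)."""
    s = sum(lam * lam / psi for lam, psi in zip(loadings, uniquenesses))
    return math.sqrt(s / (1.0 + s))

print(factor_determinacy([0.8, 0.8, 0.8], [0.36, 0.36, 0.36]))
```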

18.
Classification of data consisting of both categorical and continuous variables into one of two groups is often handled by the sample location linear discriminant function, confined to each of the locations specified by the observed values of the categorical variables. Homoscedasticity of the across-location conditional dispersion matrices of the continuous variables is often assumed. Quite often, however, interactions between continuous and categorical variables cause across-location heteroscedasticity. In this article, we examine the effect of heterogeneous across-location conditional dispersion matrices on the overall expected and actual error rates associated with the sample location linear discriminant function. Its performance is evaluated against that of the restrictive classifier adjusted for across-location heteroscedasticity. Conclusions based on a Monte Carlo study are reported.

19.
We develop a discrete-time affine stochastic volatility model with time-varying conditional skewness (SVS). Importantly, we disentangle the dynamics of conditional volatility and conditional skewness in a coherent way. Our approach allows current asset returns to be asymmetric conditional on current factors and past information, which we term contemporaneous asymmetry. Conditional skewness is an explicit combination of the conditional leverage effect and contemporaneous asymmetry. We derive analytical formulas for various return moments that are used for generalized method of moments (GMM) estimation. We apply our approach to S&P 500 index daily returns and option data. One- and two-factor SVS models provide a better fit for both the historical and the risk-neutral distribution of returns than existing affine generalized autoregressive conditional heteroscedasticity (GARCH) models and stochastic volatility with jumps (SVJ) models. Our results are not due to overparameterization: the one-factor SVS models have the same number of parameters as their one-factor GARCH competitors and fewer than the SVJ benchmark.

20.
Statistical distributions are very useful in describing and predicting real-world phenomena. In many applied areas there is a clear need for extended forms of well-known distributions. Generally, the new distributions are more flexible for modeling real data that present a high degree of skewness and kurtosis. Choosing the best-suited statistical distribution for modeling data is very important.

In this article, we propose an extended generalized Gompertz (EGGo) family of distributions. Several statistical properties of the EGGo family are discussed, including distribution shapes, hazard function, skewness, limit behavior, moments and order statistics. The flexibility of this family is assessed by applying it to real data sets and comparing it with other competing distributions. The maximum likelihood equations for estimating the parameters from real data are given. The performance of several estimators is discussed: maximum likelihood, least squares, weighted least squares, Cramér-von Mises, Anderson-Darling and right-tailed Anderson-Darling estimators. A likelihood ratio test is derived to assess whether the EGGo distribution fits a data set better than its nested models. We use R software for the simulations, to carry out the applications and to test the validity of this model.
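The likelihood ratio comparison with nested models can be sketched generically. Assume the nested model is obtained by fixing `df` parameters of the full model at interior values of the parameter space. Then \(2(\ell_{\text{full}} - \ell_{\text{nested}})\) is asymptotically \(\chi^2_{df}\). The authors work in R; this stand-alone Python sketch is only an illustration. It computes the p-value with closed-form series for integer degrees of freedom, and the log-likelihood values in the example are made up.

```python
import math

def chi2_sf(x, df):
    """Survival function P(chi2_df > x) for a positive integer df, via closed-form series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)**j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_j x**((2j-1)/2) / (1*3*...*(2j-1))
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term = root
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += 2.0 * density * term
    return total

def likelihood_ratio_test(loglik_full, loglik_nested, df):
    """LR statistic 2 * (l_full - l_nested) and its asymptotic chi-squared p-value."""
    stat = 2.0 * (loglik_full - loglik_nested)
    return stat, chi2_sf(stat, df)

# Hypothetical maximized log-likelihoods, for illustration only
print(likelihood_ratio_test(-120.4, -125.1, df=2))
```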
