Similar Documents (20 results)
1.
A normal quantile-quantile (QQ) plot is an important diagnostic for checking the assumption of normality. Though useful, these plots confuse students in my introductory statistics classes. A water-filling analogy, however, intuitively conveys the underlying concept. This analogy characterizes a QQ plot as a parametric plot of the water levels in two gradually filling vases. Each vase takes its shape from a probability distribution or sample. If the vases share a common shape, then the water levels match throughout the filling, and the QQ plot traces a diagonal line. An R package qqvases provides an interactive animation of this process and is suitable for classroom use.
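A minimal sketch of the underlying construction (not the qqvases animation itself), assuming NumPy, SciPy, and Matplotlib; the simulated sample stands in for real data:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.normal(size=200)              # stands in for real data

probs = np.linspace(0.01, 0.99, 99)        # common "fill levels"
sample_q = np.quantile(sample, probs)      # water level in the sample vase
theory_q = stats.norm.ppf(probs)           # water level in the normal vase

plt.plot(theory_q, sample_q, "o", ms=3)
plt.axline((0, 0), slope=1, color="gray")  # matching shapes trace the diagonal
plt.xlabel("normal quantiles")
plt.ylabel("sample quantiles")
plt.show()
```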

2.
Quantile-quantile plots are most commonly used to compare the shapes of distributions, but they may also be used in conjunction with partial orders on distributions to compare the level and dispersion of distributions that have different shapes. We discuss several easily recognized patterns in quantile-quantile plots that suffice to demonstrate that one distribution is smaller than another in terms of each of several partial orders. We illustrate with financial applications, proposing a quantile plot for comparing the risks and returns of portfolios of investments. As competing portfolios have distributions that differ in level, dispersion, and shape, it is not sufficient to compare portfolios using measures of location and dispersion, such as expected returns and variances; however, quantile plots, with suitable scaling, do aid in such comparisons. In two plots, we compare specific portfolios to the stock market as a whole, finding these portfolios to have higher returns, greater risk or dispersion, and thicker tails than their greater dispersion alone would justify. Nonetheless, investors in these risky portfolios are more than adequately compensated for the risks undertaken.

3.
There are a large number of different definitions used for sample quantiles in statistical computer packages. Often within the same package one definition will be used to compute a quantile explicitly, while other definitions may be used when producing a boxplot, a probability plot, or a QQ plot. We compare the most commonly implemented sample quantile definitions by writing them in a common notation and investigating their motivation and some of their properties. We argue that there is a need to adopt a standard definition for sample quantiles so that the same answers are produced by different packages and within each package. We conclude by recommending that the median-unbiased estimator be used because it has most of the desirable properties of a quantile estimator and can be defined independently of the underlying distribution.
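For illustration, a sketch of how two of these definitions disagree on the same data; the median-unbiased definition is exposed in NumPy (1.22 or later) as method="median_unbiased" (Hyndman-Fan type 8):

```python
import numpy as np

x = np.array([1.2, 2.2, 2.8, 3.4, 3.9, 4.1, 5.0])

# The same quantile, under two of the many definitions in common use:
print(np.quantile(x, 0.25))                            # NumPy default: linear (Hyndman-Fan type 7)
print(np.quantile(x, 0.25, method="median_unbiased"))  # type 8, the recommended median-unbiased definition
```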

4.
In a variety of settings, it is desirable to display a collection of likelihoods over a common interval. One approach is simply to superimpose the likelihood curves. However, where there are more than a handful of curves, such displays are extremely difficult to decipher. An alternative is simply to display a point estimate with a confidence interval, corresponding to each likelihood. However, these may be inadequate when the likelihood is not approximately normal, as can occur with small sample sizes or nonlinear models. A second dimension is needed to gauge the relative plausibility of different parameter values. We introduce the raindrop plot, a shaded figure over the range of parameter values having log-likelihood greater than some cutoff, with height varying proportional to the difference between the log-likelihood and the cutoff. In the case of a normal likelihood, this produces a reflected parabola so that deviations from normality can be easily detected. An analogue of the raindrop plot can also be used to display estimated random effect distributions, posterior distributions, and predictive distributions.
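A minimal sketch of the construction for a normal likelihood, with a hypothetical estimate, standard error, and cutoff:

```python
import numpy as np
import matplotlib.pyplot as plt

mu_hat, se = 1.3, 0.4                            # hypothetical estimate and standard error
theta = np.linspace(mu_hat - 3 * se, mu_hat + 3 * se, 400)
loglik = -0.5 * ((theta - mu_hat) / se) ** 2     # normal log-likelihood, up to a constant
cutoff = -2.0                                    # show the region within 2 log-likelihood units

h = np.clip(loglik - cutoff, 0.0, None)          # height above the cutoff, zero outside
plt.fill_between(theta, -h, h, alpha=0.5)        # reflect to form the raindrop
plt.axhline(0, color="gray", lw=0.5)             # normal case: a reflected parabola
plt.xlabel("parameter value")
plt.show()
```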

5.

A basic graphical approach for checking normality is the Q-Q plot, which compares sample quantiles against the population quantiles. In the univariate setting, the probability plot correlation coefficient test for normality has been studied extensively. We consider testing multivariate normality by using the correlation coefficient of the Q-Q plot. When multivariate normality holds, the sample squared distances should follow a chi-square distribution for large samples, and the plot should resemble a straight line. A correlation coefficient test can be constructed from the pairs of points in the probability plot. When the correlation coefficient test does not reject the null hypothesis, the sample data may come from a multivariate normal distribution or from some other distribution. We therefore test multivariate normality in two steps. First, we check multivariate normality using the probability plot correlation coefficient test. If the test does not reject the null hypothesis, we then test symmetry of the distribution and determine whether multivariate normality holds. This procedure is called the combination test. The size and power of this test are studied, and it is found that the combination test is, in general, more powerful than other tests for multivariate normality.
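A sketch of the first step on illustrative simulated data; in practice the critical value for the correlation would be obtained by simulation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=200)   # stands in for real data

n, p = X.shape
centered = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)        # squared Mahalanobis distances

probs = (np.arange(1, n + 1) - 0.5) / n
r = np.corrcoef(np.sort(d2), stats.chi2.ppf(probs, df=p))[0, 1]
print(f"probability-plot correlation: {r:.4f}")                 # compare to a simulated critical value
```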

6.
Yuzhi Cai. Econometric Reviews, 2016, 35(7): 1173-1193
This article proposes a general quantile function model that covers both one- and multiple-dimensional models and that takes several existing models in the literature as special cases. It also develops a new uniform Bayesian framework for quantile function modelling and illustrates the approach through different quantile function models. Many distributions are defined explicitly only via their quantile functions, as the corresponding distribution or density functions have no explicit mathematical expression; such distributions are rarely used in economic and financial modelling in practice. The developed methodology makes it more convenient to use these distributions in analyzing economic and financial data. Empirical applications to economic and financial time series, and comparisons with other types of models and methods, show that the developed method can be very useful in practice.
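As an aside on distributions defined only through their quantile functions, a sketch using the generalized lambda distribution (FKML form); glambda_Q is a hypothetical helper and the parameter values are illustrative:

```python
import numpy as np

def glambda_Q(u, l1, l2, l3, l4):
    """Quantile function of the generalized lambda distribution (FKML form)."""
    return l1 + ((u**l3 - 1) / l3 - ((1 - u)**l4 - 1) / l4) / l2

# No closed-form density exists, but sampling is immediate via inverse transform:
rng = np.random.default_rng(6)
u = rng.uniform(size=5)
print(glambda_Q(u, l1=0.0, l2=1.0, l3=0.2, l4=0.4))
```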

7.
Stochastic Models, 2013, 29(2-3): 377-400

It is well known that general phase-type distributions are considerably overparameterized, that is, their representations often require many more parameters than is necessary to define the distributions. In addition, phase-type distributions, even those defined by a small number of parameters, may have representations of high order. These two problems have serious implications when using phase-type distributions to fit data. To address this issue we consider fitting data with the wider class of matrix-exponential distributions. Representations for matrix-exponential distributions do not need to have a simple probabilistic interpretation, and it is this relaxation which ensures that the problems of overparameterization and high order do not present themselves. However, when using matrix-exponential distributions to fit data, a problem arises because it is unknown, in general, when their representations actually correspond to a distribution. In this paper we develop a characterization for matrix-exponential distributions and use it in a method to fit data using maximum likelihood estimation. The fitting algorithm uses convex semi-infinite programming combined with a nonlinear search.
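A sketch of evaluating a matrix-exponential density f(x) = alpha exp(Tx) t with t = -T1; the example is a small phase-type representation (so it is certainly a valid distribution), and the paper's semi-infinite programming fit is not reproduced:

```python
import numpy as np
from scipy.linalg import expm

alpha = np.array([1.0, 0.0])      # initial (row) vector
T = np.array([[-2.0, 2.0],
              [0.0, -3.0]])       # sub-generator
t = -T @ np.ones(2)               # exit-rate vector

def me_density(x):
    """Density f(x) = alpha @ expm(T x) @ t of a matrix-exponential law."""
    return alpha @ expm(T * x) @ t

print([round(float(me_density(x)), 4) for x in (0.5, 1.0, 2.0)])
```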

8.
It is well known that long-term exposure to high levels of pollution is hazardous to human health. Therefore, it is important to study and understand the behavior of pollutants in general. In this work, we study the occurrence of a pollutant concentration's surpassing a given threshold (an exceedance) as well as the length of time that the concentration stays above it. A general N(t)/D/1 queueing model is considered to jointly analyze those problems. A non-homogeneous Poisson process is used to model the arrivals of clusters of exceedances. Geometric and generalized negative binomial distributions are used to model the amount of time (cluster size) that the pollutant concentration stays above the threshold. A mixture model is also used for the cluster size distribution. The rate function of the non-homogeneous Poisson process is assumed to be of either the Weibull or the Musa–Okumoto type. The selection of the model that best fits the data is performed using the Bayes discrimination method and the sum of absolute differences as well as using a graphical criterion. Results are applied to the daily maximum ozone measurements provided by the monitoring network of the Metropolitan Area of Mexico City.
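A sketch of the model's two ingredients under illustrative parameters: cluster arrivals from a Weibull-rate non-homogeneous Poisson process, simulated by the time change t = Lambda^{-1}(s) = sigma * s**(1/beta), and geometric cluster sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, beta, p = 50.0, 1.4, 0.4   # Weibull scale/shape; geometric parameter (illustrative)
horizon = 365.0                   # days

# Map unit-rate Poisson arrivals s through Lambda^{-1}(s) = sigma * s**(1/beta)
s, arrivals = 0.0, []
while True:
    s += rng.exponential(1.0)
    t = sigma * s ** (1.0 / beta)
    if t > horizon:
        break
    arrivals.append(t)

cluster_sizes = rng.geometric(p, size=len(arrivals))   # days above the threshold per cluster
print(f"{len(arrivals)} exceedance clusters; mean length {cluster_sizes.mean():.2f} days")
```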

9.
The modelling process in Bayesian statistics constitutes the fundamental stage of the analysis, since depending on the chosen probability laws the inferences may vary considerably. This is particularly true when conflicts arise between two or more sources of information. For instance, inference in the presence of an outlier (which conflicts with the information provided by the other observations) can be highly dependent on the assumed sampling distribution. When heavy-tailed (e.g. t) distributions are used, outliers may be rejected, whereas this kind of robust inference is not available when we use light-tailed (e.g. normal) distributions. A long literature has established sufficient conditions on location-parameter models to resolve conflict in various ways. In this work, we consider a location-scale parameter structure, which is more complex than the single-parameter cases because conflicts can arise between three sources of information, namely the likelihood, the prior distribution for the location parameter and the prior for the scale parameter. We establish sufficient conditions on the distributions in a location-scale model to resolve conflicts in different ways as a single observation tends to infinity. In addition, for each case, we explicitly give the limiting posterior distributions as the conflict becomes more extreme.

10.
In statistical modeling, we strive to specify models that resemble the data collected in studies or observed from processes. Consequently, distributional specification and parameter estimation are central to parametric models. Graphical procedures, such as the quantile-quantile (QQ) plot, are arguably the most widely used method of distributional assessment, though critics find their interpretation to be overly subjective. Formal goodness-of-fit tests are available and are quite powerful, but they only indicate whether there is a lack of fit, not why. In this article, we explore the use of the lineup protocol to inject rigor into graphical distributional assessment and compare its power to that of formal distributional tests. We find that lineup tests are considerably more powerful than traditional tests of normality. A further investigation into the design of QQ plots shows that de-trended QQ plots are more powerful than the standard approach, provided the plot preserves the same distance scale in x and y. While we focus on diagnosing nonnormality, our approach is general and can be directly extended to the assessment of other distributions.
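A sketch of a QQ-plot lineup, assuming (as in the lineup protocol) that null panels are simulated from a normal with the sample's mean and standard deviation; the data and panel layout are illustrative:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
data = rng.exponential(size=100)             # hypothetical observed sample (non-normal)
m = 20
pos = rng.integers(m)                        # hide the real plot at a random position

fig, axes = plt.subplots(4, 5, figsize=(10, 8))
for i, ax in enumerate(axes.flat):
    y = data if i == pos else rng.normal(data.mean(), data.std(), size=data.size)
    stats.probplot(y, dist="norm", plot=ax)  # normal QQ plot in each panel
    ax.set_title(str(i))
    ax.set_xlabel("")
    ax.set_ylabel("")
plt.tight_layout()
plt.show()
print("the observed data were in panel", pos)   # reveal only after a viewer has chosen
```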

11.
Parametric models for interval censored data can now easily be fitted with minimal programming in certain standard statistical software packages. Regression equations can be introduced, both for the location and for the dispersion parameters. Finite mixture models can also be fitted, with a point mass on right (or left) censored observations, to allow for individuals who cannot have the event (or already have it). This mixing probability can also be allowed to follow a regression equation. Here, models based on nine different distributions are compared for three examples of heavily censored data as well as a set of simulated data. We find that, for parametric models, interval censoring can often be ignored and that the density at the centres of intervals can be used instead in the likelihood function, although the approximation is not always reliable. In the context of heavily interval censored data, the conclusions from parametric models are remarkably robust to changing distributional assumptions and generally more informative than those from the corresponding non-parametric models.
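A sketch of the approximation in question, comparing the exact interval-censored log-likelihood, sum of log(F(b) - F(a)), with the midpoint-density approximation, sum of log((b - a) f((a + b)/2)), under an illustrative Weibull model:

```python
import numpy as np
from scipy import stats

a = np.array([0.0, 1.0, 2.0, 3.0])          # interval left endpoints
b = np.array([1.0, 2.0, 3.0, 5.0])          # interval right endpoints
dist = stats.weibull_min(c=1.5, scale=2.0)  # illustrative parametric model

exact = np.log(dist.cdf(b) - dist.cdf(a)).sum()         # interval-censored log-likelihood
approx = np.log((b - a) * dist.pdf((a + b) / 2)).sum()  # midpoint-density approximation
print(f"exact: {exact:.4f}  midpoint approximation: {approx:.4f}")
```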

12.
In modeling count data collected from manufacturing processes, economic series, disease outbreaks and ecological surveys, there are usually a relatively large or small number of zeros compared to positive counts. Such low or high frequencies of zero counts often require the use of underdispersed or overdispersed probability models for the underlying data generating mechanism. The commonly used models such as generalized or zero-inflated Poisson distributions are parametric and can usually account for only the overdispersion, but such distributions are often found to be inadequate in modeling underdispersion because of the need for awkward parameter or support restrictions. This article introduces a flexible class of semiparametric zero-altered models which account for both underdispersion and overdispersion and includes other familiar models such as those mentioned above as special cases. Consistency and asymptotic normality of the estimator of the dispersion parameter are derived under general conditions. Numerical support for the performance of the proposed method of inference is presented for the case of common discrete distributions.
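For context, a sketch of the fully parametric zero-altered (hurdle) Poisson likelihood that this semiparametric class generalizes; parameters are illustrative:

```python
import numpy as np
from scipy import stats

def hurdle_poisson_loglik(y, pi0, lam):
    """Zero-altered Poisson: P(Y=0) = pi0; positive counts are zero-truncated Poisson(lam)."""
    y = np.asarray(y)
    zeros = np.sum(y == 0)
    pos = y[y > 0]
    trunc_pmf = stats.poisson.pmf(pos, lam) / (1.0 - stats.poisson.pmf(0, lam))
    return zeros * np.log(pi0) + np.sum(np.log((1.0 - pi0) * trunc_pmf))

# pi0 is free of lam, so the zero frequency can be under- or over-represented:
y = np.array([0, 0, 0, 1, 2, 2, 3, 5])
print(hurdle_poisson_loglik(y, pi0=0.4, lam=2.0))
```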

13.
Various models have previously been proposed for data comprising m repeated measurements on each of N subjects. Log likelihood ratio tests may be used to help choose between possible models, but these tests are based on distributions which in theory apply only asymptotically. With small N, the log likelihood ratio approximation is unreliable, tending to reject the simpler of two models more often than it should. This is shown by reference to three datasets and analogous simulated data. For two of the three datasets, subjects fall into two groups. Log likelihood ratio tests confirm that for each of these two datasets group means over time differ. Tests suggest that group covariance structures also differ.

14.
In the class of discrete-time Markovian processes, two models are widely used: the Markov chain and the hidden Markov model. A major difference between these two models lies in the relation between successive outputs of the observed variable. In a visible Markov chain, these are directly correlated, while in hidden models they are not. However, in some situations it is possible to observe both a hidden Markov chain and a direct relation between successive observed outputs. Unfortunately, the use of either a visible or a hidden model implies the suppression of one of these hypotheses. This paper presents a Markovian model under random environment, called the Double Chain Markov Model, which takes into account the main features of both visible and hidden models. Its main purpose is the modelling of non-homogeneous time series. It is very flexible and can be estimated with traditional methods. The model is applied to a sequence of wind speeds, and it appears to model the data more successfully than both the usual Markov chains and hidden Markov models.
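A minimal sketch of the model's defining feature, with hypothetical transition matrices: the hidden chain evolves on its own, while each output depends on both the current hidden state and the previous output:

```python
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])                  # hidden-state transition matrix
# One output transition matrix per hidden state (the "double chain"):
B = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.5, 0.5]]])

s, y, outputs = 0, 0, []
for _ in range(10):
    s = rng.choice(2, p=A[s])               # hidden chain step
    y = rng.choice(2, p=B[s, y])            # output depends on state AND previous output
    outputs.append(int(y))
print(outputs)
```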

15.
Markov regression models are useful tools for estimating the impact of risk factors on rates of transition between multiple disease states. Alzheimer's disease (AD) is an example of a multi-state disease process in which great interest lies in identifying risk factors for transition. In this context, non-homogeneous models are required because transition rates change as subjects age. In this report we propose a non-homogeneous Markov regression model that allows for reversible and recurrent disease states, transitions among multiple states between observations, and unequally spaced observation times. We conducted simulation studies to demonstrate performance of estimators for covariate effects from this model and compare performance with alternative models when the underlying non-homogeneous process was correctly specified and under model misspecification. In simulation studies, we found that covariate effects were biased if non-homogeneity of the disease process was not accounted for. However, estimates from non-homogeneous models were robust to misspecification of the form of the non-homogeneity. We used our model to estimate risk factors for transition to mild cognitive impairment (MCI) and AD in a longitudinal study of subjects included in the National Alzheimer's Coordinating Center's Uniform Data Set. Using our model, we found that subjects with MCI affecting multiple cognitive domains were significantly less likely to revert to normal cognition.

16.
In this paper, we consider that the degradation of two performance characteristics of a product can be modelled by stochastic processes and linked jointly by copula functions, with a different stochastic process governing the degradation of each performance characteristic (PC). Different heterogeneous and homogeneous models are presented, combining copula functions with different combinations of the stochastic processes most used in degradation analysis as marginal distributions. This is an important consideration because the degradation of each PC may differ in nature. As the joint distributions of the proposed models are complex, the parameters of interest are estimated via MCMC. A simulation study is performed to compare heterogeneous and homogeneous models. In addition, the proposed models are applied to crack propagation data from two terminals of an electronic device, and some insights are provided about product reliability under heterogeneous models.

17.
In this article, we derive general matrix formulae for the second-order biases of maximum likelihood estimators (MLEs) in a class of heteroscedastic symmetric nonlinear regression models, thus generalizing some results in the literature. This class of regression models includes all symmetric continuous distributions and has a wide range of practical applications in fields such as engineering, biology, medicine and economics, among others. The variety of distributions with kurtosis coefficients different from the normal's may give more flexibility in the choice of an appropriate distribution, particularly to accommodate outlying and influential observations. We derive a joint iterative process for estimating the mean and dispersion parameters. We also present simulation studies of the biases of the MLEs.

18.
Estimating parameters of heavy-tailed distributions plays a central role in extreme value theory. It is well known that classical estimators based on first-order asymptotics, such as the Hill, rank-based and QQ estimators, are seriously biased under the finer second-order regular variation framework. To reduce the bias, many authors have proposed so-called second-order reduced-bias estimators for both first- and second-order tail parameters. In this work, estimation of parameters in heavy-tailed distributions is studied under the second-order regular variation framework when the second-order parameter in the distribution tail is known. This is motivated in large part by recent work by the authors showing that the second-order tail parameter is known for a large class of popular random difference equations (for example, ARCH models). The focus is on least squares estimators that generalize rank-based and QQ estimators. Though other possible estimators are briefly discussed, the least squares estimators are the simplest to use and perform best for finite samples in Monte Carlo simulations.
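For reference, a sketch of the classical Hill estimator whose bias motivates the reduced-bias literature; the Pareto sample and the choice of k are illustrative:

```python
import numpy as np

def hill(x, k):
    """Hill estimate of the tail index gamma from the k largest order statistics."""
    xs = np.sort(x)[::-1]                       # descending order statistics
    return np.mean(np.log(xs[:k])) - np.log(xs[k])

rng = np.random.default_rng(4)
x = rng.pareto(a=2.0, size=5000) + 1.0          # exact Pareto tail, true gamma = 0.5
print(hill(x, k=200))                           # try several k; off the exact Pareto case, bias grows with k
```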

19.
Not only are copula functions joint distribution functions in their own right, they also provide a link between multivariate distributions and their lower-dimensional marginal distributions. Copulas have a structure that allows us to characterize all possible multivariate distributions, and therefore they have the potential to be a very useful statistical tool. Although copulas can be traced back to 1959, there is still much scope for new results, as most of the early work was theoretical rather than practical. We focus on simple practical tools based on conditional expectation, because such tools are not widely available. When dealing with data sets in which the dependence throughout the sample is variable, we suggest that copula-based regression curves may be more accurate predictors of specific outcomes than linear models. We derive simple conditional expectation formulae in terms of copulas and apply them to a combination of simulated and real data.
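A sketch of a copula-based regression curve E[Y | X = x] under a Gaussian copula; the marginals and correlation are illustrative, and cond_mean_y is a hypothetical helper:

```python
import numpy as np
from scipy import stats

rho = 0.7                                     # illustrative copula correlation
FX = stats.expon(scale=2.0)                   # illustrative marginal of X
FY = stats.lognorm(s=0.5)                     # illustrative marginal of Y

def cond_mean_y(x, n_grid=400):
    """E[Y | X = x]: under a Gaussian copula, Z2 | Z1 = z1 is N(rho*z1, 1 - rho^2)."""
    z1 = stats.norm.ppf(FX.cdf(x))
    u = (np.arange(n_grid) + 0.5) / n_grid    # equal-probability grid over the conditional law
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * stats.norm.ppf(u)
    return FY.ppf(stats.norm.cdf(z2)).mean()  # push through Y's quantile function and average

for x in (0.5, 2.0, 5.0):
    print(x, round(cond_mean_y(x), 3))        # traces the copula regression curve
```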

20.
The issue of modelling non-Gaussian time series data has been examined by several authors in recent years. Zeger (1988) introduced a parameter-driven model for a time series of counts, as well as a more general observation-driven model for non-Gaussian data (Zeger & Qaqish, 1988). This paper examines the application of the added variable plot to these two models. This plot is useful for determining the strength of relationships and for detecting influential or outlying observations.
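For intuition, a sketch of the classical added variable plot in an ordinary linear model (the paper's extension to Zeger's non-Gaussian models is not reproduced); all data are simulated:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # current design (intercept + one regressor)
z = rng.normal(size=n)                                  # candidate added variable
y = X @ np.array([1.0, 2.0]) + 0.8 * z + rng.normal(scale=0.5, size=n)

def resid(v, X):
    """Residuals of v after least-squares regression on X."""
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

plt.plot(resid(z, X), resid(y, X), "o", ms=3)   # slope of this cloud = coefficient z would receive
plt.xlabel("residuals of z on X")
plt.ylabel("residuals of y on X")
plt.show()
```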
