Similar Articles
 20 similar articles found.
1.
When prediction intervals are constructed using unobserved component models (UCM), problems can arise due to the possible existence of components that may or may not be conditionally heteroscedastic. Accurate coverage depends on correctly identifying the source of the heteroscedasticity. Different proposals for testing heteroscedasticity have been applied to UCM; however, in most cases these procedures are unable to identify the heteroscedastic component correctly. The main issue is that the test statistics are affected by the presence of serial correlation, so that the distribution of the statistic under conditional homoscedasticity remains unknown. We propose a nonparametric statistic for testing heteroscedasticity based on the well-known Wilcoxon rank statistic. We study the asymptotic validity of the statistic and examine bootstrap procedures for approximating its finite sample distribution. Simulation results show an improvement in the size of the homoscedasticity tests and a power that is clearly comparable with the best alternatives in the literature. We also apply the test to real inflation data. Testing for a conditionally heteroscedastic effect in the error terms, we reach conclusions that differ in almost all cases from those given by the alternative test statistics in the literature.
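To make the idea concrete, here is a minimal, hedged sketch of a rank-based check for changing conditional variance: a Wilcoxon rank-sum comparison of squared residuals from the first and second halves of a series. It only illustrates the general principle; the half-sample split and the use of raw squared residuals are simplifying assumptions, and this is not the statistic proposed in the paper.

```python
# Illustrative sketch only: a crude rank-based check for changing conditional
# variance, in the spirit of a Wilcoxon-type heteroscedasticity test. It is
# NOT the statistic proposed in the paper; the half-sample split and the use
# of squared residuals are simplifying assumptions.
import numpy as np
from scipy.stats import ranksums

def rank_heteroscedasticity_check(residuals):
    """Compare the ranks of squared residuals in the first and second half."""
    e2 = np.asarray(residuals) ** 2
    half = len(e2) // 2
    stat, pvalue = ranksums(e2[:half], e2[half:])  # Wilcoxon rank-sum test
    return stat, pvalue

# Example: homoscedastic vs. heteroscedastic noise
rng = np.random.default_rng(0)
homo = rng.normal(size=500)
hetero = rng.normal(size=500) * np.linspace(0.5, 2.0, 500)
print(rank_heteroscedasticity_check(homo))    # p-value roughly uniform
print(rank_heteroscedasticity_check(hetero))  # p-value near zero
```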

2.
On Maximum Depth and Related Classifiers
Over the last couple of decades, data depth has emerged as a powerful exploratory and inferential tool for multivariate data analysis, with widespread applications. This paper investigates the possible use of different notions of data depth in non-parametric discriminant analysis. First, we consider the situation where the prior probabilities of the competing populations are all equal and investigate classifiers that assign an observation to the population with respect to which it has the maximum location depth. We then propose a different depth-based classification technique for unequal prior problems, which is also useful for equal prior cases, especially when the populations have different scatters and shapes. We use some simulated data sets as well as some benchmark real examples to evaluate the performance of these depth-based classifiers. The large sample behaviour of the misclassification rates of these depth-based non-parametric classifiers is derived under appropriate regularity conditions.
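As a toy illustration of the maximum-depth rule ("assign a point to the population in which it is deepest"), the sketch below uses Mahalanobis depth with equal priors. The choice of depth function and the two-population setup are illustrative assumptions, not the particular depth notions analysed in the paper.

```python
# Toy illustration of a maximum-depth classifier using Mahalanobis depth,
# depth(x; P) = 1 / (1 + (x - mu)' Sigma^{-1} (x - mu)).
# The paper studies several depth notions; this one is chosen only because it
# is easy to compute, and equal prior probabilities are assumed.
import numpy as np

def mahalanobis_depth(x, sample):
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    d2 = (x - mu) @ cov_inv @ (x - mu)
    return 1.0 / (1.0 + d2)

def max_depth_classify(x, populations):
    """Assign x to the population with respect to which it is deepest."""
    depths = [mahalanobis_depth(x, pop) for pop in populations]
    return int(np.argmax(depths))

rng = np.random.default_rng(1)
pop0 = rng.normal(loc=[0, 0], scale=1.0, size=(200, 2))
pop1 = rng.normal(loc=[2, 2], scale=1.5, size=(200, 2))
print(max_depth_classify(np.array([0.3, -0.1]), [pop0, pop1]))  # likely 0
print(max_depth_classify(np.array([2.5, 2.2]), [pop0, pop1]))   # likely 1
```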

3.
The standard approach to constructing nonparametric tolerance intervals is to use the appropriate order statistics, provided a minimum sample size requirement is met. However, it is well known that this traditional approach is conservative with respect to the nominal level. One way to improve the coverage probabilities is to use interpolation. However, the extension to two-sided tolerance intervals, as well as to the case where the minimum sample size requirement is not met, has not been studied. In this paper, an approach using linear interpolation is proposed for improving coverage probabilities in the two-sided setting. When the minimum sample size requirement is not met, coverage probabilities are shown to improve by using linear extrapolation. A discussion of the effect on coverage probabilities and expected lengths when transforming the data is also presented. The applicability of this approach is demonstrated using three real data sets.
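For context, the classical order-statistic calculation on which the traditional (conservative) approach rests can be written down directly: the coverage of (X_(r), X_(s)) follows a Beta(s − r, n − s + r + 1) distribution, so the achieved confidence and the minimum sample size can be computed from the Beta CDF. The sketch below shows this background calculation only; it does not implement the interpolation or extrapolation adjustments proposed in the paper.

```python
# Classical order-statistic calculation behind nonparametric tolerance
# intervals (background only, not the paper's interpolation method): the
# coverage of (X_(r), X_(s)) is Beta(s - r, n - s + r + 1) distributed, so the
# achieved confidence for content p is 1 - BetaCDF(p; s - r, n - s + r + 1).
from scipy.stats import beta

def two_sided_confidence(n, r, s, p):
    """Confidence that (X_(r), X_(s)) covers at least a proportion p."""
    return 1.0 - beta.cdf(p, s - r, n - s + r + 1)

def minimum_sample_size(p=0.95, conf=0.95):
    """Smallest n for which (X_(1), X_(n)) is a (p, conf) tolerance interval."""
    n = 2
    while two_sided_confidence(n, 1, n, p) < conf:
        n += 1
    return n

print(minimum_sample_size())                   # classical answer: 93 for (0.95, 0.95)
print(two_sided_confidence(100, 1, 100, 0.95))
```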

4.
Consider a non-parametric regression model Y = m(X) + ε, where m is an unknown regression function, Y is a real-valued response variable, X is a real covariate, and ε is the error term. In this article, we extend the usual tests for homoscedasticity by developing consistent tests for independence between X and ε. Further, we investigate the local power of the proposed tests using Le Cam's contiguous alternatives. An asymptotic power study under local alternatives, along with an extensive finite sample simulation study, shows that the performance of the new tests is competitive with existing ones. Furthermore, the practicality of the new tests is demonstrated using two real data sets.

5.
Transformation of the response is a popular way to meet the usual assumptions of statistical methods based on linear models, such as ANOVA and the t-test. In this paper, we introduce new families of transformations for proportion or percentage data. Most transformations for proportions require 0 < x < 1 (where x denotes the proportion), which is often not the case in real data. The proposed families of transformations allow x = 0 and x = 1. We study the properties of the proposed transformations, as well as their performance in achieving normality and homoscedasticity. We analyze three real data sets to show empirically how the new transformations perform in meeting the usual assumptions. A simulation study is also performed to examine the behavior of the new families of transformations.
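To illustrate the boundary problem being addressed, the sketch below contrasts the logit transform, which breaks down at x = 0 and x = 1, with two standard alternatives that tolerate the endpoints (arcsine square root and an epsilon-adjusted logit). These are textbook transformations shown for context only; they are not the new families proposed in the paper.

```python
# Illustration of the boundary problem: the logit transform requires
# 0 < x < 1, while the arcsine-square-root transform and an epsilon-adjusted
# logit tolerate x = 0 and x = 1. Standard transformations for context only;
# not the families proposed in the paper.
import numpy as np

def logit(x):
    return np.log(x / (1.0 - x))           # undefined at x = 0 or x = 1

def arcsine_sqrt(x):
    return np.arcsin(np.sqrt(x))           # defined on the closed interval [0, 1]

def adjusted_logit(x, eps=0.005):
    return np.log((x + eps) / (1.0 - x + eps))  # finite at the boundaries

x = np.array([0.0, 0.02, 0.5, 0.98, 1.0])
print(arcsine_sqrt(x))
print(adjusted_logit(x))
```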

6.
We propose a new class of continuous distributions with two extra shape parameters, named the generalized odd log-logistic family of distributions. The proposed family contains as special cases the proportional reversed hazard rate and odd log-logistic classes. Its density function can be expressed as a linear combination of exponentiated densities based on the same baseline distribution. Some of its mathematical properties, including ordinary moments, quantile and generating functions, two entropy measures and order statistics, are obtained. We derive a power series for the quantile function. We discuss the method of maximum likelihood for estimating the model parameters and study the behaviour of the estimators by means of Monte Carlo simulations. We introduce the log-odd log-logistic Weibull regression model for censored data based on the odd log-logistic-Weibull distribution. The importance of the new family is illustrated using three real data sets. These applications indicate that this family can provide better fits than other well-known classes of distributions. The beauty and importance of the proposed family lie in its ability to model different types of real data.

7.
It is often necessary to compare two measurement methods in medicine and other experimental sciences. This problem covers a broad range of data. Many authors have explored ways of assessing the agreement of two sets of measurements. However, there has been relatively little attention to the problem of determining the sample size for designing an agreement study. In this paper, a method using the interval approach for concordance is proposed to calculate the sample size for an agreement study. The philosophy is that concordance is satisfied when no more than a pre-specified number k of discordances are found in a reasonably large sample of size n, since it is much easier to define a discordant pair. The goal is to find such a reasonably large sample size n. The sample size calculation is based on two quantities, the discordance rate and the tolerance probability, which in turn can be used to quantify an agreement study. The proposed approach is demonstrated on a real data set.
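One plausible reading of this rule (an interpretation offered for illustration, not necessarily the paper's exact formula) is: choose the smallest n such that, if the true discordance rate were as high as some unacceptable level p1, the chance of still observing no more than k discordant pairs is at most 1 − tolerance. A binomial search over n then gives the sample size; p1, k and the tolerance probability are all assumed inputs here.

```python
# Hedged sketch (one reading of the rule, not necessarily the paper's exact
# formula): choose the smallest n such that, if the true discordance rate were
# as high as p1, the chance of still seeing no more than k discordant pairs is
# at most 1 - tolerance. Passing the study (<= k discordances) then supports
# concordance with the stated tolerance probability.
from scipy.stats import binom

def agreement_sample_size(p1, k, tolerance, n_max=100000):
    """p1: unacceptable discordance rate; k: allowed number of discordances."""
    alpha = 1.0 - tolerance
    for n in range(k + 1, n_max + 1):
        if binom.cdf(k, n, p1) <= alpha:   # P(X <= k | rate p1) small enough
            return n
    raise ValueError("tolerance not reachable for n <= n_max")

print(agreement_sample_size(p1=0.05, k=2, tolerance=0.95))
```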

8.
The aim of this study is to determine the effect of informative priors for variables with missing values and to compare Bayesian Cox regression with standard Cox regression analysis. For this purpose, simulated data sets with different sample sizes and different missing rates were first generated, and each data set was analysed by Cox regression and by Bayesian Cox regression with informative priors. A real lung cancer data set was then analysed. The results show that using informative priors for variables with missing values solved the missing data problem.

9.
In this paper, we study the performance of the most popular bootstrap schemes for multilevel data. We also propose a modified version of the wild bootstrap procedure for hierarchical data structures. The wild bootstrap requires neither homoscedasticity nor assumptions on the distribution of the error processes; hence, it is a valuable tool for robust inference in a multilevel framework. We assess the finite sample performance of the schemes through a Monte Carlo study. The results show that for large sample sizes it always pays off to adopt an agnostic approach, as the wild bootstrap outperforms the other techniques.
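A simplified sketch of a cluster-level wild bootstrap for a two-level linear model follows: each cluster's residuals are multiplied by a common Rademacher (±1) weight and responses are regenerated from the fitted values. This is a generic version for illustration; the modified procedure proposed in the paper may differ in the weight distribution and in how the hierarchy is handled.

```python
# Simplified cluster-level wild bootstrap for a two-level linear model:
# one Rademacher weight per cluster multiplies that cluster's residuals, and
# new responses are rebuilt from the fitted values. Generic illustration only.
import numpy as np

def wild_cluster_bootstrap(y, X, clusters, n_boot=999, rng=None):
    rng = np.random.default_rng(rng)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted
    ids = np.unique(clusters)
    boot_betas = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=len(ids))     # one weight per cluster
        w_obs = w[np.searchsorted(ids, clusters)]      # map weights to rows
        y_star = fitted + resid * w_obs
        boot_betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return beta_hat, boot_betas                        # e.g. percentile CIs from boot_betas

# Toy usage with 20 clusters of 10 observations each
rng = np.random.default_rng(2)
clusters = np.repeat(np.arange(20), 10)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(size=200) + rng.normal(size=20)[clusters]
beta_hat, boot = wild_cluster_bootstrap(y, X, clusters, n_boot=200, rng=3)
print(beta_hat, boot.std(axis=0))
```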

10.
We propose a Bayesian semiparametric methodology for quantile regression modelling. In particular, working with parametric quantile regression functions, we develop Dirichlet process mixture models for the error distribution in an additive quantile regression formulation. The proposed non-parametric prior probability models allow the shape of the error density to adapt to the data and thus provide more reliable predictive inference than models based on parametric error distributions. We consider extensions to quantile regression for data sets that include censored observations. Moreover, we employ dependent Dirichlet processes to develop quantile regression models that allow the error distribution to change non-parametrically with the covariates. Posterior inference is implemented using Markov chain Monte Carlo methods. We assess and compare the performance of our models using both simulated and real data sets.

11.
The Weibull, log-logistic and log-normal distributions are extensively used to model time-to-event data. The Weibull family accommodates only monotone hazard rates, whereas the log-logistic and log-normal are widely used to model unimodal hazard functions. The increasing availability of lifetime data with a wide range of characteristics motivates us to develop more flexible models that accommodate both monotone and non-monotone hazard functions. One such model is the exponentiated Weibull distribution, which not only accommodates monotone hazard functions but also allows for unimodal and bathtub-shaped hazard rates. This distribution has demonstrated considerable potential in univariate analysis of time-to-event data. However, the primary focus of many studies is rather on understanding the relationship between the time to the occurrence of an event and one or more covariates. This leads to a consideration of regression models that can be formulated in different ways in survival analysis. One such strategy involves formulating models for the accelerated failure time family of distributions. The distributions most commonly used for this purpose are the Weibull, log-logistic and log-normal. In this study, we show that the exponentiated Weibull distribution is closed under the accelerated failure time family. We then formulate a regression model based on the exponentiated Weibull distribution and develop large sample theory for statistical inference. We also describe a Bayesian approach to inference. Two comparative studies based on real and simulated data sets reveal that exponentiated Weibull regression can be valuable in adequately describing different types of time-to-event data.
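For reference, a common parameterization of the exponentiated Weibull distribution function (which may differ from the one used in the paper) is

$$F(t;\alpha,\theta,\sigma)=\bigl[1-\exp\{-(t/\sigma)^{\alpha}\}\bigr]^{\theta},\qquad t>0,\ \alpha,\theta,\sigma>0,$$

with $\theta=1$ recovering the Weibull distribution and $\alpha=1$ the exponentiated exponential; depending on $(\alpha,\theta)$ the hazard can be increasing, decreasing, unimodal or bathtub-shaped. In an accelerated failure time formulation the scale is linked to covariates, for example $\sigma=\exp(x^{\top}\beta)$, so that $\log T = x^{\top}\beta + \text{error}$.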

12.
In this paper, we examine by Monte Carlo experiments the small sample properties of the W (Wald), LM (Lagrange multiplier) and LR (likelihood ratio) tests for equality between sets of coefficients in two linear regressions under heteroscedasticity. The small sample properties of the size-corrected W, LM and LR tests proposed by Rothenberg (1984) are also examined, and the size-corrected W and LM tests are shown to perform very well. Further, we examine the two-stage test consisting of a test for homoscedasticity followed by the Chow (1960) test if homoscedasticity is indicated, or by one of the W, LM or LR tests if heteroscedasticity should be assumed. It is shown that the pretest does not reduce the size distortion much when the size-corrected critical values are used in the W, LM and LR tests.

13.
14.
There is a growing demand for public use data, while at the same time there are increasing concerns about the privacy of personal information. One proposed method for accomplishing both goals is to release data sets that do not contain real values but yield the same inferences as the actual data. The idea is to view confidential data as missing and use multiple imputation techniques to create synthetic data sets. In this article, we compare techniques for creating synthetic data sets in simple scenarios with a binary variable.
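A toy sketch of the fully synthetic approach for a single binary variable is given below: the confidential values are treated as missing and m synthetic copies are drawn from a Beta-Binomial posterior predictive distribution. The uniform Beta(1, 1) prior and the single-variable setting are illustrative assumptions; the article compares several specific synthesizers rather than this exact recipe.

```python
# Toy sketch of fully synthetic data for a single binary variable: treat the
# confidential values as missing and draw m synthetic data sets from the
# Beta-Binomial posterior predictive distribution. Illustrative only; the
# article compares several specific synthesizers.
import numpy as np

def synthesize_binary(data, m=5, a=1.0, b=1.0, rng=None):
    rng = np.random.default_rng(rng)
    n, successes = len(data), int(np.sum(data))
    synthetic = []
    for _ in range(m):
        p_star = rng.beta(a + successes, b + n - successes)  # posterior draw
        synthetic.append(rng.binomial(1, p_star, size=n))    # synthetic records
    return synthetic

observed = np.random.default_rng(4).binomial(1, 0.3, size=100)
for d in synthesize_binary(observed, m=3, rng=5):
    print(d.mean())
```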

15.
This paper introduces a nonparametric approach for testing the equality of two or more survival distributions based on right censored failure times with missing population marks for the censored observations. The standard log-rank test is not applicable here because the population membership information is not available for the right censored individuals. We propose to use imputed population marks for the censored observations, leading to fractional at-risk sets that can be used in a two sample censored data log-rank test. We demonstrate with a simple example that there can be a gain in power from imputing population marks for the right censored individuals (the proposed method) compared with simply removing them (which would also maintain the correct size). The performance of the imputed log-rank tests obtained in this way is studied through simulation. We also obtain an asymptotic linear representation of our test statistic. Our testing methodology is illustrated using a real data set.

16.
It may sometimes be clear from background knowledge that a population under investigation consists, in some proportions, of a known number of subpopulations whose distributions belong to the same, yet unknown, family. While a parametric family is commonly used in practice, one can also consider nonparametric families to avoid distributional misspecification. In this article, we propose a solution using a mixture-based nonparametric family for the component distribution in a finite mixture model, as opposed to some recent research that utilizes a kernel-based approach. In particular, we present a semiparametric maximum likelihood estimation procedure for the model parameters and tackle the bandwidth selection problem via some popular means of model selection. Empirical comparisons through simulation studies and three real data sets suggest that estimators based on our mixture-based approach are more efficient than those based on the kernel-based approach, in terms of both parameter estimation and overall density estimation.

17.
Relationships between species and their environment are a key component in understanding ecological communities. Usually, such data are collected repeatedly over time or space for both the communities and their environment, which leads to a sequence of pairs of ecological tables, i.e. multi-way matrices. This work proposes a new method that combines the STATICO and Tucker3 techniques and addresses the problem of describing not only the stable part of the dynamics of structure–function relationships between communities and their environment (in different locations and/or at different times), but also the interactions and changes associated with the ecosystems' dynamics. At the same time, emphasis is given to the comparison with the STATICO method on the same (real) data set, where advantages and drawbacks are explored and discussed. This study thus produces a general methodological framework and develops a new technique to facilitate the use of these practices by researchers. Furthermore, this first application to estuarine environmental data shows that one of the major advantages of modeling ecological data sets with the CO-TUCKER model is the gain in interpretability.

18.
High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by applications in high-throughput genomic data analysis. In this paper, we propose a class of regularization methods, integrating both the penalized empirical likelihood and pseudoscore approaches, for variable selection and estimation in sparse and high-dimensional additive hazards regression models. When the number of covariates grows with the sample size, we establish asymptotic properties of the resulting estimator and the oracle property of the proposed method. It is shown that the proposed estimator is more efficient than that obtained from the non-concave penalized likelihood approach in the literature. Based on a penalized empirical likelihood ratio statistic, we further develop a nonparametric likelihood approach for testing linear hypotheses about the regression coefficients and, consequently, for constructing confidence regions. Simulation studies are carried out to evaluate the performance of the proposed methodology, and two real data sets are also analyzed.

19.
Considerable progress has been made in applying Markov chain Monte Carlo (MCMC) methods to the analysis of epidemic data. However, this likelihood-based approach can be inefficient because of the limited data available concerning an epidemic outbreak. This paper considers an alternative approach to studying epidemic data using Approximate Bayesian Computation (ABC) methodology. ABC is a simulation-based technique for obtaining an approximate sample from the posterior distribution of the model parameters and, in an epidemic context, is very easy to implement. A new approach to ABC is introduced which generates a set of values from the (approximate) posterior distribution of the parameters during each simulation, rather than a single value. This is based upon coupling simulations with different sets of parameters, and we call the resulting algorithm coupled ABC. The new methodology is used to analyse final size data for epidemics amongst communities partitioned into households. It is shown that for these epidemic data sets coupled ABC is more efficient than standard ABC and MCMC-ABC.
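For orientation, the sketch below is a minimal ABC rejection sampler for a toy Reed–Frost epidemic in a single homogeneous group, with the final epidemic size as the observed summary. It illustrates only the basic ABC mechanism that coupled ABC refines; the prior, the epidemic model and the tolerance are illustrative assumptions rather than the household model analysed in the paper.

```python
# Minimal ABC rejection sampler for a toy Reed-Frost epidemic, included only
# to show the basic ABC idea that coupled ABC refines. A candidate
# transmission probability p is kept when its simulated final size matches the
# observed final size within a tolerance.
import numpy as np

def reed_frost_final_size(p, n_susceptible=99, n_infected=1, rng=None):
    rng = np.random.default_rng(rng)
    s, i, infected_total = n_susceptible, n_infected, 0
    while i > 0:
        escapes = rng.binomial(s, (1.0 - p) ** i)   # susceptibles escaping infection
        new_inf = s - escapes
        infected_total += new_inf
        s, i = escapes, new_inf
    return infected_total

def abc_rejection(observed_final_size, n_draws=20000, tol=2, rng=None):
    rng = np.random.default_rng(rng)
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0.0, 0.1)                   # prior on transmission probability
        if abs(reed_frost_final_size(p, rng=rng) - observed_final_size) <= tol:
            accepted.append(p)
    return np.array(accepted)                       # approximate posterior sample

posterior = abc_rejection(observed_final_size=40, n_draws=5000, rng=6)
print(posterior.mean(), len(posterior))
```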

20.
We propose a mixture of latent variables model for the model-based clustering, classification, and discriminant analysis of data comprising variables of mixed type. This approach is a generalization of latent variable analysis, and model fitting is carried out within the expectation-maximization framework. Our approach is outlined, and a simulation study is conducted to illustrate the effect of sample size and noise on the standard errors and on the recovery probabilities for the number of groups. Our modelling methodology is then applied to two real data sets, and their clustering and classification performance is discussed. We conclude with a discussion and suggestions for future work.
