1.

A likelihood ratio test of a homoscedastic normal mixture against a heteroscedastic normal mixture

Yungtai Lo 《Statistics and Computing》2008,18(3):233-240

It is generally assumed that the likelihood ratio statistic for testing the null hypothesis that data arise from a homoscedastic normal mixture distribution versus the alternative hypothesis that data arise from a heteroscedastic normal mixture distribution has an asymptotic χ² reference distribution with degrees of freedom equal to the difference in the number of parameters being estimated under the alternative and null models under some regularity conditions. Simulations show that the χ² reference distribution will give a reasonable approximation for the likelihood ratio test only when the sample size is 2000 or more and the mixture components are well separated when the restrictions suggested by Hathaway (Ann. Stat. 13:795–800, 1985) are imposed on the component variances to ensure that the likelihood is bounded under the alternative distribution. For small and medium sample sizes, parametric bootstrap tests appear to work well for determining whether data arise from a normal mixture with equal variances or a normal mixture with unequal variances.
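
The parametric bootstrap test mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`fit_mixture`, `lrt_stat`, `bootstrap_lrt`), the median-split starting values, the restriction to two components, and the variance-ratio floor `c` (a crude stand-in for Hathaway's restriction on the component variances) are all choices made here for illustration.

```python
import numpy as np

def fit_mixture(x, equal_var, n_iter=200, c=0.05):
    """EM fit of a two-component normal mixture; returns (loglik, pi, mu, var)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Assumed starting values: split the sample at its median.
    pi = 0.5
    mu = np.array([x[x <= med].mean(), x[x > med].mean()])
    var = np.full(2, x.var())

    def weighted_dens(pi, mu, var):
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        return dens * np.array([pi, 1.0 - pi])

    for _ in range(n_iter):
        w = weighted_dens(pi, mu, var)
        resp = w / w.sum(axis=1, keepdims=True)          # E-step: responsibilities
        nk = resp.sum(axis=0)                            # M-step starts here
        pi = nk[0] / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        ss = (resp * (x[:, None] - mu) ** 2).sum(axis=0)
        var = np.full(2, ss.sum() / x.size) if equal_var else ss / nk
        # Illustrative floor: keep the variance ratio at or above c**2.
        var = np.maximum(var, c ** 2 * var.max())
    loglik = np.log(weighted_dens(pi, mu, var).sum(axis=1)).sum()
    return loglik, pi, mu, var

def lrt_stat(x):
    """Likelihood ratio statistic plus the fitted null (equal-variance) parameters."""
    ll0, pi, mu, var = fit_mixture(x, equal_var=True)
    ll1 = fit_mixture(x, equal_var=False)[0]
    # Clip at zero because EM may stop at a local optimum.
    return max(2.0 * (ll1 - ll0), 0.0), (pi, mu, var)

def bootstrap_lrt(x, n_boot=199, seed=0):
    """Parametric bootstrap p-value: resample from the fitted null model."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    stat, (pi, mu, var) = lrt_stat(x)
    sd = np.sqrt(var[0])
    exceed = 0
    for _ in range(n_boot):
        comp = (rng.random(x.size) >= pi).astype(int)    # component 0 with probability pi
        xb = rng.normal(mu[comp], sd)
        exceed += lrt_stat(xb)[0] >= stat
    return stat, (1 + exceed) / (n_boot + 1)
```

Calling `bootstrap_lrt(sample)` returns the observed statistic and a bootstrap p-value; the bootstrap replaces the χ² reference distribution with the empirical distribution of the statistic under the fitted equal-variance model.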

2.

On the distribution of the correlation coefficient when sampling from a mixture of two bivariate normal densities: Robustness and the effect of outliers

M. S. Srivastava G. C. Lee 《Revue canadienne de statistique》1984,12(2):119-133

The distribution of the sample correlation coefficient is derived when the population is a mixture of two bivariate normal distributions with zero mean but different covariances and mixing proportions 1 - λ and λ respectively; λ will be called the proportion of contamination. The test of ρ = 0 based on Student's t, Fisher's z, arcsine, or Ruben's transformation is shown numerically to be nonrobust when λ, the proportion of contamination, lies between 0.05 and 0.50 and the contaminated population has 9 times the variance of the standard (bivariate normal) population. These tests are also sensitive to the presence of outliers.
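
For reference, two of the transformations named in this abstract have the following standard textbook forms when the data are bivariate normal and ρ = 0. This is a restatement of well-known results, not a formula taken from the paper; r denotes the sample correlation and n the sample size, both notation assumed here.

```latex
t = r\sqrt{\frac{n-2}{1-r^{2}}} \sim t_{n-2},
\qquad
z = \tfrac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{artanh}(r),
\quad
\sqrt{n-3}\,z \;\overset{\text{approx.}}{\sim}\; N(0,1).
```

The abstract's finding is that these reference distributions become unreliable once the sample is contaminated.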

3.

A joint test for parametric specification and independence in nonlinear regression models

Shuo Li 《Econometric Reviews》2019,38(10):1202-1215

This paper develops a testing procedure to simultaneously check (i) the independence between the error and the regressor(s), and (ii) the parametric specification in nonlinear regression models. This procedure generalizes the existing work of Sen and Sen [“Testing Independence and Goodness-of-fit in Linear Models,” Biometrika, 101, 927–942.] to a regression setting that allows any smooth parametric form of the regression function. We establish asymptotic theory for the test procedure under both conditional homoscedastic error and heteroscedastic error. The derived tests are easily implementable, asymptotically normal, and consistent against a large class of fixed alternatives. Besides, the local power performance is investigated. To calibrate the finite sample distribution of the test statistics, a smooth bootstrap procedure is proposed and found to work well in simulation studies. Finally, two real data examples are analyzed to illustrate the practical merit of our proposed tests.

4.

Estimating the Proportion of True Null Hypotheses in Nonparametric Exponential Mixture Model with Application to the Leukemia Gene Expression Data

Hualing Zhao Xiaoxia Wu Hong Zhang 《Communications in Statistics - Simulation and Computation》2013,42(9):1580-1592

We revisit the problem of estimating the proportion π of true null hypotheses where a large number of parallel hypothesis tests are performed independently. While the proportion is a quantity of interest in its own right in applications, the problem has arisen in assessing or controlling an overall false discovery rate. On the basis of a Bayes interpretation of the problem, the marginal distribution of the p-value is modeled as a mixture of the uniform distribution (null) and a non-uniform distribution (alternative), so that the parameter π of interest is characterized as the mixing proportion of the uniform component in the mixture. In this article, a nonparametric exponential mixture model is proposed to fit the p-values. As an alternative approach to the convex decreasing mixture model, the exponential mixture model has the advantages of identifiability, flexibility, and regularity. A computation algorithm is developed. The new approach is applied to a leukemia gene expression data set where multiple significance tests over 3,051 genes are performed. The new estimate for π with the leukemia gene expression data appears to be about 10% lower than the other three estimates that are known to be conservative. Simulation results also show that the new estimate is usually lower and has smaller bias than the other three estimates.
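
The two-component structure described in this abstract can be written out as a marginal density for the p-values. The symbol h for the alternative (non-uniform) density is notation assumed here; the abstract does not spell out its exact form.

```latex
f(p) \;=\; \pi \cdot 1 \;+\; (1-\pi)\,h(p), \qquad 0 \le p \le 1,
```

Here the constant 1 is the Uniform(0, 1) density of p-values under the null, so π is the weight of the flat part of the mixture.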

5.

Testing lack of fit of regression models under heteroscedasticity

Chin-Shang Li 《Revue canadienne de statistique》1999,27(3):485-496

A test is proposed for assessing the lack of fit of heteroscedastic nonlinear regression models that is based on comparison of nonparametric kernel and parametric fits. A data-driven method is proposed for bandwidth selection using the asymptotically optimal bandwidth of the parametric null model which leads to a test that has a limiting normal distribution under the null hypothesis and is consistent against any fixed alternative. The resulting test is applied to the problem of testing the lack of fit of a generalized linear model.

6.

Homogeneity testing under finite location-scale mixtures

Jiahua Chen Pengfei Li Guanfu Liu 《Revue canadienne de statistique》2020,48(4):670-684

The testing problem for the order of finite mixture models has a long history and remains an active research topic. Since Ghosh & Sen (1985) revealed the hard-to-manage asymptotic properties of the likelihood ratio test, many successful alternative approaches have been developed. The most successful attempts include the modified likelihood ratio test and the EM-test, which lead to neat solutions for finite mixtures of univariate normal distributions, finite mixtures of single-parameter distributions, and several mixture-like models. The problem remains challenging, and there is still no generic solution for location-scale mixtures. In this article, we provide an EM-test solution for homogeneity for finite mixtures of location-scale family distributions. This EM-test has nonstandard limiting distributions, but we are able to find the critical values numerically. We use computer experiments to obtain appropriate values for the tuning parameters. A simulation study shows that the fine-tuned EM-test has close to nominal type I errors and very good power properties. Two application examples are included to demonstrate the performance of the EM-test.

7.

Robust mixture modeling using the skew t distribution

Tsung I. Lin Jack C. Lee Wan J. Hsieh 《Statistics and Computing》2007,17(2):81-92

A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.

8.

A test for detecting etiologic heterogeneity in epidemiological studies

S. Karagulle 《Journal of applied statistics》2016,43(3):538-549

Current statistical methods for analyzing epidemiological data with disease subtype information allow us to acquire knowledge not only for risk factor-disease subtype association but also, on a more profound account, heterogeneity in these associations by multiple disease characteristics (so-called etiologic heterogeneity of the disease). Current interest, particularly in cancer epidemiology, lies in obtaining a valid p-value for testing the hypothesis whether a particular cancer is etiologically heterogeneous. We consider the two-stage logistic regression model along with pseudo-conditional likelihood estimation method and design a testing strategy based on Rao's score test. An extensive Monte Carlo simulation study is carried out, and the false discovery rate and statistical power of the suggested test are investigated. Simulation results indicate that, with the proposed testing strategy, even a small degree of true etiologic heterogeneity can be recovered with a large statistical power from the sampled data. The strategy is then applied on a breast cancer data set to illustrate its use in practice where there are multiple risk factors and multiple disease characteristics of simultaneous concern.

9.

Testing Inference in Inflated Beta Regressions under Model Misspecification

Tatiene C. Souza Tarciana L. Pereira Francisco Cribari-Neto Verônica M. C. Lima 《Communications in Statistics - Simulation and Computation》2016,45(2):625-642

We consider testing inference in inflated beta regressions subject to model misspecification. In particular, quasi-z tests based on sandwich covariance matrix estimators are described and their finite sample behavior is investigated via Monte Carlo simulations. The numerical evidence shows that quasi-z testing inference can be considerably more accurate than inference made through the usual z tests, especially when there is model misspecification. Interval estimation is also considered. We also present an empirical application that uses real (not simulated) data.

10.

Testing for two states in a hidden Markov model

Jörn Dannemann Hajo Holzmann 《Revue canadienne de statistique》2008,36(4):505-520

The authors consider hidden Markov models (HMMs) whose latent process has m ≥ 2 states and whose state‐dependent distributions arise from a general one‐parameter family. They propose a test of the hypothesis m = 2. Their procedure is an extension to HMMs of the modified likelihood ratio statistic proposed by Chen, Chen & Kalbfleisch (2004) for testing two states in a finite mixture. The authors determine the asymptotic distribution of their test under the hypothesis m = 2 and investigate its finite‐sample properties in a simulation study. Their test is based on inference for the marginal mixture distribution of the HMM. In order to illustrate the additional difficulties due to the dependence structure of the HMM, they show how to test general regular hypotheses on the marginal mixture of HMMs via a quasi‐modified likelihood ratio. They also discuss two applications.

11.

A heteroscedastic measurement error model based on skew and heavy-tailed distributions with known error variances

Lorena Cáceres Tomaya 《Journal of Statistical Computation and Simulation》2018,88(11):2185-2200

In this paper, we study inference in a heteroscedastic measurement error model with known error variances. Instead of the normal distribution for the random components, we develop a model that assumes a skew-t distribution for the true covariate and a centred Student's t distribution for the error terms. The proposed model makes it possible to accommodate skewness and heavy-tailedness in the data, while the degrees of freedom of the distributions can be different. Maximum likelihood estimates are computed via an EM-type algorithm. The behaviour of the estimators is also assessed in a simulation study. Finally, the approach is illustrated with a real data set from a methods comparison study in Analytical Chemistry.

12.

Preliminary test of fit in a general class of conditionally heteroscedastic nonlinear time series

《Journal of Statistical Computation and Simulation》2012,82(5):763-781

This article is concerned with a general class of conditionally heteroscedastic time series including possibly nonlinear and asymmetric autoregressive conditional heteroscedastic (ARCH) and generalized ARCH models. A problem of preliminary test of fit (PTF, hereafter) within the broad class under consideration is discussed. It is noted that contrary to usual tests in the literature of conditionally heteroscedastic time series, PTF does not require any specification of the conditional variance in advance. Based on the joint limit distributions of sample autocorrelations, a certain Portmanteau-type statistic for PTF is proposed, and its limit is shown to be a chi-square distribution. In addition, some simulation studies, under various innovations, are reported to support our theoretical results.

13.

Efficient Stratified Testing Procedure for a False Discovery Rate

Seungbong Han Adin-Cristian Andrei Kam-Wah Tsui 《Communications in Statistics - Simulation and Computation》2015,44(5):1117-1125

The false discovery rate (FDR) has become a popular error measure in large-scale simultaneous testing. When data are collected from heterogeneous sources and form grouped hypothesis tests, it may be beneficial to use the distinct features of the groups to conduct the multiple hypothesis testing. We propose a stratified testing procedure that uses different FDR levels according to the stratification features based on p-values. Our proposed method is easy to implement in practice. Simulation studies show that the proposed method produces more efficient testing results. The stratified testing procedure minimizes the overall false negative rate (FNR) level, while controlling the overall FDR. An example from a type II diabetes mice study further illustrates the practical advantages of this new approach.
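
The idea of applying a different FDR level in each stratum can be illustrated with the standard Benjamini-Hochberg step-up procedure run separately per stratum. This is only a sketch: how the paper chooses the per-stratum levels is not stated in the abstract, so the `alphas` mapping below is a hypothetical input, and the function names are illustrative.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha):
    """Standard BH step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m.
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()      # largest index meeting its threshold
        reject[order[:k + 1]] = True
    return reject

def stratified_bh(pvals, strata, alphas):
    """Run BH within each stratum at its own level (alphas: label -> level, assumed given)."""
    p = np.asarray(pvals, dtype=float)
    s = np.asarray(strata)
    reject = np.zeros(p.size, dtype=bool)
    for label, alpha in alphas.items():
        idx = np.nonzero(s == label)[0]
        reject[idx] = benjamini_hochberg(p[idx], alpha)
    return reject
```

A call such as `stratified_bh(pvals, strata, {"A": 0.10, "B": 0.02})` tests stratum A more liberally than stratum B; whether the overall FDR stays controlled depends on how the levels are chosen, which is the subject of the paper rather than of this sketch.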

14.

A simple test for comparing regression curves versus one-sided alternatives

Natalie Neumeyer Juan Carlos Pardo-Fernández 《Journal of statistical planning and inference》2009,139(12):4006-4016

In this article we present a simple procedure to test for the null hypothesis of equality of two regression curves versus one-sided alternatives in a general nonparametric and heteroscedastic setup. The test is based on the comparison of the sample averages of the estimated residuals in each regression model under the null hypothesis. The test statistic has asymptotic normal distribution and can detect any local alternative of rate n^(-1/2). Some simulations and an application to a data set are included.

15.

Asymptotics for L1-estimators of regression parameters under heteroscedasticity

Keith Knight 《Revue canadienne de statistique》1999,27(3):497-507

We consider the asymptotic behaviour of L₁-estimators in a linear regression under a very general form of heteroscedasticity. The limiting distributions of the estimators are derived under standard conditions on the design. We also consider the asymptotic behaviour of the bootstrap in the heteroscedastic model and show that it is consistent to first order only if the limiting distribution is normal.

16.

Robust inference in an heteroscedastic measurement error model

Mário de Castro Manuel Galea 《Journal of the Korean Statistical Society》2010,39(4):439-447

In this paper we deal with robust inference in heteroscedastic measurement error models. Rather than the normal distribution, we postulate a Student t distribution for the observed variables. Maximum likelihood estimates are computed numerically. Consistent estimation of the asymptotic covariance matrices of the maximum likelihood and generalized least squares estimators is also discussed. Three test statistics are proposed for testing hypotheses of interest with the asymptotic chi-square distribution which guarantees correct asymptotic significance levels. Results of simulations and an application to a real data set are also reported.

17.

Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution

Lo K Gottardo R 《Statistics and Computing》2012,22(1):33-52

Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
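
As a reminder of how the Box-Cox transformation introduces (or removes) skewness, the classical form for strictly positive data is sketched below. The paper may use a modified variant; this is the textbook definition only, and the function name `box_cox` is chosen here for illustration.

```python
import numpy as np

def box_cox(y, lam):
    """Classical Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the classical Box-Cox transform needs strictly positive data")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam
```

With `lam = 1` the data are only shifted, while values of `lam` below 1 compress the right tail, which is why a single parameter per component can absorb asymmetry that a symmetric t component could not.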

18.

Bootstrapping in a high dimensional but very low-sample size problem

《Journal of Statistical Computation and Simulation》2012,82(8):825-840

This article is concerned with testing multiple hypotheses, one for each of a large number of small data sets. Such data are sometimes referred to as high-dimensional, low-sample size data. Our model assumes that each observation within a randomly selected small data set follows a mixture of C shifted and rescaled versions of an arbitrary density f. A novel kernel density estimation scheme, in conjunction with clustering methods, is applied to estimate f. The Bayes information criterion and a new criterion, the weighted mean of within-cluster variances, are used to estimate C, which is the number of mixture components or clusters. These results are applied to the multiple testing problem. The null sampling distribution of each test statistic is determined by f, and hence a bootstrap procedure that resamples from an estimate of f is used to approximate this null distribution.

19.

Generating correlated random vector involving discrete variables

Qing Xiao 《Communications in Statistics - Theory and Methods》2017,46(4):1594-1605

For the issue of generating a correlated random vector containing discrete variables, one major obstacle is to determine a suitable correlation coefficient ρ_z in normal space for a specified correlation coefficient ρ_x. This paper develops a method to solve this problem. First, the double integral evaluated for ρ_x is transformed into independent standard uniform space; then, a quasi-Monte Carlo method is introduced to calculate the double integral. For a given ρ_x, an appropriate ρ_z is determined by a false position method. Compared with existing methodologies, the proposed method is less efficient, but it is relatively easy to implement.
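
The two steps described in this abstract (a quasi-Monte Carlo evaluation in uniform space, then a false position search for ρ_z) can be sketched for the simplest discrete case, two Bernoulli variables. Everything below is an illustrative assumption rather than the paper's algorithm: the Halton point set, the Bernoulli marginals, the bracket [-0.99, 0.99], and all function names.

```python
from math import sqrt
from statistics import NormalDist

def halton(index, base):
    """Radical-inverse (van der Corput) value of `index` in the given base."""
    value, frac = 0.0, 1.0 / base
    while index > 0:
        value += (index % base) * frac
        index //= base
        frac /= base
    return value

def make_nodes(n_points=4096):
    """Halton points in (0, 1)^2 mapped to pairs of independent standard normals."""
    inv = NormalDist().inv_cdf
    return [(inv(halton(i, 2)), inv(halton(i, 3))) for i in range(1, n_points + 1)]

def rho_x_bernoulli(rho_z, p1, p2, nodes):
    """QMC estimate of corr(X1, X2) when X_i = 1[Z_i > q_i] and corr(Z1, Z2) = rho_z."""
    q1 = NormalDist().inv_cdf(1.0 - p1)
    q2 = NormalDist().inv_cdf(1.0 - p2)
    s = sqrt(1.0 - rho_z ** 2)
    hits = sum(1 for z1, e in nodes if z1 > q1 and rho_z * z1 + s * e > q2)
    joint = hits / len(nodes)                      # estimate of P(X1 = 1, X2 = 1)
    return (joint - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def false_position(g, a, b, tol=1e-4, max_iter=60):
    """Regula falsi root search for g on a bracket [a, b] with a sign change."""
    ga, gb = g(a), g(b)
    if ga == 0:
        return a
    if gb == 0:
        return b
    if ga * gb > 0:
        raise ValueError("root is not bracketed")
    c = a
    for _ in range(max_iter):
        c = b - gb * (b - a) / (gb - ga)           # where the secant line crosses zero
        gc = g(c)
        if abs(gc) < tol:
            break
        if ga * gc < 0:
            b, gb = c, gc
        else:
            a, ga = c, gc
    return c
```

For a target ρ_x, a call such as `false_position(lambda r: rho_x_bernoulli(r, p1, p2, nodes) - rho_x, -0.99, 0.99)` returns an approximate ρ_z. Reusing the same `nodes` for every evaluation keeps the objective deterministic, which the root search relies on.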

20.

Statistical Inference and Applications of Mixture of Varying Coefficient Models

《Scandinavian Journal of Statistics》2018,45(3):618-643

In this paper, we consider a new mixture of varying coefficient models, in which each mixture component follows a varying coefficient model and the mixing proportions and dispersion parameters are also allowed to be unknown smooth functions. We systematically study the identifiability, estimation and inference for the new mixture model. The proposed new mixture model is rather general, encompassing many mixture models as its special cases such as mixtures of linear regression models, mixtures of generalized linear models, mixtures of partially linear models and mixtures of generalized additive models, some of which are new mixture models by themselves and have not been investigated before. The new mixture of varying coefficient model is shown to be identifiable under mild conditions. We develop a local likelihood procedure and a modified expectation–maximization algorithm for the estimation of the unknown non‐parametric functions. Asymptotic normality is established for the proposed estimator. A generalized likelihood ratio test is further developed for testing whether some of the unknown functions are constants. We derive the asymptotic distribution of the proposed generalized likelihood ratio test statistics and prove that the Wilks phenomenon holds. The proposed methodology is illustrated by Monte Carlo simulations and an analysis of a CO₂‐GDP data set.