1.

Testing linear hypotheses in high-dimensional regressions

Zhidong Bai, Dandan Jiang, Jian-feng Yao 《Statistics》2013,47(6):1207-1223

For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative hypothesis requires complex analytic approximations, and more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say p≤20. On the other hand, assuming that the data dimension p as well as the number q of regression variables are fixed while the sample size n grows, several asymptotic approximations are proposed in the literature for Wilks' Λ, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension p and a large sample size n. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null hypothesis, and simulations demonstrate that the corrected LRT has very satisfactory size and power, certainly in the large p and large n context, but also for moderately large data dimensions such as p=30 or p=50. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in multivariate analysis of variance which is valid for high-dimensional data.

2.

Nonparametric estimation of extreme conditional quantiles

《Journal of Statistical Computation and Simulation》2012,82(8):567-580

The estimation of extreme conditional quantiles is an important issue in different scientific disciplines. Up to now, the extreme value literature has focused mainly on estimation procedures based on independent and identically distributed samples. Our contribution is a two-step procedure for estimating extreme conditional quantiles. In the first step, nonextreme conditional quantiles are estimated nonparametrically using a local version of the regression quantile methodology of Koenker and Bassett [Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.]. Next, these nonparametric quantile estimates are used as analogues of univariate order statistics in procedures for extreme quantile estimation. The performance of the method is evaluated for both heavy tailed distributions and distributions with a finite right endpoint using a small sample simulation study. A bootstrap procedure is developed to guide the selection of an optimal local bandwidth. Finally, the procedure is illustrated in two case studies.

3.

A low-end quantile estimator from a right-skewed distribution

Hongjun Wang 《Communications in Statistics - Theory and Methods》2013,42(10):2810-2833

In many statistical applications estimation of population quantiles is desired. In this study, a log–flip–robust (LFR) approach is proposed to estimate, specifically, lower-end quantiles (those below the median) from a continuous, positive, right-skewed distribution. Characteristics of common right-skewed distributions suggest that a logarithm transformation (L) followed by flipping the lower half of the sample (F) allows for the estimation of the lower-end quantile using robust methods (R) based on symmetric populations. Simulations show that this approach is superior in many cases to current methods, while not suffering from the sample size restrictions of other approaches.

4.

Noncrossing structured additive multiple-output Bayesian quantile regression models

Bruno Santos, Thomas Kneib 《Statistics and Computing》2020,30(4):855-869

Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their extension to the multivariate setting, though, is not straightforward, starting with the definition of multivariate quantiles. We propose here a flexible Bayesian quantile regression model for a multivariate response variable, in which we are able to define a structured additive framework for all predictor variables. We build on previous ideas considering a directional approach to define the quantiles of a response variable with multiple outputs, and we define noncrossing quantiles in every directional quantile model. We define a Markov chain Monte Carlo (MCMC) procedure for model estimation, where the noncrossing property is obtained considering a Gaussian process design to model the correlation between several quantile regression models. We illustrate the results of these models using two datasets: one on dimensions of inequality in the population, such as income and health; the second on scores of students in the Brazilian High School National Exam, considering three dimensions for the response variable.

5.

Random weighting-based quantile estimation via importance resampling

Wenhui Wei, Shesheng Gao, Yongmin Zhong, Chengfan Gu, Zhaohui Gao 《Communications in Statistics - Theory and Methods》2013,42(19):4820-4833

This paper presents a new method to estimate the quantiles of generic statistics by combining the concept of random weighting with importance resampling. This method converts the problem of quantile estimation to a dual problem of tail probability estimation. Random weighting theories are established to calculate the optimal resampling weights for estimation of tail probabilities via sequential variance minimization. Subsequently, the quantile estimate is constructed by using the obtained optimal resampling weights. Experimental results on real and simulated data sets demonstrate that the proposed random weighting method can effectively estimate the quantiles of generic statistics.

6.

Sample size determination for estimating multivariate process capability indices based on lower confidence limits

Chung-I Li, Jeh-Nan Pan 《Journal of applied statistics》2012,39(9):1911-1920

With the advent of modern technology, manufacturing processes have become very sophisticated; a single quality characteristic can no longer reflect a product's quality. In order to establish performance measures for evaluating the capability of a multivariate manufacturing process, several new multivariate capability (NMC) indices, such as NMC_p and NMC_pm, have been developed over the past few years. However, the sample size determination for multivariate process capability indices has not been thoroughly considered in previous studies. Generally, the larger the sample size, the more accurate an estimation will be. However, too large a sample size may result in excessive costs. Hence, the trade-off between sample size and precision in estimation is a critical issue. In this paper, the lower confidence limits of NMC_p and NMC_pm indices are used to determine the appropriate sample size. Moreover, a procedure for conducting the multivariate process capability study is provided. Finally, two numerical examples are given to demonstrate that the proper determination of sample size for multivariate process indices can achieve a good balance between sampling costs and estimation precision.

7.

Outlier detection by robust principal components analysis

C. Caroni 《Communications in Statistics - Simulation and Computation》2013,42(1):139-151

The robust principal components analysis (RPCA) introduced by Campbell (Applied Statistics 1980, 29, 231–237) provides, in addition to robust versions of the usual output of a principal components analysis, weights for the contribution of each point to the robust estimation of each component. Low weights may thus be used to indicate outliers. The present simulation study provides critical values for testing the kth smallest weight in the RPCA of a sample of n p-dimensional vectors, under the null hypothesis of a multivariate normal distribution. The cases p=2(2)10, 15, 20 for n=20, 30, 40, 50, 75, 100 subject to n≥p/2, are examined, with k≤√n.

8.

Orthogonal polynomials generated by random vectors

Christopher S. Withers, Saralees Nadarajah 《Communications in Statistics - Theory and Methods》2017,46(12):6130-6136

Every random q-vector with finite moments generates a set of orthonormal polynomials. These are generated from the basis functions $x^n = x_1^{n_1} \cdots x_q^{n_q}$ using Gram–Schmidt orthogonalization. One can cycle through these basis functions in any number of ways. Here, we give results using minimum cycling. The polynomials look simpler when centered about the mean of X, and take a still simpler form when X is symmetric about zero. This leads to an extension of the multivariate Hermite polynomial for a general random vector symmetric about zero. As an example, the results are applied to the multivariate normal distribution.
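
The Gram–Schmidt construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the univariate case (q = 1), takes the moments E[X^j] as given, and the function names `inner` and `orthonormal_polynomials` are hypothetical. Polynomials are stored as coefficient lists, lowest degree first.

```python
import math


def inner(a, b, moments):
    """Return E[a(X) b(X)] for coefficient lists a, b (lowest degree first).

    moments[j] is assumed to hold E[X^j].
    """
    return sum(ai * bj * moments[i + j]
               for i, ai in enumerate(a)
               for j, bj in enumerate(b))


def orthonormal_polynomials(moments, degree):
    """Gram-Schmidt on the monomials 1, x, ..., x^degree.

    Needs moments up to order 2 * degree. Illustrative only: q = 1.
    """
    if len(moments) < 2 * degree + 1:
        raise ValueError("need moments up to order 2 * degree")
    basis = []
    for n in range(degree + 1):
        p = [0.0] * n + [1.0]  # the monomial x^n
        for q in basis:
            c = inner(p, q, moments)
            # Subtract the projection of p onto the earlier polynomial q.
            p = [pi - c * (q[i] if i < len(q) else 0.0)
                 for i, pi in enumerate(p)]
        norm = math.sqrt(inner(p, p, moments))
        basis.append([pi / norm for pi in p])
    return basis


if __name__ == "__main__":
    # Moments E[X^j], j = 0..6, of the standard normal distribution.
    normal_moments = [1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 15.0]
    for coeffs in orthonormal_polynomials(normal_moments, 3):
        print(coeffs)
```

With standard normal moments the output is the normalized (probabilists') Hermite polynomials, which is the univariate analogue of the Hermite connection mentioned in the abstract.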

9.

Copula representation of bivariate L-moments: a new estimation method for multiparameter two-dimensional copula models

Brahim Brahimi, Fateh Chebana 《Statistics》2015,49(3):497-521

Serfling and Xiao [A contribution to multivariate L-moments, L-comoment matrices. J Multivariate Anal. 2007;98:1765–1781] extended the L-moment theory to the multivariate setting. In the present paper, we focus on two-dimensional random vectors to establish a link between the bivariate L-moments (BLM) and the underlying bivariate copula functions. This connection provides a new estimate of dependence parameters of bivariate statistical data. An extensive simulation study is carried out to compare estimators based on the BLM, the maximum likelihood, the minimum distance and a rank approximate Z-estimation. The obtained results show that, when the sample size increases, BLM-based estimation performs better as far as the bias and computation time are concerned. Moreover, the root-mean-squared error is quite reasonable and less sensitive in general to outliers than those of the above-cited methods. Further, the proposed BLM method is an easy-to-use tool for the estimation of multiparameter copula models. A generalization of the BLM estimation method to the multivariate case is discussed.

10.

Estimation of scale parameters in mixture distributions

Dipak K. Dey 《Revue canadienne de statistique》1990,18(2):171-178

Simultaneous estimation of scale parameters is considered in mixture distributions under squared-error loss. A general class of estimators is obtained which dominates the componentwise best multiple estimators and the moment estimators. As special cases, improved estimators are obtained for the multivariate t-distribution and the p-variate Lomax distribution.

11.

Robust multivariate mixture regression models with incomplete data

Hwa Kyung Lim, Naveen N. Narisetty 《Journal of Statistical Computation and Simulation》2017,87(2):328-347

Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

12.

Mixture of linear mixed models using multivariate t distribution

《Journal of Statistical Computation and Simulation》2012,82(4):771-787

Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow normal distributions, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using the multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, to achieve a flexible trade-off between estimation robustness and efficiency. Simulation studies and an application to lung growth longitudinal data showcase the efficacy of the proposed approach.

13.

Analysis of means: a generalized approach using R

Philip Pallmann, Ludwig A. Hothorn 《Journal of applied statistics》2016,43(8):1541-1560

Papers on the analysis of means (ANOM) have been circulating in the quality control literature for decades, routinely describing it as a statistical stand-alone concept. Therefore, we clarify that ANOM should rather be regarded as a special case of a much more universal approach known as multiple contrast tests (MCTs). Perceiving ANOM as a grand-mean-type MCT paves the way for implementing it in the open-source software R. We give a brief tutorial on how to exploit R's versatility and introduce the R package ANOM for drawing the familiar decision charts. Beyond that, we illustrate two practical aspects of data analysis with ANOM: firstly, we compare merits and drawbacks of ANOM-type MCTs and ANOVA F-test and assess their respective statistical powers, and secondly, we show that the benefit of using critical values from multivariate t-distributions for ANOM instead of simple Bonferroni quantiles is oftentimes negligible.

14.

Copulas checker-type approximations: Application to quantiles estimation of sums of dependent random variables

A. Cuberos, E. Masiello 《Communications in Statistics - Theory and Methods》2020,49(12):3044-3062

Several approximations of copulas have been proposed in the literature. By using empirical versions of checker-type copula approximations, we propose nonparametric estimators of the copula. Under some conditions, the proposed estimators are copulas and their main advantage is that they can be sampled from easily. One possible application is the estimation of quantiles of sums of dependent random variables from a small sample of the multivariate law and a full knowledge of the marginal laws. We show that estimations may be improved by easily including in the approximated copula some additional information, for example on the law of a sub-vector. Our approach is illustrated by numerical examples.

15.

On tail index estimation based on multivariate data

A. Dematteo, S. Clémençon 《Journal of nonparametric statistics》2016,28(1):152-176

This article is devoted to the study of tail index estimation based on i.i.d. multivariate observations, drawn from a standard heavy-tailed distribution, that is, one whose Pareto-like marginals share the same tail index. A multivariate central limit theorem for a random vector, whose components correspond to (possibly dependent) Hill estimators of the common tail index α, is established under mild conditions. We introduce the concept of (standard) heavy-tailed random vector of tail index α and show how this limit result can be used in order to build an estimator of α with small asymptotic mean squared error, through a proper convex linear combination of the coordinates. Beyond asymptotic results, simulation experiments illustrating the relevance of the approach promoted are also presented.
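
As a rough illustration of the ingredients named in this abstract, the sketch below computes a Hill estimate of α for each coordinate and then forms a convex combination of them. It makes several assumptions not taken from the paper: the function names are hypothetical, the same number k of upper order statistics is used for every coordinate, and equal weights stand in for the optimal weights, which the abstract does not specify.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k + 1)-th largest observation
    # Mean log-excess over the threshold estimates 1 / alpha.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


def combined_tail_index(columns, k, weights=None):
    """Convex combination of per-coordinate Hill estimates.

    Equal weights are a placeholder assumption, not the paper's choice.
    """
    d = len(columns)
    if weights is None:
        weights = [1.0 / d] * d
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    estimates = [hill_estimator(col, k) for col in columns]
    return sum(w * e for w, e in zip(weights, estimates))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two independent Pareto coordinates sharing the tail index alpha = 2.
    data = [[rng.paretovariate(2.0) for _ in range(5000)] for _ in range(2)]
    print(combined_tail_index(data, k=500))
```

The demo draws independent coordinates only for simplicity; the setting in the abstract allows the coordinate-wise Hill estimators to be dependent.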

16.

One stage multiple comparisons of k-1 treatment mean lifetimes with the control for exponential distributions under heteroscedasticity

Shu-Fei Wu 《Communications in Statistics - Simulation and Computation》2013,42(10):2968-2978

In survival or reliability data analysis, it is often useful to estimate the quantiles of the lifetime distribution, such as the median time to failure. Different nonparametric methods can construct confidence intervals for the quantiles of the lifetime distributions, some of which are implemented in commonly used statistical software packages. We here investigate the performance of different interval estimation procedures under a variety of settings with different censoring schemes. Our main objectives in this paper are to (i) evaluate the performance of confidence intervals based on the transformation approach commonly used in statistical software, (ii) introduce a new density-estimation-based approach to obtain confidence intervals for survival quantiles, and (iii) compare it with the transformation approach. We provide a comprehensive comparative study and offer some useful practical recommendations based on our results. Some numerical examples are presented to illustrate the methodologies developed.

17.

New estimators of distribution functions

A. K. Md. Ehsanes Saleh 《Communications in Statistics - Theory and Methods》2013,42(11):3145-3157

This article considers the estimation of a distribution function F_X(x) based on a random sample X₁, X₂, …, X_n when the sample is suspected to come from a close-by distribution F₀(x). The new estimators, namely the preliminary test estimator (PTE) and the Stein-type estimator (SE), are defined and compared with the “empirical distribution function” (edf) under local departure. In this case, we show that Stein-type estimators are superior to edf and PTE is superior to edf when it is close to F₀(x). As a by-product, similar estimators are proposed for population quantiles.

18.

Parametric inference for quantile event times with adjustment for covariates on competing risks data

Minjung Lee 《Journal of applied statistics》2019,46(12):2128-2144

We propose parametric inferences for quantile event times with adjustment for covariates on competing risks data. We develop parametric quantile inferences using parametric regression modeling of the cumulative incidence function from the cause-specific hazard and direct approaches. Maximum likelihood inferences are developed for estimation of the cumulative incidence function and quantiles. We develop the construction of parametric confidence intervals for quantiles. Simulation studies show that the proposed methods perform well. We illustrate the methods using early stage breast cancer data.

19.

A note on the multivariate generalized asymmetric Laplace motion

Patrizia Semeraro 《Communications in Statistics - Theory and Methods》2020,49(10):2339-2355

In this note, we use multivariate subordination to introduce a multivariate extension of the generalized asymmetric Laplace motion. The class introduced provides a unified framework for several multivariate extensions of the popular variance gamma process. We also show that the associated time one distribution extends the multivariate generalized asymmetric Laplace distributions proposed in the statistical literature.

20.

Estimating and Testing a Structured Covariance Matrix for Three-Level Multivariate Data

Anuradha Roy, Ricardo Leiva 《Communications in Statistics - Theory and Methods》2013,42(11):1945-1963

This article considers an approach to estimating and testing a new Kronecker product covariance structure for three-level (multiple time points (p), multiple sites (u), and multiple response variables (q)) multivariate data. Testing of such a covariance structure is potentially important for high dimensional multi-level multivariate data. The hypothesis testing procedure developed in this article can test not only the hypothesis for three-level multivariate data, but also many different hypotheses, such as blocked compound symmetry, for two-level multivariate data as special cases. The tests are implemented with two real data sets.