1.

Distributed inference for two-sample U-statistics in massive data analysis

Bingyao Huang Yanyan Liu Liuhua Peng 《Scandinavian Journal of Statistics》2023,50(3):1090-1115

This paper considers distributed inference for two-sample U-statistics under the massive data setting. In order to reduce the computational complexity, this paper proposes distributed two-sample U-statistics and blockwise linear two-sample U-statistics. The blockwise linear two-sample U-statistic, which requires less communication cost, is more computationally efficient especially when the data are stored in different locations. The asymptotic properties of both types of distributed two-sample U-statistics are established. In addition, this paper proposes bootstrap algorithms to approximate the distributions of distributed two-sample U-statistics and blockwise linear two-sample U-statistics for both nondegenerate and degenerate cases. The distributed weighted bootstrap for the distributed two-sample U-statistic is new in the literature. The proposed bootstrap procedures are computationally efficient and are suitable for distributed computing platforms with theoretical guarantees. Extensive numerical studies illustrate that the proposed distributed approaches are feasible and effective.
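The divide-and-average idea behind a distributed two-sample U-statistic can be sketched as follows. This is only an illustration under stated assumptions, not the construction from the paper: the function names, the contiguous block split, the pairing of block j of one sample with block j of the other, and the Mann-Whitney kernel are all choices made here for the example.

```python
from itertools import product


def two_sample_u(xs, ys, kernel):
    """Complete two-sample U-statistic of degree (1, 1): kernel averaged over all (x, y) pairs."""
    return sum(kernel(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def split_blocks(data, k):
    """Split data into k contiguous blocks of near-equal size (hypothetical helper)."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(data)")
    bounds = [i * n // k for i in range(k + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(k)]


def distributed_two_sample_u(xs, ys, kernel, k):
    """Average of k per-block U-statistics (assumed scheme: block j of xs meets block j of ys)."""
    x_blocks = split_blocks(xs, k)
    y_blocks = split_blocks(ys, k)
    per_block = [two_sample_u(xb, yb, kernel) for xb, yb in zip(x_blocks, y_blocks)]
    return sum(per_block) / k


def mann_whitney_kernel(x, y):
    """Indicator kernel 1{x < y}, used here only as a familiar example."""
    return 1.0 if x < y else 0.0
```

With k blocks of roughly equal size, each block evaluates about 1/k² of the kernel pairs, so the total work drops by a factor of about k compared with the complete statistic, and only one number per block has to be communicated.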

2.

Semiparametric estimation of moment condition models with weakly dependent data

Francesco Bravo Ba M. Chu 《Journal of nonparametric statistics》2017,29(1):108-136

This paper develops the asymptotic theory for the estimation of smooth semiparametric generalized estimating equations models with weakly dependent data. The paper proposes new estimation methods based on smoothed two-step versions of the generalised method of moments and generalised empirical likelihood methods. An important aspect of the paper is that it allows the first-step estimation to have an effect on the asymptotic variances of the second-step estimators and explicitly characterises this effect for the empirically relevant case of the so-called generated regressors. The results of the paper are illustrated with a partially linear model that has not been previously considered in the literature. The proofs of the results utilise a new uniform strong law of large numbers and a new central limit theorem for U-statistics with varying kernels that are of independent interest.

3.

Non-Gaussian limit distributions for U-statistics based on trimmed and Winsorized samples

Yuri V. Borovskikh 《Statistics》2013,47(3):609-632

Conditions ensuring the asymptotic normality of U-statistics based on either trimmed samples or Winsorized samples are well known [P. Janssen, R. Serfling, and N. Veraverbeke, Asymptotic normality of U-statistics based on trimmed samples, J. Statist. Plann. Inference 16 (1987), pp. 63–74; U-statistics on Winsorized and trimmed samples, Statist. Probab. Lett. 9 (1990), pp. 439–447]. However, the class of U-statistics has a much richer family of limiting distributions. This paper complements known results by providing general limit theorems for U-statistics based on trimmed or Winsorized samples where the limiting distribution is given in terms of multiple Itô–Wiener stochastic integrals.

4.

Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization

Jin Liu Shuangge Ma Jian Huang 《Scandinavian Journal of Statistics》2014,41(1):87-103

In cancer diagnosis studies, high‐throughput gene profiling has been extensively conducted, searching for genes whose expressions may serve as markers. Data generated from such studies have the ‘large d, small n’ feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single data sets can be unsatisfactory. A cost‐effective remedy is to conduct integrative analysis of multiple heterogeneous data sets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty. Under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulation and analysis of practical cancer data sets, show satisfactory performance of the proposed methods.
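To make the outer/inner structure of a composite penalty concrete, here is a minimal sketch. The MCP formula is the standard one; everything else is an assumption for illustration: the function names, the tuning values, and the choice to apply the outer MCP to the Lasso (L1) norm of one gene's coefficients across data sets. The exact composition used in the article may differ.

```python
def mcp(t, lam, gamma):
    """Minimax concave penalty: lam*|t| - t^2/(2*gamma) up to |t| = gamma*lam, constant afterwards."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def l1_norm(coefs):
    """Lasso-type inner penalty: sum of absolute coefficients (assumed inner penalty)."""
    return sum(abs(b) for b in coefs)


def composite_penalty(coefs, lam, gamma, inner=l1_norm):
    """Outer MCP applied to an inner penalty of one gene's coefficients across data sets (illustrative)."""
    return mcp(inner(coefs), lam, gamma)


# One gene measured in three data sets (made-up coefficients).
gene_coefs = [0.5, -0.5, 0.0]
penalty_value = composite_penalty(gene_coefs, lam=1.0, gamma=3.0)
```

Because the MCP flattens out once its argument exceeds gamma*lam, a gene with a large combined effect is penalized no more than a moderately large one, which is the property that distinguishes it from the Lasso as an outer penalty.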

5.

Detection of multiple undocumented change-points using adaptive Lasso

Jie Shen Colin M. Gallagher QiQi Lu 《Journal of applied statistics》2014,41(6):1161-1173

The problem of detecting multiple undocumented change-points in a historical temperature sequence with a simple linear trend is formulated as a linear model. We apply the adaptive least absolute shrinkage and selection operator (Lasso) to estimate the number and locations of change-points. Model selection criteria are used to choose the Lasso smoothing parameter. As adaptive Lasso may overestimate the number of change-points, we perform post-selection on change-points detected by adaptive Lasso using multivariate t simultaneous confidence intervals. Our method is demonstrated on the annual temperature data (year: 1902–2000) from Tuscaloosa, Alabama.

6.

Overlapping group lasso for high-dimensional generalized linear models

Shengbin Zhou Jingke Zhou Bo Zhang 《Communications in Statistics - Theory and Methods》2013,42(19):4903-4917

Structured sparsity has recently become a very popular technique for dealing with high-dimensional data. In this paper, we mainly focus on the theoretical problems for the overlapping group structure of generalized linear models (GLMs). Although the overlapping group lasso method for GLMs has been widely applied, its theoretical properties are still unknown. Under some general conditions, we present oracle inequalities for the estimation and prediction error of the overlapping group Lasso method in the generalized linear model setting. Then, we apply these results to the logistic and Poisson regression models. It is shown that the results of the Lasso and group Lasso procedures for GLMs can be recovered by specifying the group structures in our proposed method. The effect of overlap and the variable selection performance of our proposed method are both studied by numerical simulations. Finally, we apply our proposed method to two gene expression data sets: the p53 data and the lung cancer data.

7.

An Adaptive Test for the Two-Sample Location Problem Based on U-Statistics

W. Kössler N. Kumar 《Communications in Statistics - Simulation and Computation》2013,42(7):1329-1346

For the two-sample location problem with continuous data we consider a general class of tests, all members of which are based on U-statistics. The asymptotic efficacies are investigated in detail. We construct an adaptive test where all statistics involved are suitably chosen U-statistics. It is shown that the proposed adaptive test has good asymptotic and finite sample power properties.

8.

On Incomplete U-Statistics Having Minimum Variance

Alan J. Lee 《Australian & New Zealand Journal of Statistics》1982,24(3):275-282

The problem of choosing a design for an incomplete U-statistic is discussed. Designs yielding minimum variance U-statistics are presented, and their efficiencies relative to complete U-statistics are studied.
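For readers unfamiliar with the term, an incomplete U-statistic averages the kernel over a chosen subset of index pairs (the design) rather than over all pairs. The sketch below shows this for a degree-two kernel. The function names and the random design are assumptions for illustration only; the minimum variance designs discussed in the paper are not reproduced here.

```python
import random
from itertools import combinations


def complete_u(data, kernel):
    """Complete U-statistic of degree two: kernel averaged over all pairs i < j."""
    pairs = list(combinations(data, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def incomplete_u(data, kernel, design):
    """Incomplete U-statistic: kernel averaged only over the index pairs listed in the design."""
    design = list(design)
    return sum(kernel(data[i], data[j]) for i, j in design) / len(design)


def random_design(n, m, seed=0):
    """Pick m distinct index pairs at random (one simple design, not a minimum variance one)."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)


def variance_kernel(a, b):
    """Kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2


data = [1.0, 2.0, 3.0, 4.0]
full = complete_u(data, variance_kernel)
partial = incomplete_u(data, variance_kernel, [(0, 1), (2, 3)])
```

Here the complete statistic uses all six pairs, while the two-pair design evaluates the kernel only twice; the design determines how much variance is given up in exchange for that saving.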

9.

Estimation of the Parameters of Type-I Generalized Logistic Distribution Using Order Statistics

N. V. Sreekumar P. Yageen Thomas 《Communications in Statistics - Theory and Methods》2013,42(10):1506-1524

In this work, we propose a technique for estimating the location parameter μ and scale parameter σ of the Type-I generalized logistic distribution by U-statistics constructed using best linear functions of order statistics as kernels. The efficiency of the proposed estimators is also compared with that of the maximum likelihood estimators.

10.

Ultrahigh dimensional variable selection through the penalized maximum trimmed likelihood estimator

N. M. Neykov P. Filzmoser P. N. Neytchev 《Statistical Papers》2014,55(1):187-207

The penalized maximum likelihood estimator (PMLE) has been widely used for variable selection in high-dimensional data. Various penalty functions have been employed for this purpose, e.g., Lasso, weighted Lasso, or smoothly clipped absolute deviations. However, the PMLE can be very sensitive to outliers in the data, especially to outliers in the covariates (leverage points). In order to overcome this disadvantage, the usage of the penalized maximum trimmed likelihood estimator (PMTLE) is proposed to estimate the unknown parameters in a robust way. The computation of the PMTLE takes advantage of the same technology as used for PMLE but here the estimation is based on subsamples only. The breakdown point properties of the PMTLE are discussed using the notion of $d$-fullness. The performance of the proposed estimator is evaluated in a simulation study for the classical multiple linear and Poisson linear regression models.

11.

Asymptotics of M-estimator in multivariate linear regression models for a class of random errors

Yi Wu Wei Yu Xuejun Wang 《Australian & New Zealand Journal of Statistics》2023,65(3):262-285

It is known that linear regression models have immense applications in various areas such as engineering technology, economics and social sciences. In this paper, we investigate the asymptotic properties of the M-estimator in multivariate linear regression models based on a class of random errors satisfying a generalised Bernstein-type inequality. By using the generalised Bernstein-type inequality, we obtain a general result on almost sure convergence for a class of random variables and then obtain the strong consistency of the M-estimator in multivariate linear regression models under some mild conditions. The result extends or improves some existing ones in the literature. Moreover, we also consider the case when the dimension $p$ tends to infinity by establishing the rate of almost sure convergence for a class of random variables satisfying the generalised Bernstein-type inequality. Some numerical simulations are also provided to verify the validity of the theoretical results.

12.

U-Statistics based on spacings

David D. Tung S. Rao Jammalamadaka 《Journal of statistical planning and inference》2012,142(3):673-684

In this paper, we investigate the asymptotic theory for U-statistics based on sample spacings, i.e. the gaps between successive observations. The usual asymptotic theory for U-statistics does not apply here because spacings are dependent variables. However, under the null hypothesis, the uniform spacings can be expressed as conditionally independent Exponential random variables. We exploit this idea to derive the relevant asymptotic theory both under the null hypothesis and under a sequence of close alternatives. The generalized Gini mean difference of the sample spacings is a prime example of a U-statistic of this type. We show that such a Gini spacings test is analogous to Rao's spacings test. We find the asymptotically locally most powerful test in this class, and it has the same efficacy as the Greenwood statistic.
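As a small illustration of a U-statistic built from spacings, the sketch below computes the ordinary Gini mean difference of the spacings of a sample on [0, 1]. It assumes the data have already been transformed to the unit interval and that the endpoints 0 and 1 are appended, giving n + 1 spacings; the function names are hypothetical, and the generalized version studied in the paper is not implemented.

```python
from itertools import combinations


def uniform_spacings(sample):
    """Gaps between successive order statistics on [0, 1], with 0 and 1 added as endpoints."""
    if any(u < 0.0 or u > 1.0 for u in sample):
        raise ValueError("sample values must lie in [0, 1]")
    points = [0.0] + sorted(sample) + [1.0]
    return [b - a for a, b in zip(points, points[1:])]


def gini_mean_difference(values):
    """U-statistic with kernel |a - b|: mean absolute difference over all pairs."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def gini_spacings_statistic(sample):
    """Gini mean difference of the spacings; equals 0 for a perfectly even sample."""
    return gini_mean_difference(uniform_spacings(sample))
```

A perfectly evenly spread sample gives the value 0, and the statistic grows as the spacings become more unequal.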

13.

Moments of L-Statistics: A Divided Differences Approach

Girdhar G. Agarwal 《Communications in Statistics - Simulation and Computation》2013,42(5):829-843

The derivation of the distributions of linear combinations of order statistics, or L-statistics, and the computation of their moments have been approached in the literature in several ways. In this paper we use the properties of divided differences to obtain expressions for moments of some order statistics that arise as special cases of L-statistics. Expectations of some well-known L-statistics such as the trimmed mean and the Winsorised mean for the Pareto distribution are computed. The study also undertakes the computation of L-moments that are expectations of certain linear combinations of order statistics. The algorithms have been implemented using some well-known continuous distributions as examples.
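The sense in which the trimmed mean and the Winsorised mean are L-statistics can be shown with a short sketch: each is a fixed weighted sum of the order statistics. The function names and the symmetric trimming count k are choices made for this example; the divided differences approach of the paper is not shown.

```python
def l_statistic(data, weights):
    """Linear combination of order statistics: sum of weights[i] * x_(i+1)."""
    xs = sorted(data)
    if len(xs) != len(weights):
        raise ValueError("need one weight per observation")
    return sum(w * x for w, x in zip(weights, xs))


def trimmed_mean_weights(n, k):
    """Weights that drop the k smallest and k largest values and average the rest (requires 2k < n)."""
    if not 0 <= 2 * k < n:
        raise ValueError("need 0 <= 2k < n")
    keep = n - 2 * k
    return [0.0] * k + [1.0 / keep] * keep + [0.0] * k


def winsorised_mean_weights(n, k):
    """Weights that replace the k smallest values by x_(k+1) and the k largest by x_(n-k)."""
    if not 0 <= 2 * k < n:
        raise ValueError("need 0 <= 2k < n")
    w = [0.0] * k + [1.0 / n] * (n - 2 * k) + [0.0] * k
    w[k] += k / n          # mass of the k smallest values moves to x_(k+1)
    w[n - k - 1] += k / n  # mass of the k largest values moves to x_(n-k)
    return w


data = [1.0, 2.0, 3.0, 4.0, 100.0]
trimmed = l_statistic(data, trimmed_mean_weights(len(data), 1))
winsorised = l_statistic(data, winsorised_mean_weights(len(data), 1))
```

In this made-up sample the outlier 100 receives zero weight under both schemes, so both estimates stay near the bulk of the data, whereas the ordinary mean is 22.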

14.

General Sparse Boosting: Improving Feature Selection of L2 Boosting by Correlation-Based Penalty Family

Junlong Zhao 《Communications in Statistics - Simulation and Computation》2015,44(6):1612-1640

In the high-dimensional setting, componentwise L₂ boosting has been used to construct sparse models that perform well, but it tends to select many ineffective variables. Several sparse boosting methods, such as SparseL₂Boosting and Twin Boosting, have been proposed to improve the variable selection of the L₂ boosting algorithm. In this article, we propose a new general sparse boosting method (GSBoosting). Relations are established between GSBoosting and other well-known regularized variable selection methods in the orthogonal linear model, such as adaptive Lasso and hard thresholding. Simulation results show that GSBoosting has good performance in both prediction and variable selection.

15.

High-dimensional variable selection based on the randomized adaptive Lasso

Maobo Yan Maozai Tian 《Statistical Research》2021,38(1):147-160

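For penalized variable selection methods such as the Lasso, the number of variables that can enter the model is limited by the sample size. Existing methods in the literature for assessing the significance of variable coefficients discard the information contained in the variables that were not selected. In the high-dimensional case where the number of variables exceeds the sample size (p > n), this paper uses a randomized bootstrap method to obtain variable weights, constructs the conditional distribution of the selection event when computing the adaptive Lasso, and removes variables whose coefficients are not significant to obtain the final estimates. The contribution of this paper is that the proposed method overcomes the limit on the number of variables the adaptive Lasso can select, and can effectively distinguish true variables from noise variables when the observed data contain many noise variables. Simulation studies under a variety of scenarios show the advantages of the proposed method over existing penalized variable selection methods on both of these problems. In the empirical study, the NCI-60 cancer cell line data are analyzed, and the results are a clear improvement over those in the previous literature.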

16.

BAYESIAN HYPER‐LASSOS WITH NON‐CONVEX PENALIZATION

Jim E. Griffin Philip J. Brown 《Australian & New Zealand Journal of Statistics》2011,53(4):423-442

The Lasso has sparked interest in the use of penalization of the log‐likelihood for variable selection, as well as for shrinkage. We are particularly interested in the more‐variables‐than‐observations case of characteristic importance for modern data. The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients, which have been given independent, double exponential prior distributions, is adopted. Generalizing this prior provides a family of hyper‐Lasso penalty functions, which includes the quasi‐Cauchy distribution of Johnstone and Silverman as a special case. The properties of this approach, including the oracle property, are explored, and an EM algorithm for inference in regression problems is described. The posterior is multi‐modal, and we suggest a strategy of using a set of perfectly fitting random starting values to explore modes in different regions of the parameter space. Simulations show that our procedure provides significant improvements on a range of established procedures, and we provide an example from chemometrics.

17.

The Berry-Esseen bound for Studentized U-statistics

R. Helmers 《Revue canadienne de statistique》1985,13(1):79-82

Callaert and Veraverbeke (1981) recently obtained a Berry-Esseen-type bound of order $n^{-1/2}$ for Studentized nondegenerate U-statistics of degree two. The condition these authors need to obtain this order bound is the finiteness of the 4.5th absolute moment of the kernel h. In this note it is shown that this assumption can be weakened to that of a finite $(4 + \varepsilon)$th absolute moment of the kernel h, for some $\varepsilon > 0$. Our proof resembles part of Helmers and van Zwet (1982), where an analogous result is obtained for the Student t-statistic. The present note extends this to Studentized U-statistics.

18.

A class of k-sample distribution-free tests for location against ordered alternatives

Anil Gaur 《Communications in Statistics - Theory and Methods》2017,46(5):2343-2353

This paper introduces a new class of distribution-free tests for testing the homogeneity of several location parameters against ordered alternatives. The proposed class of test statistics is based on a linear combination of two-sample U-statistics based on subsample extremes. The mean and variance of the test statistic are obtained under the null hypothesis as well as under the sequence of local alternatives. The optimal weights are also determined. It is shown via Pitman ARE comparisons that the proposed class of test statistics performs better than its competitor tests in the case of heavy-tailed and long-tailed distributions.

19.

Estimation and variable selection for generalised partially linear single-index models

Peng Lai Ye Tian 《Journal of nonparametric statistics》2014,26(1):171-185

In this paper, we study the problem of estimation and variable selection for generalised partially linear single-index models based on quasi-likelihood, extending existing studies on variable selection for partially linear single-index models to binary and count responses. To take into account the unit norm constraint of the index parameter, we use the ‘delete-one-component’ approach. The asymptotic normality of the estimates is demonstrated. Furthermore, the smoothly clipped absolute deviation penalty is added for variable selection of parameters both in the nonparametric part and the parametric part, and the oracle property of the variable selection procedure is shown. Finally, some simulation studies are carried out to illustrate the finite sample performance.

20.

SELECTION PROCEDURES FOR SCALE PARAMETERS USING TWO-SAMPLE U-STATISTICS

A.N. Gill G.P. Mehta 《Australian & New Zealand Journal of Statistics》1991,33(3):347-362

Let $\pi_1, \ldots, \pi_k$ be k independent populations having the same known quantile of order p ($0 \le p \le 1$) and let $F_i(x) = F(x/\sigma_i)$ be the absolutely continuous cumulative distribution function of the ith population indexed by the scale parameter $\sigma_i$, i = 1,…, k. We propose subset selection procedures based on two-sample U-statistics for selecting a subset of k populations containing the one associated with the smallest scale parameter. These procedures are compared with the subset selection procedures based on two-sample linear rank statistics given by Gill & Mehta (1989) in the sense of Pitman asymptotic relative efficiency, with interesting results.