Similar Literature
20 similar records found.
1.
This paper considers distributed inference for two-sample U-statistics under the massive data setting. In order to reduce the computational complexity, this paper proposes distributed two-sample U-statistics and blockwise linear two-sample U-statistics. The blockwise linear two-sample U-statistic, which requires less communication, is more computationally efficient, especially when the data are stored in different locations. The asymptotic properties of both types of distributed two-sample U-statistics are established. In addition, this paper proposes bootstrap algorithms to approximate the distributions of distributed two-sample U-statistics and blockwise linear two-sample U-statistics in both the nondegenerate and degenerate cases. The distributed weighted bootstrap for the distributed two-sample U-statistic is new in the literature. The proposed bootstrap procedures are computationally efficient, come with theoretical guarantees, and are suitable for distributed computing platforms. Extensive numerical studies illustrate that the proposed distributed approaches are feasible and effective.
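As a rough illustration of the construction, here is a minimal Python sketch in which block i of the X-sample is paired with block i of the Y-sample and the local statistics are averaged with equal weights; the kernel, the block layout, and the weighting are illustrative assumptions, not the paper's exact estimators:

```python
import numpy as np

def two_sample_u(x, y, h):
    """Full two-sample U-statistic: average of the kernel h over all (x_i, y_j) pairs."""
    return float(np.mean([h(xi, yj) for xi in x for yj in y]))

def blockwise_two_sample_u(x_blocks, y_blocks, h):
    """Each site computes the U-statistic on its local (x, y) block pair; only
    the resulting scalars are communicated and averaged."""
    return float(np.mean([two_sample_u(xb, yb, h)
                          for xb, yb in zip(x_blocks, y_blocks)]))

# Example kernel h(x, y) = 1{x < y}; the corresponding U-statistic estimates P(X < Y).
rng = np.random.default_rng(0)
x_blocks = [rng.normal(0.0, 1.0, 400) for _ in range(4)]
y_blocks = [rng.normal(0.5, 1.0, 400) for _ in range(4)]
print(blockwise_two_sample_u(x_blocks, y_blocks, lambda x, y: float(x < y)))
```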

2.
In this paper, a small-sample asymptotic method is proposed for higher order inference in the stress–strength reliability model, R=P(Y&lt;X), where X and Y are distributed independently as Burr-type X distributions. In a departure from the current literature, we allow the scale parameters of the two distributions to differ, and the likelihood-based third-order inference procedure is applied to obtain inference for R. The difficulty in implementing the method lies in obtaining the constrained maximum likelihood estimates (MLEs). A penalized likelihood method is proposed to handle the numerical complications of maximizing the constrained likelihood. The proposed procedures are illustrated using a sample of carbon fibre strength data. Our results from simulation studies comparing the coverage probabilities of the proposed small-sample asymptotic method with some existing large-sample asymptotic methods show that the proposed method is very accurate even when the sample sizes are small.
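For intuition, R = P(Y &lt; X) under Burr-type X distributions with unequal scales can be approximated by plain Monte Carlo via inverse-transform sampling from F(x) = (1 − exp(−(λx)²))^α; the parameter values below are arbitrary, and this is a sketch, not the paper's third-order likelihood procedure:

```python
import numpy as np

def rburrx(n, alpha, lam, rng):
    """Draw from the Burr-type X CDF F(x) = (1 - exp(-(lam*x)**2))**alpha
    by inverse-transform sampling."""
    u = rng.uniform(size=n)
    return np.sqrt(-np.log(1.0 - u ** (1.0 / alpha))) / lam

rng = np.random.default_rng(1)
x = rburrx(200_000, alpha=2.0, lam=1.0, rng=rng)   # strength
y = rburrx(200_000, alpha=1.5, lam=1.3, rng=rng)   # stress
print(f"Monte Carlo estimate of R = P(Y < X): {np.mean(y < x):.4f}")
```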

3.
For a confidence interval (L(X), U(X)) of a parameter θ in one-parameter discrete distributions, the coverage probability varies as a function of θ. The confidence coefficient is the infimum of the coverage probabilities, inf_θ P_θ(θ ∈ (L(X), U(X))). Since we do not know at which point in the parameter space the infimum coverage probability occurs, the exact confidence coefficient is unknown. Besides the confidence coefficient, evaluation of a confidence interval can be based on the average coverage probability. Usually the exact average coverage probability is also unknown, and it has been approximated by taking the mean of the coverage probabilities at some randomly chosen points in the parameter space. In this article, methodologies for computing the exact average coverage probabilities as well as the exact confidence coefficients of confidence intervals for one-parameter discrete distributions are proposed. With these methodologies, both exact values can be derived.
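The following sketch computes the exact coverage probability of the Clopper-Pearson binomial interval at a given p, then approximates the confidence coefficient and average coverage by a grid search; the grid step is exactly the approximation the paper's exact methodology avoids:

```python
import numpy as np
from scipy.stats import binom, beta

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) interval for a binomial proportion."""
    a = 1.0 - conf
    lo = beta.ppf(a / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - a / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

def coverage(p, n, conf=0.95):
    """Exact coverage at p: total binomial probability of outcomes x whose
    interval contains p."""
    cov = 0.0
    for x in range(n + 1):
        lo, hi = clopper_pearson(x, n, conf)
        if lo <= p <= hi:
            cov += binom.pmf(x, n, p)
    return cov

grid = np.linspace(0.001, 0.999, 999)   # grid approximation only
covs = [coverage(p, n=20) for p in grid]
print(f"approximate confidence coefficient: {min(covs):.4f}")
print(f"approximate average coverage:       {np.mean(covs):.4f}")
```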

4.
Bayesian inference for a generalized Weibull stress-strength model (SSM) with more than one strength component is considered. For this problem, properly assigning priors for the reliabilities is challenging due to the presence of nuisance parameters. Matching priors, i.e., priors that match the posterior probabilities of certain regions with their frequentist coverage probabilities, are commonly used but difficult to derive in this problem. Instead, we apply an alternative method and derive a matching prior based on a modification of the profile likelihood. Simulation studies show that the proposed prior performs well in terms of frequentist coverage and estimation even when the sample sizes are small. The prior is applied to two real datasets. The Canadian Journal of Statistics 41: 83–97; 2013 © 2012 Statistical Society of Canada

5.
Let X1, X2, ... be i.i.d. random variables and let Un = (n choose r)^{-1} Σ_{(n,r)} h(X_{i1}, ..., X_{ir}) be a U-statistic with E Un = ν, ν unknown, where the sum Σ_{(n,r)} extends over all subsets 1 ≤ i1 &lt; ... &lt; ir ≤ n. Assume that g(X1) = E[h(X1, ..., Xr) − ν | X1] has a strictly positive variance σ². Further, let a be such that Φ(a) − Φ(−a) = α for fixed α, 0 &lt; α &lt; 1, where Φ is the standard normal d.f., and let S²n be the jackknife estimator of n Var Un. Consider the stopping times N(d) = min{n : S²n + n^{-1} ≤ n d² a^{-2}}, d &gt; 0, and confidence intervals for ν of length 2d of the form I_{n,d} = [Un − d, Un + d]. We assume that Var Un is unknown, and hence no fixed-sample-size method is available for finding a confidence interval for ν of prescribed width 2d and prescribed coverage probability α. Turning to a sequential procedure, let I_{N(d),d} be a sequence of sequential confidence intervals for ν. The asymptotic consistency of this procedure, i.e. lim_{d→0} P(ν ∈ I_{N(d),d}) = α, follows from Sproule (1969). In this paper, the rate at which P(ν ∈ I_{N(d),d}) converges to α is investigated. We obtain that |P(ν ∈ I_{N(d),d}) − α| = O(d^{1/2 − (1+K)/(2(1+m))}) as d → 0, where K = max{0, 4 − m}, under the condition that E|h(X1, ..., Xr)|^m &lt; ∞, m &gt; 2. This improves and extends recent results of Ghosh &amp; DasGupta (1980) and Mukhopadhyay (1981).
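A toy implementation of the sequential rule, assuming the reconstructed stopping criterion S²n + n^{-1} ≤ n d² a^{-2} above, an in-memory "stream" that is long enough, and the example kernel h(x1, x2) = (x1 − x2)²/2 (whose U-statistic is the sample variance):

```python
import numpy as np
from scipy.stats import norm

def u_stat(x, h):
    """Degree-2 U-statistic with kernel h."""
    n = len(x)
    return np.mean([h(x[i], x[j]) for i in range(n) for j in range(i + 1, n)])

def jackknife_S2(x, h):
    """Jackknife estimator S^2_n of n * Var(U_n), from leave-one-out values."""
    n = len(x)
    loo = np.array([u_stat(np.delete(x, i), h) for i in range(n)])
    return (n - 1) * np.sum((loo - loo.mean()) ** 2)

def sequential_ci(stream, h, d, alpha=0.95, n0=10):
    """Stop at N(d) = min{n >= n0 : S^2_n + 1/n <= n d^2 / a^2}, then report
    the fixed-width interval [U_n - d, U_n + d]."""
    a = norm.ppf((1 + alpha) / 2)
    n = n0
    while jackknife_S2(np.asarray(stream[:n]), h) + 1.0 / n > n * d**2 / a**2:
        n += 1                                # assumes the stream never runs out
    un = u_stat(np.asarray(stream[:n]), h)
    return un - d, un + d, n

rng = np.random.default_rng(2)
h = lambda x1, x2: 0.5 * (x1 - x2) ** 2       # U-statistic = sample variance
print(sequential_ci(rng.normal(size=2000), h, d=0.5))
```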

6.
We obtain adjustments to the profile likelihood function in Weibull regression models with and without censoring. Specifically, we consider two different modified profile likelihoods: (i) the one proposed by Cox and Reid [Cox, D.R. and Reid, N., 1987, Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B, 49, 1–39.], and (ii) an approximation to the one proposed by Barndorff-Nielsen [Barndorff-Nielsen, O.E., 1983, On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70, 343–365.], the approximation having been obtained using the results by Fraser and Reid [Fraser, D.A.S. and Reid, N., 1995, Ancillaries and third-order significance. Utilitas Mathematica, 47, 33–53.] and by Fraser et al. [Fraser, D.A.S., Reid, N. and Wu, J., 1999, A simple formula for tail probabilities for frequentist and Bayesian inference. Biometrika, 86, 655–661.]. We focus on point estimation and likelihood ratio tests on the shape parameter in the class of Weibull regression models. We derive some distributional properties of the different maximum likelihood estimators and likelihood ratio tests. The numerical evidence presented in the paper favors the approximation to Barndorff-Nielsen's adjustment.
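For orientation, the unadjusted profile log-likelihood of the Weibull shape parameter (no covariates, no censoring, scale profiled out analytically) can be sketched as below; the Cox-Reid and Barndorff-Nielsen adjustments studied in the paper further modify this function, and the simulated data are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def weibull_profile_loglik(k, t):
    """Profile log-likelihood of the shape k; the scale maximizes out to
    lambda_hat(k)**k = mean(t**k)."""
    n = len(t)
    return (n * np.log(k) - n * np.log(np.mean(t ** k))
            + (k - 1) * np.sum(np.log(t)) - n)

t = np.random.default_rng(3).weibull(2.0, size=200) * 1.5   # shape 2, scale 1.5
res = minimize_scalar(lambda k: -weibull_profile_loglik(k, t),
                      bounds=(0.1, 10.0), method="bounded")
print(f"profile MLE of the shape parameter: {res.x:.3f}")
```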

7.
In this article the author investigates empirical-likelihood-based inference for the parameters of the varying-coefficient single-index model (VCSIM). Unlike in the usual cases, without bias correction the asymptotic distribution of the empirical likelihood ratio is not the standard chi-squared distribution. Accordingly, a bias-corrected empirical likelihood method is employed to construct confidence regions (intervals) for the regression parameters, which have two advantages over those based on normal approximation: (1) they do not impose prior constraints on the shape of the regions; (2) they do not require the construction of a pivotal quantity, and the regions are range preserving and transformation respecting. A simulation study is undertaken to compare the empirical likelihood with the normal approximation in terms of coverage accuracies and average areas/lengths of confidence regions/intervals. A real data example is given to illustrate the proposed approach. The Canadian Journal of Statistics 38: 434–452; 2010 © 2010 Statistical Society of Canada
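The mechanics of an empirical likelihood confidence region are easiest to see in the scalar-mean case; the sketch below is that generic building block (not the bias-corrected VCSIM version), comparing the −2 log EL ratio with a χ² quantile, and it assumes the hypothesized mean lies strictly inside the data range:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_logratio(x, mu):
    """-2 log empirical likelihood ratio for the mean (scalar case); requires
    min(x) < mu < max(x) so that all implied weights 1 + lam*z_i stay positive."""
    z = x - mu
    eps = 1e-10
    lam = brentq(lambda l: np.mean(z / (1.0 + l * z)),
                 (-1.0 + eps) / z.max(), (-1.0 + eps) / z.min())
    return 2.0 * np.sum(np.log1p(lam * z))

x = np.random.default_rng(4).normal(1.0, 2.0, size=100)
# mu = 1 should be covered by the 95% EL region about 95% of the time.
print(el_logratio(x, 1.0) <= chi2.ppf(0.95, 1))
```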

8.
Consider a nonparametric nonseparable regression model Y = φ(Z, U), where φ(Z, U) is strictly increasing in U and U ~ U[0, 1]. We suppose that there exists an instrument W that is independent of U. The observable random variables are Y, Z, and W, all one-dimensional. We construct test statistics for the hypothesis that Z is exogenous, that is, that U is independent of Z. The test statistics are based on the observation that Z is exogenous if and only if V = F_{Y|Z}(Y|Z) is independent of W, and hence they do not require estimation of the function φ. The asymptotic properties of the proposed tests are proved, and a bootstrap approximation of the critical values of the tests is shown to be consistent and to work in finite samples via simulations. An empirical example using the U.K. Family Expenditure Survey is also given. As a byproduct of our results we obtain the asymptotic properties of a kernel estimator of the distribution of V, which equals U when Z is exogenous. We show that this estimator converges to the uniform distribution at a faster rate than the parametric n^{-1/2} rate.
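A crude proxy for the idea, not the paper's test statistics: estimate V_i = F̂_{Y|Z}(Y_i | Z_i) with a kernel-weighted empirical CDF and run a permutation test on the correlation between V and W. Correlation only detects linear dependence, and the bandwidth h = 0.3 assumes a standardized Z; all names here are hypothetical:

```python
import numpy as np

def cond_cdf(y, z, y0, z0, h):
    """Kernel-weighted empirical estimate of F_{Y|Z}(y0 | z0)."""
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    return np.sum(w * (y <= y0)) / np.sum(w)

def exogeneity_pvalue(y, z, w, h=0.3, n_perm=500, seed=0):
    """Under exogeneity, V_i = F_hat(Y_i | Z_i) should be independent of W_i;
    permutation p-value for |corr(V, W)| as a simple (linear-only) check."""
    rng = np.random.default_rng(seed)
    v = np.array([cond_cdf(y, z, y[i], z[i], h) for i in range(len(y))])
    stat = abs(np.corrcoef(v, w)[0, 1])
    perm = [abs(np.corrcoef(v, rng.permutation(w))[0, 1]) for _ in range(n_perm)]
    return float(np.mean([p >= stat for p in perm]))
```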

9.
We investigate an empirical likelihood inference approach under a general class of semiparametric hazards regression models with survival data subject to right censoring. An empirical likelihood ratio for the full 2p regression parameters involved in the model is obtained. We show that it converges weakly to a random variable that can be written as a weighted sum of 2p independent chi-squared variables with one degree of freedom. Using this result, we can construct a confidence region for the parameters. We also suggest an adjusted version of the preceding statistic, whose limit follows a standard chi-squared distribution with 2p degrees of freedom.

10.
Periodic functions have many applications in astronomy. They can be used to model the light intensity of periodic variable stars, whose brightness varies with time. Because astronomical data are commonly observed at irregularly spaced time points, the periodogram is highlighted as a good tool for estimating the period. Our bootstrap inference about the period is based on maximizing the periodogram and consists of constructing two-sided percentile bootstrap confidence intervals for the true period. We also obtain their coverage levels theoretically, and discuss the benefit of double-bootstrap confidence intervals, which substantially improve the coverage levels. Precisely, we show that the coverage error of single-bootstrap confidence intervals is of order n^{-1}, decreasing to order n^{-2} when double-bootstrap methods are applied. The simulation study given here is a numerical assessment of the theoretical work.
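A minimal sketch of the point estimator and a single-bootstrap percentile interval, using the Lomb-Scargle periodogram for irregular sampling times and a pairs bootstrap; the resampling scheme and all tuning values are illustrative assumptions, not necessarily the paper's:

```python
import numpy as np
from scipy.signal import lombscargle

def estimate_period(t, y, periods):
    """Period estimate: maximizer of the Lomb-Scargle periodogram over a grid
    of candidate periods; handles irregularly spaced observation times t."""
    power = lombscargle(t, y - y.mean(), 2.0 * np.pi / periods)
    return periods[np.argmax(power)]

def bootstrap_period_ci(t, y, periods, B=200, level=0.95, seed=0):
    """Two-sided percentile bootstrap interval for the period (single bootstrap;
    iterating the bootstrap is what reduces coverage error from O(1/n) to O(1/n^2))."""
    rng = np.random.default_rng(seed)
    n, est = len(t), np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)             # resample (t_i, y_i) pairs
        est[b] = estimate_period(t[idx], y[idx], periods)
    lo, hi = np.quantile(est, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0.0, 50.0, 300))        # irregular observation times
y = np.sin(2 * np.pi * t / 3.7) + rng.normal(0.0, 0.3, t.size)
grid = np.linspace(2.0, 6.0, 2000)
print(estimate_period(t, y, grid), bootstrap_period_ci(t, y, grid))
```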

11.
This paper considers a linear regression model with regression parameter vector β. The parameter of interest is θ = a^Tβ, where a is specified. When, as a first step, a data-based variable selection procedure (e.g. minimum Akaike information criterion) is used to select a model, it is common statistical practice to then carry out inference about θ, using the same data, based on the (false) assumption that the selected model had been specified a priori. The paper considers a confidence interval for θ with nominal coverage 1 − α constructed on this (false) assumption, and calls this the naive 1 − α confidence interval. The minimum coverage probability of this confidence interval can be calculated for simple variable selection procedures involving only a single variable. However, the kinds of variable selection procedures used in practice are typically much more complicated. For the real-life data presented in this paper, there are 20 variables, each of which is to be either included or not, leading to 2^20 different models. The coverage probability at any given value of the parameters provides an upper bound on the minimum coverage probability of the naive confidence interval. This paper derives a new Monte Carlo simulation estimator of the coverage probability, which uses conditioning for variance reduction. For these real-life data, the gain in efficiency of this Monte Carlo simulation due to conditioning ranged from 2 to 6. The paper also presents a simple one-dimensional search strategy for parameter values at which the coverage probability is relatively small. For these real-life data, this search leads to parameter values for which the coverage probability of the naive 0.95 confidence interval is 0.79 for variable selection using the Akaike information criterion and 0.70 for variable selection using the Bayesian information criterion, showing that these confidence intervals are completely inadequate.
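The phenomenon is easy to reproduce in a toy two-regressor version of the problem; the sketch below uses plain Monte Carlo (without the conditioning-based variance reduction the paper develops), and all parameter values are illustrative:

```python
import numpy as np
from scipy.stats import t as tdist

def ols_ci(y, X, j, level=0.95):
    """OLS estimate and naive CI for coefficient j, treating X as chosen a priori."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    se = np.sqrt(resid @ resid / (n - p) * np.linalg.inv(X.T @ X)[j, j])
    q = tdist.ppf((1 + level) / 2, n - p)
    return beta[j] - q * se, beta[j] + q * se

def aic(y, X):
    n, p = X.shape
    rss = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    return n * np.log(rss / n) + 2 * p

def naive_coverage(beta2, n=50, n_sim=2000, seed=8):
    """Coverage of the naive 95% CI for beta1 when x2 enters the model only if
    AIC prefers it; selection and inference reuse the same data."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x1, x2 = rng.normal(size=(2, n))
        y = x1 + beta2 * x2 + rng.normal(size=n)
        X_small = np.column_stack([np.ones(n), x1])
        X_full = np.column_stack([np.ones(n), x1, x2])
        X = X_full if aic(y, X_full) < aic(y, X_small) else X_small
        lo, hi = ols_ci(y, X, j=1)
        hits += (lo <= 1.0 <= hi)                # true beta1 = 1
    return hits / n_sim

print(naive_coverage(beta2=0.3))   # typically noticeably below the nominal 0.95
```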

12.
This article investigates the asymptotic properties of a simple empirical-likelihood-based inference method for discontinuity in density. The parameter of interest is a function of two one-sided limits of the probability density function at (possibly) two cut-off points. Our approach is based on the first-order conditions from a minimum contrast problem. We investigate both first-order and second-order properties of the proposed method. We characterize the leading coverage error of our inference method and propose a coverage-error-optimal (CE-optimal, hereafter) bandwidth selector. We show that the empirical likelihood ratio statistic is Bartlett correctable. An important special case is the manipulation testing problem in a regression discontinuity design (RDD), where the parameter of interest is the density difference at a known threshold. In RDD, continuity of the density of the assignment variable at the threshold is taken as a “no-manipulation” behavioral assumption, which is a testable implication of an identifying condition for the local average treatment effect. When specialized to the manipulation testing problem, the CE-optimal bandwidth selector has an explicit form. We propose a data-driven CE-optimal bandwidth selector for use in practice. Results from Monte Carlo simulations are presented, and the usefulness of our method is illustrated with an empirical example.
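For intuition, a crude density-jump diagnostic at the threshold can be formed from one-sided counts; the paper's empirical likelihood statistic and CE-optimal bandwidth are more refined, and everything below (bandwidth, data-generating process) is illustrative:

```python
import numpy as np

def density_jump(x, cutoff, h):
    """Crude one-sided density estimates at the cutoff from counts in
    [cutoff-h, cutoff) and [cutoff, cutoff+h); a nonzero difference suggests
    manipulation of the running variable."""
    n = len(x)
    f_left = np.sum((x >= cutoff - h) & (x < cutoff)) / (n * h)
    f_right = np.sum((x >= cutoff) & (x < cutoff + h)) / (n * h)
    return f_right - f_left

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, 10_000)
x = x[(x < 0) | (rng.uniform(size=x.size) < 0.7)]   # thin out mass just right of 0
print(f"estimated density jump at 0: {density_jump(x, 0.0, 0.1):+.3f}")
```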

13.
Let U and V be two symmetric (about zero) random variables with U + V symmetric about C; here C is a constant. It is easy to see that if U and V are mutually independent, or if both U and V satisfy the weak law of large numbers, then C = 0. So, intuitively, we would suspect that C = 0 in general. However, we show that there exist two random variables U and V, each symmetric about 0, with U + V symmetric about some C ≠ 0. The example given is closely related to one given by Alejandro D. De Acosta in another context.

14.
We consider Z±n = sup_{0 &lt; t ≤ 1/2} U±n(t)/(t(1 − t))^{1/2}, where + and − denote the positive and negative parts, respectively, of the sample paths of the empirical process Un. U+n and U−n are seen to behave rather differently, which is tied to the asymmetry of the binomial distribution, or to the asymmetry of the distribution of small order statistics. Csáki (1975) showed that log Z±n/log₂n is the appropriate normalization for a law of the iterated logarithm (LIL) for Z±n; we show that Z−n/(2 log₂n)^{1/2} is the appropriate normalization for Z−n. Csörgő &amp; Révész (1975) posed the question: if we replace the sup over (0, 1/2] above by the sup over [a_n, 1 − a_n], where a_n → 0, how fast can a_n → 0 and still have |Zn|/(2 log₂n)^{1/2} maintain a finite lim sup a.s.? This question is answered herein. The techniques developed are then used in Section 4 to give an interesting new proof of the upper-class half of a result of Chung (1949) for |Un(t)|. The proofs draw heavily on James (1975); two basic inequalities of that paper are strengthened to their potential, and are felt to be of independent interest.

15.
We describe inferactive data analysis, so named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis, and it also allows for Bayesian data analysis. We see this as a useful step toward providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) selective inference, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG-DAG (a Data Analysis Generative DAG), and a selective change-of-variables formula is crucial to any practical implementation of inferactive data analysis via sampling from these distributions. We discuss a canonical example of an incomplete cross-validation test statistic to discriminate between black-box models, and a real HIV dataset example to illustrate inference after making multiple queries on the data.

16.
In a high-dimensional multiple testing framework, we present new confidence bounds on the number of false positives contained in subsets S of selected null hypotheses. These bounds are post hoc in the sense that the coverage probability holds simultaneously over all S, possibly chosen in a data-dependent way. This article focuses on the common case of structured null hypotheses, for example along a tree, a hierarchy, or geometrically (spatially or temporally). Following recent advances in post hoc inference, we build confidence bounds for some prespecified forest-structured subsets and deduce a bound for any subset S by interpolation. The proposed bounds are shown to substantially improve on previous ones when the signal is locally structured. Our findings are supported both by theoretical results and by numerical experiments. Moreover, our bounds can be obtained by an algorithm (with complexity bilinear in the sizes of the reference hierarchy and of the selected subset) that is implemented in the open-source R package sansSouci, available from https://github.com/pneuvial/sanssouci, making our approach operational.

17.
We develop a hierarchical Gaussian process model for forecasting and inference of functional time series data. Unlike existing methods, our approach is especially suited for sparsely or irregularly sampled curves and for curves sampled with nonnegligible measurement error. The latent process is dynamically modeled as a functional autoregression (FAR) with Gaussian process innovations. We propose a fully nonparametric dynamic functional factor model for the dynamic innovation process, with broader applicability and improved computational efficiency over standard Gaussian process models. We prove finite-sample forecasting and interpolation optimality properties of the proposed model, which remain valid with the Gaussian assumption relaxed. An efficient Gibbs sampling algorithm is developed for estimation, inference, and forecasting, with extensions for FAR(p) models with model averaging over the lag p. Extensive simulations demonstrate substantial improvements in forecasting performance and recovery of the autoregressive surface over competing methods, especially under sparse designs. We apply the proposed methods to forecast nominal and real yield curves using daily U.S. data. Real yields are observed more sparsely than nominal yields, yet the proposed methods are highly competitive in both settings. Supplementary materials, including R code and the yield curve data, are available online.

18.
This paper considers inference for both spatial lattice data with possibly irregularly shaped sampling region and non-lattice data, by extending the recently proposed self-normalization (SN) approach from stationary time series to the spatial setup. A nice feature of the SN method is that it avoids the choice of tuning parameters, which are usually required for other non-parametric inference approaches. The extension is non-trivial as spatial data has no natural one-directional time ordering. The SN-based inference is convenient to implement and is shown through simulation studies to provide more accurate coverage compared with the widely used subsampling approach. We also illustrate the idea of SN using a real data example.
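The flavor of self-normalization is easiest to see in the stationary time-series case the paper starts from: the normalizer is built from recursive sample means, so no bandwidth or tuning parameter is needed. The sketch below is that one-dimensional version, not the spatial extension; its limiting distribution is nonstandard, so critical values must come from the SN literature rather than a chi-squared table:

```python
import numpy as np

def sn_stat(x, mu):
    """Self-normalized statistic for the mean of a stationary sequence: the
    normalizer uses recursive means instead of a bandwidth-dependent long-run
    variance estimator."""
    n = len(x)
    k = np.arange(1, n + 1)
    cmean = np.cumsum(x) / k                  # recursive sample means
    w = np.sum(k**2 * (cmean - cmean[-1]) ** 2) / n**2
    return n * (cmean[-1] - mu) ** 2 / w      # compare with tabulated SN critical values
```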

19.
For a continuous random variable X with support equal to (a, b) and c.d.f. F, and g: Ω1 → Ω2 a continuous, strictly increasing function such that Ω1 ∩ Ω2 ⊇ (a, b), but otherwise arbitrary, we establish that the random variables F(X) − F(g(X)) and F(g^{-1}(X)) − F(X) have the same distribution. Further developments, accompanied by illustrations and observations, address as well the equidistribution identity U − ψ(U) =d ψ^{-1}(U) − U for U ~ U(0, 1), where ψ is a continuous, strictly increasing and onto function, but otherwise arbitrary. Finally, we expand on applications with connections to variance reduction techniques, the discrepancy between distributions, and a risk identity in predictive density estimation.
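The equidistribution identity for U ~ U(0, 1) is easy to check by simulation; ψ(u) = √u below is one arbitrary admissible choice of a continuous, strictly increasing bijection of (0, 1):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(6)
u = rng.uniform(size=100_000)
psi, psi_inv = np.sqrt, np.square   # psi: (0,1) -> (0,1), strictly increasing, onto
a = u - psi(u)                      # U - psi(U)
b = psi_inv(u) - u                  # psi^{-1}(U) - U
print(ks_2samp(a, b).pvalue)        # large p-value: consistent with equality in law
```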

20.
The authors consider hidden Markov models (HMMs) whose latent process has m ≥ 2 states and whose state-dependent distributions arise from a general one-parameter family. They propose a test of the hypothesis m = 2. Their procedure is an extension to HMMs of the modified likelihood ratio statistic proposed by Chen, Chen &amp; Kalbfleisch (2004) for testing two states in a finite mixture. The authors determine the asymptotic distribution of their test under the hypothesis m = 2 and investigate its finite-sample properties in a simulation study. Their test is based on inference for the marginal mixture distribution of the HMM. In order to illustrate the additional difficulties due to the dependence structure of the HMM, they show how to test general regular hypotheses on the marginal mixture of HMMs via a quasi-modified likelihood ratio. They also discuss two applications.
