Similar Literature
20 similar records found.
1.
This paper considers distributed inference for two-sample U-statistics under the massive data setting. In order to reduce the computational complexity, this paper proposes distributed two-sample U-statistics and blockwise linear two-sample U-statistics. The blockwise linear two-sample U-statistic, which requires less communication, is more computationally efficient, especially when the data are stored in different locations. The asymptotic properties of both types of distributed two-sample U-statistics are established. In addition, this paper proposes bootstrap algorithms to approximate the distributions of distributed two-sample U-statistics and blockwise linear two-sample U-statistics in both the nondegenerate and degenerate cases. The distributed weighted bootstrap for the distributed two-sample U-statistic is new in the literature. The proposed bootstrap procedures are computationally efficient, come with theoretical guarantees, and are suitable for distributed computing platforms. Extensive numerical studies illustrate that the proposed distributed approaches are feasible and effective.
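As a rough illustration of the construction, here is a minimal Python sketch in which block i of the X-sample is paired with block i of the Y-sample and the local statistics are averaged with equal weights; the kernel, the block layout, and the weighting are illustrative assumptions, not the paper's exact estimators:

```python
import numpy as np

def two_sample_u(x, y, h):
    """Full two-sample U-statistic: average of the kernel h over all (x_i, y_j) pairs."""
    return float(np.mean([h(xi, yj) for xi in x for yj in y]))

def blockwise_two_sample_u(x_blocks, y_blocks, h):
    """Each site computes the U-statistic on its local (x, y) block pair; only
    the resulting scalars are communicated and averaged."""
    return float(np.mean([two_sample_u(xb, yb, h)
                          for xb, yb in zip(x_blocks, y_blocks)]))

# Example kernel h(x, y) = 1{x < y}; the corresponding U-statistic estimates P(X < Y).
rng = np.random.default_rng(0)
x_blocks = [rng.normal(0.0, 1.0, 400) for _ in range(4)]
y_blocks = [rng.normal(0.5, 1.0, 400) for _ in range(4)]
print(blockwise_two_sample_u(x_blocks, y_blocks, lambda x, y: float(x < y)))
```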

2.
In this paper, a small-sample asymptotic method is proposed for higher order inference in the stress–strength reliability model, R=P(Y&lt;X), where X and Y are distributed independently as Burr-type X distributions. In a departure from the current literature, we allow the scale parameters of the two distributions to differ, and the likelihood-based third-order inference procedure is applied to obtain inference for R. The difficulty in implementing the method lies in obtaining the constrained maximum likelihood estimates (MLEs). A penalized likelihood method is proposed to handle the numerical complications of maximizing the constrained likelihood. The proposed procedures are illustrated using a sample of carbon fibre strength data. Our results from simulation studies comparing the coverage probabilities of the proposed small-sample asymptotic method with some existing large-sample asymptotic methods show that the proposed method is very accurate even when the sample sizes are small.
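For intuition, R = P(Y &lt; X) under Burr-type X distributions with unequal scales can be approximated by plain Monte Carlo via inverse-transform sampling from F(x) = (1 − exp(−(λx)²))^α; the parameter values below are arbitrary, and this is a sketch, not the paper's third-order likelihood procedure:

```python
import numpy as np

def rburrx(n, alpha, lam, rng):
    """Draw from the Burr-type X CDF F(x) = (1 - exp(-(lam*x)**2))**alpha
    by inverse-transform sampling."""
    u = rng.uniform(size=n)
    return np.sqrt(-np.log(1.0 - u ** (1.0 / alpha))) / lam

rng = np.random.default_rng(1)
x = rburrx(200_000, alpha=2.0, lam=1.0, rng=rng)   # strength
y = rburrx(200_000, alpha=1.5, lam=1.3, rng=rng)   # stress
print(f"Monte Carlo estimate of R = P(Y < X): {np.mean(y < x):.4f}")
```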

3.
For a confidence interval (L(X), U(X)) of a parameter θ in one-parameter discrete distributions, the coverage probability varies as a function of θ. The confidence coefficient is the infimum of the coverage probabilities, inf_θ P_θ(θ ∈ (L(X), U(X))). Since we do not know at which point in the parameter space the infimum coverage probability occurs, the exact confidence coefficient is unknown. Besides the confidence coefficient, evaluation of a confidence interval can be based on the average coverage probability. Usually the exact average coverage probability is also unknown, and it has been approximated by taking the mean of the coverage probabilities at some randomly chosen points in the parameter space. In this article, methodologies for computing the exact average coverage probabilities as well as the exact confidence coefficients of confidence intervals for one-parameter discrete distributions are proposed. With these methodologies, both exact values can be derived.
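The following sketch computes the exact coverage probability of the Clopper-Pearson binomial interval at a given p, then approximates the confidence coefficient and average coverage by a grid search; the grid step is exactly the approximation the paper's exact methodology avoids:

```python
import numpy as np
from scipy.stats import binom, beta

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) interval for a binomial proportion."""
    a = 1.0 - conf
    lo = beta.ppf(a / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - a / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

def coverage(p, n, conf=0.95):
    """Exact coverage at p: total binomial probability of outcomes x whose
    interval contains p."""
    cov = 0.0
    for x in range(n + 1):
        lo, hi = clopper_pearson(x, n, conf)
        if lo <= p <= hi:
            cov += binom.pmf(x, n, p)
    return cov

grid = np.linspace(0.001, 0.999, 999)   # grid approximation only
covs = [coverage(p, n=20) for p in grid]
print(f"approximate confidence coefficient: {min(covs):.4f}")
print(f"approximate average coverage:       {np.mean(covs):.4f}")
```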

4.
Bayesian inference for a generalized Weibull stress-strength model (SSM) with more than one strength component is considered. For this problem, properly assigning priors for the reliabilities is challenging due to the presence of nuisance parameters. Matching priors, i.e., priors that match the posterior probabilities of certain regions with their frequentist coverage probabilities, are commonly used but difficult to derive in this problem. Instead, we apply an alternative method and derive a matching prior based on a modification of the profile likelihood. Simulation studies show that the proposed prior performs well in terms of frequentist coverage and estimation even when the sample sizes are small. The prior is applied to two real datasets. The Canadian Journal of Statistics 41: 83–97; 2013 © 2012 Statistical Society of Canada

5.
Let X1, X2, ... be i.i.d. random variables and let Un = (n choose r)^{-1} Σ_{(n,r)} h(X_{i1}, ..., X_{ir}) be a U-statistic with E Un = ν, ν unknown, where the sum Σ_{(n,r)} extends over all subsets 1 ≤ i1 &lt; ... &lt; ir ≤ n. Assume that g(X1) = E[h(X1, ..., Xr) − ν | X1] has a strictly positive variance σ². Further, let a be such that Φ(a) − Φ(−a) = α for fixed α, 0 &lt; α &lt; 1, where Φ is the standard normal d.f., and let S²n be the jackknife estimator of n Var Un. Consider the stopping times N(d) = min{n : S²n + n^{-1} ≤ n d² a^{-2}}, d &gt; 0, and confidence intervals for ν of length 2d of the form I_{n,d} = [Un − d, Un + d]. We assume that Var Un is unknown, and hence no fixed-sample-size method is available for finding a confidence interval for ν of prescribed width 2d and prescribed coverage probability α. Turning to a sequential procedure, let I_{N(d),d} be a sequence of sequential confidence intervals for ν. The asymptotic consistency of this procedure, i.e. lim_{d→0} P(ν ∈ I_{N(d),d}) = α, follows from Sproule (1969). In this paper, the rate at which P(ν ∈ I_{N(d),d}) converges to α is investigated. We obtain that |P(ν ∈ I_{N(d),d}) − α| = O(d^{1/2 − (1+K)/(2(1+m))}) as d → 0, where K = max{0, 4 − m}, under the condition that E|h(X1, ..., Xr)|^m &lt; ∞, m &gt; 2. This improves and extends recent results of Ghosh &amp; DasGupta (1980) and Mukhopadhyay (1981).
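A toy implementation of the sequential rule, assuming the reconstructed stopping criterion S²n + n^{-1} ≤ n d² a^{-2} above, an in-memory "stream" that is long enough, and the example kernel h(x1, x2) = (x1 − x2)²/2 (whose U-statistic is the sample variance):

```python
import numpy as np
from scipy.stats import norm

def u_stat(x, h):
    """Degree-2 U-statistic with kernel h."""
    n = len(x)
    return np.mean([h(x[i], x[j]) for i in range(n) for j in range(i + 1, n)])

def jackknife_S2(x, h):
    """Jackknife estimator S^2_n of n * Var(U_n), from leave-one-out values."""
    n = len(x)
    loo = np.array([u_stat(np.delete(x, i), h) for i in range(n)])
    return (n - 1) * np.sum((loo - loo.mean()) ** 2)

def sequential_ci(stream, h, d, alpha=0.95, n0=10):
    """Stop at N(d) = min{n >= n0 : S^2_n + 1/n <= n d^2 / a^2}, then report
    the fixed-width interval [U_n - d, U_n + d]."""
    a = norm.ppf((1 + alpha) / 2)
    n = n0
    while jackknife_S2(np.asarray(stream[:n]), h) + 1.0 / n > n * d**2 / a**2:
        n += 1                                # assumes the stream never runs out
    un = u_stat(np.asarray(stream[:n]), h)
    return un - d, un + d, n

rng = np.random.default_rng(2)
h = lambda x1, x2: 0.5 * (x1 - x2) ** 2       # U-statistic = sample variance
print(sequential_ci(rng.normal(size=2000), h, d=0.5))
```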

6.
We obtain adjustments to the profile likelihood function in Weibull regression models with and without censoring. Specifically, we consider two different modified profile likelihoods: (i) the one proposed by Cox and Reid [Cox, D.R. and Reid, N., 1987, Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B, 49, 1–39.], and (ii) an approximation to the one proposed by Barndorff-Nielsen [Barndorff-Nielsen, O.E., 1983, On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70, 343–365.], the approximation having been obtained using the results by Fraser and Reid [Fraser, D.A.S. and Reid, N., 1995, Ancillaries and third-order significance. Utilitas Mathematica, 47, 33–53.] and by Fraser et al. [Fraser, D.A.S., Reid, N. and Wu, J., 1999, A simple formula for tail probabilities for frequentist and Bayesian inference. Biometrika, 86, 655–661.]. We focus on point estimation and likelihood ratio tests on the shape parameter in the class of Weibull regression models. We derive some distributional properties of the different maximum likelihood estimators and likelihood ratio tests. The numerical evidence presented in the paper favors the approximation to Barndorff-Nielsen's adjustment.
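For orientation, the unadjusted profile log-likelihood of the Weibull shape parameter (no covariates, no censoring, scale profiled out analytically) can be sketched as below; the Cox-Reid and Barndorff-Nielsen adjustments studied in the paper further modify this function, and the simulated data are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def weibull_profile_loglik(k, t):
    """Profile log-likelihood of the shape k; the scale maximizes out to
    lambda_hat(k)**k = mean(t**k)."""
    n = len(t)
    return (n * np.log(k) - n * np.log(np.mean(t ** k))
            + (k - 1) * np.sum(np.log(t)) - n)

t = np.random.default_rng(3).weibull(2.0, size=200) * 1.5   # shape 2, scale 1.5
res = minimize_scalar(lambda k: -weibull_profile_loglik(k, t),
                      bounds=(0.1, 10.0), method="bounded")
print(f"profile MLE of the shape parameter: {res.x:.3f}")
```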

7.
In this article the author investigates empirical-likelihood-based inference for the parameters of the varying-coefficient single-index model (VCSIM). Unlike in the usual cases, without bias correction the asymptotic distribution of the empirical likelihood ratio is not the standard chi-squared distribution. Accordingly, a bias-corrected empirical likelihood method is employed to construct confidence regions (intervals) for the regression parameters, which have two advantages over those based on normal approximation: (1) they do not impose prior constraints on the shape of the regions; (2) they do not require the construction of a pivotal quantity, and the regions are range preserving and transformation respecting. A simulation study is undertaken to compare the empirical likelihood with the normal approximation in terms of coverage accuracies and average areas/lengths of confidence regions/intervals. A real data example is given to illustrate the proposed approach. The Canadian Journal of Statistics 38: 434–452; 2010 © 2010 Statistical Society of Canada
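The mechanics of an empirical likelihood confidence region are easiest to see in the scalar-mean case; the sketch below is that generic building block (not the bias-corrected VCSIM version), comparing the −2 log EL ratio with a χ² quantile, and it assumes the hypothesized mean lies strictly inside the data range:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_logratio(x, mu):
    """-2 log empirical likelihood ratio for the mean (scalar case); requires
    min(x) < mu < max(x) so that all implied weights 1 + lam*z_i stay positive."""
    z = x - mu
    eps = 1e-10
    lam = brentq(lambda l: np.mean(z / (1.0 + l * z)),
                 (-1.0 + eps) / z.max(), (-1.0 + eps) / z.min())
    return 2.0 * np.sum(np.log1p(lam * z))

x = np.random.default_rng(4).normal(1.0, 2.0, size=100)
# mu = 1 should be covered by the 95% EL region about 95% of the time.
print(el_logratio(x, 1.0) <= chi2.ppf(0.95, 1))
```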

8.
Consider a nonparametric nonseparable regression model Y = φ(Z, U), where φ(Z, U) is strictly increasing in U and U ~ U[0, 1]. We suppose that there exists an instrument W that is independent of U. The observable random variables are Y, Z, and W, all one-dimensional. We construct test statistics for the hypothesis that Z is exogenous, that is, that U is independent of Z. The test statistics are based on the observation that Z is exogenous if and only if V = F_{Y|Z}(Y|Z) is independent of W, and hence they do not require estimation of the function φ. The asymptotic properties of the proposed tests are proved, and a bootstrap approximation of the critical values of the tests is shown to be consistent and to work in finite samples via simulations. An empirical example using the U.K. Family Expenditure Survey is also given. As a byproduct of our results we obtain the asymptotic properties of a kernel estimator of the distribution of V, which equals U when Z is exogenous. We show that this estimator converges to the uniform distribution at a faster rate than the parametric n^{-1/2} rate.
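A crude proxy for the idea, not the paper's test statistics: estimate V_i = F̂_{Y|Z}(Y_i | Z_i) with a kernel-weighted empirical CDF and run a permutation test on the correlation between V and W. Correlation only detects linear dependence, and the bandwidth h = 0.3 assumes a standardized Z; all names here are hypothetical:

```python
import numpy as np

def cond_cdf(y, z, y0, z0, h):
    """Kernel-weighted empirical estimate of F_{Y|Z}(y0 | z0)."""
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    return np.sum(w * (y <= y0)) / np.sum(w)

def exogeneity_pvalue(y, z, w, h=0.3, n_perm=500, seed=0):
    """Under exogeneity, V_i = F_hat(Y_i | Z_i) should be independent of W_i;
    permutation p-value for |corr(V, W)| as a simple (linear-only) check."""
    rng = np.random.default_rng(seed)
    v = np.array([cond_cdf(y, z, y[i], z[i], h) for i in range(len(y))])
    stat = abs(np.corrcoef(v, w)[0, 1])
    perm = [abs(np.corrcoef(v, rng.permutation(w))[0, 1]) for _ in range(n_perm)]
    return float(np.mean([p >= stat for p in perm]))
```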

9.
We investigate an empirical likelihood inference approach under a general class of semiparametric hazards regression models with survival data subject to right censoring. An empirical likelihood ratio for the full 2p regression parameters involved in the model is obtained. We show that it converges weakly to a random variable that can be written as a weighted sum of 2p independent chi-squared variables with one degree of freedom. Using this result, we can construct a confidence region for the parameters. We also suggest an adjusted version of the preceding statistic, whose limit follows a standard chi-squared distribution with 2p degrees of freedom.

10.
Periodic functions have many applications in astronomy. They can be used to model the light intensity of periodic variable stars, whose brightness varies with time. Because astronomical data are commonly observed at irregularly spaced time points, the periodogram is highlighted as a good tool for estimating the period. Our bootstrap inference about the period is based on maximizing the periodogram and consists of constructing two-sided percentile bootstrap confidence intervals for the true period. We also obtain their coverage levels theoretically, and discuss the benefit of double-bootstrap confidence intervals, which substantially improve the coverage levels. Precisely, we show that the coverage error of single-bootstrap confidence intervals is of order n^{-1}, decreasing to order n^{-2} when double-bootstrap methods are applied. The simulation study given here is a numerical assessment of the theoretical work.
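A minimal sketch of the point estimator and a single-bootstrap percentile interval, using the Lomb-Scargle periodogram for irregular sampling times and a pairs bootstrap; the resampling scheme and all tuning values are illustrative assumptions, not necessarily the paper's:

```python
import numpy as np
from scipy.signal import lombscargle

def estimate_period(t, y, periods):
    """Period estimate: maximizer of the Lomb-Scargle periodogram over a grid
    of candidate periods; handles irregularly spaced observation times t."""
    power = lombscargle(t, y - y.mean(), 2.0 * np.pi / periods)
    return periods[np.argmax(power)]

def bootstrap_period_ci(t, y, periods, B=200, level=0.95, seed=0):
    """Two-sided percentile bootstrap interval for the period (single bootstrap;
    iterating the bootstrap is what reduces coverage error from O(1/n) to O(1/n^2))."""
    rng = np.random.default_rng(seed)
    n, est = len(t), np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)             # resample (t_i, y_i) pairs
        est[b] = estimate_period(t[idx], y[idx], periods)
    lo, hi = np.quantile(est, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0.0, 50.0, 300))        # irregular observation times
y = np.sin(2 * np.pi * t / 3.7) + rng.normal(0.0, 0.3, t.size)
grid = np.linspace(2.0, 6.0, 2000)
print(estimate_period(t, y, grid), bootstrap_period_ci(t, y, grid))
```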

11.
This paper considers a linear regression model with regression parameter vector β. The parameter of interest is θ = a^Tβ, where a is specified. When, as a first step, a data-based variable selection procedure (e.g. minimum Akaike information criterion) is used to select a model, it is common statistical practice to then carry out inference about θ, using the same data, based on the (false) assumption that the selected model had been specified a priori. The paper considers a confidence interval for θ with nominal coverage 1 − α constructed on this (false) assumption, and calls this the naive 1 − α confidence interval. The minimum coverage probability of this confidence interval can be calculated for simple variable selection procedures involving only a single variable. However, the kinds of variable selection procedures used in practice are typically much more complicated. For the real-life data presented in this paper, there are 20 variables, each of which is to be either included or not, leading to 2^20 different models. The coverage probability at any given value of the parameters provides an upper bound on the minimum coverage probability of the naive confidence interval. This paper derives a new Monte Carlo simulation estimator of the coverage probability, which uses conditioning for variance reduction. For these real-life data, the gain in efficiency of this Monte Carlo simulation due to conditioning ranged from 2 to 6. The paper also presents a simple one-dimensional search strategy for parameter values at which the coverage probability is relatively small. For these real-life data, this search leads to parameter values for which the coverage probability of the naive 0.95 confidence interval is 0.79 for variable selection using the Akaike information criterion and 0.70 for variable selection using the Bayesian information criterion, showing that these confidence intervals are completely inadequate.
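The phenomenon is easy to reproduce in a toy two-regressor version of the problem; the sketch below uses plain Monte Carlo (without the conditioning-based variance reduction the paper develops), and all parameter values are illustrative:

```python
import numpy as np
from scipy.stats import t as tdist

def ols_ci(y, X, j, level=0.95):
    """OLS estimate and naive CI for coefficient j, treating X as chosen a priori."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    se = np.sqrt(resid @ resid / (n - p) * np.linalg.inv(X.T @ X)[j, j])
    q = tdist.ppf((1 + level) / 2, n - p)
    return beta[j] - q * se, beta[j] + q * se

def aic(y, X):
    n, p = X.shape
    rss = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    return n * np.log(rss / n) + 2 * p

def naive_coverage(beta2, n=50, n_sim=2000, seed=8):
    """Coverage of the naive 95% CI for beta1 when x2 enters the model only if
    AIC prefers it; selection and inference reuse the same data."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x1, x2 = rng.normal(size=(2, n))
        y = x1 + beta2 * x2 + rng.normal(size=n)
        X_small = np.column_stack([np.ones(n), x1])
        X_full = np.column_stack([np.ones(n), x1, x2])
        X = X_full if aic(y, X_full) < aic(y, X_small) else X_small
        lo, hi = ols_ci(y, X, j=1)
        hits += (lo <= 1.0 <= hi)                # true beta1 = 1
    return hits / n_sim

print(naive_coverage(beta2=0.3))   # typically noticeably below the nominal 0.95
```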

12.
This article investigates the asymptotic properties of a simple empirical-likelihood-based inference method for discontinuity in density. The parameter of interest is a function of two one-sided limits of the probability density function at (possibly) two cut-off points. Our approach is based on the first-order conditions from a minimum contrast problem. We investigate both first-order and second-order properties of the proposed method. We characterize the leading coverage error of our inference method and propose a coverage-error-optimal (CE-optimal, hereafter) bandwidth selector. We show that the empirical likelihood ratio statistic is Bartlett correctable. An important special case is the manipulation testing problem in a regression discontinuity design (RDD), where the parameter of interest is the density difference at a known threshold. In RDD, continuity of the density of the assignment variable at the threshold is taken as a “no-manipulation” behavioral assumption, which is a testable implication of an identifying condition for the local average treatment effect. When specialized to the manipulation testing problem, the CE-optimal bandwidth selector has an explicit form. We propose a data-driven CE-optimal bandwidth selector for use in practice. Results from Monte Carlo simulations are presented, and the usefulness of our method is illustrated with an empirical example.
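For intuition, a crude density-jump diagnostic at the threshold can be formed from one-sided counts; the paper's empirical likelihood statistic and CE-optimal bandwidth are more refined, and everything below (bandwidth, data-generating process) is illustrative:

```python
import numpy as np

def density_jump(x, cutoff, h):
    """Crude one-sided density estimates at the cutoff from counts in
    [cutoff-h, cutoff) and [cutoff, cutoff+h); a nonzero difference suggests
    manipulation of the running variable."""
    n = len(x)
    f_left = np.sum((x >= cutoff - h) & (x < cutoff)) / (n * h)
    f_right = np.sum((x >= cutoff) & (x < cutoff + h)) / (n * h)
    return f_right - f_left

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, 10_000)
x = x[(x < 0) | (rng.uniform(size=x.size) < 0.7)]   # thin out mass just right of 0
print(f"estimated density jump at 0: {density_jump(x, 0.0, 0.1):+.3f}")
```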

13.
Let U and V be two symmetric (about zero) random variables with U + V symmetric about C; here C is a constant. It is easy to see that if U and V are mutually independent, or if both U and V satisfy the weak law of large numbers, then C = 0. So, intuitively, we would suspect that C = 0 in general. However, we show that there exist two random variables U and V, each symmetric about 0, with U + V symmetric about some C ≠ 0. The example given is closely related to one given by Alejandro D. De Acosta in another context.

14.
We consider Z±n = sup_{0 &lt; t ≤ 1/2} U±n(t)/(t(1 − t))^{1/2}, where + and − denote the positive and negative parts, respectively, of the sample paths of the empirical process Un. U+n and U−n are seen to behave rather differently, which is tied to the asymmetry of the binomial distribution, or to the asymmetry of the distribution of small order statistics. Csáki (1975) showed that log Z±n/log₂n is the appropriate normalization for a law of the iterated logarithm (LIL) for Z±n; we show that Z−n/(2 log₂n)^{1/2} is the appropriate normalization for Z−n. Csörgő &amp; Révész (1975) posed the question: if we replace the sup over (0, 1/2] above by the sup over [a_n, 1 − a_n], where a_n → 0, how fast can a_n → 0 and still have |Zn|/(2 log₂n)^{1/2} maintain a finite lim sup a.s.? This question is answered herein. The techniques developed are then used in Section 4 to give an interesting new proof of the upper-class half of a result of Chung (1949) for |Un(t)|. The proofs draw heavily on James (1975); two basic inequalities of that paper are strengthened to their potential, and are felt to be of independent interest.

15.
We describe inferactive data analysis, so named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis, and it also allows for Bayesian data analysis. We see this as a useful step toward providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) selective inference, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG-DAG (a Data Analysis Generative DAG), and a selective change-of-variables formula is crucial to any practical implementation of inferactive data analysis via sampling from these distributions. We discuss a canonical example of an incomplete cross-validation test statistic to discriminate between black-box models, and a real HIV dataset example to illustrate inference after making multiple queries on the data.

16.
In a high-dimensional multiple testing framework, we present new confidence bounds on the number of false positives contained in subsets S of selected null hypotheses. These bounds are post hoc in the sense that the coverage probability holds simultaneously over all S, possibly chosen in a data-dependent way. This article focuses on the common case of structured null hypotheses, for example along a tree, a hierarchy, or geometrically (spatially or temporally). Following recent advances in post hoc inference, we build confidence bounds for some prespecified forest-structured subsets and deduce a bound for any subset S by interpolation. The proposed bounds are shown to substantially improve on previous ones when the signal is locally structured. Our findings are supported both by theoretical results and by numerical experiments. Moreover, our bounds can be obtained by an algorithm (with complexity bilinear in the sizes of the reference hierarchy and of the selected subset) that is implemented in the open-source R package sansSouci, available from https://github.com/pneuvial/sanssouci, making our approach operational.

17.
We develop a hierarchical Gaussian process model for forecasting and inference of functional time series data. Unlike existing methods, our approach is especially suited for sparsely or irregularly sampled curves and for curves sampled with nonnegligible measurement error. The latent process is dynamically modeled as a functional autoregression (FAR) with Gaussian process innovations. We propose a fully nonparametric dynamic functional factor model for the dynamic innovation process, with broader applicability and improved computational efficiency over standard Gaussian process models. We prove finite-sample forecasting and interpolation optimality properties of the proposed model, which remain valid with the Gaussian assumption relaxed. An efficient Gibbs sampling algorithm is developed for estimation, inference, and forecasting, with extensions for FAR(p) models with model averaging over the lag p. Extensive simulations demonstrate substantial improvements in forecasting performance and recovery of the autoregressive surface over competing methods, especially under sparse designs. We apply the proposed methods to forecast nominal and real yield curves using daily U.S. data. Real yields are observed more sparsely than nominal yields, yet the proposed methods are highly competitive in both settings. Supplementary materials, including R code and the yield curve data, are available online.

18.
This paper considers inference for both spatial lattice data with possibly irregularly shaped sampling region and non-lattice data, by extending the recently proposed self-normalization (SN) approach from stationary time series to the spatial setup. A nice feature of the SN method is that it avoids the choice of tuning parameters, which are usually required for other non-parametric inference approaches. The extension is non-trivial as spatial data has no natural one-directional time ordering. The SN-based inference is convenient to implement and is shown through simulation studies to provide more accurate coverage compared with the widely used subsampling approach. We also illustrate the idea of SN using a real data example.
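The flavor of self-normalization is easiest to see in the stationary time-series case the paper starts from: the normalizer is built from recursive sample means, so no bandwidth or tuning parameter is needed. The sketch below is that one-dimensional version, not the spatial extension; its limiting distribution is nonstandard, so critical values must come from the SN literature rather than a chi-squared table:

```python
import numpy as np

def sn_stat(x, mu):
    """Self-normalized statistic for the mean of a stationary sequence: the
    normalizer uses recursive means instead of a bandwidth-dependent long-run
    variance estimator."""
    n = len(x)
    k = np.arange(1, n + 1)
    cmean = np.cumsum(x) / k                  # recursive sample means
    w = np.sum(k**2 * (cmean - cmean[-1]) ** 2) / n**2
    return n * (cmean[-1] - mu) ** 2 / w      # compare with tabulated SN critical values
```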

19.
For a continuous random variable X with support equal to (a, b) and c.d.f. F, and g: Ω1 → Ω2 a continuous, strictly increasing function such that Ω1 ∩ Ω2 ⊇ (a, b), but otherwise arbitrary, we establish that the random variables F(X) − F(g(X)) and F(g^{-1}(X)) − F(X) have the same distribution. Further developments, accompanied by illustrations and observations, address as well the equidistribution identity U − ψ(U) =d ψ^{-1}(U) − U for U ~ U(0, 1), where ψ is a continuous, strictly increasing and onto function, but otherwise arbitrary. Finally, we expand on applications with connections to variance reduction techniques, the discrepancy between distributions, and a risk identity in predictive density estimation.
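The equidistribution identity for U ~ U(0, 1) is easy to check by simulation; ψ(u) = √u below is one arbitrary admissible choice of a continuous, strictly increasing bijection of (0, 1):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(6)
u = rng.uniform(size=100_000)
psi, psi_inv = np.sqrt, np.square   # psi: (0,1) -> (0,1), strictly increasing, onto
a = u - psi(u)                      # U - psi(U)
b = psi_inv(u) - u                  # psi^{-1}(U) - U
print(ks_2samp(a, b).pvalue)        # large p-value: consistent with equality in law
```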

20.
The authors consider hidden Markov models (HMMs) whose latent process has m ≥ 2 states and whose state-dependent distributions arise from a general one-parameter family. They propose a test of the hypothesis m = 2. Their procedure is an extension to HMMs of the modified likelihood ratio statistic proposed by Chen, Chen &amp; Kalbfleisch (2004) for testing two states in a finite mixture. The authors determine the asymptotic distribution of their test under the hypothesis m = 2 and investigate its finite-sample properties in a simulation study. Their test is based on inference for the marginal mixture distribution of the HMM. In order to illustrate the additional difficulties due to the dependence structure of the HMM, they show how to test general regular hypotheses on the marginal mixture of HMMs via a quasi-modified likelihood ratio. They also discuss two applications.
