Similar Articles
20 similar articles found.
1.
Stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates in applications involving large-scale or streaming data. An alternative version, averaged implicit SGD (AI-SGD), has been shown to be more stable and more efficient. Although the asymptotic properties of AI-SGD are well established, statistical inference based on it, such as interval estimation, remains unexplored. The bootstrap method is not computationally feasible because it requires repeated resampling from the entire data set. In addition, the plug-in method is not applicable when there is no explicit covariance matrix formula. In this paper, we propose a scalable statistical inference procedure that can be used to conduct inference based on the AI-SGD estimator. The proposed procedure updates the AI-SGD estimate, as well as many randomly perturbed AI-SGD estimates, upon the arrival of each observation. We derive large-sample theoretical properties of the proposed procedure and examine its performance via simulation studies.
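A minimal sketch of the kind of online perturbation procedure the abstract describes: alongside an averaged SGD estimate, a number of randomly perturbed copies are updated as each observation arrives, and their spread yields an interval. Everything here (a one-dimensional mean model, exponential perturbation weights, the step-size schedule) is an illustrative assumption, not the authors' exact AI-SGD algorithm.

```python
import random


def online_perturbed_sgd(stream, n_perturbed=50, seed=0):
    """Online interval estimation via randomly perturbed averaged SGD copies.

    Toy 1-d mean model: theta is updated by plain SGD, and each of the
    n_perturbed copies receives the same update scaled by a random Exp(1)
    weight. The empirical quantiles of the averaged copies give an interval.
    """
    rng = random.Random(seed)
    theta = 0.0                      # running SGD iterate
    avg = 0.0                        # Polyak-Ruppert average of the iterates
    copies = [0.0] * n_perturbed
    avg_copies = [0.0] * n_perturbed
    for t, y in enumerate(stream, start=1):
        gamma = t ** -0.6            # assumed step-size schedule
        theta += gamma * (y - theta)
        avg += (theta - avg) / t
        for b in range(n_perturbed):
            w = rng.expovariate(1.0)  # random weight on this observation
            copies[b] += gamma * w * (y - copies[b])
            avg_copies[b] += (copies[b] - avg_copies[b]) / t
    avg_copies.sort()
    lo = avg_copies[max(0, int(0.025 * n_perturbed) - 1)]
    hi = avg_copies[min(n_perturbed - 1, int(0.975 * n_perturbed))]
    return avg, lo, hi


# Demo on a simulated stream with true mean 5.
_rng = random.Random(1)
est, lo, hi = online_perturbed_sgd(5 + _rng.gauss(0, 1) for _ in range(2000))
```

Note that only one pass over the data is needed: nothing is stored, which is the point of such procedures compared with resampling the full data set.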

2.
Inferential methods based on ranks provide a robust and powerful alternative methodology for testing and estimation. This article pursues two objectives. First, it develops a general method for simultaneous confidence intervals based on the rank estimates of the parameters of a general linear model and derives the asymptotic distribution of the pivotal quantity. Second, it extends the method to high-dimensional data, such as gene expression data, for which the usual large-sample approximation does not apply. It is common in practice to use the asymptotic distribution to make inference for small samples; the empirical investigation in this article shows that, for methods based on rank estimates, this approach does not produce viable inference and should be avoided. A method based on the bootstrap is outlined and shown to provide a reliable and accurate way of constructing simultaneous confidence intervals based on rank estimates. In particular, it is shown that the commonly applied normal or t-approximations are not satisfactory, particularly for large-scale inference. Methods based on ranks are uniquely suited to the analysis of microarray gene expression data, which often involves large-scale inference based on small samples that contain many outliers and violate the normality assumption. A real microarray data set is analyzed using the rank-estimate simultaneous confidence intervals, and the viability of the proposed method is assessed through a Monte Carlo simulation study under varied assumptions.
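The generic idea behind bootstrap simultaneous intervals can be sketched with a max-|t| (sup-t) construction: bootstrap all parameters jointly, take the distribution of the largest studentized deviation, and use its quantile as a common critical value. This is a standard device applied here to group means, not the paper's rank-estimate pivot, and the group-mean setting is an assumption for illustration.

```python
import math
import random
import statistics


def supt_simultaneous_cis(groups, n_boot=500, level=0.95, seed=0):
    """Simultaneous CIs for several group means via the max-|t| bootstrap.

    Resamples each group, records the largest absolute t-deviation across
    groups, and uses its `level` quantile as a shared critical value.
    """
    rng = random.Random(seed)
    ests = [statistics.fmean(g) for g in groups]
    ses = [statistics.stdev(g) / math.sqrt(len(g)) for g in groups]
    max_ts = []
    for _ in range(n_boot):
        t_vals = []
        for g, est, se in zip(groups, ests, ses):
            res = [rng.choice(g) for _ in g]
            t_vals.append(abs(statistics.fmean(res) - est) / se)
        max_ts.append(max(t_vals))
    max_ts.sort()
    c = max_ts[int(level * n_boot)]  # shared bootstrap critical value
    return [(e - c * s, e + c * s) for e, s in zip(ests, ses)]
```

Because one critical value covers every parameter at once, the family-wise coverage is controlled, which is what "simultaneous" buys over per-parameter intervals.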

3.
Comparison of accuracy between two diagnostic tests can be implemented by investigating the difference in paired Youden indices. However, little of the literature has discussed inference for the difference in paired Youden indices. In this paper, we propose an exact confidence interval for the difference in paired Youden indices based on generalized pivotal quantities. For comparison, a maximum likelihood estimate-based interval and a bootstrap-based interval are also included in the study. Extensive simulation studies are conducted to compare the relative performance of these intervals by evaluating coverage probability and average interval length. Our simulation results demonstrate that the exact confidence interval outperforms the other two intervals, even with small sample sizes, when the underlying distributions are normal. A real application is also used to illustrate the proposed intervals. Copyright © 2012 John Wiley & Sons, Ltd.
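For readers unfamiliar with the quantity being compared: the Youden index is J = max over cutoffs of (sensitivity + specificity − 1). A plain empirical estimate can be sketched as below; this is just the index itself, not the paper's generalized-pivotal interval construction, and the "positive if score ≥ cutoff" rule is an assumed convention.

```python
def youden_index(diseased, healthy):
    """Empirical Youden index J = max_c {sens(c) + spec(c) - 1}.

    A subject is classified positive when its score is >= cutoff c;
    candidate cutoffs are the observed score values.
    """
    best_j, best_c = -1.0, None
    for c in sorted(set(diseased) | set(healthy)):
        sens = sum(x >= c for x in diseased) / len(diseased)
        spec = sum(x < c for x in healthy) / len(healthy)
        j = sens + spec - 1
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c
```

The paired-difference problem in the abstract then concerns J₁ − J₂ for two tests evaluated on the same subjects.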

4.
Statistical Inference for the Income Gini Coefficient
Statistical inference for the Gini coefficient estimator is a central topic in research on the Gini coefficient. In this paper we apply the approximate large-sample asymptotic distribution method proposed by Davidson (2009) to conduct statistical inference for the income Gini coefficient estimator, including computing the estimator's standard error, constructing confidence intervals, and performing hypothesis tests. Simulation experiments verify that inference based on this method remains highly reliable in small samples. On this basis, we carry out statistical inference for the actual income Gini coefficient of urban residents in China.
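A sketch of the point estimate and a numerical standard error. The Gini coefficient is computed from the order-statistics formula G = 2·Σᵢ i·x₍ᵢ₎ / (n·Σx) − (n+1)/n; for the standard error, a leave-one-out jackknife is used here as a simple numerical stand-in for Davidson's (2009) analytic variance formula, which the paper actually employs.

```python
import math


def gini(x):
    """Gini coefficient via G = 2*sum(i * x_(i)) / (n * sum(x)) - (n+1)/n."""
    xs = sorted(x)
    n = len(xs)
    total = sum(xs)
    return 2 * sum(i * v for i, v in enumerate(xs, start=1)) / (n * total) - (n + 1) / n


def gini_jackknife_se(x):
    """Leave-one-out jackknife standard error of the Gini estimate.

    A numerical stand-in (assumption) for the analytic asymptotic variance.
    """
    n = len(x)
    reps = [gini(x[:i] + x[i + 1:]) for i in range(n)]
    mean_rep = sum(reps) / n
    return math.sqrt((n - 1) / n * sum((r - mean_rep) ** 2 for r in reps))
```

Sanity checks: perfectly equal incomes give G = 0, and one person holding everything in a sample of n gives G = (n−1)/n.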

5.
Recently, Zhang [Simultaneous confidence intervals for several inverse Gaussian populations. Stat Probab Lett. 2014;92:125–131] proposed simultaneous pairwise confidence intervals (SPCIs) based on the fiducial generalized pivotal quantity concept to make inferences about the inverse Gaussian means under heteroscedasticity. In this paper, we propose three new methods for constructing SPCIs to make inferences on the means of several inverse Gaussian distributions when scale parameters and sample sizes are unequal. One of the methods results in a set of classic SPCIs (in the sense that it is not simulation-based inference) and the two others are based on a parametric bootstrap approach. The advantages of our proposed methods over Zhang’s (2014) method are: (i) the simulation results show that the coverage probability of the proposed parametric bootstrap approaches is fairly close to the nominal confidence coefficient while the coverage probability of Zhang’s method is smaller than the nominal confidence coefficient when the number of groups and the variance of groups are large and (ii) the proposed set of classic SPCIs is conservative in contrast to Zhang’s method.

6.
叶光 《统计研究》2011,28(3):99-106
For the fully modified ordinary least squares (FMOLS) estimation method, this paper presents a bootstrap inference procedure for the cointegration parameters and proves that, under the null hypothesis, the bootstrap statistic has the same asymptotic distribution as the test statistic. The study of test power shows that although the restricted bootstrap has good empirical size, the distribution of its bootstrap statistic is indeterminate when the null hypothesis fails, so its empirical distribution cannot serve as a valid estimate of the exact distribution of the test statistic. For practical applications we recommend the unrestricted bootstrap, because its bootstrap statistic has the same asymptotic distribution as the null-hypothesis test statistic regardless of whether the observed data satisfy the null. Finally, Monte Carlo simulation is used to compare the finite-sample performance of bootstrap and asymptotic inference.

7.
Estimating standard errors for diagnostic accuracy measures can be challenging for complicated models. Bootstrap methods address this problem by replacing intractable analytic derivations with resampled empirical distributions. We consider two cases where bootstrap methods successfully improve our knowledge of the sampling variability of diagnostic accuracy estimators. The first application is inference for the area under the ROC curve (AUC) resulting from a functional logistic regression model, a sophisticated modelling device for describing the relationship between a dichotomous response and multiple covariates. We use this regression method to model the predictive effects of multiple independent variables on the occurrence of a disease, and develop accuracy measures such as the AUC from the functional regression. Asymptotic results for the empirical estimators are provided to facilitate inference. The second application is testing the difference of two weighted areas under the ROC curve (WAUC) from a paired two-sample study. The correlation between the two WAUCs complicates the asymptotic distribution of the test statistic, so we employ bootstrap methods to obtain satisfactory inference results. Simulations and examples are supplied to confirm the merits of the bootstrap methods.
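The basic machinery can be sketched for the simplest case: the empirical AUC is the Mann-Whitney proportion of correctly ordered case-control pairs, and a percentile bootstrap (resampling cases and controls separately) yields an interval. This is a generic sketch, not the paper's functional-regression setting; the stratified resampling scheme is an assumed design choice.

```python
import random


def auc(pos, neg):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_bootstrap_ci(pos, neg, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC, resampling the two groups
    separately to respect the case-control design."""
    rng = random.Random(seed)
    stats = sorted(
        auc([rng.choice(pos) for _ in pos], [rng.choice(neg) for _ in neg])
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    return stats[int(alpha * n_boot)], stats[int((1 - alpha) * n_boot) - 1]
```

For the paired WAUC comparison in the abstract, the same resampling would be applied to subjects (keeping both tests' scores together) so the correlation is preserved in the bootstrap distribution.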

8.
The linear regression model is commonly used in applications. One of the assumptions made is that the error variances are constant across all observations. This assumption, known as homoskedasticity, is frequently violated in practice. A commonly used strategy is to estimate the regression parameters by ordinary least squares and to compute standard errors that deliver asymptotically valid inference under both homoskedasticity and heteroskedasticity of an unknown form. Several consistent standard errors have been proposed in the literature, and evaluated in numerical experiments based on their point estimation performance and on the finite sample behaviour of associated hypothesis tests. We build upon the existing literature by constructing heteroskedasticity-consistent interval estimators and numerically evaluating their finite sample performance. Different bootstrap interval estimators are also considered. The numerical results favour the HC4 interval estimator.
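The HC4 estimator discounts each squared residual by the observation's leverage, using exponent δᵢ = min(4, n·hᵢ/p). A sketch for simple regression (intercept plus one regressor), where the slope is a linear combination of the yᵢ and the sandwich variance has a closed form; the critical value is passed in rather than looked up from a t table, which is a simplification.

```python
import math


def ols_hc4_slope_ci(x, y, t_crit=1.96):
    """Slope estimate with an HC4 heteroskedasticity-consistent standard
    error for simple linear regression, plus the resulting interval."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    intercept = sum(y) / n - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    p = 2  # number of regression parameters
    var = 0.0
    for xi, ei in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx    # leverage of observation i
        delta = min(4.0, n * h / p)           # HC4 discount exponent
        ci = (xi - xbar) / sxx                # weight of y_i in the slope
        var += ci ** 2 * ei ** 2 / (1 - h) ** delta
    se = math.sqrt(var)
    return slope, (slope - t_crit * se, slope + t_crit * se)
```

The leverage-dependent exponent is what distinguishes HC4 from HC0-HC3: high-leverage points, which deflate residuals most, receive the strongest inflation.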

9.
In this article, we develop new bootstrap-based inference for noncausal autoregressions with heavy-tailed innovations. This class of models is widely used for modeling bubbles and explosive dynamics in economic and financial time series. In the noncausal, heavy-tail framework, a major drawback of asymptotic inference is that it is not feasible in practice, as the relevant limiting distributions depend crucially on the (unknown) decay rate of the tails of the innovation distribution. In addition, even in the unrealistic case where the tail behavior is known, asymptotic inference may suffer from small-sample issues. To overcome these difficulties, we propose bootstrap inference procedures using parameter estimates obtained with the null hypothesis imposed (the so-called restricted bootstrap). We discuss three different choices of bootstrap innovations: the wild bootstrap, based on Rademacher errors; the permutation bootstrap; and a combination of the two (“permutation wild bootstrap”). Crucially, implementation of these bootstraps does not require any a priori knowledge about the distribution of the innovations, such as the tail index or the convergence rates of the estimators. We establish sufficient conditions ensuring that, under the null hypothesis, the bootstrap statistics consistently estimate particular conditional distributions of the original statistics. In particular, we show that validity of the permutation bootstrap holds without any restrictions on the distribution of the innovations, while the permutation wild and the standard wild bootstraps require further assumptions such as symmetry of the innovation distribution. Extensive Monte Carlo simulations show that the finite-sample performance of the proposed bootstrap tests is exceptionally good, both in terms of size and of empirical rejection probabilities under the alternative hypothesis. We conclude by applying the proposed bootstrap inference to Bitcoin/USD exchange rates and to crude oil price data. We find that noncausal models with heavy-tailed innovations are indeed able to fit the data, including in periods of bubble dynamics. Supplementary materials for this article are available online.
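The restricted wild bootstrap idea can be illustrated in the simplest possible setting: testing H0: E[y] = 0 in a location model. The null is imposed by treating the observations themselves as residuals, and bootstrap samples multiply them by Rademacher (±1) draws; the p-value compares the observed t-statistic with the bootstrap distribution. This toy stand-in assumes a symmetric innovation distribution, exactly the condition the abstract notes the wild bootstrap requires, and is not the paper's noncausal AR procedure.

```python
import math
import random


def wild_bootstrap_mean_test(y, n_boot=499, seed=0):
    """Restricted wild bootstrap test of H0: E[y] = 0 with Rademacher
    multipliers. Returns a two-sided bootstrap p-value."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    t_obs = abs(ybar) / (s / math.sqrt(n))
    exceed = 0
    for _ in range(n_boot):
        # H0 imposed: the y_i themselves play the role of mean-zero residuals.
        ystar = [v * rng.choice((-1.0, 1.0)) for v in y]
        bbar = sum(ystar) / n
        bs = math.sqrt(sum((v - bbar) ** 2 for v in ystar) / (n - 1))
        exceed += abs(bbar) / (bs / math.sqrt(n)) >= t_obs
    return (1 + exceed) / (1 + n_boot)
```

Note that no tail index or convergence rate enters anywhere, which mirrors the abstract's point that these schemes sidestep the unknown tail behavior.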

10.
Conventional approaches for inference about efficiency in parametric stochastic frontier (PSF) models are based on percentiles of the estimated distribution of the one-sided error term, conditional on the composite error. When used as prediction intervals, coverage is poor when the signal-to-noise ratio is low, but improves slowly as sample size increases. We show that prediction intervals estimated by bagging yield much better coverages than the conventional approach, even with low signal-to-noise ratios. We also present a bootstrap method that gives confidence interval estimates for (conditional) expectations of efficiency, and which have good coverage properties that improve with sample size. In addition, researchers who estimate PSF models typically reject models, samples, or both when residuals have skewness in the “wrong” direction, i.e., in a direction that would seem to indicate absence of inefficiency. We show that correctly specified models can generate samples with “wrongly” skewed residuals, even when the variance of the inefficiency process is nonzero. Both our bagging and bootstrap methods provide useful information about inefficiency and model parameters irrespective of whether residuals have skewness in the desired direction.

11.
This article presents parametric bootstrap (PB) approaches for hypothesis testing and interval estimation for the regression coefficients and variance components of panel data regression models with complete panels. The PB pivot variables are proposed based on sufficient statistics of the parameters. We also derive generalized inferences and improved generalized inferences for the variance components. Simulation results are presented to compare the performance of the PB approaches with the generalized inferences. Our studies show that the PB approaches perform satisfactorily for various sample sizes and parameter configurations, and that their performance is mostly the same as that of the generalized inferences with respect to expected lengths and powers. The PB inferences have almost exact coverage probabilities and Type I error rates. Furthermore, the PB procedure can be carried out in a few simulation steps, and the derivation is easier to understand and to extend to incomplete panels. Finally, the proposed approaches are illustrated using a real data example.

12.
The European Union Statistics on Income and Living Conditions (EU-SILC) is the main source of information about poverty and economic inequality in the member states of the European Union. The sample sizes of its annual national surveys are sufficient for reliable estimation at the national level but not for inferences at the sub-national level, failing to respond to a rising demand from policy-makers and local authorities. We provide a comprehensive map of median income, inequality (Gini coefficient and Lorenz curve) and poverty (poverty rates) based on the equivalised household income in the countries in which the EU-SILC is conducted. We study the distribution of income of households (pro-rated to its members), not merely its median (or mean), because we regard its dispersion and frequency of lower extremes (relative poverty) as important characteristics. The estimation for the regions with small sample sizes is improved by the small-area methods. The uncertainty of complex nonlinear statistics is assessed by bootstrap. Household-level sampling weights are taken into account in both the estimates and the associated bootstrap standard errors.

13.
Survival models deal with the time until the occurrence of an event of interest. In some situations, however, the event may never occur in part of the studied population; the fraction of the population that will never experience the event is generally called the cure rate. Models that account for this (cure rate models) have been extensively studied in the literature. Hypothesis tests on the parameters of these models can be based on likelihood ratio, gradient, score, or Wald statistics. Critical values of these tests are obtained through approximations that are valid in large samples and may result in size distortion in small or moderate sample sizes. This paper therefore proposes bootstrap corrections to the four tests mentioned, as well as a bootstrap Bartlett correction for the likelihood ratio statistic, in the Weibull promotion time model. We also present an algorithm for bootstrap resampling when the data exhibit a cure fraction and right censoring (random and non-informative). Simulation studies are conducted to compare the finite-sample performance of the corrected tests, and the numerical evidence favours the corrected tests we propose. We also present an application to a real data set.

14.
Exact confidence intervals for variances rely on normal distribution assumptions. Alternatively, large-sample confidence intervals for the variance can be attained if one estimates the kurtosis of the underlying distribution. The method used to estimate the kurtosis has a direct impact on the performance of the interval and thus the quality of statistical inferences. In this paper the author considers a number of kurtosis estimators combined with large-sample theory to construct approximate confidence intervals for the variance. In addition, a nonparametric bootstrap resampling procedure is used to build bootstrap confidence intervals for the variance. Simulated coverage probabilities using different confidence interval methods are computed for a variety of sample sizes and distributions. A modification to a conventional estimator of the kurtosis, in conjunction with adjustments to the mean and variance of the asymptotic distribution of a function of the sample variance, improves the resulting coverage values for leptokurtically distributed populations.
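The generic large-sample interval the abstract builds on can be sketched directly: the asymptotic variance of s² is approximately s⁴·(κ − 1)/n, where κ is the kurtosis E[(X−μ)⁴]/σ⁴, so estimating κ from the sample yields a normal-theory-free interval. This is the textbook baseline, not the paper's modified kurtosis estimator or its mean/variance adjustments; the conventional moment estimate of κ is an assumption here.

```python
import math


def variance_ci_kurtosis(x, z=1.96):
    """Large-sample CI for the population variance using estimated kurtosis:
    se(s^2) ≈ s^2 * sqrt((kappa - 1) / n), kappa = m4 / m2^2."""
    n = len(x)
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    m4 = sum((v - xbar) ** 4 for v in x) / n
    s2 = n * m2 / (n - 1)            # unbiased sample variance
    kappa = m4 / m2 ** 2             # conventional moment estimate of kurtosis
    se = s2 * math.sqrt(max(kappa - 1, 0.0) / n)
    return s2, (s2 - z * se, s2 + z * se)
```

The κ − 1 term is why the kurtosis estimate matters so much: for heavy-tailed (leptokurtic) populations it can be far above the normal-theory value of 2, and underestimating it shrinks the interval and destroys coverage.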

15.
Regression analysis is one of the important tools in statistics for investigating the relationships among variables. When the sample size is small, however, the assumptions of regression analysis can be violated. This research focuses on using the exact bootstrap to construct confidence intervals for regression parameters in small samples. The exact bootstrap method was compared with the basic bootstrap method in a simulation study. It was found that for very small samples (n ≈ 5) under a Laplace distribution, with the independent variable treated as random, the exact bootstrap was more effective than the standard bootstrap confidence interval.
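What makes the exact bootstrap feasible at n ≈ 5 is that the full resampling distribution can be enumerated: there are only nⁿ equally likely resamples, so no Monte Carlo error remains. A sketch for the sample mean (the regression-parameter case in the paper works on the same principle, which is an extrapolation on my part):

```python
import itertools
import statistics


def exact_bootstrap_means(sample):
    """Enumerate all n^n equally likely with-replacement resamples of a tiny
    sample and return every resample mean: the exact bootstrap distribution
    of the mean, with zero Monte Carlo error. Feasible only for very small n.
    """
    return [statistics.fmean(draw)
            for draw in itertools.product(sample, repeat=len(sample))]


# Demo: n = 4 gives 4^4 = 256 resamples.
dist = exact_bootstrap_means([1.0, 2.0, 4.0, 8.0])
```

A useful check: the average of the exact bootstrap means equals the sample mean exactly, since each resample position is uniform over the sample.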

16.
We consider the problem of choosing among a class of possible estimators by selecting the estimator with the smallest bootstrap estimate of finite sample variance. This is an alternative to using cross-validation to choose an estimator adaptively. The problem of a confidence interval based on such an adaptive estimator is considered. We illustrate the ideas by applying the method to the problem of choosing the trimming proportion of an adaptive trimmed mean. It is shown that a bootstrap adaptive trimmed mean is asymptotically normal with an asymptotic variance equal to the smallest among trimmed means. The asymptotic coverage probability of a bootstrap confidence interval based on such adaptive estimators is shown to have the nominal level. The intervals based on the asymptotic normality of the estimator share the same asymptotic result, but have poor small-sample properties compared to the bootstrap intervals. A small-sample simulation demonstrates that bootstrap adaptive trimmed means adapt themselves rather well even for samples of size 10.
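The selection rule described in the abstract can be sketched directly: for each candidate trimming proportion, estimate the estimator's variance from the same set of bootstrap resamples, then report the trimmed mean whose bootstrap variance is smallest. The candidate grid and number of resamples below are modest illustrative defaults, not values from the paper.

```python
import random
import statistics


def trimmed_mean(x, alpha):
    """Symmetric alpha-trimmed mean: drop floor(alpha*n) points per tail."""
    xs = sorted(x)
    k = int(alpha * len(xs))
    return statistics.fmean(xs[k:len(xs) - k] if k > 0 else xs)


def adaptive_trimmed_mean(x, alphas=(0.0, 0.1, 0.2), n_boot=200, seed=0):
    """Pick the trimming proportion with the smallest bootstrap variance
    estimate, then report that trimmed mean."""
    rng = random.Random(seed)
    boots = [[rng.choice(x) for _ in x] for _ in range(n_boot)]
    best_alpha = min(
        alphas,
        key=lambda a: statistics.pvariance([trimmed_mean(b, a) for b in boots]),
    )
    return trimmed_mean(x, best_alpha), best_alpha
```

Using the same resamples for every candidate keeps the comparison fair and the cost at one round of resampling.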

17.
Likelihood-ratio tests (LRTs) are often used for inference on one or more logistic regression coefficients. Conventionally, for given parameters of interest, the nuisance parameters of the likelihood function are replaced by their maximum likelihood estimates; the resulting function, called the profile likelihood, is used for LRT inference. In small samples, the LRT based on the profile likelihood does not follow a χ2 distribution. Several corrections have been proposed to improve the LRT with small-sample data. Additionally, complete or quasi-complete separation is a common geometric feature of small-sample binary data. In this article, for small-sample binary data, we explicitly derive the correction factors of the LRT for models with and without separation, and propose an algorithm to construct confidence intervals. We investigate the performance of the different LRT corrections and the corresponding confidence intervals through simulations, and based on the results we propose an empirical rule of thumb on the use of these methods. Our simulation findings are also supported by real-world data.

18.
Two-phase stratified sampling has been extensively used in large epidemiologic studies as a way of reducing costs associated with assembling covariate histories and enlarging relative sample sizes of the most informative subgroups. In this article, we investigate case-cohort sampled current status data under the additive risk model assumption. We describe a class of estimating equations, each depending on a different prevalence ratio estimate. Asymptotic properties of the proposed estimators and inference based on the “m out of n” nonparametric bootstrap are investigated. A small simulation study is employed to evaluate the finite sample performance and relative efficiency of the proposed estimators.
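For readers unfamiliar with the "m out of n" bootstrap: one resamples m < n observations per replicate and rescales the spread by √(m/n) to mimic sampling error at the full sample size, which restores validity in settings where the ordinary bootstrap fails. A generic sketch follows; the default m = n^(2/3) is a common illustrative choice, not a prescription from this paper.

```python
import math
import random
import statistics


def m_out_of_n_bootstrap_se(x, estimator, m=None, n_boot=500, seed=0):
    """'m out of n' bootstrap standard error for a generic estimator.

    Resamples m < n points with replacement per replicate, then rescales
    the replicate spread by sqrt(m/n) to reflect sample-size-n variability.
    """
    rng = random.Random(seed)
    n = len(x)
    if m is None:
        m = max(2, round(n ** (2 / 3)))   # assumed default subsample size
    reps = [estimator([rng.choice(x) for _ in range(m)]) for _ in range(n_boot)]
    return math.sqrt(m / n) * statistics.stdev(reps)
```

The √(m/n) rescaling is the whole trick: the replicate statistic fluctuates at the m-sample scale, and the factor converts that to the n-sample scale.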

19.
This paper establishes the asymptotic validity for the moving block bootstrap as an approximation to the joint distribution of the sum and the maximum of a stationary sequence. An application is made to statistical inference for a positive time series where an extreme value statistic and sample mean provide the maximum likelihood estimates for the model parameters. A simulation study illustrates small sample size behavior of the bootstrap approximation.
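The moving block bootstrap itself is mechanical: resample overlapping length-l blocks of the series and concatenate them until the original length is reached, so that short-range dependence survives inside each block. A sketch that records the (sum, maximum) pair per replicate, matching the joint statistic the paper studies; the block length is an input, and choosing it well is a separate problem the sketch does not address.

```python
import random


def moving_block_bootstrap(x, block_len, n_boot=500, seed=0):
    """Moving block bootstrap for the joint (sum, max) of a stationary series.

    Resamples overlapping length-block_len blocks with replacement and
    concatenates them to length n; returns one (sum, max) pair per replicate.
    """
    rng = random.Random(seed)
    n = len(x)
    starts = range(n - block_len + 1)   # all overlapping block start indices
    pairs = []
    for _ in range(n_boot):
        series = []
        while len(series) < n:
            s = rng.choice(starts)
            series.extend(x[s:s + block_len])
        pairs.append((sum(series[:n]), max(series[:n])))
    return pairs
```

Quantiles of the resulting pairs then approximate the joint distribution of the sum and the maximum.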

20.
In this note we define a composite quantile function estimator in order to improve the accuracy of the classical bootstrap procedure in small-sample settings. The composite quantile function estimator employs a parametric model for the tails of the distribution and uses the simple linear-interpolation quantile function estimator for quantiles lying between 1/(n+1) and n/(n+1). The method is easily programmed using standard software packages and has general applicability. It is shown that the composite quantile function estimator improves bootstrap percentile interval coverage for a variety of statistics and is robust to misspecification of the parametric component. Moreover, the composite quantile function approach surprisingly outperforms the parametric bootstrap in a variety of small-sample situations.
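The splice described in the note can be sketched as follows: interpolate linearly between order statistics at plotting positions i/(n+1) for interior probabilities, and hand anything outside [1/(n+1), n/(n+1)] to a fitted parametric tail. The normal tail model below is an illustrative choice on my part; the note leaves the parametric family open.

```python
import statistics


def composite_quantile(sample, p):
    """Composite quantile function estimator: linear interpolation between
    order statistics for p in [1/(n+1), n/(n+1)], fitted normal tails
    (an assumed parametric component) outside that range."""
    xs = sorted(sample)
    n = len(xs)
    lo_p, hi_p = 1 / (n + 1), n / (n + 1)
    if lo_p <= p <= hi_p:
        pos = p * (n + 1)            # plotting position i/(n+1)
        i = int(pos)
        frac = pos - i
        if i >= n:
            return xs[-1]
        return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    tail = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    return tail.inv_cdf(p)
```

Bootstrap resampling then draws from this smoothed quantile function instead of the raw empirical distribution, which is where the small-sample coverage gain comes from.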
