Similar Literature
20 similar documents found.
1.
Quasi-random sequences are known to give efficient numerical integration rules in many Bayesian statistical problems where the posterior distribution can be transformed into periodic functions on the n-dimensional hypercube. From this idea we develop a quasi-random approach to the generation of resamples used for Monte Carlo approximations to bootstrap estimates of bias, variance and distribution functions. We demonstrate a major difference between quasi-random bootstrap resamples, which are generated by deterministic algorithms and have no true randomness, and the usual pseudo-random bootstrap resamples generated by the classical bootstrap approach. Various quasi-random approaches are considered and are shown via a simulation study to result in approximants that are competitive in terms of efficiency when compared with other bootstrap Monte Carlo procedures such as balanced and antithetic resampling.
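As a rough illustration of the idea (not the specific constructions studied in the paper), the following minimal Python sketch generates bootstrap resample indices from a scrambled Sobol sequence via scipy.stats.qmc and compares the resulting Monte Carlo variance estimate with ordinary pseudo-random resampling; the toy data, the floor mapping from hypercube coordinates to indices, and the choice of 256 resamples are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
x = rng.lognormal(size=50)          # toy data set
n, B = len(x), 256                  # sample size, number of resamples (power of 2 for Sobol)

def boot_variance_of_mean(index_matrix, data):
    """Monte Carlo bootstrap estimate of Var(mean) from a B x n matrix of resample indices."""
    means = data[index_matrix].mean(axis=1)
    return means.var(ddof=1)

# Pseudo-random resamples: each row is n uniform draws of indices 0..n-1.
pseudo_idx = rng.integers(0, n, size=(B, n))

# Quasi-random resamples: one n-dimensional Sobol point per resample,
# each coordinate mapped to an observation index (one simple mapping; others exist).
sobol = qmc.Sobol(d=n, scramble=True, seed=0)
u = sobol.random(B)                          # B points in [0, 1)^n
quasi_idx = np.floor(u * n).astype(int)

print("pseudo-random bootstrap Var(mean):", boot_variance_of_mean(pseudo_idx, x))
print("quasi-random bootstrap  Var(mean):", boot_variance_of_mean(quasi_idx, x))
# The ideal bootstrap variance of the mean has a closed form here, for reference.
print("exact bootstrap Var(mean):        ", x.var(ddof=0) / n)
```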

2.
Importance resampling is an approach that uses exponential tilting to reduce the resampling necessary for the construction of nonparametric bootstrap confidence intervals. The properties of bootstrap importance confidence intervals are well established when the data is a smooth function of means and when there is no censoring. However, in the framework of survival or time-to-event data, the asymptotic properties of importance resampling have not been rigorously studied, mainly because of the unduly complicated theory incurred when data is censored. This paper uses extensive simulation to show that, for parameter estimates arising from fitting Cox proportional hazards models, importance bootstrap confidence intervals can be constructed if the importance resampling probabilities of the records for the n individuals in the study are determined by the empirical influence function for the parameter of interest. Our results show that, compared to uniform resampling, importance resampling improves the relative mean-squared-error (MSE) efficiency by a factor of nine (for n = 200). The efficiency increases significantly with sample size, is mildly associated with the amount of censoring, but decreases slightly as the number of bootstrap resamples increases. The extra CPU time requirement for calculating importance resamples is negligible when compared to the large improvement in MSE efficiency. The method is illustrated through an application to data on chronic lymphocytic leukemia, which highlights that the bootstrap confidence interval is the preferred alternative to large sample inferences when the distribution of a specific covariate deviates from normality. Our results imply that, because of its computational efficiency, importance resampling is recommended whenever bootstrap methodology is implemented in a survival framework. Its use is particularly important when complex covariates are involved or the survival problem to be solved is part of a larger problem; for instance, when determining confidence bounds for models linking survival time with clusters identified in gene expression microarray data.
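The Cox-model machinery is beyond a short example, but the underlying importance-resampling idea can be sketched for the sample mean: tilt the resampling probabilities exponentially in the empirical influence values and reweight each replicate by the likelihood ratio against uniform resampling. The tilt parameter, toy data, and target tail point below are hand-picked illustrations, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, size=100)      # toy data
n = len(x)
theta_hat = x.mean()
t = theta_hat - 1.5 * x.std(ddof=1) / np.sqrt(n)   # a left-tail point of interest

# Empirical influence values of the mean: l_i = x_i - mean(x).
l = x - theta_hat

# Exponential tilting: p_i proportional to exp(lambda * l_i); a negative lambda
# pushes resamples toward the lower tail, where uniform resampling rarely lands.
# (The tilt strength is hand-picked here purely for illustration.)
lam = -0.05 / x.std(ddof=1)
p = np.exp(lam * l)
p /= p.sum()

B = 2000
estimates = np.empty(B)
for b in range(B):
    idx = rng.choice(n, size=n, replace=True, p=p)
    counts = np.bincount(idx, minlength=n)
    # Likelihood ratio of this resample under uniform vs tilted probabilities.
    w = np.prod((1.0 / (n * p)) ** counts)
    estimates[b] = w * (x[idx].mean() <= t)

print("importance estimate of P*(mean* <= t):", estimates.mean())

# Reference: plain uniform resampling with many more replicates.
plain = np.array([x[rng.integers(0, n, n)].mean() <= t for _ in range(20000)])
print("uniform-resampling estimate:          ", plain.mean())
```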

3.
It is widely believed that the number of resamples required for bootstrap variance estimation is relatively small. An argument based on the unconditional coefficient of variation of the Monte Carlo approximation suggests that as few as 25 resamples will give reasonable results. In this article we argue that the number of resamples should, in fact, be determined by the conditional coefficient of variation, involving only resampling variability. Our conditional analysis is founded on a belief that Monte Carlo error should not be allowed to determine the conclusions of a statistical analysis and indicates that approximately 800 resamples are required for this purpose. The argument can be generalized to the multivariate setting, and a simple formula is given for determining a lower bound on the number of resamples required to approximate an m-dimensional bootstrap variance-covariance matrix.
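The 25-versus-800 contrast can be probed numerically: hold one data set fixed, repeat the B-resample Monte Carlo approximation of the bootstrap variance many times, and compute the conditional (resampling-only) coefficient of variation, which shrinks roughly like 1/sqrt(B). The sketch below is a generic check of this behaviour with toy data, not a reproduction of the paper's argument or its multivariate bound.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(60)   # one fixed data set: all variability below is Monte Carlo only
n = len(x)

def boot_var(B):
    """Monte Carlo approximation, from B resamples, of the bootstrap variance of the mean."""
    means = np.array([x[rng.integers(0, n, n)].mean() for _ in range(B)])
    return means.var(ddof=1)

for B in (25, 100, 400, 800):
    reps = np.array([boot_var(B) for _ in range(500)])   # repeat the MC approximation
    cv = reps.std(ddof=1) / reps.mean()
    print(f"B = {B:4d}   conditional CV of the variance estimate = {cv:.3f}")
```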

4.
The study of the effect of a treatment may involve the evaluation of a variable at a number of moments. When assuming a smooth curve for the mean response along time, estimation can be carried out by spline regression, in the context of generalized additive models. The novelty of our work lies in the construction of hypothesis tests to compare two treatment curves in any interval of time for several types of response variables. The within-subject correlation is not modeled but is accounted for through the bootstrap in order to obtain valid inferences. We propose both semiparametric and nonparametric bootstrap approaches, based on resampling vectors of residuals or responses, respectively. Simulation studies revealed a good performance of the tests, considering, for the outcome, different distribution functions in the exponential family and varying the correlation between observations along time. We show that the sizes of the bootstrap tests are close to the nominal value, with tests based on a standardized statistic having slightly better size properties. The power increases as the distance between curves increases and decreases when correlation gets higher. The usefulness of these statistical tools was confirmed using real data, allowing changes in fish behavior to be detected when the fish were exposed to the toxin microcystin-RR.
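A heavily simplified sketch of such a bootstrap test: fit a common curve under the null, resample residuals, and recompute an integrated-squared-difference statistic. A cubic polynomial stands in for the paper's penalized splines, residuals are resampled i.i.d. (the paper resamples whole within-subject residual vectors or responses), and the toy data are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: two treatment groups observed on a common time grid.
t = np.linspace(0, 1, 40)
y1 = 1.0 + 0.5 * t + rng.normal(scale=0.3, size=t.size)
y2 = 1.0 + 0.5 * t + 0.4 * np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

def fit(tt, yy, deg=3):
    """Low-order polynomial fit used here as a stand-in for a penalized spline."""
    return np.polyval(np.polyfit(tt, yy, deg), tt)

def stat(ya, yb):
    """Integrated squared difference between the two fitted group curves."""
    return np.mean((fit(t, ya) - fit(t, yb)) ** 2)

obs = stat(y1, y2)

# Residual bootstrap under the null of a single common curve.
pooled = fit(np.concatenate([t, t]), np.concatenate([y1, y2]))
resid = np.concatenate([y1, y2]) - pooled
B, count = 999, 0
for _ in range(B):
    e = rng.choice(resid, size=resid.size, replace=True)
    y1b = pooled[: t.size] + e[: t.size]
    y2b = pooled[t.size :] + e[t.size :]
    count += stat(y1b, y2b) >= obs
print("bootstrap p-value:", (count + 1) / (B + 1))
```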

5.
Traditional resampling methods for estimating sampling distributions sometimes fail, and alternative approaches are then needed. For example, if the classical central limit theorem does not hold and the naïve bootstrap fails, the m/n bootstrap, based on smaller-sized resamples, may be used as an alternative. The sufficient bootstrap, which uses only the distinct observations in a bootstrap sample, is another recently proposed approach, suggested as a way to reduce the computational burden associated with bootstrapping. It works as long as the naïve bootstrap does; however, if the naïve bootstrap fails, so will the sufficient bootstrap. In this paper, we propose combining the sufficient bootstrap with the m/n bootstrap in order both to regain consistent estimation of sampling distributions and to reduce the computational burden of the bootstrap. We obtain necessary and sufficient conditions for asymptotic normality of the proposed method, and propose new values for the resample size m. We compare the proposed method with the naïve bootstrap, the sufficient bootstrap, and the m/n bootstrap by simulation.
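A minimal sketch of the combined scheme as described: draw m < n indices with replacement and keep only the distinct observations when computing the statistic. The heavy-tailed toy data, the statistic (the median), and the choice m = n^0.7 are illustrative assumptions, not the resample sizes proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_cauchy(size=500)   # heavy tails: a setting where the naive bootstrap can struggle
n = len(x)
m = int(n ** 0.7)                   # illustrative m = n^gamma with gamma < 1

def m_out_of_n_sufficient_resample(data, m, rng):
    """Draw m indices with replacement, then keep only the distinct observations."""
    idx = rng.integers(0, len(data), size=m)
    return data[np.unique(idx)]

B = 1000
medians = np.array([np.median(m_out_of_n_sufficient_resample(x, m, rng)) for _ in range(B)])

# Spread of the resampled medians; turning this into a confidence interval
# requires the rescaling arguments of the m/n bootstrap literature.
print("m =", m, " resampling std of the median:", medians.std(ddof=1))
```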

6.
For boundary problems present in wavelet regression, two common methods are usually considered: polynomial wavelet regression (PWR) and hybrid local polynomial wavelet regression (LPWR). The normality assumption plays a key role in choosing the order of the low-order polynomial, the wavelet threshold value and other quantities involved in LPWR. However, in practice, the normality assumption may not be valid. In this paper, for PWR, we propose three automatic robust methods based on an MM-estimator, the bootstrap and a robust thresholding procedure. For LPWR, the use of a robust local polynomial (RLP) estimator with a robust thresholding procedure has been investigated. The proposed methods do not require any knowledge of the noise distribution, are easy to implement and achieve high performance when only a small amount of data is in hand. A simulation study is conducted to assess the numerical performance of the proposed methods.

7.
Summary: One specific problem statistical offices and research institutes are faced with when releasing microdata is the preservation of confidentiality. Traditional methods to avoid disclosure often destroy the structure of the data, and information loss is potentially high. In this paper an alternative technique of creating scientific-use files is discussed, which reproduces the characteristics of the original data quite well. It is based on Fienberg (1997, 1994), who estimates and resamples from the empirical multivariate cumulative distribution function of the data in order to get synthetic data. The procedure creates data sets (the resamples) which have the same characteristics as the original survey data. The paper includes some applications of this method with (a) simulated data and (b) innovation survey data, the Mannheim Innovation Panel (MIP), and a comparison between resampling and a common method of disclosure control (disturbance with multiplicative error) with regard to confidentiality on the one hand and the appropriateness of the disturbed data for different kinds of analyses on the other. The results show that univariate distributions can be better reproduced by unweighted resampling. Parameter estimates can be reproduced quite well if the resampling procedure implements the correlation structure of the original data as a scale or if the data is multiplicatively perturbed and a correction term is used. On average, anonymization of data with multiplicatively perturbed values protects better against re-identification than the various resampling methods used.
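The simplest unweighted-resampling variant (drawing whole records with replacement) can be sketched in a few lines; it reproduces marginal and correlation structure almost by construction, which is exactly why its confidentiality protection has to be assessed separately, as the paper does. The toy two-variable "survey" below is invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy "survey" with two correlated numeric variables.
n = 1000
z = rng.standard_normal((n, 2))
original = np.column_stack([np.exp(z[:, 0]), 2.0 * z[:, 0] + z[:, 1]])

# Unweighted resampling: draw whole records with replacement to form the synthetic file.
synthetic = original[rng.integers(0, n, size=n)]

print("original  means:", original.mean(axis=0), " corr:", np.corrcoef(original.T)[0, 1])
print("synthetic means:", synthetic.mean(axis=0), " corr:", np.corrcoef(synthetic.T)[0, 1])
```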

8.
Alternative methods of estimating properties of unknown distributions include the bootstrap and the smoothed bootstrap. In the standard bootstrap setting, Johns (1988) introduced an importance resampling procedure that results in a more accurate approximation to the bootstrap estimate of a distribution function or a quantile. With a suitable “exponential tilting” similar to that used by Johns, we derived a smoothed version of importance resampling in the framework of the smoothed bootstrap. Smoothed importance resampling procedures were developed for the estimation of distribution functions of the Studentized mean, the Studentized variance, and the correlation coefficient. Implementations of these procedures are presented via simulation results which concentrate on the problem of estimating the distribution functions of the Studentized mean and Studentized variance for different sample sizes and various pre-specified smoothing bandwidths for normal data; additional simulations were conducted for the estimation of quantiles of the distribution of the Studentized mean under an optimal smoothing bandwidth when the original data were simulated from three different parent populations: lognormal, t(3) and t(10). These results suggest that in cases where it is advantageous to use the smoothed bootstrap rather than the standard bootstrap, the amount of resampling necessary might be substantially reduced by the use of importance resampling methods, and that the efficiency gains depend on the bandwidth used in the kernel density estimation.

9.
A version of the nonparametric bootstrap, which resamples entire subjects from the original data, called the case bootstrap, has been increasingly used for estimating uncertainty of parameters in mixed-effects models. It is usually applied to obtain more robust estimates of the parameters and more realistic confidence intervals (CIs). Alternative bootstrap methods, such as the residual bootstrap and the parametric bootstrap that resample both random effects and residuals, have been proposed to better take into account the hierarchical structure of multi-level and longitudinal data. However, few studies have been performed to compare these different approaches. In this study, we used simulation to evaluate bootstrap methods proposed for linear mixed-effects models. We also compared the results obtained by maximum likelihood (ML) and restricted maximum likelihood (REML). Our simulation studies demonstrated the good performance of the case bootstrap as well as of the bootstraps of both random effects and residuals. On the other hand, the bootstrap methods that resample only the residuals and the bootstraps combining case and residuals performed poorly. REML and ML provided similar bootstrap estimates of uncertainty, but there was slightly more bias and a poorer coverage rate for variance parameters with ML in the sparse design. We applied the proposed methods to a real dataset from a study investigating the natural evolution of Parkinson's disease and were able to confirm that the methods provide plausible estimates of uncertainty. Given that most real-life datasets tend to exhibit heterogeneity in sampling schedules, the residual bootstraps would be expected to perform better than the case bootstrap.
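A minimal sketch of the case bootstrap for clustered data: resample whole subjects with replacement so that each resampled subject keeps all of its records. For brevity the refitted quantity is just the overall mean rather than the parameters of a fitted mixed-effects model, and the toy random-intercept data are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy longitudinal data: 30 subjects, 5 visits each, random intercept plus noise.
n_subj, n_obs = 30, 5
subj_effect = rng.normal(scale=1.0, size=n_subj)
y = subj_effect[:, None] + rng.normal(scale=0.5, size=(n_subj, n_obs))  # shape (subjects, visits)

def overall_mean(data):
    """Stand-in for the parameter of interest; a real analysis would refit the mixed model."""
    return data.mean()

B = 1000
case_boot = np.empty(B)
for b in range(B):
    subjects = rng.integers(0, n_subj, size=n_subj)    # resample whole subjects with replacement
    case_boot[b] = overall_mean(y[subjects])           # each resampled subject keeps all its records

print("estimate:", overall_mean(y))
print("case-bootstrap SE:", case_boot.std(ddof=1))
print("naive SE ignoring clustering:", y.std(ddof=1) / np.sqrt(y.size))
```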

10.
The paper explores statistical features of different resampling schemes under low resampling intensity. The original sample is considered in a very general framework of triangular arrays, without assumptions of independence or identical distribution, although improvements under such conditions are also provided. We show that low resampling schemes have very interesting and flexible properties, providing new insights into the performance of widely used resampling methods, including subsampling, two-sample unbalanced permutation statistics and the wild bootstrap. It is shown that, under regularity assumptions, resampling tests with critical values derived from the corresponding low resampling procedures are asymptotically valid and there is no loss of power compared with the power function of an ideal (but unfeasible) parametric family of tests. Moreover, we show that in several contexts, including regression models, they may act as a filter for the normal part of a limit distribution, damping the influence of outliers.

11.
We show that the linear process bootstrap (LPB) and the autoregressive sieve bootstrap (AR sieve) are, in general, not valid for statistics whose large-sample distribution depends on moments of order higher than two, irrespective of whether the data come from a linear time series or not. Inspired by the block-of-blocks bootstrap, we circumvent this non-validity by applying the LPB and AR sieve to suitably blocked data and not to the original data itself. In a simulation study, we compare the LPB, AR sieve, and moving block bootstrap applied directly and to blocked data.
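For reference, the moving block bootstrap used as a comparator in the study can be sketched in a few lines (the LPB and AR sieve applied to blocked data are not reproduced here); the AR(1) toy series and the block length of 20 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy dependent series: AR(1).
n, phi = 400, 0.6
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def moving_block_bootstrap(series, block_len, rng):
    """Resample overlapping blocks with replacement and concatenate to the original length."""
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [series[s : s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

B, block_len = 1000, 20
boot_means = np.array([moving_block_bootstrap(x, block_len, rng).mean() for _ in range(B)])

print("MBB SE of the mean:", boot_means.std(ddof=1))
print("i.i.d.-bootstrap SE (misleading under dependence):", x.std(ddof=1) / np.sqrt(n))
```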

12.
We propose a bootstrap technique for generating pseudo-samples from survival data containing censored observations. The procedure selects a survival time with replacement from the data and then assigns a covariate according to the proportional hazards model. We also develop a constrained bootstrap technique in which every pseudo-sample has the same distribution of covariate values as does the original, observed data. We use these simulation techniques to estimate the bias and variance of regression coefficients and to approximate the significance levels of goodness-of-fit statistics for testing the assumption of the proportional hazards model.

13.
Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models.

The cause of the problem is that, when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problems, we show that leave-k-out cross-validation, in which sub-samples are taken without replacement, does not suffer from the same problem.
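The replication mechanism is easy to see numerically: under with-replacement resampling some observation is almost always drawn several times, whereas a leave-k-out subsample drawn without replacement can never contain duplicates. The sample size, number of resamples, and k below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
n, B = 200, 1000

# With-replacement resampling: how often does some single observation appear
# at least four times in one resample?
max_counts = np.array([np.bincount(rng.integers(0, n, n), minlength=n).max() for _ in range(B)])
print("P(max replication >= 4) under the bootstrap:", (max_counts >= 4).mean())

# Leave-k-out subsamples are drawn without replacement, so no observation can
# ever appear more than once.
k = 20
subsample = rng.choice(n, size=n - k, replace=False)
print("max replication in a leave-k-out subsample:", np.bincount(subsample, minlength=n).max())
```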


14.
Procedures for detecting change points in sequences of correlated observations (e.g., time series) can help elucidate their complicated structure. Current literature on the detection of multiple change points emphasizes the analysis of sequences of independent random variables. We address the problem of an unknown number of variance changes in the presence of long-range dependence (e.g., long memory processes). Our results are also applicable to time series whose spectrum slowly varies across octave bands. An iterated cumulative sum of squares procedure is introduced in order to look at the multiscale stationarity of a time series; that is, the variance structure of the wavelet coefficients on a scale-by-scale basis. The discrete wavelet transform enables us to analyze a given time series on a series of physical scales. The result is a partitioning of the wavelet coefficients into locally stationary regions. Simulations are performed to validate the ability of this procedure to detect and locate multiple variance changes. A 'time' series of vertical ocean shear measurements is also analyzed, in which a variety of nonstationary features are identified.
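A single-pass centered cumulative-sum-of-squares statistic (in the spirit of Inclán and Tiao) conveys the basic ingredient; the paper iterates such a procedure scale by scale on discrete wavelet transform coefficients to handle multiple changes and long-range dependence, which this sketch does not attempt. The toy series with one variance change and the i.i.d. Gaussian critical value are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy series with one variance change at t = 300.
x = np.concatenate([rng.normal(scale=1.0, size=300), rng.normal(scale=2.0, size=200)])
n = len(x)

# Centered cumulative sum of squares: D_k = C_k / C_n - k / n.
c = np.cumsum(x ** 2)
k = np.arange(1, n + 1)
d = c / c[-1] - k / n

k_hat = int(np.argmax(np.abs(d))) + 1
test_stat = np.sqrt(n / 2.0) * np.abs(d).max()   # usual normalization for the i.i.d. Gaussian case

print("estimated change point:", k_hat)
print("normalized max |D_k|:", round(test_stat, 3),
      "(roughly 1.36 is the 5% point in the i.i.d. Gaussian case; not valid under long memory)")
```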

15.
We describe how to formulate a matching pursuit algorithm which successively approximates a periodic non-stationary time series with orthogonal projections onto elements of a suitable dictionary. We discuss how to construct such dictionaries derived from the maximal overlap (undecimated) discrete wavelet transform (MODWT). Unlike the standard discrete wavelet transform (DWT), the MODWT is equivariant under circular shifts and may be computed for a time series of arbitrary length, not necessarily a multiple of a power of 2. We point out that when using the MODWT and continuing past the level where the filters are wrapped, the norms of the dictionary elements may, depending on N, deviate from the required value of unity and require renormalization. We analyse a time series of subtidal sea levels from Crescent City, California. The matching pursuit shows in an iterative fashion how localized dictionary elements (scale and position) account for residual variation, and in particular emphasizes differences in construction for varying parts of the series.
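Generic matching pursuit is simple to state in code: repeatedly select the dictionary atom most correlated with the current residual and subtract its projection. The sketch below uses an improvised cosine-plus-spike dictionary rather than the MODWT-derived dictionary discussed in the paper, and the toy signal is invented.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 128
t = np.arange(n)

# Toy signal: two localized oscillations plus noise.
signal = (np.where((20 <= t) & (t < 52), np.sin(2 * np.pi * t / 8), 0.0)
          + np.where((80 <= t) & (t < 112), np.sin(2 * np.pi * t / 16), 0.0)
          + 0.1 * rng.standard_normal(n))

# A simple overcomplete dictionary of unit-norm atoms (cosines + unit spikes).
cosines = np.array([np.cos(2 * np.pi * f * t / n) for f in range(1, n // 2)])
atoms = np.vstack([cosines / np.linalg.norm(cosines, axis=1, keepdims=True), np.eye(n)])

def matching_pursuit(y, atoms, n_iter=20):
    """Greedy matching pursuit: repeatedly pick the atom most correlated with the residual."""
    residual, picks = y.copy(), []
    for _ in range(n_iter):
        corr = atoms @ residual
        j = int(np.argmax(np.abs(corr)))
        picks.append((j, corr[j]))
        residual = residual - corr[j] * atoms[j]
    return picks, residual

picks, residual = matching_pursuit(signal, atoms)
print("first selected atoms (index, coefficient):", [(j, round(c, 2)) for j, c in picks[:3]])
print("residual energy fraction after 20 iterations:",
      round(np.sum(residual ** 2) / np.sum(signal ** 2), 4))
```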

16.
This paper proposes a sufficient bootstrap method, which uses only the unique observations in the resamples, to assess individual bioequivalence under a 2 × 4 randomized crossover design. The finite sample performance of the proposed method is illustrated by extensive Monte Carlo simulations as well as a real-experimental data set, and the results are compared with those obtained by the traditional bootstrap technique. Our results reveal that the proposed method is competitive with, or even better than, the classical percentile bootstrap confidence limits.

17.
We consider the problem of testing for additivity and joint effects in multivariate nonparametric regression when the data are modelled as observations of an unknown response function observed on a d-dimensional (d ≥ 2) lattice and contaminated with additive Gaussian noise. We propose tests for additivity and joint effects, appropriate for both homogeneous and inhomogeneous response functions, using the particular structure of the data expanded in tensor product Fourier or wavelet bases studied recently by Amato and Antoniadis (2001) and Amato, Antoniadis and De Feis (2002). The corresponding tests are constructed by applying the adaptive Neyman truncation and wavelet thresholding procedures of Fan (1996), for testing a high-dimensional Gaussian mean, to the resulting empirical Fourier and wavelet coefficients. As a consequence, asymptotic normality of the proposed test statistics under the null hypothesis and lower bounds of the corresponding powers under a specific alternative are derived. We use several simulated examples to illustrate the performance of the proposed tests, and we make comparisons with other tests available in the literature.

18.
This paper proposes a wavelet-based approach to analyze spurious and cointegrated regressions in time series. The approach is based on the properties of the wavelet covariance and correlation in Monte Carlo studies of spurious and cointegrated regression. In the case of the spurious regression, the null hypotheses of zero wavelet covariance and correlation for these series across the scales fail to be rejected. Conversely, these null hypotheses across the scales are rejected for the cointegrated bivariate time series. These non-residual-based tests are then applied to analyze whether any relationship exists between the extraterrestrial phenomenon of sunspots and the earthly economic time series of oil prices. Conventional residual-based tests appear sensitive both to the specification of the cointegrating regression and to the lag order in the augmented Dickey–Fuller tests on the residuals. In contrast, the wavelet tests, with their bootstrap t-statistics and confidence intervals, detect the spuriousness of this relationship.
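A crude scale-by-scale wavelet correlation can be computed from decimated DWT detail coefficients, assuming the PyWavelets package (pywt) is available; this is only a stand-in for the paper's MODWT-based wavelet covariance and correlation tests and omits the bootstrap t-statistics and confidence intervals. The random-walk toy series illustrate the spurious versus genuinely related cases.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(11)
n = 1024

# Two independent random walks: the classic spurious-regression setting.
x = np.cumsum(rng.standard_normal(n))
y = np.cumsum(rng.standard_normal(n))

# A genuinely related pair: z shares the stochastic trend of x plus noise.
z = x + rng.normal(scale=0.5, size=n)

def scale_by_scale_correlation(a, b, wavelet="db4", level=5):
    """Correlate DWT detail coefficients of two series, one scale at a time."""
    ca = pywt.wavedec(a, wavelet, level=level)
    cb = pywt.wavedec(b, wavelet, level=level)
    # wavedec returns [approx, detail_level, ..., detail_1]; skip the approximation.
    return [np.corrcoef(da, db)[0, 1] for da, db in zip(ca[1:], cb[1:])]

print("independent walks:", np.round(scale_by_scale_correlation(x, y), 2))
print("related series:   ", np.round(scale_by_scale_correlation(x, z), 2))
```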

19.
Resampling methods are a common means of estimating the variance of a statistic of interest when the data are affected by nonresponse and imputation is used as compensation. Applying resampling methods usually means that subsamples are drawn from the original sample and that variance estimates are computed from the point estimates of several subsamples. However, newer resampling methods such as the rescaling bootstrap of Chipperfield and Preston [Efficient bootstrap for business surveys. Surv Methodol. 2007;33:167–172] include all elements of the original sample in the computation of their point estimators. Thus, procedures that account for imputation in resampling methods cannot be applied in the ordinary way. For such methods, modifications are necessary. This paper presents an approach for applying newer resampling methods to imputed data. The Monte Carlo simulation study conducted in the paper shows that the proposed approach leads to reliable variance estimates, in contrast to other modifications.

20.
Image processing through multiscale analysis and measurement noise modeling
We describe a range of powerful multiscale analysis methods. We also focus on the pivotal issue of measurement noise in the physical sciences. From multiscale analysis and noise modeling, we develop a comprehensive methodology for data analysis of 2D images, 1D signals (or spectra), and point pattern data. Noise modeling is based on the following: (i) multiscale transforms, including wavelet transforms; (ii) a data structure termed the multiresolution support; and (iii) multiple scale significance testing. The latter two aspects serve to characterize signal with respect to noise. The data analysis objectives we deal with include noise filtering and scale decomposition for visualization or feature detection.
