Similar Articles
14 similar articles found (search time: 15 ms)
1.
The nonparametric two-sample bootstrap is applied to computing uncertainties of measures in receiver operating characteristic (ROC) analysis on large datasets, in areas such as biometrics and speaker recognition, when the analytical method cannot be used. Its validation was studied by computing the standard errors of the area under the ROC curve using the well-established analytical Mann–Whitney statistic method and also using the bootstrap. The analytical result is unique, whereas the bootstrap results form a probability distribution owing to the stochastic nature of the method. The comparisons were carried out using relative errors and hypothesis testing, and the two sets of results matched very well. This validation provides a sound foundation for such computations.
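A minimal sketch of the comparison described above, assuming hypothetical score distributions, sample sizes, and replication counts; the AUC is computed as a normalized Mann–Whitney statistic, its standard error is obtained by the two-sample bootstrap, and the Hanley–McNeil approximation stands in for the exact analytical Mann–Whitney variance used in the paper:

    set.seed(1)
    pos <- rnorm(200, mean = 1)    # hypothetical positive-class scores
    neg <- rnorm(200, mean = 0)    # hypothetical negative-class scores

    # AUC as the normalized Mann-Whitney U statistic (ties count one half)
    auc <- function(p, n) mean(outer(p, n, ">") + 0.5 * outer(p, n, "=="))

    # Nonparametric two-sample bootstrap: resample each class independently
    boot_auc <- replicate(2000, auc(sample(pos, replace = TRUE),
                                    sample(neg, replace = TRUE)))
    sd(boot_auc)                   # bootstrap standard error of the AUC

    # Hanley-McNeil analytical approximation, for comparison
    A <- auc(pos, neg); m <- length(pos); n <- length(neg)
    Q1 <- A / (2 - A); Q2 <- 2 * A^2 / (1 + A)
    sqrt((A * (1 - A) + (m - 1) * (Q1 - A^2) + (n - 1) * (Q2 - A^2)) / (m * n))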

2.
There are several statistical hypothesis tests available for assessing normality, which is an a priori requirement for most parametric statistical procedures. The usual way to compare the performance of normality tests is to use Monte Carlo simulations to obtain point estimates of the corresponding powers. The aim of this work is to improve the assessment of 9 normality hypothesis tests. For that purpose, random samples were drawn from several symmetric and asymmetric nonnormal distributions, and Monte Carlo simulations were carried out to compute confidence intervals for the power achieved, for each distribution, by two of the most common normality tests, Kolmogorov–Smirnov with Lilliefors correction and Shapiro–Wilk. In addition, the specificity of each test was computed, again by Monte Carlo simulation, taking samples from standard normal distributions. The analysis was then extended to the Anderson–Darling, Cramér–von Mises, Pearson chi-square, Shapiro–Francia, Jarque–Bera, D'Agostino and uncorrected Kolmogorov–Smirnov tests by determining confidence intervals for the areas under the receiver operating characteristic curves. Simulations were performed to this end, wherein for each sample from a nonnormal distribution an equal-sized sample was taken from a normal distribution. The Shapiro–Wilk test had the best overall performance, though in some circumstances the Shapiro–Francia or the D'Agostino tests offered better results. The differences between the tests were less clear for smaller sample sizes. Notably, the Shapiro–Wilk and Kolmogorov–Smirnov tests performed quite poorly in distinguishing samples drawn from normal distributions from samples drawn from Student's t distributions.
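A minimal sketch of such a power study, assuming hypothetical settings (sample size n = 50, an exponential alternative, 5000 replications) and the nortest package for the Lilliefors-corrected Kolmogorov–Smirnov test:

    # install.packages("nortest")   # provides lillie.test()
    library(nortest)

    set.seed(1)
    nsim <- 5000; n <- 50; alpha <- 0.05

    # Power against a skewed alternative (exponential samples)
    rej_sw <- replicate(nsim, shapiro.test(rexp(n))$p.value < alpha)
    rej_ks <- replicate(nsim, lillie.test(rexp(n))$p.value < alpha)

    # Point estimates and exact binomial confidence intervals for the power
    binom.test(sum(rej_sw), nsim)$conf.int
    binom.test(sum(rej_ks), nsim)$conf.int

    # Specificity: one minus the rejection rate under the null
    1 - mean(replicate(nsim, shapiro.test(rnorm(n))$p.value < alpha))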

3.
Rubin (1976, Biometrika 63(3):581–592) derived general conditions under which inferences that ignore missing data are valid. These conditions are sufficient but not generally necessary, and therefore may be relaxed in some special cases. We consider here the case of frequentist estimation of a conditional cdf subject to missing outcomes. We partition a set of data into outcome, conditioning, and latent variables, all of which potentially affect the probability of a missing response. We describe sufficient conditions under which a complete-case estimate of the conditional cdf of the outcome given the conditioning variable is unbiased. We use simulations on a renal transplant data set (Dienemann et al.) to illustrate the implications of these results.
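A small simulation of one simple case covered by such sufficient conditions, with hypothetical distributions: when missingness depends only on the conditioning variable, the complete-case estimate of the conditional cdf agrees with the full-data estimate:

    set.seed(1)
    n <- 1e5
    x <- rbinom(n, 1, 0.5)                 # conditioning variable
    l <- rnorm(n)                          # latent variable
    y <- x + l + rnorm(n)                  # outcome
    # Missingness depends only on the conditioning variable here
    obs <- rbinom(n, 1, plogis(1 - x)) == 1
    # Complete-case vs full-data estimates of P(Y <= 1 | X = 1)
    mean(y[obs & x == 1] <= 1)
    mean(y[x == 1] <= 1)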

4.
Identification of the type of disease pattern and spread in a field is critical in epidemiological investigations of plant diseases. For example, an aggregation pattern of infected plants suggests that, at the time of observation, the pathogen is spreading from a proximal source. Conversely, a random pattern suggests a lack of spread from a proximal source. Most of the existing methods of spatial pattern analysis work with only one variety of plant at each location and with uniform genetic disease susceptibility across the field. Pecan orchards, used in this study, and other orchard crops are usually composed of different varieties with different levels of susceptibility to disease. A new measure is suggested to characterize the spatio-temporal transmission patterns of disease; a Monte Carlo test procedure is proposed to test whether the transmission of disease is random or aggregated. In addition, we propose a mixed-transmission model, which allows us to quantify the degree of aggregation effect.
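A generic Monte Carlo randomization test of aggregation under random labelling, as a sketch only; the layout, infection counts, and nearest-neighbour statistic are hypothetical, and the authors' measure and mixed-transmission model for multiple varieties are not reproduced here:

    set.seed(1)
    # Hypothetical orchard: a 20 x 20 grid of trees, 60 of them infected
    coords <- expand.grid(row = 1:20, col = 1:20)
    status <- rep(c(1, 0), c(60, 340))[sample(400)]

    # Statistic: mean distance from each infected tree to its nearest
    # infected neighbour (small values indicate aggregation)
    nn_stat <- function(s) {
      d <- as.matrix(dist(coords[s == 1, ]))
      diag(d) <- Inf
      mean(apply(d, 1, min))
    }

    obs <- nn_stat(status)
    # Reference distribution under random relabelling of infection status
    null <- replicate(999, nn_stat(sample(status)))
    mean(c(obs, null) <= obs)    # one-sided Monte Carlo p-value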

5.
In this article, we assess the performance of the multivariate exponentially weighted moving average (MEWMA) control chart with estimated parameters while considering the practitioner-to-practitioner variability. We evaluate the chart performance in terms of the distributional properties of the in-control average run length (ARL), namely the average (AARL), the standard deviation (SDARL), and selected percentiles. We show through simulation that using estimates in place of the in-control parameters may result in an in-control ARL distribution that lies almost completely below the desired value. We also show that even with larger amounts of historical data, excessive false alarm rates remain a problem. We recommend a recently proposed bootstrap-based design technique for adjusting the control limits, which is quite effective in controlling the percentage of short in-control ARLs resulting from estimation error.
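A rough sketch of such a simulation, with a hypothetical control limit h and hypothetical Phase I sample size; each simulated practitioner estimates the in-control parameters, and the spread of the resulting in-control ARLs is summarized:

    set.seed(1)
    p <- 2; lambda <- 0.1
    h <- 8.66   # hypothetical limit, near published values for p = 2

    arl_one <- function(mu_hat, sig_inv) {
      z <- rep(0, p); fac <- (2 - lambda) / lambda
      for (i in 1:50000) {
        x <- rnorm(p)                              # in-control data
        z <- lambda * (x - mu_hat) + (1 - lambda) * z
        if (fac * drop(t(z) %*% sig_inv %*% z) > h) return(i)
      }
      Inf
    }

    # Each practitioner estimates the parameters from m Phase I points
    m <- 50
    arls <- replicate(200, {
      xs <- matrix(rnorm(m * p), m, p)
      mean(replicate(20, arl_one(colMeans(xs), solve(cov(xs)))))
    })
    mean(arls); sd(arls)    # AARL and SDARL across practitioners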

6.
In this study, the performance of estimators proposed for the linear regression model in the presence of multicollinearity, with heteroscedastic and/or correlated error terms, is investigated under the matrix mean square error criterion. Structures of the autocorrelated error terms are specified, and a Monte Carlo simulation study is conducted to examine the relative efficiency of the estimators against one another.
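A small sketch of this kind of efficiency comparison, with hypothetical settings and with ridge regression standing in for the class of estimators studied; for brevity the scalar (trace) mean square error is reported rather than the full matrix criterion:

    set.seed(1)
    n <- 50; rho <- 0.7; k <- 0.5    # hypothetical AR(1) and ridge settings
    x1 <- rnorm(n); x2 <- x1 + rnorm(n, sd = 0.1)   # collinear regressors
    X <- cbind(1, x1, x2); beta <- c(1, 2, 3)

    sim <- replicate(2000, {
      e <- as.numeric(arima.sim(list(ar = rho), n))  # autocorrelated errors
      y <- X %*% beta + e
      ols   <- solve(crossprod(X), crossprod(X, y))
      ridge <- solve(crossprod(X) + k * diag(3), crossprod(X, y))
      c(sum((ols - beta)^2), sum((ridge - beta)^2))
    })
    rowMeans(sim)    # trace MSE of the OLS and ridge estimators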

7.
The broken-stick (BS) is a popular stopping rule in ecology for determining the number of meaningful components in principal component analysis. However, its properties have not been systematically investigated. The purpose of the current study is to evaluate its ability to detect the correct dimensionality of a data set and whether it tends to over- or underestimate it. A Monte Carlo protocol was carried out. Two main correlation matrices deemed usual in practice were used, with three levels of correlation (0, 0.10 and 0.30) between components (generating oblique structures) and with different sample sizes. Analyses of the population correlation matrices indicated that, even for extremely large sample sizes, the BS method could be correct for only one of the six simulated structures. It actually failed to identify the correct dimensionality half the time with orthogonal structures and did even worse with some oblique ones. In harder conditions, the power of the BS decreased as sample size increased, weakening its usefulness in practice. Since the BS method seems unlikely to identify the underlying dimensionality of the data, and given that better stopping rules exist, it appears to be a poor choice when carrying out principal component analysis.
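The broken-stick rule itself is easy to state: component k is retained while the proportion of variance it explains exceeds b_k = (1/p) * sum_{i=k..p} 1/i. A sketch with a hypothetical two-component data set:

    # Broken-stick expected proportions for p components
    broken_stick <- function(p) sapply(1:p, function(k) sum(1 / (k:p)) / p)

    set.seed(1)
    n <- 100
    f1 <- rnorm(n); f2 <- rnorm(n)       # two underlying components
    X <- cbind(f1 + rnorm(n, sd = .5), f1 + rnorm(n, sd = .5),
               f1 + rnorm(n, sd = .5), f2 + rnorm(n, sd = .5),
               f2 + rnorm(n, sd = .5), f2 + rnorm(n, sd = .5))

    ev <- eigen(cor(X))$values
    prop <- ev / sum(ev)
    keep <- prop > broken_stick(ncol(X))
    sum(cumprod(keep))    # size of the leading run of exceedances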

8.
This study examined the temporal dynamics of the inter-limb angles of skilled and less skilled ice climbers to determine how they explored ice fall properties to adapt their coordination patterns during performance. We observed two circular time series corresponding to the upper and lower limbs of seven expert and eight inexperienced ice climbers. We analyzed these data through a multiple change-point analysis of the geodesic (or Fréchet) mean on the circle. Guided by the nature of the geodesic mean obtained by an optimization procedure, we extended the filtered-derivative method, known to be computationally cheap and fast, to circular data. Local variability was assessed through the number of change-points computed via the filtered derivatives with p-value method for the time series and through the integrated squared error (ISE). The change-point analysis did not reveal significant differences in the number of change-points between groups but indicated a higher ISE, supporting the existence of plateaux for beginners. These results emphasized higher local variability of limb angles for experts than for beginners, suggesting greater dependence on the properties of the performance environment and adaptive behaviors in the former. Conversely, the lower local variance of limb angles in beginners may reflect their independence from environmental constraints, as they focused mainly on controlling body equilibrium.
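A minimal sketch of the quantity at the core of this analysis, the geodesic (Fréchet) mean of angles, computed here by a plain grid search rather than the authors' optimization and filtered-derivative procedure:

    # Geodesic (Frechet) mean: the point on the circle minimizing the
    # sum of squared arc-length distances to the observed angles
    frechet_mean <- function(theta) {
      grid <- seq(-pi, pi, length.out = 3601)
      cost <- sapply(grid, function(m)
        sum(atan2(sin(theta - m), cos(theta - m))^2))
      grid[which.min(cost)]
    }
    # Near +/- pi the arithmetic mean fails but the geodesic mean does not
    frechet_mean(c(3.0, 3.1, -3.1))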

9.
Measurement error and misclassification arise commonly in various data collection processes. It is well known that ignoring these features in the data analysis usually leads to biased inference. Within the generalized linear model setting, Yi et al. [Functional and structural methods with mixed measurement error and misclassification in covariates. J Am Stat Assoc. 2015;110:681–696] developed inference methods that adjust simultaneously for the effects of measurement error in continuous covariates and misclassification in discrete covariates for the scenario where validation data are available. Their augmented simulation-extrapolation (SIMEX) approach generalizes the usual SIMEX method, which can handle only continuous error-prone covariates. To implement this method, we develop an R package, augSIMEX, for public use. Simulation studies are conducted to illustrate the use of the algorithm. This package is available at CRAN.
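A bare-bones sketch of the ordinary SIMEX idea that augSIMEX generalizes, for a single error-prone continuous covariate with known error variance; all settings are hypothetical, and the actual augSIMEX interface is documented on CRAN:

    set.seed(1)
    n <- 1000; sig_u <- 0.5                 # known measurement-error SD
    x <- rnorm(n); y <- rbinom(n, 1, plogis(-1 + x))
    w <- x + rnorm(n, sd = sig_u)           # observed error-prone covariate

    # Simulation step: refit with progressively more added noise
    lambdas <- seq(0, 2, by = 0.5); B <- 100
    est <- sapply(lambdas, function(l) {
      mean(replicate(B, {
        wb <- w + rnorm(n, sd = sqrt(l) * sig_u)
        coef(glm(y ~ wb, family = binomial))[2]
      }))
    })
    # Extrapolation step: back to lambda = -1 (zero measurement error)
    fit <- lm(est ~ lambdas + I(lambdas^2))
    predict(fit, data.frame(lambdas = -1))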

10.
In this paper, we investigate the relationship between a functional random covariate and a scalar response that is subject to left-truncation by another random variable. Specifically, we use the mean squared relative error as a loss function to construct a nonparametric estimator of the regression operator for these functional truncated data. Under some standard assumptions in functional data analysis, we establish the almost sure consistency, with rates, of the constructed estimator as well as its asymptotic normality. A simulation study on finite-sized samples was then carried out to show the efficiency of our estimation procedure and to highlight its superiority over classical kernel estimation for different levels of simulated truncated data.
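A sketch of the relative-error idea with a scalar covariate and no truncation, both simplifications of the paper's functional setting: the loss E[((Y - r(x))/Y)^2 | X = x] is minimized by r(x) = E[1/Y | x] / E[1/Y^2 | x], suggesting the kernel estimator below (data and bandwidth hypothetical):

    set.seed(1)
    n <- 500
    x <- runif(n)
    y <- exp(1 + x) * rlnorm(n, sdlog = 0.3)   # positive response

    # Relative-error kernel regression estimator
    rel_reg <- function(x0, h = 0.1) {
      k <- dnorm((x - x0) / h)
      sum(k / y) / sum(k / y^2)
    }
    # Classical Nadaraya-Watson estimator, for comparison
    nw_reg <- function(x0, h = 0.1) {
      k <- dnorm((x - x0) / h)
      sum(k * y) / sum(k)
    }
    rel_reg(0.5); nw_reg(0.5)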

11.
12.
Censored survival data are analysed by regression models that require some assumptions on the way covariates affect the hazard function. Proportional Hazards (PH) and Accelerated Failure Time (AFT) are the hypotheses most often used in practice. A method is introduced here for testing the PH and the AFT hypotheses against a general model for the hazard function. Simulated and real data are presented to show the usefulness of the method.
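The abstract does not specify the test statistic, so as a point of reference only, here is a standard PH diagnostic (not the authors' method) based on scaled Schoenfeld residuals from the survival package:

    library(survival)
    # Cox fit on the package's built-in veteran lung cancer data
    fit <- coxph(Surv(time, status) ~ trt + karno, data = veteran)
    cox.zph(fit)    # small p-values cast doubt on the PH hypothesis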

13.
We analyse longitudinal data on CD4 cell counts from patients who participated in clinical trials that compared two therapeutic treatments: zidovudine and didanosine. The investigators were interested in modelling the CD4 cell count as a function of treatment, age at base-line and disease stage at base-line. Serious concerns can be raised about the normality assumption of CD4 cell counts that is implicit in many methods, and therefore an analysis may have to start with a transformation. Instead of assuming that we know the transformation (e.g. logarithmic) that makes the outcome normal and linearly related to the covariates, we estimate the transformation, by maximum likelihood, within the Box–Cox family. There has been considerable work on the Box–Cox transformation for univariate regression models. Here, we discuss the Box–Cox transformation for longitudinal regression models when the outcome can be missing over time, and we also implement a maximization method for the likelihood, assuming that the missing data are missing at random.
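A cross-sectional sketch of the Box–Cox step only, using MASS::boxcox on hypothetical data; the paper estimates the transformation jointly with a longitudinal model under missingness at random:

    library(MASS)
    set.seed(1)
    n <- 200
    age <- rnorm(n, 35, 8); stage <- rbinom(n, 1, 0.4)
    cd4 <- (200 + 20 * stage + rnorm(n, sd = 30))^2 / 100  # skewed, positive

    # Profile likelihood over the Box-Cox family
    bc <- boxcox(lm(cd4 ~ age + stage), plotit = FALSE)
    bc$x[which.max(bc$y)]   # maximum likelihood estimate of lambda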

14.
Non-likelihood-based methods for repeated measures analysis of binary data in clinical trials can result in biased estimates of treatment effects and associated standard errors when the dropout process is not completely at random. We tested the utility of a multiple imputation approach in reducing these biases. Simulations were used to compare the performance of multiple imputation with generalized estimating equations and restricted pseudo-likelihood in five representative clinical trial profiles for estimating (a) overall treatment effects and (b) treatment differences at the last scheduled visit. In clinical trials with moderate to high (40–60%) dropout rates with dropouts missing at random, multiple imputation led to less biased and more precise estimates of treatment differences for binary outcomes based on underlying continuous scores.
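A minimal sketch of such a comparison at the last visit, with hypothetical trial data and MAR dropout, using the mice package for multiple imputation:

    library(mice)
    set.seed(1)
    # Hypothetical wide-format trial: binary response at two visits,
    # with visit-2 dropout depending on the visit-1 response (MAR)
    n <- 300
    trt <- rbinom(n, 1, 0.5)
    y1  <- rbinom(n, 1, plogis(-0.5 + 0.5 * trt))
    y2  <- rbinom(n, 1, plogis(-0.5 + 0.8 * trt + y1))
    y2[rbinom(n, 1, plogis(-1 + 1.5 * y1)) == 1] <- NA

    dat <- data.frame(trt, y1, y2 = factor(y2))
    imp <- mice(dat, method = c("", "", "logreg"), m = 20, printFlag = FALSE)
    # Pooled treatment effect at the last scheduled visit
    summary(pool(with(imp, glm(y2 ~ trt, family = binomial))))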
