Similar Articles
14 similar articles found (search time: 15 ms)
1.
The nonparametric two-sample bootstrap is applied to computing uncertainties of measures in receiver operating characteristic (ROC) analysis on large datasets, in areas such as biometrics and speaker recognition, when the analytical method cannot be used. Its validation was studied by computing the standard errors of the area under the ROC curve using the well-established analytical Mann–Whitney statistic method and also using the bootstrap. The analytical result is unique, whereas the bootstrap results form a probability distribution owing to the stochastic nature of the method. The comparisons were carried out using relative errors and hypothesis testing, and the two sets of results matched very well. This validation provides a sound foundation for such computations.
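A minimal sketch of the comparison described above, assuming hypothetical score distributions, sample sizes, and replication counts; the AUC is computed as a normalized Mann–Whitney statistic, its standard error is obtained by the two-sample bootstrap, and the Hanley–McNeil approximation stands in for the exact analytical Mann–Whitney variance used in the paper:

    set.seed(1)
    pos <- rnorm(200, mean = 1)    # hypothetical positive-class scores
    neg <- rnorm(200, mean = 0)    # hypothetical negative-class scores

    # AUC as the normalized Mann-Whitney U statistic (ties count one half)
    auc <- function(p, n) mean(outer(p, n, ">") + 0.5 * outer(p, n, "=="))

    # Nonparametric two-sample bootstrap: resample each class independently
    boot_auc <- replicate(2000, auc(sample(pos, replace = TRUE),
                                    sample(neg, replace = TRUE)))
    sd(boot_auc)                   # bootstrap standard error of the AUC

    # Hanley-McNeil analytical approximation, for comparison
    A <- auc(pos, neg); m <- length(pos); n <- length(neg)
    Q1 <- A / (2 - A); Q2 <- 2 * A^2 / (1 + A)
    sqrt((A * (1 - A) + (m - 1) * (Q1 - A^2) + (n - 1) * (Q2 - A^2)) / (m * n))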

2.
There are several statistical hypothesis tests available for assessing normality, which is an a priori requirement for most parametric statistical procedures. The usual way to compare the performance of normality tests is to use Monte Carlo simulations to obtain point estimates of the corresponding powers. The aim of this work is to improve the assessment of 9 normality hypothesis tests. For that purpose, random samples were drawn from several symmetric and asymmetric nonnormal distributions, and Monte Carlo simulations were carried out to compute confidence intervals for the power achieved, for each distribution, by two of the most common normality tests, Kolmogorov–Smirnov with Lilliefors correction and Shapiro–Wilk. In addition, the specificity of each test was computed, again by Monte Carlo simulation, taking samples from standard normal distributions. The analysis was then extended to the Anderson–Darling, Cramér–von Mises, Pearson chi-square, Shapiro–Francia, Jarque–Bera, D'Agostino and uncorrected Kolmogorov–Smirnov tests by determining confidence intervals for the areas under the receiver operating characteristic curves. Simulations were performed to this end, wherein for each sample from a nonnormal distribution an equal-sized sample was taken from a normal distribution. The Shapiro–Wilk test had the best overall performance, though in some circumstances the Shapiro–Francia or the D'Agostino tests offered better results. The differences between the tests were less clear for smaller sample sizes. Notably, the Shapiro–Wilk and Kolmogorov–Smirnov tests performed quite poorly in distinguishing samples drawn from normal distributions from samples drawn from Student's t distributions.
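A minimal sketch of such a power study, assuming hypothetical settings (sample size n = 50, an exponential alternative, 5000 replications) and the nortest package for the Lilliefors-corrected Kolmogorov–Smirnov test:

    # install.packages("nortest")   # provides lillie.test()
    library(nortest)

    set.seed(1)
    nsim <- 5000; n <- 50; alpha <- 0.05

    # Power against a skewed alternative (exponential samples)
    rej_sw <- replicate(nsim, shapiro.test(rexp(n))$p.value < alpha)
    rej_ks <- replicate(nsim, lillie.test(rexp(n))$p.value < alpha)

    # Point estimates and exact binomial confidence intervals for the power
    binom.test(sum(rej_sw), nsim)$conf.int
    binom.test(sum(rej_ks), nsim)$conf.int

    # Specificity: one minus the rejection rate under the null
    1 - mean(replicate(nsim, shapiro.test(rnorm(n))$p.value < alpha))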

3.
Rubin (1976, Biometrika 63(3):581–592) derived general conditions under which inferences that ignore missing data are valid. These conditions are sufficient but not generally necessary, and therefore may be relaxed in some special cases. We consider here the case of frequentist estimation of a conditional cdf subject to missing outcomes. We partition a set of data into outcome, conditioning, and latent variables, all of which potentially affect the probability of a missing response. We describe sufficient conditions under which a complete-case estimate of the conditional cdf of the outcome given the conditioning variable is unbiased. We use simulations on a renal transplant data set (Dienemann et al.) to illustrate the implications of these results.
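A small simulation of one simple case covered by such sufficient conditions, with hypothetical distributions: when missingness depends only on the conditioning variable, the complete-case estimate of the conditional cdf agrees with the full-data estimate:

    set.seed(1)
    n <- 1e5
    x <- rbinom(n, 1, 0.5)                 # conditioning variable
    l <- rnorm(n)                          # latent variable
    y <- x + l + rnorm(n)                  # outcome
    # Missingness depends only on the conditioning variable here
    obs <- rbinom(n, 1, plogis(1 - x)) == 1
    # Complete-case vs full-data estimates of P(Y <= 1 | X = 1)
    mean(y[obs & x == 1] <= 1)
    mean(y[x == 1] <= 1)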

4.
Identification of the type of disease pattern and spread in a field is critical in epidemiological investigations of plant diseases. For example, an aggregation pattern of infected plants suggests that, at the time of observation, the pathogen is spreading from a proximal source. Conversely, a random pattern suggests a lack of spread from a proximal source. Most of the existing methods of spatial pattern analysis work with only one variety of plant at each location and with uniform genetic disease susceptibility across the field. Pecan orchards, used in this study, and other orchard crops are usually composed of different varieties with different levels of susceptibility to disease. A new measure is suggested to characterize the spatio-temporal transmission patterns of disease; a Monte Carlo test procedure is proposed to test whether the transmission of disease is random or aggregated. In addition, we propose a mixed-transmission model, which allows us to quantify the degree of aggregation effect.
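A generic Monte Carlo randomization test of aggregation under random labelling, as a sketch only; the layout, infection counts, and nearest-neighbour statistic are hypothetical, and the authors' measure and mixed-transmission model for multiple varieties are not reproduced here:

    set.seed(1)
    # Hypothetical orchard: a 20 x 20 grid of trees, 60 of them infected
    coords <- expand.grid(row = 1:20, col = 1:20)
    status <- rep(c(1, 0), c(60, 340))[sample(400)]

    # Statistic: mean distance from each infected tree to its nearest
    # infected neighbour (small values indicate aggregation)
    nn_stat <- function(s) {
      d <- as.matrix(dist(coords[s == 1, ]))
      diag(d) <- Inf
      mean(apply(d, 1, min))
    }

    obs <- nn_stat(status)
    # Reference distribution under random relabelling of infection status
    null <- replicate(999, nn_stat(sample(status)))
    mean(c(obs, null) <= obs)    # one-sided Monte Carlo p-value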

5.
In this article, we assess the performance of the multivariate exponentially weighted moving average (MEWMA) control chart with estimated parameters while considering the practitioner-to-practitioner variability. We evaluate the chart performance in terms of the distributional properties of the in-control average run length (ARL), namely the average (AARL), the standard deviation (SDARL), and selected percentiles. We show through simulation that using estimates in place of the in-control parameters may result in an in-control ARL distribution that lies almost completely below the desired value. We also show that even with larger amounts of historical data, excessive false alarm rates remain a problem. We recommend a recently proposed bootstrap-based design technique for adjusting the control limits, which is quite effective in controlling the percentage of short in-control ARLs resulting from estimation error.
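A rough sketch of such a simulation, with a hypothetical control limit h and hypothetical Phase I sample size; each simulated practitioner estimates the in-control parameters, and the spread of the resulting in-control ARLs is summarized:

    set.seed(1)
    p <- 2; lambda <- 0.1
    h <- 8.66   # hypothetical limit, near published values for p = 2

    arl_one <- function(mu_hat, sig_inv) {
      z <- rep(0, p); fac <- (2 - lambda) / lambda
      for (i in 1:50000) {
        x <- rnorm(p)                              # in-control data
        z <- lambda * (x - mu_hat) + (1 - lambda) * z
        if (fac * drop(t(z) %*% sig_inv %*% z) > h) return(i)
      }
      Inf
    }

    # Each practitioner estimates the parameters from m Phase I points
    m <- 50
    arls <- replicate(200, {
      xs <- matrix(rnorm(m * p), m, p)
      mean(replicate(20, arl_one(colMeans(xs), solve(cov(xs)))))
    })
    mean(arls); sd(arls)    # AARL and SDARL across practitioners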

6.
In this study, the performance of estimators proposed for the linear regression model in the presence of multicollinearity, with heteroscedastic and/or correlated error terms, is investigated under the matrix mean square error criterion. Structures of the autocorrelated error terms are specified, and a Monte Carlo simulation study is conducted to examine the relative efficiency of the estimators against one another.
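A small sketch of this kind of efficiency comparison, with hypothetical settings and with ridge regression standing in for the class of estimators studied; for brevity the scalar (trace) mean square error is reported rather than the full matrix criterion:

    set.seed(1)
    n <- 50; rho <- 0.7; k <- 0.5    # hypothetical AR(1) and ridge settings
    x1 <- rnorm(n); x2 <- x1 + rnorm(n, sd = 0.1)   # collinear regressors
    X <- cbind(1, x1, x2); beta <- c(1, 2, 3)

    sim <- replicate(2000, {
      e <- as.numeric(arima.sim(list(ar = rho), n))  # autocorrelated errors
      y <- X %*% beta + e
      ols   <- solve(crossprod(X), crossprod(X, y))
      ridge <- solve(crossprod(X) + k * diag(3), crossprod(X, y))
      c(sum((ols - beta)^2), sum((ridge - beta)^2))
    })
    rowMeans(sim)    # trace MSE of the OLS and ridge estimators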

7.
The broken-stick (BS) is a popular stopping rule in ecology for determining the number of meaningful components in principal component analysis. However, its properties have not been systematically investigated. The purpose of the current study is to evaluate its ability to detect the correct dimensionality of a data set and whether it tends to over- or underestimate it. A Monte Carlo protocol was carried out. Two main correlation matrices deemed usual in practice were used, with three levels of correlation (0, 0.10 and 0.30) between components (generating oblique structures) and with different sample sizes. Analyses of the population correlation matrices indicated that, even for extremely large sample sizes, the BS method could be correct for only one of the six simulated structures. It actually failed to identify the correct dimensionality half the time with orthogonal structures and did even worse with some oblique ones. In harder conditions, the power of the BS decreased as sample size increased, weakening its usefulness in practice. Since the BS method seems unlikely to identify the underlying dimensionality of the data, and given that better stopping rules exist, it appears to be a poor choice when carrying out principal component analysis.
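The broken-stick rule itself is easy to state: component k is retained while the proportion of variance it explains exceeds b_k = (1/p) * sum_{i=k..p} 1/i. A sketch with a hypothetical two-component data set:

    # Broken-stick expected proportions for p components
    broken_stick <- function(p) sapply(1:p, function(k) sum(1 / (k:p)) / p)

    set.seed(1)
    n <- 100
    f1 <- rnorm(n); f2 <- rnorm(n)       # two underlying components
    X <- cbind(f1 + rnorm(n, sd = .5), f1 + rnorm(n, sd = .5),
               f1 + rnorm(n, sd = .5), f2 + rnorm(n, sd = .5),
               f2 + rnorm(n, sd = .5), f2 + rnorm(n, sd = .5))

    ev <- eigen(cor(X))$values
    prop <- ev / sum(ev)
    keep <- prop > broken_stick(ncol(X))
    sum(cumprod(keep))    # size of the leading run of exceedances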

8.
This study examined the temporal dynamics of the inter-limb angles of skilled and less skilled ice climbers to determine how they explored ice fall properties to adapt their coordination patterns during performance. We observed two circular time series corresponding to the upper and lower limbs of seven expert and eight inexperienced ice climbers. We analyzed these data through a multiple change-point analysis of the geodesic (or Fréchet) mean on the circle. Guided by the nature of the geodesic mean obtained by an optimization procedure, we extended the filtered-derivative method, known to be computationally cheap and fast, to circular data. Local variability was assessed through the number of change-points computed via the filtered derivatives with p-value method for the time series and through the integrated squared error (ISE). The change-point analysis did not reveal significant differences in the number of change-points between groups but indicated a higher ISE, supporting the existence of plateaux for beginners. These results emphasized higher local variability of limb angles for experts than for beginners, suggesting greater dependence on the properties of the performance environment and adaptive behaviors in the former. Conversely, the lower local variance of limb angles in beginners may reflect their independence from environmental constraints, as they focused mainly on controlling body equilibrium.
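A minimal sketch of the quantity at the core of this analysis, the geodesic (Fréchet) mean of angles, computed here by a plain grid search rather than the authors' optimization and filtered-derivative procedure:

    # Geodesic (Frechet) mean: the point on the circle minimizing the
    # sum of squared arc-length distances to the observed angles
    frechet_mean <- function(theta) {
      grid <- seq(-pi, pi, length.out = 3601)
      cost <- sapply(grid, function(m)
        sum(atan2(sin(theta - m), cos(theta - m))^2))
      grid[which.min(cost)]
    }
    # Near +/- pi the arithmetic mean fails but the geodesic mean does not
    frechet_mean(c(3.0, 3.1, -3.1))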

9.
Measurement error and misclassification arise commonly in various data collection processes. It is well known that ignoring these features in the data analysis usually leads to biased inference. Within the generalized linear model setting, Yi et al. [Functional and structural methods with mixed measurement error and misclassification in covariates. J Am Stat Assoc. 2015;110:681–696] developed inference methods that adjust simultaneously for the effects of measurement error in continuous covariates and misclassification in discrete covariates for the scenario where validation data are available. Their augmented simulation-extrapolation (SIMEX) approach generalizes the usual SIMEX method, which can handle only continuous error-prone covariates. To implement this method, we develop an R package, augSIMEX, for public use. Simulation studies are conducted to illustrate the use of the algorithm. This package is available at CRAN.
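A bare-bones sketch of the ordinary SIMEX idea that augSIMEX generalizes, for a single error-prone continuous covariate with known error variance; all settings are hypothetical, and the actual augSIMEX interface is documented on CRAN:

    set.seed(1)
    n <- 1000; sig_u <- 0.5                 # known measurement-error SD
    x <- rnorm(n); y <- rbinom(n, 1, plogis(-1 + x))
    w <- x + rnorm(n, sd = sig_u)           # observed error-prone covariate

    # Simulation step: refit with progressively more added noise
    lambdas <- seq(0, 2, by = 0.5); B <- 100
    est <- sapply(lambdas, function(l) {
      mean(replicate(B, {
        wb <- w + rnorm(n, sd = sqrt(l) * sig_u)
        coef(glm(y ~ wb, family = binomial))[2]
      }))
    })
    # Extrapolation step: back to lambda = -1 (zero measurement error)
    fit <- lm(est ~ lambdas + I(lambdas^2))
    predict(fit, data.frame(lambdas = -1))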

10.
In this paper, we investigate the relationship between a functional random covariate and a scalar response that is subject to left-truncation by another random variable. Specifically, we use the mean squared relative error as a loss function to construct a nonparametric estimator of the regression operator for these functional truncated data. Under some standard assumptions in functional data analysis, we establish the almost sure consistency, with rates, of the constructed estimator as well as its asymptotic normality. A simulation study on finite-sized samples was then carried out to show the efficiency of our estimation procedure and to highlight its superiority over classical kernel estimation for different levels of simulated truncated data.
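A sketch of the relative-error idea with a scalar covariate and no truncation, both simplifications of the paper's functional setting: the loss E[((Y - r(x))/Y)^2 | X = x] is minimized by r(x) = E[1/Y | x] / E[1/Y^2 | x], suggesting the kernel estimator below (data and bandwidth hypothetical):

    set.seed(1)
    n <- 500
    x <- runif(n)
    y <- exp(1 + x) * rlnorm(n, sdlog = 0.3)   # positive response

    # Relative-error kernel regression estimator
    rel_reg <- function(x0, h = 0.1) {
      k <- dnorm((x - x0) / h)
      sum(k / y) / sum(k / y^2)
    }
    # Classical Nadaraya-Watson estimator, for comparison
    nw_reg <- function(x0, h = 0.1) {
      k <- dnorm((x - x0) / h)
      sum(k * y) / sum(k)
    }
    rel_reg(0.5); nw_reg(0.5)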

11.
12.
Censored survival data are analysed by regression models that require some assumptions on the way covariates affect the hazard function. Proportional Hazards (PH) and Accelerated Failure Time (AFT) are the hypotheses most often used in practice. A method is introduced here for testing the PH and the AFT hypotheses against a general model for the hazard function. Simulated and real data are presented to show the usefulness of the method.
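The abstract does not specify the test statistic, so as a point of reference only, here is a standard PH diagnostic (not the authors' method) based on scaled Schoenfeld residuals from the survival package:

    library(survival)
    # Cox fit on the package's built-in veteran lung cancer data
    fit <- coxph(Surv(time, status) ~ trt + karno, data = veteran)
    cox.zph(fit)    # small p-values cast doubt on the PH hypothesis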

13.
We analyse longitudinal data on CD4 cell counts from patients who participated in clinical trials that compared two therapeutic treatments: zidovudine and didanosine. The investigators were interested in modelling the CD4 cell count as a function of treatment, age at base-line and disease stage at base-line. Serious concerns can be raised about the normality assumption of CD4 cell counts that is implicit in many methods, and therefore an analysis may have to start with a transformation. Instead of assuming that we know the transformation (e.g. logarithmic) that makes the outcome normal and linearly related to the covariates, we estimate the transformation, by maximum likelihood, within the Box–Cox family. There has been considerable work on the Box–Cox transformation for univariate regression models. Here, we discuss the Box–Cox transformation for longitudinal regression models when the outcome can be missing over time, and we also implement a maximization method for the likelihood, assuming that the missing data are missing at random.
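A cross-sectional sketch of the Box–Cox step only, using MASS::boxcox on hypothetical data; the paper estimates the transformation jointly with a longitudinal model under missingness at random:

    library(MASS)
    set.seed(1)
    n <- 200
    age <- rnorm(n, 35, 8); stage <- rbinom(n, 1, 0.4)
    cd4 <- (200 + 20 * stage + rnorm(n, sd = 30))^2 / 100  # skewed, positive

    # Profile likelihood over the Box-Cox family
    bc <- boxcox(lm(cd4 ~ age + stage), plotit = FALSE)
    bc$x[which.max(bc$y)]   # maximum likelihood estimate of lambda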

14.
Non-likelihood-based methods for repeated measures analysis of binary data in clinical trials can result in biased estimates of treatment effects and associated standard errors when the dropout process is not completely at random. We tested the utility of a multiple imputation approach in reducing these biases. Simulations were used to compare the performance of multiple imputation with generalized estimating equations and restricted pseudo-likelihood in five representative clinical trial profiles for estimating (a) overall treatment effects and (b) treatment differences at the last scheduled visit. In clinical trials with moderate to high (40–60%) dropout rates with dropouts missing at random, multiple imputation led to less biased and more precise estimates of treatment differences for binary outcomes based on underlying continuous scores.
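A minimal sketch of such a comparison at the last visit, with hypothetical trial data and MAR dropout, using the mice package for multiple imputation:

    library(mice)
    set.seed(1)
    # Hypothetical wide-format trial: binary response at two visits,
    # with visit-2 dropout depending on the visit-1 response (MAR)
    n <- 300
    trt <- rbinom(n, 1, 0.5)
    y1  <- rbinom(n, 1, plogis(-0.5 + 0.5 * trt))
    y2  <- rbinom(n, 1, plogis(-0.5 + 0.8 * trt + y1))
    y2[rbinom(n, 1, plogis(-1 + 1.5 * y1)) == 1] <- NA

    dat <- data.frame(trt, y1, y2 = factor(y2))
    imp <- mice(dat, method = c("", "", "logreg"), m = 20, printFlag = FALSE)
    # Pooled treatment effect at the last scheduled visit
    summary(pool(with(imp, glm(y2 ~ trt, family = binomial))))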
