期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

robROSE: A robust approach for dealing with imbalanced data in fraud detection

Baesens Bart H&#;ppner Sebastiaan Ortner Irene Verdonck Tim 《Statistical Methods and Applications》2021,30(3):841-861

Statistical Methods & Applications - A major challenge when trying to detect fraud is that the fraudulent activities form a minority class which make up a very small proportion of the data set.... 相似文献

2.

Books on data science

Ramalingam Shanmugam 《Journal of Statistical Computation and Simulation》2019,89(7):1340-1341

相似文献

3.

Independent exploratory factor analysis with application to atmospheric science data

Steffen Unkel Nickolay T. Trendafilov Abdel Hannachi Ian T. Jolliffe 《Journal of applied statistics》2010,37(11):1847-1862

The independent exploratory factor analysis method is introduced for recovering independent latent sources from their observed mixtures. The new model is viewed as a method of factor rotation in exploratory factor analysis (EFA). First, estimates for all EFA model parameters are obtained simultaneously. Then, an orthogonal rotation matrix is sought that minimizes the dependence between the common factors. The rotation of the scores is compensated by a rotation of the initial loading matrix. The proposed approach is applied to study winter monthly sea-level pressure anomalies over the Northern Hemisphere. The North Atlantic Oscillation, the North Pacific Oscillation, and the Scandinavian pattern are identified among the rotated spatial patterns with a physically interpretable structure. 相似文献

4.

Web of Deception: Misinformation on the Internet

Elizabeth Parang 《Serials Review》2013,39(1):66-67

Abstract

“Serials Spoken Here” has reports on a symposium on “The Transition to Open Access Scholarship” held at the University at Albany, New York, and the 2004 North Carolina Serials Conference, both held in April 2004. 相似文献

5.

A few statistical principles for data science

Noel Cressie 《Australian & New Zealand Journal of Statistics》2021,63(1):182-200

In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions) and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow ‘error bars’ to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, that concentrates on giving a single answer and gives up on uncertainty quantification, could be considered as Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects. 相似文献

6.

Using statistical techniques to detect fraud: a test case

Michael O'Kelly 《Pharmaceutical statistics》2004,3(4):237-246

In an experiment to test the effectiveness of statistical measures in detecting fraud, three physicians fabricated scores on the Montgomery–Åsberg Depression Rating Scale (MADRS) for a number of subjects in three sites. The fabricated data were then planted among MADRS data from 18 genuine sites. A statistician blinded as to the identity and quantity of the fabricated data attempted to detect the ‘fraudulent’ data by searching for unusual means and correlations. One of the three fabricated sites was correctly identified, and one genuine site was incorrectly identified as a potential fabrication. In addition, inlying and/or outlying means and correlations found in the genuine data suggested the possibility of using statistical checks for unusual data early in a study so that sites with unusual patterns could be prioritized for monitoring, training and, if necessary, auditing. Copyright © 2004 John Wiley & Sons Ltd. 相似文献

7.

When the metadata exceed the data: data management with uncertain data

John C. Klensin 《Statistics and Computing》1995,5(1):73-84

相似文献

8.

Estimation in autoregressive models with surrogate data and validation data

Shi-Hang Yu Kun Li Zhi-Wen Zhao 《统计学通讯:理论与方法》2017,46(3):1532-1545

Time-series data are often subject to measurement error, usually the result of needing to estimate the variable of interest. Generally, however, the relationship between the surrogate variables and the true variables can be rather complicated compared to the classical additive error structure usually assumed. In this article, we address the estimation of the parameters in autoregressive models in the presence of function measurement errors. We first develop a parameter estimation method with the help of validation data; this estimation method does not depend on functional form and the distribution of the measurement error. The proposed estimator is proved to be consistent. Moreover, the asymptotic representation and the asymptotic normality of the estimator are also derived, respectively. Simulation results indicate that the proposed method works well for practical situation. 相似文献

9.

Estimation with incomplete data: An improved computational method and the analysis of nested data

R.R. Hocking D.L. Marx 《统计学通讯:理论与方法》2013,42(12):1155-1181

Maximum likelihood estimation with incomplete normal nrocedure and (ii.)allows tor a Simple MLI.I v=n,ce nf noofori TiVplilinndv Closed form solutions are described for the general nested case.Exact, ssample, moments are given for the two group case Some. computational comparisions are made with the earlier ESTMAT algorithm. 相似文献

10.

Applied compositional data analysis: with worked examples in R

Ramalingam Shanmugam 《Journal of Statistical Computation and Simulation》2019,89(16)

相似文献

11.

Statistics and data with R

Ana F. Militino 《Journal of applied statistics》2010,37(2):357-358

相似文献

12.

Interval-censored data with misclassification: a Bayesian approach

Magda Carvalho Pires Enrico Antnio Colosimo Guilherme Augusto Veloso Raquel de Souza Borges Ferreira 《Journal of applied statistics》2021,48(5):907

Survival data involving silent events are often subject to interval censoring (the event is known to occur within a time interval) and classification errors if a test with no perfect sensitivity and specificity is applied. Considering the nature of this data plays an important role in estimating the time distribution until the occurrence of the event. In this context, we incorporate validation subsets into the parametric proportional hazard model, and show that this additional data, combined with Bayesian inference, compensate the lack of knowledge about test sensitivity and specificity improving the parameter estimates. The proposed model is evaluated through simulation studies, and Bayesian analysis is conducted within a Gibbs sampling procedure. The posterior estimates obtained under validation subset models present lower bias and standard deviation compared to the scenario with no validation subset or the model that assumes perfect sensitivity and specificity. Finally, we illustrate the usefulness of the new methodology with an analysis of real data about HIV acquisition in female sex workers that have been discussed in the literature. 相似文献

13.

Variance component models for longitudinal count data with baseline information: epilepsy data revisited

Marco Alfò Murray Aitkin 《Statistics and Computing》2006,16(3):231-238

Random effect models have often been used in longitudinal data analysis since they allow for association among repeated measurements due to unobserved heterogeneity. Various approaches have been proposed to extend mixed models for repeated count data to include dependence on baseline counts. Dependence between baseline counts and individual-specific random effects result in a complex form of the (conditional) likelihood. An approximate solution can be achieved ignoring this dependence, but this approach could result in biased parameter estimates and in wrong inferences. We propose a computationally feasible approach to overcome this problem, leaving the random effect distribution unspecified. In this context, we show how the EM algorithm for nonparametric maximum likelihood (NPML) can be extended to deal with dependence of repeated measures on baseline counts. 相似文献

14.

Maximum likelihood estimation in a model with interval data: a comment and extension

Steven B. Caudill 《Journal of applied statistics》1996,23(1):97-104

The purpose of this article is, first, to extend Poon et al. 's (1993) maximum likelihood estimation (MLE) of the correlation coefficient based on interval data to the regression case. Secondly, this paper shows how the traditional method of collecting interval data with the intervals chosen by the researcher can be easily modified to avoid the problems discussed by Poon et al. (1993). The MLE for this modification to the regression problem is presented. Finally, all the methods discussed in this paper are used to estimate the effects of grade point average and gender on student perceptions of the percentage of their classmates who have cheated on at least one exam in college. 相似文献

15.

Epidemics: models and data

Mollison D Isham V Grenfell B 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1994,157(1):115-149

"The problems of understanding and controlling disease raise a range of challenging mathematical and statistical research topics, from broad theoretical issues to specific practical ones. In particular, recent interest in acquired immune deficiency syndrome has stimulated much progress in diverse areas of epidemic modelling, particularly with regard to the treatment of heterogeneity, both between individuals and in mixing of subgroups of the population. At the same time better data and data analysis techniques have become available, and there have been exciting developments in relevant theory.... This progress in specific areas is now being matched by interdisciplinary cooperation aimed at elucidating relationships between the widely varying types of model that have been found useful, to determine their strengths and limitations in relation to basic aims such as understanding, prediction, and evaluation and implementation of control strategies." 相似文献

16.

Jane M. Horgan: Probability with R, an introduction with computer science applications

Ricardo Maronna 《Statistical Papers》2012,53(4):1075-1075

相似文献

17.

Evidences from survey data and fiscal data: nonresponse and measurement errors in annual incomes

Cavicchioli Maddalena Lalla Michele 《Statistical Methods and Applications》2022,31(3):587-615

Statistical Methods & Applications - A (local) survey on income carried out in the city of Modena in 2002, with income reference year 2001, generated four categories of units: interviewees,... 相似文献

18.

Editorial: What to do with missing data?

Geert Molenberghs 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(4):861-863

相似文献

19.

Dealing with big data: comparing dimension reduction and shrinkage regression methods

Hamideh D. Hamedani Sara Sadat Moosavi 《Journal of applied statistics》2017,44(3):511-532

In the past decades, the number of variables explaining observations in different practical applications increased gradually. This has led to heavy computational tasks, despite of widely using provisional variable selection methods in data processing. Therefore, more methodological techniques have appeared to reduce the number of explanatory variables without losing much of the information. In these techniques, two distinct approaches are apparent: ‘shrinkage regression’ and ‘sufficient dimension reduction’. Surprisingly, there has not been any communication or comparison between these two methodological categories, and it is not clear when each of these two approaches are appropriate. In this paper, we fill some of this gap by first reviewing each category in brief, paying special attention to the most commonly used methods in each category. We then compare commonly used methods from both categories based on their accuracy, computation time, and their ability to select effective variables. A simulation study on the performance of the methods in each category is generated as well. The selected methods are concurrently tested on two sets of real data which allows us to recommend conditions under which one approach is more appropriate to be applied to high-dimensional data. 相似文献

20.

Maximum likelihood estimation with missing spatial data and with an application to remotely sensed data

Robert Haining Daniel Griffith Robert Bennett 《统计学通讯:理论与方法》2013,42(5):1875-1894

The paper examines the small and large lattice properties of the exact maximum likelihood estimator for a spatial model where parameter estimation and missing data estimation are tackled simultaneously, A first order conditional autoregressive model is examined in detail. The paper concludes with an empirical analysis of remotely sensed data. 相似文献