Generating multivariate continuous data via the notion of nearest neighbors
Authors: Hakan Demirtas, Donald Hedeker
Institution: Division of Epidemiology and Biostatistics (MC923), University of Illinois at Chicago, 1603 West Taylor Street, Chicago, IL, 60612, USA
Abstract: Taylor and Thompson [15] introduced a clever algorithm for simulating multivariate continuous data sets that resemble the original data. Their approach is predicated upon determining a few nearest neighbors of a given row of data through a statistical distance measure, and subsequently combining the observations by stochastic multipliers that are drawn from a uniform distribution to generate simulated data that essentially maintain the original data trends. The newly drawn values are assumed to come from the same underlying hypothetical process that governs the mechanism of how the data are formed. This technique is appealing in that no density estimation is required. We believe that this data-based simulation method has substantial potential in multivariate data generation due to the local nature of the generation scheme, which does not have strict specification requirements as in most other algorithms. In this work, we provide two R routines: one has a built-in simulator for finding the optimal number of nearest neighbors for any given data set, and the other generates pseudo-random data using this optimal number.
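The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' R routines, and several details are assumptions that the abstract does not state: Euclidean distance stands in for the "statistical distance measure", the seed row counts as one of its own m neighbors, and the multipliers are drawn from Uniform(1/m - sqrt(3(m-1))/m, 1/m + sqrt(3(m-1))/m), the interval commonly quoted for the Taylor-Thompson scheme. The function names `simulate_row` and `simulate` are hypothetical, and the number of neighbors m is supplied by the caller rather than optimized.

```python
import math
import random


def simulate_row(data, m, rng):
    """Draw one pseudo-observation from `data`, a list of equal-length numeric rows."""
    n = len(data)
    if not 2 <= m <= n:
        raise ValueError("m must be between 2 and the number of rows")

    # Pick a seed row at random from the original data.
    seed_row = data[rng.randrange(n)]

    # Assumption: Euclidean distance; the seed row itself is included (distance 0).
    neighbors = sorted(data, key=lambda row: math.dist(row, seed_row))[:m]

    # Local mean of the m neighbors, one value per column.
    dim = len(seed_row)
    center = [sum(row[k] for row in neighbors) / m for k in range(dim)]

    # Assumption: uniform multipliers with mean 1/m and variance (m - 1) / m**2.
    half_width = math.sqrt(3.0 * (m - 1)) / m
    weights = [rng.uniform(1.0 / m - half_width, 1.0 / m + half_width)
               for _ in range(m)]

    # New point: local mean plus a random linear combination of the deviations.
    return [
        center[k] + sum(w * (row[k] - center[k])
                        for w, row in zip(weights, neighbors))
        for k in range(dim)
    ]


def simulate(data, m, size, seed=None):
    """Generate `size` pseudo-observations using m nearest neighbors per draw."""
    rng = random.Random(seed)
    return [simulate_row(data, m, rng) for _ in range(size)]
```

Because every simulated row is built only from a handful of nearby observations, no density estimate is needed, and local structure in the data (for example, points lying along a line) carries over to the simulated values.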
Keywords: simulation; random number generation; density estimation; bootstrap; nearest neighbors