Abstract: | Given two independent samples of sizes n and m drawn from univariate distributions with unknown densities f and g, respectively, we are interested in identifying subintervals where the two empirical densities deviate significantly from each other. The solution turns the nonparametric density comparison problem into a comparison of two regression curves. Each regression curve is created by binning the original observations into many small bins and then applying a suitable root transformation to the bin counts. Once the problem is recast as a regression comparison, several nonparametric regression procedures for detecting sparse signals can be applied; both multiple testing and model selection methods are explored. Furthermore, based on a scale-space representation, we derive an approach for estimating larger connected regions where the two empirical densities differ significantly. The proposed methods are applied to simulated examples as well as to real-life data from biology. |