Found 20 similar documents; search time 46 ms
1.
We study non-parametric regression estimates for random fields. The data satisfies certain strong mixing conditions and is defined on the regular N-dimensional lattice structure. We show consistency and obtain rates of convergence. The rates are optimal modulo a logarithmic factor in some cases. As an application, we estimate the regression function with multidimensional wavelets which are not necessarily isotropic. We simulate random fields on planar graphs with the concept of concliques (cf. [Kaiser MS, Lahiri SN, Nordman DJ. Goodness of fit tests for a class of Markov random field models. Ann Statist. 2012;40:104–130]) in numerical examples of the estimation procedure.
2.
The performance of two clustering strategies for spatially correlated functional data based on the same measure of spatial dependence is examined and compared. In particular, the role of the spatial dependence computed by the trace-variogram function is analyzed. The main features of both procedures are shown through a simulation study based on a variety of practical scenarios easily encountered in the analysis of spatial functional data. An application to real data based on salinity curves is also presented.
3.
We consider the recursive estimation of a regression functional where the explanatory variables take values in some functional space. We prove the almost sure convergence of such estimates for dependent functional data. We also derive the mean quadratic error of the considered class of estimators. Our results are established with rates and asymptotic bounds, under a strong mixing condition. Finally, the feasibility of the proposed estimator is illustrated through an empirical study.
4.
SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical scale-space visualization tool that allows for statistical inferences. In this paper we develop a spatial SiZer for finding significant features and conducting goodness-of-fit tests for spatially dependent images. The spatial SiZer utilizes a family of kernel estimates of the image and provides not only exploratory data analysis but also statistical inference with spatial correlation taken into account. It is also capable of comparing the observed image with a specific null model being tested by adjusting the statistical inference using an assumed covariance structure. Pixel locations having statistically significant differences between the image and a given null model are highlighted by arrows. The spatial SiZer is compared with the existing independent SiZer via the analysis of simulated data with and without signal on both planar and spherical domains. We apply the spatial SiZer method to the decadal temperature change over some regions of the Earth.
5.
Outlier detection has been used extensively in data analysis to detect anomalous observations in data. It has important applications in fraud detection and robust analysis, among others. In this paper, we propose a method for detecting multiple outliers in the linear functional relationship model for circular variables. Using the residual values of the Caires and Wyatt model, we apply the hierarchical clustering approach. With the use of a tree diagram, we illustrate the detection of outliers graphically. A Monte Carlo simulation study is done to verify the accuracy of the proposed method. Low probabilities of masking and swamping effects indicate the validity of the proposed approach. Illustrations on two sets of real data are also given to show its practical applicability.
6.
Coppi et al. [7 R. Coppi, P. D'Urso, and P. Giordani, Fuzzy and possibilistic clustering for fuzzy data, Comput. Stat. Data Anal. 56 (2012), pp. 915–927. doi: 10.1016/j.csda.2010.09.013] applied Yang and Wu's [20 M.-S. Yang and K.-L. Wu, Unsupervised possibilistic clustering, Pattern Recognit. 30 (2006), pp. 5–21. doi: 10.1016/j.patcog.2005.07.005] idea to propose a possibilistic k-means (PkM) clustering algorithm for LR-type fuzzy numbers. The memberships in the objective function of PkM no longer need to satisfy the fuzzy k-means constraint that the memberships of a data point across classes sum to one. However, the clustering performance of PkM depends on the initializations and weighting exponent. In this paper, we propose a robust clustering method based on a self-updating procedure. The proposed algorithm not only solves the initialization problems but also obtains a good clustering result. Several numerical examples also demonstrate the effectiveness and accuracy of the proposed clustering method, especially the robustness to initial values and noise. Finally, three real fuzzy data sets are used to illustrate the superiority of the proposed algorithm.
7.
In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps, resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and study their temporal evolution along the time frame. We anticipate space-time dependence by introducing spatio-temporally varying mixing weights to allocate observations at nearby locations and consecutive time points with similar cluster membership probabilities. As a result, a clustering varying over time and space is accomplished. Conditionally on the cluster membership, a state-space model is deployed to describe the temporal evolution of the sites belonging to each group. Full posterior inference is provided under a Bayesian framework through Markov chain Monte Carlo algorithms. Also, a strategy to select the suitable number of clusters based upon the posterior temporal patterns of the clusters is offered. We evaluate our approach through simulation experiments, and we illustrate it using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength of information across space and time.
8.
In this paper we propose a new robust technique for the analysis of spatial data through simultaneous autoregressive (SAR) models, which extends the Forward Search approach of Cerioli and Riani (1999) and Atkinson and Riani (2000). Our algorithm starts from a subset of outlier-free observations and then selects additional observations according to their degree of agreement with the postulated model. A number of useful diagnostics which are monitored along the search help to identify masked spatial outliers and high leverage sites. In contrast to other robust techniques, our method is particularly suited for the analysis of complex multidimensional systems since each step is performed through statistically and computationally efficient procedures, such as maximum likelihood. The main contribution of this paper is the development of joint robust estimation of both trend and autocorrelation parameters in spatial linear models. For this purpose we suggest a novel definition of the elemental sets of the Forward Search, which relies on blocks of contiguous spatial locations.
9.
This work deals with local linear nonparametric estimation of the generalized regression function in the case of a scalar response variable given a random variable taking values in a semimetric space. The rates of pointwise and uniform almost complete convergence are established for the studied estimator when the sample is an α-mixing sequence. Two real datasets are used to illustrate the performance of the studied estimator with respect to the kernel method.
10.
We introduce a new goodness-of-fit test which can be applied to hypothesis testing about the marginal distribution of dependent data. We derive a new test for the equivalent hypothesis in the space of wavelet coefficients. Such properties of the wavelet transform as orthogonality, localisation and sparsity make the hypothesis testing in wavelet domain easier than in the domain of distribution functions. We propose to test the null hypothesis separately at each wavelet decomposition level to overcome the problem of bi-dimensionality of wavelet indices and to be able to find the frequency where the empirical distribution function differs from the null in case the null hypothesis is rejected. We suggest a test statistic and state its asymptotic distribution under the null and under some of the alternative hypotheses.
11.
We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion which requires a single cycle (or few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. Under simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool “Zodiac.”
12.
Among the statistical methods to model stochastic behaviours of objects, clustering is a preliminary technique to recognize similar patterns within a group of observations in a data set. Various distances to measure differences among objects could be invoked to cluster data through numerous clustering methods. When the variables at hand contain geometrical information about objects, such metrics should be adequately adapted. In fact, statistical methods for these typical data are endowed with a geometrical paradigm in a multivariate sense. In this paper, a procedure for clustering shape data is suggested employing appropriate metrics. Then, the best shape distance candidate as well as a suitable agglomerative method for clustering the simulated shape data are provided by considering cluster validation measures. The results are implemented in a real-life application.
13.
Different priors have been suggested to reflect spatial dependence in area health outcomes or in spatial regression residuals. However, to assume that residuals demonstrate spatial clustering only is a strong prior belief and alternatives have been suggested. A scheme suggested by Leroux et al. [B. Leroux, X. Lei, N. Breslow, Estimation of disease rates in small areas: A new mixed model for spatial dependence, in: M. Halloran, D. Berry (Eds.), Statistical Models in Epidemiology, the Environment and Clinical Trials, Springer-Verlag, New York, 1999, pp. 135–178] involves a single set of random effects and a spatial correlation parameter with extreme values corresponding to pure spatial and pure unstructured residual variation. This paper considers a spatially adaptive extension of that prior to reflect the fact that the appropriate mix between local and global smoothing may not be constant across the region being studied. Local smoothing will not be indicated when an area is disparate from its neighbours (e.g. in terms of social or environmental risk factors for the health outcome being considered). The prior for varying spatial correlation parameters may be based on a regression structure which includes possible observed sources of disparity between neighbours. A case study considers probabilities of long term illness in 133 small areas in NE London, with disparities based on a measure of socio-economic deprivation.
14.
In this article we study the theoretical properties of the simultaneous multiscale change point estimator (SMUCE) in piecewise-constant signal models with dependent error processes. Empirical studies suggest that in this case the change point estimate is inconsistent, but it is not known if alternatives suggested in the literature for correlated data are consistent. We propose a modification of SMUCE scaling the basic statistic by the long run variance of the error process, which is estimated by a difference-type variance estimator calculated from local means from different blocks. For this modification we prove model consistency for physical-dependent error processes and illustrate the finite sample performance by means of a simulation study.
16.
Clustering of mixed data is important yet challenging due to a shortage of conventional distributions for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Indeed copulas, and Gaussian copulas in particular, are powerful tools for easily modeling the distribution of multivariate variables. This model clusters data sets with continuous, integer, and ordinal variables (all having a cumulative distribution function) by considering the intra-component dependencies in a similar way to the Gaussian mixture. Indeed, each component of the Gaussian copula mixture produces a correlation coefficient for each pair of variables and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As an interesting by-product, this model generalizes many well-known approaches and provides tools for visualization based on its parameters. The Bayesian inference is achieved with a Metropolis-within-Gibbs sampler. The numerical experiments, on simulated and real data, illustrate the benefits of the proposed model: flexible and meaningful parameterization combined with visualization features.
17.
In the framework of null hypothesis significance testing for functional data, we propose a procedure able to select intervals of the domain imputable for the rejection of a null hypothesis. An unadjusted p-value function and an adjusted one are the output of the procedure, namely interval-wise testing. Depending on the sort and level α of type-I error control, significant intervals can be selected by thresholding the two p-value functions at level α. We prove that the unadjusted (adjusted) p-value function point-wise (interval-wise) controls the probability of type-I error and is point-wise (interval-wise) consistent. To highlight the gain in terms of interpretation of the phenomenon under study, we apply interval-wise testing to the analysis of a benchmark functional data set, i.e. Canadian daily temperatures. The new procedure provides insights that current state-of-the-art procedures do not, supporting similar advantages in the analysis of functional data with less prior knowledge.
18.
In the context of functional data analysis, we propose new sample tests for homogeneity. Based on some well-known depth measures, we construct four different statistics in order to measure distance between the two samples. A simulation study is performed to check the efficiency of the tests when confronted with shape and magnitude perturbation. Finally, we apply these tools to measure the homogeneity in some samples of real data, and we obtain good results using this new method.
20.
This article describes a method for simulating n-dimensional multivariate non-normal data, with emphasis on count-valued data. Dependence is characterized by either Pearson correlations or Spearman correlations. The simulation is accomplished by simulating a vector of correlated standard normal variates. The elements of this vector are then transformed to achieve the target marginal distributions. We prove that the method corresponds to simulating data from a multivariate Gaussian copula. The simulation method does not restrict pairwise dependence beyond the limits imposed by the marginal distributions and can achieve any Pearson or Spearman correlation within those limits. Two examples are included. In the first example, marginal means, variances, Pearson correlations, and Spearman correlations are estimated from the epileptic seizure data set of Diggle et al. [P. Diggle, P. Heagerty, K.Y. Liang, and S. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, 2002]. Data with these means and variances are simulated to first achieve the estimated Pearson correlations and then achieve the estimated Spearman correlations. The second example is of a hypothetical time series of Poisson counts with seasonal mean ranging between 1 and 9 and a first-order autoregressive, AR(1), dependence structure.
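The two steps named in this abstract (draw correlated standard normals, then transform each coordinate to the target margin) can be illustrated with a minimal sketch for two Poisson margins. This is not the article's code: the function names, the bivariate restriction, and the parameter `rho` are assumptions made for illustration. In particular, `rho` here is the correlation of the latent normals; the article's calibration of that value to hit a target Pearson or Spearman correlation of the counts is not reproduced.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_quantile(u, mean):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    # The cap guards against u values numerically equal to 1.
    while cdf < u and k < 10_000:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def simulate_poisson_pair(mean1, mean2, rho, n, seed=0):
    """Draw n pairs of Poisson counts linked by a Gaussian copula.

    rho is the correlation of the latent standard normals (an assumed
    input), not the resulting correlation of the counts.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Step 1: correlated standard normals (2x2 Cholesky factor by hand).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 2: map to uniforms, then to the Poisson margins.
        u1, u2 = std_normal_cdf(z1), std_normal_cdf(z2)
        pairs.append((poisson_quantile(u1, mean1), poisson_quantile(u2, mean2)))
    return pairs
```

Calling `simulate_poisson_pair(3.0, 5.0, 0.6, 1000)` returns 1000 count pairs whose margins are Poisson with means 3 and 5. The Pearson correlation of the counts will generally differ somewhat from the latent `rho`, which is why a calibration step is needed when a specific count correlation is targeted.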