Similar Literature
20 similar documents found
1.
Smoothing of noisy sample covariances is an important component of functional data analysis. We propose a novel covariance smoothing method based on penalized splines, together with associated software. The proposed method is a bivariate spline smoother designed for covariance smoothing and can be used for sparse functional or longitudinal data. We also propose a fast algorithm for covariance smoothing using leave-one-subject-out cross-validation. Our simulations show that the proposed method compares favorably against several commonly used methods. The method is applied to a study of child growth led by one of the coauthors and to a public dataset of longitudinal CD4 counts.
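The paper's smoother is based on penalized splines; as a rough illustration of the same idea, the sketch below smooths a noisy covariance surface with a simple 2-D Gaussian kernel smoother (a stand-in for the spline smoother, not the authors' method), excluding the diagonal, which in practice is inflated by measurement-error variance.

```python
import numpy as np

def smooth_covariance(raw_cov, grid, bandwidth=0.2):
    """Smooth a noisy covariance surface with a 2-D Gaussian kernel,
    excluding the diagonal (inflated by measurement-error variance)."""
    p = len(grid)
    off = ~np.eye(p, dtype=bool)              # off-diagonal mask
    smoothed = np.empty_like(raw_cov)
    for i in range(p):
        for j in range(p):
            w = np.exp(-((grid - grid[i])[:, None] ** 2
                         + (grid - grid[j])[None, :] ** 2)
                       / (2 * bandwidth ** 2)) * off
            smoothed[i, j] = np.sum(w * raw_cov) / np.sum(w)
    return 0.5 * (smoothed + smoothed.T)      # enforce symmetry

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 20)
true_cov = np.minimum.outer(grid, grid)       # Brownian-motion covariance
raw = true_cov + rng.normal(scale=0.3, size=(20, 20))
est = smooth_covariance(raw, grid)
```

The smoothed surface is symmetric by construction and substantially closer to the true covariance than the raw noisy estimate.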

2.
3.
4.
5.
In this article, we propose a novel approach to fitting a functional linear regression in which both the response and the predictor are functions. We consider the case where the response and predictor processes are both sparsely sampled at random time points and contaminated with random errors. In addition, the random times are allowed to differ between the measurements of the predictor and the response functions. This situation often occurs in longitudinal data settings. To estimate the covariance and cross-covariance functions, we use a regularization method over a reproducing kernel Hilbert space. The estimate of the cross-covariance function is used to obtain estimates of the regression coefficient function and of the functional singular components. We derive the convergence rates of the proposed cross-covariance, regression coefficient, and singular component function estimators. Furthermore, we show that, under some regularity conditions, the estimator of the coefficient function attains a minimax optimal rate. We conduct a simulation study and demonstrate the merits of the proposed method by comparing it to other existing methods in the literature. We illustrate the method with an application to a real-world air quality dataset. The Canadian Journal of Statistics 47: 524–559; 2019 © 2019 Statistical Society of Canada

6.
The problem of selecting the best of k populations is studied for data which are incomplete, as some of the values have been deleted randomly. This situation arises in extreme value analysis, where only data exceeding a threshold are observable. For increasing sample size we study the case where the probability that a value is observed tends to zero, but the sparse condition is satisfied, so that the mean number of observable values in each population is bounded away from zero and infinity as the sample size tends to infinity. The incomplete data are described by thinned point processes, which are approximated by Poisson point processes. Under weak assumptions and after suitable transformations these processes converge to a Poisson point process. Optimal selection rules for the limit model are used to construct asymptotically optimal selection rules for the original sequence of models. The results are applied to extreme value data with high thresholds.

7.
Sparse clustered data arise in finely stratified genetic and epidemiologic studies and pose at least two challenges to inference. First, it is difficult to model and interpret the full joint probability of dependent discrete data, which limits the utility of full likelihood methods. Second, standard methods for clustered data, such as pairwise likelihood and the generalized estimating function approach, are unsuitable when the data are sparse owing to the presence of many nuisance parameters. We present a composite conditional likelihood for use with sparse clustered data that provides valid inferences about covariate effects on both the marginal response probabilities and the intracluster pairwise association. Our primary focus is on sparse clustered binary data, in which case the proposed method uses doubly discordant quadruplets drawn from each stratum to conduct inference about the intracluster pairwise odds ratios.

8.
In this article, we study methods for two-sample hypothesis testing of high-dimensional data from a multivariate binary distribution. We study the random projection method and apply an Edgeworth expansion to improve it. Additionally, we propose new statistics which are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment. We also apply the tests to the 20 Newsgroups data, showing that our proposed tests have considerably higher power than the others in differentiating groups of news articles on different topics.
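A minimal sketch of the random projection idea for two-sample testing (not the paper's exact statistic, and without its Edgeworth correction): project both samples onto random unit directions, run a two-sample t-test on each projection, and Bonferroni-adjust the smallest p-value over the directions.

```python
import numpy as np
from scipy import stats

def random_projection_test(x, y, n_proj=50, seed=0):
    """Two-sample test via random projections: t-test each projection,
    Bonferroni-adjust the smallest p-value over the n_proj directions."""
    rng = np.random.default_rng(seed)
    pvals = []
    for _ in range(n_proj):
        d = rng.normal(size=x.shape[1])
        d /= np.linalg.norm(d)                 # random unit direction
        pvals.append(stats.ttest_ind(x @ d, y @ d).pvalue)
    return min(min(pvals) * n_proj, 1.0)

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=(60, 200))       # 200-dimensional binary data
y_null = rng.binomial(1, 0.3, size=(60, 200))  # same distribution
y_alt = rng.binomial(1, 0.5, size=(60, 200))   # shifted success probability
p_null = random_projection_test(x, y_null)
p_alt = random_projection_test(x, y_alt)
```

The shifted alternative yields a far smaller adjusted p-value than the null comparison.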

9.
The analysis of high-dimensional data often begins with the identification of lower dimensional subspaces. Principal component analysis is a dimension reduction technique that identifies linear combinations of variables along which most variation occurs or which best “reconstruct” the original variables. For example, many temperature readings may be taken in a production process when in fact there are just a few underlying variables driving the process. A problem with principal components is that the linear combinations can seem quite arbitrary. To make them more interpretable, we introduce two classes of constraints. In the first, coefficients are constrained to equal a small number of values (homogeneity constraint). The second constraint attempts to set as many coefficients to zero as possible (sparsity constraint). The resultant interpretable directions are either calculated to be close to the original principal component directions, or calculated in a stepwise manner that may make the components more orthogonal. A small dataset on characteristics of cars is used to introduce the techniques. A more substantial data mining application is also given, illustrating the ability of the procedure to scale to a very large number of variables.
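The sparsity constraint can be illustrated with a crude thresholding rule: compute the leading principal component and keep only its few largest-magnitude coefficients, zeroing the rest. This is a stand-in for the stepwise procedure described above, not the authors' algorithm.

```python
import numpy as np

def sparse_pc(x, n_keep=2):
    """Leading principal component with a crude sparsity constraint:
    keep the n_keep largest-magnitude loadings, zero out the rest."""
    cov = np.cov(x, rowvar=False)
    _, vecs = np.linalg.eigh(cov)
    v = vecs[:, -1]                            # leading eigenvector
    sparse = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-n_keep:]
    sparse[keep] = v[keep]
    return sparse / np.linalg.norm(sparse)

rng = np.random.default_rng(2)
z = rng.normal(size=(500, 1))                  # one latent driver
x = np.hstack([z + 0.1 * rng.normal(size=(500, 1)),   # two loaded variables
               z + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 3))])     # three pure-noise variables
v = sparse_pc(x, n_keep=2)
```

On this toy example, the sparse direction correctly loads only on the two variables driven by the latent factor.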

10.
Q. F. Xu, C. Cai & X. Huang. Statistics, 2019, 53(1): 26–42
In recent decades, quantile regression has received much attention from academics and practitioners. However, most existing computational algorithms are only effective for small or moderate-sized problems; they cannot solve quantile regression with large-scale data reliably and efficiently. To this end, we propose a new algorithm that implements quantile regression on large-scale data using the sparse exponential transform (SET) method. The algorithm constructs a well-conditioned basis and a sampling matrix to reduce the number of observations, then solves a quantile regression problem on this reduced matrix to obtain an approximate solution. Through simulation studies and an empirical analysis of a 5% sample of the US 2000 Census data, we demonstrate the efficiency of the SET-based algorithm. Numerical results indicate that the new algorithm is effective in terms of computation time and performs well for large-scale quantile regression.
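Quantile regression itself can be written as a linear program minimizing the pinball loss. The sketch below does exactly that, with plain random row subsampling as a crude stand-in for the SET-based reduction (the actual method constructs a well-conditioned basis and a sampling matrix before solving the reduced problem).

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau=0.5):
    """Quantile regression as a linear program: minimize the pinball loss
    sum_i tau*u_i + (1-tau)*v_i subject to y - X b = u - v, u, v >= 0."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(3)
n_full = 5000
X = np.column_stack([np.ones(n_full), rng.uniform(0, 10, n_full)])
y = 2.0 + 0.5 * X[:, 1] + rng.normal(size=n_full)

rows = rng.choice(n_full, size=500, replace=False)  # crude row reduction
beta = quantile_regression(X[rows], y[rows], tau=0.5)
```

Fitting the median regression on the reduced matrix recovers the intercept and slope of the full problem to good accuracy at a fraction of the cost.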

11.
In the context of functional data analysis, we propose new two-sample tests of homogeneity. Based on well-known depth measures, we construct four different statistics that measure the distance between the two samples. A simulation study is performed to assess the performance of the tests under shape and magnitude perturbations. Finally, we apply these tools to measure homogeneity in samples of real data, obtaining good results with the new method.
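A minimal sketch of a depth-based homogeneity test, using a simple pointwise halfspace depth rather than any of the four statistics above: compare the mean depth of one sample's curves within the other sample against its permutation distribution.

```python
import numpy as np

def pointwise_depth(curve, sample):
    """Average pointwise halfspace depth of one curve within a sample."""
    below = (sample <= curve).mean(axis=0)
    return np.minimum(below, 1.0 - below).mean()

def depth_homogeneity_test(a, b, n_perm=200, seed=0):
    """Permutation test: mean depth of sample a's curves within sample b,
    compared against the same statistic under random relabelling."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([a, b])
    na = len(a)
    def stat(sa, sb):
        return np.mean([pointwise_depth(c, sb) for c in sa])
    observed = stat(a, b)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        count += stat(pooled[idx[:na]], pooled[idx[na:]]) <= observed
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 30)
a = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=(25, 30))
b = np.sin(2 * np.pi * t) + 0.8 + rng.normal(scale=0.2, size=(25, 30))
p_shift = depth_homogeneity_test(a, b)        # magnitude perturbation
```

Under the magnitude perturbation, curves of one sample are outliers with respect to the other, so their depth collapses and the test rejects.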

12.
In the framework of null hypothesis significance testing for functional data, we propose a procedure able to select the intervals of the domain responsible for the rejection of a null hypothesis. The procedure, named interval-wise testing, outputs an unadjusted p-value function and an adjusted one. Depending on the type and level α of type-I error control, significant intervals can be selected by thresholding the two p-value functions at level α. We prove that the unadjusted (adjusted) p-value function controls the probability of type-I error point-wise (interval-wise) and is point-wise (interval-wise) consistent. To highlight the gain in interpretability of the phenomenon under study, we apply interval-wise testing to the analysis of a benchmark functional dataset, the Canadian daily temperatures. The new procedure provides insights that current state-of-the-art procedures do not, suggesting similar advantages in the analysis of functional data with less prior knowledge.

13.
Nonresponse is a very common phenomenon in survey sampling. Nonignorable nonresponse – that is, a response mechanism that depends on the values of the variable subject to nonresponse – is the most difficult type of nonresponse to handle. This article develops a robust estimation approach for estimating equations (EEs) by combining a model for the nonignorably missing data, the generalized method of moments (GMM), and imputation of the EEs via the observed data rather than imputed missing values. Based on a particular semiparametric logistic model for nonignorable missing responses, we propose modified EEs that compute the required conditional expectations under nonignorable missingness, and apply the GMM to infer the parameters. An advantage of our method is that it replaces nonparametric kernel smoothing with a parametric sampling importance resampling (SIR) procedure, thereby avoiding the problems kernel smoothing faces with high-dimensional covariates. Simulations show the proposed method to be more robust than several current approaches.

14.
Statistics for spatial functional data is an emerging field which combines methods of spatial statistics and functional data analysis to model spatially correlated functional data. Checking for spatial autocorrelation is an important step in the statistical analysis of spatial data, and several statistics have been proposed for this purpose. The test based on the Mantel statistic is widely known and used in this context. This paper applies the Mantel test to spatial functional data. Although we focus on geostatistical functional data, that is, functional data observed over a region with spatial continuity, the test can also be applied to functional data measured on a discrete set of areas of a region (areal functional data) by properly defining the distance between the areas. Two simulation studies show that the proposed test performs well. We illustrate the methodology with an application to an agronomic dataset.
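The Mantel test itself can be sketched as a permutation test of the correlation between a spatial distance matrix and a matrix of L2 distances between curves; the data-generation step below is purely illustrative, not the paper's simulation design.

```python
import numpy as np

def mantel_test(d_space, d_curves, n_perm=500, seed=0):
    """Mantel permutation test: correlation between the upper triangles
    of two distance matrices, with joint row/column permutations."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(d_space, k=1)
    def corr(m):
        return np.corrcoef(d_space[iu], m[iu])[0, 1]
    observed = corr(d_curves)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(len(d_curves))
        count += corr(d_curves[np.ix_(p, p)]) >= observed
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(5)
sites = rng.uniform(0, 10, size=(30, 2))      # 30 spatial locations
# curve level drifts with the first spatial coordinate
curves = sites[:, :1] / 10.0 + rng.normal(scale=0.1, size=(30, 40))
d_space = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
d_curves = np.linalg.norm(curves[:, None] - curves[None, :], axis=-1)
r, pval = mantel_test(d_space, d_curves)
```

Because nearby sites have similar curves here, the observed matrix correlation is clearly positive and the permutation p-value is small.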

15.
We present methods for the modeling and estimation of a concurrent functional regression when the predictors and responses are two-dimensional functional datasets. The implementations use spline basis functions, and model fitting is based on smoothing penalties and mixed-model estimation. The proposed methods are implemented in available statistical software, allow the construction of confidence intervals for the bivariate model parameters, and can be applied to completely or sparsely sampled responses. The methods are tested in simulations and show favorable results in practice. Their usefulness is illustrated in an application to environmental data.

16.
Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept to functional data are considered and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to extend the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

17.
We study the design problem for the optimal classification of functional data. The goal is to select sampling time points so that functional data observed at these time points can be classified accurately. We propose optimal designs that are applicable to either dense or sparse functional data. Using linear discriminant analysis, we formulate our design objectives as explicit functions of the sampling points. We study the theoretical properties of the proposed design objectives and provide a practical implementation. The performance of the proposed design is evaluated through simulations and real data applications. The Canadian Journal of Statistics 48: 285–307; 2020 © 2019 Statistical Society of Canada

18.
Functional data are observed frequently in many scientific fields, and most standard statistical methods are therefore being adapted to functional data. We consider the multivariate analysis of variance (MANOVA) problem for functional data, which is of similar practical interest to the one-way analysis of variance for such data. For the MANOVA problem for multivariate functional data, we propose permutation tests based on a basis function representation and tests based on random projections. Their performance is examined in comprehensive simulation studies, which provide an idea of the size control and power of the tests and identify differences between them. The simulation experiments are based on artificial data and on real labeled multivariate time series data found in the literature. The results suggest that the studied testing procedures can detect small differences between vectors of curves even with small sample sizes. Illustrative real data examples of the use of the proposed testing procedures in practice are also presented.
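A minimal sketch of the permutation approach for a single functional variable (a one-way analysis of variance on the discretized curves, rather than the paper's multivariate tests with a basis representation or random projections):

```python
import numpy as np

def functional_anova_perm(groups, n_perm=300, seed=0):
    """Permutation one-way ANOVA for samples of curves: the statistic is
    the pointwise between-group sum of squares summed over the grid."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack(groups)
    sizes = [len(g) for g in groups]
    def stat(data):
        grand = data.mean(axis=0)
        total, start = 0.0, 0
        for n in sizes:
            gm = data[start:start + n].mean(axis=0)
            total += n * np.sum((gm - grand) ** 2)
            start += n
        return total
    observed = stat(pooled)
    count = 0
    for _ in range(n_perm):
        count += stat(pooled[rng.permutation(len(pooled))]) >= observed
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 25)
g1 = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(20, 25))
g2 = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(20, 25))
g3 = np.sin(2 * np.pi * t) + 0.4 + rng.normal(scale=0.3, size=(20, 25))  # shifted group
pval = functional_anova_perm([g1, g2, g3])
```

With one group shifted in magnitude, the observed between-group sum of squares far exceeds the permuted values and the test rejects.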

19.
20.
One of the basic statistical methods of dimensionality reduction is the analysis of discriminant coordinates, introduced by Fisher (1936) and Rao (1948). The space of discriminant coordinates is convenient for presenting multidimensional data originating from multiple groups and for applying various classification methods (methods of discriminant analysis). In the present paper, we adapt classical discriminant coordinate analysis to multivariate functional data. The theory is applied to the analysis of textural properties of apples of six varieties, measured over a period of 180 days, stored in two types of refrigeration chamber.
