Similar literature
1.
Given a collection of n curves that are independent realizations of a functional variable, we are interested in finding patterns in the curve data by exploring low-dimensional approximations to the curves. It is assumed that the data curves are noisy samples from the vector space span{f1, …, fm}, where f1, …, fm are unknown functions on the real interval (0, T) with square-integrable derivatives of all orders m or less, and m<n. Ramsay [Principal differential analysis: Data reduction by differential operators, J. R. Statist. Soc. Ser. B 58 (1996), pp. 495–508] first proposed the method of regularized principal differential analysis (PDA) as an alternative to principal component analysis for finding low-dimensional approximations to curves. PDA is based on the following theorem: there exists an annihilating linear differential operator (LDO) L of order m such that L fi = 0, i=1, …, m [E.A. Coddington and N. Levinson, Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1955, Theorem 6.2]. PDA specifies m, then uses the data to estimate an annihilating LDO. Smooth estimates of the coefficients of the LDO are obtained by minimizing a penalized sum of the squared norm of the residuals. In this context, the residual is that part of the data curve that is not annihilated by the LDO. PDA obtains the smooth low-dimensional approximation to the data curves by projecting onto the null space of the estimated annihilating LDO; PDA is thus useful for obtaining low-dimensional approximations to the data curves whether or not the interpretation of the annihilating LDO is intuitive or obvious from the context of the data. This paper extends PDA to allow the coefficients in the LDO to depend smoothly upon a single continuous covariate. The estimating equations for the coefficients allowing for a continuous covariate are derived; the penalty of Eilers and Marx [Flexible smoothing with B-splines and penalties, Statist. Sci. 11(2) (1996), pp. 89–121] is used to impose smoothness. The results of a small computer simulation study investigating the bias and variance properties of the estimator are reported.
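As a hedged illustration of the annihilating operator described in this abstract, written here as L: the general form below follows the usual PDA convention with coefficient functions w_0, …, w_{m-1}, which is an assumption, since the abstract does not state the form. The second line is a simple m = 2 instance chosen only for illustration.

```latex
\[
  L f \;=\; \sum_{j=0}^{m-1} w_j(t)\, D^j f \;+\; D^m f \;=\; 0 .
\]
% Illustrative case m = 2: sin(\omega t) and cos(\omega t) span the null space of L.
\[
  L = \omega^2 I + D^2, \qquad
  L \sin(\omega t) = \omega^2 \sin(\omega t) - \omega^2 \sin(\omega t) = 0 .
\]
```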

2.
This paper discusses a supervised classification approach for the differential diagnosis of Raynaud's phenomenon (RP). The classification of data from healthy subjects and from patients suffering from primary and secondary RP is obtained by means of a set of classifiers derived within the framework of linear discriminant analysis. A set of functional variables and shape measures extracted from rewarming/reperfusion curves are proposed as discriminant features. Since the prediction of group membership is based on a large number of these features, the high dimension/small sample size problem is considered to overcome the singularity problem of the within-group covariance matrix. Results on a data set of 72 subjects demonstrate that a satisfactory classification of the subjects can be achieved through the proposed methodology.

3.
This article uses a local-information, near-neighbor forecasting methodology as a prediction test for evidence of a noisy, chaotic data-generating process underlying the Divisia monetary-aggregate series. Using a nonparametric method known to perform well with low-dimensional chaotic processes infected by noise, accompanied by a robust test of forecast performance evaluation, we compare out-of-sample forecasting accuracy from the local-information method to forecasting accuracy from the best fitting global linear model. Our results fail to substantiate previous claims for determinism in the Divisia monetary-aggregate series because the degree of forecast improvement obtained by the local-information method is not consistent with the hypothesis of a low-dimensional attractor underlying the Divisia data.

4.
This article is concerned with the data sharpening (DS) technique in nonparametric regression under the setting where the multivariate predictor is embedded in an unknown low-dimensional manifold. The theoretical asymptotic bias is derived, which reveals that the proposed DS estimator has a reduced bias compared to the usual local linear estimator. The asymptotic normality of the DS estimator is also established. Simulation and applications to real data confirm that the bias reduction for the DS estimator supported on an unknown manifold is evident.

5.
Carbon dioxide is one of the major contributors to global warming. In the present study, we develop a differential equation to model carbon dioxide emission data in the atmosphere using a functional linear regression approach. In the proposed method, a differential operator is defined as a data smoother and we use the penalized least squares fitting criterion to smooth the data. The profile error sum of squares is optimized to estimate the differential operators using functional regression. The solution of the developed differential equation estimates and predicts the rate of change of carbon dioxide in the atmosphere at a particular time. We apply the proposed model to fit carbon dioxide emission data in the continental United States. Numerical simulations of a number of test cases show satisfactory agreement with real data.

6.
Statistical inference of genetic regulatory networks is essential for understanding temporal interactions of regulatory elements inside cells. In this work, we propose to infer the parameters of the ordinary differential equations using techniques from functional data analysis (FDA) by regarding the observed time course expression data as continuous-time curves. For networks with a large number of genes, we take advantage of the sparsity of the networks by penalizing the linear coefficients with an L1 norm. The ability of the algorithm to infer network structure is demonstrated using the cell-cycle time course data for Saccharomyces cerevisiae.

7.
A new approach is introduced in this article for describing and visualizing time series of curves, where each curve has the particularity of being subject to changes in regime. For this purpose, the curves are represented by a regression model including a latent segmentation, and their temporal evolution is modeled through a Gaussian random walk over low-dimensional factors of the regression coefficients. The resulting model is a particular state-space model involving discrete and continuous latent variables, whose parameters are estimated across a sequence of curves through a dedicated variational Expectation-Maximization algorithm. The experimental study conducted on simulated data and real time series of curves has shown encouraging results in terms of visualization of their temporal evolution and forecasting.

8.
The least absolute shrinkage and selection operator (lasso) has been widely used in regression analysis. Based on the piecewise linear property of the solution path, least angle regression provides an efficient algorithm for computing the solution paths of lasso. Group lasso is an important generalization of lasso that can be applied to regression with grouped variables. However, the solution path of group lasso is not piecewise linear and hence cannot be obtained by least angle regression. By transforming the problem into a system of differential equations, we develop an algorithm for efficient computation of group lasso solution paths. Simulation studies are conducted for comparing the proposed algorithm to the best existing algorithm: the groupwise-majorization-descent algorithm.

9.
Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L2 metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
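The non-invariance of k-means to linear transformations can be seen in the assignment step alone. The following is a minimal sketch, not taken from the paper: the point, the centroids, the rescaling factors, and the `nearest` and `scale` helpers are all hypothetical, and only the nearest-centroid rule of k-means is shown, not the full algorithm.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` in squared Euclidean distance."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def scale(vec, factors):
    """Apply a diagonal linear transformation (coordinate-wise rescaling)."""
    return tuple(v * f for v, f in zip(vec, factors))


# Hypothetical data: one observation and two cluster centroids.
point = (1.0, 3.0)
centroids = [(0.0, 0.0), (4.0, 3.0)]

# Assignment in the original coordinates.
before = nearest(point, centroids)

# Assignment after shrinking the second coordinate by a factor of 10.
factors = (1.0, 0.1)
after = nearest(scale(point, factors), [scale(c, factors) for c in centroids])

print(before, after)  # the assigned cluster changes under the rescaling
```

Because the same linear map is applied to the observation and to both centroids, the change in assignment comes only from the change of coordinates, which is the effect the abstract attributes to fitting curves with different basis functions.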

10.
We introduce a technique for extending the classical method of linear discriminant analysis (LDA) to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis (FLDA), is particularly useful when only fragments of the curves are observed. All the techniques associated with LDA can be extended for use with FLDA. In particular, FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one- or two-dimensional pictorial representation of a set of curves. We also extend this procedure to provide generalizations of quadratic and regularized discriminant analysis.

11.
Regression methods for common data types such as measured, count and categorical variables are well understood, but increasingly statisticians need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, whether as the response or as a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We use scoring to mean the transformation from a new observation to a score, whereas backscoring is a method to represent a score as an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment to study the motion of children with cleft lip.

12.
Cross-validation has been widely used in the context of statistical linear models and multivariate data analysis. Recently, technological advances have made it possible to collect new types of data that are in the form of curves. Statistical procedures for analysing these data, which are of infinite dimension, have been provided by functional data analysis. In functional linear regression, using statistical smoothing, estimation of slope and intercept parameters is generally based on functional principal components analysis (FPCA), which allows for finite-dimensional analysis of the problem. The estimators of the slope and intercept parameters in this context, proposed by Hall and Hosseini-Nasab [On properties of functional principal components analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol. 68 (2006), pp. 109–126], are based on FPCA, and depend on a smoothing parameter that can be chosen by cross-validation. The cross-validation criterion given there is time-consuming and hard to compute. In this work, we approximate this cross-validation criterion by another criterion so that, in a sense, a multivariate data analysis tool can be used. Then, we evaluate its performance numerically. We also treat a real dataset consisting of two variables, temperature and the amount of precipitation, and estimate the regression coefficients for the former variable in a model predicting the latter.

13.
We develop functional data analysis techniques using the differential geometry of a manifold of smooth elastic functions on an interval in which the functions are represented by a log-speed function and an angle function. The manifold's geometry provides a method for computing a sample mean function and principal components on tangent spaces. Using tangent principal component analysis, we estimate probability models for functional data and apply them to functional analysis of variance, discriminant analysis, and clustering. We demonstrate these tasks using a collection of growth curves from children aged 1–18.

14.
With time series data, there is often the issue of finding accurate approximations for the variance of such quantities as the sample autocovariance function or spectral estimate. Smith and Field (J. Time. Ser. Anal 14: 381–395, 1993) proposed a variance estimate motivated by resampling in the frequency domain. In this paper we present some results on the cumulants of this and other frequency domain estimates obtained via symbolic computation. The statistics of interest are linear combinations of products of discrete Fourier transforms. We describe an operator which calculates the joint cumulants of such statistics, and use the operator to deepen our understanding of the behaviour of the resampling based variance estimate. The operator acts as a filter for a general purpose operator described in Andrews and Stafford (J.R. Statist. Soc. B55, 613–627).

15.
Mixed-effect models are very popular for analyzing data with a hierarchical structure. In medical applications, typical examples include repeated observations within subjects in a longitudinal design and patients nested within centers in a multicenter design. Recently, however, due to medical advances, the number of fixed-effect covariates collected from each patient can be quite large, e.g., data on gene expressions of each patient, and not all of these variables are necessarily important for the outcome. So, it is very important to choose the relevant covariates correctly for obtaining the optimal inference for the overall study. On the other hand, the relevant random effects will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effect models along with maximum penalized likelihood estimation of both fixed and random-effect parameters based on general non-concave penalties. Asymptotic and variable selection consistency with oracle properties are proved for low-dimensional cases as well as for high dimensionality of non-polynomial order of sample size (number of parameters is much larger than sample size). We also provide a suitable computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed model setup (like finite mixture of regressions), illustrating the huge range of applicability of our proposal.

16.
This work proposes an extension of the functional principal components analysis (FPCA) or Karhunen–Loève expansion, which can take into account non-parametrically the effects of an additional covariate. Such models can also be interpreted as non-parametric mixed effect models for functional data. We propose estimators based on kernel smoothers and a data-driven selection procedure of the smoothing parameters based on a two-step cross-validation criterion. The conditional FPCA is illustrated with the analysis of a data set consisting of egg laying curves for female fruit flies. Convergence rates are given for estimators of the conditional mean function and the conditional covariance operator when the entire curves are collected. Almost sure convergence is also proven when one observes discretized noisy sample paths only. A simulation study allows us to check the good behaviour of the estimators.

17.
The authors propose the use of self‐modelling regression to analyze longitudinal data with time invariant covariates. They model the population time curve with a penalized regression spline and use a linear mixed model for transformation of the time and response scales to fit the individual curves. Fitting is done by an iterative algorithm using off‐the‐shelf linear and nonlinear mixed model software. Their method is demonstrated in a simulation study and in the analysis of tree swallow nestling growth from an experiment that includes an experimentally controlled treatment, an observational covariate and multi‐level sampling.

18.
In this article, we present a strategy for producing low-dimensional projections that maximally separate the classes in Gaussian Mixture Model classification. The most revealing linear manifolds are those along which the classes are maximally separable. Here we consider a particular probability product kernel as a measure of similarity or affinity between the class-conditional distributions. It takes an appealing closed analytical form in the case of Gaussian mixture components. The performance of the proposed strategy has been evaluated on real data.

19.
A generalized numerical method for fitting regression curves to experimental data is given. It is an ordered iteration procedure that has enabled representation of data by non-linear forms of function containing many coefficients, including integral and non-linear differential equations that themselves require numerical solution. It requires only that the function be entered into the computer program and, unlike existing techniques, requires no further analysis.

20.
Given a noisy time series (or signal), one may wish to remove the noise from the observed series. Assuming that the noise-free series lies in some low-dimensional subspace of rank r, a common approach is to embed the noisy time series into a Hankel trajectory matrix. The singular value decomposition is then used to deconstruct the Hankel matrix into a sum of rank-one components. We wish to demonstrate that there may be some potential in using difference-based methods of the observed series in order to provide guidance regarding the separation of the noise from the signal, and to estimate the rank of the low-dimensional subspace in which the true signal is assumed to lie.
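The Hankel embedding mentioned in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' method: the series, the window length, and the `hankel` and `rank` helpers are hypothetical, and an exact rank computation over the rationals stands in for the singular value decomposition so that the example needs only the standard library.

```python
from fractions import Fraction


def hankel(series, window):
    """Trajectory matrix: row i holds series[i], ..., series[i + K - 1],
    where K = len(series) - window + 1."""
    k = len(series) - window + 1
    return [[Fraction(series[i + j]) for j in range(k)] for i in range(window)]


def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


signal = [2 * t + 1 for t in range(8)]  # hypothetical noise-free linear trend
noisy = signal[:]
noisy[3] += 1                           # perturb a single observation

clean_rank = rank(hankel(signal, 4))
noisy_rank = rank(hankel(noisy, 4))
print(clean_rank, noisy_rank)
```

A linear trend gives a trajectory matrix of rank 2, while perturbing even one observation raises the rank. With real-valued noise the rank is not exactly identifiable this way, which is why the singular values (or, as the abstract suggests, difference-based diagnostics) are examined instead.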
