Similar Documents
20 similar documents found (search time: 0 ms)
1.
We develop functional data analysis techniques using the differential geometry of a manifold of smooth elastic functions on an interval, in which the functions are represented by a log-speed function and an angle function. The manifold's geometry provides a method for computing a sample mean function and principal components on tangent spaces. Using tangent principal component analysis, we estimate probability models for functional data and apply them to functional analysis of variance, discriminant analysis, and clustering. We demonstrate these tasks using a collection of growth curves for children aged 1–18.
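A minimal numpy sketch of the mean-function and principal-component computations this abstract describes, using ordinary functional PCA on discretized curves rather than the tangent-space elastic geometry; the synthetic "growth-like" curves, sample size, and modes of variation are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 101)      # common evaluation grid
n = 40
# synthetic growth-like curves: a shared mean trend plus two modes of variation
curves = (np.outer(np.ones(n), 100 + 80 * t)
          + np.outer(rng.normal(0, 10, n), np.sin(np.pi * t))
          + np.outer(rng.normal(0, 4, n), np.cos(2 * np.pi * t)))

# sample mean function and principal components via SVD of the centered curves
mean_curve = curves.mean(axis=0)
U, s, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
explained = s**2 / np.sum(s**2)   # proportion of variance per component
```

Because the simulated curves have exactly two modes of variation, the first two components capture essentially all of the variance; rows of `Vt` are the principal component functions on the grid.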

2.
Functional data analysis is a growing research field as more and more practical applications involve functional data. In this paper, we focus on the problem of regression and classification with functional predictors: the suggested model combines an efficient dimension reduction procedure [functional sliced inverse regression, first introduced by Ferré & Yao (Statistics, 37, 2003, 475)], for which we give a regularized version, with the accuracy of a neural network. Some consistency results are given and the method is successfully applied to real-life data.

3.
4.
The problem of component choice in regression-based prediction has a long history. The main cases where important choices must be made are functional data analysis, and problems in which the explanatory variables are relatively high dimensional vectors. Indeed, principal component analysis has become the basis for methods for functional linear regression. In this context the number of components can also be interpreted as a smoothing parameter, and so the viewpoint is a little different from that for standard linear regression. However, arguments for and against conventional component choice methods are relevant to both settings and have received significant recent attention. We give a theoretical argument, which is applicable in a wide variety of settings, justifying the conventional approach. Although our result is of minimax type, it is not asymptotic in nature; it holds for each sample size. Motivated by the insight that is gained from this analysis, we give theoretical and numerical justification for cross-validation choice of the number of components that is used for prediction. In particular we show that cross-validation leads to asymptotic minimization of mean summed squared error, in settings which include functional data analysis.

5.
In this paper, we investigate the relationship between a functional random covariable and a scalar response which is subject to left-truncation by another random variable. Precisely, we use the mean squared relative error as a loss function to construct a nonparametric estimator of the regression operator for these functional truncated data. Under some standard assumptions in functional data analysis, we establish the almost sure consistency, with rates, of the constructed estimator, as well as its asymptotic normality. A simulation study on finite-sized samples is then carried out to show the efficiency of our estimation procedure and to highlight its superiority over classical kernel estimation for different levels of simulated truncated data.

6.
This article presents the results of a simulation study of variable selection in a multiple regression context that evaluates the frequency of selecting noise variables and the bias of the adjusted R2 of the selected variables when some of the candidate variables are authentic. It is demonstrated that for most samples a large percentage of the selected variables is noise, particularly when the number of candidate variables is large relative to the number of observations. The adjusted R2 of the selected variables is highly inflated.
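The phenomenon is easy to reproduce in a hedged sketch; the marginal-correlation screening rule, sample sizes, and 5% threshold below are illustrative choices, not the article's exact design, and here every candidate variable is pure noise:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 30          # many candidate variables relative to the sample size
reps = 200
n_selected, adj_r2 = [], []

for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)          # pure noise: no variable is authentic
    # screen: keep variables whose marginal correlation with y looks "significant"
    r = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    keep = np.where(r > 2 / np.sqrt(n))[0]     # roughly a 5%-level screen
    n_selected.append(len(keep))
    if len(keep):
        Z = np.column_stack([np.ones(n), X[:, keep]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sst = (y - y.mean()) @ (y - y.mean())
        r2 = 1 - (resid @ resid) / sst
        k = len(keep)
        adj_r2.append(1 - (1 - r2) * (n - 1) / (n - k - 1))

mean_selected = float(np.mean(n_selected))   # noise variables selected per run
mean_adj_r2 = float(np.mean(adj_r2))         # inflated despite y being pure noise
```

Even though the true adjusted R2 is zero, averaging it only over the post-selection fits yields a clearly positive value, which is the inflation the abstract describes.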

7.
Nonparametric functional data analysis is a field whose development started some 15 years ago, and there is now a very extensive literature on the topic (hundreds of published papers). The first aim of this survey is to discuss the state of the art in the field through a necessarily selective bibliographical survey. The second aim is to present a wide scope of open questions in order to promote further discussion. Our main purpose is restricted to methodological contributions, and we emphasize kernel functional regression analysis before extending the discussion in two directions: alternative techniques to kernel methods for functional regression, and statistical problems other than regression.
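The kernel functional regression the survey emphasizes can be sketched as a Nadaraya-Watson estimator with a kernel on the L2 distance between curves; the toy curves, the true regression operator, and the bandwidth below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 50)
n = 150
# functional covariates X_i(t) with random coefficients, scalar responses
coef = rng.normal(size=(n, 2))
X = coef[:, [0]] * np.sin(2 * np.pi * t) + coef[:, [1]] * np.cos(2 * np.pi * t)
y = (X**2).mean(axis=1) + rng.normal(0, 0.05, n)   # response = ∫ X(t)^2 dt + noise

def fnw(x_new, X, y, h):
    """Functional Nadaraya-Watson estimate with a Gaussian kernel on L2 distance."""
    d = np.sqrt(((X - x_new) ** 2).mean(axis=1))   # L2 distances to the new curve
    w = np.exp(-0.5 * (d / h) ** 2)
    return float(np.sum(w * y) / np.sum(w))

x0 = np.sin(2 * np.pi * t)      # a new curve to predict at
pred = fnw(x0, X, y, h=0.3)     # a weighted average of observed responses
```

The estimate is a convex combination of the observed responses, weighted by how close each sample curve is to `x0` in L2; the bandwidth `h` plays the usual smoothing role.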

8.
The present study investigates the performance of five discrimination methods for data consisting of a mixture of continuous and binary variables. The methods are Fisher's linear discrimination, logistic discrimination, quadratic discrimination, a kernel model, and an independence model. Six-dimensional data, consisting of three binary and three continuous variables, are simulated according to a location model. The results show an almost identical performance for Fisher's linear discrimination and logistic discrimination. Only in situations with independently distributed variables does the independence model have a reasonable discriminatory ability for the dimensionality considered. If the log likelihood ratio is non-linear with respect to its continuous and binary part, the quadratic discrimination method is substantially better than linear and logistic discrimination, followed by the kernel method. A very good performance is obtained when, in every situation, the better of linear and quadratic discrimination is used.

9.
We consider the problem of local linear estimation of the regression function when the regressor is functional. The main result of this paper is a proof of the strong convergence (with rates), uniformly in bandwidth parameters (UIB), of the considered estimator. The main interest of this result is the possibility of deriving the asymptotic properties of our estimate even when the bandwidth parameter is a random variable.

10.
Recently, there has been great interest in the analysis of longitudinal data in which the observation process is related to the longitudinal process. In the literature, the observation process has commonly been regarded as a recurrent event process. Sometimes the observations have durations, and the process is then referred to as a recurrent episode process; the medical cost related to hospitalization is an example. We propose a conditional modeling approach that takes into account both an informative observation process and the observation durations. We conducted simulation studies to assess the performance of the method and applied it to a dataset of medical costs.

11.
Prognostic studies are essential to understand the role of particular prognostic factors and, thus, improve prognosis. In most studies, the disease progression trajectories of individual patients may end with one of several mutually exclusive endpoints or can involve a sequence of different events.

One challenge in such studies concerns separating the effects of putative prognostic factors on these different endpoints and testing the differences between these effects.

In this article, we systematically evaluate and compare, through simulations, the performance of three alternative multivariable regression approaches to analyzing competing risks and multiple-event longitudinal data. The three approaches are: (1) fitting separate event-specific Cox proportional hazards models; (2) the extension of Cox's model to competing risks proposed by Lunn and McNeil; and (3) a Markov multi-state model.

The simulation design is based on a prognostic study of cancer progression, and several simulated scenarios help investigate different methodological issues relevant to the modeling of multiple-event processes of disease progression. The results highlight some practically important issues. Specifically, decreased precision in the observed timing of intermediary (non-fatal) events has a strong negative impact on the accuracy of regression coefficients estimated with either the Cox or Lunn-McNeil models, while the Markov model appears to be quite robust under the same circumstances. Furthermore, tests based on both the Markov and Lunn-McNeil models had similar power for detecting a difference between the effects of the same covariate on the hazards of two mutually exclusive events. The Markov approach also yields an accurate Type I error rate and good empirical power for testing the hypothesis that the effect of a prognostic factor changes after an intermediary event, which cannot be directly tested with the Lunn-McNeil method. Bootstrap-based standard errors improve the coverage rates for the Markov model estimates. Overall, the results of our simulations validate the Markov multi-state model for a wide range of data structures encountered in prognostic studies of disease progression, and may guide end users in choosing the model(s) most appropriate for their specific application.
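A multi-state process of the kind simulated here can be sketched as an illness-death model with constant transition hazards; the hazard values below are hypothetical, not the article's scenario parameters:

```python
import numpy as np

rng = np.random.default_rng(6)
# three-state illness-death model with constant (hypothetical) transition hazards
h01, h02, h12 = 0.10, 0.05, 0.20   # healthy->ill, healthy->dead, ill->dead

def simulate_path(rng):
    """Simulate one patient's trajectory through the illness-death model."""
    t_ill = rng.exponential(1 / h01)           # latent time to the intermediary event
    t_dead_direct = rng.exponential(1 / h02)   # latent time to death without illness
    if t_dead_direct < t_ill:
        return ("dead", t_dead_direct)         # died before the intermediary event
    t_dead_after = t_ill + rng.exponential(1 / h12)
    return ("ill-then-dead", t_dead_after)

paths = [simulate_path(rng) for _ in range(10_000)]
frac_direct = np.mean([state == "dead" for state, _ in paths])
# competing-risks theory: P(death before illness) = h02 / (h01 + h02) = 1/3
```

The empirical fraction of direct deaths matches the theoretical cause-specific probability, which is a convenient sanity check when building this kind of simulation design.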

12.
Taking Meituan as an example, functional data analysis is used to study the structure and development of the online group-buying market. Discrete group-buying data collected from the web are first converted into functional data according to the purpose of the analysis; descriptive statistics and functional principal component analysis are then applied to the functionalized data, and group purchases of different categories and from different regions are compared. The conclusions differ from those of previous group-buying studies: the current group-buying market is still dominated by the food category, shopping deals are gradually gaining acceptance, and the concentration of the leisure category is declining; Shanghai and Beijing are both fast-growing group-buying regions, with Beijing showing relatively larger fluctuations; and the concentration of group-buying sales in popular regions is diluted as group buying expands to more regions.

13.
In this article, we aim to assess hierarchical Bayesian modeling for the analysis of multiple exposures and highly correlated effects in a multilevel setting. We use an artificial data set to apply our method and show the gains in the final estimates of the crucial parameters. As a motivating example for simulating the data, we consider a real prospective cohort study designed to investigate the association of dietary exposures with the occurrence of colorectal cancer in a multilevel framework where, e.g., individuals have been enrolled from different countries or cities. We rely on the presence of additional information suitable for mediating the final effects of the exposures and for arrangement in a level-2 regression to model similarities among the parameters of interest (e.g., data on the nutrient composition of each dietary item).

14.
In applied statistical data analysis, overdispersion is a common feature. It can be addressed using both multiplicative and additive random effects. A multiplicative model for count data incorporates a gamma random effect as a multiplicative factor into the mean, whereas an additive model assumes a normally distributed random effect, entered into the linear predictor. Using Bayesian principles, these ideas are applied to longitudinal count data, based on the so-called combined model. The performance of the additive and multiplicative approaches is compared using a simulation study.
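The two random-effect constructions are easy to illustrate by simulation; this sketch uses hypothetical values for the mean and random-effect variances, and simply checks that both induce overdispersion relative to the Poisson (for the gamma mixture, the result is exactly negative binomial, with variance mu + alpha*mu^2):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
mu, alpha = 4.0, 0.5       # hypothetical mean and gamma-frailty variance

# multiplicative model: lambda_i = mu * g_i, g_i ~ Gamma(1/alpha, alpha) with E[g_i] = 1
g = rng.gamma(shape=1 / alpha, scale=alpha, size=n)
y_mult = rng.poisson(mu * g)

# additive model: log lambda_i = log(mu) + b_i, b_i ~ Normal(0, sigma^2)
sigma = 0.5
b = rng.normal(0.0, sigma, size=n)
y_add = rng.poisson(np.exp(np.log(mu) + b))

# both samples show variance exceeding the mean, i.e. overdispersion
overdispersed = (y_mult.var() > y_mult.mean()) and (y_add.var() > y_add.mean())
```

For the multiplicative sample, the empirical variance should sit near the negative binomial value mu + alpha*mu^2 = 12, while a plain Poisson would have variance equal to its mean of 4.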

15.
The first goal of this article is to consider influence analysis of principal Hessian directions (pHd) and highlight how such an analysis can provide valuable insight into its behaviour. Such insight includes reasons as to why pHd can sometimes return informative results when it is not expected to do so, and why many prefer a residuals-based pHd method over its response-based counterpart. The secondary goal of this article is to introduce a new influence measure applicable to many dimension reduction methods based on average squared canonical correlations. A general form of this measure is also given, allowing for application to dimension reduction methods other than pHd. A sample version of the measure is considered, with respect to pHd, with two example data sets.

16.
The combined model accounts for different forms of extra-variability and has traditionally been applied in the likelihood framework, or in the Bayesian setting via Markov chain Monte Carlo. In this article, integrated nested Laplace approximation is investigated as an alternative estimation method for the combined model for count data, and compared with the former estimation techniques. Longitudinal, spatial, and multi-hierarchical data scenarios are investigated in three case studies as well as a simulation study. As a conclusion, integrated nested Laplace approximation provides fast and precise estimation, while avoiding convergence problems often seen when using Markov chain Monte Carlo.

17.
This paper introduces a parametric discrete failure time model which allows a variety of smooth hazard function shapes, including shapes which are not readily available with continuous failure time models. The model is easy to fit, and statistical inference is simple. Further, it is readily extended to allow for differences between subjects while retaining the ease of fit and simplicity of statistical inference. The performance of the discrete time analysis is demonstrated by application to several data sets.

18.
Multiple assessments of an efficacy variable are often conducted prior to the initiation of randomized treatments in clinical trials as baseline information. Two goals are investigated in this article: the first is the choice of these baselines in the analysis of covariance (ANCOVA) to increase the statistical power, and the second is the magnitude of power loss when a continuous efficacy variable is dichotomized to a categorical variable, as is commonly reported in the biomedical literature. A statistical power analysis is developed with extensive simulations based on data from clinical trials in study participants with end-stage renal disease (ESRD). It is found that the choice of baselines primarily depends on the correlations among the baselines and the efficacy variable, with substantial gains in power for correlations greater than 0.6 and negligible gains for correlations less than 0.2. Continuous efficacy variables always give higher statistical power in the ANCOVA modeling, and dichotomizing the efficacy variable generally decreases the statistical power by 25%, an important practical consideration in setting the sample size and a realistic budget when designing clinical trials. These findings can easily be applied in and extended to other clinical trials with similar designs.
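The power loss from dichotomization can be illustrated with a small two-arm simulation; the effect size, per-arm sample size, median split, and normal-approximation tests below are illustrative assumptions, not the article's ESRD-based design:

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n, delta = 2000, 50, 0.5   # hypothetical trial: 50/arm, effect of 0.5 SD
z_crit = 1.96

rej_cont = rej_dich = 0
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(delta, 1.0, n)
    # two-sample z-test on the continuous outcome (normal approximation)
    se = np.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
    if abs(treated.mean() - control.mean()) / se > z_crit:
        rej_cont += 1
    # dichotomize at the pooled median, then compare the two proportions
    cut = np.median(np.concatenate([treated, control]))
    p1, p2 = np.mean(treated > cut), np.mean(control > cut)
    pbar = (p1 + p2) / 2
    se_p = np.sqrt(2 * pbar * (1 - pbar) / n)
    if se_p > 0 and abs(p1 - p2) / se_p > z_crit:
        rej_dich += 1

power_cont = rej_cont / reps   # power using the continuous outcome
power_dich = rej_dich / reps   # power after dichotomizing the same data
```

Across replicates the continuous analysis rejects the null clearly more often than the dichotomized one, in line with the power loss the abstract reports.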

19.
This paper surveys commercially available MS-DOS and Microsoft Windows based microcomputer software for survival analysis, especially for Cox proportional hazards regression and parametric survival models. Emphasis is given to functionality, documentation, generality, and flexibility of software. A discussion of the need for software integration is given, which leads to the conclusion that survival analysis software not closely tied to a well-designed package will not meet an analyst's general needs. Some standalone programs are good tools for teaching the theory of some survival analysis procedures, but they may not teach the student good data analysis techniques such as critically examining regression assumptions. We contrast typical software with a general, integrated, modeling framework that is available with S-PLUS.

20.
It is well known that non-ignorable item non-response may occur when the cause of the non-response is the value of the latent variable of interest. In such cases, a refusal by a respondent to answer specific questions in a survey should sometimes be treated as a non-ignorable item non-response. The Rasch-Rasch model (RRM) is a new two-dimensional item response theory model for addressing non-ignorable non-response. This article demonstrates the use of the RRM on data from an Italian survey focused on the assessment of healthcare workers' knowledge about sudden infant death syndrome (a context in which non-response is presumed to be more likely among individuals with a low level of competence). We compare the performance of the RRM with other models within the Rasch model family that assume the unidimensionality of the latent trait. We conclude that this assumption should be considered unreliable for the data at hand, whereas the RRM provides a better fit to the data.
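The building block of the Rasch model family is the one-parameter logistic response probability; a minimal sketch (the ability and difficulty values are arbitrary illustrations) shows why low-competence respondents would both answer incorrectly and, under non-ignorable non-response, skip items more often:

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct response) for ability theta and item difficulty b (Rasch model)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# a respondent whose ability equals the item difficulty answers correctly 50% of the time;
# low-ability respondents have a much smaller probability than high-ability ones
p_match = rasch_prob(0.0, 0.0)
p_low, p_high = rasch_prob(-2.0, 0.0), rasch_prob(2.0, 0.0)
```

The RRM couples two such dimensions, one governing the response itself and one governing the propensity to respond, which is what lets it model non-ignorable non-response.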
