Similar articles
Found 20 similar articles (search time: 46 ms)
1.
We present a novel methodology for a comprehensive statistical analysis of approximately periodic biosignal data. There are two main challenges in such analysis: (1) the automatic extraction (segmentation) of cycles from long, cyclostationary biosignals and (2) the subsequent statistical analysis, which in many cases involves the separation of temporal and amplitude variabilities. The proposed framework provides a principled approach for statistical analysis of such signals, which in turn allows for an efficient cycle segmentation algorithm. This is achieved using a convenient representation of functions called the square-root velocity function (SRVF). The segmented cycles, represented by SRVFs, are temporally aligned using the notion of the Karcher mean, which in turn allows for more efficient statistical summaries of signals. We show the strengths of this method through various disease classification experiments. In the case of myocardial infarction detection and localization, we show that our method compares favorably to methods described in the current literature.
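The SRVF mentioned in this abstract has a standard closed form, q(t) = f'(t) / √|f'(t)|, under which elastic distances between curves become ordinary L2 distances. The abstract gives no implementation, so the following is a minimal sketch on a synthetic cycle (the grid, signal, and function names are illustrative assumptions, not data from the paper):

```python
import numpy as np

def srvf(f, t):
    """Square-root velocity function q(t) = f'(t) / sqrt(|f'(t)|),
    written as sign(f') * sqrt(|f'|) so that points with f' = 0 map to 0."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

t = np.linspace(0.0, 1.0, 200)
f = np.sin(2 * np.pi * t)      # one synthetic "cycle" of a biosignal
q = srvf(f, t)
print(q.shape)
```

Under this representation, time-warping a curve acts on its SRVF by an isometry, which is what makes Karcher-mean alignment of the segmented cycles tractable.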

2.
Functional logistic regression is becoming more popular as there are many situations where we are interested in the relation between functional covariates (as input) and a binary response (as output). Several approaches have been advocated, and this paper goes into detail about three of them: dimension reduction via functional principal component analysis, penalized functional regression, and wavelet expansions in combination with Least Absolute Shrinkage and Selection Operator (LASSO) penalization. We discuss the performance of the three methods on simulated data and also apply the methods to data regarding lameness detection for horses. Emphasis is on classification performance, but we also discuss estimation of the unknown parameter function.
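On a common sampling grid, the first of the three approaches (dimension reduction via functional principal components, followed by logistic regression on the scores) can be approximated with ordinary multivariate tools. The sketch below uses scikit-learn's PCA as a stand-in for FPCA on densely sampled curves; the data, grid, and component count are illustrative assumptions, not the horse-lameness data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
y = rng.integers(0, 2, size=100)
# Two classes of curves that differ in the amplitude of a smooth component.
X = np.sin(2 * np.pi * t) * (1 + y[:, None]) + 0.3 * rng.standard_normal((100, 50))

# FPCA-then-logistic, approximated by PCA scores on the discretized curves.
clf = make_pipeline(PCA(n_components=3), LogisticRegression())
clf.fit(X, y)
print(clf.score(X, y))
```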

3.
This study considers the binary classification of functional data collected in the form of curves. In particular, we assume a situation in which the curves are highly mixed over the entire domain, so that the global discriminant analysis based on the entire domain is not effective. This study proposes an interval-based classification method for functional data: the informative intervals for classification are selected and used for separating the curves into two classes. The proposed method, called functional logistic regression with fused lasso penalty, combines the functional logistic regression as a classifier and the fused lasso for selecting discriminant segments. The proposed method automatically selects the most informative segments of functional data for classification by employing the fused lasso penalty and simultaneously classifies the data based on the selected segments using the functional logistic regression. The effectiveness of the proposed method is demonstrated with simulated and real data examples.
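The fused lasso penalty referred to here combines an L1 penalty on the coefficients with an L1 penalty on their successive differences, so estimated coefficient functions come out sparse and piecewise constant, i.e. supported on a few intervals. A minimal sketch of the penalty itself (the weights and example coefficient vectors are illustrative):

```python
import numpy as np

def fused_lasso_penalty(beta, lam1, lam2):
    """lam1 * sum |beta_j|  +  lam2 * sum |beta_{j+1} - beta_j|.
    The second term charges for every jump, favouring piecewise-constant
    coefficient functions whose support forms a few intervals."""
    return lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(np.abs(np.diff(beta)))

# One informative interval vs. an equally large but wiggly coefficient:
flat = np.r_[np.zeros(10), np.ones(10), np.zeros(10)]
wiggly = np.tile([0.0, 1.0], 15)
print(fused_lasso_penalty(flat, 1.0, 1.0))    # 10 (size) + 2 (two jumps)
print(fused_lasso_penalty(wiggly, 1.0, 1.0))  # far larger: 29 jumps
```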

4.
This paper investigates several techniques to discriminate two multivariate stationary signals. The methods considered include Gaussian likelihood ratio tests for variance equality, a chi-squared time-domain test, and a spectral-based test. The latter two tests assess equality of the multivariate autocovariance function of the two signals over many different lags. The Gaussian likelihood ratio test is perhaps best viewed as principal component analysis (PCA) without its dimension reduction aspects; it can be modified to consider covariance features other than variances via dimension augmentation tactics. A simulation study is constructed that shows how one can make inappropriate conclusions with PCA tests, even when dimension augmentation techniques are used to incorporate non-zero lag autocovariances into the analysis. The various discrimination methods are first discussed, and a simulation study then illuminates their properties. In this pursuit, calculations are needed to identify several multivariate time series models with specific autocovariance properties. To demonstrate the applicability of the methods, nine US and Canadian weather stations from three distinct regions are clustered. Here, the spectral clustering perfectly identified the distinct regions, the chi-squared test performed marginally, and the PCA/likelihood ratio method did not perform well.

5.
Principal component analysis (PCA) and functional principal component analysis are key tools in multivariate analysis, in particular for modelling yield curves, but little attention is given to questions of uncertainty, either in the components themselves or in derived quantities such as scores. Actuaries using PCA to model yield curves to assess interest rate risk for insurance companies are required to show any uncertainty in their calculations. Asymptotic results based on assumptions of multivariate normality are unsatisfactory for modest samples, and application of bootstrap methods is not straightforward, with the novel pitfalls of possible inversions in the order of sample components and reversals of signs. We present methods for overcoming these difficulties and discuss other potential hazards that may arise.
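The two bootstrap pitfalls named here, component order swapping and sign reversal across resamples, are commonly handled by aligning each bootstrap loading matrix to the full-sample loadings before summarizing. A minimal sketch of one such alignment (matching by absolute inner product, then flipping signs; the function and variable names are ours, not the paper's):

```python
import numpy as np

def align_components(ref, boot):
    """Align bootstrap loadings (rows) to reference loadings:
    reorder rows by largest absolute inner product with each reference
    component, then flip signs to agree with the reference.
    Assumes the match is unambiguous (no ties in the inner products)."""
    order = np.abs(ref @ boot.T).argmax(axis=1)
    aligned = boot[order]
    signs = np.sign((ref * aligned).sum(axis=1))
    return aligned * signs[:, None]

ref = np.array([[1.0, 0.0], [0.0, 1.0]])
boot = np.array([[0.0, -1.0], [1.0, 0.0]])   # swapped order, flipped sign
print(align_components(ref, boot))
```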

6.
Comprehensive evaluation of large systems is an important branch of comprehensive evaluation. Building on an analysis of the weaknesses of traditional principal component analysis with respect to dimension-reduction effectiveness and weight coefficients, this paper proposes an improved principal component evaluation method for the comprehensive evaluation of large systems: multiple principal component evaluation. After a theoretical argument for its soundness, the method's effectiveness is verified with an example on the operating performance of commercial banks.
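The abstract does not spell out the "multiple principal component evaluation" procedure, but the traditional method it improves on, scoring units by the first few principal components weighted by their explained-variance shares, is standard and can be sketched as follows (the data and component count are illustrative assumptions):

```python
import numpy as np

def pca_composite_score(X, k):
    """Traditional PCA composite evaluation: standardise the indicators,
    take the top-k principal component scores, and weight them by their
    shares of explained variance."""
    Xc = (X - X.mean(0)) / X.std(0, ddof=1)
    w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    idx = np.argsort(w)[::-1][:k]
    shares = w[idx] / w.sum()
    return (Xc @ V[:, idx]) @ shares

rng = np.random.default_rng(6)
base = rng.standard_normal((20, 1))
X = rng.standard_normal((20, 6))
X[:, :3] += base                 # three correlated performance indicators
scores = pca_composite_score(X, 2)   # one composite score per bank
print(scores.shape)
```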

7.
Abstract.  Functional data analysis is a growing research field as more and more practical applications involve functional data. In this paper, we focus on the problem of regression and classification with functional predictors: the suggested model combines an efficient dimension reduction procedure [functional sliced inverse regression, first introduced by Ferré & Yao (Statistics, 37, 2003, 475)], for which we give a regularized version, with the accuracy of a neural network. Some consistency results are given and the method is successfully applied to real-life data.

8.
The problem of estimating the time-varying frequency, phase and amplitude of a real-valued harmonic signal is considered. It is assumed that the frequency and amplitude are unspecified, rapidly time-varying functions of time. The technique is based on fitting a local polynomial approximation of the phase and amplitude, which implements a high-order nonlinear nonparametric estimator. The estimator is shown to be strongly consistent and asymptotically Gaussian. In particular, the convergence rates O(h^{-3/2}) and O(h^{-5/2}), where h is the number of observations, are obtained for the frequency estimator when the amplitude is an unknown constant or linear in time, respectively. The orders of the bias and the Gaussian distribution are obtained for a class of time-varying frequencies and amplitudes with bounded second derivatives. A priori information about the unknown time-varying frequency and amplitude and their derivatives can be incorporated to improve the accuracy of the estimation. Simulation results are given.

9.
Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

10.
The traditional classification is based on the assumption that the distribution of the indicator variable X in one class is homogeneous. However, when data in one class come from a heterogeneous distribution, the likelihood ratio of the two classes is not unique. In this paper, we construct the classification via an ambiguity criterion for the case of distributional heterogeneity of X within a single class. Historical data, separated by situation, are used to estimate a threshold for each situation, and the final boundary is chosen from the maximum and minimum of these thresholds. Our approach obtains the minimum ambiguity with high classification accuracy, allowing for a precise decision. In addition, nonparametric estimation of the classification region and theoretical properties are derived. A simulation study and real data analysis are reported to demonstrate the effectiveness of our method.

11.
Summary.  The paper introduces a semiparametric model for functional data. The warping functions are assumed to be linear combinations of q common components, which are estimated from the data (hence the name 'self-modelling'). Even small values of q provide remarkable model flexibility, comparable with nonparametric methods. At the same time, this approach avoids overfitting because the common components are estimated combining data across individuals. As a convenient by-product, component scores are often interpretable and can be used for statistical inference (an example of classification based on scores is given).

12.
Generalized discriminant analysis based on distances
This paper describes a method of generalized discriminant analysis based on a dissimilarity matrix to test for differences in a priori groups of multivariate observations. Use of classical multidimensional scaling produces a low-dimensional representation of the data for which Euclidean distances approximate the original dissimilarities. The resulting scores are then analysed using discriminant analysis, giving tests based on the canonical correlations. The asymptotic distributions of these statistics under permutations of the observations are shown to be invariant to changes in the distributions of the original variables, unlike the distributions of the multi-response permutation test statistics which have been considered by other workers for testing differences among groups. This canonical method is applied to multivariate fish assemblage data, with Monte Carlo simulations to make power comparisons and to compare theoretical results and empirical distributions. The paper proposes classification based on distances. Error rates are estimated using cross-validation.
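The pipeline described, classical (Torgerson) scaling of the dissimilarity matrix followed by discriminant analysis of the resulting scores, can be sketched directly: classical MDS is a few lines of linear algebra, and LDA then operates on the low-dimensional coordinates. The data below are synthetic, and Euclidean distances are used so the embedding is essentially exact:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def classical_mds(D, k):
    """Torgerson scaling: double-centre -0.5 * D^2 and use the top-k
    eigenvectors, scaled by sqrt(eigenvalue), as coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

rng = np.random.default_rng(1)
X = np.r_[rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4))]
groups = np.r_[np.zeros(30), np.ones(30)]
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise dissimilarities
Z = classical_mds(D, 2)
lda = LinearDiscriminantAnalysis().fit(Z, groups)
print(lda.score(Z, groups))
```

In the paper's setting the dissimilarities need not be Euclidean, which is why the eigenvalues are clipped at zero before taking square roots.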

13.
We consider classification in the situation of two groups with normally distributed data in the ‘large p, small n’ framework. To counterbalance the high number of variables, we consider the thresholded independence rule. An upper bound on the classification error is established that is tailored to a mean value of interest in biological applications.
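The thresholded independence rule itself is simple: use a diagonal (independence) discriminant, but keep only the variables whose standardised mean difference clears a threshold. A minimal sketch on synthetic ‘large p, small n’ data (the threshold and data-generating choices are illustrative):

```python
import numpy as np

def thresholded_independence_rule(X, y, tau):
    """Diagonal (independence) rule with hard-thresholded mean
    differences: variables whose standardised mean difference falls
    below tau are dropped from the discriminant."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    s2 = X.var(0)
    keep = np.abs((m1 - m0) / np.sqrt(s2)) > tau
    def classify(x):
        score = ((x - (m0 + m1) / 2) * (m1 - m0) / s2)[..., keep].sum(-1)
        return (score > 0).astype(int)
    return classify, keep

rng = np.random.default_rng(2)
n, p = 40, 200                       # 'large p, small n'
y = np.r_[np.zeros(20, int), np.ones(20, int)]
X = rng.standard_normal((n, p))
X[y == 1, :5] += 2.0                 # only 5 informative variables
rule, keep = thresholded_independence_rule(X, y, tau=1.0)
print(keep.sum(), (rule(X) == y).mean())
```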

14.
Differential analysis techniques are commonly used to offer scientists a dimension reduction procedure and an interpretable gateway to variable selection, especially when confronting high-dimensional genomic data. Huang et al. used a gene expression profile of breast cancer cell lines to identify genomic markers which are highly correlated with the in vitro sensitivity of the drug Dasatinib. They considered three statistical methods to identify differentially expressed genes and used the intersection of the results. However, the statistical methods used in that paper are not sufficient to select the genomic markers. In this paper we used three alternative statistical methods to select a combined list of genomic markers and compared the genes with those proposed by Huang et al. We then proposed to use sparse principal component analysis (Sparse PCA) to identify a final list of genomic markers. Sparse PCA takes the correlation among the genes into account and supports successful discovery of genomic markers. We present a new, small set of genomic markers that effectively separates out the groups of patients who are sensitive to the drug Dasatinib. The analysis procedure should also help scientists identify genomic markers that separate two groups.
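Sparse PCA as used here returns loadings with exact zeros, so the nonzero entries of a component double as a candidate marker list. A toy sketch with scikit-learn's SparsePCA (the data are synthetic, not the breast-cancer cell-line profiles, and the penalty weight alpha is an illustrative choice):

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(3)
# Synthetic 'expression' matrix: 50 cell lines x 100 genes, where a
# correlated block of 10 genes drives most of the variance.
base = rng.standard_normal((50, 1))
X = rng.standard_normal((50, 100)) * 0.5
X[:, :10] += base                       # correlated marker block
spca = SparsePCA(n_components=1, alpha=1.0, random_state=0)
spca.fit(X)
loadings = spca.components_[0]
markers = np.flatnonzero(loadings)      # genes with nonzero loadings
print(markers)
```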

15.
In classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of the two groups. However, for a typical case–control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t-statistic for two-group classification. Biometrics 2015, 71: 404–416) proposed the generalized t-statistic to obtain a linear discriminant function which allows for heterogeneity of the case group. Their procedure has an optimality property within the class considered. We perform a further study of the problem and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in the sense that no more efficient construction of the linear discriminant function is possible. We conduct simulation studies and real data examples to illustrate the finite sample performance and the gain our method produces in comparison with existing methods.

16.
A model-based classification technique is developed, based on mixtures of multivariate t-factor analyzers. Specifically, two related mixture models are developed and their classification efficacy studied. An AECM algorithm is used for parameter estimation, and convergence of these algorithms is determined using Aitken's acceleration. Two different techniques are proposed for model selection: the BIC and the ICL. Our classification technique is applied to data on red wine samples from Italy and to fatty acid measurements on Italian olive oils. These results are discussed and compared to more established classification techniques; under this comparison, our mixture models give excellent classification performance.

17.
The problem of classification into two univariate normal populations with a common mean is considered. Several classification rules are proposed based on efficient estimators of the common mean. Detailed numerical comparisons of probabilities of misclassifications using these rules have been carried out. It is shown that the classification rule based on the Graybill-Deal estimator of the common mean performs the best. Classification rules are also proposed for the case when variances are assumed to be ordered. Comparison of these rules with the rule based on the Graybill-Deal estimator has been done with respect to individual probabilities of misclassification.
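The Graybill-Deal estimator referenced here weights each sample mean by the inverse of its estimated variance of the mean, n_i / s_i^2, so the more precise sample dominates. A minimal sketch (the synthetic data, with a common mean of 5, are an illustrative assumption):

```python
import numpy as np

def graybill_deal(x1, x2):
    """Graybill-Deal estimator of a common mean: weight each sample
    mean by its inverse estimated variance of the mean (n_i / s_i^2)."""
    w1 = len(x1) / x1.var(ddof=1)
    w2 = len(x2) / x2.var(ddof=1)
    return (w1 * x1.mean() + w2 * x2.mean()) / (w1 + w2)

rng = np.random.default_rng(4)
x1 = rng.normal(5.0, 1.0, 50)    # precise sample
x2 = rng.normal(5.0, 4.0, 50)    # noisy sample, same mean
est = graybill_deal(x1, x2)
print(est)
```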

18.
This article introduces BestClass, a set of SAS macros, available in the mainframe and workstation environment, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule use the mathematical programming routines of the SAS/OR software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors.

19.
Model-based classification using latent Gaussian mixture models
A novel model-based classification technique is introduced based on parsimonious Gaussian mixture models (PGMMs). PGMMs, which were introduced recently as a model-based clustering technique, arise from a generalization of the mixtures of factor analyzers model and are based on a latent Gaussian mixture model. In this paper, this mixture modelling structure is used for model-based classification and the particular area of application is food authenticity. Model-based classification is performed by jointly modelling data with known and unknown group memberships within a likelihood framework and then estimating parameters, including the unknown group memberships, within an alternating expectation-conditional maximization framework. Model selection is carried out using the Bayesian information criterion and the quality of the maximum a posteriori classifications is summarized using the misclassification rate and the adjusted Rand index. This new model-based classification technique gives excellent classification performance when applied to real food authenticity data on the chemical properties of olive oils from nine areas of Italy.

20.
In this paper, we propose two new tests of the symmetry of a distribution. These tests build on the asymptotic normality of the L1-distance to symmetry of the kernel and histogram density estimates. A simulation study is carried out to evaluate the performance of the kernel-based test.
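A plug-in version of the L1-distance-to-symmetry statistic for the kernel case can be sketched by reflecting the density estimate about a centre. The median centring, grid, and normalisation below are our illustrative choices; the paper's exact construction may differ:

```python
import numpy as np
from scipy.stats import gaussian_kde

def l1_asymmetry(sample, grid):
    """L1 distance between a kernel density estimate (centred at the
    sample median) and its reflection about zero, evaluated on a
    symmetric grid by a Riemann sum."""
    centred = sample - np.median(sample)
    kde = gaussian_kde(centred)
    diff = np.abs(kde(grid) - kde(-grid))
    return diff.sum() * (grid[1] - grid[0])

rng = np.random.default_rng(5)
grid = np.linspace(-6.0, 6.0, 400)
sym = rng.standard_normal(500)      # symmetric distribution
skew = rng.exponential(1.0, 500)    # clearly asymmetric distribution
print(l1_asymmetry(sym, grid), l1_asymmetry(skew, grid))
```

A formal test would compare the statistic to its asymptotic normal distribution under the symmetry hypothesis; the sketch only computes the distance itself.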


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号