Similar literature
 20 similar documents found (search time: 15 ms)
1.
This paper presents a study on symmetry of repeated bi-phased data signals, in particular, on quantification of the deviation between the two parts of the signal. Three symmetry scores are defined using functional data techniques such as smoothing and registration. One score is related to the L²-distance between the two parts of the signal, whereas the other two are constructed to specifically measure differences in amplitude and phase. Moreover, symmetry scores based on functional principal component analysis (PCA) are examined. The scores are applied to acceleration signals from a study on equine gait. The scores turn out to be highly associated with lameness, and their applicability for lameness quantification and detection is investigated. Four classification approaches turn out to give similar results. The scores describing amplitude and phase variation turn out to outperform the PCA scores when it comes to the classification of lameness.

2.
In this article, we propose a novel approach to fit a functional linear regression in which both the response and the predictor are functions. We consider the case where the response and the predictor processes are both sparsely sampled at random time points and are contaminated with random errors. In addition, the random times are allowed to be different for the measurements of the predictor and the response functions. This situation often occurs in longitudinal data settings. To estimate the covariance and the cross-covariance functions, we use a regularization method over a reproducing kernel Hilbert space. The estimate of the cross-covariance function is used to obtain estimates of the regression coefficient function and of the functional singular components. We derive the convergence rates of the proposed cross-covariance, regression coefficient, and singular component function estimators. Furthermore, we show that, under some regularity conditions, the estimator of the coefficient function has a minimax optimal rate. We conduct a simulation study and demonstrate the merits of the proposed method by comparing it to other existing methods in the literature. We illustrate the method with an application to a real-world air quality dataset. The Canadian Journal of Statistics 47: 524–559; 2019 © 2019 Statistical Society of Canada

3.
4.
One of the basic statistical methods of dimensionality reduction is the analysis of discriminant coordinates introduced by Fisher (1936) and Rao (1948). The space of discriminant coordinates is convenient for presenting multidimensional data originating from multiple groups and for applying various classification methods (methods of discriminant analysis). In the present paper, we adapt the classical discriminant coordinates analysis to multivariate functional data. The theory has been applied to the analysis of textural properties of apples of six varieties, measured over a period of 180 days and stored in two types of refrigeration chamber.

5.
Statistical methods for an asymmetric normal classification do not adapt well to the situations where the population distributions are perturbed by an interval-screening scheme. This paper explores methods for providing an optimal classification of future samples in this situation. The properties of the screened population distributions are considered and two optimal regions for classifying the future samples are obtained. These developments yield yet other rules for the interval-screened asymmetric normal classification. The rules are studied from several aspects such as the probability of misclassification, robustness, and estimation of the rules. The investigation of the performance of the rules as well as the illustration of the screened classification idea, using two numerical examples, is also considered.

6.
This paper deals with the problem of increasing the number of air pollution monitoring stations in Tehran city for efficient spatial prediction. As the data are multivariate and skewed, we introduce two multivariate skew models by extending the univariate skew Gaussian random field proposed by Zareifard and Jafari Khaledi [21]. These models provide extensions of the linear model of coregionalization for non-Gaussian data. In the Bayesian framework, the optimal network design is found based on the maximum entropy criterion. A Markov chain Monte Carlo algorithm is developed to implement posterior inference. Finally, the applicability of the two proposed models is demonstrated by analyzing an air pollution data set.

7.
We propose a hybrid two-group classification method that integrates linear discriminant analysis, a polynomial expansion of the basis (or variable space), and a genetic algorithm with multiple crossover operations to select variables from the expanded basis. Using new product launch data from the biochemical industry, we found that the proposed algorithm offers mean percentage decreases in the misclassification error rate of 50%, 56%, 59%, 77%, and 78% in comparison to a support vector machine, artificial neural network, quadratic discriminant analysis, linear discriminant analysis, and logistic regression, respectively. These improvements correspond to annual cost savings of $4.40–$25.73 million.

8.
We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data as well as after deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. Populations compared were: (1) multivariate normal with covariance matrices Σ1 = Σ2, (2) multivariate normal with Σ1 ≠ Σ2, and (3) multivariate non-normal with Σ1 = Σ2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly but all were better than non-replacement.
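As a rough illustration of the first replacement method listed above (variable means), the sketch below fills each missing entry with the mean of the observed values in its column. The function name `impute_column_means` and the use of `None` to mark a missing entry are assumptions made for this example, not details taken from the study.

```python
def impute_column_means(rows):
    """Return a copy of rows with each None replaced by its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: every column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical data: two variables, three observations, two missing entries.
data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(impute_column_means(data))
```

Mean replacement leaves the column means unchanged but shrinks the sample variances, which is one reason a study like this one would compare it against projection-based and EM-based alternatives.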

9.
The current paradigm for the identification of candidate drugs within the pharmaceutical industry typically involves the use of high-throughput screens. High-content screening (HCS) is the term given to the process of using an imaging platform to screen large numbers of compounds for some desirable biological activity. Classification methods have important applications in HCS experiments, where they are used to predict which compounds have the potential to be developed into new drugs. In this paper, a new classification method is proposed for batches of compounds where the rule is updated sequentially using information from the classification of previous batches. This methodology accounts for the possibility that the training data are not a representative sample of the test data and that the underlying group distributions may change as new compounds are analysed. This technique is illustrated on an example data set using linear discriminant analysis, k-nearest neighbour and random forest classifiers. Random forests are shown to be superior to the other classifiers and are further improved by the additional updating algorithm in terms of an increase in the number of true positives as well as a decrease in the number of false positives.

10.
In the context of longitudinal data analysis, a random function typically represents a subject that is often observed at a small number of time points. To remove this restriction on the number of observations per subject, we consider semiparametric partially linear regression models with mean function x^T β + g(z), where x and z are functional data. Estimators of β and g(z) are presented and some asymptotic results are given. It is shown that the estimator of the parametric component is asymptotically normal. The convergence rate of the estimator of the nonparametric component is also obtained. Here, the number of observations for each subject is completely flexible. A simulation study is conducted to investigate the finite sample performance of the proposed estimators.

11.
Studies on event occurrence may be conducted in experiments, where one or more treatment groups are compared to a control group. Most of the randomized trials are designed with equally sized groups, but this design is not always the best one. The statistical power of the study may be larger with unequal sample sizes, and researchers may want to place more participants in one group relative to the other due to resource constraints or costs. The optimal designs for discrete-time survival endpoints in trials with two groups, where different proportions of subjects in the experimental group are taken into account, can be studied using the generalized linear model. Applying a cost function, the optimal combination of the number of subjects and periods in the study and the optimal allocation ratio can be found. It is observed that the ratio of the recruitment costs in both groups, the ratio of the recruitment cost in the control group to the cost of obtaining a measurement, the size of the treatment effect, and the shape of the survival distribution have the greatest influence on the optimal design.

12.
This paper considers the problem where the linear discriminant rule is formed from training data that are only partially classified with respect to the two groups of origin. A further complication is that the data of unknown origin do not constitute an observed random sample from a mixture of the two underlying groups. Under the assumption of a homoscedastic normal model, the overall error rate of the sample linear discriminant rule formed by maximum likelihood from the partially classified training data is derived up to and including terms of the first order in the case of univariate feature data. This first-order expansion of the sample rule so formed is used to define its asymptotic efficiency relative to the rule formed from a completely classified random training set and also to the rule formed from a completely unclassified random set.

13.
For longitudinal data, the within-subject dependence structure and covariance parameters may be of practical and theoretical interest. The estimation of covariance parameters has received much attention and has been studied mainly in the framework of generalized estimating equations (GEEs). The GEE method, however, is sensitive to outliers. In this paper, an alternative set of robust generalized estimating equations for both the mean and covariance parameters is proposed in the partial linear model for longitudinal data. The asymptotic properties of the proposed estimators of the regression parameters, the non-parametric function and the covariance parameters are obtained. Simulation studies are conducted to evaluate the performance of the proposed estimators under different contaminations. The proposed method is illustrated with a real data analysis.

14.
Functional linear models are useful in longitudinal data analysis. They include many classical and recently proposed statistical models for longitudinal data and other functional data. Recently, smoothing spline and kernel methods have been proposed for estimating their coefficient functions nonparametrically, but these methods are either computationally intensive or inefficient in performance. To overcome these drawbacks, this paper proposes a simple and powerful two-step alternative. In particular, the implementation of the proposed approach via local polynomial smoothing is discussed. Methods for estimating standard deviations of estimated coefficient functions are also proposed. Some asymptotic results for the local polynomial estimators are established. Two longitudinal data sets, one of which involves time-dependent covariates, are used to demonstrate the proposed approach. Simulation studies show that our two-step approach improves on the kernel method proposed by Hoover and co-workers in several aspects such as accuracy, computational time and visual appeal of the estimators.

15.
Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

16.
The main difficulty in parametric analysis of longitudinal data lies in specifying the covariance structure. Several covariance structures, which usually reflect one series of measurements collected over time, have been presented in the literature. However, there is a lack of literature on covariance structures designed for repeated measures specified by more than one repeated factor. In this paper, a new, general method of modelling covariance structure based on the Kronecker product of underlying factor-specific covariance profiles is presented. The method has an attractive interpretation in terms of independent factor-specific contributions to the overall within-subject covariance structure and can be easily adapted to standard software.
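The Kronecker-product construction can be sketched in a few lines. The example below is only an illustration under assumed choices: it combines a compound-symmetry profile for one repeated factor with an AR(1) profile for a second one. The two profile types, the parameter values, and all function names are hypothetical and are not taken from the paper.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def compound_symmetry(n, rho):
    """n x n correlation matrix with 1 on the diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """n x n AR(1) correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


# Hypothetical design: 2 levels of one repeated factor, 3 time points.
factor_profile = compound_symmetry(2, 0.5)
time_profile = ar1(3, 0.8)
within_subject = kronecker(factor_profile, time_profile)  # 6 x 6 matrix

for row in within_subject:
    print([round(value, 3) for value in row])
```

Each entry of the 6 x 6 matrix is the product of one factor-level correlation and one time correlation, which is the "independent factor-specific contribution" reading mentioned in the abstract.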

17.
In haemodialysis patients, vascular access type is of paramount importance. Although recent studies have found that a central venous catheter is often associated with poor outcomes and that switching to an arteriovenous fistula is beneficial, studies have not fully elucidated how the effect of switching access on outcomes changes over time for patients on dialysis and whether the effect depends on switching time. In this paper, we characterise the effect of switching access type on outcomes for haemodialysis patients. This is achieved by using a new class of multiple-index varying-coefficient (MIVC) models. We develop a new estimation procedure for MIVC models based on the local linear, profile least-squares method and Cholesky decomposition. Monte Carlo simulation studies show excellent finite sample performance. Finally, we analyse the dialysis data using our method.

18.
In longitudinal clinical studies, after randomization at baseline, subjects are followed for a period of time for development of symptoms. The quantity of interest could be the mean change from baseline to a particular visit in some lab values, the proportion of responders to some threshold category at a particular visit post baseline, or the time to some important event. However, in some applications, the interest may be in estimating the cumulative distribution function (CDF) at a fixed time point post baseline. When the data are fully observed, the CDF can be estimated by the empirical CDF. When patients discontinue prematurely during the course of the study, the empirical CDF cannot be directly used. In this paper, we use multiple imputation as a way to estimate the CDF in longitudinal studies when data are missing at random. The validity of the method is assessed on the basis of the bias and the Kolmogorov–Smirnov distance. The results suggest that multiple imputation yields less bias and less variability than the often used last observation carried forward method. Copyright © 2013 John Wiley & Sons, Ltd.
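For readers unfamiliar with the two quantities named in this abstract, the sketch below computes an empirical CDF and the Kolmogorov–Smirnov distance between two samples. It does not implement multiple imputation; the function names and the sample values are hypothetical and serve only to illustrate the evaluation criterion.

```python
import bisect


def ecdf(sample):
    """Return the empirical CDF of sample as a function t -> P(X <= t)."""
    xs = sorted(sample)
    n = len(xs)

    def cdf(t):
        # bisect_right counts the observations that are <= t.
        return bisect.bisect_right(xs, t) / n

    return cdf


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # Both CDFs are right-continuous step functions, so the largest gap
    # is attained at one of the observed values.
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(t) - cdf_b(t)) for t in points)


# Hypothetical values: fully observed outcomes vs. outcomes after imputation.
observed = [1.2, 2.5, 3.1, 4.8, 5.0]
imputed = [1.0, 2.7, 3.3, 4.1, 5.6]
print(ks_distance(observed, imputed))
```

A small distance indicates that the estimated CDF tracks the reference CDF closely over the whole range of outcomes, not just at a single summary such as the mean.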

19.
Many sparse linear discriminant analysis (LDA) methods have been proposed to overcome the major problems of the classic LDA in high-dimensional settings. However, the asymptotic optimality results are limited to the case with only two classes. When there are more than two classes, the classification boundary is complicated and no explicit formulas for the classification errors exist. We consider the asymptotic optimality in high-dimensional settings for a large family of linear classification rules with an arbitrary number of classes. Our main theorem provides easy-to-check criteria for the asymptotic optimality of a general classification rule in this family as dimensionality and sample size both go to infinity and the number of classes is arbitrary. We establish the corresponding convergence rates. The general theory is applied to the classic LDA and to extensions of two recently proposed sparse LDA methods to obtain their asymptotic optimality.

20.
Current status data arise when the death of every subject in a study cannot be determined precisely, but is known only to have occurred before or after a random monitoring time. The authors discuss the analysis of such data under semiparametric linear transformation models for which they propose a general inference procedure based on estimating functions. They determine the properties of the estimates they propose for the regression parameters of the model and illustrate their technique using tumorigenicity data.

