Survival data analysis aims at collecting data on durations spent in a state by a sample of units, in order to analyse the process of transition to a different state. Survival analysis applied to social and economic phenomena typically relies upon data on transitions collected, for a sample of units, in one or more follow-up surveys. We explore the effect of misclassification of the transition indicator on parameter estimates in an appropriate statistical model for the duration spent in an origin state. Some empirical investigations about the bias induced when ignoring misclassification are reported, extending the model to include the possibility that the rate of misclassification can vary across units according to the value of some covariates. Finally, it is shown how a Bayesian approach can lead to parameter estimates.

Discriminant and cluster analysis of high-dimensional time series data is increasingly needed across academic fields. To address the persistent problem of bias in distance-based classifiers for high-dimensional models, we consider a new classifier with jackknife-type bias adjustment for stationary time series data. The consistency of the classifier is shown theoretically under suitable conditions, including possibly high-dimensional settings. We also conduct a cluster analysis of real financial data.

This paper deals with the analysis of datasets where the subjects are described by the estimated means of a p-dimensional variable. Classical statistical methods of data analysis do not treat measurements affected by intrinsic variability, as in the case of estimates, so that the heterogeneity induced among subjects by this condition is not taken into account. In this paper a way to solve the problem is suggested in the context of symbolic data analysis, whose specific aim is to handle data tables where single-valued measurements are replaced by complex data structures such as frequency distributions, intervals, and sets of values. A principal component analysis is carried out according to this proposal, with a significant improvement in the treatment of information.

The Pareto distribution model assumption in the peaks over threshold method will be tested by making use of the Kolmogorov–Smirnov goodness-of-fit method. Pareto distributed variables can be transformed to exponential, and the test will be for exponentiality. It was found that the statistic can be used as an indication of where to choose the threshold and to check the Pareto model assumption.
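The transformation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a strict Pareto variable with known threshold 1, for which the logarithm is exponentially distributed; the abstract does not give the authors' exact procedure, so the function names, the shape value 2.5, and the sample size are hypothetical choices made only for illustration.

```python
import math
import random


def pareto_to_exponential(ratios):
    """Map Pareto values X/threshold (all >= 1) to the log scale.

    If X/threshold is Pareto with shape alpha, log(X/threshold) is
    exponential with rate alpha.
    """
    return [math.log(r) for r in ratios]


def ks_statistic_exponential(sample):
    """Kolmogorov-Smirnov distance between the empirical CDF of the sample
    and an exponential CDF whose rate is estimated as 1 / sample mean."""
    xs = sorted(sample)
    n = len(xs)
    rate = n / sum(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-rate * x)
        # Compare the fitted CDF with the empirical CDF just after and
        # just before the jump at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


rng = random.Random(0)

# Data that really are Pareto (threshold 1, illustrative shape 2.5).
pareto_sample = [rng.paretovariate(2.5) for _ in range(2000)]
d_pareto = ks_statistic_exponential(pareto_to_exponential(pareto_sample))

# Data that are not Pareto: uniform on [1, 2].
uniform_sample = [rng.uniform(1.0, 2.0) for _ in range(2000)]
d_uniform = ks_statistic_exponential(pareto_to_exponential(uniform_sample))

print(f"KS distance, Pareto data:  {d_pareto:.4f}")
print(f"KS distance, uniform data: {d_uniform:.4f}")
```

Because the exponential rate is estimated from the same sample, the standard Kolmogorov–Smirnov critical values do not apply directly; the sketch only compares the size of the statistic for Pareto and non-Pareto data.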

The von Mises-Fisher distribution is widely used for modeling directional data. In this article, we derive the discriminant rules based on this distribution to assign objects to pre-existing classes. We determine a distance between two von Mises-Fisher populations and we calculate estimates of the misclassification probabilities. We also analyze the behavior of the distance between two von Mises-Fisher populations and of the estimates of the misclassification probabilities when we modify the parameters of the populations, the sample sizes, or the dimension of the sphere. Finally, we present an example with real spherical data available in the literature.


One of the basic statistical methods of dimensionality reduction is analysis of discriminant coordinates given by Fisher (1936) and Rao (1948). The space of discriminant coordinates is a space convenient for presenting multidimensional data originating from multiple groups and for the use of various classification methods (methods of discriminant analysis). In the present paper, we adapt the classical discriminant coordinates analysis to multivariate functional data. The theory has been applied to analysis of textural properties of apples of six varieties, measured over a period of 180 days, stored in two types of refrigeration chamber.

Jurcen Lauter 《Statistics》2013,47(1):125-137
In the paper, it is shown that the error rate of the discriminant analysis can be diminished when restrictions for the parameters are valid. To find suitable restrictions, first the properties of hierarchical and other multivariate systems are investigated. Then, in a practical section, a modification of the discriminant analysis is offered that consists in decreasing the estimated partial correlations. Finally, in the theoretical section, it is verified that an improvement of the discriminant analysis is attained by a suitable correction of the positive and negative signs of the discriminant function.

For two or more populations of which the covariance matrices have a common set of eigenvectors, but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularized CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This article extends their work to the context of discriminant analysis for two groups, by plugging the regularized CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular for cases where there is uncertainty about the population covariance matrix structures.

In practical survey sampling, missing data are unavoidable due to nonresponse, observations rejected by editing, disclosure control, or outlier suppression. We propose a calibrated imputation approach so that valid point and variance estimates of the population (or domain) totals can be computed by the secondary users using simple complete-sample formulae. This is especially helpful for variance estimation, which generally requires additional information and tools that are unavailable to the secondary users. Our approach is natural for continuous variables, where the estimation may be either based on reweighting or imputation, including possibly their outlier-robust extensions. We also propose a multivariate procedure to accommodate the estimation of the covariance matrix between estimated population totals, which facilitates variance estimation of the ratios or differences among the estimated totals. We illustrate the proposed approach using simulation data in supplementary materials that are available online.

We study two of the classical bounds for the Bayes error P_e, Lissack and Fu’s separability bounds and Bhattacharyya’s bounds, in the classification of an observation into one of the two determined distributions, under the hypothesis that the prior probability χ itself has a probability distribution. The effectiveness of this distribution can be measured in terms of the ratio of two mean values. On the other hand, a discriminant analysis-based optimal classification rule allows us to derive the posterior distribution of χ, together with the related posterior bounds of P_e. Research partially supported by NSERC grant A 9249 (Canada). The authors wish to thank two referees, for their very pertinent comments and suggestions, that have helped to improve the quality and the presentation of the paper, and we have, whenever possible, addressed their concerns.

Discriminant analysis (DA), particularly Discriminant Coordinates (DC), is broadly applied in the scientific literature and included in many statistical software packages. DC is used to analyze biomedical data, especially for differential diagnosis on the basis of laboratory profiles. Articles handling influence analysis in DA can be found in the literature; however, this topic has scarcely been touched upon in DC. In this article, the case-deletion approach is followed to introduce a perturbation in the data, and influence measures are proposed to assess the effect on three statistics of interest: the transformation matrix, the canonical directions, and the configuration of the sample centroids.

In this article we propose a new method for constructing discriminant coordinates and their kernel variant based on regularization (ridge regression). Moreover, we compare discriminant coordinates, functional discriminant coordinates, and the kernel version of functional discriminant coordinates on 20 data sets from a wide variety of application domains, using values of the criterion of goodness and statistical tests. Our experiments show that the kernel variant of discriminant coordinates provides significantly more accurate results on the examined data sets.

This paper is concerned with the problem of selecting variables in two-group discriminant analysis for high-dimensional data with fewer observations than the dimension. We consider a selection criterion based on an approximately unbiased estimator of an AIC-type risk. When the dimension is large compared to the sample size, the AIC-type risk cannot be defined. We propose an AIC-type criterion obtained by replacing the maximum likelihood estimator with a ridge-type estimator. This idea follows Srivastava and Kubokawa (2008). It has been further extended by Yamamura et al. (2010). Simulations revealed that the proposed AIC performs well.

This paper presents a Bayesian method for the analysis of toxicological multivariate mortality data when the discrete mortality rate for each family of subjects at a given time depends on familial random effects and the toxicity level experienced by the family. Our aim is to model and analyse one set of such multivariate mortality data with large family sizes: the potassium thiocyanate (KSCN) tainted fish tank data of O'Hara Hines. The model used is based on a discretized hazard with additional time-varying familial random effects. A similar previous study (using sodium thiocyanate (NaSCN)) is used to construct a prior for the parameters in the current study. A simulation-based approach is used to compute posterior estimates of the model parameters and mortality rates and several other quantities of interest. Recent tools in Bayesian model diagnostics and variable subset selection have been incorporated to verify important modelling assumptions regarding the effects of time and heterogeneity among the families on the mortality rate. Further, Bayesian methods using predictive distributions are used for comparing several plausible models.

This article deals with a criterion for selecting variables in multiple-group discriminant analysis for high-dimensional data. The variable selection models considered for discriminant analysis in Fujikoshi (1985, 2002) are the ones based on additional information due to Rao (1948, 1970). Our criterion is based on the Akaike information criterion (AIC) for this model. The AIC has been successfully used in the literature in model selection when the dimension p is smaller than the sample size N. However, the case p > N has not been considered in the literature, because the MLE cannot be computed owing to the singularity of the within-group covariance matrix. A popular method used to address the singularity problem in high-dimensional classification is the regularized method, which replaces the within-group sample covariance matrix with a ridge-type covariance estimate to stabilize the estimate. In this article, we propose an AIC-type criterion by replacing the MLE of the within-group covariance matrix with a ridge-type estimator. This idea follows Srivastava and Kubokawa (2008). Simulations revealed that our proposed criterion performs well.

The authors discuss prior distributions that are conjugate to the multivariate normal likelihood when some of the observations are incomplete. They present a general class of priors for incorporating information about unidentified parameters in the covariance matrix. They analyze the special case of monotone patterns of missing data, providing an explicit recursive form for the posterior distribution resulting from a conjugate prior distribution. They develop an importance sampling and a Gibbs sampling approach to sample from a general posterior distribution and compare the two methods.

It is well known that linear discriminant analysis (LDA) works well and is asymptotically optimal under fixed-p-large-n situations. But Bickel and Levina (2004) showed that the LDA is as bad as random guessing when p > n. This article studies the sparse discriminant analysis via Dantzig penalized least squares. Our method avoids estimating the high-dimensional covariance matrix and does not need the sparsity assumption on the inverse of the covariance matrix. We show theoretically that the new discriminant analysis is asymptotically optimal. Simulation and real data studies show that the classifier performs better than the existing sparse methods.

The well-known chi-squared goodness-of-fit test for a multinomial distribution is generally biased when the observations are subject to misclassification. In Pardo and Zografos (2000) the problem was considered using a double sampling scheme and φ-divergence test statistics. A new problem appears if the null hypothesis is not simple because it is necessary to give estimators for the unknown parameters. In this paper the minimum φ-divergence estimators are considered and some of their properties are established. The proposed φ-divergence test statistics are obtained by calculating φ-divergences between probability density functions and by replacing parameters by their minimum φ-divergence estimators in the derived expressions. Asymptotic distributions of the new test statistics are also obtained. The testing procedure is illustrated with an example.

Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

Data analysts often explore a large database to identify the data of interest, but may not be able to specify the exact query to send to the database. A manual data exploration process is labor intensive and time-consuming. In the new paradigm of system-aided interactive data exploration, the Database Management System presents the samples to the user and engages the user in an interactive exploration process to identify the user interest. In this article, we examine a number of initial sampling techniques to identify at least one positive (i.e., interesting) sample and compare them both theoretically and empirically.
