Similar Articles
20 similar articles found (search time: 703 ms)
1.
Summary.  Asymmetry is a feature of shape which is of particular interest in a variety of applications. With landmark data, the essential information on asymmetry is contained in the degree to which there is a mismatch between a landmark configuration and its relabelled and matched reflection. This idea is explored in the context of a study of facial shape in infants, where particular interest lies in identifying changes over time and in assessing residual deformity in children who have had corrective surgery for a cleft lip or cleft lip and palate. Interest lies not in whether the mean shape is asymmetric but in comparing the degrees of asymmetry in different populations. A decomposition of the asymmetry score into components that are attributable to particular features of the face is proposed. A further decomposition allows different sources of asymmetry due to position, orientation or intrinsic asymmetry to be identified for each feature. The methods are also extended to data representing anatomical curves across the face.
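As a rough illustration of the asymmetry-score idea (not the authors' decomposition), the sketch below matches a toy 2-D landmark configuration against its relabelled reflection by Procrustes superimposition and reports the residual mismatch; the landmark coordinates and the left/right relabelling are hypothetical.

```python
import numpy as np
from scipy.spatial import procrustes

# Hypothetical 2-D landmark configuration (x, y) for 4 paired facial landmarks.
X = np.array([[-1.0, 0.0], [1.1, 0.1], [-0.5, -1.0], [0.6, -1.05]])

# Reflect about the vertical axis and relabel so that left/right partners swap.
# The relabelling vector below is an assumption for this toy configuration.
relabel = [1, 0, 3, 2]
X_reflected = X.copy()
X_reflected[:, 0] *= -1          # mirror the x coordinate
X_reflected = X_reflected[relabel]

# Procrustes superimposition of the configuration onto its matched reflection;
# the 'disparity' (sum of squared differences after optimal translation,
# rotation and scaling) serves as a crude overall asymmetry score.
_, _, disparity = procrustes(X, X_reflected)
print(f"asymmetry score (Procrustes disparity): {disparity:.4f}")
```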

2.
Face recognition has important applications in forensics (criminal identification) and security (biometric authentication). The problem of face recognition has been extensively studied in the computer vision community, from a variety of perspectives. A relatively new development is the use of facial asymmetry in face recognition, and we present here the results of a statistical investigation of this biometric. We first show how facial asymmetry information can be used to perform three different face recognition tasks—human identification (in the presence of expression variations), classification of faces by expression, and classification of individuals according to sex. Initially, we use a simple classification method, and conduct a feature analysis which shows the particular facial regions that play the dominant role in achieving these three entirely different classification goals. We then pursue human identification under expression changes in greater depth, since this is the most important task from a practical point of view. Two different ways of improving the performance of the simple classifier are then discussed: (i) feature combinations and (ii) the use of resampling techniques (bagging and random subspaces). With these modifications, we succeed in obtaining near perfect classification results on a database of 55 individuals, a statistically significant improvement over the initial results as seen by hypothesis tests of proportions.
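A minimal sketch of the resampling idea (ii), assuming scikit-learn is available: a bagged ensemble with random feature subsets wrapped around a simple nearest-neighbour baseline, applied to synthetic stand-in features rather than the 55-subject face database.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for asymmetry features: 55 subjects x 3 expressions,
# 40 asymmetry measurements per face (the real features come from face images).
n_subjects, n_expressions, n_features = 55, 3, 40
subject_means = rng.normal(size=(n_subjects, n_features))
X = np.repeat(subject_means, n_expressions, axis=0) + \
    0.9 * rng.normal(size=(n_subjects * n_expressions, n_features))
y = np.repeat(np.arange(n_subjects), n_expressions)

base = KNeighborsClassifier(n_neighbors=1)            # simple baseline classifier
bagged = BaggingClassifier(base, n_estimators=50,     # bagging over bootstrap samples
                           max_features=0.5,          # ...and random feature subsets
                           random_state=0)

print("baseline accuracy:", cross_val_score(base, X, y, cv=3).mean())
print("bagged accuracy:  ", cross_val_score(bagged, X, y, cv=3).mean())
```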

3.
This article investigates maximum a-posteriori (MAP) estimation of autoregressive model parameters when the innovations (errors) follow a finite mixture of distributions that, in turn, are scale mixtures of skew-normal distributions (SMSN), an attractive and extremely flexible family of probability distributions. The proposed model can fit different types of data associated with different noise levels, and provides robust modelling with the flexibility to accommodate skewness, heavy tails, multimodality and stationarity simultaneously. Moreover, convenient hierarchical representations of the SMSN random variables allow us to develop an EM-type algorithm to compute the MAP estimates. A comprehensive simulation study illustrates the superior performance of the proposed method, and the new methodology is also applied to annual barley yield data.
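A small sketch of the innovation distribution only (not the EM-type MAP algorithm): simulating a two-component mixture of scale-mixture-of-skew-normal errors with SciPy; the weights, skewness, scale and mixing parameters below are all illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative two-component mixture of scale-mixtures of skew-normals:
# component k has skewness lam[k], and its scale is inflated by 1/sqrt(U)
# with U ~ Gamma(nu/2, rate nu/2), which yields skew-t-type innovations.
weights = np.array([0.7, 0.3])
lam     = np.array([3.0, -2.0])   # skewness parameters
sigma   = np.array([1.0, 2.5])    # base scales
nu      = 4.0                     # mixing (tail-thickness) parameter

def r_smsn_mixture(n):
    """Draw n innovations from the illustrative SMSN mixture."""
    k = rng.choice(len(weights), size=n, p=weights)
    u = rng.gamma(nu / 2.0, 2.0 / nu, size=n)          # scale-mixing variable
    z = stats.skewnorm.rvs(lam[k], scale=sigma[k], random_state=rng)
    return z / np.sqrt(u)

eps = r_smsn_mixture(5000)
print("sample skewness:", stats.skew(eps), " excess kurtosis:", stats.kurtosis(eps))
```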

4.
Summary.  The paper provides a space–time process model for total wet mercury deposition. Key methodological features that are introduced include direct modelling of deposition rather than of expected deposition, the utilization of precipitation information (there is no deposition without precipitation) without having to construct a precipitation model and the handling of point masses at 0 in the distributions of both precipitation and deposition. The result is a specification that enables spatial interpolation and temporal prediction of deposition as well as aggregation in space or time to see patterns and trends in deposition. We use weekly deposition monitoring data from the National Atmospheric Deposition Program–Mercury Deposition Network for 2003 restricted to the eastern USA and Canada. Our spatiotemporal hierarchical model allows us to interpolate to arbitrary locations and, hence, to an arbitrary grid, enabling weekly deposition surfaces (with associated uncertainties) for this region. It also allows us to aggregate weekly depositions at coarser, quarterly and annual, temporal levels.

5.
Dynamic models for spatiotemporal data   (Times cited: 1; self-citations: 0; citations by others: 1)
We propose a model for non-stationary spatiotemporal data. To account for spatial variability, we model the mean function at each time period as a locally weighted mixture of linear regressions. To incorporate temporal variation, we allow the regression coefficients to change through time. The model is cast in a Gaussian state space framework, which allows us to include temporal components such as trends, seasonal effects and autoregressions, and permits a fast implementation and full probabilistic inference for the parameters, interpolations and forecasts. To illustrate the model, we apply it to two large environmental data sets: tropical rainfall levels and Atlantic Ocean temperatures.
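A minimal Gaussian state space sketch, assuming only NumPy: a Kalman filter for a local level-plus-trend dynamic linear model, i.e. the kind of temporal component mentioned above, not the authors' full spatiotemporal mixture model.

```python
import numpy as np

def kalman_filter(y, F, G, V, W, m0, C0):
    """Kalman filter for y_t = F x_t + v_t, x_t = G x_{t-1} + w_t, v~N(0,V), w~N(0,W)."""
    m, C, means = m0, C0, []
    for yt in y:
        a = G @ m                                  # predict state
        R = G @ C @ G.T + W
        f = F @ a                                  # predict observation
        Q = F @ R @ F.T + V
        K = R @ F.T @ np.linalg.inv(Q)             # Kalman gain
        m = a + K @ (yt - f)                       # update state mean
        C = R - K @ Q @ K.T                        # update state covariance
        means.append(m.copy())
    return np.array(means)

# Local level + trend model (a simple dynamic linear model) on simulated data.
F = np.array([[1.0, 0.0]]); G = np.array([[1.0, 1.0], [0.0, 1.0]])
V = np.array([[0.5]]);      W = 0.01 * np.eye(2)
rng = np.random.default_rng(2)
y = (np.cumsum(0.1 + 0.05 * rng.normal(size=100))
     + rng.normal(scale=0.7, size=100)).reshape(-1, 1)

filtered = kalman_filter(y, F, G, V, W, m0=np.zeros(2), C0=10 * np.eye(2))
print("last filtered state (level, trend):", filtered[-1])
```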

6.
Streaming feature selection is a greedy approach to variable selection that evaluates potential explanatory variables sequentially. It selects significant features as soon as they are discovered rather than testing them all and picking the best one. Because it is so greedy, streaming selection can rapidly explore large collections of features. If significance is defined by an alpha investing protocol, then the rate of false discoveries will be controlled. The focus of attention in variable selection, however, should be on fit rather than hypothesis testing. Yet little is known about the risk of estimators produced by streaming selection and how the configuration of these estimators influences the risk. To meet these needs, we provide a computational framework based on stochastic dynamic programming that allows fast calculation of the minimax risk of a sequential estimator relative to an alternative. The alternative can be data driven or derived from an oracle. This framework allows us to compute and contrast the risk inflation of sequential estimators derived from various alpha investing rules. We find that a universal investing rule performs well over a variety of models and that estimators allowed to have larger than conventional rates of false discoveries produce generally smaller risk.
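A sketch of one common form of the alpha investing wealth update; the bidding rule and payout below are illustrative choices, not a prescription from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def alpha_investing(p_values, w0=0.05, payout=0.05):
    """One common form of alpha investing: bid alpha_j from current wealth,
    earn `payout` on each discovery, pay alpha_j/(1-alpha_j) on each failure."""
    wealth, selected = w0, []
    for j, p in enumerate(p_values):
        if wealth <= 0:
            break
        alpha_j = wealth / (2.0 * (j + 1))      # illustrative bidding rule
        if p <= alpha_j:
            selected.append(j)                  # feature selected as soon as found
            wealth += payout
        else:
            wealth -= alpha_j / (1.0 - alpha_j)
    return selected

# Stream of 200 candidate features: the first 10 are truly associated with the
# response (tiny p-values), the remaining 190 are noise.
p_stream = np.concatenate([rng.uniform(0, 1e-4, 10), rng.uniform(0, 1, 190)])
print("selected feature indices:", alpha_investing(p_stream))
```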

7.
We consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data themselves are unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate data consistent with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.
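A minimal rejection-ABC sketch of the idea, with an assumed toy forward model and an assumed sample size behind the reported mean and standard error; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Reported summary only: model output mean 2.0 with standard error 0.1
# (the underlying sample size is unknown; 20 is assumed here for illustration).
reported_mean, reported_se, n_assumed = 2.0, 0.1, 20

def simulate_model(theta, n):
    """Toy forward model: output = theta + noise."""
    return theta + rng.normal(scale=0.4, size=n)

# Rejection ABC: draw theta from the prior, simulate pseudo-data, keep draws whose
# summary statistics (mean, standard error) are close to the reported ones.
accepted = []
for _ in range(20000):
    theta = rng.uniform(0.0, 4.0)                      # prior
    y = simulate_model(theta, n_assumed)
    stat = np.array([y.mean(), y.std(ddof=1) / np.sqrt(n_assumed)])
    if np.linalg.norm(stat - [reported_mean, reported_se]) < 0.08:
        accepted.append(theta)

accepted = np.array(accepted)
print(f"posterior mean ~ {accepted.mean():.2f} from {accepted.size} accepted draws")
```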

8.
9.
Detection and correction of artificial shifts in climate series   (Times cited: 6; self-citations: 0; citations by others: 6)
Summary.  Many long instrumental climate records are available and might provide useful information in climate research. These series are usually affected by artificial shifts, due to changes in the conditions of measurement and various kinds of spurious data. A comparison with surrounding weather stations by means of a suitable two-factor model allows us to check the reliability of the series. An adapted penalized log-likelihood procedure is used to detect an unknown number of breaks and outliers. An example concerning temperature series from France confirms that a systematic comparison of the series together is valuable and allows us to correct the data even when no reliable series can be taken as a reference.
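A sketch of penalized break detection on a single difference series, using a simple least-squares dynamic program rather than the authors' two-factor model and adapted penalized log-likelihood; the penalty value and the simulated shifts are illustrative.

```python
import numpy as np

def penalized_segmentation(x, penalty):
    """Optimal partition of x into constant-mean segments minimising
    the within-segment SSE plus penalty * (number of breaks). O(n^2) DP."""
    n = len(x)
    csum, csum2 = np.cumsum(np.r_[0.0, x]), np.cumsum(np.r_[0.0, x ** 2])

    def sse(i, j):                    # cost of segment x[i:j]
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    best = np.full(n + 1, np.inf); best[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + penalty + sse(i, j)
            if cost < best[j]:
                best[j], last[j] = cost, i
    breaks, j = [], n
    while j > 0:                      # back-track the optimal segmentation
        j = last[j]
        if j > 0:
            breaks.append(j)
    return sorted(breaks)

rng = np.random.default_rng(5)
# Difference series "candidate station minus neighbours" with two artificial shifts.
x = np.r_[rng.normal(0, 1, 60), rng.normal(2.5, 1, 50), rng.normal(1.0, 1, 40)]
print("estimated break positions:", penalized_segmentation(x, penalty=3 * np.log(len(x))))
```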

10.
In the Bayesian approach, the Behrens–Fisher problem has been posed as one of estimation for the difference of two means. No Bayesian solution to the Behrens–Fisher testing problem has yet been given, perhaps because the conventional priors used are improper. While default Bayesian analysis can be carried out for estimation purposes, it poses difficulties for testing problems. This paper generates sensible intrinsic and fractional prior distributions for the Behrens–Fisher testing problem from the improper priors commonly used for estimation, which allows us to compute the Bayes factor comparing the null and the alternative hypotheses. This default model selection procedure is compared with a frequentist test and the Bayesian information criterion. We find a discrepancy: for some data sets the frequentist test and the Bayesian information criterion reject the null hypothesis while the Bayes factor for intrinsic or fractional priors does not.
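For contrast with the default Bayes factor, the sketch below runs a Welch t-test and a crude BIC comparison of the equal-means and different-means models on simulated unequal-variance samples; the common mean under H0 is estimated by the pooled sample mean purely for simplicity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, 25)          # sample 1
y = rng.normal(0.5, 2.0, 30)          # sample 2 (unequal variance: Behrens-Fisher)

# Frequentist side: Welch's t-test, which does not assume equal variances.
t, p = stats.ttest_ind(x, y, equal_var=False)
print(f"Welch t = {t:.2f}, p-value = {p:.3f}")

def gauss_loglik(z, mu):
    """Gaussian log-likelihood at mean mu with the variance profiled out."""
    s2 = np.mean((z - mu) ** 2)
    return -0.5 * len(z) * (np.log(2 * np.pi * s2) + 1)

# BIC for H0 (common mean, separate variances) vs H1 (separate means and variances).
mu0 = np.mean(np.r_[x, y])            # simplistic estimate of the common mean
ll0 = gauss_loglik(x, mu0) + gauss_loglik(y, mu0)               # 3 free parameters
ll1 = gauss_loglik(x, x.mean()) + gauss_loglik(y, y.mean())     # 4 free parameters
n = len(x) + len(y)
bic0, bic1 = -2 * ll0 + 3 * np.log(n), -2 * ll1 + 4 * np.log(n)
print(f"BIC(H0) = {bic0:.1f}, BIC(H1) = {bic1:.1f} -> prefer "
      + ("H1 (different means)" if bic1 < bic0 else "H0 (equal means)"))
```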

11.
Quantitative model validation is playing an increasingly important role in performance and reliability assessment of complex systems whenever computer modelling and simulation are involved. The focus of this paper is twofold: to pursue a Bayesian probabilistic approach to quantitative model validation with non-normal data, accounting for data uncertainty, and to investigate the impact of the normality assumption on validation accuracy. The Box–Cox transformation method is employed to transform the non-normal data, with the purpose of facilitating the overall validation assessment of computational models with higher accuracy. Explicit expressions for the interval-hypothesis-testing-based Bayes factor are derived for the transformed data in univariate and multivariate cases. A Bayesian confidence measure is presented based on the Bayes factor metric, and a generalized procedure is proposed to implement the probabilistic methodology for model validation of complicated systems. The classical hypothesis testing method is employed to conduct a comparison study, and the impact of the data normality assumption and of decision threshold variation on model assessment accuracy is investigated using both classical and Bayesian approaches. The proposed methodology and procedure are demonstrated with a univariate stochastic damage accumulation model, a multivariate heat conduction problem and a multivariate dynamic system.
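A minimal sketch of the Box–Cox step only, assuming SciPy: estimate the transformation exponent by maximum likelihood and check normality before and after, on synthetic skewed data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Skewed, strictly positive "validation data" (Box-Cox requires positive values).
errors = rng.lognormal(mean=0.0, sigma=0.6, size=200)

# Estimate the Box-Cox exponent by maximum likelihood and transform the data.
transformed, lam = stats.boxcox(errors)
print(f"estimated lambda = {lam:.2f}")

# Shapiro-Wilk normality test before and after the transformation.
print("raw data normality p-value:        ", stats.shapiro(errors).pvalue)
print("transformed data normality p-value:", stats.shapiro(transformed).pvalue)
```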

12.
The emerging field of cancer radiomics endeavors to characterize intrinsic patterns of tumor phenotypes and surrogate markers of response by transforming medical images into objects that yield quantifiable summary statistics, to which regression and machine learning algorithms may be applied for statistical interrogation. Recent literature has identified clinicopathological associations based on textural features derived from gray-level co-occurrence matrices (GLCMs), which facilitate evaluation of gray-level spatial dependence within a delineated region of interest. GLCM-derived features, however, tend to contribute highly redundant information. Moreover, when reporting selected feature sets, investigators often fail to adjust for multiplicities and commonly fail to convey the predictive power of their findings. This article presents a Bayesian probabilistic modeling framework that treats the GLCM as a multivariate object and describes its application in a cancer detection context based on computed tomography. The methodology, which circumvents processing steps and avoids evaluation of reductive and highly correlated feature sets, uses a latent Gaussian Markov random field structure to characterize spatial dependencies among GLCM cells and facilitates classification via predictive probability. Correctly predicting the underlying pathology of 81% of the adrenal lesions in our case study, the proposed method outperformed current practices, which achieved a maximum accuracy of only 59%. Simulations and theory are presented to further elucidate this comparison and to ascertain the utility of applying multivariate Gaussian spatial processes to GLCM objects.
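A sketch of GLCM feature extraction on a toy image, assuming scikit-image is available (the functions are spelled graycomatrix/graycoprops in recent releases, greycomatrix/greycoprops in older ones); this illustrates the conventional feature pipeline that the proposed framework avoids.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(8)
# Toy 8-bit "region of interest", quantised to 16 gray levels.
roi = (rng.integers(0, 256, size=(64, 64)) // 16).astype(np.uint8)

# Gray-level co-occurrence matrix for 4 offsets at a distance of 1 pixel.
glcm = graycomatrix(roi, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=16, symmetric=True, normed=True)

# A few classical Haralick-type features derived from the GLCM; note that
# such features tend to be strongly correlated with one another.
for prop in ["contrast", "homogeneity", "energy", "correlation"]:
    print(prop, graycoprops(glcm, prop).ravel())
```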

13.
We propose a heuristic for evaluating model adequacy for the Cox proportional hazard model by comparing the population cumulative hazard with the baseline cumulative hazard. We illustrate how recent results from the theory of competing risks can contribute to the analysis of data with the Cox proportional hazard model. A classical theorem on independent competing risks allows us to assess model adequacy under the hypothesis of random right censoring, and a recent result on mixtures of exponentials predicts the patterns of the conditional subsurvival functions of randomly right censored data if the proportional hazard model holds.
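A sketch of the heuristic's two ingredients on simulated data, assuming the lifelines package is available: the Cox baseline cumulative hazard and the Nelson-Aalen estimate of the population cumulative hazard, which can then be plotted and compared.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter, NelsonAalenFitter

rng = np.random.default_rng(9)
n = 500
x = rng.normal(size=n)
# Exponential survival times with hazard exp(0.7 * x) and independent censoring.
T = rng.exponential(scale=1.0 / np.exp(0.7 * x))
C = rng.exponential(scale=2.0, size=n)
df = pd.DataFrame({"time": np.minimum(T, C), "event": (T <= C).astype(int), "x": x})

# Cox model: baseline cumulative hazard H0(t).
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
H0 = cph.baseline_cumulative_hazard_

# Marginal (population) cumulative hazard via the Nelson-Aalen estimator.
naf = NelsonAalenFitter().fit(df["time"], event_observed=df["event"])
H_pop = naf.cumulative_hazard_

print(H0.tail(3))
print(H_pop.tail(3))
```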

14.
15.
Abstract.  In this paper, we propose a random varying-coefficient model for longitudinal data. This model is different from the standard varying-coefficient model in the sense that the time-varying coefficients are assumed to be subject-specific, and can be considered as realizations of stochastic processes. This modelling strategy allows us to employ powerful mixed-effects modelling techniques to efficiently incorporate the within-subject and between-subject variations in the estimators of time-varying coefficients. Thus, the subject-specific feature of longitudinal data is effectively considered in the proposed model. A backfitting algorithm is proposed to estimate the coefficient functions. Simulation studies show that the proposed estimation methods are more efficient in finite-sample performance compared with the standard local least squares method. An application to an AIDS clinical study is presented to illustrate the proposed methodologies.

16.
Generalized additive mixed models are proposed for overdispersed and correlated data, which arise frequently in studies involving clustered, hierarchical and spatial designs. This class of models allows flexible functional dependence of an outcome variable on covariates by using nonparametric regression, while accounting for correlation between observations by using random effects. We estimate nonparametric functions by using smoothing splines and jointly estimate smoothing parameters and variance components by using marginal quasi-likelihood. Because maximizing the objective functions often requires numerical integration, double penalized quasi-likelihood is proposed to make approximate inference. Frequentist and Bayesian inferences are compared. A key feature of the method proposed is that it allows us to make systematic inference on all model components within a unified parametric mixed model framework and can be easily implemented by fitting a working generalized linear mixed model by using existing statistical software. A bias correction procedure is also proposed to improve the performance of double penalized quasi-likelihood for sparse data. We illustrate the method with an application to infectious disease data and we evaluate its performance through simulation.

17.

There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that in a classified feature with known class label. Hence, in the case where the absence of class labels does not depend on the data, the expected error rate of a classifier formed from the classified and unclassified features in a partially classified sample is greater than it would be if the sample were completely classified. We propose to treat the labels of the unclassified features as missing data and to introduce a framework for their missingness as in the pioneering work of Rubin (Biometrika 63:581–592, 1976) on missingness in incomplete data analysis. An examination of several partially classified data sets in the literature suggests that the unclassified features do not occur at random in the feature space, but rather tend to be concentrated in regions of relatively high entropy. This suggests that the missingness of the labels can be modelled by representing the conditional probability of a missing label via a logistic model with a covariate depending on the entropy of the feature, or an appropriate proxy for it. We consider here the case of two normal classes with a common covariance matrix, where for computational convenience the square of the discriminant function is used as the covariate in the logistic model in place of the negative log entropy. Rather paradoxically, we show that the classifier formed in this way from the partially classified sample may have a smaller expected error rate than if the sample were completely classified.
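A sketch of the assumed missingness mechanism on simulated data, using scikit-learn: two normal classes with a common covariance, labels deleted with probability that decreases in the squared discriminant, and a logistic regression fitted to the missing-label indicator; all parameter values are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
n = 1000
y = rng.integers(0, 2, size=n)
# Two normal classes with common (identity) covariance, class 1 shifted by 1.5.
X = rng.multivariate_normal([0, 0], np.eye(2), size=n) + np.where(y[:, None] == 1, 1.5, 0.0)

# Linear discriminant score d(x) for the two classes.
lda = LinearDiscriminantAnalysis().fit(X, y)
d = lda.decision_function(X)

# Labels go missing with probability that is high when d(x)^2 is small,
# i.e. near the decision boundary, where the class entropy is high.
p_missing = 1.0 / (1.0 + np.exp(-(1.0 - 1.5 * d ** 2)))
missing = rng.uniform(size=n) < p_missing
print(f"fraction of unlabelled features: {missing.mean():.2f}")

# Fit the missingness model: logistic regression of the missing-label indicator
# on the squared discriminant, the proxy for the negative log entropy.
miss_model = LogisticRegression().fit(d.reshape(-1, 1) ** 2, missing.astype(int))
print("fitted coefficient on d(x)^2:", miss_model.coef_.ravel())
```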


18.
This paper describes a proposal for the extension of the dual multiple factor analysis (DMFA) method developed by Lê and Pagès [15] to the analysis of categorical tables in which the same set of variables is measured on different sets of individuals. The extension of DMFA is based on the transformation of categorical variables into properly weighted indicator variables, in a way analogous to that used in the multiple factor analysis of categorical variables. The DMFA of categorical variables enables visual comparison of the association structures between categories over the sample as a whole and in the various subsamples (sets of individuals). For each category, DMFA allows us to obtain its global (considering all the individuals) and partial (considering each set of individuals) coordinates in a factor space. This visual analysis allows us to compare the sets of individuals and to identify their similarities and differences. The suitability of the technique is illustrated through two applications: one using simulated data for two groups of individuals with very different association structures, and the other using real data from a voting intention survey in which some respondents were interviewed by telephone and others face to face. The results indicate that the two data collection methods, while similar, are not entirely equivalent.
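A sketch of the indicator-coding step only, not the full DMFA: categorical variables for two hypothetical groups are expanded into weighted dummy variables (an MCA-style frequency weighting is assumed) and a PCA-type analysis gives global coordinates, summarised per group.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(11)

# Same categorical variables measured on two sets of individuals
# (e.g. telephone vs face-to-face respondents); data are simulated.
def make_group(n, p_yes):
    return pd.DataFrame({
        "vote":    rng.choice(["A", "B", "C"], size=n),
        "decided": rng.choice(["yes", "no"], size=n, p=[p_yes, 1 - p_yes]),
    })

groups = {"telephone": make_group(200, 0.6), "face_to_face": make_group(300, 0.8)}
full = pd.concat(groups, names=["group", "id"])

# Indicator (dummy) coding of the categories, weighted (MCA-style assumption)
# by the inverse square root of each category's relative frequency.
Z = pd.get_dummies(full).astype(float)
weights = 1.0 / np.sqrt(Z.mean(axis=0))
Zw = (Z - Z.mean(axis=0)) * weights

# Global coordinates from a factor analysis of the weighted indicator table,
# then per-group mean coordinates for comparison of the sets of individuals.
pca = PCA(n_components=2).fit(Zw)
coords = pd.DataFrame(pca.transform(Zw), index=full.index, columns=["F1", "F2"])
print(coords.groupby(level="group").mean())
```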

19.
20.
Probabilistic Principal Component Analysis   (Times cited: 2; self-citations: 0; citations by others: 2)
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
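A minimal sketch of the closed-form maximum likelihood solution for probabilistic PCA on synthetic data, using only NumPy: the noise variance is the average of the discarded eigenvalues and the weight matrix is built from the leading eigenpairs of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(12)
# Synthetic data: 3 latent dimensions embedded in 10 observed dimensions plus noise.
n, d, q = 500, 10, 3
W_true = rng.normal(size=(d, q))
X = rng.normal(size=(n, q)) @ W_true.T + 0.3 * rng.normal(size=(n, d))

# Maximum likelihood PPCA: eigen-decompose the sample covariance, set sigma^2 to
# the mean of the discarded eigenvalues, and W = U_q (Lambda_q - sigma^2 I)^{1/2}.
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n
eigval, eigvec = np.linalg.eigh(S)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

sigma2 = eigval[q:].mean()
W_ml = eigvec[:, :q] @ np.diag(np.sqrt(eigval[:q] - sigma2))

print("estimated noise variance:", sigma2)        # should be near 0.3**2 = 0.09
print("principal subspace shape:", W_ml.shape)
```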
