首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到12条相似文献,搜索用时 0 毫秒
1.
Clustering algorithms like types of k-means are fast, but they are inefficient for shape clustering. There are some algorithms, which are effective, but their time complexities are too high. This paper proposes a novel heuristic to solve large-scale shape clustering. The proposed method is effective and it solves large-scale clustering problems in fraction of a second.  相似文献   

2.
A new method for constructing interpretable principal components is proposed. The method first clusters the variables, and then interpretable (sparse) components are constructed from the correlation matrices of the clustered variables. For the first step of the method, a new weighted-variances method for clustering variables is proposed. It reflects the nature of the problem that the interpretable components should maximize the explained variance and thus provide sparse dimension reduction. An important feature of the new clustering procedure is that the optimal number of clusters (and components) can be determined in a non-subjective manner. The new method is illustrated using well-known simulated and real data sets. It clearly outperforms many existing methods for sparse principal component analysis in terms of both explained variance and sparseness.  相似文献   

3.
Outlier detection has been used extensively in data analysis to detect anomalous observation in data. It has important applications such as in fraud detection and robust analysis, among others. In this paper, we propose a method in detecting multiple outliers in linear functional relationship model for circular variables. Using the residual values of the Caires and Wyatt model, we applied the hierarchical clustering approach. With the use of a tree diagram, we illustrate the detection of outliers graphically. A Monte Carlo simulation study is done to verify the accuracy of the proposed method. Low probability of masking and swamping effects indicate the validity of the proposed approach. Also, the illustrations to two sets of real data are given to show its practical applicability.  相似文献   

4.
In this work, we modify finite mixtures of factor analysers to provide a method for simultaneous clustering of subjects and multivariate discrete outcomes. The joint clustering is performed through a suitable reparameterization of the outcome (column)-specific parameters. We develop an expectation–maximization-type algorithm for maximum likelihood parameter estimation where the maximization step is divided into orthogonal sub-blocks that refer to row and column-specific parameters, respectively. Model performance is evaluated via a simulation study with varying sample size, number of outcomes and row/column-specific clustering (partitions). We compare the performance of our model with the performance of standard model-based biclustering approaches. The proposed method is also demonstrated on a benchmark data set where a multivariate binary response is considered.  相似文献   

5.
We empirically illustrate how concepts and methods involved in a grade of membership (GoM) analysis can be used to sort individuals by competence. Our study relies on a data set compiled from the international survey on higher education graduates called REFLEX. We focus on the subset of data related to the perception of own competencies. It is first decomposed into fuzzy clusters that form a hierarchical fuzzy partition. Then, we calculate a scalar measure of competencies for each fuzzy cluster, and subsequently use the individual GoM scores to combine cluster-based competencies to position individuals on a scale from 0 to 1.  相似文献   

6.
The paper presents a new approach to interrelated two-way clustering of gene expression data. Clustering of genes has been effected using entropy and a correlation measure, whereas the samples have been clustered using the fuzzy C-means. The efficiency of this approach has been tested on two well known data sets: the colon cancer data set and the leukemia data set. Using this approach, we were able to identify the important co-regulated genes and group the samples efficiently at the same time.  相似文献   

7.
A large volume of CCD X-ray spectra is being generated by the Chandra X-ray Observatory (Chandra) and XMM-Newton. Automated spectral analysis and classification methods can aid in sorting, characterizing, and classifying this large volume of CCD X-ray spectra in a non-parametric fashion, complementary to current parametric model fits. We have developed an algorithm that uses multivariate statistical techniques, including an ensemble clustering method, applied for the first time for X-ray spectral classification. The algorithm uses spectral data to group similar discrete sources of X-ray emission by placing the X-ray sources in a three-dimensional spectral sequence and then grouping the ordered sources into clusters based on their spectra. This new method can handle large quantities of data and operate independently of the requirement of spectral source models and a priori knowledge concerning the nature of the sources (i.e., young stars, interacting binaries, active galactic nuclei). We apply the method to Chandra imaging spectroscopy of the young stellar clusters in the Orion Nebula Cluster and the NGC 1333 star formation region.  相似文献   

8.
Biomarkers have the potential to improve our understanding of disease diagnosis and prognosis. Biomarker levels that fall below the assay detection limits (DLs), however, compromise the application of biomarkers in research and practice. Most existing methods to handle non-detects focus on a scenario in which the response variable is subject to the DL; only a few methods consider explanatory variables when dealing with DLs. We propose a Bayesian approach for generalized linear models with explanatory variables subject to lower, upper, or interval DLs. In simulation studies, we compared the proposed Bayesian approach to four commonly used methods in a logistic regression model with explanatory variable measurements subject to the DL. We also applied the Bayesian approach and other four methods in a real study, in which a panel of cytokine biomarkers was studied for their association with acute lung injury (ALI). We found that IL8 was associated with a moderate increase in risk for ALI in the model based on the proposed Bayesian approach.  相似文献   

9.
The aim of the article is to identify the intraday seasonality in a wind speed time series. Following the traditional approach, the marginal probability law is Weibull and, consequently, we consider seasonal Weibull law. A new estimation and decision procedure to estimate the seasonal Weibull law intraday scale parameter is presented. We will also give statistical decision-making tools to discard or not the trend parameter and to validate the seasonal model.  相似文献   

10.
Before biomarkers can be used in clinical trials or patients' management, the laboratory assays that measure their levels have to go through development and analytical validation. One of the most critical performance metrics for validation of any assay is related to the minimum amount of values that can be detected and any value below this limit is referred to as below the limit of detection (LOD). Most of the existing approaches that model such biomarkers, restricted by LOD, are parametric in nature. These parametric models, however, heavily depend on the distributional assumptions, and can result in loss of precision under the model or the distributional misspecifications. Using an example from a prostate cancer clinical trial, we show how a critical relationship between serum androgen biomarker and a prognostic factor of overall survival is completely missed by the widely used parametric Tobit model. Motivated by this example, we implement a semiparametric approach, through a pseudo-value technique, that effectively captures the important relationship between the LOD restricted serum androgen and the prognostic factor. Our simulations show that the pseudo-value based semiparametric model outperforms a commonly used parametric model for modeling below LOD biomarkers by having lower mean square errors of estimation.  相似文献   

11.
Abstract

The hypothesis tests of performance measures for an M/Ek/1 queueing system are considered. With pivotal models deduced from sufficient statistics for the unknown parameters, a generalized p-value approach to derive tests about parametric functions are proposed. The focus is on derivation of the p-values of hypothesis testing for five popular performance measures of the system in the steady state. Given a sample T, let p(T) be the p values we developed. We derive a closed form expression to show that, for small samples, the probability P(p(T) ? γ) is approximately equal to γ, for 0 ? γ ? 1.  相似文献   

12.
Summary. In developmental toxicity studies with exposure before implantation, the toxin that is used may interfere with the early reproductive process and prevent the implantation of some foetuses that would otherwise have been formed. A manifestation of this effect is a decreasing trend in the number of implants per dam as the dose level increases. Because unformed foetuses are not observable, Dunson proposed a multiple-imputation approach to estimate the number of missing foetuses. We propose instead to build a model for observable quantities only, namely the observed litter sizes and the observed numbers of deaths or malformations within litters. Using the probabilistic concept of thinning, we express the probability that a foetus fails to implant in terms of the parameters of the litter size distribution. Estimation is by means of generalized estimating equations and takes into account underdispersion of the observed litter sizes and overdispersion of the numbers of foetal deaths or malformations. A combined risk measure that takes into account not just post-implantation death or malformation but also the possibility of failure to implant is proposed and used to determine the virtually safe dose VSD. It is demonstrated numerically and graphically that ignoring the possibility of unformed foetuses leads to estimates of VSD that are too liberal. The proposed generalized estimating equation approach to estimating VSD and finding lower confidence limits is found to work well in a simulation study. We apply the proposed method to two data sets and compare the results that are obtained with those of existing studies.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号