Similar Documents
20 similar documents found (search time: 15 ms)
1.
The authors introduce an algorithm for estimating the least trimmed squares (LTS) parameters in large data sets. The algorithm performs a genetic algorithm search to form a basic subset that is unlikely to contain outliers. Rousseeuw and van Driessen (2006) suggested drawing many independent basic subsets and iterating C-steps to minimize the LTS criterion. The authors' algorithm instead constructs a genetic algorithm to form a basic subset and iterates C-steps to calculate the cost value of the LTS criterion. Genetic algorithms are successful methods for optimizing nonlinear objective functions but are often slow; here the genetic algorithm configuration can be kept simple because only a small subset of observations is searched from the data. An R package was prepared to perform Monte Carlo simulations on the algorithm. Simulation results show that the algorithm performs well even for large data sets, because only a small number of trials is performed.
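For readers unfamiliar with C-steps, the following is a minimal sketch of the concentration-step idea used by both approaches (our illustration, not the authors' R package; function names are ours): fit OLS on the current subset, rank all observations by squared residual, and keep the h best-fitting points as the next subset.

```python
import numpy as np

def c_step(X, y, subset, h):
    """One concentration step for least trimmed squares (LTS).

    Fits OLS on the current subset, then returns the h observations
    with the smallest squared residuals under that fit.
    """
    beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    r2 = (y - X @ beta) ** 2            # squared residuals for ALL observations
    new_subset = np.argsort(r2)[:h]     # keep the h best-fitting points
    return new_subset, r2[new_subset].sum()

def lts_from_subset(X, y, subset, h, max_iter=50):
    """Iterate C-steps until the trimmed sum of squares stops improving."""
    best = np.inf
    for _ in range(max_iter):
        subset, cost = c_step(X, y, subset, h)
        if cost >= best:                # C-steps never increase the criterion
            break
        best = cost
    return subset, best
```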

2.
In this paper, we utilize normal/independent (NI) distributions as a tool for robust modeling of linear mixed models (LMMs) under a Bayesian paradigm. The purpose is to develop a non-iterative sampling method that obtains approximately i.i.d. samples from the observed posterior distribution by combining the inverse Bayes formulae, sampling/importance resampling, and posterior mode estimates from the expectation-maximization algorithm, applied to LMMs with NI distributions as suggested by Tan et al. [33]. The proposed algorithm provides a novel alternative to perfect sampling and eliminates the convergence problems of Markov chain Monte Carlo methods. In order to examine the robustness of the NI class against outlying and influential observations, we present Bayesian case-deletion influence diagnostics based on the Kullback–Leibler divergence. Further, some discussions on model selection criteria are given. The new methodologies are exemplified through a real data set, illustrating the usefulness of the proposed methodology.
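The sampling/importance resampling (SIR) step at the heart of such non-iterative schemes is easy to sketch. The generic fragment below is our illustration, not the authors' implementation; `log_target`, `log_proposal`, and `draw_proposal` are placeholder callables the user supplies.

```python
import numpy as np

def sir(log_target, log_proposal, draw_proposal, n_draw=50000, n_keep=5000,
        rng=np.random.default_rng(0)):
    """Sampling/importance resampling: approximate i.i.d. draws from a target.

    log_target, log_proposal: vectorized callables returning log densities
    (up to an additive constant); draw_proposal: callable returning n samples.
    """
    theta = draw_proposal(n_draw)
    log_w = log_target(theta) - log_proposal(theta)
    w = np.exp(log_w - log_w.max())          # stabilize before normalizing
    w /= w.sum()
    idx = rng.choice(n_draw, size=n_keep, replace=False, p=w)
    return theta[idx]
```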

3.
In this paper a methodology for the delineation of local labour markets (LLMs) using evolutionary algorithms is proposed. This procedure, based on that of Flórez-Revuelta et al. [13,14], introduces three modifications. First, initial groups of municipalities with a minimum size requirement are built using the travel time between them. Second, an initiation algorithm that is not fully random is proposed. Third, a contiguity step is implemented as the final stage of the procedure. These modifications decrease the computational time of the algorithm significantly (by up to 99%) without any deterioration in the quality of the solutions. The optimization algorithm may return a set of potential solutions with very similar objective-function values that nevertheless correspond to different partitions, both in the number of markets and in their composition. In order to capture their common aspects, an algorithm based on a cluster partitioning of k-means type is presented. This stage of the procedure also provides a ranking of LLM foci that is useful for planners and administrations in decision-making processes on issues related to labour activities. Finally, to evaluate the performance of the algorithm, a toy example with artificial data is analysed. The full methodology is illustrated through a real commuting data set for the region of Aragón (Spain).

4.
In this article, we apply an autoregressive correlation structure to the analysis of balanced familial clustered data in the one-parent case with homogeneous intra-class variance. We use the quasi-least squares procedure to derive estimators of the correlation parameters and compare them with maximum likelihood and moment estimators. Asymptotically, the quasi-least squares estimators are nearly as efficient as the maximum likelihood estimators. The small-sample case is analyzed through simulation, and the quasi-least squares estimators are found to be more robust than the maximum likelihood estimators. To illustrate the estimation procedures, the data provided in Katapa (1993) are re-analyzed. For nonstationary unbalanced familial data, we outline general correlation models that are natural extensions of the structure studied in this article.

5.
6.
The area under a receiver operating characteristic (ROC) curve is a useful index of the accuracy of a diagnostic test. When the diagnostic ability of a new biomarker is of interest only in a certain range of specificity, the partial area under the curve becomes the more suitable summary. In this article, we extend Bamber's (1975) results and show that the partial area under an ROC curve is the probability of a constrained stochastic ordering. We then construct a ‘weighted’ Mann-Whitney statistic as an estimator of the partial area and investigate its statistical properties. A testing procedure is also developed to compare the partial areas under two ROC curves. The methods are exemplified with data from biomarkers associated with coronary heart disease.
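A simple empirical version of such a constrained Mann-Whitney estimator can be sketched as follows (our illustration, not the authors' exact statistic): the partial area over false-positive rates in (0, u] counts only the diseased/non-diseased pairs whose non-diseased score lies above the empirical (1 - u) quantile of the non-diseased sample.

```python
import numpy as np

def partial_auc(x, y, u):
    """Empirical partial AUC over false-positive rates in (0, u].

    x: non-diseased scores, y: diseased scores.  Counts the fraction of
    (x_i, y_j) pairs with y_j > x_i, restricted to the x_i at or above
    the empirical (1 - u) quantile of x, i.e. the thresholds whose
    false-positive rate is at most u.
    """
    q = np.quantile(x, 1.0 - u)
    pairs = (y[:, None] > x[None, :]) & (x[None, :] >= q)
    return pairs.mean()        # lies in [0, u]; u is the maximum attainable area
```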

7.
Sammon mapping is a nonlinear dimension reduction approach that can be used for visualization. To avoid the numerical complexity of the traditional Sammon mapping algorithm, Kovacs and Abonyi (2004) proposed a modified Sammon mapping method; however, that improvement applies only to fuzzy clustering results. By using the properties of the Fermat point, we develop a new method in this article that can be applied to any clustering result. Unlike other visualization methods, we transfer the information in the clustering results onto concentric circles around the Fermat points, so the procedure demonstrates the data structure in a more informative way and the clustering results become easier to understand, especially for nonprofessionals. The effectiveness of the proposed method is studied through an application to real data.
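The Fermat point of a cluster is its geometric median; a standard way to compute it is Weiszfeld's fixed-point iteration, sketched below (a generic routine of ours, not the authors' code).

```python
import numpy as np

def fermat_point(points, tol=1e-8, max_iter=500):
    """Geometric median (Fermat point) via Weiszfeld's algorithm.

    points: array of shape (n, d).  Each iterate is a weighted mean of
    the points, with weights inversely proportional to their distance
    from the current estimate.
    """
    z = points.mean(axis=0)                       # start from the centroid
    for _ in range(max_iter):
        d = np.linalg.norm(points - z, axis=1)
        d = np.maximum(d, 1e-12)                  # guard against a zero distance
        w = 1.0 / d
        z_new = (points * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z
```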

8.
We propose a Bayesian method to select groups of correlated explanatory variables in a linear regression framework. We do this by introducing, into the prior distribution assigned to the regression coefficients, a random matrix $G$ that encodes the group structure; the groups can thus be inferred by sampling from the posterior distribution of $G$. We then give a graph-theoretic interpretation of this random matrix $G$ as the adjacency matrix of cliques. We discuss extending the groups from cliques to more general random graphs, so that the proposed approach can be viewed as a method for finding networks of correlated covariates that are associated with the response.

9.
10.
In this paper, m-dimensional distribution functions with a truncation-invariant dependence structure are studied. Some properties of the generalized Archimedean class of copulas under this dependence structure are presented, including results on the conditions of compatibility. It is shown that the Archimedean copula, generalized as described by Jouini and Clemen [1], must take the form of the independence or the Cook-Johnson copula if it has the truncation-invariant dependence structure. We also consider a multi-parameter class of copulas derived from one-parameter Archimedean copulas. This class is shown to have a probabilistic meaning as the connecting copula of a truncated random pair with a right truncation region on the third variable. The multi-parameter copulas generated in this paper stay within the Archimedean class. We provide formulas to compute Kendall's tau and explore the dependence behavior of this multi-parameter class through examples.
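For reference, Kendall's tau for an Archimedean copula with generator \(\varphi\) has the standard closed form below; specializing to the Cook-Johnson (Clayton) generator recovers a familiar expression.

```latex
\tau = 1 + 4\int_{0}^{1} \frac{\varphi(t)}{\varphi'(t)}\,dt,
\qquad
\varphi_{\theta}(t) = \frac{t^{-\theta}-1}{\theta}
\;\Longrightarrow\;
\tau = 1 - \frac{2}{\theta+2} = \frac{\theta}{\theta+2}.
```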

11.
In this article, we consider a parametric survival model that is appropriate when the population of interest contains long-term survivors or immunes. The model, referred to as the cure rate model, was introduced by Boag [1] as a mixture model with one component representing the proportion of immunes and a distribution representing the lifetimes of the susceptible population. We propose a cure rate model based on the generalized exponential distribution that incorporates the effects of risk factors or covariates on the probability of an individual being a long-term survivor. Maximum likelihood estimators of the model parameters are obtained using the expectation-maximisation (EM) algorithm. A graphical method is also provided for assessing the goodness-of-fit of the model. We present an example illustrating the fit of this model to data on the effects of different risk factors on relapse time for drug addicts.
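The EM scheme for a mixture cure model is short enough to sketch. The version below is our illustration with a plain exponential latency in place of the article's generalized exponential; `p` is the cure fraction and `d` the event indicator.

```python
import numpy as np

def cure_model_em(t, d, n_iter=200):
    """EM for a mixture cure model: S_pop(t) = p + (1 - p) * exp(-lam * t).

    t: observed times, d: event indicators (1 = event, 0 = censored).
    Latent w_i = P(subject i is susceptible | data); events are
    susceptible with certainty, censored subjects get fractional weight.
    Exponential latency is used for brevity; the article uses the
    generalized exponential distribution.
    """
    p, lam = 0.5, 1.0 / t.mean()                  # crude starting values
    for _ in range(n_iter):
        S = np.exp(-lam * t)                      # susceptible survival at t_i
        w = np.where(d == 1, 1.0, (1 - p) * S / (p + (1 - p) * S))  # E-step
        p = 1.0 - w.mean()                        # M-step: cure fraction
        lam = d.sum() / (w * t).sum()             # M-step: weighted exponential MLE
    return p, lam
```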

12.
Accelerated failure time (AFT) models have proved useful in many contexts, though heavy censoring (as, for example, in cancer survival) and high dimensionality (as, for example, in microarray data) cause difficulties for model fitting and model selection. We propose new approaches to variable selection for censored data, based on AFT models optimized using regularized weighted least squares. The regularization uses a mixture of \(\ell _1\) and \(\ell _2\) norm penalties under two proposed elastic net type approaches: an adaptive elastic net and a weighted elastic net, extending the approaches proposed by Ghosh (Adaptive elastic net: an improvement of elastic net to achieve oracle properties, Technical Report, 2007) and Hong and Zhang (Math Model Nat Phenom 5(3):115–133, 2010), respectively. We further extend the two proposed approaches by adding censored observations as constraints in their model optimization frameworks. The approaches are evaluated on microarray data and by simulation, and their performance is compared with six other variable selection techniques: three that are generally used for censored data and three correlation-based greedy methods used for high-dimensional data.
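One common way to set up such a regularized weighted least-squares AFT fit is with Kaplan-Meier (Stute) weights, sketched below with scikit-learn's ElasticNet. This is a generic illustration under those assumptions, not the authors' adaptive or weighted elastic net, and it omits their censoring constraints.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def stute_weights(t, d):
    """Kaplan-Meier (Stute) weights for censored observations.

    t: survival times, d: event indicators (1 = event, 0 = censored).
    Censored observations get weight 0; for fully uncensored data
    every observation gets weight 1/n.
    """
    n = len(t)
    w = np.zeros(n)
    surv = 1.0
    for rank, i in enumerate(np.argsort(t, kind="stable")):
        w[i] = d[i] * surv / (n - rank)
        surv *= ((n - rank - 1) / (n - rank)) ** d[i]
    return w

def aft_elastic_net(X, t, d, alpha=0.1, l1_ratio=0.5):
    """Weighted elastic net AFT fit on log survival times."""
    w = stute_weights(t, d)
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
    model.fit(X, np.log(t), sample_weight=w)
    return model
```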

13.
Hidden semi-Markov models (HSMMs) were introduced to overcome the constraint of a geometric sojourn time distribution for the hidden states in classical hidden Markov models. Several variants of HSMMs have been proposed that model the sojourn times by a parametric or a nonparametric family of distributions. In this article, we concentrate on the nonparametric case where the duration distributions are attached to transitions rather than to states, as in most published papers on HSMMs; the underlying hidden semi-Markov chain is thus treated in its general probabilistic structure. For that case, Barbu and Limnios (2008) proposed an expectation-maximization (EM) algorithm to estimate the semi-Markov kernel and the emission probabilities that characterize the dynamics of the model. In this article, we consider an improved version of Barbu and Limnios' EM algorithm that is faster than the original. Moreover, we propose a stochastic version of the EM algorithm that achieves comparable estimates in less execution time. Numerical examples are provided that illustrate the efficient performance of the proposed algorithms.

14.
This paper proposes an intuitive clustering algorithm capable of automatically self-organizing data groups based on the original data structure. Comparisons between the proposed algorithm and the EM [1] and spherical k-means [7] algorithms are given. The numerical results show the effectiveness of the proposed algorithm, using the correct classification rate and the adjusted Rand index as evaluation criteria [5,6]. In 1995, Mayor and Queloz announced the detection of the first extrasolar planet (exoplanet) around a Sun-like star. Since then, observational efforts by astronomers have led to the detection of more than 1000 exoplanets, discoveries that may provide important information for understanding the formation and evolution of planetary systems. The proposed clustering algorithm is therefore used to study the data gathered on exoplanets. Two main implications are suggested: (1) there are three major clusters, corresponding to exoplanets in the regimes of disc, ongoing tidal, and tidal interactions, respectively; and (2) stellar metallicity does not play a key role in exoplanet migration.
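For context, the spherical k-means baseline [7] clusters unit-normalized vectors by cosine similarity; a minimal numpy sketch follows (ours, not the paper's implementation).

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=100, rng=np.random.default_rng(0)):
    """Spherical k-means: cluster unit vectors by cosine similarity.

    Rows of X are normalized to unit length; concept vectors are the
    normalized means of their assigned rows.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(X @ centers.T, axis=1)      # nearest by cosine
        new_centers = np.vstack([
            X[labels == j].sum(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        new_centers /= np.linalg.norm(new_centers, axis=1, keepdims=True)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```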

15.
Coppi et al. [7] applied Yang and Wu's [20] idea to propose a possibilistic k-means (PkM) clustering algorithm for LR-type fuzzy numbers. The memberships in the objective function of PkM no longer need to satisfy the fuzzy k-means constraint that the memberships of a data point across classes sum to one. However, the clustering performance of PkM depends on the initialization and the weighting exponent. In this paper, we propose a robust clustering method based on a self-updating procedure. The proposed algorithm not only solves the initialization problem but also obtains good clustering results. Several numerical examples demonstrate the effectiveness and accuracy of the proposed clustering method, especially its robustness to initial values and noise. Finally, three real fuzzy data sets are used to illustrate the superiority of the proposed algorithm.

16.
We introduce the log-beta Weibull regression model based on the beta Weibull distribution (Famoye et al., 2005; Lee et al., 2007). We derive expansions for the moment generating function that do not depend on complicated functions. The new regression model represents a parametric family that includes as sub-models several widely known regression models applicable to censored survival data. We employ a frequentist analysis, a jackknife estimator, and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to assess global influence. Further, several simulations are performed for different parameter settings, sample sizes, and censoring percentages. We define martingale and deviance residuals to evaluate the model assumptions; the empirical distribution of the modified residuals is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. The extended regression model is very useful for the analysis of real data and can give more realistic fits than other special regression models.

17.
The t-distribution (univariate and multivariate) has many useful applications in robust statistical analysis. Parameter estimation for the t-distribution is usually carried out by maximum likelihood (ML), with the ML estimates obtained via the expectation-maximization (EM) algorithm. In this article, we use the maximum Lq-likelihood (MLq) estimation method introduced by Ferrari and Yang (2010) to estimate all the parameters of the multivariate t-distribution, and we modify the EM algorithm to obtain the MLq estimates. We provide a simulation study and a real data example to illustrate the performance of the MLq estimators relative to the ML estimators.
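The Lq-likelihood replaces each log density with the deformed logarithm \(L_q(u) = (u^{1-q} - 1)/(1-q)\). A direct numerical illustration for a univariate t location-scale model is sketched below (our simplification; the article handles the multivariate t through a modified EM algorithm, and `nu` is held fixed here).

```python
import numpy as np
from scipy import optimize, stats

def lq(u, q):
    """Deformed logarithm: reduces to log(u) as q -> 1."""
    return np.log(u) if q == 1.0 else (u ** (1.0 - q) - 1.0) / (1.0 - q)

def mlq_t(x, nu=3.0, q=0.9):
    """Maximum Lq-likelihood for the location and scale of a t-distribution.

    q < 1 downweights low-density (outlying) observations relative to
    plain maximum likelihood, which is recovered at q = 1.
    """
    def neg_lq_lik(params):
        mu, log_sigma = params
        f = stats.t.pdf(x, df=nu, loc=mu, scale=np.exp(log_sigma))
        return -np.sum(lq(f, q))
    res = optimize.minimize(neg_lq_lik, x0=[np.median(x), np.log(x.std())])
    return {"mu": res.x[0], "sigma": np.exp(res.x[1])}
```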

18.
The Self-Healing Umbrella Sampling (SHUS) algorithm is an adaptive biasing algorithm proposed in Marsili et al. (J Phys Chem B 110(29):14011–14013, 2006) to efficiently sample a multimodal probability measure. We show that this method can be seen as a variant of the well-known Wang–Landau algorithm of Wang and Landau (Phys Rev E 64:056101, 2001a; Phys Rev Lett 86(10):2050–2053, 2001b). Adapting results on the convergence of the Wang–Landau algorithm obtained in Fort et al. (Math Comput 84(295):2297–2327, 2014a), we prove the convergence of the SHUS algorithm, and we compare the two methods in terms of efficiency. We finally propose a modification of the SHUS algorithm that increases its efficiency, and exhibit some similarities between SHUS and the well-tempered metadynamics method of Barducci et al. (Phys Rev Lett 100:020603, 2008).
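To convey the adaptive-biasing idea that SHUS shares with Wang–Landau, here is a toy flat-histogram sketch on a discrete state space (ours; the cited papers work in continuous molecular-dynamics settings): the running bias `log_g` penalizes already-visited states, and the modification factor shrinks once the visit histogram is roughly flat.

```python
import numpy as np

def wang_landau(log_pi, n, f_init=1.0, f_min=1e-6, flat=0.8,
                rng=np.random.default_rng(0)):
    """Toy Wang-Landau / flat-histogram sampler on states 0..n-1.

    log_pi: array of unnormalized log target probabilities (e.g. a
    double well).  The adaptive bias log_g converges to log_pi up to a
    constant, so sampling pi/exp(log_g) flattens the landscape -- the
    mechanism SHUS uses to escape metastable states.
    """
    log_g = np.zeros(n)                    # running bias (free-energy estimate)
    hist = np.zeros(n)
    f, state = f_init, 0
    while f > f_min:
        new = (state + rng.choice([-1, 1])) % n               # neighbor proposal
        log_acc = (log_pi[new] - log_g[new]) - (log_pi[state] - log_g[state])
        if np.log(rng.random()) < log_acc:
            state = new
        log_g[state] += f                  # penalize the state just visited
        hist[state] += 1
        if hist.min() > flat * hist.mean():                   # roughly flat?
            hist[:] = 0
            f /= 2.0                       # halve the modification factor
    return log_g - log_g.min()
```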

19.
The varying coefficient (VC) model introduced by Hastie and Tibshirani [26] is arguably one of the most remarkable recent developments in nonparametric regression theory. The VC model extends the ordinary regression model by allowing the coefficients to vary as smooth functions of an effect modifier, possibly different from the regressors. The VC model reduces modelling bias through its structure while avoiding the ‘curse of dimensionality’ problem. While the VC model has been applied widely in a variety of disciplines, its application in economics has been minimal. The central goal of this paper is to apply VC modelling to the estimation of a hedonic house price function using data from Hong Kong, one of the world's most buoyant real estate markets. We demonstrate the advantages of the VC approach over traditional parametric and semi-parametric regressions in the presence of a large number of regressors, and we combine VC modelling with quantile regression to examine the heterogeneity of the marginal effects of attributes across the distribution of housing prices.
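Estimating a VC model at a point u0 of the effect modifier reduces to a kernel-weighted least-squares fit. A local-constant sketch is below (ours, not the paper's estimator; a local-linear version would augment X with (u - u0) interaction columns).

```python
import numpy as np

def vc_coefficients(y, X, u, u0, h):
    """Local-constant estimate of beta(u0) in y_i = x_i' beta(u_i) + e_i.

    Observations are weighted by a Gaussian kernel in the effect
    modifier u, so the fit is driven by points with u_i near u0;
    h is the bandwidth.
    """
    w = np.exp(-0.5 * ((u - u0) / h) ** 2)        # kernel weights
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```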

20.
A typical problem in optimal design theory is finding an experimental design that is optimal with respect to some criterion within a class of designs; the most popular criteria include the A- and D-criteria. Regular graph designs occur in many optimality results, and if the number of blocks is large enough, an A-optimal (or D-optimal) design is among them (if any exist). To explore the landscape of designs with a large number of blocks, we introduce extensions of regular graph designs, constructed by repeatedly adding the blocks of a balanced incomplete block design to the original design. We present the results of an exact computer search for the best regular graph designs and the best extended regular graph designs with up to 20 treatments \(v\), block size \(k \le 10\), replication \(r \le 10\), and \(r(k-1)-(v-1)\lfloor r(k-1)/(v-1)\rfloor \le 9\).
