Similar Literature
1.
In some classification problems it may be important to impose constraints on the set of allowable solutions. In particular, in regional taxonomy, urban and regional studies often try to segment a set of territorial data into homogeneous groups with respect to a set of socio-economic variables while, at the same time, taking contiguous neighbourhoods into account. The objects in a class are thus required not only to be similar to one another but also to form a spatially contiguous set. The rationale is that if a spatially varying phenomenon influences the objects, as can happen with geographical units, and this spatial information were ignored in constructing the classes, the phenomenon would be less likely to be detected. In this paper a constrained version of the k-means clustering method (MacQueen, 1967; Ball and Hall, 1967) and a new algorithm for carrying out such a procedure are proposed; the latter is based on the efficient algorithm proposed by Hartigan and Wong (1979). The algorithm has proved its usefulness in zoning two large regions in Italy (Calabria and Puglia).
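
The abstract does not spell out the constrained algorithm, so the following is only a minimal sketch of the idea under stated assumptions: an initial contiguous partition is improved by single-object transfers in the spirit of Hartigan and Wong, but an object may only move to a cluster that contains one of its neighbours, and only if the cluster it leaves stays connected. The adjacency lists and all function names (contiguous_kmeans, within_ss, is_connected) are hypothetical, and this is not the authors' algorithm.

```python
from collections import deque


def within_ss(points, members):
    """Sum of squared Euclidean distances from the members to their centroid."""
    if not members:
        return 0.0
    dim = len(points[0])
    centroid = [sum(points[j][d] for j in members) / len(members) for d in range(dim)]
    return sum(
        sum((points[j][d] - centroid[d]) ** 2 for d in range(dim)) for j in members
    )


def is_connected(nodes, adj):
    """Breadth-first search restricted to `nodes`; True if they form one connected piece."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def contiguous_kmeans(points, adj, labels, max_iter=100):
    """Move single objects between clusters while keeping every cluster contiguous.

    `adj[i]` is the set of neighbours of object i (hypothetical input format);
    `labels` must be an initial partition whose clusters are already contiguous.
    """
    labels = list(labels)
    n = len(points)
    for _ in range(max_iter):
        moved = False
        for i in range(n):
            src = labels[i]
            src_members = [j for j in range(n) if labels[j] == src]
            rest = [j for j in src_members if j != i]
            # Never empty a cluster, and never split the cluster being left.
            if not rest or not is_connected(rest, adj):
                continue
            best_gain, best_dst = 1e-12, None
            # Only clusters touching object i remain contiguous after receiving it.
            for dst in {labels[j] for j in adj[i]} - {src}:
                dst_members = [j for j in range(n) if labels[j] == dst]
                before = within_ss(points, src_members) + within_ss(points, dst_members)
                after = within_ss(points, rest) + within_ss(points, dst_members + [i])
                if before - after > best_gain:
                    best_gain, best_dst = before - after, dst
            if best_dst is not None:
                labels[i] = best_dst
                moved = True
        if not moved:
            break
    return labels


# Six areas along a chain; the fourth is misassigned in the starting partition.
points = [(v,) for v in [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]]
adj = [{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}]
print(contiguous_kmeans(points, adj, [0, 0, 0, 0, 1, 1]))  # [0, 0, 0, 1, 1, 1]
```

Unlike unconstrained k-means, two similar areas that are not adjacent can never be merged by a single transfer, which is exactly the behaviour the contiguity requirement asks for.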

2.
Behavioural models characterize the way customers behave in their use of a credit product. In this paper, we examine repayment and transaction behaviour with credit cards. In particular, we describe the development of Markov chain models for late repayment, investigate the extent to which there are different classes of behaviour pattern, and explore the extent to which distinct behaviours can be predicted. We also develop overall models for transaction time distributions. Once such models have been built to summarize the data, they can be used to predict likely future behaviour, and can also serve as the basis of predictions of what one might expect when economic circumstances change.
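
As a minimal illustration of a first-order Markov chain for late repayment, the sketch below estimates transition probabilities by counting observed month-to-month moves and then propagates a state distribution forward. The state coding (0 = up to date, 1 = one month late, 2 = two or more months late) and the example sequences are hypothetical; the paper's models and data are not reproduced here.

```python
from collections import Counter


def estimate_transition_matrix(sequences, states):
    """Maximum-likelihood transition probabilities from observed state sequences."""
    counts = {s: Counter() for s in states}
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        # A state never left in the data gets an all-zero row rather than a guess.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix


def predict(matrix, start, steps):
    """Distribution over states after `steps` transitions from the distribution `start`."""
    dist = {s: start.get(s, 0.0) for s in matrix}
    for _ in range(steps):
        dist = {t: sum(dist[s] * matrix[s][t] for s in matrix) for t in matrix}
    return dist


# Hypothetical monthly arrears states for two accounts.
history = [[0, 0, 1, 0], [0, 1, 2, 2]]
P = estimate_transition_matrix(history, states=[0, 1, 2])
print(predict(P, {0: 1.0}, steps=3))
```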

3.
A test of congruence among distance matrices is described. It tests the hypothesis that several matrices, containing different types of variables about the same objects, are congruent with one another, so they can be used jointly in statistical analysis. Raw data tables are turned into similarity or distance matrices prior to testing; they can then be compared to data that naturally come in the form of distance matrices. The proposed test can be seen as a generalization of the Mantel test of matrix correspondence to any number of distance matrices. This paper shows that the new test has the correct rate of Type I error and good power. Power increases as the number of objects and the number of congruent data matrices increase; power is higher when the total number of matrices in the study is smaller. To illustrate the method, the proposed test is used to test the hypothesis that matrices representing different types of organoleptic variables (colour, nose, body, palate and finish) in single‐malt Scotch whiskies are congruent.
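
One way to generalize a Mantel-type test to several matrices is to rank the distances in each matrix, measure their concordance with Kendall's coefficient W, and obtain a p-value by permuting the objects independently in all matrices but one. The sketch below does this under those assumptions; it omits a tie correction for W, and it should not be read as the exact statistic or permutation scheme used in the paper. All function names are hypothetical.

```python
import random


def upper_triangle(d):
    """Off-diagonal entries d[i][j], i < j, in a fixed order."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendall_w(matrices):
    """Kendall's coefficient of concordance among the ranked distances (no tie correction)."""
    rank_lists = [average_ranks(upper_triangle(d)) for d in matrices]
    m, n = len(rank_lists), len(rank_lists[0])
    totals = [sum(r[k] for r in rank_lists) for k in range(n)]
    mean = sum(totals) / n
    s = sum((t - mean) ** 2 for t in totals)
    return 12.0 * s / (m * m * (n ** 3 - n))


def permute_objects(d, perm):
    """Relabel the objects of a distance matrix; the result stays symmetric."""
    return [[d[perm[i]][perm[j]] for j in range(len(d))] for i in range(len(d))]


def congruence_test(matrices, n_perm=999, seed=0):
    """Observed W and a one-sided permutation p-value (H0: the matrices are incongruent)."""
    rng = random.Random(seed)
    observed = kendall_w(matrices)
    n_obj = len(matrices[0])
    at_least = 1  # the observed arrangement counts as one of the permutations
    for _ in range(n_perm):
        # Keep the first matrix fixed and permute objects independently in the others.
        shuffled = [matrices[0]] + [
            permute_objects(d, rng.sample(range(n_obj), n_obj)) for d in matrices[1:]
        ]
        if kendall_w(shuffled) >= observed - 1e-12:
            at_least += 1
    return observed, at_least / (n_perm + 1)
```

With two matrices this reduces to a rank-based comparison of two distance matrices, which is the Mantel setting the abstract generalizes.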

4.
Directed acyclic graph (DAG) models—also called Bayesian networks—are widely used in probabilistic reasoning, machine learning and causal inference. If latent variables are present, then the set of possible marginal distributions over the remaining (observed) variables is generally not represented by any DAG. Larger classes of mixed graphical models have been introduced to overcome this; however, as we show, these classes are not sufficiently rich to capture all the marginal models that can arise. We introduce a new class of hyper‐graphs, called mDAGs, and a latent projection operation to obtain an mDAG from the margin of a DAG. We show that each distinct marginal of a DAG model is represented by at least one mDAG and provide graphical results towards characterizing equivalence of these models. Finally, we show that mDAGs correctly capture the marginal structure of causally interpreted DAGs under interventions on the observed variables.

5.
We consider a single-server queueing system which attends to N priority classes that are classified into two distinct types: (i) urgent: classes which have preemptive resume priority over at least one lower priority class, and (ii) non-urgent: classes which only have non-preemptive priority among lower priority classes. While urgent customers have preemptive priority, the ultimate decision on whether to interrupt a current service is based on certain discretionary rules. An accumulating prioritization is also incorporated. The marginal waiting time distributions are obtained and numerical examples comparing the new model to other similar priority queueing systems are provided.

6.
The purpose of this paper is to prove, through an analysis of the behaviour of a standard kernel density estimator, that the notion of weak dependence defined in a previous paper (cf. Doukhan & Louhichi, 1999) has sufficiently sharp properties to be used in various situations. More precisely, we investigate the asymptotics of high-order losses, the asymptotic distributions and the uniform almost sure behaviour of kernel density estimates. We prove that they are the same as for independent samples (with some restrictions for the almost sure behaviour). Finally, recall that this weak dependence condition extends previously defined conditions such as mixing and association, and that it allows the consideration of new classes such as weak shifts processes based on independent sequences, as well as some non-mixing Markov processes.
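
For reference, the standard kernel density estimator is f̂(x) = (1/(nh)) Σᵢ K((x − Xᵢ)/h). A minimal Gaussian-kernel version is sketched below, applied to an AR(1) sample as one simple example of dependent data. The AR(1) generator, the coefficient 0.5 and the bandwidth rule n^(-1/5) are illustrative assumptions; the paper's weak dependence condition covers much more than this one process.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """f_hat(x) = (1 / (n h)) * sum_i phi((x - X_i) / h), phi the standard normal density."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return f_hat


def ar1_sample(n, rho=0.5, seed=0):
    """Dependent sample X_t = rho * X_{t-1} + e_t with standard normal innovations e_t."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    data = ar1_sample(2000)
    f_hat = gaussian_kde(data, bandwidth=len(data) ** (-1 / 5))
    # The stationary law is N(0, 1 / (1 - rho^2)); with rho = 0.5 its density at 0 is about 0.345.
    print(round(f_hat(0.0), 3))
```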

7.
Mining the Online Shopping Behaviour of E-commerce Customers   Times cited: 2, self-citations: 0, citations by others: 2
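E-commerce has become an important way for more and more consumers to shop, and analysing the personal characteristics and purchasing behaviour of online shoppers is crucial to commercial success. However, e-commerce is still a new business field, and many practitioners remain occupied with technical considerations while rarely analysing customers' online purchasing behaviour. This paper uses the real online shopping data of KDD Cup 2000 to analyse the personal characteristics and online shopping behaviour of customers of Gazelle.com, and applies the market-basket model from data mining to analyse the associations between products, so that predictive models can forecast customer loyalty more accurately.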
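
The market-basket analysis mentioned above rests on three quantities for a rule "A → B": support (share of baskets containing both items), confidence (estimated P(B | A)) and lift (confidence divided by the overall share of baskets containing B). The sketch below computes them for all item pairs; the item names are hypothetical and the KDD Cup 2000 data are not used.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Support, confidence and lift of every rule lhs -> rhs between two items."""
    n = len(baskets)
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1 suggests positive association
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


baskets = [{"socks", "shoes"}, {"socks"}, {"shoes", "socks"}, {"hat"}]
for rule, stats in sorted(pair_rules(baskets).items()):
    print(rule, [round(v, 3) for v in stats])
```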

8.
Abstract

Cluster analysis is the distribution of objects into different groups, or more precisely the partitioning of a data set into subsets (clusters) so that the data in each subset share some common trait according to some distance measure. Unlike classification, clustering requires one first to decide the optimum number of clusters and then to assign the objects to the different clusters. Solving such problems for a large number of high-dimensional data points is quite complicated, and most existing algorithms do not perform properly. In the present work a new clustering technique applicable to large data sets has been used to cluster the spectra of 702,248 galaxies and quasars, each sampled at 1,540 points over the wavelength range imposed by the instrument. The proposed technique has successfully discovered five clusters in this 702,248 × 1,540 data matrix.

9.
Stochastic Models, 2013, 29(4): 527-548
Abstract

We consider a multi-server queueing model with two priority classes that consist of multiple customer types. Customers belonging to one of the priority classes are lost if they cannot be served immediately upon arrival. Each customer type has its own Poisson arrival rate and exponential service rate. We derive an exact method to calculate the steady-state probabilities for both preemptive and non-preemptive priority disciplines. Based on these probabilities, we derive exact expressions for a wide range of relevant performance characteristics for each customer type, such as the moments of the number of customers in the queue and in the system, the expected postponement time and the blocking probability. We illustrate our method with some numerical examples.
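
One special case has a simple closed form that can serve as a check. Assume (this is an assumption, not stated in the abstract) that the lost class is the high-priority class under the preemptive discipline. Its customers then never see the other class, so they form an Erlang loss system on the c servers, and by insensitivity all of its types share the blocking probability B(c, a) with total offered load a = Σ λₖ/μₖ. The sketch below computes that value with the standard Erlang B recursion; it does not cover the other class or the non-preemptive case, which the paper's exact method handles.

```python
def erlang_b(servers, offered_load):
    """Erlang loss probability via the recursion B(k) = a B(k-1) / (k + a B(k-1)), B(0) = 1."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def lost_class_blocking(servers, types):
    """Blocking probability of the lost class; `types` lists (arrival_rate, service_rate) pairs.

    Valid only under the assumption that this class preempts the other one.
    """
    load = sum(lam / mu for lam, mu in types)
    return erlang_b(servers, load)


print(lost_class_blocking(2, [(0.5, 1.0), (1.0, 2.0)]))
```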

10.
In this paper, we examine the potential determinants of foreign direct investment. For this purpose, we apply new exact subset selection procedures, which are based on idealized assumptions, as well as their possibly more plausible empirical counterparts to an international data set to select the optimal set of predictors. Unlike the standard model selection procedures AIC and BIC, which penalize only the number of variables included in a model, and the subset selection procedures RIC and MRIC, which consider also the total number of available candidate variables, our data-specific procedures even take the correlation structure of all candidate variables into account. Our main focus is on a new procedure, which we have designed for situations where some of the potential predictors are certain to be included in the model. For a sample of 73 developing countries, this procedure selects only four variables, namely imports, net income from abroad, gross capital formation, and GDP per capita. An important secondary finding of our study is that the data-specific procedures, which are based on extensive simulations and are therefore very time-consuming, can be approximated reasonably well by the much simpler exact methods.
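
The contrast drawn above between AIC/BIC (penalty depends on the number k of selected variables) and RIC (penalty also depends on the number p of candidates) can be made concrete with an exhaustive search in which some predictors are always forced into the model. The sketch uses the common form n·log(RSS/n) + penalty·k with penalties 2, log n and 2·log p; MRIC and the authors' data-specific, simulation-based procedures are not reproduced, and all function names are hypothetical.

```python
import itertools
import math


def solve(a, v):
    """Solve a x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the columns `cols`."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * z for b, z in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def criterion(rss_value, n, k, p, kind):
    """n log(RSS/n) plus a penalty per selected variable (p = number of candidates)."""
    penalty = {"AIC": 2.0, "BIC": math.log(n), "RIC": 2.0 * math.log(p)}[kind]
    return n * math.log(rss_value / n) + penalty * k


def best_subset(X, y, forced=(), kind="BIC"):
    """Exhaustive search over the optional columns; `forced` columns are always included."""
    n, p = len(X), len(X[0])
    optional = [c for c in range(p) if c not in forced]
    best_cols, best_score = None, math.inf
    for size in range(len(optional) + 1):
        for extra in itertools.combinations(optional, size):
            cols = tuple(forced) + extra
            score = criterion(rss(X, y, cols), n, len(cols), p, kind)
            if score < best_score:
                best_cols, best_score = cols, score
    return best_cols
```

Because the forced columns appear in every candidate model, their share of the penalty is a constant and does not affect which optional variables are chosen.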

11.
A new method of discrimination and classification based on a Hausdorff-type distance is proposed. For two groups, this distance is defined as the sum of the two directed distances, each being the largest distance from an element of one set to its nearest element in the other set. This distance has some useful properties and is exploited in developing a criterion for discriminating between individual objects belonging to two groups on the basis of a finite number of classification variables. The discrimination criterion is generalized to more than two groups in a couple of ways. Several data sets are analysed, and their classification accuracy is compared with that obtained from the linear discriminant function; the results are encouraging. The method is simple, lends itself to parallel computation and imposes less stringent conditions on the data.
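
The sum-type distance can be written as H(A, B) = h(A, B) + h(B, A), where h(A, B) = max over a in A of the distance from a to its nearest point in B. A minimal sketch follows. The abstract does not state the exact discriminant rule, so the classify function (assign an object to the group minimizing H({x}, G)) is a hypothetical rule added only for illustration.

```python
import math


def directed(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(x, y) for y in b) for x in a)


def hausdorff_sum(a, b):
    """Sum-type Hausdorff distance: directed(a, b) + directed(b, a)."""
    return directed(a, b) + directed(b, a)


def classify(x, groups):
    """Hypothetical rule: pick the group whose training set is closest to {x}."""
    return min(groups, key=lambda g: hausdorff_sum([x], groups[g]))


groups = {
    "A": [(0.0, 0.0), (0.5, 0.2), (-0.4, 0.1)],
    "B": [(5.0, 5.0), (5.5, 4.8), (4.6, 5.1)],
}
print(classify((0.8, 0.5), groups))
```

Each distance computation is independent of the others, which is one reason such a method lends itself to parallel computation.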

12.
Numerous variable selection methods rely on a two-stage procedure, where a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables, and the second stage operates on this set of candidate variables to improve estimation accuracy or to assess the uncertainty associated with the selection of variables. We argue that more information can be conveyed from the first stage to the second: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied at the second stage. We give the example of an inference procedure that benefits greatly from the proposed transfer of information. The procedure is precisely analyzed in a simple setting, and our large-scale experiments empirically demonstrate that actual benefits can be expected in much more general situations, with sensitivity gains ranging from 50% to 100% compared with the state of the art.
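
The transfer of information described above, where first-stage coefficient magnitudes define the second-stage penalty, can be sketched with a lasso followed by a weighted lasso using weights wⱼ = 1/|β̂ⱼ| (variables dropped in the first stage get an infinite weight and stay at zero). This is the familiar adaptive-lasso construction, swapped in here for illustration; the paper's second stage is an inference procedure that is not reproduced. The design has no intercept, and the λ values are hypothetical.

```python
import math


def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j| (no intercept).

    A weight of math.inf pins the corresponding coefficient at zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            old = beta[j]
            if math.isinf(weights[j]):
                new = 0.0
            else:
                # Inner product of column j with the partial residual (coefficient j added back).
                rho = sum(X[i][j] * (resid[i] + X[i][j] * old) for i in range(n)) / n
                new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            if new != old:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - old)
                beta[j] = new
    return beta


def two_stage(X, y, lam1, lam2):
    """Stage 1: plain lasso. Stage 2: penalties shrink for large stage-1 coefficients."""
    p = len(X[0])
    first = weighted_lasso(X, y, lam1, [1.0] * p)
    weights = [1.0 / abs(b) if b != 0 else math.inf for b in first]
    return weighted_lasso(X, y, lam2, weights)
```

A strong first-stage coefficient thus receives a lighter penalty in the second stage, so it is shrunk less than under a uniform penalty.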

13.
This article considers computational procedures for the waiting time and queue length distributions in stationary multi-class first-come, first-served single-server queues with deterministic impatience times. There are several classes of customers, which are distinguished by deterministic impatience times (i.e., maximum allowable waiting times). We assume that customers in each class arrive according to an independent Poisson process and a single server serves customers on a first-come, first-served basis. Service times of customers in each class are independent and identically distributed according to a phase-type distribution that may differ for different classes. We first consider the stationary distribution of the virtual waiting time and then derive numerically feasible formulas for the actual waiting time distribution and loss probability. We also analyze the joint queue length distribution and provide an algorithmic procedure for computing the probability mass function of the stationary joint queue length.
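
Under FCFS with deterministic impatience, a customer is served exactly when the virtual waiting time it finds on arrival (the work ahead of it) does not exceed its impatience time, so customers who would abandon never add work. The Monte Carlo sketch below uses that observation to estimate per-class loss probabilities. Exponential service times are swapped in for the paper's phase-type distributions, and the function name and parameters are hypothetical; the paper's exact numerical formulas are not reproduced.

```python
import random


def simulate_loss(classes, n_arrivals=200_000, seed=0):
    """Estimate per-class loss fractions.

    classes: list of (arrival_rate, service_rate, impatience_time) tuples.
    """
    rng = random.Random(seed)
    total_rate = sum(lam for lam, _, _ in classes)
    weights = [lam for lam, _, _ in classes]
    workload = 0.0  # virtual waiting time: work still owed to accepted customers
    arrived = [0] * len(classes)
    lost = [0] * len(classes)
    for _ in range(n_arrivals):
        # Inter-arrival time of the merged Poisson stream; the server drains work meanwhile.
        workload = max(0.0, workload - rng.expovariate(total_rate))
        k = rng.choices(range(len(classes)), weights=weights)[0]
        arrived[k] += 1
        _, mu, tau = classes[k]
        if workload <= tau:
            workload += rng.expovariate(mu)
        else:
            lost[k] += 1
    return [l / a if a else 0.0 for l, a in zip(lost, arrived)]


print(simulate_loss([(0.4, 1.0, 0.5), (0.4, 1.0, 3.0)]))
```

With impatience time zero and a single class, the model reduces to an M/M/1/1 loss system, whose loss probability ρ/(1 + ρ) gives a quick sanity check for the simulation.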

14.
Abstract. Several old and new density estimators may have good theoretical performance but are hampered by not being bona fide densities: they may be negative in certain regions or may not integrate to 1. One therefore cannot simulate from them, for example. This paper develops general modification methods that turn any density estimator into one that is a bona fide density, and that is always better in performance under one set of conditions and arbitrarily close in performance under a complementary set of conditions. This improvement-for-free procedure can, in particular, be applied to higher-order kernel estimators, classes of modern h⁴-bias kernel-type estimators, superkernel estimators, the sinc kernel estimator, the k-NN estimator, orthogonal expansion estimators, and various recently developed semi-parametric density estimators.
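
One simple modification of this kind is to subtract a constant ξ ≥ 0 and clip at zero, f̃(x) = max(f̂(x) − ξ, 0), with ξ chosen so that f̃ integrates to 1. The sketch below applies it on an equally spaced grid, using Riemann sums for the integrals and bisection for ξ. Whether this is exactly the paper's modification is not stated in the abstract; the grid representation and function name are assumptions.

```python
import math


def make_bona_fide(values, step, tol=1e-12):
    """Return max(f - xi, 0) on a grid, with xi >= 0 chosen so the result integrates to 1.

    `values` are estimator values on an equally spaced grid with spacing `step`.
    Requires the positive part of the estimate to integrate to at least 1.
    """
    def mass(xi):
        return step * sum(max(v - xi, 0.0) for v in values)

    if mass(0.0) < 1.0:
        raise ValueError("positive part integrates to less than 1; subtract-and-clip does not apply")
    lo, hi = 0.0, max(values)
    while hi - lo > tol:  # mass(xi) is continuous and non-increasing in xi
        mid = (lo + hi) / 2
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    xi = (lo + hi) / 2
    return [max(v - xi, 0.0) for v in values]


# A fourth-order Gaussian kernel, (3 - x^2) phi(x) / 2, integrates to 1 but dips below 0.
grid = [-8 + 0.01 * k for k in range(1601)]
phi = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in grid]
raw = [0.5 * (3 - x * x) * p for x, p in zip(grid, phi)]
fixed = make_bona_fide(raw, 0.01)
print(min(raw) < 0, min(fixed) >= 0, round(0.01 * sum(fixed), 6))
```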

15.
Having constructed a rule for classifying objects into classes, one will need to evaluate the performance of that rule both in absolute terms (is it good enough?) and in relative terms (is it better than an alternative?). In this paper, we discuss such evaluation, focusing primarily on the first question, and covering discriminability (how effective the rule is in classifying new objects to the correct class) and reliability (how accurately it estimates probabilities of class membership). Measures based on percentages correct, measures based on probabilities of being correct and distance based measures are outlined, and attractive and problematic properties are discussed.
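
To make the families of measures concrete, the sketch below computes one example of each of two of them for a two-class rule: the percentage correct (from hard class assignments) and two probability-based measures, the Brier score and the mean probability assigned to the true class. These particular measures are illustrative choices and not necessarily the ones analysed in the paper; the labels and probabilities are hypothetical.

```python
def percent_correct(labels, predicted):
    """Percentage of objects assigned to their true class."""
    return 100.0 * sum(a == b for a, b in zip(labels, predicted)) / len(labels)


def brier_score(labels, probs):
    """Mean squared gap between predicted P(class 1) and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def mean_prob_correct(labels, probs):
    """Average probability the rule assigns to the true class."""
    return sum(p if y == 1 else 1 - p for y, p in zip(labels, probs)) / len(labels)


labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.1]  # estimated P(class 1) for each object
predicted = [1 if p >= 0.5 else 0 for p in probs]
print(percent_correct(labels, predicted), brier_score(labels, probs), mean_prob_correct(labels, probs))
```

The third object shows why the families differ: it counts simply as one error in the percentage correct, whereas the probability-based measures also register that the rule gave the true class a probability of 0.4 rather than something near 0.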

16.
Stochastic Models, 2013, 29(3): 387-424
This paper considers a single-server queue that handles arrivals from N classes of customers on a non-preemptive priority basis. Each of the N classes features arrivals from a Poisson process at rate λᵢ and class-dependent phase-type service. To analyze the queue length and waiting time processes of this queue, we derive a matrix-geometric solution for the stationary distribution of the underlying Markov chain. A defining characteristic of the paper is that the number of distinct states represented within each sub-level is countably infinite, rather than finite as is usually assumed. Among the results we obtain in the two-priority case are tractable algorithms for computing both the joint distribution of the number of customers present and the marginal distribution of low-priority customers, and an explicit solution for the marginal distribution of the number of high-priority customers. This explicit solution can be expressed completely in terms of the arrival rates and the parameters of the two service time distributions. These results are followed by algorithms for the stationary waiting time distributions of high- and low-priority customers. We then address the case of an arbitrary number of priority classes, which we solve by relating it to an equivalent three-priority queue. Numerical examples are also presented.
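
The paper obtains full distributions through a matrix-geometric solution. For the mean waiting times alone, a useful cross-check is the classical Cobham formula for M/G/1 non-preemptive priority queues, which needs only the first two moments of each class's service time and therefore also applies to phase-type service: W_k = W₀ / ((1 − σ_{k−1})(1 − σ_k)), with W₀ = Σᵢ λᵢ E[Sᵢ²]/2 and σ_k the total load of classes 1..k. A minimal sketch follows; the function name is hypothetical.

```python
def nonpreemptive_waits(rates, mean_service, second_moment_service):
    """Mean queueing delay per class (index 0 = highest priority), M/G/1 non-preemptive.

    Uses Cobham's formula W_k = W0 / ((1 - sigma_{k-1}) (1 - sigma_k)),
    W0 = sum_i lambda_i E[S_i^2] / 2, sigma_k = total load of classes 0..k.
    """
    w0 = sum(l * s2 for l, s2 in zip(rates, second_moment_service)) / 2.0
    waits, sigma_prev = [], 0.0
    for l, s in zip(rates, mean_service):
        sigma = sigma_prev + l * s
        if sigma >= 1.0:
            raise ValueError("cumulative load reaches 1; the queue is unstable for this class")
        waits.append(w0 / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


# Two classes with exponential(1) service, so E[S] = 1 and E[S^2] = 2.
print(nonpreemptive_waits([0.2, 0.3], [1.0, 1.0], [2.0, 2.0]))
```

In this exponential example the load-weighted mean delay, 0.2 × 0.625 + 0.3 × 1.25 = 0.5, equals ρ times the FCFS M/M/1 delay of 1, as the conservation law for work-conserving disciplines requires.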

17.
Space–time correlation modelling is one of the crucial steps of traditional structural analysis, since space–time models are used for prediction. A comparative study of several classes of space–time covariance functions is proposed. The importance of choosing a suitable model, taking into account the characteristic behaviour of each model, is demonstrated using a space–time data set of daily ozone averages, and the flexibility of the product-sum model is also highlighted through simulated data sets.
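
The product-sum model combines a purely spatial covariance Cs and a purely temporal covariance Ct as C(h, u) = k1·Cs(h)·Ct(u) + k2·Cs(h) + k3·Ct(u). Taking k1, k2, k3 ≥ 0 guarantees a valid covariance, because sums and products of valid covariances with nonnegative coefficients are valid. The sketch below uses exponential marginals; the sills, ranges and coefficients are hypothetical values chosen for illustration.

```python
import math


def exponential_cov(sill, range_):
    """Exponential covariance C(d) = sill * exp(-|d| / range_)."""
    return lambda d: sill * math.exp(-abs(d) / range_)


def product_sum(cs, ct, k1, k2, k3):
    """C(h, u) = k1 Cs(h) Ct(u) + k2 Cs(h) + k3 Ct(u); nonnegative k's keep it valid."""
    if min(k1, k2, k3) < 0:
        raise ValueError("coefficients must be nonnegative in this sketch")
    return lambda h, u: k1 * cs(h) * ct(u) + k2 * cs(h) + k3 * ct(u)


cs = exponential_cov(1.0, 10.0)  # spatial lag h, e.g. in km
ct = exponential_cov(1.0, 2.0)   # temporal lag u, e.g. in days
cov = product_sum(cs, ct, 0.5, 0.3, 0.2)
print([round(cov(h, 1.0), 4) for h in (0.0, 5.0, 20.0)])
```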

18.
Consider using values of variables X₁, X₂, …, Xₚ to classify entities into one of two classes. Kernel-based procedures such as support vector machines (SVMs) are well suited to this task. In general, the classification accuracy of SVMs can be substantially improved if, instead of all p candidate variables, a smaller subset of (say m) variables is used. A new two-step approach to variable selection for SVMs is therefore proposed: best variable subsets of size k = 1, 2, …, p are first identified, and then a new data-dependent criterion is used to determine a value for m. The new approach is evaluated in a Monte Carlo simulation study and on a sample of data sets.

19.
Asymptotic tests are suggested for testing the equality of two multiple correlation coefficients calculated from a single sample from a multivariate normal distribution. An F test is possible only when the two dependent variables coincide and one set of independent variables is a subset of the second set. Tests are compared by simulation for situations in which the F test is inapplicable. Special attention is paid to cases in which asymptotic normality of the test statistics does not hold.
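
In the nested case mentioned above, where the dependent variable is the same and one predictor set contains the other, the comparison reduces to the usual partial F test: F = ((R²_full − R²_reduced)/q) / ((1 − R²_full)/(n − p_full − 1)), referred to an F(q, n − p_full − 1) distribution, where q predictors are dropped from the p_full-predictor model. The sketch computes only the statistic, since the Python standard library has no F distribution; the function name is hypothetical.

```python
def nested_f(r2_full, r2_reduced, n, p_full, q):
    """Partial F statistic for dropping q of the p_full predictors (model with intercept).

    Compare with an F(q, n - p_full - 1) distribution.
    """
    df2 = n - p_full - 1
    if df2 <= 0:
        raise ValueError("need n > p_full + 1")
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df2)


print(nested_f(0.5, 0.4, n=50, p_full=4, q=2))
```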
