Similar Documents
20 similar documents retrieved (search time: 62 ms)
1.
A general inductive Bayesian classification framework is considered using a simultaneous predictive distribution for test items. We introduce a principle of generative supervised and semi-supervised classification based on marginalizing the joint posterior distribution of labels for all test items. The simultaneous and marginalized classifiers arise under different loss functions, while both acknowledge jointly all uncertainty about the labels of test items and the generating probability measures of the classes. We illustrate for data from multiple finite alphabets that such classifiers achieve higher correct classification rates than a standard marginal predictive classifier which labels all test items independently, when training data are sparse. In the supervised case for multiple finite alphabets the simultaneous and the marginal classifiers are proven to become equal under generalized exchangeability when the amount of training data increases. Hence, the marginal classifier can be interpreted as an asymptotic approximation to the simultaneous classifier for finite sets of training data. It is also shown that such convergence is not guaranteed in the semi-supervised setting, where the marginal classifier does not provide a consistent approximation.
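As a toy illustration of the contrast between the two classifiers, the sketch below labels a few test items either independently (marginal predictive classifier) or by maximizing the joint posterior over all test labels (the simultaneous variant). The setup is an assumption for illustration only: a single finite alphabet, two classes, symmetric Dirichlet priors, a uniform prior over label vectors, and brute-force enumeration; it is not the authors' implementation.

```python
import itertools
import numpy as np
from scipy.special import gammaln

ALPHABET = 4          # symbols 0..3
ALPHA = 1.0           # symmetric Dirichlet hyperparameter (assumed)

def log_dirichlet_multinomial(counts, alpha=ALPHA):
    """Log marginal likelihood of category counts under a symmetric Dirichlet prior."""
    counts = np.asarray(counts, dtype=float)
    a = np.full_like(counts, alpha)
    return (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

def counts_of(items):
    return np.bincount(items, minlength=ALPHABET)

def marginal_classifier(train, test):
    """Label each test item independently by its posterior predictive probability."""
    labels = []
    for x in test:
        scores = []
        for c, items in train.items():
            n = counts_of(items)
            scores.append((np.log(ALPHA + n[x]) - np.log(ALPHA * ALPHABET + n.sum()), c))
        labels.append(max(scores)[1])
    return labels

def simultaneous_classifier(train, test):
    """Label all test items jointly by maximizing the joint posterior over label vectors."""
    classes = list(train)
    best, best_lp = None, -np.inf
    for assignment in itertools.product(classes, repeat=len(test)):
        lp = 0.0
        for c in classes:
            pooled = list(train[c]) + [x for x, a in zip(test, assignment) if a == c]
            lp += log_dirichlet_multinomial(counts_of(pooled))
        if lp > best_lp:
            best, best_lp = assignment, lp
    return list(best)

# Sparse training data: two classes over a 4-symbol alphabet.
train = {"A": [0, 0, 1], "B": [2, 3]}
test = [0, 2, 2, 3]
print("marginal:    ", marginal_classifier(train, test))
print("simultaneous:", simultaneous_classifier(train, test))
```

With sparse training counts such as these, the two rules can disagree because the joint labeling lets the test items inform each other through the pooled counts; with abundant training data their answers coincide, in line with the asymptotic equivalence stated above.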

2.
Dose response studies arise in many medical applications. Often, such studies are considered within the framework of binary-response experiments such as success-failure. In such cases, popular choices for modeling the probability of response are logistic or probit models. Design optimality has been well studied for the logistic model with a continuous covariate. A natural extension of the logistic model is to consider the presence of a qualitative classifier. In this work, we explore D-, A-, and E-optimal designs in a two-parameter, binary logistic regression model after introducing a binary, qualitative classifier with independent levels.
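For orientation, the sketch below evaluates the D-, A-, and E-criteria for a candidate design in the plain two-parameter binary logistic model, where the information matrix is M(xi) = sum_i w_i p_i (1 - p_i) x_i x_i^T with x_i = (1, d_i)^T. The design points, weights, and nominal parameter values are assumptions, and the qualitative classifier of the paper is not included.

```python
import numpy as np

def logistic_information(doses, weights, beta0, beta1):
    """Fisher information matrix M(xi) for the two-parameter logistic model
    eta = beta0 + beta1 * d, evaluated at a weighted design."""
    M = np.zeros((2, 2))
    for d, w in zip(doses, weights):
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * d)))
        x = np.array([1.0, d])
        M += w * p * (1.0 - p) * np.outer(x, x)
    return M

def design_criteria(M):
    eig = np.linalg.eigvalsh(M)
    return {
        "D": np.linalg.det(M),             # D-optimality: maximize det(M)
        "A": -np.trace(np.linalg.inv(M)),  # A-optimality: maximize -trace(M^{-1})
        "E": eig.min(),                    # E-optimality: maximize the smallest eigenvalue
    }

# Hypothetical two-point design and nominal parameter values.
doses, weights = [-1.0, 1.0], [0.5, 0.5]
M = logistic_information(doses, weights, beta0=0.0, beta1=1.5)
print(design_criteria(M))
```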

3.
The probability of selecting the correct model is calculated for likelihood-ratio-based criteria that compare two nested models. If the more extended of the two models is true, the difference between twice the maximised log-likelihoods is approximately noncentral chi-square distributed, with degrees of freedom equal to the difference in the number of parameters. The noncentrality parameter of this noncentral chi-square distribution can be approximated by twice the minimum Kullback–Leibler divergence (MKLD) of the best-fitting simple model to the true version of the extended model. The MKLD, and therefore the probability of selecting the correct model, increases approximately proportionally to the number of observations if all observations are performed under the same conditions. If a new set of observations can only be performed under different conditions, the model parameters may depend on the conditions and therefore have to be estimated for each set of observations separately. An increase in observations will then go together with an increase in the number of model parameters. In this case, the power of the likelihood-ratio test will increase with an increasing number of observations. However, the probability of choosing the correct model with the AIC will only increase if, for each set of observations, the MKLD is more than 0.5; if the MKLD is less than 0.5, that probability will decrease. The probability of choosing the correct model with the BIC will always decrease, sometimes after an initial increase for a small number of observation sets. The results are illustrated by a simulation study with a set of five nested nonlinear models for binary data.
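A rough numerical sketch of the central quantity above, for the case where all observations are made under the same conditions: with noncentrality 2·n·MKLD, the probability of preferring the (true) extended model under a likelihood-ratio test, the AIC, and the BIC follows from the noncentral chi-square survival function. The MKLD value and the parameter difference below are assumed for illustration.

```python
import numpy as np
from scipy.stats import ncx2, chi2

mkld = 0.3      # assumed minimum Kullback-Leibler divergence per observation
d_par = 2       # assumed difference in number of parameters
alpha = 0.05

for n in (10, 50, 200):
    nc = 2 * n * mkld                 # approximate noncentrality of the LR statistic
    lr_crit = chi2.ppf(1 - alpha, d_par)
    aic_crit = 2 * d_par              # AIC prefers the extended model if LR > 2 * d_par
    bic_crit = np.log(n) * d_par      # BIC prefers it if LR > log(n) * d_par
    print(n,
          round(ncx2.sf(lr_crit, d_par, nc), 3),   # power of the LR test
          round(ncx2.sf(aic_crit, d_par, nc), 3),  # P(AIC selects the extended model)
          round(ncx2.sf(bic_crit, d_par, nc), 3))  # P(BIC selects the extended model)
```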

4.
Bagging and Boosting are two main ensemble approaches that consolidate the decisions of several hypotheses. The diversity of the ensemble members is considered a significant element in reducing generalization error. Here, a method called EBAGTS (ensemble-based artificially generated training samples) is proposed to generate ensembles. It manipulates the training examples in three ways in order to build diverse hypotheses directly: drawing a sub-sample from the training set, reducing or raising error-prone training instances, and reducing or raising local instances around error-prone regions. The proposed method is a straightforward, generic framework that can use any base classifier for its ensemble members to assemble a powerful combined classifier. Decision-tree and multilayer-perceptron classifiers were employed as base classifiers in the experiments to show that the proposed method achieves higher predictive accuracy than meta-learning algorithms such as Boosting and Bagging. Furthermore, EBAGTS outperforms Boosting more markedly as the training data set grows. It is illustrated that EBAGTS can achieve better performance than the state of the art.

5.
This paper introduces W-tests for assessing homogeneity in mixtures of discrete probability distributions. A W-test statistic depends on the data solely through parameter estimators and, if a penalized maximum likelihood estimation framework is used, has a tractable asymptotic distribution under the null hypothesis of homogeneity. The large-sample critical values are quantiles of a chi-square distribution multiplied by an estimable constant for which we provide an explicit formula. In particular, the estimation of large-sample critical values does not involve simulation experiments or random field theory. We demonstrate that W-tests are generally competitive with a benchmark test in terms of power to detect heterogeneity. Moreover, in many situations, the large-sample critical values can be used even with small to moderate sample sizes. The main implementation issue (selection of an underlying measure) is thoroughly addressed, and we explain why W-tests are well-suited to problems involving large and online data sets. Application of a W-test is illustrated with an epidemiological data set.

6.
In a multinomial model, the sample space is partitioned into a disjoint union of cells. The partition is usually immutable during sampling of the cell counts. In this paper, we extend the multinomial model to the incomplete multinomial model by relaxing the constant partition assumption to allow the cells to be variable and the counts collected from non-disjoint cells to be modeled in an integrated manner for inference on the common underlying probability. The incomplete multinomial likelihood is parameterized by the complete-cell probabilities from the most refined partition. Its sufficient statistics include the variable-cell formation observed as an indicator matrix and all cell counts. With externally imposed structures on the cell formation process, it reduces to special models including the Bradley–Terry model, the Plackett–Luce model, etc. Since the conventional method, which solves for the zeros of the score functions, is unfruitful, we develop a new approach to establishing a simpler set of estimating equations to obtain the maximum likelihood estimate (MLE), which seeks the simultaneous maximization of all multiplicative components of the likelihood by fitting each component into an inequality. As a consequence, our estimation amounts to solving a system of the equality attainment conditions to the inequalities. The resultant MLE equations are simple and immediately invite a fixed-point iteration algorithm for solution, which is referred to as the weaver algorithm. The weaver algorithm is short and amenable to parallel implementation. We also derive the asymptotic covariance of the MLE, verify main results with simulations, and compare the weaver algorithm with an MM/EM algorithm based on fitting a Plackett–Luce model to a benchmark data set.
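As a concrete point of reference, the Bradley–Terry special case mentioned above already admits a simple fixed-point (MM-type) iteration of the same flavour as the fixed-point scheme described in the abstract. The sketch below is that standard Bradley–Terry update with hypothetical comparison counts; it is not the weaver algorithm itself, whose estimating equations are specific to the paper.

```python
import numpy as np

def bradley_terry_fixed_point(wins, tol=1e-10, max_iter=10000):
    """Fixed-point (MM-type) iteration for Bradley-Terry strengths.
    wins[i, j] = number of times item i beat item j."""
    k = wins.shape[0]
    w = wins.sum(axis=1)                  # total wins of each item
    n = wins + wins.T                     # comparisons between each pair
    p = np.full(k, 1.0 / k)
    for _ in range(max_iter):
        denom = np.array([
            sum(n[i, j] / (p[i] + p[j]) for j in range(k) if j != i)
            for i in range(k)
        ])
        p_new = w / denom
        p_new /= p_new.sum()              # fix the scale of the strengths
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# Hypothetical pairwise-comparison counts among four items.
wins = np.array([[0, 3, 2, 4],
                 [1, 0, 3, 2],
                 [2, 1, 0, 3],
                 [0, 2, 1, 0]], dtype=float)
print(bradley_terry_fixed_point(wins))
```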

7.
Convex sets of probability distributions are also called credal sets. They generalize probability theory by relaxing the requirement that probability values be precise. Classification, i.e. assigning class labels to instances described by a set of attributes, is an important domain of application of Bayesian methods, where the naive Bayes classifier has a surprisingly good performance. This paper proposes a new method of classification which involves extending the naive Bayes classifier to credal sets. Exact and effective solution procedures for naive credal classification are derived, and the related dominance criteria are discussed. Credal classification appears as a new method, based on more realistic assumptions and in the direction of more reliable inferences.

8.
In this paper we analyse the average behaviour of the Bayes-optimal and Gibbs learning algorithms. We do this both for off-training-set error and conventional IID (independent identically distributed) error (for which test sets overlap with training sets). For the IID case we provide a major extension to one of the better known results. We also show that expected IID test set error is a non-increasing function of training set size for either algorithm. On the other hand, as we show, the expected off-training-set error for both learning algorithms can increase with training set size, for non-uniform sampling distributions. We characterize the relationship the sampling distribution must have with the prior for such an increase. We show in particular that for uniform sampling distributions and either algorithm, the expected off-training-set error is a non-increasing function of training set size. For uniform sampling distributions, we also characterize the priors for which the expected error of the Bayes-optimal algorithm stays constant. In addition we show that for the Bayes-optimal algorithm, expected off-training-set error can increase with training set size when the target function is fixed, but if and only if the expected error averaged over all targets decreases with training set size. Our results hold for arbitrary noise and arbitrary loss functions.

9.
This paper presents a novel ensemble classifier generation method by integrating the ideas of bootstrap aggregation and Principal Component Analysis (PCA). To create each individual member of an ensemble classifier, PCA is applied to every out-of-bag sample and the computed coefficients of all principal components are stored, and then the principal components calculated on the corresponding bootstrap sample are taken as additional elements of the original feature set. A classifier is trained with the bootstrap sample and some features randomly selected from the new feature set. The final ensemble classifier is constructed by majority voting of the trained base classifiers. The results obtained by empirical experiments and statistical tests demonstrate that the proposed method performs better than or as well as several other ensemble methods on some benchmark data sets publicly available from the UCI repository. Furthermore, the diversity-accuracy patterns of the ensemble classifiers are investigated by kappa-error diagrams.
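A rough sketch of the pipeline as described above: for each ensemble member, fit a PCA on the out-of-bag sample, append the principal-component scores of the bootstrap sample to the original features, train a base classifier on a random feature subset, and combine by majority vote. The ensemble size, number of components, subset size, and data set are assumptions, and details may differ from the authors' implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_members, n_components, n_sub = 25, 5, 15
members = []
for _ in range(n_members):
    boot = rng.integers(0, len(X_tr), len(X_tr))              # bootstrap indices
    oob = np.setdiff1d(np.arange(len(X_tr)), boot)            # out-of-bag indices
    pca = PCA(n_components=n_components).fit(X_tr[oob])       # PCA coefficients from the OOB sample
    Z = np.hstack([X_tr[boot], pca.transform(X_tr[boot])])    # original + principal-component features
    feats = rng.choice(Z.shape[1], size=n_sub, replace=False) # random feature subset
    clf = DecisionTreeClassifier(random_state=0).fit(Z[:, feats], y_tr[boot])
    members.append((pca, feats, clf))

# Majority vote of the trained base classifiers.
votes = np.array([
    clf.predict(np.hstack([X_te, pca.transform(X_te)])[:, feats])
    for pca, feats, clf in members
])
pred = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble accuracy:", (pred == y_te).mean())
```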

10.

There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that of a classified feature which has known class label. Hence in the case where the absence of class labels does not depend on the data, the expected error rate of a classifier formed from the classified and unclassified features in a partially classified sample is greater than that if the sample were completely classified. We propose to treat the labels of the unclassified features as missing data and to introduce a framework for their missingness as in the pioneering work of Rubin (Biometrika 63:581–592, 1976) for missingness in incomplete data analysis. An examination of several partially classified data sets in the literature suggests that the unclassified features are not occurring at random in the feature space, but rather tend to be concentrated in regions of relatively high entropy. It suggests that the missingness of the labels of the features can be modelled by representing the conditional probability of a missing label for a feature via the logistic model with covariate depending on the entropy of the feature or an appropriate proxy for it. We consider here the case of two normal classes with a common covariance matrix where for computational convenience the square of the discriminant function is used as the covariate in the logistic model in place of the negative log entropy. Rather paradoxically, we show that the classifier so formed from the partially classified sample may have smaller expected error rate than that if the sample were completely classified.
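A small sketch of the missingness mechanism described above, on simulated data with two normal classes and a common covariance: the squared discriminant function serves as the covariate in a logistic model for the probability that a label is missing, so labels near the class boundary (high entropy, small squared discriminant) are missing more often. The missingness coefficients below are assumptions for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 1.5, 0.0)  # two normal classes, common covariance

# Discriminant function from a fitted LDA; its square plays the role of the
# covariate in the logistic model for a missing label.
d = LinearDiscriminantAnalysis().fit(X, y).decision_function(X)
xi0, xi1 = 1.0, -0.8                      # assumed missingness coefficients
p_missing = 1 / (1 + np.exp(-(xi0 + xi1 * d**2)))
missing = rng.random(n) < p_missing       # labels near the boundary (small d^2) go missing more often

# Roughly recovering the missingness model from the observed pattern:
fit = LogisticRegression().fit(d[:, None]**2, missing.astype(int))
print("true (xi0, xi1):", (xi0, xi1), " fitted:", fit.intercept_[0], fit.coef_[0, 0])
```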


11.
Empirical estimates of source statistical economic data such as trade flows, greenhouse gas emissions, or employment figures are always subject to uncertainty (stemming from measurement errors or confidentiality) but information concerning that uncertainty is often missing. This article uses concepts from Bayesian inference and the maximum entropy principle to estimate the prior probability distribution, uncertainty, and correlations of source data when such information is not explicitly provided. In the absence of additional information, an isolated datum is described by a truncated Gaussian distribution, and if an uncertainty estimate is missing, its prior equals the best guess. When the sum of a set of disaggregate data is constrained to match an aggregate datum, it is possible to determine the prior correlations among disaggregate data. If aggregate uncertainty is missing, all prior correlations are positive. If aggregate uncertainty is available, prior correlations can be either all positive, all negative, or a mix of both. An empirical example is presented, which reports relative uncertainties and correlation priors for the County Business Patterns database. In this example, relative uncertainties range from 1% to 80% and 20% of data pairs exhibit correlations below −0.9 or above 0.9. Supplementary materials for this article are available online.
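As a small illustration of the first building block above, a non-negative isolated datum with a best guess and an assumed relative uncertainty can be represented by a Gaussian truncated at zero. The numbers below are hypothetical.

```python
from scipy.stats import truncnorm

best_guess = 120.0        # reported value, e.g. an employment figure (hypothetical)
rel_uncertainty = 0.20    # assumed 20% relative uncertainty
sigma = rel_uncertainty * best_guess

# Gaussian truncated at zero: a = (lower - mu) / sigma, b = (upper - mu) / sigma.
a, b = (0.0 - best_guess) / sigma, float("inf")
prior = truncnorm(a, b, loc=best_guess, scale=sigma)

print("prior mean:", prior.mean())
print("95% interval:", prior.interval(0.95))
```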

12.
A classifier is developed which uses information from all pixels in a neighbourhood to classify the pixel at the center of the neighbourhood. It is not a smoother, in that it tries to recognize boundaries, and it makes explicit use of the relative positions of pixels in the neighbourhood. It is based on a geometric probability model for the distribution of the classes in the plane. The neighbourhood-based classifier is shown to outperform linear discriminant analysis on some LANDSAT data.

13.
This paper deals with a class of recursive kernel estimators of the transition probability density function t(y|x) of a stationary Markov process. A sufficient condition for such estimators to be weakly and strongly consistent for almost all (x, y) ∈ R^2 is given. Further, an L1 convergence result is obtained. No continuity conditions are imposed on t(y|x).
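For concreteness, one standard recursive construction (not necessarily the exact class studied in the paper) estimates the transition density as a ratio of recursive kernel estimates of the joint density of (X_i, X_{i+1}) and the marginal density of X_i, with bandwidths h_i that shrink as more observations arrive; both sums can be updated as each new observation comes in. The kernel, bandwidth sequence, and test series below are assumptions.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def recursive_transition_density(series, x, y, c=1.0, gamma=0.2):
    """Recursive kernel estimate of t(y|x) for a stationary Markov series,
    using bandwidths h_i = c * i**(-gamma) (assumed choices)."""
    pairs = list(zip(series[:-1], series[1:]))
    joint, marginal = 0.0, 0.0
    for i, (xi, xnext) in enumerate(pairs, start=1):
        h = c * i ** (-gamma)
        kx = gaussian_kernel((x - xi) / h)
        joint += kx * gaussian_kernel((y - xnext) / h) / h**2   # joint density term
        marginal += kx / h                                      # marginal density term
    return (joint / len(pairs)) / (marginal / len(pairs))

# Hypothetical AR(1)-style Markov series.
rng = np.random.default_rng(2)
z = np.empty(2000)
z[0] = 0.0
for t in range(1, len(z)):
    z[t] = 0.6 * z[t - 1] + rng.normal()
print(recursive_transition_density(z, x=0.0, y=0.5))
```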

14.
Conditional parametric bootstrapping is defined by samples obtained from simulations performed in such a way that the estimator is kept constant and equal to the estimate obtained from the data. Order statistics of the bootstrap replicates of the parameter chosen in each simulation provide exact confidence intervals, in a probabilistic sense, in models with one parameter under quite general conditions. The method is still exact in the case of nuisance parameters when these are location and scale parameters and the bootstrapping is based on keeping the maximum-likelihood estimates constant. The method is also exact if there exists a sufficient statistic for the nuisance parameters and if the simulations are performed conditioning on this statistic. The technique may also be used to construct prediction intervals. These are generally not exact, but are likely to be good approximations.
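To make the construction concrete, here is a minimal sketch for a one-parameter scale model (the mean of an exponential distribution), where keeping the maximum-likelihood estimate fixed amounts to choosing, in each simulation, the parameter value for which the simulated sample reproduces the observed estimate. The data, nominal level, and number of replicates are hypothetical; the interval is exact in the probabilistic sense up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data and its MLE (the sample mean of an exponential model).
data = rng.exponential(scale=2.5, size=20)
theta_hat = data.mean()

# Conditional parametric bootstrap: in each simulation choose the parameter value
# theta_star for which the simulated sample would reproduce exactly the observed
# estimate, i.e. theta_star * mean(u) = theta_hat with u ~ Exp(1).
B = 100000
u_means = rng.exponential(scale=1.0, size=(B, len(data))).mean(axis=1)
theta_star = theta_hat / u_means

# Order statistics of the bootstrap replicates give the confidence interval.
lo, hi = np.quantile(theta_star, [0.025, 0.975])
print("estimate:", round(theta_hat, 3), " 95% CI:", (round(lo, 3), round(hi, 3)))
```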

15.
In a linear model with an arbitrary variance–covariance matrix, Zyskind (Ann. Math. Statist. 38 (1967) 1092) provided necessary and sufficient conditions for when a given linear function of the fixed-effect parameters has a best linear unbiased estimator (BLUE). If these conditions hold uniformly for all possible variance–covariance parameters (i.e., there is a UBLUE) and if the data are assumed to be normally distributed, these conditions are also necessary and sufficient for the parametric function to have a uniformly minimum variance unbiased estimator (UMVUE). For mixed-effects ANOVA models, we show how these conditions can be translated in terms of the incidence array, which facilitates verification of the UBLUE and UMVUE properties and facilitates construction of designs having such properties.

16.
Sufficiency is a widely used concept for reducing the dimensionality of a data set. Collecting data for a sufficient statistic is generally much easier and less expensive than collecting all of the available data. When the posterior distributions of a quantity of interest given the aggregate and disaggregate data are identical, perfect aggregation is said to hold, and in this case the aggregate data is a sufficient statistic for the quantity of interest. In this paper, the conditions for perfect aggregation are shown to depend on the functional form of the prior distribution. When the quantity of interest is the sum of some parameters in a vector having either a generalized Dirichlet or a Liouville distribution for analyzing compositional data, necessary and sufficient conditions for perfect aggregation are also established.

17.
In the area of sufficient dimension reduction, two structural conditions are often assumed: the linearity condition, which is close to assuming ellipticity of the underlying distribution of the predictors, and the constant variance condition, which is close to assuming multivariate normality of the predictors. Imposing these conditions is considered a necessary trade-off for overcoming the “curse of dimensionality”. However, it is very hard to check whether these conditions hold or not. When they are violated, methods such as marginal transformation and re-weighting are suggested so that the data fulfill them approximately. In this article, we assume an independence condition between the projected predictors and their orthogonal complements, which can ensure that the commonly used inverse regression methods identify the central subspace of interest. The independence condition can be checked by the gridded chi-square test. Thus, we extend the scope of many inverse regression methods and broaden their applicability in the literature. Simulation studies and an application to the car price data are presented for illustration.
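For reference, a compact sketch of sliced inverse regression, one of the commonly used inverse regression methods referred to above: slice the response, compute the means of the standardized predictors within each slice, and take the leading eigenvectors of their weighted covariance as estimated directions of the central subspace. The data-generating model, number of slices, and number of directions are assumptions.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Minimal sliced inverse regression: eigenvectors of the covariance of
    slice means of the standardized predictors estimate the central subspace."""
    n, p = X.shape
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    cov_inv_sqrt = np.linalg.inv(np.linalg.cholesky(cov)).T
    Z = (X - mu) @ cov_inv_sqrt                            # standardized predictors
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)                 # weighted covariance of slice means
    vals, vecs = np.linalg.eigh(M)
    directions = cov_inv_sqrt @ vecs[:, ::-1][:, :n_directions]  # back-transform to the X scale
    return directions / np.linalg.norm(directions, axis=0)

# Hypothetical single-index model: y depends on X only through X @ beta.
rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = (X @ beta) ** 3 + rng.normal(size=1000)
print(sliced_inverse_regression(X, y).ravel())
```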

18.
Inverse probability weighting (IPW) and multiple imputation are two widely adopted approaches for dealing with missing data. The former models the selection probability, and the latter models the data distribution. Consistent estimation requires correct specification of the corresponding models. Although the augmented IPW method provides an extra layer of protection on consistency, it is usually not sufficient in practice because the true data-generating process is unknown. This paper proposes a method combining the two approaches in the same spirit as calibration in the sampling-survey literature. Multiple models for both the selection probability and the data distribution can be accounted for simultaneously, and the resulting estimator is consistent if any one model is correctly specified. The proposed method is within the framework of estimating equations and is general enough to cover regression analysis with missing outcomes and/or missing covariates. Results from both theoretical and numerical investigations are provided.
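For context, here is a minimal sketch of the IPW and augmented IPW estimators of a mean with outcomes missing at random, the two ingredients that the calibration-style proposal builds on. The simulated data and model specifications are assumptions, and this is not the paper's estimator itself.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)            # outcomes (some will be missing)
p_resp = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))       # true response probability
r = rng.random(n) < p_resp                        # response indicator (y observed iff r)

X = sm.add_constant(x)
pi_hat = sm.Logit(r.astype(float), X).fit(disp=0).predict(X)   # selection-probability model
m_hat = sm.OLS(y[r], X[r]).fit().predict(X)                    # outcome (data-distribution) model

ipw = np.mean(r * y / pi_hat)                                  # inverse probability weighting
aipw = np.mean(r * y / pi_hat - (r / pi_hat - 1.0) * m_hat)    # augmented IPW
print("true mean: 1.0   IPW:", round(ipw, 3), "  AIPW:", round(aipw, 3))
```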

19.
This article provides, respectively, sufficient conditions and necessary conditions for the admissibility of matrix linear estimators of an estimable linear function of the parameter matrix in multivariate linear models, with and without the assumption that the underlying distribution is normal with a completely unknown covariance matrix. In the latter model, a necessary and sufficient condition is given for matrix linear estimators to be admissible in the space of all matrix linear estimators under each of three different kinds of quadratic matrix loss functions. In the former model, a sufficient condition is first provided for matrix linear estimators to be admissible in the space of all matrix estimators having finite risks under each of the same loss functions. Furthermore, in the former model, one of these sufficient conditions, under the corresponding loss function, is also proved to be necessary if additional conditions are assumed.

20.
At least two adequate and well-controlled clinical studies are usually required to support the effectiveness of a treatment. In some circumstances, however, a single study providing strong results may be sufficient. Some statistical stability criteria for assessing whether a single study provides very persuasive results are known. A new criterion is introduced; it is based on conservative estimation of the reproducibility probability, and it also allows statistical tests to be performed by referring directly to the reproducibility probability estimate. These stability criteria are compared numerically and conceptually. This work aims to help both regulatory agencies and pharmaceutical companies decide whether the results of a single study may be sufficient to establish effectiveness.
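For background, one common way to estimate the reproducibility probability of a significant two-sided z-test is the estimated-power approach, with a conservative variant that first replaces the noncentrality by a lower confidence bound. The sketch below is illustrative of that general idea and is not the new criterion proposed in the paper; the observed z-value is hypothetical.

```python
from scipy.stats import norm

def reproducibility_probability(z_obs, alpha=0.05, gamma=None):
    """Estimated probability that an identical study reproduces a significant
    two-sided z-test. If gamma is given, the noncentrality is first replaced by
    its lower 1-gamma confidence bound (a conservative estimate)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    delta = abs(z_obs)
    if gamma is not None:
        delta = delta - norm.ppf(1 - gamma)   # conservative lower bound for |delta|
    return norm.cdf(delta - z_alpha) + norm.cdf(-delta - z_alpha)

print(reproducibility_probability(3.2))               # plug-in estimate
print(reproducibility_probability(3.2, gamma=0.05))   # conservative estimate
```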
