Similar articles
 20 similar articles found (search time: 31 ms)
1.
A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived for the small-sample, high-dimensional setting. This method occupies an intermediate position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first; then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is compared with that of other classical methods through application to data.
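The convex combination described in this abstract can be illustrated with a minimal sketch. It is not the authors' DRDA implementation: the function names, the weight name `alpha`, and the toy data are assumptions made for illustration, the kernel smoothing step is omitted, and the cross-validated choice of the weight is not shown.

```python
from collections import Counter
from itertools import product


def full_multinomial(samples, cells):
    """Relative frequency of each complete cell (joint configuration)."""
    counts = Counter(samples)
    n = len(samples)
    return {cell: counts[cell] / n for cell in cells}


def independence_model(samples, cells):
    """Product of per-variable marginal frequencies (first-order independence)."""
    n = len(samples)
    d = len(samples[0])
    marginals = [Counter(s[j] for s in samples) for j in range(d)]
    probs = {}
    for cell in cells:
        p = 1.0
        for j, value in enumerate(cell):
            p *= marginals[j][value] / n
        probs[cell] = p
    return probs


def blended(samples, cells, alpha):
    """Convex combination; alpha (hypothetical name) weights the full model."""
    full = full_multinomial(samples, cells)
    indep = independence_model(samples, cells)
    return {c: alpha * full[c] + (1 - alpha) * indep[c] for c in cells}


# Toy class-conditional sample with two binary variables (made-up data).
cells = list(product([0, 1], repeat=2))
samples = [(0, 0), (0, 0), (1, 1), (0, 1)]
probs = blended(samples, cells, alpha=0.5)
```

With `alpha=1` the estimate reduces to the raw cell frequencies, and with `alpha=0` to the independence model, so the unobserved cell `(1, 0)` receives non-zero probability for any `alpha < 1`.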

2.
Document classification is an area of great importance for which many classification methods have been developed. However, most of these methods cannot generate time-dependent classification rules. Thus, they are not the best choices for problems with time-varying structures. To address this problem, we propose a varying naïve Bayes model, which is a natural extension of the naïve Bayes model that allows for time-dependent classification rules. A kernel smoothing method is developed for parameter estimation and a BIC-type criterion is proposed for feature selection. Asymptotic theory is developed and numerical studies are conducted. Finally, the proposed method is demonstrated on a real dataset, which was generated by the Mayor Public Hotline of Changchun, the capital city of Jilin Province in Northeast China.

3.
We describe a novel deterministic approximate inference technique for conditionally Gaussian state space models, i.e. state space models where the latent state consists of both multinomial and Gaussian distributed variables. The method can be interpreted as a smoothing pass and iteration scheme symmetric to an assumed density filter. It improves upon previously proposed smoothing passes by not making more approximations than implied by the projection onto the chosen parametric form, the assumed density. Experimental results show that the novel scheme outperforms these alternative deterministic smoothing passes. Comparisons with sampling methods suggest that the performance does not degrade with longer sequences.

4.
In this paper, exponentially weighted moving average (EWMA) control charts for multinomial data are developed with a three-level classification scheme. The lower and upper control limits of the proposed EWMA control chart are evaluated using a Markov chain approximation. Numerical results indicate that, compared with the three-level Shewhart control chart, the proposed EWMA control chart is more sensitive to small shifts in a three-level multinomial process. A figure and a table are provided for practitioners to select the value of the chart limit coefficient that gives the desired in-control average run length.

5.
Learning classification trees   (total citations: 11; self-citations: 0; citations by others: 11)
Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987) and CART (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though at a computational price.

6.
On Smoothing Sparse Multinomial Data   (total citations: 3; self-citations: 0; citations by others: 3)
Asymptotic theory is developed for the problem of smoothing sparse multinomial data, with emphasis on the criterion of mean summed square error of estimators of the probability mass function. If the data are not too sparse, in a well-defined sense, then the optimal rate of convergence is that achieved by the unsmoothed cell proportions. Otherwise, this rate can be improved upon by smoothing. Explicit results, including formulae for the optimal smoothing parameter, are presented for a kernel-type estimator. Also for this case, a cross-validatory choice procedure is shown to be asymptotically optimal.

7.
This article considers the problem of selecting the most probable cell in a multinomial distribution in the presence of a nuisance cell. Two open sequential procedures are proposed and studied. One is a two-stage procedure and the other a multistage procedure.

8.
This article considers nonparametric regression problems and develops a model-averaging procedure for smoothing spline regression problems. Unlike most smoothing parameter selection studies, which determine an optimum smoothing parameter, our focus here is on the prediction accuracy for the true conditional mean of Y given a predictor X. Our method consists of two steps. The first step is to construct a class of smoothing spline regression models based on nonparametric bootstrap samples, each with an appropriate smoothing parameter. The second step is to average bootstrap smoothing spline estimates of different smoothness to form a final improved estimate. To minimize the prediction error, we estimate the model weights using a delete-one-out cross-validation procedure. A simulation study has been performed using a program written in R. The simulation study compares the well-known cross-validation (CV) and generalized cross-validation (GCV) methods with the proposed method. This new method is straightforward to implement and performs reliably in simulations.

9.
Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response. Thus selecting a small number of relevant variables for classification has received a great deal of attention. Generally, when the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are called for on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method in the multinomial probit regression model for multi-class classification. The performance of our proposed method is demonstrated with one simulated data set and three well-known gene expression profiling data sets: breast cancer data, leukemia data, and small round blue-cell tumors. The results show that, compared with other methods, our method is able to select the relevant variables and can obtain competitive classification accuracy with a small subset of relevant genes.

10.
This research studies an automatic price pattern search procedure for the bitcoin cryptocurrency based on 1-min price data. To achieve this, a search algorithm is proposed based on the nonparametric regression method of smoothing splines. We investigate some well-known technical analysis patterns and construct an algorithmic trading strategy to evaluate the effectiveness of the patterns. We found that the method of smoothing splines can identify the technical analysis patterns and that strategies based on certain technical analysis patterns yield returns that significantly exceed the results of unconditional trading strategies.

11.
This note presents an approximation to multivariate regression models which is obtained from a first-order series expansion of the multivariate link function. The proposed approach yields a variable-addition approximation of regression models that enables a multivariate generalization of the well-known goodness-of-link specification test, available for univariate generalized linear models. Application of this general methodology is illustrated with models of multinomial discrete choice and multivariate fractional data, in which context it is shown to lead to well-established approximation and testing procedures.

12.
Assume that a number of individuals are to be classified into one of two populations and that, at the same time, the proportion of members of each population needs to be estimated. The allocated proportions given by the Bayes classification rule are not consistent estimates of the true proportions, so a different classification rule is proposed; this rule yields consistent estimates with only a small increase in the probability of misclassification. As an illustration, the case of two normal distributions with equal covariance matrices is dealt with in detail.

13.
We extend the classical one-dimensional Bayes binary classifier to create a new classification rule that has a region of neutrality to account for cases where the implied weight of evidence is too weak for a confident classification. Our proposed rule allows a “No Prediction” when the observation is too ambiguous to have confidence in a definite prediction. The motivation for making “No Prediction” is that in our microbial community profiling application, a wrong prediction can be worse than making no prediction at all. On the other hand, too many “No Predictions” have adverse implications as well. Consequently, our proposed rule incorporates this trade-off using a cost structure that weighs the penalty for not making a definite prediction against the penalty for making an incorrect definite prediction. We demonstrate that our proposed rule outperforms a naive neutral-zone rule that has been routinely used in biological applications similar to ours.
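The trade-off between an incorrect prediction and no prediction can be illustrated with a generic minimum-expected-cost rule with a reject option. This is a sketch under stated assumptions, not the rule proposed in the abstract: the function name, the two cost parameters and their default values are hypothetical, and the posterior probability `p1` of class 1 is assumed to be already available.

```python
def classify_with_neutral_zone(p1, cost_wrong=1.0, cost_neutral=0.2):
    """Pick the action with the smallest expected cost.

    p1           -- posterior probability of class 1 (assumed given)
    cost_wrong   -- hypothetical penalty for an incorrect definite prediction
    cost_neutral -- hypothetical penalty for declining to predict
    """
    expected_cost = {
        "class 1": cost_wrong * (1 - p1),  # wrong whenever the truth is class 0
        "class 0": cost_wrong * p1,        # wrong whenever the truth is class 1
        "No Prediction": cost_neutral,     # fixed penalty for abstaining
    }
    return min(expected_cost, key=expected_cost.get)


confident = classify_with_neutral_zone(0.9)
ambiguous = classify_with_neutral_zone(0.6)
```

Under these assumed costs the neutral zone is the set of posteriors for which both definite predictions have expected cost above `cost_neutral`; raising `cost_neutral` shrinks that zone and reduces the number of "No Prediction" outcomes.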

14.
There is no established procedure for testing for trend with nominal outcomes that would provide both a global hypothesis test and outcome-specific inference. We derive a simple formula for such a test using a weighted sum of Cochran–Armitage test statistics evaluating the trend in each outcome separately. The test is shown to be equivalent to the score test for multinomial logistic regression; however, the new formulation enables the derivation of a sample size formula and multiplicity-adjusted inference for individual outcomes. The proposed methods are implemented in the R package multiCA.
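As background for the building block mentioned in this abstract, the standard Cochran–Armitage trend statistic for a single binary outcome can be sketched as below. The weighted combination across outcomes is not specified in the abstract and is not reproduced; this code does not use the multiCA package, and the function name and example counts are made up for illustration.

```python
from math import sqrt


def cochran_armitage_z(scores, totals, events):
    """Standard Cochran-Armitage trend Z statistic for one binary outcome.

    scores -- dose or group scores t_i
    totals -- group sizes n_i
    events -- number of events r_i in each group
    """
    n_all = sum(totals)
    p_bar = sum(events) / n_all  # pooled event proportion
    # Score-weighted deviation of observed from expected events.
    t_stat = sum(t * (r - n * p_bar) for t, n, r in zip(scores, totals, events))
    # Variance of t_stat under the null hypothesis of no trend.
    sum_nt = sum(n * t for t, n in zip(scores, totals))
    sum_nt2 = sum(n * t * t for t, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_nt2 - sum_nt ** 2 / n_all)
    return t_stat / sqrt(variance)


# Three equally sized groups with an increasing event rate (made-up counts).
z = cochran_armitage_z(scores=[0, 1, 2], totals=[10, 10, 10], events=[2, 5, 8])
```

Under the null hypothesis the statistic is approximately standard normal, so large absolute values indicate a trend in the event proportion across the ordered groups.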

15.
We consider statistical inference for additive partial linear models when the linear covariate is measured with error. A bias-corrected spline-backfitted kernel smoothing method is proposed. Under mild assumptions, the proposed component function and parameter estimators are oracally efficient and fast to compute. The pointwise distribution of the nonparametric function estimator is asymptotically equivalent to that of a function estimator in the partial linear model. Finite-sample performance of the proposed estimators is assessed by simulation experiments. The proposed methods are applied to the Boston house data set.

16.
A Bayesian method is proposed for estimating the cell probabilities of several multinomial distributions. Parameters of different distributions are taken to be a priori exchangeable. The prior specification is based upon mixtures of a hierarchical distribution, referred to as the multivariate “Dirichlet-Dirichlet” distribution. The analysis is facilitated by a multinomial approximation relating to the multinomial-Dirichlet distribution. The posterior estimates depend upon measures of entropy for the various distributions and shrink the individual observed proportions towards values obtained by pooling the data across the distributions. As well as incorporating prior information they are particularly useful when some of the cell frequencies are zero. We use them to investigate a numerical classification of males of various vocations, according to cause of death.

17.
The randomized response technique is an effective survey method designed to elicit sensitive information while ensuring the privacy of the respondents. In this article, we present some new results on the randomized response model in situations wherein one or two response variables are assumed to follow a multinomial distribution. For a single sensitive question, we use the well-known Hopkins randomization device to derive estimates, both under the assumption of truthful and untruthful responses, and present a technique for making pairwise comparisons. When there are two sensitive questions of interest, we derive a Pearson product moment correlation estimator based on the multinomial model assumption. This estimator may be used to quantify the linear relationship between two variables when multinomial response data are observed according to a randomized-response protocol.

18.
This paper studies a sequential procedure R for selecting a random size subset that contains the multinomial cell which has the smallest cell probability. The stopping rule of the proposed procedure R is the composite of the stopping rules of curtailed sampling, inverse sampling, and the Ramey-Alam sampling. A result on the worst configuration is shown and it is employed in computing the procedure parameters that guarantee certain probability requirements. Tables of these procedure parameters, the corresponding probability of correct selection, the expected sample size, and the expected subset size are given for comparison purposes.

19.
The problem of classification into two univariate normal populations with a common mean is considered. Several classification rules are proposed based on efficient estimators of the common mean. Detailed numerical comparisons of the probabilities of misclassification using these rules have been carried out. It is shown that the classification rule based on the Graybill-Deal estimator of the common mean performs the best. Classification rules are also proposed for the case when variances are assumed to be ordered. These rules are compared with the rule based on the Graybill-Deal estimator with respect to individual probabilities of misclassification.

20.
Suppose that we observe X ~ Binomial(n, p). Inference on p is difficult if X = 0 or n. One way around this is to condition on these events not happening. We show that this has only an exponentially small effect on the cumulants of X. This is also true if we condition away other rare events. Our results are presented for exponential families, with applications to the binomial, multinomial and negative multinomial distributions.
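The size of the effect on the first cumulant (the mean) can be checked numerically with a small sketch. It simply evaluates the exact mean of X conditional on 0 < X < n and compares it with np; the function name and the chosen values of n and p are illustrative assumptions, and higher cumulants are not computed.

```python
from math import comb


def conditional_mean(n, p):
    """Exact mean of X ~ Binomial(n, p) conditional on 0 < X < n."""
    ks = range(1, n)  # excludes the boundary outcomes X = 0 and X = n
    weights = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in ks]
    return sum(k * w for k, w in zip(ks, weights)) / sum(weights)


n, p = 50, 0.3
shift = conditional_mean(n, p) - n * p  # change in the mean due to conditioning
```

For these illustrative values the probability of the excluded events is dominated by (1 - p)^n, which is already very small at n = 50, so the shift in the mean is correspondingly tiny.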
