Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Using mean absolute deviation, we compare the efficacy of two new parametric conditional error rate estimators with six others, four of which are well known. The performance of both new estimators is found to be superior to the six competing estimators examined in this paper, especially when the ratio of the training sample size to the feature dimensionality is small.

2.
Fisher's linear discriminant function, adapted by Anderson for allocating new observations into one of two existing groups, is considered in this paper. Methods of estimating the misclassification error rates are reviewed and evaluated by Monte Carlo simulations. The investigation is carried out under both ideal (multivariate normal data) and non-ideal (multivariate binary data) conditions. The assessment is based on the usual mean square error (MSE) criterion and also on a new criterion of optimism. The results show that although there is a common cluster of good estimators for both ideal and non-ideal conditions, the single best estimators vary with respect to the different criteria.

3.
Generalized linear mixed models (GLMMs) are commonly used to model the treatment effect over time while controlling for important clinical covariates. Standard software procedures often provide estimates of the outcome based on the mean of the covariates; however, these estimates will be biased for the true group means in the GLMM. Implementing GLMMs in the frequentist framework can lead to convergence issues. A simulation study is presented that demonstrates the use of fully Bayesian GLMMs for providing unbiased estimates of group means. These models are very straightforward to implement and can be used for a broad variety of outcomes (e.g., binary, categorical, and count data) that arise in clinical trials. We demonstrate the proposed method on a data set from a clinical trial in diabetes.

4.
It is widely believed that unlabeled data are promising for improving prediction accuracy in classification problems. Although theoretical studies about when/how unlabeled data are beneficial exist, an actual prediction improvement has not been sufficiently investigated for a finite sample in a systematic manner. We investigate the impact of unlabeled data in linear discriminant analysis and compare the error rates of the classifiers estimated with/without unlabeled data. Our focus is a labeling mechanism that characterizes the probabilistic structure of occurrence of labeled cases. Results imply that an extremely small proportion of unlabeled data has a large effect on the analysis results.

5.
In this paper, we propose a new Bayesian inference approach for classification based on the traditional hinge loss used for classical support vector machines, which we call the Bayesian Additive Machine (BAM). Unlike existing approaches, the new model has a semiparametric discriminant function where some feature effects are nonlinear and others are linear. This separation of features is achieved automatically during model fitting without user pre-specification. Following the literature on sparse regression of high-dimensional models, we can also identify the irrelevant features. By introducing spike-and-slab priors using two sets of indicator variables, these multiple goals are achieved simultaneously and automatically, without any parameter tuning such as cross-validation. An efficient partially collapsed Markov chain Monte Carlo algorithm is developed for posterior exploration based on a data augmentation scheme for the hinge loss. Our simulations and three real data examples demonstrate that the new approach is a strong competitor to some approaches that were proposed recently for dealing with challenging classification examples with high dimensionality.

6.
A density bounded class P of probability distributions on a space χ is the set of all probability distributions corresponding to probability densities bounded below by a given subprobability density and bounded above by a given superprobability density. Density bounded classes arise in robust Bayesian analysis (Lavine 1991) and also in Monte Carlo integration (Fishman, Granovsky, and Rubin 1989). Finding upper and lower bounds on the variance over all p ∈ P allows one to bound the Monte Carlo variance. Fishman, Granovsky, and Rubin (1989) find bounds on the variance over all p ∈ P and also find the densities in P achieving those bounds in the case where χ is discrete; that is, where P is actually a set of probability mass functions. This article generalizes their result by showing how to bound the variance and find the densities achieving the bounds when χ is continuous.

7.
Generalized linear models are commonly used to analyze categorical data such as binary, count, and ordinal outcomes. Adjusting for important prognostic factors or baseline covariates in generalized linear models may improve the estimation efficiency. The model-based mean for a treatment group produced by most software packages estimates the response at the mean covariate, not the mean response for this treatment group in the studied population. Although this is not an issue for linear models, the model-based group mean estimates in generalized linear models could be seriously biased for the true group means. We propose a new method to estimate the group mean consistently, together with the corresponding variance estimation. Simulation showed that the proposed method produces an unbiased estimator for the group means and provides the correct coverage probability. The proposed method was applied to analyze hypoglycemia data from clinical trials in diabetes. Copyright © 2014 John Wiley & Sons, Ltd.
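
The gap between "the response at the mean covariate" and "the mean response" can be illustrated with a small sketch. It is not the method proposed in the abstract: it assumes a logistic link, a single covariate, and made-up coefficients `b0` and `b1`, and it only shows that the two quantities differ under a nonlinear link while coinciding under the identity link.

```python
import math
import random


def expit(z):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def response_at_mean_covariate(xs, b0, b1, inv_link):
    """Plug the average covariate into the model (what many packages report)."""
    x_bar = sum(xs) / len(xs)
    return inv_link(b0 + b1 * x_bar)


def mean_response(xs, b0, b1, inv_link):
    """Average the predicted response over the observed covariate values."""
    return sum(inv_link(b0 + b1 * x) for x in xs) / len(xs)


# Hypothetical covariate sample and coefficients, chosen only for illustration.
rng = random.Random(0)
xs = [rng.gauss(0.0, 2.0) for _ in range(10_000)]
b0, b1 = -2.0, 1.0

# Logistic link: the two summaries disagree.
at_mean_logit = response_at_mean_covariate(xs, b0, b1, expit)
marginal_logit = mean_response(xs, b0, b1, expit)

# Identity link (linear model): the two summaries agree up to rounding.
identity = lambda z: z
at_mean_linear = response_at_mean_covariate(xs, b0, b1, identity)
marginal_linear = mean_response(xs, b0, b1, identity)

print(f"logit link:    at mean covariate = {at_mean_logit:.3f}, mean response = {marginal_logit:.3f}")
print(f"identity link: at mean covariate = {at_mean_linear:.3f}, mean response = {marginal_linear:.3f}")
```

With these assumed values the averaged probability is noticeably larger than the probability at the average covariate, which is the kind of bias the abstract describes.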

8.
Estimation of the mean of a multivariate normal distribution is considered. The components of the mean vector θ are assumed to be exchangeable; this is modelled in a hierarchical fashion with independent Cauchy distributions as the first-stage prior. The resulting generalized Bayes estimator is calculated and shown to be robust with respect to the presence of outlying means. Alternative estimators that have similar behaviour but are cheaper to compute are also derived.

9.
In this study, we consider a hypothesis test for the difference of two population means using ranked set sampling. We propose a test statistic for this hypothesis test with more than one cycle under normality. We also investigate the performance of this test statistic both when the assumptions hold and when they are violated. To this end, we investigate the type I error and power rates of the test under normality with equal and unequal variances, and under non-normality with equal and unequal variances. We also examine the performance of the test under imperfect ranking. The simulation results show that the derived test performs quite well.

10.
Finite sample properties of estimators for the parameters of a dependent Bernoulli process are investigated using Monte Carlo techniques. A ratio estimator is proposed for the dependence parameter of the model and is compared to the approximate maximum likelihood estimator given by Klotz. It is shown that both estimators have a downward bias that is extreme in certain cases and that samples well in excess of 200 may be necessary before the asymptotic theory can be applied.

11.
The property of identifiability is an important consideration when estimating the parameters in a mixture of distributions. Also, classification of a random variable based on a mixture can be meaningfully discussed only if the class of all finite mixtures is identifiable. The problem of identifiability of finite mixtures of Gompertz distributions is studied. A procedure is presented for finding maximum likelihood estimates of the parameters of a mixture of two Gompertz distributions, using classified and unclassified observations. Estimation of a nonlinear discriminant function based on small sample sizes is considered. The performance of the corresponding estimated nonlinear discriminant function is investigated through simulation experiments.

12.
This paper investigates the finite-sample performance of the augmented Dickey–Fuller (ADF), Phillips–Perron (PP), momentum threshold autoregressive (M-TAR), Kapetanios–Shin–Snell (KSS), and the inf-t unit-root tests. Simulation results show that the ADF and KSS tests have better size, whereas other tests generate severe size distortions when the data-generating processes are nonlinear unit-root processes. In general, with regard to the combination of test powers with test sizes, the ADF and KSS tests are comparatively better than the PP, M-TAR, and inf-t tests; moreover, the inf-t test exhibits the poorest performance even for larger sample sizes.

13.
We consider the linear feature selection problem of obtaining a nonzero 1 × n matrix B which minimizes the probability of misclassification based on the Bayes decision rule applied to the random variable Y = BX, where X is a random n-vector arising from one of m Gaussian populations with equal covariances and equal a priori probabilities. It is shown that the optimal B satisfies a fixed point equation B = F(B) which can be solved by successive substitution.
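
Successive substitution for an equation of the form B = F(B) can be sketched generically as follows. The abstract does not give the map F, so `toy_F` below is a placeholder contraction used only to make the sketch runnable; the function name `successive_substitution`, the tolerance, and the iteration cap are likewise assumptions.

```python
import math


def successive_substitution(F, b0, tol=1e-10, max_iter=1000):
    """Iterate b <- F(b) until the largest componentwise change is below tol.

    Returns the final iterate and the number of iterations used.
    Convergence is only guaranteed when F is a contraction near the solution.
    """
    b = list(b0)
    for k in range(1, max_iter + 1):
        b_next = F(b)
        step = max(abs(u - v) for u, v in zip(b_next, b))
        b = b_next
        if step < tol:
            return b, k
    raise RuntimeError("successive substitution did not converge")


def toy_F(b):
    """Placeholder map (NOT the F of the abstract): componentwise cosine."""
    return [math.cos(x) for x in b]


# Start from an arbitrary nonzero row vector, here with n = 2 entries.
b_star, iters = successive_substitution(toy_F, [1.0, 0.5])
residual = max(abs(x - fx) for x, fx in zip(b_star, toy_F(b_star)))
print(f"converged in {iters} iterations, residual {residual:.2e}")
```

For a real problem, `toy_F` would be replaced by the map derived from the optimality condition, and the starting vector would need to be nonzero to match the constraint on B.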

14.
Testing homogeneity of multivariate normal mean vectors under an order restriction is considered for the case in which the covariance matrices are unknown, arbitrary positive definite, and unequal. This testing problem has been studied to some extent, for example, by Kulatunga and Sasabuchi (1984) when the covariance matrices are known, and by Sasabuchi et al. (2003) and Sasabuchi (2007) when the covariance matrices are unknown but common. In this paper, a test statistic is proposed. Because the main advantage of the bootstrap test is that it avoids deriving the complex null distribution analytically, a bootstrap test statistic is derived; since the proposed test statistic is location invariant, the bootstrap p-value is well defined, and some steps are presented to estimate it. Our numerical studies via Monte Carlo simulation show that the proposed bootstrap test can correctly control the type I error rates. The power of the test for some p-dimensional normal distributions is computed by Monte Carlo simulation. Also, the null distribution of the test statistic is estimated using kernel density estimation. Finally, the bootstrap test is illustrated using a real data set.

15.
This paper presents a comprehensive comparison of well-known partially adaptive estimators (PAEs) in terms of efficiency in estimating regression parameters. The aim is to identify, via Monte Carlo simulation, the best estimators of regression parameters when the error terms follow normal, Laplace, Student's t, normal mixture, lognormal, and gamma distributions. From the simulation results, efficient PAEs are determined for symmetric leptokurtic and skewed leptokurtic regression error data. Additionally, these estimators are compared in regression applications. In these applications, using certain standard error estimators, it is shown that PAEs can reduce the standard error of the slope parameter estimate relative to ordinary least squares.

16.
This paper discusses a goodness-of-fit test that uses the integral of the squared modulus of the difference between the empirical characteristic function of the sample data and the characteristic function of the hypothesized distribution. Monte Carlo procedures are employed to obtain the empirical percentage points for testing the fit of normal, logistic and exponential distributions with unknown location and scale parameters. Results of Monte Carlo power comparisons with other well-developed goodness-of-fit tests are summarized. The proposed test is shown to have superior power for testing the fit of the logistic distribution (for moderate sample sizes) against a wide range of alternative distributions.

17.
In this paper, the problem of statistical hypothesis testing under weighted sampling is considered, with the aim of obtaining the most powerful test. The powers of some tests are simulated using the Monte Carlo method. Using a convenience sample of the specialist physicians of the Social Security Organization of Ahvaz in Iran, two weighted sampling schemes are tested against random sampling. Among the three sampling schemes mentioned, size-biased sampling of order 0.2 is the most appropriate for the mechanism of data collection.

18.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

19.
Although the collinearity issue has been studied in previous simulation studies with a simultaneous system of equations, alternative estimators to circumvent this problem have received little attention. Monte Carlo techniques are used to examine the performance of several estimators under a squared error loss criterion. In particular, this study considers the Vinod–Ullah ridge-type estimators at the first and/or second stage of 2SLS. Ridge regression applied only in the second stage of 2SLS, but not only in the first stage, seems to be a practical alternative to 2SLS, especially in situations of strong collinearity. The OLS estimator and the ordinary ridge regression estimator also yield favorable results in situations of moderate to strong collinearity.

20.
We deal with the problem of classifying a new observation vector into one of two known multivariate normal distributions when the dimension p and training sample size N are both large with p < N. Modified linear discriminant analysis (MLDA) was suggested by Xu et al. [10]. The error rate of MLDA is smaller than that of LDA. However, if p and N are moderately large, the error rate of MLDA is close to that of LDA. These results are conditional ones, so we should investigate whether they hold unconditionally. In this paper, we give two types of asymptotic approximations of the expected probability of misclassification (EPMC) for MLDA as n → ∞ with p = O(n^δ), 0 < δ < 1. One of the two is the same as the asymptotic approximation for LDA, and the other is a corrected version of the approximation. Simulation reveals that the corrected version of the approximation has good accuracy for the case in which p and N are moderately large.
