Similar articles
 20 similar articles found (search time: 15 ms)
1.
This paper gives a comparative study of the K-means algorithm and the mixture model (MM) method for clustering normal data. The EM algorithm is used to compute the maximum likelihood estimators (MLEs) of the parameters of the MM model. These parameters include mixing proportions, which may be thought of as the prior probabilities of different clusters; the maximum posterior (Bayes) rule is used for clustering. Hence, asymptotically the MM method approaches the Bayes rule for known parameters, which is optimal in terms of minimizing the expected misclassification rate (EMCR).
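
As a rough illustration of the two approaches compared in this abstract, the sketch below fits a two-component univariate normal mixture by EM, clusters by the maximum posterior rule, and runs a basic K-means for comparison. It is not the paper's implementation: the function names, the restriction to two components in one dimension, the min/max initialisation, and the variance floor are all assumptions made for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_normals(data, n_iter=50):
    """EM for a two-component univariate normal mixture (illustrative only)."""
    n = len(data)
    means = [min(data), max(data)]  # crude initialisation (assumption)
    overall_mean = sum(data) / n
    var0 = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [var0, var0]
    props = [0.5, 0.5]  # mixing proportions, read as prior cluster probabilities
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point
        resp = []
        for x in data:
            w = [props[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: weighted maximum likelihood updates
        for k in range(2):
            nk = sum(r[k] for r in resp)
            props[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a degenerate component
    return props, means, variances


def map_labels(data, props, means, variances):
    """Assign each point to the component with the largest posterior probability."""
    return [
        max(range(2), key=lambda k: props[k] * normal_pdf(x, means[k], variances[k]))
        for x in data
    ]


def kmeans_labels(data, n_iter=50):
    """Plain two-cluster K-means in one dimension, same initialisation as above."""
    centers = [min(data), max(data)]
    labels = [0] * len(data)
    for _ in range(n_iter):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in data]
        for k in range(2):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels
```

On well-separated data the two label vectors agree; the methods can differ when the clusters overlap or have unequal proportions or variances, which is the setting where the mixing proportions in the posterior rule matter.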

2.
In this paper, we consider the classification of high-dimensional vectors based on a small number of training samples from each class. The proposed method follows the Bayesian paradigm, and it is based on a small vector which can be viewed as the regression of the new observation on the space spanned by the training samples. The classification method provides posterior probabilities that the new vector belongs to each of the classes, hence it adapts naturally to any number of classes. Furthermore, we show a direct similarity between the proposed method and the multicategory linear support vector machine introduced in Lee et al. [2004. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 (465), 67–81]. We compare the performance of the technique proposed in this paper with the SVM classifier using real-life military and microarray datasets. The study shows that the misclassification errors of both methods are very similar, and that the posterior probabilities assigned to each class are fairly accurate.

3.
In a clinical trial comparing drug with placebo, where there are multiple primary endpoints, we consider testing problems where an efficacious drug effect can be claimed only if statistical significance is demonstrated at the nominal level for all endpoints. Under the assumption that the data are multivariate normal, the multiple endpoint-testing problem is formulated. The usual testing procedure involves testing each endpoint separately at the same significance level using two-sample t-tests, and claiming drug efficacy only if each t-statistic is significant. In this paper we investigate properties of this procedure. We show that it is identical to both an intersection union test and the likelihood ratio test. A simple expression for the p-value is given. The level and power function are studied; it is shown that the test may be conservative and that it is biased. Computable bounds for the power function are established.
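
The decision rule described in this abstract (claim efficacy only if every endpoint is significant at the same level) can be sketched as follows. The sketch assumes the per-endpoint p-values have already been computed, and it uses the standard intersection-union form in which the overall p-value is the largest individual p-value; whether this matches the exact expression derived in the paper is an assumption. The function names are hypothetical.

```python
def iut_p_value(p_values):
    """Overall p-value of an intersection-union test: the largest endpoint p-value."""
    if not p_values:
        raise ValueError("at least one endpoint p-value is required")
    return max(p_values)


def claim_efficacy(p_values, alpha=0.05):
    """Claim efficacy only if every endpoint is significant at level alpha."""
    return iut_p_value(p_values) <= alpha
```

Because every endpoint is tested at the full level alpha and all must reject, no multiplicity adjustment is applied, which is consistent with the abstract's remark that the procedure may be conservative.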

4.
A modified large-sample (MLS) approach and a generalized confidence interval (GCI) approach are proposed for constructing confidence intervals for intraclass correlation coefficients. Two particular intraclass correlation coefficients are considered in a reliability study. Both subjects and raters are assumed to be random effects in a balanced two-factor design, which includes subject-by-rater interaction. Computer simulation is used to compare the coverage probabilities of the proposed MLS approach (GiTTCH) and GCI approaches with the Leiva and Graybill [1986. Confidence intervals for variance components in the balanced two-way model with interaction. Comm. Statist. Simulation Comput. 15, 301–322] method. The competing approaches are illustrated with data from a gauge repeatability and reproducibility study. The GiTTCH method maintains at least the stated confidence level for interrater reliability. For intrarater reliability, the coverage is accurate in several circumstances but can be liberal in some circumstances. The GCI approach provides reasonable coverage for lower confidence bounds on interrater reliability, but its corresponding upper bounds are too liberal. Regarding intrarater reliability, the GCI approach is not recommended because the lower bound coverage is liberal. Comparing the overall performance of the three methods across a wide array of scenarios, the proposed modified large-sample approach (GiTTCH) provides the most accurate coverage for both interrater and intrarater reliability.

5.
We propose different multivariate nonparametric tests for factorial designs and derive their asymptotic distribution for the situation where the number of replications is limited, whereas the number of treatments goes to infinity (large a, small n case). The tests are based on separate rankings for the different variables, and they are therefore invariant under separate monotone transformations of the individual variables.

6.
This article investigates the large sample interval mapping method for genetic trait loci (GTL) in a finite non-linear regression mixture model. The general model includes most commonly used kernel functions, such as exponential family mixture, logistic regression mixture and generalized linear mixture models, as special cases. The populations derived from either the backcross or intercross design are considered. In particular, unlike all existing results in the literature on finite mixture models, the large sample results presented in this paper do not require the boundedness condition on the parameter space. Therefore, the large sample theory presented in this article possesses general applicability to the interval mapping method of GTL in genetic research. The limiting null distribution of the likelihood ratio test statistics can be utilized easily to determine the threshold values or p-values required in the interval mapping. The limiting distribution is proved to be free of the parameter values of the null model and free of the choice of a kernel function. Extension to the multiple marker interval GTL detection is also discussed. Simulation study results show favorable performance of the asymptotic procedure when sample sizes are moderate.

7.
Tsukanov (Theor. Probab. Appl. 26 (1981) 173–177) considers the regression model $E(y\mid Z)=Fp+Zq$, $D(y\mid Z)=\sigma^2 I_n$, where $y$ ($n\times 1$) is a vector of measured values, $F$ ($n\times k$) contains the control variables, $Z$ ($n\times l$) contains the observed values, and $p$ ($k\times 1$) and $q$ ($l\times 1$) are being estimated. Assuming that $Z=FL+R$, where $L$ ($k\times l$) is non-random, and the rows of $R$ ($n\times l$) are i.i.d. $N(0,\Sigma)$, we extend Tsukanov's results by (i) computing $E(\det H_p)$, where $H_p$ is the covariance matrix of $\hat{p}$, the l.s.e. of $p$, (ii) considering ‘optimality in the mean’ for the largest root criterion, (iii) discussing these equations when the matrix $R$ has a left-spherical distribution.

8.
Confidence intervals for parameters that can be arbitrarily close to being unidentified are unbounded with positive probability [e.g. Dufour, J.-M., 1997. Some impossibility theorems in econometrics with applications to instrumental variables and dynamic models. Econometrica 65, 1365–1388; Pfanzagl, J. 1998. The nonexistence of confidence sets for discontinuous functionals. Journal of Statistical Planning and Inference 75, 9–20], and the asymptotic risks of their estimators are unbounded [Pötscher, B.M., 2002. Lower risk bounds and properties of confidence sets for ill-posed estimation problems with applications to spectral density and persistence estimation, unit roots, and estimation of long memory parameters. Econometrica 70, 1035–1065]. We extend these “impossibility results” and show that all tests of size α concerning parameters that can be arbitrarily close to being unidentified have power that can be as small as α for any sample size even if the null and the alternative hypotheses are not adjacent. The results are proved for a very general framework that contains commonly used models.

9.
We consider a general class of mixed models, where the individual parameter vector is composed of a linear function of the population parameter vector plus an individual random effects vector. The linear function can vary for the different individuals. We show that the search for optimal designs for the estimation of the population parameter vector can be restricted to the class of group-wise identical designs, i.e., for each of the groups defined by the different linear functions only one individual elementary design has to be optimized. A way to apply the result to non-linear mixed models is described.

10.
In this paper, the hypothesis testing and interval estimation for the intraclass correlation coefficients are considered in a two-way random effects model with interaction. Two particular intraclass correlation coefficients are described in a reliability study. The tests and confidence intervals for the intraclass correlation coefficients are developed when the data are unbalanced. One approach is based on the generalized p-value and generalized confidence interval, the other is based on the modified large-sample idea. These two approaches simplify to the ones in Gilder et al. [2007. Confidence intervals on intraclass correlation coefficients in a balanced two-factor random design. J. Statist. Plann. Inference 137, 1199–1212] when the data are balanced. Furthermore, some statistical properties of the generalized confidence intervals are investigated. Finally, some simulation results to compare the performance of the modified large-sample approach with that of the generalized approach are reported. The simulation results indicate that the modified large-sample approach performs better than the generalized approach in the coverage probability and expected length of the confidence interval.

11.
We consider the comparison of mean vectors for k groups when k is large and sample size per group is fixed. The asymptotic null and non-null distributions of the normal theory likelihood ratio, Lawley–Hotelling and Bartlett–Nanda–Pillai statistics are derived under general conditions. We extend the results to tests on the profiles of the mean vectors, tests for additional information (provided by a sub-vector of the responses over and beyond the remaining sub-vector of responses in separating the groups) and tests on the dimension of the hyperplane formed by the mean vectors. Our techniques are based on perturbation expansions and limit theorems applied to independent but non-identically distributed sequences of quadratic forms in random matrices. In all these four MANOVA problems, the asymptotic null and non-null distributions are normal. Both the null and non-null distributions are asymptotically invariant to non-normality when the group sample sizes are equal. In the unbalanced case, a slight modification of the test statistics will lead to asymptotically robust tests. Based on the robustness results, some approaches for finite approximation are introduced. The numerical results provide strong support for the asymptotic results and finiteness approximations.

12.
It is argued that the probability of committing at least one type I error should be reported when testing the main effects simultaneously in a two-way disproportionate ANOVA without interaction. The circumstances where the two F-statistics depart appreciably from statistical independence are characterized, and it is pointed out that procedures now exist for evaluating the bivariate F-probabilities when required. The augmented analysis is illustrated with a numerical example and an extension is offered for asymmetric BIBDs with random block effects.

13.
We propose optimal procedures for partitioning k multivariate normal populations into two disjoint subsets with respect to a given standard vector. Populations are defined as good or bad according to whether their Mahalanobis distances to a known standard vector are small or large. Partitioning k multivariate normal populations is reduced to partitioning k non-central chi-square or non-central F distributions with respect to the corresponding non-centrality parameters, depending on whether the covariance matrices are known or unknown. The minimum required sample size for each population is determined to ensure that the probability of a correct decision attains a certain level. An example is given to illustrate our procedures.
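
The distance criterion behind this abstract can be illustrated with a short sketch that computes the squared Mahalanobis distance of each mean vector to a standard vector and splits the populations at a cutoff. This covers only the population-level definition of good and bad, not the paper's sample-based procedures or sample size determination. The function names, the shared known covariance matrix, and the single `threshold` cutoff are assumptions made for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_sq(mu, mu0, sigma):
    """Squared Mahalanobis distance (mu - mu0)' sigma^{-1} (mu - mu0)."""
    d = [a - b for a, b in zip(mu, mu0)]
    z = solve(sigma, d)  # z = sigma^{-1} d without forming the inverse
    return sum(di * zi for di, zi in zip(d, z))


def partition(means, mu0, sigma, threshold):
    """Split population indices into (good, bad) by squared distance to mu0."""
    good, bad = [], []
    for i, mu in enumerate(means):
        (good if mahalanobis_sq(mu, mu0, sigma) <= threshold else bad).append(i)
    return good, bad
```

For example, with an identity covariance matrix and `threshold=1.0`, a population with mean `[0.1, 0.0]` is classed as good and one with mean `[3.0, 3.0]` as bad relative to the standard vector `[0.0, 0.0]`.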

14.
Two sufficient conditions are given for an incomplete block design to be (M,S)-optimal. For binary designs the conditions are (i) that the elements in each row, excluding the diagonal element, of the association matrix differ by at most one, and (ii) that the off-diagonal elements of the block characteristic matrix differ by at most one. It is also shown how the conditions can be utilized for nonbinary designs and that for blocks of size two the sufficient condition in terms of the association matrix can be attained.

15.
Location-scale invariant Bickel–Rosenblatt goodness-of-fit tests (IBR tests) are considered in this paper to test the hypothesis that f, the common density function of the observed independent d-dimensional random vectors, belongs to a null location-scale family of density functions. The asymptotic behaviour of the test procedures for fixed and non-fixed bandwidths is studied by using a unifying approach. We establish the limiting null distribution of the test statistics and the consistency of the associated tests, and we derive their asymptotic power against sequences of local alternatives. These results show the asymptotic superiority, for fixed and local alternatives, of IBR tests with fixed bandwidth over IBR tests with non-fixed bandwidth.

16.
In the field of molecular biology, it is often of interest to analyze microarray data for clustering genes based on similar profiles of gene expression to identify genes that are differentially expressed under multiple biological conditions. One of the notable characteristics of a gene expression profile is that it shows a cyclic curve over a course of time. To group sequences of similar molecular functions, we propose a Bayesian Dirichlet process mixture of linear regression models with a Fourier series for the regression coefficients, for each of which a spike and slab prior is assumed. A full Gibbs-sampling algorithm is developed for an efficient Markov chain Monte Carlo (MCMC) posterior computation. Due to the so-called “label-switching” problem and different numbers of clusters during the MCMC computation, a post-processing approach of Fritsch and Ickstadt (2009) is additionally applied to MCMC samples for an optimal single clustering estimate by maximizing the posterior expected adjusted Rand index with the posterior probabilities of two observations being clustered together. The proposed method is illustrated with two simulated data sets and one real data set on the physiological response of fibroblasts to serum from Iyer et al. (1999).

17.
Weed, Bradley and Govindarajulu (1974) propose one-sample probability ratio tests based on Lehmann alternatives. They also study the finite sure termination of the stopping times. Motivated by Stein's (1946) proof of the termination of a sequential probability ratio test (SPRT) in the case of independent and identically distributed (i.i.d.) random variables and the work of Sethuraman (1970) for the two-sample rank order SPRT, we obtain a very mild condition (namely, that a certain random variable U(Z) is not identically zero) for the finite sure termination and the existence of the moment generating function (m.g.f.) of the stopping time of one-sample rank order SPRTs.

18.
This paper deals with the distributions of test statistics for the number of useful discriminant functions and the characteristic roots in canonical discriminant analysis. These asymptotic distributions have been extensively studied when the number $p$ of variables is fixed, the number $q+1$ of groups is fixed, and the sample size $N$ tends to infinity. However, these approximations become increasingly inaccurate as the value of $p$ increases for a fixed value of $N$. On the other hand, we often need to analyze high-dimensional data in which $p$ is large compared to $n$. The purpose of the present paper is to derive asymptotic distributions of these statistics in a high-dimensional framework such that $q$ is fixed, $p\to\infty$, $m=n-p+q\to\infty$, and $p/n\to c\in(0,1)$, where $n=N-q-1$. Numerical simulation revealed that our new asymptotic approximations are more accurate than the classical asymptotic approximations over a considerably wide range of $(n,p,q)$.

19.
The basic assumption underlying the concept of ranked set sampling is that actual measurement of units is expensive, whereas ranking is cheap. This may not be true in reality in certain cases where ranking may be moderately expensive. In such situations, based on total cost considerations, k-tuple ranked set sampling is known to be a viable alternative, where one selects k units (instead of one) from each ranked set. In this article, we consider estimation of the distribution function based on k-tuple ranked set samples when the cost of selecting and ranking units is not ignorable. We investigate estimation both in the balanced and unbalanced data case. Properties of the estimation procedure in the presence of ranking error are also investigated. Results of simulation studies as well as an application to a real data set are presented to illustrate some of the theoretical findings.

20.
In this paper, we present Srivastava-Chopra optimal balanced resolution V plans for $2^m$ factorials ($4 \le m \le 8$) which are robust in the sense that, when any observation is missing, each of these designs will remain a resolution V plan.
