首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Without the exchangeability assumption, permutation tests for comparing two population means do not provide exact control of the probability of making a Type I error. Another drawback of permutation tests is that it cannot be used to test hypothesis about one population. In this paper, we propose a new type of permutation tests for testing the difference between two population means: the split sample permutation t-tests. We show that the split sample permutation t-tests do not require the exchangeability assumption, are asymptotically exact and can be easily extended to testing hypothesis about one population. Extensive simulations were carried out to evaluate the performance of two specific split sample permutation t-tests: the split in the middle permutation t-test and the split in the end permutation t-test. The simulation results show that the split in the middle permutation t-test has comparable performance to the permutation test if the population distributions are symmetric and satisfy the exchangeability assumption. Otherwise, the split in the end permutation t-test has significantly more accurate control of level of significance than the split in the middle permutation t-test and other existing permutation tests.  相似文献   

2.
Preliminary testing procedures for the two means problem traditionally employ the pooled variance t-statistic. In this paper we show that bias of the t-statistic under conditions of heterogeneity of variance may be increased if use of the t-statistic is conditional on an affirmative F-test. For this reason we conclude that use of the t-statistic in preliminary testing procedures is inappropriate.  相似文献   

3.
Complete sets of orthogonal F-squares of order n = sp, where g is a prime or prime power and p is a positive integer have been constructed by Hedayat, Raghavarao, and Seiden (1975). Federer (1977) has constructed complete sets of orthogonal F-squares of order n = 4t, where t is a positive integer. We give a general procedure for constructing orthogonal F-squares of order n from an orthogonal array (n, k, s, 2) and an OL(s, t) set, where n is not necessarily a prime or prime power. In particular, we show how to construct sets of orthogonal F-squares of order n = 2sp, where s is a prime or prime power and p is a positive integer. These sets are shown to be near complete and approach complete sets as s and/or p become large. We have also shown how to construct orthogonal arrays by these methods. In addition, the best upper bound on the number t of orthogonal F(n, λ1), F(n, λ2), …, F(n, λ1) squares is given.  相似文献   

4.
This paper suggests some distribution-free methods for testing hypothesis of parallelism and concurrence of two linear regressions. We assume that the independent variable x is equally spaced. The proposed procedures are compared with nonparametric competitors and the normal theory t-test.  相似文献   

5.
This study examined the influence of heterogeneity of variance on Type I error rates and power of the independent-samples Student's t-test of equality of means on samples of scores from normal and 10 non-normal distributions. The same test of equality of means was performed on corresponding rank-transformed scores. For many non-normal distributions, both versions produced anomalous power functions, resulting partly from the fact that the hypothesis test was biased, so that under some conditions, the probability of rejecting H 0 decreased as the difference between means increased. In all cases where bias occurred, the t-test on ranks exhibited substantially greater bias than the t-test on scores. This anomalous result was independent of the more familiar changes in Type I error rates and power attributable to unequal sample sizes combined with unequal variances.  相似文献   

6.
We propose penalized-likelihood methods for parameter estimation of high dimensional t distribution. First, we show that a general class of commonly used shrinkage covariance matrix estimators for multivariate normal can be obtained as penalized-likelihood estimator with a penalty that is proportional to the entropy loss between the estimate and an appropriately chosen shrinkage target. Motivated by this fact, we then consider applying this penalty to multivariate t distribution. The penalized estimate can be computed efficiently using EM algorithm for given tuning parameters. It can also be viewed as an empirical Bayes estimator. Taking advantage of its Bayesian interpretation, we propose a variant of the method of moments to effectively elicit the tuning parameters. Simulations and real data analysis demonstrate the competitive performance of the new methods.  相似文献   

7.
A p-value is developed for testing the equivalence of the variances of a bivariate normal distribution. The unknown correlation coefficient is a nuisance parameter in the problem. If the correlation is known, the proposed p-value provides an exact test. For large samples, the p-value can be computed by replacing the unknown correlation by the sample correlation, and the resulting test is quite satisfactory. For small samples, it is proposed to compute the p-value by replacing the unknown correlation by a scalar multiple of the sample correlation. However, a single scalar is not satisfactory, and it is proposed to use different scalars depending on the magnitude of the sample correlation coefficient. In order to implement this approach, tables are obtained providing sub-intervals for the sample correlation coefficient, and the scalars to be used if the sample correlation coefficient belongs to a particular sub-interval. Once such tables are available, the proposed p-value is quite easy to compute since it has an explicit analytic expression. Numerical results on the type I error probability and power are reported on the performance of such a test, and the proposed p-value test is also compared to another test based on a rejection region. The results are illustrated with two examples: an example dealing with the comparability of two measuring devices, and an example dealing with the assessment of bioequivalence.  相似文献   

8.
The basic assumption underlying the concept of ranked set sampling is that actual measurement of units is expensive, whereas ranking is cheap. This may not be true in reality in certain cases where ranking may be moderately expensive. In such situations, based on total cost considerations, k-tuple ranked set sampling is known to be a viable alternative, where one selects k units (instead of one) from each ranked set. In this article, we consider estimation of the distribution function based on k-tuple ranked set samples when the cost of selecting and ranking units is not ignorable. We investigate estimation both in the balanced and unbalanced data case. Properties of the estimation procedure in the presence of ranking error are also investigated. Results of simulation studies as well as an application to a real data set are presented to illustrate some of the theoretical findings.  相似文献   

9.
Clustering gene expression time course data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Statistically, the problem of clustering time course data is a special case of the more general problem of clustering longitudinal data. In this paper, a very general and flexible model-based technique is used to cluster longitudinal data. Mixtures of multivariate t-distributions are utilized, with a linear model for the mean and a modified Cholesky-decomposed covariance structure. Constraints are placed upon the covariance structure, leading to a novel family of mixture models, including parsimonious models. In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters, including the component degrees of freedom, are estimated using an expectation-maximization algorithm and two different approaches to model selection are considered. The models are applied to simulated data to illustrate their efficacy; this includes a comparison with their Gaussian analogues—the use of these Gaussian analogues with a linear model for the mean is novel in itself. Our family of multivariate t mixture models is then applied to two real gene expression time course data sets and the results are discussed. We conclude with a summary, suggestions for future work, and a discussion about constraining the degrees of freedom parameter.  相似文献   

10.
Early investigations of the effects of non-normality indicated that skewness has a greater effect on the distribution of t-statistic than does kurtosis. When the distribution is skewed, the actual p-values can be larger than the values calculated from the t-tables. Transformation of data to normality has shown good results in the case of univariate t-test. In order to reduce the effect of skewness of the distribution on normal-based t-test, one can transform the data and perform the t-test on the transformed scale. This method is not only a remedy for satisfying the distributional assumption, but it also turns out that one can achieve greater efficiency of the test. We investigate the efficiency of tests after a Box-Cox transformation. In particular, we consider the one sample test of location and study the gains in efficiency for one-sample t-test following a Box-Cox transformation. Under some conditions, we prove that the asymptotic relative efficiency of transformed t-test and Hotelling's T 2-test of multivariate location with respect to the same statistic based on untransformed data is at least one.  相似文献   

11.
This article presents a fully Bayesian approach to modeling incomplete longitudinal data using the t linear mixed model with AR(p) dependence. Markov chain Monte Carlo (MCMC) techniques are implemented for computing posterior distributions of parameters. To facilitate the computation, two types of auxiliary indicator matrices are incorporated into the model. Meanwhile, the constraints on the parameter space arising from the stationarity conditions for the autoregressive parameters are handled by a reparametrization scheme. Bayesian predictive inferences for the future vector are also investigated. An application is illustrated through a real example from a multiple sclerosis clinical trial.  相似文献   

12.
Ryszard Zieliński 《Statistics》2013,47(1-2):143-150
A paradoxical behavior of the t-test under ε-contamination is presented. The paradox consists in that under a fixed distribution of contaminants an increasing of the probability of the appearance of a contaminant may decrease the violation of the size of the test! A simple explanation of the phenomenon is given. It is revealed which contaminants make the test conservative and which make it liberal: it appears that, in spite of the established opinion, conservatism or liberalism of the test depends not so much on the tails of the contaminating distribution as on where its support is located.  相似文献   

13.
In this paper, the hypothesis testing and interval estimation for the intraclass correlation coefficients are considered in a two-way random effects model with interaction. Two particular intraclass correlation coefficients are described in a reliability study. The tests and confidence intervals for the intraclass correlation coefficients are developed when the data are unbalanced. One approach is based on the generalized p-value and generalized confidence interval, the other is based on the modified large-sample idea. These two approaches simplify to the ones in Gilder et al. [2007. Confidence intervals on intraclass correlation coefficients in a balanced two-factor random design. J. Statist. Plann. Inference 137, 1199–1212] when the data are balanced. Furthermore, some statistical properties of the generalized confidence intervals are investigated. Finally, some simulation results to compare the performance of the modified large-sample approach with that of the generalized approach are reported. The simulation results indicate that the modified large-sample approach performs better than the generalized approach in the coverage probability and expected length of the confidence interval.  相似文献   

14.
Identifying differentially expressed genes is a basic objective in microarray experiments. Many statistical methods for detecting differentially expressed genes in multiple-slide experiments have been proposed. However, sometimes with limited experimental resources, only a single cDNA array or two Oligonuleotide arrays could be made or only insufficient replicated arrays could be conducted. Many current statistical models cannot be used because of the non-availability of replicated data. Simply using fold changes is also unreliable and inefficient [Chen et al. 1997. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics 2, 364–374; Newton et al. 2001. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8, 37–52; Pan et al. 2002. How many replicates of arrays are required to detect gene expression changes in microarray experiments? a mixture model approach. Genome Biol. 3, research0022.1-0022.10]. We propose a new method. If the log-transformed ratios for the expressed genes as well as unexpressed genes have equal variance, we use a Hadamard matrix to construct a t-test from a single array data. Basically, we test whether each doubtful gene has significantly differential expression compared to the unexpressed genes. We form some new random variables corresponding to the rows of a Hadamard matrix using the algebraic sum of gene expressions. A one-sample t-test is constructed and the p-value is calculated for each doubtful gene based on these random variables. By using any method for multiple testing, adjusted p-values could be obtained from original p-values and significance of doubtful genes can be determined. When the variance of expressed genes differs from the variance of unexpressed genes, we construct a z-statistic based on the result from application of Hadamard matrix and find the confidence interval to retain the null hypothesis. Using the interval, we determine differentially expressed genes. This method is also useful for multiple microarrays, especially when sufficient replicated data are not available for a traditional t-test. We apply our methodology to ApoAI data. The results appear to be promising. They not only confirm the early known differentially expressed genes, but also indicate more genes to be differentially expressed.  相似文献   

15.
We give a new and easier proof of the existence of a disjoint (3t  , 3, 1) cyclic difference family for every t>3t>3, first proved by Dinitz and Shalaby (2002). Our purely theoretical construction is still elementary but simpler and does not need to be checked by computer.  相似文献   

16.
This study compares empirical type I error and power of different permutation techniques that can be used for partial correlation analysis involving three data vectors and for partial Mantel tests. The partial Mantel test is a form of first-order partial correlation analysis involving three distance matrices which is widely used in such fields as population genetics, ecology, anthropology, psychometry and sociology. The methods compared are the following: (1) permute the objects in one of the vectors (or matrices); (2) permute the residuals of a null model; (3) correlate residualized vector 1 (or matrix A) to residualized vector 2 (or matrix B); permute one of the residualized vectors (or matrices); (4) permute the residuals of a full model. In the partial correlation study, the results were compared to those of the parametric t-test which provides a reference under normality. Simulations were carried out to measure the type I error and power of these permutatio methods, using normal and non-normal data, without and with an outlier. There were 10 000 simulations for each situation (100 000 when n = 5); 999 permutations were produced per test where permutations were used. The recommended testing procedures are the following:(a) In partial correlation analysis, most methods can be used most of the time. The parametric t-test should not be used with highly skewed data. Permutation of the raw data should be avoided only when highly skewed data are combined with outliers in the covariable. Methods implying permutation of residuals, which are known to only have asymptotically exact significance levels, should not be used when highly skewed data are combined with small sample size. (b) In partial Mantel tests, method 2 can always be used, except when highly skewed data are combined with small sample size. (c) With small sample sizes, one should carefully examine the data before partial correlation or partial Mantel analysis. For highly skewed data, permutation of the raw data has correct type I error in the absence of outliers. When highly skewed data are combined with outliers in the covariable vector or matrix, it is still recommended to use the permutation of raw data. (d) Method 3 should never be used.  相似文献   

17.
This paper gives necessary and sufficient conditions on σ, s, t and on μ, s, t for an array with s+t rows to have strength s and weight σ, or to be balanced and have strength s and weight μ. If a balanced array can exist, the conditions provide a construction. The solutions for t=1,2 are also given in an alternate form useful for the study of trim arrays. The balanced solution for t=1 is more detailed than that known so far, and permits one to determine whether or not a solution exists in possibly fewer steps.  相似文献   

18.
A dynamic model of a heterogeneous population is studied. Particles belonging to a population are divided, at every time t, into a finite number of classes according to their types and the partition changes over time. The role of the occupancy numbers, namely the cardinality of each class, is highlighted. The relationship between the stochastic process of occupancy numbers and the process of particle types is analyzed. The main goal of this paper is the estimation of the lifetime of each particle at a given time t, when the observed data are the history of the process of the number of dead particles up to t. Furthermore, a discrete time approximation of the filter is given.  相似文献   

19.
In this paper, we consider the prediction problem in multiple linear regression model in which the number of predictor variables, p, is extremely large compared to the number of available observations, n  . The least-squares predictor based on a generalized inverse is not efficient. We propose six empirical Bayes estimators of the regression parameters. Three of them are shown to have uniformly lower prediction error than the least-squares predictors when the vector of regressor variables are assumed to be random with mean vector zero and the covariance matrix (1/n)XtX(1/n)XtX where Xt=(x1,…,xn)Xt=(x1,,xn) is the p×np×n matrix of observations on the regressor vector centered from their sample means. For other estimators, we use simulation to show its superiority over the least-squares predictor.  相似文献   

20.
In a clinical trial comparing drug with placebo, where there are multiple primary endpoints, we consider testing problems where an efficacious drug effect can be claimed only if statistical significance is demonstrated at the nominal level for all endpoints. Under the assumption that the data are multivariate normal, the multiple endpoint-testing problem is formulated. The usual testing procedure involves testing each endpoint separately at the same significance level using two-sample t-tests, and claiming drug efficacy only if each t-statistic is significant. In this paper we investigate properties of this procedure. We show that it is identical to both an intersection union test and the likelihood ratio test. A simple expression for the p-value is given. The level and power function are studied; it is shown that the test may be conservative and that it is biased. Computable bounds for the power function are established.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号