Similar Documents
20 similar documents found.
1.
The latent class model, or multivariate multinomial mixture, is a powerful approach for clustering categorical data. It relies on a conditional independence assumption given the latent class to which each statistical unit belongs. In this paper, we exploit the fact that a fully Bayesian analysis with Jeffreys non-informative prior distributions presents no technical difficulty, and propose an exact expression for the integrated complete-data likelihood, which is known to be a meaningful model selection criterion from a clustering perspective. Similarly, a Monte Carlo approximation of the integrated observed-data likelihood can be obtained in two steps: an exact integration over the parameters, followed by an approximation of the sum over all possible partitions through an importance sampling strategy. The exact and approximate criteria are then compared experimentally with their standard asymptotic BIC approximations for choosing the number of mixture components. Numerical experiments on simulated data and a biological example show that the asymptotic criteria are usually dramatically more conservative than the non-asymptotic criteria presented here, not only for moderate sample sizes as expected but also for quite large ones. This research highlights that standard asymptotic criteria can often fail to select interesting structures present in the data.
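Because the Jeffreys Dirichlet priors are conjugate to the multinomial, the exact integrated complete-data likelihood reduces to products of ratios of Dirichlet normalizing constants. The sketch below illustrates that computation for integer-coded categorical data; the function names and the data layout are assumptions for illustration, not the authors' code.

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_norm(alpha):
    # log of the Dirichlet normalizing constant B(alpha)
    return gammaln(alpha).sum() - gammaln(alpha.sum())

def log_icl(x, z, g):
    """Exact integrated complete-data likelihood of a latent class model
    under Jeffreys Dirichlet(1/2) priors (illustrative sketch).
    x: (n, d) integer-coded categories; z: (n,) class labels in 0..g-1."""
    n, d = x.shape
    out = 0.0
    # mixing proportions: multinomial counts integrated against Dirichlet(1/2)
    n_k = np.bincount(z, minlength=g).astype(float)
    out += log_dirichlet_norm(n_k + 0.5) - log_dirichlet_norm(np.full(g, 0.5))
    # conditional independence: one Dirichlet integral per class and variable
    for k in range(g):
        xk = x[z == k]
        for j in range(d):
            m = int(x[:, j].max()) + 1        # number of levels of variable j
            c = np.bincount(xk[:, j], minlength=m).astype(float)
            out += log_dirichlet_norm(c + 0.5) - log_dirichlet_norm(np.full(m, 0.5))
    return out
```

Maximizing this quantity over the number of classes g, with z taken as the estimated partition, is the clustering-oriented model choice that the abstract contrasts with its BIC approximation.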

2.
In this paper the exact null distribution of Bartlett's criterion for testing the homogeneity of variances in normal samples with unequal sizes is derived. The most general form of the density function is obtained by using contour integration. The expression for the cumulative distribution, being a series in simple algebraic functions, seems quite tractable for computation of the exact critical values. In the special case of equal sample sizes, some indication of the relation of the work of others to our series expansions has also been given.
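For orientation, the statistic whose exact null distribution is derived here is the classical Bartlett statistic; its routine chi-square approximation, which the exact critical values are meant to replace for small unequal samples, is available directly in SciPy. A minimal sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# three normal samples of unequal sizes with equal variances (H0 true)
samples = [rng.normal(loc=0.0, scale=1.0, size=n) for n in (8, 13, 21)]

stat, p_approx = stats.bartlett(*samples)   # chi-square approximation
print(f"Bartlett statistic = {stat:.3f}, approximate p-value = {p_approx:.3f}")
```

With sample sizes this small, the paper's exact critical values would replace the chi-square quantile used implicitly in the p-value above.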

3.
In regression analysis, a best subset of regressors is usually selected by minimizing Mallows's Cp statistic or some other equivalent criterion, such as the Akaike information criterion or cross-validation. It is known that the resulting procedure suffers from a lack of consistency that can lead to a model with too many variables. For this reason, corrections have been proposed that yield consistent procedures. The object of this paper is to show that these corrected criteria, although asymptotically consistent, are usually too conservative for finite sample sizes. The paper also proposes a new correction of Mallows's statistic that yields better results. A simulation study shows that the proposed criterion performs well in a variety of situations.
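As a point of reference, the uncorrected procedure that the paper's correction modifies can be sketched as an exhaustive best-subset search minimizing Cp; the 2p penalty term below is exactly what consistency corrections adjust. Function and variable names are illustrative.

```python
import itertools
import numpy as np

def best_subset_cp(X, y):
    """Exhaustive subset search by Mallows's Cp (illustrative sketch).
    X: (n, k) regressors without intercept; y: (n,) response."""
    n, k = X.shape
    ones = np.ones((n, 1))
    # sigma^2 estimated from the full model, as in the usual Cp definition
    full = np.hstack([ones, X])
    r_full = y - full @ np.linalg.lstsq(full, y, rcond=None)[0]
    s2 = r_full @ r_full / (n - k - 1)
    best_cp, best_subset = np.inf, ()
    for r in range(k + 1):
        for subset in itertools.combinations(range(k), r):
            Xs = np.hstack([ones, X[:, subset]]) if subset else ones
            res = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
            cp = res @ res / s2 - n + 2 * (len(subset) + 1)  # Cp = SSE/s2 - n + 2p
            if cp < best_cp:
                best_cp, best_subset = cp, subset
    return best_cp, best_subset
```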

4.
In this article, we point out some interesting relations between the exact test and the score test for a binomial proportion p. Based on the properties of the tests, we propose approximate as well as exact methods of computing the sample sizes required for the tests to attain a specified power. Sample sizes required for the tests are tabulated for various values of p to attain a power of 0.80 at level 0.05. We also propose approximate and exact methods of computing the sample sizes needed to construct confidence intervals with a given precision. Using the proposed exact methods, sample sizes required to construct 95% confidence intervals with various precisions are tabulated for p = 0.05(0.05)0.50. The approximate methods for computing sample sizes for score confidence intervals are very satisfactory, and their results coincide with those of the exact methods in many cases.
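A sketch of one exact computation of this kind: for the one-sided exact binomial test, the critical value and power are finite sums, so the smallest n attaining a target power can be found by direct search. The function below is an illustration under those assumptions, not the authors' tabulation code.

```python
import numpy as np
from scipy.stats import binom

def exact_test_sample_size(p0, p1, alpha=0.05, power=0.80, n_max=5000):
    """Smallest n for which the level-alpha exact test of H0: p <= p0
    attains the target power at p = p1 (> p0). Returns (n, critical value)."""
    for n in range(1, n_max + 1):
        k = np.arange(n + 2)
        tail = binom.sf(k - 1, n, p0)       # P(X >= k | p0)
        c = int(k[tail <= alpha][0])        # smallest valid critical value
        if binom.sf(c - 1, n, p1) >= power:
            return n, c
    raise ValueError("target power not reached for n <= n_max")
```

For instance, exact_test_sample_size(0.3, 0.5) searches for the smallest n whose size-0.05 rejection region {X >= c} has power at least 0.80.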

5.
Pearson’s chi-square (Pe), likelihood ratio (LR), and Fisher-Freeman-Halton (Fi) test statistics are commonly used to test for association in an unordered r×c contingency table. Asymptotically, these test statistics follow a chi-square distribution. For small samples, the asymptotic chi-square approximations are unreliable, so the exact p-value is frequently computed conditional on the row and column sums. One drawback of the exact p-value is that it is conservative. Different adjustments have been suggested, such as Lancaster’s mid-p version and randomized tests. In this paper, we consider 3×2, 2×3, and 3×3 tables and compare the exact power and significance level of the standard, mid-p, and randomized versions of these tests. The mid-p and randomized versions have approximately the same power as each other and higher power than the standard versions. The mid-p type-I error probability seldom exceeds the nominal level. For a given set of parameters, the power of Pe, LR, and Fi varies in approximately the same way across the standard, mid-p, and randomized versions. Although there is no general ranking of these tests, in some situations, especially when averaged over the parameter space, Pe and Fi have about the same power, slightly higher than that of LR. When the sample sizes (i.e., the row sums) are equal, the differences are small; otherwise the observed differences can be 10% or more. In some cases, perhaps characterized by poorly balanced designs, LR has the highest power.
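Lancaster's mid-p adjustment is easiest to see in the 2×2 case, where the conditional null distribution of one cell given the margins is hypergeometric; the paper's 3×2, 2×3, and 3×3 settings apply the same idea to the Fisher-Freeman-Halton statistic. A one-sided sketch (illustrative, not the authors' code):

```python
from scipy.stats import hypergeom

def fisher_midp_one_sided(a, b, c, d):
    """One-sided exact and mid-p values for the 2x2 table [[a, b], [c, d]].
    Mid-p subtracts half the probability of the observed table."""
    n1, n2, m1 = a + b, c + d, a + c           # fixed margins
    dist = hypergeom(n1 + n2, n1, m1)          # null law of cell (1,1)
    p_exact = dist.sf(a - 1)                   # P(X >= a)
    p_mid = p_exact - 0.5 * dist.pmf(a)
    return p_exact, p_mid
```

Because the subtracted half-atom removes part of the discreteness, the mid-p version is less conservative, at the price of not strictly guaranteeing the nominal level, matching the behaviour reported above.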

6.
We study a factor analysis model with two normally distributed observations and one factor. In the case when the errors have equal variance, the maximum likelihood estimate of the factor loading is given in closed form. Exact and approximate distributions of the maximum likelihood estimate are considered. The exact distribution function is given in a complex form that involves the incomplete Beta function. Approximations to the distribution function are given for the cases of large sample sizes and small error variances. The accuracy of the approximations is discussed.

7.
Edgeworth expansions as well as saddle-point methods are used to approximate the distributions of some spacing statistics for small to moderate sample sizes. By comparing with the exact values when available, it is shown that a particular form of Edgeworth expansion produces extremely good results even for fairly small sample sizes. However, this expansion suffers from negative tail probabilities, and an accurate approximation without this disadvantage is shown to be the one based on the saddle-point method. Finally, quantiles of some spacing statistics whose exact distributions are not known are tabulated, making them available in a variety of testing contexts.

8.
A simple procedure for specifying a histogram with variable cell sizes is proposed. The procedure chooses a set of cutpoints that maximizes a criterion function based on the sample spacings. Under some conditions, this estimated set of cutpoints is shown to converge in probability to the theoretical set of cutpoints for the histogram estimate that minimizes the Hellinger distance to the underlying density. An algorithm for finding the set of cutpoints that numerically maximizes the criterion function is presented along with an example. Performance for finite sample sizes is evaluated by simulations.

9.
Realistic statistical modelling of observational data often suggests a statistical model that is not fully identified, owing to potential biases that are not under the control of study investigators. Bayesian inference can be implemented with such a model, ideally with the most precise prior knowledge that can be ascertained. However, as a consequence of the non-identifiability, inference cannot be made arbitrarily accurate by choosing the sample size to be sufficiently large. This, in turn, has consequences for sample size determination. The paper presents a sample size criterion based on a quantification of how much Bayesian learning can arise in a given non-identified model. A global perspective is adopted, whereby choosing larger sample sizes for some studies necessarily implies that other potentially worthwhile studies cannot be undertaken. This suggests that smaller sample sizes should be selected with non-identified models, as larger sample sizes constitute a squandering of resources in making estimator variances very small compared with their biases. In particular, consider two investigators planning the same study, one of whom admits to the potential biases at hand and consequently uses a non-identified model, whereas the other pretends that there are no biases, leading to an identified but less realistic model. It is seen that the former investigator always selects a smaller sample size than the latter, with the difference being quite marked in some illustrative cases.

10.
In applications of survival analysis, the failure rate function may frequently present a unimodal shape, and in such cases the log-normal and log-logistic distributions are used. In this paper, we are concerned only with parametric forms, so a location-scale regression model based on the odd log-logistic Weibull distribution is proposed for modelling data with decreasing, increasing, unimodal, and bathtub-shaped failure rate functions, as an alternative to the log-Weibull regression model. For censored data, we consider a classic method to estimate the parameters of the proposed model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to assess global influence. Further, various simulations are performed for different parameter settings, sample sizes, and censoring percentages. In addition, the empirical distribution of some modified residuals is determined and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the new regression model applied to censored data. We analyse a real data set using the log-odd log-logistic Weibull regression model.

11.
Two simple tests that allow for unequal sample sizes are considered for testing hypotheses about the common mean of two normal populations. The first is an exact test of size α based on the two available single-sample t-statistics, made exact through random allocation of α between the two t-tests. The test statistic of the second test is a weighted average of the two available t-statistics with random weights. It is shown that the first test is more efficient than the two available t-tests with respect to Bahadur asymptotic relative efficiency. It is also shown that the null distribution of the test statistic in the second test, which is similar to the one based on the normalized Graybill-Deal test statistic, converges to a standard normal distribution. Finally, we compare the small-sample properties of these tests, those given in Zhou and Mathew (1993), and some tests given in Cohen and Sackrowitz (1984) in a simulation study. In this study, we find that the second test performs better than the tests given in Zhou and Mathew (1993) and is comparable to those given in Cohen and Sackrowitz (1984) with respect to power.
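For reference, the classical Graybill-Deal common-mean estimator underlying the second statistic weights each sample mean by its estimated precision; the sketch below shows the estimator and a plug-in standard error. The authors' statistic uses random weights and is only similar to this normalized form.

```python
import numpy as np

def graybill_deal(x1, x2):
    """Graybill-Deal common-mean estimate for two normal samples (sketch).
    Weights are n_i / s_i^2, the estimated precisions of the sample means."""
    m1, m2 = np.mean(x1), np.mean(x2)
    w1 = len(x1) / np.var(x1, ddof=1)
    w2 = len(x2) / np.var(x2, ddof=1)
    mu_hat = (w1 * m1 + w2 * m2) / (w1 + w2)
    se_plugin = (w1 + w2) ** -0.5   # naive standard error of the estimate
    return mu_hat, se_plugin
```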

12.
It is shown that the nonparametric two-sample test recently proposed by Baumgartner, Weiß, and Schindler (1998, Biometrics, 54, 1129-1135) does not control the type I error rate in the case of small sample sizes. We investigate the exact permutation test based on their statistic and demonstrate that this test is hardly conservative. Comparing exact tests, the procedure based on the new statistic has a less conservative size and is, according to simulation results, more powerful than the often-employed Wilcoxon test. Furthermore, the new test is also powerful in settings less restrictive than the location-shift model; for example, it can detect location-scale alternatives. Therefore, we use the test to create a powerful modification of the nonparametric location-scale test of Lepage (1971, Biometrika, 58, 213-217). Selected critical values for the proposed tests are given.
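A sketch of the statistic and a Monte Carlo version of its permutation test follows; the B formula is as given by Baumgartner, Weiß, and Schindler (1998), no ties are assumed, and the loop samples rearrangements rather than fully enumerating them (full enumeration yields the exact test discussed above for small samples).

```python
import numpy as np

def bws_statistic(x, y):
    """Baumgartner-Weiss-Schindler B statistic (no ties assumed)."""
    n, m = len(x), len(y)
    N = n + m
    ranks = np.argsort(np.argsort(np.concatenate([x, y]))) + 1
    r, h = np.sort(ranks[:n]), np.sort(ranks[n:])
    i, j = np.arange(1, n + 1), np.arange(1, m + 1)
    bx = np.mean((r - N / n * i) ** 2 /
                 (i / (n + 1) * (1 - i / (n + 1)) * m * N / n))
    by = np.mean((h - N / m * j) ** 2 /
                 (j / (m + 1) * (1 - j / (m + 1)) * n * N / m))
    return 0.5 * (bx + by)

def bws_permutation_pvalue(x, y, n_perm=20_000, seed=0):
    """Monte Carlo permutation p-value for the B statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    obs = bws_statistic(x, y)
    hits = sum(
        bws_statistic(perm[:len(x)], perm[len(x):]) >= obs
        for perm in (rng.permutation(pooled) for _ in range(n_perm))
    )
    return (hits + 1) / (n_perm + 1)   # add-one Monte Carlo estimate
```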

13.
Variance estimators for probability sample-based predictions of species richness (S) are typically conditional on the sample (expected variance). In practical applications, sample sizes are typically small, and the variance of input parameters to a richness estimator should not be ignored. We propose a modified bootstrap variance estimator that attempts to capture the sampling variance by generating B replications of the richness prediction from stochastically resampled data of species incidence. The variance estimator is demonstrated for the observed richness (SO), five richness estimators, and with simulated cluster sampling (without replacement) in 11 finite populations of forest tree species. A key feature of the bootstrap procedure is a probabilistic augmentation of a species incidence matrix by the number of species expected to be ‘lost’ in a conventional bootstrap resampling scheme. In Monte-Carlo (MC) simulations, the modified bootstrap procedure performed well in terms of tracking the average MC estimates of richness and standard errors. Bootstrap-based estimates of standard errors were as a rule conservative. Extensions to other sampling designs, estimators of species richness and diversity, and estimates of change are possible.
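The probabilistic augmentation step is the paper's contribution and is not reproduced here; for contrast, the conventional bootstrap that it modifies, which tends to 'lose' species absent from a replicate, can be sketched as follows (names are illustrative).

```python
import numpy as np

def bootstrap_richness_se(incidence, B=1000, seed=0):
    """Conventional bootstrap standard error for observed richness S_O.
    incidence: (clusters x species) 0/1 matrix; clusters are resampled
    with replacement, and richness is recomputed per replicate."""
    rng = np.random.default_rng(seed)
    n = incidence.shape[0]
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # resample clusters
        reps[b] = (incidence[idx].sum(axis=0) > 0).sum()
    return reps.std(ddof=1)
```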

14.
Asymptotic theory based on the Fisher information matrix may provide a poor approximation to the exact variance matrix of maximum likelihood estimates in nonlinear models, and may consequently yield an inefficient D-optimal design. In this article, we propose a modified D-optimality criterion that uses a more accurate information matrix based on the Bhattacharyya matrix. The proposed information matrix and its properties are given for the two-parameter simple logistic model. It is shown that the resulting modified locally D-optimal design is more efficient than the previous one, particularly for small-sample experiments.

15.
One of the most important steps in the design of a pharmaceutical clinical trial is the estimation of the sample size. For a superiority trial, the sample size formula (to achieve a stated power) would be based on a given clinically meaningful difference and a value for the population variance. The formula is typically used as though this population variance were known, whereas in reality it is unknown and is replaced by an estimate with its associated uncertainty. The variance estimate would be derived from an earlier, similarly designed study (or an overall estimate from several previous studies), and its precision would depend on its degrees of freedom. This paper provides a solution for the calculation of sample sizes that allows for the imprecision in the estimate of the sample variance, and shows that traditional formulae give sample sizes that are too small because they do not allow for this uncertainty, the deficiency being more acute with fewer degrees of freedom. It is recommended that the methodology described in this paper be used when the sample variance has fewer than 200 degrees of freedom.
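The paper's formulae are not reproduced here, but the underlying idea can be illustrated numerically: average the usual normal-approximation power over the sampling distribution of the variance estimate (s² on d degrees of freedom, so σ² = d·s²/U with U ~ χ²_d) and increase n until the expected power reaches the target. A hedged sketch, assuming a two-arm comparison with per-group size n:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def expected_power(n, delta, s2, d, alpha=0.05):
    """Normal-approximation power averaged over variance uncertainty:
    sigma^2 = d * s2 / U with U ~ chi-square(d)."""
    z = stats.norm.ppf(1 - alpha / 2)

    def integrand(u):
        sigma2 = d * s2 / u
        power = stats.norm.cdf(delta / np.sqrt(2 * sigma2 / n) - z)
        return power * stats.chi2.pdf(u, d)

    return quad(integrand, 1e-9, np.inf)[0]

def sample_size(delta, s2, d, target=0.80, alpha=0.05):
    n = 2
    while expected_power(n, delta, s2, d, alpha) < target:
        n += 1
    return n   # exceeds the known-variance answer, markedly so for small d
```

Running sample_size with small versus large d reproduces the qualitative message above: the fewer the degrees of freedom behind s², the larger the required inflation over the traditional formula.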

16.
In many experimental situations we need to test hypotheses concerning the equality of parameters of two or more binomial populations. Of special interest is knowledge of the sample sizes needed to detect certain differences among the parameters, for a specified power and at a given level of significance. Al-Bayyati (1971) derived a rule of thumb for a quick calculation of the sample size needed to compare two binomial parameters. The rule is defined in terms of the difference to be detected between the two parameters.

In this paper, we introduce a generalization of Al-Bayyati's rule to several independent proportions. The generalized rule gives a conservative estimate of the sample size needed to achieve a specified power in detecting certain differences among the binomial parameters at a given level of significance. The method is illustrated with an example.

17.
Two new implementations of the EM algorithm are proposed for maximum likelihood fitting of generalized linear mixed models. Both methods use random (independent and identically distributed) sampling to construct Monte Carlo approximations at the E-step. One approach involves generating random samples from the exact conditional distribution of the random effects (given the data) by rejection sampling, using the marginal distribution as a candidate. The second method uses a multivariate t importance sampling approximation. In many applications the two methods are complementary. Rejection sampling is more efficient when sample sizes are small, whereas importance sampling is better with larger sample sizes. Monte Carlo approximation using random samples allows the Monte Carlo error at each iteration to be assessed by using standard central limit theory combined with Taylor series methods. Specifically, we construct a sandwich variance estimate for the maximizer at each approximate E-step. This suggests a rule for automatically increasing the Monte Carlo sample size after iterations in which the true EM step is swamped by Monte Carlo error. In contrast, techniques for assessing Monte Carlo error have not been developed for use with alternative implementations of Monte Carlo EM algorithms utilizing Markov chain Monte Carlo E-step approximations. Three different data sets, including the infamous salamander data of McCullagh and Nelder, are used to illustrate the techniques and to compare them with the alternatives. The results show that the methods proposed can be considerably more efficient than those based on Markov chain Monte Carlo algorithms. However, the methods proposed may break down when the intractable integrals in the likelihood function are of high dimension.
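The first method is easy to sketch for a Bernoulli response: the conditional density of the random effects given the data is proportional to f(y | u) times the random-effects density, so the marginal N(0, σ²) works as a rejection-sampling candidate with acceptance probability f(y | u), which is at most 1 for discrete data. A sketch for a random-intercept logistic model (names and model are illustrative assumptions):

```python
import numpy as np

def sample_conditional_intercept(y, X, beta, sigma, rng, size=500):
    """Rejection sampling from the conditional distribution of a cluster's
    random intercept u given Bernoulli responses y, logistic link.
    Candidate: the marginal N(0, sigma^2); accept with probability f(y|u)."""
    draws = []
    while len(draws) < size:
        u = rng.normal(0.0, sigma)
        p = 1.0 / (1.0 + np.exp(-(X @ beta + u)))     # P(y_i = 1 | u)
        lik = np.prod(np.where(y == 1, p, 1.0 - p))   # f(y | u) <= 1
        if rng.uniform() < lik:
            draws.append(u)
    return np.array(draws)
```

The acceptance rate equals the marginal likelihood of the cluster's data and therefore falls as the cluster grows, consistent with the remark above that rejection sampling is the more efficient option for small sample sizes.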

18.
When testing hypotheses in two-sample problems, the Wilcoxon rank-sum test is often used to test the location parameter, and this test has been discussed by many authors over the years. One modification of the Wilcoxon rank-sum test was proposed by Tamura [On a modification of certain rank tests. Ann Math Stat. 1963;34:1101–1103]. Deriving the exact critical value of the statistic is difficult when the sample sizes increase. The normal approximation, the Edgeworth expansion, the saddlepoint approximation, and the permutation test were used to evaluate the upper tail probability of the modified Wilcoxon rank-sum test for finite sample sizes. The accuracy of the various approximations to the probability of the modified Wilcoxon statistic was investigated. Simulations were used to investigate the power of the modified Wilcoxon rank-sum test for the one-sided alternative under various population distributions for small sample sizes. The method is illustrated by the analysis of real data.
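Tamura's modified statistic is not reproduced here, but the permutation benchmark used above works identically for any two-sample rank statistic; an exact upper-tail computation for the plain rank-sum statistic, by full enumeration, might look like this (feasible only for small samples):

```python
import numpy as np
from itertools import combinations
from math import comb

def exact_upper_tail_ranksum(x, y):
    """Exact permutation upper-tail probability P(W >= w_obs) for the
    rank-sum statistic W, assuming no ties."""
    pooled = np.concatenate([x, y])
    ranks = np.argsort(np.argsort(pooled)) + 1.0
    n, N = len(x), len(pooled)
    w_obs = ranks[:n].sum()
    hits = sum(ranks[list(idx)].sum() >= w_obs
               for idx in combinations(range(N), n))
    return hits / comb(N, n)
```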

19.
We introduce the log-odd Weibull regression model based on the odd Weibull distribution (Cooray, 2006). We derive some mathematical properties of the log-transformed distribution. The new regression model represents a parametric family of models that includes as sub-models some widely known regression models that can be applied to censored survival data. We employ a frequentist analysis and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to assess global influence. Further, some simulations are performed for different parameter settings, sample sizes, and censoring percentages. In addition, the empirical distribution of some modified residuals is given and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We define martingale and deviance residuals to check the model assumptions. The extended regression model is very useful for the analysis of real data.

20.
In this paper we discuss the sample size problem for balanced one-way ANOVA under a posterior Bayesian formulation. Using the distribution theory of appropriate quadratic forms, we derive explicit sample sizes for prespecified posterior precisions. Comparisons with classical sample sizes are made. Instead of extensive tables, a Mathematica program for sample size calculation is given. The formulations given in this article form a foundational step towards Bayesian sample size calculation in general.
