Similar Documents (20 results)
1.
An objective of record linkage is to link two data files by identifying common elements. A popular model for this separation is the probabilistic one of Fellegi and Sunter. To estimate the parameters the model requires, a mixture model is usually constructed and fitted with the EM algorithm. For simplification, the assumption of conditional independence is often made: if several attributes of elements in the data are compared, the results of the comparisons on those attributes are independent within the mixture classes. A mixture model constructed under this assumption has often been used. This article introduces a straightforward extension of the model that allows for conditional dependencies but depends heavily on the choice of starting value; an estimation procedure for the EM starting value is therefore also proposed. The two models are compared empirically in a simulation study based on telephone book entries. In particular, the effect of different starting values and of conditional dependencies on the matching results is investigated.
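The conditional-independence mixture described above can be fitted with a few lines of EM. The sketch below is a minimal illustration, not the paper's implementation: the function name, the fixed starting values, and the binary agreement coding are all assumptions made here for clarity.

```python
import numpy as np

def em_fellegi_sunter(gamma, p_init=0.1, n_iter=200):
    """EM for a two-class mixture over binary comparison vectors under
    conditional independence: within the match / non-match classes the
    K field comparisons are treated as independent Bernoullis."""
    gamma = np.asarray(gamma, dtype=float)
    N, K = gamma.shape
    p = p_init                      # P(record pair is a match)
    m = np.full(K, 0.9)             # P(field agrees | match)
    u = np.full(K, 0.1)             # P(field agrees | non-match)
    for _ in range(n_iter):
        # E-step: posterior match probability; independence => product over fields
        lm = p * np.prod(m ** gamma * (1 - m) ** (1 - gamma), axis=1)
        lu = (1 - p) * np.prod(u ** gamma * (1 - u) ** (1 - gamma), axis=1)
        w = lm / (lm + lu)
        # M-step: weighted relative frequencies
        p = w.mean()
        m = (w[:, None] * gamma).sum(axis=0) / w.sum()
        u = ((1 - w)[:, None] * gamma).sum(axis=0) / (1 - w).sum()
    return p, m, u, w
```

As the abstract notes, the result of such an EM fit can depend strongly on the starting values `p_init`, `m`, and `u`; the extension with conditional dependencies makes that sensitivity worse, which motivates the paper's starting-value estimation procedure.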

2.
A sequentially rejective (SR) testing procedure introduced by Holm (1979) and modified (MSR) by Shaffer (1986) is considered for testing all pairwise mean comparisons. For such comparisons, both the SR and MSR methods require that the observed test statistics be ordered and compared, each in turn, to appropriate percentiles of Student's t distribution. For the MSR method these percentiles are based on the maximum number of true null hypotheses remaining at each stage of the sequential procedure, given prior significance at previous stages. A function is developed for determining this number from the number of means being tested and the stage of the test. For a test of all pairwise comparisons, the logical implications that follow the rejection of a null hypothesis render the MSR procedure uniformly more powerful than the SR procedure. Tables of percentiles for comparing K means, 3 < K < 6, using the MSR method are presented. These tables use Sidak's (1967) multiplicative inequality and simplify the use of the MSR procedure. Several modifications to the MSR are suggested as means of further increasing the power for testing the pairwise comparisons. General use of the MSR and the corresponding function for testing parameters other than the mean is discussed.
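Holm's plain SR step-down procedure is easy to state in code. The sketch below applies it to p-values rather than t percentiles, and it uses the ordinary Bonferroni denominators alpha/n, alpha/(n-1), ...; Shaffer's MSR sharpens these denominators using the logical structure of pairwise hypotheses, which is not reproduced here.

```python
def holm_sr(pvals, alpha=0.05):
    """Holm's (1979) sequentially rejective procedure: order the p-values,
    compare p_(1) to alpha/n, p_(2) to alpha/(n-1), ..., and stop at the
    first non-rejection.  Returns rejection flags in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    reject = [False] * n
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (n - step):
            reject[i] = True        # reject and move to the next stage
        else:
            break                   # first acceptance stops the procedure
    return reject
```

For example, `holm_sr([0.03, 0.01, 0.04])` rejects only the second hypothesis: 0.01 passes the first threshold 0.05/3, but 0.03 fails the second threshold 0.05/2, which stops the sequence.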

3.
The problem of selecting a subset containing the largest of several location parameters is considered, and a Gupta-type selection rule based on sample medians is investigated for normal and double exponential populations. Numerical comparisons between rules based on medians and means of small samples are made for normal and contaminated normal populations, assuming the population means to be equally spaced. It appears that the rule based on sample means loses its superiority over the rule based on sample medians when the samples are heavily contaminated. The asymptotic relative efficiency (ARE) of the medians procedure relative to the means procedure is also computed, assuming the normal means to be in a slippage configuration. The means procedure is found to be superior to the medians procedure in the sense of ARE. As in the small-sample case, the situation is reversed if the normal populations are highly contaminated.

4.
Identification in censored regression analysis and in hazard models of duration outcomes relies on the condition that censoring points are conditionally independent of latent outcomes, an assumption that may be questionable in many settings. This article proposes a test for this assumption based on a Cramér–von Mises-like test statistic comparing two different nonparametric estimators of the latent outcome cdf: the Kaplan–Meier estimator, and the empirical cdf conditional on the censoring point exceeding (for right-censored data) the cdf evaluation point. The test is consistent and has power against a wide variety of alternatives. Applying the test to unemployment duration data from the NLSY, the SIPP, and the PSID suggests the assumption is frequently suspect.
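One of the two estimators the test compares is the Kaplan–Meier product-limit estimator. A minimal self-contained sketch of it (illustrative only; the paper's second, conditional estimator and the Cramér–von Mises comparison are not reproduced here):

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimator S(t) = prod over event times t_i <= t of
    (1 - d_i / n_i), with d_i deaths at t_i and n_i at risk just before t_i.
    events: 1 = observed event, 0 = right-censored."""
    order = np.argsort(times)
    t = np.asarray(times)[order]
    e = np.asarray(events)[order]
    n = len(t)
    event_times, surv = [], []
    s, at_risk, i = 1.0, n, 0
    while i < n:
        j, d = i, 0
        while j < n and t[j] == t[i]:   # group ties at the same time
            d += e[j]
            j += 1
        if d > 0:                       # an event time: update the product
            s *= 1.0 - d / at_risk
            event_times.append(t[i])
            surv.append(s)
        at_risk -= j - i                # censored and dead both leave risk set
        i = j
    return event_times, surv
```

With `times=[1, 2, 3, 4]` and `events=[1, 1, 0, 1]`, the censored observation at t=3 only shrinks the risk set, so the survival curve drops at t=1, t=2, and t=4.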

5.
A Bayesian test for the point null testing problem in the multivariate case is developed. A procedure to get the mixed distribution using the prior density is suggested. For comparisons between the Bayesian and classical approaches, lower bounds on posterior probabilities of the null hypothesis, over some reasonable classes of prior distributions, are computed and compared with the p-value of the classical test. With our procedure, a better approximation is obtained because the p-value is in the range of the Bayesian measures of evidence.

6.

When analyzing categorical data using loglinear models in sparse contingency tables, asymptotic results may fail. In this paper the empirical properties of three commonly used asymptotic tests of independence, based on the uniform association model for ordinal data, are investigated by means of Monte Carlo simulation. Five different bootstrapped tests of independence are presented and compared to the asymptotic tests. The comparisons are made with respect to both size and power properties of the tests. Results indicate that the asymptotic tests have poor size control. The test based on the estimated association parameter is severely conservative and the two chi-squared tests (Pearson, likelihood-ratio) are both liberal. The bootstrap tests that either use a parametric assumption or are based on non-pivotal test statistics do not perform better than the asymptotic tests in all situations. The bootstrap tests that are based on approximately pivotal statistics provide both adjustment of size and enhancement of power. These tests are therefore recommended for use in situations similar to those included in the simulation study.
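The basic mechanics of a bootstrapped independence test can be sketched briefly. The code below is a generic parametric bootstrap of Pearson's X² under row-column independence; it is an assumption-laden simplification — the paper works with the uniform association model for ordinal data and with pivotal statistics, neither of which is reproduced here.

```python
import numpy as np

def bootstrap_independence_pvalue(table, B=2000, seed=0):
    """Parametric bootstrap of Pearson's X^2 under independence:
    resample multinomial tables from the product of the observed margins
    and locate the observed statistic in the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    table = np.asarray(table, dtype=float)
    n = table.sum()
    probs = np.outer(table.sum(axis=1) / n, table.sum(axis=0) / n)

    def pearson_x2(t):
        tn = t.sum()
        e = tn * np.outer(t.sum(axis=1) / tn, t.sum(axis=0) / tn)
        mask = e > 0                      # guard empty margins in resamples
        return (((t - e) ** 2)[mask] / e[mask]).sum()

    obs = pearson_x2(table)
    hits = 0
    for _ in range(B):
        boot = rng.multinomial(int(n), probs.ravel()).reshape(table.shape)
        if pearson_x2(boot.astype(float)) >= obs:
            hits += 1
    return (hits + 1) / (B + 1)           # add-one to avoid a zero p-value
```

In sparse tables this resampling replaces the chi-squared reference distribution, which is exactly the situation where the abstract reports the asymptotic tests losing size control.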

7.
Wang & Wells [J. Amer. Statist. Assoc. 95 (2000) 62] describe a non-parametric approach for checking whether the dependence structure of a random sample of censored bivariate data is appropriately modelled by a given family of Archimedean copulas. Their procedure is based on a truncated version of the Kendall process introduced by Genest & Rivest [J. Amer. Statist. Assoc. 88 (1993) 1034] and later studied by Barbe et al. [J. Multivariate Anal. 58 (1996) 197]. Although Wang & Wells (2000) determine the asymptotic behaviour of their truncated process, their model selection method is based exclusively on the observed value of its L2-norm. This paper shows how to compute asymptotic p-values for various goodness-of-fit test statistics based on a non-truncated version of Kendall's process. Conditions for weak convergence are met in the most common copula models, whether Archimedean or not. The empirical behaviour of the proposed goodness-of-fit tests is studied by simulation, and power comparisons are made with a test proposed by Shih [Biometrika 85 (1998) 189] for the gamma frailty family.

8.
A bootstrap-based method to construct 1−α simultaneous confidence intervals for relative effects in the one-way layout is presented. This procedure takes the stochastic correlation between the test statistics into account and results in narrower simultaneous confidence intervals than application of the Bonferroni correction. Instead of using the bootstrap distribution of a maximum statistic, the coverage of the confidence intervals for the individual comparisons is adjusted iteratively until the overall confidence level is reached. Empirical coverage and power estimates of the introduced procedure for many-to-one comparisons are presented and compared with asymptotic procedures based on the multivariate normal distribution.
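The iterative-adjustment idea can be sketched as follows. This is a loose illustration under stated assumptions, not the paper's method: it adjusts a common per-interval level for bootstrap percentile intervals of plain group means (not relative effects), estimating joint coverage from the same bootstrap replicates and bisecting between the Bonferroni level and the unadjusted level.

```python
import numpy as np

def simultaneous_percentile_cis(groups, alpha=0.05, B=2000, seed=0):
    """Tune one local per-interval alpha until the estimated joint coverage
    of all K bootstrap percentile intervals is about 1 - alpha."""
    rng = np.random.default_rng(seed)
    K = len(groups)
    # B bootstrap replicates of each group mean
    boots = np.array([[rng.choice(g, size=len(g), replace=True).mean()
                       for g in groups] for _ in range(B)])

    def joint_coverage(a):
        lo = np.quantile(boots, a / 2, axis=0)
        hi = np.quantile(boots, 1 - a / 2, axis=0)
        return np.all((boots >= lo) & (boots <= hi), axis=1).mean()

    lo_a, hi_a = alpha / K, alpha          # bracket: Bonferroni .. unadjusted
    for _ in range(40):                    # bisection on the local alpha
        mid = (lo_a + hi_a) / 2
        if joint_coverage(mid) >= 1 - alpha:
            lo_a = mid                     # can afford narrower intervals
        else:
            hi_a = mid
    a = lo_a
    cis = [(np.quantile(boots[:, k], a / 2),
            np.quantile(boots[:, k], 1 - a / 2)) for k in range(K)]
    return cis, a
```

Because the local level ends up between alpha/K and alpha, the resulting intervals are never wider than Bonferroni's, which mirrors the narrowing the abstract reports.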

9.
ABSTRACT

Harter (1979) summarized applications of order statistics to multivariate analysis up through 1949. The present paper covers the period 1950–1959. References in the two papers were selected from the first and second volumes, respectively, of the author's chronological annotated bibliography on order statistics [Harter (1978, 1983)]. Tintner (1950a) established formal relations between four special types of multivariate analysis: (1) canonical correlation, (2) principal components, (3) weighted regression, and (4) discriminant analysis, all of which depend on ordered roots of determinantal equations. During the decade 1950–1959, numerous authors contributed to distribution theory and/or computational methods for ordered roots and their applications to multivariate analysis. Test criteria for (i) multivariate analysis of variance, (ii) comparison of variance–covariance matrices, and (iii) multiple independence of groups of variates when the parent population is multivariate normal were usually derived from the likelihood ratio principle until S. N. Roy (1953) formulated the union–intersection principle on which Roy & Bose (1953) based their simultaneous test and confidence procedure. Roy & Bargmann (1958) used an alternative procedure, called the step-down procedure, in deriving a test for problem (iii), and J. Roy (1958) applied the step-down procedure to problems (i) and (ii). Various authors developed and applied distribution theory for several multivariate distributions. Advances were also made on multivariate tolerance regions [Fraser & Wormleighton (1951), Fraser (1951, 1953), Fraser & Guttman (1956), Kemperman (1956), and Somerville (1958)], a criterion for rejection of multivariate outliers [Kudô (1957)], and linear estimators, from censored samples, of parameters of multivariate normal populations [Watterson (1958, 1959)].
Textbooks on multivariate analysis were published by Kendall (1957) and Anderson (1958), as well as a monograph by Roy (1957) and a book of tables by Pillai (1957).

10.
The problem of testing the equality of the noncentrality parameters of two noncentral t-distributions with identical degrees of freedom is considered, which arises from the comparison of two signal-to-noise ratios for simple linear regression models. A test procedure is derived that is guaranteed to maintain Type I error while having only minimal amounts of conservativeness, and comparisons are made with several other approaches to this problem based on variance stabilizing transformations. The new procedure derived in this article is shown to have good properties and will be useful for practitioners.

11.
Repeated measurements are collected in a variety of situations and are generally characterized by a mixed model in which the correlation within a subject is specified by the random effects. In such a mixed model, we propose a multiple comparison procedure based on a variant of the Schwarz information criterion (SIC; Schwarz, 1978). The derivation of SIC indicates that it serves as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model; an approximate posterior probability for a candidate model can therefore be calculated from SIC. We suggest a variant of SIC that retains terms which are asymptotically negligible in the derivation of SIC; the variant improves upon the performance of SIC in small and moderate sample-size applications. Based upon the proposed variant, the corresponding posterior probability can be calculated for each candidate model. A hypothesis test for multiple comparisons involves one or more models in the candidate class, so the posterior probability of the hypothesis test is evaluated as the sum of the posterior probabilities of the models associated with the test. The approximate posterior probability based on the variant accommodates the effect of the prior on each model in the candidate class, and is therefore more effectively approximated than that based on SIC for conducting multiple comparisons. We derive the computational formula for the approximate posterior probability based on the variant in the mixed model. Applications to two real data sets demonstrate that the proposed procedure based on the SIC variant performs effectively in multiple comparisons.
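The SIC-to-posterior-probability conversion described above has a simple computational core. The sketch below uses the plain SIC weight exp(-SIC/2) under a uniform prior over the candidate class; the paper's variant, which keeps asymptotically negligible terms and non-uniform prior effects, is not reproduced here.

```python
import math

def sic_posterior_probs(sic_values):
    """Approximate posterior model probabilities from SIC values,
    P(M_k | data) ~ exp(-SIC_k / 2) / sum_j exp(-SIC_j / 2),
    assuming a uniform prior over the candidates.  A shift by the smallest
    SIC/2 keeps the exponentials numerically stable."""
    shift = min(s / 2.0 for s in sic_values)
    w = [math.exp(-(s / 2.0) + shift) for s in sic_values]
    total = sum(w)
    return [wi / total for wi in w]
```

A composite hypothesis is then handled exactly as the abstract says: sum the returned probabilities over the models associated with that hypothesis.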

12.
The standard hypothesis testing procedure in meta-analysis (or multi-center clinical trials) in the absence of treatment-by-center interaction relies on approximating the null distribution of the standard test statistic by a standard normal distribution. For relatively small sample sizes, the standard procedure has been shown by various authors to have poor control of the Type I error probability, leading to too many liberal decisions. In this article, two test procedures are proposed which rely on the t-distribution as the reference distribution. A simulation study indicates that the proposed procedures attain significance levels closer to the nominal level than the standard procedure.

13.
The Bradley–Terry model is widely and often beneficially used to rank objects from paired comparisons. The underlying assumption that makes ranking possible is the existence of a latent linear scale of merit or, equivalently, a kind of transitivity of the preference. However, in some situations, such as sensory comparisons of products, this assumption can be unrealistic. In these contexts, although the Bradley–Terry model remains of significant interest, the linear ranking does not make sense. Our aim is to propose a 2-dimensional extension of the Bradley–Terry model that accounts for interactions between the compared objects. From a methodological point of view, this proposal can be seen as a multidimensional scaling approach in the context of a logistic model for binomial data. Maximum likelihood estimation is investigated and asymptotic properties are derived in order to construct confidence ellipses on the diagram of the 2-dimensional scores. An illustrative example based on real sensory data shows how to use the 2-dimensional model to inspect the lack of fit of the Bradley–Terry model.
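For reference, the base (1-dimensional) Bradley–Terry model that the paper generalizes can be fitted with the classical Zermelo/MM fixed point. This sketch is the standard model only; the paper's 2-dimensional scores and interaction terms are not implemented here.

```python
import numpy as np

def bradley_terry(wins, n_iter=500):
    """One-dimensional Bradley-Terry fit via the Zermelo/MM fixed point:
    p_i <- w_i / sum_{j != i} n_ij / (p_i + p_j), then renormalize.
    wins[i][j] = number of times object i beat object j."""
    wins = np.asarray(wins, dtype=float)
    K = wins.shape[0]
    n_pair = wins + wins.T          # comparisons per pair
    w = wins.sum(axis=1)            # total wins per object
    p = np.ones(K)
    for _ in range(n_iter):
        denom = n_pair / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p = w / denom.sum(axis=1)
        p /= p.sum()                # merits are identified up to scale
    return p
```

Under the fitted model, the probability that object i is preferred to object j is p_i / (p_i + p_j) — the latent linear merit scale whose adequacy the 2-dimensional extension is designed to check.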

14.
There is a wide variety of stochastic ordering problems in which K groups (typically ordered with respect to time) are observed along with a (continuous) response. The interest of the study may be in finding the change-point group, i.e. the group where an inversion of trend of the variable under study is observed. A change point is not merely a maximum (or minimum) of the time-series function; a further requirement is that the trend be monotonically increasing before that point and monotonically decreasing afterwards. A suitable solution can be provided within a conditional approach, i.e. by considering a suitable nonparametric combination of dependent tests for simple stochastic ordering problems. The proposed procedure is very flexible and can be extended to trend and/or repeated measures problems. Comparisons through simulations and examples with the well-known Mack & Wolfe test for umbrella alternatives and with Page's test for trend problems with correlated data are investigated.

15.
Traditional statistical modeling of continuous outcome variables relies heavily on the assumption of a normal distribution. However, in some applications, such as analysis of microRNA (miRNA) data, normality may not hold. Skewed distributions play an important role in such studies and might lead to robust results in the presence of extreme outliers. We apply a skew-normal (SN) distribution, which is indexed by three parameters (location, scale and shape), in the context of miRNA studies. We developed a test statistic for comparing means of two conditions replacing the normal assumption with SN distribution. We compared the performance of the statistic with other Wald-type statistics through simulations. Two real miRNA datasets are analyzed to illustrate the methods. Our simulation findings showed that the use of a SN distribution can result in improved identification of differentially expressed miRNAs, especially with markedly skewed data and when the two groups have different variances. It also appeared that the statistic with SN assumption performs comparably with other Wald-type statistics irrespective of the sample size or distribution. Moreover, the real dataset analyses suggest that the statistic with SN assumption can be used effectively for identification of important miRNAs. Overall, the statistic with SN distribution is useful when data are asymmetric and when the samples have different variances for the two groups.
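The skew-normal machinery the abstract relies on is compact. The sketch below gives the SN(ξ, ω, α) mean/variance formulas, a simulator via the standard stochastic representation, and the generic Welch/Wald-type mean comparison that the paper benchmarks against; the paper's own SN-based test statistic is not reproduced here.

```python
import numpy as np

ROOT_2_OVER_PI = np.sqrt(2.0 / np.pi)

def sn_moments(xi, omega, alpha):
    """Mean and variance of SN(xi, omega, alpha):
    E X = xi + omega*delta*sqrt(2/pi), Var X = omega^2 (1 - 2 delta^2 / pi),
    with delta = alpha / sqrt(1 + alpha^2)."""
    delta = alpha / np.sqrt(1.0 + alpha ** 2)
    return (xi + omega * delta * ROOT_2_OVER_PI,
            omega ** 2 * (1.0 - 2.0 * delta ** 2 / np.pi))

def rvs_sn(xi, omega, alpha, size, rng):
    """Skew-normal draws via the stochastic representation
    delta*|U0| + sqrt(1 - delta^2)*U1 with U0, U1 iid N(0, 1)."""
    delta = alpha / np.sqrt(1.0 + alpha ** 2)
    u0 = np.abs(rng.standard_normal(size))
    u1 = rng.standard_normal(size)
    return xi + omega * (delta * u0 + np.sqrt(1.0 - delta ** 2) * u1)

def wald_two_sample(x, y):
    """Welch/Wald-type statistic for equality of two group means;
    approximately N(0, 1) under the null for large samples."""
    vx = np.var(x, ddof=1) / len(x)
    vy = np.var(y, ddof=1) / len(y)
    return (np.mean(x) - np.mean(y)) / np.sqrt(vx + vy)
```

Because the shape parameter α shifts the mean away from the location ξ, comparing locations instead of means is exactly the kind of mistake the SN parameterization guards against in skewed miRNA data.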

16.
In one-way ANOVA, most pairwise multiple comparison procedures depend on the assumption of normally distributed errors. In practice, errors quite frequently have non-normal distributions, so it is important to develop robust estimators of location and the associated variance under non-normality. In this paper, we consider the estimation of one-way ANOVA model parameters for making pairwise multiple comparisons under the short-tailed symmetric (STS) distribution. The classical least squares method is neither efficient nor robust, and the maximum likelihood estimation technique is problematic in this situation. The modified maximum likelihood (MML) estimation technique offers the opportunity to estimate model parameters in closed form under non-normal distributions. Hence, the use of MML estimators in the test statistic is proposed for pairwise multiple comparisons under the STS distribution. Efficiency and power comparisons of the test statistics based on the sample mean, trimmed mean, wave and MML estimators are given, and the robustness of the tests obtained using these estimators under plausible alternatives and an inlier model is examined. It is demonstrated that the test statistic based on MML estimators is efficient and robust, and the corresponding test is more powerful and has the smallest Type I error.

17.
In order for predictive regression tests to deliver asymptotically valid inference, account has to be taken of the degree of persistence of the predictors under test. There is also a maintained assumption that any predictability in the variable of interest is purely attributable to the predictors under test. Violation of this assumption by the omission of relevant persistent predictors renders the predictive regression invalid, and potentially also spurious, as both the finite sample and asymptotic size of the predictability tests can be significantly inflated. In response, we propose a predictive regression invalidity test based on a stationarity testing approach. To allow for an unknown degree of persistence in the putative predictors, and for heteroscedasticity in the data, we implement our proposed test using a fixed regressor wild bootstrap procedure. We demonstrate the asymptotic validity of the proposed bootstrap test by proving that the limit distribution of the bootstrap statistic, conditional on the data, is the same as the limit null distribution of the statistic computed on the original data, conditional on the predictor. This corrects a long-standing error in the bootstrap literature whereby it is incorrectly argued that for strongly persistent regressors and test statistics akin to ours the validity of the fixed regressor bootstrap obtains through equivalence to an unconditional limit distribution. Our bootstrap results are therefore of interest in their own right and are likely to have applications beyond the present context. An illustration is given by reexamining the results relating to U.S. stock returns data in Campbell and Yogo (2006). Supplementary materials for this article are available online.

18.
This paper proposes two methods of estimation for the parameters in a Poisson-exponential model. The proposed methods combine the method of moments with a regression method based on the empirical moment generating function. One of the methods is an adaptation of the mixed-moments procedure of Koutrouvelis & Canavos (1999). The asymptotic distribution of the estimator obtained with this method is derived. Finite-sample comparisons are made with the maximum likelihood estimator and the method of moments. The paper concludes with an exploratory-type analysis of real data based on the empirical moment generating function.
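The empirical-MGF regression idea can be illustrated on the simplest ingredient. For an exponential(λ) sample, M(t) = λ/(λ − t) implies 1 − 1/M(t) = t/λ exactly, so regressing 1 − 1/M_n(t) on t through the origin estimates 1/λ. This sketch covers only that exponential building block, not the paper's Poisson-exponential model or its mixed-moments adaptation.

```python
import numpy as np

def emgf_exponential_rate(x, n_points=8):
    """Estimate the rate of an exponential sample by least-squares
    regression of 1 - 1/M_n(t) on t through the origin, where
    M_n(t) = mean(exp(t * x)) is the empirical MGF."""
    x = np.asarray(x, dtype=float)
    ts = np.linspace(0.05, 0.4, n_points) / x.mean()   # keep t well below lam
    m = np.array([np.exp(t * x).mean() for t in ts])   # EMGF at each t
    y = 1.0 - 1.0 / m
    slope = (ts @ y) / (ts @ ts)                       # LS through the origin
    return 1.0 / slope
```

The choice of evaluation points t matters: they must stay safely below the rate λ so the empirical MGF has finite variance, which is why the grid is scaled by the sample mean (an estimate of 1/λ).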

19.
A general procedure for deriving the exact and asymptotic distributions of a certain class of test statistics in multivariate analysis is proposed. The method is based on an asymptotic expansion of gamma ratios in terms of generalized Bernoulli polynomials. The exact and asymptotic results are obtained and the method is illustrated in the problem of testing linear hypotheses in the multinomial case. In this problem the method yields Box's (1949) expansion as a special case.

20.
The purpose of this note is to criticize Nguyen (1985) for his account of the literature on the generalization of Fisher's exact test and to point out parallels between the algorithm proposed by Nguyen and existing algorithms. Subsequently, we briefly raise some questions about the methodology proposed by Nguyen.

Nguyen (1985) suggests that all literature on exact testing prior to Nguyen & Sampson (1985) is based on the "more probable" relation or Exact Probability Test (EPT) as a test statistic. This is not correct. Yates (1934 - Pearson's X2), Lewontin & Felsenstein (1965 - X2), Agresti & Wackerly (1977 - X2, Kendall's tau, Goodman & Kruskal's gamma), Klotz (1966 - Wilcoxon), Klotz & Teng (1977 - Kruskal & Wallis' H), Larntz (1978 - X2, log-likelihood-ratio statistic G2, Freeman & Tukey statistic), and several others have investigated exact tests with statistics other than the EPT. In fact, Bennett & Nakamura (1963) are incorrectly cited, as they investigated both X2 and G2 rather than the EPT. Also, Freeman & Halton (1951) are incorrectly cited, for they generalized Fisher's exact test to p×q tables and not 2×q tables as stated. And they are even predated by Yates (1934), who extended the test to 2×3 tables.
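The point at issue — conditional exact tests ordered by a statistic other than the "more probable" relation — can be sketched for a 2×2 table. The code below enumerates all tables with the observed margins and sums hypergeometric probabilities of tables at least as extreme under Pearson's X², in the spirit of Yates (1934) and Agresti & Wackerly (1977); it is an illustrative sketch, not any specific author's algorithm.

```python
import math

def exact_2x2_pvalue(a, b, c, d):
    """Conditional exact test for the 2x2 table [[a, b], [c, d]] with
    Pearson's X^2 as the ordering statistic: enumerate every table with
    the observed margins, weight each by its hypergeometric probability,
    and sum the probabilities of tables at least as extreme."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    c2 = n - c1

    def hyp_prob(x):                # P(top-left cell = x | fixed margins)
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    def pearson_x2(x):
        cells = (x, r1 - x, c1 - x, r2 - c1 + x)
        exps = (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)
        return sum((o - e) ** 2 / e for o, e in zip(cells, exps))

    obs = pearson_x2(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(hyp_prob(x) for x in range(lo, hi + 1)
               if pearson_x2(x) >= obs - 1e-12)
```

Swapping `pearson_x2` for another statistic (G², a rank statistic, and so on) changes only the ordering of the reference set, which is precisely the generalization the note credits to the pre-1985 literature.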
