Similar Literature
20 similar documents found (search time: 31 ms)
1.
In many diagnostic studies, multiple diagnostic tests are performed on each subject or multiple disease markers are available. Commonly, this information is combined to improve diagnostic accuracy. We consider the problem of comparing the discriminatory abilities of two groups of biomarkers. Specifically, this article focuses on confidence interval estimation of the difference between paired AUCs based on optimally combined markers under the assumption of multivariate normality. Simulation studies demonstrate that the proposed generalized variable approach provides confidence intervals with satisfactory coverage probabilities at finite sample sizes. The proposed method also readily provides P-values for hypothesis testing. Application to a subset of data from a study on coronary heart disease illustrates the utility of the method in practice.
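For context, the closed-form result underlying the combination of multivariate normal markers is Su and Liu's (1993): the linear combination maximizing the AUC is the Fisher-discriminant direction, with maximal AUC equal to Φ(√(δ′(Σ₁+Σ₀)⁻¹δ)). The sketch below computes that combination and its AUC; the function name and toy inputs are illustrative, and the generalized-variable interval for the paired difference itself is not reproduced.

```python
import numpy as np
from scipy.stats import norm

def best_linear_combination_auc(mu_d, mu_nd, cov_d, cov_nd):
    """AUC of the optimal linear combination of multivariate normal
    markers (Su & Liu, 1993): a = (Sigma_d + Sigma_nd)^(-1) delta,
    max AUC = Phi(sqrt(delta' (Sigma_d + Sigma_nd)^(-1) delta))."""
    delta = np.asarray(mu_d, float) - np.asarray(mu_nd, float)
    pooled = np.asarray(cov_d, float) + np.asarray(cov_nd, float)
    a = np.linalg.solve(pooled, delta)   # optimal combination coefficients
    return a, norm.cdf(np.sqrt(delta @ a))

# Hypothetical two-marker example (values are illustrative only)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
a, auc = best_linear_combination_auc([1.0, 0.8], [0.0, 0.0], cov, cov)
print(a, auc)
```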

2.
We discuss the nature of ancillary information in the context of the continuous uniform distribution. In the one-sample problem, the existence of sufficient statistics mitigates conditioning on the ancillary configuration. In the two-sample problem, additional ancillary information becomes available when the ratio of scale parameters is known. We give exact results for conditional inferences about the common scale parameter and for the difference in location parameters of two uniform distributions. The ancillary information affects the precision of the latter through a comparison of the sample value of the ratio of scale parameters with the known population value. A limited conditional simulation compares the Type I errors and power of these exact results with approximate results using the robust pooled t-statistic.

3.
The comparison of the accuracy of two binary diagnostic tests has traditionally required knowledge of the disease status of all of the patients in the sample via the application of a gold standard. In practice, the gold standard is not always applied to all patients in a sample, and the problem of partial verification of the disease arises. The accuracy of a binary diagnostic test can be measured in terms of positive and negative predictive values, which represent the accuracy of a diagnostic test when it is applied to a cohort of patients. In this paper, we derive the maximum likelihood estimators of the predictive values (PVs) of two binary diagnostic tests, and hypothesis tests to compare these measures when, in the presence of partial disease verification, the verification process depends only on the results of the two diagnostic tests. The effect of verification bias on the naïve estimators of the PVs of two diagnostic tests is studied, and simulation experiments are performed to investigate the small-sample behaviour of the hypothesis tests. The hypothesis tests we derive can also be applied when all of the patients are verified with the gold standard. The results are applied to the diagnosis of coronary stenosis.
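For reference, the predictive values being compared follow from Bayes' theorem once the prevalence p, sensitivity Se, and specificity Sp are given (these are the standard definitions, not the paper's verification-adjusted maximum likelihood estimators):

```latex
\mathrm{PPV} = \Pr(D \mid T^{+}) = \frac{p\,\mathrm{Se}}{p\,\mathrm{Se} + (1-p)(1-\mathrm{Sp})},
\qquad
\mathrm{NPV} = \Pr(\bar{D} \mid T^{-}) = \frac{(1-p)\,\mathrm{Sp}}{(1-p)\,\mathrm{Sp} + p\,(1-\mathrm{Se})}.
```

Unlike sensitivity and specificity, both quantities depend on the prevalence, which is why they describe the accuracy of a test applied to a cohort.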

4.
The increasing availability of high-throughput data, that is, massive quantities of molecular biology data arising from different types of experiments such as gene expression or protein microarrays, leads to the necessity of methods for summarizing the available information. As annotation quality improves it is becoming common to rely on biological annotation databases, such as the Gene Ontology (GO), to build functional profiles which characterize a set of genes or proteins using the distribution of their annotations in the database. In this work we describe a statistical model for such profiles, provide methods to compare profiles and develop inferential procedures to assess this comparison. An R-package implementing the methods will be available at publication time.

5.
In the Bayesian approach to parametric model comparison, the use of improper priors is problematic due to the indeterminacy of the resulting Bayes factor (BF). The need for automatic and robust methods for model comparison has led to the introduction of alternative BFs. Intrinsic Bayes factors (Berger and Pericchi, 1996a) and fractional Bayes factors (FBF) (O'Hagan, 1995) are two alternative strategies for default model selection. We show in this paper that the FBF can be inconsistent. To overcome this problem, we propose a generalization of the FBF approach that reduces to the usual FBF, or to variants of it, in special cases. As an important application, we consider this generalization for model selection in nested linear models.
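For orientation, O'Hagan's (1995) fractional Bayes factor uses a fraction b of the likelihood to train the improper priors so that their arbitrary constants cancel; in the usual notation (a sketch of the standard FBF, not of the generalization proposed here):

```latex
m_i(b) = \int \pi_i(\theta_i)\, L_i(\theta_i \mid x)^{\,b}\, d\theta_i,
\qquad
B^{F}_{10}(b) = \frac{m_1(1)/m_1(b)}{m_0(1)/m_0(b)}, \quad 0 < b < 1.
```

A common default takes b = n₀/n, with n₀ a minimal training-sample size.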

6.
7.
Linear maps of a single unclassified observation are used to estimate the mixing proportion in a mixture of two populations with homogeneous variances in the presence of covariates. With complete knowledge of the parameters of the individual populations, the linear map for which the estimator is unbiased and has minimum variance among all similar estimators can be determined. A plug-in estimator based on independent training samples from the component populations can be constructed; it is asymptotically equivalent to Cochran's classification statistic V* for covariate classification; see Memon and Okamoto (1970). Under normality assumptions, an asymptotic expansion of the distribution of the plug-in estimator is available. In the absence of covariates, our estimator reduces to that suggested by Walker (1980), who investigated the problem using large unclassified samples from a mixture of two populations with heterogeneous variances. In contrast, the distribution of Walker's estimator appears intractable at moderate sample sizes even under the normality assumption.

8.
There are no exact fixed-level tests of the null hypothesis that the difference of two exponential means is less than or equal to a prespecified value θ0. Several approximate testing procedures for this problem are available in the literature. Using an extended definition of p-values, Tsui and Weerahandi (1989) gave an exact significance test for this problem. In this paper, the performance of that procedure is investigated and compared with the approximate procedures. A size and power comparison is carried out using a simulation study. Its findings show that the test based on the generalized p-value guarantees the intended size and matches or outperforms the approximate procedures available in the literature in both size and power.
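The generalized p-value admits a direct Monte Carlo evaluation here because 2·Σⱼxᵢⱼ/θᵢ ~ χ²(2nᵢ) for an exponential sample. The sketch below is one standard construction in the spirit of Tsui and Weerahandi's approach, not necessarily the exact test variable of the paper; the data vectors and sample sizes are illustrative.

```python
import numpy as np

def gen_pvalue_exp_mean_diff(x1, x2, theta0, n_mc=100_000, seed=1):
    """Monte Carlo generalized p-value for H0: theta1 - theta2 <= theta0
    vs H1: theta1 - theta2 > theta0, for two exponential samples.
    Uses the pivots 2*sum(x_i)/theta_i ~ chi2(2*n_i)."""
    rng = np.random.default_rng(seed)
    s1, s2 = np.sum(x1), np.sum(x2)
    # Generalized pivotal quantities for theta1 and theta2
    r1 = 2 * s1 / rng.chisquare(2 * len(x1), n_mc)
    r2 = 2 * s2 / rng.chisquare(2 * len(x2), n_mc)
    # Small values favour H1: theta1 - theta2 > theta0
    return np.mean(r1 - r2 <= theta0)

rng = np.random.default_rng(0)
x1 = rng.exponential(scale=5.0, size=20)   # true theta1 = 5
x2 = rng.exponential(scale=2.0, size=25)   # true theta2 = 2
print(gen_pvalue_exp_mean_diff(x1, x2, theta0=0.0))
```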

9.
In this paper we consider the problem of testing the means of k multivariate normal populations when additional data from an unknown subset of the k populations are available. The purpose of this research is to offer test procedures utilizing all the available data for the multivariate analysis of variance problem, because the additional data may contain valuable information about the parameters of the k populations. The standard procedure uses only the data from identified populations. We provide a test using all available data based upon Hotelling's generalized T² statistic. The power of this test is computed using Betz's approximation of Hotelling's generalized T² statistic by an F-distribution. A comparison of the power of the test and the standard test procedure is also given.

10.
Information in a statistical procedure arising from sources other than sampling is called prior information, and its incorporation into the procedure forms the basis of the Bayesian approach to statistics. Under hypergeometric sampling, methodology is developed which quantifies the amount of information provided by the sample data relative to that provided by the prior distribution, and allows for a ranking of prior distributions with respect to conservatism, where conservatism refers to restraint of extraneous information embedded in a prior distribution. The most conservative prior distribution within a specified class (each member of which carries the available prior information) is the one over which the likelihood function has the greatest average domination. Four families of prior distributions are developed by considering a Bayesian approach to the formation of lots. The most conservative prior distribution from each of the four families is determined, and the four are compared in the situation where no prior information is available. The results of the comparison advocate the use of the Polya (beta-binomial) prior distribution in hypergeometric sampling.

11.
We consider the problem of comparing step-down and step-up multiple test procedures for testing n hypotheses when independent p-values or independent test statistics are available. The defining critical values of these procedures for independent test statistics are asymptotically equal, which gives a theoretical explanation of the numerical observation that the step-up procedure is usually more powerful than the step-down procedure. The main aim of this paper is to quantify the differences between the critical values more precisely. As a by-product we also obtain more information about the gain from considering two subsequent steps of these procedures. Moreover, we investigate how liberal the step-up procedure becomes when the step-up critical values are replaced by their step-down counterparts or by more refined approximate values. The results for independent p-values are the basis for corresponding results for independent real-valued test statistics. It turns out that the differences between step-down and step-up critical values, as well as the differences between subsequent steps, tend to zero for many distributions, except for heavy-tailed distributions. The Cauchy distribution yields an example in which the critical values of both procedures are nearly linearly increasing in n.
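The two stepwise directions can be made concrete with the classical Holm (step-down) and Hochberg (step-up) procedures, which share the same critical constants α/(n−i+1); the paper's critical values for independent statistics differ in detail, so the sketch below illustrates the mechanics only.

```python
import numpy as np

def holm_stepdown(pvals, alpha=0.05):
    """Holm step-down: scan ordered p-values from the smallest;
    stop at the first non-rejection."""
    p = np.sort(pvals)
    n = len(p)
    crit = alpha / (n - np.arange(n))          # alpha/n, alpha/(n-1), ...
    below = p <= crit
    return int(np.argmin(below)) if not below.all() else n

def hochberg_stepup(pvals, alpha=0.05):
    """Hochberg step-up: scan ordered p-values from the largest;
    the first rejection carries all smaller p-values with it."""
    p = np.sort(pvals)
    n = len(p)
    crit = alpha / (n - np.arange(n))          # same critical constants
    idx = np.nonzero(p <= crit)[0]
    return int(idx.max() + 1) if idx.size else 0

pvals = np.array([0.011, 0.02, 0.03, 0.04, 0.045])
print(holm_stepdown(pvals), hochberg_stepup(pvals))
# Holm rejects none here, while Hochberg rejects all five:
# the step-up direction is never less powerful with these constants.
```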

12.
In this paper, we consider the problem of testing the mean vector in the multivariate setting where the dimension p is greater than the sample size n, namely the large-p, small-n problem. We propose a new scalar-transform-invariant test and derive the asymptotic null distribution and power of the proposed test under weaker conditions than those of Srivastava (2009). We also present numerical studies, including simulations and a real microarray data example, with comparisons to existing tests developed for the large-p, small-n problem.

13.
This article deals with the estimation of a fixed population size through the capture-mark-recapture method, which gives rise to a hypergeometric distribution. A few well-known and popular point estimators are available in the literature, but no comprehensive comparison of their merits is available. In addition to the available estimators, an empirical Bayes (EB) estimator of the population size is proposed. We compare all the point estimators in terms of relative bias and relative mean squared error. Next, two new interval estimators are proposed: (a) an EB highest posterior density interval and (b) a frequentist interval estimator based on a parametric bootstrap. These are compared with interval estimators derived from the currently available point estimators in terms of coverage probability and average length (AL). Based on comprehensive numerical results, we rank and recommend the point estimators as well as the interval estimators for practical use. Finally, a real-life data set on a green treefrog population is used to demonstrate all the methods discussed.
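For orientation, the classical point estimator in the two-sample capture-recapture setting is Chapman's modification of the Lincoln-Petersen estimator, and a parametric bootstrap interval of the kind compared in the paper can be sketched as follows (the EB estimator itself is not reproduced; sample values are illustrative).

```python
import numpy as np

def chapman_estimate(n1, n2, m):
    """Chapman's nearly unbiased modification of the Lincoln-Petersen
    estimator: n1 marked, n2 in the recapture sample, m marked among them."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def bootstrap_ci(n1, n2, m, level=0.95, n_boot=10_000, seed=1):
    """Parametric bootstrap interval: resample recapture counts from a
    hypergeometric population of the estimated size."""
    rng = np.random.default_rng(seed)
    n_hat = int(round(chapman_estimate(n1, n2, m)))
    m_star = rng.hypergeometric(n1, n_hat - n1, n2, size=n_boot)
    est = (n1 + 1) * (n2 + 1) / (m_star + 1) - 1
    lo, hi = np.quantile(est, [(1 - level) / 2, (1 + level) / 2])
    return n_hat, (lo, hi)

print(bootstrap_ci(n1=40, n2=50, m=12))  # 40 marked, 50 caught, 12 recaptured
```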

14.
We suggest two new methods, which are applicable to both deconvolution and regression with errors in explanatory variables, for nonparametric inference. The two approaches involve kernel or orthogonal series methods. They are based on defining a low order approximation to the problem at hand, and proceed by constructing relatively accurate estimators of that quantity rather than attempting to estimate the true target functions consistently. Of course, both techniques could be employed to construct consistent estimators, but in many contexts of importance (e.g. those where the errors are Gaussian) consistency is, from a practical viewpoint, an unattainable goal. We rephrase the problem in a form where an explicit, interpretable, low order approximation is available. The information that we require about the error distribution (the error-in-variables distribution, in the case of regression) is only in the form of low order moments and so is readily obtainable by a rudimentary analysis of indirect measurements of errors, e.g. through repeated measurements. In particular, we do not need to estimate a function, such as a characteristic function, which expresses detailed properties of the error distribution. This feature of our methods, coupled with the fact that all our estimators are explicitly defined in terms of readily computable averages, means that the methods are particularly economical in computing time.

15.
Consider two or more treatments with dichotomous responses. A total of N experimental units are to be allocated in a fixed number r of stages. The problem is to decide how many units to assign to each treatment in each stage. Responses from selections in previous stages are available and can be taken into account, but responses in the current stage are not available until the next group of selections is made. Information is updated via Bayes' theorem after each stage. The goal is to maximize the overall expected number of successes among the N units. Two forms of prior information are considered: (i) all arms have beta priors, and (ii) the prior distributions have continuous densities. Various characteristics of optimal decisions are presented. For example, in most cases of (i) and (ii), the optimal size of the first stage grows at a rate no greater than √N when r = 2.
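Between stages, the Bayesian update in case (i) is the standard conjugate step; after s successes in n allocations of an arm with a Beta(α, β) prior (a sketch of the updating only; the optimal stage sizes require a dynamic-programming computation not shown here):

```latex
p \mid \text{data} \sim \mathrm{Beta}(\alpha + s,\ \beta + n - s),
\qquad
\Pr(\text{next success} \mid \text{data}) = \frac{\alpha + s}{\alpha + \beta + n}.
```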

16.
The exponential-logarithmic distribution has a decreasing failure rate and various applications in biological and engineering fields. In this paper, we study a change-point problem for this distribution. A procedure based on the Schwarz information criterion is proposed to detect changes in the parameters of this distribution. Simulations are conducted to assess the performance of the proposed procedure under different scenarios. Applications to two real data sets are provided to illustrate the detection procedure.
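The detection logic can be sketched generically: compute SIC = −2 log L̂ + (number of parameters)·log n under "no change" and under each candidate change point, and flag a change when the minimum change-point SIC beats the no-change SIC. The sketch below uses a plain exponential segment likelihood as a stand-in for the exponential-logarithmic one (whose MLEs require numerical optimization), so both the distribution and the penalty convention are assumptions, not the paper's exact procedure.

```python
import numpy as np

def exp_loglik(x):
    """Maximized exponential log-likelihood: MLE rate = 1/mean."""
    n, m = len(x), np.mean(x)
    return -n * np.log(m) - n

def sic_changepoint(x, min_seg=5):
    """SIC scan: compare the no-change SIC with the best SIC over
    all admissible change points (one parameter per segment)."""
    n = len(x)
    sic_null = -2 * exp_loglik(x) + np.log(n)
    best_k, best_sic = None, np.inf
    for k in range(min_seg, n - min_seg):
        sic_k = -2 * (exp_loglik(x[:k]) + exp_loglik(x[k:])) + 2 * np.log(n)
        if sic_k < best_sic:
            best_k, best_sic = k, sic_k
    return (best_k, best_sic) if best_sic < sic_null else (None, sic_null)

rng = np.random.default_rng(2)
x = np.concatenate([rng.exponential(1.0, 60), rng.exponential(3.0, 60)])
print(sic_changepoint(x))  # should flag a change near index 60
```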

17.
Clustering gene expression time course data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Statistically, the problem of clustering time course data is a special case of the more general problem of clustering longitudinal data. In this paper, a very general and flexible model-based technique is used to cluster longitudinal data. Mixtures of multivariate t-distributions are utilized, with a linear model for the mean and a modified Cholesky-decomposed covariance structure. Constraints are placed upon the covariance structure, leading to a novel family of mixture models, including parsimonious models. In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters, including the component degrees of freedom, are estimated using an expectation-maximization algorithm, and two different approaches to model selection are considered. The models are applied to simulated data to illustrate their efficacy; this includes a comparison with their Gaussian analogues, and the use of these Gaussian analogues with a linear model for the mean is itself novel. Our family of multivariate t mixture models is then applied to two real gene expression time course data sets and the results are discussed. We conclude with a summary, suggestions for future work, and a discussion of constraining the degrees of freedom parameter.
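No off-the-shelf implementation of t-mixtures with modified Cholesky covariances is available in scikit-learn, but the Gaussian analogue mentioned above can be sketched with GaussianMixture, where the covariance_type options play the role of a (much cruder) parsimonious family and BIC serves as one of the model-selection criteria; the simulated "time course" data are purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Hypothetical expression time courses: 100 genes x 8 time points,
# drawn from two groups with different mean trajectories
t = np.linspace(0, 1, 8)
X = np.vstack([rng.normal(2 * t, 0.4, size=(50, 8)),
               rng.normal(np.sin(2 * np.pi * t), 0.4, size=(50, 8))])

# Fit Gaussian mixtures with several covariance constraints and
# pick the structure with the lowest BIC
models = {cov: GaussianMixture(n_components=2, covariance_type=cov,
                               random_state=0).fit(X)
          for cov in ("full", "tied", "diag", "spherical")}
best = min(models, key=lambda c: models[c].bic(X))
labels = models[best].predict(X)
print(best, np.bincount(labels))
```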

18.
Parallel systems play an important role in the study of the reliability of technical systems and subsystems. In the present paper, we consider a parallel system consisting of n identical components with independent lifetimes having a common distribution function F. It is assumed that the system has failed at time t. Under these conditions, we obtain the mean past lifetime (MPL) of the components of the system. Some properties of the MPL are studied. It is shown that the underlying distribution function F can be recovered from the proposed MPL. Also, a comparison between two parallel systems is made based on their MPLs in the case where the components are ordered in terms of the reversed hazard rate. Finally, a characterization of the uniform distribution is given based on the MPL.
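For a single component with lifetime distribution F, the analogous quantity, often called the mean inactivity time, has a standard closed form, shown below for orientation; the paper's MPL conditions on the failure of the whole parallel system at time t, so its expression differs accordingly:

```latex
m(t) = E\left[\,t - X \mid X \le t\,\right] = \frac{1}{F(t)} \int_{0}^{t} F(x)\, dx ,
```

which follows from integrating E[t − X | X ≤ t] by parts.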

19.
In this article, we consider a shared frailty model with an inverse Gaussian frailty distribution and a log-logistic baseline distribution (LLD) for bivariate survival times. We fit this model to three real-life bivariate survival data sets. The focus of this article is analyzing and estimating the parameters of the shared inverse Gaussian frailty model and comparing the results with those of the shared gamma frailty model under the same baseline for the three data sets considered. The data are analyzed using a Bayesian approach to clustered survival data, in which failure time observations within the same group are dependent. The variance-component estimation provides the estimated dispersion of the random effects. We carry out a test for frailty (heterogeneity) using the Bayes factor. Model comparison is made using information criteria and the Bayes factor. We observe that the shared inverse Gaussian frailty model with the LLD baseline fits all three bivariate data sets better.

20.
Several recent studies have analysed active labour market policies using a recently proposed matching estimator for multiple programmes. Since there is only very limited practical experience with this estimator, this paper checks its sensitivity with respect to issues that are of practical importance in this kind of evaluation study. The estimator turns out to be fairly robust to several features concerning its implementation. Furthermore, the paper demonstrates that the matching approach per se is no panacea for solving all the problems of evaluation studies; its success depends critically on the information available in the data. Finally, a comparison with a bootstrap distribution provides some justification for using a simplified approximation to the distribution of the estimator that ignores its sequential nature.
