Similar Articles
20 similar articles found.
1.
Developing new medical tests and identifying single biomarkers or panels of biomarkers with superior accuracy over existing classifiers promotes lifelong health of individuals and populations. Before a medical test can be routinely used in clinical practice, its accuracy within diseased and non-diseased populations must be rigorously evaluated. We introduce a method for sample size determination for studies designed to test hypotheses about medical test or biomarker sensitivity and specificity. We show how a sample size can be determined to guard against making type I and/or type II errors by calculating Bayes factors from multiple data sets simulated under null and/or alternative models. The approach can be implemented across a variety of study designs, including investigations into one test or two conditionally independent or dependent tests. We focus on a general setting that involves non-identifiable models for data when true disease status is unavailable due to the nonexistence of or undesirable side effects from a perfectly accurate (i.e. ‘gold standard’) test; special cases of the general method apply to identifiable models with or without gold-standard data. Calculation of Bayes factors is performed by incorporating prior information for model parameters (e.g. sensitivity, specificity, and disease prevalence) and augmenting the observed test-outcome data with unobserved latent data on disease status to facilitate Gibbs sampling from posterior distributions. We illustrate our methods using a thorough simulation study and an application to toxoplasmosis.
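The simulation logic behind this kind of sample-size determination can be sketched in a deliberately simplified special case: one test, gold-standard disease status known, hypotheses about sensitivity only. The hypotheses (H0: Se ≤ 0.80 vs. H1: Se > 0.80), the Beta(1, 1) prior, the true sensitivity, and the Bayes-factor threshold below are all illustrative assumptions, not values from the paper.

```python
import math
import random

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def bayes_factor(y, n, cut=0.80, grid=400):
    # BF10: marginal likelihood under H1 (Se > cut) over the marginal
    # under H0 (Se <= cut), each with a uniform prior restricted to its
    # region, approximated on a grid.
    ps = [(i + 0.5) / grid for i in range(grid)]
    hi = [p for p in ps if p > cut]
    lo = [p for p in ps if p <= cut]
    m1 = sum(binom_pmf(y, n, p) for p in hi) / len(hi)
    m0 = sum(binom_pmf(y, n, p) for p in lo) / len(lo)
    return m1 / m0

def power_at_n(n, true_se=0.90, threshold=3.0, sims=200, seed=1):
    # Share of data sets simulated under the alternative whose Bayes
    # factor clears the decision threshold (guards the type II error).
    rng = random.Random(seed)
    hits = sum(bayes_factor(sum(rng.random() < true_se for _ in range(n)), n) > threshold
               for _ in range(sims))
    return hits / sims

# Increase n until the simulated 'Bayesian power' is adequate.
print(power_at_n(50), power_at_n(200))
```

The same loop, run with data simulated under the null, would estimate the type I error rate; the paper's general setting replaces the closed-form binomial likelihood with Gibbs sampling over latent disease status.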

2.
In this article, we analyze the three-way bootstrap estimate of the variance of the reader-averaged nonparametric area under the receiver operating characteristic (ROC) curve. The setting for this work is medical imaging, and the experimental design involves sampling from three distributions: a set of normal and diseased cases (patients), and a set of readers (doctors). The experiment we consider is fully crossed in that each reader reads each case. A reading generates a score that indicates the reader's level of suspicion that the patient is diseased. The distribution of scores for the normal patients is compared to the distribution of scores for the diseased patients via an ROC curve, and the area under the ROC curve (AUC) summarizes the reader's diagnostic ability to separate the normal patients from the diseased ones. We find that the bootstrap estimate of the variance of the reader-averaged AUC is biased, and we represent this bias in terms of moments of success outcomes. This representation helps unify and improve several current methods for multi-reader multi-case (MRMC) ROC analysis.
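A minimal sketch of the quantities under study: the reader-averaged Mann-Whitney AUC for a fully crossed design, and a naive three-way bootstrap of its variance that resamples readers, normal cases, and diseased cases. The scores, sample sizes, and bootstrap count are illustrative assumptions; the paper's point is precisely that this bootstrap variance is biased, which the sketch does not correct.

```python
import random

def auc(normal_scores, diseased_scores):
    # Nonparametric AUC: P(diseased score > normal score) + 0.5 * P(tie)
    total = 0.0
    for x in normal_scores:
        for y in diseased_scores:
            total += 1.0 if y > x else (0.5 if y == x else 0.0)
    return total / (len(normal_scores) * len(diseased_scores))

def reader_avg_auc(scores_n, scores_d):
    # scores_n[r][i]: reader r's score for normal case i (likewise scores_d)
    return sum(auc(sn, sd) for sn, sd in zip(scores_n, scores_d)) / len(scores_n)

def bootstrap_var(scores_n, scores_d, B=500, seed=0):
    # Three-way bootstrap: resample readers, normal cases and diseased
    # cases with replacement, recompute the reader-averaged AUC each time.
    rng = random.Random(seed)
    R, N, D = len(scores_n), len(scores_n[0]), len(scores_d[0])
    stats = []
    for _ in range(B):
        rs = [rng.randrange(R) for _ in range(R)]
        ns = [rng.randrange(N) for _ in range(N)]
        ds = [rng.randrange(D) for _ in range(D)]
        bn = [[scores_n[r][i] for i in ns] for r in rs]
        bd = [[scores_d[r][i] for i in ds] for r in rs]
        stats.append(reader_avg_auc(bn, bd))
    m = sum(stats) / B
    return sum((s - m) ** 2 for s in stats) / (B - 1)

rng = random.Random(42)
R, N, D = 3, 20, 20
scores_n = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(R)]
scores_d = [[rng.gauss(1, 1) for _ in range(D)] for _ in range(R)]
print(reader_avg_auc(scores_n, scores_d), bootstrap_var(scores_n, scores_d))
```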

3.
In this article, we study some results related to a specific class of distributions, called skew-curved-symmetric family of distributions that depends on a parameter controlling the skewness and kurtosis at the same time. Special elements of this family which are studied include symmetric and well-known asymmetric distributions. General results are given for the score function and the observed information matrix. It is shown that the observed information matrix is always singular for some special cases. We illustrate the flexibility of this class of distributions with an application to a real dataset on characteristics of Australian athletes.

4.
A unit ω is to be classified into one of two correlated homoskedastic normal populations by the linear discriminant function known as the W classification statistic [T.W. Anderson, An asymptotic expansion of the distribution of the studentized classification statistic, Ann. Statist. 1 (1973), pp. 964–972; T.W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd edn, Wiley, New York, 1984; G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, John Wiley and Sons, New York, 1992]. The two populations studied here are two different states of the same population, such as two different states of a disease, where the population is the population of diseased patients. When a sample unit is observed in both states (populations), the observations made on it (which form a pair) become correlated. A training sample is unbalanced when not all sample units are observed in both states. Paired and unbalanced samples are natural in studies related to correlated populations. S. Bandyopadhyay and S. Bandyopadhyay [Choosing better training sample for classifying an individual into one of two correlated normal populations, Calcutta Statist. Assoc. Bull. 54(215–216) (2003), pp. 167–180] studied the effect of an unbalanced training-sample structure on the performance of the W statistic in the univariate correlated normal set-up, in order to find an optimal sampling strategy for a better classification rate. In this study, those results are extended to the multivariate case, with a discussion of applications in real scenarios.
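In its basic (uncorrelated, balanced) form, Anderson's W statistic classifies x into population 1 when W(x) = [x − ½(x̄₁ + x̄₂)]ᵀ S⁻¹ (x̄₁ − x̄₂) > 0, with S the pooled sample covariance. A minimal sketch with illustrative training data follows; the paired/unbalanced structure discussed above would change how the means and pooled covariance are estimated, which this sketch does not attempt.

```python
import numpy as np

def w_statistic(x, X1, X2):
    # X1, X2: training samples (rows = units) from the two populations.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # pooled covariance estimate under homoskedasticity
    S = ((n1 - 1) * np.cov(X1, rowvar=False) +
         (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return float((x - 0.5 * (m1 + m2)) @ np.linalg.solve(S, m1 - m2))

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(30, 2))
X2 = rng.normal([3.0, 3.0], 1.0, size=(30, 2))
# positive W: assign to population 1; negative W: assign to population 2
print(w_statistic(np.array([0.2, -0.1]), X1, X2))
print(w_statistic(np.array([2.9, 3.2]), X1, X2))
```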

5.
Consider a population of individuals who are free of a disease under study, and who are exposed simultaneously, at random exposure levels, say X, Y, Z, …, to several risk factors which are suspected to cause the disease in the population. At any specified levels X=x, Y=y, Z=z, …, the incidence rate of the disease in the population at risk is given by the exposure–response relationship r(x,y,z,…) = P(disease|x,y,z,…). The present paper examines the relationship between the joint distribution of the exposure variables X, Y, Z, … in the population at risk and the joint distribution of the exposure variables U, V, W, … among cases under the linear and the exponential risk models. It is proven that under the exponential risk model, these two joint distributions belong to the same family of multivariate probability distributions, possibly with different parameter values. For example, if the exposure variables in the population at risk jointly have a multivariate normal distribution, so do the exposure variables among cases; if the former variables jointly have a multinomial distribution, so do the latter. More generally, it is demonstrated that if the joint distribution of the exposure variables in the population at risk belongs to the exponential family of multivariate probability distributions, so does the joint distribution of the exposure variables among cases. If the epidemiologist can specify the differences among the mean exposure levels in the case and control groups which are considered to be clinically or etiologically important in the study, the results of the present paper may be used to make sample size determinations for the case–control study, corresponding to specified protection levels, i.e., size α and power 1 − β of a statistical test. The multivariate normal, the multinomial, the negative multinomial and Fisher's multivariate logarithmic series exposure distributions are used to illustrate our results.
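The closure result can be checked by simulation in its simplest univariate special case: if exposure X is N(μ, σ²) in the population at risk and the risk is exponential, r(x) ∝ exp(βx), then exposure among cases is again normal, N(μ + βσ², σ²). The parameter values below are illustrative; acceptance probabilities are scaled by exp(−4βσ) so they stay below one for all but a negligible fraction of draws.

```python
import math
import random

mu, sigma, beta = 0.0, 1.0, 0.7
rng = random.Random(0)

cases = []
while len(cases) < 20000:
    x = rng.gauss(mu, sigma)                      # exposure in the population at risk
    # accept x as a case with probability proportional to exp(beta * x)
    if rng.random() < math.exp(beta * (x - 4.0 * sigma)):
        cases.append(x)

case_mean = sum(cases) / len(cases)
print(case_mean)  # theory predicts mu + beta * sigma**2 = 0.7
```

The same rejection scheme with a multivariate normal exposure would shift the mean vector by β'Σ while leaving the covariance unchanged, in line with the multivariate statement above.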

6.
This paper describes how importance sampling can be applied to estimate likelihoods for spatio-temporal stochastic models of epidemics in plant populations, where observations consist of the set of diseased individuals at two or more distinct times. Likelihood computation is problematic because of the inherent lack of independence of the status of individuals in the population whenever disease transmission is distance-dependent. The methods of this paper overcome this by partitioning the population into a number of sectors and then attempting to take account of this dependence within each sector, while neglecting that between sectors. Applications to both simulated and real epidemic data sets show that the techniques perform well in comparison with existing approaches. Moreover, the results confirm the validity of likelihood estimates obtained elsewhere using Markov chain Monte Carlo methods.

7.
In this paper we consider structural measurement error models within the elliptical family of distributions. We consider dependent and independent elliptical models, each of which requires a special treatment methodology. In each case we discuss estimation and hypothesis testing using maximum likelihood theory. As shown, most of the developments obtained under normal theory carry through to the dependent case. In the independent case, emphasis is placed on the t-distribution, an important member of the elliptical family. Correcting likelihood ratio statistics in both cases is also of major interest.

8.
This paper aims to estimate the false negative fraction of a multiple screening test for bowel cancer, where those who give negative results for six consecutive tests do not have their true disease status verified. A subset of these same individuals is given a further screening test, for the sole purpose of evaluating the accuracy of the primary test. This paper proposes a beta heterogeneity model for the probability of a diseased individual ‘testing positive’ on any single test, and it examines the consequences of this model for inference on the false negative fraction. The method can be generalized to the case where selection for further testing is informative, though this did not appear to be the case for the bowel‐cancer data.
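The key calculation under a beta heterogeneity model is short enough to sketch: if a diseased individual's per-test detection probability p is Beta(a, b), the chance of k consecutive false negatives is E[(1 − p)^k] = ∏_{i=0}^{k−1} (b + i)/(a + b + i), which follows from the Beta-function identity for the moments of 1 − p. The (a, b) values below are illustrative, not estimates from the bowel-cancer data.

```python
def false_negative_fraction(a, b, k=6):
    # E[(1 - p)^k] for p ~ Beta(a, b): product form of B(a, b + k) / B(a, b)
    frac = 1.0
    for i in range(k):
        frac *= (b + i) / (a + b + i)
    return frac

# Heterogeneity matters: both Betas below have mean detection probability
# 0.6, but the more dispersed one yields a larger six-test false negative
# fraction than the tighter one, and both exceed the homogeneous 0.4**6.
print(false_negative_fraction(6.0, 4.0))   # tighter Beta(6, 4)
print(false_negative_fraction(1.2, 0.8))   # more dispersed Beta(1.2, 0.8)
```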

9.
A method for refining an equivariant binomial confidence procedure is presented which, when applied to an existing procedure, produces a new set of equivariant intervals that are uniformly superior. The family of procedures generated by this method constitutes a complete class within the class of all equivariant procedures. In certain cases it is shown that this class is also minimal complete. An optimality property, monotone minimaxity, is also investigated, and monotone minimax procedures are constructed.

10.
In statistical practice, systematic sampling (SYS) is used in many modifications because of its simple handling. In addition, SYS may provide efficiency gains if it is well adjusted to the structure of the population under study. However, if SYS is based on an inappropriate picture of the population, a substantial loss of efficiency, i.e. a large increase in variance, may result from changing from simple random sampling to SYS. In the context of two-stage designs, SYS so far seems to be used mainly for subsampling within the primary units. As an alternative to this practice, we propose to randomize the order of the primary units, then to select a number of primary units systematically and, thereafter, to draw secondary units by simple random sampling without replacement within the primary units selected. This procedure is more efficient than simple random sampling with replacement from the whole population of all secondary units: the variance of an adequate estimator for a total is never increased by changing from simple random sampling to randomized SYS, whatever the values a characteristic associates with the secondary units, while there are values for which the variance decreases under this change. This result should hold generally, even though our proof is, so far, not complete for general sample sizes.
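The proposed two-stage design can be sketched directly from its description: randomize the order of the primary units, select primaries systematically, then draw secondaries by simple random sampling without replacement within each selected primary. The population layout and sample sizes below are illustrative assumptions.

```python
import random

def randomized_sys_two_stage(primaries, n_primary, m_secondary, seed=0):
    # primaries: list of lists, each inner list holding the secondary
    # units of one primary unit.
    rng = random.Random(seed)
    order = primaries[:]
    rng.shuffle(order)                      # randomize primary-unit order
    k = len(order) / n_primary              # (possibly fractional) skip interval
    start = rng.random() * k                # random start in [0, k)
    chosen = [order[int(start + i * k)] for i in range(n_primary)]
    # SRSWOR of secondaries within each selected primary
    return [rng.sample(pu, min(m_secondary, len(pu))) for pu in chosen]

# 12 primary units with 8 secondary units each; units are (primary, secondary) pairs
pop = [[(i, j) for j in range(8)] for i in range(12)]
sample = randomized_sys_two_stage(pop, n_primary=4, m_secondary=3)
print(sample)
```

Because the primary-unit order is randomized before the systematic pass, the systematic selection of primaries behaves like systematic sampling from a randomly ordered frame, which is what drives the variance result claimed above.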

11.
On Block Ordering of Variables in Graphical Modelling
Abstract.  In graphical modelling, the existence of substantive background knowledge on a block ordering of the variables is used to perform structural learning within the family of chain graphs (CGs), in which every block corresponds to an undirected graph and edges joining vertices in different blocks are directed in accordance with the ordering. We show that this practice may lead to an inappropriate restriction of the search space and introduce the concept of a labelled block ordering B corresponding to a family of B-consistent CGs, in which every block may be either an undirected graph, a directed acyclic graph or, more generally, a CG. In this way we provide a flexible tool for specifying subsets of chain graphs, and we observe that the most relevant subsets of CGs considered in the literature are families of B-consistent CGs for the appropriate choice of B. Structural learning within a family of B-consistent CGs requires dealing with Markov equivalence. We provide a graphical characterization of the equivalence classes of B-consistent CGs, namely the B-essential graphs, as well as a procedure to construct the B-essential graph for any given equivalence class of B-consistent chain graphs. Both largest CGs and essential graphs turn out to be special cases of B-essential graphs.

12.
The locally stationary wavelet process model assumes some underlying wavelet family in order to generate the process. Analyses of such processes also assume that the same wavelet family is used to obtain unbiased estimates of the wavelet spectrum. In practice this would not typically be possible since, a priori, the underlying wavelet family is not known. This article considers the effect of wavelet choice within this setting. A particular focus is given to the estimation of the evolutionary wavelet spectrum due to its importance in many reported applications.

13.
Adaptive sampling without replacement of clusters
In a common form of adaptive cluster sampling, an initial sample of units is selected by random sampling without replacement and, whenever the observed value of the unit is sufficiently high, its neighboring units are added to the sample, with the process of adding neighbors repeated if any of the added units are also high valued. In this way, an initial selection of a high-valued unit results in the addition of the entire network of surrounding high-valued units and some low-valued “edge” units where sampling stops. Repeat selections can occur when more than one initially selected unit is in the same network or when an edge unit is shared by more than one added network. Adaptive sampling without replacement of networks avoids some of this repeat selection by sequentially selecting initial sample units only from the part of the population not already in any selected network. The design proposed in this paper carries this step further by selecting initial units only from the population, exclusive of any previously selected networks or edge units.
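A minimal one-dimensional sketch of the design just described, with illustrative values and threshold: networks of high-valued units are grown by flood-fill around each initial selection, and every new initial unit is drawn only from the part of the population outside all previously selected networks and their edge units.

```python
import random

def network_of(i, values, cut):
    # Flood-fill the network of high-valued units around unit i (1-D
    # neighbourhoods), returning the network and its low-valued edge units.
    net, edge, stack = set(), set(), [i]
    while stack:
        j = stack.pop()
        if j in net or not (0 <= j < len(values)):
            continue
        if values[j] >= cut:
            net.add(j)
            stack.extend([j - 1, j + 1])
        else:
            edge.add(j)
    return net, edge

def adaptive_sample(values, n_init, cut, seed=0):
    # Proposed variant: each initial unit is drawn only from units not
    # already in a selected network or among its edge units.
    rng = random.Random(seed)
    selected, excluded = set(), set()
    for _ in range(n_init):
        pool = [i for i in range(len(values)) if i not in excluded]
        if not pool:
            break
        i = rng.choice(pool)
        net, edge = network_of(i, values, cut)
        if net:
            selected |= net | edge
            excluded |= net | edge
        else:
            selected.add(i)
            excluded.add(i)
    return selected

values = [0, 5, 6, 0, 0, 0, 9, 0, 0, 4, 4, 0]
print(sorted(adaptive_sample(values, n_init=3, cut=1)))
```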

14.
A common problem in medical statistics is the discrimination between two groups on the basis of diagnostic information. Information on patient characteristics is used to classify individuals into one of two groups: diseased or disease-free. This classification is often with respect to a particular disease. This discrimination has two probabilistic components: (1) the discrimination is not without error, and (2) in many cases the a priori chance of disease can be estimated. Logistic models (Cox 1970; Anderson 1972) provide methods for incorporating both of these components. The a posteriori probability of disease may be estimated for a patient on the basis of both current measurement of patient characteristics and prior information. The parameters of the logistic model may be estimated on the basis of a calibration trial. In practice, not one but several sets of measurements of one characteristic of the patient may be made on a questionable case. These measurements typically are correlated; they are far from independent. How should these correlated measurements be used? This paper presents a method for incorporating several sets of measurements in the classification of a case.
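A hedged sketch of the two probabilistic components, not of the paper's specific method for correlated replicates: a logistic model's intercept is shifted to carry the a priori prevalence from the calibration trial's prevalence to the target population's, and the replicate measurements are summarized by their mean (a common simplification when replicates are exchangeable). All coefficients and measurements below are illustrative assumptions.

```python
import math

def posterior_prob(replicates, beta0, beta1, train_prev, target_prev):
    x = sum(replicates) / len(replicates)       # summarize correlated replicates
    # shift the intercept from the calibration-trial prevalence to the
    # a priori prevalence in the patient's population (prior-odds update)
    shift = (math.log(target_prev / (1 - target_prev))
             - math.log(train_prev / (1 - train_prev)))
    z = beta0 + shift + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Three correlated measurements of one characteristic on a questionable case:
p = posterior_prob([2.1, 2.4, 2.2], beta0=-3.0, beta1=1.5,
                   train_prev=0.5, target_prev=0.1)
print(p)
```

Averaging replicates discards information about their correlation structure; the paper's contribution is precisely a principled way to use the correlated measurements jointly.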

15.
The nearest neighbour analysis method has been developed to determine whether a disease case may be regarded as being unusually close to other neighbouring cases of the same disease. Using this method, each disease case is classified as spatially 'clustered' or 'non-clustered'. The method is also used to provide a test for global clustering. 'Clusters' are constructed by amalgamating geographically neighbouring clustered cases into one contiguous 'cluster area'. This paper describes a method for studying differences between clustered and non-clustered cases, in terms of case 'attributes'. These attributes may be person related, such as age and sex, or area based, such as geographical isolation. The area-based variables are subject to geographical correlation. The comparison of clustered and non-clustered cases may reveal similarities or differences, which may, in turn, give clues to disease aetiology. A method for studying 'linkage' or similarities in attributes between cases that occur in the same clusters is also described. The methods are illustrated by application to incidence data for leukaemias and lymphomas.

16.
Analysis of familial aggregation in the presence of varying family sizes
Summary.  Family studies are frequently undertaken as the first step in the search for genetic and/or environmental determinants of disease. Significant familial aggregation of disease is suggestive of a genetic aetiology for the disease and may lead to more focused genetic analysis. Of course, it may also be due to shared environmental factors. Many methods have been proposed in the literature for the analysis of family studies. One model that is appealing for the simplicity of its computation and the conditional interpretation of its parameters is the quadratic exponential model. However, a limiting factor in its application is that it is not reproducible, meaning that all families must be of the same size. To increase the applicability of this model, we propose a hybrid approach in which analysis is based on the assumption of the quadratic exponential model for a selected family size and combines a missing data approach for smaller families with a marginalization approach for larger families. We apply our approach to a family study of colorectal cancer that was sponsored by the Cancer Genetics Network of the National Institutes of Health. We investigate the properties of our approach in simulation studies. Our approach applies more generally to clustered binary data.

17.
ABSTRACT

We consider the case of production units arranged into a number of groups. All units within a group choose output–input combinations from the same production possibilities set that is represented by a stochastic frontier model. The metafrontier is the envelope of the group-specific frontiers. We are interested in the metafrontier distance, which is the amount by which the group-specific frontier lies below the metafrontier.

Previous work has measured the metafrontier distance using the deterministic portion of the frontier. In a stochastic frontier model, this is not appropriate. We show how to evaluate the metafrontier distance, and we demonstrate the empirical relevance of this issue.

18.
The problem of the allocation of experimental units to experimental groups is studied within the context of generalized linear models. Optimal designs for the estimation of linear combinations of linear predictors are characterized, using concepts from the theory of optimal design. If there is only one linear combination of interest, then the D-optimal allocation is equivalent to the well-known Neyman allocation of subsamples in stratified sampling. However, if the number of linear combinations equals the number of design points, or experimental groups, then the equal replication of all design points is D-optimal. For cases in between, there are no easily accessible general solutions to the problem, although some particular cases are solved, including: (i) estimation of the n − 1 possible comparisons with a control group in an n-point, one-factor design; and (ii) estimation of one or two of the four natural parameters of a 2 × 2 factorial design. The A-optimal allocations are determined in general.
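The Neyman allocation that the single-linear-combination case reduces to is simple to sketch: the share of the total sample n allotted to stratum h is proportional to N_h σ_h, the product of the stratum size and its within-stratum standard deviation. The stratum sizes and standard deviations below are illustrative assumptions.

```python
def neyman_allocation(N, sigma, n):
    # N: stratum sizes; sigma: within-stratum standard deviations.
    # Optimal (variance-minimizing) shares are proportional to N_h * sigma_h.
    weights = [Nh * sh for Nh, sh in zip(N, sigma)]
    total = sum(weights)
    return [n * w / total for w in weights]

alloc = neyman_allocation(N=[500, 300, 200], sigma=[2.0, 1.0, 4.0], n=100)
print(alloc)  # more units go to large and/or highly variable strata
```

In practice the fractional allocations are rounded to integers; in the GLM setting the role of σ_h is played by the inverse information weights at each design point.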

19.
Summary.  The identification of factors that increase the chances of a certain disease is one of the classical and central issues in epidemiology. In this context, a typical measure of the association between a disease and risk factor is the odds ratio. We deal with design problems that arise for Bayesian inference on the odds ratio in the analysis of case–control studies. We consider sample size determination and allocation criteria for both interval estimation and hypothesis testing. These criteria are then employed to determine the sample size and proportions of units to be assigned to cases and controls for planning a study on the association between the incidence of a non-Hodgkin's lymphoma and exposure to pesticides by eliciting prior information from a previous study.
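A hedged sketch of the inferential core such design criteria are built on: a Monte Carlo posterior for the odds ratio, with independent Beta posteriors for the exposure probabilities among cases and controls. An interval-based sample-size criterion would repeat this over many simulated data sets and check the typical credible-interval length; the counts and priors below are illustrative, not from the lymphoma study.

```python
import math
import random

def posterior_log_or(exp_cases, n_cases, exp_ctrls, n_ctrls,
                     a=1.0, b=1.0, draws=5000, seed=0):
    # Independent Beta(a, b) priors on the exposure probabilities among
    # cases and controls; each draw yields one posterior log odds ratio.
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        p1 = rng.betavariate(a + exp_cases, b + n_cases - exp_cases)
        p0 = rng.betavariate(a + exp_ctrls, b + n_ctrls - exp_ctrls)
        out.append(math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0)))
    return out

# 30/100 cases exposed vs. 15/100 controls exposed (illustrative counts)
samples = sorted(posterior_log_or(30, 100, 15, 100))
lo, hi = samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))]
print(math.exp(lo), math.exp(hi))  # 95% credible interval for the odds ratio
```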

20.
A new method is described of drawing, without replacement, two sample units per stratum from any population. The method is developed from a consideration of the asymptotic properties of systematic sampling with unequal probabilities, as the sizes of the population units tend to zero. The essential properties of this method are very easily analysed. They also converge, over a large number of strata, to those of systematic sampling from the same strata with their population units arranged in random order. In proving this, the assumption is made that the underlying population is of the type to which it is appropriate to apply ratio estimation. The sampling method described is, however, simple enough to commend itself as an alternative to systematic sampling when the underlying population is not of this type. Consideration is given to the case where the sizes of some of the population units exceed the skip interval.
