Similar articles
 Found 20 similar articles; search time: 15 ms
1.
Gene expression data analysis provides scientists with a wealth of information about gene relationships, particularly the identification of significantly differentially expressed genes. However, there is no consensus on the analysis technique that will solve the inherent multiplicity problem (thousands of genes to be tested) and yield a reasonable and statistically justifiable number of differentially expressed genes. We propose the Multiplicity-Adjusted Order Statistics Analysis (MAOSA) to identify differentially expressed genes while adjusting for multiple testing. The multiplicity problem will be eased by performing a Bonferroni correction on a small number of effects, since the majority of genes are not differentially expressed.
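
The abstract does not describe how MAOSA selects the effects to be corrected, so the following is only a minimal sketch of the plain Bonferroni step it refers to. The function name `bonferroni_reject` and the p-values are hypothetical, chosen for illustration; the point is the arithmetic of why correcting over fewer effects gives a less severe threshold.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return indices of hypotheses rejected under a Bonferroni correction.

    Each p-value is compared with alpha / m, where m is the number of tests.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values, for illustration only.
all_tests = [0.0001, 0.004, 0.012, 0.2, 0.6]
# With m = 5 the threshold is 0.05 / 5 = 0.01, so 0.012 is not rejected.
rejected_all = bonferroni_reject(all_tests)

# Correcting over only the first three effects gives 0.05 / 3, about 0.0167,
# so 0.012 is now rejected. Whether restricting the correction to a subset
# is statistically justified depends on how the subset is chosen, which the
# abstract does not specify.
rejected_subset = bonferroni_reject(all_tests[:3])
```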

2.
Datasets that are subjectively labeled by a number of experts are becoming more common in tasks such as biological text annotation where class definitions are necessarily somewhat subjective. Standard classification and regression models are not suited to multiple labels and typically a pre-processing step (normally assigning the majority class) is performed. We propose Bayesian models for classification and ordinal regression that naturally incorporate multiple expert opinions in defining predictive distributions. The models make use of Gaussian process priors, resulting in great flexibility and particular suitability to text-based problems where the number of covariates can be far greater than the number of data instances. We show that using all labels rather than just the majority improves performance on a recent biological dataset.

3.
SUMMARY
In using census data, a range of indicators is commonly used to indicate deprivation. This paper examines the validity of these indicators by exploring how well they predict income in surveys (the Family Expenditure Surveys of 1983 and 1990 and the General Household Survey of 1984) which also collect income data. A reasonably parsimonious set of seven socioeconomic variables (as well as controls for age, sex and region) explains about 40% of the variation in log-income. Our results provide a set of weights for a deprivation index and offer no support for the practice of assigning equal weights to the indicators. A census-based proxy would miss a sizable minority of the actual poor and misclassify some with higher incomes. A majority of the 'deprived' are poor by a cash yardstick, but some are not.

4.
In the Bayesian analysis of a multiple-recapture census, different diffuse prior distributions can lead to markedly different inferences about the population size N. Through consideration of the Fisher information matrix it is shown that the number of captures in each sample typically provides little information about N. This suggests that if there is no prior information about capture probabilities, then knowledge of just the sample sizes and not the number of recaptures should leave the distribution of N unchanged. A prior model that has this property is identified and the posterior distribution is examined. In particular, asymptotic estimates of the posterior mean and variance are derived. Differences between Bayesian and classical point and interval estimators are illustrated through examples.

5.
In order to assess the effectiveness of government programs designed to reduce disparities in the health care minority groups receive relative to the majority white population, a proper statistical measure should be used. This article proposes a measure of and its accompanying graph, which is readily interpretable and is not affected by the number of minority subgroups examined.

6.
For the classical linear regression problem, a number of estimators alternative to least squares have been proposed for situations in which multicollinearity is a problem. There is, however, relatively little known about how these estimators behave in practice. This paper investigates mean square error properties for a number of biased regression estimators, and discusses some practical implications of the use of such estimators. A conclusion is that certain types of ridge estimators appear to have good mean square error properties, and this may be useful in situations in which mean square error is important.
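
For orientation, the standard textbook forms of the least squares and ridge estimators, and of the mean square error criterion, can be written as follows. The notation (design matrix $X$, response $y$, ridge constant $k$) is an assumption introduced here; the abstract does not say which ridge variants or which form of the criterion the paper studies.

```latex
\hat{\beta}_{\mathrm{LS}} = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\hat{\beta}(k) = (X^{\top}X + kI)^{-1}X^{\top}y, \quad k \ge 0,

\mathrm{MSE}(\hat{\beta})
  = \mathrm{E}\,\lVert \hat{\beta} - \beta \rVert^{2}
  = \operatorname{tr}\operatorname{Var}(\hat{\beta})
  + \lVert \mathrm{E}\,\hat{\beta} - \beta \rVert^{2}.
```

Setting $k = 0$ recovers least squares. For $k > 0$ the estimator is biased, but its variance is smaller when $X^{\top}X$ is nearly singular, which is the trade-off that a mean square error comparison measures.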

7.
The development of a general methodology for the construction of good two-level nonregular designs has received significant attention over the last 10 years. Recent works by Phoa and Xu (2009) and Zhang et al. (2011) indicate that quaternary code (QC) designs are very promising in this regard. This paper explores a systematic construction for 1/8th and 1/16th fraction QC designs with high resolution for any number of factors. The 1/8th fraction QC designs often have larger resolution than regular designs of the same size. A majority of the 1/16th fraction QC designs also have larger resolution than comparable two-level regular designs.

8.
A method is presented for the sequential analysis of experiments involving two treatments to which response is dichotomous. Composite hypotheses about the difference in success probabilities are tested, and covariate information is utilized in the analysis. The method is based upon a generalization of Bartlett’s (1946) procedure for using the maximum likelihood estimate of a nuisance parameter in a Sequential Probability Ratio Test (SPRT). Treatment assignment rules studied include pure randomization, randomized blocks, and an adaptive rule which tends to assign the superior treatment to the majority of subjects. It is shown that the use of covariate information can result in important reductions in the expected sample size for specified error probabilities, and that the use of covariate information is essential for the elimination of bias when adaptive assignment rules are employed. Designs of the type presented are easily generated, as the termination criterion is the same as for a Wald SPRT of simple hypotheses.
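
The termination criterion mentioned at the end of the abstract, that of a Wald SPRT for simple hypotheses, can be sketched for a single Bernoulli success probability as below. This is a simplified illustration only: it uses Wald's approximate boundaries and does not include the two-treatment setup, the covariates, or the generalization of Bartlett's procedure described in the abstract. The function name and the parameter values are hypothetical.

```python
import math


def wald_sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald SPRT of H0: p = p0 against H1: p = p1 for 0/1 observations.

    Returns (decision, n), where n is the number of observations used.
    Boundaries are Wald's approximations log((1 - beta) / alpha) and
    log(beta / (1 - alpha)) on the log-likelihood-ratio scale.
    """
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    llr = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        # Add the log-likelihood ratio contribution of one observation.
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n


# A run of successes crosses the upper boundary; a run of failures
# crosses the lower one.
decision_high = wald_sprt([1] * 20, p0=0.5, p1=0.8)
decision_low = wald_sprt([0] * 20, p0=0.5, p1=0.8)
```

Sampling stops as soon as the accumulated log-likelihood ratio leaves the interval between the two boundaries, so the sample size is random rather than fixed in advance.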

9.
A nonlinear regression model for forecasting passenger flow between various spatial points (towns) is described. Unknown parameters are estimated using aggregated data when only information about the number of passengers departing from each town is available. For estimation, the least squares and maximum likelihood methods are used. Numerical examples are given to illustrate the proposed approaches.

10.
A Y-linked two-sex branching process with blind choice is a suitable model for analyzing the evolution of the number of carriers of two alleles of a Y-linked gene in a two-sex monogamous population where each female chooses her partner from among the male population without caring about his type (i.e., the allele he carries). This work focuses on the development of Bayesian inference for this model, considering a parametric framework with the reproduction laws belonging to the power series family of distributions. A sample is considered given by the observation of the total number of females and males (regardless of their types) up to some generation as well as the number of each type of male in the last generation. Using a simulation method based on the Gibbs sampler, we approximate the posterior distributions of the main parameters of this model. The accuracy of the procedure based on this sample is illustrated by way of a simulated example.

11.
Current phylogenetic comparative methods generally employ the Ornstein–Uhlenbeck (OU) process for modeling trait evolution. Being able to track the optimum of a trait within a group of related species, the OU process provides information about the stabilizing selection where the population mean adopts a particular trait value. The optima of a trait may follow certain stochastic dynamics along the evolutionary history. In this paper, we extend the current framework by adopting a rate of evolution that behaves according to pertinent stochastic dynamics. The novel model is applied to analyze about 225 datasets collected from the existing literature. Results validate that the new framework provides a better fit for the majority of these datasets.

12.
Summary. A deterministic computer model is to be used in a situation where there is uncertainty about the values of some or all of the input parameters. This uncertainty induces uncertainty in the output of the model. We consider the problem of estimating a specific percentile of the distribution of this uncertain output. We also suppose that the computer code is computationally expensive, so we can run the model only at a small number of distinct inputs. This means that we must consider our uncertainty about the computer code itself at all untested inputs. We model the output, as a function of its inputs, as a Gaussian process, and after a few initial runs of the code use a simulation approach to choose further suitable design points and to make inferences about the percentile of interest itself. An example is given involving a model that is used in sewer design.

13.
A non-linear congruential pseudo random number generator is introduced. This generator does not have the lattice structure in the distribution of tuples of consecutive pseudo random numbers which appears in the case of linear congruential generators. A theorem on the period length of sequences produced by this type of generator is proved. This theorem justifies an algorithm to determine the period length. Finally, a simulation problem is described where a linear congruential generator produces completely useless results whereas good results are obtained if a non-linear congruential generator of about the same period length is applied.
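
The abstract gives neither the generator's recurrence nor the period-length algorithm, so the sketch below makes two loudly labeled assumptions: it uses an inversive recurrence modulo an odd prime as one well-known example of a non-linear congruential generator, and it finds the period by brute-force iteration rather than by any theorem. All function names and parameter values are hypothetical.

```python
def period_length(step, seed):
    """Length of the cycle eventually entered when iterating `step` from `seed`."""
    first_seen = {}
    x, n = seed, 0
    while x not in first_seen:
        first_seen[x] = n
        x = step(x)
        n += 1
    return n - first_seen[x]


def linear_step(a, c, m):
    """Linear congruential recurrence x -> (a * x + c) mod m."""
    return lambda x: (a * x + c) % m


def inversive_step(a, c, p):
    """Assumed non-linear recurrence x -> (a * inv(x) + c) mod p, p an odd prime.

    pow(x, p - 2, p) is the modular inverse of x for x != 0 (Fermat's little
    theorem) and maps 0 to 0, which is the usual convention for this recurrence.
    """
    return lambda x: (a * pow(x, p - 2, p) + c) % p


# Toy parameters, far too small for real use.
linear_period = period_length(linear_step(3, 1, 7), seed=0)        # 6
inversive_period = period_length(inversive_step(1, 1, 7), seed=0)  # 7
```

Brute force needs memory proportional to the period, so it is only practical for small moduli; a theorem of the kind the abstract mentions would presumably allow the period to be determined without enumerating the sequence.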

14.
A number of different ways are examined of representing the characteristic function φ(t) of the lognormal distribution, which cannot be expanded in a Taylor series based on the moments. In §2 the use of a finite Taylor series is examined. A method of summing the divergent formal expansion is discussed in §3. In §4 the fact that φ(t) is a boundary analytic function is exploited. Asymptotic approximation of the integral defining φ(t) is studied in §5. Each approach produces some interesting information about the distribution.
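
The reason the moment-based Taylor series fails can be made explicit. The symbols $\mu$ and $\sigma$ below are assumed notation for the parameters of the underlying normal distribution, so that $X = e^{Z}$ with $Z \sim N(\mu, \sigma^{2})$; the abstract itself names only $\varphi(t)$.

```latex
\varphi(t) = \mathrm{E}\, e^{itX}
  = \int_{0}^{\infty} e^{itx}\,
    \frac{1}{x\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}\right) dx,

\mathrm{E}\, X^{n} = \exp\!\left(n\mu + \tfrac{1}{2}n^{2}\sigma^{2}\right),
\qquad
\sum_{n=0}^{\infty} \frac{(it)^{n}}{n!}
  \exp\!\left(n\mu + \tfrac{1}{2}n^{2}\sigma^{2}\right).
```

All moments are finite, but the factor $\exp(\tfrac{1}{2}n^{2}\sigma^{2})$ grows faster than $n!$, so the formal series on the right diverges for every $t \neq 0$ whenever $\sigma > 0$.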

15.
Quite often we are faced with a sparse number of observations over a finite number of cells and are interested in estimating the cell probabilities. Some local polynomial smoothers or local likelihood estimators have been proposed to improve on the histogram, which would produce too many zero values. We propose a relativized local polynomial smoothing for this problem, weighting the estimation errors in small-probability cells more heavily. A simulation study of the proposed estimators shows good behaviour with respect to natural error criteria, especially when dealing with sparse observations.

16.
A test of congruence among distance matrices is described. It tests the hypothesis that several matrices, containing different types of variables about the same objects, are congruent with one another, so they can be used jointly in statistical analysis. Raw data tables are turned into similarity or distance matrices prior to testing; they can then be compared to data that naturally come in the form of distance matrices. The proposed test can be seen as a generalization of the Mantel test of matrix correspondence to any number of distance matrices. This paper shows that the new test has the correct rate of Type I error and good power. Power increases as the number of objects and the number of congruent data matrices increase; power is higher when the total number of matrices in the study is smaller. To illustrate the method, the proposed test is used to test the hypothesis that matrices representing different types of organoleptic variables (colour, nose, body, palate and finish) in single-malt Scotch whiskies are congruent.

17.
A systematic method of developing or raising the offspring of parents or lines that are subjected to analysis to draw valid inferences about parents is called a Mating Design (MD). A Mating Design represents only a part of a genetic experiment. Diallel and the four North Carolina (NC) designs, Triallel and Double Crosses are notable examples of mating designs. In this paper, an attempt has been made to provide a systematic method of construction of Partial Triallel Crosses (PTC) using Trojan Square Design (TSD), which requires only a fraction of the number of crosses to be made compared with Triallel Crosses.

18.
MODEL-BASED VARIANCE ESTIMATION IN SURVEYS WITH STRATIFIED CLUSTERED DESIGN   Cited by: 1 (self-citations: 0, citations by others: 1)
A model-based method for estimating the sampling variances of estimators of (sub-)population means, proportions, quantiles, and regression parameters in surveys with stratified clustered design is described and applied to a survey of US secondary education. The method is compared with the jackknife by a simulation study. The model-based estimators of the sampling variances have much smaller mean squared errors than their jackknife counterparts. In addition, they can be improved by incorporating information about the unknown parameters (variances) from external sources. A regression-based smoothing method for estimating the sampling variances of the estimators for a large number of subpopulation means is proposed. Such smoothing may be invaluable when subpopulations are represented in the sample by only a few subjects.

19.
In this note, we propose a new method for selecting the bandwidth parameter in non-parametric regression. While standard criteria, such as cross-validation, are based on the true regression curve about which we know little, we propose a criterion which focuses on the true errors about which assumptions may be made. Our proposal is to choose the bandwidth for which the residuals are as uncorrelated as possible. We use the Box-Pierce statistic as the objective to be minimized. In doing so, the behaviour of our residuals will be close to that of the true errors under the hypothesis of independent errors. A simulation study shows that our method succeeds in capturing the main features of the regression curve, in the sense that the number of turning-points of the curve is correctly estimated most of the time.
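
The selection rule described above can be sketched as a grid search. The abstract does not say which smoother, how many lags, or which bandwidth grid the authors use, so the Nadaraya-Watson estimator with a Gaussian kernel, the default of five lags, and the grid values below are all assumptions made for illustration. The Box-Pierce statistic itself is the standard one, n times the sum of squared residual autocorrelations.

```python
import math


def nw_fit(x, y, h):
    """Nadaraya-Watson fit at each design point (Gaussian kernel, bandwidth h)."""
    fitted = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        fitted.append(sum(wj * yj for wj, yj in zip(w, y)) / sum(w))
    return fitted


def box_pierce(residuals, max_lag):
    """Box-Pierce statistic: n times the sum of squared autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    c = [r - mean for r in residuals]
    denom = sum(v * v for v in c)
    if denom == 0:
        return 0.0
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(c[t] * c[t + k] for t in range(n - k)) / denom
        q += r_k * r_k
    return n * q


def select_bandwidth(x, y, grid, max_lag=5):
    """Pick the bandwidth in `grid` whose residuals minimize Box-Pierce.

    Assumes x is sorted, so that lag-k residuals are neighbours along the curve.
    """
    def score(h):
        fitted = nw_fit(x, y, h)
        return box_pierce([yi - fi for yi, fi in zip(y, fitted)], max_lag)
    return min(grid, key=score)


# Hypothetical data: a sine curve plus a small deterministic perturbation.
x = [i / 49 for i in range(50)]
y = [math.sin(2 * math.pi * xi) + 0.1 * math.cos(37 * i) for i, xi in enumerate(x)]
best_h = select_bandwidth(x, y, grid=[0.02, 0.05, 0.1, 0.2, 0.4])
```

One caveat with this simplified version: a bandwidth small enough to interpolate the data leaves residuals that are all zero, which trivially scores zero, so the grid should exclude such values.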

20.
The option to stop a project is fundamental in drug development. The majority of drugs do not reach the market. Furthermore, many marketed drugs do not repay their development costs. It is therefore crucial to optimize the value of the option to stop. We formulate two examples of statistical models. One is based on success/failure in a series of trials; the other assumes that the commercial value evolves as a stochastic process as more information becomes available. These models are used to study a number of issues: the number and timing of decision points; value of information; speed of development; and order of trials. The results quantify the value of options. They show that early information that can change key decisions is most valuable. That is, we should nip bad projects in the bud. Modelling is also useful to analyse more complex decisions, for example, weighting the value of decision points against the cost of information or the speed of development. Copyright © 2003 John Wiley & Sons, Ltd.
