Similar Articles
 20 similar articles found.
1.
Abstract. In a recent paper we extended and refined some tools introduced by O'Hagan for criticism of Bayesian hierarchical models. In particular, avoiding double use of data through a data-splitting approach was a main concern. Such tools can be applied at each node of the model, with a view to diagnosing problems of model fit at any point in the model structure. Like O'Hagan, we investigated a Gaussian model of one-way analysis of variance. Through extensive Markov chain Monte Carlo simulations it was shown that our method detects model misspecification about as well as O'Hagan's, when the latter is properly calibrated, while retaining the desired false warning probability for data generated from the assumed model. In the present paper, we suggest some new measures of conflict based on tail probabilities of the so-called integrated posterior distributions introduced in our recent paper. These new measures are equivalent to the measure applied in that paper in simple Gaussian models, but seem more appropriately adjusted to deviations from normality and to conflicts not concerning location parameters. A general linear normal model with known covariance matrices is considered in detail.

2.
In many scientific fields, it is interesting and important to determine whether an observed data stream comes from a prespecified model or not, particularly when the number of data streams is of large scale, where multiple hypothesis testing is necessary. In this article, we consider large-scale model checking under certain dependence among different data streams observed at the same time. We propose a false discovery rate (FDR) control procedure to flag unusual data streams. Specifically, we derive an approximation of the number of false discoveries and construct a point estimate of the FDR. Theoretical results show that, under some mild assumptions, our proposed estimate of the FDR is simultaneously conservatively consistent with the true FDR, and hence yields an asymptotically strong control procedure. Simulation comparisons with some competing procedures show that our proposed FDR procedure performs better in general settings. The application of our proposed FDR procedure is illustrated with the StarPlus fMRI data.
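For orientation, the sketch below implements the classical Benjamini–Hochberg step-up procedure in Python, the baseline against which FDR-control procedures like the one above are usually compared. It is not the authors' dependence-adjusted estimator, and the simulated z-scores are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def benjamini_hochberg(pvals, alpha=0.05):
    """Classical BH step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    # Find the largest k with p_(k) <= (k / m) * alpha.
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()
        reject[order[:k + 1]] = True
    return reject

# Illustration: 1000 data streams, 50 of which deviate from the null model.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 950), rng.normal(3, 1, 50)])
pvals = 2 * norm.sf(np.abs(z))  # two-sided p-values
print(benjamini_hochberg(pvals).sum(), "streams flagged at FDR 0.05")
```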

3.
Listed companies often have incentives to window-dress financial data to make their operating condition look better, which reduces the predictive accuracy of financial risk early-warning models. Using two data-quality assessment methods, Benford's law and the Myer index, this paper constructs Benford and Myer quality factors, introduces them into a BP neural network, and builds a BM-BP neural network financial risk early-warning model. It then uses data on Chinese A-share listed companies from 2000 to 2019 to evaluate the effect of the data-quality factors on the predictive accuracy of the early-warning model and to analyse the stability of the new model's predictions. The empirical results show that the Benford and Myer quality factors improve the predictive accuracy of the BP neural network early-warning model; among the compared quality factors, the BP neural network model that includes the Benford and Myer quality factors of the selected indicators achieves higher prediction accuracy, a lower type II misclassification rate, and good stability; and screening the indicators with a decision-tree algorithm further improves the new model's predictive accuracy.
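As a concrete illustration of the Benford side of the quality factors described above, the sketch below computes a chi-square-style distance between observed first-digit frequencies and Benford's law, P(d) = log10(1 + 1/d). It is a hypothetical stand-in for a Benford quality factor, not the paper's BM-BP model.

```python
import numpy as np

def first_digits(values):
    """Leading decimal digit of each non-zero value."""
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    return np.floor(10 ** (np.log10(v) % 1)).astype(int)

def benford_factor(values):
    """Chi-square-style distance between observed first-digit frequencies
    and the Benford probabilities log10(1 + 1/d), d = 1..9."""
    d = first_digits(values)
    observed = np.bincount(d, minlength=10)[1:10] / d.size
    expected = np.log10(1 + 1 / np.arange(1, 10))
    return np.sum((observed - expected) ** 2 / expected)

# Benford-conforming data (log-uniform magnitudes) give a small factor.
rng = np.random.default_rng(0)
print(benford_factor(10 ** rng.uniform(0, 5, 10_000)))
```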

4.
ABSTRACT

Background: Instrumental variables (IVs) have become much easier to find in the “Big data era”, which has increased the number of applications of the two-stage least squares (TSLS) model. With the increased availability of IVs, the possibility that these IVs are weak has also increased. Prior work has suggested a ‘rule of thumb’ that IVs with a first-stage F statistic of at least ten avoid relative bias in point estimates greater than 10%. We investigated whether this threshold also guarantees low false rejection rates of the null hypothesis test in TSLS applications with many IVs.

Objective: To test how the ‘rule of thumb’ for weak instruments performs in predicting low false rejection rates in the TSLS model when the number of IVs is large.

Method: We used a Monte Carlo approach to create 28 original data sets for different models, with the number of IVs varying from 3 to 30. For each model, we generated 2000 observations per iteration and ran 50,000 iterations to reach convergence in rejection rates. The null value of the coefficient was set to 0, and the probability of rejecting this hypothesis was recorded for each model as a measure of the false rejection rate. The relationship between the endogenous variable and the IVs was carefully adjusted to make the first-stage F statistic equal ten, thus simulating the ‘rule of thumb.’

Results: We found that the false rejection rates (type I errors) increased as the number of IVs in the TSLS model increased, while holding the first-stage F statistic equal to 10. The false rejection rate exceeded 10% when the TSLS model had 24 IVs and exceeded 15% when it had 30 IVs.

Conclusion: When more instrumental variables are included in the model, the ‘rule of thumb’ is no longer a reliable guarantee of good performance in hypothesis testing. A stricter F-statistic threshold is recommended to replace the ‘rule of thumb,’ especially when the number of instrumental variables is large.
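A stripped-down version of one such Monte Carlo iteration is sketched below. The first-stage coefficients and error correlation are illustrative placeholders (roughly calibrated so the expected first-stage F is near ten), not the paper's exact calibration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 2000, 30                    # sample size and number of instruments
pi = np.full(k, 0.075)             # weak first-stage coefficients (illustrative)

Z = rng.normal(size=(n, k))
u = rng.normal(size=n)             # structural error
v = 0.5 * u + rng.normal(size=n)   # first-stage error, correlated with u (endogeneity)
x = Z @ pi + v
y = u                              # true coefficient on x is 0 under the null

# Two-stage least squares: project x onto Z, then regress y on the projection.
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
beta_hat = (x_hat @ y) / (x_hat @ x_hat)
resid = y - beta_hat * x
se = np.sqrt((resid @ resid) / (n - 1) / (x_hat @ x_hat))

# First-stage F statistic and the t test of beta = 0 for this iteration;
# the paper repeats this 50,000 times and records the rejection frequency.
f_stat = (x_hat @ x_hat / k) / ((x - x_hat) @ (x - x_hat) / (n - k))
print(f"first-stage F = {f_stat:.1f}, |t| = {abs(beta_hat / se):.2f}")
```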

5.
A Bayesian discovery procedure
Summary. We discuss a Bayesian discovery procedure for multiple-comparison problems. We show that, under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule that is based on a threshold of the posterior probability of the alternative. Under a semiparametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure, which was recently introduced by Storey. Improving the approximation leads us to a Bayesian discovery procedure, which exploits the multiple shrinkage in clusters that are implied by the assumed non-parametric model. We compare the Bayesian discovery procedure and the optimal discovery procedure estimates in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumour samples. We extend the setting of the optimal discovery procedure by discussing modifications of the loss function that lead to different single-thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
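The threshold rule the summary refers to can be stated in a few lines. Under a loss of the form c·FP − TP, flagging case i is worthwhile exactly when its posterior probability v_i of being non-null exceeds c/(1 + c). The sketch below is a schematic rendering of that decision rule, not the paper's semiparametric machinery; the cost parameter is an assumption for illustration.

```python
import numpy as np

def bayes_threshold_rule(post_alt, fp_cost=1.0):
    """Decision rule minimizing E[fp_cost * FP - TP]: flagging case i adds
    fp_cost * (1 - v_i) - v_i to the expected loss, which is negative exactly
    when v_i > fp_cost / (1 + fp_cost), so we flag above that threshold."""
    v = np.asarray(post_alt)
    return v > fp_cost / (1.0 + fp_cost)

# With fp_cost = 1, the rule flags cases whose posterior probability exceeds 0.5.
print(bayes_threshold_rule([0.2, 0.6, 0.95]))
```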

6.
Large-scale simultaneous hypothesis testing appears in many areas. A well-known inference method is to control the false discovery rate. One popular approach is to model the z-scores derived from the individual t-tests and then use this model to control the false discovery rate. We propose a heteroscedastic contaminated normal mixture to describe the distribution of z-scores and design an EM-test for testing homogeneity in this class of mixture models. The proposed EM-test can be used to investigate whether a collection of z-scores has arisen from a single normal distribution or whether a heteroscedastic contaminated normal mixture is more appropriate. We show that the EM-test statistic has a shifted mixture of chi-squared limiting distribution. Simulation results show that the proposed testing procedure has accurate type-I error and significantly larger power than its competitors under a variety of model specifications. A real-data example is analysed to exemplify the application of the proposed method.

7.
This paper introduces the BP neural network into risk early warning for basic pension insurance funds, with the aim of providing a new early-warning tool and method. First, it establishes an indicator system for basic pension insurance fund risk early warning and builds a BP-network-based early-warning model. Second, it repeatedly trains the model on annual historical data of Shanghai's basic pension insurance fund from 1996 to 2008, achieving a prediction error rate of only 3.86%, which indicates that the model fits well. Finally, drawing on international and domestic experience, it sets five warning-level output intervals for the fund's warning indicators.

8.
耿鹏  齐红倩 《统计研究》2012,29(1):8-14
The current-vintage data used in traditional empirical research suffer from lagged and noisy information, which biases model estimates. Using real-time macroeconomic data can effectively remove the lagged and noisy information responsible for this bias and yield more accurate estimates. The MIDAS model can estimate key low-frequency economic data jointly with high-frequency data, largely resolving the loss of high-frequency information that ordinary models incur. This paper builds a quarterly GDP forecasting model for China using the M-MIDAS-DL model and real-time quarterly GDP data. The empirical results show that combining real-time data with combination forecasting methods can promptly and accurately predict the decline and rebound of China's economic growth rate since 2008, providing effective early warning and making this one of the more effective tools currently available for economic forecasting.

9.
Array-based comparative genomic hybridization (aCGH) is a high-resolution high-throughput technique for studying the genetic basis of cancer. The resulting data consist of log fluorescence ratios as a function of the genomic DNA location and provide a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number aberrations. We propose a hierarchical Bayesian random segmentation approach for modeling aCGH data that utilizes information across arrays from a common population to yield segments of shared copy number changes. These changes characterize the underlying population and allow us to compare different population aCGH profiles to assess which regions of the genome have differential alterations. Our method, referred to as BDSAcgh (Bayesian Detection of Shared Aberrations in aCGH), is based on a unified Bayesian hierarchical model that allows us to obtain probabilities of alteration states as well as probabilities of differential alteration that correspond to local false discovery rates. We evaluate the operating characteristics of our method via simulations and an application using a lung cancer aCGH data set.

10.
Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many areas of application require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component; therefore, the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally non-trivial due to the difficulties in justifying the exact number of components to be used and due to the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and a smaller false non-discovery rate. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.

11.
Summary. As biological knowledge accumulates rapidly, gene networks encoding genomewide gene–gene interactions have been constructed. As an improvement over the standard mixture model, which treats all genes as independent and identically distributed a priori, Wei and co-workers have proposed modelling a gene network as a discrete or Gaussian Markov random field (MRF) in a mixture model to analyse genomic data. However, how these methods compare in practical applications is not well understood, and that is the aim here. We also propose two novel constraints in prior specifications for the Gaussian MRF model and a fully Bayesian approach to the discrete MRF model. We assess the accuracy of estimating the false discovery rate by posterior probabilities in the context of MRF models. Applications to a chromatin immunoprecipitation–chip data set and simulated data show that the modified Gaussian MRF models have superior performance compared with other models, and both MRF-based mixture models, with reasonable robustness to misspecified gene networks, outperform the standard mixture model.

12.
Spatial modeling has gained interest in ecology during the past two decades, especially in the area of biodiversity, where reliable distribution maps are required. Several methods have been proposed to construct distribution maps, most of them acknowledging the presence of spatial interactions. In many cases, a key problem is the lack of true absence data. We present here a model suitable for use when true absence data are missing. The quality of the estimates obtained from the model is evaluated using ROC curve analysis as well as a quadratic cost function, computed from the false positive and false negative error rates. The model is also tested under random and clustered scattering of the presence records. We also present an application of the model to the construction of distribution maps of two endemic bird species in México.

13.
In this article we develop a class of stochastic boosting (SB) algorithms, which build upon the work of Holmes and Pintore (Bayesian Stat. 8, Oxford University Press, Oxford, 2007). They introduce boosting algorithms which correspond to standard boosting (e.g. Bühlmann and Hothorn, Stat. Sci. 22:477–505, 2007) except that the optimization algorithms are randomized; this idea is placed within a Bayesian framework. We show that the inferential procedure in Holmes and Pintore (Bayesian Stat. 8, Oxford University Press, Oxford, 2007) is incorrect and further develop interpretational, computational and theoretical results which allow one to assess SB’s potential for classification and regression problems. To use SB, sequential Monte Carlo (SMC) methods are applied. As a result, it is found that SB can provide better predictions for classification problems than the corresponding boosting algorithm. A theoretical result is also given, which shows that the predictions of SB are not significantly worse than boosting, when the latter provides the best prediction. We also investigate the method on a real case study from machine learning.

14.
In the big data era, one often needs to address the problem of parsimonious data representation. In this paper, the data under study are curves, and the sparse representation is based on a semiparametric model. Specifically, we propose an original registration model for noisy curves, built by transforming an unknown function through plane similarities. We develop a statistical method for estimating the parameters that characterize the plane similarities and study the properties of this procedure. We show the convergence and the asymptotic normality of the estimators. Numerical simulations and a real-life aeronautic example illustrate and demonstrate the strength of our methodology.

15.
This article describes a method for simulating n-dimensional multivariate non-normal data, with emphasis on count-valued data. Dependence is characterized by either Pearson correlations or Spearman correlations. The simulation is accomplished by simulating a vector of correlated standard normal variates, whose elements are then transformed to achieve the target marginal distributions. We prove that the method corresponds to simulating data from a multivariate Gaussian copula. The simulation method does not restrict pairwise dependence beyond the limits imposed by the marginal distributions and can achieve any Pearson or Spearman correlation within those limits. Two examples are included. In the first example, marginal means, variances, Pearson correlations, and Spearman correlations are estimated from the epileptic seizure data set of Diggle et al. [P. Diggle, P. Heagerty, K.Y. Liang, and S. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, 2002]. Data with these means and variances are simulated to first achieve the estimated Pearson correlations and then achieve the estimated Spearman correlations. The second example is of a hypothetical time series of Poisson counts with seasonal mean ranging between 1 and 9 and a first-order autoregressive dependence structure.
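The construction the abstract describes (often called NORTA) is compact enough to sketch: draw correlated standard normals, map them to uniforms with Φ, and apply the target marginal quantile functions. The marginals and latent correlation below are illustrative, and note that the correlation fed to the Gaussian copula is not in general identical to the Pearson or Spearman correlation of the resulting counts, which is precisely the calibration problem the article addresses.

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(42)
n = 100_000

# Latent correlation of the Gaussian copula (illustrative).
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])
z = rng.normal(size=(n, 2)) @ np.linalg.cholesky(R).T  # correlated N(0,1) pairs
u = norm.cdf(z)                                        # uniforms sharing the copula

# Transform to the target marginals: Poisson counts with means 1 and 9.
counts = np.column_stack([poisson.ppf(u[:, 0], mu=1),
                          poisson.ppf(u[:, 1], mu=9)])

# The realized Pearson correlation is attenuated relative to the latent 0.6.
print(np.corrcoef(counts, rowvar=False)[0, 1])
```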

16.
Statistical approaches for addressing multiplicity in clinical trials range from the very conservative (the Bonferroni method) to the least conservative (the fixed sequence approach). Recently, several authors have proposed methods that combine the merits of the two extreme approaches. Wiens [A fixed sequence Bonferroni procedure for testing multiple endpoints. Pharmaceutical Statist. 2003, 2, 211–215], for example, considered an extension of the Bonferroni approach where the type I error rate (α) is allocated among the endpoints, but testing proceeds in a pre-determined order, allowing the type I error rate to be saved for later use as long as the null hypotheses are rejected. This yields higher power in testing later null hypotheses. In this paper, we consider an extension of Wiens’ approach that takes into account correlations among endpoints to achieve greater flexibility in testing. We show strong control of the family-wise type I error rate for this extension, provide critical values and significance levels for testing up to three endpoints with equal correlations, and show how to calculate them for other correlation structures. We also present results of a simulation experiment comparing the power of the proposed method with those of Wiens’ and others. The results of this experiment show that the magnitude of the gain in power of the proposed method depends on the prospective ordering of testing of the endpoints, the magnitude of the treatment effects of the endpoints and the magnitude of correlation between endpoints. Finally, we consider applications of the proposed method to clinical trials with multiple time points and multiple doses, where correlations among endpoints frequently arise.
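A minimal sketch of the uncorrelated version of Wiens' procedure (often called the fallback procedure) is given below, assuming the common reading in which a rejected endpoint's level is carried forward; the correlation-adjusted critical values this paper contributes are not reproduced here.

```python
def fallback_procedure(pvals, alphas):
    """Fixed-sequence Bonferroni test in the spirit of Wiens (2003).
    pvals:  p-values in the prespecified testing order.
    alphas: per-endpoint allocation summing to the overall alpha.
    The level of a rejected endpoint is carried forward to the next one,
    so early rejections buy power for later hypotheses."""
    carried = 0.0
    decisions = []
    for p, a in zip(pvals, alphas):
        level = a + carried
        reject = p <= level
        decisions.append(reject)
        carried = level if reject else 0.0
    return decisions

# Three endpoints, overall alpha = 0.05 split equally: the early rejections
# let the last endpoint be tested at the full 0.05.
print(fallback_procedure([0.004, 0.030, 0.045], [0.05 / 3] * 3))
```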

17.
Elicitation     
There are various situations in which it may be important to obtain expert opinion about some unknown quantity or quantities. But it is not enough simply to ask the expert for an estimate of the unknown quantity: we also need to know how far from that estimate the true value might be. Tony O'Hagan describes the process of elicitation: the formulation of the expert's knowledge in the form of a probability distribution.

18.
We consider hypothesis testing problems for low-dimensional coefficients in a high-dimensional additive hazard model. A variance reduced partial profiling estimator (VRPPE) is proposed and its asymptotic normality is established, which enables us to test the significance of each single coefficient when the data dimension is much larger than the sample size. Based on the p-values obtained from the proposed test statistics, we then apply a multiple testing procedure to identify significant coefficients and show that the false discovery rate can be controlled at the desired level. The proposed method is also extended to testing a low-dimensional sub-vector of coefficients. The finite sample performance of the proposed testing procedure is evaluated by simulation studies. We also apply it to two real data sets, one focusing on testing low-dimensional coefficients and the other on identifying significant coefficients through the proposed multiple testing procedure.

19.
We introduce a multivariate heteroscedastic measurement error model for replications under scale mixtures of normal distributions. The model provides a robust analysis and can be viewed as a generalization of multiple linear regression in terms of both model structure and distributional assumptions. An efficient method based on Markov chain Monte Carlo is developed for parameter estimation. The deviance information criterion and the conditional predictive ordinates are used as model selection criteria. Simulation studies show that the model's inference is robust against both misspecification of the distributions and outliers. We work out an illustrative example with a real data set on measurements of plant root decomposition.

20.
The false discovery rate (FDR) has become a popular error measure in large-scale simultaneous testing. When data are collected from heterogeneous sources and the hypotheses are tested in groups, it may be beneficial to use the distinct features of the groups in conducting the multiple hypothesis testing. We propose a stratified testing procedure that uses different FDR levels according to stratification features based on p-values. Our proposed method is easy to implement in practice. Simulation studies show that the proposed method produces more efficient testing results: the stratified testing procedure minimizes the overall false negative rate (FNR) while controlling the overall FDR. An example from a type II diabetes mice study further illustrates the practical advantages of this new approach.
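A simplified stand-in for the stratified idea is sketched below: run Benjamini–Hochberg separately within each stratum at stratum-specific FDR levels. It uses scipy.stats.false_discovery_control (SciPy >= 1.11); the stratum labels and levels are hypothetical, and the paper's rule for choosing the per-stratum levels so as to minimize the overall FNR is not reproduced.

```python
import numpy as np
from scipy.stats import false_discovery_control  # SciPy >= 1.11

def stratified_bh(pvals, strata, levels):
    """Benjamini-Hochberg applied separately within each stratum, with a
    stratum-specific FDR level; `levels` maps stratum label -> FDR level."""
    pvals = np.asarray(pvals)
    strata = np.asarray(strata)
    reject = np.zeros(pvals.size, dtype=bool)
    for label, alpha in levels.items():
        idx = np.flatnonzero(strata == label)
        adjusted = false_discovery_control(pvals[idx], method='bh')
        reject[idx] = adjusted <= alpha
    return reject

# Hypothetical grouping feature with a signal-rich and a signal-poor stratum.
rng = np.random.default_rng(7)
p = np.concatenate([rng.beta(0.3, 4, 100), rng.uniform(size=100)])
strata = np.repeat(['rich', 'poor'], 100)
print(stratified_bh(p, strata, {'rich': 0.10, 'poor': 0.02}).sum(), "rejections")
```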
