Similar Literature (20 records)
1.
In applications of IRT, it often happens that many examinees omit a substantial proportion of item responses. This can occur for various reasons, though it may well be due to no more than simple design incompleteness. In such circumstances, the literature frequently reports various types of estimation problem, often described as generic "convergence problems" in the software used to estimate model parameters. With reference to the Partial Credit Model and to data missing at random, this article demonstrates that as the number of missing responses increases, so does the number of anomalous datasets, meaning those for which no finite estimate of (the vector parameter that identifies) the model exists. Moreover, necessary and sufficient conditions are given for the existence and uniqueness of the maximum likelihood estimate of the Partial Credit Model (and hence, in particular, the Rasch model) with incomplete data, with reference to the model in its more general form in which the number of response categories varies across items. A taxonomy of possible cases of anomaly is then presented, together with an algorithm useful in diagnostics.
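As a minimal illustration of the kind of diagnostic involved (not the paper's necessary-and-sufficient condition or its taxonomy), the sketch below flags the classical source of non-finite ML estimates in Rasch-type models: persons or items whose observed score in an incomplete design equals its minimum or maximum attainable value. It assumes a persons-by-items matrix with np.nan marking responses missing by design.

```python
import numpy as np

def flag_extreme_scores(X, max_cat):
    """X: persons x items matrix of PCM responses, np.nan where missing by
    design; max_cat: length-n_items array with each item's top category."""
    obs = (~np.isnan(X)).astype(int)
    person_score = np.nansum(X, axis=1)
    person_max = obs @ max_cat                 # best score attainable given design
    bad_person = (person_score == 0) | (person_score == person_max)
    item_score = np.nansum(X, axis=0)
    item_max = obs.sum(axis=0) * max_cat       # every observed response at the top
    bad_item = (item_score == 0) | (item_score == item_max)
    return bad_person, bad_item

X = np.array([[1., 2., np.nan],
              [0., 0., 0.],                    # zero score: no finite estimate
              [2., np.nan, 1.]])
print(flag_extreme_scores(X, max_cat=np.array([2, 2, 2])))
```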

2.
秦磊, 王奕丹, 苏治 《统计研究》2020, 37(3): 114-128
With the rapid development of information technology, large-scale data can now be collected and stored in a short time, providing a wealth of information for analysis and decision making while also complicating statistical modelling. For data with a large sample size and few variables, leverage-based importance sampling is a simple and practical method. This article finds that the leverage score this method uses to measure the importance of each observation ignores the response variable and, when the dimension is large, barely discriminates between observations, yielding poor estimates. To account for both the response and the dimension, this article proposes a leverage importance sampling method based on sufficient dimension reduction: without loss of information, leverage scores are recomputed within the sufficiently reduced subspace, making the subsample more representative. Simulations show that for complex data with a large sample size, the proposed method attains a lower mean squared error than the original leverage importance sampling method. Three real datasets also confirm its feasibility and effectiveness.
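A hedged sketch of the baseline the paper improves on, plain leverage-score subsampling for least squares; the proposed method would recompute the scores in a sufficient-dimension-reduction subspace rather than on the raw X. All sizes and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100_000, 10
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

U, _, _ = np.linalg.svd(X, full_matrices=False)
lev = (U ** 2).sum(axis=1)            # leverage scores h_ii
prob = lev / lev.sum()                # sampling probabilities

r = 2_000                             # subsample size
idx = rng.choice(n, size=r, replace=True, p=prob)
w = 1.0 / (r * prob[idx])             # inverse-probability weights
Xs = X[idx] * np.sqrt(w)[:, None]     # reweighted least-squares fit
ys = y[idx] * np.sqrt(w)
beta_hat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
```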

3.
This paper presents a procedure that applies the generalized maximum entropy (GME) estimation method in two steps to quantify the uncertainty of the parameters of the simple linear structural measurement error model exactly. The first step estimates the unknowns of the horizontal line, and these estimates are then used in a second step to estimate the unknowns of the vertical line. The proposed procedure minimizes the number of unknown parameters in the GME system formulated at each step, and hence reduces the variability of the estimates. Analytical and Monte Carlo simulation comparisons with the maximum likelihood estimators and with a one-step GME estimation procedure are presented. The simulation experiments demonstrate that the two-step procedure produces parameter estimates that are more accurate and more efficient than the classical estimation methods. An application of the proposed method is illustrated using a dataset gathered from the Centre for Integrated Government Services in Delma Island, UAE, to predict the association between perceived quality and customer satisfaction.
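For orientation, here is a minimal one-step GME sketch for an ordinary linear model (the paper's two-step measurement-error procedure is not reproduced): each coefficient and each error is written as a probability-weighted sum over fixed support points, and the joint entropy of those probabilities is maximized subject to the data constraints. Support ranges and sample sizes are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, K = 40, 2
X = np.column_stack([np.ones(N), rng.uniform(0, 10, N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, N)

z = np.linspace(-10, 10, 5)        # support points for every coefficient
v = np.linspace(-3, 3, 3)          # support points for every error term
M, J = z.size, v.size

def unpack(theta):
    return theta[:K * M].reshape(K, M), theta[K * M:].reshape(N, J)

def neg_entropy(theta):            # minimize -H(p, w)
    t = np.clip(theta, 1e-12, 1.0)
    return np.sum(t * np.log(t))

def model_gap(theta):              # y = X (p @ z) + w @ v must hold
    p, w = unpack(theta)
    return y - X @ (p @ z) - w @ v

cons = [{"type": "eq", "fun": model_gap},
        {"type": "eq", "fun": lambda th: unpack(th)[0].sum(axis=1) - 1},
        {"type": "eq", "fun": lambda th: unpack(th)[1].sum(axis=1) - 1}]

theta0 = np.concatenate([np.full(K * M, 1 / M), np.full(N * J, 1 / J)])
res = minimize(neg_entropy, theta0, bounds=[(1e-10, 1)] * theta0.size,
               constraints=cons, method="SLSQP")
beta_gme = unpack(res.x)[0] @ z    # GME point estimates of the coefficients
```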

4.
A Review of Stochastic Frontier Analysis for Panel Data
In recent years, stochastic frontier analysis (SFA) for panel data has been used increasingly to measure the efficiency of all kinds of decision-making units, with considerable success; however, the empirical literature, both in China and abroad, tends to rely too heavily on a few models with strict assumptions and to pay too little attention to their limitations. Within a unified econometric framework, this paper systematically reviews the development of panel SFA models. The models are divided into two broad classes, those in which efficiency is time-invariant and those in which it may vary over time, and each class is further split, according to whether a distributional assumption is imposed on the efficiency term, into models with and without such assumptions. After clarifying and comparing the assumptions, estimation procedures, and limitations of the different models, we offer suggestions for future applications of panel SFA models.
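As a concrete anchor for the family of models surveyed, the sketch below fits the simplest member, a pooled normal-half-normal production frontier (the Aigner-Lovell-Schmidt likelihood), by maximum likelihood; the panel models in the review add fixed or time-varying structure to the inefficiency term u. Data are simulated.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 5, n)
u = np.abs(rng.normal(0, 0.3, n))          # one-sided inefficiency, u >= 0
y = 1.0 + 0.8 * x + rng.normal(0, 0.2, n) - u

def nll(theta):
    b0, b1, log_sv, log_su = theta
    sv, su = np.exp(log_sv), np.exp(log_su)
    sig = np.hypot(sv, su)                 # sigma = sqrt(sv^2 + su^2)
    lam = su / sv                          # lambda = su / sv
    eps = y - b0 - b1 * x                  # composed error v - u
    ll = (np.log(2) - np.log(sig) + norm.logpdf(eps / sig)
          + norm.logcdf(-eps * lam / sig))
    return -ll.sum()

res = minimize(nll, x0=[0.5, 0.5, np.log(0.2), np.log(0.2)],
               method="Nelder-Mead")
print(res.x[:2])                           # frontier coefficients
```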

5.
Listed companies often have an incentive to window-dress their financial data to embellish the appearance of their operations, which lowers the predictive accuracy of financial risk early-warning models. Using two data quality assessment methods, Benford's law and the Myer index, this paper constructs Benford and Myer quality factors, introduces them into a BP neural network, and builds a BM-BP neural network financial risk early-warning model. Using data on Chinese A-share listed companies from 2000 to 2019, it then evaluates the effect of the data quality factors on predictive accuracy and analyses the stability of the new model. The empirical results show that the Benford and Myer quality factors improve the predictive accuracy of the BP neural network early-warning model; that, among the factors compared, the BP neural network model with Benford and Myer quality factors built on the selected indicators attains higher predictive accuracy, a lower Type II error rate, and good stability; and that screening the indicators with a decision tree algorithm further improves the predictive accuracy of the new model.
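The paper's exact factor construction is not reproduced here; a hedged stand-in is a simple Benford conformity statistic, the mean absolute deviation of the observed leading-digit distribution from Benford's law, which could serve as one such quality factor.

```python
import numpy as np

def benford_factor(values):
    """Mean absolute deviation of the leading-digit distribution from
    Benford's law (0 = perfect conformity)."""
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    first = (v / 10 ** np.floor(np.log10(v))).astype(int)   # leading digit 1..9
    obs = np.bincount(first, minlength=10)[1:10] / first.size
    exp = np.log10(1 + 1 / np.arange(1, 10))                # Benford frequencies
    return np.abs(obs - exp).mean()

rng = np.random.default_rng(12)
print(benford_factor(rng.lognormal(3.0, 2.0, 5000)))        # near 0: conforming
```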

6.
Knowledge of urban air quality is the first step in tackling air pollution. For the last few decades many cities have been able to rely on a network of monitoring stations recording concentration values for the main pollutants. This paper focuses on functional principal component analysis (FPCA) to investigate multiple-pollutant datasets measured over time at multiple sites within a given urban area. Our purpose is to extend what has been proposed in the literature to data that are both multisite and multivariate. The approach proves effective in highlighting relevant statistical features of the time series, making it possible to identify significant pollutants and to track the evolution of their variability over time. The paper also deals with the missing value issue: very long gap sequences often occur in air quality datasets, due to prolonged instrument failures that are not easily repaired or to data coming from a mobile monitoring station. In the considered dataset, large and continuous gaps are imputed by an empirical orthogonal function procedure, after denoising the raw data by functional data analysis and before performing FPCA, in order to further improve the reconstruction.
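A minimal sketch of the empirical orthogonal function (EOF) gap-filling step: initialize gaps with column means, then alternate a rank-k SVD reconstruction with replacement of the missing entries. The rank, iteration count, and data below are illustrative.

```python
import numpy as np

def eof_impute(Y, rank=2, n_iter=50):
    """Fill gaps in a sites x time matrix by iterative EOF reconstruction."""
    Y = Y.astype(float).copy()
    miss = np.isnan(Y)
    Y[miss] = np.take(np.nanmean(Y, axis=0), np.where(miss)[1])  # warm start
    for _ in range(n_iter):
        mu = Y.mean(axis=0)
        U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
        recon = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Y[miss] = recon[miss]                 # update only the missing cells
    return Y

rng = np.random.default_rng(11)
Y = np.sin(np.linspace(0, 6, 50)) + 0.1 * rng.standard_normal((8, 50))
Y[2, 10:30] = np.nan                          # one long gap at a single site
Y_full = eof_impute(Y, rank=1)
# FPCA step: the leading right singular vectors of the completed, centred
# matrix act as discretized principal component functions.
```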

7.
The combined model accounts for different forms of extra-variability and has traditionally been applied in the likelihood framework, or in the Bayesian setting via Markov chain Monte Carlo. In this article, integrated nested Laplace approximation is investigated as an alternative estimation method for the combined model for count data and compared with these established techniques. Longitudinal, spatial, and multi-hierarchical data scenarios are investigated in three case studies as well as a simulation study. In conclusion, integrated nested Laplace approximation provides fast and precise estimation while avoiding the convergence problems often seen when using Markov chain Monte Carlo.
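INLA itself lives in the R-INLA package; as a hedged Python illustration of the model class, note that a conjugate gamma random effect at the observation level of a Poisson model marginalizes to a negative binomial, which can be fitted directly. The hierarchical normal random effects, where INLA or MCMC become necessary, are omitted.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
x = rng.standard_normal(n)
frailty = rng.gamma(shape=2.0, scale=0.5, size=n)   # observation-level overdispersion
y = rng.poisson(frailty * np.exp(0.3 + 0.7 * x))

X = sm.add_constant(x)
fit = sm.NegativeBinomial(y, X).fit(disp=0)         # marginal NB fit
print(fit.params)                                   # const, slope, alpha
```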

8.
In this paper, we consider the problem of estimating a single changepoint in a parameter-driven model. The model, an extension of the Poisson regression model, accounts for serial correlation through a latent process incorporated in its mean function. Emphasis is placed on characterizing the changepoint as a change in the parameters of the model. The model is fully implemented within the Bayesian framework. We develop an RJMCMC algorithm for parameter estimation and model determination. The algorithm embeds well-devised Metropolis-Hastings procedures for estimating the changepoint and, through data augmentation, the missing values of the latent process. The methodology is illustrated using data on monthly counts of claimants collecting wage loss benefit for injuries in the workplace and an analysis of presidential uses of force in the USA.
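As a much-simplified, hedged miniature of the changepoint component (latent process and RJMCMC omitted), the exact posterior over a single changepoint in independent Poisson counts with conjugate Gamma(a, b) priors on the two rates can be enumerated directly:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(y, a=1.0, b=1.0):
    """Log marginal likelihood of a Poisson segment under a Gamma(a, b) prior."""
    m, S = y.size, y.sum()
    return (a * np.log(b) - gammaln(a) + gammaln(a + S)
            - (a + S) * np.log(b + m) - gammaln(y + 1).sum())

rng = np.random.default_rng(4)
y = np.concatenate([rng.poisson(2.0, 60), rng.poisson(5.0, 40)])
n = y.size
logpost = np.array([log_marginal(y[:t]) + log_marginal(y[t:])
                    for t in range(1, n)])          # flat prior over locations
post = np.exp(logpost - logpost.max())
post /= post.sum()
print("MAP changepoint:", np.argmax(post) + 1)      # simulated truth is 60
```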

9.
An extension of the generalized linear mixed model was constructed to simultaneously accommodate overdispersion and hierarchies present in longitudinal or clustered data. This so-called combined model includes conjugate random effects at the observation level to capture overdispersion and normal random effects at the subject level to handle correlation. A variety of data types can be handled in this way, using different members of the exponential family. Both maximum likelihood and Bayesian estimation of covariate effects and variance components have been proposed. The focus of this paper is the development of an estimation procedure for the two sets of random effects, which are needed when predicting future responses or their associated probabilities. Such (empirical) Bayes estimates are also helpful in model diagnosis, both when checking the fit of the model and when investigating outlying observations. The proposed procedure is applied to three datasets with different outcome types.
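For the count case, the observation-level piece has a closed form worth recording: assuming u ~ Gamma(1/alpha, rate 1/alpha) and y | u ~ Poisson(u * mu), conjugacy gives u | y ~ Gamma(1/alpha + y, rate 1/alpha + mu), so the empirical Bayes estimate is the posterior mean. A two-line sketch:

```python
import numpy as np

def eb_frailty(y, mu, alpha):
    """Posterior mean of the gamma observation-level random effect."""
    return (1.0 / alpha + y) / (1.0 / alpha + mu)

print(eb_frailty(np.array([0, 3, 10]), mu=2.5, alpha=0.4))
```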

10.
In this paper we investigate the impact of model mis-specification, in terms of the dependence structure in the extremes of a spatial process, on the estimation of key quantities that are of interest to hydrologists and engineers. For example, severe flooding often occurs when rainfall extremes are observed at several locations in a region simultaneously, so practitioners may be interested in estimates of the joint probability of exceeding some high levels across these locations. Spatial dependence is likely to be present between the extremes and should be properly accounted for when estimating such probabilities. We compare the use of standard models from the geostatistics literature with max-stable models from extreme value theory. We find that, in some situations, using an incorrect spatial model for the extremes results in a significant under-estimation of these probabilities, which, in flood defence terms, could lead to substantial under-protection.
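A hedged numerical illustration of the core point: at a high marginal level, the probability that two sites exceed simultaneously differs sharply between an asymptotically independent Gaussian copula and an extreme-value (Gumbel/logistic) copula. On the diagonal the Gumbel copula reduces to C(u, u) = u^(2^(1/theta)); parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

u = 0.999                        # high marginal quantile
rho, theta = 0.7, 2.0            # Gaussian correlation; Gumbel dependence

z = norm.ppf(u)
C_gauss = multivariate_normal(mean=[0, 0],
                              cov=[[1, rho], [rho, 1]]).cdf([z, z])
p_gauss = 1 - 2 * u + C_gauss                  # P(both sites exceed u)

C_gumbel = u ** (2 ** (1 / theta))             # Gumbel copula on the diagonal
p_gumbel = 1 - 2 * u + C_gumbel

print(p_gauss, p_gumbel)   # the Gaussian model typically gives the smaller value
```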

11.
Abrupt changes often occur in environmental and financial time series, most often due to human intervention. Changepoint analysis is a statistical tool for analysing such sudden changes in observations along a time series. In this paper, we propose a Bayesian model for extreme values in environmental and economic datasets that exhibit typical changepoint behaviour, addressing the situation in which more than one changepoint can occur. Since maxima are analysed, the distribution within each regime is a generalized extreme value distribution, and the changepoints are treated as unknown parameters to be estimated. Simulations of extremes with two changepoints show that the proposed algorithm recovers the true parameter values and detects the true changepoints under different configurations; the number of changepoints is itself unknown, and the Bayesian estimation identifies it correctly in each application. Analyses of environmental and financial data show the importance of accounting for changepoints: the change of regime brought about an increase in the return levels, increasing the number of floods in cities along the rivers, and the stock market series required a model with three distinct regimes.
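A hedged, likelihood-based miniature of the idea (the paper is Bayesian and handles several changepoints): fit a GEV to each side of every candidate split and take the split maximizing the combined log-likelihood. Data are simulated with one changepoint at 60.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(5)
x = np.concatenate([
    genextreme.rvs(-0.1, loc=10, scale=2, size=60, random_state=rng),
    genextreme.rvs(-0.1, loc=16, scale=3, size=40, random_state=rng)])

def gev_ll(seg):
    c, loc, scale = genextreme.fit(seg)        # per-regime GEV fit
    return genextreme.logpdf(seg, c, loc, scale).sum()

cands = range(20, len(x) - 20)                 # keep both regimes estimable
ll = [gev_ll(x[:t]) + gev_ll(x[t:]) for t in cands]
tau = list(cands)[int(np.argmax(ll))]
print("estimated changepoint:", tau)           # simulated truth is 60
```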

12.
Partial linear varying coefficient models (PLVCM) are often used to analyse longitudinal data because they strike a good balance between flexibility and parsimony. Existing estimation and variable selection methods for this model are mainly built on the assumption that it is known in advance which variables have linear effects and which have varying effects on the response, that is, that the model structure is already determined. In applications, however, this assumption is unrealistic. In this work, we propose a simultaneous structure estimation and variable selection method that performs coefficient estimation together with three types of selection: selection of varying effects, selection of constant effects, and selection of relevant variables. It can be implemented in one step via a penalized M-type regression, which uses a general loss function to treat mean, median, quantile, and robust mean regressions in a unified framework. Consistency of the three types of selection and the oracle property of the estimation are established. Simulation studies and a real data analysis confirm the merits of our method.

13.
The traditional mixture model assumes that a dataset is composed of several populations of Gaussian distributions. In real life, however, data often do not fit the restrictions of normality very well. It is likely that data from a single population exhibiting either asymmetrical or heavy-tail behavior could be erroneously modeled as two populations, resulting in suboptimal decisions. To avoid these pitfalls, we generalize the mixture model using adaptive kernel density estimators. Because kernel density estimators enforce no functional form, we can adapt to non-normal asymmetric, kurtotic, and tail characteristics in each population independently. This, in effect, robustifies mixture modeling. We adapt two computational algorithms, genetic algorithm with regularized Mahalanobis distance and genetic expectation maximization algorithm, to optimize the kernel mixture model (KMM) and use results from robust estimation theory in order to data-adaptively regularize both. Finally, we likewise extend the information criterion ICOMP to score the KMM. We use these tools to simultaneously select the best mixture model and classify all observations without making any subjective decisions. The performance of the KMM is demonstrated on two medical datasets; in both cases, we recover the clinically determined group structure and substantially improve patient classification rates over the Gaussian mixture model.
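A hedged sketch of the kernel-mixture idea, without the paper's genetic optimization, robust regularization, or ICOMP scoring: an EM-style loop that alternates posterior responsibilities with weighted kernel density estimates per component (scipy's gaussian_kde accepts observation weights).

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
x = np.concatenate([rng.gamma(2, 1.5, 300),     # skewed component
                    rng.normal(12, 1, 200)])    # Gaussian component

K, n = 2, x.size
resp = rng.dirichlet(np.ones(K), size=n)        # random soft start
for _ in range(25):
    pi = resp.mean(axis=0)                      # mixing proportions
    dens = np.empty((K, n))
    for k in range(K):
        kde = gaussian_kde(x, weights=resp[:, k])   # weighted KDE per component
        dens[k] = pi[k] * kde(x)
    resp = (dens / dens.sum(axis=0)).T          # E-step: responsibilities

labels = resp.argmax(axis=1)                    # hard classification
```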

14.
An extended Gaussian max-stable process model for spatial extremes
The extremes of environmental processes are often of interest due to the damage that can be caused by extreme levels of the processes. These processes are often spatial in nature and modelling the extremes jointly at many locations can be important. In this paper, an extension of the Gaussian max-stable process is developed, enabling data from a number of locations to be modelled under a more flexible framework than in previous applications. The model is applied to annual maximum rainfall data from five sites in South-West England. For estimation we employ a pairwise likelihood within a Bayesian analysis, incorporating informative prior information.

15.
In the structural equation model (SEM) literature, most research focuses on the structural model rather than on the measurement model. This raises several problems: construct misspecification, identification, and validation. Starting from the most recent articles on these issues, we develop, and formalize in two tables, a general framework that can help researchers select and assess both formative and reflective measurement models, with special attention to the statistical implications. To illustrate this general framework, we present a survey on customer behaviour regarding socially responsible food consumption, carried out by administering a questionnaire to a representative sample of 332 families. To detect the main aspects affecting consumers' preferences, a factor analysis was performed; the general framework was then used to select and assess the measurement models in the SEM. The SEM was estimated by partial least squares, and the significance of the indicators was tested using the bootstrap. As far as we know, this is the first time that a model for the analysis of consumer behaviour regarding social responsibility has been formalized through an SEM.
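A minimal sketch of the bootstrap significance check mentioned for the indicators, using the survey's sample size of 332 but simulated data: resample respondents with replacement, recompute a loading proxy (here, the correlation between an indicator and a composite score), and read off the percentile interval.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 332                                         # sample size from the survey
indicator = rng.normal(size=n)
composite = 0.6 * indicator + rng.normal(scale=0.8, size=n)

boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, n)                 # resample respondents
    boot[b] = np.corrcoef(indicator[idx], composite[idx])[0, 1]

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI: ({lo:.3f}, {hi:.3f})")   # excludes 0 -> significant
```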

16.
Model-informed drug discovery and development offers the promise of more efficient clinical development, with increased productivity and reduced cost through scientific decision making and risk management. Go/no-go development decisions in the pharmaceutical industry are often driven by effect size estimates, with the goal of meeting commercially generated target profiles. Sufficient efficacy is critical for eventual success, but the decision to advance development phase is also dependent on adequate knowledge of appropriate dose and dose-response. Doses which are too high or low pose risk of clinical or commercial failure. This paper addresses this issue and continues the evolution of formal decision frameworks in drug development. Here, we consider the integration of both efficacy and dose-response estimation accuracy into the go/no-go decision process, using a model-based approach. Using prespecified target and lower reference values associated with both efficacy and dose accuracy, we build a decision framework to more completely characterize development risk. Given the limited knowledge of dose response in early development, our approach incorporates a set of dose-response models and uses model averaging. The approach and its operating characteristics are illustrated through simulation. Finally, we demonstrate the decision approach on a post hoc analysis of the phase 2 data for naloxegol (a drug approved for opioid-induced constipation).
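A hedged sketch of the dose-response model-averaging ingredient: fit a small candidate set (a linear and an Emax model, standard choices in this setting), weight the fits by AIC, and average the predicted effect at a dose of interest. The doses and responses below are illustrative, not the naloxegol data.

```python
import numpy as np
from scipy.optimize import curve_fit

dose = np.array([0., 5., 10., 20., 40.])
eff = np.array([0.1, 0.9, 1.4, 1.8, 2.0])      # illustrative mean responses

models = {
    "linear": lambda d, e0, s: e0 + s * d,
    "emax":   lambda d, e0, emax, ed50: e0 + emax * d / (ed50 + d),
}
p0 = {"linear": [0, 0.05], "emax": [0, 2, 10]}

aic, pred = {}, {}
for name, f in models.items():
    popt, _ = curve_fit(f, dose, eff, p0=p0[name], maxfev=10000)
    rss = np.sum((eff - f(dose, *popt)) ** 2)
    k = len(popt) + 1                          # + residual variance
    aic[name] = dose.size * np.log(rss / dose.size) + 2 * k
    pred[name] = float(f(30.0, *popt))         # predicted effect at dose 30

w = np.exp(-0.5 * (np.array(list(aic.values())) - min(aic.values())))
w /= w.sum()                                   # AIC model weights
avg_effect = float(np.dot(w, list(pred.values())))
```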

17.
Discrete choice models describe the choices made by decision makers among alternatives and play an important role in transportation planning, marketing research and other applications. The mixed multinomial logit (MMNL) model is a popular discrete choice model that captures heterogeneity in the preferences of decision makers through random coefficients. While Markov chain Monte Carlo methods provide the Bayesian analogue to classical procedures for estimating MMNL models, computations can be prohibitively expensive for large datasets. Approximate inference can be obtained using variational methods at a lower computational cost with competitive accuracy. In this paper, we develop variational methods for estimating MMNL models that allow random coefficients to be correlated in the posterior and can be extended easily to large-scale datasets. We explore three alternatives: (1) Laplace variational inference, (2) nonconjugate variational message passing and (3) stochastic linear regression. Their performances are compared using real and simulated data. To accelerate convergence for large datasets, we develop stochastic variational inference for MMNL models using each of the above alternatives. Stochastic variational inference allows data to be processed in minibatches by optimizing global variational parameters using stochastic gradient approximation. A novel strategy for increasing minibatch sizes adaptively within stochastic variational inference is proposed.
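A hedged toy of the stochastic-variational-inference update pattern, minibatch gradients with Robbins-Monro step sizes rho_t = (t + tau)^(-kappa), applied to a trivial objective rather than an MMNL ELBO; the paper's adaptive minibatch-size strategy is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(8)
data = rng.normal(loc=3.0, scale=1.0, size=100_000)
lam = 0.0                                      # global variational parameter
tau, kappa, B = 1.0, 0.7, 100                  # delay, forgetting rate, batch size

for t in range(1, 2001):
    batch = rng.choice(data, size=B, replace=False)
    noisy_grad = (batch - lam).mean()          # unbiased minibatch gradient
    rho = (t + tau) ** (-kappa)                # Robbins-Monro step size
    lam += rho * noisy_grad

print(lam)                                     # approaches the data mean, 3.0
```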

18.
Prediction in linear mixed models
Following estimation of effects from a linear mixed model, it is often useful to form predicted values for certain factor/variate combinations. The process has been well defined for linear models, but the introduction of random effects into the model means that a decision has to be made about the inclusion or exclusion of random model terms from the predictions. This paper discusses the interpretation of predictions formed including or excluding random terms. Four datasets are used to illustrate circumstances where different prediction strategies may be appropriate: in an orthogonal design, an unbalanced nested structure, a model with cubic smoothing spline terms and for kriging after spatial analysis. The examples also show the need for different weighting schemes that recognize nesting and aliasing during prediction, and the necessity of being able to detect inestimable predictions.
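A hedged sketch of the include-or-exclude choice using statsmodels' MixedLM (not the paper's software): predict() returns population-level, fixed-effects-only predictions, and adding the estimated group intercept from random_effects gives the group-specific prediction. Data are simulated; "Group" is statsmodels' label for a random intercept.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
g = np.repeat(np.arange(10), 20)              # 10 groups, 20 obs each
u = rng.normal(0, 1.0, 10)[g]                 # true random intercepts
x = rng.standard_normal(g.size)
df = pd.DataFrame({"y": 1 + 2 * x + u + rng.normal(0, 0.5, g.size),
                   "x": x, "g": g})

res = smf.mixedlm("y ~ x", df, groups="g").fit()
new = pd.DataFrame({"x": [0.5], "g": [3]})
pop_pred = res.predict(new)[0]                       # random terms excluded
grp_pred = pop_pred + res.random_effects[3]["Group"] # random term included
```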

19.
陈骥, 王炳兴 《统计研究》2012, 29(7): 91-95
To remedy the lack of representativeness that arises when interval data are reduced to point values, this paper proposes a point-valuation method based on the normal distribution and applies it to interval principal component evaluation. Compared with interval principal components based on midpoint valuation, three main conclusions emerge: first, the normal-distribution-based method steers each unit's point value toward the indicator mean rather than the midpoint of its interval; second, it increases the information content of the data; third, the resulting interval principal component evaluation achieves better dimension reduction and yields factors that are easier to interpret and name. The application results show that point-valuation of interval data under a normality assumption performs well, and the method can be extended to evaluation and decision problems based on interval numbers.
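The paper's construction is not reproduced exactly; one hedged reading is to treat the latent value in each interval as a normal distribution fitted at the indicator level and truncated to that interval, taking its mean as the point value, which pulls each unit toward the indicator mean rather than its own midpoint:

```python
import numpy as np
from scipy.stats import truncnorm

l = np.array([2.0, 4.0, 7.0])                  # interval lower bounds
u = np.array([4.0, 8.0, 9.0])                  # interval upper bounds
mid = (l + u) / 2
mu, sigma = mid.mean(), mid.std(ddof=1)        # indicator-level normal fit

a, b = (l - mu) / sigma, (u - mu) / sigma      # standardized truncation bounds
point = truncnorm.mean(a, b, loc=mu, scale=sigma)
print(point)                                   # lies between midpoint and mu
```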

20.
Dynamic stochastic general equilibrium (DSGE) models contain variables that cannot be observed directly, and their cross-equation restrictions involve complex nonlinear relationships that make analytical estimation of the equations infeasible. This paper identifies a DSGE model within the Bayesian framework: measurement and state-transition equations are set up using the state-space approach, an auxiliary particle filter is used to approximate the conditional posterior distribution, and Bayesian error bands are constructed to describe the dynamics of the impulse response functions of the macroeconomic variables. Analysis of real data confirms the effectiveness of the Bayesian identification method.
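As a hedged illustration of the filtering building block, here is a plain bootstrap particle filter on a toy linear Gaussian state-space model; the paper uses the auxiliary variant inside a Bayesian DSGE estimation, which is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(10)
T, N = 100, 1000
phi, q, r = 0.9, 0.5, 0.5                      # state AR coef, state sd, obs sd

s = np.zeros(T); y = np.zeros(T)               # simulate state and observations
for t in range(1, T):
    s[t] = phi * s[t - 1] + rng.normal(0, q)
    y[t] = s[t] + rng.normal(0, r)

particles = rng.normal(0, 1, N)
filt_mean = np.zeros(T)
for t in range(1, T):
    particles = phi * particles + rng.normal(0, q, N)   # propagate
    logw = -0.5 * ((y[t] - particles) / r) ** 2         # weight by likelihood
    w = np.exp(logw - logw.max()); w /= w.sum()
    filt_mean[t] = np.dot(w, particles)                 # filtered state mean
    particles = rng.choice(particles, size=N, p=w)      # resample
```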
