Similar Literature
20 similar articles found
1.
In recent years, a variety of regression models, including zero-inflated and hurdle versions, have been proposed to relate a dependent variable to exogenous covariates. Apart from the classical Poisson, negative binomial and generalised Poisson distributions, many proposals have appeared in the statistical literature, perhaps in response to the possibilities offered by advanced software that now enables researchers to implement numerous special functions in a relatively simple way. However, we believe that a significant research gap remains, since very little attention has been paid to the quasi-binomial distribution, which was first proposed over fifty years ago. This distribution may constitute a valid alternative to existing regression models in situations in which the variable has bounded support. Therefore, in this paper we present a zero-inflated regression model based on the quasi-binomial distribution, derive its moments and maximum likelihood estimators, and perform score tests to compare the zero-inflated quasi-binomial distribution with the zero-inflated binomial distribution, and the zero-inflated model with the homogeneous model (the model in which covariates are not considered). The analysis is illustrated with two data sets that are well known in the statistical literature and contain a large number of zeros.
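As a rough illustration of the model family described above, here is a minimal sketch of a zero-inflated quasi-binomial probability mass function. It uses Consul's quasi-binomial I parameterization with an assumed extra-zero weight `pi0`; the function and parameter names are illustrative, not the authors'.

```python
from math import comb

def qbd_pmf(x, n, p, phi):
    """Consul's quasi-binomial I pmf; phi = 0 recovers the ordinary binomial."""
    return comb(n, x) * p * (p + x * phi) ** (x - 1) * (1 - p - x * phi) ** (n - x)

def zi_qbd_pmf(x, n, p, phi, pi0):
    """Zero-inflated version: extra probability mass pi0 placed at zero."""
    return (pi0 if x == 0 else 0.0) + (1.0 - pi0) * qbd_pmf(x, n, p, phi)
```

Setting phi = 0 in the base pmf gives the zero-inflated binomial, which is the null model of the score test mentioned above.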

2.
ABSTRACT

In economics and government statistics, aggregated data rather than individual-level data are usually reported, both for confidentiality and for simplicity. In this paper we develop a method for flexibly estimating the probability density function of a population from aggregated data given as group averages, where the individual-level data are grouped according to quantile limits. The kernel density estimator has commonly been applied to such data without taking the aggregation process into account and has been shown to perform poorly. Our method models the quantile function as the integral of the exponential of a spline function and deduces the density function from the quantile function. We match the aggregated data to their theoretical counterparts using least squares, and regularize the estimation by penalizing the squared second derivative of the density function. A computational algorithm is developed to implement the method. Applications to simulated data and US household income survey data show that our penalized spline estimator can accurately recover the density function of the underlying population, while the common kernel density estimator is severely biased. The method is applied to study the dynamics of China's urban income distribution using published interval-aggregated data for 1985–2010.
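The core of the estimator is the forward map from spline coefficients to the theoretical group averages that are matched to the published aggregates. A minimal sketch of that map, with a plain polynomial standing in for the spline and simple trapezoidal integration (all names and the polynomial substitution are illustrative assumptions):

```python
from math import exp

def _trapz(ys, xs):
    """Plain trapezoidal rule over paired samples."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def _polyval(coef, t):
    """Horner evaluation of a polynomial standing in for the spline s(t)."""
    v = 0.0
    for c in coef:
        v = v * t + c
    return v

def quantile_fn(u, coef, grid_n=400):
    """Q(u) = integral_0^u exp(s(t)) dt: a monotone quantile function."""
    ts = [u * i / (grid_n - 1) for i in range(grid_n)]
    ys = [exp(_polyval(coef, t)) for t in ts]
    return _trapz(ys, ts)

def group_mean(u_lo, u_hi, coef, grid_n=200):
    """Theoretical mean of X for the group between quantile levels u_lo and u_hi."""
    us = [u_lo + (u_hi - u_lo) * i / (grid_n - 1) for i in range(grid_n)]
    qs = [quantile_fn(u, coef) for u in us]
    return _trapz(qs, us) / (u_hi - u_lo)
```

The density at x = Q(u) is then exp(-s(u)), since f(Q(u)) = 1/Q'(u); the estimation step would choose the coefficients so that these theoretical group means match the published averages in a least-squares sense, subject to the roughness penalty.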

3.
In many medical studies, patients are followed longitudinally and interest lies in assessing the relationship between longitudinal measurements and time to an event. Recently, various authors have proposed joint modeling approaches for longitudinal and time-to-event data for a single longitudinal variable. These joint modeling approaches become intractable with even a few longitudinal variables. In this paper we propose a regression calibration approach for jointly modeling multiple longitudinal measurements and discrete time-to-event data. Ideally, a two-stage modeling approach could be applied in which the multiple longitudinal measurements are modeled in the first stage and the longitudinal model is related to the time-to-event data in the second stage. Biased parameter estimation due to informative dropout makes this direct two-stage modeling approach problematic. We propose a regression calibration approach which appropriately accounts for informative dropout. We approximate the conditional distribution of the multiple longitudinal measurements given the event time by modeling all pairwise combinations of the longitudinal measurements using a bivariate linear mixed model which conditions on the event time. Complete data are then simulated based on estimates from these pairwise conditional models, and regression calibration is used to estimate the relationship between longitudinal data and time-to-event data using the complete data. We show that this approach performs well in estimating the relationship between multivariate longitudinal measurements and the time-to-event data and in estimating the parameters of the multiple longitudinal process subject to informative dropout. We illustrate this methodology with simulations and with an analysis of primary biliary cirrhosis (PBC) data.

4.
A bivariate generalisation of Consul's (1974) quasi-binomial distribution (QBD) is obtained with the help of an urn model for explaining data arising from four-fold sampling. The distribution is expected to cover a very wide range of situations in four-fold sampling. The first- and second-order moments of the distribution are obtained. As an illustration, the distribution is fitted to an observed data set, and its limiting form is also derived.

5.
Stochastic models are of fundamental importance in many scientific and engineering applications. For example, stochastic models provide valuable insights into the causes and consequences of intra-cellular fluctuations and inter-cellular heterogeneity in molecular biology. The chemical master equation can be used to model intra-cellular stochasticity in living cells, but analytical solutions are rare and numerical simulations are computationally expensive. Inference of system trajectories and estimation of model parameters from observed data are important tasks and are even more challenging. Here, we consider the case where the observed data are aggregated over time. Aggregation of data over time is required in studies of single cell gene expression using a luciferase reporter, where the emitted light can be very faint and is therefore collected for several minutes for each observation. We show how an existing approach to inference based on the linear noise approximation (LNA) can be generalised to the case of temporally aggregated data. We provide a Kalman filter (KF) algorithm which can be combined with the LNA to carry out inference of system variable trajectories and estimation of model parameters. We apply and evaluate our method on both synthetic and real data scenarios and show that it is able to accurately infer the posterior distribution of model parameters in these examples. We demonstrate how applying standard KF inference to aggregated data without accounting for aggregation will tend to underestimate the process noise and can lead to biased parameter estimates.
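One standard way to handle temporal aggregation in a Kalman filter is to augment the state with a running sum that is reset after each aggregated observation. The toy sketch below does this for a scalar AR(1) latent process observed as a window average; the model, names, and noise levels are all invented for illustration and are much simpler than the authors' LNA construction.

```python
import numpy as np

def aggregated_kf(y, a, q, r, m):
    """Kalman filter for x_t = a*x_{t-1} + w_t, observed as window averages.

    y : observations y_k = mean(x over m steps) + v_k, one every m steps
    a, q, r : AR coefficient, process variance, observation variance
    m : aggregation window length
    Returns the filtered mean of x at each observation time.
    """
    F = np.array([[a, 0.0], [a, 1.0]])           # state (x_t, r_t): r_t = r_{t-1} + x_t
    Q = q * np.array([[1.0, 1.0], [1.0, 1.0]])   # w_t enters both x_t and the sum r_t
    H = np.array([[0.0, 1.0 / m]])               # observation reads the accumulated sum
    reset = np.array([[1.0, 0.0], [0.0, 0.0]])   # zeroes the accumulator
    s, P, out = np.zeros(2), np.eye(2), []
    for yk in y:
        for _ in range(m):                       # m latent steps per observation
            s = F @ s
            P = F @ P @ F.T + Q
        innov = yk - (H @ s)[0]                  # update on the window average
        S = (H @ P @ H.T)[0, 0] + r
        K = (P @ H.T)[:, 0] / S
        s = s + K * innov
        P = P - np.outer(K, H @ P)
        out.append(s[0])
        s = reset @ s                            # start the next window afresh
        P = reset @ P @ reset.T
    return out
```

Ignoring the accumulator and treating each average as a point observation of x_t is the naive alternative that, as the abstract notes, tends to underestimate the process noise.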

6.
Cohen's kappa is probably the most widely used measure of agreement. Interest usually centres on measuring the degree of agreement or disagreement between two raters in square contingency tables. Modelling the agreement provides more information on its pattern than summarizing it with the kappa coefficient alone. Moreover, the disagreement models proposed in the literature are designed for nominal scales. In this paper, a symmetric disagreement plus uniform association model is proposed for ordinal-scale agreement data: it combines a disagreement model with a uniform association model and aims to separate the association from the disagreement. The proposed model is applied to real uterine cancer data.
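For reference, the kappa coefficient that the proposed model goes beyond is computed from a square contingency table as follows (this is the standard textbook formula, not the paper's model):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square contingency table of counts for two raters."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_obs = np.trace(t) / n                               # observed agreement
    p_chance = (t.sum(axis=1) @ t.sum(axis=0)) / n ** 2   # agreement expected by chance
    return (p_obs - p_chance) / (1.0 - p_chance)
```

A table with all mass on the diagonal gives kappa = 1, while independent raters give kappa near 0.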

7.
Ren Yanyan et al. Statistical Research (统计研究), 2021, 38(11): 141-149
Panel data are formed by pooling time series from different individuals. A large body of research has shown that the individuals in panel data often fall into latent groups, and that heteroscedasticity is pervasive in such models. Drawing on work on group heterogeneity, this paper constructs a panel data model with a group structure in the error terms. Under the model assumptions, a penalized quasi-maximum likelihood estimation (PQMLE) method is proposed that performs structure identification and parameter estimation simultaneously, and the estimator is shown to possess the oracle asymptotic property. Monte Carlo simulations verify the good finite-sample properties of the method. The method is further applied to an empirical Fama-French three-factor analysis of China's stock market, confirming the practical value of the theoretical model.

8.
A random field displays long (resp. short) memory when its covariance function is absolutely non-summable (resp. summable), or alternatively when its spectral density (spectrum) is unbounded (resp. bounded) at some frequencies. Drawing on the spectrum approach, this paper characterizes both short and long memory features in the spatial autoregressive model. The data generating process is presented as a sequence of spatial autoregressive micro-relationships. The study establishes the exact conditions under which short and long memory emerge, both for the micro-relationships and for the aggregated field. To study the spectrum of the aggregated field, we develop a new general concept referred to as the ‘root order of a function’. This concept might be usefully applied in studying the convergence of some special integrals. We illustrate our findings with simulation experiments and an empirical application based on Gross Domestic Product data for 100 countries spanning 1960–2004.

9.
A multivariate non-linear time series model for road safety data is presented. The model is applied in a case-study into the development of a yearly time series of numbers of fatal accidents (inside and outside urban areas) and numbers of kilometres driven by motor vehicles in the Netherlands between 1961 and 2000. The model accounts for missing entries in the disaggregated numbers of kilometres driven although the aggregated numbers are observed throughout. We consider a multivariate non-linear time series model for the analysis of these data. The model consists of dynamic unobserved factors for exposure and risk that are related in a non-linear way to the number of fatal accidents. The multivariate dimension of the model is due to its inclusion of multiple time series for inside and outside urban areas. Approximate maximum likelihood methods based on the extended Kalman filter are utilized for the estimation of unknown parameters. The latent factors are estimated by extended smoothing methods. It is concluded that the salient features of the observed time series are captured by the model in a satisfactory way.

10.
To be useful to clinicians, prognostic and diagnostic indices must be derived from accurate models developed by using appropriate data sets. We show that fractional polynomials, which extend ordinary polynomials by including non-positive and fractional powers, may be used as the basis of such models. We describe how to fit fractional polynomials in several continuous covariates simultaneously, and we propose ways of ensuring that the resulting models are parsimonious and consistent with basic medical knowledge. The methods are applied to two breast cancer data sets, one from a prognostic factors study in patients with positive lymph nodes and the other from a study to diagnose malignant or benign tumours by using colour Doppler blood flow mapping. We investigate the problems of biased parameter estimates in the final model and overfitting using cross-validation calibration to estimate shrinkage factors. We adopt bootstrap resampling to assess model stability. We compare our new approach with conventional modelling methods which apply stepwise variable selection to categorized covariates. We conclude that fractional polynomial methodology can be very successful in generating simple and appropriate models.
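A first-degree fractional polynomial (FP1) simply searches a small fixed set of powers for the single transformation of a covariate that fits best. A minimal sketch of that search by least squares (the power set is the conventional one; the function names are illustrative):

```python
import numpy as np

POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]  # conventional FP1 power set; 0 means log

def fp_transform(x, p):
    """Apply a single fractional-polynomial power to a positive covariate."""
    return np.log(x) if p == 0 else x ** p

def best_fp1(x, y):
    """Pick the FP1 power with the smallest residual sum of squares."""
    best = None
    for p in POWERS:
        X = np.column_stack([np.ones_like(x), fp_transform(x, p)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(((y - X @ beta) ** 2).sum())
        if best is None or rss < best[0]:
            best = (rss, p, beta)
    return best[1], best[2]
```

A full FP2 search would also consider pairs of powers, with a repeated power entering as x^p and x^p log x.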

11.
Bayesian calibration of computer models
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
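A drastically simplified illustration of the calibration idea: a toy simulator, field data generated with a small systematic discrepancy, and a grid posterior over the calibration parameter. Everything here (the simulator, the discrepancy, the noise level) is invented for illustration; the paper's method places Gaussian-process priors on the simulator and discrepancy rather than using a grid.

```python
import numpy as np

def grid_posterior(x, z, simulator, thetas, noise_sd):
    """Normalized posterior over a grid of calibration values (flat prior)."""
    loglik = np.array([-0.5 * ((z - simulator(x, th)) ** 2).sum() / noise_sd ** 2
                       for th in thetas])
    post = np.exp(loglik - loglik.max())
    return post / post.sum()

rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 40)
true_theta = 2.0
# field data = simulator at the true value + small model discrepancy + noise
z = true_theta * x + 0.1 * np.sin(x) + rng.normal(0.0, 0.2, x.size)

thetas = np.linspace(0.0, 4.0, 201)
post = grid_posterior(x, z, lambda x, th: th * x, thetas, 0.2)
theta_hat = thetas[post.argmax()]
```

Note that even this toy example shows the effect the paper corrects for: the unmodelled discrepancy 0.1 sin(x) pulls the calibrated parameter slightly away from its true value.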

12.
We develop a Bayesian approach for parsimoniously estimating the correlation structure of the errors in a multivariate stochastic volatility model. Since the number of parameters in the joint correlation matrix of the return and volatility errors is potentially very large, we impose a prior that allows the off-diagonal elements of the inverse of the correlation matrix to be identically zero. The model is estimated using a Markov chain simulation method that samples from the posterior distribution of the volatilities and parameters. We illustrate the approach using both simulated and real examples. In the real examples, the method is applied to equities at three levels of aggregation: returns for firms within the same industry, returns for different industries, and returns aggregated at the index level. We find pronounced correlation effects only at the highest level of aggregation.

13.
For aggregated time series, unit root tests are routinely applied to choose between trend-stationary and difference-stationary models. Recent work demonstrates that such tests can also be applied to panel data. However, it is well known that disaggregated data often exhibit a considerable amount of heterogeneity, so that standard tests may perform poorly. To account for the heterogeneity in the data, we allow for individual-specific deterministics; that is, we let the time trends vary across the cross-section units. It is shown that standard GMM estimators suggested for the dynamic panel data model may fail to give a valid test procedure. To overcome this difficulty, a modified GMM estimator is suggested. In a Monte Carlo study, the finite-sample properties of the alternative tests are compared.

15.
This paper describes small area estimation (SAE) of proportions under a spatially dependent generalized linear mixed model using aggregate-level data. The SAE approach is applied to produce reliable district-level estimates and maps of the incidence of indebtedness in the State of Uttar Pradesh in India, using debt and investment survey data collected by the National Sample Survey Office (NSSO) and secondary data from the Census. The results show a significant improvement in the precision of the model-based estimates generated by SAE compared with the direct estimates. The estimates generated by incorporating spatial information are more efficient than those generated by ignoring it.
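The paper's model is a spatial GLMM, but the general small area idea can be illustrated with a much simpler area-level shrinkage (Fay–Herriot-type) composition of direct and synthetic estimates; the formula below assumes a known model variance and is purely illustrative of the precision gain, not the paper's estimator.

```python
def shrinkage_estimates(direct, sampling_var, synthetic, model_var):
    """Compose direct and model-based (synthetic) small area estimates.

    Each area's estimate is pulled toward its synthetic value, with weight
    gamma_i = A / (A + v_i) on the direct estimate, where A is the model
    (between-area) variance and v_i the area's sampling variance.
    """
    out = []
    for d, v, s in zip(direct, sampling_var, synthetic):
        gamma = model_var / (model_var + v)
        out.append(gamma * d + (1.0 - gamma) * s)
    return out
```

Areas with small sampling variance keep estimates close to their direct values, while noisy areas are pulled toward the model-based value.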

16.
We consider logistic regression with covariate measurement error. Most existing approaches require certain replicates of the error-contaminated covariates, which may not be available in the data. We propose generalized method of moments (GMM) nonparametric correction approaches that use instrumental variables observed in a calibration subsample. The instrumental variable is related to the underlying true covariates through a general nonparametric model, and the probability of being in the calibration subsample may depend on the observed variables. We first take a simple approach adopting the inverse selection probability weighting technique using the calibration subsample. We then improve the approach based on the GMM using the whole sample. The asymptotic properties are derived, and the finite sample performance is evaluated through simulation studies and an application to a real data set.

17.
A class of weighted quasi-binomial distributions is derived as a weighted distribution of the QBD I (Sankhyā 36 (Ser. B, Part 4) 391). The moments, inverse moments, recurrence relations among moments, bounds for the mode, estimation, fitting to real-life data using different methods, and the limiting distributions of the class are studied. Consul's (Comm. Statist. Theory Methods 19(2) (1990) 477) results on the QBD I are obtained as particular cases.

18.
A study of census frame errors based on data aggregation
As a form of complete enumeration, the production of census data can be viewed as a process of aggregating individual data into totals. To support research on census data quality assessment and control, this paper starts from the features common to census implementation in China, constructs a general form of the census data aggregation model, and on this basis defines the census frame and its role, dividing censuses into two types. It then discusses, from the perspective of census data aggregation, quantitative forms of census frame error, and further develops the theoretical significance of the unit enumeration (preliminary listing) stage in census data aggregation.

19.
This work presents a new linear calibration model with replication, in which the error of the model follows a skew scale mixture of normal distributions, a class of asymmetric thick-tailed distributions that includes the skew normal distribution as well as symmetric distributions. In the literature, most calibration models assume that the errors are normally distributed. However, the normal distribution is not suitable when there are atypical observations or asymmetry. The calibration model parameters are estimated numerically by the EM algorithm. A simulation study is carried out to verify the properties of the maximum likelihood estimators. The new approach is applied to a real dataset from a chemical analysis.
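The simplest member of the error family mentioned above is the skew normal, whose standard density is 2·φ(x)·Φ(αx), with φ and Φ the standard normal pdf and cdf. A quick sketch (standard textbook form, not the paper's full scale-mixture family):

```python
from math import erf, exp, pi, sqrt

def skew_normal_pdf(x, alpha):
    """Standard skew-normal density 2*phi(x)*Phi(alpha*x)."""
    phi = exp(-0.5 * x * x) / sqrt(2.0 * pi)       # standard normal pdf
    Phi = 0.5 * (1.0 + erf(alpha * x / sqrt(2.0)))  # standard normal cdf at alpha*x
    return 2.0 * phi * Phi
```

The shape parameter alpha controls the asymmetry; alpha = 0 recovers the ordinary normal density.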

20.
Motivated by a specific problem concerning the relationship between radar reflectance and rainfall intensity, the paper develops a space–time model for use in environmental monitoring applications. The model is cast as a high-dimensional multivariate state space time series model, in which the cross-covariance structure is derived from the spatial context of the component series, in such a way that its interpretation is essentially independent of the particular set of spatial locations at which the data are recorded. We develop algorithms for estimating the parameters of the model by maximum likelihood, and for making spatial predictions of the radar calibration parameters by using real-time computations. We apply the model to data from a weather radar station in Lancashire, England, and demonstrate through empirical validation the predictive performance of the model.

