1.

How far from identifiability? A systematic overview of the statistical matching problem in a non parametric framework

Pier Luigi Conti, Mauro Scanu 《Communications in Statistics - Theory and Methods》2017,46(2):967-994

Statistical matching consists in estimating the joint characteristics of two variables observed in two distinct and independent sample surveys, respectively. In a parametric setup, ranges of estimates for non identifiable parameters are the only estimable items, unless restrictive assumptions on the probabilistic relationship between the non jointly observed variables are imposed. These ranges correspond to the uncertainty due to the absence of joint observations on the pair of variables of interest. The aim of this paper is to analyze the uncertainty in statistical matching in a non parametric setting. A measure of uncertainty is introduced, and its properties studied: this measure studies the “intrinsic” association between the pair of variables, which is constant and equal to 1/6 whatever the form of the marginal distribution functions of the two variables when knowledge on the pair of variables is the only one available in the two samples. This measure becomes useful in the context of the reduction of uncertainty due to further knowledge than data themselves, as in the case of structural zeros. In this case the proposed measure detects how the introduction of further knowledge shrinks the intrinsic uncertainty from 1/6 to smaller values, zero being the case of no uncertainty. Sampling properties of the uncertainty measure and of the bounds of the uncertainty intervals are also proved.
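The value 1/6 quoted in this abstract can be checked numerically under one plausible reading, which is an assumption here because the abstract does not state the formula: the measure is taken to be the average width of the Fréchet bounds, min(u, v) - max(0, u + v - 1), over the unit square after both marginals are transformed to uniform. The function names below are illustrative and do not come from the paper.

```python
def frechet_width(u, v):
    """Width of the Frechet class at (u, v): upper bound minus lower bound."""
    upper = min(u, v)               # Frechet upper bound for a joint cdf
    lower = max(0.0, u + v - 1.0)   # Frechet lower bound for a joint cdf
    return upper - lower


def average_width(n=200):
    """Midpoint-rule approximation of the mean width over the unit square.

    Assumption: both marginals are continuous, so the probability integral
    transform maps them to uniform(0, 1) and their shape drops out.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += frechet_width(u, v)
    return total * h * h


if __name__ == "__main__":
    # The printed value should be close to 1/6.
    print(average_width())
```

Analytically, the integral of min(u, v) over the unit square is 1/3 and the integral of max(0, u + v - 1) is 1/6, so the difference is 1/6 regardless of the marginal distributions, in line with the abstract's claim.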

2.

Maximum-Entropy Prior Uncertainty and Correlation of Statistical Economic Data

João D. F. Rodrigues 《Journal of Business & Economic Statistics》2016,34(3):357-367

Empirical estimates of source statistical economic data such as trade flows, greenhouse gas emissions, or employment figures are always subject to uncertainty (stemming from measurement errors or confidentiality) but information concerning that uncertainty is often missing. This article uses concepts from Bayesian inference and the maximum entropy principle to estimate the prior probability distribution, uncertainty, and correlations of source data when such information is not explicitly provided. In the absence of additional information, an isolated datum is described by a truncated Gaussian distribution, and if an uncertainty estimate is missing, its prior equals the best guess. When the sum of a set of disaggregate data is constrained to match an aggregate datum, it is possible to determine the prior correlations among disaggregate data. If aggregate uncertainty is missing, all prior correlations are positive. If aggregate uncertainty is available, prior correlations can be either all positive, all negative, or a mix of both. An empirical example is presented, which reports relative uncertainties and correlation priors for the County Business Patterns database. In this example, relative uncertainties range from 1% to 80% and 20% of data pairs exhibit correlations below -0.9 or above 0.9. Supplementary materials for this article are available online.

3.

Research and Application of Additive Quantile Regression Models for Panel Data

Luo Youxi (罗幼喜), Zhang Min (张敏), Tian Maozai (田茂再) 《Statistical Research》2020,37(2):105-118

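This paper discusses quantile regression modeling of additive models for panel data within a Bayesian framework. The nonparametric model is first converted into a parametric one through a low-rank thin-plate penalized spline expansion and the introduction of dummy variables for individual effects; a Bayesian hierarchical quantile regression model is then built under the assumption that the random error term follows an asymmetric Laplace distribution. By decomposing the asymmetric Laplace distribution, the paper derives the conditional posterior distributions of all parameters to be estimated and constructs a Gibbs sampling algorithm for estimating them. Computer simulation results show that the proposed method is clearly more robust in estimation than the traditional additive-model mean regression approach. Finally, using panel data on consumption expenditure, the paper studies how the income structure of rural residents in China affects their consumption expenditure, and finds that for rural residents in the high, middle, and low consumption groups alike, increases in wage income and net business income have the more pronounced positive effect on consumption expenditure. Furthermore, compared with high-consumption rural residents, the consumption expenditure of low-consumption rural residents rises more slowly as income increases.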

4.

A Bayesian Hierarchical Record Linkage Model for Integrating Administrative Records and Its Application

Ding Dongyang (丁东洋), Zhou Lili (周丽莉) 《Statistics & Information Forum》2016,(7):30-35

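The technical problems of record linkage are closely tied to statistical theory; in particular, establishing classification rules for record linkage requires building a statistical model and identifying key variables to complete data matching. A hierarchical model is constructed within a Bayesian framework to integrate administrative records. Matching error rates can be estimated through multiple regression, and record linkage under a one-to-one restriction allows changes in the source of record information to be reflected through modules. Posterior distributions based on MCMC simulation are convenient to compute, which helps improve the efficiency of data integration.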

5.

Gaussian copula distributions for mixed data, with application in discrimination

《Journal of Statistical Computation and Simulation》2012,82(9):1643-1659

The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients.

6.

Interval Estimation of the Gini Coefficient and Its Application

Dai Pingsheng (戴平生) 《Statistical Research》2013,30(5):83-89

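This paper presents three new methods for computing the Gini coefficient when income follows a discrete distribution. The income share method is used to derive a discrete form of the covariance algorithm for the Gini coefficient, which in turn yields a regression coefficient method for computing it. The paper focuses on two methods for interval estimation of the Gini coefficient; these methods also apply to the concentration index and are therefore widely useful for measuring inequality in social and economic fields. Practical applications show that the new algorithms effectively simplify estimation of the standard deviation needed for interval estimation of the Gini coefficient.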
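The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the textbook covariance identity for a discrete sample, G = 2 cov(y, F(y)) / mean(y), with F(y) approximated by mid-ranks. It is not the author's algorithm. The function names are illustrative, and the second function is a brute-force cross-check based on the mean absolute difference.

```python
def gini_covariance(incomes):
    """Gini coefficient via the covariance identity G = 2 * cov(y, F) / mean(y).

    Assumption: F is approximated by the mid-rank (i + 0.5) / n of the sorted
    sample, and the covariance is the population version (divided by n).
    """
    y = sorted(incomes)
    n = len(y)
    mean_y = sum(y) / n
    ranks = [(i + 0.5) / n for i in range(n)]
    mean_r = 0.5  # the mean of the mid-ranks is exactly 1/2
    cov = sum((yi - mean_y) * (ri - mean_r) for yi, ri in zip(y, ranks)) / n
    return 2.0 * cov / mean_y


def gini_mean_difference(incomes):
    """Cross-check: Gini as half the relative mean absolute difference."""
    n = len(incomes)
    mean_y = sum(incomes) / n
    total = sum(abs(a - b) for a in incomes for b in incomes)
    return total / (2.0 * n * n * mean_y)


if __name__ == "__main__":
    sample = [1, 2, 3, 4]  # hypothetical incomes
    print(gini_covariance(sample), gini_mean_difference(sample))
```

For the hypothetical sample [1, 2, 3, 4], both routes give 0.25, which shows that the covariance form and the pairwise-difference form agree on discrete data.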

7.

Multiply robust inference for statistical interactions

Vansteelandt S, Vanderweele TJ, Robins JM 《Journal of the American Statistical Association》2008,103(484):1693-1704

A primary focus of an increasing number of scientific studies is to determine whether two exposures interact in the effect that they produce on an outcome of interest. Interaction is commonly assessed by fitting regression models in which the linear predictor includes the product between those exposures. When the main interest lies in the interaction, this approach is not entirely satisfactory because it is prone to (possibly severe) bias when the main exposure effects or the association between outcome and extraneous factors are misspecified. In this article, we therefore consider conditional mean models with identity or log link which postulate the statistical interaction in terms of a finite-dimensional parameter, but which are otherwise unspecified. We show that estimation of the interaction parameter is often not feasible in this model because it would require nonparametric estimation of auxiliary conditional expectations given high-dimensional variables. We thus consider 'multiply robust estimation' under a union model that assumes at least one of several working submodels holds. Our approach is novel in that it makes use of information on the joint distribution of the exposures conditional on the extraneous factors in making inferences about the interaction parameter of interest. In the special case of a randomized trial or a family-based genetic study in which the joint exposure distribution is known by design or by Mendelian inheritance, the resulting multiply robust procedure leads to asymptotically distribution-free tests of the null hypothesis of no interaction on an additive scale. We illustrate the methods via simulation and the analysis of a randomized follow-up study.

8.

Estimating Engel curves: a new way to improve the SILC-HBS matching process using GLM methods

Julio López-Laborda, Carmen Marín-González, Jorge Onrubia-Fernández 《Journal of applied statistics》2021,48(16):3233

Microdata are required to evaluate the distributive impact of the taxation system as a whole (direct and indirect taxes) on individuals or households. However, in European Union countries this information is usually distributed into two separate surveys: the Household Budget Surveys (HBS), including total household expenditure and its composition, and EU Statistics on Income and Living Conditions (EU-SILC), including detailed information about households' income and direct (but not indirect) taxes paid. We present a parametric statistical matching procedure to merge both surveys. For the first stage of matching, we propose estimating total household expenditure in HBS (Engel curves) using a GLM estimator, instead of the traditionally used OLS method. It is a better alternative, insofar as it can deal with the heteroskedasticity problem of the OLS estimates, while making it unnecessary to retransform the regressors estimated in logarithms. To evaluate these advantages of the GLM estimator, we conducted a computational Monte Carlo simulation. In addition, when an error term is added to the deterministic imputation of expenditure in the EU-SILC, we propose replacing the usual Normal distribution of the error with a Chi-square type, which allows a better approximation to the original expenditures variance in the HBS. An empirical analysis is provided using Spanish surveys for years 2012–2016. In addition, we extend the empirical analysis to the rest of the European Union countries, using the surveys provided by Eurostat (EU-SILC, 2011; HBS, 2010).

9.

Temporal variation and scale in movement-based resource selection functions

《Statistical Methodology》2014

A common population characteristic of interest in animal ecology studies pertains to the selection of resources. That is, given the resources available to animals, what do they ultimately choose to use? A variety of statistical approaches have been employed to examine this question and each has advantages and disadvantages with respect to the form of available data and the properties of estimators given model assumptions. A wealth of high resolution telemetry data are now being collected to study animal population movement and space use and these data present both challenges and opportunities for statistical inference. We summarize traditional methods for resource selection and then describe several extensions to deal with measurement uncertainty and an explicit movement process that exists in studies involving high-resolution telemetry data. Our approach uses a correlated random walk movement model to obtain temporally varying use and availability distributions that are employed in a weighted distribution context to estimate selection coefficients. The temporally varying coefficients are then weighted by their contribution to selection and combined to provide inference at the population level. The result is an intuitive and accessible statistical procedure that uses readily available software and is computationally feasible for large datasets. These methods are demonstrated using data collected as part of a large-scale mountain lion monitoring study in Colorado, USA.

10.

Theory and methods for partitioned Gini coefficients computed on post-stratified data

Chaitra H. Nagaraja 《Communications in Statistics - Theory and Methods》2017,46(10):4809-4823

The Gini coefficient is used to measure inequality in populations. However, shifts in the population distribution may affect subgroups differently. Consequently, it can be informative to examine inequality separately for these segments. Consider an independently and identically distributed sample split based on ranking and compute the Gini coefficient for each partition. These coefficients, calculated from post-stratified data, are not functions of U-statistics. Therefore, previous theoretical and methodological results cannot be applied. In this article, the asymptotic joint distribution is derived for the partitioned coefficients and bootstrap methods for inference are developed. Finally, an application to per capita income across census tracts is examined.

11.

Mutual information and redundancy for categorical data

Chong Sun Hong, Beom Jun Kim 《Statistical Papers》2011,52(1):17-31

Most methods for describing the relationship among random variables require specific probability distributions and some assumptions concerning random variables. Mutual information, based on entropy to measure the dependency among random variables, does not need any specific distribution and assumptions. Redundancy, which is an analogous version of mutual information, is also proposed as a method. In this paper, the concepts of redundancy and mutual information are explored as applied to multi-dimensional categorical data. We found that mutual information and redundancy for categorical data can be expressed as a function of the generalized likelihood ratio statistic under several kinds of independent log-linear models. As a consequence, mutual information and redundancy can also be used to analyze contingency tables stochastically. Whereas the generalized likelihood ratio statistic to test the goodness-of-fit of the log-linear models is sensitive to the sample size, the redundancy for categorical data does not depend on sample size but depends on its cell probabilities.
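The link between mutual information and the likelihood ratio statistic can be illustrated in the simplest setting, a two-way table under the independence model, where the standard identity G² = 2·N·I holds with I measured in nats. The sketch below covers only that case. The multi-way log-linear models and the redundancy measure discussed in the abstract are not reproduced, and the function names and tables are hypothetical.

```python
import math


def mutual_information(table):
    """Plug-in mutual information (in nats) of a two-way contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute zero
                p = count / n
                mi += p * math.log(count * n / (row_tot[i] * col_tot[j]))
    return mi


def g_squared(table):
    """Likelihood ratio statistic G^2 for independence in a two-way table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                expected = row_tot[i] * col_tot[j] / n
                g2 += 2.0 * count * math.log(count / expected)
    return g2


if __name__ == "__main__":
    counts = [[10, 20], [30, 40]]  # hypothetical counts
    n = sum(sum(row) for row in counts)
    print(g_squared(counts), 2 * n * mutual_information(counts))
```

Because G² scales with N while I depends only on the cell proportions, multiplying every count by a constant inflates G² but leaves I unchanged, which matches the abstract's remark about sensitivity to sample size.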

12.

Simultaneous confidence bands and regions for log-location-scale distributions with censored data

Luis A. Escobar, Yili Hong, William Q. Meeker 《Journal of statistical planning and inference》2009

In many areas of application, especially life testing and reliability, it is often of interest to estimate an unknown cumulative distribution function (cdf). A simultaneous confidence band (SCB) of the cdf can be used to assess the statistical uncertainty of the estimated cdf over the entire range of the distribution. Cheng and Iles [1983. Confidence bands for cumulative distribution functions of continuous random variables. Technometrics 25 (1), 77–86] presented an approach to construct an SCB for the cdf of a continuous random variable. For the log-location-scale family of distributions, they gave explicit forms for the upper and lower boundaries of the SCB based on expected information. In this article, we extend the work of Cheng and Iles (1983) in several directions. We study the SCBs based on local information, expected information, and estimated expected information for both the “cdf method” and the “quantile method.” We also study the effects of exceptional cases where a simple SCB does not exist. We describe calibration of the bands to provide exact coverage for complete data and type II censoring and better approximate coverage for other kinds of censoring. We also discuss how to extend these procedures to regression analysis.

13.

Estimating German overqualification with stochastic earnings frontiers

Uwe Jensen, Hermann Gartner, Susanne Rässler 《AStA Advances in Statistical Analysis》2010,94(1):33-51

We estimate individual potential income with stochastic earnings frontiers to measure overqualification as the ratio between actual income and potential income. To do this, we remove a drawback of the IAB employment sample, the censoring of the income data, by multiple imputation. The measurement of overqualification by the income ratio is also a valuable addition to the overeducation literature because the well-established objective or subjective overeducation measures focus on some ordinal matching aspects and ignore the metric income and efficiency aspects of overqualification.

14.

A Semiparametric Estimation Analysis of Information Consumption by Urban and Rural Residents in China

Tian Fengping (田凤平), Zhou Xianbo (周先波), Lin Jian (林健) 《Statistics & Information Forum》2013,28(1):32-40

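Using panel data for 30 Chinese provinces from 1993 to 2008, this paper builds semiparametric models of information consumption for urban and rural residents in China, obtains nonparametric estimates of the Engel curve of information consumption and of the marginal effects of total consumption expenditure and the price index on the expenditure share of information consumption, and on that basis compares information consumption between urban and rural residents. The empirical results show that the regional level of economic development has a significant effect only on rural information consumption expenditure, whereas residents' education level, total consumption expenditure, and the price level all significantly affect information consumption expenditure for both urban and rural residents. The marginal effects of total consumption expenditure and of the price level on information consumption expenditure each attain a maximum for both groups. For rural residents, the negative effect of the price level on information consumption expenditure exceeds the positive effect of total consumption expenditure, a feature not observed for urban residents. Controlling the price level, raising rural incomes, and increasing investment in rural infrastructure can narrow the urban-rural gap in information consumption.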

15.

Accounting for uncertainty in health economic decision models by using model averaging

Christopher H. Jackson, Simon G. Thompson, Linda D. Sharples 《Journal of the Royal Statistical Society. Series A (Statistics in Society)》2009,172(2):383-404

Summary. Health economic decision models are subject to considerable uncertainty, much of which arises from choices between several plausible model structures, e.g. choices of covariates in a regression model. Such structural uncertainty is rarely accounted for formally in decision models but can be addressed by model averaging. We discuss the most common methods of averaging models and the principles underlying them. We apply them to a comparison of two surgical techniques for repairing abdominal aortic aneurysms. In model averaging, competing models are usually either weighted by using an asymptotically consistent model assessment criterion, such as the Bayesian information criterion, or a measure of predictive ability, such as Akaike's information criterion. We argue that the predictive approach is more suitable when modelling the complex underlying processes of interest in health economics, such as individual disease progression and response to treatment.
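The weighting step mentioned in this abstract is commonly implemented with the standard formula w_k ∝ exp(-Δ_k / 2), where Δ_k is a model's information criterion (AIC or BIC) minus the smallest value among the candidates. The sketch below applies that generic formula to hypothetical numbers. It is not the authors' decision model, and the function names are illustrative.

```python
import math


def ic_weights(ic_values):
    """Model weights from information criteria: w_k proportional to exp(-delta_k / 2).

    Subtracting the minimum first keeps the exponentials numerically stable
    and does not change the normalised weights.
    """
    best = min(ic_values)
    raw = [math.exp(-0.5 * (value - best)) for value in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, ic_values):
    """Weighted average of per-model estimates using the weights above."""
    weights = ic_weights(ic_values)
    return sum(w * e for w, e in zip(weights, estimates))


if __name__ == "__main__":
    aic = [100.0, 102.0, 110.0]    # hypothetical AIC values for three models
    cost_saving = [1.2, 0.8, 2.0]  # hypothetical per-model estimates
    print(ic_weights(aic))
    print(model_average(cost_saving, aic))
```

The same function serves either criterion; which values are passed in (AIC for a predictive emphasis, BIC for a consistency emphasis) is the modelling choice the abstract discusses.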

16.

The equivalence of two approaches to incorporating variance uncertainty in sample size calculations for linear statistical models

Gwowen Shieh 《Journal of applied statistics》2017,44(1):40-56

Sample size determination is one of the most commonly encountered tasks in the design of every applied research. The general guideline suggests that a pilot study can offer plausible planning values for the vital model characteristics. This article examines two viable approaches to taking into account the imprecision of a variance estimate in sample size calculations for linear statistical models. The multiplier procedure employs an adjusted sample variance in the form of a multiple of the observed sample variance. The Bayesian method accommodates the uncertainty of a sample variance through a prior distribution. It is shown that the two seemingly distinct techniques are equivalent for sample size determination under the designated assurance requirements that the actual power exceeds the planned threshold with a given tolerance probability, or the expected power attains the desired level. The selection of optimum pilot sample size for minimizing the expected total cost is also considered.

17.

Computation of Two- and Three-Dimensional Confidence Regions With the Likelihood Ratio

Adam Jaeger 《The American statistician》2013,67(4):395-398

The asymptotic results pertaining to the distribution of the log-likelihood ratio allow for the creation of a confidence region, which is a general extension of the confidence interval. Two- and three-dimensional regions can be displayed visually to describe the plausible region of the parameters of interest simultaneously. While most advanced statistical textbooks on inference discuss these asymptotic confidence regions, there is no exploration of how to numerically compute these regions for graphical purposes. This article demonstrates the application of a simple trigonometric transformation to compute two- and three-dimensional confidence regions; we transform the Cartesian coordinates of the parameters to create what we call the radial profile log-likelihood. The method is applicable to any distribution with a defined likelihood function, so it is not limited to specific data distributions or model paradigms. We describe the method along with the algorithm, follow with an example of our method, and end with an examination of computation time. Supplementary materials for this article are available online.

18.

Post-randomization for controlling identification risk in releasing microdata from general surveys

Cheng Zhang, Tapan K. Nayak 《Journal of applied statistics》2021,48(3):455

Before releasing survey data, statistical agencies usually perturb the original data to keep each survey unit's information confidential. One significant concern in releasing survey microdata is identity disclosure, which occurs when an intruder correctly identifies the records of a survey unit by matching the values of some key (or pseudo-identifying) variables. We examine a recently developed post-randomization method for a strict control of identification risks in releasing survey microdata. While that procedure well preserves the observed frequencies and hence statistical estimates in case of simple random sampling, we show that in general surveys, it may induce considerable bias in commonly used survey-weighted estimators. We propose a modified procedure that better preserves weighted estimates. The procedure is illustrated and empirically assessed with an application to a publicly available US Census Bureau data set.

19.

Relative Price Changes and Inequality in the Size Distribution of Various Components of Income

D. J. Slottje 《Journal of Business & Economic Statistics》2013,31(1):19-26

This article uses a comprehensive model of economic inequality to examine the impact of relative price changes on inequality in the marginal distributions of various income components in which the marginal distributions are derived from a multidimensional joint distribution. The multidimensional joint distribution function is assumed to be a member of the Pearson Type VI family; that is, it is assumed to be a beta distribution of the second kind. The multidimensional joint distribution is so called because it is a joint distribution of components of income and expenditures on various commodity groups. Gini measures of inequality are devised from the marginal distributions of the various income components. The inequality measures are shown to depend on the parameters of the multidimensional joint distribution. It is then shown that the parameters of the multidimensional joint distribution depend on the relative prices of various commodity groups and several other specified exogenous variables. Thus, knowledge of how changes in relative prices affect the parameters of the multidimensional joint distribution is deductively equivalent to knowledge of how changes in relative prices affect inequality in the marginal distributions of various components of income. It is found that relative price changes have a statistically significant impact on inequality in various components of income.

20.

ESTIMATION OF LLUCH'S EXTENDED LINEAR EXPENDITURE SYSTEM FROM CROSS-SECTIONAL DATA

Alan Powell 《Australian & New Zealand Journal of Statistics》1973,15(2):111-117

Complete sets of demand relations may be fitted using varying types of sample information and varying a priori specifications. In this paper the identification and estimation of Lluch's extended linear expenditure system (ELES) from cross-sectional data alone is investigated. Under the most favourable conditions of data availability, all of the parameters of the ELES model are identified, and are estimable by the method of reduced form least squares. This is the case where observations on permanent income are available for the consuming units of the cross section and where, in addition, prices are recorded (even though they do not vary from one consuming unit to the next). Under the least favourable conditions only the marginal budget shares are identified. This corresponds to the case where no data on permanent income, or on savings, are available. The conventional ordinary least squares estimators of the marginal budget shares are, under these conditions, biased and inconsistent. Expressions are developed for the large-sample biases.