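This paper treats an individual's latent demand for medical care as a latent variable and an ordered discrete variable indicating illness status as the dependent variable, and builds an ordered probit regression model to examine how key demographic and socioeconomic characteristics such as age, income, gender, and education affect the demand for medical care. The empirical analysis uses micro-level survey data from the 2004 wave of the China Health and Nutrition Survey, and the model is estimated by a semiparametric method. The estimates show that age, gender, marital status, rural residence, income level, and education level affect individual demand for medical care to varying degrees. The results are a useful reference for forecasting the trend of medical expenditure in China and for mitigating inequality in medical services.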
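
As an illustration of the ordered probit structure described above, the following is a minimal sketch of how category probabilities follow from a latent index and a set of cutpoints. It assumes standard normal errors (the textbook parametric case, not the semiparametric estimator used in the paper), and the function names, the index value 0.3, and the cutpoints are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, cutpoints):
    """Return P(y = j | x) for j = 0..len(cutpoints).

    xb        : linear index x'beta of the latent variable y* = x'beta + e
    cutpoints : increasing thresholds separating the ordered categories
    Assumes e ~ N(0, 1), which is an assumption of this sketch.
    """
    # Cumulative probabilities P(y <= j), padded with 0 and 1 at the ends.
    cdf = [0.0] + [normal_cdf(mu - xb) for mu in cutpoints] + [1.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]


# Hypothetical index and cutpoints for a three-category health outcome.
probs = ordered_probit_probs(xb=0.3, cutpoints=[-0.5, 0.8])
print(probs)
```

Raising the index `xb` shifts probability mass toward the higher categories, which is how covariates such as age or income would enter the latent demand in a model of this kind.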

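The education of migrant workers' children is essentially an extension of the "three rural issues" (agriculture, rural areas, and farmers) into the field of education. These children are an educationally disadvantaged group, and social assistance can effectively address the insufficient social support that their education faces. Using difference-in-differences and triple-difference models, this study evaluates the effect of receiving social assistance on the academic performance of migrant workers' children. The results show that receiving social assistance helps improve their academic performance; that the effect is stronger for children with poor grades; and that combining psychological and material assistance promotes academic performance more, especially for children with poor grades.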

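Using survey data on citrus farmers in Hunan and an ordered logistic model, this article empirically analyzes, from the perspective of farmer endowments, how farmers' sense of risk controllability and various endowment factors affect citrus farmers' perception of natural risk. The results show that citrus farmers have a relatively high level of natural risk perception, that the sense of risk controllability is an important factor in that perception, and that the degree of risk perception is closely related to farmers' own endowments. Human capital endowments such as the education level of farmers and of their children, geographic environment endowments, and physical capital endowments all significantly affect farmers' natural risk perception. Among these, geographic environment endowments have a significant negative effect and farmers' capital endowments a significant positive effect: the more capital invested, the stronger the perception of natural risk. Gender, age, and the degree of industrial organization are not major factors in farmers' natural risk perception.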

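Building on Becker's economic theory of fertility and on China's actual demographic development, this paper argues that households choose to have more children when their average education level is either low or high, in order to maximize utility. Extended to the macro level, the effect of education level on the birth rate should therefore not be purely linear. Based on provincial panel data for mainland China from 2003 to 2017 and a spatial econometric model, the empirical analysis finds a U-shaped relationship between average years of schooling and the birth rate, together with spatial spillover effects. It also finds that the higher education penetration rate has a significant positive effect on the birth rate in both the local region and neighboring regions. Educational progress is thus not only an important means of improving population quality but can also help adjust population size, and the advancement of education is of great significance for population development.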

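Multi-class data analysis is important in empirical research. However, because of high dimensionality, small samples, and low signal-to-noise ratios, existing multi-class methods still perform poorly for lack of information. Researchers have therefore collected data from more sources to describe real problems more fully. Unlike collecting samples from different sources on the same covariates, the multi-source data now in common use collect covariates from different sources on the same samples, and their independence and correlation pose new challenges for statistical modeling. This paper proposes a vertical integrative analysis method for multi-class classification based on canonical variate regression. The method uses penalization to perform variable selection, explicitly accounts for the association structure among data sources, and uses an efficient ADMM algorithm for model optimization. Numerical simulations show that the method is superior in both variable selection and classification prediction. Using multi-source stock data for China's SSE 50, the method is applied to an empirical study of the factors influencing daily stock returns in 2019. The study shows that the proposed multi-class integrative analysis selects interpretable variables while achieving better predictive performance.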

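Based on the theory of planned behavior (TPB) and valid sample data on 262 migrant workers who returned home to start businesses in 17 counties of Jiangxi Province, this study uses factor analysis and structural equation modeling to examine how three variables (subjective norms, entrepreneurial attitude, and perceived behavioral control) affect returning migrant workers' entrepreneurial intention. The empirical results show that entrepreneurship among Chinese migrant workers currently combines necessity-driven and opportunity-driven entrepreneurship, with a gradual shift toward the opportunity-driven type, and that migrant workers give more weight to family members' opinions when forming entrepreneurial intentions.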

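For panel data whose variables are nonlinearly correlated, existing panel-data clustering methods based on linear algorithms cannot measure the similarity between samples accurately, and their clustering results are hard to interpret. Taking both the nonlinear correlation among variables and the interpretability of clustering results into account, this paper proposes a clustering method for nonlinear panel data: sample similarity is measured with nonlinear kernel principal component analysis, and probabilistic clustering of samples is performed with a Gaussian mixture model. The empirical results show that the method is effective and improves the interpretability of the clustering results.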

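Water demand forecasting plays an important role in effective water resource management. This article applies the random forest method to an empirical study of water demand forecasting. The experimental results show that the random forest method does not overfit because of outliers in the training set, and that the model is highly robust. Among the explanatory variables of regional water demand, regional population and irrigated area are relatively important. The conclusions and methods of the article can help management authorities manage water demand more effectively.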

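The accelerated failure time model is a widely used survival analysis model. This paper uses the LASSO penalty to remove redundant predictors and builds a kernel-machine-based accelerated failure time model to describe the complex relationship between predictors and survival time. In addition, it proposes a new regularized Garrotized kernel machine estimation method that captures the potentially nonlinear relationship between predictors and survival time, automatically models interactions among predictors in the nonparametric component, and improves prediction accuracy. Simulation studies show that, compared with existing representative methods, the proposed method predicts survival time more accurately, with a more pronounced advantage when the relationships are complex. Finally, the method is applied to gastric cancer data, using clinical information and gene expression to predict survival time and risk scores. The empirical results show that the method can provide a useful reference for designing precision diagnosis and treatment plans based on patient risk stratification.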

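In recent years, working away from home has become an important way for farmers to raise their income. Migrant workers are a special group, and their income is shaped both by the general regularities of household income and by factors specific to them. This paper attempts to analyze the determinants of migrant workers' income using traditional human capital theory and income equalization theory. However, given the reality of labor migration in China and the indicator design of the survey data, the analysis cannot be confined to these theories. Instead, it treats human capital as an important determinant of migrant workers' income, introduces variables such as the minimum wage standard, length of employment, and per capita GDP of the workers' place of origin, and uses cross-sectional sample data from a survey of migrant workers' income and related indicators to build a multiple regression model and analyze the determinants of income empirically. Data sources, variable selection…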

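The autoregressive (AR) time series model is easily affected by outliers during modeling, so that the computed results fail to match reality. To address this, the FQn statistic is used to improve the traditional autocorrelation function, yielding a robust estimation algorithm for the AR model that overcomes the influence of outliers. The method is examined through simulation and empirical analysis. Both show that when the time series contains no outliers, the traditional and robust estimation methods give essentially the same results; when the data contain outliers, the results of the traditional method change considerably while those of the robust method remain essentially unchanged. Compared with the traditional method, the robust estimation method therefore resists the influence of outliers effectively and has good resistance to interference and high robustness.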
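
To make the idea of a robust autocorrelation function concrete, here is a minimal sketch under stated assumptions. It does not reproduce the FQn statistic of the abstract; it swaps in a naive O(n^2) Qn scale estimate and plugs it into a Gnanadesikan-Kettenring-type identity, which for a stationary series recovers the lag-h autocorrelation from the scales of sums and differences of lagged values. The simulated AR(1) series, the coefficient 0.6, the outlier value 100, and all function names are hypothetical.

```python
import random
from itertools import combinations


def qn_scale(x):
    """Naive Qn scale: the k-th smallest pairwise distance times 2.2219."""
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return 2.2219 * diffs[k - 1]


def robust_autocorr(x, lag=1):
    """Lag-h autocorrelation from robust scales of u + v and u - v."""
    u, v = x[lag:], x[:-lag]
    s_plus = qn_scale([a + b for a, b in zip(u, v)]) ** 2
    s_minus = qn_scale([a - b for a, b in zip(u, v)]) ** 2
    return (s_plus - s_minus) / (s_plus + s_minus)


def classical_autocorr(x, lag=1):
    """Usual moment-based sample autocorrelation."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    den = sum((a - m) ** 2 for a in x)
    return num / den


# Simulate a hypothetical AR(1) series, then plant one gross outlier.
random.seed(0)
clean = [0.0]
for _ in range(299):
    clean.append(0.6 * clean[-1] + random.gauss(0.0, 1.0))
dirty = list(clean)
dirty[150] = 100.0

for name, series in (("clean", clean), ("dirty", dirty)):
    print(name, classical_autocorr(series), robust_autocorr(series))
```

For an AR(1) model the lag-1 autocorrelation is itself the Yule-Walker estimate of the coefficient, so comparing the two estimators on the clean and contaminated series mirrors the comparison reported in the abstract. Higher-order AR models would require solving the Yule-Walker equations with the robust autocorrelations, which is not shown here.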

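Because traditional factor analysis is sensitive to outliers, its computed results can fail to match reality. To address this, this paper uses the FAST-MCD method to improve traditional factor analysis and constructs a robust factor analysis algorithm that overcomes the influence of outliers. The method is examined through simulation and empirical analysis. Both show that, before and after factor rotation, traditional and robust factor analysis give essentially the same results when the data contain no outliers; when the data contain outliers, the results of traditional factor analysis change considerably while those of robust factor analysis remain essentially unchanged. Compared with traditional factor analysis, robust factor analysis therefore resists the influence of outliers effectively and has good resistance to interference and high robustness.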

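Conditional quantile regression (CQR) has become one of the standard methods in empirical economics. However, the economic interpretation of CQR results is conditional on many, sometimes unnecessary, control variables, which may not match the question of interest. For example, in labor-economics research on returns to education, the focus is the heterogeneous effect of education on individual income regardless of age, gender, and family characteristics; that is, one wants the unconditional quantile estimate of income with respect to education. This paper introduces the unconditional quantile regression (UQR) techniques developed in recent years and reviews the related literature. In particular, it presents three important unconditional quantile regression models: the recentered influence function (RIF) regression of Firpo, Fortin, and Lemieux (2009), the unconditional quantile treatment effect model of Frolich and Melly (2010), and the generalized unconditional quantile regression of Powell (2010). The paper also illustrates the application of the new methods in detail with an example that studies how changes in the distribution of household income affect medical expenditure.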
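
For reference, the recentered influence function behind RIF regression can be written out as follows. This is the standard textbook form rather than a formula quoted from the abstract; the symbols $q_\tau$ (the $\tau$-th unconditional quantile of $Y$), $f_Y$ (the density of $Y$), and $\beta_\tau$ are notation introduced here.

```latex
\mathrm{RIF}(y;\, q_\tau, F_Y)
  = q_\tau + \frac{\tau - \mathbf{1}\{y \le q_\tau\}}{f_Y(q_\tau)},
\qquad
\mathbb{E}\bigl[\mathrm{RIF}(Y;\, q_\tau, F_Y)\bigr] = q_\tau,
\qquad
\mathbb{E}\bigl[\mathrm{RIF}(Y;\, q_\tau, F_Y) \mid X\bigr] = X^{\top}\beta_\tau .
```

In practice $q_\tau$ is replaced by the sample quantile and $f_Y(q_\tau)$ by a kernel density estimate, after which the estimated RIF values are regressed on the covariates; the last equality is the linear specification commonly used for that regression.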

On some data oriented robust estimation procedures for means
Data-oriented estimation of means is very important for large data sets. Because outliers commonly occur, the trimmed mean is used as a robust estimator of location. After building a reasonable linear model to explain the relationship between the suitably transformed symmetric data and the approximately standardized normal statistics, we choose the trimming proportion that gives the smallest variance of the trimmed mean. The related statistical inference is also discussed. An empirical study based on an annual survey of inbound visitors in the Taiwan area is used to decide the trimming proportion. In this study, we propose a complete procedure for making that choice.
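
The idea of picking the trimming proportion that minimizes the variance of the trimmed mean can be sketched as follows. This is not the paper's procedure, which relies on a linear model for transformed data; it swaps in the common Winsorized-variance estimate of the trimmed mean's variance. The candidate grid, the sample values, and the function names are hypothetical.

```python
def trimmed_mean(x, alpha):
    """Mean after dropping the int(alpha * n) smallest and largest values."""
    xs = sorted(x)
    g = int(alpha * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def trimmed_mean_variance(x, alpha):
    """Winsorized-variance estimate of the variance of the trimmed mean."""
    xs = sorted(x)
    n = len(xs)
    g = int(alpha * n)
    # Winsorize: pull the g lowest and g highest values in to the cut points.
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    m = sum(w) / n
    s2 = sum((v - m) ** 2 for v in w) / (n - 1)
    return s2 / (n * (1 - 2 * g / n) ** 2)


def choose_trim(x, candidates=(0.0, 0.05, 0.10, 0.15, 0.20, 0.25)):
    """Return the candidate proportion with the smallest estimated variance."""
    return min(candidates, key=lambda a: trimmed_mean_variance(x, a))


# Hypothetical sample: values near 10 plus one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.1, 9.9,
          10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 55.0]
alpha = choose_trim(sample)
print(alpha, trimmed_mean(sample, alpha))
```

With an outlier present, the untrimmed candidate has by far the largest estimated variance, so the rule moves away from zero trimming without the analyst fixing the proportion in advance.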

Univariate time series often take the form of a collection of curves observed sequentially over time. Examples include hourly ground-level ozone concentration curves. These curves can be viewed as a time series of functions observed at equally spaced intervals over a dense grid. Since functional time series may contain various types of outliers, we introduce a robust functional time series forecasting method to down-weight the influence of outliers in forecasting. Through a robust principal component analysis based on projection pursuit, a time series of functions can be decomposed into a set of robust dynamic functional principal components and their associated scores. Conditioning on the estimated functional principal components, the crux of the curve-forecasting problem lies in modelling and forecasting principal component scores through a robust vector autoregressive forecasting method. Via a simulation study and an empirical study on forecasting ground-level ozone concentration, the robust method demonstrates the superior forecast accuracy that dynamic functional principal component regression entails. The robust method also estimates the parameters of the vector autoregressive models for the principal component scores more accurately, and thus improves curve forecast accuracy.

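Traditional multivariate statistical methods such as principal component analysis and factor analysis share a common feature: they compute the sample mean vector and covariance matrix and derive other statistics from these two. When the sample contains no outliers, these methods give good results. When it does contain outliers, the results are easily affected, because the classical mean vector and covariance matrix are not robust statistics. This paper studies the algorithm of the currently popular FAST-MCD method, constructs a robust mean vector and a robust covariance matrix, applies them to principal component analysis, and proposes improvements to address the method's shortcomings. The simulation and empirical results show that the improved method and the new robust estimators do resist outliers well and greatly reduce their influence on the results.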
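
The minimum covariance determinant (MCD) idea is easiest to see in one dimension, where the determinant is just the variance and the optimal subset is a run of neighboring order statistics. The sketch below is that univariate special case only; it is not FAST-MCD and not the improved method of the abstract. The function name, the default subset size, and the sample values are hypothetical, and the returned scatter is the raw subset variance without the consistency correction usually applied.

```python
def univariate_mcd(x, h=None):
    """Exact one-dimensional MCD.

    Scans every window of h consecutive sorted values and keeps the one
    with the smallest variance. Returns (location, raw_scatter).
    """
    xs = sorted(x)
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # default subset size for one variable
    best = None
    for i in range(n - h + 1):
        window = xs[i:i + h]
        m = sum(window) / h
        var = sum((v - m) ** 2 for v in window) / h
        if best is None or var < best[1]:
            best = (m, var)
    return best


# Hypothetical data: four clustered values and two gross outliers.
data = [10.0, 11.0, 12.0, 13.0, 500.0, 600.0]
location, scatter = univariate_mcd(data)
print(location, scatter, sum(data) / len(data))
```

In several dimensions an exhaustive search over subsets is infeasible, which is the gap that iterative algorithms such as FAST-MCD are designed to fill; the resulting robust mean vector and covariance matrix then replace their classical counterparts in principal component analysis.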

In this paper, a new method for robust principal component analysis (PCA) is proposed. PCA is a widely used tool for dimension reduction without substantial loss of information. However, classical PCA is vulnerable to outliers because it depends on the empirical covariance matrix. To avoid this weakness, several alternative approaches based on robust scatter matrices have been suggested. A popular choice is ROBPCA, which combines projection pursuit ideas with robust covariance estimation via a variance maximization criterion. Our approach is based on the fact that PCA can be formulated as a regression-type optimization problem, which is the main difference from the previous approaches. The proposed robust PCA is derived by replacing the squared loss function with a robust penalty function, the Huber loss. A practical algorithm is proposed to carry out the optimization, and its convergence properties are investigated. Results from a simulation study and a real data example demonstrate the promising empirical properties of the proposed method.
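
The effect of replacing the squared loss with the Huber loss can be illustrated in the simplest setting, estimating a single location parameter. The sketch below is a one-dimensional analogue and not the paper's PCA algorithm; it assumes residuals are already on unit scale, and the threshold 1.345, the sample values, and the function names are choices made here.

```python
def huber_loss(r, delta=1.345):
    """Quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def huber_location(x, delta=1.345, iters=100, tol=1e-10):
    """Minimize the summed Huber loss over mu by reweighted means."""
    mu = sorted(x)[len(x) // 2]  # start from a median-like value
    for _ in range(iters):
        # Points within delta keep weight 1; distant points are down-weighted.
        w = [1.0 if abs(v - mu) <= delta else delta / abs(v - mu) for v in x]
        new_mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical data: five values near zero and one gross outlier.
values = [0.0, 0.1, -0.1, 0.2, -0.2, 100.0]
print(sum(values) / len(values), huber_location(values))
```

Because the loss grows only linearly for large residuals, the outlier pulls the estimate by a bounded amount, whereas under squared loss its pull grows with its distance. The same substitution applied to a regression-type formulation of PCA is what the abstract describes.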

The presence of outliers inevitably leads to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data may increase the chance that one or more observations are outlying. Because outlier detection is both necessary and important in high-dimensional regression analysis, we propose a feasible outlier detection approach for sparse high-dimensional linear regression models. First, we search for a clean subset using the sure independence screening method and least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple-outlier detection approach based on multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset from the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well in detecting multiple outliers in sparse high-dimensional linear regression models. We further illustrate the application of the proposed method with an empirical analysis of real-life protein and gene expression data.

We propose a robust rank-based estimation and variable selection procedure for double generalized linear models when the number of parameters diverges with the sample size. The consistency of the variable selection procedure and the asymptotic properties of the resulting estimators are established under appropriate selection of tuning parameters. Simulations are performed to assess the finite-sample performance of the proposed estimation and variable selection procedure. In the presence of gross outliers, the proposed variable selection method is shown to perform better. As a practical application, we analyze nutritional epidemiology data, exploring the relationship between plasma beta-carotene levels and personal characteristics (e.g. age, gender, fat, etc.) as well as dietary factors (e.g. smoking status, intake of cholesterol, etc.).

The presence of outliers in a data set affects the structure of multicollinearity, which arises from a high degree of correlation between explanatory variables in a linear regression analysis. This effect may appear as an increase or a decrease in the diagnostics used to detect multicollinearity. Outliers thus reduce the reliability of diagnostics such as variance inflation factors, condition numbers, and variance decomposition proportions. In this study, we propose using a robust estimate of the correlation matrix, obtained by the minimum covariance determinant method, to compute multicollinearity diagnostics in the presence of outliers. The paper demonstrates that diagnostics computed from the robust correlation matrix are more reliable in the presence of outliers.
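
The diagnostics named above can be written in terms of the correlation matrix of the explanatory variables, which makes clear where a robust estimate would be substituted. The formulas below are standard definitions rather than expressions quoted from the paper; the condition number is given in one common convention, and the symbols $\mathbf{R}$, $R_j^2$, $\lambda$, and $\hat{\boldsymbol{\Sigma}}_{\mathrm{MCD}}$ are notation introduced here.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \left[\mathbf{R}^{-1}\right]_{jj},
\qquad
\kappa(\mathbf{R}) = \sqrt{\frac{\lambda_{\max}(\mathbf{R})}{\lambda_{\min}(\mathbf{R})}},
\qquad
\mathbf{R}_{\mathrm{MCD}} = \mathbf{D}^{-1/2}\,\hat{\boldsymbol{\Sigma}}_{\mathrm{MCD}}\,\mathbf{D}^{-1/2},
\quad
\mathbf{D} = \operatorname{diag}\bigl(\hat{\boldsymbol{\Sigma}}_{\mathrm{MCD}}\bigr).
```

Here $R_j^2$ is the coefficient of determination from regressing the $j$-th explanatory variable on the others. Replacing $\mathbf{R}$ with $\mathbf{R}_{\mathrm{MCD}}$ in the first two expressions gives robust counterparts of the same diagnostics.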

