Similar Articles
20 similar articles retrieved (search time: 109 ms)
1.
In economics and government statistics, aggregated data rather than individual-level data are usually reported, both for confidentiality and for simplicity. In this paper we develop a method for flexibly estimating the probability density function of a population from aggregated data obtained as group averages when individual-level data are grouped according to quantile limits. The kernel density estimator has commonly been applied to such data without taking the aggregation process into account and has been shown to perform poorly. Our method models the quantile function as an integral of the exponential of a spline function and deduces the density function from the quantile function. We match the aggregated data to their theoretical counterparts by least squares, and regularize the estimation using the squared second derivative of the density function as the penalty. A computational algorithm is developed to implement the method. Applications to simulated data and US household income survey data show that our penalized spline estimator accurately recovers the density function of the underlying population, whereas the commonly used kernel density estimator is severely biased. The method is applied to study the dynamics of China's urban income distribution using published interval-aggregated data from 1985–2010.
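To make the construction concrete, here is a minimal Python sketch of the estimator's ingredients: the log quantile-density s(p) = log Q'(p) is modeled as a cubic B-spline, group means over quantile intervals are matched by least squares, and a crude roughness penalty on s stands in for the paper's penalty on the density's second derivative. The lognormal "population", knot placement, and penalty weight are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy import stats
from scipy.integrate import trapezoid, cumulative_trapezoid
from scipy.interpolate import BSpline
from scipy.optimize import minimize

limits = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])      # quantile group limits
pg = np.linspace(1e-4, 1 - 1e-4, 801)                  # fine grid on (0, 1)
Q_true = stats.lognorm.ppf(pg, s=0.6)                  # "population" quantiles

def group_means(Q):
    # theoretical group mean over each quantile interval [a, b]
    return np.array([trapezoid(Q[(pg >= a) & (pg <= b)],
                               pg[(pg >= a) & (pg <= b)]) / (b - a)
                     for a, b in zip(limits[:-1], limits[1:])])

m_obs = group_means(Q_true)                            # the published aggregates

kn = np.concatenate(([0.0] * 4, np.linspace(0, 1, 6)[1:-1], [1.0] * 4))
nb = len(kn) - 4
Bmat = BSpline(kn, np.eye(nb), 3)(pg)                  # cubic B-spline design matrix

def objective(theta, lam=1e-4):
    q0, c = theta[0], theta[1:]
    Q = q0 + cumulative_trapezoid(np.exp(Bmat @ c), pg, initial=0.0)
    rough = np.sum(np.diff(Bmat @ c, 2) ** 2)          # crude roughness penalty on s
    return np.sum((m_obs - group_means(Q)) ** 2) + lam * rough

fit = minimize(objective, np.concatenate(([0.5], np.zeros(nb))), method="L-BFGS-B")
c_hat = fit.x[1:]
Q_hat = fit.x[0] + cumulative_trapezoid(np.exp(Bmat @ c_hat), pg, initial=0.0)
f_hat = np.exp(-Bmat @ c_hat)                          # density 1/Q'(p) at x = Q_hat(p)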

2.
The objective of this paper is to present a method that can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when only first- and second-moment assumptions can be made; it can also be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First we derive a loss function for the observed data using an exponential family density that has the same mean and variance structure as the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. The loss function is then minimized using the EM algorithm, whose use guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.
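As a hedged illustration of the EM mechanics (the paper itself requires only first- and second-moment assumptions), the following sketch runs EM for simple linear regression with a Gaussian covariate missing completely at random; all model choices are illustrative, not the paper's general construction.

import numpy as np

rng = np.random.default_rng(10)
n, b0, b1, sig = 500, 1.0, 2.0, 1.0
x = rng.normal(0.5, 1.0, n)
y = b0 + b1 * x + rng.normal(scale=sig, size=n)
obs = rng.uniform(size=n) < 0.6                 # 40% of covariates MCAR

mu, tau2 = x[obs].mean(), x[obs].var()          # covariate model x ~ N(mu, tau2)
beta0, beta1, s2 = 0.0, 1.0, 1.0
for _ in range(200):
    # E-step: for a missing x, its conditional law given y is normal.
    v = 1.0 / (1.0 / tau2 + beta1 ** 2 / s2)
    m = v * (mu / tau2 + beta1 * (y - beta0) / s2)
    ex = np.where(obs, x, m)                    # E[x_i | data]
    ex2 = np.where(obs, x ** 2, m ** 2 + v)     # E[x_i^2 | data]
    # M-step: complete-data MLEs with sufficient statistics replaced by
    # their conditional expectations.
    mu, tau2 = ex.mean(), (ex2 - 2 * ex * ex.mean() + ex.mean() ** 2).mean()
    beta1 = ((y * ex).mean() - y.mean() * ex.mean()) / (ex2.mean() - ex.mean() ** 2)
    beta0 = y.mean() - beta1 * ex.mean()
    s2 = (y ** 2 - 2 * y * (beta0 + beta1 * ex)
          + beta0 ** 2 + 2 * beta0 * beta1 * ex + beta1 ** 2 * ex2).mean()
print(beta0, beta1)                             # close to 1.0 and 2.0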

3.
This article proposes a discriminant function and an accompanying algorithm for analyzing data that are positively skewed. The performance of the algorithm based on the proposed discriminant function (LNDF) is compared with the conventional linear discriminant function (LDF) and quadratic discriminant function (QDF), as well as with the nonparametric support vector machine (SVM) and random forest (RF) classifiers, using real and simulated datasets. A maximum reduction of approximately 81% in the error rate relative to the QDF was noted for ten-variate data. The overall results indicate better performance of the proposed discriminant function under certain circumstances.
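As a rough illustration of why a skewness-adapted rule can help (not necessarily the authors' exact LNDF), the sketch below compares LDA on raw versus log-transformed features for two lognormal classes; on the log scale the Gaussian assumptions behind LDA hold exactly.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(13)
n = 500
X0 = np.exp(rng.normal(0.0, 0.8, (n, 3)))       # class 0: lognormal features
X1 = np.exp(rng.normal(0.7, 0.8, (n, 3)))       # class 1: shifted log-mean
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

raw = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
logged = LinearDiscriminantAnalysis().fit(np.log(X), y).score(np.log(X), y)
print(1 - raw, 1 - logged)                      # error rates; log scale wins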

4.
In this paper we consider logspline density estimation for data that may be left-truncated or right-censored. For randomly left-truncated and right-censored data, the product-limit estimator is known to be a consistent estimator of the survivor function, with a faster rate of convergence than many density estimators. The product-limit estimator and B-splines are used to construct the logspline density estimate for possibly censored or truncated data. Rates of convergence are established when the log-density function is assumed to lie in a Besov space. An algorithm involving a procedure similar to maximum likelihood, stepwise knot addition, and stepwise knot deletion is proposed for estimating the density function from sample data. Numerical examples show the finite-sample performance of inference based on logspline density estimation.
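The starting point of the construction, the product-limit (Kaplan–Meier) estimator of the survivor function, can be sketched in a few lines; the simulated censoring setup is illustrative.

import numpy as np

def kaplan_meier(times, events):
    """Return distinct event times and the survival curve S(t) at them."""
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    uniq = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(times >= t)          # subjects still under observation
        died = np.sum((times == t) & (events == 1))
        s *= 1.0 - died / at_risk             # product-limit update
        surv.append(s)
    return uniq, np.array(surv)

rng = np.random.default_rng(0)
t_true = rng.exponential(2.0, 200)            # latent failure times
c = rng.exponential(3.0, 200)                 # right-censoring times
times, events = np.minimum(t_true, c), (t_true <= c).astype(int)
grid, S_hat = kaplan_meier(times, events)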

5.
We consider the problem of estimation of a density function in the presence of incomplete data and study the Hellinger distance between our proposed estimators and the true density function. Here, the presence of incomplete data is handled by utilizing a Horvitz–Thompson-type inverse weighting approach, where the weights are the estimates of the unknown selection probabilities. We also address the problem of estimation of a regression function with incomplete data.
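A minimal sketch of the inverse-weighting idea, assuming a logistic model for the selection probabilities and a Gaussian kernel; the names, the missingness mechanism, and the bandwidth are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                        # always-observed covariate
x = z + rng.normal(scale=0.8, size=n)         # variable of interest
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + z)))      # true selection probabilities
delta = rng.uniform(size=n) < p_obs           # observation indicators

pi_hat = LogisticRegression().fit(z[:, None], delta).predict_proba(z[:, None])[:, 1]
w = delta / np.clip(pi_hat, 1e-3, None)       # Horvitz-Thompson-type weights

def weighted_kde(grid, data, weights, h=0.3):
    u = (grid[:, None] - data[None, :]) / h   # Gaussian kernel
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return (k * weights).sum(axis=1) / (weights.sum() * h)

grid = np.linspace(-4, 4, 201)
f_hat = weighted_kde(grid, x[delta], w[delta])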

6.
In this paper we consider logspline density estimation for random variables that are contaminated with random noise. In logspline density estimation for noise-free data, the logarithm of the unknown density function is estimated by a polynomial spline, whose unknown parameters are determined by maximum likelihood. When noise is present, B-splines and the Fourier inversion formula are used to construct the logspline density estimator of the unknown density function. Rates of convergence are established when the log-density function is assumed to lie in a Besov space; they depend on the smoothness of the density function and the decay rate of the characteristic function of the noise. Simulated data are used to show the finite-sample performance of inference based on logspline density estimation.
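Only the Fourier-inversion step is sketched below (the paper couples it with a logspline fit); the Laplace noise law, the damping kernel, and all constants are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n, sig = 2000, 0.4
x = rng.normal(size=n)                        # latent variable of interest
y = x + rng.laplace(scale=sig, size=n)        # contaminated observations

h = 0.25                                      # bandwidth
t = np.linspace(-1 / h, 1 / h, 1001)          # support of the damped inversion
phi_n = np.exp(1j * t[:, None] * y[None, :]).mean(axis=1)   # empirical c.f.
phi_eps = 1.0 / (1.0 + sig ** 2 * t ** 2)     # Laplace noise c.f. (known)
phi_K = (1.0 - (h * t) ** 2) ** 3             # compactly supported damping kernel

grid = np.linspace(-4, 4, 201)
integrand = phi_K * phi_n / phi_eps
f_hat = np.real(np.exp(-1j * grid[:, None] * t[None, :]) @ integrand) \
        * (t[1] - t[0]) / (2 * np.pi)         # numerical Fourier inversion
f_hat = np.clip(f_hat, 0, None)               # truncate negative ripples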

7.
The problem of updating a discriminant function on the basis of data of unknown origin is studied. There are observations of known origin from each of the underlying populations, and subsequently a limited number of unclassified observations, assumed to have been drawn from a mixture of the underlying populations, becomes available. A sample discriminant function can be formed initially from the classified data. Whether subsequently updating this discriminant function with the unclassified data reduces the error rate by enough to warrant the computational effort is investigated through a series of Monte Carlo experiments. The simulation results are contrasted with available asymptotic results.
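A stripped-down sketch of the updating step: initial estimates come from the classified sample, then EM runs over the unclassified observations treated as a two-component normal mixture (univariate, equal variances, equal-prior cutoff; all of these are illustrative simplifications).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x0, x1 = rng.normal(0, 1, 25), rng.normal(1.5, 1, 25)    # classified data
u = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)])  # unclassified

# Initial estimates from the classified sample only.
pi1, m0, m1 = 0.5, x0.mean(), x1.mean()
s = np.sqrt((((x0 - m0) ** 2).sum() + ((x1 - m1) ** 2).sum()) / 48)

for _ in range(100):                           # EM on the mixed sample
    r1 = pi1 * norm.pdf(u, m1, s)
    r1 = r1 / (r1 + (1 - pi1) * norm.pdf(u, m0, s))        # E-step
    pi1 = r1.mean()                                        # M-step
    m0 = ((1 - r1) * u).sum() / (1 - r1).sum()
    m1 = (r1 * u).sum() / r1.sum()
    s = np.sqrt(((1 - r1) * (u - m0) ** 2 + r1 * (u - m1) ** 2).sum() / len(u))

cutoff = (m0 + m1) / 2                         # updated equal-prior linear rule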

8.
Joint likelihood approaches have been widely used to handle survival data with time-dependent covariates. In constructing the joint likelihood function for the accelerated failure time (AFT) model, the unspecified baseline hazard function is assumed in the literature to be a piecewise constant function. However, there are usually no closed-form formulas for the regression parameters, so numerical methods are required in the EM iterations. The nonsmooth step-function assumption leads to a very spiky likelihood function whose global maximum is hard to find. Moreover, because the likelihood function is nonsmooth, the maximization must rely on direct search methods, which are inefficient and time-consuming. To overcome these two disadvantages, we propose a kernel-smoothed pseudo-likelihood function to replace the nonsmooth step-function assumption. The performance of the proposed method is evaluated by simulation studies, and a case study of reproductive egg-laying data demonstrates the usefulness of the new approach.
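The smoothing device can be sketched directly: replace the piecewise-constant hazard by its Gaussian-kernel smooth, which yields a differentiable function that gradient-based optimizers can handle. The breakpoints and heights below are made up for illustration.

import numpy as np

breaks = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # interval endpoints
heights = np.array([0.2, 0.5, 0.3, 0.8])       # piecewise-constant hazard

def step_hazard(t):
    idx = np.clip(np.searchsorted(breaks, t, side="right") - 1, 0, len(heights) - 1)
    return heights[idx]

def smooth_hazard(t, h=0.15, m=2001):
    """Kernel-smoothed hazard: convolve the step function with N(0, h^2)."""
    s = np.linspace(breaks[0], breaks[-1], m)
    k = np.exp(-0.5 * ((np.atleast_1d(t)[:, None] - s[None, :]) / h) ** 2)
    k /= k.sum(axis=1, keepdims=True)           # normalized kernel weights
    return k @ step_hazard(s)

t = np.linspace(0, 4, 9)
print(step_hazard(t), smooth_hazard(t))         # spiky vs. smooth versions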

9.
The main contribution of this paper is updating a nonlinear discriminant function on the basis of data of unknown origin. Specifically, a procedure is developed for updating the nonlinear discriminant function based on two Burr Type III distributions (TBIIID) when the additional observations are mixed or classified. First the nonlinear discriminant function of the assumed model is obtained. Then the total probabilities of misclassification are calculated. In addition, Monte Carlo simulation runs are used to compute the relative efficiencies in order to investigate the performance of the developed updating procedures. Finally, the results obtained in this paper are illustrated on real and simulated data sets.

10.
Time-dependent association measures between variables are of interest in bivariate survival data, and several such measures have been proposed in the literature for the modelling and analysis of survival data. In this paper, we introduce a new measure of association for bivariate survival data using the product moment residual life function and the mean residual life function. Various properties of the proposed measure and its relationship with existing measures are discussed. We also develop a non-parametric estimator of the measure and study its asymptotic properties. The application of the result is illustrated using real-life data. Finally, a simulation study is carried out to assess the performance of the estimator.
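One building block of such residual-life measures, the empirical mean residual life function m(t) = E[T - t | T > t], can be sketched for complete (uncensored) lifetimes as follows; the Weibull data are illustrative.

import numpy as np

def mean_residual_life(t_grid, lifetimes):
    lifetimes = np.asarray(lifetimes)
    out = np.full(len(t_grid), np.nan)
    for i, t in enumerate(t_grid):
        alive = lifetimes[lifetimes > t]       # subjects surviving past t
        if alive.size:
            out[i] = (alive - t).mean()
    return out

rng = np.random.default_rng(4)
T = rng.weibull(1.5, 500) * 2.0
grid = np.linspace(0, 3, 31)
mrl = mean_residual_life(grid, T)              # Weibull shape > 1: decreasing MRL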

11.
In some observational studies the data follow a random censoring model, but the available data may be only partially observable censored data, consisting of the observed failure times and only those nonfailure times that are subject to follow-up. Suzuki (1985) discussed nonparametric estimation of the survival function from such partially observable censored data. In this article, we derive a nonparametric Bayes estimator of the survival function for such data of failures and follow-ups under a Dirichlet process prior and squared error loss. Limiting properties such as mean square consistency, weak convergence, and strong consistency of the Bayes estimator are studied. Finally, the procedures developed are illustrated by means of an example.
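Suzuki's partially observable setting is more involved; as a stripped-down illustration, here is the posterior-mean survival estimate under a Dirichlet process prior and squared error loss when there is no censoring: a weighted average of the prior guess S0 and the empirical survival function. The prior guess and its precision a0 are illustrative.

import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(11)
x = rng.weibull(1.3, 80) * 2.0                  # observed lifetimes
a0 = 10.0                                       # prior precision alpha(R+)
S0 = lambda t: expon.sf(t, scale=2.0)           # prior-guess survival curve

def bayes_survival(t_grid):
    t_grid = np.asarray(t_grid)
    emp = (x[None, :] > t_grid[:, None]).mean(axis=1)      # empirical survival
    return (a0 * S0(t_grid) + len(x) * emp) / (a0 + len(x))

grid = np.linspace(0, 6, 61)
S_hat = bayes_survival(grid)                    # shrinks the empirical curve
                                                # toward S0 in small samples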

12.
Survival data obtained from prevalent cohort study designs are often subject to length-biased sampling. Frequentist methods, including estimating equation approaches as well as full likelihood methods, are available for assessing covariate effects on survival from such data. Bayesian methods allow a probability interpretation of the parameters of interest and can easily provide the predictive distribution for future observations while incorporating weak prior knowledge on the baseline hazard function, yet Bayesian methods for analyzing length-biased data have been lacking. In this paper, we propose Bayesian methods for analyzing length-biased data under a proportional hazards model. The prior distribution for the cumulative hazard function is specified semiparametrically using I-splines. Bayesian conditional and full likelihood approaches are developed and applied to simulated and real data.
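A sketch of an I-spline basis, built as the normalized antiderivatives of a B-spline basis: each basis function rises monotonically from 0 to 1, so any nonnegative combination gives a monotone cumulative hazard, as in the paper's semiparametric prior. Knot placement and the weights are illustrative.

import numpy as np
from scipy.interpolate import BSpline

k = 2                                           # quadratic pieces -> cubic I-splines
interior = np.linspace(0, 10, 5)[1:-1]
t = np.concatenate(([0.0] * (k + 1), interior, [10.0] * (k + 1)))
nb = len(t) - k - 1

x = np.linspace(0, 10, 201)
I = np.empty((len(x), nb))
for i in range(nb):
    c = np.zeros(nb)
    c[i] = 1.0
    anti = BSpline(t, c, k).antiderivative()    # integral of i-th basis spline
    I[:, i] = anti(x) / anti(10.0)              # normalize so I_i(10) = 1

theta = np.array([0.3, 0.1, 0.5, 0.2, 0.4, 0.2][:nb])  # nonnegative weights
H = I @ theta                                   # monotone cumulative hazard H(x)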

13.
This paper focuses on computing the Bayesian reliability of components whose performance characteristics (degradation: fatigue and cracks) are observed over a specified period of time. Depending upon the nature of the degradation data collected, we fit a monotone increasing or decreasing function to the data. Since the components have different lifetimes, the rate of degradation is treated as a random variable, and the time-to-failure distribution is obtained at a critical level of degradation. The exponential and power degradation models are studied, and an exponential density function is assumed for the random variable representing the rate of degradation. The maximum likelihood estimator and Bayesian estimator of the parameter of the exponential density function, the predictive distribution, a hierarchical Bayes approach, and the robustness of the posterior mean are presented. The Gibbs sampling algorithm is used to obtain the Bayesian estimates of the parameter. Illustrations are provided for train wheel degradation data.
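A sketch of the exponential degradation model: each unit degrades as D(t) = D0 * exp(theta * t) with a random rate theta ~ Exp(lambda), and fails when D(t) crosses a critical level Dc, so T = log(Dc/D0)/theta. The constants below are illustrative, not the train-wheel values, and only the maximum likelihood step is shown.

import numpy as np

rng = np.random.default_rng(5)
lam_true, D0, Dc = 4.0, 1.0, 3.0
theta = rng.exponential(1.0 / lam_true, 200)    # fitted per-unit degradation rates
lam_mle = 1.0 / theta.mean()                    # MLE of the Exp(lambda) rate

a = np.log(Dc / D0)
t = np.linspace(0.1, 40, 200)
F_t = np.exp(-lam_mle * a / t)                  # P(T <= t) = P(theta >= a/t)

T_mc = a / rng.exponential(1.0 / lam_mle, 10000)  # Monte Carlo failure times
print(lam_mle, np.quantile(T_mc, [0.1, 0.5, 0.9]))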

14.
Case-control family data are now widely used to examine the role of gene-environment interactions in the etiology of complex diseases. In these types of studies, exposure levels are obtained retrospectively and, frequently, information on most risk factors of interest is available on the probands but not on their relatives. In this work we consider correlated failure time data arising from population-based case-control family studies with missing genotypes of relatives. We present a new method for estimating the age-dependent marginalized hazard function. The proposed technique has two major advantages: (1) it is based on the pseudo full likelihood function rather than a pseudo composite likelihood function, which usually suffers from substantial efficiency loss; (2) the cumulative baseline hazard function is estimated using a two-stage estimator instead of an iterative process. We assess the performance of the proposed methodology with simulation studies, and illustrate its utility on a real data example.

15.
Flexible Class of Skew-Symmetric Distributions
We propose a flexible class of skew-symmetric distributions for which the probability density function has the form of a product of a symmetric density and a skewing function. By constructing an enumerable dense subset of skewing functions on a compact set, we are able to consider a family of distributions, which can capture skewness, heavy tails and multimodality systematically. We present three illustrative examples for the fibreglass data, the simulated data from a mixture of two normal distributions and the Swiss bills data.
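A sketch of the construction f(x) = 2*phi(x)*w(x) with a symmetric base density phi and a skewing function satisfying w(-x) = 1 - w(x), including the exact sign-flip sampler; the normal base and logistic skewing function are illustrative choices.

import numpy as np
from scipy.stats import norm
from scipy.special import expit

alpha = 3.0                                   # skewness parameter (illustrative)
w = lambda x: expit(alpha * x)                # skewing function: w(x) + w(-x) = 1
f = lambda x: 2.0 * norm.pdf(x) * w(x)        # skew-symmetric density

# Exact sampler: draw Z from the symmetric base density, keep Z with
# probability w(Z), otherwise flip its sign.
rng = np.random.default_rng(6)
z = rng.standard_normal(10000)
x = np.where(rng.uniform(size=z.size) < w(z), z, -z)   # a sample from f

grid = np.linspace(-4, 4, 201)
density = f(grid)                             # evaluate the density on a grid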

16.
In this paper we study a class of goodness-of-fit tests for the symmetric stable distribution. The proposed tests are based on a weighted integral involving the empirical characteristic function, corresponding to suitably centered data. The consistency of the tests is investigated under a moment assumption. Also, as the decay of the weight function tends to infinity, limit statistics are obtained. Our method is applied on real and simulated data.
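A sketch of such a statistic: a weighted L2 distance between the empirical characteristic function and the symmetric-stable characteristic function exp(-|t|^alpha), with an exponentially decaying weight and Monte Carlo null calibration. The scale is taken as known (= 1) to keep the sketch short; the paper works with suitably centered and standardized data.

import numpy as np
from scipy.stats import levy_stable
from scipy.integrate import trapezoid

def ecf_stat(x, alpha, a=1.0, tmax=10.0, m=401):
    t = np.linspace(-tmax, tmax, m)
    phi_n = np.exp(1j * t[:, None] * x[None, :]).mean(axis=1)   # empirical c.f.
    phi_0 = np.exp(-np.abs(t) ** alpha)                         # null stable c.f.
    w = np.exp(-a * np.abs(t))                                  # weight function
    return len(x) * trapezoid(np.abs(phi_n - phi_0) ** 2 * w, t)

rng = np.random.default_rng(7)
x = levy_stable.rvs(1.5, 0.0, size=300, random_state=rng)       # data under the null
T_obs = ecf_stat(x, alpha=1.5)
T_null = [ecf_stat(levy_stable.rvs(1.5, 0.0, size=300, random_state=rng), 1.5)
          for _ in range(99)]                                   # Monte Carlo null
p_val = (1 + sum(T >= T_obs for T in T_null)) / 100.0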

17.
Wang Zhihao et al. (王芝皓等), Statistical Research (《统计研究》), 2021, 38(7): 127–139
In practical data analysis one often encounters zero-inflated count data as the response variable, associated with a functional random variable and a random vector as predictors. This paper considers a functional partial varying-coefficient zero-inflated model (FPVCZIM), in which the infinite-dimensional slope function is approximated by a functional principal component basis and the coefficient functions are fitted with B-splines. Estimators are obtained via the EM algorithm, their theoretical properties are discussed, and convergence rates of the estimated slope function and coefficient functions are derived under regularity conditions. Finite-sample Monte Carlo simulation studies and a real data analysis illustrate the proposed method.
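The full FPVCZIM involves functional predictors; as a much-simplified sketch of the EM machinery, here is EM for a plain zero-inflated Poisson model, where the latent indicator marks structural zeros. Parameters and data are illustrative.

import numpy as np

rng = np.random.default_rng(8)
n, p_true, lam_true = 1000, 0.3, 2.5
structural = rng.uniform(size=n) < p_true       # latent structural-zero flags
y = np.where(structural, 0, rng.poisson(lam_true, n))

p, lam = 0.5, y.mean() + 0.1                    # crude starting values
for _ in range(200):
    # E-step: posterior probability that an observed zero is structural.
    tau = np.where(y == 0, p / (p + (1 - p) * np.exp(-lam)), 0.0)
    # M-step: update the mixing weight and the Poisson mean.
    p = tau.mean()
    lam = ((1 - tau) * y).sum() / (1 - tau).sum()
print(p, lam)                                   # close to 0.3 and 2.5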

18.
Censored survival data are analysed by regression models that require some assumptions on the way covariates affect the hazard function. Proportional hazards (PH) and accelerated failure time (AFT) are the hypotheses most often used in practice. A method is introduced here for testing the PH and AFT hypotheses against a general model for the hazard function. Simulated and real data are presented to show the usefulness of the method.

19.
From a data-mining perspective, a method for constructing general functional data using the Bernstein basis is proposed. On this basis, consumption function data are constructed from the per-capita annual income and consumption expenditure of urban residents in 31 Chinese provinces (autonomous regions and municipalities), and an error analysis is carried out. The first and second derivatives of the consumption function are computed to further mine the growth rate of the consumption function, with good results.
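A sketch of the Bernstein-basis construction: fit the basis coefficients by least squares and read off first and second derivatives analytically from coefficient differences. The degree and the simulated curve are illustrative stand-ins for the income-expenditure data.

import numpy as np
from scipy.special import comb

def bern_basis(t, n):
    k = np.arange(n + 1)
    return comb(n, k) * t[:, None] ** k * (1 - t[:, None]) ** (n - k)

rng = np.random.default_rng(9)
t = np.sort(rng.uniform(size=60))               # observation points in [0, 1]
y = np.sin(2 * t) + 0.3 * t + rng.normal(scale=0.02, size=60)

n_deg = 8
c, *_ = np.linalg.lstsq(bern_basis(t, n_deg), y, rcond=None)  # basis coefficients

grid = np.linspace(0, 1, 101)
f = bern_basis(grid, n_deg) @ c                                # fitted function
d1 = n_deg * bern_basis(grid, n_deg - 1) @ np.diff(c)          # first derivative
d2 = n_deg * (n_deg - 1) * bern_basis(grid, n_deg - 2) @ np.diff(c, 2)  # second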

20.
We develop functional data analysis techniques using the differential geometry of a manifold of smooth elastic functions on an interval, in which each function is represented by a log-speed function and an angle function. The manifold's geometry provides a method for computing a sample mean function and principal components on tangent spaces. Using tangent principal component analysis, we estimate probability models for functional data and apply them to functional analysis of variance, discriminant analysis, and clustering. We demonstrate these tasks using a collection of growth curves from children aged 1–18.
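The paper works with log-speed and angle representations on a manifold; as a simpler scalar analogue, the sketch below uses the square-root slope transform q = sign(f')*sqrt(|f'|), a cross-sectional mean in that space, and a mapping back to a curve (no alignment step, unlike the full elastic method). The growth-curve-like data are simulated.

import numpy as np
from scipy.integrate import cumulative_trapezoid

t = np.linspace(1, 18, 100)                     # "ages 1-18"
rng = np.random.default_rng(12)
curves = np.array([100 + 60 * (1 - np.exp(-r * (t - 1)))
                   for r in rng.uniform(0.15, 0.35, 20)])   # 20 growth curves

def srsf(f):
    df = np.gradient(f, t, axis=-1)
    return np.sign(df) * np.sqrt(np.abs(df))    # square-root slope transform

q = srsf(curves)
q_bar = q.mean(axis=0)                          # mean in SRSF space
# Map back: f' = q|q|, so integrate and restore the average starting level.
f_mean = curves[:, 0].mean() + cumulative_trapezoid(q_bar * np.abs(q_bar), t,
                                                    initial=0.0)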

