Similar Documents
20 similar articles found.
1.
Grouped data are commonly encountered in applications. All data from a continuous population are grouped due to rounding of the individual observations. The Bernstein polynomial model is proposed as an approximate model in this paper for estimating a univariate density function based on grouped data. The coefficients of the Bernstein polynomial, as the mixture proportions of beta distributions, can be estimated using an EM algorithm. The optimal degree of the Bernstein polynomial can be determined using a change-point estimation method. The rate of convergence of the proposed density estimate to the true density is proved to be almost parametric by an acceptance–rejection argument used for generating random numbers. The proposed method is compared with some existing methods in a simulation study and is applied to the Chicken Embryo Data.
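A minimal sketch of the weight-estimation step may help fix ideas: for a fixed degree m, the Bernstein polynomial density is a mixture of Beta(j, m-j+1) components whose weights can be updated by a plain EM iteration. The sketch below uses exact observations on [0, 1] for simplicity (the paper applies analogous steps to grouped data), and the degree and sample are hypothetical.

```python
import numpy as np
from scipy.stats import beta

def bernstein_em(x, m, n_iter=200):
    """EM for the mixture weights w_j of the Bernstein polynomial density
    f(x) = sum_j w_j * Beta(x; j, m - j + 1) on [0, 1]. The beta components
    are fixed; only the weights are estimated."""
    comp = np.array([beta.pdf(x, j, m - j + 1) for j in range(1, m + 1)])  # (m, n)
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        resp = w[:, None] * comp                 # E-step: unnormalized posteriors
        resp /= resp.sum(axis=0, keepdims=True)  # normalize over components
        w = resp.mean(axis=1)                    # M-step: new mixture weights
    return w

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=500)                     # toy sample on [0, 1]
print(np.round(bernstein_em(x, m=8), 3))
```

The optimal degree m would then be chosen separately, e.g. by the change-point method the abstract mentions.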

2.
Interval-censored data are very common in reliability and lifetime data analysis. This paper investigates the performance of different estimation procedures for a special type of interval-censored data, i.e. grouped data, from three widely used lifetime distributions. The approaches considered here include the maximum likelihood estimation, the minimum distance estimation based on a chi-square criterion, the moment estimation based on an imputation (IM) method and an ad hoc estimation procedure. Although IM-based techniques have been used extensively in recent years, we show that this method is not always effective. It is found that the ad hoc estimation procedure is equivalent to the minimum distance estimation with a different distance metric and is more effective in the simulations. The procedures of the different approaches are presented and their performances are investigated by Monte Carlo simulation for various combinations of sample sizes and parameter settings. The numerical results provide guidelines for practitioners analysing grouped data when they need to choose a good estimation approach.
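To make two of the compared criteria concrete, the sketch below fits one of the usual lifetime models (a Weibull, chosen purely for illustration) to grouped counts by maximum likelihood and by the minimum chi-square criterion; the bin edges and simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

# Toy grouped sample: counts of Weibull(shape 1.5, scale 2) observations per bin
edges = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, np.inf])
rng = np.random.default_rng(1)
x = rng.weibull(1.5, 1000) * 2.0
counts = np.array([((x > a) & (x <= b)).sum() for a, b in zip(edges[:-1], edges[1:])])

def cell_probs(theta):
    c, scale = np.exp(theta)                  # log-parametrization keeps both positive
    return np.diff(weibull_min.cdf(edges, c, scale=scale))

def neg_loglik(theta):                        # grouped-data (multinomial) likelihood
    return -np.sum(counts * np.log(cell_probs(theta) + 1e-300))

def chi2_dist(theta):                         # minimum chi-square criterion
    e = counts.sum() * cell_probs(theta)
    return np.sum((counts - e) ** 2 / (e + 1e-300))

mle = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
mcs = minimize(chi2_dist, x0=[0.0, 0.0], method="Nelder-Mead")
print("MLE      (shape, scale):", np.exp(mle.x))
print("min chi2 (shape, scale):", np.exp(mcs.x))
```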

3.
In this study, testing the equality of mean vectors in a one-way multivariate analysis of variance (MANOVA) is considered when each dataset has a monotone pattern of missing observations. The likelihood ratio test (LRT) statistic in a one-way MANOVA with monotone missing data is given. Furthermore, the modified test (MT) statistic based on the likelihood ratio (LR) and the modified LRT (MLRT) statistic with monotone missing data are proposed using the decomposition of the LR and an asymptotic expansion for each decomposed LR. The accuracy of the chi-square approximation is investigated using a Monte Carlo simulation. Finally, an example is given to illustrate the methods.

4.
An improved estimator to analyse missing data
Missing data due to nonresponse, though undesirable, are a reality of any survey. In this paper we consider a situation in which, at a given time, observations are missing for one of several auxiliary characteristics; thus the ‘missing’ phenomenon occurs for the characteristics separately but not simultaneously. A new method, making use of all the available observations, is proposed. A simulation study based on three real populations was performed to test the proposed technique.

5.
In sample surveys and many other areas of application, the ratio of variables is often of great importance. This often occurs when one variable is available at the population level while another variable of interest is available for sample data only. In this case, using the sample ratio, we can often gather valuable information on the variable of interest for the unsampled observations. In many other studies, the ratio itself is of interest, for example when estimating proportions from a random number of observations. In this note we compare three confidence intervals for the population ratio: a large-sample interval, a log-based version of the large-sample interval, and Fieller’s interval. This is done through data analysis and through a small simulation experiment. The Fieller method has often been proposed as a superior interval for small sample sizes. We show through a data example and simulation experiments that Fieller’s method often gives nonsensical and uninformative intervals when the observations are noisy relative to the mean of the data. The large-sample interval does not similarly suffer and thus can be a more reliable method for small and large samples.
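The three intervals are easy to state. Below is a sketch for paired data, assuming approximate normality of the sample means; the delta-method standard error and the Fieller quadratic follow the standard textbook forms, which may differ in detail from the variants compared in the note.

```python
import numpy as np
from scipy.stats import norm

def ratio_cis(x, y, level=0.95):
    """Large-sample, log-scale, and Fieller CIs for mean(y)/mean(x)."""
    n = len(x)
    z = norm.ppf(0.5 + level / 2)
    xb, yb = x.mean(), y.mean()
    r = yb / xb
    vx, vy = x.var(ddof=1) / n, y.var(ddof=1) / n
    cxy = np.cov(x, y, ddof=1)[0, 1] / n
    se = np.sqrt(vy - 2 * r * cxy + r**2 * vx) / abs(xb)         # delta method
    large = (r - z * se, r + z * se)
    log_ci = tuple(r * np.exp(s * z * se / r) for s in (-1, 1))  # needs r > 0
    # Fieller: roots of t^2 (xb^2 - z^2 vx) - 2 t (xb yb - z^2 cxy) + yb^2 - z^2 vy = 0
    a = xb**2 - z**2 * vx
    b = -2 * (xb * yb - z**2 * cxy)
    c = yb**2 - z**2 * vy
    disc = b**2 - 4 * a * c
    fieller = None if a <= 0 or disc < 0 else (
        (-b - np.sqrt(disc)) / (2 * a), (-b + np.sqrt(disc)) / (2 * a))
    return large, log_ci, fieller

rng = np.random.default_rng(2)
x, y = rng.normal(10, 2, 50), rng.normal(5, 2, 50)
for name, ci in zip(("large", "log", "Fieller"), ratio_cis(x, y)):
    print(name, ci)
```

When the denominator is noisy enough that a = xbar^2 - z^2*var(xbar) <= 0, the Fieller "interval" is the whole line or the complement of an interval (returned as None above), which is exactly the nonsensical behaviour the abstract describes.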

6.
Density estimation for pre-binned data is challenging due to the loss of exact position information of the original observations. Traditional kernel density estimation methods cannot be applied when data are pre-binned in unequally spaced bins or when one or more bins are semi-infinite intervals. We propose a novel density estimation approach using the generalized lambda distribution (GLD) for data that have been pre-binned over a sequence of consecutive bins. This method enjoys the high power of the parametric model and the great shape flexibility of the GLD. The performances of the proposed estimators are benchmarked via simulation studies. Both simulation results and a real data application show that the proposed density estimators work well for data of moderate or large sizes.
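One way such a fit might proceed: because the GLD is defined through its quantile function, the bin probabilities needed for a multinomial likelihood must be obtained by inverting that function numerically. The sketch below uses the FKML parameterization and illustrative starting values; it is an assumption-laden outline, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import brentq, minimize

def gld_quantile(u, lam):
    """FKML generalized lambda distribution quantile function."""
    l1, l2, l3, l4 = lam
    return l1 + ((u**l3 - 1) / l3 - ((1 - u)**l4 - 1) / l4) / l2

def bin_probs(lam, edges):
    """Bin probabilities via numerical inversion of the quantile function."""
    def cdf(xv):
        if xv == -np.inf: return 0.0
        if xv == np.inf:  return 1.0
        lo, hi = 1e-10, 1 - 1e-10
        if xv <= gld_quantile(lo, lam): return 0.0
        if xv >= gld_quantile(hi, lam): return 1.0
        return brentq(lambda u: gld_quantile(u, lam) - xv, lo, hi)
    return np.diff(np.array([cdf(e) for e in edges]))

def fit_gld_binned(counts, edges, lam0=(0.0, 0.5, 0.1, 0.1)):
    def nll(lam):
        if lam[1] <= 0:                     # lambda2 must stay positive
            return 1e10
        try:
            p = bin_probs(lam, edges)
        except ValueError:                  # non-monotone region: penalize
            return 1e10
        if not np.all(np.isfinite(p)) or np.any(p <= 0):
            return 1e10
        return -np.sum(counts * np.log(p))
    return minimize(nll, lam0, method="Nelder-Mead").x

rng = np.random.default_rng(3)
data = rng.normal(0.5, 1.0, 2000)
edges = np.array([-np.inf, -1, 0, 1, 2, np.inf])   # semi-infinite end bins allowed
counts = np.array([((data > a) & (data <= b)).sum()
                   for a, b in zip(edges[:-1], edges[1:])])
print(np.round(fit_gld_binned(counts, edges), 3))
```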

7.
The problem of estimating population parameters from grouped data is considered, and several alternative estimation schemes, such as the method of scoring, least lines, least squares, minimum chi-square, and a method approximating the method-of-moments and maximum likelihood estimators, are examined. These estimators are compared with maximum likelihood and method-of-moments estimators based upon individual observations in a Monte Carlo study where the parent population is characterized by a gamma distribution. An application of these techniques to fitting a gamma distribution to 1970-74 census income data is presented.
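For the method-of-moments entry in the list above, a grouped-data sketch for a gamma model is shown below; the class limits and frequencies are hypothetical, and the open-ended top class is simply closed at an arbitrary cap, as one would have to do with census income classes.

```python
import numpy as np

edges = np.array([0, 2, 4, 6, 8, 12, 20])        # hypothetical income classes (top class capped)
counts = np.array([120, 310, 280, 150, 90, 50])  # hypothetical frequencies
mid = (edges[:-1] + edges[1:]) / 2               # class marks
n = counts.sum()
m = np.sum(counts * mid) / n                     # grouped sample mean
v = np.sum(counts * (mid - m) ** 2) / (n - 1)    # grouped sample variance
shape, scale = m**2 / v, v / m                   # gamma MoM: mean = k*theta, var = k*theta^2
print(f"shape = {shape:.3f}, scale = {scale:.3f}")
```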

8.
Data are often collected in histogram form, especially in the context of computer simulation. While requiring less memory and computation than saving all observations, the grouping of observations in the histogram cells complicates statistical estimation of parameters of interest. In this paper the mean and variance of the cell midpoint estimator of the pth quantile are analyzed in terms of distribution, cell width, and sample size. Three idiosyncrasies of using cell midpoints to estimate quantiles are illustrated. The results tend to run counter to previously published results for grouped data.
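The estimator under study is simple to state: take the midpoint of the cell in which the pth sample quantile falls. A sketch, assuming all observations land inside the histogram range:

```python
import numpy as np

def midpoint_quantile(counts, edges, p):
    """Cell-midpoint estimator of the p-th quantile from histogram counts."""
    cum = np.cumsum(counts)
    k = np.searchsorted(cum, p * cum[-1])        # first cell reaching the target rank
    return (edges[k] + edges[k + 1]) / 2

rng = np.random.default_rng(4)
edges = np.linspace(0, 10, 21)                   # cell width 0.5
counts, _ = np.histogram(rng.exponential(2.0, 1000), bins=edges)
print(midpoint_quantile(counts, edges, 0.5))     # true median is 2*ln(2) ~ 1.386
```

Varying the cell width and p in such a sketch is one way to reproduce the kinds of bias idiosyncrasies the paper analyzes.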

9.
When computing the disparity of a metric variable we frequently have to deal with grouped data. It has been generally assumed that the sums of the values in each class are given. Dropping this assumption we usually resort to working with the class mark as the representative value in each class. This paper presents three approaches to the computation of the bounds of the Gini index from grouped data with incomplete information of different degree. Numerical results based on income distributions of the Federal Republic of Germany demonstrate the effects of different degrees of information on a frequency distribution and, consequently, the problems associated with comparing the disparity of various frequency distributions.
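As one concrete reading of such bounds, the sketch below computes a classical pair: with class limits, frequencies, and class means known, the Gini index is minimized by putting all mass at the class means and maximized by splitting each class's mass between its two limits while preserving the class mean. This is the textbook construction, not necessarily any of the paper's three approaches, and the income classes are hypothetical.

```python
import numpy as np

def gini_discrete(x, w):
    """Gini index of a discrete distribution with values x and weights w."""
    order = np.argsort(x)
    x, w = x[order], w[order] / w.sum()
    cy = np.cumsum(w * x) / np.sum(w * x)             # Lorenz ordinates
    return 1 - np.sum(w * (np.r_[0, cy][:-1] + cy))   # trapezoid rule on the Lorenz polygon

def gini_bounds(freq, lower, upper, means):
    freq = np.asarray(freq, float)
    g_lo = gini_discrete(np.asarray(means, float), freq)  # no within-class disparity
    xs, ws = [], []
    for f, a, b, m in zip(freq, lower, upper, means):
        q = (b - m) / (b - a)                    # share placed at the lower limit
        xs += [a, b]; ws += [f * q, f * (1 - q)] # maximal within-class spread
    return g_lo, gini_discrete(np.array(xs), np.array(ws))

print(gini_bounds(freq=[40, 30, 20, 10],
                  lower=[0, 10, 20, 40], upper=[10, 20, 40, 100],
                  means=[6, 14, 28, 60]))
```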

10.
Techniques are provided for testing hypotheses about parameters in regression models with grouped data. A test statistic similar to the conventional F statistic is considered. A simulation study performed for a few cases shows that the proposed statistic has an approximate F distribution and is useful in applications.

11.
Weibull mixture models are widely used in a variety of fields for modeling phenomena caused by heterogeneous sources. We focus on circumstances in which the original observations are not available, and instead the data come in the form of a grouping of the original observations. We illustrate the EM algorithm for fitting Weibull mixture models to grouped data and propose a bootstrap likelihood ratio test (LRT) for determining the number of subpopulations in a mixture model. The effectiveness of the LRT method is investigated via simulation. We illustrate the utility of these methods by applying them to two grouped-data applications.
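A hedged sketch of the grouped-data EM iteration for a K-component Weibull mixture is given below (random initialization, numerical M-step); it is an illustrative outline rather than the authors' implementation, and the bootstrap LRT for choosing K is omitted.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def em_weibull_mix_grouped(counts, edges, K, n_iter=50, seed=0):
    """EM for a K-component Weibull mixture fitted to grouped counts.
    E-step: posterior component probabilities per bin;
    M-step: maximize each component's expected grouped log-likelihood."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    shapes = rng.uniform(0.8, 2.5, K)
    scales = np.full(K, edges[np.isfinite(edges)].mean())

    def cellp(c, s):
        return np.diff(weibull_min.cdf(edges, c, scale=s)) + 1e-300

    for _ in range(n_iter):
        P = np.array([pi[k] * cellp(shapes[k], scales[k]) for k in range(K)])
        R = P / P.sum(axis=0)                          # E-step posteriors, (K, bins)
        for k in range(K):                             # M-step, component by component
            wk = counts * R[k]
            nll = lambda t: -np.sum(wk * np.log(
                np.diff(weibull_min.cdf(edges, np.exp(t[0]), scale=np.exp(t[1]))) + 1e-300))
            t = minimize(nll, [np.log(shapes[k]), np.log(scales[k])],
                         method="Nelder-Mead").x
            shapes[k], scales[k] = np.exp(t)
        pi = (counts * R).sum(axis=1) / counts.sum()
    return pi, shapes, scales

edges = np.array([0, 1, 2, 3, 4, 5, 7, 10, np.inf])
rng = np.random.default_rng(5)
x = np.r_[rng.weibull(1.2, 600) * 2, rng.weibull(3.0, 400) * 6]
counts = np.array([((x > a) & (x <= b)).sum() for a, b in zip(edges[:-1], edges[1:])])
print(em_weibull_mix_grouped(counts, edges, K=2))
```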

12.
A class of asymptotically nonparametric tests, which contains a test proposed by Wei (1980), is considered for testing the equality of two continuous distribution functions when paired observations are subject to arbitrary right censorship. It is shown that under the null hypothesis each test statistic converges in distribution to a standard normal random variable. Furthermore, Monte Carlo simulation results indicate that some tests in this class are more powerful than Wei's test. A generalization to incomplete censored paired data is also included.

13.
Grouped data are frequently used in several fields of study. In this work, we use the expectation-maximization (EM) algorithm to fit the skew-normal (SN) mixture model to grouped data. Implementing the EM algorithm requires computing one-dimensional integrals for each group or class. Our simulation study and real data analyses reveal that the EM algorithm not only always converges but also can be implemented in just a few seconds even when the number of components is large, contrary to the Bayesian paradigm, which is computationally expensive. The accuracy of the EM algorithm and the superiority of the SN mixture model over the traditional normal mixture model in modelling grouped data are demonstrated through the simulation and three real data illustrations. For implementing the EM algorithm, we use the ForestFit package developed for the R environment, available at https://cran.r-project.org/web/packages/ForestFit/index.html.
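The one-dimensional integrals mentioned above reduce, for each class, to differences of component CDFs; to keep all examples in one language, the sketch below shows this group-probability computation with scipy's skew-normal rather than an R/ForestFit call, using hypothetical parameters.

```python
import numpy as np
from scipy.stats import skewnorm

# P(a_j < X <= b_j) for each class j and each skew-normal component:
# exactly the per-group integrals the grouped-data E-step needs.
edges = np.array([-np.inf, -1.0, 0.0, 1.0, 2.0, np.inf])
params = [(-2.0, 0.0, 1.0), (3.0, 1.5, 0.8)]        # (skewness a, location, scale)
probs = np.array([np.diff(skewnorm.cdf(edges, a, loc=m, scale=s))
                  for a, m, s in params])           # shape: (components, classes)
print(np.round(probs, 4), probs.sum(axis=1))        # each row sums to 1
```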

14.
This paper describes a wavelet method for the estimation of density and hazard rate functions from randomly right-censored data. We adopt a nonparametric approach in assuming that the density and hazard rate have no specific parametric form. The method is based on dividing the time axis into a dyadic number of intervals and then counting the number of events within each interval. The number of events and the survival function of the observations are then separately smoothed over time via linear wavelet smoothers, and then the hazard rate function estimators are obtained by taking the ratio. We prove that the estimators have pointwise and global mean-square consistency, obtain the best possible asymptotic mean integrated squared error convergence rate and are also asymptotically normally distributed. We also describe simulation experiments that show that these estimators are reasonably reliable in practice. The method is illustrated with two real examples. The first uses survival time data for patients with liver metastases from a colorectal primary tumour without other distant metastases. The second is concerned with times of unemployment for women and the wavelet estimate, through its flexibility, provides a new and interesting interpretation.

15.
In practice, the presence of influential observations may lead to misleading results in variable screening problems. We therefore propose a robust variable screening procedure for high-dimensional data analysis in this paper. Our method consists of two steps. The first step is to define a new high-dimensional influence measure and propose a novel influence diagnostic procedure to remove unusual observations. The second step is to utilize the sure independence screening procedure based on distance correlation to select important variables in high-dimensional regression analysis. The new influence measure and diagnostic procedure are model free. To confirm the effectiveness of the proposed method, we conduct simulation studies and a real-life data analysis to illustrate the merits of the proposed approach over some competing methods. Both the simulation results and the real-life data analysis demonstrate that the proposed method effectively limits the adverse influence of unusual observations by detecting and removing them, and performs better than the competing methods.
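For the second (screening) step, here is a minimal sketch of distance-correlation-based sure independence screening; the dcor function follows Székely et al.'s sample formula, the robust influence-diagnostic first step is omitted, and the data are simulated.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dcor(x, y):
    """Sample distance correlation between two 1-D variables."""
    def centered(v):
        d = squareform(pdist(v.reshape(-1, 1)))
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if dvar == 0 else np.sqrt(max(dcov2, 0.0) / dvar)

def dc_sis(X, y, d):
    """Keep the d predictors with the largest distance correlation with y."""
    scores = np.array([dcor(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d], scores

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 200))
y = 2 * X[:, 3] - X[:, 17] ** 2 + rng.normal(size=100)
top, _ = dc_sis(X, y, d=10)
print(sorted(top))        # columns 3 and 17 should typically rank near the top
```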

16.
Pharmacokinetic (PK) data often contain concentration measurements below the quantification limit (BQL). While specific values cannot be assigned to these observations, these observed BQL data are nevertheless informative and generally known to be lower than the lower limit of quantification (LLQ). Setting BQLs as missing data violates the usual missing at random (MAR) assumption applied to the statistical methods, and therefore leads to biased or less precise parameter estimation. By definition, these data lie within the interval [0, LLQ], and can be considered as censored observations. Statistical methods that handle censored data, such as maximum likelihood and Bayesian methods, are thus useful in the modelling of such data sets. The main aim of this work was to investigate the impact of the amount of BQL observations on the bias and precision of parameter estimates in population PK models (non-linear mixed effects models in general) under the maximum likelihood method as implemented in SAS and NONMEM, and a Bayesian approach using Markov chain Monte Carlo (MCMC) as applied in WinBUGS. A second aim was to compare these different methods in dealing with BQL or censored data in a practical situation. The evaluation was illustrated by simulation based on a simple PK model, where a number of data sets were simulated from a one-compartment first-order elimination PK model. Several quantification limits were applied to each of the simulated data sets to generate data sets with certain amounts of BQL data. The average percentage of BQL ranged from 25% to 75%. Their influence on the bias and precision of all population PK model parameters such as clearance and volume of distribution under each estimation approach was explored and compared.
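The censored-likelihood idea can be sketched in a few lines for a toy model: quantified observations contribute the density, BQL observations contribute P(Y < LLQ). The lognormal model and LLQ below are illustrative assumptions only, not a population PK (mixed effects) fit.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
y = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # toy concentrations
LLQ = 0.5
obs, n_bql = y[y >= LLQ], int(np.sum(y < LLQ))     # BQL values are only counted

def neg_loglik(theta):
    mu, log_sd = theta
    sd = np.exp(log_sd)
    ll = np.sum(norm.logpdf(np.log(obs), mu, sd) - np.log(obs))  # lognormal density
    ll += n_bql * norm.logcdf((np.log(LLQ) - mu) / sd)           # censored term
    return -ll

fit = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
print("mu, sigma:", fit.x[0], np.exp(fit.x[1]))    # true values: 0 and 1
```

Dropping the censored term (i.e. treating BQL values as missing) is exactly the practice the abstract warns leads to biased or less precise estimates.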

17.
In socioeconomic areas, functional observations may be collected with weights, called weighted functional data. In this paper, we deal with a general linear hypothesis testing (GLHT) problem in the framework of functional analysis of variance with weighted functional data. With weights taken into account, we obtain unbiased and consistent estimators of the group mean and covariance functions. For the GLHT problem, we obtain a pointwise F-test statistic and build two global tests, respectively, via integrating the pointwise F-test statistic or taking its supremum over an interval of interest. The asymptotic distributions of the test statistics under the null and some local alternatives are derived. Methods for approximating their null distributions are discussed. An application of the proposed methods to density function data is also presented. Intensive simulation studies and two real data examples show that the proposed tests outperform the existing competitors substantially in terms of size control and power.

18.
In this paper, we consider modifying the score statistic proposed by Prentice and Gloeckler [Prentice, R. L., & Gloeckler, L. A. (1978). Regression analysis of grouped data with applications to breast cancer data. Biometrics, 34, 57–67] for grouped data under the proportional hazards model. To this end, we apply the likelihood method and derive the scores without re-parameterization as a discrete model. We then illustrate the test with an example and compare its efficiency with that of Prentice and Gloeckler's statistic by obtaining empirical powers through a simulation study. We also discuss a possible extension and estimated variances of the score statistic as concluding remarks.

19.
Cramér–von Mises type goodness-of-fit tests for case 2 interval-censored data are proposed based on a resampling method called the leveraged bootstrap, and their asymptotic consistency is shown. The proposed tests are computationally efficient, and can in fact be applied to other types of censored data, including right-censored data, doubly censored data and (mixtures of) case k interval-censored data. Some simulation results and an example from AIDS research are presented.

20.
A simple method is proposed for computing the likelihood ratio confidence interval for the proportion parameter p of a binomial distribution. It is compared by simulation with the WScore, Plus4 and Jeffreys confidence intervals under the criteria of average coverage rate, average interval length, and the 95% confidence interval of the interval length. Experiments show that when the binomial distribution b(n, p) has n ≥ 20 and p ∈ (0.1, 0.9), the likelihood ratio confidence interval obtained by this method performs well. When the point estimate of p is not close to 0 or 1 and n ≥ 20, this method is recommended for obtaining a confidence interval for p.
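A sketch of the interval itself (not necessarily the paper's simplified solution method): the LR confidence set collects every p0 whose likelihood ratio statistic stays below the chi-square(1) critical value, and its endpoints can be found by root-finding on either side of the point estimate.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def lr_ci(x, n, level=0.95):
    """Likelihood ratio CI for a binomial proportion p; assumes 0 < x < n."""
    crit = chi2.ppf(level, df=1)
    phat = x / n
    loglik = lambda p: x * np.log(p) + (n - x) * np.log(1 - p)
    g = lambda p: 2 * (loglik(phat) - loglik(p)) - crit  # boundary condition
    eps = 1e-12
    return brentq(g, eps, phat), brentq(g, phat, 1 - eps)

print(lr_ci(12, 40))   # e.g. 12 successes out of n = 40 (n >= 20, phat away from 0/1)
```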
