Similar documents
20 similar documents found (search time: 31 ms)
1.
方匡南  赵梦峦 《统计研究》2018,35(12):92-101
With the development of information technology, data sources are multiplying. On the one hand, this allows personal credit status to be characterized more precisely and scientifically; on the other hand, the multiplicity of sources and the complexity of data structures pose challenges to traditional credit-scoring techniques. This paper proposes a personal credit model based on multi-source data fusion, which performs modeling and variable selection over several data sets simultaneously while accounting for both the similarity and the heterogeneity across data sets. Simulation experiments show that the proposed integrated model has clear advantages in both variable selection and classification performance. Finally, the integrated model is applied to personal credit scoring on two data sets, one urban and one rural.

2.
In this article, we consider a shared frailty model with an inverse Gaussian frailty distribution and a log-logistic distribution (LLD) as the baseline distribution for bivariate survival times. We fit this model to three real-life bivariate survival data sets. The focus of this article is estimating the parameters of the shared inverse Gaussian frailty model and comparing the results with those of the shared gamma frailty model under the same baseline for the three data sets. The data are analyzed using a Bayesian approach for clustered survival data, in which failure-time observations within the same group are dependent. The variance-component estimation provides the estimated dispersion of the random effects. We carry out a test for frailty (or heterogeneity) using the Bayes factor, and model comparison is made using information criteria and the Bayes factor. We find that the shared inverse Gaussian frailty model with the LLD baseline is the better fit for all three bivariate data sets.

3.
Mixture distributions are more useful than pure distributions for modeling heterogeneous data sets. The aim of this paper is to propose, for the first time, a mixture of Weibull–Poisson (WP) distributions for modeling heterogeneous data sets, creating a powerful alternative mixture distribution for this purpose. Many properties of the proposed mixture of WP distributions are examined. The expectation-maximization (EM) algorithm is used to obtain the maximum-likelihood estimates of the parameters, and a simulation study evaluates the performance of the proposed EM scheme. Applications to two real heterogeneous data sets demonstrate the flexibility and potential of the new mixture distribution.
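The EM scheme described above can be illustrated with a much simpler stand-in: a minimal, self-contained EM for a two-component exponential mixture rather than the Weibull–Poisson mixture itself. The data, starting values, and component densities below are illustrative assumptions, not the paper's setup.

```python
import math
import random

random.seed(0)

# Synthetic heterogeneous sample: two exponential subpopulations
# (an illustrative stand-in for a heterogeneous data set).
data = ([random.expovariate(2.0) for _ in range(300)]
        + [random.expovariate(0.2) for _ in range(300)])

# EM for a two-component exponential mixture.
pi, lam1, lam2 = 0.5, 1.0, 0.1  # illustrative starting values
for _ in range(200):
    # E-step: responsibility of component 1 for each observation.
    resp = []
    for x in data:
        p1 = pi * lam1 * math.exp(-lam1 * x)
        p2 = (1.0 - pi) * lam2 * math.exp(-lam2 * x)
        resp.append(p1 / (p1 + p2))
    # M-step: weighted maximum-likelihood updates.
    n1 = sum(resp)
    pi = n1 / len(data)
    lam1 = n1 / sum(r * x for r, x in zip(resp, data))
    lam2 = (len(data) - n1) / sum((1.0 - r) * x for r, x in zip(resp, data))

print(round(pi, 2), round(lam1, 2), round(lam2, 2))
```

The estimates should land near the generating values (mixing weight 0.5, rates 2.0 and 0.2); the same E-step/M-step alternation carries over to the WP mixture with the appropriate densities.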

4.
A number of models have been proposed in the literature to model data reflecting bathtub-shaped hazard rate functions. Mixture distributions provide the obvious choice for modelling such data sets, but these contain too many parameters and hamper the accuracy of the inferential procedures, particularly when the data are meagre. Recently, a few distributions have been proposed which are simply generalizations of the two-parameter Weibull model and are capable of producing bathtub behaviour of the hazard rate function. The Weibull extension and the modified Weibull models are two such families. This study focuses on comparing these two distributions for data sets exhibiting a bathtub shape of the hazard rate. Bayesian tools are preferred due to their wide range of applicability in various nested and non-nested model comparison problems. Real data illustrations are provided so that a particular model can be recommended based on various tools of model comparison discussed in the paper.

5.
Valuation is a fundamental task of government, for-profit, and not-for-profit business. A major subset of valuation issues concerns situations where decision alternatives may be described by benefits and costs and the objective is to infer the values respondents attach to benefit/cost levels. For studies of this sort, computer administration enables the course of data collection to depend on prior responses, which allows the study to adapt to responses made by subjects. This capability is very useful when the objective is to identify which coefficients to include in a model, e.g., whether to include interaction terms. A disadvantage of computer administration, however, is that limited screen real estate may sharply restrict the number of attributes and alternatives that can appear in choice sets. This paper shows how attribute and attribute-level sub-setting may be used to create choice sets for choice-based evaluation studies sequentially. Initially, data are collected and the model fit is tested for main effects. If the main-effects model gives a good fit, conclusions are drawn on main effects. Otherwise, more choice sets are included, data are collected, and the model fit is tested for main effects and two-way interactions. If that model fits, conclusions are drawn on main effects and two-way interactions. Otherwise, more data are collected with added choice sets, and conclusions are drawn on main effects and two-way and three-way interactions.

6.
Maximum-likelihood estimation of the critical points of the failure rate and the mean residual life function is presented for the mixture inverse Gaussian model. Several important data sets are analyzed from this point of view, and for each of them the bootstrap is used to construct confidence intervals for the critical points.
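The bootstrap-percentile machinery used above is generic and can be sketched with the standard library alone. As an illustrative stand-in, the "critical point" below is just the sample median of a skewed sample; in the paper that role is played by the ML estimate of a critical point under the mixture inverse Gaussian model.

```python
import random
import statistics

random.seed(42)

# Positively skewed sample (lognormal stand-in for lifetime data).
data = [random.lognormvariate(0.0, 0.8) for _ in range(200)]

def critical_point(sample):
    # Stand-in estimator: the sample median. In the paper this would be
    # the ML estimate of a critical point of the failure rate (or mean
    # residual life) under the mixture inverse Gaussian model.
    return statistics.median(sample)

B = 1000
boot = sorted(critical_point(random.choices(data, k=len(data)))
              for _ in range(B))
# 95% percentile bootstrap confidence interval.
lo, hi = boot[int(0.025 * B)], boot[int(0.975 * B)]
print(round(lo, 3), round(hi, 3))
```

Swapping in the model-based estimator of the critical point leaves the resampling loop unchanged, which is the appeal of the percentile bootstrap here.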

7.
The paper considers a lognormal model for the survival times and obtains a Bayes solution by means of the Gibbs sampler algorithm when the priors for the parameters are vague. The formulation given in the paper is mainly focused on censored-data problems, though it is equally applicable to complete-data scenarios. For numerical illustration, we consider two real data sets on head and neck cancer patients treated with either radiotherapy or chemotherapy followed by radiotherapy. The paper not only compares the survival functions for the two therapies assuming a lognormal model but also provides a model-compatibility study based on predictive simulation results, so that the choice of the lognormal model can be justified for the two data sets. The ease of our analysis as compared to an earlier approach is certainly an advantage.

8.
The mixed-Weibull distribution has been used to model a wide range of failure data sets, and in many practical situations the number of components in a mixture model is unknown. Thus, the parameter estimation of a mixed-Weibull distribution is considered and the important issue of how to determine the number of components is discussed. Two approaches are proposed to solve this problem: one is the method of moments and the other is a regularization-type fuzzy clustering algorithm. Finally, numerical examples and two real data sets are given to illustrate the features of the proposed approaches.

9.
This research extends the mixture of exponential families developed by Dean (1992) to accommodate overdispersion in data with censoring. Score statistics testing the existence of overdispersion based on the proposed model are obtained. Simulations show that the test statistics have sufficient power to detect overdispersion when sample sizes are sufficiently large and the degree of censoring is mild (i.e., 40%). The test statistics are applied to real data sets and are able to detect overdispersion in those data sets.

10.
We introduce a new family of distributions suitable for fitting positive data sets with high kurtosis, called the slashed generalized Rayleigh distribution. This distribution arises as the quotient of two independent random variables: a generalized Rayleigh distribution in the numerator and a power of the uniform distribution in the denominator. We present properties and carry out estimation of the model parameters by moment and maximum likelihood (ML) methods. Finally, we conduct a small simulation study to evaluate the performance of the ML estimators and analyze real data sets to illustrate the usefulness of the new model.
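The quotient construction above is easy to see by simulation. As a hedged simplification, the sketch below uses a plain Rayleigh numerator (not the paper's generalized Rayleigh) divided by a power of a uniform, and compares sample kurtosis with and without the slash:

```python
import math
import random

random.seed(7)

def rayleigh():
    # Plain Rayleigh draw (sigma = 1) via the inverse CDF; the paper's
    # generalized Rayleigh numerator is simplified to this.
    return math.sqrt(-2.0 * math.log(1.0 - random.random()))

q = 2.0  # slash parameter: smaller q gives heavier tails
n = 20000
base = [rayleigh() for _ in range(n)]
# Slashed draws: Rayleigh divided by U**(1/q), U uniform on (0, 1].
slashed = [rayleigh() / (1.0 - random.random()) ** (1.0 / q) for _ in range(n)]

def kurtosis(xs):
    # Sample kurtosis (fourth standardized moment, not excess).
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return sum((x - m) ** 4 for x in xs) / len(xs) / v ** 2

print(round(kurtosis(base), 2), round(kurtosis(slashed), 2))
```

The slashed sample's kurtosis dwarfs the baseline's, which is exactly the high-kurtosis behaviour the family is built for.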

11.
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main-effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time-consuming on data sets of high dimensions than the other multiple imputation methods.

12.
13.
Count data are routinely assumed to have a Poisson distribution, especially when there are no straightforward diagnostic procedures for checking this assumption. We reanalyse two data sets from crossover trials of treatments for angina pectoris, in which the outcomes are counts of anginal attacks. Standard analyses focus on treatment effects, averaged over subjects; we are also interested in the dispersion of these effects (treatment heterogeneity). We set up a log-Poisson model with random coefficients to estimate the distribution of the treatment effects and show that the analysis is very sensitive to the distributional assumption; the population variance of the treatment effects is confounded with the (variance) function that relates the conditional variance of the outcomes, given the subject's rate of attacks, to the conditional mean. Diagnostic model checks based on resampling from the fitted distribution indicate that the default choice of the Poisson distribution for the analysed data sets is poorly supported. We propose to augment the data sets with observations of the counts, possibly made outside the clinical setting, so that the conditional distribution of the counts can be established.

14.
A regression model, based on the exponentiated-exponential geometric distribution, is defined and studied. The regression model can be applied to count data with under-dispersion or over-dispersion. Some forms of its modifications to truncated or inflated data are mentioned. Some tests to discriminate between the regression model and its competitors are discussed. Real numerical data sets are used to illustrate the applications of the regression model.

15.
The Marshall–Olkin extended two-parameter bathtub distribution is introduced and its structural properties are investigated, including the compounding representation of the distribution, the shapes of the density and the hazard rate function, the moments and quantiles. Estimation of the model parameters by maximum likelihood is discussed. Applications to some real data sets which motivate the usefulness of the model are provided. Comparison between the proposed model and other commonly used distributions is performed using real data sets. A simulation study is presented to investigate the accuracy of the estimates of the model's parameters.
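The Marshall–Olkin construction behind the family above adds a tilt parameter to any baseline survival function: G_bar(x) = alpha * F_bar(x) / (1 - (1 - alpha) * F_bar(x)). The sketch below applies it to an exponential baseline, an illustrative stand-in for the paper's two-parameter bathtub baseline:

```python
import math

def mo_survival(sbar, alpha):
    # Marshall-Olkin transform of a baseline survival function sbar:
    #   G_bar(x) = alpha * sbar(x) / (1 - (1 - alpha) * sbar(x)),  alpha > 0.
    # alpha = 1 recovers the baseline exactly.
    return lambda x: alpha * sbar(x) / (1.0 - (1.0 - alpha) * sbar(x))

# Exponential baseline as an illustrative stand-in for the two-parameter
# bathtub baseline used in the paper.
base = lambda x: math.exp(-x)
g = mo_survival(base, alpha=0.5)

print(round(g(0.0), 3), round(g(1.0), 3), round(g(3.0), 3))
```

Note that the transform preserves the defining properties of a survival function: it starts at 1, decreases, and tends to 0, while alpha reshapes the hazard.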

16.
A mixture model with Laplace and normal components is fitted to wind shear data available in grouped form. A set of equations is presented for iteratively estimating the parameters of the model using an application of the EM algorithm. Twenty-four sets of data are examined with this technique, and the model is found to give a good fit to the data. Some hypotheses about the parameters in the model are discussed in light of the estimates obtained.

17.
Due to the escalating growth of big data sets in recent years, new Bayesian Markov chain Monte Carlo (MCMC) parallel computing methods have been developed. These methods partition large data sets by observations into subsets. However, for Bayesian nested hierarchical models, typically only a few parameters are common for the full data set, with most parameters being group specific. Thus, parallel Bayesian MCMC methods that take into account the structure of the model and split the full data set by groups rather than by observations are a more natural approach for analysis. Here, we adapt and extend a recently introduced two-stage Bayesian hierarchical modeling approach, and we partition complete data sets by groups. In stage 1, the group-specific parameters are estimated independently in parallel. The stage 1 posteriors are used as proposal distributions in stage 2, where the target distribution is the full model. Using three-level and four-level models, we show in both simulation and real data studies that results of our method agree closely with the full data analysis, with greatly increased MCMC efficiency and greatly reduced computation times. The advantages of our method versus existing parallel MCMC computing methods are also described.

18.
This paper considers the problem of prediction in a linear regression model when data sets are available from replicated experiments. Pooling the data sets for the estimation of regression parameters, we present three predictors: one arising from the least squares method and two stemming from the Stein-rule method. Efficiency properties of these predictors are discussed when they are used to predict actual and average values of the response variable within/outside the sample. Received: November 17, 1999; revised version: August 10, 2000
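The Stein-rule idea behind two of the predictors above can be illustrated in its simplest form: positive-part James–Stein shrinkage of a mean vector with known unit variance. This is a hedged stand-in, not the paper's regression predictors; the dimension, true means, and seed are illustrative assumptions.

```python
import random

random.seed(3)

p = 10
theta = [2.0] * p                                # true mean vector (assumed)
x = [t + random.gauss(0.0, 1.0) for t in theta]  # one observation per coordinate

# Positive-part James-Stein shrinkage toward zero (known unit variance):
# the simplest instance of the Stein-rule principle that the paper's
# predictors apply to pooled regression coefficients.
s2 = sum(v * v for v in x)
shrink = max(0.0, 1.0 - (p - 2) / s2)
js = [shrink * v for v in x]

# Compare mean squared error against the unshrunken estimate.
mse = lambda est: sum((e - t) ** 2 for e, t in zip(est, theta)) / p
print(round(shrink, 3), round(mse(x), 3), round(mse(js), 3))
```

In expectation (over repeated samples, p >= 3) the shrunken estimator dominates the raw one; any single draw may go either way.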

19.
A multivariate generalized Poisson regression model based on the multivariate generalized Poisson distribution is defined and studied. The regression model can be used to describe count data with any type of dispersion and allows for both positive and negative correlation between any pair of the response variables. The parameters of the regression model are estimated by the maximum likelihood method. Some test statistics are discussed, and two numerical data sets are used to illustrate the applications of the multivariate count-data regression model.

20.
Pharmacokinetic (PK) data often contain concentration measurements below the quantification limit (BQL). While specific values cannot be assigned to these observations, these observed BQL data are nevertheless informative and generally known to be lower than the lower limit of quantification (LLQ). Setting BQLs as missing data violates the usual missing at random (MAR) assumption applied to the statistical methods, and therefore leads to biased or less precise parameter estimation. By definition, these data lie within the interval [0, LLQ], and can be considered as censored observations. Statistical methods that handle censored data, such as maximum likelihood and Bayesian methods, are thus useful in the modelling of such data sets. The main aim of this work was to investigate the impact of the amount of BQL observations on the bias and precision of parameter estimates in population PK models (non-linear mixed effects models in general) under the maximum likelihood method as implemented in SAS and NONMEM, and a Bayesian approach using Markov chain Monte Carlo (MCMC) as applied in WinBUGS. A second aim was to compare these different methods in dealing with BQL or censored data in a practical situation. The evaluation was illustrated by simulation based on a simple PK model, where a number of data sets were simulated from a one-compartment first-order elimination PK model. Several quantification limits were applied to each of the simulated data to generate data sets with certain amounts of BQL data. The average percentage of BQL ranged from 25% to 75%. Their influence on the bias and precision of all population PK model parameters such as clearance and volume of distribution under each estimation approach was explored and compared. Copyright © 2009 John Wiley & Sons, Ltd.
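The censored-likelihood treatment of BQL data described above can be sketched in a toy setting: log-normal concentrations with a known LLQ, where quantified values contribute a density term and BQL values contribute the CDF mass below log(LLQ). This is a hedged simplification (a single censored-normal fit by grid search, not a population PK model); the true parameters, LLQ, and grid are illustrative assumptions.

```python
import math
import random

random.seed(11)

# Log-normal concentrations; anything below the LLQ is only known to be BQL.
mu_true, sd_true, LLQ = 1.0, 0.6, 2.0
logs = [random.gauss(mu_true, sd_true) for _ in range(400)]
obs = [v for v in logs if math.exp(v) >= LLQ]   # quantified log-concentrations
n_bql = len(logs) - len(obs)                    # censored (BQL) count

phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # std normal CDF
cut = math.log(LLQ)

def loglik(mu, sd):
    # Censored-normal log-likelihood on the log scale (constants dropped):
    # a density term per quantified value, a CDF mass below log(LLQ)
    # per BQL value.
    ll = n_bql * math.log(phi((cut - mu) / sd))
    for v in obs:
        ll += -0.5 * ((v - mu) / sd) ** 2 - math.log(sd)
    return ll

# Crude grid search for the ML estimate (a sketch, not production code).
grid = [(m / 50.0, s / 50.0) for m in range(0, 101) for s in range(10, 61)]
mu_hat, sd_hat = max(grid, key=lambda g: loglik(*g))
print(mu_hat, sd_hat)
```

Discarding the BQL observations instead of using the CDF term would bias the mean upward and shrink the estimated variability, which is exactly the failure mode the paper quantifies.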


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号