Similar Literature
20 similar records found.
1.
When the observed proportion of zeros in a data set of binary outcomes is larger than expected under a regular logistic regression model, a zero-inflated Bernoulli (ZIB) regression model is frequently suggested. A spline-based ZIB regression model is proposed to describe the potentially nonlinear effect of a continuous covariate, with a spline used to approximate the unknown smooth function. Under a smoothness condition, the spline estimator of the unknown smooth function is uniformly consistent, and the regression parameter estimators are asymptotically normally distributed. We propose an easily implemented and consistent estimation method for the variances of the regression parameter estimators. Extensive simulations are conducted to investigate the finite-sample performance of the proposed method, and a real-life data set illustrates its practical use. The real-life data analysis indicates that the prediction performance of the proposed semiparametric ZIB regression model is better than that of the parametric ZIB regression model.
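A minimal sketch of the idea, assuming a truncated-power spline basis in place of the paper's B-spline construction and plain maximum likelihood (no penalty; the proposed variance estimator is omitted); all names, knots, and tuning values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Simulated binary data with ~20% structural zeros; x enters through a
# smooth nonlinear function f(x), z enters linearly.
n = 2000
x = rng.uniform(0, 1, n)
z = rng.normal(size=n)
p = expit(0.5 * z + np.sin(2 * np.pi * x))
y = (rng.uniform(size=n) < 0.8) * rng.binomial(1, p)

def spline_basis(x, knots, degree=3):
    """Truncated-power basis (spans the same space as a B-spline basis)."""
    cols = [np.ones_like(x)] + [x ** d for d in range(1, degree + 1)]
    cols += [np.clip(x - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

B = spline_basis(x, np.linspace(0.1, 0.9, 9))

def negloglik(theta):
    # ZIB likelihood: P(Y=1) = (1-omega)*p, P(Y=0) = 1 - (1-omega)*p,
    # with logit(1-omega) = theta[0] and p = expit(beta*z + f(x)).
    pi1 = expit(theta[0])
    prob1 = pi1 * expit(theta[1] * z + B @ theta[2:])
    return -np.sum(np.where(y == 1, np.log(prob1 + 1e-12),
                            np.log(1 - prob1 + 1e-12)))

fit = minimize(negloglik, np.zeros(2 + B.shape[1]), method="BFGS")
print("estimated susceptible fraction 1-omega:", expit(fit.x[0]))
print("estimated beta_z:", fit.x[1])
```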

2.
The B-spline representation is a common tool for fitting smooth nonlinear functions; it expresses the fit as a piecewise polynomial whose pieces are separated by a sequence of knots. The main difficulty in this type of modeling is the choice of the number and locations of these knots. The Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm provides a solution for selecting both simultaneously by treating the knots as free parameters; it belongs to the family of MCMC techniques that allow simulation from target distributions on spaces of varying dimension. The aim of the present investigation is to use this algorithm in the analysis of survival times, for the Cox model in particular. In the Cox model, the relation between the hazard ratio function and the covariates is assumed to be log-linear, which is too restrictive. We therefore propose to use the RJMCMC algorithm to model the log hazard ratio function by a B-spline representation with an unknown number of knots at unknown locations. This method is illustrated with two real data sets: the Stanford heart transplant data and lung cancer survival data. The RJMCMC algorithm is also applied to select the significant covariates, and a simulation study is performed.
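The reversible-jump machinery is too long to reproduce here, but its building block is the least-squares B-spline fit for a fixed knot configuration, which scipy provides directly. A hedged sketch (illustrative data, cubic splines), with a comment indicating where the RJMCMC moves would act:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.log(1 + x) + 0.5 * np.sin(x) + rng.normal(scale=0.1, size=200)

# For a fixed knot configuration the B-spline least-squares fit is direct.
# RJMCMC treats the number and locations of the interior knots as unknowns:
# "birth" proposes a new knot, "death" removes one, "move" relocates one,
# and each proposal is accepted with the usual reversible-jump ratio.
def fit_with_knots(knots):
    spl = LSQUnivariateSpline(x, y, t=knots, k=3)   # cubic B-spline
    rss = float(np.sum((spl(x) - y) ** 2))
    return spl, rss

for t in [np.array([5.0]), np.array([2.5, 5.0, 7.5])]:
    _, rss = fit_with_knots(t)
    print(f"{len(t)} interior knot(s): RSS = {rss:.3f}")
```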

3.
This article considers a class of estimators for the location and scale parameters in the location-scale model based on 'synthetic data' when the observations are randomly censored on the right. The asymptotic normality of the estimators is established using counting process and martingale techniques when the censoring distribution is known and unknown, respectively. In the case when the censoring distribution is known, we show that the asymptotic variances of this class of estimators depend on the data transformation and have a lower bound which is not achievable by this class of estimators. However, in the case that the censoring distribution is unknown and estimated by the Kaplan–Meier estimator, this class of estimators has the same asymptotic variance and attains the lower bound for variance for the case of known censoring distribution. This is different from censored regression analysis, where asymptotic variances depend on the data transformation. Our method has three valuable advantages over the method of maximum likelihood estimation. First, our estimators are available in closed form and do not require an iterative algorithm. Second, simulation studies show that our moment-based estimators are comparable to maximum likelihood estimators and outperform them when the sample size is small and the censoring rate is high. Third, our estimators are more robust to model misspecification than maximum likelihood estimators. Therefore, our method can serve as a competitive alternative to the method of maximum likelihood in estimation for location-scale models with censored data. A numerical example is presented to illustrate the proposed method.
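A rough sketch of one member of this class of estimators, using the classical Koul–Susarla–Van Ryzin synthetic response Y* = δT/(1 − Ĝ(T−)) with the censoring distribution G estimated by Kaplan–Meier; the data and distributions are illustrative, and the tail instability of the KM weights is simply guarded, not handled:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
mu, sigma = 2.0, 0.5
t = rng.normal(mu, sigma, n)            # latent responses
c = rng.normal(3.0, 1.0, n)             # censoring times
obs = np.minimum(t, c)
delta = (t <= c).astype(float)          # 1 = uncensored

def km_censoring_survival(obs, delta):
    """Kaplan-Meier estimate of P(C > t), treating censorings as 'events'."""
    order = np.argsort(obs)
    srt, d = obs[order], delta[order]
    n_risk = np.arange(len(srt), 0, -1)
    factors = np.where(d == 0, (n_risk - 1) / n_risk, 1.0)  # jumps at censorings
    return srt, np.cumprod(factors), order

srt, surv, order = km_censoring_survival(obs, delta)
surv_minus = np.concatenate(([1.0], surv[:-1]))   # left-continuous version

# Synthetic responses have the same mean as the uncensored response,
# so moment estimates are available in closed form (no iteration).
ystar = np.empty(n)
ystar[order] = delta[order] * srt / np.maximum(surv_minus, 1e-8)
ystar2 = np.empty(n)
ystar2[order] = delta[order] * srt ** 2 / np.maximum(surv_minus, 1e-8)

print("location estimate:", ystar.mean(), "vs true", mu)
print("scale estimate   :", np.sqrt(ystar2.mean() - ystar.mean() ** 2),
      "vs true", sigma)
```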

4.
An authentic food is one that is what it purports to be. Food processors and consumers need to be assured that, when they pay for a specific product or ingredient, they are receiving exactly what they pay for. Classification methods are an important tool in food authenticity studies, where they are used to assign food samples of unknown type to known types. A classification method is developed in which the classification rule is estimated by using both the labelled and the unlabelled data, in contrast with many classical methods which use only the labelled data for estimation. This methodology models the data as arising from a Gaussian mixture model with parsimonious covariance structure, as is done in model-based clustering. A missing data formulation of the mixture model is used, and the models are fitted by the EM and classification EM algorithms. The methods are applied to the analysis of spectra of foodstuffs recorded over the visible and near-infrared wavelength range in food authenticity studies. A comparison of the performance of model-based discriminant analysis and the proposed classification method is given. The proposed classification method is shown to yield very good results, with a correct classification rate observed to be as much as 15% higher than that of model-based discriminant analysis.
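A minimal sketch of the labelled-plus-unlabelled estimation idea, using a plain Gaussian mixture EM in which labelled rows have their responsibilities clamped to the known class; the paper's parsimonious covariance structures and classification EM variant are omitted, and the "spectra" here are simulated two-dimensional features:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)

# Two food types in a 2-D feature space; most samples are unlabelled.
n_lab, n_unlab = 30, 300
means = np.array([[0.0, 0.0], [2.5, 1.5]])
X_lab = np.vstack([rng.normal(means[k], 1.0, (n_lab // 2, 2)) for k in (0, 1)])
y_lab = np.repeat([0, 1], n_lab // 2)
X_unl = np.vstack([rng.normal(means[k], 1.0, (n_unlab // 2, 2)) for k in (0, 1)])
X = np.vstack([X_lab, X_unl])

K, d = 2, X.shape[1]
pi = np.full(K, 1 / K)
mu = np.array([X_lab[y_lab == k].mean(axis=0) for k in range(K)])
Sigma = [np.eye(d) for _ in range(K)]

for _ in range(100):
    # E-step: responsibilities; labelled rows are clamped to their class.
    dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                            for k in range(K)])
    R = dens / dens.sum(axis=1, keepdims=True)
    R[:n_lab] = 0.0
    R[np.arange(n_lab), y_lab] = 1.0
    # M-step: weighted Gaussian updates over ALL rows, labelled and not.
    Nk = R.sum(axis=0)
    pi = Nk / len(X)
    mu = (R.T @ X) / Nk[:, None]
    for k in range(K):
        Xc = X - mu[k]
        Sigma[k] = (R[:, k, None] * Xc).T @ Xc / Nk[k]

print("estimated class means:\n", mu)
```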

5.
Many fields of research need to classify individual systems on the basis of one or more data series, which are obtained by sampling an unknown continuous curve with noise. In other words, the underlying process is an unknown function which the observed variables represent only imperfectly. Although functional logistic regression has many attractive features for this classification problem, the method is applicable only when the number of individuals to be classified (or available to estimate the model) is large compared to the number of curves sampled per individual. To overcome this limitation, we use penalized optimal scoring to construct a new method for the classification of multi-dimensional functional data. The proposed method consists of two stages. First, the series of observed discrete values available for each individual are expressed as a set of continuous curves. Next, the penalized optimal scoring model is estimated on the basis of these curves. A similar penalized optimal scoring method was described in my previous work, but that model is not suitable for the analysis of continuous functions. In this paper we adopt a Gaussian kernel approach to extend the previous model. The high accuracy of the new method is demonstrated in Monte Carlo simulations, and the method is used to predict defaulting firms on the Japanese Stock Exchange.

6.
In this paper, we consider the Bayesian inference of the unknown parameters of the randomly censored Weibull distribution. A joint conjugate prior on the model parameters does not exist; we assume that the parameters have independent gamma priors. Since closed-form expressions for the Bayes estimators cannot be obtained, we use Lindley's approximation, importance sampling and Gibbs sampling techniques to obtain the approximate Bayes estimates and the corresponding credible intervals. A simulation study is performed to observe the behaviour of the proposed estimators. A real data analysis is presented for illustrative purposes.
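A sketch of the importance-sampling route only, with the independent gamma priors used as the proposal so that the weights reduce to the likelihood; the hyperparameters and the censoring distribution are illustrative, and Lindley's approximation and Gibbs sampling are not shown. Sampling from the prior is simple but can be inefficient when the posterior is concentrated:

```python
import numpy as np

rng = np.random.default_rng(4)

# Randomly censored Weibull data: S(t) = exp(-lam * t^a), C ~ Exponential.
a_true, lam_true, n = 1.5, 0.5, 100
t = rng.exponential(1 / lam_true, n) ** (1 / a_true)
c = rng.exponential(2.0, n)
x, delta = np.minimum(t, c), (t <= c).astype(float)

def loglik(a, lam):
    # density f(t) = a*lam*t^(a-1)*exp(-lam*t^a); survivor for censored obs
    return (delta * (np.log(a) + np.log(lam) + (a - 1) * np.log(x))).sum() \
           - lam * np.sum(x ** a)

# Independent Gamma priors double as the importance proposal.
M = 20000
a_s = rng.gamma(2.0, 1.0, M)      # prior draws for the shape
lam_s = rng.gamma(2.0, 1.0, M)    # prior draws for the rate
logw = np.array([loglik(a, l) for a, l in zip(a_s, lam_s)])
w = np.exp(logw - logw.max())
w /= w.sum()

print("posterior mean shape:", np.sum(w * a_s), " (true", a_true, ")")
print("posterior mean rate :", np.sum(w * lam_s), "(true", lam_true, ")")
```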

7.
Bayesian calibration of computer models
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
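A heavily simplified sketch of the calibration idea: a cheap stand-in simulator, a GP-distributed discrepancy δ(x) with fixed hyperparameters (the paper infers them), and a grid posterior over a single calibration parameter under a flat prior:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)

def simulator(x, theta):
    """Cheap stand-in for the computer code (illustrative)."""
    return theta * x

# Field data: the code is biased (the true process has a quadratic term the
# simulator lacks) -- exactly the model-inadequacy situation in the paper.
x = np.linspace(0, 1, 15)
theta_true = 2.0
z = theta_true * x + 0.3 * x ** 2 + rng.normal(scale=0.05, size=x.size)

# Squared-exponential GP covariance for the discrepancy term delta(x).
def gp_cov(x, var=0.1, ls=0.3):
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

K = gp_cov(x) + 0.05 ** 2 * np.eye(x.size)   # discrepancy + observation noise

# Grid posterior over theta with a flat prior:
# z | theta ~ N(simulator(x, theta), K), a Gaussian marginal likelihood.
grid = np.linspace(0, 4, 401)
loglik = np.array([multivariate_normal.logpdf(z, simulator(x, th), K)
                   for th in grid])
post = np.exp(loglik - loglik.max())
post /= post.sum()
print("posterior mean of theta:", np.sum(grid * post))
```

The discrepancy covariance is what keeps the posterior honest about model inadequacy: with K reduced to pure observation noise, the posterior for theta would concentrate on a biased value.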

8.
Bayesian semiparametric inference is considered for a loglinear model. This model consists of a parametric component for the regression coefficients and a nonparametric component for the unknown error distribution. Bayesian analysis is studied for the case of a parametric prior on the regression coefficients and a mixture-of-Dirichlet-processes prior on the unknown error distribution. A Markov chain Monte Carlo (MCMC) method is developed to compute the features of the posterior distribution. A model selection method for obtaining a more parsimonious set of predictors is studied. The method adds indicator variables to the regression equation; the set of indicator variables represents all the possible subsets to be considered. An MCMC method is developed to search stochastically for the best subset. These procedures are applied to two examples, one with censored data.

9.
Often the unknown covariance structure of a stationary, dependent, Gaussian error sequence can be simply parametrised. The error sequence can either be directly observed or observed only through a random sequence containing a deterministic regression model. The method of scoring is used here, in conjunction with recursive estimation techniques, to effect the maximum likelihood estimation of the covariance parameters. Sequences of recursive residuals, useful in model diagnostics and data analysis, are obtained in the estimation procedure.
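A sketch of the method of scoring for a single covariance parameter, taking an AR(1)-type structure Σ(ρ)ᵢⱼ = ρ^|i−j| as an illustrative parametrisation; the recursive-estimation machinery of the paper is replaced by plain dense matrix algebra:

```python
import numpy as np

rng = np.random.default_rng(6)

# Stationary AR(1)-type Gaussian errors: Cov(e_i, e_j) = rho^{|i-j|}.
n, rho_true = 300, 0.6
idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = rng.multivariate_normal(np.zeros(n), rho_true ** idx)

def sigma_and_deriv(rho):
    S = rho ** idx
    dS = np.where(idx > 0, idx * rho ** (idx - 1), 0.0)
    return S, dS

rho = 0.1
for it in range(20):
    S, dS = sigma_and_deriv(rho)
    Sinv = np.linalg.inv(S)
    A = Sinv @ dS
    # score U = -tr(A)/2 + y' Sinv dS Sinv y / 2
    score = -0.5 * np.trace(A) + 0.5 * y @ A @ Sinv @ y
    fisher = 0.5 * np.trace(A @ A)          # expected information
    step = score / fisher
    rho = np.clip(rho + step, -0.99, 0.99)  # method-of-scoring update
    if abs(step) < 1e-8:
        break

print("MLE of rho:", rho, "(true", rho_true, ")")
```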

10.
Length-biased data arise in many important applications including epidemiological cohort studies, cancer prevention trials and studies of labor economics. Such data are also often subject to right censoring due to loss to follow-up or the end of study. In this paper, we consider a proportional hazards model with varying coefficients for right-censored and length-biased data, which is used to study how covariate effects vary nonlinearly with an exposure variable. A local estimating equation method is proposed for the unknown coefficients and the intercept function in the model. The asymptotic properties of the proposed estimators are established using martingale theory and kernel smoothing techniques. Our simulation studies demonstrate that the proposed estimators have excellent finite-sample performance. The Channing House data is analyzed to demonstrate the applications of the proposed method.

11.
In this paper, the author presents an efficient method of analyzing an interest-rate model using a new approach called 'data augmentation Bayesian forecasting'. First, a dynamic linear model is constructed with a hierarchically incorporated structure. Next, an observational replication is generated from the one-step forecast distribution derived from the model. A Markov chain Monte Carlo sampling method is applied to it as a new observation, and the unknown parameters are estimated. The EM algorithm is used to establish initial values of the unknown parameters, while the 'quasi Bayes factor' is used to evaluate candidate parameters. Data augmentation Bayesian forecasting evaluates the transition and history of the 'future', 'present' and 'past' states of an arbitrary stochastic process under a probability measure that is sequentially updated with additional information: future prediction results can be used to modify the model's assessment of the present state or to re-evaluate the past state, and modification of the present and the past can in turn raise the precision of predictions of the future. Thus, data augmentation Bayesian forecasting is applicable not only in the field of financial data analysis but also in forecasting and controlling stochastic processes.

12.
The non-homogeneous Poisson process (NHPP) model is a very important class of software reliability models and is widely used in software reliability engineering. NHPPs are characterized by their intensity functions. In the literature it is usually assumed that the functional form of the intensity function is known and only some of its parameters are unknown, so parametric statistical methods can be applied to estimate or test the reliability models. In realistic situations, however, the functional form of the failure intensity is often not well known or is completely unknown, and functional (non-parametric) estimation methods must be used. Non-parametric techniques do not require any preliminary assumption on the software model and can therefore reduce modeling bias. Existing non-parametric statistical methods, however, are usually not applicable to software reliability data. In this paper we construct non-parametric methods to estimate the failure intensity function of the NHPP model, taking the particularities of software failure data into consideration.
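A minimal non-parametric sketch: simulate an NHPP by thinning, then estimate the intensity with a kernel smoother λ̂(t) = Σᵢ K_h(t − tᵢ); the bandwidth and boundary handling are left naive, unlike the paper's constructions tailored to software failure data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an NHPP on [0, T] with lam(t) = 2 + 1.5*sin(t) by thinning.
T, lam_max = 20.0, 3.5
lam = lambda t: 2.0 + 1.5 * np.sin(t)
cand = np.cumsum(rng.exponential(1 / lam_max, 200))   # rate-lam_max candidates
cand = cand[cand < T]
times = cand[rng.uniform(size=cand.size) < lam(cand) / lam_max]

def intensity_hat(t, times, h=1.0):
    """Gaussian-kernel estimate lam_hat(t) = sum_i K_h(t - t_i).
    (Biased near the boundaries 0 and T; no correction applied here.)"""
    u = (t[:, None] - times[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (h * np.sqrt(2 * np.pi))

grid = np.linspace(0, T, 5)
print(np.c_[grid, intensity_hat(grid, times), lam(grid)])
```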

13.
This paper reviews five related types of analysis, namely (i) sensitivity or what-if analysis, (ii) uncertainty or risk analysis, (iii) screening, (iv) validation, and (v) optimization. The main questions are: when should which type of analysis be applied, and which statistical techniques may then be used? This paper claims that the proper sequence to follow in the evaluation of simulation models is as follows. 1) Validation, in which the availability of data on the real system determines which type of statistical technique to use for validation. 2) Screening: in the simulation's pilot phase the really important inputs can be identified through a novel technique, called sequential bifurcation, which uses aggregation and sequential experimentation. 3) Sensitivity analysis: the really important inputs should be subjected to a more detailed analysis, which includes interactions between these inputs; relevant statistical techniques are design of experiments (DOE) and regression analysis. 4) Uncertainty analysis: the important environmental inputs may have values that are not precisely known, so the uncertainties of the model outputs that result from the uncertainties in these model inputs should be quantified; relevant techniques are the Monte Carlo method and Latin hypercube sampling. 5) Optimization: the policy variables should be controlled; a relevant technique is Response Surface Methodology (RSM), which combines DOE, regression analysis, and steepest-ascent hill-climbing. The recommended sequence implies that sensitivity analysis precede uncertainty analysis. Several case studies for each phase are briefly discussed in this paper.
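A toy sketch of the screening step (phase 2): sequential bifurcation on a linear simulation model with non-negative, known-sign effects, recursively splitting any input group whose aggregate effect exceeds a threshold; run caching and the paper's design refinements are omitted:

```python
import numpy as np

# Toy simulation model: 32 inputs, only a handful really matter; effects are
# assumed non-negative, as sequential bifurcation requires.
beta = np.zeros(32)
beta[[3, 17, 18, 30]] = [4.0, 2.5, 3.0, 1.5]

def sim(k):
    """Run the 'simulation' with inputs 1..k at the high level, rest low."""
    levels = np.zeros(32)
    levels[:k] = 1.0
    return float(beta @ levels)

runs = 0
def group_effect(lo, hi):
    """Aggregate effect of inputs lo..hi-1, estimated as y(hi) - y(lo)."""
    global runs
    runs += 2
    return sim(hi) - sim(lo)

def bifurcate(lo, hi, threshold=0.5):
    eff = group_effect(lo, hi)
    if eff <= threshold:
        return []                       # whole group negligible: prune it
    if hi - lo == 1:
        return [(lo, eff)]              # isolated an important input
    mid = (lo + hi) // 2
    return bifurcate(lo, mid, threshold) + bifurcate(mid, hi, threshold)

important = bifurcate(0, 32)
print("important inputs:", important, "| simulation runs used:", runs)
```

Aggregation is what makes this cheap: whole blocks of unimportant inputs are eliminated with two runs rather than one run per input.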

14.
In the analysis of time-to-event data with multiple causes using a competing risks Cox model, often the cause of failure is unknown for some of the cases. The probability of a missing cause is typically assumed to be independent of the cause given the time of the event and covariates measured before the event occurred. In practice, however, the underlying missing-at-random assumption does not necessarily hold. Motivated by colorectal cancer molecular pathological epidemiology analysis, we develop a method to conduct valid analysis when additional auxiliary variables are available for cases only. We consider a weaker missing-at-random assumption, with missing pattern depending on the observed quantities, which include the auxiliary covariates. We use an informative likelihood approach that will yield consistent estimates even when the underlying model for missing cause of failure is misspecified. The superiority of our method over naive methods in finite samples is demonstrated by simulation study results. We illustrate the use of our method in an analysis of colorectal cancer data from the Nurses’ Health Study cohort, where, apparently, the traditional missing-at-random assumption fails to hold.

15.
In a 2 × 2 contingency table with a small sample size, a number of cells may contain few or no observations, a situation usually referred to as sparse data. In such cases, a common recommendation in conventional frequentist methods is to add a small constant to every cell of the observed table before estimating the unknown parameters. However, this approach is based on asymptotic properties of the estimates and may work poorly for small samples. An alternative is to use Bayesian methods, which provide better insight into the problem of sparse data coupled with few centers, a setting in which the analysis would otherwise be difficult to carry out. In this article, a hierarchical Bayesian model is applied to multicenter data on the effect of a surgical treatment with standard foot care among leprosy patients with posterior tibial nerve damage, summarized as seven 2 × 2 tables. Markov chain Monte Carlo (MCMC) techniques are applied to estimate the parameters of interest under this sparse data setup.
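A sketch of one possible hierarchical model for seven sparse 2 × 2 tables: centre-specific log odds ratios drawn from a common normal distribution, updated by random-walk Metropolis steps with conjugate Gibbs updates for the hyperparameters. The counts and priors below are illustrative, not the leprosy data or the paper's exact model:

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(8)

# Seven sparse 2x2 tables: (events, n) for control and treated arms.
y0 = np.array([1, 0, 2, 1, 0, 3, 1]); n0 = np.array([12, 9, 15, 10, 8, 14, 11])
y1 = np.array([0, 0, 1, 0, 1, 1, 0]); n1 = np.array([11, 10, 14, 9, 9, 13, 12])
K = 7

def loglik(a, th):
    # a = control log-odds per centre, th = centre log odds ratio
    p0, p1 = expit(a), expit(a + th)
    return (y0 * np.log(p0) + (n0 - y0) * np.log1p(-p0)
            + y1 * np.log(p1) + (n1 - y1) * np.log1p(-p1))

a, th = np.zeros(K), np.zeros(K)
mu, tau2 = 0.0, 1.0
draws = []
for it in range(6000):
    # Metropolis update for a_i, vague N(0, 10^2) prior.
    prop = a + rng.normal(0, 0.5, K)
    logr = loglik(prop, th) - loglik(a, th) + (a ** 2 - prop ** 2) / 200.0
    acc = np.log(rng.uniform(size=K)) < logr
    a[acc] = prop[acc]
    # Metropolis update for th_i, hierarchical N(mu, tau2) prior.
    prop = th + rng.normal(0, 0.5, K)
    logr = loglik(a, prop) - loglik(a, th) \
           + ((th - mu) ** 2 - (prop - mu) ** 2) / (2 * tau2)
    acc = np.log(rng.uniform(size=K)) < logr
    th[acc] = prop[acc]
    # Conjugate Gibbs updates: mu ~ N(0, 10^2), tau2 ~ InvGamma(2, 1) priors.
    prec = K / tau2 + 1 / 100.0
    mu = rng.normal((th.sum() / tau2) / prec, np.sqrt(1 / prec))
    tau2 = 1 / rng.gamma(2 + K / 2, 1 / (1 + 0.5 * np.sum((th - mu) ** 2)))
    if it >= 1000:
        draws.append(mu)

print("posterior mean common log odds ratio:", np.mean(draws))
```

Borrowing strength across centres through (mu, tau2) is what stabilizes estimates for tables with zero cells, without adding artificial constants.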

16.
Data resulting from behavioral dental research, usually categorical or discretized and having unknown measurement and distributional characteristics, often cannot be analyzed with classical multivariate techniques. A nonlinear principal components technique called multiple correspondence analysis, together with its corresponding computer program, is presented to handle this kind of data. The model is described as a form of multidimensional scaling. The technique is applied to establish which factors are associated with an individual's preference for preservation of the teeth.
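A compact sketch of multiple correspondence analysis as an SVD of the standardised indicator (disjunctive) matrix; the questionnaire data are simulated, and the dental variables are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(9)

# Categorical questionnaire data: 200 respondents, 3 items with 3, 2, 4 levels.
levels = [3, 2, 4]
data = np.column_stack([rng.integers(0, m, 200) for m in levels])

# Indicator (disjunctive) matrix Z: one dummy column per category.
Z = np.column_stack([(data[:, j][:, None] == np.arange(m)).astype(float)
                     for j, m in enumerate(levels)])

# Correspondence analysis of Z: standardised residuals from the
# independence model, then an SVD -- the classical formulation of
# nonlinear principal components for categorical data.
P = Z / Z.sum()
r = P.sum(axis=1, keepdims=True)         # row masses
c = P.sum(axis=0, keepdims=True)         # column masses
S = (P - r @ c) / np.sqrt(r) / np.sqrt(c)
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_scores = (U[:, :2] * sv[:2]) / np.sqrt(r)   # principal coordinates
print("first two inertias:", sv[:2] ** 2)
print("object scores (first 3 rows):\n", row_scores[:3])
```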

17.
The paper proposes a method of analysis for data on within-household disease transmission, when only outbreak sizes are available. The method assumes between-household heterogeneity of the transmission probabilities. A random effects model in a hierarchical setting is fitted using MCMC and data augmentation techniques. The procedure is illustrated on a measles dataset.

18.
Owing to the nature of the problems and the design of questionnaires, discrete polytomous data are very common in behavioural, medical and social research. Analysing the relationships between the manifest and the latent variables based on mixed polytomous and continuous data has proven to be difficult. A general structural equation model is investigated for these mixed outcomes. Maximum likelihood (ML) estimates of the unknown thresholds and the structural parameters in the covariance structure are obtained, and a Monte Carlo EM algorithm is implemented to produce them. It is shown that closed-form solutions can be obtained for the M-step, and estimates of the latent variables are produced as a by-product of the analysis. The method is illustrated with a real example.

19.
Recently, exact inference under hybrid censoring schemes has attracted extensive attention in the field of reliability analysis, but most authors neglect the possibility of a competing risks model. This paper discusses exact likelihood inference for generalized type-I hybrid censored data under an exponential competing risks model. Based on the maximum likelihood estimates of the unknown parameters, we establish the exact conditional distribution of the parameters via the conditional moment generating function, and then obtain moment properties as well as exact confidence intervals (CIs) for the parameters. Furthermore, approximate CIs are constructed using the asymptotic distribution and the bootstrap method, and their performance is compared with the exact method through Monte Carlo simulations. Finally, a real data set is analysed to illustrate the validity of all the methods developed here.
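A simplified sketch with plain type-I censoring standing in for the generalized hybrid scheme: exponential competing-risks MLEs (cause-specific failure counts over total time on test) and parametric-bootstrap CIs; the exact conditional-distribution CIs of the paper are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(10)

# Exponential competing risks with type-I censoring at time tau
# (a simplification of the generalized hybrid scheme in the paper).
lam = np.array([0.4, 0.7]); tau, n = 1.5, 80

def simulate(lam, n):
    t = rng.exponential(1 / lam.sum(), n)           # time to first failure
    cause = rng.choice(2, n, p=lam / lam.sum())     # which risk fired
    obs = np.minimum(t, tau)
    cause = np.where(t <= tau, cause, -1)           # -1 = censored
    return obs, cause

def mle(obs, cause):
    ttt = obs.sum()                                 # total time on test
    return np.array([(cause == j).sum() / ttt for j in (0, 1)])

obs, cause = simulate(lam, n)
lam_hat = mle(obs, cause)

# Parametric bootstrap confidence intervals for each cause-specific rate.
boot = np.array([mle(*simulate(lam_hat, n)) for _ in range(2000)])
ci = np.percentile(boot, [2.5, 97.5], axis=0)
print("MLE:", lam_hat)
print("95% bootstrap CIs:", ci.T)
```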

20.
The Dirichlet-multinomial model is considered as a model for cluster sampling. The model assumes that the design's covariance matrix is a constant times the covariance under multinomial sampling. The use of this model requires estimating a parameter C that measures the clustering effect. In this paper, a regression estimate for C is obtained, and its approximate distribution is derived using asymptotic techniques. A goodness-of-fit statistic for testing the fit of the Dirichlet-multinomial model is also obtained from those asymptotic techniques. These statistics provide a means of knowing when the data satisfy the model assumption. The results are used to analyze data concerning the authorship of Greek prose.
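A sketch of the clustering-effect parameter: for Dirichlet-multinomial clusters the count variance is C times the multinomial variance, so a simple method-of-moments ratio estimates C. Note this is a stand-in for the paper's regression estimator, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(11)

# Dirichlet-multinomial clusters: m clusters of size n over k categories.
m, n, k = 200, 50, 4
alpha = np.array([2.0, 3.0, 1.0, 4.0]); A = alpha.sum()
C_true = (n + A) / (1 + A)          # variance inflation vs multinomial

probs = rng.dirichlet(alpha, m)     # cluster-specific probabilities
counts = np.array([rng.multinomial(n, q) for q in probs])

# Moment estimate of C: ratio of the observed variance of the category
# counts to the multinomial variance n*p*(1-p), averaged over categories.
p_hat = counts.mean(axis=0) / n
var_obs = counts.var(axis=0, ddof=1)
var_mult = n * p_hat * (1 - p_hat)
C_hat = np.mean(var_obs / var_mult)
print("C estimate:", C_hat, "(true", C_true, ")")
```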
