Similar Literature
20 similar documents found.
1.
The logistic regression model has become a standard tool for investigating the relationship between a binary outcome and a set of potential predictors. When analyzing binary data, it often arises that the observed proportion of zeros is greater than expected under the postulated logistic model. Zero-inflated binomial (ZIB) models have been developed to fit binary data that contain too many zeros. Maximum likelihood estimators in these models have been proposed and their asymptotic properties established. However, several aspects of ZIB models still deserve attention, such as the estimation of odds ratios and event probabilities. In this article, we propose estimators of these quantities and investigate their properties both theoretically and via simulations. Based on these results, we provide recommendations about the range of conditions (minimum sample size, maximum proportion of excess zeros) under which reliable statistical inference on the odds ratios and event probabilities can be obtained in a ZIB regression model. A real-data example illustrates the proposed estimators.
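As a rough illustration of the kind of inference the abstract describes, the sketch below fits a zero-inflated Bernoulli model by maximum likelihood and reads off an odds ratio. The constant inflation probability, the simulated data, and all names are illustrative assumptions, not the paper's setup.

```python
# A minimal sketch of ML estimation in a zero-inflated Bernoulli (ZIB)
# regression, assuming a constant zero-inflation probability p and a
# logistic model for the susceptible part.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, p_true = np.array([-0.5, 1.0]), 0.25

# Simulate: with probability p the response is a structural zero,
# otherwise it follows the logistic model.
susceptible = rng.random(n) > p_true
y = (susceptible & (rng.random(n) < expit(X @ beta_true))).astype(float)

def negloglik(theta):
    beta, p = theta[:-1], expit(theta[-1])   # logit-parameterize p
    p1 = np.clip((1 - p) * expit(X @ beta), 1e-10, 1 - 1e-10)  # P(Y = 1)
    return -np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))

fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
beta_hat, p_hat = fit.x[:-1], expit(fit.x[-1])
odds_ratio = np.exp(beta_hat[1])             # OR for the covariate
print(beta_hat, p_hat, odds_ratio)
```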

2.
An affiliation network is a two-mode social network with two distinct sets of nodes (a set of actors and a set of social events) and edges representing the affiliation of actors with social events. In many affiliation networks the connections between actors and social events carry only binary weights, which cannot reveal the strength of affiliation. Although a number of statistical models have been proposed for binary-weighted affiliation networks, the asymptotic behavior of the maximum likelihood estimator (MLE) in weighted affiliation networks is still unknown or has not been properly explored. In this paper, we study an affiliation model in which the degree sequence is the exclusive natural sufficient statistic in the exponential family of distributions. We derive the consistency and asymptotic normality of the maximum likelihood estimator in affiliation finite discrete weighted networks when the numbers of actors and events both go to infinity. Simulation studies and a real-data example demonstrate our theoretical results.

3.
A birth process model proposed by Dixon and Robinson has been widely used in the football spread-betting market. However, the model permits multiple goals within a single minute, which does not conform to the historical record. Moreover, it is difficult to calculate the outcome probabilities of the process accurately. This article presents a discrete-time, finite-state Markov chain model for real-time forecasting of football matches, and a recursive algorithm is derived to calculate the outcome probabilities exactly. An empirical study shows that the proposed model outperforms the models of Dixon and Robinson and of Dixon and Coles.
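A minimal sketch of the forward recursion that such a discrete-time, finite-state Markov chain permits, assuming constant per-minute scoring probabilities and at most one goal per minute. The probabilities, the 90-minute horizon, and the truncation at ten goals are illustrative assumptions, not the paper's fitted model.

```python
# Forward recursion over the score distribution of a football match:
# each minute at most one goal, by the home side with probability p_home
# or the away side with probability p_away.
import numpy as np

def score_distribution(p_home, p_away, minutes=90, max_goals=10):
    # prob[i, j] = P(current score is i-j); mass beyond max_goals is truncated
    prob = np.zeros((max_goals + 1, max_goals + 1))
    prob[0, 0] = 1.0
    p_none = 1.0 - p_home - p_away
    for _ in range(minutes):
        nxt = p_none * prob
        nxt[1:, :] += p_home * prob[:-1, :]   # home goal: i -> i + 1
        nxt[:, 1:] += p_away * prob[:, :-1]   # away goal: j -> j + 1
        prob = nxt
    return prob

prob = score_distribution(p_home=0.018, p_away=0.012)
p_home_win = np.tril(prob, -1).sum()   # entries with i > j
p_draw = np.trace(prob)
p_away_win = np.triu(prob, 1).sum()    # entries with j > i
print(p_home_win, p_draw, p_away_win)
```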

4.
This article compares the accuracy of the median unbiased estimator with that of the maximum likelihood estimator for a logistic regression model with two binary covariates. The former estimator is shown to be uniformly more accurate than the latter for small to moderately large sample sizes and a broad range of parameter values. In view of the recently developed efficient algorithms for generating exact distributions of sufficient statistics in binary-data problems, these results call for a serious consideration of median unbiased estimation as an alternative to maximum likelihood estimation, especially when the sample size is not large, or when the data structure is sparse.

5.
Whittemore (1981) proposed an approach for calculating the sample size needed to test hypotheses with specified significance and power against a given alternative for logistic regression with small response probability. Based on the distribution of the covariate, which may be either discrete or continuous, this approach first provides a simple closed-form approximation to the asymptotic covariance matrix of the maximum likelihood estimates, and then uses it to calculate the sample size needed to test a hypothesis about the parameter. Self et al. (1992) described a general approach for power and sample size calculations within the framework of generalized linear models, which include logistic regression as a special case. Their approach is based on an approximation to the distribution of the likelihood ratio statistic. Unlike the Whittemore approach, it is not limited to situations of small response probability; however, it is restricted to models with a finite number of covariate configurations. This study compares the two approaches to see how accurate they are for calculating power and sample size in logistic regression models with various response probabilities and covariate distributions. The results indicate that the Whittemore approach has a slight advantage in achieving the nominal power only in one case with small response probability; it is outperformed in all other cases with larger response probabilities. In general, the approach proposed by Self et al. (1992) is recommended for all values of the response probability. However, its extension to logistic regression models with an infinite number of covariate configurations involves an arbitrary decision about categorization and leads to a discrete approximation. As shown in this paper, the examined discrete approximations appear to be sufficiently accurate for practical purposes.
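When neither closed-form approximation applies cleanly, power can always be checked by simulation. The sketch below estimates the power of a Wald test of a single logistic-regression coefficient; the effect sizes, the covariate distribution, and the alpha level are illustrative assumptions.

```python
# Simulation-based power estimation for a Wald test of beta1 in a
# logistic regression with a small response probability.
import numpy as np
import statsmodels.api as sm

def power_logistic(n, beta0, beta1, n_sims=2000, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
        y = rng.binomial(1, p)
        X = sm.add_constant(x)                 # constant goes in column 0
        fit = sm.Logit(y, X).fit(disp=0)
        if fit.pvalues[1] < alpha:             # Wald p-value for beta1
            rejections += 1
    return rejections / n_sims

# e.g. power at n = 200 with a small response probability (beta0 = -3)
print(power_logistic(n=200, beta0=-3.0, beta1=0.8))
```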

6.
The evaluation of decision trees under uncertainty is difficult because of the required nested operations of maximizing and averaging. Pure maximizing (for deterministic decision trees) or pure averaging (for probability trees) are both relatively simple because the maximum of a maximum is a maximum, and the average of an average is an average. But when the two operators are mixed, no simplification is possible, and one must evaluate the maximization and averaging operations in a nested fashion, following the structure of the tree. Nested evaluation requires large sample sizes (for data collection) or long computation times (for simulations).
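A minimal sketch of the nested evaluation the abstract describes: maximization at decision nodes, probability-weighted averaging at chance nodes, recursing along the tree structure. The toy tree and its dictionary encoding are illustrative assumptions.

```python
# Nested evaluation of a decision tree: decision nodes take the maximum
# over actions, chance nodes the probability-weighted average.
def evaluate(node):
    kind = node["type"]
    if kind == "leaf":
        return node["value"]
    if kind == "decision":                      # maximize over choices
        return max(evaluate(child) for child in node["children"])
    if kind == "chance":                        # average over outcomes
        return sum(p * evaluate(child)
                   for p, child in node["children"])
    raise ValueError(f"unknown node type: {kind}")

tree = {"type": "decision", "children": [
    {"type": "chance", "children": [
        (0.6, {"type": "leaf", "value": 100}),
        (0.4, {"type": "leaf", "value": -50}),
    ]},
    {"type": "leaf", "value": 30},              # the safe option
]}
print(evaluate(tree))   # max(0.6*100 + 0.4*(-50), 30) = 40
```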

7.
A hierarchical logit-normal model for the analysis of binary data with extra-binomial variation is examined. A method of approximate maximum likelihood estimation of the parameters is proposed. The method uses the EM algorithm, and approximations that facilitate its implementation are derived. Approximate standard errors of the estimates are provided, and a numerical example illustrates the method.

8.
The conventional phase II trial design paradigm is to make the go/no-go decision within a hypothesis testing framework. Statistical significance alone, however, may not be sufficient to establish that the drug is clinically effective enough to warrant confirmatory phase III trials. We propose the Bayesian optimal phase II trial design with dual-criterion decision making (BOP2-DC), which incorporates both statistical significance and clinical relevance into decision making. Based on the posterior probability that the treatment effect reaches the lower reference value (statistical significance) and the clinically meaningful value (clinical significance), BOP2-DC allows for go/consider/no-go decisions, rather than a binary go/no-go decision. BOP2-DC is highly flexible and accommodates various types of endpoints, including binary, continuous, time-to-event, multiple, and coprimary endpoints, in single-arm and randomized trials. The decision rule of BOP2-DC is optimized to maximize the probability of a go decision when the treatment is effective or to minimize the expected sample size when the treatment is futile. Simulation studies show that the BOP2-DC design yields desirable operating characteristics. Software to implement BOP2-DC is freely available at www.trialdesign.org.
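A minimal sketch of a dual-criterion posterior decision for a binary endpoint under a Beta-Binomial model. The reference values, probability cutoffs, and uniform prior are illustrative assumptions, not the optimized BOP2-DC thresholds.

```python
# Go/consider/no-go decision from the posterior of a response rate with
# a Beta(1, 1) prior: "go" needs high posterior probability of exceeding
# both the lower reference value and the clinically meaningful value.
from scipy.stats import beta

def decide(responses, n, lrv=0.20, cmv=0.35,
           go_cut=0.80, nogo_cut=0.10, a=1.0, b=1.0):
    post = beta(a + responses, b + n - responses)
    p_stat = post.sf(lrv)   # P(rate > lower reference value | data)
    p_clin = post.sf(cmv)   # P(rate > clinically meaningful value | data)
    if p_stat >= go_cut and p_clin >= go_cut:
        return "go"
    if p_clin < nogo_cut:
        return "no-go"
    return "consider"

print(decide(responses=14, n=40))   # observed rate 0.35
```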

9.
In outcome-dependent sampling, the continuous or binary outcome variable in a regression model is available in advance to guide selection of a sample on which explanatory variables are then measured. Selection probabilities may either be a smooth function of the outcome variable or be based on a stratification of the outcome. In many cases, only data from the final sample are accessible to the analyst. A maximum likelihood approach for this data configuration is developed here for the first time. The likelihood for fully general outcome-dependent designs is stated, and then the special case of Poisson sampling is examined in more detail. The maximum likelihood estimator differs from the well-known maximum sample likelihood estimator, and an information bound result shows that the former is asymptotically more efficient. A simulation study suggests that the efficiency difference is generally small. Maximum sample likelihood estimation is therefore recommended in practice when only sample data are available. Some new smooth sample designs show considerable promise.

10.
Binary logistic regression is a commonly used statistical method when the outcome variable is dichotomous. In some situations the explanatory variables of the logit model are correlated, a problem known as multicollinearity. It is known that the variance of the maximum likelihood estimator (MLE) is inflated in the presence of multicollinearity. In this study, we therefore define a new two-parameter ridge estimator for the logistic regression model to decrease the variance and overcome the multicollinearity problem. We compare the new estimator to other well-known estimators by studying their mean squared error (MSE) properties. Moreover, a Monte Carlo simulation is designed to evaluate the performances of the estimators. Finally, a real-data application shows the applicability of the new method. According to the results of the simulation and the real application, the new estimator outperforms the other estimators in all of the situations considered.
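For context, the standard one-parameter ridge (L2-penalized) logistic fit already illustrates the remedy; the paper's two-parameter estimator is not reproduced here. The nearly collinear simulated predictors are an illustrative assumption, and `penalty=None` requires a recent scikit-learn.

```python
# Ridge (L2) logistic regression vs the unpenalized MLE under
# near-collinearity: the penalty shrinks and stabilizes the coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)       # nearly collinear with x1
X = np.column_stack([x1, x2])
y = rng.binomial(1, 1 / (1 + np.exp(-(x1 + x2))))

mle = LogisticRegression(penalty=None, max_iter=1000).fit(X, y)
ridge = LogisticRegression(penalty="l2", C=0.5, max_iter=1000).fit(X, y)
print("MLE coefficients:  ", mle.coef_)    # unstable under collinearity
print("Ridge coefficients:", ridge.coef_)  # shrunken, lower variance
```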

11.
A study to investigate the effect of human immunodeficiency virus (HIV) status on the course of neurological impairment, conducted by the HIV Center at Columbia University, followed a cohort of HIV-positive and HIV-negative gay men for 5 years and assessed the presence or absence of neurological impairment every 6 months. Almost half of the subjects dropped out before the end of the study, for reasons that might have been related to the missing neurological data. We propose likelihood-based methods for analysing such binary longitudinal data under informative and non-informative drop-out. A transition model is assumed for the binary response, and several models are considered for the drop-out process as a function of the response variable (neurological impairment). The likelihood ratio test is used to compare models with informative and non-informative drop-out mechanisms. Using simulations, we investigate the percentage bias and mean-squared error (MSE) of the parameter estimates in the transition model under various assumptions on the drop-out. We find evidence for informative drop-out in the study, and we illustrate that the bias and MSE for the parameters of the transition model are not directly related to the observed drop-out or missing-data rates. The effect of HIV status on neurological impairment is found to be statistically significant under each of the models considered for the drop-out, although the regression coefficient may be biased in certain cases. The presence and relative magnitude of the bias depend on factors such as the probability of drop-out conditional on the presence of neurological impairment and the prevalence of neurological impairment in the population under study.

12.
We introduce a three-parameter extension of the exponential distribution which contains as sub-models the exponential, logistic-exponential and Marshall-Olkin exponential distributions. The new model is very flexible and its associated density function can be decreasing or unimodal. Further, it can produce all of the four major shapes of the hazard rate, that is, increasing, decreasing, bathtub and upside-down bathtub. Given that closed-form expressions are available for the survival and hazard rate functions, the new distribution is quite tractable. It can be used to analyze various types of observations including censored data. Computable representations of the quantile function, ordinary and incomplete moments, generating function and probability density function of order statistics are obtained. The maximum likelihood method is utilized to estimate the model parameters. A simulation study is carried out to assess the performance of the maximum likelihood estimators. Two real data sets are used to illustrate the applicability of the proposed model.

13.
In this article, we consider a two-phase tandem queueing model with a second optional service. In this model, service is delivered in two phases. The first phase of service is essential for all customers; after completing the first phase, a customer receives the second phase of service with probability α or leaves the system with probability 1 − α. There are two heterogeneous servers that work independently, one providing the first phase of service and the other the second. Our main purpose is to estimate the parameters of the model, the traffic intensity, and the mean system size, in the steady state, via maximum likelihood and Bayesian methods. Furthermore, we find asymptotic confidence intervals for the mean system size. Finally, by a simulation study, we compute the coverage probabilities and mean lengths of the asymptotic confidence intervals for the mean system size at a nominal level of 0.95.
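A minimal sketch of closed-form MLEs for the model's primitives under exponential assumptions, with plug-in estimates of the traffic intensities. The rates, the opt-in probability, and the simulation itself are illustrative assumptions, not the paper's estimators.

```python
# MLEs for a two-phase tandem queue with a second optional service:
# exponential inter-arrival and service times (rates lam, mu1, mu2) and
# opt-in probability alpha for the second phase.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
lam, mu1, mu2, alpha = 2.0, 5.0, 4.0, 0.6

interarrivals = rng.exponential(1 / lam, n)
service1 = rng.exponential(1 / mu1, n)          # essential first phase
opt_in = rng.random(n) < alpha                  # who takes phase two
service2 = rng.exponential(1 / mu2, opt_in.sum())

lam_hat = n / interarrivals.sum()               # MLE of an exponential rate
mu1_hat = n / service1.sum()
mu2_hat = opt_in.sum() / service2.sum()
alpha_hat = opt_in.mean()                       # MLE of the opt-in probability

rho1 = lam_hat / mu1_hat                        # traffic intensity, server 1
rho2 = alpha_hat * lam_hat / mu2_hat            # only opt-in customers arrive
print(lam_hat, mu1_hat, mu2_hat, alpha_hat, rho1, rho2)
```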

14.
This article addresses two methods of estimating the probability density function (PDF) and cumulative distribution function (CDF) of the Lindley distribution: the uniformly minimum variance unbiased estimator (UMVUE) and the maximum likelihood estimator (MLE). Since the Lindley distribution is more flexible than the exponential distribution, the same estimators are also derived for the exponential distribution and compared. Monte Carlo simulations and a real-data analysis are performed to compare the performances of the proposed methods of estimation.
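A minimal sketch of plug-in ML estimation of the Lindley PDF and CDF: the MLE of θ solves a quadratic and therefore has a closed form. The simulated sample, drawn via the standard exponential/gamma mixture representation of the density, is an illustrative assumption.

```python
# Closed-form Lindley MLE and plug-in PDF/CDF estimates.
import numpy as np

def lindley_mle(x):
    m = x.mean()   # the MLE solves m*t**2 + (m - 1)*t - 2 = 0
    return (-(m - 1) + np.sqrt((m - 1) ** 2 + 8 * m)) / (2 * m)

def lindley_pdf(x, t):
    return t ** 2 / (t + 1) * (1 + x) * np.exp(-t * x)

def lindley_cdf(x, t):
    return 1 - (1 + t * x / (t + 1)) * np.exp(-t * x)

# Simulate Lindley(theta) as a mixture: Exp(theta) w.p. theta/(theta+1),
# otherwise Gamma(2, 1/theta) -- a standard representation of the density.
rng = np.random.default_rng(4)
theta, n = 1.5, 1000
mix = rng.random(n) < theta / (theta + 1)
x = np.where(mix, rng.exponential(1 / theta, n), rng.gamma(2, 1 / theta, n))

t_hat = lindley_mle(x)
print(t_hat, lindley_pdf(1.0, t_hat), lindley_cdf(1.0, t_hat))
```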

15.
In this paper, we suggest a technique to quantify model risk, particularly model misspecification, for binary response regression problems found in financial risk management, such as credit risk modelling. We choose the probability-of-default model as one instance of the many credit risk models that may be misspecified in a financial institution. To illustrate model misspecification for the probability of default, we quantify two specific statistical predictive response techniques, namely binary logistic regression and the complementary log–log model. Maximum likelihood is employed for parameter estimation. Statistical inference, specifically goodness of fit and model performance measures, is assessed. Using a simulated dataset and the Taiwan credit card default dataset, our findings reveal that with the same sample size and a very small number of simulation iterations, the two techniques produce similar goodness-of-fit results but completely different performance measures. However, as the number of iterations increases, binary logistic regression on the balanced dataset shows markedly better goodness of fit and performance measures than the complementary log–log technique for both the simulated and real datasets.
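A minimal sketch of fitting the two competing links with statsmodels and comparing goodness of fit by deviance and AIC. The simulated predictor, the coefficients, and the choice of these particular fit measures are illustrative assumptions; the CamelCase link name follows recent statsmodels.

```python
# Logit vs complementary log-log link for a binary default outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)
p = 1 / (1 + np.exp(-(-2.0 + 1.2 * x)))   # true model uses a logit link
y = rng.binomial(1, p)

logit_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()  # logit default
cloglog_fit = sm.GLM(y, X, family=sm.families.Binomial(
    link=sm.families.links.CLogLog())).fit()

# Compare goodness of fit via deviance / AIC, as in the abstract.
print("logit   deviance, AIC:", logit_fit.deviance, logit_fit.aic)
print("cloglog deviance, AIC:", cloglog_fit.deviance, cloglog_fit.aic)
```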

16.
Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the two models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing multiple models for the outcome and/or the response probability, and are consistent if at least one of the multiple models is correctly specified. We propose a robust quasi-randomization-based model approach that offers more protection against model misspecification than the existing DR and MR estimators, where multiple semiparametric, nonparametric, or machine learning models can be used for the outcome variable. The proposed estimator achieves unbiasedness by using a subsampling Rao–Blackwell method, given cell-homogeneous response, regardless of any working models for the outcome. An unbiased variance estimation formula is proposed, which does not require replicate jackknife or bootstrap methods. A simulation study shows that our proposed method outperforms the existing multiply robust estimators.
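For context, the classic doubly robust (AIPW) estimator that this literature builds on can be sketched in a few lines; the simulated data and the linear/logistic working models are illustrative assumptions, not the paper's proposal.

```python
# Augmented inverse probability weighting (AIPW) estimate of a mean under
# nonresponse: combines an outcome regression with a response propensity
# model and is consistent if either working model is correct.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)
y = 2 + 1.5 * x + rng.normal(size=n)            # full outcome
p_resp = 1 / (1 + np.exp(-(0.5 + x)))           # true response probability
r = rng.binomial(1, p_resp).astype(bool)        # response indicator

X = x.reshape(-1, 1)
m_hat = LinearRegression().fit(X[r], y[r]).predict(X)           # outcome model
pi_hat = LogisticRegression().fit(X, r).predict_proba(X)[:, 1]  # response model

y_obs = np.where(r, y, 0.0)                     # nonrespondents contribute 0
aipw = np.mean(m_hat + r * (y_obs - m_hat) / pi_hat)
print(aipw, y.mean())   # AIPW estimate vs the (unobservable) true mean
```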

17.
Single-arm one- or multi-stage study designs are commonly used in phase II oncology development when the primary outcome of interest is tumor response, a binary variable. Both two- and three-outcome designs are available. The Simon two-stage design is a well-known example of a two-outcome design. The objective of a two-outcome trial is to reject either the null hypothesis that the objective response rate (ORR) is less than or equal to a pre-specified, uninterestingly low rate, or the alternative hypothesis that the ORR is greater than or equal to some target rate. The three-outcome designs proposed by Sargent et al. allow a middle gray decision zone that rejects neither hypothesis, in order to reduce the required study size. We propose new two- and three-outcome designs with continual monitoring based on Bayesian posterior probabilities that meet frequentist specifications such as type I and II error rates. Futility and/or efficacy boundaries are based on confidence functions, which can require higher levels of evidence for early versus late stopping and have clear, intuitive interpretations. Within this class of procedures we search for optimal designs that minimize a given loss function, such as the average sample size under the null hypothesis. We present several examples, compare our design with other procedures in the literature, and show that it has good operating characteristics.

18.
In longitudinal studies, missing data are the rule, not the exception. We consider the analysis of longitudinal binary data with non-monotone missingness that is thought to be non-ignorable. In this setting, a full likelihood approach is algebraically complicated and can be computationally prohibitive when there are many measurement occasions. We propose a 'protective' estimator that assumes that the probability that a response is missing at any occasion depends, in a completely unspecified way, on the value of that variable alone. Relying on this 'protectiveness' assumption, we describe a pseudolikelihood estimator of the regression parameters under non-ignorable missingness, without having to model the missing data mechanism directly. The proposed method is applied to CD4 cell count data from two longitudinal clinical trials of patients infected with the human immunodeficiency virus.

19.
In some clinical, environmental, or economic studies, researchers are interested in a semi-continuous outcome variable that takes the value zero with a discrete probability and has a continuous distribution over the non-zero values. Due to the measuring mechanism, it is not always possible to fully observe some outcomes, and only an upper bound is recorded. We call such data left-censored and observe only the maximum of the outcome and an independent censoring variable, together with an indicator. In this article, we introduce a mixture semi-parametric regression model. We use a parametric model to investigate the influence of covariates on the discrete probability of the value zero. For the non-zero part of the outcome, a semi-parametric Cox regression model is used to study the conditional hazard function. The different parameters in this mixture model are estimated by a likelihood method, with the infinite-dimensional baseline hazard function estimated by a step function. We establish the identifiability of the model and the consistency of the estimators of the different parameters. We study the finite-sample behaviour of the estimators through a simulation study and illustrate the model on a practical data example.

20.
Probabilistic Principal Component Analysis
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
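A minimal sketch of the closed-form ML solution (rather than the iterative EM algorithm the paper also gives): the noise variance is the mean of the discarded sample-covariance eigenvalues, and the weight matrix is built from the leading eigenpairs, up to an arbitrary rotation. The simulated data are an illustrative assumption.

```python
# Closed-form maximum likelihood solution for probabilistic PCA
# (Tipping & Bishop): sigma2 = average of discarded eigenvalues,
# W = U_q (L_q - sigma2 I)^{1/2}.
import numpy as np

rng = np.random.default_rng(7)
n, d, q = 500, 5, 2                      # samples, data dim, latent dim
W_true = rng.normal(size=(d, q))
Z = rng.normal(size=(n, q))
X = Z @ W_true.T + 0.3 * rng.normal(size=(n, d))

Xc = X - X.mean(axis=0)                  # center the data
S = Xc.T @ Xc / n                        # sample covariance
evals, evecs = np.linalg.eigh(S)         # ascending eigenvalues
evals, evecs = evals[::-1], evecs[:, ::-1]

sigma2 = evals[q:].mean()                # ML noise variance
W_ml = evecs[:, :q] @ np.diag(np.sqrt(evals[:q] - sigma2))
print(sigma2)                            # should be close to 0.3**2
print(W_ml)                              # spans the ML principal subspace
```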
