Similar Literature
20 similar documents retrieved (search time: 31 ms)
1.
Summary In the log-linear model for bivariate probability functions, the conditional and joint probabilities have a simple form. This property makes the log-linear parametrization useful when modeling these probabilities is the focus of the investigation. In contrast, in the log-linear representation of bivariate probability functions, the marginal probabilities have a complex form, so log-linear models are not useful when the marginal probabilities are of particular interest. In this paper these statements are discussed, and a model obtained from the log-linear one by imposing suitable constraints on the marginal probabilities is introduced. This work was supported by a M.U.R.S.T. grant.

2.
The N-mixture model proposed by Royle in 2004 may be used to approximate the abundance and detection probability of animal species in a given region. In 2006, Royle and Dorazio discussed the advantages of using a Bayesian approach in modelling animal abundance and occurrence using a hierarchical N-mixture model. N-mixture models assume replication on sampling sites, an assumption that may be violated when the site is not closed to changes in abundance during the survey period or when nominal replicates are defined spatially. In this paper, we studied the robustness of a Bayesian approach to fitting the N-mixture model for pseudo-replicated count data. Our simulation results showed that the Bayesian estimates for abundance and detection probability are slightly biased when the actual detection probability is small and are sensitive to the presence of extra variability within local sites.
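For readers unfamiliar with the setup, a minimal sketch of the basic N-mixture model (latent abundance N_i ~ Poisson(λ) at site i, replicated counts y_ij ~ Binomial(N_i, p)) and its marginal likelihood is given below. This illustrates the standard Royle (2004) formulation fitted by simple maximum likelihood, not the authors' Bayesian simulation code; the parameter values and truncation point are arbitrary.

```python
import numpy as np
from scipy.stats import poisson, binom
from scipy.optimize import minimize

rng = np.random.default_rng(1)
R, T = 50, 4                  # sites and replicate visits (illustrative values)
lam_true, p_true = 5.0, 0.4

N = rng.poisson(lam_true, size=R)                    # latent site abundances
y = rng.binomial(N[:, None], p_true, size=(R, T))    # replicated counts

def neg_log_lik(params, y, K=100):
    """Marginal N-mixture likelihood, summing the latent N out up to truncation K."""
    lam, p = params
    ll = 0.0
    for yi in y:
        n_vals = np.arange(yi.max(), K + 1)          # N must be >= the max count seen
        site_lik = np.sum(poisson.pmf(n_vals, lam) *
                          np.prod(binom.pmf(yi[:, None], n_vals, p), axis=0))
        ll += np.log(site_lik)
    return -ll

fit = minimize(neg_log_lik, x0=[3.0, 0.5], args=(y,),
               bounds=[(0.1, 50.0), (0.01, 0.99)])
print("lambda_hat, p_hat =", fit.x)
```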

3.
This paper presents an analysis of the effect of various baseball play-off configurations on the probability of advancing to the World Series. Play-off games are assumed to be independent. Several paired-comparison models are considered for modeling the probability of a home team winning a single game as a function of the winning percentages of the contestants over the course of the season. The uniform and logistic regression models are both adequate, whereas the Bradley-Terry model (modified for within-pair order effects, i.e. the home-field advantage) is not. The single-game probabilities are then used to compute the probability of winning the play-offs under various structures. The extra round of play-offs, instituted in 1994, significantly lowers the probability of the team with the best record advancing to the World Series, whereas home-field advantage and the different possible play-off draws have a minimal effect.
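As a concrete illustration of the last step, the probability of winning a best-of-n series can be computed from a single-game win probability under the paper's independence assumption. The sketch below ignores home-field advantage and uses an arbitrary per-game probability, so it is illustrative rather than a reproduction of the paper's calculations.

```python
from math import comb

def series_win_prob(p, n_games=7):
    """P(winning a best-of-n series) with i.i.d. per-game win probability p."""
    need = n_games // 2 + 1            # wins required (4 in a best-of-7)
    # Clinch in game k: exactly need-1 wins in the first k-1 games, then a win.
    return sum(comb(k - 1, need - 1) * p**need * (1 - p)**(k - need)
               for k in range(need, n_games + 1))

print(series_win_prob(0.55, 7))   # ~0.608: a 55% team wins a best-of-7
print(series_win_prob(0.55, 5))   # ~0.593: an extra, shorter round erodes the edge
```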

4.
In this paper we investigate the impact of model mis-specification, in terms of the dependence structure in the extremes of a spatial process, on the estimation of key quantities that are of interest to hydrologists and engineers. For example, it is often the case that severe flooding occurs as a result of the observation of rainfall extremes at several locations in a region simultaneously. Thus, practitioners might be interested in estimates of the joint exceedance probability of some high levels across these locations. It is likely that there will be spatial dependence present between the extremes, and this should be properly accounted for when estimating such probabilities. We compare the use of standard models from the geostatistics literature with max-stable models from extreme value theory. We find that, in some situations, using an incorrect spatial model for our extremes results in a significant under-estimation of these probabilities which, in flood defence terms, could lead to substantial under-protection.
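To see why the choice of dependence model matters, consider two sites whose annual maxima share the marginal distribution F. For a max-stable pair with extremal coefficient θ ∈ [1, 2] (θ = 2 corresponding to independence), P{X_1 ≤ u, X_2 ≤ u} = F(u)^θ, so by inclusion-exclusion the joint exceedance probability is

$$P\{X_1>u,\;X_2>u\} \;=\; 1-2F(u)+F(u)^{\theta}.$$

For high u this can be orders of magnitude larger under strong dependence (θ near 1, giving roughly 1 − F(u)) than under independence (θ = 2, giving (1 − F(u))^2), so a model that understates the dependence understates the probability. This standard max-stable identity is added here for context; it is not quoted from the paper.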

5.
With the continuing development of big data and the Internet, web surveys have become increasingly widespread. Most web-survey samples are non-probability samples, to which traditional sampling-inference theory is difficult to apply, so solving the inference problem for web-survey samples is an urgent need for the development of web surveys in the big-data era. This paper is the first to propose, from a modeling perspective, basic strategies for solving this problem: first, model-based inference on the inclusion probabilities, where a propensity-score model built with machine learning and variable selection can be used to estimate the inclusion probabilities and infer the population; second, model-based inference on the target variable, where parametric, nonparametric or semiparametric superpopulation models can be fitted to the target variable directly; and third, dual modeling of the inclusion probabilities and the target variable, where weighted estimation and hybrid inference combining the propensity-score model and the superpopulation model can be considered. Finally, inclusion-probability modeling based on the generalized Boosted model is used as an example to demonstrate the approach.
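A hedged sketch of the first strategy (inclusion-probability modeling) is given below: the non-probability web sample is pooled with a reference probability sample, a boosted classifier estimates each web unit's inclusion propensity, and inverse-propensity weights are formed. The function name and the use of scikit-learn's GradientBoostingClassifier in place of the paper's generalized Boosted model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def ipw_mean(X_web, y_web, X_ref):
    """Inverse-propensity-weighted mean of y for a non-probability web sample.

    X_web, y_web: covariates and outcome for the web (non-probability) sample.
    X_ref: covariates for a reference probability sample (hypothetical input).
    """
    X = np.vstack([X_web, X_ref])
    z = np.r_[np.ones(len(X_web)), np.zeros(len(X_ref))]   # 1 = web-sample member
    clf = GradientBoostingClassifier().fit(X, z)            # boosted propensity model
    p = clf.predict_proba(X_web)[:, 1]                      # estimated inclusion propensities
    w = (1.0 - p) / p                                       # pseudo design weights
    return np.sum(w * y_web) / np.sum(w)
```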

6.
Clustered binary responses are often found in ecological studies. Data analysis may include modeling the marginal probability response. However, when the association is the main scientific focus, modeling the correlation structure between pairs of responses is the key part of the analysis. Second-order generalized estimating equations (GEE) are established in the literature, and some of them are more efficient in computational terms, especially when facing large clusters. Alternating logistic regression (ALR) and orthogonalized residual (ORTH) GEE methods are presented and compared in this paper. Simulation results show a slight superiority of ALR over ORTH. Marginal probabilities and odds ratios are also estimated and compared in a real ecological study involving a three-level hierarchical clustering. ALR and ORTH models are useful for modeling complex association structures with large cluster sizes.

7.
In modeling defect counts collected from established manufacturing processes, there are usually a relatively large number of zeros (non-defects). Commonly used models such as the Poisson or geometric distributions can underestimate the zero-defect probability and hence make it difficult to identify significant covariate effects for improving production quality. This article introduces a flexible class of zero-inflated models which includes familiar models, such as the zero-inflated Poisson (ZIP) model, as special cases. A Bayesian estimation method is developed as an alternative to the traditionally used maximum-likelihood-based methods for analyzing such data. Simulation studies show that the proposed method has better finite-sample performance than the classical method, with tighter interval estimates and better coverage probabilities. A real-life data set is analyzed to illustrate the practicability of the proposed method, which is easily implemented using WinBUGS.
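For reference, the zero-inflated Poisson special case mentioned above mixes a point mass at zero (with probability π) with a Poisson(λ) count, so that

$$P(Y=0)=\pi+(1-\pi)e^{-\lambda},\qquad P(Y=k)=(1-\pi)\,\frac{e^{-\lambda}\lambda^{k}}{k!},\quad k\ge 1,$$

which shows directly how a plain Poisson model (π = 0) can underestimate the zero-defect probability.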

8.
Probability forecasting models can be estimated using weighted score functions that (by definition) capture the performance of the estimated probabilities relative to arbitrary “baseline” probability assessments, such as those produced by another model, by a bookmaker or betting market, or by a human probability assessor. Maximum likelihood estimation (MLE) is interpretable as just one such method of optimum score estimation. We find that when MLE-based probabilities are themselves treated as the baseline, forecasting models estimated by optimizing any of the proper families of power and pseudospherical economic score functions yield the very same probabilities as MLE. The finding that probabilities estimated by optimum score estimation respond to MLE-baseline probabilities by mimicking them supports reliance on MLE as the default form of optimum score estimation.
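The special case that motivates the finding is the logarithmic score: relative to baseline probabilities q, the average log score of model probabilities p over observations y_1, …, y_n is

$$\bar S(p;q)=\frac{1}{n}\sum_{t=1}^{n}\log\frac{p(y_t)}{q(y_t)}=\frac{1}{n}\bigl[\ell(p)-\ell(q)\bigr],$$

so the baseline term is a constant in p and optimizing the score reproduces MLE exactly. The paper's result extends this mimicry to the power and pseudospherical score families when the baseline is itself the MLE fit. (The identity is standard and is added here for context.)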

9.
Abstract Occupancy models are used in statistical ecology to estimate species dispersion. The two components of an occupancy model are the detection and occupancy probabilities, with the main interest being in the occupancy probabilities. We show that for the homogeneous occupancy model there is an orthogonal transformation of the parameters that gives a natural two-stage inference procedure based on a conditional likelihood. We then extend this to a partial likelihood that gives explicit estimators of the model parameters. By allowing the separate modeling of the detection and occupancy probabilities, the extension of the two-stage approach to more general models has the potential to simplify the computational routines used there.
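For concreteness, the homogeneous occupancy model referred to here has occupancy probability ψ and detection probability p, and with y_i detections in T visits to site i its likelihood is the standard zero-inflated binomial form (stated here for context, not taken from the paper):

$$L(\psi,p)=\prod_{i:\,y_i>0}\psi\binom{T}{y_i}p^{y_i}(1-p)^{T-y_i}\;\times\;\prod_{i:\,y_i=0}\bigl[\psi(1-p)^{T}+(1-\psi)\bigr].$$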

10.
A method based on pseudo-observations has been proposed for direct regression modeling of functionals of interest with right-censored data, including the survival function, the restricted mean and the cumulative incidence function in competing risks. The models, once the pseudo-observations have been computed, can be fitted using standard generalized estimating equation software. Regression models can, however, yield problematic results if the number of covariates is large in relation to the number of events observed. Guidelines on the number of events per variable are often used in practice; these rules of thumb have primarily been established through simulation studies for the logistic and Cox regression models. In this paper we conduct a simulation study to examine the small-sample behavior of the pseudo-observation method for estimating risk differences and relative risks with right-censored data. We investigate how the coverage probabilities and relative bias of the pseudo-observation estimator interact with sample size, number of variables and average number of events per variable.
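The pseudo-observations themselves are jackknife-type quantities. For a functional θ estimated by θ̂ from all n subjects (e.g. the Kaplan–Meier estimate of S(t), or a cumulative incidence at time t), the pseudo-observation for subject i is

$$\hat\theta_i \;=\; n\,\hat\theta \;-\; (n-1)\,\hat\theta^{(-i)},$$

where θ̂^{(-i)} is the estimate with subject i removed; the θ̂_i then replace the incompletely observed responses in a standard GEE fit. (This is the standard definition, added for context rather than quoted from the paper.)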

11.
Finite mixture models are currently used to analyze heterogeneous longitudinal data. By relaxing the homogeneity restriction of nonlinear mixed-effects (NLME) models, finite mixture models not only estimate model parameters but also cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and it might be associated with a clinically important binary outcome. This article develops, under a Bayesian framework, a joint model of a finite mixture of NLME models for longitudinal data in the presence of covariate measurement errors and a logistic regression for a binary outcome, linked by individual latent class indicators. Simulation studies are conducted to assess the performance of the proposed joint model and of a naive two-step model in which the finite mixture model and the logistic regression are fitted separately, followed by an application to a real data set from an AIDS clinical trial, in which the viral dynamics and the dichotomized time to the first decline of the CD4/CD8 ratio are analyzed jointly.

12.
Whittemore (1981) proposed an approach for calculating the sample size needed to test hypotheses with specified significance and power against a given alternative for logistic regression with small response probability. Based on the distribution of the covariate, which may be either discrete or continuous, this approach first provides a simple closed-form approximation to the asymptotic covariance matrix of the maximum likelihood estimates, and then uses it to calculate the sample size needed to test a hypothesis about the parameter. Self et al. (1992) described a general approach for power and sample size calculations within the framework of generalized linear models, which include logistic regression as a special case. Their approach is based on an approximation to the distribution of the likelihood ratio statistic. Unlike the Whittemore approach, their approach is not limited to situations of small response probability; however, it is restricted to models with a finite number of covariate configurations. This study compares the two approaches to see how accurate they are for power and sample size calculations in logistic regression models with various response probabilities and covariate distributions. The results indicate that the Whittemore approach has a slight advantage in achieving the nominal power only in one case with small response probability, and is outperformed in all other cases with larger response probabilities. In general, the approach proposed by Self et al. (1992) is recommended for all values of the response probability. However, its extension to logistic regression models with an infinite number of covariate configurations involves an arbitrary decision for categorization and leads to a discrete approximation. As shown in this paper, the examined discrete approximations appear to be sufficiently accurate for practical purposes.

13.
We present a mathematical theory of objective, frequentist chance phenomena that uses as a model a set of probability measures. In this work, sets of measures are not viewed as a statistical compound hypothesis or as a tool for modeling imprecise subjective behavior. Instead we use sets of measures to model stable (although not stationary in the traditional stochastic sense) physical sources of finite time series data that have highly irregular behavior. Such models give a coarse-grained picture of the phenomena, keeping track of the range of the possible probabilities of the events. We present methods to simulate finite data sequences coming from a source modeled by a set of probability measures, and to estimate the model from finite time series data. The estimation of the set of probability measures is based on the analysis of a set of relative frequencies of events taken along subsequences selected by a collection of rules. In particular, we provide a universal methodology for finding a family of subsequence selection rules that can estimate any set of probability measures with high probability.

14.
Dose-response studies arise in many medical applications. Often, such studies are considered within the framework of binary-response experiments, such as success-failure. In such cases, popular choices for modeling the probability of response are logistic or probit models. Design optimality has been well studied for the logistic model with a continuous covariate. A natural extension of the logistic model is to consider the presence of a qualitative classifier. In this work, we explore D-, A-, and E-optimal designs in a two-parameter, binary logistic regression model after introducing a binary, qualitative classifier with independent levels.
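As background for the design criteria, a sketch of the information matrix for the two-parameter logistic model P(y = 1 | x) = 1/(1 + e^{-(α+βx)}) is shown below; the D-, A-, and E-criteria are functionals of this matrix. The code is a generic illustration with arbitrary parameter values, not the authors' design algorithm, and it omits the qualitative classifier.

```python
import numpy as np

def logistic_info(xs, ws, alpha, beta):
    """Fisher information of an approximate design {(x, w)} for the
    two-parameter logistic model P(y=1|x) = 1/(1+exp(-(alpha+beta*x)))."""
    M = np.zeros((2, 2))
    for x, w in zip(xs, ws):
        p = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))
        f = np.array([1.0, x])                    # regressors of the linear predictor
        M += w * p * (1.0 - p) * np.outer(f, f)   # binomial variance weighting
    return M

M = logistic_info(xs=[-1.0, 1.0], ws=[0.5, 0.5], alpha=0.0, beta=1.0)
print("D-criterion:", np.linalg.det(M))             # maximize det(M)
print("A-criterion:", np.trace(np.linalg.inv(M)))   # minimize trace(M^{-1})
print("E-criterion:", np.linalg.eigvalsh(M).min())  # maximize smallest eigenvalue
```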

15.
In disease mapping, health outcomes measured at the same spatial locations may be correlated, so one can consider jointly modeling the multivariate health outcomes while accounting for their dependence. General approaches for joint modeling include shared component models and multivariate models. An alternative way to model the association between two health outcomes, when one outcome can naturally serve as a covariate of the other, is to use an ecological regression model. For example, in our application, preterm birth (PTB) can be treated as a predictor for low birth weight (LBW) and vice versa. We therefore propose to blend ideas from joint modeling and ecological regression in order to jointly model the relative risks for LBW and PTB over the health districts in Saskatchewan, Canada, in 2000–2010. This approach is helpful when proxies for areal-level contextual factors can be derived from the outcomes themselves and direct information on risk factors is not readily available. Our results indicate that the proposed approach improves the model fit compared with conventional joint modeling methods. Further, we show that when no strong spatial autocorrelation is present, joint outcome modeling using only independent error terms can still provide a better fit than separate modeling.

16.
SUMMARY A number of models have been examined for modelling probability based on rankings. Most prominent among these are the gamma and normal probability models. The accuracy of these models in predicting the outcomes of horse races is investigated in this paper. The parameters of the models are estimated by the maximum likelihood method, using the information on win pool fractions. The models are used to estimate the probabilities that race entrants finish second or third in a race. These probabilities are then compared with the corresponding objective probabilities estimated from actual race outcomes. The data are obtained from over 15 000 races. It is found that all the models tend to overestimate the probability of a horse finishing second or third when the horse has a high probability of such a result, but underestimate this probability when it is low.

17.
Distance sampling and capture–recapture are the two most widely used wildlife abundance estimation methods. Capture–recapture methods have only recently incorporated models for spatial distribution, and there is an increasing tendency for distance sampling methods to incorporate spatial models rather than rely on partly design-based spatial inference. In this overview we show how spatial models are central to modern distance sampling and that spatial capture–recapture models arise as an extension of distance sampling methods. Depending on the type of data recorded, they can be viewed as particular kinds of hierarchical binary regression, Poisson regression, survival or time-to-event models, with individuals' locations as latent variables and a spatial model as the latent variable distribution. Incorporation of spatial models in these two methods provides new opportunities for drawing explicitly spatial inferences. Areas of likely future development include more sophisticated spatial and spatio-temporal modelling of individuals' locations and movements, new methods for integrating spatial capture–recapture and other kinds of ecological survey data, and methods for dealing with the recapture uncertainty that often arises when "capture" consists of detection by a remote device such as a camera trap or microphone.

18.
Zero-inflated Poisson regression is commonly used to analyze data with excessive zeros. Although many models have been developed to fit zero-inflated data, most of them strongly depend on special features of the individual data; for example, new models are needed when the data are both truncated and inflated. In this paper, we propose a model that is sufficiently flexible to handle inflation and truncation simultaneously: a mixture of a multinomial logistic and a truncated Poisson regression, in which the multinomial logistic component models the occurrence of the excessive counts and the truncated Poisson component models the counts assumed to follow a truncated Poisson distribution. The performance of our proposed model is evaluated through simulation studies, and it is found to have the smallest mean absolute error and the best model fit. In the empirical example, the data are truncated with inflated values of zero and fourteen, and the results show that our model fits better than the competing models.
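A schematic version of this kind of mixture, written for the empirical example with inflation at 0 and 14 and Poisson support truncated to a set 𝒯, is (the paper's exact parameterization may differ):

$$P(Y=y)=\pi_{0}\,\mathbf{1}(y=0)+\pi_{14}\,\mathbf{1}(y=14)+(1-\pi_{0}-\pi_{14})\,\frac{e^{-\lambda}\lambda^{y}/y!}{\sum_{k\in\mathcal{T}}e^{-\lambda}\lambda^{k}/k!},\qquad y\in\mathcal{T},$$

with the mixing probabilities (π_0, π_14) driven by the multinomial logistic component and λ by the truncated Poisson regression component.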

19.
Multi-state capture-recapture models can be used to estimate survival rates in populations that are stratified by location or by state variables associated with individual animals. In populations stratified by location, movement probabilities can be estimated and used to test hypotheses relevant to population genetics and evolutionary ecology. When the interest is in state variables, these models permit estimation and testing of hypotheses about state-specific survival probabilities. If the state variable of interest is reproductive activity or success, then the multi-state modeling approach can be used to test hypotheses about life history trade-offs and a possible cost of reproduction.

20.
A novel class of hierarchical nonparametric Bayesian survival regression models for time-to-event data with uninformative right censoring is introduced. The survival curve is modeled as a random function whose prior distribution is defined using the beta-Stacy (BS) process. The prior mean of each survival probability and its prior variance are linked to a standard parametric survival regression model. This nonparametric survival regression can thus be anchored to any reference parametric form, such as a proportional hazards or an accelerated failure time model, allowing substantial departures of the predictive survival probabilities when the reference model is not supported by the data. Also, under this formulation the predictive survival probabilities will be close to the empirical survival distribution near the mode of the reference model and they will be shrunken towards its probability density in the tails of the empirical distribution.
