期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

On Baseline Conditions for Zero-Inflated Longitudinal Count Data

Antonello Maruotti Valentina Raponi 《统计学通讯:模拟与计算》2013,42(4):743-760

We describe a mixed-effect hurdle model for zero-inflated longitudinal count data, where a baseline variable is included in the model specification. Association between the count data process and the endogenous baseline variable is modeled through a latent structure, assumed to be dependent across equations. We show how model parameters can be estimated in a finite mixture context, allowing for overdispersion, multivariate association and endogeneity of the baseline variable. The model behavior is investigated through a large-scale simulation experiment. An empirical example on health care utilization data is provided. 相似文献

2.

Economic design based on a multivariate Bayesian VSI chart with a dual control limit

Xing Zhang Chao Tan 《统计学通讯:理论与方法》2017,46(14):6808-6822

In this paper, a multivariate Bayesian variable sampling interval (VSI) control chart for the economic design and optimization of statistical parameters is designed. Based on the VSI sampling strategy of a multivariate Bayesian control chart with dual control limits, the optimal expected cost function is constructed. The proposed model allows the determination of the scheme parameters that minimize the expected cost per time of the process. The effectiveness of the Bayesian VSI chart is estimated through economic comparisons with the Bayesian fixed sampling interval and the Hotelling's T² chart. This study is an in-depth study on a Bayesian multivariate control chart with variable parameter. Furthermore, it is shown that significant cost improvement may be realized through the new model. 相似文献

3.

Bayesian variable selection for multioutcome models through shared shrinkage

Debamita Kundu Riten Mitra Jeremy T. Gaskins 《Scandinavian Journal of Statistics》2021,48(1):295-320

Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector with a few nonzero components, those covariates that are most important. This article extends the “global‐local” shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. Here, we have developed a variable selection method for a K‐outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for all regression coefficients is a mean zero normal with coefficient‐specific variance term that consists of a predictor‐specific factor (shared local shrinkage parameter) and a model‐specific factor (global shrinkage term) that differs in each model. The performance of our modeling approach is evaluated through simulation studies and a data example. 相似文献

4.

A computational framework for variable selection in multivariate regression

Bruce E. Barrett J. Brian Gray 《Statistics and Computing》1994,4(3):203-212

Stepwise variable selection procedures are computationally inexpensive methods for constructing useful regression models for a single dependent variable. At each step a variable is entered into or deleted from the current model, based on the criterion of minimizing the error sum of squares (SSE). When there is more than one dependent variable, the situation is more complex. In this article we propose variable selection criteria for multivariate regression which generalize the univariate SSE criterion. Specifically, we suggest minimizing some function of the estimated error covariance matrix: the trace, the determinant, or the largest eigenvalue. The computations associated with these criteria may be burdensome. We develop a computational framework based on the use of the SWEEP operator which greatly reduces these calculations for stepwise variable selection in multivariate regression. 相似文献

5.

A mixture model for the detection of Neosporosis without a gold standard

Andrés Farall Tomás Tetzlaff 《Journal of applied statistics》2011,38(5):913-926

Neosporosis is a bovine disease caused by the parasite Neospora caninum. It is not yet sufficiently studied, and it is supposed to cause an important number of abortions. Its clinical symptoms do not yet allow the reliable identification of infected animals. Its study and treatment would improve if a test based on antibody counts were available. Knowing the distribution functions of observed counts of uninfected and infected cows would allow the determination of a cutoff value. These distributions cannot be estimated directly. This paper deals with the indirect estimation of these distributions based on a data set consisting of the antibody counts for some 200 pairs of cows and their calves. The desired distributions are estimated through a mixture model based on simple assumptions that describe the relationship between each cow and its calf. The model then allows the estimation of the cutoff value and of the error probabilities. 相似文献

6.

High breakdown point robust estimators with missing data

Florencia Statti Victor J. Yohai 《统计学通讯:理论与方法》2018,47(21):5145-5162

In this paper, we propose a new procedure to estimate the distribution of a variable y when there are missing data. To compensate the presence of missing responses, it is assumed that a covariate vector x is observed and that y and x are related by means of a semi-parametric regression model. Observed residuals are combined with predicted values to estimate the missing response distribution. Once the responses distribution is consistently estimated, we can estimate any parameter defined through a continuous functional T using a plug in procedure. We prove that the proposed estimators have high breakdown point. 相似文献

7.

Decomposition analysis as a framework for understanding heterogeneity of treatment effects in non-randomized health care studies

William H. Crown 《Pharmaceutical statistics》2021,20(5):945-951

This paper uses the decomposition framework from the economics literature to examine the statistical structure of treatment effects estimated with observational data compared to those estimated from randomized studies. It begins with the estimation of treatment effects using a dummy variable in regression models and then presents the decomposition method from economics which estimates separate regression models for the comparison groups and recovers the treatment effect using bootstrapping methods. This method shows that the overall treatment effect is a weighted average of structural relationships of patient features with outcomes within each treatment arm and differences in the distributions of these features across the arms. In large randomized trials, it is assumed that the distribution of features across arms is very similar. Importantly, randomization not only balances observed features but also unobserved. Applying high dimensional balancing methods such as propensity score matching to the observational data causes the distributional terms of the decomposition model to be eliminated but unobserved features may still not be balanced in the observational data. Finally, a correction for non-random selection into the treatment groups is introduced via a switching regime model. Theoretically, the treatment effect estimates obtained from this model should be the same as those from a randomized trial. However, there are significant challenges in identifying instrumental variables that are necessary for estimating such models. At a minimum, decomposition models are useful tools for understanding the relationship between treatment effects estimated from observational versus randomized data. 相似文献

8.

Fairness of the national health service in Italy: a bivariate correlated random effects model

Antonello Maruotti 《Journal of applied statistics》2009,36(7):709-722

The primary purpose of this paper is to comprehensively assess households’ burden due to health payments. Starting from the fairness approach developed by the World Health Organization, we analyse the burden of healthcare payments on Italian households by modeling catastrophic payments and impoverishment due to healthcare expenditures. For this purpose, we propose to extend the analysis of fairness in financing contribution through a generalized linear mixed models by introducing a bivariate correlated random effects model, where association between the outcomes is modeled through individual- and outcome-specific latent effects which are assumed to be correlated. We discuss model parameter estimation in a finite mixture context. By using such model specification, the fairness of the Italian national health service is investigated. 相似文献

9.

Count data and treatment heterogeneity in 2×2 crossover trials

N. T. Longford 《Journal of the Royal Statistical Society. Series C, Applied statistics》1998,47(2):217-229

Count data are routinely assumed to have a Poisson distribution, especially when there are no straightforward diagnostic procedures for checking this assumption. We reanalyse two data sets from crossover trials of treatments for angina pectoris , in which the outcomes are counts of anginal attacks. Standard analyses focus on treatment effects, averaged over subjects; we are also interested in the dispersion of these effects (treatment heterogeneity). We set up a log-Poisson model with random coefficients to estimate the distribution of the treatment effects and show that the analysis is very sensitive to the distributional assumption; the population variance of the treatment effects is confounded with the (variance) function that relates the conditional variance of the outcomes, given the subject's rate of attacks, to the conditional mean. Diagnostic model checks based on resampling from the fitted distribution indicate that the default choice of the Poisson distribution for the analysed data sets is poorly supported. We propose to augment the data sets with observations of the counts, made possibly outside the clinical setting, so that the conditional distribution of the counts could be established. 相似文献

10.

A Method to Handle Zero Counts in the Multinomial Model

Frank Tuyl 《The American statistician》2019,73(2):151-158

In the context of an objective Bayesian approach to the multinomial model, Dirichlet(a, …, a) priors with a < 1 have previously been shown to be inadequate in the presence of zero counts, suggesting that the uniform prior (a = 1) is the preferred candidate. In the presence of many zero counts, however, this prior may not be satisfactory either. A model selection approach is proposed, allowing for the possibility of zero parameters corresponding to zero count categories. This approach results in a posterior mixture of Dirichlet distributions and marginal mixtures of beta distributions, which seem to avoid the problems that potentially result from the various proposed Dirichlet priors, in particular in the context of extreme data with zero counts. 相似文献

11.

A copula regression model for estimating firm efficiency in the insurance industry

Peng Shi Wei Zhang 《Journal of applied statistics》2011,38(10):2271-2287

This article considers the estimation of insurers’ cost-efficiency in a longitudinal context. The current practice ignores the tails of the cost distribution, where the most and least efficient insurers belong to. To address this issue, we propose a copula regression model to estimate insurers’ cost frontier. Both time-invariant and time-varying efficiency are adapted to this framework and various temporal patterns are considered. In our method, flexible distributions are allowed for the marginals, and the subject heterogeneity is accommodated through an association matrix. Specifically, when fitting to the insurance data, we perform a GB2 regression on insurers total cost and employ a t-copula to capture their intertemporal dependencies. In doing so, we provide a nonlinear formulation of the stochastic panel frontier and the parameters are easily estimated by likelihood-based method. Based on a translog cost function, the X-efficiency is estimated for US property-casualty insurers. An economic analysis provides evidences of economies of scale and the consistency between the cost-efficiency and other performance measures. 相似文献

12.

Bayesian analysis of multivariate threshold autoregressive models with missing data

Sergio A. Calderón V. Fabio H. Nieto 《统计学通讯:理论与方法》2017,46(1):296-318

In some fields, we are forced to work with missing data in multivariate time series. Unfortunately, the data analysis in this context cannot be carried out in the same way as in the case of complete data. To deal with this problem, a Bayesian analysis of multivariate threshold autoregressive models with exogenous inputs and missing data is carried out. In this paper, Markov chain Monte Carlo methods are used to obtain samples from the involved posterior distributions, including threshold values and missing data. In order to identify autoregressive orders, we adapt the Bayesian variable selection method in this class of multivariate process. The number of regimes is estimated using marginal likelihood or product parameter-space strategies. 相似文献

13.

Inference after variable selection using restricted permutation methods

Rui Wang Stephen W. Lagakos 《Revue canadienne de statistique》2009,37(4):625-644

When confronted with multiple covariates and a response variable, analysts sometimes apply a variable‐selection algorithm to the covariate‐response data to identify a subset of covariates potentially associated with the response, and then wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, the parameters of interest could be estimated by using standard inference methods to fit the postulated marginal model to the independent data set. However, when applied to the same data set used by the variable selector, standard (“naive”) methods can lead to distorted inferences. The authors develop testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and response variable, based on the same data set used for variable selection. They provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess and compare their performance to a sample‐splitting approach. The methods are illustrated with data from a recent AIDS study. The Canadian Journal of Statistics 37: 625–644; 2009 © 2009 Statistical Society of Canada 相似文献

14.

Modelling the role of variables in model-based cluster analysis

Giuliano Galimberti Annamaria Manisi Gabriele Soffritti 《Statistics and Computing》2018,28(1):145-169

In the framework of cluster analysis based on Gaussian mixture models, it is usually assumed that all the variables provide information about the clustering of the sample units. Several variable selection procedures are available in order to detect the structure of interest for the clustering when this structure is contained in a variable sub-vector. Currently, in these procedures a variable is assumed to play one of (up to) three roles: (1) informative, (2) uninformative and correlated with some informative variables, (3) uninformative and uncorrelated with any informative variable. A more general approach for modelling the role of a variable is proposed by taking into account the possibility that the variable vector provides information about more than one structure of interest for the clustering. This approach is developed by assuming that such information is given by non-overlapped and possibly correlated sub-vectors of variables; it is also assumed that the model for the variable vector is equal to a product of conditionally independent Gaussian mixture models (one for each variable sub-vector). Details about model identifiability, parameter estimation and model selection are provided. The usefulness and effectiveness of the described methodology are illustrated using simulated and real datasets. 相似文献

15.

Mixture of linear mixed models using multivariate t distribution

《Journal of Statistical Computation and Simulation》2012,82(4):771-787

Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow normal distribution, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data adaptively, for achieving flexible trade-off between estimation robustness and efficiency. Simulation studies and an application on analysing lung growth longitudinal data showcase the efficacy of the proposed approach. 相似文献

16.

Unconditional Quantile Treatment Effects Under Endogeneity

Markus Frölich Blaise Melly 《商业与经济统计学杂志》2013,31(3):346-357

This article develops estimators for unconditional quantile treatment effects when the treatment selection is endogenous. We use an instrumental variable (IV) to solve for the endogeneity of the binary treatment variable. Identification is based on a monotonicity assumption in the treatment choice equation and is achieved without any functional form restriction. We propose a weighting estimator that is extremely simple to implement. This estimator is root n consistent, asymptotically normally distributed, and its variance attains the semiparametric efficiency bound. We also show that including covariates in the estimation is not only necessary for consistency when the IV is itself confounded but also for efficiency when the instrument is valid unconditionally. An application of the suggested methods to the effects of fertility on the family income distribution illustrates their usefulness. Supplementary materials for this article are available online. 相似文献

17.

Point estimation after early stopping in a repeated measures trial

Jae Won Lee Mira Park 《统计学通讯:模拟与计算》2013,42(2):399-417

When the individual measurements are statistically independent, the maximum likelihood estimator calculated at the end of a sequential procedure overestimates the underlying effect. There are many clinical trials in which we are interested in comparing changes in responses between two treatment groups sequentially. Lee and DeMets (1991, JASA 86, 757–762) proposed a group sequential method for comparing rates of change when a response variable is measured for eaeh patient at successive follow-up visits. They assumed that the response follows the linear mixed effects model and derived the asymptotic joint distribution of the sequentially computed statistics. In this article, we consider the maximum likelihood estimator (MLE), the median unbiased estimator (MUE) and the midpoint of a 100(1-α)% confidence interval as point estimators for the rate of change in the linear mixed effects model, and investigate their properties by Monte Carlo simulation. 相似文献

18.

Control charts based on quasi-likelihood estimation for monitoring profiles

Chung-I Li 《Journal of Statistical Computation and Simulation》2018,88(3):457-470

In some applications, the quality of the process or product is characterized and summarized by a functional relationship between a response variable and one or more explanatory variables. Profile monitoring is a technique for checking the stability of the relationship over time. Existing linear profile monitoring methods usually assumed the error distribution to be normal. However, this assumption may not always be true in practice. To address this situation, we propose a method for profile monitoring under the framework of generalized linear models when the relationship between the mean and variance of the response variable is known. Two multivariate exponentially weighted moving average control schemes are proposed based on the estimated profile parameters obtained using a quasi-likelihood approach. The performance of the proposed methods is evaluated by simulation studies. Furthermore, the proposed method is applied to a real data set, and the R code for profile monitoring is made available to users. 相似文献

19.

Bayes model averaging with selection of regressors

P. J. Brown M. Vannucci T. Fearn 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2002,64(3):519-536

Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution. 相似文献

20.

Conditional Akaike information under covariate shift with application to small area estimation

下载免费PDF全文

Yuki Kawakubo Shonosuke Sugasawa Tatsuya Kubokawa 《Revue canadienne de statistique》2018,46(2):316-335

In this study, we consider the problem of selecting explanatory variables of fixed effects in linear mixed models under covariate shift, which is when the values of covariates in the model for prediction differ from those in the model for observed data. We construct a variable selection criterion based on the conditional Akaike information introduced by Vaida & Blanchard (2005). We focus especially on covariate shift in small area estimation and demonstrate the usefulness of the proposed criterion. In addition, numerical performance is investigated through simulations, one of which is a design‐based simulation using a real dataset of land prices. The Canadian Journal of Statistics 46: 316–335; 2018 © 2018 Statistical Society of Canada 相似文献