1.

Outlier Detection in a Circular Regression Model Using COVRATIO Statistic

S. Ibrahim A. Rambli A. G. Hussin I. Mohamed 《Communications in Statistics - Simulation and Computation》2013,42(10):2272-2280

In this article, we model the relationship between two circular variables using the circular regression model proposed by Jammalamadaka and Sarma (1993), referred to here as the JS circular regression model. The model has many interesting properties and is sensitive enough to detect the occurrence of outliers. We focus our attention on the problem of identifying outliers in this model. In particular, we extend the use of the COVRATIO statistic, which has been successfully used in the linear case for the same purpose, to the JS circular regression model via a row deletion approach. Through simulation studies, the cut-off points for the new procedure are obtained and its power of performance is investigated. It is found that the performance improves when the resulting residuals have small variance and when the sample size gets larger. An example of the application of the procedure is presented using a real dataset.
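
For readers unfamiliar with COVRATIO, the row deletion idea is easiest to see in the linear case that the abstract refers to. The sketch below is not the JS circular regression procedure. It assumes a simple linear regression with one predictor and the usual linear-model definition: the determinant of the estimated coefficient covariance matrix with observation i deleted, divided by the same determinant for the full data. The function names and the toy data are hypothetical.

```python
def cov_determinant(xs, ys):
    """Determinant of s^2 (X'X)^{-1} for the fit y = a + b*x (2x2 case)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det_xtx = n * sxx - sx * sx              # det(X'X)
    b = (n * sxy - sx * sy) / det_xtx        # least-squares slope
    a = (sy - b * sx) / n                    # least-squares intercept
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)                       # residual variance estimate
    return s2 * s2 / det_xtx                 # det(s^2 (X'X)^{-1}) for a 2x2 matrix


def covratio(xs, ys):
    """COVRATIO for each observation via row deletion (linear case only)."""
    full = cov_determinant(xs, ys)
    ratios = []
    for i in range(len(xs)):
        xs_del = xs[:i] + xs[i + 1:]         # drop row i
        ys_del = ys[:i] + ys[i + 1:]
        ratios.append(cov_determinant(xs_del, ys_del) / full)
    return ratios


# Toy data (hypothetical): a straight line with one contaminated response.
xs = [float(i) for i in range(1, 11)]
noise = [0.1, -0.1, 0.05, -0.05, 8.0, 0.1, -0.1, 0.05, -0.05, 0.0]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]
for i, r in enumerate(covratio(xs, ys)):
    print(i, round(r, 4))
```

Values far from 1 mark observations whose deletion changes the covariance structure markedly. In practice the comparison is made against cut-off points, which the abstract says are obtained by simulation for the circular case.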

2.

Sparse alternatives to ridge regression: a random effects approach

Arief Gusnanto Yudi Pawitan 《Journal of applied statistics》2015,42(1):12-26

In the calibration of a near-infrared (NIR) instrument, we regress some chemical compositions of interest as a function of their NIR spectra. In this process, we have two immediate challenges: first, the number of variables exceeds the number of observations and, second, the multicollinearity between variables is extremely high. To deal with the challenges, prediction models that produce sparse solutions have recently been proposed. The term ‘sparse’ means that some model parameters are estimated as zero and the other parameters are estimated naturally away from zero. In effect, a variable selection is embedded in the model to potentially achieve a better prediction. Many studies have investigated sparse solutions for latent variable models, such as partial least squares and principal component regression, and for direct regression models such as ridge regression (RR). However, in the latter, it mainly involves an L₁ norm penalty to the objective function such as lasso regression. In this study, we investigate new sparse alternative models for RR within a random effects model framework, where we consider Cauchy and mixture-of-normals distributions on the random effects. The results indicate that the mixture-of-normals model produces a sparse solution with good prediction and better interpretation. We illustrate the methods using NIR spectra datasets from milk and corn specimens.
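
The distinction between a sparse and a non-sparse solution can be illustrated with a minimal sketch. It does not implement the random effects models of the paper. It assumes an orthonormal design, where each coefficient can be solved separately: the ridge objective ½(b − β)² + (λ/2)β² gives proportional shrinkage, while the L₁-penalized objective ½(b − β)² + λ|β| gives soft thresholding. The function names and the numbers are hypothetical.

```python
def ridge_shrink(b, lam):
    """Ridge estimate for one coefficient under an orthonormal design."""
    return b / (1.0 + lam)        # shrinks toward zero, never exactly zero unless b is 0


def soft_threshold(b, lam):
    """L1-penalized (lasso-type) estimate under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0                    # small coefficients are set exactly to zero


ols = [3.0, -0.4, 0.9, -2.5]      # hypothetical least-squares coefficients
lam = 1.0                         # hypothetical penalty weight
ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [soft_threshold(b, lam) for b in ols]
print("ridge:", ridge)
print("lasso:", lasso)
```

Ridge keeps every coefficient nonzero, whereas the L₁ penalty zeroes the two small ones, which is the sense of ‘sparse’ used in the abstract.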

3.

A model-averaging treatment of multiple instruments in Poisson models with errors

Xiaomeng Zhang Xinyu Zhang Yanyuan Ma 《Revue canadienne de statistique》2023,51(1):173-198

We analyze Poisson regression when covariates contain measurement errors and when multiple potential instrumental variables are available. Without empirical knowledge to select the most suitable variable as an instrument, we propose a novel model-averaging approach to resolve this issue. We prescribe an implementation and establish its optimality in terms of minimizing prediction risk. We further show that, as long as one model is correctly specified among all potential instrumental variable models, our method will lead to consistent prediction. The performance of our method is illustrated through simulations and a movie sales example.

4.

Multivariate modelling of spatial extremes based on copulas

Raymond K. S. Chan 《Journal of Statistical Computation and Simulation》2018,88(12):2404-2424

To model extreme spatial events, a general approach is to use the generalized extreme value (GEV) distribution with spatially varying parameters such as spatial GEV models and latent variable models. In the literature, this approach is mostly used to capture spatial dependence for only one type of event. This limits the applications to air pollutants data as different pollutants may chemically interact with each other. A recent advancement in spatial extremes modelling for multiple variables is the multivariate max-stable processes. Similarly to univariate max-stable processes, the multivariate version also assumes standard distributions such as unit-Fréchet as margins. Additional modelling is required for applications such as spatial prediction. In this paper, we extend the marginal methods such as spatial GEV models and latent variable models into a multivariate setting based on copulas so that it is capable of handling both the spatial dependence and the dependence among multiple pollutants. We apply our proposed model to analyse weekly maxima of nitrogen dioxide, sulphur dioxide, respirable suspended particles, fine suspended particles, and ozone collected in Pearl River Delta in China.

5.

A multi-index model for quantile regression with ordinal data

Hyokyoung Grace Hong Jianhui Zhou 《Journal of applied statistics》2013,40(6):1231-1245

In this paper, we propose a quantile approach to the multi-index semiparametric model for an ordinal response variable. Permitting non-parametric transformation of the response, the proposed method achieves a root-n rate of convergence and has attractive robustness properties. Further, the proposed model allows additional indices to model the remaining correlations between covariates and the residuals from the single-index, considerably reducing the error variance and thus leading to more efficient prediction intervals (PIs). The utility of the model is demonstrated by estimating PIs for functional status of the elderly based on data from the second longitudinal study of aging. It is shown that the proposed multi-index model provides significantly narrower PIs than competing models. Our approach can be applied to other areas in which the distribution of future observations must be predicted from ordinal response data.

6.

Variable selection in finite mixture of semi-parametric regression models

Ehsan Ormoz Farzad Eskandari 《Communications in Statistics - Theory and Methods》2013,42(3):695-711

In this paper we are concerned with variable selection in finite mixture of semiparametric regression models. This task consists of model selection for the nonparametric component and variable selection for the parametric part. Thus, we encountered separate model selections for every nonparametric component of each sub model. To overcome this computational burden, we introduced a class of variable selection procedures for finite mixture of semiparametric regression models using a penalized approach for variable selection. It is shown that the new method is consistent for variable selection. Simulations show that the performance of the proposed method is good, and it consequently improves on previous work in this area and also requires much less computing power than existing methods.

7.

Detection of outliers in simple circular regression models using the mean circular error statistic

A. H. Abuzaid A. G. Hussin I. B. Mohamed 《Journal of Statistical Computation and Simulation》2013,83(2):269-277

The investigation on the identification of outliers in linear regression models can be extended to those for circular regression case. In this paper, we propose a new numerical statistic called mean circular error to identify possible outliers in circular regression models by using a row deletion approach. Through intensive simulation studies, the cut-off points of the statistic are obtained and its power of performance investigated. It is found that the performance improves as the concentration parameter of circular residuals becomes larger or the sample size becomes smaller. As an illustration, the statistic is applied to a wind direction data set.

8.

Predicting recessions using trends in the yield spread

Steven E. Kozlowski Thaddeus Sim 《Journal of applied statistics》2019,46(7):1323-1335

The yield spread, measured as the difference between long- and short-term interest rates, is widely regarded as one of the strongest predictors of economic recessions. In this paper, we propose an enhanced recession prediction model that incorporates trends in the value of the yield spread. We expect our model to generate stronger recession signals because a steadily declining value of the yield spread typically indicates growing pessimism associated with the reduced future business activity. We capture trends in the yield spread by considering both the level of the yield spread at a lag of 12 months as well as its value at each of the previous two quarters leading up to the forecast origin, and we evaluate its predictive abilities using both logit and artificial neural network models. Our results indicate that models incorporating information from the time series of the yield spread correctly predict future recession periods much better than models only considering the spread value as of the forecast origin. Furthermore, the results are strongest for our artificial neural network model and logistic regression model that includes interaction terms, which we confirm using both a blocked cross-validation technique as well as an expanding estimation window approach.

9.

General location multivariate latent variable models for mixed correlated bounded continuous, ordinal, and nominal responses with non-ignorable missing data

Elham Tabrizi Ehsan Bahrami Samani Mojtaba Ganjali 《Journal of applied statistics》2021,48(5):765

Using a multivariate latent variable approach, this article proposes some new general models to analyze the correlated bounded continuous and categorical (nominal or/and ordinal) responses with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses that we motivated by analyzing data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that some bounded continuous responses are replaced for continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. Sensitivity analysis is also performed to study the influence of small perturbations of the parameters of the missing mechanisms of the model on the maximal normal curvature. The proposed models are applied to two data sets: BMI, Steatosis and Osteoporosis data and Tehran household expenditure budgets.

10.

Bayesian beta nonlinear models with constrained parameters to describe ruminal degradation kinetics

Diego Salmerón 《Journal of applied statistics》2022,49(10):2612

The models used to describe the kinetics of ruminal degradation are usually nonlinear models where the dependent variable is the proportion of degraded food. The method of least squares is the standard approach used to estimate the unknown parameters but this method can lead to unacceptable predictions. To solve this issue, a beta nonlinear model and the Bayesian perspective are proposed in this article. The application of standard methodologies to obtain prior distributions, such as the Jeffreys prior or the reference priors, involves serious difficulties here because this model is a nonlinear non-normal regression model, and the constrained parameters appear in the log-likelihood function through the Gamma function. This paper proposes an objective method to obtain the prior distribution, which can be applied to other models with similar complexity, can be easily implemented in OpenBUGS, and solves the problem of unacceptable predictions. The model is generalized to a larger class of models. The methodology was applied to real data with three models that were compared using the Deviance Information Criterion and the root mean square prediction error. A simulation study was performed to evaluate the coverage of the credible intervals.

11.

Small area estimation of the mean using non-parametric M-quantile regression: a comparison when a linear mixed model does not hold

《Journal of Statistical Computation and Simulation》2012,82(8):945-964

The demand for reliable statistics in subpopulations, when only reduced sample sizes are available, has promoted the development of small area estimation methods. In particular, an approach that is now widely used is based on the seminal work by Battese et al. [An error-components model for prediction of county crop areas using survey and satellite data, J. Am. Statist. Assoc. 83 (1988), pp. 28–36] that uses linear mixed models (MM). We investigate alternatives when a linear MM does not hold because, on one side, linearity may not be assumed and/or, on the other, normality of the random effects may not be assumed. In particular, Opsomer et al. [Nonparametric small area estimation using penalized spline regression, J. R. Statist. Soc. Ser. B 70 (2008), pp. 265–283] propose an estimator that extends the linear MM approach to the case in which a linear relationship may not be assumed using penalized splines regression. From a very different perspective, Chambers and Tzavidis [M-quantile models for small area estimation, Biometrika 93 (2006), pp. 255–268] have recently proposed an approach for small-area estimation that is based on M-quantile (MQ) regression. This allows for models robust to outliers and to distributional assumptions on the errors and the area effects. However, when the functional form of the relationship between the qth MQ and the covariates is not linear, it can lead to biased estimates of the small area parameters. Pratesi et al. [Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US, Environmetrics 19(7) (2008), pp. 687–701] apply an extended version of this approach for the estimation of the small area distribution function using a non-parametric specification of the conditional MQ of the response variable given the covariates [M. Pratesi, M.G. Ranalli, and N. Salvati, Nonparametric m-quantile regression using penalized splines, J. Nonparametric Stat. 21 (2009), pp. 287–304]. 
We will derive the small area estimator of the mean under this model, together with its mean-squared error estimator and compare its performance to the other estimators via simulations on both real and simulated data.

12.

The probabilistic reduction approach to specifying multinomial logistic regression models in health outcomes research

Jason S. Bergtold Eberechukwu Onukwugha 《Journal of applied statistics》2014,41(10):2206-2221

The paper provides a novel application of the probabilistic reduction (PR) approach to the analysis of multi-categorical outcomes. The PR approach, which systematically takes account of heterogeneity and functional form concerns, can improve the specification of binary regression models. However, its utility for systematically enriching the specification of and inference from models of multi-categorical outcomes has not been examined, while multinomial logistic regression models are commonly used for inference and, increasingly, prediction. Following a theoretical derivation of the PR-based multinomial logistic model (MLM), we compare functional specification and marginal effects from a traditional specification and a PR-based specification in a model of post-stroke hospital discharge disposition and find that the traditional MLM is misspecified. Results suggest that the impact on the reliability of substantive inferences from a misspecified model may be significant, even when model fit statistics do not suggest a strong lack of fit compared with a properly specified model using the PR approach. We identify situations under which a PR-based MLM specification can be advantageous to the applied researcher.

13.

Bayes model averaging with selection of regressors

P. J. Brown M. Vannucci T. Fearn 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2002,64(3):519-536

Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.

14.

A simulation based method for assessing the statistical significance of logistic regression models after common variable selection procedures

Tristan R. Grogan David A. Elashoff 《Communications in Statistics - Simulation and Computation》2017,46(9):7180-7193

Classification models can demonstrate apparent prediction accuracy even when there is no underlying relationship between the predictors and the response. Variable selection procedures can lead to false positive variable selections and overestimation of true model performance. A simulation study was conducted using logistic regression with forward stepwise, best subsets, and LASSO variable selection methods with varying total sample sizes (20, 50, 100, 200) and numbers of random noise predictor variables (3, 5, 10, 15, 20, 50). Using our critical values can help reduce needless follow-up on variables having no true association with the outcome.

15.

A computational framework for variable selection in multivariate regression

Bruce E. Barrett J. Brian Gray 《Statistics and Computing》1994,4(3):203-212

Stepwise variable selection procedures are computationally inexpensive methods for constructing useful regression models for a single dependent variable. At each step a variable is entered into or deleted from the current model, based on the criterion of minimizing the error sum of squares (SSE). When there is more than one dependent variable, the situation is more complex. In this article we propose variable selection criteria for multivariate regression which generalize the univariate SSE criterion. Specifically, we suggest minimizing some function of the estimated error covariance matrix: the trace, the determinant, or the largest eigenvalue. The computations associated with these criteria may be burdensome. We develop a computational framework based on the use of the SWEEP operator which greatly reduces these calculations for stepwise variable selection in multivariate regression.
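
The three criteria named in the abstract can be sketched for the smallest multivariate case. The example assumes two dependent variables, so the estimated error covariance matrix is 2x2 and its largest eigenvalue has a closed form. It evaluates the criteria directly from residuals and does not use the SWEEP operator. The function names, the `dof` divisor, and the residual values are hypothetical.

```python
import math


def error_covariance(residuals, dof):
    """Entries (s11, s12, s22) of E'E / dof for a list of residual pairs."""
    s11 = sum(e1 * e1 for e1, _ in residuals) / dof
    s12 = sum(e1 * e2 for e1, e2 in residuals) / dof
    s22 = sum(e2 * e2 for _, e2 in residuals) / dof
    return s11, s12, s22


def selection_criteria(residuals, dof):
    """Trace, determinant, and largest eigenvalue of the 2x2 error covariance."""
    s11, s12, s22 = error_covariance(residuals, dof)
    trace = s11 + s22
    det = s11 * s22 - s12 * s12
    # Eigenvalues of a symmetric 2x2 matrix solve t^2 - trace*t + det = 0.
    disc = max(trace * trace - 4.0 * det, 0.0)   # guard against rounding below zero
    largest = (trace + math.sqrt(disc)) / 2.0
    return {"trace": trace, "determinant": det, "largest_eigenvalue": largest}


# Hypothetical residuals from a candidate model with two responses.
residuals = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
print(selection_criteria(residuals, dof=len(residuals)))
```

A stepwise search would compute one of these values for each candidate model and keep the candidate with the smallest value. With a single dependent variable all three reduce to SSE divided by `dof`, which is how they generalize the univariate criterion.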

16.

Optimal Estimator for Logistic Model with Distribution‐free Random Intercept

Tanya P. Garcia Yanyuan Ma 《Scandinavian Journal of Statistics》2016,43(1):156-171

Logistic models with a random intercept are prevalent in medical and social research where clustered and longitudinal data are often collected. Traditionally, the random intercept in these models is assumed to follow some parametric distribution such as the normal distribution. However, such an assumption inevitably raises concerns about model misspecification and misleading inference conclusions, especially when there is dependence between the random intercept and model covariates. To protect against such issues, we use a semiparametric approach to develop a computationally simple and consistent estimator where the random intercept is distribution‐free. The estimator is revealed to be optimal and achieve the efficiency bound without the need to postulate or estimate any latent variable distributions. We further characterize other general mixed models where such an optimal estimator exists.

17.

M-quantile models with application to poverty mapping

Nikos Tzavidis Nicola Salvati Monica Pratesi Ray Chambers 《Statistical Methods and Applications》2008,17(3):393-411

Over the last decade there has been growing demand for estimates of population characteristics at small area level. Unfortunately, cost constraints in the design of sample surveys lead to small sample sizes within these areas and as a result direct estimation, using only the survey data, is inappropriate since it yields estimates with unacceptable levels of precision. Small area models are designed to tackle the small sample size problem. The most popular class of models for small area estimation is random effects models that include random area effects to account for between area variations. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier robust inference. An alternative approach to small area estimation that is based on the use of M-quantile models was recently proposed by Chambers and Tzavidis (Biometrika 93(2):255–268, 2006) and Tzavidis and Chambers (Robust prediction of small area means and distributions. Working paper, 2007). Unlike traditional random effects models, M-quantile models do not depend on strong distributional assumption and automatically provide outlier robust inference. In this paper we illustrate for the first time how M-quantile models can be practically employed for deriving small area estimates of poverty and inequality. The methodology we propose improves the traditional poverty mapping methods in the following ways: (a) it enables the estimation of the distribution function of the study variable within the small area of interest both under an M-quantile and a random effects model, (b) it provides analytical, instead of empirical, estimation of the mean squared error of the M-quantile small area mean estimates and (c) it employs a robust to outliers estimation method. 
The methodology is applied to data from the 2002 Living Standards Measurement Survey (LSMS) in Albania for estimating (a) district level estimates of the incidence of poverty in Albania, (b) district level inequality measures and (c) the distribution function of household per-capita consumption expenditure in each district. Small area estimates of poverty and inequality show that the poorest Albanian districts are in the mountainous regions (north and north east) with the wealthiest districts, which are also linked with high levels of inequality, in the coastal (south west) and southern part of country. We discuss the practical advantages of our methodology and note the consistency of our results with results from previous studies. We further demonstrate the usefulness of the M-quantile estimation framework through design-based simulations based on two realistic survey data sets containing small area information and show that the M-quantile approach may be preferable when the aim is to estimate the small area distribution function.

18.

Variable selection in finite mixture of regression models using the skew-normal distribution

Junhui Yin Liucang Wu Lin Dai 《Journal of applied statistics》2020,47(16):2941

Variable selection in finite mixture of regression (FMR) models is frequently used in statistical modeling. The majority of applications of variable selection in FMR models use a normal distribution for regression error. Such assumptions are unsuitable for a set of data containing a group or groups of observations with asymmetric behavior. In this paper, we introduce a variable selection procedure for FMR models using the skew-normal distribution. With appropriate choice of the tuning parameters, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. To estimate the parameters of the model, a modified EM algorithm for numerical computations is developed. The methodology is illustrated through numerical experiments and a real data example.

19.

Prior elicitation, variable selection and Bayesian computation for logistic regression models

M.-H. Chen J. G. Ibrahim & C. Yiannoutsos 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(1):223-242

Bayesian selection of variables is often difficult to carry out because of the challenge in specifying prior distributions for the regression parameters for all possible models, specifying a prior distribution on the model space and computations. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection. Several theoretical and computational properties of the prior are derived and illustrated with several examples. For the second, we propose a method for specifying an informative prior on the model space, and for the third we propose novel methods for computing the marginal distribution of the data. The new computational algorithms only require Gibbs samples from the full model to facilitate the computation of the prior and posterior model probabilities for all possible models. Several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables in that the elicitation is based on a prior prediction y₀ for the response vector and a quantity a₀ quantifying the uncertainty in y₀. Then, y₀ and a₀ are used to specify a prior for the regression coefficients semi-automatically. Examples using real data are given to demonstrate the methodology.

20.

A comparison of methods for the fitting of generalized additive models

Harald Binder Gerhard Tutz 《Statistics and Computing》2008,18(1):87-99

There are several procedures for fitting generalized additive models, i.e. regression models for an exponential family response where the influence of each single covariate is assumed to have unknown, potentially non-linear shape. Simulated data are used to compare a smoothing parameter optimization approach for selection of smoothness and of covariates, a stepwise approach, a mixed model approach, and a procedure based on boosting techniques. In particular it is investigated how the performance of procedures is linked to amount of information, type of response, total number of covariates, number of influential covariates, and extent of non-linearity. Measures for comparison are prediction performance, identification of influential covariates, and smoothness of fitted functions. One result is that the mixed model approach returns sparse fits with frequently over-smoothed functions, while the functions are less smooth for the boosting approach and variable selection is less strict. The other approaches are in between with respect to these measures. The boosting procedure is seen to perform very well when little information is available and/or when a large number of covariates is to be investigated. It is somewhat surprising that in scenarios with low information the fitting of a linear model, even with stepwise variable selection, has not much advantage over the fitting of an additive model when the true underlying structure is linear. In cases with more information the prediction performance of all procedures is very similar. So, in difficult data situations the boosting approach can be recommended, in others the procedures can be chosen conditional on the aim of the analysis.