Similar Documents
 20 similar documents found (search time: 46 ms)
1.
Likelihood-ratio tests (LRTs) are often used for inference on one or more logistic regression coefficients. Conventionally, for given parameters of interest, the nuisance parameters of the likelihood function are replaced by their maximum likelihood estimates; the resulting function, called the profile likelihood, is used for LRT-based inference. In small samples, the LRT based on the profile likelihood does not follow a χ2 distribution, and several corrections have been proposed to improve it. Additionally, complete or quasi-complete separation is a common geometric feature of small-sample binary data. In this article, for small-sample binary data, we derive explicit LRT correction factors for models with and without separation, and propose an algorithm to construct confidence intervals. We investigate the performance of the different LRT corrections, and of the corresponding confidence intervals, through simulations. Based on the simulation results, we propose an empirical rule of thumb on the use of these methods. Our simulation findings are also supported by real-world data.
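The core construction here, inverting a profile-likelihood LRT to obtain a confidence interval for a logistic coefficient, can be sketched briefly. The following is a minimal illustration on simulated data, without the article's small-sample correction factors or any handling of separation; the data, grid width, and profiling of a single intercept are all assumptions of the sketch.

```python
# Minimal sketch: profile-likelihood CI for one logistic slope by LRT
# inversion. Not the authors' corrected method; simulated data.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 30                                   # small sample, as in the article
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.0 * X[:, 1]))))

def negloglik(beta, X, y):
    eta = X @ beta
    return np.sum(np.log1p(np.exp(eta)) - y * eta)

full = minimize(negloglik, np.zeros(2), args=(X, y))

def profile_neglog(b1):
    # profile out the intercept (nuisance parameter) at fixed slope b1
    res = minimize(lambda b0: negloglik(np.array([b0[0], b1]), X, y), [0.0])
    return res.fun

# LRT-based CI: slopes whose profile deviance stays within the chi2(1) cutoff
cut = chi2.ppf(0.95, df=1) / 2
grid = np.linspace(full.x[1] - 4, full.x[1] + 4, 401)
inside = [b for b in grid if profile_neglog(b) - full.fun <= cut]
print("95% profile-likelihood CI:", min(inside), max(inside))
```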

2.
In some industrial applications, the quality of a process or product is characterized by a relationship, called a profile, between the response variable and one or more independent variables. Many approaches for monitoring different types of profiles exist in the literature, and most researchers assume that the response variable follows a normal distribution. However, this assumption may be violated in many cases, most plausibly when the response variable follows a distribution from the generalized linear model (GLM) family. For example, when the response variable is the number of defects in a certain area of a product, the observations follow a Poisson distribution, and ignoring this fact will give misleading results. In this paper, three methods, a T2-based method, a likelihood ratio test (LRT) method, and an F method, are developed and modified for monitoring GLM regression profiles in Phase I. The performance of the proposed methods is analysed and compared for the special case in which the response variable follows a Poisson distribution, via a simulation study based on the probability-of-signal criterion. Results show that the LRT method outperforms the other two methods, and the F method outperforms the T2-based method, in detecting both small and large step shifts as well as drifts. Conversely, the F method performs best, and the LRT method worst, in detecting outliers. A real case, in which the size and number of agglomerates ejected from a volcano on successive days form the GLM profile, is presented, and the proposed methods are applied to determine whether the number of agglomerates of each size is in statistical control. The results show that the proposed methods can handle this situation and distinguish out-of-control conditions.
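One way to make the LRT idea concrete is a leave-one-profile-out comparison against a pooled Poisson GLM fit. The sketch below is a hedged construction of that kind, not the article's exact statistic or threshold; the data, the χ2 cutoff, and the one-profile-versus-rest decomposition are all assumptions.

```python
# Hedged sketch of a Phase I LRT-type check for Poisson GLM profiles:
# each profile is tested by comparing its own fit plus the fit of the
# remaining profiles against a single pooled fit.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
X = sm.add_constant(x)
m = 20                                    # number of Phase I profiles
profiles = [rng.poisson(np.exp(1.0 + 0.8 * x)) for _ in range(m)]
profiles[7] = rng.poisson(np.exp(1.0 + 1.6 * x))   # a shifted profile

def llf(ys):
    ys = np.concatenate(ys)
    Xs = np.tile(X, (len(ys) // len(x), 1))
    return sm.GLM(ys, Xs, family=sm.families.Poisson()).fit().llf

pooled = llf(profiles)
for i in range(m):
    rest = profiles[:i] + profiles[i + 1:]
    lrt = 2 * (llf([profiles[i]]) + llf(rest) - pooled)
    if lrt > chi2.ppf(0.995, df=2):       # 2 regression parameters
        print(f"profile {i} signals, LRT = {lrt:.1f}")
```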

3.
Weibull mixture models are widely used in a variety of fields for modeling phenomena arising from heterogeneous sources. We focus on situations in which the original observations are not available, and the data instead come in grouped form. We describe an EM algorithm for fitting Weibull mixture models to grouped data and propose a bootstrap likelihood ratio test (LRT) for determining the number of subpopulations in the mixture. The effectiveness of the LRT method is investigated via simulation, and we illustrate the utility of these methods by applying them to two grouped data sets.
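The grouped-data likelihood is built from bin probabilities, differences of the mixture CDF at the bin edges. The sketch below fits a two-component Weibull mixture to hypothetical binned counts by direct maximization of that likelihood; the article instead uses an EM algorithm, and the edges, counts, and starting values here are made up for illustration.

```python
# Minimal sketch: grouped-data likelihood for a 2-component Weibull mixture,
# maximized directly (the article uses EM). Hypothetical bins and counts.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

edges = np.array([0., 1., 2., 3., 5., 8., np.inf])   # bin boundaries
counts = np.array([55, 70, 48, 60, 42, 25])          # observed frequencies

def negloglik(theta):
    logit_p, k1, s1, k2, s2 = theta
    p = 1 / (1 + np.exp(-logit_p))                    # mixing weight in (0,1)
    cdf = (p * weibull_min.cdf(edges, np.exp(k1), scale=np.exp(s1))
           + (1 - p) * weibull_min.cdf(edges, np.exp(k2), scale=np.exp(s2)))
    probs = np.diff(cdf)                              # bin probabilities
    return -np.sum(counts * np.log(np.clip(probs, 1e-300, None)))

fit = minimize(negloglik, x0=[0.0, 0.0, 0.0, 0.5, 1.0], method="Nelder-Mead",
               options={"maxiter": 5000})
print("converged:", fit.success, "neg. log-lik:", fit.fun)
```

The bootstrap LRT for the number of components would refit under the smaller model, simulate grouped samples from that fit, and refit both models on each replicate to calibrate the observed LRT statistic.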

4.
The class of symmetric linear regression models has the normal linear regression model as a special case and includes several models that assume that the errors follow a symmetric distribution with longer-than-normal tails. An important member of this class is the t linear regression model, which is commonly used as an alternative to the usual normal regression model when the data contain extreme or outlying observations. In this article, we develop second-order asymptotic theory for score tests in this class of models. We obtain Bartlett-corrected score statistics for testing hypotheses on the regression and the dispersion parameters. The corrected statistics have chi-squared distributions with errors of order O(n^{-3/2}), n being the sample size. The corrections represent an improvement over the corresponding original Rao's score statistics, which are chi-squared distributed up to errors of order O(n^{-1}). Simulation results show that the corrected score tests perform much better than their uncorrected counterparts in samples of small or moderate size.

5.
This article explores the calculation of tolerance limits for the Poisson regression model, based on the profile likelihood methodology and on small-sample asymptotic corrections that improve coverage probability. The data consist of n counts, where the mean or expected rate depends upon covariates via the log regression function. Upper tolerance limits are evaluated as a function of the covariates and are obtained from upper confidence limits for the mean. To compute the upper confidence limits, three methodologies are considered: likelihood-based asymptotic methods, small-sample asymptotic refinements of the likelihood-based methodology, and the delta method. Two applications are discussed: one concerning defects in semiconductor wafers due to plasma etching, and the other examining the number of surface faults in upper seams of coal mines. All three methodologies are illustrated for both applications.
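Of the three routes, the delta method is the easiest to sketch: compute an upper confidence limit for the mean at a covariate point, then take a Poisson quantile at that limit as the upper tolerance limit. The data, covariate point, and confidence/content levels below are assumptions of the sketch.

```python
# Hedged sketch of the delta-method route: an upper tolerance limit at x0
# is a Poisson quantile evaluated at an upper confidence limit for the mean.
import numpy as np
import statsmodels.api as sm
from scipy.stats import poisson

rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=50)
y = rng.poisson(np.exp(0.3 + 0.9 * x))
fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()

x0 = np.array([[1.0, 1.5]])               # covariate point of interest
pred = fit.get_prediction(x0)
# one-sided 95% upper confidence limit for the mean rate
# (delta method on the linear predictor, then back-transformed)
upper_mean = pred.conf_int(alpha=0.10)[0, 1]
# (0.95, 0.95) upper tolerance limit: 95th percentile at the upper mean
utl = poisson.ppf(0.95, upper_mean)
print("upper tolerance limit at x0:", utl)
```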

6.
The comparison of two treatments with normally distributed data is considered. Inferences are based upon the difference between single potential future observations from each of the two treatments, which provides a useful and easily interpretable assessment of the difference between the treatments. These methodologies combine information from a standard confidence interval analysis of the difference between the two treatment means with information from standard prediction intervals for future observations. Win-probabilities, the probabilities that a future observation from one treatment will be superior to a future observation from the other, are a special case. The theoretical derivation is based upon inferences about the non-centrality parameter of a non-central t-distribution. Both equal- and unequal-variance situations are addressed, and extensions to groups of future observations from the two treatments are also considered. Examples and discussion of the methodologies are presented.
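For the equal-variance case, the win-probability is Φ(δ/√2) with δ = (μ1−μ2)/σ, and a confidence interval follows by inverting the non-central t-distribution of the two-sample t statistic. The sketch below illustrates this on simulated data; the sample sizes and means are assumptions.

```python
# Minimal sketch (equal-variance case): win-probability estimate and CI
# by inverting the non-central t-distribution. Simulated data.
import numpy as np
from scipy.stats import nct, norm
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x, y = rng.normal(1.0, 1.0, 15), rng.normal(0.3, 1.0, 15)
n1, n2 = len(x), len(y)
df = n1 + n2 - 2
sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / df
c = np.sqrt(1 / n1 + 1 / n2)
t_obs = (x.mean() - y.mean()) / (np.sqrt(sp2) * c)

def ncp_limit(level):
    # non-centrality whose distribution puts `level` probability below t_obs
    return brentq(lambda lam: nct.cdf(t_obs, df, lam) - level, -50, 50)

lo, hi = ncp_limit(0.975), ncp_limit(0.025)
# win-probability = Phi(delta / sqrt(2)), with delta = (mu1 - mu2)/sigma
wp = lambda lam: norm.cdf(lam * c / np.sqrt(2))
print("win-probability estimate:", wp(t_obs))
print("95% CI:", (wp(lo), wp(hi)))
```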

7.
Mixtures of linear regression models provide a popular way to model nonlinear regression relationships. Traditional estimation of mixture-of-regression models is based on a Gaussian error assumption, which is well known to be sensitive to outliers and extreme values. To overcome this issue, a new class of finite mixtures of quantile regressions (FMQR) is proposed in this article. Compared with existing Gaussian mixture regression models, the proposed FMQR model provides a more complete specification of the conditional distribution of the response variable within each component. From the likelihood point of view, the FMQR model is equivalent to a finite mixture of regression models with errors following the asymmetric Laplace distribution (ALD), and can thus be regarded as an extension of the traditional mixture of regressions with normal errors. An EM algorithm that exploits a hierarchical representation of the ALD is proposed to obtain the parameter estimates of the FMQR model, and an iteratively weighted least-squares estimation for each mixture component is derived. Simulation studies illustrate the finite-sample performance of the estimation procedure, and an analysis of an aphid data set illustrates the methodology.
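The ALD link makes the EM scheme easy to sketch: E-step responsibilities come from the ALD component densities, and the M-step minimizes a responsibility-weighted check loss per component. The sketch below does the M-step by direct numerical minimization rather than the article's hierarchical-ALD representation; the two-component design, τ = 0.5, and starting values are assumptions.

```python
# Compact EM-type sketch for a 2-component mixture of quantile regressions
# via the asymmetric Laplace density (ALD). Not the article's exact
# algorithm: the M-step minimizes the weighted check loss directly.
import numpy as np
from scipy.optimize import minimize

tau = 0.5
rho = lambda u: u * (tau - (u < 0))                    # check loss
ald = lambda u, s: tau * (1 - tau) / s * np.exp(-rho(u) / s)

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(-2, 2, n)
X = np.column_stack([np.ones(n), x])
z = rng.random(n) < 0.5                                 # latent component
y = np.where(z, 1 + 2 * x, -1 - 2 * x) + rng.normal(0, 0.3, n)

beta = np.array([[0.5, 1.0], [-0.5, -1.0]])             # starting values
sigma, pi_k = np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities under the ALD components
    dens = np.stack([pi_k[k] * ald(y - X @ beta[k], sigma[k]) for k in range(2)])
    w = dens / dens.sum(axis=0)
    # M-step: weighted check-loss minimization, then scale and weights
    for k in range(2):
        obj = lambda b: np.sum(w[k] * rho(y - X @ b))
        beta[k] = minimize(obj, beta[k], method="Nelder-Mead").x
        sigma[k] = np.sum(w[k] * rho(y - X @ beta[k])) / w[k].sum()
    pi_k = w.mean(axis=1)
print("betas:", beta, "weights:", pi_k)
```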

8.
If the observations for fitting a polytomous logistic regression model satisfy certain normality assumptions, the maximum likelihood estimates of the regression coefficients are the discriminant function estimates. This article shows that these estimates, their unbiased counterparts, and associated test statistics for variable selection can be calculated using ordinary least squares regression techniques, thereby providing a convenient method for fitting logistic regression models in the normal case. Evidence is given indicating that the discriminant function estimates and test statistics merit wider use in nonnormal cases, especially in exploratory work on large data sets.
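For the binary case, the discriminant function estimates have a closed form from the group means and pooled covariance. The sketch below computes them directly with numpy (the article's contribution, obtaining these and the associated tests via ordinary least squares, is not reproduced here); the simulated two-group design is an assumption.

```python
# Minimal sketch: discriminant function estimates of binary logistic
# regression coefficients under normality, from group means and the
# pooled covariance matrix.
import numpy as np

rng = np.random.default_rng(6)
n0, n1 = 60, 40
X0 = rng.multivariate_normal([0, 0], [[1, .3], [.3, 1]], n0)
X1 = rng.multivariate_normal([1, .5], [[1, .3], [.3, 1]], n1)

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S = ((n0 - 1) * np.cov(X0, rowvar=False)
     + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)

beta = np.linalg.solve(S, m1 - m0)                    # slope estimates
alpha = np.log(n1 / n0) - 0.5 * (m1 + m0) @ beta      # intercept
print("discriminant function estimates:", alpha, beta)
```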

9.
In some applications, the quality of a process or product is best characterized by a functional relationship between a response variable and one or more explanatory variables. Profile monitoring is used to understand and to check the stability of this relationship, or curve, over time. Existing simple linear regression profile models often assume that the data follow a unimodal distribution, so that the noise of the functional relationship is normally distributed. In some applications, however, the data may follow a multimodal distribution, in which case it is more appropriate to assume a mixture profile. In this study, we focus on a mixture simple linear profile model and propose new control schemes for Phase II monitoring. The proposed methods are shown to perform well in a simulation study.

10.
In this paper, we compare three residuals for assessing departures from the error assumptions and for detecting outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, a special case of the log-Burr XII regression model. For different parameter settings, sample sizes, and censoring percentages, various simulation studies are performed, and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models extends straightforwardly to the modified martingale-type residual in log-Burr XII regression models with censored data.
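As a point of reference, standard martingale and deviance-type residuals for censored data need only the fitted survival function and the censoring indicator. The sketch below computes them under a log-logistic fit (a special case of log-Burr XII); the fitted parameters are placeholders, and the article's modified martingale-type residual differs in detail.

```python
# Hedged sketch: martingale and deviance-type residuals for censored data
# under a log-logistic fit. Placeholder parameter estimates.
import numpy as np

rng = np.random.default_rng(7)
n = 100
t_true = rng.weibull(1.5, n) * 2
cens = rng.uniform(0, 4, n)
t = np.minimum(t_true, cens)
delta = (t_true <= cens).astype(float)     # 1 = event observed

alpha_hat, gamma_hat = 1.8, 1.4            # placeholder log-logistic MLEs
S = 1.0 / (1.0 + (t / alpha_hat) ** gamma_hat)   # fitted survival function

m = delta + np.log(S)                      # martingale residuals
inner = -2 * (m + delta * np.log(np.where(delta > 0, delta - m, 1.0)))
d = np.sign(m) * np.sqrt(np.maximum(inner, 0))   # deviance-type residuals
print("mean martingale residual:", m.mean())
```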

11.
Modified Profile Likelihood for Fixed-Effects Panel Data Models
We show how modified profile likelihood methods, developed in the statistical literature, may be effectively applied to estimate the structural parameters of econometric models for panel data, with a remarkable reduction in bias relative to ordinary likelihood methods. The implementation of these methods is first illustrated for general panel data models with individual-specific fixed effects and then, in more detail, for the truncated linear regression model and for dynamic regression models for binary data under different specifications. Simulation studies show the good behavior of inference based on the modified profile likelihood, even when compared to an ideal, although infeasible, procedure in which the fixed effects are known, and to alternative estimators from the econometric literature. The proposed estimation methods are implemented in an R package that we make available to the reader.

12.
The generalized exponential, geometric extreme exponential, and Weibull distributions are three non-negative skewed distributions suitable for analysing lifetime data. We present diagnostic tools based on the likelihood ratio test (LRT) and the minimum Kolmogorov distance (KD) method for discriminating between these models. The probability of correct selection is calculated for each model, and for several combinations of shape parameters and sample sizes, using Monte Carlo simulation. The application of the LRT and KD discrimination methods to several real data sets is also studied.
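Both tools are easy to illustrate for the Weibull versus generalized exponential pair: the LRT-based selection compares maximized log-likelihoods, and the KD method picks the model whose fitted CDF has the smaller Kolmogorov distance to the data. In the sketch below the generalized exponential is represented as scipy's exponweib with c fixed at 1; the simulated data are an assumption.

```python
# Minimal sketch: LRT statistic (difference of maximized log-likelihoods)
# and minimum Kolmogorov distance for Weibull vs generalized exponential.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.weibull(2.0, 100) * 3.0            # lifetimes from a Weibull

# MLE fits with location fixed at zero (lifetime data)
k, _, s = stats.weibull_min.fit(x, floc=0)
a, _, _, lam = stats.exponweib.fit(x, fc=1, floc=0)   # GE: c fixed at 1

ll_w = stats.weibull_min.logpdf(x, k, scale=s).sum()
ll_ge = stats.exponweib.logpdf(x, a, 1, scale=lam).sum()
print("LRT statistic (positive favors Weibull):", 2 * (ll_w - ll_ge))

# minimum Kolmogorov distance under each fitted model
kd_w = stats.kstest(x, stats.weibull_min(k, scale=s).cdf).statistic
kd_ge = stats.kstest(x, stats.exponweib(a, 1, scale=lam).cdf).statistic
print("KD:", kd_w, kd_ge, "-> choose", "Weibull" if kd_w < kd_ge else "GE")
```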

13.
In linear regression models, it is common that the error variances differ across observations and that some data points have high leverage. In such situations, the literature advocates heteroscedasticity-consistent covariance matrix estimators (HCCME) for testing the regression coefficients. Such estimators are primarily based on residuals from the ordinary least squares (OLS) estimator, which can itself be seriously inefficient in the presence of heteroscedasticity. Many efficient alternatives, namely adaptive estimators, are available, but their performance has not yet been evaluated when heteroscedasticity is accompanied by high leverage data. In this article, the presence of high leverage data is taken into account in evaluating the efficiency of the adaptive estimator. Our numerical work also evaluates the performance of the robust standard errors based on this efficient estimator in terms of interval estimation and null rejection rate (NRR).
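The baseline the article builds on, leverage-adjusted HCCME standard errors for OLS, is readily shown; the adaptive (efficient) estimator itself is not implemented here, and the simulated heteroscedastic design with one high leverage point is an assumption.

```python
# Short sketch: classical vs leverage-adjusted HC3 standard errors under
# heteroscedasticity with a high leverage observation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.normal(0, 1, 60)
x[0] = 6.0                                  # a high leverage observation
y = 1 + 2 * x + rng.normal(0, 0.5 + np.abs(x))   # heteroscedastic errors

ols = sm.OLS(y, sm.add_constant(x)).fit()
hc3 = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC3")
print("classical SEs:", ols.bse)
print("HC3 SEs:      ", hc3.bse)            # leverage-adjusted HCCME
```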

14.
Birnbaum-Saunders (BS) models have been widely applied in material fatigue studies and reliability analyses to relate the total time until failure to some type of cumulative damage. In many medical problems, such as chronic cardiac diseases and various types of cancer, cumulative damage caused by several risk factors may produce degradation that leads to a fatigue process, and BS models can then be suitable for describing the propagation lifetime. However, because the cumulative damage is assumed to be normally distributed in the BS distribution, the parameter estimates from this model can be sensitive to outlying observations. To attenuate this influence, we present BS models in which a Student-t distribution is assumed for the cumulative damage. In particular, we show that maximum likelihood estimation in the Student-t log-BS model attributes smaller weights to outlying observations, producing robust parameter estimates; some inferential results are also presented. In addition, a diagnostic analysis based on local influence and on deviance-component and martingale-type residuals is derived. Finally, a motivating example from the medical field is analyzed using log-BS regression models. Since the parameter estimates appear to be very sensitive to outlying and influential observations, the Student-t log-BS regression model should attenuate such influences, and the model-checking methodologies developed in this paper are used to compare the fitted models.

15.
This paper considers inferences concerning future observations for regression models. Specifically, the differences between future observations at two designated sets of input values are considered. Win-probabilities, which are the probabilities that one of the future observations will exceed the other, constitute a special case of this analysis. These win-probabilities, together with the more general inferences on the difference between the future observations, provide a useful and easily interpretable tool with which a practitioner can assess the information provided by the regression model, and can make decisions regarding which of the two designated sets of input values would be optimal. A multiple-linear-regression model is considered in detail, although the results can be applied to any regression model with normally distributed errors. Central and non-central t-distributions are used for the analysis, and several examples of the methodologies are presented.

16.
Variable selection methods have been widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of genomic data is that genes participating in a common metabolic pathway or sharing a similar biological function tend to be highly correlated. The collinearity naturally embedded in such data requires special handling that existing variable selection methods cannot provide. In this paper, we propose a set of new methods to select variables in correlated data. The new methods follow the forward selection procedure of least angle regression (LARS) but conduct grouping and selection at the same time, and they work even when no prior information on the group structure of the data is available. Simulations and real examples show that our proposed methods often outperform existing variable selection methods, including LARS and the elastic net, in terms of both reducing prediction error and preserving the sparsity of representation.
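The setting, and the two baselines the article compares against, can be reproduced in a few lines: with a tightly correlated group of true predictors, plain LARS tends to pick a single representative while the elastic net spreads weight over the group. The article's own grouped-selection methods are not in sklearn, so the sketch below shows only the baselines on simulated correlated data.

```python
# Hedged illustration of the correlated-predictor setting: LARS vs the
# elastic net (the article's proposed methods are not implemented here).
import numpy as np
from sklearn.linear_model import Lars, ElasticNetCV

rng = np.random.default_rng(10)
n, p = 100, 10
z = rng.normal(size=(n, 1))
X = np.hstack([z + 0.05 * rng.normal(size=(n, 3)),    # a correlated group
               rng.normal(size=(n, p - 3))])          # independent noise
y = X[:, :3].sum(axis=1) + rng.normal(0, 1, n)

lars = Lars(n_nonzero_coefs=3).fit(X, y)
enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
print("LARS coefs (group 1):", lars.coef_[:3])        # tends to pick one
print("enet coefs (group 1):", enet.coef_[:3])        # spreads over group
```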

17.
Proportional hazards (PH) regression is a standard methodology for analyzing survival and time-to-event data. The proportional hazards assumption, however, is not always appropriate; moreover, PH regression focuses mainly on hazard ratios and thus offers few insights into the underlying determinants of survival. These limitations have led statistical researchers to explore alternative methodologies, one of which is threshold regression (TR) (see Lee and Whitmore, Stat Sci 21:501–513, 2006, for a review). The connection between PH regression and TR has been examined in previously published work, but those investigations were limited in scope. In this article, we study the connections between the two methodologies in greater depth and show that PH regression is, for most purposes, a special case of TR. We present two methods of construction by which TR models can yield PH functions for survival times, one based on altering the TR time scale and the other based on varying the TR boundary, and we discuss how to estimate the TR time scale and boundary, with or without the PH assumption. A case demonstration is used to highlight the greater understanding of scientific foundations that TR can offer in comparison with PH regression. Finally, we discuss the potential benefits of positioning PH regression within the first-hitting-time context of TR.

18.
This paper proposes a generalized logistic regression model that can account for the correlation among responses on subunits, which may arise as multiple observations within an individual. The method generalizes earlier work by Rosner (1984a,b) and others. Methodological generalizations include: (1) the use of the more general Polya-Eggenberger distribution, instead of the beta-binomial distribution, to model the correlation structure, so that cases with negative, positive, or zero intraclass correlation can be handled; (2) a stepwise approach; (3) linear and non-linear regression; and (4) the inclusion of the case of a truncated distribution. The model can accommodate missing data and covariates at the unit and subunit levels. The derivative-free simplex algorithm is used to estimate the parameters.

The model is applied to data describing the progression of obstruction in coronary disease, where multiple arterial segments are studied for each patient. The correlation in response that may exist across these segments is accounted for in the analyses while examining associations with individual-specific (e.g., history of diabetes) and segment-specific (e.g., initial percent stenosis) covariates. Analyses were performed on a data set of 382 patients with unoperated coronary artery disease and two coronary angiograms separated by at least one month, and on a data set of 284 patients undergoing percutaneous transluminal coronary angioplasty and studied by coronary angiograms.

19.
Testing for homogeneity in finite mixture models has been investigated by many researchers. The asymptotic null distribution of the likelihood ratio test (LRT) is very complex and difficult to use in practice. We propose a modified LRT for homogeneity in finite mixture models with a general parametric kernel distribution family. The modified LRT has a χ2-type null limiting distribution and is asymptotically most powerful under local alternatives. Simulations show that it performs better than competing tests, and also reveal that the limiting distribution, with some adjustment, can satisfactorily approximate the quantiles of the test statistic, even for moderate sample sizes.
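A common way such modified LRTs are built is by adding a penalty of the form C log(4p(1−p)) to the mixture log-likelihood so that the mixing weight stays away from 0 and 1, which yields the χ2-type null limit. The sketch below illustrates this penalized construction for a normal-mean kernel with unit variance; the choice C = 1, the kernel, and the starting values are assumptions, not the article's exact proposal.

```python
# Minimal sketch of a penalized/modified LRT for homogeneity in a
# two-component normal-mean mixture (unit variance, C = 1).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(11)
x = rng.normal(0.0, 1.0, 150)              # data generated under homogeneity
C = 1.0

def neg_penlik(theta):
    logit_p, m1, m2 = theta
    p = 1 / (1 + np.exp(-logit_p))
    mix = p * norm.pdf(x, m1) + (1 - p) * norm.pdf(x, m2)
    # penalty keeps the mixing weight away from the boundary
    return -(np.sum(np.log(mix)) + C * np.log(4 * p * (1 - p)))

fit = minimize(neg_penlik, x0=[0.0, -0.5, 0.5], method="Nelder-Mead")
l0 = np.sum(norm.logpdf(x, x.mean()))      # homogeneous (one-component) fit
mlrt = 2 * (-fit.fun - l0)
print("modified LRT statistic:", mlrt)     # chi2-type null limit
```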

20.
We investigate a generalized semiparametric regression model that avoids the risk of wrongly choosing the base measure function. We propose a profile likelihood for efficiently estimating both the parameter and the nonparametric function. The main difference from the classical profile likelihood is that the proposed profile likelihood is a functional of the base measure function, rather than a function of a real variable. By exploiting the structure of the semiparametric exponential family, we obtain an explicit expression for the estimator of the least favorable curve, which keeps the new profile likelihood computationally simple. The use of the least favorable curve achieves semiparametric efficiency and significantly reduces estimation bias. Simulation studies illustrate that our proposal outperforms existing methodologies in most of the cases under study and is robust to different model conditions.

