Similar articles
20 similar articles found (search time: 28 ms)
1.
A test for lack of fit in regression is presented. Unlike other methods, this one doesn't require replicates or a prior estimate of variance. It can be used for linear or multiple regression, and would be easy to add to existing computer packages. It is based on comparing a fit over low leverage points with a fit over the entire set of data. Distribution theory results are presented, with examples of power. A discussion of its use for detecting violations of other regression assumptions is also given.
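A minimal sketch of the comparison described in this abstract, for a straight-line fit only. It assumes the statistic takes the form commonly called the rainbow test, F = [(SSE_full - SSE_sub)/(n - m)] / [SSE_sub/(m - p)], where the subset holds the m lowest-leverage points; the abstract itself does not state the formula. The function names and the `fraction` parameter are hypothetical.

```python
def sse_line(x, y):
    """Residual sum of squares of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return syy - sxy ** 2 / sxx


def low_leverage_f(x, y, fraction=0.5):
    """Return (F, df1, df2) comparing the full fit with a low-leverage fit.

    Assumption: rainbow-test form of the statistic, simple linear
    regression (p = 2 parameters), subset = lowest-leverage points.
    """
    n, p = len(x), 2
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Leverage of each point in simple linear regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    m = int(n * fraction)
    if m <= p or m >= n:
        raise ValueError("subset must hold more than p and fewer than n points")
    keep = sorted(range(n), key=lambda i: leverage[i])[:m]
    sse_full = sse_line(x, y)
    sse_sub = sse_line([x[i] for i in keep], [y[i] for i in keep])
    if sse_sub <= 0:
        raise ValueError("subset fit is exact; statistic is undefined")
    f_stat = ((sse_full - sse_sub) / (n - m)) / (sse_sub / (m - p))
    return f_stat, n - m, m - p


# Curved data fitted with a straight line: a large F suggests lack of fit.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [xi ** 2 for xi in xs]
f_stat, df1, df2 = low_leverage_f(xs, ys)
```

The statistic would be referred to an F distribution with (n - m, m - p) degrees of freedom; the p-value is left out here to keep the sketch free of third-party dependencies.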

2.
We extend the discussion of Qin and Zhang's [1997. A goodness of fit test for logistic regression models based on case–control data. Biometrika 84, 609–618] goodness-of-fit test of logistic regression under case–control data to continuation ratio logistic regression (CRLR) models. We first show that the retrospective CRLR model, which is valid for case–control data (the null hypothesis H0), is equivalent to an I-sample semiparametric model. Then, under H0, we find the semiparametric profile empirical likelihood estimators of the distributions of the covariate conditional on each response category and use them to define a Kolmogorov–Smirnov type test for assessing the global fit of CRLR models under case–control data. Unlike prospective CRLR models, retrospective CRLR models cannot be partitioned into a series of retrospective binary logistic regression models of the kind studied by Qin and Zhang [1997. A goodness of fit test for logistic regression models based on case–control data. Biometrika 84, 609–618].

3.
A key diagnostic in the analysis of linear regression models is whether the fitted model is appropriate for the observed data. The classical lack of fit test is used for testing the adequacy of a linear regression model when replicates are available. While many efforts have been made to find alternative lack of fit tests for models without replicates, this paper focuses on the efficacy of three tests: the classical lack of fit test, Utts' (1982) test, and Burn & Ryan's (1983) test. The powers of these tests are computed for a variety of situations. Comments and conclusions on the overall performance of these tests are made, including recommendations for future studies.
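The classical test mentioned in this abstract can be sketched as follows for a straight-line model. This is an illustration under stated assumptions, not code from the paper: replicates are taken to be observations with exactly equal x values, the model has p = 2 parameters, and the function names are hypothetical.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def lack_of_fit_f(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit."""
    intercept, slope = fit_line(x, y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of replicates around their own group mean.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)  # assumption: replicates share an identical x
    ss_pure = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss_pure += sum((yi - mean) ** 2 for yi in ys)

    n, c, p = len(x), len(groups), 2
    df_lof, df_pe = c - p, n - c
    if df_lof <= 0 or df_pe <= 0 or ss_pure <= 0:
        raise ValueError("need replicates and more distinct x values than parameters")

    # Lack of fit is what remains of the residual SS after removing pure error.
    ss_lof = sse - ss_pure
    return (ss_lof / df_lof) / (ss_pure / df_pe), df_lof, df_pe


# Three x levels, two replicates each, with a curved mean response.
x_obs = [1, 1, 2, 2, 3, 3]
y_obs = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9]
f_stat, df_lof, df_pe = lack_of_fit_f(x_obs, y_obs)
```

The `ValueError` branch is where data without replicates would land, which is the situation the alternative tests in this listing are meant to address.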

4.
Minitab's data subsetting lack of fit test (denoted XLOF) is a combination of Burn and Ryan's test and Utts' test for testing lack of fit in linear regression models. As an alternative to the classical or pure error lack of fit test, it does not require replicates of predictor variables. However, because of uncertainty about its performance, XLOF remains unfamiliar to regression users, while the well-known classical lack of fit test is not applicable to regression data without replicates. So far this procedure has not been mentioned in any textbook and has not been included in any other software package. This study assesses the performance of XLOF in detecting lack of fit in linear regressions without replicates by comparing its power with that of the classical test. The power of XLOF is simulated using Minitab macros for variables with several forms of curvature. These comparisons lead to pragmatic suggestions on the use of XLOF. Based on the results, XLOF was shown to perform better than the classical test. It should be noted that the replicates required by the classical test make it unavailable for most regression data, while XLOF can be as powerful as the classical test even without replicates.

5.
In this paper, a test is derived to assess the validity of heteroscedastic nonlinear regression models by a non‐parametric cosine regression method. For order selection, the paper proposes a data‐driven method that uses the parametric null model optimal order. This method yields a test that is asymptotically normally distributed under the null hypothesis and is consistent against any fixed alternative. Simulation studies that test the lack of fit of a generalized linear model are conducted to compare the performance of the proposed test with that of an existing non‐parametric kernel test. A dataset of esterase levels is used to demonstrate the proposed method in practice.

6.
Zero-inflated Poisson regression is a model commonly used to analyze data with excessive zeros. Although many models have been developed to fit zero-inflated data, most of them strongly depend on the special features of the individual data. For example, there is a need for new models when dealing with truncated and inflated data. In this paper, we propose a new model that is sufficiently flexible to model inflation and truncation simultaneously, and which is a mixture of a multinomial logistic and a truncated Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts. The truncated Poisson regression models the counts that are assumed to follow a truncated Poisson distribution. The performance of our proposed model is evaluated through simulation studies, and our model is found to have the smallest mean absolute error and best model fit. In the empirical example, the data are truncated with inflated values of zero and fourteen, and the results show that our model has a better fit than the other competing models.

7.
A model is presented to generate a distribution for the probability of an ACR response at six months for a new treatment for rheumatoid arthritis given evidence from a one- or three-month clinical trial. The model is based on published evidence from 11 randomized controlled trials on existing treatments. A hierarchical logistic regression model is used to find the relationship between the proportion of patients achieving ACR20 and ACR50 at one and three months and the proportion at six months. The model is assessed by Bayesian predictive P-values that demonstrate that the model fits the data well. The model can be used to predict the number of patients with an ACR response for proposed six-month clinical trials given data from clinical trials of one or three months duration.

8.
Three procedures for testing the adequacy of a proposed linear multiresponse regression model against unspecified general alternatives are considered. The model has an error structure with a matrix normal distribution which allows the vector of responses for a particular run to have an unknown covariance matrix while the responses for different runs are uncorrelated. Furthermore, each response variable may be modeled by a separate design matrix. Multivariate statistics corresponding to the classical univariate lack of fit and pure error sums of squares are defined and used to determine the multivariate lack of fit tests. A simulation study was performed to compare the power functions of the test procedures in the case of replication. Generalizations of the tests for the case in which there are no independent replicates on all responses are also presented.

9.
Demonstrated equivalence between a categorical regression model based on case‐control data and an I‐sample semiparametric selection bias model leads to a new goodness‐of‐fit test. The proposed test statistic is an extension of an existing Kolmogorov–Smirnov‐type statistic and is the weighted average of the absolute differences between two estimated distribution functions in each response category. The paper establishes an optimal property for the maximum semiparametric likelihood estimator of the parameters in the I‐sample semiparametric selection bias model. It also presents a bootstrap procedure, some simulation results and an analysis of two real datasets.

10.
A test is proposed for assessing the lack of fit of heteroscedastic nonlinear regression models that is based on comparison of nonparametric kernel and parametric fits. A data-driven method is proposed for bandwidth selection using the asymptotically optimal bandwidth of the parametric null model. This leads to a test that has a limiting normal distribution under the null hypothesis and is consistent against any fixed alternative. The resulting test is applied to the problem of testing the lack of fit of a generalized linear model.

11.
Several tests for regression lack of fit proposed by Christensen (1989), Shillington (1979) and Neill and Johnson (1985) are compared. The tests considered are applicable for the case of nonreplication and reduce to the classical lack of fit test when independent replications are available. A simulation study is used to compare the size and power of the test procedures for small sample sizes and various configurations of nonreplication. In addition, each test is shown to be consistent as well as invariant with respect to location and scale changes made on the regressor variables.

12.
Logistic regression is a popular method of relating a binary response to one or more potential covariables or risk factors. In 1980, Hosmer and Lemeshow proposed a method for assessing the goodness of fit of logistic regression models. This test is based on a chi-squared statistic that compares the observed and expected cell frequencies in the 2 × g table, as found by sorting the observations by predicted probabilities and forming g groups. We have noted that the test may be sensitive to situations where there are low expected cell frequencies. Further, several commonly used statistical packages apply the Hosmer-Lemeshow test, but do so in different ways, and none of the packages we considered alerted the user to the potential difficulty with low expected cell frequencies. An alternative goodness-of-fit test is illustrated which seems to offer an advantage over the popular Hosmer-Lemeshow test, by reducing the likelihood of small expected counts and, potentially, sharpening the interpretation. An example is provided which demonstrates these ideas.
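The grouping step described in this abstract can be sketched as below. This is one possible convention, not the one any particular package uses: observations are sorted by predicted probability and cut into g groups of (nearly) equal size, ties are not treated specially, and the usual g - 2 degrees of freedom are assumed. The function name and the returned minimum expected count are illustrative additions.

```python
def hosmer_lemeshow(y, p, g=10):
    """Return (statistic, degrees_of_freedom, smallest_expected_count).

    y: observed 0/1 responses; p: fitted probabilities from a logistic model.
    Assumption: equal-size groups formed after sorting by p.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    min_expected = float("inf")
    for k in range(g):
        chunk = pairs[k * n // g:(k + 1) * n // g]
        if not chunk:
            raise ValueError("more groups than observations")
        n_k = len(chunk)
        exp1 = sum(pi for pi, _ in chunk)  # expected number of events
        obs1 = sum(yi for _, yi in chunk)  # observed number of events
        exp0, obs0 = n_k - exp1, n_k - obs1
        if exp1 <= 0 or exp0 <= 0:
            raise ValueError("a cell has zero expected count")
        # Each group contributes one column of the 2 x g table.
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
        min_expected = min(min_expected, exp1, exp0)
    return stat, g - 2, min_expected


# Tiny illustration with three groups of two observations each.
y_obs = [0, 0, 0, 1, 1, 1]
p_hat = [0.25, 0.25, 0.5, 0.5, 0.75, 0.75]
stat, df, min_expected = hosmer_lemeshow(y_obs, p_hat, g=3)
```

Returning the smallest expected cell count makes it easy to flag the low-frequency situation the abstract warns about; the threshold for "too small" is left to the reader.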

13.
An F-statistic which tests a hypothesized linear regression model against the general alternative is developed. Observations are grouped using “near neighbours” and a generalization of the usual lack of fit test is derived. Two data sets from Daniel and Wood (1971) are used to illustrate the methodology. Power considerations are discussed.

14.
Herein, we propose a data-driven test that assesses the lack of fit of nonlinear regression models. The test is based on the comparison of local linear kernel and parametric fits, and no special boundary-corrected kernels are needed at the boundary when local linear fitting is used. Under the parametric null model, the asymptotically optimal bandwidth can be used for bandwidth selection. This selection method leads to a data-driven test that has a limiting normal distribution under the null hypothesis and is consistent against any fixed alternative. The finite-sample properties of the proposed data-driven test are illustrated, and the power of the test is compared with that of some existing tests via simulation studies. We illustrate the practicality of the proposed test by using two data sets.

15.
The hat matrix is widely used as a diagnostic tool in linear regression because it contains the leverages which the independent variables exert on the fitted values. In some experiments, cases with high leverage may be avoided by judicious choice of design for the independent variables. A variety of methods for constructing equileverage designs for linear regression are discussed. Such designs remove one of the factors, namely large leverage points, which can lead to nonrobust estimators and tests. In addition, a method is given for combining equileverage designs to test for lack of fit of the linear model.

16.
The authors consider the problem of testing the validity of the logistic regression model using a random sample. Given the values of the response variable, they observe that the sample actually consists of two independent subsets of observations whose density ratio has a known parametric form when the model is true. They are thus led to propose a generalized-moments specification test, which they describe in detail. In addition, they show that this test can be derived using Neyman's smooth tests for goodness of fit. They present simulation results and apply the methodology to the analysis of two real data sets.

17.
Count data often contain many zeros. In parametric regression analysis of zero-inflated count data, the effect of a covariate of interest is typically modelled via a linear predictor. This approach imposes a restrictive, and potentially questionable, functional form on the relation between the independent and dependent variables. To address the noted restrictions, a flexible parametric procedure is employed to model the covariate effect as a linear combination of fixed-knot cubic basis splines or B-splines. The semiparametric zero-inflated Poisson regression model is fitted by maximizing the likelihood function through an expectation–maximization algorithm. The smooth estimate of the functional form of the covariate effect can enhance modelling flexibility. Within this modelling framework, a log-likelihood ratio test is used to assess the adequacy of the covariate function. Simulation results show that the proposed test has excellent power in detecting the lack of fit of a linear predictor. A real-life data set is used to illustrate the practicality of the methodology.

18.
Although the effect of missing data on regression estimates has received considerable attention, their effect on predictive performance has been neglected. We studied the effect of three missing data strategies (omission of records with missing values, replacement with a mean, and imputation based on regression) on the predictive performance of logistic regression (LR), classification tree (CT) and neural network (NN) models in the presence of data missing completely at random (MCAR). Models were constructed using datasets of size 500 simulated from a joint distribution of binary and continuous predictors including nonlinearities, collinearity and interactions between variables. Though omission produced models that fit better on the data from which the models were developed, imputation was superior on average to omission for all models when evaluating the receiver operating characteristic (ROC) curve area, mean squared error (MSE), pooled variance across outcome categories and calibration χ² on an independently generated test set. However, in about one-third of simulations, omission performed better. Performance was also more variable with omission, including quite a few instances of extremely poor performance. Replacement and imputation generally produced similar results except with neural networks, for which replacement, the strategy typically used in neural network algorithms, was inferior to imputation. Missing data affected simpler models much less than they did more complex models such as generalized additive models that focus on local structure. For moderate sized datasets, logistic regressions that use simple nonlinear structures such as quadratic terms and piecewise linear splines appear to be at least as robust to randomly missing values as neural networks and classification trees.

19.
The Hosmer–Lemeshow (H–L) test is a widely used method for assessing the goodness-of-fit of a logistic regression model. However, the H–L test is sensitive to the sample size and to the number of groups used in the test. Caution is needed when interpreting an H–L test with a large sample size. In this paper, we propose a simple test procedure to evaluate the fit of a logistic regression model with a large sample size, in which a bootstrap method is used and the test result is determined by the power of the H–L test at the target sample size. Simulation studies show that the proposed method can effectively standardize the power of the H–L test under the pre-specified level of type I error. Application to two datasets illustrates the usefulness of the proposed method.

20.
Interaction is very common in reality, but has received little attention in the logistic regression literature. This is especially true for higher-order interactions. In conventional logistic regression, interactions are typically ignored. We propose a model selection procedure by implementing an association rules analysis. We do this by (1) exploring the combinations of input variables which have significant impacts on the response (via association rules analysis); (2) selecting the potential (low- and high-order) interactions; (3) converting these potential interactions into new dummy variables; and (4) performing variable selection among all the input variables and the newly created dummy variables (interactions) to build up the optimal logistic regression model. Our model selection procedure establishes the optimal combination of main effects and potential interactions. The comparisons are made through thorough simulations. It is shown that the proposed method outperforms the existing methods in all cases. A real-life example is discussed in detail to demonstrate the proposed method.
