Similar literature
 Found 20 similar documents; search took 15 ms
1.
A test for lack of fit in regression is presented. Unlike other methods, this one doesn't require replicates or a prior estimate of variance. It can be used for linear or multiple regression, and would be easy to add to existing computer packages. It is based on comparing a fit over low leverage points with a fit over the entire set of data. Distribution theory results are presented, with examples of power. A discussion of its use for detecting violations of other regression assumptions is also given.
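
The comparison in entry 1 can be sketched for a straight-line model as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract gives no formula, so the F ratio below follows the form commonly described for subset-based ("rainbow"-type) tests, and the function names and the `fraction` parameter are hypothetical.

```python
def ols_sse(x, y):
    """Fit y = a + b*x by least squares and return the residual sum of squares."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def leverages(x):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / Sxx for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


def subset_lack_of_fit_f(x, y, fraction=0.5):
    """Return (F, df1, df2) comparing the full fit with a low-leverage fit.

    Assumed form: F = [(SSE_full - SSE_sub) / (n - m)] / [SSE_sub / (m - p)],
    where m is the number of low-leverage points kept and p = 2 coefficients.
    """
    n, p = len(x), 2
    m = int(round(fraction * n))
    if not p < m < n:
        raise ValueError("subset size must satisfy p < m < n")
    h = leverages(x)
    keep = sorted(range(n), key=lambda i: h[i])[:m]  # m lowest-leverage points
    sse_full = ols_sse(x, y)
    sse_sub = ols_sse([x[i] for i in keep], [y[i] for i in keep])
    f_stat = ((sse_full - sse_sub) / (n - m)) / (sse_sub / (m - p))
    return f_stat, n - m, m - p


if __name__ == "__main__":
    xs = list(range(10))
    ys = [xi ** 2 for xi in xs]  # curved data fitted with a straight line
    print(subset_lack_of_fit_f(xs, ys))
```

A large F relative to the F(n - m, m - p) reference distribution suggests that the straight line fits the central points better than it fits the whole data set, which is the signature of lack of fit the abstract describes.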

2.
Minitab's data subsetting lack of fit test (denoted XLOF) combines Burn and Ryan's test and Utts' test for lack of fit in linear regression models. As an alternative to the classical (pure error) lack of fit test, it does not require replicates of the predictor variables. However, because its performance is uncertain, XLOF remains unfamiliar to regression users, while the well-known classical lack of fit test is not applicable to regression data without replicates. So far this procedure has not been mentioned in any textbooks and has not been included in any other software packages. This study assesses the performance of XLOF in detecting lack of fit in linear regressions without replicates by comparing its power with that of the classical test. The power of XLOF is simulated using Minitab macros for variables with several forms of curvature. These comparisons lead to pragmatic suggestions on the use of XLOF. The results show XLOF to perform better than the classical test. Note that the classical test's requirement for replicates makes it unavailable for most regression data, whereas XLOF can be as powerful as the classical test even without replicates.

3.
We develop an omnibus two-sample test for ranked-set sampling (RSS) data. The test statistic is the conditional probability of seeing the observed sequence of ranks in the combined sample, given the observed sequences within the separate samples. We compare the test to existing tests under perfect rankings, finding that it can outperform existing tests in terms of power, particularly when the set size is large. The test does not maintain its level under imperfect rankings. However, one can create a permutation version of the test that is comparable in power to the basic test under perfect rankings and also maintains its level under imperfect rankings. Both tests extend naturally to judgment post-stratification, unbalanced RSS, and even RSS with multiple set sizes. Interestingly, the tests have no simple random sampling analog.

4.
We introduce an omnibus goodness-of-fit test for statistical models for the conditional distribution of a random variable. In particular, this test is useful for assessing whether a regression model fits a data set on all its assumptions. The test is based on a generalization of the Cramér–von Mises statistic and involves a local polynomial estimator of the conditional distribution function. First, the uniform almost sure consistency of this estimator is established. Then, the asymptotic distribution of the test statistic is derived under the null hypothesis and under contiguous alternatives. The extension to the case where unknown parameters appear in the model is developed. A simulation study shows that the test has good power against some common departures encountered in regression models. Moreover, its power is comparable to that of other nonparametric tests designed to examine only specific departures.

5.
An F-statistic which tests a hypothesized linear regression model against the general alternative is developed. Observations are grouped using “near neighbours” and a generalization of the usual lack of fit test is derived. Two data sets from Daniel and Wood (1971) are used to illustrate the methodology. Power considerations are discussed.

6.
The use of a statistic based on cubic spline smoothing is considered for testing nonlinear regression models for lack of fit. The statistic is defined to be the Euclidean squared norm of the smoothed residual vector obtained from fitting the nonlinear model. The asymptotic distribution of the statistic is derived under suitable smooth local alternatives and a numerical example is presented.

7.
8.
Herein, we propose a data-driven test that assesses the lack of fit of nonlinear regression models. The comparison of local linear kernel and parametric fits is the basis of this test, and specific boundary-corrected kernels are not needed at the boundary when local linear fitting is used. Under the parametric null model, the asymptotically optimal bandwidth can be used for bandwidth selection. This selection method leads to the data-driven test that has a limiting normal distribution under the null hypothesis and is consistent against any fixed alternative. The finite-sample property of the proposed data-driven test is illustrated, and the power of the test is compared with that of some existing tests via simulation studies. We illustrate the practicality of the proposed test by using two data sets.

9.
A test is proposed for assessing the lack of fit of heteroscedastic nonlinear regression models that is based on comparison of nonparametric kernel and parametric fits. A data-driven method is proposed for bandwidth selection using the asymptotically optimal bandwidth of the parametric null model, which leads to a test that has a limiting normal distribution under the null hypothesis and is consistent against any fixed alternative. The resulting test is applied to the problem of testing the lack of fit of a generalized linear model.

10.
Assessment of the adequacy of a proposed linear regression model is necessarily subjective. However, the following three criteria may warrant investigation: whether the distributional assumptions for the stochastic portion of the model are satisfied, whether the predictive capability of the model is satisfactory, and whether the deterministic portion of the model is adequate in a statistical sense. The first two criteria have been reviewed in the literature to some extent. This paper reviews statistical tests and procedures which aid the experimenter in determining lack of fit or functional misspecification associated with the deterministic portion of a proposed linear regression model.

11.
Goodness of fit tests for the multiple logistic regression model   (total citations: 1; self-citations: 0; citations by others: 1)
Several test statistics are proposed for the purpose of assessing the goodness of fit of the multiple logistic regression model. The test statistics are obtained by applying a chi-square test for a contingency table in which the expected frequencies are determined using two different grouping strategies and two different sets of distributional assumptions. The null distributions of these statistics are examined by applying the theory for chi-square tests of Moore and Spruill (1975) and through computer simulations. All statistics are shown to have a chi-square distribution or a distribution which can be well approximated by a chi-square. The degrees of freedom are shown to depend on the particular statistic and the distributional assumptions.

The power of each of the proposed statistics is examined for the normal, linear, and exponential alternative models using computer simulations.

12.
A key diagnostic in the analysis of linear regression models is whether the fitted model is appropriate for the observed data. The classical lack of fit test is used for testing the adequacy of a linear regression model when replicates are available. While many efforts have been made in finding alternative lack of fit tests for models without replicates, this paper focuses on studying the efficacy of three tests: the classical lack of fit test, Utts' (1982) test, and Burn & Ryan's (1983) test. The powers of these tests are computed for a variety of situations. Comments and conclusions on the overall performance of these tests are made, including recommendations for future studies.
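
For orientation, the classical (pure error) lack of fit test mentioned in entry 12 can be sketched for a straight-line model with replicated x values. This is a textbook-style illustration rather than code from the paper; the function name is hypothetical, and the decomposition SSE = SSPE + SSLF with F = [SSLF / (c - p)] / [SSPE / (n - c)] is the standard form, where c is the number of distinct x values.

```python
from collections import defaultdict


def pure_error_lack_of_fit_f(x, y):
    """Return (F, df1, df2) for the classical lack of fit test of y = a + b*x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)

    # Ordinary least squares fit of the straight line.
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of responses around their own replicate-group mean.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)
    if not p < c < n:
        raise ValueError("need more distinct x values than coefficients, plus replicates")
    sspe = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        sspe += sum((yi - mean) ** 2 for yi in ys)

    sslf = sse - sspe  # lack of fit: what the line misses beyond pure error
    return (sslf / (c - p)) / (sspe / (n - c)), c - p, n - c


if __name__ == "__main__":
    x = [1, 1, 2, 2, 3, 3]
    y = [1, 2, 4, 5, 2, 3]
    print(pure_error_lack_of_fit_f(x, y))
```

The `ValueError` branch makes the abstract's point concrete: without replicates (c = n) the pure error term has no degrees of freedom, so the classical test cannot be computed.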

13.
The author proposes a nonparametric test for checking the lack of fit of the quantile function of survival time given the covariates; she assumes that survival time is subjected to random right censoring. Her test statistic is a kernel-based smoothing estimator of a moment condition. The test statistic is asymptotically Gaussian under the null hypothesis. The author investigates its behavior under local alternative sequences. She assesses its finite-sample power through simulations and illustrates its use with the Stanford heart transplant data.

14.
Logistic regression is a popular method of relating a binary response to one or more potential covariables or risk factors. In 1980, Hosmer and Lemeshow proposed a method for assessing the goodness of fit of logistic regression models. This test is based on a chi-squared statistic that compares the observed and expected cell frequencies in the 2 × g table, as found by sorting the observations by predicted probabilities and forming g groups. We have noted that the test may be sensitive to situations where there are low expected cell frequencies. Further, several commonly used statistical packages apply the Hosmer-Lemeshow test, but do so in different ways, and none of the packages we considered alerted the user to the potential difficulty with low expected cell frequencies. An alternative goodness-of-fit test is illustrated which seems to offer an advantage over the popular Hosmer-Lemeshow test, by reducing the likelihood of small expected counts and, potentially, sharpening the interpretation. An example is provided which demonstrates these ideas.
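
The grouping step described in entry 14 can be sketched as below. This is a minimal illustration, not the authors' procedure or any package's implementation: the equal-size split by rank, the g - 2 degrees of freedom, the `min_expected` threshold of 5, and the function name are all assumptions, and the abstract itself notes that packages differ in how they form the groups.

```python
def hosmer_lemeshow(y, p, g=10, min_expected=5.0):
    """Return (statistic, df, low_cells) for binary outcomes y and fitted probabilities p.

    Observations are sorted by fitted probability and cut into g groups of
    (nearly) equal size, giving a 2 x g table of observed and expected counts.
    low_cells counts table cells whose expected frequency is below min_expected.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    low_cells = 0
    for k in range(g):
        idx = order[k * n // g:(k + 1) * n // g]
        obs1 = sum(y[i] for i in idx)   # observed events in the group
        exp1 = sum(p[i] for i in idx)   # expected events in the group
        obs0 = len(idx) - obs1          # observed non-events
        exp0 = len(idx) - exp1          # expected non-events
        for observed, expected in ((obs1, exp1), (obs0, exp0)):
            if expected < min_expected:
                low_cells += 1
            if expected > 0:            # skip empty cells to avoid dividing by zero
                stat += (observed - expected) ** 2 / expected
    return stat, g - 2, low_cells


if __name__ == "__main__":
    outcomes = [0, 0, 1, 1]
    fitted = [0.25, 0.25, 0.75, 0.75]
    print(hosmer_lemeshow(outcomes, fitted, g=2))
```

Returning `low_cells` alongside the statistic is one way to surface the low-expected-frequency warning that, according to the abstract, none of the packages considered gave.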

15.
16.
Summary. The paper describes a method of estimating the performance of a multiple-screening test where those who test negatively do not have their true disease status determined. The methodology is motivated by a data set on 49927 subjects who were given K = 6 binary tests for bowel cancer. A complicating factor is that individuals may have polyps in the bowel, a condition that the screening test is not designed to detect but which may be worth diagnosing. The methodology is based on a multinomial logit model for Pr(S | R_6), the probability distribution of patient status S (healthy, polyps or diseased) conditional on the results R_6 from six binary tests. An advantage of the methodology described is that the modelling is data driven. In particular, we require no assumptions about correlation within subjects, the relative sensitivity of the K tests or the conditional independence of the tests. The model leads to simple estimates of the trade-off between different errors as the number of tests is varied, presented graphically by using receiver operating characteristic curves. Finally, the model allows us to estimate better protocols for assigning subjects to the disease group, as well as the gains in accuracy from these protocols.

17.
Errors in measurement frequently occur in observing responses. If case–control data are based on certain reported responses, which may not be the true responses, then we have contaminated case–control data. In this paper, we first show that ordinary logistic regression analysis based on contaminated case–control data can lead to seriously biased conclusions. This follows from a theoretical argument, one example, and two simulation studies. We next derive the semiparametric maximum likelihood estimate (MLE) of the risk parameter of a logistic regression model when there is a validation subsample. The asymptotic normality of the semiparametric MLE is shown, along with a consistent estimate of its asymptotic variance. Our example and two simulation studies show these estimates to have reasonable performance in finite samples.

18.
The purpose of this study is to highlight the application of sparse logistic regression models in dealing with prediction of tumour pathological subtypes based on lung cancer patients' genomic information. We consider sparse logistic regression models to deal with the high dimensionality and correlation between genomic regions. In a hierarchical likelihood (HL) method, it is assumed that the random effects follow a normal distribution and its variance is assumed to follow a gamma distribution. This formulation considers ridge and lasso penalties as special cases. We extend the HL penalty to include a ridge penalty (called ‘HLnet’) on a principle similar to that of the elastic net penalty, which is constructed from the lasso penalty. The results indicate that the HL penalty creates sparser estimates than the lasso penalty with comparable prediction performance, while the HLnet and elastic net penalties have the best prediction performance in real data. We illustrate the methods in a lung cancer study.

19.
We derive approximations to the first three moments of the conditional distribution of the deviance statistic, for testing the goodness of fit of generalized linear models with non-canonical links, by using an estimating equations approach, for data that are extensive but sparse. A supplementary estimating equation is proposed from which the modified deviance statistic is obtained. An application of the modified deviance statistic to binomial and Poisson data is shown. We also conduct a performance study of the modified Pearson statistic derived by Farrington and the modified deviance statistic derived in this paper, in terms of size and power, through a small scale simulation experiment. Both statistics are shown to perform well in terms of size. The deviance statistic, however, shows an advantage in power. Two examples are given.

20.
Likelihood-ratio tests (LRTs) are often used for inferences on one or more logistic regression coefficients. Conventionally, for given parameters of interest, the nuisance parameters of the likelihood function are replaced by their maximum likelihood estimates. The resulting function is called the profile likelihood function, and is used for LRT-based inference. In small samples, the LRT based on the profile likelihood does not follow a χ² distribution. Several corrections have been proposed to improve the LRT when used with small-sample data. Additionally, complete or quasi-complete separation is a common geometric feature of small-sample binary data. In this article, for small-sample binary data, we have derived explicitly the correction factors of the LRT for models with and without separation, and proposed an algorithm to construct confidence intervals. We have investigated the performances of different LRT corrections, and the corresponding confidence intervals, through simulations. Based on the simulation results, we propose an empirical rule of thumb on the use of these methods. Our simulation findings are also supported by real-world data.
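
In symbols, the profile likelihood construction in entry 20 can be written as follows. The notation is an assumption introduced here for illustration, since the abstract does not define any: ψ denotes the parameters of interest, λ the nuisance parameters, ℓ the log-likelihood, and q the dimension of ψ.

```latex
\ell_p(\psi) \;=\; \max_{\lambda}\, \ell(\psi, \lambda) \;=\; \ell\bigl(\psi, \hat{\lambda}_{\psi}\bigr),
\qquad
W(\psi_0) \;=\; 2\,\bigl\{ \ell_p(\hat{\psi}) - \ell_p(\psi_0) \bigr\}
\;\overset{\text{approx.}}{\sim}\; \chi^2_q
\quad \text{under } H_0\colon \psi = \psi_0 .
```

The χ² approximation for W is a large-sample result; the small-sample corrections the abstract refers to adjust W so that its distribution is closer to this reference. Their explicit form is not given in the abstract and is not reproduced here.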

