Similar Literature
20 similar documents found.
1.
Maximum penalized likelihood estimation is applied in non- and semi-parametric regression problems, and enables exploratory identification and diagnosis of nonlinear regression relationships. The smoothing parameter λ controls the trade-off between the smoothness and the goodness-of-fit of a function. Cross-validation is used for selecting λ, but generalized cross-validation, which is based on the squared-error criterion, behaves poorly under non-normal distributions and often cannot select a reasonable λ. The purpose of this study is to propose a method which gives a more suitable λ and to evaluate its performance.

A method of simple calculation for the delete-one estimates in the likelihood-based cross-validation (LCV) score is described. A score of similar form to the Akaike information criterion (AIC) is also derived. The proposed scores are compared with those of standard procedures by using data sets from the literature. Simulations are performed to compare the patterns of selecting λ and the overall goodness-of-fit, and to evaluate the effects of some factors. The LCV scores obtained by the simple calculation provide a good approximation to the exact ones if λ is not extremely small. Furthermore, the LCV scores obtained by the simple calculation make it possible to select λ adaptively. They have the effect of reducing the bias of the estimates and provide better performance in the sense of overall goodness-of-fit. These scores are useful especially in the case of small sample sizes and in the case of binary logistic regression.

2.
Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design-consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design-consistent, but are systematically larger than would be obtained with a cross-sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
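For reference, a minimal sketch of the ordinary (unweighted, cross-sectional) Cox–Snell and Nagelkerke summaries for a logistic fit in Python; the design-weighted, design-consistent estimators proposed in the note are not reproduced here, and the simulated data and variable names are illustrative.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative binary outcome with one covariate
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

# Cox-Snell R^2 = 1 - (L0 / L1)^(2/n), written via log-likelihoods
cox_snell = 1 - np.exp(2 * (fit.llnull - fit.llf) / n)
# Nagelkerke rescales Cox-Snell by its maximum attainable value, 1 - L0^(2/n)
nagelkerke = cox_snell / (1 - np.exp(2 * fit.llnull / n))
print(cox_snell, nagelkerke)
```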

3.
It is common to monitor several correlated quality characteristics using Hotelling's T² statistic. However, T² confounds location shifts with scale shifts, and consequently it is often difficult to determine the factors responsible for an out-of-control signal in terms of the process mean vector and/or the process covariance matrix. In this paper, we propose a diagnostic procedure called the ‘D-technique’ to detect the nature of the shift. For this purpose, two sets of regression equations, each consisting of the regression of a variable on the remaining variables, are used to characterize the ‘structure’ of the ‘in control’ process and that of the ‘current’ process. To determine the sources responsible for an out-of-control state, it is shown that it is enough to compare these two structures using the dummy variable multiple regression equation. The proposed method is operationally simpler and computationally advantageous over existing diagnostic tools. The technique is illustrated with various examples.
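For context, a minimal sketch of the T² statistic itself, evaluated for a single observation against known in-control parameters; the D-technique's dummy-variable regression comparison is not reproduced here, and the numbers are made up.

```python
import numpy as np

# Illustrative in-control mean vector and covariance matrix
mu0 = np.array([10.0, 5.0])
sigma0 = np.array([[4.0, 1.5],
                   [1.5, 2.0]])
sigma0_inv = np.linalg.inv(sigma0)

def hotelling_t2(x, mu, sigma_inv):
    """T^2 for a single observation against known in-control parameters."""
    d = x - mu
    return float(d @ sigma_inv @ d)

# A new observation to be checked against the in-control state
x_new = np.array([12.5, 3.8])
print(hotelling_t2(x_new, mu0, sigma0_inv))
```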

4.
The problems of existence and uniqueness of maximum likelihood estimates for logistic regression were completely solved by Silvapulle in 1981 and by Albert and Anderson in 1984. In this paper, we extend the well-known results by Silvapulle and by Albert and Anderson to weighted logistic regression. We analytically prove the equivalence between the overlap condition used by Albert and Anderson and that used by Silvapulle. We show that the maximum likelihood estimate of weighted logistic regression does not exist if there is a complete separation or a quasicomplete separation of the data points, and exists and is unique if there is an overlap of data points. Our proofs and results for weighted logistic regression also apply to unweighted logistic regression.
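A toy numerical illustration (not the paper's proofs) of why the MLE fails to exist under complete separation: the log-likelihood keeps increasing as the slope grows, so no finite maximizer exists. The data are made up.

```python
import numpy as np

# Completely separated toy data: y = 1 exactly when x > 2.5
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

def loglik(beta0, beta1):
    """Unweighted logistic log-likelihood for this toy data set."""
    p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Along the path beta0 = -2.5*b, beta1 = b the log-likelihood tends to 0
# from below as b grows, so there is no finite maximum likelihood estimate.
for b in [1, 5, 10, 50]:
    print(b, loglik(-2.5 * b, b))
```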

5.
Berkson (1980) conjectured that minimum χ² was a superior procedure to that of maximum likelihood, especially with regard to mean squared error. To explore his conjecture, we analyze his (1955) bioassay problem related to logistic regression. We consider not only the criterion of mean squared error for the comparison of these estimators, but also include alternative criteria such as concentration functions and Pitman's measure of closeness. The choice of these latter criteria is motivated by Rao's (1981) considerations of the shortcomings of mean squared error. We also include several Rao-Blackwellized versions of the minimum logit χ² estimator for the purpose of these comparisons.
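A brief sketch of the minimum logit χ² idea for grouped bioassay-type data, computed as a weighted least-squares fit of the empirical logits, with the maximum likelihood fit shown for comparison. The dose-response counts are invented for illustration and are not Berkson's 1955 data.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative grouped bioassay data: dose, number exposed, number responding
dose = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
n = np.array([40, 40, 40, 40, 40])
r = np.array([4, 9, 18, 29, 36])

p_hat = r / n
logit = np.log(p_hat / (1 - p_hat))           # empirical logits
w = n * p_hat * (1 - p_hat)                   # minimum logit chi-square weights

X = sm.add_constant(np.log(dose))
min_logit_chi2 = sm.WLS(logit, X, weights=w).fit()

# Maximum likelihood fit of the same logistic model for comparison
mle = sm.GLM(np.column_stack([r, n - r]), X,
             family=sm.families.Binomial()).fit()
print(min_logit_chi2.params)
print(mle.params)
```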

6.
Random coefficient regression models have been used to describe repeated measures on members of a sample of n individuals. Previous researchers have proposed methods of estimating the mean parameters of such models. Their methods require that each individual be observed under the same settings of the independent variables or, less stringently, that the number of observations, r, on each individual be the same. Under the latter restriction, estimators of the mean regression parameters exist which are consistent as both r→∞ and n→∞ and efficient as r→∞, and large-sample (r large) tests of the mean parameters are available. These results are easily extended to the case where not all individuals are observed an equal number of times, provided limits are taken as min(r)→∞. Existing methods of inference, however, are not justified by the current literature when n is large and r is small, as is the case in many biomedical applications. The primary contribution of the current paper is a derivation of the asymptotic properties of modifications of existing estimators as n alone tends to infinity, with r fixed. From these properties it is shown that existing methods of inference, which are currently justified only when min(r) is large, are also justifiable when n is large and min(r) is small. A secondary contribution is the definition of a positive definite estimator of the covariance matrix for the random coefficients in these models. Use of this estimator avoids computational problems that can otherwise arise.
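As a point of reference, a minimal sketch of fitting a random coefficient (random intercept and slope) model for repeated measures with statsmodels; this illustrates the model class only, not the modified estimators or the positive definite covariance estimator derived in the paper. The data and names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative longitudinal data: n individuals, r repeated measures each
rng = np.random.default_rng(1)
n, r = 50, 4
subj = np.repeat(np.arange(n), r)
t = np.tile(np.arange(r, dtype=float), n)
b0 = rng.normal(0, 1.0, n)[subj]              # random intercepts
b1 = rng.normal(0, 0.5, n)[subj]              # random slopes
y = 2.0 + 1.0 * t + b0 + b1 * t + rng.normal(0, 0.5, n * r)
df = pd.DataFrame({"y": y, "t": t, "subj": subj})

# Random coefficient model: random intercept and slope for each individual
fit = smf.mixedlm("y ~ t", df, groups=df["subj"], re_formula="~t").fit()
print(fit.fe_params)                          # estimated mean intercept and slope
```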

7.
Using a wavelet basis, Chesneau and Shirazi study the estimation of one-dimensional regression functions in a biased nonparametric model under L² risk (see Chesneau, C. and Shirazi, E., Nonparametric wavelet regression based on biased data, Communications in Statistics – Theory and Methods, 43: 2642–2658, 2014). This article considers d-dimensional regression function estimation under Lᵖ (1 ≤ p < ∞) risk. It turns out that our results reduce to the corresponding theorems of Chesneau and Shirazi when d = 1 and p = 2.

8.
Estimating the risk factors of a disease such as diabetic retinopathy (DR) is one of the important research problems among biomedical and statistical practitioners as well as epidemiologists. Incidentally, many studies have focused on building models with binary outcomes, which may not exploit the available information. This article investigates the importance of retaining the ordinal nature of the response variable (e.g. the severity level of a disease) while determining the risk factors associated with DR. A generalized linear model approach with appropriate link functions has been studied using both classical and Bayesian frameworks. From the results of this study, it can be observed that ordinal logistic regression with a probit link function could be a more appropriate approach for determining the risk factors of DR. The study emphasizes ways to handle the ordinal nature of the response variable with better model fit compared to other link functions.
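A minimal sketch of an ordinal regression with a probit link (a cumulative, ordered probit model) using statsmodels; the severity levels, covariates, and cut-points below are invented for illustration and are not the study's DR data.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Illustrative data: ordinal severity (0 < 1 < 2) and two candidate risk factors
rng = np.random.default_rng(2)
n = 300
duration = rng.gamma(5, 2, n)                 # e.g. duration of diabetes (years)
hba1c = rng.normal(7.5, 1.2, n)               # e.g. HbA1c level
latent = 0.08 * duration + 0.5 * hba1c + rng.normal(size=n)
severity = pd.Series(pd.Categorical(np.digitize(latent, [4.0, 5.0]),
                                    categories=[0, 1, 2], ordered=True))

X = pd.DataFrame({"duration": duration, "hba1c": hba1c})
fit = OrderedModel(severity, X, distr="probit").fit(method="bfgs", disp=0)
print(fit.summary())
```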

9.
In comparison to other experimental studies, multicollinearity appears frequently in mixture experiments, a special study area of response surface methodology, due to the constraints on the components composing the mixture. When mixture experiments are analysed with a special generalized linear model, the logistic regression model, multicollinearity causes precision problems in the maximum-likelihood logistic regression estimate. Therefore, the effects of multicollinearity can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased estimators for the estimation of the coefficients. In this paper, we suggest the use of the logistic ridge regression (RR) estimator in cases where there is multicollinearity during the analysis of mixture experiments using logistic regression. Also, for the selection of the biasing parameter, we use fraction of design space plots for evaluating the effect of the logistic RR estimator with respect to the scaled mean squared error of prediction. The suggested graphical approaches are illustrated on the tumor incidence data set.
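A small sketch of ridge (L2-penalized) logistic regression under collinearity, using scikit-learn, where the inverse penalty strength C plays the role of the biasing parameter; the fraction of design space plots and the tumor incidence data are not reproduced, and the near-collinear mixture-style data below are simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Simulated mixture-style data with two nearly collinear components
rng = np.random.default_rng(3)
n = 200
x1 = rng.uniform(0.2, 0.6, n)
x2 = 0.9 - x1 + rng.normal(0, 0.01, n)        # nearly determined by x1
X = StandardScaler().fit_transform(np.column_stack([x1, x2]))
y = rng.binomial(1, 1 / (1 + np.exp(-(1.0 + 2.0 * X[:, 0]))))

# Ridge (L2) logistic regression: smaller C means heavier shrinkage
for C in [0.01, 0.1, 1.0, 10.0]:
    coef = LogisticRegression(penalty="l2", C=C).fit(X, y).coef_.ravel()
    print(C, coef)
```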

10.
11.
Fisher's linear discriminant analysis (FLDA) is known as a method to find a discriminative feature space for multi-class classification. As a theory extending FLDA to an ultimate nonlinear form, optimal nonlinear discriminant analysis (ONDA) has been proposed. ONDA indicates that the best theoretical nonlinear map for maximizing Fisher's discriminant criterion is formulated by using the Bayesian a posteriori probabilities. In addition, the theory proves that FLDA is equivalent to ONDA when the Bayesian a posteriori probabilities are approximated by linear regression (LR). Due to some limitations of the linear model, there is room to modify FLDA by using stronger approximation/estimation methods. For the purpose of probability estimation, multinomial logistic regression (MLR) is more suitable than LR. Along this line, in this paper, we develop a nonlinear discriminant analysis (NDA) in which the posterior probabilities in ONDA are estimated by MLR. In addition, in this paper, we develop a way to introduce sparseness into discriminant analysis. By applying L1 or L2 regularization to LR or MLR, we can incorporate sparseness in FLDA and our NDA to increase generalization performance. The performance of these methods is evaluated by benchmark experiments using standard datasets and a face classification experiment.

12.
A substantial fraction of statistical analyses, and in particular of statistical computing, is done under the heading of multiple linear regression, that is, the fitting of equations to multivariate data using the least squares technique for estimating parameters. The optimality properties of these estimates are described in an ideal setting which is not often realized in practice.

Frequently, we do not have "good" data in the sense that the errors are non-normal or the variance is non-homogeneous. The data may contain outliers or extremes which are not easily detectable, the variables may not be expressed in the proper functional form, and the linearity that is assumed may not hold.

Prior to the mid-sixties, regression programs provided just the basic least squares computations plus possibly a step-wise algorithm for variable selection. The increased interest in regression, prompted by dramatic improvements in computers, has led to a vast amount of literature describing alternatives to least squares, improved variable selection methods, and extensive diagnostic procedures.

The purpose of this paper is to summarize and illustrate some of these recent developments. In particular, we shall review some of the potential problems with regression data, discuss the statistics and techniques used to detect these problems, and consider some of the proposed solutions. An example is presented to illustrate the effectiveness of these diagnostic methods in revealing such problems and the potential consequences of employing the proposed methods.
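A small sketch of the kinds of diagnostics such reviews typically cover (studentized residuals, leverage, Cook's distance), using statsmodels' influence tools on simulated data with one injected outlier; this is illustrative and not the paper's example.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with one gross outlier in the response
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 40)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, 40)
y[-1] += 15.0                                  # inject an outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
infl = fit.get_influence()

print(infl.resid_studentized_external[-5:])    # externally studentized residuals
print(infl.hat_matrix_diag[-5:])               # leverages
print(infl.cooks_distance[0][-5:])             # Cook's distance values
```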

13.
The use of individualized regressions, which reduces the polychotomous logistic regression model to several dichotomous models, has been proposed as a solution to some practical difficulties for binary covariates (Begg and Gray 1984, Biometrika, 71, 11–18). Its disadvantages, however, include loss of efficiency and the complexity of making comparisons among regressions. Using expressions for the large-sample distribution of the maximum-likelihood estimates, the efficiency of the individualized procedure relative to the polychotomous procedure is evaluated for the case in which the covariates are assumed to follow a multivariate normal distribution. The relative efficiency when the logistic slope vectors from different regressions are collinear can be substantially lower compared to the efficiency with orthogonal slope vectors. Further evaluations for binary covariates using collinear and orthogonal slope parametrizations lead to a similar characterization.
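A brief sketch contrasting the individualized approach (separate binary logistic fits of each category against the baseline) with the full polychotomous (multinomial) fit; it illustrates the two procedures only, not the paper's efficiency calculations. The three-category data are simulated.

```python
import numpy as np
import statsmodels.api as sm

# Simulated 3-category outcome with two covariates (category 0 is the baseline)
rng = np.random.default_rng(5)
n = 600
X = sm.add_constant(rng.normal(size=(n, 2)))
B = np.array([[0.0, 0.0, 0.0],     # intercepts for categories 0, 1, 2
              [0.0, 0.8, -0.5],    # slopes for the first covariate
              [0.0, 0.3, 0.9]])    # slopes for the second covariate
eta = X @ B
prob = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in prob])

# Polychotomous (multinomial) logistic regression
poly = sm.MNLogit(y, X).fit(disp=0)

# Individualized regressions: each category fitted against the baseline category 0
individual = {}
for k in (1, 2):
    keep = (y == 0) | (y == k)
    individual[k] = sm.Logit((y[keep] == k).astype(int), X[keep]).fit(disp=0)

print(poly.params)                         # columns: categories 1 and 2 vs 0
print(individual[1].params, individual[2].params)
```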

14.
A recent article in this journal presented a variety of expressions for the coefficient of determination (R²) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the R² statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the R² statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.
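A minimal sketch of the pitfall's flavor: the R² reported by a weighted least-squares fit is computed on the weighted scale and need not equal the proportion of variance explained on the original scale. The heteroscedastic data are simulated, and the article's proposed modified estimator is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroscedastic data: error standard deviation grows with x
rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, x)
X = sm.add_constant(x)

wls = sm.WLS(y, X, weights=1.0 / x**2).fit()

# R^2 reported by the weighted fit (computed on the weighted scale)
print(wls.rsquared)

# Proportion of variance explained on the original, unweighted scale
resid = y - wls.fittedvalues
print(1 - resid.var() / y.var())
```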

15.
This study considers the binary classification of functional data collected in the form of curves. In particular, we assume a situation in which the curves are highly mixed over the entire domain, so that the global discriminant analysis based on the entire domain is not effective. This study proposes an interval-based classification method for functional data: the informative intervals for classification are selected and used for separating the curves into two classes. The proposed method, called functional logistic regression with fused lasso penalty, combines the functional logistic regression as a classifier and the fused lasso for selecting discriminant segments. The proposed method automatically selects the most informative segments of functional data for classification by employing the fused lasso penalty and simultaneously classifies the data based on the selected segments using the functional logistic regression. The effectiveness of the proposed method is demonstrated with simulated and real data examples.

16.
This article rigorously proves the superiority of the proportion χ² test to the logistic regression Wald test in terms of power when comparing two rates, despite their asymptotic equivalence under the null hypothesis that the two rates are equal.
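A hedged sketch of how each statistic is obtained for the same two-sample binary data (the paper's power comparison itself is not reproduced); the counts are invented.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2_contingency

# Invented two-group binary data: successes and group sizes
success = np.array([30, 45])
total = np.array([100, 100])

# Proportion chi-square test (no continuity correction)
table = np.column_stack([success, total - success])
chi2, p_chi2, _, _ = chi2_contingency(table, correction=False)

# Logistic regression Wald test for the group indicator
y = np.concatenate([np.repeat(1, success[0]), np.repeat(0, total[0] - success[0]),
                    np.repeat(1, success[1]), np.repeat(0, total[1] - success[1])])
g = np.concatenate([np.zeros(total[0]), np.ones(total[1])])
fit = sm.Logit(y, sm.add_constant(g)).fit(disp=0)
wald_z = fit.params[1] / fit.bse[1]

print(chi2, p_chi2)                 # proportion chi-square statistic and p-value
print(wald_z**2, fit.pvalues[1])    # squared Wald z and its p-value
```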

17.
This article examines several goodness-of-fit measures in the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified and one new pseudo-R² measure are proposed. For the probit regression model, empirical comparisons are made between the different goodness-of-fit measures and the squared sample correlation coefficient of the observed response and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.
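For orientation, a minimal sketch computing McFadden's pseudo-R² and the squared sample correlation between the observed responses and the fitted probabilities for a probit fit; the specific modified and new measures proposed in the article are not reproduced, and the data are simulated.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

# Simulated binary data and a probit fit
rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=(n, 2))
y = rng.binomial(1, norm.cdf(0.3 + 0.8 * x[:, 0] - 0.5 * x[:, 1]))

fit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)

mcfadden = 1 - fit.llf / fit.llnull                    # likelihood-ratio pseudo-R^2
corr_sq = np.corrcoef(y, fit.predict())[0, 1] ** 2     # squared sample correlation
print(mcfadden, corr_sq)
```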

18.
Gaussian process (GP) is a Bayesian nonparametric regression model, showing good performance in various applications. However, during its model-tuning procedure, the GP implementation suffers from numerous covariance-matrix inversions of expensive O(N³) operations, where N is the matrix dimension. In this article, we propose using the quasi-Newton BFGS O(N²)-operation formula to approximate/replace recursively the inverse of the covariance matrix at every iteration. The implementation accuracy is carefully guaranteed by a matrix-trace criterion and by the restarts technique to generate good initial guesses. A number of numerical tests are then performed based on the sinusoidal regression example and the Wiener–Hammerstein identification example. It is shown that by using the proposed implementation, more than 80% of the O(N³) operations could be eliminated, and a typical speedup of 5–9 could be achieved as compared to the standard maximum-likelihood-estimation (MLE) implementation commonly used in Gaussian process regression.
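A sketch of the rank-two BFGS inverse update that underlies this idea, written so each application costs O(N²) (matrix-vector products and outer products only) instead of an O(N³) re-inversion. This is the generic formula under stated assumptions, not the authors' GP implementation; their matrix-trace accuracy criterion and restart logic are not reproduced.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Rank-two BFGS update of H, an approximation to a matrix inverse.

    s is the step and y the corresponding change in the mapped vector
    (in optimization: the parameter step and the gradient change).
    Cost is O(N^2) per update versus O(N^3) for a fresh inversion.
    """
    rho = 1.0 / float(y @ s)
    Hy = H @ y                                            # O(N^2)
    c = rho * (1.0 + rho * float(y @ Hy))
    return H - rho * (np.outer(s, Hy) + np.outer(Hy, s)) + c * np.outer(s, s)

# Quick check: the updated inverse satisfies the secant condition H_new @ y = s
rng = np.random.default_rng(8)
N = 5
A = rng.normal(size=(N, N)); A = A @ A.T + N * np.eye(N)  # SPD target matrix
H = np.linalg.inv(A + 0.1 * np.eye(N))                    # stale inverse to be updated
s = rng.normal(size=N)
y = A @ s
H_new = bfgs_inverse_update(H, s, y)
print(np.allclose(H_new @ y, s))
```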

19.
Studying the effect of an exposure or intervention on a dichotomous outcome is very common in medical research. Logistic regression (LR) is often used to determine such an association, and it provides the odds ratio (OR). The OR often overestimates the effect size for prevalent outcome data. In such situations, use of the relative risk (RR) has been suggested. We propose modifications of the Zhang and Yu and the Diaz-Quijano methods. These methods were compared with the stratified Mantel-Haenszel method, LR, log-binomial regression (LBR), the Zhang and Yu method, Poisson/Cox regression, modified Poisson/Cox regression, the marginal probability method, the COPY method, inverse probability of treatment weighted LBR, and the Diaz-Quijano method. Our proposed modified Diaz-Quijano (MDQ) method provides an RR and confidence interval similar to those estimated by modified Poisson/Cox regression and LBR. The proposed modification of the Zhang and Yu method provides a better estimate of the RR and its standard error than the original Zhang and Yu method in a variety of situations with prevalent outcomes. The MDQ method can be used easily to estimate the RR and its confidence interval in studies which require reporting of RRs. Regression models which directly provide the estimate of RR without convergence problems, such as the MDQ method and modified Poisson/Cox regression, should be preferred.
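A hedged sketch of one of the comparator approaches mentioned above, modified Poisson regression (a Poisson GLM with a robust sandwich covariance) for estimating an RR from a prevalent binary outcome; the proposed MDQ modification is not reproduced, and the exposure data are simulated.

```python
import numpy as np
import statsmodels.api as sm

# Simulated prevalent binary outcome with a binary exposure (true RR = 1.5)
rng = np.random.default_rng(9)
n = 1000
exposed = rng.binomial(1, 0.4, n)
y = rng.binomial(1, np.where(exposed == 1, 0.45, 0.30))

X = sm.add_constant(exposed)

# Modified Poisson regression: Poisson GLM with robust (sandwich) standard errors
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
rr = np.exp(fit.params[1])                  # relative risk for exposure
rr_ci = np.exp(fit.conf_int()[1])           # confidence interval for the RR
print(rr, rr_ci)
```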

20.
Approximate Representation of Estimators in Constrained Regression Problems
The estimators of inequality-constrained regression problems can be computed by iterative algorithms of mathematical programming, but they do not have analytical expressions in terms of the given data. This situation creates obstacles to further studies of constrained regression. In this paper we derive approximate representations of the estimators with a remainder of magnitude (N^{-1} log log N)^{1/2}. From these representations one can clearly see the concrete structure of the estimators of these problems. It will be very helpful for further regression analysis.
