Similar Literature
20 similar documents found.
1.
Maximum penalized likelihood estimation is applied in non- and semi-parametric regression problems, and enables exploratory identification and diagnosis of nonlinear regression relationships. The smoothing parameter λ controls the trade-off between the smoothness and the goodness-of-fit of a function. Cross-validation is used for selecting λ, but generalized cross-validation, which is based on the squared-error criterion, behaves poorly under non-normal distributions and often cannot select a reasonable λ. The purpose of this study is to propose a method which gives a more suitable λ and to evaluate its performance.

A method of simple calculation for the delete-one estimates in the likelihood-based cross-validation (LCV) score is described. A score of similar form to the Akaike information criterion (AIC) is also derived. The proposed scores are compared with those of standard procedures by using data sets from the literature. Simulations are performed to compare the patterns of selecting λ and the overall goodness-of-fit, and to evaluate the effects of some factors. The LCV scores obtained by the simple calculation provide a good approximation to the exact ones if λ is not extremely small. Furthermore, the LCV scores obtained by the simple calculation make it possible to select λ adaptively. They have the effect of reducing the bias of the estimates and provide better performance in the sense of overall goodness-of-fit. These scores are useful especially in the case of small sample sizes and in the case of binary logistic regression.

2.
Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design-consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design-consistent, but are systematically larger than would be obtained with a cross-sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
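For reference, a minimal sketch of the ordinary (unweighted, cross-sectional) Cox–Snell and Nagelkerke summaries for a logistic fit in Python; the design-weighted, design-consistent estimators proposed in the note are not reproduced here, and the simulated data and variable names are illustrative.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative binary outcome with one covariate
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

# Cox-Snell R^2 = 1 - (L0 / L1)^(2/n), written via log-likelihoods
cox_snell = 1 - np.exp(2 * (fit.llnull - fit.llf) / n)
# Nagelkerke rescales Cox-Snell by its maximum attainable value, 1 - L0^(2/n)
nagelkerke = cox_snell / (1 - np.exp(2 * fit.llnull / n))
print(cox_snell, nagelkerke)
```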

3.
It is common to monitor several correlated quality characteristics using Hotelling's T² statistic. However, T² confounds location shifts with scale shifts, and consequently it is often difficult to determine the factors responsible for an out-of-control signal in terms of the process mean vector and/or the process covariance matrix. In this paper, we propose a diagnostic procedure called the ‘D-technique’ to detect the nature of the shift. For this purpose, two sets of regression equations, each consisting of the regression of a variable on the remaining variables, are used to characterize the ‘structure’ of the ‘in control’ process and that of the ‘current’ process. To determine the sources responsible for an out-of-control state, it is shown that it is enough to compare these two structures using the dummy variable multiple regression equation. The proposed method is operationally simpler and computationally advantageous over existing diagnostic tools. The technique is illustrated with various examples.
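For context, a minimal sketch of the T² statistic itself, evaluated for a single observation against known in-control parameters; the D-technique's dummy-variable regression comparison is not reproduced here, and the numbers are made up.

```python
import numpy as np

# Illustrative in-control mean vector and covariance matrix
mu0 = np.array([10.0, 5.0])
sigma0 = np.array([[4.0, 1.5],
                   [1.5, 2.0]])
sigma0_inv = np.linalg.inv(sigma0)

def hotelling_t2(x, mu, sigma_inv):
    """T^2 for a single observation against known in-control parameters."""
    d = x - mu
    return float(d @ sigma_inv @ d)

# A new observation to be checked against the in-control state
x_new = np.array([12.5, 3.8])
print(hotelling_t2(x_new, mu0, sigma0_inv))
```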

4.
The problems of existence and uniqueness of maximum likelihood estimates for logistic regression were completely solved by Silvapulle in 1981 and by Albert and Anderson in 1984. In this paper, we extend the well-known results by Silvapulle and by Albert and Anderson to weighted logistic regression. We analytically prove the equivalence between the overlap condition used by Albert and Anderson and that used by Silvapulle. We show that the maximum likelihood estimate of weighted logistic regression does not exist if there is a complete separation or a quasicomplete separation of the data points, and exists and is unique if there is an overlap of data points. Our proofs and results for weighted logistic regression also apply to unweighted logistic regression.
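A toy numerical illustration (not the paper's proofs) of why the MLE fails to exist under complete separation: the log-likelihood keeps increasing as the slope grows, so no finite maximizer exists. The data are made up.

```python
import numpy as np

# Completely separated toy data: y = 1 exactly when x > 2.5
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

def loglik(beta0, beta1):
    """Unweighted logistic log-likelihood for this toy data set."""
    p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Along the path beta0 = -2.5*b, beta1 = b the log-likelihood tends to 0
# from below as b grows, so there is no finite maximum likelihood estimate.
for b in [1, 5, 10, 50]:
    print(b, loglik(-2.5 * b, b))
```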

5.
Berkson (1980) conjectured that minimum χ² was a superior procedure to that of maximum likelihood, especially with regard to mean squared error. To explore his conjecture, we analyze his (1955) bioassay problem related to logistic regression. We consider not only the criterion of mean squared error for the comparison of these estimators, but also include alternative criteria such as concentration functions and Pitman's measure of closeness. The choice of these latter criteria is motivated by Rao's (1981) considerations of the shortcomings of mean squared error. We also include several Rao-Blackwellized versions of the minimum logit χ² estimator for the purpose of these comparisons.
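A brief sketch of the minimum logit χ² idea for grouped bioassay-type data, computed as a weighted least-squares fit of the empirical logits, with the maximum likelihood fit shown for comparison. The dose-response counts are invented for illustration and are not Berkson's 1955 data.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative grouped bioassay data: dose, number exposed, number responding
dose = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
n = np.array([40, 40, 40, 40, 40])
r = np.array([4, 9, 18, 29, 36])

p_hat = r / n
logit = np.log(p_hat / (1 - p_hat))           # empirical logits
w = n * p_hat * (1 - p_hat)                   # minimum logit chi-square weights

X = sm.add_constant(np.log(dose))
min_logit_chi2 = sm.WLS(logit, X, weights=w).fit()

# Maximum likelihood fit of the same logistic model for comparison
mle = sm.GLM(np.column_stack([r, n - r]), X,
             family=sm.families.Binomial()).fit()
print(min_logit_chi2.params)
print(mle.params)
```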

6.
Random coefficient regression models have been used to describe repeated measures on members of a sample of n individuals. Previous researchers have proposed methods of estimating the mean parameters of such models. Their methods require that each individual be observed under the same settings of the independent variables or, less stringently, that the number of observations, r, on each individual be the same. Under the latter restriction, estimators of the mean regression parameters exist which are consistent as both r→∞ and n→∞ and efficient as r→∞, and large-sample (r large) tests of the mean parameters are available. These results are easily extended to the case where not all individuals are observed an equal number of times, provided limits are taken as min(r)→∞. Existing methods of inference, however, are not justified by the current literature when n is large and r is small, as is the case in many biomedical applications. The primary contribution of the current paper is a derivation of the asymptotic properties of modifications of existing estimators as n alone tends to infinity, with r fixed. From these properties it is shown that existing methods of inference, which are currently justified only when min(r) is large, are also justifiable when n is large and min(r) is small. A secondary contribution is the definition of a positive definite estimator of the covariance matrix for the random coefficients in these models. Use of this estimator avoids computational problems that can otherwise arise.
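As a point of reference, a minimal sketch of fitting a random coefficient (random intercept and slope) model for repeated measures with statsmodels; this illustrates the model class only, not the modified estimators or the positive definite covariance estimator derived in the paper. The data and names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative longitudinal data: n individuals, r repeated measures each
rng = np.random.default_rng(1)
n, r = 50, 4
subj = np.repeat(np.arange(n), r)
t = np.tile(np.arange(r, dtype=float), n)
b0 = rng.normal(0, 1.0, n)[subj]              # random intercepts
b1 = rng.normal(0, 0.5, n)[subj]              # random slopes
y = 2.0 + 1.0 * t + b0 + b1 * t + rng.normal(0, 0.5, n * r)
df = pd.DataFrame({"y": y, "t": t, "subj": subj})

# Random coefficient model: random intercept and slope for each individual
fit = smf.mixedlm("y ~ t", df, groups=df["subj"], re_formula="~t").fit()
print(fit.fe_params)                          # estimated mean intercept and slope
```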

7.
Using a wavelet basis, Chesneau and Shirazi study the estimation of one-dimensional regression functions in a biased nonparametric model under L² risk (see Chesneau, C. and Shirazi, E., Nonparametric wavelet regression based on biased data, Communications in Statistics – Theory and Methods, 43: 2642–2658, 2014). This article considers d-dimensional regression function estimation under Lᵖ (1 ≤ p < ∞) risk. It turns out that our results reduce to the corresponding theorems of Chesneau and Shirazi when d = 1 and p = 2.

8.
Estimating the risk factors of a disease such as diabetic retinopathy (DR) is one of the important research problems among biomedical and statistical practitioners as well as epidemiologists. Incidentally, many studies have focused on building models with binary outcomes, which may not exploit the available information. This article investigates the importance of retaining the ordinal nature of the response variable (e.g. the severity level of a disease) while determining the risk factors associated with DR. A generalized linear model approach with appropriate link functions has been studied using both classical and Bayesian frameworks. From the results of this study, it can be observed that ordinal logistic regression with a probit link function could be a more appropriate approach for determining the risk factors of DR. The study emphasizes ways to handle the ordinal nature of the response variable with better model fit compared to other link functions.
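A minimal sketch of an ordinal regression with a probit link (a cumulative, ordered probit model) using statsmodels; the severity levels, covariates, and cut-points below are invented for illustration and are not the study's DR data.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Illustrative data: ordinal severity (0 < 1 < 2) and two candidate risk factors
rng = np.random.default_rng(2)
n = 300
duration = rng.gamma(5, 2, n)                 # e.g. duration of diabetes (years)
hba1c = rng.normal(7.5, 1.2, n)               # e.g. HbA1c level
latent = 0.08 * duration + 0.5 * hba1c + rng.normal(size=n)
severity = pd.Series(pd.Categorical(np.digitize(latent, [4.0, 5.0]),
                                    categories=[0, 1, 2], ordered=True))

X = pd.DataFrame({"duration": duration, "hba1c": hba1c})
fit = OrderedModel(severity, X, distr="probit").fit(method="bfgs", disp=0)
print(fit.summary())
```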

9.
In comparison to other experimental studies, multicollinearity appears frequently in mixture experiments, a special study area of response surface methodology, due to the constraints on the components composing the mixture. When mixture experiments are analysed with a special generalized linear model, the logistic regression model, multicollinearity causes precision problems in the maximum-likelihood logistic regression estimate. Therefore, the effects of multicollinearity can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased estimators for the estimation of the coefficients. In this paper, we suggest the use of the logistic ridge regression (RR) estimator in cases where there is multicollinearity during the analysis of mixture experiments using logistic regression. Also, for the selection of the biasing parameter, we use fraction of design space plots for evaluating the effect of the logistic RR estimator with respect to the scaled mean squared error of prediction. The suggested graphical approaches are illustrated on the tumor incidence data set.
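A small sketch of ridge (L2-penalized) logistic regression under collinearity, using scikit-learn, where the inverse penalty strength C plays the role of the biasing parameter; the fraction of design space plots and the tumor incidence data are not reproduced, and the near-collinear mixture-style data below are simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Simulated mixture-style data with two nearly collinear components
rng = np.random.default_rng(3)
n = 200
x1 = rng.uniform(0.2, 0.6, n)
x2 = 0.9 - x1 + rng.normal(0, 0.01, n)        # nearly determined by x1
X = StandardScaler().fit_transform(np.column_stack([x1, x2]))
y = rng.binomial(1, 1 / (1 + np.exp(-(1.0 + 2.0 * X[:, 0]))))

# Ridge (L2) logistic regression: smaller C means heavier shrinkage
for C in [0.01, 0.1, 1.0, 10.0]:
    coef = LogisticRegression(penalty="l2", C=C).fit(X, y).coef_.ravel()
    print(C, coef)
```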

10.
11.
Fisher's linear discriminant analysis (FLDA) is known as a method to find a discriminative feature space for multi-class classification. As a theory extending FLDA to an ultimate nonlinear form, optimal nonlinear discriminant analysis (ONDA) has been proposed. ONDA indicates that the best theoretical nonlinear map for maximizing Fisher's discriminant criterion is formulated by using the Bayesian a posteriori probabilities. In addition, the theory proves that FLDA is equivalent to ONDA when the Bayesian a posteriori probabilities are approximated by linear regression (LR). Due to some limitations of the linear model, there is room to modify FLDA by using stronger approximation/estimation methods. For the purpose of probability estimation, multinomial logistic regression (MLR) is more suitable than LR. Along this line, in this paper, we develop a nonlinear discriminant analysis (NDA) in which the posterior probabilities in ONDA are estimated by MLR. In addition, in this paper, we develop a way to introduce sparseness into discriminant analysis. By applying L1 or L2 regularization to LR or MLR, we can incorporate sparseness in FLDA and our NDA to increase generalization performance. The performance of these methods is evaluated by benchmark experiments using standard datasets and a face classification experiment.

12.
A substantial fraction of statistical analyses, and in particular of statistical computing, is done under the heading of multiple linear regression, that is, the fitting of equations to multivariate data using the least squares technique for estimating parameters. The optimality properties of these estimates are described in an ideal setting which is not often realized in practice.

Frequently, we do not have "good" data in the sense that the errors are non-normal or the variance is non-homogeneous. The data may contain outliers or extremes which are not easily detectable, the variables may not be expressed in the proper functional form, and the linearity that is assumed may not hold.

Prior to the mid-sixties, regression programs provided just the basic least squares computations plus possibly a step-wise algorithm for variable selection. The increased interest in regression, prompted by dramatic improvements in computers, has led to a vast amount of literature describing alternatives to least squares, improved variable selection methods, and extensive diagnostic procedures.

The purpose of this paper is to summarize and illustrate some of these recent developments. In particular, we shall review some of the potential problems with regression data, discuss the statistics and techniques used to detect these problems, and consider some of the proposed solutions. An example is presented to illustrate the effectiveness of these diagnostic methods in revealing such problems and the potential consequences of employing the proposed methods.
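A small sketch of the kinds of diagnostics such reviews typically cover (studentized residuals, leverage, Cook's distance), using statsmodels' influence tools on simulated data with one injected outlier; this is illustrative and not the paper's example.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with one gross outlier in the response
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 40)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, 40)
y[-1] += 15.0                                  # inject an outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
infl = fit.get_influence()

print(infl.resid_studentized_external[-5:])    # externally studentized residuals
print(infl.hat_matrix_diag[-5:])               # leverages
print(infl.cooks_distance[0][-5:])             # Cook's distance values
```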

13.
The use of individualized regressions, which reduces the polychotomous logistic regression model to several dichotomous models, has been proposed as a solution to some practical difficulties for binary covariates (Begg and Gray 1984, Biometrika, 71, 11–18). Its disadvantages, however, include loss of efficiency and the complexity of making comparisons among regressions. Using expressions for the large-sample distribution of the maximum-likelihood estimates, the efficiency of the individualized procedure relative to the polychotomous procedure is evaluated for the case in which the covariates are assumed to follow a multivariate normal distribution. The relative efficiency when the logistic slope vectors from different regressions are collinear can be substantially lower compared to the efficiency with orthogonal slope vectors. Further evaluations for binary covariates using collinear and orthogonal slope parametrizations lead to a similar characterization.
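A brief sketch contrasting the individualized approach (separate binary logistic fits of each category against the baseline) with the full polychotomous (multinomial) fit; it illustrates the two procedures only, not the paper's efficiency calculations. The three-category data are simulated.

```python
import numpy as np
import statsmodels.api as sm

# Simulated 3-category outcome with two covariates (category 0 is the baseline)
rng = np.random.default_rng(5)
n = 600
X = sm.add_constant(rng.normal(size=(n, 2)))
B = np.array([[0.0, 0.0, 0.0],     # intercepts for categories 0, 1, 2
              [0.0, 0.8, -0.5],    # slopes for the first covariate
              [0.0, 0.3, 0.9]])    # slopes for the second covariate
eta = X @ B
prob = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in prob])

# Polychotomous (multinomial) logistic regression
poly = sm.MNLogit(y, X).fit(disp=0)

# Individualized regressions: each category fitted against the baseline category 0
individual = {}
for k in (1, 2):
    keep = (y == 0) | (y == k)
    individual[k] = sm.Logit((y[keep] == k).astype(int), X[keep]).fit(disp=0)

print(poly.params)                         # columns: categories 1 and 2 vs 0
print(individual[1].params, individual[2].params)
```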

14.
A recent article in this journal presented a variety of expressions for the coefficient of determination (R²) and demonstrated that these expressions were generally not equivalent. The article discussed potential pitfalls in interpreting the R² statistic in ordinary least-squares regression analysis. The current article extends this discussion to the case in which regression models are fit by weighted least squares and points out an additional pitfall that awaits the unwary data analyst. We show that unthinking reliance on the R² statistic can lead to an overly optimistic interpretation of the proportion of variance accounted for in the regression. We propose a modification of the estimator and demonstrate its utility by example.
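A minimal sketch of the pitfall's flavor: the R² reported by a weighted least-squares fit is computed on the weighted scale and need not equal the proportion of variance explained on the original scale. The heteroscedastic data are simulated, and the article's proposed modified estimator is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroscedastic data: error standard deviation grows with x
rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, x)
X = sm.add_constant(x)

wls = sm.WLS(y, X, weights=1.0 / x**2).fit()

# R^2 reported by the weighted fit (computed on the weighted scale)
print(wls.rsquared)

# Proportion of variance explained on the original, unweighted scale
resid = y - wls.fittedvalues
print(1 - resid.var() / y.var())
```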

15.
This study considers the binary classification of functional data collected in the form of curves. In particular, we assume a situation in which the curves are highly mixed over the entire domain, so that the global discriminant analysis based on the entire domain is not effective. This study proposes an interval-based classification method for functional data: the informative intervals for classification are selected and used for separating the curves into two classes. The proposed method, called functional logistic regression with fused lasso penalty, combines the functional logistic regression as a classifier and the fused lasso for selecting discriminant segments. The proposed method automatically selects the most informative segments of functional data for classification by employing the fused lasso penalty and simultaneously classifies the data based on the selected segments using the functional logistic regression. The effectiveness of the proposed method is demonstrated with simulated and real data examples.

16.
This article rigorously proves the superiority of the proportion χ² test to the logistic regression Wald test in terms of power when comparing two rates, despite their asymptotic equivalence under the null hypothesis that the two rates are equal.
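A hedged sketch of how each statistic is obtained for the same two-sample binary data (the paper's power comparison itself is not reproduced); the counts are invented.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2_contingency

# Invented two-group binary data: successes and group sizes
success = np.array([30, 45])
total = np.array([100, 100])

# Proportion chi-square test (no continuity correction)
table = np.column_stack([success, total - success])
chi2, p_chi2, _, _ = chi2_contingency(table, correction=False)

# Logistic regression Wald test for the group indicator
y = np.concatenate([np.repeat(1, success[0]), np.repeat(0, total[0] - success[0]),
                    np.repeat(1, success[1]), np.repeat(0, total[1] - success[1])])
g = np.concatenate([np.zeros(total[0]), np.ones(total[1])])
fit = sm.Logit(y, sm.add_constant(g)).fit(disp=0)
wald_z = fit.params[1] / fit.bse[1]

print(chi2, p_chi2)                 # proportion chi-square statistic and p-value
print(wald_z**2, fit.pvalues[1])    # squared Wald z and its p-value
```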

17.
This article examines several goodness-of-fit measures in the binary probit regression model. Existing pseudo-R² measures are reviewed, and two modified and one new pseudo-R² measure are proposed. For the probit regression model, empirical comparisons are made between the different goodness-of-fit measures and the squared sample correlation coefficient of the observed response and the predicted probabilities. As an illustration, the goodness-of-fit measures are applied to a “paid labor force” data set.
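For orientation, a minimal sketch computing McFadden's pseudo-R² and the squared sample correlation between the observed responses and the fitted probabilities for a probit fit; the specific modified and new measures proposed in the article are not reproduced, and the data are simulated.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

# Simulated binary data and a probit fit
rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=(n, 2))
y = rng.binomial(1, norm.cdf(0.3 + 0.8 * x[:, 0] - 0.5 * x[:, 1]))

fit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)

mcfadden = 1 - fit.llf / fit.llnull                    # likelihood-ratio pseudo-R^2
corr_sq = np.corrcoef(y, fit.predict())[0, 1] ** 2     # squared sample correlation
print(mcfadden, corr_sq)
```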

18.
Gaussian process (GP) is a Bayesian nonparametric regression model, showing good performance in various applications. However, during its model-tuning procedure, the GP implementation suffers from numerous covariance-matrix inversions of expensive O(N³) operations, where N is the matrix dimension. In this article, we propose using the quasi-Newton BFGS O(N²)-operation formula to approximate/replace recursively the inverse of the covariance matrix at every iteration. The implementation accuracy is carefully guaranteed by a matrix-trace criterion and by the restarts technique to generate good initial guesses. A number of numerical tests are then performed based on the sinusoidal regression example and the Wiener–Hammerstein identification example. It is shown that by using the proposed implementation, more than 80% of the O(N³) operations could be eliminated, and a typical speedup of 5–9 could be achieved as compared to the standard maximum-likelihood-estimation (MLE) implementation commonly used in Gaussian process regression.
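A sketch of the rank-two BFGS inverse update that underlies this idea, written so each application costs O(N²) (matrix-vector products and outer products only) instead of an O(N³) re-inversion. This is the generic formula under stated assumptions, not the authors' GP implementation; their matrix-trace accuracy criterion and restart logic are not reproduced.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Rank-two BFGS update of H, an approximation to a matrix inverse.

    s is the step and y the corresponding change in the mapped vector
    (in optimization: the parameter step and the gradient change).
    Cost is O(N^2) per update versus O(N^3) for a fresh inversion.
    """
    rho = 1.0 / float(y @ s)
    Hy = H @ y                                            # O(N^2)
    c = rho * (1.0 + rho * float(y @ Hy))
    return H - rho * (np.outer(s, Hy) + np.outer(Hy, s)) + c * np.outer(s, s)

# Quick check: the updated inverse satisfies the secant condition H_new @ y = s
rng = np.random.default_rng(8)
N = 5
A = rng.normal(size=(N, N)); A = A @ A.T + N * np.eye(N)  # SPD target matrix
H = np.linalg.inv(A + 0.1 * np.eye(N))                    # stale inverse to be updated
s = rng.normal(size=N)
y = A @ s
H_new = bfgs_inverse_update(H, s, y)
print(np.allclose(H_new @ y, s))
```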

19.
Studying the effect of an exposure or intervention on a dichotomous outcome is very common in medical research. Logistic regression (LR) is often used to determine such an association, and it provides the odds ratio (OR). The OR often overestimates the effect size for prevalent outcome data. In such situations, use of the relative risk (RR) has been suggested. We propose modifications of the Zhang and Yu and the Diaz-Quijano methods. These methods were compared with the stratified Mantel-Haenszel method, LR, log-binomial regression (LBR), the Zhang and Yu method, Poisson/Cox regression, modified Poisson/Cox regression, the marginal probability method, the COPY method, inverse probability of treatment weighted LBR, and the Diaz-Quijano method. Our proposed modified Diaz-Quijano (MDQ) method provides an RR and confidence interval similar to those estimated by modified Poisson/Cox regression and LBR. The proposed modification of the Zhang and Yu method provides a better estimate of the RR and its standard error than the original Zhang and Yu method in a variety of situations with prevalent outcomes. The MDQ method can be used easily to estimate the RR and its confidence interval in studies which require reporting of RRs. Regression models which directly provide the estimate of RR without convergence problems, such as the MDQ method and modified Poisson/Cox regression, should be preferred.
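A hedged sketch of one of the comparator approaches mentioned above, modified Poisson regression (a Poisson GLM with a robust sandwich covariance) for estimating an RR from a prevalent binary outcome; the proposed MDQ modification is not reproduced, and the exposure data are simulated.

```python
import numpy as np
import statsmodels.api as sm

# Simulated prevalent binary outcome with a binary exposure (true RR = 1.5)
rng = np.random.default_rng(9)
n = 1000
exposed = rng.binomial(1, 0.4, n)
y = rng.binomial(1, np.where(exposed == 1, 0.45, 0.30))

X = sm.add_constant(exposed)

# Modified Poisson regression: Poisson GLM with robust (sandwich) standard errors
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
rr = np.exp(fit.params[1])                  # relative risk for exposure
rr_ci = np.exp(fit.conf_int()[1])           # confidence interval for the RR
print(rr, rr_ci)
```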

20.
Approximate Representation of Estimators in Constrained Regression Problems
The estimators of inequality-constrained regression problems can be computed by iterative algorithms of mathematical programming, but they do not have analytical expressions in terms of the given data. This situation creates obstacles to further studies of constrained regression. In this paper we derive approximate representations of the estimators with a remainder of magnitude (N^{-1} log log N)^{1/2}. From these representations one can clearly see the concrete structure of the estimators of these problems. It will be very helpful for further regression analysis.
