Similar Articles
1.
High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by applications in high-throughput genomic data analysis. In this paper, we propose a class of regularization methods, integrating both the penalized empirical likelihood and pseudoscore approaches, for variable selection and estimation in sparse, high-dimensional additive hazards regression models. When the number of covariates grows with the sample size, we establish the asymptotic properties of the resulting estimator and the oracle property of the proposed method. The proposed estimator is shown to be more efficient than the one obtained from the nonconcave penalized likelihood approach in the literature. Based on a penalized empirical likelihood ratio statistic, we further develop a nonparametric likelihood approach for testing linear hypotheses about the regression coefficients and, consequently, for constructing confidence regions. Simulation studies are carried out to evaluate the performance of the proposed methodology, and two real data sets are analyzed.

2.
Penalized logistic regression is a useful tool for classifying samples and selecting features. Although the methodology has been widely used in various fields of research, its performance deteriorates sharply in the presence of outliers, because logistic regression is based on maximum likelihood estimation, which is sensitive to outliers. As a result, samples may be misclassified and the factors carrying crucial information for classification may be missed. To overcome this problem, we propose a robust penalized logistic regression based on a weighted likelihood methodology. We also derive an information criterion, in line with generalized information criteria, for choosing the tuning parameters, which is a vital step in robust penalized logistic regression modelling. We demonstrate through Monte Carlo simulations and a real-world example that the proposed robust modelling strategies perform well for sparse logistic regression even in the presence of outliers.

3.
In this paper, the generalized varying-coefficient single-index model is studied using penalized likelihood. All unknown functions are fitted by penalized splines. Estimates of the unknown parameters and the unknown coefficient functions are obtained, and the estimation procedure is fast and computationally stable. Under mild conditions, the consistency and asymptotic normality of the resulting estimators are established. Two simulation studies illustrate the performance of the estimates, and an application to Hong Kong environmental data further demonstrates the potential of the proposed modelling procedure.

4.
In this article, we propose an outlier detection approach for the multiple regression model based on the properties of a difference-based variance estimator. Estimators of this type were originally used to estimate the error variance in nonparametric regression without estimating the nonparametric function itself. This article is the first to employ a difference-based error variance estimator for outlier detection in a multiple regression model. Our approach is a leave-one-out method based on the difference-based error variance. Existing leave-one-out outlier detection approaches are strongly affected by other outliers, whereas ours is not, because it does not use the regression coefficient estimator. A simulation study comparing our approach with several existing methods suggests that it outperforms them, and its advantages are further demonstrated in a real data application. The approach can also be extended to outlier detection in nonparametric regression models.
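
A minimal sketch of the leave-one-out idea, assuming the simplest first-order (Rice-type) difference-based estimator for a single covariate, with the data already sorted by that covariate. The article's estimator for multiple regression is not reproduced here; the functions `diff_variance` and `loo_scores` and the ratio score are hypothetical illustrations.

```python
def diff_variance(y):
    """First-order difference-based error variance estimate (Rice-type).

    Assumes y is ordered by the covariate, so consecutive differences
    remove most of a smooth mean function without estimating it.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((y[i] - y[i - 1]) ** 2 for i in range(1, n)) / (2 * (n - 1))


def loo_scores(y):
    """Ratio of the full-sample to the leave-one-out variance estimate.

    A large ratio means that deleting case i shrinks the variance estimate
    sharply, as a gross outlier would. No regression coefficients are fitted.
    """
    full = diff_variance(y)
    scores = []
    for i in range(len(y)):
        reduced = diff_variance(y[:i] + y[i + 1:])
        scores.append(full / reduced if reduced > 0 else float("inf"))
    return scores
```

For example, with `y = [0.1 * i for i in range(20)]` and 5.0 added to `y[7]`, the largest score is at index 7, because deleting that point removes the two large consecutive differences it creates.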

5.
This paper studies outlier detection and accommodation in general spatial models, which include spatial autoregressive models and spatial error models as special cases. Using mean-shift and variance-weight models, respectively, test statistics for multiple outliers are derived and detection procedures are proposed. In addition, several key diagnostic measures, such as standardized residuals and a leverage measure, are defined for general spatial models. Outlier-modified models are proposed to accommodate outliers in the data set. The size and power of the test statistics are examined via simulation studies. Three real examples are analyzed, and the results show that the proposed methodology is useful for identifying and accommodating outliers in general spatial models.
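
The spatial test statistics are not reproduced here. As a reference point only: in an ordinary linear regression without spatial dependence, the single-case mean-shift test reduces to the externally Studentized residual, which combines the residual with the leverage h_ii. A minimal sketch for simple linear regression, where the function name `mean_shift_t` is hypothetical:

```python
import math


def mean_shift_t(x, y):
    """t-statistics for the single-case mean-shift outlier model.

    Model for case i: y_j = b0 + b1*x_j + gamma*[j == i] + e_j.
    The t-statistic for gamma equals the externally Studentized residual
    t_i = e_i / (s_(i) * sqrt(1 - h_ii)), computed here without refitting.
    Simple linear regression (p = 2 parameters) is assumed.
    """
    n, p = len(x), 2
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverages h_ii
    sse = sum(e ** 2 for e in resid)
    out = []
    for e, h in zip(resid, lev):
        # error variance with case i deleted, via SSE_(i) = SSE - e_i^2 / (1 - h_ii)
        s2_del = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_del * (1 - h)))
    return out
```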

6.

Outlier detection is an indispensable step in most statistical data analyses. However, merely detecting an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, typically flag the entire case as outlying or assign a single case weight to the entire case. In practice, particularly with high-dimensional data, an outlier is unlikely to be outlying in all of its variables, but rather in just a subset of them. If so, the scientific question of why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness, thereby helping the analyst understand in what way the outlier deviates. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by solving the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably computed with the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The method is shown to perform well on both simulated data and real-life examples.
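
The sketch below illustrates the direction of maximal outlyingness in its classical form. For a point x with location mu and scatter matrix S, the unit vector a maximizing (a^T (x - mu))^2 / (a^T S a) is proportional to S^{-1}(x - mu). The sketch assumes classical (non-robust) mean and covariance and does not perform the sparse SPLS/SNIPLS estimation proposed in the article; all function names are hypothetical.

```python
import math


def mean_and_cov(rows):
    """Classical sample mean and covariance (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * p
    for r in range(p - 1, -1, -1):
        z[r] = (M[r][p] - sum(M[r][c] * z[c] for c in range(r + 1, p))) / M[r][r]
    return z


def outlyingness_direction(rows, x):
    """Unit vector proportional to S^{-1}(x - mu), the maximizing direction."""
    mu, cov = mean_and_cov(rows)
    z = solve(cov, [xi - mi for xi, mi in zip(x, mu)])
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]
```

When the variables are roughly uncorrelated and a point deviates only in its second coordinate, the returned direction loads almost entirely on that coordinate, which is the kind of variable attribution the article targets.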

7.
This paper studies outlier detection for multilevel models. Approximate formulae for outlier detection in the estimation of both fixed and random parameters are derived under the mean-shift outlier model, and a test for multiple outliers is proposed. These results can be used to detect outlying units at any level. The detection of outlying units related to the random part of the model is also studied. The analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.

8.
The presence of outliers inevitably leads to distorted analyses and inappropriate predictions, especially when there are multiple outliers in high-dimensional regression, where high dimensionality may increase the chance that one or more observations are outlying. Because outlier detection is essential in high-dimensional regression analysis, in this paper we propose a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple-outlier detection approach based on multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule once the initial detection approach has produced a relatively reliable subset of non-outliers. Monte Carlo comparison studies show that the proposed method performs well for detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the method with an empirical analysis of real-life protein and gene expression data.
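
The screening step can be sketched as follows, assuming the basic form of sure independence screening that ranks predictors by absolute marginal correlation with the response. The default d = floor(n / log n) is an assumed choice, and neither the least trimmed squares step nor the outlier measure is included; `sis` is a hypothetical name.

```python
import math


def sis(X, y, d=None):
    """Keep the d predictors with the largest |corr(x_j, y)|.

    X is a list of n rows with p columns; y has length n.
    Returns the kept column indices in increasing order.
    """
    n, p = len(X), len(X[0])
    if d is None:
        d = max(1, int(n / math.log(n)))  # assumed default screening size
    ybar = sum(y) / n
    yc = [yi - ybar for yi in y]
    syy = math.sqrt(sum(v * v for v in yc))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        xc = [v - m for v in col]
        sxx = math.sqrt(sum(v * v for v in xc))
        r = sum(a * b for a, b in zip(xc, yc)) / (sxx * syy) if sxx > 0 else 0.0
        scores.append(abs(r))
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])
```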

9.
Outlier detection is a critical part of data analysis, and the use of Studentized residuals from least-squares regression fits is a very common approach to identifying discordant observations in linear regression problems. In this paper we propose a bootstrap approach to constructing critical points for outlier detection based on least-squares Studentized residuals, and find that this approach naturally accommodates mild departures from the model assumptions, such as non-normal error distributions. We illustrate our methodology with both a real data example and simulated data.
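
A minimal sketch of a bootstrap critical point, assuming simple linear regression, internally Studentized residuals, and a residual bootstrap of the maximum absolute Studentized residual. The article's resampling scheme may differ; the function names and the choice to resample the raw residuals (including any suspect case) are assumptions.

```python
import math
import random


def fit_simple(x, y):
    """Least-squares intercept and slope, plus xbar and Sxx for leverages."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1, xbar, sxx


def studentized(x, y):
    """Internally Studentized residuals e_i / (s * sqrt(1 - h_ii)), p = 2."""
    n = len(x)
    b0, b1, xbar, sxx = fit_simple(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    return [e / (s * math.sqrt(1 - (1 / n + (xi - xbar) ** 2 / sxx)))
            for e, xi in zip(resid, x)]


def bootstrap_critical_value(x, y, alpha=0.05, B=999, seed=0):
    """(1 - alpha) quantile of max |r_i| under a residual bootstrap."""
    rng = random.Random(seed)
    b0, b1, _, _ = fit_simple(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    stats = []
    for _ in range(B):
        ystar = [fi + rng.choice(resid) for fi in fitted]
        stats.append(max(abs(r) for r in studentized(x, ystar)))
    stats.sort()
    k = min(B - 1, max(0, math.ceil((1 - alpha) * B) - 1))
    return stats[k]
```

A case whose |r_i| exceeds the returned value would be flagged. Because internally Studentized residuals satisfy |r_i| <= sqrt(n - p), the critical value here never exceeds sqrt(n - 2).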

10.
Outlier detection has been used extensively in data analysis to detect anomalous observations. It has important applications in areas such as fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers in the linear functional relationship model for circular variables. We apply a hierarchical clustering approach to the residuals of the Caires and Wyatt model and use a tree diagram (dendrogram) to illustrate the detection of outliers graphically. A Monte Carlo simulation study is conducted to verify the accuracy of the proposed method; the low probabilities of masking and swamping indicate the validity of the proposed approach. Applications to two real data sets show its practical applicability.
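
The clustering step might look like the sketch below, assuming the model residuals are already available as angles in radians, using the circular distance 1 - cos(a - b), and cutting a single-linkage tree at a user-chosen height h. The article's linkage, distance, and cut rule may differ; every name here is hypothetical.

```python
import math


def circular_distance(a, b):
    """1 - cos(a - b): 0 for equal angles, 2 for opposite angles."""
    return 1 - math.cos(a - b)


def single_linkage_clusters(angles, h):
    """Cut a single-linkage dendrogram at height h.

    Cutting single linkage at h yields the connected components of the graph
    that joins every pair at distance <= h, so union-find is enough.
    """
    n = len(angles)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if circular_distance(angles[i], angles[j]) <= h:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def flag_outliers(residual_angles, h):
    """Flag every point that falls outside the largest cluster."""
    clusters = single_linkage_clusters(residual_angles, h)
    main = max(clusters, key=len)
    return sorted(i for c in clusters if c is not main for i in c)
```

Because the distance is circular, a residual of 2*pi - 0.02 joins the cluster of residuals near 0, while a residual near pi is isolated and flagged.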

11.
Lasso-type regularized regression, a useful tool for simultaneous estimation and variable selection, is currently the subject of much discussion. Although lasso-type regularization has several advantages in regression modelling owing to its sparsity, it is sensitive to outliers because it relies on penalized least squares. To overcome this issue, we propose a robust lasso-type estimation procedure that uses a robust criterion as the loss function and imposes an L1-type penalty called the elastic net. We also propose using efficient bootstrap information criteria to choose the optimal regularization parameters and a constant used in outlier detection. Simulation studies and a real data analysis are used to examine the efficiency of the proposed robust sparse regression modelling, and we observe that our modelling strategy performs well in the presence of outliers.
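
A minimal sketch of robust elastic-net estimation, assuming the Huber loss as the robust criterion (the abstract does not name one) and fitting by proximal gradient descent. Tuning by bootstrap information criteria is not included; `robust_enet` and its parameters are hypothetical.

```python
import math


def huber_psi(r, c):
    """Derivative of the Huber loss: r inside [-c, c], clipped outside."""
    return max(-c, min(c, r))


def soft_threshold(v, k):
    return math.copysign(max(abs(v) - k, 0.0), v)


def robust_enet(X, y, lam1, lam2, c=1.345, iters=1000):
    """Minimize (1/n) sum huber_c(y_i - x_i'b) + lam1*|b|_1 + (lam2/2)*|b|_2^2.

    Proximal gradient (ISTA). No intercept is fitted, so the clean part
    of the data is assumed to be centred.
    """
    n, p = len(X), len(X[0])
    # step = 1 / (upper bound on the Lipschitz constant of the loss gradient)
    step = n / sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(iters):
        psi = [huber_psi(yi - sum(xij * bj for xij, bj in zip(row, b)), c)
               for row, yi in zip(X, y)]
        grad = [-sum(X[i][j] * psi[i] for i in range(n)) / n for j in range(p)]
        # prox of step*(lam1*|.| + lam2/2 * .^2): soft-threshold, then shrink
        b = [soft_threshold(bj - step * gj, step * lam1) / (1 + step * lam2)
             for bj, gj in zip(b, grad)]
    return b
```

The Huber score caps each residual's influence at c, so a few gross outliers in y move the coefficients only slightly, while the elastic-net proximal step still shrinks irrelevant coefficients toward zero.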

12.
Motivated by the joint analysis of longitudinal quality-of-life data and recurrence-free survival times from a cancer clinical trial, we present two approaches to jointly modelling survival data and longitudinal measurements that are proportions, which are confined to a finite interval. Both approaches assume a proportional hazards model for the survival times. For the longitudinal component, the first approach applies the classical linear mixed model to logit-transformed responses, while the second approach models the responses directly using a simplex distribution. A semiparametric method based on a penalized joint likelihood generated by the Laplace approximation is derived to fit the joint model defined by the second approach. The proposed procedures are evaluated in a simulation study and applied to the breast cancer data that motivated this research.

13.
In this article, we propose a penalized likelihood approach for the semiparametric density model with parametric and nonparametric components. An efficient iterative procedure is proposed for estimation, and an approximate generalized maximum likelihood criterion, derived from a Bayesian point of view, is used to select the smoothing parameter. The finite-sample performance of the proposed estimation approach is evaluated through simulation. Two real data examples, suicide study data and the Old Faithful geyser data, are analyzed to demonstrate the use of the proposed method.

14.
Penalized likelihood inference in extreme value analyses
Models for extreme values are usually based on detailed asymptotic arguments, which require strong ergodic assumptions such as stationarity, or prescribed perturbations from stationarity. In most applications of extreme value modelling such assumptions are not satisfied, but the type of departure from stationarity is either unknown or complex, making asymptotic calculations infeasible. This has led to various approaches in which standard extreme value models are used as building blocks for the conditional or local behaviour of processes, with more general statistical techniques used at the modelling stage to handle the non-stationarity. This paper presents another approach in this direction, based on penalized likelihood. This particular approach has several advantages: the method has a simple interpretation; computations for estimation are relatively straightforward using standard algorithms; and a simple reinterpretation of the model enables broader inferences, such as confidence intervals, to be obtained using MCMC methodology. Methodological details are given, together with applications to both athletics and environmental data.

15.
Detecting local spatial clusters in count data is an important task in spatial epidemiology. Two broad approaches, moving-window methods and disease mapping methods, have been suggested in the literature for finding clusters. However, the existing methods rely on somewhat arbitrarily chosen tuning parameters, and the local clustering results are sensitive to these choices. In this paper, we propose a penalized likelihood method to overcome the limitations of existing local spatial clustering approaches for count data. We start with a Poisson regression model that can accommodate any type of covariate, and formulate the clustering problem as a penalized likelihood estimation problem for finding change points of intercepts in two-dimensional space. The cost of developing a new algorithm is minimized by modifying an existing lasso (least absolute shrinkage and selection operator) algorithm. The computational details of the modifications are shown, and the proposed method is illustrated with Seoul tuberculosis data.

16.
For high-dimensional data, identifying anomalies such as outliers is a tedious task. We present a novel outlier detection method for high-dimensional contingency tables. We use the class of decomposable graphical models to model the relationships among the variables of interest; these relationships can be depicted by an undirected graph called the interaction graph. Given an interaction graph, we derive a closed-form expression for the likelihood ratio test (LRT) statistic and an exact distribution for efficient simulation of the test statistic. An observation is declared an outlier if it deviates significantly from the approximated distribution of the test statistic under the null hypothesis. We demonstrate the LRT outlier detection framework on genetic data modeled by Chow–Liu trees.
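
Building a Chow–Liu tree itself can be sketched as below: estimate pairwise mutual information from discrete data and keep a maximum-weight spanning tree. This is the standard Chow–Liu construction, not the article's LRT computation; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(a, b):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def chow_liu_tree(data):
    """Maximum-weight spanning tree on pairwise mutual information (Kruskal).

    data: list of observations, each a tuple of discrete values.
    Returns the tree as a sorted list of (i, j) edges with i < j.
    """
    p = len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    weighted = sorted(((mutual_information(cols[i], cols[j]), i, j)
                       for i, j in combinations(range(p), 2)), reverse=True)
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return sorted(edges)
```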

17.
We present influence diagnostics for linear measurement error models with stochastic linear restrictions, using the corrected likelihood of Nakamura (1990). Case-deletion and mean-shift outlier models are developed to identify outlying and influential observations. We derive a corrected score test statistic for outlier detection based on the mean-shift outlier model, and propose analogs of Cook's distance and the likelihood distance to identify influential observations based on case-deletion models. A parametric bootstrap procedure is used to obtain the empirical distributions of the test statistics, and a simulation study evaluates the performance of the proposed estimators in terms of the mean squared error criterion and of the score test statistic. Finally, a numerical example is given to illustrate the theoretical results.

18.
This paper illustrates that hypothesis testing procedures can be derived from the penalized likelihood approach. From this point of view, many traditional hypothesis tests, including the two-sample mean test, the score test, and Hotelling's T² test, are revisited under the penalized likelihood framework. A similar framework also applies to empirical likelihood.

19.
A two-step estimation approach is proposed for the fixed-effect parameters, the random effects, and their variance σ² in a Poisson mixed model. In the first step, an approximate likelihood function of the data, based on a small-σ² approximation, is constructed and used to estimate the fixed-effect parameters and σ². In the second step, the random effects are estimated by minimizing their posterior mean squared error. The method of Waclawiw and Liang (1993), based on so-called Stein-type estimating functions, and that of Breslow and Clayton (1993), based on penalized quasi-likelihood, are compared with the proposed likelihood method. The results of a simulation study on the performance of all three approaches are reported.

20.
This paper studies the problem of outlier detection and robust variable selection in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to achieve outlier detection and robust variable selection simultaneously. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies are used to evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods perform better in finite samples than existing methods when there are leverage points or outliers in the response or explanatory variables. Finally, we apply the proposed methodology to two real datasets.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP Filing No. 09084417)