Similar Documents
20 similar documents found.
1.
In this paper we address the problem of protecting confidentiality in statistical tables containing sensitive information that cannot be disseminated. This is an issue of primary importance in practice. Cell Suppression is a widely-used technique for avoiding disclosure of sensitive information, which consists in suppressing all sensitive table entries along with a certain number of other entries, called complementary suppressions. Determining a pattern of complementary suppressions that minimizes the overall loss of information results in a difficult (i.e., NP-hard) optimization problem known as the Cell Suppression Problem. We propose here a different protection methodology consisting of replacing some table entries by appropriate intervals containing the actual value of the unpublished cells. We call this methodology Partial Cell Suppression, as opposed to the classical complete cell suppression. Partial cell suppression has the important advantage of reducing the overall information loss needed to protect the sensitive information. Also, the new method automatically provides auditing ranges for each unpublished cell, thus saving the statistical office an often time-consuming task while increasing the information explicitly provided with the table. Moreover, we propose an efficient (i.e., polynomial-time) algorithm to find an optimal partial suppression solution. A preliminary computational comparison between partial and complete suppression methodologies is reported, showing the advantages of the new approach. Finally, we address possible extensions leading to a unified complete/partial cell suppression framework.
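As a rough illustration of the auditing-range idea (not the authors' algorithm, which optimizes over all suppressions jointly), the following sketch computes the feasibility interval of a single suppressed cell in a small two-way table from its published margins using linear programming; the table values and the scipy-based LP are illustrative assumptions.

```python
# Sketch: audit interval [lo, hi] for one suppressed cell of a 2x2 table with
# published margins, assuming an attacker knows only the row/column totals and
# that entries are nonnegative. Illustrative only, not the authors' method.
import numpy as np
from scipy.optimize import linprog

# Table layout (row-major): cells x11, x12, x21, x22 with known margins.
row_totals = [70, 30]      # published row sums
col_totals = [40, 60]      # published column sums

# Equality constraints: each row and column must add to its published total.
A_eq = np.array([
    [1, 1, 0, 0],   # x11 + x12 = 70
    [0, 0, 1, 1],   # x21 + x22 = 30
    [1, 0, 1, 0],   # x11 + x21 = 40
    [0, 1, 0, 1],   # x12 + x22 = 60
])
b_eq = np.array(row_totals + col_totals, dtype=float)
bounds = [(0, None)] * 4    # nonnegative cell values

def audit_interval(cell_index):
    """Min and max feasible value of one cell given the published margins."""
    c = np.zeros(4)
    c[cell_index] = 1.0
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    return lo, hi

print(audit_interval(0))    # (10.0, 40.0): an interval that could be published for x11
```

In a partial-suppression release, such an interval (or a narrower one chosen by the optimization) would be published in place of the blanked cell.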

2.
In this paper we discuss a new theoretical basis for perturbation methods. In developing this new theoretical basis, we define the ideal measures of data utility and disclosure risk. Maximum data utility is achieved when the statistical characteristics of the perturbed data are the same as those of the original data. Disclosure risk is minimized if providing users with microdata access does not result in any additional information. We show that when the perturbed values of the confidential variables are generated as independent realizations from the distribution of the confidential variables conditioned on the non-confidential variables, they satisfy the data utility and disclosure risk requirements. We also discuss the relationship between the theoretical basis and some commonly used methods for generating perturbed values of confidential numerical variables.
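A minimal numpy sketch of the conditional-distribution idea above, assuming joint normality of one confidential and one non-confidential variable; the normality assumption and variable names are purely illustrative.

```python
# Sketch: perturb a confidential variable Y by drawing, for each record, an
# independent value from the conditional distribution of Y given the
# non-confidential variable X, here under a joint-normality assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(50, 10, size=n)                  # non-confidential variable
y = 2.0 * x + rng.normal(0, 5, size=n)          # confidential variable

# Estimate the conditional distribution Y | X from the original data
# (under joint normality this reduces to a simple linear regression).
beta1, beta0 = np.polyfit(x, y, 1)
resid_sd = np.std(y - (beta0 + beta1 * x), ddof=2)

# Perturbed values: independent draws from the estimated conditional law.
y_perturbed = beta0 + beta1 * x + rng.normal(0, resid_sd, size=n)

# The perturbed data should reproduce the X-Y relationship of the original data.
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x, y_perturbed)[0, 1])
```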

3.
Summary. Top coding of extreme values of variables like income is a common method of statistical disclosure control, but it creates problems for the data analyst. The paper proposes two alternatives to top coding for statistical disclosure control that are based on multiple imputation. We show in simulation studies that the multiple-imputation methods provide better inferences from the publicly released data than top coding, using straightforward multiple-imputation methods of analysis, while maintaining good statistical disclosure control properties. We illustrate the methods on data from the 1995 Chinese household income project.
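A hedged sketch of the multiple-imputation alternative to top coding, assuming (purely for illustration) a Pareto model for the income tail; the paper's own imputation models may differ.

```python
# Sketch: instead of top coding incomes above a cutoff, replace them with
# multiple imputations drawn from a model fitted to the tail. A Pareto tail
# is assumed here only for illustration.
import numpy as np

rng = np.random.default_rng(1)
income = rng.lognormal(mean=10, sigma=1.0, size=5000)
cutoff = np.quantile(income, 0.97)              # would-be top-coding threshold
tail = income[income > cutoff]

# Maximum-likelihood Pareto shape estimate for the tail above the cutoff.
alpha_hat = len(tail) / np.sum(np.log(tail / cutoff))

# Create m completed data sets: values above the cutoff are re-drawn from the
# fitted tail, everything below the cutoff is released unchanged.
# (A fully "proper" MI would also draw the shape parameter from its posterior.)
m = 5
imputed_sets = []
for _ in range(m):
    z = income.copy()
    draws = cutoff * (1 - rng.uniform(size=tail.size)) ** (-1 / alpha_hat)
    z[income > cutoff] = draws
    imputed_sets.append(z)

# The analyst combines estimates across the m data sets with Rubin's rules.
print([round(z.mean(), 1) for z in imputed_sets])
```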

4.
5.
Summary. Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two 'similar' established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data.
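A toy numerical illustration of a "probability that a unique match is correct" type of measure, computed here under the unrealistic assumption that population key counts are known; the paper's contribution is precisely that inference about its measure can be made from sample data alone, so this is not the paper's estimator.

```python
# Toy illustration only: with full population key counts F_k available, a
# sample-unique record matched to a randomly chosen population unit sharing
# its key value is correct with probability 1/F_k; averaging over sample
# uniques gives one naive match-correctness figure.
from collections import Counter
import random

random.seed(0)
population_keys = [random.choice(range(200)) for _ in range(10000)]  # key value per population unit
sample_keys = random.sample(population_keys, 500)                    # simple random sample

F = Counter(population_keys)          # population count of each key value
f = Counter(sample_keys)              # sample count of each key value

sample_uniques = [k for k, c in f.items() if c == 1]
risk = sum(1.0 / F[k] for k in sample_uniques) / len(sample_uniques)
print(f"naive per-record match-correctness probability: {risk:.3f}")
```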

6.
Since the 1920s and until recently, numerical computation has been a limiting factor in the application of statistical methods to the improvement of product quality. This restriction is being eliminated by the introduction of computer tools for statistical quality control.

Both novice and expert users of statistical methods can benefit substantially from the availability of integrated software for computing, graphics, and data management. The use of such software in SQC training programs enables the student to focus on the understanding of statistical techniques, rather than their mechanical details. In production environments, properly designed interfaces facilitate data entry and access to statistical software by plant personnel, without requiring knowledge of a computer language. These same tools can be used by management to retrieve information and obtain summaries and displays of critical data gathered over different periods of time. Finally, computer tools provide the applied statistician with a greater range of advanced methods, including analytical and graphical extensions of the traditional Shewhart control chart.

7.
The performance of Statistical Disclosure Control (SDC) methods for microdata (also called masking methods) is measured in terms of the utility and the disclosure risk associated with the protected microdata set. Empirical disclosure risk assessment based on record linkage stands out as a realistic and practical disclosure risk assessment methodology which is applicable to every conceivable masking method. The intruder is assumed to know an external data set, whose records are to be linked to those in the protected data set; the percentage of correctly linked record pairs is a measure of disclosure risk. This paper reviews conventional record linkage, which assumes shared variables between the external and the protected data sets, and then shows that record linkage—and thus disclosure—is still possible without shared variables.
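A small sketch of the conventional, distance-based record-linkage risk assessment reviewed above, with shared (noisy) variables between the external and protected files; the paper's further result on linkage without shared variables is not illustrated here, and the noise levels are arbitrary.

```python
# Sketch of conventional distance-based record linkage for disclosure risk
# assessment: each external record is linked to its nearest protected record,
# and the percentage of correct links is the empirical risk measure.
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 4
original = rng.normal(size=(n, p))                       # true microdata
protected = original + rng.normal(0, 0.3, size=(n, p))   # masked release (noise addition)
external = original + rng.normal(0, 0.1, size=(n, p))    # intruder's external file

def standardize(a):
    """Center and scale each variable so distances are comparable."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

E, P = standardize(external), standardize(protected)
dists = np.linalg.norm(E[:, None, :] - P[None, :, :], axis=2)
links = dists.argmin(axis=1)                             # nearest protected record

percent_correct = np.mean(links == np.arange(n)) * 100
print(f"correctly linked: {percent_correct:.1f}%")
```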

8.
ABSTRACT

In a sequence of elements, a run is defined as a maximal subsequence of like elements. The number of runs or the length of the longest run has been widely used to test the randomness of an ordered sequence. Based on two different sampling methods and two types of test statistics, run tests can be classified into one of four cases. Numerous researchers have derived the probability distributions in many different ways, treating each case separately. In this paper, we propose a unified approach based on recurrence arguments over two mutually exclusive sub-sequences. We also consider sequences of nominal data with more than two classes. Thus, the traditional run tests for a binary sequence are special cases of our generalized run tests. We finally show that the generalized run tests can be applied to many quality management areas, such as testing changes in process variation, developing non-parametric multivariate control charts, and comparing the shapes and locations of more than two process distributions.
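A short sketch of the basic counting step behind such run tests, for nominal sequences with any number of classes; in an actual test the observed counts would then be compared with their null distribution.

```python
# Sketch: count runs (maximal subsequences of like elements) in a sequence of
# nominal data with any number of classes; the binary run test is the special
# case with two classes.
from itertools import groupby

def count_runs(seq):
    """Number of maximal runs of like elements in the sequence."""
    return sum(1 for _ in groupby(seq))

def longest_run(seq):
    """Length of the longest run of like elements."""
    return max(len(list(g)) for _, g in groupby(seq))

sample = ["A", "A", "B", "C", "C", "C", "A", "B", "B"]
print(count_runs(sample), longest_run(sample))   # 5 runs, longest run of length 3
```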

9.
10.
The problem of assessing the merit of a technique for displaying data is addressed. The divergence, a measure of the difference between two graphs, is introduced and its properties discussed. The performance of the divergence is investigated by Monte Carlo methods. Two applications of the divergence are presented.

11.
Because manufacturing lot sizes continue to shrink, statistical process control methods for short production runs are increasingly important. We review and comment on the assumptions, advantages, and disadvantages of the alternatives; traditional methods as well as more recent developments are described and contrasted.

12.
There is a growing demand for public use data while at the same time there are increasing concerns about the privacy of personal information. One proposed method for accomplishing both goals is to release data sets that do not contain real values but yield the same inferences as the actual data. The idea is to view confidential data as missing and use multiple imputation techniques to create synthetic data sets. In this article, we compare techniques for creating synthetic data sets in simple scenarios with a binary variable.
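A minimal sketch of one simple way to create fully synthetic data sets for a single binary variable, using posterior-predictive draws under a Beta(1, 1) prior; this is just one possible scheme of the kind the article compares, and the prior and sample sizes are illustrative assumptions.

```python
# Sketch: treat the confidential binary variable as missing for every record
# and create m synthetic data sets by drawing from its posterior predictive
# distribution under a Beta(1, 1) prior.
import numpy as np

rng = np.random.default_rng(3)
y = rng.binomial(1, 0.3, size=400)        # original confidential binary variable
n, s = y.size, y.sum()

m = 10
synthetic_sets = []
for _ in range(m):
    theta = rng.beta(1 + s, 1 + n - s)                     # posterior draw of the proportion
    synthetic_sets.append(rng.binomial(1, theta, size=n))  # synthetic records

# Analysts combine the m synthetic-data estimates with the appropriate
# synthetic-data combining rules rather than Rubin's standard MI rules.
print([round(z.mean(), 3) for z in synthetic_sets])
```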

13.
Statistical process control tools have been used routinely to improve process capabilities through reliable on-line monitoring and diagnostic processes. In the present paper, we propose a novel multivariate control chart that integrates a support vector machine (SVM) algorithm, a bootstrap method, and a control chart technique to improve multivariate process monitoring. The proposed chart uses as its monitoring statistic the predicted probability of class (PoC) values from an SVM algorithm. The control limits of SVM-PoC charts are obtained by a bootstrap approach. A simulation study was conducted to evaluate the performance of the proposed SVM-PoC chart and to compare it with other data-mining-based control charts and Hotelling's T² control charts under various scenarios. The results showed that the proposed SVM-PoC charts outperformed other multivariate control charts in nonnormal situations. Further, we developed an exponentially weighted moving average version of the SVM-PoC charts for increased sensitivity to small shifts.
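A hedged sketch of an SVM-PoC-style monitoring scheme using scikit-learn; the kernel, the artificial out-of-control class used for training, and the bootstrap percentile are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch: an SVM is trained to separate reference in-control data from an
# artificially shifted out-of-control class; the predicted probability of the
# in-control class (PoC) is the monitoring statistic, and its lower control
# limit is set by bootstrapping the reference PoC values.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
p = 3
in_control = rng.normal(0, 1, size=(300, p))
shifted = rng.normal(1.0, 1, size=(300, p))            # pseudo out-of-control class

X = np.vstack([in_control, shifted])
labels = np.array([0] * 300 + [1] * 300)
svm = SVC(kernel="rbf", probability=True).fit(X, labels)

# PoC: predicted probability of the in-control class for reference data.
poc_ref = svm.predict_proba(in_control)[:, 0]

# Bootstrap lower control limit at a nominal false-alarm rate alpha.
alpha, B = 0.005, 2000
boot_quantiles = [np.quantile(rng.choice(poc_ref, size=poc_ref.size, replace=True), alpha)
                  for _ in range(B)]
lcl = np.mean(boot_quantiles)

# Monitor a new observation: signal if its PoC drops below the limit.
new_obs = rng.normal(0.8, 1, size=(1, p))
poc_new = svm.predict_proba(new_obs)[0, 0]
print(poc_new, poc_new < lcl)
```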

14.
In this article we review the major areas of remote sensing in the Russian literature for the period 1976 to 1985 that use statistical methods to analyze the observed data. For each of the areas, the problems that have been studied and the statistical techniques that have been used are briefly described.

15.
Using Markov chain representations, we evaluate and compare the performance of cumulative sum (CUSUM) and Shiryayev–Roberts methods in terms of the zero- and steady-state average run length and worst-case signal resistance measures. We also calculate the signal resistance values from the worst- to the best-case scenarios for both methods. Our results support the recommendation that Shewhart limits be used with CUSUM and Shiryayev–Roberts methods, especially for low values of the size of the shift in the process mean that the methods are designed to detect optimally.
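A simple simulation sketch of a one-sided CUSUM supplemented with a Shewhart limit, estimating the zero-state average run length by Monte Carlo rather than by the Markov chain approach used above; the design constants are illustrative, not the values studied in the paper.

```python
# Sketch: one-sided CUSUM for the mean with an added Shewhart limit on the
# individual observations, and a zero-state ARL estimated by simulation.
import numpy as np

def run_length(mean_shift, k=0.5, h=4.0, shewhart=3.5, max_n=10_000, rng=None):
    """Observations until the CUSUM exceeds h or a single point exceeds the Shewhart limit."""
    rng = rng or np.random.default_rng()
    c = 0.0
    for t in range(1, max_n + 1):
        x = rng.normal(mean_shift, 1.0)
        c = max(0.0, c + x - k)          # upper CUSUM recursion
        if c > h or x > shewhart:        # CUSUM signal or Shewhart signal
            return t
    return max_n

rng = np.random.default_rng(5)
arl_in_control = np.mean([run_length(0.0, rng=rng) for _ in range(2000)])
arl_small_shift = np.mean([run_length(0.5, rng=rng) for _ in range(2000)])
print(arl_in_control, arl_small_shift)
```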

16.
In this article, we compare alternative methods for imputing missing ordinal data in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study, designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other missing-data imputation methods. We then use real datasets on which to base the comparison between our approach and some general missing-data imputation methods under various missing-data mechanisms.
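As a building block for the kind of simulation study described above, the following sketch draws ordinal responses from a CUB model, a mixture of a shifted Binomial ("feeling") component and a discrete Uniform ("uncertainty") component; the parameter values are arbitrary illustrations.

```python
# Sketch: simulate ordinal responses on the scale 1..m from a CUB(pi, xi) model.
import numpy as np

def simulate_cub(n, m, pi, xi, rng=None):
    """Draw n ratings from the mixture pi * ShiftedBinomial + (1 - pi) * Uniform."""
    rng = rng or np.random.default_rng()
    from_binomial = rng.random(n) < pi                    # mixture component indicator
    shifted_binom = 1 + rng.binomial(m - 1, 1 - xi, size=n)
    uniform = rng.integers(1, m + 1, size=n)
    return np.where(from_binomial, shifted_binom, uniform)

rng = np.random.default_rng(6)
ratings = simulate_cub(2000, m=7, pi=0.8, xi=0.3, rng=rng)
print(np.bincount(ratings, minlength=8)[1:])              # frequencies of categories 1..7
```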

17.
The paper gives a review of a number of data models for aggregate statistical data which have appeared in the computer science literature in the last ten years. After a brief introduction to the data model in general, the fundamental concepts of statistical data are introduced. These are called statistical objects because they are complex data structures (vectors, matrices, relations, time series, etc.) which may have different possible representations (e.g. tables, relations, vectors, pie-charts, bar-charts, graphs, and so on). For this reason a statistical object is defined by two different types of attribute (a summary attribute, with its own summary type and its own instances, called summary data, and the set of category attributes, which describe the summary attribute). Some conceptual models of statistical data (CSM, SDM4S), some semantic models of statistical data (SCM, SAM*, OSAM*), and some graphical models of statistical data (SUBJECT, GRASS, STORM) are also discussed.
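One possible, purely illustrative, in-code rendering of the "statistical object" idea described above, with a summary attribute described by a set of category attributes; the field names are not taken from any of the cited models.

```python
# Sketch: a statistical object = a summary attribute (with its summary type
# and summary data) plus the category attributes that describe it.
from dataclasses import dataclass, field

@dataclass
class CategoryAttribute:
    name: str                       # e.g. "year" or "region"
    categories: list                # the category instances

@dataclass
class StatisticalObject:
    summary_name: str               # e.g. "average income"
    summary_type: str               # e.g. "mean", "count", "total"
    category_attributes: list = field(default_factory=list)
    summary_data: dict = field(default_factory=dict)   # maps category tuples to summary values

obj = StatisticalObject(
    summary_name="average income",
    summary_type="mean",
    category_attributes=[CategoryAttribute("year", [2020, 2021]),
                         CategoryAttribute("region", ["North", "South"])],
    summary_data={(2020, "North"): 31200.0, (2020, "South"): 29800.0,
                  (2021, "North"): 32050.0, (2021, "South"): 30400.0},
)
print(obj.summary_data[(2021, "South")])
```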

18.
In the manufacturing process, a sequence of measurements of a quality characteristic is increasingly taken across some continuum, producing a curve that represents the quality of the item. This curve provides the so-called profile, or functional data. Whether the profile is linear or nonlinear, the common control-chart approaches are based on a multivariate control chart that monitors the estimated parameters of a pre-defined linear or nonlinear model. In practice, however, the model is usually difficult to specify, and it is also difficult to identify the abnormal pattern from an outlying parameter. The functional data control chart we propose provides a better solution to these problems. In Monte Carlo simulations, we show that the functional data control chart is sensitive to changes in the underlying process status. Applied to the vertical density profile data, the new method exhibits good performance.
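A simplified, distance-based profile-monitoring sketch in the spirit of a functional data control chart: it uses the mean squared deviation of each profile from a reference mean curve with an empirical control limit, which is a generic stand-in rather than the authors' exact chart.

```python
# Sketch: monitor profiles (curves on a common grid) by their mean squared
# deviation from a reference mean curve, with an empirical control limit
# estimated from in-control reference profiles.
import numpy as np

rng = np.random.default_rng(7)
t = np.linspace(0, 1, 100)                             # common measurement grid
reference = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, size=(200, t.size))

mean_curve = reference.mean(axis=0)

def profile_statistic(profile):
    """Mean squared deviation of a profile from the reference mean curve."""
    return np.mean((profile - mean_curve) ** 2)

ref_stats = np.array([profile_statistic(p) for p in reference])
ucl = np.quantile(ref_stats, 0.995)                    # empirical upper control limit

new_profile = np.sin(2 * np.pi * t) + 0.15 * t + rng.normal(0, 0.1, size=t.size)
print(profile_statistic(new_profile) > ucl)            # signal if the curve shape has changed
```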

19.
Principal components are useful for multivariate process control. Typically, the principal component variables are selected to summarize the variation in the process data. We provide an analysis for selecting the principal component variables to be included in a multivariate control chart that incorporates the unique aspects of the process control problem (rather than using traditional principal component guidelines).
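A brief sketch of a Hotelling-type statistic computed on a selected subset of principal components; which components to retain is exactly the question the paper addresses, so the selection below is only a placeholder.

```python
# Sketch: T^2-type monitoring statistic on a chosen subset of principal
# components, with an empirical control limit from in-control reference data.
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))    # correlated in-control data

mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                     # eigendecomposition (ascending)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = [0, 1, 2]                                           # placeholder component choice
scores = (X - mu) @ eigvecs[:, keep]
t2 = np.sum(scores ** 2 / eigvals[keep], axis=1)           # T^2 on the selected components

ucl = np.quantile(t2, 0.99)                                # empirical limit from reference data
new_x = mu + 2.0                                           # illustrative shifted observation
new_score = (new_x - mu) @ eigvecs[:, keep]
print(np.sum(new_score ** 2 / eigvals[keep]) > ucl)
```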

20.