
1.

Logistic regression analysis of randomized response data with missing covariates

S.H. Hsieh S.M. Lee P.S. Shen 《Journal of statistical planning and inference》2010

Randomized response is an interview technique designed to eliminate response bias when sensitive questions are asked. In this paper, we present a logistic regression model on randomized response data when the covariates on some subjects are missing at random. In particular, we propose Horvitz and Thompson (1952)-type weighted estimators by using different estimates of the selection probabilities. We present large sample theory for the proposed estimators and show that they are more efficient than the estimator using the true selection probabilities. Simulation results support theoretical analysis. We also illustrate the approach using data from a survey of cable TV.
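As a rough illustration of the Horvitz-Thompson-type weighting described in this abstract, the sketch below fits a logistic regression on complete cases, weighting each observed subject by the inverse of an estimated selection probability. It is not the authors' estimator: the randomized response mechanism is omitted, and the simulated data, the missingness rule (depending only on the outcome), and the helper names `simulate`, `ht_weights` and `fit_weighted_logistic` are all assumptions made for the example.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Solve the weighted score equation sum_i w_i x_i (y_i - mu_i) = 0 by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - mu))
        info = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate(n=2000, seed=0):
    """Hypothetical data: the covariate x is missing with a probability that depends on y only."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
    y = (rng.random(n) < p).astype(float)
    pi_true = np.where(y == 1, 0.9, 0.5)   # assumed selection probabilities
    observed = rng.random(n) < pi_true
    x_seen = np.where(observed, x, 0.0)    # placeholder value where x is missing
    return x_seen, y, observed


def ht_weights(y, observed):
    """Estimate each selection probability by the observed fraction within the level of y."""
    pi_hat = np.empty_like(y)
    for level in (0.0, 1.0):
        mask = y == level
        pi_hat[mask] = observed[mask].mean()
    return observed / pi_hat               # weight is 0 for incomplete cases


x_seen, y, observed = simulate()
X = np.column_stack([np.ones_like(x_seen), x_seen])
beta_hat = fit_weighted_logistic(X, y, ht_weights(y, observed))
print("weighted estimate (intercept, slope):", beta_hat)
```

The weights here use estimated rather than true selection probabilities, which is the variant the abstract reports as more efficient; the sketch does not attempt to verify that claim.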

2.

Consistency of Bayesian linear model selection with a growing number of parameters

Zuofeng Shang Murray K. Clayton 《Journal of statistical planning and inference》2011,141(11):3463-3474

Linear models with a growing number of parameters have been widely used in modern statistics. One important problem for this kind of model is variable selection. Bayesian approaches, which provide a stochastic search of informative variables, have gained popularity. In this paper, we study the asymptotic properties related to Bayesian model selection when the model dimension p is growing with the sample size n. We consider p≤n and provide sufficient conditions under which: (1) with large probability, the posterior probability of the true model (from which samples are drawn) uniformly dominates the posterior probability of any incorrect model; and (2) the posterior probability of the true model converges to one in probability. Both (1) and (2) guarantee that the true model will be selected under a Bayesian framework. We also demonstrate several situations when (1) holds but (2) fails, which illustrates the difference between these two properties. Finally, we generalize our results to include g-priors, and provide simulation examples to illustrate the main results.

3.

Variable Selection in Linear Mixed Models Using an Extended Class of Penalties

Julian D. Taylor Arūnas P. Verbyla Colin Cavanagh Marcus Newberry 《Australian & New Zealand Journal of Statistics》2012,54(4):427-449

There is an emerging need to advance linear mixed model technology to include variable selection methods that can simultaneously choose and estimate important effects from a potentially large number of covariates. However, the complex nature of variable selection has made it difficult for it to be incorporated into mixed models. In this paper we extend the well-known class of penalties and show that they can be integrated succinctly into a linear mixed model setting. Under mild conditions, the estimator obtained from this mixed model penalised likelihood is shown to be consistent and asymptotically normally distributed. A simulation study reveals that the extended family of penalties achieves varying degrees of estimator shrinkage depending on the value of one of its parameters. The simulation study also shows there is a link between the number of false positives detected and the number of true coefficients when using the same penalty. This new mixed model variable selection (MMVS) technology was applied to a complex wheat quality data set to determine significant quantitative trait loci (QTL).

4.

The Adaptive Gril Estimator with a Diverging Number of Parameters

Mohammed El Anbari 《Communications in Statistics - Theory and Methods》2013,42(14):2634-2660

We consider the problem of variable selection and estimation in the linear regression model in situations where the number of parameters diverges with the sample size. We propose the adaptive Generalized Ridge-Lasso (AdaGril), which is an extension of the adaptive Elastic Net. AdaGril incorporates information redundancy among correlated variables for model selection and estimation. It combines the strengths of the quadratic regularization and the adaptively weighted Lasso shrinkage. In this article, we highlight the grouped selection property for the AdaCnet method (one type of AdaGril) in the equal correlation case. Under weak conditions, we establish the oracle property of AdaGril, which ensures the optimal large performance when the dimension is high. Consequently, it achieves both goals of handling the problem of collinearity in high dimension and enjoying the oracle property. Moreover, we show that the AdaGril estimator achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the “true” regression coefficient. This bound is obtained under a similar weak Restricted Eigenvalue (RE) condition used for Lasso. Simulation studies show that some particular cases of AdaGril outperform its competitors.

5.

Tuning Parameter Selection in Cox Proportional Hazards Model with a Diverging Number of Parameters

《Scandinavian Journal of Statistics》2018,45(3):557-570

Regularized variable selection is a powerful tool for identifying the true regression model from a large number of candidates by applying penalties to the objective functions. The penalty functions typically involve a tuning parameter that controls the complexity of the selected model. The ability of the regularized variable selection methods to identify the true model critically depends on the correct choice of the tuning parameter. In this study, we develop a consistent tuning parameter selection method for regularized Cox's proportional hazards model with a diverging number of parameters. The tuning parameter is selected by minimizing the generalized information criterion. We prove that, for any penalty that possesses the oracle property, the proposed tuning parameter selection method identifies the true model with probability approaching one as sample size increases. Its finite sample performance is evaluated by simulations. Its practical use is demonstrated in The Cancer Genome Atlas breast cancer data.

6.

A cluster tree based model selection approach for logistic regression classifier

Ozge Tanju 《Journal of Statistical Computation and Simulation》2018,88(7):1394-1414

Model selection methods are important for identifying the best approximating model. To identify the best meaningful model, the purpose of the model should be clearly stated in advance. The focus of this paper is model selection when the modelling purpose is classification. We propose a new model selection approach designed for logistic regression model selection where the main modelling purpose is classification. The method is based on the distance between the two clustering trees. We also question and evaluate the performance of conventional model selection methods based on information theory concepts in determining the best logistic regression classifier. An extensive simulation study is used to assess the finite sample performance of the cluster tree based and the information theoretic model selection methods. Simulations are adjusted for whether the true model is in the candidate set or not. Results show that the new approach is highly promising. Finally, the methods are applied to a real data set to select a binary model as a means of classifying the subjects with respect to their risk of breast cancer.

7.

Subset selection in multiple linear regression in the presence of outlier and multicollinearity

《Statistical Methodology》2014

Various subset selection methods are based on the least squares parameter estimation method. These methods do not perform reasonably well in the presence of outliers, multicollinearity, or both. Few subset selection methods based on the $M$-estimator are available in the literature for data with outliers. Very few subset selection methods account for the problem of multicollinearity with the ridge regression estimator. In this article, we develop a generalized version of the $S_{p}$ statistic based on the jackknifed ridge $M$-estimator for subset selection in the presence of outliers and multicollinearity. We establish the equivalence of this statistic with the existing $C_{p}$, $S_{p}$ and $R_{p}$ statistics. The performance of the proposed method is illustrated through some numerical examples, and the correct model selection ability is evaluated using a simulation study.

8.

Model selection for infinite variance time series

R.H. Glendinning 《Communications in Statistics - Theory and Methods》2013,42(4):889-910

In this paper we consider the problem of model selection for infinite variance time series. We introduce a group of model selection criteria based on a general loss function Ψ. This family includes various generalizations of predictive least squares and AIC. Parameter estimation is carried out using Ψ. We use two loss functions commonly used in robust estimation and show that certain criteria outperform the conventional approach based on least squares or Yule-Walker estimation for heavy-tailed innovations. Our conclusions are based on a comprehensive study of the performance of competing criteria for a wide selection of AR(2) models. We also consider the performance of these techniques when the ‘true’ model is not contained in the family of candidate models.

9.

Model selection procedures in social research: Monte-Carlo simulation results

Lawrence E. Raffalovich Glenn D. Deane David Armstrong Hui-Shien Tsao 《Journal of applied statistics》2008,35(10):1093-1114

Model selection strategies play an important, if not explicit, role in quantitative research. The inferential properties of these strategies are largely unknown; therefore, there is little basis for recommending (or avoiding) any particular set of strategies. In this paper, we evaluate several commonly used model selection procedures [Bayesian information criterion (BIC), adjusted R², Mallows’ C_p, Akaike information criterion (AIC), AIC_c, and stepwise regression] using Monte-Carlo simulation of model selection when the true data generating processes (DGP) are known.

We find that the ability of these selection procedures to include important variables and exclude irrelevant variables increases with the size of the sample and decreases with the amount of noise in the model. None of the model selection procedures do well in small samples, even when the true DGP is largely deterministic; thus, data mining in small samples should be avoided entirely. Instead, the implicit uncertainty in model specification should be explicitly discussed. In large samples, BIC is better than the other procedures at correctly identifying most of the generating processes we simulated, and stepwise does almost as well. In the absence of strong theory, both BIC and stepwise appear to be reasonable model selection strategies in large samples. Under the conditions simulated, adjusted R², Mallows’ C_p, AIC, and AIC_c are clearly inferior and should be avoided.
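To make the criteria compared in this abstract concrete, the following sketch computes AIC, AIC_c and BIC for Gaussian linear models and picks the subset that minimises one of them by exhaustive search. The simulated data and the function names (`gaussian_ic`, `rss_of`, `best_subset`) are assumptions for illustration and do not reproduce the authors' simulation design. Additive constants common to all models are dropped, and the parameter count k includes the intercept, the slopes and the error variance.

```python
import itertools

import numpy as np


def gaussian_ic(rss, n, k):
    """AIC, AICc and BIC (constants dropped) for a Gaussian linear model.

    rss: residual sum of squares, n: sample size,
    k: number of estimated parameters (coefficients plus error variance).
    """
    neg2loglik = n * np.log(rss / n)
    aic = neg2loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2loglik + k * np.log(n)
    return {"aic": aic, "aicc": aicc, "bic": bic}


def rss_of(X, y, cols):
    """Residual sum of squares of the least squares fit on an intercept plus the given columns."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)


def best_subset(X, y, criterion="bic"):
    """Return (columns, score) of the subset minimising the chosen criterion."""
    n, p = X.shape
    best = None
    for size in range(p + 1):
        for cols in itertools.combinations(range(p), size):
            k = len(cols) + 2  # intercept + slopes + error variance
            score = gaussian_ic(rss_of(X, y, cols), n, k)[criterion]
            if best is None or score < best[1]:
                best = (cols, score)
    return best


# Hypothetical DGP: only columns 0 and 1 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=500)
for name in ("aic", "aicc", "bic"):
    print(name, best_subset(X, y, name)[0])
```

Because the BIC penalty per parameter is log(n) rather than 2, it exceeds the AIC penalty once n is larger than about 7, which is one reason the two criteria can select different models.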

10.

Fast and approximate exhaustive variable selection for generalised linear models with APES

Kevin YX Wang Garth Tarr Jean YH Yang Samuel Mueller 《Australian & New Zealand Journal of Statistics》2019,61(4):445-465

We present APproximated Exhaustive Search (APES), which enables fast and approximated exhaustive variable selection in Generalised Linear Models (GLMs). While exhaustive variable selection remains the gold standard in many model selection contexts, traditional exhaustive variable selection suffers from computational feasibility issues. More precisely, there is often a high cost associated with computing maximum likelihood estimates (MLE) for all subsets of GLMs. Efficient algorithms for exhaustive searches exist for linear models, most notably the leaps-and-bounds algorithm and, more recently, the mixed integer optimisation (MIO) algorithm. The APES method learns from observational weights in a generalised linear regression super-model and reformulates the GLM problem as a linear regression problem. In this way, APES can approximate a true exhaustive search in the original GLM space. Where exhaustive variable selection is not computationally feasible, we propose a best-subset search, which also closely approximates a true exhaustive search. APES is available both as a standalone R package and as part of the existing mplot package.

11.

Birnbaum–Saunders sample selection model

Fernando de Souza Bastos Wagner Barreto-Souza 《Journal of applied statistics》2021,48(11):1896

The sample selection bias problem occurs when the outcome of interest is only observed according to some selection rule, where there is a dependence structure between the outcome and the selection rule. In a pioneering work, J. Heckman proposed a sample selection model based on a bivariate normal distribution for dealing with this problem. Due to the non-robustness of the normal distribution, many alternatives have been introduced in the literature by assuming extensions of the normal distribution like the Student-t and skew-normal models. One common limitation of the existing sample selection models is that they require a transformation of the outcome of interest, which is commonly $\mathbb{R}^{+}$-valued, such as income and wage. With this, data are analyzed on a non-original scale, which complicates the interpretation of the parameters. In this paper, we propose a sample selection model based on the bivariate Birnbaum–Saunders distribution, which has the same number of parameters as the classical Heckman model. Further, our associated outcome equation is $\mathbb{R}^{+}$-valued. We discuss estimation by maximum likelihood and present some Monte Carlo simulation studies. An empirical application to the ambulatory expenditures data from the 2001 Medical Expenditure Panel Survey is presented.

12.

Non-nested model selection based on the quantiles and its application in time series

S. Zamani Mehreyan D. Thomakos 《Communications in Statistics - Theory and Methods》2019,48(2):332-353

We consider the problem of model selection based on quantile analysis and with unknown parameters estimated using quantile least squares. We propose a model selection test for the null hypothesis that the competing models are equivalent against the alternative hypothesis that one model is closer to the true model. We follow with two applications of the proposed model selection test. The first application is in model selection for time series with non-normal innovations. The second application is in model selection in the NoVaS method (short for normalizing and variance stabilizing transformation) for forecasting. A set of simulation results also lends strong support to the results presented in the paper.

13.

Variable selection via penalized minimum φ-divergence estimation in logistic regression

D.M. Sakate D.N. Kashid 《Journal of applied statistics》2014,41(6):1233-1246

We propose a penalized minimum φ-divergence estimator for parameter estimation and variable selection in logistic regression. Using an appropriate penalty function, we show that the penalized φ-divergence estimator has the oracle property. With probability tending to 1, the penalized φ-divergence estimator identifies the true model and estimates nonzero coefficients as efficiently as if the sparsity of the true model was known in advance. The advantage of the penalized φ-divergence estimator is that it produces estimates of nonzero parameters more efficiently than the penalized maximum likelihood estimator when the sample size is small and is equivalent to it when the sample size is large. Numerical simulations confirm our findings.

14.

Order selection in finite mixtures of linear regressions

Nicolas Depraetere Martina Vandebroek 《Statistical Papers》2014,55(3):871-911

Finite mixture models can adequately model population heterogeneity when this heterogeneity arises from a finite number of relatively homogeneous clusters. An example of such a situation is market segmentation. Order selection in mixture models, i.e. selecting the correct number of components, however, is a problem which has not been satisfactorily resolved. Existing simulation results in the literature do not completely agree with each other. Moreover, it appears that the performance of different selection methods is affected by the type of model and the parameter values. Furthermore, most existing results are based on simulations where the true generating model is identical to one of the models in the candidate set. In order to partly fill this gap we carried out a (relatively) large simulation study for finite mixture models of normal linear regressions. We included several types of model (mis)specification to study the robustness of 18 order selection methods. Furthermore, we compared the performance of these selection methods based on unpenalized and penalized estimates of the model parameters. The results indicate that order selection based on penalized estimates greatly improves the success rates of all order selection methods. The most successful methods were $MDL2$, $MRC$, $MRC_k$, $ICL$–$BIC$, $ICL$, $CAIC$, $BIC$ and $CLC$, but not one method was consistently good or best for all types of model (mis)specification.

15.

Linear model selection by cross-validation

《Journal of statistical planning and inference》2005,128(1):231-240

We consider the problem of model (or variable) selection in the classical regression model based on cross-validation with an added penalty term for penalizing overfitting. Under some weak conditions, the new criterion is shown to be strongly consistent in the sense that with probability one, for all large n, the criterion chooses the smallest true model. The penalty function, denoted by C_n, depends on the sample size n and is chosen to ensure consistency in the selection of the true model. There are various choices of C_n suggested in the literature on model selection. In this paper we show that a particular choice of C_n based on observed data, which makes it random, preserves the consistency property and provides improved performance over a fixed choice of C_n.
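The cross-validation part of such a criterion can be sketched as follows. For least squares, the leave-one-out prediction errors are available without refitting, as e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage. The abstract does not give the exact form of the penalised criterion, so the additive form `CV + c_n * (number of columns)` in `penalized_cv` is purely a hypothetical placeholder, as are the function names.

```python
import numpy as np


def loo_cv_shortcut(X, y):
    """Leave-one-out CV sum of squares via leverages (X must have full column rank)."""
    Q, _ = np.linalg.qr(X)               # reduced QR: columns of Q span col(X)
    leverage = np.sum(Q ** 2, axis=1)    # diagonal of the hat matrix
    resid = y - Q @ (Q.T @ y)            # ordinary least squares residuals
    return float(np.sum((resid / (1.0 - leverage)) ** 2))


def loo_cv_brute(X, y):
    """Same quantity computed by refitting n times, for checking the shortcut."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ coef) ** 2
    return float(total)


def penalized_cv(X, y, c_n):
    """Hypothetical penalised criterion: CV plus c_n times the model size (an assumed form)."""
    return loo_cv_shortcut(X, y) + c_n * X.shape[1]


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=40)
print(loo_cv_shortcut(X, y), loo_cv_brute(X, y))
```

Comparing `penalized_cv` across candidate design matrices would then select a model; how C_n should be chosen, fixed or data-driven, is the subject of the paper and is not reproduced here.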

16.

Variable selection via the weighted group lasso for factor analysis models

Kei Hirose Sadanori Konishi 《Revue canadienne de statistique》2012,40(2):345-361

We consider the problem of selecting variables in factor analysis models. The $L_1$ regularization procedure is introduced to perform automatic variable selection. In the factor analysis model, each variable is controlled by multiple factors when there is more than one underlying factor. We treat parameters corresponding to the multiple factors as grouped parameters, and then apply the group lasso. Furthermore, the weight of the group lasso penalty is modified to obtain appropriate estimates and improve the performance of variable selection. Crucial issues in this modeling procedure include the selection of the number of factors and a regularization parameter. Choosing these parameters can be viewed as a model selection and evaluation problem. We derive a model selection criterion for evaluating the factor analysis model via the weighted group lasso. Monte Carlo simulations are conducted to investigate the effectiveness of the proposed procedure. A real data example is also given to illustrate our procedure. The Canadian Journal of Statistics 40: 345–361; 2012 © 2012 Statistical Society of Canada

17.

Vector autoregressive model selection based on the Gibbs sampler

Zhao Xindong (赵昕东) Qian Guoqi (钱国骐) 《Statistical Research (统计研究)》2008,25(1):86-92

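Abstract: The vector autoregressive (VAR) model is one of the most commonly used methods in multivariate time series analysis. Model selection is a very important step in building such a model. When there are not many candidate models, selection can be carried out by comparing a criterion value such as AIC, AICc, BIC or HQ for each model. When there is a large number of candidate models, however, it is not feasible to compare the criterion value of every model one by one. To address this problem, this paper proposes a VAR model selection method based on the Gibbs sampler. The results show that the method can accurately and efficiently identify the model with the smallest criterion value among a large number of candidate models.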
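The general idea of searching a large model space with a Gibbs sampler can be sketched as below. This is not the paper's algorithm: it uses ordinary linear regression instead of a VAR model, assumes BIC as the criterion, and assumes a target distribution over inclusion vectors proportional to exp(-BIC/2). The functions `bic` and `gibbs_search` and the simulated data are hypothetical. Each step resamples one inclusion indicator given the others, and the sampler keeps track of the lowest-criterion model it has visited.

```python
import numpy as np


def bic(X, y, gamma):
    """BIC (constants dropped) of the least squares fit on an intercept plus columns with gamma True."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X[:, gamma]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return n * np.log(resid @ resid / n) + Z.shape[1] * np.log(n)


def gibbs_search(X, y, n_sweeps=20, seed=0):
    """Gibbs sampling over inclusion indicators; returns the best visited model and its BIC."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)              # start from the intercept-only model
    best_gamma, best_score = gamma.copy(), bic(X, y, gamma)
    for _ in range(n_sweeps):
        for j in range(p):
            g1, g0 = gamma.copy(), gamma.copy()
            g1[j], g0[j] = True, False
            c1, c0 = bic(X, y, g1), bic(X, y, g0)
            # Conditional probability of including variable j under the assumed target.
            prob_in = 1.0 / (1.0 + np.exp(np.clip((c1 - c0) / 2.0, -700, 700)))
            gamma[j] = rng.random() < prob_in
            score = c1 if gamma[j] else c0
            if score < best_score:
                best_gamma, best_score = gamma.copy(), score
    return best_gamma, best_score


# Hypothetical data: only columns 0 and 1 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=300)
best_gamma, best_score = gibbs_search(X, y)
print(np.flatnonzero(best_gamma), best_score)
```

Each sweep costs 2p model fits, compared with 2^p fits for exhaustive comparison, which is the practical appeal when the candidate set is large.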

18.

Identification of average marginal effects under misspecification when covariates are normal

José Ignacio Cuesta Jonathan M. V. Davis Andrew Gianou Alejandro Hoyos 《Econometric Reviews》2019,38(3):350-357

A previously known result in the econometrics literature is that when covariates of an underlying data generating process are jointly normally distributed, estimates from a nonlinear model that is misspecified as linear can be interpreted as average marginal effects. This has been shown for models with exogenous covariates and separability between covariates and errors. In this paper, we extend this identification result to a variety of more general cases, in particular for combinations of separable and nonseparable models under both exogeneity and endogeneity. So long as the underlying model belongs to one of these large classes of data generating processes, our results show that nothing else must be known about the true DGP (beyond normality of observable data, a testable assumption) in order for linear estimators to be interpretable as average marginal effects. We use simulation to explore the performance of these estimators using a misspecified linear model and show they perform well when the data are normal but can perform poorly when this is not the case.

19.

Interpreting statistical evidence by using imperfect models: robust adjusted likelihood functions

Richard Royall Tsung-Shan Tsou 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2003,65(2):391-404

Summary. The strength of statistical evidence is measured by the likelihood ratio. Two key performance properties of this measure are the probability of observing strong misleading evidence and the probability of observing weak evidence. For the likelihood function associated with a parametric statistical model, these probabilities have a simple large sample structure when the model is correct. Here we examine how that structure changes when the model fails. This leads to criteria for determining whether a given likelihood function is robust (continuing to perform satisfactorily when the model fails), and to a simple technique for adjusting both likelihoods and profile likelihoods to make them robust. We prove that the expected information in the robust adjusted likelihood cannot exceed the expected information in the likelihood function from a true model. We note that the robust adjusted likelihood is asymptotically fully efficient when the working model is correct, and we show that in some important examples this efficiency is retained even when the working model fails. In such cases the Bayes posterior probability distribution based on the adjusted likelihood is robust, remaining correct asymptotically even when the model for the observable random variable does not include the true distribution. Finally we note a link to standard frequentist methodology: in large samples the adjusted likelihood functions provide robust likelihood-based confidence intervals.

20.

Mis-specification analyses of gamma and Wiener degradation processes (Total citations: 2; self-citations: 0; citations by others: 2)

Chih-Chun Tsai Sheng-Tsaing Tseng N. Balakrishnan 《Journal of statistical planning and inference》2011,141(12):3725-3735

Degradation models are widely used these days to assess the lifetime information of highly reliable products if there exist some quality characteristics (QC) whose degradation over time can be related to the reliability of the product. In this study, motivated by laser data, we investigate the mis-specification effect on the prediction of a product's MTTF (mean-time-to-failure) when the degradation model is wrongly fitted. More specifically, we derive an expression for the asymptotic distribution of the quasi-MLE (QMLE) of the product's MTTF when the true model comes from a gamma degradation process, but is wrongly assumed to be a Wiener degradation process. The penalty for the model mis-specification can then be addressed sequentially. The result demonstrates that the effect on the accuracy of the product's MTTF prediction strongly depends on the ratio of the critical value to the scale parameter of the gamma degradation process. The effects on the precision of the product's MTTF prediction are observed to be serious when the shape and scale parameters of the gamma degradation process are large. We then carry out a simulation study to evaluate the penalty of the model mis-specification, using which we show that the simulation results are quite close to the theoretical ones even when the sample size and termination time are not large. For the reverse mis-specification problem, i.e., when the true degradation is a Wiener process, but is wrongly assumed to be a gamma degradation process, we carry out a Monte Carlo simulation study to examine the effect of the corresponding model mis-specification. The obtained results reveal that the effect of this model mis-specification is negligible.