1.
Journal of Statistical Computation and Simulation, 2012, 82(1-2): 117-126
In its application to variable selection in the linear model, cross-validation is traditionally applied to each individual model in a set of candidate models. Each model in the set is cross-validated independently of the rest, and the model with the smallest cross-validated sum of squares is selected. In such settings, an efficient algorithm for cross-validation must be able to add and delete single points quickly from a mixed model. Recent work in variable selection has applied cross-validation to an entire variable-selection process, such as Backward Elimination or Stepwise regression (Thall, Simon and Grier, 1992). The cross-validated version of Backward Elimination, for example, divides the data into an estimation set and a validation set, performs a complete Backward Elimination on the estimation set, and computes the cross-validated sum of squares at each step using the validation set. After this process has been carried out once, a different validation set is selected and the process is repeated. The final model selection is based on the cross-validated sums of squares from all the Backward Eliminations. An optimal algorithm for this application of cross-validation need not be efficient at adding and deleting observations from a single model, but it must be efficient at computing the cross-validated sum of squares for a series of models that share a common validation set. This paper explores such an algorithm based on the sweep operator.
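The abstract does not spell out the algorithm, so the following is only a minimal sketch under our own assumptions of why the sweep operator suits this task. Sweeping a pivot of the augmented cross-product matrix [[X'X, X'y], [y'X, y'y]] brings that variable into the model, and sweeping the same pivot again removes it; after each step the last column holds the current coefficients and the bottom-right entry holds the residual sum of squares (RSS). Every model visited by a Backward Elimination on the estimation set can therefore be scored on one common validation set without refitting from scratch. The function names, the intercept in column 0, and the drop rule (smallest increase in estimation-set RSS) are illustrative assumptions, not the paper's; only one estimation/validation split is shown, whereas the procedure described above repeats it over several validation sets.

```python
"""Sketch: sweep-operator Backward Elimination scored on a common validation set."""


def sweep(A, k):
    """Return a copy of A swept on pivot k (Goodnight's form).
    Sweeping the same pivot twice restores the original matrix."""
    n = len(A)
    d = A[k][k]
    if d == 0:
        raise ZeroDivisionError("zero pivot: column is empty or collinear")
    B = [row[:] for row in A]
    for j in range(n):
        B[k][j] = A[k][j] / d              # scale the pivot row
    B[k][k] = 1.0 / d
    for i in range(n):
        if i == k:
            continue
        b = A[i][k]
        for j in range(n):
            if j != k:
                B[i][j] = A[i][j] - b * A[k][j] / d
        B[i][k] = -b / d
    return B


def cross_products(X, y):
    """Augmented SSCP matrix [[X'X, X'y], [y'X, y'y]] built from rows of X and y."""
    rows = [list(x) + [t] for x, t in zip(X, y)]
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


def validation_sse(M, in_model, X_val, y_val):
    """Squared prediction error on the validation rows for the current swept model."""
    last = len(M) - 1                      # M[i][last] is the coefficient of column i
    return sum((t - sum(M[i][last] * x[i] for i in in_model)) ** 2
               for x, t in zip(X_val, y_val))


def backward_cv_path(M, X_val, y_val, keep=(0,)):
    """Backward Elimination on the estimation-set SSCP matrix M, scoring every step
    on the same validation set. Columns listed in `keep` (assumed here to be the
    intercept in column 0) are never dropped."""
    p = len(M) - 1                         # predictor columns are 0..p-1; index p is y
    in_model = list(range(p))
    for k in in_model:
        M = sweep(M, k)                    # fit the full model
    path = [(tuple(in_model), validation_sse(M, in_model, X_val, y_val))]
    while len(in_model) > len(keep):
        candidates = [j for j in in_model if j not in keep]
        # M[p][p] is the current RSS; re-sweeping j removes j from the model.
        best = min(candidates, key=lambda j: sweep(M, j)[p][p])
        M = sweep(M, best)
        in_model.remove(best)
        path.append((tuple(in_model), validation_sse(M, in_model, X_val, y_val)))
    return path


if __name__ == "__main__":
    # Toy data with y = 1 + 2x exactly; column 0 is the intercept.
    X_est = [[1, 0], [1, 1], [1, 2], [1, 3]]
    y_est = [1, 3, 5, 7]
    for model, sse in backward_cv_path(cross_products(X_est, y_est), [[1, 4]], [9]):
        print(model, round(sse, 6))
```

With the toy data, the full model predicts the validation point (x = 4, y = 9) exactly, and the intercept-only model predicts the estimation-set mean 4, so its validation sum of squares is 25 up to floating-point rounding.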
2.
The weighted generalized estimating equation (WGEE), an extension of the generalized estimating equation (GEE) method, is a method for analyzing incomplete longitudinal data. Misspecifying the working correlation structure reduces the efficiency of GEE estimation. In this study, we evaluated the efficiency of WGEE estimation for incomplete longitudinal data when the working correlation structure was misspecified. We found that, as with the GEE method, the efficiency of WGEE estimation was lower when an improper working correlation structure was selected. Furthermore, we modified the criterion proposed by Gosho et al. (2011) for selecting a working correlation structure so that it can be used with the GEE and WGEE methods for incomplete longitudinal data, and we investigated the performance of the modified criterion. The results showed that, when the modified criterion was adopted, the true correlation structure tended to be selected more often than with other competing approaches.
3.
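For semiparametric varying-coefficient regression models, a new test statistic for spatial dependence is constructed, and an approximate formula for its p-value is derived using a three-moment approximation. Monte Carlo simulation results show that the statistic detects spatial dependence with high accuracy and reliability. The power of the test is also examined when the error term follows different distributions, which demonstrates the robustness of the test. Furthermore, a bootstrap version of the test statistic is given, together with simulation results on its significance level.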
4.
In the statistics of extremes, parameters of extreme or even rare events are usually estimated within a semi-parametric framework. The estimators are based on the k largest order statistics in the sample or on the excesses over a high level u. Although they have good asymptotic properties, most of these estimators depend strongly on k or u and have high bias when k increases or u decreases. Resampling methodologies have proved promising both for reducing the bias and for choosing k or u. Different resampling approaches are needed depending on whether the setting is independent or dependent, and a great deal of research has been done for the independent case. The main objective of this article is to use bootstrap and jackknife methods in the dependent setting to obtain more stable estimators of the extremal index, a parameter that characterizes the degree of local dependence in the extremes. A simulation study illustrates the application of these methods.
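The abstract does not describe the article's own bootstrap and jackknife procedures. As a rough illustration only, the Python sketch below assumes the classical blocks estimator of the extremal index, theta_hat = (number of blocks containing at least one exceedance of u) / (total number of exceedances), and applies a delete-one-block jackknife, which removes whole blocks so that the dependence within each block is preserved. The threshold u and the block size are user choices, and the function names are hypothetical.

```python
import math


def split_blocks(x, block_size):
    """Consecutive non-overlapping blocks; an incomplete trailing block is dropped."""
    return [x[i:i + block_size] for i in range(0, len(x) - block_size + 1, block_size)]


def blocks_theta(blocks, u):
    """Blocks estimator: blocks with an exceedance of u divided by total exceedances."""
    exceedances = sum(v > u for b in blocks for v in b)
    if exceedances == 0:
        raise ValueError("no observations exceed the threshold u")
    clusters = sum(any(v > u for v in b) for b in blocks)
    return clusters / exceedances


def block_jackknife_theta(x, u, block_size):
    """Delete-one-block jackknife for the blocks estimator.
    Returns (theta_hat, bias-corrected estimate, jackknife standard error)."""
    blocks = split_blocks(x, block_size)
    m = len(blocks)
    theta = blocks_theta(blocks, u)
    # Leave-one-block-out estimates keep every remaining block intact.
    loo = [blocks_theta(blocks[:j] + blocks[j + 1:], u) for j in range(m)]
    mean_loo = sum(loo) / m
    corrected = m * theta - (m - 1) * mean_loo
    se = math.sqrt((m - 1) / m * sum((t - mean_loo) ** 2 for t in loo))
    return theta, corrected, se
```

For the series 0, 5, 5, 0, 0, 0, 5, 0, 0, 0, 0, 0 with u = 1 and blocks of 3, two of the four blocks contain exceedances and there are three exceedances in total, so theta_hat = 2/3; the jackknife bias-corrected value is 13/24.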
5.
Recently, a least absolute deviations (LAD) estimator for median regression models with doubly censored data was proposed, and its asymptotic normality was established. However, this result is hard to use for inference on the regression parameter vector, because the asymptotic covariance matrices involve the conditional densities of the error terms and are therefore difficult to estimate reliably. In this article, three methods, based respectively on the bootstrap, random weighting, and empirical likelihood, none of which requires density estimation, are proposed for making inference in doubly censored median regression models. Simulations are also carried out to assess the performance of the proposed methods.
6.
Timothy J. Tyrrell. Journal of Business & Economic Statistics, 2013, 31(3): 249-252
Polynomials are commonly used in linear regression models to capture nonlinearities in explanatory variables. It is less common, however, that polynomials are used to shift the regression coefficients, an exception being the use of polynomially distributed lag coefficients. This note recommends the technique for a wider range of applications and suggests the Lagrangian interpolation representation as the most convenient for practitioners.
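To make the Lagrangian representation concrete, here is a minimal Python sketch under stated assumptions: the coefficient on x is assumed to vary with a shifter variable z as a polynomial beta(z) of degree d, written through its values at d + 1 distinct nodes z_0, ..., z_d as beta(z) = sum_j beta(z_j) L_j(z), where L_j are the Lagrange basis polynomials. Then beta(z) x = sum_j beta(z_j) [L_j(z) x], so regressing y on the constructed variables L_j(z) x by ordinary least squares yields the node values beta(z_j) directly as coefficients. The node choice and function names are illustrative, not taken from the note.

```python
def lagrange_basis(nodes, z):
    """Values L_j(z) of the Lagrange basis polynomials for distinct nodes."""
    basis = []
    for j, zj in enumerate(nodes):
        value = 1.0
        for m, zm in enumerate(nodes):
            if m != j:
                value *= (z - zm) / (zj - zm)
        basis.append(value)
    return basis


def shifted_regressors(x, z, nodes):
    """Constructed regressors w_j = L_j(z) * x, one row per observation (x, z).
    OLS of y on these rows estimates beta(z_0), ..., beta(z_d) directly."""
    return [[lj * xi for lj in lagrange_basis(nodes, zi)] for xi, zi in zip(x, z)]


def coefficient_at(z, nodes, node_values):
    """Recover beta(z) anywhere from the (estimated) node values beta(z_j)."""
    return sum(b * l for b, l in zip(node_values, lagrange_basis(nodes, z)))
```

For example, if beta(z) = 1 + 2z + z^2 and the nodes are 0, 1, 2, the node values are 1, 4, 9, and `coefficient_at(0.5, [0, 1, 2], [1, 4, 9])` returns 2.25. The attraction of this parameterization is that each estimated coefficient is itself a value of the shifted coefficient at a known point, rather than a polynomial term with no direct interpretation.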
7.
Currently there is much interest in using microarray gene-expression data to form prediction rules for the diagnosis of patient outcomes. A process of gene selection is usually carried out first to find the genes that are most useful, according to some criterion, for distinguishing between the given classes of tissue samples. However, a bias (selection bias) is introduced into the estimate of the final version of a prediction rule that has been formed from a smaller subset of genes selected according to some optimality criterion. In this paper, we focus on the bias that arises when a full data set is not available in the first instance and the prediction rule is subsequently formed by working with the top-ranked genes from the full set. We demonstrate how large the subset of top genes must be before this selection bias is no longer of practical consequence.
8.
A Study of the Validity of the Bootstrap LM-Lag Test for Linear Regression Models
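Based on OLS residuals, the bootstrap method is applied to the LM-Lag test for spatial lag dependence. Under different error structures and spatial weight matrices, the size distortion and power of the bootstrap LM-Lag test are compared with those of the asymptotic test. Monte Carlo experiments show that when the error term does not satisfy the classical normality assumption, the asymptotic LM-Lag test suffers from severe size distortion, whereas the bootstrap test corrects this distortion effectively, and the power of the bootstrap LM-Lag test is close to that of the asymptotic test. Whether or not the errors are normally distributed, the bootstrap LM-Lag test for linear regression models is valid in terms of both size distortion and power.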
9.
We study the finite-sample properties of White's test for heteroskedasticity in fixed and stochastic regression models. By simulation, we compare White's test with bootstrap methods when the underlying distribution is symmetric and when it is asymmetric. The superior small-sample performance of the bootstrap method does not hold when the underlying distribution is asymmetric.
10.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 3020-3029
The standard asymptotic chi-square distribution of the likelihood ratio and score statistics under the null hypothesis does not hold when the parameter value lies on the boundary of the parameter space. In mixed models it is of interest to test for a zero random-effect variance component. Some available tests for the variance component are reviewed, and a new test within the permutation framework is presented. The power and significance level of the different tests are investigated by means of a Monte Carlo simulation study. The proposed test has a significance level closer to the nominal one and is more powerful.
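The abstract does not describe the proposed test, so the Python sketch below is a generic permutation test and not the article's method. It assumes the simplest mixed model, the one-way random-effects model y_ij = mu + u_i + e_ij with iid errors. Under H0: sigma_u^2 = 0 there is no group effect, so observations are exchangeable across groups, and reshuffling them gives a reference distribution for the usual one-way F statistic that does not rely on chi-square asymptotics at the boundary. The function names and the number of permutations are illustrative choices.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F = between-group mean square / within-group mean square.
    Assumes at least one group has non-constant values (within SS > 0)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_pvalue(groups, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for H0: sigma_u^2 = 0.
    Group sizes are kept fixed; only the assignment of values to groups is shuffled."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    observed = f_statistic(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Counting the observed arrangement keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

With three clearly separated groups such as [0.0, 0.1, -0.1], [10.0, 10.1, 9.9], [20.0, 20.1, 19.9], only the original partition attains the largest F, so the p-value is small; with groups that share a common mean, it tends to be large.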
11.
Calvin F. Schmid. The American Statistician, 2013, 67(3): 238-244
Although the semilogarithmic chart is one of the most effective, adaptable, and dependable graphic forms, it has met with neglect and indifference from contemporary chart makers. At present, a large proportion of chart makers are simply unaware of the distinctive qualities and unusual potential of the semilogarithmic chart. In addition to discussing the characteristics, applications, interpretation, advantages, and disadvantages of the semilogarithmic chart, the article analyzes and compares actual cases of ill-chosen graphic forms and substandard charts, clearly demonstrating the superiority of the semilogarithmic chart for many purposes.
12.
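This paper discusses quantile regression modeling of additive models for panel data within a Bayesian framework. First, the nonparametric model is converted into a parametric one through a low-rank thin-plate penalized spline expansion and the introduction of dummy variables for the individual effects; then, assuming that the random error term follows an asymmetric Laplace distribution, a Bayesian hierarchical quantile regression model is constructed. By decomposing the asymmetric Laplace distribution, the paper derives the conditional posterior distributions of all the parameters to be estimated and constructs a Gibbs sampling algorithm for estimating them. Simulation results show that the proposed method is clearly superior to the traditional additive-model mean regression method in terms of robustness of estimation. Finally, using panel data on consumption expenditure, the paper studies how the income structure of China's rural residents affects their consumption expenditure. It finds that for rural residents in the high-, middle-, and low-consumption groups alike, increases in wage income and net operating income have a more pronounced positive effect on consumption expenditure. Furthermore, compared with high-consumption rural residents, the consumption expenditure of low-consumption rural residents rises more slowly as income increases.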
13.
Communications in Statistics - Theory and Methods, 2012, 41(13-14): 2465-2489
The Akaike information criterion (AIC) and Mallows' C_p statistic have been proposed for selecting a smaller number of regressors in multivariate regression models with a fully unknown covariance matrix. Both criteria, however, rest on the implicit assumption that the sample size is substantially larger than the dimension of the covariance matrix. A stable estimator of the covariance matrix requires that its dimension be much smaller than the sample size; when the dimension is close to the sample size, ridge-type estimators of the covariance matrix are needed. In this article, we use a ridge-type estimator for the covariance matrix and obtain a modified AIC and a modified C_p statistic under an asymptotic framework in which both the sample size and the dimension go to infinity. Numerical results show that these modified procedures perform very well at selecting the true model in high-dimensional cases.
14.
We introduce the notion of weakly approaching and conditionally weakly approaching sequences of random processes. This notion generalizes conventional weak convergence and was proposed for real-valued random variables in Belyaev (1995). Some standard tools for investigating the behaviour of weakly approaching sequences of random elements in metric spaces are developed. The spaces of smoothed and right-continuous functions with left-hand limits are considered. This technique allows the resampling approach to be used to evaluate the distributions of continuous functionals of realizations of sums of an increasing number of independent random processes. Two numerical examples are presented for functionals such as the supremum and the number of level crossings.
15.
Stuart R. Lipsitz, Garrett M. Fitzmaurice, Geert Molenberghs & Lue Ping Zhao. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1997, 46(4): 463-476
Patients infected with the human immunodeficiency virus (HIV) generally experience a decline in their CD4 cell count (a count of certain white blood cells). We describe the use of quantile regression methods to analyse longitudinal data on CD4 cell counts from 1300 patients who participated in clinical trials that compared two therapeutic treatments: zidovudine and didanosine. It is of scientific interest to determine any treatment differences in the CD4 cell counts over a short treatment period. However, the analysis of the CD4 data is complicated by drop-outs: patients with lower CD4 cell counts at the base-line appear more likely to drop out at later measurement occasions. Motivated by this example, we describe the use of "weighted" estimating equations in quantile regression models for longitudinal data with drop-outs. In particular, the conventional estimating equations for the quantile regression parameters are weighted inversely proportionally to the probability of drop-out. This approach requires the process generating the missing data to be estimable but makes no assumptions about the distribution of the responses other than those imposed by the quantile regression model. This method yields consistent estimates of the quantile regression parameters provided that the model for drop-out has been correctly specified. The methodology proposed is applied to the CD4 cell count data and the results are compared with those obtained from an "unweighted" analysis. These results demonstrate how an analysis that fails to account for drop-outs can mislead.
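The weighting idea can be seen in its simplest special case, an intercept-only quantile model, where the weighted estimating equation sum_i w_i (tau - 1{y_i <= q}) = 0 is solved, up to the usual step-function ambiguity, by a weighted tau-quantile. The Python sketch below assumes the common inverse-probability-weighting convention in which each observed response receives weight 1/pi_i, with pi_i its probability of still being observed, and it takes the pi_i as already estimated from a correctly specified drop-out model. A full quantile regression with covariates would need a weighted linear-programming fit, which is not shown; the function names are hypothetical.

```python
def weighted_quantile(values, weights, tau):
    """Smallest q at which the weighted share of responses <= q reaches tau,
    i.e. a root of the intercept-only weighted quantile estimating equation."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        if cumulative >= tau * total:
            return y
    return pairs[-1][0]                    # guard against floating-point shortfall


def ipw_quantile(observed_values, prob_observed, tau):
    """Inverse-probability-weighted quantile: each observed response gets weight
    1 / P(still observed), so responses resembling drop-outs count for more."""
    weights = [1.0 / p for p in prob_observed]
    return weighted_quantile(observed_values, weights, tau)
```

With observed responses 1, 2, 3, where the lowest response had only probability 0.5 of being observed, the unweighted median is 2 but the weighted median is 1: the unweighted analysis under-represents low values that are prone to drop out, which mirrors the situation described above.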
16.
When making patient-specific predictions, it is important to compare prediction models to evaluate the gain in prediction accuracy from including additional covariates. We propose two statistical testing methods for comparing prediction models: complete data permutation (CDP) and permutation cross-validation (PCV). Extensive simulations of clinical trial settings show that both methods are robust and achieve nearly correct test sizes; the methods have comparable power in moderate to large samples, while CDP is computationally more efficient. The methods are also applied to ovarian cancer clinical trial data.
17.
18.
Martin A. Tanner. The American Statistician, 2013, 67(4): 306-310
Reasons for including investigations in a first course in statistics are presented. Investigations create an environment of participation and give the student the opportunity to experience statistics in action. This participation highlights the interaction between science and statistics. Suggestions are made regarding the integration of investigations into a formal course environment. One investigation is presented in detail. Thirteen other investigations are outlined. Emphasis is placed on experiments that require minimal set-up time yet illustrate important statistical concepts.
19.
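Because mortality improvement differs from country to country, the mortality models used around the world are not all the same, and different models are also used for different age ranges. In practice, the Gompertz, Makeham, and Weibull models are often used to fit mortality at advanced ages, but because mortality data for the oldest ages are insufficient, few studies have provided, from a statistical point of view, a procedure for testing how well these models fit. This study therefore proposes a bootstrap-based method for testing mortality model assumptions, covering goodness-of-fit testing, parameter estimation, and hypothesis testing of parameters. Finally, using crude death rates for the Chinese population aged 65-89 in 1997-2007, the paper proposes a suitable mortality model and then presents the full procedure for testing it with the bootstrap.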
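The article's data and exact procedure are not reproduced here. As a minimal sketch of one ingredient, bootstrap inference on a Gompertz parameter, the Python code below assumes the Gompertz hazard mu_x = B c^x, fits it by least squares on the log scale (log mu_x = log B + x log c), and builds a percentile interval for log c with a residual bootstrap. The rates in the usage note are synthetic, and the function names, seed, and number of replicates are illustrative assumptions. Makeham and Weibull fits and formal goodness-of-fit tests are not covered.

```python
import math
import random


def ols_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def gompertz_bootstrap(ages, rates, n_boot=1000, level=0.95, seed=1):
    """Fit log(mu_x) = log(B) + x * log(c) by least squares, then build a percentile
    interval for log(c) with a residual bootstrap.
    Returns (log_B, log_c, (lower, upper))."""
    rng = random.Random(seed)
    log_mu = [math.log(r) for r in rates]
    log_b, log_c = ols_line(ages, log_mu)
    fitted = [log_b + log_c * a for a in ages]
    resid = [y - f for y, f in zip(log_mu, fitted)]
    boot = []
    for _ in range(n_boot):
        # Resample residuals with replacement and refit on the pseudo-data.
        y_star = [f + rng.choice(resid) for f in fitted]
        boot.append(ols_line(ages, y_star)[1])
    boot.sort()
    k = int(round((1 - level) / 2 * n_boot))
    return log_b, log_c, (boot[k], boot[n_boot - 1 - k])
```

For synthetic rates such as `[5e-5 * math.exp(0.1 * a + 0.05 * (-1) ** a) for a in range(65, 90)]`, the estimated log c is 0.1 and the interval is narrow around it. A hypothesized value of log c lying outside the interval would be rejected at roughly the 5% level, which is one simple way to turn the interval into a parameter test.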
20.
This article proposes a group bridge estimator to select the correct number of factors in approximate factor models. It contributes to the literature on shrinkage estimation and factor models by extending the conventional bridge estimator from a single equation to a large panel context. The proposed estimator can consistently estimate the factor loadings of relevant factors and shrink the loadings of irrelevant factors to zero with a probability approaching one. Hence, it provides a consistent estimate for the number of factors. We also propose an algorithm for the new estimator; Monte Carlo experiments show that our algorithm converges reasonably fast and that our estimator has very good performance in small samples. An empirical example is also presented based on a commonly used U.S. macroeconomic dataset.