Similar literature
 20 similar articles found
1.
We suggest a method for constructing a multidimensional distribution of correlated categorical data with fixed marginal distributions and specified degrees of association based on log-linear models. A convex combination approach by Lee (1997) is applied to get a joint distribution with a fixed Pearson chi-square coefficient. By using the suggested method, we can generate three-dimensional distributions which have a fixed association among three variables. Therefore, the suggested method could be extended to higher dimensions.

2.
Lasso has been widely used for variable selection because of its sparsity, and a number of its extensions have been developed. In this article, we propose a robust variant of Lasso for the time-course multivariate response, and develop an algorithm which transforms the optimization into a sequence of ridge regressions. The proposed method enables us to effectively handle multivariate responses and employs a basis representation of the regression parameters to reduce the dimensionality. We assess the proposed method through simulation and apply it to the microarray data.

3.
Supersaturated designs are a large class of factorial designs which can be used for screening out the important factors from a large set of potentially active variables. The main advantage of these designs is that they reduce the experimental cost drastically, but their critical disadvantage is the confounding involved in the statistical analysis. In this article, we propose a method for analyzing data using several types of supersaturated designs. Modifications of widely used information criteria are given and applied to the variable selection procedure for the identification of the active factors. The effectiveness of the proposed method is demonstrated through simulated experiments and comparisons.

4.
A new method is proposed for measuring the distance between a training data set and a single, new observation. The novel distance measure reflects the expected squared prediction error when a quantitative response variable is predicted on the basis of the training data set using the distance weighted k-nearest-neighbor method. The simulation presented here shows that the distance measure correlates well with the true expected squared prediction error in practice. The distance measure can be applied, for example, in assessing the uncertainty of prediction.
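The predictor that this distance measure is built around can be sketched as follows. This is only a minimal illustration of distance-weighted k-nearest-neighbor prediction, assuming Euclidean distance and inverse-distance weights; the abstract does not specify either choice, the function name `weighted_knn_predict` is hypothetical, and the proposed distance measure itself is not reproduced here.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_new, k=3, eps=1e-12):
    """Predict a quantitative response with inverse-distance-weighted k-NN.

    Assumptions (not from the abstract): Euclidean distance and weights
    proportional to 1 / distance; eps guards against division by zero.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Distance from the new observation to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)

    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]

    # Closer neighbors receive larger weights.
    weights = 1.0 / (dists[nearest] + eps)
    return float(np.sum(weights * y_train[nearest]) / np.sum(weights))

if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0]]
    y = [0.0, 1.0, 2.0, 10.0]
    print(weighted_knn_predict(X, y, [1.5], k=2))
```

A new observation that lies far from all training points gets neighbors with uniformly large distances, which is the situation in which a measure of prediction uncertainty is most useful.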

5.
6.
In this article, we study the stepwise AIC method for variable selection and compare it with other stepwise methods, such as Partial F, Partial Correlation, and Semi-Partial Correlation, in linear regression modeling. We then show mathematically that the stepwise AIC method and the other stepwise methods lead to the same method as Partial F. Hence, there are more reasons to use the stepwise AIC method than the other stepwise methods for variable selection, since the stepwise AIC method is a model selection method that is easy to manage, extends readily to more general models, and can be applied to non-normally distributed data. We also treat problems that always arise in applications, namely validation of the selected variables and collinearity.

7.
In many medical studies, event times are recorded in an interval-censored (IC) format. For example, in numerous cancer trials, time to disease relapse is only known to have occurred between two consecutive clinic visits. Many existing modeling methods in the IC context are computationally intensive and usually require numerous assumptions that could be unrealistic or difficult to verify in practice. We propose a flexible and computationally efficient modeling strategy based on jackknife pseudo-observations (POs). The POs obtained based on nonparametric estimators of the survival function are employed as outcomes in an equivalent, yet simpler regression model that produces consistent covariate effect estimates. Hence, instead of operating in the IC context, the problem is translated into the realm of generalized linear models, where numerous options are available. Outcome transformations via appropriate link functions lead to familiar modeling contexts such as the proportional hazards and proportional odds. Moreover, the methods developed are not limited to these settings and have broader applicability. Simulation studies show that the proposed methods produce virtually unbiased covariate effect estimates, even for moderate sample sizes. An example from the International Breast Cancer Study Group (IBCSG) Trial VI further illustrates the practical advantages of this new approach.
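For readers unfamiliar with pseudo-observations, the usual jackknife construction can be written as below. This is the standard leave-one-out form, given here as background on the general technique and not as the authors' exact estimator. The notation is an assumption introduced for illustration: $\hat{S}$ is whichever nonparametric survival estimator is used, $\hat{S}^{(-i)}$ is the same estimator computed without subject $i$, $Z_i$ is the covariate vector, and $g$ is the link function.

```latex
\hat{\theta}_i(t) = n\,\hat{S}(t) - (n-1)\,\hat{S}^{(-i)}(t), \qquad i = 1, \dots, n,
\qquad
g\!\left( E\!\left[ \hat{\theta}_i(t) \mid Z_i \right] \right) = \alpha(t) + \beta^{\top} Z_i .
```

Under this formulation, a complementary log-log link corresponds to a proportional hazards structure and a logit link to a proportional odds structure, which is how the choice of link function maps onto the familiar models named in the abstract.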

8.
Often, categorical ordinal data are clustered using a well-defined similarity measure for this kind of data and then using a clustering algorithm not specifically developed for them. The aim of this article is to introduce a new clustering method designed for ordinal data. Objects are grouped using a multinomial model, a cluster tree and a pruning strategy. Two types of pruning are analyzed through simulations. The proposed method makes it possible to overcome two typical problems of cluster analysis: the choice of the number of groups and the scale invariance.

9.
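A consumer clustering method for incompletely observed data   Total citations: 2, self-citations: 2, citations by others: 0
Existing clustering methods all rely on complete behavioral information about consumers. For incompletely observed information, a three-stage clustering method is proposed. First, consumers are clustered using all the information in the sample data; next, a classification model is built using only demographic variables; finally, the above results are corrected. The main advantage of the three-stage clustering method is that individuals not included in the sample can be assigned to the behavioral clusters obtained from the sampled individuals. Applying the method to the television industry yields results of considerable practical value.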

10.
The effect of a single variable data point, x, on the usual test statistics for traditional hypothesis tests for means is analyzed. It is shown that an outlier may have a profound and unexpected effect on the test statistic. Although it might appear that an outlier would tend to lend support to the alternative hypothesis, it may in fact detract from the significance of the test. In one-population tests and analysis of variance (ANOVA), the value of x that maximizes the significance of the test statistic is given. This value does not have to be unusually large or small. In fact, it often falls within the range of the other sample points. In the general one-population case, the limiting value for the test statistic is shown to be +1. In the case involving more than one population, it is shown that the limiting value of the test statistic is a function only of the number of members in the samples and not their relative values. Special cases are identified in which the test statistic is shown to have unique characteristics depending on the characteristics of the data.
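The one-population limit mentioned above can be checked numerically. The sketch below assumes the test statistic is the ordinary one-sample t statistic with null mean `mu0`; the sample values in `base` are made up for illustration and are not taken from the article.

```python
import numpy as np

def one_sample_t(sample, mu0=0.0):
    """Ordinary one-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    return float((sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n)))

# Hypothetical sample of five tightly grouped points.
base = [0.8, 1.1, 0.9, 1.2, 1.0]

if __name__ == "__main__":
    # Add one extra point x and let it grow: the statistic shrinks toward 1
    # instead of growing, because x inflates the standard deviation as fast
    # as it moves the mean.
    for x in (1.0, 2.0, 10.0, 1e3, 1e6):
        print(x, one_sample_t(base + [x]))
```

The reason is visible in the algebra: as x grows, the sample mean behaves like x/n and the standard error behaves like x/n as well, so their ratio tends to 1 regardless of the other points.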

11.
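Research on dimension reduction and variable selection methods for high-dimensional panel data   Total citations: 2, self-citations: 1, citations by others: 2
Starting from the general characteristics of high-dimensional panel data, this article summarizes the different types that such data take in practical applications together with the associated theory and methods, and focuses on factor models and mixed-effects models for high-dimensional panel data. It reviews research progress on high-dimensional covariance matrices in the random effects and marginal effects of mixed-effects models, and on the multi-indicator, large-dimensional data that arise in economics. It then analyzes future directions for high-dimensional panel data and key open problems in theory and application.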

12.
We propose a penalized quantile regression for the partially linear varying coefficient (VC) model with longitudinal data to select relevant nonparametric and parametric components simultaneously. Selection consistency and the oracle property are established. Furthermore, if the linear part and the VC part are unknown, we propose a new unified method, which can do three types of selections, including separation of varying and constant effects and selection of relevant variables, and which can be carried out conveniently in one step. Consistency in the three types of selections and the oracle property in estimation are established as well. Simulation studies and real data analysis also confirm our method.

13.
This article investigates power and size of some tests for exogeneity of a binary explanatory variable in count models by conducting extensive Monte Carlo simulations. The tests under consideration are Hausman contrast tests as well as univariate Wald tests, including a new test of notably easy implementation. Performance of the tests is explored under misspecification of the underlying model and under different conditions regarding the instruments. The results indicate that often the tests that are simpler to estimate outperform tests that are more demanding. This is especially the case for the new test.

14.
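This article studies estimation for the semiparametric logistic regression model with longitudinal data. It gives estimation methods for the unknown parameters and unknown functions in the model, discusses variable selection for the parametric part, and compares different variable selection methods. The simulation results show that the proposed methods estimate well.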

15.
This article proposes a variable selection procedure for partially linear models with right-censored data via penalized least squares. We apply the SCAD penalty to select significant variables and estimate unknown parameters simultaneously. The sampling properties for the proposed procedure are investigated. The rate of convergence and the asymptotic normality of the proposed estimators are established. Furthermore, the SCAD-penalized estimators of the nonzero coefficients are shown to have the asymptotic oracle property. In addition, an iterative algorithm is proposed to find the solution of the penalized least squares. Simulation studies are conducted to examine the finite sample performance of the proposed method.
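The SCAD penalty named above has a simple closed form, sketched here for reference. The code evaluates the standard three-piece SCAD function only; it does not implement the article's estimation procedure for right-censored data. The function name `scad_penalty` is hypothetical, and the default `a=3.7` is the value commonly suggested in the SCAD literature, assumed here because the abstract does not state one.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """Evaluate the SCAD penalty elementwise.

    Piecewise form (for t = |theta|):
      t <= lam          : lam * t                      (lasso-like near zero)
      lam < t <= a*lam  : (2*a*lam*t - t^2 - lam^2) / (2*(a - 1))
      t > a*lam         : lam^2 * (a + 1) / 2          (constant, no extra shrinkage)
    """
    t = np.abs(np.atleast_1d(np.asarray(theta, dtype=float)))
    small = t <= lam
    middle = (t > lam) & (t <= a * lam)

    # Start from the constant value used for large coefficients.
    out = np.full_like(t, lam ** 2 * (a + 1) / 2.0)
    out[small] = lam * t[small]
    out[middle] = (2 * a * lam * t[middle] - t[middle] ** 2 - lam ** 2) / (2 * (a - 1))
    return out

if __name__ == "__main__":
    print(scad_penalty([0.0, 0.5, 2.0, 10.0], lam=1.0))
```

Because the penalty is flat beyond `a * lam`, large coefficients are not shrunk further, which is the property behind the oracle behavior mentioned in the abstract.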

16.
k-POD: A Method for k-Means Clustering of Missing Data   Total citations: 1, self-citations: 0, citations by others: 1
The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.

[Received November 2014. Revised August 2015.]
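One way an extension of k-means to missing data could look is an impute-and-recluster loop: fill each missing entry with the matching coordinate of its current cluster center, rerun k-means, and repeat. The sketch below is an illustration under that assumption and is not taken from the abstract, which does not describe the algorithm's steps. The helper names `kmeans` and `cluster_with_missing`, the column-mean initialization, and the fixed iteration counts are all hypothetical choices.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on a complete data matrix."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters unchanged
                centers[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centers

def cluster_with_missing(X, k, n_outer=20, seed=0):
    """Alternate between k-means and centroid-based filling of NaN entries."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumed starting point: fill missing entries with column means.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_outer):
        labels, centers = kmeans(filled, k, seed=seed)
        # Observed entries are never altered; only missing ones are refilled.
        filled = np.where(missing, centers[labels], X)
    return labels, centers, filled

if __name__ == "__main__":
    data = [[0.0, 0.1, np.nan], [0.2, np.nan, 0.0], [np.nan, 0.0, 0.1],
            [9.9, 10.0, np.nan], [10.1, np.nan, 10.0], [np.nan, 9.8, 10.2]]
    labels, centers, filled = cluster_with_missing(data, k=2)
    print(labels)
    print(filled)
```

Unlike deletion or one-shot imputation, a loop of this kind never discards rows and revises the filled values as the clusters change, which matches the motivation stated in the abstract.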

17.
Abstract. Variable selection is an important issue in all regression analyses, and in this paper we discuss this in the context of regression analysis of panel count data. Panel count data often occur in long-term studies that concern occurrence rate of a recurrent event, and their analysis has recently attracted a great deal of attention. However, there does not seem to exist any established approach for variable selection with respect to panel count data. For the problem, we adopt the idea behind the non-concave penalized likelihood approach and develop a non-concave penalized estimating function approach. The proposed methodology selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for this process. We show that the proposed procedure performs as well as the oracle procedure in that it yields the estimates as if the correct submodel were known. Simulation studies are conducted for assessing the performance of the proposed approach and suggest that it works well for practical situations. An illustrative example from a cancer study is provided.

18.
In this article, we consider the problem of variable selection and estimation with strongly correlated, multi-collinear data by using grouping variable selection techniques. A new grouping variable selection method, called weight-fused elastic net (WFEN), is proposed to deal with high-dimensional collinear data. The proposed model, which combines two different grouping effect mechanisms induced by the elastic net and weight-fused LASSO, respectively, can be easily unified in the framework of LASSO and computed efficiently. Results on simulated and real data sets show that our method is competitive with other related methods, especially when the data present high multi-collinearity.

19.
In the analysis of time-to-event data, restricted mean survival time has been well investigated in the literature and is provided by many commercial software packages, while calculating mean survival time remains a challenge due to censoring or insufficient follow-up time. Several researchers have proposed a hybrid estimator of mean survival based on the Kaplan–Meier curve with an extrapolated tail. However, this approach often leads to biased estimates due to poor estimation of the parameters in the extrapolated “tail” and the large variability associated with the tail of the Kaplan–Meier curve due to the small set of patients at risk. Two key challenges in this approach are (1) where the extrapolation should start and (2) how to estimate the parameters for the extrapolated tail. The authors propose a novel approach to calculate mean survival time to address these two challenges. In the proposed approach, an algorithm is used to search for time points where the hazard rates change significantly. The survival function is estimated by the Kaplan–Meier method prior to the last change point and approximated by an exponential function beyond the last change point. The parameter in the exponential function is estimated locally. Mean survival time is derived based on this survival function. The simulation and case studies demonstrated the superiority of the proposed approach.
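The hybrid survival function described above leads to a closed-form mean, which can be written out as a short derivation. The notation is an assumption introduced for illustration: $\tau$ is the last detected change point, $\hat{\lambda}$ is the locally estimated exponential rate, and the exponential tail is assumed to be anchored at the Kaplan–Meier value at $\tau$ so that the curve is continuous there. The abstract does not state these details.

```latex
\hat{S}(t) =
\begin{cases}
\hat{S}_{\mathrm{KM}}(t), & t \le \tau,\\[4pt]
\hat{S}_{\mathrm{KM}}(\tau)\, e^{-\hat{\lambda}\,(t-\tau)}, & t > \tau,
\end{cases}
\qquad
\hat{\mu} = \int_0^{\infty} \hat{S}(t)\,dt
          = \int_0^{\tau} \hat{S}_{\mathrm{KM}}(t)\,dt
          + \frac{\hat{S}_{\mathrm{KM}}(\tau)}{\hat{\lambda}} .
```

The first term is the restricted mean survival time up to $\tau$, and the second is the area under the exponential tail, so the estimate is sensitive to $\hat{\lambda}$ mainly when a large share of patients is still event-free at $\tau$.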

20.
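Su Weihua et al. 《统计研究》 (Statistical Research), 2015, 32(7): 100-105
Considering the doubly dynamic situation in which both time and the evaluating subjects change, a dynamic group evaluation method is proposed. First, the horizontal and vertical conflicts among evaluation opinions are measured separately to improve the reliability of the evaluation conclusions. Second, for subjects that change, the “time effect” is selected according to how consistent their opinions are with those of the unchanged subjects, so that evaluation opinions from different time dimensions become comparable. Third, the subjects' opinions are decomposed into a time effect, an individual effect, a base effect, and a conflict effect for dynamic aggregation and evaluation. Finally, an application example is given, and the results show that the method is feasible.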
