Similar Documents
20 similar documents retrieved.
1.
Mediation analysis is a popular statistical method for assessing whether the effect of an independent variable on a dependent variable is transmitted through a mediator. Three traditional tests are used to assess indirect effects: the Baron and Kenny test (BK), the Sobel test (ST) and the bootstrap method (BT). Previous studies have shown that the BT is more powerful and more conceptually appropriate. However, no study has systematically compared these tests with regard to the type I error rate. A Monte Carlo simulation was carried out with 19 scenarios varying the paths (but with no indirect effect), 9 scenarios varying the direct effect, and 6 sample sizes (1056 different scenarios). Results show that the BT performed well overall, even for small sample sizes and regardless of effect size. The ST and the BK test were conservative, especially with small sample sizes and low effect sizes. In conclusion, the ST and the BK test should be avoided, and the BT is recommended.
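A minimal sketch of the percentile-bootstrap test of the indirect effect a*b (not the authors' simulation code; the function name and defaults are illustrative):

```python
import numpy as np

def bootstrap_indirect_effect(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap test of the indirect effect a*b.

    a: slope of M on X;  b: slope of Y on M, controlling for X.
    The indirect effect is declared significant if the CI excludes zero.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)               # resample cases with replacement
        xs, ms, ys = x[idx], m[idx], y[idx]
        a = np.polyfit(xs, ms, 1)[0]              # a-path: slope of M on X
        X = np.column_stack([np.ones(n), ms, xs])
        b = np.linalg.lstsq(X, ys, rcond=None)[0][1]  # b-path: M coefficient
        est[i] = a * b
    lo, hi = np.quantile(est, [alpha / 2, 1 - alpha / 2])
    return lo, hi, not (lo <= 0.0 <= hi)
```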

2.
Based on the mutual information between the explanatory variables and the response variable, this paper proposes a new variable selection method, MI-SIS. The method can handle ultrahigh-dimensional problems in which the number of explanatory variables p far exceeds the sample size n, i.e. p = O(exp(n^ε)) with ε > 0. Moreover, it is a model-free variable selection method, requiring no assumption on the model form. Numerical simulations and an empirical study show that MI-SIS can effectively detect weak signals in small samples.
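A sketch of mutual-information screening in the spirit of MI-SIS, using scikit-learn's mutual_info_regression as the dependence estimator; the paper's exact MI estimator and screening cutoff may differ:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_sis(X, y, d=None):
    """Rank the p columns of X by estimated mutual information with y
    and keep the top d (default: n / log(n), a common SIS-type cutoff).
    Model-free: no regression form is assumed."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    mi = mutual_info_regression(X, y)      # one MI score per column
    keep = np.argsort(mi)[::-1][:d]        # indices of the d largest scores
    return keep, mi
```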

3.
In order to examine factors influencing the perception of ideal family size in Poland, "the present paper discusses path models, which explain a central reproductive behaviour category, i.e. the actual number of children.... [The author finds that] the image of the ideal family size directly influences the existing number of children. Path coefficients show that an increase of one standard deviation in the ideal number of children is associated with an increase in actual family size of approximately one third of a standard deviation." Other variables considered include parental influence, rural or urban residence, educational status, age at marriage, religion, and quality of life.

4.
Credit scoring techniques have been developed in recent years to reduce the risk that banks and financial institutions take in the loans they grant. Credit scoring is the problem of classifying individuals into one of two groups: defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new method of discrimination when the dependent variable is categorical and a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components that account both for the relationships between the explanatory categorical variables and for the canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application to credit scoring data are provided.

5.
We investigated CART performance with a unimodal response curve for one continuous response and four continuous explanatory variables, where two variables were important (i.e. directly related to the response) and the other two were not. We explored performance under three relationship strengths and two explanatory-variable conditions: equal importance, and one variable four times as important as the other. We compared CART variable-selection performance using three tree-selection rules ('minimum risk', 'minimum risk complexity', 'one standard error') to stepwise polynomial ordinary least squares (OLS) under four sample-size conditions. The one-standard-error and minimum-risk-complexity rules performed about as well as stepwise OLS with large sample sizes when the relationship was strong. With weaker relationships, equally important explanatory variables and larger sample sizes, the one-standard-error and minimum-risk-complexity rules performed better than stepwise OLS. With weaker relationships and explanatory variables of unequal importance, tree-structured methods did not perform as well as stepwise OLS. Comparing performance within the tree-structured methods: with a strong relationship and equally important explanatory variables, the one-standard-error rule was most likely to choose the correct model; the minimum-risk-complexity rule was most likely to choose the correct model (1) with weaker relationships and equally important explanatory variables, and (2) under all relationship strengths when explanatory variables were of unequal importance and sample sizes were lower.
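A sketch of the one-standard-error tree-selection rule, implemented here via scikit-learn's cost-complexity pruning path; this is one plausible reading of the rule, not necessarily the paper's CART implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

def one_se_tree(X, y, cv=10, seed=0):
    """Pick ccp_alpha by the one-standard-error rule: the largest alpha
    (simplest tree) whose CV error is within one SE of the minimum."""
    path = DecisionTreeRegressor(random_state=seed).cost_complexity_pruning_path(X, y)
    alphas = path.ccp_alphas[:-1]          # drop the alpha that prunes to the root
    cv_err, cv_se = [], []
    for a in alphas:
        scores = -cross_val_score(
            DecisionTreeRegressor(ccp_alpha=a, random_state=seed),
            X, y, cv=cv, scoring="neg_mean_squared_error")
        cv_err.append(scores.mean())
        cv_se.append(scores.std(ddof=1) / np.sqrt(cv))
    cv_err, cv_se = np.array(cv_err), np.array(cv_se)
    best = cv_err.argmin()
    ok = cv_err <= cv_err[best] + cv_se[best]
    alpha_1se = alphas[ok].max()           # simplest tree passing the 1-SE bar
    return DecisionTreeRegressor(ccp_alpha=alpha_1se, random_state=seed).fit(X, y)
```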

6.
This paper explores the effect of sample size, scale of parameters and size of the choice set on the maximum likelihood estimator of the parameters of the multinomial logit model. Data were generated by simulation under a three-way factorial experimental design for logit models containing three, four and five explanatory variables. The simulated data were analyzed by analysis of covariance, and a regression model of the performance measure, the log root mean-squared error (LRMSE), was fitted against the three factors and their interactions. Several important conclusions emerged. First, the LRMSE improves, but at a decreasing rate, with increases in the model's degrees of freedom. Second, the number of choice alternatives in the decision makers' choice sets has a significant impact on the LRMSE; however, heterogeneity in the choice sets across the sample has little or no impact. Finally, the scale of parameters and all of its two-way interactions with the other two factors significantly affect the LRMSE. Using the regression results, a family of iso-LRMSE curves is derived in the space of model degrees of freedom and scale of parameters. The implications for researchers choosing sample size and scale of parameters are discussed.

7.
Ridge regression addresses multicollinearity by introducing a biasing parameter, called the ridge parameter, that shrinks the estimates and their standard errors in order to reach acceptable results. The ridge parameter has traditionally been selected by a variety of subjective and objective techniques, each based on particular criteria. In this study, the selection of the ridge parameter draws on additional statistical measures in order to reach a better value. The proposed selection technique is based on a mathematical programming model, and the results are evaluated in a simulation study. The proposed method performs well when the error variance is greater than or equal to one, the sample consists of 20 observations, the model contains two explanatory variables, and the two explanatory variables are very strongly correlated.
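The paper's mathematical-programming selector is not reproduced here; as a point of reference, the following sketch selects the ridge parameter k on a grid by generalized cross-validation, one of the standard objective criteria it is compared against:

```python
import numpy as np

def ridge_gcv(X, y, ks):
    """Choose the ridge parameter k on a grid by generalized
    cross-validation.  Assumes X is centred/standardized and the
    intercept is handled separately."""
    n = len(y)
    best_k, best_gcv = None, np.inf
    for k in ks:
        # Hat matrix of the ridge fit: X (X'X + kI)^{-1} X'
        H = X @ np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T)
        resid = y - H @ y
        gcv = (resid @ resid / n) / (1 - np.trace(H) / n) ** 2
        if gcv < best_gcv:
            best_k, best_gcv = k, gcv
    return best_k
```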

8.
Monte Carlo simulation is used to examine the applicability of various HAC methods in spurious regressions between stationary processes. The study finds that prewhitened HAC estimators have a clear advantage over kernel-based HAC estimators. Further analysis shows that the persistence of the explanatory variable affects the HAC estimators more strongly than the persistence of the dependent variable. When the data-generating process is a higher-order autoregression and the sample size is not very large, the rejection rate of the prewhitening method increases with the autoregressive order; only with large samples and the BIC information criterion does the rejection rate of the prewhitened HAC approach the nominal test level.
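A minimal sketch of this kind of rejection-rate experiment, using statsmodels' kernel (Bartlett) HAC covariance; prewhitening, which the study favours, would additionally filter the series through a fitted AR model before applying the kernel:

```python
import numpy as np
import statsmodels.api as sm

def ar1(n, phi, rng):
    """Generate an AR(1) series x_t = phi * x_{t-1} + e_t."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def hac_rejection_rate(n=200, phi=0.9, n_sim=1000, maxlags=4, seed=0):
    """Regress one AR(1) series on an independent AR(1) series and
    count how often the HAC t-test rejects the (true) null of no
    relationship at the nominal 5% level."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        y, x = ar1(n, phi, rng), ar1(n, phi, rng)
        res = sm.OLS(y, sm.add_constant(x)).fit(
            cov_type="HAC", cov_kwds={"maxlags": maxlags})
        rejections += res.pvalues[1] < 0.05
    return rejections / n_sim
```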

9.
Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response. Selecting a small number of relevant variables for classification has therefore received a great deal of attention. When the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are generally called for, on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method for the multinomial probit regression model for multi-class classification. The performance of the proposed method is demonstrated on one simulated data set and three well-known gene expression profiling data sets: breast cancer, leukemia, and small round blue-cell tumors. The results show that, compared with other methods, our method is able to select the relevant variables and obtains competitive classification accuracy with a small subset of relevant genes.

10.
The so-called “fixed effects” approach to the estimation of panel data models suffers from the limitation that it is not possible to estimate the coefficients on explanatory variables that are time-invariant. This is in contrast to a “random effects” approach, which achieves this by making much stronger assumptions on the relationship between the explanatory variables and the individual-specific effect. In a linear model, it is possible to obtain the best of both worlds by making random effects-type assumptions on the time-invariant explanatory variables while maintaining the flexibility of a fixed effects approach when it comes to the time-varying covariates. This article attempts to do the same for some popular nonlinear models.

11.
Varying coefficient models are a useful statistical tool for exploring the dynamic patterns of a regression relationship, in which the variation of the regression coefficients is taken as the main evidence of a dynamic relationship between the response and the explanatory variables. In this study, we propose a SiZer approach as a visual diagnostic device to uncover statistically significant features of the coefficients. The method highlights the significant structures of the coefficients at different scales and can therefore extract relatively complete information from the data. Simulation studies and a real-world data analysis show that the SiZer approach performs satisfactorily in uncovering the significant features of the coefficients.

12.
Sample size and the population correlation coefficient are the most important factors influencing the statistical significance of the sample correlation coefficient r. When testing the hypothesis that the correlation differs from zero, Fisher's Z transformation can be inaccurate for small samples, especially when the population correlation coefficient ρ is large. In this study, a simulation program was developed to illustrate how the bias in the Fisher transformation of the correlation coefficient affects estimate precision when the sample size is small and ρ is large. From the simulation results, 90% and 95% confidence intervals for correlation coefficients were created and tabulated. It is therefore suggested that, especially when ρ is greater than 0.2 and the sample size is 18 or fewer, Tables 1 and 2 be used for significance tests of correlations.
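A short sketch of the standard Fisher-Z confidence interval whose small-sample bias the study documents:

```python
import numpy as np
from scipy import stats

def fisher_z_ci(r, n, conf=0.95):
    """Approximate CI for a correlation via Fisher's Z transformation.
    z = atanh(r) is roughly normal with SE 1/sqrt(n - 3); the interval
    is transformed back with tanh.  The approximation degrades for
    small n and large |rho|, which is the bias the study documents."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    zcrit = stats.norm.ppf(0.5 * (1 + conf))
    return np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)

# e.g. fisher_z_ci(0.5, 15) yields an interval that, per the abstract,
# should be read with caution at this sample size
```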

13.
Suppose that the conditional density of a response variable given a vector of explanatory variables is parametrically modelled, and that data are collected by a two-phase sampling design. First, a simple random sample is drawn from the population. The stratum membership in a finite number of strata of the response and explanatory variables is recorded for each unit. Second, a subsample is drawn from the phase-one sample such that the selection probability is determined by the stratum membership. The response and explanatory variables are fully measured at this phase. We synthesize existing results on nonparametric likelihood estimation and present a streamlined approach for the computation and the large sample theory of profile likelihood in four different situations. The amount of information in terms of data and assumptions varies depending on whether the phase-one data are retained, the selection probabilities are known, and/or the stratum probabilities are known. We establish and illustrate numerically the order of efficiency among the maximum likelihood estimators, according to the amount of information utilized, in the four situations.

14.
This paper considers several estimators of the restricted ridge parameter. A simulation study was conducted to compare their performance. Based on the simulation study, we found that increasing the correlation between the independent variables has a positive effect on the mean square error (MSE), whereas increasing the value of ρ has a negative effect on the MSE. When the sample size increases, the MSE decreases even when the correlation between the independent variables is large. Two real-life examples are considered to illustrate the performance of the estimators.

15.
Dropping the intercept and omitting explanatory variables are two common errors in estimating linear regression models. The intercept is often dropped because it is found to be insignificant during testing, which distorts both the parameter estimates and the hypothesis tests of the model. Explanatory variables are omitted because of the mistaken belief that a regression analysis is valid as long as the included variables are correlated with, and causally related to, the response, so that other important explanatory variables go unconsidered; a model built this way cannot be used for structural economic analysis or policy evaluation, and at best serves prediction purposes.
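A simulated illustration (not from the paper) of the first error: when the true intercept is nonzero, forcing the regression through the origin biases the slope estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 5, n)
y = 2.0 + 3.0 * x + rng.standard_normal(n)   # true model: intercept 2, slope 3

X1 = np.column_stack([np.ones(n), x])        # model with intercept
slope_with = np.linalg.lstsq(X1, y, rcond=None)[0][1]
slope_without = (x @ y) / (x @ x)            # regression through the origin

print(slope_with)     # close to 3
print(slope_without)  # biased upward: the omitted intercept loads onto x
```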

16.
The problem of component choice in regression-based prediction has a long history. The main cases where important choices must be made are functional data analysis, and problems in which the explanatory variables are relatively high dimensional vectors. Indeed, principal component analysis has become the basis for methods for functional linear regression. In this context the number of components can also be interpreted as a smoothing parameter, and so the viewpoint is a little different from that for standard linear regression. However, arguments for and against conventional component choice methods are relevant to both settings and have received significant recent attention. We give a theoretical argument, which is applicable in a wide variety of settings, justifying the conventional approach. Although our result is of minimax type, it is not asymptotic in nature; it holds for each sample size. Motivated by the insight that is gained from this analysis, we give theoretical and numerical justification for cross-validation choice of the number of components that is used for prediction. In particular we show that cross-validation leads to asymptotic minimization of mean summed squared error, in settings which include functional data analysis.
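A sketch of cross-validation choice of the number of components for principal component regression, the selection rule the paper justifies (the pipeline and scoring choices here are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def choose_n_components(X, y, max_comp=10, cv=5):
    """Pick the number of principal components for PC regression
    by minimizing cross-validated prediction error."""
    errors = []
    for m in range(1, max_comp + 1):
        model = make_pipeline(PCA(n_components=m), LinearRegression())
        mse = -cross_val_score(model, X, y, cv=cv,
                               scoring="neg_mean_squared_error").mean()
        errors.append(mse)
    return int(np.argmin(errors)) + 1
```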

17.
In regression analyses of spatially structured data, it is common practice to introduce spatially correlated random effects into the regression model to reduce or even avoid unobserved variable bias in the estimation of other covariate effects. If besides the response the covariates are also spatially correlated, the spatial effects may confound the effect of the covariates or vice versa. In this case, the model fails to identify the true covariate effect due to multicollinearity. For highly collinear continuous covariates, path analysis and structural equation modeling techniques prove to be helpful to disentangle direct covariate effects from indirect covariate effects arising from correlation with other variables. This work discusses the applicability of these techniques in regression setups, where spatial and covariate effects coincide at least partly and classical geoadditive models fail to separate these effects. Supplementary materials for this article are available online.

18.
Latent class analysis (LCA) has important applications in the social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting data. In this study, the non-response mainly comprised 'contingency questions' and real 'missing data'. The primary objective was to evaluate the effects of potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria in selecting the correct models. The results show that the main factors are the latent class proportions, the conditional probabilities, the sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with the class proportions, the sample size and the number of items are also significant. According to our simulation results, the impact of missing data and contingency questions can be mitigated by increasing the sample size or the number of items.

19.
In this study, our aim was to investigate the effects of different data structures and different sample sizes on structural equation modeling and on model fit measures. A simulation study evaluated the model fit measures of structural equation models constructed under different data structures and sample sizes. The simulation revealed optimization and negative variance estimation problems, depending on the sample size and the correlations; these problems disappeared when either the sample size or the correlations between the variables within a factor were increased. For future studies, the RMSEA and IFI model fit measures can be recommended for all sample sizes and correlation values, provided the data sets satisfy the multivariate normality assumption.

20.
The broken-stick (BS) rule is a popular stopping rule in ecology for determining the number of meaningful components in principal component analysis, but its properties have not been systematically investigated. The purpose of the current study is to evaluate its ability to detect the correct dimensionality of a data set and whether it tends to over- or underestimate it. A Monte Carlo protocol was carried out. Two main correlation matrices deemed usual in practice were used, with three levels of correlation (0, 0.10 and 0.30) between components (generating oblique structures) and with different sample sizes. Analyses of the population correlation matrices indicated that, even for extremely large sample sizes, the BS method could be correct for only one of the six simulated structures. It failed to identify the correct dimensionality half the time with orthogonal structures and did even worse with some oblique ones. Under harder conditions, the results show that the power of the BS rule decreases as sample size increases, weakening its usefulness in practice. Since the BS method seems unlikely to identify the underlying dimensionality of the data, and given that better stopping rules exist, it appears a poor choice when carrying out principal component analysis.
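For reference, the broken-stick rule itself is simple to compute: component k is retained while its share of the total variance exceeds the broken-stick expectation b_k = (1/p) * Σ_{i=k}^{p} 1/i. A minimal sketch:

```python
import numpy as np

def broken_stick(eigenvalues):
    """Broken-stick stopping rule for PCA: keep component k while its
    share of total variance exceeds b_k = (1/p) * sum_{i=k}^{p} 1/i."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    p = len(lam)
    bs = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                   for k in range(1, p + 1)])
    share = lam / lam.sum()
    keep = 0
    while keep < p and share[keep] > bs[keep]:
        keep += 1
    return keep
```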
