In this article, a generalized linear mixed model (GLMM) based on a frequentist approach is employed to examine the spatial trend of asthma data. However, the frequentist analysis of GLMMs is computationally difficult. On the other hand, the Bayesian analysis of GLMMs has become computationally convenient due to the advent of Markov chain Monte Carlo algorithms. The recently developed data cloning (DC) method, which yields the maximum likelihood estimate, provides a frequentist approach to complex mixed models that is equally convenient computationally. We use DC to conduct a frequentist analysis of spatial models. The advantages of the DC approach are that the answers are independent of the choice of the priors, non-estimable parameters are flagged automatically, and the possibility of improper posterior distributions is completely avoided. We illustrate this approach using a real dataset of asthma visits to hospital in the province of Manitoba, Canada, during 2000–2010. The performance of the DC approach in our application is also studied through a simulation study.

"Population estimates from the 1990 Post-Enumeration Survey (PES), used to measure decennial census undercount, were obtained from dual system estimates (DSE's) that assumed independence within strata defined by age-race-sex-geography and other variables. We make this independence assumption for females, but develop methods to avoid the independence assumption for males within strata by using national level sex ratios from demographic analysis (DA).... We consider several...alternative DSE's, and use DA results for 1990 to apply them to data from the 1990 U.S. census and PES."

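As a rough illustration of the dual system estimate mentioned in the abstract above, the sketch below computes the standard Lincoln-Petersen form under the independence assumption. It is not the authors' adjusted method for males; the function name and the counts are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Lincoln-Petersen dual system estimate of a stratum's true population.

    Assumes capture in the census and capture in the survey are independent
    within the stratum, which is the assumption the abstract relaxes for males.
    """
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    return census_count * survey_count / matched_count


# Hypothetical stratum: 900 people counted in the census, 100 in the
# survey sample, and 90 found in both lists.
estimate = dual_system_estimate(900, 100, 90)

# Implied net undercount rate of the census for this stratum.
undercount_rate = 1 - 900 / estimate
```

With these made-up counts the estimate is 1000 people, so the census would have missed about 10% of the stratum.
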
A commonly used procedure for reduction of the number of variables in linear discriminant analysis is the stepwise method for variable selection. Although often criticized, when used carefully, this method can be a useful prelude to a further analysis. The contribution of a variable to the discriminatory power of the model is usually measured by the maximum likelihood ratio criterion, referred to as Wilks' lambda. It is well known that the Wilks' lambda statistic is extremely sensitive to the influence of outliers. In this work a robust version of the Wilks' lambda statistic will be constructed based on the Minimum Covariance Determinant (MCD) estimator and its reweighted version, which has a higher efficiency. Taking advantage of the availability of a fast algorithm for computing the MCD, a simulation study will be done to evaluate the performance of this statistic. The presentation of material in this article does not imply the expression of any opinion whatsoever on the part of Austro Control GmbH and is the sole responsibility of the authors.
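
For orientation, the classical (non-robust) Wilks' lambda can be sketched as the ratio det(W)/det(T), where W is the pooled within-group scatter matrix and T is the total scatter matrix. The sketch below is limited to two variables so that the determinant stays trivial. The robust statistic described in the abstract would replace these classical means and scatter matrices with MCD-based estimates, which this sketch does not attempt. All names and data are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrix(rows, center):
    """Sum of outer products of deviations from `center`."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Classical Wilks' lambda for two variables: det(W) / det(T)."""
    all_rows = [r for g in groups for r in g]
    total = scatter_matrix(all_rows, mean_vector(all_rows))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        sg = scatter_matrix(g, mean_vector(g))
        for a in range(2):
            for b in range(2):
                within[a][b] += sg[a][b]
    return det2(within) / det2(total)


# Hypothetical data: group_b is group_a shifted by (5, 5).
group_a = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (2.0, 2.5)]
group_b = [(6.0, 7.0), (7.0, 6.0), (8.0, 8.0), (7.0, 7.5)]

separated = wilks_lambda([group_a, group_b])   # well-separated groups
identical = wilks_lambda([group_a, group_a])   # no group difference
```

Values near 0 indicate strong group separation and values near 1 indicate none. Because the statistic is built from means and scatter matrices, a single outlying row can move it substantially, which is the sensitivity the abstract refers to.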

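A simulation comparison of discriminant analysis and logistic regression   Cited by: 2 (self-citations: 1, citations by others: 1)
Using stochastic simulation, this article studies the resubstitution classification accuracy of discriminant analysis and logistic regression. The simulation results show that logistic regression achieves higher resubstitution accuracy than discriminant analysis. As the random error increases, the difference in accuracy between the two methods gradually narrows, and once the random error exceeds a certain threshold, the accuracy of logistic regression falls below that of discriminant analysis. Building on these simulations, a modified logistic regression classifier is introduced; the simulation results show that it slightly outperforms logistic regression.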

The concepts of defining contrast (DC), generalized defining relationship (GDR) and aliasing structure (AS) are now well established in the terminology of regression analysis and factorial design theory. There is no complete agreement in the literature about the meaning of regular and irregular fractional factorial designs. This paper provides a workable definition of a regular fraction from a symmetrical prime-powered factorial. It characterizes the uniqueness of the GDR for fractions from the most general factorial. Results are also presented on the uniqueness of the GDR for regular designs, on orthogonality aspects of regular and irregular designs, and on group-theoretic generation of the complete aliasing structure. Examples are provided to illustrate the developments.

Discriminant and cluster analysis of high-dimensional time series data are urgently needed in a growing number of academic fields. To address the persistent problem of bias in distance-based classifiers for high-dimensional models, we consider a new classifier with jackknife-type bias adjustment for stationary time series data. The consistency of the classifier is shown theoretically under suitable conditions, including settings with possibly high-dimensional data. We also conduct a cluster analysis of real financial data.

In this paper, we propose a flexible cure rate survival model by assuming that the number of competing causes of the event of interest follows the Negative Binomial distribution and the time to event follows a Weibull distribution. Indeed, we introduce the Weibull-Negative-Binomial (WNB) distribution, which can be used to model survival data when the hazard rate function is increasing, decreasing, or of certain non-monotone shapes. Another advantage of the proposed model is that it has some distributions commonly used in lifetime analysis as particular cases. Moreover, the proposed model includes as special cases some of the well-known cure rate models discussed in the literature. We consider a frequentist analysis for parameter estimation of a WNB model with cure rate. Then, we derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to perform global influence analysis. Finally, the methodology is illustrated on a medical data set.

When Gaussian errors are inappropriate in a multivariate linear regression setting, it is often assumed that the errors are iid from a distribution that is a scale mixture of multivariate normals. Combining this robust regression model with a default prior on the unknown parameters results in a highly intractable posterior density. Fortunately, there is a simple data augmentation (DA) algorithm and a corresponding Haar PX‐DA algorithm that can be used to explore this posterior. This paper provides conditions (on the mixing density) for geometric ergodicity of the Markov chains underlying these Markov chain Monte Carlo algorithms. Letting d denote the dimension of the response, the main result shows that the DA and Haar PX‐DA Markov chains are geometrically ergodic whenever the mixing density is generalized inverse Gaussian, log‐normal, inverted Gamma (with shape parameter larger than d/2) or Fréchet (with shape parameter larger than d/2). The results also apply to certain subsets of the Gamma, F and Weibull families.

The Linear Discriminant Rule (LD) is theoretically justified for use in classification when the population within-groups covariance matrices are equal, something rarely known in practice. As an alternative, the Quadratic Discriminant Rule (QD) avoids assuming equal covariance matrices, but requires the estimation of a large number of parameters. Hence, the performance of QD may be poor if the training set sizes are small or moderate. In fact, simulation studies have shown that in the two-groups case LD often outperforms QD for small training sets even when the within-groups covariance matrices differ substantially. The present article shows this to be true when there are more than two groups, as well. Thus, it would seem reasonable and useful to develop a data-based method of classification that, in effect, represents a compromise between QD and LD. In this article we develop such a method based on an empirical Bayes formulation in which the within-groups covariance matrices are assumed to be outcomes of a common prior distribution whose parameters are estimated from the data. Two classification rules are developed under this framework and, through the use of extensive simulations, are compared to existing methods when the number of groups is moderate.

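郭婧璇 (Guo Jingxuan) et al. 《统计研究》 (Statistical Research), 2020, 37(10): 104-114
With advances in Internet of Things technology, big data poses enormous challenges to network bandwidth and computer storage capacity. Traditional centralized data processing becomes impractical, which has spurred the development of distributed statistical learning. In research on non-iterative algorithms, Zhang et al. (2013) proved that when the number of data subsets is s=O(N), the divide-and-conquer (DC) simple averaging estimator based on local empirical risk minimization attains a mean squared error convergence rate of O(N⁻¹). Huang and Huo (2019) further proposed a distributed one-step estimator in the M-estimation framework. None of these methods, however, considers how the heterogeneity that may be present in massive data affects divide-and-conquer estimation. This paper proposes a divide-and-conquer one-step weighted estimator for massive heterogeneous data under the linear model framework, establishes the asymptotic properties of the estimator, and considers the problem of testing for heterogeneity. The proposed method is applied to real U.S. medical insurance data; the results show that it fits the linear trend in the data better and significantly improves computational efficiency.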
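
The divide-and-conquer simple averaging idea attributed above to Zhang et al. (2013) can be sketched for a one-predictor linear model: fit ordinary least squares on each block separately, then average the block estimates. This is only the unweighted baseline under homogeneous simulated data, not the one-step weighted estimator the paper proposes. The function names, block count, and simulated data are assumptions made for illustration.

```python
import random


def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def dc_simple_average(xs, ys, n_blocks):
    """Fit OLS on each block and average the estimates.

    Rows beyond n_blocks * (len(xs) // n_blocks) are ignored in this sketch.
    """
    size = len(xs) // n_blocks
    fits = [
        ols_fit(xs[k * size:(k + 1) * size], ys[k * size:(k + 1) * size])
        for k in range(n_blocks)
    ]
    intercept = sum(f[0] for f in fits) / n_blocks
    slope = sum(f[1] for f in fits) / n_blocks
    return intercept, slope


# Simulated homogeneous data: y = 1 + 2x + standard normal noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(10000)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

intercept, slope = dc_simple_average(xs, ys, n_blocks=10)
```

Each block needs only its own data in memory, which is the point of the approach when the full data set cannot be processed centrally.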

Birnbaum–Saunders (BS) models are receiving considerable attention in the literature. Multivariate regression models are a useful tool in multivariate analysis because they take into account the correlation between variables. Diagnostic analysis is an important aspect to be considered in statistical modeling. In this paper, we formulate multivariate generalized BS regression models and carry out a diagnostic analysis for these models. We consider the Mahalanobis distance as a global influence measure to detect multivariate outliers and use it for evaluating the adequacy of the distributional assumption. We also consider the local influence approach and study how a perturbation may affect the estimation of model parameters. We implement the obtained results in the R software and illustrate them with real-world multivariate data to show their potential applications.

Supersaturated designs (SSDs) are factorial designs in which the number of experimental runs is smaller than the number of parameters to be estimated in the model. While most of the literature on SSDs has focused on balanced designs, the construction and analysis of unbalanced designs has not been developed to a great extent. Recent studies discuss the possible advantages of relaxing the balance requirement in construction or data analysis of SSDs, and report that unbalanced designs compare favorably to balanced designs for several optimality criteria and for the way in which the data are analyzed. Moreover, the effect analysis framework of unbalanced SSDs has until now been restricted to the central assumption that experimental data come from a linear model. In this article, we consider unbalanced SSDs for data analysis under the assumption of generalized linear models (GLMs), revealing that unbalanced SSDs perform well despite the unbalance property. The examination of Type I and Type II error rates through an extensive simulation study indicates that the proposed method works satisfactorily.

Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.

Decision making is often supported by decision models. This study suggests that the negative impact of poor data quality (DQ) on decision making is often mediated by biased model estimation. To highlight this perspective, we develop an analytical framework that links three quality levels: data, model, and decision. The general framework is first developed at a high level, and then extended toward understanding the effect of incomplete datasets on Linear Discriminant Analysis (LDA) classifiers. The interplay between the three quality levels is evaluated analytically, initially for a one-dimensional case and then for multiple dimensions. The impact is then further analyzed through several simulation experiments with artificial and real-world datasets. The experimental results support the analytical development and reveal a nearly exponential decline in the decision error as the completeness level increases. To conclude, we discuss the framework and the empirical findings, and elaborate on the implications of our model for data quality management and for the use of data in estimating decision models.

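Based on an averaging strategy, this article uses a BP neural network to form a weighted average of the outputs of Bayesian discriminant analysis, Fisher linear discriminant analysis, and logistic regression for financial distress classification, treating those outputs as new variables and classifying again. The result is compared with the performance of each single method. The work is a new attempt at financial distress classification using a two-layer classifier.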

There is a vast amount of literature on accelerated life testing (ALT); however, most of this literature ignores the possibility of competing modes of failure. The literature that attempts to address this problem often uses a maximum likelihood estimation method, which may require large samples. Even in the case of a single failure mode, only small samples are to be expected when an ALT procedure is applied to expensive components. In this article we present a Bayesian framework for the analysis of ALT data with possible multiple failure modes. We illustrate the applicability of our model on some competing risk data sets available in the literature.

Process monitoring in the presence of data correlation has been one of the most discussed issues in the statistical process control literature over the past decade. However, retrospective analysis in the presence of data correlation with various common cause sigma estimators has received little attention. Maragah et al. (1992), in an early paper on retrospective analysis in the presence of data correlation, address only a single common cause sigma estimator. This paper studies the effect of data correlation on the retrospective X-chart with various common cause sigma estimates in the stable period of an AR(1) process. The aim is to identify standard deviation statistics that are robust to data correlation. This paper also discusses the robustness of common cause sigma estimates for monitoring data that follow other time series models, namely ARMA(1,1) and AR(p). Further, the bias characteristics of robust standard deviation estimates are discussed for these time series models. The paper further studies the performance of the retrospective X-chart on forecast residuals from various forecasting methods for the AR(1) process. These studies were carried out by simulating the stable period of AR(1) and AR(2) processes and the stable and invertible period of ARMA(1,1) processes. The average number of false alarms is used as the measure of performance. The results of the simulation studies are discussed.
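
To make the effect of autocorrelation on a common cause sigma estimate concrete, the sketch below simulates a stationary AR(1) series and compares the average-moving-range estimate (mean moving range divided by d2 = 1.128) with the ordinary sample standard deviation. The moving-range estimator is chosen here only because it is a standard choice for individuals charts; the abstract does not say which estimators the paper examines. Function names, seeds, and parameter values are assumptions.

```python
import random
import statistics


def simulate_ar1(n, phi, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal e[t]."""
    rng = random.Random(seed)
    # Draw the first value from the stationary distribution.
    x = rng.gauss(0, 1) / (1 - phi ** 2) ** 0.5
    series = []
    for _ in range(n):
        series.append(x)
        x = phi * x + rng.gauss(0, 1)
    return series


def moving_range_sigma(series):
    """Average moving range of span 2 divided by the constant d2 = 1.128."""
    d2 = 1.128
    ranges = [abs(b - a) for a, b in zip(series, series[1:])]
    return statistics.mean(ranges) / d2


correlated = simulate_ar1(5000, phi=0.7, seed=1)
independent = simulate_ar1(5000, phi=0.0, seed=1)

# Ratio of the moving-range estimate to the overall sample standard deviation.
ratio_correlated = moving_range_sigma(correlated) / statistics.stdev(correlated)
ratio_independent = moving_range_sigma(independent) / statistics.stdev(independent)
```

For independent data the two estimates roughly agree. With positive autocorrelation, successive values sit close together, so the moving-range estimate understates the process standard deviation. Control limits built from it are then too narrow, which raises the number of false alarms on a retrospective chart.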

Andrews plots (Biometrics 28 (1972) 125-136), as a tool to graphically interpret multivariate data, have recently gained considerable recognition. In this article, we first review the previous literature and then suggest a modification to the traditional Andrews plots. Finally, we illustrate a few new applications of these plots in robust design studies and in correspondence analysis, using real data.

In this paper we use Monte Carlo simulation to compare the effectiveness of five multivariate quality control methods for controlling the mean vector in a multivariate process: Hotelling T², the Multivariate Shewhart Chart, Discriminant Analysis, the Decomposition Method, and the Multivariate Ridge Residual Chart (developed by the authors). P-dimensional multivariate normal data are generated using different covariance structures. Shifts of various sizes are induced in the mean vector, and the resulting Average Run Length (ARL) is computed. The effectiveness of each method with regard to ARL is discussed.
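
As a small worked example of one of the compared statistics, the sketch below computes Hotelling's T² for a single bivariate observation when the in-control mean and covariance are treated as known. In that case T² follows a chi-square distribution with 2 degrees of freedom, whose upper-tail quantile has the closed form -2 ln(alpha), and the in-control ARL is 1/alpha. The restriction to two dimensions, the known-parameter assumption, and all names and values are simplifications made here, not the paper's simulation design.

```python
import math


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)' cov^{-1} (x - mean) for a bivariate observation."""
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    # Inverse of a 2x2 matrix written out explicitly.
    inv00 = cov[1][1] / det
    inv01 = -cov[0][1] / det
    inv10 = -cov[1][0] / det
    inv11 = cov[0][0] / det
    return dx * (inv00 * dx + inv01 * dy) + dy * (inv10 * dx + inv11 * dy)


def chi2_limit_2df(alpha):
    """Upper control limit: chi-square (2 df) quantile with tail area alpha."""
    return -2.0 * math.log(alpha)


alpha = 0.005
ucl = chi2_limit_2df(alpha)      # control limit for known parameters
in_control_arl = 1.0 / alpha     # expected samples between false alarms

# Hypothetical in-control parameters and one shifted observation.
mean = (0.0, 0.0)
cov = [[1.0, 0.5], [0.5, 1.0]]
t2 = hotelling_t2((2.5, -1.0), mean, cov)
signal = t2 > ucl
```

The hypothetical observation moves against the positive correlation, so it signals even though neither coordinate alone is extreme.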


When data analysts operate within different statistical frameworks (e.g., frequentist versus Bayesian, emphasis on estimation versus emphasis on testing), how does this impact the qualitative conclusions that are drawn for real data? To study this question empirically we selected from the literature two simple scenarios (involving a comparison of two proportions and a Pearson correlation) and asked four teams of statisticians to provide a concise analysis and a qualitative interpretation of the outcome. The results showed considerable overall agreement; nevertheless, this agreement did not appear to diminish the intensity of the subsequent debate over which statistical framework is more appropriate to address the questions at hand.
