Similar Documents
20 similar documents found.
1.
2.
In this article, we develop a method for checking the estimating equations used for joint estimation of the regression parameters and the overdispersion parameters, based on a one-dimensional projected covariate. The proposed method differs from classical testing methods in that it can be applied to high-dimensional responses, whereas the classical methods cannot simply be extended to the high-dimensional problem while remaining powerful. Furthermore, the properties of the test statistics are investigated, and a Nonparametric Monte Carlo Test (NMCT) is suggested to determine the critical values of the test statistics under the null hypothesis.

3.
Under non-normality, this article is concerned with testing the diagonality of a high-dimensional covariance matrix, which is more practical than testing sphericity or identity in the high-dimensional setting. The existing testing procedure for diagonality is robust against neither the data dimension nor the data distribution, producing tests with type I error rates far above nominal levels. This is mainly due to bias in estimating certain functions of the high-dimensional covariance matrix under non-normality. Compared with the sphericity and identity hypotheses, the asymptotic analysis of the diagonality hypothesis is more involved, and the bias must be handled more carefully. We develop a correction that makes the existing test statistic robust against both the data dimension and the data distribution. We show that the proposed test statistic is asymptotically normal without the normality assumption and without specifying an explicit relationship between the dimension p and the sample size n. Simulations show that it has good size and power over a wide range of settings.

4.
Approximate Bayesian computation (ABC) methods permit approximate inference for intractable likelihoods when it is possible to simulate from the model. However, they perform poorly for high-dimensional data and in practice must usually be used in conjunction with dimension reduction methods, resulting in a loss of accuracy which is hard to quantify or control. We propose a new ABC method for high-dimensional data based on rare event methods which we refer to as RE-ABC. This uses a latent variable representation of the model. For a given parameter value, we estimate the probability of the rare event that the latent variables correspond to data roughly consistent with the observations. This is performed using sequential Monte Carlo and slice sampling to systematically search the space of latent variables. In contrast, standard ABC can be viewed as using a more naive Monte Carlo estimate. We use our rare event probability estimator as a likelihood estimate within the pseudo-marginal Metropolis–Hastings algorithm for parameter inference. We provide asymptotics showing that RE-ABC has a lower computational cost for high-dimensional data than standard ABC methods. We also illustrate our approach empirically, on a Gaussian distribution and an application in infectious disease modelling.
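For intuition, the outer inference loop of RE-ABC is a standard pseudo-marginal Metropolis–Hastings sampler. The sketch below is generic, not the authors' implementation: `lik_hat` stands for any non-negative unbiased likelihood estimator (in RE-ABC, the rare-event estimator), and `log_prior`, the random-walk proposal, and the step size are illustrative assumptions.

```python
import numpy as np

def pseudo_marginal_mh(lik_hat, log_prior, theta0, n_iter=5000, step=0.5, seed=0):
    # Random-walk MH in which the intractable likelihood is replaced by a
    # non-negative unbiased estimate lik_hat(theta); re-using the noisy
    # estimate of the accepted state is what keeps the chain exact.
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_l = np.log(lik_hat(theta))
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        log_l_prop = np.log(lik_hat(prop))
        log_accept = (log_l_prop + log_prior(prop)) - (log_l + log_prior(theta))
        if np.log(rng.random()) < log_accept:
            theta, log_l = prop, log_l_prop
        chain[i] = theta
    return chain
```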

5.
Sliced regression is an effective dimension reduction method that replaces the original high-dimensional predictors with an appropriate low-dimensional projection. It is free from probabilistic assumptions and can exhaustively estimate the central subspace. In this article, we propose incorporating shrinkage estimation into sliced regression so that variable selection can be achieved simultaneously with dimension reduction. The new method improves estimation accuracy and achieves better interpretability of the reduced variables. The efficacy of the proposed method is shown through both simulation and real data analysis.

6.
This paper is concerned with the problem of selecting variables in two-group discriminant analysis for high-dimensional data with fewer observations than the dimension. We consider a selection criterion based on an approximately unbiased estimator of an AIC-type risk. When the dimension is large compared to the sample size, the AIC-type risk cannot be defined, so we propose an AIC in which the maximum likelihood estimator is replaced by a ridge-type estimator. This idea follows Srivastava and Kubokawa (2008) and was further extended by Yamamura et al. (2010). Simulations reveal that the proposed AIC performs well.

7.
Residual marked empirical process-based tests are commonly used in regression models. However, they suffer from data sparseness in high-dimensional space when there are many covariates. This paper has three purposes. First, we suggest a partial dimension reduction adaptive-to-model testing procedure that can be omnibus against general global alternative models while fully using the dimension reduction structure under the null hypothesis. This is because the procedure automatically adapts to the null and alternative models, and thus largely overcomes the dimensionality problem. Second, to achieve this goal, we propose a ridge-type eigenvalue ratio estimate to automatically determine the number of linear combinations of the covariates under the null and alternative hypotheses. Third, a Monte Carlo approximation to the sampling null distribution is suggested. Unlike existing bootstrap approximation methods, it gives an approximation as close to the sampling null distribution as possible by fully utilising the dimension reduction model structure under the null model. Simulation studies and real data analysis are conducted to illustrate the performance of the new test and compare it with existing tests.
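As a hedged illustration of the second ingredient, one common form a ridge-type eigenvalue ratio estimator can take is sketched below; the paper's exact criterion (whether eigenvalues are squared, how the ridge constant c is chosen) may differ, so treat both as assumptions.

```python
import numpy as np

def ridge_ratio_dimension(eigvals, c):
    # eigvals: sample eigenvalues of the target matrix; c > 0 is a small ridge
    # that tames the near-zero eigenvalues so ratios past the true dimension
    # stay close to one instead of blowing up as 0/0.
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratios = (lam[:-1] + c) / (lam[1:] + c)
    return int(np.argmax(ratios)) + 1   # estimated number of linear combinations
```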

8.
This paper discusses the problem of testing the complete independence of random variables when the dimension of the observations can be much larger than the sample size. It is reported that two typical tests, based respectively on the largest off-diagonal entry and the largest eigenvalue of the sample correlation matrix, lose control of the type I error in such high-dimensional scenarios and exhibit distinct type II error behaviour under different types of alternative hypotheses. Given these facts, we propose a permutation test procedure that synthesizes these two extreme statistics. Simulation results show that for finite dimension and sample size the proposed test outperforms the existing methods in various cases.
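A minimal sketch of such a permutation test follows, assuming a simple Bonferroni-style min-p synthesis of the two statistics; the paper's exact combination rule may differ.

```python
import numpy as np

def independence_perm_test(X, B=500, seed=0):
    # Permutation test for complete independence of the p columns of X (n x p):
    # permuting each column independently breaks dependence but keeps marginals.
    rng = np.random.default_rng(seed)
    n, p = X.shape

    def stats(Z):
        R = np.corrcoef(Z, rowvar=False)
        off = np.abs(R - np.eye(p)).max()      # largest off-diagonal correlation
        lam = np.linalg.eigvalsh(R)[-1]        # largest eigenvalue
        return off, lam

    t_off, t_lam = stats(X)
    exc_off = exc_lam = 0
    for _ in range(B):
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        s_off, s_lam = stats(Xp)
        exc_off += s_off >= t_off
        exc_lam += s_lam >= t_lam
    p_off = (exc_off + 1) / (B + 1)
    p_lam = (exc_lam + 1) / (B + 1)
    return min(1.0, 2 * min(p_off, p_lam))     # Bonferroni-style synthesis
```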

9.
Identifying homogeneous subsets of predictors in classification can be challenging in the presence of high-dimensional data with highly correlated variables. We propose a new method called cluster correlation-network support vector machine (CCNSVM) that simultaneously estimates clusters of predictors relevant for classification and the coefficients of a penalized SVM. The new CCN penalty is a function of the well-known Topological Overlap Matrix, whose entries measure the strength of connectivity between predictors. CCNSVM implements an efficient algorithm that alternates between searching for predictor clusters and optimizing a penalized SVM loss function using Majorization–Minimization tricks and a coordinate descent algorithm. Combining clustering and sparsity in a single procedure provides additional insight into the power of exploiting dimension reduction structure in high-dimensional binary classification. Simulation studies compare the performance of our procedure to its competitors. A practical application of CCNSVM to DNA methylation data illustrates its good behaviour.
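The Topological Overlap Matrix itself has a standard closed form in the network literature (Zhang and Horvath); a sketch, assuming an adjacency matrix A such as a power of absolute correlations between predictors:

```python
import numpy as np

def topological_overlap(A):
    # TOM[i, j] = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where
    # l_ij = sum_u a_iu * a_uj counts (weighted) shared neighbours and
    # k_i is the connectivity of node i; high overlap = shared neighbourhood.
    A = np.asarray(A, dtype=float)
    np.fill_diagonal(A, 0.0)
    L = A @ A                                  # shared-neighbour weights
    k = A.sum(axis=1)                          # node connectivities
    tom = (L + A) / (np.minimum.outer(k, k) + 1.0 - A)
    np.fill_diagonal(tom, 1.0)
    return tom
```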

10.
Two methods that are often used to evaluate the run length distribution of quality control charts are the Markov chain and integral equation approaches. Both methods have been used to evaluate the cumulative sum (CUSUM) charts and the exponentially weighted moving average (EWMA) control charts. The Markov chain approach involves "discretizing" the possible values which can be plotted. Using properties of finite Markov chains, expressions for the distribution of the run length, and for the average run length (ARL), can be obtained. For the CUSUM and EWMA charts there exist integral equations whose solution gives the ARL. Approximate methods can then be used to solve the integral equation. In this article we show that if the product midpoint rule is used to approximate the integral in the integral equation, then both approaches yield the same approximations for the ARL. In addition we show that the recursive expressions for the probability functions are the same for the two approaches. These results establish the integral equation approach as preferable whenever an integral equation can be found.
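To make the equivalence concrete, here is a sketch of the midpoint-rule discretization for a one-sided CUSUM with N(mu, 1) observations. It can be read either as the product midpoint rule applied to the ARL integral equation or as a Brook–Evans-style Markov chain, since both give the same linear system; the grid size N and the handling of the atom at zero are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def cusum_arl(h, k, mu=0.0, N=400):
    # States: the atom at 0 plus N midpoints of (0, h); the recursion is
    # S' = max(0, S + X - k) with X ~ N(mu, 1), signalling when S' > h.
    w = h / N
    s = np.concatenate(([0.0], (np.arange(N) + 0.5) * w))
    Q = np.empty((N + 1, N + 1))
    for i, si in enumerate(s):
        Q[i, 0] = norm.cdf(k - si, loc=mu)               # mass reflected to 0
        Q[i, 1:] = norm.pdf(s[1:] - si + k, loc=mu) * w  # midpoint rule on (0, h)
    L = np.linalg.solve(np.eye(N + 1) - Q, np.ones(N + 1))
    return L[0]                                          # ARL from a zero start

# e.g. cusum_arl(h=4.0, k=0.5) is roughly 168, a standard in-control benchmark
```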

11.
Problems involving high-dimensional data, such as pattern recognition, image analysis, and gene clustering, often require a preliminary step of dimension reduction before or during statistical analysis. If one restricts attention to linear techniques for dimension reduction, the remaining issue is the choice of the projection. This choice can be dictated by the desire to maximize certain statistical criteria, including variance, kurtosis, sparseness, and entropy, of the projected data. Motivation for such criteria comes from past empirical studies of the statistics of natural and urban images. We present a geometric framework for finding projections that are optimal for obtaining certain desired statistical properties. Our approach is to define an objective function on spaces of orthogonal linear projections (Stiefel and Grassmann manifolds) and to use gradient techniques to optimize that function; the construction uses the geometries of these manifolds to perform the optimization. Experimental results demonstrate these ideas for natural and facial images.
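A minimal sketch of this kind of scheme, assuming projected gradient ascent with a QR retraction on the Stiefel manifold and the variance criterion (whose optimum is the PCA subspace, making the result easy to check); other criteria such as kurtosis or entropy would swap in a different Euclidean gradient. The step size and iteration count are illustrative.

```python
import numpy as np

def stiefel_projection(X, d, steps=200, lr=0.01, seed=0):
    # Maximize tr(U' S U) over orthonormal p x d frames U (Stiefel manifold).
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance
    U, _ = np.linalg.qr(rng.standard_normal((S.shape[0], d)))
    for _ in range(steps):
        G = 2 * S @ U                      # Euclidean gradient of tr(U' S U)
        G_t = G - U @ (U.T @ G)            # project onto the tangent space at U
        U, _ = np.linalg.qr(U + lr * G_t)  # retract back onto the manifold
    return U
```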

12.
A general sampling algorithm for nested Archimedean copulas was recently suggested. It is given in two different forms, a recursive one and an explicit one. The explicit form allows for a simpler version of the algorithm which is numerically more stable and faster, since fewer function evaluations are required. The algorithm can also be given in general form, not being restricted to a particular nesting such as fully nested Archimedean copulas. Several examples are given.
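For the non-nested base case, the explicit form reduces to the classical Marshall–Olkin algorithm; a sketch for a Clayton copula, with generator inverse psi(t) = (1 + t)^(-1/theta) and frailty V ~ Gamma(1/theta), which the nested algorithm applies recursively at each nesting level:

```python
import numpy as np

def sample_clayton(n, d, theta, seed=0):
    # Marshall-Olkin sampling of an n x d sample from a Clayton copula
    # (theta > 0): U_j = psi(E_j / V) with E_j i.i.d. unit exponentials.
    rng = np.random.default_rng(seed)
    V = rng.gamma(1.0 / theta, size=(n, 1))   # frailty variable
    E = rng.exponential(size=(n, d))          # independent exponentials
    return (1.0 + E / V) ** (-1.0 / theta)    # psi(t) = (1 + t)^(-1/theta)
```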

13.
Recent work has shown that Lasso-based regularization is very useful for estimating the high-dimensional inverse covariance matrix. A particularly useful scheme is based on penalizing the ℓ1 norm of the off-diagonal elements to encourage sparsity. We embed this type of regularization into high-dimensional classification. A two-stage estimation procedure is proposed which first recovers structural zeros of the inverse covariance matrix and then enforces block sparsity by moving non-zeros closer to the main diagonal. We show that the block-diagonal approximation of the inverse covariance matrix leads to an additive classifier, and demonstrate that accounting for the structure can yield better performance accuracy. The effect of the block size on classification is explored, and a class of asymptotically equivalent structure approximations in a high-dimensional setting is specified. We suggest variable selection at the block level and investigate properties of this procedure in growing-dimension asymptotics. We present a consistency result for the feature selection procedure, establish asymptotic lower and upper bounds for the fraction of separative blocks, and specify constraints under which reliable classification with block-wise feature selection can be performed. The relevance and benefits of the proposed approach are illustrated on both simulated and real data.

14.
The paper considers a significance test of regression variables in the high-dimensional linear regression model when the dimension of the regression variables p, together with the sample size n, tends to infinity. Under two slightly different cases, we prove that the likelihood ratio test statistic converges in distribution to a Gaussian random variable, and explicit expressions for the asymptotic mean and variance are also obtained. Simulations demonstrate that our high-dimensional likelihood ratio test method outperforms traditional methods in analyzing high-dimensional data.

15.
This paper considers estimation and prediction in the Aalen additive hazards model in the case where the covariate vector is high-dimensional, such as gene expression measurements. Some form of dimension reduction of the covariate space is needed to obtain useful statistical analyses. We study the partial least squares (PLS) regression method, which turns out to be naturally adapted to this setting via the so-called Krylov sequence. The resulting PLS estimator is shown to be consistent provided that the number of terms included is taken to be equal to the number of relevant components in the regression model. A standard PLS algorithm can also be constructed, but it turns out that the resulting predictor can only be related to the original covariates via time-dependent coefficients. The methods are applied to a breast cancer data set with gene expression recordings and to the well-known primary biliary cirrhosis clinical data.
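The Krylov connection can be made concrete via the well-known characterisation that the K-component PLS coefficient vector is the least-squares solution restricted to the Krylov subspace span{s, Cs, ..., C^(K-1)s}, with C = X'X/n and s = X'y/n. A sketch for ordinary PLS regression (the hazards extension is not shown); the explicit matrix powers are for clarity, not numerical robustness:

```python
import numpy as np

def krylov_pls_coefficients(X, y, K):
    # Least-squares fit of y on X with coefficients constrained to the
    # K-dimensional Krylov space generated by C = X'X/n and s = X'y/n.
    n = len(y)
    C = X.T @ X / n
    s = X.T @ y / n
    B = np.column_stack([np.linalg.matrix_power(C, k) @ s for k in range(K)])
    Q, _ = np.linalg.qr(B)                       # orthonormal Krylov basis
    gamma, *_ = np.linalg.lstsq(X @ Q, y, rcond=None)
    return Q @ gamma                             # K-component PLS coefficients
```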

16.
The kernel density estimate of the residents' income density function is not continuous in a form that allows the population size of a specific income interval to be computed by direct integration. We therefore build a bisection-recursion algorithm on top of the kernel density estimate to measure the size of specific income groups. Using micro-survey data on rural residents' per capita net income from the China Health and Nutrition Survey, we estimate the income distribution of Chinese rural residents by kernel density estimation and compute the rural poverty incidence with the bisection-recursion algorithm. The results show that, allowing for differences in micro data sources and content, the computed rural poverty incidence follows the trend published by the National Bureau of Statistics with only small numerical differences. Using the bisection-recursion algorithm under kernel density estimation is therefore an effective way to measure the size of specific income groups.
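As an illustration of the idea, a bisection recursion for the mass of a Gaussian kernel density estimate below a poverty line might look as follows; the stopping rule, tolerance, and effective support bound are illustrative, and for a Gaussian kernel the interval mass also has a closed form via the normal CDF, so this is purely a sketch of the recursive scheme.

```python
import numpy as np

def kde_mass_below(x, data, h, tol=1e-6):
    # Adaptive bisection: halve an interval until the one-panel and two-panel
    # midpoint-rule estimates of the KDE mass agree, then recurse on halves.
    data = np.asarray(data, dtype=float)

    def f(t):  # Gaussian KDE evaluated at scalar t
        z = (t - data) / h
        return np.exp(-0.5 * z * z).sum() / (len(data) * h * np.sqrt(2 * np.pi))

    def mass(a, b):
        m = 0.5 * (a + b)
        whole = f(m) * (b - a)
        halves = f(0.5 * (a + m)) * (m - a) + f(0.5 * (m + b)) * (b - m)
        if abs(whole - halves) < tol * (b - a):
            return halves
        return mass(a, m) + mass(m, b)

    lo = data.min() - 5 * h        # effective lower edge of the KDE support
    return mass(lo, x)             # estimated share of incomes below x
```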

17.
Dimension reduction for model-based clustering
We introduce a dimension reduction method for visualizing the clustering structure obtained from a finite mixture of Gaussian densities. Information on the dimension reduction subspace is obtained from the variation in group means and, depending on the estimated mixture model, the variation in group covariances. The proposed method aims at reducing the dimensionality by identifying a set of linear combinations of the original features, ordered by importance as quantified by the associated eigenvalues, which capture most of the cluster structure contained in the data. Observations may then be projected onto such a reduced subspace, providing summary plots which help to visualize the clustering structure. These plots can be particularly appealing in the case of high-dimensional data and noisy structure. The newly constructed variables capture most of the clustering information available in the data, and they can be further reduced to improve clustering performance. We illustrate the approach on both simulated and real data sets.
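In the spirit of the method, the mean-based directions can be obtained from a generalized eigenproblem contrasting between-group and pooled within-group variation; a sketch under that assumption (the paper's full construction, which can also exploit variation in group covariances, is richer):

```python
import numpy as np
from scipy.linalg import eigh

def cluster_projection(means, covs, props, d):
    # means: (G, p) component means; covs: (G, p, p) covariances; props: (G,)
    # mixing proportions. Solve B v = lambda W v and keep the d directions
    # with the largest eigenvalues, i.e. the greatest cluster separation.
    means = np.asarray(means, dtype=float)
    props = np.asarray(props, dtype=float)
    mu = props @ means                                        # overall mean
    B = sum(w * np.outer(m - mu, m - mu) for w, m in zip(props, means))
    W = sum(w * S for w, S in zip(props, covs))               # pooled within
    evals, evecs = eigh(B, W)                                 # ascending order
    return evecs[:, np.argsort(evals)[::-1][:d]]
```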

18.
Sliced Inverse Regression (SIR) is an effective method for dimension reduction in high-dimensional regression problems. The original method, however, requires the inversion of the predictors' covariance matrix. In the case of collinearity among these predictors, or of sample sizes that are small compared to the dimension, the inversion is not possible and a regularization technique has to be used. Our approach is based on a Fisher Lecture given by R.D. Cook in which it is shown that SIR axes can be interpreted as solutions of an inverse regression problem. We propose to introduce a Gaussian prior distribution on the unknown parameters of the inverse regression problem in order to regularize their estimation. We show that some existing SIR regularizations fit within our framework, which permits a global understanding of these methods. Three new priors are proposed, leading to new regularizations of the SIR method. A comparison on simulated data as well as an application to the estimation of Mars surface physical properties from hyperspectral images are provided.
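A minimal sketch of SIR with the simplest such regularization, a ridge term added to the covariance before inversion (one of the special cases a Gaussian prior can induce); the slice count H and the ridge size are illustrative choices:

```python
import numpy as np
from scipy.linalg import eigh

def sir_directions(X, y, d, H=10, ridge=1e-3):
    # Regularized SIR: eigenvectors of the covariance of slice means M,
    # whitened by a ridge-regularized covariance Sigma.
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n + ridge * np.eye(p)
    M = np.zeros((p, p))
    for s in np.array_split(np.argsort(y), H):   # slice observations on y
        m = Xc[s].mean(axis=0)
        M += (len(s) / n) * np.outer(m, m)       # weighted slice-mean scatter
    evals, evecs = eigh(M, Sigma)                # generalized eigenproblem
    return evecs[:, np.argsort(evals)[::-1][:d]]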

19.
In estimating a multiple integral, it is known that Monte Carlo methods are more efficient than analytical techniques when the number of dimensions is beyond seven. In general, the sample-mean method is better than the hit-or-miss Monte Carlo method. However, when the volume of a domain in a high-dimensional space is of interest, the hit-or-miss method is usually preferred because of the difficulty of generalizing the sample-mean method to the computation of the volume of a domain. This paper develops a technique that makes such a generalization possible. The technique can be interpreted as a volume-preserving transformation procedure: a volume-preserving transformation is first performed to map the domain of interest onto a hypersphere, and the volume of the domain is then evaluated by computing the volume of the hypersphere.
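For contrast, the baseline hit-or-miss estimator that the paper's transformation is designed to improve upon can be sketched in a few lines, assuming the domain is specified by an indicator function and enclosed in a known box:

```python
import numpy as np

def hit_or_miss_volume(indicator, lower, upper, n=100_000, seed=0):
    # Volume of {x : indicator(x)} inside the box [lower, upper]:
    # box volume times the fraction of uniform points that land in the domain.
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    pts = lower + (upper - lower) * rng.random((n, lower.size))
    hits = np.array([indicator(x) for x in pts])
    return np.prod(upper - lower) * hits.mean()

# Volume of the unit ball in 5 dimensions (true value ~ 5.2638):
# hit_or_miss_volume(lambda x: x @ x <= 1.0, [-1.0] * 5, [1.0] * 5)
```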

20.
The evaluation of the cumulative distribution function of a multivariate normal distribution is considered. The multivariate normal distribution can have any positive definite correlation matrix and any mean vector. The approach taken has two stages. In the first stage, it is shown how non-centred orthoscheme probabilities can be evaluated using a recursive integration method. In the second stage, some ideas of Schläfli and Abrahamson are extended to show that any non-centred orthant probability can be expressed as differences between at most (m − 1)! non-centred orthoscheme probabilities. This approach allows an accurate evaluation of many multivariate normal probabilities which have important applications in statistical practice.
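As a quick check of the target quantity (though not of the paper's recursive orthoscheme algorithm), SciPy's Genz-type integrator can evaluate such probabilities directly; for an equicorrelated trivariate normal the orthant probability has the closed form 1/8 + (3/(4π))·arcsin(ρ):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Orthant probability P(X1 > 0, X2 > 0, X3 > 0) for a centred trivariate
# normal with equicorrelation 0.5; by symmetry it equals the CDF at the
# origin. The closed form gives 1/8 + (3/(4*pi))*arcsin(0.5) = 0.25 exactly.
R = 0.5 * (np.ones((3, 3)) + np.eye(3))
p = multivariate_normal(mean=np.zeros(3), cov=R).cdf(np.zeros(3))
print(p)   # ~ 0.25
```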
