Similar Documents
20 similar documents found (search time: 31 ms)
1.
This study develops a robust automatic algorithm for clustering probability density functions, building on previous research. Unlike existing methods that often pre-determine the number of clusters, the proposed method self-organizes data groups based on the original data structure. It is also robust to noise. Three synthetic data examples and the real-world COREL dataset are used to illustrate the accuracy and effectiveness of the proposed approach.

2.
Most feature screening methods for ultrahigh-dimensional classification explicitly or implicitly assume the covariates are continuous. In practice, however, it is quite common for both categorical and continuous covariates to appear in the data, and applicable feature screening methods are very limited. To handle this non-trivial situation, we propose an entropy-based feature screening method, which is model-free and provides a unified screening procedure for both categorical and continuous covariates. We establish the sure screening and ranking consistency properties of the proposed procedure, investigate its finite-sample performance through simulation studies, and illustrate the method with a real data analysis.
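The abstract does not state the exact entropy statistic, so the sketch below uses a plug-in mutual-information score as a plausible stand-in: continuous covariates are discretized into equal-frequency bins so that one formula handles both covariate types, and features are ranked by the score. The function names `mi_score` and `screen` are hypothetical illustrations, not the paper's API.

```python
import numpy as np

def mi_score(x, y, n_bins=10):
    """Plug-in mutual information between one covariate and a class label.

    Continuous covariates are discretized into equal-frequency bins so the
    same formula covers categorical and continuous features (a hypothetical
    stand-in for the paper's entropy-based statistic).
    """
    x = np.asarray(x)
    if x.dtype.kind in "fi" and np.unique(x).size > n_bins:
        edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
        x = np.digitize(x, edges[1:-1])
    n = len(x)
    joint, px, py = {}, {}, {}
    for xi, yi in zip(x, y):
        joint[(xi, yi)] = joint.get((xi, yi), 0) + 1
    for (xi, yi), c in joint.items():
        px[xi] = px.get(xi, 0) + c
        py[yi] = py.get(yi, 0) + c
    mi = 0.0
    for (xi, yi), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with counts
        mi += (c / n) * np.log(c * n / (px[xi] * py[yi]))
    return mi

def screen(X, y, top_k):
    """Rank covariates by the marginal score and keep the top_k."""
    scores = [mi_score(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:top_k]
```

Ranking by a marginal dependence score like this is what makes the procedure model-free: no regression form is assumed for the response.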

3.
We consider variable screening in ultrahigh-dimensional generalized linear models (GLMs) of non-polynomial order. Since the popular SIS approach is extremely unstable in the presence of contamination and noise, we discuss a new robust screening procedure based on the minimum density power divergence estimator (MDPDE) of the marginal regression coefficients. The proposed procedure performs well under both pure and contaminated data scenarios. We provide theoretical motivation for using marginal MDPDEs for variable screening from both the population and the sample perspectives; in particular, we prove that the marginal MDPDEs are uniformly consistent, leading to the sure screening property of the proposed algorithm. Finally, we propose an MDPDE-based extension for robust conditional screening in GLMs and derive its sure screening property. The proposed methods are illustrated through extensive numerical studies and an interesting real data application.

4.

In this work, we propose a beta prime kernel estimator for probability density functions with nonnegative support. The beta prime probability density function is used as the kernel; it is free of boundary bias, nonnegative, and has a naturally varying shape. We obtain the optimal rates of convergence for the mean squared error (MSE) and the mean integrated squared error (MISE). We also use an adaptive Bayesian bandwidth selection method with the Lindley approximation for heavy-tailed distributions and compare its performance with the global least squares cross-validation bandwidth selection method. Monte Carlo simulation studies evaluate the average integrated squared error (ISE) of the proposed kernel estimator against several asymmetric competitors. Real data sets are also presented to illustrate the findings.

5.
Rejection sampling is a well-known method for generating random samples from arbitrary target probability distributions. It requires the design of a suitable proposal probability density function (pdf) from which candidate samples can be drawn; these candidates are accepted or rejected based on a test involving the ratio of the target and proposal densities. The adaptive rejection sampling method is an efficient algorithm for sampling from a log-concave target density that attains high acceptance rates by improving the proposal density whenever a sample is rejected. In this paper, we introduce a generalized adaptive rejection sampling procedure that can be applied to a broad class of target probability distributions, possibly non-log-concave and exhibiting multiple modes. The proposed technique yields a sequence of proposal densities that converge toward the target pdf, thus achieving very high acceptance rates. We provide a simple numerical example to illustrate the basic use of the technique, together with a more elaborate positioning application using real data.
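The accept/reject test described above can be sketched as follows. This is the basic, non-adaptive sampler (the paper's contribution is the adaptive construction of proposals, which is not reproduced here); the bimodal example target and the bound value 2.0 are illustrative choices, not from the paper.

```python
import numpy as np

def rejection_sample(target_pdf, prop_rvs, prop_pdf, bound, n, rng):
    """Basic rejection sampler.

    Accepts a candidate x ~ proposal when u * bound * prop_pdf(x)
    <= target_pdf(x), where bound satisfies target <= bound * proposal.
    """
    samples = []
    while len(samples) < n:
        x = prop_rvs(rng)
        if rng.uniform() * bound * prop_pdf(x) <= target_pdf(x):
            samples.append(x)
    return np.array(samples)

# Illustrative non-log-concave target: 0.5*N(-2,1) + 0.5*N(2,1),
# with a wide N(0, 3^2) proposal that dominates both modes.
def target(x):
    c = 1.0 / np.sqrt(2 * np.pi)
    return 0.5 * c * (np.exp(-0.5 * (x + 2) ** 2) + np.exp(-0.5 * (x - 2) ** 2))

def prop_pdf(x):
    return np.exp(-x ** 2 / 18.0) / np.sqrt(18 * np.pi)

def prop_rvs(rng):
    return rng.normal(0.0, 3.0)
```

The acceptance rate of this fixed-proposal scheme is 1/bound on average; the adaptive procedure in the paper improves the proposal after each rejection, driving the acceptance rate toward one.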

6.
In this article, we suggest a definition of the L1-distance that combines probability density functions and prior probabilities. We obtain upper and lower bounds for this distance as well as its relation to other measures. We also establish the relationship between the proposed distance and quantities involved in the Bayesian classification problem. In practice, calculations are performed with Matlab procedures. As an illustration of the results, the article estimates the ability of some companies in Can Tho City, Vietnam to repay bank debt.
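One natural way to combine priors and densities, consistent with the abstract's description though not necessarily its exact definition, is the integral of |q1*f1 - q2*f2|. The sketch below approximates it on a grid; with equal priors the distance is 0 for identical densities and attains its upper bound q1 + q2 = 1 for essentially disjoint densities, matching the kind of bounds the abstract mentions.

```python
import numpy as np

def weighted_l1_distance(f1, f2, q1, q2, grid):
    """Riemann approximation of the integral of |q1*f1(x) - q2*f2(x)| dx.

    A hypothetical reading of the prior-weighted L1-distance; f1, f2 are
    density functions, q1, q2 prior probabilities, grid a uniform grid.
    """
    dx = grid[1] - grid[0]
    return np.abs(q1 * f1(grid) - q2 * f2(grid)).sum() * dx

def normal_pdf(mu, sigma=1.0):
    """Normal density as a callable, for the example below."""
    return lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
```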

7.
In this article, a simple probability distribution for modeling data with flat densities is introduced and used to model real-world assessment data. The new distribution is a mixture of three distributions: two truncated normals and a uniform. The parameters are estimated from sample percentiles, because the likelihood and method-of-moments approaches lead to complicated forms. The proposed method is based mainly on the shape of the empirical density and looks promising for modeling flat densities similar to the one used in the study.
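The three-component mixture can be written down directly. The sketch below assembles the density from two truncated normals and a uniform on [a, b]; the parameter names (`mu1`, `mu2`, `sigma`, weights `w1`, `w2`) are hypothetical placeholders for whatever the percentile-based fitting in the paper produces.

```python
import math

def tnorm_pdf(x, mu, sigma, a, b):
    """Normal density truncated to [a, b]."""
    if not a <= x <= b:
        return 0.0
    phi = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    mass = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
    return phi / mass

def flat_mixture_pdf(x, a, b, mu1, mu2, sigma, w1, w2):
    """Mixture of two truncated normals and a uniform on [a, b].

    Weights w1, w2 and 1 - w1 - w2 must be nonnegative and sum to 1.
    """
    w3 = 1.0 - w1 - w2
    u = 1.0 / (b - a) if a <= x <= b else 0.0
    return (w1 * tnorm_pdf(x, mu1, sigma, a, b)
            + w2 * tnorm_pdf(x, mu2, sigma, a, b)
            + w3 * u)
```

Placing the two truncated-normal modes near the ends of the interval and letting the uniform carry the middle is one way such a mixture produces a flat-topped density.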

8.
Ultrahigh-dimensional data arise in many fields of modern science, such as medicine, economics, genomics, and image processing, and pose unprecedented challenges for statistical analysis. With the rapidly growing size of scientific data across disciplines, feature screening has become a primary step for reducing the high dimensionality to a moderate scale that existing penalized methods can handle. In this paper, we introduce a simple and robust feature screening method, free of any model assumption, for high-dimensional censored data. Being model-free, the proposed method is applicable to a general class of survival models. The sure screening and ranking consistency properties are established without any finite moment condition on the predictors or the response. The computation of the proposed method is straightforward. Its finite-sample performance is examined via extensive simulation studies, and an application is illustrated with a gene association study of mantle cell lymphoma.

9.
Feature screening and variable selection are fundamental to the analysis of ultrahigh-dimensional data, which are being collected in diverse scientific fields at relatively low cost. Distance-correlation-based sure independence screening (DC-SIS) has been proposed to perform feature screening for ultrahigh-dimensional data. DC-SIS possesses the sure screening property and filters out unimportant predictors in a model-free manner. Like all independence screening methods, however, it fails to detect truly important predictors that are marginally independent of the response because of correlations among predictors. When many irrelevant predictors are highly correlated with some strongly active predictors, independence screening may also miss other active predictors with relatively weak marginal signals. To improve the performance of DC-SIS, we introduce an effective iterative procedure based on distance correlation to detect all truly important predictors, and potential interactions, in both linear and nonlinear models. The proposed iterative method thus retains the favourable model-free and robust properties. We illustrate its excellent finite-sample performance through comprehensive simulation studies and an empirical analysis of the rat eye expression data set.
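The marginal DC-SIS step that the iterative procedure builds on can be sketched directly: compute the sample distance correlation between each covariate and the response, then keep the top-ranked covariates. The iterative extension proposed in the paper is omitted; `dc_sis` is an illustrative name for the marginal step only.

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance via double-centered distance matrices."""
    x = np.asarray(x, float).reshape(-1, 1)
    y = np.asarray(y, float).reshape(-1, 1)
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

def dcorr(x, y):
    """Sample distance correlation; zero when either marginal dcov is zero."""
    denom = dcov2(x, x) * dcov2(y, y)
    if denom <= 0:
        return 0.0
    return np.sqrt(max(dcov2(x, y), 0.0)) / denom ** 0.25

def dc_sis(X, y, d):
    """Marginal DC-SIS: rank covariates by distance correlation, keep top d."""
    w = np.array([dcorr(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(w)[::-1][:d]
```

Because distance correlation is zero if and only if the variables are independent, this ranking detects nonlinear marginal signals (e.g. a response driven by the square of a covariate) that Pearson-correlation screening would miss.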

10.
In this paper, we propose a conditional quantile independence screening approach for ultrahigh-dimensional heterogeneous data, given some known, significant, low-dimensional variables. The new method does not require imposing a specific model structure on the response and covariates, and can detect additional features that contribute to conditional quantiles of the response given the already-identified important predictors. We prove that the proposed procedure enjoys the ranking consistency and sure screening properties. Simulation studies examine the performance of the proposed procedure, and we illustrate it with a real data example.

11.
A crucial problem in kernel density estimation of a probability density function is the selection of the bandwidth. The aim of this study is to propose a procedure for selecting both fixed and variable bandwidths. The study also addresses how different variable-bandwidth kernel estimators perform in comparison with each other and with fixed-bandwidth estimators. Algorithms for implementing the proposed method are given along with a numerical simulation. The numerical results serve as a guide to which bandwidth selection method is most appropriate for a given type of estimator over a wide class of probability density functions. We also obtain a numerical comparison of the different types of kernel estimators under various types of bandwidths.
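The fixed-versus-variable contrast can be illustrated with two standard estimators, not the paper's own procedure: a Gaussian KDE with Silverman's rule-of-thumb bandwidth, and an Abramson-style adaptive KDE whose per-point bandwidths scale as a pilot density to the power -1/2. Both constructions are classical; the function names are illustrative.

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u * u) / np.sqrt(2 * np.pi)

def kde_fixed(x, data, h=None):
    """Fixed-bandwidth Gaussian KDE; h defaults to Silverman's rule."""
    data = np.asarray(data, float)
    if h is None:
        h = 1.06 * data.std() * len(data) ** (-0.2)
    return gauss((x - data[:, None]) / h).mean(0) / h

def kde_variable(x, data, alpha=0.5):
    """Abramson-style adaptive KDE with local bandwidths h * (pilot/g)^(-alpha)."""
    data = np.asarray(data, float)
    h = 1.06 * data.std() * len(data) ** (-0.2)
    pilot = kde_fixed(data, data, h)          # pilot density at the data points
    g = np.exp(np.log(pilot).mean())          # geometric mean of pilot values
    hi = h * (pilot / g) ** (-alpha)          # wider kernels where density is low
    return (gauss((x - data[:, None]) / hi[:, None]) / hi[:, None]).mean(0)
```

The adaptive version widens kernels in the tails, where data are sparse, which is exactly the regime where fixed-bandwidth estimators tend to undersmooth.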

12.
This article modifies two internal validity measures and applies them to evaluate clustering quality for probability density functions (pdfs). Based on these measures, we propose a modified genetic algorithm, GA-CDF, to establish suitable clusters of pdfs. The proposed algorithm is tested on four numerical examples, including two synthetic data sets and two real data sets. These examples illustrate the superiority of the proposed algorithm over some existing algorithms on internal and external validity measures, demonstrating the feasibility and applicability of GA-CDF for practical problems in data mining.

13.

Generating-function-based statistical inference is an attractive approach when the probability (density) function is complicated compared with the generating function. Here, we propose a parameter estimation method that minimizes a probability generating function (pgf)-based power divergence with a tuning parameter to mitigate the impact of data contamination. The proposed estimator is linked to M-estimators and hence possesses consistency and asymptotic normality. In terms of parameter biases and mean squared errors in simulations, the proposed method performs better for smaller values of the tuning parameter as the data contamination percentage increases.
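The general idea of matching an empirical pgf to a parametric one can be sketched for a Poisson model. The code below minimizes an integrated squared distance between the empirical pgf and the Poisson pgf over a grid of candidate parameters; this squared-distance criterion is a simplified stand-in for the paper's power divergence with a tuning parameter, and `pgf_distance_estimate` is a hypothetical name.

```python
import numpy as np

def pgf_distance_estimate(x, thetas, tgrid=None):
    """Estimate a Poisson mean by matching the empirical pgf.

    The empirical pgf is g_hat(t) = mean(t^X); the Poisson pgf is
    exp(theta * (t - 1)). We minimize the summed squared distance over
    t in [0, 1] across a grid of candidate theta values.
    """
    x = np.asarray(x)
    if tgrid is None:
        tgrid = np.linspace(0.0, 1.0, 101)
    ghat = np.mean(tgrid[None, :] ** x[:, None], axis=0)
    best, best_loss = None, np.inf
    for th in thetas:
        loss = np.sum((ghat - np.exp(th * (tgrid - 1.0))) ** 2)
        if loss < best_loss:
            best, best_loss = th, loss
    return best
```

Because the empirical pgf averages bounded quantities t^X for t in [0, 1], a few large contaminating observations have limited influence on it, which is the intuition behind the robustness of pgf-based estimation.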

14.
Quantile regression is a flexible approach to assessing covariate effects on failure time and has attracted considerable interest in survival analysis. When the dimension of the covariates is much larger than the sample size, feature screening and variable selection become extremely important and indispensable. In this article, we introduce a new feature screening method for ultrahigh-dimensional censored quantile regression. The proposed method works for a general class of survival models, allows for heterogeneity in the data, and enjoys desirable properties, including the sure screening and ranking consistency properties. An iterative version of the screening algorithm is also proposed to accommodate more complex situations. Monte Carlo simulation studies evaluate the finite-sample performance under different model settings, and we illustrate the proposed methods through an empirical analysis.

15.
This paper addresses probability density estimation in the presence of covariates when data are missing at random (MAR). The inverse probability weighted method is used to define nonparametric and semiparametric weighted probability density estimators, and a regression calibration technique is used to define an imputed estimator. All the estimators are shown to be asymptotically normal with the same asymptotic variance as the inverse probability weighted estimator with known selection probability function and weights. We also establish mean squared error (MSE) bounds and obtain MSE convergence rates. A simulation assesses the proposed estimators in terms of bias and standard error.
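In the simplest case, with the selection probabilities known, the inverse-probability-weighted density estimator reweights each observed response by 1/pi before kernel smoothing. The sketch below implements that case only; `ipw_kde` is an illustrative name, and the semiparametric and imputed estimators from the abstract are not reproduced.

```python
import numpy as np

def ipw_kde(x, y, delta, pi_w, h):
    """Inverse-probability-weighted Gaussian KDE for Y missing at random.

    y: responses (any placeholder where delta == 0), delta: 1 if Y is
    observed, pi_w: known selection probabilities P(delta = 1 | covariates).
    Implements f_hat(x) = (1/n) * sum_i (delta_i / pi_i) * K_h(x - y_i).
    """
    w = delta / pi_w                       # zero weight kills missing entries
    u = (x - y[:, None]) / h
    k = np.exp(-0.5 * u * u) / np.sqrt(2 * np.pi)
    return (w[:, None] * k).mean(0) / h
```

Because E[delta/pi] = 1 under MAR, the weights restore the full-sample average in expectation, which is why the estimator remains consistent despite the missingness.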

16.

The coefficient of variation is one of the most commonly used statistical tools across scientific fields. This paper proposes using the sample coefficient of variation to define the polynomial probability density function (pdf) of a continuous, symmetric random variable on the interval [a, b]. The basic idea behind the first proposed algorithm is to transform the interval [a, b] to [0, b-a]. The chi-square goodness-of-fit test is used to compare the proposed (observed) sample distribution with the expected probability distribution, and the experimental results show that the collected data are well approximated by the proposed pdf. The second algorithm offers a fast estimate of the degree of the polynomial pdf when the random variable is normally distributed. Using the known percentages of values that lie within one, two, and three standard deviations of the mean (the three-sigma rule of thumb), we conclude that the degree of the polynomial pdf takes values between 1.8127 and 1.8642. For a Laplace(μ, b) distribution, the degree of the polynomial pdf takes values greater than 1. All calculations and graphs are produced with the statistical software R.

17.
With the recent explosion of scientific data of unprecedented size and complexity, feature ranking and screening are playing an increasingly important role in many scientific studies. In this article, we propose a novel feature screening procedure under a unified model framework that covers a wide variety of commonly used parametric and semiparametric models. The new method does not require imposing a specific model structure on the regression functions, and thus is particularly appealing for ultrahigh-dimensional regressions, where there are a huge number of candidate predictors but little information about the actual model form. We demonstrate that, with the number of predictors growing at an exponential rate in the sample size, the proposed procedure possesses ranking consistency, which is useful in its own right and can lead to selection consistency. The procedure is computationally efficient and simple, and exhibits competent empirical performance in our intensive simulations and real data analysis.

18.
The proportional hazards model (Cox, 1972) is reviewed for the case of grouped data with one continuously measured covariate. This leads to a logit-rank procedure for tied data which reduces to the test proposed by O'Brien (1978) and studied by O'Quigley and Prentice (1991) in the absence of ties. The proposed test is then applied to a special ranking method in order to study non-monotonic associations.

19.
In order to explore and compare a finite number T of data sets by applying functional principal component analysis (FPCA) to the T associated probability density functions, we estimate these densities using the multivariate kernel method. With the data set sizes fixed, we study the behaviour of this FPCA under the assumption that all the bandwidth matrices used in the density estimation are proportional to a common parameter h and proportional to either the variance matrices or the identity matrix. In this context, we propose a selection criterion for the parameter h that depends only on the data and the FPCA method. On simulated examples, we then compare the quality of the FPCA approximation when the bandwidth matrices are selected using this criterion or two classical bandwidth selection methods, namely plug-in and cross-validation.

20.
Many methods based on ranked set sampling (RSS) assume perfect ranking of the samples. Here, using data measured under a balanced RSS scheme, we propose a nonparametric test of the perfect-ranking assumption. The test statistic formally corresponds to a Jonckheere-Terpstra-type statistic. We show formal relations of the proposed test to other methods recently proposed in the literature, and through an empirical power study we demonstrate that it performs favorably compared with many of its competitors.
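The classical Jonckheere-Terpstra statistic underlying the test counts concordant pairs across the ordered rank strata: under perfect ranking, observations from higher rank strata tend to exceed those from lower ones. The sketch below computes the plain statistic (a brute-force O(n^2) version, not the paper's RSS-specific procedure or its null calibration).

```python
def jt_statistic(groups):
    """Jonckheere-Terpstra statistic for groups in a given order.

    Over all group pairs i < j, counts pairs (x, y) with x from group i
    and y from group j, scoring 1 for x < y and 0.5 for ties. Large
    values support the ordered (perfect-ranking) alternative.
    """
    jt = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    jt += 1.0 if x < y else (0.5 if x == y else 0.0)
    return jt
```

For a balanced RSS sample, the groups would be the measurements from each rank stratum; perfectly separated strata drive the statistic to its maximum, the total number of cross-stratum pairs.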


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号