Similar literature
 20 similar documents found
1.
Ensemble learning has shown great power in stabilizing and enhancing the performance of traditional variable selection methods such as the lasso and genetic algorithms. In this paper, a novel bagging ensemble method called BSSW is developed to implement variable ranking and selection in linear regression models. Its main idea is to execute a stepwise search algorithm on multiple bootstrap samples. In each trial, a mixed importance measure is assigned to each variable according to the order in which it is selected into the final model as well as the improvement in model fit resulting from its inclusion. Based on the importance measure averaged across the bootstrap trials, all candidate variables are ranked and then classified as important or not. To broaden its scope of application, BSSW is extended to generalized linear models. Experiments carried out with simulated and real data indicate that BSSW achieves better performance in most studied cases when compared with several other existing methods.

2.
The calculation of the permanent of a matrix is an extremely difficult task. Indeed it belongs to the class of hard counting problems denoted #P complete and hence any algorithm to compute the permanent must run in exponential time in the order of the matrix. Attention, therefore, has concentrated on Monte Carlo algorithms to estimate permanents with considerable emphasis on deriving randomised polynomial time algorithms. Interest in this area has largely stemmed from problems in combinatorial enumeration, for instance the permanent of a square (0,1) matrix gives the number of perfect matchings in a bipartite graph.
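The link between permanents and perfect matchings mentioned above can be illustrated with a minimal sketch. It evaluates the permanent directly from its definition (a sum over all permutations), so it is only practical for very small matrices and is not one of the Monte Carlo estimators the abstract refers to. The function name `permanent` and the example matrix are assumptions made for illustration.

```python
from itertools import permutations


def permanent(matrix):
    """Permanent from its definition: sum over permutations of entry products.

    Brute force, O(n! * n) time; intended only for tiny illustrative matrices.
    """
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        product = 1
        for row, col in enumerate(sigma):
            product *= matrix[row][col]
            if product == 0:
                break  # this permutation contributes nothing
        total += product
    return total


# Hypothetical biadjacency matrix of a bipartite graph with 3 + 3 vertices:
# entry [i][j] is 1 when left vertex i is joined to right vertex j.
biadjacency = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]

# Each permutation whose product is 1 selects one edge per left vertex with
# all right vertices distinct, i.e. a perfect matching.
print(permanent(biadjacency))
```

For a (0,1) matrix every nonzero term in the sum equals 1, so the permanent is exactly the count of perfect matchings of the corresponding bipartite graph.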

3.
Tree-structured methods for exploratory data analysis have previously been extended to right-censored survival data. We further extend these methods to allow for truncation and time-dependent covariates. We apply the new methods to a data set on incubation times of acquired immunodeficiency syndrome (AIDS), using calendar time as a time-dependent covariate. Contrary to expectation, we find that rates of progression to AIDS appear to be faster after August 1989 than before.

4.
The main models of machine learning are briefly reviewed and considered for building a classifier to identify the Fragile X Syndrome (FXS). We have analyzed 172 patients potentially affected by FXS in Andalusia (Spain) and, by means of a DNA test, each member of the data set is known to belong to one of two classes: affected, not affected. The whole predictor set, formed by 40 variables, and a reduced set with only nine predictors significantly associated with the response are considered. Four alternative base classification models have been investigated: logistic regression, classification trees, multilayer perceptron and support vector machines. For both predictor sets, the best accuracy, considering both the mean and the standard deviation of the test error rate, is achieved by the support vector machines, confirming the increasing importance of this learning algorithm. Three ensemble methods - bagging, random forests and boosting - were also considered, amongst which the bagged versions of support vector machines stand out, especially when they are constructed with the reduced set of predictor variables. The analysis of the sensitivity, the specificity and the area under the ROC curve agrees with the main conclusions extracted from the accuracy results. All of these models can be fitted by free R programs.

5.
This study considers the nonparametric estimation of a regression function when the response variable is the waiting time between two consecutive events of a stationary renewal process, and where this variable is not completely observed. In these circumstances, our data are the recurrence times from the occurrence of the last event up to a pre-established time, along with the corresponding values of a certain set of covariates. Estimation of the error density function and some of its characteristics are also considered. For the proposed estimators, we first analyze their asymptotic behavior and, thereafter, carry out a simulation study to highlight their behavior in finite samples. Finally, we apply this methodology to an illustrative example with biomedical data.

6.
We present theoretical results on the covariance structure of random wavelet coefficients. We use simple properties of the coefficients to derive a recursive way to compute the within- and across-scale covariances. We point out a useful link between the algorithm proposed and the two-dimensional discrete wavelet transform. We then focus on Bayesian wavelet shrinkage for estimating a function from noisy data. A prior distribution is imposed on the coefficients of the unknown function. We show how our findings on the covariance structure make it possible to specify priors that take into account the full correlation between coefficients through a parsimonious number of hyperparameters. We use Markov chain Monte Carlo methods to estimate the parameters and illustrate our method on benchmark simulated signals.

7.
The spatial sampling designs suggested by Quenouille (1949) are investigated under a number of trend assumptions, namely a linear trend, linear trend and periodic variation, and spatially correlated populations. The results obtained provide a planar analogue to the one-dimensional results appearing in Cochran (1977, Ch. 8). Centrally located systematic sampling and the planar analogue of Yates' (1948) method of end corrections are put forward as methods which eliminate the linear trend. The comparisons of the two methods provide the planar analogue to results obtained by Bellhouse and Rao (1975).

8.
A system of predictors for estimating a finite population variance is defined and shown to be asymptotically design-unbiased (ADU) and asymptotically design-consistent (ADC) under probability sampling. An asymptotic mean squared error (MSE) of a generalized regression-type predictor, generated from the system, is obtained. The suggested predictor attains the minimum expected variance of any design-unbiased estimator when the superpopulation model is correct. The generalized regression-type predictor and the predictor suggested by Mukhopadhyay (1990) are compared.

9.
The objective of this paper is to study the efficiency of the sampling schemes suggested by Hosmer (1973), termed models M1 and M2, relative to regular random sampling, termed model M0, when samples are drawn from a population having the Inverse Gaussian-Weibull (IG-W) mixture distribution.

It is shown that, whether efficiency is based on the relative variances of the maximum likelihood estimates (MLEs) of the components of the parameter vector or on the generalized variances of the MLEs of that vector, Hosmer's models M1 and M2 perform better than model M0.

10.
Motivated by molecular biology, there has been an upsurge of research activity in directional statistics in general and its Bayesian aspect in particular. The central distribution for the circular case is the von Mises distribution, which has two parameters (mean and concentration) akin to the univariate normal distribution. However, it has been a challenge to sample efficiently from the posterior distribution of the concentration parameter. We describe a novel, highly efficient algorithm to sample from the posterior distribution and fill this long-standing gap.
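For reference, the standard form of the von Mises density is given below; the symbols θ (angle), μ (mean direction) and κ (concentration) are notation chosen here and may differ from the paper's own. The normalizing constant involves the modified Bessel function I_0(κ), which is a plausible reason why the posterior of κ has no standard form and is awkward to sample from.

```latex
f(\theta \mid \mu, \kappa)
  = \frac{\exp\{\kappa \cos(\theta - \mu)\}}{2\pi I_0(\kappa)},
  \qquad 0 \le \theta < 2\pi,\quad \kappa \ge 0,
```

where I_0 denotes the modified Bessel function of the first kind of order zero.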

11.
The demand for reliable statistics in subpopulations, when only reduced sample sizes are available, has promoted the development of small area estimation methods. In particular, an approach that is now widely used is based on the seminal work by Battese et al. [An error-components model for prediction of county crop areas using survey and satellite data, J. Am. Statist. Assoc. 83 (1988), pp. 28–36] that uses linear mixed models (MM). We investigate alternatives when a linear MM does not hold because, on one side, linearity may not be assumed and/or, on the other, normality of the random effects may not be assumed. In particular, Opsomer et al. [Nonparametric small area estimation using penalized spline regression, J. R. Statist. Soc. Ser. B 70 (2008), pp. 265–283] propose an estimator that extends the linear MM approach to the case in which a linear relationship may not be assumed using penalized splines regression. From a very different perspective, Chambers and Tzavidis [M-quantile models for small area estimation, Biometrika 93 (2006), pp. 255–268] have recently proposed an approach for small-area estimation that is based on M-quantile (MQ) regression. This allows for models robust to outliers and to distributional assumptions on the errors and the area effects. However, when the functional form of the relationship between the qth MQ and the covariates is not linear, it can lead to biased estimates of the small area parameters. Pratesi et al. [Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US, Environmetrics 19(7) (2008), pp. 687–701] apply an extended version of this approach for the estimation of the small area distribution function using a non-parametric specification of the conditional MQ of the response variable given the covariates [M. Pratesi, M.G. Ranalli, and N. Salvati, Nonparametric m-quantile regression using penalized splines, J. Nonparametric Stat. 21 (2009), pp. 287–304]. 
We derive the small area estimator of the mean under this model, together with its mean-squared error estimator, and compare its performance to the other estimators via simulations on both real and simulated data.

12.
This paper develops an algorithm for the optimization design of the adaptive T² control chart for monitoring the mean vector of a multivariate normal process. It includes the variable sample size, variable sampling interval and variable dimension charts. The VDT2 control chart performs well for moderate and large shifts in the mean vector. However, its performance for small shifts is poor. To improve the chart's performance in detecting such shifts, we propose the application of the variable sample size and sampling interval technique to the VDT2 control chart, resulting in the VSSIVDT2 control chart.

13.
14.
This paper considers the estimation of the regression parameters of a general probit regression model. Accordingly, we propose five ridge regression (RR) estimators for probit regression models for estimating the parameters β when the weighted design matrix is ill-conditioned and it is suspected that the parameter β may belong to a linear subspace defined by Hβ = h. Asymptotic properties of the estimators are studied with respect to quadratic biases, MSE matrices and quadratic risks. The regions of optimality of the proposed estimators are determined based on the quadratic risks. Some relative efficiency tables and risk graphs are provided to illustrate the numerical comparison of the estimators. We conclude that when q ≥ 3, one would use PRRRE; otherwise one uses PTRRE with some optimum size α. We also discuss the performance of the proposed estimators compared with the alternative ridge regression method due to Liu (1993).

15.
The distribution of the sample correlation coefficient is derived when the population is a mixture of two bivariate normal distributions with zero mean but different covariances and mixing proportions 1 - λ and λ respectively; λ will be called the proportion of contamination. The test of ρ = 0 based on Student's t, Fisher's z, arcsine, or Ruben's transformation is shown numerically to be nonrobust when λ, the proportion of contamination, lies between 0.05 and 0.50 and the contaminated population has 9 times the variance of the standard (bivariate normal) population. These tests are also sensitive to the presence of outliers.
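The contamination setup described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the paper's derivation: both mixture components are taken to have uncorrelated coordinates (so ρ = 0 holds), the contaminating component has 3 times the standard deviation (9 times the variance) in each coordinate, and the function names `sample_correlation`, `t_statistic` and `contaminated_sample` are hypothetical.

```python
import math
import random


def sample_correlation(xs, ys):
    """Pearson sample correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Student's t statistic for testing rho = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def contaminated_sample(n, lam, rng, sd_ratio=3.0):
    """Draw n pairs from a two-component zero-mean bivariate normal mixture.

    Assumption: with probability lam a pair comes from the contaminating
    component, whose coordinates have sd_ratio times the standard deviation
    (sd_ratio ** 2 times the variance); coordinates are uncorrelated in both.
    """
    xs, ys = [], []
    for _ in range(n):
        sd = sd_ratio if rng.random() < lam else 1.0
        xs.append(rng.gauss(0.0, sd))
        ys.append(rng.gauss(0.0, sd))
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
xs, ys = contaminated_sample(30, 0.10, rng)
r = sample_correlation(xs, ys)
print(r, t_statistic(r, len(xs)))
```

Repeating the draw many times and recording how often |t| exceeds the nominal critical value would give an empirical rejection rate to compare against the nominal level.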

16.
Summary.  The paper presents a statistical analysis of patterns in the incidence of disciplinary sanction (yellow and red cards) that were taken against players in the English Premier League over the period 1996–2003. Several questions concerning sources of inconsistency and bias in refereeing standards are examined. Evidence is found to support a time consistency hypothesis, that the average incidence of disciplinary sanction is predominantly stable over time. However, a refereeing consistency hypothesis, that the incidence of disciplinary sanction does not vary between referees, is rejected. The tendency for away teams to incur more disciplinary points than home teams cannot be attributed to the home advantage effect on match results and appears to be due to a refereeing bias favouring the home team.

17.
ABSTRACT

In this article, we propose a method to estimate the common location and common scale parameters of several distributions using suitably defined ranked set sampling. The efficiency of the resulting estimators is compared with that of some standard estimators. The results are also illustrated with real-life data sets.

18.
This article proposes a CV chart by using the variable sample size and sampling interval (VSSI) feature to improve the performance of the basic CV chart, for detecting small and moderate shifts in the CV. The proposed VSSI CV chart is designed by allowing the sample size and the sampling interval to vary. The VSSI CV chart's statistical performance is measured by using the average time to signal (ATS) and expected average time to signal (EATS) criteria and is compared with that of existing CV charts. The Markov chain approach is employed in the design of the chart.

19.
This paper discusses a model in which the regression lines pass through a common point. This point exists as a focal point in the wind-blown sand phenomena. The model of regression lines will be called ‘the focal point regression model’. The focal point will move according to the conditions of the experiments or the measurement site, so it must be estimated together with the regression coefficients. The existence of the focal point is mathematically proved in the research field of coastal engineering, but its physical meaning and exact estimation method have not been established. Considering the experimental and/or measurement conditions, five models are proposed, that is, common or different error variance(s), passing through the centroid or not, and a Bayes-like approach. Moreover, formulae for direct computation of the focal point under some conditions are given for engineering purposes. The models are applied to the wind-blown sand data, and the behavior of the models is verified by numerical experiments.

20.
One of the surprising decision-theoretic results Charles Stein discovered is the inadmissibility of the uniformly minimum variance unbiased estimator (UMVUE) of the variance of a normal distribution with an unknown mean. Some methods for deriving estimators better than the UMVUE were given by Stein, Brown, and Brewster and Zidek. Recently Kubokawa established a novel approach, called the IERD method, which yields a unified class of improved estimators that includes these earlier procedures. This paper reviews this series of decision-theoretic developments and surveys the study of the variance-estimation problem from various aspects. Related to this issue, the paper enumerates several topics in which the usual plain estimators need to be shrunken or modified, and gives reasonable procedures improving on the usual ones through the IERD method.
