Similar Articles

20 similar articles found.
1.
Recently, the methods used to estimate monotonic regression (MR) models have been substantially improved, and some algorithms can now produce high-accuracy monotonic fits to multivariate datasets containing over a million observations. Nevertheless, the computational burden can be prohibitively large for resampling techniques in which numerous datasets are processed independently of each other. Here, we present efficient algorithms for estimation of confidence limits in large-scale settings that take into account the similarity of the bootstrap or jackknifed datasets to which MR models are fitted. In addition, we introduce modifications that substantially improve the accuracy of MR solutions for binary response variables. The performance of our algorithms is illustrated using data on death in coronary heart disease for a large population. This example also illustrates that MR can be a valuable complement to logistic regression.
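The entry does not reproduce the authors' algorithms. As a rough sketch of the kind of computation involved, the following snippet bootstraps pointwise confidence limits for a univariate monotonic fit with scikit-learn's IsotonicRegression; the paper's methods are multivariate and exploit the similarity between resampled datasets, which this naive refit-per-resample version does not.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 500))
y = np.log1p(x) + rng.normal(0, 0.3, 500)            # monotone signal plus noise

grid = np.linspace(0, 10, 101)
fits = np.empty((200, grid.size))
for b in range(200):                                 # naive bootstrap: refit each resample
    idx = rng.integers(0, x.size, x.size)
    ir = IsotonicRegression(out_of_bounds="clip")
    ir.fit(x[idx], y[idx])
    fits[b] = ir.predict(grid)

lo, hi = np.percentile(fits, [2.5, 97.5], axis=0)    # pointwise 95% confidence limits
```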

2.
Optimal designs for copula models
E. Perrone 《Statistics》2016,50(4):917-929
Copula modelling has in the past decade become a standard tool in many areas of applied statistics. However, a largely neglected aspect concerns the design of the related experiments, particularly whether the estimation of copula parameters can be enhanced by optimizing the experimental conditions, and how robust the parameter estimates for the model are with respect to the type of copula employed. In this paper, an equivalence theorem for (bivariate) copula models is provided that allows the formulation of efficient design algorithms and quick checks of whether designs are optimal, or at least efficient. Some examples illustrate that considerable gains in design efficiency can be achieved in practical situations. A natural comparison between different copula models with respect to design efficiency is provided as well.
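As a hedged illustration of the kind of check an equivalence theorem enables, the sketch below verifies D-optimality for an ordinary linear model, where a design is D-optimal iff the sensitivity d(x, xi) = f(x)^T M(xi)^{-1} f(x) never exceeds the number of parameters p; the paper applies the same logic with the Fisher information of a copula model in place of f(x)f(x)^T.

```python
import numpy as np

def f(x):                            # regression functions of a quadratic model
    return np.array([1.0, x, x * x])

def d_optimality_check(points, weights, candidates):
    """Equivalence-theorem check: xi is D-optimal iff max_x d(x, xi) <= p."""
    M = sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))
    Minv = np.linalg.inv(M)
    sens = max(f(x) @ Minv @ f(x) for x in candidates)
    return sens, M.shape[0]

# the classical D-optimal design for a quadratic on [-1, 1]: mass 1/3 on {-1, 0, 1}
d_max, p = d_optimality_check([-1.0, 0.0, 1.0], [1/3, 1/3, 1/3], np.linspace(-1, 1, 401))
print(f"max sensitivity {d_max:.3f} vs p = {p}")     # ~3.0, so the design is optimal
```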

3.
The experimental design literature has produced a wide range of algorithms optimizing estimator variance for linear models where the design space is finite or a convex polytope, but these methods have problems handling nonlinear constraints or constraints over multiple treatments. This paper presents Newton-type algorithms to compute exact optimal designs in models with continuous and/or discrete regressors, where the set of feasible treatments is defined by nonlinear constraints. We carry out numerical comparisons with other state-of-the-art methods to show the performance of this approach.
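The authors' Newton-type algorithms are not reproduced here. A minimal stand-in for the problem they solve, using scipy's SLSQP: place n exact design points to maximize the log-determinant of the information matrix while a nonlinear constraint (a unit disc, chosen purely for illustration) defines the feasible treatments.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):                                   # two regressors, linear model with interaction
    return np.array([1.0, x[0], x[1], x[0] * x[1]])

def neg_logdet(flat, n=6):
    pts = flat.reshape(n, 2)
    M = sum(np.outer(f(p), f(p)) for p in pts) / n
    sign, logdet = np.linalg.slogdet(M + 1e-9 * np.eye(4))   # ridge guards singularity
    return -logdet

n = 6
rng = np.random.default_rng(1)
x0 = rng.uniform(-1, 1, 2 * n)
cons = [{"type": "ineq",                    # nonlinear feasibility: each point in unit disc
         "fun": lambda flat, i=i: 1.0 - np.sum(flat.reshape(-1, 2)[i] ** 2)}
        for i in range(n)]
res = minimize(neg_logdet, x0, method="SLSQP", constraints=cons)
print(res.x.reshape(n, 2))                  # locally optimal exact design points
```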

4.
Three main types of algorithms exist in the literature for computing nonparametric maximum likelihood estimates (NPMLEs) of mixing distributions: EM-type algorithms; vertex direction algorithms such as VDM and VEM; and algorithms based on general constrained optimization techniques, such as the projected gradient method. It is known that the projected gradient algorithm may run into stagnation during iterations, and when stagnation occurs, VDM steps need to be added. We argue that the abrupt switch to VDM steps can significantly reduce the efficiency of the projected gradient algorithm and is usually unnecessary. In this paper, we define a group of partially projected directions, which can be regarded as hybrids of ordinary projected gradient directions and VDM directions. Based on these directions, four new algorithms are proposed for computing NPMLEs of mixing distributions. The properties of the algorithms are discussed and their convergence is proved. Extensive numerical simulations show that the new algorithms outperform the existing methods, especially when an NPMLE has a large number of support points or when high accuracy is required.
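For orientation, a minimal fixed-grid version of the NPMLE problem: the classical EM fixed-point update for the mixing weights, together with the directional derivative that VDM-type steps are built on (nonpositive everywhere at the NPMLE). The partially projected directions proposed in the paper are not shown.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 200)])  # mixed data
theta = np.linspace(-5, 5, 201)                  # candidate support grid
L = norm.pdf(x[:, None], loc=theta[None, :])     # L[i, j] = phi(x_i - theta_j)

w = np.full(theta.size, 1.0 / theta.size)
for _ in range(500):                             # EM fixed-point update for the weights
    g = L @ w                                    # mixture density at each observation
    w *= (L / g[:, None]).mean(axis=0)

grad = (L / (L @ w)[:, None]).sum(axis=0) - x.size   # VDM directional derivative
print("max gradient (<= 0 at the NPMLE):", grad.max())
```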

5.
When a new product is the result of design and/or process improvements introduced in its predecessors, the past failure data and expert technical knowledge constitute a valuable source of information that can lead to a more accurate reliability estimate of the upgraded product. This paper proposes a Bayesian procedure to formalize the prior information available about the failure probability of an upgraded automotive component. The elicitation process makes use of the failure data of the past product, the designer's information on the effectiveness of planned design/process modifications, information on the actual working conditions of the upgraded component and, for outsourced components, technical knowledge on the effect of possible cost reductions. Using the proposed procedure, more accurate estimates of the failure probability can be obtained. The number of failed items in a future population of vehicles is also predicted, to measure the effect of a possible extension of the warranty period. Finally, the proposed procedure is applied to a case study, illustrating its feasibility in supporting reliability estimation.
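The paper's elicitation process is considerably richer, but its core Bayesian update can be sketched as follows: a Beta prior built from the predecessor's field data, discounted by a hypothetical design-improvement factor, updated with new field data, and pushed through to a posterior predictive failure count. All numbers are illustrative.

```python
import numpy as np

# prior from the predecessor: 40 failures in 10,000 units, with the failure pseudo-count
# discounted by an (assumed) improvement factor for the planned modifications
improvement = 0.6                        # hypothetical designer judgement
a0, b0 = 40 * improvement, 10_000 - 40

f_new, n_new = 12, 5_000                 # early field data on the upgraded component
a, b = a0 + f_new, b0 + (n_new - f_new)  # Beta posterior for the failure probability

rng = np.random.default_rng(3)
p = rng.beta(a, b, 100_000)
future = rng.binomial(200_000, p)        # posterior predictive failures in 200k vehicles
print(np.percentile(future, [5, 50, 95]))
```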

6.
A tutorial on support vector regression
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
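A minimal usage sketch of epsilon-SVR with an RBF kernel via scikit-learn, exposing the quantities the tutorial discusses (the trade-off constant C, the tube width epsilon, and the kernel); this is an illustration, not code from the tutorial.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, (200, 1))
y = np.sinc(X).ravel() + rng.normal(0, 0.1, 200)

# C trades off flatness against tube violations; epsilon sets the insensitive tube width
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, y)
print(model.predict([[0.5]]))
```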

7.
Computer models can describe complicated phenomena encountered in science and engineering fields. To use these models for scientific investigation, however, their generally long running time and mostly deterministic nature require a specially designed experiment. The column-orthogonal design (COD) is a popular choice for computer experiments. Because of the restriction of orthogonality, however, only a few CODs can be constructed. In this article, we propose two algorithms for constructing nearly CODs by rotating orthogonal arrays under two different criteria. Further, some of the obtained nearly CODs are also nearly Latin hypercube designs. Some examples are provided to show the advantages of our algorithms, and some rotation matrices obtained via the algorithms are listed.
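Constructing CODs by rotating orthogonal arrays is beyond a snippet, but the near-orthogonality being optimized can be sketched. The criterion below (maximum and mean absolute column correlation, an assumed but common choice) is zero exactly when a design is column-orthogonal.

```python
import numpy as np

def orthogonality(design):
    """Max and mean absolute off-diagonal column correlation of a design matrix."""
    R = np.corrcoef(design, rowvar=False)
    off = np.abs(R[~np.eye(R.shape[0], dtype=bool)])
    return off.max(), off.mean()

rng = np.random.default_rng(5)
lhs = np.argsort(rng.random((32, 6)), axis=0)   # a random Latin hypercube design
print(orthogonality(lhs))                       # (0, 0) would be exactly column-orthogonal
```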

8.
Important empirical information on household behavior and finances is obtained from surveys, and these data are used heavily by researchers and central banks, and for policy consulting. However, various interdependent factors that can be controlled only to a limited extent lead to unit and item nonresponse, and missing data on certain items is a frequent source of difficulties in statistical practice. More than ever, it is important to explore techniques for the imputation of large survey data. This paper presents the theoretical underpinnings of a Markov chain Monte Carlo multiple imputation procedure and outlines important technical aspects of the application of MCMC-type algorithms to large socio-economic data sets. In an illustrative application it is found that MCMC algorithms have good convergence properties even on large data sets with complex patterns of missingness, and that the use of a rich set of covariates in the imputation models has a substantial effect on the distributions of key financial variables.
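The paper's MCMC procedure is not reproduced here; as a widely available stand-in under similar assumptions, chained-equations multiple imputation with posterior draws (scikit-learn's IterativeImputer) produces m completed datasets whose between-imputation spread carries the missing-data uncertainty.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0, 0, 0],
                            [[1, .6, .3], [.6, 1, .5], [.3, .5, 1]], 1000)
X[rng.random(X.shape) < 0.15] = np.nan           # 15% of values missing at random

completed = [
    IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    for m in range(5)                            # m = 5 multiple imputations
]
print(np.std([c.mean(axis=0) for c in completed], axis=0))  # between-imputation spread
```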

9.
In random sampling from a finite population, when the correlation between the auxiliary characteristic and the main characteristic is negative, the product estimator is often used to estimate the population mean. The product estimator, however, has a large mean squared error (MSE) when the coefficients of variation of the two characteristics are large and the absolute value of the correlation between them is small. In this paper, we propose a general family of modified product estimators that includes the product estimator as a special case. We discuss the reduction in MSE achieved by using the optimal modified product estimator, the member of the proposed family with minimal MSE. In certain situations, these reductions in MSE can be significant.
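For reference, the classical (unmodified) product estimator under simple random sampling and its first-order MSE, which the proposed family generalizes; the notation is the textbook one, not taken from the paper:

\[
\hat{\bar{Y}}_{P} = \bar{y}\,\frac{\bar{x}}{\bar{X}}, \qquad
\operatorname{MSE}\bigl(\hat{\bar{Y}}_{P}\bigr) \approx
\frac{1-f}{n}\,\bar{Y}^{2}\bigl(C_{y}^{2}+C_{x}^{2}+2\rho\,C_{x}C_{y}\bigr),
\]

where \(f = n/N\) is the sampling fraction, \(C_{x}\) and \(C_{y}\) are the coefficients of variation, and \(\rho < 0\) is the correlation. Large coefficients of variation combined with a small \(|\rho|\) inflate the bracketed term, which is exactly the situation the modified family targets.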

10.
A paint manufacturing company was facing the problem of vehicle separation and settling in one of its prime products. These two abnormalities are, in general, opposing in nature. The manufacturer tried several modifications of the existing recipe for the product but failed to control these abnormalities. Experimentation was carried out using a mixture design, a special type of designed experiment, and quadratic response surface models were fitted for both responses. The optimum formulation was then obtained by simultaneously optimizing the two response surface models, with different methods compared during its determination. The optimum formulation is currently being used for regular manufacturing.
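The actual recipe and fitted models are proprietary, so the sketch below only mirrors the final optimization step, with hypothetical quadratic response surfaces for the two defects minimized jointly over a three-component mixture simplex.

```python
import numpy as np
from scipy.optimize import minimize

def y_separation(x):   # hypothetical fitted quadratic response surface
    return 3.0*x[0] + 2.0*x[1] + 5.0*x[2] - 4.0*x[0]*x[1] - 6.0*x[1]*x[2]

def y_settling(x):     # hypothetical fitted quadratic response surface
    return 1.0*x[0] + 4.0*x[1] + 2.0*x[2] - 3.0*x[0]*x[2]

objective = lambda x: y_separation(x) + y_settling(x)    # minimize both defects
cons = [{"type": "eq", "fun": lambda x: x.sum() - 1.0}]  # mixture constraint
res = minimize(objective, x0=np.array([1/3, 1/3, 1/3]), method="SLSQP",
               bounds=[(0, 1)] * 3, constraints=cons)
print(res.x, objective(res.x))
```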

11.
Many assumptions, including assumptions regarding treatment effects, are made at the design stage of a clinical trial for power and sample size calculations. It is desirable to check these assumptions during the trial using blinded data. Methods for sample size re-estimation based on blinded data analyses have been proposed for normal and binary endpoints. However, it has been debated whether a reliable estimate of the treatment effect can be obtained without unblinding in a typical clinical trial. In this paper, we consider the case of a survival endpoint and investigate the feasibility of estimating the treatment effect in an ongoing trial without unblinding. We incorporate information from a surrogate endpoint and investigate three estimation procedures: a classification method and two expectation-maximization (EM) algorithms. Simulations and a clinical trial example are used to assess the performance of the procedures. Our studies show that the EM algorithms depend strongly on the initial estimates of the model parameters. Despite the use of a surrogate endpoint, all three methods show large variation in the treatment effect estimates and hence fail to provide a precise conclusion about the treatment effect.

12.
李扬等 《统计研究》2018,35(7):125-128
The massive scale of data, the first defining characteristic of big data, poses the foremost computational challenge. A large sample cannot necessarily substitute for the whole population, so the design of algorithms for big-data analysis must consider not only reducing computational cost but also how to characterize the uncertainty of the resulting estimates. Taking the divide-and-conquer bootstrap and the subsampled double bootstrap as examples, this paper discusses the design of parallelizable big-data statistical algorithms that combine gains in computational efficiency with uncertainty assessment, and explores the underlying design ideas and future research directions through a comparative analysis.
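A hedged sketch of the subsampled double bootstrap idea the paper discusses: each replicate draws a small subset of size b << n and a single weighted resample of nominal size n over it, so a replicate touches only O(b) data while mimicking a full-size bootstrap draw, and the replicates are trivially parallel. Parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
x = rng.lognormal(size=n)                       # the "big" dataset
b = int(n ** 0.6)                               # subset size, b << n

reps = []
for _ in range(100):                            # independent rounds: trivially parallel
    sub = rng.choice(x, size=b, replace=False)
    w = rng.multinomial(n, np.full(b, 1 / b))   # one size-n resample as counts over the subset
    reps.append(np.average(sub, weights=w))     # weighted statistic ~ full bootstrap replicate

print(np.std(reps, ddof=1))                     # bootstrap SE of the mean
```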

13.
Several rule discovery algorithms in data mining can produce a large number of irrelevant or obvious rules from data, and substantial research in data mining has therefore addressed the issue of what makes rules truly 'interesting'. This has resulted in the development of a number of interestingness measures and algorithms that find all interesting rules from data. However, these approaches have the drawback that many of the discovered rules, while supposed to be interesting by definition, may actually (1) be obvious in that they logically follow from other discovered rules or (2) be expected given some of the other discovered rules and some simple distributional assumptions. In this paper we argue that this is a paradox, since rules that are supposed to be interesting are in reality uninteresting for the above reasons. We show that this paradox exists for various popular interestingness measures and present an abstract characterization of an approach to alleviate the paradox. Finally, we discuss existing work in data mining that addresses this issue and show how those approaches can be viewed with respect to the characterization presented here.

14.
This paper presents some considerations about numerical procedures for generating D-optimal designs in a finite design space. The influence of the starting procedure and of the finite set of candidate points on the design efficiency is considered, and some modifications of existing procedures for D-optimal design generation are described. It is shown that for a large number of factors, sequential procedures are more appropriate than nonsequential ones.
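A minimal sketch of a sequential (Wynn/Fedorov-type) procedure of the kind the paper favours: starting from a small design, repeatedly add the candidate point with the largest variance function d(x, xi). Details such as the random starting design and the ridge term are pragmatic choices, not the paper's.

```python
import numpy as np

def f(x):                                   # main-effects model in k factors
    return np.concatenate(([1.0], x))

def sequential_d_optimal(candidates, n_points, seed=11):
    rng = np.random.default_rng(seed)
    F = np.array([f(c) for c in candidates])
    p = F.shape[1]
    design = list(rng.choice(len(F), size=p, replace=False))  # random starting design
    for _ in range(n_points - p):
        M = F[design].T @ F[design] + 1e-8 * np.eye(p)        # ridge guards a singular start
        d = np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)  # variance function d(x, xi)
        design.append(int(d.argmax()))                        # add the worst-predicted point
    return sorted(design)

cand = np.array(np.meshgrid(*[[-1.0, 0.0, 1.0]] * 3)).reshape(3, -1).T  # 3^3 candidate grid
print(sequential_d_optimal(cand, n_points=12))
```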

15.
The problem of optimizing a number of simultaneous bets is considered, primarily using log-utility. Stochastic gradient-based algorithms for solving this problem are developed and compared with the simplex method. The solutions may be regarded as a generalization of 'Kelly staking' to the case of many simultaneous bets. Properties of the solutions are examined in two example cases using real odds from sports bookmakers. The algorithms that are developed also have wide applicability beyond sports betting and may be extended to general portfolio optimization problems with any reasonable utility function.
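A compact version of the optimization studied, under the assumption of independent binary bets with known decimal odds: maximize the expected log of final wealth over the stake fractions. For brevity this sketch uses scipy's SLSQP rather than the authors' stochastic gradient algorithms; the odds and probabilities are made up.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

p = np.array([0.55, 0.40, 0.25])      # assessed win probabilities (illustrative)
d = np.array([2.00, 2.80, 4.50])      # decimal odds offered by the bookmaker

outcomes = np.array(list(product([0, 1], repeat=3)))           # all 2^3 results
probs = np.prod(np.where(outcomes, p, 1 - p), axis=1)          # independence

def neg_expected_log_wealth(f):
    wealth = 1.0 - f.sum() + outcomes @ (f * d)                # stakes paid up front
    return -probs @ np.log(wealth)

cons = [{"type": "ineq", "fun": lambda f: 0.999 - f.sum()}]    # cannot bet everything
res = minimize(neg_expected_log_wealth, np.full(3, 0.01),
               bounds=[(0, 1)] * 3, constraints=cons, method="SLSQP")
print(res.x)                                                   # generalized Kelly stakes
```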

16.
Single-value design optimality criteria are often considered when selecting a response surface design. An alternative to a single-value criterion is to evaluate prediction variance properties throughout the experimental region and to display the results graphically in a variance dispersion graph (VDG) (Giovannitti-Jensen and Myers, 1989). Three properties of interest are the spherical average, maximum, and minimum prediction variances. Currently, a computer-intensive optimization algorithm is used to evaluate these prediction variance properties. It is shown here that the average, maximum, and minimum spherical prediction variances for central composite designs and Box-Behnken designs can be derived analytically: all three can be expressed as functions of the radius and the design parameters. These functions provide exact spherical prediction variance values, eliminating extensive computation with algorithms that do not guarantee convergence. This research is concerned with the theoretical development of these analytical forms; results are presented for hyperspherical and hypercuboidal regions.
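The paper's contribution is the analytical formulas; the computer-intensive evaluation they replace looks roughly like the following Monte Carlo sketch of the scaled prediction variance N f(x)^T (X^T X)^{-1} f(x) over a sphere of radius r, shown here for a small central composite design.

```python
import numpy as np

def fq(x):                                        # full quadratic model terms
    k = len(x)
    cross = [x[i] * x[j] for i in range(k) for j in range(i + 1, k)]
    return np.concatenate(([1.0], x, x ** 2, cross))

def spherical_spv(X_design, radius, k, n_dirs=20_000, seed=8):
    rng = np.random.default_rng(seed)
    F = np.array([fq(row) for row in X_design])
    Minv = np.linalg.inv(F.T @ F)
    u = rng.normal(size=(n_dirs, k))
    pts = radius * u / np.linalg.norm(u, axis=1, keepdims=True)   # uniform on the sphere
    v = np.array([len(F) * fq(x) @ Minv @ fq(x) for x in pts])    # scaled prediction variance
    return v.mean(), v.max(), v.min()

# a central composite design in k = 2 factors (alpha = sqrt(2), one center run)
a = np.sqrt(2)
ccd = np.array([[-1,-1],[1,-1],[-1,1],[1,1],[a,0],[-a,0],[0,a],[0,-a],[0,0]], dtype=float)
print(spherical_spv(ccd, radius=1.0, k=2))
```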

17.
Importance sampling and control variates have been used as variance reduction techniques for estimating bootstrap tail quantiles and moments, respectively. We adapt each method to apply to both quantiles and moments, and combine the methods to obtain variance reductions by factors of 4 to 30 in simulation examples. We use two innovations in control variates: interpreting control variates as a re-weighting method, and implementing control variates using the saddlepoint. The combination requires only the linear saddlepoint but applies to general statistics, and produces estimates with accuracy of order n^{-1/2}B^{-1}, where n is the sample size and B is the bootstrap sample size. We also discuss two modifications to classical importance sampling: a weighted average estimate and a mixture design distribution. These modifications make importance sampling robust and allow moments to be estimated from the same bootstrap simulation used to estimate quantiles.
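A generic illustration of the control-variate side of the paper (the saddlepoint implementation is not reproduced): the resample mean, whose bootstrap expectation is exactly the sample mean, serves as a control variate for the bootstrap moment of a nonlinear statistic.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.exponential(size=200)
theta_hat = np.exp(x.mean())                  # a nonlinear statistic of interest

B = 2_000
idx = rng.integers(0, x.size, (B, x.size))
m = x[idx].mean(axis=1)                       # resample means: E*[m] = x.mean() exactly
t = np.exp(m)                                 # bootstrap replicates of the statistic

beta = np.cov(t, m)[0, 1] / m.var(ddof=1)     # control-variate coefficient
plain = t.mean()
cv = t.mean() - beta * (m.mean() - x.mean())  # variance-reduced bootstrap moment
print(plain, cv)
```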

18.
Very fast automatic rejection algorithms have been developed recently that allow us to generate random variates from large classes of unimodal distributions. They require the choice of several design points, which decompose the domain of the distribution into small sub-intervals. The optimal choice of these points is an important but unsolved problem. We therefore present an approach that characterizes asymptotically optimal design points (as their number tends to infinity) under mild regularity conditions, and we describe a short algorithm to calculate these points in practice. Numerical experiments indicate that they are very close to optimal even when only six or seven design points are calculated.
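A minimal rejection sampler of the kind described, with a piecewise-constant hat determined by the design points; the asymptotically optimal choice of points derived in the paper is not implemented, and the hat construction below assumes a unimodal density with a known mode.

```python
import numpy as np

def make_sampler(pdf, points, mode, seed=10):
    """Rejection sampler with a piecewise-constant hat between design points."""
    rng = np.random.default_rng(seed)
    lo, hi = points[:-1], points[1:]
    heights = np.maximum(pdf(lo), pdf(hi))     # exact max on intervals away from the mode
    j_mode = np.searchsorted(points, mode) - 1
    heights[j_mode] = max(heights[j_mode], pdf(mode))  # mode interval peaks inside
    prob = heights * (hi - lo)
    prob = prob / prob.sum()

    def sample():
        while True:
            j = rng.choice(len(prob), p=prob)           # pick an interval by hat mass
            x = rng.uniform(lo[j], hi[j])
            if rng.uniform(0.0, heights[j]) <= pdf(x):  # accept under the density
                return x
    return sample

pdf = lambda x: np.exp(-0.5 * np.asarray(x) ** 2)       # unnormalized standard normal
sample = make_sampler(pdf, np.linspace(-6.0, 6.0, 14), mode=0.0)
draws = np.array([sample() for _ in range(10_000)])
print(draws.mean(), draws.std())                        # ~0 and ~1
```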

19.
In nonlinear regression problems, the assumption is usually made that parameter estimates will be approximately normally distributed. The accuracy of the approximation depends on the sample size and also on the intrinsic and parameter-effects curvatures. Based on these curvatures, criteria are defined here that indicate whether or not an experiment will lead to estimates whose distributions are well approximated by a normal distribution. An approach is motivated in which a primary design criterion is optimized subject to constraints based on these nonnormality measures. The approach can be used either to (I) find designs for a fixed sample size or to (II) choose the sample size for the design that is optimal under the primary objective so that the constraints are satisfied; this latter objective is useful because the nonnormality measures decrease with the sample size. As the constraints are typically not concave functions over a set of design measures, the usual equivalence theorems of optimal design theory do not hold for the first approach, and numerical implementation is required. Examples are given, and a new notation using tensor products is introduced to provide tractable general expressions for the nonnormality measures.

20.
Real-world applications of association rule mining have well-known problems of discovering a large number of rules, many of which are not interesting or useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of rules discovered and the complexity associated with the task, but the implications of their use, and the important differences in generalization power, precision, and recall when they are used for classification, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed, and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and we outline an appropriate sequence of usage. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data and items, and a detailed evaluation of the rule sets is provided as a whole and with respect to individual classes. Empirical results confirm that with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant, and contradictive rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, using frequent, closed, and maximal itemsets for the classification task, and the effect of incorporating statistical/heuristic measures to optimize such rule sets. With closed itemset mining already a preferred choice for complexity and redundancy reduction during rule generation, this study further confirms that closed-itemset-based association rules are also of better quality in terms of classification precision and recall, both overall and on individual classes. Maximal-itemset-based association rules, which are a subset of the closed-itemset-based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also show the drawback of applying the confidence measure at the outset to generate association rules, as is typically done within the association rule framework: removing rules that fall below a certain confidence threshold also removes the knowledge that the data contradict the surviving higher-confidence rules, so precision can be increased by discarding contradicted rules before the confidence constraint is applied.
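A tiny pure-Python illustration of the distinction being evaluated: among the frequent itemsets of a toy transaction set, the closed ones have no superset with equal support, and the maximal ones have no frequent superset at all, so frequent >= closed >= maximal in number.

```python
from itertools import combinations

transactions = [{"a","b","c"}, {"a","b"}, {"a","c"}, {"a","b","c"}, {"b","c"}]
min_support = 2
items = sorted(set().union(*transactions))

support = {}                                       # brute-force frequent itemsets
for r in range(1, len(items) + 1):
    for cand in combinations(items, r):
        s = sum(set(cand) <= t for t in transactions)
        if s >= min_support:
            support[frozenset(cand)] = s

closed = {I for I in support
          if not any(I < J and support[J] == support[I] for J in support)}
maximal = {I for I in support if not any(I < J for J in support)}
print(len(support), len(closed), len(maximal))     # frequent >= closed >= maximal
```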
