Similar Articles
20 similar articles found.
1.
Proper scoring rules are devices for encouraging honest assessment of probability distributions. Just like log-likelihood, which is a special case, a proper scoring rule can be applied to supply an unbiased estimating equation for any statistical model, and the theory of such equations can be applied to understand the properties of the associated estimator. In this paper, we discuss some novel applications of scoring rules to parametric inference. In particular, we focus on scoring rule test statistics, and we propose suitable adjustments to allow reference to the usual asymptotic chi-squared distribution. We further explore robustness and interval estimation properties, by both theory and simulations.
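A minimal numerical sketch of the propriety idea (illustrative, not from the paper): using the Brier score as a stand-in for a generic proper scoring rule, the expected score of a report q under y ~ Bernoulli(p) is minimized at the honest report q = p.

```python
import numpy as np

def expected_brier(q, p):
    # E[(q - y)^2] for y ~ Bernoulli(p): p*(q-1)^2 + (1-p)*q^2
    return p * (q - 1) ** 2 + (1 - p) * q ** 2

p = 0.3
grid = np.linspace(0, 1, 1001)
scores = expected_brier(grid, p)
best_q = grid[np.argmin(scores)]   # minimizer: the honest report q = p
```

The same minimization-at-the-truth property is what makes the scoring rule's gradient an unbiased estimating equation.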

2.
A scoring rule for evaluating the usefulness of an assessed prior distribution should reflect the purpose for which the distribution is to be used. In this paper we suppose that sample data is to become available and that the posterior distribution will be used to estimate some quantity under a quadratic loss function. The utility of a prior distribution is consequently determined by its preposterior expected quadratic loss. It is shown that this loss function has properties desirable in a scoring rule, and formulae are derived for calculating the scores it gives in some common problems. Many scoring rules give a very poor score to any improper prior distribution but, in contrast, the scoring rule proposed here provides a meaningful measure for comparing the usefulness of assessed prior distributions and non-informative (improper) prior distributions. Results for making this comparison in various situations are also given.
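As a hedged illustration of the preposterior idea (a synthetic conjugate normal–normal example, not the paper's formulae): the preposterior expected quadratic loss of an assessed N(m, t²) prior for a normal mean can be estimated by Monte Carlo, and a prior centred near the truth scores better (lower loss) than a badly centred one.

```python
import numpy as np

rng = np.random.default_rng(4)

def preposterior_loss(m, t2, true_mean, true_var, reps=100_000):
    # theta drawn from the actual data-generating distribution,
    # then one observation x ~ N(theta, 1)
    theta = rng.normal(true_mean, np.sqrt(true_var), size=reps)
    x = rng.normal(theta, 1.0)
    # posterior mean under the assessed N(m, t2) prior (conjugate update)
    post_mean = (m / t2 + x) / (1 / t2 + 1)
    # preposterior expected quadratic loss E[(theta_hat - theta)^2]
    return np.mean((post_mean - theta) ** 2)

good = preposterior_loss(m=0.0, t2=1.0, true_mean=0.0, true_var=1.0)
bad = preposterior_loss(m=5.0, t2=1.0, true_mean=0.0, true_var=1.0)
```

When the assessed prior matches the truth, the loss equals the posterior variance (here 0.5); the biased prior scores markedly worse.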

3.
4.
We propose to utilize the group lasso algorithm for logistic regression to construct a risk scoring system for predicting disease in swine. This work is motivated by the need to develop a risk scoring system from survey data on risk factors for porcine reproductive and respiratory syndrome (PRRS), which is a major health, production and financial problem for swine producers in nearly every country. Group lasso provides an attractive solution to this research question because of its ability to achieve group variable selection and stabilize parameter estimates at the same time. We propose to choose the penalty parameter for group lasso through leave-one-out cross-validation, using the criterion of the area under the receiver operating characteristic curve. Survey data for 896 swine breeding herd sites in the USA and Canada, collected between March 2005 and March 2009, are used to construct the risk scoring system for predicting PRRS outbreaks in swine. We show that our scoring system for PRRS significantly improves on the current scoring system, which is based on expert opinion. We also show that our proposed scoring system is superior, in terms of area under the curve, to one developed using a multiple logistic regression model selected on the basis of variable significance.
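A sketch of the penalty-tuning loop described above. Group lasso itself is not in scikit-learn, so ordinary lasso-penalized logistic regression on synthetic data stands in here; the point is the leave-one-out / pooled-AUC selection of the penalty, not the exact penalty.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
n, p = 60, 6
X = rng.normal(size=(n, p))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

def loo_auc(C):
    # pool the leave-one-out predicted probabilities, then compute one AUC
    preds = np.empty(n)
    for train, test in LeaveOneOut().split(X):
        m = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        m.fit(X[train], y[train])
        preds[test] = m.predict_proba(X[test])[:, 1]
    return roc_auc_score(y, preds)

# choose the penalty (here its inverse C) maximizing pooled LOO AUC
aucs = {C: loo_auc(C) for C in [0.05, 0.5, 5.0]}
best_C = max(aucs, key=aucs.get)
```

With one held-out case per fold, AUC must be computed on the pooled predictions, exactly as above.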

5.
Scoring rules give rise to methods for statistical inference and are useful tools to achieve robustness or reduce computations. Scoring rule inference is generally performed through first-order approximations to the distribution of the scoring rule estimator or of the ratio-type statistic. In order to improve the accuracy of first-order methods even in simple models, we propose bootstrap adjustments of signed scoring rule root statistics for a scalar parameter of interest in the presence of nuisance parameters. The method relies on the parametric bootstrap approach, which avoids the onerous calculations specific to analytical adjustments. Numerical examples illustrate the accuracy of the proposed method.
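A toy sketch of parametric-bootstrap calibration of a signed root statistic (exponential mean, log-likelihood as the scoring rule; the model and all names are illustrative, not the paper's): instead of referring the signed root to N(0, 1), its null distribution is simulated under the fitted null model.

```python
import numpy as np

rng = np.random.default_rng(6)

def signed_root(x, mu0):
    # signed likelihood root for the mean of an Exp(mu) sample:
    # r = sign(xbar - mu0) * sqrt(2 * (l(xbar) - l(mu0)))
    n, xbar = len(x), np.mean(x)
    llr = 2 * n * (np.log(mu0 / xbar) + xbar / mu0 - 1)
    return np.sign(xbar - mu0) * np.sqrt(max(llr, 0.0))

x = rng.exponential(2.0, size=15)
mu0 = 2.0
r_obs = signed_root(x, mu0)

# parametric bootstrap of the null distribution of the signed root
B = 500
r_boot = np.array([signed_root(rng.exponential(mu0, size=15), mu0)
                   for _ in range(B)])
p_boot = np.mean(r_boot <= r_obs)   # bootstrap-calibrated one-sided p-value
```

In small samples the bootstrap distribution of r can differ visibly from N(0, 1), which is the accuracy gain the paper targets.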

6.
In this paper, we investigate the relative merits of rain rules for one-day cricket matches. We suggest that interrupted one-day matches present a missing data problem: the outcome of the complete match cannot be observed, and instead the outcome of the interrupted match, as determined at least in part by the rain rule in question, is observed. Viewing the outcome of the interrupted match as an imputation of the missing outcome of the complete match, standard characteristics for assessing the performance of classification tests can be used to assess the performance of a rain rule. In particular, we consider the overall and conditional accuracy and the predictive value of a rain rule. We propose two requirements for a ‘fair’ rain rule, and show that a fair rain rule must satisfy an identity involving its conditional accuracies. Estimating the performance characteristics of various rain rules from a sample of complete one-day matches, our results suggest that the Duckworth–Lewis method, currently adopted by the International Cricket Council, is essentially as accurate as, and somewhat more fair than, its best competitors. A rain rule based on the iso-probability principle also performs well but might benefit from re-calibration using a more representative database.

7.
The problems of assessing, comparing and combining probability forecasts for a binary events sequence are considered. A Gaussian threshold model (analytically of closed form) is introduced which allows the generation of different probability forecast sequences valid for the same events. Chi-squared type test statistics, and also a marginal-conditional method, are proposed for the assessment problem, and an asymptotic normality result is given. A graphical method is developed for the comparison problem, based upon decomposing arbitrary proper scoring rules into certain elementary scoring functions. The special role of the logarithmic scoring rule is examined in the context of Neyman–Pearson theory.

8.
The naïve Bayes rule (NBR) is a popular and often highly effective technique for constructing classification rules. This study examines the effectiveness of NBR as a method for constructing classification rules (credit scorecards) in the context of screening credit applicants (credit scoring). For this purpose, the study uses two real-world credit scoring data sets to benchmark NBR against linear discriminant analysis, logistic regression analysis, k-nearest neighbours, classification trees and neural networks. Of the two aforementioned data sets, the first one is taken from a major Greek bank whereas the second one is the Australian Credit Approval data set taken from the UCI Machine Learning Repository (available at http://www.ics.uci.edu/~mlearn/MLRepository.html). The predictive ability of scorecards is measured by the total percentage of correctly classified cases, the Gini coefficient and the bad rate amongst accepts. In each of the data sets, NBR is found to have a lower predictive ability than some of the other five methods under all measures used. Reasons that may negatively affect the predictive ability of NBR relative to that of alternative methods in the context of credit scoring are examined.
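A toy version of this benchmark on synthetic data (not the Greek or Australian data sets), showing the NBR-versus-logistic comparison and the Gini coefficient computed as 2·AUC − 1. The correlated features deliberately violate naïve Bayes's conditional-independence assumption, one of the reasons the study examines for its weaker performance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 4))
X[:, 1] += X[:, 0]   # correlated features: violates NB's independence assumption
y = (X[:, 0] - X[:, 2] + rng.normal(size=n) > 0).astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

results = {}
for name, model in [("NBR", GaussianNB()), ("logit", LogisticRegression())]:
    model.fit(Xtr, ytr)
    proba = model.predict_proba(Xte)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(yte, model.predict(Xte)),
        "gini": 2 * roc_auc_score(yte, proba) - 1,   # Gini = 2*AUC - 1
    }
```

The third measure used in the study, bad rate amongst accepts, additionally requires a cut-off score and acceptance rate, which is why it is omitted from this sketch.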

9.
One method of expressing coarse information about the shape of an object is to describe the shape by its landmarks, which can be taken as meaningful points on the outline of an object. We consider a situation in which we want to classify shapes into known populations based on their landmarks, invariant to the location, scale and rotation of the shapes. A neural network method for transformation-invariant classification of landmark data is presented. The method is compared with the (non-transformation-invariant) complex Bingham rule; the two techniques are tested on two sets of simulated data, and on data that arise from mice vertebrae. Despite the obvious advantage of the complex Bingham rule because of information about rotation, the neural network method compares favourably.

10.
We review the Fisher scoring and EM algorithms for incomplete multivariate data from an estimating function point of view, and examine the corresponding quasi-score functions under second-moment assumptions. A bias-corrected REML-type estimator for the covariance matrix is derived, and the Fisher, Godambe and empirical sandwich information matrices are compared. We make a numerical investigation of the two algorithms, and compare with a hybrid algorithm, where Fisher scoring is used for the mean vector and the EM algorithm for the covariance matrix.

11.
We investigate the problem of selecting the best population from positive exponential family distributions based on type-I censored data. A Bayes rule is derived and a monotone property of the Bayes selection rule is obtained. Following that property, we propose an early selection rule. Through this early selection rule, one can terminate the experiment on a few populations early and possibly make the final decision before the censoring time. An example is provided in the final section to illustrate the use of the early selection rule.

12.
13.
Regression methods for common data types such as measured, count and categorical variables are well understood, but increasingly statisticians need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, either as the response and/or a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We call the transformation from a new observation to a score scoring, whereas backscoring is a method to represent a score as an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment to study the motion of children with cleft lip.
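A sketch of the scoring step just described, on synthetic stand-ins for the non-vector objects: only a pairwise distance matrix is assumed available, it is embedded into low-dimensional scores with multidimensional scaling, and an ordinary regression is then fitted on those scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
# hypothetical metric-space objects; only their pairwise distances are used
objs = rng.normal(size=(40, 5))
D = np.linalg.norm(objs[:, None, :] - objs[None, :, :], axis=-1)

# "scoring": embed the precomputed distance matrix into 2-D scores
scores = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

# internal model: standard regression linking scores to a response
y = objs[:, 0] + 0.1 * rng.normal(size=40)
model = LinearRegression().fit(scores, y)
r2 = model.score(scores, y)
```

Backscoring, mapping a predicted score back to an observation in the data space, needs an inverse construction specific to the data type and is not shown here.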

14.
A rule of thumb for testing symmetry of an unknown univariate continuous distribution against the alternative of a long right tail is proposed. Our proposed test is based on the concept of an exceedance statistic and is ad hoc in nature. The exact performance of the proposed rule is investigated in detail. Some results from an asymptotic point of view are also provided. We compare our proposed test with several classical tests which are practically applicable and are known to be exact or nearly distribution free. We see that the proposed rule is better than most of the existing tests for symmetry and can be applied with ease. An illustration with real data is provided.

15.
A Bayesian discovery procedure
Summary.  We discuss a Bayesian discovery procedure for multiple-comparison problems. We show that, under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule that is based on a threshold of the posterior probability of the alternative. Under a semiparametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure, which was recently introduced by Storey. Improving the approximation leads us to a Bayesian discovery procedure, which exploits the multiple shrinkage in clusters that are implied by the assumed non-parametric model. We compare the Bayesian discovery procedure and the optimal discovery procedure estimates in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumour samples. We extend the setting of the optimal discovery procedure by discussing modifications of the loss function that lead to different single-thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
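A sketch of the posterior-probability thresholding rule under an assumed, fully known two-groups mixture (a simplification of the paper's semiparametric model): each hypothesis is rejected when the posterior probability of the alternative exceeds a fixed threshold, and the realized false discovery proportion is then bounded by one minus that threshold in expectation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
m, pi1 = 5000, 0.1
is_alt = rng.random(m) < pi1
z = rng.normal(np.where(is_alt, 3.0, 0.0), 1.0)

# posterior probability of the alternative under the known mixture
f0, f1 = norm.pdf(z, 0, 1), norm.pdf(z, 3, 1)
post_alt = pi1 * f1 / (pi1 * f1 + (1 - pi1) * f0)

# Bayes rule: threshold the posterior probability of the alternative
threshold = 0.8
reject = post_alt > threshold
fdr = np.mean(~is_alt[reject]) if reject.any() else 0.0
```

In the paper the mixture is unknown and estimated nonparametrically, with shrinkage across clusters; this sketch only shows the decision-rule structure.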

16.
ABSTRACT

We consider multiple regression (MR) model averaging using the focused information criterion (FIC). Our approach is motivated by the problem of implementing a mean-variance portfolio choice rule. The usual approach is to estimate parameters ignoring the intention to use them in portfolio choice. We develop an estimation method that focuses on the trading rule of interest. Asymptotic distributions of submodel estimators in the MR case are derived using a localization framework. The localization is of both regression coefficients and error covariances. Distributions of submodel estimators are used for model selection with the FIC. This allows comparison of submodels using the risk of portfolio rule estimators. FIC model averaging estimators are then characterized. This extension further improves risk properties. We show in simulations that applying these methods in the portfolio choice case results in improved estimates compared with several competitors. An application to futures data shows superior performance as well.

17.
Because many illnesses show heterogeneous response to treatment, there is increasing interest in individualizing treatment to patients [11]. An individualized treatment rule is a decision rule that recommends treatment according to patient characteristics. We consider the use of clinical trial data in the construction of an individualized treatment rule leading to the highest mean response. This is a difficult computational problem because the objective function is the expectation of a weighted indicator function that is non-concave in the parameters. Furthermore, there are frequently many pretreatment variables that may or may not be useful in constructing an optimal individualized treatment rule, yet cost and interpretability considerations imply that only a few variables should be used by the individualized treatment rule. To address these challenges we consider estimation based on ℓ1-penalized least squares. This approach is justified via a finite sample upper bound on the difference between the mean response due to the estimated individualized treatment rule and the mean response due to the optimal individualized treatment rule.
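A sketch of ℓ1-penalized least squares for an individualized treatment rule on simulated trial data. The design (main effects, treatment indicator, treatment-by-covariate interactions) and the sign-of-the-estimated-effect rule follow the general recipe; the particular data-generating model and tuning value are illustrative assumptions, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 300, 10
X = rng.normal(size=(n, p))
A = rng.choice([-1.0, 1.0], size=n)            # randomized treatment
# outcome: treatment helps when X[:,0] > 0, hurts otherwise
Y = X[:, 1] + A * X[:, 0] + rng.normal(size=n)

# l1-penalized least squares on main effects plus treatment interactions
design = np.hstack([X, A[:, None], A[:, None] * X])
fit = Lasso(alpha=0.05).fit(design, Y)

# estimated rule: treat (A = 1) when the fitted treatment effect is positive
effect = fit.coef_[p] + X @ fit.coef_[p + 1:]
rule = np.where(effect > 0, 1.0, -1.0)

# compare with the rule implied by the true generating model
agreement = np.mean(rule == np.where(X[:, 0] > 0, 1.0, -1.0))
```

The lasso penalty performs the variable selection the abstract motivates: only a few of the ten interaction coefficients survive, so the rule uses few variables.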

18.
Nonparametric approaches to classification have gained significant attention in the last two decades. In this paper, we propose a classification methodology based on multivariate rank functions and show that it is a Bayes rule for spherically symmetric distributions with a location shift. We show that a rank-based classifier is equivalent to the optimal Bayes rule under suitable conditions. We also present an affine invariant version of the classifier. To accommodate different covariance structures, we construct a classifier based on the central rank region. Asymptotic properties of these classification methods are studied. We illustrate the performance of our proposed methods in comparison to some other depth-based classifiers using simulated and real data sets.

19.
Summary.  We present an approach for correcting for interobserver measurement error in an ordinal logistic regression model, taking into account also the variability of the estimated correction terms. The different scoring behaviour of the 16 examiners complicated the identification of a geographical trend in a recent study on caries experience in Flemish children (Belgium) who were 7 years old. Since the measurement error is on the response, the factor 'examiner' could be included in the regression model to correct for its confounding effect. However, controlling for examiner largely removed the geographical east–west trend. Instead, we suggest a (Bayesian) ordinal logistic model which corrects for the scoring error (compared with a gold standard) using a calibration data set. The marginal posterior distribution of the regression parameters of interest is obtained by integrating out the correction terms pertaining to the calibration data set. This is done by processing two Markov chains sequentially, whereby one Markov chain samples the correction terms. The sampled correction term is imputed in the Markov chain pertaining to the regression parameters. The model was fitted to the oral health data of the Signal–Tandmobiel® study. A WinBUGS program was written to perform the analysis.

20.
We discuss the analysis of mark-recapture data when the aim is to quantify density dependence between survival rate and abundance. We describe an analysis for a random effects model that includes a linear relationship between abundance and survival, using an errors-in-variables regression estimator with analytical adjustment for approximate bias. The analysis is illustrated using data from short-tailed shearwaters banded for 48 consecutive years at Fisher Island, Tasmania, and Hutton's shearwaters banded at Kaikoura, New Zealand, for nine consecutive years. The Fisher Island data provided no evidence of a density dependence relationship between abundance and survival, and confidence interval widths rule out anything but small density dependent effects. The Hutton's shearwater data were equivocal, with the analysis unable to rule out even a very strong density dependent relationship between survival and abundance.
