Similar Literature
Found 20 similar documents (search time: 46 ms)
1.
Scoring rules give rise to methods for statistical inference and are useful tools for achieving robustness or reducing computation. Scoring rule inference is generally performed through first-order approximations to the distribution of the scoring rule estimator or of the ratio-type statistic. To improve the accuracy of first-order methods even in simple models, we propose bootstrap adjustments of signed scoring rule root statistics for a scalar parameter of interest in the presence of nuisance parameters. The method relies on the parametric bootstrap, which avoids the onerous calculations required by analytical adjustments. Numerical examples illustrate the accuracy of the proposed method.
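The parametric bootstrap step can be sketched in a toy problem. The code below is an illustration only, not the paper's scoring-rule setting: it bootstraps the signed likelihood root for a normal mean with the variance as nuisance parameter, recomputing the statistic on samples drawn under the constrained fit so that the bootstrap distribution replaces the first-order N(0, 1) approximation.

```python
import numpy as np

def signed_root(x, mu0):
    """Signed likelihood root for a normal mean, profiling out the
    unknown variance (a stand-in for the paper's scoring-rule root)."""
    n = len(x)
    xbar = x.mean()
    s2_hat = np.mean((x - xbar) ** 2)   # unconstrained variance MLE
    s2_0 = np.mean((x - mu0) ** 2)      # variance MLE with mu fixed at mu0
    # clip at 0 to guard against tiny negative values from rounding
    return np.sign(xbar - mu0) * np.sqrt(max(n * np.log(s2_0 / s2_hat), 0.0))

def bootstrap_pvalue(x, mu0, B=2000, seed=0):
    """Parametric bootstrap: resample under the constrained fit
    (mu0, sigma_hat(mu0)) and compare the observed root with the
    bootstrap distribution instead of the N(0, 1) approximation."""
    rng = np.random.default_rng(seed)
    r_obs = signed_root(x, mu0)
    s0 = np.sqrt(np.mean((x - mu0) ** 2))
    r_star = np.array([signed_root(rng.normal(mu0, s0, len(x)), mu0)
                       for _ in range(B)])
    return np.mean(r_star >= r_obs)     # one-sided bootstrap p-value

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, 30)
p_boot = bootstrap_pvalue(x, mu0=0.0)
```

The same prepivoting idea applies to any root statistic whose null distribution can be simulated under a fitted model.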

2.
In market research and some other areas, it is common for a sample of n judges (consumers, evaluators, etc.) to be asked to independently rank a series of k objects or candidates. It is usually difficult to obtain the judges' full cooperation in completely ranking all k objects. A practical way to overcome this difficulty is to give each judge the freedom to choose the number of top candidates he is willing to rank. A frequently encountered question in this type of survey is how to select the best object or candidate from the incompletely ranked data. This paper proposes a subset selection procedure which constructs a random subset of all k objects involved in the survey such that the best object is included in the subset with a prespecified confidence. It is shown that the proposed procedure is distribution-free over a very broad class of underlying distributions. An example from a market research study illustrates the proposed procedure.

3.
In some ranking and selection problems it is reasonable to consider any population which is inferior but sufficiently close to the best (t-th best) as acceptable. Under this assumption, this paper studies classes of procedures to meet two possible goals, A and B. Goal A is to select a subset which contains only good populations, while Goal B is of a screening nature and requires selection of a subset of size not exceeding m (1 ≤ m ≤ k) containing at least one good population. In each case, results leading to the determination of the sample size required to attain the goal with prespecified probability are obtained. Properties of the procedures are discussed.

4.
A Bayesian approach is developed for analysing item response models with nonignorable missing data. The item response model for the observed data is estimated jointly with a model for the missing-data process. Since the approach is fully Bayesian, it generalizes easily to more complicated and realistic models, such as models with covariates. The proposed approach is illustrated with item response data modelled by multidimensional graded response models. Finally, a simulation study assesses the extent to which the bias caused by ignoring the missing-data mechanism can be reduced.

5.
Credit scoring techniques have been developed in recent years to reduce the risk that banks and financial institutions take in the loans they grant. Credit scoring is the problem of classifying individuals into one of two groups: defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new method of discrimination for the case where the dependent variable is categorical and a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components which take into account both the relationships between the explanatory categorical variables and the canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application to credit scoring data are provided.

6.
There has been increasing use of quality-of-life (QoL) instruments in drug development, and missing item values often occur in QoL data. A common approach to this problem is to impute the missing values before scoring. Several imputation procedures have been proposed, such as imputing with the most correlated item and imputing with a row/column model or an item response model. We examine these procedures using data from two clinical trials, in which the original asthma quality-of-life questionnaire (AQLQ) and the miniAQLQ were used. We propose two modifications to existing procedures: truncating the imputed values to eliminate outliers, and using the proportional odds model as the item response model for imputation. We also propose a novel imputation method based on a semi-parametric beta regression, so that the imputed value is always in the correct range, and illustrate how this approach can easily be implemented in commonly used statistical software. To compare these approaches, we deleted 5% of item values in the data according to three different missingness mechanisms, imputed them using each approach and compared the imputed values with the true values. The comparison showed that row/column-model-based imputation with truncation generally performed better, whereas our new approach had better performance under a number of scenarios.
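The truncation modification is easy to illustrate. The sketch below uses simple row-mean imputation as a stand-in for the row/column and item response models compared in the paper; the clipping step is the proposed fix for out-of-range imputed values, and the 1-7 AQLQ-style score range is assumed for illustration.

```python
import numpy as np

def impute_truncated(items, lo=1.0, hi=7.0):
    """Fill each missing item with the respondent's mean over observed
    items, then truncate to the instrument's score range [lo, hi].
    Row-mean imputation is only a stand-in for the models compared in
    the paper; the truncation step removes out-of-range outliers."""
    items = np.asarray(items, float)
    row_mean = np.nanmean(items, axis=1, keepdims=True)
    filled = np.where(np.isnan(items), row_mean, items)
    return np.clip(filled, lo, hi)

qol = np.array([[6.0, 7.0, np.nan],
                [2.0, np.nan, 1.0],
                [7.0, 7.0, np.nan]])
imputed = impute_truncated(qol)
```

A model producing predictions outside [1, 7] (e.g. a linear row/column model) would be clipped the same way; a beta regression on rescaled scores avoids the problem by construction.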

7.
We are concerned with deriving lower confidence bounds for the probability of a correct selection in truncated location-parameter models. Two cases are considered, according to whether the scale parameter is known or unknown. For each case, a lower confidence bound for the difference between the best and the second best is obtained. These lower confidence bounds are used to construct lower confidence bounds for the probability of a correct selection. The results are then applied to the problem of selecting the best exponential population, namely the one having the largest truncated location parameter. Useful tables are provided for implementing the proposed methods.

8.
The purpose of this study is to assess, using the Bland–Altman method, the agreement between item difficulty coefficients calculated under classical test theory and under item response theory. The results show that although the classical difficulty index Pj is highly correlated with the b coefficients estimated with the HGLM (hierarchical generalized linear model), 1P, and 3P models, the two approaches do not agree and cannot be used interchangeably. The Bland–Altman plots show wide limits of agreement, again indicating a lack of agreement between the item difficulty values obtained from the two methods. The Bland–Altman method, used mostly in clinical studies, is suggested for method-comparison problems in education, in particular the evaluation of student performance and agreement studies among expert raters, since it provides information beyond a correlation coefficient.
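The Bland–Altman summary itself is a short computation: the bias is the mean of the paired differences, and the 95% limits of agreement are bias ± 1.96 standard deviations of the differences. A minimal sketch with hypothetical difficulty values (not the study's data):

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement summary for paired measurements:
    mean difference (bias) and 95% limits of agreement."""
    d = np.asarray(a, float) - np.asarray(b, float)
    bias = d.mean()
    sd = d.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical values for six items (illustration only): classical
# difficulty Pj lives in [0, 1], IRT b on a latent-trait scale, which
# is one reason high correlation need not translate into agreement.
p_classical = np.array([0.85, 0.70, 0.62, 0.48, 0.33, 0.20])
b_irt = np.array([-1.6, -0.8, -0.4, 0.1, 0.7, 1.4])
bias, lower, upper = bland_altman(p_classical, b_irt)
```

In a Bland–Altman plot the differences are graphed against the pairwise means, with horizontal lines at the bias and the two limits.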

9.
Variable selection is an important decision process in consumer credit scoring. With the rapid growth of the credit industry, especially since the rise of e-commerce, a huge amount of information on customer behavior is available to inform consumer credit scoring. In this study, a hybrid quadratic programming model with variable selection is proposed for consumer credit scoring problems. The model is solved with a bisection method based on a Tabu search algorithm (BMTS), whose solutions provide alternative subsets of variables of different sizes. The final subset of variables used in the consumer credit scoring model is selected based on both size (the number of variables in a subset) and predictive (classification) accuracy. Simulation studies measure the performance of the proposed model, illustrating its effectiveness for simultaneous variable selection and classification.

10.
Assessing the selective influence of amino acid properties is important in understanding evolution at the molecular level. A collection of methods and models has been developed in recent years to determine whether amino acid sites in a given DNA sequence alignment display substitutions that alter or conserve a prespecified set of amino acid properties. Residues showing an elevated number of substitutions that favorably alter a physicochemical property are considered targets of positive natural selection. Such approaches usually perform an independent analysis for each amino acid property under consideration, without taking into account the fact that some of the properties may be highly correlated. We propose a Bayesian hierarchical regression model with a latent factor structure that allows us to determine which sites display substitutions that conserve or radically change a set of amino acid properties, while accounting for the correlation structure that may be present across such properties. We illustrate our approach by analyzing simulated data sets and an alignment of sperm lysin DNA.

11.
Consider k (k ≥ 2) Weibull populations. We derive a method of constructing optimal selection procedures that select a subset of the k populations containing the best population, control the size of the selected subset, and maximise the minimum probability of making a correct selection. Procedures and results are derived for the case of unequal sample sizes. Some tables and figures are given at the end of the paper.

12.
Content recommendation on a webpage involves recommending content links (items) on multiple slots for each user visit to maximize some objective function, typically the click-through rate (CTR), which is the probability of a click on an item for a given user visit. Most existing approaches to this problem assume that a user's responses (click/no click) on different slots are independent of each other. This is problematic, since in many scenarios the CTR on a slot may depend on externalities such as the items recommended on other slots. Incorporating the effects of such externalities into the modeling process is important for better predictive accuracy. We therefore propose a hierarchical model that assumes a multinomial response for each visit, to capture competition among slots, and that models complex interactions among (user, item, slot) combinations through factor models via a tensor approach. In addition, the factors in our model are drawn with means based on regression functions of user/item covariates, which yields better estimates for users and items that are relatively new, with little past activity. We show marked gains in predictive accuracy on various metrics.

13.
We propose to utilize the group lasso algorithm for logistic regression to construct a risk scoring system for predicting disease in swine. This work is motivated by the need to develop a risk scoring system from survey data on risk factors for porcine reproductive and respiratory syndrome (PRRS), a major health, production and financial problem for swine producers in nearly every country. Group lasso provides an attractive solution because of its ability to achieve group variable selection and stabilize parameter estimates at the same time. We propose to choose the penalty parameter for the group lasso through leave-one-out cross-validation, using the area under the receiver operating characteristic curve as the criterion. Survey data for 896 swine breeding herd sites in the USA and Canada, collected between March 2005 and March 2009, are used to construct the risk scoring system for predicting PRRS outbreaks in swine. We show that our scoring system significantly improves on the current system, which is based on expert opinion, and that it is superior in terms of area under the curve to a system developed using a multiple logistic regression model selected on variable significance.
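A minimal sketch of group-lasso penalized logistic regression via proximal gradient descent, with block soft-thresholding of each variable group. This is a generic implementation under simplified assumptions (synthetic data, fixed step size), not the software, penalty schedule, or survey data used in the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def group_lasso_logistic(X, y, groups, lam, step=0.1, iters=500):
    """Proximal-gradient group-lasso logistic regression.
    groups: list of column-index arrays, one per variable group
    (e.g. the dummy columns of one survey question)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ beta) - y) / n   # logistic-loss gradient
        b = beta - step * grad
        for g in groups:                           # block soft-thresholding
            norm = np.linalg.norm(b[g])
            w = step * lam * np.sqrt(len(g))       # group-size weighting
            b[g] = 0.0 if norm <= w else b[g] * (1.0 - w / norm)
        beta = b
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (rng.random(200) < sigmoid(1.5 * X[:, 0] + 1.0 * X[:, 1])).astype(float)
groups = [np.array([0, 1]), np.array([2, 3])]      # group 2 is inactive
beta_small = group_lasso_logistic(X, y, groups, lam=0.01)
beta_huge = group_lasso_logistic(X, y, groups, lam=5.0)
```

The penalty zeroes whole groups at once, which is what makes it suitable for dummy-coded survey questions; the paper's λ would be chosen by leave-one-out cross-validated AUC over a grid rather than fixed as here.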

14.

Item response models are essential tools for analyzing results from many educational and psychological tests. Such models quantify the probability of a correct response as a function of unobserved examinee ability and of other parameters describing the difficulty and the discriminatory power of the test questions. Some of these models also incorporate a threshold parameter in the probability of a correct response to account for guessing in multiple-choice tests. In this article we consider fitting such models using the Gibbs sampler. A data augmentation method for analyzing a normal-ogive model incorporating a threshold guessing parameter is introduced and compared with a Metropolis-Hastings sampling method; the proposed method is an order of magnitude more efficient than the existing one. Another objective of this paper is to develop Bayesian model choice techniques for model discrimination. A predictive approach based on a variant of the Bayes factor is used and compared with a decision-theoretic method which minimizes an expected loss function on the predictive space. A classical model choice technique based on a modified likelihood ratio test statistic appears as one component of the second criterion, so the Bayesian methods proposed here are contrasted with the classical approach based on the likelihood ratio test. Several examples illustrate the methods.
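The data augmentation idea can be shown on a plain probit regression, a simplified relative of the normal-ogive item response model (the guessing threshold and per-item ability structure are omitted here; this is the classic Albert–Chib scheme, not the article's extended sampler). Latent normal variables are drawn truncated to agree with each binary response, after which the coefficients have a conjugate normal update:

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, draws=400, burn=100, seed=0):
    """Albert-Chib data augmentation for probit regression with a
    flat prior: alternate latent truncated-normal draws with a
    conjugate normal draw for the coefficients."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    keep = []
    for it in range(draws):
        m = X @ beta
        # z_i | beta, y_i ~ N(m_i, 1) truncated to (0, inf) if y_i = 1,
        # to (-inf, 0) if y_i = 0 (bounds standardized around m_i)
        lo = np.where(y == 1, -m, -np.inf)
        hi = np.where(y == 1, np.inf, -m)
        z = m + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
        beta = XtX_inv @ (X.T @ z) + chol @ rng.standard_normal(p)
        if it >= burn:
            keep.append(beta)
    return np.mean(keep, axis=0)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([0.4, 1.0])
y = (rng.normal(size=500) < X @ true_beta).astype(float)  # probit data
beta_hat = probit_gibbs(X, y)
```

Because every full conditional is a standard distribution, no Metropolis-Hastings accept/reject step is needed, which is the source of the efficiency gain the abstract refers to.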

15.
Bayesian analysis of system failure data from engineering applications is considered under a competing risks framework in which the cause of failure may not have been exactly identified but only narrowed down to a subset of all potential risks; in the statistical literature such data are termed masked failure data. In addition to masking, failure times may be right censored, owing to the removal of prototypes at a prespecified time, or interval censored, in the case of periodically acquired readings. In this setting, a general Bayesian formulation is investigated that includes most commonly used parametric lifetime distributions and is sufficiently flexible to handle complex forms of censoring. The methodology is illustrated in two engineering applications, with a special focus on model comparison issues.

16.
The problem of treatment-related withdrawals has not been fully addressed in the statistical literature. We discuss statistical procedures that compare the efficacies of treatments when withdrawals may occur before the conclusion of a study. With the help of data-dependent scoring schemes, we develop a unified test statistic that incorporates all available data, not just the last values, and adjusts for withdrawal patterns, the proportion of withdrawals, and the level of response prior to withdrawal. A randomization technique is developed to compute an empirical significance level for each scoring system under both the full data and the endpoint data for a specified parameter configuration. The proposed methods are applied to a subset (Allen Park Hospital) of the VA study 127.

17.
Consider a linear regression model with p-1 predictor variables which is taken as the "true" model. The goal is to select a subset of all possible reduced models such that all inferior models (to be defined) are excluded with a guaranteed minimum probability. A procedure is proposed for which the exact evaluation of the probability of a correct decision is difficult; however, it is shown that the probability requirement can be met for a sufficiently large sample size. Monte Carlo evaluation of the constant associated with the procedure and some ways to reduce the amount of computation involved in implementing the procedure are discussed.

18.
Random coefficient regression models have been used to describe repeated measures on members of a sample of n individuals. Previous researchers have proposed methods of estimating the mean parameters of such models. Their methods require that each individual be observed under the same settings of the independent variables or, less stringently, that the number of observations, r, on each individual be the same. Under the latter restriction, estimators of the mean regression parameters exist which are consistent as both r→∞ and n→∞ and efficient as r→∞, and large-sample (r large) tests of the mean parameters are available. These results are easily extended to the case where not all individuals are observed an equal number of times, provided limits are taken as min(r)→∞. Existing methods of inference, however, are not justified by the current literature when n is large and r is small, as is the case in many biomedical applications. The primary contribution of the current paper is a derivation of the asymptotic properties of modifications of existing estimators as n alone tends to infinity, with r fixed. From these properties it is shown that existing methods of inference, currently justified only when min(r) is large, are also justifiable when n is large and min(r) is small. A secondary contribution is the definition of a positive definite estimator of the covariance matrix of the random coefficients in these models. Use of this estimator avoids computational problems that can otherwise arise.

19.
In a high-dimensional multiple testing framework, we present new confidence bounds on the number of false positives contained in subsets S of selected null hypotheses. These bounds are post hoc in the sense that the coverage probability holds simultaneously over all S, possibly chosen depending on the data. This article focuses on the common case of structured null hypotheses, for example along a tree, a hierarchy, or geometrically (spatially or temporally). Following recent advances in post hoc inference, we build confidence bounds for some prespecified forest-structured subsets and deduce a bound for any subset S by interpolation. The proposed bounds substantially improve previous ones when the signal is locally structured. Our findings are supported both by theoretical results and by numerical experiments. Moreover, our bounds can be obtained by an algorithm (with complexity bilinear in the sizes of the reference hierarchy and of the selected subset) implemented in the open-source R package sansSouci, available from https://github.com/pneuvial/sanssouci, making our approach operational.
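For a single, unstructured selected set, the Simes-based post hoc bound from this line of work (Blanchard, Neuvial and Roquain) takes one line to state: V(S) ≤ min over k of #{i in S : p_i > αk/m} + k - 1. The sketch below computes that bound; the forest-structured reference sets and the interpolation step of the article are not reproduced here:

```python
import numpy as np

def simes_posthoc_bound(p_selected, m, alpha=0.05):
    """Post hoc upper confidence bound on the number of false positives
    in a selected set S, using the Simes reference family: valid
    simultaneously over all data-dependent S under the usual Simes
    conditions.  m is the total number of hypotheses tested."""
    p = np.sort(np.asarray(p_selected, float))
    s = len(p)
    best = s                                  # trivial bound: |S|
    for k in range(1, s + 1):
        thr = alpha * k / m                   # Simes template thresholds
        best = min(best, int(np.sum(p > thr)) + k - 1)
    return best

# 100 hypotheses tested; S mixes 5 strong signals with 5 clear nulls
bound_mixed = simes_posthoc_bound([1e-4] * 5 + [0.9] * 5, m=100)
bound_strong = simes_posthoc_bound([1e-6] * 5, m=100)
```

Simultaneity over all S is what allows the analyst to pick the selected set after looking at the data without invalidating the bound.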

20.
The problem of selecting s out of k given components such that the selection contains at least c of the t best ones is considered. For underlying distribution families with a location or scale parameter, it is shown that the indifference zone approach can be strengthened to confidence statements for the parameters of the selected components. These confidence statements are valid over the entire parameter space without decreasing the infimum of the probability of a correct selection.
