Similar Documents
20 similar documents found (search time: 31 ms)
1.
In finance, inferences about future asset returns are typically quantified using parametric distributions and single-valued probabilities. It is attractive to use less restrictive inferential methods, including nonparametric methods, which do not require distributional assumptions about variables, and imprecise probability methods, which generalize the classical concept of probability to set-valued quantities. Main attractions include the flexibility of the inferences to adapt to the available data, and the fact that the level of imprecision in inferences can reflect the amount of data on which they are based. This paper introduces nonparametric predictive inference (NPI) for stock returns. NPI is a statistical approach based on few assumptions, with inferences strongly based on data and with uncertainty quantified via lower and upper probabilities. NPI is presented for inference about future stock returns, as a measure of risk and uncertainty, and for pairwise comparison of two stocks based on their future aggregate returns. The proposed NPI methods are illustrated using historical stock market data.
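The NPI lower and upper probabilities referred to in the abstract can be illustrated for a simple event. A minimal sketch, assuming Hill's assumption A(n) and a threshold that does not coincide with any observed return (the function name and data are illustrative, not from the paper):

```python
def npi_event_probs(data, t):
    """NPI lower/upper probability that the next observation exceeds t.

    Based on Hill's assumption A(n): the next value falls in each of the
    n+1 intervals created by the n data points with probability 1/(n+1).
    Assumes t does not equal any observed value.
    """
    n = len(data)
    greater = sum(1 for x in data if x > t)
    lower = greater / (n + 1)        # mass of intervals entirely above t
    upper = (greater + 1) / (n + 1)  # adds the interval containing t
    return lower, upper

# Lower/upper probability that the next return is positive,
# given five observed returns:
lo, up = npi_event_probs([0.02, -0.01, 0.03, 0.01, -0.02], 0.0)
```

The gap between `lo` and `up` is 1/(n+1), so the imprecision shrinks as more data become available, reflecting the point made in the abstract.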

2.
In reliability and lifetime testing, comparison of two groups of data is a common problem. It is often attractive, or even necessary, to make a quick and efficient decision in order to save time and costs. This paper presents a nonparametric predictive inference (NPI) approach to compare two groups, say X and Y, when one or both are progressively censored. NPI can easily be applied to different types of progressive censoring schemes. NPI is a statistical approach based on few assumptions, with inferences strongly based on data and with uncertainty quantified via lower and upper probabilities. These inferences concern the event that the lifetime of a future unit from Y is greater than the lifetime of a future unit from X.
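For complete (uncensored) data without ties, the NPI comparison of two groups reduces to a pairwise count over the observations, a form given in the NPI literature; the progressively censored schemes treated in the paper need additional machinery. A sketch of the simpler complete-data case (function name illustrative):

```python
def npi_compare(x, y):
    """NPI lower/upper probability that a future Y lifetime exceeds a
    future X lifetime, for complete data with no ties.

    The next X falls in one of n+1 intervals and the next Y in one of
    m+1 intervals, each pair of intervals carrying mass 1/((n+1)(m+1)).
    """
    n, m = len(x), len(y)
    wins = sum(1 for xi in x for yj in y if yj > xi)
    denom = (n + 1) * (m + 1)
    lower = wins / denom                    # pairs where Y > X is certain
    upper = (wins + n + m + 1) / denom      # pairs where Y > X is possible
    return lower, upper
```

As a sanity check, when every observed Y exceeds every observed X the upper probability equals 1, while the lower probability stays below 1 because future units may fall outside the observed ranges.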

3.
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning and credit scoring. The receiver operating characteristic (ROC) curve and surface are useful tools to assess the ability of diagnostic tests to discriminate between ordered classes or groups. To define these diagnostic tests, selecting the optimal thresholds that maximize the accuracy of these tests is required. One procedure that is commonly used to find the optimal thresholds is maximizing what is known as Youden’s index. This article presents nonparametric predictive inference (NPI) for selecting the optimal thresholds of a diagnostic test. NPI is a frequentist statistical method that is explicitly aimed at using few modeling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. Based on multiple future observations, the NPI approach is presented for selecting the optimal thresholds for two-group and three-group scenarios. In addition, a pairwise approach is also presented for the three-group scenario. The article ends with an example to illustrate the proposed methods and a simulation study of the predictive performance of the proposed methods along with some classical methods such as Youden’s index. The NPI-based methods show some interesting results that overcome some of the issues concerning the predictive performance of Youden’s index.
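For reference, the classical Youden's index that the NPI methods are compared against maximizes J(c) = sensitivity(c) + specificity(c) − 1 over candidate thresholds c. A minimal empirical sketch for the two-group case (names illustrative; assumes larger marker values indicate disease):

```python
import numpy as np

def youden_threshold(diseased, healthy):
    """Empirical Youden's index: pick the cutoff c maximizing
    J(c) = sensitivity(c) + specificity(c) - 1, searching over the
    pooled observed marker values."""
    diseased = np.asarray(diseased, dtype=float)
    healthy = np.asarray(healthy, dtype=float)
    candidates = np.sort(np.concatenate([diseased, healthy]))
    best_c, best_j = None, -np.inf
    for c in candidates:
        sens = np.mean(diseased > c)   # true positive rate at cutoff c
        spec = np.mean(healthy <= c)   # true negative rate at cutoff c
        j = sens + spec - 1
        if j > best_j:
            best_c, best_j = c, float(j)
    return best_c, best_j
```

Youden's index describes fit to the data already seen; the NPI approach in the abstract instead targets the classification of future observations, which is where the two can disagree.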

4.
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning, and credit scoring. The receiver operating characteristic (ROC) surface is a useful tool to assess the ability of a diagnostic test to discriminate among three ordered classes or groups. In this article, nonparametric predictive inference (NPI) for three-group ROC analysis for ordinal outcomes is presented. NPI is a frequentist statistical method that is explicitly aimed at using few modeling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. This article also includes results on the volumes under the ROC surfaces and consideration of the choice of decision thresholds for the diagnosis. Two examples are provided to illustrate our method.

5.
Nonparametric predictive inference (NPI) is a powerful frequentist statistical framework based only on an exchangeability assumption for future and past observations, made possible by the use of lower and upper probabilities. In this article, NPI is presented for ordinal data, which are categorical data with an ordering of the categories. The method uses a latent variable representation of the observations and categories on the real line. Lower and upper probabilities for events involving the next observation are presented, and briefly compared to NPI for non-ordered categorical data. As an application, the comparison of multiple groups of ordinal data is presented.

6.
We consider lifetime experiments to compare units from different groups, where the units’ lifetimes may be right censored. Nonparametric predictive inference for comparison of multiple groups is presented, in particular lower and upper probabilities for the event that a specific group will provide the largest next lifetime. We include the practically relevant consideration that the overall lifetime experiment may be terminated at an early stage, leading to simultaneous right-censoring of all units still in the experiment.

7.
The mixture maximum likelihood approach to clustering is used to allocate treatments from a randomized complete block design into relatively homogeneous groups. The implementation of this approach is straightforward for fixed but not random block effects. The density function in each underlying group is assumed to be normal and clustering is performed on the basis of the estimated posterior probabilities of group membership. A test based on the log likelihood under the mixture model can be used to assess the actual number of groups present. The technique is demonstrated by using it to cluster data from a randomized complete block experiment.

8.
In this article, we present a model-based framework to estimate the educational attainments of students in latent groups defined by unobservable or only partially observed features that are likely to affect the outcome distribution and that are of interest in their own right. We focus on students in the first year of upper secondary school, for whom the teachers’ recommendation, given at the end of the lower educational level, about the subsequent type of school is available. We use this information to construct latent strata according to the compliance behavior of students, simplifying to the case of binary data for both the counseled and the attended school (i.e., academic or technical institute). We consider a likelihood-based approach to estimate outcome distributions in the latent groups and propose a set of plausible assumptions with respect to the problem at hand. To assess our method and its robustness, we simulate data resembling a real study conducted on pupils in the province of Bologna in the school year 2007/2008 to investigate their success or failure at the end of the first school year.

9.
In several statistical problems, nonparametric confidence intervals for population quantiles can be constructed and their coverage probabilities can be computed exactly, but cannot in general be rendered equal to a pre-determined level. The same difficulty arises for coverage probabilities of nonparametric prediction intervals for future observations. One solution to this difficulty is to interpolate between intervals whose coverage probabilities are closest to the pre-determined level from above and below. In this paper, confidence intervals for population quantiles are constructed based on interpolated upper and lower records. Subsequently, prediction intervals are obtained for future upper records based on interpolated upper records. Additionally, we derive upper bounds for the coverage error of these confidence and prediction intervals. Finally, our results are applied to some real data sets, and a simulation study compares the proposed intervals with similar classical intervals from the literature.
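The exact coverage probabilities mentioned here follow from the binomial distribution of the number of observations below a quantile; because this coverage takes only finitely many values, interpolation is needed to approximate a pre-determined level. A sketch of the exact computation for an ordinary order-statistic interval (continuous parent distribution assumed; the paper's record-based intervals are analogous in spirit):

```python
from math import comb

def quantile_ci_coverage(n, r, s, p):
    """Exact coverage probability of (X_(r), X_(s)) as a confidence
    interval for the p-th population quantile of a continuous
    distribution:

        P(X_(r) <= xi_p < X_(s)) = sum_{k=r}^{s-1} C(n,k) p^k (1-p)^(n-k)
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, s))

# Coverage of (X_(2), X_(9)) for the median with n = 10 observations:
cov = quantile_ci_coverage(10, 2, 9, 0.5)
```

Because `cov` jumps between discrete values as r and s change, no choice of order statistics hits, say, exactly 0.95, which is what motivates the interpolation discussed in the abstract.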

10.
Clustered multinomial data with random cluster sizes commonly appear in health, environmental and ecological studies. Traditional approaches for analyzing clustered multinomial data rest on two assumptions: that cluster sizes are fixed, and that cluster sizes are positive. Randomness of the cluster sizes may be the determinant of the within-cluster correlation and between-cluster variation. We propose a baseline-category mixed model for clustered multinomial data with random cluster sizes based on Poisson mixed models. Our orthodox best linear unbiased predictor approach to this model depends only on the moment structure of unobserved distribution-free random effects. Our approach also consolidates the marginal and conditional modeling interpretations. Unlike the traditional methods, our approach can accommodate both random and zero cluster sizes. Two real-life multinomial data examples, crime data and food contamination data, are used to demonstrate the proposed methodology.

11.
New statistical procedures are introduced to analyse typical microRNA expression data sets. For each separate microRNA expression, the null hypothesis to be tested is that there is no difference between the distributions of the expression in different groups. The test statistics are then constructed with certain types of alternatives in mind. To avoid strong (parametric) distributional assumptions, the alternatives are formulated using probabilities of different orderings of pairs or triples of observations coming from different groups, and the test statistics are then constructed using the corresponding several-sample U-statistics, which are natural estimates of these probabilities. Classical several-sample rank test statistics, such as the Kruskal–Wallis and Jonckheere–Terpstra tests, are special cases in our approach. Also, as the number of variables (microRNAs) is huge, we confront a serious simultaneous testing problem. Different approaches to controlling the family-wise error rate or the false discovery rate are briefly discussed, and it is shown how the Chen–Stein theorem can be used to show that the family-wise error rate can be controlled for cluster-dependent microRNAs under weak assumptions. The theory is illustrated with an analysis of real data, a microRNA expression data set on Finnish (aggressive and non-aggressive) prostate cancer patients and their controls.
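As a point of reference, the Kruskal–Wallis statistic that arises as a special case of the U-statistic approach can be computed directly from pooled ranks. A minimal sketch without tie correction (illustrative only):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction): ranks the pooled
    sample, then measures how far each group's mean rank deviates from
    the overall mean rank (N+1)/2. Assumes all values are distinct."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # value -> rank
    n_total = len(pooled)
    ss = 0.0
    for g in groups:
        mean_rank = sum(rank[x] for x in g) / len(g)
        ss += len(g) * (mean_rank - (n_total + 1) / 2) ** 2
    return 12.0 / (n_total * (n_total + 1)) * ss
```

Under the null hypothesis of equal distributions, H is approximately chi-squared with (number of groups − 1) degrees of freedom, which is how a single microRNA would be tested before the multiplicity adjustment the abstract discusses.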

12.

The purpose of this paper is to show, in regression clustering, how to choose the most relevant solutions, analyze their stability, and provide information about the best combinations of the optimal number of groups, the restriction factor on the error variances across groups, and the level of trimming. The procedure is based on two steps. First, we generalize the information criteria of constrained robust multivariate clustering to the case of clustering weighted models. Unlike traditional approaches, which are based on choosing the best solution found by minimizing an information criterion (e.g., BIC), we concentrate on the so-called optimal stable solutions. In the second step, using the monitoring approach, we select the best value of the trimming factor. Finally, we validate the solution using a confirmatory forward search approach. A motivating example based on a novel dataset concerning European Union trade in face masks shows the limitations of existing procedures. The suggested approach is first applied to a set of well-known datasets from the robust regression clustering literature. We then focus on a set of international trade datasets and provide a novel, informative way of updating the subset in the random-start approach. The Supplementary material, in the spirit of the Special Issue, deepens the analysis of the trade data and compares the suggested approach with existing ones in the literature.


13.
Modeling survey data often requires knowledge of the design and weighting variables. With public-use survey data, some of these variables may not be available for confidentiality reasons. The proposed approach can be used in this situation, as long as calibrated weights and variables specifying the strata and primary sampling units are available. It gives consistent point estimation and a pivotal statistic for testing and confidence intervals. The proposed approach does not rely on with-replacement sampling, single-stage designs, negligible sampling fractions, or noninformative sampling. Adjustments based on design effects, eigenvalues, joint-inclusion probabilities or bootstrap are not needed. The inclusion probabilities and auxiliary variables do not have to be known. Multistage designs with unequal selection of primary sampling units are considered. Nonresponse can be easily accommodated if the calibrated weights include reweighting adjustment for nonresponse. We use an unconditional approach, where the variables and sample are random variables. The design can be informative.

14.
Communications in Statistics – Theory and Methods, 2012, 41(16–17): 3150–3161
We consider a new approach to deal with nonignorable nonresponse on an outcome variable, in a causal inference framework. Assuming that a binary instrumental variable for nonresponse is available, we provide a likelihood-based approach to identify and estimate heterogeneous causal effects of a binary treatment on specific latent subgroups of units, named principal strata, defined by the nonresponse behavior under each level of the treatment and of the instrument. We show that, within each stratum, nonresponse is ignorable and respondents can be properly compared by treatment status. To assess our method and its robustness when the usually invoked assumptions are relaxed or misspecified, we simulate data resembling a real experiment conducted on a panel survey which compares different methods of reducing panel attrition.

15.
We investigate the problem of selecting the best population from positive exponential family distributions based on type-I censored data. A Bayes rule is derived and a monotone property of the Bayes selection rule is obtained. Following that property, we propose an early selection rule. Through this early selection rule, one can terminate the experiment on a few populations early and possibly make the final decision before the censoring time. An example is provided in the final section to illustrate the use of the early selection rule.

16.
A subset selection procedure is developed for selecting a subset containing the multinomial population that has the highest value of a certain linear combination of the multinomial cell probabilities; such a population is called the ‘best’. The multivariate normal large-sample approximation to the multinomial distribution is used to derive expressions for the probability of a correct selection, and for the threshold constant involved in the procedure. The procedure guarantees that the probability of a correct selection is at least at a pre-assigned level. The proposed procedure is an extension of Gupta and Sobel's [14] selection procedure for binomials and of Bakir's [2] restrictive selection procedure for multinomials. One illustration of the procedure concerns population income mobility in four countries: Peru, Russia, South Africa and the USA. Analysis indicates that Russia and Peru fall in the selected subset containing the best population with respect to income mobility from poverty to a higher-income status. The procedure is also applied to data concerning the grade distribution for students in a certain freshman class.

17.
We consider kernel methods to construct nonparametric estimators of a regression function based on incomplete data. To tackle the presence of incomplete covariates, we employ Horvitz–Thompson-type inverse weighting techniques, where the weights are the selection probabilities. The unknown selection probabilities are themselves estimated using (1) kernel regression, when the functional form of these probabilities is completely unknown, and (2) the least-squares method, when the selection probabilities belong to a known class of candidate functions. To assess the overall performance of the proposed estimators, we establish exponential upper bounds on the \(L_p\) norms, \(1\le p<\infty \), of our estimators; these bounds immediately yield various strong convergence results. We also apply our results to deal with the important problem of statistical classification with partially observed covariates.
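The Horvitz–Thompson-type inverse weighting in a kernel regression estimator can be sketched as a weighted Nadaraya–Watson estimator, with each complete case weighted by the reciprocal of its selection probability. A minimal illustration with a Gaussian kernel (selection probabilities taken as known here; the paper's contribution is estimating them):

```python
import math

def ht_kernel_regression(x_obs, y_obs, pi_obs, x0, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x0] from complete cases,
    with Horvitz-Thompson inverse weighting: each observed pair is
    weighted by 1/pi_i to correct for the selection mechanism."""
    num = den = 0.0
    for xi, yi, pi in zip(x_obs, y_obs, pi_obs):
        k = math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2)  # Gaussian kernel
        num += k * yi / pi
        den += k / pi
    return num / den
```

Up-weighting cases that were unlikely to be fully observed removes the bias that would arise from simply dropping incomplete observations, provided the selection probabilities are correct.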

18.
In recent years, high failure rates in phase III trials have been observed. One of the main reasons is overoptimistic assumptions for the planning of phase III resulting from limited phase II information and/or unawareness of realistic success probabilities. We present an approach for planning a phase II trial in a time-to-event setting that considers the whole phase II/III clinical development programme. We derive stopping boundaries after phase II that minimise the number of events under side conditions for the conditional probabilities of a correct go/no-go decision after phase II as well as the conditional success probabilities for phase III. In addition, we give general recommendations for the choice of phase II sample size. Our simulations show that the unconditional probabilities of a go/no-go decision as well as the unconditional success probabilities for phase III are influenced by the number of events observed in phase II. However, choosing more than 150 events in phase II does not seem necessary, as the impact on these probabilities then becomes quite small. We recommend considering aspects like the number of compounds in phase II and the resources available when determining the sample size. The lower the number of compounds and the lower the resources are for phase III, the higher the investment for phase II should be. Copyright © 2015 John Wiley & Sons, Ltd.

19.
Using a forward selection procedure for selecting the best subset of regression variables involves the calculation of critical values (cutoffs) for an F-ratio at each step of a multistep search process. On dropping the restrictive (unrealistic) assumptions used in previous works, the null distribution of the F-ratio depends on unknown regression parameters for the variables already included in the subset. For the case of known σ, by conditioning the F-ratio on the set of regressors included so far and also on the observed (estimated) values of their regression coefficients, we obtain a forward selection procedure whose stepwise type I error does not depend on the unknown (nuisance) parameters. A numerical example with an orthogonal design matrix illustrates the difference between conditional cutoffs, cutoffs for the central F-distribution, and cutoffs suggested by Pope and Webster.
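A generic forward selection loop with an F-ratio entry criterion illustrates the setting; the fixed cutoff below stands in for the conditional cutoffs derived in the paper and is purely illustrative:

```python
import numpy as np

def forward_select(X, y, f_cutoff=4.0):
    """Forward selection: at each step, add the candidate variable with
    the largest partial F-ratio for entry; stop when no candidate's
    F-ratio exceeds the cutoff. An intercept is always included."""
    n = len(y)
    selected = []

    def rss(cols):
        # Residual sum of squares of OLS on the given columns + intercept.
        if not cols:
            return float(np.sum((y - y.mean()) ** 2))
        A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        return float(np.sum((y - A @ beta) ** 2))

    while True:
        rss_cur = rss(selected)
        best = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            rss_new = rss(selected + [j])
            df_resid = n - len(selected) - 2  # intercept + selected + candidate
            f = (rss_cur - rss_new) / (rss_new / df_resid)
            if best is None or f > best[1]:
                best = (j, f)
        if best is None or best[1] < f_cutoff:
            return selected
        selected.append(best[0])
```

The paper's point is precisely that comparing each step's F-ratio to a single fixed cutoff like this does not control the stepwise type I error once the unrealistic assumptions are dropped, which motivates the conditional cutoffs.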

20.
In many diagnostic studies, multiple diagnostic tests are performed on each subject or multiple disease markers are available. Commonly, this information is combined to improve the diagnostic accuracy. We consider the problem of comparing the discriminatory abilities between two groups of biomarkers. Specifically, this article focuses on confidence interval estimation of the difference between paired AUCs based on optimally combined markers under the assumption of multivariate normality. Simulation studies demonstrate that the proposed generalized variable approach provides confidence intervals with satisfying coverage probabilities at finite sample sizes. The proposed method can also easily provide P-values for hypothesis testing. Application to analysis of a subset of data from a study on coronary heart disease illustrates the utility of the method in practice.
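Under multivariate normality, the optimal linear combination of markers is the classical Su–Liu combination, whose AUC has a closed form. A sketch of that combined AUC (assuming, as seems likely from the abstract, that this is the setting the paper builds on; in practice the means and covariances would be estimated from data):

```python
import numpy as np
from math import erf, sqrt

def combined_auc(mu_d, mu_h, sigma_d, sigma_h):
    """AUC of the best linear combination of normally distributed markers:
    with a = (Sigma_d + Sigma_h)^{-1} (mu_d - mu_h), the combined score
    a'X achieves AUC = Phi(sqrt(delta' (Sigma_d + Sigma_h)^{-1} delta)),
    where delta = mu_d - mu_h."""
    delta = np.asarray(mu_d, dtype=float) - np.asarray(mu_h, dtype=float)
    s = np.asarray(sigma_d, dtype=float) + np.asarray(sigma_h, dtype=float)
    d2 = float(delta @ np.linalg.solve(s, delta))
    z = sqrt(d2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z
```

Comparing two groups of markers then amounts to comparing two such combined AUCs, which is the paired difference the article's generalized variable intervals target.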
