Similar Documents
20 similar documents found (search time: 31 ms)
1.
In finance, inferences about future asset returns are typically quantified using parametric distributions and single-valued probabilities. It is attractive to use less restrictive inferential methods, including nonparametric methods, which do not require distributional assumptions about variables, and imprecise probability methods, which generalize the classical concept of probability to set-valued quantities. Main attractions include the flexibility of the inferences to adapt to the available data, and the fact that the level of imprecision in inferences can reflect the amount of data on which they are based. This paper introduces nonparametric predictive inference (NPI) for stock returns. NPI is a statistical approach based on few assumptions, with inferences strongly based on data and with uncertainty quantified via lower and upper probabilities. NPI is presented for inference about future stock returns, as a measure of risk and uncertainty, and for pairwise comparison of two stocks based on their future aggregate returns. The proposed NPI methods are illustrated using historical stock market data.
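The NPI lower and upper probabilities referred to in the abstract can be illustrated for a simple event. A minimal sketch, assuming Hill's assumption A(n) and a threshold that does not coincide with any observed return (the function name and data are illustrative, not from the paper):

```python
def npi_event_probs(data, t):
    """NPI lower/upper probability that the next observation exceeds t.

    Based on Hill's assumption A(n): the next value falls in each of the
    n+1 intervals created by the n data points with probability 1/(n+1).
    Assumes t does not equal any observed value.
    """
    n = len(data)
    greater = sum(1 for x in data if x > t)
    lower = greater / (n + 1)        # mass of intervals entirely above t
    upper = (greater + 1) / (n + 1)  # adds the interval containing t
    return lower, upper

# Lower/upper probability that the next return is positive,
# given five observed returns:
lo, up = npi_event_probs([0.02, -0.01, 0.03, 0.01, -0.02], 0.0)
```

The gap between `lo` and `up` is 1/(n+1), so the imprecision shrinks as more data become available, reflecting the point made in the abstract.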

2.
In reliability and lifetime testing, comparison of two groups of data is a common problem. It is often attractive, or even necessary, to make a quick and efficient decision in order to save time and costs. This paper presents a nonparametric predictive inference (NPI) approach to compare two groups, say X and Y, when one or both are progressively censored. NPI can easily be applied to different types of progressive censoring schemes. NPI is a statistical approach based on few assumptions, with inferences strongly based on data and with uncertainty quantified via lower and upper probabilities. These inferences concern the event that the lifetime of a future unit from Y is greater than the lifetime of a future unit from X.
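For complete (uncensored) data without ties, the NPI comparison of two groups reduces to a pairwise count over the observations, a form given in the NPI literature; the progressively censored schemes treated in the paper need additional machinery. A sketch of the simpler complete-data case (function name illustrative):

```python
def npi_compare(x, y):
    """NPI lower/upper probability that a future Y lifetime exceeds a
    future X lifetime, for complete data with no ties.

    The next X falls in one of n+1 intervals and the next Y in one of
    m+1 intervals, each pair of intervals carrying mass 1/((n+1)(m+1)).
    """
    n, m = len(x), len(y)
    wins = sum(1 for xi in x for yj in y if yj > xi)
    denom = (n + 1) * (m + 1)
    lower = wins / denom                    # pairs where Y > X is certain
    upper = (wins + n + m + 1) / denom      # pairs where Y > X is possible
    return lower, upper
```

As a sanity check, when every observed Y exceeds every observed X the upper probability equals 1, while the lower probability stays below 1 because future units may fall outside the observed ranges.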

3.
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning and credit scoring. The receiver operating characteristic (ROC) curve and surface are useful tools to assess the ability of diagnostic tests to discriminate between ordered classes or groups. To define these diagnostic tests, selecting the optimal thresholds that maximize the accuracy of these tests is required. One procedure that is commonly used to find the optimal thresholds is maximizing what is known as Youden’s index. This article presents nonparametric predictive inference (NPI) for selecting the optimal thresholds of a diagnostic test. NPI is a frequentist statistical method that is explicitly aimed at using few modeling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. Based on multiple future observations, the NPI approach is presented for selecting the optimal thresholds for two-group and three-group scenarios. In addition, a pairwise approach is also presented for the three-group scenario. The article ends with an example to illustrate the proposed methods and a simulation study of the predictive performance of the proposed methods along with some classical methods such as Youden’s index. The NPI-based methods show some interesting results that overcome some of the issues concerning the predictive performance of Youden’s index.
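For reference, the classical Youden's index that the NPI methods are compared against maximizes J(c) = sensitivity(c) + specificity(c) − 1 over candidate thresholds c. A minimal empirical sketch for the two-group case (names illustrative; assumes larger marker values indicate disease):

```python
import numpy as np

def youden_threshold(diseased, healthy):
    """Empirical Youden's index: pick the cutoff c maximizing
    J(c) = sensitivity(c) + specificity(c) - 1, searching over the
    pooled observed marker values."""
    diseased = np.asarray(diseased, dtype=float)
    healthy = np.asarray(healthy, dtype=float)
    candidates = np.sort(np.concatenate([diseased, healthy]))
    best_c, best_j = None, -np.inf
    for c in candidates:
        sens = np.mean(diseased > c)   # true positive rate at cutoff c
        spec = np.mean(healthy <= c)   # true negative rate at cutoff c
        j = sens + spec - 1
        if j > best_j:
            best_c, best_j = c, float(j)
    return best_c, best_j
```

Youden's index describes fit to the data already seen; the NPI approach in the abstract instead targets the classification of future observations, which is where the two can disagree.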

4.
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning, and credit scoring. The receiver operating characteristic (ROC) surface is a useful tool to assess the ability of a diagnostic test to discriminate among three ordered classes or groups. In this article, nonparametric predictive inference (NPI) for three-group ROC analysis for ordinal outcomes is presented. NPI is a frequentist statistical method that is explicitly aimed at using few modeling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. This article also includes results on the volumes under the ROC surfaces and consideration of the choice of decision thresholds for the diagnosis. Two examples are provided to illustrate our method.

5.
Nonparametric predictive inference (NPI) is a powerful frequentist statistical framework based only on an exchangeability assumption for future and past observations, made possible by the use of lower and upper probabilities. In this article, NPI is presented for ordinal data, which are categorical data with an ordering of the categories. The method uses a latent variable representation of the observations and categories on the real line. Lower and upper probabilities for events involving the next observation are presented, and briefly compared to NPI for non-ordered categorical data. As an application, the comparison of multiple groups of ordinal data is presented.

6.
We consider lifetime experiments to compare units from different groups, where the units’ lifetimes may be right censored. Nonparametric predictive inference for comparison of multiple groups is presented, in particular lower and upper probabilities for the event that a specific group will provide the largest next lifetime. We include the practically relevant consideration that the overall lifetime experiment may be terminated at an early stage, leading to simultaneous right-censoring of all units still in the experiment.

7.
The mixture maximum likelihood approach to clustering is used to allocate treatments from a randomized complete block design into relatively homogeneous groups. The implementation of this approach is straightforward for fixed but not random block effects. The density function in each underlying group is assumed to be normal and clustering is performed on the basis of the estimated posterior probabilities of group membership. A test based on the log likelihood under the mixture model can be used to assess the actual number of groups present. The technique is demonstrated by using it to cluster data from a randomized complete block experiment.

8.
In this article, we present a model-based framework to estimate the educational attainments of students in latent groups defined by unobservable or only partially observed features that are likely to affect the outcome distribution and that are of interest in their own right. We focus on students in the first year of upper secondary school, for whom the teachers’ recommendation, given at the end of the lower educational level, about the subsequent type of school is available. We use this information to construct latent strata according to the compliance behavior of students, simplifying to the case of binary data for both the counseled and the attended school (i.e., academic or technical institute). We consider a likelihood-based approach to estimate outcome distributions in the latent groups and propose a set of plausible assumptions with respect to the problem at hand. To assess our method and its robustness, we simulate data resembling a real study conducted on pupils in the province of Bologna in the school year 2007/2008 to investigate their success or failure at the end of the first school year.

9.
In several statistical problems, nonparametric confidence intervals for population quantiles can be constructed and their coverage probabilities can be computed exactly, but cannot in general be rendered equal to a pre-determined level. The same difficulty arises for coverage probabilities of nonparametric prediction intervals for future observations. One solution to this difficulty is to interpolate between intervals whose coverage probabilities are closest to the pre-determined level from above and below. In this paper, confidence intervals for population quantiles are constructed based on interpolated upper and lower records. Subsequently, prediction intervals are obtained for future upper records based on interpolated upper records. Additionally, we derive upper bounds for the coverage error of these confidence and prediction intervals. Finally, our results are applied to some real data sets, and a simulation study compares the proposed intervals with similar classical intervals from the literature.
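The exact coverage probabilities mentioned here follow from the binomial distribution of the number of observations below a quantile; because this coverage takes only finitely many values, interpolation is needed to approximate a pre-determined level. A sketch of the exact computation for an ordinary order-statistic interval (continuous parent distribution assumed; the paper's record-based intervals are analogous in spirit):

```python
from math import comb

def quantile_ci_coverage(n, r, s, p):
    """Exact coverage probability of (X_(r), X_(s)) as a confidence
    interval for the p-th population quantile of a continuous
    distribution:

        P(X_(r) <= xi_p < X_(s)) = sum_{k=r}^{s-1} C(n,k) p^k (1-p)^(n-k)
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, s))

# Coverage of (X_(2), X_(9)) for the median with n = 10 observations:
cov = quantile_ci_coverage(10, 2, 9, 0.5)
```

Because `cov` jumps between discrete values as r and s change, no choice of order statistics hits, say, exactly 0.95, which is what motivates the interpolation discussed in the abstract.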

10.
Clustered multinomial data with random cluster sizes commonly appear in health, environmental and ecological studies. Traditional approaches for analyzing clustered multinomial data rest on two assumptions: that cluster sizes are fixed, and that cluster sizes are positive. Randomness of the cluster sizes may be the determinant of the within-cluster correlation and between-cluster variation. We propose a baseline-category mixed model for clustered multinomial data with random cluster sizes based on Poisson mixed models. Our orthodox best linear unbiased predictor approach to this model depends only on the moment structure of unobserved distribution-free random effects. Our approach also consolidates the marginal and conditional modeling interpretations. Unlike the traditional methods, our approach can accommodate both random and zero cluster sizes. Two real-life multinomial data examples, crime data and food contamination data, are used to demonstrate the proposed methodology.

11.
New statistical procedures are introduced to analyse typical microRNA expression data sets. For each separate microRNA expression, the null hypothesis to be tested is that there is no difference between the distributions of the expression in different groups. The test statistics are then constructed with certain types of alternatives in mind. To avoid strong (parametric) distributional assumptions, the alternatives are formulated using probabilities of different orderings of pairs or triples of observations coming from different groups, and the test statistics are then constructed using the corresponding several-sample U-statistics, which are natural estimates of these probabilities. Classical several-sample rank test statistics, such as the Kruskal–Wallis and Jonckheere–Terpstra tests, are special cases in our approach. Also, as the number of variables (microRNAs) is huge, we confront a serious simultaneous testing problem. Different approaches to controlling the family-wise error rate or the false discovery rate are briefly discussed, and it is shown how the Chen–Stein theorem can be used to show that the family-wise error rate can be controlled for cluster-dependent microRNAs under weak assumptions. The theory is illustrated with an analysis of real data, a microRNA expression data set on Finnish (aggressive and non-aggressive) prostate cancer patients and their controls.
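As a point of reference, the Kruskal–Wallis statistic that arises as a special case of the U-statistic approach can be computed directly from pooled ranks. A minimal sketch without tie correction (illustrative only):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction): ranks the pooled
    sample, then measures how far each group's mean rank deviates from
    the overall mean rank (N+1)/2. Assumes all values are distinct."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # value -> rank
    n_total = len(pooled)
    ss = 0.0
    for g in groups:
        mean_rank = sum(rank[x] for x in g) / len(g)
        ss += len(g) * (mean_rank - (n_total + 1) / 2) ** 2
    return 12.0 / (n_total * (n_total + 1)) * ss
```

Under the null hypothesis of equal distributions, H is approximately chi-squared with (number of groups − 1) degrees of freedom, which is how a single microRNA would be tested before the multiplicity adjustment the abstract discusses.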

12.

The purpose of this paper is to show, in regression clustering, how to choose the most relevant solutions, analyze their stability, and provide information about the best combinations of the optimal number of groups, the restriction factor on the error variances across groups, and the level of trimming. The procedure is based on two steps. First, we generalize the information criteria of constrained robust multivariate clustering to the case of clustering weighted models. Unlike traditional approaches, which are based on choosing the best solution found by minimizing an information criterion (e.g., BIC), we concentrate on the so-called optimal stable solutions. In the second step, using the monitoring approach, we select the best value of the trimming factor. Finally, we validate the solution using a confirmatory forward search approach. A motivating example based on a novel dataset concerning European Union trade in face masks shows the limitations of existing procedures. The suggested approach is first applied to a set of well-known datasets from the robust regression clustering literature. We then focus on a set of international trade datasets and provide a novel, informative way of updating the subset in the random-start approach. The Supplementary material, in the spirit of the Special Issue, deepens the analysis of the trade data and compares the suggested approach with existing ones in the literature.


13.
Modeling survey data often requires knowledge of the design and weighting variables. With public-use survey data, some of these variables may not be available for confidentiality reasons. The proposed approach can be used in this situation, as long as calibrated weights and variables specifying the strata and primary sampling units are available. It gives consistent point estimation and a pivotal statistic for testing and confidence intervals. The proposed approach does not rely on with-replacement sampling, single-stage designs, negligible sampling fractions, or noninformative sampling. Adjustments based on design effects, eigenvalues, joint-inclusion probabilities or bootstrap are not needed. The inclusion probabilities and auxiliary variables do not have to be known. Multistage designs with unequal selection of primary sampling units are considered. Nonresponse can be easily accommodated if the calibrated weights include reweighting adjustment for nonresponse. We use an unconditional approach, where the variables and sample are random variables. The design can be informative.

14.
Communications in Statistics – Theory and Methods, 2012, 41(16–17): 3150–3161
We consider a new approach to deal with nonignorable nonresponse on an outcome variable, in a causal inference framework. Assuming that a binary instrumental variable for nonresponse is available, we provide a likelihood-based approach to identify and estimate heterogeneous causal effects of a binary treatment on specific latent subgroups of units, named principal strata, defined by the nonresponse behavior under each level of the treatment and of the instrument. We show that, within each stratum, nonresponse is ignorable and respondents can be properly compared by treatment status. To assess our method and its robustness when the usually invoked assumptions are relaxed or misspecified, we simulate data resembling a real experiment conducted on a panel survey which compares different methods of reducing panel attrition.

15.
We investigate the problem of selecting the best population from positive exponential family distributions based on type-I censored data. A Bayes rule is derived and a monotone property of the Bayes selection rule is obtained. Following that property, we propose an early selection rule. Through this early selection rule, one can terminate the experiment on a few populations early and possibly make the final decision before the censoring time. An example is provided in the final section to illustrate the use of the early selection rule.

16.
A subset selection procedure is developed for selecting a subset containing the multinomial population that has the highest value of a certain linear combination of the multinomial cell probabilities; such a population is called the ‘best’. The multivariate normal large-sample approximation to the multinomial distribution is used to derive expressions for the probability of a correct selection, and for the threshold constant involved in the procedure. The procedure guarantees that the probability of a correct selection is at least at a pre-assigned level. The proposed procedure is an extension of Gupta and Sobel's [14] selection procedure for binomials and of Bakir's [2] restrictive selection procedure for multinomials. One illustration of the procedure concerns population income mobility in four countries: Peru, Russia, South Africa and the USA. Analysis indicates that Russia and Peru fall in the selected subset containing the best population with respect to income mobility from poverty to a higher-income status. The procedure is also applied to data concerning the grade distribution for students in a certain freshman class.

17.
We consider kernel methods to construct nonparametric estimators of a regression function based on incomplete data. To tackle the presence of incomplete covariates, we employ Horvitz–Thompson-type inverse weighting techniques, where the weights are the selection probabilities. The unknown selection probabilities are themselves estimated using (1) kernel regression, when the functional form of these probabilities is completely unknown, and (2) the least-squares method, when the selection probabilities belong to a known class of candidate functions. To assess the overall performance of the proposed estimators, we establish exponential upper bounds on the \(L_p\) norms, \(1\le p<\infty \), of our estimators; these bounds immediately yield various strong convergence results. We also apply our results to deal with the important problem of statistical classification with partially observed covariates.
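The Horvitz–Thompson-type inverse weighting in a kernel regression estimator can be sketched as a weighted Nadaraya–Watson estimator, with each complete case weighted by the reciprocal of its selection probability. A minimal illustration with a Gaussian kernel (selection probabilities taken as known here; the paper's contribution is estimating them):

```python
import math

def ht_kernel_regression(x_obs, y_obs, pi_obs, x0, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x0] from complete cases,
    with Horvitz-Thompson inverse weighting: each observed pair is
    weighted by 1/pi_i to correct for the selection mechanism."""
    num = den = 0.0
    for xi, yi, pi in zip(x_obs, y_obs, pi_obs):
        k = math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2)  # Gaussian kernel
        num += k * yi / pi
        den += k / pi
    return num / den
```

Up-weighting cases that were unlikely to be fully observed removes the bias that would arise from simply dropping incomplete observations, provided the selection probabilities are correct.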

18.
In recent years, high failure rates in phase III trials have been observed. One of the main reasons is overoptimistic assumptions for the planning of phase III resulting from limited phase II information and/or unawareness of realistic success probabilities. We present an approach for planning a phase II trial in a time-to-event setting that considers the whole phase II/III clinical development programme. We derive stopping boundaries after phase II that minimise the number of events under side conditions for the conditional probabilities of a correct go/no-go decision after phase II as well as the conditional success probabilities for phase III. In addition, we give general recommendations for the choice of phase II sample size. Our simulations show that the unconditional probabilities of a go/no-go decision as well as the unconditional success probabilities for phase III are influenced by the number of events observed in phase II. However, choosing more than 150 events in phase II does not seem necessary, as the impact on these probabilities then becomes quite small. We recommend considering aspects like the number of compounds in phase II and the resources available when determining the sample size. The lower the number of compounds and the lower the resources are for phase III, the higher the investment for phase II should be. Copyright © 2015 John Wiley & Sons, Ltd.

19.
Using a forward selection procedure for selecting the best subset of regression variables involves the calculation of critical values (cutoffs) for an F-ratio at each step of a multistep search process. On dropping the restrictive (unrealistic) assumptions used in previous works, the null distribution of the F-ratio depends on unknown regression parameters for the variables already included in the subset. For the case of known σ, by conditioning the F-ratio on the set of regressors included so far and also on the observed (estimated) values of their regression coefficients, we obtain a forward selection procedure whose stepwise type I error does not depend on the unknown (nuisance) parameters. A numerical example with an orthogonal design matrix illustrates the difference between conditional cutoffs, cutoffs for the central F-distribution, and cutoffs suggested by Pope and Webster.
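A generic forward selection loop with an F-ratio entry criterion illustrates the setting; the fixed cutoff below stands in for the conditional cutoffs derived in the paper and is purely illustrative:

```python
import numpy as np

def forward_select(X, y, f_cutoff=4.0):
    """Forward selection: at each step, add the candidate variable with
    the largest partial F-ratio for entry; stop when no candidate's
    F-ratio exceeds the cutoff. An intercept is always included."""
    n = len(y)
    selected = []

    def rss(cols):
        # Residual sum of squares of OLS on the given columns + intercept.
        if not cols:
            return float(np.sum((y - y.mean()) ** 2))
        A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        return float(np.sum((y - A @ beta) ** 2))

    while True:
        rss_cur = rss(selected)
        best = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            rss_new = rss(selected + [j])
            df_resid = n - len(selected) - 2  # intercept + selected + candidate
            f = (rss_cur - rss_new) / (rss_new / df_resid)
            if best is None or f > best[1]:
                best = (j, f)
        if best is None or best[1] < f_cutoff:
            return selected
        selected.append(best[0])
```

The paper's point is precisely that comparing each step's F-ratio to a single fixed cutoff like this does not control the stepwise type I error once the unrealistic assumptions are dropped, which motivates the conditional cutoffs.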

20.
In many diagnostic studies, multiple diagnostic tests are performed on each subject or multiple disease markers are available. Commonly, this information is combined to improve the diagnostic accuracy. We consider the problem of comparing the discriminatory abilities between two groups of biomarkers. Specifically, this article focuses on confidence interval estimation of the difference between paired AUCs based on optimally combined markers under the assumption of multivariate normality. Simulation studies demonstrate that the proposed generalized variable approach provides confidence intervals with satisfying coverage probabilities at finite sample sizes. The proposed method can also easily provide P-values for hypothesis testing. Application to analysis of a subset of data from a study on coronary heart disease illustrates the utility of the method in practice.
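Under multivariate normality, the optimal linear combination of markers is the classical Su–Liu combination, whose AUC has a closed form. A sketch of that combined AUC (assuming, as seems likely from the abstract, that this is the setting the paper builds on; in practice the means and covariances would be estimated from data):

```python
import numpy as np
from math import erf, sqrt

def combined_auc(mu_d, mu_h, sigma_d, sigma_h):
    """AUC of the best linear combination of normally distributed markers:
    with a = (Sigma_d + Sigma_h)^{-1} (mu_d - mu_h), the combined score
    a'X achieves AUC = Phi(sqrt(delta' (Sigma_d + Sigma_h)^{-1} delta)),
    where delta = mu_d - mu_h."""
    delta = np.asarray(mu_d, dtype=float) - np.asarray(mu_h, dtype=float)
    s = np.asarray(sigma_d, dtype=float) + np.asarray(sigma_h, dtype=float)
    d2 = float(delta @ np.linalg.solve(s, delta))
    z = sqrt(d2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z
```

Comparing two groups of markers then amounts to comparing two such combined AUCs, which is the paired difference the article's generalized variable intervals target.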
