Similar Literature
 20 similar documents found (search time: 406 ms)
1.
ABSTRACT

Sharp bounds on expected values of L-statistics based on a sample of possibly dependent, identically distributed random variables are given in the case when the sample size is a random variable with values in the set {0, 1, 2,…}. The dependence among observations is modeled by copulas and mixing. The bounds are attainable and provide characterizations of some nontrivial distributions.

2.
A class of closed inverse sampling procedures R(n,m) for selecting the multinomial cell with the largest probability is considered; here n is the maximum sample size that an experimenter can take and m is the maximum frequency that a multinomial cell can have. The proposed procedures R(n,m) achieve the same probability of a correct selection as do the corresponding fixed sample size procedures and the curtailed sequential procedures when m is at least n/2. A monotonicity property of the probability of a correct selection is proved and is used to find the least favorable configurations and to tabulate the necessary probabilities of a correct selection and the corresponding expected sample sizes.
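As a rough illustration of the fixed-sample-size baseline these procedures are compared against, the sketch below Monte Carlo-estimates the probability of a correct selection for the rule that picks the cell with the largest observed frequency. The function name, tie-breaking rule, and slippage configuration are ours, not the paper's:

```python
import random
from collections import Counter

def p_correct_selection(p, n, reps=20_000, seed=1):
    """Monte Carlo estimate of P(correct selection) for the
    fixed-sample-size rule: draw n multinomial observations and
    select the cell with the largest observed frequency, breaking
    ties at random; a selection is correct when the chosen cell
    has the largest probability p[i]."""
    rng = random.Random(seed)
    cells = range(len(p))
    best = max(cells, key=lambda i: p[i])
    hits = 0
    for _ in range(reps):
        counts = Counter(rng.choices(cells, weights=p, k=n))
        top = max(counts.values())
        winners = [c for c, v in counts.items() if v == top]
        hits += rng.choice(winners) == best
    return hits / reps

# Slippage configuration: one cell twice as likely as the others.
print(round(p_correct_selection([0.5, 0.25, 0.25], n=30), 2))
```

With three equally likely cells the estimate should hover near 1/3, which is a quick sanity check on the tie-breaking logic.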

3.
Assuming stratified simple random sampling, a confidence interval for a finite population quantile may be desired. Using a confidence interval with endpoints given by order statistics from the combined stratified sample, several procedures to obtain lower bounds (and approximations for the lower bounds) for the confidence coefficients are presented. The procedures differ with respect to the amount of prior information assumed about the variate values in the finite population, and the extent to which sample data are used to estimate the lower bounds.

4.
Several distribution-free bounds on expected values of L-statistics based on the sample of possibly dependent and nonidentically distributed random variables are given in the case when the sample size is a random variable, possibly dependent on the observations, with values in the set {1,2,…}. Some bounds extend the results of Papadatos (2001a) to the case of random sample size. The others provide new evaluations even if the sample size is nonrandom. Some applications of the presented bounds are also indicated.

5.
This paper deals with the problem of selecting the best population from among k (≥ 2) two-parameter exponential populations. New selection procedures are proposed for selecting the unique best. The procedures include preliminary tests which allow the experimenter the option not to select if the statistical evidence is not significant. Two probabilities, the probability to make a selection and the probability of a correct selection, are controlled by these selection procedures. Comparisons between the proposed selection procedures and certain earlier existing procedures are also made. The results show the superiority of the proposed selection procedures in terms of the required sample size.

6.
A large sample approximation of the least favorable configuration for a fixed sample size selection procedure for negative binomial populations is proposed. A normal approximation of the selection procedure is also presented. Optimal sample sizes required to be drawn from each population, and bounds for these sample sizes, are tabulated. Sample sizes obtained using the approximate least favorable configuration are compared with those obtained using the exact least favorable configuration. An alternate form of the normal approximation to the probability of correct selection is also presented, and the relation between the required sample size and the number of populations involved is studied.

7.
We restrict attention to a class of Bernoulli subset selection procedures which take observations one-at-a-time and can be compared directly to the Gupta-Sobel single-stage procedure. For the criterion of minimizing the expected total number of observations required to terminate experimentation, we show that optimal sampling rules within this class are not of practical interest. We thus turn to procedures which, although not optimal, exhibit desirable behavior with regard to this criterion. A procedure which employs a modification of the so-called least-failures sampling rule is proposed, and is shown to possess many desirable properties among a restricted class of Bernoulli subset selection procedures. Within this class, it is optimal for minimizing the number of observations taken from populations excluded from consideration following a subset selection experiment, and asymptotically optimal for minimizing the expected total number of observations required. In addition, it can result in substantial savings in the expected total number of observations required as compared to a single-stage procedure; thus it may be desirable to a practitioner if sampling is costly or the sample size is limited.

8.
We establish best upper bounds on the expected differences of records and sample maxima, and kth records and kth maxima based on sequences of independent random variables with identical continuous distribution and finite variance. The bounds are expressed in terms of the standard deviation units of the parent distribution. We also provide conditions for attaining the bounds.

9.
The two approaches to a multinomial ranking and selection problem (for selecting the t best cells out of k) are combined to form a new approach. In this new approach there is a preference zone (PZ) and an indifference zone (IZ), and the concept of a correct selection (CS) is defined differently in each of these zones. Lower bounds for the probability of correct selection P(CS) are then guaranteed in each of these zones using a single experiment. The procedure is based on the ordered frequencies in the cells. The principal tool used to derive expressions for the P(CS), for the expected sample size E(N), for the expected subset size E(S), and for other probabilities is the Dirichlet integral (Type 2), which was recently tabulated. These Dirichlet integrals are used to prove that the multiplicative slippage configuration is least favorable in the PZ and, for t = 1, also in the IZ. Numerical calculations are carried out for an illustrative example, but extensive tables are not yet available.

10.
Studies of diagnostic tests are often designed with the goal of estimating the area under the receiver operating characteristic curve (AUC) because the AUC is a natural summary of a test's overall diagnostic ability. However, sample size projections dealing with AUCs are very sensitive to assumptions about the variance of the empirical AUC estimator, which depends on two correlation parameters. While these correlation parameters can be estimated from the available data, in practice it is hard to find reliable estimates before the study is conducted. Here we derive achievable bounds on the projected sample size that are free of these two correlation parameters. The lower bound is the smallest sample size that would yield the desired level of precision for some model, while the upper bound is the smallest sample size that would yield the desired level of precision for all models. These bounds are important reference points when designing a single or multi-arm study; they are the absolute minimum and maximum sample size that would ever be required. When the study design includes multiple readers or interpreters of the test, we derive bounds pertaining to the average reader AUC and the ‘pooled’ or overall AUC for the population of readers. These upper bounds for multireader studies are not too conservative when several readers are involved.
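The empirical AUC estimator whose variance drives these projections is the Mann-Whitney statistic; a minimal sketch (the example scores are invented):

```python
def empirical_auc(neg, pos):
    """Empirical AUC as the Mann-Whitney statistic: the fraction of
    (negative, positive) score pairs in which the positive case
    scores strictly higher, counting ties as one half."""
    wins = sum((x < y) + 0.5 * (x == y) for y in pos for x in neg)
    return wins / (len(neg) * len(pos))

# A perfect test would give AUC = 1; a chance-level test gives 0.5.
print(empirical_auc([0.1, 0.35, 0.4], [0.4, 0.8, 0.9]))
```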

11.
Despite the simplicity of the Bernoulli process, developing good confidence interval procedures for its parameter—the probability of success p—is deceptively difficult. The binary data yield a discrete number of successes from a discrete number of trials, n. This discreteness results in actual coverage probabilities that oscillate with n for fixed values of p (and with p for fixed n). Moreover, this oscillation necessitates a large sample size to guarantee a good coverage probability when p is close to 0 or 1.

It is well known that the Wilson procedure is superior to many existing procedures because it is less sensitive to p than the others, and is therefore less costly. The procedures proposed in this article work as well as the Wilson procedure when 0.1 ≤ p ≤ 0.9, and are even less sensitive (i.e., more robust) than the Wilson procedure when p is close to 0 or 1. Specifically, when the nominal coverage probability is 0.95, the Wilson procedure requires a sample size of 1,021 to guarantee that the coverage probabilities stay above 0.92 for any 0.001 ≤ min{p, 1 − p} < 0.01. By contrast, our procedures guarantee the same coverage probabilities but need a sample size of only 177, without increasing either the expected interval width or the standard deviation of the interval width.
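For reference, the Wilson interval mentioned above can be sketched in a few lines; the article's own proposed procedures are not reproduced here, and the function name is ours:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion; z is the
    standard normal quantile (1.96 for a nominal 95% interval)."""
    phat = successes / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(8, 20)
print(round(lo, 3), round(hi, 3))
```

Unlike the Wald interval, the Wilson interval never extends below 0 or above 1, which is part of why its coverage behaves better near the boundary.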

12.
Abstract. Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model selection criterion is proposed to select the best one among this preselected set. The approach leads to a fast and efficient procedure for variable selection, especially in high‐dimensional settings. Model selection consistency of the suggested criterion is proven when the number of covariates d is fixed. Simulation studies suggest that the criterion still enjoys model selection consistency when d is much larger than the sample size. The simulations also show that our approach for variable selection works surprisingly well in comparison with existing competitors. The method is also applied to a real data set.
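The idea of choosing among a preselected family of subsets with a consistent criterion can be sketched with BIC over ordinary least-squares fits. BIC is our stand-in here, not the paper's criterion, and in practice the candidate subsets would come from a Lasso regularization path rather than being fixed by hand:

```python
import math, random

def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X
    (rows of X include an intercept column), solving the normal
    equations by Gaussian elimination with partial pivoting."""
    n, p = len(y), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         for a in range(p)]
    b = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][k] * beta[k]
                              for k in range(r + 1, p))) / A[r][r]
    return sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2
               for i in range(n))

def bic(X, y):
    """BIC for a Gaussian linear model: n*log(RSS/n) + p*log(n)."""
    n, p = len(y), len(X[0])
    return n * math.log(ols_rss(X, y) / n) + p * math.log(n)

# Preselected subsets (as a regularization path might produce them):
# {x1} and {x1, x2}, where only x1 truly matters.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [rng.gauss(0, 1) for _ in range(200)]
y = [2 * a + rng.gauss(0, 1) for a in x1]
m1 = [[1.0, a] for a in x1]
m2 = [[1.0, a, b] for a, b in zip(x1, x2)]
print(min(["x1", "x1+x2"], key=lambda s: bic(m1 if s == "x1" else m2, y)))
```

The p·log(n) penalty is what yields selection consistency for fixed d; the paper's contribution is a criterion that behaves well beyond this regime.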

13.
Testing between hypotheses, when independent sampling is possible, is a well developed subject. In this paper, we propose hypothesis tests that are applicable when the samples are obtained using Markov chain Monte Carlo. These tests are useful when one is interested in deciding whether the expected value of a certain quantity is above or below a given threshold. We show non-asymptotic error bounds and bounds on the expected number of samples for three types of tests, a fixed sample size test, a sequential test with indifference region, and a sequential test without indifference region. Our tests can lead to significant savings in sample size. We illustrate our results on an example of Bayesian parameter inference involving an ODE model of a biochemical pathway.
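A minimal sketch of the fixed-sample-size variant, simplified to i.i.d. bounded draws so that Hoeffding's inequality gives a non-asymptotic error bound. The paper's bounds additionally handle Markov-chain dependence, and all names here are ours:

```python
import math, random, statistics

def fixed_sample_test(samples, theta, delta, a=0.0, b=1.0):
    """Fixed-sample-size threshold test sketch: decide whether E[X]
    is above or below theta from the sample mean.  If the true mean
    lies outside the indifference region (theta-delta, theta+delta),
    Hoeffding's inequality bounds the error probability by
    exp(-2*n*delta^2 / (b-a)^2), assuming i.i.d. samples in [a, b]."""
    n = len(samples)
    decision = "above" if statistics.fmean(samples) > theta else "below"
    err_bound = math.exp(-2 * n * delta ** 2 / (b - a) ** 2)
    return decision, err_bound

rng = random.Random(42)
draws = [rng.random() for _ in range(2000)]  # i.i.d. Uniform(0,1), mean 0.5
decision, err = fixed_sample_test(draws, theta=0.4, delta=0.05)
print(decision, err)
```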

14.
Consider k (≥2) independent Type I extreme value populations with unknown location parameters and a common known scale parameter. With samples of the same size, we study procedures based on the sample means for (1) selecting the population having the largest location parameter, (2) selecting the population having the smallest location parameter, and (3) testing for equality of all the location parameters. We use Bechhofer's indifference-zone and Gupta's subset selection formulations. Tables of constants for implementation are provided, based on approximating the distribution of the standardized sample mean by a generalized Tukey lambda distribution. Examples are provided for all procedures.
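The means-based rule of goal (1) is easy to simulate. The sketch below draws Type I extreme value (Gumbel) variates by inverse CDF and Monte Carlo-estimates the probability of a correct selection; the configuration and function names are ours, and the paper's tabulated constants are not reproduced:

```python
import math, random

def gumbel(rng, loc, scale=1.0):
    """Type I extreme value (Gumbel) draw via the inverse CDF."""
    return loc - scale * math.log(-math.log(rng.random()))

def p_correct_selection(locs, n, reps=5000, seed=7):
    """Monte Carlo P(correct selection) for the means rule: take n
    observations from each population and select the one with the
    largest sample mean; the selection is correct when that
    population has the largest location parameter."""
    rng = random.Random(seed)
    k = len(locs)
    best = max(range(k), key=lambda i: locs[i])
    hits = 0
    for _ in range(reps):
        means = [sum(gumbel(rng, m) for _ in range(n)) / n for m in locs]
        hits += max(range(k), key=lambda i: means[i]) == best
    return hits / reps

# Two populations at location 0 and one at location 1, n = 20 each.
print(round(p_correct_selection([0.0, 0.0, 1.0], n=20), 2))
```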

15.
The problem of selecting the best population from among a finite number of populations in the presence of uncertainty arises in many scientific investigations and has been studied extensively. Many selection procedures have been derived for different selection goals. However, most of these procedures, being frequentist in nature, do not indicate how to incorporate the information in a particular sample to give a data-dependent measure of the correct selection achieved for that sample. They often assign the same decision and probability of correct selection to two different sample values, one of which may seem intuitively much more conclusive than the other. The methodology of conditional inference offers an approach that achieves both frequentist interpretability and a data-dependent measure of conclusiveness. By partitioning the sample space into a family of subsets, the achieved probability of correct selection is computed by conditioning on the subset in which the sample falls. In this paper, the partition considered is the so-called continuum partition, while the selection rules are both the fixed-size and random-size subset selection rules. Under the assumption of monotone likelihood ratio, results on the least favourable configuration and alpha-correct selection are established. These results are not only useful in themselves but are also used to design a new sequential procedure with elimination for selecting the best of k binomial populations. Comparisons between this new procedure and other sequential selection procedures with regard to total expected sample size and some risk functions are carried out by simulations.

16.
The poor performance of the Wald method for constructing confidence intervals (CIs) for a binomial proportion has been demonstrated in a vast literature. The related problem of sample size determination needs to be updated, and comparative studies are essential to understanding the performance of alternative methods. In this paper, the sample size is obtained for the Clopper–Pearson, Bayesian (uniform and Jeffreys priors), Wilson, Agresti–Coull, Anscombe, and Wald methods. Two two-step procedures are used: one based on the expected length (EL) of the CI and another on its first-order approximation. In the first step, all possible solutions that satisfy the optimal criterion are obtained. In the second step, a single solution is proposed according to a new criterion (e.g. highest coverage probability (CP)). Since a sample size reduction is expected in practice, we explore the behavior of the methods allowing losses of 30% and 50%. For all the methods, the ELs are inflated, as expected, but the coverage probabilities remain close to the original target (with few exceptions). It is not easy to suggest a method that is optimal throughout the range (0, 1) for p; different recommendations are made depending on whether the goal is to achieve a CP approximately equal to, or above, the nominal level.
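Comparisons of this kind rest on computing the exact coverage probability of an interval method by summing the binomial pmf; a sketch for the Wald method, whose poor small-p behavior motivates this line of work (function name and example values are ours):

```python
from math import comb, sqrt

def wald_coverage(p, n, z=1.96):
    """Exact coverage probability of the Wald interval
    phat +/- z*sqrt(phat*(1-phat)/n) at a fixed p: sum the
    Binomial(n, p) pmf over all outcomes x whose interval covers p."""
    cover = 0.0
    for x in range(n + 1):
        phat = x / n
        half = z * sqrt(phat * (1 - phat) / n)
        if phat - half <= p <= phat + half:
            cover += comb(n, x) * p ** x * (1 - p) ** (n - x)
    return cover

# Coverage falls well below the nominal 95% for small p.
print(round(wald_coverage(0.05, 50), 3))
```

The same loop with different interval endpoints gives the coverage of Clopper–Pearson, Wilson, or any of the other methods compared in the paper.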

17.
Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold cross-validation, Monte Carlo cross-validation, and bootstrap procedures. For estimator selection, finite sample risk bounds are derived and applied to establish the asymptotic optimality of cross-validation, in the sense that a selector based on a cross-validated risk estimator performs asymptotically as well as an optimal oracle selector based on the risk under the true, unknown data generating distribution. The asymptotic results are derived under the assumption that the size of the validation sets converges to infinity and hence do not cover leave-one-out cross-validation. For performance assessment, cross-validated risk estimators are shown to be consistent and asymptotically linear for the risk under the true data generating distribution and confidence intervals are derived for this unknown risk. Unlike previously published results, the theorems derived in this and our related articles apply to general data generating distributions, loss functions (i.e., parameters), estimators, and cross-validation procedures.
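The core construction, a V-fold cross-validated risk estimator, can be sketched for constant estimators under squared-error loss. The fold-splitting scheme and example data are ours; the article's framework covers far more general estimators and losses:

```python
import random, statistics

def vfold_cv_risk(data, estimator, loss, V=5):
    """V-fold cross-validated risk: fit `estimator` on V-1 folds,
    average `loss` over the held-out fold, then average over the
    V folds."""
    folds = [data[i::V] for i in range(V)]
    risks = []
    for v in range(V):
        train = [x for i, f in enumerate(folds) if i != v for x in f]
        fit = estimator(train)
        risks.append(sum(loss(fit, x) for x in folds[v]) / len(folds[v]))
    return sum(risks) / V

# Compare two constant estimators (sample mean vs. sample median)
# under squared-error loss on synthetic N(0,1) data; both cross-
# validated risks should land near the optimal constant risk of 1.
rng = random.Random(3)
data = [rng.gauss(0, 1) for _ in range(500)]
sq_loss = lambda fit, x: (x - fit) ** 2
r_mean = vfold_cv_risk(data, statistics.fmean, sq_loss)
r_median = vfold_cv_risk(data, statistics.median, sq_loss)
print(round(r_mean, 2), round(r_median, 2))
```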

18.
Smoothing methods for curve estimation have received considerable attention in statistics with a wide range of applications. However, to our knowledge, sample size planning for testing significance of curves has not been discussed in the literature. This paper focuses on sample size calculations for nonparametric regression and partially linear models based on local linear estimators. We describe explicit procedures for sample size calculations based on non- and semi-parametric F-tests. Data examples are provided to demonstrate the use of the procedures.

19.
ABSTRACT

We derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we show that the worst-case error of this estimate is not much worse than that of the training error estimate; see Kearns M, Ron D. [Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 1999;11:1427–1453]. General loss functions and classes of predictors with finite VC-dimension are considered. Our focus is on proving the consistency of the various cross-validation procedures. We point out the interest of each cross-validation procedure in terms of rates of convergence. An interesting consequence is that the size of the test sample is not required to grow to infinity for the consistency of the cross-validation procedure.
