20 similar documents found (search time: 562 ms)
1.
Random sampling from databases: a survey (total citations: 2; self-citations: 0; citations by others: 2)
This paper reviews recent literature on techniques for obtaining random samples from databases. We begin with a discussion of why one would want to include sampling facilities in database management systems. We then review basic sampling techniques used in constructing DBMS sampling algorithms, e.g. acceptance/rejection and reservoir sampling. A discussion of sampling from various data structures follows: B+ trees, hash files, and spatial data structures (including R-trees and quadtrees). Algorithms for sampling from simple relational queries, e.g. single relational operators such as selection, intersection, union, set difference, projection, and join, are then described. We then describe sampling for estimation of aggregates (e.g. the size of query results). Here we discuss both clustered sampling and sequential sampling approaches. Decision-theoretic approaches to sampling for query optimization are reviewed.
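As a concrete reference point for the reservoir sampling technique named in the survey, here is a minimal Python sketch of the classic "Algorithm R" scheme, which draws a uniform sample of k items from a stream of unknown length in one pass. It is an illustration, not code from the paper; the function name `reservoir_sample` is hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: uniform random sample of k items from an iterable of unknown length."""
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            reservoir.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randint(0, t)         # uniform over 0..t inclusive
            if j < k:
                reservoir[j] = item       # item number t+1 enters with probability k/(t+1)
    return reservoir

print(reservoir_sample(range(1000), 5, random.Random(0)))
```

If the stream has fewer than k items, the whole stream is returned.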
2.
Under stratified random sampling, we develop a kth-order bootstrap bias-corrected estimator of the number of classes θ which exist in a study region. This research extends Smith and van Belle’s (1984) first-order bootstrap bias-corrected estimator under simple random sampling. Our estimator has applicability for many settings including: estimating the number of animals when there are stratified capture periods, estimating the number of species based on stratified random sampling of subunits (say, quadrats) from the region, and estimating the number of errors/defects in a product based on observations from two or more types of inspectors. When the differences between the strata are large, utilizing stratified random sampling and our estimator often results in superior performance versus the use of simple random sampling and its bootstrap or jackknife [Burnham and Overton (1978)] estimator. The superior performance is often associated with more observed classes, and we provide insights into optimal designation of the strata and optimal allocation of sample sectors to strata.
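The paper's kth-order stratified estimator is not reproduced here. As a hedged baseline, the sketch below computes the first-order bootstrap estimator for simple random sampling in the form commonly attributed to Smith and van Belle (1984): the observed number of classes plus, for each observed class, the probability that a bootstrap resample of the n units misses it. The function name `bootstrap_richness` and the incidence-count input format are assumptions for illustration.

```python
def bootstrap_richness(incidence, n):
    """First-order bootstrap estimate of the number of classes under SRS.

    incidence: for each observed class, the number of the n sampled units
               (e.g. quadrats) in which it appears (an integer in 1..n).
    n: number of sampled units.
    """
    if n <= 0 or any(not 1 <= y <= n for y in incidence):
        raise ValueError("each incidence count must lie in 1..n")
    s_obs = len(incidence)
    # Bias correction: sum over observed classes of (1 - y/n)**n,
    # the chance that a bootstrap resample of n units misses the class.
    return s_obs + sum((1 - y / n) ** n for y in incidence)

print(bootstrap_richness([3, 1, 1], 3))  # 3 observed classes plus a correction
```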
3.
Perfect simulation of positive Gaussian distributions (total citations: 1; self-citations: 0; citations by others: 1)
We provide an exact simulation algorithm that produces variables from truncated Gaussian distributions on $(\mathbb{R}^+)^p$ via a perfect sampling scheme based on stochastic ordering and slice sampling, since accept-reject algorithms like those of Geweke (1991) and Robert (1995) are difficult to extend to higher dimensions.
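For contrast with the perfect sampling scheme, the sketch below shows the kind of one-dimensional accept-reject sampler the abstract refers to. It assumes Robert's (1995) translated-exponential proposal with rate $\alpha^* = (a + \sqrt{a^2+4})/2$ for the tail case and plain rejection otherwise; it is not the paper's algorithm, and the function name `positive_normal` is hypothetical.

```python
import math
import random

def positive_normal(mu, sigma, rng=random):
    """One draw from N(mu, sigma^2) restricted to (0, inf), by accept-reject."""
    a = -mu / sigma                      # truncation point on the standard scale
    if a <= 0:
        # The mode lies inside the support: naive rejection accepts with prob >= 1/2.
        while True:
            z = rng.gauss(0.0, 1.0)
            if z > a:
                return mu + sigma * z
    # Tail case: translated-exponential proposal z = a + Exp(alpha).
    alpha = (a + math.sqrt(a * a + 4.0)) / 2.0
    while True:
        z = a + rng.expovariate(alpha)
        # Accept with probability exp(-(z - alpha)^2 / 2), valid since alpha >= a.
        if rng.random() <= math.exp(-((z - alpha) ** 2) / 2.0):
            return mu + sigma * z

rng = random.Random(1)
print([round(positive_normal(-2.0, 1.0, rng), 3) for _ in range(5)])
```

The difficulty the abstract points to is that such bounds on the target-to-proposal ratio become hard to build and very inefficient in $p$ dimensions.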
4.
Abstract. The strong Rayleigh property is a new and robust negative dependence property that implies negative association; in fact, it implies conditional negative association closed under external fields (CNA+). Suppose that and are two families of 0-1 random variables that satisfy the strong Rayleigh property and let . We show that {Zi} conditioned on is also strongly Rayleigh; this turns out to be an easy consequence of the results on preservation of stability of polynomials of Borcea & Brändén (Invent. Math., 177, 2009, 521–569). This entails that a number of important πps sampling algorithms, including Sampford sampling and Pareto sampling, are CNA+. As a consequence, statistics based on such samples automatically satisfy a version of the Central Limit Theorem for triangular arrays.
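To make one of the πps designs named above concrete, here is a minimal sketch of Sampford's rejective procedure as it is usually described: the first unit is drawn with probability $\pi_i/n$, the remaining $n-1$ with replacement with probabilities proportional to $\pi_i/(1-\pi_i)$, and the whole draw is accepted only if all $n$ units are distinct. It illustrates the design only, not the paper's negative-dependence argument; the function name `sampford_sample` is hypothetical.

```python
import random

def sampford_sample(pi, rng=random):
    """Sampford's rejective pi-ps design; returns sorted unit indices."""
    n = round(sum(pi))
    if abs(sum(pi) - n) > 1e-9 or not all(0 < p < 1 for p in pi):
        raise ValueError("pi must lie in (0, 1) and sum to an integer n")
    units = range(len(pi))
    first_w = [p / n for p in pi]              # first draw: probability pi_i / n
    rest_w = [p / (1 - p) for p in pi]         # later draws: proportional to pi_i/(1-pi_i)
    while True:
        draw = rng.choices(units, weights=first_w, k=1)
        draw += rng.choices(units, weights=rest_w, k=n - 1)  # with replacement
        if len(set(draw)) == n:                # accept only if all units are distinct
            return sorted(draw)

pi = [3 * s / 55 for s in range(1, 11)]        # targets proportional to sizes 1..10, n = 3
print(sampford_sample(pi, random.Random(7)))
```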
5.
Tommy Wright. The American Statistician, 2013, 67(4): 217–224
We present a surprising though obvious result that seems to have gone unnoticed until now. In particular, we demonstrate the equivalence of two well-known problems: the optimal allocation of a fixed overall sample size n among L strata under stratified random sampling, and the optimal allocation of the H = 435 seats of the U.S. House of Representatives among the 50 states in the apportionment following each decennial census. Despite the strong similarity in the statements of the two problems, they have not been linked, and they have well-known but different solutions: one (Neyman allocation) is not explicitly exact, while the other (equal proportions) is exact. We give explicit exact solutions for both and note that the solutions are equivalent. In fact, we conclude by showing that both problems are special cases of a general problem. The result is significant for stratified random sampling in that it shows explicitly how to minimize sampling error when estimating a total $T_Y$ while keeping the final overall sample size fixed at n; this is usually not the case in practice with Neyman allocation, where the final overall sample size after rounding might be near n + L. An example reveals that controlled rounding with Neyman allocation does not always lead to the optimum allocation, that is, an allocation that minimizes variance.
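A minimal sketch of an exact integer allocation in the spirit of this equivalence, assuming the objective $\sum_h a_h^2/n_h$ with $a_h = N_h S_h$ (the finite-population correction only adds a constant, and upper bounds $n_h \le N_h$ are ignored). After giving every stratum a minimum number of units, each further unit goes to the stratum with the largest priority value $a_h/\sqrt{n_h(n_h+1)}$, the same form as the equal-proportions priority values; because the marginal variance reduction is decreasing in $n_h$, this greedy rule is optimal. Neyman allocation would instead give $n a_h / \sum_k a_k$, which generally needs rounding. The function name and the `n_min` parameter are hypothetical.

```python
import heapq
import math

def priority_allocation(a, n, n_min=1):
    """Allocate n units to strata to minimize sum(a_h**2 / n_h), a_h = N_h * S_h."""
    L = len(a)
    if n_min < 1 or n < n_min * L:
        raise ValueError("need n_min >= 1 and n >= n_min * number of strata")
    alloc = [n_min] * L
    # heapq is a min-heap, so priorities a_h / sqrt(n_h (n_h + 1)) are stored negated.
    heap = [(-a[h] / math.sqrt(n_min * (n_min + 1)), h) for h in range(L)]
    heapq.heapify(heap)
    for _ in range(n - n_min * L):
        _, h = heapq.heappop(heap)     # stratum with the largest priority value
        alloc[h] += 1
        k = alloc[h]
        heapq.heappush(heap, (-a[h] / math.sqrt(k * (k + 1)), h))
    return alloc

print(priority_allocation([100, 50, 25], 7))
```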
6.
Motivated by a real-life problem, we develop a two-stage cluster sampling design that uses ranked set sampling in the second stage (TSCRSS), and we derive an unbiased estimator of the population mean together with its variance. An unbiased estimator of the variance of the mean estimator is also derived. It is proved that TSCRSS is more efficient, in the sense of having smaller variance, than conventional two-stage cluster sampling in which the second-stage sample is drawn by simple random sampling with replacement. A simulation study on a real-life population shows that TSCRSS is also more efficient than conventional two-stage cluster sampling when simple random sampling without replacement is used in both stages.
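To make the second-stage building block concrete, here is a minimal sketch of balanced ranked set sampling (RSS) from a finite list. It assumes perfect ranking on the study variable itself and independent sets (units are distinct within a set, but sets may share units); it does not implement the TSCRSS design or its variance estimator, and `ranked_set_sample` is a hypothetical name.

```python
import random
import statistics

def ranked_set_sample(population, m, cycles=1, rng=random):
    """Balanced ranked set sample with set size m, assuming perfect ranking.

    Each cycle draws m independent sets of m units; the i-th set is ranked
    and only its i-th smallest unit is measured, giving m measurements.
    """
    measured = []
    for _ in range(cycles):
        for i in range(m):
            ranked = sorted(rng.sample(population, m))  # rank the i-th set
            measured.append(ranked[i])                  # measure its i-th order statistic
    return measured

if __name__ == "__main__":
    pop = list(range(1, 101))
    rss = ranked_set_sample(pop, m=3, cycles=4, rng=random.Random(0))
    print(len(rss), statistics.mean(rss))  # 12 measured units and their mean
```

Only m of the m² units drawn per cycle are measured; the ranking step is what makes the RSS mean less variable than a simple random sample mean of the same measured size.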
7.
Gupta and Shabbir [2] have suggested an alternative form of ratio-type estimators for estimating the population mean. In this paper, we obtain a corrected expression for the mean square error (MSE) of the Gupta–Shabbir estimator, up to the first order of approximation, and discuss the optimum case. We extend this estimator to stratified random sampling and propose general classes of combined and separate estimators. An empirical study is also carried out to show the properties of the proposed estimators.
8.
In this paper we consider the problem of unbiased estimation of the distribution function of an exponential population using order statistics based on a random sample. We present a (unique) unbiased estimator based on a single order statistic, say the ith, and study some properties of the estimator for i = 2. We also indicate how this estimator can be used to obtain unbiased estimators when only a few selected order statistics are available, as well as when the sample is selected by an alternative sampling procedure known as ranked set sampling. It is further proved that, for a ranked set sample of size two, the proposed estimator is uniformly better than the conventional nonparametric unbiased estimator; moreover, for a general sample size, a modified ranked set sampling procedure provides an unbiased estimator that is uniformly better than the conventional nonparametric unbiased estimator based on the usual ranked set sampling procedure.
9.
Lennart Bondesson. Scandinavian Journal of Statistics, 2010, 37(3): 514–530
Abstract. Two new unequal probability sampling methods are introduced: conditional and restricted Pareto sampling. The advantage of conditional Pareto sampling over standard Pareto sampling, introduced by Rosén (J. Statist. Plann. Inference, 62, 1997, 135–159), is that the actual inclusion probabilities agree better with the desired ones. Restricted Pareto sampling, preferably conditioned or adjusted, can handle cases where there are several restrictions on the sample and is an alternative to the recent cube method for balanced sampling introduced by Deville and Tillé (Biometrika, 91, 2004, 893). The new sampling designs have high entropy, and the random numbers involved can be regarded as permanent random numbers.
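For reference, here is a minimal sketch of standard (unconditioned, unrestricted) Pareto sampling as it is usually stated: each unit gets a uniform random number $U_i$ and the ranking variable $Q_i = \frac{U_i/(1-U_i)}{\lambda_i/(1-\lambda_i)}$, and the $n$ units with the smallest $Q_i$ form the sample. The realized inclusion probabilities only approximate the targets $\lambda_i$, which is the gap conditional Pareto sampling is meant to close. The function name `pareto_sample` is hypothetical.

```python
import random

def pareto_sample(lam, n, rng=random):
    """Fixed-size Pareto pi-ps sample; returns sorted unit indices.

    lam: target inclusion probabilities in (0, 1), ideally summing to n.
    """
    if not all(0 < p < 1 for p in lam):
        raise ValueError("target inclusion probabilities must lie in (0, 1)")
    keys = []
    for i, p in enumerate(lam):
        u = rng.random()                       # u in [0, 1), so 1 - u > 0
        q = (u / (1 - u)) / (p / (1 - p))      # Pareto ranking variable Q_i
        keys.append((q, i))
    keys.sort()
    return sorted(i for _, i in keys[:n])      # select the n smallest Q_i

sizes = list(range(1, 11))
lam = [3 * s / sum(sizes) for s in sizes]      # pi-ps targets for n = 3
print(pareto_sample(lam, 3, random.Random(1)))
```

Because each unit's $U_i$ can be kept from one survey round to the next, the same code structure also shows why the abstract speaks of permanent random numbers.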
10.
This article addresses the problem of estimating the population mean in stratified random sampling using information on an auxiliary variable. A class of estimators of the population mean is defined, and its properties are derived under large sample approximation. In particular, various classes of estimators are identified as particular members of the suggested class. It is shown that the proposed class of estimators is better than the usual unbiased estimator, the usual combined ratio estimator, the usual product estimator, the usual regression estimator and the Koyuncu and Kadilar (2009) class of estimators. The results are illustrated through an empirical study.
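One of the benchmarks listed above is the usual (combined) regression estimator $\bar y_{st} + b(\bar X - \bar x_{st})$. The sketch below computes it with a pooled slope of the form given in standard texts, $b = \sum_h a_h s_{yxh} / \sum_h a_h s_{xh}^2$ with $a_h = W_h^2(1-f_h)/n_h$; this slope choice is an assumption, the paper's own class is not reproduced, and the helper names are hypothetical.

```python
def _mean(v):
    return sum(v) / len(v)

def _cov(u, v):
    """Sample covariance with divisor len - 1."""
    mu, mv = _mean(u), _mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def combined_regression(strata, X_bar):
    """strata: list of (W_h, N_h, xs_h, ys_h); X_bar: known population mean of x."""
    num = den = y_st = x_st = 0.0
    for W, N, xs, ys in strata:
        n = len(xs)
        a = W * W * (1 - n / N) / n            # weight of stratum h in the pooled slope
        num += a * _cov(xs, ys)
        den += a * _cov(xs, xs)
        y_st += W * _mean(ys)
        x_st += W * _mean(xs)
    b = num / den                               # pooled regression slope
    return y_st + b * (X_bar - x_st)

strata = [(0.5, 10, [1, 2, 3], [3, 5, 7]), (0.5, 10, [4, 6], [9, 13])]
print(combined_regression(strata, 4.0))
```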
11.
Journal of Statistical Computation and Simulation, 2012, 82(4): 337–349
In this paper we examine failure-censored sampling plans for the two-parameter exponential distribution based on m random samples, each of size n. The suggested procedure is based on exact results, and only the first failure time of each sample is needed. The values of the acceptability constant are also tabulated for selected values of p α 1 p β 1, α and β. Further, the proposed sampling plans are compared with ordinary sampling plans using a sample of size mn. Compared with ordinary sampling plans, the proposed plan has the advantage of a shorter test time and a saving of resources.
12.
Ratio-Cum-Product Type Exponential Estimator of Finite Population Mean in Stratified Random Sampling
Rajesh Tailor. Communications in Statistics - Theory and Methods, 2014, 43(2): 343–354
This article addresses the problem of estimating the finite population mean in stratified random sampling using auxiliary information. Motivated by Singh (1967) and Bahl and Tuteja (1991), a ratio-cum-product type exponential estimator is suggested, and its bias and mean squared error are derived under large sample approximation. The suggested estimator is compared with the usual unbiased estimator of the population mean in stratified random sampling, the combined ratio estimator, the combined product estimator, and the ratio and product type exponential estimators of Singh et al. (2008). Conditions under which the suggested estimator is more efficient than the other estimators considered are obtained. A numerical illustration is given in support of the theoretical findings.
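A hedged sketch of one plausible combined form of such an estimator: the Bahl–Tuteja ratio-type factor $\exp\{(\bar X - \bar x_{st})/(\bar X + \bar x_{st})\}$ for an auxiliary $x$ positively correlated with $y$, times the product-type factor $\exp\{(\bar z_{st} - \bar Z)/(\bar z_{st} + \bar Z)\}$ for an auxiliary $z$ negatively correlated with $y$, applied to $\bar y_{st}$. The paper's exact definition may differ; the function name is hypothetical.

```python
import math

def exp_ratio_cum_product(W, ybar, xbar, zbar, X_bar, Z_bar):
    """Hypothetical combined-form ratio-cum-product exponential estimator of the mean."""
    y_st = sum(w * v for w, v in zip(W, ybar))
    x_st = sum(w * v for w, v in zip(W, xbar))
    z_st = sum(w * v for w, v in zip(W, zbar))
    ratio_part = math.exp((X_bar - x_st) / (X_bar + x_st))    # ratio-type factor (x vs y: +)
    product_part = math.exp((z_st - Z_bar) / (z_st + Z_bar))  # product-type factor (z vs y: -)
    return y_st * ratio_part * product_part

print(exp_ratio_cum_product([0.5, 0.5], [10, 20], [5, 6], [2, 2], 5, 2))
```

When the sample means of both auxiliaries equal their population means, both factors are 1 and the estimator reduces to $\bar y_{st}$.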
13.
Juliet Popper Shaffer. The American Statistician, 2013, 67(4): 269–273
In the standard linear regression model with independent, homoscedastic errors, the Gauss–Markov theorem asserts that $\hat{\beta} = (X'X)^{-1}X'y$ is the best linear unbiased estimator of $\beta$ and, furthermore, that $c'\hat{\beta}$ is the best linear unbiased estimator of $c'\beta$ for all $p \times 1$ vectors $c$. In the corresponding random regressor model, $X$ is a random sample of size $n$ from a $p$-variate distribution. If attention is restricted to linear estimators of $c'\beta$ that are conditionally unbiased given $X$, the Gauss–Markov theorem applies. If, however, the estimator is required only to be unconditionally unbiased, the Gauss–Markov theorem may or may not hold, depending on what is known about the distribution of $X$. The results generalize to the case in which $X$ is a random sample without replacement from a finite population.
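A small numerical sketch of the estimator in the first sentence: it computes $\hat\beta = (X'X)^{-1}X'y$ by solving the normal equations with NumPy and then the linear combination $c'\hat\beta$. It illustrates the formula only, not the paper's conditional versus unconditional unbiasedness argument; `ols` is a hypothetical name.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients beta_hat = (X'X)^{-1} X'y via the normal equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving (X'X) b = X'y avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

X = [[1, 0], [1, 1], [1, 2]]   # intercept column plus one regressor
y = [1, 3, 5]
beta_hat = ols(X, y)
c = np.array([0.0, 1.0])       # c'beta picks out the slope
print(beta_hat, c @ beta_hat)
```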
14.
15.
Zaheen Khan. Communications in Statistics - Theory and Methods, 2013, 42(7): 2105–2117
Abstract. In this paper, some deficiencies in the traditional selection procedure for circular versions of systematic sampling schemes are investigated, and alternative methods are proposed. We also suggest some rules of thumb for the coincidence of units in the sample. The end corrections proposed by Bellhouse and Rao (1975) and Sampath and Varalakshmi (2008) for circular systematic sampling and diagonal circular systematic sampling, respectively, are also modified.
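For orientation, here is a minimal sketch of the standard circular systematic selection this work starts from: pick a random start in 1..N and take every k-th label, wrapping around modulo N. Units coincide (for any start) exactly when the cycle length $N/\gcd(N,k)$ is smaller than n. The paper's alternative procedures and modified end corrections are not reproduced; the function names are hypothetical, and the default `k` uses Python's `round`, which rounds halves to even.

```python
import math
import random

def circular_systematic(N, n, k=None, start=None, rng=random):
    """Circular systematic sample of n labels from 1..N (labels wrap modulo N)."""
    if k is None:
        k = round(N / n)                  # interval: integer nearest N/n
    if start is None:
        start = rng.randint(1, N)         # random start anywhere in 1..N
    return [((start - 1 + j * k) % N) + 1 for j in range(n)]

def has_coincidence(N, n, k):
    """True if the selected units repeat, i.e. N / gcd(N, k) < n (independent of start)."""
    return N // math.gcd(N, k) < n

print(circular_systematic(10, 4, k=3, start=9))
print(has_coincidence(16, 9, 2))
```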
16.
Abstract. The article suggests a class of estimators of the population mean in stratified random sampling using auxiliary information, and studies its properties. In addition, various known estimators and classes of estimators are identified as members of the suggested class. It is shown that, under the optimum condition, the suggested class of estimators performs better than the usual unbiased, usual combined ratio and usual combined regression estimators, the Kadilar and Cingi (2005) and Singh and Vishwakarma (2006) estimators, and the members of the classes of estimators envisaged by Kadilar and Cingi (2003), Singh, Tailor et al. (2008), Singh et al. (2009), Singh and Vishwakarma (2010) and Koyuncu and Kadilar (2010).
17.
The present article deals with some methods for estimating finite population means in the presence of a linear trend among the population values. In particular, we provide a strategy for selecting the sampling interval k in circular systematic sampling that yields a better estimator of the population mean than other choices of the sampling interval; this is established through empirical studies. Furthermore, we apply multiple random start methods to select random samples under linear systematic sampling and diagonal systematic sampling schemes, and we derive explicit expressions for the variances and their estimates. The relative performances of simple random sampling, linear systematic sampling and diagonal systematic sampling schemes with single and multiple random starts are also assessed through numerical examples.
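A minimal sketch of linear systematic sampling with multiple random starts, in the usual textbook form: to draw n units with m starts, the interval is stretched to $K = mN/n$, m distinct starts are drawn from 1..K, and each start contributes $n/m$ units spaced K apart. It assumes $n$ is a multiple of $m$ and $N$ a multiple of $n$; the paper's interval-selection strategy and diagonal scheme are not reproduced, and the function name is hypothetical.

```python
import random

def systematic_multi_start(N, n, m, starts=None, rng=random):
    """Linear systematic sample of size n from labels 1..N using m random starts."""
    if n % m or N % n:
        raise ValueError("require m | n and n | N")
    K = m * N // n                               # stretched sampling interval
    if starts is None:
        starts = rng.sample(range(1, K + 1), m)  # m distinct random starts in 1..K
    # Each start r contributes r, r + K, ..., r + (n//m - 1) * K (all <= N).
    return sorted(r + j * K for r in starts for j in range(n // m))

print(systematic_multi_start(20, 4, 2, starts=[3, 7]))
```

With several independent starts, the between-start spread gives an unbiased variance estimate, which a single random start cannot provide.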
18.
Housila P. Singh. Communications in Statistics - Theory and Methods, 2013, 42(6): 1008–1023
This paper suggests an efficient class of ratio and product estimators for estimating the population mean in stratified random sampling using auxiliary information. Notably, the estimators of Koyuncu and Kadilar (2009), Kadilar and Cingi (2003, 2005) and Singh and Vishwakarma (2007), among many others, are identified as members of the proposed class. Expressions for the bias and mean square error (MSE) of the proposed estimators are derived in general form under large sample approximation. The asymptotically optimum estimator (AOE) in the class is identified along with its MSE formula. It is shown that the proposed class of estimators is more efficient than the combined regression estimator and the Koyuncu and Kadilar (2009) estimator. The theoretical findings are supported by a numerical example.
19.
20.
This paper addresses the problem of unbiased estimation of θ = P[X > Y] for two independent exponentially distributed random variables X and Y. We present the (unique) unbiased estimator of θ based on a single pair of order statistics obtained from two independent random samples from the two populations. We also indicate how this estimator can be used to obtain unbiased estimators of θ when only a few selected order statistics are available from the two random samples, as well as when the samples are selected by an alternative procedure known as ranked set sampling. It is proved that, for ranked set samples of size two, the proposed estimator is uniformly better than the conventional nonparametric unbiased estimator, and that a modified ranked set sampling procedure provides an unbiased estimator even better than the proposed estimator.
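For orientation, a short sketch of the two quantities being compared: the target θ in closed form for exponentials with means $\mu_X$ and $\mu_Y$, $\theta = \mu_X/(\mu_X + \mu_Y)$ (from $\int_0^\infty e^{-y/\mu_X}\mu_Y^{-1}e^{-y/\mu_Y}\,dy$), and the Mann–Whitney-type proportion of pairs with $x > y$, which is presumably the "conventional nonparametric unbiased estimator" meant above. The paper's order-statistic estimator is not reproduced; the function names are hypothetical.

```python
import random

def p_x_greater_y(mean_x, mean_y):
    """theta = P[X > Y] for independent exponentials with means mean_x and mean_y."""
    return mean_x / (mean_x + mean_y)

def mann_whitney_theta(xs, ys):
    """Nonparametric unbiased estimate of P[X > Y]: share of (x, y) pairs with x > y."""
    return sum(x > y for x in xs for y in ys) / (len(xs) * len(ys))

if __name__ == "__main__":
    rng = random.Random(5)
    xs = [rng.expovariate(1 / 2.0) for _ in range(200)]  # exponential, mean 2
    ys = [rng.expovariate(1.0) for _ in range(200)]      # exponential, mean 1
    print(p_x_greater_y(2.0, 1.0), mann_whitney_theta(xs, ys))
```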