1.

VARIANCE ESTIMATION IN TWO-PHASE SAMPLING

M.A. Hidiroglou J.N.K. Rao David Haziza 《Australian & New Zealand Journal of Statistics》2009,51(2):127-141

Two‐phase sampling is often used for estimating a population total or mean when the cost per unit of collecting auxiliary variables, x, is much smaller than the cost per unit of measuring a characteristic of interest, y. In the first phase, a large sample s₁ is drawn according to a specific sampling design p(s₁), and auxiliary data x are observed for the units i∈s₁. Given the first‐phase sample s₁, a second‐phase sample s₂ is selected from s₁ according to a specified sampling design {p(s₂∣s₁)}, and (y, x) is observed for the units i∈s₂. In some cases, the population totals of some components of x may also be known. Two‐phase sampling is used for stratification at the second phase or both phases and for regression estimation. Horvitz–Thompson‐type variance estimators are used for variance estimation. However, the Horvitz–Thompson (Horvitz & Thompson, J. Amer. Statist. Assoc. 1952) variance estimator in uni‐phase sampling is known to be highly unstable and may take negative values when the units are selected with unequal probabilities. On the other hand, the Sen–Yates–Grundy variance estimator is relatively stable and non‐negative for several unequal probability sampling designs with fixed sample sizes. In this paper, we extend the Sen–Yates–Grundy (Sen, J. Ind. Soc. Agric. Statist. 1953; Yates & Grundy, J. Roy. Statist. Soc. Ser. B 1953) variance estimator to two‐phase sampling, assuming fixed first‐phase sample size and fixed second‐phase sample size given the first‐phase sample. We apply the new variance estimators to two‐phase sampling designs with stratification at the second phase or both phases. We also develop Sen–Yates–Grundy‐type variance estimators of the two‐phase regression estimators that make use of the first‐phase auxiliary data and known population totals of some of the auxiliary variables.
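
For orientation, the two single-phase variance estimators named in this abstract can be sketched as follows. This is only an illustration of the textbook one-phase forms, not the two-phase extension developed in the paper; the function names and the small simple-random-sampling example (N = 10, n = 3) are assumptions made here.

```python
from itertools import combinations


def ht_variance(y, pi, pij):
    """Horvitz-Thompson variance estimator of the estimated total (one phase).

    y[i]      observed value of sampled unit i
    pi[i]     first-order inclusion probability of unit i
    pij[i][j] joint inclusion probability of units i and j (i != j)
    """
    n = len(y)
    total = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(n))
    for i in range(n):
        for j in range(n):
            if i != j:
                total += ((pij[i][j] - pi[i] * pi[j]) / pij[i][j]
                          * (y[i] / pi[i]) * (y[j] / pi[j]))
    return total


def syg_variance(y, pi, pij):
    """Sen-Yates-Grundy variance estimator (requires a fixed sample size)."""
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        total += ((pi[i] * pi[j] - pij[i][j]) / pij[i][j]
                  * (y[i] / pi[i] - y[j] / pi[j]) ** 2)
    return total


# Hypothetical example: simple random sampling without replacement.
N, n = 10, 3
y = [4.0, 7.0, 10.0]
pi = [n / N] * n
joint = n * (n - 1) / (N * (N - 1))
pij = [[joint] * n for _ in range(n)]  # diagonal entries are never read

v_ht = ht_variance(y, pi, pij)
v_syg = syg_variance(y, pi, pij)
```

Under simple random sampling without replacement both estimators reduce to N(N − n)s²/n, so the two values agree here; they generally differ under unequal inclusion probabilities, which is the setting the abstract is concerned with.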

2.

Rare and clustered population estimation using the adaptive cluster sampling with some robust measures

Muhammad Nouman Qureshi Cem Kadilar Muhammad Noor Ul Amin Muhammad Hanif 《Journal of Statistical Computation and Simulation》2018,88(14):2761-2774

The use of robust measures helps to increase the precision of the estimators, especially for the estimation of extremely skewed distributions. In this article, a generalized ratio estimator is proposed by using some robust measures with a single auxiliary variable under the adaptive cluster sampling (ACS) design. We have incorporated the tri-mean (TM), mid-range (MR) and Hodges-Lehmann (HL) estimator of the auxiliary variable as robust measures together with some conventional measures. The expressions of bias and mean square error (MSE) of the proposed generalized ratio estimator are derived. Two types of numerical study have been conducted, using an artificial clustered population and a real data application, to examine the performance of the proposed estimator over the usual mean per unit estimator under simple random sampling (SRS). The results of the simulation study show that the proposed estimators provide better estimation results on both real and artificial populations over the competing estimators.
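
The three robust measures named in this abstract can be computed as in the sketch below. It uses the common textbook definitions (tri-mean as (Q1 + 2·median + Q3)/4, mid-range as the average of the extremes, Hodges-Lehmann as the median of the Walsh averages); the quartile convention and the sample data are assumptions made here, and the paper may use different conventions.

```python
import statistics


def tri_mean(x):
    """Tukey's tri-mean: (Q1 + 2*median + Q3) / 4, using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def mid_range(x):
    """Average of the smallest and largest values."""
    return (min(x) + max(x)) / 2


def hodges_lehmann(x):
    """Median of all pairwise (Walsh) averages (x_i + x_j) / 2 with i <= j."""
    walsh = [(x[i] + x[j]) / 2
             for i in range(len(x)) for j in range(i, len(x))]
    return statistics.median(walsh)


# Hypothetical auxiliary-variable values with one extreme observation.
x = [1, 2, 3, 4, 100]
tm, mr, hl = tri_mean(x), mid_range(x), hodges_lehmann(x)
```

On this skewed sample the tri-mean and Hodges-Lehmann values stay near the bulk of the data, while the mid-range is pulled toward the extreme value.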

3.

Divergence versus decision P-values: A distinction worth making in theory and keeping in practice: Or, how divergence P-values measure evidence even when decision P-values do not

Sander Greenland 《Scandinavian Journal of Statistics》2023,50(1):54-88

There are two distinct definitions of “P-value” for evaluating a proposed hypothesis or model for the process generating an observed dataset. The original definition starts with a measure of the divergence of the dataset from what was expected under the model, such as a sum of squares or a deviance statistic. A P-value is then the ordinal location of the measure in a reference distribution computed from the model and the data, and is treated as a unit-scaled index of compatibility between the data and the model. In the other definition, a P-value is a random variable on the unit interval whose realizations can be compared to a cutoff α to generate a decision rule with known error rates under the model and specific alternatives. It is commonly assumed that realizations of such decision P-values always correspond to divergence P-values. But this need not be so: Decision P-values can violate intuitive single-sample coherence criteria where divergence P-values do not. It is thus argued that divergence and decision P-values should be carefully distinguished in teaching, and that divergence P-values are the relevant choice when the analysis goal is to summarize evidence rather than implement a decision rule.

4.

ON PREDICTION OF TOTAL VALUE IN INCOMPLETELY SPECIFIED DOMAINS

Tomasz Żądło 《Australian & New Zealand Journal of Statistics》2006,48(3):269-283

This paper presents the problem of prediction of a domain total value based on the general linear model. In many methods presented in the survey sampling literature (e.g. Cassel, Särndal & Wretman, 1977 [Foundations of inference in survey sampling, New York: John Wiley & Sons]; Valliant, Dorfman & Royall, 2000 [Finite population sampling and inference. A prediction approach. New York: John Wiley & Sons]; Rao, 2003 [Small area estimation. New York: John Wiley & Sons]) a common assumption is that for each element of a population the domain to which it belongs is known. This assumption is especially important in the situation when a superpopulation model with auxiliary variables is considered. In this paper a method is proposed for prediction of the domain total when it is not known whether a unit belongs to a given domain or not, or when the information is available only for sampled elements of the population.

5.

Sampling strategy for optimal classification into one of two correlated normal populations

Subhadip Bandyopadhyay Shibdas Bandyopadhyay 《Statistics》2013,47(5):1116-1127

A unit ω is to be classified into one of two correlated homoskedastic normal populations by the linear discriminant function known as the W classification statistic [T.W. Anderson, An asymptotic expansion of the distribution of studentized classification statistic, Ann. Statist. 1 (1973), pp. 964–972; T.W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd edn, Wiley, New York, 1984; G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, John Wiley and Sons, New York, 1992]. The two populations studied here are two different states of the same population, like two different states of a disease where the population is the population of diseased patients. When a sample unit is observed in both the states (populations), the observations made on it (which form a pair) become correlated. A training sample is unbalanced when not all sample units are observed in both the states. Paired and also unbalanced samples are natural in studies related to correlated populations. S. Bandyopadhyay and S. Bandyopadhyay [Choosing better training sample for classifying an individual into one of two correlated normal populations, Calcutta Statist. Assoc. Bull. 54(215–216) (2003), pp. 167–180] studied the effect of unbalanced training sample structure on the performance of the W statistic in the univariate correlated normal set-up for finding an optimal sampling strategy for a better classification rate. In this study, the results are extended to the multivariate case, with discussion of application in a real scenario.

6.

Unification of Dirichlet Methodology

Aaron Childs 《Communications in Statistics - Theory and Methods》2013,42(9):1647-1662

In this article we provide a unified framework for solving Dirichlet related probability and waiting time problems. We consider a Pólya sampling scheme in which each time an object is selected, it is put back into the population along with c additional objects of the same type. By considering both fixed sample size and inverse sampling procedures, we unify the Dirichlet I, J, C, and D functions with their hypergeometric counterparts by extending these functions to Pólya sampling. We then use these functions to unify and extend the corresponding expected waiting time results.
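
The Pólya sampling scheme described in this abstract can be simulated with a few lines of code. The sketch below is an illustration only: the function name, the urn contents and the seed are assumptions made here, and it does not implement the Dirichlet I, J, C or D functions. Setting c = 0 gives sampling with replacement and c = -1 gives sampling without replacement, which is how the scheme links the multinomial and hypergeometric cases.

```python
import random


def polya_draws(counts, c, n_draws, rng=None):
    """Draw n_draws objects from an urn under a Polya scheme.

    counts  dict mapping object type -> initial number in the urn
    c       net change in the drawn type's count after each draw
            (the object is returned together with c extra of its type)
    """
    rng = rng or random.Random()
    urn = dict(counts)
    drawn = []
    for _ in range(n_draws):
        types = [t for t, k in urn.items() if k > 0]
        if not types:
            raise ValueError("urn is empty")
        weights = [urn[t] for t in types]
        t = rng.choices(types, weights=weights, k=1)[0]
        drawn.append(t)
        urn[t] += c
    return drawn


# Hypothetical urn: 2 objects of type "a" and 3 of type "b".
rng = random.Random(0)
without_replacement = polya_draws({"a": 2, "b": 3}, c=-1, n_draws=5, rng=rng)
contagion = polya_draws({"a": 2, "b": 3}, c=1, n_draws=20, rng=rng)
```

With c = -1 and as many draws as objects, the urn is exhausted, so the drawn types reproduce the initial composition in some order.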

7.

Inference on finite population categorical response: nonparametric regression-based predictive approach

Sumanta Adhya Tathagata Banerjee Gaurangadeb Chattopadhyay 《AStA Advances in Statistical Analysis》2012,96(1):69-98

Suppose that a finite population consists of N distinct units. Associated with the ith unit is a polychotomous response vector, d_i, and a vector of auxiliary variables, x_i. The values x_i are known for the entire population but the d_i are known only for the units selected in the sample. The problem is to estimate the finite population proportion vector P. One of the fundamental questions in finite population sampling is how to make use of the complete auxiliary information effectively at the estimation stage. In this article a predictive estimator is proposed which incorporates the auxiliary information at the estimation stage by invoking a superpopulation model. However, the use of such estimators is often criticized since the working superpopulation model may not be correct. To protect the predictive estimator from the possible model failure, a nonparametric regression model is considered in the superpopulation. The asymptotic properties of the proposed estimator are derived and also a bootstrap-based hybrid re-sampling method for estimating the variance of the proposed estimator is developed. Results of a simulation study are reported on the performances of the predictive estimator and its re-sampling-based variance estimator from the model-based viewpoint. Finally, a data survey related to the opinions of 686 individuals on the cause of addiction is used for an empirical study to investigate the performance of the nonparametric predictive estimator from the design-based viewpoint.

8.

Sampling schemes for price index construction: a performance comparison across the classification of individual consumption by purpose food groups

Saeed Heravi Peter Morgan 《Journal of applied statistics》2014,41(7):1453-1470

Five sampling schemes (SS) for price index construction – one cut-off sampling technique and four probability-proportional-to-size (pps) methods – are evaluated by comparing their performance on a homescan market research data set across 21 months for each of the 13 classification of individual consumption by purpose (COICOP) food groups. Classifications are derived for each of the food groups and the population index value is used as a reference to derive performance error measures, such as root mean squared error, bias and standard deviation for each food type. Repeated samples are taken for each of the pps schemes and the resulting performance error measures analysed using regression of three of the pps schemes to assess the overall effect of SS and COICOP group whilst controlling for sample size, month and population index value. Cut-off sampling appears to perform less well than pps methods and multistage pps seems to have no advantage over its single-stage counterpart. The jackknife resampling technique is also explored as a means of estimating the standard error of the index and compared with the actual results from repeated sampling.

9.

Hayter and Tsui's test with double sampling for the vector mean of multivariate normal population

Sueli Aparecida Mingoti Graziele Umbelina Alves Ferreira 《Communications in Statistics - Theory and Methods》2013,42(15):4365-4377

Abstract

In this paper, we introduce a version of Hayter and Tsui's statistical test with double sampling for the vector mean of a population under the multivariate normal assumption. A study showed that this new test was at least as efficient as the well-known Hotelling's T² with double sampling. Some nice features of Hayter and Tsui's test are its simplicity of implementation and its capability of identifying the errant variables when the null hypothesis is rejected. Taking that into consideration, a new control chart called HTDS is also introduced as a tool to monitor the multivariate process vector mean when using double sampling.

10.

RANKED SET SAMPLING FROM LOCATION-SCALE FAMILIES OF SYMMETRIC DISTRIBUTIONS

《Communications in Statistics - Theory and Methods》2013,42(8-9):1641-1659

Statistical inference based on ranked set sampling has primarily been motivated by nonparametric problems. However, the sampling procedure can provide an improved estimator of the population mean when the population is partially known. In this article, we consider estimation of the population mean and variance for the location-scale families of distributions. We derive and compare different unbiased estimators of these parameters based on r independent replications of a ranked set sample of size n. Large sample properties, along with asymptotic relative efficiencies, help identify which estimators are best suited for different location-scale distributions.

11.

Moments of the sampled autocovariances and autocorrelations for a Gaussian white-noise process

Oliver D. Anderson 《Revue canadienne de statistique》1990,18(3):271-284

Cumulants, moments about zero, and central moments are obtained for the mean-corrected serial covariances and serial correlations for series realizations of length n from a white-noise Gaussian process. All first and second moments (and some third, fourth, and higher moments) are given explicitly for the serial covariances; and the corresponding moments for the serial correlations are derived either explicitly or implicitly.

12.

Convergence in the Wasserstein Metric for Markov Chain Monte Carlo Algorithms with Applications to Image Restoration

《Stochastic Models》2013,29(4):473-492

Abstract

In this paper, we show how the time for convergence to stationarity of a Markov chain can be assessed using the Wasserstein metric, rather than the usual choice of total variation distance. The Wasserstein metric may be more easily applied in some applications, particularly those on continuous state spaces. Bounds on convergence time are established by considering the number of iterations required to approximately couple two realizations of the Markov chain to within ε tolerance. The particular application considered is the use of the Gibbs sampler in the Bayesian restoration of a degraded image, with pixels that are a continuous grey-scale and with pixels that can only take two colours. On finite state spaces, a bound in the Wasserstein metric can be used to find a bound in total variation distance. We use this relationship to get a precise O(N log N) bound on the convergence time of the stochastic Ising model that holds for appropriate values of its parameter as well as other binary image models. Our method employing convergence in the Wasserstein metric can also be applied to perfect sampling algorithms involving coupling from the past to obtain estimates of their running times.

13.

RATIO-TO-SIZE ESTIMATORS OF MEAN PER SUBUNIT IN TWO-STAGE SAMPLING OVER TWO OCCASIONS

Fabian C. Okafor 《Australian & New Zealand Journal of Statistics》1988,30(3):367-378

We propose here some strategies for estimating the population mean per subunit at the current occasion and the change in mean from one occasion to the next, based on two-stage sampling on two successive occasions. Estimators of the mean in two-stage sampling over successive occasions have so far been based on knowledge of the total number of subunits (elements), M⁰, in the population or on assumed equal sizes for the primary units. We have therefore given the ratio-to-size estimators for the population mean per subunit on the current occasion and the change in mean over the two occasions where M⁰ is not known. The results obtained also apply to the situation where M¹ is correlated with the variable of interest y.

14.

Remainder Markov systematic sampling

Fei-Fei Kao Ching-Ho Leu Chien-Hao Ko 《Journal of statistical planning and inference》2011,141(11):3595-3604

Systematic sampling is the simplest and easiest of the most common sampling methods. However, when the population size N cannot be evenly divided by the sample size n, systematic sampling cannot be performed. Not only is it difficult to determine the sampling interval k equivalent to the sampling probability of the sampling unit, but also the sample size will not be constant and the sample mean will be a biased estimator of the population mean. To solve this problem, this paper introduces an improved method for systematic sampling: the remainder Markov systematic sampling method. This new method involves separately finding the first-order and second-order inclusion probabilities. This approach uses the Horvitz-Thompson estimator as an unbiased estimator of the population mean to find the variance of the estimator. This study examines the effectiveness of the proposed method for different super-populations.
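
The problem this abstract starts from, a sample size that varies with the random start when N is not a multiple of the interval, can be seen in a short sketch. It shows ordinary linear systematic sampling only, not the remainder Markov method proposed in the paper; the function name and the values N = 10, k = 3 are assumptions made here.

```python
def linear_systematic_samples(N, k):
    """All possible linear systematic samples of units 1..N with interval k.

    One sample per possible random start 1..k; each takes every k-th unit.
    """
    return [list(range(start, N + 1, k)) for start in range(1, k + 1)]


# Hypothetical population of N = 10 units sampled with interval k = 3.
samples = linear_systematic_samples(10, 3)
sizes = [len(s) for s in samples]
```

Here the three possible samples are {1, 4, 7, 10}, {2, 5, 8} and {3, 6, 9}, so the realized sample size is 4 or 3 depending on the start.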

15.

Pseudo‐R² statistics under complex sampling

Thomas Lumley 《Australian & New Zealand Journal of Statistics》2017,59(2):187-194

Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar R² coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox–Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design‐consistent estimator of the population model summary. It is also shown that for logistic regression models under case–control sampling the usual Cox–Snell and Nagelkerke R² are not design‐consistent, but are systematically larger than would be obtained with a cross‐sectional or cohort sample from the same population, even in settings where the weighted and unweighted logistic regression estimators are similar or identical. Implementation of the new estimators is straightforward and code is provided in R.
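
As a reminder of what the "usual" summaries mentioned in this abstract compute, the sketch below gives the standard unweighted Cox–Snell and Nagelkerke definitions in terms of the null and fitted log-likelihoods. It is not the design-based estimator of the note (whose code is in R); the function names and the numeric log-likelihood values are assumptions made here.

```python
import math


def cox_snell_r2(loglik_null, loglik_fit, n):
    """Cox-Snell R^2 = 1 - (L_null / L_fit)^(2/n), from log-likelihoods."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_fit) / n)


def nagelkerke_r2(loglik_null, loglik_fit, n):
    """Nagelkerke R^2: Cox-Snell R^2 rescaled by its maximum, 1 - L_null^(2/n)."""
    max_r2 = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell_r2(loglik_null, loglik_fit, n) / max_r2


# Hypothetical log-likelihoods from a logistic regression on n = 100 cases.
n = 100
ll_null, ll_fit = -69.3, -55.0
r2_cs = cox_snell_r2(ll_null, ll_fit, n)
r2_nk = nagelkerke_r2(ll_null, ll_fit, n)
```

For a discrete response the likelihood is at most 1, so the Cox–Snell value cannot reach 1; the Nagelkerke rescaling divides by that upper limit, which is why it is never smaller than the Cox–Snell value.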

16.

General procedure for estimating the population mean using ranked set sampling

《Journal of Statistical Computation and Simulation》2012,82(5):931-945

In this paper, we suggest a class of estimators for estimating the population mean Ȳ of the study variable Y using information on X̄, the population mean of the auxiliary variable X, under the ranked set sampling envisaged by McIntyre [A method of unbiased selective sampling using ranked sets, Aust. J. Agric. Res. 3 (1952), pp. 385–390] and developed by Takahasi and Wakimoto [On unbiased estimates of the population mean based on the sample stratified by means of ordering, Ann. Inst. Statist. Math. 20 (1968), pp. 1–31]. The estimator reported by Kadilar et al. [Ratio estimator for the population mean using ranked set sampling, Statist. Papers 50 (2009), pp. 301–309] is identified as a member of the proposed class of estimators. The bias and the mean-squared error (MSE) of the proposed class of estimators are obtained. An asymptotically optimum estimator in the class is identified with its MSE formulae. To judge the merits of the suggested class of estimators over others, an empirical study is carried out.

17.

Biostatistical genetics and genetic epidemiology

《Journal of Statistical Computation and Simulation》2012,82(7):543-544

It is well known that when a ranked set sampling (RSS) scheme is employed to estimate the mean of a population, it is more efficient than simple random sampling (SRS) with the same sample size. One can use an RSS analog of the SRS regression estimator to estimate the population mean of Y using its concomitant variable X when they are linearly related. Unfortunately, the variance of this estimate cannot be evaluated unless the distribution of X is known. We investigate the use of resampling methods to establish confidence intervals for the regression estimation of the population mean. Simulation studies show that the proposed methods perform well in a variety of situations when the assumption of linearity holds, and decently well under mild non-linearity.

18.

Comparison of Three Sampling Designs

Ester Samuel-Cahn 《The American statistician》2013,67(3):156-157

Three sampling designs are considered for estimating the sum of k population means by the sum of the corresponding sample means. These are (a) the optimal design; (b) equal sample sizes from all populations; and (c) sample sizes that render equal variances to all sample means. Designs (b) and (c) are equally inefficient, and may yield a variance up to k times as large as that of (a). Similar results are true when the cost of sampling is introduced, and they depend on the population sampled.
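
The comparison in this abstract can be checked numerically. The sketch below assumes independent samples, a fixed total sample size n split among the k populations, and allocations treated as continuous, so the variance of the sum of sample means is Σ σᵢ²/nᵢ; under these assumptions the optimal design takes nᵢ proportional to σᵢ. The function names and the standard deviations used are hypothetical.

```python
def variance_of_sum(sigmas, allocation):
    """Variance of the sum of independent sample means: sum of sigma_i^2 / n_i."""
    return sum(s ** 2 / m for s, m in zip(sigmas, allocation))


def optimal_allocation(sigmas, n):
    """Design (a): n_i proportional to sigma_i minimizes the variance."""
    total = sum(sigmas)
    return [n * s / total for s in sigmas]


def equal_allocation(sigmas, n):
    """Design (b): the same sample size from every population."""
    return [n / len(sigmas)] * len(sigmas)


def equal_variance_allocation(sigmas, n):
    """Design (c): n_i proportional to sigma_i^2, so all sample means have equal variance."""
    total = sum(s ** 2 for s in sigmas)
    return [n * s ** 2 / total for s in sigmas]


# Hypothetical population standard deviations and total sample size.
sigmas, n = [1.0, 2.0, 3.0], 60
var_a = variance_of_sum(sigmas, optimal_allocation(sigmas, n))
var_b = variance_of_sum(sigmas, equal_allocation(sigmas, n))
var_c = variance_of_sum(sigmas, equal_variance_allocation(sigmas, n))
```

In closed form, design (a) gives (Σσᵢ)²/n while designs (b) and (c) both give kΣσᵢ²/n, so their ratio to (a) is kΣσᵢ²/(Σσᵢ)², which is at most k.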

19.

HORVITZ-THOMPSON AND DES RAJ ESTIMATORS REVISITED

T. J. Rao 《Australian & New Zealand Journal of Statistics》1972,14(3):227-230

In this paper it is shown that the generalized πPS sampling strategy consisting of the design with π_i, the probability of inclusion of the ith unit in the sample, proportional to the modified size together with the corresponding Horvitz-Thompson estimator (Rao, 1971), is superior to the symmetrized Des Raj strategy under a general super-population set-up for all values of the super-population parameter g, when the samples are of size two.

20.

An inequality for random replacement sampling plans

George L. O'Brien Augustine Wong 《Revue canadienne de statistique》1988,16(4):383-391

A sample (X₁, …, X_n) is drawn from a population of size N. Karlin (1974) conjectured that for any function ? in a certain class of real-valued functions on the sample space, ? is at least as large for sampling with replacement as for any other random replacement sampling plan. This conjecture is proved under the assumption that ?