Similar articles
20 similar articles found (search time: 453 ms)
1.
Outliers that commonly occur in business sample surveys can have large impacts on domain estimates. The authors consider an outlier‐robust design and smooth estimation approach, which can be related to the so‐called “Surprise stratum” technique [Kish, “Survey Sampling,” Wiley, New York (1965)]. The sampling design utilizes a threshold sample consisting of previously observed outliers that are selected with probability one, together with stratified simple random sampling from the rest of the population. The domain predictor is an extension of the Winsorization‐based estimator proposed by Rivest and Hidiroglou [Rivest and Hidiroglou, “Outlier Treatment for Disaggregated Estimates,” in “Proceedings of the Section on Survey Research Methods,” American Statistical Association (2004), pp. 4248–4256], and is similar to the estimator for skewed populations suggested by Fuller [Fuller, Statistica Sinica 1991;1:137–158]. It makes use of a domain Winsorized sample mean plus a domain‐specific adjustment of the estimated overall mean of the excess values on top of that. The methods are studied in theory from a design‐based perspective and by simulations based on the Norwegian Research and Development Survey data. Guidelines for choosing the threshold values are provided. The Canadian Journal of Statistics 39: 147–164; 2011 © 2010 Statistical Society of Canada
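
The decomposition behind a Winsorized mean plus an excess-value term can be illustrated with a minimal sketch. This is not the authors' domain estimator: it ignores survey weights, strata and domains, and the function names, the sample and the threshold `K` are assumptions made up for illustration. It only shows that each value splits into a part capped at the threshold and an excess above it.

```python
def winsorize_upper(values, threshold):
    """Cap every value at the threshold (one-sided Winsorization)."""
    return [min(v, threshold) for v in values]


def excess_over(values, threshold):
    """Amount by which each value exceeds the threshold (0 if it does not)."""
    return [max(v - threshold, 0.0) for v in values]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical sample with one large outlier and a hypothetical threshold.
sample = [3.0, 5.0, 4.0, 6.0, 120.0]
K = 10.0

winsorized = winsorize_upper(sample, K)
excess = excess_over(sample, K)

# y = min(y, K) + max(y - K, 0), so the two means add up to the plain mean.
winsorized_mean = mean(winsorized)
mean_excess = mean(excess)
```

In this unweighted toy case the two pieces simply add back to the ordinary sample mean; the estimators discussed in the abstract differ in how the excess term is estimated and allocated to domains.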

2.
Many survey questions allow respondents to pick any number out of c possible categorical responses or “items”. These kinds of survey questions often use the terminology “choose all that apply” or “pick any”. Often of interest is determining if the marginal response distributions of each item differ among r different groups of respondents. Agresti and Liu (1998, 1999) call this a test for multiple marginal independence (MMI). If respondents are allowed to pick only one out of c responses, the hypothesis test may be performed using the Pearson chi-square test of independence. However, since respondents may pick more or fewer than one response, the test's assumption that responses are made independently of each other is violated. Recently, a few MMI testing methods have been proposed. Loughin and Scherer (1998) propose using a bootstrap method based on a modified version of the Pearson chi-square test statistic. Agresti and Liu (1998, 1999) propose using marginal logit models, quasisymmetric loglinear models, and a few methods based on Pearson chi-square test statistics. Decady and Thomas (1999) propose using a Rao-Scott adjusted chi-squared test statistic. There has not been a full investigation of these MMI testing methods. The purpose here is to evaluate the proposed methods and propose a few new methods. Recommendations are given to guide the practitioner in choosing which MMI testing methods to use.
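
For the single-response case mentioned above, the Pearson chi-square statistic for an r × c table of counts can be sketched as follows. This is only the textbook statistic for a pick-one question, not any of the MMI methods the abstract evaluates; the function name and the example table are assumptions for illustration.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and response.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: 2 groups of respondents, 2 mutually exclusive responses.
stat, dof = pearson_chi_square([[10, 20], [20, 10]])
```

With "pick any" data the same respondent can contribute to several cells, which is why this statistic cannot simply be referred to its usual chi-square distribution.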

3.
When confronted with multiple covariates and a response variable, analysts sometimes apply a variable‐selection algorithm to the covariate‐response data to identify a subset of covariates potentially associated with the response, and then wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, the parameters of interest could be estimated by using standard inference methods to fit the postulated marginal model to the independent data set. However, when applied to the same data set used by the variable selector, standard (“naive”) methods can lead to distorted inferences. The authors develop testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and response variable, based on the same data set used for variable selection. They provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess and compare their performance to a sample‐splitting approach. The methods are illustrated with data from a recent AIDS study. The Canadian Journal of Statistics 37: 625–644; 2009 © 2009 Statistical Society of Canada

4.
Multiple-response (or pick any/c) categorical variables summarize responses to survey questions that ask “pick any” from a set of item responses. Extensions to loglinear model methodology are proposed to model associations between these variables across all their items simultaneously. Because individual item responses to a multiple-response categorical variable are likely to be correlated, the usual chi-square distributional approximations for model-comparison statistics are not appropriate. Adjusted statistics and a new bootstrap procedure are developed to facilitate distributional approximations. Odds ratio and standardized Pearson residual measures are also developed to estimate specific associations and examine deviations from a specified model.

5.
The evaluation of new processor designs is an important issue in electrical and computer engineering. Architects use simulations to evaluate designs and to understand trade‐offs and interactions among design parameters. However, due to the lengthy simulation time and limited resources, it is often practically impossible to simulate a full factorial design space. Effective sampling methods and predictive models are required. In this paper, the authors propose an automated performance predictive approach which employs an adaptive sampling scheme that interactively works with the predictive model to select samples for simulation. These samples are then used to build Bayesian additive regression trees, which in turn are used to predict the whole design space. Both real data analysis and simulation studies show that the method is effective in that, though sampling at very few design points, it generates highly accurate predictions on the unsampled points. Furthermore, the proposed model provides quantitative interpretation tools with which investigators can efficiently tune design parameters in order to improve processor performance. The Canadian Journal of Statistics 38: 136–152; 2010 © 2010 Statistical Society of Canada

6.
Longitudinal surveys have emerged in recent years as an important data collection tool for population studies where the primary interest is to examine population changes over time at the individual level. Longitudinal data are often analyzed through the generalized estimating equations (GEE) approach. The vast majority of the existing literature on the GEE method, however, is developed under non‐survey settings and is inappropriate for data collected through complex sampling designs. In this paper the authors develop a pseudo‐GEE approach for the analysis of survey data. They show that survey weights must and can be appropriately accounted for in the GEE method under a joint randomization framework. The consistency of the resulting pseudo‐GEE estimators is established under the proposed framework. Linearization variance estimators are developed for the pseudo‐GEE estimators when the finite population sampling fractions are small or negligible, a scenario that often holds for large‐scale surveys. Finite sample performances of the proposed estimators are investigated through an extensive simulation study using data from the National Longitudinal Survey of Children and Youth. The results show that the pseudo‐GEE estimators and the linearization variance estimators perform well under several sampling designs and for both continuous and binary responses. The Canadian Journal of Statistics 38: 540–554; 2010 © 2010 Statistical Society of Canada

7.
Ranked set sampling (RSS) was first proposed by McIntyre [1952. A method for unbiased selective sampling, using ranked sets. Australian J. Agricultural Res. 3, 385–390] as an effective way to estimate the unknown population mean. Chuiv and Sinha [1998. On some aspects of ranked set sampling in parametric estimation. In: Balakrishnan, N., Rao, C.R. (Eds.), Handbook of Statistics, vol. 17. Elsevier, Amsterdam, pp. 337–377] and Chen et al. [2004. Ranked Set Sampling: Theory and Applications. Lecture Notes in Statistics, vol. 176. Springer, New York] have provided excellent surveys of RSS and various inferential results based on RSS. In this paper, we use the idea of order statistics from independent and non-identically distributed (INID) random variables to propose ordered ranked set sampling (ORSS) and then develop optimal linear inference based on ORSS. We determine the best linear unbiased estimators based on ORSS (BLUE-ORSS) and show that they are more efficient than BLUE-RSS for the two-parameter exponential, normal and logistic distributions. Although this is not the case for the one-parameter exponential distribution, the relative efficiency of the BLUE-ORSS (to BLUE-RSS) is very close to 1. Furthermore, we compare both BLUE-ORSS and BLUE-RSS with the BLUE based on order statistics from a simple random sample (BLUE-OS). We show that BLUE-ORSS is uniformly better than BLUE-OS, while BLUE-RSS is not as efficient as BLUE-OS for small sample sizes (n < 5).
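
The basic balanced RSS scheme can be sketched in a few lines. This is a simplified illustration, not the ORSS procedure or the BLUEs of the abstract: it assumes perfect ranking (units are ranked by their true values, whereas in practice ranking is done by judgment or a cheap auxiliary variable), and the function name, population and seed are made up for the example.

```python
import random


def ranked_set_sample(population, set_size, cycles, rng):
    """Balanced ranked set sample, assuming perfect ranking within each set."""
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw a fresh set, rank it, and keep only the unit with the target rank.
            candidates = sorted(rng.sample(population, set_size))
            sample.append(candidates[rank])
    return sample


# Hypothetical population and design: sets of 3 units, 4 cycles -> 12 measured units.
rng = random.Random(1952)
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]
rss = ranked_set_sample(population, set_size=3, cycles=4, rng=rng)
rss_mean = sum(rss) / len(rss)
```

Each cycle inspects `set_size ** 2` units but measures only `set_size` of them, which is where the efficiency gain over a simple random sample of the same measured size comes from.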

8.
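Under the new circumstances of reform and opening up, China's government statistical agencies have carried out a series of rural statistical surveys, including the rural household survey, which provide data on many aspects of the “three rural” issues (agriculture, rural areas, and farmers). This paper analyzes and summarizes the contradictions and problems in the current rural household sample survey scheme. Taking internationally leading methods for continuous sample surveys as its theoretical basis, it proposes a series of reforms covering the construction of the rural household sampling frame, the selection of samples for successive survey rounds, the design of a two-dimensional balanced rotation pattern, estimation and variance estimation for continuous surveys, and the adjustment and analysis of continuous time series data. The result is a more scientific survey design that supports the timely and accurate collection and provision of data on the “three rural” issues. Other types of sample survey schemes can be designed and resolved analogously along the lines of this study.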

9.
Generalized partially linear varying-coefficient models   Cited by: 1 (self-citations: 0, citations by others: 1)
Generalized varying-coefficient models are useful extensions of generalized linear models. They arise naturally when investigating how regression coefficients change over different groups characterized by certain covariates such as age. In this paper, we extend these models to generalized partially linear varying-coefficient models, in which some coefficients are constants and the others are functions of certain covariates. Procedures for estimating the linear and non-parametric parts are developed and their associated statistical properties are studied. The methods proposed are illustrated using some simulations and real data analysis.

10.
The results of analyzing experimental data using a parametric model may depend heavily on the models chosen for the regression and variance functions, and also on any preliminary transformation of the variables. In this paper we propose and discuss a complex procedure which consists of the simultaneous selection of parametric regression and variance models from a relatively rich model class and of Box-Cox variable transformations by minimization of a cross-validation criterion. For this it is essential to introduce modifications of the standard cross-validation criterion adapted to each of the following objectives: (1) estimation of the unknown regression function, (2) prediction of future values of the response variable, (3) calibration, or (4) estimation of some parameter with a certain meaning in the corresponding field of application. Our idea of a criterion-oriented combination of procedures (which, if applied at all, are usually applied independently or sequentially) is expected to lead to more accurate results. We show how the accuracy of the parameter estimators can be assessed by a “moment oriented bootstrap procedure”, which is an essential modification of the “wild bootstrap” of Härdle and Mammen by use of more accurate variance estimates. This new procedure and its refinement by a bootstrap based pivot (“double bootstrap”) are also used for the construction of confidence, prediction and calibration intervals. Programs written in Splus which realize our strategy for nonlinear regression modelling and parameter estimation are described as well. The performance of the selected model is discussed, and the behaviour of the procedures is illustrated, e.g., by an application in radioimmunological assay.

11.
In this paper, a new estimator of the proportion of a potentially sensitive attribute in survey sampling is introduced. The proposed estimator makes use of higher order moments of the scrambling variable at the estimation stage. It is found to be more efficient than the estimator due to Kuk [1990. Asking sensitive questions indirectly. Biometrika 77(2), 436–438] and Franklin [1989. A comparison of estimators for randomized response sampling with continuous distributions from a dichotomous population. Comm. Statist. Theory Methods 18, 489–505] type estimators in randomized response sampling. Recently, Guerriero and Sandri [2007. A note on the comparison of some randomized response procedures. J. Statist. Plann. Inference 137, 2184–2190] have shown that the family of randomized response models proposed by Kuk [1990. Asking sensitive questions indirectly. Biometrika 77(2), 436–438] is better than the Simmons’ family in terms of efficiency and protection.

12.
This paper deals with models and methods for count data derived from observations on pairing phenomena. Pairs formed from “similar” members are excluded. Various models are considered and analyzed. Particular emphasis is on developing methods for testing whether particular pairs are prone to occur more or less often than expected by chance.

13.
This article is concerned with the simulation of one‐day cricket matches. Given that only a finite number of outcomes can occur on each ball that is bowled, a discrete generator on a finite set is developed where the outcome probabilities are estimated from historical data involving one‐day international cricket matches. The probabilities depend on the batsman, the bowler, the number of wickets lost, the number of balls bowled and the innings. The proposed simulator appears to do a reasonable job at producing realistic results. The simulator allows investigators to address complex questions involving one‐day cricket matches. The Canadian Journal of Statistics © 2009 Statistical Society of Canada
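
The idea of a discrete generator on a finite outcome set can be sketched as below. This is a deliberately crude illustration, not the simulator described in the abstract: the outcome set omits extras such as wides and no-balls, the probabilities are invented and held constant rather than estimated from historical data, and they do not depend on batsman, bowler, wickets lost, balls bowled or innings. All names are assumptions.

```python
import random

# Hypothetical per-ball outcomes (runs scored, or "W" for a wicket) and
# made-up probabilities that sum to 1.
OUTCOMES = [0, 1, 2, 3, 4, 6, "W"]
PROBS = [0.45, 0.30, 0.08, 0.01, 0.10, 0.03, 0.03]


def simulate_innings(rng, max_balls=300, max_wickets=10):
    """Simulate one innings ball by ball; stop at 300 balls or 10 wickets."""
    runs = wickets = balls = 0
    while balls < max_balls and wickets < max_wickets:
        outcome = rng.choices(OUTCOMES, weights=PROBS, k=1)[0]
        balls += 1
        if outcome == "W":
            wickets += 1
        else:
            runs += outcome
    return runs, wickets, balls


rng = random.Random(2009)
runs, wickets, balls = simulate_innings(rng)
```

Making the probabilities a function of the match state, as the abstract describes, would amount to replacing the fixed `PROBS` list with a lookup keyed on that state.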

14.
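Since its accession to the WTO, China has occupied a pivotal position in global trade, with exporting and outward investment being the main modes of internationalization for Chinese firms. To examine how firm productivity and capital intensity affect the choice of internationalization mode, this paper uses the 2005–2007 China Industrial Enterprise Database (《中国工业企业数据库》) and the Directory of Overseas Investment Enterprises (Institutions) (《境外投资企业(机构)名录》) to classify all firms into four modes of international market participation: non-exporting, exporting only, investing only, and both exporting and investing. It then tests firms' choice of internationalization mode using a multinomial logit model and relative risk ratios. The results show that firms with higher productivity are more likely to choose outward investment; that labor-intensive firms are more likely to export, whereas capital-intensive firms are more likely to invest abroad; and that the effects of productivity and capital intensity on the choice of internationalization mode differ across regions. Finally, the paper explains the “export-inducing effect” of outward direct investment from the perspective of the operating form of the investment.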

15.
We consider the analysis of spell durations observed in event history studies where members of the study panel are seen intermittently. Challenges for analysis arise because losses to follow-up are frequently related to previous event history, and spells typically overlap more than one observation period. We provide methods of estimation based on inverse probability of censoring weighting for parametric and semiparametric Cox regression models. Selection of panel members through a complex survey design is also addressed, and the methods are illustrated in an analysis of jobless spell durations based on data from the Statistics Canada Survey of Labour and Income Dynamics. The Canadian Journal of Statistics 40: 1–21; 2012 © 2012 Statistical Society of Canada

16.
We propose a method of comparing two functional linear models in which explanatory variables are functions (curves) and responses can be either scalars or functions. In such models, the role of parameter vectors (or matrices) is played by integral operators acting on a function space. We test the null hypothesis that these operators are the same in two independent samples. The complexity of the test statistics increases as we move from scalar to functional responses and relax assumptions on the covariance structure of the regressors. They all, however, have an asymptotic chi‐squared distribution with a number of degrees of freedom that depends on the specific setting. The test statistics are readily computable using the R package fda, and have good finite sample properties. The test is applied to egg‐laying curves of Mediterranean flies and to data from terrestrial magnetic observatories. The Canadian Journal of Statistics © 2009 Statistical Society of Canada

17.
In “before” and “after” surveys of Attitudes Towards Random Breath Testing in South Australia, three basic versions of the questionnaire were used. In the first, a set of “lead-up” questions, which were designed to deliberately bias the results towards acceptance of the tests, was included before the main questions; in the second, there were no lead-up questions; in the third, a different set of lead-up questions was used, and was aimed at deliberately biasing the results against the tests. The results in two out of the four attempts to influence the answers (compared with no lead-up questions) were significant in the expected direction, and in the other two cases were in the expected direction but not significant. The difference between the positive- and negative-biasing versions was highly significant in both cases. It is important to be aware that changes in context rather than in question wording per se can give rise to effects which dwarf the sampling error.

18.
Donor imputation is frequently used in surveys. However, very few variance estimation methods that take into account donor imputation have been developed in the literature. This is particularly true for surveys with high sampling fractions using nearest donor imputation, often called nearest‐neighbour imputation. In this paper, the authors develop a variance estimator for donor imputation based on the assumption that the imputed estimator of a domain total is approximately unbiased under an imputation model; that is, a model for the variable requiring imputation. Their variance estimator is valid, irrespective of the magnitude of the sampling fractions and the complexity of the donor imputation method, provided that the imputation model mean and variance are accurately estimated. They evaluate its performance in a simulation study and show that nonparametric estimation of the model mean and variance via smoothing splines brings robustness with respect to imputation model misspecifications. They also apply their variance estimator to real survey data when nearest‐neighbour imputation has been used to fill in the missing values. The Canadian Journal of Statistics 37: 400–416; 2009 © 2009 Statistical Society of Canada

19.
Jun Shao. Statistics, 2013, 47(3-4): 203-237
This article reviews the applications of three resampling methods, the jackknife, the balanced repeated replication, and the bootstrap, in sample surveys. The sampling design under consideration is a stratified multistage sampling design. We discuss the implementation of the resampling methods; for example, the construction of balanced repeated replications and approximated balanced repeated replication estimators; four modified bootstrap algorithms to generate bootstrap samples; and three different ways of applying the resampling methods in the presence of imputed missing values. Asymptotic properties of the resampling estimators are discussed for two types of important survey estimators, functions of weighted averages and sample quantiles.
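
As a point of reference for the jackknife, the delete-one version for a simple random sample can be sketched as follows. This is the textbook i.i.d. case only, not the stratified multistage jackknife the article reviews (where whole first-stage units are deleted within strata); the function names and data are assumptions for illustration.

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate for a generic estimator."""
    n = len(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)


def sample_mean(xs):
    return sum(xs) / len(xs)


# Hypothetical data; for the mean, the jackknife reproduces s^2 / n.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
v_jack = jackknife_variance(data, sample_mean)
```

For the sample mean the result coincides with the usual variance estimate s²/n, which makes the mean a convenient check before applying the same function to a nonlinear estimator such as a ratio.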

20.
Time averaging has been the traditional approach to handle mixed sampling frequencies. However, it ignores information possibly embedded in high frequency. Mixed data sampling (MIDAS) regression models provide a concise way to utilize the additional information in high-frequency variables. In this paper, we propose a specification test to choose between time averaging and MIDAS models, based on a Durbin-Wu-Hausman test. In particular, a set of instrumental variables is proposed and theoretically validated when the frequency ratio is large. As a result, our method tends to be more powerful than existing methods, as reconfirmed through the simulations.
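
The contrast between time averaging and a MIDAS-style aggregation can be sketched as follows. The abstract does not say which weighting scheme is used; the exponential Almon weights below are a common choice in the MIDAS literature and are an assumption here, as are the function names and the toy monthly series. The sketch covers only the aggregation step, not the regression or the proposed specification test.

```python
import math


def exp_almon_weights(m, theta1, theta2):
    """Normalized exponential Almon weights for positions j = 1..m within a block."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, m + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(high_freq, m, weights):
    """Collapse each consecutive block of m high-frequency values into one weighted value."""
    return [
        sum(w * x for w, x in zip(weights, high_freq[start:start + m]))
        for start in range(0, len(high_freq) - m + 1, m)
    ]


# Hypothetical monthly series aggregated to quarters (frequency ratio m = 3).
monthly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
flat = exp_almon_weights(3, 0.0, 0.0)    # theta = (0, 0): equal weights, i.e. time averaging
tilted = exp_almon_weights(3, 0.5, 0.0)  # unequal weights across the block

quarterly_average = aggregate(monthly, 3, flat)
quarterly_midas = aggregate(monthly, 3, tilted)
```

Under this parametrization time averaging is the special case with both parameters equal to zero, which is what makes a test of one specification against the other natural.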

Copyright©北京勤云科技发展有限公司  京ICP备09084417号