Similar documents (20 results)
1.
When tables are generated from a data file, the release of those tables should not reveal overly detailed information about individual respondents. The disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level (by cell suppression or cell perturbation), but this may create inconsistencies among other tables based on the same data file. Alternatively, disclosure control methods can be applied at the microdata level, but these methods may change the data permanently and do not account for specific table properties. These problems can be circumvented by assigning a (single and fixed) weight factor to each respondent/record in the microdata file. Normally this weight factor is equal to 1 for each record, and is not explicitly incorporated in the microdata file. Upon tabulation, each contribution of a respondent is weighted multiplicatively by the respondent's weight factor. This approach is called Source Data Perturbation (SDP) because the data are perturbed at the microdata level, not at the table level. It should be noted, however, that the data in the original microdata file are not changed; only a weight variable is added. The weight factors can be chosen in accordance with the SDC paradigm, i.e. such that the tables generated from the microdata are safe and the information loss is minimized. The paper indicates how this can be done. Moreover, it is shown that the SDP approach is very suitable for use in data warehouses, as the weights can conveniently be put in the fact tables. The data can then still be accessed, sliced, and diced up to a certain level of detail, and tables generated from the data warehouse are mutually consistent and safe.
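As a minimal illustration of the tabulation step, the weighted aggregation can be sketched as follows; the microdata, variable names, and weight values here are invented for illustration. Default weights of 1 reproduce the exact table, while SDP weights perturb the cells without touching the records themselves.

```python
from collections import defaultdict

def tabulate(records, weights, by, value):
    """Tabulate a value variable by a grouping key, weighting each
    record's contribution multiplicatively by its weight factor."""
    table = defaultdict(float)
    for rec, w in zip(records, weights):
        table[rec[by]] += w * rec[value]
    return dict(table)

# Toy microdata: the records themselves are never modified by SDP.
microdata = [
    {"region": "N", "turnover": 100},
    {"region": "N", "turnover": 200},
    {"region": "S", "turnover": 300},
]

# Unit weights reproduce the exact table ...
exact = tabulate(microdata, [1, 1, 1], "region", "turnover")
# ... while perturbation weights (hypothetical values) change the cells.
perturbed = tabulate(microdata, [0.9, 1.1, 1.0], "region", "turnover")
```

Because the weights live in a separate variable, every table generated from the same weighted file is mutually consistent.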

2.
Using the marginal likelihood based on the signed ranks derived from matched pairs data, inferences are made for regression parameters. Both members of a given pair are subject to the same censoring time, while different pairs are subject to different censoring times. Censoring is independent of the response and on the right. Easily calculated logistic density scores are used to provide an approximate analysis so that inferences can be made about a regression parameter in the presence of a difference within the matched pairs. Inference for the survival times of matched skin grafts is considered.

3.
Parametric mixed-effects logistic models can provide effective analysis of binary matched-pairs data. Responses are assumed to follow a logistic model within pairs, with an intercept which varies across pairs according to a specified family of probability distributions G. In this paper we give necessary and sufficient conditions for consistent covariate effect estimation and present a geometric view of estimation which shows that when the assumed family of mixture distributions is rich enough, estimates of the effect of the binary covariate are typically consistent. The geometric view also shows that under the conditions for consistent estimation, the mixed-model estimator is identical to the familiar conditional-likelihood estimator for matched pairs. We illustrate the findings with some examples.
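For the simplest case of a binary covariate, the conditional-likelihood estimator mentioned above depends only on the discordant pairs; concordant pairs carry no information about the effect. A minimal sketch with invented pair data:

```python
import math

def conditional_logit_matched_pairs(pairs):
    """Conditional-likelihood estimate of the covariate log-odds ratio
    from binary matched pairs.  Each pair is (y_exposed, y_unexposed);
    only the discordant pairs contribute to the estimate log(n10/n01)."""
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    return math.log(n10 / n01)

# Hypothetical data: 6 and 3 discordant pairs, plus concordant pairs
# that the conditional likelihood simply ignores.
pairs = [(1, 0)] * 6 + [(0, 1)] * 3 + [(1, 1)] * 5 + [(0, 0)] * 4
beta = conditional_logit_matched_pairs(pairs)  # log(6/3) = log 2
```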

4.
In official statistics, when a file of microdata must be delivered to external users, it is very difficult to provide a file in which missing values have been treated by multiple imputation. To overcome this difficulty, we propose a method of single imputation for qualitative data that respects numerous constraints. The imputation is balanced on previously estimated totals; editing rules can be respected; and although the imputation is random, the totals are not affected by an imputation variance.

5.
Although there are several available test statistics to assess the difference of marginal probabilities in clustered matched-pair binary data, associated confidence intervals (CIs) are not readily available. Herein, the construction of corresponding CIs is proposed, and the performance of each CI is investigated. The results from a Monte Carlo simulation study indicate that the proposed CIs perform well in maintaining the nominal coverage probability: for small to medium numbers of clusters, the intracluster correlation coefficient-adjusted McNemar statistic and its associated Wald or score CIs are preferred; however, this statistic becomes conservative when the number of clusters is larger, so that alternative statistics and their associated CIs are preferred. In practice, a combination of the intracluster correlation coefficient-adjusted McNemar statistic with an alternative statistic is recommended. A real clustered matched-pair data set is used to illustrate testing the difference of marginal probabilities and constructing the associated CIs. Copyright © 2012 John Wiley & Sons, Ltd.
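The unadjusted building block behind such intervals can be sketched as follows. This is the ordinary Wald CI for the difference of paired marginal proportions, not the intracluster-correlation-adjusted version the paper studies, and the cell counts are hypothetical:

```python
import math

def paired_diff_wald_ci(n11, n10, n01, n00, z=1.96):
    """Wald CI for the difference of marginal probabilities p1 - p2
    from a matched-pair 2x2 table (n10, n01 are the discordant cells).
    Ignores clustering; cluster-adjusted versions inflate this variance."""
    n = n11 + n10 + n01 + n00
    d = (n10 - n01) / n
    var = (n10 + n01 - (n10 - n01) ** 2 / n) / n ** 2
    half = z * math.sqrt(var)
    return d - half, d + half

# Hypothetical table: 15 and 5 discordant pairs among n = 100.
lo, hi = paired_diff_wald_ci(20, 15, 5, 60)
```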

6.
7.
Matched case–control designs are commonly used in epidemiological studies for estimating the effect of exposure variables on the risk of a disease while controlling the effect of confounding variables. Due to the retrospective nature of the study, information on a covariate may be missing for some subjects. A straightforward application of the conditional logistic likelihood for analyzing matched case–control data with a partially missing covariate may yield inefficient estimators of the parameters. A robust method has been proposed to handle this problem using an estimated conditional score approach when the missingness mechanism does not depend on the disease status. Within the conditional logistic likelihood framework, an empirical procedure is used to estimate the odds of the disease for the subjects with missing covariate values. The asymptotic distribution and the asymptotic variance of the estimator are derived when the matching variables and the completely observed covariates are categorical. The finite sample performance of the proposed estimator is assessed through a simulation study. Finally, the proposed method has been applied to analyze two matched case–control studies. The Canadian Journal of Statistics 38: 680–697; 2010 © 2010 Statistical Society of Canada

8.
"We wish to measure the evidence that a pair of records relates to the same, rather than different, individuals. The paper emphasizes statistical models which can be fitted to a file of record pairs known to be correctly matched, and then used to estimate likelihood ratios. A number of models are developed and applied to U.K. immigration statistics. The combination of likelihood ratios for possibly correlated record fields is discussed." A series of comments on the paper is also included, as well as a reply to those comments by the author (pp. 312-20).
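Under the classical assumption of independent record fields (the very assumption the paper's discussion of correlated fields relaxes), per-field likelihood ratios combine multiplicatively, i.e. additively on the log scale. The field names and m/u-probabilities below are invented for illustration:

```python
import math

# Hypothetical per-field m-probabilities (agreement given a true match)
# and u-probabilities (agreement given a non-match).
fields = {
    "surname":    {"m": 0.95, "u": 0.01},
    "birth_year": {"m": 0.90, "u": 0.05},
}

def match_weight(agreements):
    """Total log-likelihood ratio for a record pair, assuming field
    independence: an agreeing field adds log(m/u), a disagreeing field
    adds log((1 - m)/(1 - u))."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = fields[field]["m"], fields[field]["u"]
        total += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return total

w = match_weight({"surname": True, "birth_year": False})
```

A large positive weight is evidence the pair refers to the same individual; correlated fields would require joint rather than per-field ratios.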

9.
Frailty models with varying coefficients can model complicated structures more efficiently and increase their flexibility; therefore, such models are proposed in this article. The real challenge is to estimate the varying coefficients by penalized partial likelihood, which has no closed form; the Laplace approximation is used to solve this problem. The varying coefficients are fitted using B-splines. Moreover, the variances of the random effects are estimated by maximizing an approximate profile likelihood. The performance of the proposed methods is assessed with simulation studies and real data. The results show that the proposed methods are better than their counterparts in the literature.
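A B-spline representation writes a varying coefficient as β(x) = Σᵢ cᵢ Bᵢ(x), where the basis functions Bᵢ can be evaluated with the standard Cox-de Boor recursion. A minimal pure-Python sketch (the clamped cubic knot vector is chosen purely for illustration):

```python
def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion for the i-th B-spline basis function of
    degree k over knot vector t, evaluated at x.  Terms with a zero
    knot span (repeated knots) are dropped by convention."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right

# Clamped cubic knots on [0, 1] give 5 basis functions; a varying
# coefficient beta(x) is then a linear combination of these.
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
vals = [bspline_basis(i, 3, knots, 0.25) for i in range(5)]
```

On the interior of the knot range the basis functions are nonnegative and sum to one, which is what makes the fitted coefficient curve locally controlled.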

10.
Stochastic Models (《随机性模型》), 2013, 29(3): 333-367
We model the behavior of a TCP-like source transmitting over a single channel to a server that processes work at a constant rate τ. Transmission by the source follows an on/off mechanism. When the overall load in the system is below a critical constant γ, transmission rates increase linearly, but when the load exceeds γ, transmission rates decrease geometrically fast. We study the system by means of an embedded Markov chain, which gives the buffer content at the start of transmissions. Attention is paid to the time necessary to transmit a file of size L; both the tail behavior and the expectation of the file transmission time distribution are considered.
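A toy discrete-time simulation can convey the additive-increase / multiplicative-decrease dynamics described above. All constants, the discretization, and the use of buffer content as the load are illustrative assumptions, not the paper's exact model:

```python
def simulate(steps, tau=1.0, gamma=5.0, add=0.2, q=0.5, dt=0.1):
    """Toy sketch of the source: the transmission rate grows linearly
    while the load (here: buffer content) is below gamma and shrinks
    geometrically above it; the server drains at constant rate tau."""
    buf, rate, path = 0.0, 1.0, []
    for _ in range(steps):
        rate = rate + add if buf < gamma else rate * q
        buf = max(0.0, buf + (rate - tau) * dt)  # buffer cannot go negative
        path.append(buf)
    return path

path = simulate(1000)
```

The sample path oscillates around the critical level γ, which is the qualitative behavior the embedded Markov chain analysis makes precise.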

11.
In an observational study in which each treated subject is matched to several untreated controls by using observed pretreatment covariates, a sensitivity analysis asks how hidden biases due to unobserved covariates might alter the conclusions. The bounds required for a sensitivity analysis are the solution to an optimization problem. In general, this optimization problem is not separable, in the sense that one cannot find the needed optimum by performing a separate optimization in each matched set and combining the results. We show, however, that this optimization problem is asymptotically separable, so that when there are many matched sets a separate optimization may be performed in each matched set and the results combined to yield the correct optimum with negligible error. This is true when the Wilcoxon rank sum test or the Hodges-Lehmann aligned rank test is applied in matching with multiple controls. Numerical calculations show that the asymptotic approximation performs well with as few as 10 matched sets. In the case of the rank sum test, a table is given containing the separable solution. With this table, only simple arithmetic is required to conduct the sensitivity analysis. The method also supplies estimates, such as the Hodges-Lehmann estimate, and confidence intervals associated with rank tests. The method is illustrated in a study of dropping out of US high schools and the effects on cognitive test scores.

12.
Determining the effectiveness of different treatments from observational data, which are characterized by imbalance between groups due to lack of randomization, is challenging. Propensity matching is often used to rectify imbalances among prognostic variables. However, there are no guidelines on how to appropriately analyze group-matched data when the outcome is a zero-inflated count. In addition, there is debate over whether to account for the correlation of responses induced by matching and/or whether to adjust for variables used in generating the propensity score in the final analysis. The aim of this research is to compare covariate-unadjusted and covariate-adjusted zero-inflated Poisson models that do and do not account for the correlation. A simulation study is conducted, demonstrating that it is necessary to adjust for potential residual confounding, but that accounting for correlation is less important. The methods are applied to a biomedical research data set.
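For reference, the zero-inflated Poisson model mixes a structural point mass at zero with an ordinary Poisson count; its probability mass function is easy to sketch (the parameter values in the usage line are arbitrary):

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf: with probability pi the count is a
    structural zero, otherwise it is drawn from Poisson(lam), so
    P(0) = pi + (1 - pi) e^{-lam} and P(k) = (1 - pi) Poisson(k; lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson

p0 = zip_pmf(0, 2.0, 0.3)  # inflated zero probability
```

The excess of zeros relative to a plain Poisson(λ) is exactly the quantity that motivates the model comparison in the abstract.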

13.
Four distribution-free tests are developed for use in matched pair experiments when data may be censored: a bootstrap based on estimates of the median difference, and three rerandomization tests. The latter include a globally almost most powerful (GAMP) test which uses the original data and two modified Gilbert-Gehan tests which use the ranks. Computation time is reduced by using a binary count to generate subsamples and by restricting subsampling to the uncensored pairs. In Monte Carlo simulations against normal, mixed normal, and exponential alternatives, the GAMP test is most powerful with light censoring, while the rank tests are most powerful with heavy censoring. The bootstrap degenerates to the sign test and is least powerful.
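The binary-count trick for generating all sign assignments can be sketched with an exact rerandomization test on matched-pair differences; each bit of the counter decides the sign of one pair. The differences below are invented, and censoring is ignored in this sketch:

```python
def sign_flip_pvalue(diffs):
    """Exact rerandomization p-value for matched-pair differences:
    every one of the 2^n sign assignments is generated by counting in
    binary, bit j of the counter deciding the sign of pair j."""
    n = len(diffs)
    observed = abs(sum(diffs))
    count = 0
    for mask in range(1 << n):  # binary count over all subsamples
        s = sum(d if (mask >> j) & 1 else -d for j, d in enumerate(diffs))
        if abs(s) >= observed - 1e-12:
            count += 1
    return count / (1 << n)

p = sign_flip_pvalue([1.2, 0.8, 1.5, 0.9, 1.1])
```

With all five invented differences positive, only the two extreme sign assignments reach the observed statistic, giving p = 2/32.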

14.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered.

15.
We studied several test statistics for testing the equality of marginal survival functions of paired censored data. The null distribution of the test statistics was approximated by permutation. These tests do not require explicit modeling or estimation of the within-pair correlation, accommodate both paired data and singletons, and are straightforward to compute with most statistical software. Numerical studies showed that these tests have competitive size and power performance. One test statistic has higher power than previously published test statistics when the two survival functions under comparison cross. We illustrate the use of these tests in a propensity score matched dataset.

16.
Binary as well as polytomous logistic models have been found useful for estimating odds ratios when the exposure of prime interest assumes unordered multiple levels under a matched-pair case-control design. In our earlier studies, we showed the use of a polytomous logistic model for estimating cumulative odds ratios when the exposure of prime interest assumes multiple ordered levels under a matched-pair case-control design. In this paper, using the above model, we estimate the covariate-adjusted cumulative odds ratios in the case of an ordinal multiple-level exposure variable under a pairwise-matched case-control retrospective design. An approach, based on asymptotic distributional results, is also described to investigate whether or not the response categories are distinguishable with respect to the cumulative odds ratios after adjusting for the effect of covariates. An illustrative example is presented and discussed.

17.
ABSTRACT

The difference-in-differences (DID) method is widely used as a tool for identifying causal effects of treatments in program evaluation. When panel data sets are available, it is well known that the average treatment effect on the treated (ATT) is point-identified under the DID setup. If a panel data set is not available, repeated cross sections (pretreatment and posttreatment) may be used, but may not point-identify the ATT. This paper systematically studies the identification of the ATT under the DID setup when posttreatment treatment status is unknown for the pretreatment sample. This is done through a novel application of an extension of a continuous version of the classical monotone rearrangement inequality which allows for general copula bounds. The identifying power of an instrumental variable and of a 'matched subsample' is also explored. Finally, we illustrate our approach by estimating the effect of the Americans with Disabilities Act of 1990 on employment outcomes of the disabled.
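With panel data, the point-identified ATT is the familiar two-by-two DID contrast: the treated group's pre-to-post change minus the control group's change. A minimal sketch with invented outcome values:

```python
def did_att(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Point estimate of the ATT under the DID setup with panel data:
    the treated group's mean change minus the control group's mean
    change, which differences out common time trends."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# Hypothetical outcomes: treated change is 3, control change is 1.
att = did_att([2.0, 4.0], [5.0, 7.0], [1.0, 3.0], [2.0, 4.0])
```

It is exactly this contrast that fails to be point-identified when the pretreatment sample's eventual treatment status is unknown, which is the case the paper bounds.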

18.
ABSTRACT

We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer–employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax exogenous mobility by modeling the matched data as an evolving bipartite graph using a Bayesian latent-type framework. Our results suggest that allowing endogenous mobility increases the variation in earnings explained by individual heterogeneity and reduces the proportion due to employer and match effects. To assess external validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The mobility-bias-corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates. Supplementary materials for this article are available online.

19.
In the formula of the likelihood ratio test for fourfold tables with matched pairs of binary data, only the two discordant cells b and c, which represent changes, are considered; the concordant cells a and d are not included. To develop a test that considers all four cells and the mixture distribution of likelihood-ratio chi-squares, a formula based on the entire sample is proposed. The revised formula is the same as the unrevised one when a + d is zero. The revised test is more valid than the revised McNemar's test in most cases.
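For comparison, both the standard likelihood-ratio statistic and McNemar's chi-square for matched binary pairs depend only on the discordant counts b and c, which is exactly the limitation the proposed revision addresses. A sketch with hypothetical counts:

```python
import math

def lr_matched_pairs(b, c):
    """Standard likelihood-ratio statistic for matched binary pairs:
    G^2 = 2 [ b log(2b/(b+c)) + c log(2c/(b+c)) ].  Note it uses only
    the discordant counts, ignoring the concordant cells a and d."""
    n = b + c
    g2 = 0.0
    for x in (b, c):
        if x > 0:  # a zero cell contributes nothing (x log x -> 0)
            g2 += x * math.log(2 * x / n)
    return 2 * g2

def mcnemar(b, c):
    """McNemar's chi-square, likewise a function of b and c alone."""
    return (b - c) ** 2 / (b + c)

g2 = lr_matched_pairs(15, 5)
x2 = mcnemar(15, 5)
```

Changing a and d leaves both statistics untouched, which motivates a formula based on the entire sample.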

20.
We consider methods for analysing matched case–control data when some covariates (W) are completely observed but other covariates (X) are missing for some subjects. In matched case–control studies, the complete-record analysis discards completely observed subjects if none of their matching cases or controls are completely observed. We investigate an imputation estimate obtained by solving a joint estimating equation for log-odds ratios of disease and parameters in an imputation model. Imputation estimates for coefficients of W are shown to have smaller bias and mean-square error than estimates from the complete-record analysis.
