Similar literature
20 similar articles found (search time: 31 ms)
1.
To protect public-use microdata, one approach is not to allow users access to the microdata. Instead, users submit analyses to a remote computer that reports back basic output from the fitted model, such as coefficients and standard errors. To be most useful, this remote server also should provide some way for users to check the fit of their models, without disclosing actual data values. This paper discusses regression diagnostics for remote servers. The proposal is to release synthetic diagnostics, i.e. simulated values of residuals and dependent and independent variables, constructed to mimic the relationships among the real-data residuals and independent variables. Using simulations, it is shown that the proposed synthetic diagnostics can reveal model inadequacies without substantial increase in the risk of disclosures. This approach also can be used to develop remote server diagnostics for generalized linear models.

2.
Statistical agencies manage huge amounts of microdata. The main task of these agencies is to provide a variety of users with general information about, for instance, the population and the economy. However, in some cases users request additional, more specific information. Many agencies have therefore set up facilities that enable selected users to obtain tailor-made statistical information. A remote access system is an example of such a facility, where users can submit queries for statistical information from their own computer. These queries are handled by the statistical agency and the generated, possibly confidentialised, output is returned to the user. This way the agency still keeps control over its own data while the user does not need to make frequent visits to the agency. For some years, the Luxembourg Income Study (LIS) and Luxembourg Employment Study (LES) have made use of an advanced remote access system. At Statistics Netherlands and at other statistical institutes, the need for a similar system has recently been expressed. In this article, we discuss the characteristics, limitations and desired properties of a remote access system. We illustrate the discussion by the system used at LIS/LES.

3.
In this paper we present methods for inference on data selected by a complex sampling design for a class of statistical models for the analysis of ordinal variables. Specifically, assuming that the sampling scheme is not ignorable, we derive for the class of CUB models (Combination of discrete Uniform and shifted Binomial distributions) variance estimates for a complex two-stage stratified sample. Both Taylor linearization and repeated replication variance estimators are presented. We also provide design‐based test diagnostics and goodness‐of‐fit measures. We illustrate by means of real data analysis the differences between survey‐weighted and unweighted point estimates and inferences for CUB model parameters.

4.
Regression diagnostics are introduced for parameters in marginal association models for clustered binary outcomes in an implementation of generalized estimating equations. Estimating equations for intracluster correlations facilitate computational formulae for one-step deletion diagnostics in an extension of earlier work on diagnostics for parameters in the marginal mean model. The proposed diagnostics measure the influence of an observation or a cluster of observations on the estimated regression parameters and on the overall fit of the model. The diagnostics are applied to data from four research studies from public health and medicine.

5.
6.
The analysis of residuals may reveal various functional forms suitable for the regression model. In this paper, we investigate some selection criteria for selecting important regression variables. In doing so, we use statistical selection and ranking procedures. Thus, we derive an appropriate criterion to measure the influence and bias for the reduced models. We show that the reduced models are based on some noncentrality parameters which provide a measure of goodness of fit for the fitted models. In this paper, we also discuss the relationships of influence diagnostics and the statistic proposed earlier by Gupta and Huang (J. Statist. Plann. Inference 20 (1988) 155–167). We introduce a new measure for detecting influential data as an alternative to Cook's measure.

7.
8.
Constrained general linear models (CGLMs) have wide applications in practice. As in other data analyses, the identification of influential observations that may be potential outliers is an important step in the analysis of CGLMs. We develop multiple case-deletion diagnostics for detecting influential observations in CGLMs. The diagnostics are functions of basic building blocks: studentized residuals, the error contrast matrix, and the inverse of the response variable covariance matrix. The basic building blocks are computed only once from the complete data analysis and provide information on the influence of the data on different aspects of the model fit. Computational formulas are given which make the procedures feasible. An illustrative example with a real data set is also reported.

9.
This paper discusses some diagnostic tools for logistic binary choice models using techniques based on perfect values. The concept of perfect value fit is defined for logistic models by analogy with the approach used when dealing with nonignorable nonresponse in contingency tables. The performance of outlier diagnostics based on perfect values is illustrated on a set of data on incidents in pre-Challenger launches.

10.
National statistical agencies and other data custodians collect and hold a vast amount of survey and census data, containing information vital for research and policy analysis. However, the problem of allowing analysis of these data, while protecting respondent confidentiality, has proved challenging to address. In this paper we focus on the remote analysis approach, under which a confidential dataset is held in a secure environment under the direct control of the data custodian agency. A computer system within the secure environment accepts a query from an analyst, runs it on the data, then returns the results to the analyst. In particular, the analyst does not have direct access to the data at all, and cannot view any microdata records. We further focus on the fitting of linear regression models to confidential data in the presence of outliers and influential points, such as are often present in business data. We propose a new method for protecting confidentiality in linear regression via a remote analysis system that provides additional confidentiality protection for outliers and influential points in the data. The method we describe in this paper was designed for the prototype DataAnalyser system developed by the Australian Bureau of Statistics; however, the method would be suitable for similar remote analysis systems.

11.
This article estimates autoregressive conditionally heteroscedastic (ARCH) and generalized ARCH (GARCH) models for five foreign currencies, using 10 years of daily data, a variety of ARCH and GARCH specifications, a number of nonnormal error densities, and a comprehensive set of diagnostic checks. It finds that ARCH and GARCH models can usually remove all heteroscedasticity in price changes in all five currencies. Goodness-of-fit diagnostics indicate that exponential GARCH with certain nonnormal distributions fits the Canadian dollar extremely well and the Swiss franc and the deutsche mark reasonably well. Only one nonnormal distribution fits the Japanese yen reasonably well. None fit the British pound.

12.
This paper introduces a parametric discrete failure time model which allows a variety of smooth hazard function shapes, including shapes which are not readily available with continuous failure time models. The model is easy to fit, and statistical inference is simple. Further, it is readily extended to allow for differences between subjects while retaining the ease of fit and simplicity of statistical inference. The performance of the discrete time analysis is demonstrated by application to several data sets.

13.
Recent research in cumulative damage models for strengths of systems has yielded various statistical distributions that incorporate a system size variable and follow a generalized Birnbaum-Saunders form. These models can be unified as a three-parameter Birnbaum-Saunders-type family of distributions, where the third parameter arises from the size variable through the cumulative damage approach. In this paper, the generalized three-parameter Birnbaum-Saunders distribution is characterized, and examples of cumulative damage models for system strength that fit this form are given. Also, estimation and asymptotic theory are developed for the generalized distribution, and illustrations are presented for experimental strength data for carbon composite materials.

14.
In survey sampling and in stereology, it is often desirable to estimate the ratio of means θ = E(Y)/E(X) from bivariate count data (X, Y) with unknown joint distribution. We review methods that are available for this problem, with particular reference to stereological applications. We also develop new methods based on explicit statistical models for the data, and associated model diagnostics. The methods are tested on a stereological dataset. For point‐count data, binomial regression and bivariate binomial models are generally adequate. Intercept‐count data are often overdispersed relative to Poisson regression models, but adequately fitted by negative binomial regression.
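
As an illustration of the quantity θ = E(Y)/E(X) named in this abstract, the following is a minimal sketch of the classical ratio-of-means estimator with a linearization (delta-method) standard error. It assumes independent, identically distributed pairs and is not the model-based method the paper develops; the function name `ratio_of_means` and the sample values are hypothetical.

```python
import math


def ratio_of_means(x, y):
    """Estimate theta = E(Y)/E(X) from paired counts.

    Returns (theta_hat, standard_error). The standard error uses the
    usual linearization, assuming i.i.d. pairs (an assumption of this
    sketch, not something stated in the abstract).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    if mean_x == 0:
        raise ValueError("mean of x is zero; the ratio is undefined")
    theta = mean_y / mean_x
    # Linearized residuals d_i = y_i - theta * x_i; their mean is zero by construction.
    d = [yi - theta * xi for xi, yi in zip(x, y)]
    var_d = sum(di * di for di in d) / (n - 1)
    se = math.sqrt(var_d / n) / mean_x
    return theta, se


if __name__ == "__main__":
    theta_hat, se_hat = ratio_of_means([1, 2, 3], [2, 2, 5])
    print(theta_hat, se_hat)
```

The estimator divides the two sample means rather than averaging the individual ratios y_i/x_i, which matters for count data where some x_i may be zero.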

15.
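Liu Hong (刘洪), Jin Lin (金林). Statistical Research (《统计研究》), 2012, 29(10): 99-104
Based on economic growth theory, this paper builds a semiparametric regression model for China's GDP data from 1953 to 2010 together with factors such as labor input, capital input, and human capital. It then carries out a statistical diagnostic analysis of the model, computes the relevant diagnostic statistics, and uses them to identify the model's outliers. On this basis, the accuracy of China's GDP data is discussed: the outliers in China's GDP data are concentrated mainly in two periods, 1958-1961 and 1991-1994. Finally, the paper comments on the method of assessing the accuracy of statistical data through statistical diagnostics of a semiparametric regression model.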

16.
In this paper, the Erlang–Lindley distribution (ErLD) is proposed, which offers a more flexible model for waiting time data. It has the property that it can accommodate increasing, bathtub, and inverted bathtub shapes. Several statistical and reliability properties are derived and studied. The moments, their associated measures, and the limiting distributions of order statistics are derived. The model parameters are estimated by maximum likelihood and the method of moments. An application of the proposed distribution to some waiting time data shows that it can give a better fit than other important lifetime models.

17.
In recent years, there has been considerable interest in regression models based on zero-inflated distributions. These models are commonly encountered in many disciplines, such as medicine, public health, and environmental sciences, among others. The zero-inflated Poisson (ZIP) model has been typically considered for these types of problems. However, the ZIP model can fail if the non-zero counts are overdispersed in relation to the Poisson distribution, hence the zero-inflated negative binomial (ZINB) model may be more appropriate. In this paper, we present a Bayesian approach for fitting the ZINB regression model. This model considers that an observed zero may come from a point mass distribution at zero or from the negative binomial model. The likelihood function is utilized not only to compute some Bayesian model selection measures, but also to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. The approach can be easily implemented using standard Bayesian software, such as WinBUGS. The performance of the proposed method is evaluated with a simulation study. Further, a real data set is analyzed, where we show that ZINB regression models seem to fit the data better than the Poisson counterpart.
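
The mixture described in this abstract (a zero comes either from a point mass at zero or from the negative binomial component) can be written down directly. The sketch below is only an illustration with constant parameters, not the paper's Bayesian regression: the names `nb_log_pmf`, `zinb_pmf`, and `zinb_log_likelihood`, and the mean/dispersion parameterization (variance mu + mu²/size), are assumptions made here.

```python
import math


def nb_log_pmf(y, mu, size):
    """Log pmf of a negative binomial with mean mu > 0 and dispersion size > 0."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def zinb_pmf(y, pi, mu, size):
    """Zero-inflated negative binomial pmf.

    pi is the probability of the point mass at zero; with probability
    1 - pi the count is drawn from the negative binomial component.
    """
    nb = math.exp(nb_log_pmf(y, mu, size))
    if y == 0:
        # A zero can arise from either mixture component.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_log_likelihood(ys, pi, mu, size):
    """Log-likelihood of i.i.d. counts under the ZINB model (no covariates)."""
    return sum(math.log(zinb_pmf(y, pi, mu, size)) for y in ys)


if __name__ == "__main__":
    counts = [0, 0, 0, 1, 3, 0, 7, 2]
    print(zinb_log_likelihood(counts, pi=0.3, mu=2.0, size=1.5))
```

In a regression setting, mu and pi would typically be linked to covariates; that step is left out here to keep the mixture structure visible.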

18.
For longitudinal time series data, linear mixed models that contain both random effects across individuals and first-order autoregressive errors within individuals may be appropriate. Some statistical diagnostics based on the models under a proposed elliptical error structure are developed in this work. It is well known that the class of elliptical distributions offers a more flexible framework for modelling since it contains both light- and heavy-tailed distributions. Iterative procedures for the maximum-likelihood estimates of the model parameters are presented. Score tests for the presence of autocorrelation and the homogeneity of autocorrelation coefficients among individuals are constructed. The properties of the test statistics are investigated through Monte Carlo simulations. The local influence method for the models is also given. The analysis of a real data set illustrates the value of the models and diagnostic statistics.

19.
In the analysis of time‐to‐event data, competing risks occur when multiple event types are possible, and the occurrence of a competing event precludes the occurrence of the event of interest. In this situation, statistical methods that ignore competing risks can result in biased inference regarding the event of interest. We review the mechanisms that lead to bias and describe several statistical methods that have been proposed to avoid bias by formally accounting for competing risks in the analyses of the event of interest. Through simulation, we illustrate that Gray's test should be used in lieu of the logrank test for nonparametric hypothesis testing. We also compare the two most popular models for semiparametric modelling: the cause‐specific hazards (CSH) model and Fine‐Gray (F‐G) model. We explain how to interpret estimates obtained from each model and identify conditions under which the estimates of the hazard ratio and subhazard ratio differ numerically. Finally, we evaluate several model diagnostic methods with respect to their sensitivity to detect lack of fit when the CSH model holds, but the F‐G model is misspecified and vice versa. Our results illustrate that adequacy of model fit can strongly impact the validity of statistical inference. We recommend analysts incorporate a model diagnostic procedure and contingency to explore other appropriate models when designing trials in which competing risks are anticipated.

20.