Similar Documents
1.
In this article we propose a novel framework for modelling non-stationary multivariate lattice processes. Our approach extends the locally stationary wavelet paradigm to the multivariate two-dimensional setting. The framework permits the estimation of a spatially localised spectrum within a channel of interest and, more importantly, a localised cross-covariance that describes the localised coherence between channels. Associated estimation theory is established, demonstrating that this multivariate spatial framework is properly defined and has suitable convergence properties. We also demonstrate how this model-based approach can be used successfully to classify a range of colour textures provided by an industrial collaborator, yielding superior results compared with current state-of-the-art statistical image processing methods.
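A minimal sketch of the basic ingredient of locally stationary wavelet estimation, the raw wavelet periodogram of one channel, computed with the non-decimated 2-D wavelet transform (assuming PyWavelets; the paper's estimator additionally smooths and bias-corrects these quantities and forms cross-periodograms between channels):

```python
import numpy as np
import pywt  # PyWavelets

def raw_wavelet_periodogram(channel, wavelet="haar", levels=3):
    """Squared non-decimated 2-D wavelet coefficients per scale and
    direction: a raw, spatially localised spectral estimate at each pixel.
    The image sides must be divisible by 2**levels for pywt.swt2."""
    coeffs = pywt.swt2(np.asarray(channel, dtype=float), wavelet, level=levels)
    return [{d: detail ** 2 for d, detail in zip("HVD", details)}
            for _approx, details in coeffs]
```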

2.
This paper describes an application of small area estimation (SAE) techniques under area-level spatial random effect models when only area-level (district or otherwise aggregated) data are available. In particular, the SAE approach is applied to produce district-level model-based estimates of paddy crop yield in the state of Uttar Pradesh, India, using data on crop-cutting experiments supervised under the Improvement of Crop Statistics scheme together with secondary data from the Population Census. Diagnostic measures are used to examine the model assumptions as well as the reliability and validity of the resulting model-based small area estimates. The results show a considerable gain in precision from applying SAE, and the model-based estimates that exploit spatial information are more efficient than those that ignore it; both sets of model-based estimates are more efficient than the direct survey estimates. Many districts have no survey data, so direct survey estimates cannot be produced for them; the model-based SAE estimates remain reliable for such districts and will provide invaluable information to policy analysts and decision-makers.
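As a concrete illustration, here is a minimal sketch of the non-spatial special case, the basic Fay-Herriot area-level model with known variance components; the paper's spatial model replaces the independent area effects with spatially correlated ones, and in practice the model variance would be estimated, e.g., by REML:

```python
import numpy as np

def fay_herriot_eblup(y, X, psi, sigma2_u):
    """EBLUP under the area-level model y_i = x_i'beta + u_i + e_i, with
    known sampling variances psi_i and model variance sigma2_u.
    y: direct survey estimates; X: area-level auxiliary data."""
    V = np.diag(sigma2_u + psi)                  # total variance of direct estimates
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS estimate
    gamma = sigma2_u / (sigma2_u + psi)          # shrinkage weights
    return gamma * y + (1 - gamma) * (X @ beta)  # model-based small area estimates
```

Areas with noisy direct estimates (large psi_i) get small gamma_i and are shrunk towards the synthetic regression prediction, which is what produces the precision gains reported in the paper.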

3.
A novel family of mixture models is introduced based on modified t-factor analyzers. Modified factor analyzers were recently introduced in the Gaussian context, and our work presents a more flexible and robust alternative. We introduce a family of mixtures of modified t-factor analyzers that uses this generalized version of the factor analysis covariance structure, and apply it within three paradigms: model-based clustering, model-based classification, and model-based discriminant analysis. In addition, we apply the recently published Gaussian analogue of this family under the model-based classification and discriminant analysis paradigms for the first time. Parameter estimation is carried out within the alternating expectation-conditional maximization (AECM) framework, and the Bayesian information criterion is used for model selection. Two real data sets are used to compare our approach with other popular model-based approaches; in these comparisons, the chosen mixture of modified t-factor analyzers performs favourably. We conclude with a summary and suggestions for future work.
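The model-selection step is generic and easy to sketch; a minimal version, assuming each candidate model has already been fitted by AECM (the fitting itself is not shown):

```python
import numpy as np

def pick_by_bic(candidates, n):
    """Select among fitted models by BIC = -2*loglik + n_params*log(n).
    candidates: list of (loglik, n_params, label) tuples; n: sample size.
    Returns the (BIC, label) pair with the smallest BIC."""
    bics = [(-2.0 * ll + k * np.log(n), label) for ll, k, label in candidates]
    return min(bics)
```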

4.
Many of the available methods for estimating small-area parameters are model-based approaches in which auxiliary variables are used to predict the variable of interest. For models that are nonlinear, prediction is not straightforward. MacGibbon and Tomberlin, and Farrell, MacGibbon, and Tomberlin, have proposed approaches that require microdata for all individuals in a small area. In this article we develop a method, based on a second-order Taylor-series expansion, that obtains model-based predictions while requiring only local-area summary statistics for both continuous and categorical auxiliary variables. The methodology is evaluated using data based on a U.S. Census.
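A hedged sketch of the idea for a logistic small-area model: approximate the area mean of the nonlinear predictor using only the local mean vector and covariance matrix of the auxiliary variables. The logistic form and the function name are illustrative assumptions; the article treats general nonlinear models.

```python
import numpy as np
from scipy.special import expit

def taylor_mean_proportion(beta, mu, Sigma):
    """Approximate E[expit(x'beta)] for an area from summary statistics
    (mean mu, covariance Sigma of the auxiliary variables) via a
    second-order Taylor expansion about mu."""
    eta = mu @ beta
    p = expit(eta)
    d2 = p * (1 - p) * (1 - 2 * p)        # second derivative of expit at eta
    return p + 0.5 * d2 * (beta @ Sigma @ beta)
```

The first-order term alone would just plug the area means into the model; the curvature correction is what lets summary statistics stand in for the microdata.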

5.
Among the many tools suited to detecting local clusters in group-level data, the Kulldorff-Nagarwalla spatial scan statistic has gained wide popularity (Kulldorff and Nagarwalla in Stat Med 14(8):799-810, 1995). The underlying assumptions needed to make statistical inference feasible are quite strong, as counts in spatial units are assumed to be independent Poisson-distributed random variables. Unfortunately, outcomes in spatial units are often not independent of each other, and risk estimates of areas that are close together tend to be positively correlated because they share a number of spatially varying characteristics. We therefore introduce a Bayesian model-based algorithm for cluster detection in the presence of spatially autocorrelated relative risks. Our approach is made possible by recently developed numerical methods based on integrated nested Laplace approximation, with which very accurate approximations of posterior marginals can be computed directly in a short time (Rue et al. in JRSS B 71(2):319-392, 2009). Simulated data and a case study show that the performance of our method is at least comparable to that of the Kulldorff-Nagarwalla statistic.
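For reference, a minimal sketch of the Poisson likelihood-ratio statistic underlying the Kulldorff-Nagarwalla scan method that the Bayesian procedure is benchmarked against; in practice it is maximised over candidate circular zones and its p-value is obtained by Monte Carlo:

```python
from scipy.special import xlogy  # xlogy(0, 0) = 0, avoiding NaN at the edges

def poisson_scan_llr(c, e, C):
    """Log likelihood ratio for a candidate zone with observed count c and
    expected count e, out of C total cases under the null of constant risk."""
    if c / e <= (C - c) / (C - e):   # only zones with elevated risk score
        return 0.0
    return xlogy(c, c / e) + xlogy(C - c, (C - c) / (C - e))
```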

6.
Multilevel modelling of the geographical distributions of diseases
Multilevel modelling is applied to problems arising in the analysis of spatially distributed health data. We use three applications to demonstrate its use in this area. The first concerns small-area all-cause mortality rates from Glasgow, where spatial autocorrelation between residuals is examined. The second analyses prostate cancer cases in Scottish counties, using a range of models to examine whether incidence is higher in more rural areas. The third develops a multiple-cause model in which deaths from cancer and cardiovascular disease in Glasgow are examined simultaneously in a spatial model. We discuss some of the issues surrounding the use of complex spatial models and the potential for future developments.

7.
National statistical agencies and other data custodians collect and hold vast amounts of survey and census data, containing information vital for research and policy analysis. However, allowing analysis of these data while protecting respondent confidentiality has proved challenging. In this paper we focus on the remote analysis approach, under which a confidential dataset is held in a secure environment under the direct control of the data custodian agency. A computer system within the secure environment accepts a query from an analyst, runs it on the data, and returns the results; the analyst has no direct access to the data and cannot view any microdata records. We focus in particular on fitting linear regression models to confidential data in the presence of outliers and influential points, such as are often present in business data, and propose a new method for protecting confidentiality in linear regression via a remote analysis system that provides additional protection for outliers and influential points. The method described in this paper was designed for the prototype DataAnalyser system developed by the Australian Bureau of Statistics, but it would be suitable for similar remote analysis systems.
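A toy sketch of the general idea of guarding regression output against influential points inside a remote analysis server; this is not the actual DataAnalyser algorithm, and the threshold rule and refit strategy here are illustrative assumptions:

```python
import numpy as np
import statsmodels.api as sm

def remote_ols(y, X, cooks_threshold=None):
    """Server-side fit: fit OLS, drop observations whose Cook's distance
    exceeds a threshold, refit, and return only the coefficient table.
    y, X are NumPy arrays living inside the secure environment."""
    X = sm.add_constant(X)
    fit = sm.OLS(y, X).fit()
    d = fit.get_influence().cooks_distance[0]
    if cooks_threshold is None:
        cooks_threshold = 4.0 / len(y)            # common rule of thumb
    keep = d < cooks_threshold
    safe_fit = sm.OLS(y[keep], X[keep]).fit()     # no single outlier drives the output
    return safe_fit.params, safe_fit.bse          # only summaries leave the enclave
```

The point of such a scheme is that the returned coefficients no longer reveal the precise values of extreme records, which is where disclosure risk concentrates in business data.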

8.
Building new and flexible classes of nonseparable spatio-temporal covariances and variograms has become a key research topic in recent years. The goal of this paper is to present an up-to-date overview of recent spatio-temporal covariance models that take the problem of spatial anisotropy into account. The resulting structures are shown to have interesting mathematical properties together with considerable applicability. In particular, we focus on the problem of modelling anisotropy through isotropy within components. We present the Bernstein class, and a generalisation of Gneiting's (2002a) approach for obtaining new classes of spatially anisotropic space-time covariance functions. We also discuss some methods for building covariance functions that attain negative values. Finally, we present several differentiation and integration operators acting on particular space-time covariance classes.
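As one concrete member of these classes, a minimal sketch of Gneiting's (2002) nonseparable space-time covariance with standard parametric choices for its two component functions; the parameter values are illustrative, and the paper's anisotropic variants rescale the spatial lag componentwise:

```python
import numpy as np

def gneiting_cov(h, u, sigma2=1.0, a=1.0, alpha=1.0, beta=1.0,
                 c=1.0, gamma=1.0, d=2):
    """C(h, u) = sigma2 / psi(u^2)^(d/2) * phi(||h||^2 / psi(u^2)),
    with psi(r) = (1 + a*r^alpha)^beta and phi(r) = exp(-c*r^gamma).
    Validity requires alpha, beta, gamma in (0, 1]; h is spatial distance,
    u temporal lag, d the spatial dimension."""
    psi = (1.0 + a * np.abs(u) ** (2 * alpha)) ** beta
    r = np.asarray(h, dtype=float) ** 2 / psi
    return sigma2 / psi ** (d / 2.0) * np.exp(-c * r ** gamma)
```

Setting beta = 0 would make psi constant and the covariance separable, so beta acts as the space-time interaction parameter.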

9.
Before releasing survey data, statistical agencies usually perturb the original data to keep each survey unit's information confidential. One significant concern in releasing survey microdata is identity disclosure, which occurs when an intruder correctly identifies the records of a survey unit by matching the values of some key (or pseudo-identifying) variables. We examine a recently developed post-randomization method for strict control of identification risks in releasing survey microdata. While that procedure preserves the observed frequencies, and hence statistical estimates, well under simple random sampling, we show that in general surveys it may induce considerable bias in commonly used survey-weighted estimators. We propose a modified procedure that better preserves weighted estimates. The procedure is illustrated and empirically assessed with an application to a publicly available US Census Bureau data set.
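A minimal sketch of generic post-randomization (PRAM) for one categorical key variable; choosing the transition matrix so that identification risk is strictly controlled, and adjusting it for survey weights, is the substance of the procedures discussed in the paper:

```python
import numpy as np

def pram(codes, P, rng=None):
    """Post-randomize a categorical key variable: each record's category k
    is replaced by a draw from row k of the transition matrix P.
    codes: integer category codes in 0..K-1; P: K x K row-stochastic matrix.
    If P is invariant (lambda' P = lambda' for the category proportions),
    expected frequencies are preserved under simple random sampling."""
    rng = np.random.default_rng(rng)
    K = P.shape[0]
    return np.array([rng.choice(K, p=P[k]) for k in codes])
```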

10.
Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably, some establishments' reported data violate the edit rules, and statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to release establishment microdata as unrestricted-access files. When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data, along with assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features of the post-editing confidential microdata and carry low risks for the various attacks.
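A toy sketch of the kind of agency-specified edit rules the integrated edit-imputation step must enforce; both rules here (one balance equation, one linear inequality) are invented for illustration:

```python
def violates_edits(record, tol=1e-6):
    """Check an establishment record against two illustrative edit rules:
    a balance equation (reported components must sum to the total) and a
    linear inequality (wages cannot exceed total costs).
    record: dict with keys 'total', 'components' (list), 'wages'."""
    balance_fail = abs(record["total"] - sum(record["components"])) > tol
    inequality_fail = record["wages"] > record["total"]
    return balance_fail or inequality_fail
```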

11.
In this paper we discuss a new theoretical basis for perturbation methods. In developing it, we define ideal measures of data utility and disclosure risk. Maximum data utility is achieved when the statistical characteristics of the perturbed data are the same as those of the original data; disclosure risk is minimized if providing users with microdata access yields no additional information. We show that when the perturbed values of the confidential variables are generated as independent realizations from the distribution of the confidential variables conditioned on the non-confidential variables, they satisfy both the data utility and the disclosure risk requirements. We also discuss the relationship between this theoretical basis and some commonly used methods for generating perturbed values of confidential numerical variables.
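A hedged sketch of the key construction under an assumed linear-normal conditional model: perturbed values are drawn independently from an estimate of the distribution of the confidential variable given the non-confidential ones. The paper's result concerns the conditional distribution itself; the regression estimate of it used here is one convenient choice.

```python
import numpy as np

def perturb_conditional(S, X, rng=None):
    """Replace confidential variable S by independent draws from an
    estimated conditional distribution of S given non-confidential X,
    here a fitted linear-normal model. S: 1-D array; X: 1-D or 2-D array."""
    rng = np.random.default_rng(rng)
    Xc = np.column_stack([np.ones(len(S)), X])
    beta, *_ = np.linalg.lstsq(Xc, S, rcond=None)
    resid = S - Xc @ beta
    sigma = resid.std(ddof=Xc.shape[1])
    # Draws depend on S only through the fitted conditional model:
    return Xc @ beta + rng.normal(0.0, sigma, size=len(S))
```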

12.
When tables are generated from a data file, their release should not reveal overly detailed information about individual respondents. Disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level (by cell suppression or cell perturbation), but this may create inconsistencies among other tables based on the same data file. Alternatively, disclosure control methods can be applied at the microdata level, but these may change the data permanently and do not account for specific table properties. These problems can be circumvented by assigning a single, fixed weight factor to each respondent/record in the microdata file. Normally this weight factor equals 1 for each record and is not explicitly incorporated in the microdata file. Upon tabulation, each contribution of a respondent is multiplied by the respondent's weight factor. This approach is called Source Data Perturbation (SDP) because the data are perturbed at the microdata level, not at the table level; note, however, that the original microdata are not changed: only a weight variable is added. The weight factors can be chosen in accordance with the SDC paradigm, i.e. such that the tables generated from the microdata are safe and the information loss is minimized; the paper indicates how this can be done. Moreover, the SDP approach is shown to be very suitable for use in data warehouses, as the weights can conveniently be placed in the fact tables. The data can then still be accessed, sliced, and diced up to a certain level of detail, and tables generated from the data warehouse are mutually consistent and safe.
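A minimal sketch of SDP-weighted tabulation once weight factors have been assigned; the weights shown are arbitrary placeholders, since choosing them so that all tables are safe with minimal information loss is the optimisation described in the paper:

```python
import pandas as pd

# Microdata with an added SDP weight column (1 for most records,
# slightly perturbed for sensitive contributors -- values illustrative).
df = pd.DataFrame({"region":     ["A", "A", "B", "B"],
                   "turnover":   [100.0, 250.0, 80.0, 400.0],
                   "sdp_weight": [1.0, 0.9, 1.0, 1.1]})

# Every table is built from the same weighted contributions, so all
# tables derived from the file are mutually consistent.
df["weighted"] = df["turnover"] * df["sdp_weight"]
table = df.groupby("region")["weighted"].sum()
print(table)
```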

13.
In spatial statistics, models are often constructed based on common but possibly restrictive assumptions about the underlying spatial process, including Gaussianity as well as stationarity and isotropy. However, these assumptions are frequently violated in applied problems. In order to handle skewness and non-homogeneity (i.e., non-stationarity and anisotropy) simultaneously, we develop a fixed rank kriging model that uses a skew-normal distribution for its non-spatial latent variables. Our approach to spatial modelling is easy to implement and provides great flexibility in adapting to skewed, large datasets with heterogeneous correlation structures. We adopt a Bayesian framework for our analysis and describe a simple MCMC algorithm for sampling from the posterior distribution of the model parameters and performing spatial prediction. Through a simulation study, we demonstrate that the proposed model can detect departures from normality, and for illustration we analyze a synthetic dataset of CO₂ measurements. Finally, to deal with multivariate spatial data exhibiting some degree of skewness, a multivariate extension of the model is also provided.
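A minimal sketch of the skew-normal ingredient via Azzalini's stochastic representation; in the model, variates like these replace the Gaussian coefficients of the fixed rank kriging basis functions (the function name and parameterisation are illustrative):

```python
import numpy as np

def skew_normal(n, alpha, rng=None):
    """Draw n standard skew-normal variates with shape parameter alpha
    using z = delta*|w0| + sqrt(1 - delta^2)*w1, where w0, w1 are
    independent standard normals and delta = alpha / sqrt(1 + alpha^2).
    alpha = 0 recovers the Gaussian case."""
    rng = np.random.default_rng(rng)
    delta = alpha / np.sqrt(1.0 + alpha ** 2)
    w0, w1 = rng.normal(size=n), rng.normal(size=n)
    return delta * np.abs(w0) + np.sqrt(1.0 - delta ** 2) * w1
```

The half-normal component |w0| also gives the data augmentation that makes the MCMC sampler for such models simple: conditional on w0, the model is Gaussian again.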

14.
Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, it may be a more realistic measure of risk than two measures currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, given the contrasting properties of the two 'similar' established measures, and as a result the measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent; we also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain, showing that the theoretical results on the properties of the point predictor of the risk measure and its variance estimator hold to a good approximation for these data.
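A hedged empirical sketch of a measure of this kind when a population file is available for evaluation: an intruder who targets a population unit and finds a unique matching sample record is correct with probability 1/F for a key value of population frequency F, so the aggregate measure can be evaluated over the sample-unique keys. This is an illustrative evaluation-side computation, not the paper's sample-based predictor, whose exact form may differ.

```python
import pandas as pd

def unique_match_risk(sample_keys, pop_keys):
    """Pr(correct | unique match), evaluated with the population known:
    number of sample-unique key values divided by the total population
    frequency of those key values. Assumes sample keys occur in pop_keys."""
    f = pd.Series(sample_keys).value_counts()   # sample key frequencies
    F = pd.Series(pop_keys).value_counts()      # population key frequencies
    uniques = f[f == 1].index                   # sample-unique key values
    return len(uniques) / float(F[uniques].sum())
```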

15.
In this paper we address the evaluation of measurement process quality, focusing on evaluation procedures based on the numerical outcomes of measuring a single physical quantity. We challenge the approach in which the 'exact' value of the observed quantity is compared with the error interval obtained from the measurements under test, and instead propose a procedure in which reference measurements are used as a 'gold standard'. To this purpose, we designed a specific t-test procedure, explained here. We also describe and discuss a numerical simulation experiment demonstrating the behaviour of our procedure.
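A minimal sketch of the flavour of comparison involved, using Welch's two-sample t-test of the process under test against gold-standard reference measurements; the paper designs its own specific t-test procedure, which may differ in detail:

```python
from scipy import stats

def compare_to_reference(test_meas, ref_meas, alpha=0.05):
    """Welch's t-test for a difference in mean between the measurement
    process under test and reference ('gold standard') measurements of
    the same quantity. Returns the statistic, p-value, and a verdict."""
    t, p = stats.ttest_ind(test_meas, ref_meas, equal_var=False)
    return {"t": t, "p": p, "compatible_with_reference": p >= alpha}
```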

16.
A Bayesian mixture model for differential gene expression
We propose model-based inference for differential gene expression, using a nonparametric Bayesian probability model for the distribution of gene intensities under various conditions. The probability model is a mixture of normal distributions, and the resulting inference is similar to a popular empirical Bayes approach used for the same problem; fully model-based inference, however, mitigates some of the inherent limitations of the empirical Bayes method. We argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture-of-normal models. The approach is motivated by a microarray experiment carried out to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples, and we additionally carried out a small simulation study to verify the proposed methods. In the motivating case studies we show how the nonparametric Bayes approach facilitates the evaluation of posterior expected false discovery rates, and how inference can proceed even in the absence of a null sample of known non-differentially expressed scores. This highlights the difference from alternative empirical Bayes approaches that are based on plug-in estimates.
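A generic sketch of the posterior expected FDR computation the case studies rely on: flag genes whose posterior probability of differential expression exceeds a cut-off, and average the posterior probabilities of being null over the flagged set. Here the probabilities are taken as given; in the paper they come from the nonparametric mixture-of-normals fit.

```python
import numpy as np

def posterior_expected_fdr(p_diff, threshold):
    """Posterior expected FDR when flagging genes with posterior
    probability of differential expression p_diff > threshold:
    the mean of (1 - p_diff) over the flagged genes."""
    p_diff = np.asarray(p_diff)
    flagged = p_diff > threshold
    if not flagged.any():
        return 0.0
    return float(np.mean(1.0 - p_diff[flagged]))
```

Scanning the threshold then yields the largest gene list whose posterior expected FDR stays below a desired level.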

17.
A large body of literature exists on techniques for selecting the important variables in linear regression analysis. Many of these techniques are ad hoc in nature and have not been studied from a theoretical viewpoint. In this paper we discuss some of the more commonly used techniques and propose a selection procedure based on the statistical selection and ranking approach. The procedure is easy to compute and apply, and depends on the goodness of fit of the model and the total error associated with it.
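As a generic illustration of ranking candidate subsets by a criterion that combines goodness of fit with total error, a sketch using Mallows' C_p; this is a stand-in for exposition, not the paper's selection-and-ranking procedure:

```python
import itertools
import numpy as np

def rank_subsets(y, X, names, top=5):
    """Rank all variable subsets of X by Mallows' C_p = RSS_p/s^2 - n + 2p,
    where s^2 is the error variance estimate from the full model.
    Returns the top-ranked (C_p, variable names) pairs."""
    n, p_full = X.shape
    rss_full = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    s2 = rss_full / (n - p_full)
    out = []
    for k in range(1, p_full + 1):
        for idx in itertools.combinations(range(p_full), k):
            Xs = X[:, idx]
            rss = np.sum((y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]) ** 2)
            out.append((rss / s2 - n + 2 * len(idx), [names[i] for i in idx]))
    return sorted(out)[:top]
```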

18.
Ecological studies, which are common in various disciplines including epidemiology, are based on characteristics of groups of individuals. It is of great interest to epidemiologists to study the geographical variation of a disease while accounting for positive spatial dependence between neighbouring areas; however, the choice of scale for the spatial correlation requires careful attention. In view of the lack of studies in this area, we investigate the impact of differing definitions of geographical scale using a multilevel model. We propose a new approach, grid-based partitioning, and compare it with the popular census-region approach. Unexplained geographical variation is accounted for via area-specific unstructured random effects and spatially structured random effects specified as an intrinsic conditional autoregressive process. Using grid-based modelling of random effects in contrast to the census-region approach, we illustrate conditions under which improvements are observed in the estimation of the linear predictor, random effects, and parameters, and in the identification of the distribution of residual risk and aggregate risk in a study region. The study finds that grid-based modelling is a valuable approach for spatially sparse data, while the statistical-local-area-based and grid-based approaches perform equally well for spatially dense data.
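A minimal sketch of the grid-based partitioning step: assign point locations to square cells and build the rook-neighbour adjacency matrix that the intrinsic CAR random effects would use. The cell size plays the role of the geographical scale under study; function names are illustrative.

```python
import numpy as np

def grid_partition(x, y, cell_size):
    """Assign points (x, y) to square grid cells of side cell_size and
    build the rook adjacency matrix W over the occupied cells.
    Returns per-point cell labels and W."""
    ix = np.floor(np.asarray(x) / cell_size).astype(int)
    iy = np.floor(np.asarray(y) / cell_size).astype(int)
    cells = sorted(set(zip(ix, iy)))
    index = {c: i for i, c in enumerate(cells)}
    W = np.zeros((len(cells), len(cells)))
    for (cx, cy), i in index.items():
        for nb in [(cx + 1, cy), (cx, cy + 1)]:   # rook neighbours (east, north)
            if nb in index:
                W[i, index[nb]] = W[index[nb], i] = 1.0
    labels = np.array([index[c] for c in zip(ix, iy)])
    return labels, W
```

Varying cell_size and refitting the multilevel model is exactly the scale comparison the study performs against census-region boundaries.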

19.
"A model-based approach to the analysis of disease incidence around a fixed point is presented by considering the radial and directional effects to be expected from emissions from a putative source. In addition we present some score statistics which can be used to test for spatial effects." The methods discussed are applied to the analysis of bronchitis mortality around a reprocessing plant in Bonnybridge, Scotland.  相似文献   

20.
We set out IDR, a loglinear-model-based Moran's I test for Poisson count data that resembles the Moran's I residual test for Gaussian data. We evaluate its type I and type II error probabilities via simulations and demonstrate its utility via a case study. When population sizes are heterogeneous, IDR is effective in detecting local clusters through local association terms, with an acceptable type I error probability. When used in conjunction with local spatial association terms in loglinear models, IDR can also indicate the existence of a first-order global cluster that local spatial association terms can hardly remove; in this situation, IDR should not be applied directly for local cluster detection. In the case study of St. Louis homicides, we bridge loglinear-model methods for parameter estimation with exploratory data analysis, so that a uniform association term can be defined with spatially varying contributions among spatial neighbours. The method makes use of exploratory tools such as Moran's I scatter plots and residual plots to evaluate the magnitude of deviance residuals, and it is effective for modelling the shape, elevation, and magnitude of a local cluster in the model-based test.
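A minimal sketch in the spirit of IDR: fit a Poisson loglinear model, compute Moran's I on its deviance residuals with a spatial weight matrix W, and assess significance by permutation. The paper derives the test and its reference distribution within the loglinear framework rather than by permutation, so this is an illustrative stand-in.

```python
import numpy as np
import statsmodels.api as sm

def idr_style_test(counts, expected, W, n_perm=999, rng=None):
    """Moran's I on deviance residuals of an intercept-only Poisson
    loglinear model with log(expected) offset, with a permutation p-value.
    counts, expected: 1-D arrays; W: symmetric spatial weight matrix."""
    rng = np.random.default_rng(rng)
    fit = sm.GLM(counts, np.ones((len(counts), 1)),
                 family=sm.families.Poisson(),
                 offset=np.log(expected)).fit()
    r = fit.resid_deviance

    def moran(z):
        z = z - z.mean()
        return len(z) * (z @ W @ z) / (W.sum() * (z @ z))

    obs = moran(r)
    null = np.array([moran(rng.permutation(r)) for _ in range(n_perm)])
    p = (1 + np.sum(null >= obs)) / (n_perm + 1)
    return obs, p
```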
