Similar Documents
1.
Summary: In this paper, we present results of the estimation of a two-wave panel wage equation based on completely observed units and on a multiply imputed data set. In addition to the survey information, reliable income data are available from the register. These external data are used to assess the reliability of wage regressions that suffer from item nonresponse. The findings reveal marked differences between the complete case analyses and both versions of the multiple imputation analyses. We argue that the results based on the multiply imputed data sets are more reliable than those based on the complete case analysis. * We would like to thank Statistics Finland for providing the data. We are also very grateful to Susanna Sandström and Marjo Pyy-Martikainen for their helpful advice on using the Finnish data. Helpful comments from Joachim Winter and participants of the Workshop on Item Nonresponse and Data Quality in Large Social Surveys, Basel, October 2003, on an earlier version of the paper are gratefully acknowledged. Further, we would like to thank three anonymous referees and the editor for helpful comments and suggestions.
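As a rough illustration of the multiple-imputation side of this comparison, the sketch below applies Rubin's combining rules to coefficients estimated on m imputed data sets. The numbers, the coefficient, and the number of imputations are invented for illustration and are not taken from the study.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine a coefficient estimated on m multiply imputed data sets
    (Rubin's rules): pooled estimate, total variance, and the share of the
    total variance attributable to missing data."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()               # pooled point estimate
    u_bar = variances.mean()               # average within-imputation variance
    b = estimates.var(ddof=1)              # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b        # total variance
    lam = (1.0 + 1.0 / m) * b / t          # variance share due to missingness
    return q_bar, t, lam

# Hypothetical schooling coefficients from five imputed data sets (made-up numbers).
print(rubin_combine([0.071, 0.068, 0.075, 0.070, 0.073],
                    [0.0004, 0.0004, 0.0005, 0.0004, 0.0004]))
```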

2.
Summary: We describe depth-based graphical displays that show the interdependence of multivariate distributions. The plots involve one-dimensional curves or bivariate scatterplots, so they are easier to interpret than correlation matrices. The correlation curve, modelled on the scale curve of Liu et al. (1999), compares the volume of the observed central regions with the volume under independence. The correlation DD-plot is the scatterplot of depth values under a reference distribution against depth values under independence. The area of the plot gives a measure of distance from independence. Correlation curve and DD-plot require an independence model as a baseline: besides classical parametric specifications, a nonparametric estimator, derived from the randomization principle, is used. Combining data depth and the notion of quadrant dependence, quadrant correlation trajectories are obtained which allow simultaneous representation of subsets of variables. The properties of the plots for the multivariate normal distribution are investigated. Several real-data examples are presented. *This work was completed with the support of Ca' Foscari University.
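A minimal sketch of the DD-plot idea, using Mahalanobis depth as a stand-in for whatever depth function one prefers; the bivariate normal sample, the depth choice, and the crude summary statistic at the end are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

def mahalanobis_depth(x, mean, cov):
    """Mahalanobis depth: 1 / (1 + squared Mahalanobis distance to the centre)."""
    diff = x - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(5)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=500)

mean, cov = x.mean(axis=0), np.cov(x, rowvar=False)
cov_indep = np.diag(np.diag(cov))            # same marginal scales, zero correlation

# Depth of each observation under the fitted joint model vs. under independence.
# A correlation DD-plot scatters these two depth values against each other; how far
# the cloud departs from the diagonal reflects the distance from independence
# (summarized here very crudely by a mean absolute gap).
d_joint = mahalanobis_depth(x, mean, cov)
d_indep = mahalanobis_depth(x, mean, cov_indep)
print(round(float(np.mean(np.abs(d_joint - d_indep))), 3))
```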

3.
Summary: The next German census will be an Administrative Record Census. Data about persons from several administrative registers will be merged. Object identification has to be applied, since no unique identification number exists in the registers. We present a two-step procedure. We briefly discuss questions such as the correctness and completeness of the Administrative Record Census. Then we focus on the object identification problem, which can be perceived as a special classification problem. Pairs of records are to be classified as matched or not matched. To achieve computational efficiency, a preselection technique for pairs is applied. Our approach is illustrated with a database containing a large set of consumer addresses. *This work was partially supported by the Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316). The authors thank Michael Fürnrohr for previewing the paper. We would also like to thank an anonymous reviewer for helpful comments.
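The two-step structure (pair preselection, then match/non-match classification) can be sketched as follows; the register fields, the blocking key, the string-similarity score, and the threshold are all invented for illustration and are not the procedure used for the Administrative Record Census.

```python
import itertools
from difflib import SequenceMatcher

# Toy records from two registers; field names and values are illustrative only.
reg_a = [{"id": 1, "name": "Anna Schmidt",  "zip": "10115"},
         {"id": 2, "name": "Bernd Mueller", "zip": "80331"}]
reg_b = [{"id": 7, "name": "Anna Schmitt",  "zip": "10115"},
         {"id": 8, "name": "Carla Weber",   "zip": "80331"}]

def similarity(a, b):
    """Crude name similarity in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

# Step 1 (preselection): only pairs sharing a blocking key (here: zip code) are
# generated, which keeps the number of candidate pairs computationally manageable.
candidates = [(a, b) for a, b in itertools.product(reg_a, reg_b) if a["zip"] == b["zip"]]

# Step 2 (classification): each candidate pair is classified as matched / not matched;
# here by a simple similarity threshold instead of a trained classifier.
for a, b in candidates:
    s = similarity(a, b)
    print(a["id"], b["id"], round(s, 2), "match" if s > 0.85 else "non-match")
```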

4.
Summary: Wald statistics in generalized linear models are asymptotically χ² distributed. The asymptotic chi-squared law of the corresponding quadratic form shows disadvantages with respect to the approximation of the finite-sample distribution. It is shown by means of a comprehensive simulation study that improvements can be achieved by applying simple finite-sample size approximations to the distribution of the quadratic form in generalized linear models. These approximations are based on a χ² distribution with estimated degrees of freedom that generalizes an approach by Patnaik and Pearson. Simulation studies confirm that the nominal level is maintained with higher accuracy than with the standard Wald statistic.
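The moment-matching idea behind such approximations can be illustrated with a generic Patnaik-type calculation: match the first two moments of the quadratic form to a scaled chi-square with an estimated degrees-of-freedom parameter. The toy "Wald" statistic below is simulated directly rather than obtained from a fitted GLM, and the specific estimator used in the paper may differ.

```python
import numpy as np
from scipy import stats

def patnaik_critical_value(w_samples, alpha=0.05):
    """Patnaik-type two-moment approximation: match mean and variance of the
    quadratic form to a scaled chi-square c * chi2_nu and return the
    corresponding alpha-level critical value together with the estimated nu."""
    m = np.mean(w_samples)
    v = np.var(w_samples, ddof=1)
    c = v / (2.0 * m)             # scale factor
    nu = 2.0 * m ** 2 / v         # estimated degrees of freedom
    return c * stats.chi2.ppf(1.0 - alpha, df=nu), nu

rng = np.random.default_rng(2)
# Toy finite-sample behaviour: the statistic is a weighted sum of chi-squares
# instead of an exact chi-square with 2 degrees of freedom.
w = 1.4 * rng.chisquare(1, 50_000) + 0.6 * rng.chisquare(1, 50_000)

crit_naive = stats.chi2.ppf(0.95, df=2)
crit_patnaik, nu = patnaik_critical_value(w)
print("rejection rate with nominal chi2_2 critical value:", np.mean(w > crit_naive))
print(f"rejection rate with Patnaik critical value (nu={nu:.2f}):", np.mean(w > crit_patnaik))
```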

5.
Summary: Data depth is a concept that measures the centrality of a point in a given data cloud x_1, x_2, ..., x_n or in a multivariate distribution P^X on R^d. Every depth defines a family of so-called trimmed regions. The α-trimmed region is given by the set of points that have a depth of at least α. Data depth has been used to define multivariate measures of location and dispersion as well as multivariate dispersion orders. If the depth of a point can be represented as the minimum of the depths with respect to all unidimensional projections, we say that the depth satisfies the (weak) projection property. Many depths which have been proposed in the literature can be shown to satisfy the weak projection property. A depth is said to satisfy the strong projection property if for every α the unidimensional projection of the α-trimmed region equals the α-trimmed region of the projected distribution. After a short introduction to the general concept of data depth, we formally define the weak and the strong projection property and give necessary and sufficient criteria for the projection property to hold. We further show that the projection property facilitates the construction of depths from univariate trimmed regions. We discuss some of the depths proposed in the literature which possess the projection property and define a general class of projection depths, which are constructed from univariate trimmed regions by using the above method. Finally, algorithmic aspects of projection depths are discussed. We describe an algorithm which enables the approximate computation of depths that satisfy the projection property.
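The algorithmic remark at the end can be illustrated with a small sketch that approximates halfspace (Tukey) depth, a depth satisfying the weak projection property, by minimizing the univariate depth over randomly drawn directions. The number of directions and the normal sample are arbitrary choices, and the paper's own algorithm may proceed differently.

```python
import numpy as np

def approx_halfspace_depth(point, data, n_dirs=2_000, rng=None):
    """Approximate the halfspace (Tukey) depth of `point` in `data` by minimizing
    the univariate depth over random projection directions.  Because halfspace
    depth satisfies the projection property, the exact depth is the infimum over
    all directions; a finite random sample of directions gives an upper bound."""
    rng = rng or np.random.default_rng()
    d = data.shape[1]
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj_data = data @ dirs.T              # shape (n, n_dirs)
    proj_point = point @ dirs.T            # shape (n_dirs,)
    # Univariate halfspace depth of the projected point in each projected sample.
    below = (proj_data <= proj_point).mean(axis=0)
    above = (proj_data >= proj_point).mean(axis=0)
    return float(np.minimum(below, above).min())

rng = np.random.default_rng(3)
sample = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=1_000)
print(approx_halfspace_depth(np.array([0.0, 0.0]), sample, rng=rng))  # near 0.5
print(approx_halfspace_depth(np.array([3.0, 3.0]), sample, rng=rng))  # near 0
```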

6.
A traditional interpolation model is characterized by the choice of regularizer applied to the interpolant, and the choice of noise model. Typically, the regularizer has a single regularization constant α, and the noise model has a single parameter β. The ratio α/β alone is responsible for determining globally all these attributes of the interpolant: its complexity, flexibility, smoothness, characteristic scale length, and characteristic amplitude. We suggest that interpolation models should be able to capture more than just one flavour of simplicity and complexity. We describe Bayesian models in which the interpolant has a smoothness that varies spatially. We emphasize the importance, in practical implementation, of the concept of conditional convexity when designing models with many hyperparameters. We apply the new models to the interpolation of neuronal spike data and demonstrate a substantial improvement in generalization error.

7.
Summary: Panel data offer a unique opportunity to identify data that interviewers clearly faked by comparing data waves. In the German Socio-Economic Panel (SOEP), only 0.5 percent of all records of raw data have been detected as faked. These fakes are used here to analyze the potential impact of fakes on survey results. Our central finding is that the faked records have no impact on the means or the proportions. However, we show that there may be a serious bias in the estimation of correlations and regression coefficients. Except for one year (1998), the detected faked data have never been disseminated within the widely used SOEP study; the fakes are removed prior to data release. * We are grateful to participants in the Workshop on Item Nonresponse and Data Quality in Large Social Surveys for useful critique and comments, especially Rainer Schnell and our outstanding discussant Regina Riphahn. The usual disclaimer applies.

8.
A probabilistic expert system provides a graphical representation of a joint probability distribution which can be used to simplify and localize calculations. Jensen et al. (1990) introduced a flow-propagation algorithm for calculating marginal and conditional distributions in such a system. This paper analyses that algorithm in detail, and shows how it can be modified to perform other tasks, including maximization of the joint density and simultaneous fast retraction of evidence entered on several variables.

9.
Summary: In this study a concept for cumulating periodic household budget surveys within the frame of the project Official Statistics and Socio-Economic Questions is developed and put forward for discussion. We lay out the theoretical background and solve the central task of structural demographic weighting with a calibration approach on an information-theoretic basis. Building on the household budget surveys of the Federal Statistical Office (the Continuous Household Budget Surveys and the Income and Consumption Sample (EVS)), a practical concept for cumulating yearly household budget surveys is proposed. This achieves the goal of cumulating cross-sections into a comprehensive cumulated sample for deeply structured analyses. Simulation studies to evaluate the concept are to follow.
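One standard information-theoretic calibration method that fits this description is raking (iterative proportional fitting), sketched below on made-up demographic margins. This is only meant to illustrate the weighting/calibration step, not to reproduce the estimator developed in the study.

```python
import numpy as np

def rake(weights, groups, targets, n_iter=50):
    """Iterative proportional fitting (raking): adjust survey weights so that the
    weighted margins match known demographic targets.  Raking minimizes the
    Kullback-Leibler divergence from the starting weights, which is the
    information-theoretic calibration idea referred to above."""
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(n_iter):
        for g, t in zip(groups, targets):          # one pass over all margins
            for level, target_total in t.items():
                mask = (g == level)
                w[mask] *= target_total / w[mask].sum()
    return w

rng = np.random.default_rng(4)
sex = rng.choice(["m", "f"], size=1_000)
age = rng.choice(["<40", ">=40"], size=1_000, p=[0.6, 0.4])
w0 = np.ones(1_000)

# Hypothetical population margins the cumulated sample should reproduce.
w = rake(w0, [sex, age], [{"m": 490.0, "f": 510.0}, {"<40": 550.0, ">=40": 450.0}])
print(w[sex == "m"].sum(), w[age == "<40"].sum())   # approx. 490 and 550
```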

10.
We propose exploratory, easily implemented methods for diagnosing the appropriateness of an underlying copula model for bivariate failure time data, allowing censoring in either or both failure times. It is found that the proposed approach effectively distinguishes gamma from positive stable copula models when the sample is moderately large or the association is strong. Data from the Women's Health and Aging Study (WHAS, Guralnik et al., The Women's Health and Aging Study: Health and Social Characteristics of Older Women with Disability. National Institute on Aging: Bethesda, Maryland, 1995) are analyzed to demonstrate the proposed diagnostic methodology. The positive stable model gives a better overall fit to these data than the gamma frailty model, but it tends to underestimate association at the later time points. The finding is consistent with recent theory differentiating catastrophic from progressive disability onset in older adults. The proposed methods supply an interpretable quantity for copula diagnosis. We hope that they will usefully inform practitioners as to the reasonableness of their modeling choices.

11.
I present a new Markov chain sampling method appropriate for distributions with isolated modes. Like the recently developed method of simulated tempering, the tempered transition method uses a series of distributions that interpolate between the distribution of interest and a distribution for which sampling is easier. The new method has the advantage that it does not require approximate values for the normalizing constants of these distributions, which are needed for simulated tempering, and can be tedious to estimate. Simulated tempering performs a random walk along the series of distributions used. In contrast, the tempered transitions of the new method move systematically from the desired distribution, to the easily-sampled distribution, and back to the desired distribution. This systematic movement avoids the inefficiency of a random walk, an advantage that is unfortunately cancelled by an increase in the number of interpolating distributions required. Because of this, the sampling efficiency of the tempered transition method in simple problems is similar to that of simulated tempering. On more complex distributions, however, simulated tempering and tempered transitions may perform differently. Which is better depends on the ways in which the interpolating distributions are deceptive.

12.
A fast splitting procedure for classification trees
This paper provides a faster method to find the best split at each node when using the CART methodology. The predictability index τ is proposed as a splitting rule for growing the same classification tree as CART does when using the Gini index of heterogeneity as an impurity measure. A theorem is introduced to show a new property of the index τ: the τ for a given predictor has a value not lower than the τ for any split generated by the predictor. This property is used to make a substantial saving in the time required to generate a classification tree. Three simulation studies are presented in order to show the computational gain in terms of both the number of splits analysed at each node and the CPU time. The proposed splitting algorithm also proves computationally efficient on real data sets, as shown in an example.
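The stated property can be checked numerically with a small sketch: compute the predictability index τ of a (binned) predictor and the τ of its best binary split, taking τ here to mean the Goodman and Kruskal index built from Gini impurity. The data and binning are artificial, and the paper's fast search itself is not reproduced.

```python
import numpy as np

def gini(y):
    """Gini impurity of a vector of class labels (non-negative integers)."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def tau(x, y):
    """Predictability index tau(Y | X): proportional reduction in the Gini
    impurity of Y when predicting within the groups defined by the values of x."""
    total = gini(y)
    within = sum(np.mean(x == v) * gini(y[x == v]) for v in np.unique(x))
    return (total - within) / total

def best_split_tau(x, y):
    """tau of the best binary split 'x <= c' over all cut points of an ordered predictor."""
    return max(tau((x <= c).astype(int), y) for c in np.unique(x)[:-1])

rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=300)                                   # three classes
x = np.digitize(y + rng.normal(0.0, 1.0, size=300),
                bins=[-0.5, 0.8, 2.0])                              # ordered, binned predictor

# The predictor-level tau bounds the tau of every binary split generated from it,
# so weak predictors can be discarded before the expensive split search.
print("tau(predictor) :", round(tau(x, y), 4))
print("tau(best split):", round(best_split_tau(x, y), 4))
```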

13.
Let X, T, Y be random vectors such that the distribution of Y conditional on covariates partitioned into the vectors X = x and T = t is given by f(y; x, θ), where θ = (β, γ(t)). Here β is a parameter vector and γ(t) is a smooth, real-valued function of t. The joint distribution of X and T is assumed to be independent of β and γ. This semiparametric model is called conditionally parametric because the conditional distribution f(y; x, θ) of Y given X = x, T = t is parameterized by the finite-dimensional parameter θ = (β, γ(t)). Severini and Wong (1992, Annals of Statistics 20: 1768-1802) show how to estimate β and γ(·) using generalized profile likelihoods, and they also provide a review of the literature on generalized profile likelihoods. Under specified regularity conditions, they derive an asymptotically efficient estimator of β and a uniformly consistent estimator of γ(·). The purpose of this paper is to provide a short tutorial for this method of estimation under a likelihood-based model, reviewing results from Stein (1956, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, Berkeley, pp. 187-196), Severini (1987, Ph.D. Thesis, The University of Chicago, Department of Statistics, Chicago, Illinois), and Severini and Wong (op. cit.).

14.
Summary: This paper presents first results of the project Factual Anonymization of Business Microdata. The project aims at laying the groundwork for creating and providing scientific-use files of business data. For this, appropriate anonymization strategies and methods are developed. The anonymization procedures are judged by whether they can guarantee the factual anonymity of company and establishment data without unduly limiting their potential value for statistical analysis. As an example, the German cost structure survey in the manufacturing sector is considered. Both the impact of different anonymization methods on the analytical potential of the data and their effect on the re-identification risk are examined.

15.
When simulating a dynamical system, the computation is actually of a spatially discretized system, because finite machine arithmetic replaces the continuum state space. For chaotic dynamical systems, the discretized simulations often have collapsing effects, to a fixed point or to short cycles. Statistical properties of these phenomena can be modelled with random mappings with an absorbing centre. The model gives results which are very much in line with computational experiments. The effects are discussed with special reference to the family of mappings f_ℓ(x) = 1 - |1 - 2x|^ℓ, x ∈ [0, 1], 1 < ℓ < 2. Computer experiments show close agreement with predictions of the model.
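A minimal experiment in this spirit: iterate the map on a uniform grid (standing in for machine arithmetic) and record how long the orbit takes to fall into a cycle and how short that cycle is. The grid size, starting point, and exponent values are arbitrary, and the map family is the reconstruction given above.

```python
def collapse(ell: float, n_states: int = 10_000, x0: float = 0.123456):
    """Iterate the discretized map x -> round(f_ell(x) * N) / N with
    f_ell(x) = 1 - |1 - 2x|**ell and return (transient length, cycle length)."""
    x = round(x0 * n_states) / n_states
    first_seen = {}
    t = 0
    while x not in first_seen:
        first_seen[x] = t
        x = round((1.0 - abs(1.0 - 2.0 * x) ** ell) * n_states) / n_states
        t += 1
    return first_seen[x], t - first_seen[x]

# Collapse to a fixed point or a short cycle typically happens after far fewer
# than n_states iterations, in line with the random-mapping picture.
for ell in (1.2, 1.5, 1.8):
    transient, period = collapse(ell)
    print(f"ell={ell}: transient={transient}, period={period}")
```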

16.
When constructing uniform random numbers in [0, 1] from the output of a physical device, usually n independent and unbiased bits B_j are extracted and combined into the machine number Y = Σ_{j=1}^{n} B_j 2^{-j}. In order to reduce the number of data used to build one real number, we observe that for independent and exponentially distributed random variables X_n (which arise, for example, as waiting times between two consecutive impulses of a Geiger counter) the variable U_n := X_{2n-1}/(X_{2n-1} + X_{2n}) is uniform in [0, 1]. In the practical application X_n can only be measured up to a given precision ε (in terms of the expectation of the X_n); it is shown that the distribution function obtained by calculating U_n from these measurements differs from the uniform by less than ε/2. We compare this deviation with the error resulting from the use of biased bits B_j with P{B_j = 1} = 1/2 + δ (where δ ∈ ]-1/2, 1/2[) in the construction of Y above. The influence of the bias is given by the estimate that in the p-total variation norm ||Q||_{TV_p} = (Σ_ω |Q({ω})|^p)^{1/p} (p ≥ 1) we have ||P^Y - P_0^Y||_{TV_p} ≤ (c_n · δ)^{1/p}, with constants c_n (depending on p) that converge as n → ∞. For the distribution function, ||F^Y - F_0^Y|| ≤ 2(1 - 2^{-n})|δ| holds.
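A small simulation of the ratio construction, and of the effect of finite measurement precision, is sketched below; the sample size, the precision ε, and the rounding convention are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential waiting times; the rate cancels in the ratio, so scale = 1 is fine.
x = rng.exponential(scale=1.0, size=200_000)

# U_n = X_{2n-1} / (X_{2n-1} + X_{2n}) from consecutive pairs should be uniform on [0, 1].
u = x[0::2] / (x[0::2] + x[1::2])
print(np.quantile(u, [0.1, 0.25, 0.5, 0.75, 0.9]).round(3))   # close to the nominal levels

# Finite measurement precision: record each X_n only on a grid of width eps
# (relative to E[X_n] = 1), rounding up so that measured values stay positive.
eps = 0.05
x_meas = np.ceil(x / eps) * eps
u_meas = x_meas[0::2] / (x_meas[0::2] + x_meas[1::2])
grid = np.linspace(0.01, 0.99, 99)
ecdf = np.searchsorted(np.sort(u_meas), grid, side="right") / u_meas.size
print(float(np.max(np.abs(ecdf - grid))))    # deviation from uniform, roughly eps/2 at most
```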

17.
In a regression or classification setting where we wish to predict Y from x_1, x_2, ..., x_p, we suppose that an additional set of coaching variables z_1, z_2, ..., z_m are available in our training sample. These might be variables that are difficult to measure, and they will not be available when we predict Y from x_1, x_2, ..., x_p in the future. We consider two methods of making use of the coaching variables in order to improve the prediction of Y from x_1, x_2, ..., x_p. The relative merits of these approaches are discussed and compared in a number of examples.

18.
We present a new test for the presence of a normal mixture distribution, based on the posterior Bayes factor of Aitkin (1991). The new test has slightly lower power than the likelihood ratio test. It does not require the computation of the MLEs of the parameters or a search for multiple maxima, but requires computations based on classification likelihood assignments of observations to mixture components.

19.
We investigate the properties of several statistical tests for comparing treatment groups with respect to multivariate survival data, based on the marginal analysis approach introduced by Wei, Lin and Weissfeld [Regression analysis of multivariate incomplete failure time data by modelling marginal distributions, JASA, vol. 84, pp. 1065-1073]. We consider two types of directional tests, based on a constrained maximization and on linear combinations of the unconstrained maximizer of the working likelihood function, and the omnibus test arising from the same working likelihood. The directional tests are members of a larger class of tests, from which an asymptotically optimal test can be found. We compare the asymptotic powers of the tests under general contiguous alternatives for a variety of settings, and also consider the choice of the number of survival times to include in the multivariate outcome. We illustrate the results with simulations and with the results from a clinical trial examining recurring opportunistic infections in persons with HIV.

20.
Comparison of observed mortality with known, background, or standard rates has taken place for several hundred years. With the development of regression models for survival data, an increasing interest has arisen in individualizing the standardization using covariates of each individual. Also, account sometimes needs to be taken of random variation in the standard group. Emphasizing uses of the Cox regression model, this paper surveys a number of critical choices and pitfalls in this area. The methods are illustrated by comparing survival of liver patients after transplantation with survival after conservative treatment.
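The basic ingredient of such comparisons, an expected number of deaths obtained by applying individualized standard rates to each patient's follow-up, can be sketched as below; the follow-up times and reference hazards are invented numbers, and the Cox-model machinery discussed in the paper is not reproduced.

```python
import numpy as np

# Hypothetical patients: follow-up time in years, death indicator, and an
# individualized standard (reference-population) hazard per year; all made up.
time       = np.array([2.0, 5.5, 1.2, 8.0, 3.3])
died       = np.array([1,   0,   1,   0,   1  ])
std_hazard = np.array([0.03, 0.02, 0.05, 0.02, 0.04])

observed = died.sum()
expected = (std_hazard * time).sum()      # expected deaths under the standard rates
print(f"SMR = {observed}/{expected:.2f} = {observed / expected:.2f}")
```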
