Found 17 similar articles (search time: 15 ms)
1.
The integration of different data sources is a widely discussed topic among both researchers and official statistics agencies. Integrating data helps to contain the cost and time required by new data collections. Non-parametric micro Statistical Matching (SM) makes it possible to integrate ‘live’ data using only the observed information, potentially avoiding misspecification bias and reducing the computational effort. Despite these advantages, the assessment of integration goodness under this method is not robust. Moreover, several applications follow commonly accepted practices which recommend, e.g., using the biggest data set as the donor. We propose a validation strategy to assess integration goodness. We apply it to investigate these practices and to explore how different combinations of SM techniques and distance functions perform in terms of the reliability of the synthetic (complete) data set generated. The validation strategy takes advantage of the relations existing among the variables before and after the integration. The results show that the ‘the biggest, the best’ rule should no longer be considered mandatory. Indeed, integration goodness increases with the variability of the matching variables rather than with the dimensionality ratio between the recipient and the donor data set.
2.
3.
Jan Lasek Zoltán Szlávik Marek Gagolewski Sandjai Bhulai 《Journal of applied statistics》2016,43(7):1349-1368
In this paper, we study the efficacy of the official ranking for international football teams compiled by FIFA, the body governing football competition around the globe. We present strategies for improving a team's position in the ranking. By combining several statistical techniques, we derive an objective function in a decision problem of optimal scheduling of future matches. The presented results display how a team's position can be improved. Along the way, we compare the official procedure to the famous Elo rating system. Although it originates from chess, it has been successfully tailored to ranking football teams as well.
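For readers unfamiliar with the Elo system mentioned above, the following is a minimal sketch of the classical chess-style update. The function names, the K-factor of 20, and the scale of 400 are illustrative assumptions; the football-specific variant studied in the paper is not described in the abstract and may differ.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the classical logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is a hypothetical K-factor chosen only for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Two equally rated teams each have an expected score of 0.5, so a win moves the winner up by half the K-factor and the loser down by the same amount.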
4.
In this note we consider the equality of the ordinary least squares estimator (OLSE) and the best linear unbiased estimator (BLUE) of the estimable parametric function in the general Gauss–Markov model. In particular, we consider the structures of the covariance matrix V for which the OLSE equals the BLUE. Our results are based on the properties of a particular reparametrized version of the original Gauss–Markov model.
5.
The recent controversy about the size of crowds at candlelight protests in Korea raises an interesting question regarding the methods used to estimate crowd size. Protest organizers tend to count all participants in the event from start to finish, while the police usually report the crowd size at its peak. While several counting methods are available to estimate the size of a crowd at a given time, counting the total number of participants at a protest is not straightforward. In this paper, we propose a new estimator of the total number of participants, which we call the size of a dynamic crowd. We assume that the arrival and departure times of the crowd are randomly observed and that the number of attendees in the crowd at a specific time is estimable. We estimate the total number of attendees during the entire gathering based on the capture-recapture model. We also propose a bootstrap procedure to construct a confidence interval for the crowd size. We demonstrate the performance of the proposed method with simulation studies and with data from Korea's March for Science, part of a global event held on Earth Day, April 22, 2017.
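As background to the capture-recapture idea referred to above, here is a minimal sketch of the classical two-sample Lincoln-Petersen estimator and Chapman's bias-corrected variant. This is not the dynamic-crowd estimator proposed in the paper, which the abstract does not specify; the function and parameter names are hypothetical.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Classical two-sample estimate of population size.

    n_first:  individuals observed (marked) in the first sample
    n_second: individuals observed in the second sample
    n_both:   individuals observed in both samples
    """
    if n_both <= 0:
        raise ValueError("at least one recapture is needed")
    return n_first * n_second / n_both


def chapman(n_first, n_second, n_both):
    """Chapman's variant, which remains defined when n_both is zero."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1
```

The intuition is that the share of the second sample already seen in the first sample estimates the share of the whole population captured by the first sample.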
6.
Most data used to study the durations of unemployment spells come from the Current Population Survey (CPS), which is a point-in-time survey and gives an incomplete picture of the underlying duration distribution. We introduce a new sample of completed unemployment spells obtained from panel data and apply CPS sampling and reporting techniques to replicate the type of data used by other researchers. Predicted duration distributions derived from these CPS-like data are then compared to the actual distribution. We conclude that the best inferences that can be made about unemployment durations by using CPS-like data are seriously biased.
7.
Masafumi Akahira 《Communications in Statistics - Simulation and Computation》2013,42(3):595-605
A higher order approximation formula for a percentage point of the noncentral t-distribution with v degrees of freedom is given up to the order o(v^{-3}), using the Cornish-Fisher expansion for the statistic based on a linear combination of a normal random variable and a chi random variable. The upper confidence limit and the confidence interval for the non-centrality parameter are given. Numerical results are also obtained.
8.
We argue that when the household composition changes, consumption patterns vary not only because of the cost effect that equivalence scales try to measure, but also because of a “taste” or “style” effect. This effect can be identified and measured, under a few assumptions, with the use of a new methodology, called DM² (Decomposition Model of the effects of Demographic Metamorphosis), that can be viewed as a generalisation of Ray's (1983) price-scaling approach to the construction of equivalence scales. An empirical application to data drawn from the Istat 1995 Italian Household Budget Survey suggests that the proposed method improves our understanding of households' consumption patterns and the reliability of the equivalence scales that we derive.
We gratefully acknowledge helpful comments from Giorgio Calzolari, Franco Polverini, Ugo Trivellato and an anonymous referee, but retain full responsibility for all errors, and for the processing of Istat (Italian Institute of Statistics) microdata on Household Budgets. Financial support for this research was provided by the Italian MURST (Research project on “Equivalence scales” directed by Prof. Guido Ferrari, University of Firenze, Ref. No. 9913105354; and Research project on “Low fertility in Italy: between economic constraints and value changes”, directed by Prof. Massimo Livi Bacci, University of Firenze, Ref. No. MM13107238). Preliminary findings on this research topic have been presented in a few seminars and conferences: cf., e.g., De Santis and Maltagliati (2000 and 2001).
9.
Michelle Casey Evgeny Degtyarev María José Lechuga Paola Aimone Alain Ravaud Robert J. Motzer Feng Liu Viktoriya Stalbovskaya Rui Tang Emily Butler Oliver Sailer Susan Halabi Daniel George 《Pharmaceutical statistics》2021,20(2):324-334
The estimand framework requires a precise definition of the clinical question of interest (the estimand) as different ways of accounting for “intercurrent” events post randomization may result in different scientific questions. The initiation of subsequent therapy is common in oncology clinical trials and is considered an intercurrent event if the start of such therapy occurs prior to a recurrence or progression event. Three possible ways to account for this intercurrent event in the analysis are to censor at initiation, consider recurrence or progression events (including death) that occur before and after the initiation of subsequent therapy, or consider the start of subsequent therapy as an event in and of itself. The new estimand framework clarifies that these analyses address different questions (“does the drug delay recurrence if no patient had received subsequent therapy?” vs “does the drug delay recurrence with or without subsequent therapy?” vs “does the drug delay recurrence or start of subsequent therapy?”). The framework facilitates discussions during clinical trial planning and design to ensure alignment between the key question of interest, the analysis, and interpretation. This article is a result of a cross-industry collaboration to connect the International Council for Harmonisation E9 addendum concepts to applications. Data from previously reported randomized phase 3 studies in the renal cell carcinoma setting are used to consider common intercurrent events in solid tumor studies, and to illustrate different scientific questions and the consequences of the estimand choice for study design, data collection, analysis, and interpretation.
10.
Almut E. D. Veraart 《AStA Advances in Statistical Analysis》2011,95(3):253-291
This paper studies the impact of jumps on volatility estimation and inference based on various realised variation measures such as realised variance, realised multipower variation and truncated realised multipower variation. We review the asymptotic theory of those realised variation measures and present a new estimator for the asymptotic ‘variance’ of the centered realised variance in the presence of jumps. Next, we compare the finite sample performance of the various estimators by means of detailed Monte Carlo studies. Here we study the impact of the jump activity, of the jump size of the jumps in the price and of the presence of additional independent or dependent jumps in the volatility. We find that the finite sample performance of realised variance and, in particular, of log-transformed realised variance is generally good, whereas the jump-robust statistics tend to struggle in the presence of a highly active jump process.
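To make the distinction between realised variance and a jump-robust multipower measure concrete, the sketch below computes realised variance and the standard realised bipower variation from a list of intraday returns. The function names are hypothetical, and the truncated estimators and the new asymptotic variance estimator discussed in the paper are not reproduced here.

```python
import math


def realised_variance(returns):
    """Sum of squared intraday returns; includes the contribution of jumps."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """Realised bipower variation: (pi / 2) * sum |r_i| * |r_{i-1}|.

    The factor pi / 2 equals 1 / E|Z|^2 for a standard normal Z, which makes
    the statistic consistent for integrated variance when there are no jumps.
    """
    products = (abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1]))
    return (math.pi / 2.0) * sum(products)
```

A single large return enters realised variance squared, but enters bipower variation only multiplied by its small neighbours, so the gap between the two statistics is a common informal indicator of jump activity.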
11.
《Journal of Statistical Computation and Simulation》2012,82(1-4):127-138
In a former study (Chatillon, Gelinas, Martin and Laurencelle, 1987), the authors arrived at the conclusion that for small to moderate sample sizes (n ≤ 90), and for population distributions that are neither too skewed nor too heavy-tailed, the percentiles computed from a set of 9 classes are at least as precise as the corresponding percentiles computed with raw data. Their proof was based essentially on Monte Carlo simulations. The present paper gives a different and complementary proof, based on an exact evaluation of the mean squared error. The method of proof uses the trinomial distribution in an interesting way.
12.
In this paper we consider a recursive method of Robbins–Monro type to estimate the solution of the linear problem Ax = u, in which the right-hand side u is measured with α-mixing errors. We also show the almost complete convergence (a.co.) of this algorithm and specify its convergence rate.
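A recursion of Robbins–Monro type for Ax = u can be sketched as x_{n+1} = x_n - (a / n)(A x_n - u_n), where u_n is a noisy measurement of u. The code below is an illustration under stated assumptions, not the paper's algorithm: it assumes A is symmetric positive definite, uses a step size of a / n, and in the usage note the noise is i.i.d. Gaussian rather than α-mixing. All names are hypothetical.

```python
import random


def robbins_monro_linear(A, noisy_u, x0, n_iter=20000, a=1.0):
    """Approximate the solution of A x = u from noisy observations of u.

    A:       square matrix as a list of rows (assumed positive definite)
    noisy_u: zero-argument callable returning one noisy observation of u
    x0:      starting point
    a:       step-size constant; step n uses gain a / n
    """
    x = list(x0)
    dim = len(x)
    for n in range(1, n_iter + 1):
        u_n = noisy_u()
        # Noisy residual A x_n - u_n drives the correction.
        residual = [
            sum(A[i][j] * x[j] for j in range(dim)) - u_n[i]
            for i in range(dim)
        ]
        step = a / n
        x = [x[i] - step * residual[i] for i in range(dim)]
    return x
```

For instance, with A = [[2.0, 0.5], [0.5, 1.0]] and u = [1.0, 2.0], whose exact solution is x = (0, 2), passing `lambda: [ui + rng.gauss(0.0, 0.1) for ui in u]` with `rng = random.Random(0)` as `noisy_u` should drive the iterates close to that solution.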
13.
Factor analysis (FA) is the most commonly used pattern recognition methodology in social and health research. A technique that may help to better retrieve true information from FA is the rotation of the information axes. The main goal is to test the reliability of the results derived through FA and to reveal the best rotation method under various scenarios. Based on the results of the simulations, it was observed that non-orthogonal rotation gave more repeatable results than either orthogonal rotation or no rotation.
14.
15.
Redfern P 《Journal of official statistics》1986,2(4):415-424
"During the past twenty years Scandinavian countries have made changes in the methods of taking population and housing censuses that are more fundamental than any seen since modern census methods were first introduced two hundred years ago. These countries extract their census data in part or in whole from administrative registers. If other countries in Western Europe were to adopt this approach, most of them would have to make major improvements to their administrative records. But the primary reasons for making such improvements are concerned with administration and policy rather than statistics, namely, the need to secure a more effective and fairer system of public administration and to enable governments to exercise a wider range of policy options."
16.
Michael O'Kelly John Doyle Philip J. Boland 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2010,173(1):215-235
Summary. The single transferable vote is a method of election that allows voters to mark candidates in order of preference. Votes that are not required to elect a candidate are passed to the next candidate in the voter's order of preference. Results of this kind of election give us data about the degree to which voters of a given persuasion are willing to pass their vote to a candidate of a different persuasion. Measures of voters' willingness to pass a vote to a candidate of a different persuasion are of particular interest in places such as Northern Ireland, where communities differ by religion and national aspiration, and agreed new political institutions are based on cross-community power-sharing. How we quantify this voting data may depend on the questions that we want to answer, of course. But, to understand changes in how the voter orders her or his preference, one may need to ask several questions, and to quantify the results of the election in more than one way.
17.
The Wald statistic is known to vary under reparameterization. This raises the question: which parameterization should be chosen, in order to optimize power of the Wald statistic? We specifically consider k-sample tests of generalized linear models (GLMs) and generalized estimating equations (GEEs) in which the alternative hypothesis contains only two parameters. An example is presented in which such an alternative hypothesis is of interest. Amongst a general class of parameterizations, we find the parameterization that maximizes power via analysis of the non-centrality parameter, and show how the effect on power of reparameterization depends on sampling design and the differences in variance across samples. There is no single parameterization with optimal power across all alternatives. The Wald statistic commonly used under the canonical parameterization is optimal in some instances but it performs very poorly in others. We demonstrate results by example and by simulation, and describe their implications for likelihood ratio statistics and score statistics. We conclude that due to poor power properties, the routine use of score statistics and Wald statistics under the canonical parameterization for GEEs is a questionable practice.