Similar literature
 20 similar documents found (search time: 31 ms)
1.
2.
Two ordinary computer discs containing the personal details, including the bank account numbers, of almost half the citizens of this country have gone missing. David Hand looks at data in the modern world and asks how secure it is and how secure it should be.

3.
4.
5.
We propose a statistical index for measuring the fluctuations of a stochastic process ξ. This index is based on the generalized Lorenz curves and (modified) Gini indices of econometric theory.
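
For orientation, the classical building blocks named in this abstract can be sketched as follows. This is only the textbook Lorenz curve and Gini index of a non-negative sample; the abstract does not say how the authors generalize or modify them for a stochastic process, so the function names and the choice of input (for example, absolute increments of an observed path) are assumptions made for illustration.

```python
def lorenz_curve(values):
    """Return the points (i/n, cumulative share) of the empirical Lorenz curve."""
    xs = sorted(values)
    if not xs or any(x < 0 for x in xs):
        raise ValueError("need a non-empty sample of non-negative values")
    total = sum(xs)
    if total == 0:
        raise ValueError("the sample total must be positive")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


def gini_index(values):
    """Classical Gini index: 1 minus twice the area under the Lorenz curve."""
    pts = lorenz_curve(values)
    # Trapezoidal area under the piecewise-linear Lorenz curve.
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1.0 - 2.0 * area
```

A sample with equal values gives an index of 0, while a sample whose total is concentrated in a single observation gives (n - 1)/n.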

6.
7.
In this paper, we study the uniform convergence with rates for the kernel estimator of the conditional mode function for a left truncated and right censored model. It is assumed that the lifetime observations with multivariate covariates form a stationary α-mixing sequence. The asymptotic normality of the estimator is also established.

8.
Although there is no shortage of clustering algorithms proposed in the literature, the question of the most relevant strategy for clustering compositional data (i.e. data whose rows belong to the simplex) remains largely unexplored in cases where the observed value is equal or close to zero for one or more samples. This work is motivated by the analysis of two applications, both focused on the categorization of compositional profiles: (1) identifying groups of co-expressed genes from high-throughput RNA sequencing data, in which a given gene may be completely silent in one or more experimental conditions; and (2) finding patterns in the usage of stations over the course of one week in the Vélib' bicycle sharing system in Paris, France. For both of these applications, we make use of appropriately chosen data transformations, including the Centered Log Ratio and a novel extension called the Log Centered Log Ratio, in conjunction with the K-means algorithm. We use a non-asymptotic penalized criterion, whose penalty is calibrated with the slope heuristics, to select the number of clusters. Finally, we illustrate the performance of this clustering strategy, which is implemented in the Bioconductor package coseq, on both the gene expression and bicycle sharing system data.
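
As a reminder of what the standard Centered Log Ratio does, here is a minimal sketch for a single strictly positive composition. It is not the coseq implementation, and it deliberately does not attempt the Log Centered Log Ratio extension, whose definition is not given in the abstract; the function name `clr` is an assumption for illustration. The sketch also shows why zeros are a problem: the logarithm is undefined there, so the function refuses such input.

```python
import math


def clr(composition):
    """Centered Log Ratio: log of each part minus the mean of the logs."""
    if not composition:
        raise ValueError("composition must be non-empty")
    if any(p <= 0 for p in composition):
        # The plain CLR is undefined when any part is zero or negative.
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

The transformed coordinates sum to zero and do not change when the composition is rescaled, which is why a Euclidean method such as K-means can then be applied to them.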

9.
The terms sweeping and alignment refer to the same process. Sweeping/alignment is used by data analysts as a technique for describing the effects of a model factor (e.g., treatments in a randomized block design) after the effects of nuisance parameters (e.g., blocks) have been removed from the data. In this paper sweeping/alignment is used as the basis for developing tests of factors in unbalanced experimental design models. Formulas are presented for treatment effects in randomized block designs with missing observations, and for interaction and main effects in unbalanced two-way factorial designs with empty cells.

10.
In November 1979, the derailment of a train passing through Mississauga, Ontario, caused the explosion of tank cars containing liquid propane and the leakage of chlorine through a hole in another tank car. Officials evacuated more than 200,000 people from the area, but firemen stayed, exposing themselves to noxious fumes from the explosions and fires. When the crisis was over, health officials administered health tests and questionnaires to the affected men and to a control group of unaffected firefighters. Health information was gathered again one and two years later. In this study, two independent sets of analysts examine the health data to determine whether exposure to hazardous chemicals at the derailment site had any lasting effects on the lung function of the Mississauga firefighters.

11.
The Olympic and Paralympic Games are coming to London in 2012, and there will be huge interest, especially among the young. Could the Games be used to involve students of all ages in a large scale project that involves and interests them to break down their fear of statistics and to motivate learning? Neville Davies of the Royal Statistical Society Centre for Statistical Education appeals for help in a major teaching and learning initiative based on the London Olympics. Significance is proud to announce the launch of a scheme that could bring thousands of students, nationally and internationally, to appreciate and value the usefulness of statistics.

12.
Summary.  A frequent problem in longitudinal studies is that subjects may miss scheduled visits or be assessed at self-selected points in time. As a result, observed outcome data may be highly unbalanced and the availability of the data may be directly related to the outcome measure and/or some auxiliary factors that are associated with the outcome. If the follow-up visit and outcome processes are correlated, then marginal regression analyses will produce biased estimates. Building on the work of Robins, Rotnitzky and Zhao, we propose a class of inverse intensity-of-visit process-weighted estimators in marginal regression models for longitudinal responses that may be observed in continuous time. This allows us to handle arbitrary patterns of missing data as embedded in a subject's visit process. We derive the large sample distribution for our inverse visit-intensity-weighted estimators and investigate their finite sample behaviour by simulation. Our approach is illustrated with a data set from a health services research study in which homeless people with mental illness were randomized to three different treatments and measures of homelessness (as percentage days homeless in the past 3 months) and other auxiliary factors were recorded at follow-up times that are not fixed by design.

13.
We propose a mixture of latent variables model for the model-based clustering, classification, and discriminant analysis of data comprising variables of mixed type. This approach is a generalization of latent variable analysis, and model fitting is carried out within the expectation-maximization framework. Our approach is outlined, and a simulation study is conducted to illustrate the effect of sample size and noise on the standard errors and the recovery probabilities for the number of groups. Our modelling methodology is then applied to two real data sets, and its clustering and classification performance is discussed. We conclude with discussion and suggestions for future work.

14.
"The 1990 [U.S.] census and Post-Enumeration Survey produced census and dual system estimates (DSE) of population by domain, together with an estimated sampling covariance matrix of the DSE. Estimates of the bias of the DSE were derived from various PES evaluation programs. Of the three sources, the unadjusted census is the least variable but is believed to be the most biased, the DSE is less biased but more variable, and the bias estimates may be regarded as unbiased but are the most variable. This article addresses methods for combining the census, the DSE, and bias estimates obtained from the evaluation programs to produce accurate estimates of population shares, as measured by weighted squared- or absolute-error loss functions applied to estimated population shares of domains."

15.
16.
We discuss maximum likelihood and estimating equations methods for combining results from multiple studies in pooling projects and data consortia using a meta-analysis model, when the multivariate estimates with their covariance matrices are available. The estimates to be combined are typically regression slopes, often from relative risk models in biomedical and epidemiologic applications. We generalize the existing univariate meta-analysis model and investigate the efficiency advantages of the multivariate methods relative to the univariate ones. We also generalize a popular univariate test for between-studies homogeneity to a multivariate test. The methods are applied to a pooled analysis of types of carotenoids in relation to lung cancer incidence from seven prospective studies. In these data, the expected gain in efficiency was evident, sometimes to a large extent. Finally, we study the finite sample properties of the estimators and compare the multivariate ones to their univariate counterparts.

17.
It is widely believed that the median is “usually” between the mean and the mode for skewed unimodal distributions. However, this inequality is not always true, especially with grouped data. Because complete raw data are often unavailable, it is all the more important to evaluate this characteristic in grouped data. There is a gap in the current statistical literature on assessing mean–median–mode inequality for grouped data. The study aims to evaluate the relationship between the mean, median, and mode with unimodal grouped data; derive conditions for their inequalities; and present their application.
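
To make the three quantities concrete, the sketch below computes them from a frequency table using the usual textbook formulas: the midpoint-weighted mean, the linearly interpolated median, and the interpolated mode. These are assumed here for illustration and are not necessarily the definitions used in the study. The mode formula additionally assumes equal class widths, and the function name `grouped_mean_median_mode` is hypothetical.

```python
def grouped_mean_median_mode(edges, freqs):
    """edges: k+1 increasing class boundaries; freqs: k class counts."""
    k = len(freqs)
    if len(edges) != k + 1 or k == 0:
        raise ValueError("need k frequencies and k+1 class boundaries")
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")

    # Mean: each class is represented by its midpoint.
    mean = sum(f * (edges[i] + edges[i + 1]) / 2
               for i, f in enumerate(freqs)) / n

    # Median: linear interpolation inside the class where the
    # cumulative frequency first reaches n/2.
    cumulative = 0
    for i, f in enumerate(freqs):
        if cumulative + f >= n / 2:
            width = edges[i + 1] - edges[i]
            median = edges[i] + (n / 2 - cumulative) / f * width
            break
        cumulative += f

    # Mode: interpolation inside the modal class (equal widths assumed).
    m = max(range(k), key=lambda i: freqs[i])
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < k - 1 else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    width = edges[m + 1] - edges[m]
    if d1 + d2 > 0:
        mode = edges[m] + d1 / (d1 + d2) * width
    else:
        mode = (edges[m] + edges[m + 1]) / 2  # single class: use midpoint
    return mean, median, mode
```

For classes 0-10, 10-20, 20-30, 30-40 with counts 2, 5, 10, 3, these formulas give a mean of 22, a median of 23, and a mode of about 24.17, so the median does fall between the mean and the mode in that example.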

18.
When comparing the central values of two independent groups, should a t-test be performed, or should the observations be transformed into their ranks and a Wilcoxon-Mann-Whitney test performed? This paper argues that neither should automatically be chosen. Instead, provided that software for conducting randomisation tests is available, the chief concern should be with obtaining data values that are a good reflection of scientific reality and appropriate to the objective of the research; if necessary, the data values should be transformed so that this is so. The subsequent use of a randomisation (permutation) test will mean that failure of the transformed data values to satisfy assumptions such as normality and equality of variances will not be of concern.
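
A two-sample randomisation test of the kind this abstract refers to can be sketched in a few lines. The version below is a Monte Carlo approximation that uses the absolute difference in group means as the test statistic; the statistic, the number of resamples, and the function name are assumptions chosen for illustration, not details taken from the paper.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10000, seed=0):
    """Approximate two-sided p-value for a difference in group means."""
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1

    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_resamples + 1)
```

Because the reference distribution is built from the data themselves, no normality or equal-variance assumption enters the calculation, which is the point the abstract makes.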

19.
Two independent random samples are drawn from two multivariate normal populations with mean vectors μ1 and μ2 and a common variance-covariance matrix Σ. Ahmed and Saleh (1990) considered the preliminary test maximum likelihood estimator (PTMLE) for estimating μ1, based on Hotelling's T_N^2 statistic, when it is suspected that μ1 = μ2. In this paper, PTMLEs based on the Wald (W), Likelihood Ratio (LR) and Lagrange Multiplier (LM) tests are considered. Using the quadratic risk function, conditions on the departure parameter under which the proposed estimator is superior are derived. A max-min rule for the size of the preliminary test of significance is presented. It is demonstrated that, among the three test procedures, the PTMLE based on the W test produces the highest minimum guaranteed efficiency relative to the UMLE.

20.