Similar Articles (20 results)
1.
A class of symmetric bivariate uniform distributions is proposed for use in statistical modeling. The distributions may be constructed to be absolutely continuous, with correlations as close to ±1 as desired. Expressions for the correlations, regressions, and copulas are found. An extension to three dimensions is proposed.

2.
Empirical estimates of source statistical economic data such as trade flows, greenhouse gas emissions, or employment figures are always subject to uncertainty (stemming from measurement errors or confidentiality) but information concerning that uncertainty is often missing. This article uses concepts from Bayesian inference and the maximum entropy principle to estimate the prior probability distribution, uncertainty, and correlations of source data when such information is not explicitly provided. In the absence of additional information, an isolated datum is described by a truncated Gaussian distribution, and if an uncertainty estimate is missing, its prior equals the best guess. When the sum of a set of disaggregate data is constrained to match an aggregate datum, it is possible to determine the prior correlations among disaggregate data. If aggregate uncertainty is missing, all prior correlations are positive. If aggregate uncertainty is available, prior correlations can be either all positive, all negative, or a mix of both. An empirical example is presented, which reports relative uncertainties and correlation priors for the County Business Patterns database. In this example, relative uncertainties range from 1% to 80% and 20% of data pairs exhibit correlations below −0.9 or above 0.9. Supplementary materials for this article are available online.

3.
Multiple assessments of an efficacy variable are often conducted prior to the initiation of randomized treatments in clinical trials as baseline information. Two goals are investigated in this article: first, the choice of these baselines in the analysis of covariance (ANCOVA) to increase statistical power; and second, the magnitude of power loss when a continuous efficacy variable is dichotomized into a categorical variable, as is commonly reported in the biomedical literature. A statistical power analysis is developed with extensive simulations based on data from clinical trials in study participants with end-stage renal disease (ESRD). It is found that the choice of baselines depends primarily on the correlations among the baselines and the efficacy variable, with substantial power gains for correlations greater than 0.6 and negligible gains for correlations below 0.2. Continuous efficacy variables always give higher statistical power in the ANCOVA modeling, and dichotomizing the efficacy variable generally decreases statistical power by 25%, an important practical consideration when setting the sample size and budget in clinical trial design. These findings can be easily applied in and extended to other clinical trials with similar designs.

4.
Over 60 years ago Ronald Fisher demonstrated a number of potential pitfalls with statistical analyses using ratio variables. Nonetheless, these pitfalls are largely overlooked in contemporary clinical and epidemiological research, which routinely uses ratio variables in statistical analyses. This article aims to demonstrate how very different findings can be generated as a result of less than perfect correlations among the data used to generate ratio variables. These imperfect correlations result from measurement error and random biological variation. While the former can often be reduced by improvements in measurement, random biological variation is difficult to estimate and eliminate in observational studies. Moreover, wherever the underlying biological relationships among epidemiological variables are unclear, and hence the choice of statistical model is also unclear, the different findings generated by different analytical strategies can lead to contradictory conclusions. Caution is therefore required when interpreting analyses of ratio variables whenever the underlying biological relationships among the variables involved are unspecified or unclear. Copyright © 2009 John Wiley & Sons, Ltd.

5.
金勇进, 刘展. 《统计研究》2016, 33(3): 11-17
When sampling from big data, constructing a sampling frame is often difficult, so the samples obtained are non-probability samples, and traditional design-based inference theory cannot be applied to them directly. How to conduct statistical inference for non-probability samples is therefore a serious challenge facing survey sampling in the big-data era. This article proposes a basic framework for addressing statistical inference with non-probability samples. First, sampling methods: approaches such as sample selection based on sample matching and link-tracing sampling can make a non-probability sample approximate a probability sample, so that the inference theory for probability samples becomes applicable. Second, weight construction and adjustment: base weights analogous to those of a probability sample can be obtained from pseudo-design-based, model-based, and propensity-score methods. Third, estimation: pseudo-design-based, model-based, and Bayesian composite estimators can be considered. Finally, sample selection based on sample matching is used as an example to illustrate the approach in detail.

6.
A package for the stochastic simulation of discrete variables with assigned marginal distributions and correlation matrix is presented and discussed. The simulating mechanism relies upon the Gaussian copula, linking the discrete distributions together, and an iterative scheme recovering the correlation matrix for the copula that ensures the desired correlations among the discrete variables. Examples of its use are provided as well as three possible applications (related to probability, sampling, and inference), which illustrate the utility of the package as an efficient and easy-to-use tool both in statistical research and for didactic purposes.
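The core of the Gaussian-copula mechanism described above can be sketched in a few lines. This is a minimal illustration, not the package's actual code: the marginals (Poisson and binomial) and the latent correlation are made-up examples, and the iterative correction of the copula correlation is omitted, so the latent value is used directly.

```python
import numpy as np
from scipy.stats import norm, poisson, binom

rng = np.random.default_rng(42)

rho = 0.6        # correlation for the latent Gaussian copula (in the package,
                 # an iterative scheme adjusts this so the *discrete* variables
                 # attain the desired correlation; here we use it as-is)
n = 100_000

# Step 1: draw from a bivariate normal with the copula correlation.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Step 2: map to uniforms via the standard normal CDF (the Gaussian copula).
u = norm.cdf(z)

# Step 3: invert the discrete marginal CDFs.
x1 = poisson.ppf(u[:, 0], mu=3).astype(int)       # Poisson(3) marginal
x2 = binom.ppf(u[:, 1], n=10, p=0.4).astype(int)  # Binomial(10, 0.4) marginal

# The realized discrete correlation is close to, but slightly below, rho.
print(np.corrcoef(x1, x2)[0, 1])
```

The shrinkage of the realized correlation relative to the latent one is exactly why the package needs its iterative recovery scheme.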

7.
Among the goals of statistical matching, a very important one is the estimation of the joint distribution of variables not jointly observed in a sample survey but separately available from independent sample surveys. The absence of joint information on the variables of interest leads to uncertainty about the data-generating model, since the available sample information is unable to discriminate among a set of plausible joint distributions. The present paper gives a short review of the concept of uncertainty in statistical matching under logical constraints, as well as of how to measure uncertainty for continuous variables. The notion of matching error is related to an appropriate measure of uncertainty, and a criterion for selecting matching variables is introduced: choose the variables that minimize this uncertainty measure. Finally, a method for choosing a plausible joint distribution for the variables of interest via the iterative proportional fitting algorithm is described. The proposed methodology is then applied to household income and expenditure data when extra sample information on the average propensity to consume is available. This leads to a reconstructed complete dataset in which each record includes measures of income and expenditure.

8.
The class of Multivariate BiLinear GARCH (MBL-GARCH) models is proposed and its statistical properties are investigated. The model can be regarded as a generalization to a multivariate setting of the univariate BL-GARCH model proposed by Storti and Vitale (Stat Methods Appl 12:19–40, 2003a; Comput Stat 18:387–400, 2003b). It is shown how MBL-GARCH models make it possible to account for asymmetric effects in both conditional variances and correlations. An EM algorithm for maximum likelihood estimation of the model parameters is derived. Furthermore, in order to test the appropriateness of the conditional variance and covariance specifications, a set of robust conditional-moment test statistics is defined. Finally, the effectiveness of MBL-GARCH models in a risk management setting is assessed by means of an application to the estimation of the optimal hedge ratio in futures hedging.

9.
Partial correlations can be used to statistically control for the effects of unwanted variables. Perhaps the most frequently used test of a partial correlation is the parametric F test, which requires normality of the joint distribution of observations. The possibility that this assumption may not be met in practice suggests a need for procedures that do not require normality. Unfortunately, the statistical literature provides little guidance for choosing other tests when the normality assumption is not satisfied. Several nonparametric tests of partial correlations are investigated using a computer simulation study. Recommendations are made for selecting certain tests under particular conditions.
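To make the quantity under test concrete, here is a minimal sketch of a sample partial correlation (via regression residuals) together with one simple rank-based variant, the partial Spearman correlation. These are illustrative constructions, not the specific nonparametric tests compared in the paper; the simulated data are made up.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    zc = np.column_stack([np.ones_like(z), z])        # design matrix [1, z]
    rx = x - zc @ np.linalg.lstsq(zc, x, rcond=None)[0]
    ry = y - zc @ np.linalg.lstsq(zc, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = z + rng.normal(size=n)   # x and y are correlated only through z
y = z + rng.normal(size=n)

r_xy = np.corrcoef(x, y)[0, 1]   # marginal correlation: clearly positive
r_xy_z = partial_corr(x, y, z)   # partial correlation: near zero

# Rank-based (nonparametric) variant: the same computation on the ranks.
rho_xy_z = partial_corr(stats.rankdata(x), stats.rankdata(y), stats.rankdata(z))
```

Controlling for z removes the spurious association, which is precisely the behavior the competing tests are meant to assess under non-normality.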

10.
We introduce a new multivariate GARCH model with multivariate thresholds in conditional correlations and develop a two-step estimation procedure that is feasible in large dimensional applications. Optimal threshold functions are estimated endogenously from the data and the model conditional covariance matrix is ensured to be positive definite. We study the empirical performance of our model in two applications using U.S. stock and bond market data. In both applications our model has, in terms of statistical and economic significance, higher forecasting power than several other multivariate GARCH models for conditional correlations.

11.
Correlated binary data arise frequently in medical and other scientific disciplines, and statistical methods, such as generalized estimating equations (GEE), have been widely used for their analysis. The need to simulate correlated binary variates arises when evaluating small-sample properties of GEE estimators for such data. One might also generate such data to simulate and study biological phenomena such as tooth decay or periodontal disease. This article introduces a simple method for generating pairs of correlated binary data. A simple algorithm is also provided for generating an arbitrary-dimensional random vector of non-negatively correlated binary variates. The method relies on the idea that correlations among the random variables arise as a result of their sharing some common components that induce such correlations. It then uses some properties of the binary variates to represent each variate in terms of these common components in addition to its own elements. Unlike most previous approaches, which require solving nonlinear equations or use distributional properties of other random variables, this method uses only properties of the binary variate itself. As no intermediate random variables are required for generating the binary variates, the proposed method is shown to be faster than the other methods. To verify this claim, we compare the computational efficiency of the proposed method with those of other procedures.
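One classical shared-component construction in this spirit (though not necessarily the authors' exact algorithm) mixes each variate between a common Bernoulli component and an individual one; choosing the mixing probability as the square root of the target correlation yields correlation rho without any root-finding. All parameter values below are made-up examples.

```python
import numpy as np

def correlated_bernoulli_pair(p, rho, size, rng):
    """Pair of Bernoulli(p) variates with correlation approximately rho
    (rho >= 0), built from a shared common component.

    Each X_i equals the common component Z with probability sqrt(rho),
    and an independent Bernoulli(p) otherwise; then corr(X1, X2) = rho."""
    common = rng.binomial(1, p, size)               # shared component Z
    own = rng.binomial(1, p, (2, size))             # individual components
    mix = rng.binomial(1, np.sqrt(rho), (2, size))  # which component to use
    x1 = np.where(mix[0] == 1, common, own[0])
    x2 = np.where(mix[1] == 1, common, own[1])
    return x1, x2

rng = np.random.default_rng(1)
x1, x2 = correlated_bernoulli_pair(p=0.3, rho=0.4, size=200_000, rng=rng)
```

Because each draw uses only Bernoulli components, no intermediate continuous variables are generated, which is the source of the speed advantage the article claims.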

12.
The technical problems of record linkage are closely related to statistical theory; in particular, building record-linkage classification rules requires statistical models that identify key variables for data matching. Within a Bayesian framework, hierarchical models can be constructed to integrate administrative records, and matching error rates can be estimated through multivariate regression. Moreover, record linkage under a one-to-one restriction allows modules to reflect changes in the sources of record information; the posterior distribution is conveniently computed via MCMC simulation, which helps improve the efficiency of data integration.

13.
Statistical matching consists of estimating the joint characteristics of two variables observed in two distinct and independent sample surveys, respectively. In a parametric setup, ranges of estimates for nonidentifiable parameters are the only estimable items, unless restrictive assumptions on the probabilistic relationship between the non-jointly-observed variables are imposed. These ranges correspond to the uncertainty due to the absence of joint observations on the pair of variables of interest. The aim of this paper is to analyze the uncertainty in statistical matching in a nonparametric setting. A measure of uncertainty is introduced and its properties are studied: the measure captures the "intrinsic" association between the pair of variables, which is constant and equal to 1/6, whatever the form of the marginal distribution functions of the two variables, when the two samples are the only knowledge available. The measure becomes useful when uncertainty is reduced by knowledge beyond the data themselves, as in the case of structural zeros. In this case the proposed measure detects how the introduction of further knowledge shrinks the intrinsic uncertainty from 1/6 to smaller values, zero being the case of no uncertainty. Sampling properties of the uncertainty measure and of the bounds of the uncertainty intervals are also proved.

14.
Many college courses use group work as a part of the learning and evaluation process. Class groups are often selected randomly or by allowing students to organize groups themselves. However, if it is desired to control some aspect of the group structure, such as increasing schedule compatibility within groups, multidimensional scaling can be used to form such groups. This article describes how this has been adopted in an undergraduate statistics course. Resulting groups have been more homogeneous with respect to student schedules than groups selected randomly—an example from winter quarter 2004 increased correlations between student schedules from a mean of .29 before grouping to a within-group mean of .50. Further, the exercise allows opportunities to discuss a wealth of statistical concepts in class, including surveys, association measures, multidimensional scaling, and statistical graphics.
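The grouping idea above can be sketched with classical (Torgerson) multidimensional scaling: embed students in a low-dimensional space from a schedule-distance matrix, so that nearby points have compatible schedules and groups can be cut from the coordinates. This is a generic sketch, not the course's actual procedure; the schedule data and the Hamming-distance choice are made-up assumptions.

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical MDS: embed n points in k dimensions from an n x n distance matrix."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:k]         # k largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Hypothetical data: each row is one student's availability over 30 time slots.
rng = np.random.default_rng(7)
sched = rng.integers(0, 2, size=(12, 30))

# Distance = number of slots on which two schedules disagree (Hamming distance).
d = (sched[:, None, :] != sched[None, :, :]).sum(axis=2).astype(float)

coords = classical_mds(d, k=2)  # nearby points = compatible schedules
```

Groups could then be formed by clustering `coords`, e.g. with k-means, with within-group schedule correlation as the homogeneity check the article reports.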

15.
The study of physical processes is often aided by computer models or codes. Computer models that simulate such processes are sometimes computationally intensive and therefore not very efficient exploratory tools. In this paper, we address computer models characterized by temporal dynamics and propose new statistical correlation structures aimed at modelling their time dependence. These correlations are embedded in regression models with input-dependent design matrix and input-correlated errors that act as fast statistical surrogates for the computationally intensive dynamical codes. The methods are illustrated with an automotive industry application involving a road load data acquisition computer model.

16.
In a recent paper, Leong and Huang [6] proposed a wavelet-correlation-based approach to test for cointegration between two time series. However, correlation and cointegration are two different concepts even when wavelet analysis is used. It is known that statistics based on non-stationary integrated variables have non-standard asymptotic distributions. However, wavelet analysis offsets the integrating order of non-stationary series so that traditional asymptotics on stationary variables suffices to ascertain the statistical properties of wavelet-based statistics. Based on this, this note shows that wavelet correlations cannot be used as a test of cointegration.

17.
Fingerprinting of functional connectomes is an increasingly standard measure of reproducibility in functional magnetic resonance imaging connectomics. In such studies, one attempts to match a subject's first session image with their second, in a blinded fashion, in a group of subjects measured twice. The number or percentage of correct matches is usually reported as a statistic, which is then used in permutation tests. Despite the simplicity and increasing popularity of such procedures, the soundness of the statistical tests, the power, and the factors impacting the test are unstudied. In this article, we investigate the statistical tests of matching based on the exchangeability assumption in the fingerprinting analysis. We show that a nearly universal Poisson(1) approximation applies for different matching schemes. We theoretically investigate the permutation tests and explore the issue that the test is overly sensitive to uninteresting directions in the alternative hypothesis, such as clustering due to familial status or demographics. We perform a numerical study on two functional magnetic resonance imaging (fMRI) resting-state datasets, the Human Connectome Project (HCP) and the Baltimore Longitudinal Study of Aging (BLSA). These datasets are instructive, as the HCP includes technical replications of long scans and includes monozygotic and dizygotic twins, as well as non-twin siblings. In contrast, the BLSA study incorporates more typical length resting-state scans in a longitudinal study. Finally, a study of single regional connections is performed on the HCP data.

18.
Matching and stratification based on confounding factors or propensity scores (PS) are powerful approaches for reducing confounding bias in indirect treatment comparisons. However, implementing these approaches requires pooled individual patient data (IPD). The research presented here was motivated by an indirect comparison between a single-armed trial in acute myeloid leukemia (AML) and two external AML registries with current treatments as controls. For confidentiality reasons, IPD cannot be pooled. Common approaches to adjusting confounding bias, such as PS matching or stratification, cannot be applied because 1) a model for PS, for example a logistic model, cannot be fitted without pooling covariate data; and 2) pooling response data may be necessary for some statistical inference (e.g., estimating the SE of the mean difference of matched pairs) after PS matching. We propose a set of approaches that do not require pooling IPD, using a combination of methods including a linear discriminant for matching and stratification, and secure multiparty computation for estimating the within-pair sample variance and for calculations involving multiple control sources. The approaches only need to share aggregated data offline, rather than the real-time secure data transfer required by typical secure multiparty computation for model fitting. For survival analysis, we propose an approach based on restricted mean survival time. A simulation study was conducted to evaluate this approach in several scenarios, in particular with a mixture of continuous and binary covariates. The results confirmed the robustness and efficiency of the proposed approach. A real data example is also provided for illustration.

19.

Large correlation matrices are hard to look at. In this article we present correlations as elliptical glyphs for a simple intuitive display of large matrices.

20.
Several models have been developed to capture the dynamics of the conditional correlations between time series of financial returns and several studies have shown that the market volatility is a major determinant of the correlations. We extend some models to include explicitly the dependence of the correlations on the market volatility. The models differ by the way—linear or nonlinear, direct or indirect—in which the volatility influences the correlations. Using a wide set of models with two measures of market volatility on two datasets, we find that for some models, the empirical results support to some extent the statistical significance and the economic significance of the volatility effect on the correlations, but the presence of the volatility effect does not improve the forecasting performance of the extended models. Supplementary materials for this article are available online.
