Similar literature
1.
In this article, we explore hypothesis testing problems related to correlated proportions from clustered matched-pair binary data. Null hypotheses of equality of proportions, homogeneity, and non-inferiority of one proportion to another can all be cast, after a suitable transformation, as tests of linear contrasts of correlated proportions. The covariance estimators of the test statistics are based on moment estimation under the null hypotheses. We present a general framework for testing linear contrasts of the correlated proportions from clustered matched-pair data based upon a class of unbiased estimators of the proportions. The corresponding testing procedures do not impose structural assumptions on the correlation matrix and are easy to use. Simulation results suggest that the proposed method is more likely to maintain the proper significance level and to improve power than the test proposed by Obuchowski.

2.
Simulation studies employed to study properties of estimators for parameters in population-average models for clustered or longitudinal data require suitable algorithms for data generation. Methods for generating correlated binary data that allow general specifications of the marginal mean and correlation structures are particularly useful. We compare an algorithm based on dichotomizing multi-normal variates to one based on a conditional linear family (CLF) of distributions [Qaqish BF. A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika. 2003;90:455–463] with respect to range restrictions induced on correlations. Examples include generating longitudinal binary data and generating correlated binary data compatible with specified marginal means and covariance structures for bivariate, overdispersed binomial outcomes. Results show the CLF method gives a wider range of correlations for longitudinal data having autocorrelated within-subject associations, while the multivariate probit method gives a wider range of correlations for clustered data having exchangeable-type correlations. In the case of a decaying-product correlation structure, it is shown that the CLF method achieves the nonparametric limits on the range of correlations, which cannot be surpassed by any method.
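The dichotomization idea mentioned in this abstract can be sketched for the bivariate case as follows. This is a minimal illustration, not the algorithm compared in the paper: the function names, the `latent_rho` parameter, and the choice of two variables are assumptions made here. Note that `latent_rho` is the correlation of the underlying normal variates; the correlation of the resulting binary variables is generally smaller in magnitude, which is one source of the range restrictions the abstract refers to.

```python
import math
import random
from statistics import NormalDist


def correlated_binary_pairs(p1, p2, latent_rho, n, seed=0):
    """Draw n pairs (y1, y2) by thresholding a bivariate standard normal.

    p1, p2      target marginal means, each strictly between 0 and 1
    latent_rho  correlation of the latent normal pair (hypothetical name)
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Thresholds chosen so that P(z <= c_j) = p_j.
    c1 = std_normal.inv_cdf(p1)
    c2 = std_normal.inv_cdf(p2)
    scale = math.sqrt(1.0 - latent_rho ** 2)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal with correlation latent_rho to z1.
        z2 = latent_rho * z1 + scale * rng.gauss(0.0, 1.0)
        pairs.append((int(z1 <= c1), int(z2 <= c2)))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation of the two binary columns."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    return cov / math.sqrt(m1 * (1.0 - m1) * m2 * (1.0 - m2))
```

Calling `correlated_binary_pairs(0.3, 0.5, 0.6, 20000)` and passing the result to `sample_correlation` gives a quick check of how far the binary correlation falls below the latent one.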

3.
Modelling discrete time series, such as binary time series, is harder than modelling continuous time series, because there is no unique way to model the correlation structure of repeated binary data. Some models may also provide a complicated correlation structure with narrow ranges for the correlations. In this paper, we consider a nonlinear dynamic binary time series model that provides a correlation structure which is easy to interpret, and the correlations under this model satisfy the full −1 to 1 range. For the estimation of the parameters of this nonlinear model, we use a conditional generalized quasilikelihood (CGQL) approach which provides the same estimates as those of the well-known maximum likelihood approach. Furthermore, we consider a competitive linear dynamic binary time series model and examine the performance of the CGQL approach through a simulation study in estimating the parameters of this linear model. The model mis-specification effects on estimation as well as forecasting are also examined through simulations.

4.
When analyzing 2 × 2 contingency tables, the log odds ratio for measuring the strength of association is often approximated by a normal distribution with some variance. We show that the expression of that variance needs to be modified in the presence of correlation between two binomial distributions of the contingency table. In the present paper, we derive a correlation-adjusted variance of the limiting normal distribution of the log odds ratio. We also propose a correlation-adjusted test based on the standard odds ratio for analyzing matched-pair studies and any other study settings that induce correlated binary outcomes. We demonstrate that our proposed test outperforms the classical McNemar’s test. Simulation studies show the gains in power are especially manifest when sample size is small and strong correlation is present. Two examples of real data sets are used to demonstrate that the proposed method may lead to conclusions significantly different from those reached using McNemar’s test.
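For orientation, the classical McNemar test that this abstract uses as its baseline can be sketched as below. This is the textbook chi-square version without continuity correction, not the correlation-adjusted test the paper proposes; the function name and the argument names `b` and `c` (the two discordant-pair counts) are assumptions made for this illustration.

```python
import math


def mcnemar_test(b, c):
    """Classical McNemar chi-square test on discordant-pair counts b and c.

    Returns (statistic, p_value) using the chi-square distribution
    with 1 degree of freedom and no continuity correction.
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    stat = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Only the discordant pairs enter the statistic, which is why the test can lose power when the sample is small and the within-pair correlation is strong.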

5.
We extend the log‐mean linear parameterization for binary data to discrete variables with an arbitrary number of levels and show that also in this case it can be used to parameterize bi‐directed graph models. Furthermore, we show that the log‐mean linear parameterization allows one to simultaneously represent marginal independencies among variables and marginal independencies that only appear when certain levels are collapsed into a single one. We illustrate the application of this property by means of an example based on genetic association studies involving single‐nucleotide polymorphisms. More generally, this feature provides a natural way to reduce the parameter count, while preserving the independence structure, by means of substantive constraints that give additional insight into the association structure of the variables. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics

6.
Data sets originating from a wide range of research studies are composed of multiple variables that are correlated and of dissimilar types, primarily of count, binary/ordinal and continuous attributes. The present paper builds on previous work on multivariate data generation and develops a framework for generating multivariate mixed data with a pre-specified correlation matrix. The generated data consist of components that are marginally count, binary, ordinal and continuous, where the count and continuous variables follow the generalized Poisson and normal distributions, respectively. The use of the generalized Poisson distribution provides a flexible mechanism which allows under- and over-dispersed count variables generally encountered in practice. A step-by-step algorithm is provided and its performance is evaluated using simulated and real-data scenarios.

7.
In this article, operational details of an R package MultiOrd that is designed for the generation of correlated ordinal data are described, and examples of some important functions are given. The package provides a valuable and needed tool that has been lacking for generating multivariate ordinal data.

8.
An approach to filling in missing data where gaps exist within an otherwise continuous record is addressed. Ad hoc methods of past approaches are discussed and their limitations noted. An approach for estimating fill-in values consistent with data having a correlation structure is presented. The method is an extension of parametric modelling, with additional constraints imposed via a linear filter to account for variance preservation. The method is applied and compared to real data in which a portion of the record has been removed to simulate missing data. Results show that the method provides realistic fill-in values that preserve the correlation structure and variance of the measured data.

9.
Modeling binary familial data has been a challenging task due to the dependence among family members and the constraints imposed on the joint probability distribution of the binary responses. This paper investigates some useful familial dependence structures and proposes analyzing binary familial data using a Gaussian copula model. Advantages of this approach are discussed as well as some computational details. A numerical example is also presented to show the capability of the Gaussian copula model in more sophisticated data analysis.

10.
The data collection process and the inherent population structure are the main causes for clustered data. The observations in a given cluster are correlated, and the magnitude of such correlation is often measured by the intra-cluster correlation coefficient. The intra-cluster correlation can lead to an inflated size of the standard F test in a linear model. In this paper, we propose a solution to this problem. Unlike previous adjustments, our method does not require estimation of the intra-class correlation, which is problematic especially when the number of clusters is small. Our simulation results show that the new method outperforms the existing methods.

11.
Recently, exponential family-based random effects models have received considerable attention. These models usually arise from an unobservable random process added to the independent exponential family models. An unobservable correlated process, however, would cause correlations among the exponential family-based data. This paper first develops an asymptotically optimal test for testing the appropriateness of a fixed effects model for the exponential family-based independent data versus a random effects model for the exponential family-based independent or correlated data. The paper then provides a general framework for regression analysis of the exponential family-based data generated under the random effects models.

12.
In recent years, spatial lattice data have motivated much research. The modeling of binary variables observed at locations on a spatial lattice has been investigated extensively, and the autologistic model is a popular tool for analyzing such data. However, there are many situations where binary responses are clustered in several uncorrelated lattices, and only a few studies have investigated the modeling of binary data distributed in such a spatial structure. Moreover, because of the spatial dependency in the data, exact likelihood analysis is not possible. Bayesian inference for the autologistic function also often faces limitations and difficulties because its normalizing constant is intractable. In this study, spatially correlated binary data clustered in uncorrelated lattices are modeled via autologistic regression, and the IBF (inverse Bayes formulas) sampler is extended, with the help of latent variables, for posterior analysis and parameter estimation. The proposed methodology is illustrated using simulated and real observations.

13.
This work introduces specific tools based on phi-divergences to select and check generalized linear models with binary data. A backward selection criterion that helps to reduce the number of explanatory variables is considered. Diagnostic methods based on divergence measures such as a new measure to detect leverage points and two indicators to detect influential points are introduced. As an illustration, the diagnostics are applied to human psychology data.

14.
A new family of mixture models for the model‐based clustering of longitudinal data is introduced. The covariance structures of eight members of this new family of models are given and the associated maximum likelihood estimates for the parameters are derived via expectation–maximization (EM) algorithms. The Bayesian information criterion is used for model selection and a convergence criterion based on the Aitken acceleration is used to determine the convergence of these EM algorithms. This new family of models is applied to yeast sporulation time course data, where the models give good clustering performance. Further constraints are then imposed on the decomposition to allow a deeper investigation of the correlation structure of the yeast data. These constraints greatly extend this new family of models, with the addition of many parsimonious models. The Canadian Journal of Statistics 38:153–168; 2010 © 2010 Statistical Society of Canada
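An Aitken-acceleration stopping rule of the kind mentioned in this abstract can be sketched as follows. This is one common variant and not necessarily the exact criterion used in the paper: the function names, the default tolerance, and the choice to compare the asymptotic estimate with the latest log-likelihood value are assumptions made here.

```python
def aitken_limit(loglik):
    """Aitken estimate of the limiting log-likelihood.

    Uses the last three values of loglik, a list of log-likelihoods from
    successive EM iterations. Returns None when the estimate is undefined.
    """
    if len(loglik) < 3:
        return None
    l_prev, l_curr, l_next = loglik[-3], loglik[-2], loglik[-1]
    denom = l_curr - l_prev
    if denom == 0.0:
        return None
    # Ratio of successive log-likelihood increments.
    a = (l_next - l_curr) / denom
    if a >= 1.0:
        return None  # increments are not shrinking, no finite estimate
    return l_curr + (l_next - l_curr) / (1.0 - a)


def aitken_converged(loglik, tol=1e-6):
    """True when the Aitken estimate is within tol of the latest value."""
    l_inf = aitken_limit(loglik)
    if l_inf is None:
        # With no usable estimate, fall back to checking for an exact stall.
        return len(loglik) >= 3 and loglik[-1] == loglik[-2] == loglik[-3]
    return abs(l_inf - loglik[-1]) < tol
```

Compared with stopping when successive log-likelihoods differ by less than a tolerance, a rule of this kind is meant to avoid halting early on a slowly converging EM run, because it compares against an estimate of where the sequence is heading.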

15.
Modelling udder infection data using copula models for quadruples
We study copula models for correlated infection times in the four udder quarters of dairy cows. Both a semi-parametric and a nonparametric approach are considered to estimate the marginal survival functions, taking into account the effect of a binary udder quarter level covariate. We use a two-stage estimation approach and we briefly discuss the asymptotic behaviour of the estimators obtained in the first and the second stage of the estimation. A pseudo-likelihood ratio test is used to select an appropriate copula from the power variance copula family that describes the association between the outcomes in a cluster. We propose a new bootstrap algorithm to obtain the p-value for this test. This bootstrap algorithm also provides estimates for the standard errors of the estimated parameters in the copula. The proposed methods are applied to the udder infection data. A small simulation study for a setting similar to the setting of the udder infection data gives evidence that the proposed method provides a valid approach to select an appropriate copula within the power variance copula family.

16.
The data collection process in most observational and experimental studies yields different types of variables, leading to the use of joint models that are capable of handling multiple data types. Evaluation of various statistical techniques that have been developed for mixed data in simulated environments requires concurrent generation of multiple variables. In this article, I present an important augmentation to a unified framework proposed in our previously published work for simultaneously generating binary and nonnormal continuous data given the marginal characteristics and correlation structure, via fifth-order power polynomials that are known to extend the area covered in the skewness-elongation plane and to provide a better approximation to the probability density function of the continuous variables. I evaluate how well the improved methodology performs in comparison to the original one, in a simulated setting with illustrations of algorithmic steps. Although the relative gains for the associational quantities are not substantial, the augmented version appears to better capture the marginal quantities that are pertinent to the higher-order moments, as indicated by the very close resemblance between the specified and empirically computed quantities on average.

17.
A new method for analyzing high-dimensional categorical data, Linear Latent Structure (LLS) analysis, is presented. LLS models belong to the family of latent structure models, which are mixture distribution models constrained to satisfy the local independence assumption. LLS analysis explicitly considers a family of mixed distributions as a linear space, and LLS models are obtained by imposing linear constraints on the mixing distribution. LLS models are identifiable under modest conditions and are consistently estimable. A remarkable feature of LLS analysis is the existence of a high-performance numerical algorithm, which reduces parameter estimation to a sequence of linear algebra problems. Simulation experiments with a prototype of the algorithm demonstrated a good quality of restoration of model parameters.

18.
SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical scale-space visualization tool that allows for statistical inferences. In this paper we develop a spatial SiZer for finding significant features and conducting goodness-of-fit tests for spatially dependent images. The spatial SiZer utilizes a family of kernel estimates of the image and provides not only exploratory data analysis but also statistical inference with spatial correlation taken into account. It is also capable of comparing the observed image with a specific null model being tested by adjusting the statistical inference using an assumed covariance structure. Pixel locations having statistically significant differences between the image and a given null model are highlighted by arrows. The spatial SiZer is compared with the existing independent SiZer via the analysis of simulated data with and without signal on both planar and spherical domains. We apply the spatial SiZer method to the decadal temperature change over some regions of the Earth.

19.
We describe the analysis of some matched-pair binary data arising from a study designed to investigate whether cellular-telephone use is associated with motor-vehicle collisions. Conditional and random effects approaches to the problem are derived and compared. Driving intermittency is a potential confounder whose effect is assessed by strategic choices of the control period and by application of the bootstrap. The marked discrepancy between the conditional and random approaches merits further study.

20.
For general matched-pair data with polytomous responses in biomedical research, the Stuart–Maxwell test (Stuart, 1955, Maxwell, 1970) and the Bhapkar (1966) test are commonly used for evaluating marginal homogeneity. For data collected in clusters, we propose extensions for statistical inference without structural within-cluster correlation or distributional assumptions. In addition, two extended Obuchowski tests are proposed based on the work of Obuchowski (1998), which is generally applied to clustered matched-pair binary data. A Monte Carlo simulation study illustrates that our proposed extension to the Stuart–Maxwell test and the two extended Obuchowski tests perform well with respect to power and nominal size. Although the extended Bhapkar test is asymptotically equivalent to the other three tests, it is not recommended in practice because it is liberal with respect to the nominal size.
