Similar literature
 20 similar documents found (search time: 109 ms)
1.
Researchers are increasingly using the standardized difference to compare the distribution of baseline covariates between treatment groups in observational studies. Standardized differences were initially developed in the context of comparing the mean of continuous variables between two groups. However, in medical research, many baseline covariates are dichotomous. In this article, we explore the utility and interpretation of the standardized difference for comparing the prevalence of dichotomous variables between two groups. We examined the relationship between the standardized difference and three quantities: the maximal difference in the prevalence of the binary variable between the two groups; the relative risk relating the prevalence of the binary variable in one group to the prevalence in the other group; and the phi coefficient measuring the correlation between treatment group and the binary variable. We found that a standardized difference of 10% (or 0.1) is equivalent to having a phi coefficient of 0.05 (indicating negligible correlation) for the correlation between treatment group and the binary variable.
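The link between the standardized difference and the phi coefficient can be checked numerically. The sketch below is not taken from the article. It assumes the usual definition of the standardized difference for a binary covariate, d = (p1 - p2) / sqrt([p1(1 - p1) + p2(1 - p2)] / 2), and two treatment groups of equal size. The function names and the prevalences are illustrative.

```python
import math


def standardized_difference(p1: float, p2: float) -> float:
    """Standardized difference between two prevalences (assumed definition)."""
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    return (p1 - p2) / math.sqrt(pooled_var)


def phi_equal_groups(p1: float, p2: float) -> float:
    """Phi coefficient between group and covariate, assuming equal group sizes."""
    p_bar = (p1 + p2) / 2  # overall prevalence when the groups are the same size
    return (p1 - p2) / (2 * math.sqrt(p_bar * (1 - p_bar)))


# Hypothetical prevalences chosen to give a standardized difference near 0.1.
d = standardized_difference(0.50, 0.45)
phi = phi_equal_groups(0.50, 0.45)
print(f"standardized difference = {d:.3f}, phi = {phi:.3f}")
```

Under these assumptions phi is close to half the standardized difference when the two prevalences are close, which is consistent with the 0.1 versus 0.05 correspondence reported in the abstract.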

2.
Random error in a continuous outcome variable does not affect its regression on a predictor. However, when a continuous outcome variable is dichotomised, random measurement error results in a flatter exposure-response relationship with a higher intercept. Although this consequence is similar to the effect of misclassification in a binary outcome variable, it cannot be corrected using techniques appropriate for binary data. Conditional distributions of the measurements of the continuous outcome variable can be corrected if the reliability coefficient of the measurements can be estimated. An unbiased estimate of the exposure-response relationship is then easily calculated. This procedure is demonstrated using data on the relationship between smoking and the development of airway obstruction.
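The flattening effect can be illustrated with a small calculation. This is a sketch under assumptions that are not stated in the abstract: the true outcome is beta * x plus normal noise with standard deviation sigma, the measurement error is independent normal noise with standard deviation tau, and the outcome is dichotomised at a cutoff above the baseline mean. All names and numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_above_cutoff(x: float, beta: float, cutoff: float,
                      sigma: float, tau: float = 0.0) -> float:
    """P(measured outcome > cutoff | x) with measurement-error SD tau."""
    total_sd = math.sqrt(sigma**2 + tau**2)
    return normal_cdf((beta * x - cutoff) / total_sd)


beta, cutoff, sigma, tau = 1.0, 2.0, 1.0, 1.0  # hypothetical values

# Probability of exceeding the cutoff at x = 0, without and with error.
baseline_true = prob_above_cutoff(0.0, beta, cutoff, sigma)
baseline_noisy = prob_above_cutoff(0.0, beta, cutoff, sigma, tau)

# Slope of the exposure-response curve on the probit scale.
slope_true = beta / sigma
slope_noisy = beta / math.sqrt(sigma**2 + tau**2)

print(f"baseline: {baseline_true:.4f} (true) vs {baseline_noisy:.4f} (with error)")
print(f"probit slope: {slope_true:.3f} (true) vs {slope_noisy:.3f} (with error)")
```

In this setup the probit slope shrinks by the square root of the reliability coefficient sigma^2 / (sigma^2 + tau^2), which is why an estimate of that coefficient allows the relationship to be corrected.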

3.
To study the relationship between a sensitive binary response variable and a set of non-sensitive covariates, this paper develops a hidden logistic regression to analyse non-randomized response data collected via the parallel model originally proposed by Tian (2014). This is the first paper to employ logistic regression analysis in the field of non-randomized response techniques. Both the Newton–Raphson algorithm and a monotone quadratic lower bound algorithm are developed to derive the maximum likelihood estimates of the parameters of interest. In particular, the proposed logistic parallel model can be used to study the association between a sensitive binary variable and another non-sensitive binary variable via the odds ratio. Simulations are performed, and a study of data on sexual practices in the United States is used to illustrate the proposed methods.

4.
Multiplicative-interaction (M-I) logit models are proposed for three-way I×J×2 contingency tables where the third variable constitutes a binary response. Models are derived by assigning unknown scores to the categories and forming product interactions from them. Asymptotic results under special sampling constraints are derived for maximum likelihood estimates and the goodness-of-fit statistics. The class of models proposed in this paper is found to be useful when no obvious scores are available. An example is included.

5.
This article provides a method of interpreting a surprising inequality in multiple linear regression: the squared multiple correlation can be greater than the sum of the simple squared correlations between the response variable and each of the predictor variables. The interpretation is obtained via principal component analysis by studying the influence of some components with small variance on the response variable. One example is used as an illustration and some conclusions are derived.
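The inequality is easy to reproduce with two predictors. The sketch below uses the standard formula for the squared multiple correlation in terms of pairwise correlations. The correlation values are hypothetical, and the example does not reproduce the article's principal component interpretation.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of y on x1 and x2 from pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Hypothetical correlations: x2 is uncorrelated with y but correlated with x1.
r_y1, r_y2, r_12 = 0.5, 0.0, 0.6
r2 = r_squared_two_predictors(r_y1, r_y2, r_12)
sum_simple = r_y1**2 + r_y2**2
print(f"R^2 = {r2:.4f}, sum of simple r^2 = {sum_simple:.4f}")
```

Here the second predictor has no marginal correlation with the response, yet including it raises the squared multiple correlation above the sum of the simple squared correlations.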

6.
This study considers a fully-parametric but uncongenial multiple imputation (MI) inference to jointly analyze incomplete binary response variables observed in correlated data settings. The multiple imputation model is specified as a fully-parametric model based on a multivariate extension of mixed-effects models. Dichotomized imputed datasets are then analyzed using joint GEE models in which covariates are associated with the marginal mean of responses through response-specific regression coefficients, and a Kronecker product is used to accommodate both the cluster-specific correlation structure for a given response variable and the correlation structure between multiple response variables. The validity of the proposed MI-based JGEE (MI-JGEE) approach is assessed through a Monte Carlo simulation study under different scenarios. The simulation results, which are evaluated in terms of bias, mean-squared error, and coverage rate, show that MI-JGEE has promising inferential properties even when the underlying multiple imputation is misspecified. Finally, Adolescent Alcohol Prevention Trial data are used for illustration.

7.
In this paper we use non-parametric local polynomial methods to estimate the regression function, m(x). Y may be a binary or continuous response variable, and X is continuous with non-uniform density. The main contributions of this paper are the weak convergence of a bandwidth process for kernels of order (0, k), k = 2j, j ≥ 1, and the proposal of a local data-driven bandwidth selection method which is particularly beneficial for the case when X is not distributed uniformly. This selection method minimizes estimates of the asymptotic MSE and estimates the bias portion in an innovative way which relies on the order of the kernel and not estimation of m2(x) directly. We show that utilization of this method results in the achievement of the optimal asymptotic MSE by the estimator, i.e. the method is efficient. Simulation studies are provided which illustrate the method for both binary and continuous response cases.

8.
Latent variable models are widely used for joint modeling of mixed data including nominal, ordinal, count and continuous data. In this paper, we consider a latent variable model for jointly modeling relationships between mixed binary, count and continuous variables with some observed covariates. We assume that, given a latent variable, the mixed variables of interest are independent and that the count and continuous variables have a Poisson distribution and a normal distribution, respectively. As such data may be extracted from different subpopulations, unobserved heterogeneity has to be taken into account. A mixture distribution is considered for the distribution of the latent variable, which accounts for the heterogeneity. The generalized EM algorithm, which uses the Newton–Raphson algorithm inside the EM algorithm, is used to compute the maximum likelihood estimates of the parameters. The standard errors of the maximum likelihood estimates are computed by using the supplemented EM algorithm. Analysis of the primary biliary cirrhosis data is presented as an application of the proposed model.

9.
The distribution of the mean of a random sample drawn from a skew-normal population was derived by Chen et al. (2004). Here, we consider a hierarchical structure and derive the distribution of the sample mean when the location parameter itself is a random variable with a normal distribution. In neurotoxicological bioassay experiments with laboratory animals, often the response of interest is continuous in nature and the mean of responses is used for inferential purposes (Chen, 2006). However, in developmental neurotoxicity experiments where the neurological effect of a compound on the developing fetus is of interest, because of the intra-litter correlation, the mean of the response distribution may vary from one litter to another. The unconditional distribution of the litter sample mean is derived and its application in the analysis of data from developmental neurotoxicology is described. An example with real experimental data is used to provide further illustration.

10.
We consider data with a nominal grouping variable and a binary response variable. The grouping variable is measured without error, but the response variable is measured using a fallible device subject to misclassification. To achieve model identifiability, we use the double-sampling scheme, which requires obtaining a subsample of the original data or another independent sample. This sample is then classified by both the fallible device and another infallible device regarding the response variable. We propose two Wald tests for testing the association between the two variables and illustrate the tests using traffic data. The Type-I error rate and power of the tests are examined using simulations, and a modified Wald test is recommended.

11.
Count data with structural zeros are common in public health applications. There is considerable research on zero-inflated models, such as zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, for such zero-inflated count data when used as a response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored, which may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor if it is observed. However, structural zeros are often not observed in practice, in which case no statistical method is available to address the bias issue. This paper aims to fill this methodological gap by developing parametric methods, based on the maximum likelihood approach, to model zero-inflated count data when used as predictors. The response variable can be any type of data including continuous, binary, count or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of this new approach when the sample size is small to moderate. A real data example is also used to demonstrate the application of this method.

12.
Measurement error is a commonly addressed problem in psychometrics and the behavioral sciences, particularly where gold standard data either do not exist or are too expensive. The Bayesian approach can be utilized to adjust for the bias that results from measurement error in tests. Bayesian methods offer other practical advantages for the analysis of epidemiological data, including the possibility of incorporating relevant prior scientific information and the ability to make inferences that do not rely on large sample assumptions. In this paper we consider a logistic regression model where both the response and a binary covariate are subject to misclassification. We assume both a continuous measure and a binary diagnostic test are available for the response variable but no gold standard test is assumed available. We consider a fully Bayesian analysis that affords such adjustments, accounting for the sources of error and correcting estimates of the regression parameters. Based on the results from our example and simulations, the models that account for misclassification produce more statistically significant results than the models that ignore misclassification. A real data example on math disorders is considered.

13.
Stratified randomization based on the baseline value of the primary analysis variable is common in clinical trial design. We illustrate from a theoretical viewpoint the advantage of such a stratified randomization to achieve balance of the baseline covariate. We also conclude that the estimator for the treatment effect is consistent when including both the continuous baseline covariate and the stratification factor derived from the baseline covariate. In addition, the analysis of covariance model including both the continuous covariate and the stratification factor is asymptotically no less efficient than including either only the continuous baseline value or only the stratification factor. We recommend that the continuous baseline covariate should generally be included in the analysis model. The corresponding stratification factor may also be included in the analysis model if one is not confident that the relationship between the baseline covariate and the response variable is linear. In spite of the above recommendation, one should always carefully examine relevant historical data to pre-specify the most appropriate analysis model for a prospective study.

14.
In a regression context, the dichotomization of a continuous outcome variable is often motivated by the need to express results in terms of the odds ratio, as a measure of association between the response and one or more risk factors. Starting from the recent work of Moser and Coombs (Stat Med 23:1843–1860, 2004), in this article we explore in a mixed model framework the possibility of obtaining odds ratio estimates from a linear regression model without the need to dichotomize the response variable. It is shown that the odds ratio estimators derived from a linear mixed model outperform those from a binomial generalized linear mixed model, especially when the data exhibit high levels of heterogeneity.

15.
This paper introduces a Markov model in Phase II profile monitoring with an autocorrelated binary response variable. In the proposed approach, a logistic regression model is extended to describe the within-profile autocorrelation. The likelihood function is constructed and then a particle swarm optimization (PSO) algorithm is tuned and utilized to estimate the model parameters. Furthermore, two control charts are extended in which the covariance matrix is derived based on the Fisher information matrix. Simulation studies are conducted to evaluate the detection capability of the proposed control charts. A numerical example is also given to illustrate the application of the proposed method.

16.
A common application of multilevel models is to apportion the variance in the response according to the different levels of the data. Whereas partitioning variances is straightforward in models with a continuous response variable with a normal error distribution at each level, the extension of this partitioning to models with binary responses or to proportions or counts is less obvious. We describe methodology due to Goldstein and co-workers for apportioning variance that is attributable to higher levels in multilevel binomial logistic models. They referred to this partitioning as the variance partition coefficient. We consider extending the variance partition coefficient concept to data sets when the response is a proportion and where the binomial assumption may not be appropriate owing to overdispersion in the response variable. Using the literacy data from the 1991 Indian census we estimate simple and complex variance partition coefficients at multiple levels of geography in models with significant overdispersion and thereby establish the relative importance of different geographic levels that influence educational disparities in India.
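For orientation, one widely used way to compute a variance partition coefficient (VPC) in a two-level random-intercept logistic model is the latent-variable formulation, which treats the level-1 variance as that of the standard logistic distribution, pi^2 / 3. The sketch below assumes that formulation. The abstract does not say which of the available approaches the paper uses, and the variance values are hypothetical.

```python
import math


def vpc_latent_logistic(sigma2_u: float) -> float:
    """VPC for a two-level random-intercept logistic model (latent-variable form)."""
    level1_var = math.pi**2 / 3  # variance of the standard logistic distribution
    return sigma2_u / (sigma2_u + level1_var)


for sigma2_u in (0.25, 1.0, 4.0):  # hypothetical higher-level variances
    print(f"sigma2_u = {sigma2_u}: VPC = {vpc_latent_logistic(sigma2_u):.3f}")
```

This simple form has no term for overdispersion, which is the case the abstract sets out to handle, so it should be read only as the baseline that the paper extends.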

17.
Teratological experiments are controlled dose-response studies in which impregnated animals are randomly assigned to various exposure levels of a toxic substance. Subsequently, both continuous and discrete responses are recorded on the litters of fetuses that these animals produce. Discrete responses are usually binary in nature, such as the presence or absence of some fetal anomaly. Such clustered binary data usually exhibit over-dispersion (or under-dispersion), which can be interpreted as either variation between litter response probabilities or intralitter correlation. To model the correlation and/or variation, the beta-binomial distribution has been assumed for the number of positive fetal responses within a litter. Although the mean of the beta-binomial model has been linked to dose-response functions, in terms of measuring over-dispersion, it may be a restrictive method in modeling data from teratological studies. Also, for certain toxins, a threshold effect has been observed in the dose-response pattern of the data. We propose to incorporate a random effect into a general threshold dose-response model to account for the variation in responses, while at the same time estimating the threshold effect. We fit this model to a well-known data set in the field of teratology. Simulation studies are performed to assess the validity of the random effects threshold model in these types of studies.
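The two interpretations of over-dispersion mentioned above can be made concrete with the moments of the beta-binomial distribution. The sketch below uses the standard parameterisation with shape parameters a and b, for which the intralitter correlation is 1 / (a + b + 1). The litter size and parameter values are hypothetical and are not taken from the paper.

```python
def beta_binomial_moments(n: int, a: float, b: float) -> tuple[float, float, float]:
    """Mean, variance and intralitter correlation for Beta-Binomial(n, a, b)."""
    p = a / (a + b)        # mean response probability across litters
    rho = 1 / (a + b + 1)  # intralitter correlation
    mean = n * p
    variance = n * p * (1 - p) * (1 + (n - 1) * rho)
    return mean, variance, rho


# Hypothetical litter of 10 fetuses with a mean response probability of 0.25.
n, a, b = 10, 2.0, 6.0
mean, variance, rho = beta_binomial_moments(n, a, b)
p = a / (a + b)
binomial_variance = n * p * (1 - p)  # variance if fetuses responded independently
print(f"mean = {mean:.2f}, variance = {variance:.3f}, rho = {rho:.3f}")
print(f"binomial variance = {binomial_variance:.3f}")
```

The factor 1 + (n - 1) * rho is the inflation of the binomial variance caused by the intralitter correlation, and it reduces to 1 when rho is 0.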

18.
This article provides a strategy to identify the existence and direction of a causal effect in a generalized nonparametric and nonseparable model identified by instrumental variables. The causal effect concerns how the outcome depends on the endogenous treatment variable. The outcome variable, treatment variable, other explanatory variables, and the instrumental variable can be essentially any combination of continuous, discrete, or “other” variables. In particular, it is not necessary to have any continuous variables, none of the variables need to have large support, and the instrument can be binary even if the corresponding endogenous treatment variable and/or outcome is continuous. The outcome can be mismeasured or interval-measured, and the endogenous treatment variable need not even be observed. The identification results are constructive, and can be empirically implemented using standard estimation results.

19.
We present a scalable Bayesian modelling approach for identifying brain regions that respond to a certain stimulus and use them to classify subjects. More specifically, we deal with multi-subject electroencephalography (EEG) data with a binary response distinguishing between alcoholic and control groups. The covariates are matrix-variate with measurements taken from each subject at different locations across multiple time points. EEG data have a complex structure with both spatial and temporal attributes. We use a divide-and-conquer strategy and build separate local models, that is, one model at each time point. We employ Bayesian variable selection approaches using a structured continuous spike-and-slab prior to identify the locations that respond to a certain stimulus. We incorporate the spatio-temporal structure through a Kronecker product of the spatial and temporal correlation matrices. We develop a highly scalable estimation algorithm, using likelihood approximation, to deal with the large number of parameters in the model. Variable selection is done via clustering of the locations based on their duration of activation. We use scoring rules to evaluate the prediction performance. Simulation studies demonstrate the efficiency of our scalable algorithm in terms of estimation and fast computation. We present results using our scalable approach on a case study of multi-subject EEG data.

20.
In this paper a new family of test statistics is presented for testing the independence between the binary response Y and an ordered categorical explanatory variable X (doses) against the alternative hypothesis of an increasing dose-response relationship between Y and X. The properties of these test statistics are studied. This new family of test statistics is based on the family of φ-divergence measures and contains as a particular case the likelihood ratio test. We pay special attention to the family of test statistics associated with the power divergence family. A simulation study is included in order to analyze the behavior of the power divergence family of test statistics.
