Similar documents
20 similar documents found; search took 31 ms.
1.
Strict collapsibility and model collapsibility are two important concepts associated with reducing the dimension of a multidimensional contingency table without losing the relevant information. In this paper, we obtain necessary and sufficient conditions for the strict collapsibility of the full model, with respect to an interaction factor or a set of interaction factors, based on the interaction parameters of the conditional/layer log-linear models. For hierarchical log-linear models, we also present necessary and sufficient conditions for the full model to be model collapsible, based on the conditional interaction parameters. We treat both the case where a single variable is conditioned on and the case where a set of variables is conditioned on. The connections between strict collapsibility and model collapsibility are also pointed out. Our results are illustrated through suitable examples, including a real-life application.
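As a minimal illustration of what collapsibility over a layer variable means, the hypothetical sketch below builds a 2x2x2 table in which the layer variable Z is independent of (X, Y), a classic sufficient condition, and checks that the conditional (layer) odds ratios coincide with the marginal odds ratio after summing over Z. The counts are invented for illustration.

```python
# Sketch: checking collapsibility of a 2x2x2 contingency table over a
# layer variable Z via odds ratios.  The counts are hypothetical; Z is
# generated independently of (X, Y), a classic sufficient condition for
# the X-Y association to be collapsible over Z.

def odds_ratio(t):
    """Odds ratio of a 2x2 table t[x][y]."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

# n[x][y][z]: counts; here n factorises as m[x][y] * w[z]
m = [[10, 20], [30, 40]]
w = [1, 2]
n = [[[m[x][y] * w[z] for z in (0, 1)] for y in (0, 1)] for x in (0, 1)]

# conditional (layer) odds ratios, one per level of Z
conditional = [odds_ratio([[n[x][y][z] for y in (0, 1)] for x in (0, 1)])
               for z in (0, 1)]

# marginal odds ratio after collapsing (summing) over Z
marginal_table = [[sum(n[x][y]) for y in (0, 1)] for x in (0, 1)]
marginal = odds_ratio(marginal_table)

print(conditional, marginal)  # all three agree: the table is collapsible over Z
```

If Z were instead associated with both X and Y, the conditional and marginal odds ratios would generally differ, which is exactly the failure of collapsibility that the conditions in the paper characterize.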

2.
We discuss the issue of dimensionality reduction in multinomial logistic models as problems arising in variable selection, collapsibility of responses and linear restrictions in the parameter matrix. A method using the information theoretic criterion suggested by Bai, Krishnaiah and Zhao, a variation of the Akaike information criterion, is used to estimate the rank of the parameter matrix. The same procedure is used for the selection of variables and collapsibility of response categories. The strong consistency of this procedure is established in all the problems.

3.
Abstract. Necessary and sufficient conditions for collapsibility of a directed acyclic graph (DAG) model for a contingency table are derived. By applying the conditions, we can easily check collapsibility over any variable in a given model either by using the joint probability distribution or by using the graph of the model structure. It is shown that collapsibility over a set of variables can be checked in a sequential manner. Furthermore, a DAG is compared with its moral graph in the context of collapsibility.

4.
Abstract. In this paper, we consider two kinds of collapsibility, that is, the model‐collapsibility and the estimate‐collapsibility, of conditional graphical models for multidimensional contingency tables. We show that these two definitions are equivalent, and propose a necessary and sufficient condition for them in terms of the interaction graph, which allows the collapsibility to be characterized and judged intuitively and conveniently.

5.
In this paper, we discuss several concepts in causal inference in terms of the causal diagrams proposed by Pearl (1993, 1995a, b), and we give conditions for non-confounding, homogeneity and collapsibility of causal effects without knowledge of a completely constructed causal diagram. We first introduce the concepts of non-confounding, conditional non-confounding, uniform non-confounding, homogeneity, collapsibility and strong collapsibility for causal effects; we then present necessary and sufficient conditions for uniform non-confounding, homogeneity and collapsibility; and finally we give sufficient conditions for non-confounding, conditional non-confounding and uniform non-confounding.

6.
We develop simple necessary and sufficient conditions for a hierarchical log linear model to be strictly collapsible in the sense defined by Whittemore (1978). We then show that collapsibility as defined by Asmussen & Edwards (1983) can be viewed as equivalent to collapsibility as defined by Whittemore (1978) and illustrate why Bishop, Fienberg, & Holland's (1975, p.47) conditions for collapsibility are sufficient but not necessary. Finally, we discuss how collapsibility facilitates interpretation of certain hierarchical log linear models and formulation of hypotheses concerning marginal distributions associated with multidimensional contingency tables.

7.
Abstract. The Yule–Simpson paradox notes that an association between random variables can be reversed when averaged over a background variable. Cox and Wermuth introduced the concept of distribution dependence between two random variables X and Y, and gave two dependence conditions, each of which guarantees that reversal of qualitatively similar conditional dependences cannot occur after marginalizing over the background variable. Ma, Xie and Geng studied the uniform collapsibility of distribution dependence over a background variable W under a stronger homogeneity condition. Collapsibility ensures that associations are the same in the conditional and marginal models. In this article, we use the notion of average collapsibility, which requires only that the conditional effects average over the background variable to the corresponding marginal effect, and investigate conditions for it to hold for distribution dependence and for quantile regression coefficients.
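The reversal the Yule–Simpson paradox describes is easy to exhibit numerically. The sketch below uses hypothetical success counts: within each level of a background variable W the treated group has the higher success rate, yet after pooling over W the direction of the comparison flips.

```python
# Sketch: a numerical instance of the Yule-Simpson paradox with
# hypothetical counts.  Within each level of the background variable W
# the treated group has the higher success rate, yet after marginalising
# over W the direction of the association reverses.

# (successes, trials) for each arm within each level of W
strata = {
    0: {"treated": (90, 100),   "control": (800, 1000)},
    1: {"treated": (300, 1000), "control": (20, 100)},
}

def rate(s, n):
    return s / n

# conditional comparison: treated beats control in every stratum
for w, arms in strata.items():
    assert rate(*arms["treated"]) > rate(*arms["control"])

# marginal comparison: pooling over W reverses the direction
pooled = {a: tuple(map(sum, zip(*(strata[w][a] for w in strata))))
          for a in ("treated", "control")}
print(rate(*pooled["treated"]), rate(*pooled["control"]))
# treated now looks worse marginally, although it is better in each stratum
assert rate(*pooled["treated"]) < rate(*pooled["control"])
```

The reversal is driven by W being associated with both the arm and the outcome; homogeneity-type conditions of the kind discussed in the abstract rule out exactly this configuration.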

8.
Rapid technological advances have resulted in continual changes in data acquisition and reporting processes. While such advances have benefited research in these areas, the changing technologies have, at the same time, created difficulty for statistical analysis by generating outdated data which are incompatible with data based on newer technology. Relationships between these incompatible variables are complicated; not only are they stochastic, but they often depend on other variables as well, rendering even a simple statistical analysis, such as estimation of a population mean, difficult in the presence of mixed data formats. Thus, technological advancement has brought forth, from the statistical perspective, the methodological problem of analyzing newer data together with outdated data. In this paper, we discuss general principles for addressing the statistical issues related to the analysis of incompatible data. The approach taken has three desirable properties: it is readily understood, since it builds upon a linear regression setting; it is flexible, allowing for data incompatibility in either the response or the covariate; and it is not computationally intensive. In addition, inferences may be made for a latent variable of interest. Our consideration of this problem is motivated by the analysis of delta wave counts, as a surrogate for sleep disorder, in the sleep laboratory of the Department of Psychiatry, University of Pittsburgh Medical Center, where two major changes occurred in the acquisition of these data, resulting in three mixed formats. By developing appropriate methods for addressing this issue, we provide statistical advancement that is compatible with technological advancement.

9.
We discuss a general application of categorical data analysis to mutations along the HIV genome. We consider a multidimensional table for several positions at the same time. Due to the complexity of the multidimensional table, we may collapse it by pooling some categories. However, the association between the remaining variables may not be the same as before collapsing. We discuss the collapsibility of tables and the change in the meaning of parameters after collapsing categories. We also address this problem with a log-linear model. We present a parameterization with the consensus output as the reference cell as is appropriate to explain genomic mutations in HIV. We also consider five null hypotheses and some classical methods to address them. We illustrate methods for six positions along the HIV genome, through consideration of all triples of positions.

10.
In many areas of application, especially life testing and reliability, it is often of interest to estimate an unknown cumulative distribution function (cdf). A simultaneous confidence band (SCB) for the cdf can be used to assess the statistical uncertainty of the estimated cdf over the entire range of the distribution. Cheng and Iles [1983. Confidence bands for cumulative distribution functions of continuous random variables. Technometrics 25 (1), 77–86] presented an approach to construct an SCB for the cdf of a continuous random variable. For the log-location-scale family of distributions, they gave explicit forms for the upper and lower boundaries of the SCB based on expected information. In this article, we extend the work of Cheng and Iles (1983) in several directions. We study the SCBs based on local information, expected information, and estimated expected information for both the “cdf method” and the “quantile method.” We also study the effects of exceptional cases where a simple SCB does not exist. We describe calibration of the bands to provide exact coverage for complete data and type II censoring and better approximate coverage for other kinds of censoring. We also discuss how to extend these procedures to regression analysis.

11.
In this paper we discuss a new theoretical basis for perturbation methods. In developing this new theoretical basis, we define the ideal measures of data utility and disclosure risk. Maximum data utility is achieved when the statistical characteristics of the perturbed data are the same as that of the original data. Disclosure risk is minimized if providing users with microdata access does not result in any additional information. We show that when the perturbed values of the confidential variables are generated as independent realizations from the distribution of the confidential variables conditioned on the non-confidential variables, they satisfy the data utility and disclosure risk requirements. We also discuss the relationship between the theoretical basis and some commonly used methods for generating perturbed values of confidential numerical variables.

12.
Cut-off sampling consists of deliberately excluding a set of units from possible selection in a sample, for example if the contribution of the excluded units to the total is small or if the inclusion of these units in the sample involves high costs. If the characteristics of interest of the excluded units differ from those of the rest of the population, the use of naïve estimators may result in highly biased estimates. In this paper, we discuss the use of auxiliary information to reduce the bias by means of calibration and balanced sampling. We show that the use of the available auxiliary information related to both the variable of interest and the probability of being excluded enables us to reduce the potential bias. A short numerical study supports our findings.

13.
Using a multivariate latent variable approach, this article proposes some new general models for analyzing correlated bounded continuous and categorical (nominal and/or ordinal) responses, with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses, motivated by the analysis of data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that bounded continuous responses take the place of some of the continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. A sensitivity analysis is also performed, studying the influence of small perturbations of the parameters of the missing mechanisms of the model via the maximal normal curvature. The proposed models are applied to two data sets: the BMI, steatosis and osteoporosis data, and the Tehran household expenditure budgets.

14.
Collapsibility with respect to a measure of association implies that the measure of association can be obtained from the marginal model. We first discuss model collapsibility and collapsibility with respect to regression coefficients for linear regression models. For parallel regression models, we give simple and different proofs of some of the known results and also obtain certain new results. For random coefficient regression models, we define (average) A-collapsibility and obtain conditions under which it holds. We also consider Poisson regression and logistic regression models, and derive conditions for collapsibility and A-collapsibility, respectively. These results generalize some of the results available in the literature. Suitable examples are also discussed.

15.
In the analysis of time‐to‐event data, competing risks occur when multiple event types are possible, and the occurrence of a competing event precludes the occurrence of the event of interest. In this situation, statistical methods that ignore competing risks can result in biased inference regarding the event of interest. We review the mechanisms that lead to bias and describe several statistical methods that have been proposed to avoid bias by formally accounting for competing risks in the analyses of the event of interest. Through simulation, we illustrate that Gray's test should be used in lieu of the logrank test for nonparametric hypothesis testing. We also compare the two most popular models for semiparametric modelling: the cause‐specific hazards (CSH) model and Fine‐Gray (F‐G) model. We explain how to interpret estimates obtained from each model and identify conditions under which the estimates of the hazard ratio and subhazard ratio differ numerically. Finally, we evaluate several model diagnostic methods with respect to their sensitivity to detect lack of fit when the CSH model holds, but the F‐G model is misspecified and vice versa. Our results illustrate that adequacy of model fit can strongly impact the validity of statistical inference. We recommend analysts incorporate a model diagnostic procedure and contingency to explore other appropriate models when designing trials in which competing risks are anticipated.
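To make the competing-risks quantity concrete, the sketch below computes a nonparametric cumulative incidence function on hypothetical, uncensored data. With no censoring, the cumulative incidence of event type k at time t is simply the proportion of subjects experiencing an event of type k by t; treating competing events as censored (the naive 1 − Kaplan–Meier approach) would overstate this quantity.

```python
# Sketch: nonparametric cumulative incidence under competing risks,
# using small hypothetical data with no censoring.  Cause 1 is the
# event of interest; cause 2 is the competing event.

times  = [1, 2, 2, 3, 4, 5, 6, 7]   # event times
causes = [1, 1, 2, 2, 1, 2, 1, 2]   # event type for each subject

def cuminc(t, k):
    """Proportion of subjects with an event of type k by time t."""
    n = len(times)
    return sum(1 for ti, ci in zip(times, causes) if ti <= t and ci == k) / n

print(cuminc(4, 1))                  # 3 of 8 subjects had the event of interest by t = 4
print(cuminc(4, 1) + cuminc(4, 2))   # cause-specific incidences sum to the all-cause incidence
```

With censoring, the same idea generalizes to the Aalen–Johansen estimator, and Gray's test compares cumulative incidence curves between groups.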

16.
In the literature, assuming independence of the random variables X and Y, statistical estimation of the stress–strength parameter R = P(X > Y) has been intensively investigated. However, in some real applications, the strength variable X can be highly dependent on the stress variable Y. In this paper, unlike the common practice in the literature, we discuss estimation of the parameter R in the more realistic setting where X and Y are dependent random variables following a bivariate Rayleigh model. We derive the Bayes estimates and highest posterior density credible intervals of the parameters using suitable priors. Because the Bayes estimates have no closed form, we use an approximation based on Laplace's method and a Markov chain Monte Carlo technique to obtain the Bayes estimates of R and the unknown parameters. Finally, simulation studies are conducted to evaluate the performance of the proposed estimators, and analyses of two data sets are provided.
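The following is a minimal Monte Carlo sketch of estimating R = P(X > Y) when X and Y are dependent. The bivariate model below (a shared latent Gaussian component) is hypothetical and merely stands in for the bivariate Rayleigh model of the paper; the point is only that R must be estimated from paired draws rather than from two independent samples.

```python
# Sketch: Monte Carlo estimation of the stress-strength parameter
# R = P(X > Y) with dependent X and Y.  The dependence mechanism here
# (a shared latent component) is a hypothetical stand-in for the
# bivariate Rayleigh model discussed in the abstract.

import random

random.seed(0)

def draw_pair():
    z = random.gauss(0, 1)            # shared component induces dependence
    x = 1.0 + z + random.gauss(0, 1)  # strength, shifted upward
    y = z + random.gauss(0, 1)        # stress
    return x, y

n = 100_000
r_hat = sum(x > y for x, y in (draw_pair() for _ in range(n))) / n
print(r_hat)  # X - Y ~ N(1, 2), so the true R is about 0.76
```

In a Bayesian treatment, the same R would instead be computed from posterior draws of the model parameters, e.g. via MCMC, rather than from draws at fixed parameter values.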

17.
Recently, non‐uniform sampling has been suggested in microscopy to increase efficiency. More precisely, probability proportional to size (PPS) sampling has been introduced, where the probability of sampling a unit in the population is proportional to the value of an auxiliary variable. In the microscopy application, the sampling units are fields of view, and the auxiliary variables are easily observed approximations to the variables of interest. Unfortunately, some auxiliary variables often vanish, that is, are zero‐valued; consequently, part of the population is inaccessible under PPS sampling. We propose a modification of the design based on a stratification idea, for which an optimal solution can be found using a model‐assisted approach. The new optimal design also applies to the case where ‘vanish’ refers to missing auxiliary variables, and it is of independent interest in sampling theory. We verify the robustness of the new approach with numerical results, and we use real data to illustrate its applicability.
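A small sketch with hypothetical data shows both PPS sampling and the problem of vanishing auxiliary values: a unit whose auxiliary value is zero has inclusion probability zero and can never be sampled, so a design-based estimator is silently biased for its contribution. This is exactly the gap the stratified modification is meant to close.

```python
# Sketch: Poisson PPS sampling with a Horvitz-Thompson-style total.
# Each unit is included independently with probability proportional to
# a hypothetical auxiliary size variable; the last unit's auxiliary
# value is zero, so it is inaccessible under pure PPS sampling.

import random

random.seed(1)

y   = [10, 20, 30, 40, 50]     # variable of interest
aux = [1, 2, 3, 4, 0]          # auxiliary sizes; last unit "vanishes"

expected_n = 2                 # expected sample size
total_aux = sum(aux)
pi = [expected_n * a / total_aux for a in aux]   # inclusion probabilities

# Poisson sampling: include unit i with probability pi[i]
sample = [i for i in range(len(y)) if random.random() < pi[i]]

# Horvitz-Thompson estimator of the population total of y
ht_total = sum(y[i] / pi[i] for i in sample)
print(sample, ht_total)
# Unit 4 (aux = 0) never appears, so the estimator targets only the
# total over units with pi > 0 and misses y[4] = 50 entirely.
```

The Horvitz–Thompson estimator is unbiased only for the subpopulation with positive inclusion probabilities, which is why the zero-valued auxiliary units need separate handling (e.g. a take-some stratum of their own).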

18.
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
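As background to the multi-arm setting of the paper, the two-arm CACE under one-sided non-compliance has a particularly simple form: with no always-takers, the intention-to-treat effect divided by the compliance rate gives the effect among compliers (the Wald / instrumental-variable estimator). The numbers below are hypothetical.

```python
# Sketch: the classic two-arm CACE estimator under one-sided
# non-compliance, with hypothetical summary statistics.  With no
# always-takers, CACE = (ITT effect on the outcome) / (proportion of
# compliers in the treatment arm).

# treatment arm: mean outcome 10.0, and 80% actually took the treatment
# control arm:   mean outcome  8.0, with no access to the treatment
mean_treat_arm, mean_control_arm = 10.0, 8.0
compliance_rate = 0.8

itt = mean_treat_arm - mean_control_arm   # intention-to-treat effect: 2.0
cace = itt / compliance_rate              # effect among compliers: 2.5
print(cace)
```

The multi-arm extension discussed in the abstract is harder precisely because principal compliance classes multiply and the analogous ratio identities no longer pin down the estimands without extra structural assumptions.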

19.
A substantial fraction of statistical analyses, and in particular of statistical computing, is done under the heading of multiple linear regression, that is, the fitting of equations to multivariate data using the least squares technique for estimating parameters. The optimality properties of these estimates are described in an ideal setting which is not often realized in practice.

Frequently, we do not have "good" data, in the sense that the errors are non-normal or the variance is non-homogeneous. The data may contain outliers or extremes which are not easily detectable; the variables may not enter in the proper functional form; and the linearity assumption may not hold.

Prior to the mid-sixties, regression programs provided just the basic least squares computations, plus possibly a step-wise algorithm for variable selection. The increased interest in regression, prompted by dramatic improvements in computers, has led to a vast literature describing alternatives to least squares, improved variable selection methods, and extensive diagnostic procedures.

The purpose of this paper is to summarize and illustrate some of these recent developments. In particular, we review some of the potential problems with regression data, discuss the statistics and techniques used to detect these problems, and consider some of the proposed solutions. An example is presented to illustrate the effectiveness of these diagnostic methods in revealing such problems and the potential consequences of employing the proposed methods.
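Two of the standard diagnostics alluded to above, leverage (hat values) and internally studentized residuals, can be sketched by hand for simple linear regression. The small data set below is hypothetical and contains one deliberate outlier.

```python
# Sketch: leverage and internally studentized residuals for simple
# linear regression on hypothetical data with one outlier.  For simple
# regression, h_ii = 1/n + (x_i - xbar)^2 / Sxx, and the studentized
# residual is e_i / sqrt(s^2 * (1 - h_ii)).

import math

x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]   # last point is an outlier
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar                                # least squares fit

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - 2)            # residual variance
hat = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]   # leverage
rstud = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, hat)]

print(max(rstud))  # the outlier yields the largest studentized residual
```

Points with large leverage or large studentized residuals are the candidates the diagnostic literature flags for closer inspection; robust alternatives to least squares address the same problems by downweighting such points automatically.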

20.
Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号