Similar Documents
A total of 20 similar documents were retrieved (search time: 62 ms)
1.
The analysis of incomplete contingency tables is a practical and interesting problem. In this paper, we provide characterizations of the various missing mechanisms of a variable in terms of response and non-response odds for two- and three-dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for these tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than studying the missingness of the outcome vector. For sensitivity analysis of incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from the fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are further confirmed by simulation studies.
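As a rough, hypothetical illustration of the kind of check described in the last two sentences (the counts and the specific comparison are my own simplification, not the paper's exact procedure): under MCAR the distribution of each variable among the partially classified cases should match its distribution among the fully classified cases, so the corresponding marginal odds should roughly agree.

```python
import numpy as np

# Hypothetical incomplete 2x2 table: fully classified counts plus two
# supplementary margins where only one of the variables was recorded.
y = np.array([[120.0, 60.0],      # X by Y, both observed
              [45.0, 90.0]])
z_y = np.array([38.0, 32.0])      # only Y observed (X missing)
w_x = np.array([25.0, 30.0])      # only X observed (Y missing)

# Marginal odds of Y among fully classified counts vs. among cases with X missing.
odds_y_full, odds_y_supp = y[:, 0].sum() / y[:, 1].sum(), z_y[0] / z_y[1]
# Same comparison for X among cases with Y missing.
odds_x_full, odds_x_supp = y[0, :].sum() / y[1, :].sum(), w_x[0] / w_x[1]

print(odds_y_full, odds_y_supp)   # a large gap casts doubt on MCAR for the missingness of X
print(odds_x_full, odds_x_supp)   # likewise for the missingness of Y
```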

2.
Missing observations often occur in cross-classified data collected during observational, clinical, and public health studies. Inappropriate treatment of missing data can reduce statistical power and give biased results. This work extends the Baker, Rosenberger and DerSimonian modeling approach to compute maximum likelihood estimates of the cell counts in three-way tables with missing data, and studies the association between two dichotomous variables while controlling for a third variable in \( 2\times 2 \times K \) tables. This approach is applied to the Behavioral Risk Factor Surveillance System data. Simulation studies are used to investigate the efficiency of estimation of the common odds ratio.
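The paper's estimates come from maximum likelihood fitting of missing-data models; as a simple, fully observed point of reference for the common odds ratio in a \( 2\times 2 \times K \) table, a minimal sketch of the classical Mantel–Haenszel estimator (with hypothetical counts) might look like this:

```python
import numpy as np

def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio for a K x 2 x 2 array of counts,
    where tables[k] = [[a, b], [c, d]] for stratum k."""
    tables = np.asarray(tables, dtype=float)
    n = tables.sum(axis=(1, 2))
    num = (tables[:, 0, 0] * tables[:, 1, 1] / n).sum()
    den = (tables[:, 0, 1] * tables[:, 1, 0] / n).sum()
    return num / den

# Two illustrative strata (hypothetical counts).
strata = [[[30, 10], [20, 40]],
          [[15, 5], [10, 20]]]
print(mantel_haenszel_or(strata))   # common odds ratio pooled across strata
```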

3.
In situations where one of the variables of a contingency table has an ordered structure, recent theory involving the augmentation of singular vectors and orthogonal polynomials has been shown to be applicable for performing symmetric and non-symmetric correspondence analysis. Such an approach has the advantage of allowing the user to identify the source of variation between the categories in terms of components that reflect linear, quadratic and higher-order trends. The purpose of this paper is to focus on the study of two asymmetrically related variables cross-classified to form a two-way contingency table in which only one of the variables has an ordinal structure.

4.
In many areas of application, especially life testing and reliability, it is often of interest to estimate an unknown cumulative distribution function (cdf). A simultaneous confidence band (SCB) for the cdf can be used to assess the statistical uncertainty of the estimated cdf over the entire range of the distribution. Cheng and Iles [1983. Confidence bands for cumulative distribution functions of continuous random variables. Technometrics 25 (1), 77–86] presented an approach to construct an SCB for the cdf of a continuous random variable. For the log-location-scale family of distributions, they gave explicit forms for the upper and lower boundaries of the SCB based on expected information. In this article, we extend the work of Cheng and Iles in several directions. We study SCBs based on local information, expected information, and estimated expected information for both the “cdf method” and the “quantile method.” We also study the effects of exceptional cases where a simple SCB does not exist. We describe calibration of the bands to provide exact coverage for complete data and Type II censoring and better approximate coverage for other kinds of censoring. We also discuss how to extend these procedures to regression analysis.
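As a rough sketch of the kind of band involved (a Wald/delta-method approximation for a complete normal sample using the chi-square(2) critical value, in the spirit of the Cheng–Iles construction but not the authors' exact formulas):

```python
import numpy as np
from scipy import stats

def approx_scb_normal(x, sample, alpha=0.05):
    """Approximate simultaneous band for a normal cdf at the points x.
    Bounds z = (x - mu)/sigma via the delta method with expected information
    and the chi-square(2) critical value; complete data assumed."""
    n = len(sample)
    mu, sigma = sample.mean(), sample.std(ddof=0)      # ML estimates
    z = (np.asarray(x) - mu) / sigma
    var_z = (1.0 + z ** 2 / 2.0) / n                   # delta-method variance of z-hat
    half = np.sqrt(stats.chi2.ppf(1 - alpha, df=2) * var_z)
    return stats.norm.cdf(z - half), stats.norm.cdf(z), stats.norm.cdf(z + half)

rng = np.random.default_rng(0)
lower, fhat, upper = approx_scb_normal(np.linspace(-3, 3, 7), rng.normal(size=50))
```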

5.
The problem of selecting the best of k populations is studied for data that are incomplete because some of the values have been deleted randomly. This situation arises in extreme value analysis, where only data exceeding a threshold are observable. For increasing sample size we study the case where the probability that a value is observed tends to zero but the sparse condition is satisfied, so that the mean number of observable values in each population is bounded away from zero and infinity as the sample size tends to infinity. The incomplete data are described by thinned point processes, which are approximated by Poisson point processes. Under weak assumptions and after suitable transformations, these processes converge to a Poisson point process. Optimal selection rules for the limit model are used to construct asymptotically optimal selection rules for the original sequence of models. The results are applied to extreme value data with high thresholds.
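The Poisson approximation of the thinned counts under the sparse condition is easy to see by simulation; the sketch below is my own illustration of that limit, not the paper's selection-rule machinery.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000            # sample size per population
lam = 5.0              # sparse condition: the mean number of observed values stays bounded
p_observe = lam / n    # observation probability tends to zero as n grows

# Number of retained values in one population after independent random deletion.
kept = rng.binomial(n, p_observe, size=20_000)

ks = np.arange(12)
empirical = np.array([(kept == k).mean() for k in ks])
poisson = stats.poisson.pmf(ks, lam)
print(np.round(empirical, 4))
print(np.round(poisson, 4))       # the two rows agree closely
```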

6.
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Unlike these approaches, the proposed method can easily be used on data sets where the number of individuals is smaller than the number of variables and where the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the confidence intervals built for the quantities of interest are often narrower while still ensuring valid coverage.
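A minimal sketch of the deterministic core of such an approach, single imputation by iterative PCA, is given below; the paper's method additionally wraps this in a Bayesian treatment to generate multiple imputations, which this sketch does not attempt.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=200, tol=1e-6):
    """Single imputation by iterative PCA: alternate between a low-rank fit
    and refilling the missing entries with their reconstruction."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmean(X, axis=0), np.where(miss)[1])  # start from column means
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        converged = np.max(np.abs(recon[miss] - X[miss])) < tol
        X[miss] = recon[miss]
        if converged:
            break
    return X
```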

7.
Can we find some common principle in the three comparisons? Lacking adequate time for a thorough exploration, let me suggest that representation is that common principle. I suggested (section 4) that judgment selection of spatial versus temporal extensions distinguishes “longitudinal” local studies from “cross-section” population sampling. We had noted (section 3) that censuses are taken for detailed representation of the spatial dimension but depend on judgmental selection of the temporal. Survey sampling lacks spatial detail but is spatially representative with randomization, and it can be made timely. Periodic samples can be designed that are representative of temporal extension. Furthermore, spatial and temporal detail can be obtained either through estimation or through cumulated samples [Purcell and Kish 1979, 1980; Kish 1979b, 1981, 1986 6.6]. Registers and administrative records can have good spatial and temporal representation, but representation may be lacking in population content, and surely in representation of variables. Representation of variables, and of the relations between variables and over the population, are the issues in conflict between surveys, experiments, and observations. This is a deep subject, too deep to be explored again as it was in section 2. A final point about limits for randomization to achieve representation through sampling: randomization for selecting samples of variables is beyond me generally, because I cannot conceive of frames for defined populations of variables. Yet we can find attempts at randomized selection of variables: in the selection of items for the consumer price index, and of items for tests of IQ or of achievement. Generally I believe that randomization is the way to achieve representation without complete coverage, and that it can be applied and practised in many dimensions.

8.
It has often been complained that the standard framework of decision theory is insufficient. In most applications, neither the maximin paradigm (relying on complete ignorance about the states of nature) nor the classical Bayesian paradigm (assuming perfect probabilistic information on the states of nature) reflects the situation under consideration adequately. Typically one possesses some, but incomplete, knowledge of the stochastic behaviour of the states of nature. In this paper, first steps towards a comprehensive framework for decision making under such complex uncertainty will be provided. Common expected utility theory will be extended to interval probability, a generalized probabilistic setting that has the power to express incomplete stochastic knowledge and to take the extent of ambiguity (non-stochastic uncertainty) into account. Since two-monotone and totally monotone capacities are special cases of general interval probability, where the Choquet integral and the interval-valued expectation correspond to one another, the results also show, as a welcome by-product, how to deal efficiently with Choquet Expected Utility and how to perform a neat decision analysis in the case of belief functions.
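For concreteness, here is a small worked example of a Choquet integral with respect to a belief-function capacity on three states; the acts, utilities and masses are invented for illustration, and the paper's interval-probability framework is more general than this special case.

```python
import numpy as np

def choquet_integral(values, capacity, states):
    """Choquet integral of a finite act: sort outcomes in decreasing order and
    weight each by the capacity increment of the corresponding upper-level set."""
    order = np.argsort(values)[::-1]
    total, prev, upper = 0.0, 0.0, set()
    for idx in order:
        upper.add(states[idx])
        nu = capacity(frozenset(upper))
        total += values[idx] * (nu - prev)
        prev = nu
    return total

states = ["s1", "s2", "s3"]
masses = {frozenset({"s1"}): 0.3, frozenset({"s2"}): 0.2, frozenset({"s1", "s2", "s3"}): 0.5}
bel = lambda A: sum(m for B, m in masses.items() if B <= A)   # belief function from the masses

for name, utilities in {"act1": np.array([10.0, 4.0, 0.0]),
                        "act2": np.array([6.0, 6.0, 3.0])}.items():
    print(name, choquet_integral(utilities, bel, states))
```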

9.
We consider a fixed number of arbitrarily dependent random variables with a common symmetric marginal distribution. For each order statistic based on these variables, we determine a common optimal bound, depending in a simple way on the sample size and the rank of the order statistic, for various measures of dispersion of the order statistic, expressed in terms of the same dispersion measure of a single original variable. The dispersion measures are connected with the notion of the M-functional of the location of a random variable with respect to a symmetric and convex loss function: the measure is defined as the expected loss paid for the discrepancy between the M-functional and the variable. The most popular examples are the median absolute deviation and the variance.

10.
We present a Bayesian analysis of variance component models via simulation. In particular, we study the 2-component hierarchical design model under balanced and unbalanced experiments. Also, we consider 2-factor additive random effect models and mixed models in a cross-classified design. We assess the sensitivity of inference to the choice of prior by a sampling/resampling technique. Finally, attention is given to non-normal error distributions such as the heavy-tailed t distribution.
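A minimal Gibbs-sampler sketch for the 2-component hierarchical (one-way random effects) model is shown below; the flat prior on the grand mean and the vague inverse-gamma priors on the variance components are my assumptions, not necessarily those used in the paper.

```python
import numpy as np

def gibbs_one_way(y, groups, n_iter=5000, seed=0):
    """Gibbs sampler for y_ij = mu + a_i + e_ij with a_i ~ N(0, s2_a), e_ij ~ N(0, s2_e)."""
    rng = np.random.default_rng(seed)
    y, groups = np.asarray(y, float), np.asarray(groups)
    labels = np.unique(groups)
    idx = np.searchsorted(labels, groups)          # map each observation to its group index
    k, N = len(labels), len(y)
    n_i = np.bincount(idx)
    shape0, rate0 = 0.001, 0.001                   # vague inverse-gamma hyperparameters (assumed)
    mu, s2_a, s2_e, a = y.mean(), 1.0, 1.0, np.zeros(k)
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # group effects a_i | rest
        prec = n_i / s2_e + 1.0 / s2_a
        mean = np.bincount(idx, weights=y - mu) / s2_e / prec
        a = rng.normal(mean, np.sqrt(1.0 / prec))
        # grand mean mu | rest (flat prior)
        mu = rng.normal((y - a[idx]).mean(), np.sqrt(s2_e / N))
        # variance components | rest (inverse-gamma full conditionals)
        s2_a = 1.0 / rng.gamma(shape0 + k / 2.0, 1.0 / (rate0 + 0.5 * (a ** 2).sum()))
        resid = y - mu - a[idx]
        s2_e = 1.0 / rng.gamma(shape0 + N / 2.0, 1.0 / (rate0 + 0.5 * (resid ** 2).sum()))
        draws[t] = (mu, s2_a, s2_e)
    return draws                                   # columns: mu, s2_a, s2_e
```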

11.
The general pattern of estimated volatilities of macroeconomic and financial variables is often broadly similar. We propose two models in which conditional volatilities feature comovement and study them using U.S. macroeconomic data. The first model specifies the conditional volatilities as driven by a single common unobserved factor, plus an idiosyncratic component. We label this model BVAR with general factor stochastic volatility (BVAR-GFSV) and we show that the loss in terms of marginal likelihood from assuming a common factor for volatility is moderate. The second model, which we label BVAR with common stochastic volatility (BVAR-CSV), is a special case of the BVAR-GFSV in which the idiosyncratic component is eliminated and the loadings to the factor are set to 1 for all the conditional volatilities. Such restrictions permit a convenient Kronecker structure for the posterior variance of the VAR coefficients, which in turn permits estimating the model even with large datasets. While perhaps misspecified, the BVAR-CSV model is strongly supported by the data when compared against standard homoscedastic BVARs, and it can produce relatively good point and density forecasts by taking advantage of the information contained in large datasets.
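To make the BVAR-CSV structure concrete, here is a small simulation of the data-generating process as I read it from the abstract: a single common log-volatility with unit loadings scales a fixed innovation covariance matrix. The VAR coefficients, the AR(1) law of motion for the log-volatility and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 400, 3
B = 0.5 * np.eye(n)                       # VAR(1) coefficient matrix (illustrative)
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])       # innovation scale matrix
L = np.linalg.cholesky(Sigma)
rho, sig_h = 0.95, 0.15                   # persistence and sd of the log-volatility shock

y, h = np.zeros((T, n)), 0.0
for t in range(1, T):
    h = rho * h + sig_h * rng.normal()              # common stochastic volatility
    u = np.exp(h / 2.0) * (L @ rng.normal(size=n))  # Var(u_t) = exp(h_t) * Sigma
    y[t] = B @ y[t - 1] + u
```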

12.
Non-symmetric correspondence analysis (NSCA) is a useful technique for analysing a two-way contingency table. Frequently there is more than one predictor variable; in this paper, we consider two categorical predictor variables and one response variable. Interaction represents the joint effects of the predictor variables on the response variable. When interaction is present, the interpretation of the main effects is incomplete or misleading. To separate the main effects and the interaction term, we introduce a method that, starting from the coordinates of multiple NSCA and using a two-way analysis of variance without interaction, allows a better interpretation of the impact of the predictor variables on the response variable. The proposed method is applied to a well-known three-way contingency table of Bockenholt and Bockenholt, in which subjects are cross-classified by attitude towards abortion, number of years of education and religion. We analyse the case where the variables education and religion influence a person's attitude towards abortion.

13.
Monte Carlo methods for exact inference have recently received much attention in the analysis of complete and incomplete contingency tables. However, conventional Markov chain Monte Carlo methods, such as the Metropolis–Hastings algorithm, and importance sampling sometimes perform poorly by failing to produce valid tables. In this paper, we apply an adaptive Monte Carlo algorithm, the stochastic approximation Monte Carlo algorithm (SAMC; Liang, Liu, & Carroll, 2007), to exact goodness-of-fit tests for models of complete or incomplete contingency tables containing structural zero cells. The numerical results favor our method in terms of the quality of the estimates.

14.
In a randomized complete block design, we face the problem of selecting the best population. If some partial information about the unknown parameters is available, we wish to determine the optimal decision rule for selecting the best population.

In this paper, within the class of natural selection rules, we employ the Γ-optimal criterion to determine optimal decision rules that minimize the maximum expected risk over the class defined by the partial information. Furthermore, traditional hypothesis testing is briefly discussed from the viewpoint of ranking and selection.

15.
Numerous variable selection methods rely on a two-stage procedure, where a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables and the second stage operates on this set of candidate variables to improve estimation accuracy or to assess the uncertainty associated with the selection of variables. We advocate that more information can be conveyed from the first stage to the second: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied in the second stage. We give the example of an inference procedure that benefits greatly from the proposed transfer of information. The procedure is analyzed precisely in a simple setting, and our large-scale experiments empirically demonstrate that actual benefits can be expected in much more general situations, with sensitivity gains ranging from 50 to 100% compared to the state of the art.
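A minimal sketch of the generic transfer-of-magnitude idea (an adaptive-lasso style reweighting implemented by rescaling columns) is shown below; it is not the authors' specific inference procedure, and the penalty levels are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

def two_stage_adaptive(X, y, alpha1=0.1, alpha2=0.1):
    """Stage 1 screens with a plain lasso; stage 2 reuses the magnitudes of the
    stage-1 coefficients as an adaptive penalty (penalising |b_j| / w_j)."""
    stage1 = Lasso(alpha=alpha1).fit(X, y)
    w = np.abs(stage1.coef_) + 1e-8        # carry coefficient magnitudes forward
    stage2 = Lasso(alpha=alpha2).fit(X * w, y)
    return stage2.coef_ * w                # coefficients on the original scale
```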

16.
We consider the bandit problem with an infinite number of Bernoulli arms, of which the unknown parameters are assumed to be i.i.d. random variables with a common distribution F. Our goal is to construct optimal strategies of choosing “arms” so that the expected long-run failure rate is minimized. We first review a class of strategies and establish their asymptotic properties when F is known. Based on the results, we propose a new strategy and prove that it is asymptotically optimal when F is unknown. Finally, we show that the proposed strategy performs well for a number of simulation scenarios.
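The setting is easy to simulate; the sketch below runs a simple baseline rule (keep the current arm until it fails, then draw a fresh one), not the strategy proposed in the paper, with F taken to be Uniform(0, 1).

```python
import numpy as np

def stay_with_winner(T, sample_arm, seed=0):
    """Long-run failure rate of a 'switch on failure' rule in an
    infinite-armed Bernoulli bandit with arm parameters drawn from F."""
    rng = np.random.default_rng(seed)
    p, failures = sample_arm(rng), 0
    for _ in range(T):
        if rng.random() < p:        # success: keep the current arm
            continue
        failures += 1               # failure: discard it and draw a new arm from F
        p = sample_arm(rng)
    return failures / T

print(stay_with_winner(100_000, lambda rng: rng.random()))
```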

17.
We define a chi-squared statistic for p-dimensional data as follows. First, we transform the data to remove the correlations between the p variables. Then, we discretize each variable into groups of equal size and compute the cell counts in the resulting p-way contingency table. Our statistic is just the usual chi-squared statistic for testing independence in a contingency table. Because the cells have been chosen in a data-dependent manner, this statistic does not have the usual limiting distribution. We derive the limiting joint distribution of the cell counts and the limiting distribution of the chi-squared statistic when the data is sampled from a multivariate normal distribution. The chi-squared statistic is useful in detecting hidden structure in raw data or residuals. It can also be used as a test for multivariate normality.
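A sketch of the statistic as described (decorrelate, quantile-discretize, then the usual chi-squared statistic for independence) is given below; the appropriate limiting distribution is the non-standard one derived in the paper, so no p-value is computed here.

```python
import numpy as np
from functools import reduce

def decorrelated_chisq(X, groups=4):
    """Chi-squared statistic on decorrelated, quantile-discretized p-dimensional data."""
    n, p = X.shape
    # 1. Remove correlations: whiten with the inverse Cholesky factor of the sample covariance.
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    Z = Xc @ np.linalg.inv(L).T
    # 2. Discretize each variable into `groups` cells of (roughly) equal size.
    D = np.empty((n, p), dtype=int)
    for j in range(p):
        edges = np.quantile(Z[:, j], np.linspace(0, 1, groups + 1)[1:-1])
        D[:, j] = np.searchsorted(edges, Z[:, j])
    # 3. Cell counts of the resulting p-way contingency table.
    counts = np.zeros((groups,) * p)
    np.add.at(counts, tuple(D.T), 1)
    # 4. Usual chi-squared statistic for independence in that table.
    margins = [counts.sum(axis=tuple(k for k in range(p) if k != j)) / n for j in range(p)]
    expected = n * reduce(np.multiply.outer, margins)
    return ((counts - expected) ** 2 / expected).sum()
```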

18.
In life testing, n identical items are placed on test. Instead of conducting a complete life test with all n outcomes, a Type II censored life test, consisting of the first m outcomes, is usually employed. Although statistical analysis based on censored data is less efficient than that based on a complete life test, the expected length of the censored life test is less than that of the complete life test. In this paper, we compare censored and complete life testing and suggest ways to improve time saving and efficiency. Instead of conducting a complete life test with all n outcomes, we put N > n items on test and continue until we observe the nth outcome. With the censored life test and the complete life test containing the same number of observations, we show that the expected length of the censored life test is less than that of the complete life test, and that the censored life test may also be more efficient than the complete life test with the same number of observations.
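For exponential lifetimes the comparison of expected lengths follows directly from the spacing identity E[X_(r:N)] = θ Σ_{i=1}^{r} 1/(N − i + 1); the numbers below (n = 10 observed failures, N = 15 items on test) are purely illustrative.

```python
import numpy as np

def expected_duration(r, N, mean_life=1.0):
    """Expected time of the r-th failure among N exponential items on test."""
    i = np.arange(1, r + 1)
    return mean_life * np.sum(1.0 / (N - i + 1))

n = 10
print(expected_duration(n, n))    # complete test with n items: ~2.93 mean lifetimes
print(expected_duration(n, 15))   # first n failures out of N = 15 items: ~1.03 mean lifetimes
```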

19.
The multilevel approach can be a fruitful methodological framework in which to formulate the micro-macro relationships existing between individuals and their contexts. Usually, place of residence is taken as a proxy for context. But individuals can be classified at the same level in more than one way. For example, not only may place of residence be relevant, but birthplace, household or working relations may also be taken into account. Contextual effects can be better identified if multiple classifications are considered simultaneously. In this sense, the data do not have a purely hierarchical structure but a cross-classified one, and it becomes very important to establish whether the resulting structure affects the covariance structure of the data. In this paper, some critical issues arising from the application of multilevel modelling are discussed, and multilevel cross-classified models are proposed as more flexible tools for studying contextual effects. A multilevel cross-classified model is specified to evaluate simultaneously the effects of women's place of birth and women's current place of residence on the choice of bearing a second child by Italian women in the mid-1990s.

20.
This article evaluates the usefulness of a nonparametric approach to Bayesian inference by presenting two applications. Our first application considers an educational choice problem. We focus on obtaining a predictive distribution for earnings corresponding to various levels of schooling. This predictive distribution incorporates the parameter uncertainty, so that it is relevant for decision making under uncertainty in the expected utility framework of microeconomics. The second application is to quantile regression. Our point here is to examine the potential of the nonparametric framework to provide inferences without relying on asymptotic approximations. Unlike in the first application, the standard asymptotic normal approximation turns out not to be a good guide.
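One standard nonparametric Bayesian device of this kind is Rubin's Bayesian bootstrap; the sketch below applies it to a regression coefficient via weighted least squares (for the quantile-regression application, that step would be replaced by a weighted quantile-regression fit). This illustrates the general framework, not necessarily the authors' exact setup.

```python
import numpy as np

def bayesian_bootstrap_ols(X, y, n_draws=2000, seed=0):
    """Posterior draws for regression coefficients via the Bayesian bootstrap:
    Dirichlet(1, ..., 1) weights on the observations, weighted least squares per draw."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_draws, X.shape[1]))
    for d in range(n_draws):
        w = rng.dirichlet(np.ones(len(y)))
        Xw = X * w[:, None]
        draws[d] = np.linalg.solve(X.T @ Xw, Xw.T @ y)   # solves (X'WX) b = X'Wy
    return draws
```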
