Similar articles
20 similar articles found.
1.
Let $X_1, \dots, X_n$ be mutually independent non-negative integer-valued random variables with probability mass functions $f_i(x) > 0$ for $x = 0, 1, \dots$. Let $E$ denote the event $\{X_1 \ge X_2 \ge \dots \ge X_n\}$. This note shows that, conditional on the event $E$, $X_i - X_{i+1}$ and $X_{i+1}$ are independent for all $i = 1, \dots, k$ if and only if the $X_i$ ($i = 1, \dots, k$) are geometric random variables, where $1 \le k \le n-1$. The $k$ geometric distributions can have different parameters $\theta_i$, $i = 1, \dots, k$.
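
For the smallest case, $n = 2$ and $k = 1$, the "if" direction can be checked by hand. The derivation below is only a sketch: it assumes the parameterization $f_1(x) = (1-\theta_1)\theta_1^x$ with $0 < \theta_1 < 1$, which the abstract does not state, and $D = X_1 - X_2$ is shorthand introduced here.

```latex
\begin{align*}
P(D = d,\ X_2 = y,\ E) &= f_1(y + d)\, f_2(y) \\
 &= (1-\theta_1)\,\theta_1^{\,y+d}\, f_2(y) \\
 &= \theta_1^{\,d} \cdot \bigl[(1-\theta_1)\,\theta_1^{\,y} f_2(y)\bigr],
 \qquad d,\, y = 0, 1, \dots
\end{align*}
```

The joint mass factors into a function of $d$ times a function of $y$ over a product range, so $D$ and $X_2$ are independent given $E$, whatever $f_2$ is.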

2.
Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model.

3.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the interleukin-6 genes is considered.

4.
k-POD: A Method for k-Means Clustering of Missing Data   (Total citations: 1; self-citations: 0; citations by others: 1)
The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.

[Received November 2014. Revised August 2015.]
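
The idea of clustering without first deleting or imputing rows can be sketched as a fill-and-recluster loop. The code below is a simplified illustration, not the authors' implementation. It assumes missing entries are encoded as `None`, that every column has at least one observed value, and that starting centroids are supplied by the caller. The function name `kpod_sketch` and the parameter `init_centroids` are hypothetical. It also interleaves a single k-means step with each re-fill, where a fuller version would rerun k-means to convergence on each filled matrix.

```python
def kpod_sketch(rows, init_centroids, n_iter=20):
    """Cluster rows (lists with None for missing entries); return (labels, centroids)."""
    n, p, k = len(rows), len(rows[0]), len(init_centroids)

    # Initial fill: replace each missing entry with its column's observed mean.
    col_means = []
    for j in range(p):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[r[j] if r[j] is not None else col_means[j] for j in range(p)]
              for r in rows]

    centroids = [list(c) for c in init_centroids]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x[j] - centroids[c][j]) ** 2 for j in range(p)))
            for x in filled
        ]
        # Update step: each centroid becomes the mean of its filled members.
        for c in range(k):
            members = [filled[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(m[j] for m in members) / len(members)
                                for j in range(p)]
        # Re-fill step: missing entries take the value of the assigned centroid.
        for i, r in enumerate(rows):
            for j in range(p):
                if r[j] is None:
                    filled[i][j] = centroids[labels[i]][j]
    return labels, centroids
```

Because a re-filled entry sits exactly on its own centroid, a row with missing values tends to stay in the cluster it was first assigned to, so the result depends on the starting centroids. In practice one would try several starts and keep the best fit on the observed entries.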

5.
Markov Chain Monte Carlo (MCMC) is the most common method used in multiple imputation. However, it is not unbiased when it is applied to imputations of categorical variables. The literature has considered the problem for binary variables with only two levels. In this article, we consider more general situations. We not only evaluate the bias associated with the imputation of categorical variables using the MCMC method, but also introduce a method to correct the bias. A simulation study is conducted and an application is provided to demonstrate the advantages of using the correction factors proposed in this article.

6.
Through random cut‐points theory, the author extends inference for ordered categorical data to the unspecified continuum underlying the ordered categories. He shows that a random cut‐point Mann‐Whitney test yields slightly smaller p‐values than the conventional test for most data. However, when at least P% of the data lie in one of the k categories (with P = 80 for k = 2, P = 67 for k = 3,…, P = 18 for k = 30), he also shows that the conventional test can yield much smaller p‐values, and hence misleadingly liberal inference for the underlying continuum. The author derives formulas for exact tests; for k = 2, the Mann‐Whitney test is but a binomial test.

7.
Missing data are a recurring problem that can cause bias or lead to inefficient analyses. Development of statistical methods to address missingness has been actively pursued in recent years, including imputation, likelihood and weighting approaches. Each approach is more complicated when there are many patterns of missing values, or when both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available. We review these routines in the context of a motivating example from a large health services research dataset. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible to incorporate partially observed values, and these methods should be utilized in practice.

8.
Data which are grouped and truncated are considered. We are given numbers $n_1 < \dots < n_k = n$ and we observe $X_{(n_i)}$, $i = 1, \dots, k$, and the total number of observations available, $N$ ($> n_k$), is unknown. If the underlying distribution has one unknown parameter $\theta$ which enters as a scale parameter, we examine the form of the equations for the conditional, unconditional and modified maximum likelihood estimators of $\theta$ and $N$ and examine when these estimators will be finite and unique. We also develop expressions for asymptotic bias and search for modified estimators which minimize the maximum asymptotic bias. These results are specialized to the exponential distribution. Methods of computing the solutions to the likelihood equations are also discussed.

9.
A sequence of independent random variables $\{Z_n : n \ge 1\}$ with unknown probability distributions is considered and the problem of estimating their expectations $\{M_{n+1} : n \ge 1\}$ is examined. The estimation of $M_{n+1}$ is based on a finite set $\{z_k : 1 \le k \le n\}$, each $z_k$ being an observed value of $Z_k$, $1 \le k \le n$, and also based on the assumption that $\{M_n : n \ge 1\}$ follows an unknown trend of a specified form.

10.
It is quite a challenge to develop model‐free feature screening approaches for missing response problems because the existing standard missing data analysis methods cannot be applied directly to the high-dimensional case. This paper develops some novel methods by borrowing information from missingness indicators such that any feature screening procedure for ultrahigh‐dimensional covariates with full data can be applied to the missing response case. The first method is the so‐called missing indicator imputation screening, which is developed by proving that the set of the active predictors of interest for the response is a subset of the active predictors for the product of the response and missingness indicator under some mild conditions. As an alternative, another method, called the Venn diagram‐based approach, is also developed. The sure screening property is proven for both methods. It is shown that the complete case analysis can also keep the sure screening property of any feature screening approach with sure screening property.

11.
The aim of this study is to assign weights $w_1, \dots, w_m$ to $m$ clustering variables $Z_1, \dots, Z_m$, so that the $k$ groups uncovered reveal more meaningful within-group coherence. We propose a new criterion to be minimized, which is the sum of the weighted within-cluster sums of squares and the penalty for the heterogeneity in variable weights $w_1, \dots, w_m$. We present the computing algorithm for such k-means clustering, a working procedure to determine a suitable value of the penalty constant, and numerical examples, of which one is simulated and the other two are real.

12.
ABSTRACT

Suppose independent random samples are available from $k$ ($k \ge 2$) exponential populations $\Pi_1, \dots, \Pi_k$ with a common location $\theta$ and scale parameters $\sigma_1, \dots, \sigma_k$, respectively. Let $X_i$ and $Y_i$ denote the minimum and the mean, respectively, of the $i$th sample, and further let $X = \min\{X_1, \dots, X_k\}$ and $T_i = Y_i - X$; $i = 1, \dots, k$. For selecting a nonempty subset of $\{\Pi_1, \dots, \Pi_k\}$ containing the best population (the one associated with $\max\{\sigma_1, \dots, \sigma_k\}$), we use the decision rule which selects $\Pi_i$ if $T_i \ge c \max\{T_1, \dots, T_k\}$, $i = 1, \dots, k$. Here $0 < c \le 1$ is chosen so that the probability of including the best population in the selected subset is at least $P^*$ ($1/k \le P^* < 1$), a pre-assigned level. The problem is to estimate the average worth $W$ of the selected subset, the arithmetic average of means of selected populations. In this article, we derive the uniformly minimum variance unbiased estimator (UMVUE) of $W$. The bias and risk function of the UMVUE are compared numerically with those of analogs of the best affine equivariant estimator (BAEE) and the maximum likelihood estimator (MLE).

13.
ABSTRACT

In this article, we consider a $(k + 1)n$-dimensional elliptically contoured random vector $(X_1^T, X_2^T, \dots, X_k^T, Z^T)^T = (X_{11}, \dots, X_{1n}, \dots, X_{k1}, \dots, X_{kn}, Z_1, \dots, Z_n)^T$ and derive the distribution of concomitants of multivariate order statistics arising from $X_1, X_2, \dots, X_k$. In particular, we derive a mixture representation for concomitants of bivariate order statistics. The joint distribution of the concomitants of bivariate order statistics is also obtained. Finally, the usefulness of our result is illustrated by a real-life data set.

14.
Caren Hasler, Yves Tillé. Statistics, 2016, 50(6): 1310-1331
Random imputation is an interesting class of imputation methods to handle item nonresponse because it tends to preserve the distribution of the imputed variable. However, such methods amplify the total variance of the estimators because values are imputed at random. This increase in variance is called imputation variance. In this paper, we propose a new random hot-deck imputation method that is based on the k-nearest neighbour methodology. It replaces the missing value of a unit with the observed value of a similar unit. Calibration and balanced sampling are applied to minimize the imputation variance. Moreover, our proposed method provides triple protection against nonresponse bias. This means that if at least one out of three specified models holds, then the resulting total estimator is unbiased. Finally, our approach allows the user to perform consistency edits and to impute simultaneously.
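
The basic donor step described above, replacing a missing value with the observed value of a similar unit drawn at random, might look like the following. This is a minimal sketch only: it leaves out the calibration and balanced-sampling steps that the abstract says are used to control imputation variance. It also assumes a single numeric auxiliary variable `x` observed for every unit, with `None` marking item nonresponse in `y`. The function name `knn_random_hot_deck` is hypothetical.

```python
import random


def knn_random_hot_deck(x, y, k=3, seed=0):
    """Impute each missing y[i] with the y of a random donor among the k nearest in x."""
    rng = random.Random(seed)
    # Donors are the respondents, i.e. units with an observed y.
    donors = [i for i, value in enumerate(y) if value is not None]
    imputed = list(y)
    for i, value in enumerate(y):
        if value is None:
            # The k respondents closest to unit i on the auxiliary variable.
            nearest = sorted(donors, key=lambda d: abs(x[d] - x[i]))[:k]
            # Copy the observed value of one of them, chosen uniformly at random.
            imputed[i] = y[rng.choice(nearest)]
    return imputed
```

Every imputed value is one that was actually observed, which is why methods of this kind tend to preserve the distribution of the imputed variable. The random draw among neighbours is also the source of the extra imputation variance.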

15.
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations has frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect.

16.
A RENEWAL THEOREM IN MULTIDIMENSIONAL TIME   (Total citations: 1; self-citations: 0; citations by others: 1)
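Let $Y_1, Y_2, \dots$ be i.i.d., positive, integer-valued random variables with mean $\mu$. Let the sequences $\{Y_{ij},\ j = 1, 2, \dots\}$, $i = 1, \dots, r$, be independent copies of $\{Y_1, Y_2, \dots\}$. For $n = \{n_1, \dots, n_r\}$, $n_i \ge 1$, let $S_n = \sum_{k_1=1}^{n_1} \cdots \sum_{k_r=1}^{n_r} Y_{1k_1} \cdots Y_{rk_r}$. We show that $\sum_{k=1}^{N} \sum_{n_1 \ge 1} \cdots \sum_{n_r \ge 1} P[S_n = k] \sim \mu^{-r} N \log^{r-1}(N)/(r-1)!$ as $N \to \infty$.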

17.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have recently been used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study based on the cancer-neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods, the Peter-Clark algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket set of the variable to be imputed; and finally, our new proposed method, which applies the whole available knowledge of all variables of interest, comprising the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new proposed method, especially in the presence of high missingness percentages and more connected networks. The new method has also been shown to be more efficient than the MICE method for small sample sizes with high missing rates.

18.
Let $X_1, X_2, \dots, X_n$ be independent exponential random variables with $X_i$ having failure rate $\lambda_i$ for $i = 1, \dots, n$. Denote by $D_{i:n} = X_{i:n} - X_{i-1:n}$ the $i$th spacing of the order statistics $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$, $i = 1, \dots, n$, where $X_{0:n} \equiv 0$. It is shown that if $\lambda_{n+1} \le [\ge]\ \lambda_k$ for $k = 1, \dots, n$ then $D_{n:n} \le_{lr} D_{n+1:n+1}$ and $D_{1:n} \le_{lr} D_{2:n+1}$ [$D_{2:n+1} \le_{lr} D_{2:n}$], and that if $\lambda_i + \lambda_j \ge \lambda_k$ for all distinct $i$, $j$, and $k$ then $D_{n-1:n} \le_{lr} D_{n:n}$ and $D_{n:n+1} \le_{lr} D_{n:n}$, where $\le_{lr}$ denotes the likelihood ratio order. We also prove that $D_{1:n} \le_{lr} D_{2:n}$ for $n \ge 2$ and $D_{2:3} \le_{lr} D_{3:3}$ for all $\lambda_i$'s.

19.
This paper investigates tail behavior of the randomly weighted sum $\sum_{k=1}^{n} \theta_k X_k$ and reaches an asymptotic formula, where $X_k$, $1 \le k \le n$, are real-valued linearly wide quadrant-dependent (LWQD) random variables with a common heavy-tailed distribution, and $\theta_k$, $1 \le k \le n$, independent of $X_k$, $1 \le k \le n$, are $n$ non-negative random variables without any dependence assumptions. The LWQD structure includes the linearly negative quadrant-dependent structure, the negatively associated structure, and hence the independence structure. On the other hand, it also includes some positively dependent random variables and some other random variables. The obtained result coincides with the existing ones.

20.
Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables, and let $Y_n$, $n = K, K + 1, K + 2, \dots$ be the corresponding backward moving average of order $K$. At epoch $n \ge K$, the process $Y_n$ will be off target by the input $X_n$ if it exceeds a threshold. By introducing a two-state Markov chain, we define a level of significance $(1 - a)\%$ to be the percentage of times that the moving average process stays on target. We establish a technique to evaluate, or estimate, a threshold, to guarantee that $\{Y_n\}$ will stay $(1 - a)\%$ of times on target, for a given $(1 - a)\%$. It is proved that if the distribution of the inputs is exponential or normal, then the threshold will be a linear function in the mean of the distribution of inputs $\mu_X$. The slope and intercept of the line, in each case, are specified. It is also observed that for the gamma inputs, the threshold is merely linear in the reciprocal of the scale parameter. These linear relationships can be easily applied to estimate the desired thresholds by samples from the inputs.
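
The linear dependence on the mean can be illustrated numerically for exponential inputs. The sketch below is not the article's Markov-chain construction. It simply takes the threshold to be the empirical quantile of the simulated moving average that is exceeded a fraction `a` of the time, which is an assumption made here for illustration. The function name `moving_average_threshold` and the default values are hypothetical, and `a` is expressed as a fraction rather than a percentage.

```python
import random


def moving_average_threshold(mean, K=5, n=20000, a=0.05, seed=1):
    """Empirical (1 - a) quantile of the order-K backward moving average
    of n simulated exponential inputs with the given mean."""
    rng = random.Random(seed)
    # expovariate takes the rate, i.e. the reciprocal of the mean.
    xs = [rng.expovariate(1.0 / mean) for _ in range(n)]
    # Y_n averages the K most recent inputs, defined from epoch K onward.
    ys = [sum(xs[i - K + 1:i + 1]) / K for i in range(K - 1, n)]
    ys.sort()
    return ys[int((1 - a) * len(ys))]


t1 = moving_average_threshold(1.0)
t2 = moving_average_threshold(2.0)
print(t1, t2, t2 / t1)
```

With a shared seed the inputs for mean 2 are the inputs for mean 1 multiplied by 2, so the estimated threshold doubles as well. This is the scale-family behaviour one would expect behind a threshold that is linear in $\mu_X$ for exponential inputs.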
