1.

A CHARACTERIZATION OF GEOMETRIC DISTRIBUTIONS THROUGH CONDITIONAL INDEPENDENCE

TaChen Liang N. Balakrishnan 《Australian & New Zealand Journal of Statistics》1993,35(2):225-228

Let X₁,…, X_n be mutually independent non-negative integer-valued random variables with probability mass functions f_i(x) > 0 for x = 0,1,…. Let E denote the event {X₁ ≥ X₂ ≥ … ≥ X_n}. This note shows that, conditional on the event E, X_i − X_{i+1} and X_{i+1} are independent for all i = 1,…, k if and only if X_i (i = 1,…, k) are geometric random variables, where 1 ≤ k ≤ n−1. The k geometric distributions can have different parameters θ_i, i = 1,…, k.
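
The "if" direction of this characterization can be checked by hand in the simplest case n = 2, k = 1. The sketch below assumes the parameterisation f₁(x) = θ₁(1 − θ₁)^x on x = 0,1,…, which is a choice made here for illustration and is not stated in the abstract; f₂ is left arbitrary.

```latex
% Assumption: X_1 is geometric with pmf f_1(x) = \theta_1 (1-\theta_1)^x, x = 0, 1, \dots
% E = \{X_1 \ge X_2\}; d and y range over the non-negative integers.
\begin{align*}
P(X_1 - X_2 = d,\ X_2 = y \mid E)
  &= \frac{P(X_1 = y + d,\ X_2 = y)}{P(E)}
   = \frac{f_1(y+d)\, f_2(y)}{P(E)} \\
  &= \frac{\theta_1 (1-\theta_1)^{d} \,\cdot\, (1-\theta_1)^{y} f_2(y)}{P(E)} .
\end{align*}
```

The conditional joint probability mass function is a product of a function of d alone and a function of y alone, so X₁ − X₂ and X₂ are independent given E, whatever the form of f₂. The converse direction, that independence forces the geometric form, is the substantive part of the note and is not reproduced here.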

2.

Sequential imputation for models with latent variables assuming latent ignorability

Lauren J. Beesley Jeremy M. G. Taylor Roderick J. A. Little 《Australian & New Zealand Journal of Statistics》2019,61(2):213-233

Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model.

3.

Latent class based multiple imputation approach for missing categorical data

Mulugeta Gebregziabher Stacia M. DeSantis 《Journal of statistical planning and inference》2010

In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered.

4.

k-POD: A Method for k-Means Clustering of Missing Data (total citations: 1; self-citations: 0; citations by others: 1)

Jocelyn T. Chi Eric C. Chi Richard G. Baraniuk 《The American statistician》2013,67(1):91-99

The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.

[Received November 2014. Revised August 2015.]
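
One plausible way to extend k-means to a data matrix with holes is sketched below: alternate between filling each missing entry with the matching coordinate of the row's current centroid and running a Lloyd-style k-means update on the filled data. This is an illustrative sketch only, not the authors' published algorithm; the function name `kmeans_missing`, the column-mean initial fill, the farthest-point initialisation, and the single Lloyd update per pass are all assumptions made here.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_missing(rows, k, n_iter=20):
    """Cluster rows (lists of floats, None = missing) into k groups.

    Hypothetical helper: each pass assigns rows to the nearest centroid,
    recomputes centroids, then overwrites every missing entry with the
    matching coordinate of its row's centroid. Assumes every column has
    at least one observed value.
    """
    n, p = len(rows), len(rows[0])
    # Initial fill with observed column means (an assumption of this sketch).
    col_means = []
    for j in range(p):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if r[j] is None else r[j] for j in range(p)]
              for r in rows]
    # Deterministic farthest-point initialisation (also an assumption).
    centroids = [list(filled[0])]
    while len(centroids) < k:
        far = max(filled, key=lambda x: min(sqdist(x, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: nearest centroid on the filled data.
        for i, x in enumerate(filled):
            labels[i] = min(range(k), key=lambda c: sqdist(x, centroids[c]))
        # Update step: centroid = mean of its members (kept if empty).
        for c in range(k):
            members = [filled[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Re-fill step: only originally missing entries are overwritten.
        for i, r in enumerate(rows):
            for j in range(p):
                if r[j] is None:
                    filled[i][j] = centroids[labels[i]][j]
    return labels, centroids, filled
```

Observed entries are never altered, so the input `rows` can be compared with the returned `filled` matrix to see which values were supplied by the procedure. No missingness model or external information is used, which mirrors the setting described in the abstract.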

5.

Correction of Bias in Imputing Missing Values of Categorical Variables

Ruiguang Song Kathleen McDavid Harrison Debra L. Hanson H. Irene Hall 《Communications in Statistics - Theory and Methods》2013,42(2):350-362

Markov Chain Monte Carlo (MCMC) is the most common method used in multiple imputation. However, it is not unbiased when it is applied to imputations of categorical variables. The literature has considered the problem for binary variables with only two levels. In this article, we consider more general situations. We not only evaluate the bias associated with the imputation of categorical variables using the MCMC method, but also introduce a method to correct the bias. A simulation study is conducted and an application is provided to demonstrate the advantages of using the correction factors proposed in this article.

6.

Implications of random cut‐points theory for the Mann‐Whitney and binomial tests

Michael D. deB. Edwardes 《Revue canadienne de statistique》2000,28(2):427-438

Through random cut‐points theory, the author extends inference for ordered categorical data to the unspecified continuum underlying the ordered categories. He shows that a random cut‐point Mann‐Whitney test yields slightly smaller p‐values than the conventional test for most data. However, when at least P% of the data lie in one of the k categories (with P = 80 for k = 2, P = 67 for k = 3,…, P = 18 for k = 30), he also shows that the conventional test can yield much smaller p‐values, and hence misleadingly liberal inference for the underlying continuum. The author derives formulas for exact tests; for k = 2, the Mann‐Whitney test is but a binomial test.

7.

Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models (total citations: 1; self-citations: 0; citations by others: 1)

Horton NJ Kleinman KP 《The American statistician》2007,61(1):79-90

Missing data are a recurring problem that can cause bias or lead to inefficient analyses. Development of statistical methods to address missingness has been actively pursued in recent years, including imputation, likelihood and weighting approaches. Each approach is more complicated when there are many patterns of missing values, or when both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available. We review these routines in the context of a motivating example from a large health services research dataset. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible to incorporate partially observed values, and these methods should be utilized in practice.

8.

Estimating with percentile grouped and truncated data from a scale parameter distribution

Saul Blumenthal 《Communications in Statistics - Theory and Methods》2013,42(11):3607-3628

Data that are grouped and truncated are considered. We are given numbers n₁ < … < n_k = n and we observe X_{(n_i)}, i = 1,…, k, and the total number of observations available, N (> n_k), is unknown. If the underlying distribution has one unknown parameter θ which enters as a scale parameter, we examine the form of the equations for the conditional, unconditional and modified maximum likelihood estimators of θ and N and examine when these estimators will be finite and unique. We also develop expressions for asymptotic bias and search for modified estimators which minimize the maximum asymptotic bias. These results are specialized to the exponential distribution. Methods of computing the solutions to the likelihood equations are also discussed.

9.

Sequential Estimation of Expectations in the Presence of Trend

Radu Theodorescu Hans Wolff 《Australian & New Zealand Journal of Statistics》1981,23(2):196-203

A sequence of independent random variables {Z_n: n ≥ 1} with unknown probability distributions is considered and the problem of estimating their expectations {M_{n+1}: n ≥ 1} is examined. The estimation of M_{n+1} is based on a finite set {z_k: 1 ≤ k ≤ n}, each z_k being an observed value of Z_k, 1 ≤ k ≤ n, and also based on the assumption that {M_n: n ≥ 1} follows an unknown trend of a specified form.

10.

How to Make Model‐free Feature Screening Approaches for Full Data Applicable to the Case of Missing Response?

《Scandinavian Journal of Statistics》2018,45(2):324-346

It is quite a challenge to develop model‐free feature screening approaches for missing response problems because the existing standard missing data analysis methods cannot be applied directly to the high‐dimensional case. This paper develops some novel methods by borrowing information from the missingness indicators such that any feature screening procedure for ultrahigh‐dimensional covariates with full data can be applied to the missing response case. The first method is the so‐called missing indicator imputation screening, which is developed by proving that the set of the active predictors of interest for the response is a subset of the active predictors for the product of the response and missingness indicator under some mild conditions. As an alternative, another method called the Venn diagram‐based approach is also developed. The sure screening property is proven for both methods. It is shown that the complete case analysis can also keep the sure screening property of any feature screening approach with sure screening property.

11.

Weighting variables in K-means clustering

Myung-Hoe Huh 《Journal of applied statistics》2009,36(1):67-78

The aim of this study is to assign weights w₁, …, w_m to m clustering variables Z₁, …, Z_m, so that the k groups uncovered reveal more meaningful within-group coherence. We propose a new criterion to be minimized, which is the sum of the weighted within-cluster sums of squares and the penalty for the heterogeneity in variable weights w₁, …, w_m. We present the computing algorithm for such k-means clustering, a working procedure to determine a suitable value of the penalty constant, and numerical examples, among which one is simulated and the other two are real.

12.

Estimating Average Worth of the Selected Subset from Two-Parameter Exponential Populations

Aditi Kar Gangopadhyay Somesh Kumar 《Communications in Statistics - Theory and Methods》2013,42(12):2257-2267

Suppose independent random samples are available from k (k ≥ 2) exponential populations ∏₁,…,∏_k with a common location θ and scale parameters σ₁,…,σ_k, respectively. Let X_i and Y_i denote the minimum and the mean, respectively, of the ith sample, and further let X = min{X₁,…, X_k} and T_i = Y_i − X; i = 1,…, k. For selecting a nonempty subset of {∏₁,…,∏_k} containing the best population (the one associated with max{σ₁,…,σ_k}), we use the decision rule which selects ∏_i if T_i ≥ c max{T₁,…,T_k}, i = 1,…, k. Here 0 < c ≤ 1 is chosen so that the probability of including the best population in the selected subset is at least P* (1/k ≤ P* < 1), a pre-assigned level. The problem is to estimate the average worth W of the selected subset, the arithmetic average of means of selected populations. In this article, we derive the uniformly minimum variance unbiased estimator (UMVUE) of W. The bias and risk function of the UMVUE are compared numerically with those of analogs of the best affine equivariant estimator (BAEE) and the maximum likelihood estimator (MLE).

13.

Concomitants of multivariate order statistics from multivariate elliptical distributions

Roohollah Roozegar Ahad Jamalizadeh Alireza Nematollahi 《Communications in Statistics - Theory and Methods》2013,42(3):722-738

In this article, we consider a (k + 1)n-dimensional elliptically contoured random vector (X₁^T, X₂^T, …, X_k^T, Z^T)^T = (X₁₁, …, X_{1n}, …, X_{k1}, …, X_{kn}, Z₁, …, Z_n)^T and derive the distribution of the concomitant of multivariate order statistics arising from X₁, X₂, …, X_k. Specifically, we derive a mixture representation for the concomitant of bivariate order statistics. The joint distribution of the concomitant of bivariate order statistics is also obtained. Finally, the usefulness of our result is illustrated by a real-life data set.

14.

Balanced k-nearest neighbour imputation

Caren Hasler Yves Tillé 《Statistics》2016,50(6):1310-1331

Random imputation is an interesting class of imputation methods to handle item nonresponse because it tends to preserve the distribution of the imputed variable. However, such methods amplify the total variance of the estimators because values are imputed at random. This increase in variance is called imputation variance. In this paper, we propose a new random hot-deck imputation method that is based on the k-nearest neighbour methodology. It replaces the missing value of a unit with the observed value of a similar unit. Calibration and balanced sampling are applied to minimize the imputation variance. Moreover, our proposed method provides triple protection against nonresponse bias. This means that if at least one out of three specified models holds, then the resulting total estimator is unbiased. Finally, our approach allows the user to perform consistency edits and to impute simultaneously.
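
The basic idea of replacing a missing value with the observed value of a similar unit can be sketched as a plain random k-nearest-neighbour hot deck. The sketch below is a simplified illustration and is not the balanced method of the paper: it omits the calibration and balanced-sampling steps entirely, uses a single auxiliary variable with absolute distance, and the function name `knn_hot_deck` is hypothetical.

```python
import random


def knn_hot_deck(x, y, k=3, seed=0):
    """Impute missing y-values (None) by a random k-nearest-neighbour hot deck.

    x : auxiliary variable, observed for every unit
    y : study variable, None for nonrespondents
    For each nonrespondent, one donor is drawn at random from its k nearest
    respondents (nearest in x), and the donor's observed y-value is copied.
    """
    rng = random.Random(seed)
    donors = [i for i, value in enumerate(y) if value is not None]
    if not donors:
        raise ValueError("at least one respondent is required")
    imputed = list(y)
    for i, value in enumerate(y):
        if value is None:
            # The k respondents closest to unit i on the auxiliary variable.
            nearest = sorted(donors, key=lambda d: abs(x[d] - x[i]))[:k]
            imputed[i] = y[rng.choice(nearest)]
    return imputed
```

Because each imputed value is an actually observed value, the imputed variable keeps a realistic distribution; the random choice of donor is also the source of the imputation variance that the abstract sets out to reduce.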

15.

A comparison of various software tools for dealing with missing data via imputation

《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations has frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect.

16.

A RENEWAL THEOREM IN MULTIDIMENSIONAL TIME (total citations: 1; self-citations: 0; citations by others: 1)

Charles Hagwood 《Australian & New Zealand Journal of Statistics》1989,31(1):130-137

Let Y₁, Y₂,… be i.i.d., positive, integer-valued random variables with mean μ. Let the sequences {Y_{ij}, j = 1,2,…}, i = 1,…, r be independent copies of {Y₁, Y₂,…}. For n = {n₁,…, n_r}, n_i ≥ 1, let S_n = ∑_{k₁=1}^{n₁} … ∑_{k_r=1}^{n_r} Y_{1k₁} … Y_{rk_r}. We show that ∑_{k=1}^{N} ∑_{n₁=1}^{∞} … ∑_{n_r=1}^{∞} P[S_n = k] ∼ μ^{−r} N log^{r−1}(N)/(r−1)! as N → ∞.

17.

Some New Results on Likelihood Ratio Orderings for Spacings of Heterogeneous Exponential Random Variables

Taizhong Hu Qingshu Lu Songqiao Wen 《Communications in Statistics - Theory and Methods》2013,42(16):2506-2515

Let X₁, X₂,…, X_n be independent exponential random variables with X_i having failure rate λ_i for i = 1,…, n. Denote by D_{i:n} = X_{i:n} − X_{i−1:n} the ith spacing of the order statistics X_{1:n} ≤ X_{2:n} ≤ ··· ≤ X_{n:n}, i = 1,…, n, where X_{0:n} ≡ 0. It is shown that if λ_{n+1} ≤ [≥] λ_k for k = 1,…, n then D_{n:n} ≤_lr D_{n+1:n+1} and D_{1:n} ≤_lr D_{2:n+1} [D_{2:n+1} ≤_lr D_{2:n}], and that if λ_i + λ_j ≥ λ_k for all distinct i, j, and k then D_{n−1:n} ≤_lr D_{n:n} and D_{n:n+1} ≤_lr D_{n:n}, where ≤_lr denotes the likelihood ratio order. We also prove that D_{1:n} ≤_lr D_{2:n} for n ≥ 2 and D_{2:3} ≤_lr D_{3:3} for all λ_i's.

18.

A new multivariate imputation method based on Bayesian networks

P. Niloofar M. Ganjali 《Journal of applied statistics》2014,41(3):501-518

Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have been recently used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study from the cancer-neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods, the Peter-Clark algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket set of the variable to be imputed; and finally, our new proposed method, which applies the whole available knowledge of all variables of interest, comprising the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new proposed method, especially in the presence of high missingness percentages and more connected networks. The new method is also shown to be more efficient than the MICE method for small sample sizes with high missing rates.

19.

Randomly weighted sums of linearly wide quadrant-dependent random variables with heavy tails

Changjun Yu 《Communications in Statistics - Theory and Methods》2017,46(2):591-601

This paper investigates tail behavior of the randomly weighted sum ∑ⁿ_{k = 1}θ_kX_k and reaches an asymptotic formula, where X_k, 1 ≤ k ≤ n, are real-valued linearly wide quadrant-dependent (LWQD) random variables with a common heavy-tailed distribution, and θ_k, 1 ≤ k ≤ n, independent of X_k, 1 ≤ k ≤ n, are n non-negative random variables without any dependence assumptions. The LWQD structure includes the linearly negative quadrant-dependent structure, the negatively associated structure, and hence the independence structure. On the other hand, it also includes some positively dependent random variables and some other random variables. The obtained result coincides with the existing ones.

20.

On Thresholds of Moving Averages with Given On-Target Significant Levels

A. R. Soltani S. A. Al-Awadhi W. M. Al-Shemeri 《Communications in Statistics - Theory and Methods》2013,42(14):2595-2606

Let X₁, X₂,… be a sequence of independent and identically distributed random variables, and let Y_n, n = K, K + 1, K + 2,… be the corresponding backward moving average of order K. At epoch n ≥ K, the process Y_n will be off target by the input X_n if it exceeds a threshold. By introducing a two-state Markov chain, we define a level of significance (1 − a)% to be the percentage of times that the moving average process stays on target. We establish a technique to evaluate, or estimate, a threshold, to guarantee that {Y_n} will stay (1 − a)% of times on target, for a given (1 − a)%. It is proved that if the distribution of the inputs is exponential or normal, then the threshold will be a linear function in the mean of the distribution of inputs μ_X. The slope and intercept of the line, in each case, are specified. It is also observed that for the gamma inputs, the threshold is merely linear in the reciprocal of the scale parameter. These linear relationships can be easily applied to estimate the desired thresholds by samples from the inputs.