Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X,B that uses both the available individual-level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic data observations, and then analyzing the combined dataset of size n + m to estimate the parameters of the Y|X,B model. This combined dataset of size n + m now has missing values of B for m of the observations, and is analyzed using methods that can handle missing data (e.g., multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Though the synthetic data method is applicable to a general regression context, to provide some justification, we show in two special cases that the asymptotic variances of the parameter estimates in the Y|X,B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases and the method's broad applicability make it appealing for use across diverse scenarios. The Canadian Journal of Statistics 47: 580–603; 2019 © 2019 Statistical Society of Canada

2.
This paper considers statistical inference for partially linear models $Y = X^{\top}\beta + \nu(Z) + \varepsilon$ when the linear covariate X is missing with missing probability π depending upon (Y, Z). We propose empirical likelihood-based statistics to construct confidence regions for β and ν(z). The resulting empirical likelihood ratio statistics are shown to be asymptotically chi-squared-distributed. The finite-sample performance of the proposed statistics is assessed by simulation experiments. The proposed methods are applied to a dataset from an AIDS clinical trial.

3.
In this paper, we consider the problem of adaptive density or survival function estimation in an additive model defined by Z = X + Y with X independent of Y, when both random variables are non-negative. This model is relevant, for instance, in reliability, where we are interested in the failure time of a certain material that cannot be isolated from the system it belongs to. Our goal is to recover the distribution of X (density or survival function) from n observations of Z, assuming that the distribution of Y is known. This issue can be seen as the classical statistical problem of deconvolution, which has been tackled in many cases using Fourier-type approaches. In the present case, however, the random variables have the particular feature of being supported on the non-negative half-line. Exploiting this, we propose a new angle of attack by building a projection estimator with an appropriate Laguerre basis. We present upper bounds on the mean integrated squared risk of our density and survival function estimators. We then describe a non-parametric data-driven strategy for selecting a relevant projection space. The procedures are illustrated with simulated data and compared with the performance of a more classical deconvolution setting using a Fourier approach. Our procedure achieves faster convergence rates than Fourier methods for estimating these functions.

4.
In the literature, statistical estimation of the stress–strength parameter R = P(X > Y) has been investigated intensively under the assumption that the random variables X and Y are independent. However, in some real applications, the strength variable X could be highly dependent on the stress variable Y. In this paper, unlike the common practice in the literature, we discuss estimation of the parameter R in the more realistic case where X and Y are dependent random variables following a bivariate Rayleigh model. We derive the Bayes estimates and highest posterior density credible intervals of the parameters using suitable priors on the parameters. Because the Bayes estimates have no closed forms, we use an approximation based on the Laplace method and a Markov chain Monte Carlo technique to obtain the Bayes estimates of R and the unknown parameters. Finally, simulation studies are conducted to evaluate the performance of the proposed estimators, and analyses of two data sets are provided.

5.
The growth curve model $Y_{n\times p} = A_{n\times m}\,\xi_{m\times k}\,B_{k\times p} + E_{n\times p}$ is considered, where Y is an observation matrix, ξ is a matrix of unknown parameters, A is a known matrix of rank m, B is a known matrix of rank k with 1' = (1, …, 1) as its first row, and the rows of E are independent, each distributed as $N_p(0, \Sigma)$. The problem of constructing prediction intervals for future observations under this model is addressed, and approximate intervals are derived under different structures for Σ. The results are illustrated with several data sets.

6.

Let Y be a response and, given a covariate X, let Y have a conditional density f(y | x, θ), where θ is an unknown p-dimensional vector of parameters and the marginal distribution of X is unknown. When responses are missing at random, we use auxiliary information and imputation to define an adjusted empirical log-likelihood ratio for the mean of Y and obtain its asymptotic distribution. A simulation study is conducted to compare the adjusted empirical log-likelihood and the normal approximation method in terms of coverage accuracy.

7.
The following two predictors are compared for time series with systematically missing observations: (a) a time series model is fitted to the full series $X_t$, and forecasts are based on this model; (b) a time series model is fitted to the series with systematically missing observations $Y_\tau$, and forecasts are based on the resulting model. If the data generation processes are known vector autoregressive moving average (ARMA) processes, the first predictor is at least as efficient as the second one in a mean squared error sense. Conditions are given for the two predictors to be identical. If only the ARMA orders of the generation processes are known and the coefficients are estimated, or if the process orders and coefficients are estimated, the first predictor is again, in general, superior. There are, however, exceptions in which the second predictor, using seemingly less information, may be better. These results are discussed, using both asymptotic theory and small sample simulations. Some economic time series are used as illustrative examples.

8.
9.
Let Y be a response variable, possibly multivariate, with a density function f(y | x, v; β) conditional on vectors x and v of covariates and a vector β of unknown parameters. The authors consider the problem of estimating β when the values taken by the covariate vector v are available for all observations while some of those taken by the covariate x are missing at random. They compare the profile estimator to several alternatives, both in terms of bias and standard deviation, when the response and covariates are discrete or continuous.

10.
Cui, Ruifei; Groot, Perry; Heskes, Tom. Statistics and Computing, 2019, 29(2): 311-333

We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the ‘Rank PC’ algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. When the data are missing completely at random (MCAR), we provide an error bound on the accuracy of ‘Rank PC’ and show its high-dimensional consistency. However, when the data are missing at random (MAR), ‘Rank PC’ fails dramatically. Therefore, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data that still works correctly under MAR. These samples are translated into an average correlation matrix and an effective sample size, resulting in the ‘Copula PC’ algorithm for incomplete data. A simulation study shows that: (1) ‘Copula PC’ estimates a more accurate correlation matrix and causal structure than ‘Rank PC’ under MCAR and, even more so, under MAR and (2) the use of the effective sample size significantly improves the performance of ‘Rank PC’ and ‘Copula PC.’ We illustrate our methods on two real-world datasets: riboflavin production data and chronic fatigue syndrome data.
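
To make the role of the effective sample size concrete, here is a minimal sketch of a Fisher z conditional independence test, the test commonly used with PC-type algorithms for Gaussian data. The abstract does not say which test the authors use or how they compute the effective sample size, so both the choice of test and the `n_eff` input are assumptions here, and the function names are hypothetical.

```python
import math

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j] + list(cond)
    sub = corr[np.ix_(idx, idx)]
    prec = np.linalg.inv(sub)  # precision matrix of the selected variables
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def fisher_z_pvalue(corr, i, j, cond, n_eff):
    """Two-sided p-value for H0: i and j are independent given cond.

    n_eff stands in for the sample size (assumption: it is supplied by the
    caller, e.g. derived from the amount of missing data).
    """
    dof = n_eff - len(cond) - 3
    if dof <= 0:
        raise ValueError("effective sample size too small for this test")
    r = partial_corr(corr, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep the log finite
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))  # Fisher z-transform
    stat = math.sqrt(dof) * abs(z)
    normal_cdf = 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))
    return 2.0 * (1.0 - normal_cdf)
```

With the same correlation matrix, a smaller `n_eff` yields a larger p-value, so edges are removed more readily when much of the data is missing; that is the intuition behind substituting an effective sample size for the nominal one.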


11.
In this work, we define a new method of ranked set sampling (RSS) that is suitable when the characteristic (variable) Y of primary interest is jointly distributed with an auxiliary characteristic X that can be measured on any number of units, so that only units having record values on X are ranked and retained for measurement of Y. We call this scheme concomitant record ranked set sampling (CRRSS). We propose estimators of the parameters associated with Y based on CRRSS observations; these estimators apply to a very large class of distributions, namely the Morgenstern family of distributions. We illustrate the application of CRRSS and our estimation technique when the underlying distribution is the Morgenstern-type bivariate logistic distribution. A primary data set collected by the CRRSS method is presented and used to illustrate the results developed in this work.

12.
Suppose that we have a nonparametric regression model $Y = m(X) + \varepsilon$ with $X \in \mathbb{R}^p$, where X is a random design variable and is observed completely, and Y is the response variable and some Y-values are missing at random. Based on the “complete” data sets for Y after nonparametric regression imputation and inverse probability weighted imputation, two estimators of the regression function $m(x_0)$ for fixed $x_0 \in \mathbb{R}^p$ are proposed. Asymptotic normality of the two estimators is established, which is used to construct normal approximation-based confidence intervals for $m(x_0)$. We also construct an empirical likelihood (EL) statistic for $m(x_0)$ with limiting distribution $\chi^2_1$, which is used to construct an EL confidence interval for $m(x_0)$.

13.
We consider an approach to prediction in a linear model when values of the future explanatory variables are unavailable: we predict a future response $y_f$ at a future sample point $x_f$ when some components of $x_f$ are unavailable. We consider both the case where the components of $x_f$ are dependent and the case where they are independent but normally distributed. A Taylor expansion is used to derive an approximation to the predictive density, and the influence of missing future explanatory variables (the loss or discrepancy) is assessed using the Kullback–Leibler measure of divergence. This discrepancy is compared in different scenarios, including the situation where the missing variables are dropped entirely.

14.
We propose an $\ell_1$-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method is based on the assumption that the data are missing at random (MAR), which also covers the missing completely at random case. The implementation of the method is non-trivial, as the observed negative log-likelihood is generally a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.

15.
ABSTRACT

In this article we suggest some improved estimators of the scale parameter of the Morgenstern-type bivariate uniform distribution (MTBUD), based on observations made on the units of a ranked set sample of the study variable Y, which is correlated with the auxiliary variable X, when (X, Y) follows an MTBUD. We also suggest some linear shrinkage estimators of the scale parameter of the MTBUD. Efficiency comparisons are also made in this work.

16.
We present results of a Monte Carlo study comparing four methods of estimating the parameters of the logistic model $\operatorname{logit}(\Pr(Y = 1 \mid X, Z)) = \alpha_0 + \alpha_1 X + \alpha_2 Z$, where X and Z are continuous covariates and X is always observed but Z is sometimes missing. The four methods examined are 1) logistic regression using complete cases, 2) logistic regression with filled-in values of Z obtained from the regression of Z on X and Y, 3) logistic regression with filled-in values of Z and random error added, and 4) maximum likelihood estimation assuming the distribution of Z given X and Y is normal. Effects of different percentages of missing Z values and different missing value mechanisms on the bias and mean absolute deviation of the estimators are examined for data sets of N = 200 and N = 400.
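
A Monte Carlo comparison of this kind can be sketched as follows, covering only methods 1 and 2. This is a minimal illustration, not the study's actual design: the coefficient values in `ALPHA`, the dependence of Z on X, the 30% completely-at-random missingness rate, the function names, and the hand-written Newton-Raphson fit are all assumptions made for the example.

```python
import numpy as np

ALPHA = np.array([-0.5, 1.0, 1.0])  # assumed true (alpha_0, alpha_1, alpha_2)


def fit_logistic(design, y, n_iter=25):
    """Logistic regression by Newton-Raphson on a design matrix with intercept."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        weights = p * (1.0 - p)
        hessian = design.T @ (design * weights[:, None])
        beta = beta + np.linalg.solve(hessian, design.T @ (y - p))
    return beta


def simulate_once(rng, n=400, miss_rate=0.3):
    """One replication: returns (complete-case estimate, filled-in estimate)."""
    x = rng.normal(size=n)
    z = 0.5 * x + rng.normal(size=n)  # assumed relation between Z and X
    eta = ALPHA[0] + ALPHA[1] * x + ALPHA[2] * z
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
    observed = rng.random(n) >= miss_rate  # Z missing completely at random
    ones = np.ones(n)

    # Method 1: logistic regression on complete cases only.
    full_design = np.column_stack([ones, x, z])
    complete_case = fit_logistic(full_design[observed], y[observed])

    # Method 2: fill in missing Z from a linear regression of Z on X and Y
    # fitted to the complete cases, then fit the logistic model to all rows.
    aux = np.column_stack([ones, x, y])
    gamma, *_ = np.linalg.lstsq(aux[observed], z[observed], rcond=None)
    z_filled = np.where(observed, z, aux @ gamma)
    filled_in = fit_logistic(np.column_stack([ones, x, z_filled]), y)
    return complete_case, filled_in


def monte_carlo_bias(n_rep=200, seed=0, n=400, miss_rate=0.3):
    """Average estimate minus the true coefficients, per method."""
    rng = np.random.default_rng(seed)
    runs = [simulate_once(rng, n=n, miss_rate=miss_rate) for _ in range(n_rep)]
    cc = np.array([r[0] for r in runs])
    fi = np.array([r[1] for r in runs])
    return {"complete_case": cc.mean(axis=0) - ALPHA,
            "filled_in": fi.mean(axis=0) - ALPHA}
```

Calling `monte_carlo_bias()` returns the estimated bias of each coefficient under the two methods; changing `miss_rate` or replacing the line that defines `observed` with a mechanism that depends on X or Y would mimic the different missing value scenarios mentioned in the abstract.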

17.
Stochastic Models, 2013, 29(2): 157-190
In this paper, we establish an explicit form of matrix decompositions for the queue length distributions of the MAP/G/1 queues under multiple and single vacations with N-policy. We show that the vector generating functions Y(z) of the queue length at an arbitrary time and X(z) at departures are decomposed into $Y(z) = p_{\mathrm{idle}}(z)\,\zeta_Y(z)$ and $X(z) = p_{\mathrm{idle}}(z)\,\zeta_X(z)$, where $p_{\mathrm{idle}}(z)$ is the vector generating function of the queue length at an arbitrary epoch at which the server is not in service, and $\zeta_Y(z)$ and $\zeta_X(z)$ are unidentified matrix generating functions.

18.
ABSTRACT

We consider the estimation of the conditional cumulative distribution function of a scalar response variable Y given a Hilbertian random variable X when the observations are linked via a single-index structure. We establish the pointwise and the uniform almost complete convergence (with the rate) of the kernel estimate of this model. As an application, we show how our result can be applied to the prediction problem via the conditional median estimate. The choice of the functional index via cross-validation is also discussed but not pursued.

19.
We study the problem of approximating a stochastic process $Y = \{Y(t) : t \in T\}$ with known and continuous covariance function R on the basis of finitely many observations $Y(t_1), \dots, Y(t_n)$. Depending on the knowledge about the mean function, we use different approximations $\hat{Y}$ and measure their performance by the corresponding maximum mean squared error $\sup_{t \in T} E\,(Y(t) - \hat{Y}(t))^2$. For a compact $T \subset \mathbb{R}^p$ we prove sufficient conditions for the existence of optimal designs. For the class of covariance functions on $T^2 = [0, 1]^2$ which satisfy generalized Sacks/Ylvisaker regularity conditions of order zero or are of product type, we construct sequences of designs for which the proposed approximations perform asymptotically optimally.

20.
Suppose we have n observations from X = Y + Z, where Z is a noise component with known distribution, and Y has an unknown density f. When the characteristic function of Z is nonzero almost everywhere, we show that it is possible to construct a density estimate $f_n$ such that for all f, $\lim_{n \to \infty} E \int |f_n - f| = 0$.
