Similar Documents (20 results)
1.
Suppose that we have a nonparametric regression model Y = m(X) + ε with X ∈ R^p, where X is a random design variable that is observed completely, and Y is the response variable, some values of which are missing at random. Based on the “complete” data sets for Y obtained by nonparametric regression imputation and by inverse probability weighted imputation, two estimators of the regression function m(x0) at a fixed point x0 ∈ R^p are proposed. Asymptotic normality of the two estimators is established and used to construct normal approximation-based confidence intervals for m(x0). We also construct an empirical likelihood (EL) statistic for m(x0) with a limiting χ² distribution with one degree of freedom, which is used to construct an EL confidence interval for m(x0).
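A minimal sketch of the two imputation schemes described above, using a Nadaraya–Watson kernel estimate both for m(x) and for the response probability; the bandwidth, kernel, and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nw(x0, x, y, w, h):
    """Weighted Nadaraya-Watson estimate of E[Y | X = x0] (Gaussian kernel, bandwidth h)."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2) * w
    return np.sum(k * y) / np.sum(k)

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2.0, 2.0, n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)
delta = rng.binomial(1, 0.8, n)              # 1 = Y observed, 0 = Y missing at random
obs = delta == 1
h = 0.3

# (a) Regression imputation: fill each missing Y with a kernel estimate built from complete
#     cases, then estimate m(x0) from the completed data set.
m_cc = np.array([nw(xi, x[obs], y[obs], 1.0, h) for xi in x])
y_completed = np.where(obs, y, m_cc)
m_hat_reg = nw(0.5, x, y_completed, np.ones(n), h)

# (b) Inverse probability weighting: estimate pi(x) = P(delta = 1 | X = x) nonparametrically
#     and reweight the complete cases by 1 / pi_hat(x).
pi_hat = np.array([nw(xi, x, delta.astype(float), 1.0, h) for xi in x])
m_hat_ipw = nw(0.5, x[obs], y[obs], 1.0 / pi_hat[obs], h)

print(m_hat_reg, m_hat_ipw, np.sin(0.5))     # both should be close to the true m(0.5)
```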

2.
ABSTRACT

In the stepwise procedure of selecting a fixed or a random explanatory variable in a mixed quantitative linear model with errors following a Gaussian stationary autocorrelated process, we have studied the efficiency of five estimators relative to Generalized Least Squares (GLS): Ordinary Least Squares (OLS), Maximum Likelihood (ML), Restricted Maximum Likelihood (REML), First Differences (FD), and First-Difference Ratios (FDR). We have also studied the validity and power of seven derived testing procedures for assessing the significance of the slope of the candidate explanatory variable x2 entering the model in which there is already one regressor x1. In addition to five testing procedures from the literature, we considered the FDR t-test with n − 3 df and the modified t-test with n̂ − 3 df for partial correlations, where n̂ is Dutilleul's effective sample size. Efficiency, validity, and power were analyzed by Monte Carlo simulations, as functions of the nature of x1 and x2 (fixed vs. random, and if random, purely random or autocorrelated), of the sample size, and of the autocorrelation of the random terms in the regression model. We report extensive results for the autocorrelation structure of first-order autoregressive [AR(1)] type, and discuss results obtained for other autocorrelation structures, such as the spherical semivariogram, first-order moving average [MA(1)] and ARMA(1,1), which we could not present because of space constraints. Overall, we found that:
  1. the efficiency of slope estimators and the validity of testing procedures depend primarily on the nature of x2, but not on that of x1;

  2. FDR is the most inefficient slope estimator, regardless of the nature of x1 and x2;

  3. REML is the most efficient of the slope estimators compared relative to GLS, provided the specified autocorrelation structure is correct and the sample size is large enough to ensure the convergence of its optimization algorithm;

  4. the FDR t-test, the modified t-test and the REML t-test are the most valid of the testing procedures compared, despite the inefficiency of the FDR and OLS slope estimators on which the former two are based;

  5. the FDR t-test, however, suffers from a lack of power that varies with the nature of x1 and x2; and

  6. the modified t-test for partial correlations, which does not require the specification of an autocorrelation structure, can be recommended when x1 is fixed or random and x2 is random, whether purely random or autocorrelated. Our results are illustrated by the environmental data that motivated our work.

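As an illustration of the kind of Monte Carlo comparison described above, the following sketch estimates the efficiency of the OLS slope of x2 relative to GLS under AR(1) errors; the sample size, the AR(1) coefficient, and the use of a known autocorrelation matrix for GLS are simplifying assumptions, not the authors' full design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, n_sim = 50, 0.6, 2000
t = np.arange(n)
Sigma = rho ** np.abs(np.subtract.outer(t, t))     # AR(1) correlation matrix, assumed known for GLS
Sigma_inv = np.linalg.inv(Sigma)
chol = np.linalg.cholesky(Sigma)

x1 = rng.normal(size=n)                            # regressor already in the model
x2 = rng.normal(size=n)                            # candidate regressor whose slope is of interest
X = np.column_stack([np.ones(n), x1, x2])
beta = np.array([0.0, 1.0, 0.5])

ols_slopes, gls_slopes = [], []
for _ in range(n_sim):
    e = chol @ rng.normal(size=n)                  # Gaussian AR(1)-correlated errors
    y = X @ beta + e
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
    ols_slopes.append(b_ols[2])
    gls_slopes.append(b_gls[2])

# Efficiency of OLS relative to GLS for the slope of x2 (at most 1 by the Gauss-Markov argument).
print(np.var(gls_slopes) / np.var(ols_slopes))
```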

3.
The correlation coefficient is widely used to quantify the degree of association between two quantitative variables. By resorting to the geometric representation of the linear correlation coefficient, it is possible to calculate the upper and lower bounds of the correlation coefficient between two variables x1 and x2 when their correlation coefficients with a third variable x3 are available. Implications in observational studies, where x3 could be a proxy of a target variable x2 whose direct measurement is too expensive or impractical, are discussed.
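The bounds follow from the requirement that the 3×3 correlation matrix be positive semidefinite: given r13 and r23, the correlation r12 must lie in r13·r23 ± sqrt((1 − r13²)(1 − r23²)). A small sketch of this standard algebraic form of the bounds (not the authors' geometric derivation):

```python
import numpy as np

def corr_bounds(r13, r23):
    """Feasible range for corr(x1, x2) given corr(x1, x3) and corr(x2, x3)."""
    slack = np.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    return r13 * r23 - slack, r13 * r23 + slack

lo, hi = corr_bounds(0.9, 0.8)     # e.g. x3 is a good proxy of both variables
print(lo, hi)                      # ~ (0.46, 0.98): r12 is forced to be fairly large
```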

4.
Imputation is commonly used to compensate for missing data in surveys. We consider the general case where the responses on either the variable of interest y or the auxiliary variable x or both may be missing. We use ratio imputation for y when the associated x is observed and different imputations when x is not observed. We obtain design-consistent linearization and jackknife variance estimators under uniform response. We also report the results of a simulation study on the efficiencies of imputed estimators, and relative biases and efficiencies of associated variance estimators.
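A minimal sketch of ratio imputation for a missing y whose x is observed: the missing value is replaced by (mean of y among complete pairs / mean of x among complete pairs) times x_i. The respondent-mean fallback used when x is also missing is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def ratio_impute(y, x):
    """Impute missing y by R_hat * x when x is observed, by the respondent mean of y otherwise."""
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    both = ~np.isnan(y) & ~np.isnan(x)
    r_hat = np.mean(y[both]) / np.mean(x[both])           # ratio estimated from complete pairs
    miss_y = np.isnan(y)
    use_ratio = miss_y & ~np.isnan(x)
    y[use_ratio] = r_hat * x[use_ratio]                   # ratio imputation
    y[miss_y & np.isnan(x)] = np.mean(y[both])            # fallback when x is missing too
    return y

y = [10.0, np.nan, 14.0, np.nan, 8.0]
x = [5.0, 6.0, 7.0, np.nan, 4.0]
print(ratio_impute(y, x))
```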

5.
Consider the situation where measurements are taken at two different times and let Mj(x) be some conditional robust measure of location associated with the random variable Y at time j, given that some covariate X = x. The goal is to test H0: M1(x) = M2(x) for each x ∈ {x1, …, xK} such that the probability of one or more Type I errors is less than α, where x1, …, xK are K specified values of the covariate. The paper reports simulation results comparing two methods aimed at accomplishing this goal without specifying some parametric form for the regression line. The first method is based on a simple modification of the method in Wilcox [Introduction to robust estimation and hypothesis testing. 3rd ed. San Diego, CA: Academic Press; 2012, Section 11.11.1]. The main result here is that the second method, which has never been studied, can have higher power, sometimes substantially so. Data from the Well Elderly 2 study, which motivated this paper, are used to illustrate that the alternative approach can make a practical difference. Here, the estimate of Mj(x) is based in part on either a 20% trimmed mean or the Harrell–Davis quantile estimator, but in principle the more successful method can be used with any robust location estimator.
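For reference, the two robust location estimators named above are available in SciPy; the sketch below computes a 20% trimmed mean and the Harrell–Davis estimate of the median for the responses whose covariate values fall near a given design point x. The nearest-neighbour windowing is an illustrative simplification of how Mj(x) might be localized, not the paper's procedure.

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import hdquantiles

def local_location(x, y, x0, k=30):
    """Robust location estimates of Y near X = x0, using the k nearest covariate values."""
    idx = np.argsort(np.abs(x - x0))[:k]
    y_near = y[idx]
    tm = trim_mean(y_near, 0.2)                      # 20% trimmed mean
    hd = hdquantiles(y_near, prob=[0.5])[0]          # Harrell-Davis estimate of the median
    return tm, hd

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
y = 2 * x + rng.standard_t(df=3, size=200)           # heavy-tailed errors
print(local_location(x, y, x0=0.5))
```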

6.
We consider an approach to prediction in a linear model when values of the future explanatory variables are unavailable: we predict a future response yf at a future sample point xf when some components of xf are unavailable. We consider both the case where the components of xf are dependent and the case where they are independent, assuming normality in each. A Taylor expansion is used to derive an approximation to the predictive density, and the influence of missing future explanatory variables (the loss or discrepancy) is assessed using the Kullback–Leibler measure of divergence. This discrepancy is compared in different scenarios, including the situation where the missing variables are dropped entirely.

7.
We present a decomposition of the correlation coefficient between x_t and x_{t−k} into three terms that include the partial and inverse autocorrelations. The first term accounts for the portion of the autocorrelation that is explained by the inner variables {x_{t−1}, x_{t−2}, …, x_{t−k+1}}, the second one measures the portion explained by the outer variables {x_{t+1}, x_{t+2}, …} ∪ {x_{t−k−1}, x_{t−k−2}, …}, and the third term measures the correlation between x_t and x_{t−k} given all other variables. These terms, squared and summed, can form the basis of three portmanteau-type tests that are able to detect both deviation from white noise and lack of fit of an entertained model. Quantiles of their asymptotic sampling distributions are complicated to derive at an adequate level of accuracy, so they are approximated using the Monte Carlo method. A simulation experiment is carried out to investigate the significance levels and power of each test and to compare them with the portmanteau test.
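The Monte Carlo approximation of null quantiles mentioned above can be illustrated with a standard portmanteau statistic; the Ljung–Box form used here is the classical benchmark the authors compare against, not their three new tests: simulate white noise repeatedly, compute the statistic, and take empirical quantiles.

```python
import numpy as np

def acf(x, k):
    """Lag-k sample autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

def ljung_box(x, m):
    """Ljung-Box portmanteau statistic over lags 1..m."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, m + 1))

rng = np.random.default_rng(3)
n, m, n_sim = 100, 10, 5000
stats = np.array([ljung_box(rng.normal(size=n), m) for _ in range(n_sim)])
print(np.quantile(stats, 0.95))      # Monte Carlo 5% critical value; compare with chi2(10) ~ 18.3
```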

8.
In this paper, we propose a new procedure to estimate the distribution of a variable y when there are missing data. To compensate for the missing responses, it is assumed that a covariate vector x is observed and that y and x are related through a semi-parametric regression model. Observed residuals are combined with predicted values to estimate the missing response distribution. Once the response distribution is consistently estimated, we can estimate any parameter defined through a continuous functional T using a plug-in procedure. We prove that the proposed estimators have a high breakdown point.

9.
Two families of parameter estimation procedures for the stable laws based on a variant of the characteristic function are provided. The methodology, which produces viable computational procedures for the stable laws, is generally applicable to other families of distributions across a variety of settings. Both families of procedures may be described as modified weighted chi-squared minimization procedures, and both explicitly take account of constraints on the parameter space. Influence functions for, and efficiencies of, the estimators are given. If x1, x2, …, xn is a random sample from an unknown distribution F, a method for determining the stable law to which F is attracted is developed. Procedures for regression and autoregression with stable error structure are provided. A number of examples are given.

10.
The paper discusses D-optimal axial designs for the additive quadratic and cubic mixture models Σ1≤i≤q (βi xi + βii xi²) and Σ1≤i≤q (βi xi + βii xi² + βiii xi³), where xi ≥ 0 and x1 + … + xq = 1. For the quadratic model, a saturated symmetric axial design is used, in which the support points are of the form (x1, …, xq) = [1 − (q−1)δi, δi, …, δi], where i = 1, 2 and 0 ≤ δ2 < δ1 ≤ 1/(q−1). It is proved that when 3 ≤ q ≤ 6, the above design is D-optimal if δ2 = 0 and δ1 = 1/(q−1), and when q ≥ 7 it is D-optimal if δ2 = 0 and δ1 = [5q − 1 − (9q² − 10q + 1)^(1/2)]/(4q²). Similar results exist for the cubic model, with support points of the form (x1, …, xq) = [1 − (q−1)δi, δi, …, δi], where i = 1, 2, 3 and 0 = δ3 < δ2 < δ1 ≤ 1/(q−1). The saturated D-optimal axial design and the D-optimal design for the quadratic model are compared in terms of their efficiency and uniformity.
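A quick check of the optimal δ1 values quoted above for the quadratic model; the formulas are taken directly from the abstract, and the D-criterion evaluation itself is not reproduced here.

```python
import math

def delta1_quadratic(q):
    """D-optimal delta_1 for the saturated symmetric axial design of the additive quadratic mixture model."""
    if 3 <= q <= 6:
        return 1.0 / (q - 1)
    if q >= 7:
        return (5 * q - 1 - math.sqrt(9 * q * q - 10 * q + 1)) / (4 * q * q)
    raise ValueError("q must be at least 3")

for q in (3, 5, 6, 7, 10):
    d1 = delta1_quadratic(q)
    # The first support point has the form (1 - (q-1)*d1, d1, ..., d1); the second uses delta_2 = 0.
    print(q, round(d1, 4), round(1 - (q - 1) * d1, 4))
```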

11.
In this paper, the statistical inference of the unknown parameters of a Burr Type III (BIII) distribution based on a unified hybrid censored sample is studied. The maximum likelihood estimators of the unknown parameters are obtained using the Expectation–Maximization algorithm. It is observed that the Bayes estimators cannot be obtained in explicit form, hence Lindley's approximation and the Markov chain Monte Carlo (MCMC) technique are used to compute them. Further, the highest posterior density credible intervals of the unknown parameters based on the MCMC samples are provided. A new model selection test is developed for discriminating between two competing models under the unified hybrid censoring scheme. Finally, the potential of the BIII distribution for analysing real data is illustrated using fracture toughness data for three different materials, namely silicon nitride (Si3N4), zirconium dioxide (ZrO2) and sialon (Si6−xAlxOxN8−x). It is observed that for these data sets the BIII distribution fits better than the Weibull distribution, which is frequently used in fracture toughness data analysis.

12.
We present results of a Monte Carlo study comparing four methods of estimating the parameters of the logistic model logit(Pr(Y = 1 | X, Z)) = α0 + α1X + α2Z, where X and Z are continuous covariates and X is always observed but Z is sometimes missing. The four methods examined are (1) logistic regression using complete cases, (2) logistic regression with filled-in values of Z obtained from the regression of Z on X and Y, (3) logistic regression with filled-in values of Z and random error added, and (4) maximum likelihood estimation assuming the distribution of Z given X and Y is normal. The effects of different percentages of missing Z values and of different missing-value mechanisms on the bias and mean absolute deviation of the estimators are examined for data sets of N = 200 and N = 400.
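A minimal sketch of the first two methods above (complete-case analysis and regression imputation of Z from X and Y), using statsmodels; the data-generating values and the MCAR missingness rate are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + 1.0 * z)))     # true alpha = (-0.5, 1, 1)
y = rng.binomial(1, p)
miss = rng.random(n) < 0.3                                # Z missing completely at random
cc = ~miss

# Method 1: logistic regression on complete cases only.
fit_cc = sm.Logit(y[cc], sm.add_constant(np.column_stack([x[cc], z[cc]]))).fit(disp=0)

# Method 2: fill in missing Z from the linear regression of Z on X and Y (fitted on complete cases).
z_model = sm.OLS(z[cc], sm.add_constant(np.column_stack([x[cc], y[cc]]))).fit()
z_fill = z.copy()
z_fill[miss] = z_model.predict(sm.add_constant(np.column_stack([x[miss], y[miss]])))
fit_imp = sm.Logit(y, sm.add_constant(np.column_stack([x, z_fill]))).fit(disp=0)

print(fit_cc.params)    # estimates of (alpha0, alpha1, alpha2) from complete cases
print(fit_imp.params)   # estimates after regression imputation of Z
```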

13.
In many experiments, not all explanatory variables can be controlled. When the units arise sequentially, different approaches may be used. The authors study a natural sequential procedure for “marginally restricted” D-optimal designs. They assume that one set of explanatory variables (x1) is observed sequentially, and that the experimenter responds by choosing an appropriate value of the explanatory variable x2. In order to solve the sequential problem a priori, the authors consider the problem of constructing optimal designs with a prior marginal distribution for x1. This eliminates the influence of units already observed on the next unit to be designed. They give explicit designs for various cases in which the mean response follows a linear regression model; they also consider a case study with a nonlinear logistic response. They find that the optimal strategy often consists of randomizing the assignment of the values of x2.

14.
This paper considers the problem of combining k unbiased estimates xi of a parameter μ, where each estimate xi is the average of ni + 1 independent normal observations with unknown mean μ and unknown variance σi². The behavior of several commonly used estimators of μ is studied by means of an empirical sampling study, and the empirical results of this experiment are interpreted in terms of previous theoretical results. Finally, some extrapolations are made as to how these estimators might behave under varying conditions, and some new estimators are proposed which might have higher efficiencies under certain conditions than those which are generally used.
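One commonly used estimator in this setting is the inverse-variance weighted mean (the Graybill–Deal estimator), with each variance estimated from the corresponding sample; this sketch is illustrative background, not one of the paper's proposed new estimators.

```python
import numpy as np

def graybill_deal(means, variances, sizes):
    """Combine k independent sample means, weighting each by its estimated inverse variance."""
    means = np.asarray(means, dtype=float)
    w = np.asarray(sizes, dtype=float) / np.asarray(variances, dtype=float)   # 1 / est. var of each mean
    return np.sum(w * means) / np.sum(w)

# Three estimates of the same mu, with different sample sizes and sample variances.
print(graybill_deal(means=[10.2, 9.7, 10.5], variances=[4.0, 1.0, 9.0], sizes=[20, 10, 30]))
```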

15.
Let X be lognormal(μ, σ²) with density f(x); let θ > 0 and define L(θ) = E[e^(−θX)]. We study properties of the exponentially tilted density (Esscher transform) fθ(x) = e^(−θx) f(x)/L(θ), in particular its moments, its asymptotic form as θ → ∞, and asymptotics for the saddlepoint θ(x) determined by the condition that the mean of the tilted density equals x. The asymptotic formulas involve the Lambert W function. The established relations are used to provide two different numerical methods for evaluating the left-tail probability of the sum of lognormals Sn = X1 + ⋯ + Xn: a saddlepoint approximation and an exponential-tilting importance sampling estimator. For the latter, we demonstrate logarithmic efficiency. Numerical examples for the cdf Fn(x) and the pdf fn(x) of Sn are given in a range of values of σ², n and x motivated by portfolio value-at-risk calculations.
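A numerical sketch of the saddlepoint equation for a single lognormal term, under the reconstruction above (L(θ) = E[e^(−θX)], with θ(x) solving E_θ[X] = x); the quadrature and root-bracketing choices are illustrative, not the paper's asymptotic Lambert-W formulas.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import lognorm

mu, sigma = 0.0, 1.0
f = lognorm(s=sigma, scale=np.exp(mu)).pdf          # lognormal(mu, sigma^2) density

def L(theta):
    """Laplace transform L(theta) = E[exp(-theta * X)]."""
    return quad(lambda x: np.exp(-theta * x) * f(x), 0, np.inf)[0]

def tilted_mean(theta):
    """Mean of the exponentially tilted density f_theta(x) = exp(-theta x) f(x) / L(theta)."""
    num = quad(lambda x: x * np.exp(-theta * x) * f(x), 0, np.inf)[0]
    return num / L(theta)

def saddlepoint(x, theta_hi=100.0):
    """Solve E_theta[X] = x for theta > 0 (relevant for left-tail x below E[X])."""
    return brentq(lambda t: tilted_mean(t) - x, 1e-8, theta_hi)

x = 0.3                                             # a left-tail target, below E[X] = exp(1/2) ~ 1.65
theta_x = saddlepoint(x)
print(theta_x, tilted_mean(theta_x))                # the tilted mean should reproduce x
```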

16.
Let H(x, y) be a continuous bivariate distribution function with known marginal distribution functions F(x) and G(y). Suppose the values of H are given at several points, H(xi, yi) = θi, i = 1, 2, …, n. We first discuss when a distribution satisfying these constraints exists and present a procedure for checking whether it does. We then consider finding lower and upper bounds for such distributions. These bounds may be used to establish bounds on the values of Spearman's ρ and Kendall's τ. For n = 2, we present necessary and sufficient conditions for the existence of such a distribution function and derive best-possible upper and lower bounds for H(x, y). As shown by a counter-example, these bounds need not be proper distribution functions, and we find conditions for these bounds to be (proper) distribution functions. We also present some results for the general case, where the values of H(x, y) are known at more than two points. In view of the simplification in notation, our results are presented in terms of copulas, but they may easily be expressed in terms of distribution functions.
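As background for the bounds discussed above, the unconstrained Fréchet–Hoeffding bounds state that any copula C satisfies max(u + v − 1, 0) ≤ C(u, v) ≤ min(u, v); the paper's bounds tighten these when C is additionally required to pass through given points. A minimal sketch of the unconstrained bounds only (not the refined constrained bounds derived in the paper):

```python
import numpy as np

def frechet_hoeffding_bounds(u, v):
    """Unconstrained lower and upper bounds W(u, v) <= C(u, v) <= M(u, v) for any copula C."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    lower = np.maximum(u + v - 1.0, 0.0)     # W: countermonotonic bound
    upper = np.minimum(u, v)                 # M: comonotonic bound
    return lower, upper

print(frechet_hoeffding_bounds(0.7, 0.6))    # (0.3, 0.6): any copula value at (0.7, 0.6) lies in between
```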

17.
Two-phase sampling is often used for estimating a population total or mean when the cost per unit of collecting auxiliary variables, x, is much smaller than the cost per unit of measuring a characteristic of interest, y. In the first phase, a large sample s1 is drawn according to a specified sampling design p(s1), and auxiliary data x are observed for the units i ∈ s1. Given the first-phase sample s1, a second-phase sample s2 is selected from s1 according to a specified sampling design p(s2 | s1), and (y, x) is observed for the units i ∈ s2. In some cases, the population totals of some components of x may also be known. Two-phase sampling is used for stratification at the second phase or both phases and for regression estimation. Horvitz–Thompson-type variance estimators are used for variance estimation. However, the Horvitz–Thompson (Horvitz & Thompson, J. Amer. Statist. Assoc., 1952) variance estimator in uni-phase sampling is known to be highly unstable and may take negative values when the units are selected with unequal probabilities. On the other hand, the Sen–Yates–Grundy variance estimator is relatively stable and non-negative for several unequal probability sampling designs with fixed sample sizes. In this paper, we extend the Sen–Yates–Grundy (Sen, J. Ind. Soc. Agric. Statist., 1953; Yates & Grundy, J. Roy. Statist. Soc. Ser. B, 1953) variance estimator to two-phase sampling, assuming a fixed first-phase sample size and a fixed second-phase sample size given the first-phase sample. We apply the new variance estimators to two-phase sampling designs with stratification at the second phase or both phases. We also develop Sen–Yates–Grundy-type variance estimators of the two-phase regression estimators that make use of the first-phase auxiliary data and known population totals of some of the auxiliary variables.

18.
The generalized method of moments (GMM) and empirical likelihood (EL) are popular methods for combining sample and auxiliary information. These methods are used in very diverse fields of research, where competing theories often suggest variables satisfying different moment conditions. Results in the literature have shown that the efficient-GMM (GMME) and maximum empirical likelihood (MEL) estimators have the same asymptotic distribution to order n^(−1/2) and that both estimators are asymptotically semiparametric efficient. In this paper, we demonstrate that when data are missing at random from the sample, the utilization of some well-known missing-data handling approaches proposed in the literature can yield GMME and MEL estimators with nonidentical properties; in particular, it is shown that the GMME estimator is semiparametric efficient under all the missing-data handling approaches considered but that the MEL estimator is not always efficient. A thorough examination of the reason for the nonequivalence of the two estimators is presented. A particularly strong feature of our analysis is that we do not assume smoothness in the underlying moment conditions. Our results are thus relevant to situations involving nonsmooth estimating functions, including quantile and rank regressions, robust estimation, the estimation of receiver operating characteristic (ROC) curves, and so on.

19.
Let Y be a response variable, possibly multivariate, with a density function f(y | x, v; β) conditional on vectors x and v of covariates and a vector β of unknown parameters. The authors consider the problem of estimating β when the values taken by the covariate vector v are available for all observations while some of those taken by the covariate x are missing at random. They compare the profile estimator to several alternatives, in terms of both bias and standard deviation, when the response and covariates are discrete or continuous.

20.
ABSTRACT

This article considers the estimation of a distribution function FX(x) based on a random sample X1, X2, …, Xn when the sample is suspected to come from a nearby distribution F0(x). The new estimators, namely the preliminary test estimator (PTE) and the Stein-type estimator (SE), are defined and compared with the empirical distribution function (edf) under local departures. We show that the Stein-type estimator is superior to the edf, and that the PTE is superior to the edf when the true distribution is close to F0(x). As a by-product, similar estimators are proposed for population quantiles.
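A minimal sketch of a preliminary-test estimator of the cdf: test H0: F = F0 and return F0 if H0 is not rejected, the edf otherwise. The Kolmogorov–Smirnov test and the 5% level are illustrative choices, and the shrinkage form of the Stein-type estimator is not reproduced here.

```python
import numpy as np
from scipy.stats import kstest, norm

def pte_cdf(sample, F0, alpha=0.05):
    """Preliminary-test estimator of the cdf: return F0 unless a KS test rejects it, else the edf."""
    if kstest(sample, F0).pvalue > alpha:
        return F0                                                        # keep the hypothesized cdf
    xs = np.sort(sample)
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)      # empirical distribution function

rng = np.random.default_rng(5)
sample = rng.normal(loc=0.2, scale=1.0, size=100)    # true mean shifted away from F0 = N(0, 1)
F_hat = pte_cdf(sample, norm(0, 1).cdf)
print(F_hat(0.0))
```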
