Similar Documents (20 results)
1.
Suppose that we have a nonparametric regression model Y = m(X) + ε with X ∈ R^p, where X is a random design variable that is observed completely, and Y is the response variable, some values of which are missing at random. Based on the “complete” data sets for Y obtained by nonparametric regression imputation and by inverse probability weighted imputation, two estimators of the regression function m(x0) at a fixed point x0 ∈ R^p are proposed. Asymptotic normality of the two estimators is established and used to construct normal approximation-based confidence intervals for m(x0). We also construct an empirical likelihood (EL) statistic for m(x0) with a limiting χ² distribution with one degree of freedom, which is used to construct an EL confidence interval for m(x0).
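A minimal sketch of the two imputation schemes described above, using a Nadaraya–Watson kernel estimate both for m(x) and for the response probability; the bandwidth, kernel, and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nw(x0, x, y, w, h):
    """Weighted Nadaraya-Watson estimate of E[Y | X = x0] (Gaussian kernel, bandwidth h)."""
    k = np.exp(-0.5 * ((x - x0) / h) ** 2) * w
    return np.sum(k * y) / np.sum(k)

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2.0, 2.0, n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)
delta = rng.binomial(1, 0.8, n)              # 1 = Y observed, 0 = Y missing at random
obs = delta == 1
h = 0.3

# (a) Regression imputation: fill each missing Y with a kernel estimate built from complete
#     cases, then estimate m(x0) from the completed data set.
m_cc = np.array([nw(xi, x[obs], y[obs], 1.0, h) for xi in x])
y_completed = np.where(obs, y, m_cc)
m_hat_reg = nw(0.5, x, y_completed, np.ones(n), h)

# (b) Inverse probability weighting: estimate pi(x) = P(delta = 1 | X = x) nonparametrically
#     and reweight the complete cases by 1 / pi_hat(x).
pi_hat = np.array([nw(xi, x, delta.astype(float), 1.0, h) for xi in x])
m_hat_ipw = nw(0.5, x[obs], y[obs], 1.0 / pi_hat[obs], h)

print(m_hat_reg, m_hat_ipw, np.sin(0.5))     # both should be close to the true m(0.5)
```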

2.
ABSTRACT

In the stepwise procedure of selecting a fixed or a random explanatory variable in a mixed quantitative linear model with errors following a Gaussian stationary autocorrelated process, we have studied the efficiency of five estimators relative to Generalized Least Squares (GLS): Ordinary Least Squares (OLS), Maximum Likelihood (ML), Restricted Maximum Likelihood (REML), First Differences (FD), and First-Difference Ratios (FDR). We have also studied the validity and power of seven derived testing procedures for assessing the significance of the slope of the candidate explanatory variable x2 entering the model in which there is already one regressor x1. In addition to five testing procedures from the literature, we considered the FDR t-test with n − 3 df and the modified t-test with n̂ − 3 df for partial correlations, where n̂ is Dutilleul's effective sample size. Efficiency, validity, and power were analyzed by Monte Carlo simulations, as functions of the nature of x1 and x2 (fixed vs. random, and if random, purely random or autocorrelated), of the sample size, and of the autocorrelation of the random terms in the regression model. We report extensive results for the autocorrelation structure of first-order autoregressive [AR(1)] type, and discuss results obtained for other autocorrelation structures, such as the spherical semivariogram, first-order moving average [MA(1)] and ARMA(1,1), which we could not present because of space constraints. Overall, we found that:
  1. the efficiency of slope estimators and the validity of testing procedures depend primarily on the nature of x2, but not on that of x1;

  2. FDR is the most inefficient slope estimator, regardless of the nature of x1 and x2;

  3. REML is the most efficient of the slope estimators compared relative to GLS, provided the specified autocorrelation structure is correct and the sample size is large enough to ensure the convergence of its optimization algorithm;

  4. the FDR t-test, the modified t-test and the REML t-test are the most valid of the testing procedures compared, despite the inefficiency of the FDR and OLS slope estimators on which the former two are based;

  5. the FDR t-test, however, suffers from a lack of power that varies with the nature of x1 and x2; and

  6. the modified t-test for partial correlations, which does not require the specification of an autocorrelation structure, can be recommended when x1 is fixed or random and x2 is random, whether purely random or autocorrelated. Our results are illustrated by the environmental data that motivated our work.

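As an illustration of the kind of Monte Carlo comparison described above, the following sketch estimates the efficiency of the OLS slope of x2 relative to GLS under AR(1) errors; the sample size, the AR(1) coefficient, and the use of a known autocorrelation matrix for GLS are simplifying assumptions, not the authors' full design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, n_sim = 50, 0.6, 2000
t = np.arange(n)
Sigma = rho ** np.abs(np.subtract.outer(t, t))     # AR(1) correlation matrix, assumed known for GLS
Sigma_inv = np.linalg.inv(Sigma)
chol = np.linalg.cholesky(Sigma)

x1 = rng.normal(size=n)                            # regressor already in the model
x2 = rng.normal(size=n)                            # candidate regressor whose slope is of interest
X = np.column_stack([np.ones(n), x1, x2])
beta = np.array([0.0, 1.0, 0.5])

ols_slopes, gls_slopes = [], []
for _ in range(n_sim):
    e = chol @ rng.normal(size=n)                  # Gaussian AR(1)-correlated errors
    y = X @ beta + e
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
    ols_slopes.append(b_ols[2])
    gls_slopes.append(b_gls[2])

# Efficiency of OLS relative to GLS for the slope of x2 (at most 1 by the Gauss-Markov argument).
print(np.var(gls_slopes) / np.var(ols_slopes))
```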

3.
The correlation coefficient is widely used to quantify the degree of association between two quantitative variables. By resorting to the geometric representation of the linear correlation coefficient, it is possible to calculate the upper and lower bounds of the correlation coefficient between two variables x1 and x2 when their correlation coefficients with a third variable x3 are available. Implications in observational studies, where x3 could be a proxy of a target variable x2 whose direct measurement is too expensive or impractical, are discussed.
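The bounds follow from the requirement that the 3×3 correlation matrix be positive semidefinite: given r13 and r23, the correlation r12 must lie in r13·r23 ± sqrt((1 − r13²)(1 − r23²)). A small sketch of this standard algebraic form of the bounds (not the authors' geometric derivation):

```python
import numpy as np

def corr_bounds(r13, r23):
    """Feasible range for corr(x1, x2) given corr(x1, x3) and corr(x2, x3)."""
    slack = np.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    return r13 * r23 - slack, r13 * r23 + slack

lo, hi = corr_bounds(0.9, 0.8)     # e.g. x3 is a good proxy of both variables
print(lo, hi)                      # ~ (0.46, 0.98): r12 is forced to be fairly large
```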

4.
Imputation is commonly used to compensate for missing data in surveys. We consider the general case where the responses on either the variable of interest y or the auxiliary variable x or both may be missing. We use ratio imputation for y when the associated x is observed and different imputations when x is not observed. We obtain design-consistent linearization and jackknife variance estimators under uniform response. We also report the results of a simulation study on the efficiencies of imputed estimators, and relative biases and efficiencies of associated variance estimators.
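A minimal sketch of ratio imputation for a missing y whose x is observed: the missing value is replaced by (mean of y among complete pairs / mean of x among complete pairs) times x_i. The respondent-mean fallback used when x is also missing is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def ratio_impute(y, x):
    """Impute missing y by R_hat * x when x is observed, by the respondent mean of y otherwise."""
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    both = ~np.isnan(y) & ~np.isnan(x)
    r_hat = np.mean(y[both]) / np.mean(x[both])           # ratio estimated from complete pairs
    miss_y = np.isnan(y)
    use_ratio = miss_y & ~np.isnan(x)
    y[use_ratio] = r_hat * x[use_ratio]                   # ratio imputation
    y[miss_y & np.isnan(x)] = np.mean(y[both])            # fallback when x is missing too
    return y

y = [10.0, np.nan, 14.0, np.nan, 8.0]
x = [5.0, 6.0, 7.0, np.nan, 4.0]
print(ratio_impute(y, x))
```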

5.
Consider the situation where measurements are taken at two different times and let Mj(x) be some conditional robust measure of location associated with the random variable Y at time j, given that some covariate X = x. The goal is to test H0: M1(x) = M2(x) for each x ∈ {x1, …, xK} such that the probability of one or more Type I errors is less than α, where x1, …, xK are K specified values of the covariate. The paper reports simulation results comparing two methods aimed at accomplishing this goal without specifying some parametric form for the regression line. The first method is based on a simple modification of the method in Wilcox [Introduction to robust estimation and hypothesis testing. 3rd ed. San Diego, CA: Academic Press; 2012, Section 11.11.1]. The main result here is that the second method, which has never been studied, can have higher power, sometimes substantially so. Data from the Well Elderly 2 study, which motivated this paper, are used to illustrate that the alternative approach can make a practical difference. Here, the estimate of Mj(x) is based in part on either a 20% trimmed mean or the Harrell–Davis quantile estimator, but in principle the more successful method can be used with any robust location estimator.
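For reference, the two robust location estimators named above are available in SciPy; the sketch below computes a 20% trimmed mean and the Harrell–Davis estimate of the median for the responses whose covariate values fall near a given design point x. The nearest-neighbour windowing is an illustrative simplification of how Mj(x) might be localized, not the paper's procedure.

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import hdquantiles

def local_location(x, y, x0, k=30):
    """Robust location estimates of Y near X = x0, using the k nearest covariate values."""
    idx = np.argsort(np.abs(x - x0))[:k]
    y_near = y[idx]
    tm = trim_mean(y_near, 0.2)                      # 20% trimmed mean
    hd = hdquantiles(y_near, prob=[0.5])[0]          # Harrell-Davis estimate of the median
    return tm, hd

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
y = 2 * x + rng.standard_t(df=3, size=200)           # heavy-tailed errors
print(local_location(x, y, x0=0.5))
```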

6.
We consider an approach to prediction in a linear model when values of the future explanatory variables are unavailable: we predict a future response yf at a future sample point xf when some components of xf are unavailable. We consider both the case where the components of xf are dependent and the case where they are independent, assuming normality in each. A Taylor expansion is used to derive an approximation to the predictive density, and the influence of missing future explanatory variables (the loss or discrepancy) is assessed using the Kullback–Leibler measure of divergence. This discrepancy is compared in different scenarios, including the situation where the missing variables are dropped entirely.

7.
We present a decomposition of the correlation coefficient between x_t and x_{t−k} into three terms that include the partial and inverse autocorrelations. The first term accounts for the portion of the autocorrelation that is explained by the inner variables {x_{t−1}, x_{t−2}, …, x_{t−k+1}}, the second one measures the portion explained by the outer variables {x_{t+1}, x_{t+2}, …} ∪ {x_{t−k−1}, x_{t−k−2}, …}, and the third term measures the correlation between x_t and x_{t−k} given all other variables. These terms, squared and summed, can form the basis of three portmanteau-type tests that are able to detect both deviation from white noise and lack of fit of an entertained model. Quantiles of their asymptotic sampling distributions are complicated to derive at an adequate level of accuracy, so they are approximated using the Monte Carlo method. A simulation experiment is carried out to investigate the significance levels and power of each test and to compare them with the portmanteau test.
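The Monte Carlo approximation of null quantiles mentioned above can be illustrated with a standard portmanteau statistic; the Ljung–Box form used here is the classical benchmark the authors compare against, not their three new tests: simulate white noise repeatedly, compute the statistic, and take empirical quantiles.

```python
import numpy as np

def acf(x, k):
    """Lag-k sample autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

def ljung_box(x, m):
    """Ljung-Box portmanteau statistic over lags 1..m."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, m + 1))

rng = np.random.default_rng(3)
n, m, n_sim = 100, 10, 5000
stats = np.array([ljung_box(rng.normal(size=n), m) for _ in range(n_sim)])
print(np.quantile(stats, 0.95))      # Monte Carlo 5% critical value; compare with chi2(10) ~ 18.3
```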

8.
In this paper, we propose a new procedure to estimate the distribution of a variable y when there are missing data. To compensate for the missing responses, it is assumed that a covariate vector x is observed and that y and x are related through a semi-parametric regression model. Observed residuals are combined with predicted values to estimate the missing response distribution. Once the response distribution is consistently estimated, we can estimate any parameter defined through a continuous functional T using a plug-in procedure. We prove that the proposed estimators have a high breakdown point.

9.
Two families of parameter estimation procedures for the stable laws based on a variant of the characteristic function are provided. The methodology, which produces viable computational procedures for the stable laws, is generally applicable to other families of distributions across a variety of settings. Both families of procedures may be described as modified weighted chi-squared minimization procedures, and both explicitly take account of constraints on the parameter space. Influence functions for, and efficiencies of, the estimators are given. If x1, x2, …, xn is a random sample from an unknown distribution F, a method for determining the stable law to which F is attracted is developed. Procedures for regression and autoregression with stable error structure are provided. A number of examples are given.

10.
The paper discusses D-optimal axial designs for the additive quadratic and cubic mixture models Σ1≤i≤q (βi xi + βii xi²) and Σ1≤i≤q (βi xi + βii xi² + βiii xi³), where xi ≥ 0 and x1 + … + xq = 1. For the quadratic model, a saturated symmetric axial design is used, in which the support points are of the form (x1, …, xq) = [1 − (q−1)δi, δi, …, δi], where i = 1, 2 and 0 ≤ δ2 < δ1 ≤ 1/(q−1). It is proved that when 3 ≤ q ≤ 6, the above design is D-optimal if δ2 = 0 and δ1 = 1/(q−1), and when q ≥ 7 it is D-optimal if δ2 = 0 and δ1 = [5q − 1 − (9q² − 10q + 1)^(1/2)]/(4q²). Similar results exist for the cubic model, with support points of the form (x1, …, xq) = [1 − (q−1)δi, δi, …, δi], where i = 1, 2, 3 and 0 = δ3 < δ2 < δ1 ≤ 1/(q−1). The saturated D-optimal axial design and the D-optimal design for the quadratic model are compared in terms of their efficiency and uniformity.
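A quick check of the optimal δ1 values quoted above for the quadratic model; the formulas are taken directly from the abstract, and the D-criterion evaluation itself is not reproduced here.

```python
import math

def delta1_quadratic(q):
    """D-optimal delta_1 for the saturated symmetric axial design of the additive quadratic mixture model."""
    if 3 <= q <= 6:
        return 1.0 / (q - 1)
    if q >= 7:
        return (5 * q - 1 - math.sqrt(9 * q * q - 10 * q + 1)) / (4 * q * q)
    raise ValueError("q must be at least 3")

for q in (3, 5, 6, 7, 10):
    d1 = delta1_quadratic(q)
    # The first support point has the form (1 - (q-1)*d1, d1, ..., d1); the second uses delta_2 = 0.
    print(q, round(d1, 4), round(1 - (q - 1) * d1, 4))
```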

11.
In this paper, the statistical inference of the unknown parameters of a Burr Type III (BIII) distribution based on a unified hybrid censored sample is studied. The maximum likelihood estimators of the unknown parameters are obtained using the Expectation–Maximization algorithm. It is observed that the Bayes estimators cannot be obtained in explicit form, hence Lindley's approximation and the Markov chain Monte Carlo (MCMC) technique are used to compute them. Further, the highest posterior density credible intervals of the unknown parameters based on the MCMC samples are provided. A new model selection test is developed for discriminating between two competing models under the unified hybrid censoring scheme. Finally, the potential of the BIII distribution for analysing real data is illustrated using fracture toughness data for three different materials, namely silicon nitride (Si3N4), zirconium dioxide (ZrO2) and sialon (Si6−xAlxOxN8−x). It is observed that for these data sets the BIII distribution fits better than the Weibull distribution, which is frequently used in fracture toughness data analysis.

12.
We present results of a Monte Carlo study comparing four methods of estimating the parameters of the logistic model logit(Pr(Y = 1 | X, Z)) = α0 + α1X + α2Z, where X and Z are continuous covariates and X is always observed but Z is sometimes missing. The four methods examined are (1) logistic regression using complete cases, (2) logistic regression with filled-in values of Z obtained from the regression of Z on X and Y, (3) logistic regression with filled-in values of Z and random error added, and (4) maximum likelihood estimation assuming the distribution of Z given X and Y is normal. The effects of different percentages of missing Z values and of different missing-value mechanisms on the bias and mean absolute deviation of the estimators are examined for data sets of N = 200 and N = 400.
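A minimal sketch of the first two methods above (complete-case analysis and regression imputation of Z from X and Y), using statsmodels; the data-generating values and the MCAR missingness rate are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + 1.0 * z)))     # true alpha = (-0.5, 1, 1)
y = rng.binomial(1, p)
miss = rng.random(n) < 0.3                                # Z missing completely at random
cc = ~miss

# Method 1: logistic regression on complete cases only.
fit_cc = sm.Logit(y[cc], sm.add_constant(np.column_stack([x[cc], z[cc]]))).fit(disp=0)

# Method 2: fill in missing Z from the linear regression of Z on X and Y (fitted on complete cases).
z_model = sm.OLS(z[cc], sm.add_constant(np.column_stack([x[cc], y[cc]]))).fit()
z_fill = z.copy()
z_fill[miss] = z_model.predict(sm.add_constant(np.column_stack([x[miss], y[miss]])))
fit_imp = sm.Logit(y, sm.add_constant(np.column_stack([x, z_fill]))).fit(disp=0)

print(fit_cc.params)    # estimates of (alpha0, alpha1, alpha2) from complete cases
print(fit_imp.params)   # estimates after regression imputation of Z
```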

13.
In many experiments, not all explanatory variables can be controlled. When the units arise sequentially, different approaches may be used. The authors study a natural sequential procedure for “marginally restricted” D-optimal designs. They assume that one set of explanatory variables (x1) is observed sequentially, and that the experimenter responds by choosing an appropriate value of the explanatory variable x2. In order to solve the sequential problem a priori, the authors consider the problem of constructing optimal designs with a prior marginal distribution for x1. This eliminates the influence of units already observed on the next unit to be designed. They give explicit designs for various cases in which the mean response follows a linear regression model; they also consider a case study with a nonlinear logistic response. They find that the optimal strategy often consists of randomizing the assignment of the values of x2.

14.
This paper considers the problem of combining k unbiased estimates xi of a parameter μ, where each estimate xi is the average of ni + 1 independent normal observations with unknown mean μ and unknown variance σi². The behavior of several commonly used estimators of μ is studied by means of an empirical sampling study, and the empirical results of this experiment are interpreted in terms of previous theoretical results. Finally, some extrapolations are made as to how these estimators might behave under varying conditions, and some new estimators are proposed which might have higher efficiencies under certain conditions than those which are generally used.
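One commonly used estimator in this setting is the inverse-variance weighted mean (the Graybill–Deal estimator), with each variance estimated from the corresponding sample; this sketch is illustrative background, not one of the paper's proposed new estimators.

```python
import numpy as np

def graybill_deal(means, variances, sizes):
    """Combine k independent sample means, weighting each by its estimated inverse variance."""
    means = np.asarray(means, dtype=float)
    w = np.asarray(sizes, dtype=float) / np.asarray(variances, dtype=float)   # 1 / est. var of each mean
    return np.sum(w * means) / np.sum(w)

# Three estimates of the same mu, with different sample sizes and sample variances.
print(graybill_deal(means=[10.2, 9.7, 10.5], variances=[4.0, 1.0, 9.0], sizes=[20, 10, 30]))
```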

15.
Let X be lognormal(μ, σ²) with density f(x); let θ > 0 and define L(θ) = E[e^(−θX)]. We study properties of the exponentially tilted density (Esscher transform) fθ(x) = e^(−θx) f(x)/L(θ), in particular its moments, its asymptotic form as θ → ∞, and asymptotics for the saddlepoint θ(x) determined by the condition that the mean of the tilted density equals x. The asymptotic formulas involve the Lambert W function. The established relations are used to provide two different numerical methods for evaluating the left-tail probability of the sum of lognormals Sn = X1 + ⋯ + Xn: a saddlepoint approximation and an exponential-tilting importance sampling estimator. For the latter, we demonstrate logarithmic efficiency. Numerical examples for the cdf Fn(x) and the pdf fn(x) of Sn are given in a range of values of σ², n and x motivated by portfolio value-at-risk calculations.
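A numerical sketch of the saddlepoint equation for a single lognormal term, under the reconstruction above (L(θ) = E[e^(−θX)], with θ(x) solving E_θ[X] = x); the quadrature and root-bracketing choices are illustrative, not the paper's asymptotic Lambert-W formulas.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import lognorm

mu, sigma = 0.0, 1.0
f = lognorm(s=sigma, scale=np.exp(mu)).pdf          # lognormal(mu, sigma^2) density

def L(theta):
    """Laplace transform L(theta) = E[exp(-theta * X)]."""
    return quad(lambda x: np.exp(-theta * x) * f(x), 0, np.inf)[0]

def tilted_mean(theta):
    """Mean of the exponentially tilted density f_theta(x) = exp(-theta x) f(x) / L(theta)."""
    num = quad(lambda x: x * np.exp(-theta * x) * f(x), 0, np.inf)[0]
    return num / L(theta)

def saddlepoint(x, theta_hi=100.0):
    """Solve E_theta[X] = x for theta > 0 (relevant for left-tail x below E[X])."""
    return brentq(lambda t: tilted_mean(t) - x, 1e-8, theta_hi)

x = 0.3                                             # a left-tail target, below E[X] = exp(1/2) ~ 1.65
theta_x = saddlepoint(x)
print(theta_x, tilted_mean(theta_x))                # the tilted mean should reproduce x
```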

16.
Let H(x, y) be a continuous bivariate distribution function with known marginal distribution functions F(x) and G(y). Suppose the values of H are given at several points, H(xi, yi) = θi, i = 1, 2, …, n. We first discuss when a distribution satisfying these constraints exists and present a procedure for checking whether it does. We then consider finding lower and upper bounds for such distributions. These bounds may be used to establish bounds on the values of Spearman's ρ and Kendall's τ. For n = 2, we present necessary and sufficient conditions for the existence of such a distribution function and derive best-possible upper and lower bounds for H(x, y). As shown by a counter-example, these bounds need not be proper distribution functions, and we find conditions for these bounds to be (proper) distribution functions. We also present some results for the general case, where the values of H(x, y) are known at more than two points. In view of the simplification in notation, our results are presented in terms of copulas, but they may easily be expressed in terms of distribution functions.
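As background for the bounds discussed above, the unconstrained Fréchet–Hoeffding bounds state that any copula C satisfies max(u + v − 1, 0) ≤ C(u, v) ≤ min(u, v); the paper's bounds tighten these when C is additionally required to pass through given points. A minimal sketch of the unconstrained bounds only (not the refined constrained bounds derived in the paper):

```python
import numpy as np

def frechet_hoeffding_bounds(u, v):
    """Unconstrained lower and upper bounds W(u, v) <= C(u, v) <= M(u, v) for any copula C."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    lower = np.maximum(u + v - 1.0, 0.0)     # W: countermonotonic bound
    upper = np.minimum(u, v)                 # M: comonotonic bound
    return lower, upper

print(frechet_hoeffding_bounds(0.7, 0.6))    # (0.3, 0.6): any copula value at (0.7, 0.6) lies in between
```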

17.
Two-phase sampling is often used for estimating a population total or mean when the cost per unit of collecting auxiliary variables, x, is much smaller than the cost per unit of measuring a characteristic of interest, y. In the first phase, a large sample s1 is drawn according to a specified sampling design p(s1), and auxiliary data x are observed for the units i ∈ s1. Given the first-phase sample s1, a second-phase sample s2 is selected from s1 according to a specified sampling design p(s2 | s1), and (y, x) is observed for the units i ∈ s2. In some cases, the population totals of some components of x may also be known. Two-phase sampling is used for stratification at the second phase or both phases and for regression estimation. Horvitz–Thompson-type variance estimators are used for variance estimation. However, the Horvitz–Thompson (Horvitz & Thompson, J. Amer. Statist. Assoc., 1952) variance estimator in uni-phase sampling is known to be highly unstable and may take negative values when the units are selected with unequal probabilities. On the other hand, the Sen–Yates–Grundy variance estimator is relatively stable and non-negative for several unequal probability sampling designs with fixed sample sizes. In this paper, we extend the Sen–Yates–Grundy (Sen, J. Ind. Soc. Agric. Statist., 1953; Yates & Grundy, J. Roy. Statist. Soc. Ser. B, 1953) variance estimator to two-phase sampling, assuming a fixed first-phase sample size and a fixed second-phase sample size given the first-phase sample. We apply the new variance estimators to two-phase sampling designs with stratification at the second phase or both phases. We also develop Sen–Yates–Grundy-type variance estimators of the two-phase regression estimators that make use of the first-phase auxiliary data and known population totals of some of the auxiliary variables.

18.
The generalized method of moments (GMM) and empirical likelihood (EL) are popular methods for combining sample and auxiliary information. These methods are used in very diverse fields of research, where competing theories often suggest variables satisfying different moment conditions. Results in the literature have shown that the efficient-GMM (GMME) and maximum empirical likelihood (MEL) estimators have the same asymptotic distribution to order n^(−1/2) and that both estimators are asymptotically semiparametric efficient. In this paper, we demonstrate that when data are missing at random from the sample, the utilization of some well-known missing-data handling approaches proposed in the literature can yield GMME and MEL estimators with nonidentical properties; in particular, it is shown that the GMME estimator is semiparametric efficient under all the missing-data handling approaches considered but that the MEL estimator is not always efficient. A thorough examination of the reason for the nonequivalence of the two estimators is presented. A particularly strong feature of our analysis is that we do not assume smoothness in the underlying moment conditions. Our results are thus relevant to situations involving nonsmooth estimating functions, including quantile and rank regressions, robust estimation, the estimation of receiver operating characteristic (ROC) curves, and so on.

19.
Let Y be a response variable, possibly multivariate, with a density function f(y | x, v; β) conditional on vectors x and v of covariates and a vector β of unknown parameters. The authors consider the problem of estimating β when the values taken by the covariate vector v are available for all observations while some of those taken by the covariate x are missing at random. They compare the profile estimator to several alternatives, in terms of both bias and standard deviation, when the response and covariates are discrete or continuous.

20.
ABSTRACT

This article considers the estimation of a distribution function FX(x) based on a random sample X1, X2, …, Xn when the sample is suspected to come from a nearby distribution F0(x). The new estimators, namely the preliminary test estimator (PTE) and the Stein-type estimator (SE), are defined and compared with the empirical distribution function (edf) under local departures. We show that the Stein-type estimator is superior to the edf, and that the PTE is superior to the edf when the true distribution is close to F0(x). As a by-product, similar estimators are proposed for population quantiles.
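A minimal sketch of a preliminary-test estimator of the cdf: test H0: F = F0 and return F0 if H0 is not rejected, the edf otherwise. The Kolmogorov–Smirnov test and the 5% level are illustrative choices, and the shrinkage form of the Stein-type estimator is not reproduced here.

```python
import numpy as np
from scipy.stats import kstest, norm

def pte_cdf(sample, F0, alpha=0.05):
    """Preliminary-test estimator of the cdf: return F0 unless a KS test rejects it, else the edf."""
    if kstest(sample, F0).pvalue > alpha:
        return F0                                                        # keep the hypothesized cdf
    xs = np.sort(sample)
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)      # empirical distribution function

rng = np.random.default_rng(5)
sample = rng.normal(loc=0.2, scale=1.0, size=100)    # true mean shifted away from F0 = N(0, 1)
F_hat = pte_cdf(sample, norm(0, 1).cdf)
print(F_hat(0.0))
```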
