Similar Literature
20 similar documents found.
1.
In many applications we can expect, or are interested in knowing, whether a density function or a regression curve satisfies some specific shape constraint. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y, is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. In many problems, however, data can be observed only with measurement error, and the methods employed in the error-free case typically do not carry over to this setting. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints that is valid in the presence of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
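
The tilting procedure itself is specific to the paper, but its test logic can be illustrated in the error-free case: estimate the curve, push it to the nearest constrained curve, and take the distance moved as the statistic, calibrated by a bootstrap that respects the null. The sketch below is a minimal stand-in that uses isotonic projection in place of tilting and ignores measurement error; the bandwidth h, the grid, and all data are illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def nw_smooth(x, y, grid, h):
    """Nadaraya-Watson kernel regression estimate on a grid."""
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def monotonicity_statistic(x, y, grid, h):
    """Distance between the unconstrained fit and its monotone projection."""
    m_hat = nw_smooth(x, y, grid, h)
    m_mono = IsotonicRegression().fit_transform(grid, m_hat)
    return np.mean((m_hat - m_mono) ** 2), m_mono

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(np.pi * x / 2) + rng.normal(0, 0.2, 200)   # monotone truth
grid = np.linspace(0, 1, 101)

t_obs, m0 = monotonicity_statistic(x, y, grid, h=0.1)

# Bootstrap calibration: resample residuals around the *constrained* fit,
# so the resampled data obey the null hypothesis of monotonicity.
resid = y - np.interp(x, grid, m0)
t_boot = []
for _ in range(200):
    y_star = np.interp(x, grid, m0) + rng.choice(resid, size=x.size, replace=True)
    t_boot.append(monotonicity_statistic(x, y_star, grid, h=0.1)[0])
p_value = np.mean(np.array(t_boot) >= t_obs)
print(f"statistic={t_obs:.2e}, bootstrap p-value={p_value:.2f}")
```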

2.
A new technique is devised to mitigate the errors-in-variables bias in linear regression. The procedure mimics a two-stage least squares procedure in which an auxiliary regression is derived to generate a better-behaved predictor variable, which is then used as a substitute for the error-prone variable in the first-stage model. The performance of the algorithm is tested by simulation and regression analyses. Simulations suggest that the algorithm efficiently captures the additive error term used to contaminate the artificial variables. Regressions lend further support to the simulations, as they clearly show that the compact genetic algorithm-based estimate of the true but unobserved regressor yields considerably better results. These conclusions are robust across different sample sizes and different variance structures imposed on both the measurement error and the regression disturbances.
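
The compact genetic algorithm used to generate the substitute regressor is not reproduced here; the toy simulation below only illustrates the problem the procedure targets (attenuation of the slope toward zero) and the generic two-stage logic of replacing the error-prone variable with a generated, better-behaved predictor, here built from a hypothetical instrument z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x_true = rng.normal(0, 1, n)              # unobserved regressor
x_obs = x_true + rng.normal(0, 0.8, n)    # measured with additive error
z = x_true + rng.normal(0, 0.5, n)        # instrument: correlated with x_true,
                                          # independent of the measurement error
y = 1.0 + 2.0 * x_true + rng.normal(0, 1, n)

def ols_slope(a, b):
    return np.cov(a, b, ddof=1)[0, 1] / np.var(a, ddof=1)

# Naive OLS on the error-prone regressor is attenuated toward zero:
print("naive slope:", ols_slope(x_obs, y))        # ~ 2 * lambda < 2

# First stage: regress x_obs on z to generate a cleaner predictor,
# then use the fitted values in the outcome regression (2SLS logic).
x_hat = np.polyval(np.polyfit(z, x_obs, 1), z)
print("two-stage slope:", ols_slope(x_hat, y))    # ~ 2
```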

3.
For a linear regression model over m populations with separate regression coefficients but a common error variance, a Bayesian model is employed to obtain regression coefficient estimates that are shrunk toward an overall value. The formulation uses Normal priors on the coefficients and diffuse priors on the grand mean vectors, the error variance, and the between-to-error variance ratios. The posterior density of the parameters that were given diffuse priors is obtained. From this, the posterior means and variances of the regression coefficients and the predictive mean and variance of a future observation are obtained directly by numerical integration in the balanced case, and with the aid of series expansions in the approximately balanced case. An example is presented and worked out for the case of one predictor variable. The method is an extension of Box & Tiao's Bayesian estimation of means in the balanced one-way random effects model.

4.
The cumulative sum (CUSUM) chart is commonly used for detecting small or moderate shifts in the fraction of defective manufactured items. However, its construction relies on the assumption of error-free inspection, which can seldom be met in practice. In this article, we discuss the construction of an upward CUSUM chart in the presence of inspection error, study the effects of inspection error on the out-of-control ARL of the CUSUM chart, and present a formula for determining the sample size that compensates for the effect of inspection error on the out-of-control ARL.
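
A minimal sketch of the setting, assuming the standard attribute inspection error model in which a conforming item is misclassified with probability theta1 and a defective item with probability theta2; the chart then effectively monitors the apparent, not the true, fraction defective. The design constants k and h below are illustrative, not the paper's compensating choices.

```python
import numpy as np

def apparent_fraction(p, theta1, theta2):
    """Fraction of items *classified* defective when the true fraction is p.

    theta1: P(conforming item classified as defective)  -- Type I error
    theta2: P(defective item classified as conforming)  -- Type II error
    """
    return p * (1 - theta2) + (1 - p) * theta1

def upward_cusum(counts, k, h):
    """Tabular upward CUSUM on the number of defectives per sample.

    Returns the index of the first out-of-control signal, or None.
    """
    s = 0.0
    for i, d in enumerate(counts):
        s = max(0.0, s + d - k)
        if s > h:
            return i
    return None

rng = np.random.default_rng(2)
n, p1 = 100, 0.05                       # sample size, shifted fraction defective
theta1, theta2 = 0.01, 0.10
# Observed defect counts reflect the apparent, not the true, fraction:
counts = rng.binomial(n, apparent_fraction(p1, theta1, theta2), size=500)
print("first signal at sample:", upward_cusum(counts, k=3.5, h=8.0))
```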

5.
Researchers in the medical, health, and social sciences routinely encounter ordinal variables such as self-reports of health or happiness. When modelling ordinal outcome variables, it is common to have covariates, for example attitudes, family income, or retrospective variables, measured with error. As is well known, ignoring even random error in covariates can bias coefficients and hence prejudice the estimates of effects. We propose an instrumental variable approach to the estimation of a probit model with an ordinal response and mismeasured predictor variables. We obtain likelihood-based and method-of-moments estimators that are consistent and asymptotically normally distributed under general conditions. These estimators are easy to compute, perform well, and are robust against departures from the normality assumption for the measurement errors in our simulation studies. The proposed method is applied to both simulated and real data. The Canadian Journal of Statistics 47: 653–667; 2019 © 2019 Statistical Society of Canada.

6.
For continuous inspection schemes in an automated manufacturing environment, a useful alternative to the traditional p or np chart is the Run-Length control chart, which is based on plotting the run lengths (the number of conforming items) between successive nonconforming items. However, its establishment relies on the assumption of error-free inspection, which can seldom be met in practice. In this paper, the effects of inspection errors on the Run-Length chart are investigated under the assumption that these errors are known. The actual false alarm probability and the average number inspected (ANI) in the presence of inspection errors are studied. The paper also presents adjusted control limits for the Run-Length chart, which yield ANI curves much closer to those obtained under error-free inspection.
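
Under error-free inspection the run length R (the number of conforming items before a nonconforming one) is geometric with parameter p, so a lower control limit can be read off a geometric quantile; with inspection errors, the same calculation applies to the apparent fraction defective. A rough sketch, under the same two-error model as above (not the paper's exact adjustment):

```python
import math

def apparent_fraction(p, theta1, theta2):
    # theta1: conforming classified nonconforming; theta2: the reverse.
    return p * (1 - theta2) + (1 - p) * theta1

def lower_limit(p, alpha):
    """Largest integer r with P(R <= r) <= alpha for R ~ Geometric(p),
    where R counts conforming items before a nonconforming one.
    Uses P(R <= r) = 1 - (1 - p)**(r + 1)."""
    r = int(math.floor(math.log(1 - alpha) / math.log(1 - p))) - 1
    return max(r, 0)

p, alpha = 0.01, 0.05
theta1, theta2 = 0.005, 0.05
p_a = apparent_fraction(p, theta1, theta2)

print("nominal LCL (error-free):", lower_limit(p, alpha))     # 4
print("adjusted LCL (apparent p):", lower_limit(p_a, alpha))  # shorter runs
```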

7.
Non-symmetric correspondence analysis (NSCA) is a useful technique for analysing a two-way contingency table. Frequently there is more than one predictor variable; in this paper, we consider two categorical predictor variables and one response variable. Interaction represents the joint effects of the predictor variables on the response variable, and when interaction is present, the interpretation of the main effects is incomplete or misleading. To separate the main effects from the interaction term, we introduce a method that, starting from the coordinates of multiple NSCA and using a two-way analysis of variance without interaction, allows a better interpretation of the impact of the predictor variables on the response variable. The proposed method is applied to a well-known three-way contingency table of Bockenholt and Bockenholt, in which subjects are cross-classified by attitude towards abortion, number of years of education, and religion. We analyse the case where the variables education and religion influence a person's attitude towards abortion.

8.
In an epidemiological study the regression slope between a response and predictor variable is underestimated when the predictor variable is measured imprecisely. Repeat measurements of the predictor in individuals in a subset of the study, or in a separate study, can be used to estimate a multiplicative factor to correct for this 'regression dilution bias'. In applied statistics publications, various methods have been used to estimate this correction factor. Here we compare six different estimation methods and explain how they fall into two categories, namely regression and correlation-based methods. We provide new asymptotic variance formulae for the optimal correction factors in each category, when these are estimated from the repeat measurements subset alone, and show analytically and by simulation that the correlation method of choice gives uniformly lower variance. The simulations also show that, when the correction factor is not much greater than 1, this correlation method gives a correction factor which is closer to the true value than that from the best regression method on up to 80% of occasions. We also provide a variance formula for a modified correlation method which uses the standard deviation of the predictor variable in the main study; this shows further improved performance provided that the correction factor is not too extreme. A confidence interval for a corrected regression slope in an epidemiological study should reflect the imprecision of both the uncorrected slope and the estimated correction factor. We provide formulae for this and show that, particularly when the correction factor is large and the size of the subset of repeat measures is small, the effect of allowing for imprecision in the estimated correction factor can be substantial.
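
As a concrete illustration of the simplest correlation-based estimator: the correlation between two replicate measurements estimates the reliability lambda, and dividing the observed slope by it removes the dilution. The sketch below assumes classical additive error and simulated data; the six estimators compared in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
n_main, n_sub = 2000, 300
x_true = rng.normal(0, 1, n_main)
y = 0.5 * x_true + rng.normal(0, 1, n_main)
x1 = x_true + rng.normal(0, 0.7, n_main)      # single measurement (main study)
# Repeat measurements on a subset of individuals:
x2_sub = x_true[:n_sub] + rng.normal(0, 0.7, n_sub)

# Uncorrected slope, attenuated by the reliability lambda:
beta_obs = np.cov(x1, y, ddof=1)[0, 1] / np.var(x1, ddof=1)

# Correlation-based reliability estimate from the replicates:
# corr(x1, x2) estimates lambda = var(X_true) / var(X_measured).
lam_hat = np.corrcoef(x1[:n_sub], x2_sub)[0, 1]

print("attenuated slope:", beta_obs)            # ~ 0.5 * lambda ~ 0.34
print("corrected slope:", beta_obs / lam_hat)   # ~ 0.5
```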

9.
This paper extends the concept of risk unbiasedness to statistical prediction and nonstandard inference problems, formalizing the idea that a risk unbiased predictor should be, on average, at least as close to the “true” predictant as to any “wrong” predictant. A novel aspect of our approach is measuring closeness between a predicted value and the predictant by a regret function, derived suitably from the given loss function. The general concept is more relevant than mean unbiasedness, especially for asymmetric loss functions. For squared error loss, we present a method for deriving best (minimum risk) risk unbiased predictors when the regression function is linear in a function of the parameters. We derive a Rao–Blackwell type result for a class of loss functions that includes squared error and LINEX losses as special cases. For location-scale families, we prove that if a unique best risk unbiased predictor exists, then it is equivariant. The concepts and results are illustrated with several examples. One interesting finding is that in some problems a best unbiased predictor does not exist, but a best risk unbiased predictor can be obtained. Thus, risk unbiasedness can be a useful tool for selecting a predictor.
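
In symbols, the condition described can plausibly be written as follows (a sketch of the definition, with rho the regret derived from the loss; the paper's exact formalization may differ in detail):

```latex
% delta(X) is risk unbiased for the predictant T if, on average, it is at
% least as close (in regret rho) to the true T as to any "wrong" T':
\[
  \mathbb{E}\,\rho\bigl(\delta(X),\,T\bigr)
  \;\le\;
  \mathbb{E}\,\rho\bigl(\delta(X),\,T'\bigr)
  \qquad \text{for every competing predictant } T'.
\]
```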

10.
This paper considers the problem of simultaneous prediction of the actual and average values of the dependent variable in a general linear regression model. Utilizing the philosophy of the Stein rule procedure, a family of improved predictors for a linear function of the actual and expected values of the dependent variable for the forecast period is proposed. An unbiased estimator for the mean squared error (MSE) matrix of the proposed family of predictors is obtained, and dominance of the family of Stein rule predictors over the best linear unbiased predictor (BLUP) is established under a quadratic loss function.
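
For reference, the classical Stein-rule family in the linear model y = Xβ + u, on which this kind of predictor family builds, has the shrinkage form below (a standard textbook form, not necessarily the paper's exact parametrization):

```latex
% Stein-rule shrinkage of the OLS estimator \hat\beta, indexed by k >= 0:
\[
  \hat{\beta}_{\mathrm{SR}}(k)
  \;=\;
  \left( 1 - \frac{k\, e'e}{\hat{\beta}' X'X \hat{\beta}} \right) \hat{\beta},
  \qquad e = y - X\hat{\beta},
\]
% which shrinks OLS toward zero; a forecast-period prediction would then
% use X_f \hat{\beta}_{\mathrm{SR}}(k) for the forecast design matrix X_f.
```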

11.
Measurement error and misclassification models feature prominently in the literature. This paper describes misreporting error, which can be considered to fall somewhere between these two broad types of model. Misreporting is concerned with situations where a continuous random variable X is measured with error and only reported as the discrete random variable Z. Data grouping or rounding are the simplest examples of this, but more generally X may be reported as a value z of Z which refers to a different interval from the one in which X actually lies. The paper discusses a method for handling misreported data and draws links with measurement error and misclassification models. A motivating example is taken from prenatal Down's syndrome screening, where the gestational age at which mothers present for screening is a true continuous variable but is misreported because it is only ever observed as a discrete whole number of weeks, which may in fact be in error. The implications this misreporting might have for the screening are investigated.

12.
The paper develops a method from which algorithms can be constructed to numerically compute error-free (free from computer roundoff error) generalized inverses and solutions to linear least squares problems having rational entries. A multiple modulus system is used to avoid error accumulation that is inherent in the floating-point number system. Some properties of finite fields of characteristic p, GF(p), are used in conjunction with a bordering method for matrix inversion to find nonsingular minors of a matrix over the field of rational numbers.

13.
We propose a new set of test statistics to examine the association between two ordinal categorical variables X and Y after adjusting for continuous and/or categorical covariates Z. Our approach first fits multinomial (e.g., proportional odds) models of X and Y, separately, on Z. For each subject, we then compute the conditional distributions of X and Y given Z. If there is no relationship between X and Y after adjusting for Z, then these conditional distributions will be independent, and the observed value of (X, Y) for a subject is expected to follow the product distribution of these conditional distributions. We consider two simple ways of testing the null of conditional independence, both of which treat X and Y equally, in the sense that they do not require specifying an outcome and a predictor variable. The first approach adds these product distributions across all subjects to obtain the expected distribution of (X, Y) under the null and then contrasts it with the observed unconditional distribution of (X, Y). Our second approach computes "residuals" from the two multinomial models and then tests for correlation between these residuals; we define a new individual-level residual for models with ordinal outcomes. We present methods for computing p-values using either the empirical or asymptotic distributions of our test statistics. Through simulations, we demonstrate that our test statistics perform well in terms of power and Type I error rate when compared to proportional odds models that treat X as either a continuous or categorical predictor. We apply our methods to data from a study of visual impairment in children and to a study of cervical abnormalities in human immunodeficiency virus (HIV)-infected women. Supplemental materials for the article are available online.
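
A minimal sketch of the first approach, with ordinary multinomial logit standing in for the proportional-odds fits, and with the null distribution obtained by simulating (X, Y) independently from the fitted conditional distributions rather than by the paper's asymptotics; all data and function names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def observed_joint(x, y, kx, ky):
    """Unconditional contingency table of (X, Y)."""
    tab = np.zeros((kx, ky))
    np.add.at(tab, (x, y), 1.0)
    return tab

def test_statistic(x, y, px, py):
    """Chi-square-type contrast of observed vs expected joint distribution."""
    expected = np.einsum('ni,nj->ij', px, py)   # sum of per-subject products
    observed = observed_joint(x, y, px.shape[1], py.shape[1])
    return np.sum((observed - expected) ** 2 / np.maximum(expected, 1e-9))

def draw(pmat, rng):
    """Draw one category per row from row-wise probabilities."""
    u = rng.random((pmat.shape[0], 1))
    return (u > pmat.cumsum(axis=1)).sum(axis=1)

rng = np.random.default_rng(4)
n = 500
z = rng.normal(size=(n, 1))
x = np.digitize(z[:, 0] + rng.normal(0, 1, n), [-1, 1])   # ordinal, 3 levels
y = np.digitize(z[:, 0] + rng.normal(0, 1, n), [-1, 1])   # linked to x only via z

# Multinomial logit models of X|Z and Y|Z (stand-ins for proportional odds):
px = LogisticRegression().fit(z, x).predict_proba(z)
py = LogisticRegression().fit(z, y).predict_proba(z)

t_obs = test_statistic(x, y, px, py)
# Null calibration: draw X and Y independently from their fitted
# conditional distributions and recompute the statistic.
t_null = [test_statistic(draw(px, rng), draw(py, rng), px, py)
          for _ in range(500)]
p_value = np.mean(np.array(t_null) >= t_obs)
print(f"statistic = {t_obs:.1f}, simulated p-value = {p_value:.3f}")
```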

14.
Random error in a continuous outcome variable does not affect its regression on a predictor. However, when a continuous outcome variable is dichotomised, random measurement error results in a flatter exposure-response relationship with a higher intercept. Although this consequence is similar to the effect of misclassification in a binary outcome variable, it cannot be corrected using techniques appropriate for binary data. Conditional distributions of the measurements of the continuous outcome variable can be corrected if the reliability coefficient of the measurements can be estimated. An unbiased estimate of the exposure-response relationship is then easily calculated. This procedure is demonstrated using data on the relationship between smoking and the development of airway obstruction.
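
A toy simulation of the phenomenon and of a reliability-style correction, assuming normal errors with known variances (in practice the reliability coefficient would be estimated, as the abstract describes):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 20000
x = rng.normal(0, 1, n)                      # exposure
y_true = 1.0 * x + rng.normal(0, 1, n)       # continuous outcome, sd_e = 1
y_meas = y_true + rng.normal(0, 1, n)        # measurement error, sd_u = 1
cut = 0.0                                    # dichotomization threshold

X = sm.add_constant(x)
b_true = sm.Probit((y_true > cut).astype(int), X).fit(disp=0).params[1]
b_meas = sm.Probit((y_meas > cut).astype(int), X).fit(disp=0).params[1]

# With Y|x ~ N(bx, sd_e^2), the probit slope is b/sd_e; adding error of
# variance sd_u^2 rescales it to b/sqrt(sd_e^2 + sd_u^2) -- a flatter curve.
sd_e, sd_u = 1.0, 1.0
b_corrected = b_meas * np.sqrt(sd_e**2 + sd_u**2) / sd_e

print(f"slope, true outcome:       {b_true:.3f}")   # ~ 1.0
print(f"slope, noisy dichotomized: {b_meas:.3f}")   # ~ 0.71 (attenuated)
print(f"corrected slope:           {b_corrected:.3f}")
```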

15.
Mixed effects models and Berkson measurement error models are widely used. They share features which the author uses to develop a unified estimation framework. He deals with models in which the random effects (or measurement errors) have a general parametric distribution, whereas the random regression coefficients (or unobserved predictor variables) and error terms have nonparametric distributions. He proposes a second-order least squares estimator and a simulation-based estimator based on the first two moments of the conditional response variable given the observed covariates. He shows that both estimators are consistent and asymptotically normally distributed under fairly general conditions. The author also reports Monte Carlo simulation studies showing that the proposed estimators perform satisfactorily for relatively small sample sizes. Compared to the likelihood approach, the proposed methods are computationally feasible and do not rely on the normality assumption for random effects or other variables in the model.
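
The flavour of a second-order least squares fit can be sketched for a linear Berkson-error model: the estimator matches the first two conditional moments of Y given the observed covariate. In this toy unweighted version only the combined variance tau2 = b1^2 * var(delta) + var(eps) is identified; the paper's estimators use optimal weighting and general parametric error distributions.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(6)
n = 1000
x = rng.uniform(-2, 2, n)                    # observed covariate
x_true = x + rng.normal(0, 0.5, n)           # Berkson measurement error
y = 1.0 + 2.0 * x_true + rng.normal(0, 1, n)

def moment_residuals(theta):
    """Stacked residuals of the first two conditional moments of Y given x."""
    b0, b1, log_tau2 = theta
    m1 = b0 + b1 * x                          # E[Y | x]
    m2 = m1 ** 2 + np.exp(log_tau2)           # E[Y^2 | x]
    return np.concatenate([y - m1, y ** 2 - m2])

fit = least_squares(moment_residuals, x0=[0.0, 1.0, 0.0])
b0, b1, tau2 = fit.x[0], fit.x[1], np.exp(fit.x[2])
# Here tau2 should approach b1^2 * 0.25 + 1 = 2.
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, tau2 = {tau2:.2f}")
```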

16.
17.
We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are “close” to the chosen “base model,” and compares the error rates of the base model with those of the nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure “Next-Door analysis” since it examines models “next” to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library. The Canadian Journal of Statistics 48: 447–470; 2020 © 2020 Statistical Society of Canada.
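
An informal sketch of the procedure using scikit-learn's LassoCV (the R companion library to glmnet mentioned in the abstract is not used here); the 10% deterioration threshold is a made-up heuristic standing in for the paper's formal assessment of significance.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.datasets import make_regression

# Hypothetical data; in practice use your own X, y.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

base = LassoCV(cv=5, random_state=0).fit(X, y)
chosen = np.flatnonzero(base.coef_)
base_err = np.min(base.mse_path_.mean(axis=1))   # CV error of the base model

# Next-Door analysis (sketch): delete each chosen predictor, refit the
# lasso, and compare cross-validated error with the base model.
for j in chosen:
    keep = np.delete(np.arange(X.shape[1]), j)
    fit = LassoCV(cv=5, random_state=0).fit(X[:, keep], y)
    err = np.min(fit.mse_path_.mean(axis=1))
    verdict = "indispensable?" if err > 1.1 * base_err else "acceptable alternative"
    print(f"drop x{j}: CV MSE {err:.1f} vs base {base_err:.1f} -> {verdict}")
```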

18.
In spite of widespread criticism, macroeconometric models are still the most popular tools for forecasting and policy analysis. When the most recent data available on both the exogenous and the endogenous variables are preliminary estimates subject to a revision process, the estimators of the coefficients are affected by the presence of the preliminary data; the projections for the exogenous variables are affected by data uncertainty; and the values of lagged dependent variables used as initial values for forecasts are still subject to revision. Since several provisional estimates of the value of a certain variable are available before the data are finalized, in this paper they are viewed as repeated predictions of the same quantity (referring to different information sets, not necessarily overlapping with one another) to be exploited in a forecast combination framework. The components of the asymptotic bias and of the asymptotic mean square prediction error related to data uncertainty can be reduced or eliminated by using a forecast combination technique which makes the deterministic and the Monte Carlo predictors no worse than either predictor used with or without provisional data. The precision of the forecast with the nonlinear model can be improved if the provisional data are not rational predictions of the final data and contain systematic effects.

19.
The importance of variable selection in regression has grown in recent years as computing power has encouraged the modelling of data sets of ever-increasing size. Data mining applications in finance, marketing and bioinformatics are obvious examples. A limitation of nearly all existing variable selection methods is the need to specify the correct model before selection. When the number of predictors is large, model formulation and validation can be difficult or even infeasible. On the basis of the theory of sufficient dimension reduction, we propose a new class of model-free variable selection approaches. The methods proposed assume no model of any form, require no nonparametric smoothing and allow for general predictor effects. The efficacy of the methods proposed is demonstrated via simulation, and an empirical example is given.
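
The proposals build on sufficient dimension reduction machinery; to fix ideas, below is a minimal sketch of one classical SDR method, sliced inverse regression (SIR), which recovers the directions through which the predictors affect the response without a model for the link. Data and settings are illustrative.

```python
import numpy as np

def sir_directions(X, y, n_slices=10):
    """Sliced inverse regression: estimate central-subspace directions."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Whiten the predictors: with inv(Sigma) = L L', Z = Xc L has identity cov.
    L = np.linalg.cholesky(np.linalg.inv(np.cov(Xc, rowvar=False)))
    Z = Xc @ L
    # Slice on y and accumulate the outer products of within-slice means of Z.
    slices = np.array_split(np.argsort(y), n_slices)
    M = np.zeros((p, p))
    for s in slices:
        m = Z[s].mean(axis=0)
        M += (len(s) / n) * np.outer(m, m)
    vals, vecs = np.linalg.eigh(M)
    # Leading eigenvectors, back-transformed to the original X scale.
    return L @ vecs[:, ::-1], vals[::-1]

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 2 * X[:, 1]) ** 3 / 10 + rng.normal(0, 0.5, 500)  # uses x0, x1 only
dirs, vals = sir_directions(X, y)
print("eigenvalues:", np.round(vals, 2))          # one dominant value expected
b = dirs[:, 0] / np.linalg.norm(dirs[:, 0])
print("leading direction:", np.round(b, 2))       # ~ +/-(0.45, 0.89, 0, ...)
```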

20.
Exact and approximate Bayesian inference is developed for the prediction problem in finite populations under a linear functional superpopulation model. The models considered are the usual regression models involving two variables, X and Y, where the independent variable X is measured with error. The approach is based on the conditional distribution of Y given X, and our predictor is the posterior mean of the quantity of interest (population total and population variance) given the observed data. Empirical investigations of optimal purposive samples and of possible model misspecification, based on comparisons with the corresponding models in which X is measured without error, are also reported.
