Information on several auxiliary variables correlated with the variable under study is available in most sample survey studies. This paper attempts an optimal use of several auxiliary variables in the form of a single auxiliary variable obtained as a linear function of these variables. The performance of this condensed auxiliary variable has been studied in selecting the sample.

In many studies a large number of variables is measured, and the identification of relevant variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on one model only neglects that there usually exist other equally appropriate models. Bayesian or frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam’s window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure. Based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. As a simple alternative screening procedure, backward elimination can be used. Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, respectively, of which seven have an influence on the outcome. With the screening step most of the uninfluential variables will be eliminated, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step.


In this paper, under the assumption of a linear relationship between two variables, we provide a simple alternative method of proving the existing result connecting the correlation coefficient with the skewness coefficients of the response and explanatory variables. Further, we give a relationship between the correlation coefficient and the coefficients of kurtosis of the response and explanatory variables, again assuming a linear relationship between the two variables. A simple alternative way of deriving the formula, which helps in finding the direction of dependence in linear regression, is discussed.

Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large and hence interesting, it might be informative to partition the quadratic form into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition that is determined will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that maximises the sum of the correlations between individual variables and the variables to which they transform under a constraint. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear-cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is adapted so that it has an invariance property under such rotation, whereby the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one- and two-sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a partition are readily obtained.

Biplots are useful tools to explore the relationships among variables. In this paper, the specific regression relationship between a set of predictors X and a set of response variables Y is represented by means of partial least-squares (PLS) regression. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.

In this paper, we prove a Hoeffding-like inequality for the survival function of a sum of symmetric independent identically distributed random variables, taking values in a segment [−b, b] of the reals. The symmetric case is relevant to auditing practice and is an important case study for further investigations. The bounds given by Hoeffding in 1963 cannot be improved upon unless we restrict the class of random variables, for instance, by assuming the law of the random variables to be symmetric with respect to their mean, which we may assume to be zero. The main result in this paper is an improvement of the Hoeffding bound for i.i.d. random variables that are bounded and have a variance (or an upper bound for the variance), obtained by further assuming that they have a symmetric law.
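
For orientation, the classical 1963 bound that the abstract takes as its baseline can be written down and checked numerically. The sketch below is not the paper's improved inequality: it only evaluates the standard Hoeffding bound exp(−t²/(2nb²)) for zero-mean variables in [−b, b] and compares it with the exact tail of one symmetric example, a sum of ±1 coin flips. The function names are illustrative, not taken from the source.

```python
import math


def hoeffding_bound(n, b, t):
    """Classical Hoeffding (1963) upper bound on P(S_n >= t), t > 0, for a sum
    of n independent zero-mean random variables taking values in [-b, b]."""
    return math.exp(-t * t / (2.0 * n * b * b))


def rademacher_tail(n, t):
    """Exact P(S_n >= t) when each summand is +1 or -1 with probability 1/2
    (a symmetric law on [-1, 1], chosen here purely as an example)."""
    # S_n = 2*K - n with K ~ Binomial(n, 1/2), so S_n >= t iff K >= (n + t) / 2.
    k_min = max(0, math.ceil((n + t) / 2))
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n


if __name__ == "__main__":
    n, t = 20, 10
    print("exact tail     :", rademacher_tail(n, t))
    print("Hoeffding bound:", hoeffding_bound(n, 1.0, t))
```

The gap between the exact tail and the bound in this symmetric example gives a rough sense of the room for improvement that the abstract refers to.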

In this paper we review some notions of positive dependence of random variables with a common univariate marginal distribution and describe the related moment and probability inequalities. We first present a comparison between i.i.d. random variables and exchangeable random variables via an application of de Finetti's theorem, then describe some useful probability inequalities via partial orderings of the strength of their positive dependence. Finally, we state a result for random variables which are not necessarily exchangeable. Special applications to the multivariate normal distribution will be discussed, and the results involve only the correlation matrix of the distribution.

This article provides a strategy to identify the existence and direction of a causal effect in a generalized nonparametric and nonseparable model identified by instrumental variables. The causal effect concerns how the outcome depends on the endogenous treatment variable. The outcome variable, treatment variable, other explanatory variables, and the instrumental variable can be essentially any combination of continuous, discrete, or “other” variables. In particular, it is not necessary to have any continuous variables, none of the variables need to have large support, and the instrument can be binary even if the corresponding endogenous treatment variable and/or outcome is continuous. The outcome can be mismeasured or interval-measured, and the endogenous treatment variable need not even be observed. The identification results are constructive, and can be empirically implemented using standard estimation results.

We consider a linear regression model in which some independent variables are unobservable, but proxy variables are available in their place. We derive the distribution and density functions of a pre-test estimator of the error variance after a pre-test for the null hypothesis that the coefficients for the unobservable variables are zero. Based on the density function, we show that when the critical value of the pre-test is unity, the coverage probability in the interval estimation of the error variance is maximum.

In this paper, a variables tightened-normal-tightened (TNT) two-plan sampling system based on the widely used capability index Cpk is developed for product acceptance determination when the quality characteristic of products has two-sided specification limits and follows a normal distribution. The operating procedure and operating characteristic (OC) function of the variables TNT two-plan sampling system, and the conditions for solving plan parameters are provided. The behavior of OC curves for the variables TNT sampling system under various parameters is also studied, and compared with the variables single tightened inspection plan and single normal inspection plan.
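
As a reminder of the index on which this sampling system is built, the standard definition Cpk = min(USL − μ, μ − LSL) / (3σ) can be sketched as follows. This illustrates only the capability index and its usual plug-in estimate; it is not the TNT acceptance procedure from the paper, and the function names and the sample values are hypothetical.

```python
import statistics


def cpk(mean, std, lsl, usl):
    """Process capability index for two-sided specification limits:
    the distance from the mean to the nearer limit, in units of 3 sigma."""
    if std <= 0:
        raise ValueError("std must be positive")
    return min(usl - mean, mean - lsl) / (3.0 * std)


def estimate_cpk(sample, lsl, usl):
    """Plug-in estimate of Cpk using the sample mean and sample standard deviation."""
    return cpk(statistics.mean(sample), statistics.stdev(sample), lsl, usl)


if __name__ == "__main__":
    # Hypothetical measurements with specification limits 9.0 and 11.0.
    measurements = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]
    print(estimate_cpk(measurements, lsl=9.0, usl=11.0))
```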

We study the correlation structure for a mixture of ordinal and continuous repeated measures using a Bayesian approach. We assume a multivariate probit model for the ordinal variables and a normal linear regression for the continuous variables, where latent normal variables underlying the ordinal data are correlated with continuous variables in the model. Due to the probit model assumption, we are required to sample a covariance matrix with some of the diagonal elements equal to one. The key computational idea is to use parameter-extended data augmentation, which involves applying the Metropolis-Hastings algorithm to get a sample from the posterior distribution of the covariance matrix incorporating the relevant restrictions. The methodology is illustrated through a simulated example and through an application to data from the UCLA Brain Injury Research Center.

Variable selection for nonlinear regression is a complex problem, made even more difficult when there are a large number of potential covariates and a limited number of datapoints. We propose herein a multi-stage method that combines state-of-the-art techniques at each stage to best discover the relevant variables. At the first stage, an extension of the Bayesian Additive Regression tree is adopted to reduce the total number of variables to around 30. At the second stage, sensitivity analysis in the treed Gaussian process is adopted to further reduce the total number of variables. Two stopping rules are designed and sequential design is adopted to make best use of previous information. We demonstrate our approach on two simulated examples and one real data set.

An exploratory tool is introduced to examine potential non-linear relationships between two sets of variables, X and Y, in a sample of multivariate data. Simulated annealing is applied to find canonical coefficient vectors a and b such that a squared non-linear correlation between a'X and b'Y is maximized. A measure of non-linear correlation is developed for this optimization which utilizes a nearest-neighbor regression estimate for the unknown functional relationship. In addition to examining potential relations between the canonical variables, this method can identify the important variables in each set.

Various methods for clustering mixed-mode data are compared. It is found that a method based on a finite mixture model, in which the observed categorical variables are generated from underlying continuous variables, outperforms more conventional methods when applied to artificially generated data. This method also performs best when applied to Fisher's iris data in which two of the variables are categorized by applying thresholds.

This article proposes a variables two-plan sampling system, called the tightened-normal-tightened (TNT) sampling inspection scheme, for cases where the quality characteristic follows a normal or a lognormal distribution and has an upper or a lower specification limit. The TNT variables sampling inspection scheme will be useful when testing is costly and destructive. The advantages of the variables TNT scheme over variables single and double sampling plans and the attributes TNT scheme are discussed. Tables are also constructed for the selection of parameters of known and unknown standard deviation variables TNT schemes for a given acceptable quality level (AQL) and limiting quality level (LQL). The problem is formulated as a nonlinear programming problem in which the objective function to be minimized is the average sample number and the constraints are related to lot acceptance probabilities at AQL and LQL under the operating characteristic curve.

An elementary method of proof of the mode, median, and mean inequality is given for skewed, unimodal distributions of continuous random variables. A proof of the inequality for the gamma, F, and beta random variables is sketched.
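
The inequality can be checked numerically for one right-skewed member of the gamma family. The sketch below assumes a gamma distribution with integer shape k and unit scale (an Erlang distribution), for which the mode is k − 1, the mean is k, and the CDF has a closed form, so the median can be found by bisection. It is a numerical illustration only, not the proof the abstract describes, and the function names are illustrative.

```python
import math


def erlang_cdf(x, k):
    """CDF of a gamma distribution with integer shape k >= 1 and unit scale:
    F(x) = 1 - exp(-x) * sum_{i=0}^{k-1} x**i / i!."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 1.0  # i = 0 term of the sum
    for i in range(1, k):
        term *= x / i
        total += term
    return 1.0 - math.exp(-x) * total


def erlang_median(k, tol=1e-10):
    """Median found by bisection on the CDF, which is increasing in x."""
    lo, hi = 0.0, 10.0 * k  # the CDF at 10*k is far above 1/2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, k) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    k = 3
    mode, median, mean = k - 1, erlang_median(k), k
    print(mode, median, mean)  # expected ordering: mode < median < mean
```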

A Gaussian copula is widely used to define correlated random variables. To obtain a prescribed Pearson correlation coefficient of ρx between two random variables with given marginal distributions, the correlation coefficient ρz between two standard normal variables in the copula must take a specific value which satisfies an integral equation that links ρx to ρz. In a few cases, this equation has an explicit solution, but in other cases it must be solved numerically. This paper attempts to address this issue. If two continuous random variables are involved, the marginal transformation is approximated by a weighted sum of Hermite polynomials; via Mehler’s formula, a polynomial of ρz is derived to approximate the function relationship between ρx and ρz. If a discrete variable is involved, the marginal transformation is decomposed into piecewise continuous ones, and ρx is expressed as a polynomial of ρz by Taylor expansion. For a given ρx, ρz can be efficiently determined by solving a polynomial equation.
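
One of the cases with an explicit solution is a pair of lognormal marginals X_i = exp(σ_i Z_i), for which ρx = (exp(ρz σ1 σ2) − 1) / sqrt((exp(σ1²) − 1)(exp(σ2²) − 1)). The sketch below uses that case to show what "solving for ρz given ρx" means: a generic bisection on the forward map is compared with the closed-form inverse. It does not implement the Hermite-polynomial approximation proposed in the paper; the lognormal choice and all names are assumptions made for illustration.

```python
import math


def rho_x_lognormal(rho_z, s1, s2):
    """Pearson correlation of exp(s1*Z1) and exp(s2*Z2) when (Z1, Z2) are
    standard normal with correlation rho_z."""
    denom = math.sqrt((math.exp(s1 * s1) - 1.0) * (math.exp(s2 * s2) - 1.0))
    return (math.exp(rho_z * s1 * s2) - 1.0) / denom


def solve_rho_z(target, s1, s2, tol=1e-12):
    """Bisection for rho_z in [-1, 1]; the forward map is increasing in rho_z."""
    lo, hi = -1.0, 1.0
    if not rho_x_lognormal(lo, s1, s2) <= target <= rho_x_lognormal(hi, s1, s2):
        raise ValueError("target correlation is not attainable for these marginals")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho_x_lognormal(mid, s1, s2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rho_z_closed_form(target, s1, s2):
    """Explicit inverse of rho_x_lognormal, available in this special case."""
    denom = math.sqrt((math.exp(s1 * s1) - 1.0) * (math.exp(s2 * s2) - 1.0))
    return math.log(1.0 + target * denom) / (s1 * s2)


if __name__ == "__main__":
    print(solve_rho_z(0.5, 1.0, 1.0), rho_z_closed_form(0.5, 1.0, 1.0))
```

The range check in `solve_rho_z` reflects that not every target ρx in [−1, 1] is attainable for given marginals; with lognormal marginals the attainable lower limit is well above −1.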

A particular mixture of bivariate distributions is used to present examples of dependent uncorrelated random variables and independent random variables. A necessary and sufficient condition for independence under such a bivariate distribution is given.
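
The abstract does not specify which mixture it uses, so the following is an assumed example rather than the paper's construction: an equal-weight mixture of two bivariate standard normal distributions with correlations +ρ and −ρ. The two components cancel in the covariance, so X and Y are uncorrelated, yet X² and Y² remain positively correlated, which shows that X and Y are dependent. The simulation uses only the standard library; all names are illustrative.

```python
import math
import random


def sample_mixture(n, rho, seed=0):
    """Draw n pairs from a 50/50 mixture of bivariate standard normals
    with correlation +rho and -rho (an assumed illustrative mixture)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        r = rho if rng.random() < 0.5 else -rho
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(r * z1 + math.sqrt(1.0 - r * r) * z2)
    return xs, ys


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    xs, ys = sample_mixture(50000, rho=0.8, seed=1)
    print("corr(X, Y)    :", pearson(xs, ys))  # close to zero
    print("corr(X^2, Y^2):", pearson([x * x for x in xs], [y * y for y in ys]))  # clearly positive
```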

The problem of determining the number of variables to be included in the linear regression model is considered under the assumption that the dependent and independent variables have a joint normal distribution. It is shown that for a given sample size n there exists an optimal number k0 (0 ≤ k0 < n − 2) of variables among all independent variables in the model, such that the expectation of the mean squared error corresponding to the prediction equation with k0 variables is minimal. Application of this result to stepwise procedures is discussed.

This paper attempts to develop a repetitive group sampling (RGS) plan by variables inspection for controlling the process fraction defective or the number of nonconformities when the quality characteristic follows a normal distribution and has only a lower or an upper specification limit. The proposed sampling plan is derived from the exact sampling distribution rather than an approximation approach. The plan parameters are solved by a nonlinear optimization model which minimizes the average sample number required for inspection and fulfills the classical two-point conditions on the operating characteristic (OC) curve. The efficiency of the proposed variables RGS plan is examined and also compared with the existing variables single sampling plan in terms of the sample size required for inspection. The results indicate that the proposed variables RGS plan could significantly reduce the sample size required for inspection compared to the traditional variables single sampling plan.
