Similar Documents
20 similar documents found.
1.
A method for inducing a desired rank correlation matrix on multivariate input vectors for simulation studies has recently been developed by Iman and Conover (1982). The primary intention of this procedure is to produce correlated input variables for use with computer models. Since this procedure is distribution-free and leaves the exact marginal distributions intact, it can be used with any marginal distributions for which it is reasonable to think in terms of correlation. In this paper we present a series of rank correlation plots based on this procedure when the marginal distributions are normal, lognormal, uniform and loguniform. These plots provide a convenient tool both for aiding the modeler in determining the degree of dependence among input variables (rather than guessing) and for communicating to the modeler the effect of different correlation assumptions. In addition, this procedure can be used with sample multivariate data by sampling directly from the respective marginal empirical distribution functions.
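As an illustration of the basic Iman-Conover idea, here is a minimal Python sketch (it omits the correction for the sample correlation of the scores that the full procedure applies, and the marginals and target matrix are illustrative). It induces a target rank correlation by reordering each fixed marginal sample to match the ranks of correlated van der Waerden scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000
target = np.array([[1.0, 0.7],
                   [0.7, 1.0]])          # desired rank correlation matrix

# Independent draws from the fixed marginals (lognormal and uniform here).
marginals = [rng.lognormal(size=n), rng.uniform(size=n)]

# Van der Waerden scores, shuffled independently per column, then given the
# target correlation structure via the Cholesky factor of the target matrix.
scores = stats.norm.ppf(np.arange(1, n + 1) / (n + 1))
S = np.column_stack([rng.permutation(scores) for _ in marginals])
S = S @ np.linalg.cholesky(target).T

# Rearrange each marginal sample so its ranks match the ranks of its score
# column; the marginal distributions stay exactly intact.
out = np.empty((n, len(marginals)))
for j, x in enumerate(marginals):
    ranks = S[:, j].argsort().argsort()
    out[:, j] = np.sort(x)[ranks]

print(stats.spearmanr(out[:, 0], out[:, 1])[0])   # close to 0.7
```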

2.
A vast majority of the literature on the design of sampling plans by variables assumes that the distribution of the quality characteristic variable is normal, and that only its mean varies while its variance is known and remains constant. But for many processes the quality variable is nonnormal, and either one or both of the mean and the variance can vary randomly. In this paper, an optimal economic approach is developed for the design of plans for acceptance sampling by variables having Inverse Gaussian (IG) distributions. The advantage of developing an IG-distribution-based model is that it can be used for diverse quality variables ranging from highly skewed to almost symmetrical. We assume that the process has two independent assignable causes, one of which shifts the mean of the quality characteristic variable of a product and the other shifts the variance. Since a product quality variable may be affected by any one or both of the assignable causes, three different likely cases of shift (mean shift only, variance shift only, and both mean and variance shift) have been considered in the modeling process. For all of these likely scenarios, mathematical models giving the cost of using a variable acceptance sampling plan are developed. The cost models are optimized in selecting the optimal sampling plan parameters, such as the sample size and the upper and lower acceptance limits. A large set of numerical example problems is solved for all the cases. Some of these numerical examples are also used in depicting the consequences of (1) assuming that the quality variable is normally distributed when the true distribution is IG, and (2) using sampling plans from the existing standards instead of the optimal plans derived by the methodology developed in this paper. Sensitivities of some of the model input parameters are also studied using the analysis of variance technique. The information obtained on the parameter sensitivities can be used by model users in prudently allocating resources for estimation of input parameters.
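For reference (a standard fact, not taken from the paper), the Inverse Gaussian density with mean \(\mu\) and shape \(\lambda\) is

$$ f(x;\mu,\lambda) \;=\; \sqrt{\frac{\lambda}{2\pi x^{3}}}\,\exp\!\left(-\frac{\lambda (x-\mu)^{2}}{2\mu^{2} x}\right), \qquad x > 0, $$

with skewness \(3\sqrt{\mu/\lambda}\), which is why the family can range from highly skewed to almost symmetrical as \(\lambda/\mu\) grows.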

3.
As modeling efforts expand to a broader spectrum of areas, the amount of computer time required to exercise the corresponding computer codes has become quite costly (several hours for a single run is not uncommon). This cost can be directly tied to the complexity of the modeling and to the large number of input variables (often numbering in the hundreds). Further, the complexity of the modeling (usually involving systems of differential equations) makes the relationships among the input variables mathematically intractable. In this setting it is desired to perform sensitivity studies of the input-output relationships. Hence, a judicious selection procedure for the choice of values of input variables is required; Latin hypercube sampling has been shown to work well on this type of problem.

However, a variety of situations require that decisions and judgments be made in the face of uncertainty. The source of this uncertainty may be lack of knowledge about probability distributions associated with input variables, or about different hypothesized future conditions, or may be present as a result of different strategies associated with a decision-making process. In this paper a generalization of Latin hypercube sampling is given that allows these areas to be investigated without making additional computer runs. In particular, it is shown how weights associated with Latin hypercube input vectors may be changed to reflect different probability distribution assumptions on key input variables and yet provide an unbiased estimate of the cumulative distribution function of the output variable. This allows different distribution assumptions on input variables to be studied without additional computer runs and without fitting a response surface. In addition, these same weights can be used in a modified nonparametric Friedman test to compare treatments. Sample size requirements needed to apply the results of the work are also considered. The procedures presented in this paper are illustrated using a model associated with the risk assessment of geologic disposal of radioactive waste.
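A minimal sketch of plain Latin hypercube sampling (the weighting generalization developed in the paper is not reproduced here, and the marginals chosen are illustrative):

```python
import numpy as np
from scipy import stats

def latin_hypercube(n_samples, n_vars, rng):
    """Basic Latin hypercube sample on [0, 1)^n_vars: each axis is cut into
    n_samples equal strata, one point is drawn per stratum, and the strata
    are matched across axes by independent random permutations."""
    jitter = rng.uniform(size=(n_samples, n_vars))
    strata = np.column_stack([rng.permutation(n_samples) for _ in range(n_vars)])
    return (strata + jitter) / n_samples

rng = np.random.default_rng(1)
u = latin_hypercube(100, 3, rng)

# Map the unit-cube sample to the desired marginals via inverse CDFs.
x = np.column_stack([
    stats.lognorm.ppf(u[:, 0], s=0.5),            # lognormal input
    stats.uniform.ppf(u[:, 1], loc=0, scale=10),  # uniform input on [0, 10]
    stats.norm.ppf(u[:, 2], loc=5, scale=2),      # normal input
])
```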

4.
A regression model with skew-normal errors provides a useful extension of ordinary normal regression models when the data set under consideration involves asymmetric outcomes. Variable selection is an important issue in all regression analyses, and in this paper we investigate simultaneous variable selection in joint location and scale models of the skew-normal distribution. We propose a unified penalized likelihood method which can simultaneously select significant variables in the location and scale models, while also performing parameter estimation. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. Simulation studies and a real example are used to illustrate the proposed methodologies.
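For context (a standard fact, not from the paper), the skew-normal density with shape parameter \(\alpha\) is

$$ f(x) \;=\; 2\,\phi(x)\,\Phi(\alpha x), \qquad x \in \mathbb{R}, $$

where \(\phi\) and \(\Phi\) are the standard normal density and distribution function; \(\alpha = 0\) recovers the ordinary normal model.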

5.
This article deals with semisupervised learning based on the naive Bayes assumption. A univariate Gaussian mixture density is used for continuous input variables, whereas a histogram-type density is adopted for discrete input variables. The EM algorithm is used to compute maximum likelihood estimates of the model parameters once the number of mixture components for each continuous input variable is fixed. We carry out model selection, choosing a parsimonious model among the various fitted models on the basis of an information criterion. A common density method is proposed for the selection of significant input variables. Simulated and real datasets are used to illustrate the performance of the proposed method.
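A hedged sketch of the per-variable mixture-fitting step, using scikit-learn's EM implementation rather than the authors' own code (the data and the candidate range of 1-5 components are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2.0, 1.0, 300),
                    rng.normal(3.0, 0.5, 200)]).reshape(-1, 1)

# Fit mixtures with 1..5 components by EM and keep the model preferred by
# BIC, mirroring the information-criterion selection described above.
fits = [GaussianMixture(n_components=k, random_state=0).fit(x) for k in range(1, 6)]
best = min(fits, key=lambda m: m.bic(x))
print(best.n_components, best.means_.ravel())
```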

6.
In this paper, we propose a new partial correlation, the composite quantile partial correlation, to measure the relationship between two variables given other variables. We further use this correlation to screen variables in ultrahigh-dimensional varying coefficient models. Our proposed method is fast, robust against outliers, and can be efficiently employed in both single-index and multiple-index varying coefficient models. Numerical results favor our proposed method.

7.
In prediction problems, both the response and the covariates may be highly correlated with a second group of influential regressors, which can be considered background variables. An important challenge is to perform variable selection and importance assessment among the covariates in the presence of these variables. A clinical example is the prediction of lean body mass (response) from bioimpedance (covariates), where anthropometric measures play the role of background variables. We introduce a reduced dataset in which the variables are defined as the residuals with respect to the background, and perform variable selection and importance assessment in both linear and random forest models. Using a clinical dataset of multi-frequency bioimpedance, we show the effectiveness of this method for selecting the most relevant predictors of lean body mass beyond anthropometry.

8.
This article provides a method of interpreting a surprising inequality in multiple linear regression: the squared multiple correlation can be greater than the sum of the simple squared correlations between the response variable and each of the predictor variables. The interpretation is obtained via principal component analysis by studying the influence of some components with small variance on the response variable. One example is used as an illustration and some conclusions are derived.
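A small numerical demonstration of the inequality (an illustrative suppression setup, not the paper's example): when two highly correlated predictors drive the response through their small difference, each simple correlation is weak while the multiple correlation is essentially perfect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # highly correlated predictors
y = x1 - x2                                 # response driven by the small difference

r1 = np.corrcoef(y, x1)[0, 1]               # each simple correlation is weak
r2 = np.corrcoef(y, x2)[0, 1]

X = np.column_stack([x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
R2 = 1 - ((y - X @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(r1**2 + r2**2, R2)                    # roughly 0.05 versus 1.0
```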

9.
This paper investigates the roles of partial correlation and conditional correlation as measures of the conditional independence of two random variables. It first establishes a sufficient condition for the coincidence of the partial correlation with the conditional correlation. The condition is satisfied not only by the multivariate normal distribution but also by the elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial and Dirichlet distributions. Such families of distributions are characterized by a semigroup property as a parametric family of distributions. A necessary and sufficient condition for the coincidence of the partial covariance with the conditional covariance is also derived. However, no known family of multivariate distributions other than the multivariate normal satisfies this condition. The paper also shows that conditional independence has no close ties with zero partial correlation except in the case of the multivariate normal distribution; rather, it has close ties to zero conditional correlation. It shows that the equivalence between zero conditional covariance and conditional independence for normal variables is retained under any monotone transformation of each variable. The results suggest that care must be taken when using such correlations as measures of conditional independence unless the joint distribution is known to be normal; otherwise, a new concept of conditional independence, defined through zero conditional correlation or other statistics, may need to be introduced.
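For reference (standard definitions, not from the paper): the partial correlation of X and Y given Z is

$$ \rho_{XY\cdot Z} \;=\; \frac{\rho_{XY}-\rho_{XZ}\,\rho_{YZ}}{\sqrt{(1-\rho_{XZ}^{2})(1-\rho_{YZ}^{2})}}, $$

whereas the conditional correlation is the ordinary correlation computed within the conditional distribution of (X, Y) given Z = z. The paper's question is when these two quantities coincide.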

10.
In “stepwise” regression analysis, the usual procedure enters or removes variables at each “step” on the basis of testing whether certain partial correlation coefficients are zero. An alternative method suggested in this paper involves testing the hypothesis that the mean square error of prediction does not decrease from one step to the next. This is equivalent to testing that the partial correlation coefficient is equal to a certain nonzero constant. For sample sizes sufficiently large, Fisher's z transformation can be used to obtain an asymptotically UMP unbiased test. The two methods are contrasted with an example involving actual data.
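A minimal sketch of the kind of test the abstract describes, using the standard Fisher z result that artanh(r) is approximately normal with variance 1/(n - k - 3) when k variables are partialled out (the numerical values are illustrative):

```python
import numpy as np
from scipy import stats

def fisher_z_test(r, rho0, n, k=0):
    """Two-sided test of H0: (partial) correlation = rho0 via Fisher's z;
    k is the number of variables partialled out (0 for a simple correlation)."""
    stat = (np.arctanh(r) - np.arctanh(rho0)) * np.sqrt(n - k - 3)
    return stat, 2 * stats.norm.sf(abs(stat))

# Illustrative values: observed partial r = 0.35 from n = 120 observations
# with k = 2 variables partialled out, tested against rho0 = 0.2.
print(fisher_z_test(0.35, 0.2, 120, k=2))
```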

11.

This paper is motivated by our collaborative research, and the aim is to model clinical assessments of upper limb function after stroke using 3D-position and 4D-orientation movement data. We present a new nonlinear mixed-effects scalar-on-function regression model with a Gaussian process prior, focusing on variable selection from a large number of candidates including both scalar and functional variables. A novel variable selection algorithm has been developed, namely functional least angle regression. As essential groundwork for this algorithm, we studied different representations of functional variables and the correlation between a scalar and a group of mixed scalar and functional variables. We also propose a new stopping rule for practical use. The algorithm is efficient and accurate for both variable selection and parameter estimation even when the number of functional variables is very large and the variables are correlated, and thus its predictions are accurate as well. Our comprehensive simulation study showed that the method is superior to other existing variable selection methods. When the algorithm was applied to the analysis of the movement data, the use of the nonlinear random-effects model and the functional variables significantly improved the prediction accuracy of the clinical assessment.

12.
A deterministic computer model is to be used in a situation where there is uncertainty about the values of some or all of the input parameters. This uncertainty induces uncertainty in the output of the model. We consider the problem of estimating a specific percentile of the distribution of this uncertain output. We also suppose that the computer code is computationally expensive, so we can run the model only at a small number of distinct inputs. This means that we must account for our uncertainty about the computer code itself at all untested inputs. We model the output, as a function of its inputs, as a Gaussian process, and after a few initial runs of the code we use a simulation approach to choose further suitable design points and to make inferences about the percentile of interest. An example is given involving a model used in sewer design.
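A hedged emulator sketch in the same spirit, using scikit-learn (the toy model, kernel, and input distribution are assumptions; the paper's sequential choice of design points and its treatment of emulator uncertainty are not reproduced):

```python
import numpy as np
from scipy import stats
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(x):
    """Stand-in for the costly deterministic code (assumed 1-D input)."""
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(4)
X_run = rng.uniform(0.0, 2.0, size=(8, 1))      # the few runs we can afford
y_run = expensive_model(X_run).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                              normalize_y=True).fit(X_run, y_run)

# Propagate input uncertainty through the emulator's posterior mean and
# estimate the 95th percentile of the induced output distribution.
X_new = stats.norm.rvs(loc=1.0, scale=0.3, size=(5000, 1), random_state=5)
print(np.percentile(gp.predict(X_new), 95))
```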

13.
Count data with structural zeros are common in public health applications. Considerable research has focused on zero-inflated models, such as the zero-inflated Poisson (ZIP) and zero-inflated Negative Binomial (ZINB) models, for zero-inflated count data used as a response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored, which may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor, if it is observed. However, structural zeros are often not observed in practice, in which case no statistical method is available to address the bias issue. This paper aims to fill this methodological gap by developing parametric methods, based on the maximum likelihood approach, to model zero-inflated count data used as predictors. The response variable can be of any type, including continuous, binary, count, or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of this new approach when the sample size is small to moderate. A real data example is also used to demonstrate the application of this method.
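For reference (the standard ZIP probability mass function, not taken from the paper), with structural-zero probability \(\pi\) and Poisson mean \(\lambda\):

$$ P(Y=y) \;=\; \begin{cases} \pi + (1-\pi)\,e^{-\lambda}, & y = 0,\\[4pt] (1-\pi)\,\dfrac{e^{-\lambda}\lambda^{y}}{y!}, & y = 1,2,\ldots, \end{cases} $$

so an observed zero may be structural (probability \(\pi\)) or random (a Poisson draw of zero), which is exactly the distinction the paper exploits.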

14.
Undergraduate and graduate students in a first-year probability (or a mathematical statistics) course learn the important concept of the moment of a random variable. The moments are related to various aspects of a probability distribution. In this context, the formula for the mean or the first moment of a nonnegative continuous random variable is often shown in terms of its c.d.f. (or the survival function). This has been called the alternative expectation formula. However, higher-order moments are also important, for example, to study the variance or the skewness of a distribution. In this note, we consider the rth moment of a nonnegative random variable and derive formulas in terms of the c.d.f. (or the survival function) paralleling the existing results for the first moment (the mean) using Fubini's theorem. Both nonnegative continuous and discrete integer-valued random variables are considered. These formulas may be advantageous, for example, when dealing with the moments of a transformed random variable, where it may be easier to derive its c.d.f. using the so-called c.d.f. method.
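The formulas in question are standard consequences of Fubini's theorem. For a nonnegative continuous random variable with survival function S(t) = 1 - F(t),

$$ E[X^{r}] \;=\; \int_{0}^{\infty} r\,t^{\,r-1}\,S(t)\,dt, $$

and for a nonnegative integer-valued random variable,

$$ E[X^{r}] \;=\; \sum_{k=0}^{\infty}\bigl[(k+1)^{r}-k^{r}\bigr]\,P(X>k), $$

with r = 1 recovering the alternative expectation formula for the mean.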

15.
This paper considers the analysis of linear models where the response variable is a linear function of observable component variables. For example, scores on two or more psychometric measures (the component variables) might be weighted and summed to construct a single response variable in a psychological study. A linear model is then fit to the response variable. The question addressed in this paper is how to optimally transform the component variables so that the response is approximately normally distributed. The transformed component variables, themselves, need not be jointly normal. Two cases are considered; in both cases, the Box-Cox power family of transformations is employed. In Case I, the coefficients of the linear transformation are known constants. In Case II, the linear function is the first principal component based on the matrix of correlations among the transformed component variables. For each case, an algorithm is described for finding the transformation powers that minimize a generalized Anderson-Darling statistic. The proposed transformation procedure is compared to likelihood-based methods by means of simulation. The proposed method rarely performed worse than the likelihood-based methods and, for many data sets, performed substantially better. As an illustration, the algorithm is applied to a problem from rural sociology and social psychology, namely scaling family residences along an urban-rural dimension.
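For reference, the Box-Cox power family used in both cases is the standard one:

$$ x^{(\lambda)} \;=\; \begin{cases} \dfrac{x^{\lambda}-1}{\lambda}, & \lambda \neq 0,\\[6pt] \log x, & \lambda = 0. \end{cases} $$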

16.
When auxiliary information is available at the design stage, samples may be selected by means of balanced sampling. The variance of the Horvitz-Thompson estimator is then reduced, since it is approximately given by the variance of the residuals from the regression of the variable of interest on the balancing variables. In this paper, a method for computing optimal inclusion probabilities for balanced sampling on given auxiliary variables is studied. We show that the method formerly suggested by Tillé and Favre (2005) enables the computation of inclusion probabilities that lead to a decrease in variance under some conditions on the set of balancing variables. A disadvantage is that the target optimal inclusion probabilities depend on the variable of interest. If the needed quantities are unknown at the design stage, we propose to use estimates instead (e.g., arising from a previous wave of the survey). A limited simulation study suggests that, under some conditions, our method performs better than the method of Tillé and Favre (2005).
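For reference (standard notation, not from the paper), the Horvitz-Thompson estimator of the total of y from a sample S with inclusion probabilities \(\pi_k\) is

$$ \hat{t}_{y,\mathrm{HT}} \;=\; \sum_{k \in S} \frac{y_{k}}{\pi_{k}}, $$

and under balanced sampling its variance is approximately that of the same estimator applied to the residuals \(e_k = y_k - x_k'\hat{\beta}\) of y on the balancing variables \(x_k\), which is the variance reduction the abstract refers to.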

17.
Interaction is very common in practice but has received little attention in the logistic regression literature; this is especially true for higher-order interactions. In conventional logistic regression, interactions are typically ignored. We propose a model selection procedure that implements an association rules analysis. We do this by (1) exploring the combinations of input variables which have significant impacts on the response (via association rules analysis); (2) selecting the potential (low- and high-order) interactions; (3) converting these potential interactions into new dummy variables; and (4) performing variable selection among all the input variables and the newly created dummy variables (interactions) to build the optimal logistic regression model. Our model selection procedure establishes the optimal combination of main effects and potential interactions. Comparisons are made through thorough simulations, which show that the proposed method outperforms the existing methods in all cases. A real-life example is discussed in detail to demonstrate the proposed method.

18.
A Cornish-Fisher expansion is used to approximate the percentiles of one variable of the bivariate normal distribution when the other variable is truncated. The expression is in terms of the bivariate cumulants of a singly truncated bivariate normal distribution. The percentiles are useful in the problem of personnel selection, where a screening variable is used to screen applicants for employment and a correlated performance variable is used to screen employees for rehiring. This paper provides a table of bivariate cumulants for determining the cutoff score of the performance variable. The following two problems are also considered: (1) determine the proportion of applicants who would have been successful had no screening been applied, and (2) determine the proportion of individuals rejected by screening who would have been successful had they been hired. The variable that measures job performance and the variable that measures the outcome of an aptitude test are assumed to be jointly normally distributed with correlation ρ.
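A hedged computational sketch of the two proportions under a bivariate normal model (the cutoffs and correlation are illustrative; the paper's Cornish-Fisher machinery is not reproduced):

```python
import numpy as np
from scipy import stats

# Illustrative setup: screening score X and performance Y are standard
# bivariate normal with correlation rho; applicants are hired when X > x0
# and counted as successful when Y > y0.
rho, x0, y0 = 0.6, 0.5, 0.0
bvn = stats.multivariate_normal(mean=[0.0, 0.0],
                                cov=[[1.0, rho], [rho, 1.0]])

p_success_no_screening = stats.norm.sf(y0)                  # P(Y > y0)

p_rejected = stats.norm.cdf(x0)                             # P(X <= x0)
p_rejected_and_successful = p_rejected - bvn.cdf([x0, y0])  # P(X <= x0, Y > y0)

print(p_success_no_screening)
print(p_rejected_and_successful / p_rejected)   # success rate among the rejected
```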

19.
In the era of Big Data, extracting the most important explanatory variables from ultrahigh-dimensional data plays a key role in scientific research. Existing research has mainly focused on using the extracted explanatory variables to describe the central tendency of their related response variables. For a response variable, however, its variability is as important as its central tendency in statistical inference. This paper focuses on variability and proposes a new model-free feature screening approach: sure explained variability and independence screening (SEVIS). The core of SEVIS is to take advantage of recently proposed asymmetric and nonlinear generalised measures of correlation in the screening. Under some mild conditions, the paper shows that SEVIS not only possesses the desired sure screening property and ranking consistency property, but is also a computationally convenient variable selection method for ultrahigh-dimensional data sets with more features than observations. The superior performance of SEVIS, compared with existing model-free methods, is illustrated in extensive simulations. A real example in ultrahigh-dimensional variable selection demonstrates that the variables selected by SEVIS better explain not only the response variables but also the variables selected by other methods.

20.
Hotelling's T² statistic has many applications in multivariate analysis. In particular, it can be used to measure the influence that a particular observation vector has on parameter estimation. For example, in the bivariate case, there exists a direct relationship between the ellipse generated using a T² statistic for individual observations and the hyperbolae generated using Hampel's influence function for the corresponding correlation coefficient. In this paper, we jointly use the components of an orthogonal decomposition of the T² statistic and some influence functions to identify outliers or influential observations. Since the conditional components of the T² statistic are related to possible changes in the correlation between a variable and a group of other variables, we consider the theoretical influence functions of the correlation and multiple correlation coefficients. Finite-sample versions of these influence functions are used to find the estimated influence function values.
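A minimal sketch of the per-observation T² values used for flagging influential points (the orthogonal decomposition and the influence functions themselves are not reproduced; the planted outlier is illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=50)
X[0] = [3.5, -3.0]                          # plant an influential point

d = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
T2 = np.einsum('ij,jk,ik->i', d, S_inv, d)  # T_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar)
print(np.argsort(T2)[-3:])                  # indices of the most extreme points
```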
