期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Joint impact of multiple observations on a subset of variables in multiple linear regression

Sung H. Park Bum S. Lee Hyang S. Jung 《Journal of applied statistics》2005,32(3):207-219

In multiple linear regression analysis, each observation affects the fitted regression equation differently and has varying influences on the regression coefficients of the different variables. Chatterjee & Hadi (1988) have proposed some measures such as DSSE_ij (Impact on Residual Sum of Squares of simultaneously omitting the ith observation and the jth variable), F_j (Partial F-test for the jth variable) and F_j(i) (Partial F-test for the jth variable omitting the ith observation) to show the joint impact and the interrelationship that exists among a variable and an observation. In this paper we have proposed more extended form of those measures DSSE_IJ, F_J and F_J(I) to deal with the interrelationships that exist among the multiple observations and a subset of variables by monitoring the effects of the simultaneous omission of multiple variables and multiple observations. 相似文献

2.

How far from identifiability? A systematic overview of the statistical matching problem in a non parametric framework

Pier Luigi Conti Mauro Scanu 《统计学通讯:理论与方法》2017,46(2):967-994

Statistical matching consists in estimating the joint characteristics of two variables observed in two distinct and independent sample surveys, respectively. In a parametric setup, ranges of estimates for non identifiable parameters are the only estimable items, unless restrictive assumptions on the probabilistic relationship between the non jointly observed variables are imposed. These ranges correspond to the uncertainty due to the absence of joint observations on the pair of variables of interest. The aim of this paper is to analyze the uncertainty in statistical matching in a non parametric setting. A measure of uncertainty is introduced, and its properties studied: this measure studies the “intrinsic” association between the pair of variables, which is constant and equal to 1/6 whatever the form of the marginal distribution functions of the two variables when knowledge on the pair of variables is the only one available in the two samples. This measure becomes useful in the context of the reduction of uncertainty due to further knowledge than data themselves, as in the case of structural zeros. In this case the proposed measure detects how the introduction of further knowledge shrinks the intrinsic uncertainty from 1/6 to smaller values, zero being the case of no uncertainty. Sampling properties of the uncertainty measure and of the bounds of the uncertainty intervals are also proved. 相似文献

3.

Decomposition of the main effects and interaction term by using orthogonal polynomials in multiple non symmetrical correspondence analysis

Antonello D'Ambra Pietro Amenta Anna Crisci 《统计学通讯:理论与方法》2017,46(20):10179-10188

The multiple non symmetric correspondence analysis (MNSCA) is a useful technique for analyzing a two-way contingency table. In more complex cases, the predictor variables are more than one. In this paper, the MNSCA, along with the decomposition of the Gray–Williams Tau index, in main effects and interaction term, is used to analyze a contingency table with two predictor categorical variables and an ordinal response variable. The Multiple-Tau index is a measure of association that contains both main effects and interaction term. The main effects represent the change in the response variables due to the change in the level/categories of the predictor variables, considering the effects of their addition, while the interaction effect represents the combined effect of predictor categorical variables on the ordinal response variable. Moreover, for ordinal scale variables, we propose a further decomposition in order to check the existence of power components by using Emerson's orthogonal polynomials. 相似文献

4.

On asymptotic approximation of inverse moments for a class of nonnegative random variables

Aiting Shen 《Statistics》2013,47(6):1371-1379

Sung [On inverse moments for a class of nonnegative random variables. J Inequal Appl. 2010;2010:1–13. Article ID 823767, doi:10.1155/2010/823767] obtained the asymptotic approximation of inverse moments for a class of nonnegative random variables with finite second moments and satisfying a Rosenthal-type inequality. In the paper, we further study the asymptotic approximation of inverse moments for a class of nonnegative random variables with finite first moments, which generalizes and improves the corresponding ones of Wu et al. [Asymptotic approximation of inverse moments of nonnegative random variables. Statist Probab Lett. 2009;79:1366–1371], Wang et al. [Exponential inequalities and inverse moment for NOD sequence. Statist Probab Lett. 2010;80:452–461; On complete convergence for weighted sums of ? mixing random variables. J Inequal Appl. 2010;2010:1–13, Article ID 372390, doi:10.1155/2010/372390], Sung (2010) and Hu et al. [A note on the inverse moment for the nonnegative random variables. Commun Statist Theory Methods. 2012. Article ID 673677, doi:10.1080/03610926.2012.673677]. 相似文献

5.

Identification of the Direction of a Causal Effect by Instrumental Variables

Brendan Kline 《商业与经济统计学杂志》2016,34(2):176-184

This article provides a strategy to identify the existence and direction of a causal effect in a generalized nonparametric and nonseparable model identified by instrumental variables. The causal effect concerns how the outcome depends on the endogenous treatment variable. The outcome variable, treatment variable, other explanatory variables, and the instrumental variable can be essentially any combination of continuous, discrete, or “other” variables. In particular, it is not necessary to have any continuous variables, none of the variables need to have large support, and the instrument can be binary even if the corresponding endogenous treatment variable and/or outcome is continuous. The outcome can be mismeasured or interval-measured, and the endogenous treatment variable need not even be observed. The identification results are constructive, and can be empirically implemented using standard estimation results. 相似文献

6.

Exponential progressive step-stress life-testing with link function based on Box–Cox transformation

Tsai-Hung Fan Wan-Lun Wang N. Balakrishnan 《Journal of statistical planning and inference》2008,138(8):2340-2354

In order to quickly extract information on the life of a product, accelerated life-tests are usually employed. In this article, we discuss a k-stage step-stress accelerated life-test with M-stress variables when the underlying data are progressively Type-I group censored. The life-testing model assumed is an exponential distribution with a link function that relates the failure rate and the stress variables in a linear way under the Box–Cox transformation, and a cumulative exposure model for modelling the effect of stress changes. The classical maximum likelihood method as well as a fully Bayesian method based on the Markov chain Monte Carlo (MCMC) technique is developed for inference on all the parameters of this model. Numerical examples are presented to illustrate all the methods of inference developed here, and a comparison of the ML and Bayesian methods is also carried out. 相似文献

7.

Embedding latent class regression and latent class distal outcome models into cluster-weighted latent class analysis: a detailed simulation experiment

Roberto Di Mari Antonio Punzo Zsuzsa Bakk 《Australian & New Zealand Journal of Statistics》2023,65(3):213-233

Usually in latent class (LC) analysis, external predictors are taken to be cluster conditional probability predictors (LC models with external predictors), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class-specific distribution is of interest in the distal outcome model, when the distribution of the external variables is assumed to depend on LC membership. In this paper, we consider a more general formulation, that embeds both the LC regression and the distal outcome models, as is typically done in cluster-weighted modelling. This allows us to investigate (1) whether the distribution of the external variables differs across classes, (2) whether there are significant direct effects of the external variables on the indicators, by modelling jointly the relationship between the external and the latent variables. We show the advantages of the proposed modelling approach through a set of artificial examples, an extensive simulation study and an empirical application about psychological contracts among employees and employers in Belgium and the Netherlands. 相似文献

8.

Compositional Performance Evaluation with Importance Measures

Xibin Zhao Hongyan Dui Zhiqiang Cai Junbo Wang Xiaoyu Song 《统计学通讯:理论与方法》2013,42(24):5240-5253

Importance measures are used to identify weak components and/or states in a system based on the component state random variables, which seem to be inadequate to show the corresponding actual situations. By contrast, the performance random variables own significant practical meanings and eliminate the subjectivity and limitation of state division and definition in many actual situations. In this paper, instead of state random variables, the performance stochastic processes are used for modeling all the components and the entire system, and the integrated importance measure (IIM) for the performance random variables are extended. The generalized IIM evaluates the contribution of component performance to the desired level of system performance. A case study of an oil transmission system is used to illustrate the effectiveness of our approach with importance measures. 相似文献

9.

Bayesian Analysis of a Queueing System with a Long-Tailed Arrival Process

Pepa Ramírez Rosa E. Lillo Michael P. Wiper 《统计学通讯:模拟与计算》2013,42(4):697-712

Internet traffic data is characterized by some unusual statistical properties, in particular, the presence of heavy-tailed variables. A typical model for heavy-tailed distributions is the Pareto distribution although this is not adequate in many cases. In this article, we consider a mixture of two-parameter Pareto distributions as a model for heavy-tailed data and use a Bayesian approach based on the birth-death Markov chain Monte Carlo algorithm to fit this model. We estimate some measures of interest related to the queueing system k-Par/M/1 where k-Par denotes a mixture of k Pareto distributions. Heavy-tailed variables are difficult to model in such queueing systems because of the lack of a simple expression for the Laplace Transform (LT). We use a procedure based on recent LT approximating results for the Pareto/M/1 system. We illustrate our approach with both simulated and real data. 相似文献

10.

A generalized analysis of the dependence structure by means of ANOVA

Antonello D'Ambra Anna Crisci Pasquale Sarnacchiaro 《Journal of applied statistics》2015,42(10):2192-2202

The multiple non-symmetric correspondence analysis (MNSCA) is a useful technique for analysing the prediction of a categorical variable through two or more predictor variables placed in a contingency table. In MNSCA framework, for summarizing the predictability between criterion and predictor variables, the Multiple-TAU index has been proposed. But it cannot be used to test association, and for overcoming this limitation, a relationship with C-Statistic has been recommended. Multiple-TAU index is an overall measure of association that contains both main effects and interaction terms. The main effects represent the change in the response variables due to the change in the level/categories of the predictor variables, considering the effects of their addition. On the other hand, the interaction effect represents the combined effect of predictor variables on the response variable. In this paper, we propose a decomposition of the Multiple-TAU index in main effects and interaction terms. In order to show this decomposition, we consider an empirical case in which the relationship between the demographic characteristics of the American people, such as race, gender and location (column variables), and their propensity to move (row variable) to a new town to find a job is considered. 相似文献

11.

Doubly robust weighted composite quantile regression based on SCAD-L2

Zhimiao Cao Xiaoning Kang Mingqiu Wang 《Revue canadienne de statistique》2023,51(1):38-76

In this article, a robust variable selection procedure based on the weighted composite quantile regression (WCQR) is proposed. Compared with the composite quantile regression (CQR), WCQR is robust to heavy-tailed errors and outliers in the explanatory variables. For the choice of the weights in the WCQR, we employ a weighting scheme based on the principal component method. To select variables with grouping effect, we consider WCQR with SCAD-L₂ penalization. Furthermore, under some suitable assumptions, the theoretical properties, including the consistency and oracle property of the estimator, are established with a diverging number of parameters. In addition, we study the numerical performance of the proposed method in the case of ultrahigh-dimensional data. Simulation studies and real examples are provided to demonstrate the superiority of our method over the CQR method when there are outliers in the explanatory variables and/or the random error is from a heavy-tailed distribution. 相似文献

12.

Multistage sampling for latent variable models

Thomas DC 《Lifetime data analysis》2007,13(4):565-581

I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, with surrogate measurements of the latent variables on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In such situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. Three circumstances where analytic calculations of the optimal design are possible are considered: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed, but the outcome is binary. In each of these cases, it is often possible to considerably improve the cost efficiency of the design by appropriate selection of the sampling fractions. More complex situations arise when the data are spatially distributed: the spatial correlation can be exploited to improve exposure assignment for unmeasured locations using available measurements on neighboring locations; some approaches for informative selection of the measurement sample using location and/or exposure predictor data are considered. 相似文献

13.

Designing an economically optimal repetitive group-sampling plan based on loss functions

Mohammad Saber Fallah Nezhad Faeze Zahmatkesh Saredorahi 《统计学通讯:模拟与计算》2018,47(3):783-799

Acceptance sampling is a quality assurance tool, which provides a rule for the producer and the consumer to make acceptance or rejection decision about a lot. This paper attempts to develop a more efficient sampling plan, variables repetitive group sampling plan, based on the total loss to the producer and consumer. To design this model, two constraints are considered to satisfy the opposing priorities and requirements of the producer and the consumer by using Acceptable quality level (AQL) and Limiting quality level (LQL) points on operating characteristic (OC) curve. The objective function of this model is constructed based on the total expected loss. In order to illustrate the application of the proposed model, an example is presented. In addition, the effects of process parameters on the optimal solution and the total expected loss are studied by performing a sensitivity analysis. Finally, the efficiency of the proposed model is compared with the variables single sampling plan, the variables double sampling plan and the repetitive group sampling plan of Balamurali and Jun (2006) in terms of average sample number, total expected loss and its difference with ideal OC curve. 相似文献

14.

Parameterized multistate population dynamics and projections 总被引：1，自引：0，他引：1

Rogers A 《Journal of the American Statistical Association》1986,81(393):48-61

"This article reports progress on the development of a population projection process that emphasizes model selection over demographic accounting. Transparent multiregional/multistate population projections that rely on parameterized model schedules are illustrated [using data primarily from a number of developed countries, particularly Sweden], together with simple techniques that extrapolate the recent trends exhibited by the parameters of such schedules." The author notes that "the parameterized schedules condense the amount of demographic information, expressing it in a language and variables that are more readily understood by the users of the projections. In addition, they permit a concise specification of the expected temporal patterns of variation among these variables, and they allow a disaggregated focus on demographic change that otherwise would not be feasible." 相似文献

15.

On the Estimation Accuracy of Causal Effects using Supplementary Variables

下载免费PDF全文

Manabu Kuroki Takahiro Hayashi 《Scandinavian Journal of Statistics》2016,43(2):505-519

This paper focuses on a situation in which a set of treatments is associated with a response through a set of supplementary variables in linear models as well as discrete models. Under the situation, we demonstrate that the causal effect can be estimated more accurately from the set of supplementary variables. In addition, we show that the set of supplementary variables can include selection variables and proxy variables as well. Furthermore, we propose selection criteria for supplementary variables based on the estimation accuracy of causal effects. From graph structures based on our results, we can judge certain situations under which the causal effect can be estimated more accurately by supplementary variables and reliably evaluate the causal effects from observed data. 相似文献

16.

Chernoff distance for doubly truncated distributions

Chanchal Kundu 《统计学通讯:理论与方法》2017,46(21):10594-10606

In a recent paper, Nair et al. [Stat Pap 52:893–909, 2011] proposed Chernoff distance measure for left/right-truncated random variables and studied their properties in the context of reliability analysis. Here we extend the definition of Chernoff distance for doubly truncated distributions. This measure may help the information theorists and reliability analysts to study the various characteristics of a system/component when it fails between two time points. We study some properties of this measure and obtain its upper and lower bounds. We also study the interval Chernoff distance between the original and weighted distributions. These results generalize and enhance the related existing results that are developed based on Chernoff distance for one-sided truncated random variables. 相似文献

17.

Investigation about a screening step in model selection

Willi Sauerbrei Norbert Holländer Anika Buchholz 《Statistics and Computing》2008,18(2):195-208

In many studies a large number of variables is measured and the identification of relevant variables influencing an outcome is an important task. For variable selection several procedures are available. However, focusing on one model only neglects that there usually exist other equally appropriate models. Bayesian or frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say more than ten variables) the resulting class of models can be very large. For Bayesian model averaging Occam’s window is a popular approach to reduce the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure. Based on the results of selected models in bootstrap samples, variables are eliminated before deriving a model averaging predictor. As a simple alternative screening procedure backward elimination can be used. Through two examples and by means of simulation we investigate some properties of the screening step. In the simulation study we consider situations with fifteen and 25 variables, respectively, of which seven have an influence on the outcome. With the screening step most of the uninfluential variables will be eliminated, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models, which are more strongly supported by the data. Furthermore, we give recommendations for important parameters of the screening step. 相似文献

18.

Extension of biplot methodology to multivariate regression analysis

Opeoluwa F. Oyedele 《Journal of applied statistics》2021,48(10):1816

At the core of multivariate statistics is the investigation of relationships between different sets of variables. More precisely, the inter-variable relationships and the causal relationships. The latter is a regression problem, where one set of variables is referred to as the response variables and the other set of variables as the predictor variables. In this situation, the effect of the predictors on the response variables is revealed through the regression coefficients. Results from the resulting regression analysis can be viewed graphically using the biplot. The consequential biplot provides a single graphical representation of the samples together with the predictor variables and response variables. In addition, their effect in terms of the regression coefficients can be visualized, although sub-optimally, in the said biplot.KEYWORDS: Biplot, regression analysis, multivariate regression, rank approximation 相似文献

19.

Coaching variables for regression and classification

ROBERT TIBSHIRANI GEOFFREY HINTON 《Statistics and Computing》1998,8(1):25-33

In a regression or classification setting where we wish to predict Y from x1,x2,..., xp, we suppose that an additional set of coaching variables z1,z2,..., zm are available in our training sample. These might be variables that are difficult to measure, and they will not be available when we predict Y from x1,x2,..., xp in the future. We consider two methods of making use of the coaching variables in order to improve the prediction of Y from x1,x2,..., xp. The relative merits of these approaches are discussed and compared in a number of examples. 相似文献

20.

Non‐stationary Cross‐Covariance Models for Multivariate Processes on a Globe

MIKYOUNG JUN 《Scandinavian Journal of Statistics》2011,38(4):726-747

Abstract. In geophysical and environmental problems, it is common to have multiple variables of interest measured at the same location and time. These multiple variables typically have dependence over space (and/or time). As a consequence, there is a growing interest in developing models for multivariate spatial processes, in particular, the cross‐covariance models. On the other hand, many data sets these days cover a large portion of the Earth such as satellite data, which require valid covariance models on a globe. We present a class of parametric covariance models for multivariate processes on a globe. The covariance models are flexible in capturing non‐stationarity in the data yet computationally feasible and require moderate numbers of parameters. We apply our covariance model to surface temperature and precipitation data from an NCAR climate model output. We compare our model to the multivariate version of the Matérn cross‐covariance function and models based on coregionalization and demonstrate the superior performance of our model in terms of AIC (and/or maximum loglikelihood values) and predictive skill. We also present some challenges in modelling the cross‐covariance structure of the temperature and precipitation data. Based on the fitted results using full data, we give the estimated cross‐correlation structure between the two variables. 相似文献