期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Heteroskedastic linear regression model with compositional response and covariates

Jiajia Chen Shengjia Li 《Journal of applied statistics》2018,45(12):2164-2181

Compositional data are known as a sort of complex multidimensional data with the feature that reflect the relative information rather than absolute information. There are a variety of models for regression analysis with compositional variables. Similar to the traditional regression analysis, the heteroskedasticity still exists in these models. However, the existing heteroskedastic regression analysis methods cannot apply in these models with compositional error term. In this paper, we mainly study the heteroskedastic linear regression model with compositional response and covariates. The parameter estimator is obtained through weighted least squares method. For the hypothesis test of parameter, the test statistic is based on the original least squares estimator and corresponding heteroskedasticity-consistent covariance matrix estimator. When the proposed method is applied to both simulation and real example, we use the original least squares method as a comparison during the whole process. The results implicate the model's practicality and effectiveness in regression analysis with heteroskedasticity. 相似文献

2.

Multiple linear regression with compositional response and covariates

Jiajia Chen Shengjia Li 《Journal of applied statistics》2017,44(12):2270-2285

相似文献

3.

Partial least squares regression with compositional response variables and covariates

Jiajia Chen Xiaoqin Zhang Karel Hron 《Journal of applied statistics》2021,48(16):3130

相似文献

4.

A robust Parafac model for compositional data

M. A. Di Palma P. Filzmoser M. Gallo K. Hron 《Journal of applied statistics》2018,45(8):1347-1369

Compositional data are characterized by values containing relative information, and thus the ratios between the data values are of interest for the analysis. Due to specific features of compositional data, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. It is discussed how three-way compositional data can be analyzed with the Parafac model. When data are contaminated by outliers, robust estimates for the Parafac model parameters should be employed. It is demonstrated how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real data example from macroeconomics underlines the usefulness of this approach. 相似文献

5.

Standard and robust orthogonal regression

Larry Ammann John Van Ness 《统计学通讯:模拟与计算》2013,42(1):145-162

A fast routine for converting regression algorithms into corresponding orthogonal regression (OR) algorithms was introduced in Ammann and Van Ness (1988). The present paper discusses the properties of various ordinary and robust OR procedures created using this routine. OR minimizes the sum of the orthogonal distances from the regression plane to the data points. OR has three types of applications. First, L ₂ OR is the maximum likelihood solution of the Gaussian errors-in-variables (EV) regression problem. This L ₂ solution is unstable, thus the robust OR algorithms created from robust regression algorithms should prove very useful. Secondly, OR is intimately related to principal components analysis. Therefore, the routine can also be used to create L ₁, robust, etc. principal components algorithms. Thirdly, OR treats the x and y variables symmetrically which is important in many modeling problems. Using Monte Carlo studies this paper compares the performance of standard regression, robust regression, OR, and robust OR on Gaussian EV data, contaminated Gaussian EV data, heavy-tailed EV data, and contaminated heavy-tailed EV data. 相似文献

6.

Regression imputation with Q-mode clustering for rounded zero replacement in high-dimensional compositional data

Jiajia Chen Karel Hron Matthias Templ Shengjia Li 《Journal of applied statistics》2018,45(11):2067-2080

The logratio methodology is not applicable when rounded zeros occur in compositional data. There are many methods to deal with rounded zeros. However, some methods are not suitable for analyzing data sets with high dimensionality. Recently, related methods have been developed, but they cannot balance the calculation time and accuracy. For further improvement, we propose a method based on regression imputation with Q-mode clustering. This method forms the groups of parts and builds partial least squares regression with these groups using centered logratio coordinates. We also prove that using centered logratio coordinates or isometric logratio coordinates in the response of partial least squares regression have the equivalent results for the replacement of rounded zeros. Simulation study and real example are conducted to analyze the performance of the proposed method. The results show that the proposed method can reduce the calculation time in higher dimensions and improve the quality of results. 相似文献

7.

Total least squares solution for compositional data using linear models

Eva Fišerová 《Journal of applied statistics》2010,37(7):1137-1152

The restrictive properties of compositional data, that is multivariate data with positive parts that carry only relative information in their components, call for special care to be taken while performing standard statistical methods, for example, regression analysis. Among the special methods suitable for handling this problem is the total least squares procedure (TLS, orthogonal regression, regression with errors in variables, calibration problem), performed after an appropriate log-ratio transformation. The difficulty or even impossibility of deeper statistical analysis (confidence regions, hypotheses testing) using the standard TLS techniques can be overcome by calibration solution based on linear regression. This approach can be combined with standard statistical inference, for example, confidence and prediction regions and bounds, hypotheses testing, etc., suitable for interpretation of results. Here, we deal with the simplest TLS problem where we assume a linear relationship between two errorless measurements of the same object (substance, quantity). We propose an iterative algorithm for estimating the calibration line and also give confidence ellipses for the location of unknown errorless results of measurement. Moreover, illustrative examples from the fields of geology, geochemistry and medicine are included. It is shown that the iterative algorithm converges to the same values as those obtained using the standard TLS techniques. Fitted lines and confidence regions are presented for both original and transformed compositional data. The paper contains basic principles of linear models and addresses many related problems. 相似文献

8.

Bayesian quantile regression for ordinal longitudinal data

Rahim Alhamzawi Haithem Taha Mohammad Ali 《Journal of applied statistics》2018,45(5):815-828

Since the pioneering work by Koenker and Bassett [27], quantile regression models and its applications have become increasingly popular and important for research in many areas. In this paper, a random effects ordinal quantile regression model is proposed for analysis of longitudinal data with ordinal outcome of interest. An efficient Gibbs sampling algorithm was derived for fitting the model to the data based on a location-scale mixture representation of the skewed double-exponential distribution. The proposed approach is illustrated using simulated data and a real data example. This is the first work to discuss quantile regression for analysis of longitudinal data with ordinal outcome. 相似文献

9.

Interval-valued data regression using partial linear model

Yuan Wei Huiwen Wang 《Journal of Statistical Computation and Simulation》2017,87(16):3175-3194

Semi-parametric modelling of interval-valued data is of great practical importance, as exampled by applications in economic and financial data analysis. We propose a flexible semi-parametric modelling of interval-valued data by integrating the partial linear regression model based on the Center & Range method, and investigate its estimation procedure. Furthermore, we introduce a test statistic that allows one to decide between a parametric linear model and a semi-parametric model, and approximate its null asymptotic distribution based on wild Bootstrap method to obtain the critical values. Extensive simulation studies are carried out to evaluate the performance of the proposed methodology and the new test. Moreover, several empirical data sets are analysed to document its practical applications. 相似文献

10.

A fuzzy robust regression approach applied to bedload transport data

Jalal Chachi 《统计学通讯:模拟与计算》2017,46(3):1703-1714

Fuzzy least-square regression can be very sensitive to unusual data (e.g., outliers). In this article, we describe how to fit an alternative robust-regression estimator in fuzzy environment, which attempts to identify and ignore unusual data. The proposed approach concerns classical robust regression and estimation methods that are insensitive to outliers. In this regard, based on the least trimmed square estimation method, an estimation procedure is proposed for determining the coefficients of the fuzzy regression model for crisp input-fuzzy output data. The investigated fuzzy regression model is applied to bedload transport data forecasting suspended load by discharge based on a real world data. The accuracy of the proposed method is compared with the well-known fuzzy least-square regression model. The comparison results reveal that the fuzzy robust regression model performs better than the other models in suspended load estimation for the particular dataset. This comparison is done based on a similarity measure between fuzzy sets. The proposed model is general and can be used for modeling natural phenomena whose available observations are reported as imprecise rather than crisp. 相似文献

11.

Variable selection for semiparametric errors-in-variables regression model with longitudinal data

《Journal of Statistical Computation and Simulation》2012,82(8):1654-1669

In this paper, we focus on the variable selection for the semiparametric regression model with longitudinal data when some covariates are measured with errors. A new bias-corrected variable selection procedure is proposed based on the combination of the quadratic inference functions and shrinkage estimations. With appropriate selection of the tuning parameters, we establish the consistency and asymptotic normality of the resulting estimators. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure. We further illustrate the proposed procedure with an application. 相似文献

12.

Using compositional and Dirichlet models for market share regression

Joanna Morais Christine Thomas-Agnan Michel Simioni 《Journal of applied statistics》2018,45(9):1670-1689

When the aim is to model market shares, the marketing literature proposes some regression models which can be qualified as attraction models. They are generally derived from an aggregated version of the multinomial logit model. But aggregated multinomial logit models (MNL) and the so-called generalized multiplicative competitive interaction models (GMCI) present some limitations: in their simpler version they do not specify brand-specific and cross effect parameters. In this paper, we consider alternative models: the Dirichlet model (DIR) and the compositional model (CODA). DIR allows to introduce brand-specific parameters and CODA allows additionally to consider cross effect parameters. We show that these two models can be written in a similar fashion, called attraction form, as the MNL and the GMCI models. As market share models are usually interpreted in terms of elasticities, we also use this notion to interpret the DIR and CODA models. We compare the properties of the models in order to explain why CODA and DIR models can outperform traditional market share models. An application to the automobile market is presented where we model brands market shares as a function of media investments, controlling for the brands price and scrapping incentive. We compare the quality of the models using measures adapted to shares. 相似文献

13.

Bayesian quantile regression for longitudinal data models

《Journal of Statistical Computation and Simulation》2012,82(11):1635-1649

In this paper, we discuss a fully Bayesian quantile inference using Markov Chain Monte Carlo (MCMC) method for longitudinal data models with random effects. Under the assumption of error term subject to asymmetric Laplace distribution, we establish a hierarchical Bayesian model and obtain the posterior distribution of unknown parameters at τ-th level. We overcome the current computational limitations using two approaches. One is the general MCMC technique with Metropolis–Hastings algorithm and another is the Gibbs sampling from the full conditional distribution. These two methods outperform the traditional frequentist methods under a wide array of simulated data models and are flexible enough to easily accommodate changes in the number of random effects and in their assumed distribution. We apply the Gibbs sampling method to analyse a mouse growth data and some different conclusions from those in the literatures are obtained. 相似文献

14.

Efficient regression modeling for correlated and overdispersed count data

《统计学通讯:理论与方法》2012,41(24):6005-6018

Abstract

The objective of this paper is to propose an efficient estimation procedure in a marginal mean regression model for longitudinal count data and to develop a hypothesis test for detecting the presence of overdispersion. We extend the matrix expansion idea of quadratic inference functions to the negative binomial regression framework that entails accommodating both the within-subject correlation and overdispersion issue. Theoretical and numerical results show that the proposed procedure yields a more efficient estimator asymptotically than the one ignoring either the within-subject correlation or overdispersion. When the overdispersion is absent in data, the proposed method might hinder the estimation efficiency in practice, yet the Poisson regression based regression model is fitted to the data sufficiently well. Therefore, we construct the hypothesis test that recommends an appropriate model for the analysis of the correlated count data. Extensive simulation studies indicate that the proposed test can identify the effective model consistently. The proposed procedure is also applied to a transportation safety study and recommends the proposed negative binomial regression model. 相似文献

15.

On simultaneous confidence intervals based on rank-estimates with application to analysis of gene expression data

Hossein Mansouri Bo Li 《统计学通讯:理论与方法》2013,42(17):4339-4349

Abstract

Inferential methods based on ranks present robust and powerful alternative methodology for testing and estimation. In this article, two objectives are followed. First, develop a general method of simultaneous confidence intervals based on the rank estimates of the parameters of a general linear model and derive the asymptotic distribution of the pivotal quantity. Second, extend the method to high dimensional data such as gene expression data for which the usual large sample approximation does not apply. It is common in practice to use the asymptotic distribution to make inference for small samples. The empirical investigation in this article shows that for methods based on the rank-estimates, this approach does not produce a viable inference and should be avoided. A method based on the bootstrap is outlined and it is shown to provide a reliable and accurate method of constructing simultaneous confidence intervals based on rank estimates. In particular it is shown that commonly applied methods of normal or t-approximation are not satisfactory, particularly for large-scale inferences. Methods based on ranks are uniquely suitable for analysis of microarray gene expression data since they often involve large scale inferences based on small samples containing a large number of outliers and violate the assumption of normality. A real microarray data is analyzed using the rank-estimate simultaneous confidence intervals. Viability of the proposed method is assessed through a Monte Carlo simulation study under varied assumptions. 相似文献

16.

Estimation of regression parameters in missing data problems

Donald L. Mcleish Cyntha A. Struthers 《Revue canadienne de statistique》2006,34(2):233-259

Let Y be a response variable, possibly multivariate, with a density function f (y|x, v; β) conditional on vectors x and v of covariates and a vector β of unknown parameters. The authors consider the problem of estimating β when the values taken by the covariate vector v are available for all observations while some of those taken by the covariate x are missing at random. They compare the profile estimator to several alternatives, both in terms of bias and standard deviation, when the response and covariates are discrete or continuous. 相似文献

17.

Assessing goodness-of-fit of categorical regression models based on case-control data

Biao Zhang 《Australian & New Zealand Journal of Statistics》2004,46(3):407-423

Demonstrated equivalence between a categorical regression model based on case‐control data and an I‐sample semiparametric selection bias model leads to a new goodness‐of‐fit test. The proposed test statistic is an extension of an existing Kolmogorov–Smirnov‐type statistic and is the weighted average of the absolute differences between two estimated distribution functions in each response category. The paper establishes an optimal property for the maximum semiparametric likelihood estimator of the parameters in the I‐sample semiparametric selection bias model. It also presents a bootstrap procedure, some simulation results and an analysis of two real datasets. 相似文献

18.

A new regression model for bimodal data and applications in agriculture

Julio Cezar Souza Vasconcelos Gauss Moutinho Cordeiro Edwin Moises Marcos Ortega dila Maria de Rezende 《Journal of applied statistics》2021,48(2):349

We define the odd log-logistic exponential Gaussian regression with two systematic components, which extends the heteroscedastic Gaussian regression and it is suitable for bimodal data quite common in the agriculture area. We estimate the parameters by the method of maximum likelihood. Some simulations indicate that the maximum-likelihood estimators are accurate. The model assumptions are checked through case deletion and quantile residuals. The usefulness of the new regression model is illustrated by means of three real data sets in different areas of agriculture, where the data present bimodality. 相似文献

19.

Isotonic regression for metallic microstructure data: estimation and testing under order restrictions

Martina Vittorietti Javier Hidalgo Jilt Sietsma Wei Li Geurt Jongbloed 《Journal of applied statistics》2022,49(9):2208

相似文献

20.

Estimation of exponential regression parameters using binary data

K.F. Cheng J.W. Wu 《统计学通讯:理论与方法》2013,42(8):2203-2214

Exponential regression model is important in analyzing data from heterogeneous populations. In this paper we propose a simple method to estimate the regression parameters using binary data. Under certain design distributions, including ellipticaily symmetric distributions, for the explanatory variables, the estimators are shown to be consistent and asymptotically normal when sample size is large. For finite samples, the new estimates were shown to behave reasonably well. They are competitive with the maximum likelihood estimates and more importantly, according to our simulation results, the cost of CPU time for computing new estimates is only 1/7 of that required for computing the usual maximum likelihood estimates. We expect the savings in CPU time would be more dramatic with larger dimension of the regression parameter space. 相似文献