Similar Articles
1.
Quantitative fatty acid signature analysis (QFASA) produces diet estimates containing the proportion of each species of prey in a predator's diet. Since the diet estimates are compositional, often contain an abundance of zeros (signifying the absence of a species in the diet), and sample sizes are generally small, inference problems require the use of nonstandard statistical methodology. Recently, a mixture distribution involving the multiplicative logistic normal distribution (and its skew-normal extension) was introduced in relation to QFASA to manage the problematic zeros. In this paper, we examine an alternative mixture distribution, namely, the recently proposed zero-inflated beta (ZIB) distribution. A potential advantage of the ZIB distribution over the previously considered mixture models is that it does not require transformation of the data. To assess the usefulness of the ZIB distribution in QFASA inference problems, a simulation study is first carried out to compare the small-sample properties of the maximum likelihood estimators of the means. The fit of the distributions is then examined using ‘pseudo-predators’ generated from a large real-life prey base. Finally, confidence intervals for the true diet based on the ZIB distribution are compared with earlier results through a simulation study and harbor seal data.
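To make the ZIB mixture concrete, here is a minimal Python sketch of its log-likelihood under the common parameterization P(Y = 0) = π and Y | Y > 0 ~ Beta(a, b), whose mean is (1 − π)·a/(a + b); the function names and the SciPy-based implementation are illustrative and not taken from the paper.

```python
import numpy as np
from scipy import stats

def zib_loglik(params, y):
    """Log-likelihood of a zero-inflated beta sample y with values in [0, 1)."""
    pi, a, b = params
    if not (0.0 < pi < 1.0 and a > 0.0 and b > 0.0):
        return -np.inf                                  # invalid parameters
    y = np.asarray(y, float)
    zeros = (y == 0)
    ll = zeros.sum() * np.log(pi)                       # point mass at zero
    ll += (~zeros).sum() * np.log(1.0 - pi)             # mixing weight of the beta part
    ll += stats.beta.logpdf(y[~zeros], a, b).sum()      # continuous beta component
    return ll

# MLE, e.g.: scipy.optimize.minimize(lambda p: -zib_loglik(p, y), x0=[0.2, 2.0, 5.0])
```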

2.
The analysis of compositional data using the log-ratio approach is based on ratios between the compositional parts. Zeros in the parts thus cause serious difficulties for the analysis. This is a particular problem in the case of structural zeros, which cannot simply be replaced by a non-zero value, as is done, e.g., for values below the detection limit or for missing values. Instead, the zeros need to be incorporated into further statistical processing. The focus here is on exploratory tools for identifying outliers in compositional data sets with structural zeros. For this purpose, Mahalanobis distances are estimated, computed either directly for the subcompositions determined by the zero patterns, or by using imputation to improve the efficiency of the estimates and then proceeding to the subcompositional and subgroup level. For this approach, new theory is formulated that allows covariances to be estimated for imputed compositional data and the estimates to be applied to subgroups using parts of this covariance matrix. Moreover, the zero pattern structure is analyzed using principal component analysis for binary data to achieve a comprehensive view of the overall multivariate data structure. The proposed tools are applied to larger compositional data sets from official statistics, where the need for an appropriate treatment of zeros is obvious.
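As a simplified illustration of the distance computation (not the paper's robust, zero-pattern-aware procedure), the sketch below maps zero-free compositions to pivot (ilr) coordinates and computes classical Mahalanobis distances; outliers can then be flagged against a chi-square quantile. All names are illustrative.

```python
import numpy as np

def ilr(x):
    """Pivot (ilr) coordinates of the rows of a composition matrix x (all parts > 0)."""
    x = np.asarray(x, float)
    n, D = x.shape
    z = np.empty((n, D - 1))
    for j in range(D - 1):
        gm = np.exp(np.mean(np.log(x[:, j + 1:]), axis=1))   # geometric mean of remaining parts
        z[:, j] = np.sqrt((D - j - 1) / (D - j)) * np.log(x[:, j] / gm)
    return z

def mahalanobis_distances(x):
    """Classical (non-robust) Mahalanobis distances in ilr coordinates."""
    z = ilr(x)
    d = z - z.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(z, rowvar=False))
    return np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))
```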

3.
The logratio methodology is not applicable when rounded zeros occur in compositional data. Many methods exist to deal with rounded zeros, but some are unsuitable for data sets with high dimensionality, and recently developed alternatives cannot balance calculation time and accuracy. For further improvement, we propose a method based on regression imputation with Q-mode clustering. This method forms groups of parts and builds partial least squares regressions on these groups using centered logratio coordinates. We also prove that using centered logratio coordinates or isometric logratio coordinates for the response of the partial least squares regression yields equivalent results for the replacement of rounded zeros. A simulation study and a real example are used to analyze the performance of the proposed method. The results show that the proposed method reduces the calculation time in higher dimensions and improves the quality of the results.
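The centered logratio transform at the core of the method is simple to state; a minimal sketch, together with how the PLS step might be invoked in scikit-learn, follows. The grouping by Q-mode clustering and the iterative replacement of the rounded zeros are not reproduced, and the variable names are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def clr(x):
    """Centered logratio coordinates: log of each part minus the row mean of the logs."""
    logx = np.log(np.asarray(x, float))
    return logx - logx.mean(axis=1, keepdims=True)

# Illustrative PLS step: regress the clr coordinates of the parts containing
# rounded zeros (Y_group) on the clr coordinates of a group of covariate parts:
# pls = PLSRegression(n_components=3).fit(clr(X_group), clr(Y_group))
```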

4.
Data on the hours individuals spend on daily activities are compositional and include many zeros, because individuals do not pursue every activity every day. Such data therefore require caution in empirical analyses. The Bayesian method offers several advantages for analyzing compositional data. In this study, we analyze the time allocation of Japanese married couples using a Bayesian model. Based on Bayes factors, we compare models that do and do not consider the correlations between married couples' time use data. The model that considers the correlation shows superior performance. We show that the Bayesian method can adequately take into account the correlations between wives' and husbands' living hours, facilitating the calculation of the partial effects that activity variables have on living hours. The partial effects in the model that considers the correlations between the couples' time use are easily calculated from the posterior results.

5.
The most common measure of dependence between two time series is the cross-correlation function. This measure gives a complete characterization of dependence for two linear and jointly Gaussian time series, but it often fails for nonlinear and non-Gaussian time series models, such as the ARCH-type models used in finance. The cross-correlation function is a global measure of dependence. In this article, we apply to bivariate time series the nonlinear local measure of dependence called local Gaussian correlation. It generally works well also for nonlinear models, and it can distinguish between positive and negative local dependence. We construct confidence intervals for the local Gaussian correlation and develop a test based on this measure of dependence. Asymptotic properties are derived for the parameter estimates, for the test functional and for a block bootstrap procedure. For both simulated and financial index data, we construct confidence intervals and we compare the proposed test with one based on the ordinary correlation and with one based on the Brownian distance correlation. Financial indexes are examined over a long time period and their local joint behavior, including tail behavior, is analyzed prior to, during and after the financial crisis. Supplementary material for this article is available online.

6.
Count data series with extra zeros relative to a Poisson distribution are common in many biomedical applications. A score test is presented to assess whether the zero-inflation problem is significant enough to warrant analysis by the more complex zero-inflated Poisson autoregression model. The score test is implemented as a computer program on the S-PLUS platform. For illustration, the test procedure is applied to a workplace injury series in which many zero counts are observed due to the heterogeneity in injury risk and the dynamic population involved.
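The paper's test targets the autoregressive model, but the flavor of a zero-inflation score test can be conveyed by the classical i.i.d. version attributed to van den Broek (1995), sketched below in Python rather than S-PLUS; this is an illustration under that assumption, not the paper's statistic.

```python
import numpy as np
from scipy import stats

def zip_score_test(y):
    """Score test of a Poisson null against zero inflation (i.i.d. case)."""
    y = np.asarray(y)
    n, ybar = len(y), y.mean()
    p0 = np.exp(-ybar)                    # fitted P(Y = 0) under the Poisson null
    n0 = (y == 0).sum()                   # observed number of zeros
    s = (n0 / p0 - n) ** 2 / (n * (1 - p0) / p0 - n * ybar)
    return s, stats.chi2.sf(s, df=1)      # statistic and chi-square(1) p-value
```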

7.
Vine copulas are a flexible class of dependence models consisting of bivariate building blocks and have proven to be particularly useful in high dimensions. Classical model distance measures require multivariate integration and thus suffer from the curse of dimensionality. In this paper, we provide numerically tractable methods to measure the distance between two vine copulas even in high dimensions. For this purpose, we consecutively develop three new distance measures based on the Kullback–Leibler distance, using the result that it can be expressed as the sum over expectations of KL distances between univariate conditional densities, which can be easily obtained for vine copulas. To reduce numerical calculations, we approximate these expectations on adequately designed grids, outperforming Monte Carlo integration with respect to computational time. For the sake of interpretability, we provide a baseline calibration for the proposed distance measures. We further develop similar substitutes for the Jeffreys distance, a symmetrized version of the Kullback–Leibler distance. In numerous examples and applications, we illustrate the strengths and weaknesses of the developed distance measures.
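The Monte Carlo baseline that the grid-based approximations are designed to outperform is easy to state: sample from the first copula density and average the log-density ratio. A generic sketch follows; the density and sampler callables are placeholders, not a vine-copula API.

```python
import numpy as np

def kl_monte_carlo(log_c1, log_c2, sample_c1, n=100_000, seed=0):
    """Monte Carlo estimate of KL(c1 || c2) = E_{c1}[log c1(U) - log c2(U)].

    log_c1, log_c2: callables returning log-densities at an (n, d) array.
    sample_c1: callable (n, rng) -> (n, d) array of draws from c1.
    """
    u = sample_c1(n, np.random.default_rng(seed))
    return float(np.mean(log_c1(u) - log_c2(u)))
```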

8.
Detection and Estimation of Block Structure in Spatial Weight Matrix
Clifford Lam, Econometric Reviews, 2016, 35(8–10): 1347–1376
In many economic applications, it is of interest to categorize, classify, or label individuals into groups based on similarity of observed behavior. We propose a method that captures group affiliation or, equivalently, estimates the block structure of a neighboring matrix embedded in a spatial econometric model. The main result for the Least Absolute Shrinkage and Selection Operator (Lasso) estimator shows that off-diagonal block elements are estimated as zeros with high probability, a property we define as “zero-block consistency.” Furthermore, we present and prove zero-block consistency for the estimated spatial weight matrix even under a thin margin of interaction between groups. The tool developed in this article can be used by applied researchers to verify a block structure, or as an exploratory tool for estimating unknown block structures. We analyze the U.S. Senate voting data and correctly identify blocks based on party affiliations. Simulations also show that the method performs well.

9.
Classical univariate measures of asymmetry such as Pearson’s (mean-median)/σ or (mean-mode)/σ often measure the standardized distance between two separate location parameters and have been widely used in assessing univariate normality. Similarly, measures of univariate kurtosis are often just ratios of two scale measures. The classical standardized fourth moment and the ratio of the mean deviation to the standard deviation serve as examples. In this paper we consider tests of multinormality which are based on the Mahalanobis distance between two multivariate location vector estimates or on the (matrix) distance between two scatter matrix estimates, respectively. Asymptotic theory is developed to provide approximate null distributions as well as to consider asymptotic efficiencies. Limiting Pitman efficiencies for contiguous sequences of contaminated normal distributions are calculated and the efficiencies are compared to those of the classical tests by Mardia. Simulations are used to compare finite sample efficiencies. The theory is also illustrated by an example.
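For reference, the two univariate prototypes mentioned above are trivial to compute; a short sketch follows (illustrative only, since the paper's tests are their multivariate analogues).

```python
import numpy as np

def pearson_skew(x):
    """Standardized distance between two location measures: (mean - median) / sigma."""
    x = np.asarray(x, float)
    return (x.mean() - np.median(x)) / x.std(ddof=1)

def mean_deviation_ratio(x):
    """Kurtosis-type ratio of two scale measures: mean absolute deviation / standard deviation."""
    x = np.asarray(x, float)
    return np.mean(np.abs(x - x.mean())) / x.std(ddof=1)
```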

10.
In this paper, regressive models are proposed for modeling a sequence of transitions in longitudinal data. These models predict the future status of an individual's outcome variable on the basis of underlying background characteristics or risk factors. Parameter estimation and estimates of conditional and unconditional probabilities are presented for repeated measures. Goodness-of-fit tests based on the deviance and the Hosmer–Lemeshow procedure are extended to repeated measures. In addition, to measure the suitability of the proposed models for predicting disease status, the ROC curve approach is extended to repeated measures; the procedure is shown for conditional models of any order as well as for the unconditional model predicting the outcome at the end of the study, and test procedures are suggested. For testing the differences between areas under the ROC curves in subsequent follow-ups, two test procedures are employed, one of which is based on a permutation test. An unconditional model is built from the conditional models for the progression of depression among the elderly population in the USA, using longitudinal data from the Health and Retirement Survey. The illustration shows that conditionally observed disease progression can be used to predict the outcome, with the selected variables and previous outcomes serving predictive purposes. The percentage of correct predictions is quite high, and the measures of sensitivity and specificity are reasonably impressive. The extended measures of the area under the ROC curve show that the models fit reasonably well in terms of predicting disease status over a long period of time. This procedure has extensive applications in longitudinal data analysis where the objective is to estimate unconditional probabilities from a series of conditional transition models.

11.
Compositional data are a kind of complex multidimensional data that reflect relative rather than absolute information. A variety of models exist for regression analysis with compositional variables. As in traditional regression analysis, heteroskedasticity can be present in these models; however, existing heteroskedastic regression methods cannot be applied to models with a compositional error term. In this paper, we study the heteroskedastic linear regression model with compositional response and covariates. The parameter estimator is obtained through weighted least squares. For hypothesis tests on the parameters, the test statistic is based on the original least squares estimator and the corresponding heteroskedasticity-consistent covariance matrix estimator. When the proposed method is applied to both a simulation and a real example, the original least squares method is used as a comparison throughout. The results demonstrate the model's practicality and effectiveness for regression analysis with heteroskedasticity.
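The two ingredients the abstract combines, a weighted least squares fit and a White-type heteroskedasticity-consistent covariance for the ordinary least squares estimator, can be sketched in a few lines of numpy; the compositional (log-ratio) transformation step is omitted, so X and y are assumed to already be in coordinate form.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solves (X'WX) beta = X'Wy for positive weights w."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def ols_hc0(X, y):
    """OLS estimate with the HC0 (White) sandwich covariance estimator."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                          # residuals
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (e ** 2)[:, None])      # X' diag(e^2) X
    return beta, bread @ meat @ bread
```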

12.
熊巍 (Xiong Wei) et al., 《统计研究》 (Statistical Research), 2020, 37(5): 104–116
With the rapid development of computing, high-dimensional compositional data arise constantly, typically containing many near-zero values and missing entries. The high dimensionality poses great challenges to traditional statistical methods, and the heavy tails and complex covariance structures make theoretical analysis harder still. How to robustly impute the near-zero values of high-dimensional compositional data and uncover the latent intrinsic structure has therefore become a research focus. Combining a modified EM algorithm, this paper proposes a Lasso quantile-regression imputation method based on R-type clustering (SubLQR) to address the near-zero problem in high-dimensional compositional data. Compared with existing imputation methods for high-dimensional near-zeros, SubLQR has the following advantages. (1) Robustness and comprehensiveness: the Lasso quantile regression not only effectively explores the entire conditional distribution of the response, but also provides a more realistic high-dimensional sparsity pattern. (2) Efficiency and accuracy: imputing within R-type clusters reduces the computational complexity and greatly improves imputation precision. Simulation studies confirm that SubLQR is efficient, flexible, and accurate, with particular advantages when zeros and outliers are abundant. Finally, SubLQR is applied to a metabolomics study of a rare disease, further demonstrating the broad applicability of the proposed method.
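The core building block of SubLQR, an L1-penalized (Lasso) quantile regression used to impute a near-zero part from the remaining parts, can be sketched with scikit-learn's QuantileRegressor (version 1.1+), which minimizes the pinball loss plus an L1 penalty. The R-type clustering step and the modified EM loop of the paper are not reproduced, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def impute_quantile_lasso(X_obs, y_obs, X_mis, tau=0.5, alpha=0.1):
    """Fit an L1-penalized quantile regression on observed rows and impute the rest."""
    model = QuantileRegressor(quantile=tau, alpha=alpha, solver="highs")
    model.fit(X_obs, y_obs)                 # pinball loss + alpha * ||coef||_1
    return model.predict(X_mis)             # imputed values for the near-zero entries
```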

13.
Graphical analysis of complex brain networks is a fundamental area of modern neuroscience. Functional connectivity is important since many neurological and psychiatric disorders, including schizophrenia, are described as ‘dys-connectivity’ syndromes. Using electroencephalogram time series collected on each of a group of 15 individuals with a common medical diagnosis of positive syndrome schizophrenia, we seek to build a single, representative, brain functional connectivity group graph. Disparity/distance measures between spectral matrices are identified and used to define the normalized graph Laplacian, enabling clustering of the spectral matrices for detecting ‘outlying’ individuals. Two such individuals are identified. For each remaining individual, we derive a test for each edge in the connectivity graph based on average estimated partial coherence over frequencies, and associated p-values are found. For each edge these are used in a multiple hypothesis test across individuals, and the proportion rejecting the hypothesis of no edge is used to construct a connectivity group graph. This study provides a framework for integrating results on multiple individuals into a single overall connectivity structure.

14.
Colours and Cocktails: Compositional Data Analysis 2013 Lancaster Lecture
The different constituents of physical mixtures such as coloured paint, cocktails, geological and other samples can be represented by d‐dimensional vectors called compositions with non‐negative components that sum to one. Data in which the observations are compositions are called compositional data. There are a number of different ways of thinking about and consequently analysing compositional data. The log‐ratio methods proposed by Aitchison in the 1980s have become the dominant methods in the field. One reason for this is the development of normative arguments converting the properties of log‐ratio methods to ‘essential requirements’ or Principles for any method of analysis to satisfy. We discuss different ways of thinking about compositional data and interpret the development of the Principles in terms of these different viewpoints. We illustrate the properties on which the Principles are based, focussing particularly on the key subcompositional coherence property. We show that this Principle is based on implicit assumptions and beliefs that do not always hold. Moreover, it is applied selectively because it is not actually satisfied by the log‐ratio methods it is intended to justify. This implies that a more open statistical approach to compositional data analysis should be adopted.

15.
Papers dealing with measures of predictive power in survival analysis have seen their independence of censoring, or their estimates being unbiased under censoring, as the most important property. We argue that this property has been wrongly understood. Discussing the so-called measure of information gain, we point out that we cannot have unbiased estimates if all values greater than a given time τ are censored. This is because censoring before τ has a different effect than censoring after τ. Such a τ is often introduced by the design of a study. Independence can only be achieved under the assumption that the model remains valid after τ, which is impossible to verify. But if one is willing to make such an assumption, we suggest using multiple imputation to obtain a consistent estimate. We further show that censoring has different effects on the estimation of the measure for the Cox model than for parametric models, and we discuss them separately. We also give some warnings about the usage of the measure, especially when it comes to comparing essentially different models.

16.
In many situations, negative binomial (NB) mixed regression is more appropriate for analysing correlated and over-dispersed count data. In this paper, a score test for assessing extra zeros against the NB-mixed regression in correlated count data with excess zeros is developed. The sampling distribution and power of the score test statistic are evaluated using a simulation study. The results show that, under a wide range of conditions, the score statistic performs satisfactorily. Finally, the use of the score test is illustrated on DMFT index data of children aged 12 years.

17.
It is well known that the traditional Pearson correlation in many cases fails to capture non-linear dependence structures in bivariate data. Other scalar measures capable of capturing non-linear dependence exist. A common disadvantage of such measures, however, is that they cannot distinguish between negative and positive dependence, and typically the alternative hypothesis of the accompanying test of independence is simply “dependence”. This paper discusses how a newly developed local dependence measure, the local Gaussian correlation, can be used to construct local and global tests of independence. A global measure of dependence is constructed by aggregating local Gaussian correlation on subsets of \(\mathbb{R}^{2}\), and an accompanying test of independence is proposed. Choice of bandwidth is based on likelihood cross-validation. Properties of this measure and asymptotics of the corresponding estimate are discussed. A bootstrap version of the test is implemented and tried out on both real and simulated data. The performance of the proposed test is compared to the Brownian distance covariance test. Finally, when the hypothesis of independence is rejected, local independence tests are used to investigate the cause of the rejection.

18.
The varying-coefficient model is an important nonparametric statistical model since it allows appreciable flexibility in the structure of the fitted model. For ultra-high-dimensional heterogeneous data, it is essential to examine how the effects of covariates vary with exposure variables at different quantile levels of interest. In this paper, we extend marginal screening methods to examine and select variables by ranking a measure of the nonparametric marginal contribution of each covariate given the exposure variable. Spline approximations are employed to model the marginal effects and select the set of active variables in a quantile-adaptive framework. This ensures the sure screening property in the quantile-adaptive varying-coefficient model. Numerical studies demonstrate that the proposed procedure works well for heteroscedastic data.

19.
Nonparametric regression can be considered as a problem of model choice. In this article, we present the results of a simulation study in which several nonparametric regression techniques, including wavelets and kernel methods, are compared with respect to their behavior on different test beds. We also include the taut-string method, whose aim is not to minimize the distance of an estimator to some “true” generating function f but to provide a simple adequate approximation to the data. Test beds are situations where a “true” generating f exists, so the estimates of f can be compared with f itself. The measures of performance we use are the \(L_2\)- and \(L_\infty\)-norms and the ability to identify peaks.
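On a grid of design points, the two norm-based performance measures reduce to one line each; a minimal sketch:

```python
import numpy as np

def l2_error(fhat, f):
    """Discrete L2 distance between the estimate and the true f on a grid."""
    return float(np.sqrt(np.mean((fhat - f) ** 2)))

def sup_error(fhat, f):
    """L-infinity (sup-norm) distance on the grid."""
    return float(np.max(np.abs(fhat - f)))
```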

20.
The coherence of a novel's text is an important attribute of whether the work succeeds creatively, and critics' differing perceptions of textual coherence lead to sharply different assessments of the same work. To test this view, four novels by Fitzgerald are analyzed for coherence using latent semantic analysis. The concepts of a chapter-similarity matrix, a coherence measure, and a coherence coefficient of a text are proposed and defined for the first time, and by computing and comparing the coherence measures and coefficients, existing critical judgments of Fitzgerald's works are examined and explained. The results show that the coherence of a novel's text is related to its chapter structure, and that different weightings of the novel's overall continuity versus its inter-chapter correlation are the main reason critics diverge in their judgments of Tender Is the Night.
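A plausible latent-semantic-analysis pipeline for the chapter-similarity matrix (tf-idf vectors per chapter, truncated SVD, cosine similarities between chapter vectors) is sketched below with scikit-learn; summarizing coherence as the mean off-diagonal similarity is our simplification, not the paper's exact definition of its measure and coefficient.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def chapter_similarity(chapters, k=50):
    """chapters: list of chapter texts. Returns the chapter-similarity matrix."""
    tfidf = TfidfVectorizer().fit_transform(chapters)
    k = min(k, tfidf.shape[1] - 1, len(chapters) - 1)   # SVD rank must fit the data
    z = TruncatedSVD(n_components=k).fit_transform(tfidf)
    return cosine_similarity(z)

def coherence_coefficient(sim):
    """Mean off-diagonal chapter similarity as a crude coherence summary."""
    n = sim.shape[0]
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))
```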
