1.

Modified classification and regression tree splitting criteria for data with interactions (total citations: 1; self-citations: 0; citations by others: 1)

Alexandra P. Bremner & Ross H. Taplin 《Australian & New Zealand Journal of Statistics》2002,44(2):169-176

This paper proposes modified splitting criteria for classification and regression trees by modifying the definition of the deviance. The modified deviance is based on local averaging instead of global averaging and is more successful at modelling data with interactions. The paper shows that the modified criteria result in much simpler trees for pure interaction data (no main effects) and can produce trees with fewer errors and lower residual mean deviances than those produced by Clark & Pregibon's (1992) method when applied to real datasets with strong interaction effects.

2.

Results of exploratory data analysis in the broken stick model

David Almorza M. Hortensia García 《Journal of applied statistics》2008,35(9):979-983

The broken stick model is a model of the abundance of species in a habitat, and it has been widely extended. In this paper, we present results from exploratory data analysis of this model. To obtain some of the statistics, we formulate the broken stick model as a probability distribution function based on the same model, and we provide an expression for the cumulative distribution function, which is needed to obtain the results from exploratory data analysis. The inequalities we present are useful in ecological studies that apply broken stick models. These results are also useful for testing the goodness of fit of the broken stick model as an alternative to the chi-square test, which has often been the main test used. Therefore, these results may be used in several alternative and complementary ways for testing the goodness of fit of the broken stick model.
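
As a point of reference for the model named above, the following minimal sketch computes the expected relative abundances under the classical (MacArthur) broken stick formulation, in which the i-th most abundant of S species has expected share (1/S) Σ_{k=i..S} 1/k. This is the textbook form, assumed here for illustration; it is not the probability distribution function or the inequalities derived in the paper, and the function name `broken_stick_expected` is hypothetical.

```python
from fractions import Fraction

def broken_stick_expected(S):
    """Expected share of the i-th most abundant of S species, for i = 1..S.

    Classical broken stick formula (assumed): (1/S) * sum_{k=i}^{S} 1/k.
    Exact fractions are used so the shares sum to exactly 1.
    """
    return [
        Fraction(1, S) * sum(Fraction(1, k) for k in range(i, S + 1))
        for i in range(1, S + 1)
    ]

# Example: expected shares for a hypothetical community of 4 species.
for rank, share in enumerate(broken_stick_expected(4), start=1):
    print(rank, share, float(share))
```

Observed rank-abundance data could then be compared against these expected shares as an informal first check before any formal goodness-of-fit test.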

3.

Exploratory data structure comparisons: three new visual tools based on principal component analysis

Anne Helby Petersen Bo Markussen Karl Bang Christensen 《Journal of applied statistics》2021,48(9):1675

Datasets are sometimes divided into distinct subsets, e.g. due to multi-center sampling, or to variations in instruments, questionnaire item ordering or mode of administration, and the data analyst then needs to assess whether a joint analysis is meaningful. The Principal Component Analysis-based Data Structure Comparisons (PCADSC) tools are three new non-parametric, visual diagnostic tools for investigating differences in structure for two subsets of a dataset through covariance matrix comparisons by use of principal component analysis. The PCADSC tools are demonstrated in a data example using European Social Survey data on psychological well-being in three countries, Denmark, Sweden, and Bulgaria. The data structures are found to be different in Denmark and Bulgaria, and thus a comparison of, for example, mean psychological well-being scores is not meaningful. However, when comparing Denmark and Sweden, very similar data structures, and thus comparable concepts of well-being, are found. Therefore, inter-country comparisons are warranted for these countries.

4.

Inferactive data analysis

Nan Bi Jelena Markovic Lucy Xia Jonathan Taylor 《Scandinavian Journal of Statistics》2020,47(1):212-249

We describe inferactive data analysis, so-named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis, allowing also for Bayesian data analysis. We see this as a useful step in providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) selective inference, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG-DAG (a Data Analysis Generative DAG), and a selective change of variables formula is crucial to any practical implementation of inferactive data analysis via sampling these distributions. We discuss a canonical example of an incomplete cross-validation test statistic to discriminate between black box models, and a real HIV dataset example to illustrate inference after making multiple queries on data.

5.

Families of splitting criteria for classification trees (total citations: 6; self-citations: 0; citations by others: 6)

Shih Y.-S. 《Statistics and Computing》1999,9(4):309-315

Several splitting criteria for binary classification trees are shown to be written as weighted sums of two values of divergence measures. This weighted sum approach is then used to form two families of splitting criteria. One of them contains the chi-squared and entropy criteria, the other contains the mean posterior improvement criterion. Both family members are shown to have the property of exclusive preference. Furthermore, the optimal splits based on the proposed families are studied. We find that the best splits depend on the parameters in the families. The results reveal interesting differences among various criteria. Examples are given to demonstrate the usefulness of both families.
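
To make the two named criteria concrete, here is a minimal sketch of the standard textbook forms of the entropy criterion (reduction in entropy impurity) and the Pearson chi-squared criterion for a binary split of a two-class node. These are not the weighted-sum families proposed in the paper; the function names and the `(class_0_count, class_1_count)` input layout are assumptions made for illustration, and both child nodes and both classes are assumed to be non-empty overall.

```python
import math

def entropy(counts):
    """Entropy impurity (natural log) of a node with the given class counts."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def entropy_gain(left, right):
    """Reduction in entropy when a parent node is split into left and right."""
    parent = [l + r for l, r in zip(left, right)]
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return entropy(parent) - n_left / n * entropy(left) - n_right / n * entropy(right)

def chi_squared(left, right):
    """Pearson chi-squared statistic of the 2x2 table (child node by class)."""
    (a, b), (c, d) = left, right
    n = a + b + c + d
    # Assumes every row and column total is positive, so denom is non-zero.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom

# A perfectly separating split versus an uninformative one (made-up counts).
print(entropy_gain((10, 0), (0, 10)), chi_squared((10, 0), (0, 10)))
print(entropy_gain((5, 5), (5, 5)), chi_squared((5, 5), (5, 5)))
```

Both criteria are largest for the separating split and zero for the uninformative one; where they rank intermediate splits differently is the kind of contrast the abstract refers to.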

6.

Spatio-temporal functional regression on paleoecological data

Liliane Bel Avner Bar-Hen Rémy Petit Rachid Cheddadi 《Journal of applied statistics》2011,38(4):695-704

There is much interest in predicting the impact of global warming on the genetic diversity of natural populations, and the influence of climate on biodiversity is an important ecological question. Since the Holocene, there have been many climate perturbations and the geographical ranges of plant taxa have changed substantially. The present-day genetic diversity of plants is a result of these processes, and a first step in studying the impact of future climate change is to understand how the important features of reconstructed climate variables, such as temperature or precipitation over the last 15,000 years, relate to the present-day genetic diversity of forests. We model the relationship between genetic diversity in the European beech (Fagus sylvatica) forests and curves of temperature and precipitation reconstructed from pollen databases. Our model links the genetic measure to the climate curves. We adapt the classical functional linear model to take into account interactions between climate variables as a bilinear form. Since the data are georeferenced, our extensions also account for the spatial dependence among the observations. The practical issues of these methodological extensions are discussed.

7.

Tree-structured subgroup analysis for censored survival data: Validation of computationally inexpensive model selection criteria

Abdissa Negassa Antonio Ciampi Michal Abrahamowicz Stanley Shapiro Jean-François Boivin 《Statistics and Computing》2005,15(3):231-239

The performance of computationally inexpensive model selection criteria in the context of tree-structured subgroup analysis is investigated. It is shown through simulation that no single model selection criterion exhibits a uniformly superior performance over a wide range of scenarios. Therefore, a two-stage approach for model selection is proposed and shown to perform satisfactorily. An applied example of subgroup analysis is presented. Problems associated with tree-structured subgroup analysis are discussed and practical solutions are suggested.

8.

Semiparametric principal component Poisson regression on clustered data

Kristina Celene M. Manalaysay 《Communications in Statistics - Simulation and Computation》2017,46(2):1546-1556

In modeling count data with multivariate predictors, we often encounter problems with clustering of observations and interdependency of predictors. We propose to use principal components of the predictors to mitigate the multicollinearity problem; to abate information losses due to dimension reduction, a semiparametric link between the count dependent variable and the principal components is postulated. Clustering of observations is incorporated into the model as a random component, and the model is estimated via the backfitting algorithm. A simulation study illustrates the advantages of the proposed model over standard Poisson regression in a wide range of scenarios.

9.

Exploring multivariate data using directions of high density

FOSTER PETER 《Statistics and Computing》1998,8(4):347-355

The most common techniques for graphically presenting a multivariate dataset involve projection onto a one- or two-dimensional subspace. Interpretation of such plots is not always straightforward because projections are smoothing operations, in that structure can be obscured by projection but never enhanced. In this paper an alternative procedure for finding interesting features is proposed that is based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed. Emphasis is placed on identifying non-linear effects, such as clustering, so to this end the data are first sphered to remove all of the location, scale and correlational structure. A set of simulated bivariate data and the artistic qualities of painters data are used as examples.

10.

Exploratory data analysis for counts using the empirical probability generating function

Miguel Nakamura Victor Pérez-Abreu 《Communications in Statistics - Theory and Methods》2013,42(3):827-842

We present a graphical method based on the empirical probability generating function for preliminary statistical analysis of distributions for counts. The method is especially useful in fitting a Poisson model, or for identifying alternative models as well as possible outlying observations from general discrete distributions.
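
For readers unfamiliar with the quantity named above, the following minimal sketch computes the empirical probability generating function, i.e. the sample mean of t^X, and tabulates it next to the Poisson generating function exp(λ(t − 1)) evaluated at the sample mean. It illustrates the definition only and is not the authors' graphical procedure; the function names and the sample values are made up for illustration.

```python
import math

def epgf(counts, t):
    """Empirical probability generating function: the sample mean of t**x."""
    return sum(t ** x for x in counts) / len(counts)

def poisson_pgf(lam, t):
    """Probability generating function of a Poisson(lam) distribution."""
    return math.exp(lam * (t - 1.0))

sample = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]  # made-up counts, for illustration only
lam_hat = sum(sample) / len(sample)       # sample mean as the Poisson rate estimate

# Tabulate both functions on a grid of t values in [0, 1].
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, round(epgf(sample, t), 4), round(poisson_pgf(lam_hat, t), 4))
```

Plotting the two columns against t, or the logarithm of the first against t (which is linear in t for Poisson data), is one simple way to eyeball departures from a Poisson model.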

11.

A distance based regression model for prediction with mixed data

C.M. Cuadras C. Arenas 《Communications in Statistics - Theory and Methods》2013,42(6):2261-2279

A multiple regression method based on distance analysis and metric scaling is proposed and studied. This method allows us to predict a continuous response variable from several explanatory variables, is compatible with the general linear model and is found to be useful when the predictor variables are both continuous and categorical. Real data examples are given to illustrate the results obtained.

12.

Function-on-function regression for two-dimensional functional data

Andrada E. Ivanescu 《Communications in Statistics - Simulation and Computation》2013,42(9):2656-2669

We present methods for modeling and estimation of a concurrent functional regression when the predictors and responses are two-dimensional functional datasets. The implementations use spline basis functions and model fitting is based on smoothing penalties and mixed model estimation. The proposed methods are implemented in available statistical software, allow the construction of confidence intervals for the bivariate model parameters, and can be applied to completely or sparsely sampled responses. The methods are tested on simulated data and show favorable results in practice. The usefulness of the methods is illustrated in an application to environmental data.

13.

A log-linear regression model for the odd Weibull distribution with censored data

Edwin M.M. Ortega Gauss M. Cordeiro Elizabeth M. Hashimoto Kahadawala Cooray 《Journal of applied statistics》2014,41(9):1859-1880

We introduce the log-odd Weibull regression model based on the odd Weibull distribution (Cooray, 2006). We derive some mathematical properties of the log-transformed distribution. The new regression model represents a parametric family of models that includes as sub-models some widely known regression models that can be applied to censored survival data. We employ a frequentist analysis and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to assess global influence. Further, for different parameter settings, sample sizes and censoring percentages, some simulations are performed. In addition, the empirical distribution of some modified residuals is given and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We define martingale and deviance residuals to check the model assumptions. The extended regression model is very useful for the analysis of real data.

14.

Estimation of semiparametric regression model with right-censored high-dimensional data

Dursun Aydın S. Ejaz Ahmed 《Journal of Statistical Computation and Simulation》2019,89(6):985-1004

15.

Streaming constrained binary logistic regression with online standardized data

Benoît Lallou Jean-Marie Monnez Eliane Albuisson 《Journal of applied statistics》2022,49(6):1519

16.

Additive hazards regression of current status data with auxiliary covariates

Yanqin Feng Yuan Dong 《Communications in Statistics - Theory and Methods》2017,46(21):10657-10671

This paper discusses the regression analysis of current status failure time data arising from the additive hazards model with auxiliary covariates. As often occurs in practice, it is impossible or impractical to measure the exact magnitude of covariates for all subjects in a study. To compensate for the missing information, some auxiliary covariates are utilized instead. We propose two easy-to-implement procedures for estimation of regression parameters by making use of auxiliary information. The asymptotic properties of the resulting estimators are established and extensive numerical studies indicate that both procedures work well in practice.

17.

Multivariate regression analysis of panel data with binary outcomes applied to unemployment data

Claudia Czado 《Statistical Papers》2000,41(3):281-304

In panel studies, binary outcome measures together with time-stationary and time-varying explanatory variables are collected over time on the same individual. Therefore, a regression analysis for this type of data must allow for the correlation among the outcomes of an individual. The multivariate probit model of Ashford and Sowden (1970) was the first regression model for multivariate binary responses. However, a likelihood analysis of the multivariate probit model with general correlation structure for higher dimensions is intractable due to the maximization over high-dimensional integrals, which has severely restricted its applicability so far. Czado (1996) developed a Markov Chain Monte Carlo (MCMC) algorithm to overcome this difficulty. In this paper we present an application of this algorithm to unemployment data from the Panel Study of Income Dynamics involving 11 waves of the panel study. In addition, we adapt Bayesian model checking techniques based on the posterior predictive distribution (see for example Gelman et al. (1996)) for the multivariate probit model. These help to identify mean and correlation specifications that fit the data well. C. Czado was supported by research grant OGP0089858 of the Natural Sciences and Engineering Research Council of Canada.

18.

On density and regression estimation with incomplete data

Majid Mojirsheibani Kevin Manley William Pouliot 《Communications in Statistics - Theory and Methods》2017,46(23):11688-11711

We consider the problem of estimation of a density function in the presence of incomplete data and study the Hellinger distance between our proposed estimators and the true density function. Here, the presence of incomplete data is handled by utilizing a Horvitz–Thompson-type inverse weighting approach, where the weights are the estimates of the unknown selection probabilities. We also address the problem of estimation of a regression function with incomplete data.
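
The inverse weighting idea mentioned above can be sketched as a kernel density estimate in which each observed value is weighted by the reciprocal of its (estimated) probability of being observed. The following is a generic, normalized version with a Gaussian kernel, written under stated assumptions: the exact estimator, kernel and normalization used in the paper are not given in the abstract, and the function and parameter names here are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def ht_weighted_kde(x, values, observed, probs, bandwidth):
    """Normalized inverse-probability-weighted kernel density estimate at x.

    values[i]   : the i-th data value (ignored when observed[i] is False)
    observed[i] : True if values[i] was actually observed
    probs[i]    : estimated probability that unit i is observed (must be > 0)
    Assumes at least one unit is observed.
    """
    # Horvitz-Thompson-type weights: 1/probability for observed units, else 0.
    weights = [1.0 / p if seen else 0.0 for seen, p in zip(observed, probs)]
    total = sum(weights)
    kernel_sum = sum(
        w * gaussian_kernel((x - v) / bandwidth)
        for w, v in zip(weights, values)
        if w > 0.0
    )
    return kernel_sum / (total * bandwidth)

# Made-up example: the third value is missing and each unit has its own
# hypothetical selection probability.
values = [0.2, 1.1, None, -0.4]
observed = [True, True, False, True]
probs = [0.9, 0.5, 0.5, 0.8]
print(ht_weighted_kde(0.0, values, observed, probs, bandwidth=0.5))
```

When every unit is observed with probability 1, the weights are all equal and the estimate reduces to the ordinary kernel density estimate.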

19.

Performance of asymmetric links and correction methods for imbalanced data in binary regression

Alex de la Cruz Huayanay Jorge L. Bazán Vicente G. Cancho Dipak K. Dey 《Journal of Statistical Computation and Simulation》2019,89(9):1694-1714

In binary regression, imbalanced data result from the presence of values equal to zero (or one) in a proportion that is significantly greater than the corresponding real values of one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them to the use of asymmetric links. The results of a simulation study show that correction methods do not adequately correct bias in the estimation of regression coefficients and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application for imbalanced data, identifying the best model among the various ones proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models are compared using different criteria for model comparison, predictive evaluation and quantile residuals.

20.

Generalized spatial regression with differential regularization

Matthieu Wilhelm 《Journal of Statistical Computation and Simulation》2016,86(13):2497-2518

We aim at analysing geostatistical and areal data observed over irregularly shaped spatial domains and having a distribution within the exponential family. We propose a generalized additive model that accounts for spatially varying covariate information. The model is fitted by maximizing a penalized log-likelihood function, with a roughness penalty term that involves a differential quantity of the spatial field, computed over the domain of interest. Efficient estimation of the spatial field is achieved by resorting to the finite element method, which provides a basis for piecewise polynomial surfaces. The proposed model is illustrated by an application to the study of criminality in the city of Portland, OR, USA.