Similar Literature
 20 similar documents found (search time: 46 ms)
1.
We present a technique for extending generalized linear models to the situation where some of the predictor variables are observations from a curve or function. The technique is particularly useful when only fragments of each curve have been observed. We demonstrate, on both simulated and real data sets, how this approach can be used to perform linear, logistic and censored regression with functional predictors. In addition, we show how functional principal components can be used to gain insight into the relationship between the response and functional predictors. Finally, we extend the methodology to apply generalized linear models and principal components to standard missing data problems.
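A bare-bones illustration, under simplifying assumptions, of the idea of using functional principal component scores as predictors in a generalized linear model: the curves here are simulated and fully observed on a common grid, summarized by ordinary PCA scores that feed a logistic regression. The paper's handling of curve fragments and of censored responses is not reproduced; all data and parameter values below are made up for the sketch.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 150
grid = np.linspace(0, 1, 50)
# Simulated curves: a random-amplitude sine plus noise, observed on a common grid.
curves = np.sin(2 * np.pi * grid) * rng.normal(1.0, 0.3, size=(n, 1)) \
         + rng.normal(scale=0.1, size=(n, 50))
# Binary response driven by the early part of each curve.
y = (curves[:, :25].mean(axis=1) > 0.6).astype(int)

scores = PCA(n_components=3).fit_transform(curves)   # functional PC scores
model = LogisticRegression().fit(scores, y)          # logistic GLM on the scores
print("in-sample accuracy:", model.score(scores, y))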

2.
High-dimensional data often exhibit multi-collinearity, leading to unstable regression coefficients. To address sample selection bias and the problems associated with high dimensionality, principal components were extracted and used as predictors in a switching regression model. Since principal component regression often results in a decline in predictive ability due to the selection of only a few principal components, we formulate the model with a nonparametric function of the principal components in lieu of the individual predictors. Simulation studies indicated better predictive ability for nonparametric principal component switching regression than for its parametric counterpart, while mitigating the adverse effects of multi-collinearity and high dimensionality.

3.
This article considers panel data models in the presence of a large number of potential predictors and unobservable common factors. The model is estimated by the regularization method together with the principal components procedure. We propose a panel information criterion for selecting the regularization parameter and the number of common factors under a diverging number of predictors. Under the correct model specification, we show that the proposed criterion consistently identifies the true model. If the model is instead misspecified, the proposed criterion achieves asymptotically efficient model selection. Simulation results confirm these theoretical arguments.

4.
Dependent multivariate count data occur in several research studies. These data can be modelled by a multivariate Poisson or negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, when the number of observations in some cells is much larger than in other cells, the copula-based multivariate Poisson (or negative binomial) distribution may not fit well and is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher than those of the other cells and develop a doubly inflated multivariate Poisson distribution using a multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. To illustrate the proposed methodologies, we present real data containing bivariate count observations with inflation in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation of the unknown model parameters.

5.
We propose a model for count data from two-stage cluster sampling, where observations within each cluster are simultaneously subject to internal influences and to external factors at the cluster level. This model can be seen as a two-stage hierarchical model with local and global predictors. This parameter-driven model causes the counts within a cluster to share a common latent factor and hence to be correlated. Maximum likelihood (ML) estimation for the model, based on an EM algorithm, is discussed. A simulation study is carried out to assess the benefit of using ML estimates over a standard Poisson regression analysis that ignores the within-cluster correlation.

6.
This article considers in-sample prediction and out-of-sample forecasting in regressions with many exogenous predictors. We consider four dimension-reduction devices: principal components, ridge, Landweber-Fridman, and partial least squares. We derive rates of convergence for two representative models: an ill-posed model and an approximate factor model. The theory is developed for a large cross-section and a large time series. As all these methods depend on a tuning parameter to be selected, we also propose data-driven selection methods based on cross-validation and establish their optimality. Monte Carlo simulations and an empirical application to forecasting inflation and output growth in the U.S. show that data-reduction methods outperform conventional methods in several relevant settings, and might effectively guard against instabilities in the predictors' forecasting ability.
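For concreteness, a small cross-validated comparison in the spirit of the study above, showing three of the four dimension-reduction devices (principal components regression, partial least squares, ridge) with their tuning parameters chosen by cross-validation. The Landweber-Fridman iteration and the paper's specific optimality results are not reproduced, and the data are simulated.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
n, p = 200, 80                      # many exogenous predictors
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(size=n)

# Tuning parameters (number of components, ridge penalty) chosen by cross-validation.
pcr = GridSearchCV(make_pipeline(PCA(), LinearRegression()),
                   {"pca__n_components": [2, 5, 10, 20]}, cv=5)
pls = GridSearchCV(PLSRegression(), {"n_components": [2, 5, 10, 20]}, cv=5)
ridge = RidgeCV(alphas=np.logspace(-2, 3, 20))

for name, est in [("PCR", pcr), ("PLS", pls), ("ridge", ridge)]:
    score = cross_val_score(est, X, y, cv=5, scoring="r2").mean()
    print(f"{name:>5}: cross-validated R^2 = {score:.3f}")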

7.
A number of results have been derived recently concerning the influence of individual observations in a principal component analysis. Some of these results, particularly those based on the correlation matrix, are applied to data consisting of seven anatomical measurements on students. The data have a correlation structure which is fairly typical of many found in allometry. This case study shows that theoretical influence functions often provide good estimates of the actual changes observed when individual observations are deleted from a principal component analysis. Different observations may be influential for different aspects of the principal component analysis (coefficients, variances and scores of principal components); these differences, and the distinction between outlying and influential observations are discussed in the context of the case study. A number of other complications, such as switching and rotation of principal components when an observation is deleted, are also illustrated.
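As a concrete companion to the case study, a leave-one-out check of empirical influence: delete each observation in turn, redo the correlation-matrix PCA, and record how much the leading eigenvalue and loading vector move. This is the "actual change on deletion" that the theoretical influence functions approximate; the simulated data and the flagging thresholds below are made up for illustration.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 7))          # stand-in for 7 anatomical measurements

def leading_pc(data):
    # Correlation-matrix PCA: largest eigenvalue and its loading vector.
    eigval, eigvec = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    return eigval[-1], eigvec[:, -1]

lam_full, v_full = leading_pc(X)
for i in range(X.shape[0]):
    lam_i, v_i = leading_pc(np.delete(X, i, axis=0))
    rotation = np.degrees(np.arccos(np.clip(abs(v_full @ v_i), 0.0, 1.0)))
    if abs(lam_i - lam_full) > 0.15 or rotation > 5.0:   # crude influence flags
        print(f"obs {i}: change in lambda_1 = {lam_i - lam_full:+.3f}, "
              f"loading rotation = {rotation:.1f} degrees")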

8.
Time series sometimes consist of count data in which the number of events occurring in a given time interval is recorded. Such data are necessarily nonnegative integers, and an assumption of a Poisson or negative binomial distribution is often appropriate. This article sets up a model in which the level of the process generating the observations changes over time. A recursion analogous to the Kalman filter is used to construct the likelihood function and to make predictions of future observations. Qualitative variables, based on a binomial or multinomial distribution, may be handled in a similar way. The model for count data may be extended to include explanatory variables. This enables nonstochastic slope and seasonal components to be included in the model, as well as permitting intervention analysis. The techniques are illustrated with a number of applications, and an attempt is made to develop a model-selection strategy along the lines of that used for Gaussian structural time series models. The applications include an analysis of the results of international football matches played between England and Scotland and an assessment of the effect of the British seat-belt law on the drivers of light-goods vehicles.

9.
The purpose of this article is to obtain the jackknifed ridge predictors in linear mixed models and to examine the superiority of the jackknifed ridge predictors, and of their linear combinations, over the ridge, principal components regression, r-k class and Henderson's predictors in terms of bias, covariance matrix and mean square error criteria. Numerical analyses are presented to illustrate the findings, and a simulation study is conducted to assess the performance of the jackknifed ridge predictors.

10.
This article provides a simple shrinkage representation that describes the operational characteristics of various forecasting methods designed for a large number of orthogonal predictors (such as principal components). These methods include pretest methods, Bayesian model averaging, empirical Bayes, and bagging. We compare empirically forecasts from these methods with dynamic factor model (DFM) forecasts using a U.S. macroeconomic dataset with 143 quarterly variables spanning 1960–2008. For most series, including measures of real economic activity, the shrinkage forecasts are inferior to the DFM forecasts. This article has online supplementary material.

11.
We review and extend some statistical tools that have proved useful for analysing functional data. Functional data analysis is primarily designed for the analysis of random trajectories and infinite-dimensional data, and there is a need for adequate statistical estimation and inference techniques. While this field is in flux, some methods have proven useful. These include warping methods, functional principal component analysis, and conditioning under Gaussian assumptions for the case of sparse data. The latter is a recent development that may provide a bridge between functional and more classical longitudinal data analysis. Besides presenting a brief review of functional principal components and functional regression, we develop some concepts for estimating functional principal component scores in the sparse situation. An extension of the so-called generalized functional linear model to the case of sparse longitudinal predictors is proposed. This extension includes functional binary regression models for longitudinal data and is illustrated with data on primary biliary cirrhosis.

12.
With rapid development in the technology for measuring disease characteristics at the molecular or genetic level, it is possible to collect a large amount of data on various potential predictors of the clinical outcome of interest in medical research. It is often of interest to use the information on a large number of predictors effectively to predict that outcome. Various statistical tools have been developed to overcome the difficulties caused by the high dimensionality of the covariate space in the setting of a linear regression model. This paper focuses on the situation where the outcomes of interest are subject to right censoring. We applied the extended partial least squares method, along with other commonly used approaches for analyzing high-dimensional covariates, to the ACTG333 data set. In particular, we compared the prediction performance of the different approaches with extensive cross-validation studies. The results show that Buckley-James-based partial least squares, stepwise subset model selection and principal components regression have similarly promising predictive power, and that the partial least squares method has several advantages in terms of interpretability and numerical computation.

13.
Count data with structural zeros are common in public health applications. There is considerable research on zero-inflated models, such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, for such zero-inflated count data when they are used as the response variable. However, when such variables are used as predictors, the difference between structural and random zeros is often ignored, which may result in biased estimates. One remedy is to include an indicator of the structural zero in the model as a predictor, if it is observed. However, structural zeros are often not observed in practice, in which case no statistical method is available to address the bias issue. This paper aims to fill this methodological gap by developing parametric, maximum-likelihood-based methods for modelling zero-inflated count data used as predictors. The response variable can be of any type, including continuous, binary, count or even zero-inflated count responses. Simulation studies are performed to assess the numerical performance of the new approach when the sample size is small to moderate. A real data example is also used to demonstrate the application of the method.

14.
We develop a new principal components analysis (PCA) type dimension reduction method for binary data. Unlike standard PCA, which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced into the principal component (PC) loading vectors for enhanced interpretability and more stable extraction of the principal components. Our sparse PCA is formulated as solving an optimization problem with a criterion function motivated by a penalized Bernoulli likelihood. A majorization-minimization algorithm is developed to solve the optimization problem efficiently. The effectiveness of the proposed sparse logistic PCA method is illustrated by an application to a single nucleotide polymorphism data set and by a simulation study.

15.
In many fields of empirical research one is faced with observations arising from a functional process. In such cases, classical multivariate methods are often not feasible or appropriate for exploring the data at hand, and functional data analysis is called for. In this paper we present a method for the joint modeling of mean and variance in longitudinal data using penalized splines. Unlike previous approaches, we model both components simultaneously via rich spline bases. Estimation as well as smoothing parameter selection is carried out using a mixed model framework. The resulting smooth covariance structures are then used to perform principal component analysis. We illustrate our approach with several simulations and an application to financial interest data.

16.
Recently, van der Linde (Comput. Stat. Data Anal. 53:517-533, 2008) proposed a variational algorithm for approximate Bayesian inference in functional principal components analysis (FPCA), where the functions are observed with Gaussian noise. Generalized FPCA under different noise models with sparse longitudinal data was developed by Hall et al. (J. R. Stat. Soc. B 70:703-723, 2008), but no Bayesian approach is available yet. It is demonstrated that an adapted version of the variational algorithm can be applied to obtain a Bayesian FPCA for canonical parameter functions, in particular log-intensity functions given Poisson count data or logit-probability functions given binary observations. To this end a second-order Taylor expansion of the log-likelihood, that is, a working Gaussian distribution and hence a further step of approximation, is used. Although the approach is conceptually straightforward, difficulties can arise in practical applications depending on the accuracy of the approximation and the information in the data. A modified algorithm is introduced for general one-parameter exponential families and exemplified for binary and count data. Conditions for its successful application are discussed and illustrated using simulated data sets. An application with real data is also presented.

17.
In this paper, a class of probability-plot tests for multivariate normality is introduced. Based on the independent standardized principal components of a d-variate normal data set, we obtain, for each principal component, the sum of squared differences between the ordered component observations and the corresponding population quantiles of the standard normal distribution. We propose the sum of these d sums of squared differences as a statistic for testing multivariate normality. We evaluate empirical critical values of the statistic and compare its power with those of some highly regarded techniques, with favorable results.
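A rough sketch of the statistic described above: for each standardized principal component, compare its order statistics with standard-normal quantiles and accumulate the squared differences over the d components. The plotting positions (i - 0.5)/n and the standardization details are assumptions rather than the paper's exact choices, and no critical values are computed here.

import numpy as np
from scipy.stats import norm

def pc_normality_statistic(X):
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    # Principal components from the sample covariance matrix.
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    scores = Xc @ eigvec
    scores = scores / scores.std(axis=0, ddof=1)        # standardized components
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)       # reference normal quantiles
    # Sum over the d components of the squared ordered-value vs quantile differences.
    return sum(np.sum((np.sort(scores[:, j]) - q) ** 2) for j in range(d))

rng = np.random.default_rng(1)
print(pc_normality_statistic(rng.normal(size=(100, 4))))       # small under normality
print(pc_normality_statistic(rng.exponential(size=(100, 4))))  # larger when non-normal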

18.
Clustered (longitudinal) count data arise in many biostatistical applications in which a number of repeated count responses are observed on a number of individuals. The repeated observations may also represent counts over time from a number of individuals. One important problem that arises in practice is to test homogeneity within clusters (individuals) and between clusters (individuals). As data within clusters are repeated responses, the count data may be correlated and/or over-dispersed. For over-dispersed count data with an unknown over-dispersion parameter, we derive two score tests by assuming a random intercept model within the framework of (i) the negative binomial mixed effects model and (ii) the double extended quasi-likelihood mixed effects model (Lee and Nelder, 2001). These two statistics are much simpler than the statistic derived by Jacqmin-Gadda and Commenges (1995) under the framework of the over-dispersed generalized linear model. The first statistic incorporates the over-dispersion more directly into the model and is therefore expected to do well when the model assumptions are satisfied, while the other statistic is expected to be robust. Simulations show superior level properties of the statistics derived under the negative binomial and double extended quasi-likelihood model assumptions. A data set is analyzed and a discussion is given.

19.
In this study, classical and robust principal component analyses are used to evaluate the socioeconomic development of the regions served by development agencies, which were established to reduce development differences among regions in Turkey. Because of the large differences between the development levels of the regions, outliers occur, so robust statistical methods are used. Classical and robust statistical methods are also used to investigate whether there are any outliers in the data set. In classical principal component analysis the number of observations must be larger than the number of variables; otherwise the determinant of the covariance matrix is zero. With ROBPCA, a robust approach to principal component analysis for high-dimensional data, principal components are obtained even if the number of variables is larger than the number of observations. In this paper, the 26 development agencies are first evaluated on 19 variables using principal component analysis based on classical and robust scatter matrices, and then on 46 variables using the ROBPCA method.
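A quick numerical illustration, with simulated numbers in place of the Turkish regional indicators, of the rank problem noted above: with 26 observations and 46 variables the sample covariance matrix has rank at most 25, so its determinant is numerically zero and classical covariance-based PCA breaks down. ROBPCA itself is not implemented in this sketch.

import numpy as np

rng = np.random.default_rng(3)
n, p = 26, 46                         # agencies (observations) < indicators (variables)
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)           # 46 x 46 sample covariance matrix

print("rank of covariance matrix:", np.linalg.matrix_rank(S))   # at most n - 1 = 25
print("smallest eigenvalue:", np.linalg.eigvalsh(S)[0])         # ~0 up to rounding error
print("determinant:", np.linalg.det(S))                         # ~0, so classical PCA fails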

20.
Principal component regression (PCR) has two steps: estimating the principal components and performing the regression using these components. These steps are generally performed sequentially. In PCR, a crucial issue is the selection of the principal components to be included in the regression. In this paper, we build a hierarchical probabilistic PCR model with a dynamic component selection procedure. A latent variable is introduced to select promising subsets of components based on the significance of the relationship between the response variable and the principal components in the regression step. We illustrate this model using real and simulated examples. The simulations demonstrate that our approach outperforms some existing methods in terms of the root mean squared error of the regression coefficient.
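To make the two sequential steps concrete, a minimal numpy sketch of plain PCR on simulated data: the components are estimated first, then the regression uses a fixed number of them. The "keep the first k components" rule is only a placeholder; the paper's latent-variable, significance-based component selection is not implemented here.

import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 10, 3                      # hypothetical sizes; keep first k components
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Step 1: principal components of the centered predictors via the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T                    # scores on the first k components

# Step 2: least-squares regression of the response on the retained scores.
Z = np.column_stack([np.ones(n), scores])
gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Coefficients mapped back to the original predictors.
beta = Vt[:k].T @ gamma[1:]
print("coefficients on original predictors:", np.round(beta, 3))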
