Similar literature (20 results)
1.
This paper describes inference methods for functional data under the assumption that the functional data of interest are smooth latent functions, characterized by a Gaussian process, which have been observed with noise over a finite set of time points. The methods we propose are completely specified in a Bayesian environment that allows for all inferences to be performed through a simple Gibbs sampler. Our main focus is on estimating and describing uncertainty in the covariance function. However, these models also encompass functional data estimation, functional regression where the predictors are latent functions, and an automatic approach to smoothing parameter selection. Furthermore, these models require minimal assumptions on the data structure as the time points for observations do not need to be equally spaced, the number and placement of observations are allowed to vary among functions, and special treatment is not required when the number of functional observations is less than the dimensionality of those observations. We illustrate the effectiveness of these models in estimating latent functional data, capturing variation in the functional covariance estimate, and in selecting appropriate smoothing parameters in both a simulation study and a regression analysis of medfly fertility data.

2.
Very often, in psychometric research, as in educational assessment, it is necessary to analyze item responses from clustered respondents. The multiple group item response theory (IRT) model proposed by Bock and Zimowski [12] provides a useful framework for analyzing this type of data. In this model, the selected groups of respondents are of specific interest such that group-specific population distributions need to be defined. The usual assumption for parameter estimation in this model, which is that the latent traits are random variables following different symmetric normal distributions, has been questioned in many works found in the IRT literature. Furthermore, when this assumption does not hold, misleading inference can result. In this paper, we consider that the latent traits for each group follow different skew-normal distributions, under the centered parameterization. We call it the skew multiple group IRT model. This modeling extends the works of Azevedo et al. [4], Bazán et al. [11] and Bock and Zimowski [12] (concerning the latent trait distribution). Our approach ensures that the model is identifiable. We propose and compare, concerning convergence issues, two Markov chain Monte Carlo (MCMC) algorithms for parameter estimation. A simulation study was performed in order to evaluate parameter recovery for the proposed model and the selected algorithm concerning convergence issues. Results reveal that the proposed algorithm properly recovers all model parameters. Furthermore, we analyzed a real data set which presents asymmetry concerning the latent traits distribution. The results obtained by using our approach confirmed the presence of negative asymmetry for some latent trait distributions.

3.
In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical nature of the clustering variables and the lack of scalable algorithms, summary tools that can interpret such samples are not available. We adopt a Bayesian decision theoretical approach to define an optimality criterion for clusterings and propose a fast and context-independent greedy algorithm to find the best allocations. One important facet of our approach is that the optimal number of groups is automatically selected, thereby solving the clustering and the model-choice problems at the same time. We consider several loss functions to compare partitions and show that our approach can accommodate a wide range of cases. Finally, we illustrate our approach on both artificial and real datasets for three different clustering models: Gaussian mixtures, stochastic block models and latent block models for networks.
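The decision-theoretic idea in this abstract can be illustrated with a small sketch. It assumes Binder's loss with equal costs (the abstract does not say which loss functions are used) and it searches only among the sampled partitions, which is a much simpler stand-in for the greedy algorithm the authors propose. The function names and the toy samples are hypothetical.

```python
def posterior_similarity(samples):
    """Estimate P(items i and j share a cluster) from posterior allocation samples."""
    n = len(samples[0])
    m = len(samples)
    return [[sum(s[i] == s[j] for s in samples) / m for j in range(n)]
            for i in range(n)]


def expected_binder_loss(partition, psm):
    """Posterior expected Binder loss (equal costs, assumed) of a candidate partition."""
    n = len(partition)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            together = 1.0 if partition[i] == partition[j] else 0.0
            # Cost is incurred whenever the candidate and the truth disagree on pair (i, j).
            loss += abs(together - psm[i][j])
    return loss


# Hypothetical MCMC output: four sampled allocations of four items.
# The last sample is the first one with its labels switched.
samples = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
psm = posterior_similarity(samples)
best = min(samples, key=lambda s: expected_binder_loss(s, psm))
best_loss = expected_binder_loss(best, psm)
```

Because the loss depends only on which pairs are grouped together, label switching between samples does not affect the result, and the number of groups in the chosen partition is selected as a by-product.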

4.
Life insurance companies want to predict the average claimed sums they have to pay in the event of death for specific groups of customers in order to derive group-specific premiums. This requires estimation of the variability of claims across groups. We derive a corresponding mixed linear model for claim data from many groups of customers that incorporates group-specific age distributions, the Gompertz-Makeham mortality function and an unknown group-specific random hazard factor. It takes the form of a generalized replicated model with two variance components where the between-blocks variance component depends on the common mean of all observations. Two methods of parameter estimation are derived along the lines of C. R. Rao's MINQUE and generalized least squares estimation. Simulations show both methods to work well for large sets of data.
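For orientation, the Gompertz-Makeham mortality function is commonly written as below. The abstract does not give the parameterization the authors use, so the symbols here are an assumption: an age-independent term $A$ plus a term $B c^{x}$ that grows exponentially with age $x$.

```latex
\mu(x) = A + B\,c^{x}, \qquad A \ge 0,\; B > 0,\; c > 1,
\qquad
S(x) = \exp\!\left(-\int_0^x \mu(t)\,dt\right)
     = \exp\!\left(-A x - \frac{B}{\ln c}\,\bigl(c^{x} - 1\bigr)\right).
```

Here $\mu(x)$ is the force of mortality at age $x$ and $S(x)$ is the implied probability of surviving from age 0 to age $x$.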

5.
Graphical analysis of complex brain networks is a fundamental area of modern neuroscience. Functional connectivity is important since many neurological and psychiatric disorders, including schizophrenia, are described as ‘dys-connectivity’ syndromes. Using electroencephalogram time series collected on each of a group of 15 individuals with a common medical diagnosis of positive syndrome schizophrenia we seek to build a single, representative, brain functional connectivity group graph. Disparity/distance measures between spectral matrices are identified and used to define the normalized graph Laplacian enabling clustering of the spectral matrices for detecting ‘outlying’ individuals. Two such individuals are identified. For each remaining individual, we derive a test for each edge in the connectivity graph based on average estimated partial coherence over frequencies, and associated p-values are found. For each edge these are used in a multiple hypothesis test across individuals and the proportion rejecting the hypothesis of no edge is used to construct a connectivity group graph. This study provides a framework for integrating results on multiple individuals into a single overall connectivity structure.
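The step from pairwise distances to a normalized graph Laplacian can be sketched as follows. This is a minimal illustration, not the authors' procedure: the Gaussian kernel that converts distances into edge weights, the `bandwidth` parameter, and the toy distance matrix are all assumptions, and the distance between spectral matrices is taken as given.

```python
import math


def normalized_laplacian(dist, bandwidth=1.0):
    """Return (L, degrees) with L = I - D^{-1/2} W D^{-1/2}.

    dist is a symmetric list-of-lists matrix of pairwise distances.
    The Gaussian kernel and the bandwidth are assumptions of this sketch.
    """
    n = len(dist)
    # Edge weights: a large distance gives a small weight; no self-loops.
    w = [[0.0 if i == j else math.exp(-(dist[i][j] / bandwidth) ** 2)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    lap = [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(deg[i] * deg[j])
            for j in range(n)] for i in range(n)]
    return lap, deg


# Hypothetical distances: individuals 0-2 are mutually close, individual 3 is far away.
toy_dist = [
    [0.0, 0.2, 0.3, 2.0],
    [0.2, 0.0, 0.25, 2.1],
    [0.3, 0.25, 0.0, 1.9],
    [2.0, 2.1, 1.9, 0.0],
]
laplacian, degrees = normalized_laplacian(toy_dist)
```

In this toy example the weakly connected individual has by far the smallest degree. A spectral clustering step would then work with the eigenvectors of `laplacian` to separate such individuals from the rest.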

6.
The estimation of mixtures of regression models is usually based on the normal assumption of components, and maximum likelihood estimation of the normal components is sensitive to noise, outliers, or high-leverage points. Missing values are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this article, we propose mixtures of regression models for contaminated incomplete heterogeneous data. The proposed models provide robust estimates of regression coefficients varying across latent subgroups even in the presence of missing values. The methodology is illustrated through simulation studies and a real data analysis.

7.
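Wang Xiaoyan et al. 《统计研究》 (Statistical Research), 2014, 31(9): 107-112
Variable selection is an important step in statistical modeling; choosing appropriate variables yields robust models that are structurally simple and predict accurately. This paper proposes a new bi-level variable selection penalty method under logistic regression, the adaptive Sparse Group Lasso (adSGL). Its distinctive feature is that it screens variables on the basis of their group structure, achieving selection both within and between groups. The advantage of the method is that it applies different degrees of penalty to individual coefficients and to group coefficients, which avoids over-penalizing large coefficients and thereby improves the estimation and prediction accuracy of the model. The difficulty in solving the problem is that the penalized likelihood function is not strictly convex, so the paper solves the model with a group coordinate descent method and establishes a criterion for selecting the tuning parameters. Simulations show that, compared with the existing representative methods Sparse Group Lasso, Group Lasso and Lasso, adSGL both improves bi-level selection accuracy and reduces model error. Finally, the paper applies adSGL to credit card credit scoring, where it achieves higher classification accuracy and robustness than logistic regression.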
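A sparse group lasso penalty combines an L1 term on individual coefficients with a group-wise L2 term. The sketch below uses the common parameterization with a mixing parameter `alpha` and group-size scaling. The optional per-coefficient and per-group weights are a hypothetical stand-in for the adaptive weighting, whose exact form the abstract does not give.

```python
import math


def sparse_group_lasso_penalty(groups, lam, alpha,
                               coef_weights=None, group_weights=None):
    """lam * [alpha * sum_j w_j |b_j| + (1 - alpha) * sum_g v_g sqrt(p_g) ||b_g||_2].

    groups is a list of coefficient lists, one list per variable group.
    coef_weights and group_weights (assumed, illustrative) default to 1,
    which gives the ordinary, non-adaptive sparse group lasso penalty.
    """
    l1_term = 0.0
    group_term = 0.0
    for g, beta_g in enumerate(groups):
        v_g = 1.0 if group_weights is None else group_weights[g]
        norm_g = math.sqrt(sum(b * b for b in beta_g))
        group_term += v_g * math.sqrt(len(beta_g)) * norm_g
        for j, b in enumerate(beta_g):
            w_j = 1.0 if coef_weights is None else coef_weights[g][j]
            l1_term += w_j * abs(b)
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Two groups of two coefficients; the second group is entirely zero.
beta = [[3.0, 4.0], [0.0, 0.0]]
plain = sparse_group_lasso_penalty(beta, lam=1.0, alpha=0.5)
# Smaller weights on the large coefficients reduce their penalty.
adaptive = sparse_group_lasso_penalty(
    beta, lam=1.0, alpha=0.5,
    coef_weights=[[0.5, 0.5], [1.0, 1.0]], group_weights=[0.5, 1.0])
```

Setting `alpha=1` recovers the Lasso penalty and `alpha=0` the Group Lasso penalty, the two single-level methods the abstract compares against.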

8.
Summary. Genome-wide measurement of gene expression is a promising approach to the identification of subclasses of cancer that are currently not differentiable, but potentially biologically heterogeneous. This type of molecular classification gives hope for highly individualized and more effective prognosis and treatment of cancer. Statistically, the analysis of gene expression data from unclassified tumours is a complex hypothesis-generating activity, involving data exploration, modelling and expert elicitation. We propose a modelling framework that can be used to inform and organize the development of exploratory tools for classification. Our framework uses latent categories to provide both a statistical definition of differential expression and a precise, experiment-independent, definition of a molecular profile. It also generates natural similarity measures for traditional clustering and gives probabilistic statements about the assignment of tumours to molecular profiles.

9.
We propose a class of state-space models for multivariate longitudinal data where the components of the response vector may have different distributions. The approach is based on the class of Tweedie exponential dispersion models, which accommodates a wide variety of discrete, continuous and mixed data. The latent process is assumed to be a Markov process, and the observations are conditionally independent given the latent process, over time as well as over the components of the response vector. This provides a fully parametric alternative to the quasilikelihood approach of Liang and Zeger. We estimate the regression parameters for time-varying covariates entering either via the observation model or via the latent process, based on an estimating equation derived from the Kalman smoother. We also consider analysis of residuals from both the observation model and the latent process.

10.
Psychometric growth curve modeling techniques are used to describe a person’s latent ability and how that ability changes over time based on a specific measurement instrument. However, the same instrument cannot always be used over a period of time to measure that latent ability. This is often the case when measuring traits longitudinally in children. Reasons may be that over time some measurement tools that were difficult for young children become too easy as they age, resulting in floor effects or ceiling effects or both. We propose a Bayesian hierarchical model for such a scenario. Within the Bayesian model we combine information from multiple instruments used at different age ranges and having different scoring schemes to examine growth in latent ability over time. The model includes between-subject variance and within-subject variance and does not require linking item-specific difficulty between the measurement tools. The model’s utility is demonstrated on a study of language ability in children from ages one to ten who are hard of hearing, where measurement-tool-specific growth and subject-specific growth are shown in addition to a group-level latent growth curve comparing the hard of hearing children to children with normal hearing.
Keywords: Bayesian hierarchical models, psychometric modeling, language ability, growth curve modeling, longitudinal analysis

11.
Latent variable models have been widely used for modelling the dependence structure of multiple outcomes data. However, the formulation of a latent variable model is often unknown a priori, and misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes with varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes. We propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores people’s values and beliefs and the social and personal characteristics that might influence them.

12.
In this article, we present a model-based framework to estimate the educational attainments of students in latent groups defined by unobservable or only partially observed features that are likely to affect the outcome distribution and that are of interest in their own right. We focus our attention on the case of students in the first year of upper secondary school, for whom the teachers’ suggestion at the end of their lower educational level toward the subsequent type of school is available. We use this information to develop latent strata according to the compliance behavior of students, simplifying to the case of binary data for both counseled and attended school (i.e., academic or technical institute). We consider a likelihood-based approach to estimate outcome distributions in the latent groups and propose a set of plausible assumptions with respect to the problem at hand. In order to assess our method and its robustness, we simulate data resembling a real study conducted on pupils of the province of Bologna in the 2007/2008 school year to investigate their success or failure at the end of the first school year.

13.
We consider inference for functional proteomics experiments that record protein activation over time following perturbation under different dose levels of several drugs. The main inference goal is the dependence structure of the selected proteins. A critical challenge is the lack of sufficient data under any one drug and dose level to allow meaningful inference on dependence structure. We propose a hierarchical model to implement the desired inference. The key element of the model is a shared dependence structure on (latent) binary indicators of protein activation.

14.
We propose a mixture of latent variables model for the model-based clustering, classification, and discriminant analysis of data comprising variables of mixed type. This approach is a generalization of latent variable analysis, and model fitting is carried out within the expectation-maximization framework. Our approach is outlined and a simulation study is conducted to illustrate the effect of sample size and noise on the standard errors and the recovery probabilities for the number of groups. Our modelling methodology is then applied to two real data sets and its clustering and classification performance is discussed. We conclude with discussion and suggestions for future work.

15.
In this paper, we propose a multivariate growth curve mixture model that groups subjects based on multiple symptoms measured repeatedly over time. Our model synthesizes features of two models. First, we follow Roy and Lin (2000) in relating the multiple symptoms at each time point to a single latent variable. Second, we use the growth mixture model of Muthén and Shedden (1999) to group subjects based on distinctive longitudinal profiles of this latent variable. The mean growth curve for the latent variable in each class defines that class's features. For example, a class of "responders" would have a decline in the latent symptom summary variable over time. A Bayesian approach to estimation is employed where the methods of Elliott et al. (2005) are extended to simultaneously estimate the posterior distributions of the parameters from the latent variable and growth curve mixture portions of the model. We apply our model to data from a randomized clinical trial evaluating the efficacy of Bacillus Calmette-Guerin (BCG) in treating symptoms of Interstitial Cystitis. In contrast to conventional approaches using a single subjective Global Response Assessment, we use the multivariate symptom data to identify a class of subjects where treatment demonstrates effectiveness. Simulations are used to confirm identifiability results and evaluate the performance of our algorithm. The definitive version of this paper is available at onlinelibrary.wiley.com.

16.
We consider a general class of prior distributions for nonparametric Bayesian estimation which uses finite random series with a random number of terms. A prior is constructed through distributions on the number of basis functions and the associated coefficients. We derive a general result on adaptive posterior contraction rates for all smoothness levels of the target function in the true model by constructing an appropriate ‘sieve’ and applying the general theory of posterior contraction rates. We apply this general result to several statistical problems such as density estimation, various nonparametric regressions, classification, spectral density estimation and functional regression. The prior can be viewed as an alternative to the commonly used Gaussian process prior, but properties of the posterior distribution can be analysed by relatively simpler techniques. An interesting approximation property of B-spline basis expansion established in this paper allows a canonical choice of prior on coefficients in a random series and allows a simple computational approach without using Markov chain Monte Carlo methods. A simulation study is conducted to show that the accuracy of the Bayesian estimators based on the random series prior is comparable to that of estimators based on the Gaussian process prior. We apply the method to the Tecator data using functional regression models.

17.
The volatility pattern of financial time series is often characterized by several peaks and abrupt changes, consistent with the time-varying coefficients of the underlying data-generating process. As a consequence, the model-based classification of the volatility of a set of assets could vary over a period of time. We propose a procedure to classify the unconditional volatility obtained from an extended family of Multiplicative Error Models with time-varying coefficients to verify whether it changes in correspondence with different regimes or particular dates. The proposed procedure is tested on 15 stock indices.

18.
In this work we propose an autoregressive model with time-varying parameters applied to irregularly spaced non-stationary time series. We expand all the functional parameters in a wavelet basis and estimate the coefficients by least squares after truncation at a suitable resolution level. We also present some simulations in order to evaluate both the estimation method and the model behavior on finite samples. Applications to irregularly observed silicate and nitrite data are provided as well.

19.
We develop a hierarchical Gaussian process model for forecasting and inference of functional time series data. Unlike existing methods, our approach is especially suited for sparsely or irregularly sampled curves and for curves sampled with nonnegligible measurement error. The latent process is dynamically modeled as a functional autoregression (FAR) with Gaussian process innovations. We propose a fully nonparametric dynamic functional factor model for the dynamic innovation process, with broader applicability and improved computational efficiency over standard Gaussian process models. We prove finite-sample forecasting and interpolation optimality properties of the proposed model, which remain valid with the Gaussian assumption relaxed. An efficient Gibbs sampling algorithm is developed for estimation, inference, and forecasting, with extensions for FAR(p) models with model averaging over the lag p. Extensive simulations demonstrate substantial improvements in forecasting performance and recovery of the autoregressive surface over competing methods, especially under sparse designs. We apply the proposed methods to forecast nominal and real yield curves using daily U.S. data. Real yields are observed more sparsely than nominal yields, yet the proposed methods are highly competitive in both settings. Supplementary materials, including R code and the yield curve data, are available online.

20.
In this article, we propose a new class of semiparametric instrumental variable models with partially varying coefficients, in which the structural function has a partially linear form and the impact of endogenous structural variables can vary over different levels of some exogenous variables. We propose a three-step estimation procedure to estimate both functional and constant coefficients. The consistency and asymptotic normality of these proposed estimators are established. Moreover, a generalized F-test is developed to test whether the functional coefficients are of particular parametric forms with some underlying economic intuitions, and furthermore, the limiting distribution of the proposed generalized F-test statistic under the null hypothesis is established. Finally, we illustrate the finite sample performance of our approach with simulations and two real data examples in economics.
