Similar Articles
20 similar articles found.
1.
Jaeyong Lee, Statistics, 2013, 47(6): 515–526
Clustered survival data are often modelled with frailty models, which incorporate frailties to model the cluster-specific heterogeneity and the dependence between observations in the same cluster. For the analysis of frailty models, we propose Bayesian modelling with a beta process prior on the cumulative hazard function and describe the details of the posterior computation. We demonstrate the method with two data sets using three different frailty distributions: gamma, log-normal and log-logistic. We also empirically demonstrate the difficulty of checking the assumed frailty distribution with the posterior sample of the frailties.
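
The shared-frailty structure described above is easy to illustrate by simulation. Below is a minimal sketch (not the paper's beta-process posterior computation): clustered survival times are generated with a gamma frailty multiplying an exponential baseline hazard, and the cluster count, frailty variance and censoring rate are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clusters, cluster_size = 50, 4      # illustrative sizes
theta = 0.5                           # frailty variance (gamma frailty with mean 1)
baseline_rate = 0.1                   # exponential baseline hazard
censor_rate = 0.05                    # independent censoring

records = []
for c in range(n_clusters):
    # shared gamma frailty: mean 1, variance theta
    w = rng.gamma(shape=1.0 / theta, scale=theta)
    for _ in range(cluster_size):
        t = rng.exponential(1.0 / (w * baseline_rate))    # event time given the frailty
        cens = rng.exponential(1.0 / censor_rate)         # censoring time
        records.append((c, min(t, cens), t <= cens))

cluster, time, event = map(np.array, zip(*records))
print(f"{event.mean():.2f} of observations are events")
```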

2.
We study the problem of classification with multiple q-variate observations with and without time effect on each individual. We develop new classification rules for populations with certain structured and unstructured mean vectors and under certain covariance structures. The new classification rules are effective when the number of observations is not large enough to estimate the variance–covariance matrix. Computational schemes for maximum likelihood estimates of required population parameters are given. We apply our findings to two real data sets as well as to a simulated data set.

3.
Although devised in 1936 by Fisher, discriminant analysis is still rapidly evolving, as the complexity of contemporary data sets grows exponentially. Our classification rules explore these complexities by modeling various correlations in higher-order data. Moreover, our classification rules are suitable for data sets where the number of response variables is comparable to or larger than the number of observations. We assume that the higher-order observations have a separable variance-covariance matrix and two different Kronecker product structures on the mean vector. In this article, we develop quadratic classification rules among g different populations where each individual has κth-order (κ ≥ 2) measurements. We also provide the computational algorithms to compute the maximum likelihood estimates for the model parameters and eventually the sample classification rules.
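
The separable (Kronecker-product) covariance assumption above can be estimated with the standard "flip-flop" iteration for matrix-variate data. The sketch below implements that generic iteration on simulated matrix observations; it is not the authors' classification rule, and the dimensions, normalization and tolerance are illustrative assumptions.

```python
import numpy as np

def flip_flop(X, n_iter=50, tol=1e-8):
    """MLE of a separable covariance Sigma = V kron U for matrix observations.

    X has shape (n, p, q); returns row covariance U (p x p) and column
    covariance V (q x q), identified by fixing trace(U) = p.
    """
    n, p, q = X.shape
    R = X - X.mean(axis=0)                       # centred observations
    U = np.eye(p)
    for _ in range(n_iter):
        U_inv = np.linalg.inv(U)
        V = sum(r.T @ U_inv @ r for r in R) / (n * p)
        V_inv = np.linalg.inv(V)
        U_new = sum(r @ V_inv @ r.T for r in R) / (n * q)
        U_new *= p / np.trace(U_new)             # fix the scale for identifiability
        if np.max(np.abs(U_new - U)) < tol:
            U = U_new
            break
        U = U_new
    return U, V

# toy check: simulate matrix-normal data with a known Kronecker structure
rng = np.random.default_rng(1)
p, q, n = 4, 3, 200
A = rng.standard_normal((p, p)); U_true = A @ A.T + p * np.eye(p)
B = rng.standard_normal((q, q)); V_true = B @ B.T + q * np.eye(q)
L_u, L_v = np.linalg.cholesky(U_true), np.linalg.cholesky(V_true)
X = np.stack([L_u @ rng.standard_normal((p, q)) @ L_v.T for _ in range(n)])
U_hat, V_hat = flip_flop(X)
print(np.round(U_hat, 2))
```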

4.
Registration of temporal observations is a fundamental problem in functional data analysis. Various frameworks have been developed over the past two decades in which registrations are conducted based on optimal time warping between functions. Comparison of functions based solely on time warping, however, may have limited application, in particular when certain constraints are desired in the registration. In this paper, we study registration with a norm-preserving constraint. A closely related problem is signal estimation, where the goal is to estimate the ground-truth template given random observations with both compositional and additive noise. We propose to adopt the Fisher–Rao framework to compute the underlying template, and mathematically prove that such a framework leads to a consistent estimator. We then illustrate the constrained Fisher–Rao registration using simulations as well as two real data sets. The constrained method is found to be robust with respect to additive noise and to have superior alignment and classification performance compared with conventional, unconstrained registration methods.

5.
We study the suitability of different modelling methods for the joint prediction of mean and variance based on large data sets. We review approaches to modelling the conditional variance function that can handle a problem in which the conditional variance depends on about 10 explanatory variables and the training data set consists of 100,000 observations. We present a promising approach for neural network modelling of mean and dispersion. We compare the different approaches in predicting the mechanical properties of steel in two case data sets collected from the production line of a steel plate mill. In conclusion, we give some recommendations concerning the modelling of conditional variance in large data sets.
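
Joint mean-variance prediction of the kind described above is often approximated by a simple two-stage scheme: fit a model for the conditional mean, then model the dispersion through the log of the squared residuals. The sketch below does this with scikit-learn multilayer perceptrons on simulated heteroscedastic data; it is a stand-in for the paper's neural network approach, and the network sizes and data-generating process are arbitrary assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# simulated heteroscedastic data: the noise level depends on the inputs
n, d = 5000, 10
X = rng.uniform(-1, 1, size=(n, d))
mean = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2
sigma = 0.2 + 0.5 * np.abs(X[:, 2])
y = mean + sigma * rng.standard_normal(n)

# stage 1: conditional mean
mean_net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
mean_net.fit(X, y)
resid2 = (y - mean_net.predict(X)) ** 2

# stage 2: conditional (log) variance modelled from the squared residuals
disp_net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
disp_net.fit(X, np.log(resid2 + 1e-12))

var_hat = np.exp(disp_net.predict(X))
print("correlation of fitted and true sd:",
      np.round(np.corrcoef(np.sqrt(var_hat), sigma)[0, 1], 2))
```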

6.
Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive data sets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large data sets. Subsemble partitions the full data set into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be a beneficial tool for small- to moderate-sized data sets, and often has better prediction performance than the underlying algorithm fit just once on the full data set. We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full data set.
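
A minimal sketch of the subset-ensemble idea just described: partition the data into J subsets, fit the same base learner on each subset, obtain V-fold cross-validated predictions from every subset-specific fit, and learn combination weights by regressing the outcome on those cross-validated predictions. The base learner, J, V and the linear metalearner are arbitrary choices here, and the sketch omits refinements of the published Subsemble procedure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def subsemble_sketch(X, y, n_subsets=3, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    subset_id = rng.permutation(n) % n_subsets      # random partition into subsets
    fold_id = rng.permutation(n) % n_folds          # folds cut across all subsets

    # cross-validated predictions: column j holds subset j's out-of-fold predictions
    Z = np.zeros((n, n_subsets))
    for v in range(n_folds):
        test = fold_id == v
        for j in range(n_subsets):
            train = (subset_id == j) & ~test
            fit = DecisionTreeRegressor(max_depth=4).fit(X[train], y[train])
            Z[test, j] = fit.predict(X[test])

    meta = LinearRegression().fit(Z, y)             # learn the combination weights
    final_fits = [DecisionTreeRegressor(max_depth=4).fit(X[subset_id == j], y[subset_id == j])
                  for j in range(n_subsets)]

    def predict(X_new):
        Z_new = np.column_stack([f.predict(X_new) for f in final_fits])
        return meta.predict(Z_new)

    return predict

# toy usage on simulated data
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(1200, 3))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(len(X))
predict = subsemble_sketch(X, y)
print(predict(X[:5]))
```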

7.
Detection of outliers or influential observations is an important task in statistical modeling, especially for correlated time series data. In this paper we propose a new procedure to detect patches of influential observations in the generalized autoregressive conditional heteroskedasticity (GARCH) model. First, we compare the performance of the innovative, additive and data perturbation schemes in local influence analysis. We find that the innovative perturbation scheme gives better results than the other two schemes, although it may suffer from masking effects. We then use the stepwise local influence method under the innovative perturbation scheme to detect patches of influential observations and uncover the masking effects. Simulation studies show that the new technique can successfully detect a patch of influential observations or outliers under the innovative perturbation scheme. The analysis based on simulation studies and two real data sets shows that the stepwise local influence method under the innovative perturbation scheme is efficient for detecting multiple influential observations and dealing with masking effects in the GARCH model.

8.
Count data are routinely assumed to have a Poisson distribution, especially when there are no straightforward diagnostic procedures for checking this assumption. We reanalyse two data sets from crossover trials of treatments for angina pectoris, in which the outcomes are counts of anginal attacks. Standard analyses focus on treatment effects, averaged over subjects; we are also interested in the dispersion of these effects (treatment heterogeneity). We set up a log-Poisson model with random coefficients to estimate the distribution of the treatment effects and show that the analysis is very sensitive to the distributional assumption; the population variance of the treatment effects is confounded with the (variance) function that relates the conditional variance of the outcomes, given the subject's rate of attacks, to the conditional mean. Diagnostic model checks based on resampling from the fitted distribution indicate that the default choice of the Poisson distribution for the analysed data sets is poorly supported. We propose to augment the data sets with observations of the counts, possibly made outside the clinical setting, so that the conditional distribution of the counts can be established.
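
The resampling-based model check mentioned above can be illustrated with a parametric bootstrap of a dispersion statistic: fit a Poisson model, resample replicate data sets from the fit, and compare the observed Pearson dispersion with its resampling distribution. The sketch below does this for a single i.i.d. Poisson fit to simulated overdispersed counts; it is not the paper's crossover-trial analysis, and all numerical settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulated counts with subject-level heterogeneity, hence overdispersed for a Poisson fit
n = 200
rates = rng.gamma(shape=2.0, scale=1.5, size=n)      # heterogeneous attack rates
counts = rng.poisson(rates)

def dispersion(x):
    """Pearson dispersion statistic under an i.i.d. Poisson fit (MLE = sample mean)."""
    m = x.mean()
    return np.sum((x - m) ** 2 / m)

obs = dispersion(counts)

# parametric bootstrap: resample replicate data sets from the fitted Poisson distribution
B = 2000
boot = np.array([dispersion(rng.poisson(counts.mean(), size=n)) for _ in range(B)])
p_value = np.mean(boot >= obs)
print(f"observed dispersion {obs:.1f}, bootstrap p-value {p_value:.3f}")
```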

9.
Income- or expenditure-related data sets are often nonlinear, heteroscedastic, skewed even after transformation, and contain numerous outliers. We propose a class of robust nonlinear models that treat outlying observations effectively without removing them. For this purpose, case-specific parameters and a related penalty are employed to detect and modify the outliers systematically. We show how existing nonlinear models such as smoothing splines and generalized additive models can be robustified by the case-specific parameters. Next, we extend the proposed methods to heterogeneous models by incorporating unequal weights. The details of estimating the weights are provided. Two real data sets and simulated data sets show the potential of the proposed methods when the nature of the data is nonlinear with outlying observations.
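
The case-specific-parameter idea can be sketched for an ordinary linear model: add one parameter per observation, penalize those parameters with an L1 penalty, and alternate between refitting the regression on the adjusted responses and soft-thresholding the residuals. This is a simplified linear analogue of the robustified nonlinear models described above; the threshold value and the simulated outliers are arbitrary assumptions.

```python
import numpy as np

def case_specific_ols(X, y, lam=2.5, n_iter=100):
    """OLS with L1-penalized case-specific parameters gamma (one per observation).

    Alternates: beta <- OLS fit to (X, y - gamma); gamma <- soft-threshold of residuals.
    Observations with nonzero gamma are flagged as potential outliers.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X1, y - gamma, rcond=None)
        r = y - X1 @ beta
        gamma = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)   # soft-thresholding
    return beta, gamma

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)
y[:5] += 15.0                                   # plant a few gross outliers
beta, gamma = case_specific_ols(x.reshape(-1, 1), y)
print("coefficients:", np.round(beta, 2))
print("flagged observations:", np.flatnonzero(gamma != 0))
```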

10.
This paper introduces a nonparametric approach for testing the equality of two or more survival distributions based on right-censored failure times with missing population marks for the censored observations. The standard log-rank test is not applicable here because the population membership information is not available for the right-censored individuals. We propose to use imputed population marks for the censored observations, leading to fractional at-risk sets that can be used in a two-sample censored-data log-rank test. We demonstrate with a simple example that there can be a gain in power from imputing population marks (the proposed method) for the right-censored individuals compared to simply removing them (which would also maintain the correct size). The performance of the imputed log-rank tests obtained in this way is studied through simulation. We also obtain an asymptotic linear representation of our test statistic. Our testing methodology is illustrated using a real data set.
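
A hedged sketch of the fractional at-risk idea for two populations: each right-censored individual contributes a fractional weight (its imputed probability of belonging to population 1) to the population-1 at-risk and death counts, and the usual log-rank observed-minus-expected and hypergeometric variance terms are accumulated with those weights. The weight construction and data below are illustrative placeholders; the paper's actual imputation scheme and asymptotic results are not reproduced.

```python
import numpy as np
from scipy.stats import chi2

def fractional_logrank(time, event, w1):
    """Two-sample log-rank test with fractional group-1 membership weights w1 in [0, 1]."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = w1[at_risk].sum()                 # fractional at-risk count for group 1
        dead = at_risk & (time == t) & (event == 1)
        d, d1 = dead.sum(), w1[dead].sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    return stat, chi2.sf(stat, df=1)

# toy data: events carry a known group mark, censored observations get an imputed weight
rng = np.random.default_rng(0)
n = 300
group = rng.integers(0, 2, n)                           # true (partly unobserved) marks
true_time = rng.exponential(np.where(group == 1, 1.5, 1.0))   # group 1 survives longer
cens = rng.exponential(2.0, n)
event = (true_time <= cens).astype(int)
obs_time = np.minimum(true_time, cens)
w1 = np.where(event == 1, group, 0.5)                   # censored: imputed P(group 1) = 0.5
print(fractional_logrank(obs_time, event, w1))
```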

11.
12.
The aim of this paper is to develop a Bayesian local influence method (Zhu et al. 2009, submitted) for assessing minor perturbations to the prior, the sampling distribution, and individual observations in survival analysis. We introduce a perturbation model to characterize simultaneous (or individual) perturbations to the data, the prior distribution, and the sampling distribution. We construct a Bayesian perturbation manifold for the perturbation model and calculate its associated geometric quantities, including the metric tensor, to characterize the intrinsic structure of the perturbation model (or perturbation scheme). We develop local influence measures based on several objective functions to quantify the degree of various perturbations to statistical models. We carry out several simulation studies and analyze two real data sets to illustrate our Bayesian local influence method in detecting influential observations and in characterizing the sensitivity to the prior distribution and the hazard function.

13.
Methods of detecting influential observations for the normal model for censored data are proposed. These methods include one-step deletion methods, deletion of observations and the empirical influence function. Emphasis is placed on assessing the impact that a single observation has on the estimation of coefficients of the model. Functions of the coefficients such as the median lifetime are also considered. Results are compared when applied to two sets of data.

14.
Life insurance companies want to predict the average claimed sums they have to pay in the event of death for specific groups of customers in order to derive group-specific premiums. This requires estimation of the variability of claims across groups. We derive a corresponding mixed linear model for claim data from many groups of customers that incorporates group-specific age distributions, the Gompertz–Makeham mortality function and an unknown group-specific random hazard factor. It takes the form of a generalized replicated model with two variance components in which the between-blocks variance component depends on the common mean of all observations. Two methods of parameter estimation are derived along the lines of C. R. Rao's MINQUE and generalized least squares estimation. Simulations show both methods to work well for large sets of data.
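
For reference, the Gompertz-Makeham mortality function used in this model specifies the hazard at age x as a constant plus an exponentially growing term, mu(x) = a + b*c^x. A small sketch with illustrative (not fitted) parameter values:

```python
import numpy as np

def gompertz_makeham_hazard(x, a, b, c):
    """Hazard mu(x) = a + b * c**x (a: age-independent term, b*c**x: Gompertz term)."""
    return a + b * c ** x

def gompertz_makeham_survival(x, a, b, c):
    """Survival S(x) = exp(-a*x - b*(c**x - 1)/log(c)) implied by the hazard above."""
    return np.exp(-a * x - b * (c ** x - 1) / np.log(c))

# illustrative (not fitted) parameter values
a, b, c = 5e-4, 4e-5, 1.10
for age in np.arange(30, 91, 10):
    print(age,
          round(gompertz_makeham_hazard(age, a, b, c), 5),
          round(float(gompertz_makeham_survival(age, a, b, c)), 3))
```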

15.
This paper extends the scedasticity comparison among several groups of observations, usually restricted to the homoscedastic and heteroscedastic cases, in order to deal with data sets lying in an intermediate situation. As is well known, homoscedasticity corresponds to equality in orientation, shape and size of the group scatters. Here our attention is focused on two weaker requirements: scatters with the same orientation but different shape and size, or scatters with the same shape and size but different orientation. We introduce a multiple testing procedure that takes each of the above conditions into account. This approach discloses richer information on the underlying structure of the data than the classical method based only on homo/heteroscedasticity. At the same time, it allows a more parsimonious parametrization whenever the patterned model is appropriate for describing the real data. The new inferential methodology is then applied to some well-known data sets, chosen from the multivariate literature, to show the real gain in using this more informative approach. Finally, a wide simulation study illustrates and compares the performance of the proposal using data sets with gradual departures from homoscedasticity.

16.
We present a technique for extending generalized linear models to the situation where some of the predictor variables are observations from a curve or function. The technique is particularly useful when only fragments of each curve have been observed. We demonstrate, on both simulated and real data sets, how this approach can be used to perform linear, logistic and censored regression with functional predictors. In addition, we show how functional principal components can be used to gain insight into the relationship between the response and functional predictors. Finally, we extend the methodology to apply generalized linear models and principal components to standard missing data problems.
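
In practice, a functional-predictor GLM of this kind is often approximated by projecting each fully observed curve onto a few principal components and feeding the scores into an ordinary GLM. The sketch below does this with scikit-learn PCA and logistic regression on simulated curves observed on a common grid; it ignores the curve-fragment machinery that is the paper's main contribution.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# simulated functional predictors: curves on a common grid of 50 time points
n, grid = 400, np.linspace(0, 1, 50)
scores_true = rng.standard_normal((n, 2))
curves = (scores_true[:, [0]] * np.sin(2 * np.pi * grid)
          + scores_true[:, [1]] * np.cos(2 * np.pi * grid)
          + 0.2 * rng.standard_normal((n, 50)))
# binary response driven by the first functional feature
y = (scores_true[:, 0] + 0.5 * rng.standard_normal(n) > 0).astype(int)

# principal component scores of the curves, then a logistic GLM on the scores
pca = PCA(n_components=4)
fpc_scores = pca.fit_transform(curves)
clf = LogisticRegression()
print("CV accuracy:", cross_val_score(clf, fpc_scores, y, cv=5).mean().round(2))
```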

17.
Several methods have been suggested in the literature to detect influential observations from data fitted by the usual linear model y = Xβ + ε, ε ~ N(0, σ²I). Recently, Chatterjee & Hadi (1986) reviewed most of these available methods and described the inter-relationships between them. In this article, we extend some of these methods to the case of multivariate regression data. We consider several data sets to illustrate the methods.
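
For the univariate linear model, the single-observation deletion diagnostics referred to above reduce to quantities computable from the hat matrix: leverages, studentized residuals and Cook's distance. A minimal sketch of those building blocks (the multivariate-regression extensions discussed in the paper are not shown):

```python
import numpy as np

def deletion_diagnostics(X, y):
    """Leverages, internally studentized residuals and Cook's distance for OLS."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
    h = np.diag(H)                              # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)                # residual variance estimate
    stud = resid / np.sqrt(s2 * (1 - h))        # internally studentized residuals
    cooks = stud ** 2 * h / ((1 - h) * p)       # Cook's distance
    return h, stud, cooks

rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = 2.0 + 0.7 * X[:, 1] + rng.standard_normal(n)
y[0] += 8.0                                     # plant one influential observation
h, stud, cooks = deletion_diagnostics(X, y)
print("largest Cook's distance at index", int(np.argmax(cooks)))
```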

18.
In the past decades, the number of variables explaining observations in practical applications has increased gradually. This has led to heavy computational tasks, despite the widespread use of provisional variable selection methods in data processing. Therefore, more methodological techniques have appeared to reduce the number of explanatory variables without losing much of the information. Within these techniques, two distinct approaches are apparent: 'shrinkage regression' and 'sufficient dimension reduction'. Surprisingly, there has not been any communication or comparison between these two methodological categories, and it is not clear when each of the two approaches is appropriate. In this paper, we fill some of this gap by first reviewing each category briefly, paying special attention to the most commonly used methods in each category. We then compare commonly used methods from both categories based on their accuracy, computation time, and ability to select effective variables. A simulation study of the performance of the methods in each category is presented as well. The selected methods are concurrently tested on two sets of real data, which allows us to recommend conditions under which one approach is more appropriate for application to high-dimensional data.
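
To make the contrast concrete, the sketch below pairs one representative of each category: the lasso (shrinkage regression, via scikit-learn) and a bare-bones sliced inverse regression (sufficient dimension reduction). The simulated single-index data, slice count and cross-validation settings are arbitrary illustrative choices, not the paper's study design.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Bare-bones sliced inverse regression: leading e.d.r. direction estimates."""
    n, p = X.shape
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    # whiten the predictors
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ cov_inv_sqrt
    # slice on the response and average the whitened predictors within slices
    order = np.argsort(y)
    M = np.zeros((p, p))
    for chunk in np.array_split(order, n_slices):
        m = Z[chunk].mean(axis=0)
        M += (len(chunk) / n) * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)
    dirs = cov_inv_sqrt @ vecs[:, -n_dirs:]          # back to the original scale
    return dirs / np.linalg.norm(dirs, axis=0)

# single-index model: only x1 and x2 matter, through one linear combination
rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.standard_normal((n, p))
index = X[:, 0] - X[:, 1]
y = np.tanh(index) + 0.1 * rng.standard_normal(n)

print("SIR direction:", np.round(sir_directions(X, y).ravel(), 2))
print("lasso coefficients:", np.round(LassoCV(cv=5).fit(X, y).coef_, 2))
```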

19.
We consider the problem of optimal design of experiments for random effects models, especially population models, where a small number of correlated observations can be taken on each individual, while the observations corresponding to different individuals are assumed to be uncorrelated. We focus on c-optimal design problems and show that the classical equivalence theorem and the famous geometric characterization of Elfving (1952) from the case of uncorrelated data can be adapted to the problem of selecting optimal sets of observations for the n individual patients. The theory is demonstrated by finding optimal designs for a linear model with correlated observations and a nonlinear random effects population model, which is commonly used in pharmacokinetics.

20.
The author proposes a general method for constructing nonparametric tests of hypotheses for umbrella alternatives. Such alternatives are relevant when the treatment effect changes direction after reaching a peak. The author's class of tests is based on the ranks of the observations. His general approach consists of defining two sets of rankings: the first is induced by the alternative and the other by the data itself. His test statistic measures the distance between the two sets. The author determines the asymptotic distribution for some special cases of distances under both the null and the alternative hypothesis when the location of the peak is known or unknown. He shows the good power of his tests through a limited simulation study.
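
As a concrete point of comparison, the classical Mack-Wolfe statistic for a known peak sums pairwise Mann-Whitney counts that should be large under an umbrella-shaped effect: groups before the peak tend to increase, groups after it to decrease. The sketch below computes that statistic by direct counting and assesses it with a permutation reference distribution; it illustrates umbrella alternatives generally and is not the author's rank-distance test.

```python
import numpy as np

def mack_wolfe(groups, peak):
    """Mack-Wolfe umbrella statistic for a known peak (1-based index `peak`).

    `groups` is a list of 1-D arrays; U(i, j) counts pairs with a group-i value
    below a group-j value, so the statistic is large under an umbrella pattern.
    """
    def U(i, j):
        return np.sum(groups[i][:, None] < groups[j][None, :])
    k, p = len(groups), peak - 1
    up = sum(U(i, j) for i in range(p + 1) for j in range(i + 1, p + 1))
    down = sum(U(j, i) for i in range(p, k) for j in range(i + 1, k))
    return up + down

rng = np.random.default_rng(0)
# five groups whose means rise to a peak at group 3 and then fall
means = [0.0, 0.5, 1.0, 0.5, 0.0]
groups = [m + rng.standard_normal(20) for m in means]

obs = mack_wolfe(groups, peak=3)
pooled = np.concatenate(groups)
sizes = [len(g) for g in groups]
perm = []
for _ in range(2000):
    rng.shuffle(pooled)
    perm.append(mack_wolfe(np.split(pooled, np.cumsum(sizes)[:-1]), peak=3))
print("permutation p-value:", np.mean(np.array(perm) >= obs))
```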
