Similar Literature
20 similar documents retrieved (search time: 953 ms)
1.
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Unlike these approaches, the proposed method can easily be applied to data sets where the number of individuals is smaller than the number of variables and where the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the confidence intervals built for these quantities are often narrower whilst maintaining valid coverage.
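A minimal sketch of the iterative-PCA idea behind such imputation, assuming NumPy only; the function name and the `noise` option are illustrative, and a proper Bayesian treatment would also draw the PCA parameters from their posterior rather than just perturbing the reconstruction:

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=50, noise=False, seed=None):
    """Fill missing entries of X by iterative low-rank (PCA) reconstruction.

    A crude stand-in for Bayesian multiple imputation by PCA: a proper
    treatment would also draw the PCA parameters from their posterior.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmean(X, axis=0), np.nonzero(miss)[1])  # mean start
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        fitted = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        if noise:  # residual-scale noise so repeated calls give distinct imputations
            fitted = fitted + rng.normal(0.0, np.std(X - fitted), X.shape)
        X[miss] = fitted[miss]
    return X
```

Running this several times with `noise=True` and different seeds gives a set of completed data sets whose analyses can then be pooled with Rubin's rules.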

2.
Count data are routinely assumed to have a Poisson distribution, especially when there are no straightforward diagnostic procedures for checking this assumption. We reanalyse two data sets from crossover trials of treatments for angina pectoris, in which the outcomes are counts of anginal attacks. Standard analyses focus on treatment effects averaged over subjects; we are also interested in the dispersion of these effects (treatment heterogeneity). We set up a log-Poisson model with random coefficients to estimate the distribution of the treatment effects and show that the analysis is very sensitive to the distributional assumption; the population variance of the treatment effects is confounded with the (variance) function that relates the conditional variance of the outcomes, given the subject's rate of attacks, to the conditional mean. Diagnostic model checks based on resampling from the fitted distribution indicate that the default choice of the Poisson distribution for the analysed data sets is poorly supported. We propose to augment the data sets with observations of the counts, possibly made outside the clinical setting, so that the conditional distribution of the counts can be established.
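A hedged sketch of the resampling-style diagnostic alluded to above, reduced to the simplest case of a single i.i.d. Poisson sample and a dispersion statistic; the authors' checks operate on the full fitted random-coefficients model, not this toy version:

```python
import numpy as np

def poisson_dispersion_check(counts, n_sim=2000, seed=None):
    """Parametric-bootstrap check of the Poisson assumption: compare the
    observed dispersion statistic sum((y - ybar)^2) / ybar with its
    distribution under resampling from the fitted Poisson."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    lam = counts.mean()
    observed = np.sum((counts - lam) ** 2) / lam
    sims = rng.poisson(lam, size=(n_sim, counts.size))
    sim_stats = np.array([np.sum((s - s.mean()) ** 2) / max(s.mean(), 1e-12)
                          for s in sims])
    return observed, np.mean(sim_stats >= observed)  # statistic, bootstrap p-value
```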

3.
In this paper we analyse the performance of a novel approach to modelling non-linear conditionally heteroscedastic time series characterised by asymmetries in both the conditional mean and variance. The approach combines a TAR model for the conditional mean with a Constrained Changing Parameters Volatility (CPV-C) model for the conditional variance. Empirical results are given for the daily returns of the S&P 500, NASDAQ Composite and FTSE 100 stock market indexes.
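A minimal least-squares fit of the TAR part (two regimes, lag-one threshold variable); the CPV-C variance model is not reproduced, and the function is an illustrative sketch rather than the authors' estimator:

```python
import numpy as np

def fit_tar1(y, grid=None):
    """Two-regime TAR(1) for the conditional mean, estimated by grid search
    over the threshold r with OLS within each regime; the threshold
    variable is y_{t-1}."""
    y = np.asarray(y, dtype=float)
    x_lag, resp = y[:-1], y[1:]
    if grid is None:  # search thresholds between the 15% and 85% quantiles
        grid = np.quantile(x_lag, np.linspace(0.15, 0.85, 30))
    best = (np.inf, None, None)
    for r in grid:
        low = x_lag <= r
        if low.sum() < 3 or (~low).sum() < 3:
            continue  # skip thresholds leaving a regime nearly empty
        rss, coefs = 0.0, []
        for mask in (low, ~low):
            Z = np.column_stack([np.ones(mask.sum()), x_lag[mask]])
            beta, *_ = np.linalg.lstsq(Z, resp[mask], rcond=None)
            coefs.append(beta)
            rss += np.sum((resp[mask] - Z @ beta) ** 2)
        if rss < best[0]:
            best = (rss, r, coefs)
    return best  # (RSS, threshold, [low-regime coefs, high-regime coefs])
```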

4.
In the recent past, autoregressive conditional duration (ACD) models have gained popularity for modelling the durations between successive events. The aim of this paper is to propose a simple, distribution-free resampling procedure for developing forecast intervals for linear ACD models. We use the conditional least squares method to estimate the parameters of the ACD model, instead of conditional maximum likelihood or quasi-maximum likelihood estimation, and show that the estimators are consistent in large samples. The properties of the proposed procedure are illustrated by a simulation study and an application to two real data sets.
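A sketch of the two ingredients for an ACD(1,1) with conditional-mean recursion psi_t = w + a*x_{t-1} + b*psi_{t-1}: conditional least squares for the parameters and residual resampling for a one-step interval. The initialisation, starting values and optimiser choice are illustrative assumptions, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize

def acd_psi(params, x):
    """Conditional mean durations of an ACD(1,1)."""
    w, a, b = params
    psi = np.empty_like(x)
    psi[0] = x.mean()  # illustrative initialisation
    for t in range(1, len(x)):
        psi[t] = w + a * x[t - 1] + b * psi[t - 1]
    return psi

def fit_acd_cls(x):
    """Conditional least squares: minimise sum_t (x_t - psi_t)^2."""
    x = np.asarray(x, dtype=float)
    res = minimize(lambda p: np.sum((x - acd_psi(p, x)) ** 2),
                   x0=[0.1 * x.mean(), 0.1, 0.7], method="Nelder-Mead")
    return res.x

def acd_forecast_interval(x, params, level=0.9, n_boot=2000, seed=None):
    """Distribution-free one-step interval by resampling the standardised
    residuals e_t = x_t / psi_t."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    psi = acd_psi(params, x)
    w, a, b = params
    psi_next = w + a * x[-1] + b * psi[-1]
    draws = psi_next * rng.choice(x / psi, size=n_boot, replace=True)
    return tuple(np.quantile(draws, [(1 - level) / 2, (1 + level) / 2]))
```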

5.
A semi-parametric additive model for variance heterogeneity
This paper presents a flexible model for variance heterogeneity in a normal error model. Specifically, both the mean and variance are modelled using semi-parametric additive models. We call this model a Mean And Dispersion Additive Model (MADAM). A successive relaxation algorithm for fitting the model is described and justified as maximizing a penalized likelihood function with penalties for lack of smoothness in the additive non-parametric functions in both mean and variance models. The algorithm is implemented in GLIM4, allowing flexible and interactive modelling of variance heterogeneity. Two data sets are used for demonstration.
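A compressed sketch of the successive-relaxation idea, assuming statsmodels and substituting plain linear predictors for the paper's smooth additive terms, so this is the alternating scheme rather than MADAM itself:

```python
import numpy as np
import statsmodels.api as sm

def fit_mean_dispersion(X_mean, X_var, y, n_iter=10):
    """Alternate (1) a weighted Gaussian fit for the mean and (2) a gamma
    GLM with log link on the squared residuals for the variance."""
    Xm, Xv = sm.add_constant(X_mean), sm.add_constant(X_var)
    w = np.ones(len(y))
    for _ in range(n_iter):
        mean_fit = sm.WLS(y, Xm, weights=w).fit()
        r2 = np.maximum((y - mean_fit.fittedvalues) ** 2, 1e-10)  # gamma needs > 0
        var_fit = sm.GLM(r2, Xv,
                         family=sm.families.Gamma(link=sm.families.links.Log())).fit()
        w = 1.0 / var_fit.fittedvalues  # updated inverse-variance weights
    return mean_fit, var_fit
```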

6.
The statistical literature on the analysis of discrete variate time series has concentrated mainly on parametric models, that is, the conditional probability mass function is assumed to belong to a parametric family. Generally, these parametric models impose strong assumptions on the relationship between the conditional mean and variance. To relax these implausible assumptions, this paper instead considers a more realistic semiparametric model, called the random rounded integer-valued autoregressive conditional heteroskedastic (RRINARCH) model, in which there are essentially no assumptions on the relationship between the conditional mean and variance. The new model has several advantages: (a) it provides a coherent semiparametric framework for discrete variate time series, in which the conditional mean and variance can be modeled separately; (b) it allows negative values both for the series and its autocorrelation function; (c) its autocorrelation structure is the same as that of a standard autoregressive (AR) process; (d) standard software for its estimation is directly applicable. For the new model, conditions for stationarity, ergodicity and the existence of moments are established, and the consistency and asymptotic normality of the conditional least squares estimator are proved. Simulation experiments are carried out to assess the performance of the model. The analyses of real data sets illustrate the flexibility and usefulness of the RRINARCH model for obtaining more realistic forecast means and variances.

7.
Investigators and epidemiologists often use statistics based on the parameters of a multinomial distribution. Two main approaches have been developed to assess the inferences of these statistics. The first uses asymptotic formulae which are valid for large sample sizes. The second computes the exact distribution, which performs quite well for small samples. Both have limitations for sample sizes N that are neither large enough to satisfy the assumption of asymptotic normality nor small enough to allow us to generate the exact distribution. We analytically computed the 1/N corrections of the asymptotic distribution for any statistic based on a multinomial law. We applied these results to the kappa statistic in 2×2 and 3×3 tables. We also compared the coverage probability obtained with the asymptotic and the corrected distributions under various hypothetical configurations of sample size and theoretical proportions. With this method, the estimates of the mean and the variance were greatly improved, as were the 2.5 and 97.5 percentiles of the distribution, allowing us to go down to sample sizes around 20 for data sets that are not too asymmetrical. The order of the difference between the exact and the corrected values was 1/N² for the mean and 1/N³ for the variance.
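For orientation, the statistic in question and the shape of the corrected expansion, written in LaTeX; the coefficients c₁, v₁, v₂ below are placeholders standing in for the paper's analytic correction terms, which are not derived here:

```latex
% Cohen's kappa for an I x I table with cell proportions p_{ij}:
\[
  \hat\kappa = \frac{p_o - p_e}{1 - p_e},
  \qquad p_o = \sum_{i} p_{ii},
  \qquad p_e = \sum_{i} p_{i\cdot}\, p_{\cdot i}.
\]
% Form of the 1/N-corrected moments (c_1, v_1, v_2 are placeholders):
\[
  \mathbb{E}[\hat\kappa] = \kappa + \frac{c_1}{N} + O\!\left(N^{-2}\right),
  \qquad
  \operatorname{Var}[\hat\kappa] = \frac{v_1}{N} + \frac{v_2}{N^2} + O\!\left(N^{-3}\right).
\]
```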

8.
A parametric modelling for interval data is proposed, assuming a multivariate Normal or Skew-Normal distribution for the midpoints and log-ranges of the interval variables. The intrinsic nature of the interval variables leads to special structures of the variance–covariance matrix, which is represented by five different possible configurations. Maximum likelihood estimation for both models under all considered configurations is studied. The proposed modelling is then considered in the context of analysis of variance and multivariate analysis of variance testing. To assess the behaviour of the proposed methodology, a simulation study is performed. The results show that, for medium or large sample sizes, the tests have good power and their true significance level approaches the nominal level when the constraints assumed for the model are respected; for small samples, however, significance levels close to the nominal level cannot be guaranteed. Applications to Chinese meteorological data in three different regions and to credit card usage variables for different card designations illustrate the proposed methodology.
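A small sketch of the basic data transformation and the unrestricted Gaussian MLE, assuming NumPy; the paper's five constrained covariance configurations and the Skew-Normal variant are not reproduced:

```python
import numpy as np

def interval_gaussian_mle(lower, upper):
    """Map n x p interval data to (midpoint, log-range) coordinates and
    return the unrestricted Gaussian MLE of the mean and covariance."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    Z = np.column_stack([(lower + upper) / 2.0, np.log(upper - lower)])
    return Z.mean(axis=0), np.cov(Z, rowvar=False, bias=True)  # bias=True: 1/n MLE
```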

9.
We study the problem of classification with multiple q-variate observations, with and without a time effect, on each individual. We develop new classification rules for populations with certain structured and unstructured mean vectors and under certain covariance structures. The new classification rules are effective when the number of observations is not large enough to estimate the variance–covariance matrix. Computational schemes for maximum likelihood estimates of the required population parameters are given. We apply our findings to two real data sets as well as to a simulated data set.

10.
Extended Poisson process modelling is generalised to allow for covariate-dependent dispersion as well as a covariate-dependent mean response. This is done by a re-parameterisation that uses approximate expressions for the mean and variance. Such modelling allows under- and over-dispersion, or a combination of both, in the same data set to be accommodated within the same modelling framework. All the necessary calculations can be done numerically, enabling maximum likelihood estimation of all model parameters to be carried out. The modelling is applied to re-analyse two published data sets, where there is evidence of covariate-dependent dispersion, with the modelling leading to more informative analyses of these data and more appropriate measures of the precision of any estimates.

11.
With the influx of complex and detailed tracking data gathered from electronic tracking devices, the analysis of animal movement data has recently emerged as a cottage industry among biostatisticians. New approaches of ever greater complexity continue to be added to the literature. In this paper, we review what we believe to be some of the most popular and most useful classes of statistical models used to analyse individual animal movement data. Specifically, we consider discrete-time hidden Markov models, more general state-space models and diffusion processes. We argue that these models should be core components in the toolbox for quantitative researchers working on stochastic modelling of individual animal movement. The paper concludes by offering some general observations on the direction of statistical analysis of animal movement. There is a trend in movement ecology towards what are arguably overly complex modelling approaches which are inaccessible to ecologists, unwieldy with large data sets or not based on mainstream statistical practice. Additionally, some analysis methods developed within the ecological community ignore fundamental properties of movement data, potentially leading to misleading conclusions about animal movement. Corresponding approaches, e.g. those based on Lévy walk-type models, continue to be popular despite having been largely discredited. We contend that there is a need for an appropriate balance between the extremes of being overly complex or overly simplistic, whereby the discipline relies on models of intermediate complexity that are usable by general ecologists, grounded in well-developed statistical practice and efficient to fit to large data sets.
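As one concrete instance of the discrete-time hidden Markov models mentioned above, a minimal scaled forward-algorithm likelihood for a Gaussian HMM on (log) step lengths. Real movement analyses typically use gamma step-length densities plus turning-angle distributions, so this is a teaching sketch rather than a field-ready tool:

```python
import numpy as np
from scipy import stats

def hmm_loglik(obs, means, sds, trans, init):
    """Log-likelihood of a Gaussian hidden Markov model via the scaled
    forward algorithm. obs: 1-D observations (e.g. log step lengths);
    trans: state transition matrix; init: initial state distribution."""
    obs = np.asarray(obs, dtype=float)
    trans, init = np.asarray(trans, float), np.asarray(init, float)
    dens = np.column_stack([stats.norm.pdf(obs, m, s)
                            for m, s in zip(means, sds)])  # per-state densities
    alpha = init * dens[0]
    loglik = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()          # scaling keeps alpha from underflowing
        loglik += np.log(c)
        alpha = (alpha / c) @ trans * dens[t]
    return loglik + np.log(alpha.sum())
```

Maximising this over the state-dependent parameters and the transition matrix, numerically or by EM, yields the fitted movement states.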

12.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing a missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data, and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy-tailed errors or outliers in the response and achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, based on an ICQ-type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.
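A simplified cousin of the quantile-averaging idea, assuming statsmodels and complete data: fit separate quantile regressions over a grid of quantile levels and average the slopes with equal weights. The paper's estimator additionally uses propensity-score weighting for missing responses and data-driven quantile weights:

```python
import numpy as np
import statsmodels.api as sm

def quantile_average_slope(X, y, taus=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Equal-weight quantile-average slope estimate for a linear model:
    average the slope vectors from quantile regressions at each tau."""
    Xc = sm.add_constant(X)
    slopes = [np.asarray(sm.QuantReg(y, Xc).fit(q=t).params)[1:] for t in taus]
    return np.mean(slopes, axis=0)
```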

13.
Copulas are powerful explanatory tools for studying dependence patterns in multivariate data. While the primary use of copula models is in multivariate dependence modelling, they also offer predictive value for regression analysis. This article investigates the utility of copula models for model-based predictions from two angles. We assess whether, where, and by how much various copula models differ in their predictions of a conditional mean and conditional quantiles. From a model selection perspective, we then evaluate the predictive discrepancy between copula models using in-sample and out-of-sample predictions, both in bivariate and higher-dimensional settings. Our findings suggest that some copula models are more difficult to distinguish in terms of their overall predictive power than others, and depending on the quantity of interest, the differences in predictions can be detected only in some targeted regions. The situations where copula-based regression approaches would be advantageous over traditional ones are discussed using simulated and real data. The Canadian Journal of Statistics 47: 8–26; 2019 © 2018 Statistical Society of Canada
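To make the prediction angle concrete, here is the conditional-quantile curve under one specific family, the bivariate Gaussian copula; the margins are passed in as callables, and other copula families would give different conditional curves:

```python
import numpy as np
from scipy import stats

def gaussian_copula_cond_quantile(x, tau, rho, F_x, F_y_inv):
    """tau-th conditional quantile of Y given X = x under a bivariate
    Gaussian copula with correlation rho; F_x is the X margin CDF and
    F_y_inv the Y margin quantile function."""
    z = stats.norm.ppf(F_x(x))
    q = rho * z + np.sqrt(1.0 - rho ** 2) * stats.norm.ppf(tau)
    return F_y_inv(stats.norm.cdf(q))

# With standard normal margins the conditional median curve is rho * x:
# gaussian_copula_cond_quantile(1.0, 0.5, 0.6, stats.norm.cdf, stats.norm.ppf) -> 0.6
```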

14.
This paper considers a time series model with a piecewise linear conditional mean and a piecewise linear conditional variance, which is a natural extension of Tong's threshold autoregressive model. The model has potential applications in modelling asymmetric behaviour in volatility in the financial market. Conditions for stationarity and ergodicity are derived. Asymptotic properties of the maximum likelihood estimator and two model diagnostic checking statistics are also presented. An illustrative example based on the Hong Kong Hang Seng index is also reported.

15.
In ranked set sampling (RSS), the variance of the observations in each ranked set plays an important role in finding an optimal design for unbalanced RSS and in inferring the population mean. The empirical estimator (i.e., the sample variance in a given ranked set) is most commonly used for estimating this variance in the literature. However, the empirical estimator does not use the information in the entire data set across different ranked sets. Further, it is highly variable when the sample size is not large enough, as is typical in RSS applications. In this paper, we propose a plug-in estimator for the variance of each set, which is more efficient than the empirical one. The estimator uses a result in order statistics which characterizes the cumulative distribution function (CDF) of the rth order statistic as a function of the population CDF. We analytically prove the asymptotic normality of the proposed estimator. We further apply it to estimate the standard error of the RSS mean estimator. Both our simulation and empirical studies show that our estimators consistently outperform existing methods.
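The order-statistics result being invoked is the standard identity below, written in LaTeX for a set of size k; the plug-in estimator substitutes an estimate of F pooled across all ranked sets, so every set-specific distribution is estimated from the full data at once:

```latex
% CDF of the r-th order statistic of a sample of size k from CDF F:
\[
  F_{(r)}(x) \;=\; \sum_{j=r}^{k} \binom{k}{j}\, F(x)^{j}\,\bigl(1 - F(x)\bigr)^{k-j}.
\]
```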

16.
Millions of smart meters that are able to collect individual load curves, that is, electricity consumption time series of residential and business customers at fine-scale time grids, are now deployed by electricity companies all around the world. Transmitting and exploiting such a large quantity of information may be complex and costly, so it can be relevant to use survey sampling techniques to estimate the mean load curves of specific groups of customers. Data collection, like every mass process, may undergo technical problems at every point of the metering and collection chain, resulting in missing values. We consider imputation approaches (linear interpolation, kernel smoothing, nearest neighbours, principal analysis by conditional estimation) that take advantage of the specificities of the data, namely the strong relation between consumption at different instants of time. The performances of these techniques are compared on a real example of Irish electricity load curves under various scenarios of missing data. A general variance approximation for total estimators is also given, which encompasses nearest-neighbour, kernel-smoother and linear imputation methods. The Canadian Journal of Statistics 47: 65–89; 2019 © 2018 Statistical Society of Canada
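Two of the simpler imputation approaches in that list, sketched with pandas on a toy half-hourly curve; the timestamps and values are made up, and nearest-neighbour or PCA-type methods would additionally need donor curves from other meters:

```python
import pandas as pd

curve = pd.Series(
    [1.2, None, None, 1.8, 2.1, None, 2.4],
    index=pd.date_range("2019-01-01", periods=7, freq="30min"),
)
linear = curve.interpolate(method="time")  # linear interpolation across gaps
kernel_ish = linear.rolling(3, center=True, min_periods=1).mean()  # crude smoothing
```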

17.
Meta-analyses of sets of clinical trials often combine risk differences from several 2×2 tables according to a random-effects model. The DerSimonian-Laird random-effects procedure, widely used for estimating the population mean risk difference, weights the risk difference from each primary study in inverse proportion to an estimate of its variance (the sum of the between-study variance and the conditional within-study variance). Because those weights are not independent of the risk differences, however, the procedure sometimes exhibits bias and unnatural behavior. The present paper proposes a modified weighting scheme that uses the unconditional within-study variance to avoid this source of bias. The modified procedure has variance closer to that available from weighting by ideal weights when such weights are known. We studied the modified procedure in extensive simulation experiments using situations whose parameters resemble those of actual studies in medical research. For comparison we also included two unbiased procedures, the unweighted mean and a sample-size-weighted mean; their relative variability depends on the extent of heterogeneity among the primary studies. An example illustrates the application of the procedures to actual data and the differences among the results. This research was supported by Grant HS 05936 from the Agency for Health Care Policy and Research to Harvard University.
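For reference, the baseline DerSimonian-Laird procedure that the paper modifies, in a compact NumPy sketch; the modified scheme would swap the conditional within-study variances passed in below for unconditional ones:

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Classic DerSimonian-Laird random-effects pooling: method-of-moments
    between-study variance tau^2, then inverse-variance weighting."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w_fixed = 1.0 / v
    y_fixed = np.sum(w_fixed * y) / np.sum(w_fixed)
    Q = np.sum(w_fixed * (y - y_fixed) ** 2)        # heterogeneity statistic
    k = len(y)
    denom = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
    tau2 = max(0.0, (Q - (k - 1)) / denom)          # truncated at zero
    w = 1.0 / (v + tau2)
    mu_hat = np.sum(w * y) / np.sum(w)
    return mu_hat, np.sqrt(1.0 / np.sum(w)), tau2   # estimate, SE, tau^2
```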

18.
Detecting dependence between marks and locations of marked point processes
We introduce two characteristics for stationary and isotropic marked point processes, E(h) and V(h), and describe their use in investigating mark–point interactions. These quantities are functions of the interpoint distance h and denote the conditional expectation and the conditional variance of a mark respectively, given that there is a further point of the process a distance h away. We present tests based on E and V for the hypothesis that the values of the marks can be modelled by a random field which is independent of the unmarked point process. We apply the methods to two data sets in forestry.
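A naive kernel (Nadaraya-Watson) estimator of E(h) and V(h) over point pairs, assuming NumPy/SciPy; it omits the edge corrections and bandwidth choices a careful spatial analysis would need:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mark_conditional_moments(coords, marks, h_grid, bandwidth):
    """Kernel estimates of E(h) and V(h): the conditional mean and variance
    of a mark given a further point of the process at distance h."""
    marks = np.asarray(marks, dtype=float)
    d = squareform(pdist(coords))        # pairwise distances
    np.fill_diagonal(d, np.inf)          # exclude self-pairs
    i, j = np.nonzero(np.isfinite(d))    # all ordered point pairs
    dist, m_i = d[i, j], marks[i]
    E, V = [], []
    for h in h_grid:
        w = np.exp(-0.5 * ((dist - h) / bandwidth) ** 2)  # Gaussian kernel in h
        mean = np.sum(w * m_i) / np.sum(w)
        E.append(mean)
        V.append(np.sum(w * (m_i - mean) ** 2) / np.sum(w))
    return np.array(E), np.array(V)
```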

19.
This paper proposes a new approach, based on recent developments in wavelet theory, to modelling the dynamics of exchange rates. First, we consider the maximum overlap discrete wavelet transform (MODWT) to decompose the exchange rate levels into several scales. Second, we focus on modelling the conditional mean of the detrended series as well as their volatilities. In particular, we consider the generalized fractional, one-factor, Gegenbauer process (GARMA) to model the conditional mean and the fractionally integrated generalized autoregressive conditional heteroskedasticity (FIGARCH) process to model the conditional variance. Moreover, we estimate the GARMA-FIGARCH model using the wavelet-based maximum likelihood estimator (Whitcher, Technometrics 46:225–238, 2004). To illustrate the usefulness of our methodology, we carry out an empirical application using the daily Tunisian exchange rates relative to the American Dollar, the Euro and the Japanese Yen. The empirical results show the relevance of the selected modelling approach, which contributes to a better forecasting performance for the exchange rate series.
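A sketch of the decomposition step only, using PyWavelets' stationary wavelet transform as a stand-in for the MODWT (pywt has no MODWT as such, and swt needs the series length to be a multiple of 2**level); the series here is a random placeholder, not exchange-rate data:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
rate = np.cumsum(rng.normal(size=1024))   # placeholder exchange-rate-like series
level = 3
n = len(rate) // 2**level * 2**level      # trim to a multiple of 2**level
coeffs = pywt.swt(rate[:n], "db4", level=level)
# coeffs is a list of (approximation, detail) coefficient pairs, one per
# scale; each detail series would then be modelled with GARMA-FIGARCH.
```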

20.
Modern statistical applications involving large data sets have focused attention on statistical methodologies which are both computationally efficient and able to deal with the screening of large numbers of different candidate models. Here we consider computationally efficient variational Bayes approaches to inference in high-dimensional heteroscedastic linear regression, where both the mean and variance are described in terms of linear functions of the predictors and where the number of predictors can be larger than the sample size. We derive a closed-form variational lower bound on the log marginal likelihood useful for model selection, and propose a novel fast greedy search algorithm on the model space which makes use of one-step optimization updates to the variational lower bound in the current model for screening large numbers of candidate predictor variables for inclusion or exclusion in a computationally thrifty way. We show that the suggested model search strategy is related to widely used orthogonal matching pursuit algorithms but yields a framework for potentially extending these algorithms to more complex models. The methodology is applied in simulations and in two real examples involving prediction for food constituents using NIR technology and prediction of disease progression in diabetes.
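The greedy-search skeleton, with BIC from a homoscedastic OLS fit standing in for the paper's variational lower bound; the real algorithm scores candidates by one-step updates to that bound and models the variance as well:

```python
import numpy as np

def greedy_select(X, y, max_vars=10):
    """Greedy forward search: at each step add the predictor that most
    improves the model score, stopping when no candidate helps."""
    n, p = X.shape
    active, remaining = [], list(range(p))

    def bic(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        return n * np.log(rss / n) + (len(cols) + 1) * np.log(n)

    current = bic(active)
    while remaining and len(active) < max_vars:
        best_score, best_c = min((bic(active + [c]), c) for c in remaining)
        if best_score >= current:
            break  # no candidate improves the score
        active.append(best_c)
        remaining.remove(best_c)
        current = best_score
    return active
```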

