Similar documents
1.
We propose localized spectral estimators for the quadratic covariation and the spot covolatility of diffusion processes that are observed discretely with additive observation noise. The estimation of time-varying volatilities rests on an asymptotic equivalence of the underlying statistical model to a white-noise model in which the correlation and volatility processes are constant over small time intervals. The asymptotic equivalence of the continuous-time and discrete-time experiments is proved by a construction with linear interpolation in one direction and local means in the other. The new estimator outperforms earlier nonparametric methods in the literature for the considered model. We investigate its finite-sample characteristics in simulations and compare the various proposed methods.
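As a rough illustration of the localization idea only, the sketch below uses a naive blockwise realized-variance estimate on a simulated noise-free path; it is not the paper's spectral estimator and omits the noise correction, and all quantities are hypothetical.

```python
import numpy as np

# Toy blockwise spot-volatility estimator: volatility is treated as constant
# on small time blocks and estimated by the realized variance within each
# block. This omits the paper's noise correction and spectral weighting.

rng = np.random.default_rng(0)

n = 20_000                                   # observations on [0, 1]
t = np.linspace(0.0, 1.0, n + 1)
sigma = 0.2 + 0.1 * np.sin(2 * np.pi * t)    # hypothetical time-varying volatility
dX = sigma[:-1] * rng.normal(0.0, np.sqrt(1.0 / n), size=n)
X = np.concatenate([[0.0], np.cumsum(dX)])   # simulated (noise-free) diffusion path

block = 500                                  # observations per block
rv = (np.diff(X) ** 2).reshape(-1, block).sum(axis=1)  # realized variance per block
spot_vol = np.sqrt(rv / (block / n))         # rescale block RV to spot volatility
print(spot_vol[:5])                          # close to sigma on the first blocks
```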

2.
We consider computationally fast methods for estimating the parameters of ARMA processes from binary time series data obtained by thresholding the latent ARMA process. All methods involve matching the estimated and expected autocorrelations of the binary series. In particular, we focus on the spectral representation of the likelihood of an ARMA process and derive a restricted form of this likelihood that uses correlations at only the first few lags. We contrast these methods with an efficient but computationally intensive Markov chain Monte Carlo (MCMC) method. In a simulation study we show that, for a range of ARMA processes, the spectral method is more efficient than variants of least squares and much faster than MCMC. We illustrate by fitting an ARMA(2,1) model to a binary time series of cow feeding data.
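For a latent Gaussian series thresholded at its median, the binary and latent autocorrelations are linked by r(k) = (2/π) arcsin ρ(k), which gives the simplest "match the first lag" estimator. The sketch below assumes a latent AR(1); the spectral-likelihood method described above generalizes this to more lags and general ARMA models.

```python
import numpy as np

# Moment matching for a latent AR(1) thresholded at zero (its median):
# invert r(1) = (2/pi) * arcsin(rho(1)) and use rho(1) = phi for AR(1).

rng = np.random.default_rng(1)
phi_true, n = 0.7, 50_000

x = np.zeros(n)
for t in range(1, n):                 # latent AR(1) process
    x[t] = phi_true * x[t - 1] + rng.normal()
b = (x > 0).astype(float)             # observed binary series

def acf(y, k):
    y = y - y.mean()
    return float((y[:-k] * y[k:]).mean() / y.var())

r1 = acf(b, 1)                        # binary lag-1 autocorrelation
phi_hat = np.sin(np.pi * r1 / 2.0)    # invert the arcsine relation
print(phi_hat)                        # close to 0.7 for large n
```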

3.
We consider two problems concerning the location of change points in a linear regression model: one involves jump discontinuities (change points) in the regression function, and the other involves regression lines connected at unknown points. We compare four methods for estimating single or multiple change points when both the error variance and the regression coefficients change simultaneously at the unknown point(s): the Bayesian, Julious, grid-search, and segmented methods. The methods are evaluated via a simulation study and compared using standard measures of estimation bias and precision, and they are then illustrated and compared on three real data sets. The simulation and empirical results overall favor the segmented and Bayesian methods, which estimate the change point and the other model parameters simultaneously, though only the Bayesian method handles both continuous and discontinuous change-point problems successfully. If the regression lines are known to be continuous, the segmented method ranks first among the methods considered.
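A minimal sketch of the grid-search method under hypothetical simulated data: for every candidate split point, fit separate least-squares lines on the two segments and keep the split minimizing the total residual sum of squares (the Bayesian and segmented estimators are not shown).

```python
import numpy as np

# Grid search for a single change point: minimize the combined residual sum
# of squares of two separately fitted regression lines over candidate splits.

rng = np.random.default_rng(2)
n, tau_true = 200, 120
x = np.sort(rng.uniform(0, 10, n))
y = np.where(np.arange(n) < tau_true,
             1.0 + 0.5 * x + rng.normal(0, 0.3, n),    # first regime
             6.0 - 0.8 * x + rng.normal(0, 0.6, n))    # jump, new slope and variance

def sse(xs, ys):
    coef = np.polyfit(xs, ys, 1)                       # least-squares line
    res = ys - np.polyval(coef, xs)
    return float((res ** 2).sum())

candidates = range(10, n - 10)                         # keep points in each segment
tau_hat = min(candidates, key=lambda k: sse(x[:k], y[:k]) + sse(x[k:], y[k:]))
print(tau_hat)                                         # index near tau_true
```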

4.
Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications, but model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary and/or categorical data, but because of its assumed local independence structure the estimated latent classes may not correspond to groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically, so we develop a variational approach that provides an efficient model-fitting strategy. The model is demonstrated on data from the National Long Term Care Survey (NLTCS) and on voting records from the U.S. Congress; it yields intuitive clustering results and gives a much better fit than either latent class analysis or latent trait analysis alone.
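For orientation, here is an EM sketch for plain latent class analysis on simulated binary data, i.e. the local-independence model that the mixture of latent trait analyzers extends; the variational treatment of the continuous latent trait is not shown, and all data are hypothetical.

```python
import numpy as np

# EM for latent class analysis (LCA) on binary items under local independence.

rng = np.random.default_rng(3)
n, d = 1000, 6                              # hypothetical subjects and binary items
z = rng.random(n) < 0.4                     # true (unobserved) class labels
Y = (rng.random((n, d)) < np.where(z[:, None], 0.8, 0.2)).astype(float)

G = 2                                       # number of latent classes
pi = np.full(G, 1.0 / G)                    # class proportions
theta = rng.uniform(0.3, 0.7, (G, d))       # item probabilities per class

for _ in range(200):
    # E-step: posterior class memberships (responsibilities)
    log_lik = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T + np.log(pi)
    resp = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update proportions and item probabilities
    pi = resp.mean(axis=0)
    theta = np.clip((resp.T @ Y) / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)

print(np.round(theta, 2))                   # recovered item-probability profiles
```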

5.
In many longitudinal studies, multiple characteristics of each individual are collected along with the time to occurrence of an event of interest; some of the correlated characteristics may be discrete and some continuous. In this paper, a joint model is proposed for analysing multivariate longitudinal data comprising mixed continuous and ordinal responses together with a time-to-event variable. We model the association structure between the longitudinal mixed data and the time-to-event data using a multivariate zero-mean Gaussian process. For the discrete ordinal data we assume a continuous latent variable following the logistic distribution, and for the continuous data a Gaussian mixed-effects model is used. For the event-time variable, an accelerated failure time model is considered under different distributional assumptions. For parameter estimation, a Bayesian approach using Markov chain Monte Carlo is adopted. The performance of the proposed methods is illustrated in simulation studies, and a real data set is analyzed under different model structures, with model comparison performed using a variety of statistical criteria.
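The latent-variable construction for the ordinal responses can be illustrated in a few lines: a category is observed when a continuous latent variable with logistic noise falls between consecutive cutpoints. The linear predictor and cutpoints below are hypothetical, and the full joint Bayesian model is not reproduced.

```python
import numpy as np

# Ordinal responses via a logistic latent variable and cutpoints.

rng = np.random.default_rng(4)
n = 8
eta = 0.5 + 0.8 * rng.normal(size=n)        # hypothetical linear predictor
latent = eta + rng.logistic(size=n)         # continuous latent variable
cuts = np.array([-1.0, 0.5, 2.0])           # cutpoints for 4 ordered categories
y_ordinal = np.searchsorted(cuts, latent)   # observed category in {0, 1, 2, 3}
print(y_ordinal)
```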

6.
This paper describes the various stages in building a statistical model to predict temperatures in the core of a reactor and compares the benefits of this model with those of a physical model. We give a brief background to the study and the applications of the model to rapid online monitoring and safe operation of the reactor. We describe the methods of correlation and two-dimensional spectral analysis that we use to identify the effects incorporated in a spatial regression model for the measured temperatures; these effects are related to the age of the reactor fuel and the spatial geometry of the reactor. A remaining component of the temperature variation is a slowly varying temperature surface modelled by smooth functions with constrained coefficients. We assess the accuracy of the model for interpolating temperatures throughout the reactor when measurements are available only at a reduced set of spatial locations, as is the case in most reactors, and we discuss further possible improvements to the model.
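A toy version of the interpolation step (the fuel-age and geometry effects of the full model are omitted): fit a slowly varying surface with a low-order polynomial basis to temperatures at a reduced set of locations, then predict at unmeasured positions. All locations and temperatures are simulated placeholders.

```python
import numpy as np

# Fit a smooth quadratic surface to temperatures at scattered (x, y) locations
# and interpolate at unmeasured positions by ordinary least squares.

rng = np.random.default_rng(5)
m = 60
xy = rng.uniform(0, 1, (m, 2))                       # measured locations
temp = 300 + 20 * xy[:, 0] + 15 * np.sin(np.pi * xy[:, 1]) + rng.normal(0, 0.5, m)

def basis(p):                                        # quadratic surface basis
    x, y = p[:, 0], p[:, 1]
    return np.column_stack([np.ones(len(p)), x, y, x * y, x ** 2, y ** 2])

coef, *_ = np.linalg.lstsq(basis(xy), temp, rcond=None)
new = np.array([[0.5, 0.5], [0.2, 0.8]])             # unmeasured locations
print(basis(new) @ coef)                             # interpolated temperatures
```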

7.
Hansen, Kooperberg and Sardy introduced a family of continuous, piecewise linear functions defined over adaptively selected triangulations of the plane as a general approach to statistical modelling of bivariate densities and of regression and hazard functions. These triograms enjoy a natural affine equivariance that offers distinct advantages over the competing tensor product methods more commonly used in statistical applications. Triograms employ basis functions consisting of linear 'tent functions' defined with respect to a triangulation of a given planar domain. As in knot selection for univariate splines, Hansen and colleagues adopted the regression spline approach of Stone: vertices of the triangulation are introduced or removed sequentially in an effort to balance fidelity to the data and parsimony. We explore a smoothing spline variant of the triogram model based on a roughness penalty adapted to its piecewise linear structure, and we show that the proposed penalty may be interpreted as a total variation penalty on the gradient of the fitted function. The methods are illustrated with real and artificial examples, including an application to estimated quantile surfaces of land value in the Chicago metropolitan area.
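The total variation interpretation is easiest to see in one dimension, where the roughness penalty reduces to the sum of absolute slope changes across the knots; on a triangulation the same quantity is accumulated across edges. A one-dimensional sketch with hypothetical knots and fitted values:

```python
import numpy as np

# 1-D analogue of the triogram roughness penalty: total variation of the
# gradient of a continuous piecewise linear fit = sum of |slope changes|.

knots = np.array([0.0, 1.0, 2.5, 4.0])          # hypothetical knot locations
values = np.array([0.0, 1.2, 0.8, 2.0])         # fitted values at the knots

slopes = np.diff(values) / np.diff(knots)       # gradient on each segment
penalty = float(np.abs(np.diff(slopes)).sum())  # total variation of the gradient
print(penalty)
```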

8.
Most feature screening methods for ultrahigh-dimensional classification assume, explicitly or implicitly, that the covariates are continuous. In practice, however, it is quite common for both categorical and continuous covariates to appear in the data, and few applicable feature screening methods are available. To handle this non-trivial situation, we propose an entropy-based feature screening method that is model-free and provides a unified screening procedure for both categorical and continuous covariates. We establish the sure screening and ranking consistency properties of the proposed procedure, investigate its finite-sample performance in simulation studies, and illustrate the method with a real data analysis.
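A generic sketch of the idea (an illustration, not the paper's exact screening statistic): rank covariates by the mutual information between each covariate and the class label, quantile-binning continuous covariates so that categorical and continuous features pass through one unified procedure.

```python
import numpy as np

# Entropy-based screening sketch: mutual information between each (binned)
# covariate and the class label, used to rank covariates.

rng = np.random.default_rng(6)
n, p = 500, 100
X = rng.normal(size=(n, p))                            # hypothetical covariates
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

def mutual_info(u, v):
    joint = np.histogram2d(u, v, bins=(len(np.unique(u)), len(np.unique(v))))[0]
    joint /= joint.sum()
    pu = joint.sum(axis=1, keepdims=True)
    pv = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pu @ pv)[nz])).sum())

def discretize(col, bins=5):                           # quantile-bin a covariate
    edges = np.quantile(col, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(col, edges)

scores = [mutual_info(discretize(X[:, j]), y) for j in range(p)]
print(np.argsort(scores)[::-1][:5])                    # top-ranked covariates
```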

9.
We study the problem of classifying an individual into one of several populations based on mixed nominal, continuous, and ordinal data. Specifically, we obtain a classification procedure that extends the so-called location linear discriminant function by specifying a general mixed-data model for the joint distribution of the mixed discrete and continuous variables. We outline methods for estimating misclassification error rates, and we report simulation results on the performance of the proposed classification rules in various settings vis-à-vis a robust mixed-data discrimination method. We give an example using data on croup in children.

10.
The random coefficient model (RCM) is a powerful statistical tool for analyzing correlated data collected from studies with different clusters or from longitudinal studies. In practice, biomedical researchers need statistical methods that allow them to adjust for measured and unmeasured covariates that might affect the regression model. This article studies two nonparametric methods for handling auxiliary covariate data in linear random coefficient models. We demonstrate how to estimate the coefficients of the models and how to predict the random effects when the covariates are missing or mismeasured, employing an empirical estimator for a discrete auxiliary and a kernel smoother for a continuous auxiliary. Simulation results show that the proposed methods perform better than an alternative method that uses only the data in the validation set and ignores the random effects in the random coefficient model.
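The kernel-smoother component can be sketched as Nadaraya-Watson imputation: on a validation subset where both the true covariate and its auxiliary are observed, smooth the former on the latter and impute E[x | w] for subjects observed only with the auxiliary. Data and bandwidth below are hypothetical.

```python
import numpy as np

# Nadaraya-Watson imputation of a mismeasured continuous covariate from a
# validation set where the true value x and the auxiliary w are both seen.

rng = np.random.default_rng(7)
n_val = 300
x_val = rng.normal(size=n_val)                    # true covariate (validation set)
w_val = x_val + rng.normal(0, 0.5, n_val)         # mismeasured auxiliary

def impute(w_new, h=0.3):                         # kernel estimate of E[x | w]
    k = np.exp(-0.5 * ((w_new[:, None] - w_val[None, :]) / h) ** 2)
    return (k * x_val).sum(axis=1) / k.sum(axis=1)

w_main = rng.normal(size=5)                       # main-study subjects: only w seen
print(impute(w_main))
```

The empirical-estimator analogue for a discrete auxiliary simply replaces the kernel weights with within-cell means of the validation data.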

11.
A large volume of CCD X-ray spectra is being generated by the Chandra X-ray Observatory (Chandra) and XMM-Newton. Automated spectral analysis and classification methods can aid in sorting, characterizing, and classifying this large volume of CCD X-ray spectra in a non-parametric fashion, complementary to current parametric model fits. We have developed an algorithm that uses multivariate statistical techniques, including an ensemble clustering method applied for the first time to X-ray spectral classification. The algorithm uses spectral data to group similar discrete sources of X-ray emission by placing the X-ray sources in a three-dimensional spectral sequence and then grouping the ordered sources into clusters based on their spectra. The method can handle large quantities of data and operates independently of spectral source models and of a priori knowledge about the nature of the sources (e.g., young stars, interacting binaries, active galactic nuclei). We apply the method to Chandra imaging spectroscopy of the young stellar clusters in the Orion Nebula Cluster and the NGC 1333 star-formation region.
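A rough sketch of the pipeline's shape, on hypothetical data: a PCA reduction of the binned spectra to a 3-D representation followed by a plain k-means grouping. The ensemble clustering and the ordering into a spectral sequence are not reproduced here.

```python
import numpy as np

# Reduce binned spectra to three principal components, then cluster sources
# in that 3-D space with a simple k-means loop.

rng = np.random.default_rng(8)
spectra = rng.gamma(2.0, 1.0, size=(200, 64))     # hypothetical binned CCD spectra
spectra /= spectra.sum(axis=1, keepdims=True)     # normalize to spectral shape

centered = spectra - spectra.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:3].T                      # 3-D spectral representation

k = 4
centers = coords[rng.choice(len(coords), k, replace=False)]
for _ in range(50):
    d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    centers = np.array([coords[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j] for j in range(k)])
print(np.bincount(labels, minlength=k))           # cluster sizes
```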

12.
The moment method is a well-known astronomical mode identification technique in asteroseismology which uses a time series of the first three moments of a spectral line to estimate the discrete oscillation mode parameters l and m. In contrast with many other mode identification techniques, the method also provides estimates of important continuous parameters such as the inclination angle α and the rotational velocity v_e. We developed a statistical formalism for the moment method based on so-called generalized estimating equations. This formalism allows an estimation of the uncertainty of the continuous parameters, taking into account that the different moments of a line profile are correlated and that the uncertainty of the observed moments also depends on the model parameters. Furthermore, we set up a procedure to take into account the mode uncertainty, i.e. the fact that often several modes (l, m) can adequately describe the data. We also introduce a new lack-of-fit function which works at least as well as a previous discriminant function and which, in addition, allows us to identify the sign of the azimuthal order m. We applied our method to the star HD181558 using several numerical methods, from which we learned that numerically solving the estimating equations is an intensive task. We report on the numerical results, which give insight into the statistical uncertainties of the physical parameters involved in the moment method.

13.
This research was motivated by our goal to design an efficient clinical trial comparing two doses of docosahexaenoic acid supplementation for reducing the rate of earliest preterm births (ePTB) and/or preterm births (PTB). Dichotomizing continuous gestational age (GA) data using a classic binomial distribution results in a loss of information and reduced power; a distributional approach is an improved strategy that retains statistical power from the continuous distribution. However, distributions that fit the data properly, particularly in the tails, must be chosen, especially when the data are skewed. A recent study proposed a skew-normal method. We propose a three-component normal mixture model and introduce separate treatment effects at different components of GA. We evaluate the operating characteristics of the mixture, beta-binomial, and skew-normal models through simulation, and we apply all three methods to data from two completed clinical trials from the USA and Australia. Finite mixture models are shown to have favorable properties in the PTB analysis but minimal benefit for the ePTB analysis, while normal models on log-transformed data have the largest bias. We therefore recommend the finite mixture model for PTB studies; either the finite mixture model or the beta-binomial model is acceptable for ePTB studies.
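The distributional approach can be sketched directly: rather than dichotomizing the raw gestational ages, estimate the PTB and ePTB rates as the fitted mixture's CDF at the clinical cutoffs. The component weights and parameters below are hypothetical placeholders for values that would be estimated from trial data.

```python
import numpy as np
from scipy import stats

# Preterm-birth rates from a fitted three-component normal mixture for
# gestational age (GA), evaluated as the mixture CDF at the cutoffs.

weights = np.array([0.05, 0.15, 0.80])         # hypothetical component weights
means = np.array([27.0, 34.0, 39.5])           # component means (weeks)
sds = np.array([2.5, 2.0, 1.2])                # component standard deviations

def mixture_cdf(x):
    return float((weights * stats.norm.cdf(x, means, sds)).sum())

print(mixture_cdf(37.0))                       # estimated P(PTB):  GA < 37 weeks
print(mixture_cdf(28.0))                       # estimated P(ePTB): GA < 28 weeks
```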

14.
Many areas of statistical modeling are plagued by the "curse of dimensionality," in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive; moreover, a regression model developed in this way could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. The algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. An intermediate processing step then removes variables with low correlation to the response data. Finally, a genetic algorithm performs a stochastic search through the subset regression model space, driven by an information-theoretic objective function. The algorithm develops the regression model for each response variable independently, so as to model each variable optimally. We demonstrate the method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. The results demonstrate both the flexibility and the power of the algorithm: for each response variable a different subset model is selected and different wavelet transformations are used, and the resulting models improve, as measured by lower mean error, on results in the published literature.
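The intermediate screening step can be sketched with a single-level Haar transform: compute wavelet coefficients for each spectrum and drop those whose absolute correlation with the response is low. The search over wavelets and filter numbers and the genetic algorithm itself are not shown, and the data and threshold are hypothetical.

```python
import numpy as np

# Single-level Haar transform of each spectrum, then correlation-based
# screening of the coefficients against the response.

rng = np.random.default_rng(9)
n, p = 40, 256                                   # n spectra, p wavelengths (n << p)
X = rng.normal(size=(n, p))                      # hypothetical spectral data
y = X[:, 10] - 0.5 * X[:, 11] + rng.normal(0, 0.1, n)

def haar_level1(row):                            # one-level Haar DWT
    a = (row[0::2] + row[1::2]) / np.sqrt(2)     # approximation coefficients
    d = (row[0::2] - row[1::2]) / np.sqrt(2)     # detail coefficients
    return np.concatenate([a, d])

W = np.apply_along_axis(haar_level1, 1, X)       # coefficient matrix (n x p)
corr = np.array([abs(np.corrcoef(W[:, j], y)[0, 1]) for j in range(p)])
keep = np.where(corr > 0.3)[0]                   # candidates for the GA search
print(len(keep), "of", p, "coefficients retained")
```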

15.
We compare the commonly used two-step methods and the joint likelihood method for joint models of longitudinal and survival data via extensive simulations. The longitudinal models include LME, GLMM, and NLME models, and the survival models include Cox models and AFT models. We find that the full likelihood method outperforms the two-step methods for various joint models, but it can be computationally challenging when the dimension of the random effects in the longitudinal model is not small. We therefore propose an approximate joint likelihood method that is computationally efficient. The proposed approximation performs well in the joint model context, and it performs better for more "continuous" longitudinal data. Finally, a real AIDS data example shows that patients with a higher initial viral load or a lower initial CD4 count are more likely to drop out early during anti-HIV treatment.

16.
Teratological experiments are controlled dose-response studies in which impregnated animals are randomly assigned to various exposure levels of a toxic substance, and both continuous and discrete responses are subsequently recorded on the litters of fetuses that these animals produce. Discrete responses are usually binary, such as the presence or absence of some fetal anomaly. Such clustered binary data usually exhibit over-dispersion (or under-dispersion), which can be interpreted as variation between litter response probabilities or as intralitter correlation. To model this correlation and/or variation, the beta-binomial distribution has been assumed for the number of positive fetal responses within a litter. Although the mean of the beta-binomial model has been linked to dose-response functions, as a way of measuring over-dispersion it may be restrictive for modeling data from teratological studies. Moreover, for certain toxins a threshold effect has been observed in the dose-response pattern of the data. We propose to incorporate a random effect into a general threshold dose-response model to account for the variation in responses while simultaneously estimating the threshold effect. We fit this model to a well-known data set in the field of teratology, and we perform simulation studies to assess the validity of the random effects threshold model in these types of studies.
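A small simulation sketch of the model's data-generating idea: a threshold dose-response curve for the malformation probability, with litter-level heterogeneity introduced through a beta-distributed random effect (equivalently, beta-binomial litter counts). The threshold, slope, and dispersion values are hypothetical.

```python
import numpy as np

# Threshold dose-response with beta-binomial litter counts: no dose effect
# below the threshold tau, logistic increase above it, and litter-specific
# response probabilities drawn from a beta distribution.

rng = np.random.default_rng(10)

def response_prob(dose, tau=1.0, beta0=-2.0, beta1=1.5):
    eta = beta0 + beta1 * np.maximum(dose - tau, 0.0)
    return 1.0 / (1.0 + np.exp(-eta))

rho = 0.1                                        # intralitter correlation
for dose in [0.0, 1.0, 2.0, 3.0]:
    p = response_prob(dose)
    a, b = p * (1 - rho) / rho, (1 - p) * (1 - rho) / rho  # beta parameters
    litter_p = rng.beta(a, b, size=5)            # litter-specific probabilities
    counts = rng.binomial(10, litter_p)          # positives per litter of 10
    print(dose, counts)
```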

17.
Scientific experiments commonly result in clustered discrete and continuous data. Existing methods for analyzing such data include quasi-likelihood procedures and generalized estimating equations for estimating marginal mean response parameters. In applications such as developmental toxicity studies, where discrete and continuous measurements are recorded on each fetus, or clinical ophthalmologic trials, where different types of observations are made on each eye, the assumption that data within a cluster are exchangeable is often very reasonable. We use this assumption to formulate fully parametric regression models for clusters of bivariate data with binary and continuous components. The proposed regression models have marginal interpretations and reproducible model structures. We derive tractable expressions for the likelihood equations and give iterative schemes for computing efficient estimates (MLEs) of the marginal means, correlations, variances, and higher moments. We demonstrate the use of the 'exchangeable' procedure with an application to a developmental toxicity study involving fetal weight and malformation data.

18.
One of the main problems in geostatistics is fitting a valid variogram or covariogram model to describe the underlying dependence structure in the data. The dependence between observations can also be modeled in the spectral domain, but traditional methods based on the periodogram as an estimator of the spectral density may present some problems in the spatial case. In this work, we propose an estimation method for the covariogram parameters based on the fast Fourier transform (FFT) of biased covariances. The finite-sample performance of this estimator is compared, through a simulation study, with classical spatial-domain methods such as weighted least squares and maximum likelihood, as well as with other spectral estimators. An application to real data is also given.
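In one dimension the FFT route to biased covariances looks as follows: the inverse FFT of the periodogram of the zero-padded, demeaned data returns the biased sample autocovariances, which can then be matched to a parametric covariogram model (the 2-D spatial version uses a 2-D FFT). The toy series below is a hypothetical stand-in for gridded spatial data.

```python
import numpy as np

# Biased autocovariances via the FFT, checked against the direct formula.

rng = np.random.default_rng(11)
n = 512
x = np.cumsum(rng.normal(size=n)) * 0.05 + rng.normal(size=n)  # correlated toy data
x = x - x.mean()

f = np.fft.fft(x, 2 * n)                          # zero-pad against circular wrap
acov = np.fft.ifft(f * np.conj(f)).real[:n] / n   # biased autocovariances C(0..n-1)
print(acov[:5])

direct = np.array([(x[:n - k] * x[k:]).sum() / n for k in range(5)])
print(direct)                                     # matches the FFT-based values
```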
