Similar Literature
20 similar records found.
1.
Data Sharpening for Hazard Rate Estimation
Data sharpening is a general tool for enhancing the performance of statistical estimators, by altering the data before substituting them into conventional methods. In one of the simplest forms of data sharpening, available for curve estimation, an explicit empirical transformation is used to alter the data. The attraction of this approach is diminished, however, if the formula has to be altered for each different application. For example, one could expect the formula for use in hazard rate estimation to differ from that for straight density estimation, since a hazard rate is a ratio-type functional of a density. This paper shows that, in fact, identical data transformations can be used in each case, regardless of whether the data involve censoring. This dramatically simplifies the application of data sharpening to problems involving hazard rate estimation, and makes data sharpening attractive.
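As a rough illustration of the data-sharpening idea (perturb the data with an empirical transformation, then feed them into a conventional estimator), the sketch below nudges each point toward regions of higher pilot density before forming an ordinary Gaussian kernel density estimate. The transform x + (h²/2)·f̂′(x)/f̂(x) is one common bias-reducing choice and is an assumption here; the paper's specific transformation for hazard rate estimation is not reproduced.

```python
import numpy as np

def kde(x_grid, data, h):
    """Ordinary Gaussian kernel density estimate on a grid."""
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def sharpen(data, h):
    """Push each point toward higher pilot density: x* = x + (h^2/2) f'(x)/f(x)."""
    u = (data[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))
    f = k.mean(axis=1)                      # pilot density at each data point
    fprime = (-u / h * k).mean(axis=1)      # derivative of the pilot density
    return data + 0.5 * h**2 * fprime / f

rng = np.random.default_rng(0)
x = rng.normal(size=300)                    # hypothetical sample
grid = np.linspace(-4, 4, 200)
f_plain = kde(grid, x, h=0.4)               # conventional estimate
f_sharp = kde(grid, sharpen(x, h=0.4), h=0.4)   # estimate from sharpened data
```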

2.
A common assumption for data analysis in functional magnetic resonance imaging is that the response signal can be modelled as the convolution of a haemodynamic response (HDR) kernel with a stimulus reference function. Early approaches modelled spatially constant HDR kernels, but more recently spatially varying models have been proposed. However, convolution limits the flexibility of these models and their ability to capture spatial variation. Here, a range of (nonlinear) parametric curves are fitted by least squares minimisation directly to individual voxel HDRs (i.e., without using convolution). A 'constrained gamma curve' is proposed as an efficient form for fitting the HDR at each individual voxel. This curve allows for spatial variation in the delay of the HDR, but places a global constraint on the temporal spread. The approach of directly fitting individual parameters of HDR shape is demonstrated to lead to an improved fit of response estimates.
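A minimal sketch of fitting a parametric curve directly to a single voxel's HDR by least squares is shown below. It assumes a hypothetical gamma-shaped curve in which the delay is free per voxel while the temporal-spread parameter tau is held fixed, to mimic the global constraint described above; the paper's exact 'constrained gamma curve' parameterisation is not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit

def gamma_hdr(t, amp, delay, tau=1.2):
    """Gamma-shaped haemodynamic response; tau (temporal spread) held fixed
    to mimic a global constraint (all values here are illustrative)."""
    s = np.clip(t - delay, 0, None)
    return amp * (s / tau)**2 * np.exp(-s / tau)

# hypothetical single-voxel response sampled once per second
t = np.arange(0, 20.0, 1.0)
rng = np.random.default_rng(1)
y = gamma_hdr(t, amp=2.0, delay=3.0) + 0.1 * rng.normal(size=t.size)

# least-squares fit of amplitude and delay directly to the voxel HDR
popt, _ = curve_fit(lambda t, a, d: gamma_hdr(t, a, d), t, y, p0=[1.0, 2.0])
print("amplitude, delay:", popt)
```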

3.
The nonparametric estimation of the growth curve has been extensively studied in both stationary and some nonstationary particular situations. In this work, we consider the statistical problem of estimating the average growth curve for a fixed design model with nonstationary error process. The nonstationarity considered here is of a general form, and this article may be considered as an extension of previous results. The optimal bandwidth is shown to depend on the singularity of the autocovariance function of the error process along the diagonal. A Monte Carlo study is conducted in order to assess the influence of the number of subjects and the number of observations per subject on the estimation.
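For intuition, a bare-bones fixed-design kernel (Nadaraya-Watson) smoother of the average growth curve is sketched below; the data and bandwidth are illustrative, and the paper's bandwidth choice, which depends on the singularity of the error autocovariance along the diagonal, is not addressed.

```python
import numpy as np

def mean_growth_curve(t_grid, t_obs, Y, h):
    """Kernel (Nadaraya-Watson) estimate of the average growth curve for a
    fixed design: Y is an (n_subjects, n_times) array observed at t_obs."""
    ybar = Y.mean(axis=0)                                   # pointwise average over subjects
    w = np.exp(-0.5 * ((t_grid[:, None] - t_obs[None, :]) / h) ** 2)
    return (w * ybar).sum(axis=1) / w.sum(axis=1)

# hypothetical example: 25 subjects observed at 40 common time points
rng = np.random.default_rng(0)
t_obs = np.linspace(0, 1, 40)
Y = np.sin(2 * np.pi * t_obs) + 0.3 * rng.normal(size=(25, 40))
curve = mean_growth_curve(np.linspace(0, 1, 200), t_obs, Y, h=0.05)
```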

4.
This paper presents a two-stage procedure for estimating the conditional support curve of a random variable Y, given the information in a random vector X. Quantile estimation is followed by an extremal analysis of the residuals for problems that can be written as regression models. The technique is applied to data from the National Bureau of Economic Research and the US Census Bureau's Center for Economic Studies, which cover all four-digit manufacturing industries. Simulation results show that in linear regression models the proposed estimation procedure is more efficient than the extreme linear regression quantile.
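The two-stage logic (a high regression quantile fit, followed by an extremal analysis of the residuals) can be sketched roughly as follows, using statsmodels' quantile regression and a generalized Pareto fit to the positive residuals. The data, the quantile level, and the tail-percentile correction are all illustrative assumptions, not the paper's exact estimator of the support curve.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import genpareto

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = 2 + 3 * x + rng.uniform(0, 1, 500) * (1 + x)   # hypothetical data with an upper boundary

# stage 1: high regression quantile as a preliminary estimate of the upper boundary
X = sm.add_constant(x)
qreg = sm.QuantReg(y, X).fit(q=0.95)
resid = y - qreg.predict(X)

# stage 2: extremal analysis of the positive residuals (generalized Pareto tail fit)
exceed = resid[resid > 0]
shape, loc, scale = genpareto.fit(exceed, floc=0.0)
support_shift = genpareto.ppf(0.99, shape, loc=0.0, scale=scale)  # illustrative tail correction
support_curve = qreg.predict(X) + support_shift
```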

5.

We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The developed approach consists of a penalized maximum likelihood estimation carried out by a robust expectation-maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as the learning proceeds, rather than in a two-stage scheme as in standard model-based clustering, which applies model selection criteria afterwards, and (ii) it does not require accurate initialization, unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well, provides accurate clustering results, and confirm its benefit for practical applications.

6.
Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback-Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε ∈ (0, ½) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression.

7.
This article reviews four area-level linear mixed models that borrow strength by exploiting the possible correlation among neighboring areas and/or past time periods. Its main goal is to study whether there are efficiency gains when a spatial dependence and/or a temporal autocorrelation among random-area effects is included in the models. The Fay-Herriot estimator is used as a benchmark. A design-based simulation study based on real data collected from a longitudinal survey conducted by a statistical office is presented. Our results show that models that exploit both spatial and chronological association considerably improve the efficiency of small area estimates.
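For reference, the benchmark Fay-Herriot estimator combines each area's direct survey estimate with a regression-synthetic estimate using a shrinkage weight. A minimal sketch is given below with a simple moment estimate of the model variance and toy data, both of which are assumptions; the spatio-temporal extensions reviewed in the article are not implemented.

```python
import numpy as np

def fay_herriot(y, X, D):
    """Composite Fay-Herriot estimates for area-level direct estimates y,
    area covariates X, and known sampling variances D."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # OLS start
    resid = y - X @ beta
    m, p = X.shape
    sigma2_v = max((resid**2 - D).sum() / (m - p), 0.0)   # simple moment estimate of model variance
    gamma = sigma2_v / (sigma2_v + D)                     # shrinkage weights
    W = 1.0 / (sigma2_v + D)                              # GLS re-estimate of beta
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
    synth = X @ beta
    return gamma * y + (1 - gamma) * synth                # composite small-area estimates

# hypothetical toy example with 10 small areas
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(10), rng.normal(size=10)])
D = rng.uniform(0.5, 2.0, 10)
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(D))
print(fay_herriot(y, X, D))
```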

8.
A simple approach for analyzing longitudinally measured biomarkers is to calculate summary measures such as the area under the curve (AUC) for each individual and then compare the mean AUC between treatment groups using methods such as the t test. This two-step approach is difficult to implement when there are missing data, since the AUC cannot be directly calculated for individuals with missing measurements. Simple methods for dealing with missing data include complete-case analysis and imputation. A recent study showed that the estimated mean AUC difference between treatment groups based on the linear mixed model (LMM), rather than on individually calculated AUCs by simple imputation, has negligible bias under missing-at-random assumptions and only small bias when data are missing not at random. However, this model assumes the outcome to be normally distributed, which is often violated in biomarker data. In this paper, we propose to use an LMM on log-transformed biomarkers, based on which statistical inference for the ratio, rather than the difference, of AUC between treatment groups is provided. The proposed method can not only handle potential baseline imbalance in a randomized trial but also circumvent the estimation of the nuisance variance parameters in the log-normal model. The proposed model is applied to a recently completed large randomized trial studying the effect of nicotine reduction on biomarker exposure of smokers.
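A very rough sketch of the general idea (fit an LMM to the log-transformed biomarker, then summarise each arm's fitted log-scale profile by a trapezoidal AUC and exponentiate the contrast to get a ratio-type summary) is given below. The simulated data, the model formula, and the final contrast are all illustrative assumptions; the paper's exact inference for the AUC ratio is not reproduced.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def trapz(y, x):
    """Trapezoidal area under a curve sampled at points x."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# hypothetical longitudinal biomarker data: 2 arms, 5 visits, subject random effect
rng = np.random.default_rng(0)
times = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
rows = []
for i in range(60):
    arm = i % 2
    b_i = rng.normal(scale=0.3)
    for t in times:
        mu = 2.0 - 0.15 * arm * t                       # log-scale mean profile
        rows.append({"subj": i, "arm": arm, "time": t,
                     "logy": mu + b_i + rng.normal(scale=0.2)})
df = pd.DataFrame(rows)

# linear mixed model on the log-transformed biomarker
fit = smf.mixedlm("logy ~ C(time) * arm", df, groups=df["subj"]).fit()

# fitted log-scale mean profiles for each arm (fixed effects only)
fit0 = fit.predict(pd.DataFrame({"time": times, "arm": 0}))
fit1 = fit.predict(pd.DataFrame({"time": times, "arm": 1}))
auc0, auc1 = trapz(fit0, times), trapz(fit1, times)

# exponentiating the time-averaged log difference gives a ratio of geometric-mean
# based AUC summaries -- an illustrative ratio-type contrast, not the paper's exact one
print("ratio-type AUC summary:", np.exp((auc1 - auc0) / (times[-1] - times[0])))
```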

9.
An Extended Nelson-Siegel Model Based on a Genetic Algorithm and an Empirical Study
A genetic algorithm is introduced into the estimation of the extended Nelson-Siegel model, which is then used to estimate the Treasury-bond yield curve. The empirical analysis shows that the extended Nelson-Siegel model based on a genetic algorithm clearly outperforms both the coupon-stripping (bootstrap) method based on cubic-spline interpolation and the extended Nelson-Siegel model based on nonlinear regression in fitting and estimating the yield curve. On this basis, the genetic-algorithm-based extended Nelson-Siegel model is used to estimate and analyse the yield curves on three selected trading days. The yield curve in the middle and late stages of the financial crisis changed markedly relative to the early stage: the overall level of the curve fell, though different segments fell by different amounts, and both the slope and the curvature of the curve increased noticeably. These changes are mainly the joint result of monetary-policy adjustments and shifts in market confidence during the financial crisis.
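As a rough sketch of this approach, an extended (Svensson-type) Nelson-Siegel curve can be fitted to observed yields with a population-based global optimizer. SciPy's differential evolution is used below in place of a genetic algorithm, and the maturities, yields, and parameter bounds are all illustrative.

```python
import numpy as np
from scipy.optimize import differential_evolution

def svensson(tau, b0, b1, b2, b3, l1, l2):
    """Extended (Svensson) Nelson-Siegel spot-rate curve."""
    x1, x2 = tau / l1, tau / l2
    f1 = (1 - np.exp(-x1)) / x1
    f2 = f1 - np.exp(-x1)
    f3 = (1 - np.exp(-x2)) / x2 - np.exp(-x2)
    return b0 + b1 * f1 + b2 * f2 + b3 * f3

# hypothetical observed maturities (years) and spot yields (%)
tau = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 15, 20], dtype=float)
y_obs = np.array([1.9, 2.0, 2.2, 2.5, 2.7, 3.0, 3.2, 3.4, 3.6, 3.7])

def sse(p):
    return np.sum((svensson(tau, *p) - y_obs) ** 2)

bounds = [(0, 10), (-10, 10), (-10, 10), (-10, 10), (0.05, 10), (0.05, 10)]
res = differential_evolution(sse, bounds, seed=0)   # GA-like population search
print(res.x, res.fun)
```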

10.
Interpretation of continuous measurements in microenvironmental studies and exposure assessments can be complicated by autocorrelation, the implications of which are often not fully addressed. We discuss some statistical issues that arose in the analysis of microenvironmental particulate matter concentration data collected in 1998 by the Harvard School of Public Health. We present a simulation study that suggests that Generalized Estimating Equations, a technique often used to adjust for autocorrelation, may produce inflated Type I errors when applied to microenvironmental studies of small or moderate sample size, and that Linear Mixed Effects models may be more appropriate in small-sample settings. Environmental scientists often appeal to longer averaging times to reduce autocorrelation. We explore the functional relationship between averaging time, autocorrelation, and standard errors of both mean and variance, showing that longer averaging times impair statistical inferences about main effects. We conclude that, given widely available techniques that adjust for autocorrelation, longer averaging times may be inappropriate in microenvironmental studies.
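The trade-off can be illustrated with a small AR(1) simulation: averaging over longer windows reduces the lag-one autocorrelation of the averaged series but also shrinks the number of observations, so the standard error of the mean (approximated here with the usual AR(1) effective-sample-size correction) does not improve. All numbers are illustrative, not the Harvard study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.9, 5000
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):                      # AR(1) concentration-like series
    x[t] = phi * x[t - 1] + e[t]

for window in (1, 5, 30, 60):
    m = x[: n - n % window].reshape(-1, window).mean(axis=1)   # longer averaging time
    r1 = np.corrcoef(m[:-1], m[1:])[0, 1]                      # lag-1 autocorrelation
    neff = len(m) * (1 - r1) / (1 + r1)                        # AR(1) effective sample size
    se = m.std(ddof=1) / np.sqrt(max(neff, 1.0))
    print(f"window={window:3d}  lag-1 r={r1:5.2f}  adjusted SE of mean={se:.3f}")
```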

11.
Real-time monitoring is necessary in nanoparticle exposure assessment to characterize the exposure profile, but the data produced are autocorrelated. This study compared three statistical methods for analyzing such autocorrelated time series and investigated the effect of averaging time on reducing the autocorrelation, using field data. First-order autoregressive (AR(1)) and autoregressive integrated moving average (ARIMA) models are alternative methods that remove autocorrelation; both were compared with the classical regression method. Three data sets of scanning mobility particle sizer measurements were used, and the results of regression, AR(1), and ARIMA were compared at averaging times of 1, 5, and 10 min. AR(1) and ARIMA models had similar capacity to adjust for the autocorrelation of real-time data. Because real-time monitoring data are non-stationary, the ARIMA model was more appropriate; when using AR(1), the data first had to be transformed to stationarity. Longer averaging times made no appreciable difference. This study suggests that the ARIMA model can be used to process real-time monitoring data, especially non-stationary data, and that the averaging time can be set flexibly depending on the data interval required to capture the effects of processes in occupational and environmental nano-scale measurements.
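A minimal sketch of this model comparison with statsmodels is given below: ordinary regression on a process indicator, an AR(1) fit, and an ARIMA(1,1,1) fit for the non-stationary case, applied to simulated (not the study's) real-time concentration data. The model orders and the on/off covariate are assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n = 600
process_on = ((np.arange(n) % 120) < 60).astype(float)   # hypothetical on/off work process
y = (np.cumsum(rng.normal(scale=0.05, size=n))           # slow drift -> non-stationary level
     + 2.0 * process_on + rng.normal(scale=0.5, size=n))

ols = sm.OLS(y, sm.add_constant(process_on)).fit()                 # classical regression
ar1 = ARIMA(y, exog=process_on, order=(1, 0, 0)).fit()             # AR(1) errors
arima = ARIMA(y, exog=process_on, order=(1, 1, 1)).fit()           # handles non-stationarity

print("OLS:\n", ols.params)
print("AR(1):\n", ar1.params)
print("ARIMA(1,1,1):\n", arima.params)
```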

12.
An Analysis of the Energy-Saving Potential of China's Provinces during the Eleventh Five-Year Plan
Abstract: Based on an analysis of historical series, this paper proposes a "learning curve" linking regional economic output and energy consumption. Using time-series data for 30 provinces from 1990 to 2005, an energy learning curve is established in which energy consumption per 10,000 yuan of output gradually declines as per-capita gross domestic product (GDP) increases, and the energy-saving potential per 10,000 yuan of output of each province during the Eleventh Five-Year Plan, as well as each province's share of the national energy-saving target, is analysed and calculated. The results show that: (1) provinces such as Shanxi, Gansu, and Guizhou have relatively large energy-saving potential per 10,000 yuan of output, whereas Fujian, Zhejiang, Guangdong, Jiangsu, and Beijing have relatively small potential; (2) taking each province's total economic output and energy consumption into account, the high-energy-consuming provinces of Shandong, Shanxi, Hebei, and Liaoning and the economically strong provinces of Guangdong, Jiangsu, Zhejiang, and Shanghai make large energy-saving contributions and bear high shares, while provinces such as Hainan, because of their small economic scale, contribute relatively little. This yields the regional distribution pattern of energy-saving potential per 10,000 yuan of output and of energy-saving contribution shares for China during the Eleventh Five-Year Plan.
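The "energy learning curve" described here amounts to regressing the logarithm of energy intensity (energy use per 10,000 yuan of output) on the logarithm of per-capita GDP. A minimal sketch with made-up numbers follows; the fitted elasticity b is the rate at which energy intensity declines as income grows, and the paper's provincial data are not reproduced.

```python
import numpy as np

# hypothetical series: per-capita GDP (yuan) and energy intensity (tce per 10,000 yuan)
gdp_pc = np.array([4000, 6000, 9000, 13000, 20000, 30000, 45000], dtype=float)
intensity = np.array([3.1, 2.6, 2.2, 1.9, 1.6, 1.35, 1.15])

# learning curve: log(intensity) = a + b * log(gdp_pc), with b < 0 expected
A = np.column_stack([np.ones_like(gdp_pc), np.log(gdp_pc)])
(a, b), *_ = np.linalg.lstsq(A, np.log(intensity), rcond=None)
print(f"elasticity b = {b:.3f}")

# implied intensity drop if per-capita GDP rises from 20,000 to 30,000 yuan
pred = np.exp(a + b * np.log([20000.0, 30000.0]))
print("predicted intensity drop:", pred[0] - pred[1])
```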

13.
Common kernel density estimators (KDEs) are generalised so that assumptions about the kernel of the distribution can be incorporated. Instead of using metrics as input to the kernels, the new estimators use parameterisable pseudometrics. In general, the volumes of balls in pseudometric spaces depend on both the radius and the location of the centre. To enable constant smoothing, the volumes of the balls need to be calculated, and analytical expressions are preferred for computational reasons. Two suitable parametric families of pseudometrics are identified, one of which has common KDEs as special cases. In a few experiments, the proposed estimators show increased statistical power when proper assumptions are made. This paper therefore describes an approach in which partial knowledge about the distribution can be used effectively. Furthermore, it is suggested that the new estimators are suitable for statistical learning algorithms such as regression and classification.

14.
This paper presents the limit distribution (as the number of time points increases) for the score vector of a growth curve model assuming both stationary and explosive autoregressive (AR) errors. Limit distributions of the score statistic and the likelihood-ratio statistic for testing composite hypotheses about the regression parameters of several growth curves, when the autocorrelation parameters are treated as nuisance parameters, are presented.

15.
The authors deal with the problem of comparing receiver operating characteristic (ROC) curves from independent samples. Taking a nonparametric approach, they propose and study three different statistics. Their asymptotic distributions are obtained and a resampling plan is considered. In order to study the statistical power of the introduced statistics, a simulation study is carried out. The (observed) results suggest that, for the models considered, the new statistics are more powerful than the usually employed ones (the Venkatraman test and the usual area under the ROC curve criterion) in non-uniform dominance situations, and quite good otherwise.
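For context, the usual area-under-the-ROC-curve criterion for two independent samples, together with a simple bootstrap comparison of two markers, can be sketched as follows. This is the baseline criterion the authors compare against, with hypothetical data, not their proposed statistics.

```python
import numpy as np

def auc(neg, pos):
    """Mann-Whitney estimate of the area under the ROC curve."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

rng = np.random.default_rng(0)
neg_a, pos_a = rng.normal(0, 1, 80), rng.normal(1.0, 1, 80)   # marker A (hypothetical)
neg_b, pos_b = rng.normal(0, 1, 80), rng.normal(0.7, 1, 80)   # marker B (hypothetical)

obs = auc(neg_a, pos_a) - auc(neg_b, pos_b)
boot = []
for _ in range(2000):                                         # simple resampling plan
    na, pa = rng.choice(neg_a, neg_a.size), rng.choice(pos_a, pos_a.size)
    nb, pb = rng.choice(neg_b, neg_b.size), rng.choice(pos_b, pos_b.size)
    boot.append(auc(na, pa) - auc(nb, pb))
print("AUC difference:", round(obs, 3), " 95% CI:", np.percentile(boot, [2.5, 97.5]))
```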

16.
A model involving autocorrelated random effects and sampling errors is proposed for small-area estimation, using both time-series and cross-sectional data. The sampling errors are assumed to have a known block-diagonal covariance matrix. This model is an extension of a well-known model, due to Fay and Herriot (1979), for cross-sectional data. A two-stage estimator of a small-area mean for the current period is obtained under the proposed model with known autocorrelation, by first deriving the best linear unbiased prediction estimator assuming known variance components, and then replacing them with their consistent estimators. Extending the approach of Prasad and Rao (1986, 1990) for the Fay-Herriot model, an estimator of the mean squared error (MSE) of the two-stage estimator, correct to a second-order approximation for a small or moderate number of time points, T, and a large number of small areas, m, is obtained. The case of unknown autocorrelation is also considered. Limited simulation results on the efficiency of two-stage estimators and the accuracy of the proposed estimator of MSE are presented.

17.
In assessing the area under the ROC curve as a measure of the accuracy of a diagnostic test, it is imperative to detect and locate multiple abnormalities per image. The approach taken here accounts for this by adopting a statistical model that allows for correlation between the reader scores of several regions of interest (ROIs).

The ROI method of partitioning the image is adopted. The readers give a score to each ROI in the image, and the statistical model accounts for the correlation between the scores of the ROIs of an image when estimating test accuracy. The test accuracy is given by Pr[Y > Z] + (1/2)Pr[Y = Z], where Y is an ordinal diagnostic measurement of an affected ROI and Z is the diagnostic measurement of an unaffected ROI; this measure of test accuracy is equivalent to the area under the ROC curve. The parameters are those of a multinomial distribution, and based on this distribution a Bayesian method of inference is adopted for estimating the test accuracy.

Using a multinomial model for the test results, a Bayesian method based on the predictive distribution of future diagnostic scores is employed to find the test accuracy. By resampling from the posterior distribution of the model parameters, samples from the posterior distribution of test accuracy are also generated. Using these samples, the posterior mean, standard deviation, and credible intervals are calculated in order to estimate the area under the ROC curve. This approach is illustrated by estimating the area under the ROC curve for a study of the diagnostic accuracy of magnetic resonance angiography for diagnosis of arterial atherosclerotic stenosis. A generalization to multiple readers and/or modalities is proposed.

A Bayesian approach to estimating test accuracy is easy to perform with standard software packages and has the advantage of efficiently incorporating information from prior related imaging studies.
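A stripped-down version of the Bayesian computation can be sketched with a Dirichlet posterior over the ordinal score categories for affected and unaffected ROIs, ignoring the within-image correlation structure the paper models and using made-up counts and a uniform prior.

```python
import numpy as np

# hypothetical counts of ordinal scores 1-5 for unaffected (Z) and affected (Y) ROIs
z_counts = np.array([40, 25, 15, 10, 5])
y_counts = np.array([5, 10, 15, 25, 45])
prior = np.ones(5)                               # uniform Dirichlet prior

rng = np.random.default_rng(0)
draws = []
for _ in range(5000):
    pz = rng.dirichlet(prior + z_counts)         # posterior draw of unaffected-score probabilities
    py = rng.dirichlet(prior + y_counts)         # posterior draw of affected-score probabilities
    joint = np.outer(py, pz)
    gt = joint[np.tril_indices(5, -1)].sum()     # Pr[Y > Z]
    eq = np.sum(py * pz)                         # Pr[Y = Z]
    draws.append(gt + 0.5 * eq)

draws = np.array(draws)
print("posterior mean accuracy:", draws.mean())
print("95% credible interval:", np.percentile(draws, [2.5, 97.5]))
```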

18.
It is of scientific interest to study the application of the COM-Poisson model to longitudinal response data, the analysis of which is quite challenging because the longitudinal responses of a subject are correlated and the correlation pattern is usually unknown. In this article, we extend the COM-Poisson GLM to a generalized linear longitudinal model. We also develop a joint generalized quasi-likelihood estimating-equation approach based on a stationary autocorrelation structure for the repeated count data. We further compare the performance of this estimation method with that of the generalized method of moments through a simulation study.

19.
The aim of the present work was to develop a new mathematical method for estimating the area under the curve (AUC) and its variability that can be applied in different preclinical experimental designs and is amenable to implementation in standard calculation worksheets. To assess the usefulness of the new approach, different experimental scenarios were studied and the results were compared with those obtained with commonly used software: WinNonlin® and Phoenix WinNonlin®. The results show no statistical differences between the AUC values obtained by the two procedures, but the new method appears to be a better estimator of the AUC standard error, measured as the coverage of the 95% confidence interval. The newly proposed method thus proves to be as useful as the WinNonlin® software where the latter is applicable.
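The classical point of comparison for this kind of problem is the trapezoidal AUC with a Bailer-type standard error for designs with one observation per animal per time point. A worksheet-style sketch is shown below with hypothetical concentration data; the authors' new method itself is not reproduced.

```python
import numpy as np

def bailer_auc(t, means, sds, n):
    """Trapezoidal AUC from group means at each time point, with Bailer-type
    standard error: Var(AUC) = sum_i w_i^2 * s_i^2 / n_i."""
    t, means, sds, n = map(lambda a: np.asarray(a, float), (t, means, sds, n))
    w = np.empty_like(t)
    w[0] = (t[1] - t[0]) / 2
    w[-1] = (t[-1] - t[-2]) / 2
    w[1:-1] = (t[2:] - t[:-2]) / 2                 # interior trapezoid weights
    auc = np.sum(w * means)
    se = np.sqrt(np.sum(w**2 * sds**2 / n))
    return auc, se

# hypothetical concentration-time summary data (destructive sampling design)
t = [0, 0.5, 1, 2, 4, 8]
means = [0.0, 12.0, 18.0, 14.0, 7.0, 2.0]
sds = [0.0, 3.0, 4.0, 3.5, 2.0, 0.8]
n = [5, 5, 5, 5, 5, 5]
auc, se = bailer_auc(t, means, sds, n)
print(f"AUC = {auc:.2f} +/- {1.96 * se:.2f} (95% CI half-width)")
```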

20.
We develop a general non-parametric approach to the analysis of clustered data via random effects. Assuming only that the link function is known, the regression functions and the distributions of both cluster means and observation errors are treated non-parametrically. Our argument proceeds by viewing the observation error at the cluster mean level as though it were a measurement error in an errors-in-variables problem, and using a deconvolution argument to access the distribution of the cluster mean. A Fourier deconvolution approach could be used if the distribution of the error-in-variables were known. In practice it is unknown, of course, but it can be estimated from repeated measurements, and in this way deconvolution can be achieved in an approximate sense. This argument might be interpreted as implying that large numbers of replicates are necessary for each cluster mean distribution, but that is not so; we avoid this requirement by incorporating statistical smoothing over values of nearby explanatory variables. Empirical rules are developed for the choice of smoothing parameter. Numerical simulations, and an application to real data, demonstrate small sample performance for this package of methodology. We also develop theory establishing statistical consistency.
