1.
We construct approximate confidence intervals for a nonparametric regression function, using polynomial splines with free-knot locations. The number of knots is determined by generalized cross-validation. The estimates of knot locations and coefficients are obtained through a non-linear least squares solution that corresponds to the maximum likelihood estimate. Confidence intervals are then constructed based on the asymptotic distribution of the maximum likelihood estimator. Average coverage probabilities and the accuracy of the estimate are examined via simulation. This includes comparisons between our method and some existing methods such as smoothing spline and variable knots selection as well as a Bayesian version of the variable knots method. Simulation results indicate that our method works well for smooth underlying functions and also reasonably well for discontinuous functions. It also performs well for fairly small sample sizes.
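The free-knot idea can be illustrated with a minimal sketch: profile least squares for a linear spline with a single free knot, where the knot is chosen by a grid search and the coefficients by ordinary least squares. (The paper uses cubic splines, several knots, and a full nonlinear least-squares solve; the grid search here is only a stand-in.)

```python
import numpy as np

def fit_free_knot(x, y, candidates):
    """Profile least squares for a one-knot linear spline:
    y ~ b0 + b1*x + b2*(x - t)_+ , with the knot t chosen to
    minimize the residual sum of squares over `candidates`."""
    best = None
    for t in candidates:
        X = np.column_stack([np.ones_like(x), x, np.clip(x - t, 0, None)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, t, beta)
    return best  # (rss, knot, coefficients)

# toy data with a slope change at x = 0.5
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.where(x < 0.5, x, 0.5 + 3 * (x - 0.5)) + 0.01 * rng.standard_normal(200)
rss, knot, beta = fit_free_knot(x, y, np.linspace(0.1, 0.9, 81))
```

With low noise the profiled knot lands close to the true change point at 0.5.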

2.
Two new stochastic search methods are proposed for optimizing the knot locations and/or smoothing parameters for least-squares or penalized splines. One of the methods is a golden-section-augmented blind search, while the other is a continuous genetic algorithm. Monte Carlo experiments indicate that the algorithms are very successful at producing knot locations and/or smoothing parameters that are near optimal in a squared error sense. Both algorithms are amenable to parallelization and have been implemented in OpenMP and MPI. An adjusted GCV criterion is also considered for selecting both the number and location of knots. The method performed well relative to MARS in a small empirical comparison.
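The golden-section component of the first search can be sketched in isolation. This is the generic golden-section minimizer one would wrap around a GCV or squared-error objective in the smoothing parameter; it is a textbook routine, not the authors' augmented blind search.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [a, b].
    Each iteration shrinks the bracket by the factor 1/phi ~ 0.618 and
    reuses one of the two interior evaluations."""
    invphi = (math.sqrt(5) - 1) / 2
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:               # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                     # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2

xmin = golden_section(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

In practice `f` would be, e.g., the GCV score of a penalized spline as a function of the log smoothing parameter.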

3.
The geographical relative risk function is a useful tool for investigating the spatial distribution of disease based on case and control data. The most common way of estimating this function is using the ratio of bivariate kernel density estimates constructed from the locations of cases and controls, respectively. An alternative is to use a local-linear (LL) estimator of the log-relative risk function. In both cases, the choice of bandwidth is critical. In this article, we examine the relative performance of the two estimation techniques using a variety of data-driven bandwidth selection methods, including likelihood cross-validation (CV), least-squares CV, rule-of-thumb reference methods, and a new approximate plug-in (PI) bandwidth for the LL estimator. Our analysis includes the comparison of asymptotic results; a simulation study; and application of the estimators on two real data sets. Our findings suggest that the density ratio method implemented with the least-squares CV bandwidth selector is generally best, with the LL estimator with PI bandwidth being competitive in applications with strong large-scale trends but much worse in situations with elliptical clusters.
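A one-dimensional sketch of the density-ratio estimator (the article works with bivariate kernel density estimates of case and control locations; `h` plays the role of the bandwidth whose selection the article studies):

```python
import numpy as np

def gauss_kde(points, grid, h):
    """Gaussian kernel density estimate evaluated at each grid value."""
    z = (grid[:, None] - points[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

def log_relative_risk(cases, controls, grid, h):
    """Log of the density-ratio relative risk estimator: log f_case - log f_control."""
    return np.log(gauss_kde(cases, grid, h)) - np.log(gauss_kde(controls, grid, h))

# cases cluster near 0, controls are spread uniformly
rng = np.random.default_rng(1)
cases = rng.normal(0.0, 0.5, 500)
controls = rng.uniform(-4.0, 4.0, 500)
grid = np.array([0.0, 3.0])
lrr = log_relative_risk(cases, controls, grid, h=0.5)
```

The estimated log-relative risk is elevated where the cases cluster and low in the tail, as expected.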

4.
In conditional logspline modelling, the logarithm of the conditional density function, log f(y|x), is modelled by using polynomial splines and their tensor products. The parameters of the model (coefficients of the spline functions) are estimated by maximizing the conditional log-likelihood function. The resulting estimate is a density function (positive and integrating to one) and is twice continuously differentiable. The estimate is used further to obtain estimates of regression and quantile functions in a natural way. An automatic procedure for selecting the number of knots and knot locations based on minimizing a variant of the AIC is developed. An example with real data is given. Finally, extensions and further applications of conditional logspline models are discussed.

5.
Methods for choosing a fixed set of knot locations in additive spline models are fairly well established in the statistical literature. The curse of dimensionality makes it nontrivial to extend these methods to nonadditive surface models, especially when there are more than a couple of covariates. We propose a multivariate Gaussian surface regression model that combines both additive splines and interactive splines, and a highly efficient Markov chain Monte Carlo algorithm that updates all the knot locations jointly. We use a shrinkage prior to avoid overfitting, with different estimated shrinkage factors for the additive and surface parts of the model, and also different shrinkage parameters for the different response variables. Simulated data and an application to firm leverage data show that the approach is computationally efficient, and that allowing for freely estimated knot locations can offer a substantial improvement in out‐of‐sample predictive performance.

6.
Spatial econometric models estimated on big geo-located point data face at least two problems: limited computational capability and inefficient forecasting for new out-of-sample geo-points. This is because the spatial weights matrix W is defined for in-sample observations only, and because of the computational complexity involved. Machine learning models face the same difficulty when kriging is used for prediction, so the problem has remained unsolved. The paper presents a novel methodology for estimating spatial models on big data and predicting in new locations. The approach uses bootstrap and tessellation to calibrate both the model and the space. The best bootstrapped model is selected with the PAM (Partitioning Around Medoids) algorithm by classifying the regression coefficients jointly, in a non-independent manner. Voronoi polygons for the geo-points used in the best model allow for a representative division of space. New out-of-sample points are assigned to tessellation tiles and linked to the spatial weights matrix as replacements for the original points, which makes the calibrated spatial models usable as a forecasting tool for new locations. There is no trade-off between forecast quality and computational efficiency in this approach. An empirical example illustrates a model for business locations and firms' profitability.
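The tile-assignment step can be sketched directly: a new point belongs to the Voronoi polygon of its nearest generator, so assignment reduces to a nearest-seed lookup. (This is a minimal illustration of that one step, not the authors' full bootstrap-and-PAM pipeline.)

```python
import numpy as np

def assign_to_tiles(new_pts, seeds):
    """Assign out-of-sample points to Voronoi tiles: the tile of a point
    is the index of its nearest seed (the generator of that polygon)."""
    d = np.linalg.norm(new_pts[:, None, :] - seeds[None, :, :], axis=2)
    return d.argmin(axis=1)

# three tile generators and three new points, one near each generator
seeds = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_pts = np.array([[0.1, 0.1], [0.9, -0.1], [0.2, 0.8]])
tiles = assign_to_tiles(new_pts, seeds)  # -> array([0, 1, 2])
```

Once a new point has a tile index, the in-sample point generating that tile can stand in for it in the fixed spatial weights matrix W.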

7.
In this article, we study the problem of estimating the prevalence rate of a disease in a geographical area, based on data collected from a sample of locations within this area. If there are several locations with zero incidence of the disease, the usual estimators are not suitable and so we develop a new estimator, together with an unbiased estimator of its variance, which may be appropriately used in such situations. An application of this estimator is illustrated with data from a large-scale survey, which was carried out in the city of Kolkata, India, to estimate the prevalence rate of stroke. We show that spatial modelling may be used to smooth the observed data before applying our proposed estimator. Our computations show that this smoothing helps to reduce the coefficient of variation and such a model-cum-design-based procedure is useful for estimating the prevalence rate. This method may of course be used in other similar situations.

8.
In this paper, we propose modified spline estimators for nonparametric regression models with right-censored data, especially when the censored response observations are converted to synthetic data. Efficient implementation of these estimators depends on the set of knot points and an appropriate smoothing parameter. We use three algorithms, the default selection method (DSM), myopic algorithm (MA), and full search algorithm (FSA), to select the optimum set of knots in a penalized spline method based on a smoothing parameter, which is chosen based on different criteria, including the improved version of the Akaike information criterion (AICc), generalized cross validation (GCV), restricted maximum likelihood (REML), and Bayesian information criterion (BIC). We also consider the smoothing spline (SS), which uses all the data points as knots. The main goal of this study is to compare the performance of the algorithm and criteria combinations in the suggested penalized spline fits under censored data. A Monte Carlo simulation study is performed and a real data example is presented to illustrate the ideas in the paper. The results confirm that the FSA slightly outperforms the other methods, especially for high censoring levels.
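The GCV criterion used for choosing the smoothing parameter has a simple closed form for any linear smoother ŷ = Hy: GCV(λ) = n·RSS(λ) / (n − tr H)². A minimal sketch for a penalized least-squares fit follows, with a generic ridge-type penalty matrix `D` standing in for the spline penalties of the paper.

```python
import numpy as np

def gcv(X, y, lam, D):
    """Generalized cross-validation score for the penalized fit
    y_hat = H y with H = X (X'X + lam*D)^{-1} X'."""
    n = len(y)
    H = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
    resid = y - H @ y
    return n * float(resid @ resid) / (n - np.trace(H)) ** 2

# toy cubic regression: the GCV score should favour light penalization
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
X = np.vander(x, 4, increasing=True)
y = X @ np.array([1.0, -2.0, 3.0, 0.5]) + 0.1 * rng.standard_normal(50)
D = np.eye(4)
scores = {lam: gcv(X, y, lam, D) for lam in (1e-8, 1.0, 1e8)}
```

In a real penalized spline fit one would minimize this score over λ, e.g. on a log grid or with a line search.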

9.
Both spatial heterogeneity and spatial correlation are accounted for in the geographically weighted spatial autoregressive model, which has recently attracted attention in the literature. Existing research on estimating this model assumes that the error terms are independent and identically distributed. In this article we use a computationally simple procedure for estimating the model with spatially autoregressive disturbance terms, obtaining estimates of both the constant and the varying coefficients. Finally, we establish the large-sample properties of the estimators under standard conditions. Applications of these estimation methods will be explored further in a separate study.

10.
With the ready availability of spatial databases and geographical information system software, statisticians are increasingly encountering multivariate modelling settings featuring associations of more than one type: spatial associations between data locations and associations between the variables within the locations. Although flexible modelling of multivariate point-referenced data has recently been addressed by using a linear model of co-regionalization, existing methods for multivariate areal data typically suffer from unnecessary restrictions on the covariance structure or undesirable dependence on the conditioning order of the variables. We propose a class of Bayesian hierarchical models for multivariate areal data that avoids these restrictions, permitting flexible and order-free modelling of correlations both between variables and across areal units. Our framework encompasses a rich class of multivariate conditionally autoregressive models that are computationally feasible via modern Markov chain Monte Carlo methods. We illustrate the strengths of our approach over existing models by using simulation studies and also offer a real data application involving annual lung, larynx and oesophageal cancer death rates in Minnesota counties between 1990 and 2000.

11.
Nutritional status is an important determinant of a child's health and survival. Evaluation of child nutritional status is usually based on three anthropometric indices: height-for-age (stunting), weight-for-height (wasting) and weight-for-age (underweight). This paper is a case study that focuses on a new approach to estimating nonlinear effects. It models the dependence of the probability of underweight children in Zambia on some covariates, some of which are metrical, whose effects are assumed to be nonlinear and are estimated nonparametrically through a Bayesian B-spline basis function approach with adaptive knot selection, using the data set from the 1992 Zambia Demographic and Health Survey (ZDHS). For all the unknown functions, the number and location of knots as well as the unknown coefficients of the basis functions are estimated simultaneously using reversible jump Markov chain Monte Carlo (RJMCMC) techniques.

12.

Regression spline smoothing is a popular approach for conducting nonparametric regression. An important issue associated with it is the choice of a "theoretically best" set of knots. Different statistical model selection methods, such as Akaike's information criterion and generalized cross-validation, have been applied to derive different "theoretically best" sets of knots. Typically these best knot sets are defined implicitly as the optimizers of some objective functions. Hence another equally important issue concerning regression spline smoothing is how to optimize such objective functions. In this article different numerical algorithms that are designed for carrying out such optimization problems are compared by means of a simulation study. Both the univariate and bivariate smoothing settings will be considered. Based on the simulation results, recommendations for choosing a suitable optimization algorithm under various settings will be provided.

13.
Studies on diffusion tensor imaging (DTI) quantify the diffusion of water molecules in a brain voxel using an estimated 3 × 3 symmetric positive definite (p.d.) diffusion tensor matrix. Due to the challenges associated with modelling matrix‐variate responses, the voxel‐level DTI data are usually summarized by univariate quantities, such as fractional anisotropy. This approach leads to evident loss of information. Furthermore, DTI analyses often ignore the spatial association among neighbouring voxels, leading to imprecise estimates. Although the spatial modelling literature is rich, modelling spatially dependent p.d. matrices is challenging. To mitigate these issues, we propose a matrix‐variate Bayesian semiparametric mixture model, where the p.d. matrices are distributed as a mixture of inverse Wishart distributions, with the spatial dependence captured by a Markov model for the mixture component labels. Related Bayesian computing is facilitated by conjugacy results and use of the double Metropolis–Hastings algorithm. Our simulation study shows that the proposed method is more powerful than competing non‐spatial methods. We also apply our method to investigate the effect of cocaine use on brain microstructure. By extending spatial statistics to matrix‐variate data, we contribute to providing a novel and computationally tractable inferential tool for DTI analysis.

14.
Bartlett correction constitutes one of the attractive features of empirical likelihood because it enables the construction of confidence regions for parameters with improved coverage probabilities. We study the Bartlett correction of spatial frequency domain empirical likelihood (SFDEL) based on general spectral estimating functions for regularly spaced spatial data. This general formulation can be applied to testing and estimation problems in spatial analysis, for example testing covariance isotropy, testing covariance separability as well as estimating the parameters of spatial covariance models. We show that the SFDEL is Bartlett correctable. In particular, the improvement in coverage accuracies of the Bartlett‐corrected confidence regions depends on the underlying spatial structures. The Canadian Journal of Statistics 47: 455–472; 2019 © 2019 Statistical Society of Canada

15.
Weighted log‐rank estimating function has become a standard estimation method for the censored linear regression model, or the accelerated failure time model. Well established statistically, the estimator defined as a consistent root has, however, rather poor computational properties because the estimating function is neither continuous nor, in general, monotone. We propose a computationally efficient estimator through an asymptotics‐guided Newton algorithm, in which censored quantile regression methods are tailored to yield an initial consistent estimate and a consistent derivative estimate of the limiting estimating function. We also develop fast interval estimation with a new proposal for sandwich variance estimation. The proposed estimator is asymptotically equivalent to the consistent root estimator and barely distinguishable in samples of practical size. However, computation time is typically reduced by two to three orders of magnitude for point estimation alone. Illustrations with clinical applications are provided.

16.
We consider variable selection in linear regression of geostatistical data that arise often in environmental and ecological studies. A penalized least squares procedure is studied for simultaneous variable selection and parameter estimation. Various penalty functions are considered including smoothly clipped absolute deviation. Asymptotic properties of penalized least squares estimates, particularly the oracle properties, are established, under suitable regularity conditions imposed on a random field model for the error process. Moreover, computationally feasible algorithms are proposed for estimating regression coefficients and their standard errors. Finite‐sample properties of the proposed methods are investigated in a simulation study and comparison is made among different penalty functions. The methods are illustrated by an ecological dataset of landcover in Wisconsin. The Canadian Journal of Statistics 37: 607–624; 2009 © 2009 Statistical Society of Canada

17.
We study methods to estimate regression and variance parameters for over-dispersed and correlated count data from highly stratified surveys. Our application involves counts of fish catches from stratified research surveys and we propose a novel model in fisheries science to address changes in survey protocols. A challenge with this model is the large number of nuisance parameters which leads to computational issues and biased statistical inferences. We use a computationally efficient profile generalized estimating equation method and compare it to marginal maximum likelihood (MLE) and restricted MLE (REML) methods. We use REML to address bias and inaccurate confidence intervals because of many nuisance parameters. The marginal MLE and REML approaches involve intractable integrals and we used a new R package that is designed for estimating complex nonlinear models that may include random effects. We conclude from simulation analyses that the REML method provides more reliable statistical inferences among the three methods we investigated.

18.
Spatially-adaptive Penalties for Spline Fitting
The paper studies spline fitting with a roughness penalty that adapts to spatial heterogeneity in the regression function. The estimates are pth-degree piecewise polynomials with p − 1 continuous derivatives. A large and fixed number of knots is used and smoothing is achieved by putting a quadratic penalty on the jumps of the pth derivative at the knots. To be spatially adaptive, the logarithm of the penalty is itself a linear spline but with relatively few knots and with values at the knots chosen to minimize the generalized cross validation (GCV) criterion. This locally-adaptive spline estimator is compared with other spline estimators in the literature such as cubic smoothing splines and knot-selection techniques for least squares regression. Our estimator can be interpreted as an empirical Bayes estimate for a prior allowing spatial heterogeneity. In cases of spatially heterogeneous regression functions, empirical Bayes confidence intervals using this prior achieve better pointwise coverage probabilities than confidence intervals based on a global-penalty parameter. The method is developed first for univariate models and then extended to additive models.
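The fixed-knot construction with a quadratic penalty on derivative jumps can be sketched for a global (non-adaptive) penalty: in the truncated power basis, the coefficient of (x − κ)₊^p is proportional to the jump of the pth derivative at κ, so the penalty is ridge-type on those coefficients only. This sketch uses a single λ; the paper lets log λ itself vary spatially as a spline.

```python
import numpy as np

def pspline_fit(x, y, knots, lam, p=1):
    """Penalized spline with fixed knots and a quadratic penalty on the
    jumps of the p-th derivative at the knots: in the truncated power
    basis the coefficient of (x - k)_+^p carries that jump (up to p!),
    so we penalize only those coefficients."""
    poly = np.vander(x, p + 1, increasing=True)        # 1, x, ..., x^p
    trunc = np.clip(x[:, None] - knots[None, :], 0, None) ** p
    X = np.hstack([poly, trunc])
    D = np.diag([0.0] * (p + 1) + [1.0] * len(knots))  # penalize jumps only
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta

# many fixed knots; the penalty, not knot selection, controls smoothness
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x)
knots = np.linspace(0.05, 0.95, 19)
fit_small = pspline_fit(x, y, knots, lam=1e-4)   # light penalty: flexible
fit_large = pspline_fit(x, y, knots, lam=1e6)    # heavy penalty: nearly linear
```

A heavy penalty forces all derivative jumps towards zero, shrinking the fit to a global degree-p polynomial, which is why the residuals grow with λ.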

19.
Markov chain Monte Carlo (MCMC) implementations of Bayesian inference for latent spatial Gaussian models are very computationally intensive, and restrictions on storage and computation time are limiting their application to large problems. Here we propose various parallel MCMC algorithms for such models. The algorithms' performance is discussed with respect to a simulation study, which demonstrates the increase in speed with which the algorithms explore the posterior distribution as a function of the number of processors. We also discuss how feasible problem size is increased by use of these algorithms.

20.
We consider the problem of variable selection in high-dimensional partially linear models with longitudinal data. A variable selection procedure is proposed based on the smooth-threshold generalized estimating equation (SGEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to be zero, and simultaneously estimates the nonzero regression coefficients by solving the SGEE. We establish the asymptotic properties in a high-dimensional framework where the number of covariates p_n increases as the number of clusters n increases. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure.
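The thresholding mechanism can be sketched as follows. The shrinkage factor δ_j = τ/|β̃_j|^(1+a) is the form popularized by Ueki's smooth-threshold estimating equations, on which the SGEE approach builds; the names `a` and `tau` here are illustrative tuning constants, and the real procedure solves estimating equations iteratively rather than thresholding a fixed coefficient vector.

```python
def smooth_threshold(beta, a, tau):
    """Toy smooth-threshold rule: shrink each coefficient by a data-driven
    factor delta_j = tau / |beta_j|**(1 + a) and set it exactly to zero
    when delta_j >= 1 (small coefficients are eliminated, large ones are
    only mildly shrunk)."""
    out = []
    for b in beta:
        delta = tau / abs(b) ** (1 + a) if b != 0 else 1.0
        out.append(0.0 if delta >= 1 else (1 - delta) * b)
    return out

# a strong signal survives almost unshrunk; a tiny one is zeroed out
est = smooth_threshold([2.0, 0.01], a=1.0, tau=0.05)
```

This is how inactive predictors are removed automatically: their shrinkage factor exceeds one, so the rule sets them exactly to zero.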
