Similar literature
 20 similar articles found (search time: 31 ms)
1.
Transductive methods are useful in prediction problems when the training dataset is composed of a large number of unlabeled observations and a smaller number of labeled observations. In this paper, we propose an approach for developing transductive prediction procedures that are able to take advantage of the sparsity in the high dimensional linear regression. More precisely, we define transductive versions of the LASSO (Tibshirani, 1996) and the Dantzig Selector (Candès and Tao, 2007). These procedures combine labeled and unlabeled observations of the training dataset to produce a prediction for the unlabeled observations. We propose an experimental study of the transductive estimators that shows that they improve the LASSO and Dantzig Selector in many situations, and particularly in high dimensional problems when the predictors are correlated. We then provide non-asymptotic theoretical guarantees for these estimation methods. Interestingly, our theoretical results show that the Transductive LASSO and Dantzig Selector satisfy sparsity inequalities under weaker assumptions than those required for the “original” LASSO.

2.
Most problems related to environmental studies are innately multivariate. In fact, more than one variable is usually measured at each spatial location. In multivariate geostatistical data analysis, where we intend to predict the value of a random vector at a new site that has no data, the cokriging method is used as the best linear unbiased prediction. In lattice data analysis, where the probability modeling of data is almost exclusively of concern, only the auto-Gaussian model has been used for continuous multivariate data. For discrete multivariate data, little work has been carried out. In this paper, an auto-multinomial model is suggested for analyzing multivariate discrete lattice data. The proposed method is illustrated by a real example of air pollution in Tehran, Iran.

3.
Bayesian analysis of discrete time warranty data   Total citations: 1, self-citations: 0, citations by others: 1
Summary.  The analysis of warranty claim data, and their use for prediction, has been a topic of active research in recent years. Field data comprising numbers of units returned under guarantee are examined, covering both situations in which the ages of the failed units are known and in which they are not. The latter case poses particular computational problems for likelihood-based methods because of the large number of feasible failure patterns that must be included as contributions to the likelihood function. For prediction of future warranty exposure, which is of central concern to the manufacturer, the Bayesian approach is adopted. For this, Markov chain Monte Carlo methodology is developed.

4.
Summary.  We consider the problem of multistep-ahead prediction in time series analysis by using nonparametric smoothing techniques. Forecasting is always one of the main objectives in time series analysis. Research has shown that non-linear time series models have certain advantages in multistep-ahead forecasting. Traditionally, nonparametric $k$-step-ahead least squares prediction for non-linear autoregressive AR($d$) models is done by estimating $E(X_{t+k} \mid X_t, \ldots, X_{t-d+1})$ via nonparametric smoothing of $X_{t+k}$ on $(X_t, \ldots, X_{t-d+1})$ directly. We propose a multistage nonparametric predictor. We show that the new predictor has smaller asymptotic mean-squared error than the direct smoother, though the convergence rate is the same. Hence, the proposed predictor is more efficient. Some simulation results, advice for practical bandwidth selection and a real data example are provided.

5.
The multivariate t linear mixed model (MtLMM) has been recently proposed as a robust tool for analysing multivariate longitudinal data with atypical observations. Missing outcomes frequently occur in longitudinal research even in well controlled situations. As a powerful alternative to the traditional expectation maximization based algorithm employing single imputation, we consider a Bayesian analysis of the MtLMM to account for the uncertainties of model parameters and missing outcomes through multiple imputation. An inverse Bayes formulas sampler coupled with a Metropolis-within-Gibbs scheme is used to effectively draw from the posterior distributions of latent data and model parameters. The techniques for multiple imputation of missing values, estimation of random effects, prediction of future responses, and diagnostics of potential outliers are investigated as well. The proposed methodology is illustrated through a simulation study and an application to AIDS/HIV data.

6.
Many methods used in spatial statistics are computationally demanding, and so the development of more computationally efficient methods has received attention. An important development is the integrated nested Laplace approximation method, which carries out Bayesian analysis more efficiently. For geostatistical data, this method is applied through the SPDE approach, which requires the creation of a mesh overlying the study area, and all the results obtained depend on it. The impact of the mesh on inference and prediction is investigated through simulations. As there is no formal procedure to specify it, we investigate a guideline to create an optimal mesh.

7.
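唐晓彬 (Tang Xiaobin) et al., 《统计研究》 (Statistical Research), 2020, 37(7): 104-115
Macroeconomic indicators such as the consumer confidence index exhibit lagged effects over time and dynamically changing multidimensionality, which makes them difficult to predict accurately. Building on the Long Short-Term Memory (LSTM) neural network model from machine learning, this paper uses big data techniques to mine web search data (User Search, US) related to the consumer confidence index and constructs an LSTM&US prediction model. The model is applied to long-, medium-, and short-term prediction of China's consumer confidence index, and several benchmark prediction models are introduced for comparison. The results show that introducing web search data improves the predictive performance and accuracy of the LSTM neural network model; that the LSTM&US model generalizes well, gives stable predictions across horizons, and outperforms the other six benchmark models (LSTM, SVR&US, RFR&US, BP&US, XGB&US, and LGB&US) in both predictive performance and accuracy; and that the proposed LSTM&US model has practical value. The method offers a new line of research for predicting and anticipating the consumer confidence index and enriches the theoretical study of machine learning methods in macroeconomic indicator forecasting.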

8.
We use a Bayesian multivariate time series model for the analysis of the dynamics of carbon monoxide atmospheric concentrations. The data are observed at four sites. It is assumed that the logarithm of the observed process can be represented as the sum of unobservable components: a trend, a daily periodicity, a stationary autoregressive signal and an erratic term. Bayesian analysis is performed via Gibbs sampling. In particular, we consider the problem of joint temporal prediction when data are observed at a few sites and it is not possible to fit a complex space–time model. A retrospective analysis of the trend component is also given, which is important in that it explains the evolution of the variability in the observed process.

9.
Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

10.
The accurate estimation of an individual's usual dietary intake is an important topic in nutritional epidemiology. This paper considers the best linear unbiased predictor (BLUP) computed from repeatedly measured dietary data and derives several nonparametric prediction intervals for true intake. However, the performance of the BLUP and the validity of prediction intervals depend on whether the model assumptions required for the true intake estimation problem hold. To address this issue, the paper examines how the BLUP and prediction intervals behave when model assumptions are violated, and then proposes an analysis pipeline for checking them with data.

11.
This paper focuses on the analysis of errors between a flight trajectory prediction model and flight data. A novel stochastic prediction flight model is compared with the popular fly-by and fly-over turn models. The propagated error is measured using either spatial coordinates or angles. Depending on the case, the distribution of error is estimated and confidence bounds for the linear and directional mean are provided for all three stochastic flight models.

12.
13.
Factor analysis of multivariate spatial data is considered. A systematic approach for modeling the underlying structure of potentially irregularly spaced, geo-referenced vector observations is proposed. Statistical inference procedures for selecting the number of factors and for model building are discussed. We derive a condition under which a simple and practical inference procedure is valid without specifying the form of distributions and factor covariance functions. The multivariate prediction problem is also discussed, and a procedure combining the latent variable modeling and a measurement-error-free kriging technique is introduced. Simulation results and an example using agricultural data are presented.

14.
Partially linear single-index models play important roles in advanced non-/semi-parametric statistics due to their generality and flexibility. We generalise these models from a univariate response to multivariate responses. A Bayesian method with free-knot splines is used to analyse the proposed models, including estimation and prediction, and a Metropolis-within-Gibbs sampler is provided for posterior exploration. We also utilise the partially collapsed idea in our algorithm to speed up the convergence. The proposed models and methods of analysis are demonstrated by simulation studies and are applied to a real data set.

15.
Massive correlated data with many inputs are often generated from computer experiments to study complex systems. The Gaussian process (GP) model is a widely used tool for the analysis of computer experiments. Although GPs provide a simple and effective approximation to computer experiments, two critical issues remain unresolved. One is the computational issue in GP estimation and prediction where intensive manipulations of a large correlation matrix are required. For a large sample size and with a large number of variables, this task is often unstable or infeasible. The other issue is how to improve the naive plug-in predictive distribution which is known to underestimate the uncertainty. In this article, we introduce a unified framework that can tackle both issues simultaneously. It consists of a sequential split-and-conquer procedure, an information combining technique using confidence distributions (CD), and a frequentist predictive distribution based on the combined CD. It is shown that the proposed method maintains the same asymptotic efficiency as the conventional likelihood inference under mild conditions, but dramatically reduces the computation in both estimation and prediction. The predictive distribution contains comprehensive information for inference and provides a better quantification of predictive uncertainty as compared with the plug-in approach. Simulations are conducted to compare the estimation and prediction accuracy with some existing methods, and the computational advantage of the proposed method is also illustrated. The proposed method is demonstrated by a real data example based on tens of thousands of computer experiments generated from a computational fluid dynamic simulator.

16.
Functional data analysis has become an important area of research because of its ability to handle high‐dimensional and complex data structures. However, its development is limited in the context of linear mixed effect models and, in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area‐level data and fit a varying coefficient linear mixed effect model where the varying coefficients are semiparametrically modelled via B‐splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.

17.
It is often necessary to run response surface designs in blocks. In this paper the analysis of data from such experiments, using polynomial regression models, is discussed. The definition and estimation of pure error in blocked designs are considered. It is recommended that pure error is estimated by assuming additive block and treatment effects, as this is more consistent with designs without blocking. The recovery of inter-block information using REML analysis is discussed, although it is shown that it has very little impact if the design is nearly orthogonally blocked. Finally prediction from blocked designs is considered and it is shown that prediction of many quantities of interest is much simpler than prediction of the response itself.

18.
A nonparametric inference algorithm developed by Davis and Geman (1983) is extended and applied to a medical prediction problem. The algorithm employs an estimation procedure for acquiring pairwise statistics among variables of a binary data set, allows for the data-driven creation of interaction terms among the variables, and employs a decision rule which asymptotically gives the minimum expected error. The inference procedure was designed for large data sets but has been extended via the method of cross-validation to encompass smaller data sets.

19.
Degradation tests are especially difficult to conduct for items with high reliability. Test costs, caused mainly by prolonged item duration and item destruction costs, establish the necessity of sequential degradation test designs. We propose a methodology that sequentially selects the optimal observation times to measure the degradation, using a convenient rule that maximizes the inference precision and minimizes test costs. In particular, our objective is to estimate a quantile of the time to failure distribution, where the degradation process is modelled as a linear model using Bayesian inference. The proposed sequential analysis is based on an index that measures the expected discrepancy between the estimated quantile and its corresponding prediction, using Monte Carlo methods. The procedure was successfully implemented for simulated and real data.

20.
This article enlarges the covariance configurations on which the classical linear discriminant analysis is based by considering the four models arising from the spectral decomposition when the eigenvalue and/or eigenvector matrices are allowed to vary or not between groups. As in the classical approach, the assessment of these configurations is accomplished via a test on the training set. The discrimination rule is then built upon the configuration provided by the test, considering or not the unlabeled data. Numerical experiments, on simulated and real data, have been performed to evaluate the gain of our proposal with respect to the linear discriminant analysis.
