Similar Documents
20 similar documents found.
1.
The estimation of Bayesian networks given high-dimensional data, in particular gene expression data, has been the focus of much recent research. Whilst there are several methods available for the estimation of such networks, these typically assume that the data consist of independent and identically distributed samples. It is often the case, however, that the available data have a more complex mean structure, plus additional components of variance, which must then be accounted for in the estimation of a Bayesian network. In this paper, score metrics that take account of such complexities are proposed for use in conjunction with score-based methods for the estimation of Bayesian networks. We propose, first, a fully Bayesian score metric and, second, a metric inspired by the notion of restricted maximum likelihood. We demonstrate the performance of these new metrics for the estimation of Bayesian networks using simulated data with known complex mean structures. We then present an analysis of the expression levels of grape-berry genes, adjusting for exogenous variables believed to affect those expression levels. Demonstrable biological effects can be inferred from the estimated conditional independence relationships and correlations amongst the grape-berry genes.
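As a point of reference for the score metrics discussed above, the sketch below shows a standard decomposable BIC-type score for a Gaussian DAG under the usual i.i.d. assumption — the baseline setting that the paper's fully Bayesian and REML-inspired metrics generalize. The function names and toy data are illustrative, not taken from the paper.

```python
import numpy as np

def bic_node_score(X, child, parents):
    """BIC contribution of one node given its parent set
    (Gaussian linear model, i.i.d. rows)."""
    n = X.shape[0]
    y = X[:, child]
    Z = np.column_stack([np.ones(n), X[:, parents]]) if parents else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    k = Z.shape[1] + 1                       # coefficients + error variance
    return -0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)

def bic_dag_score(X, dag):
    """Decomposable score of a DAG given as {child: [parents]}."""
    return sum(bic_node_score(X, c, ps) for c, ps in dag.items())

# toy usage: data generated from X1 -> X2
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(size=200)
X = np.column_stack([x1, x2])
print(bic_dag_score(X, {0: [], 1: [0]}))     # should beat the empty graph
print(bic_dag_score(X, {0: [], 1: []}))
```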

2.
Incomplete data subject to non-ignorable non-response are often encountered in practice and suffer from a non-identifiability problem. A follow-up sample is randomly selected from the set of non-respondents to avoid the non-identifiability problem and obtain complete responses. Glynn, Laird, & Rubin analyzed non-ignorable missing data with a follow-up sample under a pattern mixture model. In this article, maximum likelihood estimation of the parameters of categorical missing data is considered with a follow-up sample under a selection model. To estimate the parameters with non-ignorable missing data, the EM algorithm with weighting, proposed by Ibrahim, is used; that is, in the E-step, the weighted mean is calculated using fractional weights for the imputed data. Variances are estimated using an approximate jackknife method. Simulation results are presented to compare the proposed method with previously proposed methods.
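A minimal sketch of the fractional-weighting idea, assuming a categorical outcome whose response probability depends on the outcome itself (the selection-model setting). Here `resp_prob` is treated as known; in the article it is precisely what the follow-up subsample makes estimable. All names are illustrative.

```python
import numpy as np

def em_selection_model(counts, n_missing, resp_prob, n_iter=200):
    """Toy EM with fractional weights under a selection model:
    counts[y] = respondents observed in category y, and resp_prob[y]
    is the (assumed known) probability that a unit with outcome y responds."""
    counts = np.asarray(counts, float)
    pi = np.asarray(resp_prob, float)
    n = counts.sum() + n_missing
    p = counts / counts.sum()                # initial guess
    for _ in range(n_iter):
        # E-step: fractional weight of each category for a missing unit
        w = p * (1.0 - pi)
        w /= w.sum()
        # M-step: weighted category proportions
        p = (counts + n_missing * w) / n
    return p

# 70 respondents in category 0, 30 in category 1, 50 non-respondents;
# category 1 is assumed half as likely to respond
print(em_selection_model([70, 30], 50, resp_prob=[0.8, 0.4]))
```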

3.
In this paper, we propose the hard thresholding regression (HTR) for estimating high-dimensional sparse linear regression models. HTR uses a two-stage convex algorithm to approximate the ℓ0-penalized regression: the first stage calculates a coarse initial estimator, and the second stage identifies the oracle estimator by borrowing information from the first one. Theoretically, the HTR estimator achieves the strong oracle property over a wide range of regularization parameters. Numerical examples and a real data example lend further support to our proposed methodology.
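A hedged sketch of the two-stage idea (not the paper's exact algorithm): a coarse convex first-stage estimator, then a hard threshold and an OLS refit on the retained support. The tuning values are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso

def hard_threshold_regression(X, y, alpha=0.1, tau=0.05):
    """Two-stage sketch in the spirit of HTR: a convex initial fit,
    then hard thresholding and a least-squares refit on the support."""
    beta0 = Lasso(alpha=alpha).fit(X, y).coef_      # stage 1: coarse estimator
    support = np.flatnonzero(np.abs(beta0) > tau)   # stage 2: hard threshold
    beta = np.zeros(X.shape[1])
    if support.size:
        beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return beta
```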

4.
With reference to a specific dataset, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location-dependent first-order term and pairwise interaction only. A priori we assume that the first-order term is a shot noise process, and that the interaction function for a pair of points depends only on the distance between the two points and is a piecewise linear function modelled by a marked Poisson process. Simulation of the resulting posterior distribution using a Metropolis–Hastings algorithm in the ‘conventional’ way involves evaluating ratios of unknown normalizing constants. We avoid this problem by applying a recently introduced auxiliary variable technique. In the present setting, the auxiliary variable used is an example of a partially ordered Markov point process model.
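The auxiliary-variable trick can be illustrated with the closely related exchange algorithm for doubly-intractable posteriors, in which the unknown normalizing constants cancel from the acceptance ratio; the paper's construction differs in detail. Every argument below (proposal, prior, unnormalized likelihood, exact model sampler) is user-supplied.

```python
import numpy as np

def exchange_step(theta, x_data, unnorm_loglik, sample_from_model,
                  propose, log_prior, rng):
    """One MH step for a doubly-intractable posterior using an auxiliary
    draw from the model, so the unknown normalizing constants cancel
    (exchange-algorithm form; assumes `sample_from_model` yields an
    exact draw, e.g. via perfect simulation)."""
    theta_new = propose(theta, rng)
    x_aux = sample_from_model(theta_new, rng)       # auxiliary variable
    log_ratio = (log_prior(theta_new) - log_prior(theta)
                 + unnorm_loglik(x_data, theta_new) - unnorm_loglik(x_data, theta)
                 + unnorm_loglik(x_aux, theta) - unnorm_loglik(x_aux, theta_new))
    return theta_new if np.log(rng.uniform()) < log_ratio else theta
```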

5.
The group Lasso is a penalized regression method used in regression problems where the covariates are partitioned into groups, to promote sparsity at the group level [M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B 68 (2006), pp. 49–67]. Quantile group Lasso, a natural extension of quantile Lasso [Y. Wu and Y. Liu, Variable selection in quantile regression, Statist. Sinica 19 (2009), pp. 801–817], is a good alternative when the data have group information and many outliers and/or heavy tails. Much attention has been paid to discovering important features that are correlated with the outcomes of interest and immune to outliers. In many applications, however, we may also want to retain the flexibility of selecting variables within a group. In this paper, we develop a sparse group variable selection method based on quantile regression which selects important covariates at both the group level and the within-group level, penalizing the empirical check loss function by the sum of square-root group-wise L1-norm penalties. The oracle properties are established in the setting where the number of parameters diverges. We also apply the new method to a varying coefficient model with categorical effect modifiers. Simulations and a real data example show that the newly proposed method has robust and superior performance.
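For concreteness, a sketch of the penalized criterion exactly as the abstract describes it — empirical check loss plus the sum of square-root group-wise L1 norms; the optimizer itself is omitted. The group index sets and tuning parameter are assumptions of the sketch.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return np.sum(u * (tau - (u < 0)))

def sparse_group_quantile_objective(beta, X, y, groups, tau, lam):
    """Empirical check loss plus sum of square-root group-wise L1 norms,
    inducing sparsity both across and within groups."""
    fit = check_loss(y - X @ beta, tau) / len(y)
    penalty = sum(np.sqrt(np.sum(np.abs(beta[g]))) for g in groups)
    return fit + lam * penalty

# usage: groups as index arrays, e.g. groups = [np.arange(0, 3), np.arange(3, 5)]
```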

6.
Abstract. We propose an ℓ1-penalized estimation procedure for high-dimensional linear mixed-effects models. The models are useful whenever there is a grouping structure among high-dimensional observations, that is, for clustered data. We prove a consistency and an oracle optimality result and we develop an algorithm with provable numerical convergence. Furthermore, we demonstrate the performance of the method on simulated data and on a real high-dimensional data set.
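A minimal sketch of the penalized criterion, assuming a single cluster and known variance components: the Gaussian marginal log-likelihood of the mixed model plus an ℓ1 penalty on the fixed effects. The paper's procedure also estimates the variance components; this sketch does not.

```python
import numpy as np

def l1_lmm_objective(beta, X, y, Z, Gamma, sigma2, lam):
    """Penalized criterion for one cluster: Gaussian negative
    log-likelihood with marginal covariance V = Z Gamma Z' + sigma2 I,
    plus an l1 penalty on the fixed effects beta."""
    n = len(y)
    V = Z @ Gamma @ Z.T + sigma2 * np.eye(n)
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    nll = 0.5 * (logdet + r @ np.linalg.solve(V, r) + n * np.log(2 * np.pi))
    return nll + lam * np.sum(np.abs(beta))
```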

7.
In economics and gene expression studies, where a large number of variables are involved, the maximum sample correlation can be large whenever the dimension is high, even if the predictors are independent. Variable selection is a fundamental method for dealing with such models. Ridge regression performs well when the predictors are highly correlated, while some nonconcave penalized thresholding estimators enjoy the nice oracle property. In order to provide a satisfactory solution to the collinearity problem, in this paper we propose combined penalization (CP), which mixes a nonconcave penalty with a ridge penalty, with a diverging number of parameters. The CP estimator with a diverging number of parameters can correctly select covariates with nonzero coefficients and estimate parameters simultaneously in the presence of multicollinearity. Simulation studies and a real data example demonstrate the good performance of the proposed method.
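A sketch of a combined-penalization criterion under the assumption that the nonconcave component is SCAD (one common choice; the paper may use another): least squares plus SCAD plus a ridge term.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty, a standard nonconcave choice."""
    t = np.abs(t)
    return np.where(t <= lam, lam * t,
           np.where(t <= a * lam,
                    (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    lam**2 * (a + 1) / 2))

def cp_objective(beta, X, y, lam, ridge):
    """Combined penalization: least squares + nonconcave (SCAD) penalty
    + ridge term to stabilize estimation under multicollinearity."""
    rss = np.sum((y - X @ beta) ** 2)
    return 0.5 * rss + np.sum(scad(beta, lam)) + 0.5 * ridge * np.sum(beta**2)
```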

8.
The most common forecasting methods in business are based on exponential smoothing, and the most common time series in business are inherently non-negative. Therefore it is of interest to consider the properties of the potential stochastic models underlying exponential smoothing when applied to non-negative data. We explore exponential smoothing state space models for non-negative data under various assumptions about the innovations, or error, process. We first demonstrate that prediction distributions from some commonly used state space models may have an infinite variance beyond a certain forecasting horizon. For multiplicative error models that do not have this flaw, we show that sample paths will converge almost surely to zero even when the error distribution is non-Gaussian. We propose a new model with similar properties to exponential smoothing, but which does not have these problems, and we develop some distributional properties for our new model. We then explore the implications of our results for inference, and compare the short-term forecasting performance of the various models using data on the weekly sales of over 300 items of costume jewelry. The main findings of the research are that the Gaussian approximation is adequate for estimation and one-step-ahead forecasting. However, as the forecasting horizon increases, the approximate prediction intervals become increasingly problematic. When the model is to be used for simulation purposes, a suitably specified scheme must be employed.
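A small simulation sketch of the multiplicative-error, no-trend, no-seasonality state space model ETS(M,N,N) that underlies simple exponential smoothing; long sample paths illustrate the almost-sure convergence to zero mentioned above. The truncation of the error at −1 is a simulation-scheme assumption, echoing the paper's point that simulation requires a suitably specified scheme.

```python
import numpy as np

def simulate_ets_mnn(level0, alpha, sigma, horizon, rng):
    """Simulate one sample path of ETS(M,N,N):
        y_t = l_{t-1} * (1 + e_t),   l_t = l_{t-1} * (1 + alpha * e_t),
    with Gaussian e_t truncated below at -1 so levels stay positive."""
    level, path = level0, []
    for _ in range(horizon):
        e = max(rng.normal(0.0, sigma), -0.999)
        path.append(level * (1.0 + e))
        level *= (1.0 + alpha * e)
    return np.array(path)

rng = np.random.default_rng(1)
print(simulate_ets_mnn(100.0, alpha=0.5, sigma=0.3, horizon=10))
```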

9.
Empirical Bayes is a versatile approach to “learn from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes–full Bayes ridge regression approach for estimation of the posterior predictive interval.
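As a toy version of the “simple empirical Bayes estimator in a linear model setting” mentioned above, consider the normal-means model y_i ~ N(θ_i, 1) with prior θ_i ~ N(0, τ²): the marginal likelihood is maximized in closed form and the coefficients are shrunk accordingly. This is a generic textbook sketch, not the paper's estimator.

```python
import numpy as np

def eb_normal_means(y):
    """Empirical Bayes in the normal-means model y_i ~ N(theta_i, 1),
    theta_i ~ N(0, tau2). Marginally y_i ~ N(0, 1 + tau2), so the
    marginal MLE of tau2 is mean(y^2) - 1, truncated at zero."""
    tau2 = max(np.mean(y**2) - 1.0, 0.0)
    shrink = tau2 / (1.0 + tau2)          # posterior-mean shrinkage factor
    return shrink * y, tau2
```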

10.
We propose the Laplace Error Penalty (LEP) function for variable selection in high-dimensional regression. Unlike penalty functions using piecewise spline construction, the LEP is constructed as an exponential function with two tuning parameters and is infinitely differentiable everywhere except at the origin. With this construction, the LEP-based procedure acquires extra flexibility in variable selection, admits a unified derivative formula in optimization and is able to approximate the L0 penalty as closely as possible. We show that the LEP procedure can identify relevant predictors in exponentially high-dimensional regression with normal errors. We also establish the oracle property for the LEP estimator. Although it is not convex, the LEP yields a convex penalized least squares function under mild conditions if p is no greater than n. A coordinate descent majorization-minimization algorithm is introduced to implement the LEP procedure. In simulations and a real data analysis, the LEP methodology performs favorably among competitive procedures.
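The abstract's description (exponential form, two tuning parameters, smooth away from the origin, approximating the L0 penalty) is consistent with a penalty like the one sketched below; the exact LEP formula should be taken from the paper, so treat this form as an assumption.

```python
import numpy as np

def lep_like_penalty(t, lam, kappa):
    """A penalty with the properties the abstract lists: exponential
    form, two tuning parameters (lam, kappa), infinitely differentiable
    except at the origin, and tending to lam * 1{t != 0} as kappa -> 0.
    The exact LEP formula is an assumption here; see the paper."""
    return lam * (1.0 - np.exp(-np.abs(t) / kappa))
```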

11.
Non-random sampling is a source of bias in empirical research. It is common for the outcomes of interest (e.g. the wage distribution) to be skewed in the source population. Sometimes, the outcomes are further subjected to sample selection, which is a type of missing data, resulting in partial observability. Thus, methods based on complete cases for skewed data are inadequate for the analysis of such data, and a general sample selection model is required. Heckman proposed a full maximum likelihood estimation method under the normality assumption for sample selection problems, and parametric and non-parametric extensions have since been proposed. We generalize the Heckman selection model to allow for underlying skew-normal distributions. Finite-sample performance of the maximum likelihood estimator of the model is studied via simulation. Applications illustrate the strength of the model in capturing spurious skewness in bounded scores, and in modelling data where a logarithm transformation could not mitigate the effect of inherent skewness in the outcome variable.
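For reference, the log-likelihood of the classical normal Heckman selection model — the special case that the paper extends to skew-normal errors. The parameterization (log σ, atanh ρ) is a convenience for unconstrained optimization; the design matrices and mask are user-supplied.

```python
import numpy as np
from scipy.stats import norm

def heckman_loglik(params, y, X, W, observed):
    """Classical Heckman model: outcome y = X beta + e observed only when
    the selection index W gamma + u > 0, with (e, u) bivariate normal.
    params = (beta, gamma, log sigma, atanh rho); observed is a boolean mask."""
    p, q = X.shape[1], W.shape[1]
    beta, gamma = params[:p], params[p:p + q]
    sigma, rho = np.exp(params[p + q]), np.tanh(params[p + q + 1])
    lin_sel = W @ gamma
    ll = norm.logcdf(-lin_sel[~observed]).sum()          # unselected units
    r = (y[observed] - X[observed] @ beta) / sigma       # scaled residuals
    z = (lin_sel[observed] + rho * r) / np.sqrt(1 - rho**2)
    ll += (norm.logpdf(r) - np.log(sigma) + norm.logcdf(z)).sum()
    return ll
```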

12.
Abstract. Non-parametric regression models have been studied extensively, including estimation of the conditional mean function, the conditional variance function and the distribution function of the errors. In addition, empirical likelihood methods have been proposed to construct confidence intervals for the conditional mean and variance. Motivated by applications in risk management, we propose an empirical likelihood method for constructing a confidence interval for the pth conditional value-at-risk based on the non-parametric regression model. A simulation study shows the advantages of the proposed method.
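A sketch of the point estimator that such an interval would be centred on: a kernel-weighted conditional quantile (conditional value-at-risk) of the loss given X = x0. The empirical likelihood interval itself, the paper's contribution, is not reproduced here.

```python
import numpy as np

def nw_conditional_var(x0, X, Y, p, h):
    """Kernel-weighted estimate of the p-th conditional quantile of Y
    given X = x0: Nadaraya-Watson weights, then a weighted quantile."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)   # Gaussian kernel weights
    w /= w.sum()
    order = np.argsort(Y)
    cum = np.cumsum(w[order])
    i = min(np.searchsorted(cum, p), len(Y) - 1)
    return Y[order][i]                        # smallest y with cum weight >= p
```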

13.
The influence of economic conditions on the movement of a variable between states (for example, a change in credit rating from A to B) can be modelled using a multi-state latent factor intensity framework. Estimation of this type of model is, however, not straightforward, as transition probabilities are involved and the model contains several analytically intractable distributions. In this paper, a Bayesian approach is adopted to handle these distributions. The innovation in the sampling algorithm used to obtain the posterior distributions of the model parameters is the inclusion of a particle filter step and a Metropolis–Hastings step within a Gibbs sampler. The feasibility and accuracy of the proposed sampling algorithm are supported by several simulated examples. The paper contains an application examining what caused 1049 firms to change their credit ratings over a span of ten years.

14.
We propose a vector generalized additive modeling framework for taking into account the effect of covariates on angular density functions in a multivariate extreme value context. The proposed methods are tailored for settings where the dependence between extreme values may change according to covariates. We devise a maximum penalized log-likelihood estimator, discuss details of the estimation procedure, and derive its consistency and asymptotic normality. The simulation study suggests that the proposed methods perform well in a wealth of simulation scenarios by accurately recovering the true covariate-adjusted angular density. Our empirical analysis reveals relevant dynamics of the dependence between extreme air temperatures in two alpine resorts during the winter season.

15.
Abstract. Similar to variable selection in the linear model, selecting significant components in the additive model is of great interest. However, such components are unknown, unobservable functions of independent variables. Some approximation is needed. We suggest a combination of penalized regression spline approximation and group variable selection, called the group-bridge-type spline method (GBSM), to handle this component selection problem with a diverging number of correlated variables in each group. The proposed method can select significant components and estimate non-parametric additive function components simultaneously. To make the GBSM stable in computation and adaptive to the level of smoothness of the component functions, weighted power spline bases and projected weighted power spline bases are proposed. Their performance is examined by simulation studies. The proposed method is extended to a partial linear regression model analysis with real data, and gives reliable results.

16.
Abstract. Generalized cross-validation (GCV) is frequently applied to select the bandwidth when kernel methods are used to estimate non-parametric mixed-effect models, in which non-parametric mean functions model the covariate effects and additive random effects account for overdispersion and correlation. The optimality of the GCV in this setting, however, has not yet been explored. In this article, we construct a kernel estimator of the non-parametric mean function. An equivalence between the kernel estimator and a weighted least squares type estimator is provided, and the optimality of the GCV-based bandwidth is investigated. The theoretical derivations also show that kernel-based and spline-based GCV give very similar asymptotic results. This provides us with a solid basis for using kernel estimation in mixed-effect models. Simulation studies are undertaken to investigate the empirical performance of the GCV. A real data example is analysed for illustration.
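A sketch of GCV bandwidth selection for a plain Nadaraya-Watson smoother, using GCV(h) = n·RSS(h)/(n − tr(S_h))²; the article's setting adds random effects, which this sketch omits.

```python
import numpy as np

def gcv_bandwidth(X, y, bandwidths):
    """Select the bandwidth of a Nadaraya-Watson smoother by GCV.
    S_h is the linear smoother matrix; returns (best_h, best_score)."""
    n = len(y)
    best = None
    for h in bandwidths:
        K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
        S = K / K.sum(axis=1, keepdims=True)     # rows of the smoother matrix
        resid = y - S @ y
        score = n * np.sum(resid**2) / (n - np.trace(S)) ** 2
        if best is None or score < best[1]:
            best = (h, score)
    return best
```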

17.
Non-inferiority trials aim to demonstrate that an experimental therapy is not unacceptably worse than an active reference therapy already in use. When applicable, a three-arm non-inferiority trial, including an experimental therapy, an active reference therapy, and a placebo, is often recommended to assess the assay sensitivity and internal validity of a trial. In this paper, we share some practical considerations based on our experience from a phase III three-arm non-inferiority trial. First, we discuss the determination of the total sample size and its optimal allocation based on the overall power of the non-inferiority testing procedure, and provide ready-to-use R code for implementation. Second, we consider the non-inferiority goal of ‘capturing all possibilities’ and show that it naturally corresponds to a simple two-step testing procedure. Finally, using this two-step non-inferiority testing procedure as an example, we compare extensively the commonly used frequentist p-value methods with the Bayesian posterior probability approach.
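The paper's ready-to-use code is in R; as a language-agnostic illustration, here is the generic power formula for a single one-sided non-inferiority z-test (a textbook calculation, not the paper's three-arm procedure).

```python
from scipy.stats import norm

def ni_power(delta_true, margin, sigma, n_per_arm, alpha=0.025):
    """Power of the one-sided z-test of H0: mu_E - mu_R <= -margin,
    assuming common sigma and equal allocation across the two arms."""
    se = sigma * (2.0 / n_per_arm) ** 0.5
    z_alpha = norm.ppf(1 - alpha)
    return norm.cdf((delta_true + margin) / se - z_alpha)

# e.g. true difference 0, margin 1.5, sigma 4, 120 subjects per arm
print(ni_power(0.0, 1.5, 4.0, 120))
```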

18.
Abstract. Continuous proportional outcomes are collected in many practical studies, where responses are confined to the unit interval (0,1). Utilizing Barndorff-Nielsen and Jørgensen's simplex distribution, we propose a new type of generalized linear mixed-effects model for longitudinal proportional data, in which the expected value of the proportion is directly modelled through a logit function of fixed and random effects. We establish statistical inference along the lines of Breslow and Clayton's penalized quasi-likelihood (PQL) and restricted maximum likelihood (REML) in the proposed model. We derive the PQL/REML using a high-order multivariate Laplace approximation, which gives satisfactory estimation of the model parameters. The proposed model and inference are illustrated by simulation studies and a data example. The simulation studies conclude that the fourth-order approximate PQL/REML performs satisfactorily. The data example shows that Aitchison's technique of fitting a normal linear mixed model to logit-transformed proportional outcomes is not robust against outliers.

19.
The authors extend the classical Cormack-Jolly-Seber mark-recapture model to account for both temporal and spatial movement through a series of markers (e.g., dams). Survival rates are modeled as a function of (possibly) unobserved travel times. Because of the complex nature of the likelihood, they use a Bayesian approach based on the complete data likelihood and integrate the posterior through Markov chain Monte Carlo methods. They test the model through simulations and also apply it to actual salmon data arising from the Columbia river system. The methodology was developed for use by the Pacific Ocean Shelf Tracking (POST) project.

20.
We present a scalable Bayesian modelling approach for identifying brain regions that respond to a certain stimulus and using them to classify subjects. More specifically, we deal with multi-subject electroencephalography (EEG) data with a binary response distinguishing between alcoholic and control groups. The covariates are matrix-variate, with measurements taken from each subject at different locations across multiple time points. EEG data have a complex structure with both spatial and temporal attributes. We use a divide-and-conquer strategy and build separate local models, that is, one model at each time point. We employ Bayesian variable selection approaches using a structured continuous spike-and-slab prior to identify the locations that respond to a certain stimulus. We incorporate the spatio-temporal structure through a Kronecker product of the spatial and temporal correlation matrices. We develop a highly scalable estimation algorithm, using a likelihood approximation, to deal with the large number of parameters in the model. Variable selection is done via clustering of the locations based on their duration of activation. We use scoring rules to evaluate the prediction performance. Simulation studies demonstrate the efficiency of our scalable algorithm in terms of estimation and fast computation. We present results from applying our scalable approach to a case study of multi-subject EEG data.
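A sketch of a continuous spike-and-slab prior of the generic kind referred to above: each coefficient follows a two-component normal mixture with a narrow spike near zero and a wide slab. The structured, spatio-temporally correlated version in the paper is more elaborate.

```python
import numpy as np

def spike_slab_logprior(beta, w, tau_spike, tau_slab):
    """Log density of a continuous spike-and-slab prior: each beta_j is
    drawn from N(0, tau_spike^2) with prob. 1-w (spike) or
    N(0, tau_slab^2) with prob. w (slab)."""
    c = np.sqrt(2 * np.pi)
    spike = np.exp(-0.5 * (beta / tau_spike) ** 2) / (tau_spike * c)
    slab = np.exp(-0.5 * (beta / tau_slab) ** 2) / (tau_slab * c)
    return np.sum(np.log((1 - w) * spike + w * slab))
```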
