期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A Markov-chain-based regression model with random effects for the analysis of 18O-labelled mass spectra

Qi Zhu Tomasz Burzykowski 《Journal of Statistical Computation and Simulation》2013,83(1):145-157

The enzymatic ¹⁸O-labelling is a useful technique for reducing the influence of the between-spectra variability on the results of mass-spectrometry experiments. A difficulty in applying the technique lies in the quantification of the corresponding peptides due to the possibility of an incomplete labelling, which may result in biased estimates of the relative peptide abundance. To address the problem, Zhu et al. [A Markov-chain-based heteroscedastic regression model for the analysis of high-resolution enzymatically ¹⁸O-labeled mass spectra, J. Proteome Res. 9(5) (2010), pp. 2669–2677] proposed a Markov-chain-based regression model with heteroscedastic residual variance, which corrects for the possible bias. In this paper, we extend the model by allowing for the estimation of the technical and/or biological variability for the mass spectra data. To this aim, we use a mixed-effects version of the model. The performance of the model is evaluated based on results of an application to real-life mass spectra data and a simulation study. 相似文献

2.

Overlapping group lasso for high-dimensional generalized linear models

Shengbin Zhou Jingke Zhou Bo Zhang 《统计学通讯:理论与方法》2013,42(19):4903-4917

Abstract

Structured sparsity has recently been a very popular technique to deal with the high-dimensional data. In this paper, we mainly focus on the theoretical problems for the overlapping group structure of generalized linear models (GLMs). Although the overlapping group lasso method for GLMs has been widely applied in some applications, the theoretical properties about it are still unknown. Under some general conditions, we presents the oracle inequalities for the estimation and prediction error of overlapping group Lasso method in the generalized linear model setting. Then, we apply these results to the so-called Logistic and Poisson regression models. It is shown that the results of the Lasso and group Lasso procedures for GLMs can be recovered by specifying the group structures in our proposed method. The effect of overlap and the performance of variable selection of our proposed method are both studied by numerical simulations. Finally, we apply our proposed method to two gene expression data sets: the p53 data and the lung cancer data. 相似文献

3.

Robust multivariate diagnostics for PLSR and application on high dimensional spectrally overlapped drug systems

Aylin Alin Claudio Agostinelli Georgi Gergov Plamen Katsarov Yahya Al-Degs 《Journal of Statistical Computation and Simulation》2019,89(6):966-984

ABSTRACT

Statistical methods are effectively used in the evaluation of pharmaceutical formulations instead of laborious liquid chromatography. However, signal overlapping, nonlinearity, multicollinearity and presence of outliers deteriorate the performance of statistical methods. The Partial Least Squares Regression (PLSR) is a very popular method in the quantification of high dimensional spectrally overlapped drug formulations. The SIMPLS is the mostly used PLSR algorithm, but it is highly sensitive to outliers that also effect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality for a PLSR model. We study performances of the proposed diagnostics on two everyday use highly overlapping drug systems: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride. 相似文献

4.

Tracking multiple moving objects in images using Markov Chain Monte Carlo

Lan Jiang Sumeetpal S. Singh 《Statistics and Computing》2018,28(3):495-510

A new Bayesian state and parameter learning algorithm for multiple target tracking models with image observations are proposed. Specifically, a Markov chain Monte Carlo algorithm is designed to sample from the posterior distribution of the unknown time-varying number of targets, their birth, death times and states as well as the model parameters, which constitutes the complete solution to the specific tracking problem we consider. The conventional approach is to pre-process the images to extract point observations and then perform tracking, i.e. infer the target trajectories. We model the image generation process directly to avoid any potential loss of information when extracting point observations using a pre-processing step that is decoupled from the inference algorithm. Numerical examples show that our algorithm has improved tracking performance over commonly used techniques, for both synthetic examples and real florescent microscopy data, especially in the case of dim targets with overlapping illuminated regions. 相似文献

5.

Uncertainty Quantification in Case of Imperfect Models: A Non‐Bayesian Approach

《Scandinavian Journal of Statistics》2018,45(3):729-752

The starting point in uncertainty quantification is a stochastic model, which is fitted to a technical system in a suitable way, and prediction of uncertainty is carried out within this stochastic model. In any application, such a model will not be perfect, so any uncertainty quantification from such a model has to take into account the inadequacy of the model. In this paper, we rigorously show how the observed data of the technical system can be used to build a conservative non‐asymptotic confidence interval on quantiles related to experiments with the technical system. The construction of this confidence interval is based on concentration inequalities and order statistics. An asymptotic bound on the length of this confidence interval is presented. Here we assume that engineers use more and more of their knowledge to build models with order of errors bounded by . The results are illustrated by applying the newly proposed approach to real and simulated data. 相似文献

6.

Use of the Barker model in an experiment examining covariate effects on first-year survival in Ross's Geese ( Chen rossii ): A case study

S. M. Slattery R. T. Alisauskas 《Journal of applied statistics》2002,29(1-4):497-508

The Barker model provides researchers with an opportunity to use three types of data for mark-recapture analyses - recaptures, recoveries, and resightings. This model structure maximizes use of encounter data and increases the precision of parameter estimates, provided the researcher has large amounts of resighting data. However, to our knowledge, this model has not been used for any published ringing studies. Our objective here is to report our use of the Barker model in covariate-dependent analyses that we conducted in Program MARK. In particular, we wanted to describe our experimental study design and discuss our analytical approach plus some logistical constraints we encountered while conducting a study of the effects of growth and parasites on survival of juvenile Ross's Geese. Birds were marked just before fledging, alternately injected with antiparasite drugs or a control, and then were re-encountered during migration and breeding in following years. Although the Barker model estimates seven parameters, our objectives focused on annual survival only, thus we considered all other parameters as nuisance terms. Therefore, we simplified our model structures by maintaining biological complexity on survival, while retaining a very basic structure on nuisance parameters. These analyses were conducted in a two-step approach where we used the most parsimonious model from nuisance parameter analyses as our starting model for analyses of covariate effects. This analytical approach also allowed us to minimize the long CPU times associated with the use of covariates in earlier versions of Program MARK. Resightings made up about 80% of our encounter history data, and simulations demonstrated that precision and bias of parameter estimates were minimally affected by this distribution. Overall, the main source of bias was that smaller goslings were too small to retain neckbands, yet were the birds that we predicted would have the lowest survival probability and highest probability for parasite effects. Consequently, we considered our results conservative. The largest constraint of our study design was the inability to partition survival into biologically meaningful periods to provide insight into the timing and mechanisms of mortality. 相似文献

7.

An auxiliary function approach for Lasso in music composition using cellular automata

Kun Li WenBo Zhao 《Journal of applied statistics》2014,41(5):989-997

In this paper, we present an auxiliary function approach to solve the overlap group Lasso problem. Our goal is to solve a more general structure with overlapping groups, which is suitable to be used in cellular automata (CA). The CA were introduced to the algorithmic composition which is based on the development and classification. At the same time, concrete algorithm and mapping from CA to music series are given. Experimental simulations show the effectiveness of our algorithms, and using the auxiliary function approach to solve Lasso with CA is a potentially useful music automatic-generation algorithm. 相似文献

8.

Parametric non-mixture cure models for schedule finding of therapeutic agents

Liu CA Braun TM 《Journal of the Royal Statistical Society. Series C, Applied statistics》2009,58(2):225-236

We propose a phase I clinical trial design that seeks to determine the cumulative safety of a series of administrations of a fixed dose of an investigational agent. In contrast with traditional phase I trials that are designed solely to find the maximum tolerated dose of the agent, our design instead identifies a maximum tolerated schedule that includes a maximum tolerated dose as well as a vector of recommended administration times. Our model is based on a non-mixture cure model that constrains the probability of dose limiting toxicity for all patients to increase monotonically with both dose and the number of administrations received. We assume a specific parametric hazard function for each administration and compute the total hazard of dose limiting toxicity for a schedule as a sum of individual administration hazards. Throughout a variety of settings motivated by an actual study in allogeneic bone marrow transplant recipients, we demonstrate that our approach has excellent operating characteristics and performs as well as the only other currently published design for schedule finding studies. We also present arguments for the preference of our non-mixture cure model over the existing model. 相似文献

9.

Earthquakes occurrences estimation through a parametric semi-Markov approach

Giovanni Masala 《Journal of applied statistics》2012,39(1):81-96

The estimation of earthquakes’ occurrences prediction in seismic areas is a challenging problem in seismology and earthquake engineering. Indeed, the prevention and the quantification of possible damage provoked by destructive earthquakes are directly linked to this kind of prevision. In our paper, we adopt a parametric semi-Markov approach. This model assumes that a sequence of earthquakes is seen as a Markov process and besides it permits to take into consideration the more realistic assumption of events’ dependence in space and time. The elapsed time between two consecutive events is modeled as a general Weibull distribution. We determine then the transition probabilities and the so-called crossing states probabilities. We conclude then with a Monte Carlo simulation and the model is validated through a large database containing real data. 相似文献

10.

Second-derivative functional regression with applications to near infra-red spectroscopy

Constantinos Goutis 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1998,60(1):103-114

A linear regression method to predict a scalar from a discretized smooth function is presented. The method takes into account the functional nature of the predictors and the importance of the second derivative in spectroscopic applications. This motivates a functional inner product that can be used as a roughness penalty. Using this inner product, we derive a linear prediction method that is similar to ridge regression but with different shrinkage characteristics. We describe its practical implementation and we address the problem of computing the second derivatives nonparametrically. We apply the method to a calibration example using near infra-red spectra. We conclude with a discussion comparing our approach with other regression algorithms. 相似文献

11.

A Bayesian phase I/II design to determine subgroup-specific optimal dose for immunotherapy sequentially combined with radiotherapy

Beibei Guo Yong Zang Li-Hsiang Lin Rui Zhang 《Pharmaceutical statistics》2023,22(1):143-161

Sequential administration of immunotherapy following radiotherapy (immunoRT) has attracted much attention in cancer research. Due to its unique feature that radiotherapy upregulates the expression of a predictive biomarker for immunotherapy, novel clinical trial designs are needed for immunoRT to identify patient subgroups and the optimal dose for each subgroup. In this article, we propose a Bayesian phase I/II design for immunotherapy administered after standard-dose radiotherapy for this purpose. We construct a latent subgroup membership variable and model it as a function of the baseline and pre-post radiotherapy change in the predictive biomarker measurements. Conditional on the latent subgroup membership of each patient, we jointly model the continuous immune response and the binary efficacy outcome using plateau models, and model toxicity using the equivalent toxicity score approach to account for toxicity grades. During the trial, based on accumulating data, we continuously update model estimates and adaptively randomize patients to admissible doses. Simulation studies and an illustrative trial application show that our design has good operating characteristics in terms of identifying both patient subgroups and the optimal dose for each subgroup. 相似文献

12.

Bayesian model selection using test statistics

Jianhua Hu Valen E. Johnson 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(1):143-158

Summary. Existing Bayesian model selection procedures require the specification of prior distributions on the parameters appearing in every model in the selection set. In practice, this requirement limits the application of Bayesian model selection methodology. To overcome this limitation, we propose a new approach towards Bayesian model selection that uses classical test statistics to compute Bayes factors between possible models. In several test cases, our approach produces results that are similar to previously proposed Bayesian model selection and model averaging techniques in which prior distributions were carefully chosen. In addition to eliminating the requirement to specify complicated prior distributions, this method offers important computational and algorithmic advantages over existing simulation-based methods. Because it is easy to evaluate the operating characteristics of this procedure for a given sample size and specified number of covariates, our method facilitates the selection of hyperparameter values through prior-predictive simulation. 相似文献

13.

Fuzzy K-means clustering models for triangular fuzzy time trajectories

Renato Coppi Pierpaolo D'Urso 《Statistical Methods and Applications》2002,11(1):21-40

We focus our attention on the classification of fuzzy time trajectories with triangular membership function, described by a given set of individuals. To this purpose, we adopt a fullyinformational approach, explicitly recognizing the informational nature shared by the ingredients of the classification procedure: the observed data (Empirical Information) and the classification model (Theoretical Information). In particular, by supposing that the informational paradigm has a fuzzy nature, we suggest three fuzzy clustering models allowing the classification of the triangular fuzzy time trajectories, based on the analysis of the cross sectional and/or longitudinal characteristics of their components (centers and spreads). Two applicative examples are illustrated. 相似文献

14.

处理效应模型的理论拓展及在政策评价中的应用

纪园园等《统计研究》2020,37(9):106-119

现有文献在利用处理效应模型评估政策时,模型中的假设条件局限性大多较强,在实际应用中很难验证,且一旦这些假设错误,就会引起参数估计的不一致。本文首先在非参数框架下提出了一种关于处理效应模型的半参数估计方法,其既不对模型中的函数形式做任何假定,也允许误差项的联合分布是广义异方差形式,从而大大减少因模型误设而引起的估计偏误。考虑到处理效应的内生性问题,提出了一个两步估计量。第一步关于选择方程进行非参数估计;第二步在结果方程中,利用工具变量法估计平均处理效应。其次,对估计量的大样本性质进行分析,表明了估计量的一致性和渐近正态性质。再次,通过蒙特卡罗模拟与已有估计方法进行比较,结果表明本文的方法具有较强的稳健性。最后,本文将该方法应用于研究高新技术企业认证政策对企业盈利能力影响,研究发现该政策提升了高新技术企业的盈利能力,并且相比于国有企业,该政策对民营企业促进效应更大。相似文献

15.

Outlier detection using difference-based variance estimators in multiple regression

Chun Gun Park 《统计学通讯:理论与方法》2018,47(24):5986-6001

In this article, we propose an outlier detection approach in a multiple regression model using the properties of a difference-based variance estimator. This type of a difference-based variance estimator was originally used to estimate error variance in a non parametric regression model without estimating a non parametric function. This article first employed a difference-based error variance estimator to study the outlier detection problem in a multiple regression model. Our approach uses the leave-one-out type method based on difference-based error variance. The existing outlier detection approaches using the leave-one-out approach are highly affected by other outliers, while ours is not because our approach does not use the regression coefficient estimator. We compared our approach with several existing methods using a simulation study, suggesting the outperformance of our approach. The advantages of our approach are demonstrated using a real data application. Our approach can be extended to the non parametric regression model for outlier detection. 相似文献

16.

Bayesian hierarchical model for protein identifications

Riten Mitra Ryan Gill Sinjini Sikdar 《Journal of applied statistics》2019,46(1):30-46

In proteomics, identification of proteins from complex mixtures of proteins extracted from biological samples is an important problem. Among the experimental technologies, mass spectrometry (MS) is the most popular one. Protein identification from MS data typically relies on a ‘two-step’ procedure of identifying the peptide first followed by the separate protein identification procedure next. In this setup, the interdependence of peptides and proteins is neglected resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo based Bayesian hierarchical model, a first of its kind in protein identification, which integrates the two steps and performs joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence of proteins by using clustering group priors to the proteins based on the assumption that proteins sharing the same biological pathway are likely to be present or absent together and are correlated. The complete conditionals of the proposed joint model being tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides the estimation and statistical uncertainties of all relevant parameters. The model has better operational characteristics compared to two existing ‘one-step’ procedures on a range of simulation settings as well as on two well-studied datasets. 相似文献

17.

Bayesian mixture models for complex high dimensional count data in phage display experiments

Yuan Ji Guosheng Yin Kam-Wah Tsui Mikhail G. Kolonin Jessica Sun Wadih Arap Renata Pasqualini Kim-Anh Do 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(2):139-152

Summary. Phage display is a biological process that is used to screen random peptide libraries for ligands that bind to a target of interest with high affinity. On the basis of a count data set from an innovative multistage phage display experiment, we propose a class of Bayesian mixture models to cluster peptide counts into three groups that exhibit different display patterns across stages. Among the three groups, the investigators are particularly interested in that with an ascending display pattern in the counts, which implies that the peptides are likely to bind to the target with strong affinity. We apply a Bayesian false discovery rate approach to identify the peptides with the strongest affinity within the group. A list of peptides is obtained, among which important ones with meaningful functions are further validated by biologists. To examine the performance of the Bayesian model, we conduct a simulation study and obtain desirable results. 相似文献

18.

An Empirical Process View of Inverse Regression

下载免费PDF全文

François Portier 《Scandinavian Journal of Statistics》2016,43(3):827-844

A common approach taken in high‐dimensional regression analysis is sliced inverse regression, which separates the range of the response variable into non‐overlapping regions, called ‘slices’. Asymptotic results are usually shown assuming that the slices are fixed, while in practice, estimators are computed with random slices containing the same number of observations. Based on empirical process theory, we present a unified theoretical framework to study these techniques, and revisit popular inverse regression estimators. Furthermore, we introduce a bootstrap methodology that reproduces the laws of Cramér–von Mises test statistics of interest to model dimension, effects of specified covariates and whether or not a sliced inverse regression estimator is appropriate. Finally, we investigate the accuracy of different bootstrap procedures by means of simulations. 相似文献

19.

Local Whittle likelihood approach for generalized divergence

Yujie Xue Masanobu Taniguchi 《Scandinavian Journal of Statistics》2020,47(1):182-195

There are many approaches in the estimation of spectral density. With regard to parametric approaches, different divergences are proposed in fitting a certain parametric family of spectral densities. Moreover, nonparametric approaches are also quite common considering the situation when we cannot specify the model of process. In this paper, we develop a local Whittle likelihood approach based on a general score function, with some special cases of which, the approach applies to more applications. This paper highlights the effective asymptotics of our general local Whittle estimator, and presents a comparison with other estimators. Additionally, for a special case, we construct the one-step ahead predictor based on the form of the score function. Subsequently, we show that it has a smaller prediction error than the classical exponentially weighted linear predictor. The provided numerical studies show some interesting features of our local Whittle estimator. 相似文献

20.

Sensitivity analysis for ranked data

《Journal of the Korean Statistical Society》2014,43(1):1-9

Sensitivity analysis is to study the influence of a small change in the input data on the output of the analysis. Han and Huh (1995) developed a quantification method for the ranked data. However, the question of stability in the analysis of ranked data has not been considered. Here, we propose a method of sensitivity analysis for ranked data. Our aim is to evaluate perturbations by using a graphical approach suggested by Han and Huh (1995). It extends the results obtained by Tanaka (1984) and Huh (1989) for the sensitivity analysis in Hayashi’s third method of quantification and those by Huh and Park (1990) for the principal component reduction of the case influence derivatives in regression. A numerical example is provided to explain how to conduct sensitivity analysis based on the proposed approach. 相似文献