Similar Articles
20 similar articles found (search time: 15 ms)
1.
Classical regression analysis is usually performed in two steps. In the first step, an appropriate model is identified to describe the data-generating process; in the second step, statistical inference is performed within the identified model. An intuitively appealing approach to designing experiments for these different purposes is the use of sequential strategies, which devote part of the sample to model identification and adapt the design according to the outcome of the identification steps. In this article, we investigate the finite-sample properties of two sequential design strategies recently proposed in the literature. A detailed comparison of sequential designs for model discrimination in several regression models is given by means of a simulation study. Some non-sequential designs are also included in the study.

2.
Fault detection and isolation occupies a strategic position in modern industrial processes, and various approaches to it have been proposed. These approaches are usually based on a consistency test between the observed state of the process, provided by sensors, and the expected behaviour, provided by a mathematical model of the system. Such methods require a reliable model of the system to be monitored, which is complex to obtain. Alternatively, we propose in this paper to use blind source separation filters (BSSFs) to detect and isolate faults in a three-tank pilot plant. This technique is very beneficial, as it uses blind identification without an explicit mathematical model of the system. Independent component analysis (ICA), which relies on the assumption of statistical independence of the extracted sources, is used as the tool for each BSSF to extract signals of the process under consideration. The experimental results show the effectiveness and robustness of this approach in detecting and isolating sensor faults in the system.

3.
Modelling the underlying stochastic process is one of the main goals in the study of many dynamic phenomena, such as signal processing, system identification and time series. The issue is often addressed within the ARMA (autoregressive moving average) paradigm, so that the related task of identifying the 'true' order is crucial. As is well known, the effectiveness of such an approach may be seriously compromised by misspecification errors, since they may affect the model's ability to capture the dynamic structure of the process. As a result, inference and empirical outcomes may be heavily misleading. Despite the large number of available approaches for determining the order of an ARMA model, the issue is still open. In this paper, we bring the problem into the framework of bootstrap theory in conjunction with Akaike's information criterion (AIC) and present a new method for ARMA model selection. A theoretical justification for the proposed approach is given, and its small-sample performance is evaluated via a simulation study.
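As a rough illustration of combining an information criterion with the bootstrap for order selection, the sketch below selects the order of a pure AR model by minimising AIC and then uses a residual bootstrap to gauge the stability of that choice. This is only a simplified sketch for the AR special case, not the authors' ARMA procedure; all function names and the OLS fitting routine are illustrative.

```python
import numpy as np

def fit_ar_ols(x, p):
    """Fit an AR(p) model by ordinary least squares; return coefficients and RSS."""
    n = len(x)
    # Column k holds lag k+1: x_{t-(k+1)} for t = p, ..., n-1.
    X = np.column_stack([x[p - k - 1: n - k - 1] for k in range(p)])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return beta, rss

def aic_order(x, max_p=5):
    """Select the AR order minimising AIC = n*log(RSS/n) + 2p."""
    n = len(x)
    aics = [n * np.log(fit_ar_ols(x, p)[1] / n) + 2 * p
            for p in range(1, max_p + 1)]
    return 1 + int(np.argmin(aics))

def bootstrap_order_freq(x, max_p=5, B=200, seed=0):
    """Residual bootstrap: refit the order selection on B resampled series
    and return the relative frequency with which each order is chosen."""
    rng = np.random.default_rng(seed)
    n = len(x)
    p0 = aic_order(x, max_p)
    beta, _ = fit_ar_ols(x, p0)
    X = np.column_stack([x[p0 - k - 1: n - k - 1] for k in range(p0)])
    resid = x[p0:] - X @ beta
    counts = np.zeros(max_p)
    for _ in range(B):
        e = rng.choice(resid, size=n)
        xb = np.zeros(n)
        xb[:p0] = x[:p0]
        for t in range(p0, n):
            # beta[k] multiplies lag k+1, so reverse the window of past values.
            xb[t] = beta @ xb[t - p0:t][::-1] + e[t]
        counts[aic_order(xb, max_p) - 1] += 1
    return counts / B
```

The bootstrap frequencies give a simple measure of how sensitive the AIC-selected order is to sampling variability, which is the kind of instability a bootstrap-based selection criterion tries to address.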

4.
In this article, we propose a new modeling approach for the multivariate growth curve model with distribution-free errors, which is a useful tool for analyzing multiple-response repeated measurements. We first use the outer product least-squares technique to directly estimate covariance and then explore the feasible generalized least-squares technique to derive the estimator of regression coefficients. Large-sample properties are investigated for these estimators. Moreover, the above estimations for covariance and regression coefficients are extended to the situation under certain null hypothesis tests and the best subset BIC is used for variable selection. A real dataset is analyzed to demonstrate the usefulness and competency of the proposed methodology for model specification (identification) and model fitting (parameter estimation) in multiple-response repeated measurements.

5.
We reconsider the signal-extraction approach to measuring premia in the pricing of forward foreign exchange, put forward by Wolff, in which the difference between the forward rate and the associated future spot rate is modeled as an autoregressive moving average (ARMA) model for the risk premium buried in a white-noise forecast error. We point out that an ARMA model for the risk premium is not always identifiable from information on the difference between the forward rate and the future spot rate alone. We present solutions to the identification problem and show how the model for the risk premium can be estimated directly, provided the identification problem is solved. For comparison, we use the series analyzed by Wolff to estimate the models for risk premia. The results confirm the earlier finding that premia in forward exchange exhibit a certain degree of persistence over time.

6.
In this article, we study the identification of significant predictors in a partially linear model in which some regressors are contaminated with random errors, the dimension of the parametric component is divergent, and the regression coefficients are sparse. We apply a difference technique to remove the nonparametric component, thereby circumventing bandwidth selection, and construct a bias-corrected shrinkage estimator of the coefficients using the smoothly clipped absolute deviation (SCAD) penalty. We then derive estimation and selection consistency and establish the asymptotic distribution of the identified significant estimators. Finally, Monte Carlo studies illustrate the performance of our approach.
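The SCAD penalty referenced above has a closed form (Fan and Li, 2001). As a hedged sketch of the penalty and its one-step thresholding rule for an orthonormal-design coefficient, not of the authors' full bias-corrected estimator:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001), element-wise.
    Linear near zero, quadratic blend on (lam, a*lam], constant beyond a*lam."""
    t = np.abs(theta)
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
            lam ** 2 * (a + 1) / 2,
        ),
    )

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding of a least-squares coefficient z (orthonormal design):
    soft-thresholding near zero, a linear blend in the middle, identity far out."""
    t, s = abs(z), np.sign(z)
    if t <= 2 * lam:
        return s * max(t - lam, 0.0)          # soft-threshold region
    if t <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)  # blend region
    return z                                   # no shrinkage: unbiased for large z
```

The constant tail of the penalty is what makes SCAD nearly unbiased for large coefficients, unlike the lasso's L1 penalty, which shrinks everything.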

7.
We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set.

8.
In this paper, we consider the goal of identifying disease subgroups based on differences in observed symptom profiles. Commonly referred to as phenotype identification, this task is often addressed with unsupervised clustering techniques. Here, we investigate the application of a Dirichlet process mixture model. The model is defined by placing a Dirichlet process prior on the unknown components of a mixture model, allowing the expression of uncertainty about the partitioning of the observed data into homogeneous subgroups. To exemplify the approach, we consider an application to phenotype identification in Parkinson's disease, with symptom profiles collected using the Unified Parkinson's Disease Rating Scale.
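The partition prior underlying a Dirichlet process mixture can be sampled directly via the Chinese restaurant process, which makes the "unknown number of subgroups" idea concrete. This is only a sketch of the prior over partitions, not the full mixture sampler a phenotype analysis would use:

```python
import numpy as np

def crp_partition(n, alpha, seed=0):
    """Draw a random partition of n items from the Chinese restaurant process,
    the partition distribution induced by a Dirichlet process with
    concentration parameter alpha. Returns contiguous cluster labels 0..K-1."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(n, dtype=int)
    counts = [1]  # item 0 seats itself at the first table
    for i in range(1, n):
        # Join an existing cluster with prob proportional to its size,
        # or open a new one with prob proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)   # new cluster
        else:
            counts[k] += 1
        labels[i] = k
    return labels
```

Larger `alpha` yields more, smaller clusters a priori; in a full Dirichlet process mixture, the data then update this prior over partitions.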

9.
In the literature, different optimality criteria have been considered for model identification. Most proposals assume a normal distribution for the response variable and thus provide optimality criteria for discriminating between regression models. In this paper, a max-min approach is followed to discriminate among competing statistical models (i.e., probability distribution families). More specifically, k different statistical models (plausible for the data) are embedded in a more general model, which includes them as particular cases. The proposed optimal design maximizes the minimum KL-efficiency for discriminating between each rival model and the extended one. An equivalence theorem is proved, and an algorithm useful for computing max-min KL-efficiency designs is derived from it. Finally, the algorithm is run on two illustrative examples.

10.
Varying-coefficient models have been widely used to investigate possible time-dependent effects of covariates when the response variable comes from a normal distribution. Much progress has been made on inference and variable selection in the framework of such models. However, identification of the model structure, that is, which covariates have time-varying effects and which have fixed effects, remains a challenging and unsolved problem, especially when the dimension of the covariates is much larger than the sample size. In this article, we consider the structural identification and variable selection problems in varying-coefficient models for high-dimensional data. Using a modified basis expansion approach and group variable selection methods, we propose a unified procedure to simultaneously identify the model structure, select important variables, and estimate the coefficient curves. The unique feature of the proposed approach is that the model structure need not be specified in advance; it is therefore more realistic and appropriate for real data analysis. Asymptotic properties of the proposed estimators are derived under regularity conditions. Furthermore, we evaluate the finite-sample performance of the proposed methods with Monte Carlo simulation studies and a real data analysis.

11.
In many therapeutic areas, the identification and validation of surrogate endpoints is of prime interest to reduce the duration and/or size of clinical trials. Buyse et al. [Biostatistics 2000; 1:49-67] proposed a meta-analytic approach to validation. In this approach, the validity of a surrogate is quantified by the coefficient of determination R^2_trial obtained from a model which allows prediction of the treatment effect on the endpoint of interest ('true' endpoint) from the effect on the surrogate. One problem related to the use of R^2_trial is the difficulty of interpreting its value. To address this difficulty, in this paper we introduce a new concept, the so-called surrogate threshold effect (STE), defined as the minimum treatment effect on the surrogate necessary to predict a non-zero effect on the true endpoint. One of its interesting features, apart from providing information relevant to the practical use of a surrogate endpoint, is its natural interpretation from a clinical point of view.

12.
In many studies a large number of variables is measured, and the identification of the variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on one model only neglects the fact that there usually exist other, equally appropriate models. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say, more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam's window is a popular approach to reducing the model space. As this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure: based on the results of selected models in bootstrap samples, variables are eliminated before deriving the model averaging predictor. Backward elimination can be used as a simple alternative screening procedure. Through two examples and by means of simulation, we investigate some properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, respectively, of which seven influence the outcome. The screening step eliminates most of the uninfluential variables, but also some variables with a weak effect. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for the important parameters of the screening step.
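A bootstrap-based screening step of the kind described above can be sketched as follows: refit a selection rule on many bootstrap samples, record each variable's inclusion frequency, and drop variables selected too rarely. The concrete selection rule here (keep variables whose OLS |t| exceeds a threshold) and the cutoff value are illustrative assumptions, not the procedure studied in the paper:

```python
import numpy as np

def inclusion_frequencies(X, y, B=100, tcrit=2.0, seed=0):
    """For each bootstrap sample, fit OLS and mark variables with |t| > tcrit;
    return each variable's selection frequency across the B replicates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, n)          # bootstrap resample of rows
        Xb, yb = X[idx], y[idx]
        beta, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        resid = yb - Xb @ beta
        sigma2 = resid @ resid / (n - p)
        cov = sigma2 * np.linalg.inv(Xb.T @ Xb)
        tstat = beta / np.sqrt(np.diag(cov))
        freq += np.abs(tstat) > tcrit
    return freq / B

def screen(X, y, cutoff=0.3, **kw):
    """Screening step: keep only variables selected in at least `cutoff`
    of the bootstrap replicates; returns their column indices."""
    return np.where(inclusion_frequencies(X, y, **kw) >= cutoff)[0]
```

Variables with strong effects are selected in nearly every replicate, while pure-noise variables are selected at roughly the nominal error rate, so a moderate cutoff removes them before model averaging.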

13.
The aim of this paper is to define a new approach, called Hybrid Two-Step, to estimate the parameters of a second-order latent variable (LV) model with formative relationships between the first-order and second-order LVs. We first introduce the two main approaches to the estimation of second-order constructs through partial least squares path modelling: the so-called Repeated Indicators approach and the Two-Step approach. Some criticisms of these methodologies are highlighted, and a solution to the identification of formative second-order constructs is suggested through the adoption of a Hybrid Two-Step approach. A Monte Carlo simulation study comparing the proposed approach with the traditional ones was performed. Finally, a case study on passenger satisfaction is presented to show the implementation of the method and to give some comparative empirical results.

14.
In this paper, a correlation-analysis-based error-compensation recursive least-squares (RLS) identification method is proposed for the Hammerstein model. First, the covariance matrix between the input and output data of the Hammerstein model is derived using a separable signal, so that the unmeasurable internal variable can be replaced by the covariance of the input. The correlation analysis method can then be used to estimate the parameters of the linear part, which separates the identification of the nonlinear part from that of the linear part. In addition, a correction term is added to the least-squares estimate to compensate for the error caused by output noise, yielding an error-compensation-based RLS method for the observed data from the Hammerstein model. As a result, the bias that least-squares identification incurs in the presence of noise is compensated. Finally, simulation experiments illustrate the performance of the proposed identification method.
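The recursive least-squares core that such a method builds on is standard and can be sketched compactly. Note that this shows only plain RLS with a forgetting factor; the paper's correlation-analysis step and error-compensation correction term are not included:

```python
import numpy as np

def rls(Phi, y, lam=1.0, delta=1000.0):
    """Standard recursive least squares.
    Phi: (n, d) matrix of regressor rows phi_t; y: (n,) outputs.
    lam: forgetting factor (1.0 = ordinary RLS); delta: initial covariance scale.
    Returns the final parameter estimate theta."""
    n, d = Phi.shape
    theta = np.zeros(d)
    P = delta * np.eye(d)                         # large initial covariance
    for t in range(n):
        phi = Phi[t]
        k = P @ phi / (lam + phi @ P @ phi)       # gain vector
        theta = theta + k * (y[t] - phi @ theta)  # prediction-error update
        P = (P - np.outer(k, phi @ P)) / lam      # covariance update
    return theta
```

Each step costs O(d^2), so the estimate can be updated online as new input-output data arrive, which is the setting the recursive formulation is designed for.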

15.
The article considers a Gaussian model with the mean and the variance modeled flexibly as functions of the independent variables. The estimation is carried out using a Bayesian approach that allows the identification of significant variables in the variance function, as well as averaging over all possible models in both the mean and the variance functions. The computation is carried out by a simulation method that is carefully constructed to ensure that it converges quickly and produces iterates from the posterior distribution that have low correlation. Real and simulated examples demonstrate that the proposed method works well. The method in this paper is important because (a) it produces more realistic prediction intervals than nonparametric regression estimators that assume a constant variance; (b) variable selection identifies the variables in the variance function that are important; (c) variable selection and model averaging produce more efficient prediction intervals than those obtained by regular nonparametric regression.

16.
In proteomics, identification of proteins from complex mixtures of proteins extracted from biological samples is an important problem. Among the experimental technologies, mass spectrometry (MS) is the most popular. Protein identification from MS data typically relies on a 'two-step' procedure: the peptides are identified first, and a separate protein identification procedure follows. In this setup the interdependence of peptides and proteins is neglected, resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo based Bayesian hierarchical model, the first of its kind in protein identification, which integrates the two steps and performs joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence between proteins by placing clustering group priors on them, based on the assumption that proteins sharing the same biological pathway are likely to be present or absent together and are correlated. Because the complete conditionals of the proposed joint model are tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides estimates and statistical uncertainties of all relevant parameters. The model has better operational characteristics than two existing 'one-step' procedures on a range of simulation settings as well as on two well-studied datasets.

17.
The main objective of this paper is to propose a multivariate extension of the alpha-power model, which is an alternative to the multivariate skew-normal model (Arellano-Valle and Azzalini, 2008). It also extends the power-normal model discussed in Gupta and Gupta (2008) by making it more flexible. Inference is dealt with using the likelihood approach and a pseudo-likelihood approach based on conditional distributions which, although slightly less efficient, is simpler to implement. An application to a real data set demonstrates the usefulness of the extension.
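In the univariate case, the power-normal (alpha-power) law has CDF F(x) = Phi(x)^alpha, which makes inverse-CDF sampling straightforward: X = Phi^{-1}(U^{1/alpha}) for U ~ Uniform(0, 1). The sketch below covers only this univariate building block, not the paper's multivariate extension or its pseudo-likelihood inference:

```python
from statistics import NormalDist
import random

def power_normal_pdf(x, alpha):
    """Density of the power-normal law: f(x) = alpha * phi(x) * Phi(x)**(alpha-1)."""
    nd = NormalDist()
    return alpha * nd.pdf(x) * nd.cdf(x) ** (alpha - 1)

def power_normal_sample(alpha, n, seed=0):
    """Sample n draws via inverse CDF: since F(x) = Phi(x)**alpha,
    F^{-1}(u) = Phi^{-1}(u**(1/alpha))."""
    rng = random.Random(seed)
    nd = NormalDist()
    return [nd.inv_cdf(rng.random() ** (1.0 / alpha)) for _ in range(n)]
```

For alpha = 1 the model reduces to the standard normal; alpha != 1 introduces skewness, which is what makes the family a competitor to the skew-normal.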

18.
Three mixed proportional hazard models for the estimation of unemployment duration in the presence of attrition are considered. The virtue of these models is that they account for dependence between failure times in a multivariate failure time context. However, identification in dependent competing risks models is not straightforward. We show that these models, though independently derived, are special cases of a general frailty model, and that the three models are identified by means of the identification of the general model. An empirical example illustrates the approach to modelling dependent failure times.

19.
Multivariate adaptive regression spline fitting, or MARS (Friedman, 1991), provides a useful methodology for flexible adaptive regression with many predictors. The MARS methodology produces an estimate of the mean response that is a linear combination of adaptively chosen basis functions. Recently, Bayesian versions of MARS have been proposed (Denison, Mallick and Smith, 1998a; Holmes and Denison, 2002), combining the MARS methodology with the benefits of Bayesian methods for accounting for model uncertainty, to achieve improvements in predictive performance. In implementations of the Bayesian MARS approach, Markov chain Monte Carlo methods are used for computation; at each iteration of the algorithm it is proposed to change the current model by either (a) adding a basis function (birth step), (b) deleting a basis function (death step), or (c) altering an existing basis function (change step). In the algorithm of Denison, Mallick and Smith (1998a), when a birth step is proposed, the type of basis function is determined by simulation from the prior. This works well in problems with a small number of predictors, is simple to program, and leads to a simple form for the Metropolis-Hastings acceptance probabilities. However, in problems with very large numbers of predictors, many of them useless, it may be difficult to find interesting interactions with such an approach. The original MARS algorithm of Friedman (1991) uses the heuristic of building up higher-order interactions from lower-order ones, which greatly reduces the complexity of the search for good basis functions to add to the model. While we do not follow the intuition of the original MARS algorithm exactly, we suggest a similar idea in which the Metropolis-Hastings proposals of Denison, Mallick and Smith (1998a) are altered to allow dependence on the current model. Our modification allows more rapid identification and exploration of important interactions, especially in problems with very large numbers of predictor variables and many useless predictors. The performance of the algorithms is compared in simulation studies.

20.
This paper addresses the problem of identifying groups whose feature-variable means satisfy specific conditions. We refer to the identified groups as 'target clusters' (TCs). To identify TCs, we propose a method based on a normal mixture model (NMM) restricted by a linear combination of the means. We provide an expectation-maximization (EM) algorithm for fitting the restricted NMM by maximum likelihood. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method yields several types of useful clusters that would be difficult to obtain with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach shows that the proposed method is promising for such identification.
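The EM algorithm underlying such a method alternates between computing component responsibilities (E-step) and re-estimating weights, means, and variances (M-step). The sketch below shows only the standard, unrestricted EM for a one-dimensional K-component normal mixture; the paper's linear restriction on the means (and its multivariate setting) is not implemented here:

```python
import numpy as np

def em_gmm1d(x, K=2, iters=200, seed=0):
    """EM for an unrestricted one-dimensional K-component normal mixture.
    Returns (weights pi, means mu, standard deviations sigma)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, K, replace=False)      # initialise means at data points
    sigma = np.full(K, x.std())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixture parameters.
        nk = r.sum(axis=0) + 1e-12            # guard against empty components
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma
```

A restricted variant would replace the M-step update of `mu` with a constrained maximisation (e.g. projecting onto the subspace defined by the linear restriction), which is the part the paper develops.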


Copyright©北京勤云科技发展有限公司  京ICP备09084417号