Similar literature
 20 similar documents found (search time: 463 ms)
1.
ABSTRACT

Fisher's linear discriminant analysis (FLDA) is known as a method to find a discriminative feature space for multi-class classification. As a theory extending FLDA to its ultimate nonlinear form, optimal nonlinear discriminant analysis (ONDA) has been proposed. ONDA indicates that the theoretically best nonlinear map for maximizing Fisher's discriminant criterion is formulated using the Bayesian a posteriori probabilities. In addition, the theory proves that FLDA is equivalent to ONDA when the Bayesian a posteriori probabilities are approximated by linear regression (LR). Due to some limitations of the linear model, there is room to improve FLDA by using stronger approximation/estimation methods. For the purpose of probability estimation, multinomial logistic regression (MLR) is more suitable than LR. Along this line, in this paper, we develop a nonlinear discriminant analysis (NDA) in which the posterior probabilities in ONDA are estimated by MLR. We also develop a way to introduce sparseness into discriminant analysis. By applying L1 or L2 regularization to LR or MLR, we can incorporate sparseness in FLDA and our NDA to increase generalization performance. The performance of these methods is evaluated by benchmark experiments using standard datasets and a face classification experiment.

2.
In some industrial applications, the quality of a process or product is characterized by a relationship between the response variable and one or more independent variables, which is called a profile. There are many approaches for monitoring different types of profiles in the literature. Most researchers assume that the response variable follows a normal distribution. However, this assumption may be violated in many cases. The most likely situation is when the response variable follows a distribution from generalized linear models (GLMs). For example, when the response variable is the number of defects in a certain area of a product, the observations follow a Poisson distribution, and ignoring this fact will cause misleading results. In this paper, three methods, namely a $T^2$-based method, a likelihood ratio test (LRT) method and an F method, are developed and modified so that they can be applied to monitoring GLM regression profiles in Phase I. The performance of the proposed methods is analysed and compared for the special case in which the response variable follows a Poisson distribution. A simulation study is done regarding the probability of the signal criterion. Results show that the LRT method performs better than the other two methods, and the F method performs better than the $T^2$-based method, in detecting either small or large step shifts as well as drifts. Moreover, in detecting outliers the F method performs better than the other two methods, and the LRT method performs poorly in comparison with the F and $T^2$-based methods. A real case, in which the size and number of agglomerates ejected from a volcano in successive days form the GLM profile, is illustrated, and the proposed methods are applied to determine whether the number of agglomerates of each size is under statistical control or not. Results showed that the proposed methods could handle the mentioned situation and distinguish the out-of-control conditions.

3.
Magnetic resonance imaging techniques can be used to measure some biophysical properties of tissue. In this context, the T2 relaxation time is an important parameter for soft‐tissue contrast. The authors develop a new technique to estimate the integral of the distribution of T2 relaxation time without imposing any constraint other than the monotonicity of the underlying cumulative relaxation time distribution. They explore the properties of the estimation and its applications for the analysis of breast tissue data. As they show, an extension of linear discriminant analysis is found to distinguish well between two classes of breast tissue.

4.
The class of symmetric linear regression models has the normal linear regression model as a special case and includes several models that assume that the errors follow a symmetric distribution with longer-than-normal tails. An important member of this class is the t linear regression model, which is commonly used as an alternative to the usual normal regression model when the data contain extreme or outlying observations. In this article, we develop second-order asymptotic theory for score tests in this class of models. We obtain Bartlett-corrected score statistics for testing hypotheses on the regression and the dispersion parameters. The corrected statistics have chi-squared distributions with errors of order $O(n^{-3/2})$, $n$ being the sample size. The corrections represent an improvement over the corresponding original Rao's score statistics, which are chi-squared distributed up to errors of order $O(n^{-1})$. Simulation results show that the corrected score tests perform much better than their uncorrected counterparts in samples of small or moderate size.

5.
A geometrical interpretation of the classical tests of the relation between two sets of variables is presented. One of the variable sets may be considered as fixed and then we have a multivariate regression model. When the Wilks’ lambda distribution is viewed geometrically it is obvious that the two special cases, the $F$ distribution and the Hotelling $T^2$ distribution, are equivalent. From the geometrical perspective it is also obvious that the test statistic and the $p$-value are unchanged if the responses and the predictors are interchanged.

6.
Social network analysis is an important analytic tool to forecast social trends by modeling and monitoring the interactions between network members. This paper proposes an extension of a statistical process control method to monitor social networks by determining the baseline periods when the reference network set is collected. We consider the probability density profile (PDP) to identify baseline periods, using Poisson regression to model the communications between members. Also, Hotelling $T^2$ and likelihood ratio test (LRT) statistics are developed to monitor the network in Phase I. The results based on signal probability indicate a satisfactory performance for the proposed method.

7.
Algebraic relationships between Hosmer–Lemeshow (HL), Pigeon–Heyse ($J^2$), and Tsiatis (T) goodness-of-fit statistics for binary logistic regression models with continuous covariates were investigated, and their distributional properties and performances studied using simulations. Groups were formed under deciles-of-risk (DOR) and partition-covariate-space (PCS) methods. Under DOR, HL and T followed reported null distributions, while $J^2$ did not. Under PCS, only T followed its reported null distribution, with HL and $J^2$ dependent on model covariate number and partitioning. Generally, all had similar power. Of the three, T performed best, maintaining Type-I error rates and having a distribution invariant to covariate characteristics, number, and partitioning.
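
As a rough illustration of the deciles-of-risk grouping mentioned in this abstract, the following sketch computes a Hosmer–Lemeshow-type statistic from binary outcomes and fitted probabilities. The function name `hosmer_lemeshow` and the equal-count grouping rule are assumptions made for this example and are not taken from the paper; the Pigeon–Heyse and Tsiatis statistics are not shown.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow-type statistic with groups formed by sorting on fitted risk.

    y: list of 0/1 outcomes; p: list of fitted probabilities (same length).
    Assumes every group has a mean fitted probability strictly between 0 and 1.
    """
    order = sorted(range(len(y)), key=lambda i: p[i])  # indices sorted by risk
    n = len(order)
    stat = 0.0
    for k in range(groups):
        # Slice the sorted indices into (nearly) equal-sized risk groups.
        idx = order[k * n // groups:(k + 1) * n // groups]
        if not idx:
            continue
        observed = sum(y[i] for i in idx)   # observed number of events
        expected = sum(p[i] for i in idx)   # expected number of events
        mean_p = expected / len(idx)
        stat += (observed - expected) ** 2 / (len(idx) * mean_p * (1.0 - mean_p))
    return stat
```

With `groups=10` this corresponds to the usual deciles-of-risk layout; the reference distribution to compare the statistic against is a separate question that the abstract itself discusses.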

8.
Peanut allergy is one of the most prevalent food allergies. The possibility of a lethal accidental exposure and the persistence of the disease make it a public health problem. Evaluating the intensity of symptoms is accomplished with a double blind placebo-controlled food challenge (DBPCFC), which scores the severity of reactions and measures the dose of peanut that elicits the first reaction. Since DBPCFC can result in life-threatening responses, we propose an alternate procedure with the long-term goal of replacing invasive allergy tests. Discriminant analyses of DBPCFC score, the eliciting dose and the first accidental exposure score were performed in 76 allergic patients using 6 immunoassays and 28 skin prick tests. A multiple factorial analysis was performed to assign equal weights to both groups of variables, and predictive models were built by cross-validation with linear discriminant analysis, k-nearest neighbours, classification and regression trees, penalized support vector machine, stepwise logistic regression and AdaBoost methods. We developed an algorithm for simultaneously clustering eliciting dose values and selecting discriminant variables. Our main conclusion is that antibody measurements offer information on the allergy severity, especially those directed against rAra-h1 and rAra-h3. Further independent validation of these results and the use of new predictors will help extend this study to clinical practices.

9.
The classical D-optimality principle in regression design may be motivated by a desire to maximize the coverage probability of a fixed-volume confidence ellipsoid on the regression parameters. When the fitted model is exactly correct, this amounts to minimizing the determinant of the covariance matrix of the estimators. We consider an analogue of this problem, under the approximately linear model $E[y \mid x] = \theta^T z(x) + f(x)$. The nonlinear disturbance $f(x)$ is essentially unknown, and the experimenter fits only to the linear part of the response. The resulting bias affects the coverage probability of the confidence ellipsoid on $\theta$. We study the construction of designs which maximize the minimum coverage probability as $f$ varies over a certain class. Explicit designs are given in the case that the fitted response surface is a plane.
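
To make the classical criterion in the first two sentences concrete, here is a minimal sketch for the simplest case, a straight-line fit with $z(x) = (1, x)^T$ and no disturbance $f$. Minimizing the determinant of the estimator covariance is then equivalent to maximizing $\det(X^T X)$. The function name `d_criterion` and the two candidate designs are illustrative assumptions; the robust designs studied in the paper are not reproduced here.

```python
def d_criterion(points):
    """det(X'X) for the straight-line model E[y|x] = theta0 + theta1 * x.

    X has rows (1, x_i), so X'X = [[n, sum x], [sum x, sum x^2]].
    """
    n = len(points)
    sx = sum(points)
    sxx = sum(x * x for x in points)
    return n * sxx - sx * sx


# Two hypothetical four-point designs on the interval [-1, 1].
endpoints = d_criterion([-1.0, -1.0, 1.0, 1.0])  # all mass at the endpoints
spread = d_criterion([-1.0, -0.5, 0.5, 1.0])     # points spread over the interval

print(endpoints, spread)
```

Under the exactly correct model the endpoint design gives the larger determinant; once a disturbance $f$ is allowed, as in the abstract, that ranking need not carry over.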

10.
ABSTRACT

The effect of parameter estimation on profile monitoring methods has been studied by only a few researchers, and only under the assumption of a normal response variable. However, in some practical situations, the normality assumption is violated and the response variable follows a discrete distribution such as the Poisson. In this paper, we evaluate the effect of parameter estimation on the Phase II monitoring of Poisson regression profiles by considering two control charts, namely the Hotelling’s $T^2$ and the multivariate exponentially weighted moving average (MEWMA) charts. Simulation studies in terms of the average run length (ARL) and the standard deviation of the run length (SDRL) are carried out to assess the effect of estimated parameters on the performance of Phase II monitoring approaches. The results reveal that both in-control and out-of-control performances of these charts are adversely affected when the regression parameters are estimated.

11.
For a general class of scalar stationary processes, essentially those for which the best linear predictor is the best predictor (in the mean square sense), it is shown that, under fairly minor additional conditions, the sample autocorrelations converge to the true values almost surely and uniformly in the lag, $t$, at a rate $(T^{-1}\log T)^{1/2}$, where $T$ is the sample size. For ARMA processes, if $|t| \le (\log T)^a$, $a < \infty$, the rate is the best possible, namely $(T^{-1}\log\log T)^{1/2}$. In particular the somewhat implausible condition, on the innovations, that $E\{\varepsilon(t)^2 \mid \mathcal{F}_{t-1}\}$ is constant is avoided in these results. The theorems are used to discuss autoregressive approximation. When the stationary process is a vector process the condition on the innovation sequence, $\varepsilon(t)$, that $E\{\varepsilon(t)\varepsilon(t)' \mid \mathcal{F}_{t-1}\}$ be constant, cannot be entirely avoided in relation to autoregressive approximation. This is also discussed.

12.
The asymptotic local power of least squares–based fixed-T panel unit root tests allowing for a structural break in their individual effects and/or incidental trends of the AR(1) panel data model is studied. Limiting distributions of these tests are derived under a sequence of local alternatives, and analytic expressions show how their means and variances are functions of the break date and the time dimension of the panel. The considered tests have nontrivial local power in an $N^{-1/2}$ neighborhood of unity when the panel data model includes individual intercepts. For panel data models with incidental trends, the power of the tests becomes trivial in this neighborhood. However, this problem does not always appear if the tests allow for serial correlation in the error term and completely vanishes in the presence of cross-section correlation. These results show that fixed-T tests have very different theoretical properties than their large-T counterparts. Monte Carlo experiments demonstrate the usefulness of the asymptotic theory in small samples.

13.
In this article, we study the joint distribution of $X$ and two linear combinations of order statistics, $a^T \mathbf{Y}_{(2)}$ and $b^T \mathbf{Y}_{(2)}$, where $a = (a_1, a_2)^T$ and $b = (b_1, b_2)^T$ are arbitrary vectors in $\mathbb{R}^2$ and $\mathbf{Y}_{(2)} = (Y_{(1)}, Y_{(2)})^T$ is a vector of order statistics obtained from $(Y_1, Y_2)^T$ when $(X, Y_1, Y_2)^T$ follows a trivariate normal distribution with a positive definite covariance matrix. We show that this distribution belongs to the skew-normal family and hence our work is a generalization of Olkin and Viana (J Am Stat Assoc 90:1373–1379, 1995) and Loperfido (Test 17:370–380, 2008).

14.
Canonical discriminant functions are defined here as linear combinations that separate groups of observations, and canonical variates are defined as linear combinations associated with canonical correlations between two sets of variables. In standardized form, the coefficients in either type of canonical function provide information about the joint contribution of the variables to the canonical function. The standardized coefficients can be converted to correlations between the variables and the canonical function. These correlations generally alter the interpretation of the canonical functions. For canonical discriminant functions, the standardized coefficients are compared with the correlations, with partial t and F tests, and with rotated coefficients. For canonical variates, the discussion includes standardized coefficients, correlations between variables and the function, rotation, and redundancy analysis. Various approaches to interpretation of principal components are compared: the choice between the covariance and correlation matrices, the conversion of coefficients to correlations, the rotation of the coefficients, and the effect of special patterns in the covariance and correlation matrices.

15.
One of the objectives of research in statistical process control is to obtain control charts that show few false alarms but, at the same time, are able to detect quickly the shifts in the distribution of the quality variables employed to monitor a productive process. In this article, the synthetic-$T^2$ control chart is developed, which consists of the simultaneous use of a CRL chart and a Hotelling's $T^2$ control chart. The ARL is calculated employing Markov chains for steady and zero-state scenarios. An optimization procedure has been developed to obtain the optimum parameters of the synthetic-$T^2$ chart, for zero and steady cases, given the values of in-control ARL and the magnitude of shift which needs to be detected rapidly. A comparison with the standard $T^2$, MEWMA, $T^2$ with variable sample size, and $T^2$ with double sampling charts reveals that the synthetic-$T^2$ chart always performs better than the standard $T^2$ chart. The comparison with the remaining charts demonstrates in which cases the performance of this new chart makes it worth employing in real applications.

16.
When a process is monitored with a $T^2$ control chart in a Phase II setting, the MYT decomposition is a valuable diagnostic tool for interpreting signals in terms of the process variables. The decomposition splits a signaling $T^2$ statistic into independent components that can be associated with either individual variables or groups of variables. Since these components are $T^2$ statistics with known distributions, they can be used to determine which of the process variable(s) contribute to the signal. However, this procedure cannot be applied directly to Phase I since the distributions of the individual components are unknown. In this article, we develop the MYT decomposition procedure for a Phase I operation, when monitoring a random sample of individual observations and identifying outliers. We use a relationship between the $T^2$ statistic in Phase I with the corresponding $T^2$ statistic resulting when an observation is omitted from this sample to derive the distributions of these components and demonstrate the Phase I application of the MYT decomposition.
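
The splitting of a $T^2$ statistic into components can be sketched for the two-variable case, where $T^2 = T_1^2 + T_{2\cdot 1}^2$: an unconditional term for the first variable plus a term for the second variable conditional on the first. The function name `myt_bivariate` and the small data set are assumptions made for illustration; the Phase I distributions derived in the paper are not reproduced.

```python
def myt_bivariate(data, x):
    """Return (T2, T1_sq, T2_given_1_sq) for observation x.

    data: list of (x1, x2) pairs used to estimate the mean and covariance.
    """
    n = len(data)
    m1 = sum(r[0] for r in data) / n
    m2 = sum(r[1] for r in data) / n
    # Sample variances and covariance (divisor n - 1).
    s11 = sum((r[0] - m1) ** 2 for r in data) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in data) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in data) / (n - 1)

    d1, d2 = x[0] - m1, x[1] - m2
    det = s11 * s22 - s12 * s12
    # Overall statistic: d' S^{-1} d, using the closed-form 2x2 inverse.
    t2 = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det

    # Unconditional term for variable 1.
    t1_sq = d1 * d1 / s11
    # Conditional term: variable 2 adjusted for its regression on variable 1.
    slope = s12 / s11
    cond_var = s22 - s12 * s12 / s11
    t2_given_1_sq = (d2 - slope * d1) ** 2 / cond_var
    return t2, t1_sq, t2_given_1_sq


sample = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0), (5.0, 7.0)]
t2, t1_sq, t2_given_1_sq = myt_bivariate(sample, (5.0, 7.0))
```

A large conditional term with a small unconditional term would suggest that the observation breaks the usual relationship between the two variables rather than being extreme in either variable alone.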

17.
This study considers the binary classification of functional data collected in the form of curves. In particular, we assume a situation in which the curves are highly mixed over the entire domain, so that the global discriminant analysis based on the entire domain is not effective. This study proposes an interval-based classification method for functional data: the informative intervals for classification are selected and used for separating the curves into two classes. The proposed method, called functional logistic regression with fused lasso penalty, combines the functional logistic regression as a classifier and the fused lasso for selecting discriminant segments. The proposed method automatically selects the most informative segments of functional data for classification by employing the fused lasso penalty and simultaneously classifies the data based on the selected segments using the functional logistic regression. The effectiveness of the proposed method is demonstrated with simulated and real data examples.

18.
Exact powers of four classical tests in a GMANOVA model are compared numerically when the order of the error sum of squares matrix is 2. The four tests are the likelihood ratio (LR), Pillai's V, Hotelling's $T^2$, and Roy's largest root tests. It turns out that for small sizes, there are a few cases in which Rothenberg's condition for the relative magnitude of asymptotic powers of three standard tests does not hold.

19.
Results of an exhaustive study of the bias of the least squares estimator (LSE) of a first-order autoregression coefficient $\alpha$ in a contaminated Gaussian model are presented. The model describes the following situation. The process is defined as $X_t = \alpha X_{t-1} + Y_t$. Until a specified time $T$, the $Y_t$ are iid normal $N(0, 1)$. At the moment $T$ we start our observations, and since then the distribution of $Y_t$, $t \ge T$, is a Tukey mixture $T(\varepsilon, \sigma) = (1 - \varepsilon)N(0, 1) + \varepsilon N(0, \sigma^2)$. The bias of the LSE as a function of $\alpha$, $\varepsilon$, and $\sigma^2$ is considered. A rather unexpected fact is revealed: given $\alpha$ and $\varepsilon$, the bias does not change monotonically with $\sigma$ (“the magnitude of the contaminant”), and similarly, given $\alpha$ and $\sigma$, the bias is not growing with $\varepsilon$ (“the amount of contaminants”).
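
The model described in this abstract can be explored with a small Monte Carlo sketch. The code below assumes the no-intercept least squares estimator $\hat\alpha = \sum X_t X_{t-1} / \sum X_{t-1}^2$; the function names, the burn-in length, the sample size, and the number of replications are all assumptions chosen for illustration, not settings from the paper.

```python
import random


def lse_ar1(x):
    """Least squares estimate of alpha in X_t = alpha * X_{t-1} + Y_t (no intercept)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def simulate_bias(alpha, eps, sigma, n=50, burn_in=200, reps=2000, seed=0):
    """Monte Carlo estimate of E[alpha_hat] - alpha under Tukey-mixture innovations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = 0.0
        # Before observation starts: pure N(0, 1) innovations.
        for _ in range(burn_in):
            x = alpha * x + rng.gauss(0.0, 1.0)
        path = [x]
        # Observed stretch: innovation is N(0, sigma^2) with probability eps.
        for _ in range(n):
            scale = sigma if rng.random() < eps else 1.0
            x = alpha * x + rng.gauss(0.0, scale)
            path.append(x)
        total += lse_ar1(path)
    return total / reps - alpha
```

Calling `simulate_bias` over a grid of `sigma` values for fixed `alpha` and `eps` is one way to look for the non-monotone behaviour the abstract reports, keeping in mind that Monte Carlo noise can mask small differences.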

20.
Artur J. Lemonte. Statistics, 2013, 47(6): 1249-1265
The class of generalized linear models with dispersion covariates, which allows us to jointly model the mean and dispersion parameters, is a natural extension to the classical generalized linear models. In this paper, we derive the asymptotic expansions under a sequence of Pitman alternatives (up to order $n^{-1/2}$) for the nonnull distribution functions of the likelihood ratio, Wald, Rao score and gradient statistics in this class of models. The asymptotic distributions of these statistics are obtained for testing a subset of regression parameters and for testing a subset of dispersion parameters. Based on these nonnull asymptotic expansions, the powers of all four tests, which are equivalent to first order, are compared. Furthermore, we consider Monte Carlo simulations in order to compare the finite-sample performance of these tests in this class of models. We present two empirical applications to real data sets for illustrative purposes.
