Similar literature
 20 similar articles found (search time: 0 ms)
1.
In 1936, H. Fairfield Smith (A discriminant function for plant selection, Annals of Eugenics (London) 7, 240–260) suggested a linear selection index for selecting varieties with higher genotypic values. Since then, the idea has been extended in various directions, such as restricted selection indices. In this paper, linear selection indices are considered when the loss function, unlike squared error, is asymmetric. In particular, a LINEX loss function is considered for this purpose. It is shown that under multivariate normality, this approach still leads to the usual selection indices. Certain computational aspects are indicated.
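To make the asymmetry mentioned in this abstract concrete, the sketch below evaluates the standard LINEX form b(exp(a·e) − a·e − 1) next to squared error. The function names and the default values a = 1, b = 1 are illustrative assumptions and are not taken from the paper.

```python
import math


def linex_loss(error: float, a: float = 1.0, b: float = 1.0) -> float:
    """LINEX loss b * (exp(a*error) - a*error - 1).

    The sign of `a` decides which direction of error is penalised more;
    a = 1 and b = 1 are arbitrary illustrative defaults.
    """
    return b * (math.exp(a * error) - a * error - 1.0)


def squared_error_loss(error: float) -> float:
    """Symmetric reference loss."""
    return error ** 2


if __name__ == "__main__":
    # Compare the two losses for an under-estimate, an exact hit, and an over-estimate.
    for e in (-1.0, 0.0, 1.0):
        print(f"error={e:+.1f}  linex={linex_loss(e):.4f}  squared={squared_error_loss(e):.4f}")
```

With a > 0, positive errors cost more than negative errors of the same size, whereas squared error treats both alike.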

2.
The problem of constructing selection indices with arbitrary linear inequality constraints on covariances is considered. The selection indices suggested by Smith (1936), Kempthorne and Nordskog (1959), and Tallis (1962) are shown to be special cases. A method of solution is presented and the prediction errors of various indices are compared. The method is applied to the example given in Kempthorne and Nordskog (1959).

3.
In Wu and Zen (1999), a linear model selection procedure based on M-estimation is proposed, which includes many classical model selection criteria as special cases, and it is shown that the selection procedure is strongly consistent for a variety of penalty functions. In this paper, we investigate its small-sample performance for some choices of fixed penalty functions. The performance is seen to vary with the choice of penalty. Hence, a randomized penalty based on observed data is proposed, which preserves the consistency property and provides improved performance over a fixed choice of penalty function.

4.
There has been ever-increasing interest in the use of microarray experiments as a basis for the provision of prediction (discriminant) rules for improved diagnosis of cancer and other diseases. Typically, microarray cancer studies provide only a limited number of tissue samples from the specified classes of tumours or patients, whereas each tissue sample may contain the expression levels of thousands of genes. Thus researchers are faced with the problem of forming a prediction rule on the basis of a small number of classified tissue samples, which are of very high dimension. Usually, some form of feature (gene) selection is adopted in the formation of the prediction rule. As the subset of genes used in the final form of the rule has not been randomly selected but rather chosen according to some criterion designed to reflect the predictive power of the rule, there will be a selection bias inherent in estimates of the error rates of the rules if care is not taken. We present various situations where selection bias arises in the formation of a prediction rule and where there is a consequent need to correct for this bias. We describe the design of cross-validation schemes that are able to correct for the various selection biases.
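The selection bias described in this abstract can be illustrated with a small simulation on pure-noise data, where the true error rate of any rule is 50%. The sketch below is not the authors' scheme: the mean-difference ranking, the nearest-centroid rule, and all names and sizes (40 samples, 2000 features, 10 selected, 5 folds) are assumptions chosen for illustration. It contrasts selecting features once on all data with repeating the selection inside every fold.

```python
import numpy as np


def top_k_features(X, y, k):
    """Rank features by absolute difference of class means; return the top-k indices."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[::-1][:k]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_error(X, y, k, n_folds, select_inside_folds, rng):
    """Cross-validated error rate, with feature selection inside or outside the folds."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    global_features = top_k_features(X, y, k)  # peeks at every label, test folds included
    errors = 0
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        feats = top_k_features(X_tr, y_tr, k) if select_inside_folds else global_features
        pred = nearest_centroid_predict(X_tr[:, feats], y_tr, X[test_idx][:, feats])
        errors += int((pred != y[test_idx]).sum())
    return errors / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2000))  # noise only: no feature carries class information
    y = np.repeat([0, 1], 20)
    print("selection outside CV:", cv_error(X, y, 10, 5, False, rng))
    print("selection inside CV: ", cv_error(X, y, 10, 5, True, rng))
```

Because the labels are independent of the features, an honest estimate should hover around 0.5; an estimate far below that from the first call would reflect the bias introduced by selecting features with the test labels in view.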

5.
Supersaturated designs (SSDs) constitute a large class of fractional factorial designs which can be used for screening out the important factors from a large set of potentially active ones. A major advantage of these designs is that they reduce the experimental cost dramatically, but their crucial disadvantage is the confounding involved in the statistical analysis. Identification of active effects in SSDs has been the subject of much recent study. In this article we present a two-stage procedure for analyzing two-level SSDs assuming a main-effects-only model, without any interaction terms. This method combines sure independence screening (SIS) with different penalty functions, such as Smoothly Clipped Absolute Deviation (SCAD), Lasso and MC penalty, achieving both the down-selection and the estimation of the significant effects simultaneously. Insights on using the proposed methodology are provided through various simulation scenarios, and several comparisons with existing approaches, such as stepwise selection in combination with SCAD and the Dantzig Selector (DS), are presented as well. Results of the numerical study and real data analysis reveal that the proposed procedure can be considered an advantageous tool owing to its extremely good performance in identifying active factors.

6.
The logistic regression model is used when the response variable is dichotomous. In the presence of multicollinearity, the variance of the maximum likelihood estimator (MLE) becomes inflated. The Liu estimator for the linear regression model was proposed by Liu to remedy this problem. Urgan and Tez and Mansson et al. examined the Liu estimator (LE) for the logistic regression model. We introduce the restricted Liu estimator (RLE) for the logistic regression model. Moreover, a Monte Carlo simulation study is conducted to compare the performance of the MLE, restricted maximum likelihood estimator (RMLE), LE, and RLE for the logistic regression model.

7.
In this article, by using the constant and random selection matrices, several properties of the maximum likelihood (ML) estimates and the ML estimator of a normal distribution with missing data are derived. The constant selection matrix allows us to obtain an explicit form of the ML estimates and the exact relationship between the EM algorithm and the score function. The random selection matrix allows us to clarify how the missing-data mechanism works in the proof of the consistency of the ML estimator, to derive the asymptotic properties of the sequence by the EM algorithm, and to derive the information matrix.

8.
The number of variables in a regression model is often too large and a more parsimonious model may be preferred. Selection strategies (e.g. all-subset selection with various penalties for model complexity, or stepwise procedures) are widely used, but there are few analytical results about their properties. The problems of replication stability, model complexity, selection bias and an over-optimistic estimate of the predictive value of a model are discussed together with several proposals based on resampling methods. The methods are applied to data from a case–control study on atopic dermatitis and a clinical trial to compare two chemotherapy regimes by using a logistic regression and a Cox model. A recent proposal to use shrinkage factors to reduce the bias of parameter estimates caused by model building is extended to parameterwise shrinkage factors and is discussed as a further possibility to illustrate problems of models which are too complex. The results from the resampling approaches favour greater simplicity of the final regression model.

9.
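Lin Cunjie (林存洁), Li Yang (李扬). Statistical Research (《统计研究》), 2016, 33(11): 109-112
In the era of big data, whether traditional statistics still has a role to play has become a matter of debate for many. Taking the ARGO model as a case study, this paper describes how statistical methods have been applied in big data analysis and what they have achieved, and proposes improvements from a statistical perspective. The analysis of the ARGO model shows that many fundamental problems in big data analysis remain statistical problems, and that the statistical regularities in data are still the greatest value that data analysis seeks to extract. This also means that statistical thinking can only become more important in big data analysis. For big data with complex structure and diverse sources, statistical methods also need new exploration and experimentation, which presents both an opportunity and a challenge for statistics.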

10.
The impact of restricted randomization on the information matrix has created challenges for the computation of design optimality criteria. This article focuses on the computation of the maximum and minimum prediction variance for Central Composite (CCD) and Box–Behnken (BBD) split plot designs (SPD). The approach is to analytically determine the exact maximum and minimum prediction variance for both spherical and cuboidal second-order SPD. A particular feature of these analytical functions is that they are functions of the design parameters. Finally, the application of these analytical functions is demonstrated for a CCD SPD.

11.
In its application to variable selection in the linear model, cross-validation is traditionally applied to an individual model contained in a set of potential models. Each model in the set is cross-validated independently of the rest and the model with the smallest cross-validated sum of squares is selected. In such settings, an efficient algorithm for cross-validation must be able to add and to delete single points quickly from a mixed model. Recent work in variable selection has applied cross-validation to an entire process of variable selection, such as Backward Elimination or Stepwise regression (Thall, Simon and Grier, 1992). The cross-validated version of Backward Elimination, for example, divides the data into an estimation and validation set and performs a complete Backward Elimination on the estimation set, while computing the cross-validated sum of squares at each step with the validation set. After doing this process once, a different validation set is selected and the process is repeated. The final model selection is based on the cross-validated sum of squares for all Backward Eliminations. An optimal algorithm for this application of cross-validation need not be efficient in adding and deleting observations from a single model but must be efficient in computing the cross-validation sum of squares from a series of models using a common validation set. This paper explores such an algorithm based on the sweep operator.
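For readers unfamiliar with the sweep operator named at the end of this abstract, the sketch below shows one common convention for it and how sweeping the cross-product matrix of [X, y] yields least-squares coefficients and the residual sum of squares. It is only an illustration of the operator, not the paper's algorithm; the helper names `ols_by_sweep` and `validation_sse` are hypothetical.

```python
import numpy as np


def sweep(A, k):
    """Sweep the symmetric matrix A on pivot k (one common sign convention)."""
    A = np.asarray(A, dtype=float)
    d = A[k, k]
    B = A - np.outer(A[:, k], A[k, :]) / d  # update every entry off the pivot row/column
    B[k, :] = A[k, :] / d                   # pivot row
    B[:, k] = A[:, k] / d                   # pivot column
    B[k, k] = -1.0 / d                      # pivot element
    return B


def ols_by_sweep(X, y):
    """Fit y on the columns of X by sweeping the cross-product matrix of [X, y]."""
    Z = np.column_stack([X, y])
    A = Z.T @ Z
    p = X.shape[1]
    for k in range(p):
        A = sweep(A, k)
    # After sweeping the X block: last column holds the coefficients,
    # bottom-right entry holds the residual sum of squares.
    return A[:p, p], A[p, p]


def validation_sse(beta, X_val, y_val):
    """Sum of squared prediction errors on a held-out validation set."""
    return float(((y_val - X_val @ beta) ** 2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)
    beta, rss = ols_by_sweep(X[:20], y[:20])  # estimation set
    print("coefficients:", beta)
    print("training RSS:", rss)
    print("validation SSE:", validation_sse(beta, X[20:], y[20:]))
```

Sweeping one pivot at a time is what makes the operator attractive for stepwise procedures: each variable enters the fit through a single sweep of the matrix already in hand.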

12.
In the linear regression model with elliptical errors, a shrinkage ridge estimator is proposed. In this regard, the restricted ridge regression estimator under sub-space restriction is improved by incorporating a general function which satisfies Taylor’s series expansion. The approximate quadratic risk function of the proposed shrinkage ridge estimator is evaluated in the elliptical regression model. A Monte Carlo simulation study and an analysis based on a real data example are considered for performance analysis. It is evident from the numerical results that the shrinkage ridge estimator performs better than both unrestricted and restricted estimators in the multivariate t-regression model, for some specific cases.

13.
A review of the randomized response model introduced by Warner (1965) is given; then a randomized response model applicable to continuous data, based on a mixture of two normal distributions, is considered. The target here is not to estimate any parameter, but rather to select the population with the best parameter value. This article provides a study on how to choose the best population among k distinct populations using an indifference-zone procedure. The article also includes tables of the sample size required to have a probability of correct selection higher than some specified value in the preference zone for the randomized response model considered.

14.
The least absolute shrinkage and selection operator (LASSO) is a prominent estimator which selects significant (in some sense) features and kills insignificant ones. Indeed, the LASSO shrinks features larger than a noise level toward zero. In this article, we force the LASSO to be shrunken more by proposing a Stein-type shrinkage estimator emanating from the LASSO, namely the Stein-type LASSO. The newly proposed estimator shows good numerical performance in terms of risk. Variants of this estimator have smaller relative MSE and prediction error, compared to the LASSO, in the analysis of the prostate cancer dataset.
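The "select or kill" behaviour described in this abstract is easiest to see in the special case of an orthonormal design, where the LASSO solution reduces to soft-thresholding of the least-squares coefficients. The sketch below shows only that textbook operator, under that assumption; it is not the Stein-type LASSO proposed in the paper, and the threshold name `lam` is illustrative.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding: zero out |z| <= lam, pull larger values toward zero by lam."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


if __name__ == "__main__":
    ols_coefficients = np.array([3.0, -0.5, 1.0, -2.0])  # made-up illustrative values
    print(soft_threshold(ols_coefficients, lam=1.0))
```

Coefficients whose magnitude does not exceed the threshold are set to zero, while larger ones survive but are reduced in magnitude by the threshold.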

15.
We propose that Bayesian variable selection for linear parametrizations with Gaussian iid likelihoods should be based on the spherical symmetry of the diagonalized parameter space. Our r-prior results in closed forms for the evidence for four examples, including the hyper-g prior and the Zellner–Siow prior, which are shown to be special cases. Scenarios of a single-variable dispersion parameter and of fixed dispersion are studied, and asymptotic forms comparable to the traditional information criteria are derived. A simulation exercise shows that model comparison based on our r-prior gives good results comparable to or better than current model comparison schemes.

16.
We propose a unified approach for multilevel sample selection models using a generalized result on skew distributions arising from selection. If the underlying distributional assumption is normal, then the resulting density for the outcome is the continuous component of the sample selection density and has links with the closed skew-normal distribution (CSN). The CSN distribution provides a framework which simplifies the derivation of the conditional expectation of the observed data. This generalizes Heckman’s two-step method to a multilevel sample selection model. Finite-sample performance of the maximum likelihood estimator of this model is studied through a Monte Carlo simulation.

17.
18.
Let (X1,…,Xk) be a multinomial vector with unknown cell probabilities (p1,…,pk). A subset of the cells is to be selected in a way so that the cell associated with the smallest cell probability is included in the selected subset with a preassigned probability, P1. Suppose the loss is measured by the size of the selected subset, S. Using linear programming techniques, selection rules can be constructed which are minimax with respect to S in the class of rules which satisfy the P1-condition. In some situations, the rule constructed by this method is the rule proposed by Nagel (1970). Similar techniques also work for selection in terms of the largest cell probability.

19.
Özkale and Kaçiranlar introduced the restricted two-parameter estimator (RTPE) to deal with the well-known multicollinearity problem in the linear regression model. In this paper, the restricted almost unbiased two-parameter estimator (RAUTPE) based on the RTPE is presented. The quadratic bias and mean-squared error of the proposed estimator are discussed and compared with those of the corresponding competitors in the literature. Furthermore, a numerical example and a Monte Carlo simulation study are given to illustrate some of the theoretical results.

20.
Variable selection in elliptical Linear Mixed Models (LMMs) with a shrinkage penalty function (SPF) is the main scope of this study. SPFs are applied for parameter estimation and variable selection simultaneously. The smoothly clipped absolute deviation penalty (SCAD) is one of the SPFs and it is adapted to the elliptical LMM in this study. The proposed idea is applicable to a variety of models which are set up with different distributions such as normal, Student-t, Pearson VII, power exponential and so on. Simulation studies and a real data example with one of the elliptical distributions show that, if variable selection is also a concern, it is worthwhile to carry out the variable selection and the parameter estimation simultaneously in the elliptical LMM.
