
1.

The application of bias to discriminant analysis

Pasquale J. DiPillo 《Communications in Statistics - Theory and Methods》2013,42(9):843-854

When classification rules are constructed using sample estimates, it is known that the probability of misclassification is not minimized. This article introduces a biased minimum χ² rule to classify items from a multivariate normal population. Using the principle of variance reduction, the probability of misclassification is reduced when the biased procedure is employed. Results of sampling experiments over a broad range of conditions are provided to demonstrate this improvement.

2.

Biased discriminant analysis: Evaluation of the optimum probability of misclassification

Pasquale J. DiPillo 《Communications in Statistics - Theory and Methods》2013,42(14):1447-1457

This article extends the work of DiPillo (1976) on the biased minimum χ² rule. The optimum value of k (the biasing factor) is determined and the true probability of misclassification is found. The proportion improvements reported in the 1976 paper are shown to be conservative. Some suggestions for algorithms to determine the optimal value of k are presented.

3.

A simulation study of five biased estimators for straight line regression

S. G. Carmer W. T. Hsieh 《Communications in Statistics - Simulation and Computation》2013,42(6):529-548

Five biased estimators of the slope in straight line regression are considered. For each, the estimate of the “bias parameter”, k, is a function of N, the number of observations, and r̂², the square of the least squares estimate of the standardized slope, β. The estimators include that of Farebrother, the ridge estimator of Hoerl, Kennard, and Baldwin, Vinod's shrunken estimators, and a new modification of one of the latter. Properties of the estimators are studied for 13 combinations of N and β. Results of simulation experiments provide empirical evidence concerning the values of means and variances of the biased estimators of the slope and estimates of the “bias parameter”, the mean square errors of the estimators, and the frequency of improvement relative to least squares. Adjustments to degrees of freedom in the biased regression analysis of variance table are also considered. An extension of the new modification to the case of p > 1 independent variables is presented in an Appendix.

4.

M-estimation Under a Two-Sample Semiparametric Model

Biao Zhang 《Scandinavian Journal of Statistics》2000,27(2):263-280

We consider M-estimation under a two-sample semiparametric model in which the log ratio of two unknown density functions has a known parametric form. This two-sample semiparametric model, arising naturally from case-control studies and logistic discriminant analysis, can be regarded as a biased sampling model. A new class of M-estimators is constructed on the basis of the maximum semiparametric likelihood estimator of the underlying distribution function. It is shown that the proposed M-estimators are consistent and asymptotically normally distributed. A simulation study is presented to demonstrate the performance of the proposed M-estimators.

5.

An Optimal Semiparametric Method for Two‐group Classification

《Scandinavian Journal of Statistics》2018,45(3):806-846

In classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of the two groups. However, for a typical case‐control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t‐statistic for two‐group classification. Biometrics 2015, 71: 404–416) proposed the generalized t‐statistic to obtain a linear discriminant function, which allows for heterogeneity of the case group. Their procedure has an optimality property in the class of consideration. We perform a further study of the problem and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in that no further improvement is possible to construct the linear discriminant function more efficiently. We conduct simulation studies and real data examples to illustrate the finite sample performance and the gain that it produces in comparison with existing methods.

6.

A comparison of various estimators of the mean of an inverse gaussian distribution

《Journal of Statistical Computation and Simulation》2012,82(1-2):71-81

In this paper we consider the Inverse Gaussian distribution whose variance is proportional to the mean. Assuming that the data are available from IGD(μ, cμ²), and also from its length biased version, simulation studies are presented to compare the MVUE and MLE in terms of their variances and mean square errors from both kinds of data. Some tables and graphs are provided to analyze the comparisons. Finally, some recommendations and conclusions are given when one or both kinds of data are available.

7.

Performance of nonparametric maximum likelihood estimator in a

Kyunghee K. Song Lisa A. Weissfeld 《Communications in Statistics - Simulation and Computation》2013,42(3):637-655

When the probability of selecting an individual in a population is proportional to its lifelength, it is called length biased sampling. A nonparametric maximum likelihood estimator (NPMLE) of survival in a length biased sample is given in Vardi (1982). In this study, we examine the performance of Vardi's NPMLE in estimating the true survival curve when observations are from a length biased sample. We also compute estimators based on a linear combination (LCE) of empirical distribution function (EDF) estimators and weighted estimators. In our simulations, we consider observations from a mixture of two different distributions, one from F and the other from G which is a length biased distribution of F. Through a series of simulations with various proportions of length biasing in a sample, we show that the NPMLE and the LCE closely approximate the true survival curve. Throughout the survival curve, the EDF estimators overestimate the survival. We also consider a case where the observations are from three different weighted distributions. Again, both the NPMLE and the LCE closely approximate the true distribution, indicating that the length biasedness is properly adjusted for. Finally, an efficiency study shows that Vardi's estimators are more efficient than the EDF estimators in the lower percentiles of the survival curves.
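The length-bias correction described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' study: it covers only the simpler case in which every observation is length biased, where the weighted estimator places mass proportional to 1/x on each observation. The Gamma baseline, the sample size, the seed, and the function names are all assumptions chosen for illustration.

```python
import math
import random

def weighted_survival(sample, t):
    """Survival estimate at t that weights each observation by 1/x."""
    total = sum(1.0 / x for x in sample)
    tail = sum(1.0 / x for x in sample if x > t)
    return tail / total

def naive_survival(sample, t):
    """Unweighted empirical survival at t (ignores the length bias)."""
    return sum(1 for x in sample if x > t) / len(sample)

rng = random.Random(0)
# Assumed baseline F: Gamma(shape=2, scale=1). Its length-biased version G
# has density proportional to x * f(x), which is Gamma(shape=3, scale=1).
biased_sample = [rng.gammavariate(3.0, 1.0) for _ in range(20000)]

t = 2.0
true_survival = (1.0 + t) * math.exp(-t)  # survival of Gamma(2, 1) at t
weighted = weighted_survival(biased_sample, t)
naive = naive_survival(biased_sample, t)
print(f"true={true_survival:.3f} weighted={weighted:.3f} naive={naive:.3f}")
```

Under these assumptions the unweighted estimate sits well above the true survival, consistent with the overestimation by EDF estimators noted in the abstract, while the 1/x weighting brings the estimate back toward the baseline curve.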

8.

Rank procedures in many population forced discrimination problems

Tie-Hua Ng Ronald H. Randles 《Communications in Statistics - Theory and Methods》2013,42(17):1943-1959

In this paper the rank method for forced discrimination in two population problems, introduced by Randles, Broffitt, Ramberg and Hogg (1978), is extended to cover settings involving more than two populations. Several methods of ranking are compared to the normal theory procedure in a Monte Carlo study. Asymptotic theory is included which confirms that the rank method does balance the limiting probabilities of misclassification in a two population setting.

9.

FOURIER SERIES ESTIMATION FOR LENGTH BIASED DATA

M.C. Jones R.J. Karunamuni 《Australian & New Zealand Journal of Statistics》1997,39(1):57-68

This paper proposes and investigates Fourier series estimators for length biased data. Specifically, two Fourier series estimators are constructed and studied based on ideas of Jones (1991) and Bhattacharyya et al. (1988) in the case of kernel density estimation. Approximate expressions for mean squared errors and integrated mean squared errors are obtained and compared, and some simulated examples are investigated. The Fourier series estimator based on the proposal of Jones seems to have the more desirable properties of the two. The paper concludes with some comments that put this work in a wider context.

10.

Comment on Hoerl and Kennard's ridge regression simulation methodology

Mark D. Pagel 《Communications in Statistics - Theory and Methods》2013,42(22):2361-2367

Ridge regression has received strong support in several Monte Carlo studies, leading some authors to advocate its general use. It is argued, however, that these studies were strongly biased in favor of ridge regression by simulating regression coefficient vectors centered at the origin, a condition well suited for a shrinkage technique. Studies which modeled some non-zero regression coefficients and which showed only qualified support for ridge regression are cited in support of this argument. It is concluded that ridge regression will be of use only to the extent that ridge regression type coefficient vectors actually underlie real data sets, a poorly understood phenomenon.
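The argument that coefficients near the origin favor a shrinkage technique can be checked in closed form for a single predictor. The sketch below is an illustrative calculation, not Pagel's simulation design: it assumes a no-intercept model y = βx + ε with fixed x, error variance `sigma2`, `sxx` = Σx², and a fixed ridge constant `k`. The function names and numeric values are hypothetical.

```python
def ols_mse(sigma2, sxx):
    """MSE of the least squares slope: it is unbiased with variance sigma2 / sxx."""
    return sigma2 / sxx

def ridge_mse(beta, k, sigma2, sxx):
    """MSE of the ridge slope sum(x*y) / (sxx + k), which equals c * (OLS slope)."""
    c = sxx / (sxx + k)                # shrinkage factor, between 0 and 1
    variance = c * c * sigma2 / sxx    # variance shrinks by c**2
    bias_sq = (1.0 - c) ** 2 * beta ** 2   # squared bias grows with beta**2
    return variance + bias_sq

sigma2, sxx, k = 1.0, 10.0, 5.0        # assumed values for illustration
for beta in (0.0, 0.5, 2.0):
    print(beta, ols_mse(sigma2, sxx), ridge_mse(beta, k, sigma2, sxx))
```

With these values the ridge slope has the lower mean square error for β = 0 and β = 0.5 but a higher one for β = 2, because the squared-bias term grows with β² while the variance reduction stays fixed.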

11.

Weighted bivariate logarithmic series distributions

Ramesh C. Gupta Ram C. Tripathi 《Communications in Statistics - Theory and Methods》2013,42(5):1099-1117

In this paper, we are interested in the weighted distributions of a bivariate three parameter logarithmic series distribution studied by Kocherlakota and Kocherlakota (1990). The weighted versions of the model are derived with weight W(x,y) = x^[r] y^[s]. Explicit expressions for the probability mass function and probability generating functions are derived in the case r = s = 1. The marginal and conditional distributions are derived in the general case. The maximum likelihood estimation of the parameters, in both two parameter and three parameter cases, is studied. A procedure for computer generation of bivariate data from a discrete distribution is described. This enables us to present two examples, in order to illustrate the methods developed, for finding the maximum likelihood estimates.

12.

Bayesian nonparametric density estimation under length bias

Spyridon J. Hatjispyros Theodoros Nicoleris Stephen G. Walker 《Communications in Statistics - Simulation and Computation》2017,46(10):8064-8076

A density estimation method in a Bayesian nonparametric framework is presented for the case where recorded data do not come directly from the distribution of interest, but from a length biased version. From a Bayesian perspective, efforts to computationally evaluate posterior quantities conditionally on length biased data were hindered by the inability to circumvent the problem of a normalizing constant. In this article, we present a novel Bayesian nonparametric approach to the length bias sampling problem that circumvents the issue of the normalizing constant. Numerical illustrations as well as a real data example are presented and the estimator is compared against its frequentist counterpart, the kernel density estimator for indirect data of Jones.

13.

Random sequential packing in Rⁿ

《Journal of Statistical Computation and Simulation》2012,82(2):87-93

We introduce a Monte Carlo method for packing hypercubes in Rⁿ. Rigorous and conceptually simple, it is currently practical for n ≤ 4. Experimental results indicate that Palasti's conjecture is false for R² and R³ and still undecided for R⁴.

14.

Fitting parametric frailty and mixture models under biased sampling

P. Economou 《Journal of Applied Statistics》2009,36(1):53-66

Biased sampling from an underlying distribution with p.d.f. f(t), t>0, implies that observations follow the weighted distribution with p.d.f. f^w(t)=w(t)f(t)/E[w(T)] for a known weight function w. In particular, the function w(t)=t^α has important applications, including length-biased sampling (α=1) and area-biased sampling (α=2). We first consider here the maximum likelihood estimation of the parameters of a distribution f(t) under biased sampling from a censored population in a proportional hazards frailty model where a baseline distribution (e.g. Weibull) is mixed with a continuous frailty distribution (e.g. Gamma). A right-censored observation contributes a term proportional to w(t)S(t) to the likelihood; this is not the same as S^w(t), so the problem of fitting the model does not simply reduce to fitting the weighted distribution. We present results on the distribution of frailty in the weighted distribution and develop an EM algorithm for estimating the parameters of the model in the important Weibull–Gamma case. We also give results for the case where f(t) is a finite mixture distribution. Results are presented for uncensored data and for Type I right censoring. Simulation results are presented, and the methods are illustrated on a set of lifetime data.
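The weighted density f^w(t)=w(t)f(t)/E[w(T)] with w(t)=t^α can be written out directly once a baseline is fixed. The sketch below assumes an exponential baseline purely for illustration (the paper's models are frailty and mixture models, which are not reproduced here); `weighted_pdf`, `integrate`, and the `rate` parameter are hypothetical names.

```python
import math

def weighted_pdf(t, alpha, rate=1.0):
    """f^w(t) = w(t) f(t) / E[w(T)] with w(t) = t**alpha and an assumed
    exponential baseline f(t) = rate * exp(-rate * t)."""
    if t <= 0:
        return 0.0
    baseline = rate * math.exp(-rate * t)
    # For an exponential(rate) variable, E[T**alpha] = Gamma(alpha + 1) / rate**alpha.
    norm = math.gamma(alpha + 1) / rate ** alpha
    return t ** alpha * baseline / norm

def integrate(func, lo, hi, steps=20000):
    """Plain midpoint rule, enough for a sanity check."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

# Both weighted densities should integrate to about 1.
length_biased_mass = integrate(lambda t: weighted_pdf(t, 1), 0.0, 50.0)  # alpha = 1
area_biased_mass = integrate(lambda t: weighted_pdf(t, 2), 0.0, 50.0)    # alpha = 2
print(length_biased_mass, area_biased_mass)
```

For α=1 and rate 1 the result is t·exp(-t), a Gamma density with shape 2, which shows how length-biased sampling shifts mass toward longer lifetimes.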

15.

A study on discriminant analysis techniques applied to multivariate lognormal data

《Journal of Statistical Computation and Simulation》2012,82(1-2):79-100

The purpose of this paper is to examine the multiple group (>2) discrimination problem in which the group sizes are unequal and the variables used in the classification are correlated with skewed distributions. Using statistical simulation based on data from a clinical study, we compare the performances, in terms of misclassification rates, of nine statistical discrimination methods. These methods are linear and quadratic discriminant analysis applied to untransformed data, rank transformed data, and inverse normal scores data, as well as fixed kernel discriminant analysis, variable kernel discriminant analysis, and variable kernel discriminant analysis applied to inverse normal scores data. It is found that the parametric methods with transformed data generally outperform the other methods, and the parametric methods applied to inverse normal scores usually outperform the parametric methods applied to rank transformed data. Although the kernel methods often have very biased estimates, the variable kernel method applied to inverse normal scores data provides considerable improvement in terms of total nonerror rate.

16.

Integrating linear discriminant analysis, polynomial basis expansion, and genetic search for two-group classification

Michael J. Brusco Clay M. Voorhees Roger J. Calantone Michael K. Brady Douglas Steinley 《Communications in Statistics - Simulation and Computation》2019,48(6):1623-1636

We propose a hybrid two-group classification method that integrates linear discriminant analysis, a polynomial expansion of the basis (or variable space), and a genetic algorithm with multiple crossover operations to select variables from the expanded basis. Using new product launch data from the biochemical industry, we found that the proposed algorithm offers mean percentage decreases in the misclassification error rate of 50%, 56%, 59%, 77%, and 78% in comparison to a support vector machine, artificial neural network, quadratic discriminant analysis, linear discriminant analysis, and logistic regression, respectively. These improvements correspond to annual cost savings of $4.40–$25.73 million.

17.

How non-normality affects the quadratic discriminant function

William R. Clarke Peter A. Lachenbruch Barbara Broffitt 《Communications in Statistics - Theory and Methods》2013,42(13):1285-1301

The quadratic discriminant function (QDF) is commonly used for the two group classification problem when the covariance matrices in the two populations are substantially unequal. This procedure is optimal when both populations are multivariate normal with known means and covariance matrices. This study examined the robustness of the QDF to non-normality. Sampling experiments were conducted to estimate expected actual error rates for the QDF when sampling from a variety of non-normal distributions. Results indicated that the QDF was robust to non-normality except when the distributions were highly skewed, in which case relatively large deviations from optimal were observed. In all cases studied the average probabilities of misclassification were relatively stable while the individual population error rates exhibited considerable variability.

18.

On identification of transfer function models by biased regression methods

《Journal of Statistical Computation and Simulation》2012,82(3):131-148

This paper investigates a biased regression approach to the preliminary estimation of the Box-Jenkins transfer function weights. Using statistical simulation to generate time series, 14 estimators (various OLS, ridge and principal components estimators) are compared in terms of MSE and standard error of the weight estimators. The estimators are investigated for different levels of multicollinearity, signal-to-noise ratio, number of independent variables, length of time series and number of lags included in the estimation. The results show that the ridge estimators nearly always give lower MSE than the OLS estimator, and in the computationally difficult cases give much lower MSE than the OLS estimator. The principal components estimators can give lower MSE than the OLS, but also higher values. All biased estimators nearly always give much lower estimated standard error than OLS when estimating the weights.

19.

Comparison of Two Estimators of Parameters Under Pitman Nearness Criterion

Hu Yang Wenxue Li Jianwen Xu 《Communications in Statistics - Theory and Methods》2013,42(17):3081-3094

In this article, the Pitman nearness criterion is used to compare two competing united biased estimators in the linear model. In particular, a necessary and sufficient condition for one estimator to be superior to the other is derived. Furthermore, a simulation study is performed to illustrate the theoretical results, and several special cases are also studied.

20.

Predicting early educational program placement with discrete discriminant analysis

Louise H. Boothby James K. Brewer 《Communications in Statistics - Theory and Methods》2013,42(11):4049-4060

The purpose of this study was to predict placement and nonplacement outcomes for mildly handicapped three through five year old children given knowledge of developmental screening test data. Discrete discriminant analysis (Anderson, 1951; Cochran & Hopkins, 1961; Goldstein & Dillon, 1978) was used to classify children into either a placement or nonplacement group using developmental information retrieved from longitudinal Child Find records (1982-89). These records were located at the Florida Diagnostic and Learning Resource System (FDLRS) in Sarasota, Florida and provided usable data for 602 children. The developmental variables included performance on screening test activities from the Comprehensive Identification Process (Zehrbach, 1975), and consisted of: (a) gross motor skills, (b) expressive language skills, and (c) social-emotional skills. These three dichotomously scored developmental variables generated eight mutually exclusive and exhaustive combinations of screening data. Combined with one of three different types of cost-of-misclassification functions, each child in a random cross-validation sample of 100 was classified into one of the two outcome groups minimizing the expected cost of misclassification based on the remaining 502 children. For each cost function designed by the researchers a comparison was made between classifications from the discrete discriminant analysis procedure and actual placement outcomes for the 100 children. A logit analysis and a standard discriminant analysis were likewise conducted using the 502 children and compared with results of the discrete discriminant analysis for selected cost functions.