In this paper we introduce a class of estimators which includes the ordinary least squares (OLS), the principal components regression (PCR) and the Liu estimator [1] (Liu, K. 1993. A new class of biased estimate in linear regression. Communications in Statistics – Theory and Methods, 22(2): 393–402). In particular, we show that our new estimator is superior, in the scalar mean-squared error (mse) sense, to the Liu estimator, to the OLS estimator and to the PCR estimator.
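
As a rough illustration of one member of this class, the Liu (1993) estimator is commonly written as (X'X + I)^{-1}(X'y + d b_OLS) with 0 < d < 1, where b_OLS is the OLS estimate. The sketch below assumes a single centred regressor with no intercept, so every matrix reduces to a scalar; the function names are hypothetical, and the new estimator proposed in the abstract is not reproduced here.

```python
def ols_slope(x, y):
    """OLS slope for one regressor without intercept: x'y / x'x."""
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sxx


def liu_slope(x, y, d):
    """Scalar form of the Liu estimator: (x'y + d * b_ols) / (x'x + 1).

    d = 1 recovers OLS; 0 < d < 1 shrinks the OLS slope towards zero
    by the factor (x'x + d) / (x'x + 1).
    """
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return (sxy + d * ols_slope(x, y)) / (sxx + 1.0)


if __name__ == "__main__":
    x = [-1.0, -0.5, 0.0, 0.5, 1.0]
    y = [-2.1, -0.9, 0.2, 1.1, 1.9]
    print("OLS:", ols_slope(x, y))
    print("Liu (d = 0.5):", liu_slope(x, y, 0.5))
```

In this scalar case the bias introduced by d < 1 is traded against a smaller variance, which is the mechanism behind mse comparisons of the kind the abstract describes.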

This paper uses various gauges to construct principal variables that satisfy criteria of maximal scatter. The solutions coincide with Hotelling's (1933) principal components in structured ensembles and mixtures, including heavy-tailed distributions not having moments. Thus, normal-theory tests are exact in level and power under nonstandard models allowing for correlated vector observations and for certain mixtures having neither moments nor unimodal marginals.

For a general class of scalar stationary processes, essentially those for which the best linear predictor is the best predictor (in the mean square sense), it is shown that, under fairly minor additional conditions, the sample autocorrelations converge to the true values almost surely and uniformly in the lag, t, at a rate $(T^{-1}\log T)^{1/2}$, where T is the sample size. For ARMA processes, if $|t| \le (\log T)^a$, $a < \infty$, the rate is the best possible, namely $(T^{-1}\log\log T)^{1/2}$. In particular the somewhat implausible condition, on the innovations, that $E\{\varepsilon(t)^2 \mid \mathcal{F}_{t-1}\}$ is constant is avoided in these results. The theorems are used to discuss autoregressive approximation. When the stationary process is a vector process the condition on the innovation sequence, $\varepsilon(t)$, that $E\{\varepsilon(t)\varepsilon(t)' \mid \mathcal{F}_{t-1}\}$ be constant, cannot be entirely avoided in relation to autoregressive approximation. This is also discussed.

We consider the problem of semi-parametric regression modelling when the data consist of a collection of short time series for which measurements within series are correlated. The objective is to estimate a regression function of the form E[Y(t) | x] = x'β + μ(t), where μ(·) is an arbitrary, smooth function of time t, and x is a vector of explanatory variables which may or may not vary with t. For the non-parametric part of the estimation we use a kernel estimator with fixed bandwidth h. When h is chosen without reference to the data we give exact expressions for the bias and variance of the estimators for β and μ(t) and an asymptotic analysis of the case in which the number of series tends to infinity whilst the number of measurements per series is held fixed. We also report the results of a small-scale simulation study to indicate the extent to which the theoretical results continue to hold when h is chosen by a data-based cross-validation method.

In practice, when a principal component analysis is applied on a large number of variables the resultant principal components may not be easy to interpret, as each principal component is a linear combination of all the original variables. Selection of a subset of variables that contains, in some sense, as much information as possible and enhances the interpretations of the first few covariance principal components is one possible approach to tackle this problem. This paper describes several variable selection criteria and investigates which criteria are best for this purpose. Although some criteria are shown to be better than others, the main message of this study is that it is unwise to rely on only one or two criteria. It is also clear that the interdependence between variables and the choice of how to measure closeness between the original components and those using subsets of variables are both important in determining the best criteria to use.

This paper presents the problem of prediction of a domain total value based on the general linear model. In many methods presented in the survey sampling literature (e.g. Cassel, Särndal & Wretman, 1977 [Foundations of inference in survey sampling. New York: John Wiley & Sons]; Valliant, Dorfman & Royall, 2000 [Finite population sampling and inference. A prediction approach. New York: John Wiley & Sons]; Rao, 2003 [Small area estimation. New York: John Wiley & Sons]) a common assumption is that for each element of a population the domain to which it belongs is known. This assumption is especially important in the situation when a superpopulation model with auxiliary variables is considered. In this paper a method is proposed for prediction of the domain total when it is not known whether a unit belongs to a given domain or not, or when the information is available only for sampled elements of the population.

We exploit the fact that the Wilcoxon score R-estimator of the slope in a linear regression model minimises Gini's mean difference of the residuals to obtain a Berry-Esseen rate of convergence result for the Wilcoxon R-estimator.
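
The identity this abstract relies on can be checked numerically. The sketch below assumes simple regression with one covariate and uses hypothetical function names. It computes Gini's mean difference of a residual vector and the rank-based dispersion with centred Wilcoxon scores k - (n + 1)/2, which equal each other up to the factor n(n - 1)/4. It then finds a minimising slope as a weighted median of pairwise slopes, a standard characterisation that is supplied here as background and is not taken from the abstract.

```python
from itertools import combinations


def gini_mean_difference(e):
    """Average absolute difference over all pairs of residuals."""
    n = len(e)
    total = sum(abs(a - b) for a, b in combinations(e, 2))
    return 2.0 * total / (n * (n - 1))


def wilcoxon_dispersion(e):
    """Sum of ordered residuals weighted by centred ranks k - (n + 1)/2."""
    n = len(e)
    ordered = sorted(e)
    return sum((k - (n + 1) / 2.0) * v for k, v in enumerate(ordered, start=1))


def wilcoxon_slope(x, y):
    """A slope b minimising Gini's mean difference of y - b * x.

    The objective is sum |dx| * |dy/dx - b| over pairs, so a minimiser
    is the weighted median of pairwise slopes with weights |dx|.
    """
    pairs = []
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx = xi - xj
        if dx != 0:
            pairs.append(((yi - yj) / dx, abs(dx)))
    if not pairs:
        raise ValueError("need at least two distinct x values")
    pairs.sort()
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for slope, w in pairs:
        cumulative += w
        if cumulative >= half:
            return slope


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [3.0, 5.0, 7.0, 9.0, 11.0, 100.0]  # last point is an outlier
    print("Wilcoxon slope:", wilcoxon_slope(x, y))
```

The pairwise computation costs O(n^2) time and memory, which is acceptable for illustration but not for large samples.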

The linear structural model provides one way of modelling a linear relationship between two random variables. It is well known that problems of unidentifiability arise for unreplicated observations and normal error structure. As in all data sets, outliers can arise and methods are needed for detecting and testing them. An outlier-generating model of mean-slippage type can be used to characterise four different forms of outlier manifestation. It is interesting to find that the unidentifiability problem provides no obstacle for detecting or testing the outliers for three of the four forms. Detection principles, and specific discordancy tests, are derived and illustrated by application to some data on physical measurements of Pacific squid.

Cut-off sampling consists of deliberately excluding a set of units from possible selection in a sample, for example if the contribution of the excluded units to the total is small or if the inclusion of these units in the sample involves high costs. If the characteristics of interest of the excluded units differ from those of the rest of the population, the use of naïve estimators may result in highly biased estimates. In this paper, we discuss the use of auxiliary information to reduce the bias by means of calibration and balanced sampling. We show that the use of the available auxiliary information related to both the variable of interest and the probability of being excluded enables us to reduce the potential bias. A short numerical study supports our findings.

In linear regression, outliers and leverage points often have large influence in the model selection process. Such cases are downweighted with Mallows-type weights here, during estimation of submodel parameters by generalised M-estimation. A robust version of Mallows's $C_p$ (Ronchetti & Staudte, 1994) is then used to select a variety of submodels which are as informative as the full model. The methodology is illustrated on a new dataset concerning the agglomeration of alumina in Bayer precipitation.

This paper reviews some interesting but scattered results that are known about correlation in bivariate Poisson distributions and processes and presents some new results. A particular concern in both contexts is with results and examples relating to negative correlation.

Sampling procedures using randomized observation-points are suggested for estimating parameters in renewal and Markov renewal models. The usual asymptotic properties of the maximum likelihood method are shown to hold. The method we suggest provides a solution to the ML estimation problem in either or both of the following situations: (i) observations on between-event intervals are unavailable, (ii) the interval densities are unknown or difficult to evaluate while their Laplace-Stieltjes transforms are known.

This paper concerns the joint behaviour of precedence and exceedance statistics in random threshold models. Joint distributions of precedence and exceedance statistics, both exact and asymptotic, are obtained, and the results are illustrated for random thresholds based on order statistics and record values.


In this study, Monte Carlo simulation experiments were employed to examine the performance of four statistical two-group classification methods when the data distributions are skewed and misclassification costs are unequal, conditions frequently encountered in business and economic applications. The classification methods studied are the linear parametric, quadratic parametric, nearest-neighbor and logistic regression methods. It was found that when skewness is moderate, the parametric methods tend to give the best results. Depending on the specific data condition, when skewness is high, either the linear parametric, logistic regression, or nearest-neighbor method gives the best results. When misclassification costs differ widely across groups, the linear parametric method is favored over the other methods for many of the data conditions studied.

Consider the model of k populations whose densities are nonregular in the sense that they involve one or two unknown truncation parameters. In this paper a unified treatment of the problem of Bahadur efficiency of the likelihood ratio test for such a model is presented. The Bahadur efficiency of a certain test based on the union-intersection principle is also studied. Some of these results are then extended to a larger class of nonregular densities.

The paper has its origin in the finding that the frequency-domain estimation of ARMA models can produce estimates which may be remarkably biased. Both of the frequency-domain estimation methods considered in the paper are based on the frequency-domain likelihood function, which depends on the periodogram ordinates of the time series. It is found that, as estimates of the spectrum ordinates, the corresponding periodogram ordinates may contain a rather remarkable bias, which again causes bias in the estimates of parameters produced by a frequency-domain estimation method of an ARMA model. The bias is reduced by tapering the observed time series. An example is given of estimation experiments for simulated time series from a pure autoregressive process of order two.
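
The tapering step mentioned here can be sketched as follows. The abstract does not say which taper or normalisation the paper uses, so the split cosine bell, the 2π Σ h_t² normalisation, the AR(2) coefficients and all function names below are assumptions chosen for illustration.

```python
import cmath
import math
import random


def cosine_taper(n, p=0.1):
    """Split cosine bell: smoothly downweight the first and last
    fraction p (0 <= p <= 0.5) of n observations."""
    m = int(math.floor(p * n))
    h = [1.0] * n
    for t in range(m):
        w = 0.5 * (1.0 - math.cos(math.pi * (t + 0.5) / m))
        h[t] = w
        h[n - 1 - t] = w
    return h


def periodogram(x, taper=None):
    """(Tapered) periodogram of the mean-corrected series at the Fourier
    frequencies 2*pi*j/n, j = 1..n//2, normalised by 2*pi*sum(h_t^2)."""
    n = len(x)
    h = taper if taper is not None else [1.0] * n
    mean = sum(x) / n
    z = [h[t] * (x[t] - mean) for t in range(n)]
    norm = 2.0 * math.pi * sum(w * w for w in h)
    ordinates = []
    for j in range(1, n // 2 + 1):
        omega = 2.0 * math.pi * j / n
        dft = sum(z[t] * cmath.exp(-1j * omega * t) for t in range(n))
        ordinates.append((omega, abs(dft) ** 2 / norm))
    return ordinates


if __name__ == "__main__":
    # Simulate a stationary AR(2) series (illustrative coefficients).
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(126):
        x.append(0.75 * x[-1] - 0.5 * x[-2] + rng.gauss(0.0, 1.0))
    raw = periodogram(x)
    tapered = periodogram(x, cosine_taper(len(x), 0.1))
    for (omega, a), (_, b) in list(zip(raw, tapered))[:5]:
        print(f"{omega:.3f}  raw={a:.4f}  tapered={b:.4f}")
```

Either set of ordinates could then be passed to a frequency-domain likelihood; the direct O(n²) transform is used only to keep the sketch free of dependencies.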

Since Durbin (1954) and Sargan (1958), the instrumental variable (IV) method has been one of the most popular procedures among economists and other social scientists for handling linear models with errors-in-variables. A direct application of this method to nonlinear errors-in-variables models, however, fails to yield consistent estimators.

This article restricts attention to Tobit and Probit models and shows that simple recentering and rescaling of the observed dependent variable may restore consistency of the standard IV estimator if the true dependent variable and the IVs are jointly normally distributed. Although the required condition seems rarely to be satisfied by real data, our Monte Carlo experiment suggests that the proposed estimator may be quite robust to possible deviations from normality.

Many games played between two contestants start with a random deal, which introduces luck from the deal. The concept of ‘duplication’, commonly employed in Bridge tournaments, can be used to nullify that luck in a tournament. This paper poses a model for the contest between any two players and analyses the model for a tournament which employs the duplication principle. Two example tournaments on a new game are analysed.

