In this paper, we describe an overall strategy for robust estimation of multivariate location and shape, and the consequent identification of outliers and leverage points. Parts of this strategy have been described in a series of previous papers (Rocke, Ann. Statist., in press; Rocke and Woodruff, Statist. Neerlandica 47 (1993), 27–42, J. Amer. Statist. Assoc., in press; Woodruff and Rocke, J. Comput. Graphical Statist. 2 (1993), 69–95; J. Amer. Statist. Assoc. 89 (1994), 888–896) but the overall structure is presented here for the first time. After describing the first-level architecture of a class of algorithms for this problem, we review available information about possible tactics for each major step in the process. The major steps that we have found to be necessary are as follows: (1) partition the data into groups of perhaps five times the dimension; (2) for each group, search for the best available solution to a combinatorial estimator such as the Minimum Covariance Determinant (MCD); these are the preliminary estimates; (3) for each preliminary estimate, iterate to the solution of a smooth estimator chosen for robustness and outlier resistance; and (4) choose among the final iterates based on a robust criterion, such as minimum volume. Use of this algorithm architecture can enable reliable, fast, robust estimation of heavily contaminated multivariate data in high (> 20) dimension even with large quantities of data. A computer program implementing the algorithm is available from the authors.
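
As a rough illustration of step (2) only, the sketch below finds the MCD subset of one small group by brute-force enumeration. This is an assumption made for brevity, not the authors' search tactic: exhaustive enumeration is feasible only for tiny groups, and the function name `exhaustive_mcd`, the six-point group, and the subset size are hypothetical.

```python
from itertools import combinations

import numpy as np


def exhaustive_mcd(group, subset_size):
    """Return (indices, location, scatter) of the subset of `subset_size`
    points whose sample covariance matrix has the smallest determinant."""
    group = np.asarray(group, dtype=float)
    best = None
    for indices in combinations(range(len(group)), subset_size):
        subset = group[list(indices)]
        scatter = np.cov(subset, rowvar=False)  # rows are observations
        det = np.linalg.det(scatter)
        if best is None or det < best[0]:
            best = (det, indices, subset.mean(axis=0), scatter)
    _, indices, location, scatter = best
    return indices, location, scatter


# Hypothetical group: five clean points in the unit square plus one outlier.
group = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [10.0, -3.0]]
)
indices, location, scatter = exhaustive_mcd(group, subset_size=4)
print(indices, location)
```

The resulting location and scatter would play the role of one preliminary estimate; steps (3) and (4) are not sketched here.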

We consider estimating the tail-index of a distribution under the assumption of multivariate ellipticity. Recently, a separating Hill estimator for multivariate elliptical distributions was proposed. This estimator is an affine invariant alternative to using the marginal observations in tail-index estimation and is hence unaffected by, e.g., a change of units of measurement. However, the separating Hill estimator depends on the location and scatter of the elliptical distribution, which, in practice, have to be estimated. The effect of replacing the true location and scatter of the distribution by estimates has previously been examined only through simulations. In this article we show that the error caused by replacing the location and scatter of the distribution by estimates is indeed asymptotically negligible. This fact is essential for the practicality of the separating Hill estimator. In addition to providing the theoretical results, we present simulation results on the asymptotic behaviour of the estimators.
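
For orientation, here is a minimal sketch of the classical univariate Hill estimator together with a helper that computes Mahalanobis-type radii from a given location and scatter. Applying the Hill estimator to such radii is an assumption about how an affine invariant version could be built; the abstract does not spell out the construction, and the names `hill_estimator` and `elliptical_radii` are hypothetical.

```python
import numpy as np


def hill_estimator(values, k):
    """Classical Hill estimate of the extreme value index (1 / tail index)
    based on the k largest observations."""
    x = np.sort(np.asarray(values, dtype=float))
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < number of observations")
    threshold = x[-k - 1]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    return float(np.mean(np.log(x[-k:] / threshold)))


def elliptical_radii(data, location, scatter):
    """Radii sqrt((x - mu)^T Sigma^{-1} (x - mu)) for each row x of data."""
    centered = np.asarray(data, dtype=float) - np.asarray(location, dtype=float)
    solved = np.linalg.solve(np.asarray(scatter, dtype=float), centered.T).T
    return np.sqrt(np.einsum("ij,ij->i", centered, solved))


# Small deterministic checks with made-up numbers.
gamma_hat = hill_estimator(np.exp([0.0, 1.0, 2.0, 3.0]), k=2)
radii = elliptical_radii([[3.0, 4.0], [0.0, 0.0]], [0.0, 0.0], np.eye(2))
print(gamma_hat, radii)
```

In practice the location and scatter passed to `elliptical_radii` would be estimates, which is exactly the substitution whose effect the abstract studies.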

Summary. We compare two different multilevel modelling approaches to the analysis of repeated measures data to assess the effect of mother level characteristics on women's use of prenatal care services in Uttar Pradesh, India. We apply univariate multilevel models to our data and find that the model assumptions are severely violated and the parameter estimates are not stable, particularly for the mother level random effect. To overcome this we apply a multivariate multilevel model. The correlation structure shows that, once the decision has been made regarding use of antenatal care by the mother for her first observed birth in the data, she does not tend to change this decision for higher order births.

In a regression model with univariate response, the quantities derived from the least-absolute-deviations method need not be unique. In this note, we show that, contrary to the univariate case, in a regression model with multivariate response, the least-distances method typically yields quantities that exhibit uniqueness properties that are similar to those obtained by the least-squares method.
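
The non-uniqueness in the univariate case can be seen in the simplest setting, an intercept-only model with two observations. The sketch below uses made-up data and hypothetical helper names; it illustrates only the univariate claim and does not demonstrate the multivariate result.

```python
def absolute_loss(values, center):
    """Sum of absolute deviations from a candidate fit."""
    return sum(abs(v - center) for v in values)


def squared_loss(values, center):
    """Sum of squared deviations from a candidate fit."""
    return sum((v - center) ** 2 for v in values)


responses = [0.0, 1.0]
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]

abs_losses = [absolute_loss(responses, c) for c in candidates]
sq_losses = [squared_loss(responses, c) for c in candidates]

# Every candidate between the two observations attains the same absolute
# loss, whereas the squared loss has a single minimiser at the mean.
print(abs_losses)
print(sq_losses)
```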


The asymptotic cumulants of the minimum phi-divergence estimators of the parameters in a model for categorical data are obtained up to the fourth order with the higher-order asymptotic variance under possible model misspecification. The corresponding asymptotic cumulants up to the third order for the studentized minimum phi-divergence estimator are also derived. These asymptotic cumulants, when a model is misspecified, depend on the form of the phi-divergence. Numerical illustrations with simulations are given for typical cases of the phi-divergence, where the maximum likelihood estimator does not necessarily give the best results. Real data examples are shown using log-linear models for contingency tables.

This study reveals that, contrary to the conventional wisdom among econometricians, the bias of the OLS estimator can be quite small when the estimator is applied to a geometrically distributed lag model, $y_t = \alpha + \beta x_t + \lambda y_{t-1} + u_t$, with autocorrelated disturbances, be they AR(1), MA(1), MA(2), AR(2), or ARMA(1,1). This happens when $\lambda$ is large and $x_t$ is smoothly trended (e.g., a real GNP series). In fact, the bias of the OLS estimator becomes zero at one parameter combination, and the OLS estimator performs well over a wide range around this parameter combination. By decomposing the disturbance term into two parts, the paper also explains why OLS shows such an unexpected property. These findings have both pedagogical and practical significance.

Inspired by a primary hypertension study conducted by the Chinese government in the Inner Mongolia Autonomous Region, we introduce partially linear models with multivariate responses to evaluate the simultaneous effects of modifiable risk factors on both the systolic and the diastolic blood pressures. We propose a class of weighted profile least-squares approaches to estimate both the parametric and the nonparametric components of the multivariate partially linear models. We also investigate how the weight matrix affects the resultant estimation efficiency. We illustrate our proposals through simulations and an analysis of the primary hypertension data. Our analysis provides strong evidence that obesity is indeed an important risk factor predisposing to primary hypertension even after adjusting for the ageing effect.

Generalized additive models for location, scale and shape (cited 10 times in total: 0 self-citations, 10 citations by others)
Summary. A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random-effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton–Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.

Summary. Problems of the analysis of data with incomplete observations are all too familiar in statistics. They are doubly difficult if we are also uncertain about the choice of model. We propose a general formulation for the discussion of such problems and develop approximations to the resulting bias of maximum likelihood estimates on the assumption that model departures are small. Loss of efficiency in parameter estimation due to incompleteness in the data has a dual interpretation: the increase in variance when an assumed model is correct; the bias in estimation when the model is incorrect. Examples include non-ignorable missing data, hidden confounders in observational studies and publication bias in meta-analysis. Doubling variances before calculating confidence intervals or test statistics is suggested as a crude way of addressing the possibility of undetectably small departures from the model. The problem of assessing the risk of lung cancer from passive smoking is used as a motivating example.


In this work, we proposed an adaptive multivariate cumulative sum (CUSUM) statistical process control chart for signaling a range of location shifts. This method was based on the multivariate CUSUM control chart proposed by Pignatiello and Runger (1990), but we adopted an adaptive approach similar to that discussed by Dai et al. (2011), which was based on a different CUSUM method introduced by Crosier (1988). The reference value in this proposed procedure was changed adaptively in each run, with the current mean shift estimated by an exponentially weighted moving average (EWMA) statistic. By specifying the minimal magnitude of the mean shift, our proposed control chart achieved a good overall performance for detecting a range of shifts rather than a single value. We compared our adaptive multivariate CUSUM method with that of Dai et al. (2011) and the non-adaptive versions of these two methods, by evaluating both the steady-state and zero-state average run length (ARL) values. The detection efficiency of our method showed improvements over the comparative methods when the location shift is unknown but falls within an expected range.
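
The EWMA ingredient can be sketched on its own as the standard recursion $E_t = (1 - \lambda) E_{t-1} + \lambda (x_t - \mu_0)$. The sketch below covers only that recursion, under assumed names and an assumed starting value of zero; how the chart turns the estimate into an adaptive reference value is not stated in the abstract and is not reproduced here.

```python
import numpy as np


def ewma_shift_estimates(observations, target_mean, smoothing=0.2):
    """Running EWMA estimates of the mean shift away from target_mean.

    Returns one estimate per observation; the recursion starts from a
    zero shift (an assumption made for this sketch).
    """
    observations = np.asarray(observations, dtype=float)
    target_mean = np.asarray(target_mean, dtype=float)
    estimate = np.zeros_like(target_mean)
    history = []
    for x in observations:
        estimate = (1 - smoothing) * estimate + smoothing * (x - target_mean)
        history.append(estimate.copy())
    return np.array(history)


# Two made-up bivariate observations shifted by 2 in the first coordinate.
estimates = ewma_shift_estimates(
    [[2.0, 0.0], [2.0, 0.0]], target_mean=[0.0, 0.0], smoothing=0.5
)
print(estimates)
```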


Every large census operation should undergo evaluation programs to find the sources and extent of inherent coverage errors. In this article, we briefly discuss the statistical methodology to estimate the omission rate in the Indian census using the dual-system estimation (DSE) technique. We have explicitly studied the correlation bias factor involved in the estimate, its extent, and consequences. A new potential source of bias in the estimate is identified and discussed. During the survey, enumerators more efficient than those used in the census operation are appointed, and this fact may inflate the dependency between the two lists and lead to a significant bias. Some examples are given to demonstrate this argument in various plausible situations. We have suggested one simple and flexible approach which can control this bias. Our proposed estimator can efficiently overcome the potential bias by achieving the desired degree of accuracy (almost unbiased) with relatively higher efficiency. Overall improvements in the results are explored through a simulation study on different populations.
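
For readers unfamiliar with DSE, the textbook dual-system estimate under independence of the two lists can be sketched as below. This is the standard baseline, not the bias-controlled estimator the authors propose; the function names and the counts are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Textbook dual-system estimate of the population size.

    Assumes the census list and the survey list capture people
    independently; dependence between the lists is what produces
    correlation bias.
    """
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    return census_count * survey_count / matched_count


def omission_rate(census_count, survey_count, matched_count):
    """Estimated share of the population missed by the census."""
    total = dual_system_estimate(census_count, survey_count, matched_count)
    return 1 - census_count / total


# Made-up counts: 900 people in the census, 100 in the survey, 90 in both.
estimated_total = dual_system_estimate(900, 100, 90)
estimated_omission = omission_rate(900, 100, 90)
print(estimated_total, estimated_omission)
```

If the lists are positively dependent, the matched count is inflated and the estimated total, and hence the omission rate, is pushed downward.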

Assuming the disturbances are normally distributed, we derive expressions for, and simple conditions for the existence of, the exact bias and matrix of second-order moments of the Lawless and Wang Operational Ridge Regression estimator.

Summary. A general latent normal model for multilevel data with mixtures of response types is extended in the case of ordered responses to deal with variates having a large number of categories and including count data. An example is analysed by using repeated measures data on child growth and adult measures of body mass index and glucose. Applications are described that are concerned with the flexible prediction of adult measurements from collections of growth measurements and for studying the relationship between the number of measurement occasions and growth trajectories.

In this paper, we consider estimation of the unknown parameters of a conditional Gaussian MA(1) model. In the majority of cases, a maximum-likelihood estimator is chosen because the estimator is consistent. However, for small sample sizes the error is large, because the estimator has a bias of $O(n^{-1})$. Therefore, we derive the $O(n^{-1})$ bias of the maximum-likelihood estimator for the conditional Gaussian MA(1) model. Moreover, we propose new estimators for the unknown parameters of the conditional Gaussian MA(1) model based on the $O(n^{-1})$ bias. We investigate the properties of the bias, as well as the asymptotic variance of the maximum-likelihood estimators for the unknown parameters, by performing some simulations. Finally, we demonstrate the validity of the new estimators through this simulation study.

A convergence result for kernel-type density estimators, proved by Devroye and Györfi (1985), is extended to stationary Markov processes satisfying the $G_2$ condition introduced by Rosenblatt (1970).

In this article, we propose an efficient and robust estimation procedure for the semiparametric mixture model that is a mixture of unknown location-shifted symmetric distributions. Our estimator is derived by minimizing the profile Hellinger distance (MPHD) between the model and a nonparametric density estimate. We propose a simple and efficient algorithm to compute the proposed MPHD estimate. A Monte Carlo simulation study is conducted to examine the finite sample performance of the proposed procedure and to compare it with other existing methods. Based on our empirical studies, the newly proposed procedure is very competitive with the existing methods for normal component cases and performs much better for non-normal component cases. More importantly, the proposed procedure is robust when the data are contaminated with outlying observations. A real data application is also provided to illustrate the proposed estimation procedure.

The Clean Air Act of 1970 anticipates the existence of nonzero threshold concentrations of pollutants below which no consequential materials damage or soiling results from exposure to pollution. Determination of quantitative values for such thresholds on the basis of available empirical evidence involves substantial practical problems, including conceptual problems relating to measurement and attribution of effects and statistical problems relating to data analysis and interpretation. It is essential to maintain a broad, yet focused perspective that systematically accounts for such problems when synthesizing and interpreting empirical evidence reported in different studies of the same hypothesized effect.

In heteroskedasticity pretesting, if the null hypothesis of homoskedasticity is accepted, the OLS estimator rather than the 2SAE is used. However, if the degree of severity of heteroskedasticity were so mild that the OLS estimator would still outperform the 2SAE, then this methodology would produce results that are inferior to the OLS estimator.

This paper suggests that instead of pretesting for the presence of heteroskedasticity alone, researchers should in addition use a relative efficiency criterion to compare the performance of both estimators for improved results.

Summary. Traditional studies of school differences in educational achievement use multilevel modelling techniques to take into account the nesting of pupils within schools. However, educational data are known to have more complex non-hierarchical structures. The potential importance of such structures is apparent when considering the effect of pupil mobility during secondary schooling on educational achievement. Movements of pupils between schools suggest that we should model pupils as belonging to the series of schools that are attended and not just their final school. Since these school moves are strongly linked to residential moves, it is important to explore additionally whether achievement is also affected by the history of neighbourhoods that are lived in. Using the national pupil database, this paper combines multiple membership and cross-classified multilevel models to explore simultaneously the relationships between secondary school, primary school, neighbourhood and educational achievement. The results show a negative relationship between pupil mobility and achievement, the strength of which depends greatly on the nature and timing of these moves. Accounting for pupil mobility also reveals that schools and neighbourhoods are more important than shown by previous analysis. A strong primary school effect appears to last long after a child has left that phase of schooling. The additional effect of neighbourhoods, in contrast, is small. Crucially, the rank order of school effects across all types of pupil is sensitive to whether we account for the complexity of the multilevel data structure.

We propose a varying-coefficient autoregressive model that contains additive models, varying-coefficient models, partially linear models and low-dimensional interaction models as special cases. A global kernel backfitting method is proposed for the estimation and inference of parameters and unknown functions in this model. Key large-sample results are established, including estimation consistency, asymptotic normality and the generalized likelihood ratio test for parameters and non-parametric functions. The proposed methodology is examined by simulation studies and applied to examine the relationship between suicide news reports in the three leading newspapers and the daily number of suicides in Taiwan. The relationship between the media reporting and suicide incidence has been established and explored. The Canadian Journal of Statistics 47: 487–519; 2019 © 2019 Statistical Society of Canada
