Similar Documents
1.
Summary. Because highly correlated data arise in many scientific fields, we investigate parameter estimation in a semiparametric regression model with a diverging number of highly correlated predictors. We first develop a distribution-weighted least squares estimator that can recover directions in the central subspace, then use this estimator as a seed vector and project it onto a Krylov space by partial least squares, avoiding computation of the inverse of the covariance of the predictors. Distribution-weighted partial least squares can therefore handle high-dimensional, highly correlated predictors. We also suggest an iterative algorithm for obtaining a better initial value before implementing partial least squares. Theoretically, we obtain strong consistency and asymptotic normality when the dimension p of the predictors grows at rate O{n^{1/2}/log(n)} and o(n^{1/3}) respectively, where n is the sample size. When there are no other constraints on the covariance of the predictors, the rates n^{1/2} and n^{1/3} are optimal. We also propose a Bayesian-information-criterion-type criterion to estimate the dimension of the Krylov space in the partial least squares procedure. Illustrative examples with a real data set and comprehensive simulations demonstrate that the method is robust to non-ellipticity and works well even in 'small n, large p' problems.
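The Krylov-projection step described above can be sketched numerically. The snippet below is only an illustration of that one step, not the authors' full distribution-weighted estimator, and the covariance and seed vector are synthetic: a seed vector s is projected onto the Krylov space span{s, Sigma s, ..., Sigma^{q-1} s} using matrix-vector products and a small q x q solve, so the p x p covariance is never inverted.

```python
import numpy as np

def krylov_project(Sigma, s, q):
    """Project the seed vector s onto span{s, Sigma s, ..., Sigma^{q-1} s}.

    Only matrix-vector products and a q x q linear solve are used, so the
    p x p covariance Sigma is never inverted.
    """
    p = len(s)
    K = np.empty((p, q))
    K[:, 0] = s / np.linalg.norm(s)
    for j in range(1, q):
        v = Sigma @ K[:, j - 1]
        v -= K[:, :j] @ (K[:, :j].T @ v)   # orthogonalize (Lanczos-style) for stability
        K[:, j] = v / np.linalg.norm(v)
    coef = np.linalg.solve(K.T @ Sigma @ K, K.T @ s)
    return K @ coef

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 5))
Sigma = A @ A.T + 0.1 * np.eye(30)   # highly collinear: 5 dominant directions
s = rng.normal(size=30)
beta = krylov_project(Sigma, s, q=6)
```

Because this Sigma is a rank-5 perturbation of 0.1*I it has at most six distinct eigenvalues, so the q = 6 projection already coincides with Sigma^{-1} s; in general q must be chosen by a model selection rule such as the BIC-type criterion the abstract proposes.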

2.
Estimating smooth monotone functions
Many situations call for a smooth, strictly monotone function f of arbitrary flexibility. The family of functions defined by the differential equation D^2 f = w Df, where w is an unconstrained coefficient function, comprises the strictly monotone, twice-differentiable functions. The solution to this equation is f = C_0 + C_1 D^{-1}{exp(D^{-1} w)}, where C_0 and C_1 are arbitrary constants and D^{-1} is the partial integration operator. A basis for expanding w is suggested that permits explicit integration in the expression for f. In fitting data, it is also useful to regularize f by penalizing the integral of w^2, since this is a measure of the relative curvature in f. Applications are discussed to monotone nonparametric regression, to the transformation of the dependent variable in non-linear regression, and to density estimation.
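The closed form above is easy to evaluate numerically. The sketch below is my own illustration, using simple trapezoidal quadrature on a grid rather than the basis expansion the abstract suggests: it builds f = C_0 + C_1 D^{-1}{exp(D^{-1} w)} and confirms that the result is strictly increasing for an arbitrary choice of w.

```python
import numpy as np

def monotone_from_w(t, w, C0=0.0, C1=1.0):
    """Evaluate f = C0 + C1 * D^{-1}{exp(D^{-1} w)} on the grid t.

    D^{-1} is approximated by cumulative trapezoidal integration; since
    exp(.) > 0, f is strictly increasing whenever C1 > 0.
    """
    dt = np.diff(t)
    inner = np.concatenate(([0.0], np.cumsum((w[1:] + w[:-1]) / 2 * dt)))
    g = np.exp(inner)                # strictly positive integrand
    outer = np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) / 2 * dt)))
    return C0 + C1 * outer

t = np.linspace(0.0, 1.0, 201)
w = np.sin(4 * np.pi * t)            # an arbitrary unconstrained coefficient function
f = monotone_from_w(t, w)
# f is strictly monotone no matter how wildly w oscillates.
```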

3.
Methods are suggested for improving the coverage accuracy of intervals for predicting future values of a random variable drawn from a sampled distribution. It is shown that properties of solutions to such problems may be quite unexpected. For example, the bootstrap and the jackknife perform very poorly when used to calibrate coverage, although the jackknife estimator of the true coverage is virtually unbiased. A version of the smoothed bootstrap can be employed for successful calibration, however. Interpolation among adjacent order statistics can also be an effective way of calibrating, although even there the results are unexpected. In particular, whereas the coverage error can be reduced from O(n^{-1}) to orders O(n^{-2}) and O(n^{-3}) (where n denotes the sample size) by interpolating among two and three order statistics respectively, the next two orders of reduction require interpolation among five and eight order statistics respectively.
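For orientation, the sketch below shows the uncalibrated building block the abstract starts from: a one-sided prediction bound formed by interpolating between two adjacent order statistics. The calibration that reduces the coverage error, which is the paper's actual contribution, is not reproduced here.

```python
import numpy as np

def upper_prediction_bound(x, coverage=0.90):
    """Upper prediction bound for a future draw from the sampled distribution,
    interpolating between two adjacent order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    pos = coverage * (n + 1) - 1          # fractional 0-based order-statistic index
    j = min(max(int(np.floor(pos)), 0), n - 2)
    frac = pos - j
    return (1 - frac) * xs[j] + frac * xs[j + 1]

rng = np.random.default_rng(0)
x = rng.normal(size=500)
b = upper_prediction_bound(x, coverage=0.90)
# with n = 500 this sits near the N(0,1) 0.90 quantile (about 1.28)
```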

4.
Abstract. The supremum difference between the cumulative sum diagram and its greatest convex minorant (GCM) is considered in the case of non-parametric isotonic regression. When the regression function is strictly increasing and the design points, though unequally spaced, approximate a positive density at even a slow rate (n^{-1/3}), the difference is shown to shrink at a very rapid rate (close to n^{-2/3}). The result is analogous to the corresponding result for monotone density estimation established by Kiefer and Wolfowitz, but uses an entirely different representation. The limit distribution of the GCM as a process on the unit interval is obtained when the design variables are i.i.d. with a positive density. Finally, a pointwise asymptotic normality result is proved for the smooth monotone estimator obtained by convolving a kernel with the classical monotone estimator.

5.
The predictor that minimizes mean-squared prediction error is used to derive a goodness-of-fit measure that offers an asymptotically valid model selection criterion for a wide variety of regression models. In particular, a new goodness-of-fit criterion (cr^2) is proposed for censored or otherwise limited dependent variables. The new goodness-of-fit measure is then applied to the analysis of duration data.

6.
Birdal Şenoğlu 《Journal of Applied Statistics》2005,32(10):1051-1066
It is well known that the least squares method is optimal only if the errors are normally distributed. In practice, however, non-normal distributions are more prevalent, and if the error terms have a non-normal distribution the efficiency of least squares estimates and tests can be very low. In this paper we consider the 2^k factorial design when the error terms follow a Weibull W(p, σ) distribution. Using the methodology of modified likelihood, we develop robust and efficient estimators for the parameters of the 2^k factorial design. F statistics based on the modified maximum likelihood estimators (MMLE) are defined for testing main effects and interactions. They are shown to have high power and better robustness properties than the normal-theory solutions. A real data set is analysed.

7.
Let F and G be lifetime distributions and consider the problem of estimating F^{-1} when it is known that G^{-1}F is star-shaped. Estimators of F^{-1} are considered here which are shown to be uniformly strongly consistent. The case of censored data is also presented. Asymptotic confidence intervals and bands for F^{-1} are provided. The results are applicable, for example, to the estimation of quantile functions of k-out-of-n systems in reliability. The special case of an IFRA distribution follows immediately from the more general case presented here.

8.
In this paper we use non-parametric local polynomial methods to estimate the regression function m(x). Y may be a binary or continuous response variable, and X is continuous with non-uniform density. The main contributions of this paper are the weak convergence of a bandwidth process for kernels of order (0, k), k = 2j, j ≥ 1, and the proposal of a local data-driven bandwidth selection method which is particularly beneficial when X is not distributed uniformly. This selection method minimizes estimates of the asymptotic MSE and estimates the bias portion in an innovative way that relies on the order of the kernel rather than on direct estimation of m^{(2)}(x). We show that utilization of this method results in the estimator achieving the optimal asymptotic MSE, i.e. the method is efficient. Simulation studies are provided which illustrate the method for both binary and continuous response cases.

9.
In a polynomial regression with measurement errors in the covariate, the latter being supposed normally distributed, one has (at least) three ways to estimate the unknown regression parameters: one can apply ordinary least squares (OLS) to the model without regard to the measurement error, or one can correct for the measurement error, either by correcting the estimating equation (ALS) or by correcting the mean and variance functions of the dependent variable, which is done by conditioning on the observable, error-ridden counterpart of the covariate (SLS). While OLS is biased, the other two estimators are consistent. Their asymptotic covariance matrices, and thus their relative efficiencies, can be compared, in particular for the case of a small measurement-error variance. In this case it appears that ALS and SLS become almost equally efficient, even when they differ noticeably from OLS.
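The polynomial-case ALS and SLS corrections involve moment adjustments too long to sketch here, but the linear special case already exhibits the OLS bias the abstract refers to, together with the classical correction for it. The simulation below is my own illustration with made-up variances, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 2.0
sigma_u2 = 0.5                                  # known measurement-error variance

xi = rng.normal(0.0, 1.0, n)                    # latent covariate, variance 1
x = xi + rng.normal(0.0, np.sqrt(sigma_u2), n)  # observed, error-ridden covariate
y = beta * xi + rng.normal(0.0, 1.0, n)

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # attenuated toward zero
b_cor = b_ols * np.var(x, ddof=1) / (np.var(x, ddof=1) - sigma_u2)
# plim b_ols = beta * 1 / (1 + 0.5) = 4/3; the corrected slope recovers beta = 2.
```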

10.
In this paper, a review is given of various goodness-of-fit measures that have been proposed for the binary choice model in the last two decades. The relative behaviour of several pseudo-R^2 measures is analysed in a series of misspecified binary choice models, the misspecification being omitted variables or an included irrelevant variable. A comparison is made with the OLS R^2 of the underlying latent variable model and with the squared sample correlation coefficient of the true and predicted probabilities. Further, it is investigated how the values of the measures change with a changing frequency rate of successes.

11.
Several distribution-free bounds on expected values of L-statistics based on samples of possibly dependent and non-identically distributed random variables are given for the case when the sample size is itself a random variable, possibly dependent on the observations, taking values in the set {1, 2, ...}. Some of the bounds extend the results of Papadatos (2001a) to the case of random sample size; the others provide new evaluations even when the sample size is non-random. Some applications of the presented bounds are also indicated.

12.
Exact expressions are given for the cumulative distribution function of a random variable of the form (α_1 X_1 + α_2 X_2)/Y, where X_1, X_2 and Y are independent chi-squared random variables. The expressions are applied to the detection of joint outliers and to Hotelling's mis-specified T^2 distribution.
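The exact expressions themselves are beyond a short note, but the distribution is easy to approximate by Monte Carlo, which is useful as a check. The sketch below uses arbitrary parameter values; setting α_2 = 0 reduces the ratio to X_1/Y with X_1 and Y i.i.d. chi-squared, for which P{X_1/Y ≤ 1} = 1/2 by symmetry.

```python
import numpy as np

def ratio_cdf_mc(t, a1, a2, df1, df2, dfy, n=200_000, seed=3):
    """Monte Carlo estimate of P{(a1*X1 + a2*X2)/Y <= t} for independent
    chi-squared X1 ~ chi2(df1), X2 ~ chi2(df2), Y ~ chi2(dfy)."""
    rng = np.random.default_rng(seed)
    x1 = rng.chisquare(df1, n)
    x2 = rng.chisquare(df2, n)
    y = rng.chisquare(dfy, n)
    return np.mean((a1 * x1 + a2 * x2) / y <= t)

p = ratio_cdf_mc(t=1.0, a1=1.0, a2=0.0, df1=4, df2=3, dfy=4)
# X1 and Y are i.i.d. chi2(4) here, so the true value is exactly 0.5.
```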

13.
Summary. This paper investigates the effects of ordinal regressors in linear regression models and in limited dependent variable models. Each ordered categorical variable is interpreted as a rough measurement of an underlying continuous variable, as is often done in microeconometrics for the dependent variable. It is shown that using ordinal indicators leads to correct answers only in a few special cases; in most situations the usual estimators are biased. To estimate the parameters of the model consistently, the indirect estimation procedure suggested by Gourieroux et al. (1993) is applied. To demonstrate the method, a simulation study is performed first, and two real data sets are then used; in the latter case, continuous regressors are transformed into categorical variables to study the behaviour of the estimation procedure. The method is extended to limited dependent variable models. In general, the indirect estimators lead to adequate results. Received: March 27, 2000; revised version: March 6, 2001

14.
Abstract. Consider the model Y = β'X + ε. Let F_0 be the unknown cumulative distribution function of the random variable ε. Consistency of the semi-parametric maximum likelihood estimator of (β, F_0) has not been established under any interval censorship (IC) model. We prove in this paper that the estimator is consistent under the mixed case IC model and some mild assumptions.

15.
Summary. For a binary treatment ν = 0, 1 and the corresponding 'potential responses' Y_0 for the control group (ν = 0) and Y_1 for the treatment group (ν = 1), one definition of no treatment effect is that Y_0 and Y_1 follow the same distribution given a covariate vector X. Koul and Schick have provided a non-parametric test for no distributional effect when the realized response (1−ν)Y_0 + νY_1 is fully observed and the distribution of X is the same across the two groups. This test is thus applicable neither to censored responses nor to non-experimental (i.e. observational) studies that entail different distributions of X across the two groups. We propose 'X-matched' non-parametric tests generalizing the test of Koul and Schick following an idea of Gehan. Our tests are applicable to non-experimental data with randomly censored responses. Beyond these motivations, the tests have several advantages. First, they have the intuitive appeal of comparing all available pairs across the treatment and control groups, instead of selecting a number of matched controls (or treated) as in the usual pair or multiple matching. Second, whereas most matching estimators or tests suffer from a non-overlapping support (of X) problem across the two groups, our tests have built-in protection against this problem. Third, Gehan's idea allows the tests to make good use of censored observations. A simulation study is conducted, and an empirical illustration of a job training effect on the duration of unemployment is provided.
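Gehan's idea is to score every treatment-control pair, counting a pair only when censoring still permits an unambiguous comparison. A minimal version of that scoring is sketched below; the X-matching and the asymptotic theory, which are the paper's actual contributions, are not reproduced.

```python
import numpy as np

def gehan_statistic(t1, d1, t0, d0):
    """Sum of Gehan scores over all treatment-control pairs.

    t*: observed times; d* = 1 if the response was observed, 0 if
    right-censored.  A pair scores +1 when the treatment response is
    unambiguously larger, -1 when unambiguously smaller, 0 otherwise.
    """
    U = 0
    for ti, di in zip(t1, d1):
        for tj, dj in zip(t0, d0):
            if tj < ti and dj == 1:    # control response known to be smaller
                U += 1
            elif ti < tj and di == 1:  # treatment response known to be smaller
                U -= 1
    return U

# Every treatment response exceeds every control response: all 4 pairs score +1.
U = gehan_statistic([5.0, 6.0], [1, 1], [1.0, 2.0], [1, 1])
```

Note that a treatment observation censored at time c still beats a control event observed before c, which is how the scoring "makes good use of censored observations".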

16.
Clinical trials and other types of studies often examine the effects of a particular treatment or experimental condition on a number of different response variables. Although the usual approach for analysing such data is to examine each variable separately, this can increase the chance of false positive findings. Bonferroni's inequality or Hotelling's T^2 statistic can be employed to control the overall type I error rate, but these tests generally lack power against alternatives in which the treatment improves the outcome on most or all of the endpoints. For the comparison of independent groups, O'Brien (1984) developed a rank-sum-type test that has greater power than the Bonferroni and T^2 procedures when one treatment is uniformly better (i.e. for all endpoints) than the other treatment(s). In this paper we adapt the rank-sum test to studies involving paired data and demonstrate that it, too, has power advantages for such alternatives. Simulation results are described, and an example from a study measuring the effects of sleep loss on glucose metabolism is presented to illustrate the methodology.
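For intuition, the sketch below implements one plausible paired analogue of O'Brien's idea, not necessarily the authors' exact statistic: per endpoint, within-pair differences receive signed ranks; each pair's ranks are summed across endpoints, and a one-sample t statistic tests whether the composite scores are centred at zero.

```python
import numpy as np

def paired_rank_sum_t(diff):
    """Composite signed-rank t statistic for paired multi-endpoint data.

    diff: (n_pairs, n_endpoints) within-pair differences.  Per endpoint,
    |differences| are ranked and given back their signs; each pair's
    signed ranks are summed across endpoints, and a one-sample t
    statistic tests whether the composite scores are centred at zero.
    """
    n, k = diff.shape
    signed = np.empty_like(diff, dtype=float)
    for j in range(k):
        ranks = np.abs(diff[:, j]).argsort().argsort() + 1  # ranks 1..n of |d|
        signed[:, j] = np.sign(diff[:, j]) * ranks
    s = signed.sum(axis=1)                                  # per-pair composite score
    return s.mean() / (s.std(ddof=1) / np.sqrt(n))

rng = np.random.default_rng(4)
d = rng.normal(1.0, 1.0, size=(40, 3))  # treatment shifts all three endpoints up
t_stat = paired_rank_sum_t(d)           # strongly positive under this alternative
```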

17.
There has been much recent interest in supersaturated designs and their application in factor screening experiments. Supersaturated designs have mainly been constructed using the E(s^2)-optimality criterion originally proposed by Booth and Cox in 1962. However, until now E(s^2)-optimal designs have only been established with certainty for n experimental runs when the number of factors m is a multiple of n−1, and in adjacent cases where m = q(n−1) + r (|r| ≤ 2, q an integer). A method of constructing E(s^2)-optimal designs is presented which allows a reasonably complete solution to be found for various numbers of runs n, including n = 8, 12, 16, 20, 24, 32, 40, 48, 64.
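The criterion itself is simple to compute. For a ±1 design matrix X with columns x_i, E(s^2) averages the squared off-diagonal inner products s_ij = x_i'x_j over all column pairs. A minimal implementation, checked on a 4-run orthogonal design where every s_ij is zero:

```python
import numpy as np

def e_s2(X):
    """E(s^2) criterion: mean of squared off-diagonal entries of X'X,
    where X is an n x m design matrix with +/-1 entries (columns = factors)."""
    S = X.T @ X
    m = S.shape[0]
    off = S[np.triu_indices(m, k=1)]   # s_ij for all column pairs i < j
    return np.mean(off.astype(float) ** 2)

H = np.array([[ 1,  1,  1,  1],
              [ 1, -1,  1, -1],
              [ 1,  1, -1, -1],
              [ 1, -1, -1,  1]])
val = e_s2(H)   # every pair of columns is orthogonal, so E(s^2) = 0
```

In a supersaturated design m exceeds n−1, so the columns cannot all be orthogonal and E(s^2) > 0; the construction problem is to make it as small as possible.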

18.
Summary. Many contemporary classifiers are constructed to provide good performance for very high dimensional data. However, an issue that is at least as important as good classification is determining which of the many potential variables provide key information for good decisions. Responding to this issue can help us to determine which aspects of the data-generating mechanism (e.g. which genes in a genomic study) are of greatest importance in terms of distinguishing between populations. We introduce tilting methods for addressing this problem. We apply weights to the components of data vectors, rather than to the data vectors themselves (as is commonly the case in related work). In addition, we tilt in a way that is governed by the L_2-distance between weight vectors, rather than by the more commonly used Kullback-Leibler distance. It is shown that this approach, together with the added constraint that the weights should be non-negative, produces an algorithm which eliminates vector components that have little influence on the classification decision. In particular, use of the L_2-distance in this problem produces properties reminiscent of those that arise when L_1-penalties are employed to eliminate explanatory variables in very high dimensional prediction problems, e.g. those involving the lasso. We introduce techniques that can be implemented very rapidly, and we show how to use bootstrap methods to assess the accuracy of our variable ranking and variable elimination procedures.

19.
We examine the empirical relevance of three alternative asymptotic approximations to the distribution of instrumental variables estimators by Monte Carlo experiments. We find that conventional asymptotics provides a reasonable approximation to the actual distribution of instrumental variables estimators when the sample size is reasonably large. For most sample sizes, we find that the Bekker [11] asymptotics provides a reasonably good approximation even when the first-stage R^2 is very small. We conclude that reporting the Bekker [11] confidence interval would suffice for most microeconometric (cross-sectional) applications, and that the comparative advantage of the Staiger and Stock [5] asymptotic approximation lies in applications with sample sizes typical of macroeconometric (time series) applications.
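As a toy version of such an experiment (my own setup, far simpler than the paper's design), the snippet below simulates a single endogenous regressor with several modest instruments and compares the 2SLS estimate with the inconsistent OLS estimate:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 2_000, 10                  # sample size, number of instruments
beta = 1.0
Z = rng.normal(size=(n, k))
pi = np.full(k, 0.15)             # modest first-stage coefficients

u = rng.normal(size=n)                              # structural error
x = Z @ pi + 0.8 * u + 0.6 * rng.normal(size=n)     # endogenous regressor
y = beta * x + u

b_ols = (x @ y) / (x @ x)                           # inconsistent: plim about 1.65 here
xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]     # first-stage fitted values
b_2sls = (xhat @ y) / (xhat @ x)                    # consistent for beta = 1
```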

20.
Process control involves repeated hypothesis testing based on successive samples. However, process control is not hypothesis testing as such, since it also deals with the detection of non-random patterns of variation, and it does so in a fleeting kind of population, whereas hypothesis testing is principally meant for a stagnant population. Walter A. Shewhart introduced a graphical method for carrying out this testing in a fleeting population in 1924. This graphical method came to be known as the control chart and is widely used throughout the world today for process management. Subsequently there was much advancement in process control techniques. In particular, when more than one variable is involved, process control techniques were developed mainly by Hicks (1955), Jackson (1956, 1959) and Montgomery and Wadsworth (1972), based on the pioneering work of Hotelling in 1931. Most of this work concerns multivariate variable control charts with a multivariate normal underlying distribution. When more than one attribute variable is involved, some work relating to tests of hypotheses was done by Mahalanobis (1946), also based on the Hotelling T^2 test. This paper extends the concept of 'Mahalanobis distance' to the multinomial distribution and thereby proposes a multivariate attribute control chart.
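For a single multinomial sample, the squared Mahalanobis distance of the observed counts from their in-control expectation (taken with a generalized inverse of the singular multinomial covariance) reduces to the familiar Pearson chi-square statistic. The sketch below computes that plotted statistic only; the chart's control limit, e.g. a chi-square(k−1) quantile, and the paper's specific chart design are separate matters.

```python
import numpy as np

def multinomial_distance(counts, p):
    """Pearson chi-square statistic for one multinomial sample, equal to the
    squared Mahalanobis distance from the in-control mean n*p under the
    multinomial covariance (via a generalized inverse)."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() * np.asarray(p, dtype=float)
    return np.sum((counts - expected) ** 2 / expected)

# One plotted point: 100 items classified into 3 attribute categories,
# in-control proportions (0.5, 0.3, 0.2).
d2 = multinomial_distance([48, 30, 22], [0.5, 0.3, 0.2])
# (48-50)^2/50 + 0 + (22-20)^2/20 = 0.08 + 0.20 = 0.28
```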
