Similar Articles
20 similar articles retrieved.
1.
Analysis of massive datasets is challenging owing to limitations of computer primary memory. Composite quantile regression (CQR) is a robust and efficient estimation method. In this paper, we extend CQR to massive datasets and propose a divide-and-conquer CQR method. The basic idea is to split the entire dataset into several blocks, apply the CQR method to the data in each block, and finally combine these regression results via a weighted average. The proposed approach significantly reduces the required amount of primary memory, and the resulting estimate is as efficient as if the entire dataset were analysed simultaneously. Moreover, to improve the efficiency of CQR, we propose a weighted CQR estimation approach. To achieve sparsity with high-dimensional covariates, we develop a variable selection procedure to select significant parametric components and prove that the method possesses the oracle property. Both simulations and data analysis are conducted to illustrate the finite-sample performance of the proposed methods.
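The divide-and-conquer recipe described here (split the data into blocks, fit CQR within each block, combine the block estimates by a weighted average) can be sketched directly. The following Python sketch is illustrative only: the function names, quantile levels, optimiser, and sample-size weighting are our own assumptions, not the paper's implementation or its efficiency-optimal weights.

```python
import numpy as np
from scipy.optimize import minimize

def cqr_fit(X, y, taus):
    """Composite quantile regression: common slopes, one intercept per quantile level."""
    n, p = X.shape
    K = len(taus)

    def loss(theta):
        b, beta = theta[:K], theta[K:]
        total = 0.0
        for k, tau in enumerate(taus):
            r = y - b[k] - X @ beta
            total += np.sum(r * (tau - (r < 0)))   # check (pinball) loss
        return total / n

    res = minimize(loss, np.zeros(K + p), method="Nelder-Mead")
    return res.x[K:]                               # common slope vector

def dc_cqr(X, y, n_blocks=10, taus=(0.25, 0.5, 0.75)):
    """Divide-and-conquer CQR: block-wise fits combined by a weighted average."""
    blocks = np.array_split(np.random.permutation(len(y)), n_blocks)
    betas = np.vstack([cqr_fit(X[idx], y[idx], taus) for idx in blocks])
    weights = np.array([len(idx) for idx in blocks], dtype=float)
    return np.average(betas, axis=0, weights=weights)

# Toy usage with heavy-tailed errors
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
y = X @ np.array([1.0, -0.5]) + rng.standard_t(df=3, size=5000)
print(dc_cqr(X, y))
```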

2.
This paper develops a varying-coefficient approach to the estimation and testing of regression quantiles under randomly truncated data. To handle the truncated data, random weights are introduced and weighted quantile regression (WQR) estimators for the nonparametric functions are proposed. To achieve good efficiency properties, we further develop a weighted composite quantile regression (WCQR) estimation method for the nonparametric functions in varying-coefficient models. The asymptotic properties of both the proposed WQR and WCQR estimators are established. In addition, we propose a novel bootstrap-based test procedure to test whether the nonparametric functions in varying-coefficient quantile models can be specified by particular functional forms. The performance of the proposed estimators and test procedure is investigated through simulation studies and a real data example.

3.
We propose a weighted empirical likelihood approach to inference with multiple samples, including stratified sampling, the estimation of a common mean using several independent and non-homogeneous samples and inference on a particular population using other related samples. The weighting scheme and the basic result are motivated and established under stratified sampling. We show that the proposed method can ideally be applied to the common mean problem and problems with related samples. The proposed weighted approach not only provides a unified framework for inference with multiple samples, including two-sample problems, but also facilitates asymptotic derivations and computational methods. A bootstrap procedure is also proposed in conjunction with the weighted approach to provide better coverage probabilities for the weighted empirical likelihood ratio confidence intervals. Simulation studies show that the weighted empirical likelihood confidence intervals perform better than existing ones.

4.
This article considers the problem of testing for an explosive bubble in financial data in the presence of time-varying volatility. We propose a weighted least squares-based variant of the Phillips et al. test for explosive autoregressive behavior. We find that such an approach has appealing asymptotic power properties, with the potential to deliver substantially greater power than the established OLS-based approach for many volatility and bubble settings. Given that the OLS-based test can outperform the weighted least squares-based test for other volatility and bubble specifications, we also suggest a union of rejections procedure that succeeds in capturing the better power available from the two constituent tests for a given alternative. Our approach involves a nonparametric kernel-based volatility function estimator for computation of the weighted least squares-based statistic, together with the use of a wild bootstrap procedure applied jointly to both individual tests, delivering a powerful testing procedure that is asymptotically size-robust to a wide range of time-varying volatility specifications.

5.
We propose a survey weighted quadratic inference function method for the analysis of data collected from longitudinal surveys, as an alternative to the survey weighted generalized estimating equation method. The procedure yields estimators of model parameters which are shown to be consistent and to have a limiting normal distribution. Furthermore, based on the inference function, we propose a pseudolikelihood ratio type statistic for testing a composite hypothesis on the model parameters and a statistic for testing the goodness of fit of the assumed model. We establish their asymptotic distributions as weighted sums of independent chi-squared random variables and obtain Rao-Scott corrections to those statistics, leading to approximate chi-squared distributions. We examine the performance of the proposed methods in a simulation study.

6.
For testing the equality of two survival functions, the weighted logrank test and the weighted Kaplan–Meier test are the two most widely used methods. Each of these tests has strengths and weaknesses against different alternatives, and the possible types of survival differences usually cannot be specified in advance. Hence, an important issue is how to choose a single test, or combine several competing tests, to detect differences between two survival functions without suffering a substantial loss in power. Instead of directly using a particular test, which generally performs well in some situations and poorly in others, we consider in this paper a class of tests indexed by a weight parameter for testing the equality of two survival functions. A delete-1 jackknife method is implemented for selecting the weight so that the variance of the test is minimized. Numerical experiments are performed under various alternatives to illustrate the superiority of the proposed method. Finally, the proposed testing procedure is applied to two real-data examples.
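A hedged sketch of the weight-selection step: for a combined statistic T(w) = w*T1 + (1 - w)*T2 (think of T1 and T2 as a weighted logrank and a weighted Kaplan–Meier statistic), estimate Var{T(w)} by the delete-1 jackknife over a grid of weights and keep the minimiser. The statistic functions below are placeholders to be supplied by the user, not implementations of the survival tests in the paper.

```python
import numpy as np

def jackknife_var(data, stat):
    """Delete-1 jackknife estimate of the variance of stat(data)."""
    n = len(data)
    loo = np.array([stat(np.delete(data, i, axis=0)) for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

def select_weight(data, stat1, stat2, grid=np.linspace(0.0, 1.0, 21)):
    """Pick the weight w minimising the jackknife variance of w*stat1 + (1-w)*stat2."""
    variances = [
        jackknife_var(data, lambda d, w=w: w * stat1(d) + (1 - w) * stat2(d))
        for w in grid
    ]
    return grid[int(np.argmin(variances))]

# Toy usage with two simple statistics standing in for the survival tests
rng = np.random.default_rng(1)
sample = rng.exponential(size=(80, 1))
print("selected weight:", select_weight(sample, lambda d: d.mean(), lambda d: np.median(d)))
```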

7.
This paper presents a goodness-of-fit test for parametric regression models with scalar response and directional predictor, that is, a vector on a sphere of arbitrary dimension. The testing procedure is based on the weighted squared distance between a smooth and a parametric regression estimator, where the smooth regression estimator is obtained by a projected local approach. The asymptotic behaviour of the test statistic under the null hypothesis and under local alternatives is provided, together with a consistent bootstrap algorithm for use in practice. A simulation study illustrates the performance of the test in finite samples. The procedure is applied to test a linear model in text mining.

8.
We present a general method of adjustment for non-ignorable non-response in studies where one or more further attempts are made to contact initial non-responders. A logistic regression model relates the probability of response at each contact attempt to covariates and outcomes of interest. We assume that the effect of these covariates and outcomes on the probability of response is the same at all contact attempts. Knowledge of the number of contact attempts enables estimation of the model by using only information from the respondents and the number of non-responders. Three approaches for fitting the response models and estimating parameters of substantive interest and their standard errors are compared: a modified conditional likelihood method in which the fitted inverse probabilities of response are used in weighted analyses for the outcomes of interest, an EM procedure with the Louis formula and a Bayesian approach using Markov chain Monte Carlo methods. We further propose the creation of several sets of weights to incorporate uncertainty in the probability weights in subsequent analyses. Our methods are applied as a sensitivity analysis to a postal survey of symptoms in Persian Gulf War veterans and other servicemen.
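A minimal sketch of the inverse-probability-weighting step, under the stated assumption that the covariate and outcome effects on response are the same at every contact attempt: if a fitted logistic model gives respondent i a per-attempt response probability p_i, the probability of having responded by attempt K is 1 - (1 - p_i)^K, and its inverse is used as the analysis weight. The fitted probabilities here are hypothetical inputs; the paper's modified conditional likelihood, EM, and Bayesian fitting steps are not shown.

```python
import numpy as np

def response_weights(p_attempt, n_attempts):
    """Inverse probability of having responded by the last contact attempt,
    assuming the same per-attempt response probability at every attempt."""
    p_by_k = 1.0 - (1.0 - np.asarray(p_attempt, float)) ** n_attempts
    return 1.0 / p_by_k

# Toy usage: a weighted mean of an outcome among respondents
rng = np.random.default_rng(2)
p_hat = rng.uniform(0.3, 0.9, size=200)    # fitted per-attempt response probabilities
y = rng.normal(loc=1.0, size=200)          # outcome observed for the respondents
w = response_weights(p_hat, n_attempts=3)
print("weighted outcome mean:", np.average(y, weights=w))
```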

9.
The zero‐inflated Poisson regression model is a special case of finite mixture models that is useful for count data containing many zeros. Typically, maximum likelihood (ML) estimation is used for fitting such models. However, it is well known that the ML estimator is highly sensitive to the presence of outliers and can become unstable when mixture components are poorly separated. In this paper, we propose an alternative robust estimation approach, robust expectation‐solution (RES) estimation. We compare the RES approach with an existing robust approach, minimum Hellinger distance (MHD) estimation. Simulation results indicate that both methods improve on ML when outliers are present and/or when the mixture components are poorly separated. However, the RES approach is more efficient in all the scenarios we considered. In addition, the RES method is shown to yield consistent and asymptotically normal estimators and, in contrast to MHD, can be applied quite generally.

10.
This article presents a novel estimation procedure for high-dimensional Archimedean copulas. In contrast to maximum likelihood estimation, the method presented here does not require derivatives of the Archimedean generator. This is computationally advantageous for high-dimensional Archimedean copulas, for which higher-order derivatives are needed but are often difficult to obtain. Our procedure is based on a parameter-dependent transformation of the underlying random variables to a one-dimensional distribution, to which a minimum-distance method is applied. We show strong consistency of the resulting minimum-distance estimators for the case of known margins as well as for the case of unknown margins when pseudo-observations are used. Moreover, we conduct a simulation comparing the performance of the proposed estimation procedure with the well-known maximum likelihood approach in terms of bias and standard deviation.

11.
Estimation of the single-index model with a discontinuous unknown link function is considered in this paper. The existing refined minimum average variance estimation (rMAVE) method can estimate the single-index parameter and the unknown link function simultaneously by minimising the average pointwise conditional variance, where the conditional variance is estimated by a local linear fit with a centred kernel function. When there are jumps in the link function, large biases can appear around the jumps. For this reason, we embed a jump-preserving technique in the rMAVE method and propose an adaptive jump-preserving estimation procedure for the single-index model. Specifically, the conditional variance is taken from whichever of the local linear fits with centred, left-sided, and right-sided kernel functions has the smallest weighted residual mean squares. The resulting estimators preserve the jumps well and also give smooth estimates of the continuous parts. Asymptotic properties are established under some mild conditions. Simulations and a real data analysis show that the proposed method works well.
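A one-dimensional sketch of the jump-preserving idea: at each evaluation point, fit local linear regressions with a centred, a left-sided, and a right-sided kernel, and keep the fit whose weighted residual mean square is smallest. The kernel, bandwidth, and function names are illustrative assumptions; the paper applies this device inside the rMAVE procedure for the single-index model rather than to a one-dimensional regression.

```python
import numpy as np

def local_linear(x0, x, y, w):
    """Weighted local linear fit at x0; returns (fitted value, weighted residual mean square)."""
    X = np.column_stack([np.ones_like(x), x - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    resid = y - X @ beta
    return beta[0], np.sum(w * resid ** 2) / np.sum(w)

def jump_preserving_fit(x0, x, y, h=0.1):
    """Choose among centred / left-sided / right-sided kernel fits at x0."""
    u = (x - x0) / h
    base = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)   # Epanechnikov kernel
    candidates = []
    for side in ("centre", "left", "right"):
        w = base.copy()
        if side == "left":
            w[x > x0] = 0.0
        elif side == "right":
            w[x < x0] = 0.0
        if w.sum() > 0:
            candidates.append(local_linear(x0, x, y, w))
    return min(candidates, key=lambda c: c[1])[0]   # fit with the smallest weighted RMS

# Toy usage: a smooth curve with a jump at x = 0.5
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(size=400))
y = np.sin(2 * np.pi * x) + (x > 0.5) + rng.normal(scale=0.1, size=400)
print(np.round([jump_preserving_fit(g, x, y) for g in np.linspace(0.05, 0.95, 10)], 2))
```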

12.
Multiple Hypotheses Testing with Weights
In this paper we offer a multiplicity of approaches and procedures for multiple testing problems with weights. Rationales for incorporating weights in multiple hypotheses testing are discussed. Various type-I error rates and different possible formulations are considered, for both the intersection hypothesis testing problem and the multiple hypotheses testing problem. An optimal per-family weighted error-rate controlling procedure à la Spjotvoll (1972) is obtained. This procedure serves as a vehicle for demonstrating the different implications of the approaches to weighting. Alternative approaches to that of Holm (1979) for family-wise error-rate control with weights are discussed, one involving an alternative procedure for family-wise error-rate control, and the other involving the control of a weighted family-wise error rate. Extensions and modifications of the procedures based on Simes (1986) are given. These include a test of the overall intersection hypothesis with general weights, and weighted sequentially rejective procedures for testing the individual hypotheses. The false discovery rate controlling approach and procedure of Benjamini & Hochberg (1995) are extended to allow for different weights.
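As one concrete instance of weighted family-wise error-rate control, the following sketch implements a weighted Holm-type step-down procedure in which hypotheses are ordered by p_i / w_i and the current p-value is compared with alpha times its weight divided by the total weight of the hypotheses not yet rejected. This is one standard formulation, given for illustration; the paper develops and compares several such procedures.

```python
import numpy as np

def weighted_holm(pvals, weights, alpha=0.05):
    """Weighted step-down (Holm-type) procedure; returns a boolean rejection vector."""
    p, w = np.asarray(pvals, float), np.asarray(weights, float)
    order = np.argsort(p / w)                      # step-down order by weighted p-values
    rejected = np.zeros(len(p), dtype=bool)
    remaining_weight = w.sum()
    for j in order:
        if p[j] <= alpha * w[j] / remaining_weight:
            rejected[j] = True
            remaining_weight -= w[j]
        else:
            break                                  # stop at the first non-rejection
    return rejected

# Toy usage: heavier weight on the two primary hypotheses
print(weighted_holm([0.001, 0.04, 0.03, 0.20], weights=[2.0, 2.0, 1.0, 1.0]))
```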

13.
DNA array technology is an important tool for genomic research due to its capacity to measure simultaneously the expression levels of a large number of genes, or fragments of genes, under different experimental conditions. An important task in gene expression data analysis is to identify clusters of genes that present similar expression levels. We propose a new procedure for estimating the mixture model for clustering gene expression data. The proposed method is a posterior split-merge-birth MCMC procedure which does not require the specification of the number of components, since this number is estimated jointly with the component parameters. The strategy for splitting is based on the data and on the posterior distribution of the previously allocated observations. This procedure yields a quick split proposal, in contrast to other split procedures, which require substantial computational effort. The performance of the method is verified using real and simulated datasets.

14.
Cross-country economic convergence has been increasingly investigated with finite mixture models. Multiple components in a mixture reflect groups of countries that converge locally. Testing for the number of components is crucial for detecting "convergence clubs." To assess the number of components of the mixture, we propose a sequential procedure that compares the shape of the hypothesized mixture distribution with the true unknown density, consistently estimated through a kernel estimator. The novelty of our approach is its ability to select the number of components along with a satisfactory fit of the model. Simulation studies and an empirical application to the per capita income distribution across countries attest to the good performance of our approach. A three-club convergence pattern seems to emerge.
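A minimal sketch of the comparison underlying such a sequential procedure: fit a k-component Gaussian mixture, estimate the density nonparametrically with a kernel estimator, and measure the discrepancy between the two on a grid. In the paper this discrepancy drives a formal sequential test for the number of components; here it is shown only as an informal fit diagnostic, and the mixture form, kernel, and distance are our own choices.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture

def mixture_vs_kde_distance(x, k, grid_size=200):
    """Approximate integrated squared distance between a k-component mixture fit and a KDE."""
    x = np.asarray(x, float)
    gm = GaussianMixture(n_components=k, random_state=0).fit(x.reshape(-1, 1))
    grid = np.linspace(x.min(), x.max(), grid_size)
    mix_dens = np.exp(gm.score_samples(grid.reshape(-1, 1)))
    kde_dens = gaussian_kde(x)(grid)
    return np.sum((mix_dens - kde_dens) ** 2) * (grid[1] - grid[0])

# Toy usage: two well-separated "clubs"
rng = np.random.default_rng(6)
income = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
for k in (1, 2, 3):
    print(k, round(mixture_vs_kde_distance(income, k), 4))
```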

15.
In this article, we consider the problem of selecting a good stochastic system with high probability and minimum total simulation cost when the number of alternatives is very large. We propose a sequential approach that starts with the Ordinal Optimization procedure to select a subset that, with high probability, overlaps with the set of the actual best m% of systems. We then use Optimal Computing Budget Allocation to allocate the available computing budget in a way that maximizes the Probability of Correct Selection. This is followed by a Subset Selection procedure to obtain a smaller subset that contains the best system from the subset selected previously. Finally, the Indifference-Zone procedure is used to select the best system among the survivors of the previous stage. Numerical experiments with the combined procedure show that it selects a good stochastic system with high probability using a small number of simulation samples when the number of alternatives is large. The results also show that the proposed approach is able to identify a good system in a very short simulation time.
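For the budget-allocation stage, the classical asymptotic OCBA rule can be sketched as follows: replications are allocated to non-best designs in proportion to (sigma_i / delta_i)^2, where delta_i is the gap to the current best sample mean, and the best design receives sigma_best times the square root of the sum of squared ratios N_i / sigma_i. This is the textbook rule, shown for illustration; it is not taken from the article.

```python
import numpy as np

def ocba_allocation(means, stds, budget):
    """Allocate `budget` replications across designs by the classical OCBA rule
    (larger mean assumed better)."""
    means, stds = np.asarray(means, float), np.asarray(stds, float)
    best = int(np.argmax(means))
    delta = means[best] - means                    # optimality gaps (zero for the best)
    ratio = np.zeros_like(means)
    others = np.arange(len(means)) != best
    ratio[others] = (stds[others] / delta[others]) ** 2
    ratio[best] = stds[best] * np.sqrt(np.sum((ratio[others] / stds[others]) ** 2))
    return np.round(budget * ratio / ratio.sum()).astype(int)

# Toy usage
print(ocba_allocation(means=[1.0, 1.2, 0.8, 1.5], stds=[0.3, 0.4, 0.2, 0.5], budget=200))
```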

16.
Interval-censored data are very common in reliability and lifetime data analysis. This paper investigates the performance of different estimation procedures for a special type of interval-censored data, namely grouped data, from three widely used lifetime distributions. The approaches considered here include maximum likelihood estimation, minimum distance estimation based on the chi-square criterion, moment estimation based on an imputation (IM) method, and an ad hoc estimation procedure. Although IM-based techniques have recently been used extensively, we show that this approach is not always effective. We find that the ad hoc estimation procedure is equivalent to minimum distance estimation with another distance metric and is more effective in the simulations. The procedures for the different approaches are presented and their performance is investigated by Monte Carlo simulation for various combinations of sample sizes and parameter settings. The numerical results provide practitioners with guidelines for choosing a good estimation approach when analysing grouped data.
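A minimal sketch of the chi-square minimum distance approach for grouped data: with observed counts in fixed intervals, the parameters minimise sum_j (O_j - E_j)^2 / E_j, where E_j are the expected counts under the assumed lifetime distribution. A Weibull model and the optimiser settings are illustrative assumptions, not the article's exact setup.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def chisq_min_fit(edges, counts):
    """Minimum chi-square estimate of Weibull (shape, scale) from grouped counts."""
    counts = np.asarray(counts, float)
    n = counts.sum()

    def objective(theta):
        shape, scale = np.exp(theta)               # keep both parameters positive
        probs = np.clip(np.diff(weibull_min.cdf(edges, c=shape, scale=scale)), 1e-12, None)
        expected = n * probs
        return np.sum((counts - expected) ** 2 / expected)

    res = minimize(objective, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
    return np.exp(res.x)

# Toy usage: grouped Weibull(shape=1.5, scale=2.0) lifetimes
lifetimes = weibull_min.rvs(c=1.5, scale=2.0, size=1000, random_state=4)
edges = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, np.inf])
counts, _ = np.histogram(lifetimes, bins=edges)
print(chisq_min_fit(edges, counts))
```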

17.
We are concerned with the problem of estimating the treatment effects at the effective doses in a dose-finding study. Under monotone dose-response, the effective doses can be identified through the estimation of the minimum effective dose, for which there is an extensive set of statistical tools. In particular, when a fixed-sequence multiple testing procedure is used to estimate the minimum effective dose, Hsu and Berger (1999) show that the confidence lower bounds for the treatment effects can be constructed without the need to adjust for multiplicity. Their method, called the dose-response method, is simple to use, but does not account for the magnitude of the observed treatment effects. As a result, the dose-response method will estimate the treatment effects at effective doses with confidence bounds invariably identical to the hypothesized value. In this paper, we propose an error-splitting method as a variant of the dose-response method to construct confidence bounds at the identified effective doses after a fixed-sequence multiple testing procedure. Our proposed method has the virtue of simplicity as in the dose-response method, preserves the nominal coverage probability, and provides sharper bounds than the dose-response method in most cases.

18.
In this study, a new per-field classification method is proposed for the supervised classification of remotely sensed multispectral image data of an agricultural area using Gaussian mixture discriminant analysis (MDA). For the proposed per-field classification method, the multivariate Gaussian mixture models constructed for the control and test fields can have a fixed or differing number of components, and each component can have a different or a common covariance matrix structure. The discrimination function and the decision rule of this method are established according to the average Bhattacharyya distance and the minimum value of the average Bhattacharyya distances, respectively. The proposed per-field classification method is analyzed for different covariance matrix structures with fixed and differing numbers of components. We also classify the remotely sensed multispectral image data using a per-pixel classification method based on Gaussian MDA.
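The distance at the heart of the decision rule has a closed form for Gaussian components. A minimal sketch: compute the Bhattacharyya distance between pairs of fitted Gaussian components and assign a field to the class whose components give the smallest average distance. Component parameters are assumed to be already fitted (e.g. by a Gaussian mixture); the data layout and names are illustrative.

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    term1 = diff @ np.linalg.solve(cov, diff) / 8.0
    term2 = 0.5 * np.log(np.linalg.det(cov)
                         / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def classify_field(field_components, class_components):
    """Assign a field to the class with the minimum average Bhattacharyya distance.
    Each 'components' value is a list of (mean, covariance) pairs."""
    def avg_dist(comps_a, comps_b):
        return np.mean([bhattacharyya(m1, c1, m2, c2)
                        for m1, c1 in comps_a for m2, c2 in comps_b])
    dists = {label: avg_dist(field_components, comps)
             for label, comps in class_components.items()}
    return min(dists, key=dists.get)

# Toy usage with single-component classes in two spectral bands
eye2 = np.eye(2)
classes = {"wheat":  [(np.array([2.0, 2.0]), eye2)],
           "barley": [(np.array([0.0, 0.0]), eye2)]}
test_field = [(np.array([1.8, 2.2]), 0.8 * eye2)]
print(classify_field(test_field, classes))
```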

19.
The detection of (structural) breaks, or the so-called change point problem, has drawn increasing attention in theoretical, applied economic, and financial fields. Much of the existing research concentrates on the detection of change points and the asymptotic properties of their estimators in panels when N, the number of panels, as well as T, the number of observations in each panel, are large. In this paper we pursue a different approach, i.e., we consider the asymptotic properties when N→∞ while keeping T fixed. This situation is typically related to large (firm-level) datasets containing financial information about a very large number of firms/stocks across a limited number of years/quarters/months. We propose a general approach for testing for break(s) in this setup. In particular, we obtain the asymptotic behavior of the test statistics. We also propose a wild bootstrap procedure that can be used to generate the critical values of the test statistics. The theoretical approach is supplemented by numerous simulations and by an empirical illustration. We demonstrate that the testing procedure works well in the framework of the four-factor CAPM. In particular, we estimate the breaks in the monthly returns of US mutual funds during the period January 2006 to February 2010, which covers the subprime crisis.
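A generic sketch of the wild bootstrap step: residuals from the no-break fit are multiplied by Rademacher draws, one per panel unit so that the within-unit dependence is preserved, the statistic is recomputed on each bootstrap sample, and its empirical quantile serves as the critical value. The statistic below is a simple stand-in, not the test developed in the paper.

```python
import numpy as np

def wild_bootstrap_cv(residuals, statistic, n_boot=999, alpha=0.05, seed=0):
    """residuals: (N, T) array of panel residuals under the null of no break."""
    rng = np.random.default_rng(seed)
    N = residuals.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        eta = rng.choice([-1.0, 1.0], size=(N, 1))   # one Rademacher multiplier per unit
        stats[b] = statistic(residuals * eta)
    return np.quantile(stats, 1 - alpha)

def toy_break_stat(e):
    """Max of the cumulated cross-sectional mean, a simple CUSUM-type stand-in."""
    cums = np.cumsum(e.mean(axis=0))
    return np.sqrt(e.shape[0]) * np.max(np.abs(cums[:-1]))

resid = np.random.default_rng(5).normal(size=(300, 12))
print("bootstrap 5% critical value:", wild_bootstrap_cv(resid, toy_break_stat))
```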

20.
We consider a class of closed multiple test procedures indexed by a fixed weight vector. The class includes the Holm weighted step-down procedure, the closed method using the weighted Fisher combination test, and the closed method using the weighted version of Simes' test. We show how to choose weights to maximize average power, where "average power" is itself weighted by the importance assigned to the various hypotheses. Numerical computations suggest that the optimal weights for the multiple test procedures tend to certain asymptotic configurations. These configurations offer numerical justification for intuitive multiple comparison methods, such as downweighting variables found insignificant in preliminary studies, giving primary variables more emphasis, gatekeeping test strategies, pre-determined multiple testing sequences, and pre-determined sequences of families of tests. We establish that such methods fall within the envelope of weighted closed testing procedures, thus providing a unified view of fixed sequences, fixed sequences of families, and gatekeepers within the closed testing paradigm. We also establish that the limiting cases control the familywise error rate (FWE), using well-known results about closed tests along with the dominated convergence theorem.
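As one member of this class, a weighted Simes test for an intersection hypothesis can be sketched as follows: with weights normalised to sum to one, the intersection is rejected if some p_j satisfies p_j <= alpha times the total weight of the hypotheses whose p-values do not exceed p_j. This is one common formulation, shown only for illustration; the closed procedures and the weight optimisation studied in the paper are not reproduced here.

```python
import numpy as np

def weighted_simes(pvals, weights, alpha=0.05):
    """Weighted Simes test of the intersection hypothesis; True means reject."""
    p = np.asarray(pvals, float)
    w = np.asarray(weights, float)
    w = w / w.sum()                          # normalise the weights to sum to one
    order = np.argsort(p)
    cum_w = np.cumsum(w[order])              # weight mass at or below each ordered p-value
    return bool(np.any(p[order] <= alpha * cum_w))

# Toy usage: heavier weight on the first hypothesis
print(weighted_simes([0.012, 0.04, 0.30], weights=[2.0, 1.0, 1.0]))
```

With equal weights the rule reduces to the ordinary Simes test.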

