Similar Literature
20 similar documents found (search time: 31 ms)
1.
In this article, we propose a method of averaging generalized least squares estimators for linear regression models with heteroskedastic errors. The averaging weights are chosen to minimize a Mallows’ Cp-like criterion. We show that the weight vector selected by our method is optimal. It is also shown that this optimality holds even when the variances of the error terms are estimated and the feasible generalized least squares estimators are averaged. The variances can be estimated parametrically or nonparametrically. Monte Carlo simulation results are encouraging. An empirical example illustrates that the proposed method is useful for predicting a measure of firms’ performance.

2.
Nonparametric estimators of the upper boundary of the support of a multivariate distribution are very appealing because they rely on very few assumptions. But in productivity and efficiency analysis, this upper boundary is a production (or a cost) frontier, and a parametric form for it allows for a richer economic interpretation of the production process under analysis. On the other hand, most parametric approaches rely on assumptions about the stochastic part of the model that are often too restrictive, and they are based on standard regression techniques that fit the shape of the center of the cloud of points rather than its boundary. To overcome these limitations, Florens and Simar [2005. Parametric approximations of nonparametric frontiers. J. Econometrics 124 (1), 91–116] propose a two-stage approach which tries to capture the shape of the cloud of points near its frontier by providing parametric approximations of a nonparametric frontier. In this paper we propose an alternative method using the nonparametric quantile-type frontiers introduced in Aragon, Daouia and Thomas-Agnan [2005. Nonparametric frontier estimation: a conditional quantile-based approach. Econometric Theory 21, 358–389] for the nonparametric part of our model. These quantile-type frontiers have the advantage of being more robust to extremes. Our main result concerns the functional convergence of the quantile-type frontier process. Then we provide convergence and asymptotic normality of the resulting estimators of the parametric approximation. The approach is illustrated through simulated and real data sets.

3.
In the calibration of a near-infrared (NIR) instrument, we regress some chemical compositions of interest on their NIR spectra. This process poses two immediate challenges: first, the number of variables exceeds the number of observations and, second, the multicollinearity between variables is extremely high. To deal with these challenges, prediction models that produce sparse solutions have recently been proposed. The term ‘sparse’ means that some model parameters are estimated to be exactly zero while the other parameters are estimated to be away from zero. In effect, variable selection is embedded in the model to potentially achieve better prediction. Many studies have investigated sparse solutions for latent variable models, such as partial least squares and principal component regression, and for direct regression models such as ridge regression (RR). However, the latter mainly involve adding an L1-norm penalty to the objective function, as in lasso regression. In this study, we investigate new sparse alternative models for RR within a random effects model framework, where we consider Cauchy and mixture-of-normals distributions on the random effects. The results indicate that the mixture-of-normals model produces a sparse solution with good prediction and better interpretation. We illustrate the methods using NIR spectra datasets from milk and corn specimens.

4.
The effect of parameter estimation on profile monitoring methods has been studied by only a few researchers, and only under the assumption of a normal response variable. However, in some practical situations, the normality assumption is violated and the response variable follows a discrete distribution such as the Poisson. In this paper, we evaluate the effect of parameter estimation on the Phase II monitoring of Poisson regression profiles by considering two control charts, namely the Hotelling’s T² and the multivariate exponentially weighted moving average (MEWMA) charts. Simulation studies in terms of the average run length (ARL) and the standard deviation of the run length (SDRL) are carried out to assess the effect of estimated parameters on the performance of Phase II monitoring approaches. The results reveal that both the in-control and out-of-control performances of these charts are adversely affected when the regression parameters are estimated.

5.
This paper proposes a probabilistic frontier regression model for binary-type output data in a production process setup. We consider one of the two categories of outputs as the ‘selected’ category, and the reduction in the probability of falling in this category is attributed to the reduction in technical efficiency (TE) of the decision-making unit. An efficiency measure is proposed to determine the deviations of individual units from the probabilistic frontier. Simulation results show that the average estimated TE component is close to its true value. An application of the proposed method to data on the Indian public sector banking system is provided, where the output variable is an indicator of the level of non-performing assets. Individual TE is obtained for each of the banks under consideration. Among the public sector banks, Andhra Bank is found to be the most efficient, whereas the United Bank of India is the least efficient.

6.
In this work, we analyze the long-range dependence parameter for a nucleotide sequence in several different transformations. The long-range dependence parameter is estimated by the approximated maximum likelihood method, by a novel estimator based on the spectral envelope theory, by a regression method based on the periodogram function, and also by the detrended fluctuation analysis method. We study the length distribution of coding and noncoding regions for all Homo sapiens chromosomes available from the European Bioinformatics Institute. The parameter of the tail rate decay is estimated by the Hill estimator α̂. We show that the tail rate decay is greater than 2 for coding regions, while for almost all noncoding regions it is less than 2.
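
As a rough illustration of the Hill estimator mentioned in this abstract, the sketch below computes the textbook Hill estimate of a tail index from the k largest observations. The function name `hill_tail_index`, the choice of k, and the synthetic Pareto-type sample are assumptions made here for illustration; none of them come from the paper, and the paper's actual data and tuning are not reproduced.

```python
import math


def hill_tail_index(sample, k):
    """Textbook Hill estimate of the tail index from the k largest observations.

    Hypothetical helper for illustration; not code from the cited paper.
    """
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = xs[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold observation must be positive")
    # Average log-excess of the k largest values over the threshold
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


# Deterministic Pareto-type sample with true tail index 2.5 (synthetic data)
n, alpha = 2000, 2.5
sample = [(i / (n + 1)) ** (-1.0 / alpha) for i in range(1, n + 1)]
alpha_hat = hill_tail_index(sample, k=100)
print(round(alpha_hat, 2))
```

In practice the estimate is sensitive to k, so it is usually inspected over a range of k values rather than at a single one.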

7.
It is common to monitor several correlated quality characteristics using Hotelling's T² statistic. However, T² confounds the location shift with the scale shift, and consequently it is often difficult to determine the factors responsible for an out-of-control signal in terms of the process mean vector and/or process covariance matrix. In this paper, we propose a diagnostic procedure called the ‘D-technique’ to detect the nature of the shift. For this purpose, two sets of regression equations, each consisting of the regression of a variable on the remaining variables, are used to characterize the ‘structure’ of the ‘in control’ process and that of the ‘current’ process. To determine the sources responsible for an out-of-control state, it is shown that it is enough to compare these two structures using the dummy variable multiple regression equation. The proposed method is operationally simpler than existing diagnostic tools and computationally advantageous over them. The technique is illustrated with various examples.
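
For readers unfamiliar with the statistic this abstract starts from, the following minimal sketch evaluates Hotelling's T² for a single bivariate observation, assuming the in-control mean vector and covariance matrix are known. The function name and the restriction to two variables are simplifications made here; the sketch does not implement the proposed D-technique.

```python
def hotelling_t2_bivariate(x, mean, cov):
    """T^2 = (x - mean)' cov^{-1} (x - mean) for one bivariate observation.

    Hypothetical helper; assumes known in-control parameters.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det
    return (d * d1 * d1 - (b + c) * d1 * d2 + a * d2 * d2) / det


# With an identity covariance, T^2 is the squared Euclidean distance.
print(hotelling_t2_bivariate((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
```

A single large T² value signals a departure from the in-control state but does not say whether the mean or the covariance moved, which is the confounding the abstract describes.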

8.
In this article, we deal with an optimal reliability and maintainability design problem of a searching system with complex structures. The system availability and life cycle cost are used as optimization criteria and estimated by simulation. We want to determine MTBF (Mean Time between Failures) and MTTR (Mean Time to Repair) for all components and ALDT (Administrative and Logistics Delay Times) of the searching system in order to minimize the life cycle cost and to satisfy the target system availability. A hybrid genetic algorithm with a heuristic method is proposed to find near-optimal solutions and is compared with a general genetic algorithm.

9.
Nested case–control (NCC) sampling is widely used in large epidemiological cohort studies for its cost effectiveness, but its data analysis primarily relies on the Cox proportional hazards model. In this paper, we consider a family of linear transformation models for analyzing NCC data and propose an inverse selection probability weighted estimating equation method for inference. Consistency and asymptotic normality of our estimators for regression coefficients are established. We show that the asymptotic variance has a closed analytic form and can be easily estimated. Numerical studies are conducted to support the theory and an application to the Wilms’ Tumor Study is also given to illustrate the methodology.

10.
In some clinical, environmental, or economic studies, researchers are interested in a semi-continuous outcome variable which takes the value zero with a discrete probability and has a continuous distribution for the non-zero values. Due to the measuring mechanism, it is not always possible to fully observe some outcomes, and only an upper bound is recorded. We call this left-censored data and observe only the maximum of the outcome and an independent censoring variable, together with an indicator. In this article, we introduce a mixture semi-parametric regression model. We consider a parametric model to investigate the influence of covariates on the discrete probability of the value zero. For the non-zero part of the outcome, a semi-parametric Cox’s regression model is used to study the conditional hazard function. The different parameters in this mixture model are estimated using a likelihood method, in which the infinite-dimensional baseline hazard function is estimated by a step function. We show the identifiability of the model and the consistency of the estimators of its parameters. We study the finite sample behaviour of the estimators through a simulation study and illustrate this model on a practical data example.

11.
Partially linear regression models are semiparametric models that contain both linear and nonlinear components. They are extensively used in many scientific fields for their flexibility and convenient interpretability. In such analyses, testing the significance of the regression coefficients in the linear component is typically a key focus. Under the high-dimensional setting, i.e., “large p, small n,” the conventional F-test strategy does not apply because the coefficients need to be estimated through regularization techniques. In this article, we develop a new test using a U-statistic of order two, relying on a pseudo-estimate of the nonlinear component from the classical kernel method. Using the martingale central limit theorem, we prove the asymptotic normality of the proposed test statistic under some regularity conditions. We further demonstrate our proposed test's finite-sample performance by simulation studies and by analyzing some breast cancer gene expression data.

12.
In this article, we propose a more general criterion, called the Sp-criterion, for subset selection in the multiple linear regression model. Many subset selection methods are based on the Least Squares (LS) estimator of β, but whenever the data contain an influential observation or the distribution of the error variable deviates from normality, the LS estimator performs ‘poorly’ and hence a method based on this estimator (for example, Mallows’ Cp-criterion) tends to select a ‘wrong’ subset. The proposed method overcomes this drawback, and its main feature is that it can be used with any type of estimator of β (either the LS estimator or any robust estimator) without any need to modify the proposed criterion. Moreover, this technique is operationally simpler to implement than other existing criteria. The method is illustrated with examples.

13.
The nonlinear responses of species to environmental variability can play an important role in the maintenance of ecological diversity. Nonetheless, many models use parametric nonlinear terms which pre-determine the ecological conclusions. Motivated by this concern, we study the estimate of the second derivative (curvature) of the link function in a functional single index model. Since the coefficient function and the link function are both unknown, the estimate is expressed as a nested optimization. We first estimate the coefficient function by minimizing squared error where the link function is estimated with a Nadaraya-Watson estimator for each candidate coefficient function. The first and second derivatives of the link function are then estimated via local-quadratic regression using the estimated coefficient function. In this paper, we derive a convergence rate for the curvature of the nonlinear response. In addition, we prove that the argument of the linear predictor can be estimated root-n consistently. However, practical implementation of the method requires solving a nonlinear optimization problem, and our results show that the estimates of the link function and the coefficient function are quite sensitive to the choices of starting values.
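
The Nadaraya-Watson step named in this abstract is a kernel-weighted average of the responses. A minimal scalar sketch follows, assuming a Gaussian kernel and a user-supplied bandwidth; both choices, and the function name, are assumptions made here, since the abstract does not state which kernel or bandwidth the authors use, and the functional single index structure is not modelled.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel.

    Hypothetical helper for illustration only.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Kernel weight of each observation relative to the evaluation point x0
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic linear response
print(nadaraya_watson(2.0, xs, ys, bandwidth=1.0))
```

In the nested optimization described above, an estimate of this kind would be recomputed for every candidate coefficient function, which is one reason the procedure is sensitive to starting values.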

14.
We consider estimating the mode of a response given an error‐prone covariate. It is shown that ignoring measurement error typically leads to inconsistent inference for the conditional mode of the response given the true covariate, as well as misleading inference for regression coefficients in the conditional mode model. To account for measurement error, we first employ the Monte Carlo corrected score method (Novick & Stefanski, 2002) to obtain an unbiased score function based on which the regression coefficients can be estimated consistently. To relax the normality assumption on measurement error this method requires, we propose another method where deconvoluting kernels are used to construct an objective function that is maximized to obtain consistent estimators of the regression coefficients. Besides rigorous investigation on asymptotic properties of the new estimators, we study their finite sample performance via extensive simulation experiments, and find that the proposed methods substantially outperform a naive inference method that ignores measurement error. The Canadian Journal of Statistics 47: 262–280; 2019 © 2019 Statistical Society of Canada

15.
Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weight the check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an ICQ-type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.
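
The check function that this abstract refers to is the standard quantile regression loss ρ_τ(u) = u(τ − 1{u < 0}). The sketch below evaluates it and a weighted sum of it over residuals. The helper `weighted_check_objective` and its generic `weights` argument are hypothetical stand-ins; how the paper turns estimated propensity scores into weights is not stated in the abstract and is not assumed here.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_check_objective(residuals, weights, tau):
    """Weighted sum of check losses (hypothetical helper, generic weights)."""
    if len(residuals) != len(weights):
        raise ValueError("residuals and weights must have the same length")
    return sum(w * check_loss(r, tau) for r, w in zip(residuals, weights))


# For tau = 0.25, negative residuals are penalized three times as heavily.
print(check_loss(2.0, 0.25), check_loss(-2.0, 0.25))
print(weighted_check_objective([2.0, -2.0], [1.0, 2.0], 0.25))
```

Because the loss grows only linearly in the residual, a few extreme responses have limited influence, which is consistent with the robustness to heavy-tailed errors claimed in the abstract.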

16.
The aim of the paper is to demonstrate the appropriateness of an a priori analysis to determine the distributional assumption of the inefficiency term in a stochastic frontier model. To this end, theoretical distributions of estimated inefficiency were obtained for a cost frontier model in which the inefficiency term is assumed to follow either a half-normal or an exponential distribution. Comparing these theoretical distributions with the respective cost inefficiency estimators using a goodness-of-fit test allows the most appropriate distributional assumption to be selected. An application to three data sets from the Spanish banking system in 2009 demonstrates the relevance of the research question. First, estimated cost inefficiency is significantly larger under the half-normal assumption than under the exponential one. In addition, the half-normal assumption was rejected, and the exponential was not rejected, as the most appropriate distribution of the inefficiency term in the Spanish banking data set. However, the savings banks data were fitted better by the former distribution than by the latter. In the case of banks, either distribution is appropriate. To sum up, this work demonstrates that the distributional assumption on the inefficiency term in the Stochastic Frontier Approach must be justified, as it can significantly bias the estimated inefficiency and therefore influence improvement policies and strategies in the Spanish banking sector.

17.
In this article we present a simple procedure to test the null hypothesis of equality of two regression curves versus one-sided alternatives in a general nonparametric and heteroscedastic setup. The test is based on the comparison of the sample averages of the estimated residuals in each regression model under the null hypothesis. The test statistic has an asymptotic normal distribution and can detect any local alternative of rate n^(-1/2). Some simulations and an application to a data set are included.

18.
A general class of multivariate regression models is considered for repeated measurements with discrete and continuous outcome variables. The proposed model is based on the seemingly unrelated regression model (Zellner, 1962) and an extension of the model of Park and Woolson (1992). The regression parameters of the model are consistently estimated using the two-stage least squares method. When the outcome variables are multivariate normal, the two-stage estimator reduces to Zellner’s two-stage estimator. As a special case, we consider the marginal distribution described by Liang and Zeger (1986). Under this distributional assumption, we show that the two-stage estimator has similar asymptotic properties and comparable small sample properties to Liang and Zeger's estimator. Since the proposed approach is based on the least squares method, however, no distributional assumption is required for the outcome variables. As a result, the proposed estimator is more robust to the marginal distribution of outcomes.

19.
In this article, we propose a two-stage generalized case–cohort design and develop an efficient inference procedure for the data collected with this design. In the first stage, we observe the failure time, censoring indicator and covariates which are easy or cheap to measure. In the second stage, we select a subcohort by simple random sampling, together with a subset of failures among the remaining first-stage subjects, and observe their exposures, which are difficult or expensive to measure. We derive estimators for regression parameters in the accelerated failure time model under the two-stage generalized case–cohort design through the estimated augmented estimating equation and the kernel function method. The resulting estimators are shown to be consistent and asymptotically normal. The finite sample performance of the proposed method is evaluated through simulation studies. The proposed method is applied to a real data set from the National Wilms’ Tumor Study Group.

20.
Recently, Kokonendji et al. have adapted the well-known Nadaraya–Watson kernel estimator for estimating the count function m in the context of nonparametric discrete regression. The authors have also investigated bandwidth selection using the cross-validation method. In this article, we propose a Bayesian approach in the context of nonparametric count regression for estimating the bandwidth and the variance of the model error, which was not estimated in Kokonendji et al. The model error is assumed to be Gaussian with mean zero and variance σ². The Bayes estimates cannot be obtained in closed form, so we use the well-known Markov chain Monte Carlo (MCMC) technique to compute the Bayes estimates under the squared error loss function. The performance of the proposed approach and that of the cross-validation method are compared through simulation and real count data.
