Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.

2.
In this article, we consider statistical inference for longitudinal partial linear models when the response variable is sometimes missing, with a missingness probability that depends on a covariate measured with error. A generalized empirical likelihood (GEL) method is proposed by combining correction for attenuation with quadratic inference functions. The method, which takes the within-group correlation into account, is used to estimate the regression coefficients. Furthermore, residual-adjusted empirical likelihood (EL) is employed for estimating the baseline function so that undersmoothing is avoided. The empirical log-likelihood ratios are proven to be asymptotically chi-squared, and the corresponding confidence regions for the parameters of interest are then constructed. Compared with methods based on normal approximations (NAs), the GEL does not require consistent estimators for the asymptotic variance and bias. A numerical study is conducted to compare the performance of the EL and the normal approximation-based method, and a real example is analysed.

3.
This article proposes a new data‐based prior distribution for the error variance in a Gaussian linear regression model, when the model is used for Bayesian variable selection and model averaging. For a given subset of variables in the model, this prior has a mode that is an unbiased estimator of the error variance but is suitably dispersed to make it uninformative relative to the marginal likelihood. The advantage of this empirical Bayes prior for the error variance is that it is centred and dispersed sensibly and avoids the arbitrary specification of hyperparameters. The performance of the new prior is compared to that of a prior proposed previously in the literature using several simulated examples and two loss functions. For each example our paper also reports results for the model that orthogonalizes the predictor variables before performing subset selection. A real example is also investigated. The empirical results suggest that for both the simulated and real data, the performance of the estimators based on the prior proposed in our article compares favourably with that of a prior used previously in the literature.

4.
In functional linear regression, one conventional approach is to first perform functional principal component analysis (FPCA) on the functional predictor and then use the first few leading functional principal component (FPC) scores to predict the response variable. The leading FPCs estimated by the conventional FPCA represent the major sources of variation of the functional predictor, but they may not be the components most correlated with the response variable, so the prediction accuracy of the functional linear regression model may not be optimal. In this paper, we propose a supervised version of FPCA that takes into account the correlation between the functional predictor and the response variable. It can automatically estimate leading FPCs that represent the major sources of variation of the functional predictor and are simultaneously correlated with the response variable. Our supervised FPCA method is shown to have better prediction accuracy than the conventional FPCA method in one real application to electroencephalography (EEG) data and three carefully designed simulation studies.

5.
One of the most important steps in the design of a pharmaceutical clinical trial is the estimation of the sample size. For a superiority trial the sample size formula (to achieve a stated power) would be based on a given clinically meaningful difference and a value for the population variance. The formula is typically used as though this population variance is known, whereas in reality it is unknown and is replaced by an estimate with its associated uncertainty. The variance estimate would be derived from an earlier similarly designed study (or an overall estimate from several previous studies) and its precision would depend on its degrees of freedom. This paper provides a solution for the calculation of sample sizes that allows for the imprecision in the estimate of the sample variance. It shows that traditional formulae give sample sizes that are too small because they do not allow for this uncertainty, and that the deficiency is more acute with fewer degrees of freedom. It is recommended that the methodology described in this paper be used when the sample variance has fewer than 200 degrees of freedom.
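The "traditional" calculation that the abstract in item 5 criticizes can be sketched as follows. This is only a minimal illustration of one common textbook form for a two-arm parallel superiority trial with a two-sided test, in which the variance is treated as known; it is not the adjusted method proposed in the paper, and the function name and default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Per-arm n for a two-arm superiority trial, treating sigma as known.

    Assumed textbook form: n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2.
    It ignores the uncertainty in the variance estimate.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # quantile for the two-sided significance level
    z_beta = z.inv_cdf(power)           # quantile for the target power
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return ceil(n)                      # round up to whole participants


# Standard deviation 1.0, clinically meaningful difference 0.5
print(per_arm_sample_size(sigma=1.0, delta=0.5))  # -> 63
```

Because `sigma` enters as a fixed number, any imprecision in the estimate it came from is not reflected in the result, which is the deficiency the abstract describes.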

6.
A NOTE ON VARIANCE ESTIMATION FOR THE GENERALIZED REGRESSION PREDICTOR   (Cited by: 1; self-citations: 0; citations by others: 1)
The generalized regression (GREG) predictor is used for estimating a finite population total when the study variable is well‐related to the auxiliary variable. In 1997, Chaudhuri & Roy provided an optimal estimator for the variance of the GREG predictor within a class of non‐homogeneous quadratic estimators (H) under a certain superpopulation model M. They also found an inequality concerning the expected variances of the estimators of the variance of the GREG predictor belonging to the class H under the model M. This paper shows that the derivations of the optimal estimator and of the relevant inequality presented by Chaudhuri & Roy are incorrect.

7.
Minimization of the variance of the estimated slope of a response surface maximized over all points in the factor space is taken as the design criterion. Optimal designs under the criterion are derived for second-order polynomial regression over hypercubic regions.

8.
This article provides a method of interpreting a surprising inequality in multiple linear regression: the squared multiple correlation can be greater than the sum of the simple squared correlations between the response variable and each of the predictor variables. The interpretation is obtained via principal component analysis by studying the influence of some components with small variance on the response variable. One example is used as an illustration and some conclusions are derived.
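The inequality described in item 8 can be reproduced with a small numerical sketch. It assumes exactly two standardized predictors, for which the squared multiple correlation has a standard closed form in terms of the pairwise correlations; the correlation values below are made up for illustration and do not come from the article, which bases its interpretation on principal component analysis instead.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation for a regression of y on two predictors.

    r_y1, r_y2: simple correlations of y with each predictor.
    r_12: correlation between the two predictors (must satisfy |r_12| < 1).
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical correlations: x2 is uncorrelated with y but correlated with x1.
r_y1, r_y2, r_12 = 0.5, 0.0, 0.5

r2 = r_squared_two_predictors(r_y1, r_y2, r_12)
sum_of_simple = r_y1 ** 2 + r_y2 ** 2

print(round(r2, 4), sum_of_simple)  # R^2 exceeds the sum of squared simple correlations
```

Here the second predictor carries no direct information about the response, yet including it raises the squared multiple correlation above the sum of the simple squared correlations.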

9.
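Research on Estimation Methods for Continuous Sampling Based on Regression Composite Techniques   (Cited by: 1; self-citations: 1; citations by others: 0)
In continuous sample surveys that use sample rotation, one can draw not only on information about the study variable from the previous survey period but also on auxiliary-variable information from the current period to build a regression model for regression estimation, and then construct a regression composite estimator. On this basis, the optimal sample rotation rate and the optimal weight coefficient are determined so that the variance of the regression composite estimator is minimized, thereby further improving the estimation precision of continuous sample surveys.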

10.
For a linear regression model over m populations with separate regression coefficients but a common error variance, a Bayesian model is employed to obtain regression coefficient estimates which are shrunk toward an overall value. The formulation uses Normal priors on the coefficients and diffuse priors on the grand mean vectors, the error variance, and the between-to-error variance ratios. The posterior density of the parameters which were given diffuse priors is obtained. From this the posterior means and variances of regression coefficients and the predictive mean and variance of a future observation are obtained directly by numerical integration in the balanced case, and with the aid of series expansions in the approximately balanced case. An example is presented and worked out for the case of one predictor variable. The method is an extension of Box & Tiao's Bayesian estimation of means in the balanced one-way random effects model.

11.
In this paper, a hypothesis test for heteroscedasticity is proposed in a nonparametric regression model. The test statistic, which uses the residuals from a nonparametric fit of the mean function, is based on an adaptation of the well-known Levene's test. Using the recent theory for analysis of variance when the number of factor levels goes to infinity, the asymptotic distribution of the test statistic is established under the null hypothesis of homoscedasticity and under local alternatives. Simulations suggest that the proposed test performs well in several situations, especially when the variance is a nonlinear function of the predictor.

12.
The balanced half-sample, jackknife and linearization methods are used to estimate the variance of the slope of a linear regression under a variety of computer-generated situations. The basic sampling design is one in which two PSUs are selected from each of a number of strata. The variance estimation techniques are compared with a Monte Carlo experiment. Results show that variance estimates may be highly biased and variable unless sizeable numbers of observations are available from each stratum. The jackknife and linearization estimates appear superior to the balanced half-sample method, particularly when the number of strata or the number of available observations from each stratum is small.

13.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.

14.
New robust estimates for variance components are introduced. Two simple models are considered: the balanced one-way classification model with a random factor and the balanced mixed model with one random factor and one fixed factor. However, the method of estimation proposed can be extended to more complex models. The new method of estimation we propose is based on the relationship between the variance components and the coefficients of the least-mean-squared-error predictor between two observations of the same group. This relationship enables us to transform the problem of estimating the variance components into the problem of estimating the coefficients of a simple linear regression model. The variance-component estimators derived from the least-squares regression estimates are shown to coincide with the maximum-likelihood estimates. Robust estimates of the variance components can be obtained by replacing the least-squares estimates by robust regression estimates. In particular, a Monte Carlo study shows that for outlier-contaminated normal samples, the estimates of variance components derived from GM regression estimates and the derived test outperform other robust procedures.

15.
The generalized regression (GREG) predictor for the finite population total of a real variable is often employed when values of an auxiliary variable are available. Several variance estimators for it do well in large samples, though they have no optimality properties. We find a variance estimator which, under a restrictive model, has an optimality property under ‘exact’ as well as ‘asymptotic’ analysis, but it involves model parameters. Under a further restriction on the model, two model-parameter-free variance estimators are derived that share the same ‘asymptotic’ optimality. Numerical illustrations through simulation are presented to demonstrate marginal improvements from using them rather than their predecessors. Two of the latter, though not optimal, are simpler, intuitively appealing, competitive in large samples and generally applicable, and should be retained in practice.

16.
In linear quantile regression, the regression coefficients for different quantiles are typically estimated separately. Efforts to improve the efficiency of estimators are often based on assumptions of commonality among the slope coefficients. We propose instead a two-stage procedure whereby the regression coefficients are first estimated separately and then smoothed over quantile level. Due to the strong correlation between coefficient estimates at nearby quantile levels, existing bandwidth selectors will pick bandwidths that are too small. To remedy this, we use 10-fold cross-validation to determine a common bandwidth inflation factor for smoothing the intercept as well as slope estimates. Simulation results suggest that the proposed method is effective in pooling information across quantile levels, resulting in estimates that are typically more efficient than the separately obtained estimates and the interquantile shrinkage estimates derived using a fused penalty function. The usefulness of the proposed method is demonstrated in a real data example.

17.
Estimation of the population mean under the regression model with random components is considered. Conditions under which the random components regression estimator is design consistent are given. It is shown that consistency holds when incorrect values are used for the variance components. The regression estimator constructed with model parameters that differ considerably from the true parameters performed well in a Monte Carlo study. Variance estimators for the regression predictor are suggested. A variance estimator appropriate for estimators constructed with a biased estimator for the between-group variance component performed well in the Monte Carlo study.

18.
Dominance analysis is a procedure for measuring the importance of predictors in multiple regression analysis. We show that dominance analysis can be enhanced using a dynamic programming approach for the rank-ordering of predictors. Using customer satisfaction data from a call center operation, we demonstrate how the integration of dominance analysis with dynamic programming can provide a better understanding of predictor importance. As a cautionary note, we recommend careful reflection on the relationship between predictor importance and variable subset selection. We observed that slight changes in the selected predictor subset can have an impact on the importance rankings produced by a dominance analysis.

19.
In regression analysis, it is assumed that the response (or dependent variable) distribution is Normal, and that the errors are homoscedastic and uncorrelated. In practice, however, real data sets rarely satisfy these assumptions. A log-transformation is generally suggested to stabilize a heteroscedastic response variance; as a consequence, the response distribution moves closer to the Normal distribution and the model fit improves. Even so, an apparently suitable transformation may not always stabilize the variance, and the response distribution may not reduce to the Normal distribution. The present article assumes that the response distribution is log-normal with compound autocorrelated errors. Under these conditions, estimation and hypothesis tests for the regression parameters are derived. From a set of reduced data, we derive the best linear unbiased estimators of all the regression coefficients except the intercept, which is often unimportant in practice. The unknown correlation parameters are estimated. We also derive a rule for testing any set of linear hypotheses about the unknown regression coefficients, and we develop confidence ellipsoids for a set of estimable functions of the regression coefficients. An index of fit is proposed for the fitted regression equation. A simulation study illustrates the results derived in this report.

20.
In this article, we study variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method for normal random errors and is empirically shown to be much more efficient and robust with respect to heavy-tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with the missing covariates, we consider inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator using the estimated selection probability has a smaller asymptotic variance than the one using the true selection probability, and is thus more efficient. Therefore, the important Horvitz-Thompson property is verified for the penalized rank estimator with missing covariates in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.
