Similar literature
20 similar documents found (search time: 15 ms)
1.
Varying-coefficient partially linear models provide a useful tool for modeling covariate effects on the response variable in regression. One key question in varying-coefficient partially linear models is the choice of model structure, that is, how to decide which covariates have a linear effect and which have a nonlinear effect. In this article, we propose a profile method for identifying the covariates with linear or nonlinear effects. Our proposed method is a penalized regression approach based on the group minimax concave penalty. Under suitable conditions, we show that the proposed method can correctly determine, with high probability, which covariates have a linear effect and which do not. The convergence rate of the linear estimator is established, as well as its asymptotic normality. The performance of the proposed method is evaluated through a simulation study, which supports our theoretical results.
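
As a rough illustration of the penalty named in entry 1, the sketch below evaluates the minimax concave penalty (MCP) in its usual textbook form and applies it to the Euclidean norm of a coefficient group. The function names, the grouping, and the tuning values `lam` and `gamma` are assumptions made here for illustration; the abstract does not say how the paper forms its groups or chooses its tuning parameters.

```python
import math


def mcp_penalty(t: float, lam: float, gamma: float) -> float:
    """Minimax concave penalty at |t|, assuming lam > 0 and gamma > 1."""
    a = abs(t)
    if a <= gamma * lam:
        # Lasso-like near zero, with the penalty rate tapering off linearly.
        return lam * a - a * a / (2.0 * gamma)
    # Constant beyond gamma * lam, so large coefficients are not shrunk further.
    return 0.5 * gamma * lam * lam


def group_mcp(coefs, lam: float, gamma: float) -> float:
    """Group MCP: apply the scalar penalty to the group's Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in coefs))
    return mcp_penalty(norm, lam, gamma)


if __name__ == "__main__":
    # Hypothetical coefficient groups and tuning values.
    print(group_mcp([0.0, 0.0], lam=1.0, gamma=3.0))  # a group that is exactly zero
    print(group_mcp([3.0, 4.0], lam=1.0, gamma=3.0))  # a large group hits the flat part
```

Because the penalty depends on a group only through its norm, a whole group of coefficients is either shrunk to zero together or retained together, which is what makes a group penalty suitable for structure selection.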

2.
Quantile regression (QR) is a natural alternative for depicting the impact of covariates on the conditional distribution of an outcome variable rather than on its mean. In this paper, we investigate Bayesian regularized QR for linear models with autoregressive errors. LASSO-type penalized priors are imposed on the regression coefficients and autoregressive parameters of the model. A Gibbs sampler is employed to draw from the full posterior distributions of the unknown parameters. Finally, the proposed procedures are illustrated by simulation studies and applied to a real data analysis of electricity consumption.
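
To make the idea behind quantile regression concrete, the sketch below implements the standard check (pinball) loss and finds the constant that minimizes it over a small sample. This is only the classical loss underlying QR, not the Bayesian procedure of entry 2; the function names and the toy data are assumptions made here.

```python
def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau: float) -> float:
    """Return the sample value minimizing the total check loss.

    A minimizer over all constants is always attained at a data point,
    so searching over the observations is sufficient.
    """
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy data with one outlier
    # With tau = 0.5 the minimizer is the sample median, unaffected by the outlier.
    print(best_constant(ys, 0.5))
```

The asymmetric weighting of positive and negative residuals is what lets the same loss target any quantile level `tau` rather than the mean.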

3.
To study the relationship between a sensitive binary response variable and a set of non-sensitive covariates, this paper develops a hidden logistic regression to analyse non-randomized response data collected via the parallel model originally proposed by Tian (2014). This is the first paper to employ logistic regression analysis in the field of non-randomized response techniques. Both the Newton–Raphson algorithm and a monotone quadratic lower bound algorithm are developed to derive the maximum likelihood estimates of the parameters of interest. In particular, the proposed logistic parallel model can be used to study the association between a sensitive binary variable and another non-sensitive binary variable via the odds ratio. Simulations are performed, and a study of data on sexual practices in the United States is used to illustrate the proposed methods.

4.
There are several procedures for fitting generalized additive models, i.e. regression models for an exponential family response where the influence of each single covariate is assumed to have an unknown, potentially non-linear shape. Simulated data are used to compare a smoothing parameter optimization approach for selection of smoothness and of covariates, a stepwise approach, a mixed model approach, and a procedure based on boosting techniques. In particular, it is investigated how the performance of the procedures is linked to the amount of information, type of response, total number of covariates, number of influential covariates, and extent of non-linearity. Measures for comparison are prediction performance, identification of influential covariates, and smoothness of fitted functions. One result is that the mixed model approach returns sparse fits with frequently over-smoothed functions, while the functions are less smooth for the boosting approach and variable selection is less strict. The other approaches are in between with respect to these measures. The boosting procedure is seen to perform very well when little information is available and/or when a large number of covariates is to be investigated. It is somewhat surprising that in scenarios with low information the fitting of a linear model, even with stepwise variable selection, has little advantage over the fitting of an additive model when the true underlying structure is linear. In cases with more information the prediction performance of all procedures is very similar. So, in difficult data situations the boosting approach can be recommended; in others the procedures can be chosen according to the aim of the analysis.

5.
The properties of a method of estimating the ratio of parameters for ordered categorical response regression models are discussed. If the link function relating the response variable to the linear combination of covariates is unknown then it is only possible to estimate the ratio of regression parameters. This ratio of parameters has a substitutability or relative importance interpretation.

The maximum likelihood estimate of the ratio of parameters, assuming a logistic function (McCullagh, 1980), is found to have very small bias for a wide variety of true link functions. Further, it is shown using Monte Carlo simulations that this maximum likelihood estimate has good coverage properties, even if the link function is incorrectly specified. It is demonstrated that combining adjacent categories to make the response binary can result in an analysis which is appreciably less efficient. The size of the efficiency loss depends on, among other factors, the marginal distribution in the ordered categories.

6.
The class of beta regression models proposed by Ferrari and Cribari-Neto [Beta regression for modelling rates and proportions, Journal of Applied Statistics 31 (2004), pp. 799–815] is useful for modelling data that assume values in the standard unit interval (0, 1). The dependent variable relates to a linear predictor that includes regressors and unknown parameters through a link function. The model is also indexed by a precision parameter, which is typically taken to be constant for all observations. Some authors have used, however, variable dispersion beta regression models, i.e., models that include a regression submodel for the precision parameter. In this paper, we show how to perform testing inference on the parameters that index the mean submodel without having to model the data precision. This strategy is useful as it is typically harder to model dispersion effects than mean effects. The proposed inference procedure is accurate even under variable dispersion. We present the results of extensive Monte Carlo simulations where our testing strategy is contrasted to that in which the practitioner models the underlying dispersion and then performs testing inference. An empirical application that uses real (not simulated) data is also presented and discussed.

7.
Joint models for longitudinal data and time-to-event data have recently received considerable attention in clinical and epidemiologic studies. Our interest is in modeling the relationship between event time outcomes and internal time-dependent covariates. In practice, the longitudinal responses often show nonlinear and fluctuating curves. Therefore, the main aim of this paper is to use penalized splines with a truncated polynomial basis to parameterize the nonlinear longitudinal process. Then, the linear mixed-effects model is applied to subject-specific curves and to control the smoothing. The association between the dropout process and longitudinal outcomes is modeled through a proportional hazard model. Two types of baseline risk functions are considered, namely a Gompertz distribution and a piecewise constant model. The resulting models are referred to as penalized spline joint models, an extension of the standard joint models. The expectation conditional maximization (ECM) algorithm is applied to estimate the parameters in the proposed models. To validate the proposed algorithm, extensive simulation studies were implemented, followed by a case study. In summary, the penalized spline joint models provide a new approach to joint modeling that improves on the existing standard joint models.

8.
Efficient statistical inference on nonignorable missing data is a challenging problem. This paper proposes a new estimation procedure based on composite quantile regression (CQR) for linear regression models with nonignorable missing data, which is applicable even with high-dimensional covariates. A parametric model is assumed for the response probability, which is estimated by the empirical likelihood approach. Local identifiability of the proposed strategy is guaranteed on the basis of an instrumental variable approach. A set of data-based adaptive weights constructed via an empirical likelihood method is used to weight the CQR functions. The proposed method is resistant to heavy-tailed errors or outliers in the response. An adaptive penalisation method for variable selection is proposed to achieve sparsity with high-dimensional covariates. Limiting distributions of the proposed estimators are derived. Simulation studies are conducted to investigate the finite sample performance of the proposed methodologies. An application to the ACTG 175 data is presented.
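
For readers unfamiliar with composite quantile regression, the sketch below writes out the plain, unweighted CQR objective for a single covariate: one shared slope, one intercept per quantile level, and equally spaced levels k/(K+1). The adaptive empirical-likelihood weights and the missing-data handling described in entry 8 are deliberately omitted, and all names and the toy data are assumptions made here.

```python
def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss used by quantile regression."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cqr_objective(xs, ys, slope: float, intercepts) -> float:
    """Unweighted CQR objective with K = len(intercepts) quantile levels.

    The k-th level is tau_k = k / (K + 1); every level shares the same slope
    but has its own intercept.
    """
    K = len(intercepts)
    total = 0.0
    for k, b in enumerate(intercepts, start=1):
        tau = k / (K + 1)
        total += sum(check_loss(y - b - slope * x, tau) for x, y in zip(xs, ys))
    return total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    ys = [0.0, 2.0, 4.0]  # toy data lying exactly on y = 2x
    print(cqr_objective(xs, ys, slope=2.0, intercepts=[0.0, 0.0, 0.0]))
    print(cqr_objective(xs, ys, slope=1.0, intercepts=[0.0, 0.0, 0.0]))
```

Pooling several quantile levels around a common slope is what gives CQR its robustness to heavy-tailed errors while keeping a single set of regression coefficients.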

9.
Least-squares regression is not appropriate when the response variable is circular, and can lead to erroneous results. The reason for this is that the squared difference is not an appropriate measure of distance on the circle. In this paper, a circular analog to least-squares regression is presented for predicting a circular response variable by another circular variable and a set of linear covariates. An alternative maximum-likelihood formulation yields the same regression parameter estimates. Under the maximum-likelihood model, asymptotic standard errors of the parameter estimates are obtained. As an example, the regression model is used to model data from a marine biology study.
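
The point about squared differences on the circle can be checked numerically. The sketch below compares the ordinary squared difference with one common circular distance, 1 − cos(a − b), for two angles that sit close together on either side of 0. Whether this is exactly the distance used in entry 9 is an assumption; the abstract does not state its formula.

```python
import math


def squared_difference(a: float, b: float) -> float:
    """Ordinary squared difference, which ignores wrap-around at 2*pi."""
    return (a - b) ** 2


def circular_distance(a: float, b: float) -> float:
    """A common circular distance: 0 for equal directions, 2 for opposite ones."""
    return 1.0 - math.cos(a - b)


if __name__ == "__main__":
    a = 0.1                # just past 0 radians
    b = 2 * math.pi - 0.1  # just before a full turn, so only 0.2 rad from a
    print(squared_difference(a, b))  # large, although the directions nearly coincide
    print(circular_distance(a, b))   # small, reflecting the true angular separation
```

Because the cosine is periodic, the circular distance is unchanged when either angle is shifted by a full turn, which is the property a squared difference lacks.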

10.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.

11.
This paper deals with the problem of predicting a real-valued response variable using explanatory variables that contain both a multivariate random variable and a random curve. The proposed functional partial linear single-index model treats the multivariate random variable as the linear part and the random curve as the functional single-index part. To estimate the non-parametric link function, the functional single-index and the parameters in the linear part, a two-stage estimation procedure is proposed. Compared with existing semi-parametric methods, the proposed approach requires no initial estimation and no iteration. Asymptotic properties are established for both the parameters in the linear part and the functional single-index. The convergence rate for the non-parametric link function is also given. In addition, asymptotic normality of the error variance is obtained, which facilitates the construction of confidence regions and hypothesis tests for the unknown parameter. Numerical experiments, including simulation studies and a real-data analysis, are conducted to evaluate the empirical performance of the proposed method.

12.
Conventionally, a ridge parameter is estimated as a function of regression parameters based on ordinary least squares. In this article, we propose an iterative procedure instead of the one-step or conventional ridge method. Additionally, we construct an indicator that measures the potential degree of improvement in mean squared error when ridge estimates are employed. Simulations show that our methods are appropriate for a wide class of nonlinear models, including generalized linear models and proportional hazards (PH) regressions. The method is applied to a PH regression with highly collinear covariates in a cancer recurrence study.

13.
Semiparametric regression models with multiple covariates are commonly encountered. When there are covariates not associated with the response variable, variable selection may lead to sparser models, more lucid interpretations and more accurate estimation. In this study, we adopt a sieve approach for the estimation of nonparametric covariate effects in semiparametric regression models. We adopt a two-step iterated penalization approach for variable selection. In the first step, a mixture of the Lasso and group Lasso penalties is employed to conduct the first-round variable selection and obtain the initial estimate. In the second step, a mixture of the weighted Lasso and weighted group Lasso penalties, with weights constructed using the initial estimate, is employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when the number of unknown parameters diverges with the sample size. Numerical studies, including simulation and analysis of a diabetes dataset, show satisfactory performance of the proposed approach.

14.
A regression model with skew-normal errors provides a useful extension of ordinary normal regression models when the data set under consideration involves asymmetric outcomes. Variable selection is an important issue in all regression analyses, and in this paper we investigate simultaneous variable selection in joint location and scale models of the skew-normal distribution. We propose a unified penalized likelihood method which can simultaneously select significant variables in the location and scale models. Furthermore, the proposed variable selection method can simultaneously perform parameter estimation and variable selection in the location and scale models. With appropriate selection of the tuning parameters, we establish the consistency and the oracle property of the regularized estimators. Simulation studies and a real example are used to illustrate the proposed methodologies.

15.
When variable selection with stepwise regression and model fitting are conducted on the same data set, competition for inclusion in the model induces a selection bias in coefficient estimators away from zero. In proportional hazards regression with right-censored data, selection bias inflates the absolute value of the parameter estimates of selected variables, while the omission of other variables may shrink coefficients toward zero. This paper explores the extent of the bias in parameter estimates from stepwise proportional hazards regression and proposes a bootstrap method, similar to those proposed by Miller (Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, 2002) for linear regression, to correct for selection bias. We also use bootstrap methods to estimate the standard error of the adjusted estimators. Simulation results show that substantial biases could be present in uncorrected stepwise estimators and, for binary covariates, could exceed 250% of the true parameter value. The simulations also show that the conditional mean of the proposed bootstrap bias-corrected parameter estimator, given that a variable is selected, is moved closer to the unconditional mean of the standard partial likelihood estimator in the chosen model, and to the population value of the parameter. We also explore the effect of the adjustment on estimates of log relative risk, given the values of the covariates in a selected model. The proposed method is illustrated with data sets in primary biliary cirrhosis and in multiple myeloma from the Eastern Cooperative Oncology Group.

16.
While most regression models focus on explaining distributional aspects of one single response variable alone, interest in modern statistical applications has recently shifted towards simultaneously studying multiple response variables as well as their dependence structure. Copula-based regression models are a particularly useful tool for pursuing such an analysis, since they enable the separation of the marginal response distributions and the dependence structure summarised in a specific copula model. However, so far copula-based regression models have mostly relied on two-step approaches where the marginal distributions are determined first, whereas the copula structure is studied in a second step after plugging in the estimated marginal distributions. Moreover, the parameters of the copula are mostly treated as constants not related to covariates, and most regression specifications for the marginals are restricted to purely linear predictors. We therefore propose simultaneous Bayesian inference for both the marginal distributions and the copula using computationally efficient Markov chain Monte Carlo simulation techniques. In addition, we replace the commonly used linear predictor by a generic structured additive predictor comprising, for example, nonlinear effects of continuous covariates, spatial effects or random effects, and furthermore allow the copula parameters to be covariate-dependent. To facilitate Bayesian inference, we construct proposal densities for a Metropolis–Hastings algorithm relying on quadratic approximations to the full conditionals of regression coefficients, avoiding manual tuning. The performance of the resulting Bayesian estimates is evaluated in simulations comparing our approach with penalised likelihood inference, studying the choice of a specific copula model based on the deviance information criterion, and comparing a simultaneous approach with a two-step procedure. Furthermore, the flexibility of Bayesian conditional copula regression models is illustrated in two applications on childhood undernutrition and macroecology.

17.
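Partially linear models are an important class of semiparametric regression models. Because they contain both a parametric and a nonparametric component, they are more flexible and more interpretable than conventional linear models. This article studies statistical inference for fixed-effects partially linear panel data models with locally stationary covariates. We first propose a two-stage estimation method to obtain estimates of the unknown parameters and the nonparametric function in the model, and establish the asymptotic properties of the estimators. We then use the invariance principle to construct a uniform confidence band for the nonparametric function. Finally, numerical simulation studies and a real data analysis verify the effectiveness of the method.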

18.
In this article, we study variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method for normal random errors, and is empirically shown to be much more efficient and robust with respect to heavy-tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with the missing covariates, we consider inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator using the estimated selection probability has a smaller asymptotic variance than that using the true selection probability, and is thus more efficient. Therefore, the important Horvitz-Thompson property is verified for the penalized rank estimator with missing covariates in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.

19.
In survey sampling, policymaking regarding the allocation of resources to subgroups (called small areas) or the determination of subgroups with specific properties in a population should be based on reliable estimates. Information, however, is often collected at a different scale than that of these subgroups; hence, the estimation can only be obtained on finer scale data. Parametric mixed models are commonly used in small-area estimation. The relationship between predictors and response, however, may not be linear in some real situations. Recently, small-area estimation using a generalised linear mixed model (GLMM) with a penalised spline (P-spline) regression model, for the fixed part of the model, has been proposed to analyse cross-sectional responses, both normal and non-normal. However, there are many situations in which the responses in small areas are serially dependent over time. Such a situation is exemplified by a data set on the annual number of visits to physicians by patients seeking treatment for asthma, in different areas of Manitoba, Canada. In cases where covariates that can possibly predict physician visits by asthma patients (e.g. age and genetic and environmental factors) may not have a linear relationship with the response, new models for analysing such data sets are required. In the current work, using both time-series and cross-sectional data methods, we propose P-spline regression models for small-area estimation under GLMMs. Our proposed model covers both normal and non-normal responses. In particular, the empirical best predictors of small-area parameters and their corresponding prediction intervals are studied with the maximum likelihood estimation approach being used to estimate the model parameters. The performance of the proposed approach is evaluated using some simulations and also by analysing two real data sets (precipitation and asthma).

20.
For clustering mixed categorical and continuous data, Lawrence and Krzanowski (1996) proposed a finite mixture model in which component densities conform to the location model. In the graphical models literature the location model is known as the homogeneous Conditional Gaussian model. In this paper it is shown that their model is not identifiable without imposing additional restrictions. Specifically, for g groups and m locations, $(g!)^{m-1}$ distinct sets of parameter values (not including permutations of the group mixing parameters) produce the same likelihood function. Excessive shrinkage of parameter estimates in a simulation experiment reported by Lawrence and Krzanowski (1996) is shown to be an artifact of the model's non-identifiability. Identifiable finite mixture models can be obtained by imposing restrictions on the conditional means of the continuous variables. These new identified models are assessed in simulation experiments. The conditional mean structure of the continuous variables in the restricted location mixture models is similar to that in the underlying variable mixture models proposed by Everitt (1988), but the restricted location mixture models are more computationally tractable.
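
To give a feel for how quickly the non-identifiability described in entry 20 grows, the short sketch below evaluates the count of likelihood-equivalent parameter sets, g! raised to the power m − 1, for a few small cases. The function name and the example values of g and m are assumptions chosen here for illustration.

```python
import math


def equivalent_parameter_sets(g: int, m: int) -> int:
    """Number of distinct parameter sets giving the same likelihood,
    for g groups and m locations: (g!) ** (m - 1)."""
    if g < 1 or m < 1:
        raise ValueError("g and m must be positive integers")
    return math.factorial(g) ** (m - 1)


if __name__ == "__main__":
    print(equivalent_parameter_sets(2, 3))  # 4
    print(equivalent_parameter_sets(3, 2))  # 6
    print(equivalent_parameter_sets(3, 4))  # 216
```

With a single location (m = 1) the count is 1, so the ambiguity arises only once there are several locations.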
