Similar literature
 20 similar articles found (search time: 93 ms)
1.
Expectile regression [Newey W, Powell J. Asymmetric least squares estimation and testing. Econometrica. 1987;55:819–847] is a useful tool for estimating the conditional expectiles of a response variable given a set of covariates. Expectile regression at the 50% level is the classical conditional mean regression. In many real applications, having multiple expectiles at different levels provides a more complete picture of the conditional distribution of the response variable. The multiple linear expectile regression model has been well studied [Newey W, Powell J. Asymmetric least squares estimation and testing. Econometrica. 1987;55:819–847; Efron B. Regression percentiles using asymmetric squared error loss. Stat Sin. 1991;1:93–125], but it can be too restrictive for many real applications. In this paper, we derive a regression tree-based gradient boosting estimator for nonparametric multiple expectile regression. The new estimator, referred to as ER-Boost, is implemented in an R package erboost publicly available at http://cran.r-project.org/web/packages/erboost/index.html. We use two homoscedastic/heteroscedastic random-function-generator models in simulation to show the high predictive accuracy of ER-Boost. As an application, we apply ER-Boost to analyse North Carolina County crime data. From the nonparametric expectile regression analysis of this dataset, we draw several interesting conclusions that are consistent with the previous study using the economic model of crime. This real data example also provides a good demonstration of some useful features of ER-Boost, such as its ability to handle different types of covariates and its model interpretation tools.
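
The asymmetric least squares idea behind expectiles can be illustrated with a minimal Python sketch. It computes an unconditional sample expectile only, not the ER-Boost estimator; the function name `sample_expectile` and its parameters are hypothetical. The τ-expectile minimises a squared loss that weights residuals above the estimate by τ and those below by 1 − τ, so it can be found by repeatedly taking a weighted mean.

```python
def sample_expectile(ys, tau, tol=1e-12, max_iter=1000):
    """Return the tau-expectile of the sample ys, for 0 < tau < 1.

    Hypothetical helper for illustration: iterates the weighted-mean
    fixed point of the asymmetric squared-error loss.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(ys) / len(ys)  # start from the mean, which is the 0.5-expectile
    for _ in range(max_iter):
        # Weight tau for observations above the current estimate,
        # 1 - tau for observations at or below it.
        weights = [tau if y > e else 1.0 - tau for y in ys]
        new_e = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e
```

With `tau = 0.5` all weights are equal and the function returns the sample mean, which matches the statement that expectile regression at the 50% level is conditional mean regression.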

2.
In this paper, we investigate nonparametric estimation of the conditional density of a scalar response variable given a random variable taking values in a separable Hilbert space. We establish, under general conditions, the uniform almost complete convergence rates and the asymptotic normality of the kernel estimator of the conditional density, based on the single-index structure, when the variables satisfy a strong mixing dependence condition. The asymptotic \((1-\zeta)\) confidence intervals of the conditional density function are given, for \(0 < \zeta < 1\). We further demonstrate the impact of this functional parameter on the conditional mode estimate. A simulation study is also presented. Finally, the estimation of the functional index via the pseudo-maximum likelihood method is discussed, but not tackled.

3.
We are concerned with cumulative regression models for an ordered categorical response variable Y. We propose two methods to build partial residuals from regression on a subset Z1 of covariates Z, which take into account the ordinal character of the response. The first method makes use of a multivariate GLM-representation of the model and produces residual measures for diagnostic purposes. The second uses a latent continuous variable model and yields new (adjusted) ordinal data Y*. Both methods are illustrated by a data set from forestry.

4.
Rubin (1976; Inference and missing data. Biometrika 63(3):581–592) derived general conditions under which inferences that ignore missing data are valid. These conditions are sufficient but not generally necessary, and therefore may be relaxed in some special cases. We consider here the case of frequentist estimation of a conditional cdf subject to missing outcomes. We partition a set of data into outcome, conditioning, and latent variables, all of which potentially affect the probability of a missing response. We describe sufficient conditions under which a complete-case estimate of the conditional cdf of the outcome given the conditioning variable is unbiased. We use simulations on a renal transplant data set (Dienemann et al.) to illustrate the implications of these results.

5.
In this article, we investigate the nonparametric estimation of the conditional density of a scalar response variable Y, given the explanatory variable X taking values in a Hilbert space, when the observations are linked with a single-index structure. The goal of this article is to present asymptotic results, such as pointwise almost complete consistency and uniform almost complete convergence with rates, for the kernel estimator of the conditional density in the setting of α-mixing functional data. These results extend the i.i.d. case in Attaoui et al. (2011; A note on the conditional density estimate in the single functional index model. Statist. Probab. Lett. 81(1):45–53) to the dependence setting. As an application, the convergence rate of the kernel estimator of the conditional mode is also obtained.

6.
This article focuses on the conditional density of a scalar response variable given a random variable taking values in a semimetric space. The local linear estimators of the conditional density and its derivative are considered. It is assumed that the observations form a stationary α-mixing sequence. Under some regularity conditions, the joint asymptotic normality of the estimators of the conditional density and its derivative is established. The result confirms the prospect in Rachdi et al. (2014; Theoretical and practical aspects of the quadratic error in the local linear estimation of the conditional density for functional data. Computational Statistics and Data Analysis 73:53–68) and can be applied in time-series analysis to make predictions and build confidence intervals. The finite-sample behavior of the estimator is investigated by simulations as well.

7.
8.
Nonparametric estimates of the conditional distribution of a response variable given a covariate are important for data exploration purposes. In this article, we propose a nonparametric estimator of the conditional distribution function in the case where the response variable is subject to interval censoring and double truncation. Using the approach of Dehghan and Duchesne (2011), the proposed method consists in adding weights that depend on the covariate value in the self-consistency equation of Turnbull (1976), which results in a nonparametric estimator. We demonstrate by simulation that the estimator, bootstrap variance estimation and bandwidth selection all perform well in finite samples.

9.
Copula models have become increasingly popular for modelling the dependence structure in multivariate survival data. The two-parameter Archimedean family of Power Variance Function (PVF) copulas includes the Clayton, Positive Stable (Gumbel) and Inverse Gaussian copulas as special or limiting cases, and thus offers a unified approach to fitting these important copulas. Two-stage frequentist procedures for estimating the marginal distributions and the PVF copula have been suggested by Andersen (Lifetime Data Anal 11:333–350, 2005), Massonnet et al. (J Stat Plann Inference 139(11):3865–3877, 2009) and Prenen et al. (J R Stat Soc Ser B 79(2):483–505, 2017). These procedures first estimate the marginal distributions and then, conditional on these, estimate the PVF copula parameters in a second step. Here we explore a one-stage Bayesian approach that simultaneously estimates the marginal and the PVF copula parameters. For the marginal distributions, we consider both parametric and semiparametric models. We propose a new method to simulate uniform pairs with PVF dependence structure based on conditional sampling for copulas and on numerical approximation to solve a target equation. In a simulation study, small sample properties of the Bayesian estimators are explored. We illustrate the usefulness of the methodology using data on times to appendectomy for adult twins in the Australian NH&MRC Twin registry. Parameters of the marginal distributions and the PVF copula are simultaneously estimated in a parametric as well as a semiparametric approach, where the marginal distributions are modelled using Weibull and piecewise exponential distributions, respectively.
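
Conditional sampling for copulas can be sketched in Python for the Clayton copula, one of the special cases named above. This is an illustration under assumptions, not the paper's PVF sampler: for Clayton the target equation C(v | u) = w has a closed-form solution, whereas the abstract indicates the general PVF case needs a numerical approximation. All function names here are hypothetical.

```python
import random

def clayton_conditional_cdf(v, u, theta):
    """C(v | u) = dC(u, v)/du for the Clayton copula with theta > 0."""
    inner = u ** (-theta) + v ** (-theta) - 1.0
    return u ** (-theta - 1.0) * inner ** (-1.0 / theta - 1.0)

def clayton_conditional_inverse(u, w, theta):
    """Solve C(v | u) = w for v (closed form for the Clayton copula)."""
    scaled = (w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta)
    return (scaled + 1.0) ** (-1.0 / theta)

def sample_clayton_pairs(n, theta, seed=0):
    """Draw n pairs (u, v) with uniform margins and Clayton dependence."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], which avoids raising 0 to a negative power.
        u = 1.0 - rng.random()
        w = 1.0 - rng.random()
        pairs.append((u, clayton_conditional_inverse(u, w, theta)))
    return pairs
```

The two-step pattern (draw u and an independent uniform w, then invert the conditional distribution at w) is the general conditional sampling recipe; only the inversion step changes from one copula family to another.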

10.
The randomized response (RR) procedures for estimating the proportion (π) of a population belonging to a sensitive or stigmatized group ask each respondent to report a response by randomly transforming his/her true attribute into one of several response categories. In this paper, we present a common framework for discussing various RR surveys of dichotomous populations with polychotomous responses. The unified approach is focused on the substantive issues relating to respondents’ privacy and statistical efficiency and is helpful for fair comparison of various procedures. We describe a general technique for constructing unbiased estimators of π based on arbitrary RR procedures, from unbiased estimators based on an open survey with the same sampling design. The technique works well for any sampling design p(s) and also for variance estimation. We develop an approach for comparing RR procedures, taking both respondents’ protection and statistical efficiency into account. For any given RR procedure with three or more response categories, we present a method for designing an RR procedure with a binary response variable which provides the same respondents’ protection and at least as much statistical information. This result suggests that RR surveys of dichotomous populations should use only binary response variables.
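
As a concrete instance of a binary-response RR procedure, the sketch below uses Warner's classic design, which is an assumption chosen for illustration and not necessarily any procedure studied in the paper. Each respondent answers the sensitive question with probability p and its negation otherwise, so P(yes) = pπ + (1 − p)(1 − π), and π is recovered by inverting this relation. The function name and arguments are hypothetical, and simple random sampling is assumed.

```python
def warner_estimate(yes_count, n, p):
    """Estimate pi from a Warner-design survey (illustrative sketch).

    yes_count: number of "yes" answers among n respondents.
    p: probability that the randomizing device selects the direct question.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if p == 0.5:
        # With p = 0.5, P(yes) = 0.5 whatever pi is, so pi is not identifiable.
        raise ValueError("p must differ from 0.5")
    yes_rate = yes_count / n  # unbiased for P(yes) under simple random sampling
    return (yes_rate - (1.0 - p)) / (2.0 * p - 1.0)
```

Because the estimator is a linear function of the observed "yes" rate, it inherits unbiasedness from that rate; in small samples it can fall outside [0, 1].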

11.
Given a stationary multidimensional spatial process $\{ Z_{\mathbf{i}} = (X_{\mathbf{i}}, Y_{\mathbf{i}}) \in \mathbb{R}^d \times \mathbb{R},\ \mathbf{i} \in \mathbb{Z}^{N} \}$, we investigate a kernel estimate of the spatial conditional mode function of the response variable $Y_{\mathbf{i}}$ given the explanatory variable $X_{\mathbf{i}}$. Consistency in $L^p$ norm and strong convergence of the kernel estimate are obtained when the sample considered is an $\alpha$-mixing sequence. An application to real data is given in order to illustrate the behavior of our methodology.

12.
13.
Cox's seminal 1972 paper on regression methods for possibly censored failure time data popularized the use of time to an event as a primary response in prospective studies. But one key assumption of this and other regression methods is that observations are independent of one another. In many problems, failure times are clustered into small groups where outcomes within a group are correlated. Examples include failure times for two eyes from one person or for members of the same family. This paper presents a survey of models for multivariate failure time data. Two distinct classes of models are considered: frailty and marginal models. In a frailty model, the correlation is assumed to derive from latent variables (frailties) common to observations from the same cluster. Regression models are formulated for the conditional failure time distribution given the frailties. Alternatively, marginal models describe the marginal failure time distribution of each response while separately modelling the association among responses from the same cluster. We focus on recent extensions of the proportional hazards model for multivariate failure time data. Model formulation, parameter interpretation and estimation procedures are considered.

14.
Latent class analysis (LCA) has been found to have important applications in social and behavioral sciences for modeling categorical response variables, and nonresponse is typical when collecting data. In this study, the nonresponse mainly included “contingency questions” and real “missing data.” The primary objective of this research was to evaluate the effects of some potential factors on model selection indices in LCA with nonresponse data.

We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate, and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size, and the number of items are also significant. Our simulation results indicate that the impact of missing data and contingency questions can be mitigated by increasing the sample size or the number of items.


15.
In this paper we consider a nonparametric regression model in which the conditional variance function is assumed to vary smoothly with the predictor. We offer an easily implemented and fully Bayesian approach that involves the Markov chain Monte Carlo sampling of standard distributions. This method is based on a technique utilized by Kim, Shephard, and Chib (in Rev. Econ. Stud. 65:361–393, 1998) for the stochastic volatility model. Although the (parametric or nonparametric) heteroscedastic regression and stochastic volatility models are quite different, they share the same structure as far as the estimation of the conditional variance function is concerned, a point that has been previously overlooked. Our method can be employed in the frequentist context and in Bayesian models more general than those considered in this paper. Illustrations of the method are provided.

16.
In a regression model with univariate censored responses, a new estimator of the joint distribution function of the covariates and response is proposed, under the assumption that the response and the censoring variable are conditionally independent given the covariates. This estimator is based on the conditional Kaplan–Meier estimator of Beran (1981; Nonparametric regression with randomly censored survival data. Technical Report, University of California, Berkeley), and happens to be an extension of the multivariate empirical distribution function used in the uncensored case. We derive asymptotic i.i.d. representations for the integrals with respect to the measure defined by this estimated distribution function. These representations hold even in the case where the covariates are multidimensional, under some additional assumption on the censoring. Applications to censored regression and to density estimation are considered.

17.
A main goal of regression is to derive statistical conclusions on the conditional distribution of the output variable Y given the input values x. Two of the most important characteristics of a single distribution are location and scale. Regularised kernel methods (RKMs), also called support vector machines in a wide sense, are well established for estimating location functions such as the conditional median or the conditional mean. We investigate the estimation of scale functions by RKMs when the conditional median is also unknown. Estimation of scale functions is important, e.g. to estimate the volatility in finance. We consider the median absolute deviation (MAD) and the interquantile range as measures of scale. Our main result shows the consistency of MAD-type RKMs.
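
The MAD scale measure mentioned above can be shown with a minimal Python sketch. It computes the unconditional sample MAD only, as an assumption-laden illustration of the quantity involved; it is not a regularised kernel method and does not estimate a conditional scale function. The function name is hypothetical.

```python
from statistics import median

def median_absolute_deviation(values):
    """Return the median of absolute deviations from the sample median."""
    data = list(values)
    centre = median(data)  # location estimate, itself unknown in advance
    return median(abs(v - centre) for v in data)
```

The two nested medians mirror the difficulty noted in the abstract: the scale estimate depends on a location estimate that must be obtained first. For the sample `[1, 2, 3, 4, 100]` the median is 3 and the MAD is 1, so the single outlier has no effect on the scale estimate.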

18.
Simple nonparametric estimates of the conditional distribution of a response variable given a covariate are often useful for data exploration purposes or to help with the specification or validation of a parametric or semi-parametric regression model. In this paper we propose such an estimator in the case where the response variable is interval-censored and the covariate is continuous. Our approach consists in adding weights that depend on the covariate value in the self-consistency equation proposed by Turnbull (J R Stat Soc Ser B 38:290–295, 1976), which results in an estimator that is no more difficult to implement than Turnbull’s estimator itself. We show the convergence of our algorithm and that our estimator reduces to the generalized Kaplan–Meier estimator (Beran, Nonparametric regression with randomly censored survival data, 1981) when the data are either complete or right-censored. We demonstrate by simulation that the estimator, bootstrap variance estimation and bandwidth selection (by rule of thumb or cross-validation) all perform well in finite samples. We illustrate the method by applying it to a dataset from a study on the incidence of HIV in a group of female sex workers from Kinshasa.

19.
We propose a new set of test statistics to examine the association between two ordinal categorical variables X and Y after adjusting for continuous and/or categorical covariates Z. Our approach first fits multinomial (e.g., proportional odds) models of X and Y, separately, on Z. For each subject, we then compute the conditional distributions of X and Y given Z. If there is no relationship between X and Y after adjusting for Z, then these conditional distributions will be independent, and the observed value of (X, Y) for a subject is expected to follow the product distribution of these conditional distributions. We consider two simple ways of testing the null of conditional independence, both of which treat X and Y equally, in the sense that they do not require specifying an outcome and a predictor variable. The first approach adds these product distributions across all subjects to obtain the expected distribution of (X, Y) under the null and then contrasts it with the observed unconditional distribution of (X, Y). Our second approach computes "residuals" from the two multinomial models and then tests for correlation between these residuals; we define a new individual-level residual for models with ordinal outcomes. We present methods for computing p-values using either the empirical or asymptotic distributions of our test statistics. Through simulations, we demonstrate that our test statistics perform well in terms of power and Type I error rate when compared to proportional odds models which treat X as either a continuous or categorical predictor. We apply our methods to data from a study of visual impairment in children and to a study of cervical abnormalities in human immunodeficiency virus (HIV)-infected women. Supplemental materials for the article are available online.

20.
In many experiments where pre-treatment and post-treatment measurements are taken, investigators wish to determine if there is a difference between two treatment groups. For this type of data, the post-treatment variable is used as the primary comparison variable and the pre-treatment variable is used as a covariate. Although most of the discussion in this paper is written with the pre-treatment variable as the covariate, the results are applicable to other choices of a covariate. Tests based on residuals have been proposed as alternatives to the usual covariance methods. Our objective is to investigate how the powers of these tests are affected when the conditional variance of the post-treatment variable depends on the magnitude of the pre-treatment variable. In particular, we investigate two cases. (1) The conditional variance of the post-treatment variable gradually increases as the magnitude of the pre-treatment variable increases. (In many biological models this is the case.) (2) The conditional variance of the post-treatment variable is dependent upon natural or imposed subgroups contained within the pre-treatment variable. Power comparisons are made using Monte Carlo techniques.
