Similar Articles
20 similar articles found.
1.
We are interested in estimating prediction error for a classification model built on high-dimensional genomic data when the number of genes ($p$) greatly exceeds the number of subjects ($n$). We examine a distance argument supporting the conventional 0.632+ bootstrap proposed for the $n > p$ scenario, modify it for the $n < p$ situation, and develop learning curves to describe how the true prediction error varies with the number of subjects in the training set. The curves are then applied to define adjusted resampling estimates of the prediction error that balance bias and variability. The adjusted resampling methods are proposed as counterparts of the 0.632+ bootstrap when $n < p$, and are found to improve on the 0.632+ bootstrap and other existing methods in the microarray setting when the sample size is small and there is some level of differential expression. The Canadian Journal of Statistics 41: 133–150; 2013 © 2012 Statistical Society of Canada
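For orientation, here is a minimal sketch of the conventional 0.632+ bootstrap of Efron & Tibshirani (1997) that the paper takes as its baseline; the learning-curve adjustments proposed for $n < p$ are not reproduced. The `fit_predict` interface is a hypothetical classifier wrapper, not anything from the paper.

```python
import numpy as np

def err_632plus(fit_predict, X, y, B=50, seed=0):
    """Conventional 0.632+ bootstrap error (Efron & Tibshirani, 1997).
    fit_predict(X_train, y_train, X_test) -> predicted labels (assumed
    interface). A sketch of the n > p baseline the paper adjusts."""
    rng = np.random.default_rng(seed)
    n = len(y)
    err_bar = np.mean(fit_predict(X, y, X) != y)       # resubstitution error
    errs = []                                          # leave-one-out bootstrap
    for _ in range(B):
        idx = rng.integers(0, n, n)                    # bootstrap resample
        out = np.setdiff1d(np.arange(n), idx)          # out-of-bag subjects
        if out.size:
            errs.append(np.mean(fit_predict(X[idx], y[idx], X[out]) != y[out]))
    err1 = np.mean(errs)
    # no-information error rate: gamma = sum_k p_k (1 - q_k)
    yhat = fit_predict(X, y, X)
    classes = np.unique(y)
    p = np.array([np.mean(y == c) for c in classes])
    q = np.array([np.mean(yhat == c) for c in classes])
    gamma = np.sum(p * (1 - q))
    err1p = min(err1, gamma)
    # relative overfitting rate R in [0, 1] and the 0.632+ weight
    if gamma > err_bar and err1p > err_bar:
        R = (err1p - err_bar) / (gamma - err_bar)
    else:
        R = 0.0
    w = 0.632 / (1 - 0.368 * R)
    return (1 - w) * err_bar + w * err1p
```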

2.
In a missing-data setting, we have a sample in which a vector of explanatory variables ${\bf x}_i$ is observed for every subject $i$, while scalar responses $y_i$ are missing by happenstance for some individuals. In this work we propose robust estimators of the distribution of the responses assuming missing at random (MAR) data, under a semiparametric regression model. Our approach allows the consistent estimation of any weakly continuous functional of the responses' distribution. In particular, strongly consistent estimators of any continuous location functional, such as the median, L-functionals and M-functionals, are proposed. A robust fit for the regression model combined with the robust properties of the location functional gives rise to a robust recipe for estimating the location parameter. Robustness is quantified through the breakdown point of the proposed procedure. The asymptotic distribution of the location estimators is also derived. The proofs of the theorems are presented in Supplementary Material available online. The Canadian Journal of Statistics 41: 111–132; 2013 © 2012 Statistical Society of Canada

3.
We consider the maximum likelihood estimator $\hat{F}_n$ of a distribution function in a class of deconvolution models where the known density of the noise variable is of bounded variation. This class of noise densities contains, in particular, bounded decreasing densities. The estimator $\hat{F}_n$ is defined, characterized in terms of Fenchel optimality conditions, and computed. Under appropriate conditions, various consistency results for $\hat{F}_n$ are derived, including uniform strong consistency. The Canadian Journal of Statistics 41: 98–110; 2013 © 2012 Statistical Society of Canada

4.
The class $G^{\rho,\lambda}$ of weighted log-rank tests proposed by Fleming & Harrington [Fleming & Harrington (1991) Counting Processes and Survival Analysis, Wiley, New York] has been widely used in survival analysis and is nowadays the established method for comparing $k$ survival functions nonparametrically from right-censored survival data. This paper extends the $G^{\rho,\lambda}$ class to interval-censored data. We first introduce a new general class of rank-based tests, then show the analogy to the proposal of Fleming & Harrington. The asymptotic behaviour of the proposed tests is derived using both an observed Fisher information approach and a permutation approach. To make this family of tests interpretable and useful for practitioners, we explain how to interpret different choices of weights, and we apply the tests to data from a cohort of intravenous drug users at risk for HIV infection. The Canadian Journal of Statistics 40: 501–516; 2012 © 2012 Statistical Society of Canada
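As a point of reference, the classical right-censored two-sample $G^{\rho,\lambda}$ statistic uses the weight $S(t-)^{\rho}\,(1-S(t-))^{\lambda}$ with $S$ the pooled Kaplan–Meier estimate. A sketch follows; the paper's interval-censored extension is not implemented, and the 0/1 coding of `group` is an assumption of this sketch.

```python
import numpy as np

def fh_weighted_logrank(time, event, group, rho=0.0, lam=0.0):
    """Two-sample Fleming-Harrington G^{rho,lambda} statistic for
    right-censored data (sketch). `event` is 1 for an observed failure,
    `group` is coded 0/1."""
    S, num, var = 1.0, 0.0, 0.0
    for t in np.unique(time[event == 1]):          # event times, ascending
        at_risk = time >= t
        n_risk = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = S**rho * (1 - S)**lam                  # weight uses left limit S(t-)
        num += w * (d1 - d * n1 / n_risk)          # observed minus expected
        if n_risk > 1:
            var += (w**2 * d * (n1 / n_risk) * (1 - n1 / n_risk)
                    * (n_risk - d) / (n_risk - 1))
        S *= 1 - d / n_risk                        # Kaplan-Meier update
    return num / np.sqrt(var)                      # ~ N(0,1) under H0
```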

5.
We study estimation and feature selection problems in mixture-of-experts models. An $l_2$-penalized maximum likelihood estimator is proposed as an alternative to the ordinary maximum likelihood estimator. The estimator is particularly advantageous when fitting a mixture-of-experts model to data with many correlated features. It is shown that the proposed estimator is root-$n$ consistent, and simulations show its superior finite-sample behaviour compared to that of the maximum likelihood estimator. For feature selection, two extra penalty functions are applied to the $l_2$-penalized log-likelihood function. The proposed feature selection method is computationally much more efficient than the popular all-subset selection methods. Theoretically it is shown that the method is consistent in feature selection, and simulations support our theoretical results. A real-data example is presented to demonstrate the method. The Canadian Journal of Statistics 38: 519–539; 2010 © 2010 Statistical Society of Canada

6.
In this paper, we extend the general minimum lower-order confounding (GMC) criterion to the case of three-level designs. First, we review the relationship between GMC and other criteria. Then we introduce an aliased component-number pattern (ACNP) and a three-level GMC criterion via the consideration of component effects, and obtain some results on the new criterion. All the 27-run GMC designs, the 81-run GMC designs with factor numbers $n=5,\ldots,20$, and the 243-run GMC designs with resolution IV or higher are tabulated. The Canadian Journal of Statistics 41: 192–210; 2013 © 2012 Statistical Society of Canada

7.
Statistical procedures for the detection of a change in the dependence structure of a series of multivariate observations are studied in this work. The proposed test statistics are $L_1$, $L_2$, and $L_\infty$ distances computed from vectors of differences of Kendall's tau; two multivariate extensions of Kendall's measure of association are used. Since the distributions of these statistics under the null hypothesis of no change depend on the unknown underlying copula of the vectors, a procedure based on the multiplier central limit theorem is used for the computation of p-values; the method is shown to be valid both asymptotically and for moderate sample sizes. Alternative versions of the tests that take into account possible breakpoints in the marginal distributions are also investigated. Monte Carlo simulations show that the tests are powerful under many change-point scenarios. In addition, two estimators of the time of change are proposed and their efficiency is carefully studied. The methodologies are illustrated on simulated series from the Canadian Regional Climate Model. The Canadian Journal of Statistics 41: 65–82; 2013 © 2012 Statistical Society of Canada
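A bivariate sketch of how such statistics can be assembled from differences of Kendall's tau across candidate breakpoints is given below. The paper works with multivariate extensions of tau and multiplier-bootstrap p-values, neither of which is reproduced; the $k/n\,(1-k/n)\sqrt{n}$ weighting here is an illustrative choice, not the paper's.

```python
import numpy as np
from scipy.stats import kendalltau

def tau_changepoint_stats(x, y, trim=4):
    """L1, L2 and L-infinity statistics built from differences of
    Kendall's tau before and after each candidate breakpoint, for a
    bivariate series (x_t, y_t). Sketch only."""
    n = len(x)
    diffs = []
    for k in range(trim, n - trim):                # trim so segments have data
        t1, _ = kendalltau(x[:k], y[:k])           # tau before the break
        t2, _ = kendalltau(x[k:], y[k:])           # tau after the break
        diffs.append((k / n) * (1 - k / n) * np.sqrt(n) * (t1 - t2))
    d = np.abs(np.array(diffs))
    return d.mean(), np.sqrt((d**2).mean()), d.max()   # L1, L2, L-infinity
```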

8.
A contaminated beta model $(1-\gamma) B(1,1) + \gamma B(\alpha,\beta)$ is often used to describe the distribution of $P$-values arising from a microarray experiment. The authors propose and examine a different approach: namely, using a contaminated normal model $(1-\gamma) N(0,\sigma^2) + \gamma N(\mu,\sigma^2)$ to describe the distribution of $Z$ statistics or suitably transformed $T$ statistics. The authors then address whether a researcher who has $Z$ statistics should analyze them using the contaminated normal model, or whether the $Z$ statistics should be converted to $P$-values to be analyzed using the contaminated beta model. The authors also provide a decision-theoretic perspective on the analysis of $Z$ statistics. The Canadian Journal of Statistics 38: 315–332; 2010 © 2010 Statistical Society of Canada
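The contaminated normal model can be fit by a standard two-component EM algorithm; a minimal sketch follows. Only the model fit is shown, under the shared-variance parameterization stated in the abstract; the paper's comparison with the contaminated beta model and its decision-theoretic analysis are not reproduced.

```python
import numpy as np
from scipy.stats import norm

def fit_contaminated_normal(z, n_iter=200):
    """EM fit of (1 - gamma) N(0, sigma^2) + gamma N(mu, sigma^2)
    to a vector of Z statistics (sketch)."""
    gamma, mu, sigma = 0.1, float(np.mean(z)), float(np.std(z))
    for _ in range(n_iter):
        # E-step: posterior probability that each z_i is non-null
        f1 = gamma * norm.pdf(z, mu, sigma)
        f0 = (1 - gamma) * norm.pdf(z, 0.0, sigma)
        w = f1 / (f0 + f1)
        # M-step: mixing proportion, non-null mean, common scale
        gamma = w.mean()
        mu = np.sum(w * z) / np.sum(w)
        sigma = np.sqrt(np.mean(w * (z - mu) ** 2 + (1 - w) * z ** 2))
    return gamma, mu, sigma
```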

9.
The Dantzig selector (Candès & Tao, 2007) is a popular $\ell^1$-regularization method for variable selection and estimation in linear regression. We present a very weak geometric condition on the observed predictors, related to parallelism, which, when satisfied, ensures the uniqueness of Dantzig selector estimators. The condition holds with probability 1 if the predictors are drawn from a continuous distribution. We discuss the necessity of this condition for uniqueness and also provide a closely related condition which ensures the uniqueness of lasso estimators (Tibshirani, 1996). Large-sample asymptotics for the Dantzig selector, that is, almost sure convergence and the asymptotic distribution, follow directly from our uniqueness results and a continuity argument. The limiting distribution of the Dantzig selector is generally non-normal. Though our asymptotic results require that the number of predictors be fixed (similar to Knight & Fu, 2000), our uniqueness results are valid for an arbitrary number of predictors and observations. The Canadian Journal of Statistics 41: 23–35; 2013 © 2012 Statistical Society of Canada
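The Dantzig selector itself is the linear program $\min \|\beta\|_1$ subject to $\|X^{\top}(y - X\beta)\|_\infty \le \lambda$, which can be solved with an off-the-shelf LP solver via the split $\beta = u - v$, $u, v \ge 0$. A sketch, assuming a generic solver tolerance is acceptable; the paper's uniqueness condition is not checked here.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Dantzig selector (Candes & Tao, 2007) as a linear program:
    min ||b||_1  s.t.  ||X'(y - Xb)||_inf <= lam  (sketch)."""
    n, p = X.shape
    G = X.T @ X
    Xty = X.T @ y
    c = np.ones(2 * p)                        # sum(u) + sum(v) = ||b||_1
    A = np.vstack([np.hstack([G, -G]),        #  X'X b <= X'y + lam
                   np.hstack([-G, G])])       # -X'X b <= lam - X'y
    b_ub = np.concatenate([Xty + lam, lam - Xty])
    res = linprog(c, A_ub=A, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```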

10.
Testing goodness-of-fit of commonly used genetic models is of critical importance in many applications, including association studies and testing for departure from Hardy–Weinberg equilibrium. The case–control design has become widely used in population genetics and genetic epidemiology, so it is of interest to develop powerful goodness-of-fit tests for genetic models using case–control data. This paper develops a likelihood ratio test (LRT) for testing recessive and dominant models in case–control studies. The LRT statistic has a closed-form formula with a simple $\chi^2(1)$ null asymptotic distribution, so its implementation is easy even for genome-wide association studies. Moreover, it has the same power and optimality as when the disease prevalence in the population is known. The Canadian Journal of Statistics 41: 341–352; 2013 © 2013 Statistical Society of Canada
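To illustrate one way such a closed-form $\chi^2(1)$ test can arise: under a recessive model the AA and Aa genotypes share the same penetrance, so the AA:Aa split must be homogeneous across cases and controls, and a likelihood ratio (G) test of that homogeneity has one degree of freedom. The sketch below implements this construction; it is not taken from the paper, whose exact statistic may differ.

```python
import numpy as np
from scipy.stats import chi2

def lrt_recessive(case, control):
    """Chi-square(1) G-test of a recessive model from case-control
    genotype counts (n_AA, n_Aa, n_aa). Sketch: tests homogeneity of
    the AA:Aa split between cases and controls."""
    tab = np.array([case[:2], control[:2]], dtype=float)   # drop aa column
    expected = tab.sum(1, keepdims=True) * tab.sum(0) / tab.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        G = 2 * np.nansum(tab * np.log(tab / expected))    # 0*log0 -> 0
    return G, chi2.sf(G, df=1)
```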

11.
We extend the empirical likelihood beyond its domain by expanding its contours nested inside the domain with a similarity transformation. The extended empirical likelihood achieves two objectives at the same time: escaping the “convex hull constraint” on the empirical likelihood and improving the coverage accuracy of the empirical likelihood ratio confidence region to $O(n^{-2})$. The latter is accomplished through a special transformation which matches the extended empirical likelihood with the Bartlett-corrected empirical likelihood. The extended empirical likelihood ratio confidence region retains the shape of the original empirical likelihood ratio confidence region. It also accommodates adjustments for dimension and small sample size, giving it good coverage accuracy in both large- and small-sample situations. The Canadian Journal of Statistics 41: 257–274; 2013 © 2013 Statistical Society of Canada
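To make the “convex hull constraint” concrete, here is a sketch of the ordinary empirical likelihood ratio for a scalar mean: it is undefined whenever the hypothesized mean falls outside the range of the data, which is exactly the problem the extension removes. The extended version itself is not implemented.

```python
import numpy as np
from scipy.optimize import brentq

def el_logratio_mean(x, mu):
    """-2 log empirical likelihood ratio for a scalar mean (ordinary EL,
    not the extended version). Returns inf when mu lies outside
    [min(x), max(x)], i.e. outside the convex hull."""
    z = np.asarray(x, dtype=float) - mu
    if z.min() >= 0 or z.max() <= 0:               # mu outside the hull
        return np.inf
    # solve sum z_i / (1 + lam * z_i) = 0 on the feasible lam interval
    lo = (-1 + 1e-10) / z.max()
    hi = (-1 + 1e-10) / z.min()
    lam = brentq(lambda l: np.sum(z / (1 + l * z)), lo, hi)
    return 2 * np.sum(np.log1p(lam * z))
```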

12.
The joint analysis of longitudinal measurements and survival data is useful in clinical trials and other medical studies. In this paper, we consider a joint model which assumes a linear mixed $t$ model for the longitudinal measurements and a promotion time cure model for the survival data, and links the two models through a latent variable. A semiparametric inference procedure with an EM algorithm implementation is developed for the parameters in the joint model. The proposed procedure is evaluated in a simulation study and applied to analyze the quality-of-life and time-to-recurrence data from a clinical trial on women with early breast cancer. The Canadian Journal of Statistics 40: 207–224; 2012 © 2012 Statistical Society of Canada

13.
This paper considers estimators of survivor functions subject to a stochastic ordering constraint based on right-censored data. We present the constrained nonparametric maximum likelihood estimator (C-NPMLE) of the survivor functions in one- and two-sample settings where the survivor distributions may be discrete or continuous, and discuss the non-uniqueness of the estimators. We also present a computationally efficient algorithm to obtain the C-NPMLE. To address the possibility of non-uniqueness of the C-NPMLE of $S_1(t)$ when $S_1(t)\le S_2(t)$, we consider the maximum C-NPMLE (MC-NPMLE) of $S_1(t)$. In the one-sample case with an arbitrary upper-bound survivor function $S_2(t)$, we present a novel and efficient algorithm for finding the MC-NPMLE of $S_1(t)$. Dykstra (1982) also considered constrained nonparametric maximum likelihood estimation for such problems; however, as we show, Dykstra's method has an error and does not always give the C-NPMLE. We correct this error, and simulations show improved efficiency compared to Dykstra's estimator. Confidence intervals based on bootstrap methods are proposed and consistency of the estimators is proved. Data from a study on larynx cancer are analysed to illustrate the method. The Canadian Journal of Statistics 40: 22–39; 2012 © 2012 Statistical Society of Canada

14.
15.
This paper deals with a bias correction of Akaike's information criterion (AIC) for selecting variables in multivariate normal linear regression models when the true distribution of the observations is an unknown non-normal distribution. It is well known that the bias of AIC is $O(1)$, and there are a number of first-order bias-corrected AICs which improve the bias to $O(n^{-1})$, where $n$ is the sample size. A new information criterion is proposed by slightly adjusting the first-order bias-corrected AIC. Although the adjustment is achieved merely by using constant coefficients, the bias of the new criterion is reduced to $O(n^{-2})$. The variance of the new criterion is also improved. Through numerical experiments, we verify that our criterion is superior to the others. The Canadian Journal of Statistics 39: 126–146; 2011 © 2011 Statistical Society of Canada
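For context, the uncorrected baseline is the ordinary AIC of the multivariate normal regression $Y = XB + E$ with ML estimates of $B$ and the error covariance; a sketch follows. The paper's constant-coefficient adjustment that pushes the bias to $O(n^{-2})$ is not reproduced here.

```python
import numpy as np

def aic_mvreg(Y, X):
    """Ordinary AIC for multivariate normal linear regression Y = XB + E
    (n x p responses, n x k design), evaluated at the MLE. Sketch of the
    baseline criterion only."""
    n, p = Y.shape
    k = X.shape[1]
    B = np.linalg.lstsq(X, Y, rcond=None)[0]       # ML regression coefficients
    R = Y - X @ B
    Sigma = R.T @ R / n                            # ML error covariance
    logdet = np.linalg.slogdet(Sigma)[1]
    loglik = -0.5 * n * (p * np.log(2 * np.pi) + logdet + p)
    npar = k * p + p * (p + 1) / 2                 # B plus symmetric Sigma
    return -2 * loglik + 2 * npar
```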

16.
We consider the problem of selecting variables in factor analysis models. An $L_1$ regularization procedure is introduced to perform automatic variable selection. In the factor analysis model, each variable is controlled by multiple factors when there is more than one underlying factor. We treat the parameters corresponding to the multiple factors as grouped parameters and then apply the group lasso. Furthermore, the weight of the group lasso penalty is modified to obtain appropriate estimates and improve the performance of variable selection. Crucial issues in this modeling procedure include the selection of the number of factors and of a regularization parameter. Choosing these parameters can be viewed as a model selection and evaluation problem. We derive a model selection criterion for evaluating the factor analysis model via the weighted group lasso. Monte Carlo simulations are conducted to investigate the effectiveness of the proposed procedure. A real-data example is also given to illustrate our procedure. The Canadian Journal of Statistics 40: 345–361; 2012 © 2012 Statistical Society of Canada
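The grouping idea amounts to penalizing each row of the loading matrix as a unit, so that an entire variable can be shrunk out of the model at once; a sketch of the penalty term follows. The paper's specific modified weights and its model selection criterion are not reproduced.

```python
import numpy as np

def weighted_group_lasso_penalty(L, w, reg):
    """Weighted group lasso penalty on a p x m factor loading matrix L:
    the m loadings of observed variable j form one group, penalized by
    reg * w[j] * ||L[j]||_2. Sketch of the penalty term only."""
    return reg * np.sum(w * np.linalg.norm(L, axis=1))
```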

17.
Panel count data occur in many fields, and a number of approaches have been developed for their analysis. However, most of these approaches are for situations where there is no terminal event and the observation process is independent of the underlying recurrent event process, either unconditionally or conditional on the covariates. In this paper, we discuss a more general situation where the observation process is informative and there exists a terminal event which precludes further occurrence of the recurrent events of interest. For the analysis, a semiparametric transformation model is presented for the mean function of the underlying recurrent event process among survivors. To estimate the regression parameters, an estimating equation approach is proposed in which an inverse survival probability weighting technique is used. The asymptotic distribution of the proposed estimates is provided. Simulation studies are conducted and suggest that the proposed approach works well in practical situations. An illustrative example is provided. The Canadian Journal of Statistics 41: 174–191; 2013 © 2012 Statistical Society of Canada

18.
If $(X_1,Y_1), \ldots, (X_n,Y_n)$ is a sequence of independent identically distributed $\mathbb{R}^d \times \mathbb{R}$-valued random vectors, then Nadaraya (1964) and Watson (1964) proposed to estimate the regression function $m(x) = E\{Y_1 \mid X_1 = x\}$ by $$m_n(x) = \frac{\sum_{i=1}^{n} Y_i\, K\big((x - X_i)/h_n\big)}{\sum_{i=1}^{n} K\big((x - X_i)/h_n\big)},$$ where $K$ is a known density and $\{h_n\}$ is a sequence of positive numbers satisfying certain properties. In this paper a variety of conditions are given for the strong convergence to 0 of $\operatorname{ess\,sup}_X |m_n(X) - m(X)|$ (here $X$ is independent of the data and distributed as $X_1$). The theorems are valid for all distributions of $X_1$ and for all sequences $\{h_n\}$ satisfying $h_n \to 0$ and $nh_n^d/\log n \to \infty$.
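The estimator is a locally weighted average of the responses; a minimal one-dimensional sketch with a Gaussian kernel as an example choice of $K$:

```python
import numpy as np

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate m_n(x) at a single point x (d = 1 sketch,
    Gaussian kernel as an example of a density K)."""
    K = np.exp(-0.5 * ((x - X) / h) ** 2)          # kernel weights K((x-X_i)/h)
    return np.sum(K * Y) / np.sum(K)
```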

19.
A new test is proposed for the hypothesis of uniformity on bi-dimensional supports. The procedure is an adaptation of the “distance to boundary test” (DB test) proposed by Berrendero, Cuevas, & Vázquez-Grande (2006). This new version of the DB test, called the DBU test, allows us, as a novel and interesting feature, to deal with the case where the support $S$ of the underlying distribution is unknown. This means that $S$ is not specified in the null hypothesis, so that, in fact, we test the null hypothesis that the underlying distribution is uniform on some support $S$ belonging to a given class ${\cal C}$. We pay special attention to the case where ${\cal C}$ is either the class of compact convex supports or the (broader) class of compact $\lambda$-convex supports (also called $r$-convex or $\alpha$-convex in the literature). The basic idea is to apply the DB test in a plug-in version, where the support $S$ is approximated using methods of set estimation. The DBU method is analysed from both the theoretical and the practical point of view, via asymptotic results and a simulation study, respectively. The Canadian Journal of Statistics 40: 378–395; 2012 © 2012 Statistical Society of Canada

20.
Lachenbruch (1976, 2001) introduced two-part tests for the comparison of two means in zero-inflated continuous data. We extend this approach to compare $k$ independent distributions, testing either overall equality of means or departures from equal proportions of zeros and equal means of the nonzero values, by introducing two tests: a two-part Wald test and a two-part likelihood ratio test. If the continuous part of the distributions is lognormal, then the two proposed test statistics have an asymptotic chi-square distribution with $2(k-1)$ degrees of freedom. A simulation study was conducted to compare the performance of the proposed tests with several well-known tests such as ANOVA, Welch (1951), Brown & Forsythe (1974), Kruskal–Wallis, and the one-part Wald test proposed by Tu & Zhou (1999). Results indicate that the proposed tests keep the nominal type I error and consistently have the best power among all tests being compared. An application to rainfall data is provided as an example. The Canadian Journal of Statistics 39: 690–702; 2011 © 2011 Statistical Society of Canada
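A sketch of the two-sample ($k=2$) two-part construction follows: one Z statistic for the difference in zero proportions plus one for the difference of log-scale means of the nonzero parts (using the lognormal assumption stated above), combined as $X^2 = Z_p^2 + Z_c^2 \sim \chi^2(2)$. The paper's $k$-sample Wald and likelihood ratio tests with $2(k-1)$ degrees of freedom are not reproduced.

```python
import numpy as np
from scipy.stats import chi2

def two_part_wald(x1, x2):
    """Lachenbruch-style two-part test for two zero-inflated samples
    (sketch of the k = 2 case)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    p1, p2 = np.mean(x1 == 0), np.mean(x2 == 0)
    p = (n1 * p1 + n2 * p2) / (n1 + n2)            # pooled zero proportion
    z_p = (p1 - p2) / np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    l1, l2 = np.log(x1[x1 > 0]), np.log(x2[x2 > 0])  # lognormal nonzero part
    z_c = (l1.mean() - l2.mean()) / np.sqrt(l1.var(ddof=1) / len(l1)
                                            + l2.var(ddof=1) / len(l2))
    X2 = z_p ** 2 + z_c ** 2
    return X2, chi2.sf(X2, df=2)
```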
