Similar Literature
20 similar documents found.
1.
In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model mis-specification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is a lack of information about the underlying regression structure. The cSIRS procedure is also robust in the presence of outliers or extreme values as it merely uses the rank of the censored response variable. We establish both the sure screening and the ranking consistency properties for the cSIRS procedure when the number of covariates p satisfies \(p=o\{\exp (an)\}\), where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set.
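A screening utility of this kind is simple to prototype. The Python sketch below combines a SIRS-type marginal rank utility with Kaplan–Meier inverse-probability-of-censoring weights; the helper names, the weighting scheme and the standardisation are illustrative assumptions, not the authors' exact cSIRS statistic.

import numpy as np

def km_censoring_survival(time, delta):
    # Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t),
    # evaluated at each observed time; censoring is indicated by delta == 0.
    # Ties and the usual left-limit convention are ignored to keep the sketch short.
    order = np.argsort(time)
    d_sorted = delta[order]
    n = len(time)
    at_risk = n - np.arange(n)                      # subjects still at risk at each sorted time
    factors = 1.0 - (d_sorted == 0) / at_risk       # a "censoring event" is delta == 0
    G = np.empty(n)
    G[order] = np.cumprod(factors)
    return np.clip(G, 1e-6, None)                   # avoid division by zero in the weights

def ipw_rank_screening(X, time, delta, d):
    # Rank covariates by a SIRS-type marginal utility with inverse-probability-of-censoring
    # weights and return the indices of the d top-ranked covariates.
    n, p = X.shape
    w = delta / km_censoring_survival(time, delta)  # IPW weights; zero for censored subjects
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardise each covariate
    ind = (time[:, None] <= time[None, :]).astype(float)    # 1{T_i <= T_k}
    inner = (w[:, None] * Z).T @ ind / n            # inner[j, k] = mean_i w_i Z_ij 1{T_i <= T_k}
    omega = (inner ** 2).mean(axis=1)               # marginal screening utility for each covariate
    return np.argsort(omega)[::-1][:d]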

2.
This paper presents a graphical technique for detecting influential cases in regression analysis. The idea is to decompose a diagnostic problem involving a higher-dimensional regression model into a series of two-dimensional diagnostic sub-problems, so that the diagnosis of influential cases is undertaken by visually inspecting two-dimensional diagnostic plots of these sub-problems. An algorithm for the graphical procedure is proposed to reduce the computational effort. Practical examples are used to illustrate this graphical technique.

3.
Let \((X_1, X_2, Y_1, Y_2)\) be a four-dimensional random variable having the joint probability density function \(f(x_1, x_2, y_1, y_2)\). In this paper we consider the problem of estimating the regression function \(E[(Y_1, Y_2)^{\top} \mid X_1 = x_1, X_2 = x_2]\) on the basis of a random sample of size \(n\). We have proved that under certain regularity conditions the kernel estimate of this regression function is uniformly strongly consistent. We have also shown that under certain conditions the estimate is asymptotically normally distributed.
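A kernel estimate of this bivariate regression function is typically of Nadaraya–Watson form; the following Python sketch uses a Gaussian product kernel and a single bandwidth, both of which are assumptions made for illustration rather than details taken from the paper.

import numpy as np

def nadaraya_watson_2d(X, Y, x0, h):
    # Product-Gaussian-kernel Nadaraya-Watson estimate of E[(Y1, Y2) | X = x0].
    # X: (n, 2) covariates, Y: (n, 2) responses, x0: length-2 query point, h: bandwidth.
    u = (X - x0) / h                              # scaled differences
    w = np.exp(-0.5 * (u ** 2).sum(axis=1))       # product of one-dimensional Gaussian kernels
    return (w[:, None] * Y).sum(axis=0) / w.sum()

# toy usage on simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] * X[:, 1]]) + rng.normal(scale=0.1, size=(500, 2))
print(nadaraya_watson_2d(X, Y, x0=np.array([0.5, -0.5]), h=0.3))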

4.
We address the issue of recovering the structure of large sparse directed acyclic graphs from noisy observations of the system. We propose a novel procedure based on a specific formulation of the \(\ell _1\)-norm regularized maximum likelihood, which decomposes the graph estimation into two optimization sub-problems: topological structure and node order learning. We provide convergence inequalities for the graph estimator, as well as an algorithm to solve the induced optimization problem, in the form of a convex program embedded in a genetic algorithm. We apply our method to various data sets (including data from the DREAM4 challenge) and show that it compares favorably to state-of-the-art methods. This algorithm is available on CRAN as the R package GADAG.

5.
In many applications, the cumulative distribution function (cdf) \(F_{Q_N}\) of a positively weighted sum of N i.i.d. chi-squared random variables \(Q_N\) is required. Although there is no known closed-form solution for \(F_{Q_N}\), there are many good approximations. When computational efficiency is not an issue, Imhof’s method provides a good solution. However, when both the accuracy of the approximation and the speed of its computation are a concern, there is no clear preferred choice. Previous comparisons between approximate methods could be considered insufficient. Furthermore, in streaming data applications where the computation needs to be both sequential and efficient, only a few of the available methods may be suitable. Streaming data problems are becoming ubiquitous and provide the motivation for this paper. We develop a framework to enable a much more extensive comparison between approximate methods for computing the cdf of weighted sums of an arbitrary random variable. Utilising this framework, a new and comprehensive analysis of four efficient approximate methods for computing \(F_{Q_N}\) is performed. This analysis procedure is much more thorough and statistically valid than previous approaches described in the literature. A surprising result of this analysis is that the accuracy of these approximate methods increases with N.
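As a concrete illustration of what such an approximation looks like, the Python sketch below implements a two-moment (Satterthwaite–Welch style) match of \(Q_N\) to a scaled chi-squared distribution, together with a Monte Carlo check; it is a generic textbook approximation and is not claimed to be one of the four methods analysed in the paper.

import numpy as np
from scipy import stats

def sw_cdf(q, weights, df=1):
    # Two-moment approximation to the cdf of Q = sum_i w_i * X_i, X_i i.i.d. chi-squared(df):
    # match the mean and variance of Q with those of g * chi-squared(h).
    w = np.asarray(weights, dtype=float)
    mean = df * w.sum()
    var = 2 * df * (w ** 2).sum()
    g = var / (2 * mean)                 # scale factor
    h = 2 * mean ** 2 / var              # effective degrees of freedom
    return stats.chi2.cdf(q / g, h)

# Monte Carlo check of the approximation
rng = np.random.default_rng(1)
w = np.array([0.5, 1.0, 2.0, 4.0])
samples = (w * rng.chisquare(1, size=(200_000, w.size))).sum(axis=1)
q = 10.0
print(sw_cdf(q, w), (samples <= q).mean())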

6.
Essential graphs and largest chain graphs are well-established graphical representations of equivalence classes of directed acyclic graphs and chain graphs, respectively, and are especially useful in the context of model selection. Recently, the notion of a labelled block ordering of vertices was introduced as a flexible tool for specifying subfamilies of chain graphs. In particular, both the family of directed acyclic graphs and the family of “unconstrained” chain graphs can be specified in this way, for appropriate choices of the block ordering. The family of chain graphs identified by a labelled block ordering of vertices is partitioned into equivalence classes, each represented by means of an essential graph relative to that ordering. In this paper, we introduce a topological ordering of meta-arrows and use this concept to devise an efficient procedure for the construction of these essential graphs. In this way we also provide an efficient procedure for the construction of both largest chain graphs and essential graphs. The key feature of the proposed procedure is that every meta-arrow needs to be processed only once.

7.
This paper proposes a method for obtaining the exact probability of occurrence of the first success run of specified length, with the additional constraint that at every trial until the occurrence of the first success run the number of successes up to the trial exceeds that of failures. Because of the additional constraint, the problem cannot be solved by the usual method of conditional probability generating functions. An idea of a kind of truncation is introduced and studied in order to solve the problem. Concrete methods for obtaining the probability in the cases of Bernoulli trials and time-homogeneous {0,1}-valued Markov dependent trials are given. As an application of the results, a modification of the start-up demonstration test is studied. Numerical examples which illustrate the feasibility of the results are also given.
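The paper's truncated conditional-pgf solution is not reproduced here, but for the Bernoulli case the same probability, restricted to a finite horizon of trials, can be cross-checked by direct dynamic programming over the pair (current lead \(S_t - F_t\), current trailing run length). The sketch below assumes the constraint must hold at every trial up to and including the trial at which the run is completed; that reading of the constraint is an assumption.

from collections import defaultdict

def prob_constrained_first_run(p, k, n_trials):
    # P(a success run of length k is completed within n_trials Bernoulli(p) trials,
    # with successes strictly exceeding failures at every trial along the way).
    # Direct dynamic programming; an illustrative cross-check, not the paper's method.
    states = defaultdict(float)        # state: (lead = S_t - F_t, trailing run length r < k)
    states[(0, 0)] = 1.0               # before the first trial
    prob_done = 0.0
    for _ in range(n_trials):
        new = defaultdict(float)
        for (lead, r), pr in states.items():
            if r + 1 == k:             # a success completes the run (constraint still holds)
                prob_done += pr * p
            else:
                new[(lead + 1, r + 1)] += pr * p
            if lead - 1 >= 1:          # a failure resets the run; keep only valid states
                new[(lead - 1, 0)] += pr * (1 - p)
        states = new
    return prob_done

print(prob_constrained_first_run(p=0.6, k=3, n_trials=50))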

8.
A joint estimation approach for multiple high-dimensional Gaussian copula graphical models is proposed, which achieves estimation robustness by exploiting non-parametric rank-based correlation coefficient estimators. Although we focus on continuous data in this paper, the proposed method can be extended to deal with binary or mixed data. Based on a weighted minimisation problem, the estimators can be obtained by implementing second-order cone programming. Theoretical properties of the procedure are investigated. We show that the proposed joint estimation procedure leads to a faster convergence rate than estimating the graphs individually. It is also shown that the proposed procedure achieves exact graph structure recovery with probability tending to 1 under certain regularity conditions. Besides theoretical analysis, we conduct numerical simulations to compare the estimation and graph recovery performance of several state-of-the-art methods, including both joint estimation methods and methods that estimate each graph individually. The proposed method is then applied to a gene expression data set, which illustrates its practical usefulness.
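The rank-based ingredient is the standard Gaussian-copula correlation estimator: compute Kendall's \(\tau\) for every pair of variables and map it to a correlation through \(\sin(\pi\tau/2)\). The Python sketch below shows only this first step; the joint penalised estimation via second-order cone programming is not reproduced, and the function name is hypothetical.

import numpy as np
from scipy.stats import kendalltau

def rank_based_correlation(X):
    # Rank-based estimate of the latent correlation matrix of a Gaussian copula model:
    # pairwise Kendall's tau mapped through sin(pi * tau / 2).
    n, p = X.shape
    R = np.eye(p)
    for j in range(p):
        for l in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, l])
            R[j, l] = R[l, j] = np.sin(np.pi * tau / 2)
    return R

The resulting matrix can then be supplied to a Gaussian graphical model estimator in place of the sample correlation matrix, which is what makes the overall procedure robust to monotone marginal transformations.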

9.
This paper addresses the issue of estimating the expectation of a real-valued random variable of the form \(X = g(\mathbf {U})\), where g is a deterministic function and \(\mathbf {U}\) can be a random finite- or infinite-dimensional vector. Using recent results on rare event simulation, we propose a unified framework for dealing with both probability and mean estimation for such random variables, i.e. linking algorithms such as the Tootsie Pop Algorithm or the Last Particle Algorithm with nested sampling. In particular, it extends nested sampling as follows: first, the random variable X no longer needs to be bounded; the framework yields the principle of an ideal estimator with an infinite number of terms that is unbiased and always better than a classical Monte Carlo estimator, and in particular it has a finite variance as soon as there exists \(k > 1\) such that \({\text {E}}\left[ X^k \right] < \infty \). Moreover, we address the issue of nested sampling termination and show that a random truncation of the sum can preserve unbiasedness while increasing the variance only by a factor of up to 2 compared to the ideal case. We also build an unbiased estimator with fixed computational budget which supports a Central Limit Theorem, and we discuss parallel implementation of nested sampling, which can dramatically reduce its running time. Finally, we extensively study the case where X is heavy-tailed.

10.
Given a prior distribution for a model, the prior information specified on a nested submodel by means of a conditioning procedure crucially depends on the parameterisation used to describe the model. Regression coefficients represent the most common parameterisation of Gaussian DAG models. Nevertheless, in the specification of prior distributions, invariance considerations lead to the use of different parameterisations of the model, depending on the required invariance class. In this paper we consider the problem of prior specification by conditioning on zero regression coefficients, show that such a procedure also satisfies the property of invariance with respect to a class of parameterisations, and characterise that class.

11.
Let \(X_1, X_2, \ldots, X_n\) be a sequence of Markov Bernoulli trials (MBT) and \(\underline{X}_n = (X_{n,k_1}, X_{n,k_2}, \ldots, X_{n,k_r})\) be a random vector where \(X_{n,k_i}\) represents the number of occurrences of success runs of length \(k_i\) \((i = 1, 2, \ldots, r)\). In this paper the joint distribution of \(\underline{X}_n\) in the sequence of \(n\) MBT is studied using the method of conditional probability generating functions. Five different counting schemes of runs are considered, namely non-overlapping runs, runs of length at least \(k\), overlapping runs, runs of exact length \(k\), and \(\ell\)-overlapping runs (i.e. the \(\ell\)-overlapping counting scheme), \(0 \le \ell < k\). The pgf of the joint distribution of \(\underline{X}_n\) is obtained in terms of matrix polynomials, and an algorithm is developed to obtain the exact probability distribution. Numerical results are included to demonstrate the computational flexibility of the developed results. Various applications of the joint distribution of \(\underline{X}_n\), such as the evaluation of the reliability of \((n,f,k)\!:\!G\) and \(\langle n,f,k\rangle\!:\!G\) systems, of quantities related to start-up demonstration tests, and of acceptance sampling plans, are also discussed.
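The five counting schemes can be made concrete by counting runs of length \(k\) in an observed 0/1 sequence, as in the small Python helper below; it only illustrates the usual definitions of the schemes and has nothing to do with the paper's distributional results.

def _maximal_runs(x):
    # Lengths of the maximal runs of 1s in x.
    runs, r = [], 0
    for v in x:
        if v:
            r += 1
        elif r:
            runs.append(r); r = 0
    if r:
        runs.append(r)
    return runs

def count_runs(x, k, scheme="non-overlapping", ell=0):
    # Count success runs of length k in a 0/1 sequence x under the usual counting schemes.
    runs = _maximal_runs(x)
    if scheme == "overlapping":                 # every window of k consecutive 1s
        return sum(r - k + 1 for r in runs if r >= k)
    if scheme == "at least k":                  # maximal runs of length >= k
        return sum(1 for r in runs if r >= k)
    if scheme == "exact k":                     # maximal runs of length exactly k
        return sum(1 for r in runs if r == k)
    if scheme == "non-overlapping":             # Feller counting: restart after each counted run
        ell = 0
    # ell-overlapping (0 <= ell < k): consecutive counted runs may share ell trials
    return sum(1 + (r - k) // (k - ell) for r in runs if r >= k)

x = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
print([count_runs(x, 2, s) for s in ("non-overlapping", "at least k", "overlapping", "exact k")])
print(count_runs(x, 3, scheme="ell-overlapping", ell=1))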

12.
Let \(F(x,y)\) be the distribution function of a two-dimensional random variable \((X,Y)\). We assume that the distribution function \(F_X(x)\) of the random variable \(X\) is known; \(X\) will be called an auxiliary variable. Our purpose is estimation of the expected value \(m = E(Y)\) on the basis of the two-dimensional simple sample \(U = [(X_1, Y_1), \ldots, (X_n, Y_n)]\), drawn from the distribution determined by \(F(x,y)\). Let \(X_{(k)}\) be the \(k\)-th \((k = 1, \ldots, n)\) order statistic determined on the basis of the sample \(X_1, \ldots, X_n\). The sample \(U\) is split by means of this order statistic into two sub-samples \(U_{k,1}\) and \(U_{k,2}\), with sample means \(\bar Y_{k,1}\) and \(\bar Y_{k,2}\), respectively. A linear combination of these means, with coefficients that depend on the distribution function of the auxiliary variable at the point \(x_{(k)}\), is the conditional estimator of the expected value \(m\). We can show that this statistic is a conditionally as well as unconditionally unbiased estimator of the mean \(m\). The variance of this estimator is derived and compared with the variance of the sample mean. A generalization of the conditional estimation of the mean is considered, too.
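As an illustration only: the Python sketch below splits the sample at the \(k\)-th order statistic of the auxiliary variable and combines the two subsample means of \(y\). The weight \(w = F_X(x_{(k)})\) used here is a hypothetical choice standing in for the paper's coefficients, which are not reproduced.

import numpy as np

def conditional_estimator(x, y, k, F_x, w=None):
    # Split the sample at the k-th order statistic of x and combine the two subsample means
    # of y. The default weight w = F_x(x_(k)) is a HYPOTHETICAL choice for illustration,
    # not necessarily the coefficients used in the paper.
    order = np.argsort(x)
    x_k = x[order[k - 1]]                 # k-th order statistic (1-based k)
    y1 = y[order[:k]]                     # subsample with x <= x_(k)
    y2 = y[order[k:]]                     # subsample with x >  x_(k)
    if w is None:
        w = F_x(x_k)                      # known cdf of the auxiliary variable
    return w * y1.mean() + (1 - w) * y2.mean()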

13.
Assume that a linear random-effects model \(\mathbf{y}= \mathbf{X}\varvec{\beta }+ \varvec{\varepsilon }= \mathbf{X}(\mathbf{A}\varvec{\alpha }+ \varvec{\gamma }) + \varvec{\varepsilon }\) is transformed into \(\mathbf{T}\mathbf{y}= \mathbf{T}\mathbf{X}\varvec{\beta }+ \mathbf{T}\varvec{\varepsilon }= \mathbf{T}\mathbf{X}(\mathbf{A}\varvec{\alpha }+ \varvec{\gamma }) + \mathbf{T}\varvec{\varepsilon }\) by pre-multiplying by a given matrix \(\mathbf{T}\) of arbitrary rank. The two models are not necessarily equivalent unless \(\mathbf{T}\) is of full column rank, and in many situations we have to work with the derived model. Because predictors/estimators of the parameter spaces under the two models are not necessarily the same, a primary task is to compare predictors/estimators in the two models and to establish possible links between the inference results obtained from them. This paper presents a general algebraic approach to the problem of comparing best linear unbiased predictors (BLUPs) of parameter spaces in an original linear random-effects model and its transformations, and provides a group of fundamental and comprehensive results on mathematical and statistical properties of the BLUPs. In particular, we construct many equalities for the BLUPs under an original linear random-effects model and its transformations, and obtain necessary and sufficient conditions for the equalities to hold.

14.
In this article, we introduce two new estimates of the normalizing constant (or marginal likelihood) for partially observed diffusion (POD) processes with discrete observations. One estimate is biased but non-negative and the other is unbiased but not almost surely non-negative. Our method uses the multilevel particle filter of Jasra et al. (Multilevel particle filter, arXiv:1510.04977, 2015). We show that, under assumptions, for Euler-discretized PODs and a given \(\varepsilon >0\), in order to obtain a mean square error (MSE) of \({\mathcal {O}}(\varepsilon ^2)\) our new estimates require work of \({\mathcal {O}}(\varepsilon ^{-2.5})\), versus \({\mathcal {O}}(\varepsilon ^{-3})\) for a standard particle filter. Our theoretical results are supported by numerical simulations.

15.
We study the statistical performance of different tests for comparing the mean effect of two treatments. Given a reference classical test \({\mathcal {T}}_0\), we determine which sample size and proportion allocation guarantee that a test \({\mathcal {T}}\) based on a response-adaptive design is better than \({\mathcal {T}}_0\), in terms of (a) higher power and (b) fewer subjects assigned to the inferior treatment. The adoption of a response-adaptive design to implement the random allocation procedure is necessary to ensure that both (a) and (b) are satisfied. In particular, we propose to use a Modified Randomly Reinforced Urn design and show how to select the model parameters for this purpose. We then discuss the possibility of relaxing some assumptions on the treatment response distributions. Results of simulation studies on the test performance are reported and a real case study is analyzed.

16.
R. Göb, Statistical Papers, 1992, 33(1):273-277
In elementary probability theory, as a result of a limiting process the probabilities of a Bi(n, p) binomial distribution are approximated by the probabilities of a Po(np) Poisson distribution. Accordingly, in statistical quality control the binomial operating characteristic function \(\mathcal{L}_{n,c}(p)\) is approximated by the Poisson operating characteristic function \(\mathcal{F}_{n,c}(p)\). The inequality \(\mathcal{L}_{n + 1,c + 1}(p) > \mathcal{L}_{n,c}(p)\) for \(p \in (0,1)\) is evident from the interpretation of \(\mathcal{L}_{n + 1,c + 1}(p)\) and \(\mathcal{L}_{n,c}(p)\) as probabilities of accepting a lot. It is shown that the Poisson approximation \(\mathcal{F}_{n,c}(p)\) preserves this essential feature of the binomial operating characteristic function, i.e. that an analogous inequality holds for the Poisson operating characteristic function, too.
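Both the binomial inequality and its Poisson analogue are easy to check numerically. The Python snippet below (hypothetical helper names, SciPy distribution functions) evaluates \(\mathcal{L}_{n,c}(p)\) and \(\mathcal{F}_{n,c}(p)\) on a small grid of \(p\).

from scipy.stats import binom, poisson

def binom_oc(n, c, p):
    # Binomial operating characteristic L_{n,c}(p): accept if at most c defectives in n.
    return binom.cdf(c, n, p)

def poisson_oc(n, c, p):
    # Poisson approximation F_{n,c}(p): Poisson(np) in place of Bi(n, p).
    return poisson.cdf(c, n * p)

# check L_{n+1,c+1}(p) > L_{n,c}(p) and the analogous Poisson inequality on a grid
n, c = 50, 2
for p in (0.01, 0.05, 0.1, 0.2):
    assert binom_oc(n + 1, c + 1, p) > binom_oc(n, c, p)
    assert poisson_oc(n + 1, c + 1, p) > poisson_oc(n, c, p)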

17.
For a counting process \(N=\{N(t),\, t\ge 0\}\) and the probability \(\bar P_k\) that a device survives the first \(k\) shocks, the probability that the device survives beyond \(t\), namely \(\bar H(t) = \sum_{k = 0}^{\infty} P(N(t) = k)\, \bar P_k\), is considered. The survival function \(\bar H(t)\) is proved to have the new better (worse) than used renewal failure rate and the new better (worse) than average failure rate properties under some conditions on \(N\) and \((\bar P_k)_{k = 0}^{\infty}\). In particular we study the survival probability when \(N\) is a nonhomogeneous Poisson process or a birth process. A cumulative damage model and a Laplace transform characterization of these properties are investigated. Further, the generating functions for these renewal failure rate properties are given.
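When \(N\) is a homogeneous Poisson process, \(\bar H(t)\) is a simple Poisson mixture and can be evaluated directly. The Python sketch below (hypothetical helper name) covers that special case only, with a truncated sum and a geometric \(\bar P_k\) chosen purely as an example so the result can be compared with a closed form.

import numpy as np
from scipy.stats import poisson

def survival_poisson_shocks(t, rate, p_bar):
    # H_bar(t) = sum_k P(N(t) = k) * P_bar_k for a homogeneous Poisson shock process.
    # p_bar[k] = probability of surviving the first k shocks; terms beyond len(p_bar)
    # are dropped, which is the truncation assumption of this sketch.
    k = np.arange(len(p_bar))
    return np.sum(poisson.pmf(k, rate * t) * np.asarray(p_bar))

# example: each shock is survived independently with probability q, so P_bar_k = q**k
q, rate = 0.9, 2.0
p_bar = q ** np.arange(200)               # truncate the infinite sum
print(survival_poisson_shocks(t=1.5, rate=rate, p_bar=p_bar))
print(np.exp(-rate * 1.5 * (1 - q)))      # closed form for this special case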

18.
The results of quantile smoothing often show crossing curves, in particular for small data sets. We define a surface, called a quantile sheet, on the domain of the independent variable and the probability. Any desired quantile curve is obtained by evaluating the sheet at a fixed probability. The sheet is modeled by $P$-splines in the form of tensor products of $B$-splines with difference penalties on the array of coefficients. The amount of smoothing is optimized by cross-validation. An application to reference growth curves for children is presented.

19.
In analyzing interval-censored data, a non-parametric estimator is often desired due to difficulties in assessing model fit. Because of this, the non-parametric maximum likelihood estimator (NPMLE) is often the default estimator. However, the estimates of quantities of interest of the survival function, such as the quantiles, have very large standard errors due to the jagged form of the estimator. By constraining the estimator to the class of log-concave functions, the estimator is guaranteed to yield a smooth survival estimate which has much better operating characteristics than the unconstrained NPMLE, without the need to specify a parametric family or a smoothing parameter. In this paper, we first prove that the likelihood can be maximized over a finite set of parameters under mild conditions, although the log-likelihood function is not strictly concave. We then present an efficient algorithm for computing a local maximum of the likelihood function. Using our fast new algorithm, we present evidence from simulated current status data suggesting that the rate of convergence of the log-concave estimator is faster (between \(n^{2/5}\) and \(n^{1/2}\)) than that of the unconstrained NPMLE (between \(n^{1/3}\) and \(n^{1/2}\)).

20.
In this paper we introduce a distribution, indexed by a constant \(c > 0\) and constructed from two independent generalized beta-prime-distributed random variables \(X_1\) and \(X_2\), and establish a closed-form expression for its density. This distribution has as its limiting case the generalized beta type I distribution recently introduced by Nadarajah and Kotz (2004). Due to the presence of several parameters, the density can take a wide variety of shapes.
