Similar Articles
20 similar articles found (search time: 680 ms)
1.
Let (X, Y) be a bivariate random vector with joint distribution function F_{X,Y}(x, y) = C(F(x), G(y)), where C is a copula and F and G are the marginal distributions of X and Y, respectively. Suppose that (Xi, Yi), i = 1, 2, …, n, is a random sample from (X, Y), but we are able to observe only those pairs (Xi, Yi) for which Xi ≤ Yi. We denote such pairs by (X*i, Y*i), i = 1, 2, …, ν, where ν is a random variable. The main problem of interest is to express the distribution function F_{X,Y}(x, y) and the marginal distributions F and G in terms of the distribution functions of the observed random variables X* and Y*. It is shown that if X and Y are exchangeable with marginal distribution function F, then F is uniquely determined by the distributions of X* and Y*. It is also shown that if X and Y are independent and absolutely continuous, then F and G can be expressed through the distribution functions of X* and Y* and the stress–strength reliability P{X ≤ Y}. This also makes it possible to estimate P{X ≤ Y} from the truncated observations (X*i, Y*i). The copula of the bivariate random vector (X*, Y*) is also derived.
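To make the truncated sampling scheme concrete, here is a minimal Python simulation under an assumption the abstract does not make: X and Y are independent exponentials with hypothetical rates `rate_x` and `rate_y`, for which P{X ≤ Y} = rate_x / (rate_x + rate_y). It only shows that ν/n estimates P{X ≤ Y} when n is known; it is not the authors' estimator, whose point is that P{X ≤ Y} can be recovered through the distributions of X* and Y*.

```python
import random


def truncated_sample(n, rate_x, rate_y, seed=0):
    """Draw n independent pairs (X, Y) and record only those with X <= Y.

    Illustrative assumption: X and Y are independent exponentials.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.expovariate(rate_x)
        y = rng.expovariate(rate_y)
        if x <= y:  # pairs with X > Y are never seen
            observed.append((x, y))
    return observed


n = 100_000
pairs = truncated_sample(n, rate_x=2.0, rate_y=1.0)
nu = len(pairs)                # the random number nu of observed pairs
naive_r = nu / n               # naive estimate of P{X <= Y}; requires knowing n
exact_r = 2.0 / (2.0 + 1.0)    # rate_x / (rate_x + rate_y) for exponentials
print(round(naive_r, 3), round(exact_r, 3))
```

With n = 100 000 the naive ratio should land close to 2/3.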

2.
We consider the situation where a known regression model can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y | X, B that uses both the available individual-level data and some summary information obtained from the known model for Y | X. We propose a synthetic data approach, which consists of creating m additional synthetic observations and then analyzing the combined dataset of size n + m to estimate the parameters of the Y | X, B model. This combined dataset has missing values of B for m of the observations and is analyzed using methods that can handle missing data (e.g., multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Although the synthetic data method is applicable in a general regression context, to provide some justification we show, in two special cases, that the asymptotic variances of the parameter estimates in the Y | X, B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases, together with the method's broad applicability, makes it appealing for use across diverse scenarios. The Canadian Journal of Statistics 47: 580–603; 2019 © 2019 Statistical Society of Canada
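The construction of the combined dataset can be sketched as follows. This is a minimal Python illustration under stated assumptions: the known Y | X model is taken to be a simple linear model with hypothetical coefficients `beta0`, `beta1` and noise level `sigma`, synthetic X values are resampled from the observed X values, and rows are plain dicts with hypothetical keys. The subsequent missing-data analysis (e.g., multiple imputation of B) is not shown.

```python
import random


def build_combined(observed, m, beta0, beta1, sigma, seed=0):
    """Stack n observed rows with m synthetic rows drawn from a known Y | X model.

    Hypothetical known model: Y = beta0 + beta1 * X + N(0, sigma^2).
    Synthetic X values are resampled from the observed X values, and
    B is left missing (None) in every synthetic row.
    """
    rng = random.Random(seed)
    xs = [row["x"] for row in observed]
    synthetic = []
    for _ in range(m):
        x = rng.choice(xs)
        y = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        synthetic.append({"x": x, "b": None, "y": y})
    return observed + synthetic


# hypothetical individual-level data with Y, X and B all observed
observed = [
    {"x": 0.5, "b": 1.0, "y": 1.2},
    {"x": 1.5, "b": 0.0, "y": 2.9},
    {"x": 2.0, "b": 1.0, "y": 4.1},
]
combined = build_combined(observed, m=5, beta0=0.2, beta1=1.8, sigma=0.5)
print(len(combined), sum(row["b"] is None for row in combined))  # 8 5
```

The combined list has n + m rows, and B is missing exactly in the m synthetic ones, which is the structure a missing-data method would then be applied to.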

3.
This article derives the likelihood ratio statistic for testing independence between (X1, …, Xr) and (Xr+1, …, Xk), under the assumption that (X1, …, Xk) has a multivariate normal distribution and that a sample of size n is available, where all components are observed for N observation vectors, while for the remaining M = n − N observation vectors the data on the last q components, (Xk−q+1, …, Xk), are missing (k − q ≥ r).
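For orientation, the complete-data version of this test (all n vectors fully observed) uses the ratio Λ = |S| / (|S11| |S22|), where S is the sample covariance matrix partitioned after the first r components; the likelihood ratio is Λ^(n/2), and −n log Λ is approximately χ² with r(k − r) degrees of freedom for large n. The minimal Python sketch below computes only Λ; it does not implement the article's statistic for the case with missing last q components.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in a]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def independence_lambda(s, r):
    """Ratio |S| / (|S11| |S22|) for the blocks (1..r) and (r+1..k)."""
    s11 = [row[:r] for row in s[:r]]
    s22 = [row[r:] for row in s[r:]]
    return det(s) / (det(s11) * det(s22))


s = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 3.0]]
print(independence_lambda(s, r=2))  # 1.0 here: the two blocks are uncorrelated
```

Values of Λ near 1 support independence; smaller values are evidence against it.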

4.
In a parametric regression model, we consider the problem of missing covariates under missing at random (MAR). It is often desirable to use flexible parametric or semiparametric models for the covariate distribution, which can reduce potential misspecification. Recently, a completely nonparametric approach was developed by [H.Y. Chen, Nonparametric and semiparametric models for missing covariates in parametric regression, J. Amer. Statist. Assoc. 99 (2004), pp. 1176–1189; Z. Zhang and H.E. Rockette, On maximum likelihood estimation in parametric regression with missing covariates, J. Statist. Plann. Inference 47 (2005), pp. 206–223]. Although this approach requires no model for either the covariate distribution or the missing-data mechanism, it assumes that the covariate distribution is supported only on the observed values. Consequently, the resulting estimator is a restricted maximum likelihood estimator (MLE) rather than the global MLE. In this article, we show that the restricted semiparametric MLE can be very misleading in some cases. We discuss why this problem occurs and suggest an algorithm for obtaining the global MLE. We then assess the performance of the proposed method through simulation experiments.

5.
Missing values are common in longitudinal studies. The missing-data mechanism is termed non-ignorable (NI) if the probability of missingness depends on the unobserved (missing) values. This paper presents a model for ordinal categorical longitudinal data with NI non-monotone missing values. We assume two separate models, one for the response and one for the missingness process: the response is modeled by an ordinal logistic model, and a binary logistic model is used for the missingness process. We employ these models in the framework of so-called shared-parameter models, where the outcome and missing-data models are connected by a common set of random effects. In longitudinal data, with or without missing values, the random effects are commonly assumed to be normally distributed. This can be extremely restrictive in practice and may lead to misleading statistical inferences. In this paper, we instead adopt a more flexible alternative, the skew-normal distribution. The methodology is illustrated through an application to the Schizophrenia Collaborative Study data [19] (D. Hedeker, Generalized linear mixed models, in Encyclopedia of Statistics in Behavioral Science, B. Everitt and D. Howell, eds., John Wiley, London, 2005, pp. 729–738) and a simulation.
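As a minimal sketch of the random-effect distribution only (not the full shared-parameter model), a standard skew-normal draw can be generated with the usual stochastic representation Z = δ|U0| + √(1 − δ²) U1, where δ = α/√(1 + α²) and U0, U1 are independent N(0, 1); its mean is δ√(2/π), and α = 0 gives back the normal case. In a shared-parameter model, a subject-level draw of this kind would enter both the ordinal-logit predictor for the response and the binary-logit predictor for missingness; that part is omitted here.

```python
import math
import random


def skew_normal(alpha, rng):
    """One draw from the standard skew-normal SN(alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def skew_normal_mean(alpha):
    """Theoretical mean delta * sqrt(2 / pi) of SN(alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return delta * math.sqrt(2.0 / math.pi)


rng = random.Random(1)
draws = [skew_normal(3.0, rng) for _ in range(50_000)]
print(sum(draws) / len(draws), skew_normal_mean(3.0))
```

A positive α skews the random effects to the right, which a normal random-effects model cannot represent.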

6.
Using the framework proposed by Bickel et al. (2006) (Bickel, P. J., Ritov, Y., Stoker, T. (2006). Tailor-made tests for goodness-of-fit to semiparametric hypotheses. Ann. Stat. 34(2): 721–741), we provide a score-based method for testing the exclusion restriction in quantile regression, i.e., H: ν_α(Y|U, V) = ν_α(Y|U) w.p.1, where ν_α denotes the αth (0 < α < 1) quantile. A subsampling method is proposed, and justified, for obtaining the critical values. The tests are all found to be consistent against fixed alternatives and to have discriminating power against local alternatives at the root-n scale. We treat this particular problem as a representative of a wide family of semiparametric model-checking problems. The methodology carries over to goodness-of-fit testing of other semiparametric models, possibly involving nonsmooth functions.

7.
8.
We consider the semiparametric regression model introduced by Duan and Li [1] (Duan, N. and Li, K. C. 1991. Slicing regression: a link-free regression method. The Annals of Statistics, 19: 505–530), in which the dependent variable y is linked to the index x′β through an unknown link function. Duan and Li [1] and Li [2] (Li, K. C. 1991. Sliced inverse regression for dimension reduction, with discussions. Journal of the American Statistical Association, 86: 316–342) present slicing methods (the sliced inverse regression methods SIR-I, SIR-II and SIRα) to estimate the direction of the unknown slope parameter β. These methods are computationally simple and fast but depend on an arbitrary slicing chosen by the user. When the sample size is small, the number and positions of the slices influence the estimated direction. In this paper, we suggest using the corresponding pooled slicing methods: PSIR-I (proposed by Aragon and Saracco [3] (Aragon, Y. and Saracco, J. 1997. Sliced Inverse Regression (SIR): an appraisal of small sample alternatives to slicing. Computational Statistics, 12: 109–130)), PSIR-II and PSIRα. These methods combine the results from a number of slicings. We compare the behaviour of the slicing and pooled slicing methods in simulations. We also propose a practical choice of α for the SIRα and PSIRα methods.
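The slicing idea behind SIR-I, and one way of pooling over several slicings, can be sketched in plain Python. Assumptions to flag: the pooled variant here simply averages the SIR-I matrices obtained with different numbers of slices, which may differ from the exact PSIR-I combination of Aragon and Saracco; the simulated model (two standard normal predictors, β = (1, 2), link exp(0.5·u)) is hypothetical; and the top eigenvector of Σ⁻¹M is found by power iteration rather than a library eigensolver.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def covariance(xs):
    n, p = len(xs), len(xs[0])
    mean = [sum(x[j] for x in xs) / n for j in range(p)]
    return [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in xs) / n
             for b in range(p)] for a in range(p)]


def sir_matrix(xs, ys, n_slices):
    """SIR-I: weighted covariance of within-slice means of x, slicing on sorted y."""
    n, p = len(xs), len(xs[0])
    mean = [sum(x[j] for x in xs) / n for j in range(p)]
    order = sorted(range(n), key=lambda i: ys[i])
    m = [[0.0] * p for _ in range(p)]
    for h in range(n_slices):
        idx = order[h * n // n_slices:(h + 1) * n // n_slices]
        d = [sum(xs[i][j] for i in idx) / len(idx) - mean[j] for j in range(p)]
        w = len(idx) / n
        for a in range(p):
            for b in range(p):
                m[a][b] += w * d[a] * d[b]
    return m


def leading_direction(sigma, m, iters=200):
    """Power iteration for the top eigenvector of Sigma^{-1} M."""
    v = [1.0] * len(m)
    for _ in range(iters):
        mv = [sum(mab * vb for mab, vb in zip(row, v)) for row in m]
        v = solve(sigma, mv)
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
    return v


def sir_direction(xs, ys, slice_counts):
    """One slice count gives SIR-I; several give a pooled (averaged) estimate."""
    ms = [sir_matrix(xs, ys, h) for h in slice_counts]
    p = len(xs[0])
    pooled = [[sum(mm[a][b] for mm in ms) / len(ms) for b in range(p)]
              for a in range(p)]
    return leading_direction(covariance(xs), pooled)


rng = random.Random(0)
beta = [1.0, 2.0]
xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
# the link exp(.) is treated as unknown; only the direction of beta is estimated
ys = [math.exp(0.5 * (beta[0] * x[0] + beta[1] * x[1])) + 0.1 * rng.gauss(0.0, 1.0)
      for x in xs]
b_sir = sir_direction(xs, ys, [10])
b_pooled = sir_direction(xs, ys, [5, 10, 20])
```

Both estimates should be close to ±β/|β|; the sign of an estimated direction is arbitrary, so compare absolute cosines.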

9.
In the literature, there are only a few reports of goodness-of-fit tests for logistic regression models derived specifically for case-control studies. In this article, we propose a goodness-of-fit test for logistic regression models in stratified case-control studies using an empirical likelihood approach. The proposed statistic is an alternative to the statistic G_o recently proposed by Arbogast and Lin (2005) (Arbogast, P. G., Lin, D. Y. (2005). Model-checking techniques for stratified case-control studies. Statist. Med. 24: 229–247). Simulation results show that the proposed statistic is often slightly more powerful than G_o, although the performances of the two are always close. Moreover, our method is easy to implement, since the usual stratified logistic regression procedures available in many statistical software packages can be used. Some asymptotic results and applications of the proposed statistic to two real datasets are also presented.

10.
Case-control data are often used in medical applications, and most studies have applied parametric logistic regression to analyze such data. In this study, we investigate a semiparametric model for case-control data that relaxes the linearity assumption on risk factors by using a partial smoothing spline model. For faster computation, we extend to case-control studies the lower-dimensional approximation approach that Gu and Kim [4] (Gu, C. and Kim, Y.-J. 2002. Penalized likelihood regression: General formulation and efficient approximation. Canad. J. Statist., 30: 619–628) developed for penalized likelihood regression. Simulations were conducted to evaluate the performance of the method with selected smoothing parameters and to compare it with existing methods. The method was applied to Korean gastric cancer case-control data to estimate simultaneously the nonparametric probability function of age and the regression parameters of other categorical risk factors. The method could be used in preliminary studies involving large data sets to determine whether a risk factor calls for a flexible functional form in semiparametric logistic regression.

11.
Suppose the random vector (X, Y) satisfies the regression model Y = m(X) + σ(X)ε, where m(·) and σ(·) are unknown location and scale functions and ε is independent of X. The response Y is subject to random right censoring, and the covariate X is completely observed. A new test is proposed for a specific parametric form of any scale function σ(·), including the standard deviation function. The test statistic is based on the distribution of the residuals obtained from the assumed regression model. Weak convergence of the corresponding process is obtained, and its finite-sample behaviour is studied via simulations. Finally, characteristics of the test are illustrated in the analysis of a fatigue data set.

12.
For the balanced two-way layout of a count response variable Y classified by fixed or random factors A and B, we address the problems of (i) testing for individual and interactive effects on Y of two fixed factors, and (ii) testing for the effect of a fixed factor in the presence of a random factor, and conversely. In case (i), we assume independent Poisson responses with μij = E(Y | A = i, B = j) = αiβjγij or αiβj, corresponding respectively to the multiplicative interactive and non-interactive cases. For case (ii) with factor A random, we derive a multivariate gamma-Poisson model by mixing over the random variable associated with each level of A. In each case, Neyman C(α) score tests are derived. We present simulation results, and apply the interaction test to a data set, to evaluate and compare the size and power of the score test for interaction between two fixed factors, the competing Poisson-based likelihood ratio test, and the F-tests based on the assumption that √(Y+1) or log(Y+1) is approximately normal. Our results provide strong evidence that the normal-theory F-tests are typically very far from nominal size, and that the likelihood ratio test is somewhat more liberal than the score test.
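The competing Poisson likelihood ratio test for interaction has a convenient closed form in a balanced layout: both the interactive and non-interactive models are constant within cells, so the statistic depends only on the cell totals Tij and reduces to G² = 2 Σij Tij log(Tij T.. / (Ti. T.j)), with (a − 1)(b − 1) degrees of freedom. The minimal Python sketch below computes it; it is not the Neyman C(α) score test proposed in the article.

```python
import math


def interaction_lrt(totals):
    """Poisson likelihood ratio statistic for the A x B interaction.

    totals[i][j] is the sum of the replicate counts in cell (i, j). Under
    no interaction (mu_ij = alpha_i * beta_j) the fitted cell totals are
    row_i * col_j / grand, so the statistic is G^2 on the table of totals.
    """
    rows = [sum(r) for r in totals]
    cols = [sum(c) for c in zip(*totals)]
    grand = sum(rows)
    g2 = 0.0
    for i, r in enumerate(totals):
        for j, t in enumerate(r):
            expected = rows[i] * cols[j] / grand
            if t > 0:  # 0 * log(0) is taken as 0
                g2 += 2.0 * t * math.log(t / expected)
    df = (len(totals) - 1) * (len(totals[0]) - 1)
    return g2, df


g2, df = interaction_lrt([[10, 0], [0, 10]])
print(g2, df)  # 40 * log(2), approximately 27.73, on 1 df
```

The statistic is compared with a χ² distribution on `df` degrees of freedom; per the abstract, this test tends to be somewhat more liberal than the score test.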

13.
We propose an ℓ1-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method assumes that the data are missing at random (MAR), which also covers the missing completely at random (MCAR) case. Implementing the method is non-trivial, because the observed negative log-likelihood is generally a complicated, non-convex function. We propose an efficient EM algorithm for the optimization, with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data.
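For reference, one way to write the kind of objective described here, assuming the common graphical-lasso form of the penalty (whether the diagonal of K is penalized is not stated in the abstract): with concentration matrix K = Σ⁻¹, o_i the set of observed coordinates of case i, and φ the multivariate normal density, each case's marginal on its observed coordinates is normal with mean μ_{o_i} and covariance (K⁻¹)_{o_i o_i}, so under MAR

```latex
(\hat{\mu}, \hat{K}) = \arg\min_{\mu,\; K \succ 0}
  \Big\{ -\sum_{i=1}^{n} \log \phi\big(x_{i,o_i};\ \mu_{o_i},\ (K^{-1})_{o_i o_i}\big)
  + \lambda \lVert K \rVert_1 \Big\}
```

The sub-blocks (K⁻¹)_{o_i o_i} are what make this observed-data objective non-convex in K; with no missing values it reduces to the usual convex graphical-lasso problem.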

14.
The linear model Y ~ N(Xb, σ²Σ) with a restriction R′b = M′u + c is considered, where X, R, M, Σ and c are known. Explicit formulae are obtained for the best linear unbiased estimator of K′b, for the F-test of the hypothesis K′b = W′v + a, and for simultaneous confidence intervals for the parameters K′i b, i = 1, …, s, where K = [K1, K2, …, Ks], W and a are known. None of the matrices X, Σ, R, M, K and W is required to have full rank, and the design X can be one-way or multi-way, complete or incomplete, balanced or unbalanced, connected or disconnected.

15.
We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. Probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data, as well as after the deleted data were replaced using (1) variable means, (2) principal component projections, and (3) the EM algorithm. The populations compared were (1) multivariate normal with covariance matrices Σ1 = Σ2, (2) multivariate normal with Σ1 ≠ Σ2, and (3) multivariate non-normal with Σ1 = Σ2. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly, but all were better than non-replacement.
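Replacement method (1), substituting variable means, is simple enough to show directly. A minimal Python sketch, assuming rows are lists with `None` marking a deleted value and every column has at least one observed value:

```python
def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    p = len(rows[0])
    means = []
    for j in range(p):
        vals = [r[j] for r in rows if r[j] is not None]
        means.append(sum(vals) / len(vals))
    return [[means[j] if r[j] is None else r[j] for j in range(p)] for r in rows]


filled = mean_impute([[1.0, None], [3.0, 4.0], [None, 8.0]])
print(filled)  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

Mean substitution leaves each column mean unchanged but shrinks the column's variance and its covariances with other variables, which is one reason to compare it with model-based alternatives such as the EM algorithm.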

16.
Let X be a normally distributed p-dimensional column vector with mean μ and positive definite covariance matrix Σ, and let Xα, α = 1, …, N, be a random sample of size N from this distribution. Partition X as (X1, X(2)′, X(3)′)′, where X1 is one-dimensional, X(2) is p1-dimensional and X(3) is p2-dimensional, so that 1 + p1 + p2 = p. Let ρ1 and ρ be the multiple correlation coefficients of X1 with X(2) and with (X(2)′, X(3)′)′, respectively. Write ρ2² = ρ² − ρ1². We shall consider the following two problems
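The population quantities in this setup can be computed directly from Σ: the squared multiple correlation of X1 with a subvector X_S is σ1S Σ_SS⁻¹ σS1 / σ11. The minimal Python sketch below, assuming X1 is the first coordinate, followed by the p1 coordinates of X(2) and then the p2 coordinates of X(3), returns ρ1², ρ² and ρ2² = ρ² − ρ1² for a hypothetical covariance matrix.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def squared_multiple_correlation(sigma, idx):
    """rho^2 of component 0 with the components in idx."""
    sub = [[sigma[a][b] for b in idx] for a in idx]
    cross = [sigma[0][a] for a in idx]
    w = solve(sub, cross)  # Sigma_SS^{-1} sigma_S1
    return sum(c * wi for c, wi in zip(cross, w)) / sigma[0][0]


def rho_squares(sigma, p1, p2):
    """Return (rho1^2, rho^2, rho2^2 = rho^2 - rho1^2)."""
    idx2 = list(range(1, 1 + p1))
    idx23 = list(range(1, 1 + p1 + p2))
    rho1_sq = squared_multiple_correlation(sigma, idx2)
    rho_sq = squared_multiple_correlation(sigma, idx23)
    return rho1_sq, rho_sq, rho_sq - rho1_sq


sigma = [[1.0, 0.5, 0.3],
         [0.5, 1.0, 0.0],
         [0.3, 0.0, 1.0]]
print(rho_squares(sigma, p1=1, p2=1))  # approximately (0.25, 0.34, 0.09)
```

In this example X(2) and X(3) are uncorrelated, so ρ2² is simply the squared correlation of X1 with X(3); in general it measures the increase in ρ² from adding X(3).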

17.
In comparing several regression lines E(yij) = αi + βi xij, i = 1, 2, …, k, j = 1, 2, …, ni, researchers are generally interested in the following five problems: whether the lines have (1) equal slopes, (2) equal intercepts, (3) coincidence, (4) a common intersection on the X-axis, and (5) a common intersection in the (X, Y)-plane. Problems (1)–(3) can be put into the framework of the general linear hypothesis, and the F-test can be used. However, problems (4) and (5) cannot be put into this framework because they involve ratios of parameters. Hence, in this paper we consider the generalized likelihood ratio test for these hypotheses. An application to an enzyme kinetics problem in aniline metabolism is demonstrated.
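Problems (4) and (5) involve ratios because line i crosses the X-axis at x = −αi/βi, and lines i and j intersect at x = −(αi − αj)/(βi − βj). As a minimal Python sketch of the point estimates only (not the generalized likelihood ratio test), one can fit each line by least squares and compare the estimated intersections; the data below are hypothetical, exact lines through (3, 0).

```python
def ols_line(xs, ys):
    """Least-squares intercept and slope for one group."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def x_intercept(alpha, beta):
    """Where alpha + beta * x crosses the X-axis: a ratio of parameters."""
    return -alpha / beta


def intersection(a1, b1, a2, b2):
    """Common point of two non-parallel lines a1 + b1*x and a2 + b2*x."""
    x = (a2 - a1) / (b1 - b2)
    return x, a1 + b1 * x


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
groups = [(xs, [2.0 * x - 6.0 for x in xs]),
          (xs, [0.5 * x - 1.5 for x in xs])]
fits = [ols_line(gx, gy) for gx, gy in groups]
print([x_intercept(a, b) for a, b in fits])  # both 3.0
print(intersection(*fits[0], *fits[1]))      # (3.0, 0.0)
```

With noisy data these ratio estimates vary from group to group, and deciding whether they share a common value is exactly what the likelihood ratio test in the article addresses.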

18.
This paper considers statistical inference for partially linear models Y = X′β + ν(Z) + ε when the linear covariate X is missing with a missingness probability π that depends on (Y, Z). We propose empirical likelihood-based statistics for constructing confidence regions for β and ν(z). The resulting empirical likelihood ratio statistics are shown to be asymptotically chi-squared distributed. The finite-sample performance of the proposed statistics is assessed through simulation experiments. The proposed methods are applied to a dataset from an AIDS clinical trial.

19.
For the general linear regression model Y = Xη + e, we construct small-sample exponentially tilted empirical confidence intervals for a linear parameter θ = aᵀη and for nonlinear functions of η. The coverage error of the intervals is Op(1/n), as shown in Tingley and Field (1990). The technique, though sample-based, does not require bootstrap resampling. The first step is to calculate an estimate of η; we have used a Mallows estimate. The algorithm applies whenever η is estimated as the solution of a system of equations with expected value 0. We include calculations of the relative efficiency of the estimator compared with the classical least-squares estimate. The intervals are compared with asymptotic intervals such as those in Hampel et al. (1986). We demonstrate that the procedure gives sensible intervals for small samples.

20.
Expressions are derived for the bias and variance associated with procedures frequently used to estimate partial regression coefficients in a linear model with two explanatory variables, x1 and x2, when values are missing on x2 only. The expressions are used to gain insight into the relative effectiveness of these procedures for handling more complex patterns of missing data.
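A small simulation illustrates the kind of bias such expressions quantify. The setup is hypothetical and not taken from the article: x1 ~ N(0, 1), x2 = 0.8x1 + 0.6z with z ~ N(0, 1), y = x1 + x2 + e, and half of the x2 values deleted completely at random. Complete-case least squares should stay near the true coefficient 1 for x1, whereas filling missing x2 with its observed mean pushes the x1 coefficient toward roughly 1.59 in large samples in this setup, because x1 absorbs part of the effect of the unobserved x2.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Least squares with an intercept via the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n=20_000, miss_rate=0.5, seed=0):
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x1]
    y = [a + b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]  # true slopes 1, 1
    missing = [rng.random() < miss_rate for _ in range(n)]      # MCAR, x2 only

    keep = [i for i in range(n) if not missing[i]]
    b_cc = ols([(x1[i], x2[i]) for i in keep], [y[i] for i in keep])

    mean_x2 = sum(x2[i] for i in keep) / len(keep)
    filled = [(x1[i], mean_x2 if missing[i] else x2[i]) for i in range(n)]
    b_mean = ols(filled, y)
    return b_cc, b_mean


b_cc, b_mean = simulate()
print("complete case [b0, b1, b2]:", [round(v, 2) for v in b_cc])
print("mean-imputed  [b0, b1, b2]:", [round(v, 2) for v in b_mean])
```

Under other missingness mechanisms, such as missingness depending on y, complete-case estimates can also be biased, so this example covers only the MCAR case.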


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417