Similar Documents (20 results)
1.
ABSTRACT

We introduce a score-type statistic to test for a non-zero regression coefficient when the relevant term involves a nuisance parameter present only under the alternative. Despite the non-regularity and complexity of the problem, and unlike previous approaches, the proposed test statistic does not require the nuisance parameter to be estimated. It is simple to implement, relying only on conventional distributions such as the Normal or t, and it is justified in the setting of probabilistic coherence. We focus on testing for the existence of a breakpoint in segmented regression, and illustrate the methodology with an analysis of DNA copy number aberrations and gene expression profiles from 97 breast cancer patients; simulations also show that the proposed test is more powerful than competitors previously discussed in the literature.

2.
ABSTRACT

Missing data are commonly encountered in self-reported measurements and questionnaires. It is crucial to treat missing values with an appropriate method to avoid bias and loss of power. Various imputation methods exist, but it is not always clear which is preferred for data with non-normal variables. In this paper, we compare four imputation methods: mean imputation, quantile imputation, multiple imputation, and quantile regression multiple imputation (QRMI), using both simulated and real data on factors affecting self-efficacy in breast cancer survivors. The results show an advantage for multiple imputation, especially QRMI, when the data are not normal.
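The mean-versus-quantile contrast above can be illustrated with a toy single-imputation sketch. This is not the paper's QRMI procedure (which is a multiple-imputation method); the function and data below are purely illustrative, showing why a quantile-based fill is less sensitive to skew than a mean fill.

```python
import statistics

def impute(values, method="mean"):
    """Fill None entries with the mean or the median (a quantile) of the
    observed values. A toy sketch of single imputation only; multiple
    imputation (as in the paper) would repeat this with random draws."""
    observed = [v for v in values if v is not None]
    if method == "mean":
        fill = statistics.mean(observed)
    elif method == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]

# Skewed toy data with missing entries: the mean fill is pulled upward
# by the outlier 100.0, while the median fill is not.
data = [1.0, 2.0, None, 3.0, None, 100.0]
print(impute(data, "mean"))    # fills with 26.5
print(impute(data, "median"))  # fills with 2.5
```

On non-normal (here, heavily skewed) data the mean fill of 26.5 is unrepresentative of the bulk of the observations, which is the intuition behind preferring quantile-based imputation when normality fails.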

3.
ABSTRACT

We propose parametric inferences for quantile event times, adjusted for covariates, with competing risks data. We develop parametric quantile inferences using parametric regression modelling of the cumulative incidence function, via both the cause-specific hazard and direct approaches. Maximum likelihood inference is developed for estimation of the cumulative incidence function and its quantiles, and we construct parametric confidence intervals for the quantiles. Simulation studies show that the proposed methods perform well. We illustrate the methods using early stage breast cancer data.

4.
This paper presents a methodology for model fitting and inference in the context of Bayesian models of the type f(Y | X, θ) f(X | θ) f(θ), where Y is the (set of) observed data, θ is a set of model parameters, and X is an unobserved (latent) stationary stochastic process induced by the first-order transition model f(X(t+1) | X(t), θ), where X(t) denotes the state of the process at time (or generation) t. The crucial feature of this type of model is that, given θ, the transition model f(X(t+1) | X(t), θ) is known, but the distribution of the stochastic process in equilibrium, f(X | θ), is intractable except in very special cases, and hence unknown. A further point to note is that the data Y are assumed to be observed when the underlying process is in equilibrium; in other words, the data are not collected dynamically over time. We refer to such a specification as a latent equilibrium process (LEP) model. It is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model. As a hierarchical specification, it is natural to fit the LEP within a Bayesian framework. Fitting such models is usually done via Markov chain Monte Carlo (MCMC). However, we demonstrate that, in the case of LEP models, implementation of MCMC is far from straightforward. The main contribution of this paper is to provide a methodology for implementing MCMC for LEP models. We demonstrate our approach on population genetics problems with both simulated and real data sets.
The resulting model fitting is computationally intensive, and we therefore also discuss parallel implementation of the procedure in special cases.
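For readers unfamiliar with the MCMC machinery the abstract refers to, the standard building block is a random-walk Metropolis sampler, sketched minimally below. This is only the generic ingredient; the paper's actual contribution, handling the intractable equilibrium density f(X | θ), is not reproduced here, and the standard-normal target is an arbitrary illustration.

```python
import math
import random

def metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Generic random-walk Metropolis sampler: propose x' = x + N(0, step^2),
    accept with probability min(1, target(x')/target(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    lp = log_target(x)
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_target(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Sample from a standard normal target (log density -x^2/2, up to a constant)
# and check the empirical mean is near 0 and the variance near 1.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 2), round(var, 2))
```

In the LEP setting the difficulty is precisely that the target f(X | θ) cannot be evaluated, so this plain accept/reject step is unavailable without the modifications the paper develops.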

5.
The Buckley–James estimator (BJE) [J. Buckley and I. James, Linear regression with censored data, Biometrika 66 (1979), pp. 429–436] has been extended from right-censored (RC) data to interval-censored (IC) data by Rabinowitz et al. [D. Rabinowitz, A. Tsiatis, and J. Aragon, Regression with interval-censored data, Biometrika 82 (1995), pp. 501–513]. The BJE is defined to be a zero-crossing of a modified score function H(b), that is, a point at which H(·) changes sign. We discuss several approaches for finding a BJE with IC data that extend existing algorithms for RC data. However, these extensions may not be appropriate for some data; in particular, they are not appropriate for a cancer data set that we are analysing. In this note, we present a feasible iterative algorithm for obtaining a BJE and apply the method to our data.

6.
ABSTRACT

Transformation of the response is a popular way to meet the usual assumptions of statistical methods based on linear models, such as ANOVA and the t-test. In this paper, we introduce new families of transformations for proportion or percentage data. Most transformations for proportions require 0 < x < 1 (where x denotes the proportion), which is often not the case in real data. The proposed families of transformations allow x = 0 and x = 1. We study the properties of the proposed transformations, as well as their performance in achieving normality and homoscedasticity. We analyze three real data sets to show empirically how the new transformations perform in meeting the usual assumptions, and a simulation study examines the behavior of the new families of transformations.
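The boundary problem the abstract describes is easy to demonstrate with two classical transformations. The paper's proposed families are not specified in the abstract and are not reproduced here; the sketch below only contrasts the logit, which breaks at x = 0 and x = 1, with the arcsine square-root, which is defined on the closed interval [0, 1].

```python
import math

def logit(x):
    """Classical logit transform; undefined at the boundaries x = 0 and x = 1."""
    return math.log(x / (1.0 - x))

def arcsine_sqrt(x):
    """Arcsine square-root transform; well defined on the closed interval
    [0, 1], including the boundary values that occur in real data."""
    return math.asin(math.sqrt(x))

# The boundary cases that motivate transformations allowing x = 0 and x = 1:
for x in (0.0, 0.25, 1.0):
    print(x, round(arcsine_sqrt(x), 4))
# logit(0.0) raises a math domain error instead:
try:
    logit(0.0)
except ValueError as err:
    print("logit fails at 0:", err)
```

In practice the logit is often rescued by ad hoc offsets such as log((x + c)/(1 - x + c)); transformations that natively accept 0 and 1, as proposed in the paper, avoid choosing such a constant.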

7.
A segmented line regression model has been used to describe changes in cancer incidence and mortality trends [Kim, H.-J., Fay, M.P., Feuer, E.J. and Midthune, D.N., 2000, Permutation tests for joinpoint regression with applications to cancer rates. Statistics in Medicine, 19, 335–351; Kim, H.-J., Fay, M.P., Yu, B., Barrett, M.J. and Feuer, E.J., 2004, Comparability of segmented line regression models. Biometrics, 60, 1005–1014]. The least squares fit can be obtained either by the grid search method proposed by Lerman [Lerman, P.M., 1980, Fitting segmented regression models by grid search. Applied Statistics, 29, 77–84], implemented in Joinpoint 3.0 available at http://srab.cancer.gov/joinpoint/index.html, or by the continuous fitting algorithm proposed by Hudson [Hudson, D.J., 1966, Fitting segmented curves whose join points have to be estimated. Journal of the American Statistical Association, 61, 1097–1129], which will be implemented in the next version of the Joinpoint software. Following the least squares fitting of the model, inference on the parameters can be pursued using the asymptotic results of Hinkley [Hinkley, D.V., 1971, Inference in two-phase regression. Journal of the American Statistical Association, 66, 736–743] and Feder [Feder, P.I., 1975a, On asymptotic distribution theory in segmented regression problems: identified case. The Annals of Statistics, 3, 49–83; Feder, P.I., 1975b, The log likelihood ratio in segmented regression. The Annals of Statistics, 3, 84–97]. Via simulations, this paper empirically examines the small-sample behavior of these asymptotic results, studies how the two fitting methods, grid search and Hudson's algorithm, affect these inferential procedures, and assesses the robustness of the asymptotic inferential procedures.
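A minimal version of the grid search idea (in the spirit of Lerman, 1980) can be sketched for a single joinpoint: for each candidate breakpoint c, fit the linear model y ~ 1 + x + (x − c)+ by least squares and keep the c with the smallest residual sum of squares. The solver and toy data below are illustrative assumptions, not the Joinpoint implementation.

```python
def fit_ls(X, y):
    """Solve the normal equations X'X b = X'y by Gaussian elimination
    with partial pivoting (X given as a list of rows)."""
    n, k = len(y), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

def grid_search_joinpoint(x, y, candidates):
    """For each candidate joinpoint c, fit y ~ 1 + x + (x - c)+ and
    return (rss, c, coefficients) for the best-fitting c."""
    best = None
    for c in candidates:
        X = [[1.0, xi, max(xi - c, 0.0)] for xi in x]
        coef = fit_ls(X, y)
        rss = sum((yi - sum(b * v for b, v in zip(coef, row))) ** 2
                  for row, yi in zip(X, y))
        if best is None or rss < best[0]:
            best = (rss, c, coef)
    return best

# Noise-free toy data with a slope change at x = 5: slope 1 before, 3 after.
x = [float(i) for i in range(11)]
y = [xi if xi <= 5 else 5 + 3 * (xi - 5) for xi in x]
rss, c_hat, coef = grid_search_joinpoint(x, y, candidates=[3.0, 4.0, 5.0, 6.0, 7.0])
print(c_hat)  # 5.0
```

Hudson's continuous fitting algorithm differs in that it treats the joinpoint as a free parameter rather than restricting it to a grid, which is why the two methods can lead to slightly different inferential behavior, the comparison the paper studies.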

8.
ABSTRACT

Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical problems. Here we consider a versatile U-statistics-based approach to non-parametric clustering that allows an unconventional way of solving these problems. We propose a statistical test to assess group homogeneity, taking into account multiple testing issues, and a clustering algorithm based on dissimilarities within and between groups that greatly speeds up the homogeneity test. We also propose a test to verify the significance of classifying a sample into one of two groups. We present Monte Carlo simulations that evaluate the size and power of the proposed tests under different scenarios. Finally, the methodology is applied to three genetic data sets: global human genetic diversity, breast tumour gene expression, and Dengue virus serotypes. These applications showcase the framework's ability to answer diverse biological questions in the high-dimension, low-sample-size scenario while adapting to the specificities of the different data types.

9.
ABSTRACT

We consider parameter estimation from truncated data, where the data obey truncated exponential distributions with a variety of truncation times: a_1 observations are obtained under truncation time b_1, a_2 observations under truncation time b_2, and so on, while the underlying exponential distribution is the same throughout. The purpose of the present paper is to give existence conditions for the maximum likelihood estimators (MLEs) and to show some properties of the MLEs in two cases: (1) grouped and truncated data (that is, each datum is the number of values falling in a corresponding subinterval), and (2) continuous and truncated data.
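For the continuous-data case, the MLE of the exponential rate can be sketched directly. Under truncation time b, the density is λe^(−λx)/(1 − e^(−λb)) on [0, b], so the score for a sample of (x, b) pairs is n/λ − Σx − Σ b·e^(−λb)/(1 − e^(−λb)). The sketch below, with assumed toy data, solves the score equation by bisection; the paper's existence conditions correspond to the score actually changing sign (as λ → 0 the score tends to Σb/2 − Σx, which must be positive).

```python
import math

def truncated_exp_score(lam, data):
    """Score (d/d lambda of the log-likelihood) for right-truncated
    exponential data; `data` is a list of (x, b) pairs with 0 <= x <= b."""
    s = len(data) / lam - sum(x for x, _ in data)
    for _, b in data:
        s -= b * math.exp(-lam * b) / (1.0 - math.exp(-lam * b))
    return s

def mle_rate(data, lo=1e-6, hi=1e3, tol=1e-10):
    """Bisection on the monotone score. The MLE exists only when the score
    changes sign on the bracket -- the existence issue the paper studies."""
    if truncated_exp_score(lo, data) <= 0 or truncated_exp_score(hi, data) >= 0:
        raise ValueError("no interior MLE in the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_exp_score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data: observations under two truncation times, b = 1 and b = 2.
data = [(0.1, 1.0), (0.2, 1.0), (0.4, 1.0), (0.3, 2.0), (0.5, 2.0), (1.0, 2.0)]
lam_hat = mle_rate(data)
print(round(lam_hat, 2))
```

If the sample mean exceeds the average midpoint Σb/(2n), the score is negative everywhere and `mle_rate` raises, mirroring the non-existence case covered by the paper's conditions.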

10.
In partly linear models, the dependence of the response y on (x^T, t) is modelled through the relationship y = x^T β + g(t) + ε, where ε is independent of (x^T, t). We are interested in developing an estimation procedure that combines the flexibility of partly linear models, studied by several authors, while allowing some variables to belong to a non-Euclidean space. The motivating application of this paper concerns explaining atmospheric SO2 pollution incidents using these models when some of the predictive variables take values on a cylinder. In this paper, estimators of β and g are constructed when the explanatory variables t take values on a Riemannian manifold, and the asymptotic properties of the proposed estimators are obtained under suitable conditions. We illustrate this estimation approach on an environmental data set and explore the performance of the estimators through a simulation study.

11.
Current statistical methods for analyzing epidemiological data with disease subtype information allow us to learn not only about risk factor–disease subtype associations but also about heterogeneity in these associations across multiple disease characteristics (the so-called etiologic heterogeneity of the disease). Current interest, particularly in cancer epidemiology, lies in obtaining a valid p-value for testing whether a particular cancer is etiologically heterogeneous. We consider the two-stage logistic regression model with pseudo-conditional likelihood estimation and design a testing strategy based on Rao's score test. An extensive Monte Carlo simulation study investigates the false discovery rate and statistical power of the suggested test. Simulation results indicate that, with the proposed testing strategy, even a small degree of true etiologic heterogeneity can be recovered from the sampled data with large statistical power. The strategy is then applied to a breast cancer data set to illustrate its use in practice when there are multiple risk factors and multiple disease characteristics of simultaneous concern.

12.
By entering the data (y_i, x_i) followed by (−y_i, −x_i), one can obtain an intercept-free regression Y = Xβ + ε from a program package that normally uses an intercept term. There is no bias in the resulting regression coefficients, but a minor post-analysis adjustment is needed to the residual variance and standard errors.
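The trick is easy to verify numerically: mirroring the data makes both sample means exactly zero, so the fitted intercept vanishes and the slope equals the through-the-origin estimate Σxy/Σx². The sketch below uses a simple one-predictor regression with made-up data; it stands in for the abstract's "program package that normally uses an intercept term".

```python
def ols_with_intercept(x, y):
    """Simple linear regression y = a + b*x via the usual closed form."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b * xbar, b

# Intercept-free fit via data doubling: append (-y_i, -x_i) after (y_i, x_i).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.1]
x2, y2 = x + [-xi for xi in x], y + [-yi for yi in y]
a, b = ols_with_intercept(x2, y2)

# The mirrored data force the fitted line through the origin, and b equals
# the no-intercept slope sum(x*y)/sum(x*x) computed on the original data.
b_direct = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(abs(a) < 1e-9)           # True
print(abs(b - b_direct) < 1e-9)  # True
```

As the abstract notes, the package's reported residual variance and standard errors then need a post-analysis adjustment, since the doubled data set has 2n observations (and thus the wrong residual degrees of freedom) even though only n are real.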

13.
ABSTRACT

In this paper, we propose modified spline estimators for nonparametric regression models with right-censored data, especially when the censored response observations are converted to synthetic data. Efficient implementation of these estimators depends on the set of knot points and an appropriate smoothing parameter. We use three algorithms, the default selection method (DSM), myopic algorithm (MA), and full search algorithm (FSA), to select the optimum set of knots in a penalized spline method based on a smoothing parameter, which is chosen according to different criteria, including the improved version of the Akaike information criterion (AICc), generalized cross-validation (GCV), restricted maximum likelihood (REML), and the Bayesian information criterion (BIC). We also consider the smoothing spline (SS), which uses all the data points as knots. The main goal of this study is to compare the performance of the algorithm and criterion combinations in the suggested penalized spline fits under censored data. A Monte Carlo simulation study is performed and a real data example is presented to illustrate the ideas in the paper. The results confirm that the FSA slightly outperforms the other methods, especially for high censoring levels.

14.
Approximate confidence intervals are given for the lognormal regression problem. The error in the nominal level can be reduced to O(n^−2), where n is the sample size. An alternative procedure is given which avoids the non-robust assumption of lognormality. This amounts to finding a confidence interval based on M-estimates for a general smooth function of both β and F, where β denotes the parameters of the general (possibly nonlinear) regression problem and F is the unknown distribution function of the residuals. The derived intervals are compared using theory, simulations, and real data sets.

15.
ABSTRACT

Background: Many exposures in epidemiological studies have nonlinear effects, and the problem is to choose an appropriate functional relationship between such exposures and the outcome. One common approach is to investigate several parametric transformations of the covariate of interest and to select a posteriori the function that best fits the data. However, such an approach may inflate the Type I error. Methods: Through a simulation study, we generated data from Cox models with different transformations of a single continuous covariate. We investigated the Type I error rate and the power of the likelihood ratio test (LRT) corresponding to three procedures that considered the same set of parametric dose-response functions. The first, unconditional, approach did not involve any model selection, while the second, conditional, approach was based on a posteriori selection of the parametric function. The proposed third approach was similar to the second except that it used a corrected critical value for the LRT to ensure a correct Type I error. Results: The Type I error rate of the second approach was twice the nominal size. For simple monotone dose-response relationships, the corrected test had power similar to the unconditional approach, while for non-monotone dose-response relationships it had higher power. A real-life application focusing on the effect of body mass index on the risk of coronary heart disease death illustrates the advantage of the proposed approach. Conclusion: Our results confirm that a posteriori selection of the functional form of the dose-response relationship inflates the Type I error. The corrected procedure, which can be applied in a wide range of situations, may provide a good trade-off between Type I error and power.
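The selection-induced inflation, and the idea of a corrected critical value, can be sketched in a deliberately simplified form: if the analyst tests the best of k candidate transformations, the null reference is the maximum of k test statistics, not a single one. The sketch below assumes k independent chi-square(1) statistics purely for brevity; the paper's correction concerns Cox-model LRTs whose candidate statistics are correlated, so this is a stand-in, not the paper's procedure.

```python
import random

def max_chi2_quantile(k, n_sim=20000, alpha=0.05, seed=1):
    """Monte Carlo (1 - alpha) critical value for the maximum of k
    independent chi-square(1) statistics, simulated as squared normals.
    Independence is an assumption made only to keep the sketch short."""
    rng = random.Random(seed)
    maxima = sorted(max(rng.gauss(0, 1) ** 2 for _ in range(k))
                    for _ in range(n_sim))
    return maxima[int((1 - alpha) * n_sim)]

naive = 3.841  # chi-square(1) 95th percentile: valid only without selection
corrected = max_chi2_quantile(k=4)
print(corrected > naive)  # True: selecting among 4 candidates inflates the cutoff
```

Comparing the selected statistic to 3.841 as if no selection had occurred is exactly the conditional approach whose Type I error the paper found to be double the nominal size; the corrected approach replaces the naive cutoff with one calibrated to the selection.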

16.
Ipsilateral breast tumor relapse (IBTR) often occurs in breast cancer patients after breast conservation therapy. Classification of IBTR status (true local recurrence versus new ipsilateral primary tumor) is subject to error, and there is no widely accepted gold standard. Time to IBTR is likely informative for IBTR classification because a new primary tumor tends to have a longer mean time to IBTR and is associated with improved survival compared with true local recurrence. Moreover, some patients may die from breast cancer or other causes during follow-up, in a competing-risk scenario. Because the time to death can be correlated with the unobserved true IBTR status and the time to IBTR (if relapse occurs), this terminal mechanism is non-ignorable. In this paper, we propose a unified framework that addresses these issues simultaneously by modeling the misclassified binary outcome without a gold standard and the correlated time to IBTR, subject to dependent competing terminal events. We evaluate the proposed framework in a simulation study and apply it to a real data set of 4477 breast cancer patients. The adaptive Gaussian quadrature tools in the SAS procedure NLMIXED can conveniently be used to fit the proposed model. We expect broad applications of our model in other studies with a similar data structure.

17.
ABSTRACT

In actuarial applications, mixed Poisson distributions are widely used for modelling claim counts, as observed data on the number of claims often exhibit a variance noticeably exceeding the mean. In this study, a new claim number distribution is obtained by mixing the negative binomial parameter p, reparameterized as p = exp(−λ), with a gamma distribution. Basic properties of this new distribution are given. Maximum likelihood estimates of the parameters are computed using the Newton–Raphson method and a genetic algorithm (GA), and we compare the efficiency of these methods by simulation. A numerical example is provided.

18.
Partially linear regression models are semiparametric models that contain both linear and nonlinear components. They are extensively used in many scientific fields for their flexibility and convenient interpretability. In such analyses, testing the significance of the regression coefficients in the linear component is typically a key focus. In the high-dimensional setting, i.e., “large p, small n,” the conventional F-test strategy does not apply because the coefficients must be estimated through regularization techniques. In this article, we develop a new test using a U-statistic of order two, relying on a pseudo-estimate of the nonlinear component obtained from the classical kernel method. Using the martingale central limit theorem, we prove the asymptotic normality of the proposed test statistic under some regularity conditions. We further demonstrate the proposed test's finite-sample performance through simulation studies and an analysis of breast cancer gene expression data.

19.
The growth curve model Y_{n×p} = A_{n×m} ξ_{m×k} B_{k×p} + E_{n×p} is considered, where Y is an observation matrix, ξ is a matrix of unknown parameters, A is a known matrix of rank m, B is a known matrix of rank k with 1′ = (1, …, 1) as its first row, and the rows of E are independent, each distributed as N_p(0, Σ). The problem of constructing prediction intervals for future observations under this model is considered, and approximate intervals assuming different structures on Σ are derived. The results are illustrated with several data sets.

20.
For a multivariate linear model, Wilks' likelihood ratio test (LRT) is one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative hypothesis requires complex analytic approximations and, more importantly, these distributional approximations are feasible only for moderate dimensions of the dependent variable, say p ≤ 20. On the other hand, assuming that the data dimension p and the number q of regression variables are fixed while the sample size n grows, several asymptotic approximations have been proposed in the literature for Wilks' Λ, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension p and a large sample size n. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null hypothesis, and simulations demonstrate that the corrected LRT has very satisfactory size and power, certainly in the large-p, large-n context, but also for moderately large data dimensions such as p = 30 or p = 50. As a by-product, we give a reason why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple-sample significance test in multivariate analysis of variance which is valid for high-dimensional data.
