Similar Literature
20 similar documents found (search time: 796 ms)
1.
The multiple measurable biomarkers for a disease are used as indicators of the response variable of interest in order to monitor and model disease progression. However, subjects commonly drop out of such studies prematurely, resulting in unbalanced data and complicating inferences involving such data. In this paper we consider a case where data are unbalanced both among subjects and within a subject, because, for some reason, only a subset of the multiple outcomes of the response variable is observed at any one occasion. We propose a nonlinear mixed-effects model for the multivariate response variable data and derive a joint likelihood function that takes into account the partial dropout of the outcomes of the response variable. We further show how the methodology can be used to estimate the parameters that characterise HIV disease dynamics. An approximation technique for the parameters is also given and illustrated using a routine observational HIV dataset.

2.
The paper considers the property of global log-concavity of the likelihood function in discrete data models in which the data are observed in ‘grouped’ form, meaning that for some observations, while the actual value is unknown, the realisation of the discrete random variable is known to fall within a certain range of values. A typical likelihood contribution in this type of model is a sum of probabilities over a range of realisations. An important issue is whether the property of log-concavity in the ungrouped case carries over to the grouped counterpart; the paper finds, by way of a simple but relevant counter-example, that this is not always the case. However, in two cases of practical interest, namely the Poisson and geometric models, the property of log-concavity is preserved under grouping.
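The Poisson case can be probed numerically. The sketch below (an illustration, not taken from the paper) evaluates the grouped Poisson log-likelihood contribution log P(a ≤ Y ≤ b) on a grid of rate values and checks that its second differences are non-positive, consistent with the log-concavity said to be preserved for the Poisson model:

```python
import math

def grouped_poisson_loglik(lam, a, b):
    """Log-likelihood contribution of a grouped Poisson observation,
    log P(a <= Y <= b) = log sum_{k=a}^{b} exp(-lam) * lam**k / k!."""
    p = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(a, b + 1))
    return math.log(p)

# Probe concavity numerically: a concave function has non-positive
# second differences on any equally spaced grid.
h = 0.01
lams = [0.5 + h * i for i in range(1000)]
ll = [grouped_poisson_loglik(l, 2, 5) for l in lams]
second_diffs = [ll[i - 1] - 2 * ll[i] + ll[i + 1] for i in range(1, len(ll) - 1)]
print(max(second_diffs) <= 0)  # True: consistent with log-concavity
```

The same probe applied to other grouped models would expose the counter-example behaviour the paper describes.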

3.
This article presents a method for modeling endogenous selectivity in count data. As in the case of the switching-regression model, two regimes are distinguished with potentially different data-generating processes. The regime choice is allowed to be correlated with the observed count in each of the regimes. An estimable model is obtained by transforming the underlying processes to the bivariate normal distribution. An empirical application on trip count is provided.

4.
This paper presents a new measure of association. It is applicable to polytomies of either categorical or numerical type. It has the desirable property of being 0 if and only if the polytomies are independent. Its properties are studied and compared to those of existing measures. An interpretation of it is given. One situation where it is particularly useful is in measuring the ability to predict one polytomy given knowledge of the other. An example is given where the proposed measure is more relevant in describing the degree of association between two polytomies than are any of the existing measures. The corresponding sample quantity is presented and its asymptotic properties are studied. A discussion of its use in inference is given. The test for independence based on this measure is contrasted with the chi-square test.
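For context, the classical chi-square test of independence that the proposed measure is contrasted with can be computed in a few lines; this sketch implements the textbook statistic for a two-way table, not the article's new measure:

```python
# Classical chi-square statistic for independence in a two-way
# contingency table: sum over cells of (observed - expected)^2 / expected,
# with expected counts from the product of the marginals.
def chi_square_statistic(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# 2x2 example: degrees of freedom = (2-1)*(2-1) = 1
table = [[20, 30], [30, 20]]
print(chi_square_statistic(table))  # 4.0 for this table
```

The statistic is then compared with a chi-square quantile on (rows − 1)(cols − 1) degrees of freedom.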

5.
Score test of homogeneity for survival data
When subjects under follow-up are grouped into units, such as familial or spatial units, it may be of interest to test whether the groups are homogeneous (or independent given explanatory variables). The effect of the groups is modelled as random, and we consider a frailty proportional hazards model that allows adjustment for explanatory variables. We derive the score test of homogeneity from the marginal partial likelihood; it turns out to be the sum of a pairwise correlation term of martingale residuals and an overdispersion term. In the particular case where the group sizes are equal to one, this statistic can be used for testing overdispersion. The asymptotic variance of this statistic is derived using counting-process arguments. An extension to the case of several strata is given. The resulting test is computationally simple; its use is illustrated using both simulated and real data. The pairwise correlation term of this decomposition can be used to construct a statistic more robust to departures from the proportional hazards model, and the overdispersion term to construct a test of fit of the proportional hazards model.

6.
Clustering of Variables Around Latent Components
Clustering of variables around latent components is investigated as a means to organize multivariate data into meaningful structures. The coverage includes (i) the case where it is desirable to lump together correlated variables, no matter whether the correlation coefficient is positive or negative; (ii) the case where negative correlation indicates high disagreement among variables; (iii) an extension of the clustering techniques which makes it possible to explain the clustering of variables taking account of external data. The strategy consists of performing a hierarchical cluster analysis, followed by a partitioning algorithm. Both algorithms aim at maximizing the same criterion, which reflects the extent to which variables in each cluster are related to the latent variable associated with that cluster. Illustrations are outlined using real data sets from sensory studies.
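As a rough illustration of sign-blind variable clustering (case (i)), the sketch below groups variables whose squared correlation exceeds a threshold via union-find. This is a simplification, not the authors' criterion, which maximizes the link between each cluster's variables and its latent component:

```python
import random, math

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def cluster_variables(data, threshold=0.5):
    """Union-find grouping: variables i and j join one cluster when
    their squared correlation exceeds `threshold` (sign is ignored,
    matching case (i) in the text)."""
    p = list(range(len(data)))
    def find(i):
        while p[i] != i:
            p[i] = p[p[i]]; i = p[i]
        return i
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if corr(data[i], data[j]) ** 2 > threshold:
                p[find(i)] = find(j)
    return [find(i) for i in range(len(data))]

random.seed(1)
z1 = [random.gauss(0, 1) for _ in range(200)]
z2 = [random.gauss(0, 1) for _ in range(200)]
# v0 and v1 track the same latent z1 (v1 with a flipped sign); v2 tracks z2
v0 = [z + 0.1 * random.gauss(0, 1) for z in z1]
v1 = [-z + 0.1 * random.gauss(0, 1) for z in z1]
v2 = [z + 0.1 * random.gauss(0, 1) for z in z2]
labels = cluster_variables([v0, v1, v2])
print(labels)  # v0 and v1 share a label; v2 stands alone
```

Squaring the correlation is what lets strongly negatively correlated variables land in the same cluster.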

7.
The following life-testing situation is considered. At some time in the distant past, n objects, from a population with life distribution F, were put in use; whenever an object failed, it was promptly replaced. At some time τ, long after the start of the process, a statistician starts observing the n objects in use at that time; he knows the age of each of those n objects, and observes each of them for a fixed length of time T ≤ ∞, or until failure, whichever occurs first. In the case where T is finite, some of the observations may be censored; in the case where T = ∞, there is no censoring. The total life of an object in use at time τ is a length-biased observation from F. A nonparametric estimator of the (cumulative) hazard function is proposed, and is used to construct an estimator of F which is of the product-limit type. Strong uniform consistency results (for n → ∞) are obtained. An “Aalen-Johansen” identity, satisfied by any pair of life distributions and their (cumulative) hazard functions, is used in obtaining rate-of-convergence results.
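The product-limit construction the estimator builds on can be sketched for ordinary right-censored data (without the length-bias adjustment that is the paper's contribution); the function below computes the Nelson–Aalen cumulative hazard and the product-limit survival estimate:

```python
def nelson_aalen_and_product_limit(times, events):
    """times: observed times; events: 1 = failure, 0 = censored.
    Returns a list of (failure time, cumulative hazard H, survival S)."""
    data = sorted(zip(times, events))
    n = len(data)
    H, S = 0.0, 1.0
    out = []
    at_risk = n
    i = 0
    while i < n:
        t = data[i][0]
        d = c = 0                       # failures and censorings at t
        while i < n and data[i][0] == t:
            if data[i][1] == 1:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            H += d / at_risk            # Nelson-Aalen increment
            S *= 1 - d / at_risk        # product-limit factor
            out.append((t, H, S))
        at_risk -= d + c
    return out

# Sanity check: with no censoring, S reduces to the empirical
# survival function.
res = nelson_aalen_and_product_limit([1, 2, 3, 4], [1, 1, 1, 1])
print([round(s, 4) for _, _, s in res])  # [0.75, 0.5, 0.25, 0.0]
```

The paper's estimator replaces these standard increments with ones adjusted for the length-biased sampling of the objects in use at time τ.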

8.
The focused information criterion for model selection is constructed to select the model that best estimates a particular quantity of interest, the focus, in terms of mean squared error. We extend this focused selection process to the high-dimensional regression setting, with potentially more parameters than the sample size. We distinguish two cases: (i) the case where the considered submodel is of low dimension and (ii) the case where it is of high dimension. In the former case, we obtain an alternative expression of the low-dimensional focused information criterion that can be applied directly. In the latter case, we use a desparsified estimator that allows us to derive the mean squared error of the focus estimator. We illustrate the performance of the high-dimensional focused information criterion with a numerical study and a real dataset.

9.
An objective of randomized placebo-controlled preventive HIV vaccine efficacy trials is to assess the relationship between the vaccine effect to prevent infection and the genetic distance of the exposing HIV to the HIV strain represented in the vaccine construct. Motivated by this objective, a mark-specific proportional hazards (PH) model with a continuum of competing risks has recently been studied, in which the genetic distance of the transmitting strain is the continuous ‘mark’, defined and observable only in failures. A high percentage of the genetic marks of interest may be missing for a variety of reasons, predominantly because of the rapid evolution of HIV sequences after transmission and before a blood sample is drawn from which HIV sequences are measured. This research investigates the stratified mark-specific PH model with missing marks, where the baseline functions may vary with strata. We develop two consistent estimation approaches, the first based on the inverse probability weighted complete-case (IPW) technique, and the second based on augmenting the IPW estimator by incorporating auxiliary information predictive of the mark. We investigate the asymptotic properties and finite-sample performance of the two estimators, and show that the augmented IPW estimator, which satisfies a double robustness property, is more efficient.
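The IPW idea can be illustrated on a toy mean-estimation problem: when the mark is missing at random given an observed covariate, weighting complete cases by the inverse of their (here assumed known) observation probability removes the bias of the complete-case analysis. All numbers below are invented for illustration:

```python
import random

random.seed(7)

# Toy missing-at-random setup: mark v depends on covariate x, and the
# probability of observing v also depends on x.
n = 20000
data = []
for _ in range(n):
    x = random.random()
    v = 2.0 + x + random.gauss(0, 0.5)   # population mean of v is 2.5
    pi = 0.3 + 0.6 * x                   # known observation probability
    data.append((x, v if random.random() < pi else None, pi))

complete = [(v, pi) for _, v, pi in data if v is not None]

# The complete-case mean is biased upward because large-x (hence
# large-v) subjects are observed more often; IPW reweighting by 1/pi
# restores a consistent estimate of the population mean.
cc_mean = sum(v for v, _ in complete) / len(complete)
ipw_mean = sum(v / pi for v, pi in complete) / sum(1 / pi for _, pi in complete)
print(round(cc_mean, 3), round(ipw_mean, 3))
```

The augmented estimator in the paper additionally exploits covariates predictive of the mark, gaining efficiency and double robustness.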

10.
We define the odd log-logistic exponential Gaussian regression with two systematic components, which extends heteroscedastic Gaussian regression and is suitable for the bimodal data common in agriculture. We estimate the parameters by the method of maximum likelihood. Simulations indicate that the maximum-likelihood estimators are accurate. The model assumptions are checked through case deletion and quantile residuals. The usefulness of the new regression model is illustrated by means of three real data sets from different areas of agriculture in which the data present bimodality.

11.
Two-phase case–control studies cope with the problem of confounding by obtaining required additional information for a subset (phase 2) of all individuals (phase 1). Nowadays, studies with rich phase 1 data are available in which only a few unmeasured confounders need to be obtained in phase 2. The extended conditional maximum likelihood (ECML) approach in two-phase logistic regression is a novel method for analysing such data. Alternatively, two-phase case–control studies can be analysed by multiple imputation (MI), where phase 2 information for individuals included only in phase 1 is treated as missing. We conducted a simulation of two-phase studies in which we compared the performance of ECML and MI in typical scenarios with rich phase 1 data. Regarding the exposure effect, MI was less biased and more precise than ECML. Furthermore, ECML was sensitive to misspecification of the participation model. We therefore recommend MI for analysing two-phase case–control studies in situations with rich phase 1 data.

12.
This article discusses regression analysis of mixed interval-censored failure time data. Such data frequently occur across a variety of settings, including clinical trials, epidemiologic investigations, and many other biomedical studies with a follow-up component. For example, mixed failure times are commonly found in the two largest studies of long-term survivorship after childhood cancer, the datasets that motivated this work. However, most existing methods for failure time data consider only right-censored or only interval-censored failure times, not the more general case where times may be mixed. Additionally, among regression models developed for mixed interval-censored failure times, the proportional hazards formulation is generally assumed. It is well-known that the proportional hazards model may be inappropriate in certain situations, and alternatives are needed to analyze mixed failure time data in such cases. To fill this need, we develop a maximum likelihood estimation procedure for the proportional odds regression model with mixed interval-censored data. We show that the resulting estimators are consistent and asymptotically Gaussian. An extensive simulation study is performed to assess the finite-sample properties of the method, and this investigation indicates that the proposed method works well for many practical situations. We then apply our approach to examine the impact of age at cranial radiation therapy on risk of growth hormone deficiency in long-term survivors of childhood cancer.
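A minimal sketch of the proportional odds likelihood for mixed interval- and right-censored data is given below, assuming a hypothetical log-linear baseline odds function (the actual method estimates the baseline nonparametrically). An interval-censored observation on (L, R] contributes F(R|x) − F(L|x) to the likelihood, and a right-censored one contributes 1 − F(L|x):

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def po_cdf(t, x, a, b, beta):
    """Proportional odds model with a hypothetical log-linear baseline:
    logit F(t | x) = a + b*log(t) + beta*x."""
    return expit(a + b * math.log(t) + beta * x)

def log_likelihood(params, data):
    """Mixed censoring: each record (x, L, R) is interval-censored on
    (L, R]; R = None means right-censored at L."""
    a, b, beta = params
    ll = 0.0
    for x, L, R in data:
        FL = po_cdf(L, x, a, b, beta) if L > 0 else 0.0
        if R is None:
            ll += math.log(1.0 - FL)
        else:
            ll += math.log(po_cdf(R, x, a, b, beta) - FL)
    return ll

# Hypothetical records: one interval-censored, one right-censored,
# one interval-censored with covariate x = 1.
data = [(0.0, 1.0, 2.0), (1.0, 0.5, None), (1.0, 2.0, 3.0)]
print(round(log_likelihood((-1.0, 1.5, 0.5), data), 3))
```

Maximizing this function over the parameters (with a flexible baseline in place of the log-linear one) is the estimation problem the paper studies.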

13.
This article considers a class of estimators for the location and scale parameters in the location-scale model based on ‘synthetic data’ when the observations are randomly censored on the right. The asymptotic normality of the estimators is established using counting process and martingale techniques when the censoring distribution is known and unknown, respectively. In the case where the censoring distribution is known, we show that the asymptotic variances of this class of estimators depend on the data transformation and have a lower bound which is not achievable by this class of estimators. However, when the censoring distribution is unknown and estimated by the Kaplan–Meier estimator, this class of estimators has the same asymptotic variance and attains the lower bound for the case of known censoring distribution. This is different from censored regression analysis, where asymptotic variances depend on the data transformation. Our method has three valuable advantages over maximum likelihood estimation. First, our estimators are available in closed form and do not require an iterative algorithm. Second, simulation studies show that our moment-based estimators are comparable to maximum likelihood estimators and outperform them when the sample size is small and the censoring rate is high. Third, our estimators are more robust to model misspecification than maximum likelihood estimators. Therefore, our method can serve as a competitive alternative to maximum likelihood estimation for location-scale models with censored data. A numerical example is presented to illustrate the proposed method.

14.
We present upper and lower bounds for information measures, and use these to find the optimal design of experiments for Bayesian networks. The bounds are inspired by properties of the junction tree algorithm, which is commonly used for calculating conditional probabilities in graphical models like Bayesian networks. We demonstrate methods for iteratively improving the upper and lower bounds until they are sufficiently tight. We illustrate properties of the algorithm with tutorial examples, both for the case where we want to ensure optimality and for the case where the goal is an approximate solution with a guarantee. We further use the bounds to accelerate established algorithms for constructing useful designs. An example with petroleum fields in the North Sea is studied, where the design problem is related to exploration drilling campaigns. All of our examples consider binary random variables, but the theory can also be applied to other discrete or continuous distributions.

15.
Pettitt, A. N., Weir, I. S., and Hart, A. G. Statistics and Computing (2002) 12(4): 353–367
A Gaussian conditional autoregressive (CAR) formulation is presented that permits the modelling of spatial dependence and the dependence between multivariate random variables at irregularly spaced sites, so capturing some of the modelling advantages of the geostatistical approach. The model benefits not only from the explicit availability of the full conditionals but also from the computational simplicity of the precision matrix determinant calculation, which uses a closed form expression involving the eigenvalues of a precision matrix submatrix. The introduction of covariates into the model adds little computational complexity to the analysis, and thus the method can be straightforwardly extended to regression models. Because of its computational simplicity, the model is well suited to applications involving the fully Bayesian analysis of large data sets of multivariate measurements with a spatial ordering. An extension to spatio-temporal data is also considered. Here, we demonstrate use of the model in the analysis of bivariate binary data, where the observed data are modelled as the sign of the hidden CAR process. A case study involving over 450 irregularly spaced sites and the presence or absence of each of two species of rain forest trees at each site is presented; Markov chain Monte Carlo (MCMC) methods are implemented to obtain posterior distributions of all unknowns. The MCMC method works well with both simulated data and the tree biodiversity data set.
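The determinant trick can be illustrated on a simple special case. For a CAR precision of the (hypothetical) form Q = 2I − ρW on a cycle of n sites, W is circulant with eigenvalues 2cos(2πk/n), so det Q is available as a closed-form product; the sketch below checks this against a direct determinant computed by Gaussian elimination:

```python
import math

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[p][i] == 0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

# CAR precision on a cycle of n sites: Q = 2I - rho*W, where W is the
# cyclic adjacency matrix. W is circulant, so its eigenvalues are
# known in closed form: 2*cos(2*pi*k/n).
n, rho = 8, 0.4
W = [[1.0 if abs(i - j) in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
Q = [[(2.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]

direct = det(Q)
via_eigenvalues = math.prod(2.0 - 2.0 * rho * math.cos(2 * math.pi * k / n)
                            for k in range(n))
print(round(direct, 6), round(via_eigenvalues, 6))
```

The eigenvalue product is O(n) once the spectrum is known, which is what makes repeated determinant evaluation cheap inside MCMC.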

16.
Incorporating historical information into the design and analysis of a new clinical trial has been the subject of much discussion as a way to increase the feasibility of trials in situations where patients are difficult to recruit. The best method to include this data is not yet clear, especially in the case when few historical studies are available. This paper looks at the power prior technique afresh in a binomial setting and examines some previously unexamined properties, such as Box P values, bias, and coverage. Additionally, it proposes an empirical Bayes‐type approach to estimating the prior weight parameter by marginal likelihood. This estimate has advantages over previously criticised methods in that it varies commensurably with differences in the historical and current data and can choose weights near 1 when the data are similar enough. Fully Bayesian approaches are also considered. An analysis of the operating characteristics shows that the adaptive methods work well and that the various approaches have different strengths and weaknesses.
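The binomial power prior itself is conjugate, which the following sketch illustrates: raising a historical binomial likelihood to a weight δ in [0, 1] keeps the posterior in the Beta family. The trial counts are invented for illustration, and the weight is fixed here rather than estimated by marginal likelihood as the paper proposes:

```python
# Conjugate power prior for a binomial proportion: the historical
# likelihood raised to delta, times a Beta(a, b) prior, times the
# current likelihood, yields a Beta posterior in closed form.
def power_prior_posterior(y, n, y0, n0, delta, a=1.0, b=1.0):
    """Beta posterior parameters after downweighting historical data
    (y0 successes out of n0) by delta and adding current data (y of n)."""
    post_a = a + delta * y0 + y
    post_b = b + delta * (n0 - y0) + (n - y)
    return post_a, post_b

y, n = 12, 40          # hypothetical current trial
y0, n0 = 30, 100       # hypothetical historical trial
for delta in (0.0, 0.5, 1.0):
    pa, pb = power_prior_posterior(y, n, y0, n0, delta)
    print(delta, round(pa / (pa + pb), 4))   # posterior mean
```

Moving δ from 0 to 1 smoothly interpolates between ignoring and fully pooling the historical trial, which is the behaviour the adaptive weight estimate exploits.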

17.
We describe a mixed-effect hurdle model for zero-inflated longitudinal count data, where a baseline variable is included in the model specification. Association between the count data process and the endogenous baseline variable is modeled through a latent structure, assumed to be dependent across equations. We show how model parameters can be estimated in a finite mixture context, allowing for overdispersion, multivariate association and endogeneity of the baseline variable. The model behavior is investigated through a large-scale simulation experiment. An empirical example on health care utilization data is provided.
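Stripped of the mixed effects and latent structure, the hurdle idea for counts reduces to a two-part probability mass function: a point mass at zero plus a scaled zero-truncated count distribution. A minimal Poisson-hurdle sketch:

```python
import math

def hurdle_poisson_pmf(k, pi0, lam):
    """Hurdle Poisson: P(Y = 0) = pi0, and positive counts follow a
    zero-truncated Poisson(lam) scaled by (1 - pi0)."""
    if k == 0:
        return pi0
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return (1 - pi0) * poisson / (1 - math.exp(-lam))

# The two parts together form a proper distribution.
total = sum(hurdle_poisson_pmf(k, 0.4, 2.5) for k in range(100))
print(round(total, 10))  # the probabilities sum to 1
```

The paper builds on this by letting both parts depend on covariates and random effects, with the latent structure linking them to the endogenous baseline variable.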

18.
We consider the problem of sample size calculation for non-inferiority based on the hazard ratio in time-to-event trials where overall study duration is fixed and subject enrollment is staggered with variable follow-up. An adaptation of previously developed formulae for the superiority framework is presented that specifically allows for effect reversal under the non-inferiority setting, and its consequent effect on variance. Empirical performance is assessed through a small simulation study, and an example based on an ongoing trial is presented. The formulae are straightforward to program and may prove a useful tool in planning trials of this type.
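For orientation, a Schoenfeld-type required-events calculation for a non-inferiority hazard ratio test can be sketched as follows; this is a generic textbook formula, not necessarily the adapted formulae the paper develops, and it ignores the staggered-entry accrual component:

```python
import math
from statistics import NormalDist

def required_events(margin, true_hr=1.0, alpha=0.025, power=0.9, p=0.5):
    """Schoenfeld-type number of events for a one-sided non-inferiority
    test of the hazard ratio against `margin`, with allocation
    fraction p and assumed true hazard ratio `true_hr`."""
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return z * z / (p * (1 - p) * (math.log(margin) - math.log(true_hr)) ** 2)

# e.g. margin 1.3, true HR 1, one-sided alpha 0.025, 90% power
print(math.ceil(required_events(1.3)))
```

Translating the required events into a sample size then needs the accrual and follow-up assumptions that the paper's formulae handle explicitly.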

19.
This paper proposes a probabilistic frontier regression model for binary output data in a production process setup. We consider one of the two categories of outputs as the ‘selected’ category, and the reduction in the probability of falling in this category is attributed to a reduction in the technical efficiency (TE) of the decision-making unit. An efficiency measure is proposed to determine the deviations of individual units from the probabilistic frontier. Simulation results show that the average estimated TE component is close to its true value. An application of the proposed method to data on the Indian public sector banking system is provided, where the output variable indicates the level of non-performing assets. Individual TE is obtained for each of the banks under consideration. Among the public sector banks, Andhra Bank is found to be the most efficient, whereas the United Bank of India is the least efficient.

20.
An estimator of the ratio of scale parameters of the distributions of two positive random variables is developed for the case where the only difference between the distributions is a difference in scale. Simulation studies demonstrate that the estimator performs much better, in terms of mean squared error, than the most popular of the currently available estimators.
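For a feel of the problem, the sketch below simulates a naive benchmark: under a pure scale difference, the ratio of sample means estimates the scale ratio, and its mean squared error is the kind of yardstick against which a proposed estimator would be compared (the paper's estimator itself is not reproduced here):

```python
import random

random.seed(3)

# Under a pure scale difference, Y has the same distribution as
# theta * X, so the ratio of sample means is a simple benchmark
# estimator of theta.
theta = 2.0
estimates = []
for _ in range(500):
    x = [random.expovariate(1.0) for _ in range(100)]
    y = [theta * random.expovariate(1.0) for _ in range(100)]
    estimates.append((sum(y) / len(y)) / (sum(x) / len(x)))

mean_est = sum(estimates) / len(estimates)
mse = sum((e - theta) ** 2 for e in estimates) / len(estimates)
print(round(mean_est, 3), round(mse, 4))
```

An estimator that beats this benchmark in mean squared error across scale families is the kind of improvement the abstract claims.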


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) · 京ICP备09084417号