Similar literature
20 similar documents found.
1.
A survey is given of papers which have influenced or have been influenced by the Growth Curve Model due to Potthoff & Roy (1964). The review covers, among others, methods of estimating parameters, the canonical version of the model, tests, extensions, incomplete data, Bayesian approaches and covariance structures.
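For context, the Growth Curve (GMANOVA) model of Potthoff & Roy that the survey centres on can be written, in generic notation, as

$$\mathbf{Y}_{n \times p} = \mathbf{X}_{n \times m}\,\mathbf{B}_{m \times q}\,\mathbf{Z}_{q \times p} + \mathbf{E}, \qquad \text{rows of } \mathbf{E} \ \overset{iid}{\sim}\ N_p(\mathbf{0}, \boldsymbol{\Sigma}),$$

where Y collects p repeated measurements on each of n individuals, X is the between-individual (group) design matrix, Z is the within-individual design matrix (e.g. polynomials in time), and B holds the growth-curve parameters to be estimated.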

2.
Logistic regression is estimated by maximizing a log-likelihood objective function formulated under the assumption of maximizing overall accuracy. This assumption does not hold for imbalanced data: the resulting models tend to be biased towards the majority class (i.e. non-event), which can cause substantial losses in practice. One strategy for mitigating such bias is to penalize the misclassification costs of observations differently in the log-likelihood function. Existing solutions require either difficult hyperparameter tuning or high computational complexity. We propose a novel penalized log-likelihood function that includes penalty weights as decision variables for observations in the minority class (i.e. event) and learns them from the data along with the model coefficients. In the experiments, the proposed logistic regression model is compared with existing ones in terms of the area under the receiver operating characteristic (ROC) curve on 10 public datasets and 16 simulated datasets, as well as training time. A detailed analysis is conducted on an imbalanced credit dataset to examine the estimated probability distributions, additional performance measures (i.e. Type I and Type II error rates) and model coefficients. The results demonstrate that both the discrimination ability and the computational efficiency of logistic regression models are improved by using the proposed log-likelihood function as the learning objective.
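As a point of reference, the cost-sensitive strategy that the paper builds on can be sketched with fixed observation weights in the log-likelihood. The sketch below is a minimal illustration under that simpler scheme; the paper's contribution of learning minority-class weights jointly with the coefficients is not reproduced, and all names and data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def weighted_logistic_nll(beta, X, y, w):
    """Negative weighted log-likelihood: observation i contributes
    w[i] * [y_i * eta_i - log(1 + exp(eta_i))] with eta_i = x_i' beta."""
    eta = X @ beta
    return -(w * (y * eta - np.logaddexp(0.0, eta))).sum()

# Illustrative imbalanced data: roughly 10% events.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = (rng.random(500) < 0.1).astype(float)

# Fixed cost-sensitive weights: up-weight the minority (event) class.
w = np.where(y == 1, (y == 0).sum() / max((y == 1).sum(), 1), 1.0)

fit = minimize(weighted_logistic_nll, np.zeros(X.shape[1]), args=(X, y, w), method="BFGS")
print(fit.x)
```

With all weights equal to one this reduces to ordinary maximum likelihood; the proposed method instead treats the minority-class weights as additional decision variables.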

3.
Missing data, and the bias they can cause, are an almost ever‐present concern in clinical trials. The last observation carried forward (LOCF) approach has been frequently utilized to handle missing data in clinical trials, and is often specified in conjunction with analysis of variance (LOCF ANOVA) for the primary analysis. Considerable advances in statistical methodology, and in our ability to implement these methods, have been made in recent years. Likelihood‐based, mixed‐effects model approaches implemented under the missing at random (MAR) framework are now easy to implement, and are commonly used to analyse clinical trial data. Furthermore, such approaches are more robust to the biases from missing data, and provide better control of Type I and Type II errors than LOCF ANOVA. Empirical research and analytic proof have demonstrated that the behaviour of LOCF is uncertain, and in many situations it has not been conservative. Using LOCF as a composite measure of safety, tolerability and efficacy can lead to erroneous conclusions regarding the effectiveness of a drug. This approach also violates the fundamental basis of statistics as it involves testing an outcome that is not a physical parameter of the population, but rather a quantity that can be influenced by investigator behaviour, trial design, etc. Practice should shift away from using LOCF ANOVA as the primary analysis and focus on likelihood‐based, mixed‐effects model approaches developed under the MAR framework, with missing not at random methods used to assess robustness of the primary analysis. Copyright © 2004 John Wiley & Sons, Ltd.
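For readers unfamiliar with the mechanics, LOCF simply carries a subject's last observed value forward to later missed visits. A minimal pandas illustration on toy data (column names are invented):

```python
import pandas as pd

# Toy longitudinal data in long format; NaN marks a missed visit.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "visit":   [1, 2, 3, 1, 2, 3],
    "score":   [10.0, 12.0, None, 9.0, None, None],
})

# LOCF: within each subject, carry the last observed value forward in visit order.
df["score_locf"] = df.sort_values("visit").groupby("subject")["score"].ffill()
print(df)
```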

4.
Mixed model repeated measures (MMRM) is the most common analysis approach used in clinical trials for Alzheimer's disease and other progressive diseases measured with continuous outcomes over time. The model treats time as a categorical variable, which allows an unconstrained estimate of the mean for each study visit in each randomized group. Categorizing time in this way can be problematic when assessments occur off-schedule, as including off-schedule visits can induce bias, and excluding them ignores valuable information and violates the intention-to-treat principle. This problem has been exacerbated by clinical trial visits that have been delayed due to the COVID-19 pandemic. As an alternative to MMRM, we propose a constrained longitudinal data analysis with natural cubic splines that treats time as continuous and uses test version effects to model the mean over time. Compared to categorical-time models like MMRM and models that assume a proportional treatment effect, the spline model is shown to be more parsimonious and precise in real clinical trial datasets, and has better power and Type I error control in a variety of simulation scenarios.
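A minimal sketch of the "time as continuous" ingredient, using patsy's natural cubic regression spline basis; the full constrained model (baseline constraint, treatment-by-time structure) is not shown, and the visit times and degrees of freedom are illustrative.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical assessment times in weeks, including off-schedule visits.
visits = pd.DataFrame({"week": [0.0, 11.6, 24.3, 38.1, 52.9, 61.2]})

# Natural cubic spline basis in continuous time (df controls flexibility);
# this basis would replace the categorical visit factor used by MMRM.
time_basis = dmatrix("cr(week, df=3)", visits, return_type="dataframe")
print(time_basis)
```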

5.
Generalized additive mixed models are proposed for overdispersed and correlated data, which arise frequently in studies involving clustered, hierarchical and spatial designs. This class of models allows flexible functional dependence of an outcome variable on covariates by using nonparametric regression, while accounting for correlation between observations by using random effects. We estimate nonparametric functions by using smoothing splines and jointly estimate smoothing parameters and variance components by using marginal quasi-likelihood. Because maximizing the objective functions often requires numerical integration, double penalized quasi-likelihood is proposed for approximate inference. Frequentist and Bayesian inferences are compared. A key feature of the proposed method is that it allows systematic inference on all model components within a unified parametric mixed model framework and can be easily implemented by fitting a working generalized linear mixed model with existing statistical software. A bias correction procedure is also proposed to improve the performance of double penalized quasi-likelihood for sparse data. We illustrate the method with an application to infectious disease data and evaluate its performance through simulation.
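In generic notation, a generalized additive mixed model of the kind described here takes the form

$$g\{E(y_{ij} \mid \mathbf{b}_i)\} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + f_1(t_{1ij}) + \cdots + f_K(t_{Kij}) + \mathbf{z}_{ij}^{\top}\mathbf{b}_i, \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}(\boldsymbol{\theta})),$$

where g is a link function, the f_k are smooth functions estimated by smoothing splines, and the random effects b_i account for correlation within clusters.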

6.
It is generally considered that analysis of variance by maximum likelihood or its variants is computationally impractical, despite existing techniques for reducing the computational effort per iteration and the number of iterations to convergence. This paper shows that a major reduction in the overall computational effort can be achieved through the use of sparse-matrix algorithms that take advantage of the factorial designs that characterize most applications of large analysis-of-variance problems. In this paper, an algebraic structure for factorial designs is developed. Through this structure, it is shown that the required computations can be arranged so that sparse-matrix methods result in greatly reduced storage and time requirements.
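The paper's algebraic structure is not reproduced here, but the basic payoff of sparsity for factorial designs can be sketched with SciPy's sparse matrices (the factor sizes and main-effects-only design are arbitrary choices for illustration):

```python
import numpy as np
from scipy import sparse

def factor_indicator(levels, n_levels):
    """Sparse indicator (dummy) matrix for one factor: one column per level."""
    n = len(levels)
    return sparse.csr_matrix((np.ones(n), (np.arange(n), levels)), shape=(n, n_levels))

# Two crossed factors laid out on a full 200 x 50 factorial grid.
a = np.repeat(np.arange(200), 50)   # factor A level of each observation
b = np.tile(np.arange(50), 200)     # factor B level of each observation

Xa, Xb = factor_indicator(a, 200), factor_indicator(b, 50)
X = sparse.hstack([Xa, Xb]).tocsr()   # main-effects design matrix, mostly zeros
XtX = (X.T @ X).tocsc()               # cross-product needed by likelihood iterations
print(X.shape, X.nnz, XtX.nnz)        # storage stays proportional to the nonzeros
```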

7.
The statistical modeling of big databases constitutes one of the most challenging issues, especially nowadays. The issue becomes even more critical when the data have a complicated correlation structure. Variable selection plays a vital role in the statistical analysis of large databases, and many methods have been proposed to deal with this problem. One such method is Sure Independence Screening, which was introduced to reduce dimensionality to a relatively smaller scale. This method, though simple, produces remarkable results even for problems that are both ultra-high dimensional and large in sample size. In this paper we deal with the analysis of a big real medical data set assuming a Poisson regression model. We support the analysis with simulation experiments that take into account the correlation structure of the design matrix.
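A minimal sketch of an SIS-style screen for a Poisson model, ranking predictors by the magnitude of their marginal GLM coefficients and keeping the top few; this is a generic variant on simulated data, not necessarily the exact screening statistic used in the paper.

```python
import numpy as np
import statsmodels.api as sm

def sis_poisson(X, y, keep):
    """Fit a marginal Poisson GLM per column, rank by |slope|, keep the top `keep`."""
    scores = []
    for j in range(X.shape[1]):
        fit = sm.GLM(y, sm.add_constant(X[:, [j]]), family=sm.families.Poisson()).fit()
        scores.append(abs(fit.params[1]))
    return np.argsort(scores)[::-1][:keep]

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1000))                # p >> n
y = rng.poisson(np.exp(0.5 * X[:, 0] - 0.4 * X[:, 3]))
print(sis_poisson(X, y, keep=20))               # screened set for a full Poisson fit
```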

8.
We propose a flexible semiparametric stochastic mixed effects model for bivariate cyclic longitudinal data. The model can handle either single cycle or, more generally, multiple consecutive cycle data. The approach models the mean of responses by parametric fixed effects and a smooth nonparametric function for the underlying time effects, and the relationship across the bivariate responses by a bivariate Gaussian random field and a joint distribution of random effects. The proposed model not only can model complicated individual profiles, but also allows for more flexible within-subject and between-response correlations. The fixed effects regression coefficients and the nonparametric time functions are estimated using maximum penalized likelihood, where the resulting estimator for the nonparametric time function is a cubic smoothing spline. The smoothing parameters and variance components are estimated simultaneously using restricted maximum likelihood. Simulation results show that the parameter estimates are close to the true values. The fit of the proposed model on a real bivariate longitudinal dataset of pre-menopausal women also performs well, both for a single cycle analysis and for a multiple consecutive cycle analysis. The Canadian Journal of Statistics 48: 471–498; 2020 © 2020 Statistical Society of Canada

9.
The last observation carried forward (LOCF) approach is commonly utilized to handle missing values in the primary analysis of clinical trials. However, recent evidence suggests that likelihood‐based analyses developed under the missing at random (MAR) framework are sensible alternatives. The objective of this study was to assess the Type I error rates from a likelihood‐based MAR approach – mixed‐model repeated measures (MMRM) – compared with LOCF when estimating treatment contrasts for mean change from baseline to endpoint (Δ). Data emulating neuropsychiatric clinical trials were simulated in a 4 × 4 factorial arrangement of scenarios, using four patterns of mean changes over time and four strategies for deleting data to generate subject dropout via an MAR mechanism. In data with no dropout, estimates of Δ and SE(Δ) from MMRM and LOCF were identical. In data with dropout, the Type I error rates (averaged across all scenarios) for MMRM and LOCF were 5.49% and 16.76%, respectively. In 11 of the 16 scenarios, the Type I error rate from MMRM was at least 1.00% closer to the expected rate of 5.00% than the corresponding rate from LOCF. In no scenario did LOCF yield a Type I error rate that was at least 1.00% closer to the expected rate than the corresponding rate from MMRM. The average estimate of SE(Δ) from MMRM was greater in data with dropout than in complete data, whereas the average estimate of SE(Δ) from LOCF was smaller in data with dropout than in complete data, suggesting that standard errors from MMRM better reflected the uncertainty in the data. The results from this investigation support those from previous studies, which found that MMRM provided reasonable control of Type I error even in the presence of data that are missing not at random (MNAR). No universally best approach to analysis of longitudinal data exists. However, likelihood‐based MAR approaches have been shown to perform well in a variety of situations and are a sensible alternative to the LOCF approach. MNAR methods can be used within a sensitivity analysis framework to test the potential presence and impact of MNAR data, thereby assessing robustness of results from an MAR method. Copyright © 2004 John Wiley & Sons, Ltd.
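One simple way to generate MAR dropout of the kind used in such simulations is to let the probability of dropping out at a visit depend only on the previously observed value; the logistic dropout parameters below are illustrative, not those of the study.

```python
import numpy as np

def mar_dropout(y, beta0=-3.0, beta1=0.15, seed=0):
    """Monotone dropout: a subject drops at visit j with probability depending
    only on the observed value at visit j-1, so the mechanism is MAR."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    for i in range(y.shape[0]):
        for j in range(1, y.shape[1]):
            p_drop = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * y[i, j - 1])))
            if rng.random() < p_drop:
                y[i, j:] = np.nan        # all later visits become missing
                break
    return y

complete = np.random.default_rng(1).normal(20.0, 5.0, size=(100, 4))
with_dropout = mar_dropout(complete)
print(np.isnan(with_dropout).mean(axis=0))   # fraction missing at each visit
```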

10.
This paper is concerned with the maximum likelihood estimation and the likelihood ratio test for hierarchical loglinear models of multidimensional contingency tables with missing data. The problems of estimation and testing for a high dimensional contingency table can be reduced to those for a class of low dimensional tables. In some cases, the incomplete data in the high dimensional table can become complete in the low dimensional tables; the reduction can also indicate how much the incomplete data contribute to the estimation and the test.

11.
In prospective cohort studies, individuals are usually recruited according to a certain cross-sectional sampling criterion. The prevalent cohort is defined as the group of individuals who are alive, but possibly with disease, at the beginning of the study. It is appealing to incorporate the prevalent cases to estimate the incidence rate of disease before enrollment. The method of back-calculation of incidence rates has been used to estimate the incubation time from human immunodeficiency virus (HIV) infection to AIDS, with the time origin defined as the time of HIV infection. In aging cohort studies, the primary time scale is age at disease onset; subjects have to survive a certain number of years to be enrolled into the study, thus creating left truncation (delayed entry). Current methods usually assume either that the disease incidence is rare or that the excess mortality due to disease is small compared with that of healthy subjects, and the validity of results based on these assumptions has not been examined so far. In this paper, a simple alternative method is proposed to estimate the dementia incidence rate before enrollment using prevalent cohort data with left truncation. Furthermore, simulations are used to examine the performance of the estimation of disease incidence under different assumptions about disease incidence rates and excess mortality hazards due to disease. As an application, the method is applied to the prevalent cases of dementia from the Honolulu-Asia Aging Study to estimate the dementia incidence rate and to assess the effects of hypertension, APOE ε4 and education on dementia onset.
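As background on the left-truncation (delayed entry) aspect, standard survival software accepts an entry time on the age scale, so that subjects only contribute to risk sets after enrollment. The sketch below, on simulated ages, shows only this ingredient; the paper's proposal for using prevalent cases to estimate pre-enrollment incidence goes beyond it, and lifelines' `entry` argument is used here on the assumption that it handles left truncation as documented.

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(4)
n = 1000
age_onset = 60 + 25 * rng.weibull(2.0, size=n)   # simulated age at dementia onset
age_entry = rng.uniform(70, 80, size=n)          # simulated age at enrollment
incident = age_onset > age_entry                 # onset after entry is observed prospectively

kmf = KaplanMeierFitter()
# 'entry' encodes delayed entry (left truncation) on the age time scale.
kmf.fit(age_onset[incident], entry=age_entry[incident])
print(kmf.survival_function_.tail())
```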

12.
We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X,B that uses both the available individual-level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic data observations, and then analyzing the combined dataset of size n + m to estimate the parameters of the Y|X,B model. This combined dataset of size n + m now has missing values of B for m of the observations, and is analyzed using methods that can handle missing data (e.g., multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Though the synthetic data method is applicable to a general regression context, to provide some justification, we show in two special cases that the asymptotic variances of the parameter estimates in the Y|X,B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases and the method's broad applicability make it appealing for use across diverse scenarios. The Canadian Journal of Statistics 47: 580–603; 2019 © 2019 Statistical Society of Canada
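A rough sketch of the synthetic-data idea under strong simplifying assumptions: the known Y|X model is taken to be a simple normal linear model, the sample sizes are arbitrary, and the missing B values in the stacked data are handled with statsmodels' MICE multiple-imputation machinery.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(2)

# Observed data of size n containing Y, X and the new predictor B.
n = 200
x = rng.normal(size=n)
b = 0.5 * x + rng.normal(size=n)
y = 1.0 + 0.8 * x + 0.6 * b + rng.normal(size=n)
obs = pd.DataFrame({"y": y, "x": x, "b": b})

# m synthetic rows generated from the (assumed known) Y|X model y = 1 + 0.8x + N(0,1),
# with B deliberately left missing.
m = 400
xs = rng.normal(size=m)
synth = pd.DataFrame({"y": 1.0 + 0.8 * xs + rng.normal(size=m), "x": xs, "b": np.nan})

# Analyze the stacked n + m rows, multiply imputing the missing B values.
combined = pd.concat([obs, synth], ignore_index=True)
imp = mice.MICEData(combined)
fit = mice.MICE("y ~ x + b", sm.OLS, imp).fit(10, 10)
print(fit.summary())
```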

13.
Abstract: The authors address the problem of estimating an inter‐event distribution on the basis of count data. They derive a nonparametric maximum likelihood estimate of the inter‐event distribution utilizing the EM algorithm both in the case of an ordinary renewal process and in the case of an equilibrium renewal process. In the latter case, the iterative estimation procedure follows the basic scheme proposed by Vardi for estimating an inter‐event distribution on the basis of time‐interval data; it combines the outputs of the E‐step corresponding to the inter‐event distribution and to the length‐biased distribution. The authors also investigate a penalized likelihood approach to provide the proposed estimation procedure with regularization capabilities. They evaluate the practical estimation procedure using simulated count data and apply it to real count data representing the elongation of coffee‐tree leafy axes.

14.
Abstract. Partially linear models are extensions of linear models to include a non-parametric function of some covariate. They have been found to be useful in both cross-sectional and longitudinal studies. This paper provides a convenient means to extend Cook's local influence analysis to the penalized Gaussian likelihood estimator that uses a smoothing spline as a solution to its non-parametric component. Insight is also provided into the interplay of the influence or leverage measures between the linear and the non-parametric components in the model. The diagnostics are applied to a mouthwash data set and a longitudinal hormone study with informative results.

15.
16.
Modeling binary familial data has been a challenging task due to the dependence among family members and the constraints imposed on the joint probability distribution of the binary responses. This paper investigates some useful familial dependence structures and proposes analyzing binary familial data using a Gaussian copula model. Advantages of this approach are discussed, as well as some computational details. A numerical example is also presented with the aim of showing the capability of the Gaussian copula model in more sophisticated data analyses.
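To make the copula construction concrete, the joint probability of two correlated binary responses with given marginals can be computed from the bivariate normal CDF evaluated at the probit thresholds; the marginal probability and latent correlation below are illustrative.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def joint_prob_11(p1, p2, rho):
    """P(Y1 = 1, Y2 = 1) under a Gaussian copula with latent correlation rho."""
    z1, z2 = norm.ppf(p1), norm.ppf(p2)                 # probit thresholds
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([z1, z2])

# Two siblings, each with marginal disease probability 0.3, latent correlation 0.5.
print(joint_prob_11(0.3, 0.3, 0.5), "vs independence:", 0.3 * 0.3)
```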

17.
Three types of polynomial mixed model splines have been proposed: smoothing splines, P‐splines and penalized splines using a truncated power function basis. The close connections between these models are demonstrated, showing that the default cubic form of the splines differs only in the penalty used. A general definition of the mixed model spline is given that includes general constraints and can be used to produce natural or periodic splines. The impact of different penalties is demonstrated by evaluation across a set of functions with specific features; this evaluation shows that the best penalty in terms of mean squared error of prediction depends on both the form of the underlying function and the signal-to-noise ratio.
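A minimal sketch of the truncated power function basis referred to above; in the mixed-model formulation the polynomial columns are treated as fixed effects and the truncated columns as random effects, which induces the ridge-type penalty. Knot placement and degree are illustrative.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    """Columns: 1, x, ..., x^degree, then (x - k)_+^degree for each knot."""
    poly = np.vander(x, degree + 1, increasing=True)
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** degree
    return np.hstack([poly, trunc])

x = np.linspace(0.0, 1.0, 101)
knots = np.linspace(0.1, 0.9, 9)
B = truncated_power_basis(x, knots)
print(B.shape)   # (101, 4 + 9): polynomial (unpenalized) + truncated (penalized) columns
```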

18.
We investigate the impacts of complex sampling on point and standard error estimates in latent growth curve modelling of survey data. Methodological issues are illustrated with empirical evidence from an analysis of longitudinal data on life satisfaction trajectories using data from the British Household Panel Survey, a nationally representative survey in Great Britain. A multi-process second-order latent growth curve model with conditional linear growth is used to study variation in the two perceived life satisfaction latent factors considered. The benefits of accounting for the complex survey design are considered, including obtaining unbiased point and standard error estimates, and therefore correctly specified confidence intervals and statistical tests. We conclude that, even for the rather elaborate longitudinal data models considered here, estimation procedures are affected by the variance-inflating impact of complex sampling.

19.
This paper develops a likelihood‐based method for fitting additive models in the presence of measurement error. It formulates the additive model using the linear mixed model representation of penalized splines. In the presence of a structural measurement error model, the resulting likelihood involves intractable integrals, and a Monte Carlo expectation maximization strategy is developed for obtaining estimates. The method's performance is illustrated with a simulation study.
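In generic notation, a simple version of the structural measurement error setup with a single error-prone covariate is

$$y_i = \mathbf{z}_i^{\top}\boldsymbol{\beta} + f(x_i) + \varepsilon_i, \qquad w_i = x_i + u_i, \quad u_i \sim N(0, \sigma_u^2), \quad x_i \sim F_x,$$

where only the surrogate w_i is observed in place of x_i; integrating the unobserved x_i out of the likelihood produces the intractable integrals that motivate the Monte Carlo EM strategy.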

20.
This paper investigates the problem of parameter estimation in statistical models when the observations are intervals assumed to be related to underlying crisp realizations of a random sample. The proposed approach relies on an extension of the likelihood function to the interval setting. A maximum likelihood estimate of the parameter of interest may then be defined as a crisp value maximizing the generalized likelihood function. Using the expectation-maximization (EM) algorithm to solve this maximization problem yields the so-called interval-valued EM algorithm (IEM), which makes it possible to solve a wide range of statistical problems involving interval-valued data. To show the performance of IEM, two classical problems are illustrated: univariate normal mean and variance estimation from interval-valued samples, and multiple linear/nonlinear regression with crisp inputs and interval outputs.
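As a point of comparison for the univariate normal example, the classical censored-data EM treats each interval as interval-censoring of a latent normal draw; the sketch below implements that simpler scheme (the IEM of the paper generalizes the likelihood itself and is not reproduced here).

```python
import numpy as np
from scipy.stats import truncnorm

def em_normal_from_intervals(lo, hi, n_iter=200):
    """EM for the mean/sd of a normal when each draw is only known to lie in [lo_i, hi_i]."""
    mids = (lo + hi) / 2.0
    mu, sigma = mids.mean(), mids.std() + 1e-3
    for _ in range(n_iter):
        a, b = (lo - mu) / sigma, (hi - mu) / sigma
        cond = truncnorm(a, b, loc=mu, scale=sigma)
        ex = cond.mean()                        # E-step: E[X_i | X_i in [lo_i, hi_i]]
        ex2 = cond.var() + ex ** 2              #         E[X_i^2 | ...]
        mu = ex.mean()                          # M-step: update mean ...
        sigma = np.sqrt(ex2.mean() - mu ** 2)   #         ... and standard deviation
    return mu, sigma

rng = np.random.default_rng(3)
x = rng.normal(5.0, 2.0, size=500)
half_width = rng.uniform(0.5, 2.0, size=500)
print(em_normal_from_intervals(x - half_width, x + half_width))
```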

