期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A non-parametric maximum test for the Behrens–Fisher problem

Anke Welz Graeme D. Ruxton 《Journal of Statistical Computation and Simulation》2018,88(7):1336-1347

Non-normality and heteroscedasticity are common in applications. For the comparison of two samples in the non-parametric Behrens–Fisher problem, different tests have been proposed, but no single test can be recommended for all situations. Here, we propose combining two tests, the Welch t test based on ranks and the Brunner–Munzel test, within a maximum test. Simulation studies indicate that this maximum test, performed as a permutation test, controls the type I error rate and stabilizes the power. That is, it has good power characteristics for a variety of distributions, and also for unbalanced sample sizes. Compared to the single tests, the maximum test shows acceptable type I error control. 相似文献

2.

A generalized Q–Q plot for longitudinal data

M. C. Pardo 《Journal of applied statistics》2012,39(11):2349-2362

Most biomedical research is carried out using longitudinal studies. The method of generalized estimating equations (GEEs) introduced by Liang and Zeger [Longitudinal data analysis using generalized linear models, Biometrika 73 (1986), pp. 13–22] and Zeger and Liang [Longitudinal data analysis for discrete and continuous outcomes, Biometrics 42 (1986), pp. 121–130] has become a standard method for analyzing non-normal longitudinal data. Since then, a large variety of GEEs have been proposed. However, the model diagnostic problem has not been explored intensively. Oh et al. [Modeldiagnostic plots for repeated measures data using the generalized estimating equations approach, Comput. Statist. Data Anal. 53 (2008), pp. 222–232] proposed residual plots based on the quantile–quantile (Q–Q) plots of the χ²-distribution for repeated-measures data using the GEE methodology. They considered the Pearson, Anscombe and deviance residuals. In this work, we propose to extend this graphical diagnostic using a generalized residual. A simulation study is presented as well as two examples illustrating the proposed generalized Q–Q plots. 相似文献

3.

A modified Hosmer–Lemeshow test for large data sets

Wei Yu Wangli Xu 《统计学通讯:理论与方法》2017,46(23):11813-11825

The Hosmer–Lemeshow test is a widely used method for evaluating the goodness of fit of logistic regression models. But its power is much influenced by the sample size, like other chi-square tests. Paul, Pennell, and Lemeshow (2013 Paul, P., M. L. Pennell, and S. Lemeshow. 2013. Standardizing the power of the Hosmer–Lemeshow goodness of fit test in large data sets. Statistics in Medicine 32:67–80.[Crossref], [PubMed], [Web of Science ®] , [Google Scholar]) considered using a large number of groups for large data sets to standardize the power. But simulations show that their method performs poorly for some models. In addition, it does not work when the sample size is larger than 25,000. In the present paper, we propose a modified Hosmer–Lemeshow test that is based on estimation and standardization of the distribution parameter of the Hosmer–Lemeshow statistic. We provide a mathematical derivation for obtaining the critical value and power of our test. Through simulations, we can see that our method satisfactorily standardizes the power of the Hosmer–Lemeshow test. It is especially recommendable for enough large data sets, as the power is rather stable. A bank marketing data set is also analyzed for comparison with existing methods. 相似文献

4.

A new local estimation method for single index models for longitudinal data

Hongmei Lin Jianhong Shi Jicai Liu Yanghui Liu 《Journal of nonparametric statistics》2016,28(3):644-658

Single index models are natural extensions of linear models and overcome the so-called curse of dimensionality. They are very useful for longitudinal data analysis. In this paper, we develop a new efficient estimation procedure for single index models with longitudinal data, based on Cholesky decomposition and local linear smoothing method. Asymptotic normality for the proposed estimators of both the parametric and nonparametric parts will be established. Monte Carlo simulation studies show excellent finite sample performance. Furthermore, we illustrate our methods with a real data example. 相似文献

5.

A simple test procedure in standardizing the power of Hosmer–Lemeshow test in large data sets

Xin Lai 《Journal of Statistical Computation and Simulation》2018,88(13):2463-2472

The Hosmer–Lemeshow (H–L) test is a widely used method when assessing the goodness-of-fit of a logistic regression model. However, the H–L test is sensitive to the sample sizes and the number of groups in H–L test. Cautions need to be taken for interpreting an H–L test with a large sample size. In this paper, we propose a simple test procedure to evaluate the model fit of logistic regression model with a large sample size, in which a bootstrap method is used and the test result is determined by the power of H–L test at the target sample size. Simulation studies show that the proposed method can effectively standardize the power of the H–L test under the pre-specified level of type I error. Application to the two datasets illustrates the usefulness of the proposed model. 相似文献

6.

Is categorization of random data necessary for parallel analysis on Likert-type data?

Li-Jen Weng 《统计学通讯:模拟与计算》2017,46(7):5367-5377

Random eigenvalues are the key elements in parallel analysis. When analyzing Likert-type data, is it necessary to convert the continuous random data to discrete type before estimating eigenvalues? The study compared the random eigenvalues obtained from continuous and categorized random data from two popular computer programs to be used as the basis for comparison in conducting parallel analysis on Likert-type data. Results indicated that categorized random data gave eigenvalues and number of factors similar to those obtained from continuous random data. It is suggested that when conducting parallel analysis on Likert-type data by the two programs, the conversion is unnecessary. 相似文献

7.

Adaptive robust estimation in joint mean–covariance regression model for bivariate longitudinal data

Jing Lv Tingting Li Yuanyuan Hao Xiaolin Pan 《Statistics》2018,52(1):64-83

The estimation of the covariance matrix is important in the analysis of bivariate longitudinal data. A good estimator for the covariance matrix can improve the efficiency of the estimators of the mean regression coefficients. Furthermore, the covariance estimation itself is also of interest, but it is a challenging job to model the covariance matrix of bivariate longitudinal data due to the complex structure and positive definite constraint. In addition, most of existing approaches are based on the maximum likelihood, which is very sensitive to outliers or heavy-tail error distributions. In this article, an adaptive robust estimation method is proposed for bivariate longitudinal data. Unlike the existing likelihood-based methods, the proposed method can adapt to different error distributions. Specifically, at first, we utilize the modified Cholesky block decomposition to parameterize the covariance matrices. Secondly, we apply the bounded Huber's score function to develop a set of robust generalized estimating equations to estimate the parameters both in the mean and the covariance models simultaneously. A data-driven approach is presented to select the parameter c in the Huber's score function, which can ensure that the proposed method is robust and efficient. A simulation study and a real data analysis are conducted to illustrate the robustness and efficiency of the proposed approach. 相似文献

8.

On inference for Kendall's τ within a longitudinal data setting

Yan Ma 《Journal of applied statistics》2012,39(11):2441-2452

Kendall's τ is a non-parametric measure of correlation based on ranks and is used in a wide range of research disciplines. Although methods are available for making inference about Kendall's τ, none has been extended to modeling multiple Kendall's τs arising in longitudinal data analysis. Compounding this problem is the pervasive issue of missing data in such study designs. In this article, we develop a novel approach to provide inference about Kendall's τ within a longitudinal study setting under both complete and missing data. The proposed approach is illustrated with simulated data and applied to an HIV prevention study. 相似文献

9.

Joint mean–covariance model in generalized partially linear varying coefficient models for longitudinal data

《Journal of Statistical Computation and Simulation》2012,82(6):1166-1182

In longitudinal data analysis, efficient estimation of regression coefficients requires a correct specification of certain covariance structure, and efficient estimation of covariance matrix requires a correct specification of mean regression model. In this article, we propose a general semiparametric model for the mean and the covariance simultaneously using the modified Cholesky decomposition. A regression spline-based approach within the framework of generalized estimating equations is proposed to estimate the parameters in the mean and the covariance. Under regularity conditions, asymptotic properties of the resulting estimators are established. Extensive simulation is conducted to investigate the performance of the proposed estimator and in the end a real data set is analysed using the proposed approach. 相似文献

10.

How much reliable are the integrated ‘live’ data? A validation strategy proposal for the non-parametric micro statistical matching

Riccardo D'Alberto Meri Raggi 《Journal of applied statistics》2021,48(2):322

The integration of different data sources is a widely discussed topic among both the researchers and the Official Statistics. Integrating data helps to contain costs and time required by new data collections. The non-parametric micro Statistical Matching (SM) allows to integrate ‘live’ data resorting only to the observed information, potentially avoiding the misspecification bias and speeding the computational effort. Despite these pros, the assessment of the integration goodness when we use this method is not robust. Moreover, several applications comply with some commonly accepted practices which recommend e.g. to use the biggest data set as donor. We propose a validation strategy to assess the integration goodness. We apply it to investigate these practices and to explore how different combinations of the SM techniques and distance functions perform in terms of the reliability of the synthetic (complete) data set generated. The validation strategy takes advantage of the relation existing among the variables pre-and-post the integration. The results show that ‘the biggest, the best’ rule must not be considered mandatory anymore. Indeed, the integration goodness increases in relation to the variability of the matching variables rather than with respect to the dimensionality ratio between the recipient and the donor data set. 相似文献

11.

The weighted general linear model for longitudinal medical cost data – an application in colorectal cancer

Y. T. Hwang C. H. Huang W. L. Yeh Y. D. Shen 《Journal of applied statistics》2017,44(2):288-307

Identifying cost-effective decisions that can take into account of medical cost and health outcome is an important issue under very limited resources. Analyzing medical costs has been challenged owing to skewness of cost distributions, heterogeneity across samples and censoring. When censoring is due to administrative reasons, the total cost might be related to the survival time since longer survivals are likely to be censored and the corresponding total cost will be censored as well. This paper uses the general linear model for the longitudinal data to model the repeated medical cost data and the weighted estimating equation is used to find more accurate estimates for the parameter. Furthermore, the asymptotic properties for the proposed model are discussed. Simulations are used to evaluate the performance of estimators under various scenarios. Finally, the proposed model is implemented on the data extracted from National Health Insurance database for patients with the colorectal cancer. 相似文献

12.

A weighted Harrell–Davis distance test with applications to censored data

Dongliang Wang Alan D. Hutson 《统计学通讯:理论与方法》2017,46(10):5022-5034

Consider the standard treatment-control model with a time-to-event endpoint. We propose a novel interpretable test statistic from a quantile function point of view. The large sample consistency of our estimator is proven for fixed bandwidth values theoretically and validated empirically. A Monte Carlo simulation study also shows that given small sample sizes, utilization of a tuning parameter through the application of a smooth quantile function estimator shows an improvement in efficiency in terms of the MSE when compared to direct application of classic Kaplan–Meier survival function estimator. The procedure is finally illustrated via an application to epithelial ovarian cancer data. 相似文献

13.

Dale L. Zimmerman, Vicente A. Núñez-Antón: Antedependence models for longitudinal data

P. G. Hackl 《Statistical Papers》2013,54(2):543-544

相似文献

14.

A multivariate test for detecting fraud based on Benford’s law,with application to music streaming data

Mumic Nermina Filzmoser Peter 《Statistical Methods and Applications》2021,30(3):819-840

Statistical Methods & Applications - Benford’s law became a prevalent concept for fraud and anomaly detection. It examines the frequencies of the leading digits of numbers in a collection... 相似文献

15.

When does Heckman’s two-step procedure for censored data work and when does it not?

Robert Jonsson 《Statistical Papers》2012,53(1):33-49

Heckman’s two-step procedure (Heckit) for estimating the parameters in linear models from censored data is frequently used by econometricians, despite of the fact that earlier studies cast doubt on the procedure. In this paper it is shown that estimates of the hazard h for approaching the censoring limit, the latter being used as an explanatory variable in the second step of the Heckit, can induce multicollinearity. The influence of the censoring proportion and sample size upon bias and variance in three types of random linear models are studied by simulations. From these results a simple relation is established that describes how absolute bias depends on the censoring proportion and the sample size. It is also shown that the Heckit may work with non-normal (Laplace) distributions, but it collapses if h deviates too much from that of the normal distribution. Data from a study of work resumption after sick-listing are used to demonstrate that the Heckit can be very risky. 相似文献

16.

A new approach for disclosure control in the IAB establishment panel—multiple imputation for a better data access

Jörg Drechsler Agnes Dundler Stefan Bender Susanne Rässler Thomas Zwick 《AStA Advances in Statistical Analysis》2008,92(4):439-458

For micro-datasets considered for release as scientific or public use files, statistical agencies have to face the dilemma of guaranteeing the confidentiality of survey respondents on the one hand and offering sufficiently detailed data on the other hand. For that reason, a variety of methods to guarantee disclosure control is discussed in the literature. In this paper, we present an application of Rubin’s (J. Off. Stat. 9, 462–468, 1993) idea to generate synthetic datasets from existing confidential survey data for public release.We use a set of variables from the 1997 wave of the German IAB Establishment Panel and evaluate the quality of the approach by comparing results from an analysis by Zwick (Ger. Econ. Rev. 6(2), 155–184, 2005) with the original data with the results we achieve for the same analysis run on the dataset after the imputation procedure. The comparison shows that valid inferences can be obtained using the synthetic datasets in this context, while confidentiality is guaranteed for the survey participants. 相似文献

17.

A note on the Jarque–Bera normality test for GARCH innovations

Sangyeol Lee Siyun Park Taewook Lee 《Journal of the Korean Statistical Society》2010,39(1):93-102

In this paper, we consider the validity of the Jarque–Bera normality test whose construction is based on the residuals, for the innovations of GARCH (generalized autoregressive conditional heteroscedastic) models. It is shown that the asymptotic behavior of the original form of the JB test adopted in this paper is identical to that of the test statistic based on true errors. The simulation study also confirms the validity of the original form since it outperforms other available normality tests. 相似文献

18.

The estimation for Lévy processes in high frequency data

Jing Zheng Baolin Xu Zongwu Cai 《Econometric Reviews》2018,37(10):1051-1066

In this article, a generalized Lévy model is proposed and its parameters are estimated in high-frequency data settings. An infinitesimal generator of Lévy processes is used to study the asymptotic properties of the drift and volatility estimators. They are consistent asymptotically and are independent of other parameters making them better than those in Chen et al. (2010 Chen, S. X., Delaigle, A., Hall, P. (2010). Nonparametric estimation for a class of Lévy processes. Journal of Econometrics 157:257–271.[Crossref], [Web of Science ®] , [Google Scholar]). The estimators proposed here also have fast convergence rates and are simple to implement. 相似文献

19.

A two-stage estimation in the Clayton–Oakes model with marginal linear transformation models for multivariate failure time data

Chen CM Yu CY 《Lifetime data analysis》2012,18(1):94-115

This paper considers the analysis of multivariate survival data where the marginal distributions are specified by semiparametric transformation models, a general class including the Cox model and the proportional odds model as special cases. First, consideration is given to the situation where the joint distribution of all failure times within the same cluster is specified by the Clayton–Oakes model (Clayton, Biometrika 65:141–151, l978; Oakes, J R Stat Soc B 44:412–422, 1982). A two-stage estimation procedure is adopted by first estimating the marginal parameters under the independence working assumption, and then the association parameter is estimated from the maximization of the full likelihood function with the estimators of the marginal parameters plugged in. The asymptotic properties of all estimators in the semiparametric model are derived. For the second situation, the third and higher order dependency structures are left unspecified, and interest focuses on the pairwise correlation between any two failure times. Thus, the pairwise association estimate can be obtained in the second stage by maximizing the pairwise likelihood function. Large sample properties for the pairwise association are also derived. Simulation studies show that the proposed approach is appropriate for practical use. To illustrate, a subset of the data from the Diabetic Retinopathy Study is used. 相似文献

20.

A stable estimator of the information matrix under EM for dependent data

Jin-Chuan?Duan Andras?Fulop Email author 《Statistics and Computing》2011,21(1):83-91

This article develops a new and stable estimator for information matrix when the EM algorithm is used in maximum likelihood estimation. This estimator is constructed using the smoothed individual complete-data scores that are readily available from running the EM algorithm. The method works for dependent data sets and when the expectation step is an irregular function of the conditioning parameters. In comparison to the approach of Louis (J. R. Stat. Soc., Ser. B 44:226–233, 1982), this new estimator is more stable and easier to implement. Both real and simulated data are used to demonstrate the use of this new estimator. 相似文献