期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Adjusting for bias in randomized cluster trials

James Reed III 《Journal of applied statistics》2003,30(1):79-85

The randomized cluster design is typical in studies where the unit of randomization is a cluster of individuals rather than the individual. Evaluating various intervention strategies across medical care providers at either an institutional level or at a physician group practice level fits the randomized cluster model. Clearly, the analytical approach to such studies must take the unit of randomization and accompanying intraclass correlation into consideration. We review alternative methods to the typical Pearson's chi-square analysis and illustrate these alternatives. We have written and tested a Fortran program that produces the statistics outlined in this paper. The program, in an executable format is available from the author on request. 相似文献

2.

Q型系统聚类分析中的统计检验问题 总被引：2，自引：1，他引：1

傅德印《统计与信息论坛》2007,22(3):10-14

Q型系统聚类分析已经越来越成为人们广泛应用的多元统计分析方法。然而在应用中盲目套用系统聚类分析方法的情况很多，并对聚类分析方法的适用性、聚类过程的合理性、聚类结果的有效性等问题分析重视不够，更谈不上对聚类分析进行统计检验。因此，为了更好地应用Q型系统聚类分析，就应对Q型系统聚类分析结果进行统计检验并建立统计检验体系。Q型系统聚类分析统计检验体系主要包括：Q型系统聚类分析结果的有效性检验；聚类分析类（组）数选择合理性检验；聚类变量的显著性检验。相似文献

3.

列联资料的有向相关聚类分析

杨贵军陈玮晓《统计与信息论坛》2012,(1):107-112

将相关分析和有向聚类分析结合,提出有向相关聚类方法。先依据相关性进行变量合并,再进行有向聚类,分析结果更合理,聚类过程更简单。将该方法用于大学生健康成长影响因素的调查数据,得出更合理的结果。相似文献

4.

Multilevel modelling of complex survey data

Sophia Rabe-Hesketh Anders Skrondal 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(4):805-827

Summary. Multilevel modelling is sometimes used for data from complex surveys involving multistage sampling, unequal sampling probabilities and stratification. We consider generalized linear mixed models and particularly the case of dichotomous responses. A pseudolikelihood approach for accommodating inverse probability weights in multilevel models with an arbitrary number of levels is implemented by using adaptive quadrature. A sandwich estimator is used to obtain standard errors that account for stratification and clustering. When level 1 weights are used that vary between elementary units in clusters, the scaling of the weights becomes important. We point out that not only variance components but also regression coefficients can be severely biased when the response is dichotomous. The pseudolikelihood methodology is applied to complex survey data on reading proficiency from the American sample of the 'Program for international student assessment' 2000 study, using the Stata program gllamm which can estimate a wide range of multilevel and latent variable models. Performance of pseudo-maximum-likelihood with different methods for handling level 1 weights is investigated in a Monte Carlo experiment. Pseudo-maximum-likelihood estimators of (conditional) regression coefficients perform well for large cluster sizes but are biased for small cluster sizes. In contrast, estimators of marginal effects perform well in both situations. We conclude that caution must be exercised in pseudo-maximum-likelihood estimation for small cluster sizes when level 1 weights are used. 相似文献

5.

On the asymptotic properties of fuzzy c-means cluster prototypes as estimators of mixture subpopulation centers

Richard J Hathaway James C Bezdek 《统计学通讯:理论与方法》2013,42(2):505-513

Several computational studies suggest that the fuzzy c-means (FCM) clustering scheme may be used successfully in some cases to obtain estimates for the parameters of a statistical mixture (e.g., for a mixture of normal distributions). While these (limited) simulation results for the fuzzy c-means approach support this hypothesis, we provide herein an example that shows that the FCM cluster prototypes cannot generally be statistically consistent estimators of the centers (means) of any univariate mixture having symmetric component distributions 相似文献

6.

Modeling probability density through ultraspherical polynomial transformations

Terhi Mäkinen Lasse Holmström 《统计学通讯:模拟与计算》2017,46(8):5879-5900

We present a method for fitting parametric probability density models using an integrated square error criterion on a continuum of weighted Lebesgue spaces formed by ultraspherical polynomials. This approach is inherently suitable for creating mixture model representations of complex distributions and allows fully autonomous cluster analysis of high-dimensional datasets. The method is also suitable for extremely large sets, allowing post facto model selection and analysis even in the absence of the original data. Furthermore, the fitting procedure only requires the parametric model to be pointwise evaluable, making it trivial to fit user-defined models through a generic algorithm. 相似文献

7.

Modeling proportions and marginal counts simultaneously for clustered multinomial data with random cluster sizes

Guohua Yan Renjun Ma 《Journal of applied statistics》2016,43(6):1074-1087

Clustered multinomial data with random cluster sizes commonly appear in health, environmental and ecological studies. Traditional approaches for analyzing clustered multinomial data contemplate two assumptions. One of these assumptions is that cluster sizes are fixed, whereas the other demands cluster sizes to be positive. Randomness of the cluster sizes may be the determinant of the within-cluster correlation and between-cluster variation. We propose a baseline-category mixed model for clustered multinomial data with random cluster sizes based on Poisson mixed models. Our orthodox best linear unbiased predictor approach to this model depends only on the moment structure of unobserved distribution-free random effects. Our approach also consolidates the marginal and conditional modeling interpretations. Unlike the traditional methods, our approach can accommodate both random and zero cluster sizes. Two real-life multinomial data examples, crime data and food contamination data, are used to manifest our proposed methodology. 相似文献

8.

A Probability theory of hierarchic clustering using random dendrograms

《Journal of Statistical Computation and Simulation》2012,82(2-3):145-157

It is widely recognized that a major, current problem in cluster analysis is that of validating results. This paper looks at one possible approach to the validation of results. It considers the structure of (unlabelled) dendrograms and proposes a model of random dendrograms. Algorithms for calculating the probability distribution of a coefficient of structure of a dendrogram are discussed, —both by enumeration of distinct dendrograms, and by Monte Carlo generation of dendrograms 相似文献

9.

一类基于模型的聚类方法

魏瑾瑞《统计与信息论坛》2014,(2):19-22

在聚类问题中,若变量之间存在相关性,传统的应对方法主要是考虑采用马氏距离、主成分聚类等方法,但其可操作性或可解释性较差,因此提出一类基于模型的聚类方法,先对变量间的相关性结构建模(作为辅助信息)再做聚类分析。这种方法的优点主要在于:适用范围更宽泛,不仅能处理(线性)相关问题,而且还可以处理变量间存在的其他复杂结构生成的数据聚类问题;各个变量的重要性也可以通过模型的回归系数来体现;比马氏距离更稳健、更具操作性,比主成分聚类更容易得到解释,算法上也更为简洁有效。相似文献

10.

A joint modelling approach for clustered recurrent events and death events

Yanchun Bao Hongsheng Dai Tao Wang Sung-Kiang Chuang 《Journal of applied statistics》2013,40(1):123-140

In dental implant research studies, events such as implant complications including pain or infection may be observed recurrently before failure events, i.e. the death of implants. It is natural to assume that recurrent events and failure events are correlated to each other, since they happen on the same implant (subject) and complication times have strong effects on the implant survival time. On the other hand, each patient may have more than one implant. Therefore these recurrent events or failure events are clustered since implant complication times or failure times within the same patient (cluster) are likely to be correlated. The overall implant survival times and recurrent complication times are both interesting to us. In this paper, a joint modelling approach is proposed for modelling complication events and dental implant survival times simultaneously. The proposed method uses a frailty process to model the correlation within cluster and the correlation within subjects. We use Bayesian methods to obtain estimates of the parameters. Performance of the joint models are shown via simulation studies and data analysis. 相似文献

11.

Deconvolution methods for non-parametric inference in two-level mixed models

Peter Hall Tapabrata Maiti 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(3):703-718

Summary. We develop a general non-parametric approach to the analysis of clustered data via random effects. Assuming only that the link function is known, the regression functions and the distributions of both cluster means and observation errors are treated non-parametrically. Our argument proceeds by viewing the observation error at the cluster mean level as though it were a measurement error in an errors-in-variables problem, and using a deconvolution argument to access the distribution of the cluster mean. A Fourier deconvolution approach could be used if the distribution of the error-in-variables were known. In practice it is unknown, of course, but it can be estimated from repeated measurements, and in this way deconvolution can be achieved in an approximate sense. This argument might be interpreted as implying that large numbers of replicates are necessary for each cluster mean distribution, but that is not so; we avoid this requirement by incorporating statistical smoothing over values of nearby explanatory variables. Empirical rules are developed for the choice of smoothing parameter. Numerical simulations, and an application to real data, demonstrate small sample performance for this package of methodology. We also develop theory establishing statistical consistency. 相似文献

12.

Clustering work and family trajectories by using a divisive algorithm

Raffaella Piccarreta Francesco C. Billari 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(4):1061-1078

Summary. We present an approach to the construction of clusters of life course trajectories and use it to obtain ideal types of trajectories that can be interpreted and analysed meaningfully. We represent life courses as sequences on a monthly timescale and apply optimal matching analysis to compute dissimilarities between individuals. We introduce a new divisive clustering algorithm which has features that are in common with both Ward's agglomerative algorithm and classification and regression trees. We analyse British Household Panel Survey data on the employment and family trajectories of women. Our method produces clusters of sequences for which it is straightforward to determine who belongs to each cluster, making it easier to interpret the relative importance of life course factors in distinguishing subgroups of the population. Moreover our method gives guidance on selecting the number of clusters. 相似文献

13.

A simple approach to analyzing clustered longitudinal data

Matthew Stephenson R. Ayesha Ali Gerarda A. Darlington 《统计学通讯:模拟与计算》2017,46(5):3553-3562

When modeling correlated binary data in the presence of informative cluster sizes, generalized estimating equations with either resampling or inverse-weighting, are often used to correct for estimation bias. However, existing methods for the clustered longitudinal setting assume constant cluster sizes over time. We present a subject-weighted generalized estimating equations scheme that provides valid parameter estimation for the clustered longitudinal setting while allowing cluster sizes to change over time. We compare, via simulation, the performance of existing methods to our subject-weighted approach. The subject-weighted approach was the only method that showed negligible bias, with excellent coverage, for all model parameters. 相似文献

14.

基于模糊分析的中国商业银行战略联盟伙伴选择研究

邬连东《统计与信息论坛》2008,23(2):81-85

随着金融业务的国际化，战略联盟已经成为银行提高自身竞争力的一个有效手段。研究确定商业银行联盟伙伴选择的原则与标准，根据这些标准，运用模糊聚类分析法对银行潜在的联盟伙伴的选择进行分析，使商业银行有针对性的与一些联盟伙伴发展合作伙伴关系，减少交易成本和搜寻成本，从而提高银行的运作效率。相似文献

15.

Support Vector Clustering for Customer Segmentation on Mobile TV Service

Pedro Albuquerque Solange Alfinito Claudio V. Torres 《统计学通讯:模拟与计算》2015,44(6):1453-1464

This article presents a proposal for customer segmentation through Support Vector Clustering, a technique that has been gaining attention in the academic literature due to the good results usually obtained. An application of this method to a sample from consumers in Brazil about mobile TV service is performed; we compare this approach with the classical hierarchical methods of cluster analysis. It is concluded that this methodology is effective in reducing the heterogeneity often present in customers’ data, improving the cluster segmentation analysis for managers. 相似文献

16.

Analyzing Binary Outcome Data with Small Clusters: A Simulation Study

Ying Xu Chun Fan Lee Yin Bun Cheung 《统计学通讯:模拟与计算》2013,42(7):1771-1782

Binary outcome data with small clusters often arise in medical studies and the size of clusters might be informative of the outcome. The authors conducted a simulation study to examine the performance of a range of statistical methods. The simulation results showed that all methods performed mostly comparable in the estimation of covariate effects. However, the standard logistic regression approach that ignores the clustering encountered an undercoverage problem when the degree of clustering was nontrivial. The performance of random-effects logistic regression approach tended to be affected by low disease prevalence, relatively small cluster size, or informative cluster size. 相似文献

17.

Residual-based specification of the random-effects distribution for cluster data

Samuel Soubeyrand Joël Chad&#x;uf Ivan Sache Christian Lannou 《Statistical Methodology》2006,3(4):464-482

We propose a method for specifying the distribution of random effects included in a model for cluster data. The class of models we consider includes mixed models and frailty models whose random effects and explanatory variables are constant within clusters. The method is based on cluster residuals obtained by assuming that the random effects are equal between clusters. We exhibit an asymptotic relationship between the cluster residuals and variations of the random effects as the number of observations increases and the variance of the random effects decreases. The asymptotic relationship is used to specify the random-effects distribution. The method is applied to a frailty model and a model used to describe the spread of plant diseases. 相似文献

18.

A Generalized Estimating Equations Approach for Analysis of the Impact of New Technology on a Trawl Fishery 总被引：1，自引：0，他引：1

Janet Bishop David Die & You-Gan Wang 《Australian & New Zealand Journal of Statistics》2000,42(2):159-177

The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988–96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations. 相似文献

19.

Designing and analyzing clinical trials for personalized medicine via Bayesian models

Chuanwu Zhang Matthew S. Mayo Jo A. Wick Byron J. Gajewski 《Pharmaceutical statistics》2021,20(3):573-596

Patients with different characteristics (e.g., biomarkers, risk factors) may have different responses to the same medicine. Personalized medicine clinical studies that are designed to identify patient subgroup treatment efficacies can benefit patients and save medical resources. However, subgroup treatment effect identification complicates the study design in consideration of desired operating characteristics. We investigate three Bayesian adaptive models for subgroup treatment effect identification: pairwise independent, hierarchical, and cluster hierarchical achieved via Dirichlet Process (DP). The impact of interim analysis and longitudinal data modeling on the personalized medicine study design is also explored. Interim analysis is considered since they can accelerate personalized medicine studies in cases where early stopping rules for success or futility are met. We apply integrated two-component prediction method (ITP) for longitudinal data simulation, and simple linear regression for longitudinal data imputation to optimize the study design. The designs' performance in terms of power for the subgroup treatment effects and overall treatment effect, sample size, and study duration are investigated via simulation. We found the hierarchical model is an optimal approach to identifying subgroup treatment effects, and the cluster hierarchical model is an excellent alternative approach in cases where sufficient information is not available for specifying the priors. The interim analysis introduction to the study design lead to the trade-off between power and expected sample size via the adjustment of the early stopping criteria. The introduction of the longitudinal modeling slightly improves the power. These findings can be applied to future personalized medicine studies with discrete or time-to-event endpoints. 相似文献

20.

Quantile dispersion graphs to compare the efficiencies of cluster randomized designs

S. Mukhopadhyay S. W. Looney 《Journal of applied statistics》2009,36(11):1293-1305

The purpose of this article is to compare efficiencies of several cluster randomized designs using the method of quantile dispersion graphs (QDGs). A cluster randomized design is considered whenever subjects are randomized at a group level but analyzed at the individual level. A prior knowledge of the correlation existing between subjects within the same cluster is necessary to design these cluster randomized trials. Using the QDG approach, we are able to compare several cluster randomized designs without requiring any information on the intracluster correlation. For a given design, several quantiles of the power function, which are directly related to the effect size, are obtained for several effect sizes. The quantiles depend on the intracluster correlation present in the model. The dispersion of these quantiles over the space of the unknown intracluster correlation is determined, and then depicted by the QDGs. Two applications of the proposed methodology are presented. 相似文献