期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A geometric characterization of optimal designs for regression models with correlated observations

Holland-Letz T Dette H Pepelyshev A 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2011,73(2):239-252

We consider the problem of optimal design of experiments for random effects models, especially population models, where a small number of correlated observations can be taken on each individual, while the observations corresponding to different individuals are assumed to be uncorrelated. We focus on c-optimal design problems and show that the classical equivalence theorem and the famous geometric characterization of Elfving (1952) from the case of uncorrelated data can be adapted to the problem of selecting optimal sets of observations for the n individual patients. The theory is demonstrated by finding optimal designs for a linear model with correlated observations and a nonlinear random effects population model, which is commonly used in pharmacokinetics. 相似文献

2.

New flexible models generated by gamma random variables for lifetime modeling

Edwin M.M. Ortega Giovana O. Silva Gauss M. Cordeiro 《Journal of applied statistics》2015,42(10):2159-2179

In this paper we introduce a new three-parameter exponential-type distribution. The new distribution is quite flexible and can be used effectively in modeling survival data and reliability problems. It can have constant, decreasing, increasing, upside-down bathtub and bathtub-shaped hazard rate functions. It also generalizes some well-known distributions. We discuss maximum likelihood estimation of the model parameters for complete sample and for censored sample. Additionally, we formulate a new cure rate survival model by assuming that the number of competing causes of the event of interest has the Poisson distribution and the time to this event follows the proposed distribution. Maximum likelihood estimation of the model parameters of the new cure rate survival model is discussed for complete sample and censored sample. Two applications to real data are provided to illustrate the flexibility of the new model in practice. 相似文献

3.

A Bayesian approach to inference about a change point model with application to DNA copy number experimental data

Jie Chen Ayten Yiğiter Kuang-Chao Chang 《Journal of applied statistics》2011,38(9):1899-1913

In this paper, we study the change-point inference problem motivated by the genomic data that were collected for the purpose of monitoring DNA copy number changes. DNA copy number changes or copy number variations (CNVs) correspond to chromosomal aberrations and signify abnormality of a cell. Cancer development or other related diseases are usually relevant to DNA copy number changes on the genome. There are inherited random noises in such data, therefore, there is a need to employ an appropriate statistical model for identifying statistically significant DNA copy number changes. This type of statistical inference is evidently crucial in cancer researches, clinical diagnostic applications, and other related genomic researches. For the high-throughput genomic data resulting from DNA copy number experiments, a mean and variance change point model (MVCM) for detecting the CNVs is appropriate. We propose to use a Bayesian approach to study the MVCM for the cases of one change and propose to use a sliding window to search for all CNVs on a given chromosome. We carry out simulation studies to evaluate the estimate of the locus of the DNA copy number change using the derived posterior probability. These simulation results show that the approach is suitable for identifying copy number changes. The approach is also illustrated on several chromosomes from nine fibroblast cancer cell line data (array-based comparative genomic hybridization data). All DNA copy number aberrations that have been identified and verified by karyotyping are detected by our approach on these cell lines. 相似文献

4.

A new skew-bimodal distribution with applications

Altemir da Silva Braga Gauss M. Cordeiro Edwin M. M. Ortega 《统计学通讯:理论与方法》2018,47(12):2950-2968

The modeling and analysis of experiments is an important aspect of statistical work in a wide variety of scientific and technological fields. We introduce and study the odd log-logistic skew-normal model, which can be interpreted as a generalization of the skew-normal distribution. The new distribution can be used effectively in the analysis of experiments data since it accommodates unimodal, bimodal, symmetric, bimodal and right-skewed, and bimodal and left-skewed density function depending on the parameter values. We illustrate the importance of the new model by means of three real data sets in analysis of experiments. 相似文献

5.

Mixtures of Gaussian copula factor analyzers for clustering high dimensional data

《Journal of the Korean Statistical Society》2019,48(3):480-492

Mixtures of factor analyzers is a useful model-based clustering method which can avoid the curse of dimensionality in high-dimensional clustering. However, this approach is sensitive to both diverse non-normalities of marginal variables and outliers, which are commonly observed in multivariate experiments. We propose mixtures of Gaussian copula factor analyzers (MGCFA) for clustering high-dimensional clustering. This model has two advantages; (1) it allows different marginal distributions to facilitate fitting flexibility of the mixture model, (2) it can avoid the curse of dimensionality by embedding the factor-analytic structure in the component-correlation matrices of the mixture distribution.An EM algorithm is developed for the fitting of MGCFA. The proposed method is free of the curse of dimensionality and allows any parametric marginal distribution which fits best to the data. It is applied to both synthetic data and a microarray gene expression data for clustering and shows its better performance over several existing methods. 相似文献

6.

Flexible estimation of serial correlation in nonlinear mixed models

Jan Serroyen Marc Aerts Ellen Vloeberghs Peter Paul De Deyn Geert Verbeke 《Journal of applied statistics》2010,37(5):833-846

In the conventional linear mixed-effects model, four structures can be distinguished: fixed effects, random effects, measurement error and serial correlation. The latter captures the phenomenon that the correlation structure within a subject depends on the time lag between two measurements. While the general linear mixed model is rather flexible, the need has arisen to further increase flexibility. In addition to work done in the area, we propose the use of spline-based modeling of the serial correlation function, so as to allow for additional flexibility. This approach is applied to data from a pre-clinical experiment in dementia which studied the eating and drinking behavior in mice. 相似文献

7.

A novel quantification of information for longitudinal data analyzed by mixed‐effects modeling

Min Yuan Yi Li Yaning Yang Jinfeng Xu Fangbiao Tao Liang Zhao Honghui Zhou Jose Pinheiro Xu Steven Xu 《Pharmaceutical statistics》2020,19(4):388-398

Nonlinear mixed‐effects (NLME) modeling is one of the most powerful tools for analyzing longitudinal data especially under the sparse sampling design. The determinant of the Fisher information matrix is a commonly used global metric of the information that can be provided by the data under a given model. However, in clinical studies, it is also important to measure how much information the data provide for a certain parameter of interest under the assumed model, for example, the clearance in population pharmacokinetic models. This paper proposes a new, easy‐to‐interpret information metric, the “relative information” (RI), which is designed for specific parameters of a model and takes a value between 0% and 100%. We establish the relationship between interindividual variability for a specific parameter and the variance of the associated parameter estimator, demonstrating that, under a “perfect” experiment (eg, infinite samples or/and minimum experimental error), the RI and the variance of the model parameter estimator converge, respectively, to 100% and the ratio of the interindividual variability for that parameter and the number of subjects. Extensive simulation experiments and analyses of three real datasets show that our proposed RI metric can accurately characterize the information for parameters of interest for NLME models. The new information metric can be readily used to facilitate study designs and model diagnosis. 相似文献

8.

Slashed power-Lindley distribution

Yuri A. Iriarte Mario A. Rojas 《统计学通讯:理论与方法》2019,48(7):1709-1720

In this article, we introduce the slashed power-Lindley distribution. This model can be seen as an extension of the power-Lindley distribution with more flexibility in terms of the kurtosis of distribution. It arises as the ratio of two independent random variables, the one being a power-Lindley distribution and a power of the uniform distribution. We present properties and carry out estimates of the model parameters by the maximum likelihood method. Finally, we conduct a small simulation study to evaluate the performance of maximum likelihood estimators and we analyze a real data set to illustrate the usefulness of the new model. 相似文献

9.

Accelerated life regression modelling of dependent bivariate time‐to‐event data

Yun‐Hee Choi David E. Matthews 《Revue canadienne de statistique》2005,33(3):449-464

To analyze bivariate time‐to‐event data from matched or naturally paired study designs, researchers frequently use a random effect called frailty to model the dependence between within‐pair response measurements. The authors propose a computational framework for fitting dependent bivariate time‐to‐event data that combines frailty distributions and accelerated life regression models. In this framework users can choose from several parametric options for frailties, as well as the conditional distributions for within‐pair responses. The authors illustrate the flexibility that their framework represents using paired data from a study of laser photocoagulation therapy for retinopathy in diabetic patients. 相似文献

10.

The exponentiated power exponential regression model with different regression structures: application in nursing data

F. Prataviera J. C. S. Vasconcelos G. M. Cordeiro E. M. Hashimoto 《Journal of applied statistics》2019,46(10):1792-1821

We define the exponentiated power exponential distribution and propose a regression model with different systematic structures based on the new distribution. We show that the new regression model can be applied to dispersion data since it represents a parametric family of models that includes as sub-models some widely-known regression models. It then can be used more effectively in the analysis of real data. We use maximum likelihood estimation and derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes. Some global-influence measurements are also investigated and simulation studies are performed to evaluate the accuracy of the estimates. We provide an application of the regression model with four systematic structures to nursing activities score data in the Unit of the Medical Clinic of University of São Paulo (USP) Hospital. 相似文献

11.

Odd-Burr generalized family of distributions with some applications

Morad Alizadeh Gauss M. Cordeiro Maria do Carmo S. Lima Edwin M. M. Ortega 《Journal of Statistical Computation and Simulation》2017,87(2):367-389

We study a new family of continuous distributions with two extra shape parameters called the Burr generalized family of distributions. We investigate the shapes of the density and hazard rate function. We derive explicit expressions for some of its mathematical quantities. The estimation of the model parameters is performed by maximum likelihood. We prove the flexibility of the new family by means of applications to two real data sets. Furthermore, we propose a new extended regression model based on the logarithm of the Burr generalized distribution. This model can be very useful to the analysis of real data and provide more realistic fits than other special regression models. 相似文献

12.

Estimation of population mean and variance in flock management: a ranked set sampling approach in a finite population setting

《Journal of Statistical Computation and Simulation》2012,82(11):905-919

Ranked set sampling is a sampling technique that provides substantial cost efficiency in experiments where a quick, inexpensive ranking procedure is available to rank the units prior to formal, expensive and precise measurements. Although the theoretical properties and relative efficiencies of this approach with respect to simple random sampling have been extensively studied in the literature for the infinite population setting, the use of ranked set sampling methods has not yet been explored widely for finite populations. The purpose of this study is to use sheep population data from the Research Farm at Ataturk University, Erzurum, Turkey, to demonstrate the practical benefits of ranked set sampling procedures relative to the more commonly used simple random sampling estimation of the population mean and variance in a finite population. It is shown that the ranked set sample mean remains unbiased for the population mean as is the case for the infinite population, but the variance estimators are unbiased only with use of the finite population correction factor. Both mean and variance estimators provide substantial improvement over their simple random sample counterparts. 相似文献

13.

A study of R2 measure under the accelerated failure time models

Priscilla H. Chan Christina D. Chambers 《统计学通讯:模拟与计算》2018,47(2):380-391

For right-censored data, the accelerated failure time (AFT) model is an alternative to the commonly used proportional hazards regression model. It is a linear model for the (log-transformed) outcome of interest, and is particularly useful for censored outcomes that are not time-to-event, such as laboratory measurements. We provide a general and easily computable definition of the R² measure of explained variation under the AFT model for right-censored data. We study its behavior under different censoring scenarios and under different error distributions; in particular, we also study its robustness when the parametric error distribution is misspecified. Based on Monte Carlo investigation results, we recommend the log-normal distribution as a robust error distribution to be used in practice for the parametric AFT model, when the R² measure is of interest. We apply our methodology to an alcohol consumption during pregnancy data set from Ukraine. 相似文献

14.

Generalized exponential distributions 总被引：8，自引：0，他引：8

Rameshwar D. Gupta & Debasis Kundu 《Australian & New Zealand Journal of Statistics》1999,41(2):173-188

The three-parameter gamma and three-parameter Weibull distributions are commonly used for analysing any lifetime data or skewed data. Both distributions have several desirable properties, and nice physical interpretations. Because of the scale and shape parameters, both have quite a bit of flexibility for analysing different types of lifetime data. They have increasing as well as decreasing hazard rate depending on the shape parameter. Unfortunately both distributions also have certain drawbacks. This paper considers a three-parameter distribution which is a particular case of the exponentiated Weibull distribution originally proposed by Mudholkar, Srivastava & Freimer (1995) when the location parameter is not present. The study examines different properties of this model and observes that this family has some interesting features which are quite similar to those of the gamma family and the Weibull family, and certain distinct properties also. It appears this model can be used as an alternative to the gamma model or the Weibull model in many situations. One dataset is provided where the three-parameter generalized exponential distribution fits better than the three-parameter Weibull distribution or the three-parameter gamma distribution. 相似文献

15.

Bivariate modelling of longitudinal measurements of two human immunodeficiency type 1 disease progression markers in the presence of informative drop-outs

N. Pantazis G. Touloumi A. S. Walker A. G. Babiker 《Journal of the Royal Statistical Society. Series C, Applied statistics》2005,54(2):405-423

Summary. The main statistical problem in many epidemiological studies which involve repeated measurements of surrogate markers is the frequent occurrence of missing data. Standard likelihood-based approaches like the linear random-effects model fail to give unbiased estimates when data are non-ignorably missing. In human immunodeficiency virus (HIV) type 1 infection, two markers which have been widely used to track progression of the disease are CD4 cell counts and HIV–ribonucleic acid (RNA) viral load levels. Repeated measurements of these markers tend to be informatively censored, which is a special case of non-ignorable missingness. In such cases, we need to apply methods that jointly model the observed data and the missingness process. Despite their high correlation, longitudinal data of these markers have been analysed independently by using mainly random-effects models. Touloumi and co-workers have proposed a model termed the joint multivariate random-effects model which combines a linear random-effects model for the underlying pattern of the marker with a log-normal survival model for the drop-out process. We extend the joint multivariate random-effects model to model simultaneously the CD4 cell and viral load data while adjusting for informative drop-outs due to disease progression or death. Estimates of all the model's parameters are obtained by using the restricted iterative generalized least squares method or a modified version of it using the EM algorithm as a nested algorithm in the case of censored survival data taking also into account non-linearity in the HIV–RNA trend. The method proposed is evaluated and compared with simpler approaches in a simulation study. Finally the method is applied to a subset of the data from the 'Concerted action on seroconversion to AIDS and death in Europe' study. 相似文献

16.

Estimation of extremes in corrosion engineering

PHILIP A. SCARF PATRICK J. LAYCOCK 《Journal of applied statistics》1996,23(6):621-644

SUMMARY This paper reviews a number of extreme value models which have been applied to corrosion problems. The techniques considered are used to model and predict the statistical behaviour of corrosion extremes, such as the largest pit, thinnest wall, maximum penetration or similar assessment of corrosion phenomenon. These techniques can be applied to measurements over a regular grid or to measurements of selected extremes, and can be adapted to accommodate all values over a selected threshold, or a selected number of the largest values-or only the single largest value. Data can come from one coupon or several coupons, and can be modelled to allow for dependence on environmental conditions, surface area examined, and duration of exposure or of experimentation. The techniquesare demonstrated on data from laboratory experiments and also on data collected in an industrial context. 相似文献

17.

High dimensional extension of the growth curve model and its application in genetics

Sayantee Jana Narayanaswamy Balakrishnan Dietrich von Rosen Jemila Seid Hamid 《Statistical Methods and Applications》2017,26(2):273-292

Recent advances in technology have allowed researchers to collect large scale complex biological data, simultaneously, often in matrix format. In genomic studies, for instance, measurements from tens to hundreds of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome resulting in a high-dimensional matrix for each gene. In such experiments, researchers are faced with high-dimensional longitudinal data. Unfortunately, traditional methods for longitudinal data are not appropriate for high-dimensional situations. In this paper, we use the growth curve model and introduce test useful for high-dimensional longitudinal data and evaluate its performance using simulations. We also show how our approach can be used to filter genes in time course genomic experiments. We illustrate this using publicly available genomic data, involving experiments comparing normal human lung tissue with vanadium pentoxide treated human lung tissue, designed with the aim of understanding the susceptibility of individuals working in petro-chemical factories to airway re-modelling. Using our method, we were able to filter out 1053 (about 5 %) genes as non-noise genes from a pool of 22,277. Although our focus is on hypothesis testing, we also provided modified maximum likelihood estimator for the mean parameter of the growth curve model and assessed its performance through bias and mean squared error. 相似文献

18.

Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities 总被引：9，自引：0，他引：9

Ian J. Wilson Michael E. Weale David J. Balding 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2003,166(2):155-188

Summary. We develop a flexible class of Metropolis–Hastings algorithms for drawing inferences about population histories and mutation rates from deoxyribonucleic acid (DNA) sequence data. Match probabilities for use in forensic identification are also obtained, which is particularly useful for mitochondrial DNA profiles. Our data augmentation approach, in which the ancestral DNA data are inferred at each node of the genealogical tree, simplifies likelihood calculations and permits a wide class of mutation models to be employed, so that many different types of DNA sequence data can be analysed within our framework. Moreover, simpler likelihood calculations imply greater freedom for generating tree proposals, so that algorithms with good mixing properties can be implemented. We incorporate the effects of demography by means of simple mechanisms for changes in population size and structure, and we estimate the corresponding demographic parameters, but we do not here allow for the effects of either recombination or selection. We illustrate our methods by application to four human DNA data sets, consisting of DNA sequences, short tandem repeat loci, single-nucleotide polymorphism sites and insertion sites. Two of the data sets are drawn from the male-specific Y-chromosome, one from maternally inherited mitochondrial DNA and one from the β -globin locus on chromosome 11. 相似文献

19.

Inferences on the Number of Unseen Species and the Number of Abundant/Rare Species

Hongmei Zhang 《Journal of applied statistics》2007,34(6):725-740

This paper focuses on estimating the number of species and the number of abundant species in a specific geographic region and, consequently, draw inferences on the number of rare species. The word 'species' is generic referring to any objects in a population that can be categorized. In the areas of biology, ecology, literature, etc, the species frequency distributions are usually severely skewed, in which case the population contains a few very abundant species and many rare ones. To model a such situation, we develop an asymmetric multinomial-Dirichlet probability model using species frequency data. Posterior distributions on the number of species and the number of abundant species are obtained and posterior inferences are induced using MCMC simulations. Simulations are used to demonstrate and evaluate the developed methodology. We apply the method to a DNA segment data set and a butterfly data set. Comparisons among different approaches to inferring the number of species are also discussed in this paper. 相似文献

20.

Analysis of repeated categorical responses from fully and partially cross-classified data

Michael Haber Catherine C. H. Chen C. David Williamson 《统计学通讯:理论与方法》2013,42(10):3293-3313

Many follow-up studies involve categorical data measured on the same individual at different times. Frequently, some of the individuals are missing one or more of the measurements. This results in a contingency table with both fully and partially cross-classified data. Two models can be used to analyze data of this type: (i) The multiple-sample model, in which all the study subjects with the same configuration of missing observations are considered a separate sample. (ii) The single-sample model, which assumes that the missing observations are the result of a mechanism causing subjects to lose the informtion from one or some of the measurements. In this work we compare the two approaches and show that under certain conditions, the two models yield the same maximum likelihood estimates of the cell probabilities in the underlying contingency table. 相似文献