Related Articles
A total of 20 related articles were retrieved (search time: 234 ms).
1.
Semiparametric Bayesian models are nowadays a popular tool in event history analysis. An important area of research concerns the investigation of frequentist properties of posterior inference. In this paper, we propose novel semiparametric Bayesian models for the analysis of competing risks data and investigate the Bernstein–von Mises theorem for differentiable functionals of model parameters. The model is specified by expressing the cause-specific hazard as the product of the conditional probability of a failure type and the overall hazard rate. We take the conditional probability as a smooth function of time and leave the cumulative overall hazard unspecified. A prior distribution is defined on the joint parameter space, which includes a beta process prior for the cumulative overall hazard. We first develop the large-sample properties of maximum likelihood estimators by giving simple sufficient conditions for them to hold. Then, we show that, under the chosen priors, the posterior distribution for any differentiable functional of interest is asymptotically equivalent to the sampling distribution derived from maximum likelihood estimation. A simulation study is provided to illustrate the coverage properties of credible intervals on cumulative incidence functions.
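As a rough sketch of the hazard decomposition described above (the notation here is assumed for illustration, not taken from the paper): with failure time $T$, failure cause $\delta \in \{1,\dots,K\}$, and overall hazard $h(t)$,
$$
h_k(t) \;=\; \lim_{\Delta\to 0}\frac{P(t \le T < t+\Delta,\ \delta = k \mid T \ge t)}{\Delta}
\;=\; \pi_k(t)\,h(t),
\qquad \pi_k(t) = P(\delta = k \mid T = t),
$$
where $\pi_k(t)$ is modeled as a smooth function of time and the cumulative overall hazard $H(t)=\int_0^t h(u)\,du$ is left unspecified and given a beta process prior.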

2.
In this paper tests of hypothesis are constructed for the family of skew normal distributions. The proposed tests utilize the fact that the moment generating function of the skew normal variable satisfies a simple differential equation. The empirical counterpart of this equation, involving the empirical moment generating function, yields simple consistent test statistics. Finite-sample results as well as results from real data are provided for the proposed procedures.
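One identity of this kind, stated here for the standard skew-normal $SN(\lambda)$ with $\delta=\lambda/\sqrt{1+\lambda^2}$ (this specific parameterization and form are an assumption for illustration, not quoted from the paper): the moment generating function $M(t)=2e^{t^2/2}\Phi(\delta t)$ satisfies
$$
M'(t) - t\,M(t) \;=\; \delta\sqrt{2/\pi}\;e^{t^2(1-\delta^2)/2},
$$
so replacing $M$ by the empirical moment generating function $M_n(t)=n^{-1}\sum_{i=1}^n e^{tX_i}$ and measuring the discrepancy in this equation over a range of $t$ yields a test statistic of the type described.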

3.
The phenotype of a quantitative trait locus (QTL) is often modeled by a finite mixture of normal distributions. If the QTL effect depends on the number of copies of a specific allele one carries, then the mixture model has three components. In this case, the mixing proportions have a binomial structure according to the Hardy–Weinberg equilibrium. In the search for QTL, a significance test of homogeneity against the Hardy–Weinberg normal mixture model alternative is an important first step. The LOD score method, a likelihood ratio test used in genetics, is a favored choice. However, there is not yet a general theory for the limiting distribution of the likelihood ratio statistic in the presence of unknown variance. This paper derives the limiting distribution of the likelihood ratio statistic, which can be described by the supremum of a quadratic form of a Gaussian process. Further, the result implies that the distribution of the modified likelihood ratio statistic is well approximated by a chi-squared distribution. Simulation results show that the approximation has satisfactory precision for the cases considered. We also give a real-data example.
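A sketch of the Hardy–Weinberg normal mixture alternative described above (notation assumed for illustration): with allele frequency $p$ and common variance $\sigma^2$, the phenotype density is
$$
f(y) \;=\; (1-p)^2\,\phi(y;\mu_0,\sigma^2) \;+\; 2p(1-p)\,\phi(y;\mu_1,\sigma^2) \;+\; p^2\,\phi(y;\mu_2,\sigma^2),
$$
and the homogeneity null corresponds to a single normal component, e.g. $\mu_0=\mu_1=\mu_2$; the likelihood ratio statistic compares these two fits.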

4.
Introducing model uncertainty by moving blocks bootstrap
It is common in the parametric bootstrap to select a model from the data and then treat it as if it were the true model. Chatfield (1993, 1996) has shown that ignoring model uncertainty may seriously undermine the coverage accuracy of prediction intervals. In this paper, we propose a method based on the moving block bootstrap for introducing the model selection step into the resampling algorithm. We present a Monte Carlo study comparing the finite-sample properties of the proposed method with those of alternative methods in the case of prediction intervals.
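A minimal sketch of one moving-block bootstrap replicate (a generic illustration only; the block length, the toy series, and how the model selection step would be folded into the resampling are assumptions, not the authors' algorithm):

import numpy as np

def moving_block_bootstrap(x, block_len, rng=None):
    """Draw overlapping blocks with replacement and stitch them into a resampled series."""
    rng = np.random.default_rng(rng)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]

# toy series; in the setting above a model would be re-selected and re-fitted
# on each bootstrap replicate before computing the prediction interval
x = np.cumsum(np.random.default_rng(0).normal(size=200))
x_star = moving_block_bootstrap(x, block_len=20, rng=1)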

5.
In this paper, we propose a Bayesian variable selection method for linear regression models with high-order interactions. Our method automatically enforces the heredity constraint, that is, a higher-order interaction term can exist in the model only if both of its parent terms are in the model. Based on the stochastic search variable selection of George and McCulloch (1993), we propose a novel hierarchical prior that fully respects the heredity constraint and simultaneously controls the degree of sparsity. We develop a Markov chain Monte Carlo (MCMC) algorithm to explore the model space efficiently while accounting for the heredity constraint by modifying the shotgun stochastic search algorithm of Hans et al. (2007). The performance of the new model is demonstrated through comparisons with other methods. Numerical studies on both real data analysis and simulations show that our new method tends to find relevant variables more effectively when higher-order interaction terms are considered.
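A minimal sketch of the strong heredity rule such a prior is designed to enforce (an illustrative helper only, not the authors' code; all names are hypothetical): an interaction indicator may equal one only when both parent main-effect indicators equal one.

def respects_strong_heredity(main_in, inter_in):
    """main_in: {variable: 0/1}; inter_in: {(var_i, var_j): 0/1} for pairwise interactions."""
    return all(main_in[i] and main_in[j] for (i, j), z in inter_in.items() if z)

main_in = {"x1": 1, "x2": 1, "x3": 0}
inter_in = {("x1", "x2"): 1, ("x1", "x3"): 0}
print(respects_strong_heredity(main_in, inter_in))  # True: every active interaction has active parents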

6.
Competing risks are common in clinical cancer research, as patients are subject to multiple potential failure outcomes, such as death from the cancer itself or from complications arising from the disease. In the analysis of competing risks, several regression methods are available for the evaluation of the relationship between covariates and cause-specific failures, many of which are based on Cox’s proportional hazards model. Although a great deal of research has been conducted on estimating competing risks, less attention has been devoted to linear regression modeling, which is often referred to as the accelerated failure time (AFT) model in survival literature. In this article, we address the use and interpretation of linear regression analysis with regard to the competing risks problem. We introduce two types of AFT modeling framework, where the influence of a covariate can be evaluated in relation to either a cause-specific hazard function, referred to as cause-specific AFT (CS-AFT) modeling in this study, or the cumulative incidence function of a particular failure type, referred to as crude-risk AFT (CR-AFT) modeling. Simulation studies illustrate that, as in hazard-based competing risks analysis, these two models can produce substantially different effects, depending on the relationship between the covariates and both the failure type of principal interest and competing failure types. We apply the AFT methods to data from non-Hodgkin lymphoma patients, where the dataset is characterized by two competing events, disease relapse and death without relapse, and non-proportionality. We demonstrate how the data can be analyzed and interpreted, using linear competing risks regression models.
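For orientation, the generic accelerated failure time form underlying both variants can be written as follows (notation assumed; the exact cause-specific and crude-risk specifications differ as described above):
$$
\log T \;=\; \mathbf{x}^{\top}\boldsymbol{\beta} + \varepsilon,
\qquad\text{so that}\qquad
T \;=\; e^{\mathbf{x}^{\top}\boldsymbol{\beta}}\,e^{\varepsilon},
$$
i.e. covariates accelerate or decelerate the time scale multiplicatively; in the CS-AFT variant this relation is tied to a cause-specific hazard, whereas in the CR-AFT variant it is tied to the cumulative incidence function of the failure type of interest.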

7.
We discuss a general application of categorical data analysis to mutations along the HIV genome. We consider a multidimensional table for several positions at the same time. Due to the complexity of the multidimensional table, we may collapse it by pooling some categories. However, the association between the remaining variables may not be the same as before collapsing. We discuss the collapsibility of tables and the change in the meaning of parameters after collapsing categories. We also address this problem with a log-linear model. We present a parameterization with the consensus output as the reference cell as is appropriate to explain genomic mutations in HIV. We also consider five null hypotheses and some classical methods to address them. We illustrate methods for six positions along the HIV genome, through consideration of all triples of positions.

8.
This article investigates the large-sample interval mapping method for genetic trait loci (GTL) in a finite non-linear regression mixture model. The general model includes the most commonly used kernel functions, such as exponential family mixture, logistic regression mixture and generalized linear mixture models, as special cases. Populations derived from either the backcross or the intercross design are considered. In particular, unlike all existing results in the literature on finite mixture models, the large-sample results presented in this paper do not require a boundedness condition on the parameter space. Therefore, the large-sample theory presented in this article has general applicability to the interval mapping of GTL in genetic research. The limiting null distribution of the likelihood ratio test statistic can easily be used to determine the threshold values or p-values required in interval mapping. The limiting distribution is proved to be free of the parameter values of the null model and free of the choice of kernel function. Extension to multiple-marker interval GTL detection is also discussed. Simulation results show favorable performance of the asymptotic procedure when sample sizes are moderate.

9.
We study asymptotic properties in an extension of the Nelder-Wedderburn generalized linear models (GLM) and apply the results to model choice in Mandel's models of analysis of variance (ANOVA). The study concerns the behaviour of the maximum likelihood estimators (MLE) when the scale parameter of the GLM tends to zero. Finally, to allow the use of our results, we give the specific form of this limit in three cases of the GLM.

10.
We discuss maximum likelihood and estimating-equations methods for combining results from multiple studies in pooling projects and data consortia using a meta-analysis model, when the multivariate estimates and their covariance matrices are available. The estimates to be combined are typically regression slopes, often from relative risk models in biomedical and epidemiologic applications. We generalize the existing univariate meta-analysis model and investigate the efficiency advantages of the multivariate methods relative to the univariate ones. We generalize a popular univariate test for between-studies homogeneity to a multivariate test. The methods are applied to a pooled analysis of types of carotenoids in relation to lung cancer incidence from seven prospective studies. In these data, the expected gain in efficiency was evident, sometimes to a large extent. Finally, we study the finite-sample properties of the estimators and compare the multivariate ones to their univariate counterparts.
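A minimal numerical sketch of inverse-covariance-weighted pooling of multivariate study estimates (a generic fixed-effect combination shown for illustration; it is not necessarily the estimating-equations procedure of the paper, and the numbers are made up):

import numpy as np

def pool_multivariate(betas, covs):
    """Combine study-specific coefficient vectors using their covariance matrices."""
    W = [np.linalg.inv(S) for S in covs]      # precision (inverse-covariance) weights
    V = np.linalg.inv(sum(W))                 # covariance of the pooled estimate
    beta = V @ sum(w @ b for w, b in zip(W, betas))
    return beta, V

betas = [np.array([0.12, -0.30]), np.array([0.05, -0.22]), np.array([0.20, -0.35])]
covs = [0.020 * np.eye(2), 0.030 * np.eye(2), 0.015 * np.eye(2)]
beta_pooled, V_pooled = pool_multivariate(betas, covs)
print(beta_pooled, np.sqrt(np.diag(V_pooled)))   # pooled slopes and their standard errors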

11.
We propose a method for the analysis of a spatial point pattern, which is assumed to arise as a set of observations from a spatial nonhomogeneous Poisson process. The spatial point pattern is observed in a bounded region, which, for most applications, is taken to be a rectangle in the space where the process is defined. The method is based on modeling a density function, defined on this bounded region, that is directly related with the intensity function of the Poisson process. We develop a flexible nonparametric mixture model for this density using a bivariate Beta distribution for the mixture kernel and a Dirichlet process prior for the mixing distribution. Using posterior simulation methods, we obtain full inference for the intensity function and any other functional of the process that might be of interest. We discuss applications to problems where inference for clustering in the spatial point pattern is of interest. Moreover, we consider applications of the methodology to extreme value analysis problems. We illustrate the modeling approach with three previously published data sets. Two of the data sets are from forestry and consist of locations of trees. The third data set consists of extremes from the Dow Jones index over a period of 1303 days.
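One common way to relate the modeled density to the Poisson intensity, sketched here with assumed notation (not quoted from the paper): for a location $s$ in the rescaled bounded region,
$$
\lambda(s) \;=\; \gamma\, f(s),
\qquad
f(s) \;=\; \int k_{\mathrm{Beta}}(s;\theta)\,dG(\theta),
\qquad
G \sim \mathrm{DP}(\alpha, G_0),
$$
where $\gamma>0$ controls the expected total number of points, $k_{\mathrm{Beta}}$ is the bivariate Beta mixture kernel, and $G$ is the Dirichlet process mixing distribution.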

12.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
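A minimal sketch of the first two stages described above, using PyWavelets for the discrete wavelet transform and a simple correlation screen (toy data; the wavelet, level, and threshold are arbitrary choices for illustration, and the genetic-algorithm search is only indicated in a comment):

import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 256))                 # 40 spectra, 256 wavelengths (toy data)
y = 0.8 * X[:, 100] + rng.normal(scale=0.1, size=40)

# Stage 1: discrete wavelet transform of each spectrum, coefficients flattened
def dwt_features(x, wavelet="db4", level=4):
    return np.concatenate(pywt.wavedec(x, wavelet, level=level))

W = np.vstack([dwt_features(x) for x in X])

# Stage 2: drop coefficients weakly correlated with the response
corr = np.array([abs(np.corrcoef(W[:, j], y)[0, 1]) for j in range(W.shape[1])])
W_screened = W[:, corr > 0.3]

# Stage 3 (not shown): a genetic algorithm searches subsets of W_screened,
# scored by an information-theoretic criterion, to build the final model.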

13.
Conditional probability distributions have commonly been used in modeling Markov chains. In this paper we consider an alternative approach based on copulas to investigate Markov-type dependence structures. Based on the realization of a single Markov chain, we estimate the parameters using one- and two-stage estimation procedures. We derive asymptotic properties of the marginal and copula parameter estimators and compare the performance of the estimation procedures based on Monte Carlo simulations. Under low and moderate dependence, the two-stage estimation has performance comparable to that of maximum likelihood estimation. In addition, we propose a parametric pseudo-likelihood ratio test for copula model selection under the two-stage procedure. We apply the proposed methods to an environmental data set.
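A sketch of the copula representation of a stationary first-order Markov chain that this kind of approach builds on (notation assumed for illustration): with stationary marginal cdf $F$, density $f$, and copula density $c(\cdot,\cdot;\theta)$, the one-step transition density is
$$
f(x_t \mid x_{t-1}) \;=\; c\{F(x_{t-1}),\,F(x_t);\theta\}\, f(x_t),
$$
so that in a two-stage procedure the marginal $F$ is estimated first and then plugged into the copula (pseudo-)likelihood for $\theta$, whereas one-stage maximum likelihood estimates both parts jointly.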

14.
In a general parametric setup, a multivariate regression model is considered when responses may be missing at random while the explanatory variables and covariates are completely observed. Asymptotic optimality properties of maximum likelihood estimators for such models are linked to the Fisher information matrix for the parameters. It is shown that the information matrix is well defined for the missing-at-random model and that it plays the same role as in the complete-data linear models. Applications of the methodologic developments in hypothesis-testing problems, without any imputation of missing data, are illustrated. Some simulation results comparing the proposed method with Rubin's multiple imputation method are presented.

15.
The problem of constructing nonlinear regression models is investigated in order to analyze data with complex structure. We introduce radial basis functions with a hyperparameter that adjusts the amount of overlap between basis functions and incorporates information from both the input and response variables. Using these radial basis functions, we construct nonlinear regression models with the help of regularization. Crucial issues in the model-building process are the choices of the hyperparameter, the number of basis functions and a smoothing parameter. We present information-theoretic criteria for evaluating statistical models under model misspecification of both the distributional and structural assumptions. We use real data examples and Monte Carlo simulations to investigate the properties of the proposed nonlinear regression modeling techniques. The simulation results show that our nonlinear modeling performs well in various situations, and clear improvements are obtained from the use of the hyperparameter in the basis functions.
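A minimal sketch of regularized regression with Gaussian radial basis functions, where a width hyperparameter governs how strongly neighbouring bases overlap (a generic illustration under assumed names; the paper's hyperparameter, criteria, and data are not reproduced here):

import numpy as np

def rbf_design(x, centers, nu):
    """Gaussian RBF design matrix; nu rescales the base width, i.e. the overlap."""
    h = nu * np.mean(np.diff(np.sort(centers)))        # base width from center spacing
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * h ** 2))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=80)

centers = np.linspace(0, 1, 15)
Phi = rbf_design(x, centers, nu=2.0)
lam = 1e-2                                             # smoothing (regularization) parameter
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
y_hat = Phi @ w                                        # fitted regularized RBF regression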

16.
We consider a continuous-time model for the evolution of social networks. A social network is here conceived as a (di-) graph on a set of vertices, representing actors, and the changes of interest are creation and disappearance over time of (arcs) edges in the graph. Hence we model a collection of random edge indicators that are not, in general, independent. We explicitly model the interdependencies between edge indicators that arise from interaction between social entities. A Markov chain is defined in terms of an embedded chain with holding times and transition probabilities. Data are observed at fixed points in time and hence we are not able to observe the embedded chain directly. Introducing a prior distribution for the parameters we may implement an MCMC algorithm for exploring the posterior distribution of the parameters by simulating the evolution of the embedded process between observations.

17.
This paper studies estimation in the proportional odds model based on randomly truncated data. The proposed estimators for the regression coefficients include a class of minimum distance estimators defined through a weighted empirical odds function. We investigate asymptotic properties such as consistency and the limiting distribution of the proposed estimators under mild conditions. The finite-sample properties are investigated through a simulation study comparing some of the estimators in the class. We conclude with an illustration of the proposed method on a well-known AIDS data set.
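For reference, the proportional odds survival model can be written as follows (one common parameterization, assumed here for illustration; sign conventions vary): with baseline survival function $S_0$,
$$
\frac{1 - S(t \mid \mathbf{x})}{S(t \mid \mathbf{x})}
\;=\;
e^{\mathbf{x}^{\top}\boldsymbol{\beta}}\,
\frac{1 - S_0(t)}{S_0(t)},
$$
so covariates act multiplicatively on the odds of failure by time $t$, and the (weighted empirical) odds function is the natural object for defining minimum distance estimators of $\boldsymbol{\beta}$.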

18.
We consider the problem of selecting variables in factor analysis models. The $L_1$ regularization procedure is introduced to perform an automatic variable selection. In the factor analysis model, each variable is controlled by multiple factors when there is more than one underlying factor. We treat parameters corresponding to the multiple factors as grouped parameters, and then apply the group lasso. Furthermore, the weight of the group lasso penalty is modified to obtain appropriate estimates and improve the performance of variable selection. Crucial issues in this modeling procedure include the selection of the number of factors and a regularization parameter. Choosing these parameters can be viewed as a model selection and evaluation problem. We derive a model selection criterion for evaluating the factor analysis model via the weighted group lasso. Monte Carlo simulations are conducted to investigate the effectiveness of the proposed procedure. A real data example is also given to illustrate our procedure. The Canadian Journal of Statistics 40: 345–361; 2012 © 2012 Statistical Society of Canada
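A sketch of the penalized criterion this type of procedure maximizes (notation assumed for illustration, not quoted from the paper): with loading matrix $\Lambda$ whose $j$-th row $\boldsymbol{\lambda}_j$ collects the loadings of variable $j$ on all factors, unique variances $\Psi$, and log-likelihood $\ell$,
$$
\ell_{\rho}(\Lambda,\Psi) \;=\; \ell(\Lambda,\Psi) \;-\; n\rho \sum_{j=1}^{p} w_j \,\|\boldsymbol{\lambda}_j\|_2,
$$
where the group penalty shrinks entire rows of loadings to zero, thereby removing the corresponding variables, and the weights $w_j$ are modified to improve the selection.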

19.
We propose a regime-switching autoregressive model and apply it to analyze the daily water discharge series of the River Tisza in Hungary. The dynamics is governed by two regimes, across which both the autoregressive coefficients and the innovation distributions change; moreover, the hidden regime indicator process is allowed to be non-Markovian. After examining stationarity and basic properties of the model, we turn to its estimation by Markov chain Monte Carlo (MCMC) methods and propose two algorithms. The values of the latent process serve as auxiliary parameters in the first, while the change points of the regimes play that role in the second, in a reversible jump MCMC setting. After comparing the mixing performance of the two methods, the model is fitted to the water discharge data. Simulations show that it reproduces the important features of the water discharge series, such as the highly skewed marginal distribution and the asymmetric shape of the hydrograph.
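A sketch of the regime-switching autoregression described above (notation assumed for illustration): with latent regime indicator $s_t \in \{1,2\}$,
$$
y_t \;=\; \phi_0^{(s_t)} + \sum_{i=1}^{p} \phi_i^{(s_t)}\, y_{t-i} + \varepsilon_t,
\qquad \varepsilon_t \sim F^{(s_t)},
$$
where both the autoregressive coefficients and the innovation distribution $F^{(s_t)}$ depend on the regime, and the indicator process $\{s_t\}$ is not required to be Markovian.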

20.
We study the distribution of the adaptive LASSO estimator [Zou, H., 2006. The adaptive LASSO and its oracle properties. J. Amer. Statist. Assoc. 101, 1418–1429] in finite samples as well as in the large-sample limit. The large-sample distributions are derived both for the case where the adaptive LASSO estimator is tuned to perform conservative model selection as well as for the case where the tuning results in consistent model selection. We show that the finite-sample as well as the large-sample distributions are typically highly nonnormal, regardless of the choice of the tuning parameter. The uniform convergence rate is also obtained, and is shown to be slower than $n^{-1/2}$ in case the estimator is tuned to perform consistent model selection. In particular, these results question the statistical relevance of the ‘oracle’ property of the adaptive LASSO estimator established in Zou [2006. The adaptive LASSO and its oracle properties. J. Amer. Statist. Assoc. 101, 1418–1429]. Moreover, we also provide an impossibility result regarding the estimation of the distribution function of the adaptive LASSO estimator. The theoretical results, which are obtained for a regression model with orthogonal design, are complemented by a Monte Carlo study using nonorthogonal regressors.
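A minimal sketch of the adaptive LASSO itself, computed via the usual column-rescaling trick on top of an ordinary LASSO solver (a generic illustration with made-up data; the paper studies the estimator's distribution rather than its computation):

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(size=n)

gamma, lam = 1.0, 0.05
beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
scale = np.abs(beta_ols) ** gamma           # reciprocal of the adaptive weights
b = Lasso(alpha=lam, fit_intercept=False).fit(X * scale, y).coef_
beta_adaptive = b * scale                   # back-transform to the original coefficients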

