
1.

Variable selection for semiparametric varying coefficient partially linear model based on modal regression with missing data

Yafeng Xia Yarong Qu Nailing Sun 《Communications in Statistics - Theory and Methods》2013,42(20):5121-5137

Abstract

In this article, we focus on variable selection for the semiparametric varying-coefficient partially linear model with responses missing at random. Variable selection is based on modal regression, where the nonparametric functions are approximated by a B-spline basis. The proposed procedure uses the SCAD penalty to select the parametric and nonparametric components simultaneously. Furthermore, we establish the consistency, sparsity, and asymptotic normality of the resulting estimators. The penalized estimates of the proposed method are computed by the EM algorithm. Simulation studies are carried out to assess the finite-sample performance of the proposed variable selection procedure.

2.

Unsupervised learning of regression mixture models with unknown number of components

《Journal of Statistical Computation and Simulation》2012,82(12):2308-2334

ABSTRACT

We propose a new unsupervised learning algorithm to fit regression mixture models with an unknown number of components. The developed approach consists of penalized maximum likelihood estimation carried out by a robust expectation–maximization (EM)-like algorithm. We derive it for polynomial, spline, and B-spline regression mixtures. The proposed learning approach is unsupervised: (i) it simultaneously infers the model parameters and the optimal number of regression mixture components from the data as the learning proceeds, rather than in a two-step scheme as in standard model-based clustering, which applies model selection criteria afterward, and (ii) it does not require accurate initialization, unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated and real data show that the proposed algorithm performs well and provides accurate clustering results, and confirm its benefit for practical applications.

3.

On Variance Estimation in Semiparametric Regression Models

《Communications in Statistics - Theory and Methods》2013,42(8):1737-1742

ABSTRACT

This article considers estimation of the error variance in a semiparametric regression model. The estimator, based on the semiparametric residuals, is shown to be consistent (at a certain rate) for the error variance.

4.

An Alternative Estimation Method for the Semiparametric Accelerated Failure Time Mixture Cure Model

Linzhi Xu 《Communications in Statistics - Simulation and Computation》2013,42(9):1980-1990

We propose an alternative estimation method for the semiparametric accelerated failure time mixture cure model by incorporating the profile likelihood into the M-step of the EM algorithm. In simulation studies, the proposed method performs as well as the existing methods when censoring is light and better than them when censoring is moderate. The proposed method also runs faster than the existing methods.

5.

Semiparametric multiple kernel estimators and model diagnostics for count regression functions

Lamia Djerroud Tristan Senga Kiessé Smail Adjabi 《Communications in Statistics - Theory and Methods》2020,49(9):2131-2157

Abstract

This study concerns semiparametric approaches to estimating discrete multivariate count regression functions. The semiparametric approaches investigated combine discrete multivariate nonparametric kernel estimation with parametric estimation such that (i) prior knowledge of the conditional distribution of the model response may be incorporated and (ii) the bias of the traditional Nadaraya-Watson nonparametric kernel regression estimator may be reduced. We are specifically interested in the combination of the two estimation approaches and in some asymptotic properties of the resulting estimators. Asymptotic normality results are shown for the nonparametric correction terms of the parametric start function of the estimators. The performance of the discrete semiparametric multivariate kernel estimators studied is illustrated using simulations and real count data. In addition, diagnostic checks are performed to test the adequacy of the parametric start model to the true discrete regression model. Finally, using discrete semiparametric multivariate kernel estimators provides a bias reduction when the parametric multivariate regression model used as the start regression function belongs to a neighborhood of the true regression model.
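For orientation, the traditional Nadaraya-Watson estimator named in this abstract is a kernel-weighted average of the observed responses. The sketch below is only an illustration and not the authors' method: it assumes a single continuous predictor and a Gaussian kernel, whereas the article works with discrete multivariate kernels. The function name `nadaraya_watson` and its parameters are hypothetical.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at the point x0.

    Assumption: Gaussian kernel on a single continuous predictor.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Unnormalized Gaussian kernel weights; the normalizing constant
    # cancels in the ratio below.
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

A smaller `bandwidth` makes the fit follow nearby observations more closely, while a larger one averages over more of the sample.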

6.

Variable selection in finite mixture of semi-parametric regression models

Ehsan Ormoz Farzad Eskandari 《Communications in Statistics - Theory and Methods》2013,42(3):695-711

Abstract

In this paper we are concerned with variable selection in finite mixtures of semiparametric regression models. This task consists of model selection for the nonparametric component and variable selection for the parametric part. Thus, a separate model selection is needed for the nonparametric component of each submodel. To overcome this computational burden, we introduce a class of variable selection procedures for finite mixtures of semiparametric regression models based on a penalized approach. It is shown that the new method is consistent for variable selection. Simulations show that the proposed method performs well, improves on previous work in this area, and requires much less computing power than existing methods.

7.

Using logistic regression for semiparametric comparison of population means and variances

Shuwen Wan Binrong Xu Biao Zhang 《Communications in Statistics - Theory and Methods》2013,42(9):2485-2503

Abstract

We propose to compare population means and variances under a semiparametric density ratio model. The proposed method is easy to implement using the logistic regression procedures available in many statistical software packages, and it often works very well when data are not normal. In this paper, we construct semiparametric estimators of the differences of two population means and variances, and derive their asymptotic distributions. We prove that the proposed semiparametric estimators are asymptotically more efficient than the corresponding nonparametric ones. In addition, a simulation study and the analysis of two real data sets are presented. Finally, a short discussion is provided.

8.

Semiparametric models: a generalized self-consistency approach (total citations: 1; self-citations: 0; citations by others: 1)

Tsodikov A 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2003,65(3):759-774

Summary. In semiparametric models, the dimension d of the maximum likelihood problem is potentially unlimited. Conventional estimation methods generally behave like O(d³). A new O(d) estimation procedure is proposed for a large class of semiparametric models. Potentially unlimited dimension is handled in a numerically efficient way through a Nelson–Aalen-like estimator. Discussion of the new method is put in the context of recently developed minorization–maximization algorithms based on surrogate objective functions. The procedure for semiparametric models is used to demonstrate three methods to construct a surrogate objective function: using the difference of two concave functions, the EM way and the new quasi-EM (QEM) approach. The QEM approach is based on a generalization of the EM-like construction of the surrogate objective function so it does not depend on the missing data representation of the model. Like the EM algorithm, the QEM method has a dual interpretation, a result of merging the idea of surrogate maximization with the idea of imputation and self-consistency. The new approach is compared with other possible approaches by using simulations and analysis of real data. The proportional odds model is used as an example throughout the paper.

9.

Semiparametric additive isotonic regression

Guang Cheng 《Journal of statistical planning and inference》2009

We consider the efficient estimation in the semiparametric additive isotonic regression model where each additive nonparametric component is assumed to be a monotone function. We show that the least-squares estimator of the finite-dimensional regression coefficient is root-n consistent and asymptotically normal. Moreover, the isotonic estimator of each additive functional component is proved to have the oracle property, which means the additive component can be estimated with the highest asymptotic accuracy as if the other components were known. A fast algorithm is developed by iterating between a cyclic pool adjacent violators procedure and solving a standard ordinary least squares problem. Simulations are used to illustrate the performance of the proposed procedure and verify the oracle property.
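The pool adjacent violators step mentioned in this abstract can be sketched for a single monotone component. The code below is a minimal illustration under stated assumptions, not the article's full algorithm: it fits one nondecreasing sequence by weighted least squares and leaves out both the cycling over additive components and the ordinary least squares step. The name `pava` and the optional weights `w` are hypothetical.

```python
def pava(y, w=None):
    """Nondecreasing weighted least-squares fit to y (pool adjacent violators)."""
    if w is None:
        w = [1.0] * len(y)
    if len(w) != len(y):
        raise ValueError("y and w must have the same length")
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For example, `pava([1, 3, 2, 4])` pools the violating pair (3, 2) into their mean and returns `[1.0, 2.5, 2.5, 4.0]`.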

10.

Hidden Markov Models with mixtures as emission distributions

Stevenn Volant Caroline Bérard Marie-Laure Martin-Magniette Stéphane Robin 《Statistics and Computing》2014,24(4):493-504

In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often supposed to belong to some parametric family. In this paper, a semiparametric model where the emission distributions are a mixture of parametric distributions is proposed to get a higher flexibility. We show that the standard EM algorithm can be adapted to infer the model parameters. For the initialization step, starting from a large number of components, a hierarchical method to combine them into the hidden states is proposed. Three likelihood-based criteria to select the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination between the combining criteria and the model selection criteria and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package HMMmix is freely available on CRAN.

11.

A marginal regression model for multivariate failure time data with a surviving fraction

Peng Y Taylor JM Yu B 《Lifetime data analysis》2007,13(3):351-369

A marginal regression approach for correlated censored survival data has become a widely used statistical method. Examples of this approach in survival analysis range from the early work by Wei et al. (J Am Stat Assoc 84:1065–1073, 1989) to more recent work by Spiekerman and Lin (J Am Stat Assoc 93:1164–1175, 1998). This approach is particularly useful if a covariate’s population average effect is of primary interest and the correlation structure is not of interest or cannot be appropriately specified due to lack of sufficient information. In this paper, we consider a semiparametric marginal proportional hazards mixture cure model for clustered survival data with a surviving or “cure” fraction. Unlike the clustered data in previous work, the latent binary cure statuses of patients in one cluster tend to be correlated in addition to the possible correlated failure times among the patients in the cluster who are not cured. The complexity of specifying appropriate correlation structures for the data becomes even worse if the potential correlation between cure statuses and the failure times in the cluster has to be considered, and thus a marginal regression approach is particularly attractive. We formulate a semiparametric marginal proportional hazards mixture cure model. Estimates are obtained using an EM algorithm and expressions for the variance–covariance are derived using sandwich estimators. Simulation studies are conducted to assess finite sample properties of the proposed model. The marginal model is applied to a multi-institutional study of local recurrences of tonsil cancer patients who received radiation therapy. It reveals new findings that are not available from previous analyses of this study that ignored the potential correlation between patients within the same institution.

12.

Semi-empirical pseudo-likelihood for estimating equations in the presence of missing responses

Qihua Wang Ruimiao Luo 《Journal of statistical planning and inference》2011,141(8):2589-2599

Consider a semiparametric model which parameterizes only the conditional distribution of Y given X, f(y|x,β), and allows the marginal distribution of X to be completely arbitrary. Under the semiparametric model, we develop semi-empirical pseudo-likelihood inference with estimating equation in the presence of missing responses. We define semi-empirical likelihood pseudo-score estimates for both the model parameter and the parameter in the estimating equation simultaneously. Also, we develop semi-empirical pseudo-likelihood ratio inference for them, respectively. A simulation was conducted to evaluate the finite sample properties of the proposed estimators and semi-empirical pseudo-likelihood approach.

13.

Semiparametric Efficient Estimation of the Mean of a Time Series in the Presence of Conditional Heterogeneity of Unknown Form

《Econometric Reviews》2013,32(3):229-257

Abstract

We obtain semiparametric efficiency bounds for estimation of a location parameter in a time series model where the innovations are stationary and ergodic conditionally symmetric martingale differences but otherwise possess general dependence and distributions of unknown form. We then describe an iterative estimator that achieves this bound when the conditional density functions of the sample are known. Finally, we develop a “semi-adaptive” estimator that achieves the bound when these densities are unknown to the investigator. This estimator employs nonparametric kernel estimates of the densities. Monte Carlo results are reported.

14.

Semiparametric Smooth Coefficient Stochastic Frontier Model With Panel Data

Feng Yao Fan Zhang Subal C. Kumbhakar 《Journal of Business & Economic Statistics》2013,31(3):556-572

ABSTRACT

We investigate the semiparametric smooth coefficient stochastic frontier model for panel data in which the distribution of the composite error term is assumed to be of known form but depends on some environmental variables. We propose multi-step estimators for the smooth coefficient functions as well as the parameters of the distribution of the composite error term and obtain their asymptotic properties. The Monte Carlo study demonstrates that the proposed estimators perform well in finite samples. We also consider an application in which we perform a model specification test, construct confidence intervals, and estimate efficiency scores that depend on some environmental variables. The application uses panel data on 451 large U.S. firms to explore the effects of computerization on productivity. Results show that two popular parametric models used in the stochastic frontier literature are likely to be misspecified. Compared with the parametric estimates, our semiparametric model shows a positive and larger overall effect of computer capital on productivity. The efficiency levels, however, were not much different among the models. Supplementary materials for this article are available online.

15.

Stochastic EM algorithm of a finite mixture model from hurdle Poisson distribution with missing responses

Ying-zi Fu 《Communications in Statistics - Theory and Methods》2013,42(20):5918-5932

ABSTRACT

In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of model parameters and mixing proportions. Specifically, missing data are assumed to be missing not at random (MNAR)/nonignorable missing (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm efficiency, a stochastic step is incorporated into the E-step based on data augmentation, whereas the M-step is solved by the method of conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The considered model is a general framework that captures the important characteristics of count data analysis such as zero inflation/deflation, heterogeneity, and missingness, providing more insight into the features of the data and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from some standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed to replace the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.

16.

ASYMPTOTIC AND SMALL SAMPLE STATISTICAL PROPERTIES OF RANDOM FRAILTY VARIANCE ESTIMATES FOR SHARED GAMMA FRAILTY MODELS

《Communications in Statistics - Simulation and Computation》2013,42(3):581-595

This paper concerns maximum likelihood estimation for the semiparametric shared gamma frailty model; that is, the Cox proportional hazards model with the hazard function multiplied by a gamma random variable with mean 1 and variance θ. A hybrid ML-EM algorithm is applied to 26 400 simulated samples of 400 to 8000 observations with Weibull hazards. The hybrid algorithm is much faster than the standard EM algorithm, faster than standard direct maximum likelihood (ML, Newton–Raphson) for large samples, and gives almost identical results to the penalised likelihood method in S-PLUS 2000. When the true value θ₀ of θ is zero, the estimates of θ are asymptotically distributed as a 50–50 mixture between a point mass at zero and a normal random variable on the positive axis. When θ₀ > 0, the asymptotic distribution is normal. However, for small samples, simulations suggest that the estimates of θ are approximately distributed as an x–(100 − x)% mixture, 0 ≤ x ≤ 50, between a point mass at zero and a normal random variable on the positive axis even for θ₀ > 0. In light of this, p-values and confidence intervals need to be adjusted accordingly. We indicate an approximate method for carrying out the adjustment.
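To illustrate why a point mass at zero changes the p-value, consider a likelihood ratio test of θ = 0. In the standard asymptotic boundary case the statistic is a 50–50 mixture of a point mass at zero and a chi-squared variable with one degree of freedom, so the usual chi-squared tail probability is halved. The sketch below is a hedged illustration and not the article's approximate adjustment: the function name `boundary_lrt_pvalue` and the `mass_at_zero` parameter are hypothetical, and carrying a small-sample mixing weight over to the test statistic is an assumption made here.

```python
import math


def boundary_lrt_pvalue(lrt_stat, mass_at_zero=0.5):
    """P-value when the null law is a point mass at 0 mixed with chi-squared(1).

    Assumption: mass_at_zero is the mixing weight on the point mass;
    0.5 is the standard asymptotic boundary case.
    """
    if lrt_stat < 0:
        raise ValueError("lrt_stat must be non-negative")
    if not 0.0 <= mass_at_zero <= 1.0:
        raise ValueError("mass_at_zero must lie in [0, 1]")
    if lrt_stat == 0:
        # Every outcome is at least as extreme as zero.
        return 1.0
    # P(chi-squared(1) > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    chi2_tail = math.erfc(math.sqrt(lrt_stat / 2.0))
    return (1.0 - mass_at_zero) * chi2_tail
```

With `mass_at_zero=0.0` the function reduces to the ordinary chi-squared p-value with one degree of freedom, which gives a simple check of the adjustment.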

17.

Semiparametric principal component Poisson regression on clustered data

Kristina Celene M. Manalaysay 《Communications in Statistics - Simulation and Computation》2017,46(2):1546-1556

In modeling count data with multivariate predictors, we often encounter problems with clustering of observations and interdependency of predictors. We propose to use principal components of the predictors to mitigate the multicollinearity problem; to abate information losses due to dimension reduction, a semiparametric link between the count dependent variable and the principal components is postulated. Clustering of observations is incorporated into the model as a random component, and the model is estimated via the backfitting algorithm. A simulation study illustrates the advantages of the proposed model over standard Poisson regression in a wide range of scenarios.

18.

Testing generalized linear and semiparametric models against smooth alternatives

Göran Kauermann & Gerhard Tutz 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2001,63(1):147-166

We propose goodness-of-fit tests for testing generalized linear models and semiparametric regression models against smooth alternatives. The focus is on models having both continuous and factorial covariates. As a smooth extension of a parametric or semiparametric model we use generalized varying-coefficient models as proposed by Hastie and Tibshirani. A likelihood ratio statistic is used for testing. Asymptotic expansions allow us to write the estimates as linear smoothers, which in turn guarantees simple and fast bootstrapping of the test statistic. The test is shown to have √n-power, but in contrast with parametric tests it is powerful against smooth alternatives in general.

19.

Empirical likelihood inference for a semiparametric hazards regression model

Wei Chen Dehui Wang 《Communications in Statistics - Theory and Methods》2013,42(11):3236-3248

ABSTRACT

We investigated the empirical likelihood inference approach under a general class of semiparametric hazards regression models with survival data subject to right-censoring. An empirical likelihood ratio for the full 2p regression parameters involved in the model is obtained. We showed that it converged weakly to a random variable which could be written as a weighted sum of 2p independent chi-squared variables with one degree of freedom. Using this, we could construct a confidence region for parameters. We also suggested an adjusted version for the preceding statistic, whose limit followed a standard chi-squared distribution with 2p degrees of freedom.

20.

Bootstrap likelihood ratio test for Weibull mixture models fitted to grouped data

Youjiao Yu Jane L. Harvill 《Communications in Statistics - Theory and Methods》2013,42(18):4550-4568

Abstract

Weibull mixture models are widely used in a variety of fields for modeling phenomena caused by heterogeneous sources. We focus on circumstances in which the original observations are not available, and instead the data come in the form of a grouping of the original observations. We illustrate an EM algorithm for fitting Weibull mixture models to grouped data and propose a bootstrap likelihood ratio test (LRT) for determining the number of subpopulations in a mixture model. The effectiveness of the LRT methods is investigated via simulation. We illustrate the utility of these methods by applying them to two grouped data applications.