Similar Literature (20 results)
1.
Non-random sampling is a source of bias in empirical research. It is common for the outcomes of interest (e.g. wage distribution) to be skewed in the source population. Sometimes, the outcomes are further subject to sample selection, a type of missing data, resulting in partial observability. Methods based on complete cases for skewed data are therefore inadequate for the analysis of such data, and a general sample selection model is required. Heckman proposed a full maximum likelihood estimation method under the normality assumption for sample selection problems, and parametric and non-parametric extensions have since been developed. We generalize the Heckman selection model to allow for underlying skew-normal distributions. Finite-sample performance of the maximum likelihood estimator of the model is studied via simulation. Applications illustrate the strength of the model in capturing spurious skewness in bounded scores, and in modelling data where logarithm transformation could not mitigate the effect of inherent skewness in the outcome variable.
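As context for the model being generalized, the sketch below simulates selectively observed data and applies the classical Heckman two-step estimator under normal errors. It is a minimal illustration on simulated data, not the paper's skew-normal maximum likelihood estimator; all variable names and parameter values are invented for the example.

```python
# Classical Heckman two-step under bivariate normal errors (illustrative).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                      # selection covariate
x = rng.normal(size=n)                      # outcome covariate
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T
selected = (0.5 + 1.0 * z + u) > 0          # selection equation
y = 1.0 + 2.0 * x + e                       # outcome equation
y_obs = np.where(selected, y, np.nan)       # outcome missing when not selected

# Step 1: probit of selection on (1, z) by maximum likelihood.
def probit_nll(g):
    idx = g[0] + g[1] * z
    return -np.sum(np.where(selected, norm.logcdf(idx), norm.logcdf(-idx)))

g_hat = minimize(probit_nll, np.zeros(2)).x

# Step 2: OLS of y on (1, x, inverse Mills ratio) for the selected cases.
idx = g_hat[0] + g_hat[1] * z
mills = norm.pdf(idx) / norm.cdf(idx)       # inverse Mills ratio
X2 = np.column_stack([np.ones(n), x, mills])[selected]
beta_hat, *_ = np.linalg.lstsq(X2, y_obs[selected], rcond=None)
print(beta_hat)  # roughly [1, 2, rho*sigma] = [1, 2, 0.6]
```

The paper instead fits the skew-normal generalization by full maximum likelihood, where the two-step logic above no longer applies directly.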

2.
The Quermass-interaction model generalizes the classical germ-grain Boolean model by adding a morphological interaction between the grains. It makes it possible to model random structures with specific morphologies that are unlikely to be generated by a Boolean model. The Quermass-interaction model depends in particular on an intensity parameter, which is impossible to estimate by classical likelihood or pseudo-likelihood approaches because the number of points is not observable from a germ-grain set. In this paper, we present a procedure based on the Takacs-Fiksel method that is able to estimate all parameters of the Quermass-interaction model, including the intensity. An intensive simulation study is conducted to assess the efficiency of the procedure and to provide practical recommendations. It also illustrates that estimation of the intensity parameter is crucial for identifying the model. Finally, the Quermass-interaction model is fitted by our method to P. Diggle's heather data set.

3.
We present a scalable Bayesian modelling approach for identifying brain regions that respond to a certain stimulus and using them to classify subjects. More specifically, we deal with multi-subject electroencephalography (EEG) data with a binary response distinguishing between alcoholic and control groups. The covariates are matrix-variate, with measurements taken from each subject at different locations across multiple time points. EEG data have a complex structure with both spatial and temporal attributes. We use a divide-and-conquer strategy and build separate local models, that is, one model at each time point. We employ Bayesian variable selection approaches using a structured continuous spike-and-slab prior to identify the locations that respond to a certain stimulus. We incorporate the spatio-temporal structure through a Kronecker product of the spatial and temporal correlation matrices. We develop a highly scalable estimation algorithm, using likelihood approximation, to deal with the large number of parameters in the model. Variable selection is done via clustering of the locations based on their duration of activation, and we use scoring rules to evaluate prediction performance. Simulation studies demonstrate the efficiency of our scalable algorithm in terms of estimation and fast computation, and we present results from a case study of multi-subject EEG data.
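The Kronecker construction mentioned above is easy to illustrate. The sketch below builds a joint spatio-temporal correlation matrix from small spatial and temporal pieces and shows the scalability payoff: the big inverse factors into two small ones. The exponential and AR(1) correlation forms, and all sizes, are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

def exp_corr(coords, phi):
    """Exponential spatial correlation from pairwise distances."""
    d = np.abs(coords[:, None] - coords[None, :])
    return np.exp(-d / phi)

def ar1_corr(T, rho):
    """AR(1) temporal correlation."""
    t = np.arange(T)
    return rho ** np.abs(t[:, None] - t[None, :])

S = exp_corr(np.linspace(0, 1, 10), phi=0.3)   # 10 locations
R = ar1_corr(20, rho=0.8)                      # 20 time points
C = np.kron(S, R)                              # 200 x 200 joint correlation

# Scalability: invert the 200 x 200 matrix via two small inverses,
# using the identity (S kron R)^{-1} = S^{-1} kron R^{-1}.
C_inv = np.kron(np.linalg.inv(S), np.linalg.inv(R))
print(np.allclose(C_inv @ C, np.eye(200)))     # True
```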

4.
Quantile regression methods have been widely used in many research areas in recent years. However, conventional estimation methods for quantile regression models do not guarantee that the estimated quantile curves will be non-crossing. While there are various methods in the literature to deal with this problem, many of them force the model parameters to lie within a subset of the parameter space in order for the required monotonicity to be satisfied; different methods may use different subspaces of the parameter space. This paper establishes a relationship between the monotonicity of the estimated conditional quantiles and the comonotonicity of the model parameters. We develop a novel quasi-Bayesian method for parameter estimation which can be used with both time series and independent observations. Simulation studies and an application to real financial returns show that the proposed method has the potential to be very useful in practice.
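To see the crossing problem concretely, the sketch below fits several conditional quantiles separately with statsmodels and counts design points where the fitted curves cross. The heteroscedastic data-generating process is invented for the example; the quasi-Bayesian estimator itself is not implemented here.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 1, n)
y = 1 + x + (0.5 + x) * rng.normal(size=n)   # heteroscedastic errors
X = sm.add_constant(x)

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
fits = np.column_stack([QuantReg(y, X).fit(q=t).predict(X) for t in taus])

# Crossing occurs where a higher quantile's fit drops below a lower one.
crossing = (np.diff(fits, axis=1) < 0).any(axis=1)
print(f"{crossing.sum()} of {n} points show crossing quantile estimates")
```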

5.
Latent variable models have been widely used for modelling the dependence structure of multiple-outcome data. However, the formulation of a latent variable model is often unknown a priori, and misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes of varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes, and we propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores people's values and beliefs and the social and personal characteristics that might influence them.

6.
Many model-free dimension reduction methods have been developed for high-dimensional regression data, but little attention has been paid to problems with non-linear confounding. In this paper, we propose an inverse-regression method of dependent-variable transformation for detecting the presence of non-linear confounding. The benefit of using geometrical information from our method is highlighted. A ratio estimation strategy is incorporated into our approach to enhance the interpretation of variable selection. This approach can be implemented not only in principal Hessian directions (PHD) but also in other recently developed dimension reduction methods. Several simulation examples are reported for illustration, and comparisons are made with sliced inverse regression and PHD when non-linear confounding is ignored. An illustrative application to a real data set is also presented.
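For readers unfamiliar with the inverse-regression family this abstract builds on, here is a minimal numpy sketch of sliced inverse regression (SIR) on simulated data with no non-linear confounding; the paper's transformation method for detecting such confounding is not reproduced.

```python
import numpy as np

def sir(X, y, n_slices=10, n_dir=1):
    """Sliced inverse regression: estimate leading e.d.r. directions."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Whiten X with the inverse square root of its covariance matrix.
    w, V = np.linalg.eigh(np.cov(Xc.T))
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ inv_sqrt
    # Slice the data on y and average the whitened predictors per slice.
    slices = np.array_split(np.argsort(y), n_slices)
    M = sum(len(ix) / n * np.outer(Z[ix].mean(axis=0), Z[ix].mean(axis=0))
            for ix in slices)
    # Leading eigenvectors of M, mapped back to the original X scale.
    _, U = np.linalg.eigh(M)
    return inv_sqrt @ U[:, ::-1][:, :n_dir]

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
y = (X @ beta) ** 3 + rng.normal(size=500)
print(sir(X, y).ravel())  # roughly proportional to beta, up to sign
```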

7.
Small-area estimation of poverty-related variables is an increasingly important analytical tool in targeting the delivery of food and other aid in developing countries. We compare two methods for the estimation of small-area means and proportions, namely empirical Bayes and composite estimation, with what has become the international standard method of Elbers, Lanjouw & Lanjouw (2003). In addition to differences among the sets of estimates and associated estimated standard errors, we discuss data requirements, design and model selection issues, and computational complexity. The Elbers, Lanjouw and Lanjouw (ELL) method is found to produce broadly similar estimates but to have much smaller estimated standard errors than the other methods. The question of whether these standard error estimates are downwardly biased is discussed. Although the question cannot yet be answered in full, as a precautionary measure it is strongly recommended that the ELL model be modified to include a small-area-level error component in addition to the cluster-level and household-level errors it currently contains. This recommendation is particularly important because the allocation of billions of dollars of aid funding is being determined and monitored via ELL. Under current aid distribution mechanisms, any downward bias in estimates of standard error may lead to suboptimal allocations, because distinctions are made between estimated small-area poverty levels that are not statistically significantly different.
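A minimal sketch of composite estimation, one of the two methods compared with ELL: each small-area estimate is a variance-based convex combination of a direct survey estimate and a synthetic estimate. All numbers (sampling variances, the synthetic MSE) are invented for illustration; the ELL method itself is a different, simulation-based procedure.

```python
import numpy as np

rng = np.random.default_rng(13)
areas = 10
true_mean = rng.normal(50, 5, size=areas)          # area poverty rates (%)
n_i = rng.integers(10, 60, size=areas)             # area sample sizes

direct = true_mean + 15 / np.sqrt(n_i) * rng.normal(size=areas)
var_direct = (15 / np.sqrt(n_i)) ** 2              # direct-estimator variance
synthetic = np.full(areas, direct.mean())          # pooled synthetic estimate
mse_synth = 25.0                                   # assumed synthetic MSE

# The composite weight favours the direct estimate where it is precise.
gamma = mse_synth / (mse_synth + var_direct)
composite = gamma * direct + (1 - gamma) * synthetic
print(np.round(composite, 1))
```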

8.
Penalised likelihood methods, such as the least absolute shrinkage and selection operator (Lasso) and the smoothly clipped absolute deviation penalty, have become widely used for variable selection in recent years. These methods impose penalties on regression coefficients to shrink a subset of them towards zero, achieving parameter estimation and model selection simultaneously. The amount of shrinkage is controlled by the regularisation parameter. Popular approaches for choosing the regularisation parameter include cross-validation, various information criteria and bootstrapping methods based on mean square error. In this paper, a new data-driven method for choosing the regularisation parameter is proposed and its consistency is established, not only in the usual fixed-dimensional case but also in settings where the dimension diverges. Simulation results show that the new method outperforms other popular approaches. An application of the proposed method to motif discovery in gene expression analysis is included.
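The baselines mentioned above (cross-validation and an information criterion) are available off the shelf; the sketch below compares the regularisation parameters they choose on simulated sparse data. The paper's new data-driven selector is not implemented here.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LassoLarsIC

rng = np.random.default_rng(3)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, 2.0]       # sparse truth: 5 active variables
y = X @ beta + rng.normal(size=n)

cv_fit = LassoCV(cv=10).fit(X, y)           # 10-fold cross-validation
bic_fit = LassoLarsIC(criterion="bic").fit(X, y)
print("CV lambda:", cv_fit.alpha_, " BIC lambda:", bic_fit.alpha_)
print("CV nonzeros:", np.sum(cv_fit.coef_ != 0),
      " BIC nonzeros:", np.sum(bic_fit.coef_ != 0))
```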

9.
In real-data analysis, deciding the best subset of variables in regression models is an important problem. Akaike's information criterion (AIC) is often used to select variables in many fields. When the sample size is not large, the AIC has a non-negligible bias that will detrimentally affect variable selection. The present paper considers a bias correction of the AIC for selecting variables in the generalized linear model (GLM). The GLM can express a number of statistical models by changing the distribution and the link function, such as the normal linear regression model, the logistic regression model and the probit model, which are commonly used in many applied fields. We obtain a simple expression for a bias-corrected AIC (corrected AIC, or CAIC) in GLMs, and we provide R code based on our formula. A numerical study reveals that the CAIC has better performance than the AIC for variable selection.
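To illustrate why a small-sample correction matters, the sketch below compares AIC with the classical Gaussian-linear-model correction AICc = AIC + 2k(k+1)/(n-k-1) over nested candidate models. Note this is the well-known special case, not the paper's CAIC formula for general GLMs, which is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 30                                    # small sample, where bias matters
X = sm.add_constant(rng.normal(size=(n, 6)))
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

for k in range(1, 7):                     # nested candidate subsets
    fit = sm.OLS(y, X[:, :k + 1]).fit()
    kk = (k + 1) + 1                      # coefficients plus error variance
    aic = -2 * fit.llf + 2 * kk
    aicc = aic + 2 * kk * (kk + 1) / (n - kk - 1)
    print(f"{k} slopes: AIC = {aic:7.2f}   AICc = {aicc:7.2f}")
```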

10.
Length-biased sampling data are often encountered in studies of economics, industrial reliability, epidemiology, genetics and cancer screening. The complication with this type of data is that the observed lifetimes suffer from left truncation and right censoring, where the left truncation variable has a uniform distribution. For the Cox proportional hazards model, Huang & Qin (Journal of the American Statistical Association, 107, 2012, p. 107) proposed a composite partial likelihood method which not only has the simplicity of the popular partial likelihood estimator, but can also be easily implemented in standard statistical software. The accelerated failure time model has become a useful alternative to the Cox proportional hazards model. In this paper, using the composite partial likelihood technique, we study this model with length-biased sampling data. The proposed method has a very simple form and is robust when the assumption that the censoring time is independent of the covariate is violated. To ease the difficulty of solving the non-smooth estimating equation, we use a kernel-smoothed estimation method (Heller; Journal of the American Statistical Association, 102, 2007, p. 552). Large-sample results and a re-sampling method for variance estimation are discussed. Simulation studies are conducted to compare the performance of the proposed method with existing methods, and a real data set is used for illustration.
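The kernel-smoothing device can be shown in isolation: an estimating function containing an indicator I(u > 0) is replaced by the smooth surrogate Phi(u/h), which standard root-finders handle easily. The sketch below applies this to a simple median-regression estimating equation, an invented stand-in for the length-biased AFT equations of the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)
y = 2.0 * x + rng.standard_t(df=3, size=n)   # heavy-tailed errors

def smoothed_ee(b, h=0.1):
    # Smoothed version of: sum_i x_i * (I(y_i - b*x_i > 0) - 0.5) = 0,
    # with the indicator replaced by the normal CDF at bandwidth h.
    return np.sum(x * (norm.cdf((y - b * x) / h) - 0.5))

b_hat = brentq(smoothed_ee, -10, 10)   # the smoothed function is monotone in b
print(b_hat)                            # close to the true slope 2.0
```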

11.
Skew-symmetric models offer a very flexible class of distributions for modelling data. These distributions can also be viewed as selection models for the symmetric component of the specified skew-symmetric distribution. The estimation of the location and scale parameters corresponding to the symmetric component is considered here, with the symmetric component known. Emphasis is placed on using the empirical characteristic function to estimate these parameters. This is made possible by an invariance property of the skew-symmetric family of distributions, namely that even transformations of skew-symmetric random variables have distributions depending only on the symmetric density. A distance metric between the real components of the empirical and true characteristic functions is minimized to obtain the estimators. The method is semiparametric, in that the symmetric component is specified but the skewing function is assumed unknown. Furthermore, the methodology is extended to hypothesis testing. Two tests for a null hypothesis of specific parameter values are considered, as well as a test for the hypothesis that the symmetric component has a specific parametric form. A resampling algorithm is described for practical implementation of these tests. The outcomes of various numerical experiments are presented.
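The invariance property above has a direct computational use: since cos(t(X - mu)/sigma) is an even transformation, the real part of the centred empirical characteristic function depends only on the symmetric component, and matching it to that component's characteristic function over a grid of t values estimates location and scale. The sketch below assumes, purely for illustration, a standard normal symmetric component (real CF exp(-t^2/2)) and skew-normal data; it is not the paper's exact weighting scheme.

```python
import numpy as np
from scipy.stats import skewnorm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = skewnorm.rvs(a=4, loc=2.0, scale=1.5, size=2000, random_state=rng)

t_grid = np.linspace(0.1, 2.0, 20)

def cf_distance(theta):
    mu, log_sigma = theta
    z = (x - mu) / np.exp(log_sigma)
    # Real part of the empirical CF of the standardized data on the grid.
    re_ecf = np.cos(t_grid[:, None] * z[None, :]).mean(axis=1)
    return np.sum((re_ecf - np.exp(-t_grid ** 2 / 2)) ** 2)

mu_hat, log_s_hat = minimize(cf_distance, [np.mean(x), 0.0]).x
print(mu_hat, np.exp(log_s_hat))  # close to the normal component's (2.0, 1.5)
```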

12.
In outcome-dependent sampling, the continuous or binary outcome variable in a regression model is available in advance to guide selection of a sample on which explanatory variables are then measured. Selection probabilities may either be a smooth function of the outcome variable or be based on a stratification of the outcome. In many cases, only data from the final sample are accessible to the analyst. A maximum likelihood approach for this data configuration is developed here for the first time. The likelihood for fully general outcome-dependent designs is stated, and then the special case of Poisson sampling is examined in more detail. The maximum likelihood estimator differs from the well-known maximum sample likelihood estimator, and an information bound result shows that the former is asymptotically more efficient. A simulation study suggests that the efficiency difference is generally small. Maximum sample likelihood estimation is therefore recommended in practice when only sample data are available. Some new smooth sample designs show considerable promise.

13.
The Ising model is one of the simplest and most famous models of interacting systems. It was originally proposed to model ferromagnetic interactions in statistical physics and is now widely used to model spatial processes in many areas such as ecology, sociology and genetics, usually without testing its goodness of fit. Here, we propose various test statistics and an exact goodness-of-fit test for the finite-lattice Ising model. The theory of Markov bases has been developed in algebraic statistics for exact goodness-of-fit testing using a Monte Carlo approach, but finding a Markov basis is often computationally intractable. We therefore develop a Monte Carlo method for exact goodness-of-fit testing for the Ising model that avoids computing a Markov basis and also leads to better connectivity of the Markov chain, and hence faster convergence. We show how this method can be applied to analyze the spatial organization of receptors on the cell membrane.
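A minimal sketch of the Monte Carlo idea: simulate lattices from the fitted Ising model by Gibbs sampling and compare a goodness-of-fit statistic with its simulated reference distribution. This is a plain parametric-bootstrap version with the interaction parameter fixed; the paper's exact conditional test, which conditions on sufficient statistics, is more refined.

```python
import numpy as np

def gibbs_ising(L, beta, sweeps, rng):
    """Gibbs-sample an L x L Ising lattice with free boundary conditions."""
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps):
        for i in range(L):
            for j in range(L):
                nb = sum(s[a, b]
                         for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < L and 0 <= b < L)
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))  # P(s_ij=+1|rest)
                s[i, j] = 1 if rng.random() < p_up else -1
    return s

def pair_stat(s):
    """Sufficient statistic: sum of products over neighbouring pairs."""
    return (s[:-1, :] * s[1:, :]).sum() + (s[:, :-1] * s[:, 1:]).sum()

rng = np.random.default_rng(6)
observed = gibbs_ising(L=10, beta=0.3, sweeps=50, rng=rng)
t_obs = pair_stat(observed)

# Reference distribution under the (here fixed, in practice estimated) model.
t_ref = [pair_stat(gibbs_ising(10, 0.3, 50, rng)) for _ in range(19)]
p_value = (1 + sum(t >= t_obs for t in t_ref)) / 20
print("Monte Carlo p-value:", p_value)
```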

14.
Pharmacokinetic (PK) data often contain concentration measurements below the quantification limit (BQL). While specific values cannot be assigned to these observations, they are nevertheless informative: they are known to lie below the lower limit of quantification (LLQ). Treating BQL values as missing data violates the usual missing at random (MAR) assumption underlying standard statistical methods, and therefore leads to biased or less precise parameter estimates. By definition, these data lie within the interval [0, LLQ] and can be treated as censored observations. Statistical methods that handle censored data, such as maximum likelihood and Bayesian methods, are thus useful in modelling such data sets. The main aim of this work was to investigate the impact of the amount of BQL observations on the bias and precision of parameter estimates in population PK models (non-linear mixed effects models in general) under the maximum likelihood method as implemented in SAS and NONMEM, and a Bayesian approach using Markov chain Monte Carlo (MCMC) as applied in WinBUGS. A second aim was to compare these different methods for dealing with BQL or censored data in a practical situation. The evaluation was illustrated by simulation based on a simple PK model: a number of data sets were simulated from a one-compartment first-order elimination PK model, and several quantification limits were applied to each simulated data set to generate data sets with average percentages of BQL data ranging from 25% to 75%. Their influence on the bias and precision of all population PK model parameters, such as clearance and volume of distribution, was explored and compared under each estimation approach.
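The censored-likelihood treatment of BQL data is simple to sketch: measured concentrations contribute density terms, while BQL records contribute the probability mass below the LLQ. The lognormal model below is an invented stand-in for a full population PK model, used only to show the likelihood mechanics.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(7)
log_c = rng.normal(loc=1.0, scale=0.8, size=300)   # true log-concentrations
llq = np.exp(0.6)                                   # quantification limit
bql = np.exp(log_c) < llq                           # about 30% below the limit
obs = np.where(bql, np.nan, log_c)                  # BQL values are unobserved

def censored_nll(theta):
    mu, log_sd = theta
    sd = np.exp(log_sd)
    ll_obs = norm.logpdf(obs[~bql], mu, sd).sum()          # measured values
    ll_bql = bql.sum() * norm.logcdf(np.log(llq), mu, sd)  # P(below LLQ)
    return -(ll_obs + ll_bql)

mu_hat, log_sd_hat = minimize(censored_nll, [0.0, 0.0]).x
print(mu_hat, np.exp(log_sd_hat))  # close to (1.0, 0.8) despite censoring
```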

15.
Remote sensing of the earth with satellites yields datasets that can be massive in size, nonstationary in space, and non-Gaussian in distribution. To overcome computational challenges, we use the reduced-rank spatial random effects (SRE) model in a statistical analysis of cloud-mask data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instrument on board NASA's Terra satellite. Parameterisations of cloud processes are the biggest source of uncertainty and sensitivity in different climate models' future projections of Earth's climate. An accurate quantification of the spatial distribution of clouds, as well as a rigorously estimated pixel-scale clear-sky-probability process, is needed to establish reliable estimates of cloud-distributional changes and trends caused by climate change. Here we give a hierarchical spatial-statistical modelling approach for a very large spatial dataset of 2.75 million pixels, corresponding to a granule of MODIS cloud-mask data, and we use spatial change-of-support relationships to estimate cloud fraction at coarser resolutions. Our model is non-Gaussian; it postulates a hidden process for the clear-sky probability that makes use of the SRE model, EM estimation, and optimal (empirical Bayes) spatial prediction of the clear-sky-probability process. Measures of prediction uncertainty are also given.
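The computational point of the reduced-rank SRE form can be shown directly. With covariance sigma2*I + S K S' (S an n x r matrix of basis functions, r much smaller than n), the Sherman-Morrison-Woodbury identity reduces the n x n inverse to r x r inverses. The dimensions below are small stand-ins for the 2.75 million MODIS pixels.

```python
import numpy as np

rng = np.random.default_rng(8)
n, r = 2000, 20
S = rng.normal(size=(n, r))            # fixed spatial basis functions
K = np.eye(r)                          # r x r random-effects covariance
sigma2 = 0.5                           # fine-scale/measurement variance

# (sigma2*I + S K S')^{-1} v = (v - S (sigma2*K^{-1} + S'S)^{-1} S'v) / sigma2
inner = np.linalg.inv(sigma2 * np.linalg.inv(K) + S.T @ S)

def apply_inverse(v):
    return (v - S @ (inner @ (S.T @ v))) / sigma2

v = rng.normal(size=n)
full = np.linalg.solve(sigma2 * np.eye(n) + S @ K @ S.T, v)
print(np.allclose(apply_inverse(v), full))  # True, at a fraction of the cost
```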

16.
Panel count data arise in many fields, and a number of estimation procedures have been developed, along with two procedures for variable selection. In this paper, we discuss model selection and parameter estimation together. For the former, a focused information criterion (FIC) is presented, and for the latter, a frequentist model average (FMA) estimation procedure is developed. A main advantage of the FIC, and its difference from existing model selection methods, is that it emphasizes the accuracy of the estimation of the parameters of interest rather than of all parameters. Further efficiency gains can be achieved by the FMA estimation procedure because, unlike existing methods, it takes into account the variability introduced at the model selection stage. Asymptotic properties of the proposed estimators are established, and a simulation study suggests that the proposed methods work well in practical situations. An illustrative example is also provided.
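A minimal sketch of the model-averaging idea: candidate models are fitted, weights are formed, and the estimates of a focus parameter are averaged. Smoothed-AIC weights on nested linear models are used below purely for illustration; the paper's FIC-based weights for panel count data are a different, focus-targeted construction.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 100
X = sm.add_constant(rng.normal(size=(n, 4)))
y = X[:, :3] @ np.array([1.0, 1.5, -0.5]) + rng.normal(size=n)

fits = [sm.OLS(y, X[:, :k + 1]).fit() for k in range(1, 5)]
aic = np.array([f.aic for f in fits])
w = np.exp(-(aic - aic.min()) / 2)
w /= w.sum()                                  # smoothed-AIC model weights

# Average the estimate of the focus parameter (the coefficient on x1),
# which every candidate model contains.
focus = np.array([f.params[1] for f in fits])
print("weights:", np.round(w, 3), " FMA estimate of beta1:", w @ focus)
```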

17.
Continuous proportional outcomes are collected from many practical studies, where responses are confined within the unit interval (0,1). Utilizing Barndorff-Nielsen and Jørgensen's simplex distribution, we propose a new type of generalized linear mixed-effects model for longitudinal proportional data, where the expected value of the proportion is directly modelled through a logit function of fixed and random effects. We establish statistical inference along the lines of Breslow and Clayton's penalized quasi-likelihood (PQL) and restricted maximum likelihood (REML) in the proposed model. We derive the PQL/REML using the high-order multivariate Laplace approximation, which gives satisfactory estimation of the model parameters. The proposed model and inference are illustrated by simulation studies and a data example. The simulation studies conclude that the fourth-order approximate PQL/REML performs satisfactorily. The data example shows that Aitchison's technique of the normal linear mixed model for logit-transformed proportional outcomes is not robust against outliers.

18.
High-dimensional data with a group structure among the variables arise in many contemporary statistical modelling problems, and heavy-tailed errors or outliers in the response often exist in such data. We consider robust group selection for partially linear models when the number of covariates can be larger than the sample size. A non-convex penalty function is applied to achieve variable selection and estimation in the linear part simultaneously, and we use polynomial splines to estimate the nonparametric component. Under regularity conditions, we show that the robust estimator enjoys the oracle property. Simulation studies demonstrate the performance of the proposed method with samples of moderate size, and the analysis of a real example illustrates that our method works well.

19.
The zero-inflated Poisson regression model is a special case of finite mixture models that is useful for count data containing many zeros. Typically, maximum likelihood (ML) estimation is used for fitting such models. However, it is well known that the ML estimator is highly sensitive to the presence of outliers and can become unstable when mixture components are poorly separated. In this paper, we propose an alternative robust estimation approach, robust expectation-solution (RES) estimation. We compare the RES approach with an existing robust approach, minimum Hellinger distance (MHD) estimation. Simulation results indicate that both methods improve on ML when outliers are present and/or when the mixture components are poorly separated. However, the RES approach is more efficient in all the scenarios we considered. In addition, the RES method is shown to yield consistent and asymptotically normal estimators and, in contrast to MHD, can be applied quite generally.
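For reference, the sketch below fits the zero-inflated Poisson model by plain maximum likelihood, the estimator whose outlier sensitivity motivates the RES approach; the RES and MHD estimators themselves are not implemented. An intercept-only mixture on simulated data is assumed.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(10)
n = 1000
is_zero = rng.random(n) < 0.3                  # true inflation probability 0.3
y = np.where(is_zero, 0, rng.poisson(2.5, n))  # true Poisson mean 2.5

def zip_nll(theta):
    logit_pi, log_lam = theta
    pi, lam = expit(logit_pi), np.exp(log_lam)
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))                # y = 0
    ll_pos = np.log(1 - pi) - lam + y * log_lam - gammaln(y + 1)  # y > 0
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

res = minimize(zip_nll, [0.0, 0.0])
print("pi:", expit(res.x[0]), " lambda:", np.exp(res.x[1]))
```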

20.
Right-censored and length-biased failure time data arise in many fields, including cross-sectional prevalent cohort studies, and their analysis has recently attracted a great deal of attention. It is well known that the two commonly used approaches to regression analysis of failure time data are hazard-based and quantile-based procedures, and most existing methods are hazard-based. In this paper, we consider quantile regression analysis of right-censored and length-biased data and present a semiparametric varying-coefficient partially linear model. For estimation of the regression parameters, a three-stage procedure that makes use of the inverse probability weighting technique is developed, and the asymptotic properties of the resulting estimators are established. In addition, the approach allows the censoring variable to depend on covariates, whereas most existing methods assume independence between the censoring variable and covariates. A simulation study suggests that the proposed approach works well in practical situations, and an illustrative example is provided.
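A minimal sketch of the inverse-probability-weighting ingredient: uncensored observations are re-weighted by the inverse of a Kaplan-Meier estimate of the censoring survival function, and a weighted check loss is minimized. This ignores the length-bias correction and the varying coefficients of the paper's three-stage procedure; the data, the linear quantile model and the weight truncation are all illustrative assumptions (continuous times, so ties are ignored).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(12)
n = 400
x = rng.uniform(size=n)
t = np.exp(1.0 + 0.5 * x + 0.3 * rng.normal(size=n))   # event times
c = np.exp(rng.normal(1.6, 0.5, size=n))               # censoring times
obs = np.minimum(t, c)
delta = (t <= c).astype(float)                         # 1 = event observed

def censoring_survival(times, event):
    """Kaplan-Meier estimate of P(C > t-) at each observation (no ties)."""
    surv = np.empty(len(times))
    s = 1.0
    for i in np.argsort(times):
        surv[i] = s                                 # survival just before t_i
        if event[i] == 0:                           # a censored T is an
            s *= 1 - 1 / np.sum(times >= times[i])  # "event" for C
    return surv

G = censoring_survival(obs, delta)
w = delta / np.maximum(G, 0.05)     # IPW weights, truncated for stability

def weighted_check_loss(beta, tau=0.5):
    r = np.log(obs) - beta[0] - beta[1] * x
    return np.sum(w * r * (tau - (r < 0)))

fit = minimize(weighted_check_loss, np.array([0.0, 0.0]), method="Nelder-Mead")
print(fit.x)  # roughly (1.0, 0.5): the conditional median of log event time
```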
