Similar literature
20 similar documents found
1.
In this work, we develop a modeling and estimation approach for the analysis of cross-sectional clustered data with multimodal conditional distributions, where the main interest is in the analysis of subpopulations. We propose to model such data in a hierarchical model with conditional distributions viewed as finite mixtures of normal components. With a large number of observations in the lowest-level clusters, a two-stage estimation approach is used. In the first stage, the normal mixture parameters in each lowest-level cluster are estimated using robust methods. Robust alternatives to maximum likelihood estimation are used to provide stable results even when the components of the conditional distributions do not quite meet normality assumptions. In the second stage, the lowest-level cluster-specific means and standard deviations are modeled in a mixed effects model. A small simulation study was conducted to compare the performance of finite normal mixture population parameter estimates based on robust and maximum likelihood estimation in stage 1. The proposed modeling approach is illustrated through the analysis of mouse tendon fibril diameter data. The analysis results address genotype differences between corresponding components in the mixtures and demonstrate the advantages of robust estimation in stage 1.

2.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 3278-3300
Under complex survey sampling, in particular when selection probabilities depend on the response variable (informative sampling), the sample and population distributions differ, possibly resulting in selection bias. This article addresses this problem by fitting two statistical models for one-way analysis of variance, namely the variance components model (a two-stage model) and the fixed effects model (a single-stage model), under complex survey designs such as two-stage sampling, stratification, and unequal probability of selection. Classical theory underlying the use of the two-stage model involves simple random sampling at each of the two stages. In such cases the model for the sample, after sample selection, is the same as the model for the population before sample selection. When the selection probabilities are related to the values of the response variable, standard estimates of the population model parameters may be severely biased, possibly leading to false inference. The idea behind the approach is to extract the model holding for the sample data as a function of the model in the population and of the first-order inclusion probabilities, and then to fit the sample model using analysis of variance, maximum likelihood, and pseudo maximum likelihood methods of estimation. The main feature of the proposed techniques is related to their behavior in terms of the informativeness parameter. We also show that using the population model while ignoring the informative sampling design yields biased model fitting.
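
As a rough illustration of what "extracting the model holding for the sample data" can mean, a relation commonly used in the informative-sampling literature is sketched below. It is not quoted from this abstract, and the symbols are assumptions: $f_p$ is the population density, $f_s$ the sample density, $\pi_i$ the first-order inclusion probability of unit $i$, and $E_p$ the expectation under the population model.

```latex
f_s(y_i) \;=\; \frac{E_p(\pi_i \mid y_i)\, f_p(y_i)}{E_p(\pi_i)}
```

Under this sketch, if $E_p(\pi_i \mid y_i)$ does not depend on $y_i$, the ratio equals one and the sample and population models coincide, which corresponds to the noninformative case.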

3.
Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer [Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 119–130] developed best linear unbiased predictors (BLUPs) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also present simulation studies to illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model.

4.
We consider the adjustment, based upon a sample of size n, of collections of vectors drawn from either an infinite or finite population. The vectors may be judged to be either normally distributed or, more generally, second-order exchangeable. We develop the work of Goldstein and Wooff (1998) to show how the familiar univariate finite population corrections (FPCs) naturally generalise to individual quantities in the multivariate population. The types of information we gain by sampling are identified with the orthogonal canonical variable directions derived from a generalised eigenvalue problem. These canonical directions share the same co-ordinate representation for all sample sizes and, for equally defined individuals, all population sizes, enabling simple comparisons between the effects of different sample sizes and of different population sizes. We conclude by considering how the FPC is modified for multivariate cluster sampling with exchangeable clusters. In univariate two-stage cluster sampling, we may decompose the variance of the population mean into the sum of the variance of cluster means and the variance of the cluster members within clusters. The first term has an FPC relating to the sampling fraction of clusters; the second term has an FPC relating to the sampling fraction of cluster size. We illustrate how this generalises in the multivariate case. We decompose the variance into two terms: the first relating to multivariate finite population sampling of clusters and the second to multivariate finite population sampling within clusters. We solve two generalised eigenvalue problems to show how to generalise the univariate to the multivariate: each of the two FPCs attaches to one, and only one, of the two eigenbases.
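
The univariate decomposition mentioned above can be sketched in a few lines. This is a minimal sketch of the classical design-based textbook formula for equal-sized clusters with simple random sampling without replacement at both stages, not the Bayes linear adjustment developed in the paper. The function name and all argument names are hypothetical.

```python
def two_stage_variance(s2_between, s2_within, N, n, M, m):
    """Variance of the sample mean per element under two-stage sampling.

    Assumes N equal-sized clusters of M elements each, with n clusters and
    then m elements per sampled cluster drawn without replacement.
    s2_between: variance among cluster means; s2_within: variance among
    elements within clusters (both treated as known, for illustration only).
    """
    fpc_clusters = 1.0 - n / N  # FPC tied to the sampling fraction of clusters
    fpc_within = 1.0 - m / M    # FPC tied to the sampling fraction within clusters
    between_term = fpc_clusters * s2_between / n
    within_term = fpc_within * s2_within / (n * m)
    return between_term + within_term


if __name__ == "__main__":
    # Sampling half of the clusters and a fifth of each sampled cluster.
    print(two_stage_variance(4.0, 9.0, N=10, n=5, M=20, m=4))
```

Each FPC multiplies exactly one of the two terms, so a full census at one stage (n = N or m = M) removes only the corresponding component of the variance.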

5.
Log‐normal linear regression models are popular in many fields of research. Bayesian estimation of the conditional mean of the dependent variable is problematic as many choices of the prior for the variance (on the log‐scale) lead to posterior distributions with no finite moments. We propose a generalized inverse Gaussian prior for this variance and derive the conditions on the prior parameters that yield posterior distributions of the conditional mean of the dependent variable with finite moments up to a pre‐specified order. The conditions depend on one of the three parameters of the suggested prior; the other two have an influence on inferences for small and medium sample sizes. A second goal of this paper is to discuss how to choose these parameters according to different criteria including the optimization of frequentist properties of posterior means.

6.
In this paper, we study the class of inflated modified power series distributions (IMPSD), where inflation occurs at any of the support points. This class includes, among others, the generalized Poisson, the generalized negative binomial, the generalized logarithmic series, and the lost games distributions. We give expressions for the moments, factorial moments, and central moments of the IMPSD. The maximum likelihood estimators of the parameters of the IMPSD and the variance-covariance matrix of the estimators are obtained. We derive these estimators and their information matrices for the particular members of the IMPSD class mentioned above. The second part of this paper deals with the distribution of the sum of independent and identically distributed random variables taking values s, s + 1, s + 2, …, s ≥ 0, with modified power series distributions inflated at the point s.

7.
The estimation or prediction of population characteristics based on sample information is the key issue in survey sampling. If the sample sizes in subpopulations (domains) are large enough, methods similar to those used for the whole population can be used to estimate or predict subpopulation characteristics as well. To estimate or predict characteristics of domains with small or even zero sample sizes, small area estimation methods that "borrow strength" from other subpopulations or time periods are widely used. We extend this problem and study methods for predicting future population and subpopulation characteristics based on longitudinal data.

8.
Motivated by a real-life problem, we develop a Two-Stage Cluster Sampling with Ranked Set Sampling (TSCRSS) design in the second stage, for which we derive an unbiased estimator of the population mean and its variance. An unbiased estimator of the variance of the mean estimator is also derived. It is proved that the TSCRSS is more efficient, in the sense of having smaller variance, than the conventional two-stage cluster simple random sampling in which the second-stage sampling is with replacement. Using a simulation study on a real-life population, we show that the TSCRSS is more efficient than the conventional two-stage cluster sampling when simple random sampling without replacement is used in both stages.

9.
In this article, we propose an extension of the Maxwell distribution, called the extended Maxwell (EMa) distribution. This extension is developed using the Maxwell-X family of distributions and the Weibull distribution. We study its fundamental properties, such as the hazard rate, moments, generating functions, skewness, kurtosis, stochastic ordering, conditional moments and moment generating function, mean and variance of the (reversed) residual life, reliability curves, and entropy. From the estimation viewpoint, maximum likelihood estimation of the unknown parameters of the distribution and asymptotic confidence intervals are discussed. We also obtain the expected Fisher information matrix and discuss the existence and uniqueness of the maximum likelihood estimators. The EMa distribution and other competing distributions are fitted to two real datasets, and the EMa distribution is shown to be a good competitor to the compared distributions.

10.
The property of identifiability is an important consideration in estimating the parameters of a mixture of distributions. Also, classification of a random variable based on a mixture can be meaningfully discussed only if the class of all finite mixtures is identifiable. The problem of identifiability of finite mixtures of Gompertz distributions is studied. A procedure is presented for finding maximum likelihood estimates of the parameters of a mixture of two Gompertz distributions, using classified and unclassified observations. Estimation of a nonlinear discriminant function based on small sample sizes is considered. The performance of the corresponding estimated nonlinear discriminant function is investigated through simulation experiments.

11.
This article addresses various properties and different methods of estimating the unknown parameter of length- and area-biased Maxwell distributions. Although our main focus is on estimation from both frequentist and Bayesian points of view, various mathematical and statistical properties of length- and area-biased Maxwell distributions (such as moments, moment-generating function (mgf), hazard rate function, mean residual lifetime function, residual lifetime function, reversed residual life function, conditional moments and conditional mgf, stochastic ordering, and measures of uncertainty) are also derived. We briefly describe different frequentist approaches, namely the maximum likelihood estimator, moments estimator, least-squares and weighted least-squares estimators, and maximum product of spacings estimator, and compare them using extensive numerical simulations. Next, we consider Bayes estimation under different types of loss function (symmetric and asymmetric) using an inverted gamma prior for the scale parameter. Furthermore, Bayes estimators and their respective posterior risks are computed and compared using a Markov chain Monte Carlo (MCMC) algorithm. Bootstrap confidence intervals based on the frequentist approaches are also provided for comparison with Bayes credible intervals. Finally, a real dataset is analyzed for illustrative purposes.

12.
In the health and social sciences, researchers often encounter categorical data for which complexities come from a nested hierarchy and/or cross-classification for the sampling structure. A common feature of these studies is a non-standard data structure with repeated measurements which may have some degree of clustering. In this paper, methodology is presented for the joint estimation of quantities of interest in the context of a stratified two-stage sample with bivariate dichotomous data. These quantities are the mean value π of an observed dichotomous response for a certain condition or time-point and a set of correlation coefficients for intra-cluster association for each condition or time period and for inter-condition correlation within and among clusters. The methodology uses the cluster means and pairwise joint probability parameters from each cluster. They together provide appropriate information across clusters for the estimation of the correlation coefficients.

13.
The purpose of this paper is to account for informative sampling in fitting time series models, and in particular an autoregressive model of order one, for longitudinal survey data. The idea behind the proposed approach is to extract the model holding for the sample data as a function of the model in the population and the first-order inclusion probabilities, and then fit the sample model using maximum-likelihood, pseudo-maximum-likelihood and estimating equations methods. A new test for sampling ignorability is proposed based on the Kullback–Leibler information measure. Also, we investigate the issue of the sensitivity of the sample model to incorrect specification of the conditional expectations of the sample inclusion probabilities. The simulation study carried out shows that the sample-likelihood-based method produces better estimators than the pseudo-maximum-likelihood method, and that sensitivity to departures from the assumed model is low. Also, we find that both the conventional t-statistic and the Kullback–Leibler information statistic for testing of sampling ignorability perform well under both informative and noninformative sampling designs.

14.
Multivariate failure time data arise when the sample consists of clusters and each cluster contains several possibly dependent failure times. The Clayton–Oakes model (Clayton, 1978; Oakes, 1982) for multivariate failure times characterizes the intracluster dependence parametrically but allows arbitrary specification of the marginal distributions. In this paper, we discuss estimation in the Clayton–Oakes model when the marginal distributions are modeled to follow the Cox (1972) proportional hazards regression model. Parameter estimation is based on an approximate generalized maximum likelihood estimator. We illustrate the model's application with example datasets.

15.
We consider the variance estimation of the weighted likelihood estimator (WLE) under two‐phase stratified sampling without replacement. Asymptotic variance of the WLE in many semiparametric models contains unknown functions or does not have a closed form. The standard method of the inverse probability weighted (IPW) sample variances of an estimated influence function is then not available in these models. To address this issue, we develop the variance estimation procedure for the WLE in a general semiparametric model. The phase I variance is estimated by taking a numerical derivative of the IPW log likelihood. The phase II variance is estimated based on the bootstrap for a stratified sample in a finite population. Despite a theoretical difficulty of dependent observations due to sampling without replacement, we establish the (bootstrap) consistency of our estimators. Finite sample properties of our method are illustrated in a simulation study.

16.
Functional data analysis has become an important area of research because of its ability to handle high‐dimensional and complex data structures. However, its development is limited in the context of linear mixed effect models and, in particular, for small area estimation. Linear mixed effect models are the backbone of small area estimation. In this article, we consider area‐level data and fit a varying coefficient linear mixed effect model where the varying coefficients are semiparametrically modelled via B‐splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.

17.
We consider robust Bayesian prediction of a function of unobserved data based on observed data under an asymmetric loss function. Under a general linear-exponential posterior risk function, the posterior regret gamma-minimax (PRGM), conditional gamma-minimax (CGM), and most stable (MS) predictors are obtained when the prior distribution belongs to a general class of prior distributions. We use this general form to find the PRGM, CGM, and MS predictors of a general linear combination of the finite population values under the LINEX loss function on the basis of two classes of priors in a normal model. Also, under the general ε-contamination class of prior distributions, the PRGM predictor of a general linear combination of the finite population values is obtained. Finally, we provide a real-life example to predict a finite population mean and compare the estimated risk and risk bias of the obtained predictors under the LINEX loss function by a simulation study.

18.
The authors establish the joint distribution of the sum X and the maximum Y of IID exponential random variables. They derive exact formulas describing the random vector (X, Y), including its joint PDF, CDF, and other characteristics; marginal and conditional distributions; moments and related parameters; and stochastic representations leading to further properties of infinite divisibility and self-decomposability. The authors also discuss parameter estimation and include an example from climatology that illustrates the modeling potential of this new bivariate model.
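
To make the pair (X, Y) concrete, the following minimal sketch simulates the sum and maximum of n IID exponential variables and compares their sample means with two standard results: E[X] = n/λ and E[Y] = H_n/λ, where H_n is the n-th harmonic number. It does not reproduce the paper's joint PDF or CDF. The function names, the rate parameter, and the seed are all assumptions made for illustration.

```python
import random


def simulate_sum_and_max(n, rate, reps, seed=0):
    """Monte Carlo means of the sum X and the maximum Y of n IID Exp(rate) draws."""
    rng = random.Random(seed)
    sum_total = 0.0
    max_total = 0.0
    for _ in range(reps):
        draws = [rng.expovariate(rate) for _ in range(n)]
        sum_total += sum(draws)  # X: sum of the n draws
        max_total += max(draws)  # Y: largest of the n draws
    return sum_total / reps, max_total / reps


def theoretical_means(n, rate):
    """E[X] = n / rate and E[Y] = H_n / rate, with H_n the n-th harmonic number."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return n / rate, harmonic / rate


if __name__ == "__main__":
    print(simulate_sum_and_max(n=5, rate=2.0, reps=20000, seed=1))
    print(theoretical_means(n=5, rate=2.0))
```

Because Y never exceeds X for nonnegative draws, the simulated mean of Y is always at most the simulated mean of X, which gives a quick sanity check on any such simulation.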

19.
Lin, Tsung I.; Lee, Jack C.; Ni, Huey F. Statistics and Computing, 2004, 14(2): 119-130
A finite mixture model using the multivariate t distribution has been shown to be a robust extension of normal mixtures. In this paper, we present a Bayesian approach for inference about parameters of t-mixture models. The specifications of prior distributions are weakly informative to avoid causing nonintegrable posterior distributions. We present two efficient EM-type algorithms for computing the joint posterior mode with the observed data and an incomplete future vector as the sample. Markov chain Monte Carlo sampling schemes are also developed to obtain the target posterior distribution of parameters. The advantages of the Bayesian approach over the maximum likelihood method are demonstrated via a set of real data.

20.
Likelihood cross-validation for kernel density estimation is known to be sensitive to extreme observations and heavy-tailed distributions. We propose a robust likelihood-based cross-validation method to select bandwidths in multivariate density estimation. We derive this bandwidth selector within the framework of robust maximum likelihood estimation. This method establishes a smooth transition from likelihood cross-validation for nonextreme observations to least squares cross-validation for extreme observations, thereby combining the efficiency of likelihood cross-validation and the robustness of least-squares cross-validation. We also suggest a simple rule to select the transition threshold. We demonstrate the finite sample performance and practical usefulness of the proposed method via Monte Carlo simulations and a real data application on Chinese air pollution.
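
For orientation, the baseline that the abstract starts from, ordinary leave-one-out likelihood cross-validation, can be sketched as follows. This is a simplified univariate version with a Gaussian kernel and a grid search. It is not the robust selector proposed in the paper, and the function names, the grid, and the numerical floor are assumptions made for illustration.

```python
import math


def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian kernel density estimate."""
    n = len(data)
    norm = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(data):
        # Density at xi estimated from all observations except xi itself.
        kernel_sum = sum(
            math.exp(-0.5 * ((xi - xj) / h) ** 2)
            for j, xj in enumerate(data)
            if j != i
        )
        # Floor the density to avoid log(0) for isolated points (an assumption).
        total += math.log(max(norm * kernel_sum, 1e-300))
    return total


def select_bandwidth(data, grid):
    """Return the grid bandwidth that maximizes the leave-one-out log-likelihood."""
    return max(grid, key=lambda h: loo_log_likelihood(data, h))


if __name__ == "__main__":
    sample = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
    print(select_bandwidth(sample, [0.01, 0.1, 1.0, 100.0]))
```

An isolated extreme observation makes its leave-one-out density very small unless h is large, so the log term for that point can dominate the criterion. This is the sensitivity that motivates a robust variant.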
