Similar Articles

20 similar articles found.
1.
For estimating area-specific parameters (quantities) in a finite population, a mixed-model prediction approach is attractive. However, this approach depends strongly on the normality assumption for the response values, whereas non-normal data are often encountered in practice. In such cases, transforming the observations so that the normality assumption becomes reasonable is a useful tool, but the problem of selecting a suitable transformation remains open. To overcome this difficulty, we propose a new empirical best prediction method that uses a parametric family of transformations to estimate a suitable transformation from the data. We suggest a simple estimating method for the transformation parameters based on the profile likelihood function, which achieves consistency under some conditions on the transformation functions. To measure the variability of the point prediction, we construct an empirical Bayes confidence interval of the population parameter of interest. Through simulation studies, we investigate the numerical performance of the proposed methods. Finally, we apply the proposed method to synthetic income data for Spanish provinces, where the resulting estimates indicate that the commonly used log transformation would not be appropriate.
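A minimal sketch of the transformation-parameter step (Python). The Box-Cox family and the plain linear model below are illustrative stand-ins for the paper's general parametric family and mixed model: the transformation parameter is chosen by maximizing a profile log-likelihood in which the Jacobian term makes different values of lambda comparable.

import numpy as np
from scipy.optimize import minimize_scalar

def boxcox(y, lam):
    # Box-Cox transform; requires y > 0
    return np.log(y) if abs(lam) < 1e-8 else (y**lam - 1.0) / lam

def profile_loglik(lam, y, X):
    """Profile log-likelihood of lambda under a normal linear model for the
    transformed response, including the Jacobian term of the transformation."""
    z = boxcox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = np.mean(resid**2)
    n = len(y)
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.sum(np.log(y))

# toy data: positive, skewed responses with one covariate
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.exp(1.0 + 2.0 * x + rng.normal(0, 0.3, 200))   # log-normal responses
X = np.column_stack([np.ones_like(x), x])

res = minimize_scalar(lambda lam: -profile_loglik(lam, y, X),
                      bounds=(-2, 2), method="bounded")
print("estimated transformation parameter:", res.x)   # near 0, i.e. a log transform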

2.
ABSTRACT

Transformation of the response is a popular method to meet the usual assumptions of statistical methods based on linear models, such as ANOVA and the t-test. In this paper, we introduce new families of transformations for proportion or percentage data. Most transformations for proportions require 0 < x < 1 (where x denotes the proportion), which is often not the case in real data. The proposed families of transformations allow x = 0 and x = 1. We study the properties of the proposed transformations, as well as their performance in achieving normality and homoscedasticity. We analyze three real data sets to show empirically how the new transformations perform in meeting the usual assumptions. A simulation study is also performed to study the behavior of the new families of transformations.
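The proposed families themselves are not reproduced here. As a point of reference only, a familiar device that also tolerates x = 0 and x = 1 is the shifted (empirical) logit; the shift value in this Python sketch is purely illustrative and is not the authors' transformation.

import numpy as np
from scipy.stats import shapiro

def shifted_logit(x, eps=0.025):
    # empirical-logit style transform; finite at x = 0 and x = 1
    x = np.asarray(x, dtype=float)
    return np.log((x + eps) / (1.0 - x + eps))

props = np.array([0.0, 0.02, 0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.97, 1.0])
z = shifted_logit(props)
print(z)
print("Shapiro-Wilk p-value after transform:", shapiro(z).pvalue)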

3.
We consider a method of moments approach for dealing with censoring at zero for data expressed in levels when researchers would like to take logarithms. A Box–Cox transformation is employed. We explore this approach in the context of linear regression where both dependent and independent variables are censored. We contrast this method to two others, (1) dropping records of data containing censored values and (2) assuming normality for censored observations and the residuals in the model. Across the methods considered, where researchers are interested primarily in the slope parameter, estimation bias is consistently reduced using the method of moments approach.

4.
The frequency of doctor consultations has direct consequences for health care budgets, yet little statistical analysis of the determinants of doctor visits has been reported. We consider the distribution of the number of visits to the doctor and, in particular, we model its dependence on a number of demographic factors. Examination of the Australian 1995 National Health Survey data reveals that generalized linear Poisson or negative binomial models are inadequate for modelling the mean as a function of covariates, because of excessive zero counts, and a mean-variance relationship that varies enormously over covariate values. A negative binomial model is used, with parameter values estimated in subgroups according to the discrete combinations of the covariate values. Smoothing splines are then used to smooth and interpolate the parameter values. In effect the mean and the shape parameters are each modelled as (different) functions of gender, age and geographical factors. The estimated regressions for the mean have simple and intuitive interpretations. However, the dependence of the (negative binomial) shape parameter on the covariates is more difficult to interpret and is subject to influence by extreme observations. We illustrate the use of the model by estimating the distribution of the number of doctor consultations in the Statistical Local Area of Ryde, based on population numbers from the 1996 census.
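A rough sketch of the two-stage idea (Python). The grouping variable, the negative binomial parameterization and the spline smoother are illustrative choices, not the survey's actual specification: fit a negative binomial by maximum likelihood within each covariate cell, then smooth the cell-wise mean and shape estimates over age with a spline.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom
from scipy.interpolate import UnivariateSpline

def fit_negbin(y):
    """MLE of (mean mu, shape k) for a negative binomial within one cell."""
    def nll(params):
        mu, k = np.exp(params)                      # keep both positive
        p = k / (k + mu)
        return -nbinom.logpmf(y, k, p).sum()
    res = minimize(nll, x0=np.log([y.mean() + 0.1, 1.0]), method="Nelder-Mead")
    return np.exp(res.x)                            # (mu_hat, k_hat)

# toy data: visit counts by age group (the real analysis also uses gender and geography)
rng = np.random.default_rng(1)
ages = np.arange(20, 80, 5)
mu_hat, k_hat = [], []
for a in ages:
    mu_true, k_true = 1 + 0.04 * a, 0.8
    y = rng.negative_binomial(k_true, k_true / (k_true + mu_true), size=300)
    m, k = fit_negbin(y)
    mu_hat.append(m)
    k_hat.append(k)

# second stage: smooth the cell-wise estimates over age
mu_smooth = UnivariateSpline(ages, mu_hat, s=1.0)
k_smooth = UnivariateSpline(ages, k_hat, s=1.0)
print(mu_smooth(50), k_smooth(50))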

5.
Robust estimating equation based on statistical depth
In this paper, the estimating equation is constructed via statistical depth. The resulting estimating equation and parameter estimators have desirable robustness, attaining breakdown values close to 1/2. At the same time, the parameter estimators retain the usual asymptotic properties, such as asymptotic normality. In particular, a robust quasi-likelihood for nonlinear regression models and a depth-weighted least-squares estimator (LSE) for linear regression models are introduced. A suggestion for choosing the weight function and a method for constructing the depth-weighted quasi-likelihood equation are given. This paper is supported by NNSF projects (10371059 and 10171051) of China.
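A minimal sketch of a depth-weighted least-squares fit (Python). The Mahalanobis-type depth 1/(1 + d^2) of the design points is used only for illustration; the paper's depth and weight functions may differ.

import numpy as np

def mahalanobis_depth(X):
    """Simple depth: 1 / (1 + squared Mahalanobis distance) of each row of X."""
    mu = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv = np.linalg.inv(cov)
    d2 = np.einsum("ij,jk,ik->i", X - mu, inv, X - mu)
    return 1.0 / (1.0 + d2)

def depth_weighted_lse(X, y):
    """Weighted least squares with weights given by the depth of the carriers."""
    w = mahalanobis_depth(X[:, 1:])                # skip the intercept column
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
x[:5] = 8.0
y[:5] = -20.0                                      # gross outliers at high leverage
X = np.column_stack([np.ones_like(x), x])
print("OLS:           ", np.linalg.lstsq(X, y, rcond=None)[0])
print("depth-weighted:", depth_weighted_lse(X, y))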

6.
Imputation is a much used method for handling missing data. It is appealing as it separates the missing data part of the analysis, which is handled by imputation, and the estimation part, which is handled by complete data methods. Most imputation methods, however, either rely on strict parametric assumptions or are rather ad hoc in which case they often only work approximately under even stricter assumptions. In this paper a non-parametric imputation method is proposed. Since it is non-parametric it works under quite general assumptions. In particular, a model for the complete data is not required in the imputation step, and the complete data method used after the imputation may be a general estimating equation for estimating a finite-dimensional parameter. Large sample results for the resulting estimator are given.
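To illustrate the general flavor (Python): a simple nearest-neighbour imputation in the covariate space followed by the mean as the complete-data estimating equation. The paper's imputation scheme and estimating equations are considerably more general than this sketch.

import numpy as np

def knn_impute(x, y, observed, k=5):
    """Fill in missing y's with the average of the k nearest observed neighbours in x."""
    y_imp = y.copy()
    x_obs, y_obs = x[observed], y[observed]
    for i in np.where(~observed)[0]:
        nn = np.argsort(np.abs(x_obs - x[i]))[:k]
        y_imp[i] = y_obs[nn].mean()
    return y_imp

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 2, n)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
# missingness depends on x (missing at random)
observed = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(1.5 - x)))

y_imp = knn_impute(x, y, observed)
# complete-data estimating equation: sum_i (y_i - mu) = 0, i.e. mu_hat = mean
print("complete-case mean:   ", y[observed].mean())
print("imputation-based mean:", y_imp.mean())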

7.
All statistical methods involve basic model assumptions, which, if violated, render the results of the analysis dubious. One solution in such a contingency is to seek a more appropriate model or to modify the customary model by introducing additional parameters. Both of these approaches are in general cumbersome and demand uncommon expertise. An alternative is to transform the data to achieve compatibility with a well-understood and convenient customary model with readily available software. A well-known example is the Box–Cox data transformation, developed to make the normal-theory linear model usable even when the assumptions of normality and homoscedasticity are not met. In reliability analysis, model appropriateness is determined by the nature of the hazard function. The well-known Weibull distribution is the most commonly employed model for this purpose. However, this model, which allows only a small spectrum of monotone hazard rates, is especially inappropriate if the data indicate bathtub-shaped hazard rates. In this paper, a new model based on data transformation is presented for modeling bathtub-shaped hazard rates. Parameter estimation methods are studied for this new (transformation) approach. Examples and comparisons between the new model and other bathtub-shaped models are given to illustrate the applicability of the new model.

8.
In a stated preference discrete choice experiment, each subject is typically presented with several choice sets, and each choice set contains a number of alternatives. The alternatives are defined in terms of their name (brand) and their attributes at specified levels. The task for the subject is to choose from each choice set the alternative with the highest utility for them. The multinomial is an appropriate distribution for the responses to each choice set, since each subject chooses one alternative, and the multinomial logit is a common model. If the responses to the several choice sets are independent, the likelihood function is simply the product of multinomials. The most common and generally preferred method of estimating the parameters of the model is maximum likelihood (that is, selecting as estimates those values that maximize the likelihood function). If the assumption of within-subject independence across successive choice tasks is violated (it almost surely is), the likelihood function is incorrect and maximum likelihood estimation is inappropriate. The most serious errors involve the estimation of the variance-covariance matrix of the model parameter estimates, and the corresponding variances of market shares and changes in market shares.

In this paper we present an alternative method of estimation of the model parameter coefficients that incorporates a first-order within-subject covariance structure. The method involves the familiar log-odds transformation and application of the multivariate delta method. Estimation of the model coefficients after the transformation is a straightforward generalized least squares regression, and the corresponding improved estimate of the variance-covariance matrix is in closed form. Estimates of market share (and change in market share) follow from a second application of the multivariate delta method. The method and comparison with maximum likelihood estimation are illustrated with several simulated and actual data examples.

Advantages of the proposed method are: 1) it incorporates the within-subject covariance structure; 2) it is completely data driven; 3) it requires no additional model assumptions; 4) assuming asymptotic normality, it provides a simple procedure for computing confidence regions on market shares and changes in market shares; and 5) it produces results that are asymptotically equivalent to those produced by maximum likelihood when the data are independent.
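A condensed sketch of the transformation-and-GLS step for a single choice set (Python). The covariance below is the standard delta-method result for baseline-category log-odds of multinomial proportions; the paper's version additionally carries a first-order within-subject covariance structure across choice sets, which is omitted here.

import numpy as np

def log_odds_and_cov(counts):
    """Baseline-category log-odds and their delta-method covariance
    for multinomial counts from one choice set (last alternative = base)."""
    n = counts.sum()
    p = counts / n
    z = np.log(p[:-1] / p[-1])
    J1 = len(counts) - 1
    omega = np.full((J1, J1), 1.0 / (n * p[-1]))
    omega[np.diag_indices(J1)] += 1.0 / (n * p[:-1])
    return z, omega

def gls(X, z, omega):
    Oinv = np.linalg.inv(omega)
    beta = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ z)
    cov_beta = np.linalg.inv(X.T @ Oinv @ X)
    return beta, cov_beta

# toy example: 3 alternatives, choice counts of 200 subjects for one choice set,
# and one illustrative attribute contrast per non-base alternative
counts = np.array([90, 70, 40])
X = np.array([[1.0], [0.5]])
z, omega = log_odds_and_cov(counts)
beta, cov_beta = gls(X, z, omega)
print("coefficient:", beta, "std. error:", np.sqrt(np.diag(cov_beta)))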

9.
We propose new ensemble approaches to estimate the population mean for missing response data with fully observed auxiliary variables. We first compress the working models according to their categories through a weighted average, where the weights are proportional to the square of the least-squares coefficients from model refitting. Based on the compressed values, we develop two ensemble frameworks: one adjusts the weights in the inverse probability weighting procedure, and the other is built upon an additive structure by reformulating the augmented inverse probability weighting function. The asymptotic normality of the proposed estimators is established through the theory of estimating functions with plugged-in nuisance parameter estimates. Simulation studies show that the new proposals have substantial advantages over existing ones for small sample sizes, and an acquired immune deficiency syndrome data example is used for illustration.
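For orientation, a bare-bones augmented inverse probability weighting (AIPW) estimator of the population mean (Python), with a logistic propensity model and a linear outcome model as working models. The paper's contribution, the compression and ensemble layer built on several such working models, is not shown.

import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([1.0, -0.5]) + rng.normal(scale=1.0, size=n)
# response indicator: missingness depends on the covariates
pi_true = 1.0 / (1.0 + np.exp(-(0.3 + X[:, 0])))
R = rng.uniform(size=n) < pi_true

# working models
pi_hat = LogisticRegression().fit(X, R).predict_proba(X)[:, 1]   # propensity score
m_hat = LinearRegression().fit(X[R], y[R]).predict(X)            # outcome regression

# AIPW estimator of the population mean of y
mu_aipw = np.mean(R * y / pi_hat - (R - pi_hat) / pi_hat * m_hat)
print("complete-case mean:", y[R].mean(), " AIPW:", mu_aipw)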

10.
Although regression estimates are quite robust to slight departures from normality, symmetric prediction intervals assuming normality can be highly unsatisfactory and problematic if the residuals have a skewed distribution. For data with distributions outside the class covered by the generalized linear model, a common way to handle non-normality is to transform the response variable. Unfortunately, transforming the response variable often destroys the theoretical or empirical functional relationship connecting the mean of the response variable to the explanatory variables established on the original scale. A further complication arises if a single transformation cannot both stabilize the variance and attain normality. Moreover, practitioners often find the interpretation of highly transformed data far from obvious and prefer an analysis on the original scale. This paper presents an alternative approach for handling heteroscedasticity and non-normality simultaneously without resorting to data transformation. Unlike classical approaches, the proposed modeling allows practitioners to formulate the mean and variance relationships directly on the original scale, making data interpretation considerably easier. The modeled variance relationship and the form of non-normality in the proposed approach can be easily examined through a certain function of the standardized residuals. The proposed method remains consistent for estimating the regression parameters even if the variance function is misspecified. The method, along with some model checking techniques, is illustrated with a real example.
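A toy version of the idea (Python): model the mean on the original scale together with a power-of-the-mean variance function Var(y) = sigma^2 * mu^theta, estimating theta from the squared residuals and iterating weighted least squares. The variance form and the estimation cycle here are illustrative only and are not the paper's exact formulation.

import numpy as np

def fit_mean_variance(X, y, n_iter=20):
    """Iteratively reweighted least squares with variance function sigma^2 * mu^theta."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    theta = 0.0
    for _ in range(n_iter):
        mu = np.clip(X @ beta, 1e-6, None)
        # estimate theta by regressing log squared residuals on log mu
        r2 = np.clip((y - mu) ** 2, 1e-12, None)
        A = np.column_stack([np.ones_like(mu), np.log(mu)])
        theta = np.linalg.lstsq(A, np.log(r2), rcond=None)[0][1]
        # refit the mean with weights 1 / mu^theta
        w = 1.0 / mu ** theta
        beta = np.linalg.lstsq(X * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)[0]
    return beta, theta

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 500)
mu = 2.0 + 3.0 * x
y = mu + rng.normal(scale=0.5 * mu, size=500)          # variance grows with the mean
X = np.column_stack([np.ones_like(x), x])
beta, theta = fit_mean_variance(X, y)
print("beta:", beta, "theta:", theta)                   # theta near 2 for this toy model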

11.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the symmetry implied by the normality assumption for each component is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, it provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches, including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
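A compact sketch (Python) of the key density calculation: the observed-data log-likelihood of a mixture of multivariate t components evaluated on Box-Cox transformed data, with the Jacobian of the transformation included. The EM updates and the transformation-selection step of the paper are omitted, and a single common lambda is assumed for brevity.

import numpy as np
from scipy.stats import multivariate_t
from scipy.special import logsumexp

def boxcox(x, lam):
    return np.log(x) if abs(lam) < 1e-8 else (x**lam - 1.0) / lam

def mixture_loglik(X, lam, weights, means, covs, dfs):
    """Log-likelihood of positive data X under a t-mixture on the Box-Cox scale."""
    Z = boxcox(X, lam)
    # per-observation, per-component log densities on the transformed scale
    comp = np.column_stack([
        np.log(w) + multivariate_t(loc=m, shape=S, df=v).logpdf(Z)
        for w, m, S, v in zip(weights, means, covs, dfs)
    ])
    jacobian = (lam - 1.0) * np.log(X).sum(axis=1)     # d(boxcox)/dx = x^(lam - 1)
    return np.sum(logsumexp(comp, axis=1) + jacobian)

rng = np.random.default_rng(6)
X = np.exp(rng.normal(size=(300, 2)))                  # positive, skewed data
ll = mixture_loglik(
    X, lam=0.0,
    weights=[0.5, 0.5],
    means=[np.array([-0.5, -0.5]), np.array([0.5, 0.5])],
    covs=[np.eye(2), np.eye(2)],
    dfs=[5.0, 5.0],
)
print("log-likelihood:", ll)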

12.
Count response data often exhibit departures from the assumptions of standard Poisson generalized linear models. In particular, cluster-level correlation of the data and truncation at zero are two common characteristics of such data. This paper describes a random components truncated Poisson model that can be applied to clustered and zero-truncated count data. Residual maximum likelihood estimators for the parameters of this model are developed, and their use is illustrated using a dataset of non-zero counts of sheets with edge-strain defects in iron sheets produced by the Mobarekeh Steel Complex, Iran. The paper also reports on a small-scale simulation study that supports the estimation procedure.
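For reference, the zero-truncation part of the likelihood alone (ignoring the cluster-level random component and the REML machinery of the paper) can be written down and maximized directly; a minimal Python sketch:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def truncated_poisson_nll(mu, y):
    """Negative log-likelihood of zero-truncated Poisson counts y (all y >= 1)."""
    return -np.sum(y * np.log(mu) - mu - gammaln(y + 1) - np.log1p(-np.exp(-mu)))

rng = np.random.default_rng(7)
raw = rng.poisson(2.5, size=2000)
y = raw[raw > 0]                                   # only non-zero counts are observed

res = minimize_scalar(lambda m: truncated_poisson_nll(m, y),
                      bounds=(1e-3, 20), method="bounded")
print("naive mean of truncated sample:", y.mean())  # biased upward for mu
print("truncated-Poisson MLE of mu:   ", res.x)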

13.
Abstract.  The problem of estimating a nonlinear regression model, when the dependent variable is randomly censored, is considered. The parameter of the model is estimated by least squares using synthetic data. Consistency and asymptotic normality of the least squares estimators are derived. The proofs are based on a novel approach that uses i.i.d. representations of synthetic data through Kaplan–Meier integrals. The asymptotic results are supported by a small simulation study.
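A sketch of the synthetic-data device (Python), assuming a Koul-Susarla-Van Ryzin-type construction Y*_i = delta_i * Z_i / S_C(Z_i-), with S_C the Kaplan-Meier estimate of the censoring survival function; the exponential mean function and the plain nonlinear least-squares step are illustrative only, not the paper's exact setup.

import numpy as np
from scipy.optimize import curve_fit

def km_censoring_survival(z, delta):
    """Kaplan-Meier estimate of P(C > t), evaluated just before each z_i.
    delta = 1 means the response was observed (the censoring time is censored)."""
    order = np.argsort(z)
    z_s, d_s = z[order], delta[order]
    n = len(z)
    at_risk = n - np.arange(n)
    factors = 1.0 - (d_s == 0) / at_risk           # a censoring 'event' has delta == 0
    surv = np.cumprod(factors)
    surv_before = np.concatenate(([1.0], surv[:-1]))
    out = np.empty(n)
    out[order] = surv_before
    return out

# toy censored nonlinear regression: T = exp(a + b*x) + noise, censored by C
rng = np.random.default_rng(8)
n = 1000
x = rng.uniform(0, 1, n)
T = np.exp(0.5 + 1.0 * x) + rng.normal(scale=0.3, size=n)
C = rng.exponential(scale=4.0, size=n)
z = np.minimum(T, C)
delta = (T <= C).astype(int)

# synthetic responses and ordinary nonlinear least squares on them
y_syn = delta * z / km_censoring_survival(z, delta)
popt, _ = curve_fit(lambda x, a, b: np.exp(a + b * x), x, y_syn, p0=[0.0, 0.5])
print("estimated (a, b):", popt)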

14.
The linear mixed model assumes normality of its two sources of randomness: the random effects and the residual error. Recent research demonstrated that a simple transformation of the response targets normality of both sources simultaneously. However, estimating the transformation can lead to biased estimates of the variance components. Here, we provide guidance regarding this potential bias and propose a correction for it when such bias is substantial. This correction allows for accurate estimation of the random effects when using a transformation to achieve normality. The utility of this approach is demonstrated in a study of sleep-wake behavior in preterm infants.

15.
Abstract

Robust parameter design (RPD) is an effective tool that combines experimental design and strategic modeling to determine the optimal operating conditions of a system. The usual assumptions of RPD are that the experimental data are normally distributed and free of contamination by outliers, and parameter uncertainties in the response models are generally neglected. However, using normal-theory modeling methods on skewed data and ignoring parameter uncertainties can create a chain of degradation in the optimization and production phases: a misleading fit, poorly estimated optimal operating conditions, and poor-quality products. This article presents a new approach based on confidence interval (CI) response modeling for the process mean. The proposed interval robust design makes the system median unbiased for the mean and uses the midpoint of the interval as the location performance response. As an alternative robust estimator for the process variance response modeling, the biweight midvariance is proposed, which is both resistant and retains good efficiency when normality is not met. The results further show that the proposed interval robust design gives a robust solution to skewed and contaminated data. The procedure and its advantages are illustrated using two experimental design studies.
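The biweight midvariance mentioned above is a standard robust scale estimator; a direct Python implementation with the conventional tuning constant c = 9 (the article's exact constant is not stated here):

import numpy as np

def biweight_midvariance(x, c=9.0):
    """Biweight midvariance: a resistant, reasonably efficient estimate of scale^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    u = (x - med) / (c * mad)
    a = np.abs(u) < 1.0                            # observations not flagged as outlying
    num = n * np.sum(a * (x - med) ** 2 * (1 - u ** 2) ** 4)
    den = np.sum(a * (1 - u ** 2) * (1 - 5 * u ** 2)) ** 2
    return num / den

rng = np.random.default_rng(9)
clean = rng.normal(loc=10, scale=2, size=200)
contaminated = np.concatenate([clean, rng.normal(loc=10, scale=30, size=10)])
print("sample variance:     ", contaminated.var(ddof=1))
print("biweight midvariance:", biweight_midvariance(contaminated))  # far less inflated by the outliers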

16.
Penalised likelihood methods, such as the least absolute shrinkage and selection operator (Lasso) and the smoothly clipped absolute deviation penalty, have become widely used for variable selection in recent years. These methods impose penalties on regression coefficients to shrink a subset of them towards zero to achieve parameter estimation and model selection simultaneously. The amount of shrinkage is controlled by the regularisation parameter. Popular approaches for choosing the regularisation parameter include cross-validation, various information criteria and bootstrapping methods that are based on mean square error. In this paper, a new data-driven method for choosing the regularisation parameter is proposed and the consistency of the method is established. It holds not only for the usual fixed-dimensional case but also for the divergent setting. Simulation results show that the new method outperforms other popular approaches. An application of the proposed method to motif discovery in gene expression analysis is included in this paper.
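The paper's new selection rule is not reproduced here; for context, the cross-validation baseline it is compared against can be run in a few lines (Python, scikit-learn):

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(10)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, 1, 2]                     # sparse true signal
y = X @ beta + rng.normal(size=n)

fit = LassoCV(cv=10).fit(X, y)
print("regularisation parameter chosen by 10-fold CV:", fit.alpha_)
print("selected variables:", np.flatnonzero(fit.coef_ != 0))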

17.
In this article, we introduce a new estimator for the generalized Pareto distribution, which is based on maximum likelihood estimation and goodness of fit. The asymptotic normality of the new estimator is shown, and a small simulation study is conducted. In the simulation, the performance of the new estimator is roughly comparable with maximum likelihood for positive values of the shape parameter and often much better than maximum likelihood for negative values.

18.
The Box–Cox power transformation is a commonly used methodology for transforming the distribution of the data towards a normal distribution. The methodology relies on a single transformation parameter. In this study, we focus on the estimation of this parameter. For this purpose, we employ seven popular goodness-of-fit tests for normality, namely the Shapiro–Wilk, Anderson–Darling, Cramer–von Mises, Pearson chi-square, Shapiro–Francia, Lilliefors and Jarque–Bera tests, together with a searching algorithm. The searching algorithm is based on finding the argument of the minimum or maximum, depending on the test, i.e., the maximum for Shapiro–Wilk and Shapiro–Francia and the minimum for the rest. The artificial covariate method of Dag et al. (2014) is also included for comparison purposes. Simulation studies are carried out to compare the performances of the methods. Results show that Shapiro–Wilk and the artificial covariate method are more effective than the others and that Pearson chi-square is the worst performing method. The methods are also applied to two real-life datasets. The R package AID is proposed for implementation of the aforementioned methods.
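The R package AID implements these estimators; a rough Python analogue of the Shapiro-Wilk variant (searching for the lambda that maximizes the Shapiro-Wilk W statistic of the transformed data) might look like this:

import numpy as np
from scipy.stats import shapiro, boxcox_normmax
from scipy.optimize import minimize_scalar

def boxcox(y, lam):
    return np.log(y) if abs(lam) < 1e-8 else (y**lam - 1.0) / lam

def lambda_shapiro(y, bounds=(-3, 3)):
    """Choose the Box-Cox lambda that maximizes the Shapiro-Wilk W statistic."""
    res = minimize_scalar(lambda lam: -shapiro(boxcox(y, lam)).statistic,
                          bounds=bounds, method="bounded")
    return res.x

rng = np.random.default_rng(11)
y = rng.lognormal(mean=1.0, sigma=0.6, size=300)       # a lambda near 0 is appropriate
print("Shapiro-Wilk search:", lambda_shapiro(y))
print("MLE (scipy):        ", boxcox_normmax(y, method="mle"))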

19.
As is the case in many studies, the data collected are limited and an exact value is recorded only if it falls within an interval range; hence, the responses can be left, interval or right censored. Linear (and nonlinear) regression models are routinely used to analyze these types of data and are based on normality assumptions for the error terms. However, those analyses might not provide robust inference when the normality assumptions are questionable. In this article, we develop a Bayesian framework for censored linear regression models by replacing the Gaussian assumptions for the random errors with scale mixtures of normal (SMN) distributions. The SMN is an attractive class of symmetric heavy-tailed densities that includes the normal, Student-t, Pearson type VII, slash and contaminated normal distributions as special cases. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo algorithm is introduced to carry out posterior inference. A new hierarchical prior distribution is suggested for the degrees of freedom parameter in the Student-t distribution. The likelihood function is utilized to compute not only some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measure. The proposed Bayesian methods are implemented in the R package BayesCR. The newly developed procedures are illustrated with applications using real and simulated data.

20.
This paper studies four methods for estimating the Box-Cox parameter used to transform data to normality. Three of these are based on optimizing test statistics for standard normality tests (the Shapiro-Wilk, skewness, and kurtosis tests); the fourth uses the maximum likelihood estimator of the Box-Cox parameter. The four methods are compared and evaluated with a simulation study, where their performances under different skewness and kurtosis conditions are analyzed. The estimator based on optimizing the Shapiro-Wilk statistic generally gives rise to the best transformations, while the maximum likelihood estimator performs almost as well. Estimators based on optimizing skewness and kurtosis do not perform well in general.
