Similar Documents
20 similar documents found (search time: 31 ms)
1.
Over 60 years ago Ronald Fisher demonstrated a number of potential pitfalls with statistical analyses using ratio variables. Nonetheless, these pitfalls are largely overlooked in contemporary clinical and epidemiological research, which routinely uses ratio variables in statistical analyses. This article aims to demonstrate how very different findings can be generated as a result of less than perfect correlations among the data used to generate ratio variables. These imperfect correlations result from measurement error and random biological variation. While the former can often be reduced by improvements in measurement, random biological variation is difficult to estimate and eliminate in observational studies. Moreover, wherever the underlying biological relationships among epidemiological variables are unclear, and hence the choice of statistical model is also unclear, the different findings generated by different analytical strategies can lead to contradictory conclusions. Caution is therefore required when interpreting analyses of ratio variables whenever the underlying biological relationships among the variables involved are unspecified or unclear. Copyright © 2009 John Wiley & Sons, Ltd.
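A minimal numpy sketch (entirely hypothetical data) of the pitfall Fisher described: dividing two unrelated measurements by a common, noisily measured denominator manufactures a correlation between the resulting ratios.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Two independent numerator measurements: no true association.
    x = rng.normal(100, 10, n)
    y = rng.normal(50, 5, n)
    # A shared, noisily measured denominator (e.g. body size).
    z = rng.normal(70, 15, n)

    print(np.corrcoef(x, y)[0, 1])          # near 0
    print(np.corrcoef(x / z, y / z)[0, 1])  # strongly positive, an artefact of the shared denominator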

2.
The inverse Gaussian-Poisson (two-parameter Sichel) distribution is useful in fitting overdispersed count data. We consider linear models on the mean of a response variable, where the response is in the form of counts exhibiting extra-Poisson variation, and assume an IGP error distribution. We show how maximum likelihood estimation may be carried out using iterative Newton-Raphson IRLS fitting, where GLIM is used for the IRLS part of the maximization. Approximate likelihood ratio tests are given.

3.
A bank offering unsecured personal loans may be interested in several related outcome variables, including defaulting on the repayments, early repayment or failing to take up an offered loan. Current predictive models used by banks typically consider such variables individually. However, the fact that they are related to each other, and to many interrelated potential predictor variables, suggests that graphical models may provide an attractive alternative solution. We developed such a model for a data set of 15 variables measured on a set of 14 000 applications for unsecured personal loans. The resulting global model of behaviour enabled us to identify several previously unsuspected relationships of considerable interest to the bank. For example, we discovered important but obscure relationships between taking out insurance, prior delinquency with a credit card and delinquency with the loan.

4.
As modeling efforts expand to a broader spectrum of areas, the amount of computer time required to exercise the corresponding computer codes has become quite costly (several hours for a single run is not uncommon). This costly process can be directly tied to the complexity of the modeling and to the large number of input variables (often numbering in the hundreds). Further, the complexity of the modeling (usually involving systems of differential equations) makes the relationships among the input variables not mathematically tractable. In this setting it is desired to perform sensitivity studies of the input-output relationships. Hence, a judicious selection procedure for the choice of values of input variables is required. Latin hypercube sampling has been shown to work well on this type of problem.

However, a variety of situations require that decisions and judgments be made in the face of uncertainty. The source of this uncertainty may be lack of knowledge about probability distributions associated with input variables, or about different hypothesized future conditions, or may be present as a result of different strategies associated with a decision-making process. In this paper a generalization of Latin hypercube sampling is given that allows these areas to be investigated without making additional computer runs. In particular, it is shown how weights associated with Latin hypercube input vectors may be changed to reflect different probability distribution assumptions on key input variables and yet provide an unbiased estimate of the cumulative distribution function of the output variable. This allows different distribution assumptions on input variables to be studied without additional computer runs and without fitting a response surface. In addition, these same weights can be used in a modified nonparametric Friedman test to compare treatments. Sample size requirements needed to apply the results of the work are also considered. The procedures presented in this paper are illustrated using a model associated with the risk assessment of geologic disposal of radioactive waste.
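The reweighting idea admits a compact sketch (hypothetical toy model; scipy's qmc module supplies the Latin hypercube design): the density ratio between the revised and original input distributions serves as the weight in a weighted empirical CDF of the output, with no new model runs.

    import numpy as np
    from scipy import stats
    from scipy.stats import qmc

    # One-dimensional Latin hypercube sample under a Uniform(0, 1) input assumption.
    u = qmc.LatinHypercube(d=1, seed=1).random(n=200).ravel()
    x = u                                      # Uniform(0, 1) ppf is the identity
    y = np.sin(2 * np.pi * x)                  # stand-in for an expensive computer model

    # Re-weight the *same* runs to reflect a revised Beta(2, 5) input distribution.
    w = stats.beta(2, 5).pdf(x)                # density ratio Beta(2, 5) / Uniform(0, 1)
    w /= w.sum()

    # Weighted empirical CDF of the output under the new assumption.
    order = np.argsort(y)
    print(np.interp(0.0, y[order], np.cumsum(w[order])))   # estimate of P(Y <= 0)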

5.
Carbon dioxide is one of the major contributors to global warming. In the present study, we develop a differential equation to model carbon dioxide emission data in the atmosphere using a functional linear regression approach. In the proposed method, a differential operator is defined as a data smoother, and we use a penalized least squares fitting criterion to smooth the data. The profile error sum of squares is optimized to estimate the differential operator using functional regression. The solution of the developed differential equation estimates and predicts the rate of change of carbon dioxide in the atmosphere at a particular time. We apply the proposed model to carbon dioxide emission data for the continental United States. Numerical simulations of a number of test cases show satisfactory agreement with the real data.
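The paper's differential-operator smoother is specialized, but the core idea, smoothing the series by penalized least squares and then differentiating the smooth to get the rate of change, can be sketched with an off-the-shelf smoothing spline (hypothetical data; all numbers illustrative):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Hypothetical annual CO2 emission series; real observations would replace this.
    rng = np.random.default_rng(2)
    t = np.arange(1980.0, 2021.0)
    co2 = 5.0 + 0.04 * (t - 1980) + 0.1 * np.sin(0.5 * (t - 1980)) + rng.normal(0, 0.02, t.size)

    # Penalized least-squares smoother; the smoothing factor s acts as the roughness penalty.
    f = UnivariateSpline(t, co2, k=4, s=0.02)

    # The smoother's derivative estimates the rate of change of CO2 at any time.
    rate = f.derivative()
    print(f(2020.0), rate(2020.0))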

6.
We introduce a framework for estimating the effect that a binary treatment has on a binary outcome in the presence of unobserved confounding. The methodology is applied to a case study which uses data from the Medical Expenditure Panel Survey and whose aim is to estimate the effect of private health insurance on health care utilization. Unobserved confounding arises when variables which are associated with both treatment and outcome are not available (in economics this issue is known as endogeneity). Also, treatment and outcome may exhibit a dependence which cannot be modeled using a linear measure of association, and observed confounders may have a non-linear impact on the treatment and outcome variables. The problem of unobserved confounding is addressed using a two-equation structural latent variable framework, where one equation essentially describes a binary outcome as a function of a binary treatment whereas the other equation determines whether the treatment is received. Non-linear dependence between treatment and outcome is dealt with using copula functions, whereas covariate-response relationships are flexibly modeled using a spline approach. Related model fitting and inferential procedures are developed, and asymptotic arguments are presented.
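A Gaussian copula is the simplest member of the copula family considered here; with probit margins the two-equation system reduces to a recursive bivariate probit, whose likelihood can be sketched directly (simulated data; all parameter names are illustrative, and the spline covariate effects are omitted):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import multivariate_normal as mvn

    rng = np.random.default_rng(3)
    n = 500

    # Correlated latent errors create unobserved confounding of treatment and outcome.
    z = rng.normal(size=n)                         # covariate entering the treatment equation
    x = rng.normal(size=n)                         # covariate entering the outcome equation
    u = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
    t = (0.8 * z + u[:, 1] > 0).astype(float)      # treatment-assignment equation
    y = (0.5 * x + 1.0 * t + u[:, 0] > 0).astype(float)  # outcome equation

    def negloglik(par):
        b0, bx, bt, g0, gz, arho = par
        rho = np.tanh(arho)                        # keeps the error correlation in (-1, 1)
        m1 = b0 + bx * x + bt * t
        m2 = g0 + gz * z
        q1, q2 = 2 * y - 1, 2 * t - 1              # sign flips encode the four (y, t) cells
        ll = 0.0
        for i in range(n):
            r = q1[i] * q2[i] * rho
            p = mvn.cdf([q1[i] * m1[i], q2[i] * m2[i]], cov=[[1, r], [r, 1]])
            ll += np.log(max(p, 1e-300))
        return -ll

    fit = minimize(negloglik, np.zeros(6), method="Nelder-Mead", options={"maxiter": 2000})
    print(fit.x)   # third entry estimates the treatment effect (true value 1.0)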

7.
Nakamura (1990) introduced an approach to estimation in measurement error models based on a corrected score function, and claimed that the estimators obtained are consistent for functional models. Proof of the claim essentially assumed the existence of a corrected log-likelihood for which differentiation with respect to model parameters can be interchanged with conditional expectation taken with respect to the measurement error distributions, given the response variables and true covariates. This paper deals with simple yet practical models for which the above assumption is false, i.e. a corrected score function for the model may not be obtained through differentiating a corrected log-likelihood although it exists. Alternative regularity conditions with no reference to log-likelihood are given, under which the corrected score functions yield consistent and asymptotically normal estimators. Application to functional comparative calibration yields interesting results.
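In the simplest linear functional model the corrected-score idea is concrete: subtracting the known measurement-error variance from the observed sum of squares removes the attenuation bias of the naive estimator. A numpy sketch with simulated data:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 5000
    sigma_u2 = 0.5                                     # measurement-error variance, assumed known

    x = rng.normal(0.0, 1.0, n)                        # true covariate (never observed)
    w = x + rng.normal(0.0, np.sqrt(sigma_u2), n)      # error-prone surrogate
    y = 2.0 * x + rng.normal(0.0, 1.0, n)

    beta_naive = (w @ y) / (w @ w)                     # attenuated towards zero
    beta_corrected = (w @ y) / (w @ w - n * sigma_u2)  # root of the corrected score
    print(beta_naive, beta_corrected)                  # roughly 1.33 versus 2.0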

8.
Control charts for residuals, based on the regression model, require a robust fitting technique for minimizing the error resulting from the fitted model. However, in the multivariate case, when the number of variables is high and the data become complex, traditional fitting techniques, such as ordinary least squares (OLS), lose efficiency. In this paper, support vector regression (SVR) is used to construct robust control charts for residuals, called the SVR-chart. This choice is based on the fact that SVR is designed to minimize the structural error whereas other techniques minimize the empirical error. An application shows that SVR methods give competitive results in comparison with OLS and the partial least squares method, in terms of the standard deviation of the prediction error and the standard error of performance. A sensitivity study is conducted to evaluate SVR-chart performance based on the average run length (ARL) and shows that the SVR-chart has the best ARL behaviour in comparison with the other residuals control charts.
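A rough sketch of the idea (simulated data; scikit-learn's SVR standing in for the authors' exact settings): fit the regression by SVR, then place conventional Shewhart limits on the residuals.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + rng.normal(0, 0.2, 200)

    # Fit the regression with SVR and chart the residuals.
    model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
    resid = y - model.predict(X)

    # Shewhart-style limits: centre line 0, control limits at +/- 3 sigma.
    sigma = resid.std(ddof=1)
    out_of_control = np.flatnonzero(np.abs(resid) > 3 * sigma)
    print(3 * sigma, out_of_control)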

9.
This paper extends methods for nonlinear regression analysis that have been developed for the analysis of clustered data. Its novelty lies in its dual incorporation of random cluster effects and structural error in the measurement of the explanatory variables. Moments up to second order are assumed to have been specified for the latter, to enable a generalized estimating equations approach to be used for fitting and testing nonlinear models linking the response to these explanatory variables and random effects. Taylor expansion methods are used, and a difficulty with earlier approaches is overcome. Finally, we describe an application of this methodology to indicate how it can be used. That application concerns the degree of association between hospital admissions for acute respiratory health problems and air pollution.

10.
Regression models incorporating measurement error have received much attention in the recent literature. Measurement error can arise both in the explanatory variables and in the response. We introduce a fairly general model which permits both types of errors. The model naturally arises as a hierarchical structure involving three distinct regressions. For each regression, a semiparametric generalized linear model is introduced utilizing an unknown monotonic function. By transformation, such a function can be viewed as a c.d.f. We model an unknown c.d.f. using mixtures of Beta c.d.f.'s, noting that such mixtures are dense within the class of all continuous distributions on [0,1]. Thus, the overall model incorporates nonparametric links or calibration curves along with customary regression coefficients, clarifying its semiparametric nature. Fully Bayesian fitting of such a model using sampling-based methods is proposed. We indicate numerous attractive advantages which our model and its fitting provide. A simulation example demonstrates quantitatively the potential benefit.
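The Beta-mixture device is easy to see in code: a convex combination of Beta c.d.f.'s is itself a valid c.d.f. on [0,1], so it can stand in for an unknown monotone link (weights and shape parameters below are purely illustrative):

    import numpy as np
    from scipy.stats import beta

    weights = np.array([0.3, 0.5, 0.2])        # illustrative mixture weights (sum to 1)
    params = [(2, 8), (5, 5), (9, 2)]          # illustrative (a, b) shape pairs

    def mixture_cdf(u):
        # Weighted sum of Beta c.d.f.'s: monotone, 0 at u=0 and 1 at u=1.
        return sum(w * beta(a, b).cdf(u) for w, (a, b) in zip(weights, params))

    u = np.linspace(0.0, 1.0, 5)
    print(mixture_cdf(u))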

11.
The ecological fallacy is related to Simpson's paradox (1951), where relationships among group means may be counterintuitive and substantially different from relationships within groups, where the groups are usually geographic entities such as census tracts. We consider the problem of estimating the correlation between two jointly normal random variables where only ecological data (group means) are available. Two empirical Bayes estimators and one fully Bayesian estimator are derived and compared with the usual ecological estimator, which is simply the Pearson correlation coefficient of the group sample means. We simulate the bias and mean squared error performance of these estimators, and also give an example employing a dataset where the individual level data are available for model checking. The results indicate superiority of the empirical Bayes estimators in a variety of practical situations where, though we lack individual level data, other relevant prior information is available.
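A short simulation (hypothetical parameters) of why the usual ecological estimator can mislead: within every group the two variables are positively related, yet the group means are negatively related, so the correlation of the means has the wrong sign relative to the individual-level correlation.

    import numpy as np

    rng = np.random.default_rng(6)
    groups, per_group = 30, 200

    # Group means negatively related; within-group relationship positive.
    mu = rng.multivariate_normal([0, 0], [[1, -0.8], [-0.8, 1]], size=groups)
    x = mu[:, 0, None] + rng.normal(0, 1, (groups, per_group))
    y = mu[:, 1, None] + 1.0 * (x - mu[:, 0, None]) + rng.normal(0, 0.5, (groups, per_group))

    r_individual = np.corrcoef(x.ravel(), y.ravel())[0, 1]
    r_ecological = np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]  # Pearson r of group means
    print(r_individual, r_ecological)   # typically opposite signs: the ecological fallacy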

12.
Time-series data are often subject to measurement error, usually as a result of needing to estimate the variable of interest. Generally, however, the relationship between the surrogate variables and the true variables can be rather complicated compared to the classical additive error structure usually assumed. In this article, we address the estimation of the parameters of autoregressive models in the presence of functional measurement errors. We first develop a parameter estimation method with the help of validation data; this estimation method does not depend on the functional form or the distribution of the measurement error. The proposed estimator is proved to be consistent. Moreover, the asymptotic representation and the asymptotic normality of the estimator are also derived. Simulation results indicate that the proposed method works well in practical situations.

13.
Structural vector autoregressive analysis for cointegrated variables
Vector autoregressive (VAR) models are capable of capturing the dynamic structure of many time series variables. Impulse response functions are typically used to investigate the relationships between the variables included in such models. In this context the relevant impulses, innovations or shocks to be traced out in an impulse response analysis have to be specified by imposing appropriate identifying restrictions. Taking into account the cointegration structure of the variables offers interesting possibilities for imposing identifying restrictions. Therefore VAR models which explicitly take into account the cointegration structure of the variables, so-called vector error correction models, are considered. Specification, estimation and validation of reduced-form vector error correction models are briefly outlined, and imposing structural short- and long-run restrictions within these models is discussed. I thank an anonymous reader for comments on an earlier draft of this paper that helped me to improve the exposition.
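The reduced-form steps (cointegration rank selection, then estimation) can be sketched with statsmodels' vector error correction implementation on simulated cointegrated series; the structural identification step discussed in the paper is not shown here.

    import numpy as np
    from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank

    rng = np.random.default_rng(7)
    n = 400

    # Two I(1) series driven by one common stochastic trend -> one cointegration relation.
    trend = np.cumsum(rng.normal(size=n))
    data = np.column_stack([trend + rng.normal(0, 0.5, n),
                            0.8 * trend + rng.normal(0, 0.5, n)])

    rank = select_coint_rank(data, det_order=0, k_ar_diff=1)   # Johansen trace test
    res = VECM(data, k_ar_diff=1, coint_rank=rank.rank, deterministic="co").fit()
    print(rank.rank)    # expected: 1
    print(res.beta)     # cointegrating vector, roughly proportional to (1, -1.25)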

14.
Shi, Wang, Murray-Smith and Titterington (Biometrics 63:714–723, 2007) proposed a Gaussian process functional regression (GPFR) model to model functional response curves with a set of functional covariates. Two main problems are addressed by their method: modelling nonlinear and nonparametric regression relationships, and modelling the covariance structure and mean structure simultaneously. The method gives very good results for curve fitting and prediction but side-steps the problem of heterogeneity. In this paper we present a new method for modelling functional data that are 'spatially' indexed, i.e., where the heterogeneity depends on factors such as region and individual patient information. For data collected from different sources, we assume that the data corresponding to each curve (or batch) follow a Gaussian process functional regression model as a lower-level model, and introduce an allocation model for the latent indicator variables as a higher-level model. This higher-level model depends on the information related to each batch. The method takes advantage of both GPFR and mixture models and therefore improves the accuracy of predictions. The mixture model has also been used for curve clustering, but here the focus is on clustering functional relationships between the response curve and covariates, i.e. the clustering is based on the shape of the functional response surface against the set of functional covariates. The model is examined on simulated data and real data.

15.
"One can often gain insight into the aetiology of a disease by relating mortality rates in different areas to explanatory variables. Multiple regression techniques are usually employed, but unweighted least squares may be inappropriate if the areas vary in population size. Also, a fully weighted regression, with weights inversely proportional to binomial sampling variances, is usually too extreme. This paper proposes an intermediate solution via maximum likelihood which takes account of three sources of variation in death rates: sampling error, explanatory variables and unexplained differences between areas. The method is also adapted for logit (death rates), standardized mortality ratios (SMRs) and log (SMRs). Two [United Kingdom] examples are presented."  相似文献   

16.
In the functional measurement error model, when the true, unobservable explanatory variables are treated as nuisance parameters, the number of nuisance parameters grows with the sample size. Fisher's information may not exist for all parameters under this scenario. We propose a simple but effective method of deriving Fisher's information by approximating the design matrix of the explanatory variables with a quantile design matrix. We illustrate the application of our method with a numerical example. An adaptation of this method shows very good performance for the prediction problem.

17.
Given a two-way contingency table in which the rows and columns both define ordinal variables, there are many ways in which the informal idea of positive association between those variables might be defined. This paper considers a variety of definitions expressed as inequality constraints on cross-product ratios. Logical relationships between the definitions are explored. Each definition can serve as a composite alternative against which the null hypothesis of no association may be tested. For a broad class of such alternatives a decomposition of the log-likelihood gives both an explicit likelihood ratio statistic and its asymptotic null hypothesis distribution. Results are derived for multinomial sampling and for fully conditional sampling with row and column totals fixed.
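One such definition is easy to check numerically (illustrative table of counts): require every adjacent 2x2 cross-product ratio to be at least 1, i.e. all local odds ratios are non-negative on the log scale.

    import numpy as np

    # A 3x3 table of counts for two ordinal variables (made-up numbers).
    table = np.array([[30, 10,  4],
                      [10, 20, 10],
                      [ 4, 10, 30]], dtype=float)

    # Local (adjacent-cell) cross-product ratios.
    local_or = (table[:-1, :-1] * table[1:, 1:]) / (table[:-1, 1:] * table[1:, :-1])
    print(local_or)
    print(np.all(local_or >= 1))   # True: consistent with positive ordinal association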

18.
The independent additive errors linear model consists of a structure for the mean and a separate structure for the error distribution. The error structure may be parametric or it may be semiparametric. Under alternative values of the mean structure, the best fitting additive errors model has an error distribution which can be represented as the convolution of the actual error distribution and the marginal distribution of a misspecification term. The model misspecification term results from the covariates' distribution. Conditions are developed to distinguish when the semiparametric model yields sharper inference than the parametric model and vice versa. The main conditions concern the actual error distribution and the covariates' distribution. The theoretical results explain a paradoxical finding in semiparametric Bayesian modelling, where the posterior distribution under a semiparametric model is found to be more concentrated than is the posterior distribution under a corresponding parametric model. The paradox is illustrated on a set of allometric data. The Canadian Journal of Statistics 39: 165–180; 2011 ©2011 Statistical Society of Canada

19.
Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.
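The Hit-and-Run step itself is compact. A self-contained sketch on a toy constraint set (the imputation model around it is not reproduced): pick a random direction, intersect it with the linear constraints, and draw uniformly on the feasible chord, so every draw satisfies the constraints by construction.

    import numpy as np

    rng = np.random.default_rng(8)

    # Constraints A @ x <= b: here the triangle x1 >= 0, x2 >= 0, x1 + x2 <= 1.
    A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
    b = np.array([0.0, 0.0, 1.0])

    def hit_and_run(x, steps=1000):
        for _ in range(steps):
            d = rng.normal(size=x.size)
            d /= np.linalg.norm(d)                 # random direction on the sphere
            # Feasible interval for t in x + t*d, from A @ (x + t*d) <= b.
            num, den = b - A @ x, A @ d
            t_hi = np.min(num[den > 0] / den[den > 0])
            t_lo = np.max(num[den < 0] / den[den < 0])
            x = x + rng.uniform(t_lo, t_hi) * d    # uniform draw on the feasible chord
        return x

    print(hit_and_run(np.array([0.25, 0.25])))     # remains inside the constraint set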

20.
For normally distributed data analyzed with linear models, it is well known that measurement error on an independent variable leads to attenuation of the effect of the independent variable on the dependent variable. However, for time-to-event variables such as progression-free survival (PFS), the effect of the measurement variability in the underlying measurements defining the event is less well understood. We conducted a simulation study to evaluate the impact of measurement variability in tumor assessment on the treatment effect hazard ratio for PFS and on the median PFS time, for different tumor assessment frequencies. Our results show that scan measurement variability can cause attenuation of the treatment effect (i.e. the hazard ratio is closer to one) and that the extent of attenuation may be increased with more frequent scan assessments. This attenuation leads to inflation of the type II error. Therefore, scan measurement variability should be minimized as far as possible in order to reveal a treatment effect that is closest to the truth. In disease settings where the measurement variability is shown to be large, consideration may be given to inflating the sample size of the study to maintain statistical power. Copyright © 2012 John Wiley & Sons, Ltd.
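A small simulation in the spirit of the study (all numbers illustrative, not the authors' design): true tumor growth is exponential, progression is declared at the first scan whose noisy measurement crosses a growth threshold, and the observed median PFS of the two arms is compared; measurement noise triggers early false progressions in both arms, pulling the medians together.

    import numpy as np

    rng = np.random.default_rng(9)
    n = 4000
    scan_interval = 8.0            # weeks between tumor assessments
    threshold = 1.2                # 20% growth calls progression (illustrative rule)
    noise_sd = 0.10                # scan measurement variability

    def median_observed_pfs(growth_rate):
        times = np.full(n, 40 * scan_interval)     # administrative censoring time
        for i in range(n):
            g = growth_rate * rng.lognormal(0.0, 0.3)   # patient-specific growth rate
            for k in range(1, 40):
                t = k * scan_interval
                measured = np.exp(g * t) * (1 + noise_sd * rng.normal())
                if measured >= threshold:          # progression declared at this scan
                    times[i] = t
                    break
        return np.median(times)

    # Control grows faster than treatment; compare observed medians per arm.
    print(median_observed_pfs(0.010), median_observed_pfs(0.005))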
