Similar Documents (20 results found)
1.
This paper proposes a probabilistic frontier regression model for binary output data in a production process setup. We treat one of the two output categories as the ‘selected’ category and attribute a reduction in the probability of falling in this category to a reduction in the technical efficiency (TE) of the decision-making unit. An efficiency measure is proposed to determine the deviations of individual units from the probabilistic frontier. Simulation results show that the average estimated TE component is close to its true value. An application of the proposed method to data on the Indian public sector banking system is provided, where the output variable indicates the level of non-performing assets. Individual TE is obtained for each of the banks under consideration. Among the public sector banks, Andhra Bank is found to be the most efficient, whereas United Bank of India is the least efficient.
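As a rough illustration of the idea (not the authors' estimator), the sketch below fits an ordinary logistic regression to simulated bank-like data and scores each unit's ‘technical efficiency’ as the ratio of its predicted probability of the ‘selected’ category to the in-sample frontier (maximum) probability; the variables and this efficiency definition are assumptions made purely for the example.

```python
# Illustrative sketch only -- NOT the paper's estimator. Fit a plain
# logistic regression for a binary output, then score each unit by the
# ratio of its predicted 'selected' probability to the sample frontier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # hypothetical unit-level inputs
p_true = 1 / (1 + np.exp(-(0.5 + X @ np.array([1.0, -0.5, 0.8]))))
y = rng.binomial(1, p_true)               # 1 = 'selected' category

model = LogisticRegression().fit(X, y)
p_hat = model.predict_proba(X)[:, 1]

te = p_hat / p_hat.max()                  # naive efficiency vs. the frontier
print("least efficient unit:", te.argmin(), "TE =", te.min().round(3))
```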

2.
Ensuring a standard of assessment in situations where marking panels are used is fraught with difficulties, particularly where essay-type responses are to be marked. This paper discusses statistical process control procedures, similar to those used in industry, to help moderate marking quality when ‘double-marking’ or ‘partial double-marking’ is used. When questions are assessed by the same two markers, the scores assigned to responses by each marker may be adjusted so that their assessments are on average equal in terms of location and scale. The paper also discusses methods of controlling sequential assessment, and demonstrates the application of these techniques in evaluating marker consistency, using data from school leaving examinations in geography.
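A minimal sketch of the location-and-scale adjustment described above, assuming simulated marks from two markers: marker B's scores are mapped onto marker A's mean and standard deviation. The paper's statistical process control machinery goes further; this shows only the basic moderation step.

```python
# Align two markers so their assessments agree on average in location
# and scale: map marker B's scores onto marker A's mean and std dev.
import numpy as np

rng = np.random.default_rng(1)
true_quality = rng.normal(60, 10, size=50)
marks_a = true_quality + rng.normal(0, 3, size=50)          # marker A
marks_b = 0.8 * true_quality + 20 + rng.normal(0, 3, 50)    # harsher scale

adj_b = marks_a.mean() + (marks_b - marks_b.mean()) * marks_a.std() / marks_b.std()
print(f"before: mean diff = {marks_a.mean() - marks_b.mean():.2f}")
print(f"after : mean diff = {marks_a.mean() - adj_b.mean():.2f}, "
      f"sd ratio = {marks_a.std() / adj_b.std():.2f}")
```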

3.
Two types of bivariate models for categorical response variables are introduced to deal with special categories such as ‘unsure’ or ‘unknown’ in combination with other ordinal categories, while taking additional hierarchical data structures into account. The latter is achieved through different covariance structures for a trivariate random effect. The models are applied to data from the INSIDA survey, where interest lies in the effect of covariates on the association between HIV risk perception (quadrinomial, with an ‘unknown risk’ category) and HIV infection status (binary). The final model combines continuation-ratio with cumulative link logits for the risk perception, together with partly correlated and partly shared trivariate random effects at the household level. The results indicate that only age has a significant effect on the association between HIV risk perception and infection status. The proposed models may be useful in various fields of application such as the social and biomedical sciences, epidemiology and public health.

4.
Interpretation in nonlinear regression models that include sets of dummy variables representing categories of underlying categorical variables is not straightforward. Partial effects giving the differences between each category and the reference category are routinely computed in the empirical economics literature. Yet, partial effects yielding the differences between each category and all other categories are not calculated, despite having great interpretative value. We derive the correct formulae for calculating these partial effects for an ordered probit model. The results of an application using data on subjective well-being illustrate the usefulness of the alternative partial effects.
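A sketch of ‘each category versus all other categories’ partial effects, using statsmodels' OrderedModel on simulated data; the three-level regressor and the simple averaging scheme are illustrative assumptions, not the paper's formulae.

```python
# Fit an ordered probit, then compare average predicted category
# probabilities with everyone set to one level of a categorical
# regressor versus the average over the remaining levels.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(2)
n = 500
educ = rng.integers(0, 3, n)                       # 3-category regressor
x = rng.normal(size=n)
latent = 0.6 * x + 0.5 * (educ == 1) + 1.0 * (educ == 2) + rng.normal(size=n)
y = pd.cut(latent, [-np.inf, -0.5, 0.8, np.inf], labels=False)  # 3 ordered outcomes

X = pd.DataFrame({"x": x, "educ1": (educ == 1).astype(float),
                  "educ2": (educ == 2).astype(float)})
res = OrderedModel(y, X, distr="probit").fit(method="bfgs", disp=False)

def avg_probs(level):                              # counterfactual: all at one level
    Xc = X.copy()
    Xc["educ1"], Xc["educ2"] = float(level == 1), float(level == 2)
    return np.asarray(res.predict(Xc)).mean(axis=0)

probs = np.vstack([avg_probs(l) for l in range(3)])
for l in range(3):                                 # level l vs. mean of the rest
    others = probs[[k for k in range(3) if k != l]].mean(axis=0)
    print(f"educ={l} vs others:", np.round(probs[l] - others, 3))
```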

5.
Most methods for variable selection work from the top down and steadily remove features until only a small number remain. They often rely on a predictive model, and there are usually significant disconnections in the sequence of methodologies that leads from the training samples to the choice of the predictor, then to variable selection, then to choice of a classifier, and finally to classification of a new data vector. In this paper we suggest a bottom‐up approach that brings the choices of variable selector and classifier closer together, by basing the variable selector directly on the classifier, removing the need to involve predictive methods in the classification decision, and enabling the direct and transparent comparison of different classifiers in a given problem. Specifically, we suggest ‘wrapper methods’, determined by classifier type, for choosing variables that minimize the classification error rate. This approach is particularly useful for exploring relationships among the variables that are chosen for the classifier. It reveals which variables have a high degree of leverage for correct classification using different classifiers; it shows which variables operate in relative isolation, and which are important mainly in conjunction with others; it permits quantification of the authority with which variables are selected; and it generally leads to a reduced number of variables for classification, in comparison with alternative approaches based on prediction.
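A minimal wrapper-method sketch in this spirit, using scikit-learn's greedy forward selection driven by the classifier's own cross-validated performance; the data set and the target of five variables are arbitrary choices, not the paper's.

```python
# Wrapper selection: choose variables directly by the cross-validated
# performance of the classifier itself (greedy forward selection).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)

sfs = SequentialFeatureSelector(clf, n_features_to_select=5, cv=5).fit(X, y)
X_sel = sfs.transform(X)
print("chosen columns:", sfs.get_support(indices=True))
print("CV accuracy   :", cross_val_score(clf, X_sel, y, cv=5).mean().round(3))
```

Swapping in a different estimator for `clf` gives the direct, transparent comparison of classifiers that the paper emphasises.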

6.
Multivariate regression modelling in the presence of heterogeneous data is considered, to address the influence of such heterogeneity on the assessment of the linear relations between responses and explanatory variables. In spite of its popularity, clusterwise regression is not designed to identify the linear relationships within ‘homogeneous’ clusters exhibiting internal cohesion and external separation. A within-clusterwise regression is introduced to achieve this aim and, since the possible presence of a linear relation ‘between’ clusters should also be taken into account, a general regression model is introduced that accounts for both the between-cluster and the within-cluster regression variation. Decompositions of the variance of the responses accounted for are given, the least-squares estimation of the parameters is derived together with an appropriate coordinate descent algorithm, and the performance of the proposed methodology is evaluated on different datasets.
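For orientation, here is a toy version of plain clusterwise regression fitted by alternating least squares (assign each point to its best-fitting line, refit, repeat); the paper's within/between decomposition extends this idea, and the sketch does not attempt it.

```python
# Toy clusterwise regression: alternate between assigning points to the
# line with the smallest squared residual and refitting each line by OLS.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 300)
g = rng.integers(0, 2, 300)                        # two latent clusters
y = np.where(g == 0, 1 + 2 * x, 8 - 1 * x) + rng.normal(0, 1, 300)

X = np.column_stack([np.ones_like(x), x])
labels = rng.integers(0, 2, 300)                   # random start
for _ in range(20):                                # assumes neither cluster empties
    betas = [np.linalg.lstsq(X[labels == k], y[labels == k], rcond=None)[0]
             for k in range(2)]
    resid = np.column_stack([(y - X @ b) ** 2 for b in betas])
    labels = resid.argmin(axis=1)                  # reassign to best-fitting line
print("recovered coefficients:", [np.round(b, 2) for b in betas])
```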

7.
This paper presents a method of fitting factorial models to recidivism data consisting of the (possibly censored) times to ‘failure’ of individuals, in order to test for differences between groups. Here ‘failure’ means rearrest, reconviction or reincarceration, etc. A proportion P of the sample is assumed to be ‘susceptible’ to failure, i.e. to fail eventually, while the remaining 1 − P are ‘immune’ and never fail. Thus failure may be described in two ways: by the probability P that an individual ever fails again (the ‘probability of recidivism’), and by the rate of failure Λ for the susceptibles. Related analyses have been proposed previously; this paper argues that a factorial approach, as opposed to the regression approaches advocated previously, offers simplified analysis and interpretation of these kinds of data. The methods proposed, which are also applicable in medical statistics and reliability analyses, are demonstrated on data sets in which the factors are Parole Type (released to freedom or on parole), Age group (≤ 20 years, 20–40 years, > 40 years), and Marital Status. The outcome (failure) is a return to prison following first or second release.
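A sketch of the split-population likelihood described above, assuming an exponential failure rate for the susceptible fraction and fitting (P, Λ) by maximum likelihood on simulated, censored data; covariates and the factorial structure are omitted.

```python
# Split-population ('cure') model: a failure at t contributes
# P * lam * exp(-lam*t); a censored time contributes (1-P) + P*exp(-lam*t).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, P_true, lam_true = 1000, 0.4, 0.3
suscept = rng.random(n) < P_true
t_fail = np.where(suscept, rng.exponential(1 / lam_true, n), np.inf)
t_cens = rng.uniform(1, 10, n)                     # end of follow-up
t = np.minimum(t_fail, t_cens)
failed = t_fail <= t_cens

def negloglik(theta):
    P = 1 / (1 + np.exp(-theta[0]))                # keep P in (0, 1)
    lam = np.exp(theta[1])                         # keep rate positive
    ll_fail = np.log(P) + np.log(lam) - lam * t[failed]
    ll_cens = np.log((1 - P) + P * np.exp(-lam * t[~failed]))
    return -(ll_fail.sum() + ll_cens.sum())

fit = minimize(negloglik, x0=[0.0, 0.0], method="BFGS")
P_hat, lam_hat = 1 / (1 + np.exp(-fit.x[0])), np.exp(fit.x[1])
print(f"P_hat={P_hat:.3f} (true {P_true}), lam_hat={lam_hat:.3f} (true {lam_true})")
```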

8.
Statistical disclosure control (SDC) is a balancing act between mandatory data protection and the understandable demand from researchers for access to original data. In this paper, a family of methods is defined to ‘mask’ sensitive variables before data files can be released. In the first step, the variable to be masked is ‘cloned’ (C). Then, the duplicated variable, as a whole or in part, is ‘suppressed’ (S). The masking procedure's third step ‘imputes’ (I) data for these artificial missing values. The original variable can then be deleted, and its masked substitute serves as the basis for the analysis of the data. The idea of this general ‘CSI framework’ is to open the wide field of imputation methods to SDC. The method applied in the I-step can make use of available auxiliary variables, including the original variable. Different members of this family of methods that deliver variance estimators are discussed in some detail. Furthermore, a simulation study analyses various methods belonging to the family with respect to both the quality of parameter estimation and privacy protection. Based on the results obtained, recommendations are formulated for different estimation tasks.
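The C-S-I steps can be walked through in a few lines; the sketch below uses simple regression imputation with added noise in the I-step, which is only one member of the family, and all data are simulated.

```python
# Clone (C) the sensitive variable, Suppress (S) part of the clone,
# Impute (I) the artificial missings from auxiliary variables, then
# release only the masked substitute.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 500
aux = rng.normal(size=(n, 2))                        # auxiliary variables
income = 30 + aux @ np.array([5.0, 3.0]) + rng.normal(0, 2, n)  # sensitive

clone = income.copy()                                # C: clone
suppress = rng.random(n) < 0.6                       # S: suppress 60% of it
reg = LinearRegression().fit(aux[~suppress], clone[~suppress])
clone[suppress] = (reg.predict(aux[suppress])        # I: impute + noise
                   + rng.normal(0, 2, suppress.sum()))

released = pd.DataFrame(aux, columns=["x1", "x2"]).assign(income_masked=clone)
print("corr(original, masked) =", np.corrcoef(income, clone)[0, 1].round(3))
```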

9.
The traditional and readily available multivariate analysis of variance (MANOVA) tests such as Wilks' Lambda and the Pillai–Bartlett trace start to suffer from low power as the number of variables approaches the sample size. Moreover, when the number of variables exceeds the number of available observations, these statistics are not available for use. Ridge regularisation of the covariance matrix has been proposed to allow the use of MANOVA in high‐dimensional situations and to increase its power when the sample size approaches the number of variables. In this paper two forms of ridge regression are compared to each other and to a novel approach based on lasso regularisation, as well as to more traditional approaches based on principal components and the Moore‐Penrose generalised inverse. The performance of the different methods is explored via an extensive simulation study. All the regularised methods perform well; the best method varies across the different scenarios, with margins of victory being relatively modest. We examine a data set of soil compaction profiles at various positions relative to a ridgetop, and illustrate how our results can be used to inform the selection of a regularisation method.
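One concrete form of the ridge idea, assuming an arbitrary regularisation constant: add λI to the within-group SSCP matrix so that a Wilks-type determinant ratio remains computable even when the variables outnumber the observations.

```python
# Ridge-regularised Wilks-type statistic: with p > n the within-group
# SSCP matrix W is singular, but W + lam*I is invertible.
import numpy as np

rng = np.random.default_rng(6)
n_per, p = 15, 40                                  # 30 observations, 40 variables
g1 = rng.normal(0.0, 1.0, size=(n_per, p))
g2 = rng.normal(0.4, 1.0, size=(n_per, p))         # second group, shifted mean

W = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in (g1, g2))  # within SSCP
grand = np.vstack([g1, g2]).mean(0)
B = sum(len(g) * np.outer(g.mean(0) - grand, g.mean(0) - grand)
        for g in (g1, g2))                          # between SSCP

lam = 1.0                                           # ad hoc ridge constant
W_ridge = W + lam * np.eye(p)
_, ld_w = np.linalg.slogdet(W_ridge)                # stable determinant ratio
_, ld_t = np.linalg.slogdet(W_ridge + B)
print("ridge-regularised Wilks' Lambda:", round(np.exp(ld_w - ld_t), 4))
```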

10.
The problem of statistical calibration of a measuring instrument can be framed both in a statistical context and in an engineering context. In the first, the problem is dealt with by distinguishing between the ‘classical’ approach and the ‘inverse’ regression approach. Both of these are static models, used to estimate exact measurements from measurements that are affected by error. In the engineering context, the variables of interest are treated as evolving over the time at which they are observed. The Bayesian time series method of dynamic linear models can be used to monitor the evolution of the measurements, thus introducing a dynamic approach to statistical calibration. The research presented employs this new approach to performing statistical calibration. A simulation study in the context of microwave radiometry is conducted that compares the dynamic model with traditional static frequentist and Bayesian approaches. The focus of the study is to understand how well the dynamic statistical calibration method performs under various signal-to-noise ratios.
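A bare-bones dynamic sketch: a local-level dynamic linear model tracked with the Kalman filter, standing in for the calibration setting; the radiometry specifics, priors and the full calibration step are not reproduced here.

```python
# Local-level DLM: level_t = level_{t-1} + w_t, obs_t = level_t + v_t.
# The Kalman filter recursions track the evolving level over time.
import numpy as np

rng = np.random.default_rng(7)
T = 200
true_level = 10 + np.cumsum(rng.normal(0, 0.05, T))  # slowly drifting truth
obs = true_level + rng.normal(0, 0.5, T)             # noisy instrument readings

V, W = 0.5 ** 2, 0.05 ** 2                           # observation / evolution variances
m, C = obs[0], 1.0                                   # initial state mean and variance
filtered = np.empty(T)
for t in range(T):
    R = C + W                                        # prior variance at time t
    K = R / (R + V)                                  # Kalman gain
    m = m + K * (obs[t] - m)                         # updated level estimate
    C = (1 - K) * R
    filtered[t] = m

print("RMSE raw     :", np.sqrt(np.mean((obs - true_level) ** 2)).round(3))
print("RMSE filtered:", np.sqrt(np.mean((filtered - true_level) ** 2)).round(3))
```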

11.
This article provides alternative circular smoothing methods for the nonparametric estimation of periodic functions. By treating the data as ‘circular’, we resolve the boundary issue that arises in nonparametric estimation when the data are treated as ‘linear’. By redefining the distance metric and the signed distance, we modify many estimators used in situations involving periodic patterns. From the perspective of nonparametric estimation of periodic functions, we present examples of the nonparametric estimation of (1) a periodic function, (2) multiple periodic functions, (3) an evolving function, (4) a periodically varying-coefficient model and (5) a generalized linear model with periodically varying coefficients. From the perspective of circular statistics, we provide alternative approaches to calculating the weighted average and evaluating ‘linear/circular–linear/circular’ association and regression. Simulation studies and an empirical study of an electricity price index illustrate and compare our methods with others in the literature.
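To make the circular idea concrete, the sketch below replaces the Euclidean distance in a Nadaraya–Watson smoother with a von Mises weight in the angle, so estimates near one end of the period borrow strength from the other end; the bandwidth κ is an arbitrary choice.

```python
# Circular kernel smoother: weight observations by a von Mises kernel in
# the angle, so there is no boundary at 0 / 2*pi.
import numpy as np

rng = np.random.default_rng(8)
theta = rng.uniform(0, 2 * np.pi, 300)               # e.g. time of year as an angle
y = 2 * np.sin(theta) + 0.5 * np.cos(2 * theta) + rng.normal(0, 0.4, 300)

def circular_smooth(theta0, kappa=10.0):
    w = np.exp(kappa * np.cos(theta - theta0))       # von Mises kernel weights
    return (w * y).sum() / w.sum()

grid = np.linspace(0, 2 * np.pi, 12, endpoint=False)
fit = np.array([circular_smooth(t0) for t0 in grid])
print(np.round(fit, 2))
```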

12.
Generalised variance function (GVF) models are data-analysis techniques often used in large-scale sample surveys to approximate the design variance of point estimators for population means and proportions. Potential advantages of the GVF approach include operational simplicity, more stable sampling error estimates, and a convenient method of summarising results when a large number of survey variables is considered. In this paper, several parametric and nonparametric methods for GVF estimation with binary variables are proposed and compared. The behaviour of these estimators is analysed under heteroscedasticity and in the presence of outliers and influential observations. An empirical study based on the annual survey of living conditions in Galicia (a region in the northwest of Spain) illustrates the behaviour of the proposed estimators.
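A minimal parametric GVF sketch, assuming the classic form relvariance = a + b/x fitted by OLS to simulated (estimate, direct relvariance) pairs; the paper's estimators and the survey design details are not reproduced.

```python
# Fit the classic GVF model relvar = a + b/x to direct variance
# estimates, then use the fitted curve to predict variances elsewhere.
import numpy as np

rng = np.random.default_rng(9)
n_dom = 40
x_hat = rng.uniform(1e3, 1e5, n_dom)                 # estimated domain totals
relvar = 0.0002 + 50 / x_hat + rng.normal(0, 2e-4, n_dom)  # direct relvariances

A = np.column_stack([np.ones(n_dom), 1 / x_hat])
a, b = np.linalg.lstsq(A, relvar, rcond=None)[0]
print(f"fitted GVF: relvar = {a:.2e} + {b:.1f}/x")
print(f"predicted relvariance at x = 20000: {a + b / 20000:.2e}")
```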

13.
Many time series are measured monthly, either as averages or totals, and such data often exhibit seasonal variability – the values of the series are consistently larger for some months of the year than for others. A typical series of this type is the number of deaths each month attributed to SIDS (Sudden Infant Death Syndrome). Seasonality can be modelled in a number of ways. This paper describes and discusses various methods for modelling seasonality in SIDS data, though much of the discussion is relevant to other seasonally varying data. There are two main approaches, either fitting a circular probability distribution to the data, or using regression-based techniques to model the mean seasonal behaviour. Both are discussed in this paper.
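The regression-based route can be illustrated with harmonic (sine/cosine) terms in a Poisson GLM; the monthly counts below are simulated stand-ins, not SIDS data.

```python
# Harmonic Poisson regression for monthly counts: a single sine/cosine
# pair captures a smooth annual cycle with a winter peak.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
month = np.tile(np.arange(1, 13), 5)                 # 5 years of monthly data
mu = np.exp(3 + 0.4 * np.cos(2 * np.pi * month / 12))
deaths = rng.poisson(mu)

X = sm.add_constant(np.column_stack([np.cos(2 * np.pi * month / 12),
                                     np.sin(2 * np.pi * month / 12)]))
res = sm.GLM(deaths, X, family=sm.families.Poisson()).fit()
print("coefficients (const, cos, sin):", res.params.round(3))
print("estimated seasonal amplitude  :", np.hypot(res.params[1], res.params[2]).round(3))
```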

14.
In multi-category response models, the categories are often ordered. In ordinal response models, the usual likelihood approach becomes unstable when the predictor space is ill-conditioned or the number of parameters to be estimated is large relative to the sample size. The likelihood estimates do not exist when the number of observations is less than the number of parameters. The same problem arises if the constraint on the order of the intercept values is not met during the iterative procedure. Proportional odds models (POMs) are the most commonly used models for ordinal responses. In this paper, penalized likelihood with a quadratic penalty is used to address these issues, with a special focus on POMs. To avoid large differences between the parameter values of consecutive categories of an ordinal predictor, the differences between the parameters of adjacent categories are penalized. The considered penalized-likelihood function penalizes the parameter estimates, or the differences between them, according to the type of predictor. Mean-squared error of the parameter estimates, deviance of the fitted probabilities and prediction error of the penalized fits are compared with those of the usual likelihood estimates in a simulation study and an application.
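A compact sketch of a quadratically penalized proportional odds fit via scipy, with ordered cut-points enforced by an exponential reparametrization; the penalty here is applied to all slopes rather than to adjacent-category differences, so it is a simplification of the paper's proposal.

```python
# Penalized proportional odds model: P(y <= j | x) = expit(alpha_j - x'beta),
# fitted by minimizing the negative log-likelihood + lam * ||beta||^2.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(16)
n, p = 300, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.0, 0.8, 0.3])
eta = X @ beta_true + rng.logistic(size=n)
y = np.digitize(eta, [-1.0, 1.0])                    # 3 ordered categories: 0, 1, 2

lam = 1.0                                            # penalty strength (ad hoc)
def negloglik_pen(theta):
    a1, d2 = theta[0], theta[1]
    alphas = np.array([a1, a1 + np.exp(d2)])         # cut-points kept ordered
    beta = theta[2:]
    cum = expit(alphas[None, :] - (X @ beta)[:, None])          # P(y <= j)
    probs = np.column_stack([cum[:, 0], cum[:, 1] - cum[:, 0], 1 - cum[:, 1]])
    return -np.log(probs[np.arange(n), y] + 1e-12).sum() + lam * beta @ beta

fit = minimize(negloglik_pen, np.zeros(2 + p), method="BFGS")
print("penalized beta:", np.round(fit.x[2:], 3), " (true:", beta_true, ")")
```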

15.
Frequently in clinical and epidemiologic studies, the event of interest is recurrent (i.e., can occur more than once per subject). When the events are not of the same type, an analysis which accounts for the fact that events fall into different categories will often be more informative. Often, however, although event times may always be known, information through which events are categorized may potentially be missing. Complete‐case methods (whose application may require, for example, that events be censored when their category cannot be determined) are valid only when event categories are missing completely at random. This assumption is rather restrictive. The authors propose two multiple imputation methods for analyzing multiple‐category recurrent event data under the proportional means/rates model. The use of a proper or improper imputation technique distinguishes the two approaches. Both methods lead to consistent estimation of regression parameters even when the missingness of event categories depends on covariates. The authors derive the asymptotic properties of the estimators and examine their behaviour in finite samples through simulation. They illustrate their approach using data from an international study on dialysis.

16.
Heterogeneity of error variance often causes serious interpretive problems in linear regression analysis. Before taking any remedial measures we first need to detect this problem. A large number of diagnostic plots are available in the literature for detecting heteroscedasticity of the error variances. Among them, the ‘residuals’ versus ‘fits’ (R–F) plot is very popular and commonly used. In the R–F plot, residuals are plotted against the fitted responses, where both components are obtained using the ordinary least squares (OLS) method. It is now evident that OLS fits and residuals suffer a serious setback in the presence of unusual observations, and hence the R–F plot may not exhibit the real scenario. Deletion residuals, based on a data set free of all unusual cases, should estimate the true errors better than the OLS residuals. In this paper we propose the ‘deletion residuals’ versus ‘deletion fits’ (DR–DF) plot for the detection of heterogeneity of the error variances in a linear regression model, giving a more convincing and reliable graphical display. Examples show that this plot locates unusual observations more clearly than the R–F plot. The advantage of using deletion residuals in detecting heteroscedasticity of the error variance is investigated through Monte Carlo simulations under a variety of situations.
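The leave-one-out identities make such a plot cheap to construct; the sketch below computes deletion residuals and deletion fits from the hat matrix (no refitting) and plots them, using simulated heteroscedastic data with a few planted outliers.

```python
# DR-DF plot sketch using the standard leave-one-out identities:
# deletion residual e_(i) = e_i / (1 - h_ii), deletion fit = y_i - e_(i).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(11)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1 + 0.3 * x, n)        # heteroscedastic errors
y[:3] += 25                                          # a few unusual cases

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T                 # hat matrix
h = np.diag(H)
e = y - H @ y                                        # OLS residuals
e_del = e / (1 - h)                                  # deletion (PRESS) residuals
fit_del = y - e_del                                  # deletion fits

plt.scatter(fit_del, e_del)
plt.xlabel("deletion fits"); plt.ylabel("deletion residuals")
plt.title("DR-DF plot (sketch)")
plt.show()
```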

17.
It is common to monitor several correlated quality characteristics using Hotelling's T² statistic. However, T² confounds location shifts with scale shifts, and consequently it is often difficult to determine the factors responsible for an out-of-control signal in terms of the process mean vector and/or the process covariance matrix. In this paper, we propose a diagnostic procedure called the ‘D-technique’ to detect the nature of the shift. For this purpose, two sets of regression equations, each consisting of the regression of one variable on the remaining variables, are used to characterize the ‘structure’ of the ‘in-control’ process and that of the ‘current’ process. To determine the sources responsible for an out-of-control state, it is shown that it is enough to compare these two structures using a dummy-variable multiple regression equation. The proposed method is operationally simpler and computationally advantageous compared with existing diagnostic tools. The technique is illustrated with various examples.
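A sketch of the structure-comparison idea using dummy-variable regression, assuming one characteristic regressed on another in pooled in-control/current data with an indicator and its interaction; a significant joint test on the dummy terms flags a change in the regression structure. The paper's full D-technique applies this across all such regressions.

```python
# Pool in-control and current data, add a 0/1 period dummy and its
# interaction with the regressor, and jointly test the dummy terms.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 200
x_ic = rng.normal(size=n); y_ic = 1 + 2 * x_ic + rng.normal(0, 1, n)  # in control
x_cu = rng.normal(size=n); y_cu = 1 + 3 * x_cu + rng.normal(0, 1, n)  # shifted slope

x = np.concatenate([x_ic, x_cu])
y = np.concatenate([y_ic, y_cu])
d = np.concatenate([np.zeros(n), np.ones(n)])        # 0 = in control, 1 = current

X = sm.add_constant(np.column_stack([x, d, d * x]))  # const, x1=x, x2=d, x3=d*x
res = sm.OLS(y, X).fit()
print(res.f_test("x2 = 0, x3 = 0"))                  # joint test on dummy terms
```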

18.
Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may still be more efficient than a nonparametric one, which makes no assumptions about the underlying distribution of the data. There are therefore advantages in using methods based on the Poisson distribution, or extensions of it, which provide a modelling framework tailored to count data. One challenge with count data is overdispersion, but methods are available that account for this through the introduction of random-effect terms into the model, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and nonparametric approaches are contrasted with those based on Poisson regression and various extensions, in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well.
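A negative binomial fit reporting the treatment effect as a rate ratio, on simulated counts rather than the solifenacin/mirabegron trial data; the dispersion value is an assumption.

```python
# Negative binomial regression for overdispersed episode counts; the
# exponentiated treatment coefficient is the rate ratio.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n_obs = 400
treat = rng.integers(0, 2, n_obs).astype(float)
mu = np.exp(1.5 - 0.35 * treat)                      # true rate ratio ~ 0.70
k = 2.0                                              # NB size (dispersion alpha = 1/k)
episodes = rng.negative_binomial(k, k / (k + mu))    # overdispersed counts

X = sm.add_constant(treat)
res = sm.GLM(episodes, X, family=sm.families.NegativeBinomial(alpha=1 / k)).fit()
print(f"estimated rate ratio (treated vs control): {np.exp(res.params[1]):.3f}")
```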

19.
To compare their performance on high-dimensional data, several regression methods are applied to data sets in which the number of explanatory variables greatly exceeds the sample size. The methods are stepwise regression, principal components regression, two forms of latent root regression, partial least squares, and a new method developed here. The data are four sample sets for which near-infrared reflectance spectra have been determined, and the regression methods use the spectra to estimate the concentrations of various chemical constituents, the latter having been determined by standard chemical analysis. Thirty-two regression equations are estimated using each method and their performance is evaluated using validation data sets. Although it is the most widely used, stepwise regression was decidedly poorer than the other methods considered. Differences between the latter were small, with partial least squares performing slightly better than the other methods under all criteria examined, albeit not by a statistically significant amount.
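A partial least squares sketch of the best-performing method under similar p ≫ n conditions, with simulated ‘spectra’ and held-out validation.

```python
# PLS regression when predictors (wavelengths) vastly outnumber samples.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(14)
n, p = 60, 500                                       # far more wavelengths than samples
latent = rng.normal(size=(n, 3))                     # low-dimensional chemistry signal
X = latent @ rng.normal(size=(3, p)) + rng.normal(0, 0.1, (n, p))
y = latent @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
print("validation R^2:", round(pls.score(X_te, y_te), 3))
```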

20.
Errors-in-variables (EIV) regression is often used to gauge the linear relationship between two variables that both suffer from measurement and other errors, such as in the comparison of two measurement platforms (e.g., RNA sequencing vs. microarray). Scientists are often at a loss as to which EIV regression model to use, for there are infinitely many choices. We provide sound guidelines toward viable solutions to this dilemma by introducing two general nonparametric EIV regression frameworks: compound regression and constrained regression. It is shown that these approaches are equivalent to each other and to the general parametric structural modelling approach. The advantages of these methods lie in their intuitive geometric representations, their distribution-free nature, and their ability to offer candidate solutions with various optimal properties when the ratio of the error variances is unknown. Each includes the classic nonparametric regression methods of ordinary least squares, geometric mean regression (GMR) and orthogonal regression as special cases. Under these general frameworks, one can readily uncover some surprising optimal properties of the GMR and truly comprehend the benefit of data normalization. Supplementary materials for this article are available online.
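The three special cases named above, side by side on simulated EIV data in which the two error variances happen to be equal (so orthogonal regression is the consistent choice here):

```python
# OLS, geometric mean regression and orthogonal regression on data where
# both variables are measured with error.
import numpy as np

rng = np.random.default_rng(15)
true = rng.normal(0, 2, 500)
x = true + rng.normal(0, 0.8, 500)                   # both variables observed
y = 1.5 * true + rng.normal(0, 0.8, 500)             # with equal error variances

C = np.cov(x, y)
sxx, syy, sxy = C[0, 0], C[1, 1], C[0, 1]

b_ols = sxy / sxx                                    # attenuated toward zero
b_gmr = np.sign(sxy) * np.sqrt(syy / sxx)            # geometric mean regression
b_orth = ((syy - sxx) + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
print(f"OLS={b_ols:.3f}  GMR={b_gmr:.3f}  orthogonal={b_orth:.3f}  (true slope 1.5)")
```

OLS shrinks toward zero because it ignores the error in x, while GMR and orthogonal regression split the discrepancy between the two variables in different ways.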
