Similar literature
20 similar records found (search time: 199 ms)
1.
Art auction catalogs provide a pre-sale prediction interval for the price each item is expected to fetch. When the owner consigns art work to the auction house, a reserve price is agreed upon, which is not announced to the bidders. If the highest bid does not reach it, the item is bought in. Since only the prices of the sold items are published, analysts have only a biased sample to examine due to the selective sale process. Relying on the published data leads to underestimating the forecast error of the pre-sale estimates. However, we were able to obtain several art auction catalogs with the highest bids for the unsold items as well as those of the sold items. With these data we were able to evaluate the accuracy of the predictions of the sale prices or highest bids for all items, obtained from the original Heckman selection model that assumed normal error distributions as well as from an alternative model using the t(2) distribution, which yielded a noticeably better fit to several sets of auction data. The measures of prediction accuracy are of more than academic interest, as they are used by auction participants to guide their bidding or selling strategies, and similar appraisals are accepted by the US Internal Revenue Service to justify the deductions for charitable contributions that donors claim on their tax returns.
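The selection effect described above is easy to illustrate: if only items whose highest bid clears the reserve are published, the published bids overstate the typical bid. A minimal simulation sketch (all numbers hypothetical, not taken from the auction data in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate log highest bids around the pre-sale estimate (hypothetical numbers).
n = 10_000
bids = rng.normal(10.0, 1.0, n)       # highest bid for each item, log scale
reserve = 10.0                        # secret reserve price (log scale)

sold = bids >= reserve                # an item sells only if the bid meets the reserve

full_sample_mean = bids.mean()        # what an analyst sees with unsold bids included
sold_only_mean = bids[sold].mean()    # what the published (sold-only) data give

print(round(full_sample_mean, 2), round(sold_only_mean, 2))
```

The sold-only mean is noticeably above the full-sample mean, which is why error measures computed from published prices alone understate the true forecast error.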

2.
This paper presents a novel ensemble classifier generation method by integrating the ideas of bootstrap aggregation and Principal Component Analysis (PCA). To create each individual member of an ensemble classifier, PCA is applied to every out-of-bag sample and the computed coefficients of all principal components are stored, and then the principal components calculated on the corresponding bootstrap sample are taken as additional elements of the original feature set. A classifier is trained with the bootstrap sample and some features randomly selected from the new feature set. The final ensemble classifier is constructed by majority voting of the trained base classifiers. The results obtained by empirical experiments and statistical tests demonstrate that the proposed method performs better than or as well as several other ensemble methods on some benchmark data sets publicly available from the UCI repository. Furthermore, the diversity-accuracy patterns of the ensemble classifiers are investigated by kappa-error diagrams.
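The feature-augmentation step at the core of the method can be sketched as follows; the toy data and dimensions are hypothetical, and PCA is computed directly via an SVD rather than any particular library routine:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))         # toy feature matrix (hypothetical data)

# One bootstrap replicate and its out-of-bag (OOB) complement.
idx = rng.integers(0, len(X), len(X))
oob_mask = np.ones(len(X), bool)
oob_mask[idx] = False
X_boot, X_oob = X[idx], X[oob_mask]

# PCA coefficients are computed on the OOB sample ...
X_oob_c = X_oob - X_oob.mean(axis=0)
_, _, vt = np.linalg.svd(X_oob_c, full_matrices=False)  # rows of vt = PC loadings

# ... then the bootstrap sample is projected onto them, and the scores are
# appended to the original features before training a base classifier.
scores = (X_boot - X_oob.mean(axis=0)) @ vt.T
X_aug = np.hstack([X_boot, scores])

print(X_aug.shape)   # 5 original features + 5 principal-component scores
```

Each ensemble member would then be trained on a random subset of these augmented features, and predictions combined by majority vote.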

3.
We consider data with a continuous outcome that is missing at random and a fully observed set of covariates. We compare by simulation a variety of doubly-robust (DR) estimators for estimating the mean of the outcome. An estimator is DR if it is consistent when either the regression model for the mean function or the propensity to respond is correctly specified. Performance of different methods is compared in terms of root mean squared error of the estimates and width and coverage of confidence intervals or posterior credibility intervals in repeated samples. Overall, the DR methods tended to yield better inference than the incorrect model when either the propensity or mean model is correctly specified, but were less successful for small sample sizes, where the asymptotic DR property is less consequential. Two methods tended to outperform the other DR methods: penalized spline of propensity prediction [Little RJA, An H. Robust likelihood-based analysis of multivariate data with missing values. Statist Sinica. 2004;14:949–968] and the robust method proposed in [Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96:723–734].
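For intuition, here is a minimal sketch of one standard DR construction, the augmented inverse-probability-weighted (AIPW) estimator, on simulated MAR data. For simplicity the response propensity is taken as known; that is an assumption of this illustration, not of the estimators compared in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)        # outcome model; true mean of y is 1.0
e = 1.0 / (1.0 + np.exp(-(0.5 + x)))          # response propensity (known here)
r = rng.random(n) < e                         # response indicator (MAR given x)

# Outcome regression fitted on responders only.
A = np.column_stack([np.ones(r.sum()), x[r]])
beta, *_ = np.linalg.lstsq(A, y[r], rcond=None)
m = beta[0] + beta[1] * x                     # predicted mean for every unit

# Augmented inverse-probability-weighted (doubly robust) estimator:
# regression prediction plus an inverse-probability-weighted residual correction.
mu_dr = np.mean(m + r * (y - m) / e)

print(round(mu_dr, 2))
```

The estimator stays consistent if either the regression `m` or the propensity `e` is correctly specified, which is the DR property the simulations examine.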

4.
Summary.  Regression, matching, control function and instrumental variables methods for recovering the effect of education on individual earnings are reviewed for single treatments and sequential multiple treatments with and without heterogeneous returns. The sensitivity of the estimates once applied to a common data set is then explored. We show the importance of correcting for detailed test score and family background differences and of allowing for (observable) heterogeneity in returns. We find an average return of 27% for those completing higher education versus anything less. Compared with stopping at 16 years of age without qualifications, we find an average return to O-levels of 18%, to A-levels of 24% and to higher education of 48%.

5.
ABSTRACT

Consider the problem of estimating the positions of a set of targets in a multidimensional Euclidean space from distances reported by a number of observers when the observers do not know their own positions in the space. Each observer reports the distance from the observer to each target plus a random error. This statistical problem is the basic model for the various forms of what is called multidimensional unfolding in the psychometric literature. Multidimensional unfolding methodology as developed in cognitive psychology is essentially a statistical estimation problem in which the data are a set of measures that are monotonic functions of the Euclidean distances between a number of observers and targets in a multidimensional space. The new method presented in this article estimates the target locations and the observer positions when the observations are functions of the squared distances between observers and targets, observed with an additive random error, in a two-dimensional space. The method provides robust estimates of the target locations in a multidimensional space for the parametric structure of the data generating model presented in the article. The method also yields estimates of the orientation of the coordinate system and the mean and variances of the observer locations. The mean and variances are not estimated by standard unfolding methods, which yield target maps that are invariant to a rotation of the coordinate system. The data are transformed so that the nonlinearity due to the squared observer locations is removed. The sampling properties of the estimates are derived from the asymptotic variances of the additive errors of a maximum likelihood factor analysis of the sample covariance matrix of the transformed data, augmented with bootstrapping. The robustness of the new method is tested using artificial data. The method is applied to a 2001 survey data set from Turkey to provide a real data example.

6.
Various approaches to obtaining estimates based on preliminary data are outlined. A case is then considered which frequently arises when selecting a subsample of units, the information for which is collected within a deadline that allows preliminary estimates to be produced. At the moment when these estimates have to be produced, it often occurs that, although the collection of data on subsample units is still not complete, information is available on a set of units which does not belong to the sample selected for the production of the preliminary estimates. An estimation method is proposed which allows all the data available on a given date to be used in full, and the expressions of the expectation and variance are derived. The proposal is based on two-phase sampling theory and on the hypothesis that the response mechanism is the result of random processes whose parameters can be suitably estimated. An empirical analysis of the performance of the estimator on the Italian Survey on building permits concludes the work. Sections 1–4 and the technical appendices were developed by Giorgio Alleva and Piero Demetrio Falorsi; Section 5 was developed by Fabio Bacchini and Roberto Iannaccone. Piero Demetrio Falorsi is chief statistician at the Italian National Institute of Statistics (ISTAT); Giorgio Alleva is Professor of Statistics at University “La Sapienza” of Rome; Fabio Bacchini and Roberto Iannaccone are researchers at ISTAT.

7.
The two-part model and Heckman's sample selection model are often used in economic studies which involve analyzing the demand for limited dependent variables. This study proposed a simultaneous equation model (SEM) and used the expectation-maximization algorithm to obtain the maximum likelihood estimate. We then constructed a simulation to compare the performance of estimates of price elasticity using SEM with those from the two-part model and the sample selection model. The simulation shows that the estimates of price elasticity by SEM are more precise than those by the sample selection model and the two-part model when the model includes limited dependent variables. Finally, we analyzed a real example of cigarette consumption as an application. We found that an increase in cigarette price was associated with a decrease in both the propensity to consume cigarettes and the amount actually consumed.
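The two-part model rests on the decomposition E[Y] = P(Y > 0) · E[Y | Y > 0], estimating the participation part and the consumption part separately. A minimal sketch with hypothetical consumption data; note that the two estimated parts multiply back to the sample mean exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical consumption data: many exact zeros, positive amounts otherwise.
smoker = rng.random(1000) < 0.3
y = np.where(smoker, rng.gamma(2.0, 5.0, 1000), 0.0)

# Part 1: propensity to consume.  Part 2: amount among consumers.
p_hat = (y > 0).mean()
mean_pos = y[y > 0].mean()

# The two parts recombine into the unconditional mean (an exact identity).
two_part_mean = p_hat * mean_pos
print(np.isclose(two_part_mean, y.mean()))
```

In a regression setting each part would be modeled on covariates (e.g. price), which is where the two-part, sample selection, and SEM approaches diverge.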

8.
A new method for estimating a set of odds ratios under an order restriction, based on estimating equations, is proposed. The method is applied to the conditional maximum likelihood estimators and the Mantel-Haenszel estimators. The estimators derived from the conditional likelihood estimating equations are shown to maximize the conditional likelihoods. It is also shown that the restricted estimators converge almost surely to the respective odds ratios as the sample sizes grow large. The restricted estimators are compared with the unrestricted maximum likelihood estimators by a Monte Carlo simulation. The simulation studies show that the restricted estimates improve the mean squared errors remarkably, while the Mantel-Haenszel type estimates are competitive with the conditional maximum likelihood estimates, being only slightly worse.
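As a point of reference, the unrestricted Mantel-Haenszel common odds-ratio estimator mentioned above has a simple closed form over stratified 2×2 tables; the counts below are hypothetical:

```python
import numpy as np

# Stratified 2x2 tables, one row of (a, b, c, d) per stratum (hypothetical counts),
# where each table is [[a, b], [c, d]] and n = a + b + c + d.
tables = np.array([
    [10,  5,  4, 11],
    [20, 10,  8, 22],
])

a, b, c, d = tables.T
n = tables.sum(axis=1)

# Mantel-Haenszel common odds-ratio estimator: sum(a*d/n) / sum(b*c/n).
or_mh = np.sum(a * d / n) / np.sum(b * c / n)
print(round(or_mh, 3))   # → 5.5 (both strata have odds ratio 5.5 here)
```

The paper's contribution is to constrain a set of such estimators to satisfy an order restriction via estimating equations, rather than computing each one freely as above.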

9.
The annual 5% increase in tobacco taxes in real terms proposed in the recent White Paper on smoking has reaffirmed the commitment of successive UK Governments to above-inflation increases in tobacco taxation to encourage people to stop smoking. This paper presents evidence on the determinants of starting and quitting smoking by using data from the British Health and Lifestyle Survey and is the first to identify tax elasticities for starting and quitting smoking using British data. Self-reported individual smoking histories are coupled with a long time series for the tax rate on cigarettes to construct a longitudinal data set. Estimates are obtained for the effect of above-inflation tax rises on the age of starting smoking and the number of years of smoking. The estimates of the tax elasticity of the age of starting smoking are 0.16 for men and 0.08 for women. The estimates of the tax elasticity of quitting are −0.60 for men and −0.46 for women. These are robust to different specifications.

10.
In a previous paper Gastwirth shows that a broad family of measures of inequality can be accurately estimated when the tax data are known in groups (more precisely, when we know the number of returns in each of several class intervals and their corresponding average income). In the present paper we show that some measures of the preceding family can be unbiasedly estimated when the tax data are individually known for a sample from the population. Specifically, we construct unbiased estimators of a particular measure of inequality under sampling with and without replacement, and under stratified sampling with and without replacement.
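As a concrete instance of estimating an inequality measure from individually known data, the Gini mean difference has a simple unbiased pair-based estimator under i.i.d. sampling; this is an illustrative example, not necessarily the specific measure treated in the paper:

```python
from itertools import combinations

# Individually observed incomes from a simple random sample (hypothetical values).
incomes = [12.0, 18.0, 25.0, 45.0]
n = len(incomes)

# Gini mean difference E|X1 - X2|: averaging |x_i - x_j| over all unordered pairs
# gives an unbiased estimator, illustrating estimation from ungrouped sample data.
gmd = sum(abs(x - y) for x, y in combinations(incomes, 2)) * 2 / (n * (n - 1))
print(gmd)
```

Each pair (i, j) contributes an unbiased estimate of E|X1 − X2|, so the average over pairs is unbiased as well; the sampling designs in the paper (with/without replacement, stratified) modify the weighting of such terms.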

11.
Within the context of the period fixed-effects model, this study uses a 2002–2009 state-level panel data set of the USA to investigate the relative impact of state cigarette excise taxation across the nation in reducing cigarette smoking. In particular, by focusing upon the state cigarette excise taxation levels within each of the nine US Census Divisions, this study investigates whether there are inter-regional differences in the rate of responsiveness of cigarette consumption to increased state cigarette taxes. The initial empirical estimates reveal that although the per capita number of packs of cigarettes smoked annually is a decreasing function of the state cigarette excise tax in all nine Census Divisions, the relative response of cigarette smoking to state cigarette tax increases varies considerably from one division to the next. Reinforcing this conclusion, in one specification of the model, the number of packs of cigarettes smoked in response to a higher state cigarette tax is statistically significant and negative in only eight of the nine Census Divisions. Furthermore, when cigarette smoking is measured in terms of the percentage of the population classified as smokers, inter-regional differentials in the response of smokers to higher state cigarette taxes are much greater. Thus, there is evidence that cigarette excise taxation exercises rather different impacts on the propensity to smoke across Census Divisions.

12.
In this paper, order statistics from independent and non-identically distributed random variables are used to obtain an ordered ranked set sample (ORSS). Bayesian inference for the unknown parameters of the Pareto distribution is carried out under a squared error loss function. We compute the minimum posterior expected loss (the posterior risk) of the derived estimates and compare them with those based on the corresponding simple random sample (SRS) to assess the efficiency of the obtained estimates. Two-sample Bayesian prediction for future observations is introduced using SRS and ORSS for one and m cycles. A simulation study and a real data example are presented to illustrate the proposed results.
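The ranked set sampling idea underlying ORSS can be sketched as follows: draw m sets of m units and keep only the i-th order statistic of the i-th set. A quick Monte Carlo check on hypothetical normal data shows the RSS sample mean is less variable than an SRS mean of the same size:

```python
import numpy as np

rng = np.random.default_rng(4)
m, reps = 5, 4000

def srs_mean():
    # Simple random sample of size m.
    return rng.normal(size=m).mean()

def rss_mean():
    # Ranked set sampling: draw m sets of size m, keep the i-th order
    # statistic of the i-th set, then average the m retained values.
    sets = np.sort(rng.normal(size=(m, m)), axis=1)
    return sets[np.arange(m), np.arange(m)].mean()

srs_vars = np.var([srs_mean() for _ in range(reps)])
rss_vars = np.var([rss_mean() for _ in range(reps)])
print(rss_vars < srs_vars)
```

This efficiency gain is what the posterior-risk comparison between SRS- and ORSS-based Bayesian estimates exploits.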

13.
This paper is concerned with model averaging procedure for varying-coefficient partially linear models with missing responses. The profile least-squares estimation process and inverse probability weighted method are employed to estimate regression coefficients of the partially restricted models, in which the propensity score is estimated by the covariate balancing propensity score method. The estimators of the linear parameters are shown to be asymptotically normal. Then we develop the focused information criterion, formulate the frequentist model averaging estimators and construct the corresponding confidence intervals. Some simulation studies are conducted to examine the finite sample performance of the proposed methods. We find that the covariate balancing propensity score improves the performance of the inverse probability weighted estimator. We also demonstrate the superiority of the proposed model averaging estimators over those of existing strategies in terms of mean squared error and coverage probability. Finally, our approach is further applied to a real data example.

14.
We discuss the use of latent variable models with observed covariates for computing response propensities for sample respondents. A response propensity score is often used to weight item and unit responders to account for item and unit non-response and to obtain adjusted means and proportions. In the context of attitude scaling, we discuss computing response propensity scores by using latent variable models for binary or nominal polytomous manifest items with covariates. Our models allow the response propensity scores to be found for several different items without refitting. They allow any pattern of missing responses for the items. If one prefers, it is possible to estimate population proportions directly from the latent variable models, so avoiding the use of propensity scores. Artificial data sets and a real data set extracted from the 1996 British Social Attitudes Survey are used to compare the various methods proposed.

15.
Regression procedures are not only hindered by large p and small n, but can also suffer when outliers are present or the data generating mechanisms are heavy tailed. Since penalized estimates like the least absolute shrinkage and selection operator (LASSO) are equipped to deal with the large-p, small-n setting by encouraging sparsity, we combine a LASSO-type penalty with the absolute deviation loss function, instead of the standard least squares loss, to handle the presence of outliers and heavy tails. The model is cast in a Bayesian setting and a Gibbs sampler is derived to efficiently sample from the posterior distribution. We compare our method to existing methods in a simulation study as well as on a prostate cancer data set and a base deficit data set from trauma patients.
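The paper's estimator is a Gibbs sampler; as a rough stand-in for intuition, the same LAD-plus-L1 objective can be minimized by a simple proximal subgradient loop. Everything below (data, penalty level, step size) is a hypothetical sketch, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_t(df=2, size=n)   # heavy-tailed errors

# Proximal subgradient descent on the LAD-lasso objective
#   sum_i |y_i - x_i' beta| + lam * ||beta||_1
lam, lr = 5.0, 0.002
beta = np.zeros(p)
for _ in range(3000):
    g = -X.T @ np.sign(y - X @ beta)               # subgradient of the LAD loss
    beta = beta - lr * g                           # gradient step
    beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)  # soft threshold

print(np.round(beta, 1))
```

The absolute deviation loss keeps the t(2) outliers from dominating the fit, while the soft-thresholding step drives the irrelevant coefficients toward zero, which is the combination the Bayesian formulation samples from.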

16.
ABSTRACT

The local linear estimator is a popular method for estimating non-parametric regression functions, and many methods have been derived to estimate the smoothing parameter, or the bandwidth in this case. In this article, we propose an information criterion-based bandwidth selection method, with the degrees of freedom originally derived for non-parametric inference. Unlike the plug-in method, the new method does not require preliminary parameters to be chosen in advance, and it is computationally efficient compared to the cross-validation (CV) method. A numerical study shows that the new method performs better than or comparably to the existing plug-in and CV methods in terms of the estimation of the mean functions, and has lower variability than CV selectors. Real data applications are also provided to illustrate the effectiveness of the new method.
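For reference, here is a minimal local linear estimator with a leave-one-out CV bandwidth selector, the baseline the proposed information criterion competes with; the data, kernel, and bandwidth grid are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 150
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

def local_linear(x0, xs, ys, h):
    # Weighted least squares with a Gaussian kernel, linear in (x - x0);
    # the intercept of the local fit is the estimate of the mean at x0.
    w = np.exp(-0.5 * ((xs - x0) / h) ** 2)
    A = np.column_stack([np.ones_like(xs), xs - x0])
    coef = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * ys))
    return coef[0]

def loocv_score(h):
    # Leave-one-out CV error for bandwidth h (the costlier selector the
    # paper's information criterion is designed to avoid).
    errs = [(y[i] - local_linear(x[i], np.delete(x, i), np.delete(y, i), h)) ** 2
            for i in range(n)]
    return float(np.mean(errs))

grid = [0.02, 0.05, 0.1, 0.2, 0.4]
h_best = min(grid, key=loocv_score)
print(h_best)
```

The CV loop refits n times per candidate bandwidth; a degrees-of-freedom-based criterion needs only one fit per bandwidth, which is the computational advantage claimed above.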

17.
This article proposes semiparametric generalized least-squares estimation of parametric restrictions between the conditional mean and the conditional variance of excess returns given a set of parametric factors. A distinctive feature of our estimator is that it does not require a fully parametric model for the conditional mean and variance. We establish consistency and asymptotic normality of the estimates. The theory is nonstandard due to the presence of estimated factors. We provide sufficient conditions for the estimated factors not to have an impact on the asymptotic standard errors of the estimators. A simulation study investigates the finite sample performance of the estimates. Finally, an application to the CRSP value-weighted excess returns highlights the merits of our approach. In contrast to most previous studies using nonparametric estimates, we find a positive and significant price of risk in our semiparametric setting.

18.
Abstract.  We develop a variance reduction method for smoothing splines. For a given point of estimation, we define a variance-reduced spline estimate as a linear combination of classical spline estimates at three nearby points. We first develop a variance reduction method for spline estimators in univariate regression models. We then develop an analogous variance reduction method for spline estimators in clustered/longitudinal models. Simulation studies are performed which demonstrate the efficacy of our variance reduction methods in finite sample settings. Finally, a real data analysis with the motorcycle data set is performed. Here we consider variance estimation and generate 95% pointwise confidence intervals for the unknown regression function.

19.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have recently been used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study based on the cancer-neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods, the Peter-Clark algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket of the variable to be imputed; and finally, our new proposed method, which applies the whole available knowledge of all variables of interest, comprising the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new proposed method, especially in the presence of high missingness percentages and more connected networks. The new method has also been shown to be more efficient than the MICE method for small sample sizes with high missing rates.

20.
In multiple imputation (MI), the resulting estimates are consistent if the imputation model is correct. To specify the imputation model, it is recommended to combine two sets of variables: those that are related to the incomplete variable and those that are related to the missingness mechanism. Several possibilities exist, but it is not clear how they perform in practice. We present the method that simply groups all variables together into the imputation model, along with four other methods based on propensity scores. Two of them are new and have not been used in the context of MI. The performance of the methods is investigated by a simulation study under different missing at random mechanisms for different types of variables. We conclude that all methods, except for one method based on the propensity scores, perform well. It turns out that as long as the relevant variables are included in the imputation model, the form of the imputation model has only a minor effect on the quality of the imputations.
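The "group all variables together" strategy can be sketched as a simple regression imputation whose model includes both the predictors of the incomplete variable and the predictors of missingness; everything below is a hypothetical illustration, not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000

x1 = rng.normal(size=n)                       # related to the incomplete variable
x2 = rng.normal(size=n)                       # related to the missingness mechanism
y = 1.0 + 1.5 * x1 + rng.normal(size=n)
miss = rng.random(n) < 1 / (1 + np.exp(-x2))  # MAR: missingness driven by x2

y_obs = y.copy()
y_obs[miss] = np.nan

# Imputation model grouping both variable sets together: regress y on x1 AND x2
# among the complete cases.
obs = ~miss
A = np.column_stack([np.ones(obs.sum()), x1[obs], x2[obs]])
beta, *_ = np.linalg.lstsq(A, y[obs], rcond=None)

# Impute by drawing from the fitted model (residual sd of 1 assumed known here
# for simplicity; MI software would estimate it and repeat the draw M times).
y_imp = y_obs.copy()
A_mis = np.column_stack([np.ones(miss.sum()), x1[miss], x2[miss]])
y_imp[miss] = A_mis @ beta + rng.normal(size=miss.sum())

print(round(y_imp.mean(), 2), round(y.mean(), 2))
```

The propensity-score-based alternatives examined in the paper replace the raw missingness predictors (here `x2`) with an estimated response propensity; the conclusion above is that, with the relevant variables included, the exact form matters little.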


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号