In this article, we estimate bounds for the expected value of the stochastic Divisia price index; that is, we assume that the prices and quantities of the given commodities are continuous-time stochastic processes. We consider a special case of the stochastic model in which prices and quantities follow geometric Brownian motion. It is shown that the precision of this estimation depends more on the volatility of prices than on the volatility of quantities.

Five sampling schemes (SS) for price index construction – one cut-off sampling technique and four probability-proportional-to-size (pps) methods – are evaluated by comparing their performance on a homescan market research data set across 21 months for each of the 13 Classification of Individual Consumption by Purpose (COICOP) food groups. Classifications are derived for each of the food groups, and the population index value is used as a reference to derive performance error measures, such as root mean squared error, bias and standard deviation, for each food type. Repeated samples are taken for each of the pps schemes, and the resulting performance error measures are analysed using regression for three of the pps schemes to assess the overall effect of SS and COICOP group whilst controlling for sample size, month and population index value. Cut-off sampling appears to perform less well than the pps methods, and multistage pps seems to have no advantage over its single-stage counterpart. The jackknife resampling technique is also explored as a means of estimating the standard error of the index and is compared with the actual results from repeated sampling.
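
The jackknife idea mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which index formula is used, so the geometric mean of price relatives (`jevons_index`) below is an assumption chosen only for illustration, as are all function names and the sample data.

```python
import math
from typing import Callable, Sequence


def jevons_index(price_relatives: Sequence[float]) -> float:
    """Geometric mean of price relatives (an illustrative index formula)."""
    logs = [math.log(r) for r in price_relatives]
    return math.exp(sum(logs) / len(logs))


def jackknife_se(sample: Sequence[float],
                 statistic: Callable[[Sequence[float]], float]) -> float:
    """Delete-one jackknife standard error of `statistic` on `sample`."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    values = list(sample)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    spread = sum((v - centre) ** 2 for v in leave_one_out)
    return math.sqrt((n - 1) / n * spread)


# Hypothetical price relatives for one sampled set of products.
relatives = [1.02, 0.98, 1.10, 1.05, 0.95]
index_value = jevons_index(relatives)
index_se = jackknife_se(relatives, jevons_index)
```

For the sample mean, this estimator reduces to the familiar s/sqrt(n), which gives a quick sanity check on the implementation.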

This paper presents an empirical analysis of the stochastic features of volatility in the Japanese stock price index (TOPIX), using high-frequency data sampled every 5 min. The TOPIX process is modeled by a stochastic differential equation with time-homogeneous drift and diffusion coefficients. To avoid the risk of misspecifying the volatility function, which is defined as the squared diffusion coefficient, a local polynomial model is applied to the data to produce estimates of the volatility function together with their confidence intervals. The estimation results suggest that the volatility function shows similar patterns in one period but changes drastically in another.

Functional logistic regression is becoming more popular, as there are many situations in which we are interested in the relation between functional covariates (as input) and a binary response (as output). Several approaches have been advocated, and this paper goes into detail about three of them: dimension reduction via functional principal component analysis, penalized functional regression, and wavelet expansions in combination with Least Absolute Shrinkage and Selection Operator (LASSO) penalization. We discuss the performance of the three methods on simulated data and also apply the methods to data regarding lameness detection for horses. Emphasis is on classification performance, but we also discuss estimation of the unknown parameter function.

In this paper, the use of three kernel-based nonparametric forecasting methods (the conditional mean, the conditional median, and the conditional mode) is explored in detail. Several issues related to the estimation of these methods are discussed, including the choice of the bandwidth and the type of kernel function. The out-of-sample forecasting performance of the three nonparametric methods is investigated using 60 real time series. We find that no forecast method is superior for series with fewer than approximately 100 observations. However, when a time series is long or when its conditional density is bimodal, the forecasting performance of the three kernel-based methods differs considerably.
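
As a rough illustration of the first of the three methods named in the abstract above, the sketch below computes a kernel (Nadaraya-Watson) conditional-mean forecast one step ahead. It conditions on the single most recent value and uses a Gaussian kernel with a fixed bandwidth; the lag order, kernel, bandwidth and function names are all assumptions and are not taken from the paper.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def conditional_mean_forecast(series: Sequence[float], bandwidth: float) -> float:
    """One-step-ahead forecast E[y_{t+1} | y_t = last value] by kernel smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    last = series[-1]
    predictors = series[:-1]   # y_t
    responses = series[1:]     # y_{t+1}
    # Weight each historical pair by how close its predictor is to the last value.
    weights = [gaussian_kernel((x - last) / bandwidth) for x in predictors]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, responses)) / total
```

The conditional median and mode would replace the weighted average with a weighted median or with the maximiser of a kernel estimate of the conditional density, respectively. As the bandwidth grows, the forecast above approaches the plain average of the responses.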

This paper presents a method of estimation of crop-production statistics at smaller geographical levels like a community development block (generally referred to as a block) to make area-specific plans for agricultural development programmes in India. Using available district-level data on crop yield from crop-cutting experiments and data on auxiliary variables from various administrative sources, a suitable regression model is fitted. The fitted model is then used to predict the crop production at the block level. Some scaled estimators are also developed using predicted estimates. An empirical study is also carried out to judge the merits of the proposed estimators.   

In recent years randomized response methods have been introduced in an attempt to improve the accuracy and honesty in personalized response surveys of very sensitive questions. Two randomized response methods are compared, taking into account the protection afforded the respondent. In addition, we point out that the estimators, which previous authors have claimed to be the maximum likelihood estimators of the population proportion with the sensitive characteristic, are in fact not the maximum likelihood estimators.   
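
The abstract above does not name the two randomized response methods it compares. As an illustration only, the sketch below uses Warner's classical design, in which each respondent answers the sensitive statement with probability `p_design` and its negation otherwise. It shows one well-known way an estimator can fail to be the maximum likelihood estimator: the unrestricted moment estimator can fall outside [0, 1], whereas a likelihood maximiser over the parameter space cannot. Whether this is the argument made in the paper is an assumption.

```python
def warner_moment_estimate(yes_count: int, n: int, p_design: float) -> float:
    """Unrestricted estimator of the sensitive proportion under Warner's design.

    P(yes) = p_design * pi + (1 - p_design) * (1 - pi), solved for pi.
    """
    if n <= 0 or not 0 <= yes_count <= n:
        raise ValueError("need 0 <= yes_count <= n and n > 0")
    if not 0.0 < p_design < 1.0 or p_design == 0.5:
        raise ValueError("p_design must lie in (0, 1) and differ from 0.5")
    yes_rate = yes_count / n
    return (yes_rate - (1.0 - p_design)) / (2.0 * p_design - 1.0)


def warner_mle(yes_count: int, n: int, p_design: float) -> float:
    """Maximum likelihood estimate restricted to the parameter space [0, 1].

    The binomial log-likelihood is concave in P(yes), and P(yes) is linear
    in pi, so the constrained maximiser is the unrestricted one clipped to [0, 1].
    """
    raw = warner_moment_estimate(yes_count, n, p_design)
    return min(1.0, max(0.0, raw))
```

With `p_design = 0.7`, `n = 100` and only 20 "yes" answers, the unrestricted estimate is negative while the restricted estimate is 0, so the two estimators differ.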

We consider the analysis of complex stochastic models based upon partial information. MCMC and reversible jump MCMC are often the methods of choice for such problems, but in some situations they can be difficult to implement and can suffer from problems such as poor mixing and difficulty in diagnosing convergence. Here we review three alternatives to MCMC methods: importance sampling, the forward-backward algorithm, and sequential Monte Carlo (SMC). We discuss how to design good proposal densities for importance sampling, show some of the range of models to which the forward-backward algorithm can be applied, and show how resampling ideas from SMC can be used to improve the efficiency of the other two methods. We demonstrate these methods on a range of examples, including estimating the transition density of a diffusion and of a discrete-state continuous-time Markov chain; inferring structure in population genetics; and segmenting genetic divergence data.

We show in detail how the Swendsen-Wang algorithm for simulating Potts models may be used to simulate certain types of posterior Gibbs distribution, as a special case of Edwards and Sokal (1988), and we empirically compare the behaviour of the algorithm with that of the Gibbs sampler. Some marginal posterior mode and simulated annealing image restorations are also examined. Our results demonstrate the importance of the starting configuration. If this is inappropriate, the Swendsen-Wang method can suffer from critical slowing in moderately noise-free situations where the Gibbs sampler converges very fast, whereas the reverse is true when the noise level is high.

The poor performance of the Wald method for constructing confidence intervals (CIs) for a binomial proportion has been demonstrated in a vast literature. The related problem of sample size determination needs to be updated, and comparative studies are essential to understanding the performance of alternative methods. In this paper, the sample size is obtained for the Clopper–Pearson, Bayesian (Uniform and Jeffreys priors), Wilson, Agresti–Coull, Anscombe, and Wald methods. Two two-step procedures are used: one based on the expected length (EL) of the CI and another based on its first-order approximation. In the first step, all possible solutions that satisfy the optimality criterion are obtained. In the second step, a single solution is proposed according to a new criterion (e.g. highest coverage probability (CP)). In practice, a reduction in sample size is to be expected; we therefore explore the behavior of the methods under losses of 30% and 50%. For all the methods, the ELs are inflated, as expected, but the coverage probabilities remain close to the original target (with few exceptions). It is not easy to suggest a method that is optimal throughout the range (0, 1) for p. Different recommendations are made depending on whether the goal is to achieve a CP approximately at, or above, the nominal level.
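
To make the expected-length criterion in the abstract above concrete, the following sketch computes the exact expected length of the Wald and Wilson intervals for a given n and p, and searches for the first n whose expected length meets a target. It covers only two of the seven methods, uses a fixed 95% level, and implements a simple first-hit search rather than the paper's two-step procedures; all function names are hypothetical.

```python
import math
from typing import Callable, Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

Interval = Callable[[int, int, float], Tuple[float, float]]


def wald_interval(x: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(x: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    p_hat = x / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


def expected_length(interval: Interval, n: int, p: float, z: float = Z_95) -> float:
    """Exact expected CI length: sum over x of Binomial(n, p) pmf times length."""
    total = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)
        lower, upper = interval(x, n, z)
        total += prob * (upper - lower)
    return total


def first_n_meeting_length(interval: Interval, p: float, max_length: float,
                           n_max: int = 2000) -> int:
    """First n in 1..n_max whose expected length is at most max_length."""
    for n in range(1, n_max + 1):
        if expected_length(interval, n, p) <= max_length:
            return n
    raise ValueError("no n up to n_max meets the target length")
```

A caveat on the design: the Wald interval has zero length at x = 0 and x = n, so its expected length at n = 1 is exactly 0 and a first-hit search is meaningless for it. This is one reason a second selection step, such as one based on coverage probability, is useful.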

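Huang Jing (黄静), Tu Meizeng (屠梅曾). Statistical Research (《统计研究》), 2009, 26(7): 13-19
Using quarterly panel data for 29 large and medium-sized Chinese cities over 1999-2008 and recently developed non-stationary panel econometric methods, this paper empirically analyses the long-run equilibrium relationship and the Granger causality between urban house prices and land prices in China, overcoming the low efficiency caused by small samples and the neglect of differences between cities in earlier studies. The main conclusions are as follows: (1) in the economically more developed eastern cities and the south-western provincial capitals, the long-run effect of land prices on house prices is larger than in the provincial capitals of the central region, whereas the long-run effect of house prices on land prices is larger in the central provincial capitals than in the eastern and south-western provincial capitals; (2) overall, the long-run effect of house prices on land prices is stronger than that of land prices on house prices; (3) in the long run, house prices and land prices in each city Granger-cause each other, while in the short run house prices Granger-cause land prices.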

A method for the national assessment of the biological quality of river sites is developed. Multivariate discrimination, based on site environmental characteristics, is used on a biological classification of reference sites to derive a procedure to predict the fauna to be expected in the absence of environmental stress. Various quality indices, based on a comparison of the observed with the expected fauna, are proposed. The sizes of the various sources of error and variation, and their effects on the rates of misclassification to quality bands, are examined.   

Principal component analysis (PCA) is a widely used statistical technique for determining subscales in questionnaire data. As with any other statistical technique, missing data may complicate both its execution and its interpretation. In this study, six methods for dealing with missing data in the context of PCA are reviewed and compared: listwise deletion (LD), pairwise deletion, the missing data passive approach, regularized PCA, the expectation-maximization algorithm, and multiple imputation. Simulations show that, except for LD, all methods give about equally good results for realistic percentages of missing data. Therefore, the choice of a procedure can be based on ease of application or simply on the availability of a technique.

In an epidemiological study the regression slope between a response and predictor variable is underestimated when the predictor variable is measured imprecisely. Repeat measurements of the predictor in individuals in a subset of the study or in a separate study can be used to estimate a multiplicative factor to correct for this 'regression dilution bias'. In applied statistics publications various methods have been used to estimate this correction factor. Here we compare six different estimation methods and explain how they fall into two categories, namely regression and correlation-based methods. We provide new asymptotic variance formulae for the optimal correction factors in each category, when these are estimated from the repeat measurements subset alone, and show analytically and by simulation that the correlation method of choice gives uniformly lower variance. The simulations also show that, when the correction factor is not much greater than 1, this correlation method gives a correction factor which is closer to the true value than that from the best regression method on up to 80% of occasions. We also provide a variance formula for a modified correlation method which uses the standard deviation of the predictor variable in the main study; this shows further improved performance provided that the correction factor is not too extreme. A confidence interval for a corrected regression slope in an epidemiological study should reflect the imprecision of both the uncorrected slope and the estimated correction factor. We provide formulae for this and show that, particularly when the correction factor is large and the size of the subset of repeat measures is small, the effect of allowing for imprecision in the estimated correction factor can be substantial.   
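
The two categories of estimator described in the abstract above can be sketched in their simplest forms. The abstract does not spell out its six methods, so the two functions below are generic representatives only: a regression-based factor (the reciprocal of the slope of the repeat measurement on the first measurement) and a correlation-based factor (the reciprocal of the Pearson correlation between the two). The function names and the toy data are assumptions.

```python
import math
from typing import Sequence, Tuple


def _centred_sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Sxx, Syy, Sxy), the centred sums of squares and cross-products."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy


def regression_correction_factor(first: Sequence[float], repeat: Sequence[float]) -> float:
    """1 / slope from regressing the repeat measurement on the first one."""
    sxx, _, sxy = _centred_sums(first, repeat)
    if sxy <= 0.0:
        raise ValueError("non-positive slope; correction factor undefined")
    return sxx / sxy


def correlation_correction_factor(first: Sequence[float], repeat: Sequence[float]) -> float:
    """1 / Pearson correlation between the first and repeat measurements."""
    sxx, syy, sxy = _centred_sums(first, repeat)
    if sxy <= 0.0:
        raise ValueError("non-positive correlation; correction factor undefined")
    return math.sqrt(sxx * syy) / sxy


# Toy repeat-measurement subset (hypothetical values).
first_visit = [1.0, 2.0, 3.0, 4.0, 5.0]
second_visit = [1.5, 1.5, 3.5, 3.5, 5.0]
corrected_slope = 0.30 * correlation_correction_factor(first_visit, second_visit)
```

In this sketch the corrected slope is the uncorrected slope (0.30 here, an arbitrary value) multiplied by the estimated factor. The two factors coincide when the first and repeat measurements have equal sample variance.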

The influence function of the covariance matrix is decomposed into a finite number of components. This decomposition provides a useful tool to develop efficient methods for computing empirical influence curves related to various multivariate methods. It can also be used to characterize multivariate methods from the sensitivity perspective. A numerical example is given to demonstrate efficient computing and to characterize some procedures of exploratory factor analysis.   

Pharmacokinetic studies are commonly performed using the two-stage approach. The first stage involves estimation of pharmacokinetic parameters, such as the area under the concentration versus time curve (AUC), for each analysis subject separately, and the second stage uses the individual parameter estimates for statistical inference. This two-stage approach is not applicable in sparse sampling situations where only one sample is available per analysis subject, as in non-clinical in vivo studies. In a serial sampling design, only one sample is taken from each analysis subject. A simulation study was carried out to assess the coverage, power and type I error of seven methods for constructing two-sided 90% confidence intervals for ratios of two AUCs assessed in a serial sampling design, which can be used to assess bioequivalence in this parameter.

The geographical location and the monsoon climate render Bangladesh highly vulnerable to natural hazards, which undermine the country's socio-economic stability. This study is based on 500 randomly chosen rural households from the Household Income and Expenditure Survey [Bangladesh Bureau of Statistics, Planning Division, Ministry of Planning, Government of the People's Republic of Bangladesh, Dhaka, 2006]. The objectives are to estimate the income vulnerability of rural households and to check whether the Bayesian approaches (natural conjugate prior and non-informative prior estimates) are superior to the classical feasible generalized least squares (FGLS) method. The poverty level measured from the data is 24%, whereas the vulnerability estimates using FGLS, the natural conjugate prior and the non-informative prior are 31%, 69% and 82%, respectively. Vulnerability estimates from the Bayesian natural conjugate prior approach are found to be more efficient than those from the FGLS and non-informative prior approaches.

A number of recent studies have looked at the coverage probabilities of various common parametric methods of interval estimation of the median effective dose (ED50) for a logistic dose-response curve. There has been comparatively little work done on more extreme effective doses. In this paper, the interval estimation of the 90% effective dose (ED90) will be of principal interest. We provide a comparison of four parametric methods of interval construction with four methods based on bootstrap resampling.
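
For a logistic dose-response curve with logit P(response) = alpha + beta * dose, the dose giving response probability p is (logit(p) - alpha) / beta. The sketch below computes this point estimate and a delta-method Wald interval from fitted coefficients and their covariance. The abstract above does not list its four parametric methods, so treating the delta method as one of them is an assumption, as are the function names and the illustrative parameter values.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def effective_dose(alpha: float, beta: float, p: float) -> float:
    """Dose at which the logistic curve reaches response probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if beta == 0.0:
        raise ValueError("beta must be non-zero")
    return (math.log(p / (1.0 - p)) - alpha) / beta


def delta_method_interval(alpha: float, beta: float,
                          var_alpha: float, var_beta: float, cov_alpha_beta: float,
                          p: float, z: float = Z_95) -> Tuple[float, float]:
    """Wald interval for the effective dose via a first-order (delta) expansion.

    The gradient of the effective dose is (-1/beta, -ED/beta), which gives
    Var(ED) ~ (var_alpha + 2*ED*cov_alpha_beta + ED^2 * var_beta) / beta^2.
    """
    ed = effective_dose(alpha, beta, p)
    variance = (var_alpha + 2.0 * ed * cov_alpha_beta + ed * ed * var_beta) / (beta * beta)
    half = z * math.sqrt(variance)
    return ed - half, ed + half


# Illustrative (made-up) fitted values.
ed50 = effective_dose(-2.0, 1.0, 0.5)
ed90 = effective_dose(-2.0, 1.0, 0.9)
ed90_interval = delta_method_interval(-2.0, 1.0, 0.04, 0.01, 0.0, 0.9)
```

Because the variance term grows with the square of the effective dose, this kind of interval tends to widen as p moves away from 0.5, which is consistent with extreme doses such as ED90 being harder to estimate than ED50.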

In searching for optimum conditions, response surface methods comprise two phases. In the first phase, the method of steepest ascent with a 2^(k-p) design is used to search for a region of improved response. The curvature of the response surface is checked in the second phase. For testing the evidence of curvature, a reasonable design is a 2^(k-p) fractional factorial design augmented by centre runs. Using the c-optimality criterion, the optimal number of centre runs is investigated. By combining c-efficiencies for the curvature test with D-efficiencies and G-efficiencies of central composite designs (CCDs) for quadratic response surfaces, and then adopting the Mini-Max principle, i.e. maximizing the worst efficiency, we propose a number of centre runs that is robust with respect to the three optimality criteria.

This paper addresses the problem of maximum a posteriori (MAP) sequence estimation in general state-space models. We consider two algorithms based on the sequential Monte Carlo (SMC) methodology (also known as particle filtering). We prove that they produce approximations of the MAP estimator and that they converge almost surely. We also derive a lower bound for the number of particles that are needed to achieve a given approximation accuracy. In the last part of the paper, we investigate the application of particle filtering and MAP estimation to the global optimization of a class of (possibly non-convex and possibly non-differentiable) cost functions. In particular, we show how to convert the cost-minimization problem into one of MAP sequence estimation for a state-space model that is "matched" to the cost of interest. We provide examples that illustrate the application of the methodology as well as numerical results.   

