The exact mean-squared error (MSE) of estimators of the variance in nonparametric regression based on quadratic forms is investigated. In particular, two classes of estimators are compared: Hall, Kay and Titterington's optimal difference-based estimators and a class of ordinary difference-based estimators which generalize methods proposed by Rice and Gasser, Sroka and Jennen-Steinmetz. For small sample sizes the MSE of the first estimator is essentially increased by the magnitude of the integrated first two squared derivatives of the regression function. It is shown that in many situations ordinary difference-based estimators are more appropriate for estimating the variance, because they control the bias much better and hence have a much better overall performance. It is also demonstrated that Rice's estimator does not always behave well. Data-driven guidelines are given to select the estimator with the smallest MSE.
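
As a rough illustration of the two baseline methods named in this abstract, the sketch below implements Rice's first-order difference estimator and the Gasser, Sroka and Jennen-Steinmetz pseudo-residual estimator. It assumes an equally spaced design, and the function names `rice_variance` and `gsj_variance` are hypothetical; it is not the generalized class or the optimal difference sequence studied in the paper.

```python
def rice_variance(y):
    """Rice's estimator: sum of squared first differences / (2 * (n - 1))."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (2.0 * (n - 1))


def gsj_variance(y):
    """Gasser-Sroka-Jennen-Steinmetz estimator for an equally spaced design.

    Pseudo-residuals are deviations from the line through the two neighbours;
    the factor 2/3 = 1 / (0.25 + 1 + 0.25) makes each squared term unbiased
    for the variance when the regression function is locally linear.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    total = sum(
        (0.5 * y[i - 1] - y[i] + 0.5 * y[i + 1]) ** 2 for i in range(1, n - 1)
    )
    return (2.0 / 3.0) * total / (n - 2)
```

On a noiseless linear trend the second estimator returns zero, while Rice's estimator reports half the squared slope as spurious variance, which is one simple way to see how the first derivative of the regression function enters the bias.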

The time it takes to recruit patients into a clinical trial has a major impact on whether a drug development programme completes on time. Here Byron Jones explains how simple statistical models can be very useful in predicting the time to complete recruitment.

This application note investigates the causal relationship between oil price and tourist arrivals to further explain the impact of oil price volatility on tourism-related economic activities. The analysis considers the time domain, frequency domain and information theory domain perspectives. Data for the US and nine European countries are analysed with causality tests spanning the time domain, the frequency domain, and Convergent Cross Mapping (CCM). The CCM approach is nonparametric and therefore not restricted by assumptions. We contribute to existing research through the successful and introductory application of an advanced method, and by uncovering significant causal links from oil prices to tourist arrivals.

It is possible to obtain any positive value for the Wald test statistic by rewriting the null hypothesis being tested in an algebraically equivalent form.
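
A minimal sketch of this lack of invariance, under assumptions of our own choosing rather than the paper's construction: the scalar null hypothesis beta = 1 is rewritten as beta^k = 1, and the Wald statistic is formed with the delta-method variance. The function name `wald_power_form` and the numerical inputs are hypothetical.

```python
def wald_power_form(b, var_b, k):
    """Wald statistic for H0: beta**k = 1, given estimate b with variance var_b.

    The restriction g(beta) = beta**k - 1 is algebraically equivalent to
    beta = 1 for beta > 0 and k != 0, yet the statistic depends on k.
    """
    if b <= 0 or k == 0 or var_b <= 0:
        raise ValueError("requires b > 0, var_b > 0 and k != 0")
    g = b ** k - 1.0               # value of the restriction at the estimate
    grad = k * b ** (k - 1)        # derivative of g, used by the delta method
    return (g * g) / (grad * grad * var_b)
```

With `b = 2.0` and `var_b = 1.0`, the choices `k = 1`, `k = 2` and `k = -1` give 1.0, 0.5625 and 4.0 respectively, so the same data can support quite different conclusions depending on how the null is written.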

The purpose of this article is to explain cross-validation and describe its use in regression. Because replicability analyses are not typically employed in studies, this is a topic with which many researchers may not be familiar. As a result, researchers may not understand how to conduct cross-validation in order to evaluate the replicability of their data. This article not only explains the purpose of cross-validation, but also uses the widely available Holzinger and Swineford (1939; A Study in Factor Analysis: The Stability of a Bi-Factor Solution, Chicago, IL: University of Chicago; available at http://people.cehd.tamu.edu/~bthompson/datasets.htm) dataset as a heuristic example to concretely demonstrate its use. By incorporating multiple tables and examples of SPSS syntax and output, the reader is provided with additional visual examples in order to further clarify the steps involved in conducting cross-validation. A brief discussion of the limitations of cross-validation is also included. After reading this article, the reader should have a clear understanding of cross-validation, including when it is appropriate to use, and how it can be used to evaluate replicability in regression.


While the advantages of electronic publications are obvious and far reaching, most electronic journals are still also published in print, since libraries do not acquire electronic journals but only access them through licenses. Libraries with substantial electronic publications, however, no longer “compile” collections in a traditional sense. One consequence of electronic-only access is that the permanent availability of information that implicitly used to be found in print collections is no longer guaranteed. Digital publishing dramatically alters the roles of both libraries and publishers in preserving records of science. This article discusses the contribution of national libraries, particularly the Koninklijke Bibliotheek (KB; National Library of the Netherlands), in cooperating with publishers to secure the permanent archiving of electronic publications.

Point process models are a natural approach for modelling data that arise as point events. In the case of Poisson counts, these may be fitted easily as a weighted Poisson regression. Point processes lack the notion of sample size. This is problematic for model selection, because various classical criteria such as the Bayesian information criterion (BIC) are a function of the sample size, n, and are derived in an asymptotic framework where n tends to infinity. In this paper, we develop an asymptotic result for Poisson point process models in which the observed number of point events, m, plays the role that sample size does in the classical regression context. Following from this result, we derive a version of BIC for point process models, and when fitted via penalised likelihood, conditions for the LASSO penalty that ensure consistency in estimation and the oracle property. We discuss challenges extending these results to the wider class of Gibbs models, of which the Poisson point process model is a special case.

The Earth is full of life. If life evolved here, why not elsewhere? The Universe is a big place and our galaxy has many stars with planets. So are we alone? What is out there? And how do we know? Mark Burchell looks at the probability of life beyond our planet.

We investigate the estimation of dynamic models of criminal activity, when there is significant under-recording of crime. We give a theoretical analysis and use simulation techniques to investigate the resulting biases in conventional regression estimates. We find the biases to be of little practical significance. We develop and apply a new simulated maximum likelihood procedure that estimates simultaneously the measurement error and crime processes, using extraneous survey data. This also confirms that measurement error biases are small. Our estimation results for data from England and Wales imply a significant response of crime to both the economic and the enforcement environment.

This paper compares the five-parameter beta generalized gamma (BGG) distribution to the three-parameter generalized gamma (GG). Both distributions include the four standard hazard shapes, which we believe is an important property for any parametric family. For several BGG distributions, we select matching GGs and compute the Kullback-Leibler distance, observing remarkable agreement. We explore the beta parameters' influence on the matched GG parameters, detecting a strong connection between the distributions. Lastly, we compare the distributions using two real-data examples. We conclude from these comparisons that the BGG is not likely to be more useful for analytical purposes than the simpler GG.

The mean absolute deviation (MAD) estimator has recently received a great deal of attention as applied to full-rank linear regression models. This paper provides a necessary and sufficient condition for the MAD estimator to be a non-linear estimator, in which case conditions for the variance of the MAD estimator to be larger or smaller than that of OLS are, in general, unknown. The non-linearity of the MAD estimator is examined for several two-way designs, in particular (1) randomized block design, (2) two-way nested design, (3) two-way classification with interaction, and (4) partially balanced incomplete block design.

The author examines whether the unexpectedly high number of births recorded in Poland in 1982 and 1983 is evidence of a change in fertility patterns. It is suggested that the increase in the gross reproduction rate that occurred was due to lower standards of living and fewer opportunities to acquire material possessions or travel abroad as an alternative to having children. Some of the increase may also be due to new pro-natalist measures such as prolongation of paid leave of absence for mothers. The author suggests that the increase in fertility is temporary and that fertility will soon decline to its former level.

The 1978 European Community Typology for Agricultural Holdings is described in this paper and contrasted with a data-based, polythetic-multivariate classification based on cluster analysis.

The requirement to reduce the size of the variable set employed in an optimisation-partition method of clustering suggested the value of principal components and factor analysis for the identification of major ‘source’ dimensions against which to measure farm differences and similarities.

The Euclidean cluster analysis incorporating the reduced dimensions quickly converged to a stable solution and was little influenced by the initial number or nature of ‘seeding’ partitions of the data.

The assignment of non-sampled observations from the population to cluster classes was completed using classification functions.

The final scheme, based on a sample of over 2,000 observations, was found to be both interpretable and meaningful in terms of agricultural structure and practice, and much superior in explanatory power to a version of the principal activity typology.

Singular spectrum analysis (SSA) is an increasingly popular and widely adopted filtering and forecasting technique which is currently exploited in a variety of fields. Given its increasing application and superior performance in comparison to other methods, it is pertinent to study and distinguish between the two forecasting variations of SSA. These are referred to as Vector SSA (SSA-V) and Recurrent SSA (SSA-R). The general notion is that SSA-V is more robust and provides better forecasts than SSA-R. This is especially true when faced with time series which are non-stationary and asymmetric, or affected by unit root problems, outliers or structural breaks. However, currently there exists no empirical evidence for proving the above notions or suggesting that SSA-V is better than SSA-R. In this paper, we evaluate out-of-sample forecasting capabilities of the optimised SSA-V and SSA-R forecasting algorithms via a simulation study and an application to 100 real data sets with varying structures, to provide a statistically reliable answer to the question of which SSA algorithm is best for forecasting at both short and long run horizons based on several important criteria.

Polynomials are widely used for fitting models empirically to data. Low-degree polynomials (specifically, degrees 1, 2, and at most 3) have stood the test of time by proving their versatility when it comes to fitting a wide variety of different surface shapes over limited regions of interest. However, when faced with modeling a surface over an experimental region whose boundaries extend beyond some localized neighborhood or limited-sized region of interest, a polynomial of degree 2, or even of degree 3, may not be adequate. For this situation we propose fitting an interaction model which is a reduced form of higher-degree polynomial. Some examples of actual experiments are presented to illustrate the improvement in fit by an interaction model over that of a standard polynomial, even for response surfaces with uncomplicated shapes.


A common method for estimating the time-domain parameters of an autoregressive process is to use the Yule–Walker equations. Tapering has been shown intuitively and proven theoretically to reduce the bias of the periodogram in the frequency domain, but the intuition for the similar bias reduction in the time-domain estimates has been lacking. We provide insightful reasoning for why tapering reduces the bias in the Yule–Walker estimates by showing them to be equivalent to a weighted least-squares problem. This leads to the derivation of an optimal taper which behaves similarly to commonly used tapers.
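
The following is a minimal sketch of tapered Yule–Walker estimation, written under our own assumptions: the series is mean-corrected, multiplied by a taper, and the resulting sample autocovariances are passed to the Levinson–Durbin recursion. The full cosine taper used here is a common generic choice and is not the optimal taper derived in the paper; all function names are hypothetical.

```python
import math


def cosine_taper(n):
    """Full cosine (Hann-type) taper with weights strictly between 0 and 1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * (t + 0.5) / n)) for t in range(n)]


def tapered_autocov(x, max_lag, taper=None):
    """Sample autocovariances of the mean-corrected, tapered series."""
    n = len(x)
    h = taper if taper is not None else [1.0] * n   # no taper: all weights 1
    mean = sum(x) / n
    z = [h[t] * (x[t] - mean) for t in range(n)]
    norm = sum(w * w for w in h)
    return [sum(z[t] * z[t + k] for t in range(n - k)) / norm
            for k in range(max_lag + 1)]


def yule_walker(x, p, taper=None):
    """AR(p) coefficients a_1..a_p solving the Yule-Walker equations.

    Model convention: x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + e_t.
    The Toeplitz system is solved by the Levinson-Durbin recursion.
    """
    r = tapered_autocov(x, p, taper)
    a, err = [], r[0]
    for k in range(1, p + 1):
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err                      # reflection coefficient
        a = [a[j] - kappa * a[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa             # updated innovation variance
    return a
```

A call such as `yule_walker(x, 2, taper=cosine_taper(len(x)))` gives the tapered estimates, and omitting `taper` recovers the ordinary Yule–Walker estimates for comparison.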

