Similar Articles
20 similar articles were retrieved.
1.
Summary.  The process of quality control of micrometeorological and carbon dioxide (CO2) flux data can be subjective and may lack repeatability, which would undermine the results of many studies. Multivariate statistical methods and time series analysis were used together and independently to detect and replace outliers in CO2 flux data derived from a Bowen ratio energy balance system. The results were compared with those produced by five experts who applied the current and potentially subjective protocol. All protocols were tested on the same set of three 5-day periods, when measurements were conducted in an abandoned agricultural field. The concordance of the protocols was evaluated by using the experts' opinion (mean ± 1.96 standard deviations) as a reference interval (the Bland–Altman method). Analysing the 15 days together, the statistical protocol that combined multivariate distance, multiple linear regression and time series analysis showed a concordance of 93% on a 20-min flux basis and 87% on a daily basis (only 2 days fell outside the reference interval), and the overall flux differed only by 1.7% (3.2 g CO2 m−2). An automated version of this or a similar statistical protocol could be used as a standard way of filling gaps and processing data from Bowen ratio energy balance and other techniques (e.g. eddy covariance). This would enforce objectivity in comparisons of CO2 flux data that are generated by different research groups and streamline the protocols for quality control.
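The Bland–Altman reference interval used in this abstract is simple to compute: take the differences between two protocols and form mean ± 1.96 standard deviations. A minimal sketch with simulated daily flux totals (all numbers hypothetical, not the study's data):

```python
import numpy as np

def bland_altman_limits(ref, test):
    """Bland-Altman limits of agreement: mean difference +/- 1.96 SD."""
    d = np.asarray(test) - np.asarray(ref)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias - half_width, bias + half_width

# Toy daily CO2 flux totals (g CO2 m^-2): expert consensus vs automated protocol
rng = np.random.default_rng(0)
expert = rng.normal(20.0, 5.0, size=15)
auto = expert + rng.normal(0.0, 1.0, size=15)

lo, hi = bland_altman_limits(expert, auto)
d = auto - expert
inside = float(np.mean((d >= lo) & (d <= hi)))   # fraction of concordant days
```

Days whose difference falls inside (lo, hi) would count as concordant, as in the daily-basis figure quoted above.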

2.
Summary.  Principal component analysis has become a fundamental tool of functional data analysis. It represents the functional data as X_i(t) = μ(t) + Σ_{1 ≤ l < ∞} η_{i,l} v_l(t), where μ is the common mean function, the v_l are the eigenfunctions of the covariance operator and the η_{i,l} are the scores. Inferential procedures assume that the mean function μ(t) is the same for all values of i. If, in fact, the observations do not come from one population, but rather their mean changes at some point(s), the results of principal component analysis are confounded by the change(s). It is therefore important to develop a methodology to test the assumption of a common functional mean. We develop such a test using quantities which can be readily computed in the R package fda. The null distribution of the test statistic is asymptotically pivotal with a well-known asymptotic distribution. The asymptotic test has excellent finite sample performance. Its application is illustrated on temperature data from England.
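The decomposition can be mimicked empirically on a discretized grid: eigendecompose the sample covariance matrix and project centred curves onto the leading eigenvectors. A sketch with two synthetic eigenfunctions (all functions and variances hypothetical; the paper's change-point test itself is not implemented here):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)                # common evaluation grid
n = 200
mu = np.sin(2 * np.pi * t)                # common mean function
# two orthogonal "eigenfunctions" (hypothetical smooth basis)
v1 = np.sqrt(2) * np.cos(2 * np.pi * t)
v2 = np.sqrt(2) * np.sin(4 * np.pi * t)
scores = rng.normal(0.0, [2.0, 0.5], size=(n, 2))    # eta_{i,l}
X = mu + scores[:, [0]] * v1 + scores[:, [1]] * v2   # X_i(t)

# empirical FPCA: eigendecomposition of the sample covariance operator
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
eigval, eigvec = np.linalg.eigh(cov)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]       # descending order
est_scores = Xc @ eigvec[:, :2]                      # estimated scores
```

With rank-2 curves, the third empirical eigenvalue is numerically zero and the leading estimated scores recover the true ones up to sign.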

3.
Short-term projections of the acquired immune deficiency syndrome (AIDS) epidemic in England and Wales have been regularly updated since the publication of the Cox report in 1988. The key approach for those updates has been the back-calculation method, which has been informally adapted to acknowledge various sources of uncertainty as well as to incorporate increasingly available information on the spread of the human immunodeficiency virus (HIV) in the population. We propose a Bayesian formulation of the back-calculation method which allows a formal treatment of uncertainty and the inclusion of extra information, within a single coherent composite model. Estimation of the variably dimensioned model is carried out by using reversible-jump Markov chain Monte Carlo methods. Application of the model to data for homosexual and bisexual males in England and Wales is presented, and the role of the various sources of information and model assumptions is appraised. Our results show a massive peak in HIV infections around 1983 and suggest that the incidence of AIDS has now reached a plateau, although there is still substantial uncertainty about the future.
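Back-calculation rests on a convolution identity: expected AIDS diagnoses are past HIV infections smoothed by the incubation-time distribution. The Bayesian reversible-jump machinery is beyond a short sketch, but the deterministic core can be illustrated (toy incubation distribution and infection curve, not epidemiological estimates):

```python
import numpy as np

T = 20                                                       # years of data
years = np.arange(T)
# toy incubation-time pmf from a Weibull-like CDF (hypothetical)
f = np.diff(1 - np.exp(-(np.arange(T + 1) / 10.0) ** 2))
# hypothetical infection curve peaking at year 8
h_true = np.exp(-0.5 * ((years - 8) / 3.0) ** 2)

# forward model: expected diagnoses a(t) = sum_{s<=t} h(s) f(t-s)
A = np.array([[f[t - s] if t >= s else 0.0 for s in range(T)]
              for t in range(T)])
a = A @ h_true

# naive back-calculation: recover infections from diagnoses by least squares
h_est, *_ = np.linalg.lstsq(A, a, rcond=None)
```

In practice the inversion is ill conditioned and diagnoses are noisy, which is exactly why the paper wraps this identity in a formal Bayesian model.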

4.
Running complex computer models can be expensive in computer time, while learning about the relationships between input and output variables can be difficult. An emulator is a fast approximation to a computationally expensive model that can be used as a surrogate for the model, to quantify uncertainty or to improve process understanding. Here, we examine emulators based on singular value decompositions (SVDs) and use them to emulate global climate and vegetation fields, examining how these fields are affected by changes in the Earth's orbit. The vegetation field may be emulated directly from the orbital variables, but an appealing alternative is to relate it to emulations of the climate fields, which involves high-dimensional input and output. The SVDs radically reduce the dimensionality of the input and output spaces and are shown to clarify the relationships between them. The method could potentially be useful for any complex process with correlated, high-dimensional inputs and/or outputs.
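The idea can be sketched in a few lines: SVD the centred output fields, keep the leading modes, and regress each mode amplitude on the inputs (here with plain least squares rather than a full statistical emulator; all dimensions and data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d_in, d_out = 60, 3, 500
X = rng.uniform(-1, 1, size=(n, d_in))           # e.g. orbital parameters
B = rng.normal(size=(d_in, d_out))
Y = X @ B + 0.01 * rng.normal(size=(n, d_out))   # high-dimensional "field"

# SVD of centred outputs: keep k leading modes, emulate their amplitudes
Ym = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Ym, full_matrices=False)
k = 3
coeffs = U[:, :k] * s[:k]                        # per-run mode amplitudes
W, *_ = np.linalg.lstsq(np.c_[np.ones(n), X], coeffs, rcond=None)

def emulate(x_new):
    c = np.r_[1.0, x_new] @ W                    # predicted amplitudes
    return Ym + c @ Vt[:k]                       # reconstruct full field

x_test = np.array([0.2, -0.5, 0.1])
pred = emulate(x_test)
truth = x_test @ B
err = np.linalg.norm(pred - truth) / np.linalg.norm(truth)
```

The 500-dimensional output is handled through only k = 3 mode amplitudes, which is the dimensionality reduction the abstract describes.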

5.
Many mathematical models involve input parameters, which are not precisely known. Global sensitivity analysis aims to identify the parameters whose uncertainty has the largest impact on the variability of a quantity of interest (output of the model). One of the statistical tools used to quantify the influence of each input variable on the output is the Sobol sensitivity index. We consider the statistical estimation of this index from a finite sample of model outputs. We study asymptotic and non-asymptotic properties of two estimators of Sobol indices. These properties are applied to significance tests and estimation by confidence intervals.
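A first-order Sobol index can be estimated with the classical pick–freeze scheme: correlate model outputs across two input samples that share only coordinate i. A sketch on an additive toy model whose indices are known analytically (1/5 and 4/5); this illustrates one standard estimator, not necessarily the two studied in the paper:

```python
import numpy as np

def sobol_first_order(f, d, i, n, rng):
    """Pick-freeze estimator of the first-order Sobol index of input i."""
    X = rng.uniform(size=(n, d))
    Xp = rng.uniform(size=(n, d))
    Xi = Xp.copy()
    Xi[:, i] = X[:, i]              # freeze coordinate i, resample the rest
    y, yi = f(X), f(Xi)
    return (np.mean(y * yi) - y.mean() * yi.mean()) / y.var()

# additive test model Y = X1 + 2*X2 with uniform inputs:
# Var(Y) = 5/12, so S1 = 1/5 and S2 = 4/5 analytically
f = lambda x: x[:, 0] + 2.0 * x[:, 1]
rng = np.random.default_rng(3)
S1 = sobol_first_order(f, 2, 0, 200_000, rng)
S2 = sobol_first_order(f, 2, 1, 200_000, rng)
```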

6.
Bayesian selection of variables is often difficult to carry out because of the challenge in specifying prior distributions for the regression parameters for all possible models, specifying a prior distribution on the model space and computations. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection. Several theoretical and computational properties of the prior are derived and illustrated with several examples. For the second, we propose a method for specifying an informative prior on the model space, and for the third we propose novel methods for computing the marginal distribution of the data. The new computational algorithms only require Gibbs samples from the full model to facilitate the computation of the prior and posterior model probabilities for all possible models. Several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables in that the elicitation is based on a prior prediction y0 for the response vector and a quantity a0 quantifying the uncertainty in y0. Then, y0 and a0 are used to specify a prior for the regression coefficients semi-automatically. Examples using real data are given to demonstrate the methodology.

7.
Penalized likelihood methods provide a range of practical modelling tools, including spline smoothing, generalized additive models and variants of ridge regression. Selecting the correct weights for penalties is a critical part of using these methods and in the single-penalty case the analyst has several well-founded techniques to choose from. However, many modelling problems suggest a formulation employing multiple penalties, and here general methodology is lacking. A wide family of models with multiple penalties can be fitted to data by iterative solution of the generalized ridge regression problem: minimize ||W^{1/2}(Xp − y)||² ρ + Σ_{i=1}^m θ_i p′S_i p (p is a parameter vector, X a design matrix, S_i a non-negative definite coefficient matrix defining the ith penalty with associated smoothing parameter θ_i, W a diagonal weight matrix, y a vector of data or pseudodata and ρ an 'overall' smoothing parameter included for computational efficiency). This paper shows how smoothing parameter selection can be performed efficiently by applying generalized cross-validation to this problem and how this allows non-linear, generalized linear and linear models to be fitted using multiple penalties, substantially increasing the scope of penalized modelling methods. Examples of non-linear modelling, generalized additive modelling and anisotropic smoothing are given.
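For fixed smoothing parameters the problem is a single linear solve, and GCV can then be minimized over the θ_i by direct search. The paper develops a far more efficient scheme; this brute-force grid sketch with two hypothetical diagonal penalties (ρ absorbed into the θ_i) only illustrates the objective:

```python
import numpy as np

def fit_multi_penalty(X, y, S_list, thetas, W=None):
    """Solve min ||W^(1/2)(Xp - y)||^2 + sum_i theta_i p'S_i p and score by GCV."""
    if W is None:
        W = np.eye(len(y))
    P = sum(t * S for t, S in zip(thetas, S_list))
    A = X.T @ W @ X + P
    p = np.linalg.solve(A, X.T @ W @ y)
    H = X @ np.linalg.solve(A, X.T @ W)          # influence matrix
    n = len(y)
    gcv = n * np.sum((y - X @ p) ** 2) / (n - np.trace(H)) ** 2
    return p, gcv

# toy example: separate ridge penalties on two coefficient blocks
rng = np.random.default_rng(4)
n, q = 100, 6
X = rng.normal(size=(n, q))
p_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y = X @ p_true + 0.1 * rng.normal(size=n)
S1 = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
S2 = np.diag([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# grid search over the two smoothing parameters by GCV
best = min(((fit_multi_penalty(X, y, [S1, S2], (t1, t2)), (t1, t2))
            for t1 in [0.01, 1.0, 100.0] for t2 in [0.01, 1.0, 100.0]),
           key=lambda r: r[0][1])
(p_hat, gcv), (t1_best, t2_best) = best
```

GCV correctly refuses to shrink the block of coefficients that carries real signal.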

8.
Suppose the p-variate random vector W, partitioned into q variables W1 and p − q variables W2, follows a multivariate normal mixture distribution. If the investigator is mainly interested in estimation of the parameters of the distribution of W1, there are two possibilities: (1) use only the data on W1 for estimation, and (2) estimate the parameters of the p-variate mixture distribution, and then extract the estimates of the marginal distribution of W1. In this article we study the choice between these two possibilities, mainly for the case of two mixture components with identical covariance matrices. We find the asymptotic distribution of the linear discriminant function coefficients using the work of Efron (1975) and O'Neill (1978), and give a Wald test for redundancy of W2. A simulation study gives further insights into conditions under which W2 should be used in the analysis: in summary, the inclusion of W2 seems justified if Δ_{2.1}, the Mahalanobis distance between the two component distributions based on the conditional distribution of W2 given W1, is at least 2.
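The quantity Δ²_{2.1} decomposes the full Mahalanobis distance: in the equal-covariance case it is the part contributed by W2 after accounting for W1, computable as the difference of the p-variate and q-variate squared distances. A sketch with hypothetical means and identity covariance:

```python
import numpy as np

def mahalanobis_sq(mu1, mu2, Sigma):
    """Squared Mahalanobis distance between two component means."""
    d = mu1 - mu2
    return float(d @ np.linalg.solve(Sigma, d))

# p = 4 variables, split into W1 (first q = 2) and W2 (the rest)
q = 2
mu_a = np.array([0.0, 0.0, 0.0, 0.0])
mu_b = np.array([1.0, 1.0, 1.5, 0.5])
Sigma = np.eye(4)

D2_full = mahalanobis_sq(mu_a, mu_b, Sigma)                  # based on all of W
D2_marg = mahalanobis_sq(mu_a[:q], mu_b[:q], Sigma[:q, :q])  # based on W1 only
D2_cond = D2_full - D2_marg    # Delta^2_{2.1}: extra separation from W2 given W1
```

Here Δ_{2.1} = √2.5 ≈ 1.58 < 2, so under the rule of thumb quoted above including W2 would be hard to justify.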

9.
In sequential analysis it is often necessary to determine the distributions of √t Ȳ_t and/or √a Ȳ_t, where t is a stopping time of the form t = inf{n ≥ 1 : n + S_n + ξ_n > a}, Ȳ_n is the sample mean of n independent and identically distributed random variables (iidrvs) Y_i with mean zero and variance one, S_n is the partial sum of iidrvs X_i with mean zero and a positive finite variance, and {ξ_n} is a sequence of random variables that converges in distribution to a random variable ξ as n → ∞, with ξ_n independent of (X_{n+1}, Y_{n+1}), (X_{n+2}, Y_{n+2}), . . . for all n ≥ 1. Anscombe's (1952) central limit theorem asserts that both √t Ȳ_t and √a Ȳ_t are asymptotically normal for large a, but a normal approximation is not accurate enough for many applications. Refined approximations are available only for a few special cases of the general setting above and are often very complex. This paper provides some simple Edgeworth approximations that are numerically satisfactory for the problems it considers.
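The flavour of a one-term Edgeworth refinement can be shown for the simpler fixed-n mean (not the randomly stopped version the paper treats): the skewness correction to the normal CDF visibly beats the plain normal approximation. A sketch using centred exponentials, which have skewness 2:

```python
import math
import numpy as np

def edgeworth_cdf(x, n, skew):
    """One-term Edgeworth approximation to P(sqrt(n) * Ybar <= x) for the
    standardized mean of n iid variables with mean 0, variance 1, skewness `skew`."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return Phi - skew * (x * x - 1) * phi / (6 * math.sqrt(n))

# Monte Carlo check with centred Exp(1) summands
rng = np.random.default_rng(5)
n, reps, x = 30, 200_000, 0.5
z = (rng.exponential(1.0, size=(reps, n)) - 1.0).sum(axis=1) / math.sqrt(n)
mc = float(np.mean(z <= x))
approx = edgeworth_cdf(x, n, skew=2.0)
normal = 0.5 * (1 + math.erf(x / math.sqrt(2)))
```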

10.
Let X_1, . . ., X_n be independent identically distributed random variables with a common continuous (cumulative) distribution function (d.f.) F, and let F_n be the empirical d.f. (e.d.f.) based on X_1, . . ., X_n. Let G be a smooth d.f. and G_θ = G(· − θ) its translation through θ ∈ R. Using a Kolmogorov–Lévy type metric ρ_α defined on the space of d.f.s on R, the paper derives both null and non-null limiting distributions of √n[ρ_α(F_n, G_{θ_n}) − ρ_α(F, G_θ)], √n(θ_n − θ) and √n ρ_α(G_{θ_n}, G_θ), where θ_n and θ are the minimum ρ_α-distance parameters for F_n and F from G, respectively. These distributions are known explicitly in important particular cases; with some complementary Monte Carlo simulations, they help us clarify our understanding of estimation using minimum distance methods and supremum type metrics. We advocate use of the minimum distance method with supremum type metrics in cases of non-null models. The resulting functionals are Hadamard differentiable and efficient. For small scale parameters the minimum distance functionals are close to medians of the parent distributions. The optimal small scale models result in minimum distance estimators having asymptotic variances very competitive and comparable with the best known robust estimators.
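A minimum-distance location estimate under a supremum metric can be sketched by grid search: for each candidate θ, evaluate the Kolmogorov distance between the e.d.f. and G(· − θ), then minimize (here ρ_α is replaced by the plain Kolmogorov metric and G is standard normal; a sketch, not the paper's construction):

```python
import numpy as np
from math import erf, sqrt

def kolmogorov_distance(x, cdf, theta):
    """Sup-distance between the e.d.f. of x and the model d.f. G(. - theta)."""
    xs = np.sort(x)
    n = len(xs)
    Fg = cdf(xs - theta)
    upper = np.arange(1, n + 1) / n     # e.d.f. just after each jump
    lower = np.arange(0, n) / n         # e.d.f. just before each jump
    return max(np.abs(upper - Fg).max(), np.abs(lower - Fg).max())

Phi = np.vectorize(lambda z: 0.5 * (1 + erf(z / sqrt(2))))  # standard normal d.f.
rng = np.random.default_rng(6)
x = rng.normal(1.5, 1.0, size=400)                          # true shift 1.5

grid = np.linspace(0.5, 2.5, 201)
dists = [kolmogorov_distance(x, Phi, th) for th in grid]
theta_hat = float(grid[int(np.argmin(dists))])              # minimum-distance estimate
```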

11.
Summary.  A deterministic computer model is to be used in a situation where there is uncertainty about the values of some or all of the input parameters. This uncertainty induces uncertainty in the output of the model. We consider the problem of estimating a specific percentile of the distribution of this uncertain output. We also suppose that the computer code is computationally expensive, so we can run the model only at a small number of distinct inputs. This means that we must consider our uncertainty about the computer code itself at all untested inputs. We model the output, as a function of its inputs, as a Gaussian process, and after a few initial runs of the code use a simulation approach to choose further suitable design points and to make inferences about the percentile of interest itself. An example is given involving a model that is used in sewer design.
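A stripped-down version of the approach: fit a Gaussian-process interpolator to a handful of runs of an "expensive" code, push the input uncertainty through the cheap emulator, and read off the percentile. This sketch uses noise-free RBF interpolation with a fixed length-scale and omits the paper's sequential design step; the model, design and input distribution are all hypothetical:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel matrix between 1-d point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

f = lambda x: np.sin(3 * x) + x              # stand-in for the expensive code
Xd = np.linspace(0, 1, 8)                    # small set of design points
yd = f(Xd)

K = rbf(Xd, Xd) + 1e-6 * np.eye(len(Xd))     # jitter for numerical stability
alpha = np.linalg.solve(K, yd)
gp_mean = lambda x: rbf(x, Xd) @ alpha       # GP posterior mean (zero prior mean)

rng = np.random.default_rng(7)
u = rng.uniform(0, 1, size=100_000)          # uncertain input X ~ U(0, 1)
p95_emulated = float(np.quantile(gp_mean(u), 0.95))
p95_true = float(np.quantile(f(u), 0.95))    # available here only because f is cheap
```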

12.
Non-parametric Regression with Dependent Censored Data
Abstract.  Let (X_i, Y_i) (i = 1, …, n) be n replications of a random vector (X, Y), where Y is supposed to be subject to random right censoring. The data (X_i, Y_i) are assumed to come from a stationary α-mixing process. We consider the problem of estimating the function m(x) = E(φ(Y) | X = x), for some known transformation φ. This problem is approached in the following way: first, we introduce a transformed variable that is not subject to censoring and whose conditional mean given X = x equals m(x), and then we estimate m(x) by applying local linear regression techniques. As a by-product, we obtain a general result on the uniform rate of convergence of kernel type estimators of functionals of an unknown distribution function, under strong mixing assumptions.
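The local linear step is standard: at each point x0, fit a kernel-weighted straight line and report its intercept. A sketch on uncensored synthetic data (the censoring transformation itself is not reproduced here; kernel and bandwidth are hypothetical choices):

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear estimate of m(x0) = E(Y | X = x0) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)       # kernel weights
    Z = np.c_[np.ones_like(X), X - x0]           # local design: intercept + slope
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)   # weighted least squares
    return float(beta[0])                        # intercept = fit at x0

rng = np.random.default_rng(8)
X = rng.uniform(0, 1, size=2000)
Y = np.sin(2 * np.pi * X) + 0.2 * rng.normal(size=2000)
m_hat = local_linear(0.25, X, Y, h=0.05)         # true m(0.25) = 1
```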

13.
There has been much recent interest in supersaturated designs and their application in factor screening experiments. Supersaturated designs have mainly been constructed by using the E(s²)-optimality criterion originally proposed by Booth and Cox in 1962. However, until now E(s²)-optimal designs have only been established with certainty for n experimental runs when the number of factors m is a multiple of n − 1, and in adjacent cases where m = q(n − 1) + r (|r| ≤ 2, q an integer). A method of constructing E(s²)-optimal designs is presented which allows a reasonably complete solution to be found for various numbers of runs n including n = 8, 12, 16, 20, 24, 32, 40, 48 and 64.
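The E(s²) criterion averages the squared column inner products over all factor pairs, and a Nguyen-type lower bound gives a benchmark. A sketch that scores random balanced two-level designs against that bound (random search only illustrates the criterion; it is not the paper's construction):

```python
import numpy as np
from itertools import combinations

def e_s2(D):
    """E(s^2) criterion for a two-level (+1/-1) design D of shape (n, m)."""
    m = D.shape[1]
    vals = [float(D[:, i] @ D[:, j]) ** 2 for i, j in combinations(range(m), 2)]
    return sum(vals) / len(vals)

def lower_bound(n, m):
    """Nguyen (1996)-type lower bound on E(s^2) for balanced designs."""
    return n ** 2 * (m - n + 1) / ((m - 1) * (n - 1))

# random-search sketch: balanced +/-1 columns, keep the best of many tries
rng = np.random.default_rng(9)
n, m = 8, 10                                  # supersaturated: m > n - 1
base = np.array([1] * 4 + [-1] * 4)
best = None
for _ in range(2000):
    D = np.column_stack([rng.permutation(base) for _ in range(m)])
    v = e_s2(D)
    if best is None or v < best[0]:
        best = (v, D)
es2, D = best
```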

14.
Parts of Britain resemble a melting pot that more than rivals any found in America. Some others appear to be social monocultures where almost everyone is white, married, has children and owns or is buying a house. But scratch beneath the superficial banality of middle England and there too is a rainbow: there is the widest variety of lives, life chances and experience. To reveal the social geology of Britain, Bethan Thomas and Danny Dorling have drawn re-projected maps – maps where the scale is measured in human lives.

15.
In the estimators t3, t4, t5 of Mukerjee, Rao & Vijayan (1987), b_yx and b_yz are partial regression coefficients of y on x and z, respectively, based on the smaller sample. With the above interpretation of b_yx and b_yz in t3, t4, t5, all the calculations in Mukerjee et al. (1987) are correct. In this connection, we also wish to make it explicit that b_xz in t5 is an ordinary and not a partial regression coefficient. The 'corrected' MSEs of t3, t4, t5, as given in Ahmed (1998, Section 3), are computed assuming that our b_yx and b_yz are ordinary and not partial regression coefficients. Indeed, we had no intention of giving estimators using the corresponding ordinary regression coefficients, which would lead to estimators inferior to those given by Kiregyera (1984). We accept responsibility for any notational confusion created by us and express regret to readers who have been confused by our notation. Finally, in consideration of the above, it may be noted that Tripathi & Ahmed's (1995) estimator t0, quoted also in Ahmed (1998), is no better than t5 of Mukerjee et al. (1987).

16.
Parallel individual and ecological analyses of data on residential radon have been performed using information on cases of lung cancer and population controls from a recent study in south-west England. For the individual analysis the overall results indicated that the relative risk of lung cancer at 100 Bq m−3 compared with 0 Bq m−3 was 1.12 (95% confidence interval (0.99, 1.27)) after adjusting for age, sex, smoking, county of residence and social class. In the ecological analysis substantial bias in the estimated effect of radon was present for one of the two counties involved unless an additional variable, urban–rural status, was included in the model, although this variable was not an important confounder in the individual level analysis. Most of the methods that have been recommended for overcoming the limitations of ecological studies would not in practice have proved useful in identifying this variable as an appreciable source of bias.

17.
Bayesian calibration of computer models
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
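The paper's second point, allowing for model inadequacy, can be caricatured with a one-parameter grid posterior: ignoring the discrepancy yields an overconfident posterior, while inflating the error variance (a crude stand-in for the paper's Gaussian-process discrepancy term) widens it honestly. All numbers and functional forms are hypothetical:

```python
import numpy as np

# observations: computer model at theta = 2 plus a systematic discrepancy
# delta(x) = 0.3 sin(4x) that the model cannot represent, plus noise
eta = lambda x, th: th * x                     # computer model
rng = np.random.default_rng(10)
x = np.linspace(0, 1, 40)
z = eta(x, 2.0) + 0.3 * np.sin(4 * x) + 0.05 * rng.normal(size=40)

thetas = np.linspace(1.0, 3.0, 401)

def posterior(sigma):
    """Grid posterior over theta: flat prior, iid N(0, sigma^2) errors."""
    lp = np.array([-0.5 * np.sum((z - eta(x, th)) ** 2) / sigma ** 2
                   for th in thetas])
    w = np.exp(lp - lp.max())
    w /= w.sum()
    mean = float(np.sum(w * thetas))
    sd = float(np.sqrt(np.sum(w * (thetas - mean) ** 2)))
    return mean, sd

mean_naive, sd_naive = posterior(0.05)         # pretends all misfit is noise
mean_infl, sd_infl = posterior(np.sqrt(0.05 ** 2 + 0.3 ** 2 / 2))  # inflated
```

The inflated-error posterior is several times wider, reflecting the extra uncertainty that an unmodelled discrepancy should induce.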

18.
In competing risks a failure time T and a cause C, one of p possible, are observed. A traditional representation is via a vector (T_1, ..., T_p) of latent failure times such that T = min(T_1, ..., T_p); C is defined by T = T_C in the basic situation of failure from a single cause. There are several results in the literature to the effect that a joint distribution for (T_1, ..., T_p), in which the T_j are independent, can always be constructed to yield any given bivariate distribution for (C, T). For this reason the prevailing wisdom is that independence cannot be assessed from competing risks data, not even with arbitrarily large sample sizes (e.g. Prentice et al., 1978). A result was given by Crowder (1996) which shows that, under certain circumstances, independence can be assessed. The various results will be drawn together and a complete characterization can now be given in terms of independent-risks proxy models.

19.
A subset T of S is said to be a Pareto Optimal subset of m ordered attributes (factors) if, for profiles (combinations of attribute levels) (x_1, …, x_m) and (y_1, …, y_m) in T, no profile 'dominates' another; that is, there exists no pair such that x_i ≤ y_i for all i = 1, …, m. Pareto Optimal designs have specific applications in economics, cognitive psychology and marketing research, where investigators use main effects linear models to infer how respondents value levels of costs and benefits from their preferences for sets of profiles offered to them. In such studies, it is desirable that no profile dominates the others in a set. This paper shows how to construct a Pareto Optimal subset, proves that a single Pareto Optimal subset is not a connected main effects plan, provides subsets of two or more attributes that are connected in symmetric designs, and gives corresponding results for asymmetric designs.
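One simple way to obtain a Pareto Optimal subset in a symmetric s^m factorial is to take all profiles with a fixed level-sum: dominance would force a strictly larger sum, so no profile in such a layer can dominate another. A sketch (the fixed-sum construction is offered as an illustration, not necessarily the paper's):

```python
from itertools import product

def dominates(x, y):
    """x dominates y if x_i >= y_i for every attribute and x != y."""
    return x != y and all(a >= b for a, b in zip(x, y))

# symmetric 3^3 design: keep every profile whose levels sum to 3
s, m, target = 3, 3, 3
profiles = list(product(range(s), repeat=m))
T = [p for p in profiles if sum(p) == target]

# verify T is Pareto Optimal: no profile in T dominates another
is_pareto = all(not dominates(q, p) for p in T for q in T)
```

For s = m = 3 the middle layer contains 7 of the 27 profiles: the six permutations of (0, 1, 2) together with (1, 1, 1).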

20.
The basic reproduction number of an infection, R0, is the average number of secondary infections generated by a single typical infective individual in a totally susceptible population. It is directly related to the effort required to eliminate infection. We consider statistical methods for estimating R0 from age-stratified serological survey data. The main difficulty is indeterminacy, since the contacts between individuals of different ages are not observed. We show that, given an estimate of the average age-specific hazard of infection, a particular leading left eigenfunction is required to specify R0. We review existing methods of estimation in the light of this indeterminacy. We suggest using data from several infections transmitted via the same route, and we propose that the choice of model be guided by a criterion based on similarity of their contact functions. This approach also allows model uncertainty to be taken into account. If one infection induces no lasting immunity, we show that the only additional assumption required to estimate R0 is that the contact function is symmetric. When matched data on two or more infections transmitted by the same route are available, the methods may be extended to incorporate the effect of individual heterogeneity. The approach can also be applied in partially vaccinated populations and to populations comprising loosely linked communities. The methods are illustrated with data on hepatitis A, mumps, rubella, parvovirus, Haemophilus influenzae type b and measles infection.
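With discrete age groups the eigenfunction statement becomes a matrix one: R0 is the leading eigenvalue of a next-generation matrix, here taken symmetric in line with the symmetry assumption mentioned above. A sketch with hypothetical entries:

```python
import numpy as np

# K[i, j]: expected new infections in age group i caused by one infective
# in group j over its infectious period (hypothetical values, symmetric
# contacts as in a who-acquires-infection-from-whom matrix)
K = np.array([[2.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 0.5]])

R0 = float(np.linalg.eigvalsh(K)[-1])   # leading eigenvalue of a symmetric K
p_crit = 1.0 - 1.0 / R0                 # uniform vaccination coverage threshold
```

The elimination-effort link quoted in the abstract follows directly: a fraction greater than 1 − 1/R0 of the population must be immune to drive the infection out under homogeneous mixing.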
