Similar Articles
20 similar articles were retrieved.
1.
Summary.  The process of quality control of micrometeorological and carbon dioxide (CO2) flux data can be subjective and may lack repeatability, which would undermine the results of many studies. Multivariate statistical methods and time series analysis were used together and independently to detect and replace outliers in CO2 flux data derived from a Bowen ratio energy balance system. The results were compared with those produced by five experts who applied the current and potentially subjective protocol. All protocols were tested on the same set of three 5-day periods, when measurements were conducted in an abandoned agricultural field. The concordance of the protocols was evaluated by using the experts' opinion (mean ± 1.96 standard deviations) as a reference interval (the Bland–Altman method). Analysing the 15 days together, the statistical protocol that combined multivariate distance, multiple linear regression and time series analysis showed a concordance of 93% on a 20-min flux basis and 87% on a daily basis (only 2 days fell outside the reference interval), and the overall flux differed only by 1.7% (3.2 g CO2 m−2). An automated version of this or a similar statistical protocol could be used as a standard way of filling gaps and processing data from Bowen ratio energy balance and other techniques (e.g. eddy covariance). This would enforce objectivity in comparisons of CO2 flux data that are generated by different research groups and streamline the protocols for quality control.
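The Bland–Altman reference interval used in this abstract is simple to compute: take the differences between two protocols and form mean ± 1.96 standard deviations. A minimal sketch with simulated daily flux totals (all numbers hypothetical, not the study's data):

```python
import numpy as np

def bland_altman_limits(ref, test):
    """Bland-Altman limits of agreement: mean difference +/- 1.96 SD."""
    d = np.asarray(test) - np.asarray(ref)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias - half_width, bias + half_width

# Toy daily CO2 flux totals (g CO2 m^-2): expert consensus vs automated protocol
rng = np.random.default_rng(0)
expert = rng.normal(20.0, 5.0, size=15)
auto = expert + rng.normal(0.0, 1.0, size=15)

lo, hi = bland_altman_limits(expert, auto)
d = auto - expert
inside = float(np.mean((d >= lo) & (d <= hi)))   # fraction of concordant days
```

Days whose difference falls inside (lo, hi) would count as concordant, as in the daily-basis figure quoted above.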

2.
Summary.  Principal component analysis has become a fundamental tool of functional data analysis. It represents the functional data as X_i(t) = μ(t) + Σ_{1 ≤ l < ∞} η_{i,l} v_l(t), where μ is the common mean function, the v_l are the eigenfunctions of the covariance operator and the η_{i,l} are the scores. Inferential procedures assume that the mean function μ(t) is the same for all values of i. If, in fact, the observations do not come from one population, but rather their mean changes at some point(s), the results of principal component analysis are confounded by the change(s). It is therefore important to develop a methodology to test the assumption of a common functional mean. We develop such a test using quantities which can be readily computed in the R package fda. The null distribution of the test statistic is asymptotically pivotal with a well-known asymptotic distribution. The asymptotic test has excellent finite sample performance. Its application is illustrated on temperature data from England.
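The decomposition can be mimicked empirically on a discretized grid: eigendecompose the sample covariance matrix and project centred curves onto the leading eigenvectors. A sketch with two synthetic eigenfunctions (all functions and variances hypothetical; the paper's change-point test itself is not implemented here):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)                # common evaluation grid
n = 200
mu = np.sin(2 * np.pi * t)                # common mean function
# two orthogonal "eigenfunctions" (hypothetical smooth basis)
v1 = np.sqrt(2) * np.cos(2 * np.pi * t)
v2 = np.sqrt(2) * np.sin(4 * np.pi * t)
scores = rng.normal(0.0, [2.0, 0.5], size=(n, 2))    # eta_{i,l}
X = mu + scores[:, [0]] * v1 + scores[:, [1]] * v2   # X_i(t)

# empirical FPCA: eigendecomposition of the sample covariance operator
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
eigval, eigvec = np.linalg.eigh(cov)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]       # descending order
est_scores = Xc @ eigvec[:, :2]                      # estimated scores
```

With rank-2 curves, the third empirical eigenvalue is numerically zero and the leading estimated scores recover the true ones up to sign.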

3.
Short-term projections of the acquired immune deficiency syndrome (AIDS) epidemic in England and Wales have been regularly updated since the publication of the Cox report in 1988. The key approach for those updates has been the back-calculation method, which has been informally adapted to acknowledge various sources of uncertainty as well as to incorporate increasingly available information on the spread of the human immunodeficiency virus (HIV) in the population. We propose a Bayesian formulation of the back-calculation method which allows a formal treatment of uncertainty and the inclusion of extra information, within a single coherent composite model. Estimation of the variably dimensioned model is carried out by using reversible-jump Markov chain Monte Carlo methods. Application of the model to data for homosexual and bisexual males in England and Wales is presented, and the role of the various sources of information and model assumptions is appraised. Our results show a massive peak in HIV infections around 1983 and suggest that the incidence of AIDS has now reached a plateau, although there is still substantial uncertainty about the future.
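Back-calculation rests on a convolution identity: expected AIDS diagnoses are past HIV infections smoothed by the incubation-time distribution. The Bayesian reversible-jump machinery is beyond a short sketch, but the deterministic core can be illustrated (toy incubation distribution and infection curve, not epidemiological estimates):

```python
import numpy as np

T = 20                                                       # years of data
years = np.arange(T)
# toy incubation-time pmf from a Weibull-like CDF (hypothetical)
f = np.diff(1 - np.exp(-(np.arange(T + 1) / 10.0) ** 2))
# hypothetical infection curve peaking at year 8
h_true = np.exp(-0.5 * ((years - 8) / 3.0) ** 2)

# forward model: expected diagnoses a(t) = sum_{s<=t} h(s) f(t-s)
A = np.array([[f[t - s] if t >= s else 0.0 for s in range(T)]
              for t in range(T)])
a = A @ h_true

# naive back-calculation: recover infections from diagnoses by least squares
h_est, *_ = np.linalg.lstsq(A, a, rcond=None)
```

In practice the inversion is ill conditioned and diagnoses are noisy, which is exactly why the paper wraps this identity in a formal Bayesian model.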

4.
Running complex computer models can be expensive in computer time, while learning about the relationships between input and output variables can be difficult. An emulator is a fast approximation to a computationally expensive model that can be used as a surrogate for the model, to quantify uncertainty or to improve process understanding. Here, we examine emulators based on singular value decompositions (SVDs) and use them to emulate global climate and vegetation fields, examining how these fields are affected by changes in the Earth's orbit. The vegetation field may be emulated directly from the orbital variables, but an appealing alternative is to relate it to emulations of the climate fields, which involves high-dimensional input and output. The SVDs radically reduce the dimensionality of the input and output spaces and are shown to clarify the relationships between them. The method could potentially be useful for any complex process with correlated, high-dimensional inputs and/or outputs.
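The idea can be sketched in a few lines: SVD the centred output fields, keep the leading modes, and regress each mode amplitude on the inputs (here with plain least squares rather than a full statistical emulator; all dimensions and data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d_in, d_out = 60, 3, 500
X = rng.uniform(-1, 1, size=(n, d_in))           # e.g. orbital parameters
B = rng.normal(size=(d_in, d_out))
Y = X @ B + 0.01 * rng.normal(size=(n, d_out))   # high-dimensional "field"

# SVD of centred outputs: keep k leading modes, emulate their amplitudes
Ym = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Ym, full_matrices=False)
k = 3
coeffs = U[:, :k] * s[:k]                        # per-run mode amplitudes
W, *_ = np.linalg.lstsq(np.c_[np.ones(n), X], coeffs, rcond=None)

def emulate(x_new):
    c = np.r_[1.0, x_new] @ W                    # predicted amplitudes
    return Ym + c @ Vt[:k]                       # reconstruct full field

x_test = np.array([0.2, -0.5, 0.1])
pred = emulate(x_test)
truth = x_test @ B
err = np.linalg.norm(pred - truth) / np.linalg.norm(truth)
```

The 500-dimensional output is handled through only k = 3 mode amplitudes, which is the dimensionality reduction the abstract describes.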

5.
Many mathematical models involve input parameters, which are not precisely known. Global sensitivity analysis aims to identify the parameters whose uncertainty has the largest impact on the variability of a quantity of interest (output of the model). One of the statistical tools used to quantify the influence of each input variable on the output is the Sobol sensitivity index. We consider the statistical estimation of this index from a finite sample of model outputs. We study asymptotic and non-asymptotic properties of two estimators of Sobol indices. These properties are applied to significance tests and estimation by confidence intervals.
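A first-order Sobol index can be estimated with the classical pick–freeze scheme: correlate model outputs across two input samples that share only coordinate i. A sketch on an additive toy model whose indices are known analytically (1/5 and 4/5); this illustrates one standard estimator, not necessarily the two studied in the paper:

```python
import numpy as np

def sobol_first_order(f, d, i, n, rng):
    """Pick-freeze estimator of the first-order Sobol index of input i."""
    X = rng.uniform(size=(n, d))
    Xp = rng.uniform(size=(n, d))
    Xi = Xp.copy()
    Xi[:, i] = X[:, i]              # freeze coordinate i, resample the rest
    y, yi = f(X), f(Xi)
    return (np.mean(y * yi) - y.mean() * yi.mean()) / y.var()

# additive test model Y = X1 + 2*X2 with uniform inputs:
# Var(Y) = 5/12, so S1 = 1/5 and S2 = 4/5 analytically
f = lambda x: x[:, 0] + 2.0 * x[:, 1]
rng = np.random.default_rng(3)
S1 = sobol_first_order(f, 2, 0, 200_000, rng)
S2 = sobol_first_order(f, 2, 1, 200_000, rng)
```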

6.
Bayesian selection of variables is often difficult to carry out because of the challenge in specifying prior distributions for the regression parameters for all possible models, specifying a prior distribution on the model space and computations. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection. Several theoretical and computational properties of the prior are derived and illustrated with several examples. For the second, we propose a method for specifying an informative prior on the model space, and for the third we propose novel methods for computing the marginal distribution of the data. The new computational algorithms only require Gibbs samples from the full model to facilitate the computation of the prior and posterior model probabilities for all possible models. Several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables in that the elicitation is based on a prior prediction y0 for the response vector and a quantity a0 quantifying the uncertainty in y0. Then, y0 and a0 are used to specify a prior for the regression coefficients semi-automatically. Examples using real data are given to demonstrate the methodology.

7.
Penalized likelihood methods provide a range of practical modelling tools, including spline smoothing, generalized additive models and variants of ridge regression. Selecting the correct weights for penalties is a critical part of using these methods and in the single-penalty case the analyst has several well-founded techniques to choose from. However, many modelling problems suggest a formulation employing multiple penalties, and here general methodology is lacking. A wide family of models with multiple penalties can be fitted to data by iterative solution of the generalized ridge regression problem: minimize ||W^{1/2}(Xp − y)||² ρ + Σ_{i=1}^m θ_i p′S_i p (p is a parameter vector, X a design matrix, S_i a non-negative definite coefficient matrix defining the ith penalty with associated smoothing parameter θ_i, W a diagonal weight matrix, y a vector of data or pseudodata and ρ an 'overall' smoothing parameter included for computational efficiency). This paper shows how smoothing parameter selection can be performed efficiently by applying generalized cross-validation to this problem and how this allows non-linear, generalized linear and linear models to be fitted using multiple penalties, substantially increasing the scope of penalized modelling methods. Examples of non-linear modelling, generalized additive modelling and anisotropic smoothing are given.
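For fixed smoothing parameters the problem is a single linear solve, and GCV can then be minimized over the θ_i by direct search. The paper develops a far more efficient scheme; this brute-force grid sketch with two hypothetical diagonal penalties (ρ absorbed into the θ_i) only illustrates the objective:

```python
import numpy as np

def fit_multi_penalty(X, y, S_list, thetas, W=None):
    """Solve min ||W^(1/2)(Xp - y)||^2 + sum_i theta_i p'S_i p and score by GCV."""
    if W is None:
        W = np.eye(len(y))
    P = sum(t * S for t, S in zip(thetas, S_list))
    A = X.T @ W @ X + P
    p = np.linalg.solve(A, X.T @ W @ y)
    H = X @ np.linalg.solve(A, X.T @ W)          # influence matrix
    n = len(y)
    gcv = n * np.sum((y - X @ p) ** 2) / (n - np.trace(H)) ** 2
    return p, gcv

# toy example: separate ridge penalties on two coefficient blocks
rng = np.random.default_rng(4)
n, q = 100, 6
X = rng.normal(size=(n, q))
p_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y = X @ p_true + 0.1 * rng.normal(size=n)
S1 = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
S2 = np.diag([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# grid search over the two smoothing parameters by GCV
best = min(((fit_multi_penalty(X, y, [S1, S2], (t1, t2)), (t1, t2))
            for t1 in [0.01, 1.0, 100.0] for t2 in [0.01, 1.0, 100.0]),
           key=lambda r: r[0][1])
(p_hat, gcv), (t1_best, t2_best) = best
```

GCV correctly refuses to shrink the block of coefficients that carries real signal.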

8.
Suppose the p-variate random vector W, partitioned into q variables W1 and p − q variables W2, follows a multivariate normal mixture distribution. If the investigator is mainly interested in estimation of the parameters of the distribution of W1, there are two possibilities: (1) use only the data on W1 for estimation, and (2) estimate the parameters of the p-variate mixture distribution, and then extract the estimates of the marginal distribution of W1. In this article we study the choice between these two possibilities, mainly for the case of two mixture components with identical covariance matrices. We find the asymptotic distribution of the linear discriminant function coefficients using the work of Efron (1975) and O'Neill (1978), and give a Wald test for redundancy of W2. A simulation study gives further insights into conditions under which W2 should be used in the analysis: in summary, the inclusion of W2 seems justified if Δ_{2.1}, the Mahalanobis distance between the two component distributions based on the conditional distribution of W2 given W1, is at least 2.
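The quantity Δ²_{2.1} decomposes the full Mahalanobis distance: in the equal-covariance case it is the part contributed by W2 after accounting for W1, computable as the difference of the p-variate and q-variate squared distances. A sketch with hypothetical means and identity covariance:

```python
import numpy as np

def mahalanobis_sq(mu1, mu2, Sigma):
    """Squared Mahalanobis distance between two component means."""
    d = mu1 - mu2
    return float(d @ np.linalg.solve(Sigma, d))

# p = 4 variables, split into W1 (first q = 2) and W2 (the rest)
q = 2
mu_a = np.array([0.0, 0.0, 0.0, 0.0])
mu_b = np.array([1.0, 1.0, 1.5, 0.5])
Sigma = np.eye(4)

D2_full = mahalanobis_sq(mu_a, mu_b, Sigma)                  # based on all of W
D2_marg = mahalanobis_sq(mu_a[:q], mu_b[:q], Sigma[:q, :q])  # based on W1 only
D2_cond = D2_full - D2_marg    # Delta^2_{2.1}: extra separation from W2 given W1
```

Here Δ_{2.1} = √2.5 ≈ 1.58 < 2, so under the rule of thumb quoted above including W2 would be hard to justify.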

9.
In sequential analysis it is often necessary to determine the distributions of √t Ȳ_t and/or √a Ȳ_t, where t is a stopping time of the form t = inf{n ≥ 1 : n + S_n + ξ_n > a}, Ȳ_n is the sample mean of n independent and identically distributed random variables (iidrvs) Y_i with mean zero and variance one, S_n is the partial sum of iidrvs X_i with mean zero and a positive finite variance, and {ξ_n} is a sequence of random variables that converges in distribution to a random variable ξ as n → ∞, with ξ_n independent of (X_{n+1}, Y_{n+1}), (X_{n+2}, Y_{n+2}), . . . for all n ≥ 1. Anscombe's (1952) central limit theorem asserts that both √t Ȳ_t and √a Ȳ_t are asymptotically normal for large a, but a normal approximation is not accurate enough for many applications. Refined approximations are available only for a few special cases of the general setting above and are often very complex. This paper provides some simple Edgeworth approximations that are numerically satisfactory for the problems it considers.
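The flavour of a one-term Edgeworth refinement can be shown for the simpler fixed-n mean (not the randomly stopped version the paper treats): the skewness correction to the normal CDF visibly beats the plain normal approximation. A sketch using centred exponentials, which have skewness 2:

```python
import math
import numpy as np

def edgeworth_cdf(x, n, skew):
    """One-term Edgeworth approximation to P(sqrt(n) * Ybar <= x) for the
    standardized mean of n iid variables with mean 0, variance 1, skewness `skew`."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return Phi - skew * (x * x - 1) * phi / (6 * math.sqrt(n))

# Monte Carlo check with centred Exp(1) summands
rng = np.random.default_rng(5)
n, reps, x = 30, 200_000, 0.5
z = (rng.exponential(1.0, size=(reps, n)) - 1.0).sum(axis=1) / math.sqrt(n)
mc = float(np.mean(z <= x))
approx = edgeworth_cdf(x, n, skew=2.0)
normal = 0.5 * (1 + math.erf(x / math.sqrt(2)))
```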

10.
Let X_1, . . ., X_n be independent identically distributed random variables with a common continuous (cumulative) distribution function (d.f.) F, and let F_n be the empirical d.f. (e.d.f.) based on X_1, . . ., X_n. Let G be a smooth d.f. and G_θ = G(· − θ) its translation through θ ∈ R. Using a Kolmogorov–Lévy type metric ρ_α defined on the space of d.f.s on R, the paper derives both null and non-null limiting distributions of √n[ρ_α(F_n, G_{θ_n}) − ρ_α(F, G_θ)], √n(θ_n − θ) and √n ρ_α(G_{θ_n}, G_θ), where θ_n and θ are the minimum ρ_α-distance parameters for F_n and F from G, respectively. These distributions are known explicitly in important particular cases; with some complementary Monte Carlo simulations, they help us clarify our understanding of estimation using minimum distance methods and supremum type metrics. We advocate use of the minimum distance method with supremum type metrics in cases of non-null models. The resulting functionals are Hadamard differentiable and efficient. For small scale parameters the minimum distance functionals are close to medians of the parent distributions. The optimal small scale models result in minimum distance estimators having asymptotic variances very competitive and comparable with the best known robust estimators.
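A minimum-distance location estimate under a supremum metric can be sketched by grid search: for each candidate θ, evaluate the Kolmogorov distance between the e.d.f. and G(· − θ), then minimize (here ρ_α is replaced by the plain Kolmogorov metric and G is standard normal; a sketch, not the paper's construction):

```python
import numpy as np
from math import erf, sqrt

def kolmogorov_distance(x, cdf, theta):
    """Sup-distance between the e.d.f. of x and the model d.f. G(. - theta)."""
    xs = np.sort(x)
    n = len(xs)
    Fg = cdf(xs - theta)
    upper = np.arange(1, n + 1) / n     # e.d.f. just after each jump
    lower = np.arange(0, n) / n         # e.d.f. just before each jump
    return max(np.abs(upper - Fg).max(), np.abs(lower - Fg).max())

Phi = np.vectorize(lambda z: 0.5 * (1 + erf(z / sqrt(2))))  # standard normal d.f.
rng = np.random.default_rng(6)
x = rng.normal(1.5, 1.0, size=400)                          # true shift 1.5

grid = np.linspace(0.5, 2.5, 201)
dists = [kolmogorov_distance(x, Phi, th) for th in grid]
theta_hat = float(grid[int(np.argmin(dists))])              # minimum-distance estimate
```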

11.
Summary.  A deterministic computer model is to be used in a situation where there is uncertainty about the values of some or all of the input parameters. This uncertainty induces uncertainty in the output of the model. We consider the problem of estimating a specific percentile of the distribution of this uncertain output. We also suppose that the computer code is computationally expensive, so we can run the model only at a small number of distinct inputs. This means that we must consider our uncertainty about the computer code itself at all untested inputs. We model the output, as a function of its inputs, as a Gaussian process, and after a few initial runs of the code use a simulation approach to choose further suitable design points and to make inferences about the percentile of interest itself. An example is given involving a model that is used in sewer design.
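A stripped-down version of the approach: fit a Gaussian-process interpolator to a handful of runs of an "expensive" code, push the input uncertainty through the cheap emulator, and read off the percentile. This sketch uses noise-free RBF interpolation with a fixed length-scale and omits the paper's sequential design step; the model, design and input distribution are all hypothetical:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel matrix between 1-d point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

f = lambda x: np.sin(3 * x) + x              # stand-in for the expensive code
Xd = np.linspace(0, 1, 8)                    # small set of design points
yd = f(Xd)

K = rbf(Xd, Xd) + 1e-6 * np.eye(len(Xd))     # jitter for numerical stability
alpha = np.linalg.solve(K, yd)
gp_mean = lambda x: rbf(x, Xd) @ alpha       # GP posterior mean (zero prior mean)

rng = np.random.default_rng(7)
u = rng.uniform(0, 1, size=100_000)          # uncertain input X ~ U(0, 1)
p95_emulated = float(np.quantile(gp_mean(u), 0.95))
p95_true = float(np.quantile(f(u), 0.95))    # available here only because f is cheap
```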

12.
Non-parametric Regression with Dependent Censored Data
Abstract.  Let (X_i, Y_i) (i = 1, …, n) be n replications of a random vector (X, Y), where Y is supposed to be subject to random right censoring. The data (X_i, Y_i) are assumed to come from a stationary α-mixing process. We consider the problem of estimating the function m(x) = E(φ(Y) | X = x), for some known transformation φ. This problem is approached in the following way: first, we introduce a transformed variable that is not subject to censoring and whose conditional mean given X = x equals m(x), and then we estimate m(x) by applying local linear regression techniques. As a by-product, we obtain a general result on the uniform rate of convergence of kernel type estimators of functionals of an unknown distribution function, under strong mixing assumptions.
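The local linear step is standard: at each point x0, fit a kernel-weighted straight line and report its intercept. A sketch on uncensored synthetic data (the censoring transformation itself is not reproduced here; kernel and bandwidth are hypothetical choices):

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear estimate of m(x0) = E(Y | X = x0) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)       # kernel weights
    Z = np.c_[np.ones_like(X), X - x0]           # local design: intercept + slope
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)   # weighted least squares
    return float(beta[0])                        # intercept = fit at x0

rng = np.random.default_rng(8)
X = rng.uniform(0, 1, size=2000)
Y = np.sin(2 * np.pi * X) + 0.2 * rng.normal(size=2000)
m_hat = local_linear(0.25, X, Y, h=0.05)         # true m(0.25) = 1
```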

13.
There has been much recent interest in supersaturated designs and their application in factor screening experiments. Supersaturated designs have mainly been constructed by using the E(s²)-optimality criterion originally proposed by Booth and Cox in 1962. However, until now E(s²)-optimal designs have only been established with certainty for n experimental runs when the number of factors m is a multiple of n − 1, and in adjacent cases where m = q(n − 1) + r (|r| ≤ 2, q an integer). A method of constructing E(s²)-optimal designs is presented which allows a reasonably complete solution to be found for various numbers of runs n including n = 8, 12, 16, 20, 24, 32, 40, 48 and 64.
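The E(s²) criterion averages the squared column inner products over all factor pairs, and a Nguyen-type lower bound gives a benchmark. A sketch that scores random balanced two-level designs against that bound (random search only illustrates the criterion; it is not the paper's construction):

```python
import numpy as np
from itertools import combinations

def e_s2(D):
    """E(s^2) criterion for a two-level (+1/-1) design D of shape (n, m)."""
    m = D.shape[1]
    vals = [float(D[:, i] @ D[:, j]) ** 2 for i, j in combinations(range(m), 2)]
    return sum(vals) / len(vals)

def lower_bound(n, m):
    """Nguyen (1996)-type lower bound on E(s^2) for balanced designs."""
    return n ** 2 * (m - n + 1) / ((m - 1) * (n - 1))

# random-search sketch: balanced +/-1 columns, keep the best of many tries
rng = np.random.default_rng(9)
n, m = 8, 10                                  # supersaturated: m > n - 1
base = np.array([1] * 4 + [-1] * 4)
best = None
for _ in range(2000):
    D = np.column_stack([rng.permutation(base) for _ in range(m)])
    v = e_s2(D)
    if best is None or v < best[0]:
        best = (v, D)
es2, D = best
```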

14.
Parts of Britain resemble a melting pot that more than rivals any found in America. Some others appear to be social monocultures where almost everyone is white, married, has children and owns or is buying a house. But scratch beneath the superficial banality of middle England and there too is a rainbow: there is the widest variety of lives, life chances and experience. To reveal the social geology of Britain, Bethan Thomas and Danny Dorling have drawn re-projected maps – maps where the scale is measured in human lives.

15.
In the estimators t3, t4, t5 of Mukerjee, Rao & Vijayan (1987), b_yx and b_yz are partial regression coefficients of y on x and z, respectively, based on the smaller sample. With the above interpretation of b_yx and b_yz in t3, t4, t5, all the calculations in Mukerjee et al. (1987) are correct. In this connection, we also wish to make it explicit that b_xz in t5 is an ordinary and not a partial regression coefficient. The 'corrected' MSEs of t3, t4, t5, as given in Ahmed (1998, Section 3), are computed assuming that our b_yx and b_yz are ordinary and not partial regression coefficients. Indeed, we had no intention of giving estimators using the corresponding ordinary regression coefficients, which would lead to estimators inferior to those given by Kiregyera (1984). We accept responsibility for any notational confusion created by us and express regret to readers who have been confused by our notation. Finally, in consideration of the above, it may be noted that Tripathi & Ahmed's (1995) estimator t0, quoted also in Ahmed (1998), is no better than t5 of Mukerjee et al. (1987).

16.
Parallel individual and ecological analyses of data on residential radon have been performed using information on cases of lung cancer and population controls from a recent study in south-west England. For the individual analysis the overall results indicated that the relative risk of lung cancer at 100 Bq m−3 compared with 0 Bq m−3 was 1.12 (95% confidence interval (0.99, 1.27)) after adjusting for age, sex, smoking, county of residence and social class. In the ecological analysis substantial bias in the estimated effect of radon was present for one of the two counties involved unless an additional variable, urban–rural status, was included in the model, although this variable was not an important confounder in the individual level analysis. Most of the methods that have been recommended for overcoming the limitations of ecological studies would not in practice have proved useful in identifying this variable as an appreciable source of bias.

17.
Bayesian calibration of computer models
We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
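The paper's second point, allowing for model inadequacy, can be caricatured with a one-parameter grid posterior: ignoring the discrepancy yields an overconfident posterior, while inflating the error variance (a crude stand-in for the paper's Gaussian-process discrepancy term) widens it honestly. All numbers and functional forms are hypothetical:

```python
import numpy as np

# observations: computer model at theta = 2 plus a systematic discrepancy
# delta(x) = 0.3 sin(4x) that the model cannot represent, plus noise
eta = lambda x, th: th * x                     # computer model
rng = np.random.default_rng(10)
x = np.linspace(0, 1, 40)
z = eta(x, 2.0) + 0.3 * np.sin(4 * x) + 0.05 * rng.normal(size=40)

thetas = np.linspace(1.0, 3.0, 401)

def posterior(sigma):
    """Grid posterior over theta: flat prior, iid N(0, sigma^2) errors."""
    lp = np.array([-0.5 * np.sum((z - eta(x, th)) ** 2) / sigma ** 2
                   for th in thetas])
    w = np.exp(lp - lp.max())
    w /= w.sum()
    mean = float(np.sum(w * thetas))
    sd = float(np.sqrt(np.sum(w * (thetas - mean) ** 2)))
    return mean, sd

mean_naive, sd_naive = posterior(0.05)         # pretends all misfit is noise
mean_infl, sd_infl = posterior(np.sqrt(0.05 ** 2 + 0.3 ** 2 / 2))  # inflated
```

The inflated-error posterior is several times wider, reflecting the extra uncertainty that an unmodelled discrepancy should induce.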

18.
In competing risks a failure time T and a cause C, one of p possible, are observed. A traditional representation is via a vector (T_1, ..., T_p) of latent failure times such that T = min(T_1, ..., T_p); C is defined by T = T_C in the basic situation of failure from a single cause. There are several results in the literature to the effect that a joint distribution for (T_1, ..., T_p), in which the T_j are independent, can always be constructed to yield any given bivariate distribution for (C, T). For this reason the prevailing wisdom is that independence cannot be assessed from competing risks data, not even with arbitrarily large sample sizes (e.g. Prentice et al., 1978). A result was given by Crowder (1996) which shows that, under certain circumstances, independence can be assessed. The various results will be drawn together and a complete characterization can now be given in terms of independent-risks proxy models.

19.
A subset T of S is said to be a Pareto Optimal subset of m ordered attributes (factors) if, for profiles (combinations of attribute levels) (x_1, …, x_m) and (y_1, …, y_m) in T, no profile 'dominates' another; that is, there exists no pair such that x_i ≤ y_i for all i = 1, …, m. Pareto Optimal designs have specific applications in economics, cognitive psychology and marketing research, where investigators use main effects linear models to infer how respondents value levels of costs and benefits from their preferences for sets of profiles offered to them. In such studies, it is desirable that no profile dominates the others in a set. This paper shows how to construct a Pareto Optimal subset, proves that a single Pareto Optimal subset is not a connected main effects plan, provides subsets of two or more attributes that are connected in symmetric designs, and gives corresponding results for asymmetric designs.
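One simple way to obtain a Pareto Optimal subset in a symmetric s^m factorial is to take all profiles with a fixed level-sum: dominance would force a strictly larger sum, so no profile in such a layer can dominate another. A sketch (the fixed-sum construction is offered as an illustration, not necessarily the paper's):

```python
from itertools import product

def dominates(x, y):
    """x dominates y if x_i >= y_i for every attribute and x != y."""
    return x != y and all(a >= b for a, b in zip(x, y))

# symmetric 3^3 design: keep every profile whose levels sum to 3
s, m, target = 3, 3, 3
profiles = list(product(range(s), repeat=m))
T = [p for p in profiles if sum(p) == target]

# verify T is Pareto Optimal: no profile in T dominates another
is_pareto = all(not dominates(q, p) for p in T for q in T)
```

For s = m = 3 the middle layer contains 7 of the 27 profiles: the six permutations of (0, 1, 2) together with (1, 1, 1).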

20.
The basic reproduction number of an infection, R0, is the average number of secondary infections generated by a single typical infective individual in a totally susceptible population. It is directly related to the effort required to eliminate infection. We consider statistical methods for estimating R0 from age-stratified serological survey data. The main difficulty is indeterminacy, since the contacts between individuals of different ages are not observed. We show that, given an estimate of the average age-specific hazard of infection, a particular leading left eigenfunction is required to specify R0. We review existing methods of estimation in the light of this indeterminacy. We suggest using data from several infections transmitted via the same route, and we propose that the choice of model be guided by a criterion based on similarity of their contact functions. This approach also allows model uncertainty to be taken into account. If one infection induces no lasting immunity, we show that the only additional assumption required to estimate R0 is that the contact function is symmetric. When matched data on two or more infections transmitted by the same route are available, the methods may be extended to incorporate the effect of individual heterogeneity. The approach can also be applied in partially vaccinated populations and to populations comprising loosely linked communities. The methods are illustrated with data on hepatitis A, mumps, rubella, parvovirus, Haemophilus influenzae type b and measles infection.
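With discrete age groups the eigenfunction statement becomes a matrix one: R0 is the leading eigenvalue of a next-generation matrix, here taken symmetric in line with the symmetry assumption mentioned above. A sketch with hypothetical entries:

```python
import numpy as np

# K[i, j]: expected new infections in age group i caused by one infective
# in group j over its infectious period (hypothetical values, symmetric
# contacts as in a who-acquires-infection-from-whom matrix)
K = np.array([[2.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 0.5]])

R0 = float(np.linalg.eigvalsh(K)[-1])   # leading eigenvalue of a symmetric K
p_crit = 1.0 - 1.0 / R0                 # uniform vaccination coverage threshold
```

The elimination-effort link quoted in the abstract follows directly: a fraction greater than 1 − 1/R0 of the population must be immune to drive the infection out under homogeneous mixing.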
