Similar Literature
20 similar documents found (search time: 46 ms)
1.
The Analysis of Crop Variety Evaluation Data in Australia (total citations: 5; self-citations: 0; citations by others: 5)
The major aim of crop variety evaluation is to predict the future performance of varieties. This paper presents the routine statistical analysis of data from late-stage testing of crop varieties in Australia. It uses a two-stage approach for analysis. The data from individual trials from the current year are analysed using spatial techniques. The resultant table of variety-by-trial means is combined with tables from previous years to form the data for an overall mixed model analysis. Weights allow for the data being estimates with varying accuracy. In view of the predictive aim of the analysis, variety effects and interactions are regarded as random effects. Appropriate inferential tools have been developed to assist with interpretation of the results. Analyses must be conducted in a timely manner so that variety predictions can be published and disseminated to growers immediately after harvest each year. Factors which facilitate this include easy access to historic data and the use of specialist mixed model software.
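A minimal sketch of the second-stage idea, not the authors' pipeline: variety-by-trial means are modelled with variety as a random effect, so variety predictions are shrunken (BLUP) means. The data, column names, and effect sizes below are invented, and the weighting by first-stage precision is omitted for simplicity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
varieties = [f"V{i}" for i in range(20)]
trials = [f"T{j}" for j in range(8)]
true_effect = dict(zip(varieties, rng.normal(0, 0.5, len(varieties))))
rows = [(v, t, 4.0 + true_effect[v] + rng.normal(0, 0.3))
        for v in varieties for t in trials]
df = pd.DataFrame(rows, columns=["variety", "trial", "mean_yield"])

# Variety enters as a random intercept; trial is a fixed effect.
model = smf.mixedlm("mean_yield ~ C(trial)", df, groups=df["variety"])
fit = model.fit()
blups = {g: fit.random_effects[g].iloc[0] for g in fit.random_effects}
print(sorted(blups.items(), key=lambda kv: -kv[1])[:5])  # top 5 varieties
```

Treating variety effects as random is what makes the predictions shrink toward the overall mean, which is appropriate when the goal is predicting future performance rather than testing differences.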

2.
We consider the construction of designs for test-control field experiments, with particular attention paid to the effects of spatial correlation between adjoining plots. In contrast to previous approaches, in which very specific correlation structures were modelled, we explicitly allow a degree of uncertainty on the part of the experimenter. While fitting a particular correlation structure (together with a particular variance structure and regression response), the experimenter is assumed to seek protection against other possible structures in full neighbourhoods of these particular choices. Robustness, in a minimax sense, is obtained through a modification of the kriging estimation procedure and through the assignment of treatments to field plots.

3.
This article demonstrates the application of classification trees (decision trees), logistic regression (LR), and linear discriminant analysis (LDA) to classify water-quality data (i.e., whether the water is fit for drinking or not). The data on water quality were obtained from the Pakistan Council of Research in Water Resources (PCRWR) for two cities of Pakistan, one representing an industrial environment (Sialkot) and the other a non-industrial environment (Narowal). Three statistical tools were employed to classify the data: the decision-tree methodology using the Gini index, LR, and LDA, all implemented in R. The results obtained by these three techniques were compared using misclassification rates (a model with a lower misclassification rate is better). LR performed better than the other two techniques, while decision trees and LDA performed equally well; for illustration purposes, however, decision trees are comparatively easy to draw and interpret.
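A hedged sketch of the comparison described above, using a synthetic stand-in for the PCRWR data (the features and labels are invented): fit a Gini decision tree, logistic regression, and LDA, then compare misclassification rates on a held-out set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary "fit / not fit for drinking" data.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "tree (Gini)": DecisionTreeClassifier(criterion="gini", random_state=0),
    "logistic":    LogisticRegression(max_iter=1000),
    "LDA":         LinearDiscriminantAnalysis(),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    err = 1 - m.score(X_te, y_te)          # misclassification rate
    print(f"{name:12s} misclassification rate = {err:.3f}")
```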

4.
We set out IDR as a loglinear-model-based Moran's I test for Poisson count data that resembles the Moran's I residual test for Gaussian data. We evaluate its type I and type II error probabilities via simulations, and demonstrate its utility via a case study. When population sizes are heterogeneous, IDR is effective in detecting local clusters by local association terms with an acceptable type I error probability. When used in conjunction with local spatial association terms in loglinear models, IDR can also indicate the existence of a first-order global cluster that can hardly be removed by local spatial association terms. In this situation, IDR should not be directly applied for local cluster detection. In the case study of St. Louis homicides, we bridge loglinear-model methods for parameter estimation to exploratory data analysis, so that a uniform association term can be defined with spatially varied contributions among spatial neighbors. The method makes use of exploratory tools such as Moran's I scatter plots and residual plots to evaluate the magnitude of deviance residuals, and it is effective in modelling the shape, elevation, and magnitude of a local cluster in the model-based test.
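A minimal sketch in the same spirit, not the paper's IDR statistic itself: fit a Poisson loglinear model with an offset for heterogeneous population sizes, then compute Moran's I on the Pearson residuals over a rook-adjacency lattice. All data are simulated under no clustering.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
side = 10
n = side * side
pop = rng.integers(100, 1000, n)            # heterogeneous populations
counts = rng.poisson(pop * 0.01)            # null model: constant rate

# Rook adjacency weights on the grid.
W = np.zeros((n, n))
for i in range(side):
    for j in range(side):
        k = i * side + j
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            a, b = i + di, j + dj
            if 0 <= a < side and 0 <= b < side:
                W[k, a * side + b] = 1

fit = sm.GLM(counts, np.ones((n, 1)),
             family=sm.families.Poisson(),
             offset=np.log(pop)).fit()
r = fit.resid_pearson
I = (n / W.sum()) * (r @ W @ r) / (r @ r)   # Moran's I of the residuals
print(f"Moran's I of Pearson residuals: {I:.3f}")  # near 0 under no clustering
```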

5.
We describe inferactive data analysis, so named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis, allowing also for Bayesian data analysis. We see this as a useful step towards providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) selective inference, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG-DAG (a Data Analysis Generative DAG), and a selective change-of-variables formula is crucial to any practical implementation of inferactive data analysis via sampling these distributions. We discuss a canonical example of an incomplete cross-validation test statistic to discriminate between black-box models, and a real HIV dataset example to illustrate inference after making multiple queries on the data.

6.
Statistical space-time modelling has traditionally been concerned with separable covariance functions, meaning that the covariance function is a product of a purely temporal function and a purely spatial function. We draw attention to a physical dispersion model which could model phenomena such as the spread of an air pollutant. We show that this model has a non-separable covariance function. The model is well suited to a wide range of realistic problems which will be poorly fitted by separable models. The model operates successively in time: the spatial field at time t + 1 is obtained by 'blurring' the field at time t and adding a spatial random field. The model is first introduced at discrete time steps, and the limit is taken as the length of the time steps goes to 0. This gives a consistent continuous model with parameters that are interpretable in continuous space and independent of sampling intervals. Under certain conditions the blurring must be a Gaussian smoothing kernel. We also show that the model is generated by a stochastic differential equation which has been studied by several researchers previously.
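A small simulation of the blur-and-add construction described above: the field at time t + 1 is a Gaussian smoothing of the field at time t plus a fresh spatial noise field. The grid size, kernel widths, and noise scale are all illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
side, steps, sigma = 64, 50, 1.5
field = rng.normal(size=(side, side))
for t in range(steps):
    # Spatially correlated innovation added after each blurring step.
    innovation = gaussian_filter(rng.normal(size=(side, side)), sigma=2.0)
    field = gaussian_filter(field, sigma=sigma) + 0.3 * innovation
print(field.std())   # the process settles toward a stationary variance
```

Because each step couples space (the blur) and time (the recursion), the resulting covariance cannot factor into a spatial part times a temporal part, which is the non-separability the paper establishes.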

7.
Process capability indices (PCIs) are tools widely used by industry to determine the quality of products and the performance of manufacturing processes. Classic versions of these indices were constructed for processes whose quality characteristics have a normal distribution. In practice, many of these characteristics do not follow this distribution. In such a case, the classic PCIs must be modified to take into account the non-normality. Ignoring the effect of this non-normality can lead to misinterpretation of the process capability and ill-advised business decisions. An asymmetric non-normal model that is receiving considerable attention due to its good properties is the Birnbaum-Saunders (BS) distribution. We propose, develop, implement and apply a methodology based on PCIs for BS processes considering estimation, parametric inference, bootstrap and optimization tools. This methodology is implemented in the statistical software R. A simulation study is conducted to evaluate its performance. Real-world case studies with applications to three data sets are carried out to illustrate its potential. One of these data sets was already published and is associated with the electronic industry, whereas the other two are unpublished and associated with the food industry.
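A hedged sketch of a percentile-based capability index for a BS process (scipy calls this distribution `fatiguelife`). This is a Clements-type index, not necessarily the paper's exact PCI, and the specification limits below are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = stats.fatiguelife.rvs(0.5, scale=2.0, size=500, random_state=rng)

# Maximum-likelihood fit with the location fixed at 0.
c, loc, scale = stats.fatiguelife.fit(data, floc=0)
q_lo, med, q_hi = stats.fatiguelife.ppf([0.00135, 0.5, 0.99865],
                                        c, loc=loc, scale=scale)
LSL, USL = 0.2, 8.0                         # hypothetical spec limits
Cnp  = (USL - LSL) / (q_hi - q_lo)
Cnpk = min((USL - med) / (q_hi - med), (med - LSL) / (med - q_lo))
print(f"Cnp = {Cnp:.3f}, Cnpk = {Cnpk:.3f}")
```

Replacing the normal-theory 6-sigma spread with fitted BS quantiles is what keeps the index interpretable when the quality characteristic is skewed.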

8.
We define residuals for point process models fitted to spatial point pattern data, and we propose diagnostic plots based on them. The residuals apply to any point process model that has a conditional intensity; the model may exhibit spatial heterogeneity, interpoint interaction and dependence on spatial covariates. Some existing ad hoc methods for model checking (quadrat counts, scan statistic, kernel-smoothed intensity and Berman's diagnostic) are recovered as special cases. Diagnostic tools are developed systematically, by using an analogy between our spatial residuals and the usual residuals for (non-spatial) generalized linear models. The conditional intensity λ plays the role of the mean response. This makes it possible to adapt existing knowledge about model validation for generalized linear models to the spatial point process context, giving recommendations for diagnostic plots. A plot of smoothed residuals against spatial location, or against a spatial covariate, is effective in diagnosing spatial trend or covariate effects. Q-Q plots of the residuals are effective in diagnosing interpoint interaction.
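A back-of-envelope analogue of this residual idea, not the paper's full construction: for a fitted homogeneous Poisson model, take quadrat counts minus the fitted intensity times quadrat area, scaled Pearson-style. Large residuals concentrated in one region would suggest unmodelled spatial trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pts = rng.poisson(400)
pts = rng.random((n_pts, 2))                # unit square, complete spatial randomness

lam_hat = n_pts                             # fitted homogeneous intensity (area = 1)
k = 5                                       # 5 x 5 quadrats
counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=k)
expected = lam_hat / k**2
pearson = (counts - expected) / np.sqrt(expected)
print(np.round(pearson, 2))                 # roughly N(0,1)-sized under the model
```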

9.
This paper explains an approach to parameter estimation based on the idea of simultaneous models. Instead of using a single shape, such as the normal distribution, a simultaneous model uses a finite number of distinct shapes F, G, etc. Such simultaneous systems are tools for gauging the finite-sample behavior of estimators, and they can be applied in the design of an estimator with prescribed desirable properties. The problem considered in this paper is interval estimation for a scale parameter. We discuss, among other things, the computation of optimal estimators in simultaneous models and study more closely the case of protecting against heavy-tailed error distributions.

10.
Genetic data are in widespread use in ecological research, and an understanding of this type of data and its uses and interpretations will soon be an imperative for ecological statisticians. Here, we provide an introduction to the subject, intended for statisticians who have no previous knowledge of genetics. Although there are numerous types of genetic data, we restrict attention to multilocus genotype data from microsatellite loci. We look at two application areas in wide use: investigating population structure using genetic assignment and related techniques; and using genotype data in capture-recapture studies for estimating population size and demographic parameters. In each case, we outline the conceptual framework and draw attention to both the strengths and weaknesses of existing approaches to analysis and interpretation.

11.
Efforts to address a reproducibility crisis have generated several valid proposals for improving the quality of scientific research. We argue there is also a need to address the separate but related issues of relevance and responsiveness. To address relevance, researchers must produce what decision makers actually need to inform investments and public policy, that is, the probability that a claim is true or the probability distribution of an effect size given the data. The term responsiveness refers to the irregularity and delay with which issues about the quality of research are brought to light. Instead of relying on the good fortune that some motivated researchers will periodically conduct efforts to reveal potential shortcomings of published research, we could establish a continuous quality-control process for scientific research itself. Quality metrics could be designed through the application of statistical process control to the research enterprise. We argue that one quality-control metric, the probability that a research hypothesis is true, is required to address at least relevance and may also be part of the solution for improving responsiveness and reproducibility. This article proposes a "straw man" solution which could be the basis for implementing these improvements. As part of this solution, we propose one way to "bootstrap" priors. The processes required for improving reproducibility and relevance can also become part of a comprehensive statistical quality control for science itself by yielding continuously monitored metrics of the scientific performance of a field of research.
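One concrete way to compute a "probability the hypothesis is true" metric; this is our illustration via the positive predictive value of a significant finding, not necessarily the article's exact proposal. The power and alpha values are assumptions.

```python
def prob_hypothesis_true(prior, power=0.8, alpha=0.05):
    """P(H1 | significant result) by Bayes' rule: prior odds times power/alpha."""
    return prior * power / (prior * power + (1 - prior) * alpha)

for prior in (0.05, 0.25, 0.50):
    print(f"prior={prior:.2f} -> P(true | significant)="
          f"{prob_hypothesis_true(prior):.2f}")
```

The calculation makes the article's point vivid: when the prior probability of a true hypothesis in a field is low, even a significant result leaves the claim more likely false than true.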

12.
A well-known difficulty in survey research is that respondents' answers to questions can depend on arbitrary features of a survey's design, such as the wording of questions or the ordering of answer choices. In this paper, we describe a novel set of tools for analyzing survey data characterized by such framing effects. We show that the conventional approach to analyzing data with framing effects (randomizing survey-takers across frames and pooling the responses) generally does not identify a useful parameter. In its place, we propose an alternative approach and provide conditions under which it identifies the responses that are unaffected by framing. We also present several results for shedding light on the population distribution of the individual characteristic the survey is designed to measure.

13.
In a recent issue of this journal, Holgersson et al. [Dummy variables vs. category-wise models, J. Appl. Stat. 41(2) (2014), pp. 233-241, doi:10.1080/02664763.2013.838665] compared the use of dummy coding in regression analysis to the use of category-wise models (i.e. estimating separate regression models for each group) with respect to estimating and testing group differences in intercept and in slope. They presented three objections against the use of dummy variables in a single regression equation, which could be overcome by the category-wise approach. In this note, I first comment on each of these three objections and then draw attention to some other issues in comparing these two approaches. This commentary further clarifies the differences and similarities between dummy variable and category-wise approaches.
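A quick check of the relationship discussed above, on synthetic data: a single regression with dummy-by-slope interactions reproduces the coefficient estimates of separate per-group ("category-wise") fits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "g": np.repeat(["A", "B"], 50),
    "x": rng.normal(size=100),
})
df["y"] = np.where(df.g == "A", 1 + 2 * df.x, 3 - 1 * df.x) \
          + rng.normal(0, 0.5, 100)

pooled = smf.ols("y ~ C(g) * x", data=df).fit()           # dummy coding
sep = {g: smf.ols("y ~ x", data=d).fit() for g, d in df.groupby("g")}
print(pooled.params)                                      # same implied lines:
print({g: f.params.to_dict() for g, f in sep.items()})    # A: (1, 2), B: (3, -1)
```

The point estimates coincide, but the standard errors do not, because the pooled dummy-variable model assumes a common residual variance across groups; that difference is central to the comparison the note discusses.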

14.
This paper describes the modelling and fitting of Gaussian Markov random field spatial components within a Generalized Additive Model for Location, Scale and Shape (GAMLSS). This allows any or all of the parameters of the distribution for the response variable to be modelled using explanatory variables and spatial effects. The response variable distribution is allowed to be a non-exponential-family distribution. A new package developed in R to achieve this is presented. We use Gaussian Markov random fields to model the spatial effect in Munich rent data and explore some features and characteristics of the data. The potential of using spatial analysis within GAMLSS is discussed. We argue that the flexibility of parametric distributions, the ability to model all the parameters of the distribution, and the diagnostic tools of GAMLSS provide an ideal environment for modelling spatial features of data.

15.
It sometimes occurs that one or more components of the data exert a disproportionate influence on the model estimation. We need a reliable tool for identifying such troublesome cases in order to decide whether to eliminate them from the sample, when data collection was badly carried out, or otherwise to take care in using the model, because the results could be affected by such components. Since a measure for detecting influential cases in the linear regression setting was proposed by Cook [Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15-18], several new measures, besides extensions of Cook's measure to other models, have been suggested as single-case diagnostics. For most of them cutoff values have been recommended (see, for instance, [D.A. Belsley, E. Kuh, and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed., John Wiley & Sons, New York, Chichester, Brisbane, 2004]); however, the lack of a quantile-type cutoff for Cook's statistic has forced the analyst to rely only on index plots as diagnostic tools. Focusing on logistic regression, the aim of this paper is to provide the asymptotic distribution of Cook's distance in order to derive a meaningful cutoff point for detecting influential and leverage observations.
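A sketch of single-case Cook's distances for a logistic regression, computed from the GLM hat matrix and Pearson residuals on synthetic data. The paper's contribution, the asymptotic cutoff, is not reproduced here; this only shows the diagnostic being computed.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 200, 3
X = sm.add_constant(rng.normal(size=(n, p - 1)))
beta = np.array([-0.5, 1.0, -1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
mu = fit.fittedvalues
w = mu * (1 - mu)                                 # IRLS weights
WX = X * np.sqrt(w)[:, None]
H = WX @ np.linalg.inv(WX.T @ WX) @ WX.T          # GLM hat matrix
h = np.diag(H)                                    # leverages
r = (y - mu) / np.sqrt(w)                         # Pearson residuals
cook = r**2 * h / (p * (1 - h) ** 2)              # generalized Cook's distance
print(np.argsort(cook)[-5:])                      # five most influential cases
```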

16.
Bayesian palaeoclimate reconstruction (total citations: 1; self-citations: 0; citations by others: 1)
We consider the problem of reconstructing prehistoric climates by using fossil data that have been extracted from lake sediment cores. Such reconstructions promise to provide one of the few ways to validate modern models of climate change. A hierarchical Bayesian modelling approach is presented and its use, inversely, is demonstrated in a relatively small but statistically challenging exercise: the reconstruction of prehistoric climate at Glendalough in Ireland from fossil pollen. This computationally intensive method extends current approaches by explicitly modelling uncertainty and reconstructing entire climate histories. The statistical issues that are raised relate to the use of compositional data (pollen) with covariates (climate) which are available at many modern sites but are missing for the fossil data. The compositional data arise as mixtures and the missing covariates have a temporal structure. Novel aspects of the analysis include a spatial process model for compositional data, local modelling of lattice data, the use, as a prior, of a random walk with long-tailed increments, a two-stage implementation of the Markov chain Monte Carlo approach and a fast approximate procedure for cross-validation in inverse problems. We present some details, contrasting the method's reconstructions with those generated by a method in use in the palaeoclimatology literature. We suggest that the method provides a basis for resolving important challenges in palaeoclimate research, and we draw attention to several challenging statistical issues that remain to be overcome.

17.
Testing for spatial clustering of count data is an important problem in spatial data analysis. Several procedures have been proposed to this end, but despite their extensive use, studies of their fundamental theoretical properties are almost non-existent. The authors suggest two conditions that any reasonable test for spatial clustering should satisfy. These are based on the notion that the null hypothesis should be rejected almost surely as the amount of spatial clustering tends to infinity. The authors show that the chi-squared test and the Potthoff-Whittinghill V both have these properties, but that other classical tests do not.
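A minimal version of the chi-squared clustering test mentioned above, on simulated counts: with observed counts n_i and expected counts e_i proportional to region populations, large values of the statistic sum_i (n_i - e_i)^2 / e_i indicate clustering. The populations and rate here are invented, and the degrees of freedom are not adjusted for estimating the overall rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pop = rng.integers(1_000, 10_000, size=30)        # region populations
rate = 0.01
counts = rng.poisson(rate * pop)                  # simulated under no clustering
expected = counts.sum() * pop / pop.sum()         # expected proportional to pop

chi2, pval = stats.chisquare(counts, f_exp=expected)
print(f"chi2 = {chi2:.1f}, p = {pval:.3f}")       # should be unremarkable here
```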

18.
Data-analytic tools for models other than the normal linear regression model are relatively rare. Here we develop plots and diagnostic statistics for nonconstant variance for the random-effects model (REM). REMs for longitudinal data include both within- and between-subject variances. A basic assumption is that the two variance terms are constant across subjects. However, we often find that these variances are functions of covariates, and the data set has what we call explainable heterogeneity, which needs to be allowed for in the model. We characterize several types of heterogeneity of variance in REMs and develop three diagnostic tests using the score statistic: one for each of the two variance terms, and the third for a form of multivariate nonconstant variance. For each test we present an adjusted residual plot which can identify cases that are unusually influential on the outcome of the test.

19.
The Birnbaum-Saunders (BS) distribution is an asymmetric probability model that is receiving considerable attention. In this article, we propose a methodology based on a new class of BS models generated from the Student-t distribution. We obtain a recurrence relationship for a BS distribution based on a nonlinear skew-t distribution. Model parameter estimators are obtained by means of the maximum likelihood method and evaluated by Monte Carlo simulations. We illustrate the obtained results by analyzing two real data sets. These data analyses allow the adequacy of the proposed model to be shown and discussed by applying model-selection tools.
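The Student-t-generated BS family is not available in scipy, so as a stand-in this sketch fits the classical BS distribution (`fatiguelife`) by maximum likelihood and compares it against a log-normal using AIC, the kind of model-selection tool the article applies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = stats.fatiguelife.rvs(0.8, scale=1.5, size=300, random_state=rng)

def aic(dist, data, **fit_kw):
    # Both candidates fix loc=0 below, so the parameter counts are comparable.
    params = dist.fit(data, **fit_kw)
    loglik = np.sum(dist.logpdf(data, *params))
    return 2 * len(params) - 2 * loglik

print("BS (fatiguelife) AIC:", round(aic(stats.fatiguelife, data, floc=0), 1))
print("log-normal       AIC:", round(aic(stats.lognorm, data, floc=0), 1))
```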

20.
In statistical modeling, we strive to specify models that resemble data collected in studies or observed from processes. Consequently, distributional specification and parameter estimation are central to parametric models. Graphical procedures, such as the quantile-quantile (QQ) plot, are arguably the most widely used method of distributional assessment, though critics find their interpretation to be overly subjective. Formal goodness-of-fit tests are available and are quite powerful, but they only indicate whether there is a lack of fit, not why. In this article, we explore the use of the lineup protocol to inject rigor into graphical distributional assessment and compare its power to that of formal distributional tests. We find that lineup tests are considerably more powerful than traditional tests of normality. A further investigation into the design of QQ plots shows that de-trended QQ plots are more powerful than the standard approach as long as the plot keeps the distance scales in x and y the same. While we focus on diagnosing nonnormality, our approach is general and can be directly extended to the assessment of other distributions.
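A bare-bones lineup for distributional assessment; this is our sketch of the protocol, not the authors' code. One QQ plot of the observed data is hidden among 19 plots of data simulated from the fitted normal; if a viewer can pick out the real panel, normality is in doubt.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
observed = rng.exponential(size=80)               # secretly non-normal
mu, sd = observed.mean(), observed.std(ddof=1)    # fitted normal parameters

pos = rng.integers(20)                            # where the real data hides
fig, axes = plt.subplots(4, 5, figsize=(10, 8))
for i, ax in enumerate(axes.flat):
    y = observed if i == pos else rng.normal(mu, sd, size=observed.size)
    stats.probplot(y, dist="norm", plot=ax)       # normal QQ plot per panel
    ax.set_title(str(i)); ax.set_xlabel(""); ax.set_ylabel("")
fig.tight_layout()
plt.show()
print("real data was in panel", pos)
```

The protocol's appeal is that the viewer's pick has a calibrated null distribution: under the model, each of the 20 panels is equally likely to be chosen, giving a valid p-value of 1/20 for a correct identification.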
