Similar Articles
20 similar articles found
1.
This paper focuses on estimating the number of species and the number of abundant species in a specific geographic region and, consequently, on drawing inferences about the number of rare species. The word 'species' is generic, referring to any objects in a population that can be categorized. In areas such as biology, ecology, and literature, species frequency distributions are usually severely skewed, in which case the population contains a few very abundant species and many rare ones. To model such a situation, we develop an asymmetric multinomial-Dirichlet probability model using species frequency data. Posterior distributions on the number of species and the number of abundant species are obtained, and posterior inferences are drawn using MCMC simulations. Simulations are used to demonstrate and evaluate the developed methodology. We apply the method to a DNA segment data set and a butterfly data set. Comparisons among different approaches to inferring the number of species are also discussed.
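As a hedged illustration of the Bayesian machinery involved, the sketch below conditions on a known pool of candidate species and uses a symmetric Dirichlet prior (the paper's model is asymmetric and also treats the number of species as unknown); the counts, the concentration `alpha`, and the abundance `threshold` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed counts for 10 candidate species (zeros allowed for unseen ones).
counts = np.array([120, 85, 40, 7, 3, 1, 1, 0, 0, 0])
alpha = 0.5            # symmetric Dirichlet prior concentration (assumption)
threshold = 0.05       # call a species "abundant" if its proportion exceeds this

# Posterior over species proportions is Dirichlet(alpha + counts).
draws = rng.dirichlet(alpha + counts, size=10_000)

# Posterior distribution of the number of abundant species.
n_abundant = (draws > threshold).sum(axis=1)
print("posterior mean #abundant:", n_abundant.mean())
print("95% interval:", np.percentile(n_abundant, [2.5, 97.5]))
```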

2.
This article is concerned with the simulation of one-day cricket matches. Given that only a finite number of outcomes can occur on each ball that is bowled, a discrete generator on a finite set is developed where the outcome probabilities are estimated from historical data on one-day international cricket matches. The probabilities depend on the batsman, the bowler, the number of wickets lost, the number of balls bowled and the innings. The proposed simulator appears to do a reasonable job of producing realistic results, and it allows investigators to address complex questions involving one-day cricket matches. The Canadian Journal of Statistics © 2009 Statistical Society of Canada
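A minimal sketch of such a discrete ball-by-ball generator follows. The outcome probabilities are invented placeholders, not the paper's fitted values, and the dependence on batsman, bowler, wickets, balls bowled, and innings is suppressed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ball-by-ball outcomes with illustrative (assumed) probabilities.
outcomes = np.array([0, 1, 2, 3, 4, 6, -1])   # runs scored; -1 codes a wicket
probs    = np.array([0.45, 0.30, 0.07, 0.01, 0.09, 0.03, 0.05])

def simulate_innings(max_balls=300, max_wickets=10):
    """Simulate one innings by drawing each ball from the discrete generator."""
    runs, wickets = 0, 0
    for _ in range(max_balls):
        ball = rng.choice(outcomes, p=probs)
        if ball == -1:
            wickets += 1
            if wickets == max_wickets:
                break
        else:
            runs += ball
    return runs, wickets

print(simulate_innings())
```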

3.
We propose a method of comparing two functional linear models in which the explanatory variables are functions (curves) and the responses can be either scalars or functions. In such models, the role of parameter vectors (or matrices) is played by integral operators acting on a function space. We test the null hypothesis that these operators are the same in two independent samples. The complexity of the test statistics increases as we move from scalar to functional responses and relax assumptions on the covariance structure of the regressors. They all, however, have an asymptotic chi-squared distribution whose number of degrees of freedom depends on the specific setting. The test statistics are readily computable using the R package fda, and have good finite sample properties. The test is applied to egg-laying curves of Mediterranean flies and to data from terrestrial magnetic observatories. The Canadian Journal of Statistics © 2009 Statistical Society of Canada

4.
We propose optimal procedures to achieve the goal of partitioning k multivariate normal populations into two disjoint subsets with respect to a given standard vector. Populations are defined as good or bad according to whether their Mahalanobis distances to a known standard vector are small or large. Partitioning k multivariate normal populations is reduced to partitioning k non-central Chi-square or non-central F distributions with respect to the corresponding non-centrality parameters, depending on whether the covariance matrices are known or unknown. The minimum required sample size for each population is determined to ensure that the probability of a correct decision attains a certain level. An example is given to illustrate our procedures.
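The following sketch illustrates the partitioning criterion only: it computes Mahalanobis distances of k population means to a standard vector under a known covariance matrix and splits the populations at an arbitrary cutoff. The means, covariance, and cutoff are toy values; the paper's procedures additionally control the probability of correct decision through the sample size.

```python
import numpy as np

# Known standard vector and common covariance (assumed known here).
mu0   = np.zeros(3)
Sigma = np.eye(3)
Sinv  = np.linalg.inv(Sigma)
cutoff = 1.5   # illustrative threshold separating "good" from "bad"

# Mean vectors of k = 3 populations (toy values).
means = np.array([[0.1,  0.0, -0.2],
                  [2.0,  1.5,  1.0],
                  [0.3, -0.4,  0.2]])

def mahalanobis_sq(m):
    """Squared Mahalanobis distance of mean m to the standard vector mu0."""
    d = m - mu0
    return float(d @ Sinv @ d)

good = [i for i, m in enumerate(means) if mahalanobis_sq(m) <= cutoff]
bad  = [i for i, m in enumerate(means) if mahalanobis_sq(m) >  cutoff]
print("good:", good, "bad:", bad)
```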

5.
In this paper, we consider simple random sampling without replacement from a dichotomous finite population. We investigate the accuracy of the Normal approximation to the Hypergeometric probabilities for a wide range of parameter values, including the nonstandard cases where the sampling fraction tends to one and where the proportion of the objects of interest in the population tends to the boundary values, zero and one. We establish a non-uniform Berry–Esseen theorem for the Hypergeometric distribution which shows that in the nonstandard cases, the rate of Normal approximation to the Hypergeometric distribution can be considerably slower than the rate of Normal approximation to the Binomial distribution. We also report results from a moderately large numerical study and provide some guidelines for using the Normal approximation to the Hypergeometric distribution in finite samples.
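The comparison the paper studies can be probed numerically. The sketch below evaluates the exact Hypergeometric CDF against a continuity-corrected Normal approximation with finite-population-corrected variance, at a large sampling fraction; the parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

N, K, n = 500, 50, 400          # population size, successes, sample size (fraction 0.8)
k = np.arange(stats.hypergeom.ppf(0.01, N, K, n),
              stats.hypergeom.ppf(0.99, N, K, n) + 1)

# Exact Hypergeometric CDF.
exact = stats.hypergeom.cdf(k, N, K, n)

# Normal approximation with finite-population correction and continuity correction.
mean = n * K / N
var  = n * (K / N) * (1 - K / N) * (N - n) / (N - 1)
approx = stats.norm.cdf(k + 0.5, loc=mean, scale=np.sqrt(var))

print("max CDF error:", np.abs(exact - approx).max())
```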

6.
This paper is concerned with undoing aliasing effects, which arise from discretely sampling a continuous‐time stochastic process. Such effects are manifested in the frequency‐domain relationships between the sampled and original processes. The authors describe a general technique to undo aliasing effects, given two processes, one being a time‐delayed version of the other. The technique is based on the observations that certain phase information between the two processes is unaffected by sampling, is completely determined by the (known) time delay, and contains sufficient information to undo aliasing effects. The authors illustrate their technique with a simulation example. The theoretical model is motivated by the helioseismological problem of determining modes of solar pressure waves. The authors apply their technique to solar radio data, and conclude that certain low‐frequency modes known in the helioseismology literature are likely the result of aliasing effects. The Canadian Journal of Statistics 38: 116–135; 2010 © 2010 Statistical Society of Canada
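The aliasing phenomenon itself (though not the authors' phase-based correction) is easy to demonstrate: two frequencies separated by the sampling rate produce numerically identical samples.

```python
import numpy as np

fs = 10.0                      # sampling rate (Hz)
t  = np.arange(100) / fs       # sample times

f_low, f_high = 2.0, 2.0 + fs  # a 12 Hz wave aliases onto 2 Hz when sampled at 10 Hz
x_low  = np.cos(2 * np.pi * f_low  * t)
x_high = np.cos(2 * np.pi * f_high * t)

# The two sampled sequences are indistinguishable:
print(np.allclose(x_low, x_high))   # True
```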

7.
Ranked set sampling (RSS) was first proposed by McIntyre [1952. A method for unbiased selective sampling, using ranked sets. Australian J. Agricultural Res. 3, 385–390] as an effective way to estimate the unknown population mean. Chuiv and Sinha [1998. On some aspects of ranked set sampling in parametric estimation. In: Balakrishnan, N., Rao, C.R. (Eds.), Handbook of Statistics, vol. 17. Elsevier, Amsterdam, pp. 337–377] and Chen et al. [2004. Ranked Set Sampling—Theory and Application. Lecture Notes in Statistics, vol. 176. Springer, New York] have provided excellent surveys of RSS and various inferential results based on RSS. In this paper, we use the idea of order statistics from independent and non-identically distributed (INID) random variables to propose ordered ranked set sampling (ORSS) and then develop optimal linear inference based on ORSS. We determine the best linear unbiased estimators based on ORSS (BLUE-ORSS) and show that they are more efficient than BLUE-RSS for the two-parameter exponential, normal and logistic distributions. Although this is not the case for the one-parameter exponential distribution, the relative efficiency of the BLUE-ORSS (to BLUE-RSS) is very close to 1. Furthermore, we compare both BLUE-ORSS and BLUE-RSS with the BLUE based on order statistics from a simple random sample (BLUE-OS). We show that BLUE-ORSS is uniformly better than BLUE-OS, while BLUE-RSS is not as efficient as BLUE-OS for small sample sizes (n < 5).
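A hedged sketch of plain RSS under perfect rankings follows; it shows numerically that the RSS sample mean has smaller variance than the simple-random-sample mean of the same size. It illustrates RSS only, not the ORSS/BLUE constructions developed in the paper, and the distribution and design constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def rss_sample(draw, set_size):
    """One RSS cycle: for each rank r, draw a fresh set of `set_size`
    units and keep only its r-th order statistic."""
    return np.array([np.sort(draw(set_size))[r] for r in range(set_size)])

draw = lambda m: rng.normal(loc=5.0, scale=2.0, size=m)
k, cycles, reps = 3, 4, 5000    # set size, cycles per sample, Monte Carlo reps

rss_means = [np.mean(np.concatenate([rss_sample(draw, k) for _ in range(cycles)]))
             for _ in range(reps)]
srs_means = [draw(k * cycles).mean() for _ in range(reps)]

print("var(RSS mean):", np.var(rss_means))
print("var(SRS mean):", np.var(srs_means))   # the RSS variance is smaller
```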

8.
We consider Dirichlet process mixture models in which the observed clusters in any particular dataset are not viewed as belonging to a finite set of possible clusters but rather as representatives of a latent structure in which objects belong to one of a potentially infinite number of clusters. As more information is revealed, the number of inferred clusters is allowed to grow. The precision parameter of the Dirichlet process is a crucial parameter that controls the number of clusters. We develop a framework for specifying the hyperparameters of the prior on the precision parameter that can be used in both the presence and the absence of subjective prior information about the level of clustering. Our approach is illustrated in an analysis of clustering brands at the magazine Which?. The results are compared with the approach of Dorazio (2009) via a simulation study.
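The role of the precision parameter can be seen from the standard prior expectation of the number of clusters K among n observations, E[K] = Σ_{i=1}^{n} α/(α + i − 1). The short sketch below evaluates this formula; it illustrates the sensitivity that motivates the paper's hyperparameter framework, not the framework itself.

```python
import numpy as np

def expected_clusters(alpha, n):
    """Prior expected number of clusters among n observations under a
    Dirichlet process with precision alpha: sum_i alpha / (alpha + i - 1)."""
    i = np.arange(1, n + 1)
    return np.sum(alpha / (alpha + i - 1))

for alpha in (0.1, 1.0, 10.0):
    print(alpha, round(expected_clusters(alpha, n=100), 2))
```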

9.
In recent years, analyses of dependence structures using copulas have become more popular than the standard correlation analysis. Starting from Aas et al. (2009), regular vine pair-copula constructions (PCCs) are considered the most flexible class of multivariate copulas. PCCs are intricate objects, but (conditional) independence present in the data can simplify and reduce them significantly. In this paper the authors detect (conditional) independence in a particular vine PCC model based on bivariate t copulas by deriving and implementing a reversible jump Markov chain Monte Carlo algorithm. However, the methodology is general and can be extended to any regular vine PCC and to all known bivariate copula families. The proposed approach considers model selection and estimation problems for PCCs simultaneously. The effectiveness of the developed algorithm is shown in simulations and its usefulness is illustrated in two real data applications. The Canadian Journal of Statistics 39: 239–258; 2011 © 2011 Statistical Society of Canada

10.
Consider a finite population of hidden objects, of large but unknown size. Consider searching for these objects for a period of time, at a certain cost, and receiving a reward depending on the sizes of the objects found. Suppose that the size and the discovery time of the objects both have unknown distributions, but that the conditional distribution of time given size is exponential, with failure rate an unknown non-negative, non-decreasing function of the size. The goal is to find an optimal way to stop the discovery process. Assuming that the above parameters are known, an optimal stopping time is derived and its asymptotic properties are studied. Then, an adaptive rule based on order-restricted estimates of the distributions from truncated data is presented. This adaptive rule is shown to perform nearly as well as the optimal stopping time for large population sizes.

11.
In many applications of generalized linear mixed models to clustered, correlated, or longitudinal data, we are often interested in testing whether a random effects variance component is zero. The usual asymptotic mixture of chi-square distributions for the score statistic testing constrained variance components does not necessarily hold. In this article, the author proposes and explores a parametric bootstrap test that appears to be valid based on its estimated level of significance under the null hypothesis. Results from a simulation study indicate that the bootstrap test attains a level much closer to the nominal one, whereas the asymptotic test is conservative, and that the bootstrap test is more powerful than the usual asymptotic score test based on a mixture of chi-squares. The proposed bootstrap test is illustrated using two sets of real-life data obtained from clinical trials. The Canadian Journal of Statistics © 2009 Statistical Society of Canada
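A hedged sketch of the parametric bootstrap idea follows, using a linear mixed model (via statsmodels) in place of a GLMM for simplicity: a likelihood-ratio statistic for the random-intercept variance is recomputed on data simulated repeatedly under the fitted null model. The data, the model, and the use of an LRT rather than a score statistic are all simplifications of the paper's setting.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Toy clustered data: 20 clusters of 10 observations, no true random effect.
g = np.repeat(np.arange(20), 10)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)
df = pd.DataFrame({"y": y, "x": x, "g": g})

def lrt_stat(data):
    """LRT comparing a random-intercept model with plain OLS (both by ML)."""
    m1 = smf.mixedlm("y ~ x", data, groups=data["g"]).fit(reml=False)
    m0 = smf.ols("y ~ x", data).fit()
    return 2 * (m1.llf - m0.llf)     # ~0 when the variance estimate hits the boundary

obs = lrt_stat(df)

# Parametric bootstrap: regenerate responses from the null (no random effect) fit.
m0 = smf.ols("y ~ x", df).fit()
B, stats_b = 200, []
for _ in range(B):
    db = df.copy()
    db["y"] = m0.fittedvalues + rng.normal(scale=np.sqrt(m0.scale), size=len(df))
    stats_b.append(lrt_stat(db))

print("bootstrap p-value:", np.mean(np.array(stats_b) >= obs))
```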

12.
Longitudinal surveys have emerged in recent years as an important data collection tool for population studies where the primary interest is to examine population changes over time at the individual level. Longitudinal data are often analyzed through the generalized estimating equations (GEE) approach. The vast majority of the existing literature on the GEE method, however, is developed under non-survey settings and is inappropriate for data collected through complex sampling designs. In this paper the authors develop a pseudo-GEE approach for the analysis of survey data. They show that survey weights must, and can, be appropriately accounted for in the GEE method under a joint randomization framework. The consistency of the resulting pseudo-GEE estimators is established under the proposed framework. Linearization variance estimators are developed for the pseudo-GEE estimators when the finite population sampling fractions are small or negligible, a scenario that often holds for large-scale surveys. Finite sample performances of the proposed estimators are investigated through an extensive simulation study using data from the National Longitudinal Survey of Children and Youth. The results show that the pseudo-GEE estimators and the linearization variance estimators perform well under several sampling designs and for both continuous and binary responses. The Canadian Journal of Statistics 38: 540–554; 2010 © 2010 Statistical Society of Canada
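The core idea, that design weights enter the estimating equations, can be sketched in a few lines. Below is a survey-weighted, working-independence estimating-equation solver for a marginal logistic model; the data and weights are simulated placeholders, and the paper's treatment of within-subject correlation and linearization variance estimation is omitted.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy survey data: design matrix X, binary responses y, and survey weights w
# (assumed known from the sampling design).
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
w = rng.uniform(1.0, 5.0, size=n)

beta = np.zeros(2)
for _ in range(25):                        # Newton-Raphson on the weighted EEs
    mu = 1 / (1 + np.exp(-X @ beta))
    U = X.T @ (w * (y - mu))               # weighted estimating function
    H = (X * (w * mu * (1 - mu))[:, None]).T @ X
    beta = beta + np.linalg.solve(H, U)

print("pseudo-GEE beta-hat:", beta)
```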

13.
Pharmacokinetic (PK) data often contain concentration measurements below the quantification limit (BQL). While specific values cannot be assigned to these observations, the observed BQL data are nevertheless informative, being known to lie below the lower limit of quantification (LLQ). Treating BQL values as missing data violates the usual missing-at-random (MAR) assumption underlying standard statistical methods and therefore leads to biased or less precise parameter estimates. By definition, these data lie within the interval [0, LLQ] and can be treated as censored observations. Statistical methods that handle censored data, such as maximum likelihood and Bayesian methods, are thus useful in modelling such data sets. The main aim of this work was to investigate the impact of the amount of BQL observations on the bias and precision of parameter estimates in population PK models (non-linear mixed effects models in general) under the maximum likelihood method as implemented in SAS and NONMEM, and under a Bayesian approach using Markov chain Monte Carlo (MCMC) as applied in WinBUGS. A second aim was to compare these different methods in dealing with BQL or censored data in a practical situation. The evaluation was illustrated by simulation based on a simple PK model, in which a number of data sets were simulated from a one-compartment, first-order elimination PK model. Several quantification limits were applied to each of the simulated data sets to generate data sets with given amounts of BQL data; the average percentage of BQL values ranged from 25% to 75%. The influence of BQL data on the bias and precision of all population PK model parameters, such as clearance and volume of distribution, was explored and compared under each estimation approach. Copyright © 2009 John Wiley & Sons, Ltd.
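The censoring idea can be sketched with a one-sample normal model rather than a population PK model: fully observed values contribute the density to the likelihood, while each BQL value contributes P(Z < LLQ). All values below are simulated placeholders.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Toy log-concentrations with a quantification limit censoring ~30% of them.
true_mu, true_sd = 1.0, 0.8
z = rng.normal(true_mu, true_sd, size=300)
llq = np.quantile(z, 0.30)
observed = z[z >= llq]
n_cens = np.sum(z < llq)          # only the count of BQL values is known

def neg_loglik(theta):
    mu, log_sd = theta
    sd = np.exp(log_sd)
    ll = stats.norm.logpdf(observed, mu, sd).sum()   # fully observed part
    ll += n_cens * stats.norm.logcdf(llq, mu, sd)    # censored part: P(Z < LLQ)
    return -ll

fit = minimize(neg_loglik, x0=[0.0, 0.0])
print("mu-hat:", fit.x[0], "sd-hat:", np.exp(fit.x[1]))
```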

14.
In many applications, a finite population contains a large proportion of zero values that make the population distribution severely skewed. An unequal-probability sampling plan compounds the problem, and as a result the normal approximation to the distribution of various estimators has poor precision, so the central-limit-theorem-based confidence intervals for the population mean are unsatisfactory. Complex designs also make it hard to pin down useful likelihood functions, hence a direct likelihood approach is not an option. In this paper, we propose a pseudo-likelihood approach. The proposed pseudo-log-likelihood function is an unbiased estimator of the log-likelihood function that would arise if the entire population were sampled. Simulations show that when the inclusion probabilities are related to the unit values, the pseudo-likelihood intervals are superior to existing methods in terms of coverage probability, the balance of non-coverage rates on the lower and upper sides, and interval length. An application to a data set from the Canadian Labour Force Survey-2000 also shows that the pseudo-likelihood method performs more appropriately than other methods. The Canadian Journal of Statistics 38: 582–597; 2010 © 2010 Statistical Society of Canada
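A hedged sketch of the pseudo-log-likelihood construction follows, using a toy zero-inflated exponential model: each sampled unit's log-likelihood contribution is weighted by the inverse of its inclusion probability, which makes the weighted sum design-unbiased for the census log-likelihood. The data, inclusion probabilities, and parametric model are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Sampled values (many zeros) and their inclusion probabilities.
y  = np.array([0.0, 0.0, 0.0, 2.1, 0.0, 7.5, 0.0, 0.0, 1.2, 0.0])
pi = np.array([0.02, 0.02, 0.02, 0.10, 0.02, 0.25, 0.02, 0.02, 0.08, 0.02])
w  = 1.0 / pi                      # Horvitz-Thompson weights

def neg_pseudo_loglik(theta):
    p, log_mu = theta              # p = P(Y > 0); mu = exponential mean
    if not 0 < p < 1:
        return np.inf
    mu = np.exp(log_mu)
    ll = np.where(y == 0, np.log(1 - p), np.log(p) - np.log(mu) - y / mu)
    return -np.sum(w * ll)         # weighted sum: unbiased for the census loglik

fit = minimize(neg_pseudo_loglik, x0=[0.3, 0.0], method="Nelder-Mead")
print("p-hat:", fit.x[0], "mu-hat:", np.exp(fit.x[1]))
```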

15.
Efficiency and robustness are two fundamental concepts in parametric estimation problems. It was long thought that there was an inherent contradiction between the aims of achieving robustness and efficiency; that is, a robust estimator could not be efficient and vice versa. It is now known that the minimum Hellinger distance approach introduced by Beran [R. Beran, Annals of Statistics 1977;5:445–463] is one way of reconciling these conflicting concepts. For parametric models, it has been shown that minimum Hellinger estimators achieve efficiency at the model density and simultaneously have excellent robustness properties. In this article, we examine the application of this approach in two semiparametric models. In particular, we consider a two-component mixture model and a two-sample semiparametric model. In each case, we investigate minimum Hellinger distance estimators of the finite-dimensional Euclidean parameters of particular interest and study their basic asymptotic properties. Small sample properties of the proposed estimators are examined using a Monte Carlo study. The results can be extended to semiparametric models of general form as well. The Canadian Journal of Statistics 37: 514–533; 2009 © 2009 Statistical Society of Canada
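A minimal sketch of minimum Hellinger distance estimation in a fully parametric normal location model (far simpler than the paper's semiparametric settings) is shown below: the model density is matched to a kernel density estimate by minimizing the Hellinger distance on a grid, and the resulting estimate resists gross outliers that distort the sample mean.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)

# Data from N(2, 1) contaminated with 5% gross outliers at 10.
x = np.concatenate([rng.normal(2.0, 1.0, 190), np.full(10, 10.0)])

# Nonparametric density estimate on a grid.
kde = stats.gaussian_kde(x)
grid = np.linspace(-5, 15, 2000)
f_hat = kde(grid)

def hellinger_sq(mu):
    """Squared Hellinger distance between the KDE and the N(mu, 1) density."""
    f_mod = stats.norm.pdf(grid, mu, 1.0)
    return np.trapz((np.sqrt(f_hat) - np.sqrt(f_mod)) ** 2, grid)

mhd = minimize_scalar(hellinger_sq, bounds=(-5, 15), method="bounded").x
print("sample mean:", x.mean())   # pulled toward the outliers
print("MHD estimate:", mhd)       # stays near 2
```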

16.
We consider variable selection in linear regression of geostatistical data that often arise in environmental and ecological studies. A penalized least squares procedure is studied for simultaneous variable selection and parameter estimation. Various penalty functions are considered, including smoothly clipped absolute deviation. Asymptotic properties of penalized least squares estimates, particularly the oracle properties, are established under suitable regularity conditions imposed on a random field model for the error process. Moreover, computationally feasible algorithms are proposed for estimating regression coefficients and their standard errors. Finite-sample properties of the proposed methods are investigated in a simulation study and comparison is made among different penalty functions. The methods are illustrated by an ecological dataset of landcover in Wisconsin. The Canadian Journal of Statistics 37: 607–624; 2009 © 2009 Statistical Society of Canada
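For reference, one concrete penalty in the class studied is SCAD; under an orthonormal design its penalized least squares solution has the closed-form thresholding rule of Fan & Li (2001), sketched below. This illustrates the penalty only; the paper's contribution concerns its behaviour under spatially dependent errors.

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule of Fan & Li (2001) for an orthonormal design;
    a = 3.7 is their recommended default."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    small = np.abs(z) <= 2 * lam
    mid = (np.abs(z) > 2 * lam) & (np.abs(z) <= a * lam)
    big = np.abs(z) > a * lam
    out[small] = np.sign(z[small]) * np.maximum(np.abs(z[small]) - lam, 0.0)
    out[mid] = ((a - 1) * z[mid] - np.sign(z[mid]) * a * lam) / (a - 2)
    out[big] = z[big]       # large coefficients are left unshrunk (oracle-like)
    return out

print(scad_threshold([0.3, 1.5, 5.0], lam=1.0))  # -> [0.  0.5 5. ]
```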

17.
The evaluation of new processor designs is an important issue in electrical and computer engineering. Architects use simulations to evaluate designs and to understand trade-offs and interactions among design parameters. However, due to the lengthy simulation time and limited resources, it is often practically impossible to simulate a full factorial design space, so effective sampling methods and predictive models are required. In this paper, the authors propose an automated performance prediction approach built on an adaptive sampling scheme that interactively works with the predictive model to select samples for simulation. These samples are then used to build Bayesian additive regression trees, which in turn are used to predict the whole design space. Both real data analysis and simulation studies show that the method is effective: although it samples very few design points, it generates highly accurate predictions at the unsampled points. Furthermore, the proposed model provides quantitative interpretation tools with which investigators can efficiently tune design parameters in order to improve processor performance. The Canadian Journal of Statistics 38: 136–152; 2010 © 2010 Statistical Society of Canada
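A hedged sketch of the adaptive loop follows, with a random forest standing in for BART (disagreement among trees serving as a crude proxy for BART's posterior uncertainty) and a cheap analytic function standing in for the expensive processor simulator; all design-space details are invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Toy 2-D grid standing in for the full design space, and a cheap function
# standing in for the lengthy architecture simulator.
g = np.linspace(0, 1, 25)
space = np.array([(a, b) for a in g for b in g])
simulate = lambda z: np.sin(6 * z[:, 0]) + z[:, 1] ** 2

idx = list(rng.choice(len(space), size=20, replace=False))   # initial sample
for _ in range(15):                       # adaptive sampling rounds
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(space[idx], simulate(space[idx]))
    # Spread of per-tree predictions as an uncertainty proxy; BART would
    # supply genuine posterior uncertainty here.
    per_tree = np.stack([t.predict(space) for t in model.estimators_])
    uncert = per_tree.std(axis=0)
    uncert[idx] = -np.inf                 # never re-simulate a sampled point
    idx.append(int(uncert.argmax()))      # simulate where the model is least sure

model.fit(space[idx], simulate(space[idx]))
print("max abs prediction error:", np.abs(model.predict(space) - simulate(space)).max())
```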

18.
The paper investigates random processes of geometrical objects in Euclidean spaces. General properties of the measure of total projections are derived by means of the Palm distribution. Explicit formulas for the variances of the projection measure are obtained for Poisson point processes of compact sets.

Intensity estimators of fibre (surface) processes are then studied by means of projection measures. A classification of direct and indirect probes is introduced. The indirect sampling design of vertical sections and projections is generalized and its statistical properties are derived.

19.
For capture–recapture models in which covariates are subject to measurement errors and missing data, a set of estimating equations is constructed to estimate the population size and the relevant parameters. These estimating equations can be solved by an algorithm similar to the EM algorithm. The proposed method is also applicable when covariates with no measurement errors have missing data. Simulation studies are used to assess the performance of the proposed estimator. The estimator is also applied to a capture–recapture experiment on the bird species Prinia flaviventris in Hong Kong. The Canadian Journal of Statistics 37: 645–658; 2009 © 2009 Statistical Society of Canada
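For orientation, the classical two-occasion, no-covariate baseline that estimating-equation approaches generalize is the Lincoln–Petersen estimator; Chapman's bias-corrected version is sketched below with toy counts.

```python
import numpy as np

# Two-occasion capture-recapture counts (toy values):
n1 = 120   # captured and marked on occasion 1
n2 = 100   # captured on occasion 2
m2 = 25    # marked recaptures on occasion 2

# Chapman's bias-corrected Lincoln-Petersen estimator of population size,
# with its standard variance estimate.
N_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
var_hat = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)
           / ((m2 + 1) ** 2 * (m2 + 2)))
print("N-hat:", round(N_hat), "SE:", round(np.sqrt(var_hat), 1))
```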

20.
Small area estimation has received considerable attention in recent years because of growing demand for small area statistics. Basic area‐level and unit‐level models have been studied in the literature to obtain empirical best linear unbiased prediction (EBLUP) estimators of small area means. Although this classical method is useful for estimating the small area means efficiently under normality assumptions, it can be highly influenced by the presence of outliers in the data. In this article, the authors investigate the robustness properties of the classical estimators and propose a resistant method for small area estimation, which is useful for downweighting any influential observations in the data when estimating the model parameters. To estimate the mean squared errors of the robust estimators of small area means, a parametric bootstrap method is adopted here, which is applicable to models with block diagonal covariance structures. Simulations are carried out to study the behaviour of the proposed robust estimators in the presence of outliers, and these estimators are also compared to the EBLUP estimators. Performance of the bootstrap mean squared error estimator is also investigated in the simulation study. The proposed robust method is also applied to some real data to estimate crop areas for counties in Iowa, using farm‐interview data on crop areas and LANDSAT satellite data as auxiliary information. The Canadian Journal of Statistics 37: 381–399; 2009 © 2009 Statistical Society of Canada
