1.

Estimation of misclassification probabilities by bootstrap methods

Samprit Chatterjee Sangit Chatterjee 《Communications in Statistics - Simulation and Computation》2013,42(6):645-656

Several methods have been proposed to estimate the misclassification probabilities when a linear discriminant function is used to classify an observation into one of several populations. We describe the application of bootstrap sampling to this problem. The proposed method not only furnishes estimates of the misclassification probabilities but also provides an estimate of their standard error. The method is illustrated by a small simulation experiment. It is then applied to three published, readily accessible data sets, which are typical of the large, medium and small data sets encountered in practice.
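The bootstrap idea in this abstract can be sketched as follows. This is only an illustrative variant and not necessarily the authors' scheme: it assumes two classes, equal priors, and an out-of-bag error evaluation, and the function names (`fit_lda`, `predict`, `bootstrap_error`) are hypothetical. The spread of the replicate error rates is used as a rough standard error.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class linear discriminant with pooled covariance and equal priors (assumed)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)   # discriminant direction
    c = w @ (m0 + m1) / 2             # midpoint threshold
    return w, c

def predict(model, X):
    w, c = model
    return (X @ w > c).astype(int)

def bootstrap_error(X, y, n_boot=200, seed=0):
    """Mean and standard deviation of out-of-bag error rates over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # rows left out of this resample
        # Skip degenerate resamples (no held-out rows or a nearly empty class).
        if oob.size == 0 or min(np.sum(y[idx] == 0), np.sum(y[idx] == 1)) < 2:
            continue
        model = fit_lda(X[idx], y[idx])
        errs.append(np.mean(predict(model, X[oob]) != y[oob]))
    errs = np.asarray(errs)
    return errs.mean(), errs.std(ddof=1)
```

Calling `bootstrap_error(X, y)` on an `(n, d)` feature array and a 0/1 label vector returns the estimated misclassification probability together with its bootstrap spread.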

2.

Prior distributions for stratified capture-recapture models

J. A. Dupuis 《Journal of applied statistics》2002,29(1-4):225-237

We consider the Arnason-Schwarz model, usually used to estimate survival and movement probabilities from capture-recapture data. A missing data structure of this model is constructed which allows a clear separation of information relative to capture and relative to movement. Extensions of the Arnason-Schwarz model are considered. For example, we consider a model that takes into account both the individual migration history and the individual reproduction history. Biological assumptions of these extensions are summarized via a directed graph. Owing to missing data, the posterior distribution of parameters is numerically intractable. To overcome those computational difficulties we advocate a Gibbs sampling algorithm that takes advantage of the missing data structure inherent in capture-recapture models. Prior information on survival, capture and movement probabilities typically consists of a prior mean and of a prior 95% credible confidence interval. Dirichlet distributions are used to incorporate some prior information on capture, survival probabilities, and movement probabilities. Finally, the influence of the prior on the Bayesian estimates of movement probabilities is examined.

3.

Bias in transition-specific survival and movement probabilities estimated using capture-recapture data

Jay B. Hestbeck 《Journal of applied statistics》1995,22(5-6):737-750

Transition probabilities can be estimated when capture-recapture data are available from each stratum on every capture occasion using a conditional likelihood approach with the Arnason-Schwarz model. To decompose the fundamental transition probabilities into derived parameters, all movement probabilities must sum to 1 and all individuals in stratum r at time i must have the same probability of survival regardless of which stratum the individual is in at time i + 1. If movement occurs among strata at the end of a sampling interval, survival rates of individuals from the same stratum are likely to be equal. However, if movement occurs between sampling periods and survival rates of individuals from the same stratum are not the same, estimates of stratum survival can be confounded with estimates of movement causing both estimates to be biased. Monte Carlo simulations were made of a three-sample model for a population with two strata using SURVIV. When differences were created in transition-specific survival rates for survival rates from the same stratum, relative bias was <2% in estimates of stratum survival and capture rates but relative bias in movement rates was much higher and varied. The magnitude of the relative bias in the movement estimate depended on the relative difference between the transition-specific survival rates and the corresponding stratum survival rate. The direction of the bias in movement rate estimates was opposite to the direction of this difference. Increases in relative bias due to increasing heterogeneity in probabilities of survival, movement and capture were small except when survival and capture probabilities were positively correlated within individuals.

4.

Prior distributions for stratified capture-recapture models

J. A. Dupuis 《Journal of applied statistics》2002,29(1):225-237

We consider the Arnason-Schwarz model, usually used to estimate survival and movement probabilities from capture-recapture data. A missing data structure of this model is constructed which allows a clear separation of information relative to capture and relative to movement. Extensions of the Arnason-Schwarz model are considered. For example, we consider a model that takes into account both the individual migration history and the individual reproduction history. Biological assumptions of these extensions are summarized via a directed graph. Owing to missing data, the posterior distribution of parameters is numerically intractable. To overcome those computational difficulties we advocate a Gibbs sampling algorithm that takes advantage of the missing data structure inherent in capture-recapture models. Prior information on survival, capture and movement probabilities typically consists of a prior mean and of a prior 95% credible confidence interval. Dirichlet distributions are used to incorporate some prior information on capture, survival probabilities, and movement probabilities. Finally, the influence of the prior on the Bayesian estimates of movement probabilities is examined.

5.

A simple semiparametric estimator for sample selection models

Wang Yafeng (王亚峰) 《Statistical Research (统计研究)》2012,29(2):88-93

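This paper develops a two-stage semiparametric estimator for sample selection models. In the first stage, the discrete choice probabilities are estimated using a log-Euclidean distributional discrepancy measure. In the second stage, a nonparametric sieve method is used to estimate a partially linear model with parametric and nonparametric components, which yields estimates of the model parameters. Compared with existing semiparametric estimators in the literature, this estimator is simpler to compute and carries a relatively small computational burden. We show that the estimator is consistent and asymptotically normal, and we give a formula for its asymptotic variance. Monte Carlo simulation results agree with our theoretical conclusions.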

6.

Tests of symmetry with one-sided alternatives in three-way contingency tables

Ping Ye Bhaskar Bhattacharya 《Statistical Papers》2011,52(1):33-51

For a two-dimensional contingency table of probabilities, the concept of symmetry around the main diagonal is well defined. Statistical hypothesis tests of symmetry versus positive bias have also been explored. For tables of higher (three or more) dimensions, however, different concepts of symmetry are available. In this study, we consider statistical inference procedures for symmetry in partial tables versus various biases in three-dimensional tables. We find the maximum likelihood estimates of the cell probabilities and the asymptotic distribution of the likelihood ratio test statistic in each case. Simulation studies are used to investigate the sizes and powers of the tests. The methodologies developed are applied to real data sets.

7.

Non-parametric estimation of population size from capture–recapture data when the capture probability depends on a covariate

Richard Huggins Wen-Han Hwang 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(4):429-443

Summary. In capture–recapture experiments the capture probabilities may depend on individual covariates such as an individual's weight or age. Typically this dependence is modelled through simple parametric functions of the covariates. Here we first demonstrate that misspecification of the model can produce biased estimates and subsequently develop a non-parametric procedure to estimate the functional relationship between the probability of capture and a single covariate. This estimator is then incorporated in a Horvitz–Thompson estimator to estimate the size of the population. The resulting estimators are evaluated in a simulation study and applied to a data set on captures of the Mountain Pygmy Possum.
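The Horvitz–Thompson step mentioned in this abstract can be illustrated with a short sketch. It assumes that a per-occasion capture probability function of the covariate is already available (the abstract's non-parametric estimator is not reproduced here) and that capture occasions are independent; the function name `horvitz_thompson_size` and its parameters are hypothetical.

```python
import numpy as np

def horvitz_thompson_size(covariates, capture_prob, n_occasions):
    """Population size estimate: sum of 1 / P(captured at least once) over captured animals."""
    x = np.asarray(covariates, dtype=float)
    p = capture_prob(x)                        # per-occasion capture probability p(x)
    p_any = 1.0 - (1.0 - p) ** n_occasions     # probability of at least one capture
    return float(np.sum(1.0 / p_any))
```

Each captured individual stands in for `1 / p_any` members of the population, so individuals that are hard to catch carry more weight in the total.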

8.

Discriminant analysis with stratified prior probabilities

Berry Wilson 《Communications in Statistics - Theory and Methods》2013,42(5):1283-1295

This study investigates the use of stratification to improve discrimination when prior probabilities vary across strata of a population of interest. Sources of heterogeneity in prior probabilities include differences in geographic locale, age differences in the population studied, or differences in the time component of the data collected. The article suggests using logistic regression both to identify the underlying stratification and to estimate prior probabilities. A simulation study compares misclassification rates under two alternative stratification schemes with the traditional discriminant approach that ignores stratification in favor of pooled prior estimates. The simulations show that large asymptotic gains can be realized by stratification, and that these gains can be realized in finite samples, given moderate differences in prior probabilities.

9.

Using a correlated probit model approximation to estimate the variance for binary matched pairs

Waddington D. Thompson R. 《Statistics and Computing》2004,14(2):83-90

A correlated probit model approximation for conditional probabilities (Mendell and Elston 1974) is used to estimate the variance for binary matched pairs data by maximum likelihood. Using asymptotic data, the bias of the estimates is shown to be small for a wide range of intra-class correlations and incidences. This approximation is also compared with other recently published, or implemented, improved approximations. For the small sample examples presented, it shows a substantial advantage over other approximations. The method is extended to allow covariates for each observation, and fitting by iteratively reweighted least squares.

10.

Bayesian analysis for incomplete multi-way contingency tables with nonignorable nonresponse

Yousung Park 《Journal of applied statistics》2010,37(9):1439-1453

We propose Bayesian methods with five types of priors to estimate cell probabilities in an incomplete multi-way contingency table under nonignorable nonresponse. In this situation, the maximum likelihood (ML) estimates often fall on the boundary of the parameter space, causing the ML estimates to become unstable. To deal with such a multi-way table, we present an EM algorithm which generalizes the previous algorithm used for incomplete one-way tables. Three of the five types of priors were previously introduced while the other two are newly proposed to reflect different response patterns between respondents and nonrespondents. Data analysis and simulation studies show that Bayesian estimates based on the three older priors can be worse than the ML estimates regardless of whether a boundary solution occurs, contrary to previous studies. The Bayesian estimates from the two new priors are most preferable when a boundary solution occurs. We provide an illustrative example using data from a study of the relationship between a mother's smoking and her newborn's weight.

11.

Estimation of prolongation of hospital stay attributable to nosocomial infections: New approaches based on multistate models

Gabi Schulgen Martin Schumacher 《Lifetime data analysis》1996,2(3):219-240

Evaluation of the impact of nosocomial infection on duration of hospital stay usually relies on estimates obtained in prospective cohort studies. However, the statistical methods used to estimate the extra length of stay are usually not adequate. A naive comparison of duration of stay in infected and non-infected patients is not adequate to estimate the extra hospitalisation time due to nosocomial infections. Matching for duration of stay prior to infection can compensate in part for the bias of ad hoc methods. New model-based approaches have been developed to estimate the excess length of stay. It will be demonstrated that statistical models based on multivariate counting processes provide an appropriate framework to analyse the occurrence and impact of nosocomial infections. We will propose and investigate new approaches to estimate the extra time spent in hospitals attributable to nosocomial infections based on functionals of the transition probabilities in multistate models. Additionally, within the class of structural nested failure time models an alternative approach to estimate the extra stay due to nosocomial infections is derived. The methods are illustrated using data from a cohort study on 756 patients admitted to intensive care units at the University Hospital in Freiburg.

12.

Two-stage approaches to the analysis of occupancy data I: the homogeneous case (analysis of occupancy data)

Natalie Karavarsamis Richard M. Huggins 《Communications in Statistics - Theory and Methods》2020,49(19):4751-4761

Abstract

Occupancy models are used in statistical ecology to estimate species dispersion. The two components of an occupancy model are the detection and occupancy probabilities, with the main interest being in the occupancy probabilities. We show that for the homogeneous occupancy model there is an orthogonal transformation of the parameters that gives a natural two-stage inference procedure based on a conditional likelihood. We then extend this to a partial likelihood that gives explicit estimators of the model parameters. By allowing the separate modeling of the detection and occupancy probabilities, the extension of the two-stage approach to more general models has the potential to simplify the computational routines used there.

13.

Survival estimation and the effects of dependency among animals

Joel A. Schmutz David H. Ward James S. Sedinger Eric A. Rexstad 《Journal of applied statistics》1995,22(5):673-682

Survival models assume that fates of individuals are independent, yet the robustness of this assumption has been poorly quantified. We examine how empirically derived estimates of the variance of survival rates are affected by dependency in survival probability among individuals. We used Monte Carlo simulations to generate known amounts of dependency among pairs of individuals and analyzed these data with Kaplan-Meier and Cormack-Jolly-Seber models. Dependency significantly increased these empirical variances as compared to theoretically derived estimates of variance from the same populations. Using resighting data from 168 pairs of black brant (Branta bernicla nigricans), we used a resampling procedure and program RELEASE to estimate empirical and mean theoretical variances. We estimated that the relationship between paired individuals caused the empirical variance of the survival rate to be 155% larger than the empirical variance for unpaired individuals. Monte Carlo simulations and use of this resampling strategy can provide investigators with information on how robust their data are to this common assumption of independent survival probabilities.

14.

Generalised Linear Models Incorporating Population Level Information: An Empirical Likelihood Based Approach

Chaudhuri S Handcock MS Rendall MS 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2008,70(2):311-328

In many situations information from a sample of individuals can be supplemented by population level information on the relationship between a dependent variable and explanatory variables. Inclusion of the population level information can reduce bias and increase the efficiency of the parameter estimates. Population level information can be incorporated via constraints on functions of the model parameters. In general the constraints are nonlinear, making the task of maximum likelihood estimation harder. In this paper we develop an alternative approach exploiting the notion of an empirical likelihood. It is shown that within the framework of generalised linear models, the population level information corresponds to linear constraints, which are comparatively easy to handle. We provide a two-step algorithm that produces parameter estimates using only unconstrained estimation. We also provide computable expressions for the standard errors. We give an application to demographic hazard modelling by combining panel survey data with birth registration data to estimate annual birth probabilities by parity.

15.

Survival estimation and the effects of dependency among animals

Joel A. Schmutz David H. Ward James S. Sedinger Eric A. Rexstad 《Journal of applied statistics》1995,22(5-6):673-682

Survival models assume that fates of individuals are independent, yet the robustness of this assumption has been poorly quantified. We examine how empirically derived estimates of the variance of survival rates are affected by dependency in survival probability among individuals. We used Monte Carlo simulations to generate known amounts of dependency among pairs of individuals and analyzed these data with Kaplan-Meier and Cormack-Jolly-Seber models. Dependency significantly increased these empirical variances as compared to theoretically derived estimates of variance from the same populations. Using resighting data from 168 pairs of black brant (Branta bernicla nigricans), we used a resampling procedure and program RELEASE to estimate empirical and mean theoretical variances. We estimated that the relationship between paired individuals caused the empirical variance of the survival rate to be 155% larger than the empirical variance for unpaired individuals. Monte Carlo simulations and use of this resampling strategy can provide investigators with information on how robust their data are to this common assumption of independent survival probabilities.

16.

Confidence limits for estimates of totals from stratified samples, with application to Medicare Part B overpayment audits

Donna L. Mohr 《Journal of applied statistics》2005,32(7):757-769

Superpopulation models are proposed that should be appropriate for modelling sample-based audits of Medicare payments and other overpayment situations. Simulations are used to estimate the coverage probabilities of confidence intervals formed using the standard Stratified Expansion and Combined Ratio estimators of the total. Despite severe departures from the usual model of normal deviations, these methods have actual coverage probabilities reasonably close to the nominal level specified by the US government's sampling guidelines. An exception occurs when all claims from a single sampling unit are either completely allowed, or completely denied, and for this situation an alternative is explored. A balanced sampling design is also examined, but shown to make no improvement over ordinary stratified samples used in conjunction with ratio estimates.

17.

Bayesian credible intervals for response surface optima

Richard J. Fox David Elgart S. Christopher Davis 《Journal of statistical planning and inference》2009

In response surface methodology, one is usually interested in estimating the optimal conditions based on a small number of experimental runs which are designed to optimally sample the experimental space. Typically, regression models are constructed from the experimental data and interrogated in order to provide a point estimate of the independent variable settings predicted to optimize the response. Unfortunately, these point estimates are rarely accompanied by uncertainty intervals. Though classical frequentist confidence intervals can be constructed for unconstrained quadratic models, higher order, constrained or nonlinear models are often encountered in practice. Existing techniques for constructing uncertainty estimates in such situations have not been implemented widely, due in part to the need to set adjustable parameters or because of limited or difficult applicability to constrained or nonlinear problems. To address these limitations, a Bayesian method of determining credible intervals for response surface optima was developed. The approach shows good coverage probabilities on two test problems, is straightforward to implement and is readily applicable to the kind of constrained and/or nonlinear problems that frequently appear in practice.

18.

Rule generation for categorical time series with Markov assumptions

Christian H. Weiß 《Statistics and Computing》2011,21(1):1-16

Several procedures of sequential pattern analysis are designed to detect frequently occurring patterns in a single categorical time series (episode mining). Based on these frequent patterns, rules are generated and evaluated, for example, in terms of their confidence. The confidence value is commonly interpreted as an estimate of a conditional probability, so some kind of stochastic model has to be assumed. The model is identified as a variable length Markov model. With this assumption, the usual confidences are maximum likelihood estimates of the transition probabilities of the Markov model. We discuss possibilities of how to efficiently fit an appropriate model to the data. Based on this model, rules are formulated. It is demonstrated that this new approach generates noticeably fewer and more reliable rules.
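The link between rule confidence and maximum likelihood transition estimates can be seen in a small sketch. It is restricted to a first-order Markov chain, which is a simplifying assumption; the abstract's variable length Markov model is not implemented here, and the function name `transition_estimates` is hypothetical.

```python
from collections import Counter

def transition_estimates(series):
    """ML estimates of first-order transition probabilities P(b | a).

    Each value equals the confidence of the rule "a is followed by b":
    count of the pair (a, b) divided by the count of a as a predecessor.
    """
    pair_counts = Counter(zip(series, series[1:]))   # consecutive pairs (a, b)
    from_counts = Counter(series[:-1])               # how often each symbol has a successor
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}
```

For the series `"aabab"`, the symbol `a` has three successors, two of which are `b`, so the estimate for the transition from `a` to `b` is 2/3.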

19.

Improved population-based probability of developing cancer when direct estimates of the cancer-free population are available

Simonetti A Mariotto A Krapcho M Feuer EJ 《Lifetime data analysis》2012,18(3):284-301

Age-conditional probabilities of developing a first cancer represent the transition from being cancer-free to developing a first cancer. Natural inputs into their calculation are rates of first cancer per person-years alive and cancer-free. However, these rates are not readily available because they require information on the cancer-free population. Instead, rates of first cancer per person-years alive, calculated using as denominator the mid-year populations available from census data, can be easily calculated from cancer registry data. Methods have been developed to estimate age-conditional probabilities of developing cancer based on these easily available rates per person-years alive that do not directly account for the cancer-free population. In the last few years models (Merrill et al., Int J Epidemiol 29(2):197-207, 2000; Mariotto et al., SEER Cancer Statistics Review, 2002; Clegg et al., Biometrics 58(3):684-688, 2002; Gigli et al., Stat Methods Med Res 15(3):235-253, 2006) and software (ComPrev: Complete Prevalence Software, Version 1.0, 2005) have been developed that allow estimation of cancer prevalence (DevCan: Probability of Developing or Dying of Cancer Software, Version 6.0, 2005). Estimates of population-based cancer prevalence allow for the estimation of the cancer-free population and consequently of rates per person-years alive and cancer-free. In this paper we present a method that directly estimates the age-conditional probabilities of developing a first cancer using rates per person-years alive and cancer-free obtained from prevalence estimates. We explore conditions when the previous and the new estimators give similar or different values using real data from the Surveillance, Epidemiology and End Results (SEER) program.

20.

A fast Monte Carlo expectation–maximization algorithm for estimation in latent class model analysis with an application to assess diagnostic accuracy for cervical neoplasia in women with atypical glandular cells

Le Kang Kathleen Darcy James Kauderer Shu-Yuan Liao 《Journal of applied statistics》2013,40(12):2699-2719

In this article, we use a latent class model (LCM) with prevalence modeled as a function of covariates to assess diagnostic test accuracy in situations where the true disease status is not observed, but observations on three or more conditionally independent diagnostic tests are available. A fast Monte Carlo expectation–maximization (MCEM) algorithm with binary (disease) diagnostic data is implemented to estimate parameters of interest; namely, sensitivity, specificity, and prevalence of the disease as a function of covariates. To obtain standard errors for confidence interval construction of estimated parameters, the missing information principle is applied to adjust information matrix estimates. We compare the adjusted information matrix-based standard error estimates with the bootstrap standard error estimates, both obtained using the fast MCEM algorithm, through an extensive Monte Carlo study. Simulation demonstrates that the adjusted information matrix approach estimates the standard error similarly to the bootstrap methods under certain scenarios. The bootstrap percentile intervals have satisfactory coverage probabilities. We then apply the LCM analysis to a real data set of 122 subjects from a Gynecologic Oncology Group study of significant cervical lesion diagnosis in women with atypical glandular cells of undetermined significance to compare the diagnostic accuracy of a histology-based evaluation, a carbonic anhydrase-IX biomarker-based test and a human papillomavirus DNA test.