Similar Articles
20 similar articles found (search time: 187 ms)
1.
The paper considers the problem of finding optimum strata boundaries when sample sizes to different strata are allocated in proportion to the strata totals of the auxiliary variable. This variable is also treated as the stratification variable. Minimal equations, solutions to which provide the optimum boundaries, have been obtained. Because of the implicit nature of these equations their exact solutions cannot be obtained. Therefore, methods of obtaining their approximate solutions have been presented. A limiting expression for the variance of the estimate of population mean, as the number of strata tends to become large, has also been obtained.
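Since the minimal equations admit only approximate solutions, a widely used practical approximation for stratum boundaries is the Dalenius-Hodges cumulative square-root-of-frequency rule. The sketch below is not the paper's method; it illustrates, under an arbitrary simulated auxiliary variable, how such approximate boundaries can be computed.

```python
import numpy as np

def cum_sqrt_f_boundaries(x, n_strata, n_bins=50):
    """Approximate stratum boundaries via the Dalenius-Hodges
    cumulative sqrt(frequency) rule (a standard approximation,
    not the paper's minimal-equation solution)."""
    freq, edges = np.histogram(x, bins=n_bins)
    cum = np.cumsum(np.sqrt(freq))
    # Equal slices of the cumulative sqrt(f) scale give the boundaries.
    targets = cum[-1] * np.arange(1, n_strata) / n_strata
    idx = np.searchsorted(cum, targets)
    return edges[idx + 1]

# Hypothetical skewed auxiliary variable.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=10.0, size=5000)
print(cum_sqrt_f_boundaries(x, n_strata=4))
```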

2.
Suppose it is desired to partition a distribution into k groups (classes) using squared error or absolute error as the measure of information retained. An algorithm to obtain the optimal boundaries (or class probabilities) is given. For the case of squared error, optimal class probabilities were obtained for k = 2 to 15 for beta (for various values of the parameters), chi-square (12 d.f.), exponential, normal and uniform distributions. Results obtained are compared and analysed in light of existing papers. Special attention is given to the case k = 5, corresponding to the assignment of the letter grades A, B, C, D, F in courses, and to the case k = 9 corresponding to stanines.
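For the squared-error case, a Lloyd-type fixed-point iteration converges to such boundaries. The sketch below is a minimal illustration for a standard normal distribution (not the paper's algorithm); the convergence tolerance and starting values are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

def optimal_boundaries_sq_error(k, tol=1e-10, max_iter=1000):
    """Lloyd-type iteration for boundaries minimizing squared error when a
    standard normal is reduced to k classes (illustrative sketch only)."""
    b = norm.ppf(np.linspace(0, 1, k + 1)[1:-1])  # initial interior boundaries
    for _ in range(max_iter):
        edges = np.concatenate(([-np.inf], b, [np.inf]))
        probs = np.diff(norm.cdf(edges))
        # Conditional means of each class (centroids under squared error).
        means = -np.diff(norm.pdf(edges)) / probs
        new_b = (means[:-1] + means[1:]) / 2.0  # boundary = midpoint of adjacent centroids
        if np.max(np.abs(new_b - b)) < tol:
            break
        b = new_b
    return b, np.diff(norm.cdf(np.concatenate(([-np.inf], b, [np.inf]))))

bounds, class_probs = optimal_boundaries_sq_error(k=5)
print(np.round(class_probs, 4))  # approximate optimal class probabilities
```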

3.
A clinical risk classification system is an important component of a treatment decision algorithm. A measure used to assess the strength of a risk classification system is discrimination, and when the outcome is survival time, the most commonly applied global measure of discrimination is the concordance probability. The concordance probability represents the pairwise probability of lower patient risk given longer survival time. The c-index and the concordance probability estimate have been used to estimate the concordance probability when patient-specific risk scores are continuous. In the current paper, the concordance probability estimate and an inverse probability censoring weighted c-index are modified to account for discrete risk scores. Simulations are generated to assess the finite sample properties of the concordance probability estimate and the weighted c-index. An application of these measures of discriminatory power to a metastatic prostate cancer risk classification system is examined.
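To fix ideas about the pairwise nature of concordance, the sketch below computes a naive concordance index on uncensored data, counting tied discrete risk scores as one-half. It is not the paper's concordance probability estimate or IPCW c-index, and it ignores censoring entirely; the data are hypothetical.

```python
import itertools
import numpy as np

def simple_c_index(time, risk):
    """Naive concordance for uncensored times and discrete risk scores:
    a pair is concordant when the shorter survivor has the higher risk;
    tied risks count 1/2 (sketch only, no censoring handling)."""
    num, den = 0.0, 0
    for i, j in itertools.combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue  # skip tied times in this simple version
        den += 1
        short, long_ = (i, j) if time[i] < time[j] else (j, i)
        if risk[short] > risk[long_]:
            num += 1.0
        elif risk[short] == risk[long_]:
            num += 0.5
    return num / den

# Hypothetical three-level risk classification, uncensored times.
time = np.array([2.1, 5.4, 8.0, 1.2, 9.3, 3.3])
risk = np.array([3, 2, 1, 3, 1, 2])
print(round(simple_c_index(time, risk), 3))
```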

4.
The likelihood equations based on a progressively Type II censored sample from a Type I generalized logistic distribution do not provide explicit solutions for the location and scale parameters. We present a simple method of deriving explicit estimators by approximating the likelihood equations appropriately. We examine numerically the bias and variance of these estimators and show that these estimators are as efficient as the maximum likelihood estimators (MLEs). The probability coverages of the pivotal quantities (for location and scale parameters) based on asymptotic normality are shown to be unsatisfactory, especially when the effective sample size is small. Therefore we suggest using unconditional simulated percentage points of these pivotal quantities for the construction of confidence intervals. A wide range of sample sizes and progressive censoring schemes have been considered in this study. Finally, we present a numerical example to illustrate the methods of inference developed here.

5.
The Pearson chi-squared statistic for testing the equality of two multinomial populations when the categories are nominal is much less appropriate for ordinal categories. Test statistics typically used in this context are based on scorings of the ordinal levels, but the results of these tests are highly dependent on the choice of scores. The authors propose a test which naturally modifies the Pearson chi-squared statistic to incorporate the ordinal information. The proposed test statistic does not depend on the scores and under the null hypothesis of equality of populations, it is asymptotically equivalent to the likelihood ratio test against the alternative of two-sided likelihood ratio ordering.
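To see why score-based tests are sensitive to the scoring, the toy comparison below (hypothetical counts, equally spaced versus unevenly spaced scores; not the authors' modified Pearson statistic) computes a simple standardized difference of mean scores between two samples and shows how the value changes with the scores chosen.

```python
import numpy as np

def ordinal_score_test(counts1, counts2, scores):
    """Two-sample statistic based on assigned ordinal scores: standardized
    difference of mean scores under the pooled multinomial (illustration
    of score dependence only, not the authors' proposed test)."""
    counts1, counts2 = np.asarray(counts1, float), np.asarray(counts2, float)
    n1, n2 = counts1.sum(), counts2.sum()
    pooled = (counts1 + counts2) / (n1 + n2)
    mean1 = (counts1 / n1) @ scores
    mean2 = (counts2 / n2) @ scores
    var_pooled = pooled @ (scores - pooled @ scores) ** 2
    se = np.sqrt(var_pooled * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

# Hypothetical 4-level ordinal table.
c1, c2 = [20, 30, 30, 20], [10, 40, 40, 10]
print(ordinal_score_test(c1, c2, scores=np.array([1, 2, 3, 4])))   # equally spaced: statistic is 0
print(ordinal_score_test(c1, c2, scores=np.array([1, 2, 3, 10])))  # uneven spacing: nonzero statistic
```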

6.
7.
The number of solutions of the system of the log-likelihood equations for the three-parameter case is still an open problem. Several methods have been developed for finding the solutions. In this article we present a program in Mathematica that can find all the solutions of the system of equations. Furthermore, we examine the case where the global maximum appears at the boundary of the domain of the log-likelihood function and we prove that any consistent estimators appear at the interior with probability tending to one.

8.
Summary. The strength of statistical evidence is measured by the likelihood ratio. Two key performance properties of this measure are the probability of observing strong misleading evidence and the probability of observing weak evidence. For the likelihood function associated with a parametric statistical model, these probabilities have a simple large sample structure when the model is correct. Here we examine how that structure changes when the model fails. This leads to criteria for determining whether a given likelihood function is robust (continuing to perform satisfactorily when the model fails), and to a simple technique for adjusting both likelihoods and profile likelihoods to make them robust. We prove that the expected information in the robust adjusted likelihood cannot exceed the expected information in the likelihood function from a true model. We note that the robust adjusted likelihood is asymptotically fully efficient when the working model is correct, and we show that in some important examples this efficiency is retained even when the working model fails. In such cases the Bayes posterior probability distribution based on the adjusted likelihood is robust, remaining correct asymptotically even when the model for the observable random variable does not include the true distribution. Finally we note a link to standard frequentist methodology: in large samples the adjusted likelihood functions provide robust likelihood-based confidence intervals.

9.
For nearly any challenging scientific problem, evaluation of the likelihood is problematic if not impossible. Approximate Bayesian computation (ABC) allows us to employ the whole Bayesian formalism on problems where we can use simulations from a model, but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared, rather than the data directly, information is lost unless the summary statistics are sufficient. Sufficient statistics are, however, not common, and without them ABC inferences are to be considered with caution. Previously, other authors have attempted to combine different statistics in order to construct (approximately) sufficient statistics using search and information heuristics. Here we employ an information-theoretical framework that can be used to construct appropriate (approximately sufficient) statistics by combining different statistics until the loss of information is minimized. We start from a potentially large number of different statistics and choose the smallest set that captures (nearly) the same information as the complete set. We then demonstrate that such sets of statistics can be constructed for both parameter estimation and model selection problems, and we apply our approach to a range of illustrative and real-world model selection problems.
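A minimal ABC rejection sampler shows how the summary statistics enter the comparison between real and simulated data. The model, prior, summaries and tolerance below are hypothetical illustrations, not the authors' information-theoretical construction of approximately sufficient statistics.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, n=100):
    """Hypothetical model: exponential data with rate theta."""
    return rng.exponential(scale=1.0 / theta, size=n)

def summary(x):
    """A chosen (possibly insufficient) set of summary statistics."""
    return np.array([np.mean(x), np.std(x)])

def abc_rejection(observed, n_draws=20000, eps=0.05):
    """Basic ABC rejection: keep prior draws whose simulated summaries fall
    within a relative tolerance of the observed summaries (sketch only)."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.1, 5.0)          # prior draw
        s_sim = summary(simulate(theta))
        if np.linalg.norm(s_sim - s_obs) < eps * np.linalg.norm(s_obs):
            accepted.append(theta)
    return np.array(accepted)

obs = rng.exponential(scale=1.0 / 2.0, size=100)   # data generated with theta = 2
post = abc_rejection(obs)
print(len(post), post.mean() if len(post) else None)
```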

10.
This paper describes a new program, CORRECT, which takes words rejected by the Unix® SPELL program, proposes a list of candidate corrections, and sorts them by probability score. The probability scores are the novel contribution of this work. They are based on a noisy channel model. It is assumed that the typist knows what words he or she wants to type but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in recognition applications, especially speech recognition (Jelinek, 1985), one can often recover the intended correction, c, from a typo, t, by finding the correction c that maximizes Pr(c) Pr(t|c). The first factor, Pr(c), is a prior model of word probabilities; the second factor, Pr(t|c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (insertions, deletions, substitutions and reversals). Both sets of probabilities were estimated using data collected from the Associated Press (AP) newswire over 1988 and 1989 as a training set. The AP generates about 1 million words and 500 typos per week.

In evaluating the program, we found that human judges were extremely reluctant to cast a vote given only the information available to the program, and that they were much more comfortable when they could see a concordance line or two. The second half of this paper discusses some very simple methods of modeling the context using n-gram statistics. Although n-gram methods are much too simple (compared with much more sophisticated methods used in artificial intelligence and natural language processing), we have found that even these very simple methods illustrate some very interesting estimation problems that will almost certainly come up when we consider more sophisticated models of contexts. The problem is how to estimate the probability of a context that we have not seen. We compare several estimation techniques and find that some are useless. Fortunately, we have found that the Good-Turing method provides an estimate of contextual probabilities that produces a significant improvement in program performance. Context is helpful in this application, but only if it is estimated very carefully.

At this point, we have a number of different knowledge sources (the prior, the channel and the context) and there will certainly be more in the future. In general, performance will be improved as more and more knowledge sources are added to the system, as long as each additional knowledge source provides some new (independent) information. As we shall see, it is important to think more carefully about combination rules, especially when there are a large number of different knowledge sources.
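The core scoring rule is simply the argmax over candidates of Pr(c) Pr(t|c). The sketch below uses toy, hand-set probability tables purely to show the shape of the computation; the actual program estimates both tables from the AP training data described above.

```python
import math

# Toy prior Pr(c) and channel Pr(t|c) tables (illustrative values only,
# not the AP-newswire estimates used by CORRECT).
prior = {"acres": 3e-5, "actress": 1e-5}
channel = {("acress", "acres"): 1e-4,    # insertion of 's'
           ("acress", "actress"): 2e-4}  # deletion of 't'

def best_correction(typo, candidates):
    """Pick the candidate c maximizing log Pr(c) + log Pr(typo | c)."""
    scored = []
    for c in candidates:
        score = math.log(prior[c]) + math.log(channel[(typo, c)])
        scored.append((score, c))
    return max(scored)

print(best_correction("acress", ["acres", "actress"]))
```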

11.
Summary. To construct an optimal estimating function by weighting a set of score functions, we must either know or estimate consistently the covariance matrix for the individual scores. In problems with high dimensional correlated data the estimated covariance matrix could be unreliable. The smallest eigenvalues of the covariance matrix will be the most important for weighting the estimating equations, but in high dimensions these will be poorly determined. Generalized estimating equations introduced the idea of a working correlation to minimize such problems. However, it can be difficult to specify the working correlation model correctly. We develop an adaptive estimating equation method which requires no working correlation assumptions. This methodology relies on finding a reliable approximation to the inverse of the variance matrix in the quasi-likelihood equations. We apply a multivariate generalization of the conjugate gradient method to find estimating equations that preserve the information well at fixed low dimensions. This approach is particularly useful when the estimator of the covariance matrix is singular or close to singular, or impossible to invert owing to its large size.
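The numerical idea of approximating V^{-1}b without forming an explicit inverse can be sketched with a plain conjugate gradient solver. This is a generic CG routine run for a few Krylov steps, not the authors' multivariate generalization, and the covariance matrix below is a hypothetical, nearly singular example.

```python
import numpy as np

def conjugate_gradient(V, b, n_iter=5, tol=1e-12):
    """Approximate x = V^{-1} b with a few conjugate gradient steps,
    avoiding an explicit (possibly ill-conditioned) inverse.
    Generic CG sketch, not the authors' method."""
    x = np.zeros_like(b)
    r = b - V @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(n_iter):
        Vp = V @ p
        alpha = rs_old / (p @ Vp)
        x = x + alpha * p
        r = r - alpha * Vp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Hypothetical high-dimensional, nearly singular covariance matrix.
rng = np.random.default_rng(2)
A = rng.normal(size=(50, 5))
V = A @ A.T + 1e-6 * np.eye(50)   # rank-deficient plus a tiny ridge
b = rng.normal(size=50)
x5 = conjugate_gradient(V, b, n_iter=5)   # low-dimensional Krylov approximation
```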

12.
Statistical Analysis of Forest Fire Risk in China   (cited 3 times: 0 self-citations, 3 by others)
Based on information diffusion theory, this paper statistically analyses forest fire risk in China from 1998 to 2011 in terms of the number of forest fires, the extent of the affected area, and the severity of damage. For the number of fires, three indicators are selected: the number of ordinary and larger forest fires, the number of major forest fires, and the proportion of each in the total number of forest fires; dynamic trends and risk probabilities are analysed for these indicators. For the affected area and severity of damage, three indicators are chosen: the forest fire loss rate, the damage rate, and the loss-to-damage ratio; their dynamic changes are observed and risk probabilities computed. The main conclusions are: the number of major forest fires and its share of the total fluctuate with roughly the same amplitude and frequency; as ordinary and larger forest fires accumulate in number, their harm to forests also grows; when the loss rate and the damage rate are both relatively high, the loss-to-damage ratio is not necessarily high; and within a year, the probability that the loss rate, damage rate, and loss-to-damage ratio each exceed their respective means is about 60%.

13.
This paper considers approximations to carrier-borne epidemic processes, including the case where immigration of carriers or susceptibles is allowed. The partial differential equations for the probability generating functions of the approximating processes are derived, and their solutions are obtained.

14.
A modification of the sequential probability ratio test is proposed in which Wald's parallel boundaries are broken at some preassigned point of the sample number axis and Anderson's converging boundaries are used prior to that. Read's partial sequential probability ratio test can be considered as a special case of the proposed procedure. As far as the maximum-average-sample-number-reducing property is concerned, the procedure is as good as Anderson's modified sequential probability ratio test.

15.
In situations where individuals are screened for an infectious disease or other binary characteristic and where resources for testing are limited, group testing can offer substantial benefits. Group testing, where subjects are tested in groups (pools) initially, has been successfully applied to problems in blood bank screening, public health, drug discovery, genetics, and many other areas. In these applications, often the goal is to identify each individual as positive or negative using initial group tests and subsequent retests of individuals within positive groups. Many group testing identification procedures have been proposed; however, the vast majority of them fail to incorporate heterogeneity among the individuals being screened. In this paper, we present a new approach to identify positive individuals when covariate information is available on each. This covariate information is used to structure how retesting is implemented within positive groups; therefore, we call this new approach "informative retesting." We derive closed-form expressions and implementation algorithms for the probability mass functions for the number of tests needed to decode positive groups. These informative retesting procedures are illustrated through a number of examples and are applied to chlamydia and gonorrhea testing in Nebraska for the Infertility Prevention Project. Overall, our work shows compelling evidence that informative retesting can dramatically decrease the number of tests while providing accuracy similar to established non-informative retesting procedures.
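For intuition about why pooling saves tests at all, the sketch below computes the expected number of tests per individual under simple two-stage Dorfman testing with a single common prevalence. This is the non-informative baseline; the informative-retesting procedures described above go further by ordering retests using individual covariate-based risks, which is not shown here.

```python
def dorfman_tests_per_person(prevalence, pool_size):
    """Expected tests per individual under two-stage Dorfman group testing
    with common prevalence p: one pooled test plus pool_size individual
    retests whenever the pool is positive (non-informative baseline)."""
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + p_pool_positive

for k in (2, 5, 10, 20):
    print(k, round(dorfman_tests_per_person(prevalence=0.02, pool_size=k), 3))
```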

16.
In this paper a measure of proximity of distributions, when moments are known, is proposed. Based on cases where the exact distribution is known, evidence is given that the proposed measure accurately evaluates the proximity of quantiles (exact vs. approximated). The measure may be applied to compare asymptotic and near-exact approximations to distributions in situations where, although the exact moments are known, the exact distribution is not known, or the expression for its probability density function is unknown or too complicated to handle. In this paper the measure is applied to compare newly proposed asymptotic and near-exact approximations to the distribution of the Wilks Lambda statistic when both groups of variables have an odd number of variables. This measure is also applied to the study of several cases of telescopic near-exact approximations to the exact distribution of the Wilks Lambda statistic based on mixtures of generalized near-integer gamma distributions.

17.
Sheffer polynomials are solutions of certain systems of operator equations. Difference equations, which frequently occur in path enumeration, belong to that class. To find representations of the solutions, the restriction on the paths has to be in the form of boundaries. Such problems have applications in two-sample tests. We also consider paths with more than two step vectors. The gambler's ruin problem illustrates the method. If paths with a given area underneath are counted, q-binomial coefficients come into play. Eulerian Sheffer sequences solve some of these problems.

18.
SUMMARY A number of models have been examined for modelling probability based on rankings. Most prominent among these are the gamma and normal probability models. The accuracy of these models in predicting the outcomes of horse races is investigated in this paper. The parameters of these models are estimated by the maximum likelihood method, using the information on win pool fractions. These models are used to estimate the probabilities that race entrants finish second or third in a race. These probabilities are then compared with the corresponding objective probabilities estimated from actual race outcomes. The data are obtained from over 15 000 races. It is found that all the models tend to overestimate the probability of a horse finishing second or third when the horse has a high probability of such a result, but underestimate the probability of a horse finishing second or third when this probability is low.

19.
Dichotomization is the transformation of a continuous outcome (response) to a binary outcome. This approach, while somewhat common, is harmful from the viewpoint of statistical estimation and hypothesis testing. We show that this leads to loss of information, which can be large. For normally distributed data, this loss in terms of Fisher's information is at least 1 − 2/π (or 36%). In other words, 100 continuous observations are statistically equivalent to 158 dichotomized observations. The amount of information lost depends greatly on the prior choice of cut points, with the optimal cut point depending upon the unknown parameters. The loss of information leads to loss of power or conversely a sample size increase to maintain power. Only in certain cases, for instance, in estimating a value of the cumulative distribution function and when the assumed model is very different from the true model, can the use of dichotomized outcomes be considered a reasonable approach.
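A short standard calculation, assuming a normal model with known unit variance and a cut point at the true mean, shows where the 1 − 2/π bound comes from:

```latex
% Dichotomize X ~ N(\mu, 1) at cut point c: Y = 1\{X > c\}, with success
% probability p(\mu) = 1 - \Phi(c - \mu).  Fisher information in Y about \mu:
I_{\text{bin}}(\mu)
  = \frac{\{p'(\mu)\}^{2}}{p(\mu)\{1-p(\mu)\}}
  = \frac{\varphi(c-\mu)^{2}}{\Phi(c-\mu)\{1-\Phi(c-\mu)\}}
  \;\le\; \frac{\varphi(0)^{2}}{1/4} \;=\; \frac{2}{\pi},
\qquad \text{with equality at } c = \mu .
% The undichotomized observation carries I(\mu) = 1, so at least
% 1 - 2/\pi \approx 0.363 of the information is lost, and
% 100 / (2/\pi) = 50\pi \approx 157.1 dichotomized observations are needed
% to match 100 continuous ones, consistent with the 158 quoted above.
```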

20.
Regression analysis for competing risks data can be based on generalized estimating equations. For the case with right censored data, pseudo-values were proposed to solve the estimating equations. In this article we investigate robustness of the pseudo-values against violation of the assumption that the probability of not being lost to follow-up (un-censored) is independent of the covariates. Modified pseudo-values are proposed which rely on a correctly specified regression model for the censoring times. Bias and efficiency of these methods are compared in a simulation study. Further illustration of the differences is obtained in an application to bone marrow transplantation data and a corresponding sensitivity analysis.
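A minimal sketch of how jackknife pseudo-values are formed, here for a survival probability at a fixed time point using a bare-bones Kaplan-Meier estimate rather than a competing-risks cumulative incidence, and with hypothetical data. The paper's modified pseudo-values additionally model the censoring distribution, which is not shown.

```python
import numpy as np

def km_survival_at(t0, time, event):
    """Kaplan-Meier estimate of S(t0) (minimal implementation)."""
    s = 1.0
    for t in np.sort(np.unique(time[event == 1])):
        if t > t0:
            break
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
    return s

def pseudo_values(t0, time, event):
    """Jackknife pseudo-values: n * theta_hat - (n - 1) * theta_hat_(-i).
    These can then enter a GEE regression (sketch only; the modification
    for covariate-dependent censoring is not included)."""
    n = len(time)
    theta_full = km_survival_at(t0, time, event)
    pv = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        pv[i] = n * theta_full - (n - 1) * km_survival_at(t0, time[mask], event[mask])
    return pv

# Hypothetical right-censored data (event = 1 observed, 0 censored).
time = np.array([2.0, 3.5, 4.0, 5.0, 6.5, 7.0, 8.0])
event = np.array([1, 0, 1, 1, 0, 1, 0])
print(np.round(pseudo_values(4.5, time, event), 3))
```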

