Similar Documents
20 similar documents found (search time: 31 ms)
1.
In reliability and biometry, it is common practice to choose a failure model by first assessing the failure rate function subjectively and then invoking the well-known exponentiation formula. The derivation of this formula rests on the assumption that the underlying failure distribution is absolutely continuous; implicit in the approach, therefore, is the understanding that the selected failure distribution will be absolutely continuous. The purpose of this note is to point out that absolute continuity may fail when the failure rate is assessed conditionally, in particular when it is conditioned on certain types of covariates, called internal covariates. When that is the case, the exponentiation formula should not be used.
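The exponentiation formula in question expresses the survival function in terms of the failure rate r as S(t) = exp(-integral from 0 to t of r(u) du). A minimal numerical sketch (illustrative only, not part of the note; the midpoint-rule quadrature is just one simple choice):

```python
import math

def survival_from_hazard(rate, t, steps=100_000):
    """Exponentiation formula S(t) = exp(-int_0^t rate(u) du),
    with the integral approximated by the midpoint rule."""
    dt = t / steps
    integral = sum(rate((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-integral)

# A constant hazard lam recovers the exponential law S(t) = exp(-lam * t).
lam = 0.5
s = survival_from_hazard(lambda u: lam, 2.0)
```

For an absolutely continuous failure distribution this recovers the survival function exactly; the note's point is that the formula is invalid when absolute continuity fails.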

2.
Let X, T, Y be random vectors such that the distribution of Y conditional on the covariates X = x and T = t is given by f(y; x, θ), where θ = (γ, λ(t)). Here γ is a parameter vector and λ(t) is a smooth, real-valued function of t. The joint distribution of X and T is assumed not to depend on γ or λ. This semiparametric model is called conditionally parametric because the conditional distribution f(y; x, θ) of Y given X = x, T = t is parameterized by the finite-dimensional parameter θ = (γ, λ(t)). Severini and Wong (1992, Annals of Statistics 20: 1768–1802) show how to estimate γ and λ(·) using generalized profile likelihoods, and they also review the literature on generalized profile likelihoods. Under specified regularity conditions, they derive an asymptotically efficient estimator of γ and a uniformly consistent estimator of λ(·). The purpose of this paper is to provide a short tutorial on this method of estimation under a likelihood-based model, reviewing results from Stein (1956, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, Berkeley, pp. 187–196), Severini (1987, Ph.D. thesis, The University of Chicago, Department of Statistics, Chicago, Illinois), and Severini and Wong (op. cit.).

3.
Over the last few years many studies have been carried out in Italy to identify reliable small area labour force indicators. Given the rotated sample design of the Italian Labour Force Survey, the aim of this work is to derive a small area estimator which borrows strength from individual temporal correlation as well as from related areas. Two small area estimators are derived as extensions of an estimation strategy proposed by Fuller (1990) for partial overlap samples. A simulation study is carried out to evaluate the gain in efficiency provided by our solutions. Results obtained for different levels of autocorrelation between repeated measurements on the same outcome and for different population settings show that these estimators are always more reliable than the traditional composite one, and in some circumstances they are extremely advantageous. The present paper is financially supported by Murst-Cofin (2001) "L'utilizzo di informazioni di tipo amministrativo nella stima per piccole aree e per sottoinsiemi della popolazione" (National Coordinator: Prof. Carlo Filippucci).

4.
Multi-layer perceptrons (MLPs), a common type of artificial neural network (ANN), are widely used in computer science and engineering for object recognition, discrimination and classification, and have more recently found use in process monitoring and control. Training such networks is not a straightforward optimisation problem, and we examine features of these networks which contribute to the optimisation difficulty.

Although the original perceptron, developed in the late 1950s (Rosenblatt 1958, Widrow and Hoff 1960), had a binary output from each node, this was not compatible with back-propagation and similar training methods for the MLP. Hence the output of each node (and the final network output) was made a differentiable function of the network inputs. We reformulate the MLP model with the original perceptron in mind, so that each node in the hidden layers can be considered a latent (that is, unobserved) Bernoulli random variable. This preserves the binary output of the nodes, and with an imposed logistic regression of the hidden layer nodes on the inputs, the expected output of our model is identical to the MLP output with a logistic sigmoid activation function (for the case of one hidden layer).

We examine the usual MLP objective function, the sum of squares, and show its multi-modal form and the corresponding optimisation difficulty. We also construct the likelihood for the reformulated latent variable model and maximise it by standard finite mixture ML methods using an EM algorithm, which provides stable ML estimates from random starting positions without the need for regularisation or cross-validation. Over-fitting of the number of nodes does not affect this stability. This algorithm is closely related to the EM algorithm of Jordan and Jacobs (1994) for the Mixture of Experts model. We conclude with some general comments on the relation between the MLP and latent variable models.

5.
In some situations the asymptotic distribution of a random function T_n(θ) that depends on a nuisance parameter θ is tractable when θ has a known value. In that case it can be used, if suitably constructed, as a test statistic for some hypothesis. In practice, however, θ often needs to be replaced by an estimator S_n. In this paper general results are given concerning the asymptotic distribution of T_n(S_n) that include special cases previously dealt with. In particular, some situations are covered where the usual likelihood theory is nonregular and extreme values are employed to construct estimators and test statistics.

6.
Edgoose, T.; Allison, L. Statistics and Computing, 1999, 9(4): 269–278
General-purpose unsupervised classification programs have typically assumed independence between the observations in the data they analyse. In this paper we report on an extension to the MML classifier Snob which enables the program to take advantage of some of the extra information implicit in ordered datasets (such as time series). Specifically, the data are modelled as if generated from a first-order Markov process with as many states as there are classes of observation; the state of the process at any point in the sequence determines the class from which the corresponding observation is generated. Such a model is commonly referred to as a Hidden Markov Model. The MML calculation is presented for the expected length of a near-optimal two-part message stating a specific model of this type and a dataset given that model. Such an estimate enables us to compare fairly models which differ in the number of classes they specify, which in turn can guide a robust unsupervised search of the model space. The new program, tSnob, is tested against both synthetic data and a large real-world dataset, and is found to make unbiased estimates of the model parameters and to conduct an effective search of the extended model space.

7.
Comparison of observed mortality with known, background, or standard rates has been practised for several hundred years. With the development of regression models for survival data, increasing interest has arisen in individualizing the standardisation using the covariates of each individual. Account sometimes also needs to be taken of random variation in the standard group. Emphasizing uses of the Cox regression model, this paper surveys a number of critical choices and pitfalls in this area. The methods are illustrated by comparing the survival of liver patients after transplantation with survival under conservative treatment.

8.
In clustered survival settings where the clusters correspond to geographic regions, biostatisticians are increasingly turning to models with spatially distributed random effects. These models begin with spatially oriented frailty terms, but may also include further region-level terms in the parametrization of the baseline hazards or various covariate effects (as in a spatially varying coefficients model). In this paper, we propose a multivariate conditionally autoregressive (MCAR) model as a mixing distribution for these random effects, as a way of capturing correlation across both the regions and the elements of the random effect vector for any particular region. We then extend this model to permit analysis of temporal cohort effects, where we use the term temporal cohort to mean a group of subjects all of whom were diagnosed with the disease of interest (and thus entered the study) during the same time period (say, calendar year). We show how our spatiotemporal model may be efficiently fit in a hierarchical Bayesian framework implemented using Markov chain Monte Carlo (MCMC) computational techniques. We illustrate our approach in the context of county-level breast cancer data from 22 annual cohorts of women living in the state of Iowa, as recorded by the Surveillance, Epidemiology, and End Results (SEER) database. Hierarchical model comparison using the Deviance Information Criterion (DIC), as well as maps of the fitted county-level effects, reveals the benefit of our approach.

9.
We propose exploratory, easily implemented methods for diagnosing the appropriateness of an underlying copula model for bivariate failure time data, allowing censoring in either or both failure times. The proposed approach is found to effectively distinguish gamma from positive stable copula models when the sample is moderately large or the association is strong. Data from the Women's Health and Aging Study (WHAS; Guralnik et al., The Women's Health and Aging Study: Health and Social Characteristics of Older Women with Disability. National Institute on Aging: Bethesda, Maryland, 1995) are analyzed to demonstrate the proposed diagnostic methodology. The positive stable model gives a better overall fit to these data than the gamma frailty model, but it tends to underestimate the association at later time points. This finding is consistent with recent theory differentiating catastrophic from progressive disability onset in older adults. The proposed methods supply an interpretable quantity for copula diagnosis. We hope that they will usefully inform practitioners as to the reasonableness of their modeling choices.

10.
Chu, Hui-May; Kuo, Lynn. Statistics and Computing, 1997, 7(3): 183–192
Bayesian methods for estimating dose-response curves with the one-hit model, the gamma multi-hit model, and their modified versions with Abbott's correction are studied. The Gibbs sampling approach, with data augmentation and with the Metropolis algorithm, is employed to compute the Bayes estimates of the potency curves. In addition, estimation of the relative additional risk and the virtually safe dose is studied. Model selection based on conditional predictive ordinates from cross-validated data is developed.
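For reference, the one-hit model with Abbott's correction for background response has the standard closed form P(d) = γ + (1 - γ)(1 - exp(-λd)), under which the additional risk over background and the virtually safe dose invert in closed form. A sketch of these standard formulas only (none of the paper's Gibbs machinery; the 1e-6 risk level for the virtually safe dose is an illustrative choice):

```python
import math

def one_hit_abbott(d, lam, gamma):
    """One-hit dose-response with Abbott's correction for background:
    P(d) = gamma + (1 - gamma) * (1 - exp(-lam * d))."""
    return gamma + (1.0 - gamma) * (1.0 - math.exp(-lam * d))

def extra_risk(d, lam, gamma):
    """Additional risk over background: (P(d) - P(0)) / (1 - P(0)),
    which for the one-hit form reduces to 1 - exp(-lam * d)."""
    p0 = one_hit_abbott(0.0, lam, gamma)
    return (one_hit_abbott(d, lam, gamma) - p0) / (1.0 - p0)

def virtually_safe_dose(lam, risk=1e-6):
    """Dose at which the additional risk equals `risk` (closed-form
    inverse of the one-hit extra-risk function)."""
    return -math.log(1.0 - risk) / lam

lam, gamma = 2.0, 0.05
vsd = virtually_safe_dose(lam)
```

The Bayesian analysis in the paper places posteriors over λ and γ; the formulas above are what those posteriors are pushed through.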

11.
Summary: We describe depth-based graphical displays that show the interdependence of multivariate distributions. The plots involve one-dimensional curves or bivariate scatterplots, so they are easier to interpret than correlation matrices. The correlation curve, modelled on the scale curve of Liu et al. (1999), compares the volume of the observed central regions with the volume under independence. The correlation DD-plot is the scatterplot of depth values under a reference distribution against depth values under independence; the area of the plot gives a measure of distance from independence. The correlation curve and DD-plot require an independence model as a baseline: besides classical parametric specifications, a nonparametric estimator derived from the randomization principle is used. Combining data depth and the notion of quadrant dependence, quadrant correlation trajectories are obtained which allow simultaneous representation of subsets of variables. The properties of the plots for the multivariate normal distribution are investigated, and some real data examples are presented. *This work was completed with the support of Ca' Foscari University.

12.
Read, Robert; Thomas, Lyn; Washburn, Alan. Statistics and Computing, 2000, 10(3): 245–252
Consider the random sampling of a discrete population. The observations, as they are collected one by one, are enhanced in that the probability mass associated with each observation is also observed. The goal is to estimate the population mean. Without this extra information about probability mass, the best general-purpose estimator is the arithmetic average of the observations, XBAR. The issue is whether or not the extra information can be used to improve on XBAR. This paper examines the issues and offers four new estimators, each with its own strengths and liabilities. Some comparative performances of the four with XBAR are made.

The motivating application is a Monte Carlo simulation that proceeds in two stages. The first stage independently samples n characteristics to obtain a configuration of some kind, together with a configuration probability p obtained, if desired, as a product of n individual probabilities. A relatively expensive calculation then determines an output X as a function of the configuration. A random sample of X could simply be averaged to estimate the mean output, but there are possibly more efficient estimators on account of the known configuration probabilities.
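The four estimators themselves are not given in the abstract. As an illustrative sketch of the setting only (the mass-weighted estimator below is an assumption for illustration, not necessarily one of the paper's four), one natural use of the observed masses is to average the distinct observed values with their known probabilities, renormalized over the support seen so far:

```python
import random

def xbar(sample):
    """Ordinary arithmetic average of the observed values (ignores the masses)."""
    return sum(x for x, _ in sample) / len(sample)

def mass_weighted(sample):
    """Average the *distinct* observed values, weighting each by its known
    probability mass, renormalized over the observed support."""
    seen = {x: p for x, p in sample}          # value -> known mass p(x)
    total = sum(seen.values())
    return sum(x * p for x, p in seen.items()) / total

# A small discrete population: (value, known probability mass) pairs.
population = [(0.0, 0.5), (1.0, 0.3), (10.0, 0.2)]
mass = dict(population)
true_mean = sum(x * p for x, p in population)  # 2.3
rng = random.Random(42)
draws = rng.choices(list(mass), weights=mass.values(), k=200)
sample = [(x, mass[x]) for x in draws]
est_xbar = xbar(sample)
est_mass = mass_weighted(sample)
```

Once every support point has been observed, the mass-weighted estimator hits the true mean exactly, while XBAR retains sampling noise; its liability (shared by some of the estimators the paper studies) is bias while part of the support remains unseen.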

13.
In biomedical studies, interest often focuses on the relationship between patients' characteristics or risk factors and both the quality of life and the survival time of the subjects under study. In this paper, we propose simultaneous modelling of both quality of life and survival time using the observed covariates. Moreover, random effects are introduced into the simultaneous models to account for dependence between quality of life and survival time due to unobserved factors. EM algorithms are used to derive point estimates for the parameters in the proposed model, and the profile likelihood function is used to estimate their variances. Asymptotic properties are established for the proposed estimators. Finally, simulation studies are conducted to examine the finite-sample properties of the proposed estimators, and a liver transplantation data set is analyzed to illustrate our approach.

14.
In large epidemiological studies, budgetary or logistical constraints will typically preclude study investigators from measuring all exposures, covariates and outcomes of interest on all study subjects. We develop a flexible theoretical framework that incorporates a number of familiar designs, such as case-control and cohort studies, as well as multistage sampling designs. Our framework also allows for designed missingness and includes the option of outcome-dependent designs. Our formulation is based on maximum likelihood and generalizes well-known results for inference with missing data to the multistage setting. A variety of techniques are applied to streamline the computation of the Hessian matrix for these designs, facilitating the development of an efficient software tool to implement a wide variety of designs.

15.
The generalized odds-rate class of regression models for time-to-event data is indexed by a non-negative constant ρ and assumes that g_ρ(S(t|Z)) = α(t) + β'Z, where g_ρ(s) = log(ρ^(-1)(s^(-ρ) - 1)) for ρ > 0 and g_0(s) = log(-log s), S(t|Z) is the survival function of the time to event for an individual with q×1 covariate vector Z, β is a q×1 vector of unknown regression parameters, and α(t) is some arbitrary increasing function of t. When ρ = 0 this model is equivalent to the proportional hazards model, and when ρ = 1 it reduces to the proportional odds model. In the presence of right censoring, we construct estimators for β and exp(α(t)) and show that they are consistent and asymptotically normal. In addition, we show that the estimator for β is semiparametric efficient in the sense that it attains the semiparametric variance bound.
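A quick numerical check of the link family g_ρ(s) = log(ρ^(-1)(s^(-ρ) - 1)) (symbols ρ, α, β as conventionally written for this model, since the page's rendering dropped them): ρ = 1 gives the log-odds of failure, and ρ → 0 recovers the proportional-hazards link g_0(s) = log(-log s):

```python
import math

def g(s, rho):
    """Generalized odds-rate link: log((s**(-rho) - 1) / rho) for rho > 0,
    and its rho -> 0 limit log(-log(s)), the proportional-hazards link."""
    if rho == 0.0:
        return math.log(-math.log(s))
    return math.log((s ** (-rho) - 1.0) / rho)

s = 0.7
ph = g(s, 0.0)        # proportional-hazards link at s
po = g(s, 1.0)        # proportional-odds link: log((1 - s) / s)
limit = g(s, 1e-6)    # small rho should approach the rho = 0 branch
```

The continuity at ρ = 0 is what lets the single family interpolate between the proportional hazards and proportional odds models.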

16.
Computing location depth and regression depth in higher dimensions
The location depth (Tukey 1975) of a point θ relative to a p-dimensional data set Z of size n is defined as the smallest number of data points in a closed halfspace with boundary through θ. For bivariate data, it can be computed in O(n log n) time (Rousseeuw and Ruts 1996). In this paper we construct an exact algorithm to compute the location depth in three dimensions in O(n^2 log n) time. We also give an approximate algorithm to compute the location depth in p dimensions in O(mp^3 + mpn) time, where m is the number of p-subsets used. Recently, Rousseeuw and Hubert (1996) defined the depth of a regression fit. The depth of a hyperplane with coefficients (θ1, ..., θp) is the smallest number of residuals that need to change sign to make (θ1, ..., θp) a nonfit. For bivariate data (p = 2) this depth can also be computed in O(n log n) time. We construct an algorithm to compute the regression depth of a plane relative to a three-dimensional data set in O(n^2 log n) time, and another that deals with p = 4 in O(n^3 log n) time. For data sets with large n and/or p we propose an approximate algorithm that computes the depth of a regression fit in O(mp^3 + mpn + mn log n) time. For all of these algorithms, actual implementations are made available.
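For intuition, the bivariate location depth can be approximated straight from the definition by scanning a grid of halfplane directions (a brute-force illustration only, with none of the efficiency of the algorithms cited above):

```python
import math

def location_depth_2d(theta, points, n_dirs=3600):
    """Approximate Tukey location depth of `theta`: the minimum, over
    closed halfplanes with boundary through theta, of the number of
    data points in the halfplane (approximated on a direction grid)."""
    tx, ty = theta
    best = len(points)
    for k in range(n_dirs):
        a = 2.0 * math.pi * k / n_dirs
        ux, uy = math.cos(a), math.sin(a)
        count = sum(1 for (x, y) in points
                    if (x - tx) * ux + (y - ty) * uy >= -1e-9)
        best = min(best, count)
    return best

corners = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
d_center = location_depth_2d((0.0, 0.0), corners)   # central point: depth 2
d_out = location_depth_2d((2.0, 0.0), corners)      # outside point: depth 0
```

A deep point cannot be separated from most of the data by any halfplane, whereas a point outside the convex hull has depth 0; the exact algorithms in the paper replace the direction grid with a rotating-line (or higher-dimensional) sweep.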

17.
In studies of the fracture toughness of irradiated weld metal, specimens are subjected to an increasing load. The test on any one specimen might be terminated by choice or because the specimen ruptures. Prior to termination, ductile tearing might or might not have occurred. The situation is thus basically one of competing risks, with different types of termination, but there are additional features. The major purpose of statistical analysis is to estimate probabilities concerning the values of toughness and crack length. The analysis has been based on a model developed for the joint survivor function of these quantities.

18.
As a simple model for browsing the World Wide Web, we consider Markov chains with the option of moving back to the previous state. We develop an algorithm which uses back buttons to achieve essentially any limiting distribution on the state space; this corresponds to spending the desired total fraction of time at each web page. On finite state spaces our algorithm always succeeds. On infinite state spaces the situation is more complicated and is related both to the tail behaviour of the distributions and to the properties of convolution equations.

19.
A new area of research interest is the computation of exact confidence limits or intervals for a scalar parameter of interest θ from discrete data by inverting a hypothesis test based on a studentized test statistic. See, for example, Chan and Zhang (1999), Agresti and Min (2001) and Agresti (2003), who deal with a difference of binomial probabilities, and Agresti and Min (2002), who deal with an odds ratio. However, neither (1) a detailed analysis of the computational issues involved nor (2) a reliable method of computation that deals effectively with these issues is currently available. In this paper we solve these two problems for a very broad class of discrete data models. We suppose that the distribution of the data is determined by (θ, ψ), where ψ is a nuisance parameter vector. We also consider six different studentized test statistics. Our contributions to (1) are as follows. We show that the P-value resulting from the hypothesis test, considered as a function of the null-hypothesized value of θ, has both jump and drop discontinuities. Numerical examples are used to demonstrate that these discontinuities lead to the failure of simple-minded approaches to the computation of the confidence limit or interval. We also provide a new method for efficiently computing the set of all possible locations of these discontinuities. Our contribution to (2) is to provide a new and reliable method of computing the confidence limit or interval, based on knowledge of this set.

20.
Jerome H. Friedman and Nicholas I. Fisher
Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (input) variables that imply unusually large (or small) values of another designated (output) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In addition, it is usually desired that these regions be describable in an interpretable form involving simple statements (rules) concerning the input values. This paper presents a procedure directed towards this goal based on the notion of patient rule induction. This patient strategy is contrasted with the greedy ones used by most rule induction methods, and the semi-greedy ones used by some partitioning tree techniques such as CART. Applications involving scientific and commercial databases are presented.
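The contrast between patient and greedy strategies can be sketched in one dimension: shrink a box one observation at a time from whichever end most increases the mean output inside it, stopping when no peel helps. This is an illustrative toy, not Friedman and Fisher's full PRIM, which peels small quantile fractions, pastes boxes back out, and handles many inputs at once:

```python
def peel_1d(xs, ys):
    """Patient top-down peeling in one dimension. `xs` must be sorted;
    returns (lo, hi) index bounds of the final box."""
    lo, hi = 0, len(xs) - 1
    def box_mean(a, b):
        return sum(ys[a:b + 1]) / (b - a + 1)
    while hi - lo + 1 > 2:
        current = box_mean(lo, hi)
        left = box_mean(lo + 1, hi)    # mean after peeling the low end
        right = box_mean(lo, hi - 1)   # mean after peeling the high end
        if max(left, right) <= current:
            break                      # no peel improves the box mean
        if left >= right:
            lo += 1
        else:
            hi -= 1
    return lo, hi

# Output has a "bump" on [0.4, 0.6]; patient peeling should recover it.
xs = [i / 99 for i in range(100)]
ys = [1.0 if 0.4 <= x <= 0.6 else 0.0 for x in xs]
lo, hi = peel_1d(xs, ys)
box = (xs[lo], xs[hi])
```

Because each peel removes only a single point, a locally unhelpful step costs little, which is the sense in which the strategy is patient rather than greedy.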

