Similar Literature
1.
In longitudinal studies, an individual may undergo a series of recurrent events. The gap times, defined as the times between successive recurrent events, are typically the outcome variables of interest. Various regression models have been developed to evaluate covariate effects on gap times based on recurrent event data; the proportional hazards model, the additive hazards model, and the accelerated failure time model are notable examples. Quantile regression is a useful alternative to these models for survival analysis, since it offers great flexibility in assessing covariate effects on the entire distribution of the gap time. To analyze recurrent gap time data, one must overcome the problem that the last gap time is subject to induced dependent censoring whenever an individual experiences more than one recurrent event. In this paper, we adopt a Buckley–James-type estimation method to construct a weighted estimating equation for the regression coefficients under the quantile model, and develop an iterative procedure to obtain the estimates. Extensive simulation studies evaluate the finite-sample performance of the proposed estimator. Finally, an analysis of bladder cancer data illustrates the proposed methodology.
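A minimal sketch of the quantile-regression building block this paper starts from: estimating coefficients at level tau by minimizing the Koenker–Bassett check loss on log gap times. The Buckley–James-type weighting that handles induced dependent censoring is the paper's contribution and is not reproduced here; the data and covariate below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(beta, X, y, tau):
    """Koenker-Bassett check (pinball) loss at quantile level tau."""
    r = y - X @ beta
    return np.sum(r * (tau - (r < 0)))

# Illustrative data: log gap times with one covariate (hypothetical).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)  # log gap times

tau = 0.5  # median regression
fit = minimize(check_loss, x0=np.zeros(2), args=(X, y, tau), method="Nelder-Mead")
print("estimated coefficients at tau=0.5:", fit.x)
```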

2.
Research and operational applications in weather forecasting are reviewed, with emphasis on statistical issues. It is argued that the deterministic approach has dominated weather forecasting, although weather forecasting is by nature a probabilistic problem. The reason has been the successful application of numerical weather prediction techniques over the 50 years since the introduction of computers. A gradual shift towards more probabilistic methods has occurred over the last decade; in particular, meteorological data assimilation, ensemble forecasting and post-processing of model output have been influenced by ideas from statistics and control theory.

3.
At the research and development stage, decision-makers may wish to classify several competing designs with respect to a control (or standard) design. The classification problem can become very difficult when the products are highly reliable, since only a few (or even no) failures may be observed under normal use conditions. The accelerated life test model resolves this difficulty by shortening the duration of life testing and quickly providing life data on the products. For highly reliable products following a Weibull log-linear model, we propose a classification rule based on a locally optimal criterion. A suitable sampling plan based on this rule is also developed. The performance of this rule is compared with a pairwise comparison classification rule, and it is shown that the sample sizes needed for the new rule are considerably smaller than those needed for the pairwise comparison rule.

4.

Parameter reduction can enable otherwise infeasible design and uncertainty studies with modern computational science models that contain several input parameters. In statistical regression, techniques for sufficient dimension reduction (SDR) use data to reduce the predictor dimension of a regression problem. A computational scientist hoping to use SDR for parameter reduction encounters a problem: a computer prediction is best represented by a deterministic function of the inputs, so data comprising computer simulation queries fail to satisfy the SDR assumptions. To address this problem, we interpret the SDR methods sliced inverse regression (SIR) and sliced average variance estimation (SAVE) as estimating the directions of a ridge function, which is a composition of a low-dimensional linear transformation with a nonlinear function. Within this interpretation, SIR and SAVE estimate matrices of integrals whose column spaces are contained in the span of the ridge directions; we analyze and numerically verify convergence of these column spaces as the number of computer model queries increases. Moreover, we show example functions that are not ridge functions but whose inverse conditional moment matrices are low-rank. Consequently, the computational scientist should beware when using SIR and SAVE for parameter reduction, since they may mistakenly suggest that truly important directions are unimportant.
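For concreteness, the textbook SIR estimate of the ridge directions can be sketched as follows; this is the standard algorithm applied to noiseless simulation queries, not the authors' convergence analysis, and the toy ridge function is hypothetical.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Sliced inverse regression: whiten X, slice on the ordered
    response, average the whitened predictors within each slice, and
    eigen-decompose the weighted covariance of the slice means."""
    n, p = X.shape
    mu, S = X.mean(axis=0), np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(S)
    S_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ S_inv_half  # whitened predictors

    order = np.argsort(y)
    M = np.zeros((p, p))
    for chunk in np.array_split(order, n_slices):
        m = Z[chunk].mean(axis=0)
        M += (len(chunk) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original scale.
    w, V = np.linalg.eigh(M)
    dirs = S_inv_half @ V[:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)

# Hypothetical monotone ridge function f(x) = (a'x)^3 queried without noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
a = np.array([1.0, 2.0, 0.0, 0.0, 0.0]) / np.sqrt(5.0)
y = (X @ a) ** 3
print(sir_directions(X, y, n_dirs=1).ravel())  # close to +/- a
```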


5.
Many fields of research need to classify individual systems based on one or more data series obtained by sampling an unknown continuous curve with noise. In other words, the underlying process is an unknown function which the observed variables represent only imperfectly. Although functional logistic regression has many attractive features for this classification problem, the method is applicable only when the number of individuals to be classified (or available to estimate the model) is large compared to the number of curves sampled per individual. To overcome this limitation, we use penalized optimal scoring to construct a new method for the classification of multi-dimensional functional data. The proposed method consists of two stages. First, the series of observed discrete values available for each individual are expressed as a set of continuous curves. Next, the penalized optimal scoring model is estimated on the basis of these curves. A similar penalized optimal scoring method was described in my previous work, but that model is not suitable for the analysis of continuous functions. In this paper we adopt a Gaussian kernel approach to extend the previous model. The high accuracy of the new method is demonstrated in Monte Carlo simulations, and the method is then used to predict defaulting firms on the Japanese Stock Exchange.
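Assuming the two-stage structure described above, the second stage can be illustrated with a two-class penalized optimal scoring fit, in which optimally scored class labels are ridge-regressed on the sampled curves. This is a simplified stand-in: the paper's Gaussian kernel extension is not reproduced, and the data and penalty value are hypothetical.

```python
import numpy as np

def penalized_optimal_scoring(Phi, labels, lam=1.0):
    """Two-class penalized optimal scoring: ridge-regress optimally
    scored class labels on the feature matrix Phi. With two classes the
    optimal scores are fixed up to scale, so one penalized
    least-squares fit suffices; classify by the sign of Phi_new @ beta."""
    n = len(labels)
    n1 = labels.sum()
    n0 = n - n1
    # Scores with zero mean and unit variance across the sample.
    theta = np.where(labels == 1, np.sqrt(n0 / n1), -np.sqrt(n1 / n0))
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ theta)

# Each "individual" is a noisy curve on a grid of 50 points; class 1
# curves carry an extra bump. (Hypothetical data.)
rng = np.random.default_rng(13)
t = np.linspace(0, 1, 50)
n = 120
labels = rng.integers(0, 2, n)
Phi = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(n, 50))
Phi += labels[:, None] * np.exp(-50 * (t - 0.7) ** 2)

beta = penalized_optimal_scoring(Phi, labels, lam=5.0)
pred = (Phi @ beta > 0).astype(int)
print("training accuracy:", (pred == labels).mean())
```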

6.
Mixture models are used in a large number of applications, yet difficulties remain with maximum likelihood estimation. For instance, the likelihood surface for finite normal mixtures often has a large number of local maximizers, some of which do not give a good representation of the underlying features of the data. In this paper we present diagnostics that can be used to check the quality of an estimated mixture distribution. Particular attention is given to normal mixture models, since they frequently arise in practice. We apply the diagnostic tools to finite normal mixture problems and to the nonparametric setting, where the difficult problem of determining a scale parameter for a normal mixture density estimate is considered. A large-sample justification for the proposed methodology is provided, and its implementation is illustrated through several examples.
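One simple way to see the multiple-local-maxima phenomenon these diagnostics target is to fit the same finite normal mixture from many random starts and tabulate the distinct log-likelihood maxima reached; a minimal sketch using scikit-learn (the paper's own diagnostics are not reproduced, and the sample is simulated for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Simulated two-component normal mixture sample.
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 0.3, 50)]).reshape(-1, 1)

# Fit from many random starts and record the local maxima reached.
maxima = []
for seed in range(30):
    gm = GaussianMixture(n_components=2, n_init=1, init_params="random",
                         random_state=seed).fit(x)
    maxima.append(round(gm.score(x) * len(x), 2))  # total log-likelihood

# Distinct local maximizers of the likelihood surface.
print(sorted(set(maxima), reverse=True))
```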

7.
The classical birthday problem considers the probability that at least two people in a group of size N share the same birthday. The inverse birthday problem considers the estimation of the size N of a group given the number of different birthdays in the group. In practice, this problem is analogous to estimating the size of a population from occurrence data alone. The inverse problem can be solved via two simple approaches, which we present in this study: the method of moments for a multinomial model and the maximum likelihood estimate under a Poisson model. We investigate properties of both methods and show that they yield asymptotically equivalent Wald-type interval estimators. Moreover, we show that these methods estimate a lower bound for the population size when birth rates are nonhomogeneous or individuals in the population are aggregated. A simulation study was conducted to evaluate the performance of the point estimates arising from the two approaches and to compare the performance of seven interval estimators, including likelihood ratio and log-transformation methods. We illustrate the utility of these methods by estimating: (1) the abundance of tree species over a 50-hectare forest plot, (2) the number of Chlamydia infections when only the number of different birthdays of the patients is known, and (3) the number of rainy days when the number of rainy weeks is known. Supplementary materials for this article are available online.
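Both point estimators have simple closed forms obtained by inverting the moment equations for the expected number D of distinct birthdays among c = 365 possible days; a sketch (the paper's interval estimators are not reproduced, and the observed count is hypothetical):

```python
import numpy as np

def n_hat_multinomial(d, c=365):
    """Method-of-moments estimate under the multinomial model:
    E[D] = c * (1 - (1 - 1/c)**N), solved for N."""
    return np.log(1 - d / c) / np.log(1 - 1 / c)

def n_hat_poisson(d, c=365):
    """MLE under the Poisson model: E[D] = c * (1 - exp(-N/c))."""
    return -c * np.log(1 - d / c)

# E.g. 120 distinct birthdays observed among patients (hypothetical count):
d = 120
print(n_hat_multinomial(d), n_hat_poisson(d))  # both approximately 145
```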

8.
We consider the problem of identifying the genetic loci (called quantitative trait loci (QTLs)) contributing to variation in a quantitative trait, with data on an experimental cross. A large number of different statistical approaches to this problem have been described; most make use of multiple tests of hypotheses, and many consider models allowing only a single QTL. We feel that the problem is best viewed as one of model selection, and we discuss the use of model selection ideas to identify QTLs in experimental crosses. We focus on a back-cross experiment with strictly additive QTLs and concentrate on identifying QTLs, treating the estimation of their effects and precise locations as of secondary importance. We present the results of a simulation study to compare the performance of the more prominent methods.
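As a minimal stand-in for the model-selection view, forward stepwise selection of markers under an additive model with a BIC-type penalty can be sketched as follows; the penalty choice and backcross-style toy data are hypothetical, and the methods compared in the paper are more elaborate.

```python
import numpy as np

def _bic(X, y, penalty):
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + penalty * X.shape[1]

def forward_select_bic(G, y, penalty=None):
    """Greedy forward selection of markers (columns of G) for an
    additive model, using a BIC-type criterion."""
    n, m = G.shape
    if penalty is None:
        penalty = np.log(n)  # classical BIC; QTL work often uses heavier penalties
    selected, best = [], _bic(np.ones((n, 1)), y, penalty)
    improved = True
    while improved:
        improved = False
        for j in set(range(m)) - set(selected):
            Xj = np.column_stack([np.ones(n)] + [G[:, k] for k in selected + [j]])
            b = _bic(Xj, y, penalty)
            if b < best:
                best, cand, improved = b, j, True
        if improved:
            selected.append(cand)
    return selected

# Backcross-style toy data: 0/1 marker genotypes, two true additive QTLs.
rng = np.random.default_rng(3)
G = rng.integers(0, 2, size=(200, 50)).astype(float)
y = 1.0 * G[:, 4] + 0.8 * G[:, 20] + rng.normal(size=200)
print(forward_select_bic(G, y))  # expected to recover markers 4 and 20
```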

9.
One of the key questions in the use of mixture models concerns the choice of the number of components most suitable for a given data set. In this paper we investigate answers to this question in the context of likelihood-based clustering of the rows of a matrix of ordinal data modelled by the ordered stereotype model. Two methodologies for selecting the best model are demonstrated and compared. The first approach fits a separate model to the data for each possible number of clusters and then uses an information criterion to select the best model. The second approach uses a Bayesian construction in which the parameters and the number of clusters are estimated simultaneously from their joint posterior distribution. Simulation studies covering a variety of scenarios are presented to test the reliability of both approaches. Finally, the results of applying model selection to two real data sets are shown.
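The first approach, fitting a model for each candidate number of clusters and picking the best information criterion, can be sketched generically; a Gaussian mixture stands in here for the ordered stereotype model, so this illustrates the selection loop rather than the paper's model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulated data with three groups; the Gaussian mixture is a stand-in
# for the ordered stereotype model of the paper.
rng = np.random.default_rng(12)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(4, 1, (60, 2)),
               rng.normal([4, -4], 1, (40, 2))])

# Fit each candidate number of components and compare by BIC.
bics = {k: GaussianMixture(n_components=k, n_init=5, random_state=0)
            .fit(X).bic(X)
        for k in range(1, 7)}
print("selected number of components:", min(bics, key=bics.get))
```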

10.
Much attention has focused in recent years on the use of state-space models for describing and forecasting industrial time series. However, several state-space models that have been proposed for such series are not observable and do not have a unique representation, particularly in situations where the data history suggests marked seasonal trends. This raises major practical difficulties, since it becomes necessary to impose one or more constraints, which implies a complicated error structure on the model. The purpose of this paper is to demonstrate that state-space models are useful for describing time series data for forecasting purposes, and that there are trend-projecting state-space components that can be combined to provide observable state-space representations for specified data series. This result is particularly useful for seasonal or pseudo-seasonal time series. A well-known data series is examined in some detail, and several observable state-space models are suggested and compared favourably with the constrained observable model.
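Observability of a linear state-space model can be checked numerically via the rank of the observability matrix O = [H; HF; ...; HF^(n-1)]; a minimal sketch on standard textbook examples (the paper's specific trend-projecting components are not reproduced):

```python
import numpy as np

def is_observable(F, H):
    """Check observability of x_{t+1} = F x_t, y_t = H x_t via the
    rank of the observability matrix [H; HF; ...; HF^(n-1)]."""
    n = F.shape[0]
    blocks, M = [], H
    for _ in range(n):
        blocks.append(M)
        M = M @ F
    return np.linalg.matrix_rank(np.vstack(blocks)) == n

# Local linear trend model: state (level, slope), observing the level only.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
print(is_observable(F, H))  # True: level and slope are recoverable from y

# An unobservable variant: a third state component that never enters y.
F2 = np.eye(3)
F2[0, 1] = 1.0
H2 = np.array([[1.0, 0.0, 0.0]])
print(is_observable(F2, H2))  # False: the third component is invisible
```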

11.
In this paper, we study a change-point inference problem motivated by genomic data collected for the purpose of monitoring DNA copy number changes. DNA copy number changes, or copy number variations (CNVs), correspond to chromosomal aberrations and signify abnormality of a cell. Cancer development and other related diseases are usually associated with DNA copy number changes on the genome. Such data contain inherent random noise, so an appropriate statistical model is needed to identify statistically significant DNA copy number changes. This type of statistical inference is crucial in cancer research, clinical diagnostic applications, and other genomic research. For the high-throughput genomic data resulting from DNA copy number experiments, a mean and variance change-point model (MVCM) for detecting CNVs is appropriate. We propose a Bayesian approach to study the MVCM for the case of a single change, and we propose the use of a sliding window to search for all CNVs on a given chromosome. We carry out simulation studies to evaluate the estimate of the locus of the DNA copy number change using the derived posterior probability. The simulation results show that the approach is suitable for identifying copy number changes. The approach is also illustrated on several chromosomes from nine fibroblast cancer cell lines (array-based comparative genomic hybridization data). All DNA copy number aberrations that had been identified and verified by karyotyping on these cell lines are detected by our approach.
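A plug-in sketch of the locus inference: score every admissible change point of a Gaussian mean-and-variance change model by its profile log-likelihood and normalize under a uniform prior, giving an approximate posterior over the locus. This approximates, rather than reproduces, the paper's Bayesian derivation; the simulated log-ratio data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def change_point_posterior(x, min_seg=3):
    """Approximate posterior over the change locus tau in a mean-and-
    variance change model, using plug-in MLEs per segment and a
    uniform prior over admissible loci."""
    n = len(x)
    taus = np.arange(min_seg, n - min_seg + 1)
    loglik = np.array([
        norm.logpdf(x[:t], x[:t].mean(), x[:t].std()).sum()
        + norm.logpdf(x[t:], x[t:].mean(), x[t:].std()).sum()
        for t in taus
    ])
    w = np.exp(loglik - loglik.max())
    return taus, w / w.sum()

# Simulated log-ratio segment with a copy-number change at position 60.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 0.10, 60), rng.normal(0.35, 0.20, 40)])
taus, post = change_point_posterior(x)
print("posterior mode locus:", taus[np.argmax(post)])
```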

12.
We present a multivariate logistic regression model for the joint analysis of longitudinal multiple-source binary data. Such data arise when repeated binary measurements are obtained from two or more sources, with each source providing a measure of the same underlying variable. Since the number of responses on each subject is relatively large, the empirical variance estimator performs poorly and cannot be relied on in this setting. Two methods for obtaining a parsimonious within-subject association structure are considered. An additional complication arises with estimation, since maximum likelihood estimation may not be feasible without making unrealistically strong assumptions about third- and higher-order moments. To circumvent this, we propose a generalized estimating equations approach. Finally, we present an analysis of multiple-informant data obtained longitudinally from a psychiatric interventional trial that motivated the model developed in the paper.
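A generic version of the estimation strategy, GEE with a logit link and an exchangeable working association for repeated binary outcomes, can be sketched with statsmodels; the multiple-source association structures of the paper are not reproduced, and the data frame below is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical long-format data: repeated binary outcomes per subject.
rng = np.random.default_rng(5)
n_subj, n_rep = 100, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_rep),
    "time": np.tile(np.arange(n_rep), n_subj),
    "treat": np.repeat(rng.integers(0, 2, n_subj), n_rep),
})
eta = -0.5 + 0.8 * df["treat"] + 0.1 * df["time"]
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# GEE with a logit link and exchangeable within-subject association.
model = sm.GEE.from_formula("y ~ treat + time", groups="subject", data=df,
                            family=sm.families.Binomial(),
                            cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```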

13.
The problem of testing economic theories with non-experimental data is considered. Even though it has been recognized as one of the main objectives of the discipline since the 1940s, theory testing in econometrics did not receive due attention and remained largely an aspiration until around the 1980s, the few notable exceptions being Haavelmo (1944) and Sargan (1964). A specification search approach is considered that recognizes the non-experimental nature of the data and provides a general framework for evaluating the statistical reliability of a model. According to this approach, the econometrician specifies models that both capture the probabilistic structure of the data and, at the same time, provide a reliable synthesis of the economic data, reinforcing the decision-making capacity of the model itself. Invited paper at the Conference on "Statistical Tests: Methodology and Econometric Applications", held in Bologna, Italy, 27–28 May 1993.

14.
Inferential methods based on ranks provide robust and powerful alternative methodology for testing and estimation. This article pursues two objectives. First, we develop a general method of simultaneous confidence intervals based on the rank estimates of the parameters of a general linear model and derive the asymptotic distribution of the pivotal quantity. Second, we extend the method to high-dimensional data, such as gene expression data, for which the usual large-sample approximation does not apply. It is common in practice to use the asymptotic distribution to make inferences for small samples; the empirical investigation in this article shows that, for methods based on rank estimates, this approach does not produce viable inference and should be avoided. In particular, the commonly applied normal and t-approximations are not satisfactory, especially for large-scale inferences. A method based on the bootstrap is outlined and shown to provide a reliable and accurate way of constructing simultaneous confidence intervals based on rank estimates. Methods based on ranks are uniquely suitable for the analysis of microarray gene expression data, which often involve large-scale inferences based on small samples that contain many outliers and violate the assumption of normality. A real microarray data set is analyzed using the rank-estimate simultaneous confidence intervals, and the viability of the proposed method is assessed through a Monte Carlo simulation study under varied assumptions.
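A minimal one-sample analogue of the bootstrap proposal: use a rank-based (Hodges–Lehmann) location estimate per gene and calibrate a single simultaneous half-width from the bootstrap distribution of the maximum studentized deviation across genes. This is a sketch of the idea under simplifying assumptions, not the paper's general-linear-model procedure; the toy expression matrix is hypothetical.

```python
import numpy as np

def hodges_lehmann(x):
    """Median of pairwise (Walsh) averages: a rank-based location estimate."""
    i, j = np.triu_indices(len(x))
    return np.median((x[i] + x[j]) / 2)

def simultaneous_ci(X, level=0.95, B=500, seed=0):
    """Bootstrap simultaneous CIs for per-row locations (rows = genes),
    calibrating one half-width multiplier from the bootstrap
    distribution of the maximum studentized deviation across rows."""
    rng = np.random.default_rng(seed)
    g, n = X.shape
    est = np.array([hodges_lehmann(row) for row in X])
    scale = np.array([np.median(np.abs(row - np.median(row))) for row in X]) + 1e-12

    max_dev = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # resample arrays, keeping genes aligned
        boot = np.array([hodges_lehmann(row[idx]) for row in X])
        max_dev[b] = np.max(np.abs(boot - est) / scale)

    c = np.quantile(max_dev, level)
    return est - c * scale, est + c * scale

# Toy "expression matrix": 50 genes x 8 arrays, heavy-tailed noise.
rng = np.random.default_rng(6)
X = rng.standard_t(df=2, size=(50, 8)) * 0.5
X[:5] += 2.0  # five truly shifted genes
lo, hi = simultaneous_ci(X)
print("intervals excluding 0:", np.sum((lo > 0) | (hi < 0)))
```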

15.
Biological control of pests is an important branch of entomology, providing environmentally friendly forms of crop protection. Bioassays are used to find the optimal conditions for the production of parasites and strategies for application in the field. In some of these assays, proportions are measured, and these data often have an inflated number of zeros. In this work, six models are applied to data sets obtained from biological control assays for Diatraea saccharalis, a common pest in sugar cane production. A natural choice for modelling proportion data is the binomial model. The second model is an overdispersed version of the binomial model, estimated by a quasi-likelihood method; it was initially built to model overdispersion generated by individual variability in the probability of success. When interest is only in the positive proportion data, a model can be based on the truncated binomial distribution and on its overdispersed version. The last two models include the zero proportions and are based on a finite mixture model with the binomial distribution, or its overdispersed version, for the positive data. We present the models, discuss their estimation and compare the results.
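The last pair of models rests on a zero-inflated (mixture) binomial likelihood; a sketch of its maximum likelihood fit with constant mixing and success probabilities (no covariates, unlike a full bioassay analysis; the simulated data are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import binom

def zib_negloglik(theta, y, m):
    """Negative log-likelihood of a zero-inflated binomial:
    P(Y=0) = pi + (1-pi)*(1-p)^m,  P(Y=k) = (1-pi)*Binom(k; m, p)."""
    pi, p = expit(theta)  # unconstrained parameters mapped into (0, 1)
    pmf = binom.pmf(y, m, p)
    lik = np.where(y == 0, pi + (1 - pi) * pmf, (1 - pi) * pmf)
    return -np.sum(np.log(lik))

# Simulated assay: 200 batches of m = 20 hosts, 30% structural zeros (hypothetical).
rng = np.random.default_rng(7)
m, n = 20, 200
zero = rng.random(n) < 0.3
y = np.where(zero, 0, rng.binomial(m, 0.4, n))

fit = minimize(zib_negloglik, x0=np.zeros(2), args=(y, m), method="BFGS")
print("pi, p estimates:", expit(fit.x))  # roughly (0.3, 0.4)
```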

16.
The zero-inflated Poisson regression model is commonly used when analyzing economic data that come in the form of non-negative integers, since it accounts for excess zeros and overdispersion of the dependent variable. However, a problem often encountered when analyzing such data that has not been addressed for this model is multicollinearity. This paper proposes ridge regression (RR) estimators and some methods for estimating the ridge parameter k for this model. A simulation study has been conducted to compare the performance of the estimators, with both mean squared error and mean absolute error considered as performance criteria. The simulation study shows that some estimators are better than the commonly used maximum likelihood estimator and some other RR estimators. Based on the simulation study and an empirical application, some useful estimators are recommended for practitioners.
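One simple way to realize a ridge-type ZIP estimator is to add an L2 penalty k·||β||² to the negative log-likelihood and maximize numerically; the paper's closed-form RR estimators and its rules for choosing k are not reproduced, and the collinear toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def zip_ridge_negloglik(theta, X, y, k):
    """Penalized negative log-likelihood of a zero-inflated Poisson with
    constant inflation probability pi and log-linear mean; L2 penalty on
    the slope coefficients (intercept unpenalized)."""
    pi = expit(theta[0])
    beta = theta[1:]
    lam = np.exp(np.clip(X @ beta, -30, 30))  # clip for numerical stability
    logpois = -lam + y * np.log(lam) - gammaln(y + 1)
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * np.exp(-lam)),
                  np.log(1 - pi) + logpois)
    return -ll.sum() + k * np.sum(beta[1:] ** 2)

# Collinear covariates: the setting that motivates ridge shrinkage (hypothetical).
rng = np.random.default_rng(8)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
lam = np.exp(0.5 + 0.3 * x1 + 0.3 * x2)
y = np.where(rng.random(n) < 0.25, 0, rng.poisson(lam))

for k in (0.0, 1.0):
    fit = minimize(zip_ridge_negloglik, np.zeros(4), args=(X, y, k), method="BFGS")
    print(f"k={k}: slopes =", np.round(fit.x[2:], 3))
```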

17.
This article discusses the discretization of continuous-time filters for application to discrete time series sampled at any fixed frequency. In this approach, the filter is first set up directly in continuous time; since the filter is expressed over a continuous range of lags, we also refer to such filters as continuous-lag filters. The second step is to discretize the filter itself. This approach applies to different problems in signal extraction, including trend and business cycle analysis, and the method allows for coherent design of discrete filters for observed data sampled as a stock or a flow, for nonstationary data with stochastic trend, and for different sampling frequencies. We derive explicit formulas for the mean squared error (MSE) optimal discretization filters. We also discuss the problem of optimal interpolation for nonstationary processes, namely, how to estimate the values of a process and its components at arbitrary times between the sampling times. A number of illustrations of discrete filter coefficient calculations are provided, including the local level model (LLM) trend filter, the smooth trend model (STM) trend filter, and the band pass (BP) filter. The essential methodology can be applied to other kinds of trend extraction problems. Finally, we provide an extended demonstration of the method on CPI flow data measured at monthly and annual sampling frequencies.
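The two-step idea can be illustrated on a toy kernel: define a filter over continuous lags, then obtain discrete weights by integrating the kernel over unit cells around each integer lag (a simple flow-type discretization). The exponential kernel below is a stand-in, not one of the paper's MSE-optimal filters.

```python
import numpy as np
from scipy.integrate import quad

def psi(u, q=1.0):
    """Stand-in continuous-lag smoothing kernel psi(u) = (q/2)*exp(-q|u|),
    which integrates to 1 over the real line."""
    return 0.5 * q * np.exp(-q * np.abs(u))

def discretize(psi, max_lag=20, delta=1.0):
    """Discrete weights by integrating psi over cells of width delta
    centred on each integer multiple of delta."""
    lags = np.arange(-max_lag, max_lag + 1)
    w = np.array([quad(psi, (l - 0.5) * delta, (l + 0.5) * delta)[0] for l in lags])
    return lags, w / w.sum()  # renormalize for the truncated support

# Apply the discretized filter to a noisy trend (simulated data).
rng = np.random.default_rng(9)
t = np.arange(200)
x = 0.02 * t + rng.normal(scale=0.5, size=200)
lags, w = discretize(psi)
trend = np.convolve(x, w, mode="same")
print(trend[90:95].round(2))
```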

18.
Dynamic regression models are widely used because they express and model the behaviour of a system over time. In this article, two dynamic regression models, the distributed lag (DL) model and the autoregressive distributed lag model, are evaluated with a focus on their lag lengths. From a classical statistics point of view, there are various methods for determining the number of lags, but none of them is best in all situations. This is a serious issue, since a wrong choice yields poor estimates of the effects of the regressors on the response variable. We present an alternative that addresses these problems by taking a Bayesian approach. The posterior distributions of the numbers of lags are derived under an improper prior for the model parameters. The fractional Bayes factor technique [A. O'Hagan, Fractional Bayes factors for model comparison (with discussion), J. R. Statist. Soc. B 57 (1995), pp. 99–138] is used to handle the indeterminacy in the likelihood function caused by the improper prior, and the zero-one loss function is used to penalize wrong decisions. A naive method using the specified maximum number of lags is also presented. The proposed and naive methods are verified using simulated data, with promising results for the proposed method. An illustrative example with a real data set is provided.
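As a classical baseline for the lag-length decision (the paper's fractional-Bayes-factor posterior is more involved and not reproduced), an information-criterion comparison across DL lag lengths on a common estimation sample can be sketched:

```python
import numpy as np

def dl_design(x, y, L, trim):
    """Design for a distributed lag model y_t = b0 + sum_{i<=L} b_i x_{t-i} + e_t,
    built on the common sample t >= trim so that models are comparable."""
    T = len(y)
    cols = [np.ones(T - trim)] + [x[trim - i: T - i] for i in range(L + 1)]
    return np.column_stack(cols), y[trim:]

def bic_by_lag(x, y, max_lag=8):
    out = {}
    for L in range(max_lag + 1):
        X, yy = dl_design(x, y, L, max_lag)
        beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
        n = len(yy)
        rss = np.sum((yy - X @ beta) ** 2)
        out[L] = n * np.log(rss / n) + X.shape[1] * np.log(n)
    return out

# Simulated series whose true lag length is 2 (hypothetical).
rng = np.random.default_rng(10)
T = 300
x = rng.normal(size=T)
y = (1.0 + 0.6 * x + 0.3 * np.r_[0, x[:-1]] + 0.2 * np.r_[0, 0, x[:-2]]
     + rng.normal(scale=0.5, size=T))
bics = bic_by_lag(x, y)
print("selected lag length:", min(bics, key=bics.get))
```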

19.
While the estimation of the parameters of a hidden Markov model has been studied extensively, the consistent estimation of the number of hidden states is still an unsolved problem. The AIC and BIC methods are used most commonly, but their use in this context has not been justified theoretically. The author shows that, for many common models, the penalized minimum-distance method yields a consistent estimate of the number of hidden states in a stationary hidden Markov model. In addition to addressing the identifiability issues, she applies her method to a multiple sclerosis data set and assesses its performance via simulation.
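A sketch of the common AIC/BIC practice the author contrasts with: fit Gaussian HMMs with increasing numbers of states and select by BIC, here using the third-party hmmlearn package (assumed installed); the penalized minimum-distance method itself is not reproduced.

```python
import numpy as np
from hmmlearn import hmm  # third-party package, assumed installed

def bic_for_states(X, k, seed=0):
    """Fit a k-state Gaussian HMM and return its BIC."""
    model = hmm.GaussianHMM(n_components=k, covariance_type="diag",
                            n_iter=200, random_state=seed)
    model.fit(X)
    loglik = model.score(X)
    # Free parameters: initial probs, transition matrix, means, variances.
    n_params = (k - 1) + k * (k - 1) + 2 * k * X.shape[1]
    return -2 * loglik + n_params * np.log(len(X))

# Simulate a 2-state chain and compare candidate state counts.
rng = np.random.default_rng(11)
states = [0]
for _ in range(499):
    states.append(rng.choice(2, p=[0.9, 0.1] if states[-1] == 0 else [0.2, 0.8]))
X = np.array([rng.normal([0.0, 3.0][s], 1.0) for s in states]).reshape(-1, 1)

bics = {k: bic_for_states(X, k) for k in (1, 2, 3, 4)}
print("selected number of states:", min(bics, key=bics.get))
```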
