Similar Documents
20 similar documents found.
1.
The magnitude–frequency distribution (MFD) of earthquakes is a fundamental statistic in seismology. The so-called b-value in the MFD is of particular interest in geophysics. A continuous-time hidden Markov model (HMM) is proposed for characterizing the variability of b-values. The HMM-based approach to modeling the MFD has some appealing properties over the widely used sliding-window approach. The estimated b-value often varies considerably with the tuning of the window size, which may make b-value heterogeneities difficult to interpret. Continuous-time hidden Markov models (CT-HMMs) are widely applied in various fields. They have an advantage over their discrete-time counterparts in that they can characterize heterogeneities appearing in time series on a finer time scale, particularly for highly irregularly spaced series such as earthquake occurrences. We present an expectation–maximization algorithm for the estimation of general exponential-family CT-HMMs. In parallel with discrete-time hidden Markov models, we develop a continuous-time version of the Viterbi algorithm to retrieve the overall optimal path of the latent Markov chain. The methods are applied to New Zealand deep earthquakes. Before the analysis, we assess the completeness of the catalogue to ensure that the analysis is not biased by missing data. The estimated b-value is stable across the choice of magnitude threshold, which is ideal for the interpretation of b-value variability.
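As background, the b-value above comes from the Gutenberg–Richter relation log10 N(≥m) = a − b·m. A minimal sketch of Aki's classical maximum-likelihood b-value estimate (standard seismology, shown as a point of reference rather than the paper's CT-HMM estimator; the magnitudes are synthetic, not the New Zealand catalogue):

    import numpy as np

    def aki_b_value(mags, mc):
        # Aki (1965) ML estimate: b = log10(e) / (mean(M) - Mc),
        # assuming continuous (unbinned) magnitudes above the
        # completeness magnitude Mc.
        m = np.asarray(mags, dtype=float)
        m = m[m >= mc]
        return np.log10(np.e) / (m.mean() - mc)

    # Synthetic magnitudes with true b = 1:
    rng = np.random.default_rng(0)
    mags = 4.0 + rng.exponential(scale=np.log10(np.e), size=2000)
    print(aki_b_value(mags, mc=4.0))   # close to 1.0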

2.
Tree-based methods similar to CART have recently been used for problems in which the main goal is to estimate some set of interest. The boundary of the true set is often smooth in some sense, whereas tree-based estimates are not smooth, being unions of 'boxes'. We propose a general methodology for smoothing such sets that automatically allows for varying levels of smoothness along the boundary. The method is similar in spirit to support vector machines: a computationally simple technique is applied to the data after a non-linear mapping, producing smooth estimates in the original space. In particular, we consider the problem of level-set estimation for regression functions and the dyadic tree-based method of Willett and Nowak [Minimax optimal level-set estimation, IEEE Trans. Image Process. 16 (2007), pp. 2965–2979].
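A minimal plug-in sketch of the 'boxy' tree-based level-set estimate that the smoothing targets (a simplified analogue using a generic CART regressor, not the authors' dyadic method; the level gamma and data are illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Plug-in level-set estimate: fit a regression tree f_hat, then
    # take {x : f_hat(x) >= gamma}.  The estimate is a union of
    # axis-aligned boxes (the tree leaves), hence non-smooth.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(2000, 2))
    y = np.exp(-4 * (X ** 2).sum(axis=1)) + rng.normal(0, 0.1, size=2000)

    gamma = 0.5
    tree = DecisionTreeRegressor(max_leaf_nodes=64).fit(X, y)

    xx = np.linspace(-1, 1, 50)
    grid = np.array(np.meshgrid(xx, xx)).reshape(2, -1).T
    in_set = tree.predict(grid) >= gamma   # boxy level-set estimate
    print(in_set.mean())                   # fraction of the square in the set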

3.
Longitudinal health-related quality-of-life data arise naturally from studies of progressive and neurodegenerative diseases. In such studies, patients' mental and physical conditions are measured over their follow-up periods, and the resulting data are often complicated by subject-specific measurement times and possible terminal events associated with the outcome variables. Motivated by the "Predictor's Cohort" study on patients with advanced Alzheimer disease, we propose in this paper a semiparametric modeling approach to longitudinal health-related quality-of-life data. It builds upon and extends some recent developments for longitudinal data with irregular observation times, and it handles possibly dependent terminal events. The approach allows one to examine time-dependent covariate effects on the evolution of the outcome variable and to assess nonparametrically the change in the outcome measurement that is due to factors not captured by the covariates. The usual large-sample properties for parameter estimation are established. In particular, it is shown that the relevant parameter estimators are asymptotically normal and that the asymptotic variances can be estimated consistently by a simple plug-in method. A general procedure for testing a specific parametric form of the nonparametric component is also developed. Simulation studies show that the proposed approach performs well in practical settings. The method is applied to the motivating example.

4.
Preferential attachment in a directed scale-free graph is an often-used paradigm for modeling the evolution of social networks. Social network data are usually given in a format that allows recovery of the number of nodes with in-degree i and out-degree j. Assuming a preferential attachment model, formal statistical estimation procedures can be based on such data summaries. Anticipating the statistical need for such node-based methods, we prove asymptotic normality of the node counts. Our approach is based on a martingale construction and a martingale central limit theorem.
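A minimal simulation sketch of one common directed preferential-attachment scheme, tallying the node counts N_ij (nodes with in-degree i and out-degree j) on which such estimation would be based; the scheme and parameters are illustrative and may differ from the paper's exact model:

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(2)
    alpha, d_in, d_out = 0.5, 1.0, 1.0
    in_deg, out_deg = [0, 1], [1, 0]     # seed graph: edge 0 -> 1

    for _ in range(5000):
        if rng.random() < alpha:
            # New node sends an edge to an existing node chosen with
            # probability proportional to in-degree + d_in.
            w = np.array(in_deg, dtype=float) + d_in
            tgt = rng.choice(len(in_deg), p=w / w.sum())
            in_deg.append(0); out_deg.append(1)
            in_deg[tgt] += 1
        else:
            # New node receives an edge from an existing node chosen
            # with probability proportional to out-degree + d_out.
            w = np.array(out_deg, dtype=float) + d_out
            src = rng.choice(len(out_deg), p=w / w.sum())
            in_deg.append(1); out_deg.append(0)
            out_deg[src] += 1

    # Node counts N_ij, recoverable from typical social network data:
    N = Counter(zip(in_deg, out_deg))
    print(N[(0, 1)], N[(1, 0)], N[(1, 1)])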

5.
The generalized extreme value (GEV) distribution arises as the limiting distribution of block maxima (blocks of size n) and is used to model extreme events. However, extreme data may contain an excessive number of zeros, which makes such events difficult to analyze and estimate with the usual GEV distribution. The zero-inflated distribution (ZID) is widely used in the literature for modeling data with excess zeros by introducing an inflation parameter w. The present work develops a new approach to analyzing zero-inflated extreme values, applied to monthly maximum precipitation data in which months without precipitation are recorded as zeros. Inference is carried out in the Bayesian paradigm, with parameters estimated by numerical approximation of the posterior distribution using Markov chain Monte Carlo (MCMC) methods. Time series from several cities in the northeastern region of Brazil are analyzed, some with a predominance of non-rainy months. The applications show that this approach yields more accurate results, with better goodness-of-fit measures, than the standard extreme-value analysis.
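Under the natural mixture reading of the abstract (a zero with probability w, a GEV draw otherwise), the log-likelihood explored by MCMC is simple to write down; a minimal sketch with illustrative parameter values (scipy's shape c equals −ξ in the usual GEV convention):

    import numpy as np
    from scipy.stats import genextreme

    def zi_gev_loglik(params, x):
        # Zero-inflated GEV: P(X = 0) = w; nonzero values carry the
        # GEV density with weight 1 - w.
        w, c, loc, scale = params
        x = np.asarray(x)
        nz = x[x != 0]
        n0 = x.size - nz.size
        return (n0 * np.log(w) + nz.size * np.log1p(-w)
                + genextreme.logpdf(nz, c, loc=loc, scale=scale).sum())

    # Illustrative monthly maxima with ~30% dry (zero) months:
    rng = np.random.default_rng(3)
    x = genextreme.rvs(-0.1, loc=50, scale=20, size=240, random_state=rng)
    x[rng.random(x.size) < 0.3] = 0.0
    print(zi_gev_loglik([0.3, -0.1, 50.0, 20.0], x))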

6.
Summary.  Key ecological studies involve the regular censusing of populations of wild animals, resulting in individual case-history data that record when marked individuals are seen alive and/or found dead. We show how current conditional methods of analysing case-history data may be biased. We then show how a correction can be applied, making use of results from a mark–recovery–recapture analysis. This allows a simple investigation of the effect of time-varying individual covariates, such as weight, that often contain missing values. The work is motivated and illustrated by the study of Soay sheep in the St Kilda archipelago.

7.
Binary response models are often applied in dose–response settings where the number of dose levels is limited. One commonly finds cases where maximum likelihood estimation for these models produces infinite values for at least one parameter, often corresponding to the 'separated data' problem. Algorithms for detecting such data have been proposed, but they are usually incorporated directly into the parameter estimation, and they do not consider the use of asymptotes in the model formulation. To study this phenomenon in greater detail, we define the class of specifiably degenerate functions in which it can occur (including the popular logistic and Weibull models), a class that allows for asymptotes in the dose–response specification. We demonstrate for this class that the well-known pool-adjacent-violators algorithm can efficiently pre-screen for non-estimable data. A simulation study demonstrates how frequently this problem can occur for various response models and conditions.
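The pool-adjacent-violators algorithm (PAVA) is the standard isotonic-regression solver, so a pre-screen of this kind is easy to sketch; the flagging rule below is an illustrative heuristic in the spirit of the abstract, not the paper's exact criterion: under complete separation the isotonic fit to the dose-level proportions consists only of 0 and 1 blocks.

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    def pava_prescreen(dose, successes, trials):
        # Fit PAVA to the observed proportions; fitted values that are
        # all exactly 0 or 1 suggest the monotone-model MLE is infinite.
        p_hat = successes / trials
        iso = IsotonicRegression(y_min=0.0, y_max=1.0)
        fit = iso.fit_transform(dose, p_hat, sample_weight=trials)
        return bool(np.all(np.isclose(fit, 0.0) | np.isclose(fit, 1.0)))

    dose = np.array([1.0, 2.0, 3.0, 4.0])
    trials = np.full(4, 5.0)
    print(pava_prescreen(dose, np.array([0, 0, 5, 5]), trials))  # True
    print(pava_prescreen(dose, np.array([1, 2, 3, 5]), trials))  # False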

8.
When the distribution of a process characterized by a profile is non-normal, capability analysis under a normality assumption often leads to erroneous interpretations of process performance. Profile monitoring is a relatively new set of techniques in quality control, used when the state of a product or process is represented by a functional relationship between two or more quality characteristics. Such profiles can be modeled using linear or nonlinear regression models. In some applications the quality characteristics are assumed to follow a normal distribution; in others this assumption fails to hold and may yield misleading results. In this article, we consider process capability analysis of non-normal linear profiles. We investigate and compare five methods for estimating a non-normal process capability index (PCI) in profiles. Three of the methods require an estimate of the cumulative distribution function (cdf) of the process; to obtain it, we use a Burr XII distribution as well as empirical distributions. However, the PCI based on an estimated cdf is sometimes far from its true value, so we also apply an artificial neural network with supervised learning, which allows estimation of PCIs in profiles without estimating the cdf of the process. A Box–Cox transformation technique is also developed to handle non-normal situations. Finally, a comparison study is performed through simulation of Gamma, Weibull, Lognormal, Beta and Student-t data.
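Of the compared routes, the Box–Cox one is the easiest to sketch: transform the data toward normality, transform the specification limits with the same λ, and compute the usual index on the transformed scale. A minimal sketch with illustrative data and limits (not the paper's profile-based procedure):

    import numpy as np
    from scipy.stats import boxcox
    from scipy.special import boxcox as bc   # apply a given lambda

    rng = np.random.default_rng(4)
    x = rng.gamma(shape=2.0, scale=1.5, size=500)   # skewed process data
    lsl, usl = 0.2, 12.0                            # illustrative limits

    y, lam = boxcox(x)                              # ML choice of lambda
    t_lsl, t_usl = bc(lsl, lam), bc(usl, lam)       # transformed limits
    mu, sigma = y.mean(), y.std(ddof=1)
    cpk = min(t_usl - mu, mu - t_lsl) / (3 * sigma)
    print(lam, cpk)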

9.
To summarize a set of data by a distribution function in Johnson's translation system, we use a least-squares approach to parameter estimation in which we seek to minimize the distance between the vector of "uniformized" order statistics and the corresponding vector of expected values. We use the software package FITTRI to apply this technique to three problems arising, respectively, in medicine, applied statistics, and civil engineering. Compared with traditional methods of distribution fitting based on moment matching, percentile matching, L1 estimation, and L∞ estimation, the least-squares technique is seen to yield fits of similar accuracy and to converge more rapidly and reliably to a set of acceptable parameter estimates.
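A minimal sketch of the least-squares idea with scipy's Johnson SU family (standing in for the FITTRI software; starting values and data are illustrative): map the order statistics through the candidate cdf and match the resulting 'uniformized' values to their expectations i/(n+1).

    import numpy as np
    from scipy.stats import johnsonsu
    from scipy.optimize import minimize

    rng = np.random.default_rng(5)
    x = np.sort(johnsonsu.rvs(1.0, 2.0, loc=0.0, scale=1.0, size=300,
                              random_state=rng))
    u_expected = np.arange(1, x.size + 1) / (x.size + 1)   # E[U_(i)]

    def loss(theta):
        a, b, loc, scale = theta
        if b <= 0 or scale <= 0:
            return np.inf
        u = johnsonsu.cdf(x, a, b, loc=loc, scale=scale)   # uniformized
        return ((u - u_expected) ** 2).sum()

    res = minimize(loss, x0=[0.5, 1.0, 0.0, 1.5], method="Nelder-Mead")
    print(res.x)   # roughly the true (a, b, loc, scale) = (1, 2, 0, 1)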

10.
Neural networks are a popular machine learning tool, particularly in applications such as protein structure prediction; however, overfitting can pose an obstacle to their effective use. Because of the large number of parameters in a typical neural network, one may obtain a network that fits the learning data perfectly yet fails to generalize to other data sets. One way of reducing the size of the parameter space is to alter the network topology so that some edges are removed; however, it is often not immediately apparent which edges should be eliminated. We propose a data-adaptive method of selecting an optimal network architecture using a deletion/substitution/addition algorithm. Results of this approach to classification are presented for simulated data and for the breast cancer data of Wolberg and Mangasarian [1990. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Nat. Acad. Sci. 87, 9193–9196].

11.
Two statistical applications for the estimation and prediction of flows in traffic networks are presented. In the first, the numbers of route users are assumed to be independent α-shifted gamma Γ(θ, λ0) random variables, denoted H(α, θ, λ0), with common λ0. As a consequence, the link, OD (origin–destination) and node flows are also H(α, θ, λ0) variables. We assume that the main source of information is plate scanning, which permits us to identify, totally or partially, the route, OD and link flows of vehicles by scanning their plate numbers at an adequately selected subset of links. A Bayesian approach using conjugate families is proposed for estimating the different traffic flows. In the second application, a stochastic-demand dynamic traffic model is presented for predicting traffic variables and their time evolution in real networks. The Bayesian network model treats the variables as generalized Beta variables that, when marginally transformed to standard normal, become multivariate normal. The model can provide a point estimate, a confidence interval or the density of the variable being predicted. Finally, the models are illustrated by application to the Nguyen–Dupuis network and the Vermont-State example. The resulting traffic predictions seem promising for real traffic networks and can be computed in real time.

12.
The asymptotic variance of the maximum likelihood estimator is proved to decrease when the maximization is restricted to a subspace that contains the true parameter value. Maximum likelihood estimation allows a systematic fitting of covariance models to the sample, which is important in data assimilation. The hierarchical maximum likelihood approach is applied to the spectral diagonal covariance model with different parameterizations of eigenvalue decay, and to the sparse inverse covariance model with specified parameter values on different sets of nonzero entries. It is shown computationally that using smaller sets of parameters can substantially decrease the sampling noise in high dimensions.
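A minimal sketch of the spectral diagonal covariance model itself (a simple reading of the model class, not the paper's hierarchical estimator): transform ensemble members to a spectral basis and keep only per-coefficient variances, so the number of parameters grows linearly rather than quadratically in the dimension.

    import numpy as np

    rng = np.random.default_rng(6)
    n, m = 64, 10                     # state dimension, ensemble size
    ens = rng.normal(size=(m, n)).cumsum(axis=1)   # m smooth-ish samples

    spec = np.fft.rfft(ens, axis=1)   # move to a spectral basis
    d = spec.var(axis=0)              # diagonal model: one variance per
                                      # coefficient (n/2 + 1 parameters
                                      # instead of n(n+1)/2)
    print(d[:5])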

13.
A model for survival analysis is studied that is relevant for samples subject to multiple types of failure. In comparison with a more standard approach, the model, through appropriate use of hazard functions and transition probabilities, allows a more accurate study of cause-specific failure with regard to both the timing and the type of failure. A semiparametric mixture model is employed that adjusts for concomitant variables and allows assessment of their effects on the probabilities of the eventual causes of failure through a generalized logistic model, and of their effects on the corresponding conditional hazard functions through the Cox proportional hazards model. A carefully formulated estimation procedure is presented that uses an EM algorithm based on a profile-likelihood construction. The methods discussed, which could also be used for reliability analysis, are applied to a prostate cancer data set.

14.
Model checking with discrete-data regressions can be difficult because the usual methods, such as residual plots, have complicated reference distributions that depend on the parameters in the model. Posterior predictive checks have been proposed as a Bayesian way to average the results of goodness-of-fit tests in the presence of uncertainty in the estimation of the parameters. We try this approach using a variety of discrepancy variables for generalized linear models fitted to a historical data set on behavioural learning, and we then discuss the general applicability of our findings in the context of a recent applied example on which we have worked. We find that the following discrepancy variables work well, in the sense of being easy to interpret and sensitive to important model failures: structured displays of the entire data set; general discrepancy variables based on plots of binned or smoothed residuals versus predictors; and specific discrepancy variables created on the basis of the particular concerns arising in an application. Plots of binned residuals are especially easy to use because their predictive distributions under the model are simple enough that model checks can often be made implicitly. The following discrepancy variables did not work well: scatterplots of latent residuals defined from an underlying continuous model, and quantile–quantile plots of these residuals.
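Binned residuals are simple to compute; a minimal sketch for a binary regression (synthetic data and stand-in fitted probabilities, 20 equal-size bins chosen for illustration): sort by fitted value, average residuals within bins, and compare each bin mean with an approximate ±2 standard-error band.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 1000
    xv = rng.normal(size=n)
    p_true = 1 / (1 + np.exp(-(0.5 + 1.2 * xv)))
    y = rng.binomial(1, p_true)

    # Stand-in for fitted probabilities from some logistic fit:
    p_hat = np.clip(p_true + rng.normal(0, 0.02, n), 0.01, 0.99)
    resid = y - p_hat

    bins = np.array_split(np.argsort(p_hat), 20)
    for idx in bins[:3]:              # show the first few bins
        mean_res = resid[idx].mean()
        se = np.sqrt((p_hat[idx] * (1 - p_hat[idx])).mean() / idx.size)
        flag = "!" if abs(mean_res) > 2 * se else " "
        print(f"{p_hat[idx].mean():.2f}  {mean_res:+.3f}  +/-{2*se:.3f} {flag}")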

15.
Summary.  In survival data collected from phase III clinical trials on breast cancer, a patient may experience more than one event, including recurrence of the original cancer, a new primary cancer and death. Radiation oncologists are often interested in comparing patterns of local or regional recurrences alone, as first events, to identify a subgroup of patients who need to be treated by radiation therapy after surgery. The cumulative incidence function provides estimates of the cumulative probability of locoregional recurrences in the presence of other competing events. A simple version of the Gompertz distribution is proposed to parameterize the cumulative incidence function directly. The model interpretation in terms of the cumulative incidence function is more natural than under the usual cause-specific hazard parameterization. Maximum likelihood analysis is used to estimate simultaneously the parametric models for the cumulative incidence functions of all causes. The parametric cumulative incidence approach is applied to a data set from the National Surgical Adjuvant Breast and Bowel Project and compared with analyses based on parametric cause-specific hazard models and on nonparametric cumulative incidence estimation.
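The appeal of the Gompertz form here is that with a negative shape parameter its cdf plateaus below 1, so it can serve directly as a cumulative incidence (sub-distribution) function. Under the usual hazard parameterization h(t) = b·exp(ct) with b > 0 (our reading of the 'simple version', not necessarily the paper's exact notation),

    F(t) = 1 − exp{(b/c)(1 − exp(ct))},

and for c < 0 the limit F(∞) = 1 − exp(b/c) < 1 is the eventual probability of failure from the cause in question.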

16.
The gap time between recurrent events is often of primary interest in many fields, such as medical studies. In this article, we discuss regression analysis of gap times arising from a general class of additive transformation models. For this problem, we propose two estimation procedures, the modified within-cluster resampling (MWCR) method and the weighted risk-set (WRS) method, and the proposed estimators are shown to be consistent and asymptotically normal. In particular, the estimators have closed forms and can be easily computed, and the methods have the advantage of leaving the correlation among gap times arbitrary. A simulation study conducted to assess the finite-sample performance of the proposed methods suggests that they work well in practical situations. The methods are also applied to a set of real data from a chronic granulomatous disease (CGD) clinical trial.

17.
Skew scale mixtures of normal distributions are often used in statistical procedures involving asymmetric, heavy-tailed data. The main virtue of the members of this family of distributions is that they are easy to simulate from and admit genuine expectation–maximization (EM) algorithms for maximum likelihood estimation. In this paper, we extend the EM algorithm to linear regression models under these distributions and develop diagnostic analyses via local influence and generalized leverage, following Zhu and Lee's approach, because Cook's well-known approach cannot be used here to obtain measures of local influence. The EM-type algorithm is discussed with emphasis on the skew Student-t-normal, skew-slash, skew-contaminated normal and skew power-exponential distributions. Finally, results obtained for a real data set are reported, illustrating the usefulness of the proposed method.

18.
The economic mobility of individuals and households is of fundamental interest. While many measures of economic mobility exist, reliance on transition matrices remains pervasive due to their simplicity and ease of interpretation. However, estimation of transition matrices is complicated by the well-acknowledged problem of measurement error in self-reported, and even administrative, data. Existing methods of addressing measurement error are complex, rely on numerous strong assumptions, and often require data from more than two periods. In this article, we investigate what can be learned about economic mobility, as measured via transition matrices, while formally accounting for measurement error in a reasonably transparent manner. To do so, we develop a nonparametric partial identification approach that bounds transition probabilities under various assumptions on the measurement error and mobility processes. The approach is applied to panel data from the United States to explore short-run mobility before and after the Great Recession.

19.
The estimation of the covariance matrix is important in the analysis of bivariate longitudinal data. A good estimator of the covariance matrix can improve the efficiency of the estimators of the mean regression coefficients. Moreover, the covariance estimation is of interest in its own right, but modeling the covariance matrix of bivariate longitudinal data is challenging because of its complex structure and the positive-definiteness constraint. In addition, most existing approaches are based on maximum likelihood, which is very sensitive to outliers and heavy-tailed error distributions. In this article, an adaptive robust estimation method is proposed for bivariate longitudinal data. Unlike existing likelihood-based methods, the proposed method can adapt to different error distributions. Specifically, we first utilize the modified Cholesky block decomposition to parameterize the covariance matrices. Second, we apply the bounded Huber score function to develop a set of robust generalized estimating equations that estimate the parameters in the mean and covariance models simultaneously. A data-driven approach is presented for selecting the tuning parameter c in the Huber score function, ensuring that the proposed method is both robust and efficient. A simulation study and a real data analysis illustrate the robustness and efficiency of the proposed approach.
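The bounded Huber score referred to here is the standard one; a minimal sketch (c = 1.345 is the common default, while the paper selects c in a data-driven way not reproduced here):

    import numpy as np

    def huber_psi(r, c=1.345):
        # Identity for small residuals, clipped at +/- c, so large
        # residuals have bounded influence on the estimating equations.
        return np.clip(r, -c, c)

    r = np.array([-5.0, -1.0, 0.3, 2.0, 8.0])
    print(huber_psi(r))   # [-1.345 -1.    0.3   1.345  1.345]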

20.
Exponential-family random graph models (ERGMs) provide a principled way to model and simulate features common in human social networks, such as propensities for homophily and friend-of-a-friend triad closure. We show that, without adjustment, ERGMs preserve density as network size increases. Density invariance is often not appropriate for social networks. We suggest a simple modification based on an offset that instead preserves the mean degree and accommodates changes in network composition asymptotically. We demonstrate that this approach allows ERGMs to be applied to the important situation of egocentrically sampled data, and we analyze data from the National Health and Social Life Survey (NHSLS).
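Our reading of the offset (following Krivitsky, Handcock and Morris's network-size adjustment): add −log n to the coefficient of the edge-count statistic,

    η_edges(n) = η* − log n,

so that the expected number of edges grows linearly rather than quadratically in the number of nodes n, holding the mean degree, not the density, asymptotically constant.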
