Similar Documents
20 similar documents found (search time: 240 ms)
1.
Let X be an N(μ, σ²)-distributed characteristic with unknown σ. We present the minimax version of the two-stage t test, which has minimal maximal average sample size among all two-stage t tests obeying the classical two-point condition on the operating characteristic. We give several examples. Furthermore, the minimax version of the two-stage t test is compared with the corresponding two-stage Gauß test.

2.
In semi-competing risks one considers a terminal event, such as death of a person, and a non-terminal event, such as disease recurrence. We present a model where the time to the terminal event is the first passage time to a fixed level c in a stochastic process, while the time to the non-terminal event is represented by the first passage time of the same process to a stochastic threshold S, assumed to be independent of the stochastic process. In order to be explicit, we let the stochastic process be a gamma process, but other processes with independent increments may alternatively be used. For semi-competing risks this appears to be a new modeling approach, being an alternative to traditional approaches based on illness-death models and copula models. In this paper we consider a fully parametric approach. The likelihood function is derived and statistical inference in the model is illustrated on both simulated and real data.
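To fix ideas, here is a minimal simulation sketch of the first-passage construction, not taken from the paper: it discretizes a gamma process on a grid and, purely for illustration, assumes an exponential law for the random threshold S. The threshold's distribution, the process parameters, and the grid step are all arbitrary choices.

```r
## Toy first-passage simulation: gamma process, fixed level c, random level S.
set.seed(1)
simulate_pair <- function(c_level = 5, nu = 1, dt = 0.01, t_max = 50) {
  path <- cumsum(rgamma(t_max / dt, shape = nu * dt, rate = 1))  # gamma process on a grid
  S <- rexp(1, rate = 1 / c_level)                   # stochastic threshold (assumed law)
  c(nonterminal = dt * match(TRUE, path >= S),       # first passage to the random level S
    terminal    = dt * match(TRUE, path >= c_level)) # first passage to the fixed level c
}
times <- t(replicate(2000, simulate_pair()))
## Fraction of runs where the non-terminal event comes first; since the path is
## nondecreasing this is roughly P(S < c), up to discretization error.
mean(times[, "nonterminal"] <= times[, "terminal"], na.rm = TRUE)
```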

3.
Simulated tempering (ST) is an established Markov chain Monte Carlo (MCMC) method for sampling from a multimodal density π(θ). Typically, ST involves introducing an auxiliary variable k taking values in a finite subset of [0,1] and indexing a set of tempered distributions, say \(\pi_k(\theta) \propto \pi(\theta)^k\). In this case, small values of k encourage better mixing, but samples from π are only obtained when the joint chain for (θ,k) reaches k=1. However, the entire chain can be used to estimate expectations under π of functions of interest, provided that importance sampling (IS) weights are calculated. Unfortunately this method, which we call importance tempering (IT), can disappoint. This is partly because the most immediately obvious implementation is naïve and can lead to high variance estimators. We derive a new optimal method for combining multiple IS estimators and prove that the resulting estimator has a highly desirable property related to the notion of effective sample size. We briefly report on the success of the optimal combination in two modelling scenarios requiring reversible-jump MCMC, where the naïve approach fails.
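The reweighting step itself is easy to sketch. The toy example below is not the paper's optimal combination, only the basic single-temperature version: it samples a bimodal π at one fixed temperature k < 1 with random-walk Metropolis and reweights back to π with importance weights proportional to \(\pi(\theta)^{1-k}\). The target, the level k, and the proposal scale are illustrative choices.

```r
## Single-temperature importance tempering sketch (not the optimal combination).
set.seed(1)
log_pi <- function(x) log(0.5 * dnorm(x, -3) + 0.5 * dnorm(x, 3))  # bimodal target
k <- 0.3                                   # one tempering level, for illustration
n <- 50000
x <- numeric(n)
for (i in 2:n) {                           # random-walk Metropolis targeting pi^k
  prop <- x[i - 1] + rnorm(1, sd = 2)
  if (log(runif(1)) < k * (log_pi(prop) - log_pi(x[i - 1]))) x[i] <- prop
  else x[i] <- x[i - 1]
}
w <- exp((1 - k) * log_pi(x))              # unnormalized IS weights pi(x)^(1 - k)
sum(w * x^2) / sum(w)                      # self-normalized estimate of E[θ^2] ≈ 10
```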

4.
This paper presents a novel framework for maximum likelihood (ML) estimation in skew-t factor analysis (STFA) models in the presence of missing values or nonresponses. As a robust extension of the ordinary factor analysis model, the STFA model assumes a restricted version of the multivariate skew-t distribution for the latent factors and the unobservable errors to accommodate non-normal features such as asymmetry and heavy tails or outliers. An EM-type algorithm is developed to carry out ML estimation and imputation of missing values under a missing at random mechanism. The practical utility of the proposed methodology is illustrated through real and synthetic data examples.

5.
The aim of this paper is to study the asymptotic properties of a class of kernel conditional mode estimates when functional stationary ergodic data are considered. To be more precise, in the ergodic data setting, we consider a random element \((X, Z)\) taking values in some semi-metric abstract space \(E\times F\). For a real function \(\varphi \) defined on the space F and \(x\in E\), we consider the conditional mode of the real random variable \(\varphi (Z)\) given the event "\(X=x\)". While estimating the conditional mode function, say \(\theta _\varphi (x)\), using the well-known kernel estimator, we establish the strong consistency with rate of this estimate uniformly over Vapnik–Chervonenkis classes of functions \(\varphi \). Notice that the ergodic setting offers a more general framework than the usual mixing structure. Two applications to energy data illustrate the proposed approach in a time series forecasting framework: the first forecasts the daily peak of electricity demand in France (measured in gigawatts), while the second deals with short-term forecasting of the electrical energy (measured in gigawatt-hours) consumed over time intervals that cover the peak demand.
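A toy sketch of the kernel conditional mode estimator may help fix ideas: here X is a real covariate rather than a functional one (so the semi-metric reduces to |x − X_i|), and the bandwidths, kernel, and data-generating model are illustrative assumptions.

```r
## Kernel conditional mode on scalar data: argmax over y of a kernel estimate
## of the conditional density f(y | x) (normalizing constants dropped, since
## they do not affect the argmax).
set.seed(1)
n <- 500
X <- runif(n); Y <- sin(2 * pi * X) + rnorm(n, sd = 0.2)
cond_mode <- function(x, h = 0.1, grid = seq(-2, 2, length.out = 401)) {
  wx <- dnorm((x - X) / h)                 # kernel weights in the covariate
  dens <- sapply(grid, function(y) sum(wx * dnorm((y - Y) / h)))
  grid[which.max(dens)]                    # mode of the estimated conditional density
}
cond_mode(0.25)                            # close to sin(pi / 2) = 1
```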

6.
One important goal in multi-state modelling is to explore information about conditional transition-type-specific hazard rate functions by estimating the effects of explanatory variables. This may be performed using single transition-type-specific models if these covariate effects are assumed to differ across transition types. To investigate whether this assumption holds, or whether an effect is equal across several transition types (a cross-transition-type effect), a combined model has to be applied, for instance via a stratified partial likelihood formulation. Here, prior knowledge about the underlying covariate effect mechanisms is often sparse, especially about which transition-type-specific or cross-transition-type effects are ineffective. As a consequence, data-driven variable selection is an important task: a large number of estimable effects has to be taken into account if all transition types are modelled jointly. A related but subsequent task is model choice: is an effect satisfactorily estimated under a linearity assumption, or does its true underlying form deviate strongly from linearity? This article introduces component-wise functional gradient descent boosting (boosting, for short) for multi-state models, an approach performing unsupervised variable selection and model choice simultaneously within a single estimation run. We demonstrate that the features and advantages of boosting introduced and illustrated in classical regression scenarios carry over to multi-state models. As a consequence, boosting provides an effective means to answer questions about the ineffectiveness and non-linearity of single transition-type-specific or cross-transition-type effects.
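Component-wise boosting itself is simple to illustrate. The sketch below is the plain L2, linear base-learner version on a toy linear model, merely to show how component-wise selection of base learners performs variable selection; the multi-state partial-likelihood loss used in the article is not implemented here.

```r
## Component-wise L2 boosting: at each step, fit every single-covariate base
## learner to the current residuals and update only the best-fitting one.
set.seed(1)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - 1.5 * X[, 4] + rnorm(n)    # only covariates 1 and 4 are active
beta <- numeric(p); nu <- 0.1                # step length (shrinkage)
for (m in 1:500) {
  u <- drop(y - X %*% beta)                  # negative gradient = residuals under L2 loss
  fits <- sapply(1:p, function(j) sum(X[, j] * u) / sum(X[, j]^2))
  rss  <- sapply(1:p, function(j) sum((u - fits[j] * X[, j])^2))
  j <- which.min(rss)                        # best single covariate this round
  beta[j] <- beta[j] + nu * fits[j]          # update only that component
}
round(beta, 2)                               # non-selected covariates stay (near) zero
```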

7.
This paper discusses the contribution of Cerioli et al. (Stat Methods Appl, 2018), where robust monitoring based on high-breakdown-point estimators is proposed for multivariate data. The results follow years of development in robust diagnostic techniques. We discuss the issues of extending data monitoring to other models with complex structure, e.g. factor analysis and mixed linear models, for which S- and MM-estimators exist, or deviating data cells. We emphasise the importance of robust testing, which is often overlooked despite robust tests being readily available once S- and MM-estimators have been defined. We mention open questions, such as out-of-sample inference and big data issues, that would benefit from monitoring.

8.
Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: (1) marginal means for mediation path a, the relation of the independent variable to the mediator; (2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and (3) the between-trial variance–covariance matrix, based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial, are the input for the method. This marginal-likelihood-based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings.
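A minimal sketch of the final Monte Carlo confidence interval step, assuming the combined estimates of a and b and their standard errors have already been produced by the random-effects step; all numbers below are hypothetical.

```r
## Monte Carlo CI for the product a*b: draw the two paths from their sampling
## distributions and take empirical quantiles of the product.
set.seed(1)
a_hat <- 0.40; se_a <- 0.10    # hypothetical combined path-a estimate and SE
b_hat <- 0.25; se_b <- 0.08    # hypothetical combined path-b estimate and SE
draws_a <- rnorm(1e6, a_hat, se_a)
draws_b <- rnorm(1e6, b_hat, se_b)
quantile(draws_a * draws_b, c(0.025, 0.975))   # 95% Monte Carlo CI for a*b
```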

9.
We propose a novel Bayesian analysis of the p-variate skew-t model, providing a new parameterization, a set of non-informative priors, and a sampler specifically designed to explore the posterior density of the model parameters. Extensions, such as the multivariate regression model with skewed errors and the stochastic frontier model, are easily accommodated. A further novelty is the extension of the bivariate skew-normal model of Liseo and Parisi (2013) to a more realistic p-variate skew-t model. We also introduce the R package mvst, which produces a posterior sample for the parameters of a multivariate skew-t model.

10.
The skew t-distribution includes both the skew normal and the normal distributions as special cases. Inference for the skew t-model becomes problematic in these cases because the expected information matrix is singular and the parameter corresponding to the degrees of freedom takes a value at the boundary of its parameter space. In particular, the distributions of the likelihood ratio statistics for testing the null hypotheses of skew normality and normality are not asymptotically \(\chi ^2\). The asymptotic distributions of the likelihood ratio statistics are considered by applying the results of Self and Liang (J Am Stat Assoc 82:605–610, 1987) for boundary-parameter inference in terms of reparameterizations designed to remove the singularity of the information matrix. The Self–Liang asymptotic distributions are mixtures, and it is shown that their accuracy can be improved substantially by correcting the mixing probabilities. Furthermore, although the asymptotic distributions are non-standard, versions of Bartlett correction are developed that afford additional accuracy. Bootstrap procedures for estimating the mixing probabilities and the Bartlett adjustment factors are shown to produce excellent approximations, even for small sample sizes.
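The uncorrected Self–Liang mixture is easy to state in code. The sketch below computes a p-value for a likelihood ratio statistic under the default 0.5/0.5 chi-square mixture for a single boundary parameter; the paper's contribution is precisely to correct these mixing probabilities, which the function therefore leaves as an argument.

```r
## P-value under the Self-Liang mixture (1 - w) * chi^2_0 + w * chi^2_df.
mixture_pvalue <- function(T_obs, df = 1, w = 0.5) {
  # The chi^2_0 component is a point mass at zero, so for T_obs > 0 only the
  # chi^2_df component contributes to the tail probability.
  w * pchisq(T_obs, df = df, lower.tail = FALSE)
}
mixture_pvalue(3.2)   # compare with pchisq(3.2, 1, lower.tail = FALSE)
```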

11.
In this paper, we consider the problem of testing hypotheses about the drift parameter \(\theta \) in the process \(\mathrm{d}Y^{\delta }_{t} = \theta \dot{f}(t)Y^{\delta }_{t}\,\mathrm{d}t + b(t)\,\mathrm{d}L^{\delta }_{t}\) driven by a symmetric \(\delta \)-stable Lévy process \(L^{\delta }_{t}\), where \(\dot{f}(t)\) is the derivative of a known increasing function f(t) and b(t) is known as well. We consider the hypotheses \(H_{0}: \theta \le 0\) and \(K_{0}: \theta =0\) against the alternatives \(H_{1}: \theta >0\) and \(K_{1}: \theta \ne 0\), respectively. For these hypotheses, we propose inverse methods, motivated by a sequential approach, based on the first hitting time of the observed process (or its absolute value) to a pre-specified boundary, or to two boundaries, before some given time. The applicability of these methods is illustrated. For the case \(Y^{\delta }_{0}=0\), we can calculate the values of the boundaries and finite observation times more directly. We show the consistency of the proposed tests for \(Y^{\delta }_{0}\ge 0\) with \(\delta \in (1,2]\) and for \(Y^{\delta }_{0}=0\) with \(\delta \in (0,2]\) under quite mild conditions.

12.
Let \({\{X_n, n\geq 1\}}\) be a sequence of independent and identically distributed non-degenerate random variables with common cumulative distribution function F. Suppose X 1 is concentrated on 0, 1, . . . , N ≤ ∞ and P(X 1 = 1) > 0. Let \({X_{U_w(n)}}\) be the n-th upper weak record value. In this paper we show that for any fixed m ≥ 2, X 1 has a geometric distribution if and only if \({X_{U_{w}(m)}\mathop=\limits^d X_1+\cdots+X_m ,}\) where \(\mathop=\limits^d\) denotes equality in distribution. Our result generalizes the case m = 2 obtained by Ahsanullah (J Stat Theory Appl 8(1):5–16, 2009).
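The m = 2 case (Ahsanullah's) is easy to check by simulation. The sketch below generates second upper weak record values from a geometric distribution on 0, 1, 2, … and compares their empirical distribution with that of X 1 + X 2; the parameter p = 0.4 is an arbitrary choice.

```r
## Monte Carlo check of X_{U_w(2)} =d X_1 + X_2 for the geometric distribution.
set.seed(1)
weak_record <- function(m, p = 0.4) {
  rec <- -Inf; count <- 0
  repeat {
    x <- rgeom(1, p)                       # support 0, 1, 2, ...; P(X = 1) = p(1 - p) > 0
    if (x >= rec) { rec <- x; count <- count + 1 }  # weak record: ties also count
    if (count == m) return(rec)
  }
}
records <- replicate(2e4, weak_record(2))
sums    <- rgeom(2e4, 0.4) + rgeom(2e4, 0.4)
rbind(records = table(factor(records, levels = 0:6)) / 2e4,
      sums    = table(factor(sums,    levels = 0:6)) / 2e4)  # rows should agree
```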

13.
In this work, the problem of transformation and simultaneous variable selection is thoroughly treated via objective Bayesian approaches by the use of default Bayes factor variants. Four uniparametric families of transformations (Box–Cox, Modulus, Yeo-Johnson and Dual), denoted by T, are evaluated and compared. The subjective prior elicitation for the transformation parameter \(\lambda _T\), for each T, is not a straightforward task. Additionally, little prior information for \(\lambda _T\) is expected to be available, and therefore, an objective method is required. The intrinsic Bayes factors and the fractional Bayes factors allow us to incorporate default improper priors for \(\lambda _T\). We study the behaviour of each approach using a simulated reference example as well as two real-life examples.
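For concreteness, here are two of the four families written out as a sketch: Box–Cox, which requires positive y, and Yeo-Johnson, which is defined for all real y. Modulus and Dual follow the same uniparametric pattern and are omitted.

```r
## Two of the four uniparametric transformation families.
box_cox <- function(y, lambda) {             # requires y > 0
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
yeo_johnson <- function(y, lambda) {          # defined for all real y
  pos <- y >= 0
  out <- numeric(length(y))
  out[pos]  <- if (abs(lambda) < 1e-8) log1p(y[pos])
               else ((y[pos] + 1)^lambda - 1) / lambda
  out[!pos] <- if (abs(lambda - 2) < 1e-8) -log1p(-y[!pos])
               else -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  out
}
yeo_johnson(c(-1.5, 0, 2.3), lambda = 0.5)
```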

14.
This paper addresses the issue of estimating the expectation of a real-valued random variable of the form \(X = g(\mathbf {U})\), where g is a deterministic function and \(\mathbf {U}\) can be a random finite- or infinite-dimensional vector. Using recent results on rare event simulation, we propose a unified framework for dealing with both probability and mean estimation for such random variables, i.e. linking algorithms such as the Tootsie Pop Algorithm or the Last Particle Algorithm with nested sampling. In particular, it extends nested sampling as follows: first, the random variable X no longer needs to be bounded; we give the principle of an ideal estimator with an infinite number of terms that is unbiased and always better than a classical Monte Carlo estimator, and in particular has finite variance as soon as there exists \(k > 1\) such that \({\text {E}}\left[ X^k \right] < \infty \). Moreover, we address the issue of nested sampling termination and show that a random truncation of the sum can preserve unbiasedness while increasing the variance by a factor of at most 2 compared to the ideal case. We also build an unbiased estimator with fixed computational budget which supports a Central Limit Theorem, and we discuss parallel implementation of nested sampling, which can dramatically reduce its running time. Finally, we extensively study the case where X is heavy-tailed.
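As context for why probability and mean estimation can share one framework (a standard identity, recalled here rather than taken from the paper): for a non-negative random variable, \(\mathrm{E}[X] = \int_0^\infty \mathrm{P}(X > x)\,\mathrm{d}x\), so any rare-event algorithm that estimates the survival probabilities \(\mathrm{P}(X > x)\) along a sequence of increasing levels, as the Last Particle Algorithm does, also yields an estimator of the mean by integrating them.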

15.
A typical problem in optimal design theory is finding an experimental design that is optimal with respect to some criteria in a class of designs. The most popular criteria include the A- and D-criteria. Regular graph designs occur in many optimality results, and if the number of blocks is large enough, an A-optimal (or D-optimal) design is among them (if any exist). To explore the landscape of designs with a large number of blocks, we introduce extensions of regular graph designs. These are constructed by adding the blocks of a balanced incomplete block design repeatedly to the original design. We present the results of an exact computer search for the best regular graph designs and the best extended regular graph designs with up to \(v = 20\) treatments, block size \(k \le 10\), replication \(r \le 10\), and \(r(k-1)-(v-1)\lfloor r(k-1)/(v-1)\rfloor \le 9\).

16.
We develop a new robust stopping criterion for partial least squares regression (PLSR) component construction, characterized by a high level of stability. This new criterion is universal since it is suitable both for PLSR and for extensions to generalized linear regression (PLSGLR). The criterion is based on a non-parametric bootstrap technique and must be computed algorithmically. It allows the testing of each successive component at a preset significance level \(\alpha \). In order to assess its performance and robustness with respect to various noise levels, we perform dataset simulations in which there is a preset and known number of components. These simulations are carried out for datasets characterized both by \(n>p\), with n the number of subjects and p the number of covariates, as well as for \(n<p\). We then use t-tests to compare the predictive performance of our approach with that of other common criteria. The stability property is tested in particular through re-sampling processes on a real allelotyping dataset. An important additional conclusion is that this new criterion gives globally better predictive performance than existing ones in both the PLSR and PLSGLR (logistic and Poisson) frameworks.

17.
A new algorithm is presented and studied in this paper for fast computation of the nonparametric maximum likelihood estimate of a U-shaped hazard function. It successfully overcomes a difficulty in computing a U-shaped hazard function, which is properly defined only once its anti-mode is known, while the anti-mode itself has to be found during the computation. Specifically, the new algorithm maintains the constant hazard segment, regardless of whether its length is zero or positive. The length varies naturally, according to what mass values are allocated to their associated knots after each update. Being an appropriate extension of the constrained Newton method, the new algorithm also inherits its advantage of fast convergence, as demonstrated by some real-world data examples. The algorithm works not only for exact observations, but also for purely interval-censored data, and for data mixed with exact and interval-censored observations.

18.
Methods to perform regression on compositional covariates have recently been proposed using isometric log-ratios (ilr) representation of compositional parts. This approach consists of first applying standard regression on ilr coordinates and second, transforming the estimated ilr coefficients into their contrast log-ratio counterparts. This gives easy-to-interpret parameters indicating the relative effect of each compositional part. In this work we present an extension of this framework, where compositional covariate effects are allowed to be smooth in the ilr domain. This is achieved by fitting a smooth function over the multidimensional ilr space, using Bayesian P-splines. Smoothness is achieved by assuming random walk priors on spline coefficients in a hierarchical Bayesian framework. The proposed methodology is applied to spatial data from an ecological survey on a gypsum outcrop located in the Emilia Romagna Region, Italy.
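A sketch of the isometric log-ratio step that precedes the regression, using pivot coordinates, one of several valid orthonormal bases; the basis choice here is illustrative and not necessarily the paper's.

```r
## Pivot-coordinate ilr: maps a D-part composition to D - 1 real coordinates.
ilr <- function(x) {                          # x: composition with D positive parts
  D <- length(x)
  sapply(1:(D - 1), function(i) {
    gm <- exp(mean(log(x[1:i])))              # geometric mean of the first i parts
    sqrt(i / (i + 1)) * log(gm / x[i + 1])
  })
}
ilr(c(0.5, 0.3, 0.2))                         # D - 1 = 2 unconstrained coordinates
```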

19.
Computer experiments using deterministic simulators are sometimes used to replace or supplement physical system experiments. This paper compares designs for an initial computer simulator experiment based on empirical prediction accuracy; it recommends designs for producing accurate predictions. The basis for the majority of the designs compared is the integrated mean squared prediction error (IMSPE), computed assuming a Gaussian process model with a Gaussian correlation function. Designs that minimize the IMSPE with respect to a fixed set of correlation parameters, as well as designs that minimize a weighted IMSPE over the correlation parameters, are studied. These IMSPE-based designs are compared with three widely used space-filling designs. The designs are used to predict test surfaces representing a range of stationary and non-stationary functions. For the test conditions examined in this paper, the designs constructed under IMSPE-based criteria are shown to outperform space-filling Latin hypercube designs and maximum projection designs when predicting smooth functions of stationary appearance, while space-filling and maximum projection designs are superior for test functions that exhibit strong non-stationarity.

20.
Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods, where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small n, large p" scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments: one to distinguish between patients with cardiovascular disease and controls, and another to classify aggressive from non-aggressive prostate cancer. We compare our results with those of the main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at https://github.com/BelindaHernandez/BART-BMA.git.
