Similar articles
20 similar articles found (search time: 93 ms)
1.
Maximum likelihood estimation is investigated in the context of linear regression models under partial independence restrictions. These restrictions aim to assume a kind of completeness of a set of predictors Z, in the sense that they are sufficient to explain the dependencies between an outcome Y and predictors X: L(Y|Z, X) = L(Y|Z), where L(·|·) stands for the conditional distribution. From a practical point of view, the former model is particularly interesting in a double sampling scheme where Y and Z are measured together on a first sample and Z and X on a second separate sample. In that case, estimation procedures are close to those developed in the study of double-regression by Engel & Walstra (1991) and Causeur & Dhorne (1998). Properties of the estimators are derived in a small-sample framework and in an asymptotic one, and the procedure is illustrated by an example from the food industry context.
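The double-sampling idea above can be illustrated with a small sketch: under the restriction L(Y|Z, X) = L(Y|Z), the regression of Y on X factors through Z, so one can fit Y on Z in the first sample, Z on X in the second, and compose the two fits to estimate E[Y|X]. All coefficients, noise levels, and sample sizes below are hypothetical, and simple linear regressions stand in for the paper's general procedure.

```python
import random

def fit_slr(xs, ys):
    # ordinary least squares for y = a + b*x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

random.seed(1)
# sample 1: (Z, Y), with Y depending on Z only
z1 = [random.gauss(0, 1) for _ in range(500)]
y1 = [2.0 + 1.5 * z + random.gauss(0, 0.3) for z in z1]
# sample 2: (X, Z), with Z depending on X
x2 = [random.gauss(0, 1) for _ in range(500)]
z2 = [0.5 + 0.8 * x + random.gauss(0, 0.3) for x in x2]

a_yz, b_yz = fit_slr(z1, y1)   # E[Y|Z] = a_yz + b_yz * Z
a_zx, b_zx = fit_slr(x2, z2)   # E[Z|X] = a_zx + b_zx * X
# composed predictor: E[Y|X] = a_yz + b_yz * (a_zx + b_zx * X)
a_yx = a_yz + b_yz * a_zx
b_yx = b_yz * b_zx
print(round(a_yx, 2), round(b_yx, 2))   # close to the true composed values 2.75 and 1.2
```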

2.
When process data follow a particular curve in quality control, profile monitoring is suitable and appropriate for assessing process stability. Previous research in profile monitoring focusing on nonlinear parametric (P) modeling, involving both fixed and random effects, was made under the assumption of an accurate nonlinear model specification. Lately, nonparametric (NP) methods have been used in the profile monitoring context in the absence of an obvious linear P model. This study introduces a novel technique in profile monitoring for any nonlinear and auto-correlated data. Referred to as the nonlinear mixed robust profile monitoring (NMRPM) method, it proposes a semiparametric (SP) approach that combines nonlinear P and NP profile fits for scenarios in which a nonlinear P model is adequate over part of the data but inadequate for the rest. These three methods (P, NP, and NMRPM) account for the auto-correlation within profiles and treat the collection of profiles as a random sample from a common population. During Phase I analysis, a version of Hotelling's T² statistic is proposed for each approach to identify abnormal profiles based on the estimated random effects and obtain the corresponding control limits. The performance of the NMRPM method is then evaluated using a real data set. Results reveal that the NMRPM method is robust to model misspecification and performs adequately against a correctly specified nonlinear P model. Control charts with the NMRPM method have excellent capability of detecting changes in Phase I data with control limits that are easily computable.
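A minimal Phase I sketch of the T²-on-random-effects idea: summarize each profile by a low-dimensional vector of estimated effects and flag the profile with the largest Hotelling T² distance from the center. The two-dimensional effects, the planted outlier, and the absence of formal control limits are all simplifications for illustration, not the NMRPM method itself.

```python
import random

random.seed(6)
# per-profile estimated random effects (hypothetical 2-D: asymptote, rate)
effects = [(random.gauss(10, 1), random.gauss(0.5, 0.1)) for _ in range(30)]
effects.append((15.5, 1.1))   # one planted abnormal profile

n = len(effects)
mean = [sum(e[j] for e in effects) / n for j in (0, 1)]
# 2x2 sample covariance matrix and its inverse
c = [[sum((e[i] - mean[i]) * (e[j] - mean[j]) for e in effects) / (n - 1)
      for j in (0, 1)] for i in (0, 1)]
det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
inv = [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]

def t2(e):
    # Hotelling's T^2 distance of one profile's effects from the center
    d = [e[0] - mean[0], e[1] - mean[1]]
    return (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))

scores = [t2(e) for e in effects]
print(scores.index(max(scores)))   # index of the most abnormal profile
```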

3.
While it is often argued that a p-value is a probability (see Wasserstein and Lazar), we argue that a p-value is not defined as a probability. A p-value is a bijection of the sufficient statistic for a given test, mapping it to the same scale as the Type I error probability. As such, the use of p-values in a test should be no more a source of controversy than the use of a sufficient statistic. It is demonstrated that there is, in fact, no ambiguity about what a p-value is, contrary to what has been claimed in recent public debates in the applied statistics community. We give a simple example to illustrate that rejecting the use of p-values in testing for a normal mean parameter is conceptually no different from rejecting the use of a sample mean. The p-value is innocent; the problem arises from its misuse and misinterpretation. The way that p-values have been informally defined and interpreted appears to have led to tremendous confusion and controversy regarding their place in statistical analysis.
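The point that a p-value is a monotone transformation of the sufficient statistic can be checked directly for the two-sided normal-mean z-test: rejecting when p < α is exactly the same decision as rejecting when |z| exceeds the normal critical value. The numbers below are an illustrative example, not from the paper.

```python
import math

def two_sided_p(xbar, mu0, sigma, n):
    # the p-value is a monotone map of the sufficient statistic (the sample mean)
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

alpha = 0.05
z_crit = 1.959963984540054   # upper 0.025 standard normal quantile
for xbar in [0.1, 0.3, 0.5, 0.7]:
    p = two_sided_p(xbar, mu0=0.0, sigma=1.0, n=25)
    z = abs(xbar) / (1.0 / 5)
    # the two rejection rules coincide for every value of the statistic
    assert (p < alpha) == (z > z_crit)
```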

4.
Summary. In analysing a decision problem in a situation of partial knowledge, a decision maker may be reluctant to assign a complete probability distribution on the relevant states of nature. In order to face this difficulty, several methods, based on indeterminate probabilities or probability intervals, have been proposed in the literature. In this paper, arguing that it is meaningless to judge probabilistic assessments as correct or wrong, it is maintained that only coherence has an objective and significant role. Then, to overcome practical difficulties, an approach based on the subjective methodology and on the use of numerical and qualitative probabilities is outlined.

5.
The traditional non-parametric bootstrap (referred to as the n-out-of-n bootstrap) is a widely applicable and powerful tool for statistical inference, but in important situations it can fail. It is well known that by using a bootstrap sample of size m, different from n, the resulting m-out-of-n bootstrap provides a method for rectifying the traditional bootstrap inconsistency. Moreover, recent studies have shown that interesting cases exist where it is better to use the m-out-of-n bootstrap in spite of the fact that the n-out-of-n bootstrap works. In this paper, we discuss another case by considering its application to hypothesis testing. Two new data-based choices of m are proposed in this set-up. The results of simulation studies are presented to provide empirical comparisons between the performance of the traditional bootstrap and the m-out-of-n bootstrap, based on the two data-dependent choices of m, as well as on an existing method in the literature for choosing m. These results show that the m-out-of-n bootstrap, based on our choice of m, generally outperforms the traditional bootstrap procedure as well as the procedure based on the choice of m proposed in the literature.
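The classic failure case, the sample maximum of uniforms, shows why the resample size matters: the n-out-of-n bootstrap distribution of the scaled maximum has an atom of mass about 1 − 1/e at zero (the resample contains the sample maximum with that probability), while with m ≪ n the atom essentially disappears. This is a standard textbook illustration of the inconsistency, not the hypothesis-testing setting or the data-based choices of m studied in the paper.

```python
import random

random.seed(7)
n = 1000
data = [random.random() for _ in range(n)]   # Uniform(0,1); estimand: the endpoint 1

def boot_atom(data, m, B=2000):
    # fraction of bootstrap replicates whose resample maximum equals the
    # sample maximum, i.e. the atom at 0 of m * (max(data) - max(resample))
    mx = max(data)
    hits = sum(max(random.choice(data) for _ in range(m)) == mx for _ in range(B))
    return hits / B

atom_n = boot_atom(data, m=n)    # n-out-of-n: mass near 1 - 1/e ~ 0.63
atom_m = boot_atom(data, m=40)   # m-out-of-n with m << n: atom nearly gone
print(round(atom_n, 2), round(atom_m, 2))
```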

6.
Given an unknown function (e.g. a probability density or a regression function) f and a constant c, the problem of estimating the level set L(c) = {f ≥ c} is considered. This problem is tackled in a very general framework, which allows f to be defined on a metric space different from ℝ^d. Such a degree of generality is motivated by practical considerations and, in fact, an example with astronomical data is analyzed where the domain of f is the unit sphere. A plug-in approach is followed; that is, L(c) is estimated by L_n(c) = {f_n ≥ c}, where f_n is an estimator of f. Two results are obtained concerning consistency and convergence rates, with respect to the Hausdorff metric, of the boundaries ∂L_n(c) towards ∂L(c). Also, the consistency of L_n(c) to L(c) is shown, under mild conditions, with respect to the L_1 distance. Special attention is paid to the particular case of spherical data.
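The plug-in recipe is short in the simplest setting: estimate the density with a kernel estimator f_n and keep the region where f_n ≥ c. The sketch below works on ℝ rather than a general metric space, with a fixed bandwidth and a hypothetical threshold; for a standard normal sample the estimated level set should be an interval around 0.

```python
import math, random

random.seed(3)
data = [random.gauss(0, 1) for _ in range(400)]
h = 0.4   # fixed bandwidth, chosen by hand for the sketch

def kde(x):
    # Gaussian kernel density estimator f_n
    return (sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
            / (len(data) * h * math.sqrt(2 * math.pi)))

c = 0.2
grid = [i / 50 for i in range(-200, 201)]        # grid over [-4, 4]
level_set = [x for x in grid if kde(x) >= c]     # plug-in estimate {f_n >= c}
print(round(min(level_set), 2), round(max(level_set), 2))
```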

7.
A cluster methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is an agglomeration of procedures that begins by clustering the n observations through a test of the 'no-outlier hypothesis' (TONH) and ends with weighted least-squares regression estimation. The cluster phase partitions the n observations into a main cluster of size h and a minor cluster of size n − h. A robust distance emerges from the main cluster, upon which the TONH is conducted. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. Until convergence, a re-weighted least-squares (RLS) regression estimate is updated with weights based on the normalized residuals. The proposed procedure blends an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, into the re-weighted regression estimation phase. Hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping. The performance of CBRR is further examined through a simulation experiment. The results obtained from the data-set illustrations and the Monte Carlo study show that CBRR is effective in detecting multivariate outliers where other methods are susceptible to masking and swamping. The CBRR does not require enormous computation and is substantially resistant to masking and swamping.
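The re-weighting phase can be sketched without the clustering front end: starting from unit weights, alternate a weighted least-squares fit with Huber-type weights computed from MAD-normalized residuals. This is a generic iteratively re-weighted least-squares sketch with hypothetical data, not the full CBRR procedure, which seeds the weights from the robust cluster distances.

```python
import random

def wls(x, y, w):
    # weighted least squares for y = a + b*x
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
    return my - b * mx, b

random.seed(0)
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.3) for xi in x]
y[3] += 25.0      # gross outliers that would distort an ordinary fit
y[17] -= 30.0

w = [1.0] * len(x)
for _ in range(30):   # iterate the re-weighted fit until it stabilizes
    a, b = wls(x, y, w)
    r = [yi - a - b * xi for xi, yi in zip(x, y)]
    s = sorted(abs(ri) for ri in r)[len(r) // 2] / 0.6745   # MAD-type scale
    w = [1.0 if abs(ri) <= 1.345 * s else 1.345 * s / abs(ri) for ri in r]
print(round(a, 1), round(b, 2))   # near the true values 1 and 2
```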

8.
We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X, B that uses both the available individual-level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic data observations, and then analyzing the combined dataset of size n + m to estimate the parameters of the Y|X, B model. This combined dataset of size n + m now has missing values of B for m of the observations, and is analyzed using methods that can handle missing data (e.g., multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Though the synthetic data method is applicable to a general regression context, to provide some justification, we show in two special cases that the asymptotic variances of the parameter estimates in the Y|X, B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases and the method's broad applicability makes it appealing for use across diverse scenarios. The Canadian Journal of Statistics 47: 580–603; 2019 © 2019 Statistical Society of Canada

9.
We consider partial sums Sn of a general class of stationary sequences of integer-valued random variables, and we provide sufficient conditions for Sn to satisfy a local limit theorem. To prove this result, we introduce a concept called the Bernoulli part. The amount of Bernoulli part in Sn determines the extent to which the density of Sn is relatively flat. If in addition Sn satisfies a global central limit theorem, the local limit theorem follows.

10.
Let X_1, X_2,… be a sequence of independent and identically distributed random variables, and let Y_n, n = K, K + 1, K + 2,… be the corresponding backward moving average of order K. At epoch n ≥ K, the process Y_n will be off target by the input X_n if it exceeds a threshold. By introducing a two-state Markov chain, we define a level of significance (1 − a)% to be the percentage of times that the moving average process stays on target. We establish a technique to evaluate, or estimate, a threshold that guarantees {Y_n} will stay on target (1 − a)% of the time, for a given (1 − a)%. It is proved that if the distribution of the inputs is exponential or normal, then the threshold will be a linear function of the mean μ_X of the input distribution. The slope and intercept of the line, in each case, are specified. It is also observed that for gamma inputs, the threshold is merely linear in the reciprocal of the scale parameter. These linear relationships can easily be applied to estimate the desired thresholds from samples of the inputs.
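A sketch of estimating such a threshold directly from input samples: simulate (or observe) the inputs, form the order-K backward moving average, and take the empirical (1 − a) quantile. The exponential inputs, K, and target level below are hypothetical; the abstract's linearity result means the resulting threshold scales linearly with the input mean μ_X.

```python
import random

random.seed(5)
K = 4
mu = 2.0
n = 20000
x = [random.expovariate(1 / mu) for _ in range(n)]            # exponential inputs, mean mu
y = [sum(x[i - K + 1:i + 1]) / K for i in range(K - 1, n)]    # backward moving average of order K

# threshold so the process stays on target (1 - a)% of the time:
# the empirical (1 - a)-quantile of the moving-average process
a = 0.05
ys = sorted(y)
threshold = ys[int((1 - a) * len(ys))]
print(round(threshold, 2))
```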

11.
A generalization of step-up and step-down multiple test procedures is proposed. This step-up-down procedure is useful when the objective is to reject a specified minimum number, q, out of a family of k hypotheses. If this basic objective is met at the first step, then it proceeds in a step-down manner to see if more than q hypotheses can be rejected. Otherwise it proceeds in a step-up manner to see if some number less than q hypotheses can be rejected. The usual step-down procedure is the special case where q = 1, and the usual step-up procedure is the special case where q = k. Analytical and numerical comparisons between the powers of the step-up-down procedures with different choices of q are made to see how these powers depend on the actual number of false hypotheses. Examples of application include comparing the efficacy of a treatment to a control for multiple endpoints and testing the sensitivity of a clinical trial for comparing the efficacy of a new treatment with a set of standard treatments.
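The control flow described above can be sketched generically: start at the q-th smallest p-value, step down (reject more) if the basic objective is met, otherwise step up (reject fewer). The Holm-type critical constants below are placeholders; the actual constants of the procedure depend on the joint distribution of the test statistics.

```python
def step_up_down(pvals, crit, q):
    """Generalized step-up-down test. p-values are sorted ascending and
    matched with nondecreasing critical constants; returns the number of
    hypotheses rejected. q=1 gives step-down, q=len(pvals) gives step-up."""
    k = len(pvals)
    p = sorted(pvals)
    i = q - 1                       # 0-based index of the starting p-value
    if p[i] <= crit[i]:
        # basic objective met: step down to try to reject more than q
        while i + 1 < k and p[i + 1] <= crit[i + 1]:
            i += 1
        return i + 1
    # otherwise step up: see how many fewer than q can be rejected
    while i > 0:
        i -= 1
        if p[i] <= crit[i]:
            return i + 1
    return 0

# illustrative Holm-type constants for k = 4, alpha = 0.05 (an assumption)
crit = [0.05 / (4 - j) for j in range(4)]   # [0.0125, 0.0167, 0.025, 0.05]
print(step_up_down([0.001, 0.02, 0.03, 0.2], crit, q=2))
```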

12.
Consider a population the individuals in which can be classified into groups. Let y, the number of individuals in a group, be distributed according to a probability function f(y; θ0), where the functional form f is known. The random variable y cannot be observed directly, and hence a random sample of groups cannot be obtained. Consider a random sample of N individuals from the population. Suppose the N individuals are distributed into S groups with x_1, x_2, …, x_S representatives respectively. The random variable x, the number of individuals in a group in the sample, will be a fraction of its population counterpart y, and the distributions of x and y need not have the same functional form. If the two random variables x and y have the same functional form for their distributions, then the particular common distribution is called an invariant abundance distribution. The paper provides a characterization of invariant abundance distributions in the class of power-series distributions.

13.
Biased sampling from an underlying distribution with p.d.f. f(t), t > 0, implies that observations follow the weighted distribution with p.d.f. f_w(t) = w(t)f(t)/E[w(T)] for a known weight function w. In particular, the function w(t) = t^α has important applications, including length-biased sampling (α = 1) and area-biased sampling (α = 2). We first consider here the maximum likelihood estimation of the parameters of a distribution f(t) under biased sampling from a censored population in a proportional hazards frailty model where a baseline distribution (e.g. Weibull) is mixed with a continuous frailty distribution (e.g. Gamma). A right-censored observation contributes a term proportional to w(t)S(t) to the likelihood; this is not the same as S_w(t), so the problem of fitting the model does not simply reduce to fitting the weighted distribution. We present results on the distribution of frailty in the weighted distribution and develop an EM algorithm for estimating the parameters of the model in the important Weibull–Gamma case. We also give results for the case where f(t) is a finite mixture distribution. Results are presented for uncensored data and for Type I right censoring. Simulation results are presented, and the methods are illustrated on a set of lifetime data.
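For intuition about the weighting: length-biased sampling (α = 1) from an Exponential(λ) baseline gives f_w(t) ∝ t·e^{−λt}, a Gamma(2, 1/λ) law, so the observed mean doubles. A quick simulation with hypothetical parameters confirms this, exploiting the fact that a Gamma(2, 1/λ) variable is the sum of two independent exponentials.

```python
import random

random.seed(11)
lam = 0.5   # baseline Exponential with rate 0.5, i.e. mean 2

# length-biased sampling with w(t) = t turns Exp(lam) into Gamma(2, 1/lam):
# f_w(t) = t * lam * exp(-lam*t) / E[T], the law of a sum of two exponentials
sample = [random.expovariate(lam) + random.expovariate(lam) for _ in range(50000)]
mean_lb = sum(sample) / len(sample)
print(round(mean_lb, 2))   # close to 2/lam = 4, twice the baseline mean
```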

14.
We consider the properties of the trimmed mean as regards minimax-variance L-estimation of a location parameter in a Kolmogorov neighbourhood K_ε(Φ) of the normal distribution. We first review some results on the search for an L-minimax estimator in this neighbourhood, i.e. a linear combination of order statistics whose maximum variance in K_ε(Φ) is a minimum in the class of L-estimators. The natural candidate – the L-estimate which is efficient for that member of K_ε(Φ) with minimum Fisher information – is known not to be a saddlepoint solution to the minimax problem. We show here that it is not a solution at all. We do this by showing that a smaller maximum variance is attained by an appropriately trimmed mean. We argue that this trimmed mean, as well as being computationally simple – much simpler than the efficient L-estimate referred to above, and simpler than the minimax M- and R-estimators – is at least "nearly" minimax.
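The computational simplicity claimed for the trimmed mean is easy to see; a minimal implementation with a hypothetical contaminated sample:

```python
def trimmed_mean(xs, alpha):
    """Symmetric alpha-trimmed mean: drop the floor(alpha*n) smallest
    and largest observations and average the rest."""
    n = len(xs)
    g = int(alpha * n)
    s = sorted(xs)
    kept = s[g:n - g]
    return sum(kept) / len(kept)

data = [9.0, 1.2, 0.8, 1.1, 1.0, 0.9, 1.3, -6.0, 1.0, 0.7]
print(trimmed_mean(data, 0.1))   # the outliers 9.0 and -6.0 are trimmed away
```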

15.
The problem of estimating the total number of trials n in a binomial distribution is reconsidered in this article, for both cases of known and unknown probability of success p, from the Bayesian viewpoint. Bayes and empirical Bayes point estimates for n are proposed under the assumption of a left-truncated prior distribution for n and a beta prior distribution for p. Simulation studies are provided in this article in order to compare the proposed estimates with the most familiar estimates of n.
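With a discrete uniform prior on n, left-truncated at max x_i, and p known, the Bayes point estimate reduces to a direct enumeration of the posterior. The counts, p, and prior cutoff below are hypothetical, and the unknown-p case of the paper would additionally integrate a beta prior over p.

```python
import math

def posterior_n(xs, p, n_max=200, n_min=None):
    """Posterior over the binomial trial number n, with known success
    probability p and a uniform prior truncated to n >= max(xs)."""
    n_min = n_min or max(xs)
    post = {}
    for n in range(n_min, n_max + 1):
        loglik = sum(math.log(math.comb(n, x)) + x * math.log(p)
                     + (n - x) * math.log(1 - p) for x in xs)
        post[n] = math.exp(loglik)
    z = sum(post.values())
    return {n: w / z for n, w in post.items()}

xs = [7, 9, 8, 10, 6, 9, 8, 7]   # hypothetical counts from Binomial(n, 0.5)
post = posterior_n(xs, p=0.5)
n_bayes = sum(n * w for n, w in post.items())   # posterior-mean point estimate
print(round(n_bayes, 1))
```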

16.
17.
Distributions of a response y (height, for example) differ with values of a factor t (such as age). Given a response y* for a subject of unknown t*, the objective of inverse prediction is to infer the value of t* and to provide a defensible confidence set for it. Training data provide values of y observed on subjects at known values of t. Models relating the mean and variance of y to t can be formulated as mixed (fixed and random) models in terms of sets of functions of t, such as polynomial spline functions. A confidence set on t* can then be obtained as those hypothetical values of t for which y* is not detected as an outlier when compared to the model fit to the training data. With nonconstant variance, the p-values for these tests are approximate. This article describes how versatile models for this problem can be formulated in such a way that the computations can be accomplished with widely available software for mixed models, such as SAS PROC MIXED. Coverage probabilities of confidence sets on t* are illustrated in an example.
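The inversion idea in miniature, with a straight-line mean, constant variance, and parameter-estimation error ignored for brevity: the confidence set collects the hypothetical t values at which the new response y* would not be flagged as an outlier relative to the fitted model. All data below are simulated placeholders.

```python
import math, random

random.seed(2)
# training data: a hypothetical straight-line growth curve y = 2 + 3t + noise
t = [i / 10 for i in range(100)]
y = [2 + 3 * ti + random.gauss(0, 0.5) for ti in t]

# ordinary least-squares fit and residual standard deviation
n = len(t)
mt = sum(t) / n
my = sum(y) / n
b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum((ti - mt) ** 2 for ti in t)
a = my - b * mt
sigma = math.sqrt(sum((yi - a - b * ti) ** 2 for ti, yi in zip(t, y)) / (n - 2))

# inverse prediction: keep the t values at which y_star is not detected
# as an outlier (an approximate 95% set; estimation error ignored)
y_star = 14.0
conf_set = [ti for ti in t if abs(y_star - (a + b * ti)) <= 1.96 * sigma]
print(round(min(conf_set), 1), round(max(conf_set), 1))
```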

18.
In consumer preference studies, it is common to seek a complete ranking of a variety of, say, N alternatives or treatments. Unfortunately, as N increases, it becomes progressively more confusing and undesirable for respondents to rank all N alternatives simultaneously. Moreover, the investigators may only be interested in consumers' top few choices. Therefore, it is desirable to accommodate the setting where each survey respondent ranks only her/his most preferred k (k ≤ N) alternatives. In this paper, we propose a simple procedure to test the independence of the N alternatives and the top-k ranks, such that the value of k can be predetermined before securing a set of partially ranked data or be at the discretion of the investigator in the presence of complete ranking data. The asymptotic distribution of the proposed test under root-n local alternatives is established. We demonstrate our procedure with two real data sets.
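To make the top-k data structure concrete, here is a toy setup with simulated rankings and a naive chi-square-style statistic comparing how often each alternative lands in respondents' top k against the uniform frequencies implied by no preference. This is purely illustrative and is not the test statistic proposed in the paper; note too that the counts are mildly dependent (each respondent contributes k distinct alternatives), so the usual chi-square calibration is only approximate.

```python
import random

random.seed(9)
N, k, R = 5, 2, 400   # 5 alternatives, top-2 rankings, 400 respondents

# simulate respondents ranking uniformly at random (no preference, the null)
counts = [0] * N
for _ in range(R):
    top_k = random.sample(range(N), k)   # this respondent's top-k alternatives
    for alt in top_k:
        counts[alt] += 1

# naive statistic for equal top-k membership frequencies
expected = R * k / N
chi2 = sum((c - expected) ** 2 / expected for c in counts)
print(counts, round(chi2, 2))
```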

19.
Several replacement policies in the literature have been modeled under the assumption that the whole life cycle, or operating interval, of an operating unit is finite rather than infinite, as it is in the traditional method. It is more natural, however, to treat the finite life cycle as a fluctuating parameter that can be used to estimate replacement times, and this is taken up in this article. For this, we first formulate a general model in which the unit is replaced at a random age U, at a random time Y for the first working number, at a random life cycle S, or at failure X, whichever occurs first. Models included in the general model are then optimized: replacement at age T, when the variable U has a degenerate distribution, and replacement at working number N, obtained by summing N copies of the variable Y. We obtain the total expected cost until replacement and the expected replacement cost rate for each model. Optimal age T, working number N, and the pair (T, N) are discussed analytically and computed numerically.
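The simplest special case of such models, standard age replacement (U degenerate at T, no working-number or life-cycle limits), already shows the numerical optimization involved: minimize the expected cost rate C(T) = [c_f·F(T) + c_p·(1 − F(T))] / E[min(X, T)]. The Weibull failure law and cost values below are assumptions for illustration, not the article's general model.

```python
import math

# age replacement for a Weibull(shape 2, scale 1) failure time
cf, cp = 10.0, 1.0   # failure vs planned replacement costs (assumed)

def F(t):
    # Weibull(2, 1) cdf
    return 1 - math.exp(-t * t)

def mean_time(T, steps=2000):
    # E[min(X, T)] = integral over [0, T] of (1 - F(t)) dt, trapezoid rule
    h = T / steps
    return h * (0.5 + sum(math.exp(-(i * h) ** 2) for i in range(1, steps))
                + 0.5 * math.exp(-T * T))

def cost_rate(T):
    # expected replacement cost per unit time under age-T replacement
    return (cf * F(T) + cp * (1 - F(T))) / mean_time(T)

grid = [i / 100 for i in range(5, 301)]
T_star = min(grid, key=cost_rate)
print(round(T_star, 2))
```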

20.
In Statistics of Extremes, the estimation of parameters of extreme or even rare events is usually done under a semi-parametric framework. The estimators are based on the largest k order statistics in the sample or on the excesses over a high level u. Although showing good asymptotic properties, most of those estimators depend strongly on k or u, with high bias when k increases or the level u decreases. The use of resampling methodologies has proved promising for reducing the bias and for choosing k or u. Different approaches to resampling need to be considered depending on whether we are in an independent or a dependent setup. A great amount of investigation has been performed for the independent situation. The main objective of this article is to use bootstrap and jackknife methods in the context of dependence to obtain more stable estimators of a parameter that characterizes the degree of local dependence of extremes, the so-called extremal index. A simulation study illustrates the application of those methods.
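To ground the quantities involved: below, the extremal index is estimated with the classical runs estimator and its variability assessed with a naive moving-block bootstrap that preserves local dependence. The moving-maximum series (whose true extremal index is 1/2), the threshold, run length, and block size are all illustrative choices, not the article's methodology.

```python
import random

def runs_estimator(x, u, r):
    """Runs estimator of the extremal index: exceedances of u separated
    by a gap of at least r are assigned to different clusters."""
    exc = [i for i, v in enumerate(x) if v > u]
    if not exc:
        return None
    clusters = 1 + sum(1 for a, b in zip(exc, exc[1:]) if b - a >= r)
    return clusters / len(exc)

def block_bootstrap(x, b, rng):
    # resample contiguous blocks of length b to preserve local dependence
    out = []
    while len(out) < len(x):
        s = rng.randrange(len(x) - b)
        out.extend(x[s:s + b])
    return out[:len(x)]

random.seed(4)
n = 5000
z = [random.random() for _ in range(n + 1)]
x = [max(z[i], z[i + 1]) for i in range(n)]   # moving maximum: extremal index 1/2

u = sorted(x)[int(0.95 * n)]                  # high threshold (95th percentile)
theta = runs_estimator(x, u, r=2)

rng = random.Random(8)
boot = [runs_estimator(block_bootstrap(x, 50, rng), u, 2) for _ in range(200)]
boot = [t for t in boot if t is not None]
print(round(theta, 2), round(min(boot), 2), round(max(boot), 2))
```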


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)