Similar Literature
1.
Perakis and Xekalaki [A process capability index that is based on the proportion of conformance, Journal of Statistical Computation and Simulation, 72(9) (2002), 707–718] introduced a process capability index that is based on the proportion of conformance of the process under study and has several appealing features. One of its advantages is that, unlike the majority of the indices considered in the literature, it can be used not only for continuous processes but also for discrete ones. In this article, the use of this index is investigated for discrete data under two alternative models that are frequently considered in statistical process control. In particular, distributional properties and estimation of the index are considered for Poisson processes and for processes yielding attribute data. The performance of the suggested estimators and confidence limits is assessed via simulation.
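As a rough numerical illustration of estimating such an index for a Poisson process, the sketch below assumes the index has the proportion-of-conformance form C_pc = (1 − p0)/(1 − p), with a conventional minimum allowable proportion p0; the parametric plug-in estimate of p and the bootstrap percentile interval are illustrative choices, not the authors' exact procedure.

```python
import numpy as np
from scipy import stats

def cpc_poisson(sample, upper_spec, p0=0.9973, n_boot=2000, seed=1):
    """Estimate an index of the form C_pc = (1 - p0) / (1 - p) for Poisson
    counts, where p = P(X <= upper_spec) is the estimated proportion of
    conformance, with a bootstrap percentile confidence interval."""
    rng = np.random.default_rng(seed)
    lam_hat = sample.mean()                       # MLE of the Poisson mean
    p_hat = stats.poisson.cdf(upper_spec, lam_hat)
    index = (1 - p0) / (1 - p_hat)
    boot = []
    for _ in range(n_boot):
        lam_b = rng.choice(sample, size=sample.size, replace=True).mean()
        p_b = stats.poisson.cdf(upper_spec, lam_b)
        boot.append((1 - p0) / (1 - p_b))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return index, (lo, hi)

counts = np.random.default_rng(0).poisson(2.0, size=100)  # simulated defect counts
print(cpc_poisson(counts, upper_spec=6))
```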

2.
The mode of a distribution provides an important summary of data and is often estimated on the basis of some non-parametric kernel density estimator. This article develops a new data analysis tool, called modal linear regression, for exploring high-dimensional data. Modal linear regression models the conditional mode of a response Y given a set of predictors x as a linear function of x; it differs from standard linear regression, which models the conditional mean rather than the mode of Y as a linear function of x. We propose an expectation–maximization algorithm for estimating the regression coefficients of modal linear regression. We also provide asymptotic properties for the proposed estimator without assuming symmetry of the error density. Our empirical studies with simulated and real data demonstrate that the proposed modal regression gives shorter predictive intervals than mean linear regression, median linear regression and MM-estimators.
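A minimal sketch of the kernel-weighted EM iteration typically used for modal linear regression: the E-step computes normal-kernel weights from the current residuals and the M-step is a weighted least-squares fit. The bandwidth h, starting values and convergence rule below are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def modal_regression(X, y, h, n_iter=200, tol=1e-8):
    """EM sketch for modal linear regression: kernel weights on residuals
    (E-step), then weighted least squares (M-step), iterated to convergence.
    X is an (n, p) design matrix (no intercept column); h is the bandwidth."""
    X1 = np.column_stack([np.ones(len(y)), X])     # add intercept
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]   # start at the OLS fit
    for _ in range(n_iter):
        resid = y - X1 @ beta
        w = np.exp(-0.5 * (resid / h) ** 2)        # E-step: normal-kernel weights
        w /= w.sum()
        XtW = X1.T * w                             # scales column j of X1' by w[j]
        beta_new = np.linalg.solve(XtW @ X1, XtW @ y)  # M-step: WLS update
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta
```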

3.
The complex triparametric Pearson (CTP) distribution is a flexible model belonging to the Gaussian hypergeometric family that can account for both over- and underdispersion. Despite its good properties, however, it has received little attention. We therefore revive the CTP, comparing it with some well-known distributions that cope with overdispersion (negative binomial, generalized Poisson and univariate generalized Waring) as well as underdispersion (Conway–Maxwell–Poisson (CMP) and hyper-Poisson (HP)). A simulation study reveals the performance of the CTP and shows that it has its own space among count data models. In this sense, we also explore some overdispersed datasets that appear to be more appropriately modelled by the CTP than by the usual models. Moreover, we include two underdispersed examples to illustrate that the CTP can provide fits similar to those of the CMP or HP (sometimes even more accurate) without the computational problems of these models.

4.
Missing values are common in longitudinal data studies. The missing data mechanism is termed non-ignorable (NI) if the probability of missingness depends on the non-response (missing) observations. This paper presents a model for ordinal categorical longitudinal data with NI non-monotone missing values. We assume two separate models, one for the response and one for the missingness process: the response is modelled as ordinal logistic, whereas a binary logistic model is used for the missingness process. We employ these models in the context of so-called shared-parameter models, where the outcome and missing data models are connected by a common set of random effects. In longitudinal data with or without missing values, the random effects are commonly assumed to follow a normal distribution; this can be extremely restrictive in practice and may result in misleading statistical inferences. In this paper, we instead adopt a more flexible alternative, the skew-normal distribution. The methodology is illustrated through a simulation and an application to Schizophrenia Collaborative Study data [D. Hedeker, Generalized linear mixed models, in Encyclopedia of Statistics in Behavioral Science, B. Everitt and D. Howell, eds., John Wiley, London, 2005, pp. 729–738].
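The data-generating sketch below illustrates the shared-parameter idea under assumed toy parameters: a single skew-normal random effect b_i enters both the cumulative-logit model for the ordinal outcome and the logistic model for missingness, which is what makes the mechanism non-ignorable. All numeric values (cutpoints, slopes, skewness alpha) are hypothetical.

```python
import numpy as np
from scipy.stats import skewnorm

def simulate_shared_parameter(n_subj=200, n_time=4, alpha=4.0, seed=0):
    """Toy shared-parameter data generator: ordinal logistic outcome and
    logistic missingness sharing one skew-normal random effect b_i."""
    rng = np.random.default_rng(seed)
    b = skewnorm.rvs(alpha, size=n_subj, random_state=seed)  # skew-normal effects
    cuts = np.array([-1.0, 0.5, 2.0])                        # cumulative-logit cutpoints
    data = []
    for i in range(n_subj):
        for j in range(n_time):
            eta = 0.5 * j + b[i]                             # outcome linear predictor
            pcum = 1 / (1 + np.exp(-(cuts - eta)))           # P(Y <= k | b_i)
            y = int(np.searchsorted(pcum, rng.uniform()))    # ordinal category 0..3
            p_miss = 1 / (1 + np.exp(-(-2.0 + 1.0 * b[i])))  # missingness shares b_i -> NI
            data.append((i, j, None if rng.uniform() < p_miss else y))
    return data
```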

5.
This paper reviews some of the key statistical ideas that are encountered when trying to find empirical support for causal interpretations and conclusions by applying statistical methods to experimental or observational longitudinal data. In such data, a collection of individuals is typically followed over time; each individual has a registered sequence of covariate measurements along with values of control variables that, in the analysis, are to be interpreted as causes, and finally the individual outcomes or responses are reported. Particular attention is given to the potentially important problem of confounding. We provide conditions under which, at least in principle, unconfounded estimation of the causal effects can be accomplished. Our approach to causal problems is entirely probabilistic, and we apply Bayesian ideas and techniques in the corresponding statistical inference. In particular, we use the general framework of marked point processes for setting up the probability models, and consider posterior predictive distributions as the natural summary measures for assessing causal effects. We also draw connections to relevant recent work in this area, notably to Judea Pearl's formulations based on graphical models and his calculus of so-called do-probabilities. Two examples illustrating different aspects of causal reasoning are discussed in detail.

6.
The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer. Divide and Recombine (D&R) is becoming a popular approach for big data analysis, in which results from subanalyses performed on separate data subsets are combined. In this article, we consider situations where unit record data cannot be made available by data custodians due to privacy concerns, and explore the concept of statistical sufficiency and summary statistics for model fitting. The resulting approach represents a type of D&R strategy, which we refer to as summary statistics D&R, as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without having to combine unit record data. By exploiting the natural hierarchy of the data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modelling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics allows for potential analysis at a finer geographical level, which we illustrate with a multilevel analysis of Australian unemployment data. Supplementary materials for this article are available online.
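For the simplest conjugate core of a Gamma–Poisson model, the posterior for the rate depends on the data only through the total count and total exposure, so each database need only release those two summaries. The sketch below illustrates this summary statistics D&R idea; the priors and numbers are hypothetical, and the paper's extended multilevel model would require richer summaries.

```python
def gamma_poisson_dr(summaries, a0=1.0, b0=1.0):
    """Summary-statistics D&R sketch for y_i ~ Poisson(rate * exposure_i)
    with a Gamma(a0, b0) prior on the rate: each custodian supplies only
    (sum of counts, sum of exposures), which are sufficient statistics."""
    a, b = a0, b0
    for total_count, total_exposure in summaries:
        a += total_count          # conjugate update, no unit records needed
        b += total_exposure
    return a, b                   # posterior is Gamma(a, b); mean a / b

# Each tuple is (sum of event counts, sum of person-years) from one database.
post_a, post_b = gamma_poisson_dr([(120, 4000.0), (75, 2600.0), (43, 1500.0)])
print("posterior mean rate:", post_a / post_b)
```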

7.
We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X,B that uses both the available individual-level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic observations and then analyzing the combined dataset of size n + m to estimate the parameters of the Y|X,B model. The combined dataset has missing values of B for m of the observations and is analyzed using methods that can handle missing data (e.g., multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Although the synthetic data method is applicable in a general regression context, to provide some justification we show in two special cases that the asymptotic variances of the parameter estimates in the Y|X,B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases, together with the method's broad applicability, makes it appealing for use across diverse scenarios. The Canadian Journal of Statistics 47: 580–603; 2019 © 2019 Statistical Society of Canada
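A simplified sketch of the synthetic data idea: m synthetic rows receive Y from the known Y|X model and a missing B, and here the missing B is filled in by a single regression imputation from the complete cases rather than the full multiple imputation the authors use. The function name and the predict_known argument are hypothetical.

```python
import numpy as np

def synthetic_data_fit(X, B, y, predict_known, m=500, seed=0):
    """Sketch: augment (y, X, B) of size n with m synthetic rows whose Y
    comes from the known Y|X model and whose B is missing, impute B by a
    single B|X,Y regression, then fit Y|X,B on all n + m rows (simplified
    stand-in for multiple imputation). X is (n, p); B, y are length n."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(y), size=m)
    X_syn = X[idx]                                   # synthetic covariates
    y_syn = predict_known(X_syn)                     # Y from the known Y|X model
    # impute missing B from the complete cases' B|X,Y regression
    Z = np.column_stack([np.ones(len(y)), X, y])
    gamma = np.linalg.lstsq(Z, B, rcond=None)[0]
    B_syn = np.column_stack([np.ones(m), X_syn, y_syn]) @ gamma
    # fit the Y|X,B model on the combined n + m rows
    W = np.column_stack([np.ones(len(y) + m),
                         np.concatenate([X, X_syn]),
                         np.concatenate([B, B_syn])])
    y_all = np.concatenate([y, y_syn])
    return np.linalg.lstsq(W, y_all, rcond=None)[0]
```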

8.
This article derives the likelihood ratio statistic for testing the independence of (X_1, …, X_r) and (X_{r+1}, …, X_k) under the assumption that (X_1, …, X_k) has a multivariate normal distribution and that a sample of size n is available, where for N observation vectors all components are available, while for the remaining M = n − N observation vectors the data on the last q components, (X_{k−q+1}, …, X_k), are missing (k − q ≥ r).

9.
We describe inferactive data analysis, so named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis, allowing also for Bayesian data analysis. We see this as a useful step toward providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) selective inference, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG-DAG (a Data Analysis Generative DAG), and a selective change-of-variables formula is crucial to any practical implementation of inferactive data analysis via sampling from these distributions. We discuss a canonical example of an incomplete cross-validation test statistic to discriminate between black box models, and a real HIV dataset example to illustrate inference after making multiple queries on the data.

10.
Biplots are a widely used statistical tool for visualizing the loadings and scores resulting from a dimension reduction technique applied to multivariate data. If the underlying data carry only relative information (i.e. compositional data expressed in proportions, mg/kg, etc.), they have to be pre-processed with a logratio transformation before the dimension reduction is carried out. In the context of principal component analysis, the resulting biplot is called a compositional biplot. We introduce an alternative, the ilr biplot, which is based on a special choice of orthonormal coordinates resulting from an isometric logratio (ilr) transformation. This also allows external non-compositional variables to be incorporated, and their relations to the compositional variables to be studied. The methodology is demonstrated on real data sets.
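A minimal sketch of the two steps behind an ilr biplot: compositions are mapped to orthonormal ilr coordinates (here via the standard pivot/Helmert-type balances, one common choice of basis, not necessarily the paper's) and a PCA of those coordinates supplies scores and loadings for plotting. External non-compositional variables could, in the spirit of the paper, be appended as extra columns of Z before the PCA.

```python
import numpy as np

def ilr(X):
    """Isometric logratio transform of compositions (rows of X, strictly
    positive parts in columns) using pivot/Helmert-type balances."""
    D = X.shape[1]
    L = np.log(X)
    Z = np.empty((X.shape[0], D - 1))
    for j in range(1, D):
        # balance of the geometric mean of the first j parts against part j+1
        Z[:, j - 1] = np.sqrt(j / (j + 1)) * (L[:, :j].mean(axis=1) - L[:, j])
    return Z

def biplot_coords(Z, n_comp=2):
    """PCA scores and loadings of the ilr coordinates for an ilr biplot."""
    Zc = Z - Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]
    loadings = Vt[:n_comp].T
    return scores, loadings
```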

11.
The Buckley–James estimator (BJE) [J. Buckley and I. James, Linear regression with censored data, Biometrika 66 (1979), pp. 429–436] has been extended from right-censored (RC) data to interval-censored (IC) data by Rabinowitz et al. [D. Rabinowitz, A. Tsiatis, and J. Aragon, Regression with interval-censored data, Biometrika 82 (1995), pp. 501–513]. The BJE is defined to be a zero-crossing of a modified score function H(b), a point at which H(·) changes its sign. We discuss several approaches for finding a BJE with IC data that are extensions of the existing algorithms for RC data. However, these extensions may not be appropriate for some data; in particular, they are not appropriate for a cancer data set that we are analysing. In this note, we present a feasible iterative algorithm for obtaining a BJE, and we apply the method to our data.
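Because the BJE is defined as a zero-crossing of a score function H(b) that is typically step-like and discontinuous, one generic numerical device is a grid scan for a sign change followed by bisection. The sketch below is such a stand-in, not the authors' specific iterative algorithm; H is a placeholder for the modified score function.

```python
import numpy as np

def zero_crossing(H, lo, hi, n_grid=400, n_bisect=60):
    """Locate a zero-crossing of a (possibly discontinuous) function H on
    [lo, hi]: scan a grid for a sign change, then bisect the bracket."""
    grid = np.linspace(lo, hi, n_grid)
    sign = np.sign([H(b) for b in grid])
    idx = np.where(sign[:-1] * sign[1:] < 0)[0]
    if idx.size == 0:
        raise ValueError("no sign change found on the grid")
    a, b = grid[idx[0]], grid[idx[0] + 1]
    for _ in range(n_bisect):            # bisection keeps the sign change bracketed
        mid = 0.5 * (a + b)
        if np.sign(H(mid)) == np.sign(H(a)):
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)
```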

12.
Colours and Cocktails: Compositional Data Analysis 2013 Lancaster Lecture
The different constituents of physical mixtures such as coloured paint, cocktails, geological and other samples can be represented by d-dimensional vectors called compositions with non-negative components that sum to one. Data in which the observations are compositions are called compositional data. There are a number of different ways of thinking about and consequently analysing compositional data. The log-ratio methods proposed by Aitchison in the 1980s have become the dominant methods in the field. One reason for this is the development of normative arguments converting the properties of log-ratio methods to 'essential requirements' or Principles for any method of analysis to satisfy. We discuss different ways of thinking about compositional data and interpret the development of the Principles in terms of these different viewpoints. We illustrate the properties on which the Principles are based, focussing particularly on the key subcompositional coherence property. We show that this Principle is based on implicit assumptions and beliefs that do not always hold. Moreover, it is applied selectively because it is not actually satisfied by the log-ratio methods it is intended to justify. This implies that a more open statistical approach to compositional data analysis should be adopted.

13.
Markov chain Monte Carlo (MCMC) methods, while facilitating the solution of many complex problems in Bayesian inference, are not currently well adapted to the problem of marginal maximum a posteriori (MMAP) estimation, especially when the number of parameters is large. We present here a simple and novel MCMC strategy, called State-Augmentation for Marginal Estimation (SAME), which leads to MMAP estimates for Bayesian models. We illustrate the simplicity and utility of the approach for missing data interpolation in autoregressive time series and blind deconvolution of impulsive processes.
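A toy sketch of the SAME idea for a model chosen here purely for conjugacy (not the paper's time-series examples): data x_i ~ N(mu, sig2), with the variance treated as the nuisance to be marginalized. Replicating the nuisance block k times, with k growing over iterations, makes the chain's mu-marginal proportional to p(mu | x)^k, which concentrates on the marginal MAP. The priors and annealing schedule are illustrative assumptions.

```python
import numpy as np

def same_mmap(x, a=2.0, b=1.0, tau0=10.0, n_iter=3000, seed=0):
    """SAME sketch: marginal MAP of mu in x_i ~ N(mu, sig2), with an
    InvGamma(a, b) prior on the nuisance sig2 and a N(0, tau0^2) prior
    on mu. The replicated nuisance blocks anneal the mu-marginal."""
    rng = np.random.default_rng(seed)
    n, xbar = len(x), float(np.mean(x))
    mu = xbar
    for t in range(1, n_iter + 1):
        k = 1 + t // 500                     # replicate count grows over iterations
        ss = np.sum((x - mu) ** 2)
        # k independent draws of sig2_j | mu, x ~ InvGamma(a + n/2, b + ss/2)
        sig2 = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + ss / 2), size=k)
        # mu | sig2_{1:k}, x is normal: prior and likelihood each appear k times
        prec = k / tau0**2 + n * np.sum(1.0 / sig2)
        mean = n * xbar * np.sum(1.0 / sig2) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))
    return mu                                # approximate marginal MAP of mu
```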

14.
In this article, we introduce a new extension of the Burr XII distribution called the Topp–Leone generated Burr XII distribution. We derive some of its properties, and useful characterizations are presented. A simulation study is performed to assess the performance of the maximum likelihood estimators. Censored maximum likelihood estimation is presented for the general case of multi-censored data. A new location-scale regression model based on the proposed distribution is introduced. The usefulness of the proposed models is illustrated empirically by means of three real datasets.
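A sketch of the density and a basic maximum likelihood fit, assuming the Topp–Leone generator takes the common form F(x) = [1 − (1 − G(x))^2]^alpha applied to a Burr XII baseline G(x) = 1 − (1 + x^c)^(−k); the paper's exact construction may differ, so treat these formulas as assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def tlbxii_logpdf(x, alpha, c, k):
    """Log-density under the assumed Topp-Leone-G construction with a
    Burr XII baseline; valid for x > 0 and positive parameters."""
    Gbar = (1 + x**c) ** (-k)                            # 1 - G(x)
    g = c * k * x ** (c - 1) * (1 + x**c) ** (-k - 1)    # baseline Burr XII pdf
    return (np.log(2 * alpha) + np.log(g) + np.log(Gbar)
            + (alpha - 1) * np.log(1 - Gbar**2))

def fit_tlbxii(x):
    """MLE via numerical optimization on log-parameters (keeps them positive)."""
    nll = lambda p: -np.sum(tlbxii_logpdf(x, *np.exp(p)))
    res = minimize(nll, x0=np.zeros(3), method="Nelder-Mead")
    return np.exp(res.x)                                 # (alpha, c, k) estimates
```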

15.
Different longitudinal study designs require different statistical analysis methods and different methods of sample size determination. Statistical power analysis is a flexible approach to sample size determination for longitudinal studies, but different statistical tests require different power analyses. In this paper, simulation-based power calculations of F-tests with the Containment, Kenward-Roger or Satterthwaite approximation of the degrees of freedom are examined for sample size determination in a special case of the linear mixed model (LMM) that is frequently used in the analysis of longitudinal data. The roles of several factors are examined jointly, which has not been considered previously: the variance–covariance structure of the random effects (unstructured, UN, or factor-analytic, FA0), the autocorrelation structure of the errors over time (independent, IND; first-order autoregressive, AR1; or first-order moving average, MA1), the parameter estimation method (maximum likelihood, ML, or restricted maximum likelihood, REML) and the iterative algorithm (ridge-stabilized Newton-Raphson or quasi-Newton). The variance–covariance structure of the random effects is found to be the factor with the greatest effect on statistical power. The simulation-based analysis in this study thus gives an interesting insight into the statistical power of approximate F-tests for fixed effects in LMMs for longitudinal data.
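A stripped-down version of a simulation-based power calculation for the fixed time slope in a random-intercept LMM. statsmodels' MixedLM reports Wald z-tests rather than the Containment/Kenward-Roger/Satterthwaite F-tests studied in the paper, so the test used here is a simple stand-in, and all design values are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def power_sim(n_subj=40, n_time=5, slope=0.3, sd_int=1.0, sd_err=1.0,
              n_sim=200, alpha=0.05, seed=0):
    """Monte Carlo power sketch: simulate a random-intercept LMM, refit it,
    and count how often the fixed time effect is declared significant."""
    rng = np.random.default_rng(seed)
    t = np.tile(np.arange(n_time), n_subj)
    ids = np.repeat(np.arange(n_subj), n_time)
    hits = 0
    for _ in range(n_sim):
        b = rng.normal(0, sd_int, n_subj)               # random intercepts
        y = slope * t + b[ids] + rng.normal(0, sd_err, n_subj * n_time)
        df = pd.DataFrame({"y": y, "time": t, "id": ids})
        fit = smf.mixedlm("y ~ time", df, groups=df["id"]).fit(reml=True)
        hits += fit.pvalues["time"] < alpha             # Wald test as a proxy
    return hits / n_sim
```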

16.
A structured model is essentially a family of random vectors X_θ defined on a probability space with values in a sample space. If, for a given sample value x and for each ω in the probability space, there is at most one parameter value θ for which X_θ(ω) is equal to x, then the model is called additive at x. When a certain conditional distribution exists, a frequency interpretation specific to additive structured models holds, and is summarized in a unique structured distribution for the parameter. Many of the techniques used by Fisher in deriving and handling his fiducial probability distribution are shown to be valid when dealing with a structured distribution.

17.
In partly linear models, the dependence of the response y on (x^T, t) is modelled through the relationship y = x^T β + g(t) + ε, where ε is independent of (x^T, t). We are interested in developing an estimation procedure that retains the flexibility of partly linear models, studied by several authors, while allowing some variables to take values in a non-Euclidean space. The motivating application of this paper deals with explaining atmospheric SO2 pollution incidents using these models when some of the predictive variables take values on a cylinder. In this paper, the estimators of β and g are constructed when the explanatory variables t take values on a Riemannian manifold, and the asymptotic properties of the proposed estimators are obtained under suitable conditions. We illustrate this estimation approach using an environmental data set and explore the performance of the estimators through a simulation study.
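A Speckman-type backfitting sketch of the partly linear fit y = x^T β + g(t) + ε for scalar Euclidean t: both X and y are smoothed against t, β is estimated from the de-smoothed residuals, and g is recovered from y − Xβ. In the manifold setting of the paper, the Euclidean distance inside the kernel would be replaced by a geodesic distance; the smoother and bandwidth here are illustrative choices.

```python
import numpy as np

def nw_smooth(t, v, h):
    """Nadaraya-Watson smoother of v against t with a normal kernel; on a
    manifold, t[:, None] - t[None, :] would become a geodesic distance."""
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    W = K / K.sum(axis=1, keepdims=True)   # kernel weights, rows sum to one
    return W @ v                           # works for 1-D v and (n, p) v alike

def partly_linear_fit(X, t, y, h):
    """Speckman-type estimator: de-smooth X and y on t, regress residuals
    for beta, then recover the nonparametric part g by smoothing y - X beta."""
    X_res = X - nw_smooth(t, X, h)
    y_res = y - nw_smooth(t, y, h)
    beta = np.linalg.lstsq(X_res, y_res, rcond=None)[0]
    g_hat = nw_smooth(t, y - X @ beta, h)
    return beta, g_hat
```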

18.
Nonlinear mixed-effects (NLME) models are flexible enough to handle repeated-measures data from various disciplines. In this article, we propose both maximum likelihood and restricted maximum likelihood estimation of NLME models using the first-order conditional expansion (FOCE) and the expectation–maximization (EM) algorithm. The FOCE-EM algorithm implemented in the ForStat procedure SNLME is compared with the Lindstrom and Bates (LB) algorithm implemented in both the SAS macro NLINMIX and the S-Plus/R function nlme in terms of computational efficiency and statistical properties. Two real-world data sets, an orange tree data set and a Chinese fir (Cunninghamia lanceolata) data set, and a simulated data set were used for evaluation. FOCE-EM converged for all mixed models derived from the base model in the two real-world cases, while LB did not, especially for models in which random effects are considered simultaneously in several parameters to account for between-subject variation. However, both algorithms gave identical estimated parameters and fit statistics for the converged models. We therefore recommend using FOCE-EM in NLME models, particularly when convergence is a concern in model selection.

19.
When two-component parallel systems are tested, the data consist of Type-II censored data X_(i), i = 1, …, n, from one component, and their concomitants Y_[i] randomly censored at X_(r), the stopping time of the experiment. Marshall & Olkin's (1967) bivariate exponential distribution is used to illustrate statistical inference procedures developed for this data type. Although this data type is motivated practically, the likelihood is complicated, and maximum likelihood estimation is difficult, especially when the parameter space is a non-open set. An iterative algorithm is proposed for finding maximum likelihood estimates. This article derives several properties of the maximum likelihood estimator (MLE), including existence, uniqueness, strong consistency and asymptotic distribution. It also develops an alternative estimation method with closed-form expressions based on marginal distributions, and derives its asymptotic properties. Judged by the variances of the MLEs in finite and large sample situations, the alternative estimator performs very well, especially when the correlation between X and Y is small.

20.
In this paper, we propose nonlinear elliptical models for correlated data with heteroscedastic and/or autoregressive structures. Our aim is to extend the models proposed by Russo et al. [22] by considering a more sophisticated scale structure to deal with variations in data dispersion and/or possible autocorrelation among measurements taken on the same experimental unit. Moreover, to avoid the possible influence of outlying observations, or to take into account the non-normal symmetric tails of the data, we assume elliptical contours for the joint distribution of the random effects and errors, which allows us to attribute different weights to the observations. We propose an iterative algorithm to obtain the maximum likelihood estimates of the parameters and derive the local influence curvatures for some specific perturbation schemes. The motivation for this work comes from a pharmacokinetic indomethacin data set, which was analysed previously by Bocheng and Xuping [1] under normality.
