Similar articles
20 similar articles found (search time: 265 ms)
1.
ABSTRACT

Incremental modelling of data streams is of great practical importance, as shown by its applications in advertising and financial data analysis. We propose two incremental covariance matrix decomposition methods for compositional data. The first method, exact incremental covariance decomposition of compositional data (C-EICD), gives an exact decomposition result. The second method, covariance-free incremental covariance decomposition of compositional data (C-CICD), is an approximate algorithm that can efficiently handle high-dimensional cases. Based on these two methods, many frequently used compositional statistical models can be calculated incrementally. We take multiple linear regression and principal component analysis as examples to illustrate the utility of the proposed methods via extensive simulation studies.
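The C-EICD/C-CICD algorithms are specific to the paper; as a generic, minimal sketch of the underlying idea, the following applies a Welford-style streaming covariance update to centred log-ratio (clr) transformed compositions. Class and function names are illustrative, not the paper's.

```python
import math

def clr(x):
    """Centred log-ratio transform of a composition (all parts > 0).
    The transformed components always sum to zero."""
    g = math.exp(sum(math.log(v) for v in x) / len(x))  # geometric mean
    return [math.log(v / g) for v in x]

class StreamingCov:
    """Welford-style streaming accumulator for mean and covariance.
    Matches the batch sample covariance exactly (up to rounding)."""
    def __init__(self, d):
        self.n = 0
        self.mean = [0.0] * d
        # running sum of outer products of deviations
        self.m2 = [[0.0] * d for _ in range(d)]

    def update(self, x):
        self.n += 1
        delta = [xi - mi for xi, mi in zip(x, self.mean)]          # x - old mean
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, self.mean)]         # x - new mean
        for i in range(len(x)):
            for j in range(len(x)):
                self.m2[i][j] += delta[i] * delta2[j]

    def cov(self):
        # sample covariance; requires n >= 2
        return [[v / (self.n - 1) for v in row] for row in self.m2]
```

Feeding clr-transformed compositions one at a time reproduces the batch covariance of the transformed data without storing the stream.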

2.
The restrictive properties of compositional data, that is multivariate data with positive parts that carry only relative information in their components, call for special care to be taken while performing standard statistical methods, for example, regression analysis. Among the special methods suitable for handling this problem is the total least squares procedure (TLS, orthogonal regression, regression with errors in variables, calibration problem), performed after an appropriate log-ratio transformation. The difficulty or even impossibility of deeper statistical analysis (confidence regions, hypothesis testing) using the standard TLS techniques can be overcome by a calibration solution based on linear regression. This approach can be combined with standard statistical inference, for example, confidence and prediction regions and bounds, hypothesis testing, etc., suitable for interpretation of results. Here, we deal with the simplest TLS problem, where we assume a linear relationship between two errorless measurements of the same object (substance, quantity). We propose an iterative algorithm for estimating the calibration line and also give confidence ellipses for the location of unknown errorless results of measurement. Moreover, illustrative examples from the fields of geology, geochemistry and medicine are included. It is shown that the iterative algorithm converges to the same values as those obtained using the standard TLS techniques. Fitted lines and confidence regions are presented for both the original and transformed compositional data. The paper contains basic principles of linear models and addresses many related problems.
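A minimal sketch of the orthogonal-regression (TLS) fit referred to above, using the standard closed-form slope obtained from the 2×2 covariance matrix; this is the textbook solution, not the paper's iterative calibration algorithm, and it assumes the data are not perfectly uncorrelated (sxy ≠ 0).

```python
import math

def orthogonal_fit(x, y):
    """Orthogonal (total least squares) line y = a + b*x, minimising
    perpendicular distances. Closed form via the sample covariances."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # largest-eigenvector slope of the 2x2 covariance matrix
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    return a, b
```

Unlike ordinary least squares, the fit is symmetric in the two variables: swapping x and y gives the reciprocal slope.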

3.
Abstract

Regression models with ordinal outcomes are widely used in many fields because of their practical utility. Moreover, predictors measured with error and multicollinearity are long-standing problems that often occur in regression analysis. However, there are few studies on measurement error models with general ordinal responses, and even fewer when multicollinearity is also present. The purpose of this article is to estimate the parameters of ordinal probit models with measurement error and multicollinearity. First, we propose to use regression calibration and refined regression calibration to estimate parameters in ordinal probit models with measurement error. Second, we develop new methods to obtain estimators of the parameters in the presence of both multicollinearity and measurement error in the ordinal probit model. Furthermore, we extend all the methods to quadratic ordinal probit models and discuss the corresponding situation for ordinal logistic models. These estimators are consistent and asymptotically normally distributed under general conditions. They are easy to compute, perform well, and are robust against the normality assumption for the predictor variables in our simulation studies. The proposed methods are applied to some real datasets.

4.
In the context of the general linear model Y = Xβ + ε, the matrix P_Z = Z(Z^T Z)^(-1) Z^T, where Z = (X : Y), plays an important role in determining least squares results. In this article we propose two graphical displays for the off-diagonal as well as the diagonal elements of P_Z. The two graphs are based on simple ideas and are useful in the detection of potentially influential subsets of observations in regression. Since P_Z is invariant with respect to permutations of the columns of Z, an added advantage of these graphs is that they can be used to detect outliers in multivariate data where the rows of Z are usually regarded as a random sample from a multivariate population. We also suggest two calibration points, one for the diagonal elements of P_Z and the other for the off-diagonal elements. The advantage of these calibration points is that they take into consideration the variability of the off-diagonal as well as the diagonal elements of P_Z. They also do not suffer from masking.
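For the special case of a projection matrix built from an intercept and a single column, the diagonal elements (leverages) have a well-known closed form; a minimal sketch for intuition only, not the article's graphical displays:

```python
def leverages(x):
    """Diagonal of the projection (hat) matrix P = Z(Z'Z)^{-1}Z' for
    Z = [1, x] (intercept plus one column), via the closed form
    h_ii = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xj - xbar) ** 2 for xj in x)
    return [1 / n + (xi - xbar) ** 2 / ss for xi in x]
```

Two standard checks: the diagonal sums to the number of columns of Z (the trace of a projection equals its rank), and points far from the mean get the largest leverage, which is what makes these elements useful for spotting influential observations.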

5.
Abstract

Errors-in-variables (EIV) regression is often used to gauge the linear relationship between two variables that both suffer from measurement and other errors, such as in the comparison of two measurement platforms (e.g., RNA sequencing vs. microarray). Scientists are often at a loss as to which EIV regression model to use, for there are infinitely many choices. We provide sound guidelines toward viable solutions to this dilemma by introducing two general nonparametric EIV regression frameworks: compound regression and constrained regression. It is shown that these approaches are equivalent to each other, and to the general parametric structural modeling approach. The advantages of these methods lie in their intuitive geometric representations, their distribution-free nature, and their ability to offer candidate solutions with various optimal properties when the ratio of the error variances is unknown. Each includes the classic nonparametric regression methods of ordinary least squares, geometric mean regression (GMR), and orthogonal regression as special cases. Under these general frameworks, one can readily uncover some surprising optimal properties of the GMR and truly comprehend the benefit of data normalization. Supplementary materials for this article are available online.
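The relationship among the special cases mentioned can be sketched numerically. The slope formulas below are the standard ones (OLS of y on x, the inverse regression of x on y expressed on the y-on-x scale, and GMR); the function name is illustrative.

```python
import math

def slopes(x, y):
    """Return (OLS slope, inverse-regression slope, GMR slope) for y vs x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b_ols = sxy / sxx                # ordinary least squares of y on x
    b_inv = syy / sxy                # regression of x on y, as a y-on-x slope
    b_gmr = math.copysign(math.sqrt(syy / sxx), sxy)  # geometric mean regression
    return b_ols, b_inv, b_gmr
```

The GMR slope is the geometric mean of the other two (b_gmr² = b_ols · b_inv) and, for positively correlated data, always lies between them, which is one way to see why it is a natural compromise when the error-variance ratio is unknown.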

6.
Empirical likelihood ratio confidence regions based on the chi-square calibration suffer from an undercoverage problem in that their actual coverage levels tend to be lower than the nominal levels. The finite sample distribution of the empirical log-likelihood ratio is recognized to have a mixture structure, with a continuous component on [0, +∞) and a point mass at +∞. The undercoverage problem of the chi-square calibration is partly due to its use of the continuous chi-square distribution to approximate the mixture distribution of the empirical log-likelihood ratio. In this article, we propose two new methods of calibration which take advantage of the mixture structure: we construct two new mixture distributions by using the F and chi-square distributions and use these to approximate the mixture distribution of the empirical log-likelihood ratio. The new methods of calibration are asymptotically equivalent to the chi-square calibration. But the new methods, in particular the F-mixture-based method, can be substantially more accurate than the chi-square calibration for small and moderately large sample sizes. The new methods are also as easy to use as the chi-square calibration.

7.
Abstract

The economic mobility of individuals and households is of fundamental interest. While many measures of economic mobility exist, reliance on transition matrices remains pervasive due to simplicity and ease of interpretation. However, estimation of transition matrices is complicated by the well-acknowledged problem of measurement error in self-reported and even administrative data. Existing methods of addressing measurement error are complex, rely on numerous strong assumptions, and often require data from more than two periods. In this article, we investigate what can be learned about economic mobility as measured via transition matrices while formally accounting for measurement error in a reasonably transparent manner. To do so, we develop a nonparametric partial identification approach to bound transition probabilities under various assumptions on the measurement error and mobility processes. This approach is applied to panel data from the United States to explore short-run mobility before and after the Great Recession.
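As a baseline (ignoring the measurement error problem the article addresses), a transition matrix is commonly estimated by row-normalised counts from two-period panel data; a minimal sketch with illustrative names:

```python
def transition_matrix(states_t0, states_t1, k):
    """Point estimate p_ij = #(i at t0 and j at t1) / #(i at t0),
    for states coded 0..k-1. Ignores any misclassification of states."""
    counts = [[0] * k for _ in range(k)]
    for i, j in zip(states_t0, states_t1):
        counts[i][j] += 1
    # row-normalise; guard against empty rows
    return [[c / max(sum(row), 1) for c in row] for row in counts]
```

Each row is a conditional distribution and sums to one; the partial identification approach in the paper replaces these point estimates with bounds when the observed states may be misreported.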

8.
Summary.  A common problem with laboratory assays is that a measurement of a substance in a test sample becomes relatively imprecise as the concentration decreases. A standard solution is to establish lower limits for reliable measurement. A quantitation limit is a level above which a measurement has sufficient precision to be reliably reported. The paper proposes a new approach to defining the limit of quantitation for the case where a linear calibration curve is used to estimate actual concentrations from measured values. The approach is based on the relative precision of the estimated concentration, using the delta method to approximate the precision. A graphical display is proposed for the assessment of estimated concentrations, as well as the overall reliability of the calibration curve. Our research is motivated by a clinical inhalation experiment. Comparisons are made between the approach proposed and two standard methods, using both real and simulated data.
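A simplified version of the idea: with calibration curve y = a + b·x, the inverse estimate is x = (y − a)/b, and a first-order delta-method approximation gives sd(x) ≈ σ/|b| when only the measurement error in y is considered (the paper also accounts for uncertainty in the fitted curve, which this sketch ignores). The names and the 20% default threshold are illustrative.

```python
def relative_sd(y, a, b, sigma):
    """Inverse-calibrated concentration x = (y - a)/b and its
    delta-method relative sd, sd(x)/x with sd(x) ~= sigma/|b|."""
    x = (y - a) / b
    return x, (sigma / abs(b)) / x

def quantitation_limit(a, b, sigma, max_rsd=0.2):
    """Smallest concentration whose relative sd stays below max_rsd:
    solve sigma / (|b| * x) <= max_rsd for x."""
    return sigma / (abs(b) * max_rsd)
```

At the limit itself, the relative sd equals the target exactly, which is the defining property of a precision-based quantitation limit.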

9.
ABSTRACT

Data on the hours that individuals spend on daily activities are compositional and include many zeros, because individuals do not pursue all activities every day. Thus, we should exercise caution in using such data for empirical analyses. The Bayesian method offers several advantages in analyzing compositional data. In this study, we analyze the time allocation of Japanese married couples using a Bayesian model. Based on Bayes factors, we compare models that do and do not account for the correlations between married couples' time use. The model that considers the correlation shows superior performance. We show that the Bayesian method can adequately take into account the correlations between wives' and husbands' living hours, and that the partial effects of the activity variables on living hours are easily calculated from the posterior results.

10.
A common problem in medical statistics is the discrimination between two groups on the basis of diagnostic information. Information on patient characteristics is used to classify individuals into one of two groups: diseased or disease-free. This classification is often with respect to a particular disease. This discrimination has two probabilistic components: (1) the discrimination is not without error, and (2) in many cases the a priori chance of disease can be estimated. Logistic models (Cox 1970; Anderson 1972) provide methods for incorporating both of these components. The a posteriori probability of disease may be estimated for a patient on the basis of both current measurement of patient characteristics and prior information. The parameters of the logistic model may be estimated on the basis of a calibration trial. In practice, not one but several sets of measurements of one characteristic of the patient may be made on a questionable case. These measurements typically are correlated; they are far from independent. How should these correlated measurements be used? This paper presents a method for incorporating several sets of measurements in the classification of a case.
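The basic Bayes update behind this setup can be sketched as follows, assuming the diagnostic score acts as a log-likelihood ratio for a single measurement (the paper's treatment of several correlated measurements is more involved): the posterior log-odds of disease are the score plus the prior log-odds.

```python
import math

def posterior_prob(score, prior):
    """Posterior disease probability from a log-likelihood-ratio score
    and a prior prevalence: logit(post) = score + logit(prior)."""
    logit_prior = math.log(prior / (1 - prior))
    z = score + logit_prior
    return 1 / (1 + math.exp(-z))
```

A neutral score (0) returns the prior unchanged, and positive evidence raises the posterior, which is exactly the two-component structure (error-prone discrimination plus a priori chance) described above.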

11.
ABSTRACT

Calibration, also called inverse regression, is a classical problem which often appears in a regression setup under fixed design. The aim of this article is to propose a stochastic method which gives an estimated solution for a linear calibration problem. We establish exponential inequalities of Bernstein–Fréchet type for the probability of the distance between the approximate solutions and the exact one. Furthermore, we build a confidence domain for this exact solution. To check the validity of our results, a numerical example is proposed.

12.

Meta-analysis refers to quantitative methods for combining results from independent studies in order to draw overall conclusions. Hierarchical models, including selection models, are introduced and shown to be useful in such Bayesian meta-analysis. Semiparametric hierarchical models are proposed using the Dirichlet process prior. This rich class of models combines the information of independent studies, allowing investigation of variability both between and within studies, as well as of the weight function. Here we investigate the sensitivity of results to unobserved studies by considering a hierarchical selection model that includes an unknown weight function, and we use Markov chain Monte Carlo methods to develop inference for the parameters of interest. Using Bayesian methods, this model is applied to a meta-analysis of twelve studies comparing the effectiveness of two different types of fluoride in preventing cavities. A clinically informative prior is assumed. Summaries and plots of the model parameters are analyzed to address the questions of interest.

13.
ABSTRACT

This paper presents methods for constructing prediction limits for a step-stress model in accelerated life testing. An exponential life distribution with a mean that is a log-linear function of stress, and a cumulative exposure model are assumed. Two prediction problems are discussed. One concerns the prediction of the life at a design stress, and the other concerns the prediction of a future life during the step-stress testing. Both predictions require the knowledge of some model parameters. When estimates for the model parameters are available, a calibration method based on simulations is proposed for correcting the prediction intervals (regions) obtained by treating the parameter estimates as the true parameter values. Finally, a numerical example is given to illustrate the prediction procedure.

14.
Abstract

Most countries use the Dutot, Jevons or Carli index for the calculation of their Consumer Price Index (CPI) at the lowest (elementary) level of aggregation. The choice of the elementary formula for inflation measurement does matter, and the effect of changing the index formula was estimated by the Bureau of Labor Statistics. It has been shown that differences between elementary indices can be explained in terms of changes in price dispersion. In this article, we extend these results by comparing both population and sample elementary indices. We assume that prices at the two compared time points are log-normally distributed and correlated. Under this assumption, we provide formulas for the biases and mean-squared errors of the main elementary indices.
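The three elementary indices can be computed directly from matched price quotes; by the AM–GM inequality the Jevons index never exceeds the Carli index, and the gap between formulas grows with price-ratio dispersion, which is the effect discussed above. A minimal sketch:

```python
import math

def dutot(p0, p1):
    """Ratio of average prices in the two periods."""
    return (sum(p1) / len(p1)) / (sum(p0) / len(p0))

def carli(p0, p1):
    """Arithmetic mean of the individual price ratios."""
    return sum(b / a for a, b in zip(p0, p1)) / len(p0)

def jevons(p0, p1):
    """Geometric mean of the individual price ratios."""
    return math.exp(sum(math.log(b / a) for a, b in zip(p0, p1)) / len(p0))
```

When all items share the same price ratio the three indices coincide; dispersion in the ratios pulls them apart.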

15.
Let X be an input measurement and Y the output reading of a calibrated instrument, with Y(X) as the calibration curve. Solving X(Y) projects an instrumental reading back onto the scale of measurements as an object of pivotal interest. The arrays of instrumental readings are projected in this manner in practice, yielding arrays of calibrated measurements, typically subject to errors of calibration. The effects of calibration errors on the properties of calibrated measurements are examined here under linear calibration. Irregularities arise as induced dependencies, inflated variances, non-standard distributions, inconsistent sample means, the underestimation of measurement variance, and other unintended consequences. On the other hand, conventional properties are seen to remain largely in place in the use of selected regression diagnostics and in one-way comparative experiments using calibrated data.

16.
Stochastic Models, 2013, 29(4): 425–447
Abstract

In this paper, we define a birth–death-modulated Markovian arrival process (BDMMAP) as a Markovian arrival process (MAP) with an underlying birth–death process. It is proved that the zeros of det(zI − A(z)) in the unit disk are real and simple. In order to analyze a BDMMAP/G/1 queue, two spectral methods are proposed. The first is a bisection method for calculation of the zeros, from which the boundary vector is derived. The second is the Fourier inversion transform of the probability generating function for the calculation of the stationary probability distribution of the queue length. The eigenvalues required in this calculation are obtained by the Durand–Kerner–Aberth (DKA) method. For numerical examples, the stationary probability distribution of the queue length is calculated using the spectral methods, and comparisons of the spectral methods with the best currently available methods are discussed.

17.
In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. The comparison uses 14 publicly available data sets that were used by Lim, Loh and Shih [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.]. The methods considered are a single tree, Bagging, Boosting (Arcing) and random forests (RF). They are compared from different perspectives. More precisely, we look at the effects of noise and of allowing linear combinations in the construction of the trees, the differences between some splitting criteria and, specifically for RF, the effect of the number of variables from which to choose the best split at each given node. Moreover, we compare our results with those obtained by Lim et al. (2000). In this study, the best overall results are obtained with RF. In particular, RF are the most robust against noise. The effects of allowing linear combinations and the differences between splitting criteria are small on average, but can be substantial for some data sets.
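As a toy illustration of bagging (not the study's experimental setup), the following builds a bootstrap ensemble of one-split decision stumps and classifies by majority vote; all names are illustrative, and real comparisons would of course use full trees and proper benchmarks.

```python
import random

def stump_fit(X, y):
    """Best single-threshold stump over all features, by training
    misclassification count. Labels are 0/1."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left in (0, 1):  # label assigned to the 'row[j] <= t' side
                pred = [left if row[j] <= t else 1 - left for row in X]
                err = sum(p != yi for p, yi in zip(pred, y))
                if best is None or err < best[0]:
                    best = (err, j, t, left)
    return best[1:]  # (feature, threshold, left-side label)

def stump_predict(model, row):
    j, t, left = model
    return left if row[j] <= t else 1 - left

def bagging(X, y, n_trees=25, seed=0):
    """Fit stumps on bootstrap resamples of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(stump_fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def vote(models, row):
    """Majority vote of the ensemble."""
    return int(sum(stump_predict(m, row) for m in models) > len(models) / 2)
```

Averaging many high-variance base learners fit on resampled data is the variance-reduction idea that bagging, arcing and random forests all build on.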

18.
Compositional data are characterized by values containing relative information, and thus the ratios between the data values are of interest for the analysis. Due to specific features of compositional data, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. It is discussed how three-way compositional data can be analyzed with the Parafac model. When data are contaminated by outliers, robust estimates for the Parafac model parameters should be employed. It is demonstrated how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real data example from macroeconomics underlines the usefulness of this approach.

19.
ABSTRACT

In this paper, we introduce a competing risks model for the lifetimes of components that differs from the classical competing risks models by the fact that it is not directly observable which component has failed. We propose two statistical methods for estimating the reliability of components from failure data on a system. Our methods are applied to simulated failure data, in order to illustrate the performance of the methods.

20.
Wage differences between women and men can be divided into an explained part and an unexplained part. The former encompasses differences in the observable characteristics of the members of the groups, such as age, education or work experience. The latter includes the part of the difference that is not attributable to objective factors and represents an estimate of the level of discrimination. We discuss the original method of Blinder (J Hum Resour 8(4):436–455, 1973) and Oaxaca (Int Econ Rev 14(3):693–709, 1973), the reweighting technique of DiNardo et al. (Econometrica 64(5):1001–1044, 1996) and our approach based on calibration. Using a Swiss dataset from 2012, we compare the estimated explained and unexplained parts of the difference in average wages in the private and public sectors obtained with the three methods. We show that for the private sector, all three methods yield similar results. For the public sector, the reweighting technique estimates a lower value of the unexplained part than the other two methods. The calibration approach and the reweighting technique allow us to estimate the explained and unexplained parts of the wage differences at points other than the mean. Using this, we analyse the assumption that wages are more equitable in the public sector, examining wage differences at different quantiles in both sectors. We show that in the public sector, discrimination occurs quite uniformly in both lower- and higher-paying jobs. In the private sector, on the other hand, discrimination is greater in lower-paying jobs than in higher-paying jobs.
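The classic Blinder–Oaxaca decomposition of the mean gap can be sketched for a single covariate as follows, using the advantaged group's coefficients as the reference wage structure (the calibration and reweighting approaches discussed above are different methods); all names are illustrative.

```python
def ols(x, y):
    """Simple OLS of y on x with intercept; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def blinder_oaxaca(x_m, y_m, x_f, y_f):
    """Two-fold decomposition of the mean wage gap:
    explained   = b_m * (mean(x_m) - mean(x_f))   (characteristics)
    unexplained = gap - explained                  (coefficients/'discrimination')"""
    a_m, b_m = ols(x_m, y_m)
    a_f, b_f = ols(x_f, y_f)
    mxm, mxf = sum(x_m) / len(x_m), sum(x_f) / len(x_f)
    gap = sum(y_m) / len(y_m) - sum(y_f) / len(y_f)
    explained = b_m * (mxm - mxf)
    unexplained = gap - explained  # = (a_m - a_f) + (b_m - b_f) * mxf
    return gap, explained, unexplained
```

When both groups share the same wage equation, the whole gap is attributed to characteristics and the unexplained part is zero, which is the sanity check one expects of any such decomposition.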
