Similar Documents
20 similar documents found.
1.
ABSTRACT

In this paper, we investigate the consistency of Expectation Maximization (EM) algorithm-based information criteria for model selection with missing data. Each criterion penalizes the conditional expectation of the complete-data log-likelihood given the observed data, taken with respect to the conditional density of the missing data. We present asymptotic properties related to maximum likelihood estimation in the presence of incomplete data, and we provide sufficient conditions for the consistency of model selection by minimizing the information criteria. Finite-sample performance is illustrated through simulation and real data studies.

2.
Shaffer's extensions and generalization of Dunnett's procedure are shown to be applicable in several nonparametric data analyses. Applications are considered within the context of the Kruskal-Wallis one-way analysis of variance (ANOVA) test for ranked data, Friedman's two-way ANOVA test for ranked data, and Cochran's test of change for dichotomous data.
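For readers unfamiliar with the three omnibus tests named above, the following minimal Python sketch (with hypothetical data; the paper's Shaffer/Dunnett multiple-comparison step is omitted) computes each one, using SciPy for the first two and the textbook formula for Cochran's Q.

```python
# Sketch only: the data below are simulated stand-ins, not from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1, g2, g3 = rng.normal(0, 1, 15), rng.normal(0.5, 1, 15), rng.normal(1, 1, 15)

# Kruskal-Wallis one-way ANOVA on ranks (independent groups).
H, p_kw = stats.kruskal(g1, g2, g3)

# Friedman two-way ANOVA on ranks (repeated measures: rows = subjects).
chi2, p_fr = stats.friedmanchisquare(g1, g2, g3)

# Cochran's Q for dichotomous repeated measures (rows = subjects, cols = treatments).
X = rng.integers(0, 2, size=(15, 3))                     # 0/1 outcomes
k = X.shape[1]
col_tot, row_tot, N = X.sum(0), X.sum(1), X.sum()
Q = k * (k - 1) * ((col_tot - N / k) ** 2).sum() / (row_tot * (k - row_tot)).sum()
p_q = stats.chi2.sf(Q, df=k - 1)

print(f"Kruskal-Wallis H={H:.2f} (p={p_kw:.3f}), Friedman chi2={chi2:.2f} "
      f"(p={p_fr:.3f}), Cochran Q={Q:.2f} (p={p_q:.3f})")
```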

3.
This article considers an approach to estimating and testing a new Kronecker product covariance structure for three-level multivariate data (multiple time points (p), multiple sites (u), and multiple response variables (q)). Testing such a covariance structure is potentially important for high-dimensional multilevel multivariate data. The hypothesis testing procedure developed in this article can test not only hypotheses for three-level multivariate data but also, as special cases, many different hypotheses for two-level multivariate data, such as blocked compound symmetry. The tests are illustrated on two real data sets.
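As a concrete illustration of the structure being tested, the following sketch builds a three-level Kronecker product covariance matrix from small factor matrices; the factor choices (AR(1), compound symmetry, an arbitrary 2x2) are illustrative assumptions, not the paper's estimates.

```python
# A minimal sketch of a (p*u*q) x (p*u*q) covariance built as a Kronecker product.
import numpy as np

def ar1(dim, rho):
    """AR(1) correlation matrix, a common choice for a time factor."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

p, u, q = 4, 3, 2                                    # time points, sites, responses
Sigma_p = ar1(p, 0.6)                                # temporal factor
Sigma_u = 0.3 * np.ones((u, u)) + 0.7 * np.eye(u)    # compound-symmetric site factor
Sigma_q = np.array([[1.0, 0.4], [0.4, 2.0]])         # response-variable factor

Sigma = np.kron(Sigma_p, np.kron(Sigma_u, Sigma_q))  # full covariance
print(Sigma.shape)                                   # (24, 24)
```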

4.
The Buckley–James estimator (BJE) [J. Buckley and I. James, Linear regression with censored data, Biometrika 66 (1979), pp. 429–436] has been extended from right-censored (RC) data to interval-censored (IC) data by Rabinowitz et al. [D. Rabinowitz, A. Tsiatis, and J. Aragon, Regression with interval-censored data, Biometrika 82 (1995), pp. 501–513]. The BJE is defined to be a zero-crossing of a modified score function H(b), that is, a point at which H(·) changes sign. We discuss several approaches for finding a BJE with IC data that extend the existing algorithms for RC data. These extensions, however, may not be appropriate for some data; in particular, they are not appropriate for a cancer data set that we are analysing. In this note, we present a feasible iterative algorithm for obtaining a BJE and apply the method to our data.
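To fix ideas about what a zero-crossing is here, the sketch below locates a sign change of a score function by bisection. H_toy is a stand-in step-like function, not the actual modified BJE score, which is piecewise constant and may never equal zero exactly; this is why one bisects on the sign change rather than solving H(b) = 0.

```python
# Hedged sketch: generic zero-crossing search; H_toy is a hypothetical example.
import numpy as np

def find_zero_crossing(H, lo, hi, tol=1e-8):
    """Bisection for a sign change of H on [lo, hi]; assumes H(lo)*H(hi) < 0."""
    f_lo = H(lo)
    if f_lo * H(hi) > 0:
        raise ValueError("H must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = H(mid)
        if f_lo * f_mid <= 0:          # sign change in the left half
            hi = mid
        else:                          # sign change in the right half
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

H_toy = lambda b: np.sign(1.3 - b) * (1 + 0.1 * np.floor(abs(b)))  # toy step score
print(find_zero_crossing(H_toy, -5.0, 5.0))                        # approx 1.3
```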

5.
The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer. Divide and Recombine (D&R) is becoming a popular approach for big data analysis, in which results are combined over subanalyses performed on separate data subsets. In this article, we consider situations where unit record data cannot be made available by data custodians because of privacy concerns, and we explore the concepts of statistical sufficiency and summary statistics for model fitting. The resulting approach is a type of D&R strategy that we refer to as summary statistics D&R, as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without combining unit record data. By exploiting the natural hierarchy of the data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modelling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics would allow analysis at a finer geographical level, which we illustrate with a multilevel analysis of Australian unemployment data. Supplementary materials for this article are available online.
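The sufficiency idea can be shown in its simplest Gamma-Poisson form: each custodian releases only its count total and cell size, and conjugacy gives the same posterior as pooled unit records would. The summaries and prior below are hypothetical, and this strips away the paper's multilevel structure.

```python
# Minimal sketch, assuming a common Poisson rate across three custodians.
a0, b0 = 1.0, 1.0                               # Gamma(a0, b0) prior on the rate

# Per-database summaries: (sum of Poisson counts, number of units).
summaries = [(142, 50), (305, 120), (88, 33)]   # released summary statistics only

total_y = sum(s for s, _ in summaries)
total_n = sum(n for _, n in summaries)

# Posterior: rate | data ~ Gamma(a0 + sum(y), b0 + n). Because (sum(y), n) is
# sufficient, this matches the posterior from pooled unit-record data exactly.
a_post, b_post = a0 + total_y, b0 + total_n
print(f"posterior mean rate = {a_post / b_post:.3f}")
```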

6.
In this paper, we study the strong uniform consistency of a weighted average of artificial data points. This is especially useful when information is incomplete (censored data, missing data, etc.). In this case, reconstruction of the information is often achieved nonparametrically through a local preservation-of-mean criterion, in which the corresponding mean is estimated by a weighted average of new data points. The present approach enlarges the scope for applications beyond the incomplete data context and can also be used to estimate the conditional mean of specific functions of complete data points. As a consequence, we establish the strong uniform consistency of the Nadaraya–Watson estimator [Nadaraya, E.A., 1964. On estimating regression. Theory Probab. Appl. 9, 141–142; Watson, G.S., 1964. Smooth regression analysis. Sankhyā Ser. A 26, 359–372] for general transformations of the data points. This result generalizes that of Härdle et al. [Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist. 16, 1428–1449]. In addition, we obtain the strong uniform consistency of a modulus of continuity for this estimator. Applications of these two results are detailed for some popular estimators.
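The weighted average in question is, in its classical form, the Nadaraya–Watson estimator. The sketch below computes it for a general transformation phi of the responses, i.e. an estimate of E[phi(Y) | X = x]; the Gaussian kernel, bandwidth, and simulated data are illustrative choices.

```python
# Minimal sketch of the Nadaraya-Watson weighted average (simulated data).
import numpy as np

def nadaraya_watson(x0, x, y, h, phi=lambda t: t):
    """NW estimate of E[phi(Y) | X = x0] with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)    # kernel weights
    return np.sum(w * phi(y)) / np.sum(w)

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

grid = np.linspace(0.1, 0.9, 5)
fit = [nadaraya_watson(x0, x, y, h=0.05) for x0 in grid]
print(np.round(fit, 2))    # should roughly track sin(2*pi*x) on the grid
```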

7.
The paper considers goodness-of-fit tests with right-censored or doubly censored data. The Fredholm Integral Equation (FIE) method proposed by Ren (1993) is implemented in simulation studies to estimate the null distribution of the Cramér-von Mises test statistics and the asymptotic covariance function of the self-consistent estimator for the lifetime distribution with right-censored or doubly censored data. We show that for fixed alternatives, the bootstrap method does not estimate the null distribution consistently for doubly censored data. For the right-censored case, a comparison between the performance of FIE and the n out of n bootstrap shows that FIE gives a better estimate of the null distribution. The application of FIE to a set of right-censored Channing House data and to a set of doubly censored breast cancer data is presented.

8.
Li Jinchang, Statistical Research (统计研究), 2014, 31(11): 3-14
This paper first re-examines big data: how to understand the "big" in big data, how to understand the "data" in big data, and whether big data are good data. It then gives a fairly systematic historical review of how changes in data have accompanied the development of statistical analysis methods, and summarizes the relationship between the two. Finally, it discusses a new development in statistics, big data analytics, arguing that big data analytics is a new task that data science assigns to statistics; it identifies the challenges facing big data analytics and possible breakthroughs, and proposes points of consensus that big data analytics needs to reach.

9.
k-POD: A Method for k-Means Clustering of Missing Data
The k-means algorithm is often used in clustering applications, but it requires a complete data matrix, whereas missing data are common in many applications. Mainstream approaches to clustering missing data reduce the problem to a complete-data formulation through either deletion or imputation, but these solutions may incur significant costs. Our k-POD method is a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.
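A hedged sketch of the fill-and-cluster iteration behind this kind of method follows: alternate ordinary k-means on a completed matrix with refilling the missing cells from the assigned centroids, a majorization-minimization pattern. This is a simplified illustration under simulated data, not the authors' reference implementation.

```python
# Sketch only: simplified k-POD-style iteration on simulated two-cluster data.
import numpy as np
from sklearn.cluster import KMeans

def kpod_sketch(X, k, n_iter=25, seed=0):
    mask = np.isnan(X)
    X_hat = np.where(mask, np.nanmean(X, axis=0), X)   # initial column-mean fill
    for _ in range(n_iter):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_hat)
        # Refill only the missing cells from each row's assigned centroid.
        X_hat[mask] = km.cluster_centers_[km.labels_][mask]
    return km.labels_, X_hat

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
X[rng.random(X.shape) < 0.2] = np.nan                  # 20% of cells missing
labels, completed = kpod_sketch(X, k=2)
print(np.bincount(labels))                             # roughly 30 / 30
```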

[Received November 2014. Revised August 2015.]

10.
Abstract

Non-normality is a common phenomenon in data from agricultural and biological research, especially in molecular data (for example, -omics, RNA-seq, and flow cytometry data). For over half a century, the leading paradigm called for applying a data transformation and then using analysis of variance (ANOVA). The introduction of generalized linear mixed models (GLMMs) provides a new way of analyzing non-normal data. Selecting an apt link function in a GLMM can be quite influential, however, and is as critical as selecting an appropriate transformation for ANOVA. In this paper, we assess the performance of the parametric link families available in the literature. We then propose a new estimation method for selecting an appropriate link function, together with a suitable variance function, in a quasi-likelihood framework. We apply these methods to a proteomics data set, showing that GLMMs provide a very flexible framework for analyzing such data.
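To make the influence of the link function concrete, the sketch below compares candidate links by deviance and AIC. For simplicity it uses a plain GLM (statsmodels) on simulated counts rather than a GLMM, and it does not implement the paper's joint quasi-likelihood selection of link and variance function.

```python
# Hedged sketch: GLM link comparison on simulated Poisson counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 2, 300)
y = rng.poisson(np.exp(0.5 + 0.8 * x))        # truth generated under a log link
X = sm.add_constant(x)

links = {"log": sm.families.links.Log(),
         "identity": sm.families.links.Identity(),
         "sqrt": sm.families.links.Sqrt()}
for name, link in links.items():
    fit = sm.GLM(y, X, family=sm.families.Poisson(link=link)).fit()
    print(f"{name:8s} deviance={fit.deviance:8.2f}  AIC={fit.aic:8.2f}")
# The log link should win here, since the data were generated under it.
```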

11.
ABSTRACT

One main challenge for statistical prediction with data from multiple sources is that not all the associated covariate data are available for many sampled subjects. Consequently, new statistical methodology is needed to handle this type of "fragmentary data," which has become increasingly common in recent years. In this article, we propose a novel method based on frequentist model averaging that fits candidate models using all available covariate data. The weights in the model average are selected by delete-one cross-validation based on the data from complete cases. The optimality of the selected weights is rigorously proved under certain conditions. The finite-sample performance of the proposed method is confirmed by simulation studies. An example of personal income prediction, based on real data from a leading e-community for wealth management in China, is also presented for illustration.
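The weight-selection step can be sketched in its generic form: compute delete-one predictions from each candidate model, then minimize the cross-validation error of the weighted prediction over the probability simplex. The candidate set and data below are hypothetical; in the paper the candidates are determined by the availability pattern of the covariates.

```python
# Hedged sketch of CV-based model-averaging weights (simulated complete cases).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(0, 0.5, n)

def loo_predictions(cols):
    """Delete-one CV predictions from OLS using only the given columns."""
    preds = np.empty(n)
    for i in range(n):
        tr = np.arange(n) != i
        A = np.c_[np.ones(tr.sum()), X[tr][:, cols]]
        beta, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
        preds[i] = np.r_[1.0, X[i, cols]] @ beta
    return preds

P = np.column_stack([loo_predictions(c) for c in ([0], [0, 1], [0, 1, 2])])

cv_err = lambda w: np.mean((y - P @ w) ** 2)            # CV error of weighted model
w0 = np.full(P.shape[1], 1.0 / P.shape[1])
res = minimize(cv_err, w0, method="SLSQP",
               bounds=[(0, 1)] * P.shape[1],
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print(np.round(res.x, 3))    # weights over the three candidate models
```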

12.
Abstract

Libraries and vendors share change data to ensure successful discovery and access for users. Change data have increased dramatically, affecting data communication and creating barriers to access. Serials data used to be simple: there were new titles, ceased titles, and title changes, and data were exchanged efficiently. Now the data are complex and difficult to communicate in the digital age. This presentation discusses the challenges of change data and the ways libraries and vendors are teaming up to calm the whirlwind of change.

13.
Many wavelet shrinkage methods assume that the data are observed on an equally spaced grid of length 2^J for some J. These methods require serious modification, or preprocessed data, to cope with irregularly spaced data. The lifting scheme is a recent mathematical innovation that obtains a multiscale analysis for irregularly spaced data. A key lifting component is the "predict" step, in which a prediction of a data point is made; the residual from the prediction is stored and can be thought of as a wavelet coefficient. This article exploits the flexibility of lifting by adaptively choosing the kind of prediction according to a criterion, so that the smoothness of the underlying "wavelet" adapts to the local properties of the function. Multiple observations at a point can readily be handled by lifting through a suitable choice of prediction. We adapt existing shrinkage rules to work with our adaptive lifting methods. We use simulation to demonstrate the improved sparsity of our techniques and their improved regression performance compared with both wavelet and non-wavelet methods suitable for irregular data. We also exhibit the benefits of adaptive lifting on the real inductance plethysmography and motorcycle data.
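The predict/update pattern of lifting can be shown in its simplest (Haar-like) form on a regular grid: split the signal into even and odd samples, predict each odd sample from its even neighbour and store the residual as a detail coefficient, then update the evens to preserve the running mean. The paper's adaptive scheme instead chooses the prediction locally and works on irregular grids; the sketch below only illustrates the mechanism.

```python
# Minimal sketch of one (Haar-like) lifting step and its exact inverse.
import numpy as np

def haar_lifting_step(x):
    even, odd = x[0::2], x[1::2]
    detail = odd - even            # predict: residual from the even neighbour
    smooth = even + detail / 2     # update: preserves the local mean
    return smooth, detail

def inverse_haar_lifting_step(smooth, detail):
    even = smooth - detail / 2
    odd = detail + even
    x = np.empty(2 * len(smooth))
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([2.0, 4.0, 6.0, 6.0, 1.0, 3.0, 9.0, 9.0])
s, d = haar_lifting_step(x)
print(d)                                                 # details: [2. 0. 2. 0.]
print(np.allclose(inverse_haar_lifting_step(s, d), x))   # True: lifting inverts exactly
```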

14.
We introduce scaled density models for binary response data, which can be much more reasonable than traditional binary response models for particular types of binary response data. We derive the maximum-likelihood estimates for the new models, which appear to fit several data sets well. We also consider optimum designs for parameter estimation in these models and find that the D- and Ds-optimum designs are independent of the parameters of the linear function of dose level; the optimum designs are simple functions of a scale parameter only.

15.
Yu Tingting, Wu Lang, Gilbert Peter. Lifetime Data Analysis, 2019, 25(2): 229-258

In HIV vaccine studies, longitudinal immune response biomarker data are often left-censored because of the lower limits of quantification of the immunological assays employed. The censoring information is important for predicting HIV infection, the failure event of interest. We propose two approaches to addressing left censoring in longitudinal data: one makes no distributional assumptions for the censored data, treating left-censored values as a "point mass" subgroup; the other makes a distributional assumption for a subset of the censored data but not for the remaining subset. We develop these two approaches to handling censoring for joint modelling of longitudinal and survival data via a Cox proportional hazards model fitted by h-likelihood. We evaluate the new methods via simulation and analyze an HIV vaccine trial data set, finding that longitudinal characteristics of the immune response biomarkers are highly associated with the risk of HIV infection.


16.
We describe inferactive data analysis, so named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis that also allows for Bayesian data analysis. We see this as a useful step toward providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) selective inference, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG-DAG (a Data Analysis Generative DAG), and a selective change-of-variables formula is crucial to any practical implementation of inferactive data analysis via sampling from these distributions. We discuss a canonical example of an incomplete cross-validation test statistic to discriminate between black-box models, and a real HIV data set example to illustrate inference after making multiple queries on the data.

17.
Abstract

In longitudinal studies, data are collected on the same set of units on more than one occasion. In medical studies it is very common to have mixed Poisson and continuous longitudinal data. In such studies, for various reasons, some intended measurements may be unavailable, resulting in missing data. When the probability of missingness is related to the missing values, the missingness mechanism is termed nonrandom. The stochastic expectation-maximization (SEM) algorithm and the parametric fractional imputation (PFI) method are developed to handle nonrandom missingness in mixed discrete and continuous longitudinal data, assuming different covariance structures for the continuous outcome. The proposed techniques are evaluated using simulation studies and applied to data from the Interstitial Cystitis Data Base (ICDB).
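The stochastic EM idea can be sketched in its simplest setting: a bivariate normal with some second-coordinate values missing. The stochastic E-step draws the missing values from their conditional distribution given the observed ones; the M-step re-estimates the parameters from the completed data. The paper handles a far harder case (nonrandom missingness, mixed outcomes); the sketch below assumes missingness at random purely for illustration.

```python
# Hedged sketch of SEM for a bivariate normal with missing y2 (simulated, MAR).
import numpy as np

rng = np.random.default_rng(4)
n = 500
y1 = rng.normal(0, 1, n)
y2 = 1.0 + 0.8 * y1 + rng.normal(0, 0.6, n)
miss = rng.random(n) < 0.3                             # 30% of y2 missing
y2_obs = np.where(miss, np.nan, y2)

y2_fill = np.where(miss, np.nanmean(y2_obs), y2_obs)   # crude initial fill
for _ in range(200):
    # M-step: complete-data estimates of the mean vector and covariance matrix.
    Y = np.column_stack([y1, y2_fill])
    mu, S = Y.mean(axis=0), np.cov(Y, rowvar=False)
    # Stochastic E-step: draw missing y2 from its conditional normal given y1.
    cond_mean = mu[1] + S[0, 1] / S[0, 0] * (y1[miss] - mu[0])
    cond_var = S[1, 1] - S[0, 1] ** 2 / S[0, 0]
    y2_fill[miss] = cond_mean + rng.normal(0, np.sqrt(cond_var), miss.sum())

print(np.round(mu, 2), round(S[0, 1] / S[0, 0], 2))    # slope estimate near 0.8
```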

18.
ABSTRACT

In economics and government statistics, aggregated data rather than individual-level data are usually reported, both for confidentiality and for simplicity. In this paper we develop a method for flexibly estimating the probability density function of a population from aggregated data obtained as group averages, where the individual-level data are grouped according to quantile limits. The kernel density estimator is commonly applied to such data without taking the aggregation process into account and has been shown to perform poorly. Our method models the quantile function as the integral of the exponential of a spline function and deduces the density function from the quantile function. We match the aggregated data to their theoretical counterparts using least squares, and we regularize the estimation using the squared second derivative of the density function as the penalty. A computational algorithm is developed to implement the method. Applications to simulated data and US household income survey data show that our penalized spline estimator accurately recovers the density function of the underlying population, whereas the common kernel density estimator is severely biased. The method is applied to study the dynamics of China's urban income distribution using published interval-aggregated data from 1985–2010.
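The key identity behind deducing the density from the quantile function is f(Q(u)) = 1/Q'(u). The sketch below verifies it numerically with a known quantile function (standard normal); in the paper, Q is instead modelled as the integral of the exponential of a spline and fitted to the aggregated group means.

```python
# Minimal sketch of the density-from-quantile identity f(Q(u)) = 1 / Q'(u).
import numpy as np
from scipy import stats

u = np.linspace(0.01, 0.99, 99)
Q = stats.norm.ppf(u)              # quantile function values on a grid
dQ_du = np.gradient(Q, u)          # numerical derivative Q'(u)
f_implied = 1.0 / dQ_du            # density implied by the quantile function

print(np.max(np.abs(f_implied - stats.norm.pdf(Q))))   # small numerical error
```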

19.
ABSTRACT

The binomial exponential 2 (BE2) distribution was proposed by Bakouch et al. as the distribution of a random sum of independent exponential random variables whose number of terms has a zero-truncated binomial distribution. In this article, we introduce a generalization of the BE2 distribution that offers a more flexible model for lifetime data. The hazard rate function of the proposed distribution can be decreasing, increasing, decreasing-increasing-decreasing, or unimodal, so it is quite flexible for analyzing non-negative real-life data. Some statistical properties and parameter estimation for the distribution are investigated. Three different algorithms are proposed for generating random data from the new distribution. Two real data applications, the strength data and Proschan's air-conditioner data, show that the new distribution outperforms the BE2 distribution and some other well-known distributions in modeling lifetime data.
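The BE2 construction named in the first sentence is easy to simulate directly: draw a zero-truncated binomial count, then sum that many exponentials. The parameter values below are illustrative, and this shows only the base construction, not the paper's generalization or its three generation algorithms.

```python
# Minimal sketch of simulating the BE2 construction (illustrative parameters).
import numpy as np

rng = np.random.default_rng(5)

def rbe2(size, n=5, p=0.4, lam=1.0):
    """Draws sum_{i=1}^N Exp(lam), with N ~ Binomial(n, p) truncated at N >= 1."""
    out = np.empty(size)
    for j in range(size):
        N = 0
        while N == 0:                        # rejection step enforces N >= 1
            N = rng.binomial(n, p)
        out[j] = rng.exponential(1.0 / lam, N).sum()
    return out

x = rbe2(10_000)
# Mean check: E[X] = E[N | N >= 1] / lam = n*p / (lam * (1 - (1-p)**n)).
print(x.mean(), 5 * 0.4 / (1.0 * (1 - 0.6 ** 5)))
```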

20.
In this paper, we propose and study a new global test, the GPF test, for the one-way ANOVA problem for functional data, obtained by globalizing the usual pointwise F-test. The asymptotic random expressions of the test statistic are derived, and its asymptotic power is investigated. The GPF test is shown to be root-n consistent, and it is much less computationally intensive than a parametric bootstrap test proposed in the literature for one-way ANOVA for functional data. Simulation studies show that, in terms of size control and power, the GPF test is comparable with two existing tests adapted to the one-way ANOVA problem for functional data. A real data example illustrates the GPF test.
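The "globalizing" idea can be sketched directly: compute the usual one-way ANOVA F-statistic pointwise over the functional domain and integrate it into a single global statistic. The null calibration (the asymptotic distribution derived in the paper) is omitted; the simulated curves are illustrative only.

```python
# Minimal sketch of a globalized pointwise F-statistic for functional ANOVA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 101)                        # common evaluation grid
# Three groups of curves; one group mean differs by a smooth bump.
groups = [rng.normal(0, 1, (20, t.size)) + shift * np.sin(np.pi * t)
          for shift in (0.0, 0.0, 0.5)]

F_t = np.array([stats.f_oneway(*(g[:, j] for g in groups)).statistic
                for j in range(t.size)])          # pointwise F-statistics
T_global = np.trapz(F_t, t)                       # global statistic: integral of F(t)
print(round(T_global, 2))
```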
