Similar articles
20 similar articles retrieved (search time: 46 ms)
1.
Binocular data typically arise in ophthalmology where pairs of eyes are evaluated, through some diagnostic procedure, for the presence of certain diseases or pathologies. Treating eyes as independent and adopting the usual approach in estimating the sensitivity and specificity of a diagnostic test ignores the correlation between fellow eyes. This may consequently yield incorrect estimates, especially of the standard errors. The paper is concerned with diagnostic studies wherein several diagnostic tests, or the same test read by several readers, are administered to identify one or more diseases. A likelihood-based method of estimating disease-specific sensitivities and specificities via hierarchical generalized linear mixed models is proposed to meaningfully delineate the various correlations in the data. The efficiency of the estimates is assessed in a simulation study. Data from a study on diabetic retinopathy are analyzed to illustrate the methodology.

2.
Survey statisticians make use of auxiliary information to improve estimates. One important example is calibration estimation, which constructs new weights that match benchmark constraints on auxiliary variables while remaining “close” to the design weights. Multiple-frame surveys are increasingly used by statistical agencies and private organizations to reduce sampling costs and/or avoid frame undercoverage errors. Several ways of combining estimates derived from such frames have been proposed elsewhere; in this paper, we extend the calibration paradigm, previously used for single-frame surveys, to estimate the total of a variable of interest in a dual-frame survey. Calibration is a general tool that makes it possible to include auxiliary information from both frames. It also incorporates, as special cases, certain dual-frame estimators that have been proposed previously. The theoretical properties of our class of estimators are derived and discussed, and simulation studies are conducted to compare the efficiency of the procedure under different sets of auxiliary variables. Finally, the proposed methodology is applied to real data from the Barometer of Culture of Andalusia survey.
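The core single-frame calibration step that the paper generalizes can be sketched in a few lines. The following Python fragment is an illustrative chi-square-distance (GREG-type) calibration, not the authors' dual-frame estimator; the function name, the toy design weights, and the benchmark totals are invented for the example.

```python
import numpy as np

def linear_calibration(d, X, totals):
    """Chi-square-distance (GREG-type) calibration: adjust design
    weights d so that the calibrated weights w reproduce the known
    population totals of the auxiliary variables in X exactly."""
    # Solve for lambda in: (sum_i d_i x_i x_i') lambda = totals - sum_i d_i x_i
    M = (X * d[:, None]).T @ X
    lam = np.linalg.solve(M, totals - X.T @ d)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # auxiliary variables
d = np.full(n, 5.0)                     # equal design weights (toy design)
totals = np.array([1000.0, 5200.0])     # hypothetical known benchmark totals
w = linear_calibration(d, X, totals)
```

Because the chi-square distance yields a linear weight adjustment, the calibrated weights reproduce the benchmarks exactly: `X.T @ w` equals `totals` up to floating-point error.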

3.
Survival data involving silent events are often subject to interval censoring (the event is only known to occur within a time interval) and to classification errors when a test with imperfect sensitivity and specificity is applied. Accounting for the nature of such data plays an important role in estimating the distribution of the time until the occurrence of the event. In this context, we incorporate validation subsets into the parametric proportional hazards model and show that these additional data, combined with Bayesian inference, compensate for the lack of knowledge about test sensitivity and specificity, thereby improving the parameter estimates. The proposed model is evaluated through simulation studies, and the Bayesian analysis is conducted within a Gibbs sampling procedure. The posterior estimates obtained under the validation-subset models have lower bias and standard deviation than those from the scenario with no validation subset or from the model that assumes perfect sensitivity and specificity. Finally, we illustrate the usefulness of the new methodology with an analysis of real data on HIV acquisition in female sex workers that has been discussed in the literature.

4.
In this paper, we focus on variable selection for the semiparametric regression model with longitudinal data when some covariates are measured with error. A new bias-corrected variable selection procedure is proposed, based on a combination of quadratic inference functions and shrinkage estimation. With an appropriate choice of the tuning parameters, we establish the consistency and asymptotic normality of the resulting estimators. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed variable selection procedure. We further illustrate the proposed procedure with an application.

5.
We propose a new method to test the order between two high-dimensional mean curves. The new statistic extends the approach of Follmann (1996) to high-dimensional data by adapting the strategy of Bai and Saranadasa (1996). The proposed procedure is an alternative to the non-negative basis matrix factorization (NBMF) based test of Lee et al. (2008) for the same hypothesis, but it is much easier to implement. We derive the asymptotic mean and variance of the proposed test statistic under the null hypothesis of equal mean curves. Based on these theoretical results, we put forward a permutation procedure to approximate the null distribution of the new test statistic. We compare the power of the proposed test with that of the NBMF-based test via simulations. We illustrate the approach with an application to tidal volume traces.

6.
The different parts (variables) of a compositional data set cannot be considered independent of each other, since only the ratios between the parts constitute the relevant information to be analysed. In practice, this information can be expressed in a system of orthonormal coordinates. For the task of regressing one part on other parts, a specific choice of orthonormal coordinates is proposed that allows the regression parameters to be interpreted in terms of the original parts. In this context, orthogonal regression is appropriate since all compositional parts, including the explanatory variables, are measured with error. Besides classical (least-squares based) parameter estimation, robust estimation based on robust principal component analysis is also employed. Statistical inference for the regression parameters is obtained by the bootstrap; in the robust version, the fast and robust bootstrap procedure is used. The methodology is illustrated with a data set from macroeconomics.

7.
A logrank test procedure for testing bivariate symmetry against asymmetry in matched-pair data is proposed. The test statistic is based on Mantel-Haenszel-type statistics evaluated at diagonal grid points on the plane obtained from the distinct uncensored failure times. The asymptotic properties of the proposed test are derived, and an example is presented to illustrate the methodology.

8.
Environmental data are typically indexed in space and time. This work deals with modelling spatio-temporal air quality data when multiple measurements are available at each space-time point. Typically this situation arises when measurements on several response variables are observed at each space-time point, for example different pollutants or size-resolved data on particulate matter. Such data also arise when a mobile monitoring station moves along a path for a certain period of time: each spatio-temporal point then has a number of measurements of the response variable observed several times over different locations in a close neighbourhood of the space-time point. We deal with this type of data within a hierarchical Bayesian framework, in which the observed measurements are modelled in the first stage of the hierarchy, while the unobserved spatio-temporal process is considered in the subsequent stages. The final model is very flexible: it includes autoregressive terms in time and different structures for the variance-covariance matrix of the errors, and it can handle covariates available at different space-time resolutions. This approach is motivated by the availability of data on urban pollution dynamics: fast measurements of gases and size-resolved particulate matter were collected using an Optical Particle Counter mounted on the cabin of a public conveyance that moves on a monorail along a line transect of a town. Urban microclimate information is also available and included in the model. Simulation studies are conducted to evaluate the performance of the proposed model against existing alternatives that do not model the data at the first stage of the hierarchy.

9.
In many applications (geosciences, insurance, etc.), the peaks-over-thresholds (POT) approach is one of the most widely used methodologies for extreme quantile inference. It consists mainly of approximating the distribution of exceedances above a high threshold by a generalized Pareto distribution (GPD). The number of exceedances used in POT inference is often quite small, which typically leads to high volatility of the estimates. Inspired by perfect sampling techniques used in simulation studies, we define a folding procedure that connects the lower and upper parts of a distribution. A new extreme quantile estimator motivated by this theoretical folding scheme is proposed and studied. Although the asymptotic behaviour of the new estimator is the same as that of the classical (non-folded) one, the folding procedure significantly reduces the mean squared error of the extreme quantile estimates for small and moderate samples, as illustrated in the simulation study. We also apply our method to an insurance dataset.
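For readers unfamiliar with the baseline, the classical (non-folded) POT quantile estimator that the folding procedure improves on can be sketched as follows. This is a generic illustration using SciPy's GPD fit, not the authors' folding estimator; the threshold choice, simulated data, and target level are arbitrary.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)
x = rng.pareto(3.0, size=2000) + 1.0      # heavy-tailed sample

u = np.quantile(x, 0.90)                  # high threshold (90th percentile)
exc = x[x > u] - u                        # exceedances above u
n, k = len(x), len(exc)

# Fit a GPD to the exceedances, with location fixed at 0
xi, _, sigma = genpareto.fit(exc, floc=0)

# Classical POT estimator of the extreme quantile at level p
p = 0.999
q_hat = u + (sigma / xi) * (((n / k) * (1 - p)) ** (-xi) - 1)
```

The small number of exceedances `k` is exactly what drives the volatility mentioned in the abstract: everything above the threshold determines `xi` and `sigma`, and hence the extrapolated quantile.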

10.
Efficient statistical inference with nonignorable missing data is a challenging problem. This paper proposes a new estimation procedure based on composite quantile regression (CQR) for linear regression models with nonignorable missing data that is applicable even with high-dimensional covariates. A parametric model is assumed for the response probability, which is estimated by the empirical likelihood approach. Local identifiability of the proposed strategy is guaranteed on the basis of an instrumental variable approach. A set of data-based adaptive weights constructed via an empirical likelihood method is used to weight the CQR functions. The proposed method is resistant to heavy-tailed errors and outliers in the response. An adaptive penalisation method for variable selection is proposed to achieve sparsity with high-dimensional covariates. Limiting distributions of the proposed estimators are derived. Simulation studies are conducted to investigate the finite-sample performance of the proposed methodologies, and an application to the ACTG 175 data is presented.
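The unweighted, complete-data CQR criterion underlying the proposal can be minimised exactly as a linear program: one common slope, one intercept per quantile level, and a summed check loss. The sketch below omits the paper's empirical-likelihood weights, missing-data mechanism, and penalisation; the helper name `cqr_fit` is invented for the example.

```python
import numpy as np
from scipy.optimize import linprog

def cqr_fit(x, y, taus):
    """Composite quantile regression y ~ a_k + b*x at levels taus:
    a common slope b and quantile-specific intercepts a_k, obtained
    by minimising the summed check losses as a linear program."""
    n, K = len(y), len(taus)
    # Variables: a_1..a_K, b, then residual parts u_{ki} >= 0, v_{ki} >= 0
    nv = K + 1 + 2 * K * n
    c = np.zeros(nv)
    A = np.zeros((K * n, nv))
    rhs = np.zeros(K * n)
    for k, tau in enumerate(taus):
        for i in range(n):
            r = k * n + i
            A[r, k] = 1.0                      # intercept a_k
            A[r, K] = x[i]                     # common slope b
            A[r, K + 1 + r] = 1.0              # positive residual u_{ki}
            A[r, K + 1 + K * n + r] = -1.0     # negative residual v_{ki}
            rhs[r] = y[i]                      # a_k + b*x_i + u - v = y_i
            c[K + 1 + r] = tau                 # check-loss weight on u
            c[K + 1 + K * n + r] = 1.0 - tau   # check-loss weight on v
    bounds = [(None, None)] * (K + 1) + [(0, None)] * (2 * K * n)
    res = linprog(c, A_eq=A, b_eq=rhs, bounds=bounds, method="highs")
    return res.x[:K], res.x[K]                 # intercepts, slope

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 80)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=80)   # heavy-tailed noise
a, b = cqr_fit(x, y, taus=[0.25, 0.5, 0.75])
```

Sharing the slope across quantile levels is what gives CQR its robustness to heavy-tailed errors relative to least squares, which the abstract emphasises.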

11.
Shared frailty models are of interest when one has clustered survival data and the focus is on comparing lifetimes within clusters and on estimating the correlation between lifetimes from the same cluster. It is well known that the positive stable model should be preferred to the gamma model in situations where the correlated survival data show an association that decreases with time. In this paper, we devise a likelihood-based estimation procedure for the positive stable shared frailty Cox model, which is expected to achieve high efficiency. The proposed estimator is provided with large-sample properties, and a consistent estimator of the standard errors is given. Simulation studies show that the estimation procedure is appropriate for practical use and that it is much more efficient than a recently suggested procedure. The methodology is applied to a dataset concerning time to blindness for patients with diabetic retinopathy.

12.
Jointly modeling longitudinal and survival data has been an active research area. Most research focuses on improving estimation efficiency but ignores many data features frequently encountered in practice. In the current study, we develop joint models that concurrently account for longitudinal and survival data with multiple features. Specifically, the proposed model handles skewness, missingness, and measurement errors in covariates, which are typically present in longitudinal survival data collected in many studies. We employ a Bayesian method for inference on the proposed model and apply it to a real data study. A few alternative models under different conditions are compared, and extensive simulations are conducted to evaluate how the method works.

13.
New data collection and storage technologies have given rise to a new field of streaming data analytics: real-time statistical methodology for online data analyses. Most existing online learning methods rely on homogeneity assumptions, which require the samples in a sequence to be independent and identically distributed. However, inter-batch correlation and dynamically evolving batch-specific effects are among the key defining features of real-world streaming data such as electronic health records and mobile health data. This article is built on a state-space mixed model framework in which the observed data stream is driven by a latent state process that follows a Markov process. In this setting, online maximum likelihood estimation is made challenging by high-dimensional integrals and complex covariance structures. We develop a real-time Kalman-filter-based regression analysis method that updates both point estimates and their standard errors for fixed population-average effects while adjusting for dynamic hidden effects. Both theoretical justification and numerical experiments demonstrate that the proposed online method has statistical properties similar to those of its offline counterpart and enjoys great computational efficiency. We also apply the method to analyze an electronic health record dataset.
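A minimal version of such a Kalman-filter regression update, assuming a random-walk coefficient state and scalar observations, might look as follows. This is a generic sketch, not the authors' method: it omits the batch-specific effects and the standard-error adjustments the article develops, and the noise variances are illustrative.

```python
import numpy as np

def kalman_regression_step(beta, P, x, y, q=0.0, r=1.0):
    """One online update of regression coefficients beta (the state)
    and their covariance P for a new observation y = x'beta + noise.
    q: random-walk state-noise variance (0 = static coefficients);
    r: observation-noise variance. sqrt(diag(P)) tracks std. errors."""
    P = P + q * np.eye(len(beta))          # predict (random-walk state)
    S = x @ P @ x + r                      # innovation variance
    K = P @ x / S                          # Kalman gain
    beta = beta + K * (y - x @ beta)       # update point estimate
    P = P - np.outer(K, x) @ P             # update covariance
    return beta, P

rng = np.random.default_rng(7)
true_beta = np.array([1.5, -0.8])
beta, P = np.zeros(2), 100.0 * np.eye(2)   # diffuse prior
for _ in range(500):                       # stream of 500 observations
    x = np.array([1.0, rng.normal()])
    y = x @ true_beta + rng.normal(scale=0.5)
    beta, P = kalman_regression_step(beta, P, x, y)
```

With `q=0` this reduces to recursive least squares: each data point is processed once and discarded, which is the computational appeal of online estimation noted in the abstract.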

14.
This paper generalizes the tolerance interval approach for assessing agreement between two methods of continuous measurement to repeated measurement data, a common scenario in applications. The repeated measurements may be longitudinal or they may be replicates of the same underlying measurement. Our approach is to first model the data using a mixed model and then construct a relevant asymptotic tolerance interval (or band) for the distribution of appropriately defined differences. We present the methodology in the general context of a mixed model that can incorporate covariates, heteroscedasticity, and serial correlation in the errors. Simulation for the no-covariate case shows good small-sample performance of the proposed methodology. For longitudinal data, we also describe an extension to the case in which the observed time profiles are modelled nonparametrically through penalized splines. Two real data applications are presented.

15.
Many directional data, such as wind directions, can be collected extremely easily, so experiments typically yield a huge number of sequentially collected data points. With such big data, traditional nonparametric techniques quickly become computationally too expensive and are therefore useless in practice when real-time or online forecasts are expected. In this paper, we propose a recursive kernel density estimator for directional data which (i) can be updated extremely easily when a new set of observations is available and (ii) asymptotically retains the desirable features of the traditional kernel density estimator. Our methodology is based on Robbins–Monro stochastic approximation ideas. We show that our estimator outperforms the traditional techniques in terms of computational time while remaining extremely competitive in terms of efficiency with respect to its competitors in the sequential context considered here. We obtain expressions for its asymptotic bias and variance, together with an almost sure convergence rate and an asymptotic normality result. Our technique is illustrated on a wind dataset collected in Spain. A Monte Carlo study confirms the nice properties of our recursive estimator with respect to its non-recursive counterpart.
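A Robbins–Monro-style recursive update of a directional density estimate can be sketched as follows, using a von Mises kernel on a fixed angular grid. The step size 1/n, the concentration parameters, and the simulated data are illustrative choices, not those of the paper.

```python
import numpy as np

def vonmises_kernel(grid, theta, kappa):
    """Von Mises kernel centred at angle theta, evaluated on grid;
    integrates to 1 over the circle."""
    return np.exp(kappa * np.cos(grid - theta)) / (2 * np.pi * np.i0(kappa))

# Recursive estimator: each new angle updates the density on a fixed
# grid in O(grid size), so past observations never need to be stored.
grid = np.linspace(-np.pi, np.pi, 361)
f = np.zeros_like(grid)
rng = np.random.default_rng(3)
for n, theta in enumerate(rng.vonmises(mu=0.0, kappa=2.0, size=1000), start=1):
    # Stochastic-approximation update with step size 1/n
    f = (1 - 1 / n) * f + (1 / n) * vonmises_kernel(grid, theta, kappa=25.0)
```

With step size 1/n this recursion reproduces the ordinary kernel average exactly; the interest of the recursive form is that, unlike the batch estimator, the update cost does not grow with the number of observations already seen.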

16.
In practice, the presence of influential observations may lead to misleading results in variable screening problems. We therefore propose a robust variable screening procedure for high-dimensional data analysis. Our method consists of two steps. The first step is to define a new high-dimensional influence measure and propose a novel influence diagnostic procedure to remove unusual observations. The second step is to apply the sure independence screening procedure based on distance correlation to select important variables in high-dimensional regression analysis. The new influence measure and diagnostic procedure are model-free. To confirm the effectiveness of the proposed method, we conduct simulation studies and a real-life data analysis to illustrate its merits over some competing methods. Both the simulation results and the real-life data analysis demonstrate that the proposed method greatly controls the adverse effect of unusual observations by detecting and removing them, and that it performs better than the competing methods.

17.
In this paper, the generalized log-gamma regression model is modified to allow for the possibility that long-term survivors are present in the data. This modification leads to a generalized log-gamma regression model with a cure rate, encompassing, as special cases, the log-exponential, log-Weibull, and log-normal regression models with a cure rate typically used to model such data. The models simultaneously estimate the effects of explanatory variables on the acceleration or deceleration of the timing of a given event and on the surviving fraction, that is, the proportion of the population for which the event never occurs. The normal curvatures of local influence are derived under some usual perturbation schemes, and two martingale-type residuals are proposed to assess departures from the generalized log-gamma error assumption as well as to detect outlying observations. Finally, a data set from the medical area is analyzed.

18.
In the biological, medical, and social sciences, multilevel structures are very common, and hierarchical models that take into account the dependencies among subjects within the same level are necessary. In this article, we introduce a semiparametric hierarchical composite quantile regression model for hierarchical data. This model (i) keeps the easy interpretability of the simple parametric model; (ii) retains some of the flexibility of the complex nonparametric model; (iii) relaxes the assumption that the noise variances and higher-order moments exist and are finite; and (iv) takes the dependencies among subjects within the same hierarchy into consideration. We establish the asymptotic properties of the proposed estimators. Our simulation results show that the proposed method is more efficient than the least-squares-based method for many non-normally distributed errors. We illustrate our methodology with a real biometric data set.

19.
Editing in surveys of economic populations is often complicated by the fact that outliers due to errors in the data are mixed in with correct, but extreme, data values. We describe and evaluate two automatic techniques for the identification of errors in such long-tailed data distributions. The first is a forward search procedure based on finding a sequence of error-free subsets of the error-contaminated data and then using regression modelling within these subsets to identify errors. The second uses a robust regression tree modelling procedure to identify errors. Both approaches can be implemented on a univariate or a multivariate basis. An application to a business survey data set that contains a mix of extreme errors and true outliers is described.

20.
CD4 counts and viral load play important roles in HIV/AIDS studies, and the study of their relationship has received much attention, with well-known results. However, AIDS datasets are often highly complex in the sense that they typically contain outliers, measurement errors, and missing data. These data complications can greatly affect the results of statistical analyses, but much of the literature fails to address them. In this paper, we revisit the important relationship between CD4 and viral load and propose methods that simultaneously address outliers, measurement errors, and missing data. We find that the strength of the relationship may be severely mis-estimated if measurement errors and outliers are ignored. The proposed methods are general and can be used in other settings where jointly modelling several different types of longitudinal data is required in the presence of data complications.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号