Similar Documents
20 similar documents found (search time: 31 ms).
1.
It is of particular importance to nurse manpower planners to determine the likelihood and time-scale of return to service of nurses who have temporarily withdrawn from the labour force. Such behaviour is common in female dominated professions and is referred to in the nursing literature as being in ‘limbo’. In this paper we develop non-parametric, parametric and network models to describe the distribution of stay in ‘limbo’ and estimate the probability of return to service. These models are fitted to data from the Northern Ireland nursing service. It is inherent in the problem that the data are statistically incomplete. This must be taken into account in the statistical methods used to estimate the various model parameters.
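As an illustration of the non-parametric route to such incomplete (right-censored) duration data, here is a minimal Kaplan–Meier sketch in Python with toy figures; it is not the authors' model and not the Northern Ireland records.

```python
import numpy as np

def kaplan_meier(times, returned):
    """Product-limit estimate of S(t) = P(still in 'limbo' beyond t).

    times    : observed durations (e.g. months); for nurses who have not yet
               returned these are censored follow-up times.
    returned : 1 if the nurse returned to service, 0 if the observation is censored.
    """
    times = np.asarray(times, dtype=float)
    returned = np.asarray(returned, dtype=int)
    order = np.argsort(times)
    times, returned = times[order], returned[order]

    curve, s = [], 1.0
    for t in np.unique(times[returned == 1]):
        n_at_risk = np.sum(times >= t)                 # still under observation at t
        d = np.sum((times == t) & (returned == 1))     # returns observed at t
        s *= 1.0 - d / n_at_risk
        curve.append((t, s))
    return curve

# toy data: four observed returns, two censored observations
for t, s in kaplan_meier([3, 5, 5, 8, 12, 12], [1, 1, 0, 1, 1, 0]):
    print(f"P(return by month {t:g}) = {1 - s:.2f}")
```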

2.
Research and operational applications in weather forecasting are reviewed, with emphasis on statistical issues. It is argued that the deterministic approach has dominated in weather forecasting, although weather forecasting is a probabilistic problem by nature. The reason has been the successful application of numerical weather prediction techniques over the 50 years since the introduction of computers. A gradual change towards utilization of more probabilistic methods has occurred over the last decade; in particular meteorological data assimilation, ensemble forecasting and post-processing of model output have been influenced by ideas from statistics and control theory.

3.
The prediction problem of sea state based on field measurements of wave and meteorological factors is a topic of interest from the standpoints of navigation safety and fisheries. Various statistical methods have been considered for predicting the distribution of sea surface elevation. However, prediction of sea state in the transitional situation when waves are developing under blowing wind has remained a difficult problem, because the statistical expression of the dynamic mechanism in this situation is very complicated. In this article, we address this problem through the development of a statistical model. More precisely, we develop a model for predicting the time-varying distribution of sea surface elevation, based on a non-homogeneous hidden Markov model in which the time-varying structure is driven by wind speed and wind direction. Our prediction experiments suggest that the proposed model improves prediction accuracy compared with a homogeneous hidden Markov model. Furthermore, we found that the prediction accuracy is influenced by the choice of circular distribution in the circular hidden Markov model for the directional time series of wind direction.
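A minimal sketch of the non-homogeneous idea: a two-state chain whose transition probabilities depend on wind speed through a logistic link. The coefficients, the two-state structure, and the omission of wind direction are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def transition_matrix(wind_speed, beta0=-2.0, beta1=0.3):
    """Toy non-homogeneous transition matrix: the probability of moving to
    (or staying in) the 'developing sea' state grows with wind speed.
    beta0, beta1 are illustrative coefficients, not fitted values."""
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * wind_speed)))   # logistic link
    return np.array([[1 - p, p],
                     [1 - p, p]])

def forward_filter(obs_loglik, wind_speeds, init=np.array([0.5, 0.5])):
    """Forward probabilities P(state_t | obs_1..t) with time-varying transitions.
    obs_loglik[t, k] is the log-likelihood of the observation at time t in state k."""
    alpha = init * np.exp(obs_loglik[0])
    alpha /= alpha.sum()
    path = [alpha]
    for t in range(1, len(wind_speeds)):
        A = transition_matrix(wind_speeds[t])
        alpha = (alpha @ A) * np.exp(obs_loglik[t])
        alpha /= alpha.sum()
        path.append(alpha)
    return np.array(path)

# usage: forward_filter(loglik_matrix, wind_speed_series) returns filtered state probabilities
```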

4.
There has been growing interest in partial identification of probability distributions and parameters. This paper considers statistical inference on parameters that are partially identified because data are incompletely observed, due to nonresponse or censoring, for instance. A method based on likelihood ratios is proposed for constructing confidence sets for partially identified parameters. The method can be used to estimate a proportion or a mean in the presence of missing data, without assuming missing-at-random or modeling the missing-data mechanism. It can also be used to estimate a survival probability with censored data without assuming independent censoring or modeling the censoring mechanism. A version of the verification bias problem is studied as well.
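For intuition on why such a parameter is only partially identified, a small sketch of the worst-case bounds for a proportion with missing outcomes; the likelihood-ratio confidence sets proposed in the paper are not reproduced here.

```python
import numpy as np

def proportion_bounds(y):
    """Worst-case identification bounds for P(Y = 1) when some outcomes are
    missing (np.nan), with no missing-at-random assumption: the lower bound
    treats all missing values as 0, the upper bound treats them all as 1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    observed_ones = np.nansum(y)
    n_missing = np.isnan(y).sum()
    return observed_ones / n, (observed_ones + n_missing) / n

lo, hi = proportion_bounds([1, 0, 1, np.nan, np.nan, 1, 0])
print(f"P(Y=1) is only partially identified: [{lo:.2f}, {hi:.2f}]")
```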

5.
The statistical modeling of large databases is one of the most challenging issues today, and it becomes even more critical when the correlation structure is complicated. Variable selection plays a vital role in the statistical analysis of large databases, and many methods have been proposed to address this problem. One such method is Sure Independence Screening, which was introduced to reduce dimensionality to a relatively small scale. This method, though simple, produces remarkable results even for problems that are both ultra-high dimensional and large in sample size. In this paper we analyze a large real medical data set under a Poisson regression model, and we support the analysis with simulation experiments that take into account the correlation structure of the design matrix.
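A minimal sketch of marginal screening for a count response, ranking covariates by the absolute slope of a one-variable Poisson fit; the cutoff d and the use of scikit-learn's PoissonRegressor are illustrative choices, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

def sis_poisson(X, y, d):
    """Sure-Independence-Screening-style reduction for a Poisson response:
    fit each covariate marginally and keep the d covariates with the
    largest absolute marginal slope."""
    scores = []
    for j in range(X.shape[1]):
        m = PoissonRegressor(alpha=0.0).fit(X[:, [j]], y)   # unpenalized marginal fit
        scores.append(abs(m.coef_[0]))
    return np.argsort(scores)[::-1][:d]   # indices of the retained covariates

# usage: keep the 20 most promising columns out of thousands
# kept = sis_poisson(X, y, d=20)
```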

6.
Development of algorithms that estimate the offset between two clocks has received a lot of attention, with the motivating force being data networking applications that require synchronous communication protocols. Recently, statistical modeling techniques have been used to develop improved estimation algorithms with the focus being obtaining robust estimators in terms of mean squared error. In this paper, we extend the use of statistical modeling techniques to address the construction of confidence intervals for the offset parameter. We consider the case where the distributions of network delays are members of a scale family. Our results include an asymptotic confidence interval and a generalized confidence interval in the sense of [S. Weerahandi, Generalized confidence intervals, Journal of the American Statistical Association 88 (1993) 899–905. Correction in vol. 89, p. 726, 1994]. We compare and contrast the two approaches for obtaining a confidence interval, and illustrate specific applications using exponential, Rayleigh and heavy-tailed Weibull network delays as concrete examples.
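A small sketch of the order-statistic point estimator around which such intervals are typically built, assuming one-sided (e.g. exponential) network delays; the variable names and simulated figures are illustrative, and the paper's asymptotic and generalized confidence intervals are not reproduced.

```python
import numpy as np

def offset_estimate(forward, backward):
    """Clock-offset point estimate from bidirectional timestamp exchanges.
    forward[i]  = (receive at B) - (send at A) = offset + forward delay
    backward[i] = (receive at A) - (send at B) = -offset + backward delay
    With one-sided network delays the sample minima carry the least delay,
    giving the classic order-statistic estimator."""
    return (np.min(forward) - np.min(backward)) / 2.0

# simulated example: true offset 5 ms, exponential delays with mean 2 ms
rng = np.random.default_rng(0)
fwd = 5.0 + rng.exponential(2.0, 50)
bwd = -5.0 + rng.exponential(2.0, 50)
print(offset_estimate(fwd, bwd))   # close to 5.0
```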

7.
This article offers a review of three software packages that estimate directed acyclic graphs (DAGs) from data. The three packages, MIM, Tetrad and WinMine, can help researchers discover underlying causal structure. Although each package uses a different algorithm, the results are to some extent similar. All three packages are free and easy to use. They are likely to be of interest to researchers who do not have strong theory regarding the causal structure in their data. DAG modeling is a powerful analytic tool to consider in conjunction with, or in place of, path analysis, structural equation modeling, and other statistical techniques.

8.
Variable and model selection problems are fundamental to high-dimensional statistical modeling in diverse fields of science. Especially in health studies, many potential factors are usually introduced to determine an outcome variable. This paper deals with the problem of high-dimensional statistical modeling through the analysis of the trauma annual data in Greece for 2005. The data set is divided into an experiment set and a control set and consists of 6334 observations and 112 factors, including demographic, transport and intrahospital data, used to detect possible risk factors of death. In our study, different model selection techniques are applied to the experiment set, and the notion of deviance is used on the control set to assess the fit of the overall selected model. The statistical methods employed in this work were the non-concave penalized likelihood methods (smoothly clipped absolute deviation, least absolute shrinkage and selection operator, and hard thresholding), generalized linear logistic regression, and best-subset variable selection. The way of identifying the significant variables in large medical data sets, along with the performance and the pros and cons of the various statistical techniques used, is discussed. The analysis reveals the distinct advantages of the non-concave penalized likelihood methods over the traditional model selection techniques.
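As a hedged illustration of one of the penalized methods named above, a LASSO-penalized logistic regression in scikit-learn; SCAD and hard thresholding require specialized software and are omitted, and the data here are placeholders rather than the Greek trauma registry.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def lasso_logistic_selection(X, y, C=0.1):
    """Select risk factors for a binary outcome (e.g. death) by fitting an
    L1-penalized logistic regression and keeping covariates with nonzero
    coefficients. C controls the penalty strength (smaller C, fewer variables)."""
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
    ).fit(X, y)
    coefs = model.named_steps["logisticregression"].coef_.ravel()
    return np.flatnonzero(coefs != 0)   # indices of selected risk factors

# usage: selected = lasso_logistic_selection(X_experiment, death_indicator)
```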

9.
Cyber attacks have become a problem that threatens the economy, human privacy, and even national security. Before we can adequately address the problem, we need a clear understanding of cyber attacks from various perspectives. This is a challenge because the Internet is a large-scale complex system with humans in the loop. In this paper, we investigate a particular perspective of the problem, namely the extreme value phenomenon exhibited by cyber attack rates, which are the numbers of attacks against a system of interest per time unit. It is important to explore this perspective because understanding the statistical properties of extreme cyber attack rates will pave the way for cost-effective, if not optimal, allocation of resources in real-life cyber defense operations. Specifically, we propose modeling and predicting extreme cyber attack rates via marked point processes, using the Value-at-Risk as a natural measure of intense cyber attacks. The point processes are then applied to analyze some real data sets. Our analysis shows that the point processes can describe and predict extreme cyber attack rates with very satisfactory accuracy.
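Two small ingredients of this kind of analysis, sketched in Python: the empirical Value-at-Risk of observed attack rates, and a peaks-over-threshold generalized Pareto fit to the exceedances. The full marked point process of the paper is not reproduced.

```python
import numpy as np
from scipy import stats

def attack_rate_var(rates, level=0.99):
    """Empirical Value-at-Risk: the attack rate exceeded only (1 - level) of the time."""
    return np.quantile(np.asarray(rates, dtype=float), level)

def pot_tail_fit(rates, threshold):
    """Peaks-over-threshold fit: a generalized Pareto distribution for the
    excesses over `threshold`, one standard ingredient of extreme-value
    modelling of heavy-tailed rates."""
    rates = np.asarray(rates, dtype=float)
    excesses = rates[rates > threshold] - threshold
    shape, _, scale = stats.genpareto.fit(excesses, floc=0.0)
    return shape, scale

# usage: attack_rate_var(hourly_attack_counts), pot_tail_fit(hourly_attack_counts, u)
```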

10.
This paper deals with the analysis of datasets, where the subjects are described by the estimated means of a p-dimensional variable. Classical statistical methods of data analysis do not treat measurements affected by intrinsic variability, as in the case of estimates, so that the heterogeneity induced among subjects by this condition is not taken into account. In this paper a way to solve the problem is suggested in the context of symbolic data analysis, whose specific aim is to handle data tables where single valued measurements are substituted by complex data structures like frequency distributions, intervals, and sets of values. A principal component analysis is carried out according to this proposal, with a significant improvement in the treatment of information.

11.
Neuroimaging studies aim to analyze imaging data with complex spatial patterns in a large number of locations (called voxels) on a two-dimensional (2D) surface or in a 3D volume. Conventional analyses of imaging data involve two sequential steps: spatially smoothing the imaging data and then independently fitting a statistical model at each voxel. However, conventional analyses suffer from applying the same amount of smoothing throughout the whole image, from the arbitrary choice of smoothing extent, and from low statistical power in detecting spatial patterns. We propose a multiscale adaptive regression model (MARM) that integrates the propagation-separation (PS) approach (Polzehl and Spokoiny, 2000, 2006) with statistical modeling at each voxel for spatial and adaptive analysis of neuroimaging data from multiple subjects. MARM has three features: it is spatial, hierarchical, and adaptive. We use a multiscale adaptive estimation and testing procedure (MAET) to draw on imaging observations from the neighboring voxels of the current voxel to adaptively calculate parameter estimates and test statistics. Theoretically, we establish consistency and asymptotic normality of the adaptive parameter estimates and the asymptotic distribution of the adaptive test statistics. Our simulation studies and real data analysis confirm that MARM significantly outperforms conventional analyses of imaging data.

12.
We present a mathematical theory of objective, frequentist chance phenomena that uses as a model a set of probability measures. In this work, sets of measures are not viewed as a statistical compound hypothesis or as a tool for modeling imprecise subjective behavior. Instead we use sets of measures to model stable (although not stationary in the traditional stochastic sense) physical sources of finite time series data that have highly irregular behavior. Such models give a coarse-grained picture of the phenomena, keeping track of the range of the possible probabilities of the events. We present methods to simulate finite data sequences coming from a source modeled by a set of probability measures, and to estimate the model from finite time series data. The estimation of the set of probability measures is based on the analysis of a set of relative frequencies of events taken along subsequences selected by a collection of rules. In particular, we provide a universal methodology for finding a family of subsequence selection rules that can estimate any set of probability measures with high probability.
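A toy version of the estimation idea: compute the relative frequency of an event along subsequences picked out by several selection rules and report the resulting range of probabilities. The rules below are illustrative, not the universal family constructed in the paper.

```python
import numpy as np

def frequency_range(bits, rules):
    """Estimate a *set* of probabilities for the event {x = 1}: take the
    relative frequency of 1s along each subsequence selected by a rule,
    and report the range of those frequencies."""
    freqs = []
    for rule in rules:
        sub = [bits[i] for i in range(len(bits)) if rule(i, bits[:i])]
        if sub:
            freqs.append(np.mean(sub))
    return min(freqs), max(freqs)

rules = [
    lambda i, past: True,                              # take every observation
    lambda i, past: i % 2 == 0,                        # every other observation
    lambda i, past: len(past) > 0 and past[-1] == 1,   # observations following a 1
    lambda i, past: len(past) > 0 and past[-1] == 0,   # observations following a 0
]
# usage: frequency_range(binary_series, rules) -> (lower, upper) probability estimates
```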

13.
The von Mises distribution is widely used for modeling angular data. When such data are seen in a quality control setting, there may be interest in checking whether the values are in statistical control or have gone out of control. A cumulative sum (cusum) control chart has desirable properties for checking whether the distribution has changed from an in-control to an out-of-control setting. This paper develops cusums for a change in the mean direction and concentration of angular data and illustrates some of their properties.
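A generic likelihood-ratio cusum sketch for a shift in mean direction, with the concentration parameter assumed known; the exact charts developed in the paper may differ.

```python
import numpy as np

def vonmises_direction_cusum(theta, mu0, mu1, kappa, h):
    """Likelihood-ratio CUSUM for a shift in the mean direction of von Mises
    data from mu0 (in control) to mu1 (out of control), with concentration
    kappa assumed known and equal under both regimes. Signals at the first
    time the cusum exceeds the decision limit h."""
    s, path = 0.0, []
    for t, x in enumerate(theta):
        llr = kappa * (np.cos(x - mu1) - np.cos(x - mu0))   # log-likelihood ratio
        s = max(0.0, s + llr)
        path.append(s)
        if s > h:
            return t, np.array(path)   # signal time and cusum path
    return None, np.array(path)

# usage: signal, path = vonmises_direction_cusum(angles, mu0=0.0, mu1=0.5, kappa=2.0, h=5.0)
```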

14.
Rapid technological advances have resulted in continual changes in data acquisition and reporting processes. While such advances have benefited research in these areas, the changing technologies have, at the same time, created difficulty for statistical analysis by generating outdated data which are incompatible with data based on newer technology. Relationships between these incompatible variables are complicated; not only are they stochastic, but they also often depend on other variables, rendering even a simple statistical analysis, such as estimation of a population mean, difficult in the presence of mixed data formats. Thus, technological advancement has brought forth, from the statistical perspective, the methodological problem of analyzing newer data together with outdated data. In this paper, we discuss general principles for addressing the statistical issues related to the analysis of incompatible data. The approach taken has three desirable properties: it is readily understood, since it builds upon a linear regression setting; it is flexible, allowing for data incompatibility in either the response or the covariate; and it is not computationally intensive. In addition, inferences may be made for a latent variable of interest. Our consideration of this problem is motivated by the analysis of delta wave counts, as a surrogate for sleep disorder, in the sleep laboratory of the Department of Psychiatry, University of Pittsburgh Medical Center, where two major changes occurred in the acquisition of these data, resulting in three mixed formats. By developing appropriate methods for addressing this issue, we provide statistical advancement that is compatible with technological advancement.
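A heavily simplified sketch of linking formats through a linear regression fitted on a calibration subsample measured under both technologies; the function and variable names are hypothetical, and the paper's latent-variable inference is not reproduced.

```python
import numpy as np

def calibrate_old_to_new(old_cal, new_cal):
    """Fit a simple linear calibration new = a + b * old on a subsample that
    was measured under both acquisition technologies, and return a converter
    that puts outdated measurements on the newer scale."""
    b, a = np.polyfit(old_cal, new_cal, 1)   # slope, intercept
    return lambda old: a + b * np.asarray(old, dtype=float)

# usage (hypothetical names): convert outdated delta-wave counts, then pool
# convert = calibrate_old_to_new(old_counts_cal, new_counts_cal)
# pooled = np.concatenate([new_counts, convert(old_counts)])
```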

15.
A Comparative Study of Multilevel Models and Static Panel Data Models
This paper compares two-level models with static panel data models. Multilevel models are mainly used to analyze statistical data with a hierarchical structure, while the panel data model is a widely applied econometric model developed for panel data. Panel data can be viewed as two-level data with a cross-sectional level and a time level, so two-level models can also analyze panel data, and under certain conditions the two approaches are similar to some extent. We therefore propose a multilevel static panel data model, providing an analytical tool for panel data with multiple hierarchical levels.
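A minimal illustration of the two modeling routes on a small simulated long-format panel, using statsmodels; the data-generating values are arbitrary placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format panel: 30 units observed over 8 periods
rng = np.random.default_rng(1)
units = np.repeat(np.arange(30), 8)
x = rng.normal(size=units.size)
y = 1.5 * x + rng.normal(size=30)[units] + rng.normal(scale=0.5, size=units.size)
panel = pd.DataFrame({"y": y, "x": x, "unit": units})

# (1) two-level (random-intercept) model: units are the level-2 groups
re_fit = smf.mixedlm("y ~ x", data=panel, groups=panel["unit"]).fit()

# (2) static panel model with unit fixed effects (within estimator via dummies)
fe_fit = smf.ols("y ~ x + C(unit)", data=panel).fit()

print(re_fit.params["x"], fe_fit.params["x"])   # the two slope estimates agree closely here
```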

16.
Self-reported income information particularly suffers from an intentional coarsening of the data, which is called heaping or rounding. If it does not occur completely at random – which is usually the case – heaping and rounding have detrimental effects on the results of statistical analysis. Conventional statistical methods do not consider this kind of reporting bias, and thus might produce invalid inference. We describe a novel statistical modeling approach that allows us to deal with self-reported heaped income data in an adequate and flexible way. We suggest modeling heaping mechanisms and the true underlying model in combination. To describe the true net income distribution, we use the zero-inflated log-normal distribution. Heaping points are identified from the data by applying a heuristic procedure comparing a hypothetical income distribution and the empirical one. To determine heaping behavior, we employ two distinct models: either we assume piecewise constant heaping probabilities, or heaping probabilities are considered to increase steadily with proximity to a heaping point. We validate our approach by some examples. To illustrate the capacity of the proposed method, we conduct a case study using income data from the German National Educational Panel Study.
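A small heuristic sketch of screening for heaping points by comparing the frequency of round reported values with that of their non-round neighbours; the thresholds are illustrative, and the paper's comparison with a hypothetical income distribution is not reproduced.

```python
import numpy as np
from collections import Counter

def candidate_heaping_points(incomes, multiples=(100, 500, 1000), factor=3.0):
    """Flag round income values (multiples of 100, 500, 1000, ...) whose
    observed frequency is much higher than the average frequency of
    non-round values, as candidate heaping points."""
    counts = Counter(np.round(incomes).astype(int))
    non_round = [c for v, c in counts.items() if all(v % m for m in multiples)]
    baseline = np.mean(non_round) if non_round else 1.0
    return sorted(v for v, c in counts.items()
                  if any(v % m == 0 for m in multiples) and c > factor * baseline)

# usage: candidate_heaping_points(reported_net_incomes)
```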

17.
Abstract. This article considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form low-dimensional data sketches. We apply conventional statistical methods to compare probabilistic algorithms based on storing either selected order statistics, or random projections. We derive estimators of the cardinality in both cases, and show that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds. Furthermore, we show that the estimators have comparable asymptotic efficiency, and explain this result by demonstrating an unexpected connection between the two approaches.
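A minimal sketch of one order-statistics approach of this kind, the k-minimum-values estimator (k-1)/m_k built from the k smallest distinct hash values; the hash choice and sketch size are illustrative, not the specific estimators analysed in the article.

```python
import hashlib

def kmv_cardinality(stream, k=256):
    """K-minimum-values sketch: hash each item to a pseudo-uniform value in
    (0, 1), keep only the k smallest distinct hash values, and estimate the
    number of distinct items as (k - 1) / m_k, where m_k is the k-th
    smallest retained value."""
    kept = set()
    for item in stream:
        h = int(hashlib.sha1(repr(item).encode()).hexdigest(), 16) / 2**160
        kept.add(h)
        if len(kept) > k:
            kept.discard(max(kept))   # drop the largest so only k values remain
    if len(kept) < k:
        return len(kept)              # fewer than k distinct items were seen
    return (k - 1) / max(kept)

# usage: kmv_cardinality(ip_address_stream) approximates the number of distinct addresses
```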

18.
We studied several test statistics for testing the equality of marginal survival functions of paired censored data. The null distribution of the test statistics was approximated by permutation. These tests do not require explicit modeling or estimation of the within-pair correlation, accommodate both paired data and singletons, and are straightforward to compute with most statistical software. Numerical studies showed that these tests have competitive size and power performance. One test statistic has higher power than previously published test statistics when the two survival functions under comparison cross. We illustrate the use of these tests in a propensity score matched dataset.

19.
Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models.

The cause of the problem is that when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problem, we show that leave-k-out cross-validation, in which sub-samples are taken without replacement, does not suffer from the same issue.
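A minimal sketch of the bootstrap class-enumeration check discussed above: refit Gaussian mixtures on each non-parametric resample and record the BIC-selected number of components. The settings are illustrative and do not reproduce the paper's simulation design.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_class_enumeration(X, k_max=5, B=200, random_state=0):
    """For each bootstrap resample of the rows of X, fit mixtures with
    1..k_max components, pick the number of classes by BIC, and tally how
    often each k is selected across resamples."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    chosen = []
    for _ in range(B):
        Xb = X[rng.integers(0, n, n)]     # resample rows with replacement
        bics = [GaussianMixture(k, n_init=3, random_state=0).fit(Xb).bic(Xb)
                for k in range(1, k_max + 1)]
        chosen.append(int(np.argmin(bics)) + 1)
    return np.bincount(chosen, minlength=k_max + 1)[1:]   # counts for k = 1..k_max

# usage: bootstrap_class_enumeration(data_matrix) shows how unstable the selected k can be
```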


20.
I. Introduction. Statistical data originate from primary-level survey units, and checking and auditing the raw data they report is one of the important measures for improving the quality of statistical data. Such checking and auditing should cover two aspects: (1) assessing the quality of the raw data reported by primary survey units, so as to gain an overall understanding of data quality and, in particular, a quantitative statement about its reliability and accuracy; (2) identifying and correcting anomalous points in the raw data. An anomalous point, or outlier, corresponds to an observation with a large error; here it refers mainly to inaccurate figures in socioeconomic and science-and-technology statistics. Outliers in the data reported by primary survey units, arising from various technical or non-technical causes, inevitably…
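Since the article's own procedure is cut off above, here is only a generic robust screening sketch (median/MAD-based modified z-scores) for flagging suspicious reported values; it is not the method the article develops.

```python
import numpy as np

def flag_outliers(reported, threshold=3.5):
    """Flag reported values whose modified z-score, based on the median and
    the median absolute deviation (MAD), exceeds the threshold. A robust
    screen that is not distorted by the suspicious values themselves."""
    x = np.asarray(reported, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return np.zeros(len(x), dtype=bool)
    z = 0.6745 * (x - med) / mad
    return np.abs(z) > threshold

# usage: flag_outliers(reported_output_values) marks candidates for follow-up with the reporting unit
```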

