Similar Documents
20 similar documents found (search time: 15 ms)
1.
In this paper we analyse a real e-learning dataset derived from the e-learning platform of the University of Pavia. The dataset concerns an online learning environment with in-depth teaching materials. The main aims of the paper are to measure the relative importance of the exercises (tests) at the end of each training unit, to build predictive models of students' performance and, finally, to personalize the e-learning platform. The methodology is based on nonparametric kernel density estimation, with generalized linear models and generalized additive models used for prediction.

2.
Statistical Data, Statistical Security, and the Rule of Law in Statistics
李金昌. 《统计研究》 (Statistical Research), 2009, 26(8): 45-49
Against the backdrop of the implementation of the Regulations on Penalties for Statistical Violations of Law and Discipline (《统计违法违纪行为处分规定》), and taking the quality of statistical data as its starting point, this paper discusses the rule of law in statistics. It examines the relationships between statistical data quality and national statistical security, and between the nature of statistics and the statistical rule of law, and explores ways of achieving the rule of law in statistical work.

3.
Summary. Geologists take slices through rock samples at a series of different orientations. Each slice is examined for a phenomenon which may occur in one of two states labelled clockwise and anticlockwise. The probability of an occurrence being in the clockwise state is dependent on the orientation. Motivated by these data, two models are presented that relate the probability of an event to orientation. Each model has two parameters, one identifying the orientation corresponding to the peak probability, and the other controlling the rate of change of probability with orientation. One model is a logistic model, whereas the other involves a power of the sine function. For the given data neither model consistently outperforms the other.

4.
Random events such as a production machine breakdown in a manufacturing plant, an equipment failure within a transportation system, a security failure of an information system, or any number of other problems may cause supply chain disruption. Although several researchers have studied supply chain disruptions, discussed measures that companies should use to design better supply chains, or examined ways firms can mitigate the consequences of a disruption, an appropriate method for predicting the time to disruptive events is still lacking. Motivated by this need, this paper introduces statistical flowgraph models (SFGMs) for survival analysis in supply chains. SFGMs provide an innovative approach to analyzing time-to-event data, which focuses on modeling waiting times until events of interest occur. SFGMs are useful for reducing multistate models into an equivalent binary-state model. Analysis from an SFGM gives the entire waiting time distribution as well as the system reliability (survivor) and hazard functions for any total or partial waiting time. The end results from an SFGM help to identify the supply chain's strengths and, more importantly, its weaknesses, and thus provide valuable decision support for supply chain managers in predicting supply chain behavior. Examples presented in this paper demonstrate the applicability of SFGMs to survival analysis in supply chains.

5.
Data input errors can potentially affect statistical inferences, but little research has been published to date on this topic. In the present paper, we report the effect of data input errors on the statistical inferences drawn about population parameters in an empirical study involving 280 students from two Polish universities, namely the Warsaw University of Life Sciences – SGGW and the University of Information Technology and Management in Rzeszow. We found that 28% of the students committed at least one data error. While some of these errors were small and did not have any real effect, a few of them had substantial effects on the statistical inferences drawn about the population parameters.

6.
The solution of the generalized symmetric eigenproblem Ax = λBx is required in many multivariate statistical models, viz. canonical correlation analysis, discriminant analysis, the multivariate linear model and limited-information maximum likelihood. The problem can be solved by two efficient numerical algorithms: Cholesky decomposition or singular value decomposition. Practical considerations for implementation are also discussed.
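For illustration, here is a minimal sketch (not taken from the paper) of the Cholesky route for Ax = λBx using NumPy/SciPy. The matrices A and B are randomly generated placeholders, and SciPy's direct generalized solver is used only as a cross-check.

```python
# Sketch: generalized symmetric eigenproblem A x = lambda B x via Cholesky.
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

rng = np.random.default_rng(0)
p = 4
M = rng.standard_normal((p, p))
A = (M + M.T) / 2                         # hypothetical symmetric A
N = rng.standard_normal((p, p))
B = N @ N.T + p * np.eye(p)               # hypothetical symmetric positive-definite B

# B = L L'; reduce to the ordinary symmetric problem C y = lambda y with
# C = L^{-1} A L^{-T}, then recover x = L^{-T} y.
L = cholesky(B, lower=True)
Z = solve_triangular(L, A, lower=True)                 # L^{-1} A
C = solve_triangular(L, Z.T, lower=True).T             # (L^{-1} A) L^{-T}
evals, Y = np.linalg.eigh((C + C.T) / 2)               # symmetrize against round-off
X = solve_triangular(L.T, Y, lower=False)              # eigenvectors of the original problem

# Cross-check against SciPy's direct generalized solver
evals_ref = eigh(A, B, eigvals_only=True)
print(np.allclose(np.sort(evals), np.sort(evals_ref)))
```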

7.
In this paper we address the problem of protecting confidentiality in statistical tables containing sensitive information that cannot be disseminated. This is an issue of primary importance in practice. Cell suppression is a widely used technique for avoiding disclosure of sensitive information, which consists in suppressing all sensitive table entries along with a certain number of other entries, called complementary suppressions. Determining a pattern of complementary suppressions that minimizes the overall loss of information results in a difficult (i.e., NP-hard) optimization problem known as the Cell Suppression Problem. We propose here a different protection methodology consisting of replacing some table entries by appropriate intervals containing the actual value of the unpublished cells. We call this methodology partial cell suppression, as opposed to classical complete cell suppression. Partial cell suppression has the important advantage of reducing the overall information loss needed to protect the sensitive information. The new method also automatically provides auditing ranges for each unpublished cell, thus saving the statistical office an often time-consuming task while increasing the information explicitly provided with the table. Moreover, we propose an efficient (i.e., polynomial-time) algorithm to find an optimal partial suppression solution. A preliminary computational comparison between the partial and complete suppression methodologies is reported, showing the advantages of the new approach. Finally, we address possible extensions leading to a unified complete/partial cell suppression framework.

8.
A multicollinearity diagnostic is discussed for parametric models fit to censored data. The models considered include the Weibull, exponential and lognormal models as well as the Cox proportional hazards model. The diagnostic is an extension of the one proposed by Belsley, Kuh, and Welsch (1980) and is based on the condition indices and variance proportions of the variance-covariance matrix. Its use and properties are studied through a series of examples. The effect of centering the variables included in the model is also discussed.
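As a rough illustration of the underlying Belsley-Kuh-Welsch quantities (not the paper's censored-data extension), the sketch below computes condition indices and variance-decomposition proportions from a column-scaled design matrix; the near-collinear covariates are hypothetical.

```python
# Sketch: Belsley-Kuh-Welsch condition indices and variance proportions.
import numpy as np

def bkw_diagnostics(X):
    """Return condition indices and variance proportions (rows: components)."""
    Xs = X / np.linalg.norm(X, axis=0)          # scale columns to unit length
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = d.max() / d                      # condition indices
    phi = (Vt.T ** 2) / d ** 2                  # v_jk^2 / d_k^2, shape (p, p)
    props = (phi / phi.sum(axis=1, keepdims=True)).T
    return cond_idx, props

rng = np.random.default_rng(1)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)         # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
ci, vp = bkw_diagnostics(X)
print(np.round(ci, 1))                          # a large condition index flags collinearity
print(np.round(vp, 2))                          # high proportions on that index localize it
```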

9.
There are no practical and effective mechanisms for sharing high-dimensional data that include sensitive information in fields such as health, financial intelligence or socioeconomics without either compromising the utility of the data or exposing private personal or secure organizational information. Excessive scrambling or encoding of the information makes it less useful for modelling or analytical processing, while insufficient preprocessing may compromise sensitive information and introduce a substantial risk of re-identification of individuals through various stratification techniques. To address this problem, we developed a novel statistical obfuscation method (DataSifter) for on-the-fly de-identification of structured and unstructured sensitive high-dimensional data, such as clinical data from electronic health records (EHR). DataSifter provides complete administrative control over the balance between the risk of data re-identification and the preservation of the data's information content. Simulation results suggest that DataSifter can provide privacy protection while maintaining data utility for different types of outcomes of interest. The application of DataSifter to a large autism dataset provides a realistic demonstration of its promise for practical applications.

10.
In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that can be correlated. Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial distributions are considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. We then extend this model so that correlated counts may follow different distributions. To accommodate correlation among counts, we include correlated random effects for each individual in the mean structure, thus inducing dependence among observations from the same individual. The method is applied to real data to investigate variation in food resource use by a marsupial species in a locality of the Brazilian Cerrado biome.
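As a minimal illustration of the zero-inflation idea named in this abstract (not the authors' multivariate random-effects model), the sketch below evaluates the zero-inflated Poisson pmf, mixing a point mass at zero with a Poisson count component; the values of pi and lam are hypothetical.

```python
# Sketch: zero-inflated Poisson pmf.
import numpy as np
from scipy.stats import poisson

def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson(lam) with zero-inflation probability pi."""
    pmf = (1.0 - pi) * poisson.pmf(k, lam)
    return np.where(k == 0, pi + pmf, pmf)

k = np.arange(0, 11)
print(np.round(zip_pmf(k, lam=2.5, pi=0.3), 3))   # note the inflated mass at k = 0
print(zip_pmf(k, lam=2.5, pi=0.3).sum())          # close to 1 (support truncated at 10)
```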

11.
Real-time polymerase chain reaction (PCR) is a reliable quantitative technique in gene expression studies, and the statistical analysis of real-time PCR data is crucial for interpreting and explaining the results. Statistical procedures for analyzing real-time PCR data determine the slope of the regression line and calculate the reaction efficiency, and mathematical functions are used to calculate the expression of the target gene relative to the reference gene(s). These techniques also compare Ct (threshold cycle) numbers between the control and treatment groups. There are many different procedures in SAS for evaluating real-time PCR data. In this study, the efficiency-calibrated model and the delta delta Ct model were statistically tested and explained. Several methods were tested to compare control and treatment means of Ct, including the t-test (parametric), the Wilcoxon test (non-parametric) and multiple regression. The results showed that the applied methods led to similar conclusions, with no significant difference observed between gene expression measurements obtained by the relative method.
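The following is a minimal sketch of the delta delta Ct calculation and the group comparisons mentioned above, written in Python rather than SAS; all Ct values and group sizes are hypothetical.

```python
# Sketch: 2^(-delta delta Ct) relative expression and control vs. treatment tests.
import numpy as np
from scipy import stats

# Hypothetical threshold-cycle (Ct) values for target and reference genes
ct_target_ctrl = np.array([24.1, 24.3, 23.9, 24.2])
ct_ref_ctrl    = np.array([18.0, 18.1, 17.9, 18.0])
ct_target_trt  = np.array([22.0, 22.4, 21.8, 22.1])
ct_ref_trt     = np.array([18.1, 18.0, 18.2, 17.9])

# Delta Ct (target minus reference), then delta delta Ct vs. the control mean
d_ctrl = ct_target_ctrl - ct_ref_ctrl
d_trt  = ct_target_trt - ct_ref_trt
ddct   = d_trt - d_ctrl.mean()
fold_change = 2.0 ** (-ddct)                  # relative expression per treated sample
print(fold_change.mean())

# Compare delta Ct between groups: parametric and non-parametric tests
print(stats.ttest_ind(d_ctrl, d_trt))         # t-test
print(stats.ranksums(d_ctrl, d_trt))          # Wilcoxon rank-sum test
```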

12.
In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions) and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow ‘error bars’ to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, that concentrates on giving a single answer and gives up on uncertainty quantification, could be considered as Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects.

13.
We study application of the Exponential Tilt Model (ETM) to compare survival distributions in two groups. The ETM assumes a parametric form for the density ratio of the two distributions. It accommodates a broad array of parametric models such as the log-normal and gamma models and can be sufficiently flexible to allow for crossing hazard and crossing survival functions. We develop a nonparametric likelihood approach to estimate ETM parameters in the presence of censoring and establish related asymptotic results. We compare the ETM to the Proportional Hazards Model (PHM) in simulation studies. When the proportional hazards assumption is not satisfied but the ETM assumption is, the ETM has better power for testing the hypothesis of no difference between the two groups. And, importantly, when the ETM relation is not satisfied but the PHM assumption is, the ETM can still have power reasonably close to that of the PHM. Application of the ETM is illustrated by a gastrointestinal tumor study.
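For context, a common way to write the exponential tilt (density ratio) assumption is sketched below; this is a generic formulation and not necessarily the exact specification or choice of tilt function used in the paper.

```latex
% Generic exponential tilt / density ratio sketch: g and f are the densities
% of the two groups and q(t) is a chosen tilt function, e.g. q(t) = t or \log t.
\frac{g(t)}{f(t)} = \exp\{\alpha + \beta^{\top} q(t)\}
```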

14.
On identifiability of parametric statistical models
Summary. This is a review article on statistical identifiability. Besides defining the main concepts, we deal with several questions relevant to the statistician: the parallel between parametric identifiability and sample sufficiency; the relationship of identifiability with measures of sample information and with the inferential concept of estimability; and several strategies for making inferences in unidentifiable models, with emphasis on the distinct behaviour of the classical and Bayesian approaches. The concepts, ideas and methods discussed are illustrated with simple examples of statistical models.

15.
16.
The author develops a robust quasi‐likelihood method, which appears to be useful for down‐weighting any influential data points when estimating the model parameters. He illustrates the computational issues of the method in an example. He uses simulations to study the behaviour of the robust estimates when data are contaminated with outliers, and he compares these estimates to those obtained by the ordinary quasi‐likelihood method.

17.
I exploit the potential of latent class models to propose an innovative framework for financial data analysis. By stressing the latent nature of the most important financial variables, expected return and risk, I introduce a new methodological dimension in the analysis of financial phenomena. In my proposal, (i) I provide innovative measures of expected return and risk, (ii) I suggest a classification of financial data consistent with the latent risk-return profile, and (iii) I propose a set of statistical methods for detecting and testing the number of groups in the new classification. The results lead to an improvement in both the theory and practice of risk measurement and, compared with traditional methods, allow for new insights into the analysis of financial data. Finally, I illustrate the potential of my proposal by investigating the European stock market and detailing the steps for the appropriate choice of a financial portfolio.

18.
Summary. The paper first provides a short review of the most common microeconometric models, including logit, probit, discrete choice, duration models, models for count data and Tobit-type models. In the second part we consider the situation in which the micro data have undergone some anonymization procedure, which has become an important issue since otherwise confidentiality would not be guaranteed. We briefly describe the most important approaches to data protection, which can also be seen as deliberately introducing measurement error, and we consider the possibility of correcting the estimation procedure to take the anonymization procedure into account. We illustrate this for the case of binary data that are anonymized by ‘post-randomization’ and used in a probit model. We show the effect of ‘naive’ estimation, i.e. estimation that disregards the anonymization procedure, and we show that a ‘corrected’ estimate is available which is satisfactory in statistical terms; this remains true if parameters of the anonymization procedure also have to be estimated. Research in this paper is related to the project “Faktische Anonymisierung wirtschaftsstatistischer Einzeldaten” financed by the German Ministry of Research and Technology.

19.
This paper considers the optimal design problem for multivariate mixed-effects logistic models with longitudinal data. A decomposition method of the binary outcome and the penalized quasi-likelihood are used to obtain the information matrix. The D-optimality criterion based on the approximate information matrix is minimized under different cost constraints. The results show that the autocorrelation coefficient plays a significant role in the design. To overcome the dependence of the D-optimal designs on the unknown fixed-effects parameters, the Bayesian D-optimality criterion is proposed. The relative efficiencies of designs reveal that both the cost ratio and autocorrelation coefficient play an important role in the optimal designs.
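As a toy illustration of the D-criterion itself (a fixed-effects, single-covariate logistic model, far simpler than the paper's mixed-effects longitudinal setting), the sketch below compares two hypothetical candidate designs by the log determinant of the Fisher information; the assumed parameter values are placeholders.

```python
# Sketch: D-criterion comparison of two candidate designs for logistic regression.
import numpy as np

def logistic_information(design_points, beta):
    """Fisher information M = sum_x w(x) f(x) f(x)', with w = p(1 - p)."""
    M = np.zeros((len(beta), len(beta)))
    for x in design_points:
        f = np.array([1.0, x])                 # intercept + single covariate
        p = 1.0 / (1.0 + np.exp(-(f @ beta)))
        M += p * (1.0 - p) * np.outer(f, f)
    return M

beta_guess = np.array([0.5, -1.0])             # assumed (local) parameter values
design_a = [-2.0, 0.0, 2.0]                    # hypothetical candidate design A
design_b = [-1.0, 1.0]                         # hypothetical candidate design B

for name, d in [("A", design_a), ("B", design_b)]:
    sign, logdet = np.linalg.slogdet(logistic_information(d, beta_guess))
    print(name, logdet)                        # larger log det = more D-efficient
```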

20.
The intention of this article is to highlight sources of web‐based reference material, courses and software that will aid statisticians and researchers. The article includes websites that: assist in writing a protocol or proposal; link to online statistical textbooks; and provide statistical calculators or links to free statistical software and other guidance documents. Copyright © 2005 John Wiley & Sons, Ltd.
