期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Erste Ergebnisse faktischer Anonymisierung wirtschaftsstatistischer Einzeldaten

Martin?Rosemann Email author Daniel?Vorgrimler Rainer?Lenz 《AStA Advances in Statistical Analysis》2004,88(1):73-99

相似文献

2.

Disclosure Risk and Disclosure Avoidance for Microdata

Gerhard Paass 《商业与经济统计学杂志》2013,31(4):487-500

Under given concrete exogenous conditions, the fraction of identifiable records in a microdata file without positive identifiers such as name and address is estimated. The effect of possible noise in the data, as well as the sample property of microdata files, is taken into account. Using real microdata files, it is shown that there is no risk of disclosure if the information content of characteristics known to the investigator (additional knowledge) is limited. Files with additional knowledge of large information content yield a high risk of disclosure. This can be eliminated only by massive modifications of the data records, which, however, involve large biases for complex statistical evaluations. In this case, the requirement for privacy protection and high-quality data perhaps may be fulfilled only if the linkage of such files with extensive additional knowledge is prevented by appropriate organizational and legal restrictions. 相似文献

3.

On estimation of frequency data with censored observations

Shailendra S. Menjoge 《Pharmaceutical statistics》2003,2(3):191-197

The problem of the estimation of mean frequency of events in the presence of censoring is important in assessing the efficacy, safety and cost of therapies. The mean frequency is typically estimated by dividing the total number of events by the total number of patients under study. This method, referred to in this paper as the ‘naïve estimator’, ignores the censoring. Other approaches available for this problem require many assumptions that are rarely acceptable. These include the assumption of independence, constant hazard rate over time and other similar distributional assumptions. In this paper a simple non‐parametric estimator based on the sum of the products of Kaplan–Meier estimators is proposed as an estimator of mean frequency, and its approximate variance and standard error are derived. An illustration is provided to show the derivation of the proposed estimator. Although the clinical trial setting is used in this paper, the problem has applications in other areas where survival analysis is used and recurrent events are studied. Copyright © 2003 John Wiley & Sons, Ltd. 相似文献

4.

Likelihood inference for exchangeable continuous data with covariates and varying cluster sizes; use of the Farlie–Gumbel–Morgenstern model

Catalina Stefanescu Bruce W. Turnbull 《Statistical Methodology》2009,6(5):503-512

This article investigates the Farlie–Gumbel–Morgenstern class of models for exchangeable continuous data. We show how the model specification can account for both individual and cluster level covariates, we derive insights from comparisons with the multivariate normal distribution, and we discuss maximum likelihood inference when a sample of independent clusters of varying sizes is available. We propose a method for maximum likelihood estimation which is an alternative to direct numerical maximization of the likelihood that sometimes exhibits non-convergence problems. We describe an algorithm for generating samples from the exchangeable multivariate Farlie–Gumbel–Morgenstern distribution with any marginals, using the structural properties of the distribution. Finally, we present the results of a simulation study designed to assess the properties of the maximum likelihood estimators, and we illustrate the use of the FGM distributions with the analysis of a small data set from a developmental toxicity study. 相似文献

5.

Estimating the mean life time using right censored data

Somnath Datta 《Statistical Methodology》2005,2(1):65-69

Estimation of the population mean based on right censored observations is considered. The naive sample mean will be an inconsistent and asymptotically biased estimator in this case. An estimate suggested in textbooks is to compute the area under a Kaplan–Meier curve. In this note, two more seemingly different approaches are introduced. Students’ reaction to these approaches was very positive in an introductory survival analysis course the author recently taught. 相似文献

6.

统计数据质量评估方法研究述评 总被引：3，自引：3，他引：3

许涤龙叶少波《统计与信息论坛》2011,26(7):3-14

数据质量评估是统计数据质量管理的重要环节。统计数据质量的评估方法有逻辑关系检验法、计量模型分析法、核算数据重估法、统计分布检验法、调查误差评估法以及多维评估法六个类别,在详细讨论其各自的评估过程、适用性及其优缺点的基础上,按照评估维度和评估方式对各种统计数据质量的评估方法进行再归类,指出评估方法的进一步研究应该围绕计量模型分析法、核算数据重估法中的物量指数重估法、调查误差评估法和多维评估法等方向展开。相似文献

7.

The effect of conditional heteroskedasticity on common statistical procedures for means and variances

Hendrik Kläver Friedrich Schmid 《Allgemeines Statistisches Archiv》2004,88(4):397-407

Summary: Commonly used standard statistical procedures for means and variances (such as the t–test for means or the F–test for variances and related confidence procedures) require observations from independent and identically normally distributed variables. These procedures are often routinely applied to financial data, such as asset or currency returns, which do not share these properties. Instead, they are nonnormal and show conditional heteroskedasticity, hence they are dependent. We investigate the effect of conditional heteroskedasticity (as modelled by GARCH(1,1)) on the level of these tests and the coverage probability of the related confidence procedures. It can be seen that conditional heteroskedasticity has no effect on procedures for means (at least in large samples). There is, however, a strong effect of conditional heteroskedasticity on procedures for variances. These procedures should therefore not be used if conditional heteroskedasticity is prevalent in the data.*We are grateful to the referees for their useful and constructive comments. 相似文献

8.

The Ubiquity of Statistics

William Kruskal 《The American statistician》2013,67(1):3-6

The use of biased estimation in data analysis and model building is discussed. A review of the theory of ridge regression and its relation to generalized inverse regression is presented along with the results of a simulation experiment and three examples of the use of ridge regression in practice. Comments on variable selection procedures, model validation, and ridge and generalized inverse regression computation procedures are included. The examples studied here show that when the predictor variables are highly correlated, ridge regression produces coefficients which predict and extrapolate better than least squares and is a safe procedure for selecting variables. 相似文献

9.

Discrete distributions in the extended FGM family: The p.g.f. approach

Violetta E. Piperigou 《Journal of statistical planning and inference》2009,139(11):3891

In this article the probability generating functions of the extended Farlie–Gumbel–Morgenstern family for discrete distributions are derived. Using the probability generating function approach various properties are examined, the expressions for probabilities, moments, and the form of the conditional distributions are obtained. Bivariate version of the geometric and Poisson distributions are used as illustrative examples. Their covariance structure and estimation of parameters for a data set are briefly discussed. A new copula is also introduced. 相似文献

10.

A maximum-likelihood estimator with infinite error

Gunnar Taraldsen 《Journal of statistical planning and inference》2010,140(1):1473

The standard error of the maximum-likelihood estimator for 1/μ based on a random sample of size N from the normal distribution N(μ,σ²) is infinite. This could be considered to be a disadvantage.Another disadvantage is that the bias of the estimator is undefined if the integral is interpreted in the usual sense as a Lebesgue integral. It is shown here that the integral expression for the bias can be interpreted in the sense given by the Schwartz theory of generalized functions. Furthermore, an explicit closed form expression in terms of the complex error function is derived. It is also proven that unbiased estimation of 1/μ is impossible.Further results on the maximum-likelihood estimator are investigated, including closed form expressions for the generalized moments and corresponding complete asymptotic expansions. It is observed that the problem can be reduced to a one-parameter problem depending only on , and this holds also for more general location-scale problems. The parameter can be interpreted as a shape parameter for the distribution of the maximum-likelihood estimator.An alternative estimator is suggested motivated by the asymptotic expansion for the bias, and it is argued that the suggested estimator is an improvement. The method used for the construction of the estimator is simple and generalizes to other parametric families.The problem leads to a rediscovery of a generalized mathematical expectation introduced originally by Kolmogorov [1933. Foundations of the Theory of Probability, second ed. Chelsea Publishing Company (1956)]. A brief discussion of this, and some related integrals, is provided. It is in particular argued that the principal value expectation provides a reasonable location parameter in cases where it exists. This does not hold generally for expectations interpreted in the sense given by the Schwartz theory of generalized functions. 相似文献

11.

Non‐parametric Bayesian Hazard Regression for Chronic Disease Risk Assessment

下载免费PDF全文

Olli Saarela Elja Arjas 《Scandinavian Journal of Statistics》2015,42(2):609-626

Assessing the absolute risk for a future disease event in presently healthy individuals has an important role in the primary prevention of cardiovascular diseases (CVD) and other chronic conditions. In this paper, we study the use of non‐parametric Bayesian hazard regression techniques and posterior predictive inferences in the risk assessment task. We generalize our previously published Bayesian multivariate monotonic regression procedure to a survival analysis setting, combined with a computationally efficient estimation procedure utilizing case–base sampling. To achieve parsimony in the model fit, we allow for multidimensional relationships within specified subsets of risk factors, determined either on a priori basis or as a part of the estimation procedure. We apply the proposed methods for 10‐year CVD risk assessment in a Finnish population. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics 相似文献

12.

Some solutions to the multivariate Behrens–Fisher problem for dissimilarity‐based analyses

Marti J. Anderson Daniel C. I. Walsh K. Robert Clarke Ray N. Gorley Edlin Guerra‐Castro 《Australian & New Zealand Journal of Statistics》2017,59(1):57-79

The essence of the generalised multivariate Behrens–Fisher problem (BFP) is how to test the null hypothesis of equality of mean vectors for two or more populations when their dispersion matrices differ. Solutions to the BFP usually assume variables are multivariate normal and do not handle high‐dimensional data. In ecology, species' count data are often high‐dimensional, non‐normal and heterogeneous. Also, interest lies in analysing compositional dissimilarities among whole communities in non‐Euclidean (semi‐metric or non‐metric) multivariate space. Hence, dissimilarity‐based tests by permutation (e.g., PERMANOVA, ANOSIM) are used to detect differences among groups of multivariate samples. Such tests are not robust, however, to heterogeneity of dispersions in the space of the chosen dissimilarity measure, most conspicuously for unbalanced designs. Here, we propose a modification to the PERMANOVA test statistic, coupled with either permutation or bootstrap resampling methods, as a solution to the BFP for dissimilarity‐based tests. Empirical simulations demonstrate that the type I error remains close to nominal significance levels under classical scenarios known to cause problems for the un‐modified test. Furthermore, the permutation approach is found to be more powerful than the (more conservative) bootstrap for detecting changes in community structure for real ecological datasets. The utility of the approach is shown through analysis of 809 species of benthic soft‐sediment invertebrates from 101 sites in five areas spanning 1960 km along the Norwegian continental shelf, based on the Jaccard dissimilarity measure. 相似文献

13.

The Racing Car Problem

Aaron Tenenbein 《The American statistician》2013,67(1):38-40

An approach to teaching linear regression with unbalanced data is outlined that emphasizes its role as a method of adjustment for associated regressors. The method is introduced via direct standardization, a simple form of regression for categorical regressors. Properties of regression in the presence of association and interaction are emphasized. Least squares is introduced as a more efficient way of calculating adjusted effects for which exact decompositions of the variance are possible. Interval-scaled regressors are initially grouped and treated as categorical; polynomial regression and analysis of covariance can be introduced later as alternative methods. 相似文献

14.

Dynamic functional data analysis with non-parametric state space models

Márcio Poletti Laurini 《Journal of applied statistics》2014,41(1):142-163

In this article, we introduce a new method for modelling curves with dynamic structures, using a non-parametric approach formulated as a state space model. The non-parametric approach is based on the use of penalised splines, represented as a dynamic mixed model. This formulation can capture the dynamic evolution of curves using a limited number of latent factors, allowing an accurate fit with a small number of parameters. We also present a new method to determine the optimal smoothing parameter through an adaptive procedure, using a formulation analogous to a model of stochastic volatility (SV). The non-parametric state space model allows unifying different methods applied to data with a functional structure in finance. We present the advantages and limitations of this method through simulation studies and also by comparing its predictive performance with other parametric and non-parametric methods used in financial applications using data on the term structure of interest rates. 相似文献

15.

Simulation assessments of statistical aspects of bioequivalence in the pharmaceutical industry

Scott D. Patterson Byron Jones 《Pharmaceutical statistics》2004,3(1):13-23

An Erratum has been published for this article in Pharmaceutical Statistics 2004; 3(3): 232 Since the early 1990s, average bioequivalence (ABE) has served as the international standard for demonstrating that two formulations of drug product will provide the same therapeutic benefit and safety profile. Population (PBE) and individual (IBE) bioequivalence have been the subject of intense international debate since methods for their assessment were proposed in the late 1980s. Guidance has been proposed by the Food and Drug Administration (FDA) for the implementation of these techniques in the pioneer and generic pharmaceutical industries. Hitherto no consensus among regulators, academia and industry has been established on the use of the IBE and PBE metrics. The need for more stringent bioequivalence criteria has not been demonstrated, and it is known that the PBE and IBE criteria proposed by the FDA are actually less stringent under certain conditions. The statistical properties of method of moments and restricted maximum likelihood modelling in replicate designs will be summarized, and the application of these techniques in the assessment of ABE, IBE and PBE will be considered based on a database of 51 replicate design studies and using simulation. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献

16.

Developments in sample survey theory: An appraisal

J.N.K. Rao 《Revue canadienne de statistique》1997,25(1):1-21

Recent developments in sample survey theory include the following topics: foundational aspects of inference, resampling methods for variance and confidence interval estimation, imputation for nonresponse and analysis of complex survey data. An overview and appraisal of some of these developments are presented. 相似文献

17.

Devan V. Mehrotra 《Pharmaceutical statistics》2014,13(6):376-387

In many two‐period, two‐treatment (2 × 2) crossover trials, for each subject, a continuous response of interest is measured before and after administration of the assigned treatment within each period. The resulting data are typically used to test a null hypothesis involving the true difference in treatment response means. We show that the power achieved by different statistical approaches is greatly influenced by (i) the ‘structure’ of the variance–covariance matrix of the vector of within‐subject responses and (ii) how the baseline (i.e., pre‐treatment) responses are accounted for in the analysis. For (ii), we compare different approaches including ignoring one or both period baselines, using a common change from baseline analysis (which we advise against), using functions of one or both baselines as period‐specific or period‐invariant covariates, and doing joint modeling of the post‐baseline and baseline responses with corresponding mean constraints for the latter. Based on theoretical arguments and simulation‐based type I error rate and power properties, we recommend an analysis of covariance approach that uses the within‐subject difference in treatment responses as the dependent variable and the corresponding difference in baseline responses as a covariate. Data from three clinical trials are used to illustrate the main points. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

18.

Sensitivity analysis of longitudinal count responses: a local influence approach and application to medical data

Alejandra Tapia Viviana Giampaoli Maria del Pilar Diaz 《Journal of applied statistics》2019,46(6):1021-1042

Longitudinal count responses are often analyzed with a Poisson mixed model. However, under overdispersion, these responses are better described by a negative binomial mixed model. Estimators of the corresponding parameters are usually obtained by the maximum likelihood method. To investigate the stability of these maximum likelihood estimators, we propose a methodology of sensitivity analysis using local influence. As count responses are discrete, we are unable to perturb them with the standard scheme used in local influence. Then, we consider an appropriate perturbation for the means of these responses. The proposed methodology is useful in different applications, but particularly when medical data are analyzed, because the removal of influential cases can change the statistical results and then the medical decision. We study the performance of the methodology by using Monte Carlo simulation and applied it to real medical data related to epilepsy and headache. All of these numerical studies show the good performance and potential of the proposed methodology. 相似文献

19.

Asymptotically refined score and GOF tests for inverse Gaussian models

《Journal of Statistical Computation and Simulation》2012,82(16):3243-3269

ABSTRACT

The score test and the GOF test for the inverse Gaussian distribution, in particular the latter, are known to have large size distortion and hence unreliable power when referring to the asymptotic critical values. We show in this paper that with the appropriately bootstrapped critical values, these tests become second-order accurate, with size distortion being essentially eliminated and power more reliable. Two major generalizations of the score test are made: one is to allow the data to be right-censored, and the other is to allow the existence of covariate effects. A data mapping method is introduced for the bootstrap to be able to produce censored data that are conformable with the null model. Monte Carlo results clearly favour the proposed bootstrap tests. Real data illustrations are given. 相似文献

20.

Maximum likelihood estimation for outcome‐dependent samples

Robert Graham Clark 《Australian & New Zealand Journal of Statistics》2020,62(1):49-70

In outcome‐dependent sampling, the continuous or binary outcome variable in a regression model is available in advance to guide selection of a sample on which explanatory variables are then measured. Selection probabilities may either be a smooth function of the outcome variable or be based on a stratification of the outcome. In many cases, only data from the final sample is accessible to the analyst. A maximum likelihood approach for this data configuration is developed here for the first time. The likelihood for fully general outcome‐dependent designs is stated, then the special case of Poisson sampling is examined in more detail. The maximum likelihood estimator differs from the well‐known maximum sample likelihood estimator, and an information bound result shows that the former is asymptotically more efficient. A simulation study suggests that the efficiency difference is generally small. Maximum sample likelihood estimation is therefore recommended in practice when only sample data is available. Some new smooth sample designs show considerable promise. 相似文献