Similar Literature
20 similar records found.
1.
The paper first provides a short review of the most common microeconometric models, including logit, probit, discrete choice, duration models, models for count data and Tobit-type models. In the second part we consider the situation where the micro data have undergone some anonymization procedure, an issue that has become important because confidentiality could not otherwise be guaranteed. We briefly describe the most important approaches to data protection, which can also be viewed as deliberately introducing measurement error. We also consider the possibility of correcting the estimation procedure to take the anonymization procedure into account. We illustrate this for binary data that are anonymized by ‘post-randomization’ and used in a probit model. We show the effect of ‘naive’ estimation, i.e., estimation that disregards the anonymization procedure. We also show that a ‘corrected’ estimator is available and is satisfactory in statistical terms, even when the parameters of the anonymization procedure must themselves be estimated. Research in this paper is related to the project “Faktische Anonymisierung wirtschaftsstatistischer Einzeldaten”, financed by the German Ministry of Research and Technology.
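As a rough illustration of the correction discussed in this abstract, the following sketch compares a ‘naive’ probit fit with a corrected fit for binary data anonymized by post-randomization. It is not the authors' code: the transition probabilities are assumed known, and the data are simulated.

```python
# Minimal sketch (assumption: PRAM transition probabilities p11, p00 known).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
beta_true = np.array([-0.5, 1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

# PRAM step: keep the true value with probability p11 (for 1s) / p00 (for 0s)
p11, p00 = 0.9, 0.9
keep = rng.random(n) < np.where(y == 1.0, p11, p00)
y_star = np.where(keep, y, 1.0 - y)

def negloglik(beta, corrected):
    p = norm.cdf(X @ beta)                   # P(y = 1 | x)
    if corrected:                            # P(y* = 1 | x) under PRAM
        p = p11 * p + (1.0 - p00) * (1.0 - p)
    p = np.clip(p, 1e-10, 1 - 1e-10)
    return -np.sum(y_star * np.log(p) + (1 - y_star) * np.log(1 - p))

for corrected in (False, True):
    fit = minimize(negloglik, x0=np.zeros(2), args=(corrected,))
    label = 'corrected' if corrected else 'naive'
    print(f'{label:9s} beta estimate:', np.round(fit.x, 3))
```

The naive fit is attenuated toward zero; the corrected likelihood recovers coefficients close to the truth.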

2.
This paper considers model selection and forecasting issues in two closely related models for nonstationary periodic autoregressive time series (PAR). Periodically integrated seasonal time series (PIAR) need a periodic differencing filter to remove the stochastic trend. On the other hand, when the nonperiodic first-order differencing filter can be applied, one obtains a periodic model with a nonseasonal unit root (PARI). In this paper, we discuss and evaluate two testing strategies for selecting between these two models. Furthermore, we compare the relative forecasting performance of each model using Monte Carlo simulations and several U.K. macroeconomic seasonal time series. One result is that forecasting with PARI models when the data generating process is a PIAR process appears to be worse than vice versa.
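The two differencing filters at issue can be illustrated in a few lines. The sketch below (an illustration, not the paper's testing procedure) simulates a quarterly PIAR process and shows that the periodic filter, unlike the ordinary first difference, reduces it to white noise; the seasonal coefficients satisfy alpha_1 * alpha_2 * alpha_3 * alpha_4 = 1.

```python
# Minimal sketch: periodic differencing (PIAR) vs first differencing (PARI).
import numpy as np

rng = np.random.default_rng(1)
alphas = np.array([1.25, 0.8, 1.25, 0.8])   # product equals 1 (PIAR condition)
assert np.isclose(alphas.prod(), 1.0)

T = 200
y = np.zeros(T)
for t in range(1, T):
    y[t] = alphas[t % 4] * y[t - 1] + rng.normal()

periodic_diff = np.array([y[t] - alphas[t % 4] * y[t - 1] for t in range(1, T)])
first_diff = np.diff(y)

# Under the PIAR data generating process the periodic filter leaves white
# noise, while the ordinary first difference does not.
print('var(periodic diff):', periodic_diff.var())
print('var(first diff)   :', first_diff.var())
```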

3.
In many studies a large number of variables is measured, and identifying the relevant variables influencing an outcome is an important task. Several procedures are available for variable selection. However, focusing on a single model neglects the fact that other, equally appropriate models usually exist. Bayesian and frequentist model averaging approaches have been proposed to improve the development of a predictor. With a larger number of variables (say, more than ten) the resulting class of models can be very large. For Bayesian model averaging, Occam's window is a popular approach to reduce the model space; as this approach may not eliminate any variables, a variable screening step was proposed for a frequentist model averaging procedure. Based on the results of the models selected in bootstrap samples, variables are eliminated before deriving a model averaging predictor. As a simple alternative screening procedure, backward elimination can be used. Through two examples and by means of simulation we investigate properties of the screening step. In the simulation study we consider situations with 15 and 25 variables, of which seven influence the outcome. The screening step eliminates most of the uninfluential variables, but also some variables with weak effects. Variable screening leads to more applicable models without eliminating models that are more strongly supported by the data. Furthermore, we give recommendations for the important parameters of the screening step.
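A minimal sketch of such a screening step, using bootstrap backward elimination and an illustrative 30% inclusion-frequency threshold (both the threshold and the p-value cutoff are assumptions, not the authors' settings):

```python
# Minimal sketch: bootstrap variable screening before model averaging.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p = 200, 15
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:7] = [1, 1, 1, .5, .5, .2, .2]          # 7 influential variables
y = X @ beta + rng.normal(size=n)

def backward_eliminate(Xb, yb, alpha=0.05):
    """Drop the least significant variable until all remaining pass alpha."""
    keep = list(range(Xb.shape[1]))
    while keep:
        pvals = sm.OLS(yb, sm.add_constant(Xb[:, keep])).fit().pvalues[1:]
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break
        keep.pop(worst)
    return keep

B = 200
counts = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)               # bootstrap sample
    counts[backward_eliminate(X[idx], y[idx])] += 1

screened = np.where(counts / B >= 0.30)[0]    # 30% threshold (assumption)
print('inclusion frequencies:', np.round(counts / B, 2))
print('variables passing the screen:', screened)
```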

4.
Nested case–control (NCC) sampling is widely used in large epidemiological cohort studies for its cost-effectiveness, but its data analysis primarily relies on the Cox proportional hazards model. In this paper, we consider a family of linear transformation models for analyzing NCC data and propose an inverse selection probability weighted estimating equation method for inference. Consistency and asymptotic normality of our estimators of the regression coefficients are established. We show that the asymptotic variance has a closed analytic form and can be easily estimated. Numerical studies are conducted to support the theory, and an application to the Wilms’ Tumor Study is given to illustrate the methodology.

5.
Many of the popular nonlinear time series models require the a priori choice of parametric functions that are assumed appropriate in specific applications. This approach is mainly used in financial applications, where sufficient knowledge is available about the nonlinear structure between the covariates and the response. One principal strategy for investigating a broader class of nonlinear time series is the Nonlinear Additive AutoRegressive (NAAR) model. The NAAR model estimates the lags of a time series as flexible functions in order to detect non-monotone relationships between current and past observations. We consider linear and additive models for identifying nonlinear relationships. A componentwise boosting algorithm is applied for simultaneous model fitting, variable selection and model choice; thus, by applying boosting to fit potentially nonlinear models we address the major issues in time series modelling: lag selection and nonlinearity. By means of simulation we compare boosting to alternative nonparametric methods. Boosting shows strong overall performance in terms of precise estimation of highly nonlinear lag functions. The forecasting potential of boosting is examined on German industrial production (IP); to improve the model’s forecasting quality we include additional exogenous variables, which addresses the second major aspect of this paper, the issue of high dimensionality. Allowing additional inputs extends the NAAR model to a broader class of models, namely the NAARX model. We show that boosting can cope with large models that have many covariates relative to the number of observations.
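The componentwise boosting idea can be sketched compactly: at each iteration a base learner is fitted to the current residuals separately for every candidate lag, and only the best-fitting component is added, with shrinkage. The sketch below substitutes cubic-polynomial base learners for the splines typically used, purely for brevity (an assumption, not the paper's learner):

```python
# Minimal sketch: componentwise L2 boosting for lag selection in a NAAR model.
import numpy as np

rng = np.random.default_rng(3)
T, max_lag = 500, 6
x = np.zeros(T)
for t in range(2, T):                          # true model uses lags 1 and 2
    x[t] = 0.5 * np.sin(x[t - 1]) - 0.4 * x[t - 2] + 0.3 * rng.normal()

Y = x[max_lag:]
lags = np.column_stack([x[max_lag - k:-k] for k in range(1, max_lag + 1)])

def fit_poly(z, r):
    """Cubic-polynomial base learner on a single lag (spline stand-in)."""
    Z = np.column_stack([z ** d for d in range(4)])
    coef, *_ = np.linalg.lstsq(Z, r, rcond=None)
    return Z @ coef

F, nu, n_iter = np.zeros_like(Y), 0.1, 200     # shrinkage nu = 0.1
selected = np.zeros(max_lag, int)
for _ in range(n_iter):
    r = Y - F                                  # current residuals
    fits = [fit_poly(lags[:, k], r) for k in range(max_lag)]
    best = int(np.argmin([np.sum((r - f) ** 2) for f in fits]))
    F += nu * fits[best]                       # add the best component only
    selected[best] += 1

print('selection counts per lag:', selected)   # lags 1 and 2 should dominate
```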

6.
This paper proposes an identification method for fractional differencing autoregressive models. The method gives a consistent estimator of the fractional differencing order and efficient estimates of the parameters of the fractional differencing autoregressive model.
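For readers unfamiliar with fractional differencing, the filter (1 − L)^d can be applied through its binomial expansion, with weights computed by a simple recursion. A minimal sketch, not tied to the paper's identification method:

```python
# Minimal sketch: fractional differencing (1 - L)^d via binomial weights.
import numpy as np

def frac_diff(x, d):
    """Apply (1 - L)^d to x with an expanding window of binomial weights."""
    w = np.zeros(len(x))
    w[0] = 1.0
    for k in range(1, len(x)):
        w[k] = w[k - 1] * (k - 1 - d) / k      # w_k = (-1)^k * C(d, k)
    return np.array([w[:t + 1] @ x[t::-1] for t in range(len(x))])

rng = np.random.default_rng(4)
eps = rng.normal(size=500)
y = frac_diff(eps, d=-0.3)                     # integrate: ARFIMA(0, 0.3, 0)
z = frac_diff(y, d=0.3)                        # differencing recovers the noise
print('max reconstruction error:', np.abs(z - eps).max())
```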

7.
New tests are proposed for the Pareto distribution as well as its discrete version, known as Zipf's law. In both cases, the discrepancy between the empirical moment of arbitrary negative order and its theoretical counterpart is utilized in a weighted integral test statistic. If the weight function decays at an exponential rate, interesting limit statistics are obtained. The tests are shown to be consistent under fixed alternatives, and a Monte Carlo study is conducted to investigate the performance of the proposed procedures in small samples. Furthermore, a bootstrap procedure is proposed to cope with the case of an unknown shape parameter. We conclude with applications to real data.
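A plausible reconstruction of such a statistic (the exact form in the paper may differ) integrates the squared discrepancy between the empirical negative-order moment and the Pareto moment E[X^(-r)] = alpha/(alpha + r) against an exponentially decaying weight:

```python
# Minimal sketch (illustrative reconstruction, not the paper's exact statistic).
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(5)
x = rng.pareto(2.5, size=300) + 1.0            # Pareto, shape 2.5, support >= 1
n = len(x)
alpha_hat = n / np.log(x).sum()                # ML estimate of the shape

def discrepancy_sq(r, a=1.0):
    emp = np.mean(x ** (-r))                   # empirical moment of order -r
    theo = alpha_hat / (alpha_hat + r)         # Pareto moment E[X^(-r)]
    return (emp - theo) ** 2 * np.exp(-a * r)  # exponential weight exp(-a*r)

T_n, _ = quad(discrepancy_sq, 0, np.inf)
T_n *= n
print('test statistic:', T_n)                  # calibrate by bootstrap in practice
```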

8.
In this paper, we investigate the construction of compromise estimators of location and scale by averaging over several models selected from a specified large set of possible models. The weight given to each distribution is based on the profile likelihood, which leads to a notion of distance between distributions when we study the asymptotic behaviour of such estimators. The selection of the models is made in a minimax way, in order to choose distributions that are close to any possible distribution. We also present simulation results for such compromise estimators based on contaminated Gaussian and Student's t distributions.
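A minimal sketch of the averaging idea, using weights proportional to the maximized likelihood of each candidate model (a simplification of the paper's profile-likelihood weighting) for a Gaussian and a Student's t model on contaminated data:

```python
# Minimal sketch: likelihood-weighted compromise location estimate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 1, 180), rng.normal(8, 1, 20)])  # contaminated

locs, logliks = [], []
mu, sigma = x.mean(), x.std()                   # Gaussian MLE
locs.append(mu)
logliks.append(stats.norm.logpdf(x, mu, sigma).sum())

df, loc_t, scale_t = stats.t.fit(x, fdf=3)      # Student-t, df fixed at 3
locs.append(loc_t)
logliks.append(stats.t.logpdf(x, df, loc_t, scale_t).sum())

ll = np.array(logliks)
w = np.exp(ll - ll.max())                       # likelihood-based weights
w /= w.sum()
print('weights (normal, t3):', np.round(w, 3))
print('compromise location :', float(np.dot(w, locs)))
```

On contaminated data the heavy-tailed model dominates the weights, pulling the compromise estimate toward the robust location.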

9.
We present a numerically convenient procedure for computing Wald criteria for nested hypotheses. Like Szroeter’s (1983) generalized Wald test, the suggested procedure does not require explicit derivation of the restrictions implied by the null hypothesis, and hence its use may eliminate an intricate step in testing linear and nonlinear hypotheses. We show that the traditional Wald test, Szroeter’s (1983) generalized Wald test and our procedure are asymptotically equivalent under H0. A class of nonlinear transformations of the restrictions under which the Wald statistic is asymptotically invariant is discussed. Finally, we illustrate the use of our procedure by testing the common factor restrictions in a dynamic regression model.

10.
The objective of this paper is to investigate through simulation the possible presence of the incidental parameters problem when performing frequentist model discrimination with stratified data. In this context, model discrimination amounts to considering a structural parameter taking values in a finite space with k points, k ≥ 2. This setting appears not to have been considered yet in the literature on the Neyman–Scott phenomenon. Here we provide Monte Carlo evidence of the severity of the incidental parameters problem in the model discrimination setting as well, and propose a remedy for a special class of models. In particular, we focus on models that are scale families in each stratum. We consider traditional model selection procedures, such as the Akaike and Takeuchi information criteria, together with the best frequentist selection procedure based on maximizing the marginal likelihood induced by the maximal invariant, or its Laplace approximation. Results of two Monte Carlo experiments indicate that when the sample size in each stratum is fixed and the number of strata increases, correct selection probabilities for traditional model selection criteria may approach zero, unlike what happens for model discrimination based on exact or approximate marginal likelihoods. Finally, two examples with real data sets are given.

11.
In this paper we propose a novel procedure for the estimation of semiparametric survival functions. The proposed technique adapts penalized likelihood survival models to the context of lifetime value modeling. The method extends the classical Cox model by introducing a smoothing parameter that can be estimated by penalized maximum likelihood procedures. Markov chain Monte Carlo methods are employed to estimate this smoothing parameter, using an algorithm that combines Metropolis–Hastings and Gibbs sampling. Our proposal is contextualized and compared with conventional models, with reference to a marketing application that involves predicting customer lifetime value.

12.
This paper investigates the application of capture–recapture methods to human populations. Capture–recapture methods are commonly used for estimating the size of wildlife populations, but can also be used in epidemiology and the social sciences, for example to estimate the prevalence of a particular disease or the size of the homeless population in a certain area. Here we focus on estimating the prevalence of infectious diseases. Several estimators of population size are considered: the Lincoln–Petersen estimator and its modified version, the Chapman estimator; Chao’s lower bound estimator; Zelterman’s estimator; McKendrick’s moment estimator; and the maximum likelihood estimator. To evaluate these estimators, they are applied to real three-source capture–recapture data. By conditioning on each of the sources of the three-source data, we are able to compare the estimators with the true value that they are estimating. The Chapman and Chao estimators were compared in terms of their relative bias. A variance formula derived through conditioning is suggested for Chao’s estimator, and normal 95% confidence intervals are calculated for this and the Chapman estimator. We then compare the coverage of the respective confidence intervals. Furthermore, a simulation study is included to compare Chao’s and Chapman’s estimators. Results indicate that Chao’s estimator is less biased than Chapman’s estimator unless the two sources are independent; Chao’s estimator also has the smaller mean squared error. Finally, the implications and limitations of the above methods are discussed, with suggestions for further development. We are grateful to the Medical Research Council for supporting this work.
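The two-source estimators compared in this abstract have simple closed forms, shown in the sketch below with illustrative counts (the variance shown for Chapman is the standard Seber formula; the counts are made up):

```python
# Minimal sketch: Chapman and Chao two-source population size estimators.
import numpy as np

# Two-source data: n1 listed by source A, n2 by source B, m by both.
n1, n2, m = 120, 150, 45                       # illustrative counts (assumption)

# Chapman (bias-corrected Lincoln-Petersen)
N_chapman = (n1 + 1) * (n2 + 1) / (m + 1) - 1
var_chapman = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)
               / ((m + 1) ** 2 * (m + 2)))
se = np.sqrt(var_chapman)
print(f'Chapman: {N_chapman:.1f}  95% CI: '
      f'({N_chapman - 1.96 * se:.1f}, {N_chapman + 1.96 * se:.1f})')

# Chao's lower bound uses the frequency-of-frequencies:
# f1 = seen by exactly one source, f2 = seen by both.
f1, f2 = (n1 - m) + (n2 - m), m
n_obs = n1 + n2 - m
N_chao = n_obs + f1 ** 2 / (2 * f2)
print(f'Chao lower bound: {N_chao:.1f}')
```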

13.
We address the problem of robust model selection for finite memory stochastic processes. Consider m independent samples, most of which are realizations of the same stochastic process with law Q, which is the one we want to retrieve. We define the asymptotic breakdown point γ for a model selection procedure and devise such a procedure. We compute the value of γ, which is 0.5 when all the processes are Markovian. This result is valid for any family of finite-order Markov models, but for simplicity we focus on the family of variable length Markov chains.

14.
A commonly used procedure for reducing the number of variables in linear discriminant analysis is the stepwise method for variable selection. Although often criticized, when used carefully this method can be a useful prelude to further analysis. The contribution of a variable to the discriminatory power of the model is usually measured by the maximum likelihood ratio criterion, referred to as Wilks’ lambda. It is well known that the Wilks’ lambda statistic is extremely sensitive to the influence of outliers. In this work a robust version of the Wilks’ lambda statistic is constructed, based on the Minimum Covariance Determinant (MCD) estimator and its reweighted version, which has higher efficiency. Taking advantage of the availability of a fast algorithm for computing the MCD, a simulation study is carried out to evaluate the performance of this statistic. The presentation of material in this article does not imply the expression of any opinion whatsoever on the part of Austro Control GmbH and is the sole responsibility of the authors.
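Wilks' lambda is det(W)/det(W + B), the ratio of the within-group scatter determinant to the total scatter determinant. The sketch below contrasts the classical statistic with a robust version that plugs in MCD location and scatter estimates per group (a simplification; the paper's reweighted-MCD construction and its calibration are more involved):

```python
# Minimal sketch: classical vs MCD-based Wilks' lambda for two groups.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(7)
X1 = rng.multivariate_normal([0, 0], np.eye(2), 100)
X2 = rng.multivariate_normal([1, 1], np.eye(2), 100)
X1[:5] += 10                                    # a few outliers in group 1

def wilks_lambda(groups, robust=False):
    locs, covs, ns = [], [], []
    for G in groups:
        if robust:
            mcd = MinCovDet(random_state=0).fit(G)
            locs.append(mcd.location_)
            covs.append(mcd.covariance_)
        else:
            locs.append(G.mean(0))
            covs.append(np.cov(G, rowvar=False))
        ns.append(len(G))
    # pooled within-group scatter W and between-group scatter B
    W = sum((n - 1) * C for n, C in zip(ns, covs))
    grand = np.average(locs, axis=0, weights=ns)
    B = sum(n * np.outer(m - grand, m - grand) for n, m in zip(ns, locs))
    return np.linalg.det(W) / np.linalg.det(W + B)

print('classical Wilks lambda:', wilks_lambda([X1, X2]))
print('robust    Wilks lambda:', wilks_lambda([X1, X2], robust=True))
```

The outliers inflate the classical statistic toward 1, masking the group separation, while the MCD-based version is barely affected.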

15.
We consider fitting Emax models to the primary endpoint of a parallel-group dose–response clinical trial. Such models can be difficult to fit by maximum likelihood if the data give little information about the maximum possible response. Consequently, we consider alternative models that can be derived as limiting cases and that can usually be fitted. Furthermore, we propose two model selection procedures for choosing between the different models, and compare them with two model selection procedures that have previously been used. In a simulation study we find that the model selection procedure that performs best depends on the underlying true situation; one of the new procedures may be regarded as the most robust.
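The Emax model states E(y | d) = E0 + Emax * d / (ED50 + d); as ED50 grows large the curve becomes approximately linear in dose, which is one of the limiting cases referred to above. A minimal sketch fitting both models and comparing them by AIC (the AIC-based choice is an illustration, not one of the paper's selection procedures):

```python
# Minimal sketch: Emax fit vs its linear limiting case, compared by AIC.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(8)
dose = np.repeat([0, 10, 25, 50, 100, 150], 20)
y = 2 + 10 * dose / (40 + dose) + rng.normal(0, 2, dose.size)

def emax(d, e0, emax_, ed50):
    return e0 + emax_ * d / (ed50 + d)

def aic(resid, k):
    n = resid.size
    return n * np.log(np.mean(resid ** 2)) + 2 * k

p_emax, _ = curve_fit(emax, dose, y, p0=[1, 5, 50], maxfev=10000)
resid_emax = y - emax(dose, *p_emax)

p_lin = np.polyfit(dose, y, 1)                 # linear limiting model
resid_lin = y - np.polyval(p_lin, dose)

print('Emax fit (E0, Emax, ED50):', np.round(p_emax, 2))
print('AIC  Emax:', aic(resid_emax, 3), ' linear:', aic(resid_lin, 2))
```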

16.
In this article, we propose a new penalized-likelihood method for model selection in finite mixtures of regression models. Penalties are imposed on both the mixing proportions and the regression coefficients, so that order selection of the mixture and variable selection within each component can be conducted simultaneously. The consistency of both the order selection and the variable selection is investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations demonstrate the finite-sample performance of the estimation procedure, and the proposed methodology is further illustrated with a real data analysis.

17.
We present a simplified form of a univariate identification approach for time series models, based on the residual white noise autoregressive order determination criterion and linear estimation methods. We also show how the procedure can be used to identify the degree of differencing necessary to induce stationarity in the data. The performance of this approach is contrasted with portmanteau tests for detecting white noise residuals and with Dickey–Fuller and Bayesian procedures for detecting unit roots. Simulated and economic data are used to demonstrate the capabilities of the modified approach.

18.
Robust variable selection with application to quality of life research
A large database containing socioeconomic data on 60 communities in Austria and Germany has been built, drawing on 18,000 citizens’ responses to a survey together with data about these communities from official statistical institutes. This paper describes a procedure for extracting a small set of explanatory variables to explain response variables such as the perceived quality of life. For better interpretability, the set of explanatory variables needs to be very small and the dependencies among the selected variables need to be low. Because of possible inhomogeneities within the data set, it is further required that the solution be robust to outliers and deviating points. To achieve these goals, a robust model selection method is developed, combined with a strategy for reducing the number of selected predictor variables to a necessary minimum. This context-sensitive method is then applied to identify the factors that describe quality of life in communities.

19.
We propose a new class of state space models for longitudinal discrete response data in which the observation equation is specified in an additive form involving both deterministic and random linear predictors. These models allow us to explicitly address the effects of trend, seasonality or other time-varying covariates while preserving the power of state space models in modeling serial dependence in the data. We develop a Markov chain Monte Carlo algorithm to carry out statistical inference for models with binary and binomial responses, invoking de Jong and Shephard’s (Biometrika 82(2):339–350, 1995) simulation smoother to establish an efficient sampling procedure for the state variables. To quantify and control the sensitivity of the posteriors to the priors on the variance parameters, we add a signal-to-noise-ratio-type parameter to the specification of these priors. Finally, we illustrate the applicability of the proposed state space mixed models for longitudinal binomial response data in both simulation studies and data examples.

20.
In oncology, toxicity is typically observable shortly after a chemotherapy treatment, whereas efficacy, often characterized by tumor shrinkage, is observable only after a relatively long period of time. For a phase II clinical trial design, we propose a Bayesian adaptive randomization procedure that accounts for both efficacy and toxicity outcomes. We model efficacy as a time-to-event endpoint and toxicity as a binary endpoint, sharing common random effects in order to induce dependence between the bivariate outcomes. More generally, we allow the randomization probability to depend on patient-specific covariates, such as prognostic factors. Early stopping boundaries are constructed for toxicity and futility, and a superior treatment arm is recommended at the end of the trial. Following the setup of a recent renal cancer clinical trial at M. D. Anderson Cancer Center, we conduct extensive simulation studies under various scenarios to investigate the performance of the proposed method, and compare it with available Bayesian adaptive randomization procedures.
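The flavor of outcome-adaptive randomization can be conveyed with a deliberately simplified sketch: a binary toxicity endpoint with conjugate Beta updating, where each new patient's assignment probability tracks the posterior probability that an arm is safer (the trial's actual joint efficacy-toxicity model is considerably richer):

```python
# Minimal sketch: Beta-Binomial outcome-adaptive randomization (toxicity only).
import numpy as np

rng = np.random.default_rng(9)
true_tox = [0.2, 0.4]                           # arm 0 is safer (assumption)
a = np.ones(2)                                  # Beta(1, 1) priors per arm
b = np.ones(2)

for patient in range(100):
    # P(arm 0 has lower toxicity than arm 1), by posterior simulation
    draws0 = rng.beta(a[0], b[0], 2000)
    draws1 = rng.beta(a[1], b[1], 2000)
    p_best0 = np.mean(draws0 < draws1)
    arm = 0 if rng.random() < p_best0 else 1    # adaptive assignment
    tox = rng.random() < true_tox[arm]          # observe toxicity outcome
    a[arm] += tox                               # conjugate posterior update
    b[arm] += 1 - tox

print('posterior mean toxicity:', np.round(a / (a + b), 3))
print('final P(arm 0 safer):',
      np.mean(rng.beta(a[0], b[0], 5000) < rng.beta(a[1], b[1], 5000)))
```

As evidence accumulates, assignments drift toward the safer arm, which is the ethical rationale for adaptive randomization.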
