Similar Literature
20 similar documents found.
1.
The aim of this paper is to study the asymptotic properties of a class of kernel conditional mode estimates when functional stationary ergodic data are considered. More precisely, in the ergodic data setting we consider a random element \((X, Z)\) taking values in some semi-metric abstract space \(E\times F\). For a real function \(\varphi \) defined on the space F and \(x\in E\), we consider the conditional mode of the real random variable \(\varphi (Z)\) given the event “\(X=x\)”. While estimating the conditional mode function, say \(\theta _\varphi (x)\), using the well-known kernel estimator, we establish the strong consistency with rate of this estimate uniformly over Vapnik–Chervonenkis classes of functions \(\varphi \). Notice that the ergodic setting offers a more general framework than the usual mixing structure. Two applications to energy data illustrate the proposed approach in a time series forecasting framework: the first forecasts the daily peak of electricity demand in France (measured in gigawatts), while the second deals with short-term forecasting of the electrical energy (measured in gigawatt hours) consumed over time intervals that cover the peak demand.
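A minimal numerical sketch of a kernel conditional mode estimate of this kind, assuming a scalar covariate for readability (the paper's functional setting would replace the Euclidean distance by a semi-metric; the kernels, bandwidths, and data below are illustrative choices):

```python
import numpy as np

def conditional_mode(x0, X, Y, h_x, h_y, grid):
    """Kernel estimate of the conditional mode of Y given X = x0:
    maximize a Nadaraya-Watson-type conditional density over a grid."""
    w = np.exp(-0.5 * ((X - x0) / h_x) ** 2)            # covariate kernel weights
    dens = np.array([np.sum(w * np.exp(-0.5 * ((Y - y) / h_y) ** 2))
                     for y in grid])                    # unnormalized f(y | x0)
    return grid[np.argmax(dens)]                        # argmax over y = mode

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 500)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(500)  # true mode: sin(2*pi*x)
grid = np.linspace(-1.5, 1.5, 300)
print(conditional_mode(0.25, X, Y, h_x=0.05, h_y=0.1, grid=grid))  # about sin(pi/2) = 1
```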

2.
Rubin (1976) derived general conditions under which inferences that ignore missing data are valid. These conditions are sufficient but not generally necessary, and they may therefore be relaxed in some special cases. We consider here the case of frequentist estimation of a conditional cdf subject to missing outcomes. We partition a set of data into outcome, conditioning, and latent variables, all of which potentially affect the probability of a missing response. We describe sufficient conditions under which a complete-case estimate of the conditional cdf of the outcome given the conditioning variable is unbiased. We use simulations on a renal transplant data set (Dienemann et al.) to illustrate the implications of these results.

3.
This article advocates the following perspective: When confronting a scientific problem, the field of statistics enters by viewing the problem as one where the scientific answer could be calculated if some missing data, hypothetical or real, were available. Thus, statistical effort should be devoted to three steps:
  1. formulate the missing data that would allow this calculation,
  2. stochastically fill in these missing data, and
  3. do the calculations as if the filled-in data were available.
This presentation discusses conceptual benefits, such as for causal inference using potential outcomes; computational benefits, such as those afforded by the EM algorithm and related data augmentation methods based on MCMC; and inferential benefits, such as valid interval estimation and assessment of assumptions based on multiple imputation.
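As an illustration of the three steps, a deliberately simplified multiple-imputation sketch (an improper imputation that ignores parameter-draw uncertainty; the bivariate-normal model and MAR mechanism are invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)

# Bivariate normal data; y2 is missing at random with probability depending on y1.
n = 1000
y1 = rng.standard_normal(n)
y2 = 0.6 * y1 + 0.8 * rng.standard_normal(n)
miss = rng.random(n) < 1 / (1 + np.exp(-y1))
y2_obs = np.where(miss, np.nan, y2)

# Steps 1-2: formulate the missing y2 and stochastically fill them in
# from a regression fitted to the complete cases.
obs = ~np.isnan(y2_obs)
b = np.polyfit(y1[obs], y2_obs[obs], 1)
sigma = np.std(y2_obs[obs] - np.polyval(b, y1[obs]))

means = []
for _ in range(20):                                   # 20 imputed data sets
    y2_imp = y2_obs.copy()
    y2_imp[~obs] = np.polyval(b, y1[~obs]) + sigma * rng.standard_normal((~obs).sum())
    means.append(y2_imp.mean())                       # Step 3: complete-data analysis

# Pool the completed-data estimates (the MI point estimate is their average).
print("MI estimate of E[y2]:", np.mean(means),
      "  complete-case estimate:", y2_obs[obs].mean())
```

The complete-case mean is biased here because missingness in y2 depends on y1, which is correlated with y2; the imputation-based estimate is approximately unbiased under this MAR mechanism.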

4.
This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire d-dimensional distribution is approximately preserved under random projections by reducing the number of data points from n to \(k\in O({\text {poly}}(d/\varepsilon ))\) in the case \(n\gg d\). Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a \((1+O(\varepsilon ))\)-approximation in terms of the \(\ell _2\) Wasserstein distance. Our main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an \(\varepsilon \)-fraction of its defining parameters. This holds when using arbitrary Gaussian priors or the degenerate case of uniform distributions over \(\mathbb {R}^d\) for \(\beta \). Our empirical evaluations cover various simulated settings of Bayesian linear regression and confirm that the proposed method recovers the regression model up to small error while considerably reducing the total running time.
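A hedged sketch of the idea with a plain Gaussian sketching matrix (the dimensions, prior and noise variances, and this particular choice of projection are illustrative assumptions, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 10_000, 10, 500                 # n >> d; k grows like poly(d / eps)
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + 0.5 * rng.standard_normal(n)

def gaussian_posterior(X, y, tau2=10.0, sigma2=0.25):
    """Posterior mean/covariance of beta under a N(0, tau2 I) prior
    and a Gaussian likelihood with noise variance sigma2."""
    prec = X.T @ X / sigma2 + np.eye(X.shape[1]) / tau2
    cov = np.linalg.inv(prec)
    return cov @ X.T @ y / sigma2, cov

S = rng.standard_normal((k, n)) / np.sqrt(k)   # Gaussian random projection
m_full, _ = gaussian_posterior(X, y)
m_proj, _ = gaussian_posterior(S @ X, S @ y)   # same posterior, projected data
print(np.linalg.norm(m_full - m_proj))         # small: posterior mean nearly preserved
```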

5.
6.
Assume that a linear random-effects model \(\mathbf{y}= \mathbf{X}\varvec{\beta }+ \varvec{\varepsilon }= \mathbf{X}(\mathbf{A}\varvec{\alpha }+ \varvec{\gamma }) + \varvec{\varepsilon }\) is transformed into \(\mathbf{T}\mathbf{y}= \mathbf{T}\mathbf{X}\varvec{\beta }+ \mathbf{T}\varvec{\varepsilon }= \mathbf{T}\mathbf{X}(\mathbf{A}\varvec{\alpha }+ \varvec{\gamma }) + \mathbf{T}\varvec{\varepsilon }\) by pre-multiplying by a given matrix \(\mathbf{T}\) of arbitrary rank. The two models are not necessarily equivalent unless \(\mathbf{T}\) is of full column rank, and in many situations we have to work with the derived model. Because predictors/estimators of the parameter spaces under the two models are not necessarily the same, a primary task is to compare predictors/estimators in the two models and to establish possible links between the inference results obtained from them. This paper presents a general algebraic approach to the problem of comparing best linear unbiased predictors (BLUPs) of parameter spaces in an original linear random-effects model and its transformations, and provides a group of fundamental and comprehensive results on the mathematical and statistical properties of the BLUPs. In particular, we construct many equalities for the BLUPs under an original linear random-effects model and its transformations, and we obtain necessary and sufficient conditions for the equalities to hold.
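A small numerical check of the flavor of these equalities, assuming a square full-rank transformation \(\mathbf{T}\) (the paper treats \(\mathbf{T}\) of arbitrary rank, where the BLUPs need not coincide); the covariance choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 50, 4, 2
X, A = rng.standard_normal((n, p)), rng.standard_normal((p, q))
G, R = np.eye(p), 0.5 * np.eye(n)            # Cov(gamma) and Cov(eps), illustrative
alpha = rng.standard_normal(q)
y = X @ (A @ alpha + rng.standard_normal(p)) + np.sqrt(0.5) * rng.standard_normal(n)

def blup(T):
    """GLS estimate of alpha and BLUP of gamma under Ty = TXA alpha + TX gamma + T eps."""
    W, Z = T @ X @ A, T @ X
    V = Z @ G @ Z.T + T @ R @ T.T            # Cov(Ty)
    Vinv = np.linalg.pinv(V)
    a = np.linalg.solve(W.T @ Vinv @ W, W.T @ Vinv @ (T @ y))
    g = G @ Z.T @ Vinv @ (T @ y - W @ a)     # BLUP of the random effect gamma
    return a, g

a0, g0 = blup(np.eye(n))                     # original model
aT, gT = blup(rng.standard_normal((n, n)))   # full-rank transform (a.s. invertible)
print(np.linalg.norm(a0 - aT), np.linalg.norm(g0 - gT))   # both ~ 0: BLUPs coincide
```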

7.
Sample size estimation for comparing the rates of change in two-arm repeated measurements has been investigated by many researchers. In contrast, the literature has paid relatively little attention to sample size estimation for studies with multi-arm repeated measurements, where the design and data analysis can be more complex than in two-arm trials. For continuous outcomes, Jung and Ahn (2004) and Zhang and Ahn (2013) have presented sample size formulas to compare the rates of change and time-averaged responses in multi-arm trials, using the generalized estimating equation (GEE) approach. To our knowledge, there has been no corresponding development for multi-arm trials with count outcomes. We present a sample size formula for comparing the rates of change in multi-arm repeated count outcomes using the GEE approach that accommodates various correlation structures, missing data patterns, and unbalanced designs. We conduct simulation studies to assess the performance of the proposed sample size formula under a wide range of design configurations. Simulation results suggest that empirical type I error and power are maintained close to their nominal levels. The proposed method is illustrated using an epileptic clinical trial example.
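The paper derives a closed-form formula; the sketch below only checks empirical power by simulation in a two-arm special case, fitting a Poisson GEE with an exchangeable working correlation via statsmodels (all design parameters are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

def one_trial(n_per_arm=80, n_times=4, slope_diff=0.15):
    """Simulate a two-arm repeated count outcome and test the arm-by-time
    interaction (difference in rates of change); returns True if the
    5%-level Wald test based on the GEE fit rejects."""
    rows = []
    for arm in (0, 1):
        for i in range(n_per_arm):
            u = 0.3 * rng.standard_normal()        # subject-level heterogeneity
            for t in range(n_times):
                lam = np.exp(0.5 + 0.05 * t + slope_diff * arm * t + u)
                rows.append((arm * n_per_arm + i, arm, t, rng.poisson(lam)))
    ids, arm, t, y = (np.array(c) for c in zip(*rows))
    Xmat = np.column_stack([np.ones_like(t), arm, t, arm * t])
    res = sm.GEE(y, Xmat, groups=ids, family=sm.families.Poisson(),
                 cov_struct=sm.cov_struct.Exchangeable()).fit()
    return abs(res.params[3] / res.bse[3]) > 1.96  # robust z for the interaction

print("empirical power:", np.mean([one_trial() for _ in range(200)]))
```

Setting slope_diff=0 in the same sketch gives the empirical type I error instead.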

8.
In this paper we consider an acceptance-rejection (AR) sampler based on deterministic driver sequences. We prove that the discrepancy of an N element sample set generated in this way is bounded by \(\mathcal {O} (N^{-2/3}\log N)\), provided that the target density is twice continuously differentiable with non-vanishing curvature and the AR sampler uses the driver sequence \(\mathcal {K}_M= \{( j \alpha , j \beta ) \bmod 1 \mid j = 1,\ldots ,M\}\), where \(\alpha ,\beta \) are real algebraic numbers such that \(1,\alpha ,\beta \) is a basis of a number field of degree 3 over \(\mathbb {Q}\). For the driver sequence \(\mathcal {F}_k= \{ ({j}/{F_k}, \{{jF_{k-1}}/{F_k}\} ) \mid j=1,\ldots , F_k\}\), where \(F_k\) is the k-th Fibonacci number and \(\{x\}=x-\lfloor x \rfloor \) is the fractional part of a non-negative real number x, we can remove the \(\log \) factor and improve the convergence rate to \(\mathcal {O}(N^{-2/3})\), where again N is the number of accepted samples. We also introduce a criterion for measuring the goodness of driver sequences. The proposed approach is tested numerically by calculating the star-discrepancy of samples generated for some target densities using \(\mathcal {K}_M\) and \(\mathcal {F}_k\) as driver sequences. These results confirm that a convergence rate beyond \(N^{-1/2}\) is achievable in practice with \(\mathcal {K}_M\) and \(\mathcal {F}_k\) as driver sequences in the acceptance-rejection sampler.
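A sketch of the deterministic acceptance-rejection sampler driven by the Fibonacci sequence \(\mathcal {F}_k\) (the target density and its bound are illustrative):

```python
import numpy as np

def fibonacci_driver(k):
    """Driver sequence F_k = {(j/F_k, frac(j * F_{k-1} / F_k)) : j = 1..F_k}."""
    F = [1, 1]
    for _ in range(k - 2):
        F.append(F[-1] + F[-2])
    j = np.arange(1, F[-1] + 1)
    return np.column_stack([j / F[-1], (j * F[-2] / F[-1]) % 1.0])

def ar_sample(density, bound, k):
    """Deterministic AR on [0, 1]: keep u1 whenever bound * u2 <= density(u1)."""
    pts = fibonacci_driver(k)
    keep = bound * pts[:, 1] <= density(pts[:, 0])
    return pts[keep, 0]

psi = lambda x: 6.0 * x * (1.0 - x)        # smooth target on [0, 1], bounded by 1.5
samples = ar_sample(psi, bound=1.5, k=20)  # F_20 = 6765 driver points
print(len(samples), samples[:5])
```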

9.
An inequality for the sum of squares of rank differences associated with Spearman’s rank correlation coefficient, when ties and missing data are present in both rankings, was established numerically in Loukas and Papaioannou (1991). That inequality is improved and generalized.

10.
The computation of penalized quantile regression estimates is often computationally intensive in high dimensions. In this paper we propose a coordinate descent algorithm for computing the penalized smooth quantile regression (cdaSQR) with convex and nonconvex penalties. The cdaSQR approach is based on approximating the objective check function, which is not differentiable at zero, by a modified check function which is differentiable at zero. Then, using the majorization-minimization trick of the gcdnet algorithm (Yang and Zou, J. Comput. Graph. Stat. 22(2):396–415, 2013), we update each coefficient simply and efficiently. In our implementation, we consider the convex penalties \(\ell _1+\ell _2\) and the nonconvex penalties SCAD (or MCP) \(+ \ell _2\). We establish the convergence of cdaSQR with the \(\ell _1+\ell _2\) penalty. Using simulations we compare the speed of our algorithm with that of its competitors; the numerical results show that our implementation is an order of magnitude faster. Finally, the performance of our algorithm is illustrated on three real data sets from diabetes, leukemia, and Bardet–Biedl syndrome gene expression studies.
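A hedged sketch of the MM-style coordinate descent for a smoothed check function with the \(\ell _1+\ell _2\) penalty; the particular Huber-type smoothing, tuning constants, and data below are assumptions for illustration, not necessarily the paper's exact cdaSQR:

```python
import numpy as np

def psi(r, tau, c):
    """Derivative of a smoothed check function: linear on [-c, c], flat outside."""
    return np.where(r > c, tau, np.where(r < -c, tau - 1.0, r / (2 * c) + tau - 0.5))

def cda_sqr(X, y, tau=0.5, lam1=0.05, lam2=0.01, c=0.25, n_iter=200):
    """Coordinate descent: each coefficient is updated by soft-thresholding
    a one-dimensional majorization (MM) step, as in the gcdnet trick."""
    n, p = X.shape
    beta, r = np.zeros(p), y.copy()              # r holds residuals y - X beta
    M = (X ** 2).mean(axis=0) / (2 * c)          # per-coordinate curvature bound
    for _ in range(n_iter):
        for j in range(p):
            g = -(psi(r, tau, c) * X[:, j]).mean()       # partial derivative
            z = M[j] * beta[j] - g
            new = np.sign(z) * max(abs(z) - lam1, 0.0) / (M[j] + lam2)
            r += X[:, j] * (beta[j] - new)               # incremental residual update
            beta[j] = new
    return beta

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 20))
beta0 = np.zeros(20)
beta0[:3] = 2.0, -1.5, 1.0
y = X @ beta0 + rng.standard_normal(200)
print(np.round(cda_sqr(X, y), 2))                # first three entries near (2, -1.5, 1)
```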

11.
This paper addresses the issue of estimating the expectation of a real-valued random variable of the form \(X = g(\mathbf {U})\), where g is a deterministic function and \(\mathbf {U}\) can be a random finite- or infinite-dimensional vector. Using recent results on rare event simulation, we propose a unified framework for dealing with both probability and mean estimation for such random variables, i.e. linking algorithms such as the Tootsie Pop Algorithm or the Last Particle Algorithm with nested sampling. In particular, it extends nested sampling as follows. First, the random variable X no longer needs to be bounded: we give the principle of an ideal estimator with an infinite number of terms that is unbiased and always better than a classical Monte Carlo estimator; in particular, it has a finite variance as soon as there exists \(k > 1\) such that \({\text {E}}\left[ X^k \right] < \infty \). Moreover, we address the issue of nested sampling termination and show that a random truncation of the sum can preserve unbiasedness while increasing the variance by a factor of at most 2 compared to the ideal case. We also build an unbiased estimator with fixed computational budget which supports a Central Limit Theorem, and we discuss a parallel implementation of nested sampling, which can dramatically reduce its running time. Finally we extensively study the case where X is heavy-tailed.
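A crude last-particle-style sketch of the mean estimation idea via \(\mathrm{E}[X]=\int _0^\infty \mathrm{P}(X>t)\,\mathrm{d}t\), assuming exact conditional sampling of X given X > t is available (trivial for the exponential example); the fixed truncation used here is not the paper's randomized unbiased truncation:

```python
import numpy as np

rng = np.random.default_rng(6)

def last_particle_mean(sample_cond, N=100, n_levels=1000):
    """Estimate E[X] for X >= 0 from the increasing levels of a last-particle
    scheme, using P(X > L_m) ~ (1 - 1/N)^m at the m-th level."""
    x = sample_cond(0.0, N)                  # N initial particles
    est, prev, q = 0.0, 0.0, 1.0
    for _ in range(n_levels):
        i = np.argmin(x)
        level = x[i]
        est += q * (level - prev)            # rectangle of height ~ P(X > t)
        prev, q = level, q * (1.0 - 1.0 / N)
        x[i] = sample_cond(level, 1)[0]      # refresh the minimal particle above the level
    return est

# Exponential(1) is memoryless, so X | X > t is t + Exponential(1); E[X] = 1.
exp_cond = lambda t, size: t + rng.exponential(1.0, size)
print(last_particle_mean(exp_cond))          # close to 1; truncation error ~ e^{-10}
```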

12.
13.
14.

15.
Missing values are common in longitudinal data studies. The missing data mechanism is termed non-ignorable (NI) if the probability of missingness depends on the non-response (missing) observations. This paper presents a model for ordinal categorical longitudinal data with NI non-monotone missing values. We assume two separate models for the response and the missing process: the response is modeled as ordinal logistic, whereas a binary logistic model is considered for the missing process. We employ these models in the context of so-called shared-parameter models, where the outcome and missing data models are connected by a common set of random effects. It is commonly assumed that the random effect follows the normal distribution in longitudinal data with or without missing data. This can be extremely restrictive in practice, and it may result in misleading statistical inferences. In this paper, we instead adopt a more flexible alternative, the skew-normal distribution. The methodology is illustrated through an application to Schizophrenia Collaborative Study data (Hedeker, 2005) and a simulation.
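An illustrative simulation (not the paper's estimation procedure) of a shared-parameter setup in which one skew-normal random effect drives both a cumulative-logit ordinal outcome and a non-monotone missing process; all coefficients and cutpoints are invented:

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(7)
n, T, a = 500, 4, 4.0                          # a is the skewness parameter
b = skewnorm.rvs(a, size=n, random_state=rng)
b -= skewnorm.mean(a)                          # center the random effect at 0

cuts = np.array([-1.0, 0.0, 1.0])              # cutpoints: 4 ordinal categories

def ordinal_draw(eta):
    """Cumulative-logit draw: P(Y <= k) = logistic(cuts[k] - eta)."""
    p_le = 1.0 / (1.0 + np.exp(-(cuts - eta)))
    return int(np.searchsorted(p_le, rng.random()))

Y = np.zeros((n, T), dtype=int)
R = np.zeros((n, T), dtype=bool)               # missingness indicators
for i in range(n):
    for t in range(T):
        Y[i, t] = ordinal_draw(0.3 * t + b[i])
        # shared parameter: the same b_i enters the binary missingness model,
        # giving a non-ignorable, non-monotone missing pattern
        R[i, t] = rng.random() < 1.0 / (1.0 + np.exp(-(-1.5 + b[i])))
print("overall missing rate:", R.mean())
```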

16.
A case–cohort design was proposed by Prentice (1986) in order to reduce costs. It involves the collection of covariate data from all subjects who experience the event of interest, and from the members of a random subcohort. This case–cohort design has been extensively studied, but exclusively for right-censored data. In this article, we propose case–cohort designs adapted to length-biased data under the proportional hazards assumption. A pseudo-likelihood procedure is described for estimating parameters and the corresponding cumulative hazard function. The large sample properties, such as consistency and weak convergence, of such pseudo-likelihood estimators are presented. We also conduct simulation studies to show that the proposed estimators are appropriate for practical use. An application to real Oscar Awards data is provided.

17.
We present a time-domain goodness-of-fit (gof) diagnostic test that is based on signal-extraction variances for nonstationary time series. This diagnostic test extends the time-domain gof statistic of Maravall (2003) by taking into account the effects of model parameter uncertainty, utilizing theoretical results of McElroy and Holan (2009). We demonstrate that omitting this correction results in a severely undersized statistic. Adequate size and power are obtained in Monte Carlo studies for fairly short time series (10 to 15 years of monthly data). Our Monte Carlo studies of finite-sample size and power consider different combinations of both signal and noise components using seasonal, trend, and irregular component models obtained via canonical decomposition. Details of the implementation appropriate for SARIMA models are given. We apply the gof diagnostic test statistics to several U.S. Census Bureau time series. The results generally corroborate the output of the automatic model selection procedure of the X-12-ARIMA software, which, in contrast to our diagnostic test statistic, does not involve hypothesis testing. We conclude that these diagnostic test statistics are a useful supplementary model-checking tool for practitioners engaged in model-based seasonal adjustment.

18.
Best et al. (2000) suggested tests based on partitioning the \(X^2\) statistic into relevant components of location, dispersion, and skewness effects for testing equality of each effect for ordinal preference data. It is known that the chi-square approximation requires large category counts. In this study we therefore investigate a permutation approach for these statistics and compare the performance of the tests in a simulation study. In addition, the permutation approach can be used to produce a product map that classifies the products. We illustrate the approach with a real data example.
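A sketch of the permutation approach for the overall \(X^2\) statistic on sparse ordinal preference data (the paper additionally partitions \(X^2\) into location, dispersion, and skewness components, which this sketch omits; the data are invented):

```python
import numpy as np

rng = np.random.default_rng(8)

def pearson_x2(labels, ratings, n_cat):
    """Pearson X^2 of the products-by-categories contingency table."""
    tab = np.array([np.bincount(ratings[labels == g], minlength=n_cat)
                    for g in np.unique(labels)], dtype=float)
    exp = tab.sum(1, keepdims=True) * tab.sum(0, keepdims=True) / tab.sum()
    safe = np.where(exp > 0, exp, 1.0)         # guard empty categories
    return np.where(exp > 0, (tab - exp) ** 2 / safe, 0.0).sum()

# Two products, five ordinal categories, only 15 ratings each: exactly the
# small-count setting in which the chi-square approximation is doubtful.
labels = np.repeat([0, 1], 15)
ratings = np.concatenate([rng.integers(0, 5, 15), rng.integers(1, 5, 15)])

obs = pearson_x2(labels, ratings, 5)
perm = np.array([pearson_x2(rng.permutation(labels), ratings, 5)
                 for _ in range(5000)])
print("permutation p-value:", (perm >= obs).mean())
```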

19.
In this article, we study the global \(L^2\) error of a nonlinear wavelet density estimator in the Besov space \(B^s_{pq}\) for a missing data model in which covariables are present, and prove that the estimator can achieve the optimal rate of convergence, similar to the result established by Donoho et al. (1996) in the complete, independent data case with term-by-term thresholding of the empirical wavelet coefficients. Finite-sample behavior of the proposed estimator is explored via simulations.
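A practical stand-in for wavelet density estimation with term-by-term hard thresholding, pre-binning the sample on a dyadic grid and using PyWavelets; the wavelet, decomposition level, and universal-threshold rule are illustrative, and the missing-data/covariable aspect of the paper is omitted:

```python
import numpy as np
import pywt

rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(-1.0, 0.3, 700), rng.normal(1.2, 0.5, 300)])

n_bins = 256                                   # dyadic grid over the support
hist, edges = np.histogram(x, bins=n_bins, range=(-3, 3), density=True)

coeffs = pywt.wavedec(hist, "db4", level=5)
# Universal threshold with a MAD noise estimate from the finest detail level.
thr = np.sqrt(2 * np.log(n_bins)) * np.median(np.abs(coeffs[-1])) / 0.6745
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="hard") for c in coeffs[1:]]
dens = np.clip(pywt.waverec(coeffs, "db4"), 0.0, None)   # nonnegative estimate
print(dens.max(), dens[:5])
```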

20.
The purpose of this article is to explain cross-validation and describe its use in regression. Because replicability analyses are not typically employed in studies, this is a topic with which many researchers may not be familiar. As a result, researchers may not understand how to conduct cross-validation in order to evaluate the replicability of their data. This article not only explains the purpose of cross-validation, but also uses the widely available Holzinger and Swineford (1939) dataset as a heuristic example to concretely demonstrate its use. Multiple tables and examples of SPSS syntax and output provide the reader with additional visual guidance to further clarify the steps involved in conducting cross-validation. A brief discussion of the limitations of cross-validation is also included. After reading this article, the reader should have a clear understanding of cross-validation, including when it is appropriate to use, and how it can be used to evaluate replicability in regression.
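The article works its example in SPSS; the following Python sketch shows the same idea as k-fold cross-validation of an OLS regression, reporting a cross-validated R^2 as a replicability check (fold count and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)

def kfold_r2(X, y, k=5):
    """Fit OLS on k-1 folds, predict the held-out fold, and pool the
    out-of-sample squared errors into a cross-validated R^2."""
    idx = rng.permutation(len(y))
    resid, total = 0.0, ((y - y.mean()) ** 2).sum()
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xtr = np.column_stack([np.ones(len(train)), X[train]])
        Xte = np.column_stack([np.ones(len(fold)), X[fold]])
        b, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        resid += ((y[fold] - Xte @ b) ** 2).sum()
    return 1.0 - resid / total

X = rng.standard_normal((150, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(150)
print("cross-validated R^2:", round(kfold_r2(X, y), 3))
```

A cross-validated R^2 well below the in-sample R^2 signals that the fitted model would replicate poorly on new data.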
