Similar articles
 20 similar articles found; search took 781 ms
1.
This paper develops a new automatic and location-adaptive procedure for estimating the regression in a Functional Single-Index Model (FSIM). The procedure is based on k-Nearest Neighbours (kNN) ideas. The asymptotic study includes results for a data-driven, automatically selected number of neighbours, making the procedure directly usable in practice. The local nature of the kNN approach ensures higher predictive power than usual kernel estimates, as illustrated in some finite-sample analysis. As a by-product, we state as preliminary tools some new uniform asymptotic results for kernel estimates in the FSIM.

2.
The k nearest neighbors (k-NN) classifier is one of the most popular methods for statistical pattern recognition and machine learning. In practice, the size k, the number of neighbors used for classification, is usually set arbitrarily to one or some other small number, or chosen by cross-validation. In this study, we propose a novel alternative approach to deciding the size k. Based on a k-NN-based multivariate multi-sample test, we assign each k a permutation-test-based Z-score. The number of nearest neighbors is set to the k with the highest Z-score. This approach is computationally efficient because we have derived formulas for the mean and variance of the test statistic under the permutation distribution for multiple sample groups. Several simulated and real-world data sets are analyzed to investigate the performance of our approach. Its usefulness is demonstrated by evaluating prediction accuracies when the Z-score is used as the criterion for selecting the size k. We also compare our approach with the widely used cross-validation approaches. The results show that the size k selected by our approach yields high prediction accuracies when informative features are used for classification, whereas the cross-validation approach may fail in some cases.
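
For orientation, the cross-validation baseline mentioned in this abstract can be sketched as follows. This is only a minimal illustration of leave-one-out selection of k, not the Z-score procedure the authors propose (its formulas are not given here). The function names `knn_predict`, `loo_accuracy` and `select_k` are hypothetical, and Euclidean distance with nearest-first tie breaking is an assumption.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (features, label) pairs. Squared Euclidean distance
    is an assumption; vote ties go to the label seen first, nearest first.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def select_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))
```

With very small samples a large k can force the vote to be dominated by the other class, which is one way a cross-validated choice of k can behave poorly.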

3.
In this article, we discuss finding the optimal k for (i) the kth simple moving average, (ii) the kth weighted moving average, and (iii) the kth exponentially weighted moving average, based on a simulated ARIMA(p, d, q) model. We run a simulation examining the three methods above under specific conditions. The main finding is that the 5th exponentially weighted moving average (5th EWMA) ARIMA model is the best forecasting model among those considered, which means the optimal k = 5. For the Turkish Telecommunications (TTKOM) stock, real market data yield results similar to those of the simulation study.
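
The three smoothers compared in this abstract can be sketched as below. The abstract does not state the weighting schemes, so two conventions are assumed here: linear weights 1, …, k for the weighted average, and a smoothing factor of 2 / (k + 1) for the exponentially weighted average. The function names are illustrative only.

```python
def simple_moving_average(series, k):
    """Unweighted mean of the k most recent observations at each time point."""
    return [sum(series[i - k + 1:i + 1]) / k for i in range(k - 1, len(series))]


def weighted_moving_average(series, k):
    """Linearly weighted mean: the newest value gets weight k, the oldest 1.

    The linear weights are an assumed convention, not taken from the abstract.
    """
    weights = list(range(1, k + 1))
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i - k + 1:i + 1])) / total
        for i in range(k - 1, len(series))
    ]


def exponential_moving_average(series, k):
    """Recursive EWMA started at the first observation.

    The smoothing factor alpha = 2 / (k + 1) is an assumed convention.
    """
    alpha = 2.0 / (k + 1)
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

Choosing k then amounts to computing a forecast error for each candidate k and each smoother and keeping the combination with the smallest error.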

4.
In this article, we discuss finding the optimal k for (i) the kth simple moving average, (ii) the kth weighted moving average, and (iii) the kth exponentially weighted moving average, based on a simulated autoregressive AR(p) model. We run a simulation examining the three methods above under specific conditions. The main finding is that the optimal k is 4, followed by k = 3. In particular, the fourth WMA ARIMA model, the fourth EWMA ARIMA model, and the third EWMA ARIMA model are, in that order, the best forecasting models. All six real data sets yield results similar to those of the simulation study.

5.
Euclidean distance k-nearest neighbor (k-NN) classifiers are simple nonparametric classification rules. Bootstrap methods, widely used for estimating the expected prediction error of classification rules, are motivated by the objective of calculating the ideal bootstrap estimate of expected prediction error. In practice, bootstrap methods use Monte Carlo resampling to estimate the ideal bootstrap estimate because exact calculation is generally intractable. In this article, we present analytical formulae for exact calculation of the ideal bootstrap estimate of expected prediction error for k-NN classifiers and propose a new weighted k-NN classifier based on resampling ideas. The resampling-weighted k-NN classifier replaces the k-NN posterior probability estimates by their expectations under resampling and predicts an unclassified covariate as belonging to the group with the largest resampling expectation. A simulation study and an application involving remotely sensed data show that the resampling-weighted k-NN classifier compares favorably to unweighted and distance-weighted k-NN classifiers.

6.
We examine the behaviour of the sample autocorrelations of a seasonal time series for which the first difference of order s (s ≥ 1) is stationary. The asymptotic distributions of the autocorrelations r'(k) based on uncentered data and of the autocorrelations r(k) based on centered data are derived. In each case, the asymptotic distribution is characterized as a function of the lag k and the parameters of the process. A simulation study was conducted to investigate the rate of convergence of the finite-sample distributions of r(k) and r'(k) to their asymptotic counterparts and to evaluate the effect of centering or not centering the data on the distribution of the autocorrelations.
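
The distinction between r(k) and r'(k) can be illustrated with the usual sample autocorrelation formula, computed with and without subtracting the sample mean. The normalisation below (lagged cross-products divided by the sum of squares) is the common textbook form and is assumed here; the paper's exact definition may differ. The function name and the `center` flag are hypothetical.

```python
def autocorrelation(series, lag, center=True):
    """Sample autocorrelation at the given lag.

    center=True  subtracts the sample mean first (the centered r(k)).
    center=False uses the raw values (the uncentered r'(k)).
    """
    n = len(series)
    mean = sum(series) / n if center else 0.0
    dev = [x - mean for x in series]
    denominator = sum(d * d for d in dev)
    numerator = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numerator / denominator
```

For the short trending series 1, 2, 3, 4 at lag 1, the centered value is 0.25 while the uncentered value is 20/30, which shows how strongly centering can change the statistic when the level of the series is far from zero.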

7.
Consider the multiple hypothesis testing problem of controlling the generalized familywise error rate k-FWER, the probability of at least k false rejections. We propose a plug-in procedure based on estimating the number of true null hypotheses. Assuming independence of the p-values corresponding to the true null hypotheses, we first introduce the least favorable configuration (LFC) of the k-FWER for the Bonferroni-type plug-in procedure, and then construct a plug-in k-FWER-controlling procedure based on the LFC. For dependent p-values, we establish asymptotic k-FWER control under some mild conditions. Simulation studies suggest a great improvement over the generalized Bonferroni test and the generalized Holm test.
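
As a point of reference, the generalized Bonferroni test used as a baseline in this abstract rejects a hypothesis when its p-value is at most kα/m, where m is the number of hypotheses. The sketch below shows only that baseline; it is not the plug-in procedure proposed in the paper, whose estimator of the number of true nulls is not given here. The function name is illustrative.

```python
def generalized_bonferroni(p_values, k, alpha=0.05):
    """Generalized Bonferroni test for k-FWER control.

    Rejects hypothesis i when p_i <= k * alpha / m, with m = len(p_values).
    Setting k = 1 gives the ordinary Bonferroni correction.
    Returns a list of booleans (True means rejected).
    """
    m = len(p_values)
    threshold = min(1.0, k * alpha / m)
    return [p <= threshold for p in p_values]
```

Allowing up to k - 1 false rejections raises the per-test threshold by a factor of k, which is why k-FWER procedures reject more hypotheses than their FWER counterparts.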

8.
In consumer preference studies, it is common to seek a complete ranking of a variety of, say N, alternatives or treatments. Unfortunately, as N increases, it becomes progressively more confusing and undesirable for respondents to rank all N alternatives simultaneously. Moreover, the investigators may only be interested in consumers’ top few choices. Therefore, it is desirable to accommodate the setting where each survey respondent ranks only her/his most preferred k (k ≪ N) alternatives. In this paper, we propose a simple procedure to test the independence of N alternatives and the top-k ranks, such that the value of k can be predetermined before securing a set of partially ranked data or be at the discretion of the investigator in the presence of complete ranking data. The asymptotic distribution of the proposed test under root-n local alternatives is established. We demonstrate our procedure with two real data sets.

9.
Contours may be viewed as the 2D outline of the image of an object. This type of data arises in medical imaging as well as in computer vision; it can be modeled as data on a manifold and studied using statistical shape analysis. Practically speaking, each observed contour, while theoretically infinite dimensional, must be discretized for computations. As such, the coordinates for each contour are obtained at k sampling times, so that the contour is represented as a k-dimensional complex vector. While choosing large values of k will result in closer approximations to the original contour, it will also result in higher computational costs in the subsequent analysis. The goal of this study is to determine reasonable values for k so as to keep the computational cost low while maintaining accuracy. To do this, we consider two methods for selecting sample points and determine lower bounds for k for obtaining a desired level of approximation error using two different criteria. Because this process is computationally inefficient to perform on a large scale, we then develop models for predicting the lower bounds for k based on simple characteristics of the contours.

10.
Making predictions of future realized values of random variables based on currently available data is a frequent task in statistical applications. In some applications, the interest is in obtaining a two-sided simultaneous prediction interval (SPI) that contains at least k out of m future observations with a certain confidence level, based on n previous observations from the same distribution. A closely related problem is to obtain a one-sided upper (or lower) simultaneous prediction bound (SPB) that exceeds (or is exceeded by) at least k out of m future observations. In this paper, we provide a general approach for computing SPIs and SPBs based on data from a particular member of the (log)-location-scale family of distributions with complete or right-censored data. The proposed simulation-based procedure can provide exact coverage probability for complete and Type II censored data. For Type I censored data, our simulation results show that our procedure provides satisfactory results in small samples. We use three applications to illustrate the proposed simultaneous prediction intervals and bounds.

11.
Many mathematical and physical problems reduce to finding a root of a real function f. This kind of equation is an inverse problem and is difficult to solve. In engineering sciences especially, the analytical expression of the function f is unknown to the experimenter, but it can be measured at each point x_k, with M(x_k) as expected value and induced error ξ_k. The aim is to approximate the unique root θ under some assumptions on the function f and the errors ξ_k. We use a stochastic approximation algorithm that constructs a sequence (x_k), k ≥ 1. We establish the almost complete convergence of the sequence (x_k) to the exact root θ when the errors (ξ_k) are quasi-associated, and we illustrate the method with numerical examples to show its efficiency.
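
A stochastic approximation scheme of the kind described can be sketched with the classical Robbins-Monro recursion x_{k+1} = x_k − a_k · (noisy measurement at x_k), with step sizes a_k = a / k. This is offered only as an illustration: the abstract does not specify the algorithm's step sizes, the test function below is hypothetical, and the errors here are independent Gaussian draws rather than the quasi-associated errors studied in the paper.

```python
import random


def robbins_monro(noisy_f, x0, n_steps=5000, a=1.0):
    """Approximate a root of f from noisy evaluations of an increasing f.

    Uses the step sizes a_k = a / k (an assumed, classical choice).
    """
    x = x0
    for k in range(1, n_steps + 1):
        x = x - (a / k) * noisy_f(x)
    return x


# Hypothetical example: f(x) = 2 * (x - 3) has its unique root at theta = 3.
rng = random.Random(0)


def noisy_measurement(x):
    """Return f(x) plus an independent standard Gaussian error."""
    return 2.0 * (x - 3.0) + rng.gauss(0.0, 1.0)


root_estimate = robbins_monro(noisy_measurement, x0=0.0)
```

The decreasing steps average out the measurement noise, so `root_estimate` settles close to 3 even though no single evaluation of f is exact.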

12.
In this paper, we propose a general kth correlation coefficient between the density function and distribution function of a continuous variable as a measure of symmetry and asymmetry. We first propose a root-n moment-based estimator of the kth correlation coefficient and present its asymptotic results. Next, we consider statistical inference for the kth correlation coefficient using the empirical likelihood (EL) method. The EL statistic is shown to follow asymptotically a standard chi-squared distribution. Last, we propose a residual-based estimator of the kth correlation coefficient for a parametric regression model to test whether the density function of the true model error is symmetric. We present the asymptotic results for the residual-based kth correlation coefficient estimator and also construct its EL-based confidence intervals. Simulation studies are conducted to examine the performance of the proposed estimators, and we also use them to analyze an air quality dataset.

13.
A random effects model for analyzing mixed longitudinal normal and count outcomes, with and without the possibility of nonignorable missing outcomes, is presented. The count response is inflated at two points (k and l), and the (k, l)-Hurdle power series is used as its distribution. The new distribution contains, as special submodels, several important distributions, which are discussed, such as the (k, l)-Hurdle Poisson, (k, l)-Hurdle negative binomial and (k, l)-Hurdle binomial distributions, among others. Random effects are used to take into account the correlation between longitudinal outcomes and inflation parameters. A full likelihood-based approach is used to yield maximum likelihood estimates of the model parameters. A simulation study is performed in which the (k, l)-Hurdle Poisson, (k, l)-Hurdle negative binomial and (k, l)-Hurdle binomial distributions are considered for the count outcome. To illustrate the application of such modelling, longitudinal data on body mass index and the number of damaged joints are analyzed.

14.
Through random cut‐points theory, the author extends inference for ordered categorical data to the unspecified continuum underlying the ordered categories. He shows that a random cut‐point Mann‐Whitney test yields slightly smaller p‐values than the conventional test for most data. However, when at least P% of the data lie in one of the k categories (with P = 80 for k = 2, P = 67 for k = 3,…, P = 18 for k = 30), he also shows that the conventional test can yield much smaller p‐values, and hence misleadingly liberal inference for the underlying continuum. The author derives formulas for exact tests; for k = 2, the Mann‐Whitney test is but a binomial test.

15.
Sequential order statistics are an extension of ordinary order statistics. They model the successive failure times in sequential k-out-of-n systems, where the failures of components may affect the residual lifetimes of the remaining ones. In this paper, we consider the residual lifetimes of the components after the kth failure in the sequential (n − k + 1)-out-of-n system. We extend some results on the joint distribution of the residual lifetimes of the remaining components in an ordinary (n − k + 1)-out-of-n system, presented in Bairamov and Arnold (Stat Probab Lett 78(8):945–952, 2008), to the case of the sequential (n − k + 1)-out-of-n system.

16.
In this paper, we study an inference problem for a stochastic model where k deterministic Lotka–Volterra systems of ordinary differential equations (ODEs) are perturbed with k pairs of random errors. The k deterministic systems describe the ecological interaction between k predator–prey populations and depend on unknown parameters. We consider the problem of testing the homogeneity of the k pairs of interaction parameters of the ODEs. We assume that the k pairs of random errors are independent and that each pair follows correlated Ornstein–Uhlenbeck processes. Thus, we extend the stochastic model suggested in Froda and Colavita [2005. Estimating predator–prey systems via ordinary differential equations with closed orbits. Aust. N.Z. J. Stat. 2, 235–254] as well as in Froda and Nkurunziza [2007. Prediction of predator–prey populations modeled by perturbed ODE. J. Math. Biol. 54, 407–451], where k = 1. Under this statistical model, we propose a likelihood ratio test and study its asymptotic properties. Finally, we highlight the performance of our method through some simulation studies.

17.
Let {x_ij (1 ≤ j ≤ n_i) | i = 1, 2, …, k} be k independent samples of size n_j from respective distribution functions F_j(x) (1 ≤ j ≤ k). A classical statistical problem is to test whether these k samples came from a common distribution function F(x), whose form may or may not be known. In this paper, we consider the complementary problem of estimating the distribution functions suspected to be homogeneous in order to improve on the basic estimator, known as the “empirical distribution function” (edf), in an asymptotic setup. Accordingly, we consider four additional estimators, namely the restricted estimator (RE), the preliminary test estimator (PTE), the shrinkage estimator (SE), and the positive rule shrinkage estimator (PRSE), and study their characteristic properties based on the mean squared error (MSE) and relative risk efficiency (RRE), with tables and graphs. We observed that for k ≥ 4 the positive rule SE performs uniformly better than both the shrinkage and the unrestricted estimators, while the PTE works reasonably well for k < 4.

18.
In this paper, we deal with the estimation, under a semi-parametric framework, of the Value-at-Risk (VaR) at a level p, the size of the loss that occurs with a small probability p. In this context, the classical VaR estimators are the Weissman–Hill estimators, based on any intermediate number k of top-order statistics. But these VaR estimators do not enjoy the adequate linear property of quantiles, contrary to the PORT VaR estimators, which depend on an extra tuning parameter q, with 0 ≤ q < 1. We shall here consider ‘quasi-PORT’ reduced-bias VaR estimators, for which such a linear property is obtained approximately. They are based on a partially shifted version of a minimum-variance reduced-bias (MVRB) estimator of the extreme value index (EVI), the primary parameter in Statistics of Extremes. Due to the stability in k of the MVRB EVI and associated VaR estimates, we propose the use of a heuristic stability criterion for the choice of k and q, and provide applications of the methodology to simulated data and to log-returns of financial stocks.

19.
One of the most popular algorithms for partitioning data into k clusters is the k-means clustering algorithm. Since this method relies on some basic conditions, such as the existence of the mean and a finite variance, it is unsuitable for data with infinite variance, such as data from heavy-tailed distributions. The Pitman Measure of Closeness (PMC) is a criterion that shows how close an estimator is to its parameter relative to another estimator. In this article, using PMC and building on k-means clustering, a new distance and clustering algorithm is developed for heavy-tailed data.

20.
In this article, we deal with semi-parametric corrected-bias estimation of a positive extreme value index (EVI), the primary parameter in statistics of extremes. In this context, the classical EVI-estimators are the Hill estimators, based on any intermediate number k of top-order statistics. But these EVI-estimators are not location-invariant, contrary to the PORT-Hill estimators, which depend on an extra tuning parameter q, with 0 ≤ q < 1, and where PORT stands for peaks over random threshold. On the basis of second-order minimum-variance reduced-bias (MVRB) EVI-estimators, we shall here consider PORT-MVRB EVI-estimators. Due to the stability in k of the MVRB EVI-estimates, we propose the use of a heuristic algorithm for the adaptive choice of k and q, based on the bias pattern of the estimators as a function of k. Applications in the fields of insurance and finance will be provided.
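
The classical Hill estimator referred to in this abstract averages the log-excesses of the k largest observations over the (k + 1)th largest one. The sketch below implements only that classical estimator, not the PORT or MVRB variants, whose details are not given here; the function name is illustrative, and a strictly positive threshold is assumed.

```python
import math


def hill_estimator(sample, k):
    """Hill estimate of a positive extreme value index.

    Averages log(X_(n-i+1) / X_(n-k)) over i = 1..k, where X_(1) <= ... <= X_(n)
    are the order statistics of the sample.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = ordered[n - k - 1]  # the (k + 1)th largest observation
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    return sum(math.log(ordered[n - i] / threshold) for i in range(1, k + 1)) / k
```

Because the estimate depends on ratios to the threshold, adding a constant to every observation changes it, which is the lack of location invariance that motivates the PORT versions; plotting the estimate against k is the usual way to look for a stable region.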


Copyright©北京勤云科技发展有限公司  京ICP备09084417号