Similar Documents
20 similar documents found (search time: 31 ms).
1.
In this paper, we consider paired survival data in which the two members of each pair are subject to a common right-censoring time and are dependent on each other. Assuming a Marshall–Olkin multivariate Weibull distribution for the joint distribution of the lifetimes (X1, X2) and the censoring time X3, we derive the joint density of the observed data and obtain maximum likelihood estimators, Bayes estimators, and posterior regret gamma minimax estimators of the unknown parameters under squared error loss and weighted squared error loss functions. We compare the performance of the maximum likelihood and Bayes estimators numerically in terms of bias and estimated mean squared error.

2.
Much effort has been devoted to deriving Edgeworth expansions for various classes of statistics that are asymptotically normally distributed, with derivations tailored to the individual structure of each class. Expansions with smaller error rates are needed for more accurate statistical inference. Two such Edgeworth expansions are derived analytically in this paper. One is a two-term expansion for the standardized U-statistic of order m, m ≥ 3, with an error rate o(n^(-1)). The other is an expansion with the same error rate for the distribution of the standardized V-statistic of the same order. In deriving the Edgeworth expansion, we exploit the close connection between the V- and U-statistics, which permits us first to derive the needed expansion for the related U-statistic and then extend it to the V-statistic, taking into account the estimation of all difference terms between the two statistics.
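For context, a minimal LaTeX sketch of the classical one-term Edgeworth expansion for a standardized i.i.d. sum; the paper's two-term expansions for U- and V-statistics refine this kind of approximation down to error o(n^(-1)):

```latex
% One-term Edgeworth expansion for the standardized sum
% S_n = (X_1 + \dots + X_n - n\mu) / (\sigma \sqrt{n}),
% where \kappa_3 is the third cumulant of the standardized summand:
P(S_n \le x) = \Phi(x) + \frac{\kappa_3}{6\sqrt{n}} (1 - x^2)\,\varphi(x)
             + o\!\left(n^{-1/2}\right)
```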

3.
ABSTRACT

We derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we show that the worst-case error of this estimate is not much worse than that of the training error estimate; see Kearns M, Ron D. [Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 1999;11:1427–1453]. General loss functions and classes of predictors with finite VC dimension are considered. Our focus is on proving the consistency of the various cross-validation procedures, and we point out the interest of each procedure in terms of rates of convergence. An interesting consequence is that the size of the test sample is not required to grow to infinity for the cross-validation procedure to be consistent.
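As a minimal illustration of the two estimates being compared (my own sketch in Python with placeholder data and model, not the paper's construction):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data; any (X, y) classification set works here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Training-error estimate: optimistically biased downward.
train_error = 1.0 - clf.fit(X, y).score(X, y)

# Cross-validation estimate: error averaged over held-out folds.
cv_error = 1.0 - cross_val_score(clf, X, y, cv=10).mean()

print(f"training-error estimate: {train_error:.3f}")
print(f"10-fold CV estimate:     {cv_error:.3f}")
```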

4.
We consider several time series, and for each of them, we fit an appropriate dynamic parametric model. This produces serially independent error terms for each time series. The dependence between these error terms is then modelled by a regime-switching copula. The EM algorithm is used for estimating the parameters, and a sequential goodness-of-fit procedure based on Cramér–von Mises statistics is proposed to select the appropriate number of regimes. Numerical experiments are performed to assess the validity of the proposed methodology. As an example application, we evaluate a European put-on-max option on the returns of two assets. To facilitate the use of our methodology, we have built an R package, HMMcopula, available on CRAN. The Canadian Journal of Statistics 48: 79–96; 2020. © 2020 Statistical Society of Canada

5.
Measurement error and autocorrelation often exist in quality control applications, and both adversely affect a control chart's performance. To counteract the undesired effect of autocorrelation, we build up the samples from non-neighbouring items, according to the time they were produced. To counteract the undesired effect of measurement error, we measure the quality characteristic of each sampled item several times. The chart's performance is assessed when multiple measurements are applied and the samples are built by taking one item from the production line and skipping one, two or more before selecting the next.
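A rough simulation sketch of the scheme described above; the AR(1) line, skip counts, subgroup size and noise level are all assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def subgroup_means(skip, n_items=5, n_repeats=3, n_samples=5000,
                   phi=0.7, sigma_m=0.5):
    """Means of rational subgroups when every (skip+1)-th produced item is
    sampled from an AR(1) line and each item is measured n_repeats times."""
    T = n_samples * n_items * (skip + 1) + 100
    x = np.empty(T)
    x[0] = rng.normal()
    for t in range(1, T):                       # AR(1) quality characteristic
        x[t] = phi * x[t - 1] + rng.normal()
    items = x[100::skip + 1][:n_samples * n_items].reshape(n_samples, n_items)
    # Averaging n_repeats noisy readings per item shrinks measurement error.
    noise = rng.normal(0.0, sigma_m, (n_repeats,) + items.shape).mean(axis=0)
    return (items + noise).mean(axis=1)

for skip in (0, 1, 2):
    print(f"skip={skip}: sd of subgroup mean = {subgroup_means(skip).std():.3f}")
```

Skipping items weakens the positive within-subgroup correlation, so the standard deviation of the subgroup mean shrinks as the skip grows.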

6.
Daniil Ryabko, Statistics, 2013, 47(1):121–128
Given a discrete-valued sample X1, …, Xn, we wish to decide whether it was generated by a distribution belonging to a family H0 or by a distribution belonging to a family H1. In this work, we assume that all distributions are stationary ergodic and make no further assumptions (e.g. no independence or mixing-rate assumptions). We would like a test whose probability of error (both Type I and Type II) is uniformly bounded: more precisely, we require that for each ε there exists a sample size n such that the probability of error is upper-bounded by ε for samples longer than n. We find some necessary and some sufficient conditions on H0 and H1 under which a consistent test (with this notion of consistency) exists. These conditions are topological, with respect to the topology of distributional distance.

7.
Five sampling schemes for price index construction – one cut-off sampling technique and four probability-proportional-to-size (pps) methods – are evaluated by comparing their performance on a homescan market research data set across 21 months for each of the 13 Classification of Individual Consumption by Purpose (COICOP) food groups. Classifications are derived for each of the food groups, and the population index value is used as a reference to derive performance error measures, such as root mean squared error, bias and standard deviation, for each food type. Repeated samples are taken for each of the pps schemes, and the resulting error measures for three of the pps schemes are analysed using regression to assess the overall effect of sampling scheme and COICOP group while controlling for sample size, month and population index value. Cut-off sampling appears to perform less well than the pps methods, and multistage pps seems to have no advantage over its single-stage counterpart. The jackknife resampling technique is also explored as a means of estimating the standard error of the index and compared with the actual results from repeated sampling.
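A minimal sketch of the delete-one jackknife standard error mentioned above, applied to a generic index statistic; the Jevons-type geometric-mean index and the synthetic price relatives are placeholders, not the paper's estimator:

```python
import numpy as np

def jackknife_se(data, stat):
    """Delete-one jackknife standard error of stat(data)."""
    n = len(data)
    reps = np.array([stat(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

rng = np.random.default_rng(1)
price_relatives = rng.lognormal(mean=0.02, sigma=0.1, size=200)

# Placeholder index: geometric mean of price relatives (Jevons-type).
jevons = lambda r: np.exp(np.mean(np.log(r)))

print(f"index        = {jevons(price_relatives):.4f}")
print(f"jackknife SE = {jackknife_se(price_relatives, jevons):.4f}")
```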

8.
This paper studies a system with multiple infinite-server queues that are modulated by a common background process. If this background process, modeled as a finite-state continuous-time Markov chain, is in state j, then the arrival rate into the i-th queue is λ_{i,j}, whereas the service times of customers present in this queue are exponentially distributed with mean 1/μ_{i,j}; at each of the individual queues all customers present are served in parallel (reflecting their infinite-server nature).

Three types of results are presented: (i) we derive differential equations for the probability-generating functions corresponding to the distributions of the transient and stationary numbers of customers (jointly in all queues); (ii) we set up recursions for the (joint) moments; and (iii) we establish a central limit theorem in the asymptotic regime in which the arrival rates and the transition rates of the background process grow large simultaneously.
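A rough Gillespie-style simulation sketch of a single infinite-server queue modulated by a two-state background chain; all rates are hypothetical, and the closed-form check relies on the service rate being state-independent:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parameters: q[j] is the rate of leaving background state j,
# lam[j] the arrival rate and mu[j] the per-customer service rate in state j.
q, lam, mu = np.array([1.0, 2.0]), np.array([5.0, 20.0]), np.array([1.0, 1.0])

def simulate(T=5000.0):
    """Gillespie simulation of a Markov-modulated M/M/infinity queue;
    returns the time-average number of customers in the system."""
    t, j, n, area = 0.0, 0, 0, 0.0
    while t < T:
        total = q[j] + lam[j] + n * mu[j]
        dt = rng.exponential(1.0 / total)
        area += n * dt
        t += dt
        u = rng.uniform(0.0, total)
        if u < q[j]:                    # background state flips (2 states)
            j = 1 - j
        elif u < q[j] + lam[j]:         # arrival
            n += 1
        else:                           # one of the n parallel services ends
            n -= 1
    return area / t

pi = np.array([q[1], q[0]]) / (q[0] + q[1])   # stationary background law
print(f"simulated mean number in system:   {simulate():.2f}")
# With mu state-independent, Little's law gives the exact stationary mean:
print(f"pi-weighted lam/mu (Little's law): {pi @ (lam / mu):.2f}")
```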

9.
In multiple linear regression analysis, each lower-dimensional subspace L of a known linear subspace M of ℝ^n corresponds to a non-empty subset of the columns of the regressor matrix. For a fixed subspace L, the C_p statistic is an unbiased estimator of the mean squared error when the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the C_p statistic that can also be used to estimate this mean squared error. The C_p statistic and its truncated versions are compared on two example data sets, illustrating that use of the truncated versions may result in models different from those selected by the standard C_p.
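For reference, the familiar Mallows form of the C_p statistic (the article works with a more general subspace formulation; this special case is shown only as background):

```latex
% Mallows' C_p for a submodel with p regressors, residual sum of squares
% RSS_p, and full-model error-variance estimate \hat{\sigma}^2:
C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} - n + 2p
```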

10.
In genetic association studies, detecting phenotype–genotype association is a primary goal. We assume that the relationship between the data (phenotype, genetic markers and environmental covariates) can be modeled by a generalized linear model. The number of markers is allowed to be far greater than the number of individuals in the study. A multivariate score statistic is used to test each marker for association with a phenotype. We assume that the test statistics asymptotically follow a multivariate normal distribution under the complete null hypothesis of no phenotype–genotype association. We present the familywise error rate order-k approximation method for finding a local significance level (equivalently, an adjusted p-value) for each test such that the familywise error rate is controlled; the special case k=1 gives the Šidák method. As a by-product, an effective number of independent tests can be defined. Furthermore, if environmental covariates and genetic markers are uncorrelated, or no environmental covariates are present, we show that the covariances between score statistics depend on the genetic markers alone. This not only leads to more efficient calculations but also to a local significance level that is determined solely by the collection of markers used, independent of the phenotypes and environmental covariates of the experiment at hand.
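A minimal sketch of the k=1 special case: under m independent tests, the Šidák local level below controls the familywise error rate at alpha:

```python
def sidak_local_level(alpha: float, m: int) -> float:
    """Per-test significance level giving FWER alpha under m independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

print(sidak_local_level(0.05, 10**6))  # ~5.13e-08 for a million markers
```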

11.
Emmanuel Caron, Statistics, 2019, 53(4):885–902
In this paper, we consider the usual linear regression model in the case where the error process is assumed to be strictly stationary. We use a result of Hannan (Central limit theorems for time series regression. Probab. Theory Relat. Fields 1973;26(2):157–170), who proved a central limit theorem for the usual least squares estimator under general conditions on the design and on the error process. For any design satisfying Hannan's conditions, we define an estimator of the covariance matrix and prove its consistency under very mild conditions. As an application, we show how to modify the usual tests on the linear model in this dependent context so that the Type I error rate remains asymptotically correct, and we illustrate the performance of this procedure through different sets of simulations.

12.
This is a comparative study of various clustering and classification algorithms applied to differentiate cancer and non-cancer protein samples using mass spectrometry data. Our study demonstrates the usefulness of a feature selection step prior to applying a machine learning tool. A natural and common choice of feature selection tool is the collection of marginal p-values obtained from t-tests of the intensity differences at each m/z ratio in the cancer versus non-cancer samples. We study how the choice of cutoff, in terms of overall Type I error rate control, affects the performance of the clustering and classification algorithms that use the significant features. For the classification problem, we also consider m/z selection using the importance measures computed by Breiman's random forest algorithm. Using a data set of proteomic analysis of serum from ovarian cancer patients and from cancer-free individuals in the Food and Drug Administration and National Cancer Institute Clinical Proteomics Database, we undertake a comparative study of the net effect of the combination of machine learning algorithm, feature selection tool and cutoff criterion on performance, as measured by an appropriate error rate.
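A minimal sketch of the two feature-selection routes described above, on synthetic data; the 0.05 cutoff, dimensions and effect size are assumed for illustration:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
# Synthetic "spectra": 100 samples x 500 m/z intensities, 20 informative.
X = rng.normal(size=(100, 500))
y = np.repeat([0, 1], 50)
X[y == 1, :20] += 0.8                      # shift the informative features

# Route 1: marginal t-tests, keep features below a p-value cutoff.
_, pvals = ttest_ind(X[y == 0], X[y == 1], axis=0)
selected_t = np.where(pvals < 0.05)[0]

# Route 2: random forest importances, keep the top 20 features.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
selected_rf = np.argsort(rf.feature_importances_)[::-1][:20]

print(f"t-test keeps {selected_t.size} features")
print(f"overlap with RF top-20: {np.intersect1d(selected_t, selected_rf).size}")
```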

13.
Estimation of prediction accuracy is important when our aim is prediction. The training error is an easy estimate of the prediction error, but it has a downward bias. On the other hand, K-fold cross-validation has an upward bias; this bias may be negligible in leave-one-out cross-validation, but it sometimes cannot be neglected in 5-fold or 10-fold cross-validation, which are favoured from a computational standpoint. Since the training error is biased downward and K-fold cross-validation is biased upward, an appropriate estimate should exist within a family that connects the two. In this paper, we investigate two families that connect the training error and K-fold cross-validation.
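One natural connecting family is a convex combination of the two estimates; the sketch below is my own illustration and not necessarily either of the families investigated in the paper:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)
model = Ridge(alpha=1.0)

train_mse = np.mean((y - model.fit(X, y).predict(X)) ** 2)       # biased down
cv_mse = -cross_val_score(model, X, y, cv=5,
                          scoring="neg_mean_squared_error").mean()  # biased up

# One-parameter family connecting the two estimates (lambda in [0, 1]).
for lam in (0.0, 0.5, 1.0):
    print(f"lambda={lam}: {(1 - lam) * train_mse + lam * cv_mse:.2f}")
```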

14.
Approximate normality and unbiasedness of the maximum likelihood estimate (MLE) of the long-memory parameter H of a fractional Brownian motion hold reasonably well for sample sizes as small as 20 if the mean and scale parameter are known. We show in a Monte Carlo study that, if the latter two parameters are unknown, the bias and variance of the MLE of H both increase substantially. We also show that the bias can be reduced by using a parametric bootstrap procedure. In very large samples, maximum likelihood estimation becomes problematic because of the large dimension of the covariance matrix that must be inverted. To overcome this difficulty, we propose a maximum likelihood method based upon first differences of the data, which form a short-memory process. We split the data into a number of contiguous blocks, each consisting of a relatively small number of observations, so that computing the likelihood function within a block presents no difficulty. We form a pseudo-likelihood function as the product of the likelihood functions of the blocks and provide a formula for the standard error of the resulting estimator of H. A Monte Carlo study shows that this formula provides a good approximation to the true standard error. The computation time required to obtain the estimate and its standard error from large data sets is an order of magnitude less than that required for the widely used Whittle estimator. Application of the methodology is illustrated on two data sets.
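A rough sketch of the blockwise idea: the first differences of fractional Brownian motion form fractional Gaussian noise with a known closed-form autocovariance, so each contiguous block contributes an ordinary Gaussian log-likelihood. The block size, sample size and optimizer below are placeholders, not the paper's implementation:

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky
from scipy.optimize import minimize_scalar

def fgn_autocov(lags, H):
    """Autocovariance of unit-variance fractional Gaussian noise."""
    k = np.abs(lags).astype(float)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H)
                  + np.abs(k - 1) ** (2 * H))

rng = np.random.default_rng(4)
n, H_true = 2000, 0.8
# Synthetic fGn = first differences of an fBm path (exact via Cholesky).
C = toeplitz(fgn_autocov(np.arange(n), H_true))
diffs = cholesky(C, lower=True) @ rng.standard_normal(n)

def neg_pseudo_loglik(H, x, block=50):
    """Minus the blockwise pseudo-log-likelihood (up to a constant)."""
    Sigma = toeplitz(fgn_autocov(np.arange(block), H))
    _, logdet = np.linalg.slogdet(Sigma)
    Sinv = np.linalg.inv(Sigma)
    blocks = x[: len(x) // block * block].reshape(-1, block)
    quad = np.einsum("ij,jk,ik->", blocks, Sinv, blocks)
    return 0.5 * (blocks.shape[0] * logdet + quad)

H_hat = minimize_scalar(lambda H: neg_pseudo_loglik(H, diffs),
                        bounds=(0.05, 0.95), method="bounded").x
print(f"true H = {H_true}, blockwise MLE H_hat = {H_hat:.3f}")
```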

15.
In this paper we consider a linear regression model with omitted relevant regressors and multivariate t error terms. The explicit formula for the Pitman nearness criterion of the Stein-rule (SR) estimator relative to the ordinary least squares (OLS) estimator is derived. It is shown numerically that the dominance of the SR estimator over the OLS estimator under the Pitman nearness criterion extends to the case of multivariate t errors when the specification error is not severe, but that it does not extend when the specification error is severe. This research is partially supported by the Grants-in-Aid for the 21st Century COE program.
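For background, one common form of the Stein-rule estimator in the linear regression model; this is the textbook version, shown under my own notation rather than the paper's exact specification:

```latex
% Stein-rule estimator in y = X\beta + u, with OLS estimator
% b = (X'X)^{-1} X'y, residuals e = y - Xb, k regressors and n observations;
% dominance over OLS requires 0 < a \le 2(k-2)/(n-k+2):
\hat{\beta}_{\mathrm{SR}} = \left( 1 - \frac{a\, e'e}{b' X'X b} \right) b
```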

16.
In this paper, we study inference in a heteroscedastic measurement error model with known error variances. Instead of the normal distribution for the random components, we develop a model that assumes a skew-t distribution for the true covariate and a centred Student's t distribution for the error terms. The proposed model accommodates skewness and heavy-tailedness in the data, while allowing the degrees of freedom of the two distributions to differ. Maximum likelihood estimates are computed via an EM-type algorithm. The behaviour of the estimators is assessed in a simulation study. Finally, the approach is illustrated with a real data set from a methods-comparison study in Analytical Chemistry.

17.
ABSTRACT

In this article, we consider a sampling scheme for record-breaking data, called record ranked set sampling. We compare the proposed scheme with the well-known inverse sampling scheme for record values when the underlying distribution follows the proportional hazard rate model. Various point estimators are obtained under each sampling scheme and compared with respect to mean squared error and the Pitman measure of closeness. In most situations, the new sampling scheme provides more efficient estimators than their counterparts. Finally, one data set is analyzed for illustrative purposes.

18.
To draw inferences in dichotomous classifications subject to misclassification, possibly with repeated classifications, the maximum likelihood method is commonly used, mainly because of its efficiency in estimating the parameters of a mixture of two binomial distributions. A simpler, operationally easier alternative is the simple majority method: each of n items is classified r times as conforming or non-conforming, and the final classification of the item is determined by the most frequent class. This method yields lower mean squared errors than the maximum likelihood and method-of-moments estimators and is asymptotically efficient. In this paper, we introduce a new approach in which not all r repeated classifications of each item need be carried out: each of the n items is classified sequentially as conforming or non-conforming, and the process stops as soon as the count of conforming or non-conforming classifications reaches a fixed integer a. We show by Monte Carlo simulation that this procedure has a lower mean squared error than the simple majority method for a comparable number of repeated classifications.
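A minimal simulation sketch of the sequential stopping rule; the per-classification accuracy p_correct is an assumed value for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def classify_sequential(true_conforming, a, p_correct=0.8):
    """Classify one item repeatedly until either verdict occurs a times;
    each repetition is correct with probability p_correct."""
    counts = {True: 0, False: 0}
    n_class = 0
    while max(counts.values()) < a:
        verdict = true_conforming if rng.random() < p_correct \
                  else not true_conforming
        counts[verdict] += 1
        n_class += 1
    return counts[True] > counts[False], n_class

results = [classify_sequential(True, a=3) for _ in range(10000)]
accuracy = np.mean([r[0] for r in results])
avg_reps = np.mean([r[1] for r in results])   # at most 2a - 1 = 5 here
print(f"accuracy = {accuracy:.3f}, average classifications = {avg_reps:.2f}")
```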

19.
Matthias Kohl, Statistics, 2013, 47(4):473–488
Bednarski and Müller [Optimal bounded influence regression and scale M-estimators in the context of experimental design, Statistics 35 (2001), pp. 349–369] introduced a class of bounded-influence M-estimates for the simultaneous estimation of regression and scale in the linear model with normal errors, obtained by solving the corresponding normal location and scale problem at each design point. This restricts the proposal to regressor distributions with finite support. Building on their approach, we propose a slightly extended class of M-estimates that is not restricted to finite support and is numerically easier to handle. Moreover, we employ the even more general class of asymptotically linear (AL) estimators, which in addition is not restricted to normal errors. The superiority of the AL estimates is demonstrated by numerical comparisons of the maximum asymptotic mean squared error over infinitesimal contamination neighbourhoods.

20.
In this paper, we analytically derive the exact formula for the mean squared error (MSE) of two weighted-average (WA) estimators of each individual regression coefficient. Further, we carry out numerical evaluations to investigate the small-sample properties of the WA estimators and compare their MSE performance with that of other shrinkage estimators and the usual OLS estimator. Our numerical results show that (1) the WA estimators have smaller MSE than the other shrinkage estimators and the OLS estimator over a wide region of the parameter space, and (2) the range over which the relative MSE of the WA estimator is smaller than that of the OLS estimator narrows as the number of explanatory variables k increases.
