Similar articles
 20 similar articles found.
1.
A classifier is constant if it classifies all examples into just one class. Call a training data set “(linearly) indiscriminate” if a constant classifier minimizes, among all linear classifiers, the misclassification rate on the training data set. General sufficient conditions are presented for the probability of getting an indiscriminate data set to be positive. Similarly, general sufficient conditions are also presented for the probability of getting an indiscriminate data set to be 0.

A small simulation study examines how our results are reflected in the behavior of logistic regression.

2.
Estimators of the intercept parameter of a simple linear regression model involve the slope estimator. In this article, we consider the estimation of the intercept parameters of two linear regression models with normal errors when it is suspected a priori, but not known with certainty, that the two regression lines are parallel. We also introduce a coefficient of distrust as a measure of the degree of lack of trust in the uncertain prior information regarding the equality of the two slopes. Three different estimators of the intercept parameters are defined by using the sample data, the non-sample uncertain prior information, an appropriate test statistic, and the coefficient of distrust. The relative performances of the unrestricted, shrinkage restricted and shrinkage preliminary test estimators are investigated based on analyses of the bias and risk functions under quadratic loss. If the prior information is precise and the coefficient of distrust is small, the shrinkage preliminary test estimator outperforms the other estimators. An example based on a medical study is used to illustrate the method.

3.
This paper develops two sampling designs to create artificially stratified samples. These designs use a small set of experimental units to determine their relative ranks without measurement. In each set, the units are ranked by all available observers (rankers), with ties whenever the units cannot be ranked with high confidence. The rankings from all the observers are then combined in a meaningful way to create a single weight measure. This weight measure is used to create judgment strata in both designs. The first design constructs the strata through judgment post‐stratification after the data has been collected. The second design creates the strata before any measurements are made on the experimental units. The paper constructs estimators and confidence intervals, and develops testing procedures for the mean and median of the underlying distribution based on these sampling designs. We show that the proposed sampling designs provide a substantial improvement over their competitor designs in the literature. The Canadian Journal of Statistics 41: 304–324; 2013 © 2013 Statistical Society of Canada

4.
Many split-plot×split-block (SPSB) type experiments used in agriculture, biochemistry or plant protection are designed to study new crop plant cultivars or chemical agents. In these experiments it is usually very important to compare test treatments with the so-called control treatments. It sometimes happens, however, that the experimental material is limited and does not allow a complete (orthogonal) SPSB design to be used. In this paper we propose a non-orthogonal SPSB design for consideration. Two cases of the design are presented, i.e., when its incompleteness is connected with a crossed treatment structure only or with a nested treatment structure only. It is assumed that the factor levels connected with the incompleteness of the design are split into two groups: a set of test treatments and a set of control treatments. The method of construction involves applying augmented block designs for some of the factor levels. In modelling data obtained from such experiments, the structure of the experimental material and an appropriate randomization scheme for the different kinds of units before they enter the experiment are taken into account. For the analysis of the resulting randomization model, the approach typical of multistratum experiments with orthogonal block structure is adapted. The proposed statistical analysis of the resulting linear model includes estimation of parameters and testing of general and particular hypotheses defined by the (basic) treatment contrasts, with special reference to the notion of general balance.

5.
The article details a sampling scheme which can lead to a reduction in sample size and cost in clinical and epidemiological studies of association between a count outcome and a risk factor. We show that inference in two common generalized linear models for count data, Poisson and negative binomial regression, is improved by using a ranked auxiliary covariate, which guides the sampling procedure. This type of sampling has typically been used to improve inference on a population mean. The novelty of the current work is its extension to log-linear models and derivations showing that the sampling technique results in an increase in information as compared to simple random sampling. Specifically, we show that under the proposed sampling strategy the maximum likelihood estimate of the risk factor’s coefficient is improved through an increase in the Fisher information. A simulation study is performed to compare the mean squared error, bias, variance, and power of the sampling routine with simple random sampling under various data-generating scenarios. We also illustrate the merits of the sampling scheme on a real data set from a clinical setting of males with chronic obstructive pulmonary disease. Empirical results from the simulation study and data analysis coincide with the theoretical derivations, suggesting that a significant reduction in sample size, and hence study cost, can be realized while achieving the same precision as a simple random sample.

6.
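Household surveys are an important part of China's system of socio-economic statistical surveys, and sample representativeness directly determines the quality of the statistical data. In multi-stage sampling, the variance of the primary units has the dominant effect on estimation. This paper therefore uses county-level data from the 2010 Sixth National Population Census and adopts a balanced sampling design to obtain a representative sample of primary units, the balanced sample. A post hoc evaluation of the representative sample shows that the sample structure matches the population structure and that the errors of the target estimates are small, which demonstrates the effectiveness of the balanced design used in this paper.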

7.
容越彦  陈光慧 《统计研究》2015,32(12):88-94
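Building on a review of existing model-assisted estimation methods, this paper constructs a semiparametric superpopulation model and combines it with the idea of generalized difference estimation to propose a new model-assisted estimator. Compared with traditional nonparametric and semiparametric regression estimators, the estimator uses less, and more easily obtained, auxiliary information: it needs only the same auxiliary information as the generalized regression estimator, yet it generally achieves higher precision than the generalized regression estimator. It is proved that the estimator is asymptotically design-unbiased and design-consistent, and that its asymptotic design mean squared error equals the variance of the generalized difference estimator. Simulation results show that it performs at least as well as the generalized regression estimator; that the less linear the superpopulation model, the more pronounced its gain in precision over the generalized regression estimator; and that, in the simulations of this paper, a suitably chosen smoothing parameter between 0.04 and 0.12 gives relatively good estimates.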

8.
Unit-level linear mixed models are often used in small area estimation (SAE), and the empirical best linear unbiased prediction (EBLUP) is widely used for the estimation of small area means under such models. However, EBLUP requires population-level auxiliary data, at least area-specific aggregated values. Sometimes population-level auxiliary data are either not available or not consistent with the survey data. We describe an SAE method that uses estimated population auxiliary information. Empirical results show that the proposed method for SAE produces an efficient set of small area estimates.

9.
Unit-level regression models are commonly used in small area estimation (SAE) to obtain an empirical best linear unbiased prediction of small area characteristics. The underlying assumptions of these models, however, may be unrealistic in some applications. Previous work developed a copula-based SAE model where the empirical Kendall's tau was used to estimate the dependence between two units from the same area. In this article, we propose a likelihood framework to estimate the intra-class dependence of the multivariate exchangeable copula for the empirical best unbiased prediction (EBUP) of small area means. One appeal of the proposed approach lies in its accommodation of both parametric and semi-parametric estimation approaches. Under each estimation method, we further propose a bootstrap approach to obtain a nearly unbiased estimator of the mean squared prediction error of the EBUP of small area means. The performance of the proposed methods is evaluated through simulation studies and also by a real data application.

10.
This article develops statistical inference for the general linear models in order restricted randomized (ORR) designs. The ORR designs use the heterogeneity among experimental units to induce a negative correlation structure among responses obtained from different treatment regimes. This negative correlation structure acts as a variance reduction technique for treatment contrast. The parameters of the general linear models are estimated and a generalized F-test is constructed for its components. It is shown that the null distribution of the test statistic can be approximated reasonably well with an F-distribution for moderate sample sizes. It is also shown that the empirical power of the proposed test is substantially higher than the powers of its competitors in the literature. The proposed test and estimator are applied to a data set from a clinical trial to illustrate how one can improve such an experiment.

11.
S. Huet 《Statistics》2015,49(2):239-266
We propose a procedure to test that the expectation of a Gaussian vector is linear against a nonparametric alternative. We consider the case where the covariance matrix of the observations has a block diagonal structure. This framework encompasses regression models with autocorrelated errors, heteroscedastic regression models, mixed-effects models and growth curves. Our procedure does not depend on any prior information about the alternative. We prove that the test is asymptotically of the nominal level and consistent. We characterize the set of vectors on which the test is powerful and prove the classical $\sqrt{\log\log(n)/n}$ convergence rate over directional alternatives. We propose a bootstrap version of the test as an alternative to the initial one and provide a simulation study in order to evaluate both procedures for small sample sizes when the purpose is to test goodness of fit in a Gaussian mixed-effects model. Finally, we illustrate the procedures using a real data set.

12.
Ori Davidov  Chang Yu 《Statistics》2013,47(2):163-173
We provide a method for estimating the sample mean of a continuous outcome in a stratified population using a double sampling scheme. The stratified sample mean is a weighted average of stratum-specific means. It is assumed that the fallible and true outcome data are related by a simple linear regression model in each stratum. The optimal stratified double sampling plan, i.e., the double sampling plan that minimizes the cost of sampling for fixed variances, or alternatively, minimizes the variance for fixed costs, is found and compared to a standard sampling plan. The design parameters are the total sample size and the number of doubly sampled units in each stratum. We show that the optimal double sampling plan is a function of the between-strata and within-strata cost and variance ratios. The efficiency gains, relative to standard sampling plans, under a broad set of conditions, are considerable.

13.
This paper is mainly concerned with modelling data from degradation sample paths over time. It uses a general growth curve model with Box‐Cox transformation, random effects and ARMA(p, q) dependence to analyse a set of such data. A maximum likelihood estimation procedure for the proposed model is derived and future values are predicted, based on the best linear unbiased prediction. The paper compares the proposed model with a nonlinear degradation model from a prediction point of view. Forecasts of failure times with various data lengths in the sample are also compared.

14.
This paper considers the problem of estimating a nonlinear statistical model subject to stochastic linear constraints among unknown parameters. These constraints represent prior information which originates from a previous estimation of the same model using an alternative database. One feature of this specification allows for the design matrix of stochastic linear restrictions to be estimated. The mixed regression technique and the maximum likelihood approach are used to derive the estimator for both the model coefficients and the unknown elements of this design matrix. The proposed estimator, whose asymptotic properties are studied, contains as a special case the conventional mixed regression estimator based on a fixed design matrix. A new test of compatibility between prior and sample information is also introduced. The suggested estimator is tested empirically with both simulated and actual marketing data.

15.
Ranked set sampling (RSS) is a cost-efficient technique for data collection when the units in a population can be easily judgment ranked by any cheap method other than actual measurements. Using auxiliary information in developing statistical procedures for inference about different population characteristics is a well-known approach. In this work, we deal with quantile estimation from a population with known mean when data are obtained according to RSS scheme. Through the simple device of mean-correction (subtract off the sample mean and add on the known population mean), a modified estimator is constructed from the standard quantile estimator. Asymptotic normality of the new estimator and its asymptotic efficiency relative to the original estimator are derived. Simulation results for several underlying distributions show that the proposed estimator is more efficient than the traditional one.
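
The mean-correction device described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mean_corrected_quantile` is hypothetical, the "standard" quantile estimator is taken to be a simple order statistic (the abstract does not say which definition the authors use), and the sample is treated as a plain list of measurements, so the ranked set sampling step itself is not simulated.

```python
import math
import statistics


def mean_corrected_quantile(sample, p, known_mean):
    """Shift a sample p-quantile by (known population mean - sample mean).

    Assumption: the base quantile is the order statistic at position
    ceil(p * n), which may differ from the estimator used in the paper.
    """
    ordered = sorted(sample)
    # Index of the smallest value with at least a fraction p of the data at or below it.
    index = max(0, math.ceil(p * len(ordered)) - 1)
    raw_quantile = ordered[index]
    # Mean-correction: subtract the sample mean, add the known population mean.
    return raw_quantile - statistics.fmean(ordered) + known_mean


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(mean_corrected_quantile(data, 0.5, known_mean=3.5))
```

When the sample mean falls below the known population mean, the correction moves the quantile estimate upward by the same amount, and vice versa.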

16.
This paper considers the problem of estimating a nonlinear statistical model subject to stochastic linear constraints among unknown parameters. These constraints represent prior information which originates from a previous estimation of the same model using an alternative database. One feature of this specification allows for the design matrix of stochastic linear restrictions to be estimated. The mixed regression technique and the maximum likelihood approach are used to derive the estimator for both the model coefficients and the unknown elements of this design matrix. The proposed estimator, whose asymptotic properties are studied, contains as a special case the conventional mixed regression estimator based on a fixed design matrix. A new test of compatibility between prior and sample information is also introduced. The suggested estimator is tested empirically with both simulated and actual marketing data.

17.
Often, the response variables on sampling units are observed repeatedly over time. The sampling units may come from different populations, such as treatment groups. This setting is routinely modeled by a random coefficients growth curve model, and the techniques of general linear mixed models are applied to address the primary research aim. An alternative approach is to reduce each subject’s data to summary measures, such as within-subject averages or regression coefficients. One may then test for equality of means of the summary measures (or functions of them) among treatment groups. Here, we compare by simulation the performance characteristics of three approximate tests based on summary measures and one based on the full data, focusing mainly on accuracy of p-values. We find that performances of these procedures can be quite different for small samples in several different configurations of parameter values. The summary-measures approach performed at least as well as the full-data mixed models approach.

18.
The indirect mechanism of action of immunotherapy causes a delayed treatment effect, producing delayed separation of survival curves between the treatment groups, and violates the proportional hazards assumption. Therefore, using the log-rank test in immunotherapy trial design could result in a severe loss of efficiency. Although few statistical methods are available for immunotherapy trial design that incorporates a delayed treatment effect, Ye and Yu recently proposed the use of a maximin efficiency robust test (MERT) for the trial design. The MERT is a weighted log-rank test that puts less weight on early events and full weight after the delayed period. However, the weight function of the MERT involves an unknown function that has to be estimated from historical data. Here, for simplicity, we propose the use of an approximated maximin test, the V0 test, which is the sum of the log-rank test for the full data set and the log-rank test for the data beyond the lag time point. The V0 test fully uses the trial data and is more efficient than the log-rank test when a lag exists, with relatively little efficiency loss when no lag exists. The sample size formula for the V0 test is derived. Simulations are conducted to compare the performance of the V0 test to the existing tests. A real trial is used to illustrate cancer immunotherapy trial design with delayed treatment effect.
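
As a rough illustration of the "sum of two log-rank tests" idea in this abstract, the Python sketch below adds an unnormalized log-rank score computed over all event times to the same score computed only over event times after the lag. The function names `logrank_score` and `v0_score` are hypothetical, and several details are assumptions: the abstract does not give the exact form of the V0 statistic, so the variance term and standardization are omitted, and "beyond the lag" is taken to mean event times strictly greater than the lag.

```python
def logrank_score(times, events, groups, start=0.0):
    """Unnormalized log-rank score for group 1.

    Sums (observed - expected) group-1 events over distinct event times
    strictly greater than `start`. `events` holds 1 for an event and 0 for
    censoring; `groups` holds 1 (treatment) or 0 (control).
    """
    score = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e and t > start})
    for t in event_times:
        # Risk set: subjects still under observation just before time t.
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        d1 = sum(1 for u, e, g in zip(times, events, groups) if e and u == t and g == 1)
        score += d1 - d * n1 / n
    return score


def v0_score(times, events, groups, lag):
    """Assumed sketch: full-data score plus the score restricted to times after the lag."""
    return logrank_score(times, events, groups) + logrank_score(times, events, groups, start=lag)


if __name__ == "__main__":
    print(v0_score([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], lag=2))
```

In this form, events after the lag contribute twice and earlier events once, which matches the abstract's description of down-weighting early events; a usable test would also need the variance of the combined score, which is not reconstructed here.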

19.
In this paper, we consider a regression analysis for a missing data problem in which the variables of primary interest are unobserved under a general biased sampling scheme, an outcome‐dependent sampling (ODS) design. We propose a semiparametric empirical likelihood method for assessing the association between a continuous outcome response and unobservable factors of interest. Simulation study results show that the ODS design can produce more efficient estimators than the simple random design of the same sample size. We demonstrate the proposed approach with a data set from an environmental study for the genetic effects on human lung function in COPD smokers. The Canadian Journal of Statistics 40: 282–303; 2012 © 2012 Statistical Society of Canada

20.
In planning a study, the choice of sample size may depend on a variance value based on speculation or obtained from an earlier study. Scientists may wish to use an internal pilot design to protect themselves against an incorrect choice of variance. Such a design involves collecting a portion of the originally planned sample and using it to produce a new variance estimate. This leads to a new power analysis and increasing or decreasing sample size. For any general linear univariate model, with fixed predictors and Gaussian errors, we prove that the uncorrected fixed sample F-statistic is the likelihood ratio test statistic. However, the statistic does not follow an F distribution. Ignoring the discrepancy may inflate test size. We derive and evaluate properties of the components of the likelihood ratio test statistic in order to characterize and quantify the bias. Most notably, the fixed sample size variance estimate becomes biased downward. The bias may inflate test size for any hypothesis test, even if the parameter being tested was not involved in the sample size re-estimation. Furthermore, using fixed sample size methods may create biased confidence intervals for secondary parameters and the variance estimate.
