Similar Documents
1.
Hea-Jung Kim  Taeyoung Roh 《Statistics》2013,47(5):1082-1111
In regression analysis, a sample selection scheme often applies to the response variable, which results in missing not at random observations on the variable. In this case, a regression analysis using only the selected cases would lead to biased results. This paper proposes a Bayesian methodology to correct this bias based on a semiparametric Bernstein polynomial regression model that incorporates the sample selection scheme into a stochastic monotone trend constraint, variable selection, and robustness against departures from the normality assumption. We present the basic theoretical properties of the proposed model that include its stochastic representation, sample selection bias quantification, and hierarchical model specification to deal with the stochastic monotone trend constraint in the nonparametric component, simple bias corrected estimation, and variable selection for the linear components. We then develop computationally feasible Markov chain Monte Carlo methods for semiparametric Bernstein polynomial functions with stochastically constrained parameter estimation and variable selection procedures. We demonstrate the finite-sample performance of the proposed model compared to existing methods using simulation studies and illustrate its use based on two real data applications.
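For background on the nonparametric component above: a Bernstein polynomial regression function is a linear combination of Bernstein basis polynomials on [0, 1], and with nondecreasing coefficients the fitted curve is itself nondecreasing, which is the usual way a monotone trend constraint is imposed. A minimal sketch (the coefficient values below are illustrative, not from the paper):

```python
from math import comb

def bernstein_basis(k, n, x):
    # B_{k,n}(x) = C(n, k) * x^k * (1 - x)^(n - k), for x in [0, 1]
    return comb(n, k) * x**k * (1 - x)**(n - k)

def bernstein_poly(coeffs, x):
    # Bernstein polynomial of degree n = len(coeffs) - 1
    n = len(coeffs) - 1
    return sum(c * bernstein_basis(k, n, x) for k, c in enumerate(coeffs))
```

With nondecreasing coefficients such as `[0.0, 0.2, 0.9, 1.0]`, the resulting function is nondecreasing on [0, 1], so monotonicity of the regression reduces to an ordering constraint on the coefficients.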

2.
A number of recent papers have focused on the problem of testing for a unit root in the case where the driving shocks may be unconditionally heteroskedastic. These papers have, however, taken the lag length in the unit root test regression to be a deterministic function of the sample size, rather than data-determined, the latter being standard empirical practice. We investigate the finite sample impact of unconditional heteroskedasticity on conventional data-dependent lag selection methods in augmented Dickey–Fuller type regressions and propose new lag selection criteria which allow for unconditional heteroskedasticity. Standard lag selection methods are shown to have a tendency to over-fit the lag order under heteroskedasticity, resulting in significant power losses in the (wild bootstrap implementation of the) augmented Dickey–Fuller tests under the alternative. The proposed new lag selection criteria are shown to avoid this problem yet deliver unit root tests with almost identical finite sample properties as the corresponding tests based on conventional lag selection when the shocks are homoskedastic.

3.
Sample coordination maximizes or minimizes the overlap of two or more samples selected from overlapping populations. It can be applied to designs with simultaneous or sequential selection of samples. We propose a method for sample coordination in the former case. We consider the case where units are to be selected with maximum overlap using two designs with given unit inclusion probabilities. The degree of coordination is measured by the expected sample overlap, which is bounded above by a theoretical bound, called the absolute upper bound, and which depends on the unit inclusion probabilities. If the expected overlap equals the absolute upper bound, the sample coordination is maximal. Most of the methods given in the literature consider fixed marginal sampling designs, but in many cases, the absolute upper bound is not achieved. We propose to construct optimal sampling designs for given unit inclusion probabilities in order to realize maximal coordination. Our method is based on some theoretical conditions on the joint selection probability of two samples and on the controlled selection method with a linear programming implementation. The method can also be applied to minimize the sample overlap.
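The absolute upper bound mentioned above has a simple closed form: for unit inclusion probabilities under the two designs, the expected overlap can never exceed the sum over units of the smaller of the two inclusion probabilities. A small sketch (the probability values in the usage note are made up):

```python
def absolute_upper_bound(pi1, pi2):
    # Maximum attainable expected sample overlap for two designs with
    # unit inclusion probabilities pi1 and pi2 over the same population:
    # sum over units k of min(pi1[k], pi2[k]).
    return sum(min(a, b) for a, b in zip(pi1, pi2))
```

For example, `absolute_upper_bound([0.5, 0.2, 0.3], [0.3, 0.4, 0.3])` gives 0.3 + 0.2 + 0.3 = 0.8; a coordination method is maximal precisely when its expected overlap attains this value.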

4.
Model selection methods are important for identifying the best approximating model. To identify the best meaningful model, the purpose of the model should be clearly stated in advance. The focus of this paper is model selection when the modelling purpose is classification. We propose a new model selection approach designed for logistic regression models where the main modelling purpose is classification. The method is based on the distance between two clustering trees. We also question and evaluate the performance of conventional model selection methods, based on information-theoretic concepts, in determining the best logistic regression classifier. An extensive simulation study is used to assess the finite sample performances of the cluster-tree-based and the information-theoretic model selection methods. Simulations are adjusted for whether the true model is in the candidate set or not. Results show that the new approach is highly promising. Finally, the methods are applied to a real data set to select a binary model as a means of classifying subjects with respect to their risk of breast cancer.

5.
In this paper, we study the problem of estimation and variable selection for generalised partially linear single-index models based on quasi-likelihood, extending existing studies on variable selection for partially linear single-index models to binary and count responses. To take into account the unit norm constraint of the index parameter, we use the ‘delete-one-component’ approach. The asymptotic normality of the estimates is demonstrated. Furthermore, the smoothly clipped absolute deviation penalty is added for variable selection of parameters both in the nonparametric part and the parametric part, and the oracle property of the variable selection procedure is shown. Finally, some simulation studies are carried out to illustrate the finite sample performance.

6.
The performance of the sampling strategy used in the Botswana AIDS Impact Survey II (BAISII) has been studied in detail under a randomized response technique. We have shown that alternative strategies based on the Rao–Hartley–Cochran (RHC) sampling scheme for the selection of first-stage units perform much better than other strategies. In particular, the combination of RHC for the selection of first-stage units (fsu's) and systematic sampling for the selection of second-stage units (ssu's) performs best when the sample size is small, whereas RHC combined with SRSWOR performs best when the sample size is large. In view of the present findings it is recommended that the BAISII survey be studied in more detail, incorporating more indicators and increased sample sizes, because the BAISII survey design is extensively used for large-scale surveys in Southern African countries.

7.
Model selection is a pervasive problem in generalized linear models. A model selection criterion based on deviance, called the deviance-based criterion (DBC), is proposed. The DBC is obtained by penalizing the difference between the deviance of the fitted model and that of the full model. Under certain weak conditions, the DBC is shown to be a consistent model selection criterion in the sense that, with probability approaching one, the selected model asymptotically equals the optimal model relating the response to the predictors. Further, the use of the DBC in link function selection is also discussed. We compare the proposed model selection criterion with existing methods. The small sample efficiency of the proposed model selection criterion is evaluated by a simulation study.
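The abstract does not give the exact penalty, so the sketch below assumes one plausible form, DBC(m) = [D(m) − D(full)] + λ·p(m), where D is the deviance and p(m) the number of parameters; the function names, λ, and the candidate values are illustrative only:

```python
def dbc(dev_m, dev_full, n_params_m, lam=2.0):
    # Deviance-based criterion: penalized gap between the fitted model's
    # deviance and the full model's deviance (assumed functional form).
    return (dev_m - dev_full) + lam * n_params_m

def select_by_dbc(candidates, dev_full, lam=2.0):
    # candidates: {name: (deviance, num_params)}; pick the minimizer.
    return min(candidates,
               key=lambda m: dbc(candidates[m][0], dev_full,
                                 candidates[m][1], lam))
```

With `candidates = {"small": (12.0, 2), "big": (10.5, 5)}` and `dev_full = 10.0`, the smaller model wins (criterion 6.0 versus 10.5): the penalty outweighs its slightly worse fit.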

8.
韩猛  白仲林 《统计研究》2021,38(8):121-131
The threshold factor model specifies factor loadings with a threshold-type regime-switching structure, so it can capture both the co-movement and the regime-switching features of high-dimensional time series. For high-dimensional threshold factor models, this paper develops a consistent model selection procedure based on the adaptive group LASSO. The procedure brings the determination of the number of factors and inference on the threshold effect into a unified analytical framework; it not only resolves the consistency of model selection but also achieves uniform control of the model selection error, which is essential for high-dimensional threshold factor models. Theoretical results and Monte Carlo simulations show that the proposed procedure has good large-sample properties and finite-sample performance. Finally, the threshold factor model is applied to an analysis of China's financial markets, and the empirical results further confirm the validity of the theory.

9.
Non‐random sampling is a source of bias in empirical research. It is common for the outcomes of interest (e.g. wage distribution) to be skewed in the source population. Sometimes, the outcomes are further subjected to sample selection, which is a type of missing data, resulting in partial observability. Thus, methods based on complete cases for skewed data are inadequate for the analysis of such data, and a general sample selection model is required. Heckman proposed a full maximum likelihood estimation method under the normality assumption for sample selection problems, and parametric and non‐parametric extensions have since been proposed. We generalize the Heckman selection model to allow for underlying skew‐normal distributions. The finite‐sample performance of the maximum likelihood estimator of the model is studied via simulation. Applications illustrate the strength of the model in capturing spurious skewness in bounded scores, and in modelling data where logarithm transformation could not mitigate the effect of inherent skewness in the outcome variable.
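To see the bias such selection models correct, one can simulate an outcome that is observed only when a correlated selection index is positive: the naive mean of the observed cases then overestimates the population mean. A self-contained illustration (all parameter values are made up; this is the generic selection mechanism, not the skew-normal model of the paper):

```python
import random

def simulate_selection_bias(n=20000, rho=0.8, seed=42):
    # Outcome y = 1 + e; selection index s shares the error e with
    # correlation rho. y is recorded only when s > 0 (sample selection).
    rng = random.Random(seed)
    pop, observed = [], []
    for _ in range(n):
        e = rng.gauss(0, 1)                      # outcome error
        u = rng.gauss(0, 1)                      # selection-specific error
        y = 1.0 + e
        s = rho * e + (1 - rho**2) ** 0.5 * u    # correlated selection index
        pop.append(y)
        if s > 0:
            observed.append(y)
    return sum(pop) / len(pop), sum(observed) / len(observed)
```

The population mean is close to 1, while the complete-case mean of the observed outcomes is markedly larger (around 1 + rho·sqrt(2/π) for this bivariate normal setup), which is exactly the bias a selection model must remove.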

10.
In Wu and Zen (1999), a linear model selection procedure based on M-estimation is proposed, which includes many classical model selection criteria as special cases, and the selection procedure is shown to be strongly consistent for a variety of penalty functions. In this paper, we investigate its small sample performance for some choices of fixed penalty functions. It can be seen that the performance varies with the choice of penalty. Hence, a randomized penalty based on the observed data is proposed, which preserves the consistency property and provides improved performance over a fixed choice of penalty function.

11.
Demonstrated equivalence between a categorical regression model based on case‐control data and an I‐sample semiparametric selection bias model leads to a new goodness‐of‐fit test. The proposed test statistic is an extension of an existing Kolmogorov–Smirnov‐type statistic and is the weighted average of the absolute differences between two estimated distribution functions in each response category. The paper establishes an optimal property for the maximum semiparametric likelihood estimator of the parameters in the I‐sample semiparametric selection bias model. It also presents a bootstrap procedure, some simulation results and an analysis of two real datasets.

12.
The np control chart is used widely in Statistical Process Control (SPC) for attributes. It is difficult to design an np chart that simultaneously satisfies a requirement on the false alarm rate and has high detection effectiveness. This is mainly because one is often unable to make the in-control Average Run Length ARL0 of an np chart close to a specified or desired value. This article proposes a new np control chart which is able to overcome the problems suffered by the conventional np chart. It is called the Double Inspection (DI) np chart, because it uses a double inspection scheme to decide the process status (in control or out of control). The first inspection decides the process status according to the number of non-conforming units found in a sample; the second inspection makes a decision based on the location of a particular non-conforming unit in the sample. The double inspection scheme makes the in-control ARL0 very close to a specified value and the out-of-control Average Run Length ARL1 quite small. As a result, the requirement on the false alarm rate is satisfied and high detection effectiveness is also achieved. Moreover, the DI np chart largely retains the operational simplicity of the np chart and achieves the performance improvement without requiring extra inspection (testing whether a unit is conforming or not).
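The two inspections can be sketched as a decision rule. The exact thresholds, and which non-conforming unit's location is examined, are design parameters not stated in the abstract, so everything below (the borderline-count rule, the use of the first non-conforming unit's position) is an assumption for illustration:

```python
def di_np_signal(sample, d_warn, pos_limit):
    """Double-inspection rule (illustrative sketch). sample is a list of
    0/1 flags (1 = non-conforming unit found at that position)."""
    d = sum(sample)  # first inspection: count of non-conforming units
    if d == 0 or d < d_warn:
        return "in control"
    if d > d_warn:
        return "out of control"
    # Borderline count (d == d_warn): second inspection looks at where
    # the first non-conforming unit sits in the sample (assumed rule);
    # an early position is treated as stronger evidence of a shift.
    pos = sample.index(1) + 1
    return "out of control" if pos <= pos_limit else "in control"
```

Conditioning the borderline verdict on a unit's position is what lets the chart's in-control ARL0 be tuned much closer to a target than the integer-valued count alone allows.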

13.
Classification of high-dimensional data sets is a big challenge for statistical learning and data mining algorithms. To effectively apply classification methods to high-dimensional data sets, feature selection is an indispensable pre-processing step of the learning process. In this study, we consider the problem of constructing an effective feature selection and classification scheme for data sets that have a small sample size and a large number of features. A novel feature selection approach, named four-Staged Feature Selection, is proposed to overcome the high-dimensional data classification problem by selecting informative features. The proposed method first selects candidate features with a number of filtering methods based on different metrics, and then applies semi-wrapper, union, and voting stages, respectively, to obtain the final feature subsets. Several statistical learning and data mining methods are applied to verify the efficiency of the selected features. To test the adequacy of the proposed method, 10 different microarray data sets are employed because of their high number of features and small sample sizes.
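The union and voting stages can be illustrated with ranked feature lists produced by different filters: a feature survives if enough filters rank it highly. The feature names and thresholds below are invented for illustration and are not from the paper:

```python
from collections import Counter

def vote_features(rankings, top_k=2, min_votes=2):
    # Each filter contributes its top_k features; keep any feature
    # endorsed by at least min_votes of the filters.
    votes = Counter(f for ranking in rankings for f in ranking[:top_k])
    return sorted(f for f, v in votes.items() if v >= min_votes)
```

For example, with three filter rankings `[["a","b","c"], ["b","a","d"], ["b","c","a"]]`, feature "b" gets three votes and "a" two, so both survive while "c" and "d" are dropped.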

14.
The problem considered relates to large-scale sample surveys. A new estimator of population total for the characteristics that are poorly correlated with the selection probabilities has been developed for the PPSWR sampling scheme. The relative efficiency of the proposed estimator has been studied under a super-population model. A numerical investigation into the performance of the estimator has also been made.
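The usual point of comparison under PPSWR (probability proportional to size, with replacement) is the Hansen–Hurwitz estimator of the population total, which averages y/p over the n independent draws; the new estimator targets the case where this baseline is inefficient because y is poorly correlated with p. A sketch of the baseline (the data values in the test are made up):

```python
def hansen_hurwitz_total(y, p):
    # Unbiased estimator of the population total under PPSWR:
    # (1/n) * sum of y_i / p_i, where p_i is the selection probability
    # of the unit obtained at draw i.
    n = len(y)
    return sum(yi / pi for yi, pi in zip(y, p)) / n
```

For instance, draws with values [2.0, 3.0] and selection probabilities [0.1, 0.5] give (20 + 6) / 2 = 13.0 as the estimated total.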

15.
The problem considered relates to large-scale sample surveys. A new estimator of population total for the characteristics that are poorly correlated with the selection probabilities has been developed for the RHC sampling scheme. The relative efficiency of the proposed estimator has been studied under a super-population model. A numerical investigation into the performance of the estimator has also been made.

16.
Adaptive sampling without replacement of clusters
In a common form of adaptive cluster sampling, an initial sample of units is selected by random sampling without replacement and, whenever the observed value of the unit is sufficiently high, its neighboring units are added to the sample, with the process of adding neighbors repeated if any of the added units are also high valued. In this way, an initial selection of a high-valued unit results in the addition of the entire network of surrounding high-valued units and some low-valued “edge” units where sampling stops. Repeat selections can occur when more than one initially selected unit is in the same network or when an edge unit is shared by more than one added network. Adaptive sampling without replacement of networks avoids some of this repeat selection by sequentially selecting initial sample units only from the part of the population not already in any selected network. The design proposed in this paper carries this step further by selecting initial units only from the population, exclusive of any previously selected networks or edge units.
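The network-growing step and the without-replacement restriction described above can be sketched as follows. The population values, neighbour structure, and threshold are invented for illustration, and the initial units are passed in explicitly rather than drawn at random, to keep the sketch deterministic:

```python
def grow_network(u, values, neighbors, threshold):
    # Starting from unit u, add neighbours of every high-valued unit;
    # low-valued units are kept as "edge" units but not expanded.
    frontier, network = [u], set()
    while frontier:
        v = frontier.pop()
        if v in network:
            continue
        network.add(v)
        if values[v] >= threshold:
            frontier.extend(neighbors.get(v, []))
    return network

def adaptive_sample_wo_replacement(initial_order, values, neighbors, threshold):
    # Take initial units in sequence, skipping any that already lie in a
    # previously selected network (including its edge units).
    excluded, selected = set(), []
    for u in initial_order:
        if u in excluded:
            continue  # "without replacement" of networks/edge units
        net = grow_network(u, values, neighbors, threshold)
        selected.append(sorted(net))
        excluded |= net
    return selected
```

On a line of four units with values [10, 10, 1, 9] and threshold 5, starting at unit 0 pulls in the network {0, 1} plus edge unit 2; a later initial pick of unit 2 is skipped because it is already excluded, while unit 3 starts a fresh network.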

17.
We propose a random partition model that implements prediction with many candidate covariates and interactions. The model is based on a modified product partition model that includes a regression on covariates by favouring homogeneous clusters in terms of these covariates. Additionally, the model allows for a cluster‐specific choice of the covariates that are included in this evaluation of homogeneity. The variable selection is implemented by introducing a set of cluster‐specific latent indicators that include or exclude covariates. The proposed model is motivated by an application to predicting mortality in an intensive care unit in Lisbon, Portugal.

18.
The sample selection bias problem occurs when the outcome of interest is only observed according to some selection rule, and there is a dependence structure between the outcome and the selection rule. In a pioneering work, J. Heckman proposed a sample selection model based on a bivariate normal distribution for dealing with this problem. Due to the non-robustness of the normal distribution, many alternatives have been introduced in the literature as extensions of the normal model, such as the Student-t and skew-normal models. One common limitation of existing sample selection models is that they require a transformation of the outcome of interest, which is commonly R+-valued, such as income or wage. As a result, the data are analyzed on a non-original scale, which complicates the interpretation of the parameters. In this paper, we propose a sample selection model based on the bivariate Birnbaum–Saunders distribution, which has the same number of parameters as the classical Heckman model. Further, our associated outcome equation is R+-valued. We discuss estimation by maximum likelihood and present some Monte Carlo simulation studies. An empirical application to the ambulatory expenditures data from the 2001 Medical Expenditure Panel Survey is presented.

19.
《Communications in Statistics: Theory and Methods》2012,41(16-17):3278-3300
Under complex survey sampling, in particular when selection probabilities depend on the response variable (informative sampling), the sample and population distributions differ, possibly resulting in selection bias. This article addresses this problem by fitting two statistical models for one-way analysis of variance under a complex survey design (for example, two-stage sampling, stratification, and unequal probabilities of selection): the variance components model (a two-stage model) and the fixed effects model (a single-stage model). Classical theory underlying the use of the two-stage model assumes simple random sampling at each of the two stages; in that case the model holding in the sample, after sample selection, is the same as the model for the population before sample selection. When the selection probabilities are related to the values of the response variable, standard estimates of the population model parameters may be severely biased, possibly leading to false inference. The idea behind the approach is to derive the model holding for the sample data as a function of the population model and of the first-order inclusion probabilities, and then to fit the sample model using analysis of variance, maximum likelihood, and pseudo maximum likelihood methods of estimation. The main feature of the proposed techniques relates to their behaviour in terms of the informativeness parameter. We also show that using the population model while ignoring the informative sampling design yields biased model fitting.
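A standard ingredient of the pseudo maximum likelihood approach mentioned above is weighting each sampled observation by the inverse of its first-order inclusion probability; the Hájek-type weighted mean is the simplest instance (the data values in the example are made up):

```python
def hajek_mean(y, pi):
    # Inverse-inclusion-probability (Hájek) estimator of the population
    # mean; downweights units the informative design over-represented.
    w = [1.0 / p for p in pi]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

For example, observations [1.0, 3.0] with inclusion probabilities [0.5, 0.25] get weights [2, 4], giving (2·1 + 4·3) / 6 ≈ 2.33 rather than the unweighted mean 2.0; the reweighting undoes the design's preference for certain response values.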

20.
In this paper, we investigate model selection and model averaging based on rank regression. Under mild conditions, we propose a focused information criterion and a frequentist model averaging estimator for the focused parameters in the rank regression model. Compared to the least squares method, the new method is not only highly efficient but also robust. The large sample properties of the proposed procedure are established. The finite sample properties are investigated via an extensive Monte Carlo simulation study. Finally, we use the Boston Housing Price Dataset to illustrate the use of the proposed rank methods.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号