1.
Junyong Park, Jayson D. Wilbur, Jayanta K. Ghosh, Cindy H. Nakatsu, Corinne Ackerman. Communications in Statistics - Simulation and Computation, 2013, 42(4):855-869
We adopt boosting for classification and selection of high-dimensional binary variables, for which classical methods based on normality and nonsingular sample dispersion are inapplicable. Boosting seems particularly well suited to binary variables. We present three methods, two of which combine boosting with the relatively classical variable selection methods developed in Wilbur et al. (2002). Our primary interest is variable selection in classification, with small misclassification error used to validate the proposed variable selection method. Two of the new methods perform uniformly better than Wilbur et al. (2002) in one set of simulated examples and three real-life examples.
2.
Communications in Statistics - Theory and Methods, 2013, 42(9):1725-1735
The study of multivariate distributions of order k, two of which are the multivariate negative binomial of order k and the multinomial of the same order, was introduced in Philippou et al. (Philippou, A. N., Antzoulakos, D. L., Tripsiannis, G. A. (1988). Multivariate distributions of order k. Statistics and Probability Letters 7(3):207–216.) and Philippou et al. (Philippou, A. N., Antzoulakos, D. L., Tripsiannis, G. A. (1990). Multivariate distributions of order k, part II. Statistics and Probability Letters 10(1):29–35.). Recently, an order k (or cluster) generalized negative binomial distribution and a multivariate negative binomial distribution were derived in Sen and Jain (Sen, K., Jain, R. (1996). Cluster generalized negative binomial distribution. In: Borthakur, A. C., et al., Eds.; Probability Models and Statistics: A. J. Medhi Festschrift on the Occasion of his 70th Birthday. New Age International Publishers: New Delhi, 227–241.) and Sen and Jain (Sen, K., Jain, R. (1997). A multivariate generalized Polya-Eggenberger probability model-first passage approach. Communications in Statistics-Theory and Methods 26:871–884.), respectively. In this paper, all four distributions are generalized to a multivariate generalized negative binomial distribution of order k by means of an appropriate sampling scheme and a first passage event. This new distribution includes as special cases several known and new multivariate distributions of order k, and gives rise in the limit to multivariate generalized logarithmic, Poisson, and Borel-Tanner distributions of the same order. Applications are indicated.
3.
This paper addresses a generalization of the bivariate Cauchy distribution discussed by Fang et al. (1990), derived from a trivariate normal distribution with a general correlation matrix. We obtain explicit expressions for the joint distribution function and joint density function, and show that they reduce in a special case to the corresponding expressions of Fang et al. (1990). Finally, we show that this generalized distribution is useful in determining the orthant probability of a bivariate skew-normal distribution of Azzalini and Dalla Valle (1996).
4.
In this article, an improved method of computing tolerance factors for constructing tolerance regions in a multivariate linear regression model is proposed. The method is based on a chi-square approximation to the distribution of a linear function of noncentral chi-square variables and simulation. The merits of the proposed approach and the usual simulation method considered in Lee and Mathew (2004) are evaluated using Monte Carlo simulation. The study indicates that the proposed approach is stable and accurate even for small samples, and better than available methods. For constructing two-sided tolerance intervals in multiple linear regression, coverage level adjusted one-sided tolerance factors are shown to be better than available approximate tolerance factors. The results based on the coverage level adjusted one-sided tolerance factors are as good as the ones based on the exact two-sided tolerance factors in many cases.
5.
In this article, we consider two shared frailty regression models with a Gompertz baseline distribution. The frailty distribution is most often assumed to be gamma. To compare results with the gamma frailty model, we also consider the inverse Gaussian shared frailty model. We apply these two models to a real-life bivariate survival data set of acute leukemia remission times (Freireich et al., 1963). The analysis is performed using Markov chain Monte Carlo methods. The models are compared using Bayesian model selection criteria, and a well-fitting model is suggested for the acute leukemia data.
6.
Tony Vangeneugden, Geert Molenberghs, Geert Verbeke, Clarice G.B. Demétrio. Communications in Statistics - Theory and Methods, 2014, 43(19):4164-4178
In hierarchical data settings, be they longitudinal, spatial, multi-level, clustered, or otherwise repeated in nature, the association between repeated measurements often attracts at least part of the scientific interest. Quantifying the association frequently takes the form of a correlation function, including but not limited to intraclass correlation. Vangeneugden et al. (2010) derived approximate correlation functions for longitudinal sequences of general data type, Gaussian and non-Gaussian, based on generalized linear mixed-effects models. Here, we consider the extended model family proposed by Molenberghs et al. (2010). This family flexibly accommodates data hierarchies, intra-sequence correlation, and overdispersion. The family allows for closed-form means, variance functions, and correlation functions for a variety of outcome types and link functions. Unfortunately, for binary data with logit link, closed forms cannot be obtained. This is in contrast with the probit link, for which such closed forms can be derived. We therefore concentrate on the probit case. It is of interest not only in its own right, but also as an instrument to approximate the logit case, thanks to the well-known probit-logit ‘conversion.’ Next to the general situation, some important special cases such as exchangeable clustered outcomes receive attention because they produce insightful expressions. The closed-form expressions are contrasted with the generic approximate expressions of Vangeneugden et al. (2010) and with approximations derived for the so-called logistic-beta-normal combined model. A simulation study explores the performance of the proposed method. Data from a schizophrenia trial are analyzed and correlation functions derived.
7.
In this article, we study the effect of censoring on the asymptotic efficiency of two-sample rank tests based on multiple Type-II censored data. Since the score-generating functions associated with these test statistics have a finite number of jump discontinuities, we use a slightly modified version of a theorem of Dupac and Hajek (1969) to obtain their asymptotic distributions under fixed alternatives. This modified version, which leads to a simpler centering constant, is proved by Dupac (1970) in the light of results of Hoeffding (1968), an earlier version of Hoeffding (1973). Hence, we obtain the Pitman ARE's of these rank tests relative to the corresponding tests based on the complete samples. The ARE's are computed for some well-known rank tests for two-sample location and scale problems, when the combined ordered samples from different underlying distributions are censored using triple and lower order Type-II censoring schemes. The effect of all these censoring schemes on the ARE's of the different tests is examined numerically. It is found that there is a gain in efficiency due to censoring in many of the cases considered here. This suggests that in such cases it is possible to improve the efficiency of rank tests by discarding suitable portions of the data.
8.
Viswanathan Ramakrishnan. Communications in Statistics - Simulation and Computation, 2013, 42(3):405-418
In many genetic analyses of dichotomous twin data, odds ratios have been used to test hypotheses on heritability and shared common environment effects of a given disease (Lichtenstein et al., 2000; Ahlbom et al., 1997; Ramakrishnan et al., 1992, 4). However, estimates of these two effects have not been dealt with in the literature. In epidemiology, the attributable fraction (AF), a function of the odds ratio and the prevalence of the risk factor, has been used to describe the contribution of a risk factor to a disease in a given population (Leviton, 1973). In this article, we adapt the AF to quantify the heritability and the shared common environment. Twin data on cancer, gallstone disease, and phobia are used to illustrate the applicability of the AF estimate as a measure of heritability.
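As a rough illustration of the attributable fraction mentioned in this abstract, the sketch below uses the standard Levin-type formula AF = p(OR - 1) / (1 + p(OR - 1)), with the odds ratio standing in for the relative risk. This is an assumption for illustration only: the abstract does not state which form of the AF the article adapts, and the function name `attributable_fraction` is hypothetical.

```python
def attributable_fraction(odds_ratio: float, prevalence: float) -> float:
    """Levin-type attributable fraction.

    Assumption: the odds ratio approximates the relative risk (rare disease),
    and `prevalence` is the proportion of the population exposed to the factor.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    excess = prevalence * (odds_ratio - 1.0)  # excess risk carried by exposure
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Half the population exposed, odds ratio of 3.
    print(attributable_fraction(odds_ratio=3.0, prevalence=0.5))
```

An odds ratio of 1 gives an AF of zero, and the AF grows toward 1 as either the odds ratio or the exposure prevalence increases.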
9.
Jun Hui. Communications in Statistics - Theory and Methods, 2013, 42(6):935-941
This article establishes the limiting spectral distribution of large sample covariance matrices with m-dependent random variables under the second moment condition by verifying the condition of Theorem 1.1 in Bai and Zhou (2008).
10.
A. Ahmad Abd El-Baset. Communications in Statistics - Theory and Methods, 2013, 42(15):2762-2772
In this article, a class of reflected generalized Pareto distributions (cf. Burkschat et al., 2003) is considered. Recurrence relations for joint moment generating functions of higher nonadjacent dual generalized order statistics based on a random sample drawn from the considered class are derived. Higher joint moments of nonadjacent dual generalized order statistics (reversed ordered order statistics and lower k-records as special cases) are obtained. Recurrence relations for single and product moment generating functions and moments of higher nonadjacent dual generalized order statistics are derived. Some results on higher moments of nonadjacent generalized order statistics from generalized Pareto distributions (cf. Johnson et al., 1995) are obtained by using a relation connecting higher moments of generalized order statistics and their duals.
11.
Abouzar Bazyari. Communications in Statistics - Simulation and Computation, 2017, 46(9):7194-7209
This paper considers testing homogeneity of multivariate normal mean vectors under an order restriction when the covariance matrices are unknown, arbitrary positive definite, and unequal. This testing problem has been studied to some extent, for example, by Kulatunga and Sasabuchi (1984) when the covariance matrices are known, and by Sasabuchi et al. (2003) and Sasabuchi (2007) when the covariance matrices are unknown but common. In this paper, a test statistic is proposed. Because the main advantage of a bootstrap test is that it avoids deriving the complex null distribution analytically, a bootstrap test is developed; since the proposed test statistic is location invariant, the bootstrap p-value is well defined, and steps for estimating it are presented. Our numerical studies via Monte Carlo simulation show that the proposed bootstrap test correctly controls the type I error rate. The power of the test for some p-dimensional normal distributions is computed by Monte Carlo simulation. The null distribution of the test statistic is also estimated using kernel density estimation. Finally, the bootstrap test is illustrated using a real data set.
12.
We consider nonparametric estimation of a continuous cdf of a random vector (X1, X2). With bivariate RC data, it is stated in van der Laan (1996, p. 59810, Ann. Statist.), Quale et al. (2006, JASA), and elsewhere that “it is well known that the NPMLE for continuous data is inconsistent (Tsai et al. (1986)).” The claim is based on a result in Tsai et al. (1986, p. 1352, Ann. Statist.) that if X1 is right censored but X2 is not, then common ways of defining an NPMLE lead to inconsistency. If X1 is right censored and X2 is type I right censored (which includes the case in Tsai et al.), we present a consistent NPMLE. The result corrects a common misinterpretation of Tsai's example (Tsai et al., 1986, Ann. Statist.).
13.
A Bottom-Up Dynamic Model of Portfolio Credit Risk with Stochastic Intensities and Random Recoveries
Tomasz R. Bielecki, Areski Cousin, Stéphane Crépey, Alexander Herbertsson. Communications in Statistics - Theory and Methods, 2014, 43(7):1362-1389
In Bielecki et al. (2014a), the authors introduced a Markov copula model of portfolio credit risk in which pricing and hedging can be done in a sound theoretical and practical way. Further theoretical background and practical details are developed in Bielecki et al. (2014b,c), where numerical illustrations assumed deterministic intensities and constant recoveries. In the present paper, we show how to incorporate stochastic default intensities and random recoveries in the bottom-up modeling framework of Bielecki et al. (2014a) while preserving numerical tractability. These two features are of primary importance for applications like CVA computations on credit derivatives (Assefa et al., 2011; Bielecki et al., 2012), as CVA is sensitive to the stochastic nature of credit spreads, and random recoveries make it possible to achieve satisfactory calibration even for “badly behaved” data sets. This article is thus a complement to Bielecki et al. (2014a), Bielecki et al. (2014b), and Bielecki et al. (2014c).
14.
Kanti V. Mardia. Communications in Statistics - Theory and Methods, 2014, 43(6):1132-1144
In application areas like bioinformatics, multivariate distributions on angles are encountered which show significant clustering. One approach to statistical modeling of such situations is to use mixtures of unimodal distributions. In the literature (Mardia et al., 2012), the multivariate von Mises distribution, also known as the multivariate sine distribution, has been suggested for components of such models, but work in the area has been hampered by the fact that no good criteria for the von Mises distribution to be unimodal were available. In this article we study the question of when a multivariate von Mises distribution is unimodal. We give sufficient criteria for this to be the case and show examples of distributions with multiple modes when these criteria are violated. In addition, we propose a method to generate samples from the von Mises distribution in the case of high concentration.
15.
Lindeman et al. [12] provide a unique solution to the relative importance of correlated predictors in multiple regression by averaging squared semi-partial correlations obtained for each predictor across all p! orderings. In this paper, we propose a series of predictor sensitivity statistics that complement the variance decomposition procedure advanced by Lindeman et al. [12]. First, we detail the logic of averaging over orderings as a technique of variance partitioning. Second, we assess predictors by conditional dominance analysis, a qualitative procedure designed to overcome defects in the Lindeman et al. [12] variance decomposition solution. Third, we introduce a suite of indices to assess the sensitivity of a predictor to model specification, advancing a series of sensitivity-adjusted contribution statistics that allow for more definite quantification of predictor relevance. Fourth, we describe the analytic efficiency of our proposed technique against the Budescu conditional dominance solution to the uneven contribution of predictors across all p! orderings.
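The averaging-over-orderings idea described in this abstract can be sketched as follows. This is a minimal illustration, not the article's procedure: it covers only the basic decomposition (the increase in R^2 when a predictor enters, averaged over all p! entry orders) and none of the proposed sensitivity statistics. The function names `r_squared` and `lmg_importance` are hypothetical.

```python
from itertools import permutations

import numpy as np


def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the columns listed in `cols`."""
    if len(cols) == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = np.sum((y - y.mean()) ** 2)
    return 1.0 - float(resid @ resid) / float(total)


def lmg_importance(X, y):
    """Average each predictor's R^2 increment over all p! entry orders."""
    p = X.shape[1]
    shares = np.zeros(p)
    orderings = list(permutations(range(p)))
    for order in orderings:
        entered = []
        r2_prev = 0.0
        for j in order:
            entered.append(j)
            r2_now = r_squared(X, y, entered)
            # The increment equals the squared semi-partial correlation of
            # predictor j with y, given the predictors entered before it.
            shares[j] += r2_now - r2_prev
            r2_prev = r2_now
    return shares / len(orderings)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    X[:, 1] += 0.7 * X[:, 0]  # make two predictors correlated
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)
    print(lmg_importance(X, y))
```

The shares sum to the R^2 of the full model. Because the loop visits p! orderings, this brute-force form is practical only for a small number of predictors.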
16.
Łukasz Smaga. Communications in Statistics - Simulation and Computation, 2017, 46(10):7654-7667
Nonparametric and parametric bootstrap methods for multivariate hypothesis testing are developed. They are used to approximate the null distribution of the test statistics proposed by Duchesne and Francq (2015), resulting in bootstrap testing procedures. For the problem of testing the mean vector of a multivariate distribution, the asymptotic validity of the bootstrap methods is proved. The finite-sample performance of the new procedures is demonstrated by means of Monte Carlo simulation studies. These indicate that, for small sample sizes, the bootstrap tests have better finite-sample properties than the asymptotic tests considered by Duchesne and Francq (2015).
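A nonparametric bootstrap test for a mean vector can be sketched along the following lines. Note the substitution: the abstract does not define the Duchesne and Francq (2015) statistics, so this sketch uses a Hotelling-type statistic as a stand-in, and it assumes the sample covariance matrix is invertible (more observations than dimensions, at least two dimensions). The function names are hypothetical.

```python
import numpy as np


def hotelling_stat(X, mu0):
    """Hotelling-type statistic n * (xbar - mu0)' S^{-1} (xbar - mu0)."""
    n = X.shape[0]
    diff = X.mean(axis=0) - mu0
    S = np.atleast_2d(np.cov(X, rowvar=False))
    return float(n * diff @ np.linalg.solve(S, diff))


def bootstrap_mean_test(X, mu0, n_boot=999, seed=0):
    """Bootstrap p-value for H0: E[X] = mu0."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    t_obs = hotelling_stat(X, mu0)
    # Shift the sample so that its mean equals mu0; resampling from the
    # shifted data mimics the distribution of the statistic under H0.
    X_null = X - X.mean(axis=0) + mu0
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw rows with replacement
        t_boot[b] = hotelling_stat(X_null[idx], mu0)
    # The +1 terms keep the p-value strictly positive.
    return (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    sample = rng.normal(size=(40, 3))
    print(bootstrap_mean_test(sample, mu0=np.zeros(3)))
```

A parametric variant would replace the row resampling with draws from a fitted distribution (for example, a multivariate normal with mean `mu0` and the sample covariance).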
17.
Communications in Statistics - Theory and Methods, 2013, 42(5):875-885
The order of experimental runs in a fractional factorial experiment is essential when the cost of level changes in factors is considered. The generalized foldover scheme given by [1] gives an optimal order of experimental runs in an experiment with specified defining contrasts. An experiment can be specified by a design requirement such as resolution or estimation of some interactions. To meet such a requirement, we can find several sets of defining contrasts. Applying the generalized foldover scheme to these sets of defining contrasts, we obtain designs with different numbers of level changes and then the design with the minimum number of level changes. The difficulty is to find all the sets of defining contrasts. An alternative approach is investigated by [2] for two-level fractional factorial experiments. In this paper, we investigate experiments with all factors at s levels.
18.
The Significance Analysis of Microarrays (SAM; Tusher et al., 2001) method is widely used in analyzing gene expression data while controlling the FDR by using a resampling-based procedure in the microarray setting. One of the main components of the SAM procedure is the adjustment of the test statistic. The introduction of the fudge factor to the test statistic aims at deflating large values of the test statistic caused by small standard errors of gene expression. Lin et al. (2008) pointed out that the fudge factor does not effectively improve the power and the control of the FDR as compared to the SAM procedure without the fudge factor in the presence of small variance genes. Motivated by the simulation results presented in Lin et al. (2008), in this article, we extend our study to compare several methods for choosing the fudge factor in the modified t-type test statistics and use simulation studies to investigate the power and the control of the FDR of the considered methods.
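The role of the fudge factor can be seen in a short sketch of the modified t-type statistic d = (mean difference) / (s + s0), where s is the pooled standard error for a gene and s0 is the fudge factor. The sketch takes s0 as an input and does not implement any of the selection methods the article compares. The function name `sam_statistic` and the genes-in-rows layout are assumptions.

```python
import numpy as np


def sam_statistic(group1, group2, s0=0.0):
    """Modified t-type statistic per gene: mean difference / (s + s0).

    group1, group2: arrays of shape (genes, samples), one per condition.
    s0: fudge factor added to the pooled standard error; s0 = 0 gives the
        ordinary equal-variance two-sample t-statistic.
    """
    n1, n2 = group1.shape[1], group2.shape[1]
    diff = group1.mean(axis=1) - group2.mean(axis=1)
    ss1 = ((group1 - group1.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    ss2 = ((group2 - group2.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    s = np.sqrt((1.0 / n1 + 1.0 / n2) * (ss1 + ss2) / (n1 + n2 - 2))
    return diff / (s + s0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(5, 4))
    b = rng.normal(size=(5, 4))
    print(sam_statistic(a, b, s0=0.0))
    print(sam_statistic(a, b, s0=0.5))
```

A positive s0 shrinks every statistic toward zero, and it shrinks genes with small standard errors the most, which is the deflation effect the abstract describes.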
19.
Communications in Statistics - Theory and Methods, 2013, 42(8-9):1789-1810
Mudholkar and Srivastava [1] adapted Mudholkar and Subbaiah's [2] modified stepwise procedure, using trimmed means in place of means and appropriate studentization, to construct robust tests for the significance of a mean vector. They concluded that the robust alternatives provide excellent type I error control and a substantial gain in power over Hotelling's T^2 test for heavy-tailed populations, without significant loss of power when the population is normal. In this paper we adapt the modified stepwise approach to construct simple tests for the significance of the orthant-constrained mean vector of a p-variate normal population with unknown covariance matrix, and also to construct robust tests without assuming normality. The simple normal theory tests have exact type I error, whereas the robust tests provide reasonable type I error control and a substantial power advantage over Perlman's [3] likelihood ratio test.
20.
Hall et al. (2007) propose a method for moment selection based on an information criterion that is a function of the entropy of the limiting distribution of the Generalized Method of Moments (GMM) estimator. They establish the consistency of the method subject to certain conditions that include the identification of the parameter vector by at least one of the moment conditions being considered. In this article, we examine the limiting behavior of this moment selection method when the parameter vector is weakly identified by all the moment conditions being considered. It is shown that the selected moment condition is random and hence not consistent in any meaningful sense. As a result, we propose a two-step procedure for moment selection in which identification is first tested using a statistic proposed by Stock and Yogo (2003), and only if this statistic indicates identification does the researcher proceed to the second step, in which the aforementioned information criterion is used to select moments. The properties of this two-step procedure are contrasted with those of strategies based on either using all available moments or using the information criterion without the identification pre-test. The performances of these strategies are compared via an evaluation of the finite-sample behavior of various methods for inference about the parameter vector. The inference methods considered are based on the Wald statistic, Anderson and Rubin's (1949) statistic, Kleibergen's (2002) K statistic, and combinations thereof in which the choice is based on the outcome of the test for weak identification.