Similar documents
20 similar documents found (search time: 0 ms)
1.

We propose a semiparametric framework based on sliced inverse regression (SIR) to address the issue of variable selection in functional regression. SIR is an effective method for dimension reduction which computes a linear projection of the predictors in a low-dimensional space, without loss of information on the regression. In order to deal with the high dimensionality of the predictors, we consider penalized versions of SIR: ridge and sparse. We extend the approaches of variable selection developed for multidimensional SIR to select intervals that form a partition of the definition domain of the functional predictors. Selecting entire intervals rather than separated evaluation points improves the interpretability of the estimated coefficients in the functional framework. A fully automated iterative procedure is proposed to find the critical (interpretable) intervals. The approach is proved efficient on simulated and real data. The method is implemented in the R package SISIR available on CRAN at https://cran.r-project.org/package=SISIR.
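To make the dimension-reduction step concrete, here is a minimal numpy sketch of plain (unpenalized) sliced inverse regression: slice the responses, average the whitened predictors within each slice, and take leading eigenvectors of the between-slice covariance. This illustrates the SIR building block only, not the ridge/sparse penalties or interval selection of the SISIR package; the toy data and function names are illustrative.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_components=1):
    """Estimate SIR directions: leading eigenvectors of the covariance of
    slice means of the whitened predictors, mapped back to the X scale."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    L = np.linalg.cholesky(cov)
    W = np.linalg.inv(L).T            # whitening matrix: cov((X-mu)W) = I
    Z = (X - mu) @ W
    # Slice observations by the ordered response and average Z per slice
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M (eigh returns ascending order)
    _, eigvecs = np.linalg.eigh(M)
    B = W @ eigvecs[:, ::-1][:, :n_components]
    return B / np.linalg.norm(B, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
beta = np.array([1.0, -1.0, 0, 0, 0, 0]) / np.sqrt(2)
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=2000)
b = sir_directions(X, y)[:, 0]
print(abs(float(b @ beta)))  # close to 1: the true direction is recovered
```

With a monotone-in-projection link like the cubic above, the single estimated direction aligns closely with the true index vector even though no parametric link was assumed.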


2.
In this article, we study methods for two-sample hypothesis testing of high-dimensional data drawn from a multivariate binary distribution. We examine the random projection method and apply an Edgeworth expansion to improve it. Additionally, we propose new statistics that are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment. Finally, we apply the tests to the 20 Newsgroups data, showing that our proposed tests have considerably higher power than the others for differentiating groups of news articles on different topics.
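The random-projection idea can be sketched in a few lines: map both high-dimensional binary samples onto a random direction and apply an ordinary pooled two-sample t statistic in the projected (one-dimensional) space. This is a hedged illustration of the general idea only, not the article's specific test statistics or Edgeworth correction; the dimensions and signal pattern are made up.

```python
import numpy as np

def projected_t_stat(X, Y, rng):
    """Project both samples onto one random unit direction and return the
    pooled two-sample t statistic in the projected space."""
    p = X.shape[1]
    u = rng.normal(size=p)
    u /= np.linalg.norm(u)
    x, y = X @ u, Y @ u
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    return float((x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2)))

rng = np.random.default_rng(1)
p = 200
X = rng.binomial(1, 0.3, size=(150, p))                              # group 1
probs2 = np.r_[[0.7] * 50, [0.3] * (p - 50)]                         # group 2 differs in 50 coords
Y = rng.binomial(1, probs2, size=(150, p))
t = projected_t_stat(X, Y, rng)
print(t)
```

In practice one would aggregate over many projections (or calibrate the single-projection statistic, as the article's Edgeworth expansion does), since the power of any one projection depends on how the random direction aligns with the mean difference.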

3.
4.
Sparse clustered data arise in finely stratified genetic and epidemiologic studies and pose at least two challenges to inference. First, it is difficult to model and interpret the full joint probability of dependent discrete data, which limits the utility of full likelihood methods. Second, standard methods for clustered data, such as pairwise likelihood and the generalized estimating function approach, are unsuitable when the data are sparse, owing to the presence of many nuisance parameters. We present a composite conditional likelihood for use with sparse clustered data that provides valid inferences about covariate effects on both the marginal response probabilities and the intracluster pairwise association. Our primary focus is on sparse clustered binary data, in which case the proposed method uses doubly discordant quadruplets drawn from each stratum to conduct inference about the intracluster pairwise odds ratios.

5.
Smoothing of noisy sample covariances is an important component of functional data analysis. We propose a novel covariance smoothing method based on penalized splines, together with associated software. The proposed method is a bivariate spline smoother designed for covariance smoothing and can be used for sparse functional or longitudinal data. We propose a fast algorithm for covariance smoothing using leave-one-subject-out cross-validation. Our simulations show that the proposed method compares favorably with several commonly used methods. The method is applied to a study of child growth led by one of the coauthors and to a public dataset of longitudinal CD4 counts.

6.
Many authors have studied variable selection in multiple linear regression models. In this paper, we derive some generalized selection procedures for linear models. An approximation of the noncentral F distribution is also obtained.

7.
In life testing and survival analyses that involve expensive equipment, the cost of continuing an experiment until all items on test have failed can be quite high. In these situations it is reasonable to perform a statistical test once a pre-specified percentile, e.g. the median, of the control group has been observed. This article adapts some existing procedures for complete samples to randomly censored data. The results of Lo and Singh (1985), who extended the Bahadur representation of quantiles to the censored case, enable us to use the methods of Gastwirth (1968) and Hettmansperger (1973), which were based on Bahadur's result, to extend the procedures of Mathisen (1943), Gart (1963) and Slivka (1970). The large-sample efficiency of the control median test is the same as that of Brookmeyer and Crowley's (1982) extension of the usual median test. For the two-sample shift problem with observations following the double-exponential law, the median remains the optimal percentile to use until the censoring becomes quite heavy. On the other hand, in the two-sample scale-parameter problem for data from an exponential distribution, the percentile (the 80th in the uncensored case) yielding the asymptotically most powerful test in the family of control percentile tests is no longer optimal. The effect becomes noticeable when 25% or more of the data are censored.

8.
Q. F. Xu  C. Cai  X. Huang 《Statistics》2019,53(1):26-42
In recent decades, quantile regression has received much attention from academics and practitioners. However, most existing computational algorithms are effective only for small or moderate-sized problems; they cannot solve quantile regression with large-scale data reliably and efficiently. To this end, we propose a new algorithm that implements quantile regression on large-scale data using the sparse exponential transform (SET) method. The algorithm constructs a well-conditioned basis and a sampling matrix to reduce the number of observations, then solves a quantile regression problem on this reduced matrix to obtain an approximate solution. Through simulation studies and an empirical analysis of a 5% sample of the US 2000 Census data, we demonstrate the efficiency of the SET-based algorithm. Numerical results indicate that the new algorithm is effective in terms of computation time and performs well for large-scale quantile regression.
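The objective underlying any quantile regression algorithm is the pinball (check) loss, whose minimizer over a constant is the sample quantile. The sketch below illustrates this building block with a crude grid search; it is not the paper's SET algorithm, and the grid and simulated data are illustrative.

```python
import numpy as np

def pinball_loss(q, y, tau):
    """Average check loss rho_tau(y - q): tau*r for r >= 0, (tau-1)*r for r < 0."""
    r = y - q
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1) * r)))

rng = np.random.default_rng(2)
y = rng.normal(size=10_000)
# Grid-search the constant q minimizing the tau = 0.9 pinball loss
grid = np.linspace(-3, 3, 601)
losses = [pinball_loss(q, y, 0.9) for q in grid]
q_hat = float(grid[int(np.argmin(losses))])
print(q_hat, float(np.quantile(y, 0.9)))  # both near the N(0,1) 0.9-quantile, about 1.28
```

Replacing the constant q by a linear predictor x'b turns this into full quantile regression; large-scale methods like the paper's reduce the number of rows before solving that minimization.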

9.
10.
Statistical methods for asymmetric normal classification do not adapt well to situations where the population distributions are perturbed by an interval-screening scheme. This paper explores methods for optimally classifying future samples in this situation. The properties of the screened population distributions are considered, and two optimal regions for classifying future samples are obtained. These developments yield further rules for interval-screened asymmetric normal classification. The rules are studied from several aspects, such as the probability of misclassification, robustness, and estimation of the rules. The performance of the rules, and the screened classification idea itself, are illustrated using two numerical examples.

11.
Interval-censored data are very common in reliability and lifetime data analysis. This paper investigates the performance of different estimation procedures for a special type of interval-censored data, namely grouped data, from three widely used lifetime distributions. The approaches considered include maximum likelihood estimation, minimum distance estimation based on a chi-square criterion, moment estimation based on an imputation (IM) method, and an ad hoc estimation procedure. Although IM-based techniques have been used extensively in recent years, we show that this method is not always effective. We find that the ad hoc estimation procedure is equivalent to minimum distance estimation under another distance metric and is more effective in simulation. The procedures are presented and their performance investigated by Monte Carlo simulation for various combinations of sample sizes and parameter settings. The numerical results provide guidelines for practitioners who need to choose a good estimation approach for analysing grouped data.
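Maximum likelihood for grouped data maximizes a multinomial likelihood over the cell probabilities implied by the lifetime distribution. Here is a hedged sketch for an exponential lifetime with arbitrary grouping intervals; the bin edges, true rate, and grid search are illustrative choices, not the paper's simulation design.

```python
import numpy as np

def grouped_exp_loglik(rate, edges, counts):
    """Log-likelihood of grouped exponential data: each observation is only
    known to fall in [edges[i], edges[i+1])."""
    cdf = 1.0 - np.exp(-rate * np.asarray(edges, dtype=float))
    probs = np.diff(cdf)               # cell probabilities under the model
    return float(np.sum(counts * np.log(probs)))

rng = np.random.default_rng(3)
true_rate = 0.5
x = rng.exponential(1 / true_rate, size=5_000)
edges = np.array([0.0, 1.0, 2.0, 4.0, np.inf])     # grouping intervals
counts = np.histogram(x, bins=edges)[0]            # only these counts are observed
# Grid-search MLE of the rate from the grouped counts alone
grid = np.linspace(0.1, 1.5, 281)
rate_hat = float(grid[int(np.argmax([grouped_exp_loglik(r, edges, counts) for r in grid]))])
print(rate_hat)  # near the true rate 0.5
```

The minimum-distance and imputation-based estimators the paper compares start from the same grouped counts but replace the log-likelihood with a chi-square distance or with imputed within-cell values.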

12.
On some data oriented robust estimation procedures for means
Estimating means from large data sets is an important problem. Since outliers usually occur, the trimmed mean is a robust estimator of location. After building a reasonable linear model to explain the relationship between suitably transformed symmetric data and approximately standardized normal statistics, we find the trimming proportion that yields the smallest variance of the trimmed mean. The related statistical inference is also discussed. An empirical study based on an annual survey of inbound visitors to the Taiwan area is used to illustrate the choice of trimming proportion. We propose a complete procedure for attaining this goal.

13.
A two-sample problem for rank-order data is formulated as a two-decision problem. Using the general Bayes solution, Bayes procedures are derived for several configurations of the set of states of nature including some for which the problem is distribution-free. It is shown that for certain prior distributions these procedures reduce to classical LMP rank tests. Some devices for selection of prior distributions are suggested. It is shown that the Bayes risk of these procedures tends to zero as sample sizes increase.

14.
Methods for estimating probabilities on sample spaces for ordered-categorical variables are surveyed. The methods all involve smoothing the relative frequencies in ways that respect the ordering among categories. Approaches of this type include convex smoothing, weighting-function and kernel-based methods, near-neighbour methods, Bayes-based methods and penalized minimum-distance methods. The relationships among the methods are brought out, application is made to a medical example, and a simulation study comparing the methods on univariate and bivariate examples is reported. Links with smoothing procedures in other contexts are indicated.
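One of the approaches the survey covers, kernel smoothing over ordered categories, can be sketched briefly: each category's probability borrows strength from its neighbours through a discrete kernel that decays with category distance. The geometric kernel and bandwidth below are an illustrative choice, not a specific method from the survey.

```python
import numpy as np

def smooth_probs(counts, h=0.5):
    """Smooth category proportions with a geometric discrete kernel of
    bandwidth h in (0, 1); h -> 0 recovers the raw relative frequencies."""
    counts = np.asarray(counts, dtype=float)
    k = len(counts)
    raw = counts / counts.sum()
    idx = np.arange(k)
    W = h ** np.abs(idx[:, None] - idx[None, :])   # weight decays with category distance
    W /= W.sum(axis=1, keepdims=True)              # normalize each category's weights
    s = W @ raw
    return s / s.sum()                             # renormalize to a probability vector

counts = [0, 3, 7, 0, 5]        # sparse ordinal counts with empty cells
p = smooth_probs(counts, h=0.4)
print(np.round(p, 3), float(p.sum()))  # all cells positive, sums to 1
```

Note how the empty cells receive positive probability from their neighbours while the overall shape, peaking at the third category, is preserved.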

15.
This paper introduces robust procedures for estimating quantiles of a continuous random variable from data, without any other assumptions about the probability distribution. We construct a linear regression model connecting a suitable symmetric data transformation with approximate standard normal statistics. Statistical properties of this linear regression model and its applications are studied, including estimators of quantiles, the quartile mean, the quartile deviation, the correlation coefficient of quantiles, and the standard errors of these estimators. Empirical examples illustrate the statistical properties, and we apply our estimators to grouped data.

16.
17.
In this article, we propose a novel approach to fitting a functional linear regression in which both the response and the predictor are functions. We consider the case where the response and the predictor processes are both sparsely sampled at random time points and are contaminated with random errors. In addition, the random times are allowed to differ between the measurements of the predictor and the response functions, a situation that often occurs in longitudinal data settings. To estimate the covariance and cross-covariance functions, we use a regularization method over a reproducing kernel Hilbert space. The estimate of the cross-covariance function is used to obtain estimates of the regression coefficient function and of the functional singular components. We derive convergence rates for the proposed cross-covariance, regression coefficient, and singular component function estimators. Furthermore, we show that, under some regularity conditions, the estimator of the coefficient function has a minimax optimal rate. We conduct a simulation study and demonstrate the merits of the proposed method by comparing it with other existing methods in the literature. We illustrate the method with an application to a real-world air quality dataset. The Canadian Journal of Statistics 47: 524–559; 2019 © 2019 Statistical Society of Canada

18.
We consider model selection for linear mixed-effects models with clustered structure, where the conditional Kullback–Leibler (CKL) loss measures the efficiency of the selection. We estimate the CKL loss by substituting empirical best linear unbiased predictors (EBLUPs) for the random effects, with model parameters estimated by maximum likelihood. Although the BLUP approach is commonly used for predicting random effects and future observations, selecting random effects to achieve asymptotic loss efficiency with respect to the CKL loss is challenging and has not been well studied. In this paper, we propose addressing this difficulty using a conditional generalized information criterion (CGIC) with two tuning parameters. We further consider a challenging but practically relevant situation in which the number m of clusters does not go to infinity with the sample size, so that the random-effects variances are not consistently estimable. We show, via a novel decomposition of the CKL risk, that the CGIC achieves consistency and asymptotic loss efficiency whether m is fixed or increases to infinity with the sample size. We also conduct numerical experiments to illustrate the theoretical findings.

19.
Statistics and Computing - We propose a novel structure selection method for high-dimensional (d > 100) sparse vine copulas. Current sequential greedy approaches for structure...

20.
This paper presents a brief introduction to selection and ranking methodology. Both the indifference zone and subset selection approaches are discussed, along with some modifications and generalizations. Two examples illustrate the use of the subset selection and indifference zone approaches. The paper concludes with the remark that selection and ranking methodology is a realistic approach in statistical analyses involving comparisons among two or more treatments.

