首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 687 毫秒
1.
I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, with surrogate measurements of the latent variables on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In such situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. Three circumstances where analytic calculations of the optimal design are possible are considered: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed, but the outcome is binary. In each of these cases, it is often possible to considerably improve the cost efficiency of the design by appropriate selection of the sampling fractions. More complex situations arise when the data are spatially distributed: the spatial correlation can be exploited to improve exposure assignment for unmeasured locations using available measurements on neighboring locations; some approaches for informative selection of the measurement sample using location and/or exposure predictor data are considered.  相似文献   

2.
The Hilbert–Huang transform uses the empirical mode decomposition (EMD) method to analyze nonlinear and nonstationary data. This method breaks a time series of data into several orthogonal sequences based on differences in frequency. These data components include the intrinsic mode functions (IMFs) and the final residue. Although IMFs have been used in the past as predictors for other variables, very little effort has been devoted to identifying the most effective predictors among IMFs. As lasso is a widely used method for feature selection within complex datasets, the main objective of this article is to present a lasso regression based on the EMD method for choosing decomposed components that exhibit the strongest effects. Both numerical experiments and empirical results show that the proposed modeling process can use time-frequency structure within data to reveal interactions between two variables. This allows for more accurate predictions concerning future events.  相似文献   

3.
For estimation of time-varying coefficient longitudinal models, the widely used local least-squares (LS) or covariance-weighted local LS smoothing uses information from the local sample average. Motivated by the fact that a combination of multiple quantiles provides a more complete picture of the distribution, we investigate quantile regression-based methods to improve efficiency by optimally combining information across quantiles. Under the working independence scenario, the asymptotic variance of the proposed estimator approaches the Cramér–Rao lower bound. In the presence of dependence among within-subject measurements, we adopt a prewhitening technique to transform regression errors into independent innovations and show that the prewhitened optimally weighted quantile average estimator asymptotically achieves the Cramér–Rao bound for the independent innovations. Fully data-driven bandwidth selection and optimal weights estimation are implemented through a two-step procedure. Monte Carlo studies show that the proposed method delivers more robust and superior overall performance than that of the existing methods.  相似文献   

4.
ABSTRACT

Environmental data is typically indexed in space and time. This work deals with modelling spatio-temporal air quality data, when multiple measurements are available for each space-time point. Typically this situation arises when different measurements referring to several response variables are observed in each space-time point, for example, different pollutants or size resolved data on particular matter. Nonetheless, such a kind of data also arises when using a mobile monitoring station moving along a path for a certain period of time. In this case, each spatio-temporal point has a number of measurements referring to the response variable observed several times over different locations in a close neighbourhood of the space-time point. We deal with this type of data within a hierarchical Bayesian framework, in which observed measurements are modelled in the first stage of the hierarchy, while the unobserved spatio-temporal process is considered in the following stages. The final model is very flexible and includes autoregressive terms in time, different structures for the variance-covariance matrix of the errors, and can manage covariates available at different space-time resolutions. This approach is motivated by the availability of data on urban pollution dynamics: fast measures of gases and size resolved particulate matter have been collected using an Optical Particle Counter located on a cabin of a public conveyance that moves on a monorail on a line transect of a town. Urban microclimate information is also available and included in the model. Simulation studies are conducted to evaluate the performance of the proposed model over existing alternatives that do not model data over the first stage of the hierarchy.  相似文献   

5.
We consider nonparametric estimation of a regression curve when the data are observed with Berkson errors or with a mixture of classical and Berkson errors. In this context, other existing nonparametric procedures can either estimate the regression curve consistently on a very small interval or require complicated inversion of an estimator of the Fourier transform of a nonparametric regression estimator. We introduce a new estimation procedure which is simpler to implement, and study its asymptotic properties. We derive convergence rates which are faster than those previously obtained in the literature, and we prove that these rates are optimal. We suggest a data-driven bandwidth selector and apply our method to some simulated examples.  相似文献   

6.
Here we consider a multinomial probit regression model where the number of variables substantially exceeds the sample size and only a subset of the available variables is associated with the response. Thus selecting a small number of relevant variables for classification has received a great deal of attention. Generally when the number of variables is substantial, sparsity-enforcing priors for the regression coefficients are called for on grounds of predictive generalization and computational ease. In this paper, we propose a sparse Bayesian variable selection method in multinomial probit regression model for multi-class classification. The performance of our proposed method is demonstrated with one simulated data and three well-known gene expression profiling data: breast cancer data, leukemia data, and small round blue-cell tumors. The results show that compared with other methods, our method is able to select the relevant variables and can obtain competitive classification accuracy with a small subset of relevant genes.  相似文献   

7.
This work deals with semiparametric kernel estimator of probability mass functions which are assumed to be modified Poisson distributions. This semiparametric approach is based on discrete associated kernel method appropriated for modelling count data; in particular, the famous discrete symmetric triangular kernels are used. Two data-driven bandwidth selection procedures are investigated and an explicit expression of optimal bandwidth not available until now is provided. Moreover, some asymptotic properties of the cross-validation criterion adapted for discrete semiparametric kernel estimation are studied. Finally, to measure the performance of semiparametric estimator according to each type of bandwidth parameter, some applications are realized on three real count data-sets from sociology and biology.  相似文献   

8.
Bandwidth plays an important role in determining the performance of nonparametric estimators, such as the local constant estimator. In this article, we propose a Bayesian approach to bandwidth estimation for local constant estimators of time-varying coefficients in time series models. We establish a large sample theory for the proposed bandwidth estimator and Bayesian estimators of the unknown parameters involved in the error density. A Monte Carlo simulation study shows that (i) the proposed Bayesian estimators for bandwidth and parameters in the error density have satisfactory finite sample performance; and (ii) our proposed Bayesian approach achieves better performance in estimating the bandwidths than the normal reference rule and cross-validation. Moreover, we apply our proposed Bayesian bandwidth estimation method for the time-varying coefficient models that explain Okun’s law and the relationship between consumption growth and income growth in the U.S. For each model, we also provide calibrated parametric forms of the time-varying coefficients. Supplementary materials for this article are available online.  相似文献   

9.
Nonparametric tests of modality are a distribution-free way of assessing evidence about inhomogeneity in a population, provided that the potential sub populations are sufficiently well separated. They include the excess mass and dip tests, which are equivalent in univariate settings and are alternatives to the bandwidth test. Only very conservative forms of the excess mass and dip tests are available at presently, however, and for that reason they are generally not competitive with the bandwidth test. In the present paper we develop a practical approach to calibrating the excess mass and dip tests to improve their level accuracy and power substantially. Our method exploits the fact that the limiting distribution of the excess mass statistic under the null hypothesis depends on unknowns only through a constant, which may be estimated. Our calibrated test exploits this fact and is shown to have greater power and level accuracy than the bandwidth test has. The latter tends to be quite conservative, even in an asymptotic sense. Moreover, the calibrated test avoids difficulties that the bandwidth test has with spurious modes in the tails, which often must be discounted through subjective intervention of the experimenter.  相似文献   

10.
11.
A fast Bayesian method that seamlessly fuses classification and hypothesis testing via discriminant analysis is developed. Building upon the original discriminant analysis classifier, modelling components are added to identify discriminative variables. A combination of cake priors and a novel form of variational Bayes we call reverse collapsed variational Bayes gives rise to variable selection that can be directly posed as a multiple hypothesis testing approach using likelihood ratio statistics. Some theoretical arguments are presented showing that Chernoff-consistency (asymptotically zero type I and type II error) is maintained across all hypotheses. We apply our method on some publicly available genomics datasets and show that our method performs well in practice for its computational cost. An R package VaDA has also been made available on Github.  相似文献   

12.
The class of single-index models (SIMs) has become an important tool for nonparametric regression analysis. As with any other nonparametric regression models, the selection of bandwidth plays an important role in the inferences of the SIMs. However, most results in the literature either take the bandwidths as externally given, or require unpractical assumptions or very restrictive conditions for data-driven bandwidths. We examine the asymptotic properties of a popular bandwidth selection method based on cross-validation that is completely data-driven, under much weaker conditions than those assumed in the literature. And we show that the same bandwidth that is optimal for estimating the index vector, can be used for nearly optimal error variance estimation through the method of varying cross-validation. A simulation study is presented to demonstrate the finite sample performance of the proposed procedures, based on which we recommend a simple 2-step procedure for bandwidth selection, index vector estimation, as well as error variance estimation.  相似文献   

13.
The traditional and readily available multivariate analysis of variance (MANOVA) tests such as Wilks' Lambda and the Pillai–Bartlett trace start to suffer from low power as the number of variables approaches the sample size. Moreover, when the number of variables exceeds the number of available observations, these statistics are not available for use. Ridge regularisation of the covariance matrix has been proposed to allow the use of MANOVA in high‐dimensional situations and to increase its power when the sample size approaches the number of variables. In this paper two forms of ridge regression are compared to each other and to a novel approach based on lasso regularisation, as well as to more traditional approaches based on principal components and the Moore‐Penrose generalised inverse. The performance of the different methods is explored via an extensive simulation study. All the regularised methods perform well; the best method varies across the different scenarios, with margins of victory being relatively modest. We examine a data set of soil compaction profiles at various positions relative to a ridgetop, and illustrate how our results can be used to inform the selection of a regularisation method.  相似文献   

14.
We study errors‐in‐variables problems when the response is binary and instrumental variables are available. We construct consistent estimators through taking advantage of the prediction relation between the unobservable variables and the instruments. The asymptotic properties of the new estimator are established and illustrated through simulation studies. We also demonstrate that the method can be readily generalized to generalized linear models and beyond. The usefulness of the method is illustrated through a real data example.  相似文献   

15.
Kernel-based density estimation algorithms are inefficient in presence of discontinuities at support endpoints. This is substantially due to the fact that classic kernel density estimators lead to positive estimates beyond the endopoints. If a nonparametric estimate of a density functional is required in determining the bandwidth, then the problem also affects the bandwidth selection procedure. In this paper algorithms for bandwidth selection and kernel density estimation are proposed for non-negative random variables. Furthermore, the methods we propose are compared with some of the principal solutions in the literature through a simulation study.  相似文献   

16.
Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for selection of variables, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessment of the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations, of multivariate binary as well as multivariate normal data, demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.  相似文献   

17.
A common problem in environmental epidemiology is the estimation and mapping of spatial variation in disease risk. In this paper we analyse data from the Walsall District Health Authority, UK, concerning the spatial distributions of cancer cases compared with controls sampled from the population register. We formulate the risk estimation problem as a nonparametric binary regression problem and consider two different methods of estimation. The first uses a standard kernel method with a cross-validation criterion for choosing the associated bandwidth parameter. The second uses the framework of the generalized additive model (GAM) which has the advantage that it can allow for additional explanatory variables, but is computationally more demanding. For the Walsall data, we obtain similar results using either the kernel method with controls stratified by age and sex to match the age–sex distribution of the cases or the GAM method with random controls but incorporating age and sex as additional explanatory variables. For cancers of the lung or stomach, the analysis shows highly statistically significant spatial variation in risk. For the less common cancers of the pancreas, the spatial variation in risk is not statistically significant.  相似文献   

18.
This paper focuses on bivariate kernel density estimation that bridges the gap between univariate and multivariate applications. We propose a subsampling-extrapolation bandwidth matrix selector that improves the reliability of the conventional cross-validation method. The proposed procedure combines a U-statistic expression of the mean integrated squared error and asymptotic theory, and can be used in both cases of diagonal bandwidth matrix and unconstrained bandwidth matrix. In the subsampling stage, one takes advantage of the reduced variability of estimating the bandwidth matrix at a smaller subsample size m (m < n); in the extrapolation stage, a simple linear extrapolation is used to remove the incurred bias. Simulation studies reveal that the proposed method reduces the variability of the cross-validation method by about 50% and achieves an expected integrated squared error that is up to 30% smaller than that of the benchmark cross-validation. It shows comparable or improved performance compared to other competitors across six distributions in terms of the expected integrated squared error. We prove that the components of the selected bivariate bandwidth matrix have an asymptotic multivariate normal distribution, and also present the relative rate of convergence of the proposed bandwidth selector.  相似文献   

19.
Regression parameter estimation in the Cox failure time model is considered when regression variables are subject to measurement error. Assuming that repeat regression vector measurements adhere to a classical measurement model, we can consider an ordinary regression calibration approach in which the unobserved covariates are replaced by an estimate of their conditional expectation given available covariate measurements. However, since the rate of withdrawal from the risk set across the time axis, due to failure or censoring, will typically depend on covariates, we may improve the regression parameter estimator by recalibrating within each risk set. The asymptotic and small sample properties of such a risk set regression calibration estimator are studied. A simple estimator based on a least squares calibration in each risk set appears able to eliminate much of the bias that attends the ordinary regression calibration estimator under extreme measurement error circumstances. Corresponding asymptotic distribution theory is developed, small sample properties are studied using computer simulations and an illustration is provided.  相似文献   

20.
The present study proposes a method to estimate the yield of a crop. The proposed Gaussian quadrature (GQ) method makes it possible to estimate the crop yield from a smaller subsample. Identification of plots and corresponding weights to be assigned to the yield of plots comprising a subsample is done with the help of information about the full sample on certain auxiliary variables relating to biometrical characteristics of the plant. Computational experience reveals that the proposed method leads to about 78% reduction in sample size with absolute percentage error of 2.7%. Performance of the proposed method has been compared with that of random sampling on the basis of the values of average absolute percentage error and standard deviation of yield estimates obtained from 40 samples of comparable size. Interestingly, average absolute percentage error as well as standard deviation is considerably smaller for the GQ estimates than for the random sample estimates. The proposed method is quite general and can be applied for other crops as well-provided information on auxiliary variables relating to yield contributing biometrical characteristics is available.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号