Similar literature
Found 20 similar documents; search took 15 ms.
1.
The authors consider the optimal design of sampling schedules for binary sequence data. They propose an approach which allows a variety of goals to be reflected in the utility function by including deterministic sampling cost, a term related to prediction, and if relevant, a term related to learning about a treatment effect. To this end, they use a nonparametric probability model relying on a minimal number of assumptions. They show how their assumption of partial exchangeability for the binary sequence of data allows the sampling distribution to be written as a mixture of homogeneous Markov chains of order k. The implementation follows the approach of Quintana & Müller (2004), which uses a Dirichlet process prior for the mixture.

2.
Nonparametric bootstrapping for hierarchical data is relatively underdeveloped and not straightforward: certainly it does not make sense to use simple nonparametric resampling, which treats all observations as independent. We provide some resampling strategies for hierarchical data, prove that the strategy of nonparametric bootstrapping on the highest level (randomly sampling all other levels without replacement within the highest level selected by randomly sampling the highest levels with replacement) is better than that on lower levels, analyze real data and perform simulation studies.
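The highest-level resampling strategy described in this abstract can be sketched for two-level data as follows. This is a minimal illustration, not the authors' implementation: the function names `bootstrap_top_level`, `grand_mean` and `bootstrap_se`, the grand-mean statistic, and the toy `clusters` data are all assumptions introduced here. Because the lower level is sampled without replacement in full, each selected cluster simply keeps all of its observations.

```python
import random
import statistics


def bootstrap_top_level(clusters, rng):
    """Draw one replicate: resample cluster ids WITH replacement and keep
    every observation inside each selected cluster unchanged."""
    ids = list(clusters)
    drawn = rng.choices(ids, k=len(ids))
    return [list(clusters[c]) for c in drawn]


def grand_mean(replicate):
    """Mean over all observations in a replicate (hypothetical statistic)."""
    values = [v for cluster in replicate for v in cluster]
    return sum(values) / len(values)


def bootstrap_se(clusters, n_boot=500, seed=0):
    """Bootstrap standard error of the grand mean under top-level resampling."""
    rng = random.Random(seed)
    means = [grand_mean(bootstrap_top_level(clusters, rng)) for _ in range(n_boot)]
    return statistics.stdev(means)


# Toy two-level data: cluster id -> observations (made up for illustration).
clusters = {"a": [1.0, 1.2, 0.9], "b": [2.1, 2.0], "c": [3.2, 2.9, 3.1, 3.0]}
print(bootstrap_se(clusters))
```

Resampling whole clusters preserves the within-cluster dependence that simple observation-level resampling would destroy.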

3.
We propose several diagnostic methods for checking the adequacy of marginal regression models for analyzing correlated binary data. We use a parametric marginal model based on latent variables and derive the projection (hat) matrix, Cook's distance, various residuals and Mahalanobis distance between the observed binary responses and the estimated probabilities for a cluster. Emphasized are several graphical methods including the simulated Q-Q plot, the half-normal probability plot with a simulated envelope, and the partial residual plot. The methods are illustrated with a real life example.

4.
Length-biased data are a particular case of weighted data, which arise in many situations: biomedicine, quality control or epidemiology among others. In this paper we study the theoretical properties of kernel density estimation in the context of length-biased data, proposing two consistent bootstrap methods that we use for bandwidth selection. Apart from the bootstrap bandwidth selectors we suggest a rule-of-thumb. These bandwidth selection proposals are compared with a least-squares cross-validation method. A simulation study is carried out to understand the behaviour of the procedures in finite samples.
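For orientation, a kernel density estimate for length-biased data can be sketched as below. The abstract does not state which estimator the paper studies; this sketch assumes the commonly used form in which each kernel is weighted by 1/X_i and rescaled by the harmonic mean of the sample, with a Gaussian kernel. The function name `length_biased_kde`, the sample values and the bandwidth `h` are illustrative assumptions, and no bandwidth selector from the paper is implemented.

```python
import math


def length_biased_kde(sample, h):
    """Return a density estimate f(y) for length-biased observations.

    Assumed form: f(y) = mu_hat * (1/n) * sum_i K_h(y - x_i) / x_i,
    where mu_hat is the harmonic mean of the sample and K_h is a
    Gaussian kernel with bandwidth h.
    """
    if h <= 0 or any(x <= 0 for x in sample):
        raise ValueError("bandwidth and observations must be positive")
    n = len(sample)
    mu_hat = n / sum(1.0 / x for x in sample)  # harmonic mean
    norm = math.sqrt(2.0 * math.pi) * h

    def density(y):
        total = 0.0
        for x in sample:
            u = (y - x) / h
            total += math.exp(-0.5 * u * u) / (norm * x)
        return mu_hat * total / n

    return density


# Hypothetical length-biased sample and bandwidth.
f = length_biased_kde([0.8, 1.5, 2.3, 3.1, 4.0], h=0.5)
print(f(2.0))
```

The 1/X_i weights undo the tendency of length-biased sampling to over-represent large values, and the harmonic-mean factor makes the estimate integrate to one.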

5.
Consider a detector which records the times at which the endogenous variable of a nonparametric regression model exceeds a certain threshold. If the error distribution is known, the regression function can still be identified from these threshold data. The author constructs estimators for the regression function that are transformations of kernel estimators. She determines the bandwidth that minimizes the asymptotic mean average squared error. Her investigation was motivated by recent work on stochastic resonance in neuroscience and signal detection theory, where it was observed that detection of a subthreshold signal is enhanced by the addition of noise. The author compares her model with several others that have been proposed in the recent past.

6.
This paper investigates a nonparametric spatial predictor of a stationary multidimensional spatial process observed over a rectangular domain. The proposed predictor depends on two kernels in order to control both the distance between observations and that between spatial locations. The uniform almost complete consistency and the asymptotic normality of the kernel predictor are obtained when the sample considered is an alpha-mixing sequence. Numerical studies were carried out in order to illustrate the behaviour of our methodology both for simulated data and for an environmental data set.

7.
We present a simulation study and application that show that inclusion of binary proxy variables related to binary unmeasured confounders improves the estimate of a related treatment effect in binary logistic regression. The simulation study included 60,000 randomly generated parameter scenarios of sample size 10,000 across six different simulation structures. We assessed bias by comparing the probability of finding the expected treatment effect relative to the modeled treatment effect with and without the proxy variable. Inclusion of a proxy variable in the logistic regression model significantly reduced the bias of the treatment or exposure effect when compared to logistic regression without the proxy variable. Including proxy variables in the logistic regression model improves the estimation of the treatment effect at weak, moderate, and strong association with unmeasured confounders and the outcome, treatment, or proxy variables. Comparative advantages held for weakly and strongly collapsible situations, as the number of unmeasured confounders increased, and as the number of proxy variables adjusted for increased.

8.
The exponential regression model is important in analyzing data from heterogeneous populations. In this paper we propose a simple method to estimate the regression parameters using binary data. Under certain design distributions for the explanatory variables, including elliptically symmetric distributions, the estimators are shown to be consistent and asymptotically normal when the sample size is large. For finite samples, the new estimates were shown to behave reasonably well. They are competitive with the maximum likelihood estimates and, more importantly, according to our simulation results, the CPU time for computing the new estimates is only 1/7 of that required for computing the usual maximum likelihood estimates. We expect the savings in CPU time to be more dramatic for regression parameter spaces of larger dimension.

9.
10.
In this article, we study the methods for two-sample hypothesis testing of high-dimensional data coming from a multivariate binary distribution. We test the random projection method and apply an Edgeworth expansion for improvement. Additionally, we propose new statistics which are especially useful for sparse data. We compare the performance of these tests in various scenarios through simulations run in a parallel computing environment. Additionally, we apply these tests to the 20 Newsgroup data showing that our proposed tests have considerably higher power than the others for differentiating groups of news articles with different topics.

11.
Right, left or interval censored multivariate data can be represented by an intersection graph. Focussing on the bivariate case, the authors relate the structure of such an intersection graph to the support of the nonparametric maximum likelihood estimate (NPMLE) of the cumulative distribution function (CDF) for such data. They distinguish two types of non‐uniqueness of the NPMLE: representational, arising when the likelihood is unaffected by the distribution of the estimated probability mass within regions, and mixture, arising when the masses themselves are not unique. The authors provide a brief overview of estimation techniques and examine three data sets.

12.
In this paper we discuss the partial least squares (PLS) prediction method. The method is compared to the predictor based on principal component regression (PCR). Both theoretical considerations and computations on artificial and real data are presented.

13.
Methods for the simultaneous analysis of the relationships of binary variables for efficacy and toxicity to dosage of an experimental drug are developed. Properties of two models of ‘within-dose’ dependence of efficacy and toxicity in parallel designs (one a bivariate analogue of the familiar univariate logistic model, and the other an adaptation of a general model developed by D.R. Cox) are explored. The cell probabilities predicted by these models are often quite similar to those predicted by a model of independence of efficacy and toxicity, but large discrepancies can occur when there is approximate equality of the median effective and median toxic doses. Asymptotic variances of estimates of parameters involved in assessing correlation are large when there is little or no dependence in the data, but parameters can be estimated with good precision in at least some cases of moderate to strong dependence between efficacy and toxicity.

14.
We propose a universal robust likelihood that is able to accommodate correlated binary data without any information about the underlying joint distributions. This likelihood function is asymptotically valid for the regression parameter for any underlying correlation configurations, including varying under- or over-dispersion situations, which undermines one of the regularity conditions ensuring the validity of crucial large sample theories. This robust likelihood procedure can be easily implemented by using any statistical software that provides naïve and sandwich covariance matrices for regression parameter estimates. Simulations and real data analyses are used to demonstrate the efficacy of this parametric robust method.

15.
Doubly truncated data appear in a number of applications, including astronomy and survival analysis. For doubly truncated data, the lifetime T is observable only when U ≤ T ≤ V, where U and V are the left-truncation and right-truncation times, respectively. In some situations, the lifetime T is also subject to interval censoring. Using the EM algorithm of Turnbull [The empirical distribution function with arbitrarily grouped censored and truncated data, J. R. Stat. Soc. Ser. B 38 (1976), pp. 290–295] and the iterative convex minorant algorithm [P. Groeneboom and J.A. Wellner, Information Bounds and Nonparametric Maximum Likelihood Estimation, Birkhäuser, Basel, 1992], we study the performance of the nonparametric maximum-likelihood estimates (NPMLEs) of the distribution function of T. Simulation results indicate that the NPMLE performs adequately in finite samples.

16.
The authors consider regression analysis for binary data collected repeatedly over time on members of numerous small clusters of individuals sharing a common random effect that induces dependence among them. They propose a mixed model that can accommodate both these structural and longitudinal dependencies. They estimate the parameters of the model consistently and efficiently using generalized estimating equations. They show through simulations that their approach yields significant gains in mean squared error when estimating the random effects variance and the longitudinal correlations, while providing estimates of the fixed effects that are just as precise as under a generalized penalized quasi‐likelihood approach. Their method is illustrated using smoking prevention data.

17.
In binary regression, imbalanced data result from the presence of values equal to zero (or one) in a proportion that is significantly greater than the corresponding real values of one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them to the use of asymmetric links. The results of a simulation study show that the correction methods do not adequately correct bias in the estimation of regression coefficients and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among the various ones proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models are compared using different criteria for model comparison, predictive evaluation and quantile residuals.
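To illustrate what an asymmetric power link looks like, the sketch below uses one common parameterization built on the logistic distribution: the power link raises the logistic CDF to a power, and the reverse power link reflects it. The abstract gives no formulas, so this parameterization, the function names `power_logit` and `reverse_power_logit`, and the shape parameter `lam` are assumptions made here rather than the paper's definitions.

```python
import math


def logistic_cdf(eta):
    """Standard logistic CDF, the symmetric baseline link."""
    return 1.0 / (1.0 + math.exp(-eta))


def power_logit(eta, lam):
    """Assumed power link: P(Y = 1) = F(eta) ** lam, with lam > 0."""
    return logistic_cdf(eta) ** lam


def reverse_power_logit(eta, lam):
    """Assumed reverse power link: P(Y = 1) = 1 - F(-eta) ** lam."""
    return 1.0 - logistic_cdf(-eta) ** lam


# With lam = 1 both links reduce to the ordinary logistic link;
# other values of lam skew the response curve in opposite directions.
for lam in (0.5, 1.0, 2.0):
    print(lam, power_logit(0.0, lam), reverse_power_logit(0.0, lam))
```

The extra shape parameter lets the curve approach zero and one at different rates, which is the kind of asymmetry the abstract contrasts with bias-correction methods.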

18.
19.
20.
A multiple regression method based on distance analysis and metric scaling is proposed and studied. This method allows us to predict a continuous response variable from several explanatory variables, is compatible with the general linear model and is found to be useful when the predictor variables are both continuous and categorical. Real data examples are given to illustrate the results obtained.
