Similar Literature
20 similar articles found (search time: 558 ms)
1.
This paper presents a simple computational procedure for generating ‘matching’ or ‘cloning’ datasets so that they have exactly the same fitted multiple linear regression equation. The method is simple to implement and provides an alternative to generating datasets under an assumed model. The advantage is that, unlike the case for the straight model‐based alternative, parameter estimates from the original data and the generated data do not include any model error. This distinction suggests that ‘same fit’ procedures may provide a general and useful alternative to model‐based procedures, and have a wide range of applications. For example, as well as being useful for teaching, cloned datasets can provide a model‐free way of confidentializing data.
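One elementary ‘same fit’ construction (an illustration of the idea, not necessarily the paper's procedure) reflects the response through its fitted values: because the fitted values are a linear projection of the response, the cloned response y* = 2ŷ − y reproduces the OLS coefficients exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# OLS fit on the original data
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta

# 'Cloned' response: reflect y through its fitted values.
# Since y_hat = H y with H the hat matrix, H (2 y_hat - y) = y_hat,
# so the clone has exactly the same fitted regression equation.
y_clone = 2 * y_hat - y
beta_clone = np.linalg.lstsq(X, y_clone, rcond=None)[0]

print(np.allclose(beta, beta_clone))  # True: identical fitted equation
```

The cloned data differ from the original observation by observation (the residuals flip sign), which is what makes the construction potentially useful for confidentialization.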

2.
In proteomics, identification of proteins from complex mixtures extracted from biological samples is an important problem. Among the experimental technologies, mass spectrometry (MS) is the most popular. Protein identification from MS data typically relies on a ‘two-step’ procedure: peptides are identified first, followed by a separate protein identification step. In this setup, the interdependence of peptides and proteins is neglected, resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo based Bayesian hierarchical model, the first of its kind in protein identification, which integrates the two steps and performs joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence of proteins by placing clustering group priors on the proteins, based on the assumption that proteins sharing the same biological pathway are likely to be present or absent together and are therefore correlated. Because the complete conditionals of the proposed joint model are tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides estimates and statistical uncertainties for all relevant parameters. The model has better operational characteristics than two existing ‘one-step’ procedures on a range of simulation settings as well as on two well-studied datasets.

3.
An objective of randomized placebo-controlled preventive HIV vaccine efficacy (VE) trials is to assess the relationship between vaccine effects to prevent HIV acquisition and continuous genetic distances of the exposing HIVs to multiple HIV strains represented in the vaccine. The set of genetic distances, only observed in failures, is collectively termed the ‘mark.’ The objective has motivated a recent study of a multivariate mark-specific hazard ratio model in the competing risks failure time analysis framework. Marks of interest, however, are commonly subject to substantial missingness, largely due to rapid post-acquisition viral evolution. In this article, we investigate the mark-specific hazard ratio model with missing multivariate marks and develop two inferential procedures based on (i) inverse probability weighting (IPW) of the complete cases, and (ii) augmentation of the IPW estimating functions by leveraging auxiliary data predictive of the mark. Asymptotic properties and finite-sample performance of the inferential procedures are presented. This research also provides general inferential methods for semiparametric density ratio/biased sampling models with missing data. We apply the developed procedures to data from the HVTN 502 ‘Step’ HIV VE trial.
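The core of approach (i) can be illustrated with a toy example (hypothetical data, not the HVTN 502 analysis): when a ‘mark’ is missing at random given an observed covariate, the complete-case mean is biased, while inverse probability weighting of the complete cases corrects it. Here the observation probabilities are assumed known for simplicity; in practice they would be estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                      # auxiliary covariate, always observed
mark = 1.0 + 0.5 * x + rng.normal(size=n)   # the 'mark'; its true mean is 1.0

# Missingness depends on the observed covariate (missing at random)
p_obs = 1 / (1 + np.exp(-(0.5 + x)))        # observation probabilities
r = rng.binomial(1, p_obs)                  # r = 1: mark observed

# Complete-case mean over-represents large-x (hence large-mark) subjects
cc_mean = mark[r == 1].mean()

# IPW re-weights complete cases by 1 / p_obs, restoring the target mean
ipw_mean = np.sum((r / p_obs) * mark) / np.sum(r / p_obs)

print(round(cc_mean, 3), round(ipw_mean, 3))
```

The augmented estimator in (ii) would additionally use x-based predictions of the mark for the incomplete cases, gaining efficiency and double robustness.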

4.
An objective of randomized placebo‐controlled preventive HIV vaccine efficacy trials is to assess the relationship between the vaccine effect to prevent infection and the genetic distance of the exposing HIV to the HIV strain represented in the vaccine construct. Motivated by this objective, a mark‐specific proportional hazards (PH) model with a continuum of competing risks has recently been studied, where the genetic distance of the transmitting strain is the continuous ‘mark’, defined and observable only in failures. A high percentage of the genetic marks of interest may be missing for a variety of reasons, predominantly because of rapid evolution of HIV sequences after transmission, before the blood sample from which HIV sequences are measured is drawn. This research investigates the stratified mark‐specific PH model with missing marks, where the baseline functions may vary with strata. We develop two consistent estimation approaches, the first based on the inverse probability weighted complete‐case (IPW) technique, and the second based on augmenting the IPW estimator by incorporating auxiliary information predictive of the mark. We investigate the asymptotic properties and finite‐sample performance of the two estimators, and show that the augmented IPW estimator, which satisfies a double robustness property, is more efficient.

5.
We estimate a joint model of the formation and dissolution of cohabiting and marital unions among British women who were born in 1970. The focus of the analysis is the effect of previous cohabitation and marriage on subsequent partnership transitions. We use a multilevel simultaneous equations event history model to allow for residual correlation between the hazards of moving from an unpartnered state into cohabitation or marriage, converting a cohabiting union into marriage and dissolution of either form of union. A simultaneous modelling approach allows for the joint determination of these transitions, which may otherwise bias estimates of the effects of previous partnership outcomes on later transitions.

6.
The classical competing risks model deals with failure times of items subject to multiple causes of failure. We propose a version of the well-known ‘new better than used’ and ‘new worse than used’ ageing notions for random lifetimes within a specific cause of failure in the competing risks set up. Various properties of these new notions are studied, including the closure under time-dependent scale changes, as well as their connection to the hazard rate functions of the competing risks model. A data analytic example is finally provided.

7.
We derive and investigate a variant of AIC, the Akaike information criterion, for model selection in settings where the observed data is incomplete. Our variant is based on the motivation provided for the PDIO (‘predictive divergence for incomplete observation models’) criterion of Shimodaira (1994, in: Selecting Models from Data: Artificial Intelligence and Statistics IV, Lecture Notes in Statistics, vol. 89, Springer, New York, pp. 21–29). However, our variant differs from PDIO in its ‘goodness-of-fit’ term. Unlike AIC and PDIO, which require the computation of the observed-data empirical log-likelihood, our criterion can be evaluated using only complete-data tools, readily available through the EM algorithm and the SEM (‘supplemented’ EM) algorithm of Meng and Rubin (Journal of the American Statistical Association 86 (1991) 899–909). We compare the performance of our AIC variant to that of both AIC and PDIO in simulations where the data being modeled contains missing values. The results indicate that our criterion is less prone to overfitting than AIC and less prone to underfitting than PDIO.
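All the criteria compared above share the same structure: a goodness-of-fit term plus a penalty on the number of parameters. As background, the following sketch computes ordinary AIC for a generic complete-data polynomial-selection problem under a Gaussian error model (a textbook illustration of that structure, not the paper's incomplete-data variant).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=n)  # quadratic truth

def aic_poly(deg):
    # Fit a degree-`deg` polynomial by least squares and return
    # AIC = 2k - 2 * (maximized log-likelihood), with k = deg + 2
    # parameters (deg + 1 coefficients plus the error variance).
    coefs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coefs, x)
    sigma2 = np.mean(resid**2)                       # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (deg + 2) - 2 * loglik

aics = {d: aic_poly(d) for d in range(1, 6)}
best = min(aics, key=aics.get)
print(best)
```

The variants discussed in the abstract replace the goodness-of-fit term so it can be evaluated from complete-data quantities produced by the EM/SEM algorithms, rather than from the observed-data log-likelihood used here.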

8.
Cui, R., Groot, P. & Heskes, T. (2019). Statistics and Computing, 29(2), 311–333.

We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the ‘Rank PC’ algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. When the data are missing completely at random (MCAR), we provide an error bound on the accuracy of ‘Rank PC’ and show its high-dimensional consistency. However, when the data are missing at random (MAR), ‘Rank PC’ fails dramatically. Therefore, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data that still works correctly under MAR. These samples are translated into an average correlation matrix and an effective sample size, resulting in the ‘Copula PC’ algorithm for incomplete data. Simulation studies show that: (1) ‘Copula PC’ estimates a more accurate correlation matrix and causal structure than ‘Rank PC’ under MCAR and, even more so, under MAR, and (2) the use of the effective sample size significantly improves the performance of ‘Rank PC’ and ‘Copula PC.’ We illustrate our methods on two real-world datasets: riboflavin production data and chronic fatigue syndrome data.
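Two ingredients of ‘Rank PC’ for incomplete data can be sketched in a few lines (an illustration of the ideas, not the authors' implementation): Spearman rank correlation computed on pairwise-complete observations, mapped to the latent Pearson scale with the standard nonparanormal transform r = 2·sin(π·ρ_s/6), plus a simple pairwise-complete count standing in for the effective sample size.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=n)
df = pd.DataFrame({
    "a": z + 0.3 * rng.normal(size=n),   # a and b share the latent factor z
    "b": z + 0.3 * rng.normal(size=n),
    "c": rng.normal(size=n),             # c is independent of a and b
})
df = df.mask(rng.random(df.shape) < 0.2)  # 20% missing completely at random

# Spearman correlation on pairwise-complete observations (pandas' default
# handling of NaN), then the nonparanormal map to the latent Pearson scale.
rho_s = df.corr(method="spearman")
rho = 2 * np.sin(np.pi * rho_s / 6)

# A simple effective sample size for the (a, b) test: the number of
# pairwise-complete observations for that pair.
n_eff = int((df["a"].notna() & df["b"].notna()).sum())
print(n_eff)
```

The PC conditional-independence tests would then use `rho` and `n_eff` in place of the complete-data correlation matrix and sample size; the ‘Copula PC’ variant instead averages correlation matrices drawn by Gibbs sampling so that MAR data are handled correctly.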


9.
10.

Background: Instrumental variables (IVs) have become much easier to find in the “Big data era”, which has increased the number of applications of the two-stage least squares (TSLS) model. With the increased availability of IVs, the possibility that these IVs are weak has also increased. Prior work has suggested a ‘rule of thumb’ that IVs with a first-stage F statistic of at least ten will avoid relative bias in point estimates greater than 10%. We investigated whether this threshold also guarantees low false rejection rates of the null hypothesis test in TSLS applications with many IVs.

Objective: To test how the ‘rule of thumb’ for weak instruments performs in predicting low false rejection rates in the TSLS model when the number of IVs is large.

Method: We used a Monte Carlo approach to create 28 original data sets for different models, with the number of IVs varying from 3 to 30. For each model, we generated 2000 observations per iteration and conducted 50,000 iterations to reach convergence in rejection rates. The true coefficient was set to 0, and the probability of rejecting this null hypothesis was recorded for each model as a measure of the false rejection rate. The relationship between the endogenous variable and the IVs was carefully adjusted so that the F statistic for the first-stage model equalled ten, thus simulating the ‘rule of thumb.’

Results: We found that the false rejection rates (type I errors) increased as the number of IVs in the TSLS model increased while the first-stage F statistic was held equal to 10. The false rejection rate exceeded 10% when the TSLS model had 24 IVs and 15% when it had 30 IVs.

Conclusion: When more instrumental variables were applied in the model, the ‘rule of thumb’ was no longer an efficient guarantee of good performance in hypothesis testing. A more restrictive threshold for the first-stage F statistic is recommended to replace the ‘rule of thumb’, especially when the number of instrumental variables is large.
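A stripped-down version of one simulated dataset from a design of this kind (my own simple data-generating process, not the authors' 28 models) shows the two quantities at issue: the first-stage F statistic for many weak instruments, and the 2SLS estimate when the true coefficient is zero.

```python
import numpy as np

rng = np.random.default_rng(4)

def one_dataset(n=2000, k=30, pi=0.05, rho=0.5):
    """Simulate one dataset with k equally weak instruments and return
    the first-stage F statistic and the 2SLS estimate (true beta = 0)."""
    Z = rng.normal(size=(n, k))
    u = rng.normal(size=n)                    # structural error
    v = rho * u + rng.normal(size=n)          # first-stage error, correlated with u
    x = Z @ np.full(k, pi) + v                # endogenous regressor
    y = u                                     # y = 0 * x + u

    # First-stage F for H0: all k instrument coefficients are zero
    # (all variables are mean-zero by construction, so intercepts are omitted)
    pihat = np.linalg.lstsq(Z, x, rcond=None)[0]
    fitted = Z @ pihat
    rss1 = np.sum((x - fitted) ** 2)
    rss0 = np.sum((x - x.mean()) ** 2)
    F = ((rss0 - rss1) / k) / (rss1 / (n - k - 1))

    # 2SLS slope: regress y on the first-stage fitted values
    beta = np.sum(fitted * y) / np.sum(fitted * x)
    return F, beta

F, beta = one_dataset()
print(round(F, 2), round(beta, 3))
```

Looping this over many iterations, with `pi` rescaled so the expected first-stage F sits at 10 for each `k`, and counting how often a nominal 5% test rejects beta = 0, would let one probe the rising false rejection rates the abstract reports.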

11.
We consider observational studies in pregnancy where the outcome of interest is spontaneous abortion (SAB). At first sight this is a binary ‘yes’ or ‘no’ variable, although the data are subject to left truncation as well as right-censoring. Women who do not experience SAB by gestational week 20 are ‘cured’ of SAB by definition, that is, they are no longer at risk. Our data differ from the common cure data in the literature, where the cured subjects are always right-censored and never actually observed to be cured. We consider a commonly used cure rate model, with the likelihood function tailored specifically to our data. We develop a conditional nonparametric maximum likelihood approach. To tackle the computational challenge we adopt an EM algorithm making use of ‘ghost copies’ of the data, and a closed-form variance estimator is derived. Under suitable assumptions, we prove the consistency of the resulting estimator, which involves an unbounded cumulative baseline hazard function, as well as its asymptotic normality. Simulations are carried out to evaluate the finite-sample performance. We present the analysis of the motivating SAB study to illustrate the advantages of our model, which addresses both the occurrence and the timing of SAB, compared to existing approaches in practice.

12.
The problem of comparing two independent groups of univariate data in the sense of testing for equivalence is considered for a fully nonparametric setting. The distribution of the data within each group may be a mixture of both a continuous and a discrete component, and no assumptions are made regarding the way in which the distributions of the two groups of data may differ from each other – in particular, the assumption of a shift model is avoided. The proposed equivalence testing procedure for this scenario refers to the median of the independent difference distribution, i.e. to the median of the differences between independent observations from the test group and the reference group, respectively. The procedure provides an asymptotic equivalence test, which is symmetric with respect to the roles of ‘test’ and ‘reference’. It can be described either as a two‐one‐sided‐tests (TOST) approach, or equivalently as a confidence interval inclusion rule. A one‐sided variant of the approach can be applied analogously to non‐inferiority testing problems. The procedure may be generalised to equivalence testing with respect to quantiles other than the median, and is closely related to tolerance interval type inference.
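The TOST / confidence-interval-inclusion duality mentioned above is easiest to see in the classical parametric case. The sketch below (hypothetical data; a mean-difference TOST, whereas the paper's procedure is nonparametric and targets the median of the independent-difference distribution) runs two one-sided t-tests against a ±0.5 margin and checks that the verdict agrees with a 90% confidence interval inclusion rule.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 400
test_grp = rng.normal(0.0, 1.0, n)   # hypothetical 'test' group
ref_grp = rng.normal(0.0, 1.0, n)    # hypothetical 'reference' group
margin = 0.5                         # pre-specified equivalence margin

diff = test_grp.mean() - ref_grp.mean()
se = np.sqrt(test_grp.var(ddof=1) / n + ref_grp.var(ddof=1) / n)
dof = 2 * n - 2                      # pooled df, fine for equal group sizes

# Two one-sided tests: H01: diff <= -margin and H02: diff >= +margin.
# Rejecting both at level 0.05 establishes equivalence.
p_lower = stats.t.sf((diff + margin) / se, dof)
p_upper = stats.t.cdf((diff - margin) / se, dof)
tost_equivalent = max(p_lower, p_upper) < 0.05

# Equivalent formulation: the 90% confidence interval for the difference
# must lie entirely inside (-margin, +margin).
t_crit = stats.t.ppf(0.95, dof)
ci_equivalent = (diff - t_crit * se > -margin) and (diff + t_crit * se < margin)
```

Note the test is symmetric in the two groups, as in the abstract, and a one-sided variant (dropping one of the two hypotheses) gives a non-inferiority test.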

13.
It is well known that non-ignorable item non-response may occur when the cause of the non-response is the value of the latent variable of interest. In such cases, a respondent's refusal to answer specific questions in a survey should sometimes be treated as non-ignorable item non-response. The Rasch-Rasch model (RRM) is a new two-dimensional item response theory model for addressing non-ignorable non-response. This article demonstrates the use of the RRM on data from an Italian survey assessing healthcare workers’ knowledge about sudden infant death syndrome (a context in which non-response is presumed to be more likely among individuals with a low level of competence). We compare the performance of the RRM with other models within the Rasch model family that assume unidimensionality of the latent trait. We conclude that this assumption should be considered unreliable for the data at hand, whereas the RRM provides a better fit to the data.

14.
Within the context of California's public report of coronary artery bypass graft (CABG) surgery outcomes, we first thoroughly review popular statistical methods for profiling healthcare providers. Extensive simulation studies are then conducted to compare profiling schemes based on hierarchical logistic regression (LR) modeling under various conditions. Both Bayesian and frequentist methods are evaluated in classifying hospitals into ‘better’, ‘normal’ or ‘worse’ service providers. The simulation results suggest that no single method would dominate others on all accounts. Traditional schemes based on LR tend to identify too many false outliers, while those based on hierarchical modeling are relatively conservative. The issue of over-shrinkage in hierarchical modeling is also investigated using the 2005–2006 California CABG data set. The article provides theoretical and empirical evidence for choosing the right methodology for provider profiling.

15.
This article provides alternative circular smoothing methods for nonparametric estimation of periodic functions. By treating the data as ‘circular’, we solve the ‘boundary issue’ that arises in nonparametric estimation treating the data as ‘linear’. By redefining the distance metric and the signed distance, we modify many estimators used in situations involving periodic patterns. From the perspective of nonparametric estimation of periodic functions, we present examples of nonparametric estimation of (1) a periodic function, (2) multiple periodic functions, (3) an evolving function, (4) a periodically varying-coefficient model and (5) a generalized linear model with a periodically varying coefficient. From the perspective of circular statistics, we provide alternative approaches to calculating the weighted average and evaluating ‘linear/circular–linear/circular’ association and regression. Simulation studies and an empirical study of an electricity price index illustrate and compare our methods with other methods in the literature.
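The key modification — replacing the linear distance with a circular signed distance — can be sketched as follows (an illustrative kernel smoother over a 24-hour period, not the authors' exact estimators). A point just before midnight is then treated as close to a point just after midnight, removing the boundary issue.

```python
import numpy as np

def circ_dist(a, b, period=24.0):
    # Signed circular distance from b to a, wrapped into (-period/2, period/2]
    d = (a - b) % period
    return np.where(d > period / 2, d - period, d)

def circ_nw(t0, t, y, h=1.0, period=24.0):
    # Nadaraya-Watson estimate at t0 with a Gaussian kernel applied to the
    # circular distance, so observations just across the period boundary
    # still receive high weight.
    w = np.exp(-0.5 * (circ_dist(t0, t, period) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

t = np.arange(0.0, 24.0, 0.1)          # observation times over one period
y = np.cos(2 * np.pi * t / 24)         # a noiseless periodic signal
est = circ_nw(0.0, t, y)               # estimate at the boundary t = 0
```

A linear smoother at t = 0 would only see data on one side of the boundary; the circular version uses both sides, so `est` stays close to the true value cos(0) = 1.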

16.
Two types of bivariate models for categorical response variables are introduced to deal with special categories such as ‘unsure’ or ‘unknown’ in combination with other ordinal categories, while taking additional hierarchical data structures into account. The latter is achieved by the use of different covariance structures for a trivariate random effect. The models are applied to data from the INSIDA survey, where interest goes to the effect of covariates on the association between HIV risk perception (quadrinomial with an ‘unknown risk’ category) and HIV infection status (binary). The final model combines continuation-ratio with cumulative link logits for the risk perception, together with partly correlated and partly shared trivariate random effects for the household level. The results indicate that only age has a significant effect on the association between HIV risk perception and infection status. The proposed models may be useful in various fields of application such as social and biomedical sciences, epidemiology and public health.

17.
The choice of prior distributions for variance parameters is important and can be quite difficult in Bayesian hierarchical and variance component models. For situations where little prior information is available, a ‘noninformative’ type prior is usually chosen. ‘Noninformative’ priors have been discussed by many authors and used in many contexts. However, care must be taken when using these prior distributions, as many are improper and can thus lead to improper posterior distributions. Additionally, in small samples, these priors can be ‘informative’. In this paper, we investigate a proper ‘vague’ prior, the uniform shrinkage prior (Strawderman 1971; Christiansen & Morris 1997). We discuss its properties and show that posterior distributions for common hierarchical models under this prior are proper. We also illustrate the attractive frequentist properties of this prior for a normal hierarchical model, including testing and estimation. To conclude, we generalize this prior to the multivariate situation of a covariance matrix.

18.
We present a statistical methodology for fitting time‐varying rankings, by estimating the strength parameters of the Plackett–Luce multiple comparisons model at regularly spaced times for each ranked item. We use the little‐known method of barycentric rational interpolation to interpolate between the strength parameters so that a competitor's strength can be evaluated at any time. We chose the time‐varying strengths to evolve deterministically rather than stochastically, a preference that we reason often has merit. There are many statistical and computational problems to overcome on fitting anything beyond ‘toy’ data sets. The methodological innovations here include a method for maximizing a likelihood function for many parameters, approximations for modelling tied data and an approach to the elimination of secular drift of the estimated ‘strengths’. The methodology has obvious applications to fields such as marketing, although we demonstrate our approach by analysing a large data set of golf tournament results, in search of an answer to the question ‘who is the greatest golfer of all time?’
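The static Plackett–Luce likelihood underlying this methodology is short enough to write out directly (a minimal sketch of the model itself, not the paper's time-varying, interpolated fit): a ranking is built top-down, with each next-ranked item chosen from those still unranked with probability proportional to its strength.

```python
import numpy as np

def plackett_luce_loglik(ranking, log_strength):
    """Log-probability of `ranking` (item indices, best to worst) under the
    Plackett-Luce model: at each stage the next-ranked item is drawn from
    the items still unranked with probability proportional to
    exp(log_strength)."""
    w = np.exp(np.asarray(log_strength, dtype=float))
    remaining = list(ranking)
    ll = 0.0
    for item in ranking:
        ll += np.log(w[item]) - np.log(sum(w[j] for j in remaining))
        remaining.remove(item)
    return ll

# Two items with log-strengths log(2) and 0: item 0 is ranked first
# with probability 2 / (2 + 1) = 2/3.
ll = plackett_luce_loglik([0, 1], [np.log(2.0), 0.0])
print(ll)  # log(2/3)
```

The time-varying extension would evaluate interpolated strengths at each event's date and sum such terms over all observed rankings before maximizing over the strength parameters.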

19.
Rao (J. Indian Statist. Assoc. 17 (1979) 125) has given a ‘necessary form’ for an unbiased mean square error (MSE) estimator to be ‘uniformly non-negative’. The MSE is of a homogeneous linear estimator ‘subject to a specified constraint’, for a survey population total of a real variable of interest. We present a corresponding theorem when the ‘constraint’ is relaxed. Certain results are added presenting formulae for estimators of MSEs when the variate-values for the sampled individuals are not ascertainable. Though not ascertainable, they are supposed to be suitably estimated either by (1) randomized response techniques covering sensitive issues or by (2) further sampling in ‘subsequent’ stages in specific ways when the initial sampling units are composed of a number of sub-units. Using live numerical data, practical uses of the proposed alternative MSE estimators are demonstrated.

20.
This article reviews semiparametric estimators for limited dependent variable (LDV) models with endogenous regressors, where nonlinearity and nonseparability pose difficulties. We first introduce six main approaches in the linear equation system literature to handle endogenous regressors with linear projections: (i) ‘substitution’ replacing the endogenous regressors with their projected versions on the system exogenous regressors x, (ii) instrumental variable estimator (IVE) based on E{(error) × x} = 0, (iii) ‘model-projection’ turning the original model into a model in terms of only x-projected variables, (iv) ‘system reduced form (RF)’ finding RF parameters first and then the structural form (SF) parameters, (v) ‘artificial instrumental regressor’ using instruments as artificial regressors with zero coefficients, and (vi) ‘control function’ adding an extra term as a regressor to control for the endogeneity source. We then check if these approaches are applicable to LDV models using conditional mean/quantiles instead of linear projection. The six approaches provide a convenient forum on which semiparametric estimators in the literature can be categorized, although there are a few exceptions. The pros and cons of the approaches are discussed, and a small-scale simulation study is provided for some reviewed estimators.
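Approach (vi), the control function, is the easiest to demonstrate in the linear case (a toy linear simulation of my own; the article's interest is in carrying such approaches over to LDV models): the first-stage residual is added as an extra regressor, absorbing the endogenous component of the error.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
z = rng.normal(size=n)                   # instrument
u = rng.normal(size=n)                   # structural error
x = 0.8 * z + u + rng.normal(size=n)     # endogenous regressor: cov(x, u) != 0
y = 1.0 + 2.0 * x + u                    # structural equation, true slope 2.0

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Naive OLS of y on x is biased because x is correlated with u
b_ols = ols(np.column_stack([ones, x]), y)[1]

# Control function: the first-stage residual v_hat enters as an extra
# regressor, controlling for the endogeneity source
v_hat = x - np.column_stack([ones, z]) @ ols(np.column_stack([ones, z]), x)
b_cf = ols(np.column_stack([ones, x, v_hat]), y)[1]

print(round(b_ols, 3), round(b_cf, 3))
```

In the linear model this reproduces 2SLS; the article's point is that in nonlinear, nonseparable LDV models the six approaches are no longer interchangeable, and only some of them remain valid.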
