Similar Literature
20 similar documents found.
1.
The conditional mixture likelihood method, which uses the absolute difference of the trait values of a sib pair to estimate genetic parameters, underlies commonly used methods in linkage analysis. Here, the statistical properties of the model are examined. The marginal model with a pseudo-likelihood function based on a sample of the absolute differences of sib traits is also studied. Both approaches are compared numerically. When genotyping is much more expensive than screening a quantitative trait, it is known that extremely discordant sib pairs provide more powerful linkage tests than randomly sampled sib pairs. The Fisher information about genetic parameters contained in extremely discordant sib pairs is calculated using the marginal mixture model. Our results supplement current research showing that extremely discordant sib pairs are powerful for linkage detection by demonstrating that they also contain more information about other genetic parameters.
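As a rough illustration of the kind of mixture likelihood described above (not the authors' exact model), the sketch below treats the absolute sib-pair trait difference as a folded-normal mixture whose components correspond to sharing 0, 1, or 2 alleles identical by descent (IBD), with the usual sib-pair weights 1/4, 1/2, 1/4. The variance parameterization in `sigma2_for_ibd` is an illustrative assumption.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Assumption: given IBD sharing j (0, 1 or 2 alleles), the sib-pair trait
# difference is N(0, sigma2_j); the weights below are the sib-pair IBD
# probabilities at an unlinked locus.
IBD_WEIGHTS = np.array([0.25, 0.50, 0.25])

def sigma2_for_ibd(sigma2_g, sigma2_e):
    """Illustrative variance of the sib-pair difference for IBD = 0, 1, 2."""
    pi_j = np.array([0.0, 0.5, 1.0])          # proportion of alleles shared IBD
    return 2.0 * sigma2_e + 2.0 * sigma2_g * (1.0 - pi_j)

def neg_log_lik(params, abs_diff):
    """Negative log-likelihood of the folded-normal mixture for |differences|."""
    sigma2_g, sigma2_e = np.exp(params)       # log-scale keeps variances positive
    s2 = sigma2_for_ibd(sigma2_g, sigma2_e)
    # density of |D| when D ~ N(0, s2): 2 * phi(d; 0, s2) for d >= 0
    comp = 2.0 * norm.pdf(abs_diff[:, None], loc=0.0, scale=np.sqrt(s2[None, :]))
    return -np.sum(np.log(comp @ IBD_WEIGHTS))

# Toy data and fit
rng = np.random.default_rng(0)
d = np.abs(rng.normal(0.0, 1.5, size=300))
fit = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), args=(d,), method="Nelder-Mead")
print("estimated (sigma2_g, sigma2_e):", np.exp(fit.x))
```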

2.
Zero-inflated Poisson regression is a model commonly used to analyze data with excess zeros. Although many models have been developed to fit zero-inflated data, most of them strongly depend on the special features of the individual data set. For example, new models are needed when dealing with data that are both truncated and inflated. In this paper, we propose a new model that is flexible enough to model inflation and truncation simultaneously: a mixture of a multinomial logistic and a truncated Poisson regression, in which the multinomial logistic component models the occurrence of the excess counts and the truncated Poisson component models the remaining counts, which are assumed to follow a truncated Poisson distribution. The performance of our proposed model is evaluated through simulation studies, and it is found to have the smallest mean absolute error and the best model fit. In the empirical example, the data are truncated with inflated values of zero and fourteen, and the results show that our model fits better than the competing models.
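To make the model class concrete, the following minimal sketch (an illustration, not the paper's multinomial-logistic regression version) writes the probability of a count as a three-component mixture: a point mass at zero, a point mass at an upper inflation point, and a Poisson distribution truncated to the remaining support. The inflation points 0 and 14 echo the empirical example.

```python
import numpy as np
from scipy.stats import poisson

def inflated_truncated_pmf(k, pi0, pi_top, lam, top=14):
    """Mixture pmf with inflation at 0 and at `top`, plus a Poisson truncated to 1..top-1."""
    k = np.asarray(k)
    support = np.arange(1, top)
    z = poisson.pmf(support, lam).sum()                 # truncation normalizer
    trunc = np.where((k >= 1) & (k < top), poisson.pmf(k, lam) / z, 0.0)
    pmf = (1.0 - pi0 - pi_top) * trunc
    pmf = np.where(k == 0, pmf + pi0, pmf)              # zero inflation
    pmf = np.where(k == top, pmf + pi_top, pmf)         # upper-point inflation
    return pmf

counts = np.arange(0, 15)
p = inflated_truncated_pmf(counts, pi0=0.30, pi_top=0.10, lam=4.0)
print(p.round(4), "sum =", p.sum().round(6))            # probabilities sum to 1 over 0..14
```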

3.
The availability of next generation sequencing (NGS) technology in today's biomedical research has provided new opportunities in the scientific discovery of genetic information. High-throughput NGS technology, especially DNA-seq, is particularly useful in profiling a genome for the analysis of DNA copy number variants (CNVs). The read count (RC) data resulting from NGS technology are massive and information rich. How to exploit the RC data for accurate CNV detection has become a computational and statistical challenge. In this paper, we provide a statistical online change-point method to help detect CNVs in sequencing RC data. The method uses the idea of online searching for a change point (or breakpoint), with a Markov chain assumption on the breakpoint loci and an iterative computing process via a Bayesian framework. We illustrate that an online change-point detection method is particularly suitable for identifying CNVs in RC data. The algorithm is applied to the publicly available NCI-H2347 lung cancer cell line sequencing read data to locate the breakpoints. Extensive simulation studies have been carried out, and the results show the good behavior of the proposed algorithm. The algorithm is implemented in R and the code is available upon request.
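A heavily simplified, offline illustration of the breakpoint idea (not the authors' online Markov-chain/Bayesian algorithm): scan every candidate position in a window of read counts and pick the split that maximizes the Poisson likelihood-ratio against a no-change model.

```python
import numpy as np

def poisson_loglik(x):
    """Maximized Poisson log-likelihood of a segment (MLE rate = segment mean)."""
    lam = max(x.mean(), 1e-12)
    return np.sum(x * np.log(lam) - lam)      # the log-factorial constant cancels across splits

def best_breakpoint(counts):
    """Single-breakpoint likelihood-ratio scan over a vector of read counts."""
    counts = np.asarray(counts, dtype=float)
    null = poisson_loglik(counts)
    best_pos, best_lr = None, 0.0
    for t in range(1, len(counts)):
        lr = poisson_loglik(counts[:t]) + poisson_loglik(counts[t:]) - null
        if lr > best_lr:
            best_pos, best_lr = t, lr
    return best_pos, 2.0 * best_lr            # position and likelihood-ratio statistic

# Toy RC data: baseline rate 20, a copy-number gain (rate 35) after position 60
rng = np.random.default_rng(1)
rc = np.concatenate([rng.poisson(20, 60), rng.poisson(35, 40)])
print(best_breakpoint(rc))
```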

4.
The mixed random-effects model is commonly used in longitudinal data analysis within either the frequentist or the Bayesian framework. Here we consider a case in which we have prior knowledge about some of the parameters but no such information about the rest. We therefore use a hybrid approach for the random-effects model with partial prior information: the parameters with prior knowledge are estimated via a Bayesian procedure and the remaining parameters by frequentist maximum likelihood estimation (MLE), simultaneously on the same model. In practice, we often have partial prior information on covariates such as age and gender; this information can be used to obtain more accurate estimates in the mixed random-effects model. A series of simulation studies was performed to compare the results with the commonly used random-effects model with and without partial prior information. The results of the hybrid estimation (HYB) and MLE were very close to each other. The estimated θ values from the model with partial prior information (HYB) were closer to the true θ values and showed smaller variances than those from MLE without prior information. Compared with the true θ values, the mean squared errors were much smaller for HYB than for MLE. This advantage of HYB is most obvious in longitudinal data with a small sample size. The HYB and MLE methods are applied to a real longitudinal data set for illustration.
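One way to read the hybrid idea is as a joint objective in which the parameters with prior knowledge contribute a log-prior term (a Bayesian/MAP ingredient) while the remaining parameters are left unpenalized (pure likelihood). The sketch below does this for a simple linear model with a normal prior on the slope only; the prior values and the model itself are illustrative assumptions, not the paper's mixed-model setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)    # true intercept 1, slope 2

# Suppose prior knowledge exists for the slope only: slope ~ N(2.0, 0.5^2).
PRIOR_MEAN, PRIOR_SD = 2.0, 0.5

def neg_objective(params):
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)
    resid = y - b0 - b1 * x
    loglik = -0.5 * np.sum(resid**2) / sigma**2 - n * np.log(sigma)
    logprior_b1 = -0.5 * ((b1 - PRIOR_MEAN) / PRIOR_SD) ** 2   # prior on the slope only
    return -(loglik + logprior_b1)            # intercept and sigma: plain maximum likelihood

fit = minimize(neg_objective, x0=np.array([0.0, 0.0, 0.0]), method="Nelder-Mead")
b0_hat, b1_hat, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(f"intercept={b0_hat:.3f}  slope={b1_hat:.3f}  sigma={sigma_hat:.3f}")
```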

5.
In sequence homology search, the list of all functions found and the counts of reads aligned to them constitute the functional profile of a metagenomic sample. However, significant obstacles arise in this approach because of the short read lengths associated with many next-generation sequencing technologies, including artificial families, cross-annotation, length bias and conservation bias. Widely applied cut-off methods, such as the BLAST E-value, cannot solve these problems. Following published procedures that successfully handle the artificial families and the cross-annotation issue, we propose in this paper to use zero-truncated Poisson and Binomial (ZTP-Bin) hierarchical modelling to correct the length bias and the conservation bias. Goodness of fit of the model and cross-validation of the predictions on a bioinformatically simulated sample show the validity of this approach. Evaluated on an in vitro-simulated data set, the proposed modelling method outperforms other traditional methods. All three steps were then applied sequentially to real-life metagenomic samples to show that the proposed framework leads to a more accurate functional profile of a short-read metagenomic sample.

6.
In this article, we present results on the Shannon information (SI) contained in upper (lower) k-record values and the associated k-record times. We then establish an interesting relationship between the SI content of a random sample of fixed size and the SI in the data consisting of sequential maxima. We also consider the information contained in k-record data from an inverse sampling plan (ISP).

7.
The maximum absolute studentized residual is commonly used for testing for a single outlier in a linear regression model. This test statistic, however, is seldom discussed in a nonlinear regression setting. We simulate the critical values for the tests under various nonlinear models. The associated critical values are found to be very close to one another. Moreover, they are very well approximated by the critical values obtained from F-distributions via the Bonferroni inequality, as in linear models. The results are promising even for samples as small as size 6.
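A minimal sketch of how such critical values can be simulated under one nonlinear model (an exponential mean function chosen purely for illustration): generate null data, fit the model, compute internally studentized residuals from the Jacobian-based hat matrix, and take the empirical quantile of the maximum absolute value.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)                  # illustrative nonlinear mean function

def max_abs_studentized(x, y):
    popt, _ = curve_fit(model, x, y, p0=[1.0, 0.5], maxfev=10000)
    resid = y - model(x, *popt)
    a, b = popt
    # Jacobian of the mean function at the fit (linear-approximation hat matrix)
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
    H = J @ np.linalg.solve(J.T @ J, J.T)
    h = np.diag(H)
    s2 = resid @ resid / (len(x) - len(popt))
    r = resid / np.sqrt(s2 * np.clip(1.0 - h, 1e-10, None))
    return np.max(np.abs(r))

rng = np.random.default_rng(3)
x = np.linspace(0.1, 2.0, 6)                  # sample size 6, as in the abstract
stats = [max_abs_studentized(x, model(x, 1.0, 0.5) + rng.normal(scale=0.1, size=x.size))
         for _ in range(2000)]
print("simulated 95% critical value:", np.quantile(stats, 0.95).round(3))
```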

8.
Control charts for counted data are commonly designed assuming that counts follow Poisson dynamics. However, in various real situations, the true underlying dynamics of the events are more properly modelled by a negative binomial process. This paper examines the consequences of the Poisson approximation to negative binomial dynamics for counts under CUSUM-type schemes. It is essentially found that, on setting up Poisson dynamics for an underlying negative binomial data structure, the real in-control average run length decreases, whereas the sensitivity of the chart is affected less. These results warn against the routine use of the Poisson assumption in planning control charts for counts.
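A simulation sketch of the phenomenon described above (the design constants and dispersion are illustrative choices): run an upper one-sided CUSUM, with reference value k and decision interval h, on Poisson counts and on negative binomial counts with the same mean, and compare the estimated in-control average run lengths (run lengths are truncated at the simulation horizon).

```python
import numpy as np

def cusum_run_length(counts, k, h):
    """Run length of an upper one-sided CUSUM S_t = max(0, S_{t-1} + x_t - k)."""
    s = 0.0
    for t, x in enumerate(counts, start=1):
        s = max(0.0, s + x - k)
        if s > h:
            return t
    return len(counts)

def average_run_length(sampler, k, h, n_rep=500, horizon=10000, seed=4):
    rng = np.random.default_rng(seed)
    return np.mean([cusum_run_length(sampler(rng, horizon), k, h) for _ in range(n_rep)])

mu = 4.0                                    # in-control mean count
k, h = 5.0, 5.0                             # illustrative CUSUM design constants

poisson_arl = average_run_length(lambda rng, n: rng.poisson(mu, n), k, h)
# Negative binomial with the same mean but variance mu * (1 + mu / size)
size = 2.0
p = size / (size + mu)
negbin_arl = average_run_length(lambda rng, n: rng.negative_binomial(size, p, n), k, h)

print(f"in-control ARL, Poisson data:           {poisson_arl:.0f}")
print(f"in-control ARL, negative binomial data: {negbin_arl:.0f}")
```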

9.
Regression analysis is one of the methods most widely used in prediction problems. Although there are many methods for parameter estimation in regression analysis, the ordinary least squares (OLS) technique is the most commonly used. However, OLS is highly sensitive to outlying observations, so robust techniques are suggested in the literature when the data set includes outliers. Moreover, in a prediction problem, techniques that reduce the influence of outliers, together with the use of the median rather than the mean error as the target function, are more successful in modeling such data. In this study, we propose a new parameter estimation method based on particle swarm optimization that minimizes the median of the absolute relative errors, obtained by dividing the difference between the observed and predicted values by the observed value. The performance of the proposed method is evaluated in a simulation study by comparing it with OLS and some other robust methods in the literature.
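A minimal particle swarm sketch of the proposed criterion: estimate the coefficients of a linear model by minimizing the median of |(y − ŷ)/y| rather than a mean squared error. The swarm settings and the toy data are conventional illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def median_abs_relative_error(beta, X, y):
    # median of |(observed - predicted) / observed|; robust to a few gross outliers
    return np.median(np.abs((y - X @ beta) / y))

def pso_minimize(obj, dim, n_particles=40, n_iter=300, bounds=(-10.0, 10.0),
                 w=0.7, c1=1.5, c2=1.5, seed=5):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([obj(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([obj(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, np.min(pbest_val)

# Toy data with a few gross outliers in the response
rng = np.random.default_rng(6)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([5.0, 2.0]) + rng.normal(scale=0.5, size=100)
y[:5] *= 4.0

beta_hat, crit = pso_minimize(lambda b: median_abs_relative_error(b, X, y), dim=2)
print("PSO estimate:", beta_hat.round(3), " criterion:", round(crit, 4))
```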

10.
In this ‘Big Data’ era, statisticians inevitably encounter data generated from various disciplines. In particular, advances in biotechnology have enabled scientists to produce enormous datasets in various biological experiments. In the last two decades, we have seen high-throughput microarray data resulting from various genomic studies. Recently, next generation sequencing (NGS) technology has been playing an important role in the study of genomic features, resulting in vast amounts of NGS data. One frequent application of NGS technology is the study of DNA copy number variants (CNVs). The resulting NGS read count data are then used by researchers to formulate various scientific approaches to accurately detect CNVs. Computational and statistical approaches to the detection of CNVs using NGS data are, however, very limited at present. In this review paper, we focus on read-depth analysis in CNV detection and give a brief summary of the statistical analysis methods currently used in searching for CNVs with NGS data. In addition, based on the review, we discuss the challenges we face and future research directions. The ultimate goal of this review paper is to give a timely exposition of the surveyed statistical methods to researchers in related fields.

11.
In this paper we discuss a new theoretical basis for perturbation methods. In developing this basis, we define ideal measures of data utility and disclosure risk. Maximum data utility is achieved when the statistical characteristics of the perturbed data are the same as those of the original data. Disclosure risk is minimized if providing users with microdata access does not result in any additional information. We show that when the perturbed values of the confidential variables are generated as independent realizations from the distribution of the confidential variables conditioned on the non-confidential variables, they satisfy the data utility and disclosure risk requirements. We also discuss the relationship between this theoretical basis and some commonly used methods for generating perturbed values of confidential numerical variables.
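A small sketch of the key idea for jointly normal data: draw the perturbed confidential values independently from the (estimated) conditional distribution of the confidential variable given the non-confidential variables, so the released values preserve the conditional relationship while the individual original values are not reproduced. The bivariate-normal setting and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Non-confidential variable X and confidential variable Y, jointly normal
x = rng.normal(50.0, 10.0, n)
y = 20.0 + 0.8 * x + rng.normal(0.0, 5.0, n)

# Conditional distribution of Y given X estimated from the data
slope, intercept = np.polyfit(x, y, 1)
resid_sd = np.std(y - (intercept + slope * x), ddof=2)

# Perturbed values: independent draws from the estimated conditional distribution
y_perturbed = intercept + slope * x + rng.normal(0.0, resid_sd, n)

# Utility check: means, spreads and the X-Y relationship are preserved
print("mean/sd original :", y.mean().round(2), y.std().round(2))
print("mean/sd perturbed:", y_perturbed.mean().round(2), y_perturbed.std().round(2))
print("corr(X, Y) original vs perturbed:",
      np.corrcoef(x, y)[0, 1].round(3), np.corrcoef(x, y_perturbed)[0, 1].round(3))
```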

12.
A multiplicative seasonal forecasting model for cumulative events is presented in which, conditional on the end-of-season totals being given and the seasonal shape being known, events occurring within the season are shown to be multinomially distributed. The model uses the information contained in the arrival of new events to obtain a posterior distribution for the end-of-season totals. Bayesian forecasts are obtained recursively in two stages: first, by predicting the expected number and variance of event counts in future intervals within the remaining season, and then by predicting revised means and variances for the end-of-season totals based on the most recent forecast error.
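A grid-based sketch of the updating step: with the seasonal shape known, the count observed so far is binomial given the end-of-season total, so a discrete prior over totals can be updated to a posterior and used to forecast the remainder of the season. The prior, seasonal shape and observed counts below are illustrative assumptions, and the recursion of the paper is reduced to a single update.

```python
import numpy as np
from scipy.stats import binom, poisson

# Known seasonal shape over 10 periods (proportions of the season's total)
shape = np.array([0.02, 0.05, 0.10, 0.15, 0.18, 0.18, 0.15, 0.10, 0.05, 0.02])

# Discrete prior over the end-of-season total (illustrative Poisson prior)
totals = np.arange(0, 401)
prior = poisson.pmf(totals, mu=200)

# Events observed in the first 4 periods
observed = np.array([5, 9, 22, 31])
t = len(observed)
cum_prop = shape[:t].sum()
n_obs = observed.sum()

# Posterior over totals: count-so-far ~ Binomial(total, cumulative seasonal proportion)
like = binom.pmf(n_obs, totals, cum_prop)
post = prior * like
post /= post.sum()

post_mean_total = np.sum(totals * post)
print(f"posterior mean end-of-season total: {post_mean_total:.1f}")
print(f"expected events in the remaining periods: {post_mean_total - n_obs:.1f}")
```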

13.
In the current study, a new method based on weighting by the absolute centered external variable (WCEV) is proposed to stabilize heteroscedasticity for butterfly-distributed residuals (BDRs). After brief background on heteroscedasticity and BDRs, the WCEV is introduced. The WCEV and commonly used variance-stabilizing methods are compared on a simple and a multiple regression model. The WCEV is also tested for other types of heteroscedasticity patterns. In addition to heteroscedasticity, other regression assumptions are checked for the WCEV.

14.
In the past decades, the number of variables explaining observations in practical applications has increased steadily. This has led to heavy computational tasks, despite the widespread use of provisional variable selection methods in data processing. Therefore, more methodological techniques have appeared to reduce the number of explanatory variables without losing much of the information. Among these techniques, two distinct approaches are apparent: ‘shrinkage regression’ and ‘sufficient dimension reduction’. Surprisingly, there has been little communication or comparison between these two methodological categories, and it is not clear when each of the two approaches is appropriate. In this paper, we fill some of this gap by first reviewing each category briefly, paying special attention to the most commonly used methods in each. We then compare commonly used methods from both categories based on their accuracy, computation time and ability to select effective variables. A simulation study of the performance of the methods in each category is presented as well. The selected methods are also tested on two sets of real data, which allows us to recommend conditions under which one approach is more appropriate for high-dimensional data.

15.
Soils from locations near Amsterdam, Dublin, Exeter (SW England), Göttingen, Paris and in the Grizedale Forest (NW England) were placed in lysimeters in a Norway spruce (Picea abies) stand in the Grizedale Forest, and the soil solutions, rainfall and throughfall were monitored for 54 weeks. Two-way data tables for SO4, NH4, NO3, Al, Mg, Ca and pH were analysed by median polishing, in which each observed data value is decomposed into a typical value plus a row effect plus a column effect plus a residual. The row effects suggested that SO4 was the main acidifying influence in the Göttingen soil, but in the other soils SO4 had less effect than NO3. The column effects showed little variation over the 54 weeks. Low values for NO3 in March-May and September were associated with periods of low or zero rainfall. The residuals contained from 27% (Al) to 59% (NO3) of the variation. Some were interpretable and contained useful information. Where the residuals are considered to represent uninterpretable random variation (‘noise’), they can be subtracted from the original data to give corrected data for other types of analysis. In median polishing, changing a few values in the input data, even drastically, results in unusual residuals only in the cells with altered values. Hence, it is a useful technique for screening two-way data tables obtained from field experiments.
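A compact sketch of Tukey's median polish on a two-way table: iteratively sweep out row and column medians to obtain the typical value, row effects, column effects and residuals referred to above. The toy table is an illustrative stand-in for the soils-by-weeks data.

```python
import numpy as np

def median_polish(table, n_iter=10):
    """Decompose table[i, j] ~= overall + row[i] + col[j] + residual[i, j]."""
    resid = np.asarray(table, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        # sweep row medians out of the residuals
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        shift = np.median(col)
        col -= shift
        overall += shift
        # sweep column medians out of the residuals
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        shift = np.median(row)
        row -= shift
        overall += shift
    return overall, row, col, resid

# Toy two-way table: soils (rows) x weeks (columns)
table = np.array([[4.1, 3.8, 5.2],
                  [6.0, 5.7, 7.1],
                  [3.2, 2.9, 4.4]])
overall, row_eff, col_eff, resid = median_polish(table)
print("typical value:", round(overall, 2))
print("row effects:   ", row_eff.round(2))
print("column effects:", col_eff.round(2))
print("residuals:\n", resid.round(2))
```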

16.
In modeling count data collected from manufacturing processes, economic series, disease outbreaks and ecological surveys, there is usually a relatively large or small number of zeros compared with positive counts. Such low or high frequencies of zero counts often require the use of underdispersed or overdispersed probability models for the underlying data-generating mechanism. Commonly used models such as the generalized or zero-inflated Poisson distributions are parametric and can usually account only for overdispersion; such distributions are often found to be inadequate for modeling underdispersion because of the need for awkward parameter or support restrictions. This article introduces a flexible class of semiparametric zero-altered models that accounts for both underdispersion and overdispersion and includes the familiar models mentioned above as special cases. Consistency and asymptotic normality of the estimator of the dispersion parameter are derived under general conditions. Numerical support for the performance of the proposed method of inference is presented for the case of common discrete distributions.

17.
We derive an explicit, closed-form expression for the double generating function of the counts of occurrence, within a finite time horizon, of the single patterns contained in a compound pattern. The expression is in terms of a basic single generating function and a basic joint generating function, for which exact solutions exist in the literature. The single generating function is associated with the basic waiting time for the first occurrence of the compound pattern. The joint generating function is that of the waiting time to reach a given single pattern and the associated counts of occurrence, within that waiting time, of the single patterns contained in the compound pattern. The literature on patterns is huge, and there are results that establish links between generating functions for counts of occurrence of the single patterns contained in a compound pattern and generating functions of more complex waiting times associated with that compound pattern; the latter are known in the literature under names such as sooner or later waiting times, or generalisations of these. Our result fills a gap in the literature by providing a neat link connecting the generating functions of the basic quantities associated with the occurrence of compound patterns.

18.
The asymptotic distributions of the squared and absolute residual autocorrelations for GARCH models estimated by M-estimators are derived. Two diagnostic tests are developed that can be used to check the adequacy of a GARCH model fitted using M-estimators. Simulation results show that the empirical sizes of both tests are close to the nominal size in most cases. The power of the test based on absolute residual autocorrelations is found to be better than that of the test based on squared residual autocorrelations. Our results reveal that there are estimators that can fit GARCH-type models better than the commonly used quasi-maximum likelihood estimator under non-normal errors. An application to a real data set is also presented.
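A numpy-only sketch of the diagnostic ingredients: given standardized residuals from a fitted GARCH model (from whatever estimator is used), compute the autocorrelations of the squared and absolute residuals and a Ljung–Box-type portmanteau statistic. The chi-square reference distribution and the toy residuals below are illustrative; the paper's tests use the asymptotic distribution derived for M-estimators.

```python
import numpy as np
from scipy.stats import chi2

def autocorrelations(x, max_lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x**2)
    return np.array([np.sum(x[lag:] * x[:-lag]) / denom for lag in range(1, max_lag + 1)])

def portmanteau(x, max_lag):
    """Ljung-Box-type statistic on a residual transformation (e.g. x**2 or |x|)."""
    n = len(x)
    r = autocorrelations(x, max_lag)
    q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, max_lag + 1)))
    return q, chi2.sf(q, df=max_lag)

# Toy standardized residuals (replace with residuals from a fitted GARCH model)
rng = np.random.default_rng(8)
z = rng.standard_t(df=6, size=1000) / np.sqrt(6 / 4)    # unit-variance t residuals

for name, transformed in [("squared", z**2), ("absolute", np.abs(z))]:
    q, pval = portmanteau(transformed, max_lag=10)
    print(f"{name:8s} residual autocorrelation test: Q = {q:.2f}, p = {pval:.3f}")
```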

19.
The Cash statistic, also known as the C statistic, is commonly used for the analysis of low-count Poisson data, including data with null counts for certain values of the independent variable. The use of this statistic is especially attractive for low-count data that cannot be combined, or re-binned, without loss of resolution. This paper presents a new maximum-likelihood solution for the best-fit parameters of a linear model using the Poisson-based Cash statistic. The solution provides a new and simple method to measure the best-fit parameters of a linear model for any Poisson-based data, including data with null counts. In particular, the method enforces the requirement that the best-fit linear model be non-negative throughout the support of the independent variable. The method is summarized in a simple algorithm to fit Poisson counting data of any size and counting rate with a linear model, bypassing entirely the use of the traditional χ2 statistic.
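A brief sketch of fitting a linear model to Poisson counts by numerically minimizing the Cash statistic, C = 2 Σ [m_i − n_i + n_i ln(n_i/m_i)] (the n_i ln n_i term is dropped for zero counts), while rejecting parameter values that make the model non-positive on the data. The paper's contribution is a closed-form maximum-likelihood solution, which this generic-optimizer sketch does not reproduce.

```python
import numpy as np
from scipy.optimize import minimize

def cash_statistic(params, x, n):
    """C = 2 * sum(m - n + n*ln(n/m)), with the n*ln(n/m) term dropped when n = 0."""
    a, b = params
    m = a + b * x
    if np.any(m <= 0):                        # keep the linear model positive on the data
        return np.inf
    term = m - n
    pos = n > 0
    term[pos] += n[pos] * np.log(n[pos] / m[pos])
    return 2.0 * np.sum(term)

# Toy low-count Poisson data with a linear mean, including null counts
rng = np.random.default_rng(9)
x = np.linspace(0.0, 10.0, 30)
n = rng.poisson(0.5 + 0.4 * x)

fit = minimize(cash_statistic, x0=np.array([1.0, 0.1]), args=(x, n), method="Nelder-Mead")
a_hat, b_hat = fit.x
print(f"best-fit intercept = {a_hat:.3f}, slope = {b_hat:.3f}, C = {fit.fun:.2f}")
```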

20.
The joinpoint regression model identifies significant changes in the trends of the incidence, mortality and survival of a specific disease in a given population. The purpose of the present study is to develop an age-stratified Bayesian joinpoint regression model to describe mortality trends, assuming that the observed counts are probabilistically characterized by the Poisson distribution. The proposed model is based on Bayesian model selection criteria, with the smallest number of joinpoints that is sufficient to explain the Annual Percentage Change. The prior probability distributions are chosen in such a way that they are automatically derived from the model index contained in the model space. The proposed model and methodology estimate age-adjusted mortality rates in different epidemiological studies to compare trends while accounting for the confounding effect of age. In developing the methods, we use the cancer mortality counts of adult lung and bronchus cancer, and brain and other Central Nervous System cancer, patients obtained from the Surveillance Epidemiology and End Results database of the National Cancer Institute.
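A simplified, non-Bayesian sketch of the joinpoint idea: for each candidate joinpoint, fit a Poisson log-linear trend with a slope change at that year and select among the candidates (or no joinpoint) by BIC. The age stratification, prior structure and model-space formulation of the paper are not reproduced here, and the counts are simulated.

```python
import numpy as np
from scipy.optimize import minimize

def neg_poisson_loglik(beta, X, counts):
    eta = X @ beta
    return np.sum(np.exp(eta) - counts * eta)     # Poisson log-likelihood up to a constant

def fit_poisson(X, counts):
    x0 = np.zeros(X.shape[1])
    x0[0] = np.log(counts.mean() + 1.0)           # reasonable starting intercept
    res = minimize(neg_poisson_loglik, x0, args=(X, counts), method="BFGS")
    return res.fun

def bic(negloglik, n_params, n_obs):
    return 2.0 * negloglik + n_params * np.log(n_obs)

# Toy mortality counts: log-rate slope changes at year index 12
rng = np.random.default_rng(10)
years = np.arange(25, dtype=float)
log_rate = 3.0 + 0.05 * years - 0.08 * np.maximum(years - 12, 0.0)
counts = rng.poisson(np.exp(log_rate))

X0 = np.column_stack([np.ones_like(years), years])
best = ("no joinpoint", bic(fit_poisson(X0, counts), 2, len(years)))
for jp in range(3, 22):                           # candidate joinpoints
    Xj = np.column_stack([np.ones_like(years), years, np.maximum(years - jp, 0.0)])
    candidate = bic(fit_poisson(Xj, counts), 3, len(years))
    if candidate < best[1]:
        best = (f"joinpoint at year index {jp}", candidate)
print("selected model:", best[0], " BIC =", round(best[1], 1))
```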
