Similar Literature (20 results)
1.
In a pharmacokinetic drug interaction study using a three‐period, three‐treatment (drug A, drug B, and drugs A and B concomitantly) crossover design, pharmacokinetic parameters for either drug are only measured in two of the three periods. Similar missing data problems can arise for a four‐period, four‐treatment crossover pharmacokinetic comparability study. This paper investigates whether the usual ANOVA model for the crossover design can be applied under this pattern of missing data. It is shown that the model can still be used, contrary to a belief that a new one is needed. The effect of this type of missing data pattern on the statistical properties of treatment, period and carryover effect estimates was derived and illustrated by means of simulations and an example. Copyright © 2003 John Wiley & Sons, Ltd.
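As a rough illustration of this missing data pattern (not the paper's own code), the sketch below simulates a three-period, three-treatment crossover in which drug A's pharmacokinetic parameter is observed only in the two periods where drug A is given, and then fits the usual crossover ANOVA model; the sequence layout, effect sizes, and the statsmodels formula are all assumptions.

```python
# Sketch: crossover ANOVA for drug A's PK parameter, which is only observed in
# the two periods where drug A is administered (alone or with B). Illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
sequences = [("A", "B", "AB"), ("B", "AB", "A"), ("AB", "A", "B")]  # Latin square (assumed)
rows = []
for subj in range(30):
    seq = sequences[subj % 3]
    subj_eff = rng.normal(0, 0.3)
    for period, trt in enumerate(seq, start=1):
        if "A" not in trt:          # drug A not given -> its PK parameter is missing
            continue
        y = 1.0 + 0.2 * (trt == "AB") + 0.1 * period + subj_eff + rng.normal(0, 0.2)
        rows.append({"subject": subj, "period": period, "treatment": trt, "log_auc": y})

df = pd.DataFrame(rows)
# Usual crossover ANOVA model: subject, period and treatment as fixed factors.
fit = smf.ols("log_auc ~ C(subject) + C(period) + C(treatment)", data=df).fit()
print(fit.params.filter(like="treatment"))   # A alone vs. A given with B
```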

2.
It has become common to adopt a hierarchical model structure when comparing the performance of multiple health-care providers. This structure allows some variation in such measures, beyond that explained by sampling variation, to be “normal,” in recognition of the fact that risk-adjustment is never perfect. The shrinkage estimates arising from such a model structure also have appealing properties.

It is not immediately clear, however, how “unusual” providers, that is, any with particularly high or low rates, can be identified based on such a model. Given that some variation in underlying rates is assumed to be the norm, we argue that it is not generally appropriate to identify a provider as interesting based only on evidence of it lying above or below the population mean. We note with concern, however, that this practice is not uncommon.

We examine in detail three possible strategies for identifying unusual providers, carefully distinguishing between statistical “outliers” and “extremes.” A two-level normal model is used for mathematical simplicity, but we note that much of the discussion also applies to alternative data structures. Further, we emphasize throughout that each approach can be viewed as resulting from a Bayesian or a classical perspective. Three worked examples provide additional insight.
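A minimal sketch of the shrinkage estimates that arise under a two-level normal model, with the provider rates, standard errors, and between-provider standard deviation all invented for illustration (this is not the authors' worked example):

```python
# Sketch of shrinkage estimates in a two-level normal model:
#   y_i ~ N(theta_i, s_i^2),  theta_i ~ N(mu, tau^2)
# All numbers below are assumptions chosen for illustration.
import numpy as np

y   = np.array([0.08, 0.12, 0.05, 0.20, 0.10])   # observed provider rates
s   = np.array([0.02, 0.03, 0.01, 0.04, 0.02])   # sampling standard errors
tau = 0.03                                        # assumed between-provider SD

mu_hat = np.average(y, weights=1.0 / (s**2 + tau**2))   # precision-weighted overall mean
shrink = tau**2 / (tau**2 + s**2)                       # shrinkage factors in [0, 1]
theta_hat = mu_hat + shrink * (y - mu_hat)              # shrunken provider estimates

# An "unusual" provider should be judged against the assumed spread tau,
# not merely against the population mean mu_hat.
z = (theta_hat - mu_hat) / tau
print(np.round(theta_hat, 3), np.round(z, 2))
```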

3.
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detecting high leverage observations is crucial because they are responsible for masking outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data, so the most extreme points in the covariate space receive the highest leverage. Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980], however, pointed out that in logistic regression the leverage measure contains a component that can make the leverage values of genuine HLP misleadingly small, which creates problems for the correct identification of such cases. Attempts have been made to identify HLP based on their median distances from the mean, but since these methods are designed to identify a single high leverage point, they may not be very effective in the presence of multiple HLP because of masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression in which suspect cases are identified by a robust group deletion technique and then confirmed using diagnostic techniques. The usefulness of the proposed method is investigated through several well-known examples and a Monte Carlo simulation.
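To illustrate the phenomenon Hosmer and Lemeshow describe (this is not the proposed group-deletion method), the sketch below computes leverage values for a simulated logistic regression and shows that points far out in the X-space can receive deceptively small leverages; the data and model are assumptions.

```python
# Sketch: leverage (hat) values in logistic regression. The w_i = p_i(1 - p_i)
# component can make the leverages of extreme X-space points very small.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), [8.0, 9.0]])   # two far-out points in X-space
X = sm.add_constant(x)
p_true = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = rng.binomial(1, p_true)

fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
p = fit.fittedvalues
w = p * (1 - p)
Xw = X * np.sqrt(w)[:, None]
H = Xw @ np.linalg.inv(Xw.T @ Xw) @ Xw.T   # hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}
lev = np.diag(H)
print("leverage of the two extreme x values:", lev[-2:])   # often deceptively small
```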

4.
Digits in statistical data produced by natural or social processes are often distributed in a manner described by ‘Benford's law’. Recently, a test against this distribution was used to identify fraudulent accounting data. This test is based on the supposition that first, second, third, and other digits in real data follow the Benford distribution while the digits in fabricated data do not. Is it possible to apply Benford tests to detect fabricated or falsified scientific data as well as fraudulent financial data? We approached this question in two ways. First, we examined the use of the Benford distribution as a standard by checking the frequencies of the nine possible first and ten possible second digits in published statistical estimates. Second, we conducted experiments in which subjects were asked to fabricate statistical estimates (regression coefficients). The digits in these experimental data were scrutinized for possible deviations from the Benford distribution. There were two main findings. First, both the first and second digits of the published regression coefficients were approximately Benford distributed, or at least followed a pattern of monotonic decline. Second, the experimental results yielded new insights into the strengths and weaknesses of Benford tests. Surprisingly, the first digits of faked data also exhibited a pattern of monotonic decline, while the second, third, and fourth digits were distributed less in accordance with Benford's law. At least in the case of regression coefficients, there were indications that checks for digit-preference anomalies should focus less on the first (i.e. leftmost) digit and more on later digits.
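A simple sketch of a first-digit Benford check of the kind described, applied to simulated rather than the paper's estimates; the chi-square comparison and the lognormal stand-in data are assumptions.

```python
# Sketch: chi-square test of first-digit frequencies against Benford's law.
import numpy as np
from scipy.stats import chisquare

def first_digits(values):
    """Leading nonzero digit of each |value|."""
    s = np.abs(np.asarray(values, dtype=float))
    s = s[s > 0]
    exp = np.floor(np.log10(s))
    return (s / 10**exp).astype(int)

benford = np.log10(1 + 1 / np.arange(1, 10))            # P(D1 = d), d = 1..9

coefs = np.random.default_rng(3).lognormal(0, 2, 500)   # stand-in for reported estimates
d1 = first_digits(coefs)
observed = np.bincount(d1, minlength=10)[1:10]
stat, pval = chisquare(observed, f_exp=benford * observed.sum())
print(stat, pval)
```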

5.
Background: In age‐related macular degeneration (ARMD) trials, the FDA‐approved endpoint is the loss (or gain) of at least three lines of vision as compared to baseline. The use of such a response endpoint entails a potentially severe loss of information. A more efficient strategy could be obtained by using longitudinal measures of the change in visual acuity. In this paper we investigate, using data from two randomized clinical trials, the mean and variance–covariance structures of the longitudinal measurements of the change in visual acuity. Methods: Individual patient data were collected in 234 patients in a randomized trial comparing interferon‐α with placebo and in 1181 patients in a randomized trial comparing three active doses of pegaptanib with sham. A linear model for longitudinal data was used to analyze the repeated measurements of the change in visual acuity. Results: For both trials, the data were adequately summarized by a model that assumed a quadratic trend for the mean change in visual acuity over time, a power variance function, and an antedependence correlation structure. The power variance function was remarkably similar for the two datasets and involved the square root of the measurement time. Conclusions: The similarity of the estimated variance functions and correlation structures for the two datasets indicates that these aspects may be a genuine feature of measurements of changes in visual acuity in patients with ARMD. This feature can be used in the planning and analysis of trials that use visual acuity as the clinical endpoint of interest. Copyright © 2010 John Wiley & Sons, Ltd.
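The sketch below illustrates the kind of mean and variance structure reported, a quadratic mean trend with a standard deviation proportional to the square root of time, using made-up coefficients and visit times rather than the trial estimates.

```python
# Sketch of the assumed mean/variance structure for change in visual acuity:
# quadratic mean trend over time, standard deviation proportional to sqrt(t).
# All coefficients and visit times below are invented for illustration.
import numpy as np

t = np.array([4, 12, 24, 52])             # weeks since baseline (assumed schedule)
beta = (-0.5, 0.30, -0.004)               # intercept, linear, quadratic (assumed)
mean_change = beta[0] + beta[1] * t + beta[2] * t**2
sd_change = 1.8 * np.sqrt(t)              # power variance function in sqrt(time)

rng = np.random.default_rng(7)
# Independent draws shown here; the paper additionally used an antedependence
# correlation structure across the repeated measurements.
sample = rng.normal(mean_change, sd_change, size=(5, len(t)))
print(np.round(mean_change, 2), np.round(sd_change, 2))
print(np.round(sample[0], 1))
```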

6.
The ongoing evolution of genomics and bioinformatics has an overwhelming impact on medical and clinical research, although this development is often marked by genuine controversies as well as a lack of scientific clarity and acumen. The search for disease genes and for gene–environment interactions has drawn considerable interdisciplinary scientific attention: environmental health, clinical and medical sciences, and biological as well as computational and statistical sciences are most noteworthy. Statistical reasoning (quantitative modeling and analysis) has a central role in this respect, while data mining resolutions are far from being statistically fully understood or interpretable. The use of human subjects, though unavoidable, under various extraneous restraints, medical ethics perspectives, and human rights undercurrents, has raised concern all over the world, especially in developing countries. In the genomics context, clinical trials may be designed on chips, and yet there are greater challenges due to the curse of dimensionality. Some of these challenging statistical issues in medical and clinical research (with emphasis on clinical trials) are appraised in the light of existing statistical tools, which are available for less complex clinical research problems.

7.
Owing to enormous advances in data acquisition and processing technology, the study of high (or ultra) frequency data has become an important area of econometrics. At least three avenues of econometric methods have been followed to analyze high frequency financial data: models in tick time that ignore the time dimension of sampling, duration models that specify the time span between transactions, and, finally, fixed time interval techniques. Starting from the strong assumption that quotes are irregularly generated from an underlying exogenous arrival process, fixed interval models promise the feasibility of familiar time series techniques. Moreover, fixed interval analysis is a natural means to investigate multivariate dynamics. In particular, models of price discovery are implemented in this avenue of high frequency econometrics. Recently, a sound statistical theory of ‘realized volatility’ has been developed. In this framework, high frequency log price changes are seen as a means to observe volatility at some lower frequency.
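A minimal sketch of the 'realized volatility' idea, summing squared one-minute log price changes within each trading day of a simulated price path; the sampling grid and noise level are assumptions.

```python
# Sketch: realized variance per day = sum of squared intraday log returns;
# realized volatility is its square root. Prices are simulated, not market data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
days = pd.bdate_range("2024-01-02", periods=5)                      # 5 trading days (assumed)
idx = pd.DatetimeIndex([d + pd.Timedelta(hours=9, minutes=30 + m)
                        for d in days for m in range(390)])         # 1-minute grid, 6.5 h session
log_price = pd.Series(np.cumsum(rng.normal(0.0, 5e-4, len(idx))), index=idx)

intraday_ret = log_price.groupby(log_price.index.date).diff().dropna()   # drop overnight jumps
realized_var = intraday_ret.pow(2).groupby(intraday_ret.index.date).sum()
realized_vol = np.sqrt(realized_var)
print(realized_vol)
```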

8.
In many two‐period, two‐treatment (2 × 2) crossover trials, for each subject, a continuous response of interest is measured before and after administration of the assigned treatment within each period. The resulting data are typically used to test a null hypothesis involving the true difference in treatment response means. We show that the power achieved by different statistical approaches is greatly influenced by (i) the ‘structure’ of the variance–covariance matrix of the vector of within‐subject responses and (ii) how the baseline (i.e., pre‐treatment) responses are accounted for in the analysis. For (ii), we compare different approaches including ignoring one or both period baselines, using a common change from baseline analysis (which we advise against), using functions of one or both baselines as period‐specific or period‐invariant covariates, and doing joint modeling of the post‐baseline and baseline responses with corresponding mean constraints for the latter. Based on theoretical arguments and simulation‐based type I error rate and power properties, we recommend an analysis of covariance approach that uses the within‐subject difference in treatment responses as the dependent variable and the corresponding difference in baseline responses as a covariate. Data from three clinical trials are used to illustrate the main points. Copyright © 2014 John Wiley & Sons, Ltd.
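A sketch of the kind of ANCOVA recommended here, applied to simulated rather than trial data: the within-subject difference in post-treatment responses (oriented as A minus B) is regressed on sequence and on the matching difference in period baselines; all effect sizes and coding choices are assumptions.

```python
# Sketch of an ANCOVA for a 2x2 crossover with period-specific baselines.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 40
subj = rng.normal(0.0, 1.0, n)                      # subject effect
seq = np.repeat(["AB", "BA"], n // 2)               # treatment order per subject

base1 = subj + rng.normal(0, 0.5, n)                # period-1 baseline
base2 = subj + rng.normal(0, 0.5, n)                # period-2 baseline
eff = {"A": 1.0, "B": 0.3}                          # true treatment effects (assumed)
post1 = base1 + np.where(seq == "AB", eff["A"], eff["B"]) + rng.normal(0, 0.5, n)
post2 = base2 + np.where(seq == "AB", eff["B"], eff["A"]) + 0.2 + rng.normal(0, 0.5, n)

df = pd.DataFrame({
    "seq": seq,
    "d_post": np.where(seq == "AB", post1 - post2, post2 - post1),  # A minus B response
    "d_base": np.where(seq == "AB", base1 - base2, base2 - base1),  # matching baseline diff
})
# Sum coding for sequence makes the intercept the treatment effect averaged over sequences.
fit = smf.ols("d_post ~ C(seq, Sum) + d_base", data=df).fit()
print(fit.params["Intercept"])   # should be near eff['A'] - eff['B'] = 0.7
```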

9.
Skew scale mixtures of normal distributions are often used in statistical procedures involving asymmetric and heavy-tailed data. The main virtue of the members of this family of distributions is that they are easy to simulate from and they also admit genuine expectation-maximization (EM) algorithms for maximum likelihood estimation. In this paper, we extend the EM algorithm to linear regression models and develop diagnostic analyses via local influence and generalized leverage, following Zhu and Lee's approach, because Cook's well-known approach cannot be used to obtain measures of local influence. The EM-type algorithm is discussed with an emphasis on the skew Student-t-normal, skew slash, skew-contaminated normal and skew power-exponential distributions. Finally, results obtained for a real data set are reported, illustrating the usefulness of the proposed method.

10.
In order to describe or generate so-called outliers in univariate statistical data, contamination models are often used. These models assume that k out of n independent random variables are shifted or multiplied by some constant, whereas the other observations still come i.i.d. from some common target distribution. Of course, these contaminants do not necessarily stick out as the extremes in the sample. Moreover, it is the amount and magnitude of ‘contamination’ that determines the number of obvious outliers. Using the concept of Davies and Gather (1993) to formalize the outlier notion, we quantify the amount of contamination needed to produce a prespecified expected number of ‘genuine’ outliers. In particular, we demonstrate that for samples of moderate size from a normal target distribution a rather large shift of the contaminants is necessary to yield a certain expected number of outliers. Such an insight is of interest when designing simulation studies in which outliers should occur, as well as in theoretical investigations of outliers.
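A small simulation in the spirit of this setup: k of n standard normal observations are shifted by delta, and an observation counts as a 'genuine' outlier if it falls in the alpha_n outlier region of the target distribution. The normalization of the outlier region and all constants are my assumptions for illustration, not the paper's settings.

```python
# Sketch: shift contamination and the expected number of 'genuine' outliers.
import numpy as np
from scipy.stats import norm

def expected_outliers(n=50, k=5, delta=3.0, alpha=0.05, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    alpha_n = 1 - (1 - alpha) ** (1 / n)          # per-observation outlier level
    cut = norm.ppf(1 - alpha_n / 2)               # outlier region: |x| > cut
    counts = []
    for _ in range(reps):
        x = rng.normal(0, 1, n)
        x[:k] += delta                            # contaminants, shifted by delta
        counts.append(np.sum(np.abs(x) > cut))
    return np.mean(counts)

# Even a 3-SD shift produces few 'genuine' outliers on average:
print(expected_outliers(delta=3.0), expected_outliers(delta=5.0))
```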

11.
The display of data by means of contingency tables is used in different approaches to statistical inference, for example, to address the test of homogeneity of independent multinomial distributions. We develop a Bayesian procedure to test simple null hypotheses versus bilateral alternatives in contingency tables. Given independent samples from two binomial distributions and taking a mixed prior distribution, we calculate the posterior probability that the proportion of successes in the first population is the same as in the second. This posterior probability is compared with the p-value of the classical method, obtaining a reconciliation between the classical and Bayesian results. The results obtained are generalized to r × s tables.
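The sketch below computes a posterior probability of this kind for a 2x2 table under a mixed prior (a point mass on equality plus beta priors); the particular prior settings and counts are assumptions, not the article's choices.

```python
# Sketch: posterior probability of H0: p1 = p2 under a mixed prior,
# pi0 point mass on H0 with common p ~ Beta(a, b), independent Beta(a, b) under H1.
import numpy as np
from scipy.special import betaln, gammaln

def log_binom(n, x):
    return gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)

def posterior_prob_h0(x1, n1, x2, n2, a=1.0, b=1.0, pi0=0.5):
    # marginal likelihoods under H0 (common p) and H1 (independent p1, p2)
    log_m0 = (log_binom(n1, x1) + log_binom(n2, x2)
              + betaln(a + x1 + x2, b + n1 + n2 - x1 - x2) - betaln(a, b))
    log_m1 = (log_binom(n1, x1) + betaln(a + x1, b + n1 - x1) - betaln(a, b)
              + log_binom(n2, x2) + betaln(a + x2, b + n2 - x2) - betaln(a, b))
    bf01 = np.exp(log_m0 - log_m1)                # Bayes factor in favor of H0
    return pi0 * bf01 / (pi0 * bf01 + (1 - pi0))

print(posterior_prob_h0(x1=18, n1=50, x2=30, n2=50))   # compare with a two-sided p-value
```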

12.
When the X̄ control chart is used to monitor a process, three parameters should be determined: the sample size, the sampling interval between successive samples, and the control limits of the chart. Duncan presented a cost model to determine the three parameters for an X̄ chart. Alexander et al. combined Duncan's cost model with the Taguchi loss function to present a loss model for determining the three parameters. In this paper, the Burr distribution is employed to conduct the economic-statistical design of X̄ charts for non-normal data. Alexander's loss model is used as the objective function, and the cumulative distribution function of the Burr distribution is applied to derive the statistical constraints of the design. An example is presented to illustrate the solution procedure. From the results of the sensitivity analyses, we find that small values of the skewness coefficient have no significant effect on the optimal design; however, a larger skewness coefficient leads to a slightly larger sample size and sampling interval, as well as wider control limits. Meanwhile, an increase in the kurtosis coefficient results in an increase in the sample size and wider control limits.
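As a rough sketch of the Burr-based ingredient only (not the full economic-statistical optimization), the code below uses Burr XII quantiles in place of normal quantiles to set probability-based control limits for a subgroup mean; the Burr shape parameters, process values, and the assumption that a standardized Burr variate approximates the subgroup-mean distribution are all illustrative.

```python
# Sketch: probability-based control limits for a mean chart when the standardized
# sampling distribution is approximated by a Burr XII distribution.
import numpy as np
from scipy.stats import burr12

c, k = 4.87, 6.16          # Burr XII shape parameters (assumed for the example)
alpha = 0.0027             # target in-control false-alarm rate (3-sigma equivalent)
n = 5                      # subgroup sample size
mu, sigma = 10.0, 1.0      # assumed process mean and standard deviation

dist = burr12(c, k)
# standardize the Burr variate so the limits are in sigma/sqrt(n) units
z_lo = (dist.ppf(alpha / 2) - dist.mean()) / dist.std()
z_hi = (dist.ppf(1 - alpha / 2) - dist.mean()) / dist.std()

lcl = mu + z_lo * sigma / np.sqrt(n)
ucl = mu + z_hi * sigma / np.sqrt(n)
print(round(lcl, 3), round(ucl, 3))   # asymmetric limits under skewness
```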

13.
Zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are recommended for handling excessive zeros in count data. For various reasons, researchers may not address zero inflation. This paper helps educate researchers on (1) the importance of accounting for zero inflation and (2) the consequences of misspecifying the statistical model. Using simulations, we found that when the zero inflation in the data was ignored, estimation was poor and statistically significant findings were missed. When overdispersion within the zero-inflated data was ignored, poor estimation and inflated Type I errors resulted. Recommendations on when to use the ZINB and ZIP models are provided. In an illustration of a two-step model selection procedure (likelihood ratio test and the Vuong test), the procedure correctly identified the ZIP model only when the distributions had moderate means and sample sizes, and it did not correctly identify the ZINB model or the zero inflation in the ZIP and ZINB distributions.
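To make the zero-inflation point concrete, the sketch below fits an intercept-only zero-inflated Poisson by maximum likelihood to simulated ZIP counts and compares its log-likelihood with a plain Poisson fit; this is an illustration, not the paper's simulation design or its two-step (LRT plus Vuong) selection procedure.

```python
# Sketch: intercept-only ZIP fitted by maximum likelihood vs. a plain Poisson fit.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(2)
n, pi_true, lam_true = 1000, 0.3, 2.5
y = np.where(rng.random(n) < pi_true, 0, rng.poisson(lam_true, n))   # ZIP counts

def zip_negll(theta, y):
    pi, lam = expit(theta[0]), np.exp(theta[1])           # keep parameters in range
    log_pois = -lam + y * np.log(lam) - gammaln(y + 1)
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))        # structural or sampling zero
    ll_pos = np.log(1 - pi) + log_pois
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

fit = minimize(zip_negll, x0=[0.0, 0.0], args=(y,), method="BFGS")
pi_hat, lam_hat = expit(fit.x[0]), np.exp(fit.x[1])

lam_pois = y.mean()                                       # plain Poisson MLE
ll_pois = np.sum(-lam_pois + y * np.log(lam_pois) - gammaln(y + 1))
print(pi_hat, lam_hat, -fit.fun, ll_pois)                 # ZIP log-likelihood is much larger
```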

14.
When genuine panel data samples are not available, repeated cross-sectional surveys can be used to form so-called pseudo panels. In this article, we investigate the properties of linear pseudo panel data estimators with a fixed number of cohorts and time observations. We extend the standard linear pseudo panel data setup to models with factor residuals by adapting the quasi-differencing approach developed for genuine panels. In a Monte Carlo study, we find that the proposed procedure has good finite sample properties in situations with endogeneity, cohort interactive effects, and near nonidentification. Finally, as an illustration, the proposed method is applied to data from Ecuador to study labor supply elasticity. Supplementary materials for this article are available online.

15.
林存洁  李扬 《统计研究》2016,33(11):109-112
In the era of big data, whether traditional statistics still has a role to play has become a matter of considerable debate. Taking the ARGO model as a case study, this paper reviews the application of statistical methods in big data analysis and the results they have achieved, and proposes improvements from a statistical perspective. The analysis of the ARGO model shows that many of the fundamental problems in big data analysis are still statistical problems, and that the statistical regularities in the data remain the greatest value to be extracted from data analysis, which means that statistical thinking can only become more important in big data analysis. For big data with complex structure and diverse sources, statistical methods also require new exploration and experimentation, which presents both an opportunity and a challenge for statistics.

16.
Classical regression analysis is usually performed in two steps. In the first step, an appropriate model is identified to describe the data generating process, and in the second step, statistical inference is performed in the identified model. An intuitively appealing approach to designing experiments for these different purposes is a sequential strategy, which uses part of the sample for model identification and adapts the design according to the outcome of the identification steps. In this article, we investigate the finite sample properties of two sequential design strategies that were recently proposed in the literature. A detailed comparison of sequential designs for model discrimination in several regression models is given by means of a simulation study. Some non-sequential designs are also included in the study.

17.
18.
Collapsibility means that the same statistical result of interest can be obtained before and after marginalization over some variables. In this paper, we discuss three kinds of collapsibility for directed acyclic graphs (DAGs): estimate collapsibility, conditional independence collapsibility and model collapsibility. Related to collapsibility, we discuss the removability of variables from a DAG. We present conditions for these three kinds of collapsibility and the relationships among them. We give algorithms to find a minimum variable set that contains a variable subset of interest and onto which a statistical result is collapsible.

19.
Rapid technological advances have resulted in continual changes in data acquisition and reporting processes. While such advances have benefited research in these areas, the changing technologies have, at the same time, created difficulty for statistical analysis by generating outdated data that are incompatible with data based on newer technology. Relationships between these incompatible variables are complicated; not only are they stochastic, but they also often depend on other variables, rendering even a simple statistical analysis, such as estimation of a population mean, difficult in the presence of mixed data formats. Thus, technological advancement has brought forth, from the statistical perspective, the methodological problem of analyzing newer data together with outdated data. In this paper, we discuss general principles for addressing the statistical issues related to the analysis of incompatible data. The approach taken has three desirable properties: it is readily understood, since it builds upon a linear regression setting; it is flexible, allowing for data incompatibility in either the response or the covariate; and it is not computationally intensive. In addition, inferences may be made for a latent variable of interest. Our consideration of this problem is motivated by the analysis of delta wave counts, as a surrogate for sleep disorder, in the sleep laboratory of the Department of Psychiatry, University of Pittsburgh Medical Center, where two major changes have occurred in the acquisition of these data, resulting in three mixed formats. By developing appropriate methods for addressing this issue, we provide statistical advancement that is compatible with technological advancement.

20.
Lightning data collected over three dry seasons from the detection system operated by the British Columbia Ministry of Forests were analyzed to estimate the distribution of lightning signal strength and component detection efficiencies. The analysis was based on more than 165,000 lightning-strike records, where component detectors served both as lightning finders and as data collectors for evaluating the performance of other component detectors in the network. In spite of the unusual feature of this application, in which a system evaluates itself, much was revealed that identified weaknesses and suggested improvements. A postanalysis system-modification update is included.
