Similar literature (20 results)
1.
We comment on the paper by Liu et al. (2019), addressing the Item Response Theory (IRT) model under consideration, the justification for computing the marginal likelihood, what we learn from the data analysis performed, and, finally, the computational issues raised in the paper.

2.
With the influx of complex and detailed tracking data gathered from electronic tracking devices, the analysis of animal movement data has recently emerged as a cottage industry among biostatisticians. New approaches of ever greater complexity continue to be added to the literature. In this paper, we review what we believe to be some of the most popular and most useful classes of statistical models used to analyse individual animal movement data. Specifically, we consider discrete-time hidden Markov models, more general state-space models and diffusion processes. We argue that these models should be core components in the toolbox for quantitative researchers working on stochastic modelling of individual animal movement. The paper concludes by offering some general observations on the direction of statistical analysis of animal movement. There is a trend in movement ecology towards what are arguably overly complex modelling approaches which are inaccessible to ecologists, unwieldy with large data sets or not based on mainstream statistical practice. Additionally, some analysis methods developed within the ecological community ignore fundamental properties of movement data, potentially leading to misleading conclusions about animal movement. Corresponding approaches, e.g. based on Lévy walk-type models, continue to be popular despite having been largely discredited. We contend that there is a need for an appropriate balance between the extremes of either being overly complex or being overly simplistic, whereby the discipline relies on models of intermediate complexity that are usable by general ecologists, but grounded in well-developed statistical practice and efficient to fit to large data sets.
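As an illustration of the simplest of the reviewed model classes, the sketch below (not taken from the paper; the data, gamma step-length densities and all parameter values are simulated assumptions) computes the log-likelihood of a two-state discrete-time hidden Markov model for step lengths with the scaled forward algorithm.

```python
# A minimal sketch (simulated data; all parameter values are assumptions):
# forward-algorithm log-likelihood of a 2-state discrete-time HMM for animal
# step lengths with gamma state-dependent densities.
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)

# Simulate step lengths from a 2-state HMM ("encamped" vs "exploring")
T = 500
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])            # transition probability matrix
shapes, scales = [2.0, 5.0], [0.5, 3.0]   # gamma parameters per state
states = np.zeros(T, dtype=int)
for t in range(1, T):
    states[t] = rng.choice(2, p=trans[states[t - 1]])
steps = gamma.rvs(a=np.array(shapes)[states],
                  scale=np.array(scales)[states], random_state=rng)

def hmm_loglik(steps, trans, shapes, scales):
    """Scaled forward algorithm for the HMM log-likelihood."""
    # state-dependent densities evaluated at each observation (T x 2)
    dens = np.column_stack([gamma.pdf(steps, a=a, scale=s)
                            for a, s in zip(shapes, scales)])
    delta = np.array([0.5, 0.5])          # initial distribution (assumed)
    alpha = delta * dens[0]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(steps)):
        alpha = (alpha @ trans) * dens[t]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()              # rescale to avoid underflow
    return loglik

print(hmm_loglik(steps, trans, shapes, scales))
```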

3.
On-line auctions pose many challenges for the empirical researcher, one of which is the effective and reliable modelling of price paths. We propose a novel way of modelling price paths in eBay's on-line auctions by using functional data analysis. One of the practical challenges is that the functional objects are sampled only very sparsely and unevenly. Most approaches rely on smoothing to recover the underlying functional object from the data, which can be difficult if the data are irregularly distributed. We present a new approach that can overcome this challenge. The approach is based on the ideas of mixed models. Specifically, we propose a semiparametric mixed model with boosting to recover the functional object. As well as being able to handle sparse and unevenly distributed data, the model also results in conceptually more meaningful functional objects. In particular, we motivate our method within the framework of eBay's on-line auctions. On-line auctions produce monotonic increasing price curves that are often correlated across auctions. The semiparametric mixed model accounts for this correlation in a parsimonious way. It also manages to capture the underlying monotonic trend in the data without imposing model constraints. Our application shows that the resulting functional objects are conceptually more appealing. Moreover, when used to forecast the outcome of an on-line auction, our approach also results in more accurate price predictions compared with standard approaches. We illustrate our model on a set of 183 closed auctions for Palm M515 personal digital assistants.
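The sketch below is only a loose illustration of the data challenge: it recovers a monotone price path from sparse, unevenly spaced bids with a plain isotonic smoother rather than the semiparametric mixed model with boosting proposed in the paper; the bid times and prices are simulated.

```python
# A minimal sketch (simulated bids; a plain monotone smoother, not the paper's
# semiparametric mixed model with boosting): recover an increasing price path
# from sparse, unevenly spaced bid records.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
t_bid = np.sort(rng.uniform(0, 7, size=15))           # bid times over a 7-day auction
price = 40 * (1 - np.exp(-0.6 * t_bid)) + rng.normal(0, 1.5, size=15)

iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
iso.fit(t_bid, price)
grid = np.linspace(0, 7, 50)
price_curve = iso.predict(grid)                       # monotone fitted price path
print(price_curve[[0, 24, 49]])                       # start, middle, closing price
```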

4.
Identifiability has long been an important concept in classical statistical estimation. Historically, Bayesians have been less interested in the concept since, strictly speaking, any parameter having a proper prior distribution also has a proper posterior, and is thus estimable. However, the larger statistical community's recent move toward more Bayesian thinking is largely fueled by an interest in Markov chain Monte Carlo-based analyses using vague or even improper priors. As such, Bayesians have been forced to think more carefully about what has been learned about the parameters of interest (given the data so far), or what could possibly be learned (given an infinite amount of data). In this paper, we propose measures of Bayesian learning based on differences in precision and Kullback–Leibler divergence. After investigating them in the context of some familiar Gaussian linear hierarchical models, we consider their use in a more challenging setting involving two sets of random effects (traditional and spatially arranged), only the sum of which is identified by the data. We illustrate this latter model with an example from periodontal data analysis, where the spatial aspect arises from the proximity of various measurements taken in the mouth. Our results suggest our measures behave sensibly and may be useful in even more complicated (e.g., non-Gaussian) model settings.
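A minimal worked example of the general idea, under assumptions not taken from the paper (a single normal mean with known sampling variance and a conjugate normal prior): the gain in precision and the Kullback–Leibler divergence from prior to posterior both grow as data accumulate.

```python
# A minimal sketch (conjugate normal mean, simulated data; not the paper's
# hierarchical models): precision gain and KL(posterior || prior) as crude
# "amount learned" measures.
import numpy as np

def kl_normal(mu1, var1, mu0, var0):
    """KL( N(mu1, var1) || N(mu0, var0) ) in nats."""
    return 0.5 * (np.log(var0 / var1) + (var1 + (mu1 - mu0) ** 2) / var0 - 1.0)

rng = np.random.default_rng(0)
mu0, var0 = 0.0, 10.0          # vague prior on the mean (hypothetical values)
sigma2 = 4.0                   # known sampling variance
theta_true = 1.5

for n in [0, 5, 50, 500]:
    y = rng.normal(theta_true, np.sqrt(sigma2), size=n)
    prec_post = 1.0 / var0 + n / sigma2              # posterior precision
    var_post = 1.0 / prec_post
    mu_post = var_post * (mu0 / var0 + y.sum() / sigma2)
    print(n, "precision gain:", round(prec_post - 1.0 / var0, 3),
          " KL(post || prior):", round(kl_normal(mu_post, var_post, mu0, var0), 3))
```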

5.
A challenging problem in the analysis of high-dimensional data is variable selection. In this study, we describe a bootstrap-based technique for selecting predictors in partial least-squares regression (PLSR) and principal component regression (PCR) in high-dimensional data. Using a bootstrap-based technique for significance tests of the regression coefficients, a subset of the original variables can be selected to be included in the regression, thus obtaining a more parsimonious model with smaller prediction errors. We compare the bootstrap approach with several variable selection approaches (jack-knife and sparse formulation-based methods) on PCR and PLSR in simulation and real data.
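The sketch below illustrates the general bootstrap idea on simulated data using scikit-learn's PLSRegression (a tooling choice made here, not necessarily that of the authors): predictors are kept when the bootstrap percentile interval of their PLS regression coefficient excludes zero.

```python
# A minimal sketch (simulated data, assumed setup): bootstrap screening of PLS
# regression coefficients, keeping predictors whose percentile interval
# excludes zero.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = 2.0        # only the first 5 predictors matter
y = X @ beta + rng.normal(size=n)

def pls_coefs(X, y, n_components=3):
    pls = PLSRegression(n_components=n_components).fit(X, y)
    return np.ravel(pls.coef_)            # works across sklearn versions

B = 500
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)      # resample observations with replacement
    boot[b] = pls_coefs(X[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
selected = np.where((lo > 0) | (hi < 0))[0]   # interval excludes zero
print("selected predictors:", selected)
```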

6.
Subgroup analyses are a routine part of clinical trials to investigate whether treatment effects are homogeneous across the study population. Graphical approaches play a key role in subgroup analyses to visualise effect sizes of subgroups, to aid the identification of groups that respond differentially, and to communicate the results to a wider audience. Many existing approaches do not capture the core information and are prone to lead to a misinterpretation of the subgroup effects. In this work, we critically appraise existing visualisation techniques, propose useful extensions to increase their utility and attempt to develop an effective visualisation approach. We focus on forest plots, UpSet plots, Galbraith plots, subpopulation treatment effect pattern plot, and contour plots, and comment on other approaches whose utility is more limited. We illustrate the methods using data from a prostate cancer study.
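As a simple example of the most familiar of these displays, the sketch below draws a basic forest plot of subgroup treatment effects with 95% confidence intervals; the subgroup labels, hazard ratios and intervals are entirely hypothetical and are not from the prostate cancer study.

```python
# A minimal sketch (hypothetical subgroup estimates, not the study's data):
# a basic forest plot on a log-hazard-ratio scale.
import numpy as np
import matplotlib.pyplot as plt

subgroups = ["Overall", "Age < 65", "Age >= 65", "ECOG 0", "ECOG 1-2"]
hr    = np.array([0.80, 0.72, 0.90, 0.78, 0.85])     # hazard ratios (made up)
lower = np.array([0.68, 0.55, 0.70, 0.62, 0.64])
upper = np.array([0.94, 0.94, 1.16, 0.98, 1.13])

y = np.arange(len(subgroups))[::-1]
fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar(hr, y, xerr=[hr - lower, upper - hr], fmt="s",
            color="black", ecolor="black", capsize=3)
ax.axvline(1.0, linestyle="--", color="grey")         # no-effect line
ax.set_xscale("log")
ax.set_yticks(y)
ax.set_yticklabels(subgroups)
ax.set_xlabel("Hazard ratio (95% CI)")
fig.tight_layout()
plt.show()
```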

7.
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, and packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect.
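The sketch below mirrors this individual-level perspective on simulated data: two off-the-shelf scikit-learn routines (chosen here for illustration, not necessarily among those reviewed in the paper) are compared by the root-mean-square error of the imputed values against the held-out true values under a missing-completely-at-random mechanism.

```python
# A minimal sketch (simulated data, MCAR masking): judge imputation routines by
# the quality of the imputed values themselves rather than downstream estimates.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(0)
n, p = 300, 5
cov = 0.6 + 0.4 * np.eye(p)                  # correlated continuous variables
X_true = rng.multivariate_normal(np.zeros(p), cov, size=n)

mask = rng.random((n, p)) < 0.2              # 20% missing completely at random
X_obs = X_true.copy()
X_obs[mask] = np.nan

for name, imp in [("mean", SimpleImputer(strategy="mean")),
                  ("iterative", IterativeImputer(random_state=0))]:
    X_imp = imp.fit_transform(X_obs)
    rmse = np.sqrt(np.mean((X_imp[mask] - X_true[mask]) ** 2))
    print(f"{name:10s} RMSE of imputed values: {rmse:.3f}")
```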

8.
9.
Both philosophically and in practice, statistics is dominated by frequentist and Bayesian thinking. Under those paradigms, our courses and textbooks talk about the accuracy with which true model parameters are estimated or the posterior probability that they lie in a given set. In nonparametric problems, they talk about convergence to the true function (density, regression, etc.) or the probability that the true function lies in a given set. But the usual paradigms' focus on learning the true model and parameters can distract the analyst from another important task: discovering whether there are many sets of models and parameters that describe the data reasonably well. When we discover many good models we can see in what ways they agree. Points of agreement give us more confidence in our inferences, but points of disagreement give us less. Further, the usual paradigms' focus seduces us into judging and adopting procedures according to how well they learn the true values. An alternative is to judge models and parameter values, not procedures, and judge them by how well they describe data, not how close they come to the truth. The latter is especially appealing in problems without a true model.

10.
Missing data are often problematic in social network analysis since what is missing may potentially alter the conclusions about what we have observed as tie-variables need to be interpreted in relation to their local neighbourhood and the global structure. Some ad hoc methods for dealing with missing data in social networks have been proposed but here we consider a model-based approach. We discuss various aspects of fitting exponential family random graph (or p-star) models (ERGMs) to networks with missing data and present a Bayesian data augmentation algorithm for the purpose of estimation. This involves drawing from the full conditional posterior distribution of the parameters, something which is made possible by recently developed algorithms. With ERGMs already having complicated interdependencies, it is particularly important to provide inference that adequately describes the uncertainty, something that the Bayesian approach provides. To the extent that we wish to explore the missing parts of the network, the posterior predictive distributions, immediately available at the termination of the algorithm, are at our disposal, which allows us to explore the distribution of what is missing unconditionally on any particular parameter values. Some important features of treating missing data and of the implementation of the algorithm are illustrated using a well-known collaboration network and a variety of missing data scenarios.

11.
Empirical Bayes is a versatile approach to “learn from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes–full Bayes ridge regression approach for estimation of the posterior predictive interval.
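A minimal sketch of "formal" empirical Bayes in the simplest normal-means setting (simulated data; not one of the paper's applications): the prior variance is estimated by maximizing the marginal likelihood and then determines how strongly the observed effects are shrunk.

```python
# A minimal sketch (simulated normal-means example, not the paper's methods):
# estimate a prior variance by maximizing the marginal likelihood, then shrink.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
p, sigma2 = 2000, 1.0                       # many variables, known noise variance
theta = rng.normal(0.0, 0.5, size=p)        # true effects, prior sd 0.5
y = theta + rng.normal(0.0, np.sqrt(sigma2), size=p)

def neg_marginal_loglik(log_tau2):
    # Marginally, y_i ~ N(0, tau^2 + sigma^2) once theta is integrated out.
    v = np.exp(log_tau2) + sigma2
    return 0.5 * np.sum(np.log(2 * np.pi * v) + y ** 2 / v)

res = minimize_scalar(neg_marginal_loglik, bounds=(-10, 5), method="bounded")
tau2_hat = np.exp(res.x)
shrink = tau2_hat / (tau2_hat + sigma2)     # posterior mean multiplier
theta_hat = shrink * y
print(f"estimated prior variance: {tau2_hat:.3f} (true 0.25), shrinkage {shrink:.2f}")
```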

12.
The use of large-dimensional factor models in forecasting has received much attention in the literature with the consensus being that improvements on forecasts can be achieved when comparing with standard models. However, recent contributions in the literature have demonstrated that care needs to be taken when choosing which variables to include in the model. A number of different approaches to determining these variables have been put forward. These are, however, often based on ad hoc procedures or abandon the underlying theoretical factor model. In this article, we will take a different approach to the problem by using the least absolute shrinkage and selection operator (LASSO) as a variable selection method to choose between the possible variables and thus obtain sparse loadings from which factors or diffusion indexes can be formed. This allows us to build a more parsimonious factor model that is better suited for forecasting compared to the traditional principal components (PC) approach. We provide an asymptotic analysis of the estimator and illustrate its merits empirically in a forecasting experiment based on U.S. macroeconomic data. Overall we find that compared to PC we obtain improvements in forecasting accuracy and thus find it to be an important alternative to PC. Supplementary materials for this article are available online.
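The sketch below is a simplified variant of the idea on simulated data, not the article's estimator: the LASSO screens the predictor set, a principal-component diffusion index is then formed from the retained series only, and the index is used in a forecasting regression.

```python
# A minimal sketch (simplified variant, simulated data): LASSO screening, then a
# diffusion index from principal components of the selected series only.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, N = 200, 80
f = rng.normal(size=(T, 1))                       # one latent factor
load = np.zeros((N, 1)); load[:20, 0] = rng.normal(1.0, 0.2, size=20)
X = f @ load.T + rng.normal(scale=0.5, size=(T, N))   # only 20 series load on f
y = 0.8 * f[:, 0] + rng.normal(scale=0.3, size=T)     # target driven by the factor

# Step 1: LASSO screening of the predictors (lag structure ignored here).
sel = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
print("variables kept by the LASSO:", len(sel))

# Step 2: principal-component factor from the selected variables only.
factor_hat = PCA(n_components=1).fit_transform(X[:, sel])

# Step 3: forecasting regression on the estimated diffusion index.
fit = LinearRegression().fit(factor_hat, y)
print("in-sample R^2 of the factor forecast:", round(fit.score(factor_hat, y), 3))
```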

13.
On Testing Equality of Distributions of Technical Efficiency Scores
The challenge of the econometric problem in production efficiency analysis is that the efficiency scores to be analyzed are unobserved. Statistical properties have recently been discovered for a type of estimator popular in the literature, known as data envelopment analysis (DEA). This opens up a wide range of possibilities for well-grounded statistical inference about the true efficiency scores from their DEA estimates. In this paper we investigate the possibility of using existing tests for the equality of two distributions in such a context. Given the statistical complications pertinent to this setting, we consider several approaches to adapting the Li test and explore their performance in terms of the size and power of the test in various Monte Carlo experiments. One of these approaches shows good performance for both the size and the power of the test, thus encouraging its use in empirical studies. We also present an empirical illustration analyzing the efficiency distributions of countries in the world, following up a recent study by Kumar and Russell (2002), and report very interesting results.
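For orientation only, the sketch below runs a naive permutation test with a Kolmogorov-Smirnov statistic on simulated efficiency scores; it is not the adapted Li test and ignores the DEA-specific complications (boundary effects and estimation noise) that the paper addresses.

```python
# A minimal sketch (simulated scores; a naive permutation KS test, not the
# adapted Li test): test equality of two groups' efficiency-score distributions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
scores_a = rng.beta(5, 2, size=60)          # efficiency scores in (0, 1], simulated
scores_b = rng.beta(4, 3, size=50)

obs = ks_2samp(scores_a, scores_b).statistic
pooled = np.concatenate([scores_a, scores_b])
B, count = 2000, 0
for _ in range(B):
    perm = rng.permutation(pooled)
    stat = ks_2samp(perm[:len(scores_a)], perm[len(scores_a):]).statistic
    count += stat >= obs
print("permutation p-value:", (count + 1) / (B + 1))
```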

14.
A key question for understanding the cross-section of expected returns of equities is the following: which factors, from a given collection of factors, are risk factors or, equivalently, which factors are in the stochastic discount factor (SDF)? Though the SDF is unobserved, assumptions about which factors (from the available set of factors) are in the SDF restrict the joint distribution of factors in specific ways, as a consequence of the economic theory of asset pricing. A different starting collection of factors that go into the SDF leads to a different set of restrictions on the joint distribution of factors. The conditional distribution of equity returns has the same restricted form, regardless of what is assumed about the factors in the SDF, as long as the factors are traded, and hence the distribution of asset returns is irrelevant for isolating the risk factors. The restricted factor models are distinct (nonnested) and do not arise by omitting or including a variable from a full model, thus precluding analysis by standard statistical variable selection methods, such as those based on the lasso and its variants. Instead, we develop what we call a Bayesian model scan strategy in which each factor is allowed to enter or not enter the SDF and the resulting restricted models (of which there are 114,674 in our empirical study) are simultaneously confronted with the data. We use a Student-t distribution for the factors, model-specific independent Student-t distributions for the location parameters, a training sample to fix prior locations, and a creative way to arrive at the joint distribution of several other model-specific parameters from a single prior distribution. This allows our method to be essentially a scalable, tuned, black-box method that can be applied across our large model space with little to no user intervention. The model marginal likelihoods, and implied posterior model probabilities, are compared with the prior probability of 1/114,674 of each model to find the best-supported model, and thus the factors most likely to be in the SDF. We provide detailed simulation evidence about the high finite-sample accuracy of the method. Our empirical study with 13 leading factors reveals that the highest marginal likelihood model is a Student-t distributed factor model with 5 degrees of freedom and 8 risk factors.
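The sketch below is a toy illustration of the generic model-scan idea only; it does not implement the paper's asset-pricing restrictions, Student-t specification or prior construction. It enumerates all subsets of a small set of candidate regressors, approximates each model's log marginal likelihood by -BIC/2, and converts the results to posterior model probabilities under equal prior model probabilities.

```python
# A toy sketch of a generic model scan (not the paper's method): enumerate all
# subsets, score each by an approximate marginal likelihood, compare posteriors.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, K = 300, 6
X = rng.normal(size=(n, K))
y = 1.0 * X[:, 0] + 0.7 * X[:, 2] + rng.normal(scale=1.0, size=n)  # true: {0, 2}

def approx_log_marglik(Xs, y):
    n = len(y)
    Z = np.column_stack([np.ones(n), Xs]) if Xs.shape[1] else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    k = Z.shape[1] + 1                                   # coefficients + variance
    bic = n * np.log(rss / n) + k * np.log(n)
    return -0.5 * bic                                    # BIC approximation

models = [s for r in range(K + 1) for s in itertools.combinations(range(K), r)]
logml = np.array([approx_log_marglik(X[:, list(s)], y) for s in models])
post = np.exp(logml - logml.max()); post /= post.sum()   # equal prior odds
best = models[int(np.argmax(post))]
print("best-supported subset:", best, " posterior prob:", round(post.max(), 3))
```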

15.
16.
In longitudinal data where the timing and frequency of the measurement of outcomes may be associated with the value of the outcome, significant bias can occur. Previous results depended on correct specification of the outcome process and a somewhat unrealistic visit process model. In practice, this will never exactly be the case, so it is important to understand to what degree the results hold when those assumptions are violated in order to guide practical use of the methods. This paper presents theory and the results of simulation studies to extend our previous work to more realistic visit process models, as well as Poisson outcomes. We also assess the effects of several types of model misspecification. The estimated bias in these new settings generally mirrors the theoretical and simulation results of our previous work and provides confidence in using maximum likelihood methods in practice. Even when the assumptions about the outcome process did not hold, mixed effects models fit by maximum likelihood produced at most small bias in estimated regression coefficients, illustrating the robustness of these methods. This contrasts with generalised estimating equations approaches where bias increased in the settings of this paper. The analysis of data from a study of change in neurological outcomes following microsurgery for a brain arteriovenous malformation further illustrates the results.
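The sketch below, with simulated data and a deliberately simplified visit mechanism (all names and values are assumptions, not the paper's designs), contrasts a random-intercept model fit by maximum likelihood with a GEE fit when visit probabilities depend on a subject's underlying level; the estimated time slopes (true value 0.5) can be compared directly.

```python
# A minimal sketch (simulated data, simplified informative-visit mechanism):
# compare a likelihood-based mixed model with GEE under outcome-related visits.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for i in range(400):
    b = rng.normal(0, 1)                               # subject random intercept
    for t in range(6):
        y = 1.0 + 0.5 * t + b + rng.normal(0, 1)       # true time slope is 0.5
        # subjects with high underlying level are seen more often late in follow-up
        p_visit = 1 / (1 + np.exp(-b * (t - 2.5)))
        if rng.random() < p_visit:
            rows.append({"id": i, "time": t, "y": y})
data = pd.DataFrame(rows)

ml = smf.mixedlm("y ~ time", data, groups=data["id"]).fit()
gee = smf.gee("y ~ time", "id", data,
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print("MixedLM (ML) time slope:", round(ml.params["time"], 3))
print("GEE time slope:         ", round(gee.params["time"], 3))
```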

17.
There are various techniques for dealing with incomplete data; some are highly computationally intensive and others less so, while all may be comparable in their efficiencies. In spite of these developments, analyses using only the complete data subset are still routinely performed with popular statistical software. In an attempt to demonstrate the efficiencies and advantages of using all available data, we compared several approaches that are relatively simple but efficient alternatives to those using the complete data subset for analyzing repeated measures data with missing values, under the assumption of a multivariate normal distribution of the data. We also assumed that the missing values occur in a monotonic pattern and completely at random. The incomplete data procedure is demonstrated to be more powerful than the procedure of using the complete data subset, generally when the within-subject correlation gets large. One other principal finding is that even with small sample data, for which various covariance models may be indistinguishable, the empirical size and power are shown to be sensitive to misspecified assumptions about the covariance structure. Overall, the testing procedures that do not assume any particular covariance structure are shown to be more robust in keeping the empirical size at the nominal level than those assuming a special structure.

18.
Because several rule discovery algorithms in data mining can produce a large number of irrelevant or obvious rules from data, there has been substantial research addressing the issue of what makes rules truly 'interesting'. This resulted in the development of a number of interestingness measures and algorithms that find all interesting rules from data. However, these approaches have the drawback that many of the discovered rules, while supposed to be interesting by definition, may actually (1) be obvious in that they logically follow from other discovered rules or (2) be expected given some of the other discovered rules and some simple distributional assumptions. In this paper we argue that this is a paradox since rules that are supposed to be interesting, in reality are uninteresting for the above reason. We show that this paradox exists for various popular interestingness measures and present an abstract characterization of an approach to alleviate the paradox. We finally discuss existing work in data mining that addresses this issue and show how these approaches can be viewed with respect to the characterization presented here.

19.
The Sloan Digital Sky Survey is an extremely large astronomical survey that is conducted with the intention of mapping more than a quarter of the sky. Among the data that it is generating are spectroscopic and photometric measurements, both containing information about the red shift of galaxies. The former are precise and easy to interpret but expensive to gather; the latter are far cheaper but correspondingly more difficult to interpret. Recently, Csabai and co-workers have described various calibration techniques aiming to predict red shift from photometric measurements. We investigate what a structured Bayesian approach to the problem can add. In particular, we are interested in providing uncertainty bounds that are associated with the underlying red shifts and the classifications of the galaxies. We find that quite a generic statistical modelling approach, using for the most part standard model ingredients, can compete with much more specific custom-made and highly tuned techniques that are already available in the astronomical literature.
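As a much simpler stand-in for the structured Bayesian model described (simulated photometry, hypothetical coefficients), the sketch below uses a Bayesian linear model to attach predictive uncertainty to red shifts predicted from photometric colours.

```python
# A minimal sketch (simulated photometry; a far simpler model than the paper's):
# Bayesian linear regression with predictive uncertainty for red shift.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
n = 2000
colors = rng.normal(size=(n, 4))        # e.g. u-g, g-r, r-i, i-z (simulated, not physical)
z_spec = 0.4 + colors @ np.array([0.15, 0.30, 0.10, 0.05]) + rng.normal(0, 0.05, n)

train, test = slice(0, 1500), slice(1500, None)
model = BayesianRidge().fit(colors[train], z_spec[train])
z_mean, z_std = model.predict(colors[test], return_std=True)

# crude 95% predictive bounds for the first few test galaxies
for m, s, truth in zip(z_mean[:3], z_std[:3], z_spec[test][:3]):
    print(f"predicted z = {m:.3f} +/- {1.96 * s:.3f}   (spectroscopic z = {truth:.3f})")
```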

20.
Model-based adaptive spatial sampling for occurrence map construction
In many environmental management problems, the construction of occurrence maps of species of interest is a prerequisite to their effective management. However, the construction of occurrence maps is a challenging problem because observations are often costly to obtain (thus incomplete) and noisy (thus imperfect). It is therefore critical to develop tools for designing efficient spatial sampling strategies and for addressing data uncertainty. Adaptive sampling strategies are known to be more efficient than non-adaptive strategies. Here, we develop a model-based adaptive spatial sampling method for the construction of occurrence maps. We apply the method to estimate the occurrence of one of the world’s worst invasive species, the red imported fire ant, in and around the city of Brisbane, Australia. Our contribution is threefold: (i) a model of uncertainty about invasion maps using the classical image analysis probabilistic framework of Hidden Markov Random Fields (HMRF), (ii) an original exact method for optimal spatial sampling with HMRF and approximate solution algorithms for this problem, both in the static and adaptive sampling cases, (iii) an empirical evaluation of these methods on simulated problems inspired by the fire ants case study. Our analysis demonstrates that the adaptive strategy can lead to substantial improvement in occurrence mapping.
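The sketch below illustrates only the adaptive-sampling loop on a simulated occupancy map: cells are treated as independent with a known imperfect-detection model, ignoring the paper's Hidden Markov Random Field spatial dependence, and the most uncertain cell is sampled and updated by Bayes' rule at each step.

```python
# A minimal sketch of the adaptive-sampling loop only (independent cells, known
# imperfect detection; not the paper's HMRF model): sample the most uncertain
# cell, observe it with noise, update its posterior occupancy probability.
import numpy as np

rng = np.random.default_rng(0)
side = 20
true_occ = rng.random((side, side)) < 0.3        # simulated invasion map
q = np.full((side, side), 0.3)                   # prior P(occupied) per cell
detect, false_pos = 0.8, 0.05                    # P(signal | occupied / empty)

for step in range(400):
    uncertainty = q * (1 - q)                    # Bernoulli posterior variance
    i, j = np.unravel_index(np.argmax(uncertainty), q.shape)
    signal = rng.random() < (detect if true_occ[i, j] else false_pos)
    like_occ = detect if signal else 1 - detect
    like_emp = false_pos if signal else 1 - false_pos
    q[i, j] = q[i, j] * like_occ / (q[i, j] * like_occ + (1 - q[i, j]) * like_emp)

estimate = q > 0.5
print("occurrence-map accuracy after adaptive sampling:",
      round(float((estimate == true_occ).mean()), 3))
```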
