Similar Literature (20 results)
1.
In this study an attempt is made to assess statistically the validity of two theories as to the origin of comets. The subject remains highly controversial amongst astronomers, but two main schools of thought have recently developed.

These are that comets are of

(i) planetary origin,

(ii) interstellar origin.

Many theories have been advanced within each school of thought, but at present one theory in each is generally accepted. This paper sets out to identify the statistical implications of each theory and to evaluate each theory in terms of those implications.


2.
Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models.

The cause of the problem is that, when resampling is used, influential individual observations are likely to be sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problem, we show that leave-k-out cross-validation, in which sub-samples are taken without replacement, does not suffer from the same issue.
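A minimal sketch of this kind of bootstrap class-enumeration check, assuming a two-component univariate Gaussian mixture fitted with scikit-learn and BIC-based enumeration; it illustrates the idea rather than reproducing the authors' simulation design:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulate a finite mixture whose true number of classes is 2.
n = 300
data = np.concatenate([rng.normal(0.0, 1.0, n // 2),
                       rng.normal(3.0, 1.0, n // 2)]).reshape(-1, 1)

def best_k(x, k_max=4):
    """Pick the number of classes by BIC."""
    bics = [GaussianMixture(k, n_init=3, random_state=0).fit(x).bic(x)
            for k in range(1, k_max + 1)]
    return int(np.argmin(bics)) + 1

# Non-parametric bootstrap: resample with replacement, re-enumerate classes.
hits, n_boot = 0, 200
for _ in range(n_boot):
    sample = data[rng.integers(0, n, n)]
    hits += (best_k(sample) == 2)

print(f"bootstrap samples selecting the true 2 classes: {hits / n_boot:.1%}")
```

Replications of extreme observations in the resamples tend to push the selected number of classes upward, which is the failure mode the abstract describes.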


3.
In this paper, we discuss the implementation of fully Bayesian analysis of dynamic image sequences in the context of stochastic deformable templates for shape modelling, Markov/Gibbs random fields for modelling textures, and dynomation.

Throughout, Markov chain Monte Carlo algorithms are used to perform the Bayesian calculations.
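As a generic illustration of the MCMC machinery (not the deformable-template or Gibbs-field models themselves), a minimal random-walk Metropolis sampler for a one-parameter posterior might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy posterior: normal likelihood with unknown mean and a N(0, 10^2) prior.
data = rng.normal(2.0, 1.0, size=50)

def log_post(mu):
    return -0.5 * np.sum((data - mu) ** 2) - 0.5 * (mu / 10.0) ** 2

# Random-walk Metropolis: propose mu' ~ N(mu, 0.5^2) and accept with
# probability min(1, posterior ratio).
mu, chain = 0.0, []
for _ in range(5000):
    prop = mu + 0.5 * rng.normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop
    chain.append(mu)

print("posterior mean ~", np.mean(chain[1000:]))  # discard burn-in
```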


4.
The purpose of this paper is to identify a relationship between pupils' mathematics and reading test scores and the characteristics of the students themselves, stratifying for classes, schools and geographical areas. The data set of interest contains detailed information about more than 500,000 students in the first year of junior secondary school in the year 2012/2013, provided by the Italian Institute for the Evaluation of Educational System. The innovation of this work is in the use of multivariate multilevel models, in which the outcome is bivariate: reading and mathematics achievement. Using the bivariate outcome enables researchers to analyze the correlations between achievement levels in the two fields and to predict statistically significant school and class effects after adjusting for pupils' characteristics. The statistical model employed here explicitly accounts for the potential covariance between the two subjects, and at the same time allows the school effect to vary between them. The results show that while in most cases the direction of the school's effect is coherent for reading and mathematics (i.e. positive/negative), there are cases where internal school factors lead to different performances in the two fields.
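A hedged sketch of a bivariate two-level model of this kind, with pupils i nested in schools j and outcomes r = 1 (reading), r = 2 (mathematics); the notation is assumed for illustration rather than taken from the paper:

```latex
% Bivariate two-level model: outcome r (1 = reading, 2 = mathematics)
% for pupil i in school j, with correlated school- and pupil-level terms.
\begin{align*}
  y_{rij} &= \beta_{r0} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_r
             + u_{rj} + e_{rij},\\
  (u_{1j}, u_{2j})^{\top} &\sim N(\mathbf{0}, \Omega_u), \qquad
  (e_{1ij}, e_{2ij})^{\top} \sim N(\mathbf{0}, \Omega_e).
\end{align*}
% The off-diagonal elements of Omega_u and Omega_e carry the
% reading-mathematics covariance at the school and pupil levels.
```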

5.
The 1978 European Community Typology for Agricultural Holdings is described in this paper and contrasted with a data-based, polythetic multivariate classification derived from cluster analysis.

The requirement to reduce the size of the variable set employed in an optimisation-partition method of clustering suggested the value of principal components and factor analysis for the identification of major ‘source’ dimensions against which to measure farm differences and similarities.

The Euclidean cluster analysis incorporating the reduced dimensions quickly converged to a stable solution and was little influenced by the initial number or nature of ‘seeding’ partitions of the data.

The assignment of non-sampled observations from the population to cluster classes was completed using classification functions.

The final scheme, based on a sample of over 2,000 observations, was found to be interpretable and meaningful in terms of agricultural structure and practice, and much superior in explanatory power to a version of the principal-activity typology.
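A minimal sketch of the reduce-cluster-classify pipeline described above, using PCA, Euclidean k-means and linear discriminant classification functions on synthetic data; the variable counts, cluster number and all settings are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)

# Synthetic "holding" records: 2,000 sampled farms, 12 structural variables.
sample = rng.normal(size=(2000, 12))

# 1. Reduce the variable set to a few principal 'source' dimensions.
pca = PCA(n_components=4).fit(sample)
scores = pca.transform(sample)

# 2. Euclidean cluster analysis on the reduced dimensions.
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(scores)

# 3. Classification functions to assign non-sampled holdings to clusters.
clf = LinearDiscriminantAnalysis().fit(scores, km.labels_)
new_farms = rng.normal(size=(100, 12))
assigned = clf.predict(pca.transform(new_farms))
print(np.bincount(assigned))
```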


6.
Summary.  Traditional studies of school differences in educational achievement use multilevel modelling techniques to take into account the nesting of pupils within schools. However, educational data are known to have more complex non-hierarchical structures. The potential importance of such structures is apparent when considering the effect of pupil mobility during secondary schooling on educational achievement. Movements of pupils between schools suggest that we should model pupils as belonging to the series of schools that are attended and not just their final school. Since these school moves are strongly linked to residential moves, it is important to explore additionally whether achievement is also affected by the history of neighbourhoods that are lived in. Using the national pupil database, this paper combines multiple membership and cross-classified multilevel models to explore simultaneously the relationships between secondary school, primary school, neighbourhood and educational achievement. The results show a negative relationship between pupil mobility and achievement, the strength of which depends greatly on the nature and timing of these moves. Accounting for pupil mobility also reveals that schools and neighbourhoods are more important than shown by previous analysis. A strong primary school effect appears to last long after a child has left that phase of schooling. The additional effect of neighbourhoods, in contrast, is small. Crucially, the rank order of school effects across all types of pupil is sensitive to whether we account for the complexity of the multilevel data structure.
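A hedged sketch of the multiple-membership term such models add, with pupil i weighting the random effects of every secondary school attended; the notation is illustrative, not the paper's:

```latex
% Multiple-membership term: pupil i's outcome draws on every secondary
% school attended, with weights (e.g. time spent) summing to 1, plus
% cross-classified primary-school and neighbourhood effects.
\[
  y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}
        + \sum_{j \in \mathrm{sec}(i)} w_{ij}\, u_j^{\mathrm{sec}}
        + u_{\mathrm{prim}(i)}^{\mathrm{prim}}
        + u_{\mathrm{nbhd}(i)}^{\mathrm{nbhd}}
        + e_i,
  \qquad \sum_{j \in \mathrm{sec}(i)} w_{ij} = 1 .
\]
```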

7.
This paper analyses direct and indirect forms of dependence in the probability of scoring in a handball match, taking into account the mutual influence of the two playing teams. Non-identical distribution (i.d.) and non-stationarity, which are commonly observed in sports, are studied through the specification of time-varying parameters.

The model accounts for the binary character of the dependent variable and for unobserved heterogeneity. The parameter dynamics are specified by a first-order autoregressive process.

Data from the Handball World Championships 2001–2005 show that the dynamics of handball violate both independence and i.d., in some cases exhibiting non-stationary behaviour.
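One plausible reading of such a specification, written here as an assumed illustration rather than the paper's exact model, is a logit scoring probability driven by a latent AR(1) parameter:

```latex
% Illustrative time-varying scoring model: team k's scoring indicator
% in period t follows a logit with an AR(1) latent parameter.
\begin{align*}
  \Pr(y_{k,t} = 1 \mid \theta_{k,t})
    &= \frac{e^{\theta_{k,t}}}{1 + e^{\theta_{k,t}}},\\
  \theta_{k,t} &= \alpha_k + \phi\,\theta_{k,t-1} + \eta_{k,t},
  \qquad \eta_{k,t} \sim N(0, \sigma^2_\eta).
\end{align*}
```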


8.
Alternating logistic regressions (ALRs) seem to offer some of the advantages of marginal models estimated via generalized estimating equations (GEE) and generalized linear mixed models (GLMMs). Via a simulation study we compared ALRs to marginal models estimated via GEE and subject-specific models estimated via GLMMs, with a focus on estimation of the correlation structure in three-level data sets (e.g. students in classes in schools). Data set size and structure, and the amount of correlation in the data sets, were varied. For simple correlation structures, ALRs performed well. For three-level correlation structures, all approaches, but especially ALRs, had difficulty assigning the correlation to the correct level, though the sample sizes used were small. In addition, ALRs and GEEs had trouble attaching correct inference to the mean effects, though this improved as overall sample size increased. ALRs are a valuable addition to the data analyst's toolkit, though care should be taken when modelling data with three-level structures.
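A minimal sketch of the GEE side of such a comparison, fitted with statsmodels on simulated two-level binary data; ALR itself is not available in statsmodels, and the data-generating settings are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulate binary outcomes for 30 classes of 20 students, with a shared
# class-level effect inducing within-class correlation.
n_class, n_stud = 30, 20
class_eff = rng.normal(0, 0.8, n_class)
rows = []
for c in range(n_class):
    x = rng.normal(size=n_stud)
    p = 1 / (1 + np.exp(-(0.5 * x + class_eff[c])))
    rows.append(pd.DataFrame({"y": rng.binomial(1, p), "x": x, "cls": c}))
df = pd.concat(rows, ignore_index=True)

# Marginal model via GEE with an exchangeable working correlation.
model = sm.GEE.from_formula("y ~ x", groups="cls", data=df,
                            family=sm.families.Binomial(),
                            cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
print(model.cov_struct.summary())  # estimated within-class correlation
```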

9.
We consider wavelet-based nonlinear estimators, constructed by thresholding the empirical wavelet coefficients, for mean regression functions with strongly mixing errors, and investigate their asymptotic rates of convergence. We show that these estimators achieve nearly optimal convergence rates, within a logarithmic term, over a large range of Besov function classes $B^s_{p,q}$. The theory is illustrated with some numerical examples.

A new ingredient in our development is a Bernstein-type exponential inequality for a sequence of random variables that have a certain mixing structure and are not necessarily bounded or sub-Gaussian. This moderate-deviation inequality may be of independent interest.
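A minimal sketch of wavelet thresholding for a regression function, using PyWavelets with the universal threshold; i.i.d. noise is used for simplicity, whereas the paper's setting allows strongly mixing errors:

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(4)

# Noisy regression on a grid: y_i = f(x_i) + noise.
n = 512
x = np.linspace(0, 1, n)
f = np.sin(4 * np.pi * x) * (x > 0.5)        # mean function with a jump
y = f + 0.3 * rng.normal(size=n)

# Decompose, soft-threshold the detail coefficients with the universal
# threshold sigma * sqrt(2 log n), then reconstruct.
coeffs = pywt.wavedec(y, "db4", level=5)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # robust noise estimate
thr = sigma * np.sqrt(2 * np.log(n))
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
f_hat = pywt.waverec(coeffs, "db4")[:n]

print("RMSE:", np.sqrt(np.mean((f_hat - f) ** 2)))
```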


10.
In this paper, we study, by means of randomized sampling, the long-run stability of an open Markov population fed with time-dependent Poisson inputs. We show that state probabilities within transient states converge, even when the overall expected population dimension increases without bound, under general conditions on the transition matrix and input intensities.

Following the convergence results, we obtain ML estimators for a particular sequence of input intensities, in which the sequence of new arrivals is modeled by a sigmoidal function. These estimators allow the evolution of the relative population structure in the transient states to be forecast via confidence intervals.

Applying these results to the study of a consumption credit portfolio, we estimate the implicit default rate.
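A toy simulation of an open Markov population fed by sigmoidal Poisson inputs, tracking the relative structure over the transient states; the transition matrix, intensity function and state labels are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two transient states (0, 1) plus an absorbing "exit" state (2).
P = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.7, 0.2],
              [0.0, 0.0, 1.0]])

def intensity(t, a=200.0, b=1.0, c=10.0):
    """Sigmoidal expected Poisson input at time t (illustrative choice)."""
    return a / (1.0 + np.exp(-b * (t - c)))

counts = np.zeros(3)
for t in range(1, 61):
    counts[0] += rng.poisson(intensity(t))   # new arrivals enter state 0
    moved = np.zeros(3)
    for s in range(3):                       # one Markov step for everyone
        moved += rng.multinomial(int(counts[s]), P[s])
    counts = moved
    if t % 10 == 0:
        rel = counts[:2] / counts[:2].sum()
        print(f"t={t:2d}  relative structure over transient states: {rel}")
```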


11.
Permutation tests for symmetry are suggested using data that are subject to right censoring. Such tests are directly relevant to the assumptions that underlie the generalized Wilcoxon test since the symmetric logistic distribution for log-errors has been used to motivate Wilcoxon scores in the censored accelerated failure time model. Its principal competitor is the log-rank (LGR) test motivated by an extreme value error distribution that is positively skewed. The proposed one-sided tests for symmetry against the alternative of positive skewness are directly relevant to the choice between usage of these two tests.

The permutation tests use statistics from the weighted LGR class normally used for making two-sample comparisons. From this class, the test using LGR weights (all weights equal) showed the greatest discriminatory power in simulations comparing logistic errors with extreme-value errors.

In the test construction, a median estimate, determined by inverting the Kaplan–Meier estimator, is used to divide the data into a "control" group to its left that is compared with a "treatment" group to its right. As an unavoidable consequence of testing symmetry, data in the control group that have been censored become uninformative in performing this two-sample test. Thus, early heavy censoring of data can reduce the effective sample size of the control group and result in diminished power for discriminating symmetry in the population distribution.
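A minimal sketch of the split-at-the-Kaplan-Meier-median construction, using the lifelines package; the permutation step and the paper's specific weight choices are omitted, with a plain log-rank comparison standing in:

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(6)

# Right-censored sample from a positively skewed law (illustrative data).
n = 200
t_true = rng.exponential(1.0, n)
censor = rng.exponential(3.0, n)
time = np.minimum(t_true, censor)
event = (t_true <= censor).astype(int)

# Median estimate via the Kaplan-Meier estimator.
kmf = KaplanMeierFitter().fit(time, event_observed=event)
med = kmf.median_survival_time_

# Split at the median: "control" group below, "treatment" group above.
below = time <= med
res = logrank_test(time[below], time[~below],
                   event_observed_A=event[below],
                   event_observed_B=event[~below])
print("median:", med, " log-rank p-value:", res.p_value)
```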


12.
Efficient, accurate, and fast Markov chain Monte Carlo estimation methods based on the Implicit approach are proposed. In this article, we introduce the notion of an Implicit method for estimating the parameters of stochastic volatility models.

Implicit estimation offers a substantial computational advantage for learning from observations without prior knowledge, and thus provides a good alternative to classical Bayesian inference when priors are missing.

Both the Implicit and Bayesian approaches are illustrated using simulated data and applied to daily stock return data on the CAC40 index.
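For reference, a canonical stochastic volatility specification of the kind such samplers target (the abstract does not give the paper's exact model):

```latex
% Log-normal stochastic volatility model for returns y_t with latent
% log-volatility h_t following an AR(1).
\begin{align*}
  y_t &= e^{h_t/2}\,\varepsilon_t, & \varepsilon_t &\sim N(0,1),\\
  h_t &= \mu + \phi\,(h_{t-1} - \mu) + \sigma_\eta\,\eta_t, & \eta_t &\sim N(0,1).
\end{align*}
```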


13.
We propose a consistent and locally efficient method of estimating the model parameters of a logistic mixed effect model with random slopes. Our approach relaxes two typical assumptions: that the random effects are normally distributed, and that the covariates and random effects are independent of each other. Adhering to these assumptions is particularly difficult in health studies where, in many cases, we have limited resources to design experiments and gather data in long-term studies, while new findings from other fields might emerge, suggesting the violation of such assumptions. It is therefore crucial to have an estimator that is robust to such violations, so that better use can be made of data harvested using various valuable resources. Our method generalizes the framework presented in Garcia & Ma (2016), which also deals with a logistic mixed effect model but considers only a random intercept. A simulation study reveals that our proposed estimator remains consistent even when the independence and normality assumptions are violated. This contrasts favourably with the traditional maximum likelihood estimator, which is likely to be inconsistent when there is dependence between the covariates and random effects. Application of this work to a study of Huntington's disease reveals that disease diagnosis can be enhanced using assessments of cognitive performance. The Canadian Journal of Statistics 47: 140–156; 2019 © 2019 Statistical Society of Canada
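A compact sketch of the model class, with notation assumed for illustration; the point of the paper is that neither normality of the random effects nor their independence from the covariates is imposed:

```latex
% Logistic mixed effect model with random intercept and slopes b_i whose
% distribution is left unspecified.
\[
  \Pr(Y_{ij} = 1 \mid \mathbf{X}_{ij}, \mathbf{b}_i)
  = \frac{\exp(\mathbf{X}_{ij}^{\top}\boldsymbol{\beta}
               + \mathbf{Z}_{ij}^{\top}\mathbf{b}_i)}
         {1 + \exp(\mathbf{X}_{ij}^{\top}\boldsymbol{\beta}
               + \mathbf{Z}_{ij}^{\top}\mathbf{b}_i)},
\]
% with no normality assumed for b_i and no independence required
% between b_i and X_ij.
```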

14.
When VAR models are used to predict future outcomes, the forecast error can be substantial. Through imposition of restrictions on the off-diagonal elements of the parameter matrix, however, the information in the process may be condensed to the marginal processes. In particular, if the cross-autocorrelations in the system are small and only a small sample is available, then such a restriction may reduce the forecast mean squared error considerably.

In this paper, we propose three different techniques to decide whether to use the restricted or the unrestricted model, i.e. the full VAR(1) model or only marginal AR(1) models. In a Monte Carlo simulation study, all three proposed tests were found to behave quite differently depending on the parameter setting. One of the proposed tests stands out, however, as the preferred one, and is shown to outperform other estimators for a wide range of parameter settings.
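A minimal sketch of the restricted-versus-unrestricted forecast comparison, using statsmodels on a simulated bivariate VAR(1) with small cross-autocorrelations; this illustrates the trade-off, not the paper's proposed decision tests:

```python
import numpy as np
from statsmodels.tsa.api import VAR
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(7)

# Simulate a bivariate VAR(1) with small off-diagonal coefficients.
A = np.array([[0.5, 0.05],
              [0.05, 0.5]])
n = 60                              # deliberately small sample
y = np.zeros((n + 10, 2))
for t in range(1, n + 10):
    y[t] = A @ y[t - 1] + rng.normal(size=2)
train, test = y[:n], y[n:]

# Unrestricted: full VAR(1) forecast.
var_res = VAR(train).fit(1)
var_fc = var_res.forecast(train[-var_res.k_ar:], steps=10)

# Restricted: independent marginal AR(1) forecasts.
ar_fc = np.column_stack([
    AutoReg(train[:, k], lags=1).fit().forecast(steps=10)
    for k in range(2)
])

print("VAR(1) forecast MSE:", np.mean((var_fc - test) ** 2))
print("marginal AR(1) MSE:", np.mean((ar_fc - test) ** 2))
```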


15.
Tree algorithms are a well-known class of random access algorithms with a provable maximum stable throughput under the infinite population model (as opposed to ALOHA or the binary exponential backoff algorithm). In this article, we propose a tree algorithm for opportunistic spectrum usage in cognitive radio networks. A channel in such a network is shared among so-called primary and secondary users, where the secondary users are allowed to use the channel only if there is no primary user activity. The tree algorithm designed in this article can be used by the secondary users to share the channel capacity left by the primary users.

We analyze the maximum stable throughput and mean packet delay of the secondary users by developing a tree-structured quasi-birth-death Markov chain, under the assumption that the primary user activity can be modeled by means of a finite-state Markov chain and that packet lengths follow a discrete phase-type distribution.

Numerical experiments provide insight into the effect of various system parameters and indicate that the proposed algorithm makes good use of the bandwidth left by the primary users.
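A minimal sketch of the underlying binary tree collision-resolution idea, without the primary-user activity or phase-type packet lengths of the article's model; the fair-coin split and batch sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)

def resolve(k):
    """Slots needed to resolve a collision among k users with the basic
    binary (fair-coin) tree algorithm."""
    if k <= 1:
        return 1                       # idle or success slot
    left = rng.binomial(k, 0.5)        # users flipping 0 retransmit first
    return 1 + resolve(left) + resolve(k - left)

# Average slots per resolved packet for various collision sizes.
for k in (2, 4, 8, 16):
    slots = np.mean([resolve(k) for _ in range(2000)])
    print(f"k={k:2d}: {slots:6.2f} slots, throughput ~ {k / slots:.3f}")
```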


16.
17.
Four procedures are suggested for estimating the parameter ‘a’ in the Pauling equation:

e^{-X/a} + e^{-Y/a} = 1.

The procedures are: using the mean of individual solutions, least squares with Y the subject of the equation, least squares with X the subject of the equation, and maximum likelihood using a statistical model. In order to compare these estimates, we use Efron's (1979) bootstrap technique, since distributional results are not available. This example also illustrates the role of the bootstrap in statistical inference.
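A minimal sketch of one of these procedures (least squares with Y the subject of the equation) combined with Efron's bootstrap, on synthetic data; the true parameter value, noise level and bounds are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(9)

# Synthetic (X, Y) pairs satisfying e^{-X/a} + e^{-Y/a} = 1 with a = 2,
# plus noise on Y (illustrative data, not the original application).
a_true, n = 2.0, 40
X = rng.uniform(0.5, 5.0, n)
Y = -a_true * np.log(1.0 - np.exp(-X / a_true)) + 0.05 * rng.normal(size=n)

def resid(a, X, Y):
    # Y made the subject: Y = -a * log(1 - e^{-X/a}); return residuals.
    return Y + a * np.log(1.0 - np.exp(-X / a))

def fit(X, Y):
    return least_squares(resid, x0=1.0, bounds=(0.1, 10.0),
                         args=(X, Y)).x[0]

a_hat = fit(X, Y)

# Efron's non-parametric bootstrap for the standard error of a_hat.
boot = np.array([fit(X[idx], Y[idx])
                 for idx in rng.integers(0, n, (500, n))])
print(f"a_hat = {a_hat:.3f}, bootstrap s.e. = {boot.std():.3f}")
```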


18.
Measures of the spread of data for random sums arise frequently in many problems and have a wide range of applications in real life, such as in the insurance field (e.g., the total claim size in a portfolio). The exact distribution of random sums is extremely difficult to determine, and normal approximation usually performs very badly for these complex distributions. A better method of approximating a random-sum distribution involves the use of saddlepoint approximations.

Saddlepoint approximations are powerful tools for providing accurate expressions for distribution functions that are not known in closed form. This method not only yields an accurate approximation near the center of the distribution but also controls the relative error in the far tail of the distribution.

In this article, we discuss approximations to the unknown complex random-sum Poisson–Erlang random variable, which has a continuous distribution, and to the random-sum Poisson–negative binomial random variable, which has a discrete distribution. We show that the saddlepoint approximation method is not only quick, dependable, stable, and accurate enough for general statistical inference, but is also applicable without deep knowledge of probability theory. Numerical examples of the application of the saddlepoint approximation method to continuous and discrete random-sum Poisson distributions are presented.
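A minimal sketch of the saddlepoint density approximation for the continuous case, a compound Poisson sum of Erlang variables; the parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import brentq

# Compound Poisson-Erlang sum S = X_1 + ... + X_N, N ~ Poisson(lam),
# X_i ~ Erlang(shape k, scale theta). Parameter values are illustrative.
lam, k, theta = 5.0, 2.0, 1.0

def K(s):   # cumulant generating function, valid for s < 1/theta
    return lam * ((1.0 - theta * s) ** (-k) - 1.0)

def K1(s):  # first derivative K'(s)
    return lam * k * theta * (1.0 - theta * s) ** (-(k + 1))

def K2(s):  # second derivative K''(s)
    return lam * k * (k + 1) * theta**2 * (1.0 - theta * s) ** (-(k + 2))

def saddlepoint_density(x):
    # Solve K'(s_hat) = x, then apply
    # f_hat(x) = exp(K(s_hat) - s_hat * x) / sqrt(2 * pi * K''(s_hat)).
    s_hat = brentq(lambda s: K1(s) - x, -50.0, 1.0 / theta - 1e-9)
    return np.exp(K(s_hat) - s_hat * x) / np.sqrt(2 * np.pi * K2(s_hat))

for x in (5.0, 10.0, 20.0):
    print(f"f_hat({x}) = {saddlepoint_density(x):.5f}")
```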


19.
According to the latest proposals by the Basel Committee, banks are allowed to use statistical approaches to compute the capital charge covering financial risks such as credit risk, market risk and operational risk.

It is widely recognized that internal loss data alone do not suffice to provide an accurate capital charge in financial risk management, especially for high-severity and low-frequency events. Financial institutions typically use external loss data to augment the available evidence and, therefore, provide more accurate risk estimates. Rigorous statistical treatments are required to make internal and external data comparable and to ensure that merging the two databases leads to unbiased estimates.

The goal of this paper is to propose a correct statistical treatment to make the external and internal data comparable and, therefore, mergeable. Such a methodology augments internal losses with relevant, rather than redundant, external loss data.


20.
Summary.  A two-level regression mixture model is discussed and contrasted with the conventional two-level regression model. Simulated and real data shed light on the modelling alternatives. The real data analyses investigate gender differences in mathematics achievement from the US National Education Longitudinal Survey. The two-level regression mixture analyses show that unobserved heterogeneity should not be presupposed to exist only at level 2 at the expense of level 1. Both the simulated and the real data analyses show that level 1 heterogeneity in the form of latent classes can be mistaken for level 2 heterogeneity in the form of the random effects that are used in conventional two-level regression analysis. Because of this, mixture models have an important role to play in multilevel regression analyses. Mixture models allow heterogeneity to be investigated more fully, more correctly attributing different portions of the heterogeneity to the different levels.
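A compact way to see the contrast, with assumed notation: the conventional two-level model has a single level-1 regression, while the two-level regression mixture lets the level-1 coefficients vary over latent classes:

```latex
% Conventional two-level regression (random intercept):
%   y_{ij} = beta_0 + beta_1 x_{ij} + u_j + e_{ij}.
% Two-level regression mixture: level-1 unit i in cluster j belongs to
% latent class c with class-specific coefficients.
\[
  y_{ij} \mid c_{ij} = c :\quad
  y_{ij} = \beta_{0c} + \beta_{1c}\, x_{ij} + u_j + e_{ij},
  \qquad \Pr(c_{ij} = c) = \pi_c .
\]
```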
