Similar Documents
Found 20 similar documents (search time: 453 ms)
1.
This article considers fixed effects (FE) estimation for linear panel data models under possible model misspecification when both the number of individuals, n, and the number of time periods, T, are large. We first clarify the probability limit of the FE estimator and argue that this probability limit can be regarded as a pseudo-true parameter. We then establish the asymptotic distributional properties of the FE estimator around the pseudo-true parameter when n and T jointly go to infinity. Notably, we show that the FE estimator suffers from the incidental parameters bias, whose leading term is O(T^-1), and that even after the incidental parameters bias is completely removed, the rate of convergence of the FE estimator depends on the degree of model misspecification and is either (nT)^-1/2 or n^-1/2. Second, we establish asymptotically valid inference on the (pseudo-true) parameter. Specifically, we derive the asymptotic properties of the clustered covariance matrix (CCM) estimator and the cross-section bootstrap, and show that they are robust to model misspecification. This establishes a rigorous theoretical ground for the use of the CCM estimator and the cross-section bootstrap when model misspecification and the incidental parameters bias (in the coefficient estimate) are present. We conduct Monte Carlo simulations to evaluate the finite sample performance of the estimators and inference methods, together with a simple application to unemployment dynamics in the U.S.
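The CCM idea above can be sketched numerically: demean within each individual (the FE/within transformation), then sum the scores within clusters before forming the variance. A minimal illustration on simulated data (the scalar regressor, sample sizes, and data-generating process are assumptions for the sketch, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 20                          # illustrative panel dimensions
alpha = rng.normal(size=n)             # individual fixed effects
x = rng.normal(size=(n, T)) + alpha[:, None]
y = 1.5 * x + alpha[:, None] + rng.normal(size=(n, T))

# Within (FE) transformation: demean each individual's series.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta = (xd * yd).sum() / (xd ** 2).sum()

# Clustered covariance (CCM) estimator for a scalar regressor:
# sum the scores within each individual (cluster) before squaring.
u = yd - beta * xd                     # within residuals
score = (xd * u).sum(axis=1)           # one score sum per cluster
se = np.sqrt((score ** 2).sum()) / (xd ** 2).sum()
```

Clustering at the individual level is what makes the standard error robust to arbitrary within-individual dependence of the errors.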

2.
A great deal of inference in statistics is based on making the approximation that a statistic is normally distributed. The error in doing so is generally O(n^-1/2), where n is the sample size, and can be considerable when the distribution of the statistic is heavily biased or skewed. This note shows how one may reduce the error to O(n^-(j+1)/2), where j is a given integer. The case considered is when the statistic is the mean of the sample values of a continuous distribution with a scale or location change after the sample has undergone an initial transformation, which may depend on an unknown parameter. The transformation corresponding to Fisher's score function yields an asymptotically efficient procedure.

3.
Population-parameter mapping (PPM) is a method for estimating the parameters of latent scientific models that describe the statistical likelihood function. The PPM method involves a Bayesian inference in terms of the statistical parameters and the mapping from the statistical parameter space to the parameter space of the latent scientific parameters, and obtains a model coherence estimate, P(coh). The P(coh) statistic can be valuable for designing experiments and comparing competing models, and can be helpful in redesigning flawed models. Examples are provided where greater estimation precision was found for small sample sizes for the PPM point estimates relative to the maximum likelihood estimator (MLE).

4.
This paper presents a methodology for model fitting and inference in the context of Bayesian models of the type f(Y|X, θ) f(X|θ) f(θ), where Y is the (set of) observed data, θ is a set of model parameters and X is an unobserved (latent) stationary stochastic process induced by the first-order transition model f(X^(t+1)|X^(t), θ), where X^(t) denotes the state of the process at time (or generation) t. The crucial feature of the above type of model is that, given θ, the transition model f(X^(t+1)|X^(t), θ) is known but the distribution of the stochastic process in equilibrium, that is f(X|θ), is, except in very special cases, intractable, hence unknown. A further point to note is that the data Y are assumed to be observed when the underlying process is in equilibrium. In other words, the data are not collected dynamically over time. We refer to such a specification as a latent equilibrium process (LEP) model. It is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model. As a hierarchical specification, it is natural to fit the LEP within a Bayesian framework. Fitting such models is usually done via Markov chain Monte Carlo (MCMC). However, we demonstrate that, in the case of LEP models, implementation of MCMC is far from straightforward. The main contribution of this paper is to provide a methodology to implement MCMC for LEP models. We demonstrate our approach in population genetics problems with both simulated and real data sets. The resultant model fitting is computationally intensive and thus we also discuss parallel implementation of the procedure in special cases.
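When f(X|θ) is intractable but the transition model is known, one can at least approximate the equilibrium distribution by running the transition forward. A toy illustration with a linear (AR(1)-type) transition whose stationary law happens to be known in closed form (all numbers are assumptions for the sketch, unrelated to the genetics models):

```python
import numpy as np

rng = np.random.default_rng(5)
phi, s = 0.8, 1.0                      # assumed transition parameters
burn, keep = 1000, 50000
x, draws = 0.0, np.empty(keep)
for t in range(burn + keep):
    x = phi * x + rng.normal(0.0, s)   # known transition f(X^(t+1) | X^(t), theta)
    if t >= burn:
        draws[t - burn] = x
# Here the stationary law is N(0, s^2 / (1 - phi^2)), so the long-run draws
# approximate the (in general intractable) equilibrium distribution f(X | theta).
```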

5.

An exponential-time exact algorithm is provided for the task of clustering n items of data into k clusters. Instead of seeking one partition, posterior probabilities are computed for summary statistics: the number of clusters and pairwise co-occurrence. The method is based on subset convolution and yields the posterior distribution for the number of clusters in O(n·3^n) operations, or O(n^3·2^n) using fast subset convolution. Pairwise co-occurrence probabilities are then obtained in O(n^3·2^n) operations. This is considerably faster than exhaustive enumeration of all partitions.
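For context (a standard combinatorial fact, not taken from the paper): exhaustive enumeration would have to visit every partition of n items, and the number of such partitions is the Bell number B_n, which dwarfs the 2^n subsets that subset convolution works over. B_n can be computed with the Bell triangle:

```python
def bell_numbers(nmax):
    """Bell triangle: returns [B_0, ..., B_nmax], where B_n counts the
    partitions of an n-element set."""
    row = [1]
    bells = [1]                      # B_0 = 1
    for _ in range(nmax):
        new = [row[-1]]              # each row starts with the previous row's last entry
        for v in row:
            new.append(new[-1] + v)
        row = new
        bells.append(row[0])
    return bells

b = bell_numbers(15)
# b[10] = 115975 partitions of 10 items, versus only 2**10 = 1024 subsets
```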

6.
Recently, mixture distributions have become more and more popular in many scientific fields. Statistical computation and analysis of mixture models, however, are extremely complex due to the large number of parameters involved. Both EM algorithms for likelihood inference and MCMC procedures for Bayesian analysis have various difficulties in dealing with mixtures with an unknown number of components. In this paper, we propose a direct sampling approach to the computation of Bayesian finite mixture models with a varying number of components. This approach requires only the knowledge of the density function up to a multiplicative constant. It is easy to implement, numerically efficient, and very practical in real applications. A simulation study shows that it performs quite satisfactorily on relatively high dimensional distributions. A well-known genetic data set is used to demonstrate the simplicity of this method and its power for the computation of high dimensional Bayesian mixture models.
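The phrase "requires only the knowledge of the density function up to a multiplicative constant" can be illustrated with a generic grid-based direct sampler (a simplistic stand-in, not the authors' method; the two-component normal mixture and all constants are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def unnorm_density(x):
    # two-component normal mixture, known only up to a multiplicative constant
    return 0.3 * np.exp(-0.5 * (x + 2.0) ** 2) + 0.7 * np.exp(-0.5 * (x - 1.0) ** 2)

grid = np.linspace(-8.0, 8.0, 4001)
w = unnorm_density(grid)
w /= w.sum()                          # the normalization constant is never needed
draws = rng.choice(grid, size=20000, p=w)
# mixture mean is 0.3*(-2) + 0.7*1 = 0.1; standard deviation about 1.7
```

Grid methods only scale to low dimensions, which is precisely why specialized samplers such as the one proposed in the paper matter in higher dimensions.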

7.
Let {X, X_n; n ≥ 1} be a sequence of real-valued i.i.d. random variables, 0 < r < 2 and p > 0. Let D = {A = (a_nk; 1 ≤ k ≤ n, n ≥ 1): a_nk ∈ R and sup_{n,k} |a_nk| < ∞}. Set S_n(A) = Σ_{k=1}^n a_nk X_k for A ∈ D and n ≥ 1. This paper is devoted to determining conditions whereby E{sup_{n≥1} |S_n(A)|/n^{1/r}}^p < ∞ or E{sup_{n≥2} |S_n(A)|/(2n log n)^{1/2}}^p < ∞ for every A ∈ D. This generalizes some earlier results, including those of Burkholder (1962), Choi and Sung (1987), Davis (1971), Gut (1979), Klass (1974), Siegmund (1969) and Teicher (1971).

8.
The exact and asymptotic upper tail probabilities (α = .10, .05, .01, .001) of the three chi-squared goodness-of-fit statistics, Pearson's X^2, the likelihood ratio G^2, and the power-divergence statistic D^2(λ) with λ = 2/3, are compared by complete enumeration for the binomial and the mixture binomial. For the two-component mixture binomial, three cases have been distinguished: 1. both success probabilities and the mixing weights are unknown; 2. one of the two success probabilities is known; and 3. the mixing weights are known. The binomial was investigated for the number of cells k between 3 and 6 with sample sizes between 5 and 100, for k = 7 with sample sizes between 5 and 45, and for k = 10 with sample sizes ranging from 5 to 20. For the mixture binomial, solely k = 5 cells were considered with sample sizes from 5 to 100 and k = 8 cells with sample sizes between 4 and 20. Rating the relative accuracy of the chi-squared approximation in terms of ±10% and ±20% intervals around α led to the following conclusions for the binomial: 1. using G^2 is not recommendable; 2. at the significance levels α = .10 and α = .05, X^2 should be preferred over D^2, while D^2 is the best choice at α = .01; 3. Cochran's (1954; Biometrics, 10, 417-451) rule for the minimum expectation when using X^2 seems to generalize to the binomial for G^2 and D^2; as a compromise, it gives a rather strong lower limit for the expected cell frequencies in some circumstances, but a rather liberal one in others. To draw similar conclusions concerning the mixture binomial was not possible, because in that case the accuracy of the chi-squared approximation is not only a function of the chosen test statistic and of the significance level, but also depends heavily on the numerical values of the involved unknown parameters and on the hypothesis to be tested. In this respect, the present study may give rise only to warnings against the application of mixture models to small samples.
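The three statistics compared above have standard closed forms; a minimal sketch, with hypothetical observed and expected counts (not data from the study):

```python
import numpy as np

def gof_statistics(obs, exp, lam=2/3):
    """Pearson X^2, likelihood-ratio G^2, and power-divergence D^2(lam)."""
    obs = np.asarray(obs, float)
    exp = np.asarray(exp, float)
    x2 = ((obs - exp) ** 2 / exp).sum()
    pos = obs > 0                                  # convention: 0 * log(0) = 0
    g2 = 2.0 * (obs[pos] * np.log(obs[pos] / exp[pos])).sum()
    d2 = 2.0 / (lam * (lam + 1)) * (obs * ((obs / exp) ** lam - 1)).sum()
    return x2, g2, d2

# hypothetical counts: 100 observations in k = 5 cells vs. a Bin(4, 1/2) fit
obs = [10, 20, 40, 20, 10]
exp = [6.25, 25.0, 37.5, 25.0, 6.25]
x2, g2, d2 = gof_statistics(obs, exp)
```

At λ = 1 the power-divergence statistic reduces to Pearson's X^2, and as λ → 0 it tends to G^2; λ = 2/3 is the compromise studied above.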

9.
Most of the higher-order asymptotic results in statistical inference available in the literature assume model correctness. The aim of this paper is to develop higher-order results under model misspecification. The density functions to O(n^-3/2) of the robust score test statistic and the robust Wald test statistic are derived under the null hypothesis, for the scalar as well as the multiparameter case. Alternate statistics which are robust to O(n^-3/2) are also proposed.

10.
We study the behavior of the bivariate empirical copula process 𝔾_n(·, ·) on pavements [0, k_n/n]^2 of [0, 1]^2, where k_n is a sequence of positive constants fulfilling some conditions. We provide an upper bound for the strong approximation of 𝔾_n(·, ·) by a Gaussian process when k_n/n ↘ γ as n → ∞, where 0 ≤ γ ≤ 1.
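A minimal implementation of the bivariate empirical copula itself, evaluated at a point (u, v) via normalized ranks (the no-ties assumption is mine):

```python
import numpy as np

def empirical_copula(x, y, u, v):
    """Bivariate empirical copula C_n(u, v): the fraction of sample points
    whose normalized ranks fall in [0, u] x [0, v] (assumes no ties)."""
    n = len(x)
    rx = np.argsort(np.argsort(x)) + 1    # ranks 1..n
    ry = np.argsort(np.argsort(y)) + 1
    return float(np.mean((rx / n <= u) & (ry / n <= v)))
```

For comonotone data the value at (0.5, 0.5) is 0.5; for countermonotone data it is 0, the two Fréchet bounds.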

11.
The class of symmetric linear regression models has the normal linear regression model as a special case and includes several models that assume that the errors follow a symmetric distribution with longer-than-normal tails. An important member of this class is the t linear regression model, which is commonly used as an alternative to the usual normal regression model when the data contain extreme or outlying observations. In this article, we develop second-order asymptotic theory for score tests in this class of models. We obtain Bartlett-corrected score statistics for testing hypotheses on the regression and the dispersion parameters. The corrected statistics have chi-squared distributions with errors of order O(n^-3/2), n being the sample size. The corrections represent an improvement over the corresponding original Rao's score statistics, which are chi-squared distributed up to errors of order O(n^-1). Simulation results show that the corrected score tests perform much better than their uncorrected counterparts in samples of small or moderate size.
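The Bartlett-correction idea can be illustrated generically: rescale a likelihood-ratio statistic so that its simulated null mean matches the chi-squared degrees of freedom. The one-sample normal-mean LR test below is my own toy example, not one of the paper's symmetric regression models:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, df = 15, 8000, 1
x = rng.normal(0.0, 1.0, size=(reps, n))           # data under H0: mu = 0
t2 = n * x.mean(axis=1) ** 2 / x.var(axis=1, ddof=1)
S = n * np.log1p(t2 / (n - 1))                     # LR statistic, sigma unknown
b = S.mean() / df                                  # empirical Bartlett factor (> 1)
S_corr = S / b                                     # corrected: null mean matches df
```

In practice the factor b is derived analytically as 1 + c/n rather than simulated, which is what makes the corrected statistic chi-squared to a higher order.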

12.
In this paper, we consider estimation of the unknown parameters of a conditional Gaussian MA(1) model. In the majority of cases, a maximum-likelihood estimator is chosen because the estimator is consistent. However, for small sample sizes the error is large, because the estimator has a bias of O(n^-1). Therefore, we derive the O(n^-1) bias of the maximum-likelihood estimator for the conditional Gaussian MA(1) model. Moreover, we propose new estimators for the unknown parameters of the conditional Gaussian MA(1) model based on this O(n^-1) bias. We investigate the properties of the bias, as well as the asymptotic variance of the maximum-likelihood estimators for the unknown parameters, by performing some simulations. Finally, we demonstrate the validity of the new estimators through this simulation study.
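For orientation, a standard MA(1) moment identity (background, not the paper's bias-corrected estimator): the lag-1 autocorrelation of y_t = e_t + θe_{t-1} is θ/(1 + θ²), which simulation recovers:

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 0.5                            # hypothetical MA(1) coefficient
e = rng.normal(size=100001)
y = e[1:] + theta * e[:-1]             # y_t = e_t + theta * e_{t-1}

r1 = np.corrcoef(y[:-1], y[1:])[0, 1]  # sample lag-1 autocorrelation
# population value: theta / (1 + theta**2) = 0.4
```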

13.
Suppose θ̂ is an estimator of θ in ℝ that satisfies the central limit theorem. In general, inferences on θ are based on the central limit approximation. These have error O(n^-1/2), where n is the sample size. Many unsuccessful attempts have been made at finding transformations which reduce this error to O(n^-1). The variance stabilizing transformation fails to achieve this. We give alternative transformations that have bias O(n^-2) and skewness O(n^-3). Examples include the binomial, Poisson, chi-square and hypergeometric distributions.
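For context, the classical variance-stabilizing transformation that the abstract says fails to reduce the error order: for Poisson data, √X makes the variance roughly constant (about 1/4) across means. An illustrative simulation (the Poisson choice and all constants are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
lams = [10.0, 40.0, 160.0]                      # assumed Poisson means
raw_var = [rng.poisson(lam, 200000).var() for lam in lams]
vst_var = [np.sqrt(rng.poisson(lam, 200000)).var() for lam in lams]
# raw variances grow like lam; after the sqrt transform they hover near 1/4
```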

14.
This study takes up inference in linear models with generalized error and generalized t distributions. For the generalized error distribution, two computational algorithms are proposed. The first is based on indirect Bayesian inference using an approximating finite scale mixture of normal distributions. The second is based on Gibbs sampling. The Gibbs sampler involves only drawing random numbers from standard distributions. This is important because previously the impression has been that an exact analysis of the generalized error regression model using Gibbs sampling is not possible. Next, we describe computational Bayesian inference for linear models with generalized t disturbances based on Gibbs sampling, and exploiting the fact that the model is a mixture of generalized error distributions with inverse generalized gamma distributions for the scale parameter. The linear model with this specification has also been thought not to be amenable to exact Bayesian analysis. All computational methods are applied to actual data involving the exchange rates of the British pound, the French franc, and the German mark relative to the U.S. dollar.
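The transformation structure of the generalized error (exponential-power) family can be illustrated directly (a standard fact, not the authors' Gibbs sampler): if G ~ Gamma(1/p, 1), then a random sign times G^{1/p} has density proportional to exp(-|x|^p):

```python
import numpy as np

rng = np.random.default_rng(7)

def ged_sample(p, size):
    """Draws with density proportional to exp(-|x|^p): |X|^p ~ Gamma(1/p, 1)."""
    g = rng.gamma(1.0 / p, 1.0, size)
    sign = rng.choice([-1.0, 1.0], size)
    return sign * g ** (1.0 / p)

x = ged_sample(2.0, 200000)   # p = 2: density prop. to exp(-x^2), i.e. N(0, 1/2)
```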

15.
A Bayesian analysis is provided for the Wilcoxon signed-rank statistic (T+). The Bayesian analysis is based on a sign-bias parameter φ on the (0, 1) interval. For the case of a uniform prior probability distribution for φ and for small sample sizes (i.e., 6 ≤ n ≤ 25), values for the statistic T+ are computed that enable probabilistic statements about φ. For larger sample sizes, approximations are provided for the asymptotic likelihood function P(T+|φ) as well as for the posterior distribution P(φ|T+). Power analyses are examined both for properly specified Gaussian sampling and for misspecified non-Gaussian models. The new Bayesian metric has high power efficiency in the range of 0.9–1 relative to a standard t test when there is Gaussian sampling. But if the sampling is from an unknown and misspecified distribution, then the new statistic still has high power; in some cases, the power can be higher than the t test (especially for probability mixtures and heavy-tailed distributions). The new Bayesian analysis is thus a useful and robust method for applications where the usual parametric assumptions are questionable. These properties further enable a way to do a generic Bayesian analysis for many non-Gaussian distributions that currently lack a formal Bayesian model.
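A minimal implementation of the statistic T+ itself (zeros dropped; average-rank handling of ties is omitted, so the no-ties assumption is mine):

```python
import numpy as np

def signed_rank_Tplus(d):
    """Wilcoxon signed-rank T+: sum of the ranks of |d_i| over the
    positive differences (assumes no ties among the |d_i|)."""
    d = np.asarray(d, float)
    d = d[d != 0]                      # zero differences are discarded
    a = np.abs(d)
    ranks = np.empty(len(a))
    ranks[a.argsort()] = np.arange(1, len(a) + 1)
    return ranks[d > 0].sum()
```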

16.
The concept of location depth was introduced as a way to extend the univariate notion of ranking to a bivariate configuration of data points. It has been used successfully for robust estimation, hypothesis testing, and graphical display. The depth contours form a collection of nested polygons, and the center of the deepest contour is called the Tukey median. The only available implemented algorithms for the depth contours and the Tukey median are slow, which limits their usefulness. In this paper we describe an optimal algorithm which computes all bivariate depth contours in O(n^2) time and space, using topological sweep of the dual arrangement of lines. Once these contours are known, the location depth of any point can be computed in O(log^2 n) time with no additional preprocessing, or in O(log n) time after O(n^2) preprocessing. We provide fast implementations of these algorithms to allow their use in everyday statistical practice.
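A brute-force O(n²) location-depth routine (not the paper's topological-sweep algorithm) is handy as a reference: the depth of p is the minimum number of points in a closed halfplane whose boundary passes through p, and it suffices to check one direction per angular interval determined by the data:

```python
import numpy as np

def tukey_depth(p, pts):
    """Naive O(n^2) location (Tukey) depth of p in a 2-D cloud: the minimum
    number of points in a closed halfplane whose boundary passes through p."""
    d = np.asarray(pts, float) - np.asarray(p, float)
    on_p = (d == 0).all(axis=1)
    coincident = int(on_p.sum())       # points equal to p lie in every halfplane
    d = d[~on_p]
    if len(d) == 0:
        return coincident
    ang = np.arctan2(d[:, 1], d[:, 0])
    # counts change only at normals perpendicular to the point directions,
    # so evaluating at interval midpoints covers every distinct halfplane
    crit = np.sort(np.mod(np.concatenate([ang + np.pi/2, ang - np.pi/2]), 2*np.pi))
    mids = (crit + np.roll(crit, -1)) / 2
    mids[-1] += np.pi                  # midpoint of the wrap-around interval
    best = min(int((d @ np.array([np.cos(m), np.sin(m)]) >= 0).sum()) for m in mids)
    return best + coincident
```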

17.
The well-known Wilson and Agresti–Coull confidence intervals for a binomial proportion p are centered around a Bayesian estimator. Using this as a starting point, similarities between frequentist confidence intervals for proportions and Bayesian credible intervals based on low-informative priors are studied using asymptotic expansions. A Bayesian motivation for a large class of frequentist confidence intervals is provided. It is shown that the likelihood ratio interval for p approximates a Bayesian credible interval based on Kerman's neutral noninformative conjugate prior up to O(n^-1) in the confidence bounds. For significance levels α ≤ 0.317, the Bayesian interval based on the Jeffreys prior is then shown to be a compromise between the likelihood ratio and Wilson intervals. Supplementary materials for this article are available online.
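Both intervals have simple closed forms and share the center (x + z²/2)/(n + z²), the "Bayesian estimator" mentioned above; a sketch (z = 1.96 is an illustrative default):

```python
import math

def wilson_interval(x, n, z=1.96):
    """Wilson score interval for a binomial proportion x/n."""
    phat = x / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return center - half, center + half

def agresti_coull_interval(x, n, z=1.96):
    """Agresti-Coull: add z^2/2 pseudo-successes and pseudo-failures, then
    apply the Wald formula around the shrunken estimate."""
    nt = n + z * z
    pt = (x + z * z / 2) / nt
    half = z * math.sqrt(pt * (1 - pt) / nt)
    return pt - half, pt + half
```

For x = 5, n = 10 both intervals are centered exactly at (5 + 1.92)/(10 + 3.84) = 0.5 and nearly coincide.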

18.
Various authors, given k location parameters, have considered lower confidence bounds on (standardized) differences between the largest and each of the other k − 1 parameters. They have then used these bounds to put lower confidence bounds on the probability of correct selection (PCS) in the same experiment (as was used for finding the lower bounds on differences). It is pointed out that this is an inappropriate inference procedure. Moreover, if the PCS refers to some later experiment, it is shown that if a non-trivial confidence bound is possible, then it is already possible to conclude, with greater confidence, that correct selection has occurred in the first experiment. The short answer to the question in the title is therefore 'No', but this should be qualified in the case of a Bayesian analysis.

19.
In this paper we discuss bias-corrected estimators for the regression and the dispersion parameters in an extended class of dispersion models (Jørgensen, 1997b). This class extends the regular dispersion models by letting the dispersion parameter vary throughout the observations, and contains the dispersion models as a particular case. General formulae for the O(n^-1) bias are obtained explicitly in dispersion models with dispersion covariates, which generalize previous results obtained by Botter and Cordeiro (1998), Cordeiro and McCullagh (1991), Cordeiro and Vasconcellos (1999), and Paula (1992). The practical use of the formulae is that we can derive closed-form expressions for the O(n^-1) biases of the maximum likelihood estimators of the regression and dispersion parameters when the information matrix has a closed form. Various expressions for the O(n^-1) biases are given for special models. The formulae have advantages for numerical purposes because they require only a supplementary weighted linear regression. We also compare these bias-corrected estimators with two estimators, likewise bias-free to order O(n^-1), that are based on bootstrap methods. These estimators are compared by simulation.
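The bootstrap competitors mentioned at the end can be sketched generically (a textbook bootstrap bias correction applied to the n-divisor variance estimator; the example is mine, not a dispersion model):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=30)

def mle_var(s):
    return np.mean((s - s.mean()) ** 2)   # n-divisor (biased) variance MLE

theta = mle_var(x)
boot = np.array([mle_var(rng.choice(x, size=len(x), replace=True))
                 for _ in range(2000)])
bias_hat = boot.mean() - theta            # bootstrap estimate of the O(n^-1) bias
theta_bc = theta - bias_hat               # bias-corrected estimator
```

Here the correction inflates the estimate by roughly a factor n/(n − 1), matching the known downward bias of the n-divisor estimator.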

20.
Let π_i, i = 1, 2, ..., k be k independent exponential populations with different unknown location parameters θ_i, i = 1, 2, ..., k and common known scale parameter σ. Let Y_i denote the smallest observation based on a random sample of size n from the i-th population. Suppose a subset of the given k populations is selected using the subset selection procedure according to which the population π_i is selected iff Y_i ≥ Y_(1) − d, where Y_(1) is the largest of the Y_i's and d is some suitable constant. The estimation of the location parameters associated with the selected populations is considered for the squared error loss. It is observed that the natural estimator dominates the unbiased estimator. It is also shown that the natural estimator itself is inadmissible and a class of improved estimators that dominate the natural estimator is obtained. The improved estimators are consistent and their risks are shown to be O(kn^-2). As a special case, we obtain the corresponding results for the estimation of θ_(1), the parameter associated with Y_(1). Received: January 6, 1998; revised version: July 11, 2000
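A Gupta-type subset selection rule of the form described (retain population i iff its statistic Y_i is within d of the largest) can be stated in a few lines; the example values are hypothetical:

```python
import numpy as np

def selected_subset(y, d):
    """Gupta-type subset selection: select population i iff
    Y_i >= max_j Y_j - d, for a suitable constant d >= 0."""
    y = np.asarray(y, float)
    return np.nonzero(y >= y.max() - d)[0]
```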
