Similar Documents
20 similar documents retrieved.
1.
谢长  杨仲山 《统计研究》2021,38(1):132-146
In international economic comparisons, additivity and characteristicity are important properties that a purchasing power parity (PPP) aggregation method should possess. This paper extends the recently developed MBC method, a multilateral comparison method constructed from bilateral indexes that combines additivity with characteristicity, into a more complete family of methods. On the one hand, because the original extremum-based MBC method mainly preserves characteristicity in bilateral comparisons involving countries with "outlier" price structures and therefore cannot achieve overall optimality, we construct MBC variants from the least-squares and logarithmic least-squares perspectives. On the other hand, to account for differences in the reliability of bilateral comparisons, we construct weighted MBC methods under each perspective and introduce several candidate weight functions for practitioners. An empirical study using ICP 2011 data shows that the multi-perspective MBC methods integrate the characteristicity advantage of the official GEKS method with the additivity advantage of the GK method; among existing additive methods, MBC has the smallest substitution bias, making it well suited both to real-volume comparisons and to expenditure-structure analysis. The empirical study also finds that the weighted MBC methods under the least-squares and logarithmic least-squares perspectives are particularly robust.
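For orientation, the sketch below shows how the official GEKS aggregation mentioned in the abstract turns bilateral Fisher indexes into transitive multilateral parities; it is a generic illustration with invented price/quantity data, not the paper's MBC method.

```python
import numpy as np

# Toy data: prices and quantities for 4 items in 3 countries (invented).
p = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.2, 1.8, 3.5, 4.4],
              [0.9, 2.5, 2.8, 5.0]])
q = np.array([[10, 5, 2, 1],
              [8, 6, 3, 1],
              [12, 4, 2, 2]])
M = p.shape[0]

def fisher(j, k):
    """Bilateral Fisher price index of country k relative to country j."""
    laspeyres = (p[k] @ q[j]) / (p[j] @ q[j])
    paasche = (p[k] @ q[k]) / (p[j] @ q[k])
    return np.sqrt(laspeyres * paasche)

def geks(j, k):
    """GEKS: geometric mean over all bridge countries l of chained Fisher indexes."""
    return np.prod([fisher(j, l) * fisher(l, k) for l in range(M)]) ** (1.0 / M)

# GEKS indexes are transitive: geks(0,2) == geks(0,1) * geks(1,2) up to rounding.
print(geks(0, 2), geks(0, 1) * geks(1, 2))
```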

2.
Statistics, as an applied science, has great impact on a vast range of other sciences. Predicting protein structures, with emphasis on their geometrical features as captured by dihedral angles, has invoked a branch of statistics known as directional statistics. One of the available biological techniques for prediction is molecular dynamics simulation, which produces high-dimensional molecular structure data. Hence, it is expected that principal component analysis (PCA) can address some of the related statistical problems, particularly reducing the dimension of the involved variables. Since dihedral angles are variables on a non-Euclidean space (their locus is the torus), direct application of PCA is not expected to be very informative in this case. Principal geodesic analysis is one of the recent methods for dimension reduction in the non-Euclidean setting. This paper highlights a procedure for using this technique to reduce the dimension of a set of dihedral angles. We further propose an extension of this tool, implemented by approximating the torus by the product of two unit circles, and evaluate its application to a real data set. A comparison of this technique with some previous methods is also undertaken.
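A minimal sketch of the circle-embedding idea: each dihedral angle is mapped to (cos θ, sin θ), so d angles live on a product of circles embedded in R^{2d}, and ordinary PCA is run on the embedded coordinates. This is only the naive approximation that motivates principal geodesic analysis; the angles here are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated dihedral angles (radians): n conformations x d angles.
n, d = 200, 4
angles = rng.vonmises(mu=0.0, kappa=4.0, size=(n, d))

# Embed each angle on the unit circle: torus -> R^{2d}.
X = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)

# Ordinary PCA on the embedded coordinates.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
scores = Xc @ Vt.T            # principal component scores of each conformation
print("variance explained by first two PCs:", explained[:2].sum())
```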

3.
Post-marketing data offer rich information and cost-effective resources for physicians and policy-makers to address critical scientific questions in clinical practice. However, the complex confounding structures (e.g., nonlinear and nonadditive interactions) embedded in these observational data often pose major analytical challenges for drawing valid conclusions. Furthermore, often made available as electronic health records (EHRs), these data are usually massive, with hundreds of thousands of observational records, which introduces additional computational challenges. In this paper, for comparative effectiveness analysis, we propose a statistically robust yet computationally efficient propensity score (PS) approach to adjust for complex confounding structures. Specifically, we propose a kernel-based machine learning method for flexible and robust PS modeling, yielding valid PS estimates from observational data with complex confounding structures. The estimated propensity score is then used in a second-stage analysis to obtain a consistent estimate of the average treatment effect. An empirical variance estimator based on the bootstrap is adopted. A split-and-merge algorithm is further developed to reduce the computational workload of the proposed method for big data, and to obtain a valid variance estimator of the average treatment effect estimate as a by-product. As shown by extensive numerical studies and an application to comparative effectiveness analysis of postoperative pain EHR data, the proposed approach consistently outperforms competing methods, demonstrating its practical utility.
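A hedged sketch (not the authors' estimator) of the two-stage idea: a kernel-based propensity score fitted with a scalable kernel approximation (Nystroem features plus logistic regression), followed by a stabilized inverse-probability-weighted estimate of the average treatment effect on simulated data with nonlinear, nonadditive confounding.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))
# Nonlinear, nonadditive confounding in the true treatment assignment.
logit = 0.5 * X[:, 0] * X[:, 1] + np.sin(X[:, 2])
A = rng.binomial(1, 1 / (1 + np.exp(-logit)))
Y = 1.0 * A + X[:, 0] ** 2 + rng.normal(size=n)   # true ATE = 1

# Kernel propensity score: Nystroem feature map + logistic regression,
# a scalable stand-in for full kernel machines on EHR-sized data.
ps_model = make_pipeline(Nystroem(gamma=0.5, n_components=100, random_state=0),
                         LogisticRegression(max_iter=1000))
ps = ps_model.fit(X, A).predict_proba(X)[:, 1].clip(0.01, 0.99)

# Hajek (stabilized) IPW estimate of the average treatment effect.
w1, w0 = A / ps, (1 - A) / (1 - ps)
ate = (w1 @ Y) / w1.sum() - (w0 @ Y) / w0.sum()
print("IPW ATE estimate:", ate)
```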

4.
Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because an MNAR mechanism is not verifiable from the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model, and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data, and the discrepancy between the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR), and use a bootstrap-type method, which relies on heuristic input from the researcher, to test for the discrepancy of the parameter estimates. Instead of using the bootstrap, the current article obtains a confidence interval for the parameter differences between the two samples based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids the convergence problems likely with bootstrap methods; it requires no heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test missing at random (MAR) in addition to MCAR. An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies are also performed, using two methods of missing data generation, which show promise for the proposed sensitivity method. One of the methods of missing data generation is new and interesting in its own right.
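A toy illustration of why an asymptotic interval is available for nested samples: for a subsample mean versus the full-sample mean, Cov(x̄_sub, x̄_full) = σ²/n, so Var(x̄_sub − x̄_full) = σ²/m − σ²/n and no bootstrap is required. The paper's procedure covers general measurement-model parameters; this univariate version only conveys the mechanics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=1.0, size=400)     # complete data
sub = x[:100]                                    # a chosen sub-sample

n, m = x.size, sub.size
diff = sub.mean() - x.mean()
s2 = x.var(ddof=1)
# For nested samples, Cov(mean_sub, mean_full) = s2/n, hence:
se = np.sqrt(s2 / m - s2 / n)
z = diff / se
print("z =", z, "p =", 2 * stats.norm.sf(abs(z)))
# A large |z| flags sensitivity: the sub-sample estimate disagrees with the
# full-sample estimate beyond what sampling error alone would explain.
```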

5.
By providing new insights into the distribution of a protein's torsion angles, recent statistical models for these data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is structure prediction for the highly variable loop and turn regions. Such modeling is difficult because the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data are "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.

6.
The odd Weibull distribution is a three-parameter generalization of the Weibull and the inverse Weibull distributions, with density and hazard shapes rich enough for modeling lifetime data. This paper explores the odd Weibull parameter regions having finite moments and examines the relation to some well-known distributions based on skewness and kurtosis functions. The existence of maximum likelihood estimators is shown for complete data of any sample size; uniqueness of these estimators is proved only when the absolute value of the second shape parameter is between zero and one. Furthermore, the elements of the Fisher information matrix are obtained for complete data using a single integral representation, which is shown to exist for all parameter values. The performance of the odd Weibull distribution over various density and hazard shapes is compared with the generalized gamma distribution using two different test statistics. Finally, two data sets are analyzed for illustrative purposes.
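A sketch of numerical maximum likelihood for the odd Weibull, assuming the Cooray (2006) parameterization with CDF F(x) = 1 − [1 + (e^{(x/α)^β} − 1)^γ]^{−1} and all three parameters positive (the paper also treats a negative second shape parameter, which this sketch excludes); the data are simulated by inverting the CDF.

```python
import numpy as np
from scipy.optimize import minimize

def ow_logpdf(x, alpha, beta, gamma):
    """Log-density of the odd Weibull (Cooray 2006 parameterization assumed)."""
    u = (x / alpha) ** beta
    lw = np.log(np.expm1(u))                       # log(e^u - 1)
    return (np.log(gamma * beta / alpha) + (beta - 1) * np.log(x / alpha)
            + u + (gamma - 1) * lw - 2 * np.logaddexp(0.0, gamma * lw))

def ow_rvs(alpha, beta, gamma, size, rng):
    """Inverse-CDF sampling: F(x) = 1 - [1 + (e^{(x/a)^b} - 1)^g]^{-1}."""
    p = rng.uniform(size=size)
    u = np.log1p((p / (1 - p)) ** (1.0 / gamma))
    return alpha * u ** (1.0 / beta)

rng = np.random.default_rng(3)
x = ow_rvs(2.0, 1.5, 0.8, size=500, rng=rng)

# Log-parameterize to keep all three parameters positive during the search.
nll = lambda t: -np.sum(ow_logpdf(x, *np.exp(t)))
fit = minimize(nll, x0=np.log([x.mean(), 1.0, 1.0]), method="Nelder-Mead")
print("MLE (alpha, beta, gamma):", np.exp(fit.x))
```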

7.
We propose two retrospective test statistics for testing the vector of odds ratio parameters under the logistic regression model based on case–control data by exploiting the density ratio structure under a two-sample semiparametric model, which is equivalent to the assumed logistic regression model. The proposed test statistics are based on Kullback–Leibler entropy distance and are particularly relevant to the case–control sampling plan. These two test statistics have identical asymptotic chi-squared distributions under the null hypothesis and identical asymptotic noncentral chi-squared distributions under local alternatives to the null hypothesis. Moreover, the proposed test statistics require computation of the maximum semiparametric likelihood estimators of the underlying parameters, but are otherwise easily computed. We present some results on simulation and on the analysis of two real data sets.

8.
9.
This article focuses on the distribution of price sensitivity across consumers. We employ a random-coefficient logit model in which brand-specific intercepts and price-slope coefficients are allowed to vary across households. The model is estimated with panel data for two product categories. The implications of the estimated model are deduced through an optimal retail pricing analysis that combines the panel data with chain-level cost figures. We test parametric distributional assumptions using semiparametric density estimates based on series expansions.
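A stripped-down sketch of simulated maximum likelihood for a random-coefficient logit: binary choice with a normally distributed price slope (the paper's model also includes household-specific brand intercepts and uses scanner panel data; everything here is simulated).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(12)
H, T, R = 300, 10, 200                       # households, purchases, sim draws
price = rng.uniform(1, 3, size=(H, T))
b_true = rng.normal(-2.0, 0.8, size=H)       # heterogeneous price sensitivity
y = rng.binomial(1, expit(1.0 + b_true[:, None] * price))

draws = rng.normal(size=(R, 1))              # common standard-normal draws

def neg_simulated_loglik(theta):
    a, mu, log_sd = theta
    b = mu + np.exp(log_sd) * draws          # R x 1 draws of the price slope
    # Choice probability for every draw, household, purchase: R x H x T.
    p = expit(a + b[:, :, None] * price[None, :, :])
    like = np.where(y[None, :, :] == 1, p, 1 - p).prod(axis=2)   # R x H
    # Simulated likelihood: average the panel likelihood over the draws.
    return -np.sum(np.log(like.mean(axis=0) + 1e-300))

fit = minimize(neg_simulated_loglik, x0=np.array([0.0, -1.0, 0.0]),
               method="Nelder-Mead")
a_hat, mu_hat, log_sd_hat = fit.x
print("intercept, mean slope, sd of slope:", a_hat, mu_hat, np.exp(log_sd_hat))
```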

10.
In this study, a new per-field method is proposed for supervised classification of remotely sensed multispectral image data of an agricultural area using Gaussian mixture discriminant analysis (MDA). In the proposed per-field classification method, the multivariate Gaussian mixture models constructed for the control and test fields may have a fixed or a varying number of components, and each component may have its own or a common covariance matrix structure. The discriminant function is built from the average Bhattacharyya distance, and the decision rule assigns a field to the class at minimum average Bhattacharyya distance. The proposed per-field classification method is analyzed for different covariance matrix structures with fixed and varying numbers of components. We also classify the remotely sensed multispectral image data using the per-pixel classification method based on Gaussian MDA.
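A sketch of the distance computation at the heart of the decision rule: the closed-form Bhattacharyya distance between Gaussian components, averaged over all component pairs of two mixtures, with the field assigned to the class at minimum average distance. Mixture fitting (e.g., by EM) is omitted and the component parameters below are invented.

```python
import numpy as np
from itertools import product

def bhattacharyya(m1, S1, m2, S2):
    """Closed-form Bhattacharyya distance between two Gaussian densities."""
    S = 0.5 * (S1 + S2)
    d = m1 - m2
    term1 = 0.125 * d @ np.linalg.solve(S, d)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

def avg_distance(mix_a, mix_b):
    """Average Bhattacharyya distance over all component pairs of two mixtures."""
    return np.mean([bhattacharyya(ma, Sa, mb, Sb)
                    for (ma, Sa), (mb, Sb) in product(mix_a, mix_b)])

# Invented 2-band mixtures: one control mixture per crop class, one test field.
wheat = [(np.array([0.3, 0.5]), np.eye(2) * 0.02),
         (np.array([0.4, 0.6]), np.eye(2) * 0.03)]
corn  = [(np.array([0.7, 0.2]), np.eye(2) * 0.02)]
field = [(np.array([0.35, 0.55]), np.eye(2) * 0.025)]

classes = {"wheat": wheat, "corn": corn}
label = min(classes, key=lambda c: avg_distance(field, classes[c]))
print("assigned class:", label)   # decision rule: minimum average distance
```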

11.
Stationarity testing is a key issue in time series regression analysis. Existing tests struggle with massive time series data, and their accuracy leaves room for improvement. This study uses classification techniques to build a new stationarity test that can handle massive time series data effectively. First, the autocorrelation function of the series is computed and a sufficient (but not necessary) decision criterion is constructed. Next, a quantitative method for analyzing the convergence of the autocorrelation sequence is established, the optimal values of the convergence parameters are studied, and stationarity feature vectors are extracted. Finally, k-means clustering is used to build a classifier for stationarity. The new method is evaluated on a set of simulated data and on stock data, in comparison with the ADF, PP, and KPSS tests; the empirical results show that the new method achieves higher accuracy.
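A minimal sketch of the pipeline, using the raw autocorrelation function as the stationarity feature (a crude stand-in for the paper's convergence-parameter features): stationary AR(1) series have quickly decaying ACFs, random walks do not, and k-means separates the two groups.

```python
import numpy as np
from scipy.signal import lfilter
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
T, n_each = 300, 50

def acf(x, nlags=20):
    """Sample autocorrelation function at lags 1..nlags."""
    x = x - x.mean()
    c0 = x @ x
    return np.array([(x[:-k] @ x[k:]) / c0 for k in range(1, nlags + 1)])

# 50 stationary AR(1) series and 50 random walks (nonstationary).
ar1 = [lfilter([1.0], [1.0, -0.6], rng.normal(size=T)) for _ in range(n_each)]
rw  = [np.cumsum(rng.normal(size=T)) for _ in range(n_each)]

# Feature vector: the ACF decays quickly for the stationary series and stays
# near 1 for the random walks, so the two groups are separable.
F = np.array([acf(x) for x in ar1 + rw])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(F)
print("cluster sizes:", np.bincount(labels))   # should split roughly 50/50
```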

12.
The composite likelihood is among the computational methods used for estimating the generalized linear mixed model (GLMM) in the context of bivariate meta-analysis of diagnostic test accuracy studies. Its advantage is that the likelihood can be derived conveniently under the assumption of independence between the random effects, but there has been no clear analysis of the merit or necessity of this method. For the synthesis of diagnostic test accuracy studies, a copula mixed model has been proposed in the biostatistics literature. This general model includes the GLMM as a special case and also allows flexible dependence modelling, rather than assuming simple linear correlation structures, normality, and independence in the joint tails. A maximum likelihood (ML) method, based on evaluating the bi-dimensional integrals of the likelihood with quadrature methods, has been proposed; in fact it eases any computational difficulty that might be caused by the double integral in the likelihood function. Both methods are thoroughly examined with extensive simulations and illustrated with data from a published meta-analysis. It is shown that the ML method has no non-convergence issues or computational difficulties and at the same time allows estimation of the dependence between study-specific sensitivity and specificity, and thus prediction via summary receiver operating characteristic curves.
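A sketch of the quadrature idea behind the ML method: a bivariate normal random-effects integral evaluated with a product Gauss–Hermite rule. The integrand is a generic two-level binomial likelihood for one study's (sensitivity, specificity) counts, not the copula model; all parameter values are invented.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit
from scipy.stats import binom

# Invented counts from one study.
tp, n_dis = 45, 50        # diseased subjects: tp successes (sensitivity)
tn, n_hea = 80, 100       # healthy subjects: tn successes (specificity)

mu = np.array([1.5, 1.0])                      # logit-scale means (assumed)
Sigma = np.array([[0.5, -0.2], [-0.2, 0.4]])   # random-effects covariance
L = np.linalg.cholesky(Sigma)

x, w = hermgauss(30)                           # 30-point Gauss-Hermite rule

def integrand(u):
    se, sp = expit(mu[0] + u[0]), expit(mu[1] + u[1])
    return binom.pmf(tp, n_dis, se) * binom.pmf(tn, n_hea, sp)

# Product rule with the change of variables u = sqrt(2) L z:
# int f(u) N(u; 0, Sigma) du = (1/pi) * sum_ij w_i w_j f(sqrt(2) L [x_i, x_j]).
lik = sum(w[i] * w[j] * integrand(np.sqrt(2.0) * L @ np.array([x[i], x[j]]))
          for i in range(len(x)) for j in range(len(x))) / np.pi
print("marginal likelihood of this study:", lik)
```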

13.
Among the diverse frameworks that have been proposed for regression analysis of angular data, the projected multivariate linear model provides a particularly appealing and tractable methodology. In this model, the observed directional responses are assumed to correspond to the angles formed by latent bivariate normal random vectors that depend on covariates through a linear model. This implies an angular normal distribution for the observed angles and incorporates a regression structure through a familiar and convenient relationship. In this paper we extend this methodology to accommodate clustered data (e.g., longitudinal or repeated measures data) by formulating a marginal version of the model and basing estimation on an EM-like algorithm in which correlation among within-cluster responses is taken into account by incorporating a working correlation matrix into the M step. A sandwich estimator is used for the covariance matrix of the parameter estimates. The methodology is motivated and illustrated using an example involving clustered measurements of microfibril angle on loblolly pine (Pinus taeda L.). Simulation studies are presented that evaluate the finite sample properties of the proposed fitting method. In addition, the relationship between the within-cluster correlation of the latent Euclidean vectors and the corresponding correlation structure of the observed angles is explored.
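A sketch of the latent construction: the observed angle is the atan2 of a latent bivariate normal vector whose mean depends linearly on a covariate. The snippet only simulates from the model with invented coefficients; the EM-like marginal fitting is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 1, size=n)                     # a scalar covariate

# Latent bivariate normal mean depends linearly on the covariate (invented
# coefficients); identity latent covariance, as is common for identifiability.
B = np.array([[1.0, 0.8],                         # intercepts
              [0.5, -1.2]])                       # slopes
mean = np.column_stack([np.ones(n), x]) @ B       # n x 2 latent means
W = mean + rng.normal(size=(n, 2))                # latent bivariate normals

theta = np.arctan2(W[:, 1], W[:, 0])              # observed angles in (-pi, pi]
print("circular mean of angles:",
      np.arctan2(np.sin(theta).mean(), np.cos(theta).mean()))
```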

14.
The usual chi-squared approximation to test statistics based on normal theory for testing covariance structures of multivariate populations is very sensitive to the normality assumption. Two general bootstrap procedures are developed in this paper to obtain approximately valid critical values for these test statistics when the data are not normally distributed. The first is based on separate sampling from the individual samples, and the second is based on sampling from pooled samples. Although the second method requires more assumptions, its small-sample properties are better.
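A one-sample sketch of the transform-then-resample idea for H0: Σ = I (the paper's procedures handle several samples): rescale the centered data so its sample covariance satisfies the null exactly, then bootstrap the test statistic from the transformed rows to obtain a critical value that does not rely on normality.

```python
import numpy as np

rng = np.random.default_rng(6)

def stat(X):
    """LR-type statistic for H0: Sigma = I (larger = more evidence against)."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    sign, logdet = np.linalg.slogdet(S)
    return n * (np.trace(S) - logdet - p)

# Non-normal data (heavy-tailed t_5, scaled so the true covariance is I).
n, p = 60, 4
X = rng.standard_t(df=5, size=(n, p)) / np.sqrt(5 / 3)   # Var(t_5) = 5/3

T_obs = stat(X)

# Transform the centered data so its sample covariance is exactly I (the null
# holds in the bootstrap world), then resample rows.
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
Z = Xc @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)

T_boot = np.array([stat(Z[rng.integers(0, n, size=n)]) for _ in range(2000)])
print("T =", T_obs, " bootstrap 95% critical value =",
      np.quantile(T_boot, 0.95))
```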

15.
We present a non-parametric affine-invariant test for the multivariate Behrens–Fisher problem. The proposed method, based on spatial medians, is asymptotic and does not require normality of the data. To improve its finite-sample performance, we apply a correction of the type already used in a similar test based on trimmed means; however, our simulations show that in the case of heavy-tailed distributions our method performs better. In a simulation comparison with a recently published rank-based test, our test also yields satisfactory results.
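A sketch of the spatial median on which the test is built, computed with the classical Weiszfeld iteration; the two-sample test statistic and its finite-sample correction are not reproduced here.

```python
import numpy as np

def spatial_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld iteration for the multivariate spatial (geometric) median."""
    m = X.mean(axis=0)                        # start at the coordinate-wise mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.where(d < 1e-12, 1e-12, d)     # guard against division by zero
        m_new = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(m_new - m) < tol:
            break
        m = m_new
    return m

rng = np.random.default_rng(7)
# Heavy-tailed sample: the spatial median is far more stable than the mean.
X = rng.standard_t(df=2, size=(200, 3))
print("spatial median:", spatial_median(X))
print("sample mean   :", X.mean(axis=0))
```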

16.
Efficient estimation of the regression coefficients in longitudinal data analysis requires a correct specification of the covariance structure; misspecification may lead to inefficient or biased estimators of the mean parameters. One of the most commonly used methods for handling the covariance matrix is simultaneous modeling based on the Cholesky decomposition. In this paper, we reparameterize covariance structures in longitudinal data analysis through a modified Cholesky decomposition of the within-subject covariance matrix itself. Under this decomposition, the within-subject covariance matrix factors into a unit lower triangular matrix involving moving average coefficients and a diagonal matrix involving innovation variances, which are modeled as linear functions of covariates. We then propose a fully Bayesian inference for the joint mean and covariance models based on this decomposition. A computationally efficient Markov chain Monte Carlo method combining the Gibbs sampler and the Metropolis–Hastings algorithm is implemented to obtain, simultaneously, the Bayesian estimates of the unknown parameters and their standard deviation estimates. Finally, several simulation studies and a real example illustrate the proposed methodology.
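A numerical sketch of the decomposition itself: any positive-definite within-subject covariance Σ factors as Σ = L D Lᵀ with L unit lower triangular (moving average coefficients) and D diagonal (innovation variances), obtained by rescaling the ordinary Cholesky factor. The Bayesian model then regresses the elements of L and log D on covariates.

```python
import numpy as np

def modified_cholesky(Sigma):
    """Factor Sigma = L D L^T with L unit lower triangular, D diagonal."""
    C = np.linalg.cholesky(Sigma)         # ordinary Cholesky: Sigma = C C^T
    d = np.diag(C)
    L = C / d                             # rescale columns -> unit diagonal
    D = d ** 2                            # innovation variances
    return L, D

# An AR(1)-like within-subject covariance for 4 repeated measures.
rho, t = 0.6, np.arange(4)
Sigma = rho ** np.abs(t[:, None] - t[None, :])

L, D = modified_cholesky(Sigma)
print("L =\n", np.round(L, 3))
print("innovation variances:", np.round(D, 3))
print("reconstruction ok:", np.allclose(L @ np.diag(D) @ L.T, Sigma))
```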

17.
The inverse Gaussian (IG) distribution is commonly used to model right-skewed data with positive support, so efficient goodness-of-fit tests are critical when applying the IG model. In this article, we propose a new test statistic for examining IG goodness of fit based on approximating parametric likelihood ratios. The parametric likelihood ratio methodology is well known to provide powerful likelihood ratio tests. In the nonparametric context, the classical empirical likelihood (EL) ratio method is often applied to efficiently approximate properties of parametric likelihoods, using an approach based on substituting empirical distribution functions for their population counterparts. The optimal parametric likelihood ratio approach is, however, based on density functions. We develop and analyze an EL ratio approach based on densities in order to test the IG model fit, and show that the proposed test improves on the entropy-based goodness-of-fit test for the IG distribution presented by Mudholkar and Tian (2002). Theoretical support is obtained by proving consistency of the new test and an asymptotic proposition regarding the null distribution of the proposed test statistic. Monte Carlo simulations confirm the powerful properties of the proposed method, and real data examples demonstrate the applicability of the density-based EL ratio goodness-of-fit test for an IG assumption in practice.
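Not the paper's density-based EL ratio test, but a generic baseline for the same task: the closed-form IG maximum likelihood fit followed by a Kolmogorov–Smirnov statistic calibrated by parametric bootstrap (re-estimating the parameters on each resample).

```python
import numpy as np
from scipy import stats

def ig_mle(x):
    """Closed-form MLEs of the IG(mu, lam) parameters."""
    mu = x.mean()
    lam = len(x) / np.sum(1.0 / x - 1.0 / mu)
    return mu, lam

def ks_ig(x, mu, lam):
    # scipy's invgauss(m, scale=s) equals IG(mu = m*s, lam = s).
    return stats.kstest(x, stats.invgauss(mu / lam, scale=lam).cdf).statistic

rng = np.random.default_rng(8)
x = stats.invgauss.rvs(0.5, scale=4.0, size=200, random_state=rng)  # IG(2, 4)

mu_h, lam_h = ig_mle(x)
D_obs = ks_ig(x, mu_h, lam_h)

# Parametric bootstrap: re-estimate on each resample so the null distribution
# of D accounts for parameter estimation.
D_boot = []
for _ in range(500):
    xb = stats.invgauss.rvs(mu_h / lam_h, scale=lam_h, size=len(x),
                            random_state=rng)
    D_boot.append(ks_ig(xb, *ig_mle(xb)))
print("KS D =", D_obs,
      " bootstrap p-value =", np.mean(np.array(D_boot) >= D_obs))
```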

18.
Statistics for spatial functional data is an emerging field that combines methods from spatial statistics and functional data analysis to model spatially correlated functional data. Checking for spatial autocorrelation is an important step in the statistical analysis of spatial data, and several statistics have been proposed for this purpose; the test based on the Mantel statistic is widely known and used in this context. This paper proposes an application of this test to spatial functional data. Although we focus on geostatistical functional data, that is, functional data observed over a spatially continuous region, the proposed test can also be applied to functional data measured on a discrete set of areas of a region (areal functional data) by defining the distance between areas appropriately. Two simulation studies show that the proposed test performs well. We illustrate the methodology by applying it to an agronomic data set.
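A minimal sketch of the Mantel test for geostatistical functional data: pairwise L2 distances between curves, pairwise geographic distances between sites, and a permutation-calibrated correlation between the two distance matrices; curves and locations are simulated.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(9)
n_sites, n_t = 30, 50
coords = rng.uniform(0, 10, size=(n_sites, 2))          # site locations
t = np.linspace(0, 1, n_t)

# Simulated curves whose amplitude drifts with the site's x-coordinate,
# inducing spatial autocorrelation among the functions.
curves = (coords[:, :1] / 10) * np.sin(2 * np.pi * t) + \
         0.3 * rng.normal(size=(n_sites, n_t))

D_geo = squareform(pdist(coords))                       # spatial distances
D_fun = squareform(pdist(curves) / np.sqrt(n_t))        # approximate L2 distances

def mantel_r(A, B):
    iu = np.triu_indices_from(A, k=1)
    return np.corrcoef(A[iu], B[iu])[0, 1]

r_obs = mantel_r(D_geo, D_fun)
# Permutation test: relabel sites, permuting rows AND columns of D_fun.
perm = np.array([mantel_r(D_geo, D_fun[np.ix_(p, p)])
                 for p in (rng.permutation(n_sites) for _ in range(999))])
print("Mantel r =", r_obs, " p =", (1 + np.sum(perm >= r_obs)) / 1000)
```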

19.
Estimation and prediction in generalized linear mixed models are often hampered by intractable high-dimensional integrals. This paper provides a framework for overcoming this intractability using asymptotic expansions when the number of random effects is large. To that end, we first derive a modified Laplace approximation when the number of random effects increases at a lower rate than the sample size. Second, we propose an approximate likelihood method based on the asymptotic expansion of the log-likelihood using the modified Laplace approximation, maximized with a quasi-Newton algorithm. Finally, we apply a similar expansion to the plug-in predictive density to define a second-order plug-in predictive density, and show that it is a normal density. Our simulations show that, in comparison to other approximations, our method performs better. The methods apply readily to non-Gaussian spatial data; as an example, an analysis of the rhizoctonia root rot data is presented.
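A sketch of the standard (first-order) Laplace approximation for a random-intercept Poisson GLMM: per group, Newton's method finds the mode of the integrand, and the integral is approximated by the Gaussian curvature there. The paper's modified, higher-order expansion is not reproduced.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(10)
beta0, sigma = 1.0, 0.7
groups, per = 40, 8
u_true = rng.normal(0, sigma, size=groups)
y = rng.poisson(np.exp(beta0 + u_true)[:, None], size=(groups, per))

def laplace_loglik(beta0, sigma, y):
    """First-order Laplace approximation to the Poisson GLMM log-likelihood."""
    total = 0.0
    for yi in y:
        s, m = yi.sum(), len(yi)
        u = 0.0
        for _ in range(50):                    # Newton iterations for the mode
            g = s - m * np.exp(beta0 + u) - u / sigma**2
            H = -m * np.exp(beta0 + u) - 1 / sigma**2
            step = g / H
            u -= step
            if abs(step) < 1e-10:
                break
        H = -m * np.exp(beta0 + u) - 1 / sigma**2      # curvature at the mode
        h = s * (beta0 + u) - m * np.exp(beta0 + u) - u**2 / (2 * sigma**2)
        # Laplace: log integral = h(u_hat) - 0.5*log(sigma^2 * (-h''(u_hat)))
        total += h - 0.5 * np.log(sigma**2 * (-H)) - gammaln(yi + 1).sum()
    return total

print("approx log-likelihood:", laplace_loglik(beta0, sigma, y))
```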

20.
This paper proposes an extension of the dual multiple factor analysis (DMFA) method developed by Lê and Pagès to the analysis of categorical tables in which the same set of variables is measured on different sets of individuals. The extension of DMFA is based on the transformation of categorical variables into properly weighted indicator variables, in a way analogous to that used in the multiple factor analysis of categorical variables. The DMFA of categorical variables enables visual comparison of the association structures between categories over the sample as a whole and in the various subsamples (sets of individuals). For each category, DMFA yields global coordinates (over all individuals) and partial coordinates (for each set of individuals) in a factor space, and this visual analysis allows the sets of individuals to be compared to identify their similarities and differences. The suitability of the technique is illustrated through two applications: one using simulated data for two groups of individuals with very different association structures, and the other using real data from a voting intention survey in which some respondents were interviewed by telephone and others face to face. The results indicate that the two data collection methods, while similar, are not entirely equivalent.
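A generic sketch of the coding step DMFA builds on (not the full dual analysis): categorical variables expanded into indicator columns and weighted as in multiple correspondence analysis before an SVD; the survey variables below are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
n = 200
df = pd.DataFrame({
    "vote": rng.choice(["A", "B", "C"], size=n),
    "mode": rng.choice(["phone", "face"], size=n),
    "age":  rng.choice(["young", "mid", "old"], size=n),
})

Z = pd.get_dummies(df).to_numpy(dtype=float)   # indicator (one-hot) coding
p = Z.mean(axis=0)                             # category proportions

# MCA-style weighting: rare categories get larger weight (chi-square metric).
Zw = (Z - p) / np.sqrt(p * len(df.columns))

U, s, Vt = np.linalg.svd(Zw / np.sqrt(n), full_matrices=False)
scores = U[:, :2] * s[:2]                      # individual coordinates
cat_coords = Vt[:2].T * s[:2]                  # category coordinates
print("first two singular values:", s[:2])
```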
