Similar Literature
20 similar documents retrieved (search time: 93 ms)
1.
Skew-symmetric models offer a very flexible class of distributions for modelling data. These distributions can also be viewed as selection models for the symmetric component of the specified skew-symmetric distribution. The estimation of the location and scale parameters corresponding to the symmetric component is considered here, with the symmetric component known. Emphasis is placed on using the empirical characteristic function to estimate these parameters. This is made possible by an invariance property of the skew-symmetric family of distributions, namely that even transformations of skew-symmetric random variables have a distribution that depends only on the symmetric density. A distance metric between the real components of the empirical and true characteristic functions is minimized to obtain the estimators. The method is semiparametric, in that the symmetric component is specified, but the skewing function is assumed unknown. Furthermore, the methodology is extended to hypothesis testing. Two tests for a null hypothesis of specific parameter values are considered, as well as a test for the hypothesis that the symmetric component has a specific parametric form. A resampling algorithm is described for practical implementation of these tests. The outcomes of various numerical experiments are presented.
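A minimal sketch of the characteristic-function estimation idea, under illustrative assumptions (a standard normal symmetric component, a quadratic distance over a fixed frequency grid, and hypothetical function names), rather than the exact estimator of the paper:

    import numpy as np
    from scipy.optimize import minimize

    def ecf_real(t, x):
        # Real part of the empirical characteristic function: mean of cos(t * x_j).
        return np.cos(np.outer(t, x)).mean(axis=1)

    def estimate_location_scale(x, t_grid=np.linspace(0.1, 3.0, 30)):
        # Real part of the CF of the assumed symmetric component (standard normal here).
        target = np.exp(-0.5 * t_grid**2)

        def objective(par):
            mu, log_sigma = par
            z = (x - mu) / np.exp(log_sigma)   # standardize with candidate parameters
            return np.sum((ecf_real(t_grid, z) - target) ** 2)

        start = np.array([np.median(x), np.log(np.std(x))])
        res = minimize(objective, start, method="Nelder-Mead")
        return res.x[0], np.exp(res.x[1])

    # Example: skew-symmetric data built from a normal base and skewing function 0.5*(1 + tanh(z)).
    rng = np.random.default_rng(0)
    z = rng.normal(size=2000)
    keep = rng.uniform(size=2000) < 0.5 * (1 + np.tanh(z))
    x = 1.5 + 2.0 * np.where(keep, z, -z)
    print(estimate_location_scale(x))   # should be close to (1.5, 2.0)

Because cos is an even function, the real part of the empirical characteristic function depends only on the symmetric component, which is why the fit to exp(-t^2/2) remains valid even though the skewing function is left unspecified.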

2.
Combining information from multiple samples is often needed in biomedical and economic studies, but differences between these samples must be appropriately taken into account in the analysis of the combined data. We study the estimation for moment restriction models with data combined from two samples under an ignorability-type assumption while allowing for different marginal distributions of variables common to both samples. Suppose that an outcome regression (OR) model and a propensity score (PS) model are specified. By leveraging semi-parametric efficiency theory, we derive an augmented inverse probability-weighted (AIPW) estimator that is locally efficient and doubly robust with respect to these models. Furthermore, we develop calibrated regression and likelihood estimators that are not only locally efficient and doubly robust but also intrinsically efficient in achieving smaller variances than the AIPW estimator when the PS model is correctly specified but the OR model may be misspecified. As an important application, we study the two-sample instrumental variable problem and derive the corresponding estimators while allowing for incompatible distributions of variables common to the two samples. Finally, we provide a simulation study and an econometric application on public housing projects to demonstrate the superior performance of our improved estimators. The Canadian Journal of Statistics 48: 259–284; 2020 © 2019 Statistical Society of Canada
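For background, the familiar one-sample augmented inverse probability-weighted estimating equation for a moment restriction E[g(Z; θ)] = 0, with missingness indicator R, propensity score π(X) and outcome-regression projection m(X; θ) = E[g(Z; θ) | X], has the doubly robust form (a generic illustration in my notation, not the paper's exact two-sample estimator):

\[
\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{R_i}{\pi(X_i)}\,g(Z_i;\theta)\;-\;\frac{R_i-\pi(X_i)}{\pi(X_i)}\,m(X_i;\theta)\right\}=0 .
\]

The solution is consistent if either π or m is correctly specified, which is the double robustness property the abstract extends to the two-sample setting.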

3.
There are a variety of economic areas, such as studies of employment duration and of the durability of capital goods, in which data on important variables typically are censored. The standard techniques for estimating a model from censored data require the distributions of unobservable random components of the model to be specified a priori up to a finite set of parameters, and misspecification of these distributions usually leads to inconsistent parameter estimates. However, economic theory rarely gives guidance about distributions, and the standard estimation techniques do not provide convenient methods for identifying distributions from censored data. Recently, several distribution-free or semiparametric methods for estimating censored regression models have been developed. This paper presents the results of using two such methods to estimate a model of employment duration. The paper reports the operating characteristics of the semiparametric estimators and compares the semiparametric estimates with those obtained from a standard parametric model.

4.
Summary. Nearest neighbour algorithms are among the most popular methods used in statistical pattern recognition. The models are conceptually simple and empirical studies have shown that their performance is highly competitive against other techniques. However, the lack of a formal framework for choosing the size of the neighbourhood k is problematic. Furthermore, the method can only make discrete predictions by reporting the relative frequency of the classes in the neighbourhood of the prediction point. We present a probabilistic framework for the k-nearest-neighbour method that largely overcomes these difficulties. Uncertainty is accommodated via a prior distribution on k as well as in the strength of the interaction between neighbours. These prior distributions propagate uncertainty through to proper probabilistic predictions that have continuous support on (0, 1). The method makes no assumptions about the distribution of the predictor variables. The method is also fully automatic with no user-set parameters and empirically it proves to be highly accurate on many benchmark data sets.
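A simplified sketch of how a prior on the neighbourhood size k can yield smooth class probabilities (an illustration inspired by, but not identical to, the framework above: it uses a uniform prior on k and a leave-one-out pseudo-likelihood, and all function names are hypothetical):

    import numpy as np
    from scipy.spatial.distance import cdist

    def probabilistic_knn(X_train, y_train, X_test, k_max=25):
        classes = np.unique(y_train)
        d_tr = cdist(X_train, X_train)
        np.fill_diagonal(d_tr, np.inf)              # exclude each point from its own neighbours
        order_tr = np.argsort(d_tr, axis=1)

        # "Likelihood" of each k: leave-one-out predictive probability of the training labels.
        log_lik = np.zeros(k_max)
        for k in range(1, k_max + 1):
            neigh = y_train[order_tr[:, :k]]
            p_true = ((neigh == y_train[:, None]).sum(axis=1) + 1.0) / (k + len(classes))
            log_lik[k - 1] = np.log(p_true).sum()
        post_k = np.exp(log_lik - log_lik.max())
        post_k /= post_k.sum()                      # posterior over k under a uniform prior

        # Posterior-averaged class frequencies give smooth predictive probabilities.
        order_te = np.argsort(cdist(X_test, X_train), axis=1)
        probs = np.zeros((len(X_test), len(classes)))
        for k, w in enumerate(post_k, start=1):
            neigh = y_train[order_te[:, :k]]
            probs += w * np.stack([(neigh == c).mean(axis=1) for c in classes], axis=1)
        return classes, probs

Averaging over k in this way removes the need to fix the neighbourhood size by hand, which is the difficulty the probabilistic framework is designed to address.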

5.
In this paper, we compare five estimators of the standard errors of parameters in structural equation models that are asymptotically equivalent under a correctly specified likelihood. The estimators are evaluated under different conditions regarding (i) sample size, varying between N=50 and 3200, (ii) the distributional assumption for the latent variables and the disturbance terms, namely normal and heavy-tailed (t), and (iii) the complexity of the model. For the assessment of the five estimators we use overall performance, relative bias, MSE and coverage of confidence intervals. The analysis reveals substantial differences in the performance of the five asymptotically equivalent estimators. Most diversity was found for t-distributed, i.e. heavy-tailed, data.

6.
The aim of this paper is to formulate an analytical–informational–theoretical approach which, given the incomplete nature of the available micro-level data, can be used to provide disaggregated values of a given variable. A functional relationship between the variable to be disaggregated and the available variables/indicators at the area level is specified through a combination of different macro- and micro-data sources. Data disaggregation is accomplished by considering two different cases. In the first case, sub-area level information on the variable of interest is available, and a generalized maximum entropy approach is employed to estimate the optimal disaggregate model. In the second case, we assume that the sub-area level information is partial and/or incomplete, and we estimate the model on a smaller scale by developing a generalized cross-entropy-based formulation. The proposed spatial-disaggregation approach is used in relation to an Italian data set in order to compute the value-added per manufacturing sector of local labour systems within the Umbria region, by combining the available micro/macro-level data and by formulating a suitable set of constraints for the optimization problem in the presence of errors in micro-aggregates.
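In generic form, the cross-entropy principle underlying such disaggregation chooses sub-area shares closest to a prior while honouring the available aggregates (a sketch in my notation, not the paper's exact constraint set):

\[
\hat{p}\;=\;\arg\min_{p}\;\sum_{k=1}^{K} p_k \log\frac{p_k}{q_k}
\quad\text{subject to}\quad
\sum_{k=1}^{K} p_k = 1,\qquad \sum_{k=1}^{K} p_k\, z_k = m,
\]

where q is a prior allocation (a uniform q gives the maximum entropy case), the z_k are sub-area indicators whose area-level aggregate m is known, and the disaggregated values are then recovered as \(\hat{y}_k=\hat{p}_k\,Y\) for a known area total Y.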

7.
Consider a population of individuals who are free of a disease under study, and who are exposed simultaneously, at random exposure levels, say X, Y, Z, …, to several risk factors which are suspected to cause the disease in the population. At any specified levels X=x, Y=y, Z=z, …, the incidence rate of the disease in the population at risk is given by the exposure–response relationship r(x,y,z,…) = P(disease|x,y,z,…). The present paper examines the relationship between the joint distribution of the exposure variables X, Y, Z, … in the population at risk and the joint distribution of the exposure variables U, V, W, … among cases under the linear and the exponential risk models. It is proven that under the exponential risk model, these two joint distributions belong to the same family of multivariate probability distributions, possibly with different parameter values. For example, if the exposure variables in the population at risk have jointly a multivariate normal distribution, so do the exposure variables among cases; if the former variables have jointly a multinomial distribution, so do the latter. More generally, it is demonstrated that if the joint distribution of the exposure variables in the population at risk belongs to the exponential family of multivariate probability distributions, so does the joint distribution of exposure variables among cases. If the epidemiologist can specify the difference among the mean exposure levels in the case and control groups which is considered to be clinically or etiologically important in the study, the results of the present paper may be used to make sample size determinations for the case–control study, corresponding to specified protection levels, i.e. size α and power 1−β of a statistical test. The multivariate normal, the multinomial, the negative multinomial and Fisher's multivariate logarithmic series exposure distributions are used to illustrate our results.
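A worked illustration of the exponential risk model result, in my own notation: by Bayes' theorem the exposure density among cases is proportional to r(x) times the density in the population at risk, so an exponential-family density is merely exponentially tilted. For multivariate normal exposures with r(x) = exp(α + β'x),

\[
f_{\text{cases}}(x)\;\propto\;r(x)\,f(x)\;=\;e^{\alpha+\beta^{\top}x}\,\phi(x;\mu,\Sigma)\;\propto\;\phi(x;\,\mu+\Sigma\beta,\;\Sigma),
\]

so cases again have multivariate normal exposures with the same covariance matrix and mean shifted by Σβ, which is exactly the kind of mean difference on which the sample size calculations described above can be based.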

8.
A new class of distributions, including the MacGillivray adaptation of the g-and-h distributions and a new family called the g-and-k distributions, may be used to approximate a wide class of distributions, with the advantage of effectively controlling skewness and kurtosis through independent parameters. This separation can be used to advantage in the assessment of robustness to non-normality in frequentist ranking and selection rules. We consider the rule of selecting the largest of several means with some specified confidence. In general, we find that the frequentist selection rule is only robust to small changes in the distributional shape parameters g and k and depends on the amount of flexibility we allow in the specified confidence. This flexibility is exemplified through a quality control example in which a subset of batches of electrical transformers are selected as the most efficient with a specified confidence, based on the sample mean performance level for each batch.
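For reference, the g-and-k family is usually defined through its quantile function (stated here in the common parameterization with the constant c, typically taken as 0.8; the paper's exact variant may differ):

\[
Q(u; A, B, g, k)\;=\;A + B\left[1 + c\,\tanh\!\left(\tfrac{g\,z_u}{2}\right)\right]\left(1 + z_u^{2}\right)^{k} z_u,
\qquad z_u=\Phi^{-1}(u),
\]

where A is location, B > 0 is scale, g controls skewness and k controls kurtosis, so the two shape features can be varied independently when assessing robustness.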

9.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.
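In generic notation, a mixture regression model with multivariate t components of the kind described above takes a form such as

\[
f(y \mid x)\;=\;\sum_{k=1}^{K}\pi_k\, t_p\!\left(y;\;B_k^{\top}x,\;\Sigma_k,\;\nu_k\right),
\qquad \sum_{k=1}^{K}\pi_k=1,
\]

where t_p denotes the p-dimensional t density with regression mean B_k'x, scale matrix Σ_k and degrees of freedom ν_k; the heavier tails (small ν_k) are what confer robustness to outliers relative to a normal mixture.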

10.
ABSTRACT

Inference for epidemic parameters can be challenging, in part because the data are intrinsically stochastic, tend to be observed through discrete-time sampling, and are therefore limited in their completeness. The problem is particularly acute when the likelihood of the data is computationally intractable. Consequently, standard statistical techniques can become too complicated to implement effectively. In this work, we develop a Bayesian approach for susceptible–infected–removed stochastic epidemic models via data-augmented Markov chain Monte Carlo. This technique samples all missing values as well as the model parameters, where the missing values and parameters are treated as random variables. These routines are based on the approximation of the discrete-time epidemic by a diffusion process. We illustrate our techniques using simulated epidemics and finally we apply them to real data from the Eyam plague.

11.
The wide-ranging and rapidly evolving nature of ecological studies means that it is not possible to cover all existing and emerging techniques for analyzing multivariate data. However, two important methods have attracted many followers: Canonical Correspondence Analysis (CCA) and the STATICO analysis. Despite the particular characteristics of each, they have similarities and differences which, when analyzed properly, can together provide important complementary results to those that are usually exploited by researchers. On the one hand, the use of CCA is completely generalized and implemented, solving many problems formulated by ecologists; on the other hand, the method has some weaknesses, mainly caused by the restriction it imposes on the number of variables relative to the number of samples. The STATICO method has no such restriction, but it requires that the number of variables (species or environment) be the same at each time or space. Yet the STATICO method presents information that can be more detailed, since it allows the variability within groups (either in time or space) to be visualized. In this study, the data needed for implementing these methods are sketched, and a comparison is made showing the advantages and disadvantages of each method. The ecological data treated are a sequence of pairs of ecological tables, where species abundances and environmental variables are measured at different, specified locations over the course of time.

12.
Quality control relies heavily on the use of formal assessment metrics. In this paper, for the context of veterinary epidemiology, we review the main proposals, precision, repeatability, reproducibility, and intermediate precision, in agreement with ISO (International Organization for Standardization) practice, generalize these by placing them within the linear mixed model framework, and then extend them to the generalized linear mixed model setting, so that both Gaussian and non-Gaussian data can be handled. Similarities and differences are discussed between the classical ANOVA (analysis of variance) approach and the proposed mixed model settings, on the one hand, and between the Gaussian and non-Gaussian cases, on the other hand. The new proposals are applied to five studies in three diseases: Aujeszky's disease, enzootic bovine leucosis (EBL) and bovine brucellosis. The mixed-model proposals are also discussed in the light of their computational requirements.
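In the simplest ISO-style setting, with a single laboratory (or operator) random effect, the linear mixed model and the associated precision measures reduce to (a sketch; the paper's models are richer):

\[
y_{ij}\;=\;\mu + L_i + \varepsilon_{ij},\qquad L_i\sim N(0,\sigma_L^2),\quad \varepsilon_{ij}\sim N(0,\sigma_r^2),
\]
\[
\text{repeatability variance}=\sigma_r^{2},\qquad \text{reproducibility variance}=\sigma_L^{2}+\sigma_r^{2},
\]

and the generalized linear mixed model extension keeps the same random-effects structure while replacing the Gaussian response with an exponential-family response and a link function.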

13.
Minimum information bivariate distributions with uniform marginals and a specified rank correlation are studied in this paper. These distributions play an important role in a particular way of modeling dependent random variables which has been used in the computer code UNICORN for carrying out uncertainty analyses. It is shown that these minimum information distributions have a particular form which makes simulation of conditional distributions very simple. Approximations to the continuous distributions are discussed and explicit formulae are determined. Finally, a relation to DAD theorems is discussed, and a numerical algorithm is given (which has a geometric rate of convergence) for determining the minimum information distributions.
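A minimal discretized sketch of the DAD-type scaling: it assumes the minimum information density has the form a(u) b(v) exp(θ u v) on a grid and applies Sinkhorn-style alternating scaling (whose geometric convergence mirrors the rate mentioned above); θ and the grid size are arbitrary illustrative choices, and in practice θ would still be tuned to match the specified rank correlation:

    import numpy as np

    def min_info_copula_grid(theta, m=100, tol=1e-10, max_iter=1000):
        # Discretize [0,1]^2; rescale the kernel exp(theta*u*v) by diagonal factors
        # (a DAD scaling) until both margins are uniform.
        u = (np.arange(m) + 0.5) / m
        K = np.exp(theta * np.outer(u, u))
        a = np.ones(m)
        for _ in range(max_iter):
            b = 1.0 / (K.T @ a)          # make every column sum of a*K*b equal to 1
            a_new = 1.0 / (K @ b)        # make every row sum of a*K*b equal to 1
            if np.max(np.abs(a_new - a)) < tol:
                a = a_new
                break
            a = a_new
        P = (a[:, None] * K * b[None, :]) / m   # cell probabilities, uniform margins, total 1
        return u, P

    u, P = min_info_copula_grid(theta=3.0)
    rho = 12.0 * np.sum(P * np.outer(u - 0.5, u - 0.5))   # implied Spearman-type correlation
    print(round(P.sum(), 6), round(rho, 3))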

14.
Various exact tests for statistical inference are available for powerful and accurate decision rules provided that corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing p-values of exact tests by combining Monte Carlo simulations and statistical tables generated a priori. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The p-values are linked to the posterior means of quantiles. In this framework, we present relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations to provide a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using the theoretical propositions, we calculate the minimum number of Monte Carlo resamples needed for a desired level of accuracy on the basis of distances between actual data characteristics (e.g. sample sizes) and characteristics of data used to present corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via the recently developed STATA and R statistical packages.

15.
Competing risks data are routinely encountered in various medical applications due to the fact that patients may die from different causes. Recently, several models have been proposed for fitting such survival data. In this paper, we develop a fully specified subdistribution model for survival data in the presence of competing risks via a subdistribution model for the primary cause of death and conditional distributions for other causes of death. Various properties of this fully specified subdistribution model have been examined. An efficient Gibbs sampling algorithm via latent variables is developed to carry out posterior computations. Deviance information criterion (DIC) and logarithm of the pseudomarginal likelihood (LPML) are used for model comparison. An extensive simulation study is carried out to examine the performance of DIC and LPML in comparing the cause-specific hazards model, the mixture model, and the fully specified subdistribution model. The proposed methodology is applied to analyze a real dataset from a prostate cancer study in detail.

16.
Summary. Data editing is the process by which data that are collected in some way (a statistical survey for example) are examined for errors and corrected with the help of software. Edits, the logical conditions that should be satisfied by the data, are specified by subject-matter experts with a procedure which could be tedious and could lead to mistakes with practical implications. To render the process of edit specification more efficient we provide a new step—the definition of the so-called abstract data model of a survey—which describes the structure of the phenomenon that is studied in a survey. The existence of this model enables experts to identify all combinations of variables which should be checked by edits and to avoid the definition of conflicting edits. Furthermore, we introduce an automatic data validation strategy—TREEVAL—that consists of fast tree growing to derive automatically the functional form of edits and of a statistical criterion to clean the incoming data. The TREEVAL strategy is cast within a total quality management framework. The application of the methodologies proposed is demonstrated with the help of a real life application.

17.
In this article we discuss variable selection for decision making, with a focus on decisions regarding when to provide treatment and which treatment to provide. Current variable selection techniques were developed for use in a supervised learning setting where the goal is prediction of the response. These techniques often downplay the importance of interaction variables that have small predictive ability but that are critical when the ultimate goal is decision making rather than prediction. We propose two new techniques designed specifically to find variables that aid in decision making. Simulation results are given along with an application of the methods on data from a randomized controlled trial for the treatment of depression.

18.
We consider causal inference in randomized studies for survival data with a cure fraction and all-or-none treatment noncompliance. To describe the causal effects, we consider the complier average causal effect (CACE) and the complier effect on survival probability beyond time t (CESP), where CACE and CESP are defined as the difference in cure rate and in non-cured subjects' survival probability between the treatment and control groups within the complier class. These estimands depend on the distributions of survival times in the treatment and control groups. Given covariates and latent compliance type, we model these distributions with a transformation promotion time cure model whose parameters are estimated by maximum likelihood. Both the infinite dimensional parameter in the model and the mixture structure of the problem create some computational difficulties which are overcome by an expectation-maximization (EM) algorithm. We show the estimators are consistent and asymptotically normal. Some simulation studies are conducted to assess the finite-sample performance of the proposed approach. We also illustrate our method by analyzing real data from the Health Insurance Plan of Greater New York.
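For context, the promotion time cure model underlying these estimands typically specifies the population survival function as (standard form in generic notation, not the paper's exact transformation version):

\[
S_{\text{pop}}(t \mid x)\;=\;\exp\{-\theta(x)\,F(t)\},\qquad
\lim_{t\to\infty} S_{\text{pop}}(t \mid x)\;=\;e^{-\theta(x)},
\]

where F is a proper baseline distribution function, so the cure rate is exp{-θ(x)}; CACE and CESP then contrast the cure rate and the non-cured survival probability between arms within the complier class.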

19.
Inference in model-based cluster analysis
A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the within-group covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST software available in S-PLUS and StatLib. However, it has several limitations: there is no assessment of the uncertainty about the classification, the partition can be suboptimal, parameter estimates are biased, the shape matrix has to be specified by the user, prior group probabilities are assumed to be equal, the method for choosing the number of groups is based on a crude approximation, and no formal way of choosing between the various possible models is included. Here, we propose a new approach which overcomes all these difficulties. It consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors (for choosing the model and the number of groups) from the output using the Laplace–Metropolis estimator. It works well in several real and simulated examples.
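The Laplace–Metropolis estimator referred to above approximates the marginal likelihood directly from posterior simulation output; in its usual form (stated generically, with θ* the posterior mode or a robust centre of the Gibbs draws, H* an estimate of the posterior covariance matrix, and d the number of parameters):

\[
\log p(y \mid M)\;\approx\;\tfrac{d}{2}\log(2\pi)\;+\;\tfrac{1}{2}\log\lvert H^{*}\rvert\;+\;\log p(y \mid \theta^{*}, M)\;+\;\log p(\theta^{*} \mid M),
\]

and the Bayes factor for comparing two models (or two numbers of groups) is the ratio of the resulting marginal likelihoods.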

20.
Various methods to control the influence of a covariate on a response variable are compared. These methods are ANOVA, with or without homogeneity of variances (HOV) of errors, and Kruskal–Wallis (K–W) tests on covariate-adjusted residuals, and analysis of covariance (ANCOVA). Covariate-adjusted residuals are obtained from the overall regression line fit to the entire data set ignoring the treatment levels or factors. It is demonstrated that the methods on covariate-adjusted residuals are only appropriate when the regression lines are parallel and covariate means are equal for all treatments. Empirical size and power performance of the methods are compared by extensive Monte Carlo simulations. We manipulated conditions such as the assumptions of normality and HOV, the sample size, and the clustering of the covariates. The parametric methods on residuals and ANCOVA exhibited similar size and power when error terms have symmetric distributions with variances having the same functional form for each treatment, and covariates have uniform distributions within the same interval for each treatment. In such cases, parametric tests have higher power compared to the K–W test on residuals. When error terms have asymmetric distributions or have variances that are heterogeneous with different functional forms for each treatment, the tests are liberal, with the K–W test having higher power than the others. The methods on covariate-adjusted residuals are severely affected by the clustering of the covariates relative to the treatment factors when covariate means are very different across treatments. In the presence of such clustering, the ANCOVA method exhibits the appropriate level. However, such clustering might suggest dependence between the covariates and the treatment factors, which makes ANCOVA less reliable as well.
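A small sketch of the covariate-adjusted-residual construction being compared (illustrative synthetic data and off-the-shelf tests only, not the study's full simulation design):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_per, n_groups = 40, 3
    g = np.repeat(np.arange(n_groups), n_per)                 # treatment factor
    x = rng.uniform(0, 10, size=g.size)                       # covariate
    y = 2.0 + 0.8 * x + 0.5 * (g == 2) + rng.normal(size=g.size)

    # Covariate-adjusted residuals: one overall regression of y on x,
    # fitted while ignoring the treatment factor.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Compare treatments on the adjusted residuals.
    by_group = [resid[g == j] for j in range(n_groups)]
    print("ANOVA on residuals p-value:", stats.f_oneway(*by_group).pvalue)
    print("K-W on residuals p-value:  ", stats.kruskal(*by_group).pvalue)

As the abstract notes, this residual-based shortcut is only trustworthy when the group regression lines are parallel and the covariate means are similar across treatments; otherwise ANCOVA, which models the covariate and the factor jointly, is the more appropriate comparison.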
