Similar Documents
 Found 20 similar documents (search time: 78 ms)
1.
2.
The logistic regression model has been widely used in the social and natural sciences, and results from studies using this model can have significant policy impacts. Thus, confidence in the reliability of inferences drawn from these models is essential. The robustness of such inferences depends on sample size. The purpose of this article is to examine the impact of alternative data sets on the mean estimated bias and efficiency of parameter estimation and inference for the logistic regression model with observational data. A number of simulations are conducted examining the impact of sample size, nonlinear predictors, and multicollinearity on substantive inferences (e.g., odds ratios, marginal effects) when using logistic regression models. Findings suggest that small sample size can degrade the quality of parameter estimates and inferences in the presence of rare events, multicollinearity, and nonlinear predictor functions, though marginal-effect estimates are comparatively robust to sample size.
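The abstract's simulation design is not reproduced here, but the core phenomenon can be sketched in a few lines of Python. All sample sizes, true coefficients, and the Newton-Raphson fitting routine below are illustrative choices, not the authors' design:

```python
# Illustrative sketch only: finite-sample behaviour of the logistic MLE
# with a rare outcome. True coefficients and sample sizes are arbitrary.
import math
import random

def fit_logistic(xs, ys, iters=25):
    """Fit intercept and slope by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            t = max(-30.0, min(30.0, b0 + b1 * x))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-t))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:            # near-separation: stop updating
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def mean_slope(n, reps=40, true_b0=-2.0, true_b1=1.0, seed=7):
    """Average estimated slope over replicated samples of size n."""
    rng = random.Random(seed)
    ests = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * x)))
              else 0 for x in xs]
        if 0 < sum(ys) < n:        # skip degenerate all-0 / all-1 samples
            ests.append(fit_logistic(xs, ys)[1])
    return sum(ests) / len(ests)

small_n, large_n = mean_slope(30), mean_slope(400)
```

With roughly 12% events here, the small-sample average slope typically drifts away from the true value of 1 while the large-sample average sits close to it, mirroring the qualitative pattern the abstract describes.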

3.
We propose a Bayesian hierarchical model for multiple comparisons in mixed models in which repeated measures on subjects are described by subject-level random effects. The model facilitates inference by parameterizing the successive differences of the population means; for these differences we choose independent prior distributions that are mixtures of a normal distribution and a discrete distribution with its entire mass at zero. For the other parameters, we choose conjugate or vague priors. The performance of the proposed hierarchical model is investigated on simulated data and two real data sets, and the results illustrate that the model can effectively conduct a global test and pairwise comparisons using the posterior probability that any two means are equal. A simulation study is performed to analyze the type I error rate, the familywise error rate, and the test power. A Gibbs sampler is used to estimate the parameters and to calculate the posterior probabilities.

4.
In this presentation we discuss the extension of permutation conditional inferences to unconditional, or population-level, ones. Within the parametric approach this extension is possible when the data set is randomly selected by well-designed sampling procedures from well-defined population distributions, provided that their nuisance parameters admit boundedly complete statistics under the null hypothesis or invariant statistics. When these conditions fail, and especially when selection-biased procedures are used in data collection, most parametric inferential extensions are wrong or misleading. We show that, because they possess similarity and conditional unbiasedness properties, permutation tests, when correctly applied, may extend conditional inferences to unconditional ones, at least in a weak sense.
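As a toy illustration of the permutation machinery this abstract builds on (the data and group sizes below are invented for the example), an exhaustive two-sample permutation test conditions on the observed values and reassigns group labels:

```python
# Toy exhaustive permutation test for a difference in group means.
# With 4 + 4 observations, all C(8, 4) = 70 label reassignments
# can be enumerated, so no random sampling of permutations is needed.
from itertools import combinations

def permutation_p_value(a, b):
    """Two-sided p-value for the absolute difference in group means."""
    pooled = a + b
    k, n = len(a), len(pooled)
    observed = abs(sum(a) / k - sum(b) / len(b))
    count = total = 0
    for idx in combinations(range(n), k):
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(n) if i not in idx]
        diff = abs(sum(grp_a) / k - sum(grp_b) / len(grp_b))
        total += 1
        if diff >= observed - 1e-12:   # tolerance for float comparison
            count += 1
    return count / total

p = permutation_p_value([5.1, 5.4, 5.2, 5.3], [1.0, 1.2, 0.9, 1.1])
```

With these well-separated invented groups only the observed split and its mirror reach the observed difference, so p = 2/70. The resulting inference is conditional on the observed values; the abstract's subject is precisely when such conditional inferences extend to the population.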

5.
6.
The widely used Fellegi–Sunter model for probabilistic record linkage does not leverage information contained in field values, and consequently classifies match status identically regardless of whether records agree on rare or common values. Since agreement on rare values is less likely to occur by chance than agreement on common values, records agreeing on rare values are more likely to be matches. Existing frequency-based methods typically rely on knowledge of error probabilities associated with field values and frequencies of agreed field values among matches, often derived from prior studies or training data. When such information is unavailable, applying these methods is challenging. In this paper, we propose a simple two-step procedure for frequency-based matching within the Fellegi–Sunter framework to overcome these challenges. Matching weights are adjusted based on frequency distributions of the agreed field values among matches and non-matches, estimated by the Fellegi–Sunter model without relying on prior studies or training data. Through a real-world application and simulation, our method is found to produce comparable or better performance than the unadjusted method. Furthermore, frequency-based matching yields the greatest improvement in matching accuracy for poorly discriminating fields, with the benefit diminishing as the discriminating power of the matching fields increases.
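In the Fellegi–Sunter framework, a field contributes log2(m/u) to the match weight on agreement and log2((1-m)/(1-u)) on disagreement, where m and u are agreement probabilities among matches and non-matches. A frequency-based adjustment of the kind the abstract describes replaces the field-level u with a value-specific one, so agreement on a rare value earns a larger weight. A minimal sketch, with made-up m and u values rather than estimates from any real linkage:

```python
# Sketch of Fellegi-Sunter field weights; m and u values are invented.
import math

def field_weight(agree, m, u):
    """Fellegi-Sunter log2 weight contribution of one comparison field."""
    return math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))

m = 0.95          # illustrative agreement probability among true matches
u_common = 0.08   # chance agreement on a very common value (e.g. a frequent surname)
u_rare = 0.001    # chance agreement on a rare value

w_common = field_weight(True, m, u_common)
w_rare = field_weight(True, m, u_rare)
```

Agreement on the rare value contributes a much larger weight, formalising the intuition stated in the abstract that rare-value agreement is stronger evidence of a match.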

7.
8.
This paper addresses the inference problem for a flexible class of distributions with normal kernel, known as the skew-bimodal-normal family of distributions. We obtain posterior and predictive distributions under different prior specifications. We provide conditions for the existence of the maximum-likelihood estimators (MLEs), and an EM-type algorithm is built to compute them. As a by-product, we obtain important results on classical and Bayesian inference for two special subclasses, the bimodal-normal and skew-normal (SN) distribution families. We perform a Monte Carlo simulation study to analyse the behaviour of the MLEs and some Bayesian estimators. Considering the frontier data previously studied in the literature, we use the skew-bimodal-normal (SBN) distribution for density estimation. For that data set, we conclude that the SBN model provides as good a fit as the location-scale SN model; since the former is more parsimonious, it is the more attractive choice.

9.
There are a variety of economic areas, such as studies of employment duration and of the durability of capital goods, in which data on important variables typically are censored. The standard techniques for estimating a model from censored data require the distributions of unobservable random components of the model to be specified a priori up to a finite set of parameters, and misspecification of these distributions usually leads to inconsistent parameter estimates. However, economic theory rarely gives guidance about distributions, and the standard estimation techniques do not provide convenient methods for identifying distributions from censored data. Recently, several distribution-free or semiparametric methods for estimating censored regression models have been developed. This paper presents the results of using two such methods to estimate a model of employment duration. The paper reports the operating characteristics of the semiparametric estimators and compares the semiparametric estimates with those obtained from a standard parametric model.

10.
Latent class models (LCMs) are increasingly used to address a broad variety of problems, including sparse modeling of multivariate and longitudinal data, model-based clustering, and flexible inferences on predictor effects. Typical frequentist LCMs require estimation of a single finite number of classes, which does not increase with the sample size, and have a well-known sensitivity to parametric assumptions on the distributions within a class. Bayesian nonparametric methods have been developed to allow an infinite number of classes in the general population, with the number represented in a sample increasing with sample size. In this article, we propose a new nonparametric Bayes model that allows predictors to flexibly impact the allocation to latent classes, while limiting sensitivity to parametric assumptions by allowing class-specific distributions to be unknown subject to a stochastic ordering constraint. An efficient MCMC algorithm is developed for posterior computation. The methods are validated using simulation studies and applied to the problem of ranking medical procedures in terms of the distribution of patient morbidity.

11.
Neosporosis is a bovine disease caused by the parasite Neospora caninum. It has not yet been studied sufficiently, and it is believed to cause a substantial number of abortions. Its clinical symptoms do not yet allow reliable identification of infected animals, so its study and treatment would improve if a test based on antibody counts were available. Knowing the distribution functions of observed counts for uninfected and infected cows would allow the determination of a cutoff value, but these distributions cannot be estimated directly. This paper deals with their indirect estimation from a data set consisting of the antibody counts of some 200 pairs of cows and their calves. The desired distributions are estimated through a mixture model based on simple assumptions describing the relationship between each cow and its calf. The model then allows estimation of the cutoff value and of the error probabilities.
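Once the two count distributions are estimated, a cutoff can be chosen to minimise the total misclassification probability. A sketch with two illustrative normal components follows; the means, variances, and equal mixing weights are invented for the example, not estimates from the cattle data:

```python
# Grid search for the error-minimising cutoff between two normal
# components N(0, 1) ("uninfected") and N(3, 1) ("infected").
# All distributional parameters here are illustrative inventions.
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def misclassification(c, mu0, s0, mu1, s1, p0=0.5):
    """Weighted P(uninfected above cutoff) + P(infected below cutoff)."""
    return (p0 * (1.0 - normal_cdf(c, mu0, s0))
            + (1.0 - p0) * normal_cdf(c, mu1, s1))

error, cutoff = min(
    (misclassification(c / 100.0, 0.0, 1.0, 3.0, 1.0), c / 100.0)
    for c in range(-100, 400)
)
```

With equal priors and equal variances the optimum sits at the midpoint (1.5), where the two densities cross; unequal priors or variances shift it, which is why estimating the actual distributions matters.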

12.
Categorical data frequently arise in applications in the social sciences. In such applications, the class of log-linear models, based on either a Poisson or (product) multinomial response distribution, is a flexible model class for inference and prediction. In this paper we consider the Bayesian analysis of both Poisson and multinomial log-linear models. It is often convenient to model multinomial or product multinomial data as observations of independent Poisson variables. For multinomial data, Lindley (1964) [20] showed that this approach leads to valid Bayesian posterior inferences when the prior density for the Poisson cell means factorises in a particular way. We develop this result to provide a general framework for the analysis of multinomial or product multinomial data using a Poisson log-linear model. Valid finite population inferences are also available, which can be particularly important in modelling social data. We then focus particular attention on multivariate normal prior distributions for the log-linear model parameters. Here, an improper prior distribution for certain Poisson model parameters is required for valid multinomial analysis, and we derive conditions under which the resulting posterior distribution is proper. We also consider the construction of prior distributions across models, and for model parameters, when uncertainty exists about the appropriate form of the model. We present classes of Poisson and multinomial models, invariant under certain natural groups of permutations of the cells. We demonstrate that, if prior belief concerning the model parameters is also invariant, as is the case in a ‘reference’ analysis, then the choice of prior distribution is considerably restricted. The analysis of multivariate categorical data in the form of a contingency table is considered in detail. We illustrate the methods with two examples.
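The Poisson–multinomial connection has a familiar computational counterpart: with conjugate Gamma distributions on independent Poisson cell means, normalising the Gamma draws yields Dirichlet-distributed cell probabilities, i.e. a valid multinomial posterior. A quick numerical check of this standard fact (the shape parameters are arbitrary, not tied to the paper's examples):

```python
# Independent Gamma draws, normalised, follow a Dirichlet distribution:
# the mechanism that lets multinomial cell probabilities be handled
# through independent Poisson cell means. Shape parameters are arbitrary.
import random

rng = random.Random(42)
shapes = (2.0, 3.0, 5.0)   # e.g. Gamma posteriors for three Poisson cells

def dirichlet_draw(shapes, rng):
    g = [rng.gammavariate(a, 1.0) for a in shapes]
    s = sum(g)
    return [x / s for x in g]

draws = [dirichlet_draw(shapes, rng) for _ in range(5000)]
mean_p0 = sum(d[0] for d in draws) / len(draws)  # theory: 2 / (2+3+5) = 0.2
```

Each normalised draw is a valid probability vector, and the Monte Carlo mean of the first cell probability matches the Dirichlet mean 2/10.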

13.
In this paper, we propose two multimodal circular distributions suitable for modeling circular data sets with two or more modes. Both distributions belong to the regular exponential family and can be viewed as extensions of the von Mises distribution. Hence, they possess highly desirable properties, such as the existence of non-trivial sufficient statistics and optimal inferences for their parameters. Fine particulates (PM2.5) are generally emitted from activities such as industrial and residential combustion and from vehicle exhaust. We illustrate the utility of the proposed models using a real data set of fine-particulate (PM2.5) pollutant levels in the Houston region during the fall of 2019. Our results provide strong evidence that the diurnal pattern exhibits four modes: two peaks during the morning and evening rush hours and two peaks in between.
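The authors' distributions are new exponential-family extensions, but the idea of capturing several diurnal peaks on the circle can be illustrated with a plain von Mises mixture. The component locations, concentrations, and weights below are invented, not fitted to the PM2.5 data:

```python
# Two-component von Mises mixture as a toy multimodal circular density.
# Component parameters are invented for illustration.
import math

def bessel_i0(x, terms=30):
    """Power series for the modified Bessel function I0 (the von Mises normaliser)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / k ** 2
        total += term
    return total

def von_mises_pdf(theta, mu, kappa):
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def mixture_pdf(theta, comps):
    """comps: list of (weight, mu, kappa) triples."""
    return sum(w * von_mises_pdf(theta, mu, kappa) for w, mu, kappa in comps)

# Two rush-hour peaks mapped onto the circle (8:00 and 18:00 as angles).
comps = [(0.5, 2.0 * math.pi * 8 / 24, 4.0),
         (0.5, 2.0 * math.pi * 18 / 24, 4.0)]

# Riemann sum over the circle: the density should integrate to ~1.
n = 2000
area = sum(mixture_pdf(2.0 * math.pi * i / n, comps)
           for i in range(n)) * (2.0 * math.pi / n)
```

The mixture peaks at the two rush-hour angles and dips in between; the paper's four-mode diurnal pattern is the same idea with more components and an exponential-family parameterisation.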

14.
A mixture measurement-error model built upon skew-normal and normal distributions is developed to evaluate the impact of measurement errors on parameter inference in logistic regression. Data generated from survey questionnaires are usually error-contaminated. We consider two types of errors: person-specific bias and random errors. Person-specific bias is modelled using a skew-normal distribution, and the random errors are described by a normal distribution. Intensive simulations are conducted to evaluate the contribution of each mixture component to the outcomes of interest. The proposed method is then applied to questionnaire data from a neural tube defect study. Simulation results and the real-data application indicate that ignoring measurement errors or misspecifying the measurement-error components can both produce misleading results, especially when the measurement errors are in fact skewed. The inferred parameters can be attenuated or inflated depending on how the measurement-error components are specified. We expect these findings to underscore the importance of adjusting for measurement error and thus benefit future data-collection efforts.
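The skew-normal bias component can be generated with the standard two-normal construction; the shape parameter, scale, and sample size below are arbitrary illustrative choices, not the study's simulation design:

```python
# Generating skew-normal person-specific bias via the |Z| construction:
# X = delta*|Z1| + sqrt(1 - delta^2)*Z2 is skew-normal with shape alpha.
# Parameter values are illustrative only.
import math
import random

def skew_normal(rng, alpha, loc=0.0, scale=1.0):
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z1, z2 = abs(rng.gauss(0.0, 1.0)), rng.gauss(0.0, 1.0)
    return loc + scale * (delta * z1 + math.sqrt(1.0 - delta * delta) * z2)

rng = random.Random(3)
alpha = 5.0                              # strongly right-skewed bias
biases = [skew_normal(rng, alpha) for _ in range(4000)]
sample_mean = sum(biases) / len(biases)  # theory: delta*sqrt(2/pi) ~= 0.78
```

A skewed bias component has a nonzero mean even when its location parameter is zero, so an error model that assumes symmetric zero-mean errors is miscentred from the start; this is one mechanism behind the attenuation and inflation the abstract reports.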

15.
This paper explores the use of data augmentation in settings beyond the standard Bayesian one. In particular, we show that, after proposing an appropriate generalised data-augmentation principle, it is possible to extend the range of sampling situations in which fiducial methods can be applied by constructing Markov chains whose stationary distributions represent valid posterior inferences on model parameters. Some properties of these chains are presented and a number of open questions are discussed. We also use the approach to draw out connections between classical and Bayesian approaches in some standard settings.

16.
Very often in psychometric research, as in educational assessment, it is necessary to analyze item responses from clustered respondents. The multiple-group item response theory (IRT) model proposed by Bock and Zimowski [12] provides a useful framework for analyzing such data. In this model, the selected groups of respondents are of specific interest, so group-specific population distributions need to be defined. The usual assumption for parameter estimation, namely that the latent traits are random variables following different symmetric normal distributions, has been questioned in many works in the IRT literature, and misleading inference can result when it does not hold. In this paper, we assume that the latent traits of each group follow different skew-normal distributions under the centered parameterization, and we call the resulting model the skew multiple-group IRT model. This modeling extends the works of Azevedo et al. [4], Bazán et al. [11] and Bock and Zimowski [12] with respect to the latent trait distribution, and our approach ensures that the model is identifiable. We propose and compare, with respect to convergence, two Markov chain Monte Carlo (MCMC) algorithms for parameter estimation, and a simulation study was performed to evaluate parameter recovery. Results reveal that the proposed algorithm recovers all model parameters properly. Furthermore, we analyzed a real data set that exhibits asymmetry in the latent trait distribution; the results obtained with our approach confirm the presence of negative asymmetry for some latent trait distributions.

17.
This article focuses on two-phase sampling designs for a population with an unknown number of rare objects. The first phase is used to estimate the number of rare or potentially rare objects in the population, and the second phase to design sampling plans that capture a certain number or proportion of such objects. A hypergeometric-binomial model is applied to infer the number of rare or potentially rare objects, and Monte Carlo simulation-based approaches are developed to calculate the needed sample sizes. Simulations and real data applications are discussed. The Canadian Journal of Statistics 37: 417–434; 2009 © 2009 Statistical Society of Canada
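For the second-phase question of how many draws are needed to capture a rare object, the hypergeometric calculation can be sketched directly. The population and rare-object counts below are made-up numbers, and this closed-form search stands in for the article's Monte Carlo approach:

```python
# Smallest simple-random-sample size n capturing at least one of R rare
# objects in a population of N with probability >= target.
# N, R, and target below are invented illustrative values.
from math import comb

def prob_at_least_one(N, R, n):
    """1 - P(a sample of n misses all R rare objects), hypergeometric."""
    if n > N - R:
        return 1.0
    return 1.0 - comb(N - R, n) / comb(N, n)

def min_sample_size(N, R, target=0.95):
    for n in range(1, N + 1):
        if prob_at_least_one(N, R, n) >= target:
            return n
    return N

n_needed = min_sample_size(1000, 10)
```

When the number of rare objects R is itself unknown, as in the article, this calculation must be averaged over the first-phase posterior for R, which is where the hypergeometric-binomial model and the simulation-based approach come in.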

18.
Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.

19.
We consider large-sample inference in a semiparametric logistic/proportional-hazards mixture model. This model has been proposed for survival data in which a positive proportion of subjects in the population are not susceptible to the event under consideration. Previous studies of the logistic/proportional-hazards mixture model have focused on developing point-estimation procedures for the unknown parameters. This paper studies large-sample inference based on the semiparametric maximum likelihood estimator. Specifically, we establish existence, consistency and asymptotic normality results for the semiparametric maximum likelihood estimator, and we derive consistent variance estimates for both the parametric and non-parametric components. The results provide a theoretical foundation for large-sample inference under the logistic/proportional-hazards mixture model.

20.
The Quermass-interaction model generalizes the classical germ-grain Boolean model by adding a morphological interaction between the grains. It makes it possible to model random structures with specific morphologies that are unlikely to be generated by a Boolean model. The Quermass-interaction model depends in particular on an intensity parameter, which cannot be estimated by classical likelihood or pseudo-likelihood approaches because the number of points is not observable in a germ-grain set. In this paper, we present a procedure based on the Takacs–Fiksel method that estimates all parameters of the Quermass-interaction model, including the intensity. An intensive simulation study is conducted to assess the efficiency of the procedure and to provide practical recommendations; it also illustrates that estimating the intensity parameter is crucial for identifying the model. The Quermass-interaction model is finally fitted by our method to P. Diggle's heather data set.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号