Similar literature
A total of 20 similar documents were retrieved.
1.
The choice of the model framework in a regression setting depends on the nature of the data. The focus of this study is on changepoint data, exhibiting three phases: incoming and outgoing, both of which are linear, joined by a curved transition. Bent-cable regression is an appealing statistical tool to characterize such trajectories, quantifying the nature of the transition between the two linear phases by modeling the transition as a quadratic phase with unknown width. We demonstrate that a quadratic function may not be appropriate to adequately describe many changepoint data. We then propose a generalization of the bent-cable model by relaxing the assumption of the quadratic bend. The properties of the generalized model are discussed and a Bayesian approach for inference is proposed. The generalized model is demonstrated with applications to three data sets taken from environmental science and economics. We also consider a comparison among the quadratic bent-cable, generalized bent-cable and piecewise linear models in terms of goodness of fit in analyzing both real-world and simulated data. This study suggests that the proposed generalization of the bent-cable model can be valuable in adequately describing changepoint data that exhibit either an abrupt or gradual transition over time.
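For reference, a common parameterization of the bent-cable mean function (not quoted from this abstract, so the symbols are illustrative) writes the response at time t as a linear trend plus a bend basis of half-width \(\gamma\) centred at the changepoint \(\tau\):
\[
f(t) = \beta_0 + \beta_1 t + \beta_2 q(t), \qquad
q(t) = \frac{(t-\tau+\gamma)^2}{4\gamma}\,\mathbf{1}\{|t-\tau|\le\gamma\} + (t-\tau)\,\mathbf{1}\{t>\tau+\gamma\},
\]
so the incoming slope is \(\beta_1\), the outgoing slope is \(\beta_1+\beta_2\), and a quadratic bend of width \(2\gamma\) joins the two linear phases; the generalization proposed in this paper relaxes the quadratic form of \(q\).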

2.
Motivated by time series of atmospheric concentrations of certain pollutants, the authors develop bent-cable regression for autocorrelated errors. Bent-cable regression extends the popular piecewise linear (broken-stick) model, allowing for a smooth change region of any non-negative width. Here the authors consider autoregressive noise added to a bent-cable mean structure, with unknown regression and time series parameters. They develop asymptotic theory for conditional least-squares estimation in a triangular array framework, wherein each segment of the bent cable contains an increasing number of observations while the autoregressive order remains constant as the sample size grows. They explore the theory in a simulation study, develop implementation details, apply the methodology to the motivating pollutant dataset, and provide a scientific interpretation of the bent-cable change point not discussed previously. The Canadian Journal of Statistics 38: 386–407; 2010 © 2010 Statistical Society of Canada
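A sketch of the model studied here, using the same illustrative notation as above: the bent-cable mean is observed with autoregressive noise of fixed order p,
\[
y_t = f(t;\beta_0,\beta_1,\beta_2,\tau,\gamma) + \varepsilon_t, \qquad
\varepsilon_t = \sum_{j=1}^{p} \phi_j \varepsilon_{t-j} + e_t,
\]
with \(e_t\) white noise; the triangular-array asymptotics let each linear segment accumulate observations while p stays fixed.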

3.
This paper proposes robust regression to solve the problem of outliers in seemingly unrelated regression (SUR) models. The authors present an adaptation of S-estimators to SUR models. S-estimators are robust, have a high breakdown point and are much more efficient than other robust regression estimators commonly used in practice. Furthermore, modifications to Ruppert's algorithm allow a fast evaluation of them in this context. The classical example of U.S. corporations is revisited, and it appears that the procedure gives an interesting insight into the problem.
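As background (this is the generic definition, not the SUR-specific construction of the paper), an S-estimator minimizes a robust M-scale of the residuals: for residuals \(r_i(\beta)\) the scale \(s(\beta)\) solves
\[
\frac{1}{n}\sum_{i=1}^{n} \rho\!\left(\frac{r_i(\beta)}{s(\beta)}\right) = b,
\]
and \(\hat\beta = \arg\min_\beta s(\beta)\), where \(\rho\) is a bounded loss and b is chosen for consistency at the normal model; the SUR adaptation works with distances computed from the whole system of equations rather than univariate residuals.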

4.
With the rapid growth of modern technology, many biomedical studies are being conducted to collect massive datasets with volumes of multi-modality imaging, genetic, neurocognitive and clinical information from increasingly large cohorts. Simultaneously extracting and integrating rich and diverse heterogeneous information in neuroimaging and/or genomics from these big datasets could transform our understanding of how genetic variants impact brain structure and function, cognitive function and brain-related disease risk across the lifespan. Such understanding is critical for diagnosis, prevention and treatment of numerous complex brain-related disorders (e.g., schizophrenia and Alzheimer's disease). However, the development of analytical methods for the joint analysis of both high-dimensional imaging phenotypes and high-dimensional genetic data, a big data squared (BD2) problem, presents major computational and theoretical challenges for existing analytical methods. Besides the high-dimensional nature of BD2, various neuroimaging measures often exhibit strong spatial smoothness and dependence and genetic markers may have a natural dependence structure arising from linkage disequilibrium. We review some recent developments of various statistical techniques for imaging genetics, including massive univariate and voxel-wise approaches, reduced rank regression, mixture models and group sparse multi-task regression. By doing so, we hope that this review may encourage others in the statistical community to enter into this new and exciting field of research. The Canadian Journal of Statistics 47: 108–131; 2019 © 2019 Statistical Society of Canada

5.
The Dantzig selector (DS) is a recent approach to estimation in high-dimensional linear regression models with a large number of explanatory variables and a relatively small number of observations. As in the least absolute shrinkage and selection operator (LASSO), this approach sets certain regression coefficients exactly to zero, thus performing variable selection. However, such a framework, contrary to the LASSO, has never been used in regression models for survival data with censoring. A key motivation of this article is to study the estimation problem for Cox's proportional hazards (PH) function regression models using a framework that extends the theory, the computational advantages and the optimal asymptotic rate properties of the DS to the class of Cox's PH under appropriate sparsity scenarios. We perform a detailed simulation study to compare our approach with other methods and illustrate it on a well-known microarray gene expression data set for predicting survival from gene expressions.
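For orientation, in the linear model the Dantzig selector solves a convex program that bounds the correlation of every covariate with the residual (illustrative notation, not the paper's Cox-specific formulation):
\[
\hat\beta = \arg\min_{\beta} \|\beta\|_1 \quad \text{subject to} \quad \|X^{\top}(y - X\beta)\|_\infty \le \lambda;
\]
the extension described here replaces the least-squares residual correlation by the gradient (score) of Cox's log partial likelihood while keeping the \(\ell_1\) objective and the sup-norm constraint.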

6.
Spatial econometric models estimated on big geo-located point data face at least two problems: limited computational capacity and inefficient forecasting for new out-of-sample geo-points. This is because the spatial weights matrix W is defined for in-sample observations only, and because of the computational complexity involved. Machine learning models suffer from the same issues when kriging is used for prediction, so the problem remains unsolved. This paper presents a novel methodology for estimating spatial models on big data and predicting at new locations. The approach uses the bootstrap and tessellation to calibrate both the model and the space. The best bootstrapped model is selected with the PAM (Partitioning Around Medoids) algorithm by classifying the regression coefficients jointly rather than independently. Voronoi polygons for the geo-points used in the best model provide a representative division of space. New out-of-sample points are assigned to tessellation tiles and linked to the spatial weights matrix as replacements for original points, which makes it feasible to use the calibrated spatial models as a forecasting tool for new locations. There is no trade-off between forecast quality and computational efficiency in this approach. An empirical example illustrates a model for business locations and firms' profitability.
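A minimal Python sketch of the out-of-sample assignment step, under the assumption (made here purely for illustration, with invented variable names) that the calibration geo-points act as the seeds of the Voronoi tessellation; assigning a new point to its tile is then just a nearest-seed query:

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    calib_xy = rng.uniform(0.0, 100.0, size=(500, 2))  # geo-points of the selected bootstrapped model (illustrative)
    new_xy = rng.uniform(0.0, 100.0, size=(20, 2))     # new out-of-sample locations

    # A point belongs to the Voronoi tile of its nearest seed, so a k-d tree
    # nearest-neighbour query assigns tiles without constructing the polygons.
    tree = cKDTree(calib_xy)
    _, tile_index = tree.query(new_xy, k=1)

    # tile_index[i] gives the in-sample point whose row of the spatial weights
    # matrix W the i-th new location inherits, so the calibrated model can be
    # used to forecast at that location.
    print(tile_index)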

7.
Lesion count observed on brain magnetic resonance imaging scans is a common end point in phase 2 clinical trials evaluating therapeutic treatments in relapsing-remitting multiple sclerosis (MS). This paper compares the performance of Poisson, zero-inflated Poisson (ZIP), negative binomial (NB), and zero-inflated NB (ZINB) mixed-effects regression models in fitting lesion count data from a clinical trial evaluating the efficacy and safety of fingolimod, in comparison with placebo, in MS. The NB and ZINB models prove superior to the Poisson and ZIP models. We discuss the advantages and limitations of zero-inflated models in the context of MS treatment. Copyright © 2012 John Wiley & Sons, Ltd.
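For reference, the zero-inflated Poisson model referred to here mixes a point mass at zero with a Poisson count (generic form, symbols illustrative):
\[
P(Y=0) = \pi + (1-\pi)e^{-\lambda}, \qquad
P(Y=k) = (1-\pi)\frac{e^{-\lambda}\lambda^{k}}{k!}, \quad k \ge 1,
\]
with \(\log\lambda\) and \(\operatorname{logit}\pi\) modelled on covariates (plus subject-level random effects in the mixed-effects versions); the ZINB model replaces the Poisson component with a negative binomial to accommodate overdispersion.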

8.
This article analyzes a small censored data set to demonstrate the potential dangers of using statistical computing packages without understanding the details of statistical methods. The data, consisting of censored response times with heavy ties at one time point, were analyzed with a Cox regression model utilizing SAS PHREG and BMDP2L procedures. The p values, reported from both SAS PHREG and BMDP2L procedures, for testing the equality of two treatments vary considerably. This article illustrates that (1) the Breslow likelihood used in both BMDP2L and SAS PHREG procedures is too conservative and can have a critical effect on an extreme data set, (2) Wald's test in the SAS PHREG procedure may yield absurd results under monotone likelihood models, and (3) BMDP2L needs to include more than just the Breslow likelihood in future development.
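For context, the Breslow approximation to Cox's partial likelihood under tied event times, which the article identifies as the source of the discrepancy, has the standard form (generic notation):
\[
L_B(\beta) = \prod_{i=1}^{k} \frac{\exp(\beta^{\top} s_i)}{\bigl[\sum_{j \in R_i} \exp(\beta^{\top} x_j)\bigr]^{d_i}},
\]
where \(d_i\) events occur at the i-th distinct event time, \(s_i\) is the sum of their covariate vectors and \(R_i\) is the risk set; with heavy ties this approximation can behave poorly, which is the scenario examined here.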

9.
Parametric link transformation families have been shown to be useful in the analysis of binary regression data since they avoid the problem of link misspecification. Inference for these models is commonly based on likelihood methods. Duffy and Santner (1988, 1989), however, showed that ordinary logistic maximum likelihood estimators (MLEs) have poor mean square error (MSE) behavior in small samples compared with alternative norm-restricted estimators. This paper extends these alternative norm-restricted estimators to binary regression models with any specified parametric link family. The extended norm-restricted MLEs are strongly consistent and efficient under regularity conditions. Finally, a simulation study shows that an empirical version of the norm-restricted MLEs exhibits superior MSE behavior in small samples compared with MLEs computed under a fixed known link.

10.
This paper considers the detection of abrupt changes in the transition matrix of a Markov chain from a Bayesian viewpoint. It derives Bayes factors and posterior probabilities for an unknown number of change-points, as well as for the positions of the change-points, assuming non-informative but proper priors on the parameters and a fixed upper bound on the number of change-points. The Markov chain Monte Carlo approach proposed by Chib in 1998 for estimating multiple change-point models is adapted to the Markov chain model; it is especially useful when there are many possible change-points. The method can be applied in a wide variety of disciplines and is particularly relevant in the social and behavioural sciences, for analysing the effects of events on the attitudes of people.

11.
The mode of a distribution provides an important summary of data and is often estimated on the basis of some non-parametric kernel density estimator. This article develops a new data analysis tool called modal linear regression in order to explore high-dimensional data. Modal linear regression models the conditional mode of a response Y given a set of predictors x as a linear function of x. It differs from standard linear regression, which models the conditional mean (as opposed to the mode) of Y as a linear function of x. We propose an expectation-maximization algorithm to estimate the regression coefficients of modal linear regression, and we provide asymptotic properties of the proposed estimator without assuming symmetry of the error density. Our empirical studies with simulated and real data demonstrate that the proposed modal regression gives shorter predictive intervals than mean linear regression, median linear regression and MM-estimators.
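A minimal Python sketch of an EM-type iteration for modal linear regression with a Gaussian kernel of bandwidth h; this is an illustrative implementation consistent with the description above, not the authors' code:

    import numpy as np

    def modal_linear_regression(X, y, h, n_iter=200, tol=1e-8):
        """EM-style iteration: kernel weights (E-step) followed by weighted
        least squares (M-step); converges to a local maximizer of the
        kernel-smoothed modal objective."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting value
        for _ in range(n_iter):
            r = y - X @ beta
            w = np.exp(-0.5 * (r / h) ** 2)               # Gaussian kernel weights
            XtW = X.T * w
            beta_new = np.linalg.solve(XtW @ X, XtW @ y)  # weighted least squares
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta

    # Toy example with skewed errors, so the conditional mode differs from the mean.
    rng = np.random.default_rng(1)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.exponential(1.0, size=n)
    print(modal_linear_regression(X, y, h=0.5))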

12.
In survey sampling and in stereology, it is often desirable to estimate the ratio of means θ = E(Y)/E(X) from bivariate count data (X, Y) with unknown joint distribution. We review methods that are available for this problem, with particular reference to stereological applications. We also develop new methods based on explicit statistical models for the data, and associated model diagnostics. The methods are tested on a stereological dataset. For point-count data, binomial regression and bivariate binomial models are generally adequate. Intercept-count data are often overdispersed relative to Poisson regression models, but adequately fitted by negative binomial regression.
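For reference, the simplest estimator of θ = E(Y)/E(X) is the ratio of sample means, \(\hat\theta = \bar Y / \bar X\), whose delta-method variance (a generic formula, not specific to this paper) is
\[
\operatorname{Var}(\hat\theta) \approx \frac{1}{n\,\mu_X^2}\bigl(\sigma_Y^2 - 2\theta\,\sigma_{XY} + \theta^2 \sigma_X^2\bigr);
\]
the model-based approaches reviewed here replace this with explicit regression models for the counts.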

13.
Cook's distance (1977) has become the standard influence diagnostic tool for analyzing cross-sectional regression studies. This paper introduces an analogue of Cook's distance in fixed effects models for longitudinal data. We demonstrate that this statistic is dominated by the effects of nuisance parameters, and hence its effectiveness as an influence measure in the longitudinal data setting is limited.
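For reference, in the cross-sectional linear model Cook's distance for case i has the familiar closed form (standard notation; the longitudinal analogue developed in this paper is more involved):
\[
D_i = \frac{(\hat\beta - \hat\beta_{(i)})^{\top} X^{\top}X\, (\hat\beta - \hat\beta_{(i)})}{p\, s^{2}}
    = \frac{e_i^{2}}{p\, s^{2}} \cdot \frac{h_{ii}}{(1-h_{ii})^{2}},
\]
where \(\hat\beta_{(i)}\) is the fit with case i omitted, \(e_i\) the raw residual, \(h_{ii}\) the leverage, p the number of coefficients and \(s^2\) the residual variance estimate.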

14.
Conventional methods apply symmetric prior distributions, such as a normal or Laplace distribution, to the regression coefficients; these may be suitable for median regression but exhibit no robustness to outliers. This work develops quantile regression for a linear panel data model without heterogeneity from a Bayesian point of view, based on a location-scale mixture representation of the asymmetric Laplace error distribution, and shows how the posterior distribution can be summarized using Markov chain Monte Carlo methods. Applying this approach to the 1970 British Cohort Study (BCS) data, the analysis finds that different maternal health problems have different influences on the child's worrying status at different quantiles. In addition, applying stochastic search variable selection for the maternal health problems to the 1970 BCS data, it finds that maternal nervous breakdown, among the 25 maternal health problems considered, contributes most to the child's worrying status.
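The location-scale mixture representation mentioned above takes, for quantile level τ and unit scale, the standard form (a generic statement of a known result, not necessarily the paper's exact notation):
\[
\varepsilon \overset{d}{=} \xi W + \zeta \sqrt{W}\, Z, \qquad
W \sim \operatorname{Exp}(1),\; Z \sim N(0,1)\ \text{independent}, \qquad
\xi = \frac{1-2\tau}{\tau(1-\tau)}, \quad \zeta^{2} = \frac{2}{\tau(1-\tau)},
\]
which makes the conditional posterior of the regression coefficients Gaussian given the latent W's and so yields a convenient Gibbs sampler.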

15.
In real-data analysis, choosing the best subset of variables in a regression model is an important problem. Akaike's information criterion (AIC) is often used for variable selection in many fields. When the sample size is not large, however, the AIC has a non-negligible bias that can detrimentally affect variable selection. The present paper considers a bias correction of the AIC for selecting variables in the generalized linear model (GLM). The GLM can express a number of statistical models by changing the distribution and the link function, including the normal linear regression model, the logistic regression model and the probit model, which are commonly used in many applied fields. We obtain a simple expression for the bias-corrected AIC (corrected AIC, or CAIC) in GLMs and provide R code based on our formula. A numerical study reveals that the CAIC performs better than the AIC for variable selection.
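For orientation, the criterion being corrected is \(\mathrm{AIC} = -2\ell(\hat\theta) + 2p\); in the Gaussian linear model the classical small-sample correction is
\[
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2p(p+1)}{n-p-1},
\]
and the CAIC proposed here plays the analogous role for general GLMs, with a correction term that depends on the chosen distribution and link (its exact expression is given in the paper and is not reproduced here).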

16.
This paper presents a two-stage procedure for estimating the conditional support curve of a random variable Y, given the information in a random vector X. Quantile estimation is followed by an extremal analysis of the residuals, for problems that can be written as regression models. The technique is applied to data from the National Bureau of Economic Research and the US Census Bureau's Center for Economic Studies, which cover all four-digit manufacturing industries. Simulation results show that in linear regression models the proposed estimation procedure is more efficient than the extreme linear regression quantile.

17.
It sometimes occurs that one or more components of the data exert a disproportionate influence on the model estimation. A reliable tool is needed for identifying such troublesome cases, either to eliminate them from the sample when the data collection was flawed, or else to use the model with caution because the results could be affected by such components. Since Cook proposed a measure for detecting influential cases in the linear regression setting [Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15–18], several new measures, in addition to extensions of the same measure to other models, have been suggested as single-case diagnostics. For most of them cutoff values have been recommended (see, for instance, [D.A. Belsley, E. Kuh, and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed., John Wiley & Sons, New York, Chichester, Brisbane, 2004]); however, the lack of a quantile-type cutoff for Cook's statistic has led analysts to rely only on index plots as diagnostic tools. Focusing on logistic regression, the aim of this paper is to provide the asymptotic distribution of Cook's distance in order to obtain a meaningful cutoff point for detecting influential and leverage observations.

18.
The extremogram is a useful tool for measuring extremal dependence and checking model adequacy in a time series. We define the extremogram in the spatial domain, where the data are observed on a lattice or at locations distributed as a Poisson point process in d-dimensional space. We establish a central limit theorem for the empirical spatial extremogram, show that its conditions hold for max-moving average processes and Brown–Resnick processes, and illustrate the empirical extremogram's performance via simulation. We also demonstrate its practical use with a data set on rainfall in a region of Florida and on ground-level ozone in the eastern United States.
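For reference, the (spatial) extremogram for sets A and B bounded away from the origin is defined, in the style of Davis and Mikosch, as the limiting conditional probability
\[
\rho_{A,B}(h) = \lim_{x\to\infty} P\bigl(x^{-1}X(s+h) \in B \,\big|\, x^{-1}X(s) \in A\bigr),
\]
so that, with A = B = (1, ∞), it measures the chance that the field is extreme at spatial lag h given that it is extreme at s; the empirical version replaces the limit by exceedances of a high sample quantile.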

19.
This paper discusses a model in which the regression lines pass through a common point. Such a point arises as a focal point in wind-blown sand phenomena, and the resulting model is called the 'focal point regression model'. The focal point moves according to the conditions of the experiments or the measurement site, so it must be estimated together with the regression coefficients. The existence of the focal point has been proved mathematically in coastal engineering, but its physical meaning and an exact estimation method have not been established. Taking the experimental and/or measurement conditions into account, five models are proposed, distinguished by common or different error variances, by whether or not the lines pass through the centroid, and by a Bayes-like approach. Formulae for the direct computation of the focal point under certain conditions are also given for engineering purposes. The models are applied to wind-blown sand data, and their behaviour is verified by numerical experiments.
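A generic form of the focal point regression model (illustrative notation, not necessarily the paper's): all group-specific lines share an unknown common point \((x_0, y_0)\),
\[
y_{ij} = y_0 + \beta_i\,(x_{ij} - x_0) + \varepsilon_{ij}, \qquad i = 1,\dots,m,
\]
so the focal point \((x_0, y_0)\) is estimated jointly with the slopes \(\beta_i\), and the variants described above differ in the error-variance assumptions, in whether the lines are constrained through the centroid, and in the Bayes-like treatment.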

20.
The authors present a consistent lack-of-fit test for nonlinear regression models. The proposed procedure possesses the attractive properties of Zheng's test, such as consistency and the ability to detect local alternatives approaching the null at rates slower than the parametric rate. Moreover, for a predetermined kernel function, the proposed test is more powerful than Zheng's test, and the validity of these findings is confirmed by simulation studies and a real data example. In addition, the authors find a close connection between the choice of normal kernel function and the bandwidth. The Canadian Journal of Statistics 39: 108–125; 2011 © 2011 Statistical Society of Canada
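For context, Zheng's kernel-based lack-of-fit statistic, which the proposed test builds on, has the standard U-statistic form (generic notation):
\[
T_n = \frac{1}{n(n-1)}\sum_{i \ne j} \frac{1}{h^{d}} K\!\left(\frac{x_i - x_j}{h}\right)\hat e_i\, \hat e_j,
\]
where \(\hat e_i\) are residuals from the fitted parametric (here nonlinear) regression; under the null of a correctly specified model a suitably standardized \(T_n\) is asymptotically standard normal.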

