Similar Articles
20 similar articles found (search time: 625 ms)
1.
To explore and compare a finite number T of data sets by applying functional principal component analysis (FPCA) to the T associated probability density functions, we estimate these density functions using the multivariate kernel method. With the data-set sizes fixed, we study the behaviour of this FPCA under the assumption that all the bandwidth matrices used in the density estimation are proportional to a common parameter h times either the variance matrices or the identity matrix. In this context, we propose a selection criterion for h that depends only on the data and the FPCA method. On simulated examples, we then compare the quality of the FPCA approximation when the bandwidth matrices are selected with this criterion and with two classical bandwidth selection methods, namely a plug-in and a cross-validation method.
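The bandwidth scheme described above can be made concrete. Below is a minimal sketch (not the authors' code; the function name and the choice h = 0.5 are ours) of a multivariate Gaussian kernel density estimate whose bandwidth matrix is proportional to the sample variance matrix; the FPCA step would then be run on the resulting density evaluations for the T data sets.

```python
import numpy as np

def kde_density(data, grid, h):
    """Multivariate Gaussian kernel density estimate whose bandwidth
    matrix is h^2 times the sample covariance matrix, one of the two
    proportionality schemes discussed in the abstract."""
    n, d = data.shape
    H = h**2 * np.cov(data, rowvar=False)          # bandwidth matrix
    Hinv = np.linalg.inv(H)
    norm = 1.0 / (n * np.sqrt((2 * np.pi) ** d * np.linalg.det(H)))
    diffs = grid[:, None, :] - data[None, :, :]    # (m, n, d)
    quad = np.einsum('mnd,de,mne->mn', diffs, Hinv, diffs)
    return norm * np.exp(-0.5 * quad).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                      # one simulated data set
grid = np.array([[0.0, 0.0], [3.0, 3.0]])
dens = kde_density(X, grid, h=0.5)                 # density near/far from center
```

For T data sets, the same grid would be reused so that the T density vectors can be stacked and decomposed by principal components.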

2.
Recently, van der Linde (Comput. Stat. Data Anal. 53:517–533, 2008) proposed a variational algorithm for approximate Bayesian inference in functional principal components analysis (FPCA), where the functions are observed with Gaussian noise. Generalized FPCA under different noise models with sparse longitudinal data was developed by Hall et al. (J. R. Stat. Soc. B 70:703–723, 2008), but no Bayesian approach is available yet. It is demonstrated that an adapted version of the variational algorithm can be applied to obtain a Bayesian FPCA for canonical parameter functions, particularly log-intensity functions given Poisson count data or logit-probability functions given binary observations. To this end, a second-order Taylor expansion of the log-likelihood, that is, a working Gaussian distribution and hence a further step of approximation, is used. Although the approach is conceptually straightforward, difficulties can arise in practical applications depending on the accuracy of the approximation and the information in the data. A modified algorithm is introduced for general one-parameter exponential families and exemplified for binary and count data. Conditions for its successful application are discussed and illustrated using simulated data sets. An application to real data is also presented.
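The "working Gaussian distribution" from a second-order Taylor expansion can be sketched for the Poisson case. Assuming the canonical (log-intensity) parameterisation, expanding the log-likelihood y·η − exp(η) around a current value η yields a Gaussian pseudo-observation z with variance 1/exp(η); function and variable names below are illustrative, not from the paper.

```python
import numpy as np

def poisson_working_gaussian(y, eta):
    """Second-order Taylor expansion of the Poisson log-likelihood
    l(eta) = y*eta - exp(eta) around the current eta: the curvature is
    -exp(eta), so the likelihood is approximated by a Gaussian in eta
    with working response z and variance 1/exp(eta)."""
    w = np.exp(eta)            # negative second derivative of l at eta
    z = eta + (y - w) / w      # Newton step from eta: working observation
    return z, 1.0 / w

y = np.array([3.0, 0.0, 7.0])                 # Poisson counts
eta = np.log(np.array([2.0, 1.0, 8.0]))       # current log-intensities
z, var = poisson_working_gaussian(y, eta)
```

The Gaussian-noise variational machinery can then be applied to (z, var) in place of the original counts, as the adapted algorithm does.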

3.
Abstract

This paper considers a partially nonlinear model E(Y|X, z, t) = f(X, β) + z^T g(t) and gives its T-type estimate, a weighted quasi-likelihood estimate based on the sieve method that can be computed by the EM algorithm. The influence functions and asymptotic properties of the T-type estimate (consistency and asymptotic normality) are discussed, and convergence rates for both the parametric and nonparametric components are obtained. Simulation results show the shape of the influence functions and demonstrate that the T-type estimate performs quite well. The proposed estimate is also applied to a data set and compared with the least-squares and least-absolute-deviation estimates.

4.
ABSTRACT

Consider the problem of estimating the positions of a set of targets in a multidimensional Euclidean space from distances reported by a number of observers who do not know their own positions in the space. Each observer reports the distance from the observer to each target plus a random error. This statistical problem is the basic model for the various forms of what the psychometric literature calls multidimensional unfolding. Multidimensional unfolding methodology, as developed in cognitive psychology, is essentially a statistical estimation problem in which the data are monotonic functions of Euclidean distances between a number of observers and targets in a multidimensional space. The new method presented in this article estimates the target locations and the observer positions when the observations are functions of the squared observer-target distances, observed with additive random error in a two-dimensional space. The method provides robust estimates of the target locations under the parametric data-generating model presented in the article. It also yields estimates of the orientation of the coordinate system and of the mean and variances of the observer locations; the mean and variances are not estimated by standard unfolding methods, which yield target maps that are invariant to rotation of the coordinate system. The data are transformed so that the nonlinearity due to the squared observer locations is removed. The sampling properties of the estimates are derived from the asymptotic variances of the additive errors of a maximum likelihood factor analysis of the sample covariance matrix of the transformed data, augmented with bootstrapping. The robustness of the new method is tested using artificial data, and the method is applied to a 2001 survey data set from Turkey as a real-data example.

5.
Abstract. Maximum likelihood estimation in many classical statistical problems is beset by multimodality. This article explores several variations of deterministic annealing that tend to avoid inferior modes and find the dominant mode. In Bayesian settings, annealing can be tailored to find the dominant mode of the log posterior. Our annealing algorithms involve essentially trivial changes to existing optimization algorithms built on block relaxation or the EM or MM principle. Our examples include estimation with the multivariate t distribution, Gaussian mixture models, latent class analysis, factor analysis, multidimensional scaling and a one-way random effects model. In the numerical examples explored, the proposed annealing strategies significantly improve the chances of locating the global maximum.
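For the Gaussian-mixture example, the "essentially trivial change" is to raise the E-step responsibilities to an inverse temperature β that is annealed up to 1. A rough one-dimensional sketch follows (our own illustration; the annealing schedule and all names are ours, and whether an annealed run actually escapes a minor mode depends on the schedule and the data).

```python
import numpy as np

def daem_gmm(x, k, betas=(0.2, 0.5, 0.8, 1.0), iters=30, seed=0):
    """Deterministic-annealing EM for a 1-D Gaussian mixture: the
    E-step responsibilities are tempered by an inverse temperature
    beta that is slowly increased to 1, flattening the objective in
    early sweeps so the iterates are less attracted to minor modes."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k)                      # initial means from the data
    sd = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    for beta in betas:
        for _ in range(iters):
            # tempered E-step: annealed responsibilities
            logp = (np.log(w) - 0.5 * np.log(2 * np.pi * sd**2)
                    - 0.5 * ((x[:, None] - mu) / sd) ** 2)
            r = np.exp(beta * (logp - logp.max(axis=1, keepdims=True)))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: ordinary weighted updates
            nk = r.sum(axis=0)
            w = nk / nk.sum()
            mu = (r * x[:, None]).sum(axis=0) / nk
            sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    return w, mu, sd

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])
w, mu, sd = daem_gmm(x, 2)
```

At β = 1 the inner loop is exactly the standard EM update, which is why the change to an existing implementation is so small.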

6.
This paper proposes a variable selection method for detecting abnormal items based on the T2 test when observations on abnormal items are available. Based on unbiased estimates of the power for all subsets of variables, the method selects the subset of variables that maximizes the power estimate. Since more than one subset of variables frequently maximizes the power estimate, the averaged p-value of the rejected items is used as a second criterion. Although the performance of the method depends on the sample size for the abnormal items and on the true power values for all subsets of variables, numerical experiments show the effectiveness of the proposed method. Since normal and abnormal items are simulated using one-factor and two-factor models, basic properties of the power functions for these models are also investigated.

7.
Multivariate control charts are used to monitor stochastic processes for changes and unusual observations. Hotelling's T2 statistic is calculated for each new observation and an out‐of‐control signal is issued if it goes beyond the control limits. However, this classical approach becomes unreliable as the number of variables p approaches the number of observations n, and impossible when p exceeds n. In this paper, we devise an improvement to the monitoring procedure in high‐dimensional settings. We regularise the covariance matrix to estimate the baseline parameter and incorporate a leave‐one‐out re‐sampling approach to estimate the empirical distribution of future observations. An extensive simulation study demonstrates that the new method outperforms the classical Hotelling T2 approach in power, and maintains appropriate false positive rates. We demonstrate the utility of the method using a set of quality control samples collected to monitor a gas chromatography–mass spectrometry apparatus over a period of 67 days.
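For reference, the classical statistic in the p much smaller than n regime is straightforward to compute; the regularised, leave-one-out version proposed above replaces the plain covariance inverse and the parametric control limits. A minimal sketch (names and data are ours):

```python
import numpy as np

def hotelling_t2(baseline, new_obs):
    """Classical Hotelling T^2 statistic for each new observation
    against a baseline sample: (x - mu)' S^{-1} (x - mu). This is the
    p << n case; the abstract's contribution is a regularised variant
    for p close to or exceeding n."""
    mu = baseline.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(baseline, rowvar=False))
    d = new_obs - mu
    return np.einsum('ij,jk,ik->i', d, Sinv, d)

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))            # in-control baseline sample
in_control = rng.normal(size=(1, 3))
shifted = in_control + 10.0                 # a clearly out-of-control point
t2 = hotelling_t2(base, np.vstack([in_control, shifted]))
```

A signal would be raised whenever t2 exceeds the chosen control limit; the resampling scheme in the abstract estimates that limit empirically instead of from the F distribution.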

8.
This paper introduces regularized functional principal component analysis for multidimensional functional data sets, utilizing Gaussian basis functions. An essential point in a functional approach via basis expansions is the evaluation of the matrix of integrals of products of any two bases (the cross-product matrix). Advantages of Gaussian basis functions in the functional approach are that their cross-product matrix can be calculated easily and that they provide a much more flexible instrument for transforming each individual's observations into functional form. The proposed method is applied to the analysis of three-dimensional (3D) protein structural data, which can be regarded as unbalanced data; the application shows that our method extracts useful information from such data. Numerical experiments are conducted to compare the effectiveness of our method based on Gaussian basis functions with a method based on B-splines, for which we also derive the exact form of the cross-product matrix. The numerical results show that our methodology is superior to the B-spline-based method for unbalanced data.

9.
Abstract. We consider N independent stochastic processes (X_i(t), t ∈ [0, T_i]), i = 1, …, N, defined by a stochastic differential equation whose drift term depends on a random variable φ_i. The distribution of the random effect φ_i depends on unknown parameters which are to be estimated from continuous observation of the processes X_i. We give the expression of the exact likelihood. When the drift term depends linearly on the random effect φ_i and φ_i has a Gaussian distribution, an explicit formula for the likelihood is obtained. We prove that the maximum likelihood estimator is consistent and asymptotically Gaussian when T_i = T for all i and N tends to infinity. We discuss the case of discrete observations. Estimators are computed on simulated data for several models and show good performance even when the length of the observation time interval is not very large.

10.
Moment generating functions and, more generally, integral transforms have been used for goodness-of-fit tests for several decades. Given a set of observations, the empirical transform is easy to compute, being simply a sample mean, and by uniqueness properties such functions can be used for goodness-of-fit tests. This paper focuses on time series observations from a stationary process for which the moment generating function exists and the correlations have long memory. For long-memory processes, the infinite sum of the correlations diverges and the realizations tend to show spurious trend-like patterns where there may be none. Our aim is to use the empirical moment generating function to test the null hypothesis that the marginal distribution is Gaussian. We provide a simple proof of a central limit theorem using ideas from Gaussian subordination models (Taqqu, 1975) and derive critical regions for a graphical test of normality, namely the T3-plot (Ghosh, 1996). Simulated and real data examples are used for illustration.
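The empirical transform referred to above is indeed just a sample mean. A minimal sketch (illustrative only: it uses i.i.d. rather than long-memory data and does not implement the T3-plot critical regions):

```python
import numpy as np

def empirical_mgf(x, t):
    """Empirical moment generating function: the sample mean of
    exp(t * X_i), evaluated at each value in the grid t."""
    return np.exp(np.outer(t, x)).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=5000)                 # marginal N(0, 1) sample
t = np.array([0.0, 0.5, 1.0])
m_hat = empirical_mgf(x, t)
m_gauss = np.exp(t**2 / 2)                # N(0,1) mgf: exp(t^2 / 2)
```

Under the Gaussian null the empirical curve should track exp(t²/2); the T3-plot compares a standardised version of this discrepancy against the critical regions derived in the paper.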

11.
Consider semi-competing risks data (two times to concurrent events are studied, but only one of them is right-censored by the other), where the link between the times Y and C to the non-terminal and terminal events, respectively, is modeled by a family of Archimedean copulas. Moreover, both Y and C are subject to an independent right-censoring variable D. We propose to estimate the copula parameter and some resulting survival functions using a pseudo-maximum-likelihood approach. The main advantage of this procedure is that it extends to copulas with multidimensional parameters. We perform simulations to study the behavior of our estimation procedure and its impact on related estimators, and we apply our method to real data from a study of Hodgkin disease.

12.
A doubly censoring scheme occurs when the lifetimes T being measured, from a well-known time origin, are exactly observed within a window [L, R] of observational time and are otherwise censored either from above (right-censored observations) or from below (left-censored observations). The sample data consist of the pairs (U, δ), where U = min{R, max{T, L}} and δ indicates whether T is exactly observed (δ = 0), right-censored (δ = 1) or left-censored (δ = −1). We are interested in estimating the marginal behaviour of the three random variables T, L and R based on the observed pairs (U, δ). We propose new nonparametric simultaneous marginal estimators Ŝ_T, Ŝ_L and Ŝ_R for the survival functions of T, L and R, respectively, by means of an inverse-probability-of-censoring approach. The proposed estimators are not computationally intensive, generalize the empirical survival estimator and reduce to the Kaplan-Meier estimator in the absence of left-censored data. Furthermore, Ŝ_T is equivalent to a self-consistent estimator, is uniformly strongly consistent and asymptotically normal. The method is illustrated with data from a cohort of drug users recruited in a detoxification program in Badalona (Spain). For these data we estimate the survival function of the elapsed time from starting IV drug use to AIDS diagnosis, as well as the potential follow-up time. A simulation study assesses the performance of the three survival estimators for moderate sample sizes and different censoring levels.
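As a point of reference for Ŝ_T, which reduces to the Kaplan-Meier estimator when no observation is left-censored, here is a minimal Kaplan-Meier sketch in the abstract's (U, δ) notation restricted to δ ∈ {0, 1} (our own illustration, not the proposed simultaneous estimator):

```python
import numpy as np

def kaplan_meier(u, delta):
    """Kaplan-Meier survival estimate in the abstract's coding:
    delta = 0 marks an exactly observed time, delta = 1 a
    right-censored one. Returns sorted times and S(t) just after
    each observed time (no ties handled for brevity)."""
    order = np.argsort(u)
    u, delta = u[order], delta[order]
    n = len(u)
    surv = 1.0
    times, s = [], []
    for i in range(n):
        at_risk = n - i
        if delta[i] == 0:                  # exact event time: step down
            surv *= 1.0 - 1.0 / at_risk
        times.append(u[i])
        s.append(surv)
    return np.array(times), np.array(s)

u = np.array([2.0, 3.0, 4.0, 5.0, 8.0])
delta = np.array([0, 0, 1, 0, 0])          # one right-censored time at t = 4
t, s = kaplan_meier(u, delta)
```

The inverse-probability-of-censoring estimators in the abstract generalise this by reweighting each exact observation to also account for left censoring.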

13.
ABSTRACT

We develop saddlepoint approximations to the transition functions of general subordinator processes and derive explicit expressions for the first- and second-order approximations. Specifically, we consider some particular classes of subordinators, including Poisson processes, Gamma processes, α-stable subordinators and Poisson random integrals. We test the technique on the Poisson and Gamma processes, which have closed-form transition functions, and the approximate expressions are consistent with the true transition functions. We then use the method to predict transition density functions for α-stable subordinator processes, and finally calculate approximate transition densities for some Poisson random integrals. The numerical analysis shows that the saddlepoint approximations predict the transition densities of the α-stable processes and the Poisson random integrals very accurately.
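For the Gamma-process case the first-order approximation can be written in closed form. Assuming the rate parameterisation, the cumulant generating function is K(s) = −a·log(1 − s/rate), the saddlepoint solves K′(ŝ) = x, and the approximation is exp(K(ŝ) − ŝx)/√(2πK″(ŝ)). The sketch below (our own, with illustrative names) compares it with the exact Gamma density, to which it is close because the formula amounts to replacing Γ(a) by Stirling's approximation.

```python
import math

def gamma_saddlepoint(x, a, rate):
    """First-order saddlepoint density for a Gamma(a, rate) marginal
    of a Gamma subordinator: K(s) = -a*log(1 - s/rate), and
    K'(s_hat) = x gives s_hat = rate - a/x."""
    s_hat = rate - a / x
    K = -a * math.log(1.0 - s_hat / rate)
    K2 = a / (rate - s_hat) ** 2               # K''(s_hat) = x^2 / a
    return math.exp(K - s_hat * x) / math.sqrt(2.0 * math.pi * K2)

def gamma_pdf(x, a, rate):
    """Exact Gamma(a, rate) density for comparison."""
    return rate**a * x**(a - 1) * math.exp(-rate * x) / math.gamma(a)

approx = gamma_saddlepoint(2.0, 5.0, 2.0)
exact = gamma_pdf(2.0, 5.0, 2.0)
```

The approximation slightly exceeds the exact density here because Stirling's formula underestimates Γ(a); renormalising the approximate density removes most of this error.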

14.
A Bayesian analysis is provided for the Wilcoxon signed-rank statistic (T+). The analysis is based on a sign-bias parameter φ on the (0, 1) interval. For a uniform prior distribution on φ and for small sample sizes (i.e., 6 ≤ n ≤ 25), values of the statistic T+ are computed that enable probabilistic statements about φ. For larger sample sizes, approximations are provided for the asymptotic likelihood function P(T+|φ) as well as for the posterior distribution P(φ|T+). Power analyses are examined both for properly specified Gaussian sampling and for misspecified non-Gaussian models. The new Bayesian metric has high power efficiency, in the range of 0.9–1 relative to a standard t test, under Gaussian sampling. If the sampling is instead from an unknown and misspecified distribution, the new statistic still has high power; in some cases the power can be higher than that of the t test (especially for probability mixtures and heavy-tailed distributions). The new Bayesian analysis is thus a useful and robust method for applications where the usual parametric assumptions are questionable. These properties further enable a generic Bayesian analysis for many non-Gaussian distributions that currently lack a formal Bayesian model.
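Under the sign-bias model, each rank i ∈ {1, …, n} contributes to T+ independently with probability φ, so for small n the exact likelihood P(T+|φ) is the coefficient array of the generating polynomial ∏ᵢ((1−φ) + φ·zⁱ), and a uniform-prior posterior follows by normalisation. A small sketch (our own discretised illustration, not the paper's analysis):

```python
import numpy as np

def tplus_pmf(n, phi):
    """P(T+ = t | phi) under the sign-bias model: rank i enters the
    positive-rank sum independently with probability phi, so the pmf
    is built by polynomial convolution over ranks 1..n."""
    pmf = np.array([1.0])
    for i in range(1, n + 1):
        new = np.zeros(len(pmf) + i)
        new[:len(pmf)] += (1 - phi) * pmf     # rank i is negative
        new[i:] += phi * pmf                  # rank i is positive
        pmf = new
    return pmf

def posterior_phi(n, t_obs, grid):
    """Grid posterior for phi under a uniform prior (a discrete
    stand-in for the continuous analysis in the abstract)."""
    like = np.array([tplus_pmf(n, p)[t_obs] for p in grid])
    return like / like.sum()

grid = np.linspace(0.01, 0.99, 99)
pmf_half = tplus_pmf(10, 0.5)            # full pmf of T+ for n = 10, phi = 0.5
post = posterior_phi(10, 45, grid)       # observed T+ = 45 (maximum is 55)
```

Since E[T+] = φ·n(n+1)/2, an observed T+ of 45 with n = 10 pushes the posterior mass toward large φ.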

15.
Typical panel data models assume that the regression parameters are the same for each individual cross-sectional unit. We propose tests for slope heterogeneity in panel data models. Our tests are based on the conditional Gaussian likelihood function in order to avoid the incidental parameters problem induced by the inclusion of individual fixed effects for each cross-sectional unit. We derive the conditional Lagrange multiplier test, which is valid when N → ∞ and T is fixed, and applies to both balanced and unbalanced panels. We extend the test to account for general heteroskedasticity, where each cross-sectional unit has its own form of heteroskedasticity; this modification is possible if T is large enough to estimate the regression coefficients for each cross-sectional unit, using the MINQUE unbiased estimator for the regression variances under heteroskedasticity. All versions of the test have a standard normal distribution under general assumptions on the error distribution as N → ∞. A Monte Carlo experiment shows that the test has very good size properties under all specifications considered, including heteroskedastic errors. In addition, the power of our test is very good relative to existing tests, particularly when T is not large.

16.
The Lasso has sparked interest in the use of penalization of the log‐likelihood for variable selection, as well as for shrinkage. We are particularly interested in the more‐variables‐than‐observations case of characteristic importance for modern data. The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients, which have been given independent, double exponential prior distributions, is adopted. Generalizing this prior provides a family of hyper‐Lasso penalty functions, which includes the quasi‐Cauchy distribution of Johnstone and Silverman as a special case. The properties of this approach, including the oracle property, are explored, and an EM algorithm for inference in regression problems is described. The posterior is multi‐modal, and we suggest a strategy of using a set of perfectly fitting random starting values to explore modes in different regions of the parameter space. Simulations show that our procedure provides significant improvements on a range of established procedures, and we provide an example from chemometrics.
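The Lasso starting point above, the MAP estimate under independent double-exponential priors, can be computed by cyclic coordinate descent with soft-thresholding. A minimal sketch of that standard Lasso (our own; the hyper-Lasso penalties and the multi-start EM strategy of the abstract are not implemented here):

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, iters=200):
    """Plain Lasso for 0.5*||y - X beta||^2 + lam*||beta||_1 by cyclic
    coordinate descent: each coefficient is updated against the partial
    residual that excludes its own contribution."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]     # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)  # only feature 0 active
beta = lasso_cd(X, y, lam=20.0)
```

The hyper-Lasso penalties change only the thresholding rule, at the cost of a multimodal objective, which is what motivates the paper's multiple random starting values.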

17.
We consider the conditional estimation of the survival function of the time T2 to a second event, as a function of the time T1 to a first event, when a censoring mechanism acts on their sum T1 + T2. The problem is motivated by a treatment interruption study aimed at improving the quality of life of HIV-infected patients. We base the analysis on the survival function of T2 given that T1 ∈ I, where I represents a period of scientific interest (one trimester, one year, two years, etc.), and propose a non-parametric estimator for this conditional survival function that takes into account both the selection bias and the heterogeneity due to the dependent censoring. The proposed estimator uses the risk group of T2 conditioned on the categories of T1 and corrects for the dependent censoring using weights defined by the observed values of T1. The estimator, properly normalized, converges weakly to a zero-mean Gaussian process, and we estimate the variance of the limiting process via a bootstrap methodology. Properties of the proposed estimator are illustrated by an extensive simulation study, and the motivating data set is analysed by means of the new methodology.

18.
In functional linear regression, one conventional approach is to first perform functional principal component analysis (FPCA) on the functional predictor and then use the first few leading functional principal component (FPC) scores to predict the response variable. The leading FPCs estimated by conventional FPCA capture the major sources of variation in the functional predictor, but they need not be the components most correlated with the response variable, so the prediction accuracy of the functional linear regression model may not be optimal. In this paper, we propose a supervised version of FPCA that takes into account the correlation between the functional predictor and the response variable. It automatically estimates leading FPCs that represent the major sources of variation in the functional predictor and are simultaneously correlated with the response variable. Our supervised FPCA method is demonstrated to have better prediction accuracy than conventional FPCA using a real application to electroencephalography (EEG) data and three carefully designed simulation studies.
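The gap between variance-ranked and response-relevant components can be illustrated crudely: compute ordinary PC scores of the discretised curves, then rank them by absolute correlation with the response instead of by variance. This is only a naive proxy for the supervised FPCA criterion, not the authors' estimator (all names and the simulated data are ours):

```python
import numpy as np

def response_ranked_pcs(Xfun, y, k):
    """Rank ordinary PC scores of discretised curves by their absolute
    correlation with the response and keep the top k -- a crude
    stand-in for a supervised FPC criterion."""
    Xc = Xfun - Xfun.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T
    corr = np.array([abs(np.corrcoef(scores[:, j], y)[0, 1])
                     for j in range(scores.shape[1])])
    keep = np.argsort(corr)[::-1][:k]
    return keep, scores[:, keep]

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
# component 0 (sin) dominates the curve variance, but the response is
# driven by component 1 (cos)
a = rng.normal(scale=5.0, size=200)
b = rng.normal(scale=1.0, size=200)
X = a[:, None] * np.sin(2 * np.pi * t) + b[:, None] * np.cos(2 * np.pi * t)
X = X + rng.normal(scale=0.05, size=X.shape)      # small measurement noise
y = b + rng.normal(scale=0.1, size=200)
keep, s = response_ranked_pcs(X, y, 1)
```

Variance ranking would select the sin component first and predict poorly; response ranking picks the cos component, which is the point the abstract makes about conventional FPCA.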

19.
ABSTRACT

We propose a semiparametric approach to estimate the existence and location of a statistical change-point to a nonlinear multivariate time series contaminated with an additive noise component. In particular, we consider a p-dimensional stochastic process of independent multivariate normal observations where the mean function varies smoothly except at a single change-point. Our approach involves conducting a Bayesian analysis on the empirical detail coefficients of the original time series after a wavelet transform. If the mean function of our time series can be expressed as a multivariate step function, we find our Bayesian-wavelet method performs comparably with classical parametric methods such as maximum likelihood estimation. The advantage of our multivariate change-point method is seen in how it applies to a much larger class of mean functions that require only general smoothness conditions.
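The role of the detail coefficients can be seen with the simplest (Haar) wavelet: pairwise scaled differences are near zero where the mean function is smooth and spike at a jump. A toy univariate sketch (our own; the paper places a Bayesian analysis on the empirical detail coefficients of a multivariate transform):

```python
import numpy as np

def haar_details(x):
    """One level of Haar detail coefficients: scaled differences of
    consecutive pairs, which stay small where the mean is smooth and
    spike at a jump in the mean."""
    x = x[:len(x) // 2 * 2]                    # drop a trailing odd sample
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

rng = np.random.default_rng(0)
# mean jumps from 0 to 5 inside pair number 31 (indices 62 and 63)
x = np.concatenate([rng.normal(0.0, 0.1, 63), rng.normal(5.0, 0.1, 65)])
d = haar_details(x)
```

A change-point estimate then amounts to locating the coefficient whose posterior favours a nonzero mean, here simply the largest |d|.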

20.
Abstract

We propose a simple procedure based on an existing “debiased” l1-regularized method for inference on the average partial effects (APEs) in approximately sparse probit and fractional probit models with panel data, where the number of time periods is fixed and small relative to the number of cross-sectional observations. Our method is computationally simple and does not suffer from the incidental parameters problem that comes from attempting to estimate, as a parameter, the unobserved heterogeneity of each cross-sectional unit. Furthermore, it is robust to arbitrary serial dependence in the underlying idiosyncratic errors. Our theoretical results illustrate that inference on APEs is more challenging than inference on fixed, low-dimensional parameters: the former requires deriving asymptotic normality for sample averages of linear functions of a potentially large set of components of our estimator when a series approximation for the conditional mean of the unobserved heterogeneity is used. Insights on the applicability and implications of other existing Lasso-based inference procedures for our problem are provided. We apply the debiasing method to estimate the effects of spending on test pass rates. Our results show that spending has a positive and statistically significant average partial effect, comparable to that found using standard parametric methods.
