Similar articles
 20 similar articles found (search time: 828 ms)
1.
Summary. Social science applications of sequence analysis have thus far involved the development of a typology on the basis of an analysis of one or two variables which have had a relatively low number of different states. There is a yet unexplored potential for sequence analysis to be applied to a greater number of variables and thereby a much larger state space. The development of a typology of employment experiences, for example, without reference to data on changes in housing, marital and family status is arguably inadequate. The paper demonstrates the use of sequence analysis in the examination of multivariable combinations of status as they change over time and shows that this method can provide insights that are difficult to achieve through other analytic methods. The data that are examined here provide support to intuitive understandings of clusters of common experiences which are both life course specific and related to socio-economic factors. Housing tenure is found to be of key importance in understanding the holistic trajectories that are examined. This suggests that life course trajectories are sharply differentiated by experience of social housing.

2.
Summary. Sequence analysis has become one of the most used and discussed tools to describe life course trajectories. We introduce a new tool for the graphical exploratory analysis of sequences. Our plots combine standard sequence plots with the results that are provided by multi-dimensional scaling. We apply our procedure to describe work and family careers of Israeli women by using data from the Israel Social Mobility Survey. We first focus on some preliminary choices relative to the definition of the sequences: the age span, the length of the sequences and the set of states registered in each time period. We then describe how our plots can be used to gain insights about the main features of sequences and about the relationships between sequences and external information.

3.
We improve a Monte Carlo algorithm which computes accurate approximations of smooth functions on multidimensional Tchebychef polynomials by using quasi-random sequences. We first show that the previous algorithm converges twice as fast with these sequences. Then, we slightly modify this algorithm to make it work from a single set of random or quasi-random points. This especially leads to a Quasi-Monte Carlo method with an increased rate of convergence for numerical integration.
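
To make the contrast between random and quasi-random points concrete, here is a minimal sketch of quasi-Monte Carlo integration on the unit square. It is not the algorithm of the abstract above: the Halton sequence, the test integrand `x * y` (exact integral 0.25) and all function names are assumptions chosen only for illustration.

```python
import random


def radical_inverse(index, base):
    """Van der Corput radical inverse of `index` in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(index):
    """2-D Halton point built from bases 2 and 3 (index starts at 1)."""
    return radical_inverse(index, 2), radical_inverse(index, 3)


def integrand(x, y):
    """Hypothetical smooth test function; its integral over [0, 1]^2 is 0.25."""
    return x * y


def qmc_estimate(n):
    """Average the integrand over the first n Halton points."""
    return sum(integrand(*halton_point(i)) for i in range(1, n + 1)) / n


def mc_estimate(n, seed=0):
    """Average the integrand over n pseudo-random points, for comparison."""
    rng = random.Random(seed)
    return sum(integrand(rng.random(), rng.random()) for _ in range(n)) / n


if __name__ == "__main__":
    for n in (64, 1024, 4096):
        print(n, abs(qmc_estimate(n) - 0.25), abs(mc_estimate(n) - 0.25))
```

Running the script prints the absolute error of both estimators for a few sample sizes, so the two error decays can be compared directly.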

4.
In the field of molecular biology, it is often of interest to analyze microarray data for clustering genes based on similar profiles of gene expression to identify genes that are differentially expressed under multiple biological conditions. One of the notable characteristics of a gene expression profile is that it shows a cyclic curve over a course of time. To group sequences of similar molecular functions, we propose a Bayesian Dirichlet process mixture of linear regression models with a Fourier series for the regression coefficients, for each of which a spike and slab prior is assumed. A full Gibbs-sampling algorithm is developed for an efficient Markov chain Monte Carlo (MCMC) posterior computation. Due to the so-called “label-switching” problem and different numbers of clusters during the MCMC computation, a post-process approach of Fritsch and Ickstadt (2009) is additionally applied to MCMC samples for an optimal single clustering estimate by maximizing the posterior expected adjusted Rand index with the posterior probabilities of two observations being clustered together. The proposed method is illustrated with two simulated data sets and one real data set on the physiological response of fibroblasts to serum from Iyer et al. (1999).

5.
Clustering algorithms are used in the analysis of gene expression data to identify groups of genes with similar expression patterns. These algorithms group genes with respect to a predefined dissimilarity measure without using any prior classification of the data. Most of the clustering algorithms require the number of clusters as input, and all the objects in the dataset are usually assigned to one of the clusters. We propose a clustering algorithm that finds clusters sequentially, and allows for sporadic objects, so there are objects that are not assigned to any cluster. The proposed sequential clustering algorithm has two steps. First it finds candidates for centers of clusters. Multiple candidates are used to make the search for clusters more efficient. Secondly, it conducts a local search around the candidate centers to find the set of objects that defines a cluster. The candidate clusters are compared using a predefined score, the best cluster is removed from data, and the procedure is repeated. We investigate the performance of this algorithm using simulated data and we apply this method to analyze gene expression profiles in a study on the plasticity of the dendritic cells.

6.
Accurate and efficient methods to detect unusual clusters of abnormal activity are needed in many fields such as medicine and business. Often the size of clusters is unknown; hence, multiple (variable) window scan statistics are used to identify clusters using a set of different potential cluster sizes. We give an efficient method to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. We define a Markov chain to efficiently keep track of probabilities needed to compute p-values for the statistic. The state space of the Markov chain is set up by a criterion developed to identify strings that are associated with observing the specified values of the statistic. Using our algorithm, we identify cases where the available approximations do not perform well. We demonstrate our methods by detecting unusual clusters of made free throw shots by National Basketball Association players during the 2009–2010 regular season.
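
As a rough illustration of what a multiple window discrete scan statistic measures, the sketch below computes, for each window length, the largest number of successes in any run of consecutive trials. The exact tail probability is obtained by brute-force enumeration under an assumed i.i.d. Bernoulli model, which is only feasible for very short sequences; it is not the Markov chain method of the abstract above, and all names and data are hypothetical.

```python
from itertools import product


def scan_statistic(outcomes, window):
    """Largest number of successes (1s) in any `window` consecutive trials."""
    if window <= 0 or window > len(outcomes):
        raise ValueError("window must be between 1 and len(outcomes)")
    current = sum(outcomes[:window])
    best = current
    # Slide the window one trial at a time, updating the count incrementally.
    for i in range(window, len(outcomes)):
        current += outcomes[i] - outcomes[i - window]
        best = max(best, current)
    return best


def multiple_window_scan(outcomes, windows):
    """Scan statistic for each candidate window length."""
    return {w: scan_statistic(outcomes, w) for w in windows}


def exact_tail_probability(n, window, threshold, p):
    """P(scan statistic >= threshold) for n i.i.d. Bernoulli(p) trials.

    Enumerates all 2**n sequences, so it is only a check for small n.
    """
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if scan_statistic(seq, window) >= threshold:
            successes = sum(seq)
            total += p ** successes * (1 - p) ** (n - successes)
    return total


if __name__ == "__main__":
    shots = [1, 0, 1, 1, 1, 0, 0, 1, 1, 0]  # made-up free throw record
    print(multiple_window_scan(shots, (3, 5)))
    print(exact_tail_probability(n=10, window=3, threshold=3, p=0.5))
```

The enumeration grows as 2^n, which is why an efficient exact method matters for realistic sequence lengths.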

7.
A developmental trajectory describes the course of behavior over time. Identifying multiple trajectories within an overall developmental process permits a focus on subgroups of particular interest. We introduce a framework for identifying trajectories by using the Expectation-Maximization (EM) algorithm to fit semiparametric mixtures of logistic distributions to longitudinal binary data. For performance comparison, we consider full maximization algorithms (PROC TRAJ in SAS), standard EM, and two other EM-based algorithms for speeding up convergence. Simulation shows that EM methods produce more accurate parameter estimates. The EM methodology is illustrated with a longitudinal dataset involving adolescents' smoking behaviors.

8.
An early goal in autonomous navigation research is to build a research vehicle which can travel through office areas and factory floors. A simple strategy for directing the robot's movement in a hallway is to maintain a fixed distance from the wall. The problem is complicated by the fact that there are many factors in the environment, such as opened doors, pillars or other temporary objects, that can introduce 'noise' into the distance measure. To maintain a proper path with minimum interruption, the robot should have the ability to make decisions, based on measurements, and adjust its course only when it is deemed necessary. This report describes a new algorithm which enables the robot to move along and maintain a fixed distance from a reference object. The method, based on a robust estimator of the location, combines information from earlier measurements with current observations from range sensors to effectively produce an estimate of the distance between the robot and the object. A simulation study, showing the trajectories generated using this algorithm with different parameters for different environments, is presented.
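
The idea of combining earlier and current range measurements through a robust location estimator can be sketched as follows. The abstract does not say which estimator is used, so the running median, the dead band and the class name `WallFollower` are assumptions made only for illustration.

```python
from collections import deque
from statistics import median


class WallFollower:
    """Toy controller: robust distance estimate plus a dead-band correction."""

    def __init__(self, target, window=5, dead_band=0.05):
        self.target = target          # desired distance from the wall
        self.dead_band = dead_band    # errors this small are ignored
        self.readings = deque(maxlen=window)  # most recent range readings

    def update(self, measurement):
        """Return (robust distance estimate, steering correction)."""
        self.readings.append(measurement)
        # The median ignores isolated outliers such as an open door.
        estimate = median(self.readings)
        error = estimate - self.target
        # Adjust the course only when the robust error is large enough.
        correction = 0.0 if abs(error) <= self.dead_band else -error
        return estimate, correction


if __name__ == "__main__":
    follower = WallFollower(target=1.0)
    for reading in [1.0, 1.02, 3.5, 0.99, 1.01, 1.5, 1.5, 1.5]:
        print(reading, follower.update(reading))
```

In this sketch the single outlier reading of 3.5 never triggers a correction, whereas the sustained shift to 1.5 does once it dominates the window.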

9.
This paper addresses the problem of identifying groups that satisfy the specific conditions for the means of feature variables. In this study, we refer to the identified groups as “target clusters” (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of means. We provide an expectation–maximization (EM) algorithm to fit the restricted NMM by using the maximum-likelihood method. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method provides several types of useful clusters, which would be difficult to achieve with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach shows that the proposed method is promising in the identification.

10.
We develop clustering procedures for longitudinal trajectories based on a continuous-time hidden Markov model (CTHMM) and a generalized linear observation model. Specifically, in this article we carry out finite and infinite mixture model-based clustering for a CTHMM and achieve inference using Markov chain Monte Carlo (MCMC). For a finite mixture model with a prior on the number of components, we implement reversible-jump MCMC to facilitate the trans-dimensional move between models with different numbers of clusters. For a Dirichlet process mixture model, we utilize restricted Gibbs sampling split–merge proposals to improve the performance of the MCMC algorithm. We apply our proposed algorithms to simulated data as well as a real-data example, and the results demonstrate the desired performance of the new sampler.

11.
The adaptive last particle method is a simple and interesting alternative in the class of general splitting algorithms for estimating tail distributions. We consider this algorithm in the space of trajectories and for general reaction coordinates. Using a combinatorial approach in discrete state spaces, we demonstrate two new results. First, we are able to give the exact expression of the distribution of the number of iterations in an idealized version of the algorithm where trajectories are i.i.d. This result is an improvement of previously known results when the cumulative distribution function has discontinuities. Second, we show that an effective computational version of the algorithm where trajectories are no longer i.i.d. follows the same statistics as the idealized version when the reaction coordinate is the committor function.

12.
We consider the estimation of life length of people who were born in the seventeenth or eighteenth century in England. The data consist of a sequence of times of life events that is either ended by a time of death or is right-censored by an unobserved time of migration. We propose a semiparametric model for the data and use a maximum likelihood method to estimate the unknown parameters in this model. We prove the consistency of the maximum likelihood estimators and describe an algorithm to obtain the estimates numerically. We have applied the algorithm to data and the estimates found are presented.

13.
In this study, an attempt has been made to classify the textile fabrics based on the physical properties using statistical multivariate techniques like discriminant analysis and cluster analysis. Initially, the discriminant functions have been constructed for the classification of the three known categories of fabrics made up of polyester, lyocell/viscose and treated-polyester. The classification yielded 100% accuracy. Each of the three different categories of fabrics has been further subjected to the K-means clustering algorithm that yielded three clusters. These clusters are subjected to discriminant analysis which again yielded a 100% correct classification, indicating that the clusters are well separated. The properties of clusters are also investigated with respect to the measurements.

14.
Movies of the solar corona in Extreme UltraViolet (EUV) bandpasses exhibit complex patterns of magnetically structured plasma features surrounding the solar photosphere. Among the various phenomena to be observed in the EUV movies, coronal oscillations are an essential process for determining physical parameters of the plasma. In this paper we demonstrate the ability of our motion estimation algorithm to explore and analyse the oscillating motions of coronal loops present in EUV image sequences. The motion fields of each image pair in the sequence are estimated; selected features are tracked using the motion estimation to form trajectories. The oscillating features are then selected from the Morlet wavelet analysis of the trajectories that provides parameters such as local oscillation period. The proposed method will be particularly useful to process datasets expected from new solar missions.

15.
Large spatial datasets are typically modelled through a small set of knot locations; often these locations are specified by the investigator by arbitrary criteria. Existing methods of estimating the locations of knots assume their number is known a priori, or are otherwise computationally intensive. We develop a computationally efficient method of estimating both the location and number of knots for spatial mixed effects models. Our proposed algorithm, Threshold Knot Selection (TKS), estimates knot locations by identifying clusters of large residuals and placing a knot in the centroid of those clusters. We conduct a simulation study showing TKS in relation to several comparable methods of estimating knot locations. Our case study utilizes data of particulate matter concentrations collected during the course of the response and clean-up effort from the 2010 Deepwater Horizon oil spill in the Gulf of Mexico.

16.
The K-means algorithm and the normal mixture model method are two common clustering methods. The K-means algorithm is a popular heuristic approach which gives reasonable clustering results if the component clusters are ball-shaped. Currently, there are no analytical results for this algorithm if the component distributions deviate from the ball-shape. This paper analytically studies how the K-means algorithm changes its classification rule as the normal component distributions become more elongated under the homoscedastic assumption and compares this rule with that of the Bayes rule from the mixture model method. We show that the classification rules of both methods are linear, but the slopes of the two classification lines change in the opposite direction as the component distributions become more elongated. The classification performance of the K-means algorithm is then compared to that of the mixture model method via simulation. The comparison, which is limited to two clusters, shows that the K-means algorithm consistently provides poor classification performance as the component distributions become more elongated, while the mixture model method can potentially, but not necessarily, take advantage of this change and provide a much better classification performance.
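
For readers unfamiliar with the classification rule discussed above, here is a minimal sketch of the standard Lloyd iteration for K-means. Each point is assigned to its nearest center, which for two clusters yields a linear boundary (the perpendicular bisector of the segment joining the centers). The data and starting centers are made up, and the sketch does not reproduce the analysis of the abstract.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest center (squared Euclidean).
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, centers=[(0, 0), (10, 10)]))
```

Because the assignment depends only on Euclidean distance to the centers, the rule takes no account of the shape of the component distributions.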

17.
We present a method to approximate and forecast, on an entire interval, a continuous-time process. For this purpose, we use the modelling of ARH(1) processes, defined by Bosq (1991). We deal with the practical problem of the discretization of the observed trajectories and approximate them by means of spline functions. We show by simulations that for well-chosen smoothing parameters, good prediction can be obtained in comparison with the “predictable” part of the process. Finally, we apply this model to forecast road traffic and compare it with a SARIMA model.

18.
Stochastic models are of fundamental importance in many scientific and engineering applications. For example, stochastic models provide valuable insights into the causes and consequences of intra-cellular fluctuations and inter-cellular heterogeneity in molecular biology. The chemical master equation can be used to model intra-cellular stochasticity in living cells, but analytical solutions are rare and numerical simulations are computationally expensive. Inference of system trajectories and estimation of model parameters from observed data are important tasks and are even more challenging. Here, we consider the case where the observed data are aggregated over time. Aggregation of data over time is required in studies of single cell gene expression using a luciferase reporter, where the emitted light can be very faint and is therefore collected for several minutes for each observation. We show how an existing approach to inference based on the linear noise approximation (LNA) can be generalised to the case of temporally aggregated data. We provide a Kalman filter (KF) algorithm which can be combined with the LNA to carry out inference of system variable trajectories and estimation of model parameters. We apply and evaluate our method on both synthetic and real data scenarios and show that it is able to accurately infer the posterior distribution of model parameters in these examples. We demonstrate how applying standard KF inference to aggregated data without accounting for aggregation will tend to underestimate the process noise and can lead to biased parameter estimates.

19.
Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.

20.
We present a sequential Monte Carlo algorithm for Markov chain trajectories with proposals constructed in reverse time, which is advantageous when paths are conditioned to end in a rare set. The reverse time proposal distribution is constructed by approximating the ratio of Green’s functions in Nagasawa’s formula. Conditioning arguments can be used to interpret these ratios as low-dimensional conditional sampling distributions of some coordinates of the process given the others. Hence, the difficulty in designing SMC proposals in high dimension is greatly reduced. Empirically, our method outperforms an adaptive multilevel splitting algorithm in three examples: estimating an overflow probability in a queueing model, the probability that a diffusion follows a narrowing corridor, and the initial location of an infection in an epidemic model on a network.
