Sequence Analysis
Brendan Halpin
  • LAST REVIEWED: 29 October 2013
  • LAST MODIFIED: 29 October 2013
  • DOI: 10.1093/obo/9780199756384-0077


Sequence analysis in sociology refers to a group of approaches to linear (predominantly longitudinal) data that focus on sequences (such as work-life histories or conversations) as wholes. Sequence analysis is often exploratory and descriptive in intention, typically oriented to generating data-driven typologies, and can be contrasted with conventional approaches to longitudinal data such as hazard-rate modeling (event history analysis), models of transition patterns, or latent growth curve models, which focus on modeling the processes generating the sequences. The development of sequence analysis in sociology has been informed by a perception that complex sequences, such as life-course trajectories or the rhetorical structure of a story, are likely to have structures that are not easily captured by models that focus on their evolution through time but that may be apparent when the sequences are considered as wholes.
Sequences are linear objects; that is, they have a unidimensional, ordered structure. In sociological research the dimension is almost always time, so sequences are longitudinal. Two main types of sequences exist: those representing longitudinal processes such as life histories, coded as states in successive time periods, and those representing structures that unfold through time, such as conversations (coded into types of utterances) or dances (coded into sequences of steps). Sequences are very detailed; for instance, a sequence ten units long in a four-element state space has more than a million possible forms (4^10 = 1,048,576). Hence, sequence data can be difficult to classify either a priori or by inspection. A common goal of sequence analysis is therefore a data-driven classification, typically achieved by defining a metric of similarity between sequences and using cluster analysis to group sequences on the basis of pairwise similarity.
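The size of the sequence space mentioned above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `count_sequences` and the four state labels are assumptions made for the example, not part of any sequence analysis package.

```python
from itertools import product

def count_sequences(n_states: int, length: int) -> int:
    """Number of distinct sequences of a given length over n_states states."""
    return n_states ** length

# Hypothetical four-element state space (labels are placeholders).
states = "ABCD"

# Brute-force check on short sequences: enumerate every possible form.
enumerated = sum(1 for _ in product(states, repeat=3))
assert enumerated == count_sequences(len(states), 3)

# Ten time units, four states: just over a million possible forms.
print(count_sequences(len(states), 10))  # 1048576
```

With so many possible forms, most observed sequences in a typical sample are unique or nearly so, which is why classification relies on pairwise similarity and not on tabulating identical sequences.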
Sequence analysis draws on computer-science techniques for pattern finding in strings of tokens (e.g., text) or other longitudinal data (e.g., recorded speech), some of which have proven very powerful, particularly in molecular biology. A recurrent theme in the sequence analysis literature is whether algorithms developed for, or appropriate to, nonsociological fields such as molecular genetics can map onto sociological data in a meaningful way. Sequence analysis in sociology is currently almost coterminous with the optimal matching algorithm; although alternatives are available, the bulk of research employs this measure. The idea is relatively simple: let the difference between two sequences identical except for one element be related to the difference in the elements, and the difference between two sequences identical except that one has one extra element be the cost of deleting the superfluous element. The difference between any two sequences, then, is the “cost” of the “cheapest” concatenation of substitutions (where the sequences differ in elements) and insertions or deletions that change one sequence into the other. For instance, AABC can be changed into ABBD by deleting the initial A, matching on the next A and the B (two zero-cost “substitutions”), inserting a second B, and substituting the C with a D. Depending on the relative costing of the operations, this may or may not be the cheapest set of operations; longer sequences will have many possible sets to be considered. The optimal matching algorithm (OMA) is “optimal” in identifying the cost of this cheapest set of operations efficiently. The net result is that OMA can identify similarity (identity or partial similarity, thanks to the substitution operation) at the same or different locations (thanks to insertions and deletions, which permit “alignment,” sliding one sequence along another).
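The cost calculation described above can be sketched as a standard dynamic-programming edit distance. The sketch below is a minimal illustration, not the implementation used in any particular study or package: the function name `om_distance`, the default insertion/deletion cost of 1, and the default substitution cost of 2 for any pair of differing elements are all assumptions chosen for the example. In applied work the substitution costs are usually set per pair of states.

```python
from typing import Callable, Optional, Sequence

def om_distance(
    a: Sequence[str],
    b: Sequence[str],
    indel: float = 1.0,
    sub_cost: Optional[Callable[[str, str], float]] = None,
) -> float:
    """Cost of the cheapest set of substitutions, insertions, and
    deletions that turns sequence a into sequence b."""
    if sub_cost is None:
        # Assumed default: matches are free, any mismatch costs 2.
        sub_cost = lambda x, y: 0.0 if x == y else 2.0

    n, m = len(a), len(b)
    # dp[i][j] = cheapest cost of turning a[:i] into b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel          # delete all of a[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * indel          # insert all of b[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel,                           # delete a[i-1]
                dp[i][j - 1] + indel,                           # insert b[j-1]
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]) # substitute or match
            )
    return dp[n][m]

# The example from the text, under the assumed default costs:
# one deletion (1) + one insertion (1) + one substitution (2) = 4.
print(om_distance("AABC", "ABBD"))  # 4.0

# With cheaper substitutions (cost 1), two substitutions are cheapest.
print(om_distance("AABC", "ABBD", sub_cost=lambda x, y: 0 if x == y else 1))  # 2.0
```

The two calls illustrate the point made in the text: which set of operations is cheapest depends on the relative costing of substitutions and insertions or deletions. The table-filling approach examines all alignments implicitly, in time proportional to the product of the two sequence lengths.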

Andrew Abbott’s Contribution

The American sociologist Andrew Abbott has done more than anyone else to introduce sequence analysis to the sociological repertoire, through a long series of publications with colleagues and students advocating and demonstrating its utility. He has used an eclectic range of applications and has made strong arguments about the role of an event-focused “narrative” approach to sociology. At times, his advocacy of sequence analysis developed into a trenchant critique of the state of empirical sociology. Although the concern with sequence predates his work, his advocacy of the optimal matching algorithm has been very influential. A “first wave” of applications can be attributed directly to his influence; these applications are summarized in a debate in the journal Sociological Methods and Research in 2000 (see the sections on Arguing with Levine and Wu for the debate and “First Wave” Applications for the applications). In the years since, the use of sequence analysis has broadened and deepened, as summarized in “Second Wave” Applications and Future Directions.
