John Reynolds passed away last month. He was one of the great researchers in programming languages. On numerous occasions I’ve had what I thought was a promising research idea, only to discover that John had already published it, 20–30 years before. I doubt there’s any aspect of programming languages that hasn’t been influenced by his research.
Here’s a small example. In graduate school I read Tony Hoare’s famous paper on communicating sequential processes. There are many interesting things about this paper, but one thing that caught my eye is that CSP clearly has pattern matching, which in my opinion is the most important feature of academic programming languages that has not yet crossed over into mainstream languages.
In Hoare’s words, one notable aspect of CSP is that
A simple pattern-matching feature, similar to that of [16], is used to discriminate the structure of an input message, and to access its components in a secure fashion. This feature is used to inhibit input of messages that do not match the specified pattern.
The citation [16] is a paper by John Reynolds. Moreover, it is dated 1965, early in the history of programming languages. It’s actually a technical report from the Argonne National Laboratory, and it isn’t easy to obtain. Argonne doesn’t have it online, but there are two copies held in university libraries, and I borrowed Yale’s copy via inter-library loan. I scanned it, and I’m putting it here for posterity:
COGENT Programming Manual, by John C. Reynolds. ANL–7022, Argonne National Laboratory, Argonne, Illinois, March 1965.
The pattern-matching feature that Hoare is refering to is COGENT’s analytical assignment, which John describes on page 22:
For example, suppose Z has the value (TERM / ABE * BED). Then the analytic assignment statement
Z =/ (TERM / (FACTOR)*(FACTOR)), X, Y
will give X the value (FACTOR / ABE) and Y the value (FACTOR / BED). (The explicit list structures for this example are again given by Figure 5.)
An additional property of the analytic assignment arises from the possibility that the comparison of the test and template list structures may show that these structures are dissimilar (beyond the occurrence of parameter elements in the template in place in place of sublists of in the test structure). If the list structures do not match, then the analytic statement fails, without changing the value of any variables. (Failure is a conditional control mechanism explained below.)
Thus, if Z has the value (TERM / ABE BED CAB), which is not a term composed of two factors, then the statement given above will fail, leaving the values of X and Y unchanged.
This is about as clear a definition of pattern matching as you’ll find. It contains all of the essential features of pattern matching as found in languages like Haskell and ML:
- Patterns match (tree-) structured data types.
- Patterns can be nested, and nested patterns match nested data structures.
- Patterns destructure data structures and bind sub-structures to variables.
- Patterns can fail as well as match.
All of this leads me to the question: who invented pattern matching? That is, while variants of pattern matching notation undoubtedly go back hundreds of years in pure mathematics, someone must have been the first to implement them in a programming language. John’s work is from 1965, rather early, and on page 2 he states that the tech report replaces earlier versions dating from 1962!
Wikipedia has a rather thin article on pattern matching. It claims that “The first computer programs to use pattern matching were text editors,” and cites Dennis Ritchie, who points out Ken Thompson’s work on extracting regular expression matches; but that dates from 1967.
The earliest citation in the Wikipedia article is for SNOBOL, whose development started in 1962. The language is described in SNOBOL, A String Manipulation Language. In comparison to John’s work, SNOBOL’s pattern matching is for strings, while COGENT pattern matching operates not only on strings (derived from parse trees), but on data structures as well. Moreover, SNOBOL’s pattern matching can only extract substrings of limited expressiveness: a sequence of any character (delimited by context or length), or strings with properly nested parentheses.
A history of the SNOBOL programming languages says that the first SNOBOL implementation took place in 1963; John cites his own lecture notes on “COGENT, A Compiler and Generalized Translator,” from December 1962. This may indicate that John’s implementation came first.
The authors of both SNOBOL and COGENT cite previous work on pattern matching. SNOBOL cites SCL and COMIT. I have not seen any documentation on SCL, but COMIT seems to have a strong claim. The documentation I have seen clearly has pattern matching of data structures. That document is from 1965. This paper by Yngve from 1963 does not describe the COMIT feature, but it says that an incomplete implementation was started in 1957. That’s hard to beat.
In John’s COGENT paper, he says that
The synthetic and analytic assignment expressions are generalizations of the “Parameter Operations” used by Brooker and Morris in their Compiler Compiler (2).
So John points to parser generators as the inspiration for his pattern matching (in fact COGENT, in addition to being a general-purpose language, serves as a compiler-compiler). This to me makes more sense than Wikipedia’s claim that pattern matching arose in text editing. Parsers impose structure onto strings via grammars, a kind of pattern. Text editors, even including regular expressions, fall one step short of that, in my opinion.
Some of the Brooker and Morris papers appeared in The Computer Journal, e.g., this one. Unfortunately, the scan is not of the highest quality. I don’t see as clear a statement of pattern matching as in John’s paper, but John certainly saw pattern matching there.
It is probably impossible at this remove to reconstruct the exact history of pattern matching; most of the pioneers have died off. But we can be sure that John mastered the subject, more than 50 years ago, in his usual style.