Global Utilities

Contents:
  1. "Why is == an infix operator ?"
  2. "But why can't definitions be entered at run-time ?"
  3. "What happens if one takes the first of [dup *] ?"
  4. "Why does Joy have all those recursion combinators ?"
  5. "So why don't other languages have recursion combinators ?"
  6. "Are there any higher order combinators ?

1. "Why is == an infix operator ?"

In Joy almost all operators are written in postfix order, after their operands. One exception seems to be the == operator in definitions. Why should == be treated differently from the others?

In just about all programming languages a program consists of a number of definitions of procedures or functions. Among them is one, called "main" in C, or an anonymous one in Pascal, which is the starting point. The same pattern occurs in grammars, which consist of definitions of non-terminals, including one that is called "start" or whatever. The pattern is also used in most macro expanders in which a collection of macro definitions is processed.

Any one of the definitions consists of two parts: 1) a name, and 2) a body obeying a syntax very specific to the language. In procedural languages like C or Pascal the body is a statement sequence. In functional languages the body is an applicative expression. In grammars the body is a regular expression over terminals and non-terminals. In macro expanders the body consists of literal text. In each case the body may contain calls to items defined elsewhere.

A definition contains a body as a part. The body must satisfy certain syntactic rules, and so must the whole definition. But in all cases the syntax for a whole definition is different from that of its largest part, the body. In procedural languages the body is a statement sequence, but a definition is not. In functional languages a body is an applicative expression, but a definition is not. In grammars a body is a regular expression, but a definition is not. In macro expanders a body is just text, but a definition is not. Lisp looks like an exception, because both bodies and definitions are symbolic expressions, written in a minimal syntax. But there are restrictions on definitions that are not imposed on bodies.

Definitions and bodies play very different roles. The difference is most conspicuous in fully compiled implementations. During compilation definitions produce entries in a symbol table, and the bodies are translated into directly executable machine language, but the bodies are not executed. The resulting compiled program contains no trace of the symbol table (except when it has been compiled for a run with a symbolic debugger).When the compiled program is run, a new structure appaears, typically at least a run-time stack, which did not exist during compilation. Now the body of the main program and any other called bodies can be executed.

The compilation stage need not translate bodies into directly executable machine language. The target language is some idealised machine language which needs to be interpreted. When such a program is run, there will be a run-time stack and in addition a software interpreter. (The portable Pascal implementation P4 works like this.)

Even if the compiler and the interpreter are in the same program, there still are these two stages: 1) reading a program, processing definitions and compiling bodies into some internal form, and 2) running a program by interpreting the internal form. When the program is run, the symbol table is still there, but no new definitions are entered. The run-time stack and other structures are present during reading, but only used now when the translated bodies are executed by the interpreter.

Numerous variations are possible, but the difference between the two stages is always present. So there is no reason to use the same syntax for definitions and bodies. The same applies to Joy.

The syntax for definitions in Joy is

        name   ==   body
where the body is a concatenative expression. The symbol "==" could even be left out altogether. A definition sequence consists of one or more definitions separated by semicolons. Importantly, unlike "==", the semicolon could not be left out. Definition sequences are headed by "LIBRA" or "DEFINE" and terminated by a period. But none of the the symbols "LIBRA", "DEFINE", "==", ";" and "." denote run-time operations like "+", "cons" or "map" do. The same is true for "HIDE", "IN" and "END" in local definitions.

So, "==" is not a postfix operator because it is not a run-time operator any more that the other symbols used in definitions.

2. "But why can't definitions be entered at run-time ?"

This question is very different from the preceding one. Whatever the syntax of definitions, it would be possible to allow definitions to be entered at run-time. But it may well be a bad idea.

For the sake of having a concrete example, consider the ordinary definition

	cube   ==   dup  dup  *  *
that is normally processed at compile-time. If such a definition is to be entered into the symbol table at run-time, then the two parts must be available: 1) the name "cube", and 2) the body "dup dup * *". Clearly none of the following will do
	cube   ==   dup  dup  *  *
	cube  dup  dup  *  *   ==
	cube  [dup dup * *]   DEF
because the initial occurence of "cube" will at run-time produce a call to that function which has not yet been defined. So the name of the function being defined has to be expressed in a way that prevents a call and instead produces an entry into the symbol table. Furthermore, the two occurrences of "dup" and of "*" in the first two candidates also have to be prevented from producing a call, as indeed they are in the third candidate.

This suggest the other notation

	"cube"  [dup dup * *]  DEF
In other words, "DEF" would expect a string and a quotation on top ot the stack, it would pop these off and enter the definition. In that case the following would produce the same result:
	"uc"  reverse  "xbe"  rest  concat  [dup dup * *]  DEF
However it is dubious whether such freedom to construct names by string manipulation can do anything but obfuscate a program. Other possibilities are these:
	[cube]  first  [dup dup * *]  DEF1
	[cube  dup dup * *]  DEF2
In both cases the new symmbol being defined occurs inside a quotation so it is not being executed. In the first case the symbol even ends up on the Joy stack just on its own, but again it is not executed. The two styles are essentially equivalent, although DEF1 and DEF2 have different expectations as to what has to be on top of the stack. Their equivalence can be seen by the mutual definitions - ordinary definitions, that is -
	DEF1   ==     cons  DEF2
	DEF2   ==   uncons  DEF1
Both styles lend themselves to program manipulation before the definitions are entered, for example
	[cube]  first  [* * dup dup]  reverse  DEF1
	[* *]  [dup dup]  concat  [cube]  swoncat  DEF2
Note that this is quite different from the construction of quotations that is often used in Joy before the quotations are processed by a combinator. For example, it is now possible to initialise a definition, say of "current-item", by
	[current-item  1]   DEF2
and to update it with, say an increment of 3, by
	current-item  3  +  []  cons  [current-item]  swoncat  DEF2
This means that the name "current-item" is actually the name of a variable which can be updated in all manners of ways. Hence the language would have assignment statements (with a terrible syntax). But assignment statements are what Joy and all purely functional languages try to avoid.

In the above example, current-item is an integer variable. Clearly one could have string or set or list variables, too. One could also have operator variables, or combinator variables, or quotation variables, or variables that start their life as integer variables and then become all sorts of other kinds of variables. So, allowing definitions to be entered at run-time would open all the possibilities of programs that constantly modify themselves. But it is widely agreed that this is a bad idea.

3. "What happens if one takes the first of [dup *] ?"

Or what happens if I take the second of [dup *] ? The usual list operations concat and rest applied to a quotation obviously leave a quotation on top of the stack. But list operations like first, second, uncons and unswons seem to leave an unadorned and unquoted operator (or combinator) on the stack:
	[dup *]  first   ==>   dup
	[dup *]  second  ==>   *
	[dup *]  uncons  ==>   dup  [*]
Does this make any sense at all? What can one do with such an unadorned unquoted operator?

There are many things one can do with an unadorned operator. Like any other item on the stack, it can be popped, swapped or duped. It can be printed either by an explicit put or, if the setautoput flag is set to 1 or 2, automatically after the execution of the current program. Of course it can also be read back. What is written or read is of course the sequence of 3 characters, d-u-p, in the first case. More interestingly, the unadorned operator can be made part of another quotation, as in

	[dup *]  first  []  cons   ==>   [dup]
The resulting quotation [dup] can become a parameter for a combinator, for example the i-combinator which will just execute what is inside the quotation. So we have
	[dup *]  first  []  cons  i    ==    dup
If the operator or combinator is not a primitive, but defined by the user, then one can get the body of its definitions as a quotation. For example, in the current implementation first is a primitive, but second is defined in the initial library inilib.joy:
	second   ==   rest  first
The body of that definition is accessible by
	[second]  first  body
which will leave the quotation [rest first] on top of the stack.

4. "Why does Joy have all those recursion combinators?"

A: In the 70's it was recognised that programs using structured IFs and WHILEs are easier to get right and easier to read than programs which use GOTOs and in which the pattern of the flow of control has to be inferred by laborious analysis. One reason for adding CASE, REPEAT and FOR was that they make programs even more explicit and easier to understand.

In the functional languages which use lists a lot much of list processing falls into a few recursion patterns which are often abstracted as combinators such as map, filter and fold, and also a few variants such as zipwith. Programs which use these combinators by name are easier to write and easier to read than programs in which the pattern of recursion on lists has to be inferred.

Joy takes this idea a step further by combinators for recursion patterns that are independent of any particular datatype such as lists above. For linear recursion there is the combinator linrec, its special case tailrec, and even more special case while. There is also the more flexible condlinrec. For binary recursion there is the combinator binrec, but so far no more special or more flexible versions. For recursion with arbitrary n-ary branching there is the combinator genrec, so far with no variants. For recursion on trees of any type other than lists there are several combinators. For nested recursion (as in the Ackermann function) there will soon be two combinators nestrec and condnestrec. So far there are no combinators for mutual recursion. All these combinators are more intuitive and descriptive than the well-known "paradoxical" all-purpose recursion combinators (Curry's) Y and (Turing's) T which are frequently defined but never used because they are all-purpose and hence not descriptive.

Of course the execution of programs without explicit GOTOs typically involves its use internally. Similarly, the execution of programs with recursion combinators but without explicit recursion typically involves its use internally. But programs using combinators are easier to write and read than ones using explicit recursion and in which the pattern of all, not just list, recursions has to be inferred. So, if you like, "Recursion considered harmful".

As a bonus, at least for the current implementation, programs using recursion combinators are more efficient than programs using explicit recursion.

5. "So why don't other languages have recursion combinators ?"

A: For several, mostly independent reasons:

1. Joy combinators take quotations as arguments. Other languages would have to use lambda abstractions instead. But many have no way of capturing the current environments.

2. Apart from this, many languages could not even define a lambda counterpart of Joy's simple i combinator - the "dequotation combinator". This rules out most mainstream languages. Some counterpart might be definable in the Lisps, ML, Haskell and Prolog/

3. Again indpendently, there will be a problem about Joy's simple branching combinator ifte. With "eager" evaluation it is not possible to define an ordinary function for branching, as Eva Lou Ator will tell you, in SICP p 23. Branching has to be done with "special forms" cond or if. To define them one needs delayed evaluation or macros. Since the recursion combinators all involve branching they need the same.

4. The next hurdle concerns the number of extra parameters apart from the quotation parameters for combinators. In Joy all parameters, including extra ones just sit on the stack. A different language might do one of the following: (a) Have several versions of combinators for each possible number of extra parameters. (b) If possible, use whatever facility the language has for optional parameters, and put the extras there. (c) Rewrite all functions including combinators as taking a stack (a list) as their sole parameter.

5. Some languages are strongly statically typed, perhaps in a polymorphic way. Finite unions can then solve the problem that combinators are supposed to take as parameters functions of all sorts of base types but also lists of these and lists of lists of base types and so on. But I don't know whether this can handle lists of mixtures of lists of mixtures when the entire mix is unknowable at compile time.

There are languages in which the Y or T combinators can be defined, or at least collections of variants for different number of parameters of the recursing function. Presumably using similar techniques it is possible to define in these languages counterparts of the recursion combinators of Joy. But I have never seen such definitions, and hence never their routine use.

6. Are there any higher order combinators ?

Combinators take functions as parameters, so combinators are second order functions. Are there any second order combinators, functions which take first order combinators as parameters - in other words third order functions? Are there any functions that take as parameters functions which take as parameters functions?

The quick answer is NO, it stops at second order functions.

The more considered answer is that combinators do not take functions as parameters but they take quoted programs as parameters. It is indeed possible to write programs with combinators which take quoted programs as parameters, where the quoted program, when executed, will execute another combinator taking as parameter another quoted program as parameter, and so on. But such apparent higher order examples are just higher order uses of the combinator, they are not examples of higher order combinators.

The following are some examples, all involving the simple i combinators. All just compute the same squaring function. The examples are here grouped according whether the use of the right-most i combinator is first order, second order or third order.

first order use          second order use         third order use
  [dup *]       i
 [[dup *] i]    i         [dup *]    [i] i              
[[[dup *] i] i] i        [[dup *] i] [i] i        [dup *] [i] [i] i       

A less contrived example is the following, used to compute the list comprising the successor, the double and the square of a given number, for example 7.

    7   [[succ] [2 *] [dup *]]   [i]   map     ==>     [8 14 49]
The map combinator is mostly used with a quotation parameter that computes a first order function, but as this example shows, the quotation can also be a combinator. As a final example, here is the ifte combinator used to branch depending on whether the second item on the stack is a leaf (not a list), or a list. In the first case the top quotation will be called with the i combinator, and in the second case with the map combinator:
        [pop leaf]  [i]  [map]  ifte
No Joy example is known where a combinator can only be used with quoted combinators as parameters. (This includes, for example, the operations in Church arithmetic.)

[MYNOTES: higher-order-combs continuations oo mess-pass lazy need-stack why-postix reflective get-put-funct cat-theory]

Back to homepage for Manfred von Thun