Global Utilities

9-NOV-2005

Abstract: There is a subset of Joy in which quotations contain at most one element and nesting of quotations is not allowed. For every Joy program there is a corresponding program in this subset, and when the corresponding program is executed, the original is restored. A translation from full Joy to the subset can be written in full Joy (and, if one wanted to, also in the subset itself).

In most high level programming languages programs contain nested structures, typically in conditionals and loops. Nested structures also arise in data as in nested arrays or nested lists. Concatenative languages such as Forth, Joy, Slava Pestov's Factor, and Stevan Apters's cK and XY are no exception. But nesting interrupts the concatenative semantics: a concatenative program must treat a quotation, a contained nested program, as an indivisible unit, but when the quotation is executed by a combinator, its concatenative semantics unfolds. In the "concatenative" mailing group Billy Tanksley raised the question whether it would be possible to design a concatenative language without nesting, a flat concatenative language.

The written representation of programs in any language consists of a sequencce of symbols which themselves are sequences of characters. The characters are collected by a scanner or tokeniser to form symbols, and the symbols are examined by a parser to form a syntax tree and by the compiler to form a program which typically is nested. Note that the sequence of symbbols is not nested, but it contains some special symbols such as brackets and IF, THEN, WHILE.. . These symbols direct the compiler to buils the nested structures. So one can think of the sequencce of symbols as a program that is executed by the compiler and produces an internal nested structure. Surprisingly the sequence of symbols constitutes a flat concatenative language. The compiler computes a function, and so does the interpreter. Hence the composition of the compiler and the interpreter is an interpreter for a flat concatenative language.

Stevan Apter has adopted this principle in his web page http://www/nsl.com/k/xy/xy.htm. There he uses his concatenative language XY to write a proof-of-concept implementation for a flat concatenative language. The XY language uses a stack X, like all concatenative languages, but also a queue Y. It so happens that this arrangement is suitable implementing a flat concatenative language. Essentially it works by compiling the flat input language to form quotations prior to every execution. So the compilation is done by the XY interpreter, and the result is then again interpreted by XY. The alternative solution offered in the remainder of this note borrows some of these ideas. Instead of a queue Y an ordinary Joy list is used. The flat input language is actually a subset of Joy.

Let Floy ("FLat jOY") be that subet of Joy in which quotations are not allowed to contain more than one atom. This means that [] [3] [dup] [*] and [map] are allowed, but [[]] [1 2 3] [[3]] [dup *] and [[map]] are not. The aim of this note is to demonstrate the following theorem about Joy:

        For every Joy program J
            there is a Floy program F which
                constructs the quotation [J] of the Joy program J:
                              F    =>  [J]
                    and hence F i  ==  [J] i  ==  J
The proof uses a technique commonly used in the theory of computability: One shows that there is a translation function T such that T(J) = F. The translation function is computable, so the proof is constructive. The translation could be computed by a program in any Turing complete language.

Joy is Turing complete, so the translation function from Joy to Floy can be written in Joy itself. Let j2f be a Joy program which takes the quotation [J] of a Joy program J as a parameter and produces the quotation [F] of the corresponding Floy program F. Note that the quotation [F] itself will in general not be a Floy program. The translation program j2f must satisy:

                 [J] j2f      ==  [F]
    and hence    [J] j2f i    ==  [F] i    ==  [J]
          and    [J] j2f i i  ==  [F] i i  ==  [J] i  ==  J
From the second line above [J] j2f i == [J] and hence j2f i == id. So j2f is the left inverse of i, and i is the right inverse of j2f.

The quotation [F] of the Floy program F, when executed by i, has to reconstruct the quotation [J] of the Joy program J, including all internal quotations of J which may b nested. There are two ways in which these quotations can be constructed by F. Both must begin with the empty quotation []. Then, in the forward direcction, start with the first, second .. item and append them in that order to the quotation so far. Alternatively, in the reverse direction, start with the last, second-last .. item and prepend them in that order to the quotation so far. Since there are two ways in which a Floy program could operate, there are in fact two possible Floy programs corresponding to a given Joy program. And, for sheer perversity, a Floy program could use a mixture of these two strategies. But such mixtures wil not be further considered here. The forward or backward versions can be constructed by two different Joy-to-Floy programs, j2f-forwards and j2f-reverse. Both Joy-to-Floy programs traverse the original Joy quotation [J] and any of its internal quotations in the forward direction. Both start building [F] as an empty quotation []. Then j2f-forwards starts at the first, second .. item of [J] or its internal quotation to build [F] in the forward direction, so that when [F] is ever executed by the i combinator, it will reconstruct [J] in the forward direction. On the other hand, j2f-reverse also starts at the first, second .. item to build [F] in the reverse direction, so that when [F] is ever executed by i, it will reconstruct [J] in the reverse direction. Is your head spinning yet? So was mine when writing the code.

The remainder of this note consists of the separately commented output flatjoy.out from running flatjoy.joy. The first definitions define four quoted Joy programs that will later be used to test the two Joy-to-Floy translators. The first is the program that does nothing, the second contains some actual code without nesting, the third has two occurrances of one level of nesting, and the fourth has two occurrances of two levels.

	DEFINE 
	P0 == [];                                       # empty code 
	P1 == [2 3 + dup *];                            # non-empty code 
	P2 == [[1 2 3] [dup *] map];                    # nest data & code 
	P3 == [ [[1 2][3 4][5] []] [[dup *] map] map ]. # double nesting 

Next comes the definition of j2f-forwards and its subsidiaries. The forward version [F] has to use a lot of concatenation, so the first definition abbreviates concat to c. The second definition is for j2f-f, a recursive operator, the real workhorse. If the item to be translated is a list, then do some preparation, step recursively through the list, and then do some finalisation. The preparation consists of appending an empty list to what has been constructed so far, and the finalisation consists of appending code which, when executed by i, will append the unit list of the reconstructed list to the partially reconstructed [J]. On the other hand, if the item is not a list, but an atom foo, build code and append code which, when executed by i, will append foo to the partially reconstructed [J]. The final definition is for j2f-forwards, which must provide an empty starting list [] and then step through [J] by calling j2f-f.

	DEFINE 
	    c  ==  concat; 
	    j2f-f == 
	        [ list ] 
	        [ [[[]] concat] dip [j2f-f] step [[] cons c] concat ] 
	        [ [] cons [c] cons concat ] 
	        ifte; 
	    j2f-forwards == [[]] swap [j2f-f] step. 

The following are the tests. For each of P0..P3, we see the original quoted Joy program [J], then its translation [F] by j2f-forwards, then the result of executing this by the i combinator to yield the original [J]. For each of P1..P3 we also see the result of executing this by a further i combinator. In [F], remember that c is short for concat.

	P0 . 
[]

	P0 j2f-forwards . 
[[]]

	P0 j2f-forwards i . 
[]
	 
	P1 . 
[2 3 + dup *]

	P1 j2f-forwards . 
[[] [2] c [3] c [+] c [dup] c [*] c]

	P1 j2f-forwards i . 
[2 3 + dup *]

	P1 j2f-forwards i i . 
25
	 
	P2 . 
[[1 2 3] [dup *] map]

	P2 j2f-forwards . 
[[] [] [1] c [2] c [3] c [] cons c [] [dup] c [*] c [] cons c [map] c]

	P2 j2f-forwards i . 
[[1 2 3] [dup *] map]

	P2 j2f-forwards i i . 
[1 4 9]
	 
	P3 . 
[[[1 2] [3 4] [5] []] [[dup *] map] map]

	P3 j2f-forwards . 
[[] [] [] [1] c [2] c [] cons c [] [3] c [4] c [] cons c [] [5] c [] cons c [] []
cons c [] cons c [] [] [dup] c [*] c [] cons c [map] c [] cons c [map] c]

	P3 j2f-forwards i . 
[[[1 2] [3 4] [5] []] [[dup *] map] map]

	P3 j2f-forwards i i . 
[[1 4] [9 16] [25] []]

The Floy programs so constructed are almost readable, and with some practice one might even learn to write that way - if one really had to. But there is something else that is unsatisfactory. On inspecting the code for j2f-f and the translations that it produces, we see that both use a lot of concatenations: j2f-f uses concat directly, and the translations use the abbreviation c, which will be executed when the translation is executed by the i combinator. In both cases the concatenation serves to append a quite short list to another list which might already be quite long. But that means that the other longer list has to be copied entirely to have the short list as its new tail. This is inefficient, and a better solution would be to work the other way, so that a short list is prepended rather than appended.

The following definition for translator j2f-reverse addresses the problem. From a given Joy program [J] it constructs in reverse an equivalent Floy program [F] which when executed by the i combinator reconstructs in reverse the original Joy program [J]. Instead of c for concat, define s for swoncat, to be used inside the translation [F]. The workhorse is j2f-r. If the item is a list do some preparation, recursively step through the list and then do some finalisation. Finalisation consists in prepending [] to the [F] built so far, it will be the starting point when the completed [F] is executed by i. Preparation consists in prepending to [F] code which when executed i will prepend the constructed quotation to the partially rebuilt [J]. If the item is not a list but say foo, construct and prepend code which when executed by i, will prepend foo. The final j2f-reverse has to provide an inital empty list [] and after stepping through [J] prepend another empty list to that.

	DEFINE 
	    s  ==  swoncat; 
	    j2f-r == 
	        [ list ] 
	        [ [[swons] swoncat] dip [j2f-r] step [] swons ] 
	        [ [] cons [s] cons swoncat ] 
	        ifte; 
	    j2f-reverse == [] swap [j2f-r] step [] swons. 

The tests for j2f-reverse are the same as for j2f-forwards: Show [J], then translate to yield [F] , then i the translation to reconstruct [J], and where appropriate, i the reconstruction.

	P0 . 
[]

	P0 j2f-reverse . 
[[]]

	P0 j2f-reverse i . 
[]
	 
	P1 . 
[2 3 + dup *]

	P1 j2f-reverse . 
[[] [*] s [dup] s [+] s [3] s [2] s]

	P1 j2f-reverse i . 
[2 3 + dup *]

	P1 j2f-reverse i i . 
25
	 
	P2 . 
[[1 2 3] [dup *] map]

	P2 j2f-reverse . 
[[] [map] s [] [*] s [dup] s swons [] [3] s [2] s [1] s swons]

	P2 j2f-reverse i . 
[[1 2 3] [dup *] map]

	P2 j2f-reverse i i . 
[1 4 9]
	 
	P3 . 
[[[1 2] [3 4] [5] []] [[dup *] map] map]

	P3 j2f-reverse . 
[[] [map] s [] [map] s [] [*] s [dup] s swons swons [] [] swons [] [5] s swons [] [4]
s [3] s swons [] [2] s [1] s swons swons]

	P3 j2f-reverse i . 
[[[1 2] [3 4] [5] []] [[dup *] map] map]

	P3 j2f-reverse i i . 
[[1 4] [9 16] [25] []]
As may be seen, the reverse Floy programs are much harder to read and write when one's mindset is for postfix notation. Finally, I could not resist flattening the already flattened:
	                # Double flattening - an exercise in Joy obfuscation: 
	P3 j2f-forwards j2f-reverse . 
[[] [c] s [] [map] s swons [c] s [cons] s [] swons [c] s [] [map] s swons [c] s
[cons] s [] swons [c] s [] [*] s swons [c] s [] [dup] s swons [] swons [] swons [c] s
[cons] s [] swons [c] s [cons] s [] swons [] swons [c] s [cons] s [] swons [c] s []
[5] s swons [] swons [c] s [cons] s [] swons [c] s [] [4] s swons [c] s [] [3] s
swons [] swons [c] s [cons] s [] swons [c] s [] [2] s swons [c] s [] [1] s swons []
swons [] swons [] swons]

	                # partial de-obfuscation  -  by Joy itself: 
	P3 j2f-forwards j2f-reverse i . 
[[] [] [] [1] c [2] c [] cons c [] [3] c [4] c [] cons c [] [5] c [] cons c [] [] cons c [] cons c [] [] [dup] c [*] c [] cons c [map] c [] cons c [map] c]

	                # full de-obfuscation  -  again by Joy: 
	P3 j2f-forwards j2f-reverse i i . 
gc - 191 nodes inspected, 126 nodes copied, clock: 1
[[[1 2] [3 4] [5] []] [[dup *] map] map]

	                # final run of de-obfuscated original: 
	P3 j2f-forwards j2f-reverse i i i . 
[[1 4] [9 16] [25] []]

This ends the output file flatjoy.out from running the source file flatjoy.joy. The definitions of the two translators j2f-forwards and j2f-backwards have been added in slightly modified form to the Joy library for symbolic manipulation, symlib.joy.

Here again is the theorem already mentioned at the beginning:

        For every Joy program J
            there is a Floy program F which
                constructs the quotation [J] of the Joy program J:
                              F    =>  [J]
                    and hence F i  ==  [J] i  ==  J
The proof is by induction on the structure of the Joy program J, using either of the two Joy-to-Floy translators just shown. A informal version of the proof is as follows: The base case is for the quoted empty program []; the two tests for P0 have demonstrated the correctness of the translators. For the induction on the length of the program without any internal quotations, the two tests for P1 have demonstrated the correctness. For the induction on the depth of nested quotations, the two tests for each of P2 and P3 demonstrate the correctness. A more formal proof would need to proceed by analysing the actual code for the translators and even of the implementation of Joy, but this is outside the scope of this note.

The translations into the two versions of Floy contain many occurrances of unitlists of the form [foo], each followed by a c (for concat) in the forward version, or by a s (for swoncat) in the reverse version. It might be thought that unitlists already break the ideal of flatness. For Joy as it is, there seems to be no other solution. But the syntax of Joy could be extended by adding unary operators in prefix notation. Two kinds of solutions suggest themselves:

Writing say Q for one such operator, Q foo would be the same as [foo]. Then for the forward and reverse versions of Floy one might write Q foo concat, or Q foo swoncat. Then one might combine the quoting operator with the concat or swoncat in two other unary operators C or S, and then writing C foo, or S foo.

The other kind of solution would introduce an operator F, so that F foo would be the same as [foo] first, with the effect of pushing foo onto the stack. Then concat or swoncat would have to be replaced by append and prepend, possibly by combined unary operators A and P, and then writing A foo, or P foo.

Similar unary operators have been suggested before in different contexts for different purposes. But such a major addition to the syntax does not sit so well with the ideal of simple concatenativity, and no such addition is currently planned for Joy. However, it may so turn out that compelling arguments for additional syntax will emerge in the future.